Note: This article is the first of a two-part series. The quotes in this article have been edited and condensed for clarity.
Although artificial intelligence in healthcare is an area of intense interest, exploring the topic is akin to peeling an onion — each layer revealing a new set of opportunities and obstacles.
While clinical decision support and identifying targets for drug development are frequently cited as examples of where AI is impacting healthcare, a deeper dive soon reveals the limitations of some data sources, how some companies are addressing them, and where standards and best practices are needed if this relatively young and vibrant aspect of health tech is to continue to evolve in life sciences and healthcare.
In a panel discussion at the MedCity CONVERGE conference exploring the opportunities and challenges posed by the data underpinning AI tools, panelists representing big pharma, health tech and clinical data networks shared some of their insights.
The participants included Chris Boone, Head of Real World Data and Analytics for Pfizer; Gaurav Singal, chief data officer at Foundation Medicine; Janak Joshi, chief technology officer and head of strategy at Life Image; and Nate Nussbaum, senior medical director at Flatiron Health. Brenda Hodge, chief marketing officer for healthcare at Nuance, served as the moderator.
One of the challenges in harnessing the data to support AI in healthcare is simply gathering it. Nussbaum explained how Flatiron Health sorts out the data:
“Teams of human abstractors review EHR data to understand what that unstructured documentation actually means and to pull the data out in a way that we can use as a source of truth. We then use it to build models, assess the quality of those models, and understand things like how much bias a machine-learning model is introducing, so that we can ask research questions and have confidence in the answers.”
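The validation loop Nussbaum describes can be sketched in a few lines: human-abstracted labels act as ground truth, and comparing model output against them within subgroups surfaces systematic bias. This is a toy illustration with invented data and field names, not Flatiron's actual pipeline.

```python
# Toy sketch of the workflow Nussbaum describes: human-abstracted labels
# serve as a source of truth, and per-subgroup accuracy shows whether a
# model is systematically worse for some patients. All data are invented.
from collections import defaultdict

abstracted = {"p1": "smoker", "p2": "non-smoker", "p3": "smoker", "p4": "non-smoker"}
model_out  = {"p1": "smoker", "p2": "smoker",     "p3": "smoker", "p4": "non-smoker"}
site       = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}  # hypothetical subgroup

def accuracy_by_site(truth, pred, group):
    """Fraction of model labels that match the abstracted labels, per subgroup."""
    hits, totals = defaultdict(int), defaultdict(int)
    for pid, label in truth.items():
        totals[group[pid]] += 1
        hits[group[pid]] += int(pred[pid] == label)
    return {g: hits[g] / totals[g] for g in totals}

print(accuracy_by_site(abstracted, model_out, site))  # {'A': 0.5, 'B': 1.0}
```

A gap like the one between sites A and B here is the kind of signal that would prompt a closer look at how a model behaves before it is used to answer research questions.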
AI in oncology care — what’s being done and where is the potential?
Joshi cited the need for the clinical context found in unstructured data to assess therapy effectiveness, pointing to collaborations between Life Image and life science organizations that are working to identify potential indicators.
“We are working with a couple of companies on non-small cell lung cancer to identify the signals that can potentially indicate therapy effectiveness. How do you conduct comparative effectiveness by marrying both genomic biomarker data and imaging data?

“Radiomics is a relatively new concept of marrying genomic biomarker data with imaging data. Currently, the output of this model is unknown. But what we are finding is an increasing need, utility and, most importantly, clinical relevancy of using not only medical claims data or structured data sets coming from EHRs but also the unstructured data coming from everything else that surrounds the patient.”
Joshi also cited another project the company is working on that illustrates how difficult it can be to develop accurate machine learning algorithms that effectively read and understand medical images in a clinical context.
“Writing a simple query that indicates how many patients diagnosed with non-small cell lung cancer were former smokers with cancer diagnosed specifically in the left lung is actually quite burdensome. The indication of ‘left lung’ is very hard to find in imaging data sets coming from PACS systems in hospitals. It is often a manually curated effort where a human says, ‘This is a left lung; this is a right lung,’ but, if you flip the image, you end up with false positives and false negatives.

“Life Image is essentially using [Cloud] AutoML functionality to identify that label. But more important than the label is going to be the classification around it. Once you know it’s a left lung, you need to determine how many other left lungs exist in your data set and whether there is a pattern at the pixel level associated with that. The labeling, classification, and normalization across multiple different vendors is a really hard problem to solve.”
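Joshi's point becomes concrete in a toy sketch (the record fields and patients below are invented for illustration): the structured filters on diagnosis and smoking status are trivial, but the query silently loses every study whose laterality was never recorded in the PACS metadata.

```python
# Hypothetical cohort query of the kind Joshi describes. The fields
# "diagnosis", "smoking_status", and "laterality" are invented; in
# practice, laterality is often missing from PACS metadata and must be
# inferred from pixels or curated by hand.
records = [
    {"patient": "A", "diagnosis": "NSCLC", "smoking_status": "former", "laterality": "left"},
    {"patient": "B", "diagnosis": "NSCLC", "smoking_status": "former", "laterality": None},  # never recorded
    {"patient": "C", "diagnosis": "NSCLC", "smoking_status": "never",  "laterality": "right"},
]

def cohort(records):
    """Former smokers with NSCLC diagnosed in the left lung."""
    return [
        r["patient"]
        for r in records
        if r["diagnosis"] == "NSCLC"
        and r["smoking_status"] == "former"
        and r["laterality"] == "left"
    ]

print(cohort(records))  # ['A'] -- patient B drops out silently
```

Patient B may well have a left-lung tumor, but because the laterality field is empty the query cannot find them, which is why the labeling work Joshi describes has to happen before the "simple" query becomes answerable.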
Singal of Foundation Medicine offered the short-term and long-term goals the company has identified for AI.
“The long-term dream for us is that, by combining lots of different diagnostic components, we’ll be able to help inform the best treatment for every patient. I personally think that’s still quite a ways away. One place we’re spending a lot of time nowadays is in pathology images because, for every case we sequence, we also get a pathology slide and we digitize [it]. That’s a place where I think, even in the near term, we’ll start to pull out new features and new biomarkers.”
Importance of access, availability, and curation of data sets
The most widely known obstacle to the curation of data sets that companies rely on for AI tech is how best to normalize/standardize the data.
Boone said that Pfizer relies heavily upon the data it curates from its randomized clinical trials and works with partners like Flatiron Health and Foundation Medicine to access previously unseen real-world data.
“How do we transform the way we do clinical research today, and what makes the most sense? We’re also thinking about, when we’re looking at [electronic case report forms], most of the data that we’re capturing now is captured, in essence, in these platforms that exist today. But there’s still that 30 percent that we know is not there.” He noted that collaborations with these companies will help address the data elements that should be captured.
To avoid black boxes and support transparency, context is critical
Although Boone emphasized that he likes working with startups and tech companies, he said they need to understand that it’s not enough to provide data; they must also explain how it was collected and under what circumstances.
“The argument that’s being made is that ‘we have the best data that you can find anywhere or captured in this context.’ I’m not even sure they understand how and why they’re capturing it. So, for us, context is probably the most important thing as you get into it. It can’t be this black box approach to, ‘Well, we captured this data in this way. You may not understand how we’re doing this.’ We can’t go to any regulator and say, ‘Well, they did it, and we trust that they did it the right way.’”
Joshi agreed that context and transparency are key to data access and to the utility of real-world evidence in drug development.
“In order to develop a regulatory-grade data set, it’s not just about the data capture. You must understand the pathway, or the point in the clinical workflow, at which the information was captured. A lung CT scan captured at a particular point in the patient’s pre- or post-diagnosis workflow is actually a very important marker for how to interpret that data, whether by machine learning or by human interpretation.”
Linkability of data is also critical. Joshi noted that this becomes particularly difficult with pathology and claims data in oncology, of which an estimated 15 to 20 percent is biased due to incorrect, duplicate, or unintended diagnoses.
Next week: A look at the democratization of data and the need to balance this with ethical considerations.
Photo: Getty Images