Artificial Intelligence In The Healthcare Industry
There are dozens of ways to interpret the term “Artificial Intelligence” (AI) – not just in literature or movies, where we encounter AI as the red camera eye of HAL 9000 or in the shape of a humanoid robot. In the pharma industry and in medical engineering, however, the term AI can be narrowed down to machine learning: a computer system is trained to recognize specific patterns in order to perform a specific task. Machine learning is based on training data whose characteristics suit the intended purpose of the AI. Successful validation therefore depends on high-quality training data.
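To illustrate what “trained to recognize a specific pattern” means in practice, here is a minimal sketch in Python (scikit-learn). The data, the pass/fail rule and the model are synthetic stand-ins for illustration, not a validated pharma application:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical training data: each row is a measurement vector,
# each label says whether the sample passed (0) or failed (1) inspection.
rng = np.random.default_rng(seed=42)
X = rng.normal(size=(500, 4))                  # 500 samples, 4 features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic pass/fail rule

# The model only ever "knows" what the training data shows it.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Held-out accuracy: one (coarse) measure of whether the learned
# pattern generalizes beyond the training set.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```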
Areas of application within the healthcare industry
The potential for quality assurance and pharmaceutical production is huge:
1) Predictive maintenance: incidents in the production process can be anticipated using machine learning. Connected sensors collect a tremendous amount of data, which is analyzed to predict failures (see the first sketch after this list).
2) Quality assurance through machine-based visual inspection: optical sensors take a “picture” of every tablet or vial, and AI assesses the surface quality or turbidity (a simplified sketch follows this list). This is a task where AI is superior to the human eye, which is prone to distraction and fatigue. Fraunhofer IPA has therefore developed an adaptive visual inspection procedure that enables AI to detect surface defects, impurities and fluctuations in large-scale production processes.(1) Comparable applications of visual methods are also found in pharma research, for example cell identification.
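A common way to implement the predictive-maintenance idea from item 1 is unsupervised anomaly detection on sensor data. The following sketch uses an isolation forest on synthetic temperature/vibration/pressure readings; the sensor values and the contamination setting are illustrative assumptions, not a production recipe:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical sensor log: temperature, vibration, pressure per time step.
rng = np.random.default_rng(seed=1)
normal_ops = rng.normal(loc=[60.0, 0.2, 1.0], scale=[2.0, 0.05, 0.1],
                        size=(1000, 3))

# Fit on data assumed to represent healthy operation.
detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(normal_ops)

# A drifting bearing might show up as elevated vibration long before failure.
suspect = np.array([[61.0, 0.55, 1.02]])
print("anomaly" if detector.predict(suspect)[0] == -1 else "normal")
```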
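For the visual inspection in item 2, a convolutional network is the typical pattern recognizer. The sketch below is a minimal, untrained PyTorch model with random tensors standing in for camera images of tablets; the architecture is an assumption for illustration, not the Fraunhofer IPA procedure:

```python
import torch
import torch.nn as nn

# Toy classifier: grayscale 64x64 image of a tablet -> {OK, defect}.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),   # learn local surface features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 31 * 31, 2),        # two classes: pass / reject
)

# Random tensors stand in for a batch of camera images.
images = torch.randn(4, 1, 64, 64)
logits = model(images)
print(logits.argmax(dim=1))  # predicted class per tablet (untrained!)
```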
The impact on patient well-being is even more evident in medical engineering, where AI-based diagnostic methods already exist, for example systems that detect heart rate fluctuations in patients with heart conditions and warn them of an impending heart attack. Such systems must meet the highest demands regarding functional reliability and precision.
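To make the idea tangible, here is a deliberately simplistic sketch of flagging unusual heart-rate fluctuations with a rolling statistic. Real warning systems rest on clinically validated algorithms; every threshold below is an illustrative assumption:

```python
import numpy as np

def flag_fluctuations(heart_rate_bpm, window=30, z_threshold=3.0):
    """Flag samples that deviate unusually far from the rolling mean.

    A toy statistical detector, not a clinical algorithm.
    """
    hr = np.asarray(heart_rate_bpm, dtype=float)
    alerts = []
    for i in range(window, len(hr)):
        recent = hr[i - window:i]
        mu, sigma = recent.mean(), recent.std()
        if sigma > 0 and abs(hr[i] - mu) / sigma > z_threshold:
            alerts.append(i)  # index of a suspicious reading
    return alerts

# Synthetic trace: stable rhythm with one abrupt jump at t=80.
hr = np.full(120, 72.0) + np.random.default_rng(2).normal(0, 1.0, 120)
hr[80] = 115.0
print("alerts at indices:", flag_fluctuations(hr))
```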
What do companies need to be aware of when validating machine learning / AI?
The first step of any validation is the specification of the “intended purpose”, followed by the planning of the technical implementation. Traditional (linear, deterministic) systems and non-linear AI systems hardly differ at this point. The fundamental difference lies in the black-box, non-linear nature of AI: we have little insight into what the system has learned or how it arrives at a specific decision. What we can control, however, is the data the system is trained on for its task. That raises the following questions:
1) What counts as good data, and where does it come from? If there is no connection between the data and the intended purpose, the system is merely being trained towards some arbitrary behavior. Spurious correlations are even more treacherous: in one well-known case, an AI trained to recognize ships learned to use the surrounding water as its criterion and consequently classified ducks as ships. For such cases, approaches exist to open the black box and understand the AI’s decision-making process (2) (see the first sketch after this list).
2) When is training complete? The system can be locked once the target behavior has been observed. This turns the traditional test into a statistical question: how many false positives and how many missed symptoms are acceptable (see the second sketch after this list)? In addition, an application-based risk assessment is necessary. If the ultimate goal is a system that learns continuously, that system must be protected against incorrect data. Clinical studies are essential to prove the reliability of AI software that is designed to be part of a medical device.
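Regarding question 1: one family of approaches for opening the black box is gradient-based saliency, which asks which input pixels most influenced a prediction (the paper in (2) analyzes such “Clever Hans” behavior with the related, more elaborate layer-wise relevance propagation). The sketch below is a minimal gradient-saliency example on a toy model, not the method from the paper:

```python
import torch
import torch.nn as nn

# Toy model and input; in practice this would be the trained classifier
# and a real image (e.g., a "ship" photo with lots of water).
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 2))
image = torch.randn(1, 1, 64, 64, requires_grad=True)

# Gradient of the predicted class score w.r.t. the input pixels:
# large magnitudes mark pixels the decision is most sensitive to.
score = model(image)[0].max()
score.backward()
saliency = image.grad.abs().squeeze()

# If the highest-saliency pixels cluster on the background (the water),
# the model may have learned a spurious correlation.
print("most influential pixel:", divmod(int(saliency.argmax()), 64))
```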
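Regarding question 2: “a statistical number or probability” can be made concrete by checking whether error rates measured on a validation set stay within acceptance limits fixed in advance. The counts and limits below are illustrative assumptions:

```python
# Hypothetical validation results for a symptom detector.
true_positives = 480    # symptoms correctly detected
false_negatives = 20    # symptoms missed (non-detected)
false_positives = 35    # false alarms
true_negatives = 4465   # correctly cleared

sensitivity = true_positives / (true_positives + false_negatives)
false_positive_rate = false_positives / (false_positives + true_negatives)

# Acceptance criteria fixed in advance, based on the risk assessment.
assert sensitivity >= 0.95, "too many missed symptoms"
assert false_positive_rate <= 0.01, "too many false alarms"
print(f"sensitivity={sensitivity:.3f}, FPR={false_positive_rate:.4f}")
```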
Thus, AI validation only partially follows known methods and procedures. There are fundamental differences compared to the validation of traditional computer systems, for example the shift in focus from testing deterministic routines to assessing the quality of training data and statistical error probabilities.
Sources
(1) Fraunhofer IPA, https://www.ipa.fraunhofer.de/de/ueber_uns/Leitthemen/ki/ki_fuer_die_produktion/ki_fuer_die_qualitaetssicherung.html (retrieved 2019-07-23)
(2) S. Lapuschkin et al., “Unmasking Clever Hans predictors and assessing what machines really learn”, Nature Communications, 2019