Regulating the Black Box of Medical AI


Two scenes, 5000 miles and 10 months apart:

April 2016 ~ One of the largest suppliers of electronic healthcare record systems to General Practitioners in the UK realises that an algorithm used to estimate patients’ risk of heart disease or stroke has been coded incorrectly. As a result of this simple programming error, thousands of patients have been given incorrect information about their risk, potentially receiving unnecessary drugs or missing out on preventative treatment.

Feb 2017 ~ Researchers from Stanford University publish a research letter in Nature describing the use of a deep convolutional neural network (a type of machine learning algorithm that takes inspiration from the layered structure of the part of the brain responsible for vision) to diagnose skin cancer. Trained on around 130,000 pictures of assorted spots, rashes, blemishes and skin lumps, the neural network was able to diagnose skin cancers with the same level of accuracy as qualified dermatologists. In principle, the system could be used to automatically diagnose likely skin cancers from a smartphone snap: skin selfie to diagnosis in seconds.
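To make "convolutional" slightly less mysterious: the layers of such a network repeatedly slide small filters over an image to build up feature maps. A toy sketch of that core operation, using NumPy with a made-up image patch and a hand-written edge-detecting filter (a real diagnostic network stacks many layers of filters learned from data, not hand-written ones):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small kernel across the image, producing a feature map.
    This is the operation a convolutional layer repeats many times over."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    # Non-linearity applied after each convolution in a typical network.
    return np.maximum(x, 0)

# Hypothetical 6x6 greyscale patch: dark on the left, bright on the right.
patch = np.array([[0, 0, 0, 9, 9, 9]] * 6, dtype=float)

# A 3x3 filter that responds to dark-to-bright vertical edges.
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)

feature_map = relu(conv2d(patch, edge_kernel))
print(feature_map.shape)  # (4, 4): one response per 3x3 neighbourhood
```

The feature map lights up only where the filter finds its edge; a trained network learns thousands of such filters, from simple edges in early layers up to lesion-like textures in deep ones.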



These examples illustrate both the potential and the pitfalls of a future where healthcare is increasingly automated. Among the many questions raised by the rapid development of machine learning techniques and applications is what this means for medical regulation. Healthcare services are, for obvious reasons, highly regulated and subject to a wide range of legal and regulatory frameworks to ensure their quality and safety. Patients expect the doctors, nurses and therapists providing their care to have the necessary skills, knowledge and qualifications, that the drugs they take are manufactured and prescribed safely, that medical equipment works, that lab results are accurate and that errors and accidents during care will be prevented and dealt with.

All these are subject to a hugely diverse range of laws and regulations. From a UK (or more specifically, England) perspective, the three areas of healthcare regulation most likely to be impacted by increasing automation are:

The regulation of medical devices by the Medicines and Healthcare products Regulatory Agency (MHRA). As well as regulating drugs and physical medical devices (e.g. cardiac stents, joint replacements), the MHRA is the statutory regulator of medical software and apps. Software involved in clinical decision making (think software that helps calculate drug dosages or makes treatment recommendations, but not software used for booking appointments) is regulated as a medical device. The MHRA has a nice summary of what this means for developers and clinicians: in short, low-risk applications are managed through a self-certification approach, but higher-risk applications need to be validated by an independent organisation. The approach is closely linked to European Union regulations (specifically the MEDDEV 2.1/6 guidance) and the process of CE certification (which incidentally means that whatever happens as a result of Brexit is going to have major implications for the regulation of medical AI in the UK). Most of the interest in machine learning based applications is in the field of diagnostics (in particular radiology, ophthalmology and dermatology), and these would almost certainly be regulated as medical devices and require CE certification.

The regulation of the providers of healthcare services by the Care Quality Commission (CQC). The CQC is the statutory regulator of healthcare services (e.g. hospitals, GPs, care homes, community services, dentists) in England. This includes not only traditional healthcare services but also providers of online consultations or medical services such as Babylon. Providers are assessed on a range of criteria to ensure that the services they provide are safe, effective, caring, responsive and well-led. It is not clear if automated healthcare services would fall under the remit of the CQC, but this seems likely, especially if these services were being purchased on behalf of patients by the NHS. Certainly, the use of AI by traditional providers would also potentially be of interest to the CQC – how hospitals demonstrate that their machine learning based radiology systems are safe and accurate, for example.

The regulation of medical professionals. In the UK, the main professional regulators are the General Medical Council (for doctors), and the Nursing and Midwifery Council (nurses and midwives). These bodies set the standards of training, behaviour, and practice expected of healthcare professionals. There are going to be some difficult challenges in how these standards are interpreted in an age of increasing medical automation. For example, where do the professional responsibilities of a doctor begin and end when she is following a treatment plan recommended by a machine learning algorithm? One of the issues here is that in contrast with a traditional protocol or clinical guideline, the rationale of how and why a machine learning algorithm has generated a particular output or recommendation can be very hard (or even impossible) to determine. How can doctors and nurses assure themselves, and in turn their professional regulators, that they have acted responsibly when they are making use of algorithms that are essentially black boxes whose internal processes and decision making are hidden? Current guidelines for clinicians about the use of medical apps are linked to MHRA regulation and CE certification, and this could provide a blueprint for future regulation of clinicians’ use of medical AI technology.

This is almost certainly an over-simplification of the many regulatory issues surrounding the implementation of machine learning applications in healthcare. And I haven’t even mentioned any of the issues relating to legal liability and litigation (I am not a lawyer), but these are likely to be at least as complex (though you never know, maybe one day we will have our medical algorithms being taken to virtual electronic court by the legal AIs).

Although this seems like a dry topic, getting the regulatory frameworks right is important. A “wild west” approach to healthcare, free from any regulatory oversight, is unlikely to be acceptable to society and could lead to a great deal of harm (a digital equivalent of healthcare in the era of bloodletting, snake oil salesmen and quackery). At the same time, poorly designed regulation may fail to provide the intended protection to patients, generate perverse incentives and unexpected harms, and stifle innovation and implementation. I don’t know what the ideal regulatory framework for medical AI looks like, but there are a few things that we could be doing now to increase the chance that we get this right:

  1. Look across and share learning with other industries also being changed by automation. What can healthcare learn from regulatory approaches to machine learning and automation in say, transportation, legal services or fintech?
  2. Develop better ways to unpack, inspect and understand the black box of algorithms. For complicated neural networks this is at present exceptionally hard, if not impossible (imagine for example using a brain CT scan to explain how your brain creates the visual perception of a beautiful sunset). Making artificial neural networks explainable is however an active area of research, and progress here would help immensely in developing regulatory frameworks for medical AI.
  3. Develop approaches to measuring and evaluating the quality and safety of medical AI applications. This could involve extending existing post-marketing surveillance and reporting systems to include medical AI, and setting up registries and audits to measure the real-world outcomes of patients managed using these systems. We might need to think creatively about how to capture this type of data – it might for example be useful to capture a record of the “mind state” of the machine learning algorithm at the time that it made a particular recommendation or decision (if an algorithm is continually being updated and learning from new data, it would be important to be able to know if it made a lot of dangerous mistakes one Tuesday afternoon, for example).
  4. Start thinking about these regulatory issues sooner rather than later. It would be much better if (proportionate, wise) regulation develops alongside technical innovation and implementation and not only in response to some major quality or safety scandal.
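The “mind state” idea in point 3 could be sketched as a decision-level audit record. Everything below is hypothetical (the field names, model identifiers and clinical details are invented for illustration); the point is that fingerprinting the serialised model weights at decision time pins down exactly which version of a continually-learning algorithm made each recommendation:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(model_id, model_version, weights_blob, inputs, output):
    """Build an audit record for one algorithmic recommendation.
    Hashing the serialised weights captures the exact state of a
    continually-updated model, so a later review can group decisions
    by the precise model that made them (e.g. to find every decision
    made by a briefly-deployed bad update)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "model_version": model_version,
        # Two decisions share this hash only if they were made by
        # byte-identical models.
        "weights_sha256": hashlib.sha256(weights_blob).hexdigest(),
        "inputs": inputs,
        "output": output,
    }

# Hypothetical usage: log a dermatology model's recommendation.
record = audit_record(
    model_id="skin-lesion-classifier",
    model_version="2017.02",
    weights_blob=b"...serialised model weights...",
    inputs={"image_id": "IMG-0042", "patient_age": 54},
    output={"diagnosis": "melanoma", "probability": 0.91},
)
print(json.dumps(record, indent=2))
```

Stored in an append-only registry, records like this would make the “bad Tuesday afternoon” question answerable: filter by weights hash and timestamp, then audit the outcomes.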

If you want to read in more detail about the issues of licensing AI applications in healthcare, I can heartily recommend this blog by Dr Mark Wandle.




Improving healthcare quality in the age of big data


It is now almost 100 years since Walter Shewhart started work at the Western Electric Company Hawthorne Works. His work there to develop quality control methods based on measuring and understanding variation in industrial systems was foundational, greatly influencing industry and management in the 20th Century. Many people today would recognise the statistical process control charts that he developed, and still use his simple heuristics to interpret them. Shewhart’s work greatly influenced the ideas of W. Edwards Deming, whose “System of Profound Knowledge” has been adopted extensively in healthcare quality improvement, including the single most influential concept in healthcare quality improvement – the Institute for Healthcare Improvement’s “Model for Improvement” and the idea of the PDSA cycle.
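Shewhart’s charts and heuristics are simple enough to sketch in a few lines. A minimal individuals (XmR) control chart, using hypothetical weekly counts of hospital-acquired infections; the 2.66 constant is the standard factor for deriving limits from the mean moving range:

```python
import statistics

def control_limits(values):
    """Shewhart-style individuals chart: centre line at the mean,
    control limits at +/- 2.66 times the mean moving range."""
    mean = statistics.fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    return mean - 2.66 * mr_bar, mean, mean + 2.66 * mr_bar

def out_of_control(values):
    """Indices of points breaching a limit -- the simplest of Shewhart's
    heuristics for separating special-cause from common-cause variation."""
    lcl, _, ucl = control_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical weekly counts of hospital-acquired infections.
weekly_infections = [4, 5, 3, 6, 4, 5, 4, 18, 5, 4]
print(out_of_control(weekly_infections))  # [7]: week 8's spike breaches the upper limit
```

Real charts add further rules (runs of points on one side of the centre line, trends, and so on), but the idea is the same: react to special-cause signals, not to routine noise.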

The IHI Model for Improvement

It is no exaggeration to say that the Model for Improvement has become dogma, included in almost every teaching course and programme of healthcare quality improvement – the national programmes to improve the NHS in England, Wales and Scotland, for example, all borrow heavily from this model. Using and understanding data is a central component of the Model for Improvement. Here the role of data is in measuring the effects of tests of change – for example, measuring the change in the number of hospital-acquired infections after implementing a new catheter checklist. The role of data in the Model for Improvement is limited largely to simple measurement.

With its focus on rapid cycle tests of change, the model promotes the idea that measurement should be done little and often, using small samples to assess change in a small number of metrics – ideas that Walter Shewhart would also have espoused as he stalked the production line with his clipboard, carbon copy and pencil. There were perfectly good reasons for this approach in 1918, and many would argue that these still apply in 2017. Data can be burdensome to collect, and the delay between data collection and its availability can be so long that the data are no longer useful for measuring continuous improvement. Having more data does not necessarily lead to greater insight, and collecting data can distract from the actual work of improvement. However, the world has moved on in many ways since 1918, and it is important to ask whether we should reappraise the role of data in quality improvement in this age of Big Data.

Quality improvement in the age of Big Data

There are several reasons why I think we should think again. As the marginal cost of collecting and storing electronic data falls towards zero, the idea that data for improvement should still be “little and often”, limited in scope and based only on small samples, looks less convincing than it did in the past. Rules that were appropriate in an age of pen and paper measurement look less relevant in the digital age. Indeed, there are reasons to think that holding on to the traditional approach to using data is actively problematic. For example, the “little and often” approach to measurement ignores the problems that come from sampling: achieving unbiased samples of data in the real world is actually quite hard to do. Even “little and often” measurement is still more burdensome and slower than completely automated data collection and analysis – which might seem like a pipe dream in many healthcare organisations, but can still be an aspiration that organisations work towards.

For me, the biggest limitation of the traditional way of thinking about data in quality improvement is that it demotes data to the role of simple measurement: just a dumb ruler to measure change by. With much richer, more detailed data, used in more sophisticated ways, data could also help people understand the systems they are trying to improve, learn how and what to improve, and plan their interventions and tests of change or even simulate them in advance. I believe this to be a missed opportunity. Shewhart’s ideas have taken us a long way, but it’s time to think again.
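As one illustration of what automated, full-population measurement can look like, Welford’s online algorithm maintains a running mean and variance of a metric as each observation streams in, with no sampling and no batch delay. The data feed below is hypothetical:

```python
class RunningStats:
    """Welford's online algorithm: update the mean and variance of a
    metric one observation at a time, so every data point feeds the
    measure the moment it arrives."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; zero until we have at least two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# Hypothetical feed of door-to-treatment times (minutes) streaming in
# from an electronic record system.
stats = RunningStats()
for minutes in [32, 41, 28, 35, 39, 30]:
    stats.update(minutes)
print(round(stats.mean, 2), round(stats.variance, 2))
```

Numerically stable, constant-memory summaries like this are what make “measure everything, continuously” feasible, and the variance estimate they maintain is exactly what a Shewhart-style chart needs for its control limits.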