Using machine learning to make healthcare services better, cheaper and safer


Most of the interest (and venture capital funding) in healthcare AI is currently focused on very clinical use cases – automatically interpreting CT scans or retinal photographs, for example, or trying to make a diagnosis from patients’ symptoms. These are the types of uses of AI that feature in a typical doctor-patient consultation. But behind each consultation is the whole multi-trillion dollar industry of healthcare: all the work, activity, people, and infrastructure that is perhaps less obvious to patients (and hence to data scientists and engineers turning an eye to using AI in healthcare) but which actually makes up the bulk of healthcare activity and expenditure. Human resources, management, financial administration, logistics and supply chains, planning, laboratories, facilities, R&D, safety systems: all of these are data-rich aspects of the healthcare industry where AI systems could find many uses.

I have a particular interest in using data to understand and improve the quality and safety of healthcare, and I am struck by the wide range of uses that one particular approach to AI, machine learning, could have in the type of work I do. One of the challenges in making this happen is that machine learning experts (by and large) know very little about the healthcare industry, and healthcare experts (in turn) know very little about machine learning. Bridging this knowledge gap through collaboration is going to be key.

So what then are the types of problems in healthcare quality that could be addressed by machine learning? Here I outline, in an admittedly extremely broad and simplistic sense, the main types of problems that machine learning algorithms can be used to solve, and how they could be used to make the industry of healthcare better, cheaper and safer.


Classification

These are problems where the goal is to classify data into groups or categories. Examples include systems that help self-driving cars detect and avoid pedestrians, or that automatically classify photographs according to subject matter (“pictures of cats and dogs”).

  • Classifying hospitals into categories of performance or service provision, to generate hospital quality ratings or scorecards
  • Classifying patients into different categories based on diagnostic or procedure codes, or measures of healthcare utilization and cost (such as length of stay). These classifications are widely used as the currency in healthcare payment and reimbursement systems
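As a concrete (and deliberately tiny) sketch of the second idea, here is a nearest-centroid classifier that assigns patients to utilization groups. Everything here is invented for illustration – the groups, the figures and the data; real casemix grouping systems are far more sophisticated.

```python
from statistics import mean

# Toy illustration: assign patients to utilization groups with a
# nearest-centroid classifier. All data here is synthetic.

# Labelled examples: (length_of_stay_days, cost_gbp) -> group
training = {
    "low":  [(1, 400), (2, 600), (1, 350)],
    "high": [(10, 9000), (14, 12000), (9, 8500)],
}

# Learn one centroid per group from the labelled examples
centroids = {
    group: (mean(p[0] for p in pts), mean(p[1] for p in pts))
    for group, pts in training.items()
}

def classify(los, cost):
    """Return the group whose centroid is closest to this patient."""
    return min(
        centroids,
        key=lambda g: (centroids[g][0] - los) ** 2 + (centroids[g][1] - cost) ** 2,
    )

print(classify(12, 10000))  # a long, expensive stay -> "high"
```

The same shape of problem – learn group boundaries from labelled examples, then assign new cases – underlies the hospital-rating example too, just with richer features and models.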

Prediction

These are problems where the goal is to make predictions based on an existing set of data. Examples include prediction systems used in finance (e.g. financial forecasting, fraud detection) and retail (e.g. more efficient logistics through demand prediction).

  • Predicting the effects of a service reorganisation or a quality improvement intervention (e.g. What will happen if we introduce this new patient referral pathway?)
  • Predicting patient outcomes for prognostication, providing better information for shared decision making or planning future health and social care needs
  • Estimating case mix adjusted outcomes such as survival rates after cancer or rates of surgical complications. These case mix adjusted outcomes are often used to compare the quality of hospitals
  • Predicting counterfactuals (what would have happened if the intervention had not taken place) as part of the evaluation of service reorganization or improvement interventions
  • Predicting variation in demand for healthcare services
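As a minimal sketch of the demand example above, here is an ordinary least squares fit of a straight line to some invented weekly attendance counts, extrapolated forward. Real demand forecasting would account for seasonality, case mix and much else; this only shows the basic shape of a prediction problem.

```python
# Toy illustration: fit a straight line to past weekly attendance
# counts and extrapolate demand. The data is made up.

weeks = [1, 2, 3, 4, 5, 6]
attendances = [510, 520, 535, 540, 555, 560]  # synthetic upward trend

n = len(weeks)
mx = sum(weeks) / n
my = sum(attendances) / n

# Ordinary least squares for y = intercept + slope * x
slope = sum((x - mx) * (y - my) for x, y in zip(weeks, attendances)) / \
        sum((x - mx) ** 2 for x in weeks)
intercept = my - slope * mx

def predict(week):
    return intercept + slope * week

print(round(predict(8)))  # forecast attendances two weeks ahead
```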

Clustering

These are problems where the goal is to identify data points that are similar to each other. For example, clustering algorithms are widely used in recommender systems in online retail (“Customers who bought this item also bought these…”) and in entertainment platforms such as Netflix and Spotify.

  • Identifying inequalities in care provision and quality, according to time (e.g. the weekend effect), place (e.g. geographical disparities) and person (e.g. sociodemographic disparities)
  • Estimating the associations between processes of care and patient outcomes. These types of analyses are widely done as part of epidemiological or health services research studies and are useful in generating hypotheses for randomized controlled trials
  • Grouping together similar healthcare providers to enable more representative benchmarking and comparisons (e.g. between hospitals or between surgeons)
  • Identifying subgroups of patients with unexpectedly poor outcomes. This could help in detecting safety problems
  • Detecting significant patterns in time series data (anomaly detection; also a regression-type problem). Time series such as run charts and the various flavors of statistical process control chart are some of the most frequently used tools in healthcare quality improvement
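A bare-bones sketch of the benchmarking idea above: a tiny k-means routine that groups hospitals with similar outcome profiles into peer groups. The hospitals and their metrics are entirely made up, and a real analysis would use many more variables and a proper library implementation.

```python
from statistics import mean

# Toy illustration of clustering: group hospitals with similar outcome
# profiles so comparisons are made between like and like.
# Hospital metrics (readmission %, mortality %) are invented.

hospitals = {
    "A": (4.0, 1.0), "B": (4.5, 1.2), "C": (9.0, 3.0),
    "D": (8.5, 2.8), "E": (4.2, 0.9), "F": (9.5, 3.2),
}

def kmeans(points, centroids, iterations=10):
    """A bare-bones 2-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its members."""
    for _ in range(iterations):
        clusters = {c: [] for c in range(len(centroids))}
        for name, p in points.items():
            nearest = min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(name)
        centroids = [
            tuple(mean(points[n][i] for n in members) for i in range(2))
            if members else centroids[c]
            for c, members in clusters.items()
        ]
    return clusters

groups = kmeans(hospitals, centroids=[(4.0, 1.0), (9.0, 3.0)])
print(sorted(groups[0]), sorted(groups[1]))  # two benchmarking peer groups
```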

Feature selection

This is the process of identifying the most significant variables (“features” in the language of ML) in datasets with lots of variables. These methods can be used to help summarise complex datasets.

  • Devise and select metrics to measure the quality and safety of healthcare systems
  • Extract relevant information from electronic healthcare record systems with large numbers of data items
  • Design datasets for programmes to measure the quality and safety of healthcare (e.g. Clinical registries and audits)
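As a sketch of how feature selection might help with metric design, here candidate metrics are ranked by the absolute strength of their correlation with an outcome of interest. The metrics and figures are synthetic, and real feature selection methods (regularisation, tree-based importances and so on) are considerably more robust than a simple correlation filter.

```python
from statistics import mean, stdev

# Toy illustration of feature selection: rank candidate quality metrics
# by how strongly they correlate with an outcome. All figures are synthetic.

outcome = [1, 3, 2, 5, 4, 6]          # e.g. complication counts per unit
candidates = {
    "staffing_gap":   [2, 6, 4, 10, 8, 12],     # tracks the outcome closely
    "ward_size":      [30, 31, 30, 29, 31, 30], # nearly constant
    "admission_rate": [5, 4, 6, 3, 5, 2],       # inversely related
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Rank features by the absolute strength of their association
ranked = sorted(candidates, key=lambda f: abs(pearson(candidates[f], outcome)),
                reverse=True)
print(ranked)
```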


This is a very high level and simplistic look at the types of ML methods available – underneath this extremely broad (and arguably over-simplistic) classification is a whole ecosystem of different methods and families of ML algorithms. The other key ingredient here is, of course, data – without training data these algorithms are merely concepts. Healthcare is full of data, but using it for machine learning is going to throw up all sorts of technical and ethical challenges. More on this another time…




Regulating the Black Box of Medical AI


Two scenes, 5000 miles and 10 months apart:

April 2016 ~ One of the largest suppliers of electronic healthcare record systems to General Practitioners in the UK realises that an algorithm used to estimate patients’ risk of heart disease or stroke has been coded incorrectly. As a result of this simple programming error, thousands of patients were given incorrect information about their risk, potentially receiving unnecessary drugs or missing out on preventative treatment.

Feb 2017 ~ Researchers from Stanford University publish a research letter in Nature describing the use of a deep convolutional neural network (a type of machine learning algorithm that takes inspiration from the layered structure of the part of the brain responsible for vision) to diagnose skin cancer. By training the algorithm on 130,000 pictures of assorted spots, rashes, blemishes and skin lumps, the neural network was able to diagnose skin cancers with the same level of accuracy as qualified dermatologists. In principle, the system could be used to automatically diagnose likely skin cancers from a smartphone snap: skin selfie to diagnosis in seconds.



These examples illustrate both the potential and the pitfalls of a future where healthcare is increasingly automated. Among the many questions raised by the rapid development of machine learning techniques and applications is what this means for medical regulation. Healthcare services are, for obvious reasons, highly regulated and subject to a wide range of legal and regulatory frameworks to ensure their quality and safety. Patients expect that the doctors, nurses and therapists providing their care have the necessary skills, knowledge and qualifications, that the drugs they take are manufactured and prescribed safely, that medical equipment works, that lab results are accurate, and that errors and accidents during care will be prevented and dealt with.

All these are subject to a hugely diverse range of laws and regulations. From a UK (or more specifically, England) perspective, the three areas of healthcare regulation most likely to be impacted by increasing automation are:

The regulation of medical devices by the Medicines and Healthcare products Regulatory Agency (MHRA). As well as regulating drugs and medical devices (e.g. cardiac stents, joint replacements), the MHRA is the statutory regulator of medical software and apps. Software involved in clinical decision making (think software that helps calculate drug dosages or makes treatment recommendations, but not software used for booking appointments) is regulated as a medical device. The MHRA has a nice summary of what this means for developers and clinicians: in short, low risk applications are managed through a self-certification approach, but higher risk applications need to be validated by an independent organisation. The approach is closely linked to European Union regulations (specifically MEDDEV 2.1/6) and the process of CE certification (which incidentally means that whatever happens as a result of Brexit is going to have major implications for the regulation of medical AI in the UK). Most of the interest in machine learning based applications is in the field of diagnostics (in particular radiology, ophthalmology and dermatology), and these would almost certainly be regulated as medical devices and require CE certification.

The regulation of the providers of healthcare services by the Care Quality Commission (CQC). The CQC is the statutory regulator of healthcare services (e.g. hospitals, GPs, care homes, community services, dentists) in England. This includes not only traditional healthcare services but also providers of online consultations or medical services such as Babylon. Providers are assessed on a range of criteria to ensure that the services they provide are safe, effective, caring, responsive and well-led. It is not clear if automated healthcare services would fall under the remit of the CQC, but this seems likely, especially if these services were being purchased on behalf of patients by the NHS. Certainly, the use of AI by traditional providers would also potentially be of interest to the CQC – how hospitals demonstrate, for example, that their machine learning based radiology systems are safe and accurate.

The regulation of medical professionals. In the UK, the main professional regulators are the General Medical Council (for doctors), and the Nursing and Midwifery Council (nurses and midwives). These bodies set the standards of training, behaviour, and practice expected of healthcare professionals. There are going to be some difficult challenges in how these standards are interpreted in an age of increasing medical automation. For example, where do the professional responsibilities of a doctor begin and end when she is following a treatment plan recommended by a machine learning algorithm? One of the issues here is that in contrast with a traditional protocol or clinical guideline, the rationale of how and why a machine learning algorithm has generated a particular output or recommendation can be very hard (or even impossible) to determine. How can doctors and nurses assure themselves, and in turn their professional regulators, that they have acted responsibly when they are making use of algorithms that are essentially black boxes whose internal processes and decision making are hidden? Current guidelines for clinicians about the use of medical apps are linked to MHRA regulation and CE certification, and this could provide a blueprint for future regulation of clinicians’ use of medical AI technology.

This is almost certainly an over-simplification of the many regulatory issues surrounding the implementation of machine learning applications in healthcare. And I haven’t even mentioned any of the issues relating to legal liability and litigation (I am not a lawyer) but these are likely going to be at least as complex (though you never know, maybe we will one day have our medical algorithms being taken to virtual electronic court by the legal AIs).

Although this seems like a dry topic, getting the regulatory frameworks right is important. A “wild west” approach to healthcare, free from any regulatory oversight, is unlikely to be acceptable to society and could lead to a great deal of harm (a digital equivalent of healthcare in the era of bloodletting, snake oil salesmen and quackery). At the same time, poorly designed regulation may fail to provide the intended protection to patients, generate perverse incentives and unexpected harms, and stifle innovation and implementation. I don’t know what the ideal regulatory framework for medical AI looks like, but there are a few things that we could be doing now to increase the chance that we get this right:

  1. Look across and share learning with other industries also being changed by automation. What can healthcare learn from regulatory approaches to machine learning and automation in say, transportation, legal services or fintech?
  2. Develop better ways to unpack, inspect and understand the black box of algorithms. For complicated neural networks this is at present exceptionally hard, if not impossible (imagine for example using a brain CT scan to explain how your brain creates the visual perception of a beautiful sunset). Making artificial neural networks explainable is however an active area of research and would help immensely in developing regulatory frameworks for medical AI
  3. Develop approaches to measuring and evaluating the quality and safety of medical AI applications. This could involve extending existing post marketing surveillance and reporting systems to include medical AI, and setting up registries and audits to measure the real world outcomes of patients managed using these systems. We might need to think creatively about how to capture this type of data – it might for example be useful to capture a record of the “mind state” of the machine learning algorithm at the time that it made a particular recommendation or decision (if an algorithm is continually being updated and learning from new data, it would be important to be able to know if it made a lot of dangerous mistakes one Tuesday afternoon for example)
  4. Start thinking about these regulatory issues sooner rather than later. It would be much better if (proportionate, wise) regulation develops alongside technical innovation and implementation and not only in response to some major quality or safety scandal.
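To make the “mind state” idea in point 3 concrete, here is a sketch of an audit trail that stores a fingerprint of the exact model state alongside every recommendation, so that a regulator (or a hospital) could later reconstruct what the algorithm “knew” at the time. The model, weights and decision rule here are hypothetical stand-ins, not a real clinical algorithm.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical model state: in a continually learning system this would
# change over time, which is exactly why we fingerprint it per decision.
model_state = {"version": "2017-02-01", "weights": [0.31, -0.12, 0.07]}

def fingerprint(state):
    """Stable hash of the serialised model state."""
    blob = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

audit_log = []

def recommend(patient_features):
    """Score a patient and record the decision with the model fingerprint."""
    score = sum(w * x for w, x in zip(model_state["weights"], patient_features))
    decision = "refer" if score > 0 else "routine"
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "model": fingerprint(model_state),
        "inputs": patient_features,
        "output": decision,
    })
    return decision

print(recommend([1.0, 2.0, 3.0]), audit_log[-1]["model"])
```

If the algorithm did make a run of dangerous mistakes one Tuesday afternoon, a log like this would at least let investigators tie those outputs to the precise model version that produced them.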

If you want to read in more detail about the issues of licensing AI applications in healthcare, I can heartily recommend this blog by Dr Mark Wandle.




Improving healthcare quality in the age of big data


It is now almost 100 years since Walter Shewhart started work at the Western Electric Company’s Hawthorne Works. His work there to develop quality control methods based on measuring and understanding variation in industrial systems was foundational, greatly influencing industry and management in the 20th Century. Many people today would recognize the statistical process control charts he developed, and still use his simple heuristics to interpret them. Shewhart’s work greatly influenced the ideas of W. Edwards Deming, whose “Theory of Profound Knowledge” has been adopted extensively in healthcare quality improvement, including the single most influential concept in the field – the Institute for Healthcare Improvement’s “Model for Improvement” and the idea of the PDSA cycle.

The IHI Model for Improvement

It is no exaggeration to say that the Model for Improvement has become dogma, included in almost every teaching course and programme of healthcare quality improvement – the national programmes to improve the NHS in England, Wales and Scotland, for example, all borrow heavily from this model. Using and understanding data is a central component of the Model for Improvement. Here the role of data is in measuring the effects of tests of change – for example, measuring the change in the number of hospital-acquired infections after implementing a new catheter checklist. The role of data in the Model for Improvement is limited largely to simple measurement.
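The measurement step in the infection example can be sketched as a Shewhart-style chart in a few lines: compute control limits from a stable baseline period, then flag new points that fall outside them. The counts here are invented, and a real improvement team would choose the chart type suited to count data (a c-chart, say); this only shows the basic mechanics.

```python
from statistics import mean, stdev

# Invented weekly infection counts from a stable baseline period
baseline = [4, 5, 3, 6, 4, 5, 4, 3, 5, 4]

# Shewhart-style control limits: centre line plus/minus three sigma
centre = mean(baseline)
sigma = stdev(baseline)
upper, lower = centre + 3 * sigma, centre - 3 * sigma

new_weeks = [5, 4, 12, 3]  # week index 2 looks out of control

# Indices of points signalling special-cause variation
signals = [i for i, x in enumerate(new_weeks) if not (lower <= x <= upper)]
print(signals)
```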

With its focus on rapid cycle tests of change, the model promotes the idea that measurement should be done little and often, using small samples to assess change in a small number of metrics – ideas that Walter Shewhart would also have espoused as he stalked the production line with his clipboard, carbon copy and pencil. There were perfectly good reasons for this approach in 1918, and many would argue that they still apply in 2017. Data can be burdensome to collect, and the delay between data collection and its availability can be so long that it is of little use for measuring continuous improvement. Having more data does not necessarily lead to greater insight, and collecting data can distract from the actual work of improvement. However, the world has moved on in many ways since 1918, and it is important to ask whether we should reappraise the role of data in quality improvement in this age of Big Data.

Quality improvement in the age of Big Data

There are several reasons why I think we should think again. As the marginal cost of collecting and storing electronic data falls towards zero, the idea that data for improvement should still be “little and often”, limited in scope and based only on small samples, looks less convincing than it did in the past. Rules that were appropriate in an age of pen and paper measurement look less relevant in the digital age. Indeed, there are reasons to think that holding on to the traditional approach to using data is actively problematic. For example, the “little and often” approach ignores the problems that come with sampling: achieving unbiased samples of data in the real world is actually quite hard to do. And even “little and often” measurement is still more burdensome and slower than completely automated data collection and analysis – which might seem like a pipe dream in many healthcare organizations, but can still be an aspiration for organizations to work towards.

For me, the biggest limitation of the traditional way of thinking about data in quality improvement is that it demotes data to the role of simple measurement: just a dumb ruler to measure change by. Much richer and more detailed data, used in more sophisticated ways, could do far more: helping people understand the systems they are trying to improve, learn how and what to improve, and plan their interventions and tests of change or even simulate them in advance. I believe this to be a missed opportunity. Shewhart’s ideas have taken us a long way, but it’s time to think again.


An AI Lab for the NHS


Should the NHS invest in building artificial intelligence services for healthcare? Or should it instead be a buyer of products and services made by others, in the same way it is for pharmaceuticals and medical devices?

In a time when it’s hard to move beyond worries about balancing the books, it might seem naive to talk about spending more of the NHS’s hard-pressed budget on yet another new initiative. But as many others have pointed out, the potential for AI in healthcare is huge, and it could help us manage the very challenges that make the financial pressures so acute. The NHS also has a track record of supporting research and development, funding the NIHR to the tune of £500 million each year.

If the NHS did decide to set up its own NHS AI Lab for Health (let’s call it NHS AI), what might it look like? I’d suggest that trying to become a competitive player at the bleeding edge of machine learning and AI is neither feasible nor desirable. The big commercial organisations in this arena (think Google DeepMind, Amazon, Baidu, Facebook) are spending billions on R&D and employ hundreds, or thousands, of mathematicians, physicists and computer scientists. Universities and academic groups around the world have whole departments working on AI and related disciplines. So our NHS AI is never going to be a player in pushing the boundaries of computer science or mathematics.

Instead, the real space where NHS AI could shine is implementation: applying AI algorithms and techniques developed in other industries and settings to the big challenges in healthcare, by building useful things (algorithms, applications). These techniques are developing so quickly that real world uses can barely keep up, and the challenge is increasingly one of embedding these amazing algorithms in the design of real world products and services. For example, can the NHS make use of open source image classification algorithms to help radiologists and pathologists diagnose cancer? Can chat bots in Facebook Messenger book appointments or provide health information? How can AI help hospitals manage their bed capacity better? The potential applications are myriad, and could help us tackle some really important and difficult challenges in healthcare.

Building useful things and implementing new technologies require different skills than doing cutting edge research in, say, deep neural networks. Being an expert in obscure branches of linear algebra might get you a job at Google, but is not necessarily going to help you much in designing applications that work well for children with asthma. So as well as having people with the technical knowledge to create and train machine learning algorithms, we need people with social knowledge: anthropologists and ethnographers helping us understand the lives and work of patients, clinicians and managers; user experience designers who know how to build things that actually work for users; patient leaders who can help us work through the ethical dilemmas involved in using patient data or turning to algorithms to make decisions. Our NHS Lab needs to bring these people together, and task them with creating new knowledge, products and services. 

To do this well is not going to be cheap. If the NHS spent 0.05% of its annual budget on funding AI, this would work out at around £60 million per year. Not small change by any means, but a drop in the ocean of healthcare spending, and small beer compared with AI projects in other industries. Given the potential to help improve the quality and efficiency of the NHS, I think this would be money well spent, and it would be a start.

There are of course many ways this could go wrong. Done badly, the Lab could stifle innovation through bureaucratisation, rule-making and crowding out. Big public sector organisations are hardly beacons of creativity and innovation and often don’t have the risk appetite to take on projects with uncertain returns: there is a reason why the NHS is not a pharmaceutical company. It is easy to see a project like this becoming caught in the fickle political winds that so often change the course of decision making in the NHS. “The Secretary of State has decided ….” are words that could kill a project like this. 

So how might we mitigate these risks? We could start by giving the lab a stable, multi year budget and a clear mandate: “Use AI to build useful things for healthcare. Commit to data transparency and open data. Make everything available open source and under licenses that promote use and creativity. Be responsive to your users. Give patients a leadership role in governance and decision making”.

This may all be hopelessly naive, a futurist day dream blind to the practicalities of making it happen. But I think we should at least imagine what this future could be, and start the conversation.


Patients, your data is yours


I genuinely believe that we can use patient data for good, in ways that are not exploitative and respect people’s right to privacy. In fact, I could not do my job without using this type of data. For example, most of the research studies I work on use data about real patients, to help understand how we can make healthcare services better. I also use this type of data to help hospitals measure and improve the quality of care that patients with stroke receive, and to help plan public health services.

Information about our health is some of the most private and personal information there is, and how this data is used is extremely sensitive. Explaining how this data is used, in ways that people understand, is therefore essential, but it is something we have been quite poor at in the past. One reason is that it is very easy to slip into technical jargon, using language that we assume other people understand when they do not.

Phrases like “pseudonymised”, “information governance” and “data controller” don’t mean very much to many people. By making it hard for people to understand the language of data sharing, we are locking people out of making meaningful decisions about how patient data is used. This is ethically troubling, and has probably contributed to generating fear and mistrust about how patient data is used.
Understanding Patient Data is a new initiative by the Wellcome Trust which has recently done some great work to show us how we should talk when explaining or asking how patient data is used. After carrying out focus groups with experts and patients, they have produced some very useful, and very clear, guidelines:

  • The term “patient data” is well understood by most people
  • Use “patient” and not other terms like citizen, user or consumer
  • Avoid using “personal data” – people often think that this means that the data are identifiable rather than being anonymised
  • Use data in the singular…always (sorry, grammar fans)
  • Use the term “individual care” not “direct care” when talking about data for people’s own treatment
  • The phrase “improving health, care and services through research and planning” is much better than terms like “secondary uses” to describe how patient data can be used for other uses apart from individual care
  • Pictures are powerful ways of explaining what removing identifiable information means
  • Be clear and specific about how much and what kind of data is used, and avoid general terms like “medical records”
  • It’s often better to say “using data” rather than “sharing data” (or “data sharing agreement”), as this makes it clear that controls are in place to make sure that any data is used responsibly

I think that these findings are going to be incredibly helpful. The idea behind this project was to improve communication when discussing using data for research, but I think that these tips are just as valid for other uses of patient data, such as audit, quality improvement and surveillance.


Let’s start a blog


Blogging is almost an ancient technology by internet standards and so it feels a bit late coming to this party.

But better late than never. Sometimes (actually almost always) the 140 characters of a tweet are not enough to express exactly what I want to say.

So this blog is for the longer, slower thoughts, on the subjects that I’m interested in. There isn’t a neat way to describe this, but I’ll be writing about the unnamed overlaps between:

– how we can design healthcare to be better, cheaper and fairer. Think QI, design, health services research, public health

– the role that data (big and small) can have, and could one day play, in helping us live healthier, longer lives. Think analytics, data visualisation, data science, epidemiology