
AI Case Study

Babylon Health claims 82% accuracy for video medical diagnosis based on machine learning and natural language processing

Babylon Health has developed a machine-learning-based inference engine for diagnosing disease, which draws on a knowledge graph built from medical knowledge. Patients interact through an app with video capabilities that uses natural language processing. Babylon has a partnership with the NHS in the UK, and its service is being piloted in Rwanda. Concerns have been raised about its performance and data privacy.



Healthcare Equipment And Supplies

Project Overview

According to the arXiv paper: "The Babylon Triage and Diagnostic System is designed to provide users with triage advice alongside an explanation of why this action has been suggested; this consists of any reported symptoms that require urgent attention, and/or a list of possible causes for the user’s symptoms... [the System] is based on a Probabilistic Graphical Model (PGM) of primary care medicine, which models the prior probabilities of diseases and the conditional dependencies between diseases, symptoms and risk factors via a directed acyclic graph. The structure of the graph (i.e., the connections between diseases, symptoms and risk factors) is created by medical experts and reviewed from a modelling perspective.

The triage capability is based on an extension of the PGM with a utility model. This combines the likelihood of each disease with the potential harm caused by that disease, under all possible triage decisions. The triage decision that is recommended is the one that minimises the expected harm to the patient, while also penalising overtriaging. To guarantee the safe triage of patients with symptoms that require urgent treatment (regardless of their underlying cause), the utility model is augmented with a set of rules that dictate a specific triage action where a particular combination of symptoms (so-called 'red-flag' symptoms) are present. The system is designed to identify one of six triage actions: 'call an ambulance', 'go to A&E/ER', 'urgent GP' (i.e., within 6 hours), 'non-urgent GP' (i.e. within a week), 'pharmacy' and 'self-care'."
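The expected-harm rule described in the paper can be illustrated with a minimal Python sketch. Only the six triage actions come from the quoted passage; the disease names, probabilities, harm values, overtriage penalties and the red-flag rule are all invented for illustration and are not taken from Babylon's system.

```python
# Hypothetical sketch of utility-based triage; every number here is illustrative.
TRIAGE_ACTIONS = [
    "call an ambulance", "go to A&E/ER", "urgent GP",
    "non-urgent GP", "pharmacy", "self-care",
]

# Assumed harm if a patient with a given disease receives a given action.
HARM = {
    "heart attack": dict(zip(TRIAGE_ACTIONS, [0, 5, 40, 80, 95, 100])),
    "indigestion": dict(zip(TRIAGE_ACTIONS, [0, 0, 0, 0, 0, 1])),
}

# Assumed cost of overtriaging: more urgent actions carry a larger penalty.
OVERTRIAGE_PENALTY = dict(zip(TRIAGE_ACTIONS, [10, 6, 3, 1, 0.5, 0]))

# Assumed red-flag rule: a symptom combination that forces a specific action.
RED_FLAGS = {
    frozenset({"chest pain", "shortness of breath"}): "call an ambulance",
}


def recommend(symptoms, disease_probs):
    """Return a triage action given symptoms and P(disease) from a model."""
    # Red-flag rules override the utility model regardless of underlying cause.
    for combo, action in RED_FLAGS.items():
        if combo <= set(symptoms):
            return action

    def expected_cost(action):
        # Expected harm over diseases, plus the penalty for overtriaging.
        expected_harm = sum(p * HARM[d][action] for d, p in disease_probs.items())
        return expected_harm + OVERTRIAGE_PENALTY[action]

    return min(TRIAGE_ACTIONS, key=expected_cost)


print(recommend(["heartburn"], {"heart attack": 0.02, "indigestion": 0.98}))
print(recommend(["heartburn"], {"heart attack": 0.30, "indigestion": 0.70}))
```

With these made-up numbers, a higher probability of the serious disease shifts the recommendation towards a more urgent action, while the penalty term stops the rule from sending everyone to hospital.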

"Last year Babylon started a trial with a hospital trust in London in which calls to the NHS’s non-emergency 111 advice line are handled partly by Babylon’s AI. Callers are asked if they want to wait for a human to pick up or download the Babylon-powered “NHS Online: 111” app instead.

Around 40,000 people have already opted for the app. Between late January and early October 2017, 40% of those who used the app were directed to self-treatment options rather than a doctor—around three times the proportion of people who spoke to a human operator. But both the AI and the humans staffing the phone line told the same proportion of people to seek emergency care (21%).

Now Babylon has also co-launched the UK’s first digital doctor’s practice, called GP at Hand. People in London can register with the service as they would with their local doctor. But instead of waiting for an appointment slot and taking time off work to see a physician in person, patients can either chat with the app or talk to a GP at Hand doctor on a video link. And in many cases the call isn’t needed. The human doctor becomes your last resort rather than your first.

GP at Hand has proved popular; some 50,000 people registered in the first few months, among them Matt Hancock, the UK health minister. Babylon now wants to expand across the UK. The service is also available in Rwanda, where 20% of the adult population has already signed up, according to Mobasher Butt, a doctor and a member of Babylon’s founding team. And it’s setting up services in Canada, with plans to do the same in the US, the Middle East, and China.

Not everyone is happy about all this. For a start, there are safety concerns. Parsa compares what Babylon does with your medical data to what Facebook does with your social activities—amassing information, building links, drawing on what it knows about you to prompt some action. Suggesting you make a new friend won’t kill you if it’s a bad recommendation, but the stakes are a lot higher for a medical app.

According to Babylon, its chatbot can identify medical conditions as well as human doctors do, and give treatment advice that’s safer. In a study posted online in June and coauthored with researchers at Imperial College London, Stanford University, and the Northeastern Medical Group, Babylon put its AI through a version of the final exam of the Royal College of General Practitioners (RCGP), which British GPs must pass in order to practice unsupervised. Babylon’s AI scored 81%, 9% higher than the average grade achieved by UK medical students.

The RCGP was quick to distance itself from Babylon’s hype, however. “The potential of technology to support doctors to deliver the best possible patient care is fantastic, but at the end of the day, computers are computers, and GPs are highly trained medical professionals: the two can’t be compared and the former may support but will never replace the latter,” said RCGP vice chair Martin Marshall in a statement. “No app or algorithm will be able to do what a GP does.”

Others level far more serious charges, suggesting that Babylon has focused on making its service accessible and affordable at the expense of patients’ safety. One Twitter user with the handle DrMurphy11 (he’s an NHS consultant who told me he needs to remain anonymous because of the corporate culture there) has coined the hashtag #DeathByChatbot. In videos showing interactions with the app, DrMurphy11 suggests that Babylon’s AI misses obvious diagnoses and fails to ask the right questions. “I have no concerns about health tech or AI in general,” he says. “No doctor wants to make mistakes, and any system that helps minimize the risk of harm from human error will be welcomed.” But he’s worried that companies are misleading doctors and the public with marketing claims that vastly oversell their current tech.

Babylon has also met with criticism in Rwanda, where it runs the Babyl service, for not taking local epidemiology into account. In an interview with the BBC, Rwanda’s minister of health claimed that the Babyl app included no questions about malaria, for example (although Babylon disputes this).

Still, while Babylon may not be as good as a real doctor (and such apps are always careful to recommend you see a real doctor when in doubt), playing it too safe would defeat the purpose. “We wanted to re-create the same pragmatic approach that a clinician takes,” says Butt. “If we just had a group of nonclinical people building the service, they might have gone for something that was 100 percent safe, but that could mean you send everyone to hospital, which is not what a real doctor or nurse would do.”

Another fear is that digital-first services will create a two-tiered health-care system. For example, GP at Hand advises people with serious medical issues to think twice about signing up to a practice that offers mostly remote access to doctors. That might seem prudent, but it has led to accusations that GP at Hand is effectively cherry-picking younger patients with less complex—and less expensive—health-care needs. Since British GP practices get per-patient funding from the NHS, cherry-picking would mean the rest of the health-care system is left to do more with less.

For some GPs, this isn’t acceptable. “We take everybody,” says Bhatti. But Oliver Michelson, a spokesperson for the NHS, accepts that GP at Hand has to issue some form of caveat—it can’t realistically welcome everyone. “They are not denying people access but saying that if you’re going to need to come into your GP regularly, a digital-first service may not be the best place to be,” he says.

And Butt insists that they exclude nobody. “The service is available to everyone,” he says; it just may not suit some people, such as those with severe learning difficulties or visual impairments, who would struggle with the app." (MIT Technology Review)

Reported Results

From the arXiv research paper: "Overall we found that the Babylon AI powered Triage and Diagnostic System was able to identify the condition modelled by a clinical vignette with accuracy comparable to human doctors (in terms of precision and recall). In addition, we found that the triage advice recommended by the AI System was, on average, safer than that of human doctors, when compared to the ranges of acceptable triage provided by independent expert judges, with only a minimal reduction in appropriateness". Babylon claims a diagnostic accuracy of 82%, compared with an average pass mark of 72% for the MRCGP exam.
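To make the "precision and recall" comparison concrete, here is a minimal Python sketch of how the two measures might be computed over a set of clinical vignettes. The definitions used (recall as the share of vignettes whose modelled condition appears in the differential, precision as the share of suggested conditions that are correct) are an assumption for illustration; the paper's exact definitions may differ, and the vignette data is invented.

```python
def precision_recall(differentials, truths):
    """Score differential lists against the condition each vignette models.

    differentials: one list of suggested conditions per vignette
                   (assumed to contain no duplicates).
    truths:        the condition each vignette was written to model.
    """
    # A "hit" is a vignette whose true condition appears in the differential.
    hits = sum(truth in diff for diff, truth in zip(differentials, truths))
    suggested = sum(len(diff) for diff in differentials)
    precision = hits / suggested if suggested else 0.0
    recall = hits / len(truths) if truths else 0.0
    return precision, recall


# Invented example: three vignettes, two of which are identified correctly.
differentials = [
    ["migraine", "tension headache"],
    ["asthma"],
    ["gastritis", "reflux"],
]
truths = ["migraine", "asthma", "appendicitis"]

precision, recall = precision_recall(differentials, truths)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A longer differential makes a hit more likely (higher recall) but dilutes precision, which is why the two measures are reported together.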

However, the UK's Medicines and Healthcare products Regulatory Agency recently investigated the app "after one doctor complained it had failed to identify symptoms of a heart attack or deep vein thrombosis. Two other doctors told the FT they had complained about wording on the website, which has since changed, that could confuse patients", according to the Financial Times.


According to the arXiv paper: "Once constructed and parameterised, the model is used to reason about the possible underlying diseases that cause the user-entered symptoms and risk factors, using novel Bayesian inference methods". Whether Bayesian inference is truly "AI" has been disputed; however, Babylon claims in its response to such critiques that "A deep neural network is also used to enable rapid and scalable inference for reasoning on the Bayesian generative model".
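As a rough illustration of reasoning from symptoms back to diseases in a directed graphical model, the following Python sketch computes disease posteriors by exact enumeration on a toy two-disease network. It is not Babylon's method: the paper describes novel inference techniques, whereas this uses brute-force enumeration, and the noisy-OR symptom model, the leak term and every probability are assumptions made for the example.

```python
from itertools import product

# Hypothetical priors P(disease) and link strengths P(symptom | that disease alone).
PRIOR = {"flu": 0.05, "cold": 0.20}
LINK = {
    "fever": {"flu": 0.90, "cold": 0.10},
    "runny nose": {"flu": 0.30, "cold": 0.80},
}
LEAK = 0.01  # assumed chance a symptom appears with no modelled cause


def p_symptom(symptom, present_diseases):
    # Noisy-OR (an assumption): the symptom is absent only if the leak
    # and every active cause all fail to produce it.
    p_absent = 1 - LEAK
    for disease in present_diseases:
        p_absent *= 1 - LINK[symptom].get(disease, 0.0)
    return 1 - p_absent


def posterior(evidence):
    """evidence maps symptom -> True/False; returns P(disease | evidence)."""
    diseases = list(PRIOR)
    total = 0.0
    marginal = dict.fromkeys(diseases, 0.0)
    # Enumerate every joint assignment of diseases (feasible only for toy models).
    for state in product([False, True], repeat=len(diseases)):
        present = [d for d, on in zip(diseases, state) if on]
        weight = 1.0
        for d, on in zip(diseases, state):
            weight *= PRIOR[d] if on else 1 - PRIOR[d]
        for symptom, observed in evidence.items():
            p = p_symptom(symptom, present)
            weight *= p if observed else 1 - p
        total += weight
        for d in present:
            marginal[d] += weight
    return {d: m / total for d, m in marginal.items()}


print(posterior({"fever": True, "runny nose": False}))
```

Enumeration scales exponentially with the number of diseases, which is one reason a production system would need approximate or learned inference of the kind the quoted response alludes to.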

The NLP abilities of the system certainly fall under contemporary definitions of AI. From the Babylon Health website: "Throughout the Babylon platform we use Machine Learning (ML) for a variety of tasks. In the inference engine we combine probabilistic models with deep learning techniques to speed up the inference process. In the Knowledge Graph we predict new relationships between medical concepts based on reading medical literature. In NLP we build language understanding models based on large-scale datasets of interactions with our users and data from the web. We use ML to teach our NLP system new languages".


R And D

Product Development


Babylon says it can deliver unparalleled access to healthcare, including personalised health assessments, treatment advice and face-to-face appointments with a doctor 24/7.

From the Financial Times: "Babylon, one of a number of new technology products being adopted by overburdened health services eager to cut costs, has two significant partnerships with the NHS. Some patients in London can register to use the app’s video consultations to communicate with a doctor instead of registering with a traditional GP’s surgery. Babylon also delivers a telephone advice service called NHS 111 in north London".



From the arXiv paper: "The prior probabilities of diseases and risk factors are obtained from epidemiological data, where available. Conditional probabilities (for example, the probability of a symptom occurring given a disease) are obtained through elicitation from multiple independent medical experts".
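The paper does not say, in the passage quoted here, how estimates from multiple experts are combined. As one possible approach, the Python sketch below pools elicited conditional probabilities with an equal-weight average (a linear opinion pool). The pooling rule, the symptom and disease names, and all values are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical elicited values of P(symptom | disease) from three experts.
elicited = {
    ("fever", "flu"): [0.85, 0.90, 0.95],
    ("cough", "flu"): [0.60, 0.70, 0.80],
}


def pool(estimates):
    # Equal-weight linear opinion pool: an assumption, not the paper's method.
    for p in estimates:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return mean(estimates)


# Conditional probability table keyed by (symptom, disease).
cpt = {pair: pool(values) for pair, values in elicited.items()}

for (symptom, disease), p in cpt.items():
    print(f"P({symptom} | {disease}) = {p:.2f}")
```

Averaging is the simplest choice; weighting experts by experience or reviewing large disagreements before pooling would be natural refinements.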
