AI Case Study

Stanford University Medical uses machine learning to improve palliative care by predicting end-of-life within the next year with 90% accuracy

Stanford University Medical trained a deep neural network on the electronic health records of 221,284 patients to predict patient mortality 3 to 12 months into the future. Admitted patients are automatically evaluated by the model to estimate their probability of death and to identify who could benefit from palliative care. The model is reported to predict end-of-life within the next year with 90% accuracy; the underlying paper states this as recall of 0.34 at 0.9 precision. Stanford believes this could revolutionise palliative care by optimising treatment plans and giving more control back to patients and their families, who overwhelmingly would prefer that the patient pass away at home.



Healthcare Providers And Services

Project Overview

The researchers selected a total of 221,284 patients and analysed their electronic health records (EHR) with a deep neural network.

"The EHR data of admitted patients are automatically evaluated by an algorithm, which brings patients who are likely to benefit from palliative care services to the attention of the Palliative Care team. The algorithm is a Deep Neural Network trained on the EHR data from previous years, to predict all-cause 3-12 month mortality of patients as a proxy for patients that could benefit from palliative care. Our predictions enable the Palliative Care team to take a proactive approach in reaching out to such patients, rather than relying on referrals from treating physicians, or conduct time consuming chart reviews of all patients. We also present a novel interpretation technique which we use to provide explanations of the model’s predictions."

Reported Results

"When the algorithm is used to predict the passing of patients who are in palliative care (meaning they are not expected to live longer than 12 months), the system forecasts death with an astonishing 90 percent accuracy."

"The model achieves recall of 0.34 at 0.9 precision (0.32 on admitted patients)."

"The way the AI goes about its predictions is quite complex, and while the researchers know how the system digests the data it’s being fed, the algorithm doesn’t have a simple way of explaining how it reaches its conclusions. Despite that, the digital brain is accurate enough in its predictions that the 'how' is a secondary concern."


The "model is a Deep Neural Network (DNN) comprising an input layer (of 13,654 dimensions), 18 hidden layers (each 512 dimensions) and a scalar output layer. We employ the logistic loss function at the output layer and use the Scaled Exponential Linear Unit (SeLU) activation function at each layer. The model is optimized using the Adam optimizer, with a mini-batch size of 128 examples. Intermediate model snapshots were taken every 250 mini-batch iterations, and the snapshot that performed best on the validation set was selected as the final model. Explicit regularization was not found necessary. The network configuration was reached by extensive hyperparameter search over various network depths (ranging from 2 to 32) and activation functions (tanh, ReLU and SeLU)."
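
To give a sense of scale for the architecture quoted above, the sketch below writes out the SeLU activation, the logistic loss on a single logit, and a parameter count for the stated layer sizes. It is an illustration under assumptions, not the authors' implementation: it assumes plain fully connected layers with a bias term in every layer, and the function names are hypothetical. The SeLU constants are the standard published values.

```python
import math

# Sizes quoted in the text above.
INPUT_DIM = 13_654
HIDDEN_DIM = 512
NUM_HIDDEN_LAYERS = 18
OUTPUT_DIM = 1

# Standard SELU constants from the self-normalizing networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled Exponential Linear Unit for a single value."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)


def logistic_loss(logit: float, label: int) -> float:
    """Binary cross-entropy on a raw logit, written to avoid overflow."""
    # Equivalent to -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(logit).
    return math.log1p(math.exp(-abs(logit))) + max(logit, 0.0) - logit * label


def count_parameters() -> int:
    """Weights plus biases, assuming fully connected layers with biases."""
    sizes = [INPUT_DIM] + [HIDDEN_DIM] * NUM_HIDDEN_LAYERS + [OUTPUT_DIM]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
```

Under these assumptions the network has roughly 11.5 million trainable parameters, about 7 million of which sit in the first layer because of the wide, sparse input.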



General Operations


"Humans today live a lot longer than they used to. That’s great news, but as modern medical advances are giving patients second chances at living normal lives, end-of-life care continues to be a difficult thing to plan. Forecasting when someone will die is an extremely challenging and often uncomfortable thing..."

"Inaccurate predictions from doctors can create unwanted consequences, such as patients spending their final days in a hospital ward rather than at home, surrounded by loved ones. By removing the human predictive element from the equation — which may be influenced by emotion and optimism" the question is whether machine learning algorithms could be used reliably?



The researchers selected a total of 221,284 patients. "For each patient, we consider the 12 months leading up to their prediction date as their observation window. Within the observation window of each patient, we use ICD9 (International Classification of Diseases 9th rev) diagnostic and billing codes, CPT (Current Procedural Terminology) procedure codes, RxNorm prescription codes."
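
The observation window can be pictured as a simple date filter over a patient's coded events. The sketch below is illustrative only: the `CodedEvent` record, the 365-day approximation of 12 months, and the choice to exclude the prediction date itself are assumptions, not details given in the source.

```python
from datetime import date, timedelta
from typing import List, NamedTuple


class CodedEvent(NamedTuple):
    """Hypothetical record: one code assigned to a patient on one day."""
    day: date
    category: str  # e.g. "ICD9", "CPT", "RxNorm"
    code: str


# Assumption: 12 months is approximated as 365 days.
WINDOW = timedelta(days=365)


def in_observation_window(events: List[CodedEvent],
                          prediction_date: date) -> List[CodedEvent]:
    """Keep events from the 365 days up to, but excluding, the prediction date."""
    start = prediction_date - WINDOW
    return [e for e in events if start <= e.day < prediction_date]
```

Only events that pass this filter would feed into the per-patient features; anything older than the window, or dated on or after the prediction date, is ignored.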

"We also include the patient demographics (age, gender, race
and ethnicity), and the following per-patient summary statistics
in the observation window for each code category:
• Count of unique codes in the category.
• Count of total number of codes in the category.
• Maximum number of codes assigned in any day.
• Minimum number of codes (non-zero) assigned in any day.
• Range of number of codes assigned in a day.
• Mean of number of codes assigned in a day.
• Variance in number of codes assigned in a day."
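
The seven summary statistics listed above can be computed from a list of (day, code) pairs for one patient and one code category. The following is a minimal sketch under stated assumptions: the function name and dictionary keys are hypothetical, the per-day statistics are taken only over days that have at least one code, and population variance is used because the source does not say which variance was meant.

```python
from collections import Counter
from datetime import date
from statistics import mean, pvariance
from typing import Dict, List, Tuple


def category_summary(events: List[Tuple[date, str]]) -> Dict[str, float]:
    """Summary statistics for one code category of one patient.

    `events` is a list of (day, code) pairs inside the observation window.
    Assumption: per-day statistics cover only days with at least one code.
    """
    if not events:
        return {}
    per_day = Counter(day for day, _ in events)  # codes assigned on each day
    daily = list(per_day.values())
    return {
        "unique_codes": len({code for _, code in events}),
        "total_codes": len(events),
        "max_per_day": max(daily),
        "min_per_day_nonzero": min(daily),
        "range_per_day": max(daily) - min(daily),
        "mean_per_day": mean(daily),
        "variance_per_day": pvariance(daily),  # population variance (assumed)
    }
```

For example, a patient with three codes on one day (two of them identical) and one code on another day has 3 unique codes, 4 total codes, a daily maximum of 3, and a daily minimum of 1.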

"All these features (i.e, code counts in each of the four observation slices, per category summary statistics over the observation window, and demographics) were concatenated to form the candidate feature set. From this set, we pruned away those features which occur in 100 or fewer patients. This resulted in the final set of 13,654 features. Of the 13,654 features, each patient on average has 74 non-zero values (with a standard deviation of 62), and up to a maximum of 892 values. The overall feature matrix is approximately 99.5% sparse."