AI Case Study

Centers for Disease Control and Prevention reduces polio report generation time to 1 hour by using machine learning to automate regional mapping of the disease

The Centers for Disease Control and Prevention (CDC) investigates evolving strains of poliovirus and maps their activity geographically. Using machine learning has reduced the manual time needed to generate reports from three days to about 1 hour, and the time needed to cluster virus strains from 3 months to 1 week.

Industry

Public And Social Sector

Government

Project Overview

"Assembling all the data for 3,000 sequences and then labeling, color-coding, and separating the viruses into clusters of genetic lineages took up to three days. The process was very complex, with a steep learning curve for cross-training others to do the task. CDC needed to automate this workflow with tools that others in the lab could use and produce reports in a format that was easy to distribute and understand.

MATLAB, Bioinformatics Toolbox™, and other toolboxes provided a platform for CDC to build tools that streamline the poliovirus tracking and reporting process. To link patient data with individual strains, the researchers use Database Toolbox™ to read patient information, including the date and location of each genetic sample, into MATLAB, where they link it to sequencing information from a FASTA-formatted file imported using Bioinformatics Toolbox. To analyze genetic data and identify clusters of genetically similar viruses, CDC researchers align genetic sequences and generate neighbor-joining phylogenetic trees using Bioinformatics Toolbox and Statistics and Machine Learning Toolbox™.

Working with MathWorks consultants, the team developed a MATLAB-based cluster analysis tool that classifies viruses by serotype and genotype and then separates them into clusters of related viruses. The team plots these clusters as color-coded dots on regional maps using Mapping Toolbox™. The cluster distribution maps enable health agencies to see where poliovirus is active and to detect patterns in the movement of the virus. To simplify the overall workflow, CDC’s Polio Molecular Epidemiology Laboratory (PMEL) built standalone programs using MATLAB Compiler™. These programs feature an interface that makes it easy to select databases and files, annotate dendrograms with patient information, and generate monthly reports. More extensive documentation of annotated phylogenetic trees is produced using MATLAB Report Generator™."
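The linking step described above (patient records read from a database, then joined to sequences imported from a FASTA file) can be illustrated outside MATLAB. The Python below is a minimal sketch, not CDC's tool: it assumes each FASTA header begins with a sample ID that matches a hypothetical `sample_id` field in the patient records, and it adds a simple p-distance (the proportion of differing sites between two aligned sequences, with gap columns skipped) as one possible input to tree building. The function names, field names, and toy records are all hypothetical, and the distance measure CDC actually used is not stated in the source.

```python
"""Sketch: join patient metadata to FASTA sequences and compare aligned sequences."""


def parse_fasta(text):
    """Return {record_id: sequence}; the ID is the first token after '>'."""
    records = {}
    current, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if current is not None:
                records[current] = "".join(chunks)
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line.upper())
    if current is not None:
        records[current] = "".join(chunks)
    return records


def link_samples(patients, sequences):
    """Attach a sequence to each patient record; also report unmatched sample IDs."""
    linked, missing = [], []
    for record in patients:
        seq = sequences.get(record["sample_id"])  # hypothetical key name
        if seq is None:
            missing.append(record["sample_id"])
        else:
            linked.append({**record, "sequence": seq})
    return linked, missing


def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(s1, s2):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


if __name__ == "__main__":
    # Toy inputs for illustration only; not real CDC data.
    fasta = ">S1 toy sample\nACGTAC\n>S2\nACGA-C\n"
    patients = [
        {"sample_id": "S1", "date": "2010-01-15", "location": "Region A"},
        {"sample_id": "S2", "date": "2010-02-03", "location": "Region B"},
    ]
    linked, missing = link_samples(patients, parse_fasta(fasta))
    print(missing, p_distance(linked[0]["sequence"], linked[1]["sequence"]))
```

Reporting unmatched IDs separately, rather than silently dropping them, makes gaps between the patient database and the sequence file visible before any trees or maps are built.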
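The neighbor-joining trees mentioned in the project overview are built with a well-known greedy algorithm (Saitou and Nei). The Python below is a generic sketch of that algorithm, not the Bioinformatics Toolbox implementation CDC used: it takes a symmetric distance matrix (for example, pairwise distances between aligned sequences), repeatedly joins the pair of nodes that minimizes the Q criterion, and returns the unrooted tree as a list of edges. The function name and the internal node labels `n1`, `n2`, ... are hypothetical.

```python
"""Minimal neighbor-joining sketch in plain Python (generic algorithm, not CDC's code)."""
from itertools import combinations


def neighbor_joining(labels, dist):
    """Build an unrooted tree from a symmetric distance matrix.

    labels: taxon names (at least two); dist[i][j]: distance between labels i and j.
    Returns (node, child, branch_length) edges; internal nodes are named n1, n2, ...
    """
    d = {}
    for i, a in enumerate(labels):
        for j, b in enumerate(labels):
            d[a, b] = float(dist[i][j])
    active = list(labels)
    edges = []
    created = 0
    while len(active) > 2:
        n = len(active)
        # Row sums r_i over the currently active nodes.
        r = {a: sum(d[a, k] for k in active) for a in active}
        # Choose the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        best = None
        for a, b in combinations(active, 2):
            q = (n - 2) * d[a, b] - r[a] - r[b]
            if best is None or q < best[0]:
                best = (q, a, b)
        _, a, b = best
        created += 1
        u = f"n{created}"
        # Branch lengths from a and b to their new parent u.
        length_a = d[a, b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        edges.append((u, a, length_a))
        edges.append((u, b, d[a, b] - length_a))
        # Replace a and b with u, updating distances to the remaining nodes.
        active.remove(a)
        active.remove(b)
        for k in active:
            d[u, k] = d[k, u] = (d[a, k] + d[b, k] - d[a, b]) / 2
        d[u, u] = 0.0
        active.append(u)
    # Join the final two nodes with a single edge.
    x, y = active
    edges.append((x, y, d[x, y]))
    return edges


if __name__ == "__main__":
    names = ["a", "b", "c", "d", "e"]
    matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    for parent, child, length in neighbor_joining(names, matrix):
        print(parent, child, length)
```

On the five-taxon additive matrix in the usage block, a textbook example for this method, the sketch recovers leaf branch lengths of 2, 3, 4, 2, and 1 for a through e. Grouping viruses into lineage clusters would be a separate step on top of such a tree; the source does not say how CDC's tool draws those cluster boundaries.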

Reported Results

"Producing the monthly polio report used to take three days. Using the tools that CDC built with MATLAB, Bioinformatics Toolbox, and MATLAB Compiler, any technician with minimal training can generate the report in about an hour. In the past, CDC researchers designated clusters by hand, writing on large posters and incorporating genetic difference data from spreadsheets. It was an immense effort spread out over three months. With the MATLAB-based cluster analysis tool, all the data is in one place. The process is well-documented, and CDC researchers can complete it in one week of focused effort."

Technology

Function

Information Technology

Knowledge Management

Background

"Eliminated throughout most of the world, polioviruses are still active in several countries, including Afghanistan, India, Nigeria, and Pakistan. The Centers for Disease Control and Prevention (CDC) supports the Polio Eradication Initiative of the World Health Organization (WHO) by providing epidemiologic and technical expertise to polio-endemic countries and partner agencies. CDC’s Polio Molecular Epidemiology Laboratory (PMEL) sequences poliovirus samples to determine their genetic signature and monitors the virus as it changes and spreads. The lab produces comprehensive reports that enable researchers to understand how the virus evolves during replication and help health agencies mount more effective immunization campaigns."

Benefits

Data

Patient data and 3,000 poliovirus strain sequences.