AI Case Study
MIT scientists develop system that crowdsources data to speed up drug discovery using neural networks
Scientists at MIT have developed a system that lets pharmaceutical companies pool their data to speed up drug discovery while keeping that data confidential. In an evaluation, an artificial neural network (ANN) trained on about 1.4 million drug-protein pairs, half known to interact and half known not to, predicted whether previously unseen drug-protein pairs interact with 95% accuracy.
Industry
Healthcare
Pharmaceuticals And Biotech
Project Overview
"A new cryptographic system could allow pharmaceutical companies and academic labs to work together to develop new medications more quickly — without revealing any confidential data to their competitors.
The centerpiece of this computing system is an artificial intelligence program known as a neural network. The AI studies information about which drugs interact with various proteins in the human body to predict new drug-protein interactions.
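To make the idea concrete, here is a minimal Python sketch of how a neural network can score a single drug-protein pair. It assumes each drug and each protein is already encoded as a numeric feature vector and uses one hidden layer with untrained random weights. The feature vectors, layer size, and function names are hypothetical; the text does not describe the MIT network's actual architecture or inputs.

```python
import math
import random

def mlp_score(drug_vec, protein_vec, w1, b1, w2, b2):
    """Return an interaction score in (0, 1) for one drug-protein pair.

    Hypothetical one-hidden-layer network: the two feature vectors are
    concatenated, passed through a ReLU hidden layer, then a sigmoid output.
    """
    x = drug_vec + protein_vec  # list concatenation of the two feature vectors
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

def random_weights(n_in, n_hidden, seed=0):
    """Small random (untrained) weights; training would adjust these."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2

drug = [1.0, 0.0, 1.0, 1.0]   # hypothetical drug fingerprint bits
protein = [0.2, 0.7, 0.1]     # hypothetical protein descriptors
params = random_weights(len(drug) + len(protein), n_hidden=8)
score = mlp_score(drug, protein, *params)
print(f"interaction score: {score:.3f}")
```

Training would tune the weights so that known interacting pairs score near 1 and non-interacting pairs near 0; a score above a chosen threshold (for example 0.5) is then read as a predicted interaction.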
More training data beget a smarter AI, but gathering enough data has been a challenge because drug developers generally don’t share data due to intellectual property concerns. The new system allows an AI to crowdsource data while keeping that information private, which could encourage partnerships for speedier drug development, researchers report in the Oct. 19 issue of Science.
Identifying new drug-protein interactions can uncover potential new treatments for various diseases. Or it could reveal whether drugs interact with unintended protein targets, which might indicate if a medication is likely to cause particular side effects, says Ivet Bahar, a computational biologist at the University of Pittsburgh not involved in the work.
In the new AI-training system, data pooled from research groups get divvied up among multiple servers, and the owner of each server sees what appear to be only random numbers. “That’s where the crypto-magic happens,” says computer scientist David Wu of the University of Virginia in Charlottesville, who wasn’t involved in the work. Although no individual participant can see the millions of drug-protein interactions that compose the training set, the servers can collectively use that information to teach a neural network to predict the interactivity of previously unseen drug-protein combinations.
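The "random numbers" each server sees can be illustrated with additive secret sharing, a standard building block of secure multiparty computation. The Python sketch below is an illustration under stated assumptions, not the researchers' protocol: a value is split into shares that are individually uniform modulo a prime P (assumed here to be 2**61 - 1), servers add shared values locally, and they multiply shared values with a precomputed Beaver triple. Training a real network would also need fixed-point encoding of fractional weights and secure channels between servers, neither of which is shown.

```python
import secrets

P = 2**61 - 1  # prime modulus (assumed choice for this sketch)

def share(value, n_parties):
    """Split value into n_parties additive shares that sum to value mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recombine shares; only possible when all of them are pooled."""
    return sum(shares) % P

def add_shares(x_sh, y_sh):
    """Each server adds its own two shares locally; no communication needed."""
    return [(xi + yi) % P for xi, yi in zip(x_sh, y_sh)]

def beaver_triple(n_parties):
    """Shares of random a, b and c = a*b mod P (assumed to come from an offline phase)."""
    a = secrets.randbelow(P)
    b = secrets.randbelow(P)
    return share(a, n_parties), share(b, n_parties), share(a * b % P, n_parties)

def mul_shares(x_sh, y_sh, triple):
    """Multiply two shared values using a Beaver triple (a, b, c)."""
    a_sh, b_sh, c_sh = triple
    # Servers publicly open d = x - a and e = y - b; the random masks a and b
    # make these opened values look uniformly random.
    d = reconstruct([(xi - ai) % P for xi, ai in zip(x_sh, a_sh)])
    e = reconstruct([(yi - bi) % P for yi, bi in zip(y_sh, b_sh)])
    # Since x*y = c + d*b + e*a + d*e, each server computes its share of
    # c + d*b + e*a, and one server alone adds the public term d*e.
    z_sh = [(ci + d * bi + e * ai) % P for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % P
    return z_sh

# Usage: three servers hold shares of 12 and 5.
x_sh = share(12, 3)
y_sh = share(5, 3)
print(x_sh)  # three random-looking numbers; none reveals 12 on its own
print(reconstruct(add_shares(x_sh, y_sh)))  # 17
print(reconstruct(mul_shares(x_sh, y_sh, beaver_triple(3))))  # 60
```

Addition needs no communication because the shares are linear. Multiplication opens only the masked differences d and e, which reveal nothing about x and y as long as the triple's a and b stay secret and each triple is used once. Sums and products are enough, in principle, to build the matrix arithmetic inside a neural network.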
“This work is visionary,” says Jian Peng, a computer scientist at the University of Illinois at Urbana-Champaign not involved in the study. “I think [it] will lay the groundwork for the future of collaborations in biomedicine.”
MIT computational biologist Bonnie Berger and colleagues Brian Hie and Hyunghoon Cho evaluated their system’s accuracy by training a neural network on about 1.4 million drug-protein pairs. Half of these pairs were drawn from the STITCH database of known drug-protein interactions; the other half comprised drug-protein pairs that don’t interact. When shown new drug-protein pairs known to interact or not, the AI picked out which sets interacted with 95 percent accuracy.
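The evaluation relies on a balanced dataset (known interactions as positives plus an equal number of non-interacting pairs) and on accuracy, the fraction of held-out pairs classified correctly. A minimal Python sketch follows. The source does not say how the non-interacting pairs were chosen, so sampling random drug-protein pairs absent from the known set is an assumption here, and the small example lists are toy data.

```python
import random

def build_balanced_dataset(known_pairs, drugs, proteins, seed=0):
    """Return shuffled (pair, label) examples: known pairs labeled 1 and an
    equal number of sampled negatives labeled 0.

    Assumption: a random drug-protein pair absent from known_pairs is treated
    as non-interacting; the source does not describe how negatives were chosen.
    Known pairs are assumed to lie within drugs x proteins.
    """
    rng = random.Random(seed)
    known = set(known_pairs)
    available = len(drugs) * len(proteins) - len(known)
    if available < len(known):
        raise ValueError("not enough unlisted pairs to balance the dataset")
    negatives = set()
    while len(negatives) < len(known):
        pair = (rng.choice(drugs), rng.choice(proteins))
        if pair not in known:
            negatives.add(pair)
    data = [(p, 1) for p in sorted(known)] + [(p, 0) for p in sorted(negatives)]
    rng.shuffle(data)
    return data

def accuracy(predictions, labels):
    """Fraction of predicted labels (0 or 1) that match the true labels."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Usage with toy data (lists assumed free of duplicates):
drugs = ["imatinib", "droloxifene", "aspirin"]
proteins = ["ABL1", "ESR1", "PTGS1", "ErbB4"]
known = [("imatinib", "ABL1"), ("droloxifene", "ESR1"), ("aspirin", "PTGS1")]
data = build_balanced_dataset(known, drugs, proteins)
print(len(data))  # 6: three positives plus three sampled negatives
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

One caveat of this kind of sampling: a pair missing from the known set is not guaranteed to be non-interacting, so some sampled negatives may be real but unrecorded interactions.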
To test whether the system could identify hitherto unknown drug-protein interactions, Berger’s team then trained the neural network on nearly 2 million drug-protein pairs: the entire STITCH dataset of known interactions, plus the same number of noninteracting pairs. The fully trained AI suggested several interactions that had never before been reported or that had been reported but were not in the STITCH database.
For instance, the AI identified an interaction between estrogen receptor proteins and droloxifene, a drug developed to treat breast cancer. The neural network also found a never-before-seen interaction between the leukemia medication imatinib and the protein ErbB4, which is thought to be involved in different types of cancer. The researchers confirmed this interaction with lab experiments.
This secure computing network may also encourage more collaboration in areas outside of pharmaceutical development. Hospitals could share confidential health records to train AI programs that predict patient prognoses or devise treatment strategies, Peng says.
“Whenever you want to do a study on a large number of people on behavior, on genomics, on medical records, legal records, financial records — anything that’s privacy-sensitive, these kinds of techniques can be very useful,” Wu says."
Reported Results
"When shown new drug-protein pairs known to interact or not, the AI picked out which sets interacted with 95 percent accuracy."
Function
R&D
Core Research And Development
Data
1.4 million drug-protein pairs