AI Case Study

DeepMind develops a highly accurate machine learning method for predicting protein structures

DeepMind has developed a machine learning method to predict the 3D structure of proteins based on their genetic makeup. According to the company, the AlphaFold system is the most accurate methodology currently in the world for the prediction.

Industry

Healthcare

Pharmaceuticals And Biotech

Project Overview

From DeepMind, "Our team focused specifically on the hard problem of modelling target shapes from scratch, without using previously solved proteins as templates. Our system, AlphaFold, which we have been working on for the past two years, builds on years of prior research in using vast genomic data to predict protein structure. The 3D models of proteins that AlphaFold generates are far more accurate than any that have come before—making significant progress on one of the core challenges in biology."

Reported Results

According to DeepMind, "We achieved a high degree of accuracy when predicting the physical properties of a protein structure, and then used two distinct methods to construct predictions of full protein structures."

Technology

From DeepMind: "these methods relied on deep neural networks that are trained to predict properties of the protein from its genetic sequence. The properties our networks predict are: (a) the distances between pairs of amino acids and (b) the angles between chemical bonds that connect those amino acids. The first development is an advance on commonly used techniques that estimate whether pairs of amino acids are near each other.

We trained a neural network to predict a separate distribution of distances between every pair of residues in a protein. These probabilities were then combined into a score that estimates how accurate a proposed protein structure is. We also trained a separate neural network that uses all distances in aggregate to estimate how close the proposed structure is to the right answer.

Using these scoring functions, we were able to search the protein landscape to find structures that matched our predictions. Our first method built on techniques commonly used in structural biology, and repeatedly replaced pieces of a protein structure with new protein fragments. We trained a generative neural network to invent new fragments, which were used to continually improve the score of the proposed protein structure.

The second method optimised scores through gradient descent—a mathematical technique commonly used in machine learning for making small, incremental improvements—which resulted in highly accurate structures. This technique was applied to entire protein chains rather than to pieces that must be folded separately before being assembled, reducing the complexity of the prediction process."

Function

R And D

Core Research And Development

Background

"Proteins are large, complex molecules essential in sustaining life. Nearly every function our body performs—contracting muscles, sensing light, or turning food into energy—can be traced back to one or more proteins and how they move and change. What any given protein can do depends on its unique 3D structure. But figuring out the 3D shape of a protein purely from its genetic sequence is a complex task that scientists have found challenging for decades. The challenge is that DNA only contains information about the sequence of a protein’s building blocks called amino acid residues, which form long chains. Predicting how those chains will fold into the intricate 3D structure of a protein is what’s known as the “protein folding problem”."

Benefits

Data

Genomic data