AI Case Study

Peptone accelerates protein research for drug discovery with a machine learning derived database

Peptone uses machine learning to predict structural stability for Intrinsically Disordered Proteins. This modelling has been compiled into a database of over 7,000 proteins, which can be integrated with other machine learning techniques to accelerate and enhance biochemistry research.



Project Overview

As stated in their arXiv paper, Peptone set out to improve the understanding of intrinsic protein disorder: "to expand our understanding of intrinsic protein disorder, we have created a database of secondary structure (SS) propensities for proteins (dSPP) as a reference resource for experimental research and computational biophysics. The structural propensity approach offers a computationally effective alternative to binary-type SS class assignments."
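To illustrate the difference between a continuous propensity and a binary-type class assignment, the minimal Python sketch below collapses per-residue propensities into discrete helix/strand/coil labels. The score convention (one value per residue in [-1, 1], with +1 fully helical, -1 fully strand and 0 coil) and the 0.5 threshold are assumptions for illustration rather than details taken from the paper, and the function names are hypothetical.

```python
from typing import List, Sequence

# Assumed convention for this sketch (not confirmed by the source):
# one score per residue in [-1, 1]; +1 = fully helical, -1 = fully strand, 0 = coil.


def binary_ss_class(propensity: float, threshold: float = 0.5) -> str:
    """Collapse one continuous propensity into a discrete SS label (hypothetical helper)."""
    if propensity >= threshold:
        return "H"  # helix
    if propensity <= -threshold:
        return "E"  # strand
    return "C"  # coil / no clear preference


def to_classes(propensities: Sequence[float], threshold: float = 0.5) -> str:
    """Apply the binary assignment to a whole per-residue profile."""
    return "".join(binary_ss_class(p, threshold) for p in propensities)


profile: List[float] = [0.95, 0.55, 0.45, 0.05, -0.48, -0.9]
print(to_classes(profile))  # HHCCCE
```

Residues with propensities of 0.95 and 0.55 both become `H`, while 0.45, 0.05 and -0.48 all become `C`: the discrete labels discard the graded information that a propensity database keeps.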

According to the Peptone blog: "The biggest value of Machine Learning methods in prediction of biophysical properties of proteins is their ability to “equate” loosely related protein features to measurable experimental data. Thus predictions using complex numerical models that underlie Machine Learning methodology, can be further tweaked and refined by providing independent experimental proxies of protein structure and dynamics. dSPP is a unique source of information for Intrinsically Disordered Proteins (IDPs), which are a challenging class of proteins to study. IDPs are implicated in numerous debilitating human pathologies, including Alzheimer’s, Parkinson’s, prion diseases, molecular basis of cancer, HIV, HSV, HVC, ZIKVR, and many others. dSPP data can be readily used by experimentalist to gain exclusive insight into structural stability of secondary structure motifs, as well as high throughput computational techniques, which aim to deliver realistic models of medically relevant proteins." The Peptone database is thus tailored for integration with deep learning models to accelerate protein research.

Reported Results

The arXiv paper evaluates the advantages of using Peptone's database of structural propensities over established disorder prediction methods: "we show how absolute SS propensity can be used as a benchmark for residual disorder probability predictions, complementing existing approaches. Our analyses indicate that established disorder prediction methods suffer from insufficient sensitivity to disordered regions among folded domains."
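One possible reading of "absolute SS propensity as a benchmark" is sketched below: residues with little helix or strand preference serve as a reference for residual disorder, and a predictor is scored by how many of them it actually flags. The mapping 1 - |propensity|, the 0.5 cutoff, the sensitivity metric and the toy numbers are all assumptions for illustration; the paper's own benchmarking procedure may differ.

```python
from typing import List, Sequence


def reference_disorder(propensities: Sequence[float]) -> List[float]:
    # Assumption: residual disorder = 1 - |propensity|, so a residue with no
    # helix or strand preference (propensity near 0) scores near 1.
    return [1.0 - abs(p) for p in propensities]


def sensitivity(reference: Sequence[float], predicted: Sequence[float],
                cutoff: float = 0.5) -> float:
    """Fraction of reference-disordered residues that the predictor also calls disordered."""
    positives = [i for i, r in enumerate(reference) if r > cutoff]
    if not positives:
        return float("nan")  # nothing disordered in the reference
    hits = sum(1 for i in positives if predicted[i] > cutoff)
    return hits / len(positives)


# Toy profile (hypothetical): a short unstructured stretch (positions 2-3)
# between well-folded residues.
propensities = [0.9, 0.85, 0.1, 0.05, 0.8, -0.7]
predicted_disorder = [0.05, 0.1, 0.2, 0.9, 0.1, 0.2]

ref = reference_disorder(propensities)
print(sensitivity(ref, predicted_disorder))  # 0.5
```

Here the predictor catches only one of the two reference-disordered residues, mimicking the kind of missed disorder among folded domains that the paper describes.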


Peptone uses in-house, proprietary machine learning algorithms. The arXiv paper contrasts its propensity-based approach with existing methods: "ML methods aimed specifically at structural disorder prediction are predominantly trained on datasets of binary-encoded, multi-class SS types. Protein disorder is inferred from missing structural coordinates, sequence conservation, similarity to known disordered proteins, and 3D contact maps."



Data Science


From the Peptone blog: "Although great advancements in laboratory and industrial-scale protein production have been made, protein engineering and all the associated steps remain laborious, expensive and truly complicated. We want to engineer proteins to enhance their properties. Typically, stability under different temperatures, pH or salinity. Frequently, researchers are aiming at improving catalytic performance of protein enzymes, or adding completely new types of chemical activities to known proteins. The most common and established way to engineer a protein is to create its variants with substituted amino acids, also known as mutants. Protein biotechnology is to a large extent hampered by scale and complexity of mutational analysis."



The database draws on nuclear magnetic resonance (NMR) spectroscopy data, specifically "NMR resonance assignment data from 7200+ proteins stored in public repositories", according to Peptone's blog.
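NMR resonance assignments relate to secondary structure through chemical shifts: Cα shifts above random-coil values tend to indicate helix, and shifts below them tend to indicate β-strand. The sketch below computes smoothed Cα secondary shifts and maps them onto a crude [-1, 1] score. It is a simplified illustration, not Peptone's pipeline; the random-coil values are approximate, the smoothing window and the 3.0 ppm scale are hypothetical choices, and real analyses use full random-coil tables with sequence-dependent corrections and several nuclei.

```python
from typing import Dict, List, Sequence, Tuple

# Approximate random-coil C-alpha shifts (ppm) for a few residue types;
# illustrative values only, real work uses a complete published table.
RANDOM_COIL_CA: Dict[str, float] = {
    "A": 52.5, "G": 45.1, "L": 55.1, "V": 62.2, "E": 56.6, "K": 56.2,
}


def secondary_shifts(assignments: Sequence[Tuple[str, float]]) -> List[float]:
    """Observed minus random-coil C-alpha shift for each (residue, shift) pair, in ppm."""
    return [obs - RANDOM_COIL_CA[res] for res, obs in assignments]


def smooth(values: Sequence[float], window: int = 3) -> List[float]:
    """Centred moving average; edge residues average over the neighbours that exist."""
    half = window // 2
    out: List[float] = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def crude_propensity(assignments: Sequence[Tuple[str, float]],
                     scale: float = 3.0) -> List[float]:
    # Hypothetical mapping: smoothed shift divided by `scale`, clamped to [-1, 1];
    # positive leans helical, negative leans strand.
    return [max(-1.0, min(1.0, d / scale))
            for d in smooth(secondary_shifts(assignments))]


# Toy assignment list (made-up shifts): helix-like start, strand-like end.
example = [("A", 55.5), ("L", 58.1), ("E", 59.6), ("K", 56.3), ("V", 60.7), ("V", 60.4)]
print([round(p, 2) for p in crude_propensity(example)])
```

Smoothing over neighbouring residues reflects the fact that secondary structure elements span several residues, so a single outlying shift should not flip the local assignment.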