AI Case Study
Researchers at EPFL predict chemical shifts of molecular solids and their polymorphs to within DFT accuracy with machine learning
Scientists at the Institut des Sciences et Ingénierie Chimiques and Institut des Sciences et Génie Matériaux at Ecole Polytechnique Fédérale de Lausanne (EPFL) are applying machine learning to molecular chemistry. Their aim was to predict the chemical shifts of molecular solids and their polymorphs, and they have proposed a machine learning model that does so to within DFT accuracy. In trials, the model successfully determined the structures of cocaine and the drug AZD8329.
"We propose a machine learning method based on local environments to accurately predict chemical shifts of molecular solids and their polymorphs to within DFT accuracy.
We have presented a ML model based on local environments to
Far from being just a benchmark of a machine-learning scheme, the method is accurate enough to be used to determine structures by comparison to experimental shifts in chemical shift-based NMR crystallography approaches to structure determination, as shown here for cocaine and AZD8329.
The ML model only scales linearly with the number of atoms and, for the prediction of individual structures, is dominated by a constant I/O overhead. Here it allows the calculation of chemical shifts for a set of six structures with between 768 and 1584 atoms in their unit cells in less than 6 min (an acceleration of a factor 10^6 for the largest structure).
The accuracy of the method is likely to increase further with the size of the training set, and subsequently with the future evolution of the accuracy of the method used to calculate the reference shifts used in training (here DFT), or by using experimental shifts if a large enough set were available. A web version based on the protocol described here is publicly available at http://shiftml.epfl.ch.
The model used here can easily be extended to organic solids including halides or other nuclei, and to network materials such as oxides, and these will be the subject of further work."
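The structure determination mentioned in the excerpt above rests on comparing experimentally measured shifts with shifts predicted for each candidate crystal structure. A minimal sketch of that comparison follows, assuming a root-mean-square error as the match score and assuming the shifts are already assigned to the same atoms in the same order. The function names, candidate labels, and numbers are hypothetical and are not taken from the paper or from glosim2.

```python
import math


def shift_rmse(experimental, predicted):
    """Root-mean-square error between two equal-length lists of shifts (ppm)."""
    if len(experimental) != len(predicted):
        raise ValueError("shift lists must have the same length")
    squared = [(e - p) ** 2 for e, p in zip(experimental, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rank_candidates(experimental, candidates):
    """Return (name, rmse) pairs sorted from best match to worst match."""
    scored = [
        (name, shift_rmse(experimental, shifts))
        for name, shifts in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Illustrative, made-up shifts (ppm); not data from the study.
experimental = [1.0, 2.0, 3.0]
candidates = {
    "candidate_A": [1.1, 2.1, 2.9],
    "candidate_B": [2.0, 3.0, 4.0],
}

ranking = rank_candidates(experimental, candidates)
for name, rmse in ranking:
    print(f"{name}: RMSE = {rmse:.3f} ppm")
```

The candidate with the lowest error is the one whose predicted shifts best agree with experiment. In practice the predicted shifts would come from the trained model rather than from a hand-written dictionary.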
"The trained model is able to determine, based on the match between experimentally measured and ML-predicted shifts, the structures of cocaine and the drug 4-[4-(2-adamantylcarba- moyl)-5-tert-butylpyrazol-1-yl]benzoic acid."
"The machine learning code to calculate the SOAP environ- ments, the kernels, and the chemical shifts is called glosim2, and is publicly available at https://github.com/cosmo-epfl/glosim2. The DFT codes used to opti- mize geometry and calculate chemical shifts are available from the corresponding developers."
R And D
Core Research And Development
"Due to their strong dependence on local atonic environments, NMR chemical shifts are among the most powerful tools for strucutre elucidation of powdered solids or amorphous materials. Unfortunately, using them for structure determination depends on the ability to calculate them, which comes at the cost of high accuracy first-principles calculations. Machine learning has recently emerged as a way to overcome the need for quantum chemical calculations, but for chemical shifts in solids it is hindered by the chemical and combinatorial space spanned by molecular solids, the strong dependency of chemical shifts on their environment, and the lack of an experimental database of shifts."
"Computational details. For the SOAP kernels59,60, each atomic environment is represented as a three-dimensional neighborhood density given by a superposition of Gaussians, one centered at each of the atom positions in a spherical neigh- borhood within a cut-off radius rc from the core atom. The Gaussians have a variance ς2, and a separate density is built for each atomic species. The kernel is then constructed as the symmetrized overlap between the amplitudes representing X and X′. This degree of overlap thus measures the similarity between the envir- onments X and X′."