A catalog of genetic mutations to help pinpoint the cause of diseases

New AI tool classifies the effects of 71 million “missense” mutations

Uncovering the root causes of disease is one of the greatest challenges in human genetics. With millions of possible mutations and limited experimental data, it is still largely a mystery which ones can cause disease. This knowledge is crucial for faster diagnosis and the development of life-saving treatments.

Today, we are releasing a catalog of “missense” mutations where researchers can learn more about what effect they may have. Missense variants are genetic mutations that can affect the function of human proteins. In some cases, they can lead to diseases such as cystic fibrosis, sickle cell anemia, or cancer.

The AlphaMissense catalog was developed using AlphaMissense, our new AI model that classifies missense variants. In a paper published in Science, we show that it categorized 89% of all 71 million possible missense variants as either likely pathogenic or likely benign. By contrast, only 0.1% have been confirmed by human experts.

AI tools that can accurately predict the effect of variants have the power to accelerate research across fields from molecular biology to clinical and statistical genetics. Experiments to uncover disease-causing mutations are expensive and laborious: every protein is unique and each experiment must be designed separately, which can take months. By using AI predictions, researchers can get a preview of results for thousands of proteins at a time, which can help prioritize resources and accelerate more complex studies.

We have made all of our predictions freely available for commercial and research use, and open-sourced the model code for AlphaMissense.

AlphaMissense predicted the pathogenicity of all 71 million possible missense variants. It classified 89% of them, predicting that 57% were likely benign and 32% were likely pathogenic.

What is a missense variant?

A missense variant is a single-letter substitution in DNA that results in a different amino acid within a protein. If you think of DNA as a language, switching one letter can change a word and alter the meaning of a sentence altogether. In this case, a substitution changes which amino acid is translated, which can affect the function of the protein.
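The idea of a one-letter change altering the encoded amino acid can be sketched in a few lines of Python. This is only an illustration, not part of AlphaMissense: the codon table below is deliberately partial, and the function names are hypothetical. The example codon change, GAG to GTG (glutamic acid to valine), is the well-known substitution in the HBB gene associated with sickle cell anemia.

```python
# Partial codon table: only the codons needed for this illustration.
CODON_TO_AMINO_ACID = {
    "GAG": "Glu",  # glutamic acid
    "GAA": "Glu",  # glutamic acid (a synonymous codon)
    "GTG": "Val",  # valine
}


def substitute(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at a 0-based position replaced."""
    return codon[:position] + new_base + codon[position + 1:]


def classify_substitution(codon: str, position: int, new_base: str):
    """Describe a single-base substitution within one codon.

    Returns (mutated codon, amino acid before, amino acid after, kind),
    where kind is "missense" if the amino acid changes, else "synonymous".
    Stop codons are outside this partial table and are not handled.
    """
    mutated = substitute(codon, position, new_base)
    before = CODON_TO_AMINO_ACID[codon]
    after = CODON_TO_AMINO_ACID[mutated]
    kind = "missense" if before != after else "synonymous"
    return mutated, before, after, kind


# Changing the middle base changes the amino acid: a missense variant.
print(classify_substitution("GAG", 1, "T"))
# Changing the last base keeps the amino acid: not a missense variant.
print(classify_substitution("GAG", 2, "A"))
```

Not every single-letter change is a missense variant: the second call shows a substitution that leaves the amino acid unchanged.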

The average person carries more than 9,000 missense variants. Most are benign and have little to no effect, but others are pathogenic and can severely disrupt protein function. Missense variants can be used in the diagnosis of rare genetic diseases, where a few or even a single missense variant may directly cause disease. They are also important for studying complex diseases, such as type 2 diabetes, which can be caused by a combination of many different types of genetic changes.

Classifying missense variants is an important step in understanding which of these protein changes can cause disease. Of the more than 4 million missense variants that have already been observed in humans, only 2% have been annotated as pathogenic or benign by experts, roughly 0.1% of all 71 million possible missense variants. The rest are considered “variants of unknown significance” due to a lack of experimental or clinical data on their impact. With AlphaMissense, we now have the clearest picture to date, classifying 89% of variants using a threshold that yielded 90% precision on a database of known disease variants.
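The figures in this paragraph can be checked with a quick back-of-the-envelope calculation. The sketch below uses only the rounded numbers quoted in the text (4 million, 2%, 71 million), so the result is approximate; the variable names are illustrative.

```python
observed_variants = 4_000_000    # "more than 4 million" seen in humans
annotated_fraction = 0.02        # 2% annotated by experts
possible_variants = 71_000_000   # all possible missense variants

# Number of expert-annotated variants implied by the rounded figures.
annotated = observed_variants * annotated_fraction

# The same variants expressed as a share of all possible variants.
share_of_possible = annotated / possible_variants

print(f"expert-annotated variants: about {annotated:,.0f}")
print(f"share of all possible variants: about {share_of_possible:.2%}")
```

With these inputs the script reports about 80,000 annotated variants, or about 0.11% of all possible variants, consistent with the "roughly 0.1%" quoted in the text.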

Pathogenic or benign: how AlphaMissense classifies variants

AlphaMissense is based on our breakthrough model AlphaFold, which predicted the structures of nearly all proteins known to science from their amino acid sequences. Our adapted model can predict the pathogenicity of missense variants that alter individual amino acids of proteins.

To train AlphaMissense, we fine-tuned AlphaFold on labels distinguishing variants observed in human and closely related primate populations. Commonly observed variants are treated as benign, and variants that are never observed are treated as pathogenic. AlphaMissense does not predict the change in protein structure upon mutation or other effects on protein stability. Instead, it uses databases of related protein sequences and the structural context of variants to produce a score between 0 and 1 that approximately rates the likelihood of a variant being pathogenic. The continuous score allows users to choose a threshold for classifying variants as pathogenic or benign that matches their accuracy requirements.
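As a rough sketch of how a user might pick such a threshold, the Python below searches a small labelled set for the lowest score cutoff that reaches a target precision, then applies two cutoffs to label new scores. Everything here is an assumption for illustration: the toy scores and labels are made up, the function names are hypothetical, and the benign cutoff of 0.3 is arbitrary. It is not the procedure or the thresholds used in the paper.

```python
from typing import List, Optional, Tuple

# Toy labelled data: (score, is_known_pathogenic). Values are invented.
labelled: List[Tuple[float, bool]] = [
    (0.05, False), (0.20, False), (0.35, True), (0.50, False),
    (0.70, True), (0.85, True), (0.95, True),
]


def precision_at(threshold: float,
                 scored: List[Tuple[float, bool]]) -> Optional[float]:
    """Precision of calling 'pathogenic' for every score >= threshold."""
    calls = [is_pathogenic for score, is_pathogenic in scored
             if score >= threshold]
    if not calls:
        return None  # nothing was called pathogenic at this cutoff
    return sum(calls) / len(calls)


def lowest_threshold_with_precision(scored: List[Tuple[float, bool]],
                                    target: float) -> Optional[float]:
    """Smallest observed score that, used as a cutoff, meets the target."""
    for threshold in sorted({score for score, _ in scored}):
        precision = precision_at(threshold, scored)
        if precision is not None and precision >= target:
            return threshold
    return None


def classify(score: float, benign_below: float, pathogenic_from: float) -> str:
    """Map a continuous score to one of three labels using two cutoffs."""
    if score >= pathogenic_from:
        return "likely pathogenic"
    if score < benign_below:
        return "likely benign"
    return "uncertain"


pathogenic_cutoff = lowest_threshold_with_precision(labelled, target=0.9)
print(pathogenic_cutoff)
for s in (0.10, 0.50, 0.80):
    print(s, classify(s, benign_below=0.3, pathogenic_from=pathogenic_cutoff))
```

A stricter precision target pushes the pathogenic cutoff higher and leaves more variants in the uncertain band, which is the trade-off the continuous score exposes.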

Illustration of how AlphaMissense classifies human missense variants. A missense variant is given as input, and the AI system scores it as likely pathogenic or likely benign. AlphaMissense combines structural context and protein language modeling, and is fine-tuned on human and primate variant population frequency databases.

AlphaMissense achieves state-of-the-art predictions across a wide range of genetic and experimental benchmarks, all without explicitly training on such data. Our tool outperformed other computational methods when used to classify variants from ClinVar, a public archive of data on the relationship between human variants and disease. Our model was also the most accurate method for predicting results from the lab, which shows that it is consistent with different ways of measuring pathogenicity.

AlphaMissense outperforms other computational methods at predicting the effects of missense variants.
Left: Comparison of the performance of AlphaMissense and other methods at classifying variants from the ClinVar public archive. The methods shown in gray were trained directly on ClinVar, and their performance on this benchmark is likely overestimated because some of their training variants are included in this test set.
Right: A chart comparing the performance of AlphaMissense and other methods at predicting measurements from biological experiments.

Building a community resource

AlphaMissense builds on AlphaFold to further the world's understanding of proteins. A year ago, we released 200 million protein structures predicted using AlphaFold, which is helping millions of scientists around the world accelerate research and pave the way to new discoveries. We look forward to seeing how AlphaMissense can help solve open questions at the heart of genomics and across the biological sciences.

We have made AlphaMissense predictions freely available to both the commercial and scientific communities. Together with EMBL-EBI, we are also making them more usable through the Ensembl Variant Effect Predictor.

In addition to our lookup table of missense mutations, we have shared expanded predictions for all 216 million possible single amino acid substitutions across more than 19,000 human proteins. We have also included the average prediction for each gene, which is similar to measuring a gene's evolutionary constraint: it indicates how essential the gene is for the survival of the organism.
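The two ideas in this paragraph, enumerating every single amino acid substitution and averaging scores per gene, can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a plain list of hypothetical (gene, score) pairs rather than the actual format of the released files, the gene names and scores are invented, and the function names are not from the AlphaMissense code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def all_substitutions(protein: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (1-based position, reference, alternative) for every
    single amino acid substitution in a protein sequence."""
    for position, reference in enumerate(protein, start=1):
        for alternative in AMINO_ACIDS:
            if alternative != reference:
                yield position, reference, alternative


def mean_score_per_gene(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the per-variant scores for each gene."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for gene, score in rows:
        totals[gene] += score
        counts[gene] += 1
    return {gene: totals[gene] / counts[gene] for gene in totals}


# Each position has 19 possible alternatives, so a 3-residue peptide
# has 3 * 19 single substitutions.
print(len(list(all_substitutions("MVH"))))

# Hypothetical gene names and scores, for illustration only.
rows = [("GENE_A", 0.25), ("GENE_A", 0.75), ("GENE_B", 0.875)]
print(mean_score_per_gene(rows))
```

A gene whose possible substitutions score high on average is one where changes tend to be predicted as damaging, which is why the text compares the gene-level average to a measure of evolutionary constraint.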

Examples of AlphaMissense predictions overlaid on AlphaFold predicted structures (red = predicted as pathogenic, blue = predicted as benign, gray = uncertain). Red dots represent known pathogenic missense variants, and blue dots represent known benign variants from the ClinVar database.
Left: HBB protein. Variants in this protein can cause sickle cell anemia.
Right: CFTR protein. Variants in this protein can cause cystic fibrosis.

Accelerating research into genetic diseases

A key step in translating this research is collaborating with the scientific community. We have been working in partnership with Genomics England to explore how these predictions could help in studying the genetics of rare diseases. Genomics England cross-referenced AlphaMissense's findings with variant pathogenicity data previously aggregated with human participants. Their evaluation confirmed that our predictions are accurate and consistent, providing another real-world benchmark for AlphaMissense.

Although our predictions are not designed to be used directly in the clinic, and should be interpreted alongside other sources of evidence, this work has the potential to improve the diagnosis of rare genetic disorders and help discover new disease-causing genes.

Ultimately, we hope that AlphaMissense, together with other tools, will allow researchers to better understand diseases and develop new life-saving treatments.

Notes

*As of March 13, 2024, AlphaMissense predictions are available under the CC BY v.4 license, lifting the previous non-commercial use restriction. See the published database and Zenodo for further access information.

We would like to thank Juanita Bawagan, Jess Valdez, Katie McAtackney, Kathryn Seager, and Hollie Dobson for their help with text and figures. We are also grateful to our external partners, Genomics England and EMBL-EBI, for their ongoing support. This work was done thanks to the contributions of the co-authors: Guido Novati, Joshua Pan, Clare Bycroft, Akvilė Žemgulytė, Taylor Applebaum, Alexander Pritzel, Lai Hong Wong, Michal Zieliński, Tobias Sargeant, Rosalia G. Schneider, Andrew W. Senior, John Jumper, Demis Hassabis, and Pushmeet Kohli. We would also like to thank Kathryn Tunyasuvunakool, Rob Fergus, Eliseo Papa, David La, Zachary Wu, Sara-Jane Dunn, Kyle R. Taylor, Natasha Latysheva, Hamish Tomlinson, Augustin Žídek, Roz Onions, Mira Lutfi, Jon Small, Molly Beck, Annette Obika, Folake Abu, Alyssa Pierce, James Tam, Q Green, Meera Last, Tharindi Hapuarachchi, and the wider Google DeepMind team for their support, help, and feedback.
