With AI, researchers predict the location of virtually any protein within a human cell

A protein located in the wrong part of the cell can contribute to diseases such as Alzheimer's, cystic fibrosis, and cancer. But there are about 70,000 different proteins and protein variants in a single human cell, and because scientists can typically test only a handful in one experiment, manually identifying proteins' locations is extremely costly and time-consuming.

A new generation of computational techniques seeks to streamline the process using machine-learning models, which are often trained on datasets containing thousands of proteins and their locations, measured across many cell lines. One of the largest such datasets is the Human Protein Atlas, which catalogs the subcellular behavior of more than 13,000 proteins in more than 40 cell lines. But as enormous as it is, the Human Protein Atlas has explored only about 0.25 percent of all possible pairings of proteins and cell lines within the database.

Now, researchers from MIT, Harvard University, and the Broad Institute of MIT and Harvard have developed a new computational approach that can efficiently explore the remaining uncharted space. Their method can predict the location of any protein in any human cell line, even when both the protein and the cell have never been tested before.

Their technique goes one step further than many AI-based methods by localizing a protein at the single-cell level, rather than as an averaged estimate across all the cells of a specific type. This single-cell localization could pinpoint a protein's location in a specific cancer cell after treatment, for instance.

The researchers combined a protein language model with a special type of computer vision model to capture rich details about a protein and a cell. Ultimately, the user receives an image of a cell with a highlighted portion indicating the model's prediction of where the protein is located. Since a protein's localization is indicative of its functional status, this technique could help researchers and clinicians more efficiently diagnose diseases or identify drug targets, while enabling biologists to better understand how complex biological processes relate to protein localization.

“You could do these protein-localization experiments on a computer without having to touch any lab bench, hopefully saving yourself months of effort. While you would still need to verify the prediction, this technique could act as an initial screening of what to test for experimentally,” says Yitong Tseo, a graduate student in MIT's Computational and Systems Biology program and co-lead author of a paper on this research.

Tseo is joined on the paper by co-lead author Xinyi Zhang, a graduate student in the Department of Electrical Engineering and Computer Science (EECS) and the Eric and Wendy Schmidt Center at the Broad Institute; Yunhao Bai of the Broad Institute; and senior authors Fei Chen, an assistant professor at Harvard and a member of the Broad Institute, and Caroline Uhler, the Andrew and Erna Viterbi Professor of Engineering in EECS and the MIT Institute for Data, Systems, and Society (IDSS), who is also director of the Eric and Wendy Schmidt Center and a researcher at MIT's Laboratory for Information and Decision Systems (LIDS). The research appears today in Nature Methods.

Collaborating models

Many existing protein prediction models can only make predictions based on the protein and cell data on which they were trained, or they are unable to pinpoint a protein's location within a single cell.

To overcome these limitations, the researchers created a two-part method for prediction of unseen proteins' subcellular location, called PUPS.

The first part uses a protein sequence model to capture the localization-determining properties of a protein and its 3D structure, based on the chain of amino acids that forms it.

The second part incorporates an image inpainting model, which is designed to fill in missing parts of an image. This computer vision model analyzes three stained images of a cell to gather information about the state of that cell, such as its type, individual features, and whether it is under stress.

PUPS joins the representations created by each model to predict where the protein is located within a single cell, using an image decoder to output a highlighted image that shows the predicted location.
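To make the two-branch idea concrete, here is a minimal sketch in PyTorch. It is not the authors' code: the module names (ToyLocalizationNet), the averaging of residue embeddings as a stand-in for a protein language model, the small convolutional stacks, and all layer sizes are illustrative assumptions. It only shows the general pattern of fusing a sequence embedding with per-pixel image features and decoding a highlighted localization map.

```python
import torch
import torch.nn as nn

class ToyLocalizationNet(nn.Module):
    """Illustrative two-branch model: sequence features + cell-image features -> localization map."""

    def __init__(self, vocab_size=26, embed_dim=64, img_channels=3):
        super().__init__()
        # Stand-in for a protein language model: embed residues, then average them.
        self.residue_embed = nn.Embedding(vocab_size, embed_dim)
        # Stand-in for the image (inpainting-style) encoder: a small conv stack.
        self.img_encoder = nn.Sequential(
            nn.Conv2d(img_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, embed_dim, 3, padding=1), nn.ReLU(),
        )
        # Decoder maps the fused features back to a single-channel localization map.
        self.decoder = nn.Sequential(
            nn.Conv2d(embed_dim * 2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, residue_ids, cell_images):
        # residue_ids: (batch, seq_len) integer-encoded amino acids
        # cell_images: (batch, 3, H, W) nucleus / microtubule / ER stains
        seq_feat = self.residue_embed(residue_ids).mean(dim=1)   # (B, D)
        img_feat = self.img_encoder(cell_images)                 # (B, D, H, W)
        # Broadcast the sequence embedding over every pixel and concatenate.
        b, d, h, w = img_feat.shape
        seq_map = seq_feat.view(b, d, 1, 1).expand(b, d, h, w)
        fused = torch.cat([img_feat, seq_map], dim=1)
        return torch.sigmoid(self.decoder(fused))                # (B, 1, H, W) highlighted map
```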

“Different cells within a cell line exhibit different features, and our model is able to capture that nuance,” says Tseo.

The user inputs the sequence of amino acids that forms the protein and three cell staining images: one for the nucleus, one for the microtubules, and one for the endoplasmic reticulum. Then PUPS does the rest.
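Continuing the toy sketch above, a hypothetical call might look like the following; the example sequence, integer encoding, and random tensors standing in for the three stain images are placeholders, not real inputs to the published model.

```python
import torch

model = ToyLocalizationNet()
sequence = "MKTAYIAKQR"                                           # placeholder amino-acid sequence
residue_ids = torch.tensor([[ord(a) - ord("A") for a in sequence]])  # (1, 10) integer-encoded residues
stains = torch.rand(1, 3, 64, 64)                                 # nucleus / microtubule / ER channels (random here)
localization_map = model(residue_ids, stains)                     # (1, 1, 64, 64) predicted highlight
```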

A deeper understanding

The researchers employed a few tricks during the training process to teach PUPS how to combine information from each model in such a way that it can make an educated guess at a protein's location, even if it has never seen that protein before.

For instance, they assign the model a secondary task during training: to explicitly name the compartment of localization, such as the cell nucleus. This is done alongside the primary inpainting task to help the model learn more effectively.
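One way to picture such an auxiliary task is a simple multi-task loss. The sketch below is an assumption about how the two objectives could be combined, not the paper's actual objective; the function name, the binary cross-entropy on the map, and the 0.5 weighting are all arbitrary illustrative choices.

```python
import torch.nn.functional as F

def toy_training_loss(pred_map, target_map, compartment_logits, compartment_label):
    """Combine the main map-prediction loss with an auxiliary 'name the compartment' loss."""
    # Primary task: match the predicted localization map (values in [0, 1]) to the target map.
    map_loss = F.binary_cross_entropy(pred_map, target_map)
    # Auxiliary task: classify which compartment (e.g., nucleus) the protein localizes to.
    naming_loss = F.cross_entropy(compartment_logits, compartment_label)
    return map_loss + 0.5 * naming_loss  # the weighting here is an arbitrary choice
```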

A good analogy might be a teacher who asks their students to draw all the parts of a flower in addition to writing their names. This extra step was found to help the model improve its general understanding of the possible cellular compartments.

In addition, the fact that PUPS is trained on proteins and cell lines at the same time helps it develop a deeper understanding of where proteins tend to localize in a cell image.

PUPS can even understand, on its own, how different regions of a protein sequence contribute separately to its overall localization.

“Most other methods usually require you to have a stain of the protein first, so you've already seen it in your training data. Our approach is unique in that it can generalize across proteins and cell lines at the same time,” says Zhang.

Because PUPS can generalize to unseen proteins, it can capture changes in localization driven by unique protein mutations that are not included in the Human Protein Atlas.

The researchers verified that PUPS could predict the subcellular location of new proteins in unseen cell lines by conducting lab experiments and comparing the results. In addition, when compared to a baseline AI method, PUPS showed, on average, lower prediction error across the proteins they tested.

In the future, the researchers want to enhance PUPS so the model can understand protein-protein interactions and make localization predictions for multiple proteins within a cell. In the longer term, they want to enable PUPS to make predictions in terms of living human tissue, rather than cultured cells.

This research is funded by the Eric and Wendy Schmidt Center at the Broad Institute, the National Institutes of Health, the National Science Foundation, the Burroughs Wellcome Fund, the Searle Scholars Foundation, the Harvard Stem Cell Institute, the Merkin Institute, the Office of Naval Research, and the Department of Energy.
