Proteins are the workhorses that keep our cells running, and our cells contain many thousands of types of proteins, each with a specialized function. Scientists have long known that a protein's structure determines what it can do. More recently, researchers have come to appreciate that a protein's location is also crucial to its function. Cells are full of compartments that help organize their many residents. Along with the well-known organelles that adorn the pages of biology textbooks, these spaces include a variety of dynamic, membrane-less compartments that concentrate certain molecules together to perform shared functions. Knowing where a given protein localizes, and what it co-localizes with, can therefore be useful for better understanding that protein and its role in a healthy or diseased cell, but scientists have lacked a systematic way of predicting this information.
Meanwhile, protein structure has been studied for over half a century, culminating in the artificial intelligence tool AlphaFold, which can predict a protein's structure from its amino acid code, the linear string of building blocks that folds to create that structure. AlphaFold and models like it have become widely used tools in research.
Proteins also contain regions of amino acids that do not fold into a fixed structure but are instead important for helping proteins join dynamic compartments in the cell. MIT Professor Richard Young and colleagues wondered whether the code in these regions could be used to predict protein localization in the same way that other regions are used to predict structure. Other researchers have discovered some protein sequences that encode localization, and some have begun developing predictive models for protein localization. However, scientists did not know whether a protein's localization to any dynamic compartment could be predicted from its sequence, nor did they have a tool comparable to AlphaFold for predicting localization.
Now Young, also a member of the Whitehead Institute for Biological Research; Young lab postdoc Henry Kilgore; Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health in the Department of Electrical Engineering and Computer Science and a principal investigator at the Computer Science and Artificial Intelligence Laboratory (CSAIL); and colleagues have built such a model, which they call ProtGPS. In a paper published Feb. 6 in the journal Science, with first authors Kilgore and Barzilay lab graduate students Itamar Chinn, Peter Mikhael, and Ilan Mitnikov, the interdisciplinary team debuts the model. The researchers show that ProtGPS can predict to which of 12 known types of compartments a protein will localize, as well as whether a disease-associated mutation will change that localization. In addition, the research team developed a generative algorithm that can design new proteins to localize to specific compartments.
“My hope is that this is a first step toward a powerful platform that enables people studying proteins to do their research,” says Young, “and that it helps us understand how humans develop into complex organisms, how mutations disrupt those natural processes, and how to generate therapeutic hypotheses and design drugs to treat dysfunction in a cell.”
The researchers also validated many of the model's predictions with experimental tests in cells.
“It really excited me to be able to go from computational design all the way to trying these things in the lab,” says Barzilay. “There are a lot of exciting papers in this area of AI, but 99.9 percent of them never get tested in real systems. Thanks to our collaboration with the Young lab, we were able to test, and really learn how well our algorithm is doing.”
Development of the model
The researchers trained and tested ProtGPS on two batches of proteins with known localizations. They found that it could predict where proteins end up with high accuracy. They also tested how well ProtGPS could predict changes in protein localization caused by disease-associated mutations within a protein. Association studies have found that many mutations (changes to the sequence of a gene and its corresponding protein) contribute to or cause disease, but the ways in which those mutations lead to disease symptoms remain unknown.
Figuring out the mechanism by which a mutation contributes to disease is important, because researchers can then develop therapies that fix that mechanism, preventing or treating the disease. Young and colleagues suspected that many disease-associated mutations might contribute to disease by changing protein localization. For example, a mutation could make a protein unable to join a compartment containing essential partners.
They tested this hypothesis by feeding ProtGPS more than 200,000 proteins with disease-associated mutations, then asking it to predict where those mutated proteins would localize and to measure how much its prediction changed for a given protein from the normal to the mutated version. A large shift in the prediction indicates a likely change in localization.
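The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the team's method: `toy_predictor` is a made-up stand-in for a trained localization model, the two compartment labels are placeholders, and scoring the shift as the largest absolute change in any compartment score is an assumption, since the article does not say how the change was measured.

```python
from typing import Callable, Dict

# Any function mapping a sequence to per-compartment scores can be plugged in.
Predictor = Callable[[str], Dict[str, float]]


def toy_predictor(sequence: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained localization model (NOT ProtGPS).

    Scores "compartment_a" by the fraction of basic residues (K, R),
    purely so that the example produces numbers.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    frac_basic = sum(residue in "KR" for residue in sequence) / len(sequence)
    return {"compartment_a": frac_basic, "compartment_b": 1.0 - frac_basic}


def apply_substitution(sequence: str, position: int, new_residue: str) -> str:
    """Return a copy of the sequence with one residue replaced (0-based)."""
    if not 0 <= position < len(sequence):
        raise IndexError("position outside sequence")
    return sequence[:position] + new_residue + sequence[position + 1:]


def localization_shift(predict: Predictor, wild_type: str, mutant: str) -> float:
    """Largest absolute change in any compartment score (assumed metric)."""
    wt_scores = predict(wild_type)
    mut_scores = predict(mutant)
    return max(abs(mut_scores[name] - wt_scores[name]) for name in wt_scores)


# Made-up sequences: compare a "normal" protein with a single-residue mutant.
wild_type = "MKRKAAAAAA"
mutant = apply_substitution(wild_type, 1, "A")  # K -> A at index 1
shift = localization_shift(toy_predictor, wild_type, mutant)
print(f"shift = {shift:.2f}")
```

In a screen like the one described, a score of this kind would be computed for every normal and mutated pair, and the pairs with the largest shifts would be the natural candidates for follow-up experiments.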
The researchers found many cases in which a disease-associated mutation appeared to change a protein's localization. They tested 20 examples in cells, using fluorescence to compare where in the cell the normal protein and its mutated version ended up. The experiments confirmed ProtGPS's predictions. Altogether, the findings support the researchers' suspicion that mislocalization may be an underappreciated mechanism of disease, and demonstrate the value of ProtGPS as a tool for understanding disease and identifying new therapeutic avenues.
“The cell is such a complicated system, with so many components and complex networks of interactions,” says Mitnikov. “It's very interesting to think that with this approach we can perturb the system, see the outcome, and so drive the discovery of mechanisms in the cell, or even develop therapeutics based on that.”
The researchers hope that others will begin using ProtGPS in the same way they use predictive structural models such as AlphaFold, advancing various projects on protein function, dysfunction, and disease.
Moving beyond prediction to novel generation
The researchers were excited about the possible uses of their prediction model, but they also wanted it to go beyond predicting the localization of existing proteins and allow them to design completely new ones. The goal was for the model to create entirely new amino acid sequences that, when made in a cell, would localize to a desired location. Generating a new protein that can actually perform a function (in this case, localizing to a specific cellular compartment) is extremely difficult. To improve their model's chances of success, the researchers constrained their algorithm to design only proteins like those found in nature. This is a common approach in drug design, for logical reasons: nature has had billions of years to figure out which protein sequences work well and which do not.
Because of the collaboration with the Young lab, the machine learning team was able to test whether their protein generator worked. The model had good results. In one round, it generated 10 proteins intended to localize to the nucleus. When the researchers tested these proteins in the cell, they found that four of them localized strongly to the nucleus, and others may have had slight biases toward that location as well.
“The collaboration between our labs has been so generative for all of us,” says Mikhael. “We've learned how to speak each other's languages, in our case learned a lot about how cells work, and by having the chance to experimentally test our model, we've been able to figure out what we need to do to actually make the model work, and then make it work better.”
Being able to generate functional proteins in this way could improve researchers' ability to develop therapies. For example, if a drug must interact with a target that localizes to a certain compartment, researchers could use this model to design a drug that also localizes there. This should make the drug more effective and reduce side effects, because the drug will spend more time engaging with its target and less time interacting with other molecules, which causes off-target effects.
Members of the machine learning team are enthusiastic about the prospect of using what they have learned from this collaboration to design new proteins with functions beyond localization, which would expand the possibilities for therapeutic design and other applications.
“A lot of papers show that they can design a protein that can be expressed in a cell, but not that the protein has a particular function,” says Chinn. “We actually had functional protein design, and a relatively huge success rate compared to other generative models. That's really exciting to us, and something we would like to build on.”
All of the researchers involved see ProtGPS as an exciting beginning. They expect their tool to be used to learn more about the role of localization in protein function and of mislocalization in disease. In addition, they are interested in expanding the model's localization predictions to include more types of compartments, testing more therapeutic hypotheses, and designing increasingly functional proteins for therapies or other applications.
“Now that we know that this protein code for localization exists, and that machine learning models can make sense of that code and even create functional proteins using its logic, that opens the door to so many potential studies and applications,” says Kilgore.