As the capabilities of generative AI models have grown, you have probably seen how they can transform simple text prompts into hyperrealistic images and even extended video clips.
More recently, generative AI has shown potential in helping chemists and biologists explore static molecules, such as proteins and DNA. Models such as AlphaFold can predict molecular structures to accelerate drug discovery, and the MIT-assisted “RFdiffusion,” for example, can help design new proteins. One challenge, though, is that molecules are constantly moving and jiggling, which is important to model when constructing new proteins and drugs. Simulating these motions on a computer using physics, a technique known as molecular dynamics, can be very expensive, requiring billions of time steps.
As a step toward simulating these behaviors more efficiently, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and Department of Mathematics researchers have developed a generative model that learns from prior data. The team's system, called MDGen, can take a frame of a 3D molecule and simulate what will happen next like a video, connect separate stills, and even fill in missing frames. By hitting the “play button” on molecules, the tool could potentially help chemists design new molecules and closely study how well their drug prototypes for cancer and other diseases would interact with the molecular structures they are intended to affect.
Co-lead author Bowen Jing SM '22 says that MDGen is an early proof of concept, but it suggests the beginning of an exciting new research direction. “Early on, generative AI models produced somewhat simple videos, like a person blinking or a dog wagging its tail,” says Jing, a PhD student at CSAIL. “Fast forward a few years, and now we have amazing models like Sora or Veo that can be useful in all sorts of interesting ways. We hope to instill a similar vision for the molecular world, where dynamics trajectories are the videos. For example, you can give the model the first and 10th frame [...]”
The researchers say that MDGen represents a paradigm shift from previous comparable work with generative AI, in a way that enables much broader use cases. Previous approaches were “autoregressive,” meaning they relied on the previous frame to build the next one, starting from the very first frame to create a video sequence. In contrast, MDGen generates the frames in parallel with diffusion. This means MDGen can be used, for example, to connect frames at the endpoints or to “upsample” a low-frame-rate trajectory, in addition to pressing play on the initial frame.
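To make the difference between these use cases concrete, here is a minimal sketch of how each one could be described by which frames are given and which are left for the model to generate. This is an illustration only, not MDGen's actual code: the function name `conditioning_mask`, the task labels, and the `stride` parameter are all hypothetical, and the article does not describe how conditioning is implemented.

```python
from typing import List


def conditioning_mask(num_frames: int, task: str, stride: int = 4) -> List[bool]:
    """Return one flag per frame: True = frame is given, False = frame to generate.

    Hypothetical helper for illustration; names and task labels are assumptions.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames")
    if stride < 1:
        raise ValueError("stride must be positive")

    if task == "forward":
        # "Pressing play": only the initial frame is known.
        known = {0}
    elif task == "interpolate":
        # Connecting frames at the endpoints: first and last frames are known.
        known = {0, num_frames - 1}
    elif task == "upsample":
        # Low-frame-rate trajectory: every `stride`-th frame is known.
        known = set(range(0, num_frames, stride))
    else:
        raise ValueError(f"unknown task: {task}")

    return [i in known for i in range(num_frames)]


if __name__ == "__main__":
    for task in ("forward", "interpolate", "upsample"):
        print(task, conditioning_mask(9, task))
```

Because all unknown frames are produced together rather than one after another, the same generator can in principle serve all three masks; an autoregressive model, by contrast, naturally handles only the "forward" case.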
This work was presented in a paper at the Conference on Neural Information Processing Systems (NeurIPS) last December. Last summer, it received an award for its potential commercial impact at the International Conference on Machine Learning's ML4LMS Workshop.
Some small steps forward for molecular dynamics
In experiments, Jing and his colleagues found that MDGen's simulations were similar to running the physical simulations directly, while producing trajectories 10 to 100 times faster.
The team first tested their model's ability to take in a 3D frame of a molecule and generate the next 100 nanoseconds. Their system pieced together successive 10-nanosecond blocks for these generations to reach that duration. The team found that MDGen was able to compete with the accuracy of a baseline model while completing the video generation process in roughly a minute, a mere fraction of the three hours the baseline model needed to simulate the same dynamics.
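The block-by-block procedure described above can be sketched as a simple loop in which the last frame of each generated block seeds the next one. This is a rough illustration under stated assumptions, not the team's implementation: `rollout`, `generate_block`, and the toy generator are hypothetical stand-ins, and a real generator would be the trained model rather than a fixed arithmetic rule.

```python
from typing import Callable, List

Frame = List[float]  # stand-in for a 3D molecular frame (assumption)


def rollout(
    first_frame: Frame,
    generate_block: Callable[[Frame], List[Frame]],
    block_ns: int = 10,
    total_ns: int = 100,
) -> List[Frame]:
    """Chain fixed-length generated blocks until the target duration is reached."""
    if block_ns <= 0 or total_ns % block_ns != 0:
        raise ValueError("total_ns must be a positive multiple of block_ns")

    trajectory = [first_frame]
    for _ in range(total_ns // block_ns):
        # Condition each new block on the most recent frame generated so far.
        trajectory.extend(generate_block(trajectory[-1]))
    return trajectory


def toy_generate_block(last_frame: Frame, frames_per_block: int = 10) -> List[Frame]:
    """Placeholder 'model': shifts every coordinate by one unit per frame."""
    return [[x + k + 1 for x in last_frame] for k in range(frames_per_block)]


if __name__ == "__main__":
    traj = rollout([0.0], toy_generate_block)
    print(len(traj), traj[-1])
```

With the defaults, ten blocks of 10 nanoseconds each are chained to cover 100 nanoseconds, matching the durations quoted in the article; the number of frames per block is an arbitrary choice for the toy example.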
When given the first and last frame of a one-nanosecond sequence, MDGen also modeled the steps in between. The researchers' system demonstrated a degree of realism in over 100,000 different predictions: it simulated more likely molecular trajectories than its baselines on clips shorter than 100 nanoseconds. In these tests, MDGen also showed an ability to generalize to peptides it had not seen before.
MDGen's capabilities also include simulating frames within frames, “upsampling” the steps between each nanosecond to capture faster molecular phenomena more adequately. It can even “inpaint” structures of molecules, restoring information about them that was removed. These features could eventually be used by researchers to design proteins based on a specification of how different parts of the molecule should move.
Playing with protein dynamics
Jing and co-lead author Hannes Stärk say that MDGen is an early sign of progress toward generating molecular dynamics more efficiently. Still, they lack the data to make these models immediately impactful in designing drugs or molecules that induce the movements chemists will want to see in a target structure.
The researchers aim to scale MDGen from modeling molecules to predicting how proteins will change over time. “Currently, we're using toy systems,” says Stärk, also a PhD student at CSAIL. “To enhance MDGen's predictive capabilities to model proteins, we'll need to build on the current architecture and data available. We don't have a YouTube-scale repository for those types of simulations yet, so we're hoping to develop a separate machine-learning method that can speed up the data collection process for our model.”
For now, MDGen presents an encouraging path forward in modeling molecular changes invisible to the naked eye. Chemists could also use these simulations to delve deeper into the behavior of medicine prototypes for diseases such as cancer or tuberculosis.
“Machine learning methods that learn from physical simulation represent a burgeoning new frontier in AI for science,” says Bonnie Berger, MIT Simons Professor of Mathematics, CSAIL principal investigator, and senior author on the paper. “MDGen is a versatile, multipurpose modeling framework that connects these two domains, and we're very excited to share our early models in this direction.”
“Sampling realistic transition paths between molecular states is a major challenge,” says fellow senior author Tommi Jaakkola, who is the MIT Thomas Siebel Professor of Electrical Engineering and Computer Science and the Institute for Data, Systems, and Society, and a CSAIL principal investigator. “This early work shows how we might begin to address such challenges by shifting generative modeling to full simulation runs.”
Researchers across the field of bioinformatics have heralded this system for its ability to simulate molecular transformations. “MDGen models molecular dynamics simulations as a joint distribution of structural embeddings, capturing molecular movements between discrete time steps,” says Chalmers University of Technology professor Simon Olsson, who was not involved in the research. “Leveraging a masked learning objective, MDGen enables innovative use cases such as transition path sampling, drawing analogies to inpainting trajectories connecting metastable phases.”
The researchers' work on MDGen was supported, in part, by the National Institute of General Medical Sciences, the U.S. Department of Energy, the National Science Foundation, the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium, the Abdul Latif Jameel Clinic for Machine Learning in Health, the Defense Threat Reduction Agency, and the Defense Advanced Research Projects Agency.