New algorithms enable efficient machine learning with symmetric data (MIT News)

If you rotate an image of a molecular structure, a person can tell that the rotated image still shows the same molecule, but a machine learning model may treat it as an entirely new data point. In computer-science terms, the molecule is "symmetric": its fundamental structure remains the same when it undergoes certain transformations, such as rotation.
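As a minimal illustration (not taken from the paper), the Python sketch below rotates a toy 2D "molecule" and shows why a model that consumes raw coordinates sees the rotated copy as a different input, while a rotation-invariant feature, such as the matrix of pairwise distances, does not change at all.

```python
import numpy as np

# A toy "molecule": three atoms in the plane, one coordinate pair per atom.
molecule = np.array([[0.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.5]])

def rotate(points, angle_rad):
    """Rotate a set of 2D points about the origin."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, -s],
                         [s,  c]])
    return points @ rotation.T

rotated = rotate(molecule, np.pi / 3)

# Raw coordinates change under rotation, so a model fed these features
# would treat the rotated molecule as a brand-new data point.
print(np.allclose(molecule, rotated))                      # False

# Pairwise distances are rotation-invariant: same molecule, same features.
def pairwise_distances(points):
    diffs = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diffs, axis=-1)

print(np.allclose(pairwise_distances(molecule),
                  pairwise_distances(rotated)))            # True
```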

If a drug-discovery model does not understand symmetry, it can make inaccurate predictions about molecular properties. Yet despite some empirical successes, it has been unclear whether there is a computationally efficient way to train a good model that is guaranteed to respect symmetry.

A new study by MIT researchers answers this question, demonstrating the first method for machine learning with symmetry that is provably efficient in terms of both the amount of computation and the amount of data required.

These results clarify a fundamental question, and they could help researchers build more powerful machine learning models that are designed to handle symmetry. Such models would be useful in a range of applications, from discovering new materials to identifying astronomical anomalies to untangling complex climate patterns.

"These symmetries are important because they are a kind of information that nature is telling us about the data, and we should take it into account in our machine learning models. We have now shown that it is possible to do machine learning with symmetric data in an efficient way," says Behrooz Tahmasebi, an MIT graduate student and co-lead author of the study.

He is joined on the paper by co-lead author and MIT graduate student Ashkan Soleymani; Stefanie Jegelka, a professor of electrical engineering and computer science (EECS) and a member of the Institute for Data, Systems, and Society (IDSS) and the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author Patrick Jaillet, the Dugald C. Jackson Professor of Electrical Engineering and Computer Science and a principal investigator in the Laboratory for Information and Decision Systems (LIDS). The research was recently presented at the International Conference on Machine Learning.

Studying symmetry

Symmetric data appear in many domains, especially the natural sciences and physics. A model that recognizes symmetries is able to identify an object, such as a car, no matter where that object appears in an image.

Unless a machine learning model is designed to handle symmetry, it can be less accurate and prone to failure when it encounters new symmetric data in real-world situations. On the other hand, models that take advantage of symmetry can be faster and require less training data.

But training a model to process symmetric data is no easy task.

One common approach is called data augmentation, in which researchers transform each symmetric data point into many data points to help the model generalize better to new data. For instance, one can rotate a molecular structure many times to produce new training data, but if researchers want the resulting model to be guaranteed to respect symmetry, this can be computationally prohibitive.
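As a hedged sketch of generic rotation-based augmentation (not the authors' method), the snippet below expands each training example into many rotated copies. The label stays the same, but the training set grows by the number of rotations used, which is the computational cost the article alludes to.

```python
import numpy as np

def rotate(points, angle_rad):
    """Rotate 2D points about the origin."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return points @ np.array([[c, -s], [s, c]]).T

def augment_with_rotations(dataset, labels, num_rotations=36):
    """Expand each training example into `num_rotations` rotated copies.

    The label is unchanged because rotation does not alter the molecule,
    but the training set grows by a factor of `num_rotations`.
    """
    augmented_x, augmented_y = [], []
    for points, label in zip(dataset, labels):
        for k in range(num_rotations):
            angle = 2.0 * np.pi * k / num_rotations
            augmented_x.append(rotate(points, angle))
            augmented_y.append(label)
    return np.stack(augmented_x), np.array(augmented_y)

# Toy usage: one 3-atom "molecule" becomes 36 training points.
molecule = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.5]])
X, y = augment_with_rotations([molecule], [1.0])
print(X.shape, y.shape)  # (36, 3, 2) (36,)
```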

An alternative approach is to encode symmetry directly into the model architecture. A well-known example is the graph neural network (GNN), which inherently handles symmetric data because of the way it is designed.
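To illustrate what "symmetry by design" means, here is a small sketch of a generic message-passing layer (an assumption for illustration, not the architecture studied in the paper). Because each node aggregates its neighbors with a sum and the final readout is also a sum, relabeling the graph's nodes leaves the output unchanged.

```python
import numpy as np

def message_passing_readout(adjacency, features, weight):
    """One round of sum-aggregation followed by a permutation-invariant readout."""
    messages = adjacency @ features          # sum each node's neighbor features
    hidden = np.tanh(messages @ weight)      # shared per-node transformation
    return hidden.sum(axis=0)                # order-independent graph readout

rng = np.random.default_rng(0)
n, d = 5, 3
adjacency = rng.integers(0, 2, size=(n, n))
adjacency = np.triu(adjacency, 1)
adjacency = adjacency + adjacency.T          # undirected toy graph
features = rng.normal(size=(n, d))
weight = rng.normal(size=(d, d))

# Relabel the nodes with a random permutation: same graph, different ordering.
perm = rng.permutation(n)
adjacency_perm = adjacency[perm][:, perm]
features_perm = features[perm]

out = message_passing_readout(adjacency, features, weight)
out_perm = message_passing_readout(adjacency_perm, features_perm, weight)
print(np.allclose(out, out_perm))  # True: the readout ignores node ordering
```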

"Graph neural networks are fast and efficient, and they take care of symmetry quite well, but nobody really knows what these models are learning or why they work. Understanding GNNs is a main motivation of our work, so we started with a theoretical evaluation of what happens when data are symmetric," says Tahmasebi.

The researchers examined the statistical-computational tradeoff in machine learning with symmetric data. This tradeoff means that methods requiring less data can be more computationally expensive, so researchers must find the right balance.

Building on this theoretical evaluation, the researchers designed an efficient algorithm for machine learning with symmetric data.

Mathematical combinations

To do this, they borrowed ideas from algebra to shrink and simplify the problem. Then they reformulated the problem using ideas from geometry that effectively capture symmetry.

Finally, they combined the algebra and the geometry into an optimization problem that can be solved efficiently, resulting in their new algorithm.

"Most of the theory and applications were focusing on either algebra or geometry. Here we just combined them," Tahmasebi says.

The algorithm requires fewer data samples for training than classical approaches, which would improve a model's accuracy and its ability to adapt to new applications.

By proving that efficient machine learning with symmetry is possible, and by showing how it can be done, these results could lead to new neural network architectures that are more accurate and less resource-intensive than current models.

Researchers could also use this analysis as a starting point for examining the inner workings of GNNs and how their operations differ from the algorithm the MIT team developed.

"Once we know that better, we can design neural network architectures that are more interpretable, more robust, and more efficient," adds Soleymani.

This research is funded, in part, by the National Research Foundation of Singapore, DSO National Laboratories of Singapore, the U.S. Office of Naval Research, the U.S. National Science Foundation, and an Alexander von Humboldt Professorship.
