This “smart coach” helps LLMs switch between text and code

Large language models (LLMs) excel at using textual reasoning to understand the context of a document and provide a logical answer about its contents. But these same LLMs often struggle to correctly answer even the simplest math problems.

Textual reasoning is usually a less-than-ideal way to deliberate over computational or algorithmic tasks. While some LLMs can generate code, such as Python, to handle symbolic queries, the models don’t always know when to use code, or what kind of code would work best.

LLMs, it seems, may need a trainer to steer them toward the best technique.

Enter CodeSteer, a smart assistant developed by MIT researchers that guides an LLM to switch between code and text generation until it correctly answers a query.

CodeSteer, itself a smaller LLM, automatically generates a series of prompts to iteratively steer a larger LLM. It reviews the model’s current and previous answers after each round and provides guidance for how it can fix or refine that solution until it deems the answer correct.

The researchers found that augmenting a larger LLM with CodeSteer boosted its accuracy on symbolic tasks, like multiplying numbers, playing Sudoku, and stacking blocks, by more than 30 percent. It also enabled less sophisticated models to outperform more advanced models with enhanced reasoning skills.

This advance could improve the problem-solving capabilities of LLMs for complex tasks that are especially difficult to solve with textual reasoning alone, such as generating paths for robots in uncertain environments or scheduling shipments in an international supply chain.

“There is a race to develop better and better models that are capable of doing everything, but we’ve taken a complementary approach. Researchers have spent years developing effective technologies and tools to tackle problems in many domains. We want to enable LLMs to select the right tools and methods, and make use of the expertise of others to enhance their own capabilities,” says Chuchu Fan, an MIT associate professor of aeronautics and astronautics (AeroAstro) and principal investigator in the MIT Laboratory for Information and Decision Systems (LIDS).

Fan, the senior author of the study, is joined on a paper about the work by LIDS graduate student Yongchao Chen; AeroAstro graduate student Yilun Hao; University of Illinois at Urbana-Champaign graduate student Yueying Liu; and MIT-IBM Watson AI Lab research scientist Yang Zhang. The research will be presented at the International Conference on Machine Learning.

An LLM “trainer”

Ask an LLM which number is bigger, 9.11 or 9.9, and it will often answer incorrectly when it relies on textual reasoning. But ask it to use code to answer the same question, and it can generate and execute a Python script to compare the two numbers, easily solving the problem.
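A minimal sketch of the kind of script an LLM might generate for this query (the use of Python’s decimal module here is illustrative, not taken from the paper):

```python
# Compare two decimal numbers exactly. Plain floats would also work
# for this comparison; Decimal just sidesteps any binary
# floating-point surprises.
from decimal import Decimal

a = Decimal("9.11")
b = Decimal("9.9")

# 9.9 > 9.11, even though a digit-by-digit text reading of "11" vs "9"
# can mislead a model into the opposite conclusion.
print(max(a, b))  # prints 9.9
```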

Initially trained to understand and predict human language, LLMs are more likely to answer queries using text, even when code would be more effective. And while they may learn to generate code through fine-tuning, these models often generate an incorrect or less efficient version of the code.

Rather than trying to retrain a powerful LLM like GPT-4 or Claude to improve these capabilities, the MIT researchers fine-tune a smaller, lightweight LLM to guide a larger model between text and code. Fine-tuning the smaller model doesn’t change the larger LLM, so there is no risk it would undermine the larger model’s other abilities.

“We were also inspired by humans. In sports, a trainer may not be better than the star athlete on the team, but the trainer can still give helpful suggestions to guide the athlete. This steering method works for LLMs, too,” says Chen.

That trainer, CodeSteer, works in conjunction with the larger LLM. It first reviews a query and determines whether text or code is suitable for the problem, and which sort of code would be best.

Then it generates a prompt for the larger LLM, telling it to use a coding method or textual reasoning to answer the query. The larger model follows this prompt to answer the query and sends the result back to CodeSteer, which reviews it.

If the answer is not correct, CodeSteer will continue prompting the LLM to try different things that might fix the problem, such as incorporating a search algorithm or constraint into its Python code, until the answer is correct.
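In outline, this steering loop can be pictured with a small, self-contained sketch. The two helper functions below are toy stand-ins for the real model calls, not the project’s actual API:

```python
def query_llm(prompt: str) -> str:
    """Toy stand-in for the larger LLM: it answers '9.11' from textual
    reasoning, and corrects itself to '9.9' once told to use code."""
    return "9.9" if "use Python code" in prompt else "9.11"

def review(question: str, answer: str) -> tuple[bool, str]:
    """Toy stand-in for the CodeSteer trainer: check the answer and, if
    it is wrong, craft a follow-up prompt steering the model toward code."""
    if answer == "9.9":
        return True, ""
    return False, f"{question} This time, use Python code to compare."

question = "Which number is bigger, 9.11 or 9.9?"
prompt, answer = question, ""
for _ in range(5):               # bound the number of steering rounds
    answer = query_llm(prompt)
    correct, follow_up = review(question, answer)
    if correct:                  # the trainer is satisfied; stop iterating
        break
    prompt = follow_up           # steer the next round toward code
print(answer)                    # prints 9.9 after one correction
```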

“We found that oftentimes, a larger LLM will try to be lazy and use shorter, less efficient code that will not carry the correct symbolic calculation. We designed CodeSteer to avoid this phenomenon,” says Chen.

A symbolic checker evaluates the code’s complexity and sends a signal to CodeSteer if it is too simple or inefficient. The researchers also incorporate a self-answer checker into CodeSteer, which prompts the LLM to generate code that calculates the answer, to verify that it is correct.
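One way to picture the complexity signal is as a heuristic over the structure of the generated code. The heuristic below is an illustrative assumption, not the paper’s actual checker:

```python
import ast

def looks_too_simple(code: str) -> bool:
    """Flag 'lazy' code: no loops, branches, or function definitions,
    plus very few syntax nodes, often means the model hard-coded an
    answer rather than computing it. Illustrative heuristic only; the
    article does not specify the symbolic checker at this level."""
    nodes = list(ast.walk(ast.parse(code)))
    has_control_flow = any(
        isinstance(n, (ast.For, ast.While, ast.If, ast.FunctionDef))
        for n in nodes
    )
    return not has_control_flow and len(nodes) < 15

print(looks_too_simple("print(42)"))   # True: a hard-coded answer
print(looks_too_simple(
    "total = 0\nfor i in range(10):\n    total += i\nprint(total)"
))                                     # False: an actual computation
```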

Tackling complex tasks

As the researchers designed CodeSteer, they couldn’t find suitable symbolic datasets to fine-tune and test the model, since many existing benchmarks don’t indicate whether a certain query could be best solved with text or code.

So, they gathered a corpus of 37 complex symbolic tasks, including spatial reasoning, mathematics, order reasoning, and optimization, and built their own dataset, called SymBench. They implemented a fine-tuning approach that leverages SymBench to maximize the performance of CodeSteer.

In their experiments, CodeSteer outperformed all nine baseline methods they evaluated, and boosted average accuracy from 53.3 percent to 86.4 percent. It maintains similar performance even on unseen tasks, and on a variety of LLMs.

In addition, a general-purpose model augmented with CodeSteer can achieve higher accuracy than state-of-the-art models designed to focus on complex reasoning and planning, while requiring much less computation.

“Our method uses an LLM’s own capabilities. By augmenting an LLM with the ability to smartly use coding, we can take a model that is already very strong and improve its performance even more,” says Chen.

In the future, the researchers want to streamline CodeSteer to speed up its iterative prompting process. In addition, they are studying how to effectively fine-tune a unified model with the ability to switch between textual reasoning and code generation, rather than relying on a separate assistant.

“The authors present an elegant solution to the critical challenge of tool utilization in LLMs. This simple yet impactful method enables state-of-the-art LLMs to achieve significant performance improvements without requiring direct fine-tuning,” says Jinsung Yoon, a staff research scientist at Google Cloud AI, who was not involved with this work. “This research represents a substantial contribution that promises to significantly enhance the application of LLMs to a diverse range of tasks with which they currently struggle.”

“Their success in training a smaller, specialized model to strategically guide larger, advanced models is particularly impactful,” adds Chi Wang, a senior staff scientist at Google DeepMind who was not involved with this work. “This intelligent collaboration among diverse AI agents paves the way for more robust and versatile applications in complex real-world scenarios.”

This research is supported, in part, by the U.S. Office of Naval Research and the MIT-IBM Watson AI Lab.
