Helping AI agents search for the best results from large language models

Whether you're a scientist brainstorming research ideas or a CEO hoping to automate a task in human resources or finance, you'll find that AI tools are becoming the assistants you didn't know you needed. In particular, many professionals are reaching for semi-autonomous software systems called AI agents, which can call on artificial intelligence at specific points to solve problems and complete tasks.

AI agents are particularly effective when they use large language models (LLMs) because these systems are powerful, efficient, and adaptable. One way to program such a system is to describe in code what it should do (its “workflow”), including when it should call an LLM. If you're a software developer trying to modernize old code by porting it to a newer programming language for better performance and security, for instance, you could build a system that uses an LLM to translate the codebase one file at a time, testing each file as it goes.
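As a rough illustration, such a workflow might look like the sketch below. The helpers `call_llm` and `run_tests` are hypothetical stand-ins for an LLM API call and the project's test suite; nothing here is specific to EnCompass.

```python
from pathlib import Path

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    raise NotImplementedError

def run_tests(py_file: Path) -> bool:
    """Hypothetical stand-in for running the tests that cover one file."""
    raise NotImplementedError

def translate_repo(java_files: list[Path], out_dir: Path) -> None:
    # Translate the codebase one file at a time, testing each file as it goes.
    for java_file in java_files:
        prompt = f"Translate this Java file to Python:\n{java_file.read_text()}"
        py_source = call_llm(prompt)
        py_file = out_dir / java_file.with_suffix(".py").name
        py_file.write_text(py_source)
        if not run_tests(py_file):
            # The LLM made a mistake; without search, the agent just fails here.
            raise RuntimeError(f"Translation of {java_file} failed its tests")
```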

But what happens when the LLM makes a mistake? You'll want the agent to back off and try again, using lessons learned from previous mistakes. Coding this yourself can take as much effort as implementing the original agent: if your codebase-translation system contains thousands of lines of code, you might need thousands of lines of changes or additions to support rolling back when the LLM errs.
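Handled by hand, even a simple version of that retry-and-rollback logic adds boilerplate around every LLM call site. Here is a minimal sketch, reusing the hypothetical helpers above; `run_tests_with_report` is likewise an assumed variant of `run_tests` that also returns the test errors.

```python
def translate_with_retries(java_file: Path, out_dir: Path, max_attempts: int = 3) -> Path:
    """Hand-rolled rollback: retry a failed translation, feeding errors back.

    This is the kind of per-call-site boilerplate that multiplies
    across a large agent program.
    """
    feedback = ""
    for _ in range(max_attempts):
        prompt = f"Translate this Java file to Python:\n{java_file.read_text()}"
        if feedback:
            prompt += f"\nA previous attempt failed these tests:\n{feedback}"
        py_source = call_llm(prompt)
        py_file = out_dir / java_file.with_suffix(".py").name
        py_file.write_text(py_source)
        passed, feedback = run_tests_with_report(py_file)  # assumed helper
        if passed:
            return py_file
        py_file.unlink()  # roll back the failed attempt before retrying
    raise RuntimeError(f"Could not translate {java_file} in {max_attempts} attempts")
```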

To save developers time and effort, researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and Asari AI developed a platform called “EnCompass”.

With EnCompass, you no longer have to make these changes yourself. Instead, when EnCompass runs your program, it automatically rolls back if an LLM makes a mistake. EnCompass can also create runtime clones of a program to run multiple trials in parallel. Broadly speaking, EnCompass searches over the possible execution paths your agent could take, given the different possible outcomes of each LLM call, looking for the path that leads to the best solution.

To use EnCompass, you simply mark the locations where you might want the program's runtime to be undone or cloned, and record any information that might be useful for exploring the different possible execution paths (the “search strategy”). You then specify the search strategy separately: you can use one provided by EnCompass or, if you prefer, implement your own.

“With EnCompass, we separated the search strategy from the core AI agent workflow,” says lead author Zhening Li ’25, MEng ’25, a doctoral student in electrical engineering and computer science (EECS) at MIT, a CSAIL researcher, and a research consultant at Asari AI. “Our framework allows developers to easily experiment with different search strategies to find the one that will provide the best performance for the AI agent.”

So far, EnCompass has been applied to agents implemented as Python programs that call LLMs, where it showed noticeable code savings. EnCompass reduced the coding effort of implementing search by up to 80 percent across agents, including one that translates code repositories and one that discovers principles of digital network transformations. In the future, EnCompass could enable agents to take on large-scale tasks such as managing massive codebases, designing and conducting science experiments, and creating blueprints for rockets and other equipment.

Branching out

When programming an agent, you annotate specific operations, such as LLM calls, whose results may vary. These annotated locations are called “branching points.” If you imagine your agent program generating a single story, adding branching points turns that story into a choose-your-own-adventure game, where each branching point is a place where the story can fork into multiple possible plots.
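Purely as an illustration of the idea, the sketch below marks the LLM call in the earlier translation example as a branching point. The names `branch_point` and `record` are invented for this sketch and are not EnCompass's actual API; the other helpers are the hypothetical ones from the earlier sketches.

```python
def branch_point(llm_call, prompt):
    # A real framework would register a resumable checkpoint here so the
    # runtime can be cloned or rolled back; this stub just forwards the call.
    return llm_call(prompt)

def record(score):
    # A real framework would hand this score to the search strategy.
    pass

def translate_file(java_file, out_dir):
    # Mark the LLM call as a branching point: its output can vary, so the
    # story can fork here into multiple possible plots.
    prompt = f"Translate this Java file to Python:\n{java_file.read_text()}"
    py_source = branch_point(call_llm, prompt)
    py_file = out_dir / java_file.with_suffix(".py").name
    py_file.write_text(py_source)
    # Record information the search strategy can use to compare branches.
    record(score=run_tests(py_file))
    return py_file
```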

You can then choose the strategy EnCompass uses to navigate this choose-your-own-adventure in search of the best possible ending. That may involve running parallel threads of execution or reverting to a previous branching point when the agent hits a dead end.

Users can plug in one of several popular search strategies provided by EnCompass, or define a custom strategy of their own. For example, you can choose Monte Carlo tree search, which builds a search tree by balancing exploration and exploitation, or beam search, which retains a few of the best candidates at each step. EnCompass makes it easy to experiment with different approaches to find the strategy that maximizes the likelihood of successfully completing a task.
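Beam search itself is a standard algorithm. Here is a generic sketch of the idea, independent of EnCompass, where `expand` is a placeholder for a function that proposes scored continuations of a partial solution, such as alternative LLM outputs at a branching point.

```python
def beam_search(start, expand, steps, width=3):
    """Generic beam search: keep only the `width` best partial solutions.

    `expand` is a placeholder mapping a partial solution to a list of
    (score, next_state) continuations.
    """
    beam = [(0.0, start)]  # (cumulative score, partial solution)
    for _ in range(steps):
        candidates = [
            (score + step_score, next_state)
            for score, state in beam
            for step_score, next_state in expand(state)
        ]
        if not candidates:
            break
        # Retain only the few best candidates from this step.
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:width]
    return max(beam, key=lambda c: c[0])[1]
```

Monte Carlo tree search, by contrast, replaces this fixed-width pruning with a sampled tree that balances exploring new branches against exploiting promising ones.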

EnCompass's coding efficiency

So how efficient is EnCompass at adding search to agent programs? According to the researchers' findings, the framework drastically reduced the amount of code developers had to add to their agent programs to implement search, helping them experiment with different strategies to find the one that works best.

For example, the researchers used EnCompass in an agent that translates a code repository from Java, a language commonly used for applications and enterprise software, to Python. They found that implementing search with EnCompass, which mainly involved adding annotations for branching points and annotations recording how well each step went, required 348 fewer lines of code (about 82 percent less) than a manual implementation. They also showed how EnCompass let them easily try out different search strategies. The best turned out to be a two-level beam search algorithm, which achieved a 15 to 40 percent increase in accuracy across five different repositories, with a search budget of 16 times the LLM calls made by the search-free agent.

“As LLM programs become more and more integral to everyday software, it becomes increasingly important to understand how to effectively create software that leverages their strengths and circumvents their limitations,” says co-author Armando Solar-Lezama, a professor in EECS at MIT and a principal investigator at CSAIL. “EnCompass is an important step in this direction.”

The researchers note that EnCompass is intended for agents whose program lays out the workflow steps at a high level; the current version of the framework is less applicable to agents controlled entirely by the LLM. “For these agents, instead of having a program lay out the steps and then using the LLM to complete those steps, everything is decided by the LLM itself,” Li says. “There is no programmed workflow, so inference-time search happens on the fly over whatever the LLM comes up with. In this case, there is less need for a tool like EnCompass, which augments how a program executes with search and rollback.”

Li and his colleagues plan to expand EnCompass into a more general search platform for AI agents. They also plan to test the system on more complex tasks to ready it for real-world applications, including in business settings. In addition, they are evaluating how well EnCompass helps agents collaborate with humans on tasks such as brainstorming hardware designs or translating even larger codebases. For now, EnCompass is a powerful building block that lets people tinker with AI agents more easily to improve their performance.

“EnCompass comes at an opportune time, as agents using artificial intelligence and search-based techniques are beginning to transform software engineering workflows,” says Yiming Yang, a professor at Carnegie Mellon University who was not involved in the research. “By cleanly separating an agent's programming logic from its inference-time search strategy, the platform offers a principled way to explore how structured search can improve code generation, translation, and analysis. This abstraction provides a solid foundation for a more systematic and robust approach to search-driven software development.”

Li and Solar-Lezama wrote the paper with two Asari AI researchers: Caltech professor Yisong Yue, an advisor at the company, and senior author Stephan Zheng, the company's founder and CEO. Their work was supported by Asari AI.

The team's work was presented in December at the Conference on Neural Information Processing Systems (NeurIPS).
