Charting the future of artificial intelligence, from safer solutions to faster thinking | MIT News

Adoption of new tools and technologies tends to occur when users largely perceive them to be reliable, accessible, and a cost-effective improvement over existing methods and workflows. From learning when to trust a model's predictions to reasoning more effectively over knowledge bases, five graduate students in the inaugural class of the MIT-IBM Watson AI Lab summer program are using cutting-edge resources to address AI pain points and to create new features and capabilities that promote the usability and adoption of AI. Together, the students and their mentors are working at a crossroads where practical, technically rigorous research yields more reliable and useful models across a variety of fields.

By building probes, routers, new attention mechanisms, synthetic datasets, and program-synthesis pipelines, the students' work spans safety, inference efficiency, multimodal data, and knowledge-grounded reasoning. Their techniques emphasize scaling and integration, always with an eye toward real-world impact.

Learning when to trust

For MIT mathematics graduate student Andrey Bryutkin, the trustworthiness of models is a priority. He looks for internal structures within problems, such as a system's governing equations and conservation laws, to understand how to leverage them for more reliable and robust solutions. Armed with this knowledge and working with the lab, Bryutkin developed a method for examining the behavior of large language models (LLMs). Together with Veronika Thost of IBM Research and Marzyeh Ghassemi, an associate professor and the Germeshausen Career Development Professor in MIT's Department of Electrical Engineering and Computer Science (EECS) and a member of the Institute for Medical Engineering and Science and the Laboratory for Information and Decision Systems, Bryutkin explored the "uncertainty of uncertainty" in LLMs.

Classically, small feed-forward neural networks, two to three layers deep, called probes are trained alongside LLMs and used to flag unreliable responses from the larger model to developers; however, these classifiers can also produce false negatives, and they provide only point estimates that do not offer much information about when the LLM fails. Examining safe/unsafe prompts and question-answering tasks, the MIT-IBM team used prompt-label pairs, along with the LLM's hidden states, such as activation vectors and last tokens, to measure gradient scores, prompt sensitivity, and out-of-distribution behavior in order to determine how reliable a probe is and to identify regions of the data that are difficult to predict. Their method also helps identify potential labeling noise. This is a critical function, because the trustworthiness of an AI system depends entirely on the quality and accuracy of the labeled data on which it is built. More accurate and consistent probes are especially important for domains with critical data, in applications such as IBM's Granite Guardian family of models.
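As a rough illustration only (not the team's actual method), a probe of this kind can be sketched as a small classifier trained on hidden-state vectors. Here, synthetic activations stand in for a real LLM's, and a two-layer network in NumPy learns to flag "unreliable" responses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for last-token activation vectors from an LLM
# (hypothetical data: labels come from a hidden linear rule).
d = 16                               # hidden-state dimension
X = rng.normal(size=(200, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)   # 1 = "unreliable response" label

# Two-layer feed-forward probe: d -> 8 -> 1, trained with plain gradient descent.
W1 = rng.normal(size=(d, 8)) * 0.1
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)) * 0.1
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(500):
    h = np.tanh(X @ W1 + b1)                 # hidden layer
    p = sigmoid(h @ W2 + b2).ravel()         # P(response is unreliable)
    grad_logit = (p - y)[:, None] / len(X)   # d(cross-entropy)/d(logit)
    gW2 = h.T @ grad_logit
    gb2 = grad_logit.sum(axis=0)
    gh = grad_logit @ W2.T * (1 - h**2)      # backprop through tanh
    gW1 = X.T @ gh
    gb1 = gh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Final pass with the trained weights.
h = np.tanh(X @ W1 + b1)
p = sigmoid(h @ W2 + b2).ravel()
acc = ((p > 0.5) == (y == 1)).mean()
print(f"probe training accuracy: {acc:.2f}")
```

As the article notes, such a probe outputs only a point estimate per prompt; the team's contribution is characterizing when those estimates themselves can be trusted.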

Another way to ensure reliable responses to LLM queries is to supplement them with external, trusted knowledge bases, which helps eliminate hallucinations. For structured data, such as social media connections, financial transactions, or corporate databases, knowledge graphs (KGs) are a natural fit; however, communication between LLMs and KGs often relies on fixed, multi-agent pipelines that are computationally inefficient and expensive. To address this, physics graduate student Jinyeop Song, together with lab researchers Yada Zhu of IBM Research and EECS Associate Professor Julian Shun, created a single-agent, multi-turn reinforcement learning framework that streamlines the process. Here, the group designed an API server hosting the Freebase and Wikidata KGs, which contain general knowledge from the web, and an LLM agent that issues targeted retrieval actions to fetch relevant information from the server. Turn by turn, the agent appends the data gathered from the KG to its context and responds to the query. Crucially, the system uses reinforcement learning to train itself to deliver answers that strike a balance between accuracy and completeness. The framework pairs an API server with a single reinforcement learning agent to orchestrate data-grounded reasoning with improved accuracy, transparency, efficiency, and transferability.
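The shape of such a multi-turn retrieval loop can be sketched in miniature. In this toy version, a dictionary stands in for the KG API server and a fixed hop plan stands in for the trained agent policy; all entities and relations are hypothetical, and nothing here reflects the framework's actual interfaces:

```python
# Toy knowledge graph: (entity, relation) -> entity.
KG = {
    ("MIT", "located_in"): "Cambridge",
    ("Cambridge", "state"): "Massachusetts",
}

def kg_lookup(entity, relation):
    """Stand-in for one retrieval action against the KG API server."""
    return KG.get((entity, relation))

def answer(question_plan):
    """Single-agent, multi-turn loop: each turn issues one retrieval
    action and appends the result to the running context."""
    context = []
    entity = question_plan["start"]
    for relation in question_plan["hops"]:
        result = kg_lookup(entity, relation)
        if result is None:
            break                            # dead end: stop retrieving
        context.append((entity, relation, result))
        entity = result                      # pivot to the retrieved entity
    return entity, context

# "In which state is MIT?" resolved as a two-hop retrieval.
final, ctx = answer({"start": "MIT", "hops": ["located_in", "state"]})
print(final)   # Massachusetts
```

In the real system, reinforcement learning replaces the fixed hop plan: the agent is rewarded for answers that balance accuracy against completeness, so it learns which retrieval actions to take next.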

Spend your calculations wisely

The timeliness and completeness of a model's responses matter nearly as much as their accuracy. This is especially true when handling long input texts, and those whose elements, such as the subject of a story, evolve over time; so EECS graduate student Songlin Yang is re-engineering what models can do at each step of inference. Focusing on the limitations of transformers, like those underlying LLMs, Rameswar Panda of IBM Research and Yoon Kim, the NBX professor and an associate professor in EECS, joined Yang to develop next-generation language model architectures beyond transformers.

Transformers face two key limitations: high computational cost when modeling long sequences, due to the softmax attention mechanism, and limited expressiveness, due to the weak inductive bias of rotary position embedding (RoPE). The former means that each time the input length doubles, the computational cost roughly quadruples: it grows quadratically with sequence length. RoPE allows transformers to understand the order of tokens (i.e., words); however, it does a poor job of capturing internal state changes over time, such as variable values, and it is limited to the sequence lengths seen during training.

To address this, the MIT-IBM team explored theoretically grounded yet hardware-efficient algorithms. As an alternative to softmax attention, they adopted linear attention, which removes the quadratic complexity that caps feasible sequence lengths. They also explored hybrid architectures that interleave softmax and linear attention to strike a better balance between computational efficiency and performance.

To increase expressiveness, they replaced RoPE with a dynamic, reflective positional encoding based on Householder transforms. This approach enables richer positional interactions for a deeper understanding of sequential information, while keeping computation fast and efficient. The MIT-IBM team's advances reduce the need for transformers to break problems into many steps, instead enabling them to tackle more complex subproblems with fewer inference tokens.
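For intuition only (this is textbook linear algebra, not the team's encoding), a Householder transform reflects vectors across a hyperplane. Like RoPE's rotations it is orthogonal, so it preserves vector norms, but the reflection direction v can be made input-dependent, which is what allows richer, dynamic positional interactions:

```python
import numpy as np

def householder(v):
    """Householder reflection H = I - 2 v v^T / (v^T v); H is orthogonal."""
    v = np.asarray(v, dtype=float)
    return np.eye(len(v)) - 2.0 * np.outer(v, v) / (v @ v)

v = np.array([1.0, 2.0, 2.0])     # reflection direction (could depend on input)
H = householder(v)

x = np.array([3.0, -1.0, 0.5])
orthogonal = np.allclose(H @ H.T, np.eye(3))
norm_preserved = np.isclose(np.linalg.norm(H @ x), np.linalg.norm(x))
print(orthogonal, norm_preserved)   # True True
```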

Visions anew

Visual data contain a wealth of information that the human brain can quickly parse, internalize, and then imitate. Using vision-language models (VLMs), two graduate students are exploring ways to do the same through code.

Over the past two summers, under the mentorship of Aude Oliva, director of the MIT-IBM Watson AI Lab and a senior research scientist in the Computer Science and Artificial Intelligence Laboratory, along with Rogerio Feris, Dan Gutfreund, and Leonid Karlinsky (now at Xero) of IBM Research, Jovana Kondic of EECS has explored visual document understanding, particularly of charts. Charts contain elements, such as data points, legends, and axis labels, that require optical character recognition and numerical reasoning, which models still struggle with. To facilitate such tasks, Kondic's group set out to create a large, synthetic, open-source chart dataset, generated from code, that could be used for training and benchmarking.

With their prototype, ChartGen, the researchers built a pipeline that passes seed chart images through a VLM, which is prompted to read the chart and generate a Python script that could plausibly have produced it in the first place. The pipeline's LLM component then iteratively augments the code from many charts, ultimately yielding more than 200,000 unique chart-code pairs spanning nearly 30 chart types, along with supporting data and annotations such as descriptions and chart question-answer pairs. The team is continuing to expand the dataset, helping to advance multimodal understanding of data visualizations for enterprise applications such as financial and scientific reports, blogs, and more.
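A highly simplified sketch of such a chart-to-code pipeline follows. The `vlm_chart_to_code` and `llm_augment` stubs are placeholders for real model calls, and the file names and scripts are invented; ChartGen's actual interfaces are not shown here:

```python
# Toy chart-to-code pipeline: a VLM stub "reads" a seed chart into a plotting
# script, and an LLM stub rewrites that script to grow a synthetic dataset.
def vlm_chart_to_code(chart_image_id):
    """Placeholder for a VLM call: returns a Python script that could
    plausibly have produced the seed chart."""
    return f"plt.bar(['a', 'b'], [1, 2])  # reconstructed from {chart_image_id}"

def llm_augment(code, variant):
    """Placeholder for an LLM call: rewrites the script into a new variant
    (different data; real augmentation also varies labels and chart types)."""
    return code.replace("[1, 2]", f"[1, {2 + variant}]")

dataset = []
for seed in ["chart_001.png", "chart_002.png"]:
    code = vlm_chart_to_code(seed)
    dataset.append((seed, code))
    for variant in range(1, 4):              # iterative augmentation step
        dataset.append((seed, llm_augment(code, variant)))

print(len(dataset))   # 8 chart-code pairs from 2 seeds
```

Scaled up with real models and roughly 30 chart types, this seed-reconstruct-augment loop is how a couple of hundred seed images can grow into hundreds of thousands of chart-code pairs.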

Rather than charts, EECS graduate student Leonardo Hernandez Cano has his eye on digital design, specifically visual texture generation for CAD applications, with the goal of discovering efficient ways to harness the capabilities of VLMs. Working with the lab groups of Armando Solar-Lezama, EECS professor and Distinguished Professor of Computing at the MIT Schwarzman College of Computing, and Nathan Fulton of IBM Research, Hernandez Cano built a program-synthesis system that learns to refine code on its own. The system starts with a texture description supplied by a user in the form of an image. It then generates an initial Python program that produces visual textures and iteratively refines the code, searching for a program that yields a texture matching the target description and learning to find new programs from the data the system itself generates. Through these refinements, the system can produce visualizations with the desired brightness, color, opalescence, and so on, mimicking real materials.
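The refine-and-search loop can be caricatured as follows. This is a toy hill climb over two program parameters, with a small pixel array standing in for the user's target texture; it illustrates the propose-score-keep pattern, not the actual synthesis system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target "texture": a small grayscale gradient the user supplies as an image.
target = np.linspace(0.0, 1.0, 64).reshape(8, 8)

def render(params):
    """A 'program' here is just two parameters of a gradient generator."""
    scale, offset = params
    base = np.linspace(0.0, 1.0, 64).reshape(8, 8)
    return scale * base + offset

def loss(params):
    """How far the rendered texture is from the target (mean squared error)."""
    return float(np.mean((render(params) - target) ** 2))

# Iterative refinement: propose a perturbed program and keep it only if its
# rendered texture matches the target better than the current best.
params = np.array([0.2, 0.5])          # initial, poorly matching program
initial = loss(params)
best = initial
for _ in range(2000):
    candidate = params + rng.normal(scale=0.05, size=2)
    c_loss = loss(candidate)
    if c_loss < best:
        params, best = candidate, c_loss

print(f"loss: {initial:.3f} -> {best:.4f}")   # refined program fits far better
```

The real system searches over Python programs rather than two numbers, and it learns from its own generated data which candidate programs to try next, rather than sampling blind perturbations.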

Taken together, these projects, and the people behind them, are making a coherent push toward more robust and practical artificial intelligence. By tackling core challenges of reliability, performance, and multimodal reasoning, the work paves the way for AI systems that are not only more powerful, but also more trustworthy and cost-effective, for real-world enterprise and scientific applications.
