Computer-aided design (CAD) systems are proven tools used to design many of the physical objects we use every day. However, mastering CAD software requires extensive expertise, and many tools demand so much detail that they are not well suited to brainstorming or rapid prototyping.
In an effort to make design faster and more accessible to non-experts, researchers at MIT and elsewhere have developed an artificial intelligence-based robotic assembly system that allows humans to build physical objects by simply describing them in words.
Their system uses a generative artificial intelligence model to create a three-dimensional representation of an object's geometry based on user input. Then, a second generative AI model analyzes the desired object and determines where various components should be placed according to the object's function and geometry.
The system can automatically build an object from a set of prefabricated parts using robotic assembly. It can also iterate on the design based on user feedback.
The researchers used this end-to-end system to produce furniture, including chairs and shelves, from two types of off-the-shelf components. The components can be freely disassembled and reassembled, which reduces the amount of waste generated in the production process.
They evaluated these designs through a user study and found that over 90 percent of participants preferred objects made by their AI-based system over other approaches.
While this work is an initial demonstration, the platform could be particularly useful for rapid prototyping of complex objects such as aerospace components and architectural elements. In the long term, it could be used to produce furniture or other items locally in the home, without having to ship bulky products from a central factory.
“Sooner or later, we want to be able to communicate and talk to a robot and artificial intelligence system the same way we talk to each other to work together. Our system is the first step toward making that future possible,” says lead author Alex Kyaw, a graduate student in MIT's departments of electrical engineering and computer science (EECS) and architecture.
Kyaw is joined on the paper by Richa Gupta, an architecture graduate from MIT; Faez Ahmed, associate professor of mechanical engineering; Lawrence Sass, professor and chair of the Computation Group in the Department of Architecture; senior author Randall Davis, EECS professor and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); as well as others at Google DeepMind and Autodesk Research. The paper was recently presented at the Conference on Neural Information Processing Systems (NeurIPS).
Generating a multi-component design
While generative AI models are good at generating 3D representations, called meshes, from text prompts, most do not produce consistent representations of an object's geometry with the component-level detail needed for robotic assembly.
Separating these meshes into components is challenging for a model because component assignment depends on the geometry and function of the object and its parts.
The researchers tackled these challenges using a vision-language model (VLM), a powerful type of generative AI model that is pre-trained to understand both images and text. They task the VLM with determining how two types of prefabricated parts, structural elements and panel elements, should fit together to form an object.
“There are many ways to place panels on a physical object, but the robot must see the geometry and reason about that geometry to make a decision. The VLM serves as both the robot's eyes and brain, enabling it to do this,” says Kyaw.
The user prompts the system with text, for example by typing “make me a chair,” and the system first displays an image of the AI-generated chair.
The VLM then examines the chair and determines where the panel elements should sit on the structural members, based on the functionality of the many example objects it has seen before. For example, the model may specify that the seat and backrest should have panels to provide surfaces for a person to sit on and lean against.
The VLM expresses this information as text labels, such as “seat” or “backrest.” Each surface of the chair is then marked with a number, and this annotated information is sent back to the VLM.
The VLM then selects the numbered labels corresponding to the geometric parts of the chair that should receive panels on the 3D mesh, completing the design.
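This loop of rendering the object, asking for functional labels, and mapping those labels onto numbered surfaces can be sketched roughly in code. The sketch below is a minimal illustration, not the paper's implementation: the text-to-3D generator, the renderer, and the VLM interface are all placeholder callables assumed for the example.

```python
from typing import Callable, List


def assign_panels(
    prompt: str,                              # e.g. "make me a chair"
    generate_mesh: Callable[[str], object],   # placeholder text-to-3D model (assumed)
    render: Callable[[object, bool], bytes],  # renders the mesh, optionally with numbered surface markers
    ask_vlm: Callable[[bytes, str], str],     # placeholder VLM call: image + question -> text answer
) -> List[int]:
    """Return the numbered surfaces that should receive panel elements (illustrative sketch)."""
    mesh = generate_mesh(prompt)

    # Step 1: ask for the functional parts in plain text ("seat", "backrest", ...).
    plain_view = render(mesh, False)
    parts = ask_vlm(
        plain_view,
        "Which parts of this object need flat panels for it to function? "
        "Answer with short labels such as 'seat' or 'backrest'.",
    )

    # Step 2: number every candidate surface and render the annotated view,
    # so the VLM can refer to geometry by number.
    marked_view = render(mesh, True)
    answer = ask_vlm(
        marked_view,
        f"Which numbered surfaces correspond to: {parts}? Reply with numbers only.",
    )

    # Step 3: the chosen surface numbers complete the design.
    return [int(tok) for tok in answer.replace(",", " ").split() if tok.isdigit()]
```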
Human and artificial intelligence co-design
The user stays in the loop throughout the process and can refine the design by giving the model a new prompt, such as “only use panels on the backrest, not on the seat.”
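In code, this refinement step might amount to little more than re-running the labeling sketch above with the user's feedback folded in as a constraint. Again, this is an assumed illustration rather than the system's actual mechanism.

```python
from typing import List

# Hypothetical refinement step built on the assign_panels sketch above.
# Feedback is appended to the prompt as plain-text constraints; this is an
# illustrative assumption, not the paper's documented behavior.


def refine(prompt: str, feedback: List[str], **pipeline) -> List[int]:
    constrained = prompt + " Constraints: " + "; ".join(feedback)
    return assign_panels(constrained, **pipeline)


# Example, using the placeholder callables from the earlier sketch:
# refine("make me a chair",
#        ["only use panels on the backrest, not on the seat"],
#        generate_mesh=..., render=..., ask_vlm=...)
```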
“The design space is very large, so we narrow it down based on user feedback. We think this is the best way to do it because people have different preferences and it would be impossible to build a perfect model for everyone,” says Kyaw.
“The human-in-the-loop process lets users steer AI-generated designs and feel a sense of ownership over the end result,” adds Gupta.
Once the 3D mesh is completed, the robotic assembly system builds the object using prefabricated parts. These reusable parts can be disassembled and reassembled into various configurations.
The researchers compared their method's results with those of an algorithm that placed panels on all upward-facing horizontal surfaces and an algorithm that placed panels at random. In a user study, more than 90 percent of participants preferred the designs made by their system.
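The first baseline can be approximated by a simple geometric test over a triangle mesh: keep the faces whose normals point roughly straight up. The sketch below uses plain NumPy; the threshold value and data layout are assumptions for illustration, not details from the paper.

```python
import numpy as np


def upward_facing_faces(vertices: np.ndarray, faces: np.ndarray,
                        threshold: float = 0.95) -> np.ndarray:
    """Return indices of triangles whose unit normal has a large +z component.

    vertices: (V, 3) float array of vertex positions.
    faces: (F, 3) int array of vertex indices per triangle.
    The 0.95 threshold (near-horizontal, facing up) is an arbitrary choice.
    """
    tri = vertices[faces]                                             # (F, 3, 3)
    normals = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])  # face normals
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)         # normalize
    return np.where(normals[:, 2] > threshold)[0]


# Example: a single upward-facing triangle in the z = 0 plane.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
tris = np.array([[0, 1, 2]])
print(upward_facing_faces(verts, tris))  # -> [0]
```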
They also asked the VLM to explain why it chose to place the panels where it did.
“We learned that the vision-language model is able to understand some of the functional aspects of the chair, such as the backrest and the seat, and to reason about why it places panels there. It's not just making these assignments at random,” says Kyaw.
In the future, the researchers want to improve their system to support more complex and refined user prompts, such as a table made of glass and metal. Additionally, they want to include additional prefabricated components such as gears, hinges, or other moving parts so that the objects can have greater functionality.
“We hope to dramatically lower the barrier to accessing design tools. We have shown that we can use generative artificial intelligence and robotics to turn ideas into physical objects quickly and easily,” says Davis.