Why did humans evolve the eyes we have today?
While scientists cannot go back in time to study the environmental pressures that shaped the evolution of the diverse vision systems found in nature, a new computational framework developed by MIT researchers allows them to study this evolution in artificial intelligence agents.
Their framework, in which embodied artificial intelligence agents evolve their eyes and learn to see over many generations, works like a “scientific sandbox” that lets researchers recreate different evolutionary trees. The user does this by changing the structure of the world and the tasks the AI agents perform, such as searching for food or distinguishing objects.
This allows them to investigate why one animal might evolve simple, light-sensitive eye patches while another evolves complex, camera-like eyes.
The researchers' experiments in this environment show how tasks drive the evolution of the agents' eyes. They found, for example, that navigation tasks often led to the evolution of compound eyes composed of many individual units, such as the eyes of insects and crustaceans.
On the other hand, if agents focused on distinguishing objects, they were more likely to evolve camera-type eyes with irises and retinas.
This framework could enable scientists to explore “what if” questions about vision systems that are difficult to investigate experimentally. It can also help design novel sensors and cameras for robots, drones and wearables that balance performance with real-world constraints such as energy efficiency and manufacturability.
“While we will never be able to go back and discover all the details of evolution, in this work we have created an environment where we can, in a sense, recreate evolution and examine the environment in many different ways. This method of doing science opens the door to many possibilities,” says Kushagra Tiwary, an MIT Media Lab graduate student and co-author of a paper on this research.
He is joined on the paper by PhD students Aaron Young and Tzofi Klinghoffer; Akshat Dave, a former graduate student who is now an assistant professor at Stony Brook University; Tomaso Poggio, the Eugene McDermott Professor in the Department of Brain and Cognitive Sciences, a researcher at the McGovern Institute, and co-director of the Center for Brains, Minds and Machines; co-senior authors Brian Cheung, a postdoc at the Center for Brains, Minds and Machines and a new assistant professor at the University of California, San Francisco, and Ramesh Raskar, associate professor of media arts and sciences and leader of the Camera Culture Group at MIT; and others at Rice University and Lund University. The research is published today in Science Advances.
Building a scientific sandbox
The project began with a conversation among the researchers about discovering new vision systems that could be useful in fields such as robotics. To test “what if” questions, the researchers decided to use artificial intelligence to explore multiple evolutionary possibilities.
“What-if questions inspired me as I grew up and studied science. With artificial intelligence, we have a unique opportunity to create embodied agents that allow us to ask questions that are usually impossible to answer,” says Tiwary.
To build this evolutionary sandbox, researchers took all the components of a camera, such as sensors, lenses, apertures and processors, and turned them into parameters that could be learned by an embodied AI agent.
They used these elements as the starting point for an algorithmic learning mechanism through which the agent's eyes could evolve over time.
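As a rough illustration, such a parameterization might look like the Python sketch below; the field names and default values are hypothetical stand-ins, not the actual parameters used in the paper.

```python
from dataclasses import dataclass

# Hypothetical parameterization of an evolvable camera "eye"; the parameter
# set in the researchers' framework is richer than this sketch.
@dataclass
class EvolvableCamera:
    sensor_resolution: int = 1       # agents begin with a single photoreceptor
    lens_focal_length: float = 1.0   # how strongly the optics focus incoming light
    aperture_radius: float = 0.5     # how much light reaches the sensor
    processor_units: int = 16        # size of the neural network that interprets the image
```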
“We couldn't simulate the entire universe atom by atom. It was difficult to determine what components we needed and didn't need and how to allocate resources among these different components,” says Cheung.
Given these components, the evolutionary algorithm can choose which elements to evolve based on the constraints of the environment and the agent's task.
Each environment has a single task, such as navigation, food identification, or prey tracking, designed to mimic real-world visual challenges that animals must overcome to survive. Agents start with a single photoreceptor that views the world and an associated neural network model that processes the visual information.
The agent is then trained throughout its life using reinforcement learning, a trial-and-error technique in which the agent is rewarded for achieving the goal of its task. The environment also contains constraints, such as a specific number of pixels for the agent's visual sensors.
“These constraints guide the design process in the same way that there are physical constraints in our world, such as the physics of light, that influence the design of our own eyes,” Tiwary says.
Over many generations, agents develop various elements of vision systems that maximize rewards.
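A highly simplified sketch of that outer loop is shown below. A placeholder score stands in for the reward an agent would earn during its reinforcement-learning lifetime, and the selection rule, mutation step, and pixel budget are illustrative assumptions rather than the paper's actual algorithm.

```python
import random

def lifetime_reward(genome):
    """Stand-in for the reinforcement learning phase: in the framework, each
    agent trains on its task over its lifetime, and the reward it accumulates
    serves as its fitness. A dummy score is returned here."""
    return genome["sensor_resolution"] * random.random()  # placeholder only

def mutate(genome):
    """Perturb one randomly chosen parameter (an illustrative mutation rule)."""
    child = dict(genome)
    key = random.choice(list(child))
    child[key] = max(1, child[key] + random.choice([-1, 1]))
    return child

def evolve(generations=50, population_size=20, pixel_budget=64):
    """Illustrative outer loop: score each genome, keep the best half,
    and refill the population with mutated copies of the survivors."""
    population = [{"sensor_resolution": 1, "aperture": 4, "processor_units": 16}
                  for _ in range(population_size)]
    for _ in range(generations):
        # Enforce an assumed sensor constraint, such as a fixed pixel budget.
        valid = [g for g in population if g["sensor_resolution"] <= pixel_budget]
        ranked = sorted(valid, key=lifetime_reward, reverse=True)
        survivors = ranked[: population_size // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(population_size - len(survivors))]
    return population
```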
Their framework uses a genetic encoding mechanism to computationally mimic evolution, in which individual genes mutate and control how an agent develops.
For example, morphological genes govern how an agent perceives its environment and control the placement of its eyes; optical genes determine how the eye interacts with light, including the number of photoreceptors; and neural genes control the agent's ability to learn.
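A hypothetical encoding along those lines might group the genes as in the sketch below; the specific fields and mutation rules are assumptions made for illustration only.

```python
import random
from dataclasses import dataclass, field

# Illustrative gene groups mirroring the three categories described above.
@dataclass
class MorphologicalGenes:
    eye_placement_deg: float = 0.0   # where on the body the eye sits

@dataclass
class OpticalGenes:
    num_photoreceptors: int = 1      # how the eye samples incoming light
    aperture_radius: float = 0.5

@dataclass
class NeuralGenes:
    hidden_units: int = 16           # capacity of the agent's learning network

@dataclass
class Genome:
    morphology: MorphologicalGenes = field(default_factory=MorphologicalGenes)
    optics: OpticalGenes = field(default_factory=OpticalGenes)
    neural: NeuralGenes = field(default_factory=NeuralGenes)

def mutate_genome(genome: Genome, rate: float = 0.2) -> Genome:
    """Mutate each gene group independently, so eye placement, optics, and
    processing capacity can each change between generations."""
    if random.random() < rate:
        genome.morphology.eye_placement_deg += random.uniform(-10.0, 10.0)
    if random.random() < rate:
        genome.optics.num_photoreceptors = max(
            1, genome.optics.num_photoreceptors + random.choice([-1, 1]))
    if random.random() < rate:
        genome.neural.hidden_units = max(
            4, genome.neural.hidden_units + random.choice([-4, 4]))
    return genome
```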
Hypothesis testing
When the researchers ran experiments in this sandbox, they found that the tasks had a large impact on the vision systems the agents evolved.
For example, agents focused on navigation tasks evolved eyes designed to maximize spatial awareness through low-resolution sensing, while agents focused on distinguishing objects evolved eyes that emphasize frontal acuity over peripheral vision.
Another experiment showed that a bigger brain is not always better when it comes to processing visual information. Only a certain amount of visual information can enter the system at one time due to physical limitations such as the number of photoreceptors in the eyes.
“At some point, a bigger brain doesn't help agents at all, and in the wild, that would be a waste of resources,” Cheung says.
In the future, the researchers want to use this simulator to study which vision systems work best for specific applications, which could help scientists develop sensors and cameras tailored to particular tasks. They also want to integrate large language models (LLMs) into their environment to make it easier for users to ask “what if” questions and explore additional possibilities.
“There is a real benefit in asking questions in a more imaginative way. I hope this inspires others to create a broader framework in which, instead of focusing on narrow questions covering a specific area, they try to answer questions with a much broader scope,” says Cheung.
This work was supported in part by the Center for Brains, Minds and Machines and the Defense Advanced Research Projects Agency (DARPA) Mathematics for the Discovery of Algorithms and Architectures (DIAL) program.