New research shows that reorganizing a model's visual representations can make it more useful, robust, and reliable
Visual artificial intelligence (AI) is everywhere. We use it to sort photos, identify unfamiliar flowers, and help drive our cars. However, these powerful systems don’t always “see” the world the way we do, and they sometimes behave in surprising ways. For example, an AI system that can identify hundreds of car makes and models may still fail to capture the similarity between a car and an airplane: both are large vehicles made primarily of metal.
To better understand these differences, today we are publishing a new article in Nature examining important ways in which AI systems organize the visual world differently from humans. We present a method for aligning these systems more closely with human knowledge, and we show that addressing these discrepancies improves their robustness and ability to generalize.
This work is a step towards building more intuitive and trustworthy AI systems.
Why AI struggles with the “odd one out”
When you see a cat, your brain forms a mental representation that captures everything about it, from low-level features such as its color and the texture of its fur to high-level concepts such as “cat.” AI vision models also create representations, mapping images to points in a high-dimensional space where similar objects (e.g., two sheep) sit close together and dissimilar objects (a sheep and a cake) sit far apart.
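To make that geometry concrete, here is a minimal sketch using toy vectors in place of real model embeddings; in practice these vectors would come from a vision encoder’s output layer. The sheep and cake vectors below are illustrative stand-ins, not data from the paper.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two embedding vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
base = rng.normal(size=128)                  # shared "sheep-ness" direction
sheep_a = base + 0.1 * rng.normal(size=128)  # two sheep: nearby points
sheep_b = base + 0.1 * rng.normal(size=128)
cake = rng.normal(size=128)                  # a cake: an unrelated point

print(cosine_similarity(sheep_a, sheep_b))   # high, close to 1.0
print(cosine_similarity(sheep_a, cake))      # low, close to 0.0
```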
To probe the differences between human and model representations, we used a classic “odd-one-out” task from cognitive science, asking both humans and models to choose which of three given images did not belong with the other two. The choice reveals which two items they “consider” most similar.
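To illustrate how a model can be scored on this task, the sketch below applies a simple decision rule to embeddings: find the most similar pair, and call the remaining item the odd one out. The function name and the rule are illustrative assumptions about how such a readout could work, not the paper’s exact procedure.

```python
import numpy as np

def odd_one_out(embeddings: np.ndarray) -> int:
    """Given a (3, d) array of embeddings, return the index of the odd item."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = e @ e.T                                    # pairwise cosine similarities
    pairs = [(0, 1), (0, 2), (1, 2)]
    most_similar = max(pairs, key=lambda p: sim[p])  # the pair that "matches"
    return ({0, 1, 2} - set(most_similar)).pop()     # the leftover item is odd

rng = np.random.default_rng(0)
base = rng.normal(size=128)
triplet = np.stack([base + 0.1 * rng.normal(size=128),  # tapir-like
                    base + 0.1 * rng.normal(size=128),  # sheep-like
                    rng.normal(size=128)])              # cake-like
print(odd_one_out(triplet))  # 2: the dissimilar item
```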
Sometimes everyone agrees. Given a tapir, a sheep, and a birthday cake, both people and models reliably pick the cake as the odd one out. Other times there is no clearly correct answer, and humans and models disagree with each other.
Interestingly, we also found many cases where humans strongly agree on an answer but the models get it wrong. For example, when shown a triplet containing a starfish and a cat, most people say the starfish is the odd one out. Most vision models, however, latch onto superficial features such as background color and texture and choose the cat instead.