Introducing the Gemini 2.5 Computer Use model

Earlier this year, we mentioned that we are bringing computer use capabilities to developers through the Gemini API. Today, we are releasing the Gemini 2.5 Computer Use model, our new specialized model built on Gemini 2.5 Pro's visual understanding and reasoning capabilities, which powers agents that can interact with user interfaces (UIs). It outperforms leading alternatives on multiple web and mobile control benchmarks, all with lower latency. Developers can access these capabilities through the Gemini API in Google AI Studio and Vertex AI.

Although AI models can connect to software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, such as filling out and submitting forms. To complete these tasks, agents must navigate web pages and applications just as humans do: by clicking, typing, and scrolling. The ability to natively fill out forms, manipulate interactive elements such as drop-down menus and filters, and operate behind logins is a key next step in building powerful, general-purpose agents.

How it works

The model's core capabilities are exposed through the new “computer_use” tool in the Gemini API and should be operated within a loop. The inputs to the tool are the user request, a screenshot of the environment, and a history of recent actions. The input can also specify whether to exclude functions from the full list of supported UI actions, or specify additional custom functions to include.
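The loop described above can be sketched roughly as follows. This is a minimal illustration, not the Gemini API itself: the names `Action`, `ToolInput`, `run_agent_loop`, and `make_scripted_model` are hypothetical, the real model call is replaced by a scripted stand-in, and the sketch assumes that each model response is a single UI action for the client to execute and that an empty response means the task is finished.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class ToolInput:
    """The three inputs named in the text, plus the optional action settings."""
    user_request: str
    screenshot: bytes
    history: list
    excluded_actions: tuple = ()
    custom_actions: tuple = ()


def run_agent_loop(
    user_request: str,
    propose_action: Callable[[ToolInput], Optional[Action]],
    execute_action: Callable[[Action], None],
    take_screenshot: Callable[[], bytes],
    excluded_actions: tuple = (),
    custom_actions: tuple = (),
    max_steps: int = 10,
) -> list:
    """Repeat screenshot -> model -> execute until the model stops or max_steps is hit."""
    history: list = []
    for _ in range(max_steps):
        tool_input = ToolInput(
            user_request=user_request,
            screenshot=take_screenshot(),
            history=list(history),  # copy, so the model sees a snapshot
            excluded_actions=excluded_actions,
            custom_actions=custom_actions,
        )
        action = propose_action(tool_input)
        if action is None:  # assumption: None means the task is finished
            break
        if action.name in excluded_actions:
            raise ValueError(f"model proposed excluded action: {action.name}")
        execute_action(action)
        history.append(action)
    return history


def make_scripted_model(script):
    """Stand-in for the real model: replays a fixed list of actions, then stops."""
    def propose(tool_input: ToolInput) -> Optional[Action]:
        step = len(tool_input.history)
        return script[step] if step < len(script) else None
    return propose


if __name__ == "__main__":
    executed = []
    model = make_scripted_model([
        Action("click", {"x": 120, "y": 48}),
        Action("type", {"text": "hello"}),
    ])
    history = run_agent_loop(
        user_request="Fill in the greeting field",
        propose_action=model,
        execute_action=executed.append,
        take_screenshot=lambda: b"",  # placeholder; a real client would capture the screen
    )
    print([a.name for a in history])
```

In a real client, `propose_action` would wrap the request to the model and `execute_action` would drive a browser or device. The `max_steps` cap is a design choice of this sketch, added so that a loop that never terminates on its own still ends.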
