Earlier this year, we shared that we are bringing computer use capabilities to developers via the Gemini API. Today, we are releasing the Gemini 2.5 Computer Use model, our new specialized model built on Gemini 2.5 Pro's visual understanding and reasoning capabilities, which powers agents that can interact with user interfaces (UIs). It outperforms leading alternatives on multiple web and mobile control benchmarks, all with lower latency. Developers can access these capabilities via the Gemini API in Google AI Studio and Vertex AI.
While AI models can interface with software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, for example, filling in and submitting forms. To complete these tasks, agents must navigate web pages and applications just as humans do: by clicking, typing, and scrolling. The ability to natively fill in forms, manipulate interactive elements like dropdowns and filters, and operate behind logins is a crucial next step in building powerful, general-purpose agents.
How it works
The model's core capabilities are exposed through the new `computer_use` tool in the Gemini API and should be operated within a loop. Inputs to the tool are the user request, a screenshot of the environment, and a history of recent actions. The input can also specify whether to exclude functions from the full list of supported UI actions, or specify additional custom functions to include.
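The loop described above (screenshot in, proposed UI action out, execute, repeat) can be sketched as plain Python. This is an illustrative sketch only: the `Action` schema, the `stub_model` function, and the action names `click_at`, `type_text_at`, and `done` are hypothetical stand-ins for the real `computer_use` tool's request/response types, which are defined by the Gemini API itself.

```python
# Hypothetical sketch of the computer-use agent loop described above.
# Names and schemas here are illustrative, not the actual Gemini API surface.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str                              # e.g. "click_at", "type_text_at"
    args: dict = field(default_factory=dict)

def stub_model(request, screenshot, history):
    """Stand-in for the model call: proposes one UI action per turn, then
    signals completion. A real agent would send the request, the screenshot,
    and the action history to the computer_use tool instead."""
    if not history:
        return Action("click_at", {"x": 320, "y": 240})
    if len(history) == 1:
        return Action("type_text_at", {"x": 320, "y": 240, "text": "hello"})
    return Action("done")

def agent_loop(request, take_screenshot, execute, model=stub_model, max_turns=10):
    """The loop from the text: screenshot -> model -> execute action -> repeat,
    feeding recent actions back to the model on each turn."""
    history = []
    for _ in range(max_turns):
        action = model(request, take_screenshot(), history)
        if action.name == "done":
            break
        execute(action)          # perform the click/type/scroll in the environment
        history.append(action)   # becomes "recent activity" input on the next turn
    return history

# Usage: drive the loop against a fake environment that records actions.
executed = []
agent_loop(
    "fill in the form",
    take_screenshot=lambda: b"<png bytes>",  # placeholder screenshot capture
    execute=executed.append,
)
```

In a real integration, `take_screenshot` and `execute` would be backed by a browser or device driver, and the model call would pass the screenshot and history to the `computer_use` tool rather than a local stub.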