Updates to Gemini 2.5 from Google DeepMind

New capabilities in Gemini 2.5

Native audio output and Live API upgrades

Today the Live API introduces a preview version of audio-visual input and native audio-out dialogue, so you can build conversational experiences directly, with a more natural and expressive Gemini.

It also lets the user steer its tone, accent, and style of speaking. For example, you can tell the model to use a dramatic voice when telling a story. It also supports tool use, so it can search on your behalf.
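
As a rough sketch of what building on this looks like with the google-genai Python SDK (the preview model name, the system instruction, and the exact config fields below are assumptions for illustration, not details from this announcement):

import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
# The AUDIO response modality asks for spoken replies; the system instruction
# steers tone and style, e.g. the "dramatic voice" example above.
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    system_instruction=types.Content(
        parts=[types.Part(text="Tell stories in a dramatic, expressive voice.")]
    ),
)

async def main():
    # Hypothetical preview model name; check the docs for the current one.
    async with client.aio.live.connect(
        model="gemini-2.5-flash-preview-native-audio-dialog", config=config
    ) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="Tell me a short ghost story.")],
            )
        )
        async for message in session.receive():
            if message.data:  # raw PCM audio chunks streamed back by the model
                print(f"received {len(message.data)} bytes of audio")

asyncio.run(main())

The audio-visual input side of the preview works over the same open session: you stream microphone audio or camera frames into it instead of (or alongside) text.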

You can experiment with a set of early features, including:

  • Affective Dialogue, in which the model detects emotion in the user's voice and responds appropriately.
  • Proactive Audio, in which the model ignores background conversations and knows when to respond (both of these are sketched in the config example after this list).
  • Thinking in the Live API, in which the model uses Gemini's thinking capabilities to support more complex tasks.
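
These early features are switched on through the same connection config; the field names below (enable_affective_dialog, proactivity) are assumptions based on the SDK's general config pattern and may change during the preview:

from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    # Assumed flag: adapt to emotion detected in the user's voice.
    enable_affective_dialog=True,
    # Assumed field: ignore background chatter and decide when to respond.
    proactivity=types.ProactivityConfig(proactive_audio=True),
)
# Pass this config to client.aio.live.connect(...) exactly as in the sketch above.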

We are also releasing new previews for text-to-speech in 2.5 Pro and 2.5 Flash. These have first-of-its-kind support for multiple speakers, enabling text-to-speech with two voices via native audio output.

Like native audio dialogue, text-to-speech is expressive and can capture really subtle nuances, such as whispers. It works in over 24 languages and switches seamlessly between them.
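
A minimal sketch of the two-voice case, again with the google-genai Python SDK; the preview model name and the prebuilt voice names are assumptions for illustration:

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
prompt = (
    "TTS the following conversation between Joe and Jane:\n"
    "Joe: Have you tried the new native audio output yet?\n"
    "Jane: (whispering) I have, and the whisper support is uncanny."
)
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # assumed preview model name
    contents=prompt,
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                speaker_voice_configs=[
                    types.SpeakerVoiceConfig(
                        speaker="Joe",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
                        ),
                    ),
                    types.SpeakerVoiceConfig(
                        speaker="Jane",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck")
                        ),
                    ),
                ]
            )
        ),
    ),
)
# The synthesized dialogue comes back as raw PCM bytes inline in the response.
pcm = response.candidates[0].content.parts[0].inline_data.data
with open("dialogue.pcm", "wb") as f:
    f.write(pcm)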
