Gemini 2.5 Native Audio update and speech synthesizer model updates

December 13, 2025

What customers say

Google Cloud customers are already using Gemini's native audio capabilities to drive real business results, from mortgage processing to customer conversations.

“Users often forget they are talking to AI within a minute of using Sidekick, and in some cases they are thanking the bot after a long chat… The new AI Live API capabilities offered by Gemini (2.5 Flash Native Audio) enable our sellers to win.” – David Wurtz, VP of Product, Shopify
“By integrating the Gemini 2.5 Flash Native Audio model… we have significantly expanded Mia's capabilities since its launch in May 2025. This powerful combination has enabled us to generate over 14,000 loans for our broker partners.” – Jason Bressler, Chief Technology Officer, United Wholesale Mortgage (UWM)
“Working with Gemini 2.5 Flash Native Audio via Vertex AI allows Newo.ai AI receptionists to achieve unrivaled conversational intelligence… They can identify the main speaker even in noisy environments, switch languages mid-conversation, and sound incredibly natural and emotionally expressive.” – David Yang, co-founder of Newo.ai

Live speech translation

Gemini now natively supports live speech-to-speech translation, designed for both continuous listening and two-way conversation.

In continuous listening mode, Gemini automatically translates speech in multiple languages into a single target language. You can put on headphones and hear the world around you in your own language.

For a two-way conversation, Gemini's live speech translation translates between two languages in real time, automatically switching the output language depending on who is speaking. For example, if you speak English and want to talk to a Hindi speaker, you hear real-time English translations in your headphones, and when you finish speaking, your phone plays the Hindi translation aloud.
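The two-way behavior described here amounts to a small routing rule: detect who is speaking, then pick the opposite language and the matching audio output. The sketch below illustrates only that rule. It is not Gemini's implementation and does not call the Live API; the names `Route`, `TwoWaySession`, and `route`, the language codes, and the `"headphones"` and `"speaker"` labels are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Where a translated utterance should go (hypothetical structure)."""
    target_language: str  # language the translated audio is spoken in
    output: str           # "headphones" or "speaker" (assumed labels)


@dataclass(frozen=True)
class TwoWaySession:
    """A two-language conversation between the device owner and one other person."""
    my_language: str      # e.g. "en"
    their_language: str   # e.g. "hi"

    def route(self, detected_language: str) -> Route:
        """Pick the output language and device from the detected input language."""
        if detected_language == self.their_language:
            # The other person is speaking: translate for me, privately.
            return Route(self.my_language, "headphones")
        if detected_language == self.my_language:
            # I am speaking: translate for them, out loud.
            return Route(self.their_language, "speaker")
        raise ValueError(f"language {detected_language!r} is not part of this session")


# Usage: an English speaker talking with a Hindi speaker.
session = TwoWaySession(my_language="en", their_language="hi")
print(session.route("hi"))  # Hindi speech is routed to English in the headphones
print(session.route("en"))  # English speech is routed to Hindi on the phone speaker
```

Rejecting a third language is a choice made for this sketch; the article does not say how a real session handles speech outside the two configured languages.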

Gemini live speech translation has several key features that help in real-world use:

Language coverage: Translates speech across more than 70 languages and 2,000 language pairs, combining Gemini's world knowledge and multilingual capabilities with its native audio capabilities.
Style transfer: Captures the nuances of human speech, preserving the speaker's intonation, pace, and tone so the translation sounds natural.
Multilingual input: Understands multiple languages in a single session, helping you follow multilingual conversations without having to change your language settings.
Auto detection: Identifies the spoken language and begins translating, so you don't need to know which language is being spoken.
Noise resistance: Filters out ambient noise so you can talk comfortably even in noisy outdoor environments.
