What customers say
Google Cloud customers are already using Gemini's native audio capabilities to drive real business results, from mortgage processing to customer conversations.
- “Users often forget they are talking to AI within a minute of using Sidekick, and in some cases they are thanking the bot after a long chat… The new AI Live API capabilities offered by Gemini (2.5 Flash Native Audio) enable our sellers to win.” – David Wurtz, VP of Product, Shopify
- “By integrating the Gemini 2.5 Flash Native Audio model… we have significantly expanded Mia's capabilities since its launch in May 2025. This powerful combination has enabled us to generate over 14,000 loans for our broker partners.” – Jason Bressler, Chief Technology Officer, United Wholesale Mortgage (UWM)
- “Working with Gemini 2.5 Flash Native Audio via Vertex AI allows Newo.ai AI receptionists to achieve unrivaled conversational intelligence… They can identify the main speaker even in noisy environments, switch languages mid-conversation, and sound incredibly natural and emotionally expressive.” – David Yang, Co-Founder, Newo.ai
Live speech translation
Gemini now natively supports live speech-to-speech translation, designed for both continuous listening and two-way conversation.
In continuous listening, Gemini automatically translates speech in multiple languages into a single target language. This allows you to put on headphones and hear the world around you in your own language.
In a two-way conversation, Gemini live speech translation translates between two languages in real time, automatically switching the output language depending on who is speaking. For example, if you speak English and want to talk to a Hindi speaker, you will hear real-time English translations in your headphones, and when you finish speaking, your phone will play the Hindi translation of what you said.
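The two-way switching behavior described above can be pictured with a small sketch. This is not how Gemini implements it and it does not call any Gemini API; the function name `pick_output_language` and the language codes are hypothetical, chosen only to illustrate the idea that speech detected in one language of the pair is rendered in the other.

```python
def pick_output_language(detected: str, lang_a: str, lang_b: str) -> str:
    """Return the language to translate into for a two-way session.

    Hypothetical helper for illustration: speech detected in one language
    of the pair is translated into the other one.
    """
    if detected == lang_a:
        return lang_b
    if detected == lang_b:
        return lang_a
    # Assumption: a two-way session only handles the two configured languages.
    raise ValueError(
        f"{detected!r} is not part of the pair ({lang_a!r}, {lang_b!r})"
    )


# An English speaker talking with a Hindi speaker (codes chosen for illustration).
print(pick_output_language("en", "en", "hi"))  # hi
print(pick_output_language("hi", "en", "hi"))  # en
```

In this sketch the caller never sets a direction by hand: the detected language of each utterance alone decides which way the translation goes, which mirrors the automatic switching in the English and Hindi example.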
Gemini live speech translation has several key features designed for real-world use:
- Language coverage: Translates speech across more than 70 languages and 2,000 language pairs, combining Gemini's world knowledge and multilingual capabilities with its native audio capabilities.
- Style transfer: Captures the nuances of human speech, preserving the speaker's intonation, pacing and pitch so the translation sounds natural.
- Multilingual input: Understands multiple languages simultaneously in a single session, helping you follow multilingual conversations without having to change your language settings.
- Auto detection: Identifies the spoken language and begins translating, so you don't even need to know which language is being spoken.
- Noise robustness: Filters out ambient noise so you can talk comfortably even in loud outdoor environments.