Following the exciting launches of Gemma 3 and Gemma 3 QAT, our family of state-of-the-art open models that can run on a single cloud or desktop accelerator, we're pushing our vision of accessible AI even further. Gemma 3 delivered powerful capabilities for developers, and we're now extending that vision to high-performance, real-time AI running directly on the devices you use every day: your phone, tablet, and laptop.
To power the next generation of on-device AI and support a diverse range of applications, including enhancing the capabilities of Gemini Nano, we developed a new, state-of-the-art architecture. Created in close collaboration with mobile hardware leaders such as Qualcomm Technologies, MediaTek, and Samsung's System LSI division, this next-generation foundation is optimized for lightning-fast, multimodal AI, enabling truly personal and private experiences right on your device.
Gemma 3n is our first open model built on this groundbreaking, shared architecture, enabling developers to start experimenting with the technology today in early preview. The same advanced architecture also powers the next generation of Gemini Nano, which brings these capabilities to a wide range of features within Google Apps and our on-device ecosystem and will be available later this year. Gemma 3n allows you to start building on a foundation that will be available on major platforms such as Android and Chrome.
This chart shows the ranking of AI models by Chatbot Arena Elo scores; higher scores (top numbers) indicate greater user preference. Gemma 3n ranks high among both popular proprietary and open models.
Gemma 3n uses a Google DeepMind innovation called Per-Layer Embeddings (PLE) that delivers a significant reduction in RAM usage. While the raw parameter counts are 5B and 8B, this innovation lets these larger models run on mobile devices, or live-stream from the cloud, with a memory overhead comparable to a 2B and 4B model, meaning the models can operate with a dynamic memory footprint of just 2GB and 3GB. Find out more in our documentation.
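As a rough illustration of the mechanism, the sketch below shows why the resident footprint can track the smaller effective model rather than the raw parameter count when per-layer embeddings are loaded from fast local storage instead of held in RAM. The parameter split and bytes-per-parameter figures are invented for illustration; they are not Gemma 3n's published breakdown.

```kotlin
// Back-of-the-envelope illustration only: the parameter split and
// bytes-per-parameter values below are assumptions, not published numbers.
// The point is the mechanism: parameters stored as per-layer embeddings can
// be loaded layer-by-layer from fast local storage, so only the remaining
// "core" parameters need to stay resident in RAM.
fun residentFootprintGb(
    totalParamsBillions: Double,   // raw parameter count
    pleParamsBillions: Double,     // assumed share offloaded via PLE
    bytesPerParam: Double          // depends on quantization level
): Double = (totalParamsBillions - pleParamsBillions) * bytesPerParam

fun main() {
    // A 5B-parameter model whose resident slice behaves like a 2B model
    // lands near a ~2 GB dynamic footprint at ~1 byte per parameter.
    println(residentFootprintGb(5.0, 3.0, 1.0))   // ≈ 2.0 (GB)
    println(residentFootprintGb(8.0, 4.0, 0.75))  // ≈ 3.0 (GB)
}
```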
By exploring Gemma 3n, developers can get an early preview of the core open model capabilities and mobile-friendly architectural innovations that will be available on Android and Chrome with Gemini Nano.
In this post, we'll discuss the new capabilities of Gemma 3n, our approach to responsible development, and how you can access the preview today.
Key capabilities of Gemma 3n
Designed for fast, low-footprint, on-device AI, Gemma 3n delivers:
- Optimized on-device performance and efficiency: Gemma 3n begins responding approximately 1.5x faster on mobile than Gemma 3 4B, with significantly better quality and a reduced memory footprint, achieved through innovations like Per-Layer Embeddings, KV Cache sharing, and advanced activation quantization.
- Many-in-1 flexibility: A model with a 4B active memory footprint that natively includes a nested, state-of-the-art 2B active memory footprint submodel (thanks to MatFormer training). This provides the flexibility to dynamically trade off performance and quality on the fly without hosting separate models. Additionally, we are introducing a mix'n'match capability in Gemma 3n to dynamically create submodels from the 4B model that optimally fit your specific use case and its quality/latency trade-off; see the sketch after this list. More on this research will be shared in our upcoming technical report.
- Privacy and offline readiness: Local execution enables features that respect user privacy and work reliably, even without an Internet connection.
- Extended multimodal understanding with audio: Gemma 3n understands and processes audio, text and images, and offers significantly improved video understanding. Audio capabilities enable the model to perform high-quality automatic speech recognition (transcription) and translation (speech into translated text). Additionally, the model accepts interleaved input across modalities, enabling understanding of complex multimodal interactions. (Public rollout coming soon)
- Improved multilingual capabilities: Improved multilingual performance, particularly in Japanese, German, Korean, Spanish, and French, reflected in benchmark results such as 50.1% on WMT24++ (ChrF).
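To make the many-in-1 idea above concrete, here is a hypothetical sketch of how an application might pick among nested operating points at run time. The OperatingPoint type, the configuration names, and the latency figures are all invented for illustration; they are not a real Gemma 3n API.

```kotlin
// Hypothetical illustration of the many-in-1 idea: one 4B model nests a 2B
// submodel (MatFormer training), plus assumed mix'n'match points in between,
// so an app can trade quality for latency at run time without shipping
// separate models. The names and latency figures here are invented.
data class OperatingPoint(val name: String, val activeParamsB: Double, val estLatencyMs: Int)

val operatingPoints = listOf(
    OperatingPoint("nested-2B", 2.0, 40),      // fastest: nested submodel
    OperatingPoint("mix-n-match-3B", 3.0, 55), // assumed intermediate config
    OperatingPoint("full-4B", 4.0, 75),        // highest quality: full model
)

// Pick the highest-quality configuration that still meets the latency budget,
// falling back to the fastest point if nothing fits.
fun pickOperatingPoint(latencyBudgetMs: Int): OperatingPoint =
    operatingPoints
        .filter { it.estLatencyMs <= latencyBudgetMs }
        .maxByOrNull { it.activeParamsB }
        ?: operatingPoints.minByOrNull { it.estLatencyMs }!!

fun main() {
    println(pickOperatingPoint(60)) // selects the assumed 3B mix'n'match point
}
```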

This chart shows MMLU performance versus model size for Gemma 3n mix'n'match (pre-trained) submodels.
Unlocking new experiences on the go
Gemma 3n will enable a new wave of intelligent, on-the-go applications by empowering developers to:
1. Create interactive live experiences that understand and respond to real-time visual and auditory cues from the user's environment.
2. Power deeper understanding and contextual text generation using combined audio, image, video, and text inputs, all processed privately on-device (see the sketch after this list).
3. Build advanced audio-centric applications, including real-time speech transcription, translation, and rich voice-driven interactions.
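As a concrete example of item 2, here is a minimal sketch of combined image-and-text prompting on-device using the MediaPipe LLM Inference API from Google AI Edge (Kotlin, Android). The model path and prompt are placeholders, and the exact vision-session options available in the preview may differ; consult the Google AI Edge documentation.

```kotlin
import android.content.Context
import android.graphics.Bitmap
import com.google.mediapipe.framework.image.BitmapImageBuilder
import com.google.mediapipe.tasks.genai.llminference.GraphOptions
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import com.google.mediapipe.tasks.genai.llminference.LlmInferenceSession

// Ask a question about an image, processed entirely on-device.
fun describeImage(context: Context, photo: Bitmap): String {
    val llm = LlmInference.createFromOptions(
        context,
        LlmInference.LlmInferenceOptions.builder()
            .setModelPath("/data/local/tmp/llm/gemma-3n.task") // placeholder path
            .setMaxNumImages(1) // allow one image per prompt
            .build()
    )
    val session = LlmInferenceSession.createFromOptions(
        llm,
        LlmInferenceSession.LlmInferenceSessionOptions.builder()
            .setGraphOptions(GraphOptions.builder().setEnableVisionModality(true).build())
            .build()
    )
    // Interleave a text query with an image in the same prompt.
    session.addQueryChunk("What is happening in this photo?")
    session.addImage(BitmapImageBuilder(photo).build())
    val answer = session.generateResponse()
    session.close()
    llm.close()
    return answer
}
```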
These are just a few of the experiences Gemma 3n makes possible.
Build responsibly, together
Our commitment to responsible AI development is paramount. Gemma 3n, like all Gemma models, underwent rigorous safety evaluations, data governance, and alignment with our safety policies. We approach open models with careful risk assessment, and we continually refine our practices as the AI landscape evolves.
Get started: try Gemma 3n today
We're excited to get Gemma 3n into your hands starting today with a preview:
Initial Access (available now):
- Cloud-based exploration with Google AI Studio: Try Gemma 3n directly in your browser with Google AI Studio, no setup required. Explore its text input capabilities instantly.
- On-device development with Google AI Edge: For developers looking to integrate Gemma 3n locally, Google AI Edge provides tools and libraries. You can get started with its text and image understanding and generation capabilities today, as in the sketch below.
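For a minimal starting point, the sketch below runs a single text prompt on-device with the MediaPipe LLM Inference API. The model path is a placeholder: you must first download a Gemma 3n bundle to the device, and the exact option names in the preview release may differ.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal on-device text generation with the MediaPipe LLM Inference API.
fun runGemmaOnDevice(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-3n.task") // placeholder path
        .setMaxTokens(512)
        .build()

    val llm = LlmInference.createFromOptions(context, options)
    // Single synchronous generation: no network round-trip is involved,
    // which is what enables offline, private inference.
    val response = llm.generateResponse(prompt)
    llm.close()
    return response
}
```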
Gemma 3n is the next step in democratizing access to cutting-edge, efficient artificial intelligence. We're incredibly excited to see what you build as this technology gradually becomes available, starting with today's preview.
Check out this announcement and all Google I/O 2025 updates on io.google starting May 22.