We are announcing a new open source toolkit for interpreting language models
Large language models (LLMs) are capable of incredible feats of reasoning, but their internal decision-making processes remain largely opaque. If a system isn't behaving as expected, the lack of insight into its inner workings can make it difficult to pinpoint the exact cause of its behavior. Last year, we advanced the science of interpretability with Gemma Scope, a toolkit designed to help researchers understand the inner workings of Gemma 2, our lightweight collection of open models.
Today we are releasing Gemma Scope 2: a comprehensive, open set of interpretability tools for all Gemma 3 sizes, from 270M to 27B. These tools can enable us to trace potential threats throughout the “brain” of the model.
To our knowledge, this is the largest-ever release of open-source interpretability tools from an AI lab. Producing Gemma Scope 2 required storing approximately 110 petabytes of data and training over 1 trillion total parameters.
As AI continues to advance, we look forward to the AI research community using Gemma Scope 2 to debug emergent model behavior, using these tools to better audit and debug AI agents, and ultimately accelerating the development of practical and robust safety interventions against problems such as jailbreaks, hallucinations, and sycophancy.
An interactive Gemma Scope 2 demo is available to try out, courtesy of Neuronpedia.
What's new in Gemma Scope 2
Interpretability research aims to understand the inner workings and learned algorithms of artificial intelligence models. As AI becomes more powerful and complex, interpretability is critical to creating safe and reliable AI.
Like its predecessor, Gemma Scope 2 acts as a microscope for the Gemma family of language models. By combining sparse autoencoders (SAEs) and transcoders, it lets researchers look inside models to see what they are "thinking" about, how those thoughts arise, and how they connect to the model's behavior. This, in turn, enables richer exploration of jailbreaks and other safety-relevant AI behavior, such as discrepancies between the reasoning a model conveys and its internal state.
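To make the SAE idea concrete, here is a minimal sketch of the forward pass: an encoder maps a dense model activation to a wider, sparse feature vector (via ReLU), and a decoder reconstructs the activation from those features. The dimensions, initialization, and variable names here are illustrative assumptions, not Gemma Scope 2's actual architecture or weights; real SAEs are trained to reconstruct model activations under a sparsity penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration; real SAEs use much wider feature dictionaries.
d_model, d_sae = 16, 64

# Randomly initialized stand-ins for trained SAE parameters.
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(activation):
    """Encode an activation into sparse features, then reconstruct it."""
    features = np.maximum(activation @ W_enc + b_enc, 0.0)  # ReLU -> sparse codes
    reconstruction = features @ W_dec + b_dec
    return features, reconstruction

x = rng.normal(size=d_model)      # stand-in for a residual-stream activation
feats, x_hat = sae_forward(x)
print(feats.shape, x_hat.shape)   # feature vector is wider than the activation
```

Because the feature vector is sparse, individual coordinates often correspond to human-interpretable concepts, which is what makes SAEs useful for inspecting what a model is "thinking" about.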
While the original Gemma Scope enabled research in key safety areas, such as model hallucination, identifying secrets known to the model, and training safer models, Gemma Scope 2 supports even more ambitious research with significant improvements:
- Full coverage at scale: We provide a complete set of tools for the entire Gemma 3 family (up to 27B parameters), which is necessary to study emergent behaviors that only appear at large scale, such as those previously demonstrated by the 27B-scale C2S model, which helped discover a potential new path for cancer therapy. While Gemma Scope 2 is not trained on this model, it is an example of the type of emergent behavior these tools may be able to help understand.
- More sophisticated tools for deciphering complex internal behavior: Gemma Scope 2 includes SAEs and transcoders trained on every layer of our Gemma 3 family of models. Skip transcoders and cross-layer transcoders make it easier to decipher multi-step computations and algorithms distributed throughout the model.
- Advanced training techniques: We use state-of-the-art techniques, in particular Matryoshka training, which helps SAEs detect more useful concepts and addresses some flaws discovered in Gemma Scope.
- Tools for analyzing chatbot behavior: We also provide interpretability tools tailored to the Gemma 3 variants tuned for chat applications. These tools enable analysis of complex, multi-step behaviors such as jailbreaks, refusal mechanisms, and chain-of-thought faithfulness.
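The skip transcoders mentioned above can be sketched in a few lines. A transcoder approximates an MLP layer by routing its input through sparse, interpretable features to predict the MLP's output; a skip transcoder adds a learned linear skip path so the sparse features only need to capture the nonlinear part of the computation. Everything below (sizes, names, random initialization) is an illustrative assumption, not the Gemma Scope 2 implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sizes for illustration only.
d_model, d_feat = 16, 64

# Randomly initialized stand-ins for trained transcoder parameters.
W_enc = rng.normal(0, 0.1, (d_model, d_feat))
W_dec = rng.normal(0, 0.1, (d_feat, d_model))
W_skip = rng.normal(0, 0.1, (d_model, d_model))  # learned linear skip path

def skip_transcoder(mlp_input):
    """Predict an MLP layer's output from its input via sparse features."""
    features = np.maximum(mlp_input @ W_enc, 0.0)  # ReLU -> sparse codes
    return features @ W_dec + mlp_input @ W_skip   # nonlinear part + skip

y = skip_transcoder(rng.normal(size=d_model))
print(y.shape)
```

Because the transcoder maps layer inputs to layer outputs (rather than reconstructing a single activation, as an SAE does), chaining transcoders across layers is what makes it possible to trace multi-step algorithms through the model.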