Today we are rolling out an early version of Gemini 2.5 Flash in preview through the Gemini API via Google AI Studio and Vertex AI. Building on the popular foundation of 2.0 Flash, this new version delivers a major upgrade in reasoning capabilities while still prioritizing speed and cost. Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off. The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost, and latency. Even with thinking off, developers can maintain the fast speeds of 2.0 Flash and improve performance.
Our Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding. Instead of immediately generating an output, the model can perform a "thinking" process to better understand the prompt, break down complex tasks, and plan a response. For complex tasks that require multiple steps of reasoning (such as solving math problems or analyzing research questions), the thinking process allows the model to arrive at more accurate and comprehensive answers. In fact, Gemini 2.5 Flash performs strongly on Hard Prompts in LMArena, second only to 2.5 Pro.
2.5 Flash delivers performance comparable to other leading models at a fraction of the cost and size.
Our most cost-efficient thinking model
2.5 Flash continues to lead as the model with the best price-performance ratio.

Gemini 2.5 Flash adds another model to Google's cost-to-quality Pareto frontier.*
Fine-grained controls to manage thinking
We know that different use cases have different tradeoffs between quality, cost, and latency. To give developers flexibility, we have enabled setting a thinking budget that offers fine-grained control over the maximum number of tokens the model can generate while thinking. A higher budget allows the model to reason further to improve quality. Importantly, the budget sets a cap on how much 2.5 Flash can think, but the model does not use the full budget if the prompt does not require it.

The quality of reasoning improves as the thinking budget increases.
The model is trained to know how long to think in response to a given prompt, and therefore automatically decides how much to think based on the perceived complexity of the task.
If you want to maintain the lowest cost and latency while still improving performance over 2.0 Flash, set the thinking budget to 0. You can also choose to set a specific token budget for the thinking stage using a parameter in the API or the slider in Google AI Studio and Vertex AI. The budget can range from 0 to 24,576 tokens for 2.5 Flash.
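As a quick illustration, here is a minimal sketch of setting that budget to 0 to turn thinking off, reusing the google-genai client setup and preview model name from the full example later in this post (the API key is a placeholder):
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")

# A thinking budget of 0 turns thinking off, keeping cost and latency low
# while still improving over 2.0 Flash.
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="How many provinces does Canada have?",
    config=genai.types.GenerateContentConfig(
        thinking_config=genai.types.ThinkingConfig(thinking_budget=0)
    ),
)
print(response.text)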
The prompts below illustrate the relative amount of reasoning used in 2.5 Flash's default mode.
Prompts requiring low reasoning:
Example 1: “Thank you” in Spanish
Example 2: How many provinces does Canada have?
Prompts requiring medium reasoning:
Example 1: You roll two dice. What is the probability they add up to 7?
Example 2: At my gym, there are pickup basketball hours from 9 a.m. to 3 p.m. on MWF and from 2 p.m. to 8 p.m. on Tuesdays and Saturdays. If I work 9 a.m. to 6 p.m. five days a week and want to play 5 hours of basketball on weekdays, create a schedule for me that makes it all work.
Prompts requiring high reasoning:
Example 1: A cantilever beam with a length of L = 3 m has a rectangular cross-section (width b = 0.1 m, height h = 0.2 m) and is made of steel (E = 200 GPa). It is subjected to a uniformly distributed load of w = 5 kN/m along its entire length and a point load of P = 10 kN at the free end. Calculate the maximum bending stress (σ_max).
Example 2: Write a function evaluate_cells(cells: Dict[str, str]) -> Dict[str, float] that computes the values of spreadsheet cells.
Each cell contains:
- A number (e.g., "3")
- Or a formula like "=A1 + B1 * 2" using +, -, *, / and other cells.
Requirements:
- Resolve dependencies between cells.
- Handle operator precedence (*/ before +-).
- Detect cycles and raise ValueError("Cycle detected at <cell>").
- No eval(). Use only built-in libraries.
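For illustration, here is a minimal sketch of one possible solution to this prompt (not model output): it resolves dependencies with memoized recursion, tracks in-progress cells to detect cycles, and handles operator precedence with a small recursive-descent parser instead of eval(). Helper names such as eval_cell and parse_expr are illustrative only.
# Minimal sketch of one possible solution to the example prompt above.
import re
from typing import Dict


def evaluate_cells(cells: Dict[str, str]) -> Dict[str, float]:
    results: Dict[str, float] = {}
    in_progress = set()  # cells currently being evaluated, used to detect cycles

    def eval_cell(name: str) -> float:
        if name in results:
            return results[name]
        if name in in_progress:
            raise ValueError(f"Cycle detected at {name}")
        in_progress.add(name)
        raw = cells[name].strip()
        value = parse_expr(raw[1:]) if raw.startswith("=") else float(raw)
        in_progress.discard(name)
        results[name] = value
        return value

    def parse_expr(text: str) -> float:
        # Tokenize into numbers, cell references (e.g. A1), operators, and parens.
        tokens = re.findall(r"\d+\.\d+|\d+|[A-Za-z]+\d+|[+\-*/()]", text)
        pos = 0

        def peek():
            return tokens[pos] if pos < len(tokens) else None

        def take():
            nonlocal pos
            tok = tokens[pos]
            pos += 1
            return tok

        def atom() -> float:
            tok = take()
            if tok == "(":
                value = add_sub()
                take()  # consume the closing ")"
                return value
            if re.fullmatch(r"[A-Za-z]+\d+", tok):
                return eval_cell(tok)  # recurse into the referenced cell
            return float(tok)

        def mul_div() -> float:  # "*" and "/" bind tighter than "+" and "-"
            value = atom()
            while peek() in ("*", "/"):
                op, rhs = take(), atom()
                value = value * rhs if op == "*" else value / rhs
            return value

        def add_sub() -> float:
            value = mul_div()
            while peek() in ("+", "-"):
                op, rhs = take(), mul_div()
                value = value + rhs if op == "+" else value - rhs
            return value

        return add_sub()

    for cell in cells:
        eval_cell(cell)
    return results


print(evaluate_cells({"A1": "3", "B1": "=A1 * 2", "C1": "=A1 + B1 * 2"}))
# -> {'A1': 3.0, 'B1': 6.0, 'C1': 15.0}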
Start building with Gemini 2.5 Flash today
Gemini 2.5 Flash with thinking capabilities is now available in preview via the Gemini API in Google AI Studio and Vertex AI, and in a dedicated dropdown in the Gemini app. We encourage you to experiment with the thinking_budget parameter and explore how controllable reasoning can help you solve more complex problems.
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")

# Allow up to 1024 tokens of thinking for this request.
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="You roll two dice. What's the probability they add up to 7?",
    config=genai.types.GenerateContentConfig(
        thinking_config=genai.types.ThinkingConfig(
            thinking_budget=1024
        )
    )
)

print(response.text)
Find detailed API references and thinking guides in our developer documentation, or get started with code examples in the Gemini Cookbook.
We will continue to improve Gemini 2.5 Flash, with more coming soon, before making it generally available for full production use.
*Model pricing is sourced from Artificial Analysis and company documentation.