Today we are glad that we can share updates on the entire form of our model family Gemini 2.5:
- Gemini 2.5 Pro is generally available and stable (no changes in the review 06-05)
- Gemini 2.5 Flash is generally available and stable (no changes in review 05-20, see price updates below)
- Gemini 2.5 Flash-Lite is now available in preview
Gemini 2.5 models are thought of models capable of reasoning their thoughts before response, which results in increased performance and better accuracy. Each model has control over the budget of thinking, which gives programmers the opportunity to choose when and how much the “thoughts” model before generating answers.
Review of our family thinking models Gemini 2.5
We present Gemini 2.5 Flash-Lite
Today we are introducing 2.5 flash-lite in a preview with the lowest delay and costs in a model family 2.5. It is designed as a profitable update from our previous 1.5 and 2.0 flash models. It also offers better performance in most evolution and a lower time for the first token, while reaching higher tokens per second. This model is perfect for high bandwidth tasks, such as classification or a summary on a scale.
Gemini 2.5 Flash-Lite is a model of reasoning that allows dynamic control of thinking budget with the API parameter. Because Flash-Lite is optimized in terms of costs and speed, “thinking” is turned off by default, unlike our other models. 2.5 Flash-Lite also supports all our native tools, such as grounding with Google search, code and URL context in addition to calling function.
Benchmarks for Gemini 2.5 Flash-Lite
Flash updates and Gemini prices 2.5
Over the past year, our research teams are still pushing Pareto Frontier thanks to our Flash Model series. When 2.5 flash was initially announced, we have not yet finalized the possibility of 2.5 Flash-Lite. We also launched “thinking” and “do not think about the price”, which led to the confusion of programmers.
Thanks to the stable version of Gemini 2.5 Flash implementation (i.e. the same preview of the 05-20 model, which we made available on Google I/O) and an amazing efficiency of 2.5 flash, we update prices for 2.5 flash:
- $ 0.30 / 1 million input tokens (*compared to the $ 0.15 input)
- Output tokens 2.50 USD / 1 ml
- We removed thinking vs. They don't think about price difference
- We have maintained one layer of price, regardless of the size of the input token
Although we try to maintain consistent prices between a preview and stable layoffs to minimize interference, this is a special regulation reflecting the exceptional value of Flash, still offering the best available cost for intelligence.
And thanks to Gemini 2.5 Flash-Lite we now have an even lower option (with or without thinking) for cases of using costs and delays that require less model intelligence.
Price updates for our family Gemini Flash
If you use the Flash Gemini 2.5 04-17 preview, the existing preview prices will remain valid until the planned depreciation on July 15, 2025, at this point this end point will be turned off. You can go to the generally available “Gemini-2.5-Flash” model or go to the 2.5 Flash-Lite preview as an option of a lower cost.
Continuous growth of Gemini 2.5 Pro
The growth and demand for Gemini 2.5 Pro are still the most steep of our models we've ever seen. To enable more customers for this model in production, we create the version of the 06-05 model, with the same Pareto Frontier price point as before.
We expect that cases in which you need the highest intelligence, and most of the possibilities are in the place where you see pro shine, such as coding and agency tasks. Gemini 2.5 Pro is at the center of many most loved programmers tools.
The best programmers tools using Gemini 2.5 Pro
If you use 2.5 Pro Preview 05-06, the model will remain available until June 19, 2025, and then it will be turned off. If you use 2.5 Pro Preview 06-05, you can simply update your model model for “Gemini-2.5-PRO”.
We can't wait for even more domains to use 2.5 Pro intelligence and we are waiting for more about Scaling Beyond Pro in the near future.