Updated production-ready Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more

Today we are releasing two updated, production-ready Gemini models: Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002, along with:

  • >50% price reduction on 1.5 Pro (both input and output, for prompts <128K tokens)
  • 2x higher rate limits on 1.5 Flash and ~3x higher on 1.5 Pro
  • 2x faster output and 3x lower latency
  • Updated default filter settings

These new models build on our latest experimental releases and include meaningful improvements to the Gemini 1.5 models presented at Google I/O in May. Developers can access our latest models for free via Google AI Studio and the Gemini API. For larger organizations and Google Cloud customers, the models are also available on Vertex AI.
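As a quick orientation, the sketch below calls one of the new models by name through the Gemini API. It assumes the `google-generativeai` Python SDK and a `GOOGLE_API_KEY` environment variable; neither is specified in this post, so treat the calls as an illustration rather than a prescribed setup.

```python
import os

# New production model names announced in this post.
PRO_MODEL = "gemini-1.5-pro-002"
FLASH_MODEL = "gemini-1.5-flash-002"

# The API call only runs when a key is configured; the SDK usage below
# is an assumption based on the public Python client, not this post.
if os.environ.get("GOOGLE_API_KEY"):
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(FLASH_MODEL)
    response = model.generate_content("Summarize this announcement in one sentence.")
    print(response.text)
```

Swapping `FLASH_MODEL` for `PRO_MODEL` is all it takes to target the larger model.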


Improved overall quality, with larger gains in math, long context, and vision

The Gemini 1.5 series of models is designed for general performance across a wide range of text, code, and multimodal tasks. For example, Gemini models can be used to synthesize information from 1,000-page PDFs, answer questions about repositories containing more than 10,000 lines of code, produce useful content from hour-long videos, and more.

With the latest updates, 1.5 Pro and Flash are now better, faster, and cheaper to use in production. We see a ~7% increase on MMLU-Pro, a more challenging version of the popular MMLU benchmark. On the MATH and HiddenMath (an internal set of competition math problems) benchmarks, both models made a considerable ~20% improvement. For vision and code use cases, both models also perform better (ranging from ~2 to 7%) across evals measuring visual understanding and Python code generation.

We've also improved the overall helpfulness of model responses while continuing to uphold our content safety policies and standards. This means fewer punts and refusals and more helpful responses across many topics.

In response to developer feedback, both models now have a more concise default style, intended to make the models easier to use and to reduce costs. For use cases like summarization, question answering, and extraction, the default output length of the updated models is ~5-20% shorter than previous models. For chat-based products where users may prefer longer responses by default, you can read our prompting strategies guide to learn more about how to make the model's outputs more verbose and conversational.
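One common way to counteract the shorter default style is a system instruction that asks for a more conversational register. This is a minimal sketch under the same assumptions as before (the `google-generativeai` SDK and a `GOOGLE_API_KEY` variable); the instruction text itself is hypothetical and should be tuned for your product.

```python
import os

# Hypothetical system instruction encouraging longer, chat-friendly output.
VERBOSE_STYLE = (
    "Respond conversationally and in detail. Expand on your answers with "
    "examples and follow-up suggestions rather than giving terse replies."
)

if os.environ.get("GOOGLE_API_KEY"):
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    # The Python SDK's GenerativeModel accepts a system_instruction argument.
    chat_model = genai.GenerativeModel(
        "gemini-1.5-flash-002",
        system_instruction=VERBOSE_STYLE,
    )
    print(chat_model.start_chat().send_message("Hi!").text)
```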

For more information on migrating to the latest versions of Gemini 1.5 Pro and 1.5 Flash, refer to the Gemini API models page.


Gemini 1.5 Pro

We continue to be amazed by the creative and useful applications of Gemini 1.5 Pro's 2 million token long context window and multimodal capabilities. From video understanding to processing 1,000-page PDFs, there are still many new use cases to be built. Today we are announcing a 64% price reduction on input tokens, a 52% price reduction on output tokens, and a 64% price reduction on incremental cached tokens for our strongest 1.5 series model, Gemini 1.5 Pro, effective October 1, 2024, for prompts less than 128K tokens. Combined with context caching, this continues to drive down the cost of building with Gemini.
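The cached-token discount pairs naturally with the SDK's context caching feature: you upload a large context once and reuse it across prompts. The sketch below assumes the `google-generativeai` SDK's `caching` module, a `GOOGLE_API_KEY` variable, and a hypothetical `report.txt` file; cached content also has a minimum size requirement in practice, so this is illustrative only.

```python
import datetime
import os

MODEL = "models/gemini-1.5-pro-002"

if os.environ.get("GOOGLE_API_KEY"):
    import google.generativeai as genai
    from google.generativeai import caching

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    # Cache a large document once; later prompts reuse those tokens at
    # the reduced incremental cached-token price.
    cache = caching.CachedContent.create(
        model=MODEL,
        display_name="big-report",                # hypothetical name
        contents=[open("report.txt").read()],     # hypothetical file
        ttl=datetime.timedelta(minutes=30),
    )
    model = genai.GenerativeModel.from_cached_content(cache)
    print(model.generate_content("What are the key findings?").text)
```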

Increased rate limits

To make it even easier for developers to build with Gemini, we are increasing the paid tier rate limits for 1.5 Flash to 2,000 RPM and for 1.5 Pro to 1,000 RPM, up from 1,000 and 360 RPM, respectively. We expect to continue raising Gemini API rate limits in the coming weeks so developers can build more with Gemini.
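Even with higher limits, bursty traffic can still hit the per-minute cap, so a client-side retry with exponential backoff remains good practice. This is a generic sketch, not part of the Gemini SDK; in real code you would catch the SDK's specific rate-limit (HTTP 429) exception instead of a bare `Exception`.

```python
import random
import time


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 32.0):
    """Exponential backoff schedule (seconds), capped, for 429 retries."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_retry(fn, attempts: int = 5, base: float = 1.0):
    """Call fn, sleeping a jittered backoff delay after each failure."""
    for delay in backoff_delays(attempts, base=base):
        try:
            return fn()
        except Exception:  # in practice: the SDK's rate-limit error
            time.sleep(delay * random.random())  # jittered wait
    return fn()  # final attempt; let any error propagate
```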


2x faster output and 3x lower latency

In addition to core improvements to our latest models, over the last few weeks we have driven down 1.5 Flash latency and significantly increased output tokens per second, enabling new use cases with our most powerful models.

Updated filter settings

Since the first Gemini launch in December 2023, building a safe and reliable model has been a key focus. With the latest versions of Gemini (-002 models), we've improved the model's ability to follow user instructions while maintaining safety. We will continue to offer a suite of safety filters that developers may apply to Google's models. For the models released today, the filters will not be applied by default, so that developers can determine the configuration best suited for their use case.
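Since the -002 models no longer apply filters by default, requests that need filtering should set them explicitly. A minimal sketch, assuming the `google-generativeai` SDK (which accepts string forms for categories and thresholds) and a `GOOGLE_API_KEY` variable; the specific thresholds here are illustrative, not a recommendation.

```python
import os

# Explicit filter configuration for a -002 model. Category and threshold
# names follow the public SDK's string forms; tune them per use case.
SAFETY_SETTINGS = {
    "HARM_CATEGORY_HARASSMENT": "BLOCK_ONLY_HIGH",
    "HARM_CATEGORY_HATE_SPEECH": "BLOCK_ONLY_HIGH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_MEDIUM_AND_ABOVE",
    "HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_MEDIUM_AND_ABOVE",
}

if os.environ.get("GOOGLE_API_KEY"):
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro-002")
    response = model.generate_content(
        "Write a short villain monologue.",
        safety_settings=SAFETY_SETTINGS,
    )
    print(response.text)
```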


Gemini 1.5 Flash-8B Experimental Updates

We are releasing a further improved version of the Gemini 1.5 Flash-8B model announced in August, called "Gemini-1.5-Flash-8B-Exp-0924". This improved version delivers significant performance gains across both text and multimodal use cases. It is available now via Google AI Studio and the Gemini API.

The overwhelmingly positive feedback developers have shared about 1.5 Flash-8B has been incredible to see, and we will continue to shape our experimental-to-production release pipeline based on developer feedback.

We're excited about these updates and can't wait to see what you build with the new Gemini models! And for Gemini Advanced users, you will soon be able to access a chat-optimized version of Gemini 1.5 Pro-002.
