Updated production-ready Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more

Today we are releasing two updated production-ready Gemini models, Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002, along with:

  • >50% reduced price on 1.5 Pro (both input and output for prompts <128K)
  • 2x higher rate limits on 1.5 Flash and ~3x higher on 1.5 Pro
  • 2x faster output and 3x lower latency
  • Updated default filter settings

These new models build on our latest experimental model releases and include meaningful improvements to the Gemini 1.5 models released at Google I/O in May. Developers can access our latest models for free via Google AI Studio and the Gemini API. For larger organizations and Google Cloud customers, the models are also available on Vertex AI.
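As a quick sketch of what calling one of the new models looks like, here is a minimal request to the Gemini API's public REST `generateContent` endpoint using only the Python standard library. The API key and prompt are placeholders; check the Gemini API documentation for current details.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; create a key in Google AI Studio
MODEL = "gemini-1.5-pro-002"

def build_request(model: str, prompt: str, api_key: str):
    """Build the URL and JSON body for a generateContent call."""
    url = (
        "https://generativelanguage.googleapis.com/v1beta/"
        f"models/{model}:generateContent?key={api_key}"
    )
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, json.dumps(body).encode("utf-8")

url, body = build_request(MODEL, "Summarize this report in three bullets.", API_KEY)

# Uncomment to actually send the request (requires a valid API key):
# req = urllib.request.Request(url, data=body,
#                              headers={"Content-Type": "application/json"})
# with urllib.request.urlopen(req) as resp:
#     result = json.loads(resp.read())
#     print(result["candidates"][0]["content"]["parts"][0]["text"])
```

Swapping `MODEL` for `"gemini-1.5-flash-002"` targets the updated Flash model instead.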


Improved overall quality, with larger gains in math, long context, and vision

The Gemini 1.5 series are models designed for general performance across a wide range of text, code, and multimodal tasks. For example, Gemini models can be used to synthesize information from 1,000-page PDFs, answer questions about repositories containing more than 10,000 lines of code, take in hour-long videos and create useful content from them, and more.

With the latest updates, 1.5 Pro and Flash are now better, faster, and more cost-efficient to build with in production. We see a ~7% increase on MMLU-Pro, a more challenging version of the popular MMLU benchmark. On MATH and HiddenMath (an internal set of competition math problems), both models made a considerable ~20% improvement. For vision and code use cases, both models also perform better (ranging from ~2-7%) across evals measuring visual understanding and Python code generation.

We also improved the overall helpfulness of model responses, while continuing to uphold our content safety policies and standards. This means fewer punts/refusals and more helpful responses across many topics.

Both models now have a more concise style in response to developer feedback, which is intended to make these models easier to use and reduce costs. For use cases like summarization, question answering, and extraction, the default output length of the updated models is ~5-20% shorter than previous models. For chat-based products, where users may prefer longer responses by default, you can read our prompting strategies guide to learn more about how to make the models more verbose and conversational.
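One way to steer the new, more concise models back toward a chattier style is a system instruction sent with the request. The sketch below builds a request body with the v1beta `systemInstruction` field; the instruction text itself is our own illustrative example, not an official recommendation.

```python
import json

# Hypothetical system instruction nudging the -002 models toward a more
# conversational, detailed style (their new default output is more concise).
payload = {
    "systemInstruction": {
        "parts": [{"text": "Respond in a friendly, conversational tone and "
                           "explain your reasoning in detail."}]
    },
    "contents": [{"parts": [{"text": "What's a good first telescope?"}]}],
}

# This JSON body can be POSTed to the generateContent endpoint for
# gemini-1.5-flash-002 or gemini-1.5-pro-002.
print(json.dumps(payload, indent=2))
```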

For more details on migrating to the latest versions of Gemini 1.5 Pro and 1.5 Flash, check out the Gemini API models page.


Gemini 1.5 Pro

We continue to be impressed by the creative and useful applications of Gemini 1.5 Pro's 2 million token long context window and multimodal capabilities. From video understanding to processing 1,000-page PDFs, there are so many new use cases still to be built. Today we are announcing a 64% price reduction on input tokens, a 52% price reduction on output tokens, and a 64% price reduction on incremental cached tokens for our strongest 1.5 series model, Gemini 1.5 Pro, effective October 1, 2024, on prompts less than 128K tokens. Combined with context caching, this continues to drive the cost of building with Gemini down.
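To see what the percentage reductions mean in practice, here is a small cost calculation. Only the 64% input and 52% output reduction figures come from this announcement; the base per-million-token prices and the workload sizes are made-up placeholders, not official list prices.

```python
# Illustrative example: apply the announced percentage reductions to a
# hypothetical workload. Base prices below are placeholders; only the
# 64% / 52% cut figures come from the announcement.
INPUT_CUT, OUTPUT_CUT = 0.64, 0.52

def monthly_cost(input_tokens, output_tokens, in_price, out_price):
    """Cost in dollars, for prices quoted per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical workload: 50M input + 5M output tokens per month,
# with placeholder old prices of $3.50 / $10.50 per 1M tokens.
old = monthly_cost(50e6, 5e6, 3.50, 10.50)
new = monthly_cost(50e6, 5e6,
                   3.50 * (1 - INPUT_CUT),
                   10.50 * (1 - OUTPUT_CUT))
print(f"old: ${old:.2f}, new: ${new:.2f}, saved: {1 - new / old:.0%}")
```

For input-heavy workloads like this one, the blended saving lands close to the 64% input cut; output-heavy workloads would track the 52% figure more closely.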

Increased rate limits

To make it even easier for developers to build with Gemini, we are increasing the rate limits for 1.5 Flash to 2,000 RPM and for 1.5 Pro to 1,000 RPM, up from 1,000 and 360, respectively. In the coming weeks, we expect to continue increasing Gemini API rate limits so developers can build more with Gemini.
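With explicit RPM figures, client code can pace its own requests to stay under the limit. Below is a minimal sleep-based pacer using the limits quoted above; the `RpmPacer` class name and structure are our own sketch, and a production client would also handle bursts and retry on HTTP 429.

```python
import time

class RpmPacer:
    """Naive client-side pacer: spaces calls evenly so they never exceed a
    requests-per-minute budget. Real clients would add burst allowances
    and retry/backoff on 429 responses."""

    def __init__(self, rpm: int):
        self.min_interval = 60.0 / rpm  # seconds between requests
        self._last = 0.0

    def wait(self):
        """Sleep just long enough to respect the configured spacing."""
        now = time.monotonic()
        sleep_for = self._last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()

flash_pacer = RpmPacer(2000)  # 1.5 Flash: 2,000 RPM
pro_pacer = RpmPacer(1000)    # 1.5 Pro: 1,000 RPM
print(f"Flash min spacing: {flash_pacer.min_interval * 1000:.0f} ms")
```

Calling `flash_pacer.wait()` before each API request keeps a single-threaded client within the 2,000 RPM budget.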


2x faster output and 3x lower latency

Along with core improvements to our latest models, over the last few weeks we have driven down the latency of 1.5 Flash and significantly increased the output tokens per second, enabling new use cases with our most powerful models.

Updated filter settings

Since the first launch of Gemini in December 2023, building a safe and reliable model has been a key focus. With the latest versions of Gemini (-002 models), we've improved the model's ability to follow user instructions while balancing safety. We will continue to offer a suite of safety filters that developers may apply to Google's models. For the models released today, the filters will not be applied by default so that developers can determine the configuration best suited for their use case.
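Since the filters are now off by default, developers who want them must opt in per request. The sketch below builds a request body with the v1beta `safetySettings` field; the category and threshold strings follow the enum names published in the Gemini API documentation, and the prompt is our own example.

```python
import json

# With the -002 models, safety filters are opt-in: attach them per request.
# Category/threshold strings follow the Gemini API's published enum names.
payload = {
    "contents": [{"parts": [{"text": "Write a villain monologue for my novel."}]}],
    "safetySettings": [
        {"category": "HARM_CATEGORY_HARASSMENT",
         "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
         "threshold": "BLOCK_ONLY_HIGH"},
    ],
}
print(json.dumps(payload, indent=2))
```

Omitting `safetySettings` entirely leaves the new default (no filters applied) in effect.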


Gemini 1.5 Flash-8B Experimental updates

We are releasing a further improved version of the Gemini 1.5 Flash-8B model we announced in August, called "gemini-1.5-flash-8b-exp-0924." This improved version includes significant performance increases across both text and multimodal use cases. It is available now via Google AI Studio and the Gemini API.

The overwhelmingly positive feedback developers have shared about 1.5 Flash-8B has been amazing to see, and we will continue to shape our experimental release pipeline based on developer feedback.

We're excited about these updates and can't wait to see what you'll build with the new Gemini models! And for Gemini Advanced users, you will soon be able to access a chat-optimized version of Gemini 1.5 Pro-002.
