Google I/O Overdrive: Google’s new AI offerings include Veo 3 video generator, lightweight Gemma 3n, updates to Gemini Pro and Ultra, and more
Google revamped its roster of models, closed and open, and added more AI-powered features to its existing products.

What’s new: Google staged a parade of announcements at this year’s I/O developer conference. New offerings include improvements to Gemini 2.5 Pro and Gemini 2.5 Flash and a preview of Gemma 3n (all three generally available in June), the updated Veo 3 video generator (available via Flow, Google’s AI videography app, for paid subscribers to its AI Pro and Ultra services), and increasingly AI-powered search.
How it works: The I/O offerings ranged from public-facing products to developer tools.
- Google updated Gemini 2.5 Pro and the speedier Gemini 2.5 Flash with audio output, so both models now take in text, audio, images, and video and produce text and audio. In addition, they offer summaries of tokens produced while reasoning. Gemini-2.5-Pro-Preview-05-06, which topped the LMSys Text Arena and WebDev Arena (tied with Claude 4 Opus and Sonnet), lets users set a reasoning budget up to 128,000 tokens, enabling it to outperform OpenAI o3 and o4-mini (set to high effort) on math, coding, and multimodal benchmarks in Google’s tests. Gemini-2.5-Flash-Preview-05-20 uses 22 percent fewer tokens than its predecessor while ranking near the top of the LMSys Text Arena and WebDev Arena.
- The Veo 3 text-to-video generator produces 3840x2160-pixel video with audio (dialogue, sound effects, and music) and offers creative controls, including the ability to add and remove objects and maintain consistent characters. It bested Kuaishou Kling 2.0, Runway Gen 3, and OpenAI Sora in Google’s comparisons.
- New members of Google’s Gemma 3 family of open-weights models, Gemma 3n 5B and 8B, are multilingual (over 140 languages), multimodal (text, vision, audio in; text out), and optimized for mobile platforms. Gemma-3n-E4B-it (8 billion parameters) ranks just ahead of Anthropic Claude 3.7 Sonnet in the LMSys Text Arena. Gemma 3n 5B and 8B are 1.5 times faster than their predecessors and require 2 gigabytes and 3 gigabytes of memory, respectively, thanks to techniques that include per-layer embeddings, key-value caching, conditional parameter loading (constraining active parameters to specific modalities at inference), and a Matryoshka Transformer design that dynamically activates nested sub-models. They’re available in preview via Google’s AI Studio, AI Edge, GenAI SDK, or MediaPipe.
- Google introduced several specialized AI tools and models. Jules is an autonomous, asynchronous, multi-agent coding assistant that clones repos into a secure virtual machine to perform tasks like writing tests, building features, and fixing bugs (available in public beta). SignGemma translates American Sign Language to text (previously ASL to English). MedGemma analyzes medical text and images (part of the open-weights collection Health AI Developer Foundations).
- Google is extending AI in search beyond Google Search’s AI Overviews. Google Search’s AI Mode uses Gemini 2.5 to deliver a “deep search” mode that decomposes users’ questions into hundreds of sub-queries for analysis and visualization. Google plans to integrate AI Mode features into its core search product. In addition, Google Search’s AI Mode will gain Search Live (real-time, audio-enabled visual interaction via camera) and agentic features (for tasks such as purchasing tickets). Computer-use capabilities are coming to the Gemini API and Vertex AI.
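To make the idea of nested sub-models mentioned in the Gemma 3n item more concrete, here is a minimal sketch in plain Python. It is not Google’s implementation: the function names, layer sizes, and random weights are all hypothetical, and it assumes only the general Matryoshka-style idea that a smaller model can reuse a prefix of a larger model’s feed-forward hidden units.

```python
import random


def make_ffn(d_model, d_hidden, seed=0):
    """Create random weights for a toy two-layer feed-forward block (hypothetical sizes)."""
    rng = random.Random(seed)
    # w_in[j] holds the input weights of hidden unit j (d_hidden x d_model).
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(d_model)] for _ in range(d_hidden)]
    # w_out[i][j] maps hidden unit j to output dimension i (d_model x d_hidden).
    w_out = [[rng.uniform(-1.0, 1.0) for _ in range(d_hidden)] for _ in range(d_model)]
    return w_in, w_out


def ffn_forward(x, w_in, w_out, active_hidden):
    """Run the block using only the first `active_hidden` hidden units."""
    if not 1 <= active_hidden <= len(w_in):
        raise ValueError("active_hidden must be between 1 and the full hidden width")
    # ReLU over the active prefix of hidden units; the remaining units are never computed.
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(w_in[j], x)))
        for j in range(active_hidden)
    ]
    # Project back to the model dimension using the matching prefix of output weights.
    return [
        sum(row[j] * hidden[j] for j in range(active_hidden))
        for row in w_out
    ]


def active_params(d_model, active_hidden):
    """Count the weights touched by a sub-model of the given hidden width."""
    return 2 * d_model * active_hidden


w_in, w_out = make_ffn(d_model=4, d_hidden=8)
x = [1.0, -0.5, 0.25, 2.0]
small = ffn_forward(x, w_in, w_out, active_hidden=4)  # nested sub-model
full = ffn_forward(x, w_in, w_out, active_hidden=8)   # full model
print(active_params(4, 4), active_params(4, 8))  # prints "32 64"
```

In this toy, the small sub-model shares its weights with the full one, so a device could choose the hidden width per request and trade quality for memory and speed without storing a second set of weights. Real systems train the nested widths jointly so that each prefix performs well on its own, a step this sketch omits.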
Why it matters: Google is catching up with the Microsoft/OpenAI colossus on several fronts. The addition of audio output to Gemini and audio input to Gemma 3n fuels the rise of voice-to-voice and other audio applications and gives developers powerful new tools to build them. At the same time, Veo 3’s text-to-video-plus-audio output shows marked improvement over the previous version.
Behind the news: The number of tokens Google processes monthly surged from 9.7 trillion a year ago to 480 trillion, a roughly 50-fold increase and a sign that its AI APIs and AI-infused products are rapidly gaining traction. Google’s progress contrasts with Apple’s ongoing struggles. Both companies share advantages in smartphones and app distribution. But while Google has showcased a string of advanced models as well as early efforts to integrate them into legacy products, Apple’s organizational challenges have hampered its AI development. Now Apple must contend with OpenAI’s acquisition of io, the startup co-founded by Apple’s former lead product designer Jony Ive.
We’re thinking: Google I/O 2025 was a strong showing of generative AI capabilities! There’s still work to be done to translate these innovations into compelling products, but the company now has a solid base to build on.