Predicting cyclones using neural networks: o3-pro trades speed for accuracy in code, science

Video game performers’ new AI deal. Meta’s new robotics world model. Mistral’s multilingual reasoning models. An open-source template for research agents.

Welcome back! In today’s edition of Data Points, you’ll learn more about:

  • Video game performers’ new AI deal
  • Meta’s new robotics world model
  • Mistral’s multilingual reasoning models
  • An open-source template for research agents

But first:

Google DeepMind launches Weather Lab with hurricane predictions

Google DeepMind and Google Research launched Weather Lab, an interactive website featuring experimental AI weather models that predict tropical cyclone formation, track, intensity, size, and shape up to 15 days ahead. The new model, based on stochastic neural networks, generates 50 possible scenarios and, in internal testing, matches or exceeds the accuracy of current physics-based methods. The system overcomes the traditional trade-off between track and intensity forecasting by training on both global weather data and specialized cyclone databases, producing 5-day track predictions that land, on average, 140 kilometers closer to a cyclone's actual position than those of leading ensemble models. Google partnered with the U.S. National Hurricane Center to validate the approach, and NHC forecasters now view live AI predictions alongside traditional models, which could improve official forecasts and warnings. Weather Lab provides free access to live predictions and over two years of historical data for research purposes. (Google)
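To make the 140-kilometer figure concrete, here is a minimal Python sketch of how a track error could be measured for an ensemble of scenarios: average the members' predicted storm centers, then take the great-circle distance to the observed center. This is an illustration only, not Google's evaluation code. The function names and the simple latitude/longitude averaging are assumptions, and the averaging is only reasonable away from the date line and the poles.

```python
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def ensemble_mean_position(members):
    """Average the (lat, lon) storm centers predicted by the ensemble members.

    Hypothetical helper: a plain arithmetic mean of degrees.
    """
    lats = [lat for lat, _ in members]
    lons = [lon for _, lon in members]
    return mean(lats), mean(lons)


def track_error_km(members, observed):
    """Distance from the ensemble-mean center to the observed center."""
    mean_lat, mean_lon = ensemble_mean_position(members)
    return haversine_km(mean_lat, mean_lon, observed[0], observed[1])
```

Comparing this error for two forecasting systems at the same lead time, averaged over many storms, gives the kind of "kilometers closer" statistic quoted above.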

OpenAI launches o3-pro, its most advanced reasoning model

OpenAI released an enhanced version of its o3 model that generates more reasoning tokens to deliver more reliable responses across math, science, and coding tasks, although responses take longer than those of the previous o1-pro model. Evaluations show o3-pro outperforms both o3 and o1-pro in all tested categories, with particularly strong results in science, education, programming, business, and writing assistance. ChatGPT Pro and Team users can access o3-pro immediately through the model picker, while Enterprise and Education users will receive access next week. o3-pro is also available via API at a price sharply lower than o1-pro's: $20 per million input tokens and $80 per million output tokens. (OpenAI)
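For a rough sense of what those rates mean per request, here is a small Python sketch that applies the quoted prices to a token count. The function name and the example token counts are hypothetical, and the sketch assumes that reasoning tokens are billed as output tokens, so a heavy-reasoning request can cost more than its visible answer suggests.

```python
# Prices quoted above, in U.S. dollars per million tokens.
INPUT_USD_PER_MILLION = 20.0
OUTPUT_USD_PER_MILLION = 80.0


def request_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request.

    Assumption: output_tokens includes any reasoning tokens the model generates.
    """
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


# Hypothetical request: 2,000 input tokens and 10,000 output tokens.
example_cost = request_cost_usd(2_000, 10_000)
```

In this hypothetical case the output side dominates: 10,000 output tokens cost $0.80, versus $0.04 for the 2,000 input tokens.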

SAG-AFTRA board approves video game performer AI agreement

The U.S. actors’ union’s national board approved a tentative agreement with video game companies that includes AI protections and compensation increases for voice actors and performers. The deal requires informed consent for AI uses, establishes minimum payments for digital replicas, and sets higher rates (7.5 times scale) for real-time AI-generated performances like chatbot voices in games. The three-year contract also provides immediate pay increases upon ratification, with additional raises scheduled annually through 2027. SAG-AFTRA claims this is the first major entertainment industry contract to establish comprehensive AI safeguards following recent strikes over technology concerns. The full contract terms will be released June 18, with union members voting on ratification after a strike that ended June 11. (SAG-AFTRA)

Meta’s V-JEPA 2 teaches robots to predict physical interactions through video

Meta unveiled V-JEPA 2, a world model trained on video data that helps robots and AI agents understand and predict physical interactions in their environment. The model learns patterns from video footage, including how people handle objects and how objects move and interact, enabling robots to perform tasks like picking up and placing items in unfamiliar settings. V-JEPA 2 builds on Meta’s original V-JEPA from last year, improving the system’s ability to understand and predict physical outcomes. Meta says V-JEPA 2 will accelerate the development of robots that can “think before they act,” making them more useful in real-world applications. Meta also released three new benchmarks to help researchers evaluate how well AI models learn and reason about the physical world through video. (Meta)

Mistral AI unveils Magistral reasoning models

Mistral AI launched Magistral, its first family of reasoning models. The company released two versions: a 24-billion-parameter open-weights model called Magistral Small and a larger enterprise variant, Magistral Medium, which scored 73.6 percent on the AIME 2024 math benchmark (jumping to 90 percent when given multiple tries). Magistral can reason natively across languages and alphabets, rather than thinking in English first and then translating. Mistral also claims that Magistral can return answers up to ten times as fast as competing reasoning models. (Mistral)
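One common way to turn multiple tries into a single answer is majority voting: sample several responses and keep the most frequent final answer. The Python sketch below illustrates that general technique. Whether it matches how the 90 percent figure was produced is an assumption here, and the function name is hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among several sampled attempts.

    Ties go to the answer that appeared first, which is how
    Counter.most_common orders equal counts.
    """
    if not answers:
        raise ValueError("need at least one answer")
    return Counter(answers).most_common(1)[0][0]
```

For example, if three attempts at a problem return "204", "17", and "204", the vote selects "204", so one stray wrong attempt does not change the final answer.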

Google releases Gemini LangGraph project for research-augmented AI

Google released a full-stack application template that combines a React frontend with a LangGraph-powered backend to create AI agents capable of comprehensive web research. The system uses Google’s Gemini models to dynamically generate search queries, analyze results for knowledge gaps, and iteratively refine searches until producing well-supported answers with citations. The agent architecture includes reflection capabilities that allow it to assess information sufficiency and generate follow-up queries when needed. This open-source quickstart provides developers with a complete example of building research-augmented conversational AI using LangGraph’s agent framework. The project is available under Apache License 2.0 and includes Docker deployment configurations for production use. (GitHub)
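The search-and-reflect loop described above can be sketched in plain Python. This is not the quickstart's code and does not use LangGraph's API. The three callables (generate_queries, search, find_gaps) and the ResearchState class are hypothetical stand-ins for the model and search calls a real agent would make.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResearchState:
    """Everything the agent has gathered so far (hypothetical structure)."""
    question: str
    findings: List[str] = field(default_factory=list)
    loops: int = 0


def run_research(
    question: str,
    generate_queries: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    find_gaps: Callable[[str, List[str]], List[str]],
    max_loops: int = 3,
) -> ResearchState:
    """Search, reflect on what is missing, and search again until satisfied.

    find_gaps returns follow-up queries; an empty list means the findings
    are judged sufficient. max_loops caps the number of search rounds.
    """
    state = ResearchState(question)
    queries = generate_queries(question)
    while queries and state.loops < max_loops:
        for query in queries:
            state.findings.extend(search(query))
        state.loops += 1
        # Reflection step: ask what is still missing given the findings so far.
        queries = find_gaps(question, state.findings)
    return state
```

The cap on loops is a design choice worth keeping in any version of this pattern: without it, a reflection step that always finds another gap would keep searching indefinitely.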


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng highlighted the rise of GenAI Application Engineers and the key skills that make them successful — from mastering AI building blocks to using AI-assisted coding tools effectively.

“Skilled GenAI Application Engineers meet two primary criteria: (i) They are able to use the new AI building blocks to quickly build powerful applications. (ii) They are able to use AI assistance to carry out rapid engineering, building software systems in dramatically less time than was possible before.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth:

  • Black Forest Labs launched FLUX.1 Kontext, a tool for generating and altering images with more consistent character identities and visual styles.
  • New research revealed that benchmarking reasoning in large language models is becoming increasingly expensive due to rising computational costs.
  • Venture capitalist Mary Meeker revived her influential trend reports with a data-rich analysis of the current AI market boom.
  • STORM, a new video model, outperformed GPT-4o on key video understanding benchmarks while processing significantly fewer tokens.
