Gemini 3 Flash delivers fast, low-cost reasoning; Tencent’s video world model keeps scenes and geometry consistent
OpenAI’s leaderboard-topping GPT Image 1.5. Qwen’s 30B long-context reasoning model. iRobot’s bankruptcy and planned acquisition. Microsoft’s O-Voxel 3D generation model.
In today’s edition of Data Points, you’ll learn more about:
- OpenAI’s leaderboard-topping GPT Image 1.5
- Qwen’s 30B long-context reasoning model
- iRobot’s bankruptcy and planned acquisition
- Microsoft’s O-Voxel 3D generation model
But first:
Google’s Gemini 3 Flash matches frontier models at triple the speed
Google released Gemini 3 Flash, a model that delivers frontier-level reasoning while running significantly faster and cheaper than previous versions. It scores 90.4 percent on GPQA Diamond and matches Gemini 3 Pro’s performance on several benchmarks, but uses 30 percent fewer tokens than Gemini 2.5 Pro for typical tasks. On coding tasks, it hits 78 percent on SWE-bench Verified, beating both the 2.5 series and Gemini 3 Pro. The model costs 50 cents per million input tokens and 3 dollars per million output tokens. Google is making it the default model in the Gemini app worldwide for free, rolling it out in AI Mode in Search, and offering it to developers through the Gemini API, Google AI Studio, Vertex AI, and its new Antigravity IDE. (Google)
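For developers, a minimal sketch of calling the model through the Gemini API with the google-genai Python SDK might look like the following; the model identifier “gemini-3-flash” and the cost arithmetic are assumptions based on the prices quoted above, so check Google AI Studio for the exact string.

```python
# Minimal sketch, assuming the google-genai SDK and an assumed model name.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3-flash",  # assumed identifier; confirm in Google AI Studio
    contents="Summarize the tradeoffs between model speed and reasoning depth.",
)
print(response.text)

# Rough cost estimate from the prices quoted above:
# $0.50 per million input tokens, $3 per million output tokens.
usage = response.usage_metadata
cost = (usage.prompt_token_count * 0.50 + usage.candidates_token_count * 3.00) / 1_000_000
print(f"Estimated cost: ${cost:.6f}")
```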
Tencent’s WorldPlay video model keeps revisited scenes consistent
Researchers built WorldPlay to solve a core problem in interactive world models: existing systems are either fast but inconsistent, or consistent but slow. WorldPlay achieves both through three key techniques. First, it uses dual action controls—combining keyboard inputs (which work across different scene scales) with precise camera positions (which enable accurate memory retrieval). Second, it maintains a “reconstituted context memory” that pulls relevant past frames and uses “temporal reframing” to keep geometrically important old frames influential, essentially treating distant memories as if they’re recent. Third, it uses “context forcing,” a distillation method that aligns what the teacher and student models remember, enabling real-time generation without losing consistency or accumulating errors. Trained on 320,000 real and synthetic videos, the system runs at 24 FPS on 8 H800 GPUs and works across first-person and third-person views in both realistic and stylized environments. It also supports 3D reconstruction and lets users trigger events with text prompts during generation. (arXiv)
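To make the memory mechanism concrete, here is a hypothetical sketch of the retrieve-then-reframe idea: pull past frames whose camera poses are close to the current view, then reassign them recent time indices so they stay influential. Function and field names are illustrative and not taken from the paper.

```python
# Hypothetical sketch of a reconstituted context memory with temporal reframing.
import numpy as np

def reconstitute_context(memory, current_pose, window=8):
    """memory: list of dicts with 'frame', 'pose' (4x4 camera matrix), 'step'."""
    cur_pos = current_pose[:3, 3]
    # Retrieve the frames whose camera positions are closest to the current view.
    retrieved = sorted(
        memory,
        key=lambda m: np.linalg.norm(m["pose"][:3, 3] - cur_pos),
    )[:window]
    # Temporal reframing: give the retrieved frames fresh, recent indices so
    # geometrically important but old views still condition the next frame.
    context = []
    for new_step, m in enumerate(sorted(retrieved, key=lambda m: m["step"])):
        context.append({"frame": m["frame"], "pose": m["pose"], "step": new_step})
    return context
```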
GPT Image 1.5 generates images 4x faster with improved editing
OpenAI released GPT Image 1.5, which generates images up to four times faster than the previous version while keeping details like lighting and faces consistent across edits. The model handles complex editing tasks—adding, removing, or transforming elements—while preserving what matters in the original image. It follows instructions more reliably and renders dense text better than before. OpenAI also added a new Images feature in ChatGPT with preset filters and one-time likeness uploads. The new model is available now to all ChatGPT users and through the API, where it costs 20 percent less than GPT Image 1. (OpenAI)
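For API users, the calls below sketch a generation and an edit pass with the OpenAI Python SDK; the model identifier “gpt-image-1.5” is an assumption based on the announcement, and the call pattern mirrors the existing gpt-image-1 endpoints.

```python
# Minimal sketch, assuming the model string "gpt-image-1.5".
import base64
from openai import OpenAI

client = OpenAI()

# Generate a new image (gpt-image models return base64-encoded data).
gen = client.images.generate(
    model="gpt-image-1.5",  # assumed identifier
    prompt="A watercolor map of a coastal town at dawn",
)
with open("town.png", "wb") as f:
    f.write(base64.b64decode(gen.data[0].b64_json))

# Edit the result while preserving the original lighting and composition.
edit = client.images.edit(
    model="gpt-image-1.5",  # assumed identifier
    image=open("town.png", "rb"),
    prompt="Add a lighthouse on the northern headland",
)
with open("town_lighthouse.png", "wb") as f:
    f.write(base64.b64decode(edit.data[0].b64_json))
```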
QwenLong-L1.5 matches GPT-5 long-context performance
Alibaba’s Tongyi Lab released QwenLong-L1.5, a 30 billion parameter model that performs on par with GPT-5 and Gemini 2.5 Pro on long-context reasoning tasks. The key breakthrough is a data synthesis pipeline that creates genuinely challenging multi-hop reasoning questions by breaking documents into atomic facts and linking them together, rather than just testing whether models can find needles in haystacks. To handle the training instability that comes with super-long contexts, they developed a new RL algorithm that dynamically balances exploration and exploitation, plus a memory management system that lets the model tackle documents up to 4 million tokens long. Starting from their Qwen3 base model, they achieved nearly a 10 point improvement across benchmarks, with especially strong gains on tasks requiring complex reasoning across scattered information. The improvements also transferred to other domains like math and long conversations, suggesting the model learned fundamental skills for maintaining coherence over extended sequences rather than just memorizing patterns. (arXiv)
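The synthesis idea is easier to see in miniature: extract atomic facts, then link facts that share an entity so a question cannot be answered from any single passage. The sketch below is illustrative only; the paper’s pipeline uses LLMs for fact extraction and question writing, and these structures are hypothetical.

```python
# Illustrative sketch of chaining atomic facts into multi-hop questions.
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str

def chain_facts(facts):
    """Pair facts where one fact's object is another fact's subject (a 2-hop link)."""
    by_subject = {}
    for f in facts:
        by_subject.setdefault(f.subject, []).append(f)
    return [(f, g) for f in facts for g in by_subject.get(f.obj, [])]

facts = [
    Fact("the merger", "was approved by", "the regulator"),
    Fact("the regulator", "is headquartered in", "Brussels"),
]
for f, g in chain_facts(facts):
    # A question writer (an LLM in the described pipeline) would turn this chain
    # into something like "Where is the body that approved the merger based?"
    print(f"2-hop chain: ({f.subject} {f.relation} {f.obj}) -> ({g.subject} {g.relation} {g.obj})")
```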
iRobot files for bankruptcy, will be acquired by manufacturer
iRobot, maker of the Roomba vacuum cleaner, filed for Chapter 11 bankruptcy protection and will be acquired by its primary manufacturer, Shenzhen-based Picea Robotics. The company cited competition from Chinese rivals and U.S. tariffs as key factors, with 46 percent import duties on goods from Vietnam adding 23 million dollars in costs this year. iRobot’s valuation collapsed from 3.56 billion dollars in 2021 to roughly 140 million dollars today. The company, which holds 42 percent of the U.S. robotic vacuum market, saw a planned 1.7 billion dollar Amazon acquisition blocked by EU regulators last year. (BBC)
O-Voxel generates high-quality 3D shapes with realistic geometry
Microsoft developed O-Voxel, a new representation that captures both shape and appearance for 3D generation. Unlike existing methods that struggle with open surfaces and complex structures, O-Voxel handles arbitrary topology including non-watertight meshes and enclosed interior geometry. It encodes physically-based rendering properties—base color, metallic ratio, roughness, and opacity—directly aligned with the geometry. The system compresses 1024³ resolution textured assets into 9,600 tokens using a variational autoencoder with 16× spatial downsampling. Researchers trained 4 billion parameter models on 800,000 public 3D assets. Generation runs in 3 seconds at 512³ resolution and 17 seconds at 1024³ on an H100 GPU. In user studies, participants preferred the method’s outputs 66.5 percent of the time over existing approaches, citing better detail and realism. (arXiv)
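As a rough illustration of what the representation stores, the sketch below pairs a grid index with the physically based rendering attributes the paper describes; the field names and sparse layout are assumptions, and the arithmetic simply follows from the stated 16× downsampling.

```python
# Hypothetical per-voxel record: geometry plus PBR appearance attributes.
from dataclasses import dataclass

@dataclass
class SurfaceVoxel:
    index: tuple[int, int, int]             # position in the voxel grid
    base_color: tuple[float, float, float]  # albedo
    metallic: float                         # 0 = dielectric, 1 = metal
    roughness: float
    opacity: float                          # supports open / non-watertight surfaces

# 16x spatial downsampling maps a 1024^3 asset onto a 64^3 latent grid
# (1024 / 16 = 64); the 9,600-token figure quoted above suggests only a sparse
# subset of those latent cells is actually kept.
latent_side = 1024 // 16
print(latent_side, latent_side ** 3)  # 64 cells per axis, 262,144 possible cells
```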
Still want to know more about what matters in AI right now?
Read this week’s issue of The Batch for in-depth analysis of news and research.
This week, Andrew Ng talked about the limitations and current challenges of improving LLMs, emphasizing the need for a data-centric approach and the piecemeal nature of advancements in AI models.
“We shouldn’t buy into the inaccurate hype that LLMs are a path to AGI in just a few years, but we also shouldn’t buy into the opposite, also inaccurate hype that they are only demoware. Instead, I find it helpful to have a more precise understanding of the current path to building more intelligent models.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth:
- Runway’s GWM-1 models generated videos with consistent physics for robots and entertainment, advancing coherent, interactive worlds.
- Disney collaborated with OpenAI to integrate the Sora video generator with Disney characters, allowing fan videos on Disney+.
- OpenAI launched GPT-5.2, touting variable reasoning and coding performance as a response to Gemini 3.
- Researchers introduced SEMI, a technique for adapting LLMs to any sort of data using few-shot examples to tackle new domains.
A special offer for our community
DeepLearning.AI recently launched the first-ever subscription plan for our entire course catalog! As a Pro Member, you’ll immediately enjoy access to:
- Over 150 AI courses and specializations from Andrew Ng and industry experts
- Labs and quizzes to test your knowledge
- Projects to share with employers
- Certificates to testify to your new skills
- A community to help you advance at the speed of AI
Enroll now to lock in a year of full access for $25 per month, paid upfront, or opt for month-to-month payments at just $30 per month. Both payment options begin with a one-week free trial. Explore Pro’s benefits and start building today!