Machine Learning Research - The Batch | DeepLearning.AI (Page 4)

Diagram shows LLM training with encoders for images, audio, video; inference with galaxies, satellites.

Machine Learning Research

Adapting LLMs to Any Sort of Data: SEMI (Sample-Efficient Modality Integration) tackles new domains with few-shot examples

Enabling a pretrained large language model to process a data type other than text (say, images), possibly in a specialized domain (say, radiology), typically requires thousands to millions of examples that pair the other data (perhaps x-rays) with text.

A table compares GPT-5.2's benchmark scores to Claude Opus 4.5 and Gemini 3 Pro in various reasoning tasks.

OpenAI’s Answer to Gemini 3: GPT-5.2 arrives, touting variable reasoning and coding performance

OpenAI launched GPT-5.2 only weeks after its CEO Sam Altman reportedly issued a “code red” alarm in response to Google’s Gemini 3.

GIF showing a robotic arm picking up glasses on a table and handling tools on a kitchen countertop.

Coherent, Interactive Worlds: Runway’s GWM-1 models generate videos with consistent physics for robots and entertainment

Runway’s GWM-1 family of video-generation models responds to user input in real time while producing scenes that remain consistent regardless of the camera’s position.

Flowchart showing Tiny Recursive Model process with stages: input, prediction, and latent refinement.

Small Models Solve Hard Puzzles: Tiny Recursive Model beats larger competitors at games like Sudoku and Maze

Large language models often fail at puzzles like Sudoku, for which a solution includes multiple elements and a single mistake invalidates all of them. Researchers showed that a tiny network, by repeatedly refining its solution, can solve this sort of puzzle well.

Table comparing Nova 2 Pro to other models in reasoning, coding, perception, and workflows.

Amazon Steps Forward: Nova 2 family boosts cost-effective performance, adds new agentic features

Amazon raised the competitive profile of its foundation models and added services for custom model training and an agent platform for browser automation.

Table highlights Opus 4.5’s superior scores in coding and reasoning compared to other AI models.

Claude Does More With Fewer Tokens: Claude Opus 4.5 retakes the coding crown at one-third the price of its predecessor

Claude Opus 4.5, the latest version of Anthropic’s flagship model, extends the earlier version’s strengths in coding, computer use, and agentic workflows while generating fewer tokens.

In a lab, four robots move a metal frame using graph neural network coordination on a platform.

Coordinating Robot Teams: Google DeepMind’s RoboBallet project blends GNNs with RL to drive teams of up to eight robot arms

In factories, where teams of robotic arms work in tight spaces, their motions are programmed by hand to keep them from interfering with one another. Researchers automated this programming using graph neural networks trained via reinforcement learning.

Graph shows Ernie-4.5 outperforming competitors in document understanding and visual reasoning tasks.

Baidu’s Multimodal Bids: Giant Ernie 5 natively generates multiple media; Ernie-4.5-VL-28B-A3B-Thinking tops Vision-Language metrics

Baidu debuted two models: a lightweight, open-weights, vision-language model and a giant, proprietary, multimodal model built to take on U.S. competitors.

GIF showing a 360° walkthrough of a conference room with a wooden table, high-back chairs, wall screens, and ceiling lights.

Generated, Editable Virtual Spaces: World Labs makes Marble world model public, adds Chisel editing tool

Models that generate 3D spaces typically render them on the fly as users move through them, without producing a persistent world that can be explored later. A new model produces 3D worlds that can be exported and modified.

GIF showing AI object detection tagging penguins on a beach, cars in traffic, and dancing people.

Open 3D Generation Pipeline: Meta’s SAM 3 image segmentation models can analyze and create bodies and other objects

Meta’s Segment Anything Model (SAM) for image segmentation has evolved into an open-weights suite for generating 3D objects. SAM 3 segments images, SAM 3D turns the segments into 3D objects, and SAM 3D Body produces 3D models of any people among the segments. You can experiment with all three.

Diagram shows AI traits with pipelines for "evil" vs. "helpful" responses to user queries on animal treatment.

Toward Steering LLM Personality: Persona Vectors allow model builders to identify and edit out sycophancy, hallucinations, and more

Large language models can develop character traits like cheerfulness or sycophancy during fine-tuning. Researchers developed a method to identify, monitor, and control such traits.

Table shows Gemini 3 Pro leading in benchmarks, outperforming Gemini 2.5, Claude Sonnet 4.5, and GPT-5.1.

Google Dominates Arena Leaderboards (For the Moment): Gemini 3 Pro and Nano Banana Pro boast best-in-class multimodal reasoning and image generation

Google introduced Gemini 3 Pro and Nano Banana Pro, its flagship vision-language and image-generation models, and deployed them to billions of users worldwide.