
Machine Learning Research
Better Video, Fewer Tokens: STORM Processes Fewer Tokens And Still Beats GPT-4o On Video Understanding Benchmarks
Researchers reduced the number of tokens needed to represent video frames to be fed to a transformer.
Machine Learning Research
Improving a large language model’s factual accuracy typically requires making it bigger, which, in turn, requires more computation. Researchers devised an architecture that enables models to recall relevant details without significantly increasing the amount of computation required.
Machine Learning Research
Large language models can improve systems that recommend items to purchase by inferring customer preferences.
Machine Learning Research
Researchers built a model that’s more robust to noisy inputs like misspellings, smarter about character-level information like the number of R’s in “strawberry,” and potentially better able to understand unfamiliar languages that share groups of letters with familiar ones.
Machine Learning Research
If you have a collection of variables that represent, say, a medical patient, and you want to classify the patient as likely to have cancer or not, algorithms based on decision trees, such as gradient-boosted trees, typically perform better than neural networks.
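As an aside, the tabular-classification setup described above can be sketched in a few lines with scikit-learn; the breast-cancer dataset stands in for the hypothetical cancer patient, and the specific model settings here are illustrative, not from the research.

```python
# Sketch: a gradient-boosted tree classifier on tabular medical data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Each row is a patient; each column is a measured variable.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient-boosted trees often work well on tabular data with little tuning.
gbt = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
print(f"test accuracy: {gbt.score(X_test, y_test):.3f}")
```

On small tabular datasets like this one, such a model typically reaches high accuracy out of the box, which is part of why trees remain a strong baseline against neural networks.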
Machine Learning Research
Large language models can improve their performance by generating a chain of thought (CoT): intermediate text tokens that break down the process of responding to a prompt into a series of steps.
Machine Learning Research
Contrastive loss functions make it possible to produce good embeddings without labeled data. A twist on this idea makes even more useful embeddings.
Machine Learning Research
Google’s Gemini 2.0 Flash, the first member of its updated Gemini family of large multimodal models, combines speed with performance that exceeds that of its earlier flagship model, Gemini 1.5 Pro, on several measures.
Hardware
An open source model is designed to perform sophisticated object detection on edge devices like phones, cars, medical equipment, and smart doorbells.
Machine Learning Research
Researchers cut the processing required to train transformers by around 20 percent with only a slight degradation in performance.
Machine Learning Research
Researchers have probed the inner workings of individual layers of large language models. A new tool applies this approach to all layers.
Machine Learning Research
A new model generates tokens faster than current transformers, especially when processing long inputs.