Diffusion Models - The Batch

Diagram showing step-by-step image creation process, featuring bears, cats, and birds as examples.

Machine Learning Research

Planning Generated Images In Stages: Meta improves image models by plotting and revising generations step-by-step

Text-to-image generators that use diffusion or flow-matching typically compose a whole image at once (although they refine the whole image in steps).

Text input in Google's Gemini app creates a comical R&B song, showcasing music generation capabilities.

Machine Learning Research

Gemini’s Music Generator: Google debuted Lyria 3, an app that turns text or images into 30-second songs

Google added a music generator to Gemini and YouTube, putting a model that produces synthetic songs in front of hundreds of millions of users.

A dimly lit studio with an idle camera and an illuminated exit door signaling OpenAI's retreat from video.

Business

OpenAI Exits Video Generation: OpenAI will shut down Sora, its once state-of-the-art video model

OpenAI plans to shut down its video generator Sora in a sudden retreat from the video market.

Visual schema of FAE's learning process, featuring fire and snowflake icons showing performance focus.

Machine Learning Research

Lightning-Fast Diffusion Learning: Inside Feature Auto-Encoder, a diffusion image generator that shrinks embeddings for more speed

Research shows that diffusion image generators can train somewhat faster if they learn to reconstruct embeddings from a pretrained encoder that’s built for vision tasks like classification, segmentation, and retrieval — not image generation.

Collage with comic strip, concert poster, diagrams on water cycle and trash sorting, and movie poster.

Machine Learning Research

Refining Words in Pictures: Z.ai’s GLM-Image blends transformer and diffusion architectures for better text in images

Image generators often mangle text. An open-weights model outperforms open and proprietary competitors in text rendering.

A warm-toned room features a sofa, a decorated shelf, and sunlight filtering through patterned curtains.

Machine Learning Research

Detailed Text- or Image-to-3D, Pronto: FlashWorld generates 3D objects, scenes, and surfaces with photorealistic fidelity

Current methods that produce 3D scenes from text or images are slow and produce inconsistent results. Researchers introduced a technique that generates detailed, coherent 3D scenes in seconds.

View from a car on a tree-lined street, with an overlay instructing to decelerate if hazards are detected.

Machine Learning Research

Training Cars to Reason: Nvidia’s Alpamayo-R1 is a robotics-style reasoning model for autonomous vehicles

Chain-of-thought reasoning can help autonomous vehicles decide what to do next.

Bar chart shows HunyuanImage-3.0's performance against Nano Banana and Seedream 4.0, highlighting differences.

Machine Learning Research

Better Images Through Reasoning: HunyuanImage-3.0 uses reinforcement learning and thinking tokens to better understand prompts

A new image generator reasons over prompts to produce outstanding pictures.

Three AI-generated video clips: a man vaulting over a moving car, a gymnast flipping on a plane wing, and a rabbit ice skating in pink boots.

Machine Learning Research

Mixture of Video Experts: Alibaba’s Wan 2.2 video models adopt a new architecture to sort noisy from less-noisy inputs

The mixture-of-experts approach that has boosted the performance of large language models may do the same for video generation.

Visual model aligning diffusion embeddings with DINOv2 encoders using REPA and DiT/SiT blocks.

Machine Learning Research

Faster Learning for Diffusion Models: Pretrained embeddings accelerate diffusion transformers’ learning

Diffusion transformers learn faster when they can look at embeddings generated by a pretrained model like DINOv2.

Diagram comparing diffusion, flow matching, and shortcut models for image generation with fewer steps.

Machine Learning Research

Better Images in Fewer Steps: Researchers introduce shortcut models to speed up diffusion

Diffusion models usually take many noise-removal steps to produce an image, which slows inference. There are ways to reduce the number of steps, but the resulting systems tend to produce lower-quality output. Researchers devised a streamlined approach that doesn’t sacrifice output quality.

Scientific diagram of a denoising model generating stable materials from random elements based on chemistry and symmetry

Science

Designer Materials: MatterGen, a diffusion model that designs new materials with specified properties

Materials that have specific properties are essential to progress in critical technologies like solar cells and batteries. A machine learning model designs new materials to order.