Gemini’s Music Generator: Google debuted Lyria 3, a model that turns text or images into 30-second songs

Google added a music generator to Gemini and YouTube, putting a model that produces synthetic songs in front of hundreds of millions of users.

Analytics DeepLearning.AI

03 Apr 2026 — 3 min read

What’s new: Lyria 3 takes text descriptions or images and generates 30-second audio clips that can include instruments, singing voices, and song lyrics in several languages. Google took measures intended to keep the model’s output from infringing copyrights: licensing its training data, filtering outputs for similarity to copyrighted works, and avoiding reproduction of an artist’s sonic likeness.

Input/output: Text in, audio (30 seconds) and text (lyrics) out; the Gemini app accepts images and videos as input, converts them to text, and passes the text to Lyria 3
Architecture: Latent diffusion model
Features: Users can specify instrumentation, style, era, vocal style, tempo, and dynamics; song lyrics in eight languages (English, German, Spanish, French, Hindi, Japanese, Korean, and Portuguese); cover art produced by Nano Banana, Google’s image generator; MP3 (audio) and MP4 (video with cover art) formats; watermarked output
Performance: In human and automated evaluations conducted by Google, Lyria 3 outperformed its predecessor, Lyria 2, with respect to audio quality and prompt adherence
Availability: Free to Gemini app users 18 years and older, with higher usage limits for subscribers to Google AI Plus, Pro, and Ultra; free to YouTube Shorts users via Dream Track, a tool that generates video soundtracks
Undisclosed: Architecture details beyond the model type, parameter count, training data and methods

How it works: Google disclosed only a high-level overview of Lyria 3’s architecture and training. Latent diffusion image generators produce an image by starting from pure noise in a latent (embedding) space and removing that noise step by step. Lyria 3 works the same way on audio: it removes noise from a latent representation of a given stretch of audio. The Batch previously described an audio diffusion process developed by Stability AI as well as Google’s earlier MusicLM music generation method.
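Google has not published Lyria 3’s denoising network, noise schedule, or sampler, so the following is only a generic sketch of the noise-removal loop that latent diffusion models share. It assumes a textbook DDPM-style reverse process with a linear noise schedule, treats the audio latent as a hypothetical array of shape (time frames, channels), and replaces the trained network with a hypothetical "oracle" that already knows the clean latent, so each step removes the right amount of noise. In a real text-to-music model, the noise predictor would be a neural network conditioned on the timestep and an embedding of the prompt, and a separate decoder would turn the final latent into a waveform. All function names here are illustrative.

```python
import numpy as np


def linear_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule, a common DDPM default (assumed, not Lyria 3's)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # fraction of signal variance left at step t
    return betas, alphas, alpha_bars


def oracle_noise_predictor(x_t, t, alpha_bars, clean_latent):
    """Hypothetical stand-in for a trained network.

    Returns the exact noise separating x_t from a known clean latent. A real
    model would estimate this from x_t, t, and a text-prompt embedding.
    """
    return (x_t - np.sqrt(alpha_bars[t]) * clean_latent) / np.sqrt(1.0 - alpha_bars[t])


def denoise(clean_latent, num_steps=50, seed=0):
    """Run a DDPM-style reverse process from pure noise back to a latent."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = linear_schedule(num_steps)
    # Start from pure Gaussian noise shaped like the audio latent.
    x = rng.standard_normal(clean_latent.shape)
    for t in reversed(range(num_steps)):
        eps = oracle_noise_predictor(x, t, alpha_bars, clean_latent)
        # Posterior mean: subtract the predicted noise component and rescale.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject a smaller amount of fresh noise (variance beta_t).
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # the final step adds no noise
    return x


if __name__ == "__main__":
    # Hypothetical latent for one slice of audio: 64 time frames x 8 channels.
    frames = np.arange(64)[:, None]
    target = np.sin(frames / 5.0 + np.arange(8)[None, :])
    result = denoise(target)
    # Maximum deviation from the target latent; should be near zero.
    print(np.abs(result - target).max())
```

Because the oracle knows the answer, the loop ends at the target latent up to floating-point error; with a learned predictor, the end point would instead be a new latent shaped by the prompt.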

Lyria 3 was trained on audio annotated with text captions at varying levels of detail and filtered for quality, duplicates, and safety. Google licensed Lyria 3’s training data, a significant change from Lyria 2, which reportedly was trained on recordings under copyright without authorization.
The model underwent three phases of training: pretraining, supervised fine-tuning, and reinforcement learning from human feedback.
Lyria 3 marks its output with SynthID, a hidden watermark that identifies synthetic media. Users can upload audio files to the Gemini app to check whether they were generated by a Google model.
If a prompt mentions a specific musician, the model will generate music in a similar style without replicating the artist’s voice or sound. Google said it compares outputs with existing music to avoid copyright violations, but it acknowledged that the approach is fallible, and it invites users to report outputs that may violate intellectual-property rights.
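Google has not said how it compares outputs with existing music. One common pattern for this kind of check, shown here purely as an assumption, is to embed each generated clip and each reference recording with some audio-embedding model, then flag any output whose cosine similarity to a catalog entry exceeds a threshold. The sketch below takes the embeddings as given (the embedding model is not shown), and the function names and the 0.9 threshold are hypothetical.

```python
import numpy as np


def cosine_similarities(outputs, catalog):
    """Pairwise cosine similarity between output and catalog embeddings."""
    out_n = outputs / np.linalg.norm(outputs, axis=1, keepdims=True)
    cat_n = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    return out_n @ cat_n.T  # shape: (num_outputs, num_catalog_items)


def flag_similar_outputs(outputs, catalog, threshold=0.9):
    """For each output, report (flagged, closest catalog index, similarity)."""
    sims = cosine_similarities(outputs, catalog)
    closest = sims.argmax(axis=1)
    best = sims[np.arange(len(outputs)), closest]
    return [(bool(s >= threshold), int(i), float(s)) for s, i in zip(best, closest)]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; real audio embeddings are much larger.
    catalog = np.eye(3)  # three hypothetical reference recordings
    outputs = np.array([[1.0, 0.05, 0.0],   # nearly identical to recording 0
                        [1.0, 1.0, 1.0]])   # equally similar to all three
    for flagged, idx, sim in flag_similar_outputs(outputs, catalog):
        print(flagged, idx, round(sim, 3))
```

A deployed filter would likely need to compare shorter windows of audio rather than one vector per clip; the single-vector version keeps the idea visible.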

Behind the news: Lyria 3 arrives as the music industry is aggressively suing developers of AI music generators for alleged copyright violations. The leading music generators, Suno and Udio, no longer generate music from scratch, leaving Google among a dwindling number of developers that do.

In June 2024, Sony Music, Universal Music Group (UMG), and Warner Music, the world’s three largest music companies, sued Suno and Udio, which offer web-based music generators, for alleged copyright violations. In late 2025, the defendants settled with Universal Music Group and changed their services to emphasize altering existing, licensed recordings rather than generating new music. Sony’s lawsuit remains in progress.
Google responded to music-industry pressure partly by exploring models geared for professional music production. In spring 2025, it introduced Music AI Sandbox, MusicFX DJ, and Lyria RealTime, which enable more fine-grained control over generated music. Days after launching Lyria 3, Google acquired another professional production tool, ProducerAI, formerly known as Riffusion.

Why it matters: Music generation is finding its place in an entertainment industry dominated by large, powerful incumbents. Lyria 3 puts it in front of more than 750 million Gemini users, dwarfing the current user bases of Suno (around two million paid subscribers) and Udio (around 3.3 million monthly users). Lyria 3 still generates original music (the direction that put Suno and Udio in the crosshairs of the world’s biggest recording companies) but adds safeguards, such as training on licensed music, to avoid aggravating copyright holders.

We’re thinking: Music generators produce impressive, versatile, surprisingly human-like output, yet we’re still waiting for generated music to have its ChatGPT moment. It may happen quietly as, say, producers of YouTube clips increasingly use Lyria 3 rather than pre-recorded sources.
