DeepLearning.AI - The Batch | DeepLearning.AI (Page 32)

Gavel striking a neural network, symbolizing legal decisions impacting AI and machine learning technologies.

Tech & Society

Judge Upholds Copyright in AI Training Case: U.S. court rejects fair use defense in Thomson Reuters AI lawsuit

A United States court delivered a major ruling that begins to answer the question of whether, and under what conditions, training an AI system on copyrighted material qualifies as fair use and so doesn’t require permission.

Phi-4 Mini multimodal architecture integrating vision, audio, and text with token merging and LoRA-adapted weights for AI processing.

Machine Learning Research

Microsoft Tackles Voice-In, Text-Out: Microsoft’s Phi-4 Multimodal model can process text, images, and speech simultaneously

Microsoft debuted its first official large language model that responds to spoken input.

QwQ-32B vs DeepSeek-R1 AI model performance benchmark across AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL datasets.

Machine Learning Research

Compact Reasoning: QwQ-32B challenges DeepSeek-R1 and other larger reasoning models

Most models that have learned to reason via reinforcement learning have been huge. A much smaller model now competes with them.

Futuristic nightclub with neon lights, a dancing crowd, and a supercomputer DJ booth glowing amid fog and lasers.

Data Points

EAGLE-3 speeds up language models: And the 2024 Turing Award goes to…

Music and lyrics in one diffusion model. Manus AI’s impressive demos spark excitement and backlash. OpenAI sees AGI as a gradual evolution. Google unveils its first Gemini-branded embedding models.

A man sitting side by side with his computer at a bar as if they are having a friendly conversation.

Data Points

Qwen’s mid-sized reasoning model scores big: Sesame moves through speech models’ “uncanny valley”

Cohere’s open vision models support many languages. Jamba 1.6’s two hybrid MoE models promise more speed. Anthropic overhauls its developer console for Claude 3.7 Sonnet. Mistral brings its multilingual/multimedia skills to OCR.

Diagram of an RQ-Transformer speech system with Helium and Depth Transformers for audio processing.

Letters

Wait Your Turn! Conversation by Voice Versus Text: Text interactions require taking turns, but voices may interrupt or overlap. Here’s how AI is evolving for voice interactions.

Continuing our discussion on the Voice Stack, I’d like to explore an area that today’s voice-based systems mostly struggle with: Voice Activity Detection (VAD) and the turn-taking paradigm of communication.

The Batch Newsletter

GPT-4.5 Goes Big, Claude 3.7 Reasons, Alexa+ Goes Agentic, Generating Text Like an Image

The Batch AI News and Insights: Continuing our discussion on the Voice Stack, I’d like to explore an area that today’s voice-based systems mostly struggle with: Voice Activity Detection (VAD) and the turn-taking paradigm of communication.

Amazon smart display with widgets for recipes, calendar, weather, events, and streaming (Prime Video, Netflix, Disney+).

Tech & Society

Amazon’s Next-Gen Voice Assistant: Alexa+ adds generative AI and agents, using Claude and other models

Amazon announced Alexa+, a major upgrade to its long-running voice assistant.

Table comparing Claude 3.7, 3.5, o1, o3-mini, DeepSeek R1, and Grok 3 Beta on reasoning, coding, tools, visuals, and math.

Machine Learning Research

Budget for Reasoning to the Token: Claude 3.7 Sonnet adds extended thinking mode

Anthropic’s Claude 3.7 Sonnet implements a hybrid reasoning approach that lets users decide how much thinking they want the model to do before it renders a response.

Table comparing GPT-4.5, GPT-4o, and o3-mini on GPQA, AIME 2024, MMLU, MMMU, and coding tests.

Machine Learning Research

OpenAI’s GPT-4.5 Goes Big: OpenAI releases GPT-4.5, its most powerful non-reasoning model and maybe its last

OpenAI launched GPT-4.5, which may be its last non-reasoning model.

Table comparing AI models on throughput, HumanEval, MBPP, EvalPlus, MultiPL-E, and code completion.

Machine Learning Research

Text Generation by Diffusion: Mercury Coder uses diffusion to generate text

Typical large language models are autoregressive, predicting one token at a time from left to right. A new model instead refines all text tokens at once.

Team in modern office applauding while watching a news anchor on a big screen.

Data Points

All the models we’ve been waiting for: OpenAI’s scaled-up Project Orion arrives

Mercury debuts diffusion language models. Alibaba’s top video model is now free to download. A new model from Tencent is built for speed. IBM’s Granite 3.2 models are built for business.