Large Language Models (LLMs) - The Batch | DeepLearning.AI (Page 4)

A performance comparison table highlights Ling-1T's success in reasoning and coding tasks against rivals.

Machine Learning Research

Reasoning Without “Thinking”: All about Ant Group’s Ling-1T, an open, non-reasoning model that outperforms closed competitors

Reasoning models typically learn to undertake a separate process of “thinking” through their output before they produce a final response. Ant Group built a top non-reasoning model that can take similar steps as part of its immediate response.

Close-up of a violin scroll and pegs, symbolizing precision needed in fine-tuning AI models.

Machine Learning Research

Fine-Tuning Simplified: Thinking Machines’ new Tinker API makes it easier to fine-tune models on many GPUs

The first offering from Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, aims to simplify — and democratize — the process of fine-tuning AI models.

Graphs compare DeepSeek models showing reduced cost per million tokens with V3.2-Exp over V3.1-Terminus.

Machine Learning Research

DeepSeek Cuts Inference Costs: DeepSeek-V3.2-Exp streamlines processing using a "lightning indexer," boosting efficiency

DeepSeek’s latest large language model cuts inference costs by more than half and processes long contexts dramatically faster than its predecessor.

Flowchart of Text-to-LoRA model processes task embeddings into LoRA adapters, showing weights and losses.

Machine Learning Research

LoRA Adapters On Tap: Text-to-LoRA generates task-specific LoRA adapters directly from natural language descriptions

The approach known as LoRA streamlines fine-tuning by training a small adapter that modifies a pretrained model’s weights at inference. Researchers built a model that generates such adapters directly.
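As a rough illustration of the adapter idea mentioned above, here is a minimal sketch of how a LoRA adapter is commonly applied to a frozen weight matrix: two small matrices, A and B, are trained, and their scaled product is added to the pretrained weights. The shapes, values, and function names are assumptions chosen for illustration. This shows only the standard low-rank update, not the Text-to-LoRA method that generates adapters from task descriptions.

```python
# Minimal LoRA sketch: effective weight = W + (alpha / r) * (B @ A).
# All names, shapes, and values here are illustrative assumptions.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_effective_weight(W, A, B, alpha):
    """Combine frozen weights W (d_out x d_in) with a rank-r adapter.

    A has shape (r, d_in) and B has shape (d_out, r); only A and B
    would be trained, while W stays fixed.
    """
    r = len(A)                # adapter rank
    delta = matmul(B, A)      # low-rank update, shape (d_out, d_in)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy example: d_out = 2, d_in = 3, rank r = 1.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]
B = [[0.5],
     [0.0]]

W_eff = lora_effective_weight(W, A, B, alpha=2.0)
print(W_eff)
```

The adapter stores r * (d_in + d_out) values rather than d_in * d_out, which is where the savings come from when the rank r is much smaller than the layer dimensions.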

Bar chart comparing performance of Qwen3 models against others in diverse tasks, highlighting Qwen3-Max.

Machine Learning Research

Qwen3 Goes Big (and Smaller): Alibaba expands Qwen3 family with a 1 trillion-parameter Max model, open-weights Qwen3-VL, and the Qwen3-Omni voice model

Alibaba rounded out the Qwen3 family with its biggest large language model to date as well as smaller models that process text, images, video, and/or audio.

Icons for files, pictures, and shopping connect through nodes to a dollar sign, illustrating AI-driven profit pathways.

Business

OpenAI, Meta Diversify AI Product Lines: OpenAI and Meta launch social video apps while ChatGPT adds Pulse and Instant Checkout

OpenAI and Meta, which have been content to offer standalone chatbots or tuck them into existing products, introduced dueling social video networks and other initiatives designed to boost revenue and engagement.

Comparison table highlighting Claude Sonnet 4.5's top scores in coding and reasoning benchmarks, featuring improved capabilities.

Machine Learning Research

Claude Levels Up: Anthropic launches Claude Sonnet 4.5 and the Claude Agent SDK, and overhauls Claude Code for developers

Anthropic updated its mid-size Claude Sonnet model, making it the first member of the Claude family to reach version 4.5. It also enhanced the Claude Code agentic coding tool with long-desired features.

Flowchart shows data reordering, probability sampling, and effective gradient updating in reinforcement learning.

Machine Learning Research

Faster Reinforcement Learning: New technique auto-selects training examples to speed up fine-tuning

Fine-tuning large language models via reinforcement learning is computationally expensive, but researchers found a way to streamline the process.

Chart details ChatGPT conversations. Writing (28.1%), info-seeking (21.3%), and guidance (28.3%) lead.

Machine Learning Research

What ChatGPT Users Want: ChatGPT users now more likely to be young, female, and seeking info, study shows

What do ChatGPT’s 700 million weekly active users do with it? OpenAI teamed up with a Harvard economist to find out.

Central AI agent icon links to merchant, cart, and payment symbols, illustrating agentic payments process.

Business

Agents of Commerce: Google’s AP2 gives developers new tools to build agentic payments

Google launched an open protocol for agentic payments that enables agents based on any large language model to purchase items over the internet.

Diagram of Qwen3-Next architecture with Mixture of Experts, Gated Attention, and Gated DeltaNet layers.

Machine Learning Research

Qwen3-Next Accelerates: Alibaba’s new model uses hybrid attention layers and a sparse MoE architecture for speed and performance

Alibaba updated its popular Qwen3 open-weights models with a number of fresh, speed-boosting tweaks.

Diagram comparing sliding window attention and ATLAS memory, showing wider context tracking in ATLAS.

Machine Learning Research

10 Million Tokens of Input Context: ATLAS, a transformer-like architecture, can process a context window as large as ten million tokens

An alternative to attention enables large language models to track relationships among words across extraordinarily wide spans of text.