Machine Learning Research
OpenAI’s GPT-4.5 Goes Big: OpenAI releases GPT-4.5, its most powerful non-reasoning model and maybe its last
OpenAI launched GPT-4.5, which may be its last non-reasoning model.
Machine Learning Research
Typical large language models are autoregressive, predicting the next token one at a time, from left to right. A new model refines all text tokens at once.
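The contrast can be sketched with a toy example. This is a minimal illustration, not either model's actual method: `toy_next_token` and `toy_refine` are hypothetical stand-ins for a real language model's predictions.

```python
VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def toy_next_token(context):
    # Hypothetical stand-in for a real LM's next-token prediction;
    # picks a token deterministically from the context length.
    return VOCAB[len(context) % len(VOCAB)]

def autoregressive_generate(prompt, max_tokens=8):
    # Left-to-right decoding: one token per step, each conditioned
    # on everything generated so far.
    tokens = list(prompt)
    for _ in range(max_tokens):
        nxt = toy_next_token(tokens)
        tokens.append(nxt)
        if nxt == "<eos>":
            break
    return tokens

def parallel_refine(tokens):
    # All-at-once refinement: every masked position is filled in
    # the same pass, rather than strictly left to right.
    return [VOCAB[i % len(VOCAB)] if t == "<mask>" else t
            for i, t in enumerate(tokens)]

print(autoregressive_generate(["the"]))
print(parallel_refine(["the", "<mask>", "<mask>", "on", "<mask>"]))
```

A real all-at-once model would run many refinement passes, each sharpening its guesses for every position simultaneously; the single pass here only shows the shape of the idea.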
Machine Learning Research
Large language models can improve their performance by generating a chain of thought (CoT): intermediate text tokens that break down the process of responding to a prompt into a series of steps.
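As a rough sketch of what CoT prompting looks like in practice (the prompt wording and the sample reasoning trace below are illustrative, not from any particular model):

```python
question = "If a train travels 60 miles in 1.5 hours, what is its average speed?"

# Direct prompting asks for the answer immediately.
direct_prompt = question + "\nAnswer:"

# Chain-of-thought prompting elicits intermediate reasoning steps first.
cot_prompt = question + "\nLet's think step by step:"

# The intermediate tokens a model might emit under CoT, for example:
cot_trace = [
    "Speed = distance / time.",
    "60 miles / 1.5 hours = 40 miles per hour.",
    "Answer: 40 mph.",
]
print("\n".join(cot_trace))
```

The extra intermediate tokens give the model room to work through the computation before committing to an answer.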
Business
Elon Musk and a group of investors made an unsolicited bid to buy the assets of the nonprofit that controls OpenAI, complicating the AI powerhouse’s future plans.
Machine Learning Research
Replit, an AI-driven integrated development environment, updated its mobile app to generate new mobile apps on demand.
Machine Learning Research
xAI’s new model family suggests that devoting more computation to training remains a viable path to building more capable AI.
Machine Learning Research
While Hangzhou’s DeepSeek flexed its muscles, Chinese tech giant Alibaba vied for the spotlight with new open vision-language models.
Machine Learning Research
OpenAI introduced a state-of-the-art agent that produces research reports by scouring the web and reasoning over what it finds.
Machine Learning Research
Google updated the December-vintage reasoning model Gemini 2.0 Flash Thinking and other Flash models, gaining ground on OpenAI o1 and DeepSeek-R1.
Machine Learning Research
As Anthropic, Google, OpenAI, and others roll out agents that are capable of computer use, new work shows how underlying models can be trained to do this.
Machine Learning Research
OpenAI introduced a successor to its o1 models that’s faster, less expensive, and especially strong in coding, math, and science.
Machine Learning Research
The practice of fine-tuning models on synthetic data is becoming well established. But synthetic training data, even if it represents the training task well, may include characteristics like toxicity that impart unwelcome properties to the trained model's output...