Data Points

Qwen3.7-Max, China’s latest top model: OpenAI model disproves key math conjecture

Nvidia’s Gated DeltaNet-2, its latest attention alternative. Trump calls off executive order regulating U.S. AI models. Microsoft’s benchmark-topping suite of computer use agents. Cohere’s Command-A+, a local alternative to big cloud AI.

Analytics DeepLearning.AI

25 May 2026 — 5 min read

In today’s edition of Data Points, you’ll learn more about:

Nvidia’s Gated DeltaNet-2, its latest attention alternative
Trump calls off executive order regulating U.S. AI models
Microsoft’s benchmark-topping suite of computer use agents
Cohere’s Command-A+, a local alternative to big cloud AI

But first:

Qwen tops China’s AI leaderboards, rivals Gemini and Claude

Alibaba released Qwen3.7-Max on May 20, a reasoning model built for multi-step agent tasks rather than snappy responses. The model stretches to one million tokens of context (four times the previous version) and includes an extended-thinking mode that forces the model to show its work before answering. That transparency comes at a cost: during benchmarking, the model generated 97 million tokens compared to 24 million for typical models, adding noticeable latency. The tradeoff reveals a deliberate design choice. On simple factual questions, Qwen3.7-Max now refuses to answer more often than it used to, dropping its accuracy on AA-Omniscience by 7.6 percentage points. But on scientific reasoning and agentic tasks, where the model needs to call tools, write code, and iterate, it pulls ahead. Alibaba demonstrated the model executing over 1,000 tool calls and code modifications autonomously to optimize a chip kernel, claiming a 10x speedup. On the Artificial Analysis Intelligence Index, Qwen3.7-Max ranks fifth overall, slightly behind GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro Preview, but ahead of Gemini 3.5 Flash. The model runs text-only for now and connects through Alibaba Cloud’s DashScope platform. (MarkTechPost)

OpenAI model finds proof resolving famous mathematics problem

An OpenAI reasoning model has resolved the planar unit distance problem, a central question in discrete geometry posed by legendary mathematician Paul Erdős in 1946. The conjecture held that square grid constructions were essentially optimal for maximizing unit-distance pairs among points in a plane, a belief that stood unchallenged for nearly 80 years. The model instead found an infinite family of configurations yielding polynomial improvements over the grid approach, disproving the assumption. What makes the breakthrough unusual is not just the result itself, but how it was found: a general-purpose reasoning model, not a system specialized for mathematics, produced a proof that external mathematicians have verified. The proof brings sophisticated tools from algebraic number theory to bear on an elementary geometric question, revealing unexpected connections between distant mathematical domains. Fields medalist Tim Gowers called it “a milestone in AI mathematics,” while number theorist Arul Shankar argued the result shows AI models “are capable of having original ingenious ideas, and then carrying them out to fruition.” (OpenAI)

Gated DeltaNet-2 beats Mamba-3, other solid state architectures

Nvidia researchers built Gated DeltaNet-2, a promising alternative to standard attention, by decoupling a fundamental constraint in linear attention models. Most linear attention architectures compress sequence history into a fixed-size state to avoid the quadratic cost of Transformers, but this forces many associations to share the same memory—a problem that worsens as contexts grow. Earlier work like Gated DeltaNet and Kimi Delta Attention let the model selectively overwrite stale associations, but both used a single scalar gate to control two separate decisions: how much old content to erase on the key side and how much new content to write on the value side. Gated DeltaNet-2 splits those into independent channel-wise gates, letting the model clear specific stale associations without committing to a preset budget for new values. At 1.3 billion parameters trained on 100 billion tokens, it outperformed Mamba-2, Gated DeltaNet, Kimi Delta Attention, and Mamba-3 across language modeling, reasoning, and retrieval. The gap widened most on long-context benchmarks where a compressed state must keep competing associations distinct, suggesting the decoupled gates solve a real bottleneck rather than just add capacity. (GitHub)

Trump reserves self on AI models, says “I hate regulation”

U.S. President Donald Trump postponed signing an executive order on artificial intelligence hours before a scheduled White House ceremony Thursday, saying the measure could weaken America’s technological edge against China. He disliked the order’s text and wanted to preserve U.S. AI leadership without regulatory friction. The order stemmed from cybersecurity concerns raised in April, when Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened Wall Street CEOs to warn about vulnerabilities that AI models like Anthropic’s Claude could expose in critical software systems. The retreat signals internal tensions between factions pushing for deregulation and those advocating vetting mechanisms on the most capable AI systems. Computer science professor Serena Booth called the whiplash a symptom of deeper fractures: the administration is caught between its pro-business impulse to accelerate AI development and security concerns that some officials think warrant government oversight. (Associated Press)

Lightweight Microsoft computer use agents beat Google, OpenAI

Microsoft Research released Fara1.5, a family of computer-use agents in three sizes (4B, 9B, 27B) that operate browsers by reading screenshots and issuing mouse and keyboard commands. The largest model scores 72 percent on Online-Mind2Web, a benchmark of 300 tasks across 136 websites—outperforming OpenAI’s Operator (58.3 percent) and Google’s Gemini 2.5 Computer Use (57.3 percent). The models are built on Qwen 3.5 checkpoints and trained on roughly two million samples, mostly web trajectories and synthetic environments. The core innovation is FaraGen1.5, a synthetic data pipeline that builds functional clones of gated applications like email and calendar, solving the problem of training on tasks that require authentication or irreversible actions. Fara1.5 pauses to ask users before handling sensitive information, accepting ambiguous instructions, or taking irreversible actions—a safety constraint that distinguishes it from competitors chasing pure benchmark numbers. (MarkTechPost)

Cohere updates sovereign-ready model, now with open weights

Cohere released Command A+, a mixture-of-experts model designed for sovereign AI, with 218 billion total parameters and 25 billion active during inference, under an Apache 2.0 license. It consolidates reasoning, multimodal understanding, tool use, and multilingual support across 48 languages into a single architecture that runs on as few as two H100 GPUs at 4-bit quantization with minimal quality loss. Performance gains are substantial: agentic coding scores on Terminal-Bench Hard rose from 3 percent to 25 percent over Command A Reasoning, while telecom task accuracy climbed from 37 percent to 85 percent. Efficiency improvements include 63 percent higher token throughput and up to 20 percent better tokenization compression for languages like Arabic and Korean. Cohere attributes the model’s design to a year of deploying its North enterprise platform. (Cohere)

Want to know more about what matters in AI right now?

Read the latest issue of The Batch for in-depth analysis of news and research.

Last week, Andrew talked about Harvard's decision to limit A grades to combat grade inflation, his belief in supporting all learners to succeed, and his preference for educational practices that focus on skill-building rather than judgment.

“We should hold a high bar, but also work mightily to support the success of 100% of learners, rather than a fraction.”

Read Andrew’s letter here.

Other top AI news and research covered in depth:

Hermes Agent Challenges OpenClaw as the upstart personal agent outworks the established class created by OpenClaw.
Thinking Machines unveils Built-In Conversational Interactivity with its first interaction model, introducing a new type of multimodal AI.
Cybersecurity Alarms Grow Louder as a Google study reveals that LLM-generated malware is becoming increasingly difficult to track and stop.
Researchers aim Toward Agent Benchmarks That Reflect Human Work, highlighting that AI agents may not be improving in performing a full range of economically valuable labor.

A special offer for our community

In case you missed it, DeepLearning.AI launched our first-ever subscription plan for our entire course catalog! As a Pro Member, you’ll immediately enjoy access to:

Nearly 200 AI short and long courses from Andrew Ng and industry experts
Labs and quizzes to test your knowledge
Projects to share with employers
Certificates to testify to your new skills
A community to help you advance at the speed of AI

Enroll now to lock in a year of full access for $25 per month paid upfront, or opt for month-to-month payments at just $30 per month. Both payment options begin with a one-week free trial. Explore Pro’s benefits and start building today!

Try Pro Now

Data Points is produced by human editors with AI assistance.

Qwen3.7-Max, China’s latest top model: OpenAI model disproves key math conjecture

Analytics DeepLearning.AI

A special offer for our community

Read more

Planning Generated Images In Stages: Meta improves image models by plotting and revising generations step-by-step

Agents Surf the AI-Written Web: Internet traffic driven by AI rripled last year, study shows

Europe Pauses Some AI Regulations: European Union regulators delay some AI Act provisions, delete others

Gemini 3.5 Flash Pairs Smarts With Speed: Google's updated Flash levels up, approaching top models but raising prices