Data Points

DeepSeek-R1 regains open-weights crown: Researchers find critical vulnerability in GitHub MCP server


Analytics DeepLearning.AI

30 May 2025

In today’s edition, you’ll learn more about:

NLWeb, an open-source framework to bring AI chat to any website
FLUX.1 Kontext challenges GPT-Image with image generation and editing
LMEval, a new open-source suite for iteratively benchmarking models
Amazon’s new content deal with The New York Times

But first:

DeepSeek’s upgraded R1 rivals OpenAI and Google’s top models

Chinese AI startup DeepSeek updated its R1 reasoning model, achieving performance comparable to OpenAI’s o3 and Google’s Gemini 2.5 Pro, according to the company’s announcement on Hugging Face. The updated DeepSeek-R1-0528 model shows significant improvements in mathematics, programming, and general logic tasks. Its accuracy on the AIME 2025 test jumped from 70 percent to 87.5 percent, although it now uses nearly twice as many reasoning tokens per question. Artificial Analysis’s Intelligence Index ranks the open-weights model at #2. This reflects the continued rise of Chinese AI labs competing directly with U.S. counterparts and the narrowing gap between open and proprietary models. (Hugging Face and Artificial Analysis)

GitHub MCP vulnerability allows attackers to access private data

Invariant discovered a critical vulnerability affecting GitHub’s MCP integration that lets attackers access private repository data by planting malicious instructions in GitHub issues. The attack relies on what Invariant calls “toxic agent flows,” in which an agent is manipulated into performing unintended actions such as leaking sensitive data. The attack exploits a fundamental architectural issue rather than a flaw in the GitHub MCP server’s code. As a result, it affects any agent that uses the server, regardless of the underlying model or implementation. Invariant recommends granular permission controls and continuous security monitoring to mitigate such attacks. The finding is especially significant as the industry rapidly deploys coding agents and IDEs, which could expose developers to similar attacks on critical development tools. (Invariant)

Microsoft launches NLWeb to help build the agentic web

Microsoft released NLWeb, an open-source project that enables web publishers to add natural language interfaces to their websites, allowing users to query site content through conversational AI. The system uses existing structured data formats like Schema.org and RSS, combining them with large language models to create interfaces accessible to both humans and AI agents. NLWeb supports all major operating systems, AI models, and vector databases, and integrates with the Model Context Protocol (MCP) ecosystem for broader agent compatibility. Microsoft sees this as a way for publishers to prepare for the “agentic web,” where AI agents will increasingly interact with and transact on websites. Early adopters include Chicago Public Media, Tripadvisor, Shopify, and O’Reilly Media, with the project available now on GitHub. (Microsoft)

FLUX.1 Kontext combines multimodal image generation and editing

Black Forest Labs released FLUX.1 Kontext, a suite of generative flow matching models that supports both text-to-image generation and image editing from combined text and image prompts. Users can make local edits, apply a style reference across multiple scenes, and extract and modify visual concepts while keeping characters consistent. Such tasks have typically required separate models or complex workflows. According to Black Forest Labs, FLUX.1 Kontext runs up to 8 times faster than competing models like GPT-Image. It also supports iterative editing, in which users build on their previous modifications. The suite includes FLUX.1 Kontext [pro] and [max] variants, available through partners like KreaAI and Freepik, plus a 12-billion-parameter [dev] version in private beta for research use. (Black Forest Labs)

Google open sources LMEval for streamlined model benchmarking

Google’s LMEval is a new open-source framework designed to simplify how developers evaluate and compare AI models from different providers like OpenAI, Anthropic, and Google. The tool addresses a key challenge in AI development: with new models launching constantly, developers need efficient ways to test whether newer versions actually improve their applications. LMEval enables consistent benchmarking across providers through its integration with the LiteLLM framework, so developers don’t need to work with a different API for each company. The framework offers incremental evaluation, which runs only the tests needed for new models or updates. It also supports multimodal benchmarks covering text, images, and code, and includes a visualization dashboard for analyzing results. The release aims to help developers make data-driven decisions about which models to use in their projects. (Google)

The New York Times licenses its reporting to Amazon for AI training

The New York Times struck a multiyear deal with Amazon to provide editorial content for the tech company’s AI platforms, marking the newspaper’s first licensing agreement focused on generative AI technology. The agreement covers news articles, NYT Cooking recipes, and sports content from The Athletic, which Amazon will use to train its proprietary AI models and enhance its products, including Alexa. This deal comes as the Times continues its copyright infringement lawsuit against OpenAI and Microsoft, filed in 2023, for allegedly using millions of Times articles to train AI models without compensation. NYT CEO Meredith Kopit Levien emphasized that the Amazon agreement reflects the company’s stance that “high-quality journalism is worth paying for.” Financial terms were not disclosed. (The New York Times)

Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng raised concerns about proposed U.S. funding cuts for basic research, emphasizing how such cuts could hurt American competitiveness in AI and urging continued investment in open scientific research.

“Scientific research brings the greatest benefit to the country where the work happens because (i) the new knowledge diffuses fastest within that country, and (ii) the process of doing research creates new talent for that nation.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth:

Anthropic released its new Claude Sonnet 4 and Claude Opus 4 models, achieving top-tier performance on code-generation benchmarks.
Google unveiled a wave of AI updates at I/O, including the Veo 3 video generator, the compact Gemma 3n model, and enhancements to Gemini Pro and Ultra.
Researchers behind DeepSeek detailed the training strategies and hardware infrastructure used to build their V3 and R1 models.
A study found that OpenAI’s GPT-4o can accurately identify verbatim excerpts from paywalled O’Reilly books, raising fresh questions about training data sources.
