GPT-5 gets a Codex-specific model update, plus a new MCP-style protocol for agentic payments

GitHub’s new MCP server registry. Google and OpenAI’s gold medal achievements at ICPC. VaultGemma, an open, privacy-first language model. How Anthropic’s usage restrictions risk U.S. ire.


Welcome back! In today’s edition of Data Points, you’ll learn more about:

  • GitHub’s new MCP server registry
  • Google and OpenAI’s gold medal achievements at ICPC
  • VaultGemma, an open, privacy-first language model
  • How Anthropic’s usage restrictions risk U.S. ire

But first:

OpenAI releases GPT-5-Codex with platform updates for developers

GPT-5-Codex, a specialized coding version of GPT-5, outperforms the base model on code refactoring tasks (51.3 percent vs. 33.9 percent) and can work independently for over 7 hours on complex projects. The model adapts its thinking time to task difficulty, responding quickly to simple requests while deliberating much longer on challenging problems, which helps it catch more critical bugs during code reviews. OpenAI also rebuilt its Codex tools with new features like image attachments in the command line, a VS Code extension that syncs between local and cloud work, and infrastructure improvements that cut task completion times by 90 percent. These updates position Codex competitively against Claude Code and other agentic assistants and model scaffolds in the semi-autonomous coding market. Codex is included with all paid ChatGPT plans, with API access planned for the near future. (OpenAI)
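Since API access is still pending, any integration code is speculative. Still, here is a minimal sketch of what calling the model through OpenAI's existing Responses API in the official Python library might look like once it ships; the model identifier is an assumption:

```python
# Hypothetical sketch: GPT-5-Codex is not yet available via the API, so the
# model identifier below is an assumption. The call shape follows OpenAI's
# existing Responses API in the official `openai` Python package.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5-codex",  # assumed identifier, pending API release
    input="Refactor this function to remove duplicated error handling:\n"
          "def load(path): ...",
)

print(response.output_text)  # the model's reply as plain text
```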

Google unveils protocol for AI agents to make secure payments

Google announced AP2, an open protocol that lets AI agents safely make purchases on users’ behalf using cryptographically signed “Mandates” that prove a user authorized a purchase or sale. The protocol works with payment methods from credit cards to cryptocurrencies and includes an extension built with Coinbase and the Ethereum Foundation. The new protocol could create a unified framework for AI-driven commerce, ensuring accountability when agents transact autonomously while helping prevent a fragmented agentic payments ecosystem. Over 60 organizations including American Express, Mastercard, and PayPal are collaborating on the protocol, which is now available on GitHub. (Google)
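The announcement doesn’t spell out AP2’s wire format, but the core idea of a signed mandate can be sketched in a few lines. The field names and the choice of Ed25519 below are illustrative assumptions, not the actual AP2 schema:

```python
# Illustrative sketch of a signed "Mandate." The JSON fields and Ed25519
# signing are assumptions for demonstration, not the actual AP2 schema.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

user_key = Ed25519PrivateKey.generate()  # held by the user, not the agent

mandate = {
    "agent_id": "shopping-agent-01",      # hypothetical field names
    "merchant": "example-store.test",
    "max_amount_usd": 50.00,
    "expires": "2026-01-01T00:00:00Z",
}
payload = json.dumps(mandate, sort_keys=True).encode()

signature = user_key.sign(payload)  # proves the user authorized this scope

# A merchant or payment network verifies the mandate with the public key;
# verify() raises InvalidSignature if the payload was tampered with.
user_key.public_key().verify(signature, payload)
print("mandate verified")
```

However the real schema shakes out, the key property is the same: the agent can present the mandate, but only the user’s key could have produced the signature.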

GitHub launches a central hub for discovering MCP servers

GitHub launched the MCP Registry to solve a big developer headache: finding Model Context Protocol (MCP) servers that let AI agents talk to external tools and systems. The registry features curated servers from partners like Figma, Postman, HashiCorp, and Dynatrace, with one-click installation in VS Code and sorting by GitHub stars to help developers quickly find what they need. Without a registry, developers often had to hunt through scattered repositories and community forums to find MCP servers, which slowed adoption and created security risks. The registry marks the first step toward building an open-source MCP registry with Anthropic and the MCP Steering Committee, where developers can self-publish servers that automatically appear in GitHub’s registry. (GitHub)
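Beyond browsing and one-click installation, a registry also invites programmatic discovery. A minimal sketch, assuming a public listing endpoint; the URL and JSON field names below are illustrative guesses, not a documented API:

```python
# Hypothetical sketch of programmatic MCP server discovery. The endpoint URL
# and response fields are assumptions; consult the registry's actual API
# documentation before relying on either.
import json
import urllib.request

REGISTRY_URL = "https://registry.modelcontextprotocol.io/v0/servers"  # assumed

with urllib.request.urlopen(REGISTRY_URL) as resp:
    data = json.load(resp)

for server in data.get("servers", []):  # field names are assumptions
    print(f"{server.get('name')}: {server.get('description', '')}")
```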

AI systems make breakthrough at world programming championship

Google’s Gemini 2.5 Deep Think and OpenAI’s reasoning system both achieved gold-medal performance at the 2025 International Collegiate Programming Contest (ICPC) World Finals. OpenAI’s ensemble of GPT-5 and an experimental reasoning model earned a perfect 12/12 score, which would have placed first among all human participants, without ICPC-specific training. Google’s system solved 10 problems, including one that no human team completed. Both companies’ systems competed under official ICPC rules with the same time constraints as human teams, showing significant advances in AI’s abstract reasoning and problem-solving capabilities. (Google and X)
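Neither company has detailed how its ensemble worked, but a common pattern for contest problems is generate-and-filter: sample candidate programs from multiple models, run each against the problem’s sample tests, and keep the first one that passes. A minimal sketch of that strategy, with hypothetical helpers; this is not either lab’s actual pipeline:

```python
# Sketch of a generate-and-filter ensemble for programming-contest problems.
# Illustrates a common pattern, not OpenAI's or Google's actual system;
# `generate_solution` stands in for any call to a code-generating model.
import subprocess

def passes_samples(source: str, samples: list[tuple[str, str]]) -> bool:
    """Run a candidate program on each sample input and compare stdout."""
    for stdin, expected in samples:
        result = subprocess.run(
            ["python", "-c", source],
            input=stdin, capture_output=True, text=True, timeout=10,
        )
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True

def solve(problem: str, samples, models, generate_solution):
    # Ask each model for a candidate; keep the first one that passes.
    for model in models:
        candidate = generate_solution(model, problem)  # hypothetical helper
        if passes_samples(candidate, samples):
            return candidate
    return None  # nothing passed; a real system would sample more widely
```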

Google releases VaultGemma, a language model with built-in privacy

At 1 billion parameters, Google says VaultGemma is the largest open language model trained from scratch with differential privacy, a mathematical technique that prevents the model from memorizing individual training examples by carefully adding calibrated noise during training. However, differentially private training requires significantly larger batch sizes and more computational resources than standard training. Google’s research establishes new scaling laws that help developers understand the trade-offs between compute budget, privacy guarantees, and model performance when training with differential privacy. This work provides useful guidance for organizations seeking to build AI systems that protect user privacy while maintaining useful capabilities. VaultGemma’s weights are available on Hugging Face and Kaggle, along with a technical report detailing the training methodology. (Google)
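The core mechanism, per-example gradient clipping followed by calibrated Gaussian noise (the DP-SGD recipe that differentially private training builds on), fits in a few lines. A minimal NumPy sketch of a single update step; the hyperparameter values are arbitrary illustrations, not VaultGemma’s settings:

```python
# Minimal sketch of one DP-SGD update step: clip each example's gradient to
# bound its influence, average over the batch, then add calibrated Gaussian
# noise. Hyperparameters are arbitrary, not VaultGemma's actual settings.
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1,
                learning_rate=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)

    # 1. Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(
        1.0, clip_norm / np.maximum(norms, 1e-12))

    # 2. Average over the batch. Large batches dilute the noise added below,
    #    which is why DP training favors much bigger batches than usual.
    batch_size = per_example_grads.shape[0]
    mean_grad = clipped.mean(axis=0)

    # 3. Add Gaussian noise scaled to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / batch_size,
                       size=mean_grad.shape)
    return -learning_rate * (mean_grad + noise)  # parameter update
```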

Anthropic faces U.S. government backlash over law enforcement usage restrictions

Anthropic’s refusal to allow its AI models to be used for certain law enforcement purposes has created tensions with the Trump administration, according to two senior officials. The company recently declined requests from federal law enforcement contractors because its usage policy prohibits surveillance of U.S. citizens, affecting agencies like the FBI, Secret Service, and Immigration and Customs Enforcement. This poses challenges for government contractors, since Anthropic’s Claude models are sometimes the only top-tier AI models cleared for top secret work through Amazon Web Services’ GovCloud. The dispute highlights broader questions about how much control AI companies should have over government use of their technology, particularly as governments increasingly use AI for controversial functions. (Semafor)


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng highlighted the growing importance of automated software testing in the era of AI-assisted coding, emphasizing how agentic testing can make coding agents more reliable, prevent subtle infrastructure bugs, and support stable software development.

“Automatically testing infrastructure software components that you intend to build on top of is especially helpful and results in more stable infrastructure and less downstream debugging.”
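For a flavor of what that looks like in practice, below is the kind of small pytest check a coding agent could generate and run against an infrastructure component before anything is built on top of it. The `retry` helper is a made-up stand-in, not an example from Andrew’s letter:

```python
# Illustrative pytest-style test of an infrastructure component, the kind a
# coding agent could generate and run automatically. The `retry` helper is a
# made-up stand-in, not an example from Andrew's letter.
def retry(fn, attempts=3):
    """Call fn until it succeeds or the attempts run out."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise

def test_retry_recovers_from_transient_failure():
    calls = {"n": 0}

    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("transient")
        return "ok"

    assert retry(flaky) == "ok"
    assert calls["n"] == 3  # failed twice, succeeded on the third attempt
```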

Read Andrew’s full letter here.
