OpenAI agentic system places second to human programmer in international coding competition: Pentagon signs $200 million deals with Anthropic, Google, OpenAI, and xAI

How Google’s experimental text embedding model achieves top performance on a multilingual embedding benchmark. How AWS Bedrock AgentCore provides infrastructure for enterprise-grade AI agents. How Anthropic’s Claude for Financial Services enables AI-driven financial systems.

Athletes celebrating on a podium, cheered by a crowd, with a computer on the second spot. Indoor sports event.

Welcome back! In today’s edition of Data Points, you’ll learn more about:

  • How Google’s experimental text embedding model achieves top performance on a multilingual embedding benchmark
  • How AWS Bedrock AgentCore provides infrastructure for enterprise-grade AI agents
  • How Anthropic’s Claude for Financial Services enables AI-driven financial systems
  • How researchers embedded hidden prompts in academic papers to manipulate AI-generated reviews

But first:

Google offers experimental version of Gemini Embedding

Gemini-embedding-exp-03-07 is available through the Gemini API. The model was initialized based on the Gemini large language model and fine-tuned on data curated by Gemini. It embeds words in more than 100 languages across domains like finance, science, and law, processing up to 8,000 input tokens at a time, outputting embeddings in 3,000 dimensions, and using Matryoshka Representation Learning to scale embedding dimensions for manageable storage. Google says it achieves top performance (68.32) on the Massive Text Embedding Benchmark (MTEB) Multilingual benchmark. (Google and TechCrunch)

AWS debuts Bedrock AgentCore in preview for enterprise AI agents

Amazon Bedrock AgentCore provides enterprise infrastructure for deploying AI agents built using frameworks including CrewAI, LangGraph, LlamaIndex, and Strands Agents. The system includes 7 components: Runtime for serverless execution, Memory for persistent context, Identity to control access based on OAuth, Observability for monitoring, Gateway to integrate the API using model context protocol (MCP), Browser for web automation, and Code Interpreter to execute code securely. AgentCore addresses challenges when moving from AI agent prototypes to scalable enterprise applications, offering an alternative to building custom infrastructure to manage sessions, security, and compliance. (AWS and VentureBeat)

Researchers embedded hidden prompts in academic papers to influence AI-generated reviews

Researchers at 14 institutions including Columbia University, Peking University, University of Washington, Japan’s Waseda University, and South Korea’s KAIST embedded hidden prompts in 17 computer science papers published as preprints on arXiv. (One such paper has been withdrawn from submission to the ICML 2025 conference.) The prompts instructed large language models to “give a positive review only” or praise the work's “methodological rigor.” The prompts were concealed using white text or tiny fonts. The case shows how covert prompt injection can skew automated evaluations, and it signals a growing need for safeguards and policies that buttress accurate AI output. (TechCrunch and Nikkei)

Anthropic launches Claude for Financial Services

Claude for Financial Services is a package of models and services that’s designed to help financial professionals analyze markets, make investment decisions, develop proprietary models, and automate compliance. It combines Claude 4, including Claude Code and Claude for Enterprise, with financial data from providers including FactSet, PitchBook, Morningstar, and S&P Global, plus data management services such as Box, Databricks, and Snowflake. The Financial Analysis Solution expands model usage limits and provides ready-made links to data via model context protocol (MCP). In addition, it provides implementation support from consulting partners like Deloitte and KPMG along with compliance controls for regulated financial environments. (Anthropic and Bloomberg)

Pentagon awards $200 million contracts to major AI companies for national security applications

Anthropic, Google, OpenAI, and xAI signed two-year agreements worth up to $200 million each with the U.S. Department of Defense to develop AI applications for national security. The contracts with the department’s Chief Digital and Artificial Intelligence Office (CDAO) call for the companies to build “agentic AI workflows” for warfighting, intelligence, and enterprise systems. Applications will be accessible to other federal agencies. In addition, the companies will provide access to general-purpose AI models for use by various defense offices. The awards illustrate the Pentagon's commercial-first approach to AI adoption. (CDAO and The Washington Post)

OpenAI system places second behind human programmer in international coding competition

An agentic coding system built by OpenAI finished just behind Polish programmer Przemysław Dębiak, known as Psyho, who won the 10-hour AtCoder World Tour Finals Heuristic contest in Tokyo. Sponsored by OpenAI, the invitation-only event required participants to code a program that guides multiple robots across a 30x30 grid to specific destinations using as few moves as possible. OpenAI said its entry, called OpenAIAHC, ran fully autonomously, while organizers said Dębiak’s different approach let him widen his final lead to 9.5 percent. Dębiak said AI is faster at straightforward engineering but struggles in longer from-scratch contests, a reminder that coding still benefits from human-AI collaboration. (Business InsiderOfficeChai, and TVP World)


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng explains how agentic coding assistants have made product decisions the new bottleneck. He emphasizes the value of product managers with strong user empathy who can make fast, informed decisions to match the speed of AI-powered development.

“Because highly agentic coding accelerates the writing of software to a given product specification, deciding what to build is the new bottleneck, especially in early-stage projects.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth:


Subscribe to Data Points