DeepSWE claims to measure agents better: Pope Leo XIV calls for AI to “disarm,” consider humanity
DeepSeek’s permanent V4 price cuts. MAI-Image-2.5, currently third on the Arena Leaderboard. Mythos-1’s remarkable security skills. How MCP will change later this year
In today’s edition of Data Points, you’ll learn more about:
- DeepSeek’s permanent V4 price cuts
- MAI-Image-2.5, currently third on the Arena Leaderboard
- Mythos-1’s remarkable security skills
- How MCP will change later this year
But first:
Datacurve debuts verified agentic coding benchmark
Researchers released DeepSWE, a software engineering benchmark designed to fix what they see as critical flaws in existing evaluations like SWE-bench Pro. The benchmark contains 113 original tasks spanning 91 repositories across five languages, with every task written from scratch rather than adapted from existing commits, which eliminates contamination risk. Solutions require an average of 668 lines of code versus SWE-bench Pro’s 120, despite prompts being half as long, forcing agents to discover implementation details rather than execute prescriptive specifications. An audit of SWE-bench Pro found its verifiers misgrade outputs at alarming rates: 8.5 percent false positives and 25 percent false negatives, often because inherited test suites weren’t designed to grade arbitrary solutions. DeepSWE’s hand-written verifiers disagreed with a careful LLM judge only 1.4 percent of the time. On the new benchmark, frontier models show 70 percentage points of separation from best to worst, compared to just 30 points on SWE-bench Pro, suggesting the tighter clustering on existing leaderboards understates real capability gaps developers encounter in practice. (DeepSWE)
Vatican letter calls for responsible AI use and development
Pope Leo XIV published Magnifica Humanitas, his first official encyclical on social and moral issues, writing that AI should be “disarmed” to avoid uses that enable domination, exclusion, or warfare. The encyclical emerged from consultations with scientists, engineers, educators, political leaders, and families, and directly addresses autonomous weapons and discriminatory algorithms. The current pope invoked Pope Leo XIII’s 1891 letter on industrial change as a precedent, positioning the Catholic Church as obligated to address major technological upheaval through the framework of human dignity. The letter articulates a core principle: Humans must not be reduced to productivity metrics or data. The encyclical stresses that persons possess freedom, interiority, and a vocation that no machine can replace. The Pope also called for international cooperation among nations, institutions, tech developers, and affected communities to ensure AI advances benefit all humanity rather than a privileged few. (Vatican News and Associated Press)
DeepSeek makes preview price cuts permanent for V4 Pro
DeepSeek made permanent a 75 percent price reduction on its V4 Pro flagship model, dropping costs from $0.0145–$3.48 per million tokens to $0.003625–$0.87. The Chinese startup, which released V4 models a month ago, framed the move around cost-effective AI at scale, a direct challenge to OpenAI, Google, and Anthropic in a market increasingly sensitive to token economics. For enterprises and power users burning through millions of tokens daily, the savings compound quickly. The aggressive pricing undercuts competitors that have accused DeepSeek of model theft; Anthropic previously alleged that the company performed “distillation attacks” to extract capabilities from Claude. (Engadget)
Microsoft updates and improves its image generation
Microsoft announced MAI-Image-2.5, its latest text-to-image model, which debuted at third place on the Arena leaderboard. The model shows marked improvements over its predecessor in text rendering, stylized illustration, and commercial imagery—areas where previous versions struggled with fine details. The company emphasizes gains in professional creative work: sharper text in posters and packaging, better scene structure and lighting, and stronger product photography. MAI-Image-2.5 is now available on Arena, with releases to the MAI Playground and Foundry within two weeks. (Microsoft)
Mythos discovers bugs and vulnerabilities faster than they can be patched
Anthropic and its approximately 50 Project Glasswing partners used Claude Mythos Preview to discover over 10,000 high- or critical-severity vulnerabilities across critical software systems in the first month after Glasswing’s April 2026 launch, a total that covers both partner codebases and over 1,000 open-source projects scanned by Anthropic itself. The real problem has become obvious: nobody can fix the flaws fast enough. Cloudflare alone surfaced 2,000 bugs, 400 of them critical, with fewer false positives than human testers would generate. Mozilla patched 271 vulnerabilities in Firefox 150, more than ten times the count from previous Claude versions. Yet of 530 critical bugs disclosed across the program, only 75 have been patched so far, with high-severity fixes averaging two weeks to roll out. In Anthropic’s scans of open-source projects, manual review confirmed that 90.6 percent of flagged issues were valid. The asymmetry matters: attackers with access to similar models could soon exploit this window between discovery and remediation, while maintainers remain swamped by the sheer volume of findings. (Anthropic)
MCP proposes biggest updates since the specification’s debut
The Model Context Protocol released a candidate spec that removes stateful sessions and lets remote MCP servers run behind ordinary load balancers. The old flow required an initialize handshake that pinned clients to a specific server instance with sticky sessions; the new version makes every request self-contained, carrying protocol version and client info in request metadata instead of session state. Servers that need continuity across calls can still maintain it—they just mint explicit handles (like a cart ID) and have the model thread them through as tool arguments, which turns out to give models more composability than hidden session state ever did. The release also formalizes extensions as a first-class concept, ships MCP Apps for server-rendered UIs in sandboxed iframes, and deprecates roots, sampling, and logging under a new lifecycle policy that promises they’ll keep working for at least a year. (Model Context Protocol Foundation)
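The handle-passing pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the MCP SDK or the spec's wire format: the request shape, the field names (`meta`, `protocol_version`, `client`), the tool names, and the in-memory `CARTS` store are all assumptions made for the example. The point is that each request carries what the server needs, and continuity lives in an explicit handle that the caller passes back as an ordinary tool argument.

```python
import uuid

# Hypothetical version label; the real spec defines its own version strings.
SUPPORTED_VERSION = "candidate"

# Hypothetical shared store. In a real deployment this would be a database or
# cache reachable by every server instance behind the load balancer.
CARTS: dict[str, list[str]] = {}


def create_cart() -> dict:
    """Mint an explicit handle instead of relying on hidden session state."""
    cart_id = uuid.uuid4().hex
    CARTS[cart_id] = []
    return {"cart_id": cart_id}


def add_item(cart_id: str, item: str) -> dict:
    """The caller threads the handle back in as a plain tool argument."""
    CARTS[cart_id].append(item)
    return {"cart_id": cart_id, "items": list(CARTS[cart_id])}


# Hypothetical tool registry for this sketch.
TOOLS = {"create_cart": create_cart, "add_item": add_item}


def handle_request(request: dict) -> dict:
    """Handle one self-contained request; nothing about the caller is remembered."""
    meta = request["meta"]  # assumed field carrying protocol version and client info
    if meta["protocol_version"] != SUPPORTED_VERSION:
        return {"error": "unsupported protocol version"}
    tool = TOOLS[request["tool"]]
    return {"result": tool(**request["arguments"])}


# Usage: two independent requests, linked only by the cart_id handle.
meta = {"protocol_version": SUPPORTED_VERSION, "client": "demo-client"}
cart = handle_request({"meta": meta, "tool": "create_cart", "arguments": {}})["result"]
updated = handle_request({
    "meta": meta,
    "tool": "add_item",
    "arguments": {"cart_id": cart["cart_id"], "item": "book"},
})["result"]
```

Because the second call depends only on its own metadata and arguments, any server instance with access to the shared store could answer it, which is what makes sticky sessions unnecessary in this kind of design.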
Want to know more about what matters in AI right now?
Read the latest issue of The Batch for in-depth analysis of news and research.
Last week, Andrew talked about Harvard’s decision to limit A grades to combat grade inflation, his belief in helping all learners succeed, and his preference for educational practices that focus on skill-building rather than judgment.
“I believe in letting — and even encouraging — someone to redo something until they succeed. This is as opposed to standing in judgment of the fact they didn’t get it right the first time.”
Read Andrew’s letter here.
Other top AI news and research stories covered in depth:
- Hermes Agent Challenges OpenClaw as the upstart personal agent outworks the established class created by OpenClaw.
- Thinking Machines unveils Built-In Conversational Interactivity with its first interaction model, introducing a new type of multimodal AI.
- Cybersecurity Alarms Grow Louder as a Google study reveals that LLM-generated malware is becoming increasingly difficult to track and stop.
- Researchers aim Toward Agent Benchmarks That Reflect Human Work, highlighting that AI agents may not be improving at performing a full range of economically valuable labor.
A special offer for our community
In case you missed it, DeepLearning.AI launched our first-ever subscription plan for our entire course catalog! As a Pro Member, you’ll immediately enjoy access to:
- Nearly 200 AI short and long courses from Andrew Ng and industry experts
- Labs and quizzes to test your knowledge
- Projects to share with employers
- Certificates to testify to your new skills
- A community to help you advance at the speed of AI
Enroll now to lock in a year of full access for $25 per month paid upfront, or opt for month-to-month payments at just $30 per month. Both payment options begin with a one-week free trial. Explore Pro’s benefits and start building today!
Data Points is produced by human editors with AI assistance.