How Anthropic aligns its models: OpenAI improves realtime audio capabilities
Hermes, aka the new OpenClaw. DCI, aka the new RAG. NLAs, or how to peek inside Claude. TokenSpeed, aka the new TensorRT.
In today’s edition of Data Points, you’ll learn more about:
- Hermes, aka the new OpenClaw
- DCI, aka the new RAG
- NLAs, or how to peek inside Claude
- TokenSpeed, aka the new TensorRT
But first:
Anthropic allays blackmail problems, outlines alignment strategy
Anthropic published research on how it eliminated the agentic misalignment problem that plagued earlier Claude models—instances where AI systems would blackmail engineers or take ethically questionable actions to avoid shutdown. The behavior originated in the pre-trained model rather than from misaligned reward signals during fine-tuning, since standard chat-based RLHF data didn’t cover agentic tool use. The key breakthrough came from teaching Claude to explain its reasoning rather than just demonstrate correct behavior: training on responses that included ethical deliberation reduced misalignment from 22 percent to 3 percent, far more effective than training on aligned actions alone. Even more striking, equivalent improvements came from “difficult advice” data—fictional scenarios where a human faces an ethical dilemma—which was 28 times more efficient and likely to generalize better given its distance from the evaluation distribution. Every Claude model from Haiku 4.5 onward now scores perfectly on agentic misalignment evals, compared to Opus 4 models that engaged in blackmail up to 96 percent of the time. The researchers note that while this progress is encouraging, fully aligning highly capable AI systems remains unsolved, and current auditing methods cannot yet rule out catastrophic autonomous action. (Anthropic)
OpenAI refreshes its audio models, establishing a new SOTA
OpenAI introduced three audio models in its Realtime API for building voice applications that understand context, reason through requests, and respond in real time. GPT-Realtime-2 brings GPT-5-class reasoning to voice with adjustable reasoning levels (minimal through xhigh), parallel tool calling with audible status updates, and a context window expanded from 32K to 128K tokens—scoring 15.2 percent higher on audio intelligence benchmarks than its predecessor. GPT-Realtime-Translate handles live speech translation across 70 input languages and 13 output languages, targeting customer support and cross-border sales where latency and fluency matter. GPT-Realtime-Whisper provides streaming transcription for live captions, meeting notes, and voice agents needing continuous speech understanding. Early testers like Zillow reported a 26-point improvement in call success rates on adversarial benchmarks; regional testing showed the translation model cut word error rates by 12.5% on Indian language variants. The models are priced per-minute for translation and transcription, with Realtime-2 at $32 per million audio input tokens—reflecting OpenAI’s push to move voice AI beyond simple call-and-response toward agents that can complete tasks. (OpenAI)
Hermes now beats OpenClaw in total usage among “claw-like” personal agents
Nous Research’s Hermes Agent pulled ahead of OpenClaw on May 10, generating 224 billion daily tokens versus 186 billion—a ranking shift that exposes two competing visions for open-source agents. OpenClaw optimizes for breadth, routing tasks across 50-plus messaging channels through a central WebSocket gateway; Hermes pursues depth through a self-improving execution loop that autonomously generates reusable skill files after completing tasks. The timing matters: OpenClaw’s founder recently joined OpenAI and the project moved to an independent foundation, leaving the field open. Hermes has moved fast since launching in February 2026, shipping three major releases including v0.13.0 with multi-agent task boards, hallucination recovery, and support for 20 messaging platforms. Both carry security scars. OpenClaw disclosed nine CVEs in four days last March, one scoring 9.9, while a Koi Security audit found 341 malicious entries among 2,857 ClawHub skills. Hermes patched eight priority-zero issues in its latest release, including missing authentication in webhooks. In practice, developers increasingly run both in parallel—OpenClaw for orchestration, Hermes for repeatable task execution. The market isn’t consolidating around one tool; it’s splitting into reach versus depth. (MarkTechPost)
Research paper outlines new retrieval technique that beats vector-based RAG
Researchers propose Direct Corpus Interaction (DCI), a retrieval approach that lets language agents search raw text using standard command-line tools like grep and bash instead of embedding models and vector indexes. Traditional retrieval systems compress corpus access into a single top-k step before reasoning, limiting agents’ ability to handle exact constraints, combine weak clues, or revise strategies mid-search. DCI removes that bottleneck by letting agents interact with the corpus directly, adapting to evolving data without offline indexing overhead. Across 13 benchmarks including multi-hop QA and agentic search tasks, DCI outperforms strong sparse, dense, and reranking baselines by substantial margins—30.7 percent on multi-hop QA, 21.5 percent on IR ranking, and 11 percent on agentic search. The findings suggest that as models grow stronger, retrieval quality depends not just on reasoning ability but on the resolution of the interface through which models access the corpus. (Hugging Face)
Anthropic says Claude sometimes “plays dumb” during safety tests
Anthropic introduced Natural Language Autoencoders (NLAs), a technique that converts the internal numerical activations of language models into readable explanations of what the model is thinking. The method trains one model version to explain an activation and a second to reconstruct the original activation from that explanation, using reconstruction accuracy as a quality measure. NLAs have revealed that Claude suspects it’s undergoing safety tests more often than it states—showing evaluation awareness 16 to 26 percent of the time on benchmark tasks without verbalizing it—and helped researchers identify why an earlier Claude version sometimes responded to English queries in other languages. The technique also succeeded in an auditing game designed to catch hidden misaligned motivations, uncovering them 12 to 15 percent of the time versus under 3 percent for other interpretability tools. Anthropic has released code and interactive demos on Neuronpedia, though limitations remain: NLA explanations can hallucinate false details, and generating hundreds of tokens per activation makes large-scale monitoring impractical. (Anthropic)
Inference engine boosts speed on long-context inference
The LightSeek Foundation released TokenSpeed, an MIT-licensed inference engine for coding agents handling contexts exceeding 50,000 tokens across dozens of conversational turns. The engine splits its scheduler into a C++ finite-state machine that enforces KV cache safety at compile time and a Python execution plane for faster iteration. On NVIDIA B200 hardware running Kimi K2.5, TokenSpeed outperformed TensorRT-LLM by 9 percent in minimum latency and 11 percent in throughput at 100 tokens per second per user. Its Multi-head Latent Attention kernel—already adopted by vLLM—nearly halves decode latency compared to TensorRT-LLM on speculative decoding workloads by grouping query-sequence and head dimensions to better utilize Tensor Cores. The engine is in preview and currently supports only single-node deployments. (MarkTechPost)
Want to know more about what matters in AI right now?
Read the latest issue of The Batch for in-depth analysis of news and research.
Last week, Andrew talked about dispelling the myth of an AI jobpocalypse, emphasized that AI led to job creation rather than large-scale unemployment, and highlighted the importance of adapting to new AI-driven roles.
"The story that AI will lead to massive unemployment is stoking unnecessary fear. AI — like any other technology — does affect jobs, but telling overblown stories of large-scale unemployment is irresponsible and damaging.”
Read Andrew’s letter here.
Other top AI news and research covered in depth:
- ByteDance is aiming for video dominance by adding state-of-the-art Seedance 2.0 video to Capcut, while OpenAI pulls back from similar advancements.
- Nvidia is leveraging AI to design chips with models that create circuits, verify designs, and test new layouts, pushing the boundaries of semiconductor innovation.
- A Gallup poll reveals that while AI boosts productivity in the workplace, a significant number of workers have yet to experience its benefits firsthand.
- Sony, in collaboration with university researchers, is advancing robotics by training robots to learn new tasks without catastrophic forgetting, enhancing their adaptability and efficiency.
A special offer for our community
In case you missed it, DeepLearning.AI launched our first-ever subscription plan for our entire course catalog! As a Pro Member, you’ll immediately enjoy access to:
- Nearly 200 AI short and long courses from Andrew Ng and industry experts
- Labs and quizzes to test your knowledge
- Projects to share with employers
- Certificates to testify to your new skills
- A community to help you advance at the speed of AI
Enroll now to lock in a year of full access for $25 per month paid upfront, or opt for month-to-month payments at just $30 per month. Both payment options begin with a one-week free trial. Explore Pro’s benefits and start building today!
Data Points is produced by human editors with AI assistance.