Data Points

Tencent’s new hybrid approach to images: Text adventures test models’ memories

How to manage multiple Claude Code sessions. Reddit’s standoff with the Internet Archive. A new framework for developing open computer-use agents. Using AI to kill antibiotic-resistant germs.

Analytics DeepLearning.AI

18 Aug 2025 — 4 min read

Welcome back! In today’s edition of Data Points, you’ll learn more about:

How to manage multiple Claude Code sessions
Reddit’s standoff with the Internet Archive
A new framework for developing open computer-use agents
Using AI to kill antibiotic-resistant germs

But first:

X-Omni uses reinforcement learning to power up image generation

Tencent engineers developed X-Omni, a hybrid AI system that uses reinforcement learning to better coordinate multiple image models. The system combines an autoregressive model for semantic planning with Black Forest Labs’ FLUX.1-dev diffusion decoder. X-Omni outperforms comparable unified models and matches or beats GPT-4o image generation in several areas, achieving a score of 0.901 for English text rendering and 87.65 on the DPG benchmark (which measures ability to respond to dense, complex prompts). The team trained the autoregressive and diffusion models together rather than separately to ensure that tokens from one model work effectively with the other, addressing a key weakness in hybrid systems, where mismatched tokens can degrade image quality. Tencent released X-Omni under an Apache 2.0 license on Hugging Face and GitHub. (arXiv)

TextQuests benchmark tests agents on classic text adventure games

Researchers have released TextQuests, a new benchmark that evaluates AI agents using 25 Infocom interactive fiction games from the 1980s. These text-based adventures, which can take human players over 30 hours to complete, test an agent’s ability to learn from trial and error, reason over long contexts of up to 100,000 tokens, and execute multi-step plans without external tools. GPT-5 led all models with 37.8 percent progress unaided and 70 percent when given external clues, followed by Claude Opus 4.1 (33.9 percent unaided, 68 percent with clues). Smaller models scored far lower. Long-context reasoning proved especially difficult: agents often hallucinate about prior interactions or fail to use information from their own gameplay history. The benchmark is available online for researchers to assess and improve AI agents’ long-horizon reasoning abilities. (arXiv)

Crystal, an open-source interface for managing Claude Code

Stravu released Crystal, a graphical interface that lets developers run and manage multiple Claude Code sessions in parallel, removing the bottleneck of waiting on a single AI assistant’s responses. The tool isolates each session in its own git worktree, preventing conflicts while enabling developers to work on multiple features simultaneously, experiment with various solutions side by side, and test or execute code changes directly from the interface. Stravu calls the application the first IVE (integrated vibe environment). Crystal is available as a free, open-source desktop application for macOS on GitHub. (Stravu)
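For readers unfamiliar with git worktrees, here is a minimal sketch of the isolation idea: one worktree and one branch per session, all backed by the same repository. It is not Crystal's actual implementation. The function name, the `session/` branch prefix, and the sibling-directory layout are assumptions made for illustration; only the `git worktree add -b` command itself is standard git.

```python
from pathlib import Path


def worktree_commands(repo: Path, sessions: list[str]) -> list[list[str]]:
    """Build one `git worktree add` command per session name.

    Hypothetical helper: the naming scheme is an assumption, not Crystal's.
    """
    commands = []
    for name in sessions:
        # Assumed layout: each worktree sits next to the main checkout.
        path = repo.parent / f"{repo.name}-{name}"
        # `-b` creates a new branch and checks it out in the new worktree,
        # so edits made in one session never touch another session's files.
        commands.append(
            ["git", "-C", str(repo), "worktree", "add",
             "-b", f"session/{name}", str(path)]
        )
    return commands


if __name__ == "__main__":
    for cmd in worktree_commands(Path("/tmp/app"), ["feature-a", "feature-b"]):
        print(" ".join(cmd))
```

The commands are only built and printed here. Passing each one to `subprocess.run` inside a real repository would create the worktrees.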

Reddit blocks Internet Archive from indexing site over AI scraping concerns

Reddit announced it will prevent the Internet Archive’s Wayback Machine from indexing most of its site after discovering AI companies were using the archive service to indirectly scrape Reddit data. The Wayback Machine will only be able to index Reddit’s homepage going forward, meaning it can no longer archive individual posts, comments, or user profiles. Reddit has paid agreements to share some of its content with Google and OpenAI, but has accused other AI companies of violating its platform policies by secretly scraping its site. This dispute reflects the business and technical difficulties of being fully or partially closed to AI scrapers. Mark Graham, director of the Wayback Machine, confirmed the organization has a longtime, ongoing relationship with Reddit and the two are in discussions to resolve the matter. (The Verge)

OpenCUA: Open Foundations for Computer-Use Agents

Researchers from the University of Hong Kong, Moonshot AI, Stanford, and other institutions released OpenCUA, an open-source framework for developing computer-use agents (CUAs) that autonomously complete arbitrary tasks on computers. The framework includes an annotation tool to capture human demonstrations of computer use across Windows, macOS, and Ubuntu; AgentNet, a dataset of 22,600 computer task trajectories spanning over 200 applications and websites; and a training pipeline that transforms demonstrations into state-action pairs with reflective reasoning. OpenCUA-32B achieves a 34.8 percent success rate on OSWorld-Verified, establishing a new state of the art among open-source models and surpassing OpenAI’s CUA (GPT-4o based). According to the team, OpenCUA counters the lack of transparency in proprietary CUA systems and provides researchers with tools to study these agents’ capabilities, limitations, and safety implications as they increasingly handle high-stakes digital tasks. (arXiv)
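To make "state-action pairs" concrete, here is a minimal sketch of how a recorded demonstration might be split into training examples. It is not OpenCUA's pipeline. The `Step` and `StateActionPair` types, their fields, and the action strings are hypothetical, and the reflective-reasoning annotations the paper describes are left out.

```python
from dataclasses import dataclass


@dataclass
class Step:
    screenshot: str  # hypothetical: path to the screen capture at this step
    action: str      # hypothetical: the action the human took, as text


@dataclass
class StateActionPair:
    state: str                # what the agent observes
    action: str               # what it should do next
    history: tuple[str, ...]  # earlier actions, kept as context


def to_state_action_pairs(trajectory: list[Step]) -> list[StateActionPair]:
    """Turn one demonstration into one training example per step."""
    pairs = []
    for i, step in enumerate(trajectory):
        # Everything done before step i becomes context for predicting step i.
        history = tuple(s.action for s in trajectory[:i])
        pairs.append(StateActionPair(step.screenshot, step.action, history))
    return pairs


if __name__ == "__main__":
    demo = [
        Step("shot_0.png", "click(120, 340)"),
        Step("shot_1.png", "type('hello')"),
    ]
    for pair in to_state_action_pairs(demo):
        print(pair)
```

Carrying earlier actions along with each state is one simple way to give a model context about how it reached the current screen. The actual framework may encode history differently.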

MIT uses AI to create new medicines that kill resistant bacteria

MIT researchers used AI to design new antibiotics that can kill two dangerous drug-resistant bacteria: Neisseria gonorrhoeae (which causes gonorrhea) and MRSA (a type of staph infection). The team developed two AI algorithms (called CrEM, or Chemically Reasonable Mutations, and F-VAE, fragment-based variational autoencoder) to create and then test over 36 million possible drug compounds with new antibacterial mechanisms. For developers, this showcases how generative AI algorithms can explore vast chemical spaces beyond existing databases, demonstrating AI’s ability to create novel solutions rather than just analyzing existing data. Drug-resistant infections kill nearly 5 million people yearly, and this AI approach lets scientists explore millions of new drug possibilities that would be impossible to test by hand. The nonprofit Phare Bio is now working to develop these compounds for further testing. (MIT)

Want to know more about what matters in AI right now?

Read the latest issue of The Batch for in-depth analysis of news and research.

Last week, Andrew Ng shared his experience visiting the University of Exeter in the UK to receive an honorary doctorate, highlighting the university leadership’s enthusiastic embrace of AI and its forward-looking approach to integrating AI across disciplines like computer science, environmental science, and business.

“Just as every company is becoming an AI company, every university must become an AI university — not just teaching AI, but using it to advance every field of study. This doesn’t mean abandoning disciplinary expertise. It means maintaining technical excellence while ensuring AI enhances every field.”

Read Andrew’s letter here.

Other top AI news and research stories covered in depth:

OpenAI’s latest model, GPT-5, faced turbulence as developers raised concerns over its cost, performance, and API reliability.
India launched a nationwide GPU network and talent development programs to accelerate the creation of homegrown large language models.
AI-generated video entered the mainstream as Meta, Google, and other tech giants unveiled advancements in text-to-video technology.
Stanford and Alibaba released a bug-fixing dataset and training pipeline to improve coding assistants’ capabilities.
