Microsoft’s first MAI foundation models; Latam-GPT, an LLM optimized with regional data
In today’s edition of Data Points, you’ll learn more about:
- Anthropic and OpenAI’s audits of each other’s models
- Claude’s opt-out training data policies
- ChatGPT’s responses to mental health crises
- Alibaba’s new AI inference chip
But first:
Microsoft unveils two new foundation models
Microsoft began public testing of MAI-1-preview, its first end-to-end trained foundation model, on LMArena. The mixture-of-experts model, trained on approximately 15,000 NVIDIA H100 GPUs, specializes in instruction-following and responding to everyday queries. Microsoft also released MAI-Voice-1, a single-GPU speech generation model for both single- and multi-speaker scenarios. These models represent Microsoft’s strategy of complementing partner models like GPT-5 with specialized systems tailored to different use cases. MAI-1-preview will roll out to select Copilot text features over the coming weeks, with API access available by application. MAI-Voice-1 is immediately available in Copilot Daily, Podcasts, and Copilot Labs experiences. (Microsoft)
Latin America builds its own ChatGPT rival
The Chilean National Center for Artificial Intelligence (CENIA) launched Latam-GPT, an open-source language model designed specifically for Latin American languages and cultural contexts. The project brings together 33 institutions across Latin America and the Caribbean, collecting over 8 terabytes of text data from 20 countries to train a 50-billion-parameter model comparable to GPT-3.5. The project addresses the need for AI models that understand regional dialects, history, and cultural nuances that global models often overlook, while enabling Latin American researchers to experiment directly with large language models. The first version launches this year as a free, open model that organizations can adapt for specific sectors like education, healthcare, and agriculture. (Wired)
OpenAI and Anthropic share results from first cross-lab safety tests
OpenAI and Anthropic tested each other’s publicly released models using their internal safety evaluations and published the results. The labs evaluated models including Claude Opus 4, Claude Sonnet 4, GPT-4o, GPT-4.1, OpenAI o3, and OpenAI o4-mini on instruction hierarchy, jailbreaking, hallucination, and scheming behaviors. The evaluations deliberately used adversarial scenarios outside normal usage patterns to identify potential failure modes and edge cases. Claude models excelled at respecting instruction hierarchy and resisting system prompt extraction but showed higher refusal rates on factual questions, while OpenAI’s reasoning models demonstrated stronger resistance to jailbreaks and lower refusal rates at the cost of more hallucinations. This collaboration demonstrates how AI labs can hold each other accountable and establish industry-wide safety standards through shared evaluation practices. (OpenAI and Anthropic)
Anthropic changes policies to train on user prompts by default
Anthropic updated its Consumer Terms and Privacy Policy to train its models on conversations from Claude Free, Pro, and Max users by default, extending data retention from 30 days to five years. Users can opt out, but existing users must make a choice by September 28, 2025, to continue using the service, and new users are prompted during signup. Anthropic claims the data will improve model capabilities and safety systems. However, users who allow training cannot fully remove their data from models already trained, even if they later change their preference. The extended retention period raises privacy concerns, as five years of conversation data could contain sensitive personal or professional information. Enterprise, API, and government users remain exempt from these data collection practices. (Anthropic)
OpenAI outlines mental health safeguards for ChatGPT
OpenAI updated its approach to handling users experiencing mental health crises, following recent cases of people using ChatGPT during acute emotional distress. OpenAI says its models recognize signs of distress and respond with empathy, directing users to resources like the 988 suicide hotline in the U.S. and similar services globally. OpenAI acknowledges that safeguards can degrade during lengthy conversations and says it is working to strengthen protections, particularly for teenagers. The company plans to expand interventions, enable one-click emergency service access, and explore connecting users with licensed therapists directly through ChatGPT. (OpenAI)
Alibaba develops new AI chip as China pushes for semiconductor independence
Alibaba created a versatile AI inference chip that works with Nvidia’s software platform, marking the Chinese cloud giant’s latest effort to replace restricted American processors. The chip, currently in testing and manufactured by a Chinese company, joins efforts by Shanghai-based MetaX and Beijing-based Cambricon Technologies to develop alternatives to Nvidia’s H20. Alibaba designed the processor for a broad set of inference tasks rather than specific applications, addressing surging demand for AI inference. Chinese companies continue to build their AI capabilities despite U.S. export restrictions and Beijing’s recent directive against purchasing Nvidia chips. However, Chinese processors still face greater challenges with model training than with inference. (The Wall Street Journal)
Want to know more about what matters in AI right now?
Read the latest issue of The Batch for in-depth analysis of news and research.
Last week, Andrew Ng shared thoughts on parallel agents as a new way to scale AI, highlighting how running agents simultaneously sped up research, coding, and other workflows while boosting performance.
“The falling cost of LLM inference makes it worthwhile to use a lot more tokens, and using them in parallel allows this to be done without significantly increasing the user’s waiting time.”
Read Andrew’s letter here.
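As a rough illustration of that idea (not from Andrew’s letter), here is a minimal Python sketch: a hypothetical `run_agent` coroutine stands in for real LLM or agent calls, with `asyncio.sleep` simulating network latency so the script runs without any API. Issued concurrently, the total wait is roughly the slowest call rather than the sum of all of them.

```python
import asyncio
import time

# Hypothetical stand-in for a real LLM/agent call; asyncio.sleep simulates latency.
async def run_agent(task: str, latency: float) -> str:
    await asyncio.sleep(latency)
    return f"result for {task!r}"

async def main() -> None:
    tasks = [("research", 2.0), ("write code", 1.5), ("review code", 1.0)]

    # Sequential: total wait is roughly the sum of the latencies (~4.5 s here).
    start = time.perf_counter()
    for name, latency in tasks:
        await run_agent(name, latency)
    print(f"sequential: {time.perf_counter() - start:.1f}s")

    # Parallel: total wait is roughly the slowest single call (~2.0 s here),
    # so spending more tokens adds little to the user's waiting time.
    start = time.perf_counter()
    await asyncio.gather(*(run_agent(name, latency) for name, latency in tasks))
    print(f"parallel: {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```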
Other top AI news and research stories covered in depth:
- Google unveiled Magic Cue, a new no-prompt AI assistant for the upcoming Pixel 10.
- French startup Mistral published detailed data on energy, water, and material consumption for the full lifecycle of its Mistral Large 2 model.
- Chinese researchers disguised a modified robot dog as an antelope to study herd behavior in the wild.
- Meta introduced DINOv3, an update to its self-supervised learning framework with a new loss term that delivers better image processing and vision performance.