OpenAI security agent finds and plugs holes: Cognition’s SWE-1.5 model brings more speed for coding agents
In today’s edition of Data Points, you’ll learn more about:
- Why Gemma’s been pulled from Google’s AI Studio
- How Minimax built M2 for better coding performance
- A new technique for efficiently training smaller models
- arXiv’s new requirements for computer science submissions
But first:
OpenAI unveils new security agent for open-source projects
OpenAI announced Aardvark, an autonomous GPT-5-powered agent that analyzes code repositories to discover security vulnerabilities, assess their severity, and propose patches. The system monitors code commits, creates threat models, uses sandbox environments to validate whether a bug can be exploited, and integrates with GitHub and Codex to deliver fixes without disrupting development. In benchmark testing, Aardvark identified 92 percent of known vulnerabilities and discovered ten issues in open-source projects that received Common Vulnerabilities and Exposures (CVE) identifiers. The tool addresses a growing challenge for developers — over 40,000 CVEs were reported in 2024 alone — by automating security research that traditionally requires specialized human expertise. Aardvark is now available through a private beta program, with OpenAI planning to offer free scanning for select non-commercial open-source projects. (OpenAI)
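OpenAI hasn't published Aardvark's internals, but the announcement outlines a four-stage workflow. The sketch below is a hypothetical Python rendering of those stages; every function, class, and heuristic is an illustrative placeholder, not OpenAI's implementation or API.

```python
# Hypothetical sketch of the commit-scanning loop OpenAI describes for Aardvark.
# All names and the toy heuristic below are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class Finding:
    commit: str
    issue: str
    severity: str
    exploit_confirmed: bool = False

def build_threat_model(repo_docs: str) -> str:
    # Stage 1: summarize the repo's security objectives (stubbed).
    return "untrusted input must never reach eval()"

def analyze_commit(commit: str, diff: str, threat_model: str) -> list[Finding]:
    # Stage 2: flag changes that violate the threat model. A toy string check
    # stands in for GPT-5 reasoning over the diff.
    findings = []
    if "eval(" in diff:
        findings.append(Finding(commit, "possible code injection via eval", "high"))
    return findings

def validate_in_sandbox(finding: Finding) -> Finding:
    # Stage 3: try to trigger the bug in an isolated environment (stubbed).
    finding.exploit_confirmed = True
    return finding

def propose_patch(finding: Finding) -> str:
    # Stage 4: draft a fix for human review, e.g. as a pull request.
    return f"# patch for {finding.issue}: replace eval with ast.literal_eval"

threat_model = build_threat_model("README, SECURITY.md ...")
for commit, diff in [("abc123", "result = eval(user_input)")]:
    for finding in analyze_commit(commit, diff, threat_model):
        finding = validate_in_sandbox(finding)
        if finding.exploit_confirmed:
            print(propose_patch(finding))
```

In Aardvark itself, GPT-5 reasoning replaces the toy heuristic, and validated findings flow to GitHub and Codex as proposed fixes.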
Cognition updates speedy coding model for Windsurf agents
Cognition released SWE-1.5, a software engineering model with hundreds of billions of parameters that runs at up to 950 tokens per second. The company partnered with Cerebras to serve the model 6 times faster than Claude Haiku 4.5 and 13 times faster than Sonnet 4.5. Cognition trained the model with reinforcement learning on coding tasks in its Cascade agent harness, building on an open-source base model and training on Nvidia GB200 chips. The model scored competitively on SWE-Bench Pro, a benchmark of coding tasks across different codebases. SWE-1.5 is available now in Windsurf, Cognition's agentic code editor. (Cognition)
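For context on what those multiples mean in practice, here's a quick back-of-envelope calculation (assuming, as a simplification, that the stated speedups refer to decode throughput):

```python
# Implied throughputs from the announcement's figures: 950 tok/s,
# 6x Claude Haiku 4.5, 13x Sonnet 4.5. Illustrative arithmetic only.
swe15_tps = 950
haiku_tps = swe15_tps / 6    # ~158 tokens/sec implied for Claude Haiku 4.5
sonnet_tps = swe15_tps / 13  # ~73 tokens/sec implied for Sonnet 4.5

for name, tps in [("SWE-1.5", swe15_tps), ("Haiku 4.5", haiku_tps),
                  ("Sonnet 4.5", sonnet_tps)]:
    print(f"{name:>10}: {tps:6.0f} tok/s -> {2000 / tps:5.1f} s for a 2,000-token reply")
```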
Google pulls Gemma from AI Studio after defamation claims
Google removed its open-weights Gemma model from AI Studio after U.S. Senator Marsha Blackburn said the model falsely accused her of sexual misconduct. In a letter to CEO Sundar Pichai, Blackburn said Gemma fabricated claims about a 1987 campaign incident involving a state trooper, though no such accusation exists and she didn’t run for office until 1998. The senator also referenced a lawsuit by conservative activist Robby Starbuck, who claims Google’s AI models generated defamatory statements calling him a “child rapist,” and argued these fabrications constitute defamation rather than harmless hallucinations. Google said it never intended Gemma to be used as a consumer tool for factual questions and will continue making the models available via API while removing them from AI Studio. (TechCrunch)
MiniMax’s M2 outperforms other open-weight models
Chinese AI lab MiniMax released MiniMax-M2, an open-weight mixture-of-experts model with 230 billion total parameters but only 10 billion active at inference time. M2 ranks first among open models on Artificial Analysis’s composite intelligence benchmark and performs competitively with leading proprietary models on coding tasks like SWE-Bench Verified and agentic benchmarks like GAIA. M2’s smaller compute footprint enables faster feedback loops and allows developers to run more simultaneous agent instances on the same hardware budget. MiniMax made M2 available via API at $0.30/$1.20 per million input/output tokens and released the model weights on Hugging Face for local deployment. The company’s M2-based agent is also free for a limited period. (GitHub)
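As a quick illustration of how a mixture-of-experts model can carry 230 billion parameters while touching only 10 billion per token, here's a toy top-k MoE layer in PyTorch. It's a generic sketch, not MiniMax's architecture, and all sizes are arbitrary: most weights live in the experts, but each token is routed through only k of them, so per-token compute tracks active parameters rather than the total.

```python
# Toy top-k mixture-of-experts layer (generic illustration, not MiniMax's design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                            # x: (n_tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        topw, topi = gates.topk(self.k, dim=-1)      # each token picks k experts
        topw = topw / topw.sum(-1, keepdim=True)     # renormalize the k gate weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += topw[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = ToyMoE()
total = sum(p.numel() for p in moe.parameters())
active = sum(p.numel() for p in moe.router.parameters()) \
       + moe.k * sum(p.numel() for p in moe.experts[0].parameters())
print(f"total: {total:,} params; active per token: ~{active:,}")
```

At the listed prices, a call with 10,000 input and 2,000 output tokens would cost about $0.0054 (10,000 × $0.30/1M + 2,000 × $1.20/1M).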
New distillation method promises more efficiently trained models
Researchers at Thinking Machines showed that on-policy distillation, a training method that samples outputs from a student model and grades each token using a teacher model, achieves expert performance at a fraction of the cost of reinforcement learning. The technique combines the relevance of on-policy training with the dense feedback of distillation, allowing an 8-billion-parameter model to reach 70 percent accuracy on the AIME '24 math benchmark with 9 to 30 times less compute than standard supervised fine-tuning. The researchers also showed that on-policy distillation can restore instruction-following abilities lost during specialized training, making it useful for continual learning and model personalization. This approach could enable practitioners to train high-performing specialized models without the computational expense of large-scale RL, while maintaining the ability to update models with new knowledge over time. (Thinking Machines)
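The post describes the mechanic plainly enough to sketch. Below is a minimal PyTorch rendering under stated assumptions: the student samples a completion, both models score every sampled token, and the per-token log-probability gap (an estimate of the reverse KL) acts as a dense reward in a REINFORCE-style update. Checkpoint names are hypothetical, and this illustrates the idea rather than reproducing Thinking Machines' code.

```python
# Minimal on-policy distillation sketch. Checkpoint names are hypothetical.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

student = AutoModelForCausalLM.from_pretrained("student-8b")
teacher = AutoModelForCausalLM.from_pretrained("teacher-large").eval()
tok = AutoTokenizer.from_pretrained("student-8b")
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

def distill_step(prompt: str, max_new: int = 256) -> float:
    ids = tok(prompt, return_tensors="pt").input_ids
    # 1) On-policy: sample a completion from the *student*, not the teacher.
    with torch.no_grad():
        seq = student.generate(ids, do_sample=True, max_new_tokens=max_new)
    # 2) Score every token of the rollout under both models.
    targets = seq[:, 1:]
    s_logp = F.log_softmax(student(seq).logits[:, :-1], -1) \
              .gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        t_logp = F.log_softmax(teacher(seq).logits[:, :-1], -1) \
                  .gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    gen = slice(ids.shape[1] - 1, None)      # positions of the generated tokens
    # 3) Dense per-token reward: negative reverse-KL estimate, i.e. how much
    #    the teacher prefers each token the student sampled.
    adv = (t_logp[:, gen] - s_logp[:, gen]).detach()
    loss = -(adv * s_logp[:, gen]).mean()    # REINFORCE-style surrogate
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Because every token gets a graded signal (unlike RL's single end-of-episode reward), the student learns from each sample far more efficiently, which is where the cost savings come from.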
arXiv restricts AI-generated survey and position paper submissions
arXiv's computer science section will now accept review articles and position papers only if they have already passed peer review at a journal or conference. Authors must provide documentation of successful peer review when submitting, or their papers will likely be rejected. The change aims to help moderators manage an "unmanageable influx" of such papers, many of which arXiv describes as low-quality and likely generated with the help of large language models. arXiv emphasizes that review and position papers were never officially accepted content types, though moderators previously approved high-quality submissions at their discretion. arXiv says the new rules will free up volunteer moderators to focus on research papers, which remain the platform's core mission. (arXiv)
A special offer for our community
DeepLearning.AI just launched the first-ever subscription plan for our entire course catalog! As a Pro Member, you’ll immediately enjoy access to:
- Over 150 AI courses and specializations from Andrew Ng and industry experts
- Labs and quizzes to test your knowledge
- Projects to share with employers
- Certificates to testify to your new skills
- A community to help you advance at the speed of AI
Enroll now to lock in a year of full access for $25 per month paid upfront, or opt for month-to-month payments at just $30 per month. Both payment options begin with a one-week free trial. Explore Pro's benefits and start building today!
Want to know more about what matters in AI right now?
Read the latest issue of The Batch for in-depth analysis of news and research.
Last week, Andrew Ng talked about launching DeepLearning.AI Pro, a membership offering access to over 150 AI programs, including new courses and tools to help build AI applications.
“Beyond courses, I’m working on new tools to help you build AI applications and grow your career (and have fun doing so!). Many of these tools will be available first to DeepLearning.AI Pro members. So please join to be the first to hear about these new developments!”
Read Andrew’s letter here.
Other AI news and research stories we covered that might chill you to the bone:
- Chatbots can lead users down rabbit holes when their responses intertwine with users' paranoia and delusions, raising concerns about AI's effects on mental health.
- Experts warn that the AI boom is bound to bust if the massive investments in AI models and infrastructure fail to deliver the expected returns.
- AI training faces challenges as the supply of web data diminishes, with online publishers moving to restrict access to valuable data.
- Autonomous systems wage war, as drones reshape modern combat and spark fears over the potential loss of human oversight.