Data Points: Cursor’s Bugbot improves vibe debugging

Welcome back! In today’s edition of Data Points, you’ll learn more about how:

Google’s new AI tool probes images’ backstories
ChatGPT agent tackles general computer tasks
Amazon buys Bee for another crack at AI wearables
New benchmark measures AI agents’ predictions

But first:

Cursor releases Bugbot for automated code reviews

Cursor launched Bugbot, an AI-powered code review agent that automatically analyzes pull requests to identify logic bugs, edge cases, and security issues. The tool uses advanced models and proprietary techniques to understand code intent and provide meaningful feedback while maintaining a low false positive rate. During its beta phase, Bugbot reviewed over 1 million pull requests and identified 1.5 million issues, with more than 50 percent of flagged bugs resolved before merging. Users say automated code review significantly reduces the time engineers spend on manual reviews, allowing them to focus on higher-value work. Bugbot is now available through the Cursor dashboard for all users. (Cursor)

GitHub launches Spark, a tool to build web apps from natural language prompts

GitHub unveiled Spark, a new AI-powered development platform that creates and deploys complete web applications from natural language descriptions. The tool uses Claude Sonnet 4 to generate both frontend and backend code, includes built-in hosting and deployment, and provides access to LLMs from OpenAI, Meta, DeepSeek, and xAI without requiring API key management. Developers can iterate on their apps using natural language, visual controls, or traditional coding with GitHub Copilot assistance, and can export projects to full GitHub repositories with Actions and Dependabot integration. This release moves toward making app development accessible to non-programmers while offering more experienced developers a rapid prototyping tool. Spark is currently available in public preview for GitHub Copilot Pro+ subscribers, with broader availability planned for the future. (GitHub)

Google’s Backstory investigates image authenticity

Google introduced Backstory, an experimental AI tool that helps users better understand the context and origin of images found online. The tool uses Gemini to detect whether images are AI-generated, tracks their previous online usage, identifies digital alterations, and generates readable reports of its findings. Backstory’s holistic approach to establishing trustworthiness examines not just whether an image is AI-generated, but also how it has been used across the internet and whether it has been presented out of context. Google is currently testing Backstory with image creators and information professionals and is gathering feedback throughout the year to improve the technology. For now, the tool is available only to select users through a gated waitlist. (Google)

OpenAI’s ChatGPT agent completes users’ online and local tasks

OpenAI released ChatGPT agent, a general-purpose AI tool that can navigate calendars, create presentations, run code, and complete various computer tasks through natural language prompts. The agent combines capabilities from OpenAI's previous tools, including Operator's web navigation and Deep Research's information synthesis, while adding new features like app connectivity through Gmail and GitHub integrations and terminal access. ChatGPT agent achieved significant benchmark improvements over OpenAI base models, scoring 41.6 percent on Humanity's Last Exam (double the performance of OpenAI's o3 and o4-mini) and 27.4 percent on FrontierMath with tools, compared to o4-mini's 6.3 percent. OpenAI implemented safety measures including real-time monitoring for biological threats and disabled the memory feature to prevent data exfiltration, as the model received a "high capability" designation for biological and chemical weapon domains. ChatGPT agent is available now to Pro, Plus, and Team subscribers through a dropdown menu option in ChatGPT. (OpenAI)

Amazon acquires AI wearable startup Bee for undisclosed sum

Amazon agreed to purchase Bee, a San Francisco startup that makes a $50 AI-powered bracelet that records, transcribes, and summarizes conversations. The wristband can create summaries, to-do lists, and other outputs from recorded audio, and users can mute recording for privacy. Amazon confirmed the acquisition and said it will help give users more control over AI-enabled devices, with Bee likely joining Amazon's devices division under executive Panos Panay. The deal follows Amazon's previous wearables efforts, including the discontinued Halo health tracker and current Echo smart glasses with Alexa integration. (CNBC)

FutureBench, a new benchmark for agentic prediction

Together AI and Hugging Face created FutureBench, a benchmark that evaluates AI agents on their ability to predict future events rather than recall past information. The benchmark generates questions from current news and prediction markets, asking agents to forecast outcomes like Federal Reserve rate decisions, election results, or geopolitical developments within specific timeframes. Initial testing shows Claude 3.7 Sonnet leading with 67.3 percent accuracy, followed by GPT-4.1 at 62 percent and DeepSeek-V3 at 61.8 percent, with agents using search and web scraping tools to gather information for their predictions. This approach eliminates data contamination concerns since models cannot train on future events, providing a more authentic test of reasoning capabilities. The benchmark operates at three evaluation levels—comparing frameworks, tools, and models—and is available as an interactive leaderboard on Hugging Face. (Hugging Face)
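
For readers who want a concrete picture of how forecast scoring of this kind could work, here is a minimal Python sketch. It is not FutureBench's actual code: the Question class, the accuracy function, and the sample questions are all hypothetical, and it assumes simple yes/no questions scored only after the real-world outcome is known.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Question:
    """A yes/no forecasting question (hypothetical structure, not FutureBench's)."""
    text: str
    resolves_on: date               # the outcome is unknown before this date
    outcome: Optional[bool] = None  # filled in once the event has resolved


def accuracy(questions, forecasts, today):
    """Return the fraction of resolved questions that were forecast correctly.

    Questions that have not resolved yet are skipped, since there is
    nothing to compare the forecast against. Returns None if nothing
    has resolved.
    """
    scored = [
        (q, f)
        for q, f in zip(questions, forecasts)
        if q.outcome is not None and q.resolves_on <= today
    ]
    if not scored:
        return None
    correct = sum(1 for q, f in scored if f == q.outcome)
    return correct / len(scored)


# Illustrative, made-up questions and forecasts.
questions = [
    Question("Hypothetical: central bank cuts rates at its next meeting?", date(2025, 1, 10), True),
    Question("Hypothetical: candidate A wins the election?", date(2025, 1, 15), False),
    Question("Hypothetical: treaty signed by the end of the month?", date(2025, 1, 31), True),
    Question("Hypothetical: question that is still open", date(2025, 12, 31)),
]
forecasts = [True, True, True, False]

score = accuracy(questions, forecasts, today=date(2025, 2, 1))
print(f"Accuracy on resolved questions: {score:.1%}")
```

Because every question resolves after the forecast is made, the answer cannot have appeared in a model's training data, which is the contamination argument described above.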

Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng invites top developers to the Buildathon — a one-day challenge to build software fast with AI tools, shifting the focus from coding to product decisions.

“AI-assisted coding is speeding up software engineering more than most people appreciate. We’re inviting the best builders from Silicon Valley and around the world to compete in person on rapidly engineering software.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth:

Google and Cognition split up Windsurf assets and talent following OpenAI’s unsuccessful $3B bid, shifting dynamics in AI-assisted coding.
Moonshot unveiled Kimi K2, a trillion-parameter model designed for advanced agentic tool use.
The EU introduced a code of practice to help developers comply with the AI Act’s upcoming regulations.
Google’s AlphaEvolve combined LLMs with evolutionary algorithms to tackle complex math problems and accelerate Gemini model training.
