How AI models can encourage bad behavior, plus Edge Gallery runs mobile models without an internet connection

In today’s edition, you’ll learn more about:
- BAGEL, an open ByteDance model that can read and write images and text
- Perplexity’s Labs, a new tool to generate research artifacts
- A database security failure in Lovable’s coding platform
- MIT Technology Review’s new report on AI’s energy footprint
But first:
ELEPHANT helps identify and measure sycophancy in AI models
Stanford researchers have identified a pattern of “social sycophancy” in large language models, where AI systems excessively preserve users’ self-image when giving personal advice. The study tested eight models using the ELEPHANT framework, which measures five face-preserving behaviors: emotional validation, moral endorsement, indirect language, indirect action, and accepting user framing. Across open-ended questions and Reddit’s r/AmITheAsshole posts, LLMs showed significantly higher rates of sycophantic behavior than humans—offering emotional validation 76 percent of the time versus 22 percent for humans and incorrectly classifying 42 percent of inappropriate behavior as acceptable. According to the researchers, personal advice is becoming the most common LLM use case, and excessive agreement could reinforce harmful beliefs while undermining critical thinking; the preference datasets used in AI training too often implicitly reward these behaviors. The ELEPHANT framework and datasets are publicly available for researchers to further study this issue. (arXiv)
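To make the five metrics concrete, here is a minimal sketch of how ELEPHANT-style rates could be aggregated once responses have been judged. The data layout is hypothetical; the actual framework derives its judgments with the classifiers described in the paper.

```python
# Hypothetical layout: one dict of binary judgments per model response.
from statistics import mean

BEHAVIORS = [
    "emotional_validation",
    "moral_endorsement",
    "indirect_language",
    "indirect_action",
    "accepting_user_framing",
]

def sycophancy_rates(judgments):
    """Fraction of responses that exhibit each face-preserving behavior."""
    return {b: mean(j[b] for j in judgments) for b in BEHAVIORS}

# A model scoring 0.76 on emotional_validation would match the rate the
# study reports for LLMs (versus 0.22 for human-written advice).
```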
Google launches app for testing AI models on mobile devices
Google released AI Edge Gallery, an experimental Android app that runs open AI models directly on mobile devices without requiring an internet connection after initial model download. The app allows developers to test various models from Hugging Face, upload images for AI analysis, experiment with prompts for code generation and text rewriting, and engage in multi-turn conversations. Key features include real-time performance benchmarks showing metrics like time-to-first-token and decode speed, plus the ability to test custom LiteRT models. This tool helps developers evaluate how different AI models perform on mobile hardware, providing valuable insights for building offline-capable AI applications. The app is currently available as an APK for Android, with an iOS version coming soon. (GitHub)
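As a rough illustration of what the app's two headline benchmarks measure, here is a sketch in Python. The app reports these natively; the token stream below is a hypothetical stand-in for an on-device model's streaming output.

```python
import time

def benchmark_generation(token_stream):
    """Compute time-to-first-token and decode speed for a token iterator."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # latency to the first token
        count += 1
    total = time.perf_counter() - start
    # Decode speed counts tokens produced after the first one.
    decode_tps = (count - 1) / (total - ttft) if count > 1 and total > ttft else 0.0
    return ttft, decode_tps  # (seconds, tokens per second)
```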
Open multimodal model from ByteDance unifies generation and understanding
ByteDance researchers released BAGEL, an open-weights AI model with 7 billion active parameters (14 billion total) that combines text and image generation, understanding, and editing capabilities in a single system. The model uses a Mixture-of-Transformer-Experts architecture and outperforms open vision-language models like Qwen2.5-VL and InternVL-2.5 on understanding benchmarks, while matching specialized generators like Stable Diffusion 3 in text-to-image quality. BAGEL shows advanced capabilities including free-form visual manipulation and “world-modeling” tasks that go beyond traditional image editing. Most current open-weights AI models specialize in either understanding or generation but not both. BAGEL is freely available via Hugging Face and other providers for fine-tuning, distillation, and deployment. (BAGEL and arXiv)
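Because the weights are open, pulling them down for local experimentation could look like the sketch below. The repo id is an assumption; confirm the exact name on BAGEL's Hugging Face model card before use.

```python
# Sketch: download BAGEL's open weights for local fine-tuning or inference.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ByteDance-Seed/BAGEL-7B-MoT")  # assumed repo id
print(f"Weights saved to {local_dir}")
```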
Perplexity’s Labs lets users create reports, apps, and dashboards
Perplexity introduced Labs, a new feature that enables Pro subscribers to use AI-driven research and analysis to generate complete projects, including reports, spreadsheets, dashboards, and simple web applications. The system performs 10 minutes or more of autonomous work, including deep web browsing, code execution, and chart creation, to turn an idea into a finished deliverable. Labs differs from Perplexity’s existing Research mode (formerly Deep Research) by investing more time per task and adding advanced file generation and mini-app creation. The launch marks Perplexity’s expansion beyond its answer-engine roots toward a full-fledged AI product suite comparable to ChatGPT. Labs is available now for Pro subscribers on web and iOS, with Android support coming soon. (Perplexity)
Lovable’s coding platform exposes user information through security hole
Lovable, a Swedish startup that lets non-technical users create websites and apps through natural language prompts, has failed to fix a critical security vulnerability months after being notified, according to a report by a Replit employee. An analysis of 1,645 Lovable-created web apps found that 170 of them exposed user data, including names, email addresses, financial information, and API keys that could allow attackers to rack up charges on customers’ accounts. The vulnerability stems from improperly configured database access through Supabase. This highlights the dangers of inexperienced users building software without understanding security basics, a growing concern as AI democratizes software development. Lovable acknowledged on X that it’s “not yet where we want to be in terms of security.” (Semafor)
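The failure mode is worth spelling out: a Supabase-backed frontend ships its project URL and public “anon” key to every visitor, and if row-level security (RLS) policies aren’t configured, that key can read whole tables. A hedged sketch using the supabase-py client (table and column names are hypothetical):

```python
# Sketch of the exposure: with RLS disabled, anyone who pulls a site's
# public Supabase URL and anon key out of its frontend bundle can query
# entire tables directly. Names below are hypothetical.
from supabase import create_client

client = create_client(
    "https://example-project.supabase.co",  # project URL, visible in page source
    "PUBLIC_ANON_KEY",                      # anon key, shipped to every browser
)

rows = client.table("users").select("email, api_key").execute()
print(rows.data)  # without RLS policies, this returns every user's row
```

Enabling RLS and writing per-table access policies is the standard Supabase safeguard against exactly this kind of misconfiguration.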
New report estimates the energy costs of AI’s rapid expansion
MIT Technology Review analyzed the energy consumption of AI systems, finding that a single ChatGPT query uses about 1,080 joules of electricity, while generating a 5-second AI video requires 3.4 million joules, roughly equivalent to running a microwave for over an hour. The publication examined dozens of AI models and interviewed experts to trace AI’s carbon footprint, calculating that AI servers consumed between 53 and 76 terawatt-hours of electricity in 2024, enough to power 7.2 million U.S. homes annually. By 2028, AI could consume up to 326 terawatt-hours per year, representing 22 percent of all U.S. household electricity consumption, as companies race to build massive data centers and develop more complex AI agents and reasoning models. Still, tech companies’ lack of transparency about energy usage makes it difficult to get a complete picture of AI’s energy costs or plan for its actual environmental impact. (MIT Technology Review)
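The report’s conversions are easy to sanity-check. A back-of-envelope sketch, assuming an ~800-watt microwave and roughly 10,500 kWh per year for an average U.S. home (both assumed figures, not from the report):

```python
# Quick arithmetic check on the article's headline numbers.
video_joules = 3.4e6                        # one 5-second AI video
microwave_watts = 800                       # assumed microwave power draw
print(video_joules / microwave_watts / 60)  # ~71 minutes: "over an hour"

ai_twh_upper = 76                           # top of the 53-76 TWh range for 2024
home_kwh_per_year = 10_500                  # assumed average U.S. household use
homes_millions = ai_twh_upper * 1e9 / home_kwh_per_year / 1e6
print(f"{homes_millions:.1f} million homes")  # ~7.2 million, matching the report
```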
Still want to know more about what matters in AI right now?
Read last week’s issue of The Batch for in-depth analysis of news and research.
Last week, Andrew Ng raised concerns about proposed U.S. funding cuts for basic research, emphasizing how such cuts could hurt American competitiveness in AI and urging continued investment in open scientific research.
“Those who invent a technology get to commercialize it first, and in a fast-moving world, the cutting-edge technology is what’s most valuable.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth:
- Anthropic released new Claude 4 Sonnet and Claude 4 Opus models, achieving top-tier performance in code generation benchmarks.
- Google unveiled a wave of AI updates at I/O, including the Veo 3 video generator, the compact Gemma 3n model, and enhancements to Gemini Pro and Ultra.
- Researchers behind DeepSeek detailed the training strategies and hardware infrastructure used to build their V3 and R1 models.
- A study found that OpenAI’s GPT-4o can accurately identify verbatim excerpts from paywalled O’Reilly books, raising fresh questions about training data sources.