Tech & Society
Image Generators in the Arena: Text-to-image generators face off in arena leaderboard by Artificial Analysis
An arena-style contest pits the world’s best text-to-image generators against each other.
Tech & Society
An influential ranking of open models revamped its criteria, as large language models approach human-level performance on popular tests.
Machine Learning Research
Tool use and planning are key behaviors in agentic workflows that enable large language models (LLMs) to execute complex sequences of steps. New benchmarks measure these capabilities in common workplace tasks.
Tech & Society
Scale AI offers new leaderboards based on its own benchmarks.
Tech & Society
How well do large language models respond to professional-level queries in various industry domains? A new company aims to find out.
Machine Learning Research
Training an agent that controls a robot arm to perform a task — say, opening a door — that involves a sequence of motions (reach, grasp, turn, pull, release) can take from tens of thousands to millions of examples...
Machine Learning Research
While neural networks perform well on image, text, and audio datasets, they fall behind decision trees and their variations on tabular datasets. New research looked into why.
Machine Learning Research
Robots trained via reinforcement learning usually study videos of robots performing the task at hand. A new approach used videos of humans to pre-train robotic arms.
Tech & Society
A new benchmark aims to raise the bar for large language models. Researchers at 132 institutions worldwide introduced the Beyond the Imitation Game benchmark (BIG-bench), which includes tasks that humans perform well but current state-of-the-art models don’t.
AI Industry
A new study showcases AI’s growing importance worldwide. What’s new: The fifth annual AI Index from Stanford University’s Institute for Human-Centered AI documents rises in funding, regulation, and performance.
Machine Learning Research
The transformer architecture has inspired a plethora of variations. Yet researchers have used a patchwork of metrics to evaluate their performance, making them hard to compare. New work aims to level the playing field.
Tech & Society
How much processing power do various nations have on hand to drive their AI strategy? An intergovernmental organization aims to find out. The Organisation for Economic Co-operation and Development (OECD) is launching an effort to measure the computing capacity available in countries around the world.