Machine Learning Research
Benchmarks for Agentic Behaviors: New LLM Benchmarks for Tool Use and Planning in Workplace Tasks
Tool use and planning are key behaviors in agentic workflows: they enable large language models (LLMs) to execute complex sequences of steps. New benchmarks measure these capabilities in common workplace tasks.