Biomni, an AI Agent for Multidisciplinary Biology Research

An agent designed for broad biological research could accelerate the work of scientists in specialties from anatomy to zoology.

What’s new: Kexing Huang and colleagues at Stanford, Princeton, University of Washington, Arc Institute, and Genentech introduced Biomni, an agent that performs tasks in genomics, immunology, microbiology, neuroscience, pathology, and much more. You can join a waitlist to get access. The authors intend to release the system as open source.

How it works: The authors assembled a collection of tools, software packages, and databases. Then they built an agent based on Claude 4 Sonnet that draws upon those resources to answer questions, propose hypotheses, design processes, analyze datasets, generate graphs, and so on.

The authors prompted Claude 3.5 Sonnet (the most current version when the work started) to extract the relevant tasks, tools, and databases used in 2,500 recent papers (100 from each of 25 specialties). They filtered the list manually to settle on 150 tools and nearly 60 databases. To that, they added around 100 popular biological software packages.
At inference, given a query, Biomni prompts Claude 4 Sonnet to determine which tools, packages, and databases are needed. Then it prompts the model to build a step-by-step plan to produce a response.
From there, the agent follows the CodeAct framework: Given a prompt to follow the plan or results of executing code, it can ask for clarification, write code and execute it, and return the result. The agent continues to follow the plan, generate code, and reason iteratively until it’s ready to produce a final response.
At each intermediate output, a different copy of Claude 4 Sonnet judges whether the model followed a proper procedure or confabulated its output. If the judge determines the model fell short, it tells the agent to repeat the step. If not, execution continues normally.

Results: Biomni outperformed Claude 4 Sonnet alone, as well as the same model with access to research literature, on Lab-bench, a biomedical subset of Humanity’s Last Exam, and eight other datasets, as well as three practical case studies.

On the subset of Humanity’s Last Exam, Biomni (17.3 percent accuracy) outperformed Claude 4 Sonnet alone (6 percent accuracy) and Claude 4 Sonnet with access to research (12.2 percent accuracy).
Asked to diagnose a patient based on a full genome, Biomni achieved roughly 85 percent accuracy, while Claude 4 Sonnet alone achieved 5 percent.
The authors assessed the ability to produce a protocol for cloning DNA sequences, co-author Serena Zhang said in an interview. Across 10 tests, experts rated Biomni’s protocol around 4.5 out of 5 — on par with those produced by human experts, higher than trainees, and much higher than Claude 4 Sonnet alone. A DNA synthesis lab was able to produce the sequence specified by one of the generated protocols.

Behind the news: While Biomni is designed to apply to biology broadly, most previous work on agents focused on narrower areas. For instance, just two days after the release of Biomni, a separate team at Stanford released CellVoyager, an agent that generates hypotheses about datasets of single-cell RNA sequences. Other examples include CRISPR-GPT, which designs gene-editing experiments, and SpatialAgent, which analyzes and hypothesizes about how cells interact within organisms.

Why it matters: While agents conversant in biology typically focus on narrow specialties, Biomni’s knowledge and skills span the entire domain, offering expert assistance to biologists across many specialties. Its reasoning capabilities can improve by substituting more capable LLMs as they become available, and its library of resources can be updated to keep up with changes in the field and extend its knowledge to new areas.

We’re thinking: Like biology, many sciences are so deep and broad that most scientists have deep expertise only within their areas of specialty. Yet agents can pull together resources from disparate areas to reach novel conclusions. In this way, Biomni demonstrates the potential of AI to augment human expertise in meaningful ways.