AI Generates Viral Genomes: Researchers use genomic language models to create custom viruses

Researchers used AI models to create novel viruses from scratch.

What’s new: Samuel King and colleagues at the nonprofit biotech lab Arc Institute, Stanford University, and Memorial Sloan Kettering Cancer Center used model architectures related to transformers, trained on DNA sequences rather than text, to synthesize viruses that fight a common bacterial infection.

Key insight: The class of models known as genomic language models can produce DNA sequences by generating chains of nucleotides, the building blocks of DNA. Typically such models produce sequences up to the length of a single gene, and a genome requires many genes. But fine-tuning such models on sequences associated with a family of viruses can enable them to produce longer sequences within that family. At inference, feeding the fine-tuned model the initial part of the genome of a virus from that family can prompt the model to generate an entire novel genome.
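
To make the prompting idea concrete, here is a minimal sketch of autoregressive generation, in which each nucleotide is sampled given what came before. It is not the authors' method: a toy first-order Markov model stands in for the genomic language model, and the training sequences, the prompt, and the function names are all made up for illustration.

```python
import random

NUCLEOTIDES = ["A", "C", "G", "T"]

def fit_bigram_counts(sequences):
    """Count how often each nucleotide follows each other nucleotide.

    Toy stand-in for fine-tuning; starts every count at 1 (add-one smoothing).
    """
    counts = {a: {b: 1 for b in NUCLEOTIDES} for a in NUCLEOTIDES}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, prompt, total_length, seed=0):
    """Extend the prompt one nucleotide at a time until total_length is reached."""
    rng = random.Random(seed)
    genome = list(prompt)
    while len(genome) < total_length:
        # Sample the next nucleotide conditioned on the previous one only.
        weights = [counts[genome[-1]][b] for b in NUCLEOTIDES]
        genome.append(rng.choices(NUCLEOTIDES, weights=weights, k=1)[0])
    return "".join(genome)

# Hypothetical "family" of sequences and an arbitrary prompt (not real viral DNA).
toy_family = ["ACGTTGCAACGT", "ACGTAGCTAGCT", "TTGACGTACGTA"]
toy_prompt = "ACGTACG"
model = fit_bigram_counts(toy_family)
candidate = generate(model, toy_prompt, total_length=40)
```

A real genomic language model conditions on far more context than one preceding nucleotide, which is what lets it keep genes and their ordering coherent over thousands of positions.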

How it works: The authors fine-tuned existing genomic language models on the genomes of 14,500 viruses in the Microviridae family of bacteriophages, viruses that kill specific bacteria. Using the fine-tuned models, they generated potential viral genomes similar to Microviridae, identified the most promising ones, and synthesized them.

  • The authors started with Evo 1 (a 7-billion-parameter StripedHyena architecture pretrained on 2.7 million bacterial and viral genomes) and Evo 2 (a 7-billion-parameter StripedHyena 2 architecture pretrained on 8.8 trillion tokens from viral, bacterial, plant, and animal genomes). The StripedHyena architectures blend transformer-like self-attention layers that encode long-range dependencies with convolution-like blocks, enabling them to read and generate long DNA sequences efficiently.
  • The authors generated 11,000 candidate genomes by prompting the models with the first 11 nucleotides in the genome of the virus ΦX174, a relatively simple member of the Microviridae family that kills the bacterium E. coli C by making it burst.
  • They used existing tools for DNA sequence interpretation to filter the candidates, keeping those that were (i) likely to produce novel proteins, (ii) likely to produce proteins that would bind to E. coli C, (iii) around the same length as ΦX174’s genome, and (iv) made up of the most common nucleotides. This left 302 genomes.
  • They successfully synthesized 285 of the 302 generated candidates.
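
The two simplest filters, (iii) and (iv), can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the 10 percent length tolerance and the function name are hypothetical, "most common nucleotides" is read here as A, C, G, and T, and the ΦX174 genome length of 5,386 nucleotides is a known figure that the article itself does not state. Filters (i) and (ii) depend on specialized tools and are not covered.

```python
PHIX174_LENGTH = 5386  # nucleotides in the ΦX174 genome (not stated in the article)

def passes_basic_filters(genome, reference_length=PHIX174_LENGTH, tolerance=0.10):
    """Return True if a candidate passes the length and alphabet checks.

    tolerance is an assumed value: the fraction by which the candidate's
    length may differ from the reference length.
    """
    # Filter (iv): only the four standard nucleotides are allowed.
    only_standard = set(genome) <= set("ACGT")
    # Filter (iii): length must be close to the reference genome's length.
    close_length = abs(len(genome) - reference_length) <= tolerance * reference_length
    return only_standard and close_length

# Made-up candidates, chosen only to exercise each check.
right_size = "ACGT" * 1346 + "AC"   # 5,386 nucleotides
too_short = "ACGT" * 750            # 3,000 nucleotides
ambiguous = "ACGT" * 1346 + "AN"    # contains a non-standard symbol

kept = [g for g in (right_size, too_short, ambiguous) if passes_basic_filters(g)]
```

Cheap checks like these are typically worth running first, so that slower protein-level analyses only see candidates that are already plausible.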

Results: The authors tested a cocktail of 16 synthetic viruses on three bacterial strains that are resistant to ΦX174. Initially, the cocktail failed to kill the bacteria within three hours. However, when they moved the viruses to new cultures of the same bacterial strain to give them opportunities to recombine and mutate, the bacteria succumbed.

  • In three side-by-side contests, the synthetic virus called Evo-Φ69 replicated in host cells more than ΦX174 and other synthetic viruses. Six hours after infecting its host, the population of Evo-Φ69 had increased by a factor of between 16 and 65, while the population of ΦX174 had increased by a factor of between 1.3 and 4.0.
  • In a test that tracked cloudiness of the liquid bacterial culture, a proxy for the density of the bacterial population, Evo-Φ2483 reduced the culture’s cloudiness to 0.07 optical density in 135 minutes, while ΦX174 achieved 0.22 optical density in 180 minutes.
  • Many of the synthetic viruses qualified as new species, meaning their genomes were no more than 95 percent identical to those of the nearest naturally occurring viruses.
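
The new-species criterion amounts to a threshold on sequence identity. The sketch below is a toy version: the article does not say how identity was measured, so it assumes two sequences that have already been aligned to equal length (with "-" marking gaps), and the function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned positions at which two sequences match.

    Assumes the inputs are already aligned; positions where both
    sequences have a gap ("-") are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no positions to compare")
    return 100.0 * matches / compared

def is_new_species(identity_to_nearest, threshold=95.0):
    """Apply the article's criterion: no more than 95 percent identical."""
    return identity_to_nearest <= threshold

# Made-up ten-nucleotide sequences that differ at one position.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
novel = is_new_species(identity)
```

In practice the comparison runs against the nearest known natural genome, so the hard part is the database search and alignment, not the threshold itself.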

Behind the news: Genome engineering typically relies on selective breeding, introducing random mutations, or making specific changes based on known biology, all of which modify existing genomes instead of designing new ones. These approaches struggle to change features like genome lengths and the speed at which bacteriophages kill bacterial cells.

Why it matters: Bacteriophage therapy is a potential alternative to antibiotics. However, bacteria can evolve resistance to bacteriophages, just as they develop resistance to antibiotics. In this work, AI generated genomes for viable, diverse, novel synthetic bacteriophages that defeated resistant bacteria. This method could give doctors a fresh approach to fighting bacterial infections.

We’re thinking: Making new viruses from scratch is cause for both excitement and concern. On one hand, the implications for medicine and other fields are enormous. On the other, although the authors took care to produce viruses that can’t infect humans, malicious actors may not. Research into responding to biological threats is as critical as research that enables us to create such threats.