Colin Carlson is a biologist at Georgetown University.

Mousepox kills mice with ruthless efficiency, but scientists have never considered the virus a threat to humans. Dr. Carlson and his colleagues are no longer so sure.

The researchers have spent the past few years programming computers to teach themselves about viruses that can infect humans. The computers combed through large amounts of information about the biology and ecology of the animal hosts of those viruses, as well as the genomes and other features of the viruses themselves. Over time, the computers came to recognize certain factors that could predict whether a virus has the potential to infect humans.

Dr. Carlson and his colleagues used the computers to create a short list of animal viruses with the potential to cause a human outbreak.

The algorithms placed the mousepox virus in the top ranks of risky pathogens.

Every time the researchers run the model, the virus comes up high.

Puzzled, Dr. Carlson and his colleagues searched the scientific literature and found a report of an outbreak in 1987 in rural China. Schoolchildren had come down with an illness that caused sore throats and inflammation in their hands and feet.

Years later, a team of scientists ran tests on throat swabs that had been collected during the outbreak and put into storage. The group reported in 2012 that the samples contained mousepox DNA. A decade after that study, mousepox is still not considered a threat to humans.

If the computer programmed by Dr. Carlson and his colleagues is right, the virus deserves a closer look.

Dr. Carlson said it was crazy that the finding had been lost in the pile of material that public health has to sift through.

About 250 human diseases arose when an animal virus jumped the species barrier. H.I.V. jumped from chimpanzees, for example, and the new coronavirus came from bats.

Scientists want to be able to identify the next spillover virus before it starts infecting people. There are too many animal viruses to study. More than 1,000 viruses have been identified in mammals, but that is most likely a tiny fraction of the true number. Some researchers think mammals carry tens of thousands of viruses, while others think the number is hundreds of thousands.

Researchers like Dr. Carlson are using computers to find hidden patterns in scientific data. The machines can zero in on viruses that are likely to cause disease in humans, and can also predict which animals are most likely to harbor dangerous viruses.

Barbara Han, a disease ecologist at the Cary Institute of Ecosystem Studies in Millbrook, N.Y., who collaborates with Dr. Carlson. Credit: Pamela Freeman/Cary Institute of Ecosystem Studies

Barbara Han, a disease ecologist at the Cary Institute of Ecosystem Studies, said that the approach felt like having a new set of eyes.

Dr. Han was among the first disease ecologists to use machine learning. Computer scientists had been developing the technique for decades and were starting to build powerful tools with it. Today, machine learning allows computers to spot fraudulent credit charges and recognize people's faces.

At the time, few researchers had applied machine learning to diseases. Dr. Han wondered if she could use it to answer open questions, such as why fewer than 10 percent of rodent species harbor pathogens that can be transmitted to humans.

She fed a computer information about various rodent species from an online database. The computer then looked for traits shared by the rodents known to harbor high numbers of species-jumping pathogens.

She tested the model against another group of rodents to see how well it could guess which ones were laden with disease-causing agents. The computer's model reached an accuracy of 90 percent.
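The train-then-test procedure described above can be sketched in a few lines of Python. This is only a toy illustration: every species record and number is invented, and a single life-span cutoff stands in for whatever model Dr. Han's team actually built, which the article does not describe.

```python
from dataclasses import dataclass


@dataclass
class Species:
    name: str
    lifespan_years: float    # hypothetical trait value
    carries_zoonoses: bool   # hypothetical label


def fit_threshold(train):
    """Pick the life-span cutoff that best separates the training labels.

    Species at or below the cutoff are predicted to carry zoonotic pathogens.
    """
    best_cutoff, best_correct = None, -1
    for cutoff in sorted({s.lifespan_years for s in train}):
        correct = sum(
            (s.lifespan_years <= cutoff) == s.carries_zoonoses for s in train
        )
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def accuracy(cutoff, species):
    """Fraction of species whose predicted label matches the recorded one."""
    hits = sum(
        (s.lifespan_years <= cutoff) == s.carries_zoonoses for s in species
    )
    return hits / len(species)


# Invented training data: the model only ever "sees" these species.
train = [
    Species("rodent A", 1.0, True),
    Species("rodent B", 1.5, True),
    Species("rodent C", 2.0, True),
    Species("rodent D", 4.0, False),
    Species("rodent E", 6.0, False),
    Species("rodent F", 8.0, False),
]

# Invented held-out data: used only to check how well the rule generalizes.
held_out = [
    Species("rodent G", 1.2, True),
    Species("rodent H", 3.0, False),
    Species("rodent I", 5.0, False),
    Species("rodent J", 7.0, True),
]

cutoff = fit_threshold(train)
held_out_accuracy = accuracy(cutoff, held_out)
print(f"cutoff = {cutoff} years, held-out accuracy = {held_out_accuracy:.0%}")
```

The point of scoring the rule on species it never saw during fitting is that accuracy measured on the training set alone would overstate how well the model generalizes.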

Dr. Han then put together a list of high-priority species that have yet to be examined for spillover pathogens. She and her colleagues predicted that the montane vole and the northern grasshopper mouse of western North America would carry worrisome pathogens.

Of all the traits Dr. Han and her colleagues provided to the computer, the one that mattered most was the life span of the rodents. Species that die young carry more pathogens, perhaps because evolution put more of their resources into reproducing than into building a strong immune system.

To get these results, Dr. Han and her colleagues combed through databases and scientific studies looking for useful data. More recently, researchers have sped this work up by building databases designed to teach computers about viruses and their hosts.

The northern grasshopper mouse, one of the species Dr. Han’s team predicted would carry a worrisome pathogen. Credit: Rick & Nora Bowers/Alamy

In March, for example, Dr. Carlson and his colleagues unveiled an open-access database called VIRION, which has amassed half a million pieces of information about 9,521 viruses and their 3,692 animal hosts.

Databases like VIRION now make it possible to ask more focused questions about new pandemics. When the Covid pandemic struck, it soon became clear that it was caused by a new virus, SARS-CoV-2. Dr. Carlson, Dr. Han and their colleagues created programs to identify the animals most likely to harbor relatives of the new coronavirus.

The new coronavirus belongs to a group of species called betacoronaviruses, which also includes the viruses that caused the SARS and MERS epidemics in humans. For the most part, betacoronaviruses infect bats. When the new coronavirus was discovered, some bat species were already known to carry them.

But scientists have not systematically searched every bat species for betacoronaviruses, and such a project would take many years to complete.

Dr. Carlson, Dr. Han and their colleagues created a model that could offer predictions about the bats most likely to harbor betacoronaviruses. It identified over 300 species that fit the bill.

The researchers used the computer models they had created to predict which bat species were likely to carry the viruses.

Daniel Becker, a disease ecologist at the University of Oklahoma who also worked on the betacoronavirus study, said it was striking how simple features such as body size could lead to powerful predictions about viruses.

Dr. Becker is now following up on the list of potential betacoronavirus hosts. Some bats in Oklahoma are predicted to harbor the viruses.

If Dr. Becker does find a betacoronavirus in his own backyard, he won't be able to say right away that it is an imminent threat to humans. Scientists would first have to carry out a series of experiments to judge the risk.

The models are still very much a work in progress, according to an epidemiologist at the University of California, Davis. When tested on well-studied viruses, they do better than random chance.

The models are not yet at a stage where their results can simply be turned into an alert, he said.

Nardus Mollentze and his colleagues have pioneered a method that could increase the accuracy of the models. Rather than looking at a virus's hosts, their models look at its genes. A computer can be taught to recognize the features in a virus's genes that distinguish the viruses able to infect humans.

In their first report on this technique, Dr. Mollentze and his colleagues developed a model that could correctly recognize human-infecting viruses more than 70 percent of the time. Dr. Mollentze can't yet say why his model worked, but he has some ideas. Our cells can recognize foreign genes and alert the immune system. Viruses that can infect our cells may be able to mimic our own genes, which could help them escape notice.
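One way to picture what "looking at the genes" might mean is to turn a genome into numbers a computer can compare. The Python sketch below is a hedged illustration, not Dr. Mollentze's method: the article does not say which genomic features his model uses, so the choice of dinucleotide frequencies is an assumption, and the three sequences are invented. A real model would feed features like these, computed from real genomes, into a trained classifier rather than a simple distance.

```python
from collections import Counter
from itertools import product


def dinucleotide_frequencies(sequence):
    """Return the relative frequency of each of the 16 DNA dinucleotides."""
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {a + b: counts[a + b] / total for a, b in product("ACGT", repeat=2)}


def profile_distance(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(p[key] - q[key]) for key in p)


# All three sequences are made up purely for illustration.
host_like = "ATATATATAT"      # stand-in for a host gene's composition
candidate_a = "TATATATATA"    # invented virus whose composition resembles the host
candidate_b = "GCGCGCGCGC"    # invented virus whose composition does not

host_profile = dinucleotide_frequencies(host_like)
distance_a = profile_distance(host_profile, dinucleotide_frequencies(candidate_a))
distance_b = profile_distance(host_profile, dinucleotide_frequencies(candidate_b))

# A smaller distance means the candidate's composition is closer to the host's.
print(f"candidate A: {distance_a:.3f}, candidate B: {distance_b:.3f}")
```

In this toy setup, the candidate with the smaller distance is the one whose genes look more like the host's, which loosely mirrors the idea that human-infecting viruses may mimic our own genes.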

The researchers used the model to come up with a list of high-risk animal viruses. But the list is too long for virologists to study in any depth.

Emmie de Wit, a researcher at the Rocky Mountain Laboratories in Hamilton, Mont., said that researchers can only work on so many viruses.

Dr. Mollentze acknowledged that he and his colleagues need a way to pinpoint the worst of the worst among animal viruses.

To follow up on his initial study, Dr. Mollentze is working with Dr. Carlson and his colleagues to combine data about the genes of viruses with data related to the biology and ecology of their hosts. The researchers are getting promising results from this approach.

Other kinds of data may make the predictions even better. One of the most important features of a virus, for example, is the coating of sugar molecules on its surface. Different viruses have different patterns of sugar molecules, and that arrangement can have a huge impact on their success. Some viruses can use the coating to hide from their host's immune system. In other cases, a virus can use its sugar molecules to latch on to new cells, triggering a new infection.

Dr. Carlson and his colleagues posted a commentary online arguing that machine learning may gain many insights from the sugar coating of viruses. Scientists have already gathered a lot of that knowledge, but it has yet to be put into a form that computers can learn from.

"My gut sense is that we know a lot more than we think," Dr. Carlson said.

Dr. de Wit said that machine learning models could help her study animal viruses.

She noted that the models so far have focused on a pathogen's potential for infecting human cells. Before it can cause a new human disease, a virus also has to spread from one person to another and cause serious symptoms. She is waiting for a new generation of machine learning models that can make those predictions as well.

What researchers really want to know, she said, is not just which viruses can infect humans but which ones can cause an outbreak.