Where once were black boxes, NIST's new LANTERN illuminates
How do you figure out how to alter a gene so that it makes a usefully different protein? The job might be imagined as interacting with a complex machine (at left) that sports a vast control panel filled with thousands of unlabeled switches, which all affect the device's output somehow. A new tool called LANTERN figures out which sets of switches—rungs on the gene's DNA ladder—have the largest effect on a given attribute of the protein. It also summarizes how the user can tweak that attribute to achieve a desired effect, essentially transmuting the many switches on our machine's panel into another machine (at right) with just a few simple dials. Credit: B. Hayes / NIST

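Researchers at the National Institute of Standards and Technology (NIST) have developed a new statistical tool for predicting how genetic changes alter protein function. Beyond helping with the difficult job of altering proteins in useful ways, it works by methods that are fully interpretable, which is an advantage over the conventional artificial intelligence (AI) that has aided protein engineering in the past.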

The new tool, called LANTERN, could prove useful in many different kinds of work. Proteins, the building blocks of biology, are a key element in all of these tasks. While it is relatively easy to make changes to the strand of DNA that serves as the blueprint for a given protein, it remains difficult to determine which specific base pairs are the keys to producing a desired effect. Finding these keys has so far been the job of deep neural networks (DNNs), which are notoriously opaque to human understanding.

In a new paper published in the Proceedings of the National Academy of Sciences, the team describes LANTERN's ability to predict the genetic edits needed to produce useful changes in three different proteins. One is the spike protein of the SARS-CoV-2 virus that causes COVID-19; understanding how changes in the DNA can alter this protein could help predict the future of the pandemic. The other two, LacI and GFP, are lab workhorses. Working with these three proteins allowed the NIST team to show not only that their tool works, but also that its results are interpretable, which is an important characteristic for industry.

"We have an approach that is fully interpretable and that also has no loss in predictive power," said Peter Tonner, a statistician and computational biologist at NIST. The usual assumption, he added, is that if you want one of those things, you cannot have the other; the team's results show that you can have both.

The problem the NIST team is tackling can be imagined as interacting with a complex machine that sports a vast control panel filled with thousands of unlabeled switches. The switches all affect the device's output somehow. If your job is to make the machine work differently in a specific way, which switches should you flip?

Because the answer might require changing several base pairs at once, scientists have to flip some combination of them, measure the result, then choose a new combination and measure again. The number of permutations is daunting.

The number of potential combinations can exceed the number of atoms in the universe, so measuring every possibility is out of the question.
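To get a feel for that scale, the short sketch below counts how many distinct DNA sequences exist for a stretch of a given length and finds the shortest stretch whose count passes a rough atom count. The figure of 10^80 atoms is a commonly quoted order-of-magnitude estimate used here as an assumption, and the function names are illustrative only.

```python
BASES = 4                     # A, C, G or T at each position
ATOMS_IN_UNIVERSE = 10 ** 80  # assumed rough estimate, not a measured value


def sequence_count(length: int) -> int:
    """Number of distinct DNA sequences of the given length."""
    return BASES ** length


def shortest_length_exceeding(limit: int) -> int:
    """Smallest sequence length whose variant count is greater than limit."""
    length = 1
    while sequence_count(length) <= limit:
        length += 1
    return length


min_length = shortest_length_exceeding(ATOMS_IN_UNIVERSE)
print(min_length)
```

Under these assumptions, a stretch of only 133 base pairs already has more possible sequences than the assumed atom count, which is why exhaustive measurement is not an option.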

Because of the sheer quantity of data involved, DNNs have been tasked with sorting through a sampling of the data and predicting which base pairs need to be flipped. At this they have proved successful, as long as you don't ask for an explanation of how they get their answers. They are often described as black boxes because their inner workings are inscrutable.

That opacity is a big problem if you want to engineer something new, said NIST physicist David Ross, one of the paper's co-authors.

LANTERN, by contrast, is explicitly designed to be understandable. Part of its explainability comes from its use of interpretable parameters to represent the data it analyzes. Rather than allowing the number of parameters to grow extraordinarily large and often inscrutable, each parameter in LANTERN has a purpose that is meant to be intuitive, helping users understand what the parameters mean.

The LANTERN model represents protein mutations using vectors, mathematical tools that are often depicted as arrows. Each arrow has two properties: its direction indicates the effect of the mutation, and its length shows how strong that effect is. When two proteins have vectors that point in the same direction, LANTERN indicates that the proteins have similar function.

The directions of these vectors often map onto biological mechanisms. For example, LANTERN learned a direction associated with protein folding in all three of the datasets the team studied. Folding plays a critical role in how a protein functions, so identifying this factor across datasets was an indication that the model functions as intended. When making predictions, LANTERN adds these vectors together, a method that users can trace when examining its predictions.
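The arrow picture can be sketched in a few lines of code. This is a minimal illustration, not LANTERN's implementation: the mutation names and numbers are made up, and it assumes that the combined effect of several mutations is the component-wise sum of their individual arrows. The learned relationship between that combined position and the measured protein property is left out.

```python
import math

# Hypothetical 2-D effect vectors for three made-up mutations.
MUTATION_VECTORS = {
    "A10G": (0.90, 0.10),
    "T25C": (0.45, 0.05),   # same direction as A10G, half the length
    "G40A": (-0.10, 0.60),  # a nearly perpendicular direction
}


def combine(mutations):
    """Add the effect vectors of the named mutations component by component."""
    total = [0.0, 0.0]
    for name in mutations:
        for i, component in enumerate(MUTATION_VECTORS[name]):
            total[i] += component
    return tuple(total)


def strength(vector):
    """Length of the arrow: how strong the effect is."""
    return math.hypot(vector[0], vector[1])


def cosine_similarity(u, v):
    """1.0 for arrows pointing the same way, near 0.0 for unrelated directions."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (strength(u) * strength(v))


variant = combine(["A10G", "G40A"])
print(variant, strength(variant))
print(cosine_similarity(MUTATION_VECTORS["A10G"], MUTATION_VECTORS["T25C"]))
```

Because every prediction starts from a plain sum, a user can read off which mutation contributed which part of the combined arrow.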

Other labs had already used DNNs to make predictions about the three proteins, so the NIST team decided to pit LANTERN against the DNNs' results. According to the team, the new approach achieves a new state of the art in predictive accuracy for this type of problem.

"LANTERN was equal to or better than almost all other approaches with respect to prediction accuracy," Tonner said. According to Tonner, it predicts changes to LacI more accurately than any other approach and has comparable accuracy for GFP. For the SARS-CoV-2 spike protein, it has higher accuracy than all alternatives but one, which matched LANTERN's accuracy but did not beat it.

LANTERN figures out which sets of switches have the largest effect on a given attribute of the protein and summarizes how the user can tweak that attribute to achieve a desired effect. In a way, LANTERN transmutes the many switches on our machine's panel into a few simple dials.

Ross said that it reduces thousands of switches to a few small dials. The first dial will have a big effect, the second will have a different but smaller effect, and so on. As an engineer, he said, that tells him he can focus on the first and second dials. LANTERN lays all of this out for him, which he called incredibly helpful.
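One simple way to picture the ordering of the dials is to rank each dimension by how much the mutation effects vary along it. The sketch below does that with made-up numbers; ranking by variance is an assumption chosen for illustration and is not necessarily how LANTERN itself orders its dimensions.

```python
from statistics import pvariance

# Hypothetical effect vectors: one row per mutation, one column per "dial".
EFFECTS = [
    (0.30, 1.20, 0.01),
    (-0.20, -0.80, 0.02),
    (0.45, 0.50, -0.01),
    (0.10, -1.10, 0.00),
]


def rank_dials(effects):
    """Return (dial_index, variance) pairs, largest variance first."""
    columns = zip(*effects)  # regroup the numbers by dial instead of by mutation
    variances = [pvariance(column) for column in columns]
    return sorted(enumerate(variances), key=lambda pair: pair[1], reverse=True)


ranking = rank_dials(EFFECTS)
for dial, variance in ranking:
    print(f"dial {dial}: variance {variance:.4f}")
```

With these illustrative numbers, dial 1 dominates, dial 0 matters less, and dial 2 barely moves, so an engineer could concentrate on the first two in the ranking.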

Rajmonda Caceres, a scientist at MIT's Lincoln Laboratory who is familiar with the method behind LANTERN, said that few AI methods used in biology applications are explicitly designed for interpretability. When biologists see the results, she said, they can see what is going on. This level of interpretation allows for more interdisciplinary research, because biologists can understand how the algorithm is learning.

Tonner said that while he is pleased with the results, LANTERN is not a panacea for AI's explainability problem. Exploring alternatives to DNNs more widely would benefit the broader effort to make AI explainable, he said.

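According to Tonner, LANTERN is the first example of something that rivals DNNs in predictive power while remaining fully interpretable for this kind of problem. It offers a specific solution to a specific problem, he said, and the team hopes it will inspire the development of new interpretable approaches. The goal, he added, is that predictive AI should not remain a black box.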

More information: Peter D. Tonner et al., Interpretable modeling of genotype–phenotype landscapes with state-of-the-art predictive power, Proceedings of the National Academy of Sciences (2022). DOI: 10.1073/pnas.2114021119