The "gaps" within the sequence alignments used in evolutionary biology can provide important information about how sequences change over time, according to a new study. The finding may be of particular interest to those studying distantly related species. The work appears in the Proceedings of the National Academy of Sciences.

Biologists study how gene sequences change over time. A sequence's length changes when specific nucleotides are inserted or deleted at particular positions.

Jeff Thorne is a professor of biological sciences and statistics at NC State and a co-corresponding author of the research. If a DNA sequence is thought of as text, a substitution is like changing a letter in a word, while insertions and deletions are like adding or leaving out letters or words.

When studying evolutionary changes in DNA, the first step is to build a sequence alignment. To work out how the sequences correspond to one another, they are arranged into columns. Within a column, there can be different nucleotide types because of substitutions, or gaps because of insertions or deletions. A gap is placed in an alignment column for any sequence that has no corresponding nucleotide at that position.
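To make the idea of alignment columns concrete, here is a minimal Python sketch. The three sequences and their names are hypothetical toy data, not taken from the study, and the function `summarize_columns` is an illustrative helper rather than part of any alignment tool. It assumes the sequences have already been aligned, with "-" marking a gap.

```python
# Toy alignment: three hypothetical sequences, already padded to equal length.
# "-" marks a gap, i.e. a column where that sequence has no corresponding nucleotide.
alignment = {
    "species_a": "ACGTTGCA",
    "species_b": "ACG-TGCA",
    "species_c": "ACCTTG-A",
}


def summarize_columns(aligned):
    """Return one (column, has_gap, has_substitution) tuple per alignment column."""
    sequences = list(aligned.values())
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    summary = []
    # zip(*sequences) walks the alignment column by column.
    for column in zip(*sequences):
        residues = {base for base in column if base != "-"}
        has_gap = "-" in column
        # More than one distinct nucleotide in a column implies a substitution.
        has_substitution = len(residues) > 1
        summary.append(("".join(column), has_gap, has_substitution))
    return summary


for index, (column, has_gap, has_substitution) in enumerate(summarize_columns(alignment)):
    labels = [name for name, flag in (("gap", has_gap), ("substitution", has_substitution)) if flag]
    print(index, column, ", ".join(labels))
```

In this toy example, column 2 holds a substitution (G, G, C), while columns 3 and 6 each hold a gap.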

"Conventionally, when using sequence alignments to do analyses, the gaps within alignment columns are treated as missing data that don't give any information about the replacements," Thorne says. The research community has assumed that gap locations are independent of where replacements occur. What if that assumption isn't correct?

Thorne and his colleagues devised a simple statistical test of that assumption. In roughly two-thirds of the 1,390 data sets they tested, the assumption that gap locations are independent of substitutions or replacements was rejected.
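The article does not describe the test itself, so the following Python sketch is not the authors' method. It is a generic one-sided permutation test, shown only to illustrate what testing for independence between gap locations and substitutions could look like. The per-column flags are invented toy data, and the function names `cooccurrence` and `permutation_p_value` are hypothetical.

```python
import random


def cooccurrence(has_gap, has_substitution):
    """Count columns that contain both a gap and a substitution."""
    return sum(g and s for g, s in zip(has_gap, has_substitution))


def permutation_p_value(has_gap, has_substitution, n_permutations=2000, seed=0):
    """One-sided permutation test for excess gap/substitution co-occurrence.

    Null hypothesis: gap locations are independent of substitution locations,
    so shuffling the gap flags across columns should not matter.
    """
    rng = random.Random(seed)
    observed = cooccurrence(has_gap, has_substitution)
    shuffled = list(has_gap)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if cooccurrence(shuffled, has_substitution) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)


# Invented toy data for 40 alignment columns: 10 columns have gaps, and all of
# them also show a substitution; 2 further columns show a substitution only.
has_gap = [True] * 10 + [False] * 30
has_substitution = [True] * 12 + [False] * 28

observed, p_value = permutation_p_value(has_gap, has_substitution)
print(f"columns with both a gap and a substitution: {observed}")
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value here would mean the toy data are hard to explain if gaps and substitutions were placed independently. Real analyses would need to account for phylogeny and alignment uncertainty, which this sketch ignores.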

The results suggest that gap locations carry useful information, and that evolutionary biologists should develop better ways to extract it.

The research showed how it can be difficult to base evolutionary conclusions on a single optimal alignment. If the alignment is not right, what should we do? What if the alignment is not neutral?

When building a sequence alignment, researchers tend to favor replacements over gaps, because an alignment can otherwise contain too many gaps. The resulting small errors in alignments between closely related species will most likely not affect outcomes, but they may matter over longer evolutionary timescales, and particularly in comparisons between diverse species.
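The trade-off between replacements and gaps can be illustrated with a simple column-by-column scoring scheme. The Python sketch below is an assumption for illustration: the sequences, the function `alignment_score`, and the match, mismatch, and gap values are all hypothetical and do not come from the study. It scores two candidate alignments of the same pair of sequences, one explaining the difference with a replacement and one with two gaps.

```python
def alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Score a pairwise alignment column by column (linear gap penalty)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


# Two ways to align the hypothetical sequences ACGTA and ACTTA.
with_replacement = ("ACGTA", "ACTTA")   # one replacement (G vs T), no gaps
with_gaps = ("ACG-TA", "AC-TTA")        # one deletion plus one insertion

# With an expensive gap penalty, the replacement alignment scores higher.
print(alignment_score(*with_replacement), alignment_score(*with_gaps))

# With a cheap gap penalty, the gapped alignment scores higher instead.
print(alignment_score(*with_replacement, gap=-0.25),
      alignment_score(*with_gaps, gap=-0.25))
```

Both alignments are consistent with the same two sequences, so which one is reported as "optimal" depends on the chosen penalties rather than on the data alone.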

A principal research scientist at the Korea Polar Research Institute says that researchers' best guesses are sometimes biased. Hopefully this study will help make researchers aware of possible pitfalls. There are problems with conventional statistical methods that need to be fixed.

Ben Redelings is a researcher at Duke University and the University of Kansas.

More information: "Correlations between alignment gaps and nucleotide substitution or amino acid replacement," Proceedings of the National Academy of Sciences (2022). DOI: 10.1073/pnas.2204435119
Journal information: Proceedings of the National Academy of Sciences