Machine learning offers high-definition glimpse of how genomes organize in single cells

Within the microscopic boundaries of a single human cell, the intricate folds and arrangements of protein and DNA bundles dictate a person's fate: which genes are expressed, which are suppressed, and, importantly, whether that person stays healthy or develops disease.

Despite the potential health effects these bundles could have on humans, science still does not fully understand how genome folding occurs in cells or how it affects gene expression. A new algorithm developed by Carnegie Mellon University's Computational Biology Department illustrates this process with unprecedented clarity.

The algorithm, called Higashi, uses hypergraph representation learning, a form of machine learning that can also recommend music in an app and perform 3D object detection.

Ruochi Zhang, a School of Computer Science student, led the project with Tianming Zhou, a Ph.D. candidate, and Jian Ma, the Ray and Stephanie Lane Professor of Computational Biology. Zhang named Higashi after a traditional Japanese sweet, continuing a tradition he began with other algorithms.

"He approaches research with passion, but also with a sense of humor sometimes," Ma said.

Their research was published in Nature Biotechnology. It was conducted as part of a multi-institution research center that seeks to better understand the three-dimensional structure of the genome and how changes in that structure affect cell functions in health and disease. The $10 million center is led by CMU and funded by the National Institutes of Health.

The algorithm is the first to use sophisticated neural networks to analyze genome organization in single cells. Where an ordinary graph joins exactly two vertices with each edge, a hypergraph can join multiple vertices with a single edge, known as a hyperedge.
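The difference between the two structures can be shown with a small sketch. It is only an illustration of the data structure, not of Higashi itself; the vertex labels and the `incidence` helper are hypothetical names chosen for this example.

```python
from collections import defaultdict

# Ordinary graph: every edge joins exactly two vertices.
graph_edges = [("A", "B"), ("B", "C")]

# Hypergraph: every hyperedge is a set that may hold any number of vertices.
hyperedges = [frozenset({"A", "B", "C"}), frozenset({"C", "D"})]


def incidence(edges):
    """Map each vertex to the indices of the edges that contain it."""
    table = defaultdict(list)
    for idx, edge in enumerate(edges):
        for vertex in sorted(edge):
            table[vertex].append(idx)
    return dict(table)


graph_incidence = incidence(graph_edges)
hyper_incidence = incidence(hyperedges)

# Number of vertices joined by each hyperedge (always 2 in an ordinary graph).
edge_sizes = [len(edge) for edge in hyperedges]
```

Here the first hyperedge ties three vertices together at once, a relationship an ordinary graph could only approximate with three separate pairwise edges.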

Chromosomes are made up of a DNA-RNA-protein complex called chromatin, which folds and arranges itself to fit inside the cell nucleus. This folding can influence how genes are expressed by bringing functional elements of the different ingredients closer together, allowing them to activate or suppress particular genetic traits.

Higashi works with a new technology called single-cell Hi-C, which creates snapshots of the chromatin interactions occurring simultaneously in one cell. Higashi offers a more detailed analysis of how chromatin is organized in the single cells of complex tissues and biological processes, and of how its interactions change from one cell to another. This analysis allows scientists to see subtle variations in the folding and organization of chromatin from cell to cell, variations that can be important for identifying potential health implications.
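To make the idea of cell-to-cell variation concrete, the sketch below builds a contact map for each cell and measures how much each contact varies across cells. It is a toy illustration and not the Higashi method, which relies on neural networks. The contact records are invented, and describing each contact as a (cell, genomic bin, genomic bin) triple is an assumption made here for illustration; the function names are hypothetical.

```python
from collections import Counter
from statistics import pvariance

# Hypothetical toy records: (cell, bin_i, bin_j) means two genomic regions
# ("bins") were observed in contact inside that cell.
contacts = [
    ("cell1", 0, 1), ("cell1", 0, 1), ("cell1", 1, 2),
    ("cell2", 0, 1), ("cell2", 0, 2),
    ("cell3", 1, 2), ("cell3", 1, 2), ("cell3", 0, 2),
]


def contact_maps(records):
    """Count contacts per unordered bin pair, separately for each cell."""
    maps = {}
    for cell, i, j in records:
        pair = (min(i, j), max(i, j))
        maps.setdefault(cell, Counter())[pair] += 1
    return maps


def pair_variability(maps):
    """Population variance of each bin pair's contact count across cells."""
    pairs = sorted({pair for counts in maps.values() for pair in counts})
    # A Counter returns 0 for pairs that were never seen in a given cell.
    return {p: pvariance([counts[p] for counts in maps.values()]) for p in pairs}


maps = contact_maps(contacts)
variability = pair_variability(maps)
```

Bin pairs with high variance are the ones whose contacts differ most between cells. Real single-cell Hi-C data is far sparser and noisier than this toy input, which is part of why more sophisticated models are needed.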

"The variability in genome organization has strong implications for gene expression and cellular status," Ma said.

Scientists can also use Higashi to simultaneously analyze other genomic signals that were profiled together with single-cell Hi-C. This feature will eventually allow Higashi to expand its capabilities, which is timely given the growth in single-cell data that Ma anticipates in the coming years from projects such as the NIH 4D Nucleome Program, to which his center belongs. The increased flow of data will give scientists the opportunity to develop more algorithms that improve their understanding of the structure of the human genome and how it functions in health and disease.

"This is an area that is rapidly changing," Ma said. "The computational development is also rapidly moving, as is experimental technology."


Jian Ma, Multiscale and integrative single-cell Hi-C analysis with Higashi, Nature Biotechnology (2021). DOI: 10.1038/s41587-021-01034-y