Machine learning uncovers 'genes of importance' in agriculture and medicine

Corn (maize), growing in the NYU Rose Sohn Zegar Greenhouse, on the NYU Center for Genomics & Systems Biology's roof. Credit: NYU Coruzzi Lab
According to a study published in Nature Communications, machine learning can pinpoint "genes of importance" that help crops grow with less fertilizer. It can also predict additional traits in plants and disease outcomes in animals, illustrating its potential applications beyond agriculture.

Systems biology faces both challenges and opportunities in using genomic data to predict outcomes in agriculture and medicine. Researchers are working out how best to use the large amount of available genomic data to predict how organisms will respond to changes in nutrition, pathogen exposure, and other factors. Such predictions would help inform crop improvement, disease prognosis, epidemiology, and public health. However, accurately predicting such complex outcomes from genome-scale data remains difficult.

NYU researchers and their collaborators in the U.S. and Taiwan tackled the challenge with machine learning, a type of artificial intelligence that detects patterns in data.

"We show that focusing our attention on genes whose expression patterns have been evolutionarily conserved between species increases our ability to predict 'genes of significance' for growth performance for staple crops as well as disease outcomes for animals," Gloria Coruzzi, Carroll & Milton Petrie Professor at NYU's Department of Biology and Center for Genomics and Systems Biology, and the paper's senior writer.

Chia-Yi Cheng of NYU's Center for Genomics and Systems Biology and National Taiwan University, a lead author of the study, said, "Our approach exploits natural variation in genome-wide expression and associated phenotypes within and across species. We demonstrate that reducing our genomic input to genes whose expression pattern is conserved within and between species is a biologically sound way to reduce the dimensionality of genomic data. This significantly improves our machine learning models' ability to identify genes important for a trait."
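
The idea of narrowing the input to conserved genes before modeling can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the gene names, the numbers, the threshold, and the function names are invented for illustration, and a simple correlation ranking stands in for the machine learning models, which this article does not describe in detail.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def conserved_genes(response_a, response_b, threshold=1.0):
    """Keep genes that respond in the same direction, and at least as strongly
    as `threshold`, in both species (assumed criterion, for illustration)."""
    kept = []
    for gene, a in response_a.items():
        b = response_b.get(gene)
        if b is None:
            continue  # no matching gene in the second species
        if abs(a) >= threshold and abs(b) >= threshold and (a > 0) == (b > 0):
            kept.append(gene)
    return sorted(kept)


def rank_genes(expression, trait, genes):
    """Rank the selected genes by absolute correlation with the trait."""
    scores = {g: abs(pearson(expression[g], trait)) for g in genes}
    return sorted(genes, key=lambda g: scores[g], reverse=True)


# Hypothetical responses to a nitrogen treatment in two species.
response_species_a = {"g1": 2.0, "g2": -1.5, "g3": 0.2, "g4": 1.8}
response_species_b = {"g1": 1.6, "g2": 1.2, "g3": 0.1, "g4": 2.2}

# Hypothetical expression across five samples, plus a measured trait.
expression = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [3, 3, 2, 4, 1],
    "g3": [2, 2, 2, 2, 2],
    "g4": [5, 1, 4, 2, 3],
}
trait = [2.1, 3.9, 6.2, 8.0, 9.9]

selected = conserved_genes(response_species_a, response_species_b)
ranking = rank_genes(expression, trait, selected)
print(selected, ranking)
```

In this toy example the filter shrinks four candidate genes to two before any ranking takes place, which is the dimensionality reduction Cheng describes, only at a much smaller scale.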

The researchers showed that genes whose response to nitrogen is evolutionarily conserved between two different plant species, Arabidopsis (a small flowering plant commonly used as a model organism in plant biology) and corn, improved the ability of machine learning models to identify genes important for how plants use nitrogen. Nitrogen, the main ingredient of fertilizer, is essential for plants. Crops that use it more efficiently grow better and need less fertilizer, which has both economic and environmental benefits.

The researchers identified eight master transcription factors as important for nitrogen use efficiency. They found that altering gene expression in Arabidopsis and corn could increase plant growth in low-nitrogen soils. The experiments were conducted both in the lab at NYU and in cornfields at the University of Illinois.

"By predicting which corn hybrids will use nitrogen fertilizer more efficiently, we can quickly improve this trait. Increasing the efficiency of nitrogen use in corn and other crops has three key benefits: lowering farmer costs, reducing pollution, and reducing greenhouse gas emissions," said Stephen Moose, Alexander Professor of Crop Sciences at the University of Illinois Urbana-Champaign.

The researchers also showed that evolutionarily informed machine learning can be applied to other traits and species. The approach predicted additional traits in plants, such as biomass and yield, in both Arabidopsis and corn. It can also predict genes that are important for drought resistance in rice, another staple crop, as well as disease outcomes in animals, as shown by studying mouse models.

Coruzzi stated, "Because our evolutionarily-informed pipeline can also work in animals, this underlines the potential for uncovering genes of significance for any physiological and clinical traits of concern across biology, medicine, and agriculture."

"Many of the key traits that are important for agronomic and clinical purposes are genetically complex. It's therefore difficult to determine their control and inheritance. "Our success shows that big data and systems-level thinking can make these notoriously challenging challenges tractable," said Ying Li, a professor at Purdue University's Department of Horticulture and Landscape Architecture.

More information: Evolutionarily informed machine learning enhances the power of predictive gene-to-phenotype relationships, Nature Communications (2021). DOI: 10.1038/s41467-021-25893-w