How Big Data Carried Graph Theory Into New Dimensions

Graph theory is not enough.
Since at least the 18th Century, mathematical language has been used to talk about connections. It usually relies on networksvertices (dots), and edges (lines connecting them). This is a valuable way to model real-world phenomena. However, researchers were forced to increase their toolboxes by the advent of large data sets a few decades back. This gave them ample space to explore new mathematical insights. Josh Grochow, a computer science researcher at the University of Colorado Boulder, stated that there has been a period of exciting growth since then. Researchers have created new network models that can detect complex structures and signals within the big data noise.

Original story reprinted by permission of Quanta Magazine. This independent publication published by the Simons Foundation has an editorially neutral content policy. It aims to increase public understanding of science through the coverage of research developments and trends in mathematics, the physical and the life sciences.

Grochow is one of a growing number of researchers who are pointing out the limitations of graph theory when it comes to connecting big data. Every relationship is represented as a graph, or pairwise interaction. Many complex systems cannot be represented using binary connections. The field has made significant progress.

Try to build a network model for parenting. Each parent is connected to a child in some way, but the parenting relationship doesn't consist of just the summation of these two links as graph theory may suggest. Peer pressure is a similar phenomenon.

There are many intuitive models. Leonie Neuhuser, a German University researcher at RWTH Aachen, stated that the peer pressure effect on social dynamics can only be captured if there are already groups in your data. Binary networks don't capture group influences.

Computer scientists and mathematicians use the term higher order interactions to describe the complex ways group dynamics can affect individual behavior. This is in contrast to binary links. These mathematical phenomena can be found in everything, from quantum mechanics' entanglement interactions to the spread of a disease through a population. For example, graph theory could be used by pharmacologists to model drug interactions. But what about three? Or maybe four?

Although these tools are not new for exploring interactions, high-dimensional data sets in the last few years have been a powerful tool for discovery. This has given mathematicians as well as network theorists new insights. These efforts have led to interesting findings about the limits of graphs as well as the potential for scaling up.

Grochow stated that we now know that the network is only a shadow of the whole thing. Modeling a data set as a graph can reveal only a small portion of its true story if it has a complex structure.

Emilie Purvine from the Pacific Northwest National Laboratory is enthusiastic about hypergraphs' ability to find subtler connections between data points. Photograph by Andrea Starr/Pacific Northwest National Laboratory

Emilie Purvine, a mathematician at the Pacific Northwest National Laboratory, stated that we have realized that the data structures used to study the data aren't exactly fitting the data.

Mathematicians, computer scientists, as well as other researchers, are increasingly looking for ways to generalize graph theory in its many forms. This allows them to explore higher-order phenomena. These interactions have been described in a variety of ways over the years, with many mathematical verifications possible using high-dimensional data sets.

Purvine describes the mathematical exploration of higher order interactions as the mapping of new dimensions. She suggested that a graph is like a foundation for a plot of two-dimensional land. There are many three-dimensional buildings that could be built on top. They look identical from ground level but the structures you build on top are different.

Use the Hypergraph

This is where math becomes particularly complicated and interesting. A hypergraph is a higher-order analog of a graph. It has hyperedges instead of edges. This means that it can represent multi-way or multilinear relationships. A hyperedge can be viewed as a surface instead of a line. It could be compared to a tarp that is staked in three or four places.

This is great, but we still don't know a lot about the relationships between these structures and their conventional counterparts. Mathematicians are learning how graph theory applies to higher-order interactions. This opens up new avenues of exploration.

A hypergraph can reveal the relationships that a big data set can produce. An ordinary graph cannot. This is why the world of scientific publishing, a very simple example, can be used to illustrate this. Imagine two data sets containing papers coauthored up to three mathematicians. Let's call them A, BC, and C. The other data set contains six papers with two papers from each of the three distinct groups (AB, AC and BC). The second data set contains two papers, each one coauthored by all three mathematicians (ABC).