Graph theory is not enough.
Since at least the 18th century, a mathematical language has been used to talk about connections. It usually relies on networks of vertices (dots) and edges (lines connecting them), which offer a powerful way to model real-world phenomena. A few decades ago, the advent of large data sets forced researchers to expand their toolboxes and, at the same time, gave them ample room to explore new mathematical insights. Since then, said Josh Grochow, a computer scientist at the University of Colorado Boulder, there has been a period of exciting growth as researchers have created new network models that can detect complex structures and signals in the noise of big data.
Grochow is one of a growing number of researchers pointing out that graph theory has its limits when it comes to finding connections in big data. A graph represents every relationship as a pairwise interaction, but many complex systems cannot be represented by binary connections alone. The field has made significant progress in addressing this limitation.
Try to build a network model of parenting. Each parent is clearly connected to a child, but the parenting relationship is not just the sum of those two links, as graph theory might suggest. The same goes for a phenomenon like peer pressure.
There are many intuitive models, said Leonie Neuhäuser of RWTH Aachen University in Germany, but the effect of peer pressure on social dynamics is captured only if your data already contains groups. Binary networks don't capture group influences.
Computer scientists and mathematicians use the term higher-order interactions to describe the complex ways in which group dynamics, rather than binary links, can affect individual behavior. These mathematical phenomena appear in everything from entanglement interactions in quantum mechanics to the spread of a disease through a population. A pharmacologist who wanted to model drug interactions, for example, could use graph theory to show how two drugs interact. But what about three? Or four?
Tools for exploring these interactions are not new, but only in the last few years have high-dimensional data sets become an engine for discovery, giving mathematicians and network theorists new insights. These efforts have led to interesting findings about the limits of graphs and the potential for scaling up.
Grochow said that researchers now know the network is only a shadow of the whole thing. If a data set has a complex underlying structure, modeling it as a graph may reveal only a small portion of its true story.
Emilie Purvine, a mathematician at the Pacific Northwest National Laboratory, said that researchers have realized that the data structures they used to study things were not quite right for what they were seeing in the data.
Mathematicians, computer scientists and other researchers are therefore increasingly looking for ways to generalize graph theory to explore higher-order phenomena. The past few years have brought a multitude of proposed ways to describe these interactions and to verify them mathematically in high-dimensional data sets.
Purvine describes the mathematical exploration of higher-order interactions as the mapping of new dimensions. She suggested thinking of a graph as a foundation on a two-dimensional plot of land. Many different three-dimensional buildings could be built on top. From ground level they look identical, but the structures you build on top are different.
Use the Hypergraph
This is where the math becomes particularly complicated, and fascinating. The higher-order analog of a graph is called a hypergraph, and instead of edges it has hyperedges. A hyperedge can connect more than two nodes, which means it can represent multi-way (or multilinear) relationships. Instead of a line, a hyperedge can be pictured as a surface, like a tarp staked down in three or more places.
That is all well and good, but much remains unknown about how these structures relate to their conventional counterparts. Mathematicians are currently learning which rules of graph theory also apply to higher-order interactions, which opens up new avenues of exploration.
To illustrate the kind of relationship that a hypergraph can tease out of a big data set, and that an ordinary graph cannot, Purvine cites a simple example from the world of scientific publishing. Imagine two data sets, each containing papers co-authored by up to three mathematicians. For simplicity, call them A, B and C. The first data set contains six papers, two from each of the three distinct pairs (AB, AC and BC). The second data set contains only two papers, each co-authored by all three mathematicians (ABC).
A graph of co-authorship drawn from either data set might look like a triangle, showing that each mathematician (three nodes) had collaborated with the other two (three links). If your only question was who had collaborated with whom, Purvine said, you would not need a hypergraph.
But if you did have a hypergraph, you could also answer questions about less obvious structures. A hypergraph of the first set (with six papers), for example, could include hyperedges showing that each mathematician contributed to four papers. Comparing the hypergraphs of the two sets would show that the papers' authors differed in the first set but were the same in the second.
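Purvine's co-authorship example is small enough to work through by hand, but a short sketch makes the contrast concrete. The Python below is only an illustration under stated assumptions: the data layout (each paper as a set of author labels) and the helper names are hypothetical, chosen for this example rather than taken from any particular hypergraph library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each paper is a hyperedge, stored as the set of its authors.
first_set = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"A", "C"}, {"B", "C"}, {"B", "C"}]
second_set = [{"A", "B", "C"}, {"A", "B", "C"}]

def pairwise_graph(papers):
    """Project papers onto an ordinary graph: one edge per pair of co-authors."""
    return {frozenset(pair) for paper in papers for pair in combinations(sorted(paper), 2)}

def papers_per_author(papers):
    """Count how many hyperedges (papers) contain each author."""
    return Counter(author for paper in papers for author in paper)

def distinct_author_groups(papers):
    """Collect the distinct author groups that appear as hyperedges."""
    return {frozenset(paper) for paper in papers}

# Both data sets collapse to the same triangle once reduced to pairwise links.
same_graph = pairwise_graph(first_set) == pairwise_graph(second_set)

# The hypergraph view keeps what the triangle loses.
counts_first = papers_per_author(first_set)
counts_second = papers_per_author(second_set)
groups_first = distinct_author_groups(first_set)
groups_second = distinct_author_groups(second_set)
```

Here `same_graph` is true, while the per-author paper counts (four each in the first set, two each in the second) and the number of distinct author groups (three versus one) tell the two data sets apart.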
Hypergraphs in the Wild
These higher-order methods have already proved useful in applied research. Ecologists, for example, have shown how the reintroduction of wolves to Yellowstone National Park in the 1990s led to changes in biodiversity and in the structure of the food chain. Purvine and her colleagues recently analyzed a collection of biological responses to viral infections, using hypergraphs to identify the most important genes involved. They also found that these interactions would have been missed by the usual pairwise analysis of graph theory.
That, Purvine said, is the kind of power researchers are seeing with hypergraphs.
But making the transition from graphs to hypergraphs is not easy. One way to illustrate this is the canonical cut problem from graph theory, which asks: Given two distinct nodes on a graph, what is the minimum number of edges you must cut to sever all connections between them? Numerous algorithms can find the optimal number of cuts for any given graph.
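For an ordinary graph, the cut problem can be sketched in a few dozen lines. The version below is one standard approach, not a method named in the article: it treats each undirected edge as having capacity 1 and relies on the max-flow min-cut theorem, finding augmenting paths by breadth-first search (in the style of the Edmonds-Karp algorithm). The function name and the example graph are hypothetical.

```python
from collections import deque

def min_edge_cut(edges, source, target):
    """Minimum number of undirected edges whose removal separates source from target.

    By the max-flow min-cut theorem, this equals the maximum flow when
    every edge has capacity 1.
    """
    if source == target:
        raise ValueError("source and target must be distinct nodes")

    # Residual capacities: each undirected edge gives capacity 1 in both directions.
    capacity = {}
    neighbors = {}
    for u, v in edges:
        capacity[(u, v)] = capacity.get((u, v), 0) + 1
        capacity[(v, u)] = capacity.get((v, u), 0) + 1
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v in neighbors.get(u, ()):
                if v not in parent and capacity[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            return flow  # no path left: the flow value is the cut size

        # Push one unit of flow back along the path that was found.
        v = target
        while parent[v] is not None:
            u = parent[v]
            capacity[(u, v)] -= 1
            capacity[(v, u)] += 1
            v = u
        flow += 1

# Hypothetical example: two routes from "s" to "t", plus a cross link.
example_edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")]
cut_size = min_edge_cut(example_edges, "s", "t")
```

In this example two edges must be removed (for instance, both edges leaving "s"). The sketch has no obvious counterpart for hyperedges, because it assumes every edge joins exactly two nodes.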
But what about cutting a hypergraph? Austin Benson, a mathematician at Cornell University, said that there are many ways to generalize the notion of a cut to a hypergraph. There is no single clear solution, he said, because a hyperedge can be severed in various ways, creating new groups of nodes.
Benson and two colleagues recently attempted to formalize the various ways of splitting up a hypergraph. They found that in some cases the problem could be solved in polynomial time, which basically means that a computer can find a solution in a reasonable amount of time. In other cases, however, the problem was so hard that it was effectively impossible to determine whether a solution existed.
Benson said that many open questions remain. Some of these impossibility results are interesting, he said, because they cannot be reduced to graphs. On the theory side, if a problem does not reduce to something you could have found with a graph, that shows there is something new there.
The Mathematical Sandwich
The hypergraph is not the only way to study higher-order interactions. Topology, the mathematical study of geometric properties that don't change when objects are stretched, compressed or otherwise deformed, offers a more visual approach. When topologists study a network, they look for shapes and dimensions. They might note that the edge connecting two nodes is one-dimensional and ask about the properties of one-dimensional objects in different networks. Or they might see the two-dimensional triangular surface formed by connecting three nodes and ask similar questions.
Topologists call these structures simplicial complexes. They are, effectively, hypergraphs viewed through the framework of topology. Neural networks, which are part of machine learning, offer a good example. They are driven by algorithms designed to mimic the way our brains process information. Graph neural networks (GNNs), which model connections between things in pairs, are great at inferring data that is missing from large data sets. As in other applications, however, they can miss interactions that arise only from groups of three or more. In recent years, computer scientists have created simplicial neural networks, which use higher-order complexes to extend the GNN approach and find these effects.
Simplicial complexes link topology and graph theory and, much like hypergraphs, they raise mathematical questions that will drive future research. In topology, for example, special kinds of subsets of a simplicial complex are themselves simplicial complexes and therefore have the same properties. If the same held true for a hypergraph, its subsets would include all the hyperedges within them, including all the embedded two-way edges.
But that is not always the case. Purvine said that data often falls into a middle ground, where not every hyperedge, not every complex interaction, is the same size as every other. It is possible to have a three-way interaction without the corresponding pairwise interactions.
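The gap between a general hypergraph and a simplicial complex can be checked mechanically. The sketch below is a simplified illustration: it considers only sub-groups of two or more nodes (a full simplicial complex would also contain the single nodes), and the function name and example data are hypothetical.

```python
from itertools import combinations

def missing_faces(hyperedges):
    """Return the sub-groups of size >= 2 that a simplicial complex would
    require but that are absent from the observed hyperedges."""
    present = {frozenset(edge) for edge in hyperedges}
    missing = set()
    for edge in present:
        # Every smaller group inside a hyperedge must itself be present.
        for size in range(2, len(edge)):
            for sub in combinations(sorted(edge), size):
                if frozenset(sub) not in present:
                    missing.add(frozenset(sub))
    return missing

# A three-way interaction observed together with only one of its pairs.
observed = [{"A", "B", "C"}, {"A", "B"}]
gaps = missing_faces(observed)

# The same three-way interaction with every embedded pair present.
closed = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]
no_gaps = missing_faces(closed)
```

For `observed`, the pairs AC and BC are reported as missing, which is the middle-ground case Purvine describes. For `closed`, nothing is missing, so the data could be treated as a simplicial complex.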
Purvine describes such data as sitting in the middle of a mathematical sandwich, bounded on top by the ideas of topology and below by the limitations of graphs. The challenge for network theorists is to discover the new rules of higher-order interactions. For mathematicians, she said, there is plenty of room to play.
Random Walks and Matrices
This sense of creative play extends to other tools, too. Benson said that there are many interesting connections between graphs and other tools for describing data, but that these connections become harder to find as soon as you move up to the higher-order setting.
This is especially evident, he said, when you consider a higher-dimensional version of a Markov chain. A Markov chain describes a multistage process in which the next stage depends only on an element's current position. Researchers have used Markov models to describe how information, energy and even money flow through systems. The best-known example of a Markov chain is the random walk, which describes a path in which each step is determined randomly from the one before it. A random walk is also a specific kind of graph: any walk along a graph can be shown as a sequence of links moving from one node to the next.
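A random walk on a graph takes only a few lines to simulate. The sketch below assumes a small, hypothetical graph stored as an adjacency list and a walker that picks uniformly among the neighbors of its current node, the simplest choice and not the only possible one.

```python
import random

# Hypothetical undirected graph as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

def transition_probabilities(graph):
    """First-order Markov chain: from each node, move to a uniformly random neighbor."""
    return {node: {nbr: 1 / len(nbrs) for nbr in nbrs} for node, nbrs in graph.items()}

def random_walk(graph, start, steps, rng):
    """Return the nodes visited; each step depends only on the current node."""
    walk = [start]
    for _ in range(steps):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

# A seeded generator makes the walk reproducible.
walk = random_walk(graph, "A", 10, random.Random(0))
probabilities = transition_probabilities(graph)
```

Every consecutive pair of nodes in `walk` is joined by an edge of the graph, and each row of `probabilities` sums to 1, which is what makes the walk a Markov chain.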
But how do you scale up something as simple as a walk? Researchers turn to higher-order Markov chains, which can take into account several previous states rather than only the current position. This approach has proved useful for modeling web browsing behavior and traffic flows. Benson and his colleagues have other ideas for extending it. They recently presented a model for stochastic, or random, processes that combines higher-order Markov chains with another tool, called tensors. They tested it against a data set of taxi rides in New York City. The results were mixed: their model predicted the movement of taxis better than a standard Markov chain, but neither model proved very reliable.
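To see what "taking into account previous states" means in practice, consider a second-order chain, in which the next state depends on the current state and the one before it. The sketch below is a plain counting estimate on made-up sequences. It is not the tensor-based model that Benson and his colleagues proposed, and the function names and the data are hypothetical.

```python
from collections import Counter, defaultdict

def fit_second_order(sequences):
    """Count next-state frequencies conditioned on the two most recent states."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, cur, nxt in zip(seq, seq[1:], seq[2:]):
            counts[(prev, cur)][nxt] += 1
    return counts

def next_state_probabilities(counts, prev, cur):
    """Normalize the counts for one (previous, current) pair into probabilities."""
    total = sum(counts[(prev, cur)].values())
    return {state: n / total for state, n in counts[(prev, cur)].items()}

# Hypothetical trips between three zones; not real taxi data.
trips = [
    ["X", "Y", "Z", "Y", "X"],
    ["Z", "Y", "X", "Y", "Z"],
    ["X", "Y", "Z"],
]
model = fit_second_order(trips)

from_x_then_y = next_state_probabilities(model, "X", "Y")
from_z_then_y = next_state_probabilities(model, "Z", "Y")
```

In this toy data, a walker at Y always continues to Z if it arrived from X, and always continues to X if it arrived from Z. A first-order chain, which sees only "currently at Y", would have to blend those two cases.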
Tensors themselves are another tool for studying higher-order interactions, one that has come into its own in recent years. To understand tensors, first think about matrices, which organize data into rows and columns. Now imagine matrices made of matrices, or matrices that have not only rows and columns but also depth or other dimensions of data. These are tensors. If every matrix were a musical duet, then tensors would encompass all possible combinations of instruments.
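The step from a matrix to a tensor can be shown with nothing more than nested lists. This is a minimal sketch: the `shape` helper is hypothetical and assumes rectangular, non-empty nesting, and numerical libraries provide far richer array types.

```python
def shape(array):
    """Return the dimensions of a rectangular, non-empty nested list."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)

# A matrix needs two indices: row and column.
matrix = [[1, 2], [3, 4]]

# A three-way tensor is a stack of matrices, so it needs a third index for depth.
tensor = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]

matrix_shape = shape(matrix)
tensor_shape = shape(tensor)
entry = tensor[2][0][1]  # depth 2, row 0, column 1
```

The matrix has shape (2, 2) and the tensor (3, 2, 2). The extra index is what lets a tensor record a relationship among three things at once, rather than two.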
Tensors are nothing new to physicists, who have long used them to describe, for example, the possible quantum states of particles. But network theorists have adopted the tool to extend the power of matrices to high-dimensional data sets, and mathematicians are using them to tackle new types of problems. Grochow uses tensors in his study of the isomorphism problem, which basically asks how to determine whether two objects are, in some sense, the same. His recent work with Youming Qiao led to a new method of identifying complex problems that may be hard or impossible to solve.
How to Hypergraph Responsibly
Benson's inconclusive taxi model raises the question: When do researchers really need tools such as hypergraphs? Under the right conditions, a hypergraph will often deliver exactly the same kinds of analyses and predictions as a graph. Michael Schaub of RWTH Aachen University asked whether, if something is already contained in the network, it is really necessary to model the system with anything more elaborate.
It depends on the data set, he said. A graph is a good abstraction for a social network, but social networks are so much more. Higher-order systems offer more ways to model: graph theory may show how individuals are connected, but it does not capture how groups of friends influence each other's behavior.
The same higher-order interactions won't appear in every data set, so new theories are, curiously, driven by the data. That challenges the underlying sense of logic that brought Purvine to the field in the first place. What she loves about math, she said, is that it is based on logic: if you go in the right direction, you will find the right answer. But when you are defining whole new areas of math, she said, it can be difficult to determine the right way to do something. And if you don't recognize that there are multiple ways to do it, you might drive the community in the wrong direction.
Grochow said that these tools offer a kind of freedom. They let researchers understand their data better, and they let mathematicians and computer scientists explore new realms of possibility. There is so much more to discover, he said. It is beautiful and interesting, and a source of many great questions.