Mathematicians love a good puzzle, and even matrix multiplication can feel like a game when you try to find the most efficient way to do it. It is a bit like trying to solve a Rubik's Cube in as few moves as possible, except that for matrix multiplication, even in relatively simple cases, every step can present more than 10^12 options.
Over the past 50 years, researchers have attacked the problem with computer searches. Last month, a team at the artificial intelligence company DeepMind showed how to tackle it from a new direction, reporting in a paper in Nature that they had successfully trained a neural network to discover new fast algorithms for matrix multiplication. It was as if the artificial intelligence had come up with an unknown strategy for solving the cube.
Josh Alman, a computer scientist at Columbia University, and other matrix multiplication specialists said that this kind of artificial intelligence will complement existing methods rather than replace them. The result is a proof of concept for a possible breakthrough, they said, and researchers stand to benefit from it.
Three days after the Nature paper was published, a pair of Austrian researchers showed how new and old methods can work together: they used a conventional computer-aided search to improve one of the neural network's algorithms.
The results show that the path to better algorithms will be full of twists and turns.
Matrix multiplication is one of the most fundamental operations in mathematics: to multiply two n-by-n matrices, you multiply their elements together in certain combinations and add the results to generate a third n-by-n matrix. The standard recipe requires n^3 multiplication steps, so multiplying a pair of 2-by-2 matrices takes eight multiplications.
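As a concrete illustration (not from the paper, just the textbook recipe), here is a short Python sketch; the triple loop performs one scalar multiplication for every combination of indices, n^3 in all:

```python
def standard_matmul(A, B):
    """Multiply two n-by-n matrices with the schoolbook recipe."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # One scalar multiplication for every (i, j, k): n**3 in total.
                C[i][j] += A[i][k] * B[k][j]
    return C

# For 2-by-2 matrices the triple loop performs 2**3 = 8 multiplications.
print(standard_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```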
For large matrices, with many rows and columns, this process quickly becomes laborious. In 1969, the mathematician Volker Strassen discovered a procedure for multiplying a pair of 2-by-2 matrices using seven rather than eight multiplication steps.
Strassen's procedure is hardly worth the trouble if all you ever want to multiply is a pair of 2-by-2 matrices. But he realized it would also work for larger matrices, because the elements of a matrix can themselves be matrices. A matrix with 20,000 rows and 20,000 columns, for example, can be viewed as a 2-by-2 matrix whose four elements are each 10,000-by-10,000 matrices, and each of those can in turn be carved into four 5,000-by-5,000 blocks, and so on. At every level of this nested hierarchy, Strassen could apply his method to multiply 2-by-2 arrays of blocks, and the savings from performing fewer multiplications grow as the matrix size increases.
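For reference, a minimal Python sketch of Strassen's seven products for the 2-by-2 case; in the recursive scheme, the variables a11 through b22 would stand for equally sized matrix blocks rather than single numbers:

```python
def strassen_2x2(A, B):
    """Multiply two 2-by-2 matrices with Strassen's seven products."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Seven multiplications instead of eight, at the cost of extra additions.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```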
Two distinct subfields have grown out of the search for efficient matrix multiplication algorithms. One asks how the number of required multiplication steps scales with n as n grows extremely large. The current record, roughly n^2.37 multiplications, is held by Alman and Virginia Vassilevska Williams, a computer scientist at the Massachusetts Institute of Technology, who used a new technique to report a tiny improvement over the previous bound. But such algorithms matter only to theoreticians: they beat methods like Strassen's only for matrices far too large to arise in practice.
The second subfield thinks on a smaller scale. Shmuel Winograd, an Israeli American computer scientist, showed that it is impossible to multiply a pair of 2-by-2 matrices with fewer than seven multiplication steps, but for larger matrices the minimum number of required multiplications remains an open question. And fast algorithms for small matrices could have an outsize impact, since, like Strassen's, they can be applied over and over to multiply larger matrices block by block.
But the number of possible algorithms to search through is enormous: according to Alhussein Fawzi, one of the leaders of the new work, it exceeds the number of atoms in the universe.
Researchers have made progress by translating matrix multiplication into a different kind of math problem, one that is easier for computers to handle. The abstract task of multiplying two matrices can be represented by a specific kind of mathematical object: a three-dimensional array of numbers called a tensor. Researchers can break this tensor up into a sum of elementary components, called "rank-1" tensors, each of which represents a different step in the corresponding multiplication algorithm. Finding a decomposition with fewer rank-1 terms therefore amounts to finding a multiplication formula with fewer steps.
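As a rough sketch of that translation, using one common indexing convention (not necessarily the one in the Nature paper), here is the tensor for 2-by-2 multiplication built in Python:

```python
import numpy as np

# The 2-by-2 multiplication tensor T has shape (4, 4, 4): axis 0 indexes the
# flattened entries of the first matrix A, axis 1 those of B, axis 2 those of
# the product C.  T[a, b, c] = 1 whenever entry a of A times entry b of B
# contributes to entry c of C.
n = 2
T = np.zeros((n * n, n * n, n * n), dtype=int)
for i in range(n):
    for j in range(n):
        for k in range(n):
            T[i * n + k, k * n + j, i * n + j] = 1  # A[i,k] * B[k,j] feeds C[i,j]

print(T.shape)       # (4, 4, 4)
print(int(T.sum()))  # 8 ones, one per multiplication in the standard recipe
```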
By searching for such decompositions, researchers have discovered formulas that multiply various small matrices in fewer multiplication steps. But finding an algorithm that beats both the standard recipe and the one created by Strassen has remained out of reach.
DeepMind tackled the problem by turning it into a game, a strategy the company had already used when its AlphaGo artificial intelligence learned to play the board game Go well enough to beat the top humans.
Neural networks are webs of artificial neurons arranged in layers, with connections that can vary in strength to represent how much each neuron influences those in the next layer. By tweaking the strengths of these connections during training, the network learns to transform each input it receives into an output that helps it accomplish its overall goal.
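AlphaTensor's actual network is far more elaborate, but the generic idea of layered connection strengths can be sketched in a few lines of Python; the layer sizes below are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two weight matrices: each entry is the strength of one connection between a
# neuron in one layer and a neuron in the next.
W1 = rng.normal(size=(16, 8))   # input layer -> hidden layer
W2 = rng.normal(size=(8, 4))    # hidden layer -> output layer

def forward(x):
    hidden = np.maximum(0.0, x @ W1)  # weighted sums, then a simple nonlinearity
    return hidden @ W2                # the output used to choose the next move

print(forward(rng.normal(size=16)))
# Training nudges the entries of W1 and W2 so that the outputs score better.
```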
For AlphaTensor, the inputs represent steps along the way to a valid matrix multiplication scheme. The first input to the neural network is the original matrix multiplication tensor, and its output is the rank-1 tensor that AlphaTensor has chosen for its first move. That rank-1 tensor is subtracted from the input, and the updated tensor is fed back into the network as a new input. The process repeats until every element in the starting tensor has been reduced to zero, meaning there are no more rank-1 tensors to take out.
At that point, the neural network has found a valid decomposition: the rank-1 tensors it removed along the way sum to exactly the starting tensor, and the steps taken to get there can be translated back into the steps of a multiplication algorithm.
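Here is the same game played out by hand in Python, using Strassen's classic seven-term decomposition as a stand-in for the moves AlphaTensor would choose; the entry ordering matches the earlier tensor snippet:

```python
import numpy as np

# Rebuild the 2-by-2 multiplication tensor from the earlier snippet.
T = np.zeros((4, 4, 4), dtype=int)
for i in range(2):
    for j in range(2):
        for k in range(2):
            T[2 * i + k, 2 * k + j, 2 * i + j] = 1

# Strassen's seven "moves": each row below defines one rank-1 tensor.
# u and v say which combinations of A's and B's entries get multiplied;
# w says where that product lands in C.  Entry order is (11, 12, 21, 22).
U = [[1,0,0,1],[0,0,1,1],[1,0,0,0],[0,0,0,1],[1,1,0,0],[-1,0,1,0],[0,1,0,-1]]
V = [[1,0,0,1],[1,0,0,0],[0,1,0,-1],[-1,0,1,0],[0,0,0,1],[1,1,0,0],[0,0,1,1]]
W = [[1,0,0,1],[0,0,1,-1],[0,1,0,1],[1,0,1,0],[-1,1,0,0],[0,0,0,1],[1,0,0,0]]

residual = T.copy()
for u, v, w in zip(U, V, W):
    # One move of the game: subtract a rank-1 tensor from what is left.
    residual = residual - np.einsum('a,b,c->abc', np.array(u), np.array(v), np.array(w))

print(bool(np.all(residual == 0)))  # True: seven moves reduce the tensor to zero
```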
This is the game: AlphaTensor repeatedly decomposes a tensor into a set of rank-1 components, and it gets rewarded if it finds a way to reduce the number of steps. Counterintuitive moves are sometimes required, just as you sometimes have to scramble a perfectly ordered face on a Rubik's Cube before you can solve the whole thing.
The team now had a system that could, in principle, solve their problem. It just had to be trained first.
Like all neural networks, AlphaTensor requires a lot of data to train on, but tensor decomposition is a notoriously hard problem, and there weren't many examples of efficient decompositions that the researchers could feed it. Instead, they helped the program get started by training it on the much easier inverse problem: adding up randomly generated rank-1 tensors.
The idea is to use the easy problem to produce training data for the hard one. Combining this with a second strategy, in which AlphaTensor generated its own training data as it blundered around looking for efficient decompositions, worked better than either training method on its own.
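A toy version of that data-generating trick might look like the sketch below; the function name, tensor size and rank are invented for illustration, not taken from DeepMind's pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

def synthetic_example(side=4, rank=7):
    """Run decomposition in reverse: sample random rank-1 factors, add them up.

    The summed tensor is the puzzle; the sampled factors are a known answer,
    giving one (tensor, decomposition) training pair.
    """
    factors = rng.integers(-1, 2, size=(rank, 3, side))  # entries in {-1, 0, 1}
    tensor = sum(np.einsum('a,b,c->abc', u, v, w) for u, v, w in factors)
    return tensor, factors

tensor, factors = synthetic_example()
print(tensor.shape, factors.shape)  # (4, 4, 4) (7, 3, 4)
```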
AlphaTensor was trained to decompose tensors representing the multiplication of matrices up to 12-by-12. It sought fast formulas for multiplying matrices of ordinary real numbers and also formulas specific to a more constrained setting known as modulo 2 arithmetic, in which matrix elements can only be 0 or 1 and 1 + 1 = 0. Researchers often start with this more restricted but still vast space, in the hope that algorithms discovered here can be adapted to work on matrices of real numbers.
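Concretely, modulo 2 matrix multiplication just reduces every entry of the ordinary product modulo 2, as in this small Python example:

```python
import numpy as np

A = np.array([[1, 0], [1, 1]])
B = np.array([[1, 1], [0, 1]])

print(A @ B)        # ordinary arithmetic gives entries 1, 1, 1, 2
print((A @ B) % 2)  # modulo 2: the 2 wraps around to 0, since 1 + 1 = 0
```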
Within minutes of finishing its training, AlphaTensor came up with a new fast formula, and it went on to find thousands of fast algorithms for each matrix size, most with the same number of multiplication steps as the best known ones.
In a few instances, AlphaTensor beat existing records. It found a way to multiply 4-by-4 matrices in modulo 2 arithmetic using 47 multiplication steps, an improvement over the 49 steps required by applying Strassen's formula twice, once at each level of a two-level block decomposition. It also cut the number of multiplications required for 5-by-5 matrices in modulo 2 arithmetic from 98 to 96, though that new record still lags behind the 91 steps that would be needed to surpass Strassen's algorithm.
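The 49 in that comparison is back-of-the-envelope recursion arithmetic: a scheme for b-by-b matrices that uses m multiplications, applied k levels deep, handles matrices of size b^k with m^k multiplications of the smallest blocks. A quick sketch:

```python
def recursive_mults(m, k):
    """Multiplications used when a base scheme with m products is applied k levels deep."""
    return m ** k

print(recursive_mults(8, 2))   # 64: the standard 2-by-2 recipe applied twice, same as 4**3 for 4-by-4
print(recursive_mults(7, 2))   # 49: Strassen's 2-by-2 scheme applied twice to handle 4-by-4
print(recursive_mults(7, 4))   # 2401: Strassen all the way down for 16-by-16
print(recursive_mults(47, 2))  # 2209: the 47-step 4-by-4 scheme applied twice undercuts that
```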
The high-profile result generated a lot of excitement, with some researchers heaping praise on the AI-driven improvement over the status quo. But not everyone in the community was impressed. Vassilevska Williams felt the result was overstated: it is another tool, not a case of the computers beating the humans.
There are important considerations besides speed when it comes to applications of the record-breaking 4-by-4 algorithm.
Still, this is just the beginning. There is a lot of room for improvement, Fawzi said.
AlphaTensor's greatest strength is that it is not constrained by human intuition about what good algorithms look like, so it can make its own unexpected discoveries. But that also makes it hard for researchers to learn from its accomplishments.
That drawback might not be as big a problem as it seems. A few days after the AlphaTensor result, the mathematician Manuel Kauers and his graduate student Jakob Moosbauer reported another step forward.