Inside a soundproofed crate sits one of the world's worst neural networks. Presented with an image of the number 6, it pauses for a moment and then identifies the digit: zero. Peter McMahon, a physicist-engineer at Cornell University who led the development of the network, defends it with a sheepish smile, pointing out that the handwritten number looked sloppy. The device usually gets the answer right, even if it sometimes doesn't, says Logan Wright, a postdoc visiting McMahon's lab.

Despite its lackluster performance, this neural network is a groundbreaker. The researchers tip the crate over, revealing not a computer chip but a titanium plate bolted to a speaker, with a microphone to pick up the plate's vibrations. Unlike the neural networks that operate in the digital world of 0s and 1s, this one runs on sound. Wright shows the network a new image of a digit, the speaker shakes the plate, and a faint chattering fills the lab. The reading is done not by software running on silicon but by metallic reverberations. That the device so often succeeds beggars belief; there's no obvious reason, McMahon says, that shaking metal should have anything to do with classifying a handwritten digit.

Yet because of that primitive reading ability, McMahon and others hope the device's distant descendants could transform computing. When it comes to machine learning, computer scientists have discovered that bigger is better. Stuffing a neural network with more artificial neurons improves its ability to tell a dachshund from a Dalmatian, or to succeed at countless other pattern-recognition tasks. Truly enormous neural networks can pull off unnervingly human tasks like writing essays and creating illustrations. With yet more computational muscle, even grander feats may become possible. That potential has fueled efforts to develop ever more powerful and efficient methods of computation.

McMahon belongs to a band of physicists pursuing a radical alternative: getting the universe to crunch the numbers for us. When engineers design a plane, they might spend hours running a computer simulation of how air flows around the wings. Or they can put the vehicle in a wind tunnel and simply see whether it flies.
Viewed one way, the wind tunnel is itself a computer, one that instantly calculates how wings interact with air.
But a wind tunnel is a single-minded machine. What McMahon is after is an apparatus that can learn to do anything — a system that can adapt its behavior through trial and error to acquire new abilities. Recent work has shown that physical systems as varied as waves of light, networks of superconductors, and branching streams of electrons can all learn. Benjamin Scellier, a mathematician at the Swiss Federal Institute of Technology Zurich in Switzerland, said that researchers in the field are reinventing not just the hardware but the whole computing paradigm.

Learning to Think

Until about a decade ago, brains were the only systems that learned well, and it's no coincidence that the dominant artificial learning models were inspired by the brain's structure. A deep neural network is a computer program that loosely mimics that structure. The network can be thought of as a grid: layers of cells called "neurons," each storing a value, connected to the cells in adjacent layers by lines with adjustable "synaptic weights." To make the network read a digit, you set the first layer of neurons to represent the raw image — say, each neuron's value encoding the brightness of one pixel of a handwritten 4. The network then multiplies those values by the synaptic weights and sums them, layer by layer, to populate the next layer of neurons. The neuron with the highest value in the final layer indicates the network's answer: if it's the second neuron, the network guesses that it saw a 2.

To teach the network to make smarter guesses, a learning algorithm works backward. After each trial, it calculates the difference between the guess and the correct answer — which, for our 4, would be represented by a high value for the fourth neuron in the final layer and low values elsewhere. Then the algorithm steps back through the network layer by layer, calculating how to tweak the weights so that the values of the final neurons rise or fall as needed. This procedure, called backpropagation, lies at the heart of deep learning. Through many guess-and-tweak repetitions, backpropagation guides the weights to a configuration of numbers such that the cascade of multiplications set off by an image produces the right answer. But compared with what goes on in the brain, this digitized version of learning looks wildly inefficient.
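The forward pass and the guess-and-tweak loop described above can be sketched in a few lines of NumPy. Everything here — the tiny layer sizes, the random stand-in "image," the squared-error loss and learning rate — is an illustrative assumption, not one of the networks discussed in the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network for 8x8 digit images: 64 pixels -> 16 hidden neurons -> 10 classes.
W1 = rng.normal(0, 0.1, (64, 16))
W2 = rng.normal(0, 0.1, (16, 10))

def forward(x):
    """Forward pass: multiply neuron values by synaptic weights, layer by layer."""
    h = np.tanh(x @ W1)        # hidden-layer neuron values
    return h, h @ W2           # final layer holds one value per digit

def train_step(x, label, lr=0.01):
    """One guess-and-tweak repetition: forward guess, then backpropagation."""
    global W1, W2
    h, scores = forward(x)
    target = np.zeros(10)
    target[label] = 1.0               # high value for the right neuron, low elsewhere
    err = scores - target             # difference between guess and correct answer
    # Work backward layer by layer, computing how each weight should change.
    back = (W2 @ err) * (1 - h ** 2)  # error signal pulled through the tanh
    W2 -= lr * np.outer(h, err)
    W1 -= lr * np.outer(x, back)
    return float((err ** 2).sum())

x = rng.random(64)   # stand-in for one handwritten "4"
losses = [train_step(x, label=4) for _ in range(50)]
print(losses[0], "->", losses[-1])   # the error shrinks with the repetitions
```

After a few dozen repetitions the fourth output neuron carries the highest value, which is exactly the behavior the physical devices below try to reproduce without doing the arithmetic explicitly.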
A human child, running on fewer than 2,000 calories a day, learns to talk, read, play games and much more within a few years. On an energy budget like that, GPT-3 — a neural network capable of fluent conversation — would take a millennium to learn to chat. From one perspective, a large digital neural network is simply trying to do too much math: today's giants must record and manipulate more than half a trillion numbers. The universe, meanwhile, routinely pulls off tasks far beyond the limits of computers. A room with trillions of trillions of air molecules bouncing around is impossible for any computer to track in full, yet the air itself has no trouble deciding how to behave from moment to moment.

The challenge is to build physical systems that can naturally pull off both processes needed for artificial intelligence: producing answers ("thinking") and improving with experience ("learning"). A system that mastered both would leverage the universe's ability to act without actually doing math. As Scellier put it, such a system never computes 3.532 times 1.567 or anything of the sort.

McMahon and his team have made progress on half of the puzzle — the thinking half. While setting up his lab at Cornell, McMahon mulled over a strange finding: the top-performing image-recognition neural networks were getting ever deeper. That is, networks with more layers were better able to take in a bunch of pixels and put out the right label. In mathematical terms, a trained network is a function that turns an input (the pixels) into an output (the label), and the research suggested that deeper networks do better because the functions they compute are less jagged — closer to some ideal curve.

That made McMahon wonder whether one could sidestep the blockiness inherent in the digital approach altogether by using a smoothly changing physical system. The trick would be finding a way to domesticate — to train — a suitably complicated system. McMahon and his team chose the titanium plate because it vibrates in many intricate overlapping patterns. To make the plate act like a neural network, they fed in one sound encoding the input image and a second sound representing the synaptic weights, timed so that the two hit the plate at precisely the right moments.
The Thinking Part
The group went on to implement the same scheme in two other systems: an optical setup in which the input image and the weights are encoded in two beams of light, and an electronic circuit that shuffles its inputs in an analogous way. In principle, any system with sufficiently byzantine behavior will do, but the researchers believe the optical system holds particular promise: not only can light mix with light extremely quickly, a beam can also carry vast amounts of data. McMahon imagines descendants of his optical neural network serving as the eyes of self-driving cars, identifying stop signs and pedestrians before feeding that information to a vehicle's computer chip.

Training these systems, however, still required a return to the digital world. Backpropagation amounts to running a neural network in reverse, and plates and crystals can't simply unmix sounds and light. So the group built a digital model of each physical system and ran the backpropagation algorithm on the model to calculate how the weights should be adjusted for accurate answers. Trained this way, the plate learned to classify handwritten digits correctly 87% of the time; the circuit and the laser reached 98% accuracy apiece. The results, said Julie Grollier, a physicist at the French National Center for Scientific Research, show that physical neural networks can be trained through backpropagation.

The vibrating plate has not brought computing anywhere near the brain's efficiency, and it doesn't even approach the speed of digital neural networks. But McMahon sees his devices as striking proof that you don't need a brain or a computer chip to think.

The Learning Part

The other half of the puzzle is getting a system to learn on its own. One option, according to Florian Marquardt, a physicist at the Max Planck Institute for the Science of Light in Germany, is a machine that runs backward. He and a colleague proposed a physical analogue of the backpropagation algorithm, and to show that it works they used a laser setup similar to McMahon's, with the weights adjusted in a light wave that mixes with another input wave.
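The hybrid training loop McMahon's group used — answers produced by the physical system, weight adjustments computed by backpropagation on a digital model of it — can be caricatured in code. Here the "physical" system is just the digital model plus a little unmodeled noise, and the mixing matrix, sizes and learning rate are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

M = rng.normal(0, 0.5, (4, 8))   # fixed "physics": how the medium mixes signals

def digital_model(x, w):
    """Differentiable digital twin of the physical system."""
    return np.tanh(M @ (w * x))

def physical_system(x, w):
    """Stand-in for the real hardware: the twin's behavior plus unmodeled noise."""
    return digital_model(x, w) + rng.normal(0, 0.01, 4)

x = rng.random(8)                          # input signal fed to the hardware
target = np.array([0.2, -0.1, 0.3, 0.0])   # desired output
w = np.ones(8)                             # trainable "weight" signal

for _ in range(500):
    y = physical_system(x, w)              # forward pass runs on the "hardware"
    err = y - target
    # Backward pass runs on the digital twin: chain rule through tanh and M.
    pre = M @ (w * x)
    w -= 0.1 * (M.T @ (err * (1 - np.tanh(pre) ** 2))) * x

residual = np.abs(physical_system(x, w) - target).max()
print(residual)
```

The point of the trick is that the forward pass never has to be simulated at all — the hardware performs it — while the digital twin only needs to be good enough to point the weight updates in roughly the right direction.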
The magic, Marquardt said, is that pushing the output to be closer to the right answer is all it takes — the device itself does the rest of the training.

Other researchers are leaving backpropagation behind altogether, in part because the brain doesn't appear to use it: communication between neuron A and neuron B is one-way, so there is no return path for an error signal. With that in mind, Scellier and Yoshua Bengio, a computer scientist at the University of Montreal, developed a learning method called equilibrium propagation.

To picture how it works, imagine a network of arrows that act like neurons, their direction indicating a 0 or a 1, connected in a grid by springs that act as synaptic weights. The stiffer a spring, the harder it pulls the linked arrows into alignment. First, you twist the arrows in the leftmost row to reflect the pixels of your handwritten digit, holding them fixed while the disturbance ripples out through the springs and flips the other arrows. When the flipping stops, the rightmost arrows give the answer.

Crucially, you don't have to train this system by un-flipping the arrows. Instead, you connect another set of arrows showing the correct answer along the bottom of the network; these flip arrows in the upper set, and the whole grid settles into a new equilibrium. Finally, you compare the old and new orientations of the arrows and tighten or loosen each spring accordingly. Over many trials, Scellier and Bengio showed, the springs acquire smarter tensions in a way that turns out to be equivalent to backpropagation — a link between physical networks and backpropagation that had been assumed not to exist.

The initial work on equilibrium propagation was all theoretical. But in an upcoming publication, Grollier and Jérémie Laydevant describe running the algorithm on a quantum annealer, a machine built by the company D-Wave whose network of thousands of interacting superconductors can naturally act like the arrows and calculate how the springs should be updated. The hardware can't update the spring weights automatically, though. For that, at least one team has gathered the pieces to build an electronic circuit that does all of the heavy lifting with physics.
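Equilibrium propagation's two settling phases can be reduced to a minimal numerical toy. Here the "springs" produce a simple quadratic energy (so the free equilibrium is just a linear readout), and the nudge strength, settling loop and learning rate are illustrative choices, not Scellier and Bengio's full recipe:

```python
import numpy as np

rng = np.random.default_rng(2)

# "Spring" energy E(y) = 0.5 * ||y - W x||^2: at the free equilibrium,
# the output arrows settle to y = W x.
W = rng.normal(0, 0.1, (3, 5))
beta = 0.5   # strength of the nudge toward the correct answer

def settle(x, target=None, nudge=0.0, steps=200, rate=0.1):
    """Relax the output nodes to the minimum of the (possibly nudged) energy."""
    y = np.zeros(3)
    for _ in range(steps):
        grad = y - W @ x                   # pull from the springs
        if target is not None:
            grad += nudge * (y - target)   # gentle pull toward the answer
        y -= rate * grad
    return y

x = rng.random(5)                    # clamped input arrows
target = np.array([1.0, 0.0, -0.5])  # correct answer

for _ in range(100):
    y_free = settle(x)                          # phase 1: free settling
    y_nudged = settle(x, target, nudge=beta)    # phase 2: nudged settling
    # Contrast the two equilibria to re-tension the springs.
    W += 0.5 * np.outer(y_nudged - y_free, x) / beta

err_final = np.abs(settle(x) - target).max()
print(err_final)
```

Notice that neither phase runs the dynamics in reverse — both are ordinary settlings of the same energy, which is what makes the scheme attractive for hardware.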
The goal for Sam Dillavou and his collaborators at the University of Pennsylvania is to emulate the brain itself: a smart substance that learns without any single component calling the shots. They built a self-learning circuit in which variable resistors act as the synaptic weights and the neurons are the voltages measured at the nodes between them. To classify an input, the circuit translates the data into voltages applied to a few of the nodes. Current spreads through the network along the paths that dissipate the least energy, and the voltages it produces at designated output nodes encode the answer.

Their major innovation was the training scheme, a method akin to equilibrium propagation that they call coupled learning. While one circuit takes in the data, an identical second circuit starts from the correct answer and incorporates it into its behavior. Simple electronics connecting each pair of corresponding resistors compare the two circuits and update the resistances toward the smarter configuration.

The group described their circuit in a preprint last summer, showing that it could learn to distinguish three types of flowers, and they are now working on a faster device. Even that upgrade won't beat a state-of-the-art silicon chip. But the physicists building these systems suspect that digital neural networks, however mighty they seem today, will eventually look slow and inadequate next to their analog cousins: digital networks can scale up only so much before they get bogged down by excessive computation. It's hard to believe that there won't be some pretty powerful computers made with these principles.
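The contrast between a free circuit and a clamped twin can be illustrated with the smallest possible "network": a two-resistor voltage divider whose conductances play the role of weights. The update rule below — change each conductance in proportion to the difference of its squared voltage drops in the two states — follows the spirit of coupled learning, but the specific constants and sign conventions are illustrative assumptions:

```python
# A voltage divider: V_in --[G_a]-- V_out --[G_b]-- ground.
def free_output(G_a, G_b, V_in=1.0):
    """Output voltage of the free (unclamped) circuit."""
    return V_in * G_a / (G_a + G_b)

G_a, G_b = 1.0, 1.0    # conductances act as the synaptic weights
target = 0.8           # desired output voltage
eta, gamma = 0.1, 0.5  # nudge strength and learning rate

for _ in range(500):
    V_free = free_output(G_a, G_b)
    # The clamped twin is nudged a small step toward the correct answer.
    V_clamp = V_free + eta * (target - V_free)
    # Update each conductance from the change in its squared voltage drop.
    dG_a = (gamma / eta) * ((1.0 - V_free) ** 2 - (1.0 - V_clamp) ** 2)
    dG_b = (gamma / eta) * (V_free ** 2 - V_clamp ** 2)
    G_a = max(G_a + dG_a, 1e-6)
    G_b = max(G_b + dG_b, 1e-6)

final_err = abs(free_output(G_a, G_b) - target)
print(final_err)
```

Nothing in the loop computes a gradient explicitly: each resistor responds only to the voltages across it in the two circuits, which is why, in principle, local electronics at each resistor can do the updating with no central controller.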
Closing the Circle