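For a decade, many of the most impressive artificial intelligence systems have been taught using huge inventories of labeled data. An image might be labeled "tabby cat" or "tiger cat," for example, to train an artificial neural network to distinguish a tabby from a tiger. The strategy has been a huge success, but it also has serious shortcomings.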

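Such supervised training depends on data labeled by humans, and the neural networks often take shortcuts, learning to associate the labels with minimal and sometimes superficial information. A neural network might use the presence of grass to identify a photo of a cow, for example, because cows are typically photographed in fields.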

Alexei Efros, a computer scientist at the University of California, Berkeley, said that we are raising a generation of algorithms that resemble undergraduates who skipped class all semester and then crammed the night before the final. They do well on the test, but they don't really learn the material.

For researchers interested in the intersection of animal and machine intelligence, supervised learning might also be limited in what it can reveal about biological brains. Animals, including humans, don't use labeled data sets to learn. For the most part, they gain a rich and robust understanding of the world by exploring the environment on their own.

Now some computational neuroscientists are exploring neural networks that have been trained with little or no human-labeled data. These self-supervised learning algorithms have proved successful at modeling human language and, more recently, image recognition. Computational models of the mammalian visual and auditory systems built using self-supervised learning have shown a closer correspondence to brain function than their supervised-learning counterparts. The artificial networks seem to be revealing some of the methods our brains use to learn.

Flawed Supervision

Brain models inspired by artificial neural networks came of age around the same time that a neural network named AlexNet changed the way images are classified. That network, like all neural networks, was made of layers of artificial neurons, computational units that form connections to one another that can vary in strength, or "weight." If a neural network fails to classify an image correctly, the learning algorithm updates the weights of the connections between the neurons to make that misclassification less likely in the next round of training. The algorithm repeats this process many times with all the training images until the network's error rate is acceptably low.
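
The weight-update loop described above can be sketched in a few lines of Python. This is a minimal illustration and not AlexNet: it trains a single artificial neuron rather than a layered network, and the two features, the four labeled examples, and the function names are all hypothetical, invented for this sketch.

```python
import math
import random

def train_supervised(examples, epochs=200, lr=0.5, seed=0):
    """Train one artificial neuron on human-labeled (features, label) pairs."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in examples[0][0]]
    bias = 0.0
    for _ in range(epochs):  # repeat over all training examples many times
        for features, label in examples:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            prediction = 1.0 / (1.0 + math.exp(-z))
            error = prediction - label  # positive if the guess was too high
            # Nudge each connection weight so the same mistake is less likely.
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

def classify(weights, bias, features):
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if z > 0 else 0

# Hypothetical labeled data: [stripe contrast, body size] -> 0 = tabby, 1 = tiger
LABELED = [([0.2, 0.1], 0), ([0.3, 0.2], 0), ([0.8, 0.9], 1), ([0.7, 0.8], 1)]

weights, bias = train_supervised(LABELED)
error_rate = sum(classify(weights, bias, f) != y for f, y in LABELED) / len(LABELED)
```

Every example here carries a label that a person had to supply, which is the dependence that the rest of the article questions.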

Neuroscientists created the first computational models of the primate visual system using neural networks like AlexNet. The union looked promising: when monkeys and artificial neural nets were shown the same images, the activity of the real neurons and the artificial neurons showed an intriguing correspondence. Artificial models of hearing and odor detection followed.

But as the field progressed, researchers realized the limitations of supervised training. Leon Gatys, a computer scientist at the University of Tübingen in Germany, and his colleagues took a picture of a Ford Model T and overlaid a leopard skin pattern across it. A leading artificial neural network correctly classified the original image as a Model T, but it considered the modified image a leopard. It had fixated on the texture and had no understanding of the shape of a car or a leopard.

Self-supervised learning strategies are designed to avoid such problems. In this approach, humans don't label the data. Rather, the labels come from the data itself, said Friedemann Zenke. Self-supervised algorithms essentially create gaps in the data and ask the neural network to fill in the blanks. In a large language model, for instance, the training algorithm shows the neural network the first few words of a sentence and asks it to predict the next word. The model appears to learn the structure of the language, all without external labels or supervision.
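
To make "the labels come from the data itself" concrete, here is a small Python sketch. It is only an illustration under stated assumptions: a word-pair counter stands in for the neural network, the corpus is a made-up sentence fragment, and the function names are hypothetical. No real language model works from a table this simple.

```python
from collections import Counter, defaultdict

def make_training_pairs(text):
    """Turn unlabeled text into (previous word, next word) pairs.

    The "label" for each word is simply the word that follows it in the text.
    """
    words = text.lower().split()
    return list(zip(words[:-1], words[1:]))

def fit_next_word_model(pairs):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for previous, following in pairs:
        counts[previous][following] += 1
    return counts

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical unlabeled corpus; nobody annotated it.
CORPUS = "the cat sat on the mat . the cat ate . the cat slept on the mat ."

pairs = make_training_pairs(CORPUS)
model = fit_next_word_model(pairs)
print(predict_next(model, "on"))   # the
print(predict_next(model, "the"))  # cat
```

The only input is raw text. The gap to be filled, the next word, is created by hiding part of that same text.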

A similar effort is underway in computer vision. In late 2021, Kaiming He and colleagues revealed their masked auto-encoder. The self-supervised learning algorithm randomly masks images, obscuring almost three-quarters of each one. The masked auto-encoder turns the unmasked portions into latent representations, compressed mathematical descriptions that contain important information about an object. In the case of an image, the latent representation might be a mathematical description that captures, among other things, the shape of an object in the image. A decoder then converts those representations back into full images.

The encoder-decoder combination is trained to turn masked images into their full versions. The system learns from any differences between the real images and the reconstructed ones. The process repeats for a set of training images until the error rate is suitably low. In one example, when a trained masked auto-encoder was shown a previously unseen image of a bus with almost 80% of it obscured, it was able to reconstruct the structure of the bus.
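
The masking and scoring steps can be sketched as follows. This is a toy under loud assumptions, not the published method: each image "patch" is reduced to a single brightness number, the `reconstruct` function is a placeholder that fills gaps with an average instead of a trained encoder and decoder, and measuring the error on the hidden patches only is a choice made for this sketch. All names are hypothetical.

```python
import random

def mask_patches(patches, mask_ratio=0.75, seed=0):
    """Randomly hide a fraction of the patches.

    Returns the visible patches keyed by position, and the hidden positions.
    """
    rng = random.Random(seed)
    n_masked = int(len(patches) * mask_ratio)
    masked_idx = set(rng.sample(range(len(patches)), n_masked))
    visible = {i: p for i, p in enumerate(patches) if i not in masked_idx}
    return visible, sorted(masked_idx)

def reconstruct(visible, n_patches):
    """Placeholder for encoder + decoder: fill hidden patches with the visible mean."""
    mean = sum(visible.values()) / len(visible)
    return [visible.get(i, mean) for i in range(n_patches)]

def reconstruction_error(original, rebuilt, masked_idx):
    """Mean squared difference between real and rebuilt values on hidden patches."""
    return sum((original[i] - rebuilt[i]) ** 2 for i in masked_idx) / len(masked_idx)

# Hypothetical "image": 16 patches, each summarized by one brightness value.
image = [0.1 * i for i in range(16)]

visible, masked_idx = mask_patches(image)
rebuilt = reconstruct(visible, len(image))
loss = reconstruction_error(image, rebuilt, masked_idx)
```

In a real system, `loss` would drive weight updates in both the encoder and the decoder, and the cycle would repeat over many training images. The original image serves as its own answer key, so no human label is involved.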

The result was very impressive.

The latent representations created in a system such as this appear to contain substantially deeper information than previous strategies could include. The system might learn, for example, the shape of a car, or a leopard, and not just their patterns. The fundamental idea of self-supervised learning is that you build up your knowledge from the bottom up. No last-minute cramming to pass exams.

Self-Supervised Brains

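In systems such as this, some neuroscientists see echoes of how we learn. The majority of what the brain does is self-supervised learning, according to Blake Richards, a computational neuroscientist at McGill University. Biological brains are thought to be continually predicting, say, an object's future location as it moves, or the next word in a sentence, just as a self-supervised learning algorithm tries to predict the gap in an image or a segment of text. Brains also learn from their own mistakes: only a small portion of our brain's feedback comes from an external source.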

Consider the visual systems of humans and other primates. These are the best studied of all animal sensory systems, but neuroscientists have struggled to explain why they include two separate pathways: one that recognizes objects and faces, and another that processes movement.

Richards and his team created a self-supervised model that hints at an answer. They trained an AI that combined two different neural networks. The first, called the ResNet architecture, was designed for processing images, while the second, known as a recurrent network, could keep track of a sequence of prior inputs to make predictions about the next expected input. To train the combined AI, the team started with a sequence of, say, 10 frames from a video and let the ResNet process them one by one. The recurrent network then predicted the 11th frame, rather than simply matching the first 10. The self-supervised learning algorithm compared the prediction with the actual frame and instructed the neural networks to update their weights to make the prediction better.
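
The shape of that training loop, 10 frames in and a prediction for the 11th out, can be sketched in Python. This is a heavily simplified stand-in and not the team's model: a position lookup plays the role of the image network, a two-weight linear rule plays the role of the recurrent network, only the predictor is trained, and the prediction is made on the encoded value rather than on raw pixels. The one-dimensional "video" and all function names are hypothetical.

```python
def make_frame(position, width=32):
    """Hypothetical video frame: a 1-D strip with a single bright pixel."""
    return [1.0 if i == position else 0.0 for i in range(width)]

def encode(frame):
    """Stand-in for the image network: one number, the dot's relative position."""
    return frame.index(max(frame)) / (len(frame) - 1)

def predict_next(weights, latents):
    """Stand-in for the recurrent network: a linear guess from the last two inputs."""
    return weights[0] * latents[-1] + weights[1] * latents[-2]

def mean_squared_error(weights, clips):
    return sum((predict_next(weights, c[:10]) - c[10]) ** 2 for c in clips) / len(clips)

def train(clips, steps=500, lr=0.1):
    weights = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for clip in clips:
            context, target = clip[:10], clip[10]  # the 11th frame is the "label"
            error = predict_next(weights, context) - target
            grad[0] += 2 * error * context[-1] / len(clips)
            grad[1] += 2 * error * context[-2] / len(clips)
        # Update the weights to make the next prediction better.
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

# Each clip: 11 frames of a dot moving at constant speed, encoded one by one.
clips = [[encode(make_frame(start + speed * t)) for t in range(11)]
         for start in range(6) for speed in (1, 2)]

loss_before = mean_squared_error([0.0, 0.0], clips)
trained = train(clips)
loss_after = mean_squared_error(trained, clips)
```

The training signal is the video's own next frame, so the clips need no annotation of any kind.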

Richards' team found that an AI trained with a single ResNet was good at object recognition but not at categorizing movement. When they split the single ResNet into two, creating two pathways, the AI developed representations for objects in one and for movement in the other, just as our brains likely do.

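To test the AI further, the team showed it a set of videos that researchers at the Allen Institute for Brain Science had previously shown to mice. Like primates, mice have brain regions specialized for static images and for movement. The Allen researchers recorded the neural activity in the mouse visual cortex as the animals watched the videos.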

Here too, Richards' team found similarities between the way the living brains reacted to the videos and the way the AI did. During training, one of the pathways in the artificial neural network became more similar to the object-detecting regions of the mouse's brain, and the other pathway became similar to the movement-focused regions.

Richards said the results suggest that our visual system has two specialized pathways because they help predict the visual future; a single pathway isn't good enough.

Models of the human auditory system tell a similar story. A team led by Jean-Rémi King trained an AI called Wav2Vec 2.0, which uses a neural network to transform audio into latent representations. The researchers mask some of these representations, which then feed into another component neural network called a transformer. During training, the transformer predicts the masked information. In the process, the entire AI learns to turn sounds into latent representations, again without labels. The team used about 600 hours of speech data to train the network.