DeepMind says its new language model can beat others 25 times its size

DeepMind's new model, called RETRO (short for Retrieval-Enhanced Transformer), supplements its neural network with an external database and matches the performance of networks 25 times its size, cutting the time and cost needed to train very large models. The researchers also say the database makes it easier to analyze what the AI has learned, which could help with filtering out bias and toxic language.
Jack Rae, who leads the language research at DeepMind, says that being able to look things up on the fly, rather than having to memorize everything, is often useful.

Language models generate text by predicting what words come next in a sentence. The bigger the model, the more information about the world it can learn during training. OpenAI's GPT-3 has 175 billion parameters, the values in a neural network that are adjusted as the model learns. Microsoft's Megatron-Turing NLG language model has 530 billion. Large models take enormous amounts of computing power to train, putting them out of reach for most organizations.
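To make the parameter count concrete, here is a toy sketch, not anything DeepMind or OpenAI released: a bigram predictor whose adjustable weights are its parameters, counted the same way GPT-3's 175 billion are. Every name in it is invented for illustration.

```python
# Toy illustration of what "parameters" means: a minimal next-word
# predictor whose adjustable values are counted just like GPT-3's.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "on", "mat"]
V = len(vocab)

# A bigram model keeps one weight per (previous word, next word) pair.
# These weights are the model's parameters; training nudges them so
# that likely continuations get higher scores.
weights = rng.normal(size=(V, V))

def predict_next(prev_word: str) -> str:
    """Score every candidate next word and pick the highest."""
    scores = weights[vocab.index(prev_word)]
    return vocab[int(np.argmax(scores))]

print(f"parameter count: {weights.size}")   # V * V = 25 for this toy
print("after 'the' ->", predict_next("the"))
```

This toy has 25 parameters; scaling that idea up to a transformer with 175 billion of them is what makes training so expensive.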

DeepMind tried to cut the cost of training without reducing how much the model learns. The researchers trained RETRO on a vast data set of news articles, Wikipedia pages, books, and code from GitHub, an online code repository. The data set contains text in 10 languages.

RETRO's neural network has only 7 billion parameters. The system makes up for this with a database containing around 2 trillion passages of text. The neural network and the database are trained at the same time.

When it generates text, RETRO uses the database to look up and compare passages similar to the one it is writing, which makes its predictions more accurate. By outsourcing some of the neural network's memory to the database, RETRO can do more with less.
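The mechanics of that lookup can be sketched in a few lines. The following is an illustration, not DeepMind's implementation: it uses toy bag-of-words embeddings and brute-force cosine similarity, whereas RETRO uses a frozen BERT-style encoder and an approximate nearest-neighbor index over its trillions of tokens. The function names and the miniature "database" here are invented for this example.

```python
# Sketch of retrieval-augmented generation: embed the chunk being
# written, find the most similar passages in an external database,
# and hand those neighbors to the network as extra context.
import numpy as np

# Stand-in for the external database of text passages.
database = [
    "the eiffel tower is in paris",
    "neural networks learn from data",
    "paris is the capital of france",
]

vocab = sorted({w for p in database for w in p.split()})

def embed(text: str) -> np.ndarray:
    """Toy embedding: a normalized bag-of-words vector."""
    v = np.array([text.split().count(w) for w in vocab], dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Precompute embeddings once, as a real system would build an index.
index = np.stack([embed(p) for p in database])

def retrieve(chunk: str, k: int = 2) -> list[str]:
    """Return the k database passages most similar to the chunk."""
    sims = index @ embed(chunk)            # cosine similarities
    top = np.argsort(sims)[::-1][:k]
    return [database[i] for i in top]

# The network conditions on these neighbors when predicting the next
# tokens, instead of having to memorize the facts in its own weights.
print(retrieve("the eiffel tower"))
```

In the real system, the retrieved neighbors are fed into the transformer through cross-attention, so the network can draw facts from them rather than storing every fact in its parameters.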

DeepMind says this is the first time a look-up system has been paired with a large language model at this scale, and the first time results from this approach have been compared against the best language AIs.

Bigger isn't always better.

Alongside RETRO, DeepMind published two other studies: one looked at how the size of a model affects its performance, and the other surveyed the potential harms caused by these AIs.

To study size, DeepMind built a large language model called Gopher, with 280 billion parameters. It beat state-of-the-art models on almost all of the language challenges the researchers tested it on. They then pitted it against RETRO and found that the 7-billion-parameter RETRO matched Gopher's performance on most tasks.

The ethics study is a comprehensive survey of problems inherent in large language models. These models pick up biases, misinformation, and toxic language from the articles and books they are trained on, and because they have no understanding of what they say, they can repeat harmful statements. Rae says that even a model that faithfully mimicked its training data would be biased.