An AI-generated illustration of robots making science.

Meta recently demoed Galactica, a large language model designed to store, combine, and reason about scientific knowledge. Users quickly found that it could also generate realistic-sounding nonsense, and after being criticized for it, Meta took the demo offline.

Large language models (LLMs) learn to write text by studying millions of examples and absorbing the statistical relationships between words. They can produce convincing-sounding documents, but those documents can also be riddled with false information. Critics call LLMs "stochastic parrots" because of their ability to spit out fluent text without knowing what it means.
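To make the "statistical relationships between words" idea concrete, here is a minimal sketch of the core training intuition. This is not Galactica's actual architecture (real LLMs use neural networks trained on billions of tokens); it is just a toy bigram counter over a made-up corpus, showing how a model can pick a plausible next word purely from frequency statistics, with no understanding of meaning:

```python
from collections import Counter, defaultdict

# Toy corpus (invented for illustration only).
corpus = (
    "the model writes text the model learns statistics "
    "the model writes convincing text"
).split()

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def most_likely_next(word):
    """Return the statistically most frequent follower of `word`."""
    return following[word].most_common(1)[0][0]

print(most_likely_next("the"))  # -> "model", since "model" follows "the" most often
```

The sketch also illustrates the "stochastic parrot" critique: the counter will confidently emit whatever sequence is statistically common in its training text, whether or not it is true.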

Galactica is focused on writing scientific literature. Its authors trained it on over 48 million papers, textbooks, lecture notes, scientific websites, and encyclopedias. According to the researchers, this purportedly high-quality data would lead to high-quality output.

A screenshot of Meta AI's Galactica website before the demo ended.

Starting Tuesday, visitors to the website could type in prompts to generate documents such as literature reviews, lecture notes, and answers to questions. Meta presented the model as a new way to access and manipulate knowledge about the universe.


While some people found the demo promising and useful, others discovered that anyone could just as easily type in racist or otherwise offensive prompts. One person used it to write a fictional research paper about eating crushed glass.

Even when Galactica's output didn't offend social norms, the model could mangle well-understood scientific facts, spitting out inaccuracies such as incorrect dates or animal names.

I asked #Galactica about some things I know about and I'm troubled. In all cases, it was wrong or biased but sounded right and authoritative. I think it's dangerous. Here are a few of my experiments and my analysis of my concerns. (1/9)

— Michael Black (@Michael_J_Black) November 17, 2022

Meta pulled the demo on Thursday: the Galactica demo is offline for now, and it's no longer possible to have fun by misusing it. Happy?

When it comes to potentially harmful generative models, is it up to the general public to use them responsibly, or up to the publishers of the models to prevent misuse?

As deep learning models mature, industry practice will probably vary between cultures, and government regulation may play a role in shaping the answer.