Illustration by Alex Castro / The Verge

Meta has created an artificial intelligence model that can be used to translate across 200 different languages. The project is open-sourced in the hope that other people will build on it.

Meta sees a so-called universal speech translator as important for growth across its many platforms, which is why the company is working on an artificial intelligence model to create it. Meta can use machine translation to better understand its users and to improve the advertising systems that generate 97 percent of its revenue, and the technology could also become the foundation of a killer app for future projects.

The model’s translations definitely won’t be flawless

According to experts in machine translation, the quality of the model's translations for some languages would likely be below that for better-supported languages.

The major contribution is data, according to Professor Alexander Fraser. There are 100 new languages that can now be translated.

Meta's achievements stem from the scope and focus of its research. Meta's model is the only one that can translate in more than 40,000 different directions between 200 different languages. Meta also wants to include low-resource languages in the model. Many African and Indian languages, for example, are not supported by existing translation tools.
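The relationship between the number of languages and the number of translation directions can be sketched with a little arithmetic. The snippet below assumes a "direction" is an ordered pair of distinct languages (source, then target); that definition and the function name are assumptions for illustration, not something Meta or the article spells out.

```python
def translation_directions(num_languages):
    """Count ordered (source, target) pairs of distinct languages.

    Assumption: every language can be translated into every other
    language, and A -> B is counted separately from B -> A.
    """
    return num_languages * (num_languages - 1)


if __name__ == "__main__":
    for n in (200, 201):
        print(n, "languages ->", translation_directions(n), "directions")
```

Under this assumption, exactly 200 languages would give 39,800 directions, so a figure above 40,000 would imply a language count slightly above the rounded "200."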

“what would it take to produce translation technology that works for everybody?”

The lack of attention paid to lower-resource languages inspired the team to work on the project. It was started because "translation doesn't work for the languages we speak." The motivation is inclusion: finding translation technology that works for everyone.

Fan says the model is being tested to support a project that helps editors translate articles into other languages. The techniques used to create the model will be integrated into Meta's translation tools.

How do you judge a translation?

Machine translation can be hard to use at the best of times. Even a small number of errors can lead to disastrous results, as was the case when Facebook mistranslated a post by a Palestinian man, leading to his arrest by Israeli police.

Meta created a test dataset consisting of 3001 sentence-pairs for each language covered by the model, each translated from English into a target language by a professional translator.

The machine translation was compared with the human reference sentences using a benchmark common in machine translation known as BLEU.
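To make the comparison concrete, here is a minimal sketch of a BLEU-style score for one machine output against one human reference. It is a simplified illustration, not Meta's evaluation code: it splits on whitespace, uses a single reference, and applies no smoothing, and the function names and example sentences are invented for this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: an n-gram is credited at most as often as it
        # appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: outputs shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat today"
    print(simple_bleu(reference, reference))
    print(simple_bleu("today the cat was sitting on the mat", reference))
    print(simple_bleu("today the cat was sitting on the mat", reference, max_n=2))
```

The second and third calls score a reasonable paraphrase of the reference. Because the metric only counts overlapping word sequences, the paraphrase scores lower than an exact match even though a human reader might judge it acceptable.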

Meta’s model delivers improved benchmarks, but they can’t tell the whole story

Meta says its model improves BLEU scores by 44 percent across supported languages compared with previous state-of-the-art work. But judging progress based on benchmarks requires context.

BLEU scores make it possible to compare the relative progress of different machine translation models.

Each sentence in Meta's dataset has been translated by one individual. This gives a baseline for judging translation quality, but the full expressive power of an entire language can't be captured by such a small sample. This problem affects all machine translation work and is particularly acute when assessing low-resource languages, and it shows the scope of the challenges facing the field.

Christian Federmann, a principal research manager who works on machine translation at Microsoft, said the project as a whole was "commendable" in its desire to expand the scope of machine translation software to lesser-covered languages.

There can be many different translations that are all equally good or bad, according to Federmann. It is not possible to give a general level of BLEU score goodness, because the scores depend on the test set used, its reference quality, and inherent properties of the language pair under investigation.

Fan said that the feedback from human evaluation was very positive and that there were some surprising reactions.

Fan said that people who speak low-resource languages tend to have a lower bar for translation quality because they don't have any other tool. They are so generous that the team has to ask them to call out any error they see.

The power imbalances of corporate AI

Creating this software raises particular difficulties for speakers of low-resource languages. Some communities don't want the attention of Big Tech because they don't want the tools needed to preserve their language in anyone's hands but their own. For others, questions of quality and influence are more important.

Some communities just don’t want Big Tech controlling their language

To explore some of these questions, interviews were conducted with 44 speakers of low-resource languages. The interviews raised both positives and negatives of opening up their languages to machine translation.

On the positive side, tools like this allow speakers to access more media and information, because they can be used to translate text from one language to another. But if low-resource language speakers consume more media generated by speakers of better-supported languages, this could diminish the incentives to create such materials in their own language.

The problems encountered in this project show why balancing these issues is difficult. For example, immigrants living in the US and Europe made up the majority of the 44 low-resource language speakers who were interviewed to explore these questions.

Professor Fraser said that, despite this, the research was conducted in a way that increasingly involves native speakers.

“Overall, I’m glad that Meta has been doing this.”

Fraser said he was happy that Meta has been doing this, and that more of this from companies like Microsoft and Meta is good for the world. He added that some of the thinking behind why and how to do this is coming from academia, as is the training of most of the listed researchers.

Fan said that Meta tried to address many of these social challenges by broadening the expertise it drew on. The development of artificial intelligence is often very engineering-driven, she said, with an attitude of "we can build it, so let's get together and make it happen." For this project, she said, the team worked with linguists, sociologists, and ethicists. This approach focuses on the human problem: Who would like to see this technology built? How are they going to build it? What are they going to do with it?

The decision to open-source as many elements of the project as possible, from the model to the evaluation dataset and training code, should help remedy the power imbalance inherent in a corporation working on such an initiative. Grants can also be given to researchers who want to contribute to translation projects but can't afford to fund their own.

No single company will be able to solve the problem of machine translation, Fan said. The problem is universal, she added, which is why Meta wants to support these types of community efforts.