
At its Speech AI Summit today, Nvidia announced a new speech artificial intelligence platform developed through a partnership with Mozilla Common Voice. One focus of the ecosystem is open-sourced pretrained models; the two companies' shared goal is automatic speech recognition models that work universally for speakers of every language.

Standard voice assistants support only a small percentage of the world's spoken languages. Nvidia aims to improve linguistic inclusion in speech AI and to expand the availability of speech data for low-resource languages.

Nvidia is not alone in this pursuit: Meta and Google are already racing toward the same goal. Both have released models that can translate large volumes of documents into many different languages, and one of those efforts, billed as the largest language coverage seen in a speech model to date, is part of a push to build a universal speech translator.

The universal speech translator project aims to enable real-time speech-to-speech translation across all languages.

An ecosystem for global language users 

Linguistic inclusion also helps artificial intelligence models understand speaker diversity and a broad spectrum of noise profiles. Through the ecosystem, developers can build, maintain and improve speech AI models and datasets: users can train models on the Common Voice dataset and then offer them as high-quality pretrained automatic speech recognition architectures, which other organizations and individuals can in turn use to build their own speech applications.
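As a rough illustration of that workflow, the sketch below loads a small slice of Common Voice from the Hugging Face Hub and runs it through a pretrained ASR model; the dataset identifier, language code and model checkpoint are illustrative assumptions, not details from the announcement.

```python
# Hedged sketch: consuming Common Voice data with a pretrained ASR checkpoint.
# The dataset ID, language code and model below are assumptions for illustration.
from datasets import load_dataset, Audio
from transformers import pipeline

# Load a small slice of a Common Voice split from the Hugging Face Hub.
# (The mozilla-foundation datasets are gated, so an access token is required.)
cv = load_dataset(
    "mozilla-foundation/common_voice_11_0",
    "en",                      # language code (assumption)
    split="test[:10]",
)
cv = cv.cast_column("audio", Audio(sampling_rate=16_000))  # resample for the model

# A pretrained ASR checkpoint; a community-trained Common Voice model could be swapped in.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

for sample in cv:
    prediction = asr(sample["audio"]["array"])
    print("reference: ", sample["sentence"])
    print("prediction:", prediction["text"])
```

In practice, a developer would fine-tune such a checkpoint on the relevant Common Voice language split before offering it back to the community as a pretrained architecture.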

According to De Brito Gottlieb, demographic diversity is key to capturing language diversity, and a number of vital factors affect speech variation. Through the partnership, she said, the companies aim to make it easier for communities to build models for any language.

The Common Voice platform offers 24,000 hours of speech data from 500,000 contributors. The latest version of Common Voice adds six new languages, as well as more speech data from female speakers.

Through the Common Voice platform, users donate their audio data by recording sentences as short voice clips, which Mozilla validates to ensure dataset quality.

Image Source: Mozilla Common Voice.

Siddharth Sharma, head of product marketing for AI and deep learning at Nvidia, told VentureBeat that the platform focuses on the accents and noise profiles of different language speakers, an area Nvidia has made its focus, and that the company has built a solution that can be tailored accordingly.

Nvidia’s current speech AI implementations

Nvidia is developing speech AI for several use cases, including automatic speech recognition (ASR), AI speech translation (AST) and text-to-speech. Its Riva platform provides state-of-the-art pretrained models for these tasks, and Riva applications can be deployed in any type of data center or cloud.
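For context on how such deployments are consumed, here is a minimal sketch of offline transcription against a running Riva server using the nvidia-riva-client Python package; the server address, language code and configuration fields are assumptions for illustration and should be checked against the Riva documentation.

```python
# Hedged sketch: offline speech recognition against a running Riva ASR service.
# Assumes the nvidia-riva-client package and a Riva server at localhost:50051;
# exact configuration fields should be verified against the Riva documentation.
import riva.client

auth = riva.client.Auth(uri="localhost:50051")   # address of the Riva server (assumption)
asr_service = riva.client.ASRService(auth)

config = riva.client.RecognitionConfig(
    language_code="en-US",                 # illustrative language choice
    max_alternatives=1,
    enable_automatic_punctuation=True,
    audio_channel_count=1,
)

with open("sample.wav", "rb") as fh:       # 16-bit PCM WAV input (assumption)
    audio_bytes = fh.read()

response = asr_service.offline_recognize(audio_bytes, config)
for result in response.results:
    print(result.alternatives[0].transcript)
```

The same client package also exposes streaming recognition and speech synthesis services, so an application could combine several of the use cases above against one deployment.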

NCS, a multinational corporation and transportation technology partner of the Singapore government, built its own text-to-speech engine using voice data from local speakers. Its Breeze app for local drivers translates languages such as Mandarin, Hokkien, Malay and Tamil into Singaporean English with the same clarity and expression as a native Singaporean speaker.

Through a partnership with Nvidia, T-Mobile's customer experience centers will be able to use artificial intelligence to recommend solutions to thousands of frontline workers. The software was built with an open-source framework for state-of-the-art AI models, and these tools let T-Mobile engineers fine-tune the models on T-Mobile's custom datasets.

Nvidia’s future focus on speech AI

Nvidia plans to fold its current work on AI speech translation (AST) and next-generation speech AI into real-time metaverse use cases.

Sharma said that today the company can offer only slow translation from one language to another, but that in the future, people in the metaverse will be able to converse with instant translation between languages.

The next step, he said, is developing systems that will enable fluid interactions with people across the globe.
