Meta's AI translation breaks 200 language barrier

Meta's quest to translate underserved languages has marked its first victory with the open-source release of a language model able to translate between 202 languages.

Named after Meta's No Language Left Behind initiative and dubbed NLLB-200, the model is the first able to translate so many languages, according to its makers, with the goal of improving translation for languages overlooked by similar projects.

"The vast majority of improvements made in machine translation in the last decades have been for high-resource languages," Meta researchers wrote in a paper [PDF]. "While machine translation continues to grow, the fruits it bears are unevenly distributed," they said.

According to the announcement of NLLB-200, the model can translate 55 African languages "with high-quality results." Prior to NLLB-200's creation, Meta said fewer than 25 African languages were covered by widely used translation tools. When measured with the BLEU metric, Meta said NLLB-200 showed an average improvement of 44 percent over other state-of-the-art translation models. For some African and Indian languages, the improvement reportedly went as high as 70 percent.
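The BLEU metric behind those percentages scores a machine translation by its n-gram overlap with a human reference. A minimal, unsmoothed sentence-level sketch of the idea (real evaluations, including Meta's, use smoothed corpus-level tooling such as sacreBLEU, not this toy):

```python
from collections import Counter
from math import exp, log

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Toy sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clip each n-gram's count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        precisions.append(overlap / max(sum(cand_ngrams.values()), 1))
    if min(precisions) == 0:          # unsmoothed: any zero precision zeroes the score
        return 0.0
    geo_mean = exp(sum(log(p) for p in precisions) / max_n)
    # Brevity penalty discourages translations shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

A perfect match scores 1.0; partial overlaps score between 0 and 1, which is what the reported percentage improvements are computed over.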

Along with its release on GitHub as an open-source model, Meta said it's also providing $200,000 in grants to nonprofits willing to research real-world applications for NLLB-200.

Lofty goals aside, Meta is already putting NLLB-200 to work. The model and other results from the NLLB program, the company said, "will support more than 25 billion translations served every day on Facebook News Feed, Instagram, and our other platforms."

In addition, Meta has been working with the Wikimedia Foundation to use NLLB-200 as the back end of Wikipedia's Content Translation Tool. By including NLLB-200, the CTT added 10 languages that were unsupported by any other translation tool.

There were still hurdles. Meta explains that doubling NLLB's language coverage took considerable work, which it accomplished through "regularization and curriculum learning, self-supervised learning and diversifying back-translation." Meta also made extensive use of language model distillation, in which the outputs of previously trained models become training data for newer, smaller ones.
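The distillation step can be pictured as sequence-level knowledge distillation: a large trained teacher model translates monolingual text, and its outputs become the training pairs for a smaller student. A toy sketch of that data-generation step, where `teacher_translate` is a stand-in dictionary lookup rather than anything from Meta's actual pipeline:

```python
# Sequence-level distillation sketch: the big "teacher" model's outputs
# become training data for a smaller "student" model.

def teacher_translate(sentence: str) -> str:
    """Stand-in for a large trained translation model; here a word lookup."""
    lexicon = {"hello": "bonjour", "world": "monde", "friend": "ami"}
    return " ".join(lexicon.get(word, word) for word in sentence.split())

def build_distillation_set(monolingual: list[str]) -> list[tuple[str, str]]:
    """Pair each source sentence with the teacher's translation; the
    student is then trained on these (source, teacher output) pairs."""
    return [(src, teacher_translate(src)) for src in monolingual]

corpus = ["hello world", "hello friend"]
print(build_distillation_set(corpus))
# [('hello world', 'bonjour monde'), ('hello friend', 'bonjour ami')]
```

The released 1.3 billion and 600 million parameter models mentioned below are the products of this kind of teacher-to-student compression.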

As part of its open sourcing of NLLB-200, Meta is also releasing the new Flores-200 evaluation dataset it built for the project; seed training data; its 200-language toxicity lists; its new LASER3 sentence encoder; the stopes data mining library; dense transformer models with 3.3 billion and 1.3 billion parameters; 1.3 billion and 600 million parameter models distilled from NLLB-200; and NLLB-200 itself, which contains 54.5 billion parameters.

Not all communities may welcome the inclusion of their language in NLLB, or other programs for that matter. New Zealand's Māori community faced off against translation companies last year, arguing the entities didn't have a right to buy language data and sell the Māori language back to its speakers. ®
