


A May 2022 paper, Building MT Systems for the Next Thousand Languages, detailed efforts to create datasets for more than 1,500 languages, develop MT for those languages, analyze system outputs, and identify frequent errors. The authors (all affiliated with Google Research) acknowledged that Google Translate’s limited menu of languages has historically skewed “European,” despite the large speaker populations of languages spoken in Africa and in South and Southeast Asia, as well as of the indigenous languages of the Americas.

Researchers sidestepped the lack of parallel data for these languages by gathering monolingual web text, which they then used to build a multilingual unlabeled text dataset containing more than one million sentences. They used this dataset and a parallel corpus spanning 112 languages to build massively multilingual models “capable of translating across 1,000 languages,” noting that the inclusion of more languages, large-scale back-translation, and self-training contributed to significant quality improvements among zero-resource languages.

Isaac Caswell, Senior Software Engineer at Google Translate and co-author of the May 2022 paper, wrote in Google’s product update: “While this technology is impressive, it isn’t perfect. And we’ll keep improving these models to deliver the same experience you’re used to with a Spanish or German translation, for example.”

Not a Translation Quality Error

Unfortunately for Google, a number of observers expressed skepticism about the quality of translation for those lower-resource languages because of a marketing error.

During the I/O conference presentation by Sundar Pichai, CEO of Google and Alphabet, a backdrop was meant to display the names of the 24 languages newly added to Google Translate in their own scripts. But native speakers said the text was riddled with errors.
