Facebook’s parent company, Meta, has released SeamlessM4T, a foundational multilingual and multitask model that can translate and transcribe across speech and text.
Meta said that SeamlessM4T supports:
• Automatic speech recognition for nearly 100 languages
• Speech-to-text translation for nearly 100 input and output languages
• Speech-to-speech translation, supporting nearly 100 input languages and 35 (+ English) output languages
• Text-to-text translation for nearly 100 languages
• Text-to-speech translation, supporting nearly 100 input languages and 35 (+ English) output languages
Meta said that it is publicly releasing SeamlessM4T under CC BY-NC 4.0 to allow researchers and developers to build on this work.
The technology company said it is also releasing the metadata of SeamlessAlign, the biggest open multimodal translation dataset to date, totaling 270,000 hours of mined speech and text alignments.
In its blog, Meta said, “SeamlessM4T represents a significant breakthrough in the field of speech-to-speech and speech-to-text by addressing the challenges of limited language coverage and a reliance on separate systems, which divide the task of speech-to-speech translation into multiple stages across subsystems.”
“These systems can leverage large amounts of data and generally perform well for only one modality. Our challenge was to create a unified multilingual model that could do it all.”
SeamlessM4T capabilities
- The single model provides on-demand translations that enable people who speak different languages to communicate more effectively.
- SeamlessM4T implicitly recognises the source languages, without the need for a separate language identification model.
- It significantly improves performance for low- and mid-resource languages, which are languages that have smaller digital linguistic footprints.
- It maintains strong performance on high-resource languages, such as English, Spanish, and German.
This work builds on advancements Meta and others have made over the years in the quest to create a universal translator.
SeamlessM4T draws on findings from all of Meta’s earlier language-model projects to enable a multilingual and multimodal translation experience stemming from a single model, built across a wide range of spoken data sources and with state-of-the-art results.