Corpora

mGeNTE

mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation. mGeNTE is built upon European Parliament speech data extracted...


MOSEL

The MOSEL corpus is a multilingual dataset collection including up to 950K hours of open-source speech recordings covering the 24 official languages of the European Union. We collect data by surveying labeled and unlabeled...


Speech-MASSIVE

Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...


INES

The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems' ability to produce gender-inclusive translations for the German→English language pair. By design, each German source sentence in INES includes an...


GeNTE

GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...


🙌🏼 Excited to share our work on Speech Foundation Model for data crowdsourcing at COLING 2025 🙌🏼

Our co-author Laurent Besacier (@laurent_besacie) at NAVER LABS Europe will be presenting -- don't miss it.

👉🏼 Details: https://mt.fbk.eu/1-paper-accepted-at-coling-2025

Exciting news: @iwslt is co-located with #ACL2025NLP again this year! 🎉
Interested in speech processing? Check out the new task on instruction following: any model can participate! 🚀
📅 Data release: April 1
⏳ Submission deadline: April 15
Don't miss it! 💬 #NLP #SpeechTech

Weekly pick from the #MeetweenScientificWatch: “Video-SALMONN: Speech-enhanced audio-visual large language models” – Redefining video comprehension with speech-aware AV-LLMs and groundbreaking QA accuracy. 🎥🎤🤖

I'm glad to announce that our work “How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?” has been accepted at the Transactions of @aclanthology (TACL)! 🎉

The preprint is available here:
