Corpora

mGeNTE

mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation. mGeNTE is built upon European Parliament speech data extracted...


MOSEL

The MOSEL corpus is a multilingual dataset collection including up to 950K hours of open-source speech recordings covering the 24 official languages of the European Union. We collect data by surveying labeled and unlabeled...


Speech-MASSIVE

Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...


INES

The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems' ability to produce gender-inclusive translations for the German→English language pair. By design, each German source sentence in INES includes an...


GeNTE

GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...


🗣️ Calling all researchers & practitioners in #Speech #Translation!

Help shape the future of the #Simultaneous track at @iwslt 2026. Your input matters!

Please spare 3-5 min to fill out this quick survey📋:
➡️

🚀 JOB ALERT 3: FBK's MT Unit is hiring!

Join us as a Researcher in Responsible & Trustworthy NLP and advance ethical, fair, and transparent language technologies. If you care about building safe and accountable AI systems, you can apply here:
👉 https://jobs.fbk.eu/Annunci/Offerte_di_lavoro_A_Researcher_in_Responsible_and_Trustworthy_NLP_241757983.htm
