Corpora

MCIF

MCIF (Multimodal Crosslingual Instruction Following) is a multilingual human-annotated benchmark based on scientific talks that is designed to evaluate instruction-following in crosslingual, multimodal settings over both short-...


mGeNTE

mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation. mGeNTE is built upon European Parliament speech data extracted...


MOSEL

The MOSEL corpus is a multilingual dataset collection including up to 950K hours of open-source speech recordings covering the 24 official languages of the European Union. We collect data by surveying labeled and unlabeled...


Speech-MASSIVE

Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...


INES

The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems' ability to produce gender-inclusive translations for the German→English language pair. By design, each German source sentence in INES includes an...


GeNTE

GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...


🚀 New Shared Task: Model Compression for Machine Translation at #WMT2026 (co-located with #EMNLP2026)!
📅 Test data out on June 18th, submissions by July 2nd!
Can you shrink an LLM and keep translation quality high? 🧠🔧
👉 https://www2.statmt.org/wmt26/model-compression.html #NLP #ML #LLM #ModelCompression

🎉We’re happy to welcome our new research engineer, Yiu Chung Leung, joining us to work on multimodal LLMs with a focus on speech. Excited for what’s ahead! 🚀

With @MalvinaNissim and @VivianaPatti, we've been teaching ethics in NLP as a hands-on course across Groningen, Pavia & Turin. We wrote up the experience and received the ✨Best Paper Award✨ at #EACL2026's TeachNLP Workshop. Huge thanks to the organizers and all our students!
