Corpora

MOSEL

The MOSEL corpus is a multilingual dataset collection including up to 950K hours of open-source speech recordings covering the 24 official languages of the European Union. We collect data by surveying labeled and unlabeled...


Speech-MASSIVE

Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...


INES

The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems' ability to produce gender-inclusive translations for the German→English language pair. By design, each German source sentence in INES includes an...


GeNTE

GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...


The 22nd edition of IWSLT will be co-located with @aclmeeting in Vienna, Austria on 31 July-1 Aug 2025!

Stay tuned for the CFP and more info about our 2025 shared tasks! Join our Google group for periodic updates.

In "Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps," @BeatriceSavoldi, @DennisFucci, @dirk_hovy, and I show how speech recognition serves different gender groups differently and what to do about it.

Meet @sarapapi, @BeatriceSavoldi, and @negri_teo at EMNLP 2024 in Miami next week! 🌴

They will present two main conference papers about human-centered #MT and #genderbias, and #opensource #speech resources!

📍 Details here: https://mt.fbk.eu/our-postdocs-sara-papi-and-beatrice-savoldi-and-our-researcher-matteo-negri-at-emnlp-2024/

#NLProc #EMNLP2024

Weekly pick from the #MeetweenScientificWatch: "Vcoder: Versatile Vision Encoders for Multimodal LLMs" - A novel encoder boosts object perception in MLLMs, outperforming GPT-4V in visual reasoning! 🌆👀
