Corpora

mGeNTE

mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation. mGeNTE is built upon European Parliament speech data extracted...

MOSEL

The MOSEL corpus is a multilingual dataset collection including up to 950K hours of open-source speech recordings covering the 24 official languages of the European Union. We collect data by surveying labeled and unlabeled...

Speech-MASSIVE

Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...

INES

The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems' ability to produce gender-inclusive translations for the German→English language pair. By design, each German source sentence in INES includes an...

GeNTE

GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...

SHADES: a global dataset to uncover AI bias
Over 50 researchers, 16 languages, thousands of interactions analysed: the international SHADES project investigates how generative language models (LLMs) reproduce and amplify cultural stereotypes.

◾https://magazine.fbk.eu/en/news/shades-the-new-global-dataset-to-monitor-as-ai-reproduces-and-invents-cultural-stereotypes/

🎉 Excited to share our paper “Different Speech Translation Models Encode and Translate Speaker Gender Differently” was accepted at #ACL2025 (main)!

✍🏼 Big thanks to amazing co-authors: @mgaido91, @negri_teo, @luisabentivogli, @andre_t_martins, @peppeatta!

📄 Preprint out soon!

🎉 Excited to share that our @sarapapi has won the 2024 Best PhD Award from the Information and Engineering Doctoral School at @UniTrento_DISI for her thesis “Direct Speech Translation in Constrained Contexts: The Simultaneous and Subtitling Scenarios.”

#nlproc @FBK_research
