MCIF
MCIF (Multimodal Crosslingual Instruction Following) is a multilingual human-annotated benchmark based on scientific talks that is designed to evaluate instruction-following in crosslingual, multimodal settings over both short-...
Read Moreby Beatrice Savoldi | Jan 13, 2025 | Corpora | 0
mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation.mGente is built upon European Parliament speech data extracted...
Read Moreby Beomseok Lee | Aug 21, 2024 | Corpora | 0
Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...
Read Moreby Mauro Cettolo | Apr 30, 2024 | Corpora | 0
Ready-to-use version for MT research purposes of the multilingual transcriptions of TED talks
Read Moreby Dennis Fucci | Oct 20, 2023 | Corpora | 0
Text corpora for Spanish, French, and Italian containing gendered words referring to the first-person speaker
Read Moreby Beatrice Savoldi | Oct 19, 2023 | Corpora | 0
The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems ability to produce gender-inclusive translations for the GermanβEnglish language pair. By design, each German source sentence in INES includes an...
Read Moreby Beatrice Savoldi | Oct 9, 2023 | Corpora | 0
GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...
Read Moreby Marco Gaido | Jul 7, 2023 | Corpora | 0
EC Short Clips is a test set dedicated to evaluate automatic subtitling systems.
Read Moreby Marco Gaido | Jul 7, 2023 | Corpora | 0
EuroParl Interviews is a test set dedicated to evaluate automatic subtitling systems.
Read Moreby Matteo Negri | Jun 1, 2023 | Corpora | 0
Multilingual benchmark built from European Parliament speeches and annotated with Named Entities and Terminology
Read Moreby Mauro Cettolo | May 30, 2023 | Corpora | 0
Annotation of dubbing segments based on the Heroes corpus
Read More
β° 4 days left to apply!
π 2 PhD positions still open:
π― Human-Centred Evaluation Frameworks for Multilingual Technologies
β¨ Multimedia Personalization with Multimodal Large Language Models
π
Deadline: 15 May 2026
π Full details: https://iecs.unitn.it/education/admission/call-for-application
π @lina_conti and @luisabentivogli are heading to #LREC2026 in Palma! They'll present two papers:
π "Voice, Bias, and Coreference: An Interpretability Study of Gender in Speech Translation"
Paper link:
π€ What Matters in Data for DPO? I asked myself this question a few days ago while trying to understand how to generate a dataset with preferences to run #DPO. This recent #NeurIPS paper answered some of my questions. The findings are simple but crucial for data creation:
π Come and join our group! π
We offer 2 fully funded PhD positions:
π Human-Centred Evaluation Frameworks for Multilingual Technologies (A6)
π€ Multimedia Personalization with Multimodal Large Language Models (A7)
β° Deadline: 15 May 2026
π Details: https://iecs.unitn.it/education/admission/call-for-application