mGeNTE
mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation.mGente is built upon European Parliament speech data extracted...
Read Moreby Beatrice Savoldi | Jan 13, 2025 | Corpora | 0
mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation.mGente is built upon European Parliament speech data extracted...
Read Moreby Beomseok Lee | Aug 21, 2024 | Corpora | 0
Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...
Read Moreby Mauro Cettolo | Apr 30, 2024 | Corpora | 0
Ready-to-use version for MT research purposes of the multilingual transcriptions of TED talks
Read Moreby Dennis Fucci | Oct 20, 2023 | Corpora | 0
Text corpora for Spanish, French, and Italian containing gendered words referring to the first-person speaker
Read Moreby Beatrice Savoldi | Oct 19, 2023 | Corpora | 1
The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems ability to produce gender-inclusive translations for the German→English language pair. By design, each German source sentence in INES includes an...
Read Moreby Beatrice Savoldi | Oct 9, 2023 | Corpora | 0
GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...
Read Moreby Marco Gaido | Jul 7, 2023 | Corpora | 0
EC Short Clips is a test set dedicated to evaluate automatic subtitling systems.
Read Moreby Marco Gaido | Jul 7, 2023 | Corpora | 0
EuroParl Interviews is a test set dedicated to evaluate automatic subtitling systems.
Read Moreby Matteo Negri | Jun 1, 2023 | Corpora | 0
Multilingual benchmark built from European Parliament speeches and annotated with Named Entities and Terminology
Read Moreby Mauro Cettolo | May 30, 2023 | Corpora | 0
Annotation of dubbing segments based on the Heroes corpus
Read Moreby Beatrice Savoldi | May 30, 2023 | Corpora | 0
This multilingual dataset was created within the TOSCA-MP project as ground truth data for the evaluation of automatic transcription and spoken language translation technologies.
Read More
Our pick of the week by @beomseok_lee_: "ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs" by Pooneh Mousavi, @yingzhi_wang, @mirco_ravanelli, and @CemSubakan (2025)
#SLU #speech #multimodal #LLM
Speech-language models show promise in multimodal tasks—but how well are speech & text actually aligned? 🤔
This paper https://arxiv.org/abs/2505.19937 proposes a new metric to measure layer-wise correlation between the two, with a focus on SLU tasks. 🔍🗣️📄
🔍 Ciao! Stiamo studiando come l'AI viene usata in Italia e per farlo abbiamo costruito un sondaggio!
👉https://bocconi.eu.qualtrics.com/jfe/form/SV_2nTelXaXvJlinbg (è anonimo, dura ~10 m, se partecipi o lo diffondi ci aiuti un sacco🙏)
Ci interessa anche raggiungere persone che non si occupano di AI!
Our pick of the week by @apierg: "Agree to Disagree? A Meta-Evaluation of LLM Misgendering" by Arjun Subramonian, Vagrant Gautam, Preethi Seshadri, Dietrich Klakow, @kaiwei_chang, @YizhouSun (2025).
#LLM #misgendering #gender
Super interesting paper by Subramonian et al: "Agree to Disagree? A Meta-Evaluation of LLM Misgendering" https://arxiv.org/abs/2504.17075
Turns out, misgendering is messier than just pronouns. I'd love to see this analysis extended to grammatical gender languages! #LLM #AI #ethics @fbk_mt
🚀 New tech report out! Meet FAMA, our open-science speech foundation model family for both ASR and ST in 🇬🇧 English and 🇮🇹 Italian.
The models are live and ready to try on @huggingface 👇
🔗
#ASR #ST #OpenScience #MultilingualAI