MCIF
MCIF (Multimodal Crosslingual Instruction Following) is a multilingual human-annotated benchmark based on scientific talks that is designed to evaluate instruction-following in crosslingual, multimodal settings over both short-...
by Beatrice Savoldi | Jan 13, 2025 | Corpora
mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation. mGeNTE is built upon European Parliament speech data extracted...
by Beomseok Lee | Aug 21, 2024 | Corpora
Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...
by Mauro Cettolo | Apr 30, 2024 | Corpora
Ready-to-use version of the multilingual transcriptions of TED talks, prepared for MT research
by Dennis Fucci | Oct 20, 2023 | Corpora
Text corpora for Spanish, French, and Italian containing gendered words referring to the first-person speaker
by Beatrice Savoldi | Oct 19, 2023 | Corpora
The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems' ability to produce gender-inclusive translations for the German→English language pair. By design, each German source sentence in INES includes an...
by Beatrice Savoldi | Oct 9, 2023 | Corpora
GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...
by Marco Gaido | Jul 7, 2023 | Corpora
EC Short Clips is a test set dedicated to evaluating automatic subtitling systems.
by Marco Gaido | Jul 7, 2023 | Corpora
EuroParl Interviews is a test set dedicated to evaluating automatic subtitling systems.
by Matteo Negri | Jun 1, 2023 | Corpora
Multilingual benchmark built from European Parliament speeches and annotated with Named Entities and Terminology
by Mauro Cettolo | May 30, 2023 | Corpora
Annotation of dubbing segments based on the Heroes corpus