mGeNTE
by Beatrice Savoldi | Jan 13, 2025 | Corpora
mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation. mGeNTE is built upon European Parliament speech data extracted...
by Beomseok Lee | Aug 21, 2024 | Corpora
Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...
by Mauro Cettolo | Apr 30, 2024 | Corpora
Ready-to-use version of the multilingual transcriptions of TED talks, prepared for MT research
by Dennis Fucci | Oct 20, 2023 | Corpora
Text corpora for Spanish, French, and Italian containing gendered words referring to the first-person speaker
by Beatrice Savoldi | Oct 19, 2023 | Corpora
The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems' ability to produce gender-inclusive translations for the German-English language pair. By design, each German source sentence in INES includes an...
by Beatrice Savoldi | Oct 9, 2023 | Corpora
GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...
by Marco Gaido | Jul 7, 2023 | Corpora
EC Short Clips is a test set for evaluating automatic subtitling systems.
by Marco Gaido | Jul 7, 2023 | Corpora
EuroParl Interviews is a test set for evaluating automatic subtitling systems.
by Matteo Negri | Jun 1, 2023 | Corpora
Multilingual benchmark built from European Parliament speeches and annotated with Named Entities and Terminology
by Mauro Cettolo | May 30, 2023 | Corpora
Annotation of dubbing segments based on the Heroes corpus
by Beatrice Savoldi | May 30, 2023 | Corpora
This multilingual dataset was created within the TOSCA-MP project as ground truth data for the evaluation of automatic transcription and spoken language translation technologies.