mGeNTE
by Beatrice Savoldi | Jan 13, 2025 | Corpora
mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation. mGeNTE is built upon European Parliament speech data extracted...
by Beomseok Lee | Aug 21, 2024 | Corpora
Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...
by Mauro Cettolo | Apr 30, 2024 | Corpora
Ready-to-use version of the multilingual TED talk transcriptions, prepared for MT research
by Dennis Fucci | Oct 20, 2023 | Corpora
Text corpora for Spanish, French, and Italian containing gendered words referring to the first-person speaker
by Beatrice Savoldi | Oct 19, 2023 | Corpora
The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems' ability to produce gender-inclusive translations for the German→English language pair. By design, each German source sentence in INES includes an...
by Beatrice Savoldi | Oct 9, 2023 | Corpora
GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...
by Marco Gaido | Jul 7, 2023 | Corpora
EC Short Clips is a test set for evaluating automatic subtitling systems.
by Marco Gaido | Jul 7, 2023 | Corpora
EuroParl Interviews is a test set for evaluating automatic subtitling systems.
by Matteo Negri | Jun 1, 2023 | Corpora
Multilingual benchmark built from European Parliament speeches and annotated with Named Entities and Terminology
by Mauro Cettolo | May 30, 2023 | Corpora
Annotation of dubbing segments based on the Heroes corpus
by Beatrice Savoldi | May 30, 2023 | Corpora
This multilingual dataset was created within the TOSCA-MP project as ground truth data for the evaluation of automatic transcription and spoken language translation technologies.
New benchmark evaluates #AI detection tools across languages, finding performance gaps in low-resource languages and challenges with distinguishing AI-translated and hybrid human–AI text.
@jasonslucas1 @adaku_uchendu @penn_state @Visa
Call for Participation: @iwslt Offline Speech Translation 2026
Break language barriers with new languages & real-world scenarios + a brand new source-language agnostic speech translation track
Evaluation: Apr 1–15
#IWSLT2026 #SpeechAI
Call for Participation: @iwslt Model Compression 2026
Make large multilingual foundation models small without losing power in EN→DE/ZH speech-to-text translation.
Evaluation: Apr 1–15
#IWSLT2026 #SpeechAI #Qwen2 #EfficientAI
Call for Participation: @iwslt Subtitling 2026
Turn speech into ready-to-watch subtitles across TV, News & YouTube!
Evaluation: Apr 1–15
#IWSLT2026 #SpeechAI #MultimodalAI