Corpora

mGeNTE

by Beatrice Savoldi | Jan 13, 2025 | Corpora | 0

mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation. mGeNTE is built upon European Parliament speech data extracted...

MOSEL

by Sara Papi | Oct 31, 2024 | Corpora | 0

The MOSEL corpus is a multilingual dataset collection including up to 950K hours of open-source speech recordings covering the 24 official languages of the European Union. We collect data by surveying labeled and unlabeled...

Speech-MASSIVE

by Beomseok Lee | Aug 21, 2024 | Corpora | 0

Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...

WAGS

by Mauro Cettolo | Apr 30, 2024 | Corpora | 0

Ready-to-use version of the multilingual transcriptions of TED talks for MT research purposes

GenderCrawl

by Dennis Fucci | Oct 20, 2023 | Corpora | 0

Text corpora for Spanish, French, and Italian containing gendered words referring to the first-person speaker

INES

by Beatrice Savoldi | Oct 19, 2023 | Corpora | 0

The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems' ability to produce gender-inclusive translations for the German→English language pair. By design, each German source sentence in INES includes an...

GeNTE

by Beatrice Savoldi | Oct 9, 2023 | Corpora | 0

GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...

EC Short Clips

by Marco Gaido | Jul 7, 2023 | Corpora | 0

EC Short Clips is a test set dedicated to evaluating automatic subtitling systems.

EuroParl Interviews

by Marco Gaido | Jul 7, 2023 | Corpora | 0

EuroParl Interviews is a test set dedicated to evaluating automatic subtitling systems.

NEuRoparl-ST

by Matteo Negri | Jun 1, 2023 | Corpora | 0

Multilingual benchmark built from European Parliament speeches and annotated with Named Entities and Terminology

Heroes-ON-OFF

by Mauro Cettolo | May 30, 2023 | Corpora | 0

Annotation of dubbing segments based on the Heroes corpus

TOSCA-MP SPEECH GROUND TRUTH

by Beatrice Savoldi | May 30, 2023 | Corpora | 0

This multilingual dataset was created within the TOSCA-MP project as ground truth data for the evaluation of automatic transcription and spoken language translation technologies.
