Corpora

MOSEL

The MOSEL corpus is a multilingual dataset collection including up to 950K hours of open-source speech recordings covering the 24 official languages of the European Union. We collect data by surveying labeled and unlabeled...

Speech-MASSIVE

Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...

INES

The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems' ability to produce gender-inclusive translations for the German→English language pair. By design, each German source sentence in INES includes an...

GeNTE

GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...

Our @apierg presenting our #calamita challenges at #CLiCit2024: machine translation and gender-fair generation.

Poster session upcoming, see you there!

For more details:
👉 MagneT: https://clic2024.ilc.cnr.it/wp-content/uploads/2024/12/120_calamita_long.pdf
👉 GFG: https://clic2024.ilc.cnr.it/wp-content/uploads/2024/12/122_calamita_long.pdf

Our very own @DennisFucci presenting the challenges of Explainability for Speech Models at #CLiCit2024.

👉 Check out the paper: https://clic2024.ilc.cnr.it/wp-content/uploads/2024/12/44_main_long.pdf

@BeatriceSavoldi @mgaido91 @negri_teo @MauroCettolo @luisabentivogli

🌍 Interested in Simultaneous Translation? We're organizing the @iwslt SimulST Shared Task and would love your input for the 2025 edition. 🗣️

📝 Share your thoughts here:
