Corpora

mGeNTE

mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation. mGeNTE is built upon European Parliament speech data extracted...


MOSEL

The MOSEL corpus is a multilingual dataset collection including up to 950K hours of open-source speech recordings covering the 24 official languages of the European Union. We collect data by surveying labeled and unlabeled...


Speech-MASSIVE

Spoken Language Understanding (SLU) involves interpreting spoken input using Natural Language Processing (NLP). Voice assistants like Alexa and Siri are real-world examples of SLU applications. The core tasks in SLU include...


INES

The INclusive Evaluation Suite (INES) is a test set designed to assess MT systems' ability to produce gender-inclusive translations for the German→English language pair. By design, each German source sentence in INES includes an...


GeNTE

GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations. Built from European Parliament speeches,...


Our pick of the week by @mgaido91: "AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-Following Speech-LLM" by @RuchaoFan, Bo Ren, Yuxuan Hu, Rui Zhao, Shujie Liu, Jinyu Li (2024).

#NLProc #Speech #instructionfollowing #zeroshot #speechtech #speechllm

AI is transforming cultural heritage, but what have we learned?

Come and join the #AI4Culture movement at our Final Conference on March 10 in Hilversum to explore AI’s current & future impact on cultural heritage.

Details & Registration: https://pretix.eu/EFHA/AI4Culture/

@EU_HaDEA

BOUQuET💐: an OPEN INITIATIVE aimed at building an evaluation dataset for massively multilingual text-to-text MT.

Let’s make MT available for any written language!

We are inviting everyone to contribute: ➡️

More details at: https://arxiv.org/abs/2502.04314

I am happy to announce that I will speak about our recent work "How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?" at the SlatorCon in March 🎊

📃 Preprint available here:
