TOSCA-MP SPEECH GROUND TRUTH

May 30, 2023 | Corpora

This multilingual dataset was created within the TOSCA-MP project as ground truth data for the evaluation of automatic transcription and spoken language translation technologies. The dataset includes two video genres – television broadcast news and talk shows – and covers four languages.
In addition to segmentation, turn and speaker identification, and orthographic transcription, the audio signal was richly annotated at both the linguistic level (overlapped speech and foreign speech) and the acoustic level (e.g. background noise, applause and cough, music such as songs and jingles).
Orthographic transcriptions were generated by non-expert workers through crowdsourcing and revised by expert transcribers. Rich annotation was carried out by expert transcribers only.

Annotated and transcribed videos:

Flemish: 5h:51m (news), 6h:13m (talk shows)
English: 5h:07m (news only)
German: 4h:03m (news), 5h:02m (talk shows)
Italian: 3h:54m (news), 7h:21m (talk shows)
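
The per-language figures above can be added up to get totals by language and by genre. The following is a minimal Python sketch of that arithmetic; the durations are copied from the list, while the `DURATIONS` table and the helper names (`to_minutes`, `fmt`, `totals_by_language`, `totals_by_genre`) are illustrative assumptions and not part of the dataset or its tooling.

```python
# Durations copied from the list above, as (hours, minutes) per genre.
# The structure and all names here are illustrative, not part of the dataset.
DURATIONS = {
    "Flemish": {"news": (5, 51), "talk shows": (6, 13)},
    "English": {"news": (5, 7)},
    "German": {"news": (4, 3), "talk shows": (5, 2)},
    "Italian": {"news": (3, 54), "talk shows": (7, 21)},
}


def to_minutes(hm):
    """Convert an (hours, minutes) pair to a number of minutes."""
    hours, minutes = hm
    return hours * 60 + minutes


def fmt(total_minutes):
    """Format minutes in the same 'Hh:MMm' style used in the list."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h:{minutes:02d}m"


def totals_by_language(durations):
    """Sum all genres for each language, in minutes."""
    return {
        lang: sum(to_minutes(hm) for hm in genres.values())
        for lang, genres in durations.items()
    }


def totals_by_genre(durations):
    """Sum all languages for each genre, in minutes."""
    totals = {}
    for genres in durations.values():
        for genre, hm in genres.items():
            totals[genre] = totals.get(genre, 0) + to_minutes(hm)
    return totals


if __name__ == "__main__":
    for lang, minutes in totals_by_language(DURATIONS).items():
        print(f"{lang}: {fmt(minutes)}")
    for genre, minutes in totals_by_genre(DURATIONS).items():
        print(f"{genre}: {fmt(minutes)}")
    overall = sum(totals_by_language(DURATIONS).values())
    print(f"all: {fmt(overall)}")
```

Under these figures the annotated material adds up to 37h:31m overall, split almost evenly between news (18h:55m) and talk shows (18h:36m).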

Furthermore, a subset of the broadcast news data (around two hours, corresponding to about 20,000 words) was translated by professional translators in the following directions:

Flemish to English
English to Italian
German to English
German to Italian

The TOSCA-MP Speech Ground Truth is distributed under a Creative Commons Attribution 4.0 International license (CC BY 4.0). Due to copyright restrictions, only the generated ground truth is distributed here; the corresponding videos are available through the links provided in the ground truth documentation.

Download TOSCA-MP

Publications or presentations containing results obtained through the use of TOSCA-MP Speech Ground Truth should cite the following reference:

R. Sprugnoli, G. Moretti, M. Fuoli, D. Giuliani, L. Bentivogli, E. Pianta, R. Gretter, F. Brugnara. 2013. “Comparing two methods for crowdsourcing speech transcription”. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8116-8120.

Contacts

For more information, please contact Roldano Cattoni (cattoni[at]fbk.eu).
