The NESPOLE! system was developed around two scenarios: a tourism scenario and a first-aid medical assistance scenario. Over the life of the project, three main data collections were carried out to develop the first and second showcases.
In the first year, 191 dialogues were collected: 62 German, 61 Italian, 37 English, and 31 French. The recordings amount to 6 hours of dialogue for Italian and French, 7 hours for English, and 8 hours for German. The dialogues covered five predefined tourism scenarios.
In the final year, two major data collections were carried out: the first aimed at expanding the tourism scenario, the second at addressing the medical domain. For the monolingual data collection, five tourism scenarios were developed; 66 dialogues were recorded, yielding 994.57 minutes of data: 243.52 minutes in sixteen English dialogues, 246 minutes in sixteen German dialogues, 272.52 minutes in seventeen French dialogues, and 232.53 minutes in seventeen Italian dialogues. The data collection for the medical domain covered Italian, English, and German. A total of 49 dialogues were collected, amounting to 8 hours 25 minutes of audio.
Our pick of the week by @FBKZhihangXie: "PHRASED: Phrase Dictionary Biasing for Speech Translation" by Peidong Wang, Jian Xue, Rui Zhao, @ChenJunkun, Aswin Shanmugam Subramanian, and Jinyu Li (2025).
#Speech #SpeechAI #Translation #ST #SpeechTranslation
🚀 Boost rare-phrase translation in speech! Uses **bilingual dictionaries** to dynamically bias outputs.
✅ **+21%** recall in streaming ST
✅ **+85%** in multimodal LLMs
🔗: http://arxiv.org/abs/2506.09175
FAMA is the first open-science speech foundation model for Italian and English, developed by FBK. It recognizes and translates speech using only public data and tools: over 150,000 hours of open audio, with fully accessible code and processes.
@fbk_stek @fbk_mt
https://magazine.fbk.eu/it/news/la-prima-famiglia-di-modelli-open-science-per-il-riconoscimento-vocale-e-la-traduzione-del-parlato/
Emanuele Pianta Award for the Best Master’s Thesis in Computational Linguistics, open to theses submitted at an Italian university and defended between August 1st, 2024 and July 31st, 2025
- Deadline: August 1st, 2025 (11:59 pm CEST)
- All details online: https://clic2025.unica.it/emanuele-pianta-award-for-the-best-masters-thesis/
Our pick of the week by @DennisFucci: "Speech Representation Analysis Based on Inter- and Intra-Model Similarities" by Yassine El Kheir, Ahmed Ali, and Shammur Absar Chowdhury (ICASSP Workshops 2024)
#speech #speechtech
Findings from https://ieeexplore.ieee.org/document/10669908 show that speech SSL models converge on similar embedding spaces, but via different routes. While overall representations align, individual neurons learn distinct localized concepts.
Interesting read! @fbk_mt