The PF-STAR project aimed to help place future activities in the field of multisensorial and multilingual communication (interface technologies) on a firmer basis by providing technological baselines, comparative evaluations, and assessments of the prospects of core technologies on which future research and development efforts can build. To this end, the project addressed three crucial areas: technologies for speech-to-speech translation, the detection and expression of emotional states, and core speech technologies for children. For each of these, promising technologies and approaches were selected, further developed, and aligned towards common baselines. The results were assessed and evaluated with respect to both their performance and their future prospects. To maximise the impact, the duration of the project was limited to 24 months, and the workplan was designed to deliver results in two stages: at mid-project (month 14) and at the end of the project. This made relevant results available as early as possible, and in particular in time for them to be used during the preparatory phase of the first call of FP6.

The Lehrstuhl für Informatik 6 was involved in the comparative evaluation and further development of speech translation technologies. The statistical approach was compared to an interlingua-based approach. After the evaluation phase, the two approaches were further developed and aligned towards common baselines. PF-STAR was supported by the European Union.
Our pick of the week by @FBKZhihangXie: "PHRASED: Phrase Dictionary Biasing for Speech Translation" by Peidong Wang, Jian Xue, Rui Zhao, @ChenJunkun, Aswin Shanmugam Subramanian, and Jinyu Li (2025).
#Speech #SpeechAI #Translation #ST #SpeechTranslation
🚀 Boost rare-phrase translation in speech! Uses **bilingual dictionaries** to dynamically bias outputs.
✅ **+21%** recall in streaming ST
✅ **+85%** recall in multimodal LLMs
🔗: http://arxiv.org/abs/2506.09175
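The post above describes biasing a speech translation model's output towards entries from a bilingual phrase dictionary. As a rough illustration of the general idea (a minimal sketch, not the PHRASED implementation; the dictionary format, function names, and bias weight are all illustrative assumptions): when a source-side phrase is detected in the input, the decoder's distribution is nudged towards the corresponding target-side tokens.

```python
# Minimal sketch of phrase-dictionary biasing at decode time.
# Illustrative toy only; not the method proposed in the PHRASED paper.
import numpy as np

# Hypothetical bilingual dictionary: source phrase -> target token ids.
PHRASE_DICT = {
    "neural beamforming": [87, 412, 9],   # illustrative target token ids
    "Trento": [1532],
}

def bias_logits(logits, detected_phrases, phrase_dict, bias=4.0):
    """Add a constant log-space boost to target tokens of detected source phrases.

    logits: np.ndarray of shape (vocab_size,) for the current decode step.
    detected_phrases: source phrases found in the utterance (e.g. by a
        simple string match on the ASR hypothesis).
    bias: additive boost; a tunable assumption.
    """
    biased = logits.copy()
    for phrase in detected_phrases:
        for token_id in phrase_dict.get(phrase, []):
            biased[token_id] += bias
    return biased

# Toy usage: boost the translation of "Trento" during one decode step.
vocab_size = 2000
step_logits = np.random.randn(vocab_size)
step_logits = bias_logits(step_logits, ["Trento"], PHRASE_DICT)
next_token = int(np.argmax(step_logits))
```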
FAMA is the first open-science speech foundation model for Italian and English, developed by FBK. It recognises and translates speech using only public data and tools: over 150,000 hours of open audio, with fully accessible code and processes.
@fbk_stek @fbk_mt
https://magazine.fbk.eu/it/news/la-prima-famiglia-di-modelli-open-science-per-il-riconoscimento-vocale-e-la-traduzione-del-parlato/
Emanuele Pianta Award for the Best Master’s Thesis in Computational Linguistics submitted to an Italian university and defended between August 1st, 2024 and July 31st, 2025
- Deadline: August 1st, 2025 (11:59 pm CEST)
- All details online: https://clic2025.unica.it/emanuele-pianta-award-for-the-best-masters-thesis/
Our pick of the week by @DennisFucci: "Speech Representation Analysis Based on Inter- and Intra-Model Similarities" by Yassine El Kheir, Ahmed Ali, and Shammur Absar Chowdhury (ICASSP Workshops 2024)
#speech #speechtech
Findings from https://ieeexplore.ieee.org/document/10669908 show that speech SSL models converge on similar embedding spaces, but via different routes. While overall representations align, individual neurons learn distinct localized concepts.
Interesting read! @fbk_mt
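As an aside, a common way to quantify the kind of inter-model similarity discussed above is linear Centered Kernel Alignment (CKA) between the embeddings two models produce for the same inputs. The snippet below is a minimal sketch of that metric, under the assumption that you already have two embedding matrices for the same samples; it is not taken from the cited paper's code, which may use different similarity measures.

```python
# Minimal linear CKA sketch for comparing two models' representations of
# the same inputs. Illustrative only; the cited paper may use other metrics.
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representations.

    X: (n_samples, d1) embeddings from model A.
    Y: (n_samples, d2) embeddings from model B (same samples, same order).
    Returns a similarity in [0, 1]; 1 means identical up to rotation/scaling.
    """
    # Center each representation across samples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)

    # HSIC-style numerator and per-model normalisers (Frobenius norms).
    numerator = np.linalg.norm(Y.T @ X, "fro") ** 2
    denominator = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return numerator / denominator

# Toy usage with random "embeddings" standing in for two SSL models' outputs.
rng = np.random.default_rng(0)
emb_model_a = rng.normal(size=(128, 768))
emb_model_b = emb_model_a @ rng.normal(size=(768, 512))  # a transform of A
print(f"CKA(A, B) = {linear_cka(emb_model_a, emb_model_b):.3f}")
```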