SubSONAR

May 17, 2024 | Software

SubSONAR evaluates the quality of SRT files using the multilingual multimodal SONAR model.

The evaluation measures the semantic similarity, computed as a cosine similarity, between each subtitle block and the audio segment to which the block is assigned through its SRT timestamps. The returned scores lie in [-1, 1]; higher is better.
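
The scoring idea above can be illustrated with a minimal sketch. It is not SubSONAR's actual code: `parse_srt`, `to_seconds`, and `cosine_similarity` are hypothetical helper names, and the embeddings are assumed to come from text and speech encoders that are not shown here. The sketch only covers how SRT timestamps delimit the audio span of each block and how a cosine similarity in [-1, 1] is computed between two vectors.

```python
import math
import re

# SRT timestamps look like 00:01:02,345 (hours:minutes:seconds,milliseconds).
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp):
    """Convert an SRT timestamp string to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Return a list of (start_sec, end_sec, subtitle_text) tuples."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = chunk.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks (index, timing, and text are required)
        start, end = lines[1].split("-->")
        blocks.append((to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return blocks


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

In this sketch, each tuple returned by `parse_srt` gives the time span used to cut the audio for one block; the text and the audio segment would then each be embedded (here, by encoders assumed to exist outside the sketch) and compared with `cosine_similarity`.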

SubSONAR is available both in its open source repository on GitHub and on PyPI.

Licence

SubSONAR is licensed under the Apache License, Version 2.0. However, the SONAR encoders have a dedicated license, which can be found in the LICENSE file of their repository. Please check the license of the encoders you are using.

Credits

If you use this repository, please consider citing:

@inproceedings{gaido-et-al-2024-sbaam,
title = {{SBAAM! Eliminating Transcript Dependency in Automatic Subtitling}},
author = {Gaido, Marco and Papi, Sara and Negri, Matteo and Cettolo, Mauro and Bentivogli, Luisa},
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2024",
address = "Bangkok, Thailand",
}
