From a scientific and technical perspective, LiveMemories aimed to scale up content-extraction techniques toward very large-scale extraction from multimedia sources, setting the scene for a Content Management Platform for Trentino; to use this information to support new ways of linking, summarizing and classifying data in a new generation of digital memories that are 'alive' and user-centered; and to turn the creation of such memories into a communal web activity. Achieving these objectives made Trento a key player in the new Web Science Initiative, digital memories, and Web 2.0. But LiveMemories was also intended to have a social and cultural impact beyond the scientific one: by collecting, analyzing and preserving digital memories of Trentino; by facilitating and encouraging the preservation of such community memories; and by fostering new forms of community and enriching our cultural and social heritage.
🇦🇹 I’ll be in Vienna for #ACL2025NLP!
Interested in training a SpeechLLM without a lot of params or data? Come to my poster:
🖼️ Mon, 18:00
Also into Speech Summarization? Join my IWSLT talk in collab with @fbk_mt:
🎤 Fri, 14:00
Happy to chat - come say hi! 😎
Papers in 🧵
Sara Papi, Maike Züfle, Marco Gaido, Beatrice Savoldi, Danni Liu, Ioannis Douros, Luisa Bentivogli, Jan Niehues, "MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks,"
Our pick of the week by @mgaido91: "WhisperKit: On-device Real-time ASR with Billion-Scale Transformers" by Atila Orhon, Arda Okan, Berkin Durmus, @zachnagengast, and Eduardo Pacheco (ICML 2025)
#speech #speechtech #whisper #ASR #realtime
A couple of weeks before presenting our large-scale speech model compression task at IWSLT, here is one of the first attempts to bring large-scale models to edge devices: https://arxiv.org/pdf/2507.10860... Hope to see more work in this direction!
Our pick of the week by @FBKZhihangXie: "Adversarial Speech-Text Pre-Training for Speech Translation" by Chenxuan Liu, Liping Chen, Weitai Zhang, Xiaoxi Li, Peiwang Tang, Mingjia Yu, Sreyan Ghosh, and Zhongyi Ye (ICASSP 2025)
#speech #speechprocessing #speechtech #translation
🚀 AdvST: Adversarial training aligns speech and text distributions without parallel data! Combines adversarial learning + hidden-state swapping to fix length mismatch & boost low-resource speech translation. https://ieeexplore.ieee.org/document/10888294
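To make the idea in the post above a bit more concrete, here is a minimal PyTorch sketch of the general adversarial speech-text alignment it mentions: a discriminator tries to tell speech hidden states from text hidden states, and a gradient reversal layer pushes the encoders to make them indistinguishable. This is an illustrative assumption-laden toy, not the AdvST implementation: module names, the `lambd` weight, and the mean pooling used here to dodge the speech/text length mismatch are placeholders, and AdvST's hidden-state swapping is not reproduced.

```python
# Illustrative sketch of adversarial speech-text alignment (NOT the AdvST code).
# All names and hyperparameters are placeholders; AdvST handles the length
# mismatch via hidden-state swapping, which is not reproduced here.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, reversed (scaled) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class ModalityDiscriminator(nn.Module):
    """Predicts whether a pooled hidden state comes from speech (0) or text (1)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, h):
        return self.net(h)


def adversarial_alignment_loss(speech_hidden, text_hidden, discriminator, lambd=0.1):
    """Cross-entropy loss of the modality discriminator, with gradients reversed
    toward the encoders so they learn to make the two modalities look alike.

    speech_hidden: (B, T_speech, D) frame-level encoder states
    text_hidden:   (B, T_text, D)   token-level encoder states
    Mean pooling over time sidesteps the length mismatch in this toy example.
    """
    speech_vec = speech_hidden.mean(dim=1)  # (B, D)
    text_vec = text_hidden.mean(dim=1)      # (B, D)

    pooled = torch.cat([speech_vec, text_vec], dim=0)  # (2B, D)
    labels = torch.cat([
        torch.zeros(speech_vec.size(0), dtype=torch.long),
        torch.ones(text_vec.size(0), dtype=torch.long),
    ]).to(pooled.device)

    logits = discriminator(GradReverse.apply(pooled, lambd))
    return nn.functional.cross_entropy(logits, labels)


if __name__ == "__main__":
    disc = ModalityDiscriminator(dim=512)
    # Dummy encoder outputs: long speech sequences vs. short text sequences.
    speech_h = torch.randn(4, 300, 512, requires_grad=True)
    text_h = torch.randn(4, 20, 512, requires_grad=True)
    loss = adversarial_alignment_loss(speech_h, text_h, disc)
    loss.backward()  # discriminator gets normal gradients, encoders reversed ones
    print(f"adversarial alignment loss: {loss.item():.4f}")
```

In a real system the dummy tensors would be the outputs of the speech and text encoders, and this loss would be added to the main translation objective so the shared decoder sees modality-agnostic representations.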