JUMAS addresses the need to build an infrastructure able to optimise the information workflow in order to facilitate later analysis. New models and techniques for representing and automatically extracting the embedded semantics derived from multiple data sources will be developed. The most important goal of the JUMAS system is to collect, enrich and share multimedia documents annotated with embedded semantics, minimising manual transcription activity. JUMAS is tailored to managing situations in which multiple cameras and audio sources are used to record assemblies, and in which debates among people and sequences of events need to be semantically reconstructed for future consultation. The JUMAS prototype will be tested while interworking with legacy systems, but the system can be viewed as able to support business processes and problem-solving in a variety of domains.
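As a rough illustration of the kind of semantically enriched record such a workflow might produce, the sketch below defines a hypothetical data structure for one time-aligned slice of a multi-camera recording, combining source, transcript, and extracted semantic tags. All names here (`AnnotatedSegment`, its fields, the example values) are illustrative assumptions, not part of the actual JUMAS design.

```python
# Minimal sketch (assumed schema, not the actual JUMAS one) of a multimedia
# segment enriched with automatically extracted semantics.
from dataclasses import dataclass, field


@dataclass
class AnnotatedSegment:
    """One time-aligned slice of an assembly recording."""
    source_id: str                    # which camera or microphone produced the stream
    start_s: float                    # segment start, seconds from recording start
    end_s: float                      # segment end, seconds
    speaker: str | None = None        # speaker identity, if diarisation found one
    transcript: str = ""              # automatic transcription of the audio span
    semantic_tags: list[str] = field(default_factory=list)  # extracted concepts/events


# Example: an enriched segment ready to be indexed for later consultation.
segment = AnnotatedSegment(
    source_id="cam-2",
    start_s=132.0,
    end_s=141.5,
    speaker="Speaker A",
    transcript="I object to the admission of this exhibit.",
    semantic_tags=["objection", "exhibit"],
)
print(segment)
```

Keeping the segment self-describing like this would let later consultation tools search by speaker, time range, or semantic tag without replaying the raw footage.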
Weekly pick from the #MeetweenScientificWatch: “Video-SALMONN: Speech-enhanced audio-visual large language models” – Redefining video comprehension with speech-aware AV-LLMs and groundbreaking QA accuracy. 🎥🎤🤖
I’m glad to announce that our work “How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?” has been accepted to the Transactions of the ACL (@aclanthology, TACL)! 🎉
The preprint is available here:
The new @iwslt shared task on instruction-following speech models is out! Test sets will be available on the 1st of April, and participants must submit their models by April 15th. Check out the description for more info (or get in touch with us):
📢First Call for Papers 📢
The 22nd @iwslt event will be co-located with @aclmeeting
31 July–1 August 2025 – Vienna, Austria
Scientific paper submissions due March 15, 2025
More details here:
@marcfede @esalesk @ELRAnews @shashwatup9k @MarineCarpuat @_janius_