EC Short Clips is a test set for evaluating automatic subtitling systems. It consists of short videos from the Audiovisual Service of the European Commission (EC), recorded between 2016 and 2022. These informative clips last 2 minutes on average and cover topics discussed in EC debates, such as economy, environment, and international rights. They feature multiple speakers, and background music is sometimes present during speech. In total, the benchmark contains 27 English videos with a total duration of 1 hour, corresponding to ~5,000 words for each of the two target languages (German and Spanish).
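As a rough illustration of how per-file statistics such as word counts could be checked, the sketch below parses subtitle text in the standard SRT layout (index line, timing line, text lines, blank line). The function name `srt_stats` and the sample text are hypothetical and not part of the release. Note that it sums the time subtitles are on screen, which is not the same as the video duration.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def srt_stats(srt_text):
    """Return (subtitle blocks, whitespace-separated words, on-screen seconds)."""
    blocks = 0
    words = 0
    seconds = 0.0
    # Subtitle blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        # Locate the timing line; skip blocks that do not have one.
        timing_idx = next(
            (i for i, line in enumerate(lines) if TIMING.search(line)), None
        )
        if timing_idx is None:
            continue
        fields = TIMING.search(lines[timing_idx]).groups()
        seconds += _seconds(*fields[4:]) - _seconds(*fields[:4])
        # Everything after the timing line is subtitle text.
        words += sum(len(line.split()) for line in lines[timing_idx + 1:])
        blocks += 1
    return blocks, words, seconds


# Hypothetical sample, not taken from the test set.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Guten Morgen,
meine Damen und Herren.

2
00:00:04,000 --> 00:00:06,000
Willkommen in Brüssel.
"""

print(srt_stats(SAMPLE))
```

Words are counted by splitting on whitespace, so the result may differ slightly from counts obtained with a different tokenization.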
How to obtain EC Short Clips
The EC Short Clips test set is released under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
All rights to the data (videos and SRT files) belong to the European Commission and the respective copyright holders (see Copyright on the official website for more information).
If you use EC Short Clips in your work, please cite the following paper:
@article{papi2023directsub,
title={{Direct Speech Translation for Automatic Subtitling}},
author={Papi, Sara and Gaido, Marco and Karakanta, Alina and Cettolo, Mauro and Negri, Matteo and Turchi, Marco},
journal={Transactions of the Association for Computational Linguistics},
year={2023}
}