EuroParl Interviews is a test set for evaluating automatic subtitling systems. It was compiled from publicly available video interviews from European Parliament TV (https://www.europarltv.europa.eu/) recorded between 2009 and 2015. We selected 12 videos with a total duration of 1 hour, amounting to ~6,500 words per target language. The videos feature multiple speakers and sometimes contain short interposed clips with news or narratives. The target subtitles are not verbatim translations of the speech: they show a high degree of compression and reduction.

The benchmark contains target subtitles (in the format of one SRT file for each video) in two languages (German and Spanish).
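As a rough illustration of how the SRT files might be read, the sketch below parses SRT content into cues and counts whitespace-separated words. It is a minimal example and not part of the release: the names `Cue`, `parse_srt`, and `count_words` are hypothetical, the parser assumes the common SRT layout (index line, timing line, then text lines, with blank lines between cues), and whitespace tokenization is an assumption that may not reproduce the word counts reported above.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    """One subtitle block (hypothetical helper type, not part of the release)."""
    index: int
    start: str
    end: str
    text: str


# Timing line as commonly found in SRT files, e.g. "00:00:01,000 --> 00:00:03,000".
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})"
)


def parse_srt(content: str) -> List[Cue]:
    """Split SRT content into cues; blocks that do not look like cues are skipped."""
    # Drop a leading byte-order mark and normalise line endings.
    content = content.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.split("\n")
        if len(lines) < 2:
            continue
        match = TIMING.search(lines[1])
        if not lines[0].strip().isdigit() or match is None:
            continue
        text = "\n".join(lines[2:])
        cues.append(Cue(int(lines[0].strip()), match.group(1), match.group(2), text))
    return cues


def count_words(cues: List[Cue]) -> int:
    """Count whitespace-separated tokens over all cues (assumed tokenization)."""
    return sum(len(cue.text.split()) for cue in cues)
```

To apply this to a file, its content can be read first, for example with `pathlib.Path(path).read_text(encoding="utf-8")`; the UTF-8 encoding is an assumption and may need adjusting.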

This test set was used as the dev set of the IWSLT 2023 Subtitling Track.

How to obtain EuroParl Interviews

The EuroParl Interviews test set is released under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).

All rights to the data (videos and SRT files) belong to the European Union and the respective copyright holders (see the copyright notice on the official website for more information).

If you use EuroParl Interviews in your work, please cite the following paper:

@article{papi2023directsub,
title={{Direct Speech Translation for Automatic Subtitling}},
author={Papi, Sara and Gaido, Marco and Karakanta, Alina and Cettolo, Mauro and Negri, Matteo and Turchi, Marco},
journal={Transactions of the Association for Computational Linguistics},
year={2023}
}