Heroes-on-off is an annotation of dubbing segments based on whether the actor’s lips are visible on screen (on) or not (off). The annotation is built over an existing dubbing corpus, Heroes (Öktem, 2018).
The annotation is performed on the source (English) segments at two levels:
- segment level: each segment is labelled on (lips are visible throughout the segment), off (lips are not visible at any point in the segment) or mixed (lips are visible for only part of the segment)
- word level: for the mixed segments only, the parts where the lips are not visible are enclosed in square brackets [ ]
For example: [But that’s your power,] isn’t it? Class = Mixed
This means that *But that’s your power,* is spoken off-screen, while *isn’t it?* is spoken on-screen.
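The bracket convention in the example above can be read programmatically. The following is a minimal Python sketch, not part of the release: the function names split_on_off and strip_brackets are hypothetical, and the code assumes that brackets are never nested and that everything outside brackets in a mixed segment is on-screen.

```python
import re

# Assumption: off-screen spans are wrapped in square brackets and
# brackets are never nested inside each other.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def split_on_off(segment):
    """Split a mixed segment into (text, visibility) pairs.

    Text inside [ ] is tagged "off"; all remaining text is tagged "on".
    """
    parts = []
    pos = 0
    for match in BRACKETED.finditer(segment):
        before = segment[pos:match.start()].strip()
        if before:
            parts.append((before, "on"))
        inside = match.group(1).strip()
        if inside:
            parts.append((inside, "off"))
        pos = match.end()
    tail = segment[pos:].strip()
    if tail:
        parts.append((tail, "on"))
    return parts


def strip_brackets(segment):
    """Remove the bracket markers but keep the text they enclose."""
    return BRACKETED.sub(lambda m: m.group(1), segment)


if __name__ == "__main__":
    example = "[But that's your power,] isn't it?"
    for text, visibility in split_on_off(example):
        print(visibility, "->", text)
    print(strip_brackets(example))
```

strip_brackets may be useful when the plain text is needed, for instance to match annotated segments against the unannotated text of the original corpus.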
How to obtain Heroes-on-off
The annotation and the scripts to reproduce the annotation on the original Heroes corpus are freely downloadable.
The release contains two files:
- category_list.csv (file containing the annotation)
- separate_categories.py (script to obtain the parallel corpus with the segments separated per category)
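To illustrate what separating segments per category amounts to, here is a small Python sketch. It does not reproduce separate_categories.py and does not read category_list.csv, whose column layout is not described here. The function group_by_category and the (category, source, target) record shape are assumptions made for illustration only, and the sample records are placeholders.

```python
from collections import defaultdict


def group_by_category(records):
    """Group (category, source, target) triples by category.

    The record shape is hypothetical. Category labels are lower-cased
    so that "Mixed" and "mixed" end up in the same group.
    """
    groups = defaultdict(list)
    for category, source, target in records:
        groups[category.strip().lower()].append((source, target))
    return dict(groups)


if __name__ == "__main__":
    # Placeholder records, not taken from the corpus.
    sample = [
        ("On", "source segment 1", "target segment 1"),
        ("Mixed", "source segment 2", "target segment 2"),
        ("off", "source segment 3", "target segment 3"),
    ]
    for category, pairs in group_by_category(sample).items():
        print(category, len(pairs))
```

For the actual corpus, use the released script, which operates on the original Heroes files.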
Heroes is licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License.
Heroes-on-off is released under the same Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License.
Reference paper
If you use Heroes-on-off in your work, please cite the following paper:
Alina Karakanta, Supratik Bhattacharya, Shravan Nayak, Timo Baumann, Matteo Negri, Marco Turchi.
“The Two Shades of Dubbing in Neural Machine Translation”.
In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), pages 4327-4333, Barcelona, Spain (Online), December 8-13, 2020.
BibTeX:
@inproceedings{karakanta-etal-2020-two,
    title = "The Two Shades of Dubbing in Neural Machine Translation",
    author = "Karakanta, Alina and
      Bhattacharya, Supratik and
      Nayak, Shravan and
      Baumann, Timo and
      Negri, Matteo and
      Turchi, Marco",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.382",
    pages = "4327--4333"
}