Heroes-on-off is an annotation of dubbing segments according to whether the actor’s lips are visible on screen (on) or not (off). The annotation is built on top of an existing dubbing corpus, Heroes (Öktem, 2018).

The annotation is performed on the source segments (English) at two levels:

  • segment level: on -> lips are visible for the entire segment; off -> lips are not visible for the entire segment; mixed -> lips are visible for only part of the segment
  • word level: for the mixed segments only, the parts where the lips are not visible are enclosed in [ ]

For example: [But that’s your power,] isn’t it? Class = Mixed

Here, *But that’s your power* is off-screen, while *isn’t it* is on-screen.
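The bracket convention can be read programmatically. The following is a minimal sketch, not part of the release: the function name `split_on_off` is hypothetical, and it assumes brackets are never nested and that every bracketed span marks off-screen speech, as in the example above.

```python
import re
from typing import List, Tuple


def split_on_off(annotated: str) -> List[Tuple[str, str]]:
    """Split a mixed segment into (label, text) parts.

    Text inside [ ] is labeled "off" (lips not visible);
    everything else is labeled "on" (lips visible).
    """
    parts = []
    # The capturing group makes re.split keep the bracketed chunks.
    for chunk in re.split(r"(\[[^\]]*\])", annotated):
        if not chunk.strip():
            continue  # skip empty or whitespace-only pieces
        if chunk.startswith("["):
            parts.append(("off", chunk[1:-1].strip()))
        else:
            parts.append(("on", chunk.strip()))
    return parts


example = "[But that’s your power,] isn’t it?"
parts = split_on_off(example)
```

For the example segment, `parts` holds one off-screen part (*But that’s your power,*) followed by one on-screen part (*isn’t it?*). Note that the segment-level class cannot be derived from the text alone, since word-level brackets are only provided for mixed segments.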

How to obtain Heroes-on-off

The annotation and the scripts to reproduce the annotation on the original Heroes corpus are freely downloadable.

The release contains two files:

  1. category_list.csv (file containing annotation)
  2. separate_categories.py (script to obtain the parallel corpus with the segments separated per category)
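As a rough illustration of what separating segments per category involves, the sketch below groups segment identifiers by class. It is not the released separate_categories.py: the column names `segment_id` and `category` and the sample rows are hypothetical, and the real category_list.csv may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout and made-up rows, for illustration only;
# check the header of the real category_list.csv before adapting this.
SAMPLE = """segment_id,category
1,on
2,off
3,mixed
4,on
"""


def group_by_category(csv_file):
    """Map each category (on/off/mixed) to the list of its segment ids."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        category = row["category"].strip().lower()
        groups[category].append(row["segment_id"])
    return dict(groups)


groups = group_by_category(io.StringIO(SAMPLE))
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle, and the grouped ids could then be used to select the matching source and target segments from the Heroes corpus.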

Heroes is licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Heroes-on-off is released under the same Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Reference paper

If you use Heroes-on-off in your work, please cite the following paper:

Alina Karakanta, Supratik Bhattacharya, Shravan Nayak, Timo Baumann, Matteo Negri, Marco Turchi.
“The Two Shades of Dubbing in Neural Machine Translation”.
In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), pages 4327-4333, Barcelona, Spain (Online), December 8-13 2020.

@inproceedings{karakanta-etal-2020-two,
    title = "The Two Shades of Dubbing in Neural Machine Translation",
    author = "Karakanta, Alina and
      Bhattacharya, Supratik and
      Nayak, Shravan and
      Baumann, Timo and
      Negri, Matteo and
      Turchi, Marco",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.382",
    pages = "4327--4333"
}