GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations.
Built from European Parliament speeches, GeNTE is a subset of the English-Italian portion of the Europarl corpus. It comprises 1500 parallel sentences, which are enriched with manual annotations and feature a balanced distribution of translation phenomena that entail either i) a gender-neutral translation or ii) a gendered translation in the target language.
For full details about the dataset, see the reference paper below.
How to obtain GeNTE
The GeNTE corpus is released under a Creative Commons Attribution 4.0 International license (CC BY 4.0).
- If you use GeNTE in your work, please cite the following paper:
Andrea Piergentili*, Beatrice Savoldi*, Dennis Fucci, Matteo Negri, Luisa Bentivogli.
“Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus”.
To appear in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 6th–10th December 2023, Singapore.
(*) equal contribution