GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations.
Built from European Parliament speeches, GeNTE is a subset of the English-Italian portion of the Europarl corpus. It comprises 1500 parallel sentences, enriched with manual annotations, with a balanced distribution of translation phenomena that entail either i) a gender-neutral translation or ii) a gendered translation in the target language.
For full details about the dataset, see the reference paper below.
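As a rough illustration of the balanced design described above, the sketch below counts entries per set in a tab-separated file and checks that the sets are the same size. The column names (`id`, `set`, `src`, `ref`), the labels `neutral` and `gendered`, and the placeholder rows are all assumptions made for this example. They are not the released file schema or actual corpus content, so adapt them to the files you download.

```python
import csv
import io
from collections import Counter

# Hypothetical layout: the column names and the "neutral"/"gendered" labels
# are assumptions for illustration, not the schema of the released corpus.
# The rows are placeholders, not real GeNTE sentences.
SAMPLE_TSV = (
    "id\tset\tsrc\tref\n"
    "1\tneutral\t<English source 1>\t<Italian reference 1>\n"
    "2\tgendered\t<English source 2>\t<Italian reference 2>\n"
    "3\tneutral\t<English source 3>\t<Italian reference 3>\n"
    "4\tgendered\t<English source 4>\t<Italian reference 4>\n"
)


def count_by_set(tsv_text, set_column="set"):
    """Count how many rows fall into each value of `set_column`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[set_column] for row in reader)


def is_balanced(counts):
    """Return True when every set has the same number of rows."""
    return len(set(counts.values())) <= 1


counts = count_by_set(SAMPLE_TSV)
print(dict(counts))
print("balanced:", is_balanced(counts))
```

To run the same check on a downloaded file, read its contents into a string (or pass an open file handle to `csv.DictReader` directly) and set `set_column` to whatever the file actually calls that field.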
How to obtain GeNTE
The GeNTE corpus is released under a Creative Commons Attribution 4.0 International license (CC BY 4.0).
GeNTE contains text data extracted from the Europarl Corpus (common test set 2); all rights to the data belong to the European Union and/or the respective copyright holders. Please refer to the Europarl “Terms of Use” for details.
Reference papers
- If you use GeNTE in your work, please cite the following paper:
Andrea Piergentili*, Beatrice Savoldi*, Dennis Fucci, Matteo Negri, Luisa Bentivogli.
“Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus”.
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 6th–10th December 2023, Singapore.
(*) equal contribution
GeNTE annotated translations
We provide the en-it translations of a subset of the GeNTE corpus generated by Amazon Translate, DeepL, Google Translate and GPT-4, along with two layers of sentence-level manual annotations: neutrality and acceptability.
The GeNTE annotated translations are released under a Creative Commons Attribution 4.0 International license (CC BY 4.0).
Reference papers
- If you use GeNTE annotated translations in your work, please cite the following paper:
Beatrice Savoldi, Andrea Piergentili, Dennis Fucci, Matteo Negri, Luisa Bentivogli.
“A Prompt Response to the Demand for Automatic Gender-Neutral Translation”.
In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024), 17th–22nd March 2024, Malta.