Neo-GATE is a bilingual corpus designed to benchmark the ability of machine translation (MT) systems to translate from English into Italian using gender-inclusive neomorphemes. It is built upon GATE (Rarrick et al., 2023), a benchmark for the evaluation of gender rewriters and gender bias in MT.

Neo-GATE includes 841 test entries and 100 dev entries. Each entry consists of an English source sentence, three Italian references that differ only in whether the gendered words are masculine, feminine, or nonbinary, and an annotation of the target words relevant to the evaluation of gender-inclusive MT.
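
As a rough illustration of the entry structure described above, the Python sketch below models one entry and finds the token positions where the three references differ. The class name `NeoGateEntry`, its field names, and the example sentence are assumptions made for illustration; they are not taken from the released files, whose actual column names may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NeoGateEntry:
    """Hypothetical container for one entry; all field names are assumptions."""
    source: str             # gender-ambiguous English sentence
    ref_masculine: str      # Italian reference with masculine forms
    ref_feminine: str       # Italian reference with feminine forms
    ref_neomorpheme: str    # Italian reference with neomorpheme forms
    annotation: List[str]   # target words relevant for the evaluation


def differing_positions(entry: NeoGateEntry) -> List[int]:
    """Return indices of whitespace tokens that are not identical across the three references."""
    tokenized = [
        ref.split()
        for ref in (entry.ref_masculine, entry.ref_feminine, entry.ref_neomorpheme)
    ]
    if len({len(tokens) for tokens in tokenized}) != 1:
        raise ValueError("references have different token counts")
    return [i for i, tokens in enumerate(zip(*tokenized)) if len(set(tokens)) > 1]


# Made-up example, not taken from the corpus; "ə" stands in for a neomorpheme.
example = NeoGateEntry(
    source="I am tired.",
    ref_masculine="Sono stanco.",
    ref_feminine="Sono stanca.",
    ref_neomorpheme="Sono stancə.",
    annotation=["stanco", "stanca", "stancə"],
)
print(differing_positions(example))
```

In this toy entry only the second token differs across the references, which matches the idea that the references are identical apart from the gendered words.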

The source sentences are gender-ambiguous, i.e. they provide no information about the gender of human referents. In this setting, words referring to human entities in the target language should express gender with neomorphemes: special characters or symbols that replace masculine and feminine inflectional morphemes.
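
To make the notion of a neomorpheme concrete, here is a deliberately naive Python sketch that builds a neomorpheme form from a masculine/feminine word pair by replacing the ending at the point where the two forms diverge. The function name `neomorpheme_form` and the choice of the schwa (ə) as the replacement symbol are assumptions for illustration only; the sketch rejects pairs that do not differ by a final inflection on both sides (e.g. dottore/dottoressa) and does not reflect how the corpus itself was built.

```python
import os


def neomorpheme_form(masculine: str, feminine: str, symbol: str = "ə") -> str:
    """Replace the diverging endings of a masculine/feminine pair with `symbol`.

    Naive illustration: the shared stem is the longest common prefix of the two forms.
    """
    stem = os.path.commonprefix([masculine, feminine])
    # Both forms must carry their own ending after the shared stem.
    if len(stem) == len(masculine) or len(stem) == len(feminine):
        raise ValueError("forms do not differ by a final inflection on both sides")
    return stem + symbol


print(neomorpheme_form("stanco", "stanca"))  # prints "stancə"
```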

The Neo-GATE corpus is released under a Creative Commons Attribution 4.0 International license (CC BY 4.0).

Neo-GATE is available on Hugging Face.

The evaluation code will soon be available at fbk-NEUTR-evAL.

If you use Neo-GATE in your work, please cite the following paper:

@misc{piergentili2024enhancing,
      title={Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models},
      author={Andrea Piergentili and Beatrice Savoldi and Matteo Negri and Luisa Bentivogli},
      year={2024},
      eprint={2405.08477},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}