Neo-GATE is a bilingual corpus designed to benchmark the ability of machine translation (MT) systems to translate from English into Italian using gender-inclusive neomorphemes. It is built upon GATE (Rarrick et al., 2023), a benchmark for the evaluation of gender rewriters and gender bias in MT.

Neo-GATE includes 841 test entries and 100 dev entries. Each entry consists of an English source sentence, three Italian references that differ only in the gendered words (masculine, feminine, or nonbinary forms), and an annotation of the target words that are relevant for the evaluation of gender-inclusive MT.

The source sentences are gender-ambiguous, i.e. they provide no information about the gender of human referents. In this setting, words referring to human entities in the target language should express gender with neomorphemes, i.e. special characters or symbols that replace masculine and feminine inflectional morphemes.
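
As a rough illustration of the entry structure described above, the following Python sketch models one entry and finds the token positions at which the three references diverge. The class name, field names, and helper function are hypothetical and do not reflect the actual column names of the corpus. The example sentence is invented for illustration and is not taken from Neo-GATE; it uses the schwa (ə) purely as a sample neomorpheme.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    """Hypothetical container for one entry (field names are assumptions)."""
    source: str           # gender-ambiguous English sentence
    ref_masculine: str    # Italian reference with masculine forms
    ref_feminine: str     # Italian reference with feminine forms
    ref_neomorpheme: str  # Italian reference with neomorpheme forms


def differing_positions(entry: Entry) -> List[int]:
    """Return the whitespace-token indices where the three references are not identical."""
    m = entry.ref_masculine.split()
    f = entry.ref_feminine.split()
    n = entry.ref_neomorpheme.split()
    if not (len(m) == len(f) == len(n)):
        raise ValueError("references have different token counts")
    return [i for i, tokens in enumerate(zip(m, f, n)) if len(set(tokens)) > 1]


# Invented example, not a corpus entry.
example = Entry(
    source="I am tired.",
    ref_masculine="Sono stanco.",
    ref_feminine="Sono stanca.",
    ref_neomorpheme="Sono stancə.",
)

print(differing_positions(example))
```

In this sketch only the second token differs across the references, which mirrors the idea that the references vary solely in their gendered words. The corpus itself provides explicit annotations of the relevant target words, so a positional comparison like this is only a sanity check and not a substitute for them.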

The Neo-GATE corpus is released under a Creative Commons Attribution 4.0 International license (CC BY 4.0).

Neo-GATE is available on Hugging Face.

The evaluation code is available at fbk-NEUTR-evAL.

If you use Neo-GATE in your work, please cite the following paper:

@inproceedings{piergentili-etal-2024-enhancing,
      title = {{Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models}},
      author = {Piergentili, Andrea and Savoldi, Beatrice and Negri, Matteo and Bentivogli, Luisa},
      booktitle = {Proceedings of the 25th Annual Conference of the European Association for Machine Translation},
      month = jun,
      year = {2024},
      address = {Sheffield, United Kingdom},
      publisher = {European Association for Machine Translation},
      pages = {298--312},
}