mGeNTE (Multilingual Gender-Neutral Translation Evaluation) is a natural, multilingual corpus designed to benchmark gender-neutral language and automatic translation.
mGeNTE is built upon European Parliament speech data extracted from the Europarl corpus, and represents a multilingual expansion of the bilingual GeNTE dataset.
For each language pair, mGeNTE comprises 1,500 parallel sentences, which are enriched with manual annotations and feature a balanced distribution of translation phenomena that entail either (i) a gender-neutral translation (set-N) or (ii) a gendered translation (set-G) in the target language.
For full details about and access to the dataset, see below.
How to obtain mGeNTE
The mGeNTE corpus is released under a Creative Commons Attribution 4.0 International license (CC BY 4.0).
mGeNTE contains text data extracted from the Europarl Corpus (common test set 2), and all rights to the data belong to the European Union and/or the respective copyright holders. Please refer to the Europarl “Terms of Use” for details.