GeNTE

Oct 9, 2023 | Corpora

GeNTE (Gender-Neutral Translation Evaluation) is a natural, bilingual corpus designed to benchmark the ability of machine translation systems to generate gender-neutral translations.

Built from European Parliament speeches, GeNTE is a subset of the English-Italian portion of the Europarl corpus. It consists of 1,500 parallel sentences enriched with manual annotations. The sentences feature a balanced distribution of translation phenomena that require either i) a gender-neutral translation or ii) a gendered translation in the target language.

For full details about the dataset, see the reference paper below.

How to obtain GeNTE

The GeNTE corpus is released under a Creative Commons Attribution 4.0 International license (CC BY 4.0).

GeNTE contains text data extracted from the Europarl corpus (common test set 2), and all rights to the data belong to the European Union and/or the respective copyright holders. Please refer to the Europarl "Terms of Use" for details.

Click here to download GeNTE

Reference paper

If you use GeNTE in your work, please cite the following paper:

Andrea Piergentili*, Beatrice Savoldi*, Dennis Fucci, Matteo Negri, Luisa Bentivogli.
“Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus”.
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 6–10 December 2023, Singapore.

(*) equal contribution

GeNTE annotated translations

We provide the English-Italian translations of a subset of the GeNTE corpus produced by Amazon Translate, DeepL, Google Translate, and GPT-4, along with two layers of sentence-level manual annotation: neutrality and acceptability.

The GeNTE annotated translations are released under a Creative Commons Attribution 4.0 International license (CC BY 4.0).

Click here to download GeNTE annotated translations

Reference paper

If you use GeNTE annotated translations in your work, please cite the following paper:

Beatrice Savoldi, Andrea Piergentili, Dennis Fucci, Matteo Negri, Luisa Bentivogli.
“A Prompt Response to the Demand for Automatic Gender-Neutral Translation”.
In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024), 17–22 March 2024, Malta.
