Bias Mitigation and Gender Neutralization Techniques for Automatic Translation
With language technologies entering widespread use and being deployed at a massive scale, their societal impact has raised concern both within and outside the scientific community. Indeed, while such technologies bring undeniable advantages in many contexts, it is also evident that they come with inherent risks, such as reproducing (or even amplifying) real-world asymmetries by codifying and entrenching various kinds of biases.
In this project, we aim to make automatic translation technology more reliable and inclusive with respect to gender. We pursue this goal from two orthogonal perspectives:
- Objective 1: develop new gender bias mitigation techniques that reduce the tendency of current speech translation (ST) systems to overproduce masculine forms and perpetuate gender stereotypes in their outputs;
- Objective 2: go beyond the masculine/feminine dichotomy and develop resources and methods for gender-neutral translation, where unnecessary and potentially discriminatory gender specifications are avoided.
Project Results: objective 1 (gender bias mitigation)
Publications
- D. Fucci, M. Gaido, S. Papi, M. Cettolo, M. Negri, L. Bentivogli. 2023. “Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection”. In Proceedings of EMNLP 2023.
- D. Fucci, M. Gaido, M. Cettolo, M. Negri, L. Bentivogli. 2023. “No pitch left behind: addressing gender unbalance in automatic speech recognition through pitch manipulation”. In Proceedings of ASRU 2023.
- M. Gaido, D. Fucci, M. Negri, L. Bentivogli. 2023. “How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation”. In Proceedings of CLiC-it 2023.
Datasets
- The MuST-SHE benchmark was extended to the English-German direction and used in the “Test Suites” task at WMT 2023.
Project Results: objective 2 (gender-neutral translation)
Publications
- A. Piergentili, D. Fucci, B. Savoldi, M. Negri, L. Bentivogli. 2023. “Gender Neutralization for an Inclusive Machine Translation: from Theoretical Foundations to Open Challenges”. In Proceedings of the First Workshop on Gender-Inclusive Translation Technologies (GITT 2023).
- A. Piergentili, B. Savoldi, D. Fucci, M. Negri, L. Bentivogli. 2023. “Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus”. In Proceedings of EMNLP 2023.
- B. Savoldi, M. Gaido, M. Negri, L. Bentivogli. 2023. “Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES”. To appear in Proceedings of WMT 2023.
Datasets
- GeNTE: the first natural benchmark for gender-neutral translation, available for English-Italian. GeNTE is publicly released together with a reference-free evaluation metric, which is trained on synthetic gender-neutral data generated with GPT.
- INES: a synthetic test suite for assessing gender-neutral translation in the German-English direction, which was used in the “Test Suites” task at WMT 2023.
Workshop
- To foster research on this topic, we co-organized the First International Workshop on Gender-Inclusive Translation Technologies (GITT 2023), hosted by EAMT 2023.
Open source code
The code developed during the project is released as open source in our public GitHub repository, FBK-fairseq, where it is organized by related publication.