Machine Translation (MT) Quality Estimation (QE) is the task of determining the quality of an automatic translation given its source sentence, without recourse to reference translations. Most currently available datasets are obtained through manual annotation of (source, target) sentence pairs with continuous scores or Likert values (e.g. on a 5-point scale where 1 = “Incomprehensible” and 5 = “Flawless translation”). Little has been done, however, to produce binary datasets with “good” (useful, or suitable for post-editing) vs. “bad” (useless, needs complete rewriting) judgements. Such judgements are particularly valuable for training QE models for specific applications, such as integration in a computer-assisted translation (CAT) environment, where a sharp distinction between “good” and “bad” translation suggestions is needed.

BinQE is a collection of binary QE datasets for different language pairs, in which the labels were produced automatically by applying the method described in (Turchi et al., 2013). More specifically, BinQE contains:

  • 2,754 English-Spanish news sentences from the WMT 2013 datasets;
  • 10,881 French-English news sentences from the corpus described in (Potet et al., 2012);
  • 1,261 English-Italian sentences from the legal domain, collected within the MateCat EU project.
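
As a rough illustration of what a binary QE dataset looks like to a consumer, the following Python sketch models one labelled (source, target) pair and tallies labels. It is only a sketch under stated assumptions: the record layout, the field names, the language-pair keys and the toy sentences are all hypothetical and are not taken from the BinQE distribution, whose file format is not described here. Only the subset sizes come from the list above.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout; the real BinQE files may be organised differently.
@dataclass(frozen=True)
class BinaryQEExample:
    source: str  # source-language sentence
    target: str  # automatic translation of the source
    label: str   # "good" (suitable for post-editing) or "bad" (needs rewriting)


# Number of sentences per subset, as listed above.
# The keys are illustrative language-pair codes, not official names.
SUBSET_SIZES = {
    "en-es": 2754,   # WMT 2013 news
    "fr-en": 10881,  # news corpus of Potet et al.
    "en-it": 1261,   # MateCat legal domain
}


def label_distribution(examples):
    """Count how many examples carry each binary label."""
    return Counter(ex.label for ex in examples)


# Made-up toy examples for illustration only; not drawn from BinQE.
toy = [
    BinaryQEExample("The cat sleeps.", "El gato duerme.", "good"),
    BinaryQEExample("The cat sleeps.", "Gato el dormir la.", "bad"),
    BinaryQEExample("Good morning.", "Buenos días.", "good"),
]

total_sentences = sum(SUBSET_SIZES.values())
counts = label_distribution(toy)
print(total_sentences, dict(counts))
```

A two-valued label of this kind is what lets a CAT tool make the sharp keep-or-discard decision mentioned above, instead of thresholding a continuous score.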

How to obtain BinQE

BinQE is freely available for research purposes and is distributed under a Creative Commons Attribution-NonCommercial-ShareAlike license.

Reference paper: when referring to this resource, please cite the following paper:

Marco Turchi and Matteo Negri. 2014. “Automatic Annotation of Machine Translation Datasets with Binary Quality Judgements”. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, May 26-31, 2014.

author = {Marco Turchi and Matteo Negri},
title = {{Automatic Annotation of Machine Translation Datasets with Binary Quality Judgements}},
booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14)},
year = {2014},
month = {may},
date = {26-31},
address = {Reykjavik, Iceland},
editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
publisher = {European Language Resources Association (ELRA)},
isbn = {978-2-9517408-8-4},
language = {english}

The creation of BinQE has been partially supported by the EC-funded project MateCat (ICT-2011.4.2-287688).

Additional References

Marion Potet, Emmanuelle Esperança-Rodier, Laurent Besacier, and Hervé Blanchon. 2012. “Collection of a Large Database of French-English SMT Output Corrections”. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, May 2012.

Marco Turchi, Matteo Negri, and Marcello Federico. 2013. “Coping with the Subjectivity of Human Judgements in MT Quality Estimation”. In Proceedings of the 8th Workshop on Statistical Machine Translation (WMT’13), Sofia, Bulgaria.


For more information about BinQE, please contact: negri[at]