Corpora
- BinQE – Machine Translation Dataset Annotated with Binary Quality Judgements
- BitterCorpus – English-Italian corpus with annotated bilingual terms in the IT domain
- CLTE Benchmark – Cross-Lingual Textual Entailment Dataset
- EC Short Clips – Automatic subtitling benchmark for English-German/Spanish built from European Commission clips
- EuroParl Interviews – Automatic subtitling benchmark for English-German/Spanish built from European Parliament interviews
- eSCAPE – Large-scale Synthetic Corpus for Automatic Post-Editing
- Heroes-ON-OFF – Annotation of dubbing segments based on the Heroes corpus
- MAGMATic – Italian-English multi-domain academic gold standard with manual annotation of terminology
- MuST-C – Multilingual Speech Translation Corpus
- MuST-C Common Post-Edited Test Set – Additional reference translations for English-German/Italian/Spanish
- MuST-Cinema – Speech-to-Subtitles corpus
- MuST-SHE – Multilingual benchmark for the evaluation of gender bias in Machine Translation and Speech Translation
- MuST-Speakers – Annotation of MuST-C talks with speakers’ gender information
- MuST-C Gender-balanced Validation Set – New MuST-C validation set balanced with respect to speakers’ gender
- NEuRoparl-ST – Multilingual benchmark built from European Parliament speeches and annotated with Named Entities and Terminology
- RTE3-derived CLTE dataset – Cross-lingual entailment corpus, obtained by translating the RTE-3 dataset
- TOSCA-MP Speech Ground Truth – Multilingual dataset of news and talk show transcriptions and translations
- WAGS – English-Italian Word Alignment Gold Standard
- WIT3 – Ready-to-use version of the multilingual TED talk transcriptions for MT research
Software
Actively Maintained
Past Contributions
- Moses – A statistical machine translation system
- IRSTLM – A toolkit featuring algorithms and data structures to store and access very large n-gram language models
- online MGIZA++ – An extension of MGIZA++ that aligns sentence pairs in online mode
- AQET – Adaptive Quality Estimation tool for Machine Translation
- ModernMT – A neural adaptive machine translation system that adapts to context and learns from corrections