Statistical Post-Editing for a Statistical MT System

* !! Under construction !! *

Hanna Béchara, Yanjun Ma, Josef van Genabith
MT Summit 2011

Presented by Rudolf Rosa
Report by Jindřich Helcl

Introduction

This paper deals with statistical post-editing of the output of a statistical machine translation system. Its most interesting claim is that the authors achieved an improvement of about 2 BLEU points by pipelining two statistical MT systems, an approach that had until then been considered useless.

The paper frequently cites an earlier article by Simard et al. (2007), which was also briefly summarized at the beginning of the presentation and which is available online.

Outline

A brief outline of the paper follows. The introduction briefly reviews previous work and states that earlier attempts at this method yielded either no improvement or improvements that were not statistically significant.

Data: The data for the experiment came from an English-French translation memory from Symantec. It comprised about 55k sentences (0.8M words) in each language. The paper calls the English training data E and the French data F.
Architecture: The authors wanted to use the same system for both translation and post-editing. To avoid training the post-editor on output that the first system produced for its own training data, they built a third dataset F' using a 10-fold cross-validation approach on the results of the first translation system trained on datasets E and F. They then trained the second system on datasets F' and F, so that it learns to “translate” from the (French) output of the first system into “real-world” French.
Enhancements
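
The cross-validation scheme described under Architecture can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the `train` callable, the toy `memorizing_trainer`, the round-robin fold assignment, and the retraining of the first stage on all of E and F for use at translation time are assumptions made for the example.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces: a trainer takes parallel source/target sentences
# and returns a translate function. They stand in for a real SMT toolkit.
Translator = Callable[[str], str]
Trainer = Callable[[Sequence[str], Sequence[str]], Translator]


def build_f_prime(E: Sequence[str], F: Sequence[str],
                  train: Trainer, k: int = 10) -> List[str]:
    """Translate each sentence of E with a system that did not see it in training."""
    assert len(E) == len(F)
    n = len(E)
    f_prime = [""] * n
    for fold in range(k):
        # Assumption: folds are assigned round-robin by sentence index.
        held_out = [i for i in range(n) if i % k == fold]
        held_out_set = set(held_out)
        rest = [i for i in range(n) if i not in held_out_set]
        system = train([E[i] for i in rest], [F[i] for i in rest])
        for i in held_out:
            f_prime[i] = system(E[i])
    return f_prime


def train_pipeline(E: Sequence[str], F: Sequence[str],
                   train: Trainer, k: int = 10) -> Translator:
    """Train the first stage (E -> F) and the post-editor (F' -> F), then chain them."""
    f_prime = build_f_prime(E, F, train, k)
    first_stage = train(E, F)         # assumption: trained on all of E and F
    second_stage = train(f_prime, F)  # learns to map system output to reference French
    return lambda sentence: second_stage(first_stage(sentence))


def memorizing_trainer(src: Sequence[str], tgt: Sequence[str]) -> Translator:
    """Toy stand-in for an SMT system: looks up seen sentences, copies unseen ones."""
    table = dict(zip(src, tgt))
    return lambda sentence: table.get(sentence, sentence)


if __name__ == "__main__":
    E = [f"e{i}" for i in range(20)]
    F = [f"f{i}" for i in range(20)]
    pipeline = train_pipeline(E, F, memorizing_trainer)
    print(pipeline("e3"))
```

The point of the held-out folds is that F' then contains the kind of errors the first system makes on unseen input, which is what the post-editor will face at test time.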

Discussion

- questions

Conclusion

- evaluation
