====== Statistical Post-Editing for a Statistical MT System ======
//Hanna Béchara, Yanjun Ma, Josef van Genabith//

MT Summit 2011
[[http://
Presented by Rudolf Rosa

Report by Jindřich Helcl
===== Introduction =====
The article is about statistical post-editing (SPE) applied to the output of a statistical machine translation system. Its most interesting claim is that the authors achieved an improvement of about 2 BLEU points by pipelining two statistical MT systems, an approach that had until then been considered useless.
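The pipelining idea can be sketched as plain function composition: the SPE stage is itself an SMT system, trained to map first-stage MT output to reference translations. The functions below are illustrative toys standing in for the two real phrase-based systems, not the paper's implementation.

```python
# Toy sketch of pipelining two statistical MT systems
# (illustrative only; in the paper both stages are real SMT systems).

def first_stage_mt(source_sentence: str) -> str:
    # Hypothetical baseline MT system with a systematic error:
    # it always emits American spelling.
    return source_sentence.replace("colour", "color")

def spe_stage(mt_output: str) -> str:
    # Hypothetical statistical post-editing stage, trained on
    # (MT output, reference) pairs; here it has learned to undo
    # the first stage's systematic error.
    return mt_output.replace("color", "colour")

def translate(source_sentence: str) -> str:
    # The pipeline is plain composition: source -> MT -> post-edited MT.
    return spe_stage(first_stage_mt(source_sentence))

print(translate("the colour red"))
```

The point of the sketch is only that the second stage can correct systematic errors of the first, which is where the reported BLEU gain would come from.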
  * **Enhancements:**
    * Contextual SPE: the translated words were created by concatenating the English word and its translation, separated by a hash sign, into one resulting word. This new dataset is called **E#**.
    * Next, they stripped off the #-postfixes of non-translated (OOV) words.
    * Then, they computed an alignment between the source text and the translation and used the contextual enhancement only where the alignment weight was above some threshold.
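The contextual construction above can be sketched as follows; the function name, the data shapes, and the exact token order (English source word before the translation) are assumptions for illustration, not taken from the paper.

```python
# Sketch of the contextual-SPE input construction described above.
# Each word of the first-stage MT output is glued to its aligned
# English source word with '#'; words the first stage left
# untranslated (OOV) keep only the plain word, i.e. their
# #-context is stripped. (All names here are illustrative.)

def contextualize(source_tokens, mt_tokens, alignment, oov):
    """alignment maps an MT-token index to a source-token index;
    oov holds the MT-token indices that were left untranslated."""
    out = []
    for i, mt_word in enumerate(mt_tokens):
        if i in oov or i not in alignment:
            out.append(mt_word)  # no #-postfix for OOV words
        else:
            out.append(source_tokens[alignment[i]] + "#" + mt_word)
    return out

# English source, first-stage MT output into French, word alignment:
print(contextualize(["the", "blue", "house"],
                    ["la", "house", "bleue"],
                    {0: 0, 2: 1},   # la <- the, bleue <- blue
                    oov={1}))       # "house" was not translated
# -> ['the#la', 'house', 'blue#bleue']
```

The second-stage SPE system is then trained on these hash-concatenated tokens, so it can condition its edits on the source word as well as the MT output.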
===== Discussion =====
The following topics were discussed at the RG meeting:
  * The main possible flaw of the experiment was considered to be the small size of the data (only 55k sentences). On the other hand, the data came from a translation memory.
  * In the paper, the authors state that they use 10-fold cross-validation.
  * We found it pointless for the authors to present explicit results of Contextual SPE without removing the #-postfixes, since it was obvious that they should be removed right away. This simple objection led us to the idea of removing the #-postfixes even before the OOV token is put into the language model, which could bring some improvement.
  * When the authors wrote about Contextual SPE with thresholding,
===== Conclusion =====
Although the structure of the paper was often criticized and possible flaws were found, the article was considered well-readable and simple enough to serve as the opening article of this semester's RG.