courses:rg:2011:bleu-a-method-for-automatic-evaluation-of-machine-translation [2011/12/05 23:46] galuscakova (created)
courses:rg:2011:bleu-a-method-for-automatic-evaluation-of-machine-translation [2011/12/06 10:35] popel (comment and typos)
===== Introduction =====
The presented paper was //BLEU: a Method for Automatic Evaluation of Machine Translation// (Papineni et al., 2002).
===== Notes =====
The BLEU score is based on comparing the automatic (candidate) translation with reference human translations. Basically, the counts of the n-grams shared between the automatic translation and the reference translations are calculated and divided by the number of all n-grams in the candidate. This n-gram precision is further modified: if a particular shared n-gram occurs more often in the candidate translation than in the reference translations, its count is clipped to the maximum count observed in any single reference.
> No, it's not a "…"
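The clipping step described above can be sketched in Python as follows (the function name is illustrative; the usage example is the "the the the …" candidate from the paper, where only two of the seven occurrences of "the" are credited):

```python
from collections import Counter

def modified_ngram_precision(candidate, references, n=1):
    """Clipped (modified) n-gram precision in the spirit of the BLEU paper.

    candidate:  list of tokens of the candidate translation
    references: list of token lists, one per reference translation
    """
    def ngram_counts(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngram_counts(candidate)

    # For each n-gram, find the maximum count in any single reference;
    # candidate counts are clipped by this value.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

cand = "the the the the the the the".split()
refs = ["the cat is on the mat".split(),
        "there is a cat on the mat".split()]
print(modified_ngram_precision(cand, refs, n=1))  # 2/7: "the" is clipped to 2
```

Note that this is only one of the n-gram precisions; the full BLEU score combines them geometrically and multiplies by a brevity penalty.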
Jindřich noticed a mistake in Section 2, where it is written that the phrase "of the party" is shared only with Reference 2; however, it is also shared with Reference 3.
The experiments performed show a high correlation between the manual and automatic rankings of translation systems. BLEU is able to distinguish between good and bad translations, and between translations created by humans and by automatic systems.
A shortcoming of the paper is that these experiments were performed only for English. As later papers show, the BLEU score works worse especially for languages with free word order and for morphologically rich languages.
===== Conclusion =====
The paper was well presented, and the discussion raised several interesting questions on topics which were not clear from the paper. The paper is very interesting and readable, and it is useful especially for a better understanding of the BLEU score.