courses:rg:2014:bleu [ufal wiki]

You can skip sections 4 and 5 in the paper.

1) Section 2.1.1. defines p_n as a fraction where the denominator is “the number of candidate n-grams in the test corpus”.
Compute this denominator for p_3, given a test corpus of three sentences with lengths of 3, 4, and 5 tokens.
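A quick way to check a hand count of this kind is the sketch below. It assumes that n-grams are counted within each sentence and never span a sentence boundary, so a sentence of L tokens contributes max(0, L - n + 1) n-grams. The helper name `count_candidate_ngrams` is hypothetical.

```python
def count_candidate_ngrams(sentence_lengths, n):
    """Total number of candidate n-grams in a corpus.

    Assumes n-grams are taken per sentence: a sentence of L tokens
    contains max(0, L - n + 1) n-grams (none if it is shorter than n).
    """
    return sum(max(0, length - n + 1) for length in sentence_lengths)


print(count_candidate_ngrams([3, 4, 5], 3))
```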

2) Do we need source-language sentences for computing BLEU?

3) Consider a source corpus with two German sentences:
Die Katze ist auf der Matte
Lesegruppe ist meine Lieblingsklasse

Reference translation 1:
The cat is on the mat
Reading group is my favourite class

Reference translation 2:
There is a cat on the mat
I love RG

Machine translation:
cat is cat
Reading group is my nightmare

Compute BLEU and the brevity penalty (BP) of the machine translation against the two references.
Use the standard BLEU definition: case-insensitive matching, N = 4, uniform weights w_n = 1/4, and log(x) denoting the natural logarithm ln(x).
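For checking hand computations, here is a minimal corpus-level BLEU sketch. It follows the definitions in the paper as summarized above: modified (clipped) n-gram precisions p_n summed over the whole corpus, BLEU = BP · exp(Σ w_n log p_n) with w_n = 1/4, and BP = 1 if c > r, else exp(1 - r/c). Assumptions to keep in mind: tokens come from lowercasing and splitting on whitespace, and the effective reference length r is the sum of the best-match (closest) reference lengths, with ties broken toward the shorter reference. Other implementations may tokenize differently or use the shortest reference instead, which changes BP. The function `corpus_bleu_sketch` is hypothetical, not a library API.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of all n-grams (as tuples) in one token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu_sketch(candidates, references, max_n=4):
    """Corpus-level BLEU with uniform weights w_n = 1/max_n.

    candidates: list of candidate sentences (strings)
    references: one list of reference strings per candidate sentence
    Returns (bleu, [p_1, ..., p_max_n], bp).
    """
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # candidate n-gram counts, summed over the corpus
    c = r = 0              # candidate length and effective reference length
    for cand, refs in zip(candidates, references):
        cand_toks = cand.lower().split()  # case-insensitive, whitespace tokens
        ref_toks = [ref.lower().split() for ref in refs]
        c += len(cand_toks)
        # Best-match length: the closest reference length (ties -> shorter).
        r += min((abs(len(t) - len(cand_toks)), len(t)) for t in ref_toks)[1]
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand_toks, n)
            max_ref = Counter()  # max count of each n-gram in any single reference
            for t in ref_toks:
                max_ref |= ngrams(t, n)
            matches[n - 1] += sum(min(k, max_ref[g]) for g, k in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    precisions = [m / tot if tot else 0.0 for m, tot in zip(matches, totals)]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    if min(precisions) == 0:
        return 0.0, precisions, bp  # log(0) is undefined, so BLEU is 0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    return bp * math.exp(log_avg), precisions, bp


cands = ["cat is cat", "Reading group is my nightmare"]
refs = [["The cat is on the mat", "There is a cat on the mat"],
        ["Reading group is my favourite class", "I love RG"]]
bleu, precisions, bp = corpus_bleu_sketch(cands, refs)
print(f"p_n = {precisions}, BP = {bp:.4f}, BLEU = {bleu:.4f}")
```

Changing the reference-length rule in the `r +=` line (for example, to always take the shortest reference) is an easy way to see how sensitive BP is to that convention.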

4) We computed a BLEU score for a given test set with three reference translations.
Then a new reference translation became available,
so we computed a new BLEU score for the same test set with four references (three old, one new).
Can the new BLEU score be lower than the old score? Can it be higher? Why?
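One way to explore this question is to change the reference set and watch each part of BLEU separately: the clipped counts depend on the maximum count of an n-gram over the references, while BP depends on which reference length is the best match. The sketch below isolates the brevity penalty, assuming the closest-reference-length rule with ties broken toward the shorter reference. The function names and the toy sentence lengths are hypothetical, chosen only for illustration.

```python
import math


def best_match_length(cand_len, ref_lens):
    """Reference length closest to the candidate length (ties -> shorter)."""
    return min(ref_lens, key=lambda r: (abs(r - cand_len), r))


def brevity_penalty(cand_lens, ref_lens_per_sentence):
    """Corpus BP: c = total candidate length, r = sum of best-match lengths."""
    c = sum(cand_lens)
    r = sum(best_match_length(cl, rls)
            for cl, rls in zip(cand_lens, ref_lens_per_sentence))
    return 1.0 if c > r else math.exp(1 - r / c)


# Hypothetical one-sentence corpus with a 10-token candidate.
three_refs = [[8, 13, 15]]
four_refs = [[8, 13, 15, 11]]  # the same three references plus a new one
print(brevity_penalty([10], three_refs), brevity_penalty([10], four_refs))
```

Try other lengths for the added reference, and then consider separately what an extra reference can do to the clipped n-gram counts.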

5) Can you think of any problems with the BLEU metric (for Czech or any other language)? Name them.


Institute of Formal and Applied Linguistics Wiki