Satanjeev Banerjee and Alon Lavie
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments

1) Why is the correlation of METEOR higher in Table 1 (0.964) than in Table 2 (0.347)?

2) For the following two sentences:

System Translation: Two disappointed bidders went out of the room.
Reference Translation: A bidder who was disappointed leaved the room.

(Please do not use any stop-word list in this exercise.)

a) Which mappings would the METEOR metric find in each individual stage (exact, stems, synonyms)?
b) Compute Precision and Recall. (Please use fractions with numerators and denominators so I can see how you computed them.)
c) How many chunks are there in the system translation? What is the Penalty?
d) What is the final score?
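
For reference, the scoring formulas from the paper can be sketched as below. This is a minimal sketch, not the authors' implementation: the function name `meteor_score` and its parameters are assumptions made for illustration, and the unigram matching itself (exact, stems, synonyms) and the chunk counting are left to you. It uses Fmean = 10PR / (R + 9P), Penalty = 0.5 * (chunks / matched unigrams)^3 and Score = Fmean * (1 - Penalty), and keeps every value as a fraction.

```python
from fractions import Fraction


def meteor_score(matches, sys_len, ref_len, chunks):
    """Combine unigram match counts into METEOR components (hypothetical helper).

    matches -- number of system unigrams mapped to reference unigrams
    sys_len -- number of unigrams in the system translation
    ref_len -- number of unigrams in the reference translation
    chunks  -- number of chunks the matched unigrams fall into
    """
    if matches == 0:
        # No mapped unigrams: every component is zero by convention here.
        zero = Fraction(0)
        return {"precision": zero, "recall": zero, "fmean": zero,
                "penalty": zero, "score": zero}

    precision = Fraction(matches, sys_len)
    recall = Fraction(matches, ref_len)
    # Harmonic mean that weights recall nine times more than precision.
    fmean = 10 * precision * recall / (recall + 9 * precision)
    # Fragmentation penalty: more chunks per matched unigram, larger penalty.
    penalty = Fraction(1, 2) * Fraction(chunks, matches) ** 3
    score = fmean * (1 - penalty)
    return {"precision": precision, "recall": recall, "fmean": fmean,
            "penalty": penalty, "score": score}


# Illustrative pair (not the exercise sentences):
#   system:    "the president spoke to the audience"       (6 unigrams)
#   reference: "the president then spoke to the audience"  (7 unigrams)
# All 6 system unigrams match, in 2 chunks.
result = meteor_score(matches=6, sys_len=6, ref_len=7, chunks=2)
print(result["fmean"], result["penalty"], result["score"])  # 20/23 1/54 530/621
```

Working with `Fraction` keeps the numerators and denominators visible, which is the form asked for in b).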

3) Do you think that METEOR is a sensitive metric for Czech? What types of errors does METEOR not penalize?

