Report by Rudolf Rosa
The paper was widely discussed throughout the whole session. The report tries to divide the points discussed in correspondence with the sections of the paper.

===== 1 Introduction =====
The paper proposes a semi-automatic translation evaluation metric that is claimed to be both well correlated with human judgement (especially in comparison to BLEU) and less labour-intensive than HTER (which is claimed to be much more expensive).
==== Question 1: Which translation is considered as "a good one" by (H)MEANT? ====
MEANT assumes that a good translation is one where the reader understands correctly "Who did what to whom, when, where and why" - which, as Martin noted, is rather adequacy than fluency, and therefore a comparison with BLEU, which is more fluency-oriented, may not be entirely fair.
+ | |||
+ | Matin further explained that HTER is a metric where the humans post-edit the MT output to transform it into a correct translation, | ||
+ | Matěj Korvas then pointed to an important difference between MEANT and HTER: MEANT uses reference translations, | ||
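To make that contrast concrete, here is a minimal sketch of an HTER-style computation, assuming a plain word-level edit distance (the real metric also counts block shifts); the function names and the example sentences are only illustrative, not taken from the paper:
<code python>
# A minimal sketch of an HTER-style computation (word-level Levenshtein
# distance only; the real metric also allows block shifts).

def word_edit_distance(hyp, ref):
    """Minimum number of word insertions, deletions and substitutions."""
    hyp, ref = hyp.split(), ref.split()
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete a hypothesis word
                          d[i][j - 1] + 1,          # insert a reference word
                          d[i - 1][j - 1] + cost)   # substitute / keep
    return d[len(hyp)][len(ref)]

def hter(mt_output, post_edited):
    """Edits needed to turn the MT output into its human post-edited version,
    normalized by the length of the post-edited sentence."""
    return word_edit_distance(mt_output, post_edited) / len(post_edited.split())

# The post-edited sentence is produced by a human annotator; MEANT, in
# contrast, compares the MT output against an independent reference.
print(hter("John love Mary madly", "John loves Mary"))  # 2 edits / 3 words ~ 0.67
</code>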
Section **2 Related work** was skipped.
===== 3 MEANT: SRL for MT evaluation =====
Here we look at how the evaluation is actually done. It consists of three steps, all done by humans in HMEANT. In MEANT, the first step is done automatically.

==== Question 2: Which phases of annotation are there? ====

  - SRL (semantic role labelling) of both the reference and the MT output; the labels are based on PropBank (but have nicer names)
  - aligning the frames - first, predicates are aligned, and then, for each matching pair of predicates, their arguments are aligned as well
  - ternary judging - deciding whether each matched role is translated correctly, incorrectly or only partially correctly
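A minimal sketch of the data involved in the three phases might look as follows; the class and role names are hypothetical, and string equality merely stands in for decisions that are made by humans in HMEANT:
<code python>
# A hypothetical data model for the three (H)MEANT annotation phases.
from dataclasses import dataclass, field

@dataclass
class Frame:
    predicate: str
    args: dict = field(default_factory=dict)   # role label -> filler span

# Phase 1: semantic role labelling of both sides
# (done by humans in HMEANT, automatically in MEANT)
ref_frame = Frame("loves", {"Agent": "John", "Experiencer": "Mary"})
mt_frame  = Frame("loves", {"Agent": "John", "Experiencer": "Mary"})

# Phase 2: align the frames - predicates first, then the arguments
# of each matched pair of predicates
aligned = [(ref_frame, mt_frame)] if ref_frame.predicate == mt_frame.predicate else []

# Phase 3: ternary judging of every aligned role:
# "correct" / "partial" / "incorrect" (here faked by string equality)
judgements = {}
for ref_f, mt_f in aligned:
    judgements["Predicate"] = "correct"
    for role, ref_filler in ref_f.args.items():
        if role in mt_f.args:
            judgements[role] = "correct" if mt_f.args[role] == ref_filler else "incorrect"

print(judgements)   # {'Predicate': 'correct', 'Agent': 'correct', 'Experiencer': 'correct'}
</code>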
+ | |||
+ | The group discussed whether HMEANT evaluations are really faster than HTER annotations, | ||
+ | |||
+ | ==== Question 3: What does the set J contain in the // | ||
The answer is that it contains the arguments of the predicate. It actually contains all //argument roles// filled in the frame.
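For orientation, a schematic precision formula of roughly this kind (a simplified sketch, not copied from the paper, with //i// ranging over matched frames and //J// over the argument roles of frame //i//) can be written in LaTeX as:
<code latex>
\mathrm{precision} \approx
  \frac{\sum_i \bigl( w_{\mathrm{pred}} \, m_{i,\mathrm{pred}}
        + \sum_{j \in J} w_j \, m_{i,j} \bigr)}
       {\sum_i \bigl( w_{\mathrm{pred}}
        + \sum_{j \in J} w_j \, n^{\mathrm{MT}}_{i,j} \bigr)}
</code>
Here //m// is 1 for a correctly translated role, 0.5 for a partially correct one and 0 otherwise, //n// counts the fillers of role //j// in the MT output, and the //w// are role weights; recall is defined analogously over the reference side, and the two are combined into an f-score.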
+ | |||
+ | We further tried to compute the score for the following set of sentences: | ||
+ | * Reference: //John loves Mary.// | ||
+ | * MT1: //Stupid John loves Mary.// | ||
+ | * MT2: //John loves Jack.// | ||
+ | * MT3: //John hates Mary.// | ||
We supposed that the semantic roles are the same in all cases, i.e. Agent for //John// or //Stupid John//, Predicate for //loves// or //hates//, and Experiencer for //Mary//. It was explained by Martin that //Stupid John// has no inner structure in HMEANT, as there is no predicate in the phrase - HMEANT semantic annotation is shallow in that respect.
+ | |||
+ | For MT1, the HMEANT score is equal to 1, because, according to the paper, extra information is not penalized, and the translation is therefore regarded as being completely correct. | ||
+ | |||
+ | For MT2, // | ||
+ | |||
+ | For MT3, the predicates do not match, and therefore no arguments are taken into account. Martin and Ruda agreed that most probably not even a partial match of predicates can be annotated, as there is no support for such annotation in the formulas, which Martin suggested to be a possible flaw of the method. | ||
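To make the three cases concrete, here is a toy scorer - a simplified sketch assuming equal role weights, 0.5 credit for partial matches and an f-measure of precision and recall, which is not the paper's exact formula - that reproduces the scores discussed above; the ternary judgements are supplied by hand, as a human annotator would provide them:
<code python>
# Toy (H)MEANT-like scorer for a single frame; names and weights are
# illustrative, not taken from the paper.

def toy_hmeant(ref_roles, mt_roles, judgements):
    """ref_roles / mt_roles: role labels present on each side;
    judgements: human ternary judgement for every aligned role."""
    credit = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}
    matched = sum(credit[j] for j in judgements.values())
    if matched == 0:
        return 0.0
    precision = matched / len(mt_roles)
    recall = matched / len(ref_roles)
    return 2 * precision * recall / (precision + recall)

roles = ["Predicate", "Agent", "Experiencer"]   # John loves Mary.

# MT1 "Stupid John loves Mary.": the extra material inside the Agent filler
# is not penalized, so all three roles are judged correct -> 1.0
print(toy_hmeant(roles, roles,
                 {"Predicate": "correct", "Agent": "correct", "Experiencer": "correct"}))

# MT2 "John loves Jack.": the Experiencer is judged incorrect -> about 0.67
print(toy_hmeant(roles, roles,
                 {"Predicate": "correct", "Agent": "correct", "Experiencer": "incorrect"}))

# MT3 "John hates Mary.": the predicates do not align, so no roles are
# judged at all -> 0.0
print(toy_hmeant(roles, roles, {}))
</code>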
+ | |||
+ | Karel Bílek also noted that it is hard to annotate semantics on incorrect sentences, which is not mentioned in the paper. | ||
+ | |||
+ | ===== 4 Meta-evaluation methodology ===== | ||