The paper was widely discussed throughout the whole session. The report tries to divide the points discussed in correspondence with the sections of the paper.
===== 1 Introduction =====
The paper proposes a semi-automatic translation evaluation metric that is claimed to be both well correlated with human judgment and inexpensive to run.
==== Question 1: Which translation is considered as "a good one" by (H)MEANT? ====
MEANT assumes that a good translation is one that preserves the meaning of the source, i.e. one that correctly conveys who did what to whom, when, where and why.
Martin further explained that HTER is a metric where the humans post-edit the MT output to transform it into a correct translation, and the score is then the number of edits needed, normalized by the length of the post-edited translation.
Matěj Korvas then pointed to an important difference between MEANT and HTER: MEANT uses reference translations, while HTER in effect creates a new, targeted reference by post-editing the MT output itself.
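For concreteness, here is a minimal sketch of such an edit-rate computation, assuming plain word-level edit distance (real (H)TER also allows block shifts; the function and the example sentences are illustrations, not taken from the paper):

<code python>
# Word-level edit rate: minimum number of insertions, deletions and
# substitutions needed to turn the MT output into the post-edited
# translation, normalized by the length of that translation.
def edit_rate(hyp, ref):
    h, r = hyp.split(), ref.split()
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + (h[i - 1] != r[j - 1]))
    return d[len(h)][len(r)] / len(r)

# One substitution in four words -> edit rate 0.25.
print(edit_rate("he signed a contract", "he signed a treaty"))
</code>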
==== Question 2: Which phases of annotations are there? ====
- SRL (semantic role labelling) of both the reference and the MT output; the labels are based on PropBank (but have nicer names)
- aligning the frames - first, predicates are aligned, and then, for each matching pair of predicates, their arguments are aligned as well
- ternary judging - deciding whether each matched role is translated correctly, incorrectly or only partially correctly (see the sketch after this list)
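To make the three phases more concrete, here is a minimal sketch in Python. The data structures, the matching rule and the judging rule are entirely hypothetical illustrations; in HMEANT the aligning and judging are done by human annotators, not by code:

<code python>
# A toy model of the three annotation phases described above.
from dataclasses import dataclass, field

@dataclass
class Frame:
    predicate: str                               # phase 1: SRL output
    roles: dict = field(default_factory=dict)    # role label -> filler text

# Phase 1 (taken as given here): SRL of the reference and of the MT output.
ref = Frame("sign", {"Agent": "the president", "Patient": "the treaty"})
out = Frame("signed", {"Agent": "president", "Patient": "a contract"})

# Phase 2: align predicates first, then the arguments of matched predicates.
def align(ref_frame, out_frame):
    # crude automatic stand-in for the human predicate-matching decision
    if ref_frame.predicate.rstrip("ed") != out_frame.predicate.rstrip("ed"):
        return []
    return [(label, ref_frame.roles[label], out_frame.roles[label])
            for label in ref_frame.roles if label in out_frame.roles]

# Phase 3: ternary judging of each matched role filler.
def judge(ref_filler, out_filler):
    if ref_filler == out_filler:
        return "correct"
    if set(ref_filler.split()) & set(out_filler.split()):
        return "partial"
    return "incorrect"

for label, r, o in align(ref, out):
    print(label, judge(r, o))   # Agent partial, Patient incorrect
</code>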
The group discussed whether HMEANT evaluations are really faster than HTER annotations, as the authors claim.
==== Question 3: What does the set J contain in the precision and recall formulas? ====
===== 6 Experiment: Monolinguals vs. bilinguals =====
Petr notes that, although it might seem surprising that monolinguals perform better in the evaluation than bilinguals, it is probably a consequence of the fact that bilinguals try to guess what the source was, while the monolinguals cannot do that.
===== Final Objections =====
For the rest of the session, Martin took the lead to express some more objections to the paper. The group agreed with the objections, and even added some more.
Table 3 seems to represent the main results of the paper.
It is shocking that the authors used **only 40 sentences**; such a small sample can hardly support any reliable conclusions.
The grid search they use to tune the parameters means to "try everything and find the best-correlating parameters".
They ran the grid search optimization on the 40 sentences they have, but then they evaluated HMEANT on the same data, so the reported correlations are measured on the very data the parameters were fitted to (see the sketch below).
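As an illustration of this objection, here is a toy sketch of the setup. The features, human judgments and weights are all made up; this is not the authors' code, only the tune-and-evaluate-on-the-same-data pattern:

<code python>
# Grid-search the weights of a toy metric on 40 sentences, then report
# the correlation on the SAME sentences - the setup the group objected to.
import itertools
import random

random.seed(0)
n = 40  # the size of the paper's data set

# Hypothetical per-sentence features (e.g. fractions of roles judged
# "correct" / "partial") and hypothetical human adequacy judgments.
feats = [(random.random(), random.random()) for _ in range(n)]
human = [0.6 * c + 0.2 * p + random.gauss(0, 0.1) for c, p in feats]

def metric(w, data):
    w_correct, w_partial = w
    return [w_correct * c + w_partial * p for c, p in data]

def kendall_tau(a, b):
    """Plain Kendall tau: (concordant - discordant) / all pairs."""
    pairs = [(i, j) for i in range(len(a)) for j in range(i + 1, len(a))]
    c = sum((a[i] - a[j]) * (b[i] - b[j]) > 0 for i, j in pairs)
    d = sum((a[i] - a[j]) * (b[i] - b[j]) < 0 for i, j in pairs)
    return (c - d) / len(pairs)

# "Try everything": exhaustive grid over the two weight parameters.
grid = [i / 10 for i in range(11)]
best = max(itertools.product(grid, grid),
           key=lambda w: kendall_tau(metric(w, feats), human))

# Measuring on the tuning data itself gives an optimistic estimate;
# an honest evaluation would use held-out sentences.
print("best weights:", best)
print("tau on the tuning data:", kendall_tau(metric(best, feats), human))
</code>

On held-out sentences the tuned correlation would typically be lower, which is exactly why the numbers in Table 3 should be read with caution.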
Martin also notes that the authors claim that all other existing evaluation metrics require lexical matches to consider a translation to be correct - which is not true, as the Meteor metric can also use paraphrases.
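To illustrate the point, a minimal sketch of lexical vs. paraphrase matching (the tiny table below is made up for illustration; Meteor's real paraphrase tables are large and automatically induced):

<code python>
# A word pair can "match" either lexically or through a paraphrase table.
PARAPHRASES = {
    "purchase": {"buy", "acquire"},
    "automobile": {"car"},
}

def matches(ref_word, hyp_word):
    if ref_word == hyp_word:                     # exact lexical match
        return True
    return hyp_word in PARAPHRASES.get(ref_word, set())

print(matches("purchase", "buy"))      # True, although no lexical match
print(matches("purchase", "sell"))     # False
</code>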
The group generally agreed that, although the ideas behind HMEANT seem reasonable, the paper itself is misleading and is not to be believed much (or probably at all). The proposed metric possibly correlates better with human judgment than the established metrics, but the experiments presented in the paper cannot be taken as evidence of that.