Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
courses:rg:2012:riezler-iii [2012/11/19 18:54] korvas written up, rewritten |
courses:rg:2012:riezler-iii [2012/12/03 10:28] (current) korvas added the questions for the answers |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | gg===Martin' | + | ===Martin' |
+ | |||
+ | 1) | ||
+ | How would you implement approximate randomization for BLEU based on Figure 1, | ||
+ | namely the part " | ||
+ | What are the variable tuples? Can you write a more detailed pseudo (or C, | ||
+ | How would you implement the next part " | ||
+ | |||
+ | 2) | ||
+ | On a test set of 1000 sentences, systems X and Y have exactly the same output except for one sentence: | 
+ | REF = Hello | ||
+ | MT_X= Hello | ||
+ | MT_Y= Hi | ||
+ | You ran an approximate randomization test (based on Figure 1, R=10000 samples) | 
+ | to check whether the improvement in BLEU is significant. What was the result (i.e. the p-value)? | 
+ | |||
+ | 3) | ||
+ | What would be the p-value for the bootstrap test based on a) Figure 2, b) Koehn2004 (the last RG paper)? | 
+ | This is a bit tricky. Just estimate the expected value of the p-value (i.e. 1 - level_of_confidence). | 
+ | |||
+ | 4) | ||
+ | What would be the p-value for a non-strict inequality, i.e. hypothesis " | 
1. The question aimed to find out whether we would repeatedly count the matching n-grams between the MT output and the reference. They can be pre-computed for each sentence and then aggregated without resorting to string matching. | 1. The question aimed to find out whether we would repeatedly count the matching n-grams between the MT output and the reference. They can be pre-computed for each sentence and then aggregated without resorting to string matching. | ||
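The pre-computation described in answer 1 could look roughly like the sketch below. It is only an illustration under stated assumptions, not the paper's code: the names `sentence_stats`, `bleu` and `approximate_randomization` are made up here, a single reference per sentence and 4-gram BLEU without smoothing are assumed, and the test is the two-sided variant with p = (c+1)/(R+1) as we read Figure 1.

```python
import math
import random
from collections import Counter

MAX_N = 4  # assumption: standard 4-gram BLEU, one reference per sentence


def ngram_counts(tokens, n):
    """Count the n-grams of one tokenized sentence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_stats(hyp, ref):
    """Sufficient statistics for one sentence, computed only once:
    (hyp_len, ref_len, match_1, total_1, ..., match_4, total_4)."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    stats = [len(hyp_toks), len(ref_toks)]
    for n in range(1, MAX_N + 1):
        h, r = ngram_counts(hyp_toks, n), ngram_counts(ref_toks, n)
        stats.append(sum((h & r).values()))  # clipped n-gram matches
        stats.append(sum(h.values()))        # n-grams in the hypothesis
    return tuple(stats)


def bleu(total):
    """Corpus BLEU from summed statistics; no string matching needed."""
    hyp_len, ref_len = total[0], total[1]
    log_precision = 0.0
    for n in range(MAX_N):
        match, count = total[2 + 2 * n], total[3 + 2 * n]
        if match == 0 or count == 0:
            return 0.0
        log_precision += math.log(match / count) / MAX_N
    log_brevity_penalty = min(0.0, 1.0 - ref_len / hyp_len)
    return math.exp(log_brevity_penalty + log_precision)


def approximate_randomization(stats_x, stats_y, trials=10000, seed=0):
    """Two-sided approximate randomization test on |BLEU_X - BLEU_Y|."""
    rng = random.Random(seed)
    width = len(stats_x[0])

    def total(stats):
        return [sum(column) for column in zip(*stats)]

    actual = abs(bleu(total(stats_x)) - bleu(total(stats_y)))
    hits = 0
    for _ in range(trials):
        sum_x, sum_y = [0] * width, [0] * width
        for sx, sy in zip(stats_x, stats_y):
            if rng.random() < 0.5:  # swap the two outputs for this sentence
                sx, sy = sy, sx
            for i in range(width):
                sum_x[i] += sx[i]
                sum_y[i] += sy[i]
        if abs(bleu(sum_x) - bleu(sum_y)) >= actual:
            hits += 1
    return (hits + 1) / (trials + 1)
```

In the setting of question 2, only one sentence differs, so every shuffle either keeps the two systems as they are or exchanges them. The absolute BLEU difference is the same in both cases, every trial counts as a hit, and this sketch returns (R+1)/(R+1) = 1.00, in line with the approx. rand entry for p (x>y) in the table. A practical speed-up would be to shuffle only the sentences whose statistics differ.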
Line 10: | Line 32: | ||
3., 4. | 3., 4. | ||
- | ^ ^ p (x>y) ^ p (x>=y) ^ | + | ^ ^ p (x>y) ^ p (x≥y) ^ |
^ approx. rand | 1.00 | (0+1)/ | ^ approx. rand | 1.00 | (0+1)/ | ||
^ boot. Riezler | 0.26 | (0+1)/ | ^ boot. Riezler | 0.26 | (0+1)/ | ||
Line 18: | Line 40: | ||
//In the following, [i, j] refers to the i-th content row, j-th content column of the above table.// | //In the following, [i, j] refers to the i-th content row, j-th content column of the above table.// | ||
- | [3,1] ... MT_Y can never be better than MT_X, | + | [3,2] ... MT_Y can never be better than MT_X, |
and H_0 says that (MT_X < MT_Y) | and H_0 says that (MT_X < MT_Y) | ||