# Differences

This shows you the differences between two versions of the page.

courses:rg:2012:riezler-iii [2012/11/19 18:54] korvas: written down, rewritten
courses:rg:2012:riezler-iii [2012/12/03 10:28] (current) korvas: added the questions for the answers

2012/12/03 10:28 korvas: added the questions for the answers
2012/11/19 18:56 korvas: correction regarding the table of p-values
2012/11/19 18:54 korvas: written down, rewritten

Line 1 (old) / Line 1 (new):

- gg===Martin's questions===
+ ===Martin's questions===
+
+ 1)
+ How would you implement approximate randomization for BLEU based on Figure 1,
+ namely the part "Shuffle variable tuples between system X and Y with probability 0.5"?
+ What are the variable tuples? Can you write a more detailed pseudo (or C,Java,Perl,...) code?
+ How would you implement the next part "Compute pseudo-statistic |S_Xr − S_Yr | on shuffled data"?
+
+ 2)
+ On a testset of 1000 sentences, systems X and Y have exactly the same output except for one sentence:
+ REF = Hello
+ MT_X= Hello
+ MT_Y= Hi
+ You computed approximate randomization test (based on Figure 1, R=10000 samples)
+ to check whether the improvement in BLEU is significant. What were the results (i.e. p-value)?
+
+ 3)
+ What would be the p-value for bootstrap test based on a) Figure 2, b) Koehn2004 (the last RG paper)?
+ This is a bit tricky. Just estimate the expected value of p-value (i.e. 1 - level_of_confidence).
+
+ 4)
+ What would be the p-value for non-strict inequality, i.e. hypothesis "system X is better or equal than Y"?
  1. The question aimed to find out whether we would repeatedly count the matching n-grams between the MT output and the reference.
  They can be pre-computed for each sentence and then aggregated without recurring to string matching.

Line 10 (old) / Line 32 (new):

  3., 4.
- ^ ^ p (x>y) ^ p (x>=y) ^
+ ^ ^ p (x>y) ^ p (x≥y) ^
  ^ approx. rand  |    1.00 | (0+1)/(10000+1) |
  ^ boot. Riezler |    0.26 | (0+1)/(10000+1) |

Line 18 (old) / Line 40 (new):

  //In the following, [i, j] refers to the i-th content row, j-th content column of the above table.//
- [3,1] ... MT_Y can never be better than MT_X,
+ [3,2] ... MT_Y can never be better than MT_X,
  and H_0 says that (MT_X < MT_Y)
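
As an illustration of question 1 and its answer, the following is a minimal Python sketch of approximate randomization over pre-computed per-sentence statistics. It is not the wiki author's code: the function names (`sentence_stats`, `corpus_bleu`, `approximate_randomization`), the single-reference setup, the 4-gram BLEU order, and the unsmoothed BLEU that returns 0 when any n-gram order has no match are all assumptions made for the example. The "variable tuples" are taken to be the per-sentence tuples of n-gram match counts and lengths, so each shuffle only swaps tuples and re-sums integers, with no string matching inside the loop. The p-value uses the (c+1)/(R+1) form that appears in the table.

```python
import math
import random
from collections import Counter

MAX_N = 4  # assumed BLEU n-gram order


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_stats(hyp, ref):
    """Pre-compute one sentence's tuple:
    (hyp_len, ref_len, match_1, total_1, ..., match_N, total_N)."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    stats = [len(hyp_toks), len(ref_toks)]
    for n in range(1, MAX_N + 1):
        h, r = ngram_counts(hyp_toks, n), ngram_counts(ref_toks, n)
        stats.append(sum((h & r).values()))  # clipped n-gram matches
        stats.append(sum(h.values()))        # n-grams in the hypothesis
    return tuple(stats)


def corpus_bleu(stats_list):
    """Corpus BLEU from aggregated tuples (unsmoothed; 0.0 if any order has no match)."""
    total = [sum(col) for col in zip(*stats_list)]
    hyp_len, ref_len = total[0], total[1]
    log_prec = 0.0
    for n in range(MAX_N):
        match, count = total[2 + 2 * n], total[3 + 2 * n]
        if match == 0 or count == 0:
            return 0.0
        log_prec += math.log(match / count) / MAX_N
    log_bp = min(0.0, 1.0 - ref_len / hyp_len)  # log of the brevity penalty
    return math.exp(log_bp + log_prec)


def approximate_randomization(stats_x, stats_y, trials=10000, seed=0):
    """Two-sided approximate randomization test on |BLEU_X - BLEU_Y|."""
    rng = random.Random(seed)
    observed = abs(corpus_bleu(stats_x) - corpus_bleu(stats_y))
    at_least_as_extreme = 0
    for _ in range(trials):
        shuf_x, shuf_y = [], []
        for sx, sy in zip(stats_x, stats_y):
            if rng.random() < 0.5:  # swap this sentence's tuples with probability 0.5
                sx, sy = sy, sx
            shuf_x.append(sx)
            shuf_y.append(sy)
        pseudo = abs(corpus_bleu(shuf_x) - corpus_bleu(shuf_y))
        if pseudo >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (trials + 1)
```

When two systems differ in exactly one sentence, as in question 2, each shuffle either keeps or swaps that single pair, so the absolute difference equals the observed one in every trial and this sketch returns a p-value of 1.0, in line with the 1.00 entry for approximate randomization in the table.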
