
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


courses:rg:2012:riezler-iii [2012/11/19 18:56]
korvas correction regarding the table of p-values
courses:rg:2012:riezler-iii [2012/12/03 10:28] (current)
korvas added the questions for the answers
Line 1: Line 1:
-gg===Martin's questions===
+===Martin's questions===
 + 
 +  1) 
 +  How would you implement approximate randomization for BLEU based on Figure 1, 
 +  namely the part "Shuffle variable tuples between system X and Y with probability 0.5"? 
 +  What are the variable tuples? Can you write more detailed pseudocode (or code in C, Java, Perl, ...)? 
 +  How would you implement the next part, "Compute pseudo-statistic |S_Xr − S_Yr| on shuffled data"? 
 + 
 +  2) 
 +  On a testset of 1000 sentences, systems X and Y have exactly the same output except for one sentence: 
 +  REF = Hello 
 +  MT_X= Hello 
 +  MT_Y= Hi 
 +  You ran the approximate randomization test (based on Figure 1, R=10000 samples) 
 +  to check whether the improvement in BLEU is significant. What was the result (i.e. the p-value)? 
 + 
 +  3) 
 +  What would be the p-value for the bootstrap test based on a) Figure 2, b) Koehn2004 (the last RG paper)? 
 +  This is a bit tricky. Just estimate the expected value of the p-value (i.e. 1 - level_of_confidence). 
 + 
 +  4) 
 +  What would be the p-value for the non-strict inequality, i.e. the hypothesis "system X is better than or equal to Y"? 
 1. The question aimed to find out whether we would repeatedly count the matching n-grams between the MT output and the reference. They can be pre-computed once for each sentence and then aggregated without resorting to string matching again.
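The pre-computation idea in answer 1 could be sketched roughly as follows. This is a minimal illustration rather than the paper's implementation: it assumes that the "variable tuples" are per-sentence BLEU sufficient statistics (hypothesis length, reference length, clipped n-gram matches and n-gram totals for n = 1..4), that there is a single reference per sentence, and that the p-value is estimated as (c + 1) / (R + 1). The names `sentence_stats`, `corpus_bleu` and `approximate_randomization` are hypothetical helpers introduced only for this example.

```python
import math
import random
from collections import Counter

MAX_N = 4  # assumption: standard BLEU-4 with uniform weights


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_stats(hyp, ref):
    """Pre-compute one tuple per sentence; string matching happens only here."""
    h, r = hyp.split(), ref.split()
    stats = [len(h), len(r)]
    for n in range(1, MAX_N + 1):
        hyp_counts, ref_counts = ngrams(h, n), ngrams(r, n)
        stats.append(sum((hyp_counts & ref_counts).values()))  # clipped matches
        stats.append(max(len(h) - n + 1, 0))                   # n-grams in hypothesis
    return tuple(stats)


def corpus_bleu(stats_list):
    """Corpus BLEU from summed per-sentence tuples (integer additions only)."""
    totals = [sum(column) for column in zip(*stats_list)]
    hyp_len, ref_len = totals[0], totals[1]
    if hyp_len == 0:
        return 0.0
    log_precision = 0.0
    for n in range(MAX_N):
        matches, total = totals[2 + 2 * n], totals[3 + 2 * n]
        if matches == 0 or total == 0:
            return 0.0
        log_precision += math.log(matches / total) / MAX_N
    log_brevity_penalty = min(0.0, 1.0 - ref_len / hyp_len)
    return math.exp(log_brevity_penalty + log_precision)


def approximate_randomization(stats_x, stats_y, trials=10000, seed=0):
    """Estimate the p-value for the observed |BLEU_X - BLEU_Y|."""
    rng = random.Random(seed)
    observed = abs(corpus_bleu(stats_x) - corpus_bleu(stats_y))
    count = 0
    for _ in range(trials):
        shuffled_x, shuffled_y = [], []
        for sx, sy in zip(stats_x, stats_y):
            if rng.random() < 0.5:  # swap this sentence's tuples with probability 0.5
                sx, sy = sy, sx
            shuffled_x.append(sx)
            shuffled_y.append(sy)
        pseudo = abs(corpus_bleu(shuffled_x) - corpus_bleu(shuffled_y))
        if pseudo >= observed:
            count += 1
    return (count + 1) / (trials + 1)
```

Each system's output would be converted once with `sentence_stats`, and the two resulting lists passed to `approximate_randomization`. Every shuffle then costs only a pass of integer sums over the test set, which is the point of the answer above.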
  
