Differences
This shows you the differences between two versions of the page.
courses:rg:2012:sigtest-mt-zilka [2012/11/14 18:02] zilka |
courses:rg:2012:sigtest-mt-zilka [2013/12/02 22:18] (current) popel |
||
---|---|---|---|
Line 13: | Line 13: | ||
Does the bootstrap resampling (Section 5) assume normal (Gaussian) distribution of the scores of samples? | Does the bootstrap resampling (Section 5) assume normal (Gaussian) distribution of the scores of samples? | ||
===== Question 4 ===== | ===== Question 4 ===== | ||
- | We bootstrapped 1000 test sets, computed scoreA-scoreB on each, and we got -1000, | + | We bootstrapped 1000 test sets, computed scoreA-scoreB on each, and we got -1000, |
Based on Section 6, which system is better - A or B? | Based on Section 6, which system is better - A or B? | ||
Line 34: | Line 34: | ||
====== Presentation ====== | ====== Presentation ====== | ||
* We answered: | * We answered: | ||
- | * Question 1 - BLEU scores are: 1 - 1.0, 2 - 0.0 (or some smoothed value), 3 - 0.2 | + | * Question 1 - BLEU scores are: 1 - 1.0, 2 - not defined (0.0 or some smoothed value in practice), 3 - 0.2 (based on the incorrect formula in the paper which is missing 1/4) |
* Question 2 - broad sampling, samples distributed far apart -> {data_1, data_101, data_201, ...} | * Question 2 - broad sampling, samples distributed far apart -> {data_1, data_101, data_201, ...} | ||
Line 43: | Line 43: | ||
* non-consecutive samples (far apart) - for each of the sets BLEU varies much less - ±1.5 % | * non-consecutive samples (far apart) - for each of the sets BLEU varies much less - ±1.5 % | ||
* they assume and claim that there is no difference between comparing the output of 2 different MT systems and comparing the output of 1 MT system trained on different data | * they assume and claim that there is no difference between comparing the output of 2 different MT systems and comparing the output of 1 MT system trained on different data | ||
- | * Lukas Zilka complained about this assumption - they should have conducted some experiments to support their claim, as there is nothing that suggest | + | * Lukas Zilka complained about this assumption - they should have conducted some experiments to support their claim, as there is nothing that suggests |
===== Section 4, 5 ===== | ===== Section 4, 5 ===== | ||
Line 56: | Line 56: | ||
===== Martin' | ===== Martin' | ||
- | * two philosophical views of p-value - Fisher' | + | * two philosophical views of p-value - Fisher' |
- | * we always | + | * we usually |
* p-value = | * p-value = | ||
* P(T(X)> | * P(T(X)> | ||
- | * unfortunately we tend to view the p-value as P(H0|x) which it is not and we need to apply the Bayes's theorem to get it | + | * unfortunately we tend to view the p-value as P(H0|x) which it is not and we need to apply the Bayes' theorem to get it |
- | * bootstrap resampling can be viewed as p-value=P(d(x) > d(x_orig)|H0), | + | * bootstrap resampling can be viewed as p-value |
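
The paired bootstrap comparison discussed in these notes (resample the test set many times and compute scoreA-scoreB on each resample) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name `paired_bootstrap` and its parameters are hypothetical, and it assumes the corpus score is simply the mean of per-sentence scores, which is not true of BLEU (BLEU would have to be recomputed from n-gram counts on each resampled set).

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Return the fraction of resampled test sets on which system A beats B.

    Assumption (for illustration only): the test-set score is the mean of
    per-sentence scores, so it can be recomputed cheaply on each resample.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")
    if not scores_a:
        raise ValueError("the test set must not be empty")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores_a)
    wins_a = 0
    for _ in range(n_resamples):
        # Draw n sentence indices with replacement; the SAME indices are
        # used for both systems, which is what makes the test paired.
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(scores_a[i] - scores_b[i] for i in idx) / n
        if diff > 0:
            wins_a += 1
    return wins_a / n_resamples
```

If the returned fraction is, say, at least 0.95, one would conclude under these assumptions that A is better than B on roughly 95 % of the resampled test sets; a fraction near 0.5 gives no evidence either way.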