Differences
This shows you the differences between two versions of the page.
| | courses:rg:2012:sigtest-mt-zilka [2012/11/14 18:02] zilka | | courses:rg:2012:sigtest-mt-zilka [2013/12/02 22:18] (current) popel |
|---|---|---|---|
| Line 13: | Line 13: | ||
| Does the bootstrap resampling (Section 5) assume normal (Gaussian) distribution of the scores of samples? | Does the bootstrap resampling (Section 5) assume normal (Gaussian) distribution of the scores of samples? | ||
| ===== Question 4 ===== | ===== Question 4 ===== | ||
| - | We bootstrapped 1000 test sets, computed scoreA-scoreB on each, and we got -1000, | + | We bootstrapped 1000 test sets, computed scoreA-scoreB on each, and we got -1000, |
| Based on Section 6, which system is better - A or B? | Based on Section 6, which system is better - A or B? | ||
| Line 34: | Line 34: | ||
| ====== Presentation ====== | ====== Presentation ====== | ||
| * We answered: | * We answered: | ||
| - | * Question 1 - BLEU scores are: 1 - 1.0, 2 - 0.0 (or some smoothed value), 3 - 0.2 | + | * Question 1 - BLEU scores are: 1 - 1.0, 2 - not defined (0.0 or some smoothed value in practice), 3 - 0.2 (based on the incorrect formula in the paper which is missing 1/4) |
| * Question 2 - broad sampling, with samples distributed far apart -> {data_1, data_101, data_201, ...} | * Question 2 - broad sampling, with samples distributed far apart -> {data_1, data_101, data_201, ...} | ||
| Line 43: | Line 43: | ||
| * non-consecutive samples (broad apart) - for each of the sets BLEU varies much less - +-1.5 % | * non-consecutive samples (broad apart) - for each of the sets BLEU varies much less - +-1.5 % | ||
| * they make an assumption and claim that there is no difference between comparing the output of 2 different MT systems and the output of 1 MT system that is just trained on different data | * they make an assumption and claim that there is no difference between comparing the output of 2 different MT systems and the output of 1 MT system that is just trained on different data | ||
| - | * Lukas Zilka complained about this assumption - they should have conducted some experiments to support their claim, as there is nothing that suggest | + | * Lukas Zilka complained about this assumption - they should have conducted some experiments to support their claim, as there is nothing that suggests |
| ===== Section 4, 5 ===== | ===== Section 4, 5 ===== | ||
| Line 56: | Line 56: | ||
| ===== Martin' | ===== Martin' | ||
| - | * two philosophical views of p-value - Fisher' | + | * two philosophical views of p-value - Fisher' |
| - | * we always | + | * we usually |
| * p-value = | * p-value = | ||
| * P(T(X)> | * P(T(X)> | ||
| - | * unfortunately we tend to view the p-value as P(H0|x) which it is not and we need to apply the Bayes's theorem to get it | + | * unfortunately we tend to view the p-value as P(H0|x), which it is not; we need to apply Bayes' theorem to get that probability |
| - | * bootstrap resampling can be viewed as p-value=P(d(x) > d(x_orig)|H0), | + | * bootstrap resampling can be viewed as p-value |
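
The bootstrap comparison discussed in Question 4 and in the notes above can be sketched roughly as follows. This is a minimal illustration, not the paper's exact procedure. It assumes each system has one score per test sentence and that the test-set score is the mean of those scores; that is a simplification, since corpus-level BLEU is not an average of sentence scores. The function name `paired_bootstrap` and its parameters are hypothetical, chosen only for this example.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Return the fraction of bootstrapped test sets on which A scores higher than B.

    scores_a, scores_b: per-sentence scores of systems A and B on the SAME
    sentences (assumption: the test-set score is the mean of these values).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")
    if not scores_a:
        raise ValueError("need at least one sentence")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores_a)
    wins_a = 0
    for _ in range(n_resamples):
        # Draw n sentence indices with replacement. The same indices are used
        # for both systems, which is what makes the comparison paired.
        idx = [rng.randrange(n) for _ in range(n)]
        # scoreA - scoreB on this bootstrapped test set
        delta = sum(scores_a[i] - scores_b[i] for i in idx) / n
        if delta > 0:
            wins_a += 1
    return wins_a / n_resamples


# Made-up scores, for illustration only.
example_a = [0.30, 0.25, 0.40, 0.35, 0.20]
example_b = [0.28, 0.26, 0.33, 0.30, 0.21]
print(paired_bootstrap(example_a, example_b))
```

The returned value is only the share of resampled test sets on which A scored higher. As the notes above stress, it should not be read as P(H0|x).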
