Institute of Formal and Applied Linguistics Wiki

Differences

This shows you the differences between two versions of the page.

courses:rg:2013:false-positive-psychology [2013/10/10 00:53]
popel
courses:rg:2013:false-positive-psychology [2013/10/14 15:47] (current)
popel answers
Line 8:
   - What does "Correcting alpha levels" mean? Give an example.
  
 +===== Answers =====
 +
 +==== 1. ====
 +Most of the described issues are relevant for NLP as well (cf. hyper-parameters, unreported technical details, tokenization, etc.).
 +When human evaluation is involved in NLP, it shares many methodological properties/problems with Psychology.
 +However, in many NLP tasks we have (only) automatic evaluation (based on human-annotated gold data).
 +
 +Psy: "find evidence that an effect exists"
 +NLP: "our method for solving xy is better"
 +
 +NLP: experiments are easier to replicate ("code and Makefiles" may be published with the paper)
 +Psy: replication costs money and time, and you need different participants (so that they are not influenced by the earlier experiment)
 +
 +The last point is quite important. You might suggest that the same person should listen to both the Beatles and Kalimba (with some time in between), but there is a risk that a long-term effect of the first experiment influences (skews) the second one.
 +
 +==== 2. ====
 +P-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. (see Wikipedia [[http://en.wikipedia.org/wiki/P-value|P-value]]) 
 +
 +As a formula for a one-tailed test: p-value = P(X ≥ x | H0)
 +where X is the statistic we are measuring (e.g. the difference between the average age of Kalimba-listeners and the average age of Beatles-listeners),
 +x is the value of X we have actually measured (e.g. 1.4 years),
 +and H0 is the null hypothesis (no effect, no difference between the two groups, i.e. the difference X has a normal distribution with mean 0).
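The one-tailed formula can be sketched numerically. This is only an illustration under stated assumptions: the standard error of the difference (1.0 year here) is a made-up value that does not come from the paper, and `one_tailed_p_value` is a hypothetical helper name.

```python
import math

def one_tailed_p_value(x, standard_error):
    """P(X >= x | H0) when X is normal with mean 0 under H0.

    standard_error is the standard deviation of X under H0
    (an assumed value in this sketch, not taken from the paper).
    """
    z = x / standard_error  # standardize the observed difference
    # Upper tail of the standard normal: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Observed difference of 1.4 years, assumed standard error of 1.0 year.
p = one_tailed_p_value(1.4, 1.0)
print(f"p-value = {p:.4f}")
```

With a larger assumed standard error the same observed difference gives a larger p-value, so the number depends entirely on that assumption.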
 +    
 +    
 +If you set the significance level to the traditional 0.05, you get a false positive when p < 0.05 although the null hypothesis holds.
 +false-positive-rate =
 += P(p-value < 0.05 & H0)
 += P( P(X ≥ x | H0) < 0.05 & H0) != p-value
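A small simulation may help to see why the false-positive rate is not the same thing as a p-value. The sketch below reads the false-positive rate as the conditional probability P(p-value < 0.05 | H0), which is an interpretation and not the joint form written above. The group size, the unit variance and the function names are all assumptions chosen for illustration.

```python
import math
import random

def one_tailed_p_value(z):
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def simulated_false_positive_rate(trials=20000, n=20, alpha=0.05, seed=0):
    """Fraction of experiments with p < alpha when H0 is true by construction."""
    rng = random.Random(seed)
    # Standard error of the difference of two means (known sigma = 1, n per group).
    se = math.sqrt(2.0 / n)
    false_positives = 0
    for _ in range(trials):
        # Both groups come from the same distribution, so H0 holds.
        mean_a = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        mean_b = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = (mean_a - mean_b) / se
        if one_tailed_p_value(z) < alpha:
            false_positives += 1
    return false_positives / trials

rate = simulated_false_positive_rate()
print(f"simulated false-positive rate: {rate:.4f}")
```

Each simulated experiment has its own p-value, while the rate is a property of the whole procedure; under these assumptions it should come out close to the chosen threshold.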
 +
 +==== 3. ==== 
 +Alpha was originally defined by Neyman & Pearson as the Type I error rate, but this is incompatible with the p-value and Fisher's theory of significance testing.
 +Alpha is also used as a name for the "significance level", a threshold for the p-value which is traditionally set to 0.05.
 +When multiple hypotheses are tested, we should decrease this threshold (but how?).
 +See [[http://xkcd.com/882/|XKCD]].
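One common answer to "but how?" is the Bonferroni correction, which divides alpha by the number of tests. It is named here only as an example of correcting alpha levels, not as the method the paper prescribes. The sketch assumes the tests are independent, and m = 20 is an illustrative choice.

```python
def family_wise_error_rate(alpha, m):
    """Probability of at least one false positive in m independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-test threshold that keeps the family-wise error rate at most alpha."""
    return alpha / m

alpha, m = 0.05, 20  # m = 20 tests is an illustrative assumption
uncorrected = family_wise_error_rate(alpha, m)
corrected_threshold = bonferroni_alpha(alpha, m)
corrected = family_wise_error_rate(corrected_threshold, m)

print(f"uncorrected family-wise error rate: {uncorrected:.3f}")
print(f"Bonferroni per-test threshold:      {corrected_threshold:.4f}")
print(f"corrected family-wise error rate:   {corrected:.3f}")
```

Without correction, the chance of at least one spurious "significant" result across 20 tests is well above one half; with the corrected threshold it stays just below 0.05.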