Differences
courses:rg:2013:false-positive-psychology [2013/10/10 00:53] popel |
courses:rg:2013:false-positive-psychology [2013/10/14 15:47] (current) popel answers |
||
---|---|---|---|
Line 8: | Line 8: | ||
- What does " | - What does " | ||
+ | ===== Answers ===== | ||
+ | |||
+ | ==== 1. ==== | ||
+ | Most of the described issues are relevant for NLP as well (cf. hyper-parameters, | ||
+ | When human evaluation is involved in NLP, it shares many methodological properties/ | ||
+ | However, in many NLP tasks we have (only) automatic evaluation (based on human-annotated gold data). | ||
+ | |||
+ | PSY: "find evidence that an effect exists" | ||
+ | NLP: "our method for solving xy is better" | ||
+ | |||
+ | NLP: easier to replicate experiments ("code and Makefiles" | ||
+ | PSY: replication costs money and time, you need different people (so they are not influenced) | ||
+ | |||
+ | The last point is quite important. You might suggest that the same person listen to both the Beatles and Kalimba (with some time in between), but there is a risk that a long-term effect of the first experiment influences (skews) the second one. | ||
+ | |||
+ | ==== 2. ==== | ||
+ | The p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. (see Wikipedia [[http:// | ||
+ | |||
+ | Using a formula for one-tailed test: p-value = P( X≥x | H0) | ||
+ | where X is the statistic we are measuring (e.g. the difference between the average age of Kalimba-listeners and the average age of Beatles-listeners), | ||
+ | x is the value of X we have actually measured (e.g. 1.4 years), | ||
+ | and H0 is the null hypothesis (no effect, no difference between the two groups, i.e. the true difference is 0 and X has a normal distribution with mean 0). | ||
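The one-tailed formula above can be sketched numerically. This is a minimal illustration, assuming X is normally distributed with mean 0 under H0; the standard error of 0.8 years is a hypothetical value chosen only for the example (the text gives only the observed difference of 1.4 years), and the function name is likewise made up.

```python
import math


def one_tailed_p_value(x, standard_error):
    """Return P(X >= x | H0), assuming X ~ Normal(0, standard_error**2) under H0."""
    z = x / standard_error
    # Upper-tail probability of the standard normal, via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


observed_difference = 1.4     # years, the example value from the text
assumed_standard_error = 0.8  # hypothetical, not taken from the text

p_value = one_tailed_p_value(observed_difference, assumed_standard_error)
print(f"one-tailed p-value: {p_value:.4f}")
```

A larger assumed standard error gives a larger p-value for the same observed difference, which is why the measured x alone does not determine significance.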
+ | | ||
+ | | ||
+ | If you set the traditional significance level to 0.05, you get a false positive when p<0.05 although the null hypothesis holds. | ||
+ | false-positive-rate = | ||
+ | = P(p-value < 0.05 & H0) | ||
+ | = P( P( X≥x | H0) < 0.05 & H0) != p-value | ||
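The difference between a single p-value and the false-positive rate can be illustrated with a small simulation. This is only a sketch under stated assumptions: both groups are drawn from the same normal distribution with a known standard deviation (so H0 holds in every run and a simple z-test applies), and the group size, trial count, and function names are hypothetical. Because every run is generated under H0, the estimate corresponds to P(p-value < 0.05 | H0), the conditional form of the quantity above.

```python
import math
import random


def upper_tail_p_value(z):
    """One-tailed p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def simulate_false_positive_rate(trials=2000, group_size=20, alpha=0.05, seed=0):
    """Estimate how often p < alpha when the null hypothesis is true by construction."""
    rng = random.Random(seed)
    sigma = 1.0  # assumed known standard deviation, identical for both groups
    standard_error = sigma * math.sqrt(2.0 / group_size)
    false_positives = 0
    for _ in range(trials):
        # Both groups come from the same distribution, so there is no real effect.
        group_a = [rng.gauss(0.0, sigma) for _ in range(group_size)]
        group_b = [rng.gauss(0.0, sigma) for _ in range(group_size)]
        difference = sum(group_a) / group_size - sum(group_b) / group_size
        if upper_tail_p_value(difference / standard_error) < alpha:
            false_positives += 1
    return false_positives / trials


rate = simulate_false_positive_rate()
print(f"estimated false-positive rate: {rate:.3f}")
```

The estimate should land near the chosen threshold regardless of which p-value any individual run produced, which is the sense in which the false-positive rate is not a p-value.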
+ | |||
+ | ==== 3. ==== | ||
+ | Alpha was originally defined by Neyman & Pearson as Type I error rate, but this is incompatible with p-value and Fisher' | ||
+ | Alpha is also used as a name for the " | ||
+ | When multiple experiments are tested, we should decrease this threshold (but how?). | ||
+ | See [[http:// |
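One common answer to the "but how?" question is the Bonferroni correction, which divides the threshold by the number of tests. The sketch below assumes that technique; it is not necessarily what the linked page recommends, and the p-values and function names are made up for illustration.

```python
def bonferroni_threshold(alpha, number_of_tests):
    """Per-test threshold that keeps the family-wise error rate at most alpha."""
    if number_of_tests < 1:
        raise ValueError("number_of_tests must be at least 1")
    return alpha / number_of_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Return one True/False decision per p-value using the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical p-values from four experiments (not from the text).
hypothetical_p_values = [0.04, 0.012, 0.003, 0.20]
decisions = significant_after_bonferroni(hypothetical_p_values)
print(decisions)
```

Note that 0.04 would pass the uncorrected 0.05 threshold but fails the corrected one. Bonferroni is conservative; less strict procedures exist, but this is the simplest to state.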