===== Answers =====

==== 1. ====
Most of the described issues are relevant for NLP as well (cf. hyper-parameters, ...).
When human evaluation is involved in NLP, it shares many methodological properties/problems with psychology experiments.
However, in many NLP tasks we have (only) automatic evaluation (based on human-annotated gold data).

Psy: "find evidence that an effect exists"
NLP: "our method for solving xy is better"

NLP: easier to replicate experiments ("code and Makefiles")
Psy: replication costs money and time, and you need different people (so they are not influenced)

The last point is quite important. You may suggest that the same person should listen to both the Beatles and Kalimba (after some time), but there is a risk that a long-term effect of the first experiment influences (skews) the second one.

| + | ==== 2. ==== | ||
| + | P-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. (see Wikipedia [[http:// | ||
| + | |||
Using the formula for a one-tailed test: p-value = P( X≥x | H0 ),
where X is the statistic we are measuring (e.g. the difference between the average age of Kalimba listeners and the average age of Beatles listeners),
x is the value of X we have actually measured (e.g. 1.4 years),
and H0 is the null hypothesis (no effect: the true difference between the two groups is 0, so X has a normal distribution with mean 0).

If you set the traditional significance level to 0.05, you get a false positive when p < 0.05 but the null hypothesis actually holds.
  false-positive rate = P( p-value < 0.05 | H0 )
                      = P( P( X≥x | H0 ) < 0.05 | H0 )  != p-value

| + | ==== 3. ==== | ||
Alpha was originally defined by Neyman & Pearson as the Type I error rate, but this is incompatible with the p-value and Fisher's approach.
Alpha is also used as a name for the "significance level", i.e. the threshold below which a p-value is considered significant.
When multiple experiments are tested, we should decrease this threshold (but how?).
See [[http://...
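
One common (though conservative) answer to the "but how?" question is the Bonferroni correction, sketched below with hypothetical p-values: each individual p-value is compared against alpha divided by the number of tests, which keeps the family-wise false-positive rate at or below alpha. (This is a standard remedy, not necessarily the one proposed in the linked article.)

<code python>
# Minimal sketch of the Bonferroni correction. The p-values are hypothetical;
# the point is only the mechanics of dividing the significance level by the
# number of tests performed.
alpha = 0.05
p_values = [0.003, 0.020, 0.041, 0.300]   # p-values from m related tests

m = len(p_values)
threshold = alpha / m                     # Bonferroni-corrected significance level

for i, p in enumerate(p_values, start=1):
    verdict = "reject H0" if p < threshold else "keep H0"
    print(f"test {i}: p = {p:.3f} (threshold {threshold:.4f}) -> {verdict}")
</code>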
