<latex>
\mathop{\arg\max}\limits_\theta \prod\limits_c P_\theta(c) = \mathop{\arg\max}\limits_\theta \prod\limits_c \sum\limits_e P(e) \times \prod\limits_{i=1}^{n} P_\theta(c_i \mid e_i)
</latex>
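
To make the objective concrete, here is a minimal sketch (mine, not from the paper) of how <latex>P_\theta(c)</latex> for a single ciphertext sequence decomposes into a sum over candidate plaintexts, assuming, as in the paper's setup, that c is a ciphertext sequence and e a candidate plaintext of the same length. The names //cipher_prob//, //lm_prob//, and //channel// are hypothetical stand-ins for the language-model prior and the channel-probability table.
<code python>
# Hypothetical sketch: probability of one ciphertext sequence c under the
# model above, summing over candidate plaintext sequences e of equal length.
def cipher_prob(c, candidate_plaintexts, lm_prob, channel):
    total = 0.0
    for e in candidate_plaintexts:
        p = lm_prob(e)  # P(e): fixed language-model prior over plaintexts
        for c_i, e_i in zip(c, e):
            p *= channel.get((c_i, e_i), 0.0)  # P_theta(c_i | e_i)
        total += p
    return total
</code>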

The key idea of Section 2 is the iterative EM algorithm, which is used to estimate <latex>\theta</latex> more efficiently in terms of both memory and running time.

With traditional EM, every time we update <latex>\theta</latex> we also need to update the pseudo-counts (in this case, the conditional probabilities <latex>P_\theta(c_i|e_i)</latex>), which takes O(|V|<sup>2</sup>) time. The heart of iterative EM is that each iteration runs only on a portion of the most frequent words in the vocabulary, and whenever the algorithm estimates <latex>P_\theta(c_i|e_i) > 0.5</latex>, it fixes that probability to 1 in the following iterations; hence the number of free parameters that need to be estimated shrinks after each iteration.
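
A minimal sketch of this scheme (my reconstruction, not the authors' implementation): //run_em// is a hypothetical stand-in for a standard EM pass over whichever channel parameters are still free, and the vocabulary is consumed in frequency-ordered slices.
<code python>
def iterative_em(vocab_by_freq, run_em, step=1000):
    # vocab_by_freq: word types sorted most-frequent first.
    # run_em(subset, fixed): hypothetical standard-EM pass returning
    # estimated channel probabilities P(c|e) for the free parameters.
    fixed = {}  # (c, e) pairs whose probability has been frozen at 1.0
    for end in range(step, len(vocab_by_freq) + step, step):
        subset = vocab_by_freq[:end]   # grow the vocabulary slice each round
        theta = run_em(subset, fixed)  # EM over the remaining free parameters
        for (c, e), p in theta.items():
            if p > 0.5:                # confident estimate: fix it to 1
                fixed[(c, e)] = 1.0
    return fixed
</code>
Freezing a parameter removes it from later iterations, so each round estimates fewer free parameters, which is where the memory and running-time savings described above come from.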

__**Practical question:**__ How do we initialize EM, and how do we start the first iteration?

Some other notes related to this paper:
  * Generative story: the generative process that produces the observed data given some hidden variables.
  * [[http://en.wikipedia.org/wiki/Chinese_restaurant_process|Chinese restaurant process]]