[ Skip to the content ]

Institute of Formal and Applied Linguistics Wiki


[ Back to the navigation ]

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
courses:rg:2011:deciphering_foreign_language [2012/01/07 13:41]
tran
courses:rg:2011:deciphering_foreign_language [2012/01/07 14:20]
tran
Line 29: Line 29:
 \mathop {\arg \max }\limits_\theta  \prod\limits_c {P_\theta  (c)}  = \mathop {\arg \max }\limits_\theta  \prod\limits_c {\sum\limits_e {P(e) \times \prod\limits_{i = 1}^n {P_\theta  (c_i |e_i )} } }  \mathop {\arg \max }\limits_\theta  \prod\limits_c {P_\theta  (c)}  = \mathop {\arg \max }\limits_\theta  \prod\limits_c {\sum\limits_e {P(e) \times \prod\limits_{i = 1}^n {P_\theta  (c_i |e_i )} } } 
 </latex> </latex>
 +
 +The key idea of section 2 is the Iterative EM algorithm, which is used to estimate <latex>\theta</latex> more effectively in term of saving memory and running time (complexity).
 +
 +If we use traditional EM, every time we update <latex>\theta</latex>, we also need to update pseudo-counts (in this case, conditional probabilities <latex>{P_\theta  (c_i |e_i )</latex>.) It leads to O(|V|<sup>2</sup>) time. The heart of Iterative EM is that at every iteration, the algorithm run on a proportion of the most frequent words in vocabulary, and whenever the algorithm estimates <latex>P(c_i|e_i) > 0.5 </latex>, it fixes that probability equal to 1 in the following iteration, hence, the number of free parameters need to be estimated reduce after each iteration.
 +
 +__**Practical questions:**__ How to initiate EM? How to start the first iteration?
 +
 +**Some other notes related to this paper:**
 +  - Generative story: The generative process that generates data given some hidden variables.
 +  - [[http://en.wikipedia.org/wiki/Chinese_restaurant_process|Chinese restaurant process]]
 +  - Gibbs sampling:  an algorithm that generates a sequence of samples from the joint probability distribution of two or more random variables.
 +
 +
 +
 +
  
  

[ Back to the navigation ] [ Back to the content ]