Institute of Formal and Applied Linguistics Wiki


<latex>
\mathop {\arg \max }\limits_\theta  \prod\limits_f {\sum\limits_e {P(e) \times \sum\limits_a {P_\theta  (f,a|e)} } }
</latex>

==== Section 2 ====
Section 2 deals with a simple version of translation, Word Substitution Decipherment, where there is only a one-to-one mapping between plaintext tokens and cipher tokens (token positions do not change).

The solution for this problem is pretty simple: given a sequence of English tokens <latex>e=e_1,e_2,...,e_n</latex> and the corresponding sequence of cipher tokens <latex>c=c_1,c_2,...,c_n</latex>, we need to estimate the parameters <latex>\theta</latex>:
<latex>
\mathop {\arg \max }\limits_\theta  \prod\limits_c {P_\theta  (c)}  = \mathop {\arg \max }\limits_\theta  \prod\limits_c {\sum\limits_e {P(e) \times \prod\limits_{i = 1}^n {P_\theta  (c_i |e_i )} } } 
</latex>
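
The sum over all plaintext sequences <latex>e</latex> looks exponential, but with an n-gram language model it factorizes and can be computed by dynamic programming. Below is a minimal sketch (not from the paper) that evaluates <latex>P_\theta(c)</latex> with the forward algorithm, assuming a bigram language model; the function name and toy numbers are illustrative.
<code python>
# Sketch: compute P_theta(c) = sum_e P(e) * prod_i P_theta(c_i|e_i)
# with the forward algorithm, assuming P(e) is a bigram LM.
# All names and toy numbers are illustrative, not from the paper.

def forward_likelihood(cipher, vocab_e, lm, channel, start="<s>"):
    """Total probability of `cipher`, summed over all plaintext sequences."""
    # alpha[e] = probability mass of all plaintext prefixes ending in e
    alpha = {e: lm[(start, e)] * channel[(cipher[0], e)] for e in vocab_e}
    for c in cipher[1:]:
        alpha = {e: channel[(c, e)] *
                    sum(alpha[ep] * lm[(ep, e)] for ep in vocab_e)
                 for e in vocab_e}
    return sum(alpha.values())

# Toy run: two plaintext words, two cipher symbols.
vocab_e = ["dog", "cat"]
lm = {("<s>", "dog"): 0.6, ("<s>", "cat"): 0.4,
      ("dog", "dog"): 0.3, ("dog", "cat"): 0.7,
      ("cat", "dog"): 0.5, ("cat", "cat"): 0.5}
channel = {("x", "dog"): 0.9, ("x", "cat"): 0.2,
           ("y", "dog"): 0.1, ("y", "cat"): 0.8}
print(forward_likelihood(["x", "y"], vocab_e, lm, channel))
</code>
Each position costs O(|V|<sup>2</sup>) here, which is the same quadratic factor in the vocabulary size that motivates the Iterative EM below.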

The key idea of Section 2 is the Iterative EM algorithm, which is used to estimate <latex>\theta</latex> more efficiently in terms of both memory and running time.

If we use traditional EM, every time we update <latex>\theta</latex> we also need to update the pseudo-counts (in this case, the conditional probabilities <latex>P_\theta  (c_i |e_i )</latex>), which leads to O(|V|<sup>2</sup>) time. The heart of Iterative EM is that in each iteration the algorithm runs on only a portion of the most frequent words in the vocabulary, and whenever it estimates <latex>P(c_i|e_i) > 0.5</latex>, it fixes that probability to 1 in the following iterations; hence the number of free parameters that need to be estimated shrinks after each iteration. A rough sketch of this loop follows below.
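
The following is a runnable miniature of the idea, not the paper's implementation: plain EM via forward-backward under a bigram language model, wrapped in an outer loop that pins confident channel entries to 1 so they stop being free parameters. Frequency-based vocabulary batching, smoothing, and log-space scaling are all omitted, and every name and toy number is illustrative.
<code python>
from collections import defaultdict

def forward_backward(cipher, vocab_e, lm, channel):
    """gamma[i][e] = P(e_i = e | cipher) under the noisy-channel model."""
    n = len(cipher)
    alpha = [{} for _ in range(n)]
    beta = [{} for _ in range(n)]
    for e in vocab_e:
        alpha[0][e] = lm[("<s>", e)] * channel[(cipher[0], e)]
        beta[n - 1][e] = 1.0
    for i in range(1, n):
        for e in vocab_e:
            alpha[i][e] = channel[(cipher[i], e)] * sum(
                alpha[i - 1][ep] * lm[(ep, e)] for ep in vocab_e)
    for i in range(n - 2, -1, -1):
        for e in vocab_e:
            beta[i][e] = sum(lm[(e, en)] * channel[(cipher[i + 1], en)]
                             * beta[i + 1][en] for en in vocab_e)
    z = sum(alpha[n - 1].values())
    return [{e: alpha[i][e] * beta[i][e] / z for e in vocab_e}
            for i in range(n)]

def iterative_em(cipher, vocab_e, lm, rounds=3, em_iters=10):
    vocab_c = sorted(set(cipher))
    # uniform initialization of the channel (one simple way to start EM)
    channel = {(c, e): 1.0 / len(vocab_c) for c in vocab_c for e in vocab_e}
    fixed = set()                        # plaintext words with pinned rows
    for _ in range(rounds):
        for _ in range(em_iters):
            gamma = forward_backward(cipher, vocab_e, lm, channel)
            counts = defaultdict(float)
            totals = defaultdict(float)
            for i, c in enumerate(cipher):       # E-step: expected counts
                for e, g in gamma[i].items():
                    counts[(c, e)] += g
                    totals[e] += g
            for (c, e), v in counts.items():     # M-step: renormalize
                if e not in fixed:
                    channel[(c, e)] = v / totals[e]
        for (c, e), p in list(channel.items()):  # pin confident mappings
            if e not in fixed and p > 0.5:       # P(c|e) > 0.5 -> fix to 1
                fixed.add(e)
                for c2 in vocab_c:
                    channel[(c2, e)] = 1.0 if c2 == c else 0.0
    return channel

# Toy run: the LM strongly prefers alternating "a b a b", so the cipher
# "X Y X Y" should resolve to X->a, Y->b.
lm = {("<s>", "a"): 0.9, ("<s>", "b"): 0.1,
      ("a", "a"): 0.1, ("a", "b"): 0.9,
      ("b", "a"): 0.9, ("b", "b"): 0.1}
print(iterative_em(["X", "Y", "X", "Y"], ["a", "b"], lm))
</code>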

__**Practical question:**__ How do we initialize EM? How do we start the first iteration?

Some other notes related to this paper:
  * Generative story: the generative process that generates the data given some hidden variables.
  * Chinese restaurant process (a minimal sketch follows below).
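
The Chinese restaurant process itself is easy to show in a few lines. A minimal sketch, assuming the usual rich-get-richer seating rule; the function name, concentration value, and seed are illustrative.
<code python>
import random

# Chinese restaurant process: customer n+1 sits at an existing table k with
# probability proportional to its occupancy, or opens a new table with
# probability proportional to alpha. All names here are illustrative.

def crp_seating(num_customers, alpha=1.0, seed=0):
    rng = random.Random(seed)
    tables = []                       # tables[k] = number of customers at k
    for _ in range(num_customers):
        k = rng.choices(range(len(tables) + 1), weights=tables + [alpha])[0]
        if k == len(tables):
            tables.append(1)          # open a new table
        else:
            tables[k] += 1
    return tables

print(crp_seating(20))                # typically a few big tables, many small
</code>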
