[ Skip to the content ]

Institute of Formal and Applied Linguistics Wiki


[ Back to the navigation ]

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
courses:rg:2011:deciphering_foreign_language [2012/01/07 13:41]
tran
courses:rg:2011:deciphering_foreign_language [2012/01/08 22:27] (current)
tran
Line 29: Line 29:
 \mathop {\arg \max }\limits_\theta  \prod\limits_c {P_\theta  (c)}  = \mathop {\arg \max }\limits_\theta  \prod\limits_c {\sum\limits_e {P(e) \times \prod\limits_{i = 1}^n {P_\theta  (c_i |e_i )} } }  \mathop {\arg \max }\limits_\theta  \prod\limits_c {P_\theta  (c)}  = \mathop {\arg \max }\limits_\theta  \prod\limits_c {\sum\limits_e {P(e) \times \prod\limits_{i = 1}^n {P_\theta  (c_i |e_i )} } } 
 </latex> </latex>
 +
 +The key idea of section 2 is the Iterative EM algorithm, which is used to estimate <latex>\theta</latex> more effectively in term of saving memory and running time (complexity).
 +
 +If we use traditional EM, every time we update <latex>\theta</latex>, we also need to update pseudo-counts (in this case, conditional probabilities <latex>{P_\theta  (c_i |e_i )</latex>.) It leads to O(|V|<sup>2</sup>) time. The heart of Iterative EM is that at every iteration, the algorithm run on a proportion of the most frequent words in vocabulary, and whenever the algorithm estimates <latex>P(c_i|e_i) > 0.5 </latex>, it fixes that probability equal to 1 in the following iteration, hence, the number of free parameters need to be estimated reduce after each iteration.
 +
 +__**Practical questions:**__ How to initiate EM? How to start the first iteration?
 +
 +**Some other notes related to this paper:**
 +  - Generative story: The generative process that generates data given some hidden variables.
 +  - [[http://en.wikipedia.org/wiki/Chinese_restaurant_process|Chinese restaurant process]]
 +  - Gibbs sampling:  an algorithm that generates a sequence of samples from the joint probability distribution of two or more random variables.
 +
 +Why did they experiment with Temporal expression corpus? This corpus has relatively small word types, it makes easier to compare Iterative EM with full EM.
 +
 +==== Section 3 ====
 +Not many details of this section was presented, however, there are few discussions around this.
 +
 +How to choose the best translation? After finishing parameter estimation, pick the final sample and extract the corresponding English translations for every for- eign sentence. This yields the final decipherment output.
 +
 +Given another text (which is not in training data), how to translate it? Use MLE to find the best translation from the model.
 +
 +==== Conclusion ====
 +This is an interesting paper, however, there is a lot of maths behind.  
 +
  
  

[ Back to the navigation ] [ Back to the content ]