If we use traditional EM, every time we update <latex>\theta</latex> we also need to update the pseudo-counts (in this case, the conditional probabilities <latex>P_\theta(c_i|e_i)</latex>), which takes O(|V|<sup>2</sup>) time. The heart of Iterative EM is that at each iteration the algorithm runs on only a portion of the most frequent words in the vocabulary, and whenever it estimates <latex>P(c_i|e_i) > 0.5</latex> it fixes that probability to 1 in the following iterations; hence, the number of free parameters that need to be estimated shrinks after every iteration.
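A minimal sketch of this pruning idea (not the authors' implementation; ''vocab_by_freq'', ''em_step'' and the exact thresholding are assumptions for illustration):

<code python>
def iterative_em(vocab_by_freq, em_step, n_iterations=5, top_fraction=0.2, threshold=0.5):
    """Run EM only on the most frequent words and freeze any P(c|e) > threshold at 1."""
    fixed = {}  # English word e -> foreign word c whose P(c|e) is pinned to 1
    for _ in range(n_iterations):
        # re-estimate only the most frequent words that are not yet fixed
        n_active = int(len(vocab_by_freq) * top_fraction)
        active = [e for e in vocab_by_freq[:n_active] if e not in fixed]
        probs = em_step(active, fixed)       # assumed to return {(e, c): P(c|e)}
        for (e, c), p in probs.items():
            if p > threshold:
                fixed[e] = c                 # treated as P(c|e) = 1 from now on
    return fixed
</code>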
__**Practical questions:**__ How to initialize EM? How to start the first iteration?
**Some other notes related to this paper:**
- Generative story: the generative process that generates the observed data given some hidden variables.
- [[http://en.wikipedia.org/wiki/Chinese_restaurant_process|Chinese restaurant process]]
- Gibbs sampling: an algorithm that generates a sequence of samples from the joint probability distribution of two or more random variables (a toy sketch follows this list).
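Since Gibbs sampling carries the inference in this paper, here is a self-contained toy sketch of the general idea (a two-variable discrete example, not the paper's sampler over cipher derivations): each variable is resampled in turn from its conditional distribution given the current value of the other.

<code python>
import random

# a tiny joint distribution p(x, y) over two binary variables, given as a table
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

def sample_conditional(fixed_var, fixed_val):
    """Sample the free variable from its conditional given the fixed one."""
    if fixed_var == 'y':
        weights = {x: joint[(x, fixed_val)] for x in (0, 1)}   # p(x | y)
    else:
        weights = {y: joint[(fixed_val, y)] for y in (0, 1)}   # p(y | x)
    r = random.random() * sum(weights.values())
    for value, w in weights.items():
        r -= w
        if r <= 0:
            return value
    return value

def gibbs(n_samples=10000):
    x, y = 0, 0                              # arbitrary starting state
    samples = []
    for _ in range(n_samples):
        x = sample_conditional('y', y)       # x ~ p(x | y)
        y = sample_conditional('x', x)       # y ~ p(y | x)
        samples.append((x, y))
    return samples   # after burn-in, distributed (approximately) like p(x, y)
</code>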
Why did they experiment with the Temporal expression corpus? This corpus has relatively few word types, which makes it easier to compare Iterative EM with full EM.
==== Section 3 ====
Not many details of this section were presented; however, there were a few discussions around it.
How to choose the best translation? After finishing parameter estimation, pick the final sample and extract the corresponding English translations for every foreign sentence. This yields the final decipherment output.
Given another text (which is not in the training data), how to translate it? Use MLE to find the best translation from the model, as in the sketch below.
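A rough sketch of what that could look like at the word level, assuming a learned channel table <latex>P(c|e)</latex> and a unigram language model; this is only an illustration of the MLE idea (the names ''channel'' and ''lm_unigram'' are made up), not the paper's full decoder.

<code python>
def decode(foreign_sentence, channel, lm_unigram):
    """Word-by-word MLE decoding: pick e maximizing P(e) * P(c|e) for each c."""
    english = []
    for c in foreign_sentence:
        candidates = [(lm_unigram.get(e, 0.0) * p, e)
                      for (e, cc), p in channel.items() if cc == c]
        best_score, best_e = max(candidates) if candidates else (0.0, '<unk>')
        english.append(best_e)
    return english

# toy usage
channel = {('hello', 'hola'): 0.9, ('hi', 'hola'): 0.4}   # P(c|e)
lm_unigram = {'hello': 0.01, 'hi': 0.02}                  # P(e)
print(decode(['hola'], channel, lm_unigram))              # -> ['hello']
</code>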
==== Conclusion ====
This is an interesting paper; however, there is a lot of math behind it.