
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

courses:rg:2011:deciphering_foreign_language [2011/12/06 09:52]
tran created
courses:rg:2011:deciphering_foreign_language [2012/01/07 13:41]
tran
Line 2:
  
 Scriber: Ke. T
 +
 +The talk is about how to tackle machine translation (MT) without parallel training data.
 +
 +==== Section 1 ====
 +Given sentence pairs (e,f), where e is an English sentence and f is a foreign sentence, the translation model estimates the parameters 
 +<latex>\theta</latex> that maximize the likelihood of the foreign sentences given their English counterparts:
 +<latex>
 +\mathop {\arg \max }\limits_\theta  \prod\limits_{(e,f)}  {p_\theta  (f|e)} 
 +</latex>
 +
 +When no parallel data is available, we observe only foreign text and maximize its likelihood instead:
 +<latex>
 +\mathop {\arg \max }\limits_\theta  \prod\limits_f {p_\theta  (f)} 
 +</latex>
 +
 +Treating the English translation e and the word alignment a as hidden variables, our task is to find the parameters <latex>\theta</latex> given by
 +<latex>
 +\mathop {\arg \max }\limits_\theta  \prod\limits_f {\sum\limits_e {P(e) \times \sum\limits_a {P_\theta  (f,a|e)} } } 
 +</latex>
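 +
 +As a rough illustration of the objective above, the sketch below evaluates the inner quantity for a single foreign sentence by brute-force enumeration of the hidden English sentence and the hidden alignment. Everything in it is an assumption made for the example: the two candidate English sentences in ''lm'', the translation table ''t'' (playing the role of <latex>\theta</latex>), and the uniform alignment prior in the style of IBM Model 1 without a NULL word. The notes themselves do not fix a particular form for <latex>P_\theta(f,a|e)</latex>.
 +

```python
from itertools import product

# Hypothetical toy language model P(e) over two candidate English sentences.
lm = {("the", "dog"): 0.6, ("a", "dog"): 0.4}

# Hypothetical translation table t(f_word | e_word); these are the parameters theta.
t = {
    "the": {"le": 0.9, "chien": 0.1},
    "a": {"un": 0.7, "le": 0.3},
    "dog": {"le": 0.2, "chien": 0.8},
}


def p_f_and_a_given_e(f, a, e):
    """P_theta(f, a | e), assuming a uniform alignment prior (IBM Model 1 style)."""
    prob = 1.0
    for j, f_word in enumerate(f):
        e_word = e[a[j]]  # a[j] is the English position that f_j is aligned to
        prob *= (1.0 / len(e)) * t[e_word].get(f_word, 0.0)
    return prob


def marginal_likelihood(f):
    """sum_e P(e) * sum_a P_theta(f, a | e), by exhaustive enumeration."""
    total = 0.0
    for e, p_e in lm.items():
        # Enumerate every alignment vector a in {0, ..., len(e)-1}^len(f).
        inner = sum(
            p_f_and_a_given_e(f, a, e)
            for a in product(range(len(e)), repeat=len(f))
        )
        total += p_e * inner
    return total


if __name__ == "__main__":
    print(marginal_likelihood(("le", "chien")))
```

 +
 +For the toy numbers above, the sentence "le chien" gets a marginal likelihood of about 0.1885. Enumeration like this is only feasible for tiny examples, since the number of English sentences and alignments grows exponentially with sentence length.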
 +
 +==== Section 2 ====
 +Section 2 deals with a simplified version of translation, Word Substitution Decipherment, in which each source (plaintext) word is mapped one-to-one to a cipher token and the word order does not change, so there is no reordering, insertion, or deletion to model.
 +
 +Because each cipher token depends only on the English token at the same position, the model is simple. Given a sequence of English tokens <latex>e=e_1,e_2,...,e_n</latex> and the corresponding sequence of cipher tokens <latex>c=c_1,c_2,...,c_n</latex>, with <latex>P(e)</latex> supplied by an English language model, we need to estimate the parameters <latex>\theta</latex>:
 +<latex>
 +\mathop {\arg \max }\limits_\theta  \prod\limits_c {P_\theta  (c)}  = \mathop {\arg \max }\limits_\theta  \prod\limits_c {\sum\limits_e {P(e) \times \prod\limits_{i = 1}^n {P_\theta  (c_i |e_i )} } } 
 +</latex>
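 +
 +The likelihood of one cipher sequence in the formula above can be sketched in code as follows. The example assumes that <latex>P(e)</latex> is a bigram language model, which turns the sum over all English sequences into a hidden Markov model likelihood that the forward algorithm computes in time linear in n. The two-word vocabulary, the tables ''start'', ''trans'' and ''channel'', and all function names are hypothetical and chosen only for illustration. A brute-force version is included to check the forward computation against the formula directly.
 +

```python
from itertools import product

vocab = ["a", "b"]  # hypothetical English vocabulary

# Hypothetical bigram language model: P(e_1) and P(e_i | e_{i-1}).
start = {"a": 0.7, "b": 0.3}
trans = {"a": {"a": 0.4, "b": 0.6}, "b": {"a": 0.8, "b": 0.2}}

# Hypothetical channel parameters theta: P_theta(c_i | e_i).
channel = {"a": {"X": 0.9, "Y": 0.1}, "b": {"X": 0.2, "Y": 0.8}}


def lm_prob(e):
    """P(e) under the bigram language model."""
    p = start[e[0]]
    for prev, cur in zip(e, e[1:]):
        p *= trans[prev][cur]
    return p


def likelihood_brute_force(c):
    """sum_e P(e) * prod_i P_theta(c_i | e_i), enumerating every e."""
    total = 0.0
    for e in product(vocab, repeat=len(c)):
        chan = 1.0
        for c_i, e_i in zip(c, e):
            chan *= channel[e_i][c_i]
        total += lm_prob(e) * chan
    return total


def likelihood_forward(c):
    """Same quantity via the forward algorithm."""
    # alpha[e] = P(c_1..c_i, e_i = e), updated position by position.
    alpha = {e: start[e] * channel[e][c[0]] for e in vocab}
    for c_i in c[1:]:
        alpha = {
            cur: channel[cur][c_i]
            * sum(alpha[prev] * trans[prev][cur] for prev in vocab)
            for cur in vocab
        }
    return sum(alpha.values())


if __name__ == "__main__":
    c = ("X", "Y", "X")
    print(likelihood_brute_force(c), likelihood_forward(c))
```

 +
 +Both functions return the same value for any cipher sequence over the toy alphabet. Estimating <latex>\theta</latex> then means adjusting the ''channel'' table to increase the product of these likelihoods over all observed cipher sequences.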
 +
  
