The list of discussed topics follows the outline of the paper:

==== Sec. 2. Related Work ====
**Differences from Carpuat 2009**
  * It is different: the decoder just gets additional features, but the decision is up to it -- Carpuat 2009 just post-edits the outputs and substitutes the most likely variant everywhere
    * Using Carpuat 2009's approach directly in the decoder would influence neighboring words through the LM, so even using it in the decoder rather than as post-editing would lead to a different outcome

**Human translators and one sense per discourse**
  * The paper suggests that modelling human translators amounts to modelling one sense per discourse -- this is suspicious
    * The authors do not state their evidence clearly.
    * One sense is not the same as one translation

==== Sec. 3. Exploratory analysis ====
**Hiero**
  * The idea would most probably work just as well in normal phrase-based SMT, but the authors use hierarchical phrase-based translation (Hiero)
    * Hiero is summarized in Fig. 1: the phrases may contain non-terminals (''X'', ''X1'' etc.), which leads to a probabilistic CFG and bottom-up parsing (see the sketch below)
  * The authors chose the ''cdec'' implementation of Hiero (the model is implemented in several systems: Moses, cdec, Joshua etc.)
    * The choice was probably arbitrary; other systems would yield similar results
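
A minimal sketch of what a rule with a non-terminal slot looks like and how it is applied bottom-up. This is our own toy illustration, not cdec code; the rule, the French example and the ''apply_rule'' helper are made up:

<code python>
# Toy illustration of a Hiero-style synchronous rule: both sides share
# numbered non-terminal slots (X1, X2, ...), which makes the rule set a
# synchronous CFG and translation a bottom-up parsing problem.
from dataclasses import dataclass

@dataclass
class Rule:
    src: tuple  # source side, e.g. ("ne", "X1", "pas")
    tgt: tuple  # target side, e.g. ("do", "not", "X1")

def apply_rule(rule, fillers):
    """Fill the X1, X2, ... slots of the target side with sub-derivations."""
    out = []
    for tok in rule.tgt:
        if tok.startswith("X") and tok[1:].isdigit():
            out.extend(fillers[int(tok[1:]) - 1])  # recurse into the slot
        else:
            out.append(tok)
    return out

# French negation "ne X1 pas" -> "do not X1"; X1 was translated bottom-up
neg = Rule(src=("ne", "X1", "pas"), tgt=("do", "not", "X1"))
print(apply_rule(neg, [["smoke"]]))  # -> ['do', 'not', 'smoke']
</code>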

**Forced decoding**
  * This means that the decoder is given the source //and// the target sentence and has to provide the rules/phrases that map the source to the target (see the sketch below)
    * The decoder might be unable to find the appropriate rules (e.g. for unseen words)
    * It is a different decoder mode, for which the decoder must be adjusted
    * Forced decoding is much more informative for Hiero translations than for "plain" phrase-based ones, since there are many different parse trees that yield the same target string, but not nearly as many phrase segmentations
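
A conceptual sketch of the constraint, not the actual cdec implementation (the derivation representation below is made up):

<code python>
# Forced decoding, conceptually: keep only derivations whose yield equals
# the given reference translation. With Hiero, several distinct derivation
# trees can produce the same string, so the surviving set is informative.

def forced_decode(derivations, reference):
    """`derivations`: iterable of (rules_used, output_tokens) pairs."""
    matching = [rules for rules, output in derivations if output == reference]
    return matching or None  # None: no rule sequence reproduces the reference

derivs = [
    (["R1", "R3"], ["he", "goes", "home"]),
    (["R2", "R5"], ["he", "goes", "home"]),
    (["R1", "R4"], ["he", "walks", "home"]),
]
print(forced_decode(derivs, ["he", "goes", "home"]))  # two derivations match
</code>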

**The choice and filtering of "cases"**
  * The "cases" in Table 1 are selected according to the //possibility// of different translations (i.e. each case has at least two translations of the source seen in the training data; the translation counts come from the test data, so it is OK that e.g. "Korea" translates as "Korea" all the time)
  * Table 1 is unfiltered -- only some of the "cases" are then considered relevant:
    * Cases that are //too similar// (fewer than 1/2 of the characters differ) are //joined together//
      * Beware, this notion of grouping is not well-defined, it does not create equivalence classes: "old hostages" = "new hostages" = "completely new hostages", but "old hostages" != "completely new hostages" (we hope this didn't actually happen; see the check below)
    * Cases where //only one translation variant prevails// are //discarded// (this is the case of "Korea")
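
A quick check of the non-transitivity. The notes only say "less than 1/2 of the characters differ", so we read "differ" as Levenshtein distance relative to the longer string -- an assumption, the paper's exact measure may be different:

<code python>
# Pairwise joining by edit distance does not define equivalence classes:
# the relation is not transitive, as the "hostages" example shows.

def levenshtein(a, b):
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def joined(a, b):
    return levenshtein(a, b) < max(len(a), len(b)) / 2

a, b, c = "old hostages", "new hostages", "completely new hostages"
print(joined(a, b), joined(b, c), joined(a, c))  # True True False
</code>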

==== Sec. 4. Approach ====
The actual experiments begin only now; the data used is different.

**Choice of features**
  * They define 3 features that are designed to be biased towards consistency -- or are they?
    * If e.g. two variants are each used 2 times, they will have roughly the same score
  * The BM25 function is a refined version of the [[http://en.wikipedia.org/wiki/TF-IDF|TF-IDF]] score (see the sketch below)
  * The exact parameter values are probably not tuned, just left at defaults (and maybe they don't have much influence anyway)
  * See NPFL103 for details on Information Retrieval; it's largely black magic
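
A sketch of the textbook BM25 weighting with the usual default parameters (''k1=1.2'', ''b=0.75''); the paper's exact variant and parameter values are not given in the notes, so take this as an assumption:

<code python>
import math

def bm25(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 refines TF-IDF: term frequency saturates and is normalized
    by document length, instead of growing linearly as in plain TF-IDF."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    tf_part = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_part

# Doubling the term frequency less than doubles the score:
print(bm25(tf=1, df=10, n_docs=1000, doc_len=100, avg_doc_len=120))
print(bm25(tf=2, df=10, n_docs=1000, doc_len=100, avg_doc_len=120))
</code>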

**Feature weights**
  * The usual model in MT scores the hypotheses according to the feature values (''f'') and their weights (''lambda''):
    * ''score(H) = exp( sum( lambda_i * f_i(H) ) )''
  * The feature weights are trained on a held-out data set using [[http://acl.ldc.upenn.edu/acl2003/main/pdfs/Och.pdf|MERT]] (or, here: [[http://en.wikipedia.org/wiki/Margin_Infused_Relaxed_Algorithm|MIRA]])
  * The resulting weights are not mentioned in the paper, but if a weight came out negative, would this favor //different// translation choices? (see the sketch below)
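
A minimal sketch of this log-linear scoring; the feature values and weights below are made up, only to illustrate that a negative weight on a consistency feature would indeed push the decoder away from consistent choices:

<code python>
import math

def score(features, weights):
    """score(H) = exp( sum( lambda_i * f_i(H) ) )"""
    return math.exp(sum(l * f for l, f in zip(weights, features)))

weights = [0.8, -0.3]       # hypothetical [LM weight, consistency weight]
consistent = [1.0, 2.0]     # hypothesis reusing an earlier translation
inconsistent = [1.0, 0.0]   # hypothesis introducing a new variant
print(score(consistent, weights) < score(inconsistent, weights))  # True
</code>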

**Meaning of the individual features**
  * C1 indicates that a certain Hiero rule was used frequently
    * But rules are very similar to each other, so we also need something less fine-grained
  * C2 is a target-side feature; it just counts the target-side tokens (only the "most important" ones, in terms of TF-IDF)
    * It may be compared to Language Model features, but it is trained only on the target part of the bilingual training data.
  * C3 counts occurrences of source-target token pairs (and again uses the "most important" term pair for each rule); a rough sketch of all three counts follows below
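
This is our own paraphrase of the notes, not the paper's exact feature definitions; the input format (one ''(rule_id, src_term, tgt_term)'' triple per applied rule) is an assumption:

<code python>
from collections import Counter

def consistency_counts(first_pass):
    """Collect the three consistency statistics over one document's
    first-pass translation, given the "most important" (highest-TF-IDF)
    source and target terms of each applied rule."""
    c1 = Counter(rule for rule, _, _ in first_pass)          # C1: rule usage
    c2 = Counter(tgt for _, _, tgt in first_pass)            # C2: target term usage
    c3 = Counter((src, tgt) for _, src, tgt in first_pass)   # C3: term-pair usage
    return c1, c2, c3

doc = [("r1", "pistole", "gun"),
       ("r2", "pistole", "gun"),
       ("r3", "pistole", "pistol")]
c1, c2, c3 = consistency_counts(doc)
print(c2["gun"], c3[("pistole", "pistol")])  # 2 1
</code>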
  
**Requirements of the new features**
  * They need two passes through the data (see the sketch below)
  * You need to have document segmentation
    * Since the frequencies are trained on the training set, you can just translate one document at a time; there is no need to have full sets of documents
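
A sketch of the two-pass setup this implies; ''decode'' and ''extract_counts'' are placeholders for a real decoder and the statistics collection, not actual APIs:

<code python>
def translate_document(sentences, decode, extract_counts):
    """Translate one document with consistency features: pass 1 collects
    the statistics, pass 2 re-decodes with the extra features active."""
    first_pass = [decode(s, counts=None) for s in sentences]
    counts = extract_counts(first_pass)
    return [decode(s, counts=counts) for s in sentences]
</code>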
