
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

courses:rg:2012:encouraging-consistent-translation [2012/10/16 15:07]
dusek
courses:rg:2012:encouraging-consistent-translation [2012/10/16 15:15]
dusek
Line 24: Line 24:
   * The authors chose the ''cdec'' implementation of Hiero (which is implemented in several systems: Moses, cdec, Joshua etc.)
     * The choice was probably arbitrary; other systems would likely yield similar results
 +
 **Forced decoding**
   * This means that the decoder is given both the source //and// the target sentence and has to provide the rules/phrases that map from the source to the target
Line 30: Line 31:
     * Forced decoding is much more informative for Hiero translations than for "plain" phrase-based ones, since there are many different parse trees that yield the same target string, but not as many phrases
  
 +**The choice and filtering of "cases"**
 +  * The "cases" in Table 1 are selected according to the //possibility// of different translations (i.e. each case has at least two translations of the source seen in the training data; the translation counts are from the test data, so it is OK that e.g. "Korea" translates as "Korea" all the time)
 +  * Table 1 is unfiltered -- only some of the "cases" are then considered relevant:
 +    * Cases that are //too similar// (fewer than half of the characters differ) are //joined together//
 +      * Beware, this notion of grouping is not well-defined: the relation is not transitive, so it does not create equivalence classes, e.g. "old hostages" = "new hostages" = "completely new hostages" but "old hostages" != "completely new hostages" (we hope this didn't actually happen)
 +    * Cases where //only one translation variant prevails// are //discarded// (this is the case of "Korea")
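
The non-transitivity of the "too similar" grouping can be checked with a small sketch. The notes do not say how the paper counts differing characters, so this assumes Levenshtein edit distance divided by the length of the longer string, with a threshold of one half; the names ''edit_distance'' and ''too_similar'' are hypothetical and not taken from the paper or from cdec.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def too_similar(a: str, b: str) -> bool:
    """Assumed criterion: fewer than half of the characters differ."""
    return edit_distance(a, b) < 0.5 * max(len(a), len(b))


a, b, c = "old hostages", "new hostages", "completely new hostages"
for x, y in [(a, b), (b, c), (a, c)]:
    print(f"{x!r} ~ {y!r}: {too_similar(x, y)}")
```

Under this assumed measure, the first two pairs from the example are joined while the third is not, so pairwise joining alone does not determine the groups; the result depends on the order in which the cases are merged.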
