Encouraging Consistent Translation Choices

Ferhan Ture, Douglas W. Oard, and Philip Resnik
NAACL 2012

Outline -- discussion

The discussed topics follow the outline of the paper:

Differences from Carpuat 2009

The approaches differ: here the decoder only receives additional features and the final decision stays with it, whereas Carpuat 2009 post-edits the output and substitutes the most likely variant everywhere.
- Even applying Carpuat 2009's substitution inside the decoder rather than as post-editing would lead to a different outcome, because the choice would also influence neighboring words through the language model (LM)
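To make the contrast concrete, here is a toy sketch. It assumes the document's translation choices are available as (source term, translation) pairs and that each candidate has a single scalar model score. The function names and the linear consistency weight are hypothetical simplifications, not the paper's actual features and not cdec's API. Post-editing overwrites every choice with the majority translation, while the soft feature only shifts the scores and leaves the decision to the decoder.

```python
from collections import Counter, defaultdict


def count_choices(choices):
    """Count how often each translation was chosen for each source term."""
    counts = defaultdict(Counter)
    for src, tgt in choices:
        counts[src][tgt] += 1
    return counts


def majority_post_edit(choices):
    """Post-editing in the spirit of Carpuat 2009 (simplified): every
    occurrence of a source term is overwritten with the translation chosen
    most often for that term in the document."""
    counts = count_choices(choices)
    return [(src, counts[src].most_common(1)[0][0]) for src, _ in choices]


def pick_with_consistency(candidates, src, counts, weight):
    """Hypothetical soft feature: each candidate's model score is raised by
    weight * (times this translation of `src` was chosen elsewhere in the
    document). The best total still wins, so the decoder keeps the decision."""
    def total(tgt):
        return candidates[tgt] + weight * counts[src][tgt]

    return max(candidates, key=total)
```

With the toy choices ("hostages", "captives", "hostages") for one source term and model scores of -1.0 for "captives" and -1.5 for "hostages", a weight of 0.1 leaves the decoder's preferred "captives" in place, whereas a weight of 1.0 lets the consistency bonus tip the choice to "hostages". Post-editing would force "hostages" regardless of the scores.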

Human translators and one sense per discourse

The paper's framing suggests that modelling how human translators behave is the same as modelling one sense per discourse; this is suspicious:
- The authors do not state their evidence clearly.
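- One sense is not the same as one translation: a single sense can have several valid translations (e.g. synonyms), and one translation can cover several senses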

Hiero

The idea would most probably work equally well in standard phrase-based SMT, but the authors use hierarchical phrase-based translation (Hiero)
- Hiero is summarized in Fig. 1: the phrases may contain non-terminals (X, X1, etc.), which turns the rule set into a weighted synchronous CFG that is decoded by bottom-up (CKY-style) chart parsing
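A Hiero rule pairs a source side and a target side that share indexed non-terminals. The sketch below assumes a hypothetical representation (rules as tuples of tokens, with non-terminals written as "X1", "X2"), not cdec's grammar format. It shows how one rule combines already-translated sub-spans bottom-up and reorders them; the romanized Chinese example is a toy.

```python
from dataclasses import dataclass


# Hypothetical rule representation for illustration, not cdec's grammar format.
@dataclass(frozen=True)
class Rule:
    lhs: str        # left-hand-side non-terminal, e.g. "X"
    source: tuple   # source tokens; indexed non-terminals such as "X1"
    target: tuple   # target tokens, reusing the same indexed non-terminals


def apply_rule(rule, children):
    """Combine already-translated sub-spans bottom-up.

    `children` maps each indexed non-terminal to a (source, target) pair of
    token tuples; every other symbol is copied as a terminal."""
    def expand(side, which):
        out = []
        for sym in side:
            if sym in children:
                out.extend(children[sym][which])  # substitute the sub-span
            else:
                out.append(sym)                   # copy the terminal
        return tuple(out)

    return expand(rule.source, 0), expand(rule.target, 1)
```

For the rule X -> <X1 de X2, the X2 of X1> with X1 = Aozhou/Australia and X2 = shoudu/capital, the result is the source "Aozhou de shoudu" paired with the target "the capital of Australia". The non-terminals let a single rule encode the reordering around "de".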
The authors chose the cdec implementation of Hiero (Hiero is available in several systems, e.g. Moses, cdec and Joshua)
- The choice was probably arbitrary; other systems would likely yield similar results

Forced decoding

In forced decoding, the decoder is given both the source and the target sentence and has to find the rules/phrases (i.e. a derivation) that map the source to the target
- The decoder might be unable to find appropriate rules, e.g. for words unseen in training
- It is a separate decoder mode, and the decoder must be adapted to support it
- Forced decoding is much more informative for Hiero translations than for “plain” phrase-based ones, since in Hiero many different parse trees can yield the same target string, whereas plain phrase-based translation has fewer alternative phrase segmentations
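Forced decoding can be illustrated in the simplest possible setting. The sketch below assumes a monotone phrase-based model (no reordering and no Hiero non-terminals) and a hypothetical phrase table that maps source-word tuples to lists of target-word tuples. It enumerates every derivation that produces exactly the given target. Even in this simplified setting the same target can be reached by more than one segmentation, and an unseen source word yields no derivation at all.

```python
from functools import lru_cache


def forced_derivations(source, target, phrase_table):
    """Simplified forced decoding (monotone phrase-based only, not Hiero):
    return every segmentation of `source` into phrase pairs from the
    hypothetical `phrase_table` whose target sides concatenate to `target`."""
    source, target = tuple(source), tuple(target)

    @lru_cache(maxsize=None)
    def search(i, j):
        # i: next uncovered source position, j: next target position to produce
        if i == len(source):
            return ((),) if j == len(target) else ()
        results = []
        for end in range(i + 1, len(source) + 1):
            src_phrase = source[i:end]
            for tgt_phrase in phrase_table.get(src_phrase, ()):
                tgt_end = j + len(tgt_phrase)
                # the phrase must reproduce the forced target exactly
                if target[j:tgt_end] == tuple(tgt_phrase):
                    for rest in search(end, tgt_end):
                        results.append(((src_phrase, tuple(tgt_phrase)),) + rest)
        return tuple(results)

    return search(0, 0)
```

With the toy table {a: x, b: y, a b: x y}, forcing "a b" onto "x y" gives two derivations (word by word, or as one two-word phrase). Forcing "a c" onto "x y" gives none, because "c" has no phrase pair.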

The choice and filtering of “cases”

The “cases” in Table 1 are selected by whether different translations are possible: each case has at least two different translations of its source side seen in the training data. The translation counts, however, come from the test data, so it is acceptable that e.g. “Korea” is translated as “Korea” every time.
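The selection criterion can be sketched as follows. The inputs are hypothetical: a dictionary from each source term to the translations observed in training, and a list of (source term, translation) pairs from the test data. The romanized toy data are invented for illustration.

```python
from collections import Counter, defaultdict


def select_cases(train_translations, test_pairs):
    """Keep source terms with at least two distinct translations in the
    (hypothetical) training dictionary, then count how each kept term is
    translated in the test data."""
    ambiguous = {
        src for src, tgts in train_translations.items() if len(set(tgts)) >= 2
    }
    test_counts = defaultdict(Counter)
    for src, tgt in test_pairs:
        if src in ambiguous:
            test_counts[src][tgt] += 1
    return dict(test_counts)
```

Here "Hanguo" qualifies because training offers two translations ("Korea", "South Korea"), even though the test data always uses "Korea". "shoudu" has a single training translation and is dropped.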
Table 1 is unfiltered; only some of the “cases” are subsequently considered relevant
- Cases that are too similar (fewer than half of the characters differ) are merged
  - Beware, this grouping is not well defined, since similarity is not transitive and does not create equivalence classes. For example, old hostages = new hostages = completely new hostages, but old hostages != completely new hostages (we hope this did not actually happen)
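The non-transitivity is easy to reproduce. The sketch assumes that "fewer than half of the characters differ" means a character-level edit distance below half the length of the longer string; the paper's exact measure may differ. Under this assumption the three strings from the example behave exactly as feared.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def too_similar(a, b):
    """Assumed merging criterion: edit distance below half the length of
    the longer string."""
    return levenshtein(a, b) < 0.5 * max(len(a), len(b))
```

The distances are 3 (out of 12 characters), 11 (out of 23) and 12 (out of 23), so the first two pairs count as similar but the pair "old hostages" / "completely new hostages" does not. Any procedure that merges pairwise-similar cases transitively would still put all three in one group.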