===== Applying Morphology Generation Models to Machine Translation =====

paper by: Kristina Toutanova, Hisami Suzuki, and Achim Ruopp

presented by: Amir Kamran

report by: Martin Popel

===== Comments =====

  * Two base MT systems (a treelet one and a phrasal one) were improved by applying models that generate target word forms from target-language stems and the source-language sentence. These models are maximum entropy Markov models (MEMMs), trained independently of the base MT systems.
  * English-to-Russian and English-to-Arabic translation were tested.
  * Morphological categories such as POS, person, gender and number (7 categories for Russian and 12 for Arabic) are used in various combinations as features for the MEMM. The dependency structure of the source sentence is projected onto the target sentence (following Quirk et al. (2005)) and is used to form the features. The source (English) dependency parsing is done by the treelet MT system. (A hedged sketch of such feature-based inflection prediction follows this section.)
  * Three methods of integrating the inflection models with the MT systems are described (a pipeline sketch of Method 1 also follows this section):
    * Method 1: MT from word forms to word forms as usual; the output is then stemmed and re-inflected.
    * Method 3: MT trained from word forms to stems.
    * Method 2: MT trained from word forms to stems, but with the alignment done on word forms, so it is something between Method 1 and Method 3.
  * Section 4.1 mentions "max-BLEU training", which is usually referred to as MERT (minimum error rate training).
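To make the feature-based inflection prediction more concrete, here is a minimal, hypothetical sketch. Scikit-learn's LogisticRegression stands in for a single maximum-entropy decision of the MEMM (the real model also conditions on previously predicted forms and uses the authors' own feature set); the feature names and the toy Russian data are invented for illustration, not taken from the paper.

<code python>
# Hypothetical stand-in for one step of the paper's MEMM: classify a target
# stem into one of its inflected forms, given morphological and projected
# syntactic features. Feature names and toy data are invented; a maximum
# entropy classifier (logistic regression) replaces the full MEMM.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data: (feature dict, observed word form) for the stem "книга".
train = [
    ({"stem": "книга", "src_number": "sg", "src_dep_rel": "obj"},  "книгу"),
    ({"stem": "книга", "src_number": "pl", "src_dep_rel": "obj"},  "книги"),
    ({"stem": "книга", "src_number": "sg", "src_dep_rel": "subj"}, "книга"),
]

vec = DictVectorizer()
X = vec.fit_transform([features for features, form in train])
y = [form for features, form in train]
model = LogisticRegression(max_iter=1000).fit(X, y)

# Predict the inflected form for a new context.
context = {"stem": "книга", "src_number": "pl", "src_dep_rel": "subj"}
print(model.predict(vec.transform([context]))[0])
</code>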
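And a sketch of the Method 1 pipeline under assumed interfaces: translate, stem, inflections and score_form are hypothetical callables standing in for the base MT, the stemmer, the morphological lexicon and the inflection model; the greedy left-to-right decoding is our simplification, not necessarily the paper's.

<code python>
# Hypothetical sketch of Method 1: the base MT translates word forms as
# usual, the output is stemmed, and each stem is re-inflected by choosing
# the candidate form the inflection model scores highest. All four
# callables are assumed interfaces, not the authors' code.

def reinflect_method1(source_sentence, translate, stem, inflections, score_form):
    """translate(src)     -> list of target word forms (base MT output)
       stem(form)         -> stem of a word form
       inflections(stem)  -> iterable of possible word forms of the stem
       score_form(f, ctx) -> model score of candidate form f in context ctx
    """
    output_forms = translate(source_sentence)   # step 1: translate as usual
    stems = [stem(w) for w in output_forms]     # step 2: stem the output
    chosen = []
    for i, s in enumerate(stems):
        # step 3: re-inflect; an MEMM may condition on the source sentence,
        # the stem sequence and the previously chosen forms (greedy here)
        ctx = {"source": source_sentence, "stems": stems,
               "position": i, "prev_forms": tuple(chosen)}
        chosen.append(max(inflections(s), key=lambda f: score_form(f, ctx)))
    return chosen
</code>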
===== Questions =====

  * Section 3.1 defines the Inflection function as if its input were a **set of stems** S_w. Is there any difference between this Inflection and an inflection function that converts **one stem** into a set of word forms, with the set version defined in the intuitive way, Inflection(S_w) = \cup_{stem \in S_w} Inflection(stem)?
  * What are the names of the two MT systems used? Is the "syntactically-informed treelet MT, designed following Quirk et al. (2005)" somehow related to Microsoft Bing? Is the "phrasal" re-implementation of Pharaoh some kind of Microsoft in-house code, or is it based on Moses?

===== What do we like about the paper =====

  * It is a nice and clever combination of linguistic knowledge (morphological analysis and partially also parsing) with statistical MT.
  * Several UFAL projects have similar aims (and sometimes also similar solutions; see Bojar's two-step SMT), but this paper describes the approach in a way that is probably easier to grasp for most MT researchers.
  * Comparing Methods 1, 2 and 3 is a nice idea.
  * Two oracles were used (for n=1 and for n>1), which is well described in the paper; the oracle BLEU scores give more insight into the problem. (A sketch of such an n-best oracle appears at the end of this report.)
  * Table 1 reports "round-trip" results, i.e. analyzing=stemming a Russian/Arabic sentence and re-inflecting it (the stems and their order are not changed, so it is more straightforward to evaluate with accuracy instead of BLEU). This also gives some idea of the upper bounds (and a comparison of a baseline trigram LM with the morphology model used). (A sketch of this round-trip evaluation also appears at the end of this report.)

===== What do we dislike about the paper =====

  * According to Amir, about half of the paper is just a copy/description of the authors' previous work (namely Minkov et al., 2007).
  * Minkov et al. (2007) is also cited as if its authors had not applied their work to MT. However, the title of that paper is "Generating complex morphology for machine translation".
  * The improvement on the treelet system (using Method 3) is quite impressive (2.5 BLEU points), but the improvement on the phrasal MT (using Method 1) is smaller (0.7 BLEU points). Given that the phrasal MT clearly outperforms the treelet system (29 vs. 36 BLEU points), the most interesting improvements are those of the phrasal MT, but no significance test (nor human evaluation) is reported.
  * Method 3 (base MT trained on stems) and Method 2 (base MT trained on stems, but alignment on word forms) could (and should) also be applied to the phrasal MT.
  * The caption of Table 1 mentions "(accuracy, %)". However, the last row of the table uses a different unit, the number of word forms, which may confuse readers.
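As promised above, here is a sketch of an n-best oracle under assumed inputs: each hypothesis has already been re-inflected, and sentence-level BLEU from NLTK stands in for whatever selection criterion the paper actually uses.

<code python>
# Hypothetical n-best oracle: among the (re-inflected) hypotheses, pick the
# one scoring highest against the reference. NLTK's sentence-level BLEU is
# a stand-in for the paper's exact oracle criterion.
from nltk.translate.bleu_score import sentence_bleu

def oracle_pick(nbest_hypotheses, reference):
    """nbest_hypotheses: list of token lists; reference: token list."""
    return max(nbest_hypotheses, key=lambda hyp: sentence_bleu([reference], hyp))
</code>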
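And a sketch of the Table 1 round-trip evaluation: stem each reference sentence, re-inflect the stems in place, and report word-level accuracy. Here stem and predict_form are assumed interfaces, not the authors' code.

<code python>
# Hypothetical sketch of the round-trip evaluation behind Table 1: stem a
# reference sentence, re-inflect the stems (word order unchanged), and count
# how many original forms are recovered.

def round_trip_accuracy(reference_sentences, stem, predict_form):
    """reference_sentences: iterable of token lists;
       stem(form) -> stem; predict_form(stems, i) -> re-inflected form."""
    correct = total = 0
    for forms in reference_sentences:
        stems = [stem(w) for w in forms]
        predicted = [predict_form(stems, i) for i in range(len(stems))]
        correct += sum(p == g for p, g in zip(predicted, forms))
        total += len(forms)
    return correct / total   # word-level accuracy, as reported in Table 1
</code>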