====== Encouraging Consistent Translation ======
The list of discussed topics follows the outline of the paper:
==== Sec. 2. Related Work ====
**Differences from Carpuat 2009**
  * The approach is different: the decoder just gets additional features and the final decision is left to it -- Carpuat 2009 post-edits the outputs and substitutes the most likely variant everywhere
  * The authors do not state their evidence clearly.
  * One sense is not the same as one translation

==== Sec. 3. Exploratory Analysis ====
**Hiero**
  * The idea would most probably work the same in normal phrase-based SMT, but the authors use hierarchical phrase-based translation (Hiero)
==== Sec. 4. Approach ====
The actual experiments begin only here; the data used differs from that of the exploratory analysis.
  * But the rules are very similar, so we also need something less fine-grained
  * C2 is a target-side feature; it just counts the target-side tokens (only the "most important" ones)
    * It may be compared to Language Model features, but it is trained only on the target part of the bilingual data
  * C3 counts occurrences of source-target token pairs (and also uses only the "most important" ones)
  * They need two passes through the data (see the sketch below)
  * You need to have document segmentation
  * The frequencies are trained on the tuning data
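A minimal sketch of how the C2- and C3-style counts could be gathered from a first decoding pass over one document; the function name and data layout are assumptions for illustration, not the authors' implementation:

<code python>
from collections import Counter

def collect_counts(first_pass_doc):
    """Collect per-document counts from a first decoding pass.

    ``first_pass_doc`` is a list of (source_tokens, target_tokens,
    alignment) triples for one document; ``alignment`` is a list of
    (source_pos, target_pos) links.  Purely illustrative.
    """
    target_counts = Counter()  # C2-style: target-token frequencies
    pair_counts = Counter()    # C3-style: aligned source-target pairs
    for src, tgt, alignment in first_pass_doc:
        # The paper restricts counting to the "most important" tokens;
        # here every token is counted for simplicity.
        target_counts.update(tgt)
        for i, j in alignment:
            pair_counts[(src[i], tgt[j])] += 1
    return target_counts, pair_counts

# In the second pass, a rule producing target token t in the same document
# can fire a feature derived from target_counts[t] (C2-style), or from
# pair_counts[(s, t)] for an aligned source token s (C3-style).
</code>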
+ | |||
+ | ==== Sec. 5. Evaluation and Discussion ==== | ||
+ | **Choice of baseline** | ||
  * Baselines are quite nice and competitive
  * MIRA, used for tuning, is very cutting-edge

**Tuning the feature weights**
  * For the 1st phase, the feature weights are not tuned separately
  * This is in order to speed up the experiments: they don't want to wait for MIRA twice

**Different evaluation metrics**
  * The BLEU variants do not differ that much, only in the Brevity Penalty for multiple references
    * IBM BLEU uses the reference that is closest to the MT output (in terms of length), while NIST BLEU uses the shortest one (see the sketch below)
  * This was probably just due to technical reasons, e.g. they had their optimization software designed for one metric and not the other
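To make the Brevity Penalty difference concrete, a per-segment sketch of the two reference-length conventions; the function name and tie-breaking rule are assumptions, and real BLEU accumulates the lengths over the whole corpus before applying BP = min(1, exp(1 - r/c)):

<code python>
import math

def brevity_penalty(cand_len, ref_lens, variant="ibm"):
    """Brevity penalty for one segment with multiple references.

    IBM BLEU takes the reference length closest to the candidate
    (ties broken towards the shorter reference here); NIST BLEU
    takes the shortest reference.  Illustrative only.
    """
    if variant == "ibm":
        r = min(ref_lens, key=lambda n: (abs(n - cand_len), n))
    else:  # NIST-style
        r = min(ref_lens)
    return 1.0 if cand_len >= r else math.exp(1.0 - r / cand_len)

# Example where the two differ: a 14-token output with references of
# length 10 and 15.  IBM picks r=15 (closest), so BP = exp(1 - 15/14)
# ~ 0.93; NIST picks r=10 (shortest), so BP = 1.0.
</code>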