Institute of Formal and Applied Linguistics Wiki


courses:rg:2012:encouraging-consistent-translation-bushra [2012/10/23 15:49] jawaid
====Approach:====
  - The core idea of maintaining translation consistency (TC) is implemented by introducing a bias towards TC in the form of "consistency features". Three consistency features are used inside the decoding model, and their values are estimated using a 2-pass decoding scheme.
  - BM25, which is used as the term weighting function, is a well-known ranking function in the field of information retrieval and a refined version of TF-IDF (another ranking function used in IR).
  - Description of the consistency features:
    - C<sub>1</sub> is a fine-grained term weighting function; it is computed by counting how many times a rule was applied in the first pass. This approach suffers when the source and target phrases differ only in non-terminal positioning or in the presence of determiners.
    - C<sub>2</sub>, on the other hand, is a coarse-grained function which takes only target tokens into account. To us, C<sub>2</sub> looks similar to a language model feature trained only on the target side of the dev set.
    - C<sub>3</sub> goes over all alignment pairs and for each rule selects those term pairs that have the maximum feature value.
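To make the BM25 term weighting concrete, here is a minimal sketch in the textbook Okapi form. The parameter values k1 and b and all counts in the example call are illustrative assumptions, not taken from the paper.

```python
import math

def bm25_weight(term_freq, doc_len, avg_doc_len, num_docs, doc_freq,
                k1=1.2, b=0.75):
    """Okapi BM25 weight of one term in one document.

    The idf factor dampens terms that occur in many documents (like the
    IDF part of TF-IDF); the tf factor saturates with term frequency and
    is normalized by document length -- the two refinements over TF-IDF.
    """
    idf = math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    tf_part = (term_freq * (k1 + 1)) / (
        term_freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_part

# Illustrative call: a term occurring 3 times in a 120-token document,
# in a 1000-document collection where 50 documents contain the term.
weight = bm25_weight(term_freq=3, doc_len=120, avg_doc_len=100.0,
                     num_docs=1000, doc_freq=50)
```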
  
====Evaluation:====
  
  - Cdec's implementation of hierarchical MT is used in this work. As we know, hierarchical decoding is also implemented in other MT systems such as Moses and Joshua; the selection of cdec over the other systems is the authors' personal choice and does not bring extra benefits.
  - MIRA is used for tuning the feature weights.
  - The authors don't tune the decoder in the first pass, i.e. they don't calculate the feature weights (lambdas) and probably use weights from their previous experiments or setups. They don't clearly state the reason for this decision, but our hypothesis is that they skipped the tuning step just to speed up the translation process.
  - NIST-BLEU (which prefers shorter sentences) is used to compare results with the official NIST evaluation, whereas IBM-BLEU (which prefers longer sentences) is used for the rest of the experiments. We don't fully understand the use of two different BLEU variants and why the authors didn't stick with NIST-BLEU throughout.
  - They gain a maximum increase of 1.0 BLEU point after combining all three features.
  - The authors call BLEU a "conservative measure" because it marks their system down whenever a selected content word does not have an exact match in the reference translation. We strongly disagree with this claim because the baseline is also evaluated with the same metric and likewise suffers a decrease in accuracy from content-word mismatches. We see two solutions to the mentioned issue:
    - They could have supported their argument by manually evaluating the test set.
    - Instead of spending half a page criticizing BLEU, they could have evaluated their system with another metric such as METEOR.
  - We believe that significance testing should have been performed.
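The two BLEU variants mentioned above differ mainly in which reference length the brevity penalty uses. A minimal sketch of that difference, under our reading that IBM-BLEU uses the reference length closest to the hypothesis length while the NIST mteval script uses the shortest reference length (the function and variant names are ours, not the paper's):

```python
import math

def brevity_penalty(hyp_len, ref_lens, variant="ibm"):
    """BLEU brevity penalty under two reference-length conventions.

    "ibm": use the reference length closest to the hypothesis length
    (ties broken toward the shorter reference).
    "nist": use the shortest reference length, which penalizes short
    hypotheses less -- hence NIST-BLEU "prefers shorter sentences".
    """
    if variant == "ibm":
        r = min(ref_lens, key=lambda length: (abs(length - hyp_len), length))
    else:  # "nist"
        r = min(ref_lens)
    return 1.0 if hyp_len >= r else math.exp(1.0 - r / hyp_len)

# A 13-token hypothesis against 10- and 14-token references: the NIST
# convention applies no penalty (13 >= 10), while the IBM convention
# measures against the closer 14-token reference and does penalize.
bp_nist = brevity_penalty(13, [10, 14], "nist")
bp_ibm = brevity_penalty(13, [10, 14], "ibm")
```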
  
====Conclusion:====
The paper is nicely written and all experiments are well documented. We believe the consistent-translation-choices system is well suited only for translating from a morphologically rich language into a morphologically poor one, and not the other way round. When translating into a morphologically rich language, this approach can make serious errors by treating different morphological forms of a word, bearing different meanings, as consistent translations.
