====== Transductive Learning for Statistical Machine Translation ======
===== Comments =====
  
  * The paper describes the transductive learning algorithm (**Algorithm 1**) very well; it is inspired by the Yarowsky algorithm [1][2].
  
  * In Algorithm 1, the translation model is estimated from the sentence pairs in the bilingual data L. A set of source-language sentences, U, is then translated with the current model. In each iteration, a subset of good translations together with their sources, T_i, is selected and added to the training data. These sentence pairs are replaced in every iteration; only the original data, L, stays fixed throughout the algorithm (see the sketch after this list).
  * The main issue with this paper is that the number of iterations used to train the model is not described. According to Figure 1 in the paper, the curves reach their global maximum at iteration 16 and iteration 18 on the train100k and train150k corpora, respectively. Our main concern is that the authors may have known the BLEU score on the test set in advance and stopped the training process, or truncated the graph, right after the iteration at which BLEU was optimized.
  
  * Adding examples of sentences for which an improvement is achieved would have helped much more in understanding the impact of this training scheme.

  * Ambiguous terminology is used for defining the feasibility setting in Table 3: '**' is defined as marking those experiments that produced a minimal improvement over the baseline, but this does not mean that experiments marked with '*' achieved a significant improvement over the baseline.
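
To make the loop concrete, here is a minimal Python sketch of the self-training scheme described above. The callables train, translate, and select are hypothetical stand-ins for the paper's model estimation, decoding, and confidence-based selection steps; none of these names come from the paper.

<code python>
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

def transductive_training(
    L: List[Pair],                                # bilingual data, fixed throughout
    U: List[str],                                 # monolingual source-language sentences
    train: Callable[[List[Pair]], object],        # estimates a translation model (assumed)
    translate: Callable[[object, str], str],      # decodes one source sentence (assumed)
    select: Callable[[List[Pair]], List[Pair]],   # keeps only the "good" pairs T_i (assumed)
    iterations: int,
) -> object:
    model = train(L)                              # initial model from bilingual data only
    for _ in range(iterations):
        # Translate the monolingual source sentences with the current model.
        hypotheses = [(src, translate(model, src)) for src in U]
        # T_i is replaced, not accumulated, in every iteration; only L stays fixed.
        T_i = select(hypotheses)
        model = train(L + T_i)
    return model
</code>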
  
  
===== Suggested Additional Reading =====
  * [1] D. Yarowsky. 1995. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proc. ACL.
  * [2] S. Abney. 2004. Understanding the Yarowsky Algorithm. Computational Linguistics, 30(3).
  
  
