
Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:rg:reranking-by-multitask-learning [2010/10/18 17:36]
popel link to slides
courses:rg:reranking-by-multitask-learning [2010/10/22 13:56]
vandas Basics of comments and discussion after the reading
ACL 5th Workshop on Statistical Machine Translation (WMT) 2010
  
===== Suggestions for the presenter =====
  
It would be great to have an illustrative but simple example of an N-best list, and also examples of features and of labels (to clarify the terminology).
  
===== Comments =====
  * The <latex>L_p</latex> norm of a vector <latex>\vec{x}=(x_1, x_2,...,x_n)</latex> is defined as <latex>||\vec{x}||_p = (\sum_i |x_i|^p )^{1/p}</latex>, so e.g. the <latex>L_1</latex> norm is simply the sum of absolute values. The <latex>L_p</latex> norm is sometimes also called the <latex>\ell_p</latex> norm, or just the p-norm.
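The definition above can be checked with a tiny sketch (the function name is ours):

```python
def lp_norm(x, p):
    """L_p norm of a vector: (sum_i |x_i|^p)^(1/p)."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

v = [3.0, -4.0]
print(lp_norm(v, 1))  # L1 norm: |3| + |-4| = 7.0
print(lp_norm(v, 2))  # L2 norm: sqrt(9 + 16) = 5.0
```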
  
===== Opinions on the paper =====
  
TODO: suggestions to solve/comment

The research group suggested that they extract only those features that have a nonzero weight in any of the weight vectors W.

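As we understood it, that means keeping the union of the supports of the per-task weight vectors; a hedged sketch with toy data (all names are ours, not the paper's):

```python
# Toy per-task weight vectors W (one dict per task), keyed by feature name.
W = [
    {"f1": 0.7, "f2": 0.0, "f3": -0.2},
    {"f1": 0.0, "f2": 0.0, "f3": 0.5},
]

# Keep a feature if it has a nonzero weight in ANY of the weight vectors.
kept = {f for w in W for f, value in w.items() if value != 0.0}
print(sorted(kept))  # ['f1', 'f3']
```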
Comments by M. Popel:
Feature pruning using a threshold: when you have limited data, according to this work it is better to try a good feature than to set a threshold.

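For contrast, threshold-based pruning simply drops features whose weight magnitude falls below a cutoff; a toy illustration (the data and cutoff value are arbitrary, ours only):

```python
# Toy weight vector; the cutoff is arbitrary, for illustration only.
weights = {"f1": 0.7, "f2": 0.01, "f3": -0.2}
THRESHOLD = 0.1

# Threshold pruning: drop features whose weight magnitude is below the cutoff.
pruned = {f: w for f, w in weights.items() if abs(w) >= THRESHOLD}
print(sorted(pruned))  # ['f1', 'f3']
```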
We were arguing about the number of features used in the sets. It is unlikely that they could somehow get a fixed number of features.
(I suppose it is just the number of input features; whether they were all really used is not clear.)

Every feature fires only on the sentences where its conditions are met.
Example: 500 sentences, each with one N-best list — that means 500 weight vectors.

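A toy illustration of a feature firing only where its condition holds (the condition and data are invented by us):

```python
# Toy "feature": fires only on sentences containing the token "b".
sentences = ["a b", "a", "b b"]

def fires(sentence):
    return "b" in sentence.split()

# Binary feature value per sentence: 1 where the condition is met, else 0.
values = [1 if fires(s) else 0 for s in sentences]
print(values)  # [1, 0, 1]
```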
 +We argued about hashing the features together - in what way are they hashed? 
 + 
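One common variant of the hashing trick (our sketch only — not necessarily what the paper does) maps feature names into a fixed number of buckets, so distinct features may collide:

```python
import hashlib

NUM_BUCKETS = 8  # arbitrary illustration; real systems use far more buckets

def bucket(feature_name):
    """Map a feature name to a fixed-size index with a stable hash."""
    digest = hashlib.md5(feature_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

# Distinct features may collide in the same bucket; the collisions are the
# price paid for keeping the feature space at a fixed size.
indices = {f: bucket(f) for f in ["f1", "f2", "f3"]}
print(all(0 <= i < NUM_BUCKETS for i in indices.values()))  # True
```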
