====== N-best Reranking by Multitask Learning ======
Kevin Duh, Katsuhito Sudoh, Hajime Tsukada, Hideki Isozaki, Masaaki Nagata
[[http://aclweb.org/anthology-new/P/P10/P10-1160.pdf]]
[[http://www.kecl.ntt.co.jp/icl/lirg/members/kevinduh/papers/duh10multitask-slides.pdf|Kevin's slides]]
ACL 5th Workshop on Statistical Machine Translation (WMT) 2010
  
===== Suggestions for the presenter =====
  
It would be great to have an illustrative but simple example of an N-best list, together with examples of features and labels (to pin down the terminology); a toy illustration follows below.
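
A toy illustration of the requested example (the sentences, feature names, and values below are invented, not from the paper): one source sentence has an N-best list of translation hypotheses, each hypothesis has a feature vector, and the label is a quality score such as sentence-level BLEU; reranking means picking the hypothesis the model scores highest.

<code python>
# Hypothetical 3-best list for one source sentence (all values invented).
nbest = [
    # (hypothesis,      features,                           label, e.g. sentence-level BLEU)
    ("he goes home",    {"lm_score": -12.3, "length": 3.0}, 0.9),
    ("he goes to home", {"lm_score": -15.1, "length": 4.0}, 0.6),
    ("him go home",     {"lm_score": -18.7, "length": 3.0}, 0.2),
]

# A reranker scores each hypothesis with learned feature weights (toy values here)
# and outputs the highest-scoring one.
best = max(nbest, key=lambda h: 0.5 * h[1]["lm_score"] - 0.1 * h[1]["length"])
print(best[0])  # -> "he goes home"
</code>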
  
===== Comments =====
  * The <latex>L_p</latex> norm of a vector <latex>\vec{x}=(x_1, x_2, \dots, x_n)</latex> is defined as <latex>||\vec{x}||_p = (\sum_i |x_i|^p)^{1/p}</latex>, so e.g. the <latex>L_1</latex> norm is simply the sum of absolute values. The <latex>L_p</latex> norm is also sometimes called the <latex>\ell_p</latex> norm or just the p-norm.
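  * A concrete instance: for <latex>\vec{x}=(3,-4)</latex> we get <latex>||\vec{x}||_1 = |3| + |-4| = 7</latex>, <latex>||\vec{x}||_2 = \sqrt{3^2 + 4^2} = 5</latex>, and <latex>||\vec{x}||_\infty = \max_i |x_i| = 4</latex>, the limit for <latex>p \to \infty</latex>.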
  
===== Opinions on the paper =====

TODO: suggestions to resolve / comment on

The research group suggested that they extract only those features that have a nonzero weight in any of the weight vectors in W (a sketch of this pruning follows M. Popel's comment below).

Comments by M. Popel:
Feature pruning using a threshold: when you have limited data, then according to this work it is worth more to try a good feature than to set a threshold.
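
A minimal sketch of both pruning criteria mentioned above, assuming the learned per-task weights are stacked into a matrix W with one row per task (the matrix and the threshold value here are hypothetical):

<code python>
import numpy as np

# Hypothetical weights: one row per task (here: per N-best list), one column per feature.
W = np.array([[0.0, 1.2, 0.0, -0.3],
              [0.0, 0.7, 0.0,  0.0],
              [0.0, 0.0, 0.0,  0.5]])

# The group's suggestion: keep a feature if ANY task assigns it a nonzero weight.
keep_nonzero = np.any(W != 0.0, axis=0)

# Threshold-based pruning: keep a feature only if some |weight| exceeds tau.
tau = 0.6
keep_thresholded = np.any(np.abs(W) > tau, axis=0)

print(np.flatnonzero(keep_nonzero))      # -> [1 3]
print(np.flatnonzero(keep_thresholded))  # -> [1]
</code>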
We argued about the number of features used in the feature sets. It is unlikely that they could somehow end up with a fixed number of features. (I suppose it is just the number of input features; whether all of them were really used is not clear.)

Every feature fires only on the sentences where its conditions are met.
Example: 500 sentences, each with exactly one N-best list; that means 500 weight vectors, one per N-best list.

We also argued about hashing the features together: in what way exactly are they hashed? (A generic sketch of feature hashing follows below.)
  
