courses:rg:reranking-by-multitask-learning [2010/10/18 10:42] ivanova
courses:rg:reranking-by-multitask-learning [2010/10/22 13:56] vandas (Basics of comments and discussion after the reading)
Kevin Duh, Katsuhito Sudoh, Hajime Tsukada, Hideki Isozaki, Masaaki Nagata
[[http://
[[http://
ACL 5th Workshop on Statistical Machine Translation (WMT) 2010
===== Suggestions for the presenter =====
It would be great to have an illustrative but simple example of an N-best list and also examples
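A minimal sketch of what such an example could look like (the candidates, feature names, and weights below are invented for illustration): an N-best list is the list of top-N candidate translations for one source sentence, each carrying a feature vector, and reranking picks the candidate whose features score highest under a weight vector w.

```python
# Hypothetical 3-best list for one source sentence: each candidate
# carries a sparse feature vector (feature name -> value).
nbest = [
    ("the house is small",
     {"lm_score": -2.1, "tm_score": -1.0, "word_penalty": 5}),
    ("the home is small",
     {"lm_score": -2.5, "tm_score": -0.8, "word_penalty": 5}),
    ("small is the house",
     {"lm_score": -4.0, "tm_score": -1.2, "word_penalty": 5}),
]

# Invented weight vector w; in the paper such weights are learned.
w = {"lm_score": 1.0, "tm_score": 0.5, "word_penalty": -0.1}

def score(features, w):
    """Dot product of a sparse feature vector with the weight vector."""
    return sum(w.get(name, 0.0) * value for name, value in features.items())

# Reranking = sort the N-best list by model score, best first.
reranked = sorted(nbest, key=lambda cand: score(cand[1], w), reverse=True)
print(reranked[0][0])
```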
===== Comments =====

  * <
===== Opinions on the paper =====
TODO: suggestions

The research group suggested that they extract only those features that have a nonzero weight in any of the weight vectors W.

Comments by M. Popel:
Feature pruning using a threshold: when you have limited data, according to this work it is worth trying a good feature rather than setting a threshold.

We were arguing about the number of features.
(I suppose that it is just the number of input features; whether they were really used is not clear.)

Every feature fires only in sentences where its conditions are met.
Example: 500 sentences, each with just one N-best list. That means 500 weight vectors.

We argued about hashing the features together - in what way are they hashed?
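One common answer (not necessarily what the authors do) is the hashing trick: each feature name is hashed to an index in a fixed-size weight vector, so distinct features can collide and share a weight, i.e. be "hashed together". A minimal sketch:

```python
import hashlib

DIM = 16  # fixed number of buckets; collisions make features share a weight

def bucket(feature_name: str) -> int:
    """Hash a feature name into one of DIM buckets (md5 gives a hash
    that is stable across runs; any hash function would do)."""
    digest = hashlib.md5(feature_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % DIM

def hash_features(features: dict) -> list:
    """Fold a sparse feature vector into a dense DIM-sized vector;
    colliding features are summed into the same component."""
    vec = [0.0] * DIM
    for name, value in features.items():
        vec[bucket(name)] += value
    return vec

vec = hash_features({"lm_score": -2.1, "rule=NP->DT_NN": 1.0})
```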