N-best Reranking by Multitask Learning

Kevin Duh, Katsuhito Sudoh, Hajime Tsukada, Hideki Isozaki, Masaaki Nagata
http://www.aclweb.org/anthology/W/W10/W10-1757.pdf
Kevin's slides
ACL 5th Workshop on Statistical Machine Translation (WMT) 2010

Suggestions for the presenter

It would be great to have a simple, illustrative example of an N-best list, along with examples of features and labels, to pin down the terminology.
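As a starting point for such an example, here is a minimal sketch in Python. Every sentence, feature name, value, and label below is invented for illustration and does not come from the paper; the label is assumed to be a per-hypothesis quality score (for instance a sentence-level BLEU score), and the names Hypothesis and rerank are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str        # candidate translation
    features: dict   # sparse feature vector: feature name -> value
    label: float     # assumed quality score against the reference (toy values)


# A toy 3-best list for one source sentence (all values are made up).
nbest = [
    Hypothesis("the cat sat on the mat", {"lm": -4.0, "bigram:the_cat": 1.0}, 0.9),
    Hypothesis("cat the sat on mat", {"lm": -7.0, "bigram:cat_the": 1.0}, 0.3),
    Hypothesis("the cat sits on mat", {"lm": -5.0, "bigram:the_cat": 1.0}, 0.6),
]


def rerank(hypotheses, weights):
    """Sort hypotheses by a linear score: the dot product of weights and features."""
    def score(h):
        return sum(weights.get(name, 0.0) * value for name, value in h.features.items())
    return sorted(hypotheses, key=score, reverse=True)


# Hypothetical weight vector; in reranking, weights like these are what gets learned.
weights = {"lm": 1.0, "bigram:the_cat": 0.5}
best = rerank(nbest, weights)[0]
```

In this vocabulary, the N-best list is the set of candidates for one source sentence, the features are the named entries of each candidate's sparse vector, and the labels are the scores the reranker is trained to reproduce in its ordering.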

Comments

Answer by Kevin Duh:
I think some of the lack of clarity stems from the fact that I tried to present various multitask learning algorithms under the same framework, but in practice the details differ for each algorithm. The number of input (hashed) features is 4000. For Shared Subspace and Unsupervised Select, we picked the number of features from {250, 500, 1000}, but for Joint Regularization we do ExtractCommonFeature(W).
Answer by Kevin Duh:
The three multitask methods work on the hashed representation of features, and the thresholding is on the original features, so the number of features for “Unsupervised FeatureSelect + Feature Threshold x>10” is really 60,500 distinct features. I counted them separately (though in practice it is possible that some hashed features have close analogs among the original features). We didn't do a quantitative analysis that tries to map selected hashed features to original features, but instead did another experiment on a smaller dataset that directly trains on the original features (see footnote 10): the result was that “rare” features were those involving conjunctions including function words and special characters.
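Note: the distinction between hashed features and thresholded original features can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the hash function (MD5), the feature names, and the helper names bucket, hash_features, frequent_features, and combine are all hypothetical. Only the 4000 buckets and the x>10 threshold come from the answers above.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 4000  # number of hashed input features mentioned above
MIN_COUNT = 10      # threshold: keep original features with count x > 10


def bucket(name, num_buckets=NUM_BUCKETS):
    """Map a feature name to a bucket index using a stable hash (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def hash_features(features, num_buckets=NUM_BUCKETS):
    """Sum the values of all original features that fall into the same bucket."""
    hashed = Counter()
    for name, value in features.items():
        hashed[bucket(name, num_buckets)] += value
    return dict(hashed)


def frequent_features(corpus, min_count=MIN_COUNT):
    """Return the original feature names whose total count exceeds min_count."""
    totals = Counter()
    for features in corpus:
        totals.update(features)
    return {name for name, count in totals.items() if count > min_count}


def combine(features, keep):
    """Build one vector with hashed and thresholded features in separate namespaces."""
    combined = {("hash", i): v for i, v in hash_features(features).items()}
    combined.update({("orig", n): v for n, v in features.items() if n in keep})
    return combined
```

Because the two groups live in separate namespaces, a hashed feature and an original feature are counted as distinct even when they carry similar information, which mirrors the separate counting described in the answer.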

What do we dislike about the paper

What do we like about the paper

Written by Karel Vandas and Martin Popel