Automatic Domain Adaptation for Parsing
David McClosky, Eugene Charniak, Mark Johnson (NAACL HLT 2010)
Presented by: Nathan Green
Report by: Katerina Topilova
Comments
Summary:
Idea – when parsing large amounts of text from diverse domains, it is useful for a parser to be able to generalize to a variety of domains.
The result is a system that proposes linear combinations (mixtures) of parsing models trained on the source corpora.
- Uses regression to predict the f-score a given mixture would achieve on the target text
- Features – CosineTop50, UnkWords, and Entropy
- Training data for the regressor consists of examples of source-domain mixtures and their actual f-scores on target texts (a sketch of the pipeline follows this list)
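As a concrete reading of the pipeline, here is a minimal sketch: enumerate candidate mixtures of the source domains, featurize each one, fit a linear regressor on mixtures whose f-scores were actually measured, and pick the mixture with the highest predicted f-score. All names, the candidate-weight grid, the per-domain statistics, and the use of scikit-learn are our assumptions, not the authors' code.

```python
import itertools
import numpy as np
from sklearn.linear_model import LinearRegression

def candidate_mixtures(n_domains, step=0.25):
    """Enumerate mixture weight vectors over the source domains that sum to 1.
    (The grid granularity is our guess; the paper does not specify one.)"""
    ticks = np.arange(0.0, 1.0 + step, step)
    for combo in itertools.product(ticks, repeat=n_domains):
        if abs(sum(combo) - 1.0) < 1e-9:
            yield np.array(combo)

def featurize(weights, domain_stats):
    """Map a mixture to the three regression features.
    domain_stats is a hypothetical list of precomputed per-domain statistics:
    'cos_top50' = CosineTop50 of that domain vs. the target text,
    'unk_rate'  = fraction of target-text words unseen in that domain."""
    cos = sum(w * s["cos_top50"] for w, s in zip(weights, domain_stats))
    unk = sum(w * s["unk_rate"] for w, s in zip(weights, domain_stats))
    nz = weights[weights > 0]
    ent = -np.sum(nz * np.log(nz))  # entropy of the mixture weights
    return np.array([cos, unk, ent])

def fit_fscore_regressor(X, y):
    """X: feature rows for mixtures evaluated on held-out target texts;
    y: the f-scores those mixtures actually obtained."""
    return LinearRegression().fit(X, y)

def best_mixture(regressor, mixtures, domain_stats):
    """Choose the mixture whose predicted f-score on the target is highest."""
    feats = np.array([featurize(w, domain_stats) for w in mixtures])
    return mixtures[int(np.argmax(regressor.predict(feats)))]
```

The chosen weight vector would then be used to combine the parsing models trained on the individual source corpora.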
Evaluation – two scenarios, out-of-domain evaluation and in-domain evaluation
Baselines – Uniform, Self-Trained Uniform, Fixed Set: WSJ, Best Single Corpus, Best Seen, Best Overall
Feature selection – round-robin tuning scenario
Results:
- Self-trained corpora are beneficial
- This model is the best non-oracle system in both scenarios
- Only 0.3% worse than the Best Seen baseline for out-of-domain
- Within 0.6% of the Best Seen baseline for in-domain
- 0.7% better than the Best Overall model
What we dislike about the paper:
- The CosineTop50 feature is not sufficiently explained and admits several readings (see the sketch after this list):
  - we could take the top 50 words of the first corpus, look up their frequencies in the second corpus, and compare the two frequency vectors
  - we could take the top 50 words of each corpus and compare frequencies only where the two lists overlap
  - we could find the top 50 words that the two corpora have in common and then compare their frequencies
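To show how much these readings differ, here is a sketch of all three (our code, not the paper's; freqs1 and freqs2 are hypothetical word-frequency dicts for the two corpora, and the ranking of "common" words in the third reading is itself a guess — we rank by summed frequency):

```python
import math
from collections import Counter

def cosine(f1, f2, keys):
    """Cosine similarity of two frequency dicts restricted to the given words."""
    dot = sum(f1.get(k, 0) * f2.get(k, 0) for k in keys)
    n1 = math.sqrt(sum(f1.get(k, 0) ** 2 for k in keys))
    n2 = math.sqrt(sum(f2.get(k, 0) ** 2 for k in keys))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def top50(freqs):
    return [w for w, _ in Counter(freqs).most_common(50)]

def reading_1(freqs1, freqs2):
    # Top 50 words of corpus 1; compare their frequencies in both corpora.
    return cosine(freqs1, freqs2, top50(freqs1))

def reading_2(freqs1, freqs2):
    # Top 50 of each corpus; compare only where the two lists overlap.
    return cosine(freqs1, freqs2, set(top50(freqs1)) & set(top50(freqs2)))

def reading_3(freqs1, freqs2):
    # Top 50 words the corpora share, ranked here by summed frequency.
    common = {w: freqs1[w] + freqs2[w] for w in set(freqs1) & set(freqs2)}
    return cosine(freqs1, freqs2, top50(common))
```

The three functions can return noticeably different similarities for the same pair of corpora, which is why the ambiguity matters.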
- The Entropy feature could also have been explained more fully. The reading group agreed that it is the entropy of the distribution of mixture weights over the source domains, with mixtures whose entropy is closer to the maximum considered better.
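Under this reading the feature is just the Shannon entropy of the weight vector; a tiny worked example of our interpretation (not a definition taken from the paper):

```python
import math

def mixture_entropy(weights):
    """Shannon entropy of a mixture distribution over source domains."""
    return -sum(w * math.log(w) for w in weights if w > 0)  # 0*log(0) taken as 0

print(mixture_entropy([0.25, 0.25, 0.25, 0.25]))  # ~1.386 = ln(4), the maximum
print(mixture_entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0, a single-domain mixture
```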
- Only GENIA and SWBD are really different from WSJ; the other corpora perform well on WSJ anyway, so more domains would be better
- Overall, it would be very hard to implement this system based solely on this article
What we like about the paper:
- The results are good: the system beats all non-oracle baselines in both evaluation scenarios