Unsupervised methods for head assignments, EACL 2009
Federico Sangati and Willem Zuidema
Presented by: Zdeněk Žabokrtský
Report by: Eduard Bejček and Lasha Abzianidze
Introduction
- The paper describes two methods (entropy minimization and “familiarity maximization”) for unsupervised head assignment and evaluates them on two corpora (English Penn Treebank/PARC 700 and German Tiger/TigerDB) in two ways (against a gold standard, and by using the assignments in a constituency parser).
- It is the first successful unsupervised method for head assignment.
- The starting point of the unsupervised method is the observation that a head-annotated treebank (obeying the constraint that every nonterminal node has exactly one head daughter) defines a unique single-anchored Lexicalized Tree Substitution Grammar (LTSG), i.e. an LTSG in which every elementary tree has exactly one lexical anchor.
- Baselines: four baselines are used for the head assignments on the training data (see the first sketch at the end of this section):
- Magerman-Collins scheme - based on a number of heuristic rules that use only the labels of nonterminal nodes and the ordering of daughter nodes (not general, as it depends on the particular POS tag names);
- Random - randomly marking one of the nonterminal daughters of every node as head;
- Left - the leftmost daughter is always marked as head;
- Right - the rightmost daughter is always marked as head;
- Entropy Minimization is a greedy algorithm which aims to learn the simplest LTSG fitting the data, in the sense of minimizing the uncertainty about which elementary tree is assigned to a given lexical item (see the second sketch at the end of this section).
- Familiarity Maximization is also a greedy algorithm; it aims to assign heads to a tree t in such a way that the elementary trees in the complete derivation of t are “familiar”, i.e. frequently observed in the complete derivations of the other trees.
- There is a problem when converting the gold dependency structures into head annotations: the authors' algorithm does not guarantee that exactly one daughter among siblings is assigned as head.
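To make the setup concrete, here is a minimal sketch (our own code and naming, not the authors') of the Left/Right/Random baselines and of the spine that a head annotation induces at every node; following head daughters from any node always ends in exactly one word, which is why the induced LTSG is single-anchored:

```python
import random

class Node:
    """A constituency tree node; `head` indexes the head daughter."""
    def __init__(self, label, children=None, word=None):
        self.label = label             # nonterminal label or POS tag
        self.children = children or []
        self.word = word               # set on preterminals only
        self.head = None               # index of the head daughter

def assign_heads(node, strategy="left"):
    """Baseline head assignment: mark one daughter of every node as head."""
    if not node.children:
        return
    if strategy == "left":
        node.head = 0
    elif strategy == "right":
        node.head = len(node.children) - 1
    elif strategy == "random":
        node.head = random.randrange(len(node.children))
    for child in node.children:
        assign_heads(child, strategy)

def spine(node):
    """Follow head daughters from `node` down to its lexical anchor.
    The anchor is unique, hence the induced LTSG is single-anchored."""
    labels = [node.label]
    while node.children:
        node = node.children[node.head]
        labels.append(node.label)
    return labels, node.word

# (S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))
tree = Node("S", [Node("NP", [Node("DT", word="the"), Node("NN", word="cat")]),
                  Node("VP", [Node("VBZ", word="sleeps")])])
assign_heads(tree, "right")
print(spine(tree))  # (['S', 'VP', 'VBZ'], 'sleeps')
```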
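The objective that entropy minimization greedily reduces can also be made explicit. This is our reading of it (a minimal sketch, representing elementary trees simply as bracketed strings for brevity):

```python
from collections import Counter
from math import log2

def grammar_entropy(etrees_per_word):
    """Sum, over lexical items, of the entropy of the conditional
    distribution over the elementary trees anchored in that item."""
    total = 0.0
    for counts in etrees_per_word.values():
        n = sum(counts.values())
        total -= sum((c / n) * log2(c / n) for c in counts.values())
    return total

# A word whose elementary tree is fully predictable contributes 0 bits:
lexicon = {
    "the":    Counter({"(DT the)": 10}),                  # H = 0 bits
    "sleeps": Counter({"(S (NP) (VP (VBZ sleeps)))": 5,
                       "(VP (VBZ sleeps))": 5}),          # H = 1 bit
}
print(grammar_entropy(lexicon))  # 1.0
```

Our understanding is that each greedy step chooses head assignments so as to reduce this quantity.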
What do we dislike about the paper
- Equation 5 is misleading/wrong: the “distribution” p over which the entropy is computed sums to <latex>|\mathcal{L}|</latex> rather than to 1 (see the reconstruction after this list).
- We found three possible interpretations of “any” in Section 3.3, second paragraph, so this part of the approach is unclear. The interpretations are:
- apply one fixed annotation of the heads (but it can be any annotation)
- apply the (partial) annotation which is consistent with all possible head annotations
- apply all possible head annotations
We believe the third interpretation is the intended one, because a) the point of the greedy technique is precisely that it can, in principle, arrive at any possible head annotation, and b) only under this reading does the threat mentioned in the article arise, namely an exponential number of elementary trees for a sentence of length <latex>n</latex>. In fact, the number is less than <latex>n^2</latex>, and about <latex>n\log n</latex> for balanced trees, since each lexical anchor yields at most one elementary tree per ancestor, i.e. at most depth-of-tree many (see the derivation after this list). -Lasha-
- It is also unclear what “transformed to their respective spines” means exactly (Section 3.4, first item of the itemized list).
When extracting the elementary trees from a head-annotated sentence, one first extracts the spines, and these then yield the elementary trees (in this process, elementary trees and spines stand in one-to-one correspondence). Hence “respective spines” refers to these spines. Figures 2 and 3 depict elementary trees and their respective spines. -Lasha-
- The greediness of the entropy model could be its main weakness and may explain its relatively poor scores.
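Regarding Equation 5, the reading that would make it consistent (assuming per-anchor normalization is intended; the notation <latex>\mathcal{E}_l</latex> for the set of elementary trees anchored in <latex>l</latex> is ours) is a sum of per-word entropies:

<latex>H(G) = -\sum_{l \in \mathcal{L}} \sum_{e \in \mathcal{E}_l} p(e \mid l)\,\log_2 p(e \mid l), \qquad \sum_{e \in \mathcal{E}_l} p(e \mid l) = 1 \ \text{for each } l \in \mathcal{L}.</latex>

Since each conditional distribution sums to 1, the pooled values of p over all elementary trees sum to <latex>|\mathcal{L}|</latex>, which is exactly the inconsistency noted above.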
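As for the counting argument: every elementary tree of a parse tree <latex>t</latex> is determined by the pair (lexical anchor, ancestor node serving as its root), so for a sentence of length <latex>n</latex>

<latex>\#\{\text{elementary trees of } t\} \;\le\; \sum_{\text{leaf } l} \mathrm{depth}(l) \;\le\; n \cdot \mathrm{depth}(t),</latex>

which gives <latex>O(n\log n)</latex> for (roughly) balanced binary trees and stays below <latex>n^2</latex> in general.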
What do we like about the paper
- Things mentioned in the Introduction.
- It contributes to a popular problem, the transformation of constituency treebanks into dependency structures (see the sketch below).
- The correspondence between rule-based head assignment (Magerman-Collins) and the dependencies in the PARC 700 Dependency Bank is surprisingly low: 85%.
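Since the head annotation is what licenses the constituency-to-dependency transformation, a minimal sketch of that direction may be useful (our own tree encoding, not the paper's): the lexical head of every non-head daughter depends on the lexical head of the head daughter.

```python
def lexical_head(node):
    """Return the word that the head annotation percolates up to `node`."""
    if len(node) == 2:                  # preterminal: (pos, word)
        return node[1]
    label, head, children = node
    return lexical_head(children[head])

def dependencies(node, deps=None):
    """Read off (dependent, governor) word pairs from a head-annotated
    tree: each non-head daughter's lexical head depends on the head
    daughter's lexical head."""
    if deps is None:
        deps = []
    if len(node) == 3:                  # internal: (label, head_idx, kids)
        label, head, children = node
        governor = lexical_head(children[head])
        for i, child in enumerate(children):
            if i != head:
                deps.append((lexical_head(child), governor))
            dependencies(child, deps)
    return deps

# (S (NP (DT the) (NN cat)) (VP (VBZ sleeps))) with heads NN, VP, VBZ:
tree = ("S", 1, [("NP", 1, [("DT", "the"), ("NN", "cat")]),
                 ("VP", 0, [("VBZ", "sleeps")])])
print(dependencies(tree))  # [('cat', 'sleeps'), ('the', 'cat')]
```

Different head assignments yield different dependency trees over the same constituency structure, which is what the 85% figure above measures.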