====== Unsupervised methods for head assignments, EACL 2009 ======
=== Federico Sangati and Willem Zuidema ===
Presented by: Zdeněk Žabokrtský
Report by: Eduard Bejček and Lasha Abzianidze
===== Introduction =====
* The paper describes **two** methods (entropy minimization and "familiarity maximization") for **unsupervised** head assignment and tests them on **two** corpora (English Penn Treebank/PARC 700 and German Tiger/TigerDB) in **two** ways (by comparing against a gold standard, and by using the induced heads in a constituency parser).
* It is the **first** successful unsupervised method for head assignment.
* The starting point of the unsupervised methods is the observation that a head-annotated treebank (obeying the constraint that every nonterminal node has exactly one head daughter) defines a unique single-anchored Lexicalized Tree Substitution Grammar (LTSG) - an LTSG in which every elementary tree has exactly one lexical anchor (see the extraction sketch after this list).
* **Four baselines** are used for head assignment on the training data:
* **Magerman-Collins scheme** - based on a number of heuristic rules that use only the labels of nonterminal nodes and the ordering of daughter nodes (not language-general, since it depends on the particular POS-tag and category names);
* **Random** - randomly marking one of the nonterminal daughters of every node as head;
* **Left** - the left-most daughter is always marked as head;
* **Right** - the right-most daughter is always marked as head.
* **Entropy Minimization** is a greedy algorithm that aims to learn the simplest LTSG fitting the data, in the sense of minimizing the uncertainty about which elementary tree a given lexical item anchors (a toy sketch follows this list).
* **Familiarity Maximization** is also a greedy algorithm; it aims to assign heads to a tree //t// in such a way that the elementary trees in the complete derivation of //t// are "familiar", i.e. frequently observed in the (candidate) derivations of the other trees (see the sketch below).
* There is a problem when converting dependency structures into head annotations: the algorithm does not guarantee that **exactly one head** is assigned among siblings - when the dependency analysis is not congruent with the constituency tree, more than one daughter of a node may contain a word whose dependency head lies outside the constituent, so the head daughter is not always unique.
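
The following is a minimal sketch (ours, not the authors' code) of the treebank-to-LTSG correspondence mentioned above: once every internal node has exactly one daughter marked as head, cutting the tree at every non-head daughter yields exactly one elementary tree per word, and each elementary tree is determined by its //spine// (the chain of head-marked nodes from the root of the elementary tree down to its lexical anchor). The ''Node'' class and the bracketing format are our own assumptions.

<code python>
# Sketch: reading off the unique single-anchored LTSG from a head-annotated
# tree. Every internal node has exactly one daughter flagged as head;
# cutting the tree at every non-head daughter yields one elementary tree
# per word. '@' marks a substitution site.

class Node:
    def __init__(self, label, children=(), head=False):
        self.label = label              # nonterminal symbol or word
        self.children = list(children)
        self.head = head                # is this the head daughter of its parent?

def elementary_trees(root):
    """Return the elementary trees of a head-annotated tree as bracketings."""
    trees = []

    def follow_spine(n):
        if not n.children:
            return n.label                      # reached the lexical anchor
        parts = []
        for c in n.children:
            if c.head:
                parts.append(follow_spine(c))   # stay on the spine
            else:
                cut(c)                          # start a new elementary tree
                parts.append(c.label + '@')     # leave a substitution site
        return '(%s %s)' % (n.label, ' '.join(parts))

    def cut(n):
        trees.append(follow_spine(n))

    cut(root)
    return trees

# Toy example: (S (NP John) (VP (V loves) (NP Mary))) with VP and V as heads.
s = Node('S', [
    Node('NP', [Node('John', head=True)]),
    Node('VP', [Node('V', [Node('loves', head=True)], head=True),
                Node('NP', [Node('Mary', head=True)])], head=True),
])
print(elementary_trees(s))
# ['(NP John)', '(NP Mary)', '(S NP@ (VP (V loves) NP@))']
</code>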
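For **entropy minimization**, here is a toy hill-climbing sketch of the kind of objective involved; it is our paraphrase of the idea, not the paper's exact procedure. We represent the current head annotation abstractly as a list of ''(word, elementary_tree)'' pairs, and a ''move'' is any hypothetical local change to the annotation (e.g. flipping one head mark and re-extracting the affected elementary trees).

<code python>
import math
from collections import Counter, defaultdict

def lexicon_uncertainty(assignments):
    """Sum over words w of the entropy of p(e | w), with
    p(e | w) = count(w, e) / count(w). This is our reading of the
    quantity the paper minimizes (cf. the Eq. 5 remark below)."""
    by_word = defaultdict(Counter)
    for word, etree in assignments:     # one pair per token in the treebank
        by_word[word][etree] += 1
    h = 0.0
    for counts in by_word.values():
        n = sum(counts.values())
        h -= sum(c / n * math.log2(c / n) for c in counts.values())
    return h

def greedy_entropy_minimization(current, candidate_moves):
    """Hill-climb: apply any local change that lowers the objective until
    none does (a local minimum -- hence the greediness remark below).
    `candidate_moves(a)` yields functions mapping an annotation to a
    modified one; both names are our own scaffolding."""
    best = lexicon_uncertainty(current)
    improved = True
    while improved:
        improved = False
        for move in candidate_moves(current):
            proposal = move(current)
            score = lexicon_uncertainty(proposal)
            if score < best:
                current, best, improved = proposal, score, True
                break       # re-enumerate moves from the new annotation
    return current
</code>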
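And a correspondingly compressed sketch of **familiarity maximization**, again ours and heavily simplified: count every elementary tree occurring in //any// candidate derivation of any sentence (exactly where the ambiguous "//any//" of Section 3.3, discussed below, matters), then for each tree pick the head annotation whose complete derivation uses the most frequently observed elementary trees. Scoring a derivation by the sum of its elementary-tree counts is one natural reading of "familiarity"; the paper's exact scoring and greedy order may differ.

<code python>
from collections import Counter

def build_familiarity_table(all_candidate_etrees):
    """Count each elementary tree across the candidate derivations of the
    whole treebank (the interpretation of "any" decides what goes in here)."""
    return Counter(all_candidate_etrees)

def familiarity(derivation, freq):
    """Score one complete derivation (a list of elementary trees) by how
    often its pieces were observed elsewhere. Summing raw counts is our
    simplification."""
    return sum(freq[etree] for etree in derivation)

def most_familiar_annotation(candidate_derivations, freq):
    """Among all head annotations of one tree -- each inducing one complete
    derivation -- pick the most familiar one. Enumerating all candidates
    is exponential in general, hence the paper's greedy strategy."""
    return max(candidate_derivations, key=lambda d: familiarity(d, freq))
</code>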
===== What do we dislike about the paper =====
* Equation 5 is misleading/wrong: the "entropy" there is computed over a weight function **p** that sums up to |\mathcal{L}| rather than to 1, so it is not the entropy of a single distribution (see the reconstruction at the end of this section).
* We found three possible interpretations of "//any//" in Section 3.3, second paragraph; the approach is therefore somewhat unclear. The interpretations are:
- apply one fixed annotation of the heads (but it can be any annotation)
- apply the (partial) annotation which is consistent with all possible head annotations
- apply all possible head annotations
> The third interpretation must be the intended one, because a) the point of the greedy technique is precisely that it can, in principle, produce any possible head annotation, and b) only under this reading does the threat mentioned in the article arise: the number of head annotations of a sentence of length n is exponential. The number of distinct **elementary trees**, however, stays polynomial - each lexical anchor has one candidate elementary tree per ancestor on its path to the root, so at most n per anchor and fewer than n^2 in total, namely about n\log(n) for balanced trees. -Lasha-
* It is also unclear what "transformed to their respective **spines**" means exactly (Section 3.4, first item of the itemized list).
> When extracting the elementary trees from a head-annotated sentence, one first extracts the //spines//, which then yield the elementary trees (in this process, elementary trees and spines are in one-to-one correspondence). By //respective spines// the authors therefore mean exactly these spines. Figures 2 and 3 depict elementary trees together with their respective spines. -Lasha-
* The **greediness** of the entropy model could be its main weakness, which would explain its relatively modest numbers.
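
To spell out the Equation 5 objection, here is our reconstruction (the notation is ours, following how we read the paper): each lexical item has a proper conditional distribution over its elementary trees, but summing the "entropy" terms over all lexical items gives weights totalling |\mathcal{L}|, not 1.

<code latex>
% Our reconstruction of the objection; notation as we read the paper.
% Per lexical item $\ell \in \mathcal{L}$, the counts define a proper
% conditional distribution over the elementary trees anchored in $\ell$:
\[
  \sum_{e \in \mathcal{E}_\ell} p(e \mid \ell) = 1 .
\]
% The quantity of Eq. 5,
\[
  H(p) = - \sum_{\ell \in \mathcal{L}} \sum_{e \in \mathcal{E}_\ell}
           p(e \mid \ell)\, \log_2 p(e \mid \ell),
\]
% is therefore computed over weights that sum to
% $\sum_{\ell}\sum_{e} p(e \mid \ell) = |\mathcal{L}| \neq 1$:
% it is not the entropy of a single distribution $p$, but the sum of
% per-word conditional entropies
\[
  H(p) = \sum_{\ell \in \mathcal{L}} H\bigl(p(\,\cdot \mid \ell)\bigr),
\]
% a sensible objective to minimize, just mislabelled as an entropy.
</code>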
===== What do we like about the paper =====
* Things mentioned in the Introduction.
* It contributes to a popular problem: the transformation of constituency treebanks into dependency structures.
* The correspondence between the **rule**-based head assignment (Magerman-Collins) and the dependencies in the PARC 700 Dependency Bank is **surprisingly low**: 85%.