======Non-projective Dependency Parsing using Spanning Tree Algorithms======
**Ryan McDonald, Fernando Pereira, Kiril Ribarov and Jan Hajič**
{{http://acl.ldc.upenn.edu/H/H05/H05-1066.pdf | Non-projective Dependency Parsing using Spanning Tree Algorithms }}
in proceedings of ACL 2005

report by **Lasha Abzianidze**
===== Introduction =====
In the paper, authors formalize weighted dependency parsing as searching for maximum spanning trees (MST) in complete directed graphs, where vertices are words of a sentence and edges dependency relations. Based on this representation, they use Chu-Liu-Edmonds MST algorithm for non-projective parsing (allowing non-projective parses). With this result they got the less complex non-projecting parsing algorithm with <latex>O(n^2)</latex> time complexity than Eisner's algorithm for projective parsing (forbids non-projective parses) with <latex>O(n^3)</latex> time complexity. Moreover, the non-projective parsing algorithm shows the increase in efficiency and accuracy for languages with non-projective dependencies (checked on Czech PDT).


===== Comments =====

  * In English, projective trees are sufficient to analyze most sentence types, e.g. Penn Treebank contains only projective trees. On the other hand, in languages with more flexible word order than English, for instance Czech, non-projective dependencies are more frequent. For the demonstration, there are about 23% of non-projective dependency trees in Czech PDT.  
  * In the paper, a non-projective parser is understood as a parser allowing non-projective dependency parse trees along with projective dependency parse trees, in contrast to a projective parser, which forbids non-projective dependency parse trees. Also under the space of non-projective trees, authors mean the union space of both non-projective and projective trees.
  * The score of dependency trees are commonly represented as the sum of the scores of all edges in the tree, and the score of an edge is a dot product weight vector and feature vector (containing information about nodes - words).
  * Chu-Liu-Edmonds algorithm for finding maximum spanning trees takes in general <latex>O(n^3)</latex> time but for dense graphs Tarjan (1997) gave an efficient implementation of the algorithm with <latex>O(n^2)</latex>. The latter implementation is used by authors.          
  * In the training phase, two modified versions of the Margin Infused relaxed Algorithm (MIRA) are used: Single-best MIRA and Factored MIRA. The reason of the modifications is to lower the time complexity of the training.
  * Experiments are done on the Czech PDT. In particular, on entire PDT (Czech-A) and on 23% portion of PDT including only non-projective dependency trees (Czech-B). The introduced algorithm is competing to other 3 dependency parsers (2 projective and 1 pseudo-projective).      
  * Chu-Liu-Edmonds MST algorithm with Factored MIRA shows the best results on both Czech-A and Czech-B, and slightly lower results than McD2005 projective parser on English projective dependency trees.   

===== What do we like about the paper =====
  * The paper does not require much background knowledge from a reader
  * All algorithms and approaches are described clearly.
===== What do we dislike about the paper =====
  * Subtle use of the notion "non-projective" meaning both non-projective and projective together, which can be little bit confusing for a reader.


\\ written by //[[lasha.abzianidze@gmail.com|Lasha Abzianidze]]//