
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


user:zeman:treebanks [2011/11/19 13:08]
zeman Greek sample.
user:zeman:treebanks [2011/11/19 23:23]
zeman Greek parsing.
Line 1590: Line 1590:
 ==== Inside ====
 
-The original morphosyntactic tags have been converted to fit into the three columns (CPOS, POS and FEAT) of the CoNLL format. There //should// be a 1-1 mapping between the [[http://www.bultreebank.org/TechRep/BTB-TR03.pdf|BTB positional tags]] and the CoNLL 2006 annotation. Use [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl?tagset=bg::conll|DZ Interset]] to inspect the CoNLL tagset.
-
-The morphological analysis does not include lemmas. The morphosyntactic tags have been assigned (probably) manually.
-
-The guidelines for syntactic annotation are documented in the other [[http://www.bultreebank.org/TechRep/BTB-TR05.pdf|technical report]]. The CoNLL distribution contains the BulTreeBankReadMe.html file with a brief description of the syntactic tags (dependency relation labels).
+The syntactic annotation style and the tagset for dependency relations (analytical functions) in GDT have been modeled after the [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/a-layer/html/index.html|Prague Dependency Treebank]].
 
 ==== Sample ==== ==== Sample ====
Line 1649: Line 1645:
 ==== Parsing ====
 
-Nonprojectivities in BTB are rare. Only 747 of the 196,151 tokens in the CoNLL 2006 version are attached nonprojectively (0.38%).
+Nonprojectivities in GDT are not frequent. Only 823 of the 70,223 tokens in the CoNLL 2007 version are attached nonprojectively (1.17%).
 
-The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Bulgarian:
+The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Greek:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| MST (McDonald et al.) | 87.57 | 92.04 |
-| Malt (Nivre et al.) | 87.41 | 91.72 |
-| Nara (Yuchang Cheng) | 86.34 | 91.30 |
+| Nakagawa | 76.31 | 84.08 |
+| Keith Hall et al. | 74.21 | 82.04 |
+| Carreras | 73.56 | 81.37 |
+| Malt (Nilsson et al.) | 74.65 | 81.22 |
+| Titov et al. | 73.52 | 81.20 |
+| Chen | 74.42 | 81.16 |
+| Duan | 74.29 | 80.77 |
+| Attardi et al. | 73.92 | 80.75 |
+| Malt (J. Hall et al.) | 74.21 | 80.66 |
+
+The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]].
  
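The nonprojectivity figures quoted in the Parsing section (747 of 196,151 tokens in BTB, 823 of 70,223 in GDT) can in principle be recomputed from the CoNLL files. The following is a minimal Python sketch, assuming the 10-column CoNLL-X / CoNLL 2007 layout (HEAD in the 7th column, blank lines between sentences). It uses one common definition: the arc from head h to dependent d is projective iff every token strictly between h and d is dominated by h. The function names are hypothetical, and the tool behind the numbers on this page may have used a slightly different definition, so the counts need not match exactly.

```python
import sys
from typing import Iterable, Iterator, List


def read_heads(lines: Iterable[str]) -> Iterator[List[int]]:
    """Yield one list of HEAD values per sentence from CoNLL-X lines.

    heads[i] is the head of token i+1; 0 denotes the artificial root.
    Assumes the 10-column layout with HEAD in column 7 (index 6).
    """
    heads: List[int] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():          # blank line = sentence boundary
            if heads:
                yield heads
                heads = []
            continue
        heads.append(int(line.split("\t")[6]))
    if heads:                          # file may not end with a blank line
        yield heads


def is_ancestor(heads: List[int], anc: int, node: int) -> bool:
    """True if `anc` dominates `node` (reflexively), following head links."""
    for _ in range(len(heads) + 1):   # bounded walk guards against cycles
        if node == anc:
            return True
        if node == 0:
            return False
        node = heads[node - 1]
    raise ValueError("cycle in HEAD column")


def nonprojective_tokens(heads: List[int]) -> int:
    """Count tokens whose incoming arc is nonprojective."""
    count = 0
    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        lo, hi = sorted((head, dep))
        # Every token strictly between head and dep must be dominated by head.
        if any(not is_ancestor(heads, head, k) for k in range(lo + 1, hi)):
            count += 1
    return count


def main(path: str) -> None:
    total = nonproj = 0
    with open(path, encoding="utf-8") as f:
        for heads in read_heads(f):
            total += len(heads)
            nonproj += nonprojective_tokens(heads)
    if total:
        print(f"{nonproj} of {total} tokens nonprojective ({100 * nonproj / total:.2f}%)")
    else:
        print("no tokens found")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

A possible invocation is `python nonproj.py train.conll` (script and file names hypothetical). The check walks up the tree for every token inside every arc, which is slow in theory but unproblematic for treebanks of this size.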

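The CoNLL 2006 evaluation excluded punctuation tokens, while the CoNLL 2007 evaluation included them. The sketch below computes LAS and UAS with a switch for this. The punctuation test (a FORM consisting only of Unicode punctuation characters) is an assumption meant to approximate the official scorer, not a copy of it; the token representation and function names are hypothetical.

```python
import unicodedata
from typing import List, Tuple

# Hypothetical minimal token representation: (FORM, HEAD, DEPREL).
Token = Tuple[str, int, str]


def is_punct(form: str) -> bool:
    """Assumed punctuation test: non-empty FORM whose characters are all Unicode P* categories."""
    return bool(form) and all(unicodedata.category(ch).startswith("P") for ch in form)


def attachment_scores(gold: List[Token], pred: List[Token], skip_punct: bool) -> Tuple[float, float]:
    """Return (LAS, UAS) in percent for token-aligned gold and predicted analyses.

    skip_punct=True approximates the CoNLL 2006 setup (punctuation excluded),
    skip_punct=False the CoNLL 2007 setup (punctuation included).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    total = las_hits = uas_hits = 0
    for (form, g_head, g_rel), (_, p_head, p_rel) in zip(gold, pred):
        if skip_punct and is_punct(form):
            continue
        total += 1
        if g_head == p_head:
            uas_hits += 1          # correct head
            if g_rel == p_rel:
                las_hits += 1      # correct head and correct label
    if total == 0:
        raise ValueError("no tokens left to score")
    return 100.0 * las_hits / total, 100.0 * uas_hits / total


if __name__ == "__main__":
    # Toy sentence "Ο σκύλος γαβγίζει ." with PDT-style labels (illustrative only).
    gold = [("Ο", 2, "Atr"), ("σκύλος", 3, "Sb"), ("γαβγίζει", 0, "Pred"), (".", 0, "AuxK")]
    pred = [("Ο", 2, "Atr"), ("σκύλος", 3, "Obj"), ("γαβγίζει", 0, "Pred"), (".", 3, "AuxK")]
    print(attachment_scores(gold, pred, skip_punct=True))   # punctuation excluded
    print(attachment_scores(gold, pred, skip_punct=False))  # punctuation included
```

In the toy example the only head error is on the final period, so excluding punctuation raises UAS from 75% to 100%. Scores computed under the two conventions are therefore not directly comparable.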