[ Skip to the content ]

Institute of Formal and Applied Linguistics Wiki


[ Back to the navigation ]

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
user:zeman:treebanks:sl [2012/01/16 13:38]
zeman Sample.
user:zeman:treebanks:sl [2012/01/16 14:01]
zeman Parsing.
Line 44: Line 44:
 ==== Inside ==== ==== Inside ====
  
-The original morphosyntactic tags have been converted to fit into the three columns (CPOS, POS and FEAT) of the CoNLL format. There //should// be a 1-1 mapping between the [[http://www.bultreebank.org/TechRep/BTB-TR03.pdf|BTB positional tags]] and the CoNLL 2006 annotation. Use [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl?tagset=bg::conll|DZ Interset]] to inspect the CoNLL tagset.+The original morphosyntactic tags have been converted to fit into the three columns (CPOS, POS and FEAT) of the CoNLL format. There //should// be a 1-1 mapping between the [[http://www.bultreebank.org/TechRep/BTB-TR03.pdf|BTB positional tags]] and the CoNLL 2006 annotation. Use [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl?tagset=sl::conll|DZ Interset]] to inspect the CoNLL tagset.
  
-The morphological analysis does not include lemmas. The morphosyntactic tags have been assigned (probably) manually+The morphological analysis includes lemmas. The morphosyntactic tags have been assigned (probably) manually.
- +
-The guidelines for syntactic annotation are documented in the other [[http://www.bultreebank.org/TechRep/BTB-TR05.pdf|technical report]]. The CoNLL distribution contains the BulTreeBankReadMe.html file with a brief description of the syntactic tags (dependency relation labels).+
  
 ==== Sample ==== ==== Sample ====
Line 111: Line 109:
 ==== Parsing ==== ==== Parsing ====
  
-Nonprojectivities in BTB are rare. Only 747 of the 196,151 tokens in the CoNLL 2006 version are attached nonprojectively (0.38%).+Nonprojectivities in SDT are not frequent. Only 675 of the 35140 tokens in the CoNLL 2006 version are attached nonprojectively (1.92%).
  
-The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Bulgarian:+The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Slovene:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^ ^ Parser (Authors) ^ LAS ^ UAS ^
-| MST (McDonald et al.) | 87.57 92.04 +| MST (McDonald et al.) | 73.44 83.17 
-Malt (Nivre et al.) | 87.41 91.72 | +Edinburgh (Riedel et al.) | 71.20 83.17 | 
-| Nara (Yuchang Cheng) | 86.34 91.30 |+| Microsoft (Corston-Oliver and Aue) | 72.42 | 81.77 | 
 +| Basis (John O'Neil) | 71.08 | 81.71 
 +| Nara (Yuchang Cheng) | 71.42 81.14 |
  

[ Back to the navigation ] [ Back to the content ]