Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:sv [2012/01/17 14:11]
zeman Sample.
user:zeman:treebanks:sv [2014/04/22 16:56]
zeman Updated link.
Line 1: Line 1:
 ===== Swedish (sv) ===== ===== Swedish (sv) =====
  
-[[http://w3.msi.vxu.se/~nivre/research/Talbanken05.html|Talbanken05]]+[[http://stp.lingfil.uu.se/~nivre/research/Talbanken05.html|Talbanken05]]
  
 ==== Versions ==== ==== Versions ====
Line 47: Line 47:
 ==== Inside ==== ==== Inside ====
  
-The original morphosyntactic tags have been converted to fit into the three columns (CPOS, POS and FEAT) of the CoNLL format. There //should// be a 1-1 mapping between the [[http://www.buch-kromann.dk/matthias/treebank/PAROLE-manual.pdf|DDT positional tags]] and the CoNLL 2006 annotation. Use [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl?tagset=da::conll|DZ Interset]] to inspect the CoNLL tagset.+The morphological analysis in the CoNLL 2006 version does not include lemmas. The part-of-speech tags have been assigned (probably) manually. The tagset is very coarse, there are no morphological features, just the part of speech. Use [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl?tagset=sv::mamba|DZ Interset]] to inspect the tagset.
- +
-The morphological analysis in the CoNLL 2006 version does not include lemmas (the original DTAG version does contain them). The morphosyntactic tags have been assigned (probably) manually. +
- +
-Some multi-word expressions have been collapsed into one token, using underscore as the joining character. This includes adverbially used prepositional phrases (e.g. i_lørdags = on Saturdays) but not named entities.+
  
 ==== Sample ==== ==== Sample ====
Line 92: Line 88:
 ==== Parsing ==== ==== Parsing ====
  
-Nonprojectivities in DDT are not frequent. Only 988 of the 100,238 tokens in the CoNLL 2006 version are attached nonprojectively (0.99%).+Nonprojectivities in Talbanken are not frequent. Only 1928 of the 197,123 tokens in the CoNLL 2006 version are attached nonprojectively (0.98%).
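
The counts above can be reproduced with a short script. The following is only a minimal sketch, assuming the common definition that a token is attached nonprojectively when some token lying between it and its head is not dominated by that head. The function names are hypothetical, and the input is assumed to be a well-formed tree given as a list of 1-based head indices with 0 for the artificial root.

```python
def nonprojective_tokens(heads):
    """Return the 1-based indices of nonprojectively attached tokens.

    heads[i - 1] is the head of token i; 0 denotes the artificial root.
    Assumes the heads form a well-formed tree (no cycles).
    """
    def dominated_by(node, ancestor):
        # Walk up from node towards the root, looking for ancestor.
        while node != 0:
            if node == ancestor:
                return True
            node = heads[node - 1]
        # The artificial root dominates every token.
        return ancestor == 0

    result = []
    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        lo, hi = sorted((dep, head))
        # The arc is nonprojective if any token strictly between
        # dep and head is not dominated by head.
        if any(not dominated_by(k, head) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


def percentage(count, total):
    """Share of count in total, in percent, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Token 1 depends on token 3, but token 2 (the root) lies in between.
print(nonprojective_tokens([3, 0, 2, 2]))
# Figures quoted above for Talbanken and DDT.
print(percentage(1928, 197123), percentage(988, 100238))
```

The percentages quoted for both treebanks follow directly from dividing the nonprojective token count by the total token count.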
  
-The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Danish:+The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Swedish:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^ ^ Parser (Authors) ^ LAS ^ UAS ^
-| MST (McDonald et al.) | 84.79 | 90.58 |+| Microsoft (Corston-Oliver and Aue) | 79.69 | 89.54 |
-| Malt (Nivre et al.) | 84.77 | 89.80 |+| Malt (Nivre et al.) | 84.58 | 89.50 |
-| Riedel et al. | 83.63 | 89.66 |+| Illinois (Do and Chang) | 82.31 | 89.05 |
 +| MST (McDonald et al.) | 82.55 | 88.93 |
 +| Kenji Sagae | 82.00 | 88.57 | 
 +| Nara (Yuchang Cheng) | 81.08 | 88.57 | 
 +| Basis (John O'Neil) | 81.78 | 88.45 | 
 +| Riedel et al. | 80.66 | 88.33 |
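
The LAS and UAS columns can be illustrated with a small sketch of scoring that skips punctuation tokens. This is not the official evaluation script: it assumes that a token counts as punctuation when every character of its word form has a Unicode category starting with "P", and the function names and the dependency labels in the example are hypothetical.

```python
import unicodedata


def is_punctuation(form):
    """True if the form is non-empty and consists only of punctuation."""
    return bool(form) and all(
        unicodedata.category(ch).startswith("P") for ch in form
    )


def attachment_scores(gold, predicted):
    """Return (LAS, UAS) in percent over non-punctuation tokens.

    gold and predicted are aligned lists of (form, head, deprel) tuples.
    """
    scored = correct_head = correct_both = 0
    for (form, g_head, g_rel), (_, p_head, p_rel) in zip(gold, predicted):
        if is_punctuation(form):
            continue  # punctuation tokens are excluded from scoring
        scored += 1
        if g_head == p_head:
            correct_head += 1  # counts towards UAS
            if g_rel == p_rel:
                correct_both += 1  # counts towards LAS as well
    if scored == 0:
        return 0.0, 0.0
    return 100.0 * correct_both / scored, 100.0 * correct_head / scored


gold = [("Hon", 2, "SS"), ("sover", 0, "ROOT"), (".", 2, "IP")]
pred = [("Hon", 2, "OO"), ("sover", 0, "ROOT"), (".", 1, "IP")]
las, uas = attachment_scores(gold, pred)
print(las, uas)
```

In the example the wrongly attached full stop does not lower UAS because it is never scored, which is why results computed this way are not directly comparable with evaluations that include punctuation.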
  