Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:hr [2014/07/17 21:23]
zeman Sample.
user:zeman:treebanks:hr [2014/07/17 21:27]
zeman Finalizing the page.
Line 38: Line 38:
  
 The improved pre-release version contains 83640 tokens in 3736 sentences, yielding 22.39 tokens per sentence on average.
 +
 +There is no official training-test division of the original data. For HamleDT, we have split the data 90:10, i.e. the first 3362 sentences (75236 tokens) for training and the remaining 374 sentences (8404 tokens) for testing.
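
The split described in the added line can be reproduced with a few lines of Python. This is only a sketch: the function name `split_train_test` is hypothetical, and the rule used to turn 90% into a sentence count (rounding down) is an assumption that happens to match the reported figures; the page itself only states the resulting counts.

```python
def split_train_test(sentences, train_tenths=9):
    """Split a list of sentences into a leading training part and a trailing test part.

    Assumption: the training size is the total times train_tenths/10, rounded down.
    """
    n_train = len(sentences) * train_tenths // 10
    return sentences[:n_train], sentences[n_train:]


# Figures reported on the page for the improved pre-release version.
TOTAL_SENTENCES = 3736
TOTAL_TOKENS = 83640

# Placeholder sentence IDs stand in for the real treebank sentences.
train, test = split_train_test(list(range(TOTAL_SENTENCES)))
print(len(train), len(test))
print(round(TOTAL_TOKENS / TOTAL_SENTENCES, 2))
```

With the reported totals, this yields 3362 training and 374 test sentences, and the reported token counts for the two parts (75236 and 8404) add up to 83640.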
  
 ==== Inside ====
Line 70: Line 72:
 (The sum of the percentages exceeds 100% because of rounding.)
  
-==== XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ==== 
 ==== Sample ====
  
Line 105: Line 106:
 ==== Parsing ====
  
-Nonprojectivities in BTB are rare. Only 747 of the 196,151 tokens in the CoNLL 2006 version are attached nonprojectively (0.38%).
+Nonprojectivities in SETimes.HR are rare. Only 461 of the 83640 tokens in the pre-release version are attached nonprojectively (0.55%).
-
-The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Bulgarian:
-
-^ Parser (Authors) ^ LAS ^ UAS ^
-| MST (McDonald et al.) | 87.57 | 92.04 |
-| Malt (Nivre et al.) | 87.41 | 91.72 |
-| Nara (Yuchang Cheng) | 86.34 | 91.30 |
  
 +//Are there any published parsing results on this corpus?//
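
For readers who want to check a nonprojectivity count like the one quoted in the Parsing section, the following is a minimal sketch. It assumes one common definition (a token is attached nonprojectively if some token lying between it and its head is not dominated by that head) and a hypothetical input format: a list of 1-based head indices per sentence, with 0 for the artificial root. The page does not say which definition or tool produced its figures, so the result may differ slightly from the reported 461.

```python
def count_nonprojective(heads):
    """Count tokens whose attachment to their head is nonprojective.

    heads[i] is the 1-based index of the head of token i+1; 0 marks the root.
    """
    n = len(heads)

    def dominated_by(node, ancestor):
        # Walk up the tree from node; the artificial root 0 dominates everything.
        for _ in range(n + 1):  # bounded walk guards against cyclic input
            if node == ancestor:
                return True
            if node == 0:
                return False
            node = heads[node - 1]
        return False

    count = 0
    for dependent in range(1, n + 1):
        head = heads[dependent - 1]
        low, high = sorted((head, dependent))
        # The arc is nonprojective if any token strictly inside it escapes the head.
        if any(not dominated_by(k, head) for k in range(low + 1, high)):
            count += 1
    return count


# Token 2 attaches to token 4 across token 3, which is the root of the sentence.
print(count_nonprojective([3, 4, 0, 3]))
# Share of nonprojective tokens from the figures quoted in the Parsing section.
print(round(100 * 461 / 83640, 2))
```

Summing `count_nonprojective` over all sentences and dividing by the total token count gives the percentage in the form reported on the page.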
