
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


Old revision: user:zeman:treebanks:sk [2014/03/01 22:35], zeman: created
New revision: user:zeman:treebanks:sk [2014/03/03 11:50], zeman: Size.
Line 35 (both revisions):
 ==== Size ====
 
-50,000 sentences
-
-The CoNLL 2006 version contains 35140 tokens in 1936 sentences, yielding 18.15 tokens per sentence on average (CoNLL 2006 data split: 28750 tokens / 1534 sentences training, 6390 tokens / 402 sentences test).
+The treebank reportedly contains about 50000 sentences. In HamleDT, we are currently experimenting with a subset that contains the Annotator 1 annotations of documents with manual morphological annotation, plus the Wikipedia documents (for which the source of the morphological annotation has not been confirmed). This subset contains 479473 tokens and 26149 sentences, yielding 18.34 tokens per sentence on average. We have not yet split the data into training and test parts.
+
 
 ==== Inside ====
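The average sentence lengths quoted in both revisions are simple token/sentence ratios. A minimal sketch for re-checking them, assuming two-decimal rounding as used on the page (the helper name `tokens_per_sentence` is hypothetical), could look like this:

```python
def tokens_per_sentence(tokens: int, sentences: int) -> float:
    """Average sentence length in tokens, rounded to two decimals (assumed convention)."""
    if sentences <= 0:
        raise ValueError("sentence count must be positive")
    return round(tokens / sentences, 2)

# HamleDT subset figures from the new revision
hamledt_avg = tokens_per_sentence(479473, 26149)

# CoNLL 2006 figures from the old revision: the training and test parts
# should add up to the totals stated on the page
train_tokens, train_sents = 28750, 1534
test_tokens, test_sents = 6390, 402
total_tokens = train_tokens + test_tokens
total_sents = train_sents + test_sents
conll_avg = tokens_per_sentence(total_tokens, total_sents)

print(hamledt_avg, conll_avg)
print(total_tokens, total_sents)
```

Computing the CoNLL average from the summed split, rather than from the stated totals, also confirms that the split figures are internally consistent.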
