Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:it [2012/01/03 15:04]
zeman Domain.
user:zeman:treebanks:it [2012/01/03 15:33]
zeman Size.
Line 38:
 ==== Size ====
  
-According to their website, SzTB 2.0 contains 1.2 million words plus 250 thousand punctuation tokens in 82000 sentences. Only a fragment was converted to dependencies in the CoNLL 2007 version: 139,143 tokens in 6424 sentences, yielding 21.66 tokens per sentence on average (131,799 tokens / 6034 sentences training, 7344 tokens / 390 sentences test).
+According to the README file, ISST contains 305,547 word tokens. Only a fragment was converted to dependencies in the CoNLL 2007 version: 76295 tokens in 3359 sentences, yielding 22.71 tokens per sentence on average (71199 tokens / 3110 sentences training, 5096 tokens / 249 sentences test).
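
The totals and the average in the ISST figures can be re-derived from the training and test counts. A minimal Python sketch, assuming only the numbers quoted in the Size paragraph (the variable names are illustrative and do not come from the treebank distribution):

```python
# Token and sentence counts quoted for the CoNLL 2007 version of ISST.
train_tokens, train_sentences = 71199, 3110
test_tokens, test_sentences = 5096, 249

# Totals are the sum of the training and test portions.
total_tokens = train_tokens + test_tokens
total_sentences = train_sentences + test_sentences

# Average sentence length in tokens.
average = total_tokens / total_sentences

print(total_tokens, total_sentences, round(average, 2))
# prints: 76295 3359 22.71
```

The same check applied to the removed figures (131,799 + 7344 tokens, 6034 + 390 sentences) gives 139,143 tokens in 6424 sentences, or 21.66 tokens per sentence.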
  
 ==== Inside ====
