[ Skip to the content ]

Institute of Formal and Applied Linguistics Wiki


[ Back to the navigation ]

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision Both sides next revision
user:zeman:treebanks:tr [2012/03/22 20:48]
zeman Domain.
user:zeman:treebanks:tr [2012/03/22 20:52]
zeman Size.
Line 39: Line 39:
 ==== Size ==== ==== Size ====
  
-According to their website, SzTB 2.0 contains 1.2 million words plus 250 thousand punctuation tokens in 82000 sentences. Only a fragment was converted to dependencies in the CoNLL 2007 version: 139,143 tokens in 6424 sentences, yielding 21.66 tokens per sentence on average (131,799 tokens / 6034 sentences training, 7344 tokens / 390 sentences test).+According to their website, the treebank contains 7262 sentences. The CoNLL 2007 version contains 69695 tokens in 5935 sentences, yielding 11.74 tokens per sentence on average (65182 tokens / 5635 sentences training, 4513 tokens / 300 sentences test).
  
 ==== Inside ==== ==== Inside ====

[ Back to the navigation ] [ Back to the content ]