Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:es [2011/11/20 21:44]
zeman Spanish domain and size.
user:zeman:treebanks:es [2011/11/20 21:45]
zeman
Line 50: Line 50:
 The CoNLL 2006 version contains 95028 tokens in 3512 sentences, yielding 27.06 tokens per sentence on average (CoNLL 2006 data split: 89334 tokens / 3306 sentences training, 5694 tokens / 206 sentences test).
  
-The CoNLL 2009 version contains 528,440 tokens in 17709 sentences, yielding 29.59 tokens per sentence on average (CoNLL 2009 data split: 427,442 tokens / 14329 sentences training, 50368 tokens / 1655 sentences development, 50630 tokens / 1725 sentences test).
+The CoNLL 2009 version contains 528,440 tokens in 17709 sentences, yielding 29.84 tokens per sentence on average (CoNLL 2009 data split: 427,442 tokens / 14329 sentences training, 50368 tokens / 1655 sentences development, 50630 tokens / 1725 sentences test).
  
 ==== Inside ====
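
The change in this revision is only the CoNLL 2009 average (29.59 to 29.84). A minimal sketch for re-deriving both averages from the split sizes quoted above; the dictionary layout and the function names `totals` and `tokens_per_sentence` are illustrative assumptions, not part of the wiki page or of any CoNLL tooling:

```python
# Token and sentence counts per split, copied from the page text.
# Each value is a (tokens, sentences) pair.
conll2006 = {"train": (89334, 3306), "test": (5694, 206)}
conll2009 = {"train": (427442, 14329), "dev": (50368, 1655), "test": (50630, 1725)}


def totals(splits):
    """Sum tokens and sentences over all splits."""
    tokens = sum(t for t, _ in splits.values())
    sentences = sum(s for _, s in splits.values())
    return tokens, sentences


def tokens_per_sentence(splits):
    """Average sentence length in tokens, rounded to two decimals."""
    tokens, sentences = totals(splits)
    return round(tokens / sentences, 2)


if __name__ == "__main__":
    for name, splits in (("CoNLL 2006", conll2006), ("CoNLL 2009", conll2009)):
        tokens, sentences = totals(splits)
        print(name, tokens, sentences, tokens_per_sentence(splits))
```

The split sizes add up to the stated totals (95028 / 3512 and 528440 / 17709), and the averages come out at 27.06 and 29.84, which matches the corrected value on the new side of the diff.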