Differences
This shows you the differences between two versions of the page.
user:zeman:treebanks:eu [2011/11/29 09:38] zeman License. |
user:zeman:treebanks:eu [2011/11/29 10:20] zeman Size. |
||
---|---|---|---|
Line 36: | Line 36: | ||
==== Size ==== | ==== Size ==== | ||
- | The CoNLL 2007 version contains 70223 tokens | + | The CoNLL 2007 dataset was officially split into a training part and a test part. The data split of BDT-II was provided by Koldo Gojenola and should correspond to the data split used in the parsing experiments published by the IXA Group. |
+ | |||
+ | ^ Version ^ Train Sentences ^ Train Tokens ^ D-test Sentences ^ D-test Tokens ^ E-test Sentences ^ E-test Tokens ^ Total Sentences ^ Total Tokens ^ Sentence Length ^ | ||
+ | | CoNLL 2007 | 3190 | 50526 | 334 | 5390 | | ||
+ | | BDT-II | 9094 | 124684 | 1010 | 12625 | 1122 | 14295 | 11226 | 151604 | 13.50 | |
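The totals and the average sentence length in the BDT-II row can be cross-checked from the per-part counts. The following is a minimal sketch only: the names `bdt2` and `totals` are hypothetical, and the counts are copied by hand from the table row rather than read from the treebank files.

```python
# Per-part (sentences, tokens) counts copied from the BDT-II table row.
# The dictionary keys and function name are illustrative, not part of any tool.
bdt2 = {
    "train": (9094, 124684),
    "d_test": (1010, 12625),
    "e_test": (1122, 14295),
}


def totals(parts):
    """Return (total sentences, total tokens, mean tokens per sentence)."""
    sentences = sum(s for s, _ in parts.values())
    tokens = sum(t for _, t in parts.values())
    return sentences, tokens, tokens / sentences


total_sentences, total_tokens, avg_len = totals(bdt2)
print(total_sentences, total_tokens, f"{avg_len:.2f}")
# prints: 11226 151604 13.50
```

The printed values match the Total Sentences, Total Tokens and Sentence Length columns of the BDT-II row.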
==== Inside ==== | ==== Inside ==== |