Differences
This shows you the differences between two versions of the page.
user:zeman:treebanks:eu [2011/11/29 09:38] zeman License. |
user:zeman:treebanks:eu [2011/11/29 10:25] zeman Inside. |
||
---|---|---|---|
Line 36: | Line 36: | ||
==== Size ==== | ==== Size ==== | ||
- | The CoNLL 2007 version contains 70223 tokens in 2902 sentences, yielding 24.20 tokens per sentence on average (CoNLL 2007 data split: 65419 tokens / 2705 sentences | + | The CoNLL 2007 dataset was officially |
- | ==== Inside ==== | + | ^ Version ^ Train Sentences ^ Train Tokens ^ D-test Sentences ^ D-test Tokens ^ E-test Sentences ^ E-test Tokens ^ Total Sentences ^ Total Tokens ^ Sentence Length ^ |
+ | | CoNLL 2007 | 3190 | 50526 | 334 | 5390 | | ||
+ | | BDT-II | 9094 | 124684 | 1010 | 12625 | 1122 | 14295 | 11226 | 151604 | 13.50 | | ||
- | The syntactic annotation style and the tagset for dependency relations (analytical functions) in GDT has been modeled after the [[http:// | + | ==== Inside ==== |
Part of speech tag description (obtained per e-mail from Koldo Gojenola, thanks!): | Part of speech tag description (obtained per e-mail from Koldo Gojenola, thanks!): | ||
Line 95: | Line 97: | ||
* ASP = aspect | * ASP = aspect | ||
* ERL = relation (relative sentence, completive sentence, indirect question...) | * ERL = relation (relative sentence, completive sentence, indirect question...) | ||
+ | |||
+ | The syntactic guidelines (structure and labels) are described in Spanish in this [[http:// | ||
==== Sample ==== | ==== Sample ==== |