Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:eu [2011/11/29 09:38]
zeman License.
user:zeman:treebanks:eu [2011/11/29 10:25]
zeman Inside.
Line 36: Line 36:
 ==== Size ====
  
-The CoNLL 2007 version contains 70223 tokens in 2902 sentences, yielding 24.20 tokens per sentence on average (CoNLL 2007 data split: 65419 tokens / 2705 sentences training, 4804 tokens / 197 sentences test).
 +The CoNLL 2007 dataset was officially split into a training part and a test part. The data split of BDT-II was provided by Koldo Gojenola and should correspond to the data split used in parsing experiments published by the IXA Group.
  
-==== Inside ====
 +^ Version ^ Train Sentences ^ Train Tokens ^ D-test Sentences ^ D-test Tokens ^ E-test Sentences ^ E-test Tokens ^ Total Sentences ^ Total Tokens ^ Avg. Sentence Length (tokens) ^
 +| CoNLL 2007 |  3190 |  50526 |  334 |  5390 |  |  |  3524 |  55916 |  15.87 |
 +| BDT-II |  9094 |  124684 |  1010 |  12625 |  1122 |  14295 |  11226 |  151604 |  13.50 |
  
-The syntactic annotation style and the tagset for dependency relations (analytical functions) in GDT has been modeled after the [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/a-layer/html/index.html|Prague Dependency Treebank]].
 +==== Inside ====
  
 Part of speech tag description (obtained per e-mail from Koldo Gojenola, thanks!):
Line 95: Line 97:
   * ASP = aspect
   * ERL = relation (relative sentence, completive sentence, indirect question...)
 +
 +The syntactic guidelines (structure and labels) are described in Spanish in this [[http://ixa.si.ehu.es/Ixa/Argitalpenak/Barne_txostenak/1068549887/publikoak/guia.pdf|technical report]]. See Appendix 3 for some lists of tags.
  
 ==== Sample ====
