Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:ja [2012/01/03 22:47]
zeman created
user:zeman:treebanks:ja [2012/01/04 09:25]
zeman Domain and size.
Line 12:
 ==== Obtaining and License ====
 
-To obtain the treebank, download [[http://www.sfs.uni-tuebingen.de/resources/tuebajs-license.pdf|the license agreement]], print it, fill it out and sign it, scan and send it back to Kathrin Beck (kbeck (at) sfs (dot) uni-tuebingen (dot) de). The license in short:
+To obtain the treebank, download [[http://www.sfs.uni-tuebingen.de/resources/tuebajs-license.pdf|the license agreement]], print it, fill it out and sign it, scan and send it back to Kathrin Beck (kbeck (at) sfs (dot) uni-tuebingen (dot) de). She will send you the password for the download page. The license in short:
 
   * academic research usage
Line 23:
 
   * Website
-    * http://www.buch-kromann.dk/matthias/treebank/ (the old and no longer accessible website from <nowiki>http://www.id.cbs.dk/~mtk/</nowiki> has been moved here)
+    * http://www.sfs.uni-tuebingen.de/en/tuebajs.shtml
   * Data
     * //no separate citation//
   * Principal publications
-    * Matthias Trautner Kromann: [[http://www.buch-kromann.dk/matthias/files/030730-tlt-norfa.pdf|The Danish Dependency Treebank and the DTAG Treebank Tool]]. In: Proceedings of Treebanks and Linguistic Theories, Växjö, Sweden, 2003.
+    * Yasuhiro Kawata, Julia Bartels: Stylebook for the Japanese Treebank in Verbmobil, Report 240, September 29, 2000.
+    * Sabine Buchholz, Erwin Marsi: [[http://acl.ldc.upenn.edu/W/W06/W06-29.pdf#page=165|CoNLL-X shared task on Multilingual Dependency Parsing]]. In: Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL-X), pp. 149-164, New York, USA, 2006.
   * Documentation
-    * //see the left-hand-side links at the treebank website, e.g.://
-    * [[http://www.buch-kromann.dk/matthias/treebank/theory.html|Dependency theory and list of dependency relation labels]]
-    * Britt Keson: [[http://www.buch-kromann.dk/matthias/treebank/PAROLE-manual.pdf|Vejledning til det danske morfosyntaktisk taggede PAROLE-korpus]] (morphosyntactic tags). Det Danske Sprog- og Litteraturselskab (DSL)
+    * Yasuhiro Kawata, Julia Bartels: Stylebook for the Japanese Treebank in Verbmobil, {{:user:zeman:treebanks:report-240-00.pdf|Report 240}}, has been distributed together with the CoNLL 2006 version of the treebank (file ''doc/report-240-00.ps'').
  
 ==== Domain ====
 
-Unknown (the underlying PAROLE corpus “consists of quotations of 150-250 words from wide range of randomly selected linguistically representative Danish texts from 1983-1992.”)
+Spoken dialogues, negotiations about time and place of business meetings. That is why many sentences are relatively short (a frequent single-word sentence is //hai// = “yes”).
  
 ==== Size ====
 
-The CoNLL 2006 version contains 100,238 tokens in 5512 sentences, yielding 18.19 tokens per sentence on average (CoNLL 2006 data split: 94386 tokens / 5190 sentences training, 5852 tokens / 322 sentences test).
+The CoNLL 2006 version contains 157,172 tokens in 17753 sentences, yielding 8.85 tokens per sentence on average (CoNLL 2006 data split: 151,461 tokens / 17044 sentences training, 5711 tokens / 709 sentences test).
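
The size figures in the new revision can be cross-checked with a few lines of Python. The sketch below only re-derives the average and the train/test sums from the numbers quoted above; the helper names (`tokens_per_sentence`, `split_is_consistent`, `count_conll`) are hypothetical and not part of any treebank tooling. `count_conll` assumes the usual CoNLL-X layout of one token per non-blank line with a blank line after each sentence.

```python
# Figures quoted in the Size paragraph (CoNLL 2006 version of the treebank).
TOTAL = {"tokens": 157172, "sentences": 17753}
TRAIN = {"tokens": 151461, "sentences": 17044}
TEST = {"tokens": 5711, "sentences": 709}


def tokens_per_sentence(tokens, sentences):
    """Average sentence length, rounded to two decimals as on the wiki page."""
    return round(tokens / sentences, 2)


def split_is_consistent(total, train, test):
    """True if training and test counts add up to the totals."""
    return all(train[key] + test[key] == total[key] for key in total)


def count_conll(lines):
    """Count (tokens, sentences) in CoNLL-X style lines.

    Assumption: every non-blank line is one token and a blank line
    closes the current sentence.
    """
    tokens = sentences = 0
    in_sentence = False
    for line in lines:
        if line.strip():
            tokens += 1
            in_sentence = True
        elif in_sentence:
            sentences += 1
            in_sentence = False
    if in_sentence:  # file did not end with a blank line
        sentences += 1
    return tokens, sentences


if __name__ == "__main__":
    print(tokens_per_sentence(**TOTAL))
    print(split_is_consistent(TOTAL, TRAIN, TEST))
```

Running `count_conll` over the lines of the distributed training and test files would be one way to confirm the counts directly from the data, provided the files follow the layout assumed here.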
  
 ==== Inside ====
