Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

Next revision
Previous revision
user:zeman:treebanks:de [2011/11/20 19:42]
zeman created
user:zeman:treebanks:de [2014/10/08 21:24] (current)
zeman Fixed link.
Line 1: Line 1:
 ===== German (de) ===== ===== German (de) =====
  
-[[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/|TIGER Treebank]]+[[http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html|TIGER Treebank]]
  
 ==== Versions ==== ==== Versions ====
Line 7: Line 7:
   * TIGER Treebank 1 (2003)   * TIGER Treebank 1 (2003)
   * TIGER Treebank 2 (2005)   * TIGER Treebank 2 (2005)
-  * TIGER Treebank 2.1 (2007) in [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/doc/html/TigerXML.html|TIGER-XML]] or Negra export (text) format+  * TIGER Treebank 2.1 (2007) in [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/TIGERSearch/doc/html/TigerXML.html|TIGER-XML]] or Negra export (text) format
   * CoNLL 2006   * CoNLL 2006
   * CoNLL 2009   * CoNLL 2009
Line 13: Line 13:
 ==== Obtaining and License ==== ==== Obtaining and License ====
  
-The TIGER Treebank is freely downloadable after you accept the [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/license/htmllicense.shtml|license terms]] by pressing a button.+The TIGER Treebank is freely downloadable after you accept the [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/license/htmlicense.html|license terms]] by pressing a button.
  
 Republication of the two CoNLL versions in LDC is planned but it has not happened yet. Republication of the two CoNLL versions in LDC is planned but it has not happened yet.
Line 31: Line 31:
  
   * Website   * Website
-    * http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/+    * http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html
   * Data   * Data
     * //no separate citation//     * //no separate citation//
   * Principal publications   * Principal publications
-    * Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolfgang Lezius, George Smith: [[http://www.ims.uni-stuttgart.de/projekte/TIGER/paper/treeling2002.pdf|The TIGER Treebank]]. In: Proceedings of the Workshop on Treebanks and Linguistic Theories (TLT), Sozopol, Bulgaria, 2002. +    * Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolfgang Lezius, George Smith: The TIGER Treebank. In: Proceedings of the Workshop on Treebanks and Linguistic Theories (TLT), Sozopol, Bulgaria, 2002. 
-    * [[http://www.ims.uni-stuttgart.de/projekte/TIGER/paper/|List of publications]] +    * Sabine Brants, Stefanie Dipper, Peter Eisenberg, Silvia Hansen, Esther König, Wolfgang Lezius, Christian Rohrer, George Smith, Hans Uszkoreit: TIGER: Linguistic Interpretation of a German Corpus. Journal of Language and Computation, 2004 (2), 597-620.
-  * [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/annotation/|Documentation]] +  * [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/index.html|Documentation]] 
-    * [[http://www.ims.uni-stuttgart.de/projekte/corplex/TagSets/stts-table.html|Stuttgart-Tübingen Tagset]] (part of speech) +    * [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/TagSets/stts-table.html|Stuttgart-Tübingen Tagset]] (part of speech) 
-    * Berthold Crysmann, Silvia Hansen-Schirra, George Smith, Dorothea Ziegler-Eisele: [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/annotation/tiger_scheme-morph.pdf|TIGER Morphologie-Annotationsschema]], 2005. +    * Berthold Crysmann, Silvia Hansen-Schirra, George Smith, Dorothea Ziegler-Eisele: [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_scheme-morph.pdf|TIGER Morphologie-Annotationsschema]], 2005. 
-    * Stefanie Albert, Jan Anderssen, Regine Bader, Stephanie Becker, Tobias Bracht, Sabine Brants, Thorsten Brants, Vera Demberg, Stefanie Dipper, Peter Eisenberg, Silvia Hansen, Hagen Hirschmann, Juliane Janitzek, Carolin Kirstein, Robert Langner, Lukas Michelbacher, Oliver Plaehn, Cordula Preis, Marcus Pußel, Marco Rower, Bettina Schrader, Anne Schwartz, George Smith, Hans Uszkoreit: [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/annotation/tiger_scheme-syntax.pdf|TIGER Annotationsschema]] //(syntax)//, 2003.+    * Stefanie Albert, Jan Anderssen, Regine Bader, Stephanie Becker, Tobias Bracht, Sabine Brants, Thorsten Brants, Vera Demberg, Stefanie Dipper, Peter Eisenberg, Silvia Hansen, Hagen Hirschmann, Juliane Janitzek, Carolin Kirstein, Robert Langner, Lukas Michelbacher, Oliver Plaehn, Cordula Preis, Marcus Pußel, Marco Rower, Bettina Schrader, Anne Schwartz, George Smith, Hans Uszkoreit: [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_scheme-syntax.pdf|TIGER Annotationsschema]] //(syntax)//, 2003.
     * The header of the XML version of the TIGER Treebank contains lists of various sorts of tags with brief explanation.     * The header of the XML version of the TIGER Treebank contains lists of various sorts of tags with brief explanation.
  
Line 61: Line 61:
 It is not clear what the //semi-automatic// annotation means (probably first auto-tagging, then manual correction?) and whether it also applies to the morphosyntactic annotation. The CoNLL 2009 version also contains automatically disambiguated lemmas, tags and features. It is not clear what the //semi-automatic// annotation means (probably first auto-tagging, then manual correction?) and whether it also applies to the morphosyntactic annotation. The CoNLL 2009 version also contains automatically disambiguated lemmas, tags and features.
  
-The original treebank is phrase-based. The dependencies in the CoNLL versions must have thus been drawn using a head-selection procedure. Besides CoNLL data, the TIGER project also provides a subset of the TIGER Treebank in a dependency format.+The original treebank is phrase-based. The dependencies in the CoNLL versions must have thus been drawn using a head-selection procedure. Besides CoNLL data, the TIGER project also provides a subset of the TIGER Treebank in a dependency format. (Note that it is possible in the TIGER-XML format to mark the head of each phrase using a particular edge label, e.g. ''HD''. However, it is not guaranteed that every phrase in the TIGER Treebank contains just one head constituent, see the sample below.)
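
The note on ''HD'' edges can be checked mechanically. The following is a minimal sketch, not the procedure used to produce the CoNLL data: it counts the ''HD''-labelled edges under each ''nt'' (nonterminal) element of a TIGER-XML sentence and reports the phrases that do not have exactly one. The embedded sentence is invented for illustration and is not taken from the treebank; the function names ''head_counts'' and ''phrases_without_unique_head'' are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature TIGER-XML sentence, invented for illustration only.
# The coordination (CNP) has conjuncts (CJ) and a conjunction (CD) but no HD edge.
SAMPLE = """
<s id="s1">
  <graph root="s1_501">
    <terminals>
      <t id="s1_1" word="Hans" pos="NE"/>
      <t id="s1_2" word="und" pos="KON"/>
      <t id="s1_3" word="Anna" pos="NE"/>
      <t id="s1_4" word="schlafen" pos="VVFIN"/>
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="CNP">
        <edge label="CJ" idref="s1_1"/>
        <edge label="CD" idref="s1_2"/>
        <edge label="CJ" idref="s1_3"/>
      </nt>
      <nt id="s1_501" cat="S">
        <edge label="SB" idref="s1_500"/>
        <edge label="HD" idref="s1_4"/>
      </nt>
    </nonterminals>
  </graph>
</s>
"""


def head_counts(sentence_xml, head_label="HD"):
    """Map each nonterminal id to (category, number of head-labelled edges)."""
    root = ET.fromstring(sentence_xml)
    counts = {}
    for nt in root.iter("nt"):
        n_heads = sum(
            1 for edge in nt.findall("edge") if edge.get("label") == head_label
        )
        counts[nt.get("id")] = (nt.get("cat"), n_heads)
    return counts


def phrases_without_unique_head(sentence_xml, head_label="HD"):
    """Return only the nonterminals that have zero or several head edges."""
    return {
        nt_id: info
        for nt_id, info in head_counts(sentence_xml, head_label).items()
        if info[1] != 1
    }


if __name__ == "__main__":
    for nt_id, (cat, n_heads) in phrases_without_unique_head(SAMPLE).items():
        print(f"{nt_id} ({cat}): {n_heads} HD edges")
```

Phrases flagged this way are the ones for which a conversion to dependencies needs an additional head-selection rule.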
  
 ==== Sample ==== ==== Sample ====
