
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:et [2011/11/21 10:30]
zeman created
user:zeman:treebanks:et [2011/11/21 13:31]
zeman
Line 1: Line 1:
 ===== Estonian (et) =====
  
-[[http://vvv.cs.ut.ee/~kaili/Korpus/puud/|Eesti keele puudepank]] ([[http://translate.google.cz/translate?sl=et&tl=en&js=n&prev=_t&hl=cs&ie=UTF-8&layout=2&eotf=1&u=http%3A%2F%2Fvvv.cs.ut.ee%2F~kaili%2FKorpus%2Fpuud%2F&act=url|Google translate]])
+[[http://vvv.cs.ut.ee/~kaili/Korpus/puud/|Eesti keele puudepank]] ([[http://translate.google.cz/translate?sl=et&tl=en&js=n&prev=_t&hl=cs&ie=UTF-8&layout=2&eotf=1&u=http%3A%2F%2Fvvv.cs.ut.ee%2F~kaili%2FKorpus%2Fpuud%2F&act=url|Google translate]]) (EKP)
  
 ==== Versions ====
  
-  * TIGER Treebank 1 (2003)
-  * TIGER Treebank 2 (2005)
-  * TIGER Treebank 2.1 (2007) in [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/doc/html/TigerXML.html|TIGER-XML]] or Negra export (text) format
-  * CoNLL 2006
-  * CoNLL 2009
+  * Downloadable on-line, part of Arborest project (puudepank
+  * 8.12.2010 arborest.xml downloadable from the same site (same size, improved markup)
+  * http://vvv.cs.ut.ee/~kaili/Korpus/pindmine/
  
 ==== Obtaining and License ====
  
-The TIGER Treebank is freely downloadable after you accept the [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/license/htmllicense.shtml|license terms]] by pressing a button.
+The EKP is freely [[http://vvv.cs.ut.ee/~kaili/Korpus/puud/|downloadable from here]] in [[http://beta.visl.sdu.dk/treebanks.html#The_source_format|VISL]] or [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/doc/html/TigerXML.html|TIGER-XML]] format. Licensing terms are unknown.
  
-Republication of the two CoNLL versions in LDC is planned but it has not happened yet.
-
-The license in short:
-
-  * non-commercial research and evaluation usage by academic or educational institutions
-  * no redistribution
-  * acknowledge the use of the corpus in publications
-
-The TIGER Treebank was created by members of three institutes:
-  * [[http://www.coli.uni-saarland.de/|Department of Computational Linguistics and Phonetics]] (Computerlinguistik, CoLi), Saarland University (Universität des Saarlandes), Postfach 151150, D-66041 Saarbrücken, Germany.
-  * [[http://www.ims.uni-stuttgart.de/|Institute for Natural Language Processing]] (Institut für Maschinelle Sprachverarbeitung, IMS), University of Stuttgart (Universität Stuttgart), Azenbergstraße 12, D-70174 Stuttgart, Germany.
-  * [[http://www.uni-potsdam.de/germanistik/|German Department]] (Institut für Germanistik), Philosophische Fakultät, Universität Potsdam, Am Neuen Palais 10, Haus 05, D-14469 Potsdam, Germany.
+EKP was created / coordinated (?) by Kaili Müürisep, [[http://www.cs.ut.ee/|Institute of Computer Science]] (Arvutiteaduse instituut), University of Tartu (Tartu Ülikool), Liivi 2, 50409 Tartu, Estonia.
  
 ==== References ====
  
   * Website
-    * http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/
+    * http://vvv.cs.ut.ee/~kaili/Korpus/puud/ ([[http://translate.google.cz/translate?sl=et&tl=en&js=n&prev=_t&hl=cs&ie=UTF-8&layout=2&eotf=1&u=http%3A%2F%2Fvvv.cs.ut.ee%2F~kaili%2FKorpus%2Fpuud%2F&act=url|Google translate]])
   * Data
     * //no separate citation//
   * Principal publications
-    * Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolfgang Lezius, George Smith: [[http://www.ims.uni-stuttgart.de/projekte/TIGER/paper/treeling2002.pdf|The TIGER Treebank]]. In: Proceedings of the Workshop on Treebanks and Linguistic Theories (TLT), Sozopol, Bulgaria, 2002.
-    * [[http://www.ims.uni-stuttgart.de/projekte/TIGER/paper/|List of publications]]
-  * [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/annotation/|Documentation]]
-    * [[http://www.ims.uni-stuttgart.de/projekte/corplex/TagSets/stts-table.html|Stuttgart-Tübingen Tagset]] (part of speech)
-    * Berthold Crysmann, Silvia Hansen-Schirra, George Smith, Dorothea Ziegler-Eisele: [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/annotation/tiger_scheme-morph.pdf|TIGER Morphologie-Annotationsschema]], 2005.
-    * Stefanie Albert, Jan Anderssen, Regine Bader, Stephanie Becker, Tobias Bracht, Sabine Brants, Thorsten Brants, Vera Demberg, Stefanie Dipper, Peter Eisenberg, Silvia Hansen, Hagen Hirschmann, Juliane Janitzek, Carolin Kirstein, Robert Langner, Lukas Michelbacher, Oliver Plaehn, Cordula Preis, Marcus Pußel, Marco Rower, Bettina Schrader, Anne Schwartz, George Smith, Hans Uszkoreit: [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/annotation/tiger_scheme-syntax.pdf|TIGER Annotationsschema]] //(syntax)//, 2003.
-    * The header of the XML version of the TIGER Treebank contains lists of various sorts of tags with brief explanations.
+    * Kaili Müürisep, Tiina Puolakainen, Kadri Muischnek, Mare Koit, Tiit Roosmaa, Heli Uibo: [[https://nats-www.informatik.uni-hamburg.de/intern/proceedings/2003/RANLP/papers/p16.pdf|A New Language for Constraint Grammar: Estonian]]. In: International Conference Recent Advances in Natural Language Processing. Proceedings, pp. 304-310, Borovets, Bulgaria, 2003.
+  * Documentation
+    * [[http://beta.visl.sdu.dk/treebanks.html#The_source_format|File formats]]
+    * The header of the TIGER-XML version of the treebank contains lists of various sorts of tags with brief explanations.
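Since the tag lists live in the TIGER-XML header, they can be pulled out programmatically. The following is a minimal sketch, assuming the standard TIGER-XML layout (`head/annotation` with `feature` and `edgelabel` elements holding `value` children). The embedded sample document, its tag names (`N`, `V`, `SUBJ`) and the function name `read_tag_inventory` are hypothetical illustrations, not taken from the treebank itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature TIGER-XML document; the real header is much larger
# and its tag names will differ from the invented ones used here.
SAMPLE = """<corpus id="demo">
  <head>
    <annotation>
      <feature name="pos" domain="T">
        <value name="N">noun</value>
        <value name="V">verb</value>
      </feature>
      <edgelabel>
        <value name="SUBJ">subject</value>
      </edgelabel>
    </annotation>
  </head>
  <body/>
</corpus>"""


def read_tag_inventory(root):
    """Return {feature name: {tag: explanation}} from a TIGER-XML root element."""
    inventory = {}
    annotation = root.find("head/annotation")
    if annotation is None:
        return inventory
    # Each <feature> lists the values allowed for one attribute (e.g. pos).
    for feature in annotation.findall("feature"):
        inventory[feature.get("name")] = {
            value.get("name"): (value.text or "").strip()
            for value in feature.findall("value")
        }
    # Edge labels are declared once, without a name attribute of their own.
    edgelabel = annotation.find("edgelabel")
    if edgelabel is not None:
        inventory["edgelabel"] = {
            value.get("name"): (value.text or "").strip()
            for value in edgelabel.findall("value")
        }
    return inventory


inventory = read_tag_inventory(ET.fromstring(SAMPLE))
for feature_name, tags in inventory.items():
    print(feature_name, tags)
```

For a downloaded file, the root would instead come from `ET.parse(path).getroot()`, where `path` points at the local copy of the TIGER-XML file.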
  
 ==== Domain ====
  
-Mostly newswire (Frankfurter Rundschau).
+Mixed:
+  * 388 tailored sentences with movement verbs
+  * 732 sentences with movement verbs from the Estonian FrameNet corpus
+  * 175 sentences from the Arborest corpus
+  * 20 sentences of spoken language
  
 ==== Size ====
Line 56: Line 44:
  
 ==== Inside ====
+
+The treebank is part of the [[http://corp.hum.sdu.dk/tgrepeye_est.html|Arborest]] project and [[http://beta.visl.sdu.dk/|VISL]] (Visual Interactive Syntax Learning). As such, it is based on Constraint Grammar (Fred Karlsson et al., 1995: Constraint Grammar – A Language-Independent System for Parsing Unrestricted Text. Mouton de Gruyter).
  
 All versions contain //semi-automatic// part of speech tags ([[http://www.ims.uni-stuttgart.de/projekte/corplex/TagSets/stts-table.html|Stuttgart-Tübingen Tagset]], STTS) and syntactic structure. Lemmas and morphosyntactic features are available only for newer versions (TIGER Treebank version 2 and onwards, and CoNLL 2009). The parts of speech are heavily context-dependent, e.g. many words can be used both substantively (pronouns) and attributively (determiners), which is distinguished by different POS tags.
