Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:tr [2012/03/22 20:52]
zeman Size.
user:zeman:treebanks:tr [2012/03/22 21:11]
zeman Link to the ACL Anthology.
Line 29:
   * Principal publications
     * Kemal Oflazer, Bilge Say, Dilek Zeynep Hakkani-Tür, Gökhan Tür: Building a Turkish Treebank. In: Anne Abeillé (ed.): Building and Exploiting Syntactically Annotated Corpora. Kluwer Academic Publishers, 2003.
-    * Nart B. Atalay, Kemal Oflazer, Bilge Say: The Annotation Process in the Turkish Treebank. In: Proceedings of the EACL Workshop on Linguistically Interpreted Corpora – LINC. Budapest, Hungary, 2003.
+    * Nart B. Atalay, Kemal Oflazer, Bilge Say: [[http://aclweb.org/anthology-new/W/W03/W03-2405.pdf|The Annotation Process in the Turkish Treebank]]. In: Proceedings of the EACL Workshop on Linguistically Interpreted Corpora – LINC. Budapest, Hungary, 2003.
   * Documentation
     * Three PDF files are attached to the CoNLL version in the ''doc'' folder: ttbankkl.pdf (the chapter from Anne Abeillé, contains list of morphological tags), turkishtreebank.pdf (the paper from the EACL workshop) and user_guide.pdf (annotation manual for dependencies, in Turkish).
Line 43:
 ==== Inside ====
  
-The original Szeged Treebank is a phrase-based treebank and it is distributed in XML-based, TEI-compliant format. The CoNLL 2007 version is dependency-based (i.e. the head of each phrase was identified), distributed in the CoNLL 2006/2007 format.
+The original METU-Sabanci Treebank is distributed in XML-based, TEI-compliant format. The CoNLL 2007 version is distributed in the [[:format-conll|CoNLL 2006/2007 format]].
  
-Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. The tagset used in SzTB seems to be same or similar to [[http://nl.ijs.si/ME/V4/msd/html/msd-hu.html|Multext-East]]. In the CoNLL version, tags were decomposed into CPOS column, POS column and the list of feature-value pairs in the FEAT column.
+Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually.
  
-Personal names have been collapsed into one token, using underscore as the joining character (e.g. Torgyán_József).
+There are special derivational nodes. Derived words have been split into several tokens (see also the sample below).
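
A minimal sketch of reading one token line of the CoNLL 2006/2007 format mentioned above, assuming the usual layout of ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with ''|''-separated features. The names ''Token'' and ''parse_conll_line'' are hypothetical and not part of any treebank tooling. The example line is taken from the training sample; its token ID 4 is inferred from the row's position in the sentence.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One token line of a CoNLL 2006/2007 file (hypothetical helper type)."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: List[str]
    head: int
    deprel: str
    phead: str
    pdeprel: str


def parse_conll_line(line: str) -> Token:
    """Split one tab-separated token line into its ten columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
    # An underscore marks an empty FEATS column; otherwise features are joined by '|'.
    feats = [] if fields[5] == "_" else fields[5].split("|")
    return Token(
        id=int(fields[0]),
        form=fields[1],
        lemma=fields[2],
        cpostag=fields[3],
        postag=fields[4],
        feats=feats,
        head=int(fields[6]),
        deprel=fields[7],
        phead=fields[8],
        pdeprel=fields[9],
    )


# Row from the training sample above (ID 4 is inferred from its position).
example = "4\tsöylemedim\tsöyle\tVerb\tVerb\tNeg|Past|A1sg\t8\tSENTENCE\t_\t_"
token = parse_conll_line(example)
print(token.lemma, token.feats, token.head)
```

PHEAD and PDEPREL are kept as strings because they are ''_'' in every row of the sample.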
  
 ==== Sample ====
Line 53:
 The first sentence of the CoNLL 2007 training data:
  
-| 1 | Az az Tf | <nowiki>def=yes</nowiki> | 4 | DET | <nowiki>_</nowiki><nowiki>_</nowiki>+| 1 | Ama ama Conj Conj | <nowiki>_</nowiki>| <nowiki>S.MODIFIER</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 2 | elmúlt | elmúlt | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 4 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+hiçbir hiçbir Det Det | <nowiki>_</nowiki>DETERMINER | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-nyolc nyolc Mc | <nowiki>n=singular|case=nominative</nowiki> | 4 | ATT | <nowiki>_</nowiki><nowiki>_</nowiki> | +şey şey Noun Noun | <nowiki>A3sg|Pnon|Nom</nowiki>OBJECT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 4 | hónapban | hónap | N | Nc | <nowiki>n=singular|case=inessive|proper=no</nowiki> | 16 | INE | <nowiki>_</nowiki> | <nowiki>_</nowiki>+söylemedim söyle Verb Verb | <nowiki>Neg|Past|A1sg</nowiki> | 8 | SENTENCE | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-<nowiki>,</nowiki> <nowiki>_</nowiki> WPUNCT WPUNCT | <nowiki>_</nowiki> 16 PUNCT | <nowiki>_</nowiki><nowiki>_</nowiki>+ki ki Conj Conj | <nowiki>_</nowiki>INTENSIFIER | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 6 | amelyből | amely | P Pr | <nowiki>p=3rd|n=singular|case=elative</nowiki> | 11 | ELA | <nowiki>_</nowiki> | <nowiki>_</nowiki>+ben ben Pron PersP | <nowiki>A1sg|Pnon|Nom</nowiki>SUBJECT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-összesen összesen Rx | <nowiki>_</nowiki> ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki>+sizlere siz Pron PersP | <nowiki>A2pl|Pnon|Dat</nowiki>| <nowiki>DATIVE.ADJUNCT</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 8 | hatot | hat | M | Mc | <nowiki>n=singular|case=accusative</nowiki> | 11 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| <nowiki>.</nowiki> | <nowiki>.</nowiki>Punc Punc | <nowiki>_</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-kényszerűségből kényszerűség Nc | <nowiki>n=singular|case=elative|proper=no</nowiki> | 11 | ELA | <nowiki>_</nowiki><nowiki>_</nowiki> | +
-| 10 | szabadságon | szabadság | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 11 | SUP | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-11 töltött tölt Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki>16 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-12 Tf | <nowiki>def=yes</nowiki> | 14 DET <nowiki>_</nowiki><nowiki>_</nowiki>+
-| 13 | parlamenti | parlamenti | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 14 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-14 | ellenzék | ellenzék | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 11 | SUBJ | <nowiki>_</nowiki><nowiki>_</nowiki> | +
-| 15 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 16 | megváltozott | megváltozik | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 17 | itthon | itthon | R | Rx | <nowiki>_</nowiki> | 16 | LOCY | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 18 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 19 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 19 | hatalommegosztás | hatalommegosztás | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 22 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 20 | <nowiki>1990-ben</nowiki> | 1990 | M | Mc | <nowiki>n=singular|case=inessive</nowiki> | 21 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 21 | kialakított | kialakított | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 22 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 22 | rendszere | rendszer | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 16 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 23 | <nowiki>:</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 24 | az | az | T | Tf | <nowiki>def=yes</nowiki> | 26 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 25 | e | e | P | Pd | <nowiki>p=3rd|n=singular|case=nominative</nowiki> | 26 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 26 | héten | hét | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 27 | audienciát | audiencia | N | Nc | <nowiki>n=singular|case=accusative|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 28 | tartó | tartó | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 29 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 29 | kormányfő | kormányfő | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 31 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 30 | gyakorlatilag | gyakorlati | A | Af | <nowiki>deg=positive|n=singular|case=essive</nowiki> | 31 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 31 | kivonta | kivon | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=yes</nowiki> | 16 | CP | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 32 | magát | maga | P | Px | <nowiki>p=3rd|n=singular|case=accusative</nowiki> | 31 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 33 | az | az | T | Tf | <nowiki>def=yes</nowiki> | 34 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 34 | Országgyűlés | Országgyűlés | N | Np | <nowiki>n=singular|case=nominative|proper=yes</nowiki> | 35 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 35 | ellenőrzése | ellenőrzés | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 36 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 36 | alól | alól | S | St | <nowiki>_</nowiki> | 31 | PP | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 37 | <nowiki>.</nowiki> | <nowiki>_</nowiki> | SPUNCT | SPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
  
 The first sentence of the CoNLL 2007 test data:
  
-| 1 | A | a | T | Tf | <nowiki>def=yes</nowiki>DET <nowiki>_</nowiki> <nowiki>_</nowiki>+| 1 | <nowiki>_</nowiki>ötele Verb Verb Pos | 2 | DERIV | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 2 | bankokkal | bank | N | Nc | <nowiki>n=plural|case=instrumental|proper=no</nowiki> | 4 | INS | <nowiki>_</nowiki> | <nowiki>_</nowiki>+Öteleme | <nowiki>_</nowiki>Noun NInf | <nowiki>A3sg|Pnon|Nom</nowiki> | 3 | CLASSIFIER | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-3 | kell | kell | V Vm | <nowiki>mood=indicative|t=present|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki><nowiki>_</nowiki> | +işleminde işlem Noun Noun | <nowiki>A3sg|P3sg|Loc</nowiki>10 | <nowiki>LOCATIVE.ADJUNCT</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 4 | egyezkedniük | egyezkedik | V | Vm | <nowiki>mood=infinitive|t=present|p=3rd|n=plural</nowiki> | 3 | INF | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| 4 | kuyrukta kuyruk Noun Noun | <nowiki>A3sg|Pnon|Loc</nowiki>| <nowiki>LOCATIVE.ADJUNCT</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-azoknak az Pd | <nowiki>p=3rd|n=plural|case=dative</nowiki>8 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| <nowiki>_</nowiki>bekle Verb Verb Pos DERIV | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 6 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 8 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+bekleyen | <nowiki>_</nowiki>Adj APresPart | <nowiki>_</nowiki>MODIFIER | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 7 | mezőgazdasági | mezőgazdasági | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 8 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+eleman eleman Noun Noun | <nowiki>A3sg|Pnon|Nom</nowiki>10 SUBJECT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 8 | termelőknek | termelő | N | Nc | <nowiki>n=plural|case=dative|proper=no</nowiki> | 4 | DAT <nowiki>_</nowiki> <nowiki>_</nowiki> | +yığına yığın | Noun Noun | <nowiki>A3sg|Pnon|Dat</nowiki>10 | <nowiki>DATIVE.ADJUNCT</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 9 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT WPUNCT <nowiki>_</nowiki>3 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| <nowiki>_</nowiki>it Verb Verb | <nowiki>_</nowiki>10 DERIV | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 10 | akik | aki | P | Pr | <nowiki>p=3rd|n=plural|case=nominative</nowiki> | 21 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+10 itilir | <nowiki>_</nowiki>Verb Verb | <nowiki>Pass|Pos|Aor|A3sg</nowiki>11 SENTENCE | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-11 | egy | egy | T | Ti | <nowiki>def=no</nowiki> | 19 | DET | <nowiki>_</nowiki><nowiki>_</nowiki> | +11 | <nowiki>.</nowiki> | <nowiki>.</nowiki>Punc Punc | <nowiki>_</nowiki>ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-12 <nowiki>,</nowiki> <nowiki>_</nowiki> | WPUNCT WPUNCT | <nowiki>_</nowiki> | 19 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-13 a | a | T | Tf | <nowiki>def=yes</nowiki> | 15 | DET | <nowiki>_</nowiki><nowiki>_</nowiki>+
-| 14 | múlt | múlt | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 15 ATT | <nowiki>_</nowiki><nowiki>_</nowiki>+
-| 15 | héten | hét | N Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 16 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-16 megjelent megjelent Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>19 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-17 földművelésügyi | földművelésügyi | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> 18 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 18 | minisztériumi | minisztériumi | A Af <nowiki>deg=positive|n=singular|case=nominative</nowiki>19 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 19 | rendelet | rendelet | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 20 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-20 | alapján | alap | N | Nc | <nowiki>n=singular|case=superessive|proper=no|pperson=3rd|pnumber=singular</nowiki> | 21 | SUP | <nowiki>_</nowiki><nowiki>_</nowiki> | +
-| 21 | kérik | kér | V | Vm | <nowiki>mood=indicative|t=present|p=3rd|n=plural|def=yes</nowiki> | 5 ATT | <nowiki>_</nowiki><nowiki>_</nowiki> | +
-| 22 | ősszel | ősszel | R | Rx | <nowiki>_</nowiki> | 23 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-23 | lejáró | lejáró | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 27 ATT | <nowiki>_</nowiki><nowiki>_</nowiki> | +
-| 24 | <nowiki>,</nowiki> <nowiki>_</nowiki> WPUNCT WPUNCT | <nowiki>_</nowiki>27 PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-25 | éven | év | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 26 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 26 | belüli | belüli | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 27 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 27 | hiteleik | hitel | N | Nc | <nowiki>n=plural|case=nominative|proper=no|pperson=3rd|pnumber=plural</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 28 | átütemezését | átütemezés | N | Nc | <nowiki>n=singular|case=accusative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 21 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 29 | <nowiki>.</nowiki><nowiki>_</nowiki> | SPUNCT SPUNCT | <nowiki>_</nowiki>PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
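
In the Turkish test sample above, a derived word appears as a chain of tokens: the derivational nodes have ''_'' as FORM, carry a lemma and the relation DERIV, and the surface form sits on the last token of the chain (e.g. ''_''/''ötele'' followed by ''Öteleme''/''_''). The following sketch regroups such chains into surface words. It assumes that a chain always ends at the first token whose FORM is not ''_''; this holds in the sample but is not confirmed by the treebank documentation. The function name ''group_derivations'' is hypothetical.

```python
from typing import List, Tuple


def group_derivations(tokens: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group (FORM, LEMMA) pairs into (surface form, lemmas in the chain).

    Assumption: tokens with FORM '_' are derivational nodes that belong
    to the next token with a real surface form.
    """
    words: List[Tuple[str, List[str]]] = []
    pending: List[str] = []  # lemmas of derivational nodes seen so far
    for form, lemma in tokens:
        if form == "_":
            pending.append(lemma)
        else:
            lemmas = pending + ([lemma] if lemma != "_" else [])
            words.append((form, lemmas))
            pending = []
    if pending:
        raise ValueError("sentence ends inside a derivation chain")
    return words


# (FORM, LEMMA) pairs of the first three tokens of the test sample above.
sample = [("_", "ötele"), ("Öteleme", "_"), ("işleminde", "işlem")]
print(group_derivations(sample))
```

Only FORM and LEMMA are used here; a fuller version would probably also want to check the DERIV relation and the HEAD column, which this sketch ignores.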
  
 ==== Parsing ====
