
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:tr [2012/03/22 20:43]
zeman ODTÜ-Sabancı Türkçe Ağaç Yapılı Derlemi
user:zeman:treebanks:tr [2012/03/22 20:57]
zeman Sample.
Line 35: Line 35:
 ==== Domain ==== ==== Domain ====
  
-Mixed: +Post-1990 written Turkish, sampled from various genres.
-  * Fiction +
-  * Short essays by 14 to 16 year-old students +
-  * Newspapers (Népszabadság, Népszava, Magyar Hírlap, HVG) +
-  * Texts related to computer science +
-  * Legal texts +
-  * Economic and financial short news+
  
 ==== Size ==== ==== Size ====
  
-According to their website, SzTB 2.0 contains 1.2 million words plus 250 thousand punctuation tokens in 82000 sentences. Only a fragment was converted to dependencies in the CoNLL 2007 version: 139,143 tokens in 6424 sentences, yielding 21.66 tokens per sentence on average (131,799 tokens / 6034 sentences training, 7344 tokens / 390 sentences test).+According to their website, the treebank contains 7262 sentences. The CoNLL 2007 version contains 69695 tokens in 5935 sentences, yielding 11.74 tokens per sentence on average (65182 tokens / 5635 sentences training, 4513 tokens / 300 sentences test).
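
The totals and the average quoted for the CoNLL 2007 Turkish data can be rechecked from the training and test figures. The following is a minimal sketch; the variable names are illustrative and only the numbers come from the text above.

```python
# Figures quoted above for the CoNLL 2007 version of the Turkish treebank.
train_tokens, train_sentences = 65182, 5635
test_tokens, test_sentences = 4513, 300

# Totals should match the quoted 69695 tokens in 5935 sentences.
total_tokens = train_tokens + test_tokens
total_sentences = train_sentences + test_sentences

# Average sentence length in tokens.
avg = total_tokens / total_sentences

print(total_tokens, total_sentences, round(avg, 2))  # 69695 5935 11.74
```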
  
 ==== Inside ==== ==== Inside ====
Line 59: Line 53:
 The first sentence of the CoNLL 2007 training data: The first sentence of the CoNLL 2007 training data:
  
-| 1 | Az az Tf | <nowiki>def=yes</nowiki> | 4 | DET | <nowiki>_</nowiki><nowiki>_</nowiki>+| 1 | Ama ama Conj Conj | <nowiki>_</nowiki>| <nowiki>S.MODIFIER</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 2 | elmúlt | elmúlt | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 4 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+hiçbir hiçbir Det Det | <nowiki>_</nowiki>DETERMINER | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-nyolc nyolc Mc | <nowiki>n=singular|case=nominative</nowiki> | 4 | ATT | <nowiki>_</nowiki><nowiki>_</nowiki> | +şey şey Noun Noun | <nowiki>A3sg|Pnon|Nom</nowiki>OBJECT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 4 | hónapban | hónap | N | Nc | <nowiki>n=singular|case=inessive|proper=no</nowiki> | 16 | INE | <nowiki>_</nowiki> | <nowiki>_</nowiki> | +| 4 | söylemedim | söyle | Verb | Verb | <nowiki>Neg|Past|A1sg</nowiki> | 8 | SENTENCE | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-<nowiki>,</nowiki> <nowiki>_</nowiki> WPUNCT WPUNCT | <nowiki>_</nowiki> 16 PUNCT | <nowiki>_</nowiki><nowiki>_</nowiki>+ki ki Conj Conj | <nowiki>_</nowiki>INTENSIFIER | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 6 | amelyből | amely | P Pr | <nowiki>p=3rd|n=singular|case=elative</nowiki> | 11 | ELA | <nowiki>_</nowiki> | <nowiki>_</nowiki>+ben ben Pron PersP | <nowiki>A1sg|Pnon|Nom</nowiki>SUBJECT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-összesen összesen Rx | <nowiki>_</nowiki> ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki>+sizlere siz Pron PersP | <nowiki>A2pl|Pnon|Dat</nowiki>| <nowiki>DATIVE.ADJUNCT</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 8 | hatot | hat | M | Mc | <nowiki>n=singular|case=accusative</nowiki> | 11 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | +| 8 | <nowiki>.</nowiki> | <nowiki>.</nowiki> | Punc | Punc | <nowiki>_</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-kényszerűségből kényszerűség Nc | <nowiki>n=singular|case=elative|proper=no</nowiki> | 11 | ELA | <nowiki>_</nowiki><nowiki>_</nowiki> | +
-| 10 | szabadságon | szabadság | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 11 | SUP | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-11 töltött tölt Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki>16 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-12 Tf | <nowiki>def=yes</nowiki> | 14 DET <nowiki>_</nowiki><nowiki>_</nowiki>+
-| 13 | parlamenti | parlamenti | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 14 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-14 | ellenzék | ellenzék | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 11 | SUBJ | <nowiki>_</nowiki><nowiki>_</nowiki> | +
-| 15 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 16 | megváltozott | megváltozik | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 17 | itthon | itthon | R | Rx | <nowiki>_</nowiki> | 16 | LOCY | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 18 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 19 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 19 | hatalommegosztás | hatalommegosztás | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 22 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 20 | <nowiki>1990-ben</nowiki> | 1990 | M | Mc | <nowiki>n=singular|case=inessive</nowiki> | 21 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 21 | kialakított | kialakított | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 22 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 22 | rendszere | rendszer | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 16 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 23 | <nowiki>:</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 24 | az | az | T | Tf | <nowiki>def=yes</nowiki> | 26 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 25 | e | e | P | Pd | <nowiki>p=3rd|n=singular|case=nominative</nowiki> | 26 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 26 | héten | hét | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 27 | audienciát | audiencia | N | Nc | <nowiki>n=singular|case=accusative|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 28 | tartó | tartó | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 29 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 29 | kormányfő | kormányfő | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 31 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 30 | gyakorlatilag | gyakorlati | A | Af | <nowiki>deg=positive|n=singular|case=essive</nowiki> | 31 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 31 | kivonta | kivon | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=yes</nowiki> | 16 | CP | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 32 | magát | maga | P | Px | <nowiki>p=3rd|n=singular|case=accusative</nowiki> | 31 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 33 | az | az | T | Tf | <nowiki>def=yes</nowiki> | 34 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 34 | Országgyűlés | Országgyűlés | N | Np | <nowiki>n=singular|case=nominative|proper=yes</nowiki> | 35 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 35 | ellenőrzése | ellenőrzés | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 36 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 36 | alól | alól | S | St | <nowiki>_</nowiki> | 31 | PP | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 37 | <nowiki>.</nowiki> | <nowiki>_</nowiki> | SPUNCT | SPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
  
 The first sentence of the CoNLL 2007 test data: The first sentence of the CoNLL 2007 test data:
  
-| 1 | A | a | T | Tf | <nowiki>def=yes</nowiki>DET <nowiki>_</nowiki> <nowiki>_</nowiki>+| 1 | <nowiki>_</nowiki>ötele Verb Verb Pos | 2 | DERIV | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 2 | bankokkal | bank | N | Nc | <nowiki>n=plural|case=instrumental|proper=no</nowiki> | 4 | INS | <nowiki>_</nowiki> | <nowiki>_</nowiki> | +| 2 | Öteleme | <nowiki>_</nowiki> | Noun | NInf | <nowiki>A3sg|Pnon|Nom</nowiki> | 3 | CLASSIFIER | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 3 | kell | kell | V | Vm | <nowiki>mood=indicative|t=present|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | +| 3 | işleminde | işlem | Noun | Noun | <nowiki>A3sg|P3sg|Loc</nowiki> | 10 | <nowiki>LOCATIVE.ADJUNCT</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 4 | egyezkedniük | egyezkedik | V | Vm | <nowiki>mood=infinitive|t=present|p=3rd|n=plural</nowiki> | 3 | INF | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| 4 | kuyrukta kuyruk Noun Noun | <nowiki>A3sg|Pnon|Loc</nowiki>| <nowiki>LOCATIVE.ADJUNCT</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-azoknak az Pd | <nowiki>p=3rd|n=plural|case=dative</nowiki>8 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| <nowiki>_</nowiki>bekle Verb Verb Pos DERIV | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 6 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 8 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+bekleyen | <nowiki>_</nowiki>Adj APresPart | <nowiki>_</nowiki>MODIFIER | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 7 | mezőgazdasági | mezőgazdasági | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 8 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | +| 7 | eleman | eleman | Noun | Noun | <nowiki>A3sg|Pnon|Nom</nowiki> | 10 | SUBJECT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 8 | termelőknek | termelő | N | Nc | <nowiki>n=plural|case=dative|proper=no</nowiki> | 4 | DAT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | +| 8 | yığına | yığın | Noun | Noun | <nowiki>A3sg|Pnon|Dat</nowiki> | 10 | <nowiki>DATIVE.ADJUNCT</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 9 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 3 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | +| 9 | <nowiki>_</nowiki> | it | Verb | Verb | <nowiki>_</nowiki> | 10 | DERIV | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 10 | akik | aki | P | Pr | <nowiki>p=3rd|n=plural|case=nominative</nowiki> | 21 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | +| 10 | itilir | <nowiki>_</nowiki> | Verb | Verb | <nowiki>Pass|Pos|Aor|A3sg</nowiki> | 11 | SENTENCE | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-11 | egy | egy | T | Ti | <nowiki>def=no</nowiki> | 19 | DET | <nowiki>_</nowiki><nowiki>_</nowiki> | +11 | <nowiki>.</nowiki> | <nowiki>.</nowiki>Punc Punc | <nowiki>_</nowiki>ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-12 <nowiki>,</nowiki> <nowiki>_</nowiki> | WPUNCT WPUNCT | <nowiki>_</nowiki> | 19 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-13 a | a | T | Tf | <nowiki>def=yes</nowiki> | 15 | DET | <nowiki>_</nowiki><nowiki>_</nowiki>+
-| 14 | múlt | múlt | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 15 ATT | <nowiki>_</nowiki><nowiki>_</nowiki>+
-| 15 | héten | hét | N Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 16 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-16 megjelent megjelent Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>19 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-17 földművelésügyi | földművelésügyi | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> 18 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 18 | minisztériumi | minisztériumi | A Af <nowiki>deg=positive|n=singular|case=nominative</nowiki>19 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 19 | rendelet | rendelet | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 20 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-20 | alapján | alap | N | Nc | <nowiki>n=singular|case=superessive|proper=no|pperson=3rd|pnumber=singular</nowiki> | 21 | SUP | <nowiki>_</nowiki><nowiki>_</nowiki> | +
-| 21 | kérik | kér | V | Vm | <nowiki>mood=indicative|t=present|p=3rd|n=plural|def=yes</nowiki> | 5 ATT | <nowiki>_</nowiki><nowiki>_</nowiki> | +
-| 22 | ősszel | ősszel | R | Rx | <nowiki>_</nowiki> | 23 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-23 | lejáró | lejáró | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 27 ATT | <nowiki>_</nowiki><nowiki>_</nowiki> | +
-| 24 | <nowiki>,</nowiki> <nowiki>_</nowiki> WPUNCT WPUNCT | <nowiki>_</nowiki>27 PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-25 | éven | év | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 26 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 26 | belüli | belüli | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 27 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 27 | hiteleik | hitel | N | Nc | <nowiki>n=plural|case=nominative|proper=no|pperson=3rd|pnumber=plural</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 28 | átütemezését | átütemezés | N | Nc | <nowiki>n=singular|case=accusative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 21 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 29 | <nowiki>.</nowiki><nowiki>_</nowiki> | SPUNCT SPUNCT | <nowiki>_</nowiki>PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
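
The sample rows above are wiki table rows holding the ten CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with `<nowiki>` protecting cell values that themselves contain `|`. The following is a rough sketch of how one well-formed row could be read back into named fields. The function name `parse_wiki_row` and the placeholder scheme are assumptions for illustration, not part of the wiki or of any CoNLL tool, and the example row is the fourth token of the Turkish training sentence written out with all ten cells, its ID assumed from its position in the sentence.

```python
import re

# Column names of the CoNLL-X format, in order.
CONLL_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                "feats", "head", "deprel", "phead", "pdeprel"]

NOWIKI = re.compile(r"<nowiki>(.*?)</nowiki>")
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")


def parse_wiki_row(row):
    """Read one complete wiki table row into a dict of CoNLL-X fields (hypothetical helper)."""
    protected = []

    def stash(match):
        # Hide nowiki content so that pipes inside FEATS do not split the cell.
        protected.append(match.group(1))
        return f"\x00{len(protected) - 1}\x00"

    def restore(cell):
        return PLACEHOLDER.sub(lambda m: protected[int(m.group(1))], cell)

    masked = NOWIKI.sub(stash, row)
    cells = [restore(c.strip()) for c in masked.strip().strip("|").split("|")]
    if len(cells) != len(CONLL_FIELDS):
        raise ValueError(f"expected {len(CONLL_FIELDS)} cells, got {len(cells)}")
    return dict(zip(CONLL_FIELDS, cells))


row = ("| 4 | söylemedim | söyle | Verb | Verb | <nowiki>Neg|Past|A1sg</nowiki> "
       "| 8 | SENTENCE | <nowiki>_</nowiki> | <nowiki>_</nowiki> |")
token = parse_wiki_row(row)
print(token["form"], token["feats"], token["head"])
```

Rows with missing cell separators do not have ten cells, so the sketch raises `ValueError` for them rather than guessing at the missing values.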
  
 ==== Parsing ==== ==== Parsing ====
