user:zeman:treebanks:hu [2011/12/13 12:55] zeman Domain. |
user:zeman:treebanks:hu [2011/12/13 13:32] zeman Inside. |
==== Size ==== | ==== Size ==== |
| |
The CoNLL 2007 dataset was officially split into a training part and a test part. The data split of BDT-II was provided by Koldo Gojenola and should correspond to the data split used in parsing experiments published by the IXA Group. | According to the treebank's website, SzTB 2.0 contains 1.2 million words plus 250 thousand punctuation tokens in 82000 sentences. Only a fragment was converted to dependencies in the CoNLL 2007 version: 139,143 tokens in 6424 sentences, yielding 21.66 tokens per sentence on average (131,799 tokens / 6034 sentences in training, 7344 tokens / 390 sentences in test). |
| |
^ Version ^ Train Sentences ^ Train Tokens ^ D-test Sentences ^ D-test Tokens ^ E-test Sentences ^ E-test Tokens ^ Total Sentences ^ Total Tokens ^ Sentence Length ^ | |
| CoNLL 2007 | 3190 | 50526 | 334 | 5390 | | | 3524 | 55916 | 15.87 | | |
| BDT-II | 9094 | 124,684 | 1010 | 12625 | 1122 | 14295 | 11226 | 151,604 | 13.50 | | |
| |
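The totals and average sentence lengths quoted in this section follow directly from the per-part counts. A minimal Python sketch that recomputes them is given here; the dictionary keys and the helper name ''summarize'' are illustrative labels chosen for this example, not names used by either treebank distribution.

```python
# (sentences, tokens) per data part, copied from the text and table above.
# The keys are labels chosen for this sketch only.
SPLITS = {
    "SzTB CoNLL 2007": {"train": (6034, 131799), "test": (390, 7344)},
    "BDT CoNLL 2007": {"train": (3190, 50526), "d-test": (334, 5390)},
    "BDT-II": {
        "train": (9094, 124684),
        "d-test": (1010, 12625),
        "e-test": (1122, 14295),
    },
}


def summarize(parts):
    """Return (total sentences, total tokens, tokens per sentence)."""
    sentences = sum(s for s, _ in parts.values())
    tokens = sum(t for _, t in parts.values())
    return sentences, tokens, round(tokens / sentences, 2)


if __name__ == "__main__":
    for name, parts in SPLITS.items():
        print(name, summarize(parts))
```

Running the sketch should reproduce the reported totals, which is a quick way to catch transcription errors in the counts.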
==== Inside ==== | ==== Inside ==== |
| |
Both versions (CoNLL 2007 and BDT-II) are in the CoNLL 2006/2007 format. | The original Szeged Treebank is a phrase-based treebank distributed in an XML-based, TEI-compliant format. The CoNLL 2007 version is dependency-based (i.e. the head of each phrase was identified) and is distributed in the CoNLL 2006/2007 format. |
| |
Part-of-speech tag description (obtained by e-mail from Koldo Gojenola, thanks!): | Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. The tagset used in SzTB seems to be the same as or similar to [[http://nl.ijs.si/ME/V4/msd/html/msd-hu.html|Multext-East]]. In the CoNLL version, tags were decomposed into the CPOS column, the POS column and a list of feature-value pairs in the FEAT column. |
| |
* IZE = noun | |
* ARR = common | |
* IZB = proper name | |
* LIB = place name | |
* ZKI = number | |
* ADJ = adjective | |
* ARR = common | |
* GAL = question | |
* ADI = verb | |
* SIN = simple | |
* ADK = composed | |
* ADP = periphrastic | |
* FAK = factitive | |
* ADB = adverb | |
* ARR = common | |
* GAL = question | |
* DET = determiner | |
* ERKARR = demonstrative common | |
* ERKIND = demonstrative emphatic | |
* NOLARR = indefinite common | |
* NOLGAL = indefinite question | |
* ZNB = number | |
* DZH = definite | |
* BAN = distributive | |
* ORD = ordinal | |
* DZG = indefinite | |
* ORO = general | |
* IOR = pronoun | |
* PERARR = personal common | |
* PERIND = personal emphatic | |
* IZGMGB = indefinite | |
* IZGGAL = question | |
* BIH = ??? | |
* ELK = ??? | |
* LOT = link | |
* LOK = connector | |
* JNT = conjunction | |
* PRT = particle | |
* ITJ = interjection | |
* BST = other | |
* ADL = auxiliary verb | |
* ADT = synthetic verb | |
* SIG = acronym | |
* SNB = symbol | |
* LAB = abbreviation | |
| |
Main features: | |
| |
  * KAS = case. Various descriptions of Basque grammar list different numbers of cases, and it is not easy to match all of the BDT case tags with them. Some but not all of them are described in Annex 3 of the technical report on the syntactic guidelines. The following list gives all case tags occurring in BDT with their frequencies in brackets. | |
* KAS:ABL (984) = ablativo = ablative | |
* KAS:ABS (22805) = absolutivo = absolutive | |
* KAS:ABU (32) = adlativo terminal ("-raino") = "until, as far as" = terminative | |
* KAS:ABZ (27) = adlativo direccional ("-rantz") = "towards" ~ lative? | |
* KAS:ALA (1093) = adlativo = allative | |
* KAS:BNK (13) =? special case of the locative genitive ("-ko", "-eko") | |
* KAS:DAT (1451) = dativo = dative | |
* KAS:DES (181) = destinativo = benefactive ("-entzat") | |
* KAS:DESK (223) =? descriptive locative genitive ("-ko", "-eko"), also frequently used for counted noun after numeral | |
* KAS:EM (705) = multiword token with postposition (e.g. "_gabe", "_arabera", "_batera", "_bezala"...) | |
* KAS:ERG (6059) = ergativo = ergative | |
* KAS:GEL (6259) = genitivo locativo = locative genitive | |
* KAS:GEN (4307) = genitivo de posesión = possessive genitive | |
* KAS:INE (7690) = inesivo = inessive | |
* KAS:INS (1370) = instrumental | |
* KAS:MOT (165) = motivativo = causative | |
* KAS:PAR (930) = partitivo = partitive | |
* KAS:PRO (89) = prolativo = essive | |
* KAS:SOZ (928) = asociativo = comitative | |
* ASP = aspect | |
* ERL = relation (relative sentence, completive sentence, indirect question...) | |
| |
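As an illustration of how the feature tags listed above might be read programmatically, the following Python sketch splits a BDT-II feature string into name/value pairs and glosses the ''KAS'' value. It assumes the ''NAME:VALUE'' items joined by a vertical bar that appear in the BDT-II samples on this page; the function names and the partial ''CASE_GLOSS'' table are hypothetical helpers written for this example. The CoNLL 2007 version uses bare values (e.g. ''ABS'') without the ''KAS:'' prefix, so it would need different handling.

```python
# Partial gloss table taken from the case list above (not exhaustive).
CASE_GLOSS = {
    "ABL": "ablative",
    "ABS": "absolutive",
    "ALA": "allative",
    "DAT": "dative",
    "ERG": "ergative",
    "GEL": "locative genitive",
    "GEN": "possessive genitive",
    "INE": "inessive",
    "INS": "instrumental",
}


def parse_feats(feats):
    """Split a BDT-II feature string into a {name: value} dictionary."""
    if feats in ("", "_", "-"):  # placeholders for "no features"
        return {}
    result = {}
    for item in feats.split("|"):
        name, sep, value = item.partition(":")
        # Items without a colon are kept as boolean flags.
        result[name] = value if sep else True
    return result


def case_of(feats):
    """Return an English gloss of the KAS feature, or None if absent."""
    kas = parse_feats(feats).get("KAS")
    return CASE_GLOSS.get(kas, kas)


if __name__ == "__main__":
    print(case_of("KAS:ABS|NUM:S|MUG:M"))
```

Unknown case tags are returned unchanged rather than raising an error, since the list of glosses in the sketch is deliberately incomplete.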
The syntactic guidelines (structure and labels) are described in Spanish in this [[http://ixa.si.ehu.es/Ixa/Argitalpenak/Barne_txostenak/1068549887/publikoak/guia.pdf|technical report]]. See Appendix 3 for some lists of tags. | |
| |
Multi-word expressions have been collapsed into one token, using underscore as the joining character (e.g. Espainia_Poliziak, iduri_zait). | |
| |
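If the collapsed multi-word tokens need to be taken apart again, a naive split on the underscore may be enough. The sketch below assumes that an underscore inside a form or lemma always marks a join and that a lone underscore is the empty-value placeholder; ''split_multiword'' is a hypothetical helper name, not part of any treebank tool.

```python
def split_multiword(form):
    """Split a collapsed multi-word token on underscores.

    A lone "_" (the CoNLL placeholder for an empty value) is returned as is.
    """
    parts = [p for p in form.split("_") if p]
    return parts if parts else [form]


if __name__ == "__main__":
    for token in ("Espainia_Poliziak", "iduri_zait", "gazte", "_"):
        print(token, "->", split_multiword(token))
```

A token that legitimately contains an underscore would be split incorrectly by this approach, so the result should be checked against the data before relying on it.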
==== Sample ==== | ==== Sample ==== |
The first sentence of the CoNLL 2007 training data: | The first sentence of the CoNLL 2007 training data: |
| |
| 1 | espainiako_poliziak | Espainia_Poliziak | IZE | IZE_LIB | PLU-<nowiki>|</nowiki>ENTI_LOC | 4 | ncsubj | _ | _ | | | 1 | Az | az | T | Tf | <nowiki>def=yes</nowiki> | 4 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 2 | hiru | hiru | DET | DET_DZH | NMGP | 3 | detmod | _ | _ | | | 2 | elmúlt | elmúlt | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 4 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 3 | gazte | gazte | IZE | IZE_ARR | ABS<nowiki>|</nowiki>MG | 4 | ncobj | _ | _ | | | 3 | nyolc | nyolc | M | Mc | <nowiki>n=singular|case=nominative</nowiki> | 4 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 4 | atxilotu | atxilotu | ADI | ADI_SIN | PART<nowiki>|</nowiki>BURU | 8 | lot | _ | _ | | | 4 | hónapban | hónap | N | Nc | <nowiki>n=singular|case=inessive|proper=no</nowiki> | 16 | INE | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 5 | ditu | *edun | ADL | ADL | A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>NK_HARK | 4 | auxmod | _ | _ | | | 5 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 6 | atarrabian | Atarrabia | IZE | IZE_LIB | PLU-<nowiki>|</nowiki>INE<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC | 4 | ncmod | _ | _ | | | 6 | amelyből | amely | P | Pr | <nowiki>p=3rd|n=singular|case=elative</nowiki> | 11 | ELA | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 7 | , | , | PUNC | PUNC_KOMA | _ | 6 | PUNC | _ | _ | | | 7 | összesen | összesen | R | Rx | <nowiki>_</nowiki> | 8 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 8 | eta | eta | LOT | LOT_JNT | - | 0 | ROOT | _ | _ | | | 8 | hatot | hat | M | Mc | <nowiki>n=singular|case=accusative</nowiki> | 11 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 9 | madrilera | Madril | IZE | IZE_LIB | PLU-<nowiki>|</nowiki>ALA<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC | 10 | ncmod | _ | _ | | | 9 | kényszerűségből | kényszerűség | N | Nc | <nowiki>n=singular|case=elative|proper=no</nowiki> | 11 | ELA | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 10 | eraman | eraman | ADI | ADI_SIN | PART<nowiki>|</nowiki>BURU | 8 | lot | _ | _ | | | 10 | szabadságon | szabadság | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 11 | SUP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 11 | ditu | *edun | ADL | ADL | A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>NK_HARK | 10 | auxmod | _ | _ | | | 11 | töltött | tölt | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki> | 16 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 12 | . | . | PUNC | PUNC_PUNC | _ | 11 | PUNC | _ | _ | | | 12 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 14 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 13 | parlamenti | parlamenti | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 14 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 14 | ellenzék | ellenzék | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 11 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 15 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 16 | megváltozott | megváltozik | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 17 | itthon | itthon | R | Rx | <nowiki>_</nowiki> | 16 | LOCY | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 18 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 19 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 19 | hatalommegosztás | hatalommegosztás | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 22 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 20 | <nowiki>1990-ben</nowiki> | 1990 | M | Mc | <nowiki>n=singular|case=inessive</nowiki> | 21 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 21 | kialakított | kialakított | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 22 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 22 | rendszere | rendszer | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 16 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 23 | <nowiki>:</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 24 | az | az | T | Tf | <nowiki>def=yes</nowiki> | 26 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 25 | e | e | P | Pd | <nowiki>p=3rd|n=singular|case=nominative</nowiki> | 26 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 26 | héten | hét | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 27 | audienciát | audiencia | N | Nc | <nowiki>n=singular|case=accusative|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 28 | tartó | tartó | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 29 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 29 | kormányfő | kormányfő | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 31 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 30 | gyakorlatilag | gyakorlati | A | Af | <nowiki>deg=positive|n=singular|case=essive</nowiki> | 31 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 31 | kivonta | kivon | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=yes</nowiki> | 16 | CP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 32 | magát | maga | P | Px | <nowiki>p=3rd|n=singular|case=accusative</nowiki> | 31 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 33 | az | az | T | Tf | <nowiki>def=yes</nowiki> | 34 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 34 | Országgyűlés | Országgyűlés | N | Np | <nowiki>n=singular|case=nominative|proper=yes</nowiki> | 35 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 35 | ellenőrzése | ellenőrzés | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 36 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 36 | alól | alól | S | St | <nowiki>_</nowiki> | 31 | PP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 37 | <nowiki>.</nowiki> | <nowiki>_</nowiki> | SPUNCT | SPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| |
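The rows of the sample correspond to the ten tab-separated columns of the CoNLL 2006/2007 format. A minimal Python sketch for reading one such row, assuming the ''key=value'' feature style of the Hungarian sample, could look as follows; the ''Token'' type and the ''parse_token'' function are illustrative names introduced here, and the input line is row 4 of the sample retyped with tab separators.

```python
from collections import namedtuple

# The ten columns of the CoNLL 2006/2007 format, in order.
Token = namedtuple(
    "Token", "id form lemma cpos pos feats head deprel phead pdeprel"
)


def parse_token(line):
    """Parse one tab-separated CoNLL 2006/2007 token line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("expected 10 columns, got %d" % len(cols))
    # "_" marks an empty feature list; otherwise split key=value pairs.
    if cols[5] == "_":
        feats = {}
    else:
        feats = dict(kv.split("=", 1) for kv in cols[5].split("|"))
    return Token(
        int(cols[0]), cols[1], cols[2], cols[3], cols[4],
        feats, int(cols[6]), cols[7], cols[8], cols[9],
    )


if __name__ == "__main__":
    line = "4\thónapban\thónap\tN\tNc\tn=singular|case=inessive|proper=no\t16\tINE\t_\t_"
    tok = parse_token(line)
    print(tok.form, tok.lemma, tok.feats["case"], tok.head, tok.deprel)
```

The sketch expects every row to carry a numeric head, so blind test rows with an empty head column would need extra handling.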
The first sentence of the CoNLL 2007 test data: | The first sentence of the CoNLL 2007 test data: |
| |
| 1 | epaileek | epaile | IZE | IZE_ARR | BIZ+<nowiki>|</nowiki>ERG<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM | | | 1 | A | a | T | Tf | <nowiki>def=yes</nowiki> | 2 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 2 | diote | esan | ADT | ADT | PNT<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HURA<nowiki>|</nowiki>NK_HAIEK-K | | | 2 | bankokkal | bank | N | Nc | <nowiki>n=plural|case=instrumental|proper=no</nowiki> | 4 | INS | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 3 | eaeko | EAE | IZE | IZE_LIB | SIG<nowiki>|</nowiki>GEL<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC | | | 3 | kell | kell | V | Vm | <nowiki>mood=indicative|t=present|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 4 | parlamentarioek | parlamentario | ADJ | ADJ_ARR | IZAUR-<nowiki>|</nowiki>ERG<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM | | | 4 | egyezkedniük | egyezkedik | V | Vm | <nowiki>mood=infinitive|t=present|p=3rd|n=plural</nowiki> | 3 | INF | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 5 | eaetik_kanpo | EAE | SIG | SIG- | DEK<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>DEK<nowiki>|</nowiki>ABL_kanpo_ABS<nowiki>|</nowiki>ENTI_LOC<nowiki>|</nowiki>POS | | | 5 | azoknak | az | P | Pd | <nowiki>p=3rd|n=plural|case=dative</nowiki> | 8 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 6 | eginiko | egin | ADI | ADI_SIN | PART<nowiki>|</nowiki>GEL | | | 6 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 8 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 7 | delituak | delitu | IZE | IZE_ARR | BIZ-<nowiki>|</nowiki>ABS<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM | | | 7 | mezőgazdasági | mezőgazdasági | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 8 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 8 | ikertzea | ikertu | ADI | ADI_SIN | ADIZE<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>ABS | | | 8 | termelőknek | termelő | N | Nc | <nowiki>n=plural|case=dative|proper=no</nowiki> | 4 | DAT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 9 | eta | eta | LOT | LOT_JNT | - | | | 9 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 3 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 10 | epaitzea | epaitu | ADI | ADI_SIN | ADIZE<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>ABS | | | 10 | akik | aki | P | Pr | <nowiki>p=3rd|n=plural|case=nominative</nowiki> | 21 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 11 | auzitegi_gorenari | auzitegi_gora | ADJ | ADJ_IZO | DEK<nowiki>|</nowiki>GEN<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>DEK<nowiki>|</nowiki>DAT<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC | | | 11 | egy | egy | T | Ti | <nowiki>def=no</nowiki> | 19 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 12 | dagokiola | egon | ADT | ADT | PNT<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HURA<nowiki>|</nowiki>NI_HARI | | | 12 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 19 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 13 | , | , | PUNC | PUNC_KOMA | _ | | | 13 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 15 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 14 | baina | baina | LOT | LOT_JNT | AURK | | | 14 | múlt | múlt | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 15 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 15 | atzerrian | atzerri | IZE | IZE_ARR | INE<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM | | | 15 | héten | hét | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 16 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 16 | izaniko | izan | ADI | ADI_SIN | PART<nowiki>|</nowiki>GEL | | | 16 | megjelent | megjelent | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 19 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 17 | kontaktu | kontaktu | IZE | IZE_ARR | _ | | | 17 | földművelésügyi | földművelésügyi | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 18 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 18 | horiek | horiek | DET | DET_ERKARR | ABS<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM | | | 18 | minisztériumi | minisztériumi | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 19 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 19 | ezin_direla | ezin_izan | ADI | ADI_ADK | PNT<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>MWCorrect | | | 19 | rendelet | rendelet | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 20 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 20 | delitutzat | delitu | IZE | IZE_ARR | BIZ-<nowiki>|</nowiki>PRO<nowiki>|</nowiki>MG | | | 20 | alapján | alap | N | Nc | <nowiki>n=singular|case=superessive|proper=no|pperson=3rd|pnumber=singular</nowiki> | 21 | SUP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 21 | hartu | hartu | ADI | ADI_SIN | PART | | | 21 | kérik | kér | V | Vm | <nowiki>mood=indicative|t=present|p=3rd|n=plural|def=yes</nowiki> | 5 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 22 | . | . | PUNC | PUNC_PUNC | _ | | | 22 | ősszel | ősszel | R | Rx | <nowiki>_</nowiki> | 23 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 23 | lejáró | lejáró | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 27 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
The first sentence of the BDT-II training data: | | 24 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 27 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 25 | éven | év | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 26 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 1 | Estatu_Batuetako_DEAko | Estatu_Batuak_DEA | IZE | LIB | PLU:+<nowiki>|</nowiki>IZAUR:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>MW:B<nowiki>|</nowiki>ENT:Erakundea | 2 | ncmod | _ | _ | | | 26 | belüli | belüli | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 27 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 2 | buru | buru | IZE | ARR | _ | 4 | ncsubj | _ | _ | | | 27 | hiteleik | hitel | N | Nc | <nowiki>n=plural|case=nominative|proper=no|pperson=3rd|pnumber=plural</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 3 | ohiak | ohi | ADJ | ARR | IZAUR:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ | | | 28 | átütemezését | átütemezés | N | Nc | <nowiki>n=singular|case=accusative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 21 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 4 | aztertuko | aztertu | ADI | SIN | ADM:PART<nowiki>|</nowiki>ASP:GERO | 0 | ROOT | _ | _ | | | 29 | <nowiki>.</nowiki> | <nowiki>_</nowiki> | SPUNCT | SPUNCT | <nowiki>_</nowiki> | 3 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 5 | du | *edun | ADL | ADL | MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 4 | auxmod | _ | _ | | |
| 6 | RUCen | RUC | IZE | IZB | MTKAT:SIG<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Erakundea | 7 | ncmod | _ | _ | | |
| 7 | erreforma | erreforma | IZE | ARR | KAS:ABS<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | ncobj | _ | _ | | |
| 8 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 7 | PUNC | _ | _ | | |
| |
The first sentence of the BDT-II development data: | |
| |
| 1 | Irakaskuntzan | irakaskuntza | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ | | |
| 2 | jardun | jardun | ADI | SIN | ADM:PART<nowiki>|</nowiki>ASP:BURU | 0 | ROOT | _ | _ | | |
| 3 | zuen | *edun | ADL | ADL | MDN:B1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 2 | auxmod | _ | _ | | |
| 4 | Miel | Miel | IZE | IZB | PLU:-<nowiki>|</nowiki>ENT:Pertsona | 5 | entios | _ | _ | | |
| 5 | Anjel_Elustondok | Anjel_Elustondo | IZE | IZB | PLU:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Pertsona | 2 | ncsubj | _ | _ | | |
| 6 | 1980 | 1980 | IZE | ZKI | _ | 7 | ncmod | _ | _ | | |
| 7 | urtetik | urte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:ABL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ | | |
| 8 | 1992ra | 1992 | IZE | ZKI | KAS:ALA<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ | | |
| 9 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 8 | PUNC | _ | _ | | |
| 10 | hauetatik | hauek | DET | ERKARR | KAS:ABL<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ | | |
| 11 | hamar | hamar | DET | DZH | NMG:P | 12 | detmod | _ | _ | | |
| 12 | urtez | urte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INS<nowiki>|</nowiki>MUG:MG | 16 | lot | _ | _ | | |
| 13 | Azpeitiko | Azpeitia | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 14 | ncmod | _ | _ | | |
| 14 | ikastolan | ikastola | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ | | |
| 15 | irakasle | irakasle | IZE | ARR | KAS:ABS<nowiki>|</nowiki>MUG:MG | 16 | ncpred | _ | _ | | |
| 16 | eta | eta | LOT | JNT | ERL:EMEN | 8 | aponcmod | _ | _ | | |
| 17 | beste | beste | DET | DZG | _ | 18 | detmod | _ | _ | | |
| 18 | biak | bi | IZE | ZKI | KAS:ABS<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 16 | lot | _ | _ | | |
| 19 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 18 | PUNC | _ | _ | | |
| 20 | Arabako | Araba | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 21 | ncmod | _ | _ | | |
| 21 | ikastolen | ikastola | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 22 | ncmod | _ | _ | | |
| 22 | elkartean | elkarte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ | | |
| 23 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 22 | PUNC | _ | _ | | |
| |
The first sentence of the BDT-II test data: | |
| |
| 1 | Hegoaldean | hegoalde | IZE | ARR | KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ | | |
| 2 | iduri_zait | iduri_izan | ADI | ADK | ASP:PNT<nowiki>|</nowiki>MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORI:NIRI<nowiki>|</nowiki>MW:B | 0 | ROOT | _ | _ | | |
| 3 | euskararen | euskara | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | ncmod | _ | _ | | |
| 4 | mundu | mundu | IZE | ARR | BIZ:- | 7 | ncsubj | _ | _ | | |
| 5 | hau | hau | DET | ERKARR | KAS:ABS<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | detmod | _ | _ | | |
| 6 | adi-adi | adi-adi | ADB | ARR | _ | 7 | ncmod | _ | _ | | |
| 7 | dagola | egon | ADT | ADT | ASP:PNT<nowiki>|</nowiki>ERL:KONPL<nowiki>|</nowiki>MDN:A3<nowiki>|</nowiki>NOR:HURA | 2 | ccomp_obj | _ | _ | | |
| 8 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 7 | PUNC | _ | _ | | |
| 9 | Euskaltzaindiak | Euskaltzaindia | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 11 | ncsubj | _ | _ | | |
| 10 | zer | zer | DET | NOLGAL | NMG:MG<nowiki>|</nowiki>KAS:ABS<nowiki>|</nowiki>MUG:MG | 11 | ncobj | _ | _ | | |
| 11 | erranen | erran | ADI | SIN | ADM:PART<nowiki>|</nowiki>ASP:GERO | 13 | menos | _ | _ | | |
| 12 | duen | *edun | ADL | ADL | ERL:ZHG<nowiki>|</nowiki>MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 11 | auxmod | _ | _ | | |
| 13 | zain | zain | ADB | ARR | _ | 7 | cmod | _ | _ | | |
| 14 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 13 | PUNC | _ | _ | | |
| 15 | haren | hura | DET | ERKARR | KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ | | |
| 16 | arauen | arau | IZE | ARR | KAS:ABS<nowiki>|</nowiki>MUG:MG | 18 | ncmod | _ | _ | | |
| 17 | berehala | berehala | ADB | ARR | _ | 18 | ncmod | _ | _ | | |
| 18 | betetzeko | bete | ADI | SIN | ADM:ADIZE<nowiki>|</nowiki>ERL:HELB<nowiki>|</nowiki>KAS:ABS<nowiki>|</nowiki>MUG:MG | 7 | xmod | _ | _ | | |
| 19 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 18 | PUNC | _ | _ | | |
| |
==== Parsing ==== | ==== Parsing ==== |