user:zeman:treebanks:eu [2011/11/29 10:25] zeman Inside.
user:zeman:treebanks:eu [2011/11/29 10:42] zeman Sample.
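
==== Inside ====

Both versions (CoNLL 2007 and BDT-II) use the CoNLL 2006/2007 format: one token per line with ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and a blank line after each sentence.
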
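A minimal Python sketch of reading this format. It is not part of the treebank distribution; the Token class and the read_conll function are hypothetical names, and it assumes the raw files have exactly ten tab-separated columns per token line with a plain pipe between features (the escaping in the tables on this page is only DokuWiki markup).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Token:
    """One row of a CoNLL 2006/2007 file (the ten columns, in file order)."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: int | None  # None when the HEAD column is "_"
    deprel: str
    phead: str
    pdeprel: str


def read_conll(lines: Iterable[str]) -> Iterator[list[Token]]:
    """Yield sentences as lists of Tokens; a blank line ends a sentence."""
    sentence = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cells = line.split("\t")
        if len(cells) != 10:
            raise ValueError(f"expected 10 columns, got {len(cells)}: {line!r}")
        head = None if cells[6] == "_" else int(cells[6])
        # ID and HEAD become integers; the other eight columns stay strings.
        sentence.append(Token(int(cells[0]), *cells[1:6], head, *cells[7:]))
    if sentence:  # file without a trailing blank line
        yield sentence


# A truncated excerpt: the first three tokens of the BDT-II development
# sentence, written the way they would appear in a raw file.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "Irakaskuntzan", "irakaskuntza", "IZE", "ARR",
     "BIZ:-|KAS:INE|NUM:S|MUG:M", "2", "ncmod", "_", "_"],
    ["2", "jardun", "jardun", "ADI", "SIN",
     "ADM:PART|ASP:BURU", "0", "ROOT", "_", "_"],
    ["3", "zuen", "*edun", "ADL", "ADL",
     "MDN:B1|NOR:HURA|NORK:HARK", "2", "auxmod", "_", "_"],
]) + "\n\n"

sentences = list(read_conll(SAMPLE.splitlines()))
for tok in sentences[0]:
    print(tok.id, tok.form, tok.head, tok.deprel)
```

Keeping FEATS as a raw string leaves the choice of how to split it to the caller, since the two releases encode features differently.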
Part of speech tag description (obtained per e-mail from Koldo Gojenola, thanks!):

The first sentence of the CoNLL 2007 training data:

| 1 | espainiako_poliziak | Espainia_Poliziak | IZE | IZE_LIB | PLU-<nowiki>|</nowiki>ENTI_LOC | 4 | ncsubj | _ | _ |
| 2 | hiru | hiru | DET | DET_DZH | NMGP | 3 | detmod | _ | _ |
| 3 | gazte | gazte | IZE | IZE_ARR | ABS<nowiki>|</nowiki>MG | 4 | ncobj | _ | _ |
| 4 | atxilotu | atxilotu | ADI | ADI_SIN | PART<nowiki>|</nowiki>BURU | 8 | lot | _ | _ |
| 5 | ditu | *edun | ADL | ADL | A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>NK_HARK | 4 | auxmod | _ | _ |
| 6 | atarrabian | Atarrabia | IZE | IZE_LIB | PLU-<nowiki>|</nowiki>INE<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC | 4 | ncmod | _ | _ |
| 7 | , | , | PUNC | PUNC_KOMA | _ | 6 | PUNC | _ | _ |
| 8 | eta | eta | LOT | LOT_JNT | - | 0 | ROOT | _ | _ |
| 9 | madrilera | Madril | IZE | IZE_LIB | PLU-<nowiki>|</nowiki>ALA<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC | 10 | ncmod | _ | _ |
| 10 | eraman | eraman | ADI | ADI_SIN | PART<nowiki>|</nowiki>BURU | 8 | lot | _ | _ |
| 11 | ditu | *edun | ADL | ADL | A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>NK_HARK | 10 | auxmod | _ | _ |
| 12 | . | . | PUNC | PUNC_PUNC | _ | 11 | PUNC | _ | _ |
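The HEAD column alone determines the dependency tree. The sketch below (hypothetical helper names, assuming the HEAD values form a well-formed tree with 0 as the artificial root) rebuilds the CoNLL 2007 training sentence from its ID, FORM, HEAD and DEPREL columns. It makes visible that the coordinating conjunction eta is the root and that both verbs, atxilotu and eraman, attach to it as lot.

```python
from collections import defaultdict

# (ID, FORM, HEAD, DEPREL) copied from the CoNLL 2007 training sentence above.
SENTENCE = [
    (1, "espainiako_poliziak", 4, "ncsubj"),
    (2, "hiru", 3, "detmod"),
    (3, "gazte", 4, "ncobj"),
    (4, "atxilotu", 8, "lot"),
    (5, "ditu", 4, "auxmod"),
    (6, "atarrabian", 4, "ncmod"),
    (7, ",", 6, "PUNC"),
    (8, "eta", 0, "ROOT"),
    (9, "madrilera", 10, "ncmod"),
    (10, "eraman", 8, "lot"),
    (11, "ditu", 10, "auxmod"),
    (12, ".", 11, "PUNC"),
]


def dependents(sentence):
    """Map each head ID (0 = artificial root) to its dependents' IDs in word order."""
    children = defaultdict(list)
    for tok_id, _form, head, _deprel in sentence:
        children[head].append(tok_id)
    return children


def render_tree(sentence):
    """Return the tree as indented 'DEPREL FORM' lines, depth-first from the root."""
    children = dependents(sentence)
    info = {tok_id: (form, deprel) for tok_id, form, _head, deprel in sentence}
    lines = []

    def visit(node, depth):
        for child in children.get(node, []):
            form, deprel = info[child]
            lines.append("  " * depth + f"{deprel} {form}")
            visit(child, depth + 1)

    visit(0, 0)
    return lines


print("\n".join(render_tree(SENTENCE)))
```

The recursion does not guard against cycles, which is acceptable only under the well-formed-tree assumption stated above.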

The first sentence of the CoNLL 2007 test data:

| 1 | epaileek | epaile | IZE | IZE_ARR | BIZ+<nowiki>|</nowiki>ERG<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM |
| 2 | diote | esan | ADT | ADT | PNT<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HURA<nowiki>|</nowiki>NK_HAIEK-K |
| 3 | eaeko | EAE | IZE | IZE_LIB | SIG<nowiki>|</nowiki>GEL<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC |
| 4 | parlamentarioek | parlamentario | ADJ | ADJ_ARR | IZAUR-<nowiki>|</nowiki>ERG<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM |
| 5 | eaetik_kanpo | EAE | SIG | SIG- | DEK<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>DEK<nowiki>|</nowiki>ABL_kanpo_ABS<nowiki>|</nowiki>ENTI_LOC<nowiki>|</nowiki>POS |
| 6 | eginiko | egin | ADI | ADI_SIN | PART<nowiki>|</nowiki>GEL |
| 7 | delituak | delitu | IZE | IZE_ARR | BIZ-<nowiki>|</nowiki>ABS<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM |
| 8 | ikertzea | ikertu | ADI | ADI_SIN | ADIZE<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>ABS |
| 9 | eta | eta | LOT | LOT_JNT | - |
| 10 | epaitzea | epaitu | ADI | ADI_SIN | ADIZE<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>ABS |
| 11 | auzitegi_gorenari | auzitegi_gora | ADJ | ADJ_IZO | DEK<nowiki>|</nowiki>GEN<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>DEK<nowiki>|</nowiki>DAT<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC |
| 12 | dagokiola | egon | ADT | ADT | PNT<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HURA<nowiki>|</nowiki>NI_HARI |
| 13 | , | , | PUNC | PUNC_KOMA | _ |
| 14 | baina | baina | LOT | LOT_JNT | AURK |
| 15 | atzerrian | atzerri | IZE | IZE_ARR | INE<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM |
| 16 | izaniko | izan | ADI | ADI_SIN | PART<nowiki>|</nowiki>GEL |
| 17 | kontaktu | kontaktu | IZE | IZE_ARR | _ |
| 18 | horiek | horiek | DET | DET_ERKARR | ABS<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM |
| 19 | ezin_direla | ezin_izan | ADI | ADI_ADK | PNT<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>MWCorrect |
| 20 | delitutzat | delitu | IZE | IZE_ARR | BIZ-<nowiki>|</nowiki>PRO<nowiki>|</nowiki>MG |
| 21 | hartu | hartu | ADI | ADI_SIN | PART |
| 22 | . | . | PUNC | PUNC_PUNC | _ |

The first sentence of the BDT-II training data:

| 1 | Estatu_Batuetako_DEAko | Estatu_Batuak_DEA | IZE | LIB | PLU:+<nowiki>|</nowiki>IZAUR:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>MW:B<nowiki>|</nowiki>ENT:Erakundea | 2 | ncmod | _ | _ |
| 2 | buru | buru | IZE | ARR | _ | 4 | ncsubj | _ | _ |
| 3 | ohiak | ohi | ADJ | ARR | IZAUR:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ |
| 4 | aztertuko | aztertu | ADI | SIN | ADM:PART<nowiki>|</nowiki>ASP:GERO | 0 | ROOT | _ | _ |
| 5 | du | *edun | ADL | ADL | MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 4 | auxmod | _ | _ |
| 6 | RUCen | RUC | IZE | IZB | MTKAT:SIG<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Erakundea | 7 | ncmod | _ | _ |
| 7 | erreforma | erreforma | IZE | ARR | KAS:ABS<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | ncobj | _ | _ |
| 8 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 7 | PUNC | _ | _ |

The first sentence of the BDT-II development data:

| 1 | Irakaskuntzan | irakaskuntza | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ |
| 2 | jardun | jardun | ADI | SIN | ADM:PART<nowiki>|</nowiki>ASP:BURU | 0 | ROOT | _ | _ |
| 3 | zuen | *edun | ADL | ADL | MDN:B1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 2 | auxmod | _ | _ |
| 4 | Miel | Miel | IZE | IZB | PLU:-<nowiki>|</nowiki>ENT:Pertsona | 5 | entios | _ | _ |
| 5 | Anjel_Elustondok | Anjel_Elustondo | IZE | IZB | PLU:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Pertsona | 2 | ncsubj | _ | _ |
| 6 | 1980 | 1980 | IZE | ZKI | _ | 7 | ncmod | _ | _ |
| 7 | urtetik | urte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:ABL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ |
| 8 | 1992ra | 1992 | IZE | ZKI | KAS:ALA<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ |
| 9 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 8 | PUNC | _ | _ |
| 10 | hauetatik | hauek | DET | ERKARR | KAS:ABL<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ |
| 11 | hamar | hamar | DET | DZH | NMG:P | 12 | detmod | _ | _ |
| 12 | urtez | urte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INS<nowiki>|</nowiki>MUG:MG | 16 | lot | _ | _ |
| 13 | Azpeitiko | Azpeitia | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 14 | ncmod | _ | _ |
| 14 | ikastolan | ikastola | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ |
| 15 | irakasle | irakasle | IZE | ARR | KAS:ABS<nowiki>|</nowiki>MUG:MG | 16 | ncpred | _ | _ |
| 16 | eta | eta | LOT | JNT | ERL:EMEN | 8 | aponcmod | _ | _ |
| 17 | beste | beste | DET | DZG | _ | 18 | detmod | _ | _ |
| 18 | biak | bi | IZE | ZKI | KAS:ABS<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 16 | lot | _ | _ |
| 19 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 18 | PUNC | _ | _ |
| 20 | Arabako | Araba | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 21 | ncmod | _ | _ |
| 21 | ikastolen | ikastola | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 22 | ncmod | _ | _ |
| 22 | elkartean | elkarte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ |
| 23 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 22 | PUNC | _ | _ |

The first sentence of the BDT-II test data:

| 1 | Hegoaldean | hegoalde | IZE | ARR | KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ |
| 2 | iduri_zait | iduri_izan | ADI | ADK | ASP:PNT<nowiki>|</nowiki>MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORI:NIRI<nowiki>|</nowiki>MW:B | 0 | ROOT | _ | _ |
| 3 | euskararen | euskara | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | ncmod | _ | _ |
| 4 | mundu | mundu | IZE | ARR | BIZ:- | 7 | ncsubj | _ | _ |
| 5 | hau | hau | DET | ERKARR | KAS:ABS<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | detmod | _ | _ |
| 6 | adi-adi | adi-adi | ADB | ARR | _ | 7 | ncmod | _ | _ |
| 7 | dagola | egon | ADT | ADT | ASP:PNT<nowiki>|</nowiki>ERL:KONPL<nowiki>|</nowiki>MDN:A3<nowiki>|</nowiki>NOR:HURA | 2 | ccomp_obj | _ | _ |
| 8 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 7 | PUNC | _ | _ |
| 9 | Euskaltzaindiak | Euskaltzaindia | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 11 | ncsubj | _ | _ |
| 10 | zer | zer | DET | NOLGAL | NMG:MG<nowiki>|</nowiki>KAS:ABS<nowiki>|</nowiki>MUG:MG | 11 | ncobj | _ | _ |
| 11 | erranen | erran | ADI | SIN | ADM:PART<nowiki>|</nowiki>ASP:GERO | 13 | menos | _ | _ |
| 12 | duen | *edun | ADL | ADL | ERL:ZHG<nowiki>|</nowiki>MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 11 | auxmod | _ | _ |
| 13 | zain | zain | ADB | ARR | _ | 7 | cmod | _ | _ |
| 14 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 13 | PUNC | _ | _ |
| 15 | haren | hura | DET | ERKARR | KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ |
| 16 | arauen | arau | IZE | ARR | KAS:ABS<nowiki>|</nowiki>MUG:MG | 18 | ncmod | _ | _ |
| 17 | berehala | berehala | ADB | ARR | _ | 18 | ncmod | _ | _ |
| 18 | betetzeko | bete | ADI | SIN | ADM:ADIZE<nowiki>|</nowiki>ERL:HELB<nowiki>|</nowiki>KAS:ABS<nowiki>|</nowiki>MUG:MG | 7 | xmod | _ | _ |
| 19 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 18 | PUNC | _ | _ |

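The samples show that the two releases encode the FEATS column differently: CoNLL 2007 lists bare values (e.g. BIZ+, ERG, NUMP, MUGM for epaileek), while BDT-II writes attribute:value pairs (e.g. IZAUR:-, KAS:ERG, NUM:S, MUG:M for ohiak). A minimal sketch that splits either style into pairs, assuming features are separated by a plain pipe in the raw files; parse_feats is a hypothetical helper name, and bare values simply get no attribute name.

```python
from __future__ import annotations


def parse_feats(feats: str) -> list[tuple[str | None, str]]:
    """Split a CoNLL FEATS cell into (attribute, value) pairs.

    BDT-II style attribute:value items keep their attribute name; CoNLL 2007
    style bare values are returned with attribute None. An underscore means
    that the token has no features.
    """
    if feats == "_":
        return []
    pairs = []
    for item in feats.split("|"):
        # partition() splits at the first colon only, so "PLU:+" -> ("PLU", "+").
        attr, sep, value = item.partition(":")
        pairs.append((attr, value) if sep else (None, item))
    return pairs


# FEATS of "ohiak" (BDT-II training) and "epaileek" (CoNLL 2007 test).
print(parse_feats("IZAUR:-|KAS:ERG|NUM:S|MUG:M"))
print(parse_feats("BIZ+|ERG|NUMP|MUGM"))
```

Mapping the bare CoNLL 2007 values onto BDT-II attribute names (ERG to KAS:ERG and so on) would need the tag description mentioned above and is not attempted here.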
==== Parsing ====