user:zeman:treebanks:eu [2011/11/29 10:20] zeman Size.
user:zeman:treebanks:eu [2011/11/29 11:14] zeman Parsing results.
==== Versions ====

  * CoNLL 2007 (BDT-I)
  * BDT-II (obtained by e-mail in 2011)

==== Inside ====

Both versions (CoNLL 2007 and BDT-II) are in the CoNLL 2006/2007 format.

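A minimal sketch of reading this format in Python. It assumes the usual CoNLL-X layout: one token per line, ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and a blank line after each sentence. The names Token and read_conll are hypothetical and not part of any distributed tool. The CoNLL 2007 test sample on this page shows only the first six columns, so missing HEAD and DEPREL values are tolerated.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Token:
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: int    # -1 if the HEAD column is missing or "_"
    deprel: str  # "_" if the DEPREL column is missing


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one sentence (a list of Token objects) per blank-line-separated block."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        head = int(cols[6]) if len(cols) > 6 and cols[6] != "_" else -1
        deprel = cols[7] if len(cols) > 7 else "_"
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                              cols[4], cols[5], head, deprel))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence
```

An open text file (opened with encoding="utf-8") can be passed directly to read_conll, because iterating over a file yields its lines.
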
Part of speech tag description (obtained by e-mail from Koldo Gojenola, thanks!):
  * ASP = aspect
  * ERL = relation (relative clause, completive clause, indirect question...)

The syntactic guidelines (structure and labels) are described in Spanish in this [[http://ixa.si.ehu.es/Ixa/Argitalpenak/Barne_txostenak/1068549887/publikoak/guia.pdf|technical report]]. See Appendix 3 for lists of tags.

Multi-word expressions have been collapsed into one token, with the underscore as the joining character (e.g. Espainia_Poliziak, iduri_zait).

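Because the underscore joins the parts of a multi-word expression, such tokens can be found and split back with plain string operations. This is a minimal sketch. It assumes that the underscore never occurs inside an ordinary single-word FORM, which holds for the samples on this page. split_mwe and count_mwes are hypothetical helpers.

```python
from typing import List


def split_mwe(form: str) -> List[str]:
    """Split a collapsed multi-word token into its parts.

    A lone "_" is the CoNLL placeholder for an empty field, not a
    multi-word expression, so it is returned unchanged.
    """
    if form == "_":
        return [form]
    return form.split("_")


def count_mwes(forms: List[str]) -> int:
    """Count the tokens that are collapsed multi-word expressions."""
    return sum(1 for f in forms if len(split_mwe(f)) > 1)


forms = ["Hegoaldean", "iduri_zait", "euskararen"]  # FORMs from the BDT-II test sample
print([split_mwe(f) for f in forms])  # [['Hegoaldean'], ['iduri', 'zait'], ['euskararen']]
print(count_mwes(forms))  # 1
```
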
==== Sample ====
The first sentence of the CoNLL 2007 training data:

^ ID ^ FORM ^ LEMMA ^ CPOSTAG ^ POSTAG ^ FEATS ^ HEAD ^ DEPREL ^ PHEAD ^ PDEPREL ^
| 1 | espainiako_poliziak | Espainia_Poliziak | IZE | IZE_LIB | PLU-<nowiki>|</nowiki>ENTI_LOC | 4 | ncsubj | _ | _ |
| 2 | hiru | hiru | DET | DET_DZH | NMGP | 3 | detmod | _ | _ |
| 3 | gazte | gazte | IZE | IZE_ARR | ABS<nowiki>|</nowiki>MG | 4 | ncobj | _ | _ |
| 4 | atxilotu | atxilotu | ADI | ADI_SIN | PART<nowiki>|</nowiki>BURU | 8 | lot | _ | _ |
| 5 | ditu | *edun | ADL | ADL | A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>NK_HARK | 4 | auxmod | _ | _ |
| 6 | atarrabian | Atarrabia | IZE | IZE_LIB | PLU-<nowiki>|</nowiki>INE<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC | 4 | ncmod | _ | _ |
| 7 | , | , | PUNC | PUNC_KOMA | _ | 6 | PUNC | _ | _ |
| 8 | eta | eta | LOT | LOT_JNT | - | 0 | ROOT | _ | _ |
| 9 | madrilera | Madril | IZE | IZE_LIB | PLU-<nowiki>|</nowiki>ALA<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC | 10 | ncmod | _ | _ |
| 10 | eraman | eraman | ADI | ADI_SIN | PART<nowiki>|</nowiki>BURU | 8 | lot | _ | _ |
| 11 | ditu | *edun | ADL | ADL | A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>NK_HARK | 10 | auxmod | _ | _ |
| 12 | . | . | PUNC | PUNC_PUNC | _ | 11 | PUNC | _ | _ |

The first sentence of the CoNLL 2007 test data:

^ ID ^ FORM ^ LEMMA ^ CPOSTAG ^ POSTAG ^ FEATS ^
| 1 | epaileek | epaile | IZE | IZE_ARR | BIZ+<nowiki>|</nowiki>ERG<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM |
| 2 | diote | esan | ADT | ADT | PNT<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HURA<nowiki>|</nowiki>NK_HAIEK-K |
| 3 | eaeko | EAE | IZE | IZE_LIB | SIG<nowiki>|</nowiki>GEL<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC |
| 4 | parlamentarioek | parlamentario | ADJ | ADJ_ARR | IZAUR-<nowiki>|</nowiki>ERG<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM |
| 5 | eaetik_kanpo | EAE | SIG | SIG- | DEK<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>DEK<nowiki>|</nowiki>ABL_kanpo_ABS<nowiki>|</nowiki>ENTI_LOC<nowiki>|</nowiki>POS |
| 6 | eginiko | egin | ADI | ADI_SIN | PART<nowiki>|</nowiki>GEL |
| 7 | delituak | delitu | IZE | IZE_ARR | BIZ-<nowiki>|</nowiki>ABS<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM |
| 8 | ikertzea | ikertu | ADI | ADI_SIN | ADIZE<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>ABS |
| 9 | eta | eta | LOT | LOT_JNT | - |
| 10 | epaitzea | epaitu | ADI | ADI_SIN | ADIZE<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>ABS |
| 11 | auzitegi_gorenari | auzitegi_gora | ADJ | ADJ_IZO | DEK<nowiki>|</nowiki>GEN<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>DEK<nowiki>|</nowiki>DAT<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC |
| 12 | dagokiola | egon | ADT | ADT | PNT<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HURA<nowiki>|</nowiki>NI_HARI |
| 13 | , | , | PUNC | PUNC_KOMA | _ |
| 14 | baina | baina | LOT | LOT_JNT | AURK |
| 15 | atzerrian | atzerri | IZE | IZE_ARR | INE<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM |
| 16 | izaniko | izan | ADI | ADI_SIN | PART<nowiki>|</nowiki>GEL |
| 17 | kontaktu | kontaktu | IZE | IZE_ARR | _ |
| 18 | horiek | horiek | DET | DET_ERKARR | ABS<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM |
| 19 | ezin_direla | ezin_izan | ADI | ADI_ADK | PNT<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>MWCorrect |
| 20 | delitutzat | delitu | IZE | IZE_ARR | BIZ-<nowiki>|</nowiki>PRO<nowiki>|</nowiki>MG |
| 21 | hartu | hartu | ADI | ADI_SIN | PART |
| 22 | . | . | PUNC | PUNC_PUNC | _ |

The first sentence of the BDT-II training data:

^ ID ^ FORM ^ LEMMA ^ CPOSTAG ^ POSTAG ^ FEATS ^ HEAD ^ DEPREL ^ PHEAD ^ PDEPREL ^
| 1 | Estatu_Batuetako_DEAko | Estatu_Batuak_DEA | IZE | LIB | PLU:+<nowiki>|</nowiki>IZAUR:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>MW:B<nowiki>|</nowiki>ENT:Erakundea | 2 | ncmod | _ | _ |
| 2 | buru | buru | IZE | ARR | _ | 4 | ncsubj | _ | _ |
| 3 | ohiak | ohi | ADJ | ARR | IZAUR:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ |
| 4 | aztertuko | aztertu | ADI | SIN | ADM:PART<nowiki>|</nowiki>ASP:GERO | 0 | ROOT | _ | _ |
| 5 | du | *edun | ADL | ADL | MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 4 | auxmod | _ | _ |
| 6 | RUCen | RUC | IZE | IZB | MTKAT:SIG<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Erakundea | 7 | ncmod | _ | _ |
| 7 | erreforma | erreforma | IZE | ARR | KAS:ABS<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | ncobj | _ | _ |
| 8 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 7 | PUNC | _ | _ |

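The FEATS column is encoded differently in the two versions. CoNLL 2007 lists bare values (e.g. ABS|MG), while BDT-II uses attribute:value pairs (e.g. KAS:ERG|NUM:S|MUG:M). In the raw files the separator is a plain vertical bar; the nowiki markup in the tables above is only wiki escaping. The sketch below reads both styles. parse_feats is a hypothetical helper, and keeping a list of values per attribute (in case an attribute repeats) is a design choice made only for this illustration.

```python
from typing import Dict, List, Tuple


def parse_feats(feats: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split a CoNLL FEATS string into attribute:value pairs and bare values.

    BDT-II style items ("KAS:ERG") go into the dictionary, keyed by attribute;
    CoNLL 2007 style items ("ERG") go into the list. "_" means no features.
    """
    pairs: Dict[str, List[str]] = {}
    bare: List[str] = []
    if feats == "_":
        return pairs, bare
    for item in feats.split("|"):
        attr, sep, value = item.partition(":")
        if sep:
            # Keep every value, in case an attribute occurs more than once.
            pairs.setdefault(attr, []).append(value)
        else:
            bare.append(item)
    return pairs, bare


print(parse_feats("KAS:ERG|NUM:S|MUG:M"))  # ({'KAS': ['ERG'], 'NUM': ['S'], 'MUG': ['M']}, [])
print(parse_feats("ABS|MG"))               # ({}, ['ABS', 'MG'])
```
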
The first sentence of the BDT-II development data:

^ ID ^ FORM ^ LEMMA ^ CPOSTAG ^ POSTAG ^ FEATS ^ HEAD ^ DEPREL ^ PHEAD ^ PDEPREL ^
| 1 | Irakaskuntzan | irakaskuntza | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ |
| 2 | jardun | jardun | ADI | SIN | ADM:PART<nowiki>|</nowiki>ASP:BURU | 0 | ROOT | _ | _ |
| 3 | zuen | *edun | ADL | ADL | MDN:B1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 2 | auxmod | _ | _ |
| 4 | Miel | Miel | IZE | IZB | PLU:-<nowiki>|</nowiki>ENT:Pertsona | 5 | entios | _ | _ |
| 5 | Anjel_Elustondok | Anjel_Elustondo | IZE | IZB | PLU:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Pertsona | 2 | ncsubj | _ | _ |
| 6 | 1980 | 1980 | IZE | ZKI | _ | 7 | ncmod | _ | _ |
| 7 | urtetik | urte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:ABL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ |
| 8 | 1992ra | 1992 | IZE | ZKI | KAS:ALA<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ |
| 9 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 8 | PUNC | _ | _ |
| 10 | hauetatik | hauek | DET | ERKARR | KAS:ABL<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ |
| 11 | hamar | hamar | DET | DZH | NMG:P | 12 | detmod | _ | _ |
| 12 | urtez | urte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INS<nowiki>|</nowiki>MUG:MG | 16 | lot | _ | _ |
| 13 | Azpeitiko | Azpeitia | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 14 | ncmod | _ | _ |
| 14 | ikastolan | ikastola | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ |
| 15 | irakasle | irakasle | IZE | ARR | KAS:ABS<nowiki>|</nowiki>MUG:MG | 16 | ncpred | _ | _ |
| 16 | eta | eta | LOT | JNT | ERL:EMEN | 8 | aponcmod | _ | _ |
| 17 | beste | beste | DET | DZG | _ | 18 | detmod | _ | _ |
| 18 | biak | bi | IZE | ZKI | KAS:ABS<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 16 | lot | _ | _ |
| 19 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 18 | PUNC | _ | _ |
| 20 | Arabako | Araba | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 21 | ncmod | _ | _ |
| 21 | ikastolen | ikastola | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 22 | ncmod | _ | _ |
| 22 | elkartean | elkarte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ |
| 23 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 22 | PUNC | _ | _ |

The first sentence of the BDT-II test data:

^ ID ^ FORM ^ LEMMA ^ CPOSTAG ^ POSTAG ^ FEATS ^ HEAD ^ DEPREL ^ PHEAD ^ PDEPREL ^
| 1 | Hegoaldean | hegoalde | IZE | ARR | KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ |
| 2 | iduri_zait | iduri_izan | ADI | ADK | ASP:PNT<nowiki>|</nowiki>MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORI:NIRI<nowiki>|</nowiki>MW:B | 0 | ROOT | _ | _ |
| 3 | euskararen | euskara | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | ncmod | _ | _ |
| 4 | mundu | mundu | IZE | ARR | BIZ:- | 7 | ncsubj | _ | _ |
| 5 | hau | hau | DET | ERKARR | KAS:ABS<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | detmod | _ | _ |
| 6 | adi-adi | adi-adi | ADB | ARR | _ | 7 | ncmod | _ | _ |
| 7 | dagola | egon | ADT | ADT | ASP:PNT<nowiki>|</nowiki>ERL:KONPL<nowiki>|</nowiki>MDN:A3<nowiki>|</nowiki>NOR:HURA | 2 | ccomp_obj | _ | _ |
| 8 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 7 | PUNC | _ | _ |
| 9 | Euskaltzaindiak | Euskaltzaindia | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 11 | ncsubj | _ | _ |
| 10 | zer | zer | DET | NOLGAL | NMG:MG<nowiki>|</nowiki>KAS:ABS<nowiki>|</nowiki>MUG:MG | 11 | ncobj | _ | _ |
| 11 | erranen | erran | ADI | SIN | ADM:PART<nowiki>|</nowiki>ASP:GERO | 13 | menos | _ | _ |
| 12 | duen | *edun | ADL | ADL | ERL:ZHG<nowiki>|</nowiki>MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 11 | auxmod | _ | _ |
| 13 | zain | zain | ADB | ARR | _ | 7 | cmod | _ | _ |
| 14 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 13 | PUNC | _ | _ |
| 15 | haren | hura | DET | ERKARR | KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ |
| 16 | arauen | arau | IZE | ARR | KAS:ABS<nowiki>|</nowiki>MUG:MG | 18 | ncmod | _ | _ |
| 17 | berehala | berehala | ADB | ARR | _ | 18 | ncmod | _ | _ |
| 18 | betetzeko | bete | ADI | SIN | ADM:ADIZE<nowiki>|</nowiki>ERL:HELB<nowiki>|</nowiki>KAS:ABS<nowiki>|</nowiki>MUG:MG | 7 | xmod | _ | _ |
| 19 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 18 | PUNC | _ | _ |

==== Parsing ====

BDT is a mildly nonprojective treebank: 1,925 of the 151,604 tokens in the combined BDT-II training and test sets are attached nonprojectively (1.27%).

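A minimal sketch of how such a count can be obtained. The page does not state the exact criterion used, so this assumes the common definition: an arc from head h to dependent d is nonprojective if some token strictly between h and d is not a descendant of h. A token counts as nonprojectively attached when its incoming arc is nonprojective. nonprojective_tokens is a hypothetical helper.

```python
from typing import List


def nonprojective_tokens(heads: List[int]) -> List[int]:
    """Return the 1-based ids of tokens attached by a nonprojective arc.

    heads[i] is the HEAD of token i + 1, as in the CoNLL HEAD column
    (0 = artificial root). An arc h -> d is treated as nonprojective if
    some token strictly between h and d is not a descendant of h.
    """
    n = len(heads)

    def dominated_by(k: int, h: int) -> bool:
        # Walk from k up towards the root; the step limit guards
        # against cycles in malformed input.
        for _ in range(n + 1):
            if k == 0:
                return False
            k = heads[k - 1]
            if k == h:
                return True
        return False

    result = []
    for d, h in enumerate(heads, start=1):
        lo, hi = min(h, d), max(h, d)
        if any(not dominated_by(k, h) for k in range(lo + 1, hi)):
            result.append(d)
    return result


# Toy tree (not from the treebank): token 2 depends on token 4 across the root token 3.
print(nonprojective_tokens([3, 4, 0, 3]))  # [2]
# HEAD column of the BDT-II test sentence in the Sample section: projective.
print(nonprojective_tokens([2, 0, 4, 7, 4, 7, 2, 7, 11, 11, 13, 11, 7, 13, 16, 18, 18, 7, 18]))  # []
# The share reported above.
print(f"{1925 / 151604:.2%}")  # 1.27%
```
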
The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. Compared to CoNLL 2006, the evaluation procedure was changed to include punctuation tokens. These are the best results for Basque (sorted by UAS):

^ Parser (Authors) ^ LAS ^ UAS ^
| Malt (Nilsson et al.) | 76.94 | 82.84 |
| Titov et al. | 75.49 | 81.93 |
| Sagae | 74.64 | 81.19 |
| Carreras | 75.75 | 81.11 |
| Nakagawa | 72.56 | 81.04 |
| Malt (J. Hall et al.) | 74.99 | 80.61 |
| Johansson et al. | 75.08 | 80.43 |

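LAS (labeled attachment score) is the percentage of tokens that receive both the correct head and the correct dependency label. UAS (unlabeled attachment score) requires only the correct head. The sketch below computes both scores, with a switch for whether punctuation tokens are counted (see the note on the evaluation procedure above). attachment_scores is a hypothetical helper, not the official evaluation script. Recognizing punctuation by the CPOSTAG values PUNC and PUNT_MARKA, as they appear in the samples on this page, is an assumption.

```python
from typing import Iterable, List, Tuple


def attachment_scores(
    gold: List[Tuple[int, str]],
    pred: List[Tuple[int, str]],
    cpos: List[str],
    include_punct: bool = True,
    punct_tags: Iterable[str] = ("PUNC", "PUNT_MARKA"),
) -> Tuple[float, float]:
    """Return (LAS, UAS) in percent.

    gold and pred hold one (HEAD, DEPREL) pair per token; cpos holds the
    CPOSTAG of each token and is only used to skip punctuation when
    include_punct is False.
    """
    if not (len(gold) == len(pred) == len(cpos)):
        raise ValueError("gold, pred and cpos must have the same length")
    punct = set(punct_tags)
    total = labeled = unlabeled = 0
    for (g_head, g_rel), (p_head, p_rel), tag in zip(gold, pred, cpos):
        if not include_punct and tag in punct:
            continue
        total += 1
        if g_head == p_head:
            unlabeled += 1
            if g_rel == p_rel:
                labeled += 1
    if total == 0:
        return 0.0, 0.0
    return 100.0 * labeled / total, 100.0 * unlabeled / total


# Gold (HEAD, DEPREL) of the BDT-II training sentence in the Sample section.
gold = [(2, "ncmod"), (4, "ncsubj"), (2, "ncmod"), (0, "ROOT"),
        (4, "auxmod"), (7, "ncmod"), (4, "ncobj"), (7, "PUNC")]
cpos = ["IZE", "IZE", "ADJ", "ADI", "ADL", "IZE", "IZE", "PUNT_MARKA"]
# Invented parser output: token 3 gets a wrong label, token 8 a wrong head.
pred = list(gold)
pred[2] = (2, "ncsubj")
pred[7] = (4, "PUNC")
print(attachment_scores(gold, pred, cpos))  # (75.0, 87.5)
las, uas = attachment_scores(gold, pred, cpos, include_punct=False)
print(round(las, 2), uas)  # 85.71 100.0
```

The example shows why including or excluding punctuation matters: with a single punctuation error, UAS moves from 87.5 to 100.0.
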
The two Malt parser results of 2007 (single Malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]], and details of the parser configuration are given [[http://w3.msi.vxu.se/users/jha/conll07/|here]].

Parsing results on BDT-II have been published in Kepa Bengoetxea, Koldo Gojenola: [[http://aclweb.org/anthology-new/W/W10/W10-1404.pdf|Application of Different Techniques to Dependency Parsing of Basque]]. In: Proceedings of the First Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010), NAACL Workshop, Los Angeles, California, USA, 2010. They report only the labeled attachment score (LAS); their best system achieved LAS = 78.98%.