Both sides previous revision
Previous revision
Next revision
|
Previous revision
Next revision
Both sides next revision
|
user:zeman:treebanks [2011/11/19 13:04] zeman Greek documentation. |
user:zeman:treebanks [2011/11/19 23:23] zeman Greek parsing. |
==== Inside ==== | ==== Inside ==== |
| |
The original morphosyntactic tags have been converted to fit into the three columns (CPOS, POS and FEAT) of the CoNLL format. There //should// be a 1-1 mapping between the [[http://www.bultreebank.org/TechRep/BTB-TR03.pdf|BTB positional tags]] and the CoNLL 2006 annotation. Use [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl?tagset=bg::conll|DZ Interset]] to inspect the CoNLL tagset. | The syntactic annotation style and the tagset for dependency relations (analytical functions) in GDT has been modeled after the [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/a-layer/html/index.html|Prague Dependency Treebank]]. |
| |
The morphological analysis does not include lemmas. The morphosyntactic tags have been assigned (probably) manually. | |
| |
The guidelines for syntactic annotation are documented in the other [[http://www.bultreebank.org/TechRep/BTB-TR05.pdf|technical report]]. The CoNLL distribution contains the BulTreeBankReadMe.html file with a brief description of the syntactic tags (dependency relation labels). | |
| |
==== Sample ==== | ==== Sample ==== |
| |
The first three sentences of the CoNLL 2006 training data: | The first sentence of the CoNLL 2007 training data: |
| |
| 1 | Глава | _ | N | Nc | _ | 0 | ROOT | 0 | ROOT | | | 1 | " | " | PUNCT | PUNCT | _ | 10 | AuxG | _ | _ | |
| 2 | трета | _ | M | Mo | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 1 | mod | 1 | mod | | | 2 | Τα | ο | At | AtDf | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Nm | 3 | Atr | _ | _ | |
| |||||||||| | | 3 | αντισώματα | αντίσωμα | No | NoCm | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Nm | 5 | Sb | _ | _ | |
| 1 | НАРОДНО | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 2 | mod | 2 | mod | | | 4 | IgG | IgG | Rg | RgFwOr | _ | 3 | Atr | _ | _ | |
| 2 | СЪБРАНИЕ | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | | | 5 | είναι | είμαι | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 10 | Obj_Co | _ | _ | |
| |||||||||| | | 6 | σαν | σαν | Ad | Ad | Ba | 5 | Adv | _ | _ | |
| 1 | Народното | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 2 | mod | 2 | mod | | | 7 | μακροπρόθεσμη | μακροπρόθεσμος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 8 | Atr | _ | _ | |
| 2 | събрание | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 3 | subj | 3 | subj | | | 8 | μνήμη | μνήμη | No | NoCm | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 6 | Adv | _ | _ | |
| 3 | осъществява | _ | V | Vpi | trans=t<nowiki>|</nowiki>mood=i<nowiki>|</nowiki>tense=r<nowiki>|</nowiki>pers=3<nowiki>|</nowiki>num=s | 0 | ROOT | 0 | ROOT | | | 9 | , | , | PUNCT | PUNCT | _ | 10 | AuxX | _ | _ | |
| 4 | законодателната | _ | A | Af | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 5 | mod | 5 | mod | | | 10 | ενώ | ενώ | Cj | CjCo | _ | 26 | Coord | _ | _ | |
| 5 | власт | _ | N | Nc | _ | 3 | obj | 3 | obj | | | 11 | το | ο | At | AtDf | Ne<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 12 | Atr | _ | _ | |
| 6 | и | _ | C | Cp | _ | 3 | conj | 3 | conj | | | 12 | IgA | IgA | Rg | RgFwOr | _ | 15 | Sb | _ | _ | |
| 7 | упражнява | _ | V | Vpi | trans=t<nowiki>|</nowiki>mood=i<nowiki>|</nowiki>tense=r<nowiki>|</nowiki>pers=3<nowiki>|</nowiki>num=s | 3 | conjarg | 3 | conjarg | | | 13 | πιστεύεται | πιστεύεται | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 10 | Obj_Co | _ | _ | |
| 8 | парламентарен | _ | A | Am | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 9 | mod | 9 | mod | | | 14 | ότι | ότι | Cj | CjSb | _ | 13 | AuxC | _ | _ | |
| 9 | контрол | _ | N | Nc | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 7 | obj | 7 | obj | | | 15 | είναι | είμαι | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 14 | Sb | _ | _ | |
| 10 | . | _ | Punct | Punct | _ | 3 | punct | 3 | punct | | | 16 | ένας | ένας | At | AtId | Ma<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 18 | Atr | _ | _ | |
| | 17 | συγκεκριμένος | συγκεκριμένος | Aj | Aj | Ba<nowiki>|</nowiki>Ma<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 18 | Atr | _ | _ | |
| | 18 | δείκτης | δείκτης | No | NoCm | Ma<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 15 | Pnom | _ | _ | |
| | 19 | για | για | AsPp | AsPpSp | _ | 18 | AuxP | _ | _ | |
| | 20 | πρόσφατες | πρόσφατος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 21 | Atr_Co | _ | _ | |
| | 21 | ή | ή | Cj | CjCo | _ | 23 | Coord | _ | _ | |
| | 22 | χρόνιες | χρόνιος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 21 | Atr_Co | _ | _ | |
| | 23 | λοιμώξεις | λοίμωξη | No | NoCm | Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 19 | Atr | _ | _ | |
| | 24 | " | " | PUNCT | PUNCT | _ | 10 | AuxG | _ | _ | |
| | 25 | , | , | PUNCT | PUNCT | _ | 10 | AuxX | _ | _ | |
| | 26 | εξηγεί | εξηγώ | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Av<nowiki>|</nowiki>Xx | 0 | Pred | _ | _ | |
| | 27 | η | ο | At | AtDf | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 28 | Atr | _ | _ | |
| | 28 | Δρ | Δρ | Rg | RgFwTr | _ | 26 | Sb | _ | _ | |
| | 29 | Αρκάρι | Αρκάρι | No | NoCm | Ne<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 28 | Atr | _ | _ | |
| | 30 | . | . | PUNCT | PUNCT | _ | 0 | AuxK | _ | _ | |
| |
The first three sentences of the CoNLL 2006 test data: | The first sentence of the CoNLL 2007 test data: |
| |
| 1 | Единственото | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 2 | mod | 2 | mod | | | 1 | Η | ο | At | AtDf | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 2 | Atr | _ | _ | |
| 2 | решение | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | | | 2 | Σίφνος | Σίφνος | No | NoPr | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 3 | Sb | _ | _ | |
| |||||||||| | | 3 | φημίζεται | φημίζομαι | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 0 | Pred | _ | _ | |
| 1 | Ерик | _ | N | Np | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | | | 4 | και | και | Cj | CjCo | _ | 5 | AuxY | _ | _ | |
| 2 | Франк | _ | N | Np | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 1 | mod | 1 | mod | | | 5 | για | για | AsPp | AsPpSp | _ | 3 | AuxP | _ | _ | |
| 3 | Ръсел | _ | H | Hm | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 2 | mod | 2 | mod | | | 6 | τα | ο | At | AtDf | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 8 | Atr | _ | _ | |
| |||||||||| | | 7 | καταγάλανα | καταγάλανος | Aj | Aj | Ba<nowiki>|</nowiki>Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 8 | Atr | _ | _ | |
| 1 | Пълен | _ | A | Am | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 2 | mod | 2 | mod | | | 8 | νερά | νερό | No | NoCm | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 5 | Obj | _ | _ | |
| 2 | мрак | _ | N | Nc | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | | | 9 | των | ο | At | AtDf | Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ge | 11 | Atr | _ | _ | |
| 3 | и | _ | C | Cp | _ | 2 | conj | 2 | conj | | | 10 | πανέμορφων | πανέμορφος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ge | 11 | Atr | _ | _ | |
| 4 | пълна | _ | A | Af | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 5 | mod | 5 | mod | | | 11 | ακτών | ακτή | No | NoCm | Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ge | 8 | Atr | _ | _ | |
| 5 | самота | _ | N | Nc | _ | 2 | conjarg | 2 | conjarg | | | 12 | της | μου | Pn | PnPo | Fe<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Ge<nowiki>|</nowiki>Xx | 11 | Atr | _ | _ | |
| 6 | . | _ | Punct | Punct | _ | 2 | punct | 2 | punct | | | 13 | . | . | PUNCT | PUNCT | _ | 0 | AuxK | _ | _ | |
| |
==== Parsing ==== | ==== Parsing ==== |
| |
Nonprojectivities in BTB are rare. Only 747 of the 196,151 tokens in the CoNLL 2006 version are attached nonprojectively (0.38%). | Nonprojectivities in GDT are not frequent. Only 823 of the 70223 tokens in the CoNLL 2007 version are attached nonprojectively (1.17%). |
| |
The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Bulgarian: | The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Greek: |
| |
^ Parser (Authors) ^ LAS ^ UAS ^ | ^ Parser (Authors) ^ LAS ^ UAS ^ |
| MST (McDonald et al.) | 87.57 | 92.04 | | | Nakagawa | 76.31 | 84.08 | |
| Malt (Nivre et al.) | 87.41 | 91.72 | | | Keith Hall et al. | 74.21 | 82.04 | |
| Nara (Yuchang Cheng) | 86.34 | 91.30 | | | Carreras | 73.56 | 81.37 | |
| | Malt (Nilsson et al.) | 74.65 | 81.22 | |
| | Titov et al. | 73.52 | 81.20 | |
| | Chen | 74.42 | 81.16 | |
| | Duan | 74.29 | 80.77 | |
| | Attardi et al. | 73.92 | 80.75 | |
| | Malt (J. Hall et al.) | 74.21 | 80.66 | |
| |
| The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]]. |
| |