user:zeman:treebanks [2011/11/19 13:04] zeman Greek documentation. |
user:zeman:treebanks [2011/11/19 13:14] zeman Greek inside. |
==== Inside ==== | ==== Inside ==== |
| |
The original morphosyntactic tags have been converted to fit into the three columns (CPOS, POS and FEAT) of the CoNLL format. There //should// be a 1-1 mapping between the [[http://www.bultreebank.org/TechRep/BTB-TR03.pdf|BTB positional tags]] and the CoNLL 2006 annotation. Use [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl?tagset=bg::conll|DZ Interset]] to inspect the CoNLL tagset. | The syntactic annotation style and the tagset for dependency relations (analytical functions) in GDT have been modeled on the [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/a-layer/html/index.html|Prague Dependency Treebank]]. |
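The CPOS, POS and FEAT columns are three of the ten tab-separated columns of a CoNLL-X data file (named CPOSTAG, POSTAG and FEATS in the CoNLL-X format description). The Python sketch below splits one token line into its fields and the FEAT column into its `|`-separated values (the wiki tables on this page show that separator escaped as `<nowiki>|</nowiki>`). `parse_token` and `feats_to_dict` are hypothetical helpers for illustration, not part of either treebank distribution; `feats_to_dict` only makes sense for the Bulgarian `key=value` features, while the Greek FEAT values are bare codes.

```python
# The ten columns of the CoNLL-X format (CoNLL 2006 and 2007 shared tasks).
COLUMNS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
           "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]


def parse_token(line):
    """Split one tab-separated CoNLL-X token line into a dict of fields (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    token = dict(zip(COLUMNS, fields))
    token["ID"] = int(token["ID"])
    token["HEAD"] = int(token["HEAD"])
    # FEATS is "_" when empty, otherwise a list of values separated by "|".
    token["FEATS"] = [] if token["FEATS"] == "_" else token["FEATS"].split("|")
    return token


def feats_to_dict(feats):
    """Turn Bulgarian-style "key=value" features into a dict; bare values are skipped."""
    return dict(f.split("=", 1) for f in feats if "=" in f)


# Second token of the Bulgarian training sample below, with tabs restored.
bg = parse_token("2\tтрета\t_\tM\tMo\tgen=f|num=s|def=i\t1\tmod\t1\tmod")
print(bg["CPOSTAG"], bg["POSTAG"], feats_to_dict(bg["FEATS"]))
```

The same function reads a Greek line such as token 5 of the training sample; its FEAT column then becomes a list of eight positional codes (`Id`, `Pr`, `03`, `Pl`, ...).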
| |
The morphological analysis does not include lemmas. The morphosyntactic tags were probably assigned manually. | |
| |
The guidelines for syntactic annotation are documented in a separate [[http://www.bultreebank.org/TechRep/BTB-TR05.pdf|technical report]]. The CoNLL distribution contains a file, BulTreeBankReadMe.html, with a brief description of the syntactic tags (dependency relation labels). | |
| |
==== Sample ==== | ==== Sample ==== |
| |
The first three sentences of the CoNLL 2006 training data: | The first sentence of the CoNLL 2007 training data: |
| |
| 1 | Глава | _ | N | Nc | _ | 0 | ROOT | 0 | ROOT | | | 1 | " | " | PUNCT | PUNCT | _ | 10 | AuxG | _ | _ | |
| 2 | трета | _ | M | Mo | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 1 | mod | 1 | mod | | | 2 | Τα | ο | At | AtDf | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Nm | 3 | Atr | _ | _ | |
| |||||||||| | | 3 | αντισώματα | αντίσωμα | No | NoCm | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Nm | 5 | Sb | _ | _ | |
| 1 | НАРОДНО | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 2 | mod | 2 | mod | | | 4 | IgG | IgG | Rg | RgFwOr | _ | 3 | Atr | _ | _ | |
| 2 | СЪБРАНИЕ | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | | | 5 | είναι | είμαι | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 10 | Obj_Co | _ | _ | |
| |||||||||| | | 6 | σαν | σαν | Ad | Ad | Ba | 5 | Adv | _ | _ | |
| 1 | Народното | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 2 | mod | 2 | mod | | | 7 | μακροπρόθεσμη | μακροπρόθεσμος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 8 | Atr | _ | _ | |
| 2 | събрание | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 3 | subj | 3 | subj | | | 8 | μνήμη | μνήμη | No | NoCm | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 6 | Adv | _ | _ | |
| 3 | осъществява | _ | V | Vpi | trans=t<nowiki>|</nowiki>mood=i<nowiki>|</nowiki>tense=r<nowiki>|</nowiki>pers=3<nowiki>|</nowiki>num=s | 0 | ROOT | 0 | ROOT | | | 9 | , | , | PUNCT | PUNCT | _ | 10 | AuxX | _ | _ | |
| 4 | законодателната | _ | A | Af | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 5 | mod | 5 | mod | | | 10 | ενώ | ενώ | Cj | CjCo | _ | 26 | Coord | _ | _ | |
| 5 | власт | _ | N | Nc | _ | 3 | obj | 3 | obj | | | 11 | το | ο | At | AtDf | Ne<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 12 | Atr | _ | _ | |
| 6 | и | _ | C | Cp | _ | 3 | conj | 3 | conj | | | 12 | IgA | IgA | Rg | RgFwOr | _ | 15 | Sb | _ | _ | |
| 7 | упражнява | _ | V | Vpi | trans=t<nowiki>|</nowiki>mood=i<nowiki>|</nowiki>tense=r<nowiki>|</nowiki>pers=3<nowiki>|</nowiki>num=s | 3 | conjarg | 3 | conjarg | | | 13 | πιστεύεται | πιστεύεται | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 10 | Obj_Co | _ | _ | |
| 8 | парламентарен | _ | A | Am | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 9 | mod | 9 | mod | | | 14 | ότι | ότι | Cj | CjSb | _ | 13 | AuxC | _ | _ | |
| 9 | контрол | _ | N | Nc | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 7 | obj | 7 | obj | | | 15 | είναι | είμαι | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 14 | Sb | _ | _ | |
| 10 | . | _ | Punct | Punct | _ | 3 | punct | 3 | punct | | | 16 | ένας | ένας | At | AtId | Ma<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 18 | Atr | _ | _ | |
| | 17 | συγκεκριμένος | συγκεκριμένος | Aj | Aj | Ba<nowiki>|</nowiki>Ma<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 18 | Atr | _ | _ | |
| | 18 | δείκτης | δείκτης | No | NoCm | Ma<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 15 | Pnom | _ | _ | |
| | 19 | για | για | AsPp | AsPpSp | _ | 18 | AuxP | _ | _ | |
| | 20 | πρόσφατες | πρόσφατος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 21 | Atr_Co | _ | _ | |
| | 21 | ή | ή | Cj | CjCo | _ | 23 | Coord | _ | _ | |
| | 22 | χρόνιες | χρόνιος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 21 | Atr_Co | _ | _ | |
| | 23 | λοιμώξεις | λοίμωξη | No | NoCm | Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 19 | Atr | _ | _ | |
| | 24 | " | " | PUNCT | PUNCT | _ | 10 | AuxG | _ | _ | |
| | 25 | , | , | PUNCT | PUNCT | _ | 10 | AuxX | _ | _ | |
| | 26 | εξηγεί | εξηγώ | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Av<nowiki>|</nowiki>Xx | 0 | Pred | _ | _ | |
| | 27 | η | ο | At | AtDf | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 28 | Atr | _ | _ | |
| | 28 | Δρ | Δρ | Rg | RgFwTr | _ | 26 | Sb | _ | _ | |
| | 29 | Αρκάρι | Αρκάρι | No | NoCm | Ne<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 28 | Atr | _ | _ | |
| | 30 | . | . | PUNCT | PUNCT | _ | 0 | AuxK | _ | _ | |
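The Greek sample shows the Prague-style treatment of coordination: the conjunction is the head (DEPREL `Coord`), and its conjuncts hang below it with a `_Co` suffix on their analytical function. For example, ενώ (10) governs είναι (5) and πιστεύεται (13), both labelled `Obj_Co`, while the commas attached to it are plain `AuxX`. Reading `_Co` as a "coordination member" flag, analogous to the PDT `is_member` attribute, is an interpretation of the sample rather than something stated on this page. The helper names below are hypothetical.

```python
# (ID, FORM, HEAD, DEPREL) for selected tokens of the Greek training sentence above.
TOKENS = [
    (5, "είναι", 10, "Obj_Co"),
    (9, ",", 10, "AuxX"),
    (10, "ενώ", 26, "Coord"),
    (13, "πιστεύεται", 10, "Obj_Co"),
    (20, "πρόσφατες", 21, "Atr_Co"),
    (21, "ή", 23, "Coord"),
    (22, "χρόνιες", 21, "Atr_Co"),
    (26, "εξηγεί", 0, "Pred"),
]


def split_deprel(deprel):
    """Split an assumed "_Co" member suffix off a PDT-style analytical function."""
    if deprel.endswith("_Co"):
        return deprel[: -len("_Co")], True
    return deprel, False


def coordination_members(tokens):
    """Map each Coord node ID to the IDs of its children carrying the "_Co" suffix."""
    members = {tid: [] for tid, _, _, rel in tokens if rel == "Coord"}
    for tid, _, head, rel in tokens:
        if head in members and split_deprel(rel)[1]:
            members[head].append(tid)
    return members


print(coordination_members(TOKENS))  # {10: [5, 13], 21: [20, 22]}
```

Stripping the suffix gives the function shared by all conjuncts (`Obj` for the two clauses under ενώ, `Atr` for the two adjectives under ή).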
| |
The first three sentences of the CoNLL 2006 test data: | The first sentence of the CoNLL 2007 test data: |
| |
| 1 | Единственото | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 2 | mod | 2 | mod | | | 1 | Η | ο | At | AtDf | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 2 | Atr | _ | _ | |
| 2 | решение | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | | | 2 | Σίφνος | Σίφνος | No | NoPr | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 3 | Sb | _ | _ | |
| |||||||||| | | 3 | φημίζεται | φημίζομαι | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 0 | Pred | _ | _ | |
| 1 | Ерик | _ | N | Np | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | | | 4 | και | και | Cj | CjCo | _ | 5 | AuxY | _ | _ | |
| 2 | Франк | _ | N | Np | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 1 | mod | 1 | mod | | | 5 | για | για | AsPp | AsPpSp | _ | 3 | AuxP | _ | _ | |
| 3 | Ръсел | _ | H | Hm | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 2 | mod | 2 | mod | | | 6 | τα | ο | At | AtDf | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 8 | Atr | _ | _ | |
| |||||||||| | | 7 | καταγάλανα | καταγάλανος | Aj | Aj | Ba<nowiki>|</nowiki>Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 8 | Atr | _ | _ | |
| 1 | Пълен | _ | A | Am | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 2 | mod | 2 | mod | | | 8 | νερά | νερό | No | NoCm | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 5 | Obj | _ | _ | |
| 2 | мрак | _ | N | Nc | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | | | 9 | των | ο | At | AtDf | Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ge | 11 | Atr | _ | _ | |
| 3 | и | _ | C | Cp | _ | 2 | conj | 2 | conj | | | 10 | πανέμορφων | πανέμορφος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ge | 11 | Atr | _ | _ | |
| 4 | пълна | _ | A | Af | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 5 | mod | 5 | mod | | | 11 | ακτών | ακτή | No | NoCm | Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ge | 8 | Atr | _ | _ | |
| 5 | самота | _ | N | Nc | _ | 2 | conjarg | 2 | conjarg | | | 12 | της | μου | Pn | PnPo | Fe<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Ge<nowiki>|</nowiki>Xx | 11 | Atr | _ | _ | |
| 6 | . | _ | Punct | Punct | _ | 2 | punct | 2 | punct | | | 13 | . | . | PUNCT | PUNCT | _ | 0 | AuxK | _ | _ | |
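In both samples the HEAD column encodes the tree: each token points to its governor, and 0 stands for an artificial root. In the Greek test sentence two tokens are attached to 0, the predicate φημίζεται (`Pred`) and the final period (`AuxK`), so a check should not insist on a single root child. The sketch below verifies that the HEAD values are in range and that every head chain reaches 0 without a cycle; `is_tree` and `root_children` are hypothetical names used only for illustration.

```python
def is_tree(heads):
    """Check a HEAD column; heads[i - 1] is the head of token i, 0 is the root."""
    n = len(heads)
    for i, h in enumerate(heads, start=1):
        # Heads must point to the root or to an existing token other than itself.
        if not 0 <= h <= n or h == i:
            return False
    for i in range(1, n + 1):
        seen = set()
        node = i
        # Walk up the head chain; revisiting a node means a cycle.
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def root_children(heads):
    """IDs of tokens attached directly to the artificial root (HEAD = 0)."""
    return [i for i, h in enumerate(heads, start=1) if h == 0]


# HEAD column of the Greek test sentence above (tokens 1 to 13).
GREEK_TEST_HEADS = [2, 3, 0, 5, 3, 8, 8, 5, 11, 11, 8, 11, 0]
print(is_tree(GREEK_TEST_HEADS), root_children(GREEK_TEST_HEADS))  # True [3, 13]
```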
| |
==== Parsing ==== | ==== Parsing ==== |