
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


user:zeman:treebanks [2011/11/19 13:04]
zeman Greek documentation.
user:zeman:treebanks [2011/11/19 23:23]
zeman Greek parsing.
Line 1590:
 ==== Inside ====
  
-The original morphosyntactic tags have been converted to fit into the three columns (CPOS, POS and FEAT) of the CoNLL format. There //should// be a 1-1 mapping between the [[http://www.bultreebank.org/TechRep/BTB-TR03.pdf|BTB positional tags]] and the CoNLL 2006 annotation. Use [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl?tagset=bg::conll|DZ Interset]] to inspect the CoNLL tagset.
-
-The morphological analysis does not include lemmas. The morphosyntactic tags were (probably) assigned manually.
-
-The guidelines for syntactic annotation are documented in the other [[http://www.bultreebank.org/TechRep/BTB-TR05.pdf|technical report]]. The CoNLL distribution contains the BulTreeBankReadMe.html file with a brief description of the syntactic tags (dependency relation labels).
+The syntactic annotation style and the tagset for dependency relations (analytical functions) in GDT have been modeled after the [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/a-layer/html/index.html|Prague Dependency Treebank]].
  
 ==== Sample ====
  
-The first three sentences of the CoNLL 2006 training data:
+The first sentence of the CoNLL 2007 training data:
  
-| 1 | Глава | _ | N | Nc | _ | 0 | ROOT | 0 | ROOT |
-| 2 | трета | _ | M | Mo | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 1 | mod | 1 | mod |
-| ||||||||||
-| 1 | НАРОДНО | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 2 | mod | 2 | mod |
-| 2 | СЪБРАНИЕ | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT |
-| ||||||||||
-| 1 | Народното | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d |  | mod |  | mod |
-| 2 | събрание | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i |  | subj |  | subj |
-| 3 | осъществява | _ | V | Vpi | trans=t<nowiki>|</nowiki>mood=i<nowiki>|</nowiki>tense=r<nowiki>|</nowiki>pers=3<nowiki>|</nowiki>num=s | 0 | ROOT | 0 | ROOT |
-| 4 | законодателната | _ | A | Af | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d |  | mod |  | mod |
-| 5 | власт | _ | N | Nc | _ |  | obj |  | obj |
-| 6 | и | _ | C | Cp | _ |  | conj |  | conj |
-| 7 | упражнява | _ | V | Vpi | trans=t<nowiki>|</nowiki>mood=i<nowiki>|</nowiki>tense=r<nowiki>|</nowiki>pers=3<nowiki>|</nowiki>num=s |  | conjarg |  | conjarg |
-| 8 | парламентарен | _ | A | Am | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i |  | mod |  | mod |
-| 9 | контрол | _ | N | Nc | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i |  | obj |  | obj |
-| 10 | . | _ | Punct | Punct | _ |  | punct |  | punct |
+| 1 | " | " | PUNCT | PUNCT | _ | 10 | AuxG | _ | _ |
+| 2 | Τα | ο | At | AtDf | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Nm |  | Atr | _ | _ |
+| 3 | αντισώματα | αντίσωμα | No | NoCm | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Nm |  | Sb | _ | _ |
+| 4 | IgG | IgG | Rg | RgFwOr | _ |  | Atr | _ | _ |
+| 5 | είναι | είμαι | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 10 | Obj_Co | _ | _ |
+| 6 | σαν | σαν | Ad | Ad | Ba | 5 | Adv | _ | _ |
+| 7 | μακροπρόθεσμη | μακροπρόθεσμος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm |  | Atr | _ | _ |
+| 8 | μνήμη | μνήμη | No | NoCm | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm |  | Adv | _ | _ |
+| 9 | , | , | PUNCT | PUNCT | _ | 10 | AuxX | _ | _ |
+| 10 | ενώ | ενώ | Cj | CjCo | _ | 26 | Coord | _ | _ |
+| 11 | το | ο | At | AtDf | Ne<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 12 | Atr | _ | _ |
+| 12 | IgA | IgA | Rg | RgFwOr | _ | 15 | Sb | _ | _ |
+| 13 | πιστεύεται | πιστεύεται | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 10 | Obj_Co | _ | _ |
+| 14 | ότι | ότι | Cj | CjSb | _ | 13 | AuxC | _ | _ |
+| 15 | είναι | είμαι | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 14 | Sb | _ | _ |
+| 16 | ένας | ένας | At | AtId | Ma<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 18 | Atr | _ | _ |
+| 17 | συγκεκριμένος | συγκεκριμένος | Aj | Aj | Ba<nowiki>|</nowiki>Ma<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 18 | Atr | _ | _ |
+| 18 | δείκτης | δείκτης | No | NoCm | Ma<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 15 | Pnom | _ | _ |
+| 19 | για | για | AsPp | AsPpSp | _ | 18 | AuxP | _ | _ |
+| 20 | πρόσφατες | πρόσφατος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 21 | Atr_Co | _ | _ |
+| 21 | ή | ή | Cj | CjCo | _ | 23 | Coord | _ | _ |
+| 22 | χρόνιες | χρόνιος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 21 | Atr_Co | _ | _ |
+| 23 | λοιμώξεις | λοίμωξη | No | NoCm | Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 19 | Atr | _ | _ |
+| 24 | " | " | PUNCT | PUNCT | _ | 10 | AuxG | _ | _ |
+| 25 | , | , | PUNCT | PUNCT | _ | 10 | AuxX | _ | _ |
+| 26 | εξηγεί | εξηγώ | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Av<nowiki>|</nowiki>Xx | 0 | Pred | _ | _ |
+| 27 | η | ο | At | AtDf | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 28 | Atr | _ | _ |
+| 28 | Δρ | Δρ | Rg | RgFwTr | _ | 26 | Sb | _ | _ |
+| 29 | Αρκάρι | Αρκάρι | No | NoCm | Ne<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 28 | Atr | _ | _ |
+| 30 | . | . | PUNCT | PUNCT | _ |  | AuxK | _ | _ |
  
-The first three sentences of the CoNLL 2006 test data:
+The first sentence of the CoNLL 2007 test data:
  
-| 1 | Единственото | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 2 | mod | 2 | mod |
-| 2 | решение | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT |
-| ||||||||||
-| 1 | Ерик | _ | N | Np | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT |
-| 2 | Франк | _ | N | Np | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i |  | mod |  | mod |
-| 3 | Ръсел | _ | H | Hm | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i |  | mod |  | mod |
-| ||||||||||
-| 1 | Пълен | _ | A | Am | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i |  | mod |  | mod |
-| 2 | мрак | _ | N | Nc | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT |
-| 3 | и | _ | C | Cp | _ |  | conj |  | conj |
-| 4 | пълна | _ | A | Af | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i |  | mod |  | mod |
-| 5 | самота | _ | N | Nc | _ | 2 | conjarg | 2 | conjarg |
-| 6 | . | _ | Punct | Punct | _ |  | punct |  | punct |
+| 1 | Η | ο | At | AtDf | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 2 | Atr | _ | _ |
+| 2 | Σίφνος | Σίφνος | No | NoPr | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm |  | Sb | _ | _ |
+| 3 | φημίζεται | φημίζομαι | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 0 | Pred | _ | _ |
+| 4 | και | και | Cj | CjCo | _ | 5 | AuxY | _ | _ |
+| 5 | για | για | AsPp | AsPpSp | _ | 3 | AuxP | _ | _ |
+| 6 | τα | ο | At | AtDf | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac |  | Atr | _ | _ |
+| 7 | καταγάλανα | καταγάλανος | Aj | Aj | Ba<nowiki>|</nowiki>Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac |  | Atr | _ | _ |
+| 8 | νερά | νερό | No | NoCm | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac |  | Obj | _ | _ |
+| 9 | των | ο | At | AtDf | Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ge | 11 | Atr | _ | _ |
+| 10 | πανέμορφων | πανέμορφος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ge | 11 | Atr | _ | _ |
+| 11 | ακτών | ακτή | No | NoCm | Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ge |  | Atr | _ | _ |
+| 12 | της | μου | Pn | PnPo | Fe<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Ge<nowiki>|</nowiki>Xx | 11 | Atr | _ | _ |
+| 13 | . | . | PUNCT | PUNCT | _ |  | AuxK | _ | _ |
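Both samples follow the ten-column CoNLL-X layout: ID, FORM, LEMMA, CPOS, POS, FEATS, HEAD, DEPREL, PHEAD and PDEPREL, with `_` marking an empty value. In the distributed files the columns are tab-separated and a blank line ends a sentence. The following is a minimal Python sketch of a reader for that layout. `Token` and `read_conll` are hypothetical names, not part of any official CoNLL tool. The sketch simply skips the projective columns (PHEAD, PDEPREL). The sample input in the `__main__` part is modeled on the first Bulgarian test sentence above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Token:
    id: int
    form: str
    lemma: str
    cpos: str
    pos: str
    feats: List[str]  # FEATS split on "|"; empty list when the column is "_"
    head: int         # 0 denotes the artificial root
    deprel: str


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences (lists of Token) from CoNLL-X formatted lines.

    Assumes 10 tab-separated columns per token line; a blank line ends a
    sentence. PHEAD and PDEPREL (columns 9 and 10) are ignored.
    """
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        feats = [] if cols[5] == "_" else cols[5].split("|")
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                              cols[4], feats, int(cols[6]), cols[7]))
    if sentence:  # the last sentence may not be followed by a blank line
        yield sentence


if __name__ == "__main__":
    sample = [
        "1\tЕдинственото\t_\tA\tAn\tgen=n|num=s|def=d\t2\tmod\t2\tmod",
        "2\tрешение\t_\tN\tNc\tgen=n|num=s|def=i\t0\tROOT\t0\tROOT",
        "",
    ]
    for sent in read_conll(sample):
        for tok in sent:
            print(tok.id, tok.form, tok.head, tok.deprel, tok.feats)
```

Returning a generator keeps memory use flat on large files such as the 196,151-token Bulgarian training set.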
  
 ==== Parsing ====
  
-Nonprojectivities in BTB are rare. Only 747 of the 196,151 tokens in the CoNLL 2006 version are attached nonprojectively (0.38%).
+Nonprojectivities in GDT are infrequent. Only 823 of the 70,223 tokens in the CoNLL 2007 version are attached nonprojectively (1.17%).
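The counts above depend on how a nonprojective attachment is defined, and the page does not say which criterion was used. A common one treats the arc from head h to dependent d as projective if every token strictly between h and d is a (transitive) descendant of h. The following is a minimal Python sketch under that assumption. It works on the HEAD column of each sentence and assumes the input is a well-formed tree, since a cycle would make `dominates` loop forever. All function names are hypothetical.

```python
from typing import List


def dominates(heads: List[int], anc: int, node: int) -> bool:
    """True if `anc` is `node` or one of its ancestors.

    heads[i] is the HEAD of token i (1-based); heads[0] is a placeholder for the root.
    """
    while node != 0:
        if node == anc:
            return True
        node = heads[node]
    return anc == 0  # every token is dominated by the artificial root


def nonprojective_tokens(heads: List[int]) -> List[int]:
    """Return the IDs of tokens whose incoming arc is nonprojective."""
    result = []
    for dep in range(1, len(heads)):
        head = heads[dep]
        lo, hi = min(head, dep), max(head, dep)
        # The arc is nonprojective if some token inside its span escapes the head.
        if any(not dominates(heads, head, k) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


def nonprojective_rate(sentences: List[List[int]]) -> float:
    """Share of all tokens (over many sentences) attached nonprojectively."""
    total = sum(len(h) - 1 for h in sentences)
    bad = sum(len(nonprojective_tokens(h)) for h in sentences)
    return bad / total
```

Under this criterion an arc from the artificial root (0) is always projective, because every token descends from the root. The percentages quoted above are plain ratios: `f"{747 / 196151:.2%}"` gives `0.38%` and `f"{823 / 70223:.2%}"` gives `1.17%`.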
  
-The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard in that it excluded punctuation tokens. These are the best results for Bulgarian:
+The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Greek:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| MST (McDonald et al.) | 87.57 | 92.04 |
-| Malt (Nivre et al.) | 87.41 | 91.72 |
-| Nara (Yuchang Cheng) | 86.34 | 91.30 |
+| Nakagawa | 76.31 | 84.08 |
+| Keith Hall et al. | 74.21 | 82.04 |
+| Carreras | 73.56 | 81.37 |
+| Malt (Nilsson et al.) | 74.65 | 81.22 |
+| Titov et al. | 73.52 | 81.20 |
+| Chen | 74.42 | 81.16 |
+| Duan | 74.29 | 80.77 |
+| Attardi et al. | 73.92 | 80.75 |
+| Malt (J. Hall et al.) | 74.21 | 80.66 |
+
+The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]], and details of the parser configuration are available [[http://w3.msi.vxu.se/users/jha/conll07/|here]].
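As noted above, the 2006 evaluation excluded punctuation tokens, whereas the 2007 evaluation included them. LAS counts a token as correct when both its head and its dependency label match the gold standard. UAS requires only the head to match. The following Python sketch computes both scores under either convention. It is not the official CoNLL evaluation script, and the function names are hypothetical. The punctuation test used here is also an assumption made for illustration: a token counts as punctuation if every character of its FORM falls in a Unicode `P*` category.

```python
import unicodedata
from typing import List, Tuple

# One scored token: (form, gold_head, gold_label, pred_head, pred_label)
Scored = Tuple[str, int, str, int, str]


def is_punct(form: str) -> bool:
    """Assumed rule: every character of FORM is Unicode punctuation (category P*)."""
    return bool(form) and all(unicodedata.category(c).startswith("P") for c in form)


def attachment_scores(tokens: List[Scored], exclude_punct: bool) -> Tuple[float, float]:
    """Return (LAS, UAS) as percentages.

    exclude_punct=True mimics the 2006 setup, exclude_punct=False the 2007 one.
    """
    scored = [t for t in tokens if not (exclude_punct and is_punct(t[0]))]
    if not scored:
        raise ValueError("no tokens left to score")
    uas_hits = sum(1 for _, gh, _, ph, _ in scored if gh == ph)
    las_hits = sum(1 for _, gh, gl, ph, pl in scored if gh == ph and gl == pl)
    n = len(scored)
    return 100.0 * las_hits / n, 100.0 * uas_hits / n
```

Because the choice changes the denominator, scores computed with and without punctuation are not directly comparable. This is one reason the 2006 and 2007 tables should not be read side by side.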
  
