Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:ta [2012/03/22 10:37]
zeman
user:zeman:treebanks:ta [2012/03/22 11:01] (current)
zeman Nonprojectivity and parsing.
Line 25: Line 25:
     * //no separate citation//
   * Principal publications
-    * Loganathan Ramasamy, Zdeněk Žabokrtský: Tamil Dependency Parsing: Results using Rule Based and Corpus Based Approaches. In: //Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2011) – Volume Part I//, pages 82-95, Tokyo, Japan, 2011, published by Springer Berlin / Heidelberg, ISBN 978-3-642-19399-6. +    * Loganathan Ramasamy, Zdeněk Žabokrtský: [[http://www.springerlink.com/content/w18v7621070h51g1/|Tamil Dependency Parsing: Results using Rule Based and Corpus Based Approaches]]. In: //Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2011) – Volume Part I//, pages 82-95, Tokyo, Japan, 2011, published by Springer Berlin / Heidelberg, ISBN 978-3-642-19399-6.
     * Loganathan Ramasamy, Zdeněk Žabokrtský: Prague Dependency Style Treebank for Tamil. In: //Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)//, İstanbul, Turkey, 2012
   * Documentation
     * [[http://ufal.mff.cuni.cz/~ramasamy/tamiltb/0.1/morph_annotation.html|Morphological annotation]]
     * [[http://ufal.mff.cuni.cz/~ramasamy/tamiltb/0.1/dependency_annotation.html|Syntactic annotation]]
 +    * Loganathan Ramasamy, Zdeněk Žabokrtský: [[http://ufal.mff.cuni.cz/~ramasamy/papers/2011-TamilTB-TR.pdf|Tamil Dependency Treebank (TamilTB) – 0.1 Annotation Manual]]. Technical Report TR-2011-42, ÚFAL MFF UK, Praha, Czechia, 2011
  
 ==== Domain ====
Line 49: Line 50:
 ==== Sample ====
  
-The first two sentences of the CoNLL 2006 training data: +The first sentence of the CoNLL version of the training data:
  
-| 1 | غِيابُ_giyAbu غِياب_giyAb | N | case=1<nowiki>|</nowiki>def=R ExD | _ | _ | +| 1 | cennai cennai | N | <nowiki>NEN-3SN--</nowiki> | <nowiki>Cas=N|Per=3|Num=S|Gen=N</nowiki>AAdjn <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 2 | فُؤاد_fu&Ad فُؤاد_fu&Ad | _ | Atr | _ | _ | +| 2 | arukE arukE <nowiki>PP-------</nowiki> <nowiki>_</nowiki> 18 AuxP <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 3 | كَنْعان_kanoEAn كَنْعان_kanoEAn | Atr | _ | _ | +| 3 | sri sri <nowiki>NEN-3SN--</nowiki> <nowiki>Cas=N|Per=3|Num=S|Gen=N</nowiki> | 4 | Atr | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| |||||||||| +perumpuTUril perumpuTUr <nowiki>NEL-3SN--</nowiki> <nowiki>Cas=L|Per=3|Num=S|Gen=N</nowiki> 18 | AAdjn | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
-فُؤاد_fu&Ad فُؤاد_fu&Ad | Atr | _ | _ | +kirIn kirIn <nowiki>NEN-3SN--</nowiki> <nowiki>Cas=N|Per=3|Num=S|Gen=N</nowiki> | 6 | Atr | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-كَنْعان_kanoEAn كَنْعان_kanoEAn Sb | _ | _ | +pIltu pIltu <nowiki>NEN-3SN--</nowiki> <nowiki>Cas=N|Per=3|Num=S|Gen=N</nowiki> | 11 | Atr | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-،_, ،_, | _ | | AuxG | _ | _ | +<nowiki>(</nowiki> <nowiki>(</nowiki> <nowiki>Z:-------</nowiki> <nowiki>_</nowiki> | AuxG | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-رائِد_rA}id رائِد_rA}id | _ | | Atr | _ | _ | +wavIna wavInam <nowiki>JJ-------</nowiki> <nowiki>_</nowiki> | Atr | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-القِصَّة_AlqiS~ap قِصَّة_qiS~ap N | N gen=F<nowiki>|</nowiki>num=S<nowiki>|</nowiki>def=D Atr | _ | _ | +<nowiki>)</nowiki> <nowiki>)</nowiki> | <nowiki>Z:-------</nowiki> <nowiki>_</nowiki>AuxG <nowiki>_</nowiki> <nowiki>_</nowiki> 
-القَصِيرَةِ_AlqaSiyrapi قَصِير_qaSiyr A | gen=F<nowiki>|</nowiki>num=S<nowiki>|</nowiki>case=2<nowiki>|</nowiki>def=D | | Atr | _ | _ | +10 vimAna vimAnam | <nowiki>NO--3SN--</nowiki> | <nowiki>Per=3|Num=S|Gen=N</nowiki> | 11 | Atr | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-فِي_fiy فِي_fiy | _ | AuxP | _ | _ | +| 11 | wilaiyaTTukkukk | wilaiyam | N <nowiki>NND-3SN--</nowiki> | <nowiki>Cas=D|Per=3|Num=S|Gen=N</nowiki> | 12 | Atr | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-لُبْنانِ_lubonAni لُبْنان_lubonAn case=2<nowiki>|</nowiki>def=R Atr | _ | _ | +12 Ana Aku <nowiki>Tg-------</nowiki> <nowiki>_</nowiki> 13 Atr <nowiki>_</nowiki> <nowiki>_</nowiki> 
-رَحَلَ_raHala رَحَل-َ_raHal-V | VP pers=3<nowiki>|</nowiki>gen=M<nowiki>|</nowiki>num=S | Pred | _ | _ | +13 wilam wilam <nowiki>NNN-3SN--</nowiki> | <nowiki>Cas=N|Per=3|Num=S|Gen=N</nowiki>18 Sb <nowiki>_</nowiki> <nowiki>_</nowiki> 
-10 مَساءَ_masA'مَساء_masA' | _ | Adv | _ | _ | +14 yArukkum yAr | R | <nowiki>RBD-3SA--</nowiki> <nowiki>Cas=D|Per=3|Num=S|Gen=A</nowiki> | 15 | Atr | <nowiki>_</nowiki> <nowiki>_</nowiki> | 
-11 أَمْسِ_>amosi أَمْسِ_>amosi 10 Atr | _ | _ | +| 15 | pATippu | pATippu | N | <nowiki>NNN-3SN--</nowiki> | <nowiki>Cas=N|Per=3|Num=S|Gen=N</nowiki> 16 Comp | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-12 عَن_Ean عَن_Ean | P | AuxP | _ | _ | +16 illATa il <nowiki>PP-------</nowiki> <nowiki>_</nowiki> 17 AuxP <nowiki>_</nowiki> <nowiki>_</nowiki> 
-13 81_81 81_81 12 Adv | _ | _ | +17 vakaiyil | vakai | N | <nowiki>NNL-3SN--</nowiki> | <nowiki>Cas=L|Per=3|Num=S|Gen=N</nowiki> | 18 | AAdjn | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 
-14 عاماً_EAmAF عام_EAm | N | N | gen=M<nowiki>|</nowiki>num=S<nowiki>|</nowiki>case=4<nowiki>|</nowiki>def=13 Atr | _ | _ | +18 etukkap etu <nowiki>Vu-T---AA</nowiki> | <nowiki>Ten=T|Voi=A|Neg=A</nowiki> | 20 | Obj | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-15 ._._. | | _ | 0 | AuxK | _ | _ |+19 patum patu <nowiki>VR-F3SNPA</nowiki> | <nowiki>Ten=F|Per=3|Num=S|Gen=N|Voi=P|Neg=A</nowiki> 18 AuxV <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +20 enRu en <nowiki>Tt-T----A</nowiki> <nowiki>Ten=T|Neg=A</nowiki> 23 AuxC | <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +21 muTalvar muTalvar | N | <nowiki>NNN-3SH--</nowiki> | <nowiki>Cas=N|Per=3|Num=S|Gen=H</nowiki> | 22 | Atr <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 22 | karuNAwiTi | karuNAwiTi | N | <nowiki>NEN-3SH--</nowiki> | <nowiki>Cas=N|Per=3|Num=S|Gen=H</nowiki> | 23 | Sb | <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +| 23 | uRuTiyaLiTT | uRuTiyaLi | V <nowiki>Vt-T---AA</nowiki> | <nowiki>Ten=T|Voi=A|Neg=A</nowiki> 0 | Pred | <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +24 uLLAr | uL | V | <nowiki>VR-T3SHAA</nowiki> | <nowiki>Ten=T|Per=3|Num=S|Gen=H|Voi=A|Neg=A</nowiki> | 23 | AuxV | <nowiki>_</nowiki> <nowiki>_</nowiki>
 +| 25 | <nowiki>.</nowiki> <nowiki>.</nowiki> <nowiki>Z#-------</nowiki> | <nowiki>_</nowiki> | 0 | AuxK | <nowiki>_</nowiki> <nowiki>_</nowiki> |
  
-The first sentence of the CoNLL 2006 test data: +The first sentence of the CoNLL version of the test data:
  
-| 1 | اِتِّفاقٌ_Ait~ifAqN اِتِّفاق_Ait~ifAq | N | N | case=1<nowiki>|</nowiki>def=ExD _ | _ | +| 1 | pikAr pikAr | N | <nowiki>NEN-3SN--</nowiki> | <nowiki>Cas=N|Per=3|Num=S|Gen=N</nowiki> | 2 | Atr <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 2 | بَيْنَ_bayona | بَيْنَ_bayona | P | P | _ | 1 | AuxP | _ | _ | +| 2 | iliruwTu iliruwTu | <nowiki>PP-------</nowiki> | <nowiki>_</nowiki> | 4 | AuxP | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 3 | لُبْنانِ_lubonAni | لُبْنان_lubonAn | Z | Z | case=2<nowiki>|</nowiki>def=R | 4 | Atr | _ | _ +ErALamAna ErALamAna | <nowiki>JJ-------</nowiki> | <nowiki>_</nowiki>| Atr | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 4 | وَ_wa | وَ_wa | C | C | _ | 2 | Coord | +iLainjarkaL iLainjar | N | <nowiki>NNN-3PA--</nowiki> | <nowiki>Cas=N|Per=3|Num=P|Gen=A</nowiki>| Sb | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 5 | سُورِيَّةٍ_suwriy~apK | سُورِيا_suwriyA | Z | Z | gen=F<nowiki>|</nowiki>num=S<nowiki>|</nowiki>case=2<nowiki>|</nowiki>def=I | 4 | Atr | _ | _ | +vElai vElai | N | <nowiki>NNN-3SN--</nowiki> | <nowiki>Cas=N|Per=3|Num=S|Gen=N</nowiki> | 6 | Obj | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 6 | عَلَى_EalaY | عَلَى_EalaY | P | P | _ | 1 | AuxP | _ | _ | +TEti TEtu <nowiki>Vt-T---AA</nowiki> | <nowiki>Ten=T|Voi=A|Neg=A</nowiki> | 9 | AAdjn | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 7 | رَفْعِ_rafoEi | رَفْع_rafoE | N | N | case=2<nowiki>|</nowiki>def=R | 6 | Atr | _ | _ +| 7 | veLi veLi | <nowiki>JJ-------</nowiki> <nowiki>_</nowiki> | 8 | Atr | <nowiki>_</nowiki> <nowiki>_</nowiki>
-مُسْتَوَى_musotawaY مُسْتَوَى_musotawaY N | _ | 7 | Atr | _ | _ | +| 8 | mAwilangkaLukku mAwilam <nowiki>NND-3PN--</nowiki> | <nowiki>Cas=D|Per=3|Num=P|Gen=N</nowiki> | 9 | AAdjn <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 9 | التَبادُلِ_AltabAduli | تَبادُل_tabAdul | N | N | case=2<nowiki>|</nowiki>def=D 8 | Atr | _ | _ | +kutipeyarwTu kutipeyar <nowiki>Vt-T---AA</nowiki> | <nowiki>Ten=T|Voi=A|Neg=A</nowiki> | 0 | Pred | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 10 | التِجارِيِّ_AltijAriy~i | تِجارِيّ_tijAriy~ | A | A | case=2<nowiki>|</nowiki>def=D | Atr | _ | _ | +10 varukinRanar varu <nowiki>VR-P3PHAA</nowiki> | <nowiki>Ten=P|Per=3|Num=P|Gen=H|Voi=A|Neg=A</nowiki> AuxV | <nowiki>_</nowiki><nowiki>_</nowiki> 
-| 11 | إِلَى_<ilaY | إِلَى_<ilaY P | P | _ | 7 | AuxP | _ | _ | +11 | <nowiki>.</nowiki> <nowiki>.</nowiki> | Z | <nowiki>Z#-------</nowiki> | <nowiki>_</nowiki>AuxK | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 12 | 500_500 | 500_500 | Q | Q | _ | 11 | Atr | _ | _ | +
-| 13 | مِلْيُونِ_miloyuwni | مِلْيُون_miloyuwn | N | N | case=2<nowiki>|</nowiki>def=R | 12 | Atr | _ | _ +
-14 دُولارٍ_duwlArK دُولار_duwlAr | N | N | case=2<nowiki>|</nowiki>def=I | 13 | Atr | _ | _ | +
- +
-The first sentence of the CoNLL 2007 training data: +
- +
-| 1 | تَعْدادُ | تَعْداد_1 | N | N- Case=1<nowiki>|</nowiki>Defin=R | Sb | _ | _ | +
-سُكّانِ ساكِن_1 | N | N| Case=2<nowiki>|</nowiki>Defin=R | 1 | Atr | _ | _ | +
-| 3 | 22 | [DEFAULT] | Q | Q- | _ | 2 | Atr | _ | _ | +
-| 4 | دَوْلَةً | دَوْلَة_1 | N | N- | Gender=F<nowiki>|</nowiki>Number=S<nowiki>|</nowiki>Case=4<nowiki>|</nowiki>Defin=I | 3 | Atr | _ | _ +
-عَرَبِيَّةً عَرَبِيّ_1 A| Gender=F<nowiki>|</nowiki>Number=S<nowiki>|</nowiki>Case=4<nowiki>|</nowiki>Defin=I | 4 | Atr | _ | _ +
-| 6 | سَ | سَ_FUT | F | F- | _ | 7 | AuxM | +
-| 7 | يَرْتَفِعُ | اِرْتَفَع_1 | V | VI | Mood=I<nowiki>|</nowiki>Voice=A<nowiki>|</nowiki>Person=3<nowiki>|</nowiki>Gender=M<nowiki>|</nowiki>Number=S | 0 | Pred | _ | _ +
-| 8 | إِلَى إِلَى_1 P| _ | 7 | AuxP | _ | _ | +
-| 9 | 654 | [DEFAULT] | Q | Q| _ | 8 | Adv | _ | _ | +
-| 10 | مِلْيُونَ | مِلْيُون_1 | N | N- | Case=4<nowiki>|</nowiki>Defin=R | 9 | Atr | _ | _ | +
-11 نَسَمَةٍ نَسَمَة_1 N| Gender=F<nowiki>|</nowiki>Number=S<nowiki>|</nowiki>Case=2<nowiki>|</nowiki>Defin=I | 10 | Atr | _ | _ +
-12 فِي فِي_1 P| _ | 7 | AuxP | _ | _ | +
-| 13 | مُنْتَصَفِ | مُنْتَصَف_1 | N | N- | Case=2<nowiki>|</nowiki>Defin=12 Adv | +
-14 القَرْنِ قَرْن_1 | N | N- | Case=2<nowiki>|</nowiki>Defin=D | 13 | Atr | _ | _ | +
- +
-The first sentence of the CoNLL 2007 test data: +
- +
-مُقاوَمَةُ | مُقاوَمَة_1 | N | N- | Gender=F<nowiki>|</nowiki>Number=S<nowiki>|</nowiki>Case=1<nowiki>|</nowiki>Defin=R 0 | ExD | _ | _ | +
-| 2 | زَواجِ | زَواج_1 | N | N- | Case=2<nowiki>|</nowiki>Defin=R Atr _ | _ | +
-| 3 | الطُلّابِ | طالِب_1 | N | N- | Case=2<nowiki>|</nowiki>Defin=D 2 | Atr | _ | _ | +
-| 4 | العُرْفِيِّ | عُرْفِيّ_1 | A | A- | Case=2<nowiki>|</nowiki>Defin=D | 2 | Atr | _ | _ |+
  
 ==== Parsing ====
  
-Nonprojectivities in PADT are rare. Only 431 of the 116,793 tokens in the CoNLL 2007 version are attached nonprojectively (0.37%). +Nonprojectivities in TamilTB are very rare. Only 15 of the 9,581 tokens are attached nonprojectively (0.16%).
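
The nonprojectivity counts quoted above can be reproduced along the following lines. This is only a minimal sketch, not the script that produced the published numbers: it assumes the common definition that a token is attached nonprojectively when some token lying between it and its head is not dominated by that head, and the function name ''nonprojective_tokens'' is a hypothetical helper. The input is the HEAD column of one sentence (1-based token IDs, 0 for the artificial root).

```python
from typing import List


def nonprojective_tokens(heads: List[int]) -> List[int]:
    """Return the IDs of tokens that are attached nonprojectively.

    heads[i - 1] is the head of token i (1-based); 0 is the artificial root.
    Assumed definition: the arc head -> dependent is nonprojective if any
    token strictly between the two is not dominated by the head.
    """
    n = len(heads)

    def dominated_by(node: int, ancestor: int) -> bool:
        # Walk up from `node` towards the root; the step limit guards
        # against malformed input that contains a cycle.
        steps = 0
        while node != 0 and steps <= n:
            if node == ancestor:
                return True
            node = heads[node - 1]
            steps += 1
        # The artificial root (0) dominates every token.
        return ancestor == 0

    result = []
    for dep in range(1, n + 1):
        head = heads[dep - 1]
        lo, hi = sorted((dep, head))
        if any(not dominated_by(k, head) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


def percentage(part: int, whole: int) -> float:
    """Share of nonprojective tokens, rounded the way the page reports it."""
    return round(100.0 * part / whole, 2)
```

Summing ''len(nonprojective_tokens(heads))'' over all sentences and passing the total together with the token count to ''percentage'' gives a figure comparable to the ones reported here, provided the treebank was counted with the same definition.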
  
-The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Arabic: +Initial parsing results were published by [[http://ufal.mff.cuni.cz/~ramasamy/papers/2011-pres-CICLing.pdf|(Ramasamy and Žabokrtský, 2011)]]. They use a smaller dataset and a different training/test split than the one defined here (2008 tokens for training, 953 tokens for testing).
  
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| MST (McDonald et al.) | 66.91 | 79.34 | +| Malt (Nivre et al.) | 65.69 | 75.03 |
-| Basis (O'Neil) | 66.71 | 78.54 | +| MST (McDonald et al.) | 65.69 | 74.92 |
-| Malt (Nivre et al.) | 66.71 | 77.52 | +
-| Edinburgh (Riedel et al.) | 66.65 | 78.62 | +
- +
-The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Arabic: +
- +
-^ Parser (Authors) ^ LAS ^ UAS ^ +
-| Malt (Nilsson et al.) | 76.52 | 85.81 | +
-| Nakagawa | 75.08 | 86.09 | +
-| Malt (Hall et al.) | 74.75 | 84.21 | +
-| Sagae | 74.71 | 84.04 | +
-| Chen | 74.65 | 83.49 | +
-| Titov et al. | 74.12 | 83.18 | +
- +
-The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]].+
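
The LAS and UAS columns in the tables above are the labeled and unlabeled attachment scores. As a rough illustration of how they are computed, and of why excluding punctuation (as in CoNLL 2006) changes the numbers, here is a minimal sketch. It is not the official evaluation script; ''attachment_scores'' and ''is_not_punctuation'' are hypothetical names, and the punctuation test based on Unicode categories is an assumption that only approximates what the shared-task scorer does.

```python
import unicodedata
from typing import Callable, List, Tuple

# One token: (form, head, dependency label). Hypothetical layout for this sketch.
Token = Tuple[str, int, str]


def is_not_punctuation(form: str) -> bool:
    """True unless every character of the form is Unicode punctuation (assumed rule)."""
    return not all(unicodedata.category(ch).startswith("P") for ch in form)


def attachment_scores(
    gold: List[Token],
    predicted: List[Token],
    include: Callable[[str], bool] = lambda form: True,
) -> Tuple[float, float]:
    """Return (LAS, UAS) in percent over the tokens accepted by `include`."""
    total = labeled = unlabeled = 0
    for (form, gold_head, gold_rel), (_, pred_head, pred_rel) in zip(gold, predicted):
        if not include(form):
            continue
        total += 1
        if gold_head == pred_head:
            unlabeled += 1           # head is correct: counts for UAS
            if gold_rel == pred_rel:
                labeled += 1         # head and label are correct: counts for LAS
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * labeled / total, 100.0 * unlabeled / total
```

Calling ''attachment_scores(gold, predicted)'' scores every token, which corresponds to the 2007-style evaluation; passing ''include=is_not_punctuation'' leaves punctuation out, roughly as in 2006. Scores obtained under the two settings are therefore not directly comparable.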
  
