
Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:te [2012/03/22 17:06]
zeman Inside.
user:zeman:treebanks:te [2012/03/22 18:18] (current)
zeman Parsing.
Line 85: Line 85:
 The first sentence of the ICON 2010 training data (with fine-grained syntactic tags) in the Shakti format: The first sentence of the ICON 2010 training data (with fine-grained syntactic tags) in the Shakti format:
  
-<code xml><document id=""> +<code xml><document id="">
 <head> <head>
-<annotated-resource name="HyDT-Bangla" version="0.5" type="dep-interchunk-only" layers="morph,pos,chunk,dep-interchunk-only" language="ben" date-of-release="20100831">+<annotated-resource name="HyDT-Telugu" version="0.5" type="dep-interchunk-only" layers="morph,pos,chunk,dep-interchunk-only" language="tel" date-of-release="20100831">
     <annotation-standard>     <annotation-standard>
         <morph-standard name="Anncorra-morph" version="1.31" date="20080920" />         <morph-standard name="Anncorra-morph" version="1.31" date="20080920" />
Line 94: Line 94:
         <dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" />         <dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" />
     </annotation-standard>     </annotation-standard>
-</annotated-resource>  +</annotated-resource> 
-</head> +</head>
 <Sentence id="1"> <Sentence id="1">
-1 (( NP <fs af='Age,adv,,,,,,' head="Agei" drel=k7t:VGF name=NP+      ((      NP      <fs af='saMgawi,n,,sg,,d,0,0' head='saMgawi' drel='k1:VGF'
-1.1 mudZira NN <fs af='mudZi,n,,sg,,o,era,era'> +1.1     maro    QF      <fs af='maro,avy,,,,,,'> 
-1.2 Agei NST <fs af='Age,adv,,,,,,' name="Agei"+1.2     saMgawi NN      <fs af='saMgawi,n,,sg,,d,0,0' name='saMgawi'
- ))  +        )) 
-2 (( NP <fs af='cA,n,,sg,,d,0,0' head="cA" drel=k1:VGF name=NP2> +      ((      NP      <fs af='mIru,pn,any,pl,2,,ki,ki' head='mIku' drel='k4:VGFname='NP2'
-2.1 praWama QO <fs af='praWama,num,,,,,,'> +2.1     mIku    PRP     <fs af='mIru,pn,any,pl,2,,ki,kiname='mIku'> 
-2.2 kApa NN <fs af='kApa,unk,,,,,,'+        )) 
-2.3 cA NN <fs af='cA,n,,sg,,d,0,0name="cA"+      ((      VGF     <fs af='weVlusA,avy,,,,,0,0_avy' head='weVlusA' name='VGF'
- ))  +3.1     weVlusA VM      <fs af='weVlusA,avy,,,,,0,0_avy' name='weVlusA'
-3 (( VGF <fs af='As,v,,,5,,A_yA+Ce,A' head="ese" name=VGF> +3.2     ?       SYM     <fs af='?,punc,,,,,,'> 
-3.1 ese VM <fs af='As,v,,,7,,A,A' name="ese"+        ))
-3.2 . SYM <fs af='.,punc,,,,,,'> +
- )) +
 </Sentence></code> </Sentence></code>
  
 And in the CoNLL format: And in the CoNLL format:
  
-| 1 | Agei | Age | NP | NST | lex-Age<nowiki>|</nowiki>cat-adv<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-Agei<nowiki>|</nowiki>name-NP | 3 | k7t | _ | _ | +| 1 | saMgawi | saMgawi | NP | NN | <nowiki>lex-saMgawi|cat-n|gend-|num-sg|pers-|case-d|vib-0|tam-0|head-saMgawi</nowiki> | 3 | k1 | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 2 | cA | cA | NP | NN | lex-cA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-cA<nowiki>|</nowiki>name-NP2 | 3 | k1 | _ | _ | +| 2 | mIku | mIru | NP | PRP | <nowiki>lex-mIru|cat-pn|gend-any|num-pl|pers-2|case-|vib-ki|tam-ki|head-mIku|name-NP2</nowiki> | 3 | k4 | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 3 | ese | As | VGF | VM | lex-As<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-A_yA+Ce<nowiki>|</nowiki>tam-A<nowiki>|</nowiki>head-ese<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ | +| 3 | weVlusA | weVlusA | VGF | VM | <nowiki>lex-weVlusA|cat-avy|gend-|num-|pers-|case-|vib-0|tam-0_avy|head-weVlusA|name-VGF</nowiki> | 0 | main | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
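
The head and relation columns of the CoNLL rows follow from the ''drel'' attribute of each Shakti chunk, which has the shape ''label:name-of-head-chunk''. The following is a minimal Python sketch of that resolution step, not the converter actually used for this page. The function name ''resolve_heads'' and the tuple input are hypothetical, the name ''NP'' for the first chunk is assumed for illustration, and the ''main'' label for the root chunk is taken from the tables shown here.

```python
def resolve_heads(chunks):
    """Map each chunk to (index, head_index, label).

    chunks: list of (name, drel) pairs in sentence order, where drel is a
    string such as 'k1:VGF' or None for the root chunk (hypothetical input
    shape, assumed for this sketch).
    """
    # Chunk names are unique within a sentence, so they can be used as keys.
    index_of = {name: i for i, (name, _) in enumerate(chunks, start=1)}
    rows = []
    for i, (name, drel) in enumerate(chunks, start=1):
        if drel is None:
            # The root chunk gets head 0 and the label 'main', as in the tables.
            rows.append((i, 0, "main"))
        else:
            label, head_name = drel.split(":", 1)
            rows.append((i, index_of[head_name], label))
    return rows


# Chunks of the first training sentence (name of chunk 1 assumed to be 'NP').
training = [("NP", "k1:VGF"), ("NP2", "k4:VGF"), ("VGF", None)]
print(resolve_heads(training))
# [(1, 3, 'k1'), (2, 3, 'k4'), (3, 0, 'main')]
```
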
  
-And after conversion of the WX encoding to the Bengali script in UTF-8:+And after conversion of the WX encoding to the Telugu script in UTF-8:
  
-| 1 | আগেই | আগে | NP | NST | lex-Age<nowiki>|</nowiki>cat-adv<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-Agei<nowiki>|</nowiki>name-NP | 3 | k7t | _ | _ | +| 1 | <nowiki>సంగతి</nowiki> | <nowiki>సంగతి</nowiki> | NP | NN | <nowiki>lex-saMgawi|cat-n|gend-|num-sg|pers-|case-d|vib-0|tam-0|head-saMgawi</nowiki> | 3 | k1 | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 2 | চা | চা | NP | NN | lex-cA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-cA<nowiki>|</nowiki>name-NP2 | 3 | k1 | _ | _ | +| 2 | <nowiki>మీకు</nowiki> | <nowiki>మీరు</nowiki> | NP | PRP | <nowiki>lex-mIru|cat-pn|gend-any|num-pl|pers-2|case-|vib-ki|tam-ki|head-mIku|name-NP2</nowiki> | 3 | k4 | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 3 | এসে | আস্ | VGF | VM | lex-As<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-A_yA+Ce<nowiki>|</nowiki>tam-A<nowiki>|</nowiki>head-ese<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ | +| 3 | <nowiki>తెలుసా</nowiki> | <nowiki>తెలుసా</nowiki> | VGF | VM | <nowiki>lex-weVlusA|cat-avy|gend-|num-|pers-|case-|vib-0|tam-0_avy|head-weVlusA|name-VGF</nowiki> | 0 | main | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
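
The feature column in these rows packs the Shakti ''af'' fields and the remaining attributes into ''key-value'' pairs joined by vertical bars. A minimal Python sketch of unpacking such a string is given here; the helper name ''parse_feats'' is hypothetical. It splits each pair at the first hyphen only, because values may themselves contain hyphens (for example ''lex-aPisa-biyArAraxera'' in the Bengali rows).

```python
def parse_feats(feats):
    """Split a 'key-value|key-value|...' feature string into a dict.

    Empty values (e.g. 'case-') are kept as empty strings.
    """
    parsed = {}
    for item in feats.split("|"):
        # partition() splits at the first hyphen, so later hyphens stay in the value.
        key, _, value = item.partition("-")
        parsed[key] = value
    return parsed


feats = "lex-mIru|cat-pn|gend-any|num-pl|pers-2|case-|vib-ki|tam-ki|head-mIku|name-NP2"
parsed = parse_feats(feats)
print(parsed["lex"], parsed["cat"], parsed["case"] == "")
# mIru pn True
```
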
  
 The first sentence of the ICON 2010 development data (with fine-grained syntactic tags) in the Shakti format: The first sentence of the ICON 2010 development data (with fine-grained syntactic tags) in the Shakti format:
Line 128: Line 126:
 <code xml><document id=""> <code xml><document id="">
 <head> <head>
-<annotated-resource name="HyDT-Bangla" version="0.5" type="dep-interchunk-only" layers="morph,pos,chunk,dep-interchunk-only" language="ben" date-of-release="20100831">+<annotated-resource name="HyDT-Telugu" version="0.5" type="dep-interchunk-only" layers="morph,pos,chunk,dep-interchunk-only" language="tel" date-of-release="20100831">
     <annotation-standard>     <annotation-standard>
         <morph-standard name="Anncorra-morph" version="1.31" date="20080920" />         <morph-standard name="Anncorra-morph" version="1.31" date="20080920" />
Line 138: Line 136:
 </head> </head>
 <Sentence id="1"> <Sentence id="1">
-1 (( NP <fs af='parabarwIkAle,adv,,,,,,' head="parabarwIkAle" drel=k7t:VGF name=NP> +      ((      RBP     <fs af='eVMwa,pn,,sg,,d,0,0' head='eVMwa' drel='adv:NP'
-1.1 parabarwIkAle NN <fs af='parabarwIkAle,adv,,,,,,' name="parabarwIkAle"+1.1     eVMwa   WQ      <fs af='eVMwa,pn,,sg,,d,0,0' name='eVMwa'
- ))  +        )) 
-2 (( NP <fs af='aPisa-biyArAraxera,unk,,,,,,' head="aPisa-biyArAraxera" drel=r6:NP3 name=NP2+      ((      NP      <fs af='bAXEnA,unk,,,,,,' head='bAXEnA' drel='k2s:VGNF' name='NP'
-2.1 aPisa-biyArAraxera NN <fs af='aPisa-biyArAraxera,unk,,,,,,' name="aPisa-biyArAraxera"+2.1     bAXEnA  NN      <fs af='bAXEnA,unk,,,,,,' name='bAXEnA'
- ))  +        )) 
-3 (( NP <fs af='nAma,n,,sg,,d,0,0' head="nAma" drel=k2:VGNN name=NP3+      ((      NP      <fs af='ixi,pn,fn,sg,3,o,ti,ti' head='xIni' drel='k2:VGNF' name='NP2'
-3.1 nAma NN <fs af='nAma,n,,sg,,d,0,0' name="nAma"+3.1     xIni    PRP     <fs af='ixi,pn,fn,sg,3,o,ti,ti' name='xIni'
- ))  +        )) 
-4 (( NP <fs af='GoRaNA,unk,,,,,,' head="GoRaNA" drel=pof:VGNN name=NP4+      ((      RBP     <fs af='eVlA,avy,,,,,0,0_avy' head='eVlA' drel='adv:VGNF' name='RBP2'
-4.1 GoRaNA NN <fs af='GoRaNA,unk,,,,,,' name="GoRaNA"+4.1     eVlA    WQ      <fs af='eVlA,avy,,,,,0,0_avy' name='eVlA'
- ))  +        )) 
-5 (( VGNN <fs af='kar,n,,,any,,,' head="karAra" drel=r6:NP5 name=VGNN+      ((      NP      <fs af='bayata,n,,sg,,d,0,0' head='bayata' drel='pof:VGNF' name='NP3'
-5.1 karAra VM <fs af='kar,n,,,any,,,' name="karAra"+5.1     bayata  NST     <fs af='bayata,n,,sg,,d,0,0' name='bayata'
- ))  +        )) 
-6 (( NP <fs af='samay,unk,,,,,,' head="samay" drel=k7t:VGF name=NP5+      ((      VGNF    <fs af='peVttuko,pn,,sg,,,e_axi,e_axi_0' head='peVttukoVnexi' drel='k1s:VGNN' name='VGNF'
-6.1 samay NN <fs af='samay,unk,,,,,,' name="samay"+6.1     peVttukoVnexi   VM      <fs af='peVttuko,pn,,sg,,,e_axi,e_axi_0' name='peVttukoVnexi'
- ))  +        )) 
-7 (( NP <fs af='animeRake,unk,,,,,,' head="animeRake" drel=k2:VGF name=NP6+      ((      RBP     <fs af='sarigA,avy,,,,,0,0_avy' head='sarigA' drel='adv:VGNN' name='RBP3'
-7.1 animeRake NNP <fs af='animeRake,unk,,,,,,' name="animeRake"+7.1     sarigA  RB      <fs af='sarigA,avy,,,,,0,0_avy' name='sarigA'
- ))  +        )) 
-8 (( VGF <fs af='sariye,unk,,,5,,0_rAKA+ka_ha+la,' head="sariye" name=VGF+      ((      NP      <fs af='viRayaM,n,,sg,,d,0,0' head='viRayaM' drel='k1:VGNN' name='NP4'
-8.1 sariye VM <fs af='sariye,unk,,,,,,' name="sariye"+8.1     viRayaM NN      <fs af='viRayaM,n,,sg,,d,0,0' name='viRayaM'> 
-8.2 . SYM <fs af='.,punc,,,,,,'> +        )) 
- )) +9       ((      VGNN    <fs af='weVliyu,v,any,any,any,,aka_po_adaM,aka_po_adaM' head='weVliyakapovadaM' name='VGNN'> 
 +9.1     weVliyakapovadaM        VM      <fs af='weVliyu,v,any,any,any,,aka_po_adaM,aka_po_adaM' name='weVliyakapovadaM'
 +9.2           SYM     <fs af='.,punc,,,,,,'> 
 +        ))
 </Sentence></code> </Sentence></code>
  
 And in the CoNLL format: And in the CoNLL format:
  
-| 1 | parabarwIkAle | parabarwIkAle | NP | NN | lex-parabarwIkAle<nowiki>|</nowiki>cat-adv<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-parabarwIkAle<nowiki>|</nowiki>name-NP | 8 | k7t | _ | _ | +| 1 | eVMwa | eVMwa | RBP | WQ | <nowiki>lex-eVMwa|cat-pn|gend-|num-sg|pers-|case-d|vib-0|tam-0|head-eVMwa</nowiki> | 2 | adv | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 2 | aPisa-biyArAraxera | aPisa-biyArAraxera | NP | NN | lex-aPisa-biyArAraxera<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-aPisa-biyArAraxera<nowiki>|</nowiki>name-NP2 | 3 | r6 | _ | _ | +| 2 | bAXEnA | bAXEnA | NP | NN | <nowiki>lex-bAXEnA|cat-unk|gend-|num-|pers-|case-|vib-|tam-|head-bAXEnA|name-NP</nowiki> | 6 | k2s | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 3 | nAma | nAma | NP | NN | lex-nAma<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-nAma<nowiki>|</nowiki>name-NP3 | 5 | k2 | _ | _ | +| 3 | xIni | ixi | NP | PRP | <nowiki>lex-ixi|cat-pn|gend-fn|num-sg|pers-3|case-o|vib-ti|tam-ti|head-xIni|name-NP2</nowiki> | 6 | k2 | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 4 | GoRaNA | GoRaNA | NP | NN | lex-GoRaNA<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-GoRaNA<nowiki>|</nowiki>name-NP4 | 5 | pof | _ | _ | +| 4 | eVlA | eVlA | RBP | WQ | <nowiki>lex-eVlA|cat-avy|gend-|num-|pers-|case-|vib-0|tam-0_avy|head-eVlA|name-RBP2</nowiki> | 6 | adv | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 5 | karAra | kar | VGNN | VM | lex-kar<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-any<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-karAra<nowiki>|</nowiki>name-VGNN | 6 | r6 | _ | _ | +| 5 | bayata | bayata | NP | NST | <nowiki>lex-bayata|cat-n|gend-|num-sg|pers-|case-d|vib-0|tam-0|head-bayata|name-NP3</nowiki> | 6 | pof | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 6 | samay | samay | NP | NN | lex-samay<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-samay<nowiki>|</nowiki>name-NP5 | 8 | k7t | _ | _ | +| 6 | peVttukoVnexi | peVttuko | VGNF | VM | <nowiki>lex-peVttuko|cat-pn|gend-|num-sg|pers-|case-|vib-e_axi|tam-e_axi_0|head-peVttukoVnexi|name-VGNF</nowiki> | 9 | k1s | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 7 | animeRake | animeRake | NP | NNP | lex-animeRake<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-animeRake<nowiki>|</nowiki>name-NP6 | 8 | k2 | _ | _ | +| 7 | sarigA | sarigA | RBP | RB | <nowiki>lex-sarigA|cat-avy|gend-|num-|pers-|case-|vib-0|tam-0_avy|head-sarigA|name-RBP3</nowiki> | 9 | adv | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 8 | sariye | sariye | VGF | VM | lex-sariye<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-0_rAKA+ka_ha+la<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-sariye<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ | +| 8 | viRayaM | viRayaM | NP | NN | <nowiki>lex-viRayaM|cat-n|gend-|num-sg|pers-|case-d|vib-0|tam-0|head-viRayaM|name-NP4</nowiki> | 9 | k1 | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 9 | weVliyakapovadaM | weVliyu | VGNN | VM | <nowiki>lex-weVliyu|cat-v|gend-any|num-any|pers-any|case-|vib-aka_po_adaM|tam-aka_po_adaM|head-weVliyakapovadaM|name-VGNN</nowiki> | 0 | main | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
  
-And after conversion of the WX encoding to the Bengali script in UTF-8:+And after conversion of the WX encoding to the Telugu script in UTF-8:
  
-| 1 | পরবর্তীকালে | পরবর্তীকালে | NP | NN | lex-parabarwIkAle<nowiki>|</nowiki>cat-adv<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-parabarwIkAle<nowiki>|</nowiki>name-NP | 8 | k7t | _ | _ | +| 1 | <nowiki>ఎంత</nowiki> | <nowiki>ఎంత</nowiki> | RBP | WQ | <nowiki>lex-eVMwa|cat-pn|gend-|num-sg|pers-|case-d|vib-0|tam-0|head-eVMwa</nowiki> | 2 | adv | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 2 | অফিস-বিযারারদের | অফিস-বিযারারদের | NP | NN | lex-aPisa-biyArAraxera<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-aPisa-biyArAraxera<nowiki>|</nowiki>name-NP2 | 3 | r6 | _ | _ | +| 2 | <nowiki>బాధైనా</nowiki> | <nowiki>బాధైనా</nowiki> | NP | NN | <nowiki>lex-bAXEnA|cat-unk|gend-|num-|pers-|case-|vib-|tam-|head-bAXEnA|name-NP</nowiki> | 6 | k2s | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 3 | নাম | নাম | NP | NN | lex-nAma<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-nAma<nowiki>|</nowiki>name-NP3 | 5 | k2 | _ | _ | +| 3 | <nowiki>దీని</nowiki> | <nowiki>ఇది</nowiki> | NP | PRP | <nowiki>lex-ixi|cat-pn|gend-fn|num-sg|pers-3|case-o|vib-ti|tam-ti|head-xIni|name-NP2</nowiki> | 6 | k2 | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 4 | ঘোষণা | ঘোষণা | NP | NN | lex-GoRaNA<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-GoRaNA<nowiki>|</nowiki>name-NP4 | 5 | pof | _ | _ | +| 4 | <nowiki>ఎలా</nowiki> | <nowiki>ఎలా</nowiki> | RBP | WQ | <nowiki>lex-eVlA|cat-avy|gend-|num-|pers-|case-|vib-0|tam-0_avy|head-eVlA|name-RBP2</nowiki> | 6 | adv | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 5 | করার | কর্ | VGNN | VM | lex-kar<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-any<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-karAra<nowiki>|</nowiki>name-VGNN | 6 | r6 | _ | _ | +| 5 | బయట | బయట | NP | NST | <nowiki>lex-bayata|cat-n|gend-|num-sg|pers-|case-d|vib-0|tam-0|head-bayata|name-NP3</nowiki> | 6 | pof | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 6 | সময্ | সময্ | NP | NN | lex-samay<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-samay<nowiki>|</nowiki>name-NP5 | 8 | k7t | _ | _ | +| 6 | <nowiki>పెట్టుకొనేది</nowiki> | <nowiki>పెట్టుకొ</nowiki> | VGNF | VM | <nowiki>lex-peVttuko|cat-pn|gend-|num-sg|pers-|case-|vib-e_axi|tam-e_axi_0|head-peVttukoVnexi|name-VGNF</nowiki> | 9 | k1s | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 7 | অনিমেষকে | অনিমেষকে | NP | NNP | lex-animeRake<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-animeRake<nowiki>|</nowiki>name-NP6 | 8 | k2 | _ | _ | +| 7 | <nowiki>సరిగా</nowiki> | <nowiki>సరిగా</nowiki> | RBP | RB | <nowiki>lex-sarigA|cat-avy|gend-|num-|pers-|case-|vib-0|tam-0_avy|head-sarigA|name-RBP3</nowiki> | 9 | adv | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 8 | সরিযে | সরিযে | VGF | VM | lex-sariye<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-0_rAKA+ka_ha+la<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-sariye<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ | +| 8 | <nowiki>విషయం</nowiki> | <nowiki>విషయం</nowiki> | NP | NN | <nowiki>lex-viRayaM|cat-n|gend-|num-sg|pers-|case-d|vib-0|tam-0|head-viRayaM|name-NP4</nowiki> | 9 | k1 | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 9 | <nowiki>తెలియకపొవడం</nowiki> | <nowiki>తెలియు</nowiki> | VGNN | VM | <nowiki>lex-weVliyu|cat-v|gend-any|num-any|pers-any|case-|vib-aka_po_adaM|tam-aka_po_adaM|head-weVliyakapovadaM|name-VGNN</nowiki> | 0 | main | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
  
 The first sentence of the ICON 2010 test data (with fine-grained syntactic tags) in the Shakti format: The first sentence of the ICON 2010 test data (with fine-grained syntactic tags) in the Shakti format:
Line 191: Line 194:
 <code xml><document id=""> <code xml><document id="">
 <head> <head>
-<annotated-resource name="HyDT-Bangla" version="0.5" type="dep-interchunk-only" layers="morph,pos,chunk,dep-interchunk-only" language="ben" date-of-release="20101013">+<annotated-resource name="HyDT-Telugu" version="0.5" type="dep-interchunk-only" layers="morph,pos,chunk,dep-interchunk-only" language="tel" date-of-release="20101013">
     <annotation-standard>     <annotation-standard>
         <morph-standard name="Anncorra-morph" version="1.31" date="20080920" />         <morph-standard name="Anncorra-morph" version="1.31" date="20080920" />
- <pos-standard name="Anncorra-pos" version="" date="20061215" /> +        <pos-standard name="Anncorra-pos" version="" date="20061215" /> 
- <chunk-standard name="Anncorra-chunk" version="" date="20061215" /> +        <chunk-standard name="Anncorra-chunk" version="" date="20061215" /> 
- <dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" />+        <dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" />
     </annotation-standard>     </annotation-standard>
-<annotated-resource>+</annotated-resource>
 </head> </head>
-<Sentence id="1"> +<Sentence id="29"> 
-1 (( NP <fs af='mAXabIlawA,n,,sg,,d,0,0' head="mAXabIlawA" drel=k1:VGF name=NP> +      ((      NP      <fs af='iMkA,avy,,,,,0,0_avy' head="iMkA" drel=vmod:NULL_VGF name=NP poslcat="NM"
-1.1 mAXabIlawA NNP <fs af='mAXabIlawA,n,,sg,,d,0,0' name="mAXabIlawA"> +1.1     iMkA    PRP     <fs af='iMkA,avy,,,,,0,0_avyposlcat="NM" name="iMkA"> 
- ))  +        )) 
-2 (( NP <fs af='waKana,pn,,,,d,0,0' head="waKana" drel=k7t:VGF name=NP2+      ((      RBP     <fs af='warawarAlugA,avy,,,,,0,0_avy' head="warawarAlugA" drel=adv:VGNF name=RBP poslcat="NM"
-2.1 waKana PRP <fs af='waKana,pn,,,,d,0,0' name="waKana"> +2.1     warawarAlugA    RB      <fs af='warawarAlugA,avy,,,,,0,0_avyposlcat="NM" name="warawarAlugA"> 
- ))  +        )) 
-3 (( NP <fs af='hAwa,n,,sg,,o,era,era' head="hAwera" drel=r6:NP4 name=NP3+      ((      VGNF    <fs af='nAtuko,v,any,any,any,,i_po_ina,i_po_ina' head="nAtukupoyina" drel=nmod:NP2 name=VGNF
-3.1 hAwera NN <fs af='hAwa,n,,sg,,o,era,era' name="hAwera"> +3.1     nAtukupoyina    VM      <fs af='nAtuko,v,any,any,any,,i_po_ina,i_po_ina' name="nAtukupoyina"> 
- ))  +        )) 
-4 (( NP <fs af='GadZi,unk,,,,,,' head="GadZi" drel=k2:VGNF name=NP4+      ((      NP      <fs af='aBiprAyaM,n,,pl,,d,0,0' head="aBiprAyAlu" drel=k1:NULL_VGF name=NP2
-4.1 GadZi NN <fs af='GadZi,unk,,,,,,' name="GadZi"> +4.1     aBiprAyAlu      NN      <fs af='aBiprAyaM,n,,pl,,d,0,0' name="aBiprAyAlu"> 
- ))  +        )) 
-5 (( VGNF <fs af='Kul,v,,,5,,ne,nehead="Kule" drel=vmod:VGF name=VGNF+      ((      NULL_VGF        <fs name='NULL_VGF'> 
-5.1 Kule VM <fs af='Kul,v,,,5,,ne,nename="Kule"> +5.1     NULL    VM      <fs af='NULL,unk,,,,,,' poslcat="NM"> 
- ))  +5.2           SYM     <fs af='.,punc,,,,,,' poslcat="NM"> 
-6 (( NP <fs af='tebila,n,,sg,,d,me,me' head="tebile" drel=k7p:VGF name=NP5> +        ))
-6.1 tebile NN <fs af='tebila,n,,sg,,d,me,me' name="tebile"> +
- ))  +
-7 (( VGF <fs af='rAK,v,,,5,,Cila,Cila' head="rAKaCila" name=VGF> +
-7.1 rAKaCila VM <fs af='rAK,v,,,5,,Cila,Cilaname="rAKaCila"> +
-7.2 । SYM  +
- )) +
 </Sentence></code> </Sentence></code>
  
 And in the CoNLL format: And in the CoNLL format:
  
-| 1 | mAXabIlawA | mAXabIlawA | NP | NNP | lex-mAXabIlawA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-mAXabIlawA<nowiki>|</nowiki>name-NP | 7 | k1 | _ | _ | +| 1 | iMkA | iMkA | NP | PRP | <nowiki>lex-iMkA|cat-avy|gend-|num-|pers-|case-|vib-0|tam-0_avy|head-iMkA|name-NP|poslcat-NM</nowiki> | 5 | vmod | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 2 | waKana | waKana | NP | PRP | lex-waKana<nowiki>|</nowiki>cat-pn<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-waKana<nowiki>|</nowiki>name-NP2 | 7 | k7t | _ | _ | +| 2 | warawarAlugA | warawarAlugA | RBP | RB | <nowiki>lex-warawarAlugA|cat-avy|gend-|num-|pers-|case-|vib-0|tam-0_avy|head-warawarAlugA|name-RBP|poslcat-NM</nowiki> | 3 | adv | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 3 | hAwera | hAwa | NP | NN | lex-hAwa<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-era<nowiki>|</nowiki>tam-era<nowiki>|</nowiki>head-hAwera<nowiki>|</nowiki>name-NP3 | 4 | r6 | _ | _ | +| 3 | nAtukupoyina | nAtuko | VGNF | VM | <nowiki>lex-nAtuko|cat-v|gend-any|num-any|pers-any|case-|vib-i_po_ina|tam-i_po_ina|head-nAtukupoyina|name-VGNF</nowiki> | 4 | nmod | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 4 | GadZi | GadZi | NP | NN | lex-GadZi<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-GadZi<nowiki>|</nowiki>name-NP4 | 5 | k2 | _ | _ | +| 4 | aBiprAyAlu | aBiprAyaM | NP | NN | <nowiki>lex-aBiprAyaM|cat-n|gend-|num-pl|pers-|case-d|vib-0|tam-0|head-aBiprAyAlu|name-NP2</nowiki> | 5 | k1 | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 5 | Kule | Kul | VGNF | VM | lex-Kul<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-ne<nowiki>|</nowiki>tam-ne<nowiki>|</nowiki>head-Kule<nowiki>|</nowiki>name-VGNF | 7 | vmod | _ | _ | +| 5 | NULL | NULL | <nowiki>NULL_VGF</nowiki> | VM | <nowiki>name-NULL_VGF</nowiki> | 0 | main | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 6 | tebile | tebila | NP | NN | lex-tebila<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-me<nowiki>|</nowiki>tam-me<nowiki>|</nowiki>head-tebile<nowiki>|</nowiki>name-NP5 | 7 | k7p | _ | _ | +
-| 7 | rAKaCila | rAK | VGF | VM | lex-rAK<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-Cila<nowiki>|</nowiki>tam-Cila<nowiki>|</nowiki>head-rAKaCila<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ |+
  
-And after conversion of the WX encoding to the Bengali script in UTF-8:+And after conversion of the WX encoding to the Telugu script in UTF-8:
  
-| 1 | মাধবীলতা | মাধবীলতা | NP | NNP | lex-mAXabIlawA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-mAXabIlawA<nowiki>|</nowiki>name-NP | 7 | k1 | _ | _ | +| 1 | <nowiki>ఇంకా</nowiki> | <nowiki>ఇంకా</nowiki> | NP | PRP | <nowiki>lex-iMkA|cat-avy|gend-|num-|pers-|case-|vib-0|tam-0_avy|head-iMkA|name-NP|poslcat-NM</nowiki> | 5 | vmod | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 2 | তখন | তখন | NP | PRP | lex-waKana<nowiki>|</nowiki>cat-pn<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-waKana<nowiki>|</nowiki>name-NP2 | 7 | k7t | _ | _ | +| 2 | <nowiki>తరతరాలుగా</nowiki> | <nowiki>తరతరాలుగా</nowiki> | RBP | RB | <nowiki>lex-warawarAlugA|cat-avy|gend-|num-|pers-|case-|vib-0|tam-0_avy|head-warawarAlugA|name-RBP|poslcat-NM</nowiki> | 3 | adv | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 3 | হাতের | হাত | NP | NN | lex-hAwa<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-era<nowiki>|</nowiki>tam-era<nowiki>|</nowiki>head-hAwera<nowiki>|</nowiki>name-NP3 | 4 | r6 | _ | _ | +| 3 | <nowiki>నాటుకుపొయిన</nowiki> | <nowiki>నాటుకొ</nowiki> | VGNF | VM | <nowiki>lex-nAtuko|cat-v|gend-any|num-any|pers-any|case-|vib-i_po_ina|tam-i_po_ina|head-nAtukupoyina|name-VGNF</nowiki> | 4 | nmod | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 4 | ঘড়ি | ঘড়ি | NP | NN | lex-GadZi<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-GadZi<nowiki>|</nowiki>name-NP4 | 5 | k2 | _ | _ | +| 4 | <nowiki>అభిప్రాయాలు</nowiki> | <nowiki>అభిప్రాయం</nowiki> | NP | NN | <nowiki>lex-aBiprAyaM|cat-n|gend-|num-pl|pers-|case-d|vib-0|tam-0|head-aBiprAyAlu|name-NP2</nowiki> | 5 | k1 | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 5 | খুলে | খুল্ | VGNF | VM | lex-Kul<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-ne<nowiki>|</nowiki>tam-ne<nowiki>|</nowiki>head-Kule<nowiki>|</nowiki>name-VGNF | 7 | vmod | _ | _ | +| 5 | NULL | NULL | <nowiki>NULL_VGF</nowiki> | VM | <nowiki>name-NULL_VGF</nowiki> | 0 | main | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 6 | টেবিলে | টেবিল | NP | NN | lex-tebila<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-me<nowiki>|</nowiki>tam-me<nowiki>|</nowiki>head-tebile<nowiki>|</nowiki>name-NP5 | 7 | k7p | _ | _ | +
-| 7 | রাখছিল | রাখ্ | VGF | VM | lex-rAK<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-Cila<nowiki>|</nowiki>tam-Cila<nowiki>|</nowiki>head-rAKaCila<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ |+
  
 ==== Parsing ==== ==== Parsing ====
  
-Nonprojectivities in HyDT-Bangla are not frequent. Only 78 of the 7252 chunks in the training+development ICON 2010 version are attached nonprojectively (1.08%).+Nonprojectivities in HyDT-Telugu are very rare. Only 13 of the 5722 chunks in the training+development ICON 2010 version are attached nonprojectively (0.23%).
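
As an illustration of what is being counted here, the sketch below flags nodes whose attachment is nonprojective, using the common definition that an arc is projective when every node lying between the head and the dependent is dominated by the head. This is a minimal Python sketch under that assumption and not the script used to obtain the figures above; the function name ''nonprojective_nodes'' is hypothetical. The head list in the example is the first development sentence shown earlier on this page.

```python
def nonprojective_nodes(heads):
    """Return the 1-based nodes whose incoming arc is nonprojective.

    heads[i - 1] is the head of node i, with 0 standing for the artificial
    root. The input is assumed to be a well-formed tree (no cycles).
    """
    def dominated_by(node, ancestor):
        # Walk up from node; succeed if ancestor is met on the way to the root.
        while node != 0:
            node = heads[node - 1]
            if node == ancestor:
                return True
        return False

    flagged = []
    for dep, head in enumerate(heads, start=1):
        lo, hi = sorted((dep, head))
        # The arc is nonprojective if some node strictly inside its span
        # is not a descendant of the head.
        if any(not dominated_by(k, head) for k in range(lo + 1, hi)):
            flagged.append(dep)
    return flagged


# First development sentence (Telugu): heads of chunks 1..9.
dev_heads = [2, 6, 6, 6, 6, 9, 9, 9, 0]
print(nonprojective_nodes(dev_heads))      # []

# Toy tree with one crossing arc (node 2 attached to node 4 across the root 3).
print(nonprojective_nodes([3, 4, 0, 3]))   # [2]

# The percentages quoted above follow directly from the counts.
print(round(100 * 13 / 5722, 2), round(100 * 78 / 7252, 2))  # 0.23 1.08
```
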
  
-The results of the ICON 2009 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2009/CR/intro-husain.pdf|(Husain, 2009)]]. There were two evaluation rounds, the first with the coarse-grained syntactic tags, the second with the fine-grained syntactic tags. To reward language independence, only systems that parsed all three languages were officially ranked. The following table presents the Bengali/coarse-grained results of the four officially ranked systems, and the best Bengali-only* system.+The results of the ICON 2009 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2009/CR/intro-husain.pdf|(Husain, 2009)]]. There were two evaluation rounds, the first with the coarse-grained syntactic tags, the second with the fine-grained syntactic tags. To reward language independence, only systems that parsed all three languages were officially ranked. The following table presents the Telugu/coarse-grained results of the four officially ranked systems.
  
 ^ Parser (Authors) ^ LAS ^ UAS ^ ^ Parser (Authors) ^ LAS ^ UAS ^
-| Kolkata (De et al.) | 84.29 | 90.32 | +| Malt (Nivre) | 62.44 | 86.28 |
-| Hyderabad (Ambati et al.) | 78.25 | 90.22 | +| Mannem | 65.01 | 85.76 |
-| Malt (Nivre) | 76.07 | 88.97 | +| Hyderabad (Ambati et al.) | 65.01 | 85.25 |
-| Malt+MST (Zeman) | 71.49 | 86.89 | +| Malt+MST (Zeman) | 56.43 | 81.30 |
-| Mannem | 70.34 | 83.56 |+
  
-The results of the ICON 2010 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/toolscontest10-workshoppaper-final.pdf|(Husain et al., 2010)]], page 6. These are the best results for Bengali with fine-grained syntactic tags:+The results of the ICON 2010 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/toolscontest10-workshoppaper-final.pdf|(Husain et al., 2010)]], page 6. These are the best results for Telugu with fine-grained syntactic tags:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^ ^ Parser (Authors) ^ LAS ^ UAS ^
-| Attardi et al. | 70.66 | 87.41 | +| Kosaraju et al. | 70.12 | 91.82 |
-| Kosaraju et al. | 70.55 | 86.16 | +| Attardi et al. | 65.61 | 90.48 |
-| Kolachina et al. | 70.14 | 87.10 |+| Kolachina et al. | 68.11 | 90.15 |
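
For readers unfamiliar with the two metrics in these tables: UAS (unlabeled attachment score) is the percentage of nodes that receive the correct head, and LAS (labeled attachment score) additionally requires the correct dependency label. The following is a minimal Python sketch of that computation; the function name ''attachment_scores'' is hypothetical, and details of the contest's own evaluation (for instance the treatment of punctuation) are not reproduced here. The gold analysis in the example is the first training sentence, with one label deliberately changed in the prediction.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) in percent.

    gold, predicted: equally long, non-empty lists of (head, label) pairs,
    one pair per node.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    # UAS counts matching heads; LAS counts matching (head, label) pairs.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * las_hits / total, 100.0 * uas_hits / total


gold = [(3, "k1"), (3, "k4"), (0, "main")]
predicted = [(3, "k2"), (3, "k4"), (0, "main")]  # one wrong label, all heads right
las, uas = attachment_scores(gold, predicted)
print(round(las, 2), round(uas, 2))
# 66.67 100.0
```
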
  
