user:zeman:treebanks:hi [2011/12/06 16:24] zeman created |
user:zeman:treebanks:hi [2011/12/06 16:51] zeman Sample training CoNLL. |
| |
==== Inside ==== | ==== Inside ==== |
| |
| * Broken characters (''\x{FFFD} REPLACEMENT CHARACTER'') in the WX encoding. |
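The broken characters noted above can be located mechanically. The following is a minimal sketch, assuming the data has already been read as a list of decoded text lines; the function name `find_replacement_chars` is hypothetical and not part of any treebank tool.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def find_replacement_chars(lines):
    """Return (line number, column) pairs, both 1-based, for every U+FFFD."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ch == REPLACEMENT:
                hits.append((lineno, col))
    return hits


if __name__ == "__main__":
    sample = ["6 hue VM", "7 \ufffd\ufffdveM XC"]
    for lineno, col in find_replacement_chars(sample):
        print(f"line {lineno}, column {col}: U+FFFD")
```

Reporting positions rather than silently dropping the characters keeps the decision about how to repair each token with the person inspecting the data.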
| |
| -- |
| |
The text uses the [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/map.pdf|WX encoding]] of Indian letters. If we know the original script (Bengali in this case), we can map the WX encoding to the original characters in UTF-8. WX uses English letters, so any embedded English (or another string in Latin letters) will probably get lost during the conversion. | The text uses the [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/map.pdf|WX encoding]] of Indian letters. If we know the original script (Bengali in this case), we can map the WX encoding to the original characters in UTF-8. WX uses English letters, so any embedded English (or another string in Latin letters) will probably get lost during the conversion. |
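The mapping described above can be sketched for Devanagari, the script of the Hindi sample on this page. This is a minimal sketch and not the converter actually used: the function name `wx_to_devanagari` is hypothetical, the tables cover only the common letters, and sequences such as `OY` (seen in `bOYlIvuda`) are not handled. Every Latin letter is treated as WX, which is exactly why embedded English would be lost.

```python
# Independent consonant letters, keyed by their WX symbol.
CONSONANTS = {
    "k": "क", "K": "ख", "g": "ग", "G": "घ", "f": "ङ",
    "c": "च", "C": "छ", "j": "ज", "J": "झ", "F": "ञ",
    "t": "ट", "T": "ठ", "d": "ड", "D": "ढ", "N": "ण",
    "w": "त", "W": "थ", "x": "द", "X": "ध", "n": "न",
    "p": "प", "P": "फ", "b": "ब", "B": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "S": "श", "R": "ष", "s": "स", "h": "ह",
}
# Independent vowel letters (used when no consonant precedes the vowel).
VOWELS = {
    "a": "अ", "A": "आ", "i": "इ", "I": "ई", "u": "उ", "U": "ऊ",
    "q": "ऋ", "e": "ए", "E": "ऐ", "o": "ओ", "O": "औ",
}
# Dependent vowel signs; the inherent vowel "a" adds nothing.
MATRAS = {
    "a": "", "A": "\u093e", "i": "\u093f", "I": "\u0940",
    "u": "\u0941", "U": "\u0942", "q": "\u0943",
    "e": "\u0947", "E": "\u0948", "o": "\u094b", "O": "\u094c",
}
# Anusvara, visarga, chandrabindu.
SIGNS = {"M": "\u0902", "H": "\u0903", "z": "\u0901"}
VIRAMA = "\u094d"
NUKTA = "\u093c"  # written as "Z" after the consonant in WX


def wx_to_devanagari(word):
    """Convert one WX-encoded word to Devanagari (partial mapping)."""
    out = []
    i, n = 0, len(word)
    while i < n:
        ch = word[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            i += 1
            if i < n and word[i] == "Z":
                out.append(NUKTA)
                i += 1
            if i < n and word[i] in MATRAS:
                out.append(MATRAS[word[i]])
                i += 1
            else:
                # No vowel follows, so suppress the inherent vowel.
                out.append(VIRAMA)
        elif ch in VOWELS:
            out.append(VOWELS[ch])
            i += 1
        elif ch in SIGNS:
            out.append(SIGNS[ch])
            i += 1
        else:
            # Digits, punctuation and unknown symbols pass through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for wx in ["bAwa", "galawa", "hE", "Pilma", "vahAz"]:
        print(wx, wx_to_devanagari(wx))
```

A converter for another script would keep the same loop and swap the tables, since WX assigns the same Latin symbol to corresponding letters across Indian scripts.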
==== Sample ==== | ==== Sample ==== |
| |
The first sentence of the ICON 2010 training data (with fine-grained syntactic tags) in the Shakti format: | The first two sentences of the ICON 2010 training data (with fine-grained syntactic tags) in the Shakti format: |
| |
<code xml><document id=""> | <code xml><document docid="hi"> |
<head> | <head> |
<annotated-resource name="HyDT-Bangla" version="0.5" type="dep-interchunk-only" layers="morph,pos,chunk,dep-interchunk-only" language="ben" date-of-release="20100831"> | <title> </title> |
| <author> |
| <firstname> </firstname> |
| <middlename> </middlename> |
| <lastname></lastname> |
| </author> |
| <availability format="electronic" /> |
| <bibl> |
| </bibl> |
| <bytecount>8.0K</bytecount> |
| <domain name="general" /> |
| <creation creationdate="19/06/2007" institutename="IIIT Hyderabad"> |
| <creatorname> |
| <lastname>Dipti</lastname> |
| <middlename> |
| </middlename> |
| <firstname>Sharma</firstname> |
| </creatorname> |
| </creation> |
| <distributor>CLIA Consortia, DIT</distributor> |
| <edition number="1.0" /> |
| <encodingdesc> |
| <newencoding>Unicode(UTF-8)</newencoding> |
| <originalencoding>UTF-8</originalencoding> |
| </encodingdesc> |
| <sentencemarker marker=".">Specify Marker</sentencemarker> |
| <language name="hi" writingsystem="LTR" script="Devanagari" /> |
| <normalization normalized="no"> |
| <utilityname>xxx.exe</utilityname> |
| </normalization> |
| <projectdesc name="ILMT" /> |
| <pubaddress addresstype="web"> |
| </pubaddress> |
| <pubdate> |
| <dateofpublication></dateofpublication> |
| </pubdate> |
| <publicationstmt type="copyrightfree"> |
| </publicationstmt> |
| <publisher> |
| <name></name> |
| <url>xxx.com</url> |
| </publisher> |
| <pubplace place="books" /> |
| <wordcount>2 </wordcount> |
| <caption>xuvryavahAra se biParIM bipASA Pilma mahowsava se vApasa lOta gaI bipASA govA. </caption> |
| </caption> |
| |
| <annotated-resource name="HyDT-Hindi" version="2.0" type="dep-words" layers="morph,pos,chunk,dep-word" language="hin" date-of-release="20100823"> |
<annotation-standard> | <annotation-standard> |
<morph-standard name="Anncorra-morph" version="1.31" date="20080920" /> | <morph-standard name="Anncorra-morph" version="1.31" date="20080920" /> |
<pos-standard name="Anncorra-pos" version="" date="20061215" /> | <pos-standard name="Anncorra-pos" version="" date="20061215" /> |
<chunk-standard name="Anncorra-chunk" version="" date="20061215" /> | <chunk-standard name="Anncorra-chunk" version="" date="20061215" /> |
| <intrachunk-dependency-standard name="Anncorra-intrachunk-dep" version="1.0" date="" dep-tagset-granularity="5" /> |
<dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" /> | <dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" /> |
</annotation-standard> | </annotation-standard> |
</annotated-resource> | </annotated-resource> |
</head> | </head> |
| <body> |
| <tb number="1" segment="no" bullet="no"> |
| <foreign language="select" writingsystem="LTR"></foreign> |
| <text> |
<Sentence id="1"> | <Sentence id="1"> |
1 (( NP <fs af='Age,adv,,,,,,' head="Agei" drel=k7t:VGF name=NP> | 1 bAwa NN <fs af='bAwa,n,f,sg,3,d,0,0' drel='k1:ho' posn='10' name='bAwa' chunkId='NP' chunkType='head:NP'> |
1.1 mudZira NN <fs af='mudZi,n,,sg,,o,era,era'> | 2 galawa JJ <fs af='galawa,adj,any,any,,any,,' drel='k1s:ho' posn='20' name='galawa' chunkId='JJP' chunkType='head:JJP'> |
1.2 Agei NST <fs af='Age,adv,,,,,,' name="Agei"> | 3 ho VM <fs af='ho,v,any,any,any,,0,0' drel='vmod:hE' stype='declarative' posn='30' voicetype='active' name='ho' chunkId='VGF' chunkType='head:VGF'> |
)) | 4 wo CC <fs af='wo,avy,,,,,,' posn='40' name='wo' chunkId='CCP' chunkType='head:CCP'> |
2 (( NP <fs af='cA,n,,sg,,d,0,0' head="cA" drel=k1:VGF name=NP2> | 5 gussA NN <fs af='gussA,n,m,sg,3,d,0,0' drel='pof:AnA' posn='50' name='gussA' chunkId='NP2' chunkType='head:NP2'> |
2.1 praWama QO <fs af='praWama,num,,,,,,'> | 6 selebritija NN <fs af='selebritija,unk,,,,,0_ko,' drel='k4a:AnA' posn='60' vpos='vib_2_RP' name='selebritija' chunkId='NP3' chunkType='head:NP3'> |
2.2 kApa NN <fs af='kApa,unk,,,,,,'> | 7 ko PSP <fs af='ko,psp,,,,,,' posn='70' drel='lwg__psp:selebritija' chunkType='child:NP3' name='ko'> |
2.3 cA NN <fs af='cA,n,,sg,,d,0,0' name="cA"> | 8 BI RP <fs af='BI,avy,,,,,,' posn='80' drel='lwg__rp:selebritija' chunkType='child:NP3' name='BI'> |
)) | 9 AnA VM <fs af='A,v,any,any,any,d,nA,nA' drel='k1:hE' posn='90' name='AnA' chunkId='VGNN' chunkType='head:VGNN'> |
3 (( VGF <fs af='As,v,,,5,,A_yA+Ce,A' head="ese" name=VGF> | 10 lAjamI JJ <fs af='lAjamI,adj,any,any,,,,' drel='pof:hE' posn='100' name='lAjamI' chunkId='JJP2' chunkType='head:JJP2'> |
3.1 ese VM <fs af='As,v,,,7,,A,A' name="ese"> | 11 hE VM <fs af='hE,v,any,sg,3,,hE,hE' drel='ccof:wo' stype='declarative' posn='110' voicetype='active' name='hE' chunkId='VGF2' chunkType='head:VGF2'> |
3.2 . SYM <fs af='.,punc,,,,,,'> | 12 . SYM <fs af='.,punc,,,,,,' posn='120' drel='rsym:hE' chunkType='child:VGF2' name='.'> |
)) | </Sentence> |
</Sentence></code> | |
| |
And in the CoNLL format: | |
| |
| 1 | Agei | Age | NP | NST | lex-Age<nowiki>|</nowiki>cat-adv<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-Agei<nowiki>|</nowiki>name-NP | 3 | k7t | _ | _ | | <Sentence id="2"> |
| 2 | cA | cA | NP | NN | lex-cA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-cA<nowiki>|</nowiki>name-NP2 | 3 | k1 | _ | _ | | 1 bqhaspawivAra NNP <fs af='bqhaspawivAra,n,m,sg,3,o,0_ko,0' drel='k7t:hue' posn='10' vpos='vib_2' name='bqhaspawivAra' chunkId='NP' chunkType='head:NP'> |
| 3 | ese | As | VGF | VM | lex-As<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-A_yA+Ce<nowiki>|</nowiki>tam-A<nowiki>|</nowiki>head-ese<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ | | 2 ko PSP <fs af='ko,psp,,,,,,' posn='20' drel='lwg__psp:bqhaspawivAra' chunkType='child:NP' name='ko'> |
| 3 jZI NNP <fs af='jI,n,m,sg,3,o,0_meM,0' drel='k7:hue' posn='30' vpos='vib_2' name='jZI' chunkId='NP2' chunkType='head:NP2'> |
| 4 meM PSP <fs af='meM,psp,,,,,,' posn='40' drel='lwg__psp:jZI' chunkType='child:NP2' name='meM'> |
| 5 SurU NN <fs af='SurU,n,m,sg,3,d,0,0' drel='pof:hue' posn='50' name='SurU' chunkId='NP3' chunkType='head:NP3'> |
| 6 hue VM <fs af='ho,v,m,sg,any,,eM,eM' drel='nmod__k1inv:mahowsava' posn='60' name='hue' chunkId='VGNF' chunkType='head:VGNF'> |
| 7 ��veM XC <fs af='��veM,n,m,sg,3,d,0,0' posn='70' drel='mod:mahowsava' chunkType='child:NP4' name='��veM'> |
| 8 aMwarrARtrIya XC <fs af='aMwarrARtrIya,n,m,sg,3,d,0,0' posn='80' drel='mod:mahowsava' chunkType='child:NP4' name='aMwarrARtrIya'> |
| 9 Pilma XC <fs af='Pilma,n,f,sg,3,d,0,0' posn='90' drel='mod:mahowsava' chunkType='child:NP4' name='Pilma'> |
| 10 mahowsava NNP <fs af='mahowsava,n,m,sg,,o,0_kA,0' drel='r6:raMga' posn='100' vpos='vib_5' name='mahowsava' chunkId='NP4' chunkType='head:NP4'> |
| 11 ke PSP <fs af='kA,psp,m,sg,,o,,' posn='110' drel='lwg__psp:mahowsava' chunkType='child:NP4' name='ke'> |
| 12 raMga NN <fs af='raMga,n,m,sg,3,o,0_meM,0' drel='k7:padZA' posn='120' vpos='vib_2' name='raMga' chunkId='NP5' chunkType='head:NP5'> |
| 13 meM PSP <fs af='meM,psp,,,,,,' posn='130' drel='lwg__psp:raMga' chunkType='child:NP5' name='meM2'> |
| 14 BaMga JJ <fs af='BaMga,adj,any,any,,any,,' drel='pof:padZA' posn='140' name='BaMga' chunkId='JJP' chunkType='head:JJP'> |
| 15 usa DEM <fs af='vaha,pn,any,sg,3,o,,' posn='150' drel='nmod__adj:samaya' chunkType='child:NP6' name='usa'> |
| 16 samaya NN <fs af='samaya,n,any,sg,3,d,0,0' drel='k7t:padZA' posn='160' name='samaya' chunkId='NP6' chunkType='head:NP6'> |
| 17 padZA VM <fs af='pada,v,any,any,any,,yA,yA' stype='declarative' posn='170' voicetype='active' name='padZA' chunkId='VGF' chunkType='head:VGF'> |
| 18 jaba PRP <fs af='jaba,pn,,,,,,' drel='k7t:kiyA' posn='180' coref='samaya' name='jaba' chunkId='NP7' chunkType='head:NP7'> |
| 19 vahAM PRP <fs af='vahAz,pn,,,,,0_para,' drel='jjmod:wEnAwa' posn='190' vpos='vib_2' name='vahAM' chunkId='NP8' chunkType='head:NP8'> |
| 20 para PSP <fs af='para,psp,,,,,,' posn='200' drel='lwg__psp:vahAM' chunkType='child:NP8' name='para'> |
| 21 wEnAwa JJ <fs af='wEnAwa,adj,any,any,,o,,' drel='nmod:surakRAkarmiyoM' posn='210' name='wEnAwa' chunkId='JJP2' chunkType='head:JJP2'> |
| 22 surakRAkarmiyoM NN <fs af='surakRAkarmI,n,m,pl,3,o,0_ne,0' drel='k1:kiyA' posn='220' vpos='vib_2' name='surakRAkarmiyoM' chunkId='NP9' chunkType='head:NP9'> |
| 23 ne PSP <fs af='ne,psp,,,,,,' posn='230' drel='lwg__psp:surakRAkarmiyoM' chunkType='child:NP9' name='ne'> |
| 24 bOYlIvuda NN <fs af='bOYlIvuda,n,m,sg,3,o,0_kA,0' drel='r6:basu' posn='240' vpos='vib_2' name='bOYlIvuda' chunkId='NP10' chunkType='head:NP10'> |
| 25 kI PSP <fs af='kA,psp,f,sg,,o,,' posn='250' drel='lwg__psp:bOYlIvuda' chunkType='child:NP10' name='kI'> |
| 26 aBinewrI NN <fs af='aBinewrI,n,f,sg,3,o,0,0' posn='260' drel='nmod:bipASA' chunkType='child:NP11' name='aBinewrI'> |
| 27 bipASA NN <fs af='bipASA,n,f,sg,3,d,0,0' posn='270' drel='nmod:basu' chunkType='child:NP11' name='bipASA'> |
| 28 basu NNP <fs af='basu,n,f,sg,3,o,0_ke_sAWa,0' drel='k2:kiyA' posn='280' vpos='vib_vib_vib_4_5' name='basu' chunkId='NP11' chunkType='head:NP11'> |
| 29 ke PSP <fs af='ke,psp,,,,,,' posn='290' drel='lwg__psp:basu' chunkType='child:NP11' name='ke2'> |
| 30 sAWa NST <fs af='sAWa,nst,m,sg,3,d,,' posn='300' drel='lwg__psp:basu' chunkType='child:NP11' name='sAWa'> |
| 31 xuvyarvahAra NN <fs af='xuvyarvahAra,n,m,sg,3,d,0,0' drel='pof:kiyA' posn='310' name='xuvyarvahAra' chunkId='NP12' chunkType='head:NP12'> |
| 32 kiyA VM <fs af='kara,v,m,sg,any,,yA,yA' drel='nmod__relc:samaya' stype='declarative' posn='320' voicetype='active' name='kiyA' chunkId='VGF2' chunkType='head:VGF2'> |
| 33 . SYM <fs af='.,punc,,,,,,' posn='330' drel='rsym:kiyA' chunkType='child:VGF2' name='.'> |
| </Sentence></code> |
| |
And after conversion of the WX encoding to the Bengali script in UTF-8: | The same two sentences converted to the CoNLL format, WX characters decoded back to Devanagari in UTF-8: |
| |
| 1 | আগেই | আগে | NP | NST | lex-Age<nowiki>|</nowiki>cat-adv<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-Agei<nowiki>|</nowiki>name-NP | 3 | k7t | _ | _ | | | 1 | बात | बात | NN | n | lex-bAwa<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-f<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-10<nowiki>|</nowiki>name-bAwa<nowiki>|</nowiki>chunkId-NP<nowiki>|</nowiki>chunkType-head:NP | 3 | k1 | _ | _ | |
| 2 | চা | চা | NP | NN | lex-cA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-cA<nowiki>|</nowiki>name-NP2 | 3 | k1 | _ | _ | | | 2 | गलत | गलत | JJ | adj | lex-galawa<nowiki>|</nowiki>cat-adj<nowiki>|</nowiki>gend-any<nowiki>|</nowiki>num-any<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-any<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-20<nowiki>|</nowiki>name-galawa<nowiki>|</nowiki>chunkId-JJP<nowiki>|</nowiki>chunkType-head:JJP | 3 | k1s | _ | _ | |
| 3 | এসে | আস্ | VGF | VM | lex-As<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-A_yA+Ce<nowiki>|</nowiki>tam-A<nowiki>|</nowiki>head-ese<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ | | | 3 | हो | हो | VM | v | lex-ho<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-any<nowiki>|</nowiki>num-any<nowiki>|</nowiki>pers-any<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>stype-declarative<nowiki>|</nowiki>posn-30<nowiki>|</nowiki>voicetype-active<nowiki>|</nowiki>name-ho<nowiki>|</nowiki>chunkId-VGF<nowiki>|</nowiki>chunkType-head:VGF | 11 | vmod | _ | _ | |
| | 4 | तो | तो | CC | avy | lex-wo<nowiki>|</nowiki>cat-avy<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-40<nowiki>|</nowiki>name-wo<nowiki>|</nowiki>chunkId-CCP<nowiki>|</nowiki>chunkType-head:CCP | 0 | main | _ | _ | |
| | 5 | गुस्सा | गुस्सा | NN | n | lex-gussA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-50<nowiki>|</nowiki>name-gussA<nowiki>|</nowiki>chunkId-NP2<nowiki>|</nowiki>chunkType-head:NP2 | 9 | pof | _ | _ | |
| | 6 | सेलेब्रिटिज | सेलेब्रिटिज | NN | unk | lex-selebritija<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-0_ko<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-60<nowiki>|</nowiki>vpos-vib_2_RP<nowiki>|</nowiki>name-selebritija<nowiki>|</nowiki>chunkId-NP3<nowiki>|</nowiki>chunkType-head:NP3 | 9 | k4a | _ | _ | |
| | 7 | को | को | PSP | psp | lex-ko<nowiki>|</nowiki>cat-psp<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-70<nowiki>|</nowiki>chunkType-child:NP3<nowiki>|</nowiki>name-ko | 6 | lwg__psp | _ | _ | |
| | 8 | भी | भी | RP | avy | lex-BI<nowiki>|</nowiki>cat-avy<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-80<nowiki>|</nowiki>chunkType-child:NP3<nowiki>|</nowiki>name-BI | 6 | lwg__rp | _ | _ | |
| | 9 | आना | आ | VM | v | lex-A<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-any<nowiki>|</nowiki>num-any<nowiki>|</nowiki>pers-any<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-nA<nowiki>|</nowiki>tam-nA<nowiki>|</nowiki>posn-90<nowiki>|</nowiki>name-AnA<nowiki>|</nowiki>chunkId-VGNN<nowiki>|</nowiki>chunkType-head:VGNN | 11 | k1 | _ | _ | |
| | 10 | लाजमी | लाजमी | JJ | adj | lex-lAjamI<nowiki>|</nowiki>cat-adj<nowiki>|</nowiki>gend-any<nowiki>|</nowiki>num-any<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-100<nowiki>|</nowiki>name-lAjamI<nowiki>|</nowiki>chunkId-JJP2<nowiki>|</nowiki>chunkType-head:JJP2 | 11 | pof | _ | _ | |
| | 11 | है | है | VM | v | lex-hE<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-any<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-hE<nowiki>|</nowiki>tam-hE<nowiki>|</nowiki>stype-declarative<nowiki>|</nowiki>posn-110<nowiki>|</nowiki>voicetype-active<nowiki>|</nowiki>name-hE<nowiki>|</nowiki>chunkId-VGF2<nowiki>|</nowiki>chunkType-head:VGF2 | 4 | ccof | _ | _ | |
| | 12 | . | . | SYM | punc | lex-.<nowiki>|</nowiki>cat-punc<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-120<nowiki>|</nowiki>chunkType-child:VGF2<nowiki>|</nowiki>name-. | 11 | rsym | _ | _ | |
| | |||||||||| |
| | |||||||||| |
| | 1 | बृहस्पतिवार | बृहस्पतिवार | NNP | n | lex-bqhaspawivAra<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-0_ko<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-10<nowiki>|</nowiki>vpos-vib_2<nowiki>|</nowiki>name-bqhaspawivAra<nowiki>|</nowiki>chunkId-NP<nowiki>|</nowiki>chunkType-head:NP | 6 | k7t | _ | _ | |
| | 2 | को | को | PSP | psp | lex-ko<nowiki>|</nowiki>cat-psp<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-20<nowiki>|</nowiki>chunkType-child:NP<nowiki>|</nowiki>name-ko | 1 | lwg__psp | _ | _ | |
| | 3 | ज़ी | जी | NNP | n | lex-jI<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-0_meM<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-30<nowiki>|</nowiki>vpos-vib_2<nowiki>|</nowiki>name-jZI<nowiki>|</nowiki>chunkId-NP2<nowiki>|</nowiki>chunkType-head:NP2 | 6 | k7 | _ | _ | |
| | 4 | में | में | PSP | psp | lex-meM<nowiki>|</nowiki>cat-psp<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-40<nowiki>|</nowiki>chunkType-child:NP2<nowiki>|</nowiki>name-meM | 3 | lwg__psp | _ | _ | |
| | 5 | शुरू | शुरू | NN | n | lex-SurU<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-50<nowiki>|</nowiki>name-SurU<nowiki>|</nowiki>chunkId-NP3<nowiki>|</nowiki>chunkType-head:NP3 | 6 | pof | _ | _ | |
| | 6 | हुए | हो | VM | v | lex-ho<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-any<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-eM<nowiki>|</nowiki>tam-eM<nowiki>|</nowiki>posn-60<nowiki>|</nowiki>name-hue<nowiki>|</nowiki>chunkId-VGNF<nowiki>|</nowiki>chunkType-head:VGNF | 10 | nmod__k1inv | _ | _ | |
| | 7 | ��वें | ��वें | XC | n | lex-��veM<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-70<nowiki>|</nowiki>chunkType-child:NP4<nowiki>|</nowiki>name-��veM | 10 | mod | _ | _ | |
| | 8 | अंतर्राष्ट्रीय | अंतर्राष्ट्रीय | XC | n | lex-aMwarrARtrIya<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-80<nowiki>|</nowiki>chunkType-child:NP4<nowiki>|</nowiki>name-aMwarrARtrIya | 10 | mod | _ | _ | |
| | 9 | फिल्म | फिल्म | XC | n | lex-Pilma<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-f<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-90<nowiki>|</nowiki>chunkType-child:NP4<nowiki>|</nowiki>name-Pilma | 10 | mod | _ | _ | |
| | 10 | महोत्सव | महोत्सव | NNP | n | lex-mahowsava<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-0_kA<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-100<nowiki>|</nowiki>vpos-vib_5<nowiki>|</nowiki>name-mahowsava<nowiki>|</nowiki>chunkId-NP4<nowiki>|</nowiki>chunkType-head:NP4 | 12 | r6 | _ | _ | |
| | 11 | के | का | PSP | psp | lex-kA<nowiki>|</nowiki>cat-psp<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-110<nowiki>|</nowiki>chunkType-child:NP4<nowiki>|</nowiki>name-ke | 10 | lwg__psp | _ | _ | |
| | 12 | रंग | रंग | NN | n | lex-raMga<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-0_meM<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-120<nowiki>|</nowiki>vpos-vib_2<nowiki>|</nowiki>name-raMga<nowiki>|</nowiki>chunkId-NP5<nowiki>|</nowiki>chunkType-head:NP5 | 17 | k7 | _ | _ | |
| | 13 | में | में | PSP | psp | lex-meM<nowiki>|</nowiki>cat-psp<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-130<nowiki>|</nowiki>chunkType-child:NP5<nowiki>|</nowiki>name-meM2 | 12 | lwg__psp | _ | _ | |
| | 14 | भंग | भंग | JJ | adj | lex-BaMga<nowiki>|</nowiki>cat-adj<nowiki>|</nowiki>gend-any<nowiki>|</nowiki>num-any<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-any<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-140<nowiki>|</nowiki>name-BaMga<nowiki>|</nowiki>chunkId-JJP<nowiki>|</nowiki>chunkType-head:JJP | 17 | pof | _ | _ | |
| | 15 | उस | वह | DEM | pn | lex-vaha<nowiki>|</nowiki>cat-pn<nowiki>|</nowiki>gend-any<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-150<nowiki>|</nowiki>chunkType-child:NP6<nowiki>|</nowiki>name-usa | 16 | nmod__adj | _ | _ | |
| | 16 | समय | समय | NN | n | lex-samaya<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-any<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-160<nowiki>|</nowiki>name-samaya<nowiki>|</nowiki>chunkId-NP6<nowiki>|</nowiki>chunkType-head:NP6 | 17 | k7t | _ | _ | |
| | 17 | पड़ा | पड | VM | v | lex-pada<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-any<nowiki>|</nowiki>num-any<nowiki>|</nowiki>pers-any<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-yA<nowiki>|</nowiki>tam-yA<nowiki>|</nowiki>stype-declarative<nowiki>|</nowiki>posn-170<nowiki>|</nowiki>voicetype-active<nowiki>|</nowiki>name-padZA<nowiki>|</nowiki>chunkId-VGF<nowiki>|</nowiki>chunkType-head:VGF | 0 | main | _ | _ | |
| | 18 | जब | जब | PRP | pn | lex-jaba<nowiki>|</nowiki>cat-pn<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-180<nowiki>|</nowiki>coref-samaya<nowiki>|</nowiki>name-jaba<nowiki>|</nowiki>chunkId-NP7<nowiki>|</nowiki>chunkType-head:NP7 | 32 | k7t | _ | _ | |
| | 19 | वहां | वहाँ | PRP | pn | lex-vahAz<nowiki>|</nowiki>cat-pn<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-0_para<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-190<nowiki>|</nowiki>vpos-vib_2<nowiki>|</nowiki>name-vahAM<nowiki>|</nowiki>chunkId-NP8<nowiki>|</nowiki>chunkType-head:NP8 | 21 | jjmod | _ | _ | |
| | 20 | पर | पर | PSP | psp | lex-para<nowiki>|</nowiki>cat-psp<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-200<nowiki>|</nowiki>chunkType-child:NP8<nowiki>|</nowiki>name-para | 19 | lwg__psp | _ | _ | |
| | 21 | तैनात | तैनात | JJ | adj | lex-wEnAwa<nowiki>|</nowiki>cat-adj<nowiki>|</nowiki>gend-any<nowiki>|</nowiki>num-any<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-210<nowiki>|</nowiki>name-wEnAwa<nowiki>|</nowiki>chunkId-JJP2<nowiki>|</nowiki>chunkType-head:JJP2 | 22 | nmod | _ | _ | |
| | 22 | सुरक्षाकर्मियों | सुरक्षाकर्मी | NN | n | lex-surakRAkarmI<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-pl<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-0_ne<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-220<nowiki>|</nowiki>vpos-vib_2<nowiki>|</nowiki>name-surakRAkarmiyoM<nowiki>|</nowiki>chunkId-NP9<nowiki>|</nowiki>chunkType-head:NP9 | 32 | k1 | _ | _ | |
| | 23 | ने | ने | PSP | psp | lex-ne<nowiki>|</nowiki>cat-psp<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-230<nowiki>|</nowiki>chunkType-child:NP9<nowiki>|</nowiki>name-ne | 22 | lwg__psp | _ | _ | |
| | 24 | बॉलीवुड | बॉलीवुड | NN | n | lex-bOYlIvuda<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-0_kA<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-240<nowiki>|</nowiki>vpos-vib_2<nowiki>|</nowiki>name-bOYlIvuda<nowiki>|</nowiki>chunkId-NP10<nowiki>|</nowiki>chunkType-head:NP10 | 28 | r6 | _ | _ | |
| | 25 | की | का | PSP | psp | lex-kA<nowiki>|</nowiki>cat-psp<nowiki>|</nowiki>gend-f<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-250<nowiki>|</nowiki>chunkType-child:NP10<nowiki>|</nowiki>name-kI | 24 | lwg__psp | _ | _ | |
| | 26 | अभिनेत्री | अभिनेत्री | NN | n | lex-aBinewrI<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-f<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-260<nowiki>|</nowiki>chunkType-child:NP11<nowiki>|</nowiki>name-aBinewrI | 27 | nmod | _ | _ | |
| | 27 | बिपाशा | बिपाशा | NN | n | lex-bipASA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-f<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-270<nowiki>|</nowiki>chunkType-child:NP11<nowiki>|</nowiki>name-bipASA | 28 | nmod | _ | _ | |
| | 28 | बसु | बसु | NNP | n | lex-basu<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-f<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-0_ke_sAWa<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-280<nowiki>|</nowiki>vpos-vib_vib_vib_4_5<nowiki>|</nowiki>name-basu<nowiki>|</nowiki>chunkId-NP11<nowiki>|</nowiki>chunkType-head:NP11 | 32 | k2 | _ | _ | |
| | 29 | के | के | PSP | psp | lex-ke<nowiki>|</nowiki>cat-psp<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-290<nowiki>|</nowiki>chunkType-child:NP11<nowiki>|</nowiki>name-ke2 | 28 | lwg__psp | _ | _ | |
| | 30 | साथ | साथ | NST | nst | lex-sAWa<nowiki>|</nowiki>cat-nst<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-300<nowiki>|</nowiki>chunkType-child:NP11<nowiki>|</nowiki>name-sAWa | 28 | lwg__psp | _ | _ | |
| | 31 | दुव्यर्वहार | दुव्यर्वहार | NN | n | lex-xuvyarvahAra<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-3<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>posn-310<nowiki>|</nowiki>name-xuvyarvahAra<nowiki>|</nowiki>chunkId-NP12<nowiki>|</nowiki>chunkType-head:NP12 | 32 | pof | _ | _ | |
| | 32 | किया | कर | VM | v | lex-kara<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-m<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-any<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-yA<nowiki>|</nowiki>tam-yA<nowiki>|</nowiki>stype-declarative<nowiki>|</nowiki>posn-320<nowiki>|</nowiki>voicetype-active<nowiki>|</nowiki>name-kiyA<nowiki>|</nowiki>chunkId-VGF2<nowiki>|</nowiki>chunkType-head:VGF2 | 16 | nmod__relc | _ | _ | |
| | 33 | . | . | SYM | punc | lex-.<nowiki>|</nowiki>cat-punc<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>posn-330<nowiki>|</nowiki>chunkType-child:VGF2<nowiki>|</nowiki>name-. | 32 | rsym | _ | _ | |
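The relation between the word-level Shakti lines and the CoNLL rows above can be sketched as follows. This is an illustrative reconstruction from the sample only, not the script that produced the table: the function names are hypothetical, forms and lemmas are left in WX, and the chunk-bracketed layout of the older revision is not handled. It assumes the `af` attribute holds eight comma-separated fields, that `drel` has the form `relation:headname`, and that a token without `drel` is the root with the label `main`.

```python
import re

# Names given to the eight comma-separated fields of the af attribute.
AF_NAMES = ["lex", "cat", "gend", "num", "pers", "case", "vib", "tam"]
# Matches key='value' or key="value" inside the <fs ...> element.
ATTR_RE = re.compile(r"""(\w+)=(['"])(.*?)\2""")


def parse_token(line):
    """Split one Shakti token line into id, form, tag and attribute dict."""
    idx, form, tag, fs = line.split(None, 3)
    attrs = {key: value for key, _quote, value in ATTR_RE.findall(fs)}
    return idx, form, tag, attrs


def sentence_to_conll(lines):
    """Convert the token lines of one sentence to CoNLL-style rows."""
    tokens = [parse_token(line) for line in lines]
    # drel refers to the head by its name attribute, so index names first.
    name_to_id = {a["name"]: idx for idx, _, _, a in tokens if "name" in a}
    rows = []
    for idx, form, tag, attrs in tokens:
        af = attrs.get("af", "").split(",")
        feats = [f"{name}-{value}" for name, value in zip(AF_NAMES, af)]
        feats += [f"{k}-{v}" for k, v in attrs.items() if k not in ("af", "drel")]
        if "drel" in attrs:
            deprel, head_name = attrs["drel"].split(":", 1)
            head = name_to_id.get(head_name, "_")  # "_" if head is not found
        else:
            deprel, head = "main", "0"
        lemma = af[0]
        pos = af[1] if len(af) > 1 else "_"
        rows.append([idx, form, lemma, tag, pos, "|".join(feats),
                     head, deprel, "_", "_"])
    return rows


if __name__ == "__main__":
    sample = [
        "4 wo CC <fs af='wo,avy,,,,,,' posn='40' name='wo' chunkId='CCP' chunkType='head:CCP'>",
        "12 . SYM <fs af='.,punc,,,,,,' posn='120' drel='rsym:hE' chunkType='child:VGF2' name='.'>",
    ]
    for row in sentence_to_conll(sample):
        print("\t".join(row))
```

In the wiki table each `|` inside the feature column is wrapped in `<nowiki>` so that it is not read as a cell separator; the sketch emits plain `|` as a CoNLL file would.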
| |
The first sentence of the ICON 2010 development data (with fine-grained syntactic tags) in the Shakti format: | The first sentence of the ICON 2010 development data (with fine-grained syntactic tags) in the Shakti format: |