
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:hi [2011/12/06 16:24]
zeman created
user:zeman:treebanks:hi [2011/12/06 17:12]
zeman CoNLL sample corrected.
Line 64: Line 64:
  
 ==== Inside ====
 +
 +  * Broken characters (''\x{FFFD} REPLACEMENT CHARACTER'') in the WX encoding.
 +
 +--
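The broken characters mentioned above can be located before any conversion is attempted. The following is a minimal sketch, assuming the data has already been read as a list of text lines; the function name `find_broken_tokens` is hypothetical and not part of any treebank tool.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def find_broken_tokens(lines):
    """Return (line_number, token) pairs for whitespace-separated tokens
    that contain U+FFFD. Line numbers start at 1."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for token in line.split():
            if REPLACEMENT in token:
                hits.append((lineno, token))
    return hits
```

Such a list only reports where the damage is; the original characters cannot be recovered from U+FFFD itself.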
  
 The text uses the [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/map.pdf|WX encoding]] of Indian letters. If we know the original script (Devanagari in this case, since the data is Hindi), we can map the WX encoding to the original characters in UTF-8. WX uses English letters, so any embedded English (or other string written in Latin letters) will probably get lost during the conversion.
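The mapping described above can be sketched as follows. This is only an illustration under stated assumptions: the table covers the common WX letters for Devanagari, it does not handle special sequences such as ''OY'' (as in ''bOYlIvuda''), and the function name `wx_to_devanagari` is hypothetical. Characters without a mapping are passed through unchanged, which also shows why embedded Latin text cannot be told apart from WX.

```python
# Partial WX -> Devanagari table (assumption: standard WX letter values).
CONSONANTS = dict(zip("kKgGfcCjJFtTdDNwWxXnpPbBmyrlvSRsh",
                      "कखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसह"))
INDEPENDENT_VOWELS = dict(zip("aAiIuUqeEoO", "अआइईउऊऋएऐओऔ"))
# Dependent vowel signs (matras); inherent 'a' has no sign.
MATRAS = {"a": "", "A": "\u093e", "i": "\u093f", "I": "\u0940",
          "u": "\u0941", "U": "\u0942", "q": "\u0943", "e": "\u0947",
          "E": "\u0948", "o": "\u094b", "O": "\u094c"}
# Anusvara, visarga, candrabindu.
MODIFIERS = {"M": "\u0902", "H": "\u0903", "z": "\u0901"}
NUKTA = "\u093c"   # written 'Z' after a consonant in WX
VIRAMA = "\u094d"  # suppresses the inherent vowel


def wx_to_devanagari(word):
    """Convert one WX-encoded word to Devanagari (partial sketch)."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        i += 1
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if i < len(word) and word[i] == "Z":
                out.append(NUKTA)
                i += 1
            if i < len(word) and word[i] in MATRAS:
                out.append(MATRAS[word[i]])
                i += 1
            else:
                # No vowel follows: the consonant joins the next one.
                out.append(VIRAMA)
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
        elif ch in MODIFIERS:
            out.append(MODIFIERS[ch])
        else:
            out.append(ch)  # unmapped character, kept as-is
    return "".join(out)
```

For example, `wx_to_devanagari("bAwa")` walks through ''b'', ''A'', ''w'', ''a'' and emits the consonant, the long-a sign, and a second consonant with its inherent vowel.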
Line 87: Line 91:
 ==== Sample ====
  
-The first sentence of the ICON 2010 training data (with fine-grained syntactic tags) in the Shakti format:+The first two sentences of the ICON 2010 training data (with fine-grained syntactic tags) in the Shakti format:
  
-<code xml><document id=""> +<code xml><document docid="hi">
 <head>
-<annotated-resource name="HyDT-Bangla" version="0.5" type="dep-interchunk-only" layers="morph,pos,chunk,dep-interchunk-only" language="ben" date-of-release="20100831">+<title>  </title>  
 +<author>  
 +<firstname>  </firstname>  
 +<middlename>    </middlename>  
 +<lastname></lastname>  
 +</author>  
 +<availability format="electronic" />  
 +<bibl>  
 +</bibl>  
 +<bytecount>8.0K</bytecount>  
 +<domain name="general" />  
 +<creation creationdate="19/06/2007" institutename="IIIT Hyderabad">  
 +<creatorname>  
 +<lastname>Dipti</lastname>  
 +<middlename>  
 +</middlename>  
 +<firstname>Sharma</firstname>  
 +</creatorname>  
 +</creation>  
 +<distributor>CLIA Consortia, DIT</distributor>  
 +<edition number="1.0" />  
 +<encodingdesc>  
 +<newencoding>Unicode(UTF-8)</newencoding>  
 +<originalencoding>UTF-8</originalencoding>  
 +</encodingdesc>  
 +<sentencemarker marker=".">Specify Marker</sentencemarker>  
 +<language name="hi" writingsystem="LTR" script="Devanagari" />  
 +<normalization normalized="no">  
 +<utilityname>xxx.exe</utilityname>  
 +</normalization>  
 +<projectdesc name="ILMT" />  
 +<pubaddress addresstype="web">  
 +</pubaddress>  
 +<pubdate>  
 +<dateofpublication></dateofpublication>  
 +</pubdate>  
 +<publicationstmt type="copyrightfree">  
 +</publicationstmt>  
 +<publisher>  
 +<name></name>  
 +<url>xxx.com</url>  
 +</publisher>  
 +<pubplace place="books" />  
 +<wordcount> </wordcount>  
 +<caption>xuvryavahAra se biParIM bipASA Pilma mahowsava se vApasa lOta gaI bipASA govA. </caption>  
 +</caption>  
 + 
 +<annotated-resource name="HyDT-Hindi" version="2.0" type="dep-words" layers="morph,pos,chunk,dep-word" language="hin" date-of-release="20100823">
     <annotation-standard>
         <morph-standard name="Anncorra-morph" version="1.31" date="20080920" />
         <pos-standard name="Anncorra-pos" version="" date="20061215" />
         <chunk-standard name="Anncorra-chunk" version="" date="20061215" />
 +        <intrachunk-dependency-standard name="Anncorra-intrachunk-dep" version="1.0" date="" dep-tagset-granularity="5" />
         <dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" />
     </annotation-standard>
-</annotated-resource>  +</annotated-resource> 
-</head> +</head> 
 +<body> 
 +<tb number="1" segment="no" bullet="no"> 
 +<foreign language="select" writingsystem="LTR"></foreign> 
 +<text>
 <Sentence id="1">
-1 (( NP <fs af='Age,adv,,,,,,' head="Agei" drel=k7t:VGF name=NP> +1 bAwa NN <fs af='bAwa,n,f,sg,3,d,0,0' drel='k1:ho' posn='10' name='bAwa' chunkId='NP' chunkType='head:NP'>
-1.1 mudZira NN <fs af='mudZi,n,,sg,,o,era,era'> +2 galawa JJ <fs af='galawa,adj,any,any,,any,,' drel='k1s:ho' posn='20' name='galawa' chunkId='JJP' chunkType='head:JJP'>
-1.2 Agei NST <fs af='Age,adv,,,,,,' name="Agei"> +3 ho VM <fs af='ho,v,any,any,any,,0,0' drel='vmod:hE' stype='declarative' posn='30' voicetype='active' name='ho' chunkId='VGF' chunkType='head:VGF'>
- ))  +4 wo CC <fs af='wo,avy,,,,,,' posn='40' name='wo' chunkId='CCP' chunkType='head:CCP'>
-2 (( NP <fs af='cA,n,,sg,,d,0,0' head="cA" drel=k1:VGF name=NP2> +5 gussA NN <fs af='gussA,n,m,sg,3,d,0,0' drel='pof:AnA' posn='50' name='gussA' chunkId='NP2' chunkType='head:NP2'>
-2.1 praWama QO <fs af='praWama,num,,,,,,'> +6 selebritija NN <fs af='selebritija,unk,,,,,0_ko,' drel='k4a:AnA' posn='60' vpos='vib_2_RP' name='selebritija' chunkId='NP3' chunkType='head:NP3'>
-2.2 kApa NN <fs af='kApa,unk,,,,,,'> +7 ko PSP <fs af='ko,psp,,,,,,' posn='70' drel='lwg__psp:selebritija' chunkType='child:NP3' name='ko'>
-2.3 cA NN <fs af='cA,n,,sg,,d,0,0' name="cA"> +8 BI RP <fs af='BI,avy,,,,,,' posn='80' drel='lwg__rp:selebritija' chunkType='child:NP3' name='BI'>
- ))  +9 AnA VM <fs af='A,v,any,any,any,d,nA,nA' drel='k1:hE' posn='90' name='AnA' chunkId='VGNN' chunkType='head:VGNN'>
-3 (( VGF <fs af='As,v,,,5,,A_yA+Ce,A' head="ese" name=VGF> +10 lAjamI JJ <fs af='lAjamI,adj,any,any,,,,' drel='pof:hE' posn='100' name='lAjamI' chunkId='JJP2' chunkType='head:JJP2'>
-3.1 ese VM <fs af='As,v,,,7,,A,A' name="ese"> +11 hE VM <fs af='hE,v,any,sg,3,,hE,hE' drel='ccof:wo' stype='declarative' posn='110' voicetype='active' name='hE' chunkId='VGF2' chunkType='head:VGF2'>
-3.2 . SYM <fs af='.,punc,,,,,,'> +12 . SYM <fs af='.,punc,,,,,,' posn='120' drel='rsym:hE' chunkType='child:VGF2' name='.'> 
- ))  +</Sentence>
-</Sentence></code>+
  
-And in the CoNLL format: 
  
-| 1 | Agei | Age | NP | NST | lex-Age<nowiki>|</nowiki>cat-adv<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-Agei<nowiki>|</nowiki>name-NP | 3 | k7t | _ | _ | +<Sentence id="2"> 
-| 2 | cA | cA | NP | NN | lex-cA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-cA<nowiki>|</nowiki>name-NP2 | 3 | k1 | _ | _ | +1 bqhaspawivAra NNP <fs af='bqhaspawivAra,n,m,sg,3,o,0_ko,0' drel='k7t:hue' posn='10' vpos='vib_2' name='bqhaspawivAra' chunkId='NP' chunkType='head:NP'> 
-| 3 | ese | As | VGF | VM | lex-As<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-A_yA+Ce<nowiki>|</nowiki>tam-A<nowiki>|</nowiki>head-ese<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ |+2 ko PSP <fs af='ko,psp,,,,,,' posn='20' drel='lwg__psp:bqhaspawivAra' chunkType='child:NP' name='ko'> 
 +3 jZI NNP <fs af='jI,n,m,sg,3,o,0_meM,0' drel='k7:hue' posn='30' vpos='vib_2' name='jZI' chunkId='NP2' chunkType='head:NP2'> 
 +4 meM PSP <fs af='meM,psp,,,,,,' posn='40' drel='lwg__psp:jZI' chunkType='child:NP2' name='meM'> 
 +5 SurU NN <fs af='SurU,n,m,sg,3,d,0,0' drel='pof:hue' posn='50' name='SurU' chunkId='NP3' chunkType='head:NP3'> 
 +6 hue VM <fs af='ho,v,m,sg,any,,eM,eM' drel='nmod__k1inv:mahowsava' posn='60' name='hue' chunkId='VGNF' chunkType='head:VGNF'> 
 +7 ��veM XC <fs af='��veM,n,m,sg,3,d,0,0' posn='70' drel='mod:mahowsava' chunkType='child:NP4' name='��veM'> 
 +8 aMwarrARtrIya XC <fs af='aMwarrARtrIya,n,m,sg,3,d,0,0' posn='80' drel='mod:mahowsava' chunkType='child:NP4' name='aMwarrARtrIya'> 
 +9 Pilma XC <fs af='Pilma,n,f,sg,3,d,0,0' posn='90' drel='mod:mahowsava' chunkType='child:NP4' name='Pilma'> 
 +10 mahowsava NNP <fs af='mahowsava,n,m,sg,,o,0_kA,0' drel='r6:raMga' posn='100' vpos='vib_5' name='mahowsava' chunkId='NP4' chunkType='head:NP4'> 
 +11 ke PSP <fs af='kA,psp,m,sg,,o,,' posn='110' drel='lwg__psp:mahowsava' chunkType='child:NP4' name='ke'>
 +12 raMga NN <fs af='raMga,n,m,sg,3,o,0_meM,0' drel='k7:padZA' posn='120' vpos='vib_2' name='raMga' chunkId='NP5' chunkType='head:NP5'> 
 +13 meM PSP <fs af='meM,psp,,,,,,' posn='130' drel='lwg__psp:raMga' chunkType='child:NP5' name='meM2'> 
 +14 BaMga JJ <fs af='BaMga,adj,any,any,,any,,' drel='pof:padZA' posn='140' name='BaMga' chunkId='JJP' chunkType='head:JJP'> 
 +15 usa DEM <fs af='vaha,pn,any,sg,3,o,,' posn='150' drel='nmod__adj:samaya' chunkType='child:NP6' name='usa'> 
 +16 samaya NN <fs af='samaya,n,any,sg,3,d,0,0' drel='k7t:padZA' posn='160' name='samaya' chunkId='NP6' chunkType='head:NP6'> 
 +17 padZA VM <fs af='pada,v,any,any,any,,yA,yA' stype='declarative' posn='170' voicetype='active' name='padZA' chunkId='VGF' chunkType='head:VGF'> 
 +18 jaba PRP <fs af='jaba,pn,,,,,,' drel='k7t:kiyA' posn='180' coref='samaya' name='jaba' chunkId='NP7' chunkType='head:NP7'> 
 +19 vahAM PRP <fs af='vahAz,pn,,,,,0_para,' drel='jjmod:wEnAwa' posn='190' vpos='vib_2' name='vahAM' chunkId='NP8' chunkType='head:NP8'> 
 +20 para PSP <fs af='para,psp,,,,,,' posn='200' drel='lwg__psp:vahAM' chunkType='child:NP8' name='para'> 
 +21 wEnAwa JJ <fs af='wEnAwa,adj,any,any,,o,,' drel='nmod:surakRAkarmiyoM' posn='210' name='wEnAwa' chunkId='JJP2' chunkType='head:JJP2'> 
 +22 surakRAkarmiyoM NN <fs af='surakRAkarmI,n,m,pl,3,o,0_ne,0' drel='k1:kiyA' posn='220' vpos='vib_2' name='surakRAkarmiyoM' chunkId='NP9' chunkType='head:NP9'> 
 +23 ne PSP <fs af='ne,psp,,,,,,' posn='230' drel='lwg__psp:surakRAkarmiyoM' chunkType='child:NP9' name='ne'> 
 +24 bOYlIvuda NN <fs af='bOYlIvuda,n,m,sg,3,o,0_kA,0' drel='r6:basu' posn='240' vpos='vib_2' name='bOYlIvuda' chunkId='NP10' chunkType='head:NP10'> 
 +25 kI PSP <fs af='kA,psp,f,sg,,o,,' posn='250' drel='lwg__psp:bOYlIvuda' chunkType='child:NP10' name='kI'> 
 +26 aBinewrI NN <fs af='aBinewrI,n,f,sg,3,o,0,0' posn='260' drel='nmod:bipASA' chunkType='child:NP11' name='aBinewrI'> 
 +27 bipASA NN <fs af='bipASA,n,f,sg,3,d,0,0' posn='270' drel='nmod:basu' chunkType='child:NP11' name='bipASA'> 
 +28 basu NNP <fs af='basu,n,f,sg,3,o,0_ke_sAWa,0' drel='k2:kiyA' posn='280' vpos='vib_vib_vib_4_5' name='basu' chunkId='NP11' chunkType='head:NP11'> 
 +29 ke PSP <fs af='ke,psp,,,,,,' posn='290' drel='lwg__psp:basu' chunkType='child:NP11' name='ke2'> 
 +30 sAWa NST <fs af='sAWa,nst,m,sg,3,d,,' posn='300' drel='lwg__psp:basu' chunkType='child:NP11' name='sAWa'> 
 +31 xuvyarvahAra NN <fs af='xuvyarvahAra,n,m,sg,3,d,0,0' drel='pof:kiyA' posn='310' name='xuvyarvahAra' chunkId='NP12' chunkType='head:NP12'> 
 +32 kiyA VM <fs af='kara,v,m,sg,any,,yA,yA' drel='nmod__relc:samaya' stype='declarative' posn='320' voicetype='active' name='kiyA' chunkId='VGF2' chunkType='head:VGF2'> 
 +33 . SYM <fs af='.,punc,,,,,,' posn='330' drel='rsym:kiyA' chunkType='child:VGF2' name='.'> 
 +</Sentence></code>
  
-And after conversion of the WX encoding to the Bengali script in UTF-8:+The same two sentences converted to the CoNLL format, WX characters decoded back to Devanagari in UTF-8:
  
-| 1 | আগেই | আগে | NP | NST | lex-Age<nowiki>|</nowiki>cat-adv<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-Agei<nowiki>|</nowiki>name-NP | 3 | k7t | _ | _ | +| <nowiki>1</nowiki> | <nowiki>बात</nowiki> | <nowiki>बात</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-bAwa|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-10|name-bAwa|chunkId-NP|chunkType-head:NP</nowiki> | <nowiki>3</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 2 | চা | চা | NP | NN | lex-cA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-cA<nowiki>|</nowiki>name-NP2 | 3 | k1 | _ | _ | +| <nowiki>2</nowiki> | <nowiki>गलत</nowiki> | <nowiki>गलत</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-galawa|cat-adj|gend-any|num-any|pers-|case-any|vib-|tam-|posn-20|name-galawa|chunkId-JJP|chunkType-head:JJP</nowiki> | <nowiki>3</nowiki> | <nowiki>k1s</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 3 | এসে | আস্ | VGF | VM | lex-As<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-A_yA+Ce<nowiki>|</nowiki>tam-A<nowiki>|</nowiki>head-ese<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ |+| <nowiki>3</nowiki> | <nowiki>हो</nowiki> | <nowiki>हो</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-ho|cat-v|gend-any|num-any|pers-any|case-|vib-0|tam-0|stype-declarative|posn-30|voicetype-active|name-ho|chunkId-VGF|chunkType-head:VGF</nowiki> | <nowiki>11</nowiki> | <nowiki>vmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>4</nowiki> | <nowiki>तो</nowiki> | <nowiki>तो</nowiki> | <nowiki>CC</nowiki> | <nowiki>avy</nowiki> | <nowiki>lex-wo|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-40|name-wo|chunkId-CCP|chunkType-head:CCP</nowiki> | <nowiki>0</nowiki> | <nowiki>main</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>5</nowiki> | <nowiki>गुस्सा</nowiki> | <nowiki>गुस्सा</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-gussA|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-50|name-gussA|chunkId-NP2|chunkType-head:NP2</nowiki> | <nowiki>9</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>6</nowiki> | <nowiki>सेलेब्रिटिज</nowiki> | <nowiki>सेलेब्रिटिज</nowiki> | <nowiki>NN</nowiki> | <nowiki>unk</nowiki> | <nowiki>lex-selebritija|cat-unk|gend-|num-|pers-|case-|vib-0_ko|tam-|posn-60|vpos-vib_2_RP|name-selebritija|chunkId-NP3|chunkType-head:NP3</nowiki> | <nowiki>9</nowiki> | <nowiki>k4a</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>7</nowiki> | <nowiki>को</nowiki> | <nowiki>को</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-70|chunkType-child:NP3|name-ko</nowiki> | <nowiki>6</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>8</nowiki> | <nowiki>भी</nowiki> | <nowiki>भी</nowiki> | <nowiki>RP</nowiki> | <nowiki>avy</nowiki> | <nowiki>lex-BI|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-80|chunkType-child:NP3|name-BI</nowiki> | <nowiki>6</nowiki> | <nowiki>lwg__rp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>9</nowiki> | <nowiki>आना</nowiki> | <nowiki>आ</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-A|cat-v|gend-any|num-any|pers-any|case-d|vib-nA|tam-nA|posn-90|name-AnA|chunkId-VGNN|chunkType-head:VGNN</nowiki> | <nowiki>11</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>10</nowiki> | <nowiki>लाजमी</nowiki> | <nowiki>लाजमी</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-lAjamI|cat-adj|gend-any|num-any|pers-|case-|vib-|tam-|posn-100|name-lAjamI|chunkId-JJP2|chunkType-head:JJP2</nowiki> | <nowiki>11</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>11</nowiki> | <nowiki>है</nowiki> | <nowiki>है</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-hE|cat-v|gend-any|num-sg|pers-3|case-|vib-hE|tam-hE|stype-declarative|posn-110|voicetype-active|name-hE|chunkId-VGF2|chunkType-head:VGF2</nowiki> | <nowiki>4</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>12</nowiki> | <nowiki>.</nowiki> | <nowiki>.</nowiki> | <nowiki>SYM</nowiki> | <nowiki>punc</nowiki> | <nowiki>lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-120|chunkType-child:VGF2|name-.</nowiki> | <nowiki>11</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| |||||||||| 
 +| <nowiki>1</nowiki> | <nowiki>बृहस्पतिवार</nowiki> | <nowiki>बृहस्पतिवार</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-bqhaspawivAra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-10|vpos-vib_2|name-bqhaspawivAra|chunkId-NP|chunkType-head:NP</nowiki> | <nowiki>6</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>2</nowiki> | <nowiki>को</nowiki> | <nowiki>को</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-20|chunkType-child:NP|name-ko</nowiki> | <nowiki>1</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>3</nowiki> | <nowiki>ज़ी</nowiki> | <nowiki>जी</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-jI|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-30|vpos-vib_2|name-jZI|chunkId-NP2|chunkType-head:NP2</nowiki> | <nowiki>6</nowiki> | <nowiki>k7</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>4</nowiki> | <nowiki>में</nowiki> | <nowiki>में</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-40|chunkType-child:NP2|name-meM</nowiki> | <nowiki>3</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>5</nowiki> | <nowiki>शुरू</nowiki> | <nowiki>शुरू</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-SurU|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-50|name-SurU|chunkId-NP3|chunkType-head:NP3</nowiki> | <nowiki>6</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>6</nowiki> | <nowiki>हुए</nowiki> | <nowiki>हो</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-ho|cat-v|gend-m|num-sg|pers-any|case-|vib-eM|tam-eM|posn-60|name-hue|chunkId-VGNF|chunkType-head:VGNF</nowiki> | <nowiki>10</nowiki> | <nowiki>nmod__k1inv</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>7</nowiki> | <nowiki>��वें</nowiki> | <nowiki>��वें</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-��veM|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-70|chunkType-child:NP4|name-��veM</nowiki> | <nowiki>10</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>8</nowiki> | <nowiki>अंतर्राष्ट्रीय</nowiki> | <nowiki>अंतर्राष्ट्रीय</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-aMwarrARtrIya|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-80|chunkType-child:NP4|name-aMwarrARtrIya</nowiki> | <nowiki>10</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>9</nowiki> | <nowiki>फिल्म</nowiki> | <nowiki>फिल्म</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-Pilma|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-90|chunkType-child:NP4|name-Pilma</nowiki> | <nowiki>10</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>10</nowiki> | <nowiki>महोत्सव</nowiki> | <nowiki>महोत्सव</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-mahowsava|cat-n|gend-m|num-sg|pers-|case-o|vib-0_kA|tam-0|posn-100|vpos-vib_5|name-mahowsava|chunkId-NP4|chunkType-head:NP4</nowiki> | <nowiki>12</nowiki> | <nowiki>r6</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>11</nowiki> | <nowiki>के</nowiki> | <nowiki>का</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-kA|cat-psp|gend-m|num-sg|pers-|case-o|vib-|tam-|posn-110|chunkType-child:NP4|name-ke</nowiki> | <nowiki>10</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>12</nowiki> | <nowiki>रंग</nowiki> | <nowiki>रंग</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-raMga|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-120|vpos-vib_2|name-raMga|chunkId-NP5|chunkType-head:NP5</nowiki> | <nowiki>17</nowiki> | <nowiki>k7</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>13</nowiki> | <nowiki>में</nowiki> | <nowiki>में</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-130|chunkType-child:NP5|name-meM2</nowiki> | <nowiki>12</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>14</nowiki> | <nowiki>भंग</nowiki> | <nowiki>भंग</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-BaMga|cat-adj|gend-any|num-any|pers-|case-any|vib-|tam-|posn-140|name-BaMga|chunkId-JJP|chunkType-head:JJP</nowiki> | <nowiki>17</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>15</nowiki> | <nowiki>उस</nowiki> | <nowiki>वह</nowiki> | <nowiki>DEM</nowiki> | <nowiki>pn</nowiki> | <nowiki>lex-vaha|cat-pn|gend-any|num-sg|pers-3|case-o|vib-|tam-|posn-150|chunkType-child:NP6|name-usa</nowiki> | <nowiki>16</nowiki> | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>16</nowiki> | <nowiki>समय</nowiki> | <nowiki>समय</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-samaya|cat-n|gend-any|num-sg|pers-3|case-d|vib-0|tam-0|posn-160|name-samaya|chunkId-NP6|chunkType-head:NP6</nowiki> | <nowiki>17</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>17</nowiki> | <nowiki>पड़ा</nowiki> | <nowiki>पड</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-pada|cat-v|gend-any|num-any|pers-any|case-|vib-yA|tam-yA|stype-declarative|posn-170|voicetype-active|name-padZA|chunkId-VGF|chunkType-head:VGF</nowiki> | <nowiki>0</nowiki> | <nowiki>main</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>18</nowiki> | <nowiki>जब</nowiki> | <nowiki>जब</nowiki> | <nowiki>PRP</nowiki> | <nowiki>pn</nowiki> | <nowiki>lex-jaba|cat-pn|gend-|num-|pers-|case-|vib-|tam-|posn-180|coref-samaya|name-jaba|chunkId-NP7|chunkType-head:NP7</nowiki> | <nowiki>32</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>19</nowiki> | <nowiki>वहां</nowiki> | <nowiki>वहाँ</nowiki> | <nowiki>PRP</nowiki> | <nowiki>pn</nowiki> | <nowiki>lex-vahAz|cat-pn|gend-|num-|pers-|case-|vib-0_para|tam-|posn-190|vpos-vib_2|name-vahAM|chunkId-NP8|chunkType-head:NP8</nowiki> | <nowiki>21</nowiki> | <nowiki>jjmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>20</nowiki> | <nowiki>पर</nowiki> | <nowiki>पर</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-para|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-200|chunkType-child:NP8|name-para</nowiki> | <nowiki>19</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>21</nowiki> | <nowiki>तैनात</nowiki> | <nowiki>तैनात</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-wEnAwa|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-210|name-wEnAwa|chunkId-JJP2|chunkType-head:JJP2</nowiki> | <nowiki>22</nowiki> | <nowiki>nmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>22</nowiki> | <nowiki>सुरक्षाकर्मियों</nowiki> | <nowiki>सुरक्षाकर्मी</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-surakRAkarmI|cat-n|gend-m|num-pl|pers-3|case-o|vib-0_ne|tam-0|posn-220|vpos-vib_2|name-surakRAkarmiyoM|chunkId-NP9|chunkType-head:NP9</nowiki> | <nowiki>32</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>23</nowiki> | <nowiki>ने</nowiki> | <nowiki>ने</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ne|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-230|chunkType-child:NP9|name-ne</nowiki> | <nowiki>22</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>24</nowiki> | <nowiki>बॉलीवुड</nowiki> | <nowiki>बॉलीवुड</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-bOYlIvuda|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_kA|tam-0|posn-240|vpos-vib_2|name-bOYlIvuda|chunkId-NP10|chunkType-head:NP10</nowiki> | <nowiki>28</nowiki> | <nowiki>r6</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>25</nowiki> | <nowiki>की</nowiki> | <nowiki>का</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-kA|cat-psp|gend-f|num-sg|pers-|case-o|vib-|tam-|posn-250|chunkType-child:NP10|name-kI</nowiki> | <nowiki>24</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>26</nowiki> | <nowiki>अभिनेत्री</nowiki> | <nowiki>अभिनेत्री</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-aBinewrI|cat-n|gend-f|num-sg|pers-3|case-o|vib-0|tam-0|posn-260|chunkType-child:NP11|name-aBinewrI</nowiki> | <nowiki>27</nowiki> | <nowiki>nmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>27</nowiki> | <nowiki>बिपाशा</nowiki> | <nowiki>बिपाशा</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-bipASA|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-270|chunkType-child:NP11|name-bipASA</nowiki> | <nowiki>28</nowiki> | <nowiki>nmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>28</nowiki> | <nowiki>बसु</nowiki> | <nowiki>बसु</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-basu|cat-n|gend-f|num-sg|pers-3|case-o|vib-0_ke_sAWa|tam-0|posn-280|vpos-vib_vib_vib_4_5|name-basu|chunkId-NP11|chunkType-head:NP11</nowiki> | <nowiki>32</nowiki> | <nowiki>k2</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>29</nowiki> | <nowiki>के</nowiki> | <nowiki>के</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ke|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-290|chunkType-child:NP11|name-ke2</nowiki> | <nowiki>28</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>30</nowiki> | <nowiki>साथ</nowiki> | <nowiki>साथ</nowiki> | <nowiki>NST</nowiki> | <nowiki>nst</nowiki> | <nowiki>lex-sAWa|cat-nst|gend-m|num-sg|pers-3|case-d|vib-|tam-|posn-300|chunkType-child:NP11|name-sAWa</nowiki> | <nowiki>28</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>31</nowiki> | <nowiki>दुव्यर्वहार</nowiki> | <nowiki>दुव्यर्वहार</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-xuvyarvahAra|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-310|name-xuvyarvahAra|chunkId-NP12|chunkType-head:NP12</nowiki> | <nowiki>32</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>32</nowiki> | <nowiki>किया</nowiki> | <nowiki>कर</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-kara|cat-v|gend-m|num-sg|pers-any|case-|vib-yA|tam-yA|stype-declarative|posn-320|voicetype-active|name-kiyA|chunkId-VGF2|chunkType-head:VGF2</nowiki> | <nowiki>16</nowiki> | <nowiki>nmod__relc</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| <nowiki>33</nowiki> | <nowiki>.</nowiki> | <nowiki>.</nowiki> | <nowiki>SYM</nowiki> | <nowiki>punc</nowiki> | <nowiki>lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-330|chunkType-child:VGF2|name-.</nowiki> | <nowiki>32</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
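
The correspondence between a Shakti token line and a CoNLL row in the tables above can be sketched as follows. This is a hedged illustration, not the converter actually used for this page: the names `parse_token` and `shakti_to_conll` are hypothetical, only single-quoted attributes (as in the Hindi sample) are handled, word forms and lemmas are left in WX rather than decoded to Devanagari, and a token without ''drel'' is assumed to be the root with label ''main''.

```python
import re

# Names of the eight comma-separated fields of the 'af' attribute.
AF_KEYS = ["lex", "cat", "gend", "num", "pers", "case", "vib", "tam"]
ATTR_RE = re.compile(r"([\w-]+)='([^']*)'")


def parse_token(line):
    """Split 'ID FORM TAG <fs ...>' into (id, form, tag, attrs) or None."""
    parts = line.split(None, 3)
    if len(parts) < 4 or not parts[0].isdigit():
        return None  # not a token line (e.g. <Sentence id="1">)
    attrs = dict(ATTR_RE.findall(parts[3]))  # keeps attribute order
    return int(parts[0]), parts[1], parts[2], attrs


def shakti_to_conll(lines):
    """Turn the token lines of one sentence into 10-column CoNLL rows."""
    tokens = [t for t in map(parse_token, lines) if t is not None]
    # 'drel' points at the 'name' of the parent; map names to token ids.
    name_to_id = {attrs.get("name", form): idx
                  for idx, form, _tag, attrs in tokens}
    rows = []
    for idx, form, tag, attrs in tokens:
        af = (attrs.get("af", "").split(",") + [""] * len(AF_KEYS))
        af = af[:len(AF_KEYS)]
        feats = [f"{k}-{v}" for k, v in zip(AF_KEYS, af)]
        feats += [f"{k}-{v}" for k, v in attrs.items()
                  if k not in ("af", "drel")]
        if "drel" in attrs:
            deprel, _, parent = attrs["drel"].partition(":")
            head = name_to_id.get(parent, 0)
        else:
            deprel, head = "main", 0  # assumption: no drel means root
        rows.append([str(idx), form, af[0], tag, af[1],
                     "|".join(feats), str(head), deprel, "_", "_"])
    return rows
```

Each returned row lists ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL and two unused columns, in the same order as the table columns above.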
  
 The first sentence of the ICON 2010 development data (with fine-grained syntactic tags) in the Shakti format:
