Hyderabad Dependency Treebank (HyDT-Hindi)
There has been no official release of the treebank yet. There have been three as-is sample releases for the purposes of the NLP tools contests in parsing Indian languages, attached to the ICON 2009 and 2010 conferences and the MTPIL workshop of COLING 2012.
There is no standard distribution channel for the treebank after the shared task evaluation period. Inquire at the LTRC (ltrc (at) iiit (dot) ac (dot) in) about the possibility of getting the data. The ICON 2010 and HPST 2012 license in short:
HyDT-Hindi is being created by members of the Language Technologies Research Centre, International Institute of Information Technology, Gachibowli, Hyderabad, 500032, India.
News domain corpus from ISI Kolkata.
HyDT-Hindi contains dependencies on two levels: between chunks and inside chunks. The ICON 2009 CoNLL-formatted version contained only dependencies between chunks, thus the node/tree ratio was much lower than in other treebanks. The ICON 2009 version came with a data split into three parts: training, development and test:
Part | Sentences | Chunks | Chunks per sentence |
---|---|---|---|
Training | 1501 | 13779 | 9.18 |
Development | 150 | 1250 | 8.33 |
Test | 150 | 1156 | 7.71 |
TOTAL | 1801 | 16185 | 8.99 |
The ICON 2010 version came with a data split into three parts: training, development and test. The intra-chunk dependencies have been added:
Part | Sentences | Chunks | Chunks per sentence | Words | Words per sentence |
---|---|---|---|---|---|
Training | 2972 | | | 64452 | 21.69 |
Development | 543 | | | 12616 | 23.23 |
Test | 321 | | | 6588 | 20.52 |
TOTAL | 3836 | | | 83656 | 21.81 |
I have counted the sentences and tokens (words) in the .conll files; the counts differ slightly from the statistics presented in (Husain et al., 2010).
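Counts like these can be reproduced with a few lines of Python. The following is a minimal sketch, assuming the usual CoNLL layout of one token per line and a blank line between sentences; the function name `count_conll` is made up for illustration and is not part of any treebank tool.

```python
def count_conll(lines):
    """Return (sentences, tokens) for an iterable of CoNLL-formatted lines.

    Assumption: every non-blank line is one token and sentences are
    separated by one or more blank lines.
    """
    sentences = 0
    tokens = 0
    in_sentence = False
    for line in lines:
        if line.strip():
            tokens += 1
            if not in_sentence:
                # First token after a blank line opens a new sentence.
                sentences += 1
                in_sentence = True
        else:
            in_sentence = False
    return sentences, tokens


# Tiny in-memory example: two sentences, three tokens.
sample = ["1\tbAwa\n", "2\tgalawa\n", "\n", "1\tho\n"]
n_sentences, n_tokens = count_conll(sample)
print(n_sentences, n_tokens, round(n_tokens / n_sentences, 2))
```

To apply it to a file, pass an open file handle (opened with `encoding="utf-8"`) instead of the list.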
The HTB 0.5 (2012) version came with a data split into three parts: training, development and test. The intra-chunk dependencies have been added:
Part | Sentences | Chunks | Chunks per sentence | Words | Words per sentence |
---|---|---|---|---|---|
Training | 12041 | | | 268093 | 22.27 |
Development | 1233 | | | 26416 | 21.42 |
Test | | | | | |
TOTAL | | | | | |
HTB 0.5 is distributed in Devanagari UTF-8 and in the WX encoding (see below), both in SSF and CoNLL formats, each with gold-standard and automatic morphology.
The rest of this section applies to the ICON datasets. It may or may not still be valid for HTB 0.5.
The text uses the WX encoding of Indian letters. If we know what the original script is (Devanagari in this case), we can map the WX encoding back to the original characters in UTF-8. WX uses Latin letters, so any embedded English (or other string written in Latin letters) will probably be lost during the conversion. Note that the WX-encoded data contain broken characters (\x{FFFD} REPLACEMENT CHARACTER), and not infrequently; the correct characters cannot be recovered automatically.
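The mapping from WX to Devanagari can be sketched as follows. This is a simplified illustration, not the converter used to produce the samples on this page: it covers the basic consonants, vowels, anusvara, visarga, candrabindu and nukta (`Z`), and it does not handle digraphs such as `OY` (which appears in the sample data as bOYlIvuda, बॉलीवुड). The function name `wx_to_devanagari` is an assumption made for this example.

```python
# WX consonant letters and their Devanagari counterparts.
CONSONANTS = {
    "k": "क", "K": "ख", "g": "ग", "G": "घ", "f": "ङ",
    "c": "च", "C": "छ", "j": "ज", "J": "झ", "F": "ञ",
    "t": "ट", "T": "ठ", "d": "ड", "D": "ढ", "N": "ण",
    "w": "त", "W": "थ", "x": "द", "X": "ध", "n": "न",
    "p": "प", "P": "फ", "b": "ब", "B": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "S": "श", "R": "ष", "s": "स", "h": "ह",
}
# WX vowel letter -> (independent vowel, dependent vowel sign).
VOWELS = {
    "a": ("अ", ""),        # inherent vowel: no sign after a consonant
    "A": ("आ", "\u093e"), "i": ("इ", "\u093f"), "I": ("ई", "\u0940"),
    "u": ("उ", "\u0941"), "U": ("ऊ", "\u0942"), "q": ("ऋ", "\u0943"),
    "e": ("ए", "\u0947"), "E": ("ऐ", "\u0948"),
    "o": ("ओ", "\u094b"), "O": ("औ", "\u094c"),
}
# Anusvara, visarga, candrabindu.
MODIFIERS = {"M": "\u0902", "H": "\u0903", "z": "\u0901"}
NUKTA = "\u093c"
VIRAMA = "\u094d"


def wx_to_devanagari(text):
    """Convert a WX-encoded string to Devanagari (simplified sketch)."""
    out = []
    pending = False  # True while the last consonant still awaits its vowel
    for ch in text:
        if ch in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # consonant cluster
            out.append(CONSONANTS[ch])
            pending = True
        elif ch == "Z" and pending:
            out.append(NUKTA)  # nukta attaches to the preceding consonant
        elif ch in VOWELS:
            independent, sign = VOWELS[ch]
            out.append(sign if pending else independent)
            pending = False
        elif ch in MODIFIERS:
            out.append(MODIFIERS[ch])
            pending = False
        else:
            # Punctuation, digits, spaces and unknown characters pass through.
            if pending:
                out.append(VIRAMA)
            out.append(ch)
            pending = False
    if pending:
        out.append(VIRAMA)
    return "".join(out)


print(wx_to_devanagari("bAwa galawa ho"))  # बात गलत हो
```

Because every Latin letter is interpreted as WX, embedded English words come out as Devanagari gibberish, which is the loss described above. A U+FFFD character simply passes through unchanged.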
Occasionally there are NULL nodes that do not correspond to any surface chunk or token. They represent elided participants.
The syntactic tags (dependency relation labels) are karaka relations, i.e. deep syntactic roles according to the Pāṇinian grammar. There are separate versions of the treebank with fine-grained and coarse-grained syntactic tags.
According to (Husain et al., 2010), in the ICON 2010 version, the chunk tags, POS tags, lemma, morphosyntactic features and inter-chunk dependencies (topology + tags) were annotated manually. The rest (intra-chunk dependencies, headword of chunk) was marked automatically. The tool for intra-chunk dependency parsing achieves about 96% accuracy.
Note: There have been cycles in the Hindi part of HyDT.
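A simple check for such cycles can be run on the HEAD column of each sentence before the data are passed to a parser. The sketch below assumes the heads are given as a list where position i holds the head of token i+1 and 0 denotes the artificial root; the function name `find_cycle` is hypothetical.

```python
def find_cycle(heads):
    """Return the token ids of one cycle in a dependency structure, or None.

    heads[i] is the head id of token i+1; 0 means the artificial root.
    Assumption: all head ids are within 0..len(heads).
    """
    n = len(heads)
    # 0 = unvisited, 1 = on the path being followed, 2 = known to reach the root
    state = [0] * (n + 1)
    state[0] = 2
    for start in range(1, n + 1):
        path = []
        node = start
        while state[node] == 0:
            state[node] = 1
            path.append(node)
            node = heads[node - 1]
        if state[node] == 1:
            # We came back to a node on the current path: that is a cycle.
            return path[path.index(node):]
        for visited in path:
            state[visited] = 2
    return None


# Heads of the first ICON 2010 training sentence shown on this page.
print(find_cycle([3, 3, 11, 0, 9, 9, 6, 6, 11, 11, 4, 11]))
# An artificial example where tokens 2 and 3 depend on each other.
print(find_cycle([0, 3, 2]))
```

Each token is visited at most once, so the check is linear in sentence length.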
The first two sentences of the ICON 2010 training data (with fine-grained syntactic tags) in the Shakti format:
<document docid="hi"> <head> <title> </title> <author> <firstname> </firstname> <middlename> </middlename> <lastname></lastname> </author> <availability format="electronic" /> <bibl> </bibl> <bytecount>8.0K</bytecount> <domain name="general" /> <creation creationdate="19/06/2007" institutename="IIIT Hyderabad"> <creatorname> <lastname>Dipti</lastname> <middlename> </middlename> <firstname>Sharma</firstname> </creatorname> </creation> <distributor>CLIA Consortia, DIT</distributor> <edition number="1.0" /> <encodingdesc> <newencoding>Unicode(UTF-8)</newencoding> <originalencoding>UTF-8</originalencoding> </encodingdesc> <sentencemarker marker=".">Specify Marker</sentencemarker> <language name="hi" writingsystem="LTR" script="Devanagari" /> <normalization normalized="no"> <utilityname>xxx.exe</utilityname> </normalization> <projectdesc name="ILMT" /> <pubaddress addresstype="web"> </pubaddress> <pubdate> <dateofpublication></dateofpublication> </pubdate> <publicationstmt type="copyrightfree"> </publicationstmt> <publisher> <name></name> <url>xxx.com</url> </publisher> <pubplace place="books" /> <wordcount>2 </wordcount> <caption>xuvryavahAra se biParIM bipASA Pilma mahowsava se vApasa lOta gaI bipASA govA. 
</caption> </caption> <annotated-resource name="HyDT-Hindi" version="2.0" type="dep-words" layers="morph,pos,chunk,dep-word" language="hin" date-of-release="20100823"> <annotation-standard> <morph-standard name="Anncorra-morph" version="1.31" date="20080920" /> <pos-standard name="Anncorra-pos" version="" date="20061215" /> <chunk-standard name="Anncorra-chunk" version="" date="20061215" /> <intrachunk-dependency-standard name="Anncorra-intrachunk-dep" version="1.0" date="" dep-tagset-granularity="5" /> <dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" /> </annotation-standard> </annotated-resource> </head> <body> <tb number="1" segment="no" bullet="no"> <foreign language="select" writingsystem="LTR"></foreign> <text> <Sentence id="1"> 1 bAwa NN <fs af='bAwa,n,f,sg,3,d,0,0' drel='k1:ho' posn='10' name='bAwa' chunkId='NP' chunkType='head:NP'> 2 galawa JJ <fs af='galawa,adj,any,any,,any,,' drel='k1s:ho' posn='20' name='galawa' chunkId='JJP' chunkType='head:JJP'> 3 ho VM <fs af='ho,v,any,any,any,,0,0' drel='vmod:hE' stype='declarative' posn='30' voicetype='active' name='ho' chunkId='VGF' chunkType='head:VGF'> 4 wo CC <fs af='wo,avy,,,,,,' posn='40' name='wo' chunkId='CCP' chunkType='head:CCP'> 5 gussA NN <fs af='gussA,n,m,sg,3,d,0,0' drel='pof:AnA' posn='50' name='gussA' chunkId='NP2' chunkType='head:NP2'> 6 selebritija NN <fs af='selebritija,unk,,,,,0_ko,' drel='k4a:AnA' posn='60' vpos='vib_2_RP' name='selebritija' chunkId='NP3' chunkType='head:NP3'> 7 ko PSP <fs af='ko,psp,,,,,,' posn='70' drel='lwg__psp:selebritija' chunkType='child:NP3' name='ko'> 8 BI RP <fs af='BI,avy,,,,,,' posn='80' drel='lwg__rp:selebritija' chunkType='child:NP3' name='BI'> 9 AnA VM <fs af='A,v,any,any,any,d,nA,nA' drel='k1:hE' posn='90' name='AnA' chunkId='VGNN' chunkType='head:VGNN'> 10 lAjamI JJ <fs af='lAjamI,adj,any,any,,,,' drel='pof:hE' posn='100' name='lAjamI' chunkId='JJP2' chunkType='head:JJP2'> 11 hE VM <fs af='hE,v,any,sg,3,,hE,hE' 
drel='ccof:wo' stype='declarative' posn='110' voicetype='active' name='hE' chunkId='VGF2' chunkType='head:VGF2'> 12 . SYM <fs af='.,punc,,,,,,' posn='120' drel='rsym:hE' chunkType='child:VGF2' name='.'> </Sentence> <Sentence id="2"> 1 bqhaspawivAra NNP <fs af='bqhaspawivAra,n,m,sg,3,o,0_ko,0' drel='k7t:hue' posn='10' vpos='vib_2' name='bqhaspawivAra' chunkId='NP' chunkType='head:NP'> 2 ko PSP <fs af='ko,psp,,,,,,' posn='20' drel='lwg__psp:bqhaspawivAra' chunkType='child:NP' name='ko'> 3 jZI NNP <fs af='jI,n,m,sg,3,o,0_meM,0' drel='k7:hue' posn='30' vpos='vib_2' name='jZI' chunkId='NP2' chunkType='head:NP2'> 4 meM PSP <fs af='meM,psp,,,,,,' posn='40' drel='lwg__psp:jZI' chunkType='child:NP2' name='meM'> 5 SurU NN <fs af='SurU,n,m,sg,3,d,0,0' drel='pof:hue' posn='50' name='SurU' chunkId='NP3' chunkType='head:NP3'> 6 hue VM <fs af='ho,v,m,sg,any,,eM,eM' drel='nmod__k1inv:mahowsava' posn='60' name='hue' chunkId='VGNF' chunkType='head:VGNF'> 7 ��veM XC <fs af='��veM,n,m,sg,3,d,0,0' posn='70' drel='mod:mahowsava' chunkType='child:NP4' name='��veM'> 8 aMwarrARtrIya XC <fs af='aMwarrARtrIya,n,m,sg,3,d,0,0' posn='80' drel='mod:mahowsava' chunkType='child:NP4' name='aMwarrARtrIya'> 9 Pilma XC <fs af='Pilma,n,f,sg,3,d,0,0' posn='90' drel='mod:mahowsava' chunkType='child:NP4' name='Pilma'> 10 mahowsava NNP <fs af='mahowsava,n,m,sg,,o,0_kA,0' drel='r6:raMga' posn='100' vpos='vib_5' name='mahowsava' chunkId='NP4' chunkType='head:NP4'> 11 ke PSP <fs af='kA,psp,m,sg,,o,,' posn='110' drel='lwg__psp:mahowsava' chunkType='child:NP4' name='ke'> 12 raMga NN <fs af='raMga,n,m,sg,3,o,0_meM,0' drel='k7:padZA' posn='120' vpos='vib_2' name='raMga' chunkId='NP5' chunkType='head:NP5'> 13 meM PSP <fs af='meM,psp,,,,,,' posn='130' drel='lwg__psp:raMga' chunkType='child:NP5' name='meM2'> 14 BaMga JJ <fs af='BaMga,adj,any,any,,any,,' drel='pof:padZA' posn='140' name='BaMga' chunkId='JJP' chunkType='head:JJP'> 15 usa DEM <fs af='vaha,pn,any,sg,3,o,,' posn='150' drel='nmod__adj:samaya' 
chunkType='child:NP6' name='usa'> 16 samaya NN <fs af='samaya,n,any,sg,3,d,0,0' drel='k7t:padZA' posn='160' name='samaya' chunkId='NP6' chunkType='head:NP6'> 17 padZA VM <fs af='pada,v,any,any,any,,yA,yA' stype='declarative' posn='170' voicetype='active' name='padZA' chunkId='VGF' chunkType='head:VGF'> 18 jaba PRP <fs af='jaba,pn,,,,,,' drel='k7t:kiyA' posn='180' coref='samaya' name='jaba' chunkId='NP7' chunkType='head:NP7'> 19 vahAM PRP <fs af='vahAz,pn,,,,,0_para,' drel='jjmod:wEnAwa' posn='190' vpos='vib_2' name='vahAM' chunkId='NP8' chunkType='head:NP8'> 20 para PSP <fs af='para,psp,,,,,,' posn='200' drel='lwg__psp:vahAM' chunkType='child:NP8' name='para'> 21 wEnAwa JJ <fs af='wEnAwa,adj,any,any,,o,,' drel='nmod:surakRAkarmiyoM' posn='210' name='wEnAwa' chunkId='JJP2' chunkType='head:JJP2'> 22 surakRAkarmiyoM NN <fs af='surakRAkarmI,n,m,pl,3,o,0_ne,0' drel='k1:kiyA' posn='220' vpos='vib_2' name='surakRAkarmiyoM' chunkId='NP9' chunkType='head:NP9'> 23 ne PSP <fs af='ne,psp,,,,,,' posn='230' drel='lwg__psp:surakRAkarmiyoM' chunkType='child:NP9' name='ne'> 24 bOYlIvuda NN <fs af='bOYlIvuda,n,m,sg,3,o,0_kA,0' drel='r6:basu' posn='240' vpos='vib_2' name='bOYlIvuda' chunkId='NP10' chunkType='head:NP10'> 25 kI PSP <fs af='kA,psp,f,sg,,o,,' posn='250' drel='lwg__psp:bOYlIvuda' chunkType='child:NP10' name='kI'> 26 aBinewrI NN <fs af='aBinewrI,n,f,sg,3,o,0,0' posn='260' drel='nmod:bipASA' chunkType='child:NP11' name='aBinewrI'> 27 bipASA NN <fs af='bipASA,n,f,sg,3,d,0,0' posn='270' drel='nmod:basu' chunkType='child:NP11' name='bipASA'> 28 basu NNP <fs af='basu,n,f,sg,3,o,0_ke_sAWa,0' drel='k2:kiyA' posn='280' vpos='vib_vib_vib_4_5' name='basu' chunkId='NP11' chunkType='head:NP11'> 29 ke PSP <fs af='ke,psp,,,,,,' posn='290' drel='lwg__psp:basu' chunkType='child:NP11' name='ke2'> 30 sAWa NST <fs af='sAWa,nst,m,sg,3,d,,' posn='300' drel='lwg__psp:basu' chunkType='child:NP11' name='sAWa'> 31 xuvyarvahAra NN <fs af='xuvyarvahAra,n,m,sg,3,d,0,0' drel='pof:kiyA' posn='310' 
name='xuvyarvahAra' chunkId='NP12' chunkType='head:NP12'> 32 kiyA VM <fs af='kara,v,m,sg,any,,yA,yA' drel='nmod__relc:samaya' stype='declarative' posn='320' voicetype='active' name='kiyA' chunkId='VGF2' chunkType='head:VGF2'> 33 . SYM <fs af='.,punc,,,,,,' posn='330' drel='rsym:kiyA' chunkType='child:VGF2' name='.'> </Sentence>
The same two sentences converted to the CoNLL format, WX characters decoded back to Devanagari in UTF-8:
1 | बात | बात | NN | n | lex-bAwa|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-10|name-bAwa|chunkId-NP|chunkType-head:NP | 3 | k1 | _ | _ |
2 | गलत | गलत | JJ | adj | lex-galawa|cat-adj|gend-any|num-any|pers-|case-any|vib-|tam-|posn-20|name-galawa|chunkId-JJP|chunkType-head:JJP | 3 | k1s | _ | _ |
3 | हो | हो | VM | v | lex-ho|cat-v|gend-any|num-any|pers-any|case-|vib-0|tam-0|stype-declarative|posn-30|voicetype-active|name-ho|chunkId-VGF|chunkType-head:VGF | 11 | vmod | _ | _ |
4 | तो | तो | CC | avy | lex-wo|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-40|name-wo|chunkId-CCP|chunkType-head:CCP | 0 | main | _ | _ |
5 | गुस्सा | गुस्सा | NN | n | lex-gussA|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-50|name-gussA|chunkId-NP2|chunkType-head:NP2 | 9 | pof | _ | _ |
6 | सेलेब्रिटिज | सेलेब्रिटिज | NN | unk | lex-selebritija|cat-unk|gend-|num-|pers-|case-|vib-0_ko|tam-|posn-60|vpos-vib_2_RP|name-selebritija|chunkId-NP3|chunkType-head:NP3 | 9 | k4a | _ | _ |
7 | को | को | PSP | psp | lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-70|chunkType-child:NP3|name-ko | 6 | lwg__psp | _ | _ |
8 | भी | भी | RP | avy | lex-BI|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-80|chunkType-child:NP3|name-BI | 6 | lwg__rp | _ | _ |
9 | आना | आ | VM | v | lex-A|cat-v|gend-any|num-any|pers-any|case-d|vib-nA|tam-nA|posn-90|name-AnA|chunkId-VGNN|chunkType-head:VGNN | 11 | k1 | _ | _ |
10 | लाजमी | लाजमी | JJ | adj | lex-lAjamI|cat-adj|gend-any|num-any|pers-|case-|vib-|tam-|posn-100|name-lAjamI|chunkId-JJP2|chunkType-head:JJP2 | 11 | pof | _ | _ |
11 | है | है | VM | v | lex-hE|cat-v|gend-any|num-sg|pers-3|case-|vib-hE|tam-hE|stype-declarative|posn-110|voicetype-active|name-hE|chunkId-VGF2|chunkType-head:VGF2 | 4 | ccof | _ | _ |
12 | . | . | SYM | punc | lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-120|chunkType-child:VGF2|name-. | 11 | rsym | _ | _ |
1 | बृहस्पतिवार | बृहस्पतिवार | NNP | n | lex-bqhaspawivAra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-10|vpos-vib_2|name-bqhaspawivAra|chunkId-NP|chunkType-head:NP | 6 | k7t | _ | _ |
2 | को | को | PSP | psp | lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-20|chunkType-child:NP|name-ko | 1 | lwg__psp | _ | _ |
3 | ज़ी | जी | NNP | n | lex-jI|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-30|vpos-vib_2|name-jZI|chunkId-NP2|chunkType-head:NP2 | 6 | k7 | _ | _ |
4 | में | में | PSP | psp | lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-40|chunkType-child:NP2|name-meM | 3 | lwg__psp | _ | _ |
5 | शुरू | शुरू | NN | n | lex-SurU|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-50|name-SurU|chunkId-NP3|chunkType-head:NP3 | 6 | pof | _ | _ |
6 | हुए | हो | VM | v | lex-ho|cat-v|gend-m|num-sg|pers-any|case-|vib-eM|tam-eM|posn-60|name-hue|chunkId-VGNF|chunkType-head:VGNF | 10 | nmod__k1inv | _ | _ |
7 | ��वें | ��वें | XC | n | lex-��veM|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-70|chunkType-child:NP4|name-��veM | 10 | mod | _ | _ |
8 | अंतर्राष्ट्रीय | अंतर्राष्ट्रीय | XC | n | lex-aMwarrARtrIya|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-80|chunkType-child:NP4|name-aMwarrARtrIya | 10 | mod | _ | _ |
9 | फिल्म | फिल्म | XC | n | lex-Pilma|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-90|chunkType-child:NP4|name-Pilma | 10 | mod | _ | _ |
10 | महोत्सव | महोत्सव | NNP | n | lex-mahowsava|cat-n|gend-m|num-sg|pers-|case-o|vib-0_kA|tam-0|posn-100|vpos-vib_5|name-mahowsava|chunkId-NP4|chunkType-head:NP4 | 12 | r6 | _ | _ |
11 | के | का | PSP | psp | lex-kA|cat-psp|gend-m|num-sg|pers-|case-o|vib-|tam-|posn-110|chunkType-child:NP4|name-ke | 10 | lwg__psp | _ | _ |
12 | रंग | रंग | NN | n | lex-raMga|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-120|vpos-vib_2|name-raMga|chunkId-NP5|chunkType-head:NP5 | 17 | k7 | _ | _ |
13 | में | में | PSP | psp | lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-130|chunkType-child:NP5|name-meM2 | 12 | lwg__psp | _ | _ |
14 | भंग | भंग | JJ | adj | lex-BaMga|cat-adj|gend-any|num-any|pers-|case-any|vib-|tam-|posn-140|name-BaMga|chunkId-JJP|chunkType-head:JJP | 17 | pof | _ | _ |
15 | उस | वह | DEM | pn | lex-vaha|cat-pn|gend-any|num-sg|pers-3|case-o|vib-|tam-|posn-150|chunkType-child:NP6|name-usa | 16 | nmod__adj | _ | _ |
16 | समय | समय | NN | n | lex-samaya|cat-n|gend-any|num-sg|pers-3|case-d|vib-0|tam-0|posn-160|name-samaya|chunkId-NP6|chunkType-head:NP6 | 17 | k7t | _ | _ |
17 | पड़ा | पड | VM | v | lex-pada|cat-v|gend-any|num-any|pers-any|case-|vib-yA|tam-yA|stype-declarative|posn-170|voicetype-active|name-padZA|chunkId-VGF|chunkType-head:VGF | 0 | main | _ | _ |
18 | जब | जब | PRP | pn | lex-jaba|cat-pn|gend-|num-|pers-|case-|vib-|tam-|posn-180|coref-samaya|name-jaba|chunkId-NP7|chunkType-head:NP7 | 32 | k7t | _ | _ |
19 | वहां | वहाँ | PRP | pn | lex-vahAz|cat-pn|gend-|num-|pers-|case-|vib-0_para|tam-|posn-190|vpos-vib_2|name-vahAM|chunkId-NP8|chunkType-head:NP8 | 21 | jjmod | _ | _ |
20 | पर | पर | PSP | psp | lex-para|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-200|chunkType-child:NP8|name-para | 19 | lwg__psp | _ | _ |
21 | तैनात | तैनात | JJ | adj | lex-wEnAwa|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-210|name-wEnAwa|chunkId-JJP2|chunkType-head:JJP2 | 22 | nmod | _ | _ |
22 | सुरक्षाकर्मियों | सुरक्षाकर्मी | NN | n | lex-surakRAkarmI|cat-n|gend-m|num-pl|pers-3|case-o|vib-0_ne|tam-0|posn-220|vpos-vib_2|name-surakRAkarmiyoM|chunkId-NP9|chunkType-head:NP9 | 32 | k1 | _ | _ |
23 | ने | ने | PSP | psp | lex-ne|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-230|chunkType-child:NP9|name-ne | 22 | lwg__psp | _ | _ |
24 | बॉलीवुड | बॉलीवुड | NN | n | lex-bOYlIvuda|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_kA|tam-0|posn-240|vpos-vib_2|name-bOYlIvuda|chunkId-NP10|chunkType-head:NP10 | 28 | r6 | _ | _ |
25 | की | का | PSP | psp | lex-kA|cat-psp|gend-f|num-sg|pers-|case-o|vib-|tam-|posn-250|chunkType-child:NP10|name-kI | 24 | lwg__psp | _ | _ |
26 | अभिनेत्री | अभिनेत्री | NN | n | lex-aBinewrI|cat-n|gend-f|num-sg|pers-3|case-o|vib-0|tam-0|posn-260|chunkType-child:NP11|name-aBinewrI | 27 | nmod | _ | _ |
27 | बिपाशा | बिपाशा | NN | n | lex-bipASA|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-270|chunkType-child:NP11|name-bipASA | 28 | nmod | _ | _ |
28 | बसु | बसु | NNP | n | lex-basu|cat-n|gend-f|num-sg|pers-3|case-o|vib-0_ke_sAWa|tam-0|posn-280|vpos-vib_vib_vib_4_5|name-basu|chunkId-NP11|chunkType-head:NP11 | 32 | k2 | _ | _ |
29 | के | के | PSP | psp | lex-ke|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-290|chunkType-child:NP11|name-ke2 | 28 | lwg__psp | _ | _ |
30 | साथ | साथ | NST | nst | lex-sAWa|cat-nst|gend-m|num-sg|pers-3|case-d|vib-|tam-|posn-300|chunkType-child:NP11|name-sAWa | 28 | lwg__psp | _ | _ |
31 | दुव्यर्वहार | दुव्यर्वहार | NN | n | lex-xuvyarvahAra|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-310|name-xuvyarvahAra|chunkId-NP12|chunkType-head:NP12 | 32 | pof | _ | _ |
32 | किया | कर | VM | v | lex-kara|cat-v|gend-m|num-sg|pers-any|case-|vib-yA|tam-yA|stype-declarative|posn-320|voicetype-active|name-kiyA|chunkId-VGF2|chunkType-head:VGF2 | 16 | nmod__relc | _ | _ |
33 | . | . | SYM | punc | lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-330|chunkType-child:VGF2|name-. | 32 | rsym | _ | _ |
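The correspondence between the Shakti token lines and the CoNLL rows above can be sketched in Python. This is an illustration inferred from the two samples on this page, not the actual conversion script: it assumes that the eight `af` fields map to lex, cat, gend, num, pers, case, vib and tam, that the remaining attributes except `drel` are appended to FEATS in their original order, that `drel` names the head by its `name` attribute, and that a token without `drel` becomes the root with the label `main`. The function name `ssf_to_conll` is hypothetical, and the WX-to-Devanagari step is left out.

```python
import re

TOKEN_RE = re.compile(r"^(\d+)\s+(\S+)\s+(\S+)\s+<fs\s+(.*)>$")
ATTR_RE = re.compile(r"(\w+)='([^']*)'")
AF_FIELDS = ["lex", "cat", "gend", "num", "pers", "case", "vib", "tam"]


def ssf_to_conll(token_lines):
    """Convert the token lines of one sentence to CoNLL rows (lists of 10 strings)."""
    parsed = []
    for line in token_lines:
        match = TOKEN_RE.match(line.strip())
        if match is None:
            continue  # skip <Sentence> tags and anything that is not a token
        index, form, pos, fs = match.groups()
        parsed.append((int(index), form, pos, ATTR_RE.findall(fs)))

    # drel values refer to the head by its 'name' attribute.
    id_by_name = {}
    for index, _, _, attrs in parsed:
        name = dict(attrs).get("name")
        if name is not None:
            id_by_name[name] = index

    rows = []
    for index, form, pos, attrs in parsed:
        attr_map = dict(attrs)
        af = attr_map.get("af", "").split(",")
        af += [""] * (len(AF_FIELDS) - len(af))
        feats = [f"{key}-{value}" for key, value in zip(AF_FIELDS, af)]
        feats += [f"{key}-{value}" for key, value in attrs
                  if key not in ("af", "drel")]
        if "drel" in attr_map:
            deprel, head_name = attr_map["drel"].split(":", 1)
            head = id_by_name[head_name]
        else:
            deprel, head = "main", 0  # assumption: no drel means root
        lemma = af[0] or form  # assumption: fall back to the form if lex is empty
        rows.append([str(index), form, lemma, pos, af[1],
                     "|".join(feats), str(head), deprel, "_", "_"])
    return rows


# Three tokens taken from the first training sentence (abridged).
sample = [
    "4 wo CC <fs af='wo,avy,,,,,,' posn='40' name='wo' chunkId='CCP' chunkType='head:CCP'>",
    "11 hE VM <fs af='hE,v,any,sg,3,,hE,hE' drel='ccof:wo' stype='declarative' "
    "posn='110' voicetype='active' name='hE' chunkId='VGF2' chunkType='head:VGF2'>",
    "12 . SYM <fs af='.,punc,,,,,,' posn='120' drel='rsym:hE' chunkType='child:VGF2' name='.'>",
]
rows = ssf_to_conll(sample)
for row in rows:
    print("\t".join(row))
```

Splitting `af` naively on commas also reproduces the shifted fields of a comma token (`lex-|cat-s|gend-punc`) that appear in the development sample further down.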
The first sentence of the ICON 2010 development data (with fine-grained syntactic tags) in the Shakti format:
<document docid="fullnews_id_2489467"> <head> <caption>jela meM svasWa hE sarabajIwa xo BArawIya aXikAriyoM ne mulAkAwa kI pre isalAmAbAxa.</caption> <language>Hindi </language> <domain_name>News Articles </domain_name> <word_count>524</word_count> <byte_count>64554</byte_count> <availability> <format>CML/SSF</format> <sentence_marker>.</sentence_marker> <normalization>No</normalization> </availability> <encoding_description> <original_encoding>ISO 8859</format> <new_encoding>Unicode UTF8</new_encoding> </encoding_description> <distributor>LTRC, IIIT Hyderabad</distributor> <project_description>NSF Hindi/Urdu Dependency Treebanking Project</place> <creation> </raw_corpus creation_date="" institute_name="IIIT Hyderabad"> </annotated_corpus creation_date="06/01/2009" institute_name="IIIT Hyderabad"> <edition_number>1.0</edition_number> </creation> <publication> <place>New Delhi</place> <date>30/5/2004</date> <type>Newspaper</type> <publisher> <name>Amar Ujala</name> <url>http://www.amarujala.com</url> </publisher> </publication> <annotated-resource name="HyDT-Hindi" version="2.0" type="dep-words" layers="morph,pos,chunk,dep-word" language="hin" date-of-release="20100831"> <annotation-standard> <morph-standard name="Anncorra-morph" version="1.31" date="20080920" /> <pos-standard name="Anncorra-pos" version="" date="20061215" /> <chunk-standard name="Anncorra-chunk" version="" date="20061215" /> <intrachunk-dependency-standard name="Anncorra-intrachunk-dep" version="1.0" date="" dep-tagset-granularity="5" /> <dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" /> </annotation-standard> </annotated-resource> </head> <body> <tb number="1" segment="no" bullet="no"> <foreign language="select" writingsystem="LTR"></foreign> <text> <Sentence id="1"> 1 kota XC <fs af='kota,n,m,sg,3,d,0,0' posn='10' drel='mod:lAhOra' chunkType='child:NP' name='kota'> 2 laKapawa XC <fs af='laKapawa,n,m,sg,3,d,0,0' posn='20' drel='mod:lAhOra' 
chunkType='child:NP' name='laKapawa'> 3 jela XC <fs af='jela,n,m,sg,3,d,0,0' posn='30' drel='mod:lAhOra' chunkType='child:NP' name='jela'> 4 lAhOra NNP <fs af='lAhOra,n,m,sg,3,o,0_meM,0' drel='jjmod:baMxa' posn='40' vpos='vib_5' name='lAhOra' chunkId='NP' chunkType='head:NP'> 5 meM PSP <fs af='meM,psp,,,,,,' posn='50' drel='lwg__psp:lAhOra' chunkType='child:NP' name='meM'> 6 baMxa JJ <fs af='baMxa,adj,any,any,,o,,' drel='nmod:siMha' posn='60' name='baMxa' chunkId='JJP' chunkType='head:JJP'> 7 sarabajIwa XC <fs af='sarabajIwa,n,m,sg,3,d,0,0' posn='70' drel='mod:siMha' chunkType='child:NP2' name='sarabajIwa'> 8 siMha NNP <fs af='siMha,n,m,sg,3,o,0_ne,0' drel='k1:xIM' posn='80' vpos='vib_3' name='siMha' chunkId='NP2' chunkType='head:NP2'> 9 ne PSP <fs af='ne,psp,,,,,,' posn='90' drel='lwg__psp:siMha' chunkType='child:NP2' name='ne'> 10 maMgalavAra NNP <fs af='maMgalavAra,n,m,sg,3,o,0_ko,0' drel='k7t:xIM' posn='100' vpos='vib_2' name='maMgalavAra' chunkId='NP3' chunkType='head:NP3'> 11 ko PSP <fs af='ko,psp,,,,,,' posn='110' drel='lwg__psp:maMgalavAra' chunkType='child:NP3' name='ko'> 12 BArawIya JJ <fs af='BArawIya,adj,any,any,,o,,' posn='120' drel='nmod__adj:xUwAvAsa' chunkType='child:NP4' name='BArawIya'> 13 xUwAvAsa NN <fs af='xUwAvAsa,n,m,sg,3,o,0_kA,0' drel='r6:aXikAriyoM' posn='130' vpos='vib_3' name='xUwAvAsa' chunkId='NP4' chunkType='head:NP4'> 14 ke PSP <fs af='kA,psp,m,pl,,o,,' posn='140' drel='lwg__psp:xUwAvAsa' chunkType='child:NP4' name='ke'> 15 xo QC <fs af='xo,num,any,pl,,o,,' posn='150' drel='nmod__adj:aXikAriyoM' chunkType='child:NP5' name='xo'> 16 aXikAriyoM NN <fs af='aXikArI,n,m,pl,3,o,0_ko,0' drel='k4:xIM' posn='160' vpos='vib_3' name='aXikAriyoM' chunkId='NP5' chunkType='head:NP5'> 17 ko PSP <fs af='ko,psp,,,,,,' posn='170' drel='lwg__psp:aXikAriyoM' chunkType='child:NP5' name='ko2'> 18 apane PRP <fs af='apanA,pn,any,sg,1,o,0_bAre_meM,0' drel='k7:xIM' posn='180' vpos='vib_2_3' name='apane' chunkId='NP6' chunkType='head:NP6'> 19 bAre PSP <fs 
af='bAre,psp,,,,,,' posn='190' drel='lwg__psp:apane' chunkType='child:NP6' name='bAre'> 20 meM PSP <fs af='meM,psp,,,,,,' posn='200' drel='lwg__psp:apane' chunkType='child:NP6' name='meM2'> 21 wamAma JJ <fs af='wamAma,adj,any,any,,d,,' posn='210' drel='nmod__adj:jAnakAriyAM' chunkType='child:NP7' name='wamAma'> 22 vyakwigawa JJ <fs af='vyakwigawa,adj,any,any,,d,,' posn='220' drel='nmod__adj:jAnakAriyAM' chunkType='child:NP7' name='vyakwigawa'> 23 jAnakAriyAM NN <fs af='jAnakAriyAM,n,f,pl,3,d,0,0' drel='k2:xIM' posn='230' name='jAnakAriyAM' chunkId='NP7' chunkType='head:NP7'> 24 xIM VM <fs af='xe,v,f,pl,3,,yA,yA' stype='declarative' posn='240' voicetype='active' name='xIM' chunkId='VGF' chunkType='head:VGF'> 25 ki CC <fs af='ki,avy,,,,,,' drel='rs:jAnakAriyAM' posn='250' name='ki' chunkId='CCP' chunkType='head:CCP'> 26 kina WQ <fs af='kOna,pn,any,pl,3,o,,' posn='260' drel='mod__wq:parisWiwiyoM' chunkType='child:NP8' name='kina'> 27 parisWiwiyoM NN <fs af='parisWiwi,n,f,pl,3,o,0_meM,0' drel='k7:kiyA' posn='270' vpos='vib_3' name='parisWiwiyoM' chunkId='NP8' chunkType='head:NP8'> 28 meM PSP <fs af='meM,psp,,,,,,' posn='280' drel='lwg__psp:parisWiwiyoM' chunkType='child:NP8' name='meM3'> 29 use PRP <fs af='vaha,pn,any,sg,3,o,ko,ko' drel='k2:kiyA' posn='290' name='use' chunkId='NP9' chunkType='head:NP9'> 30 giraPwAra JJ <fs af='giraPwAra,adj,any,any,,,,' drel='pof:kiyA' posn='300' name='giraPwAra' chunkId='JJP2' chunkType='head:JJP2'> 31 kiyA VM <fs af='kara,v,m,sg,3,,yA_jA+yA�,yA' drel='ccof:Ora' stype='declarative' posn='310' voicetype='passive' vpos='tam_2' name='kiyA' chunkId='VGF2' chunkType='head:VGF2'> 32 gayA VAUX <fs af='jA,v,m,sg,3,,yA�,yA1' posn='320' drel='lwg__vaux:kiyA' chunkType='child:VGF2' name='gayA'> 33 , SYM <fs af=',s,punc,,,,,' posn='330' drel='rsym:kiyA' chunkType='child:VGF2' name=','> 34 mukaxamA NN <fs af='mukaxamA,n,m,sg,3,d,0,0' drel='k1:calA' posn='340' name='mukaxamA' chunkId='NP10' chunkType='head:NP10'> 35 calA VM <fs 
af='cala,v,m,sg,3,,yA,yA' hlt='true' drel='ccof:Ora' stype='declarative' posn='350' voicetype='active' name='calA' chunkId='VGF3' chunkType='head:VGF3'> 36 Ora CC <fs af='Ora,avy,,,,,,' drel='ccof:ki' posn='360' name='Ora' chunkId='CCP2' chunkType='head:CCP2'> 37 sajA NN <fs af='sajA,n,f,sg,3,d,0,0' drel='k1:huI' posn='370' name='sajA' chunkId='NP11' chunkType='head:NP11'> 38 huI VM <fs af='ho,v,f,sg,3,,yA,yA' drel='ccof:Ora' stype='declarative' posn='380' voicetype='active' name='huI' chunkId='VGF4' chunkType='head:VGF4'> 39 . SYM <fs af='.,punc,,,,,,' posn='390' drel='rsym:huI' chunkType='child:VGF4' name='.'> </Sentence>
And in the CoNLL format:
1 | kota | kota | XC | n | lex-kota|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-10|chunkType-child:NP|name-kota | 4 | mod | _ | _ |
2 | laKapawa | laKapawa | XC | n | lex-laKapawa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-20|chunkType-child:NP|name-laKapawa | 4 | mod | _ | _ |
3 | jela | jela | XC | n | lex-jela|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-30|chunkType-child:NP|name-jela | 4 | mod | _ | _ |
4 | lAhOra | lAhOra | NNP | n | lex-lAhOra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-40|vpos-vib_5|name-lAhOra|chunkId-NP|chunkType-head:NP | 6 | jjmod | _ | _ |
5 | meM | meM | PSP | psp | lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-50|chunkType-child:NP|name-meM | 4 | lwg__psp | _ | _ |
6 | baMxa | baMxa | JJ | adj | lex-baMxa|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-60|name-baMxa|chunkId-JJP|chunkType-head:JJP | 8 | nmod | _ | _ |
7 | sarabajIwa | sarabajIwa | XC | n | lex-sarabajIwa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-70|chunkType-child:NP2|name-sarabajIwa | 8 | mod | _ | _ |
8 | siMha | siMha | NNP | n | lex-siMha|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ne|tam-0|posn-80|vpos-vib_3|name-siMha|chunkId-NP2|chunkType-head:NP2 | 24 | k1 | _ | _ |
9 | ne | ne | PSP | psp | lex-ne|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-90|chunkType-child:NP2|name-ne | 8 | lwg__psp | _ | _ |
10 | maMgalavAra | maMgalavAra | NNP | n | lex-maMgalavAra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-100|vpos-vib_2|name-maMgalavAra|chunkId-NP3|chunkType-head:NP3 | 24 | k7t | _ | _ |
11 | ko | ko | PSP | psp | lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-110|chunkType-child:NP3|name-ko | 10 | lwg__psp | _ | _ |
12 | BArawIya | BArawIya | JJ | adj | lex-BArawIya|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-120|chunkType-child:NP4|name-BArawIya | 13 | nmod__adj | _ | _ |
13 | xUwAvAsa | xUwAvAsa | NN | n | lex-xUwAvAsa|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_kA|tam-0|posn-130|vpos-vib_3|name-xUwAvAsa|chunkId-NP4|chunkType-head:NP4 | 16 | r6 | _ | _ |
14 | ke | kA | PSP | psp | lex-kA|cat-psp|gend-m|num-pl|pers-|case-o|vib-|tam-|posn-140|chunkType-child:NP4|name-ke | 13 | lwg__psp | _ | _ |
15 | xo | xo | QC | num | lex-xo|cat-num|gend-any|num-pl|pers-|case-o|vib-|tam-|posn-150|chunkType-child:NP5|name-xo | 16 | nmod__adj | _ | _ |
16 | aXikAriyoM | aXikArI | NN | n | lex-aXikArI|cat-n|gend-m|num-pl|pers-3|case-o|vib-0_ko|tam-0|posn-160|vpos-vib_3|name-aXikAriyoM|chunkId-NP5|chunkType-head:NP5 | 24 | k4 | _ | _ |
17 | ko | ko | PSP | psp | lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-170|chunkType-child:NP5|name-ko2 | 16 | lwg__psp | _ | _ |
18 | apane | apanA | PRP | pn | lex-apanA|cat-pn|gend-any|num-sg|pers-1|case-o|vib-0_bAre_meM|tam-0|posn-180|vpos-vib_2_3|name-apane|chunkId-NP6|chunkType-head:NP6 | 24 | k7 | _ | _ |
19 | bAre | bAre | PSP | psp | lex-bAre|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-190|chunkType-child:NP6|name-bAre | 18 | lwg__psp | _ | _ |
20 | meM | meM | PSP | psp | lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-200|chunkType-child:NP6|name-meM2 | 18 | lwg__psp | _ | _ |
21 | wamAma | wamAma | JJ | adj | lex-wamAma|cat-adj|gend-any|num-any|pers-|case-d|vib-|tam-|posn-210|chunkType-child:NP7|name-wamAma | 23 | nmod__adj | _ | _ |
22 | vyakwigawa | vyakwigawa | JJ | adj | lex-vyakwigawa|cat-adj|gend-any|num-any|pers-|case-d|vib-|tam-|posn-220|chunkType-child:NP7|name-vyakwigawa | 23 | nmod__adj | _ | _ |
23 | jAnakAriyAM | jAnakAriyAM | NN | n | lex-jAnakAriyAM|cat-n|gend-f|num-pl|pers-3|case-d|vib-0|tam-0|posn-230|name-jAnakAriyAM|chunkId-NP7|chunkType-head:NP7 | 24 | k2 | _ | _ |
24 | xIM | xe | VM | v | lex-xe|cat-v|gend-f|num-pl|pers-3|case-|vib-yA|tam-yA|stype-declarative|posn-240|voicetype-active|name-xIM|chunkId-VGF|chunkType-head:VGF | 0 | main | _ | _ |
25 | ki | ki | CC | avy | lex-ki|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-250|name-ki|chunkId-CCP|chunkType-head:CCP | 23 | rs | _ | _ |
26 | kina | kOna | WQ | pn | lex-kOna|cat-pn|gend-any|num-pl|pers-3|case-o|vib-|tam-|posn-260|chunkType-child:NP8|name-kina | 27 | mod__wq | _ | _ |
27 | parisWiwiyoM | parisWiwi | NN | n | lex-parisWiwi|cat-n|gend-f|num-pl|pers-3|case-o|vib-0_meM|tam-0|posn-270|vpos-vib_3|name-parisWiwiyoM|chunkId-NP8|chunkType-head:NP8 | 31 | k7 | _ | _ |
28 | meM | meM | PSP | psp | lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-280|chunkType-child:NP8|name-meM3 | 27 | lwg__psp | _ | _ |
29 | use | vaha | PRP | pn | lex-vaha|cat-pn|gend-any|num-sg|pers-3|case-o|vib-ko|tam-ko|posn-290|name-use|chunkId-NP9|chunkType-head:NP9 | 31 | k2 | _ | _ |
30 | giraPwAra | giraPwAra | JJ | adj | lex-giraPwAra|cat-adj|gend-any|num-any|pers-|case-|vib-|tam-|posn-300|name-giraPwAra|chunkId-JJP2|chunkType-head:JJP2 | 31 | pof | _ | _ |
31 | kiyA | kara | VM | v | lex-kara|cat-v|gend-m|num-sg|pers-3|case-|vib-yA_jA+yA�|tam-yA|stype-declarative|posn-310|voicetype-passive|vpos-tam_2|name-kiyA|chunkId-VGF2|chunkType-head:VGF2 | 36 | ccof | _ | _ |
32 | gayA | jA | VAUX | v | lex-jA|cat-v|gend-m|num-sg|pers-3|case-|vib-yA�|tam-yA1|posn-320|chunkType-child:VGF2|name-gayA | 31 | lwg__vaux | _ | _ |
33 | , | , | SYM | s | lex-|cat-s|gend-punc|num-|pers-|case-|vib-|tam-|posn-330|chunkType-child:VGF2|name-, | 31 | rsym | _ | _ |
34 | mukaxamA | mukaxamA | NN | n | lex-mukaxamA|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-340|name-mukaxamA|chunkId-NP10|chunkType-head:NP10 | 35 | k1 | _ | _ |
35 | calA | cala | VM | v | lex-cala|cat-v|gend-m|num-sg|pers-3|case-|vib-yA|tam-yA|hlt-true|stype-declarative|posn-350|voicetype-active|name-calA|chunkId-VGF3|chunkType-head:VGF3 | 36 | ccof | _ | _ |
36 | Ora | Ora | CC | avy | lex-Ora|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-360|name-Ora|chunkId-CCP2|chunkType-head:CCP2 | 25 | ccof | _ | _ |
37 | sajA | sajA | NN | n | lex-sajA|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-370|name-sajA|chunkId-NP11|chunkType-head:NP11 | 38 | k1 | _ | _ |
38 | huI | ho | VM | v | lex-ho|cat-v|gend-f|num-sg|pers-3|case-|vib-yA|tam-yA|stype-declarative|posn-380|voicetype-active|name-huI|chunkId-VGF4|chunkType-head:VGF4 | 36 | ccof | _ | _ |
39 | . | . | SYM | punc | lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-390|chunkType-child:VGF4|name-. | 38 | rsym | _ | _ |
And after conversion of the WX encoding to the Devanagari script in UTF-8:
1 | कोट | कोट | XC | n | lex-kota|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-10|chunkType-child:NP|name-kota | 4 | mod | _ | _ |
2 | लखपत | लखपत | XC | n | lex-laKapawa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-20|chunkType-child:NP|name-laKapawa | 4 | mod | _ | _ |
3 | जेल | जेल | XC | n | lex-jela|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-30|chunkType-child:NP|name-jela | 4 | mod | _ | _ |
4 | लाहौर | लाहौर | NNP | n | lex-lAhOra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-40|vpos-vib_5|name-lAhOra|chunkId-NP|chunkType-head:NP | 6 | jjmod | _ | _ |
5 | में | में | PSP | psp | lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-50|chunkType-child:NP|name-meM | 4 | lwg__psp | _ | _ |
6 | बंद | बंद | JJ | adj | lex-baMxa|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-60|name-baMxa|chunkId-JJP|chunkType-head:JJP | 8 | nmod | _ | _ |
7 | सरबजीत | सरबजीत | XC | n | lex-sarabajIwa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-70|chunkType-child:NP2|name-sarabajIwa | 8 | mod | _ | _ |
8 | सिंह | सिंह | NNP | n | lex-siMha|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ne|tam-0|posn-80|vpos-vib_3|name-siMha|chunkId-NP2|chunkType-head:NP2 | 24 | k1 | _ | _ |
9 | ने | ने | PSP | psp | lex-ne|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-90|chunkType-child:NP2|name-ne | 8 | lwg__psp | _ | _ |
10 | मंगलवार | मंगलवार | NNP | n | lex-maMgalavAra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-100|vpos-vib_2|name-maMgalavAra|chunkId-NP3|chunkType-head:NP3 | 24 | k7t | _ | _ |
11 | को | को | PSP | psp | lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-110|chunkType-child:NP3|name-ko | 10 | lwg__psp | _ | _ |
12 | भारतीय | भारतीय | JJ | adj | lex-BArawIya|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-120|chunkType-child:NP4|name-BArawIya | 13 | nmod__adj | _ | _ |
13 | दूतावास | दूतावास | NN | n | lex-xUwAvAsa|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_kA|tam-0|posn-130|vpos-vib_3|name-xUwAvAsa|chunkId-NP4|chunkType-head:NP4 | 16 | r6 | _ | _ |
14 | के | का | PSP | psp | lex-kA|cat-psp|gend-m|num-pl|pers-|case-o|vib-|tam-|posn-140|chunkType-child:NP4|name-ke | 13 | lwg__psp | _ | _ |
15 | दो | दो | QC | num | lex-xo|cat-num|gend-any|num-pl|pers-|case-o|vib-|tam-|posn-150|chunkType-child:NP5|name-xo | 16 | nmod__adj | _ | _ |
16 | अधिकारियों | अधिकारी | NN | n | lex-aXikArI|cat-n|gend-m|num-pl|pers-3|case-o|vib-0_ko|tam-0|posn-160|vpos-vib_3|name-aXikAriyoM|chunkId-NP5|chunkType-head:NP5 | 24 | k4 | _ | _ |
17 | को | को | PSP | psp | lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-170|chunkType-child:NP5|name-ko2 | 16 | lwg__psp | _ | _ |
18 | अपने | अपना | PRP | pn | lex-apanA|cat-pn|gend-any|num-sg|pers-1|case-o|vib-0_bAre_meM|tam-0|posn-180|vpos-vib_2_3|name-apane|chunkId-NP6|chunkType-head:NP6 | 24 | k7 | _ | _ |
19 | बारे | बारे | PSP | psp | lex-bAre|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-190|chunkType-child:NP6|name-bAre | 18 | lwg__psp | _ | _ |
20 | में | में | PSP | psp | lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-200|chunkType-child:NP6|name-meM2 | 18 | lwg__psp | _ | _ |
21 | तमाम | तमाम | JJ | adj | lex-wamAma|cat-adj|gend-any|num-any|pers-|case-d|vib-|tam-|posn-210|chunkType-child:NP7|name-wamAma | 23 | nmod__adj | _ | _ |
22 | व्यक्तिगत | व्यक्तिगत | JJ | adj | lex-vyakwigawa|cat-adj|gend-any|num-any|pers-|case-d|vib-|tam-|posn-220|chunkType-child:NP7|name-vyakwigawa | 23 | nmod__adj | _ | _ |
23 | जानकारियां | जानकारियां | NN | n | lex-jAnakAriyAM|cat-n|gend-f|num-pl|pers-3|case-d|vib-0|tam-0|posn-230|name-jAnakAriyAM|chunkId-NP7|chunkType-head:NP7 | 24 | k2 | _ | _ |
24 | दीं | दे | VM | v | lex-xe|cat-v|gend-f|num-pl|pers-3|case-|vib-yA|tam-yA|stype-declarative|posn-240|voicetype-active|name-xIM|chunkId-VGF|chunkType-head:VGF | 0 | main | _ | _ |
25 | कि | कि | CC | avy | lex-ki|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-250|name-ki|chunkId-CCP|chunkType-head:CCP | 23 | rs | _ | _ |
26 | किन | कौन | WQ | pn | lex-kOna|cat-pn|gend-any|num-pl|pers-3|case-o|vib-|tam-|posn-260|chunkType-child:NP8|name-kina | 27 | mod__wq | _ | _ |
27 | परिस्थितियों | परिस्थिति | NN | n | lex-parisWiwi|cat-n|gend-f|num-pl|pers-3|case-o|vib-0_meM|tam-0|posn-270|vpos-vib_3|name-parisWiwiyoM|chunkId-NP8|chunkType-head:NP8 | 31 | k7 | _ | _ |
28 | में | में | PSP | psp | lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-280|chunkType-child:NP8|name-meM3 | 27 | lwg__psp | _ | _ |
29 | उसे | वह | PRP | pn | lex-vaha|cat-pn|gend-any|num-sg|pers-3|case-o|vib-ko|tam-ko|posn-290|name-use|chunkId-NP9|chunkType-head:NP9 | 31 | k2 | _ | _ |
30 | गिरफ्तार | गिरफ्तार | JJ | adj | lex-giraPwAra|cat-adj|gend-any|num-any|pers-|case-|vib-|tam-|posn-300|name-giraPwAra|chunkId-JJP2|chunkType-head:JJP2 | 31 | pof | _ | _ |
31 | किया | कर | VM | v | lex-kara|cat-v|gend-m|num-sg|pers-3|case-|vib-yA_jA+yA�|tam-yA|stype-declarative|posn-310|voicetype-passive|vpos-tam_2|name-kiyA|chunkId-VGF2|chunkType-head:VGF2 | 36 | ccof | _ | _ |
32 | गया | जा | VAUX | v | lex-jA|cat-v|gend-m|num-sg|pers-3|case-|vib-yA�|tam-yA1|posn-320|chunkType-child:VGF2|name-gayA | 31 | lwg__vaux | _ | _ |
33 | , | , | SYM | s | lex-|cat-s|gend-punc|num-|pers-|case-|vib-|tam-|posn-330|chunkType-child:VGF2|name-, | 31 | rsym | _ | _ |
34 | मुकदमा | मुकदमा | NN | n | lex-mukaxamA|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-340|name-mukaxamA|chunkId-NP10|chunkType-head:NP10 | 35 | k1 | _ | _ |
35 | चला | चल | VM | v | lex-cala|cat-v|gend-m|num-sg|pers-3|case-|vib-yA|tam-yA|hlt-true|stype-declarative|posn-350|voicetype-active|name-calA|chunkId-VGF3|chunkType-head:VGF3 | 36 | ccof | _ | _ |
36 | और | और | CC | avy | lex-Ora|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-360|name-Ora|chunkId-CCP2|chunkType-head:CCP2 | 25 | ccof | _ | _ |
37 | सजा | सजा | NN | n | lex-sajA|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-370|name-sajA|chunkId-NP11|chunkType-head:NP11 | 38 | k1 | _ | _ |
38 | हुई | हो | VM | v | lex-ho|cat-v|gend-f|num-sg|pers-3|case-|vib-yA|tam-yA|stype-declarative|posn-380|voicetype-active|name-huI|chunkId-VGF4|chunkType-head:VGF4 | 36 | ccof | _ | _ |
39 | . | . | SYM | punc | lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-390|chunkType-child:VGF4|name-. | 38 | rsym | _ | _ |
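The WX-to-Devanagari step can be illustrated with a short script. This is only a minimal sketch, not the converter that produced the tables on this page: the function name `wx_to_devanagari` is hypothetical, the mapping covers just the basic Hindi letters of the WX notation (no nukta, no digits), and any character outside the mapping is copied unchanged.

```python
# Minimal WX -> Devanagari sketch (basic Hindi letters only; an illustration,
# not the tool that was used to produce the converted treebank).
CONSONANTS = {
    "k": "क", "K": "ख", "g": "ग", "G": "घ", "f": "ङ",
    "c": "च", "C": "छ", "j": "ज", "J": "झ", "F": "ञ",
    "t": "ट", "T": "ठ", "d": "ड", "D": "ढ", "N": "ण",
    "w": "त", "W": "थ", "x": "द", "X": "ध", "n": "न",
    "p": "प", "P": "फ", "b": "ब", "B": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "S": "श", "R": "ष", "s": "स", "h": "ह",
}
# Independent vowel letters (used when no consonant precedes the vowel).
VOWELS = {
    "a": "अ", "A": "आ", "i": "इ", "I": "ई", "u": "उ", "U": "ऊ",
    "q": "ऋ", "e": "ए", "E": "ऐ", "o": "ओ", "O": "औ",
}
# Dependent vowel signs; "a" is the inherent vowel and adds nothing.
MATRAS = {
    "a": "", "A": "ा", "i": "ि", "I": "ी", "u": "ु", "U": "ू",
    "q": "ृ", "e": "े", "E": "ै", "o": "ो", "O": "ौ",
}
SIGNS = {"M": "ं", "H": "ः", "z": "ँ"}  # anusvara, visarga, candrabindu
VIRAMA = "्"


def wx_to_devanagari(word):
    """Transliterate one WX-encoded word into Devanagari."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = word[i + 1] if i + 1 < len(word) else ""
            if nxt in MATRAS:
                # The consonant carries a vowel: emit its sign and consume it.
                out.append(MATRAS[nxt])
                i += 1
            else:
                # No vowel follows: suppress the inherent vowel with a virama.
                out.append(VIRAMA)
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        else:
            out.append(ch)  # punctuation and anything unmapped
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for wx in ["pAkiswAna", "siMha", "meM", "elaosI"]:
        print(wx, wx_to_devanagari(wx))
```

In WX every vowel is written explicitly, including the inherent `a`, so a consonant that is not followed by a vowel letter is taken to be part of a conjunct and receives a virama. For the four words in the demo this yields the same forms as the converted tables (पाकिस्तान, सिंह, में, एलओसी).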
The first sentence of the ICON 2010 test data (with fine-grained syntactic tags) in the Shakti Standard Format (SSF):
<document docid="fullnews_id_2484368"> <head> <caption>elaosI Kolane para hogI bAwacIwa pre isalAmAbAxa.</caption> <language>Hindi </language> <domain_name>News Articles </domain_name> <word_count>313</word_count> <byte_count>37563</byte_count> <availability> <format>CML/SSF</format> <sentence_marker>.</sentence_marker> <normalization>No</normalization> </availability> <encoding_description> <original_encoding>ISO 8859</format> <new_encoding>Unicode UTF8</new_encoding> </encoding_description> <distributor>LTRC, IIIT Hyderabad</distributor> <project_description>NSF Hindi/Urdu Dependency Treebanking Project</place> <creation> </raw_corpus creation_date="" institute_name="IIIT Hyderabad"> </annotated_corpus creation_date="06/01/2009" institute_name="IIIT Hyderabad"> <edition_number>1.0</edition_number> </creation> <publication> <place>New Delhi</place> <date>28/5/2004</date> <type>Newspaper</type> <publisher> <name>Amar Ujala</name> <url>http://www.amarujala.com</url> </publisher> </publication> <annotated-resource name="HyDT-Hindi" version="2.0" type="dep-words" layers="morph,pos,chunk,dep-word" language="hin" date-of-release="20101013"> <annotation-standard> <morph-standard name="Anncorra-morph" version="1.31" date="20080920" /> <pos-standard name="Anncorra-pos" version="" date="20061215" /> <chunk-standard name="Anncorra-chunk" version="" date="20061215" /> <intrachunk-dependency-standard name="Anncorra-intrachunk-dep" version="1.0" date="" dep-tagset-granularity="5" /> <dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" /> </annotation-standard> </annotated-resource> </head> <body> <tb number="1" segment="no" bullet="no"> <foreign language="select" writingsystem="LTR"></foreign> <text> <Sentence id="1"> 1 pAkiswAna XC <fs af='pAkiswAna,n,m,sg,3,d,0,0' posn='10' drel='mod:kaSmIra' chunkType='child:NP' name='pAkiswAna'> 2 aXikqwa XC <fs af='aXikqwa,adj,any,any,,o,,' posn='20' drel='mod:kaSmIra' chunkType='child:NP' 
name='aXikqwa'> 3 kaSmIra NNP <fs af='kaSmIra,n,m,sg,3,o,0_meM,0' drel='k7p:Ae' posn='30' vpos='vib_4' name='kaSmIra' chunkId='NP' chunkType='head:NP'> 4 meM PSP <fs af='meM,psp,,,,,,' posn='40' drel='lwg__psp:kaSmIra' chunkType='child:NP' name='meM'> 5 � XC <fs af='�,num,m,sg,3,d,,' posn='50' drel='mod:akwUbara' chunkType='child:NP2' name='�'> 6 akwUbara NNP <fs af='akwUbara,n,m,sg,3,o,0_ko,0' drel='k7t:Ae' posn='60' vpos='vib_3' name='akwUbara' chunkId='NP2' chunkType='head:NP2'> 7 ko PSP <fs af='ko,psp,,,,,,' posn='70' drel='lwg__psp:akwUbara' chunkType='child:NP2' name='ko'> 8 Ae VM <fs af='A,v,m,sg,any,,yA,yA' drel='nmod__k1inv:BUkaMpa' posn='80' name='Ae' chunkId='VGNF' chunkType='head:VGNF'> 9 BUkaMpa NN <fs af='BUkaMpa,n,m,sg,3,o,0_se,0' drel='rh:macI' posn='90' vpos='vib_2' name='BUkaMpa' chunkId='NP3' chunkType='head:NP3'> 10 se PSP <fs af='se,psp,,,,,,' posn='100' drel='lwg__psp:BUkaMpa' chunkType='child:NP3' name='se'> 11 macI VM <fs af='maca,v,f,sg,any,,yA,yA' drel='nmod__k1inv:wabAhI' posn='110' name='macI' chunkId='VGNF2' chunkType='head:VGNF2'> 12 wabAhI NN <fs af='wabAhI,n,f,sg,3,o,0_kA_bAxa,0' drel='k7t:kareMge' posn='120' vpos='vib_2_3' name='wabAhI' chunkId='NP4' chunkType='head:NP4'> 13 ke PSP <fs af='kA,psp,m,sg,3,o,,' posn='130' drel='lwg__psp:wabAhI' chunkType='child:NP4' name='ke'> 14 bAxa NST <fs af='bAxa,n,,,,,,' posn='140' drel='lwg__psp:wabAhI' chunkType='child:NP4' name='bAxa'> 15 BArawa NNP <fs af='BArawa,n,m,sg,3,d,0,0' drel='ccof:Ora' posn='150' name='BArawa' chunkId='NP5' chunkType='head:NP5'> 16 Ora CC <fs af='Ora,avy,,,,,,' drel='k1:kareMge' posn='160' name='Ora' chunkId='CCP' chunkType='head:CCP'> 17 pAkiswAna NNP <fs af='pAkiswAna,n,m,sg,3,d,0,0' drel='ccof:Ora' posn='170' name='pAkiswAna2' chunkId='NP6' chunkType='head:NP6'> 18 mAnavIya JJ <fs af='mAnavIya,adj,any,any,,o,,' posn='180' drel='nmod__adj:xqRtikoNa' chunkType='child:NP7' name='mAnavIya'> 19 xqRtikoNa NN <fs af='xqRtikoNa,n,m,sg,3,d,0,0' drel='k2:apanAwe' posn='190' 
name='xqRtikoNa' chunkId='NP7' chunkType='head:NP7'> 20 apanAwe VM <fs af='apanA,v,m,pl,any,,wA_ho+yA,wA' drel='vmod:kareMge' posn='200' vpos='tam_2' name='apanAwe' chunkId='VGNF3' chunkType='head:VGNF3'> 21 hue VAUX <fs af='ho,v,m,pl,any,,yA,yA' posn='210' drel='lwg__vaux:apanAwe' chunkType='child:VGNF3' name='hue'> 22 SanivAra NNP <fs af='SanivAra,n,m,sg,3,o,0_ko,0' drel='k7t:kareMge' posn='220' vpos='vib_2' name='SanivAra' chunkId='NP8' chunkType='head:NP8'> 23 ko PSP <fs af='ko,psp,,,,,,' posn='230' drel='lwg__psp:SanivAra' chunkType='child:NP8' name='ko2'> 24 islAmAbAxa NNP <fs af='isalAmAbAxa,n,m,sg,3,d,0_meM,0' drel='k7p:kareMge' posn='240' vpos='vib_2' name='islAmAbAxa' chunkId='NP9' chunkType='head:NP9'> 25 meM PSP <fs af='meM,psp,,,,,,' posn='250' drel='lwg__psp:islAmAbAxa' chunkType='child:NP9' name='meM2'> 26 niyaMwraNa XC <fs af='niyaMwraNa,n,m,sg,3,d,0,0' posn='260' drel='mod:reKA' chunkType='child:NP10' name='niyaMwraNa'> 27 reKA NN <fs af='reKA,n,f,sg,3,d,0,0' drel='k2:Kolane' posn='270' name='reKA' chunkId='NP10' chunkType='head:NP10'> 28 ( SYM <fs af=',punc,,,,,,' posn='280' drel='rsym:elaosI' chunkType='child:NP11' name='('> 29 elaosI NN <fs af='elaosI,n,m,sg,3,d,0,0' drel='nmod:reKA' posn='290' name='elaosI' chunkId='NP11' chunkType='head:NP11'> 30 ) SYM <fs af=',punc,,,,,,' posn='300' drel='rsym:elaosI' chunkType='child:NP11' name=')'> 31 Kolane VM <fs af='Kola,v,any,sg,any,o,nA_kA,nA' drel='r6:masale' posn='310' vpos='tam_2' name='Kolane' chunkId='VGNN' chunkType='head:VGNN'> 32 ke PSP <fs af='kA,psp,m,sg,,o,,' posn='320' drel='lwg__psp:Kolane' chunkType='child:VGNN' name='ke2'> 33 masale NN <fs af='masalA,n,m,sg,3,o,0_para,0' drel='k7:kareMge' posn='330' vpos='vib_2' name='masale' chunkId='NP12' chunkType='head:NP12'> 34 para PSP <fs af='para,psp,,,,,,' posn='340' drel='lwg__psp:masale' chunkType='child:NP12' name='para'> 35 bAwacIwa NN <fs af='bAwacIwa,n,f,sg,3,d,0,0' drel='pof:kareMge' posn='350' name='bAwacIwa' chunkId='NP13' 
chunkType='head:NP13'> 36 kareMge VM <fs af='kara,v,m,pl,3,,gA,gA' posn='360' name='kareMge' chunkId='VGF' chunkType='head:VGF'> 37 . SYM <fs af='.,punc,,,,,,' posn='370' drel='rsym:kareMge' chunkType='child:VGF' name='.'> </Sentence>
The same sentence in the CoNLL format:
1 | pAkiswAna | pAkiswAna | XC | n | lex-pAkiswAna|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-10|chunkType-child:NP|name-pAkiswAna | 3 | mod | _ | _ |
2 | aXikqwa | aXikqwa | XC | adj | lex-aXikqwa|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-20|chunkType-child:NP|name-aXikqwa | 3 | mod | _ | _ |
3 | kaSmIra | kaSmIra | NNP | n | lex-kaSmIra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-30|vpos-vib_4|name-kaSmIra|chunkId-NP|chunkType-head:NP | 8 | k7p | _ | _ |
4 | meM | meM | PSP | psp | lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-40|chunkType-child:NP|name-meM | 3 | lwg__psp | _ | _ |
5 | � | � | XC | num | lex-�|cat-num|gend-m|num-sg|pers-3|case-d|vib-|tam-|posn-50|chunkType-child:NP2|name-� | 6 | mod | _ | _ |
6 | akwUbara | akwUbara | NNP | n | lex-akwUbara|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-60|vpos-vib_3|name-akwUbara|chunkId-NP2|chunkType-head:NP2 | 8 | k7t | _ | _ |
7 | ko | ko | PSP | psp | lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-70|chunkType-child:NP2|name-ko | 6 | lwg__psp | _ | _ |
8 | Ae | A | VM | v | lex-A|cat-v|gend-m|num-sg|pers-any|case-|vib-yA|tam-yA|posn-80|name-Ae|chunkId-VGNF|chunkType-head:VGNF | 9 | nmod__k1inv | _ | _ |
9 | BUkaMpa | BUkaMpa | NN | n | lex-BUkaMpa|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_se|tam-0|posn-90|vpos-vib_2|name-BUkaMpa|chunkId-NP3|chunkType-head:NP3 | 11 | rh | _ | _ |
10 | se | se | PSP | psp | lex-se|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-100|chunkType-child:NP3|name-se | 9 | lwg__psp | _ | _ |
11 | macI | maca | VM | v | lex-maca|cat-v|gend-f|num-sg|pers-any|case-|vib-yA|tam-yA|posn-110|name-macI|chunkId-VGNF2|chunkType-head:VGNF2 | 12 | nmod__k1inv | _ | _ |
12 | wabAhI | wabAhI | NN | n | lex-wabAhI|cat-n|gend-f|num-sg|pers-3|case-o|vib-0_kA_bAxa|tam-0|posn-120|vpos-vib_2_3|name-wabAhI|chunkId-NP4|chunkType-head:NP4 | 36 | k7t | _ | _ |
13 | ke | kA | PSP | psp | lex-kA|cat-psp|gend-m|num-sg|pers-3|case-o|vib-|tam-|posn-130|chunkType-child:NP4|name-ke | 12 | lwg__psp | _ | _ |
14 | bAxa | bAxa | NST | n | lex-bAxa|cat-n|gend-|num-|pers-|case-|vib-|tam-|posn-140|chunkType-child:NP4|name-bAxa | 12 | lwg__psp | _ | _ |
15 | BArawa | BArawa | NNP | n | lex-BArawa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-150|name-BArawa|chunkId-NP5|chunkType-head:NP5 | 16 | ccof | _ | _ |
16 | Ora | Ora | CC | avy | lex-Ora|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-160|name-Ora|chunkId-CCP|chunkType-head:CCP | 36 | k1 | _ | _ |
17 | pAkiswAna | pAkiswAna | NNP | n | lex-pAkiswAna|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-170|name-pAkiswAna2|chunkId-NP6|chunkType-head:NP6 | 16 | ccof | _ | _ |
18 | mAnavIya | mAnavIya | JJ | adj | lex-mAnavIya|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-180|chunkType-child:NP7|name-mAnavIya | 19 | nmod__adj | _ | _ |
19 | xqRtikoNa | xqRtikoNa | NN | n | lex-xqRtikoNa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-190|name-xqRtikoNa|chunkId-NP7|chunkType-head:NP7 | 20 | k2 | _ | _ |
20 | apanAwe | apanA | VM | v | lex-apanA|cat-v|gend-m|num-pl|pers-any|case-|vib-wA_ho+yA|tam-wA|posn-200|vpos-tam_2|name-apanAwe|chunkId-VGNF3|chunkType-head:VGNF3 | 36 | vmod | _ | _ |
21 | hue | ho | VAUX | v | lex-ho|cat-v|gend-m|num-pl|pers-any|case-|vib-yA|tam-yA|posn-210|chunkType-child:VGNF3|name-hue | 20 | lwg__vaux | _ | _ |
22 | SanivAra | SanivAra | NNP | n | lex-SanivAra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-220|vpos-vib_2|name-SanivAra|chunkId-NP8|chunkType-head:NP8 | 36 | k7t | _ | _ |
23 | ko | ko | PSP | psp | lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-230|chunkType-child:NP8|name-ko2 | 22 | lwg__psp | _ | _ |
24 | islAmAbAxa | isalAmAbAxa | NNP | n | lex-isalAmAbAxa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0_meM|tam-0|posn-240|vpos-vib_2|name-islAmAbAxa|chunkId-NP9|chunkType-head:NP9 | 36 | k7p | _ | _ |
25 | meM | meM | PSP | psp | lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-250|chunkType-child:NP9|name-meM2 | 24 | lwg__psp | _ | _ |
26 | niyaMwraNa | niyaMwraNa | XC | n | lex-niyaMwraNa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-260|chunkType-child:NP10|name-niyaMwraNa | 27 | mod | _ | _ |
27 | reKA | reKA | NN | n | lex-reKA|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-270|name-reKA|chunkId-NP10|chunkType-head:NP10 | 31 | k2 | _ | _ |
28 | ( | ( | SYM | punc | lex-|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-280|chunkType-child:NP11|name-( | 29 | rsym | _ | _ |
29 | elaosI | elaosI | NN | n | lex-elaosI|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-290|name-elaosI|chunkId-NP11|chunkType-head:NP11 | 27 | nmod | _ | _ |
30 | ) | ) | SYM | punc | lex-|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-300|chunkType-child:NP11|name-) | 29 | rsym | _ | _ |
31 | Kolane | Kola | VM | v | lex-Kola|cat-v|gend-any|num-sg|pers-any|case-o|vib-nA_kA|tam-nA|posn-310|vpos-tam_2|name-Kolane|chunkId-VGNN|chunkType-head:VGNN | 33 | r6 | _ | _ |
32 | ke | kA | PSP | psp | lex-kA|cat-psp|gend-m|num-sg|pers-|case-o|vib-|tam-|posn-320|chunkType-child:VGNN|name-ke2 | 31 | lwg__psp | _ | _ |
33 | masale | masalA | NN | n | lex-masalA|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_para|tam-0|posn-330|vpos-vib_2|name-masale|chunkId-NP12|chunkType-head:NP12 | 36 | k7 | _ | _ |
34 | para | para | PSP | psp | lex-para|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-340|chunkType-child:NP12|name-para | 33 | lwg__psp | _ | _ |
35 | bAwacIwa | bAwacIwa | NN | n | lex-bAwacIwa|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-350|name-bAwacIwa|chunkId-NP13|chunkType-head:NP13 | 36 | pof | _ | _ |
36 | kareMge | kara | VM | v | lex-kara|cat-v|gend-m|num-pl|pers-3|case-|vib-gA|tam-gA|posn-360|name-kareMge|chunkId-VGF|chunkType-head:VGF | 0 | main | _ | _ |
37 | . | . | SYM | punc | lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-370|chunkType-child:VGF|name-. | 36 | rsym | _ | _ |
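The same sentence after conversion from the WX encoding to the Devanagari script in UTF-8 (only the word form and lemma columns are converted; the feature values stay in WX):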
1 | पाकिस्तान | पाकिस्तान | XC | n | lex-pAkiswAna|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-10|chunkType-child:NP|name-pAkiswAna | 3 | mod | _ | _ |
2 | अधिकृत | अधिकृत | XC | adj | lex-aXikqwa|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-20|chunkType-child:NP|name-aXikqwa | 3 | mod | _ | _ |
3 | कश्मीर | कश्मीर | NNP | n | lex-kaSmIra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-30|vpos-vib_4|name-kaSmIra|chunkId-NP|chunkType-head:NP | 8 | k7p | _ | _ |
4 | में | में | PSP | psp | lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-40|chunkType-child:NP|name-meM | 3 | lwg__psp | _ | _ |
5 | � | � | XC | num | lex-�|cat-num|gend-m|num-sg|pers-3|case-d|vib-|tam-|posn-50|chunkType-child:NP2|name-� | 6 | mod | _ | _ |
6 | अक्तूबर | अक्तूबर | NNP | n | lex-akwUbara|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-60|vpos-vib_3|name-akwUbara|chunkId-NP2|chunkType-head:NP2 | 8 | k7t | _ | _ |
7 | को | को | PSP | psp | lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-70|chunkType-child:NP2|name-ko | 6 | lwg__psp | _ | _ |
8 | आए | आ | VM | v | lex-A|cat-v|gend-m|num-sg|pers-any|case-|vib-yA|tam-yA|posn-80|name-Ae|chunkId-VGNF|chunkType-head:VGNF | 9 | nmod__k1inv | _ | _ |
9 | भूकंप | भूकंप | NN | n | lex-BUkaMpa|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_se|tam-0|posn-90|vpos-vib_2|name-BUkaMpa|chunkId-NP3|chunkType-head:NP3 | 11 | rh | _ | _ |
10 | से | से | PSP | psp | lex-se|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-100|chunkType-child:NP3|name-se | 9 | lwg__psp | _ | _ |
11 | मची | मच | VM | v | lex-maca|cat-v|gend-f|num-sg|pers-any|case-|vib-yA|tam-yA|posn-110|name-macI|chunkId-VGNF2|chunkType-head:VGNF2 | 12 | nmod__k1inv | _ | _ |
12 | तबाही | तबाही | NN | n | lex-wabAhI|cat-n|gend-f|num-sg|pers-3|case-o|vib-0_kA_bAxa|tam-0|posn-120|vpos-vib_2_3|name-wabAhI|chunkId-NP4|chunkType-head:NP4 | 36 | k7t | _ | _ |
13 | के | का | PSP | psp | lex-kA|cat-psp|gend-m|num-sg|pers-3|case-o|vib-|tam-|posn-130|chunkType-child:NP4|name-ke | 12 | lwg__psp | _ | _ |
14 | बाद | बाद | NST | n | lex-bAxa|cat-n|gend-|num-|pers-|case-|vib-|tam-|posn-140|chunkType-child:NP4|name-bAxa | 12 | lwg__psp | _ | _ |
15 | भारत | भारत | NNP | n | lex-BArawa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-150|name-BArawa|chunkId-NP5|chunkType-head:NP5 | 16 | ccof | _ | _ |
16 | और | और | CC | avy | lex-Ora|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-160|name-Ora|chunkId-CCP|chunkType-head:CCP | 36 | k1 | _ | _ |
17 | पाकिस्तान | पाकिस्तान | NNP | n | lex-pAkiswAna|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-170|name-pAkiswAna2|chunkId-NP6|chunkType-head:NP6 | 16 | ccof | _ | _ |
18 | मानवीय | मानवीय | JJ | adj | lex-mAnavIya|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-180|chunkType-child:NP7|name-mAnavIya | 19 | nmod__adj | _ | _ |
19 | दृष्टिकोण | दृष्टिकोण | NN | n | lex-xqRtikoNa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-190|name-xqRtikoNa|chunkId-NP7|chunkType-head:NP7 | 20 | k2 | _ | _ |
20 | अपनाते | अपना | VM | v | lex-apanA|cat-v|gend-m|num-pl|pers-any|case-|vib-wA_ho+yA|tam-wA|posn-200|vpos-tam_2|name-apanAwe|chunkId-VGNF3|chunkType-head:VGNF3 | 36 | vmod | _ | _ |
21 | हुए | हो | VAUX | v | lex-ho|cat-v|gend-m|num-pl|pers-any|case-|vib-yA|tam-yA|posn-210|chunkType-child:VGNF3|name-hue | 20 | lwg__vaux | _ | _ |
22 | शनिवार | शनिवार | NNP | n | lex-SanivAra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-220|vpos-vib_2|name-SanivAra|chunkId-NP8|chunkType-head:NP8 | 36 | k7t | _ | _ |
23 | को | को | PSP | psp | lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-230|chunkType-child:NP8|name-ko2 | 22 | lwg__psp | _ | _ |
24 | इस्लामाबाद | इसलामाबाद | NNP | n | lex-isalAmAbAxa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0_meM|tam-0|posn-240|vpos-vib_2|name-islAmAbAxa|chunkId-NP9|chunkType-head:NP9 | 36 | k7p | _ | _ |
25 | में | में | PSP | psp | lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-250|chunkType-child:NP9|name-meM2 | 24 | lwg__psp | _ | _ |
26 | नियंत्रण | नियंत्रण | XC | n | lex-niyaMwraNa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-260|chunkType-child:NP10|name-niyaMwraNa | 27 | mod | _ | _ |
27 | रेखा | रेखा | NN | n | lex-reKA|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-270|name-reKA|chunkId-NP10|chunkType-head:NP10 | 31 | k2 | _ | _ |
28 | ( | ( | SYM | punc | lex-|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-280|chunkType-child:NP11|name-( | 29 | rsym | _ | _ |
29 | एलओसी | एलओसी | NN | n | lex-elaosI|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-290|name-elaosI|chunkId-NP11|chunkType-head:NP11 | 27 | nmod | _ | _ |
30 | ) | ) | SYM | punc | lex-|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-300|chunkType-child:NP11|name-) | 29 | rsym | _ | _ |
31 | खोलने | खोल | VM | v | lex-Kola|cat-v|gend-any|num-sg|pers-any|case-o|vib-nA_kA|tam-nA|posn-310|vpos-tam_2|name-Kolane|chunkId-VGNN|chunkType-head:VGNN | 33 | r6 | _ | _ |
32 | के | का | PSP | psp | lex-kA|cat-psp|gend-m|num-sg|pers-|case-o|vib-|tam-|posn-320|chunkType-child:VGNN|name-ke2 | 31 | lwg__psp | _ | _ |
33 | मसले | मसला | NN | n | lex-masalA|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_para|tam-0|posn-330|vpos-vib_2|name-masale|chunkId-NP12|chunkType-head:NP12 | 36 | k7 | _ | _ |
34 | पर | पर | PSP | psp | lex-para|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-340|chunkType-child:NP12|name-para | 33 | lwg__psp | _ | _ |
35 | बातचीत | बातचीत | NN | n | lex-bAwacIwa|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-350|name-bAwacIwa|chunkId-NP13|chunkType-head:NP13 | 36 | pof | _ | _ |
36 | करेंगे | कर | VM | v | lex-kara|cat-v|gend-m|num-pl|pers-3|case-|vib-gA|tam-gA|posn-360|name-kareMge|chunkId-VGF|chunkType-head:VGF | 0 | main | _ | _ |
37 | . | . | SYM | punc | lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-370|chunkType-child:VGF|name-. | 36 | rsym | _ | _ |
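The relation between the `fs` attributes of the Shakti sample and the columns of the ICON 2010 CoNLL tables can be sketched in a few lines. The sketch is inferred from the samples on this page only, and `parse_fs` and `to_conll` are hypothetical names, not part of any released tool. It assumes that `af` always has eight comma-separated fields (lex, cat, gend, num, pers, case, vib, tam), that `drel` has the form `label:headname`, that head names refer to the unique `name` attribute of another token, and that a token without `drel` is the root (head 0, label `main`).

```python
import re

# Names of the eight comma-separated fields of the "af" attribute, as they
# appear in the FEATS column of the ICON 2010 CoNLL samples.
AF_FIELDS = ["lex", "cat", "gend", "num", "pers", "case", "vib", "tam"]
ATTR_RE = re.compile(r"(\w+)='([^']*)'")


def parse_fs(fs):
    """Return (name, feats, deprel, head_name) for one <fs ...> element."""
    attrs = dict(ATTR_RE.findall(fs))  # keeps the attribute order
    morph = list(zip(AF_FIELDS, attrs.pop("af", "").split(",")))
    drel = attrs.pop("drel", None)
    # Assumption: a token without drel is the root of the sentence.
    deprel, head_name = drel.split(":", 1) if drel else ("main", None)
    feats = "|".join(f"{key}-{value}" for key, value in morph + list(attrs.items()))
    return attrs.get("name"), feats, deprel, head_name


def to_conll(tokens):
    """tokens: list of (form, fs) pairs in sentence order.

    Returns a list of (id, form, feats, head, deprel) tuples.
    """
    parsed = [parse_fs(fs) for _, fs in tokens]
    # Map every unique token name to its 1-based position in the sentence.
    ids = {name: i for i, (name, *_rest) in enumerate(parsed, start=1)}
    rows = []
    for i, ((form, _), (_name, feats, deprel, head_name)) in enumerate(
        zip(tokens, parsed), start=1
    ):
        head = ids.get(head_name, 0) if head_name else 0
        rows.append((i, form, feats, head, deprel))
    return rows


if __name__ == "__main__":
    sample = [
        ("pAkiswAna", "<fs af='pAkiswAna,n,m,sg,3,d,0,0' posn='10' "
                      "drel='mod:kaSmIra' chunkType='child:NP' name='pAkiswAna'>"),
        ("kaSmIra", "<fs af='kaSmIra,n,m,sg,3,o,0_meM,0' drel='k7p:Ae' posn='30' "
                    "vpos='vib_4' name='kaSmIra' chunkId='NP' chunkType='head:NP'>"),
    ]
    for row in to_conll(sample):
        print(row)
```

In the two-token demo the head of `kaSmIra` (`Ae`) lies outside the excerpt, so the sketch falls back to 0 for it; with the full sentence the name would resolve to token 8, as in the table.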
The first sentence of the HPST 2012 training data in the UTF-8 SSF format with gold-standard morphology:
<Sentence id='1'> 1 गुजरात NNP <fs af='गुजरात,n,m,sg,3,o,0_का,0' name='गुजरात' posn='10' chunkId='NP' drel='r6:मुख्यमंत्री' vpos='vib_2' chunkType='head:NP'> 2 के PSP <fs af='का,psp,m,sg,,o,,' name='के' posn='20' drel='lwg__psp:गुजरात' chunkType='child:NP'> 3 मुख्यमंत्री NNP <fs af='मुख्यमंत्री,n,m,sg,3,o,0,0' name='मुख्यमंत्री' posn='30' chunkId='NP2' drel='nmod:मोदी' chunkType='head:NP2'> 4 नरेंद्र NNPC <fs af='नरेंद्र,n,m,sg,3,d,0,0' name='नरेंद्र' posn='40' drel='pof__cn:मोदी' chunkType='child:NP3'> 5 मोदी NNP <fs af='मोदी,n,m,sg,3,o,0_ने,0' name='मोदी' posn='50' chunkId='NP3' drel='k1:किया' vpos='vib_3' chunkType='head:NP3'> 6 ने PSP <fs af='ने,psp,,,,,,' name='ने' posn='60' drel='lwg__psp:मोदी' chunkType='child:NP3'> 7 मंगलवार NNP <fs af='मंगलवार,n,m,sg,3,o,0_को,0' name='मंगलवार' posn='70' chunkId='NP4' drel='k7t:किया' vpos='vib_2' chunkType='head:NP4'> 8 को PSP <fs af='को,psp,,,,,,' name='को' posn='80' drel='lwg__psp:मंगलवार' chunkType='child:NP4'> 9 गृह NNPC <fs af='गृह,n,m,sg,3,d,0,0' name='गृह' posn='90' drel='pof__cn:मंत्री' chunkType='child:NP5'> 10 मंत्री NNP <fs af='मंत्री,n,m,sg,3,d,0,0' name='मंत्री' posn='100' drel='nmod__adj:पाटिल' chunkType='child:NP5'> 11 शिवराज NNPC <fs af='शिवराज,n,m,sg,3,d,0,0' name='शिवराज' posn='110' drel='pof__cn:पाटिल' chunkType='child:NP5'> 12 पाटिल NNP <fs af='पाटिल,n,m,sg,3,o,0_से,0' name='पाटिल' posn='120' chunkId='NP5' drel='k4:किया' vpos='vib_vib_5' chunkType='head:NP5'> 13 से PSP <fs af='से,psp,,,,,,' name='से' posn='130' drel='lwg__psp:पाटिल' chunkType='child:NP5'> 14 मुलाकात NN <fs af='मुलाकात,n,f,sg,3,d,0,0' name='मुलाकात' posn='140' chunkId='NP6' drel='pof:कर' chunkType='head:NP6'> 15 कर VM <fs af='कर,v,any,any,any,,0,0' name='कर' posn='150' chunkId='VGNF' drel='vmod:किया' chunkType='head:VGNF'> 16 आईएएस NNP <fs af='आईएएस,n,m,sg,3,o,0,0' name='आईएएस' posn='160' chunkId='NP7' drel='ccof:और' chunkType='head:NP7'> 17 और CC <fs af='और,avy,,,,,,' name='और' posn='170' chunkId='CCP' drel='r6:तर्ज' 
chunkType='head:CCP'> 18 आईपीएस NNP <fs af='आईपीएस,n,m,sg,3,o,0_का,0' name='आईपीएस' posn='180' chunkId='NP8' drel='ccof:और' vpos='vib_2' chunkType='head:NP8'> 19 की PSP <fs af='का,psp,f,sg,,o,,' name='की' posn='190' drel='lwg__psp:आईपीएस' chunkType='child:NP8'> 20 तर्ज NN <fs af='तर्ज,n,f,sg,3,o,0_पर,0' name='तर्ज' posn='200' chunkId='NP9' drel='k7:किया' vpos='vib_2' chunkType='head:NP9'> 21 पर PSP <fs af='पर,psp,,,,,,' name='पर' posn='210' drel='lwg__psp:तर्ज' chunkType='child:NP9'> 22 राष्ट्रीय JJ <fs af='राष्ट्रीय,adj,any,any,,o,,' name='राष्ट्रीय' posn='220' drel='nmod__adj:स्तर' chunkType='child:NP10'> 23 स्तर NN <fs af='स्तर,n,m,sg,3,o,0_पर,0' name='स्तर' posn='230' chunkId='NP10' drel='k7:किया' vpos='vib_3' chunkType='head:NP10'> 24 पर PSP <fs af='पर,psp,,,,,,' name='पर2' posn='240' drel='lwg__psp:स्तर' chunkType='child:NP10'> 25 एक QC <fs af='एक,num,any,any,,any,,' name='एक' posn='250' drel='nmod__adj:सेवा' chunkType='child:NP11'> 26 खुफिया JJ <fs af='खुफिया,adj,any,any,,d,,' name='खुफिया' posn='260' drel='nmod__adj:सेवा' chunkType='child:NP11'> 27 सेवा NN <fs af='सेवा,n,f,sg,3,d,0,0' name='सेवा' posn='270' chunkId='NP11' drel='k2:करने' chunkType='head:NP11'> 28 शुरू NN <fs af='शुरू,n,m,sg,3,d,0,0' name='शुरू' posn='280' chunkId='NP12' drel='pof:करने' chunkType='head:NP12'> 29 करने VM <fs af='कर,v,any,sg,any,o,ना_का,nA' name='करने' posn='290' chunkId='VGNN' drel='r6-k2:अनुरोध' vpos='tam_2' chunkType='head:VGNN'> 30 का PSP <fs af='का,psp,m,sg,,d,,' name='का' posn='300' drel='lwg__psp:करने' chunkType='child:VGNN'> 31 अनुरोध NN <fs af='अनुरोध,n,m,sg,3,d,0,0' name='अनुरोध' posn='310' chunkId='NP13' drel='pof:किया' chunkType='head:NP13'> 32 किया VM <fs af='कर,v,m,sg,any,,या,yA' name='किया' posn='320' chunkId='VGF' chunkType='head:VGF' voicetype='active' stype='declarative'> 33 । SYM <fs af='।,punc,,,,,,' name='।' posn='330' chunkId='BLK' drel='rsym:किया' chunkType='head:BLK'> </Sentence>
The same sentence in the CoNLL format:
1 | गुजरात | गुजरात | NNP | n | lex-गुजरात|cat-n|gen-m|num-sg|pers-3|case-o|vib-0_का|tam-0|chunkId-NP|chunkType-head|stype-|voicetype- | 3 | r6 | _ | _ |
2 | के | का | PSP | psp | lex-का|cat-psp|gen-m|num-sg|pers-|case-o|vib-|tam-|chunkId-NP|chunkType-child|stype-|voicetype- | 1 | lwg__psp | _ | _ |
3 | मुख्यमंत्री | मुख्यमंत्री | NNP | n | lex-मुख्यमंत्री|cat-n|gen-m|num-sg|pers-3|case-o|vib-0|tam-0|chunkId-NP2|chunkType-head|stype-|voicetype- | 5 | nmod | _ | _ |
4 | नरेंद्र | नरेंद्र | NNPC | n | lex-नरेंद्र|cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP3|chunkType-child|stype-|voicetype- | 5 | pof__cn | _ | _ |
5 | मोदी | मोदी | NNP | n | lex-मोदी|cat-n|gen-m|num-sg|pers-3|case-o|vib-0_ने|tam-0|chunkId-NP3|chunkType-head|stype-|voicetype- | 32 | k1 | _ | _ |
6 | ने | ने | PSP | psp | lex-ने|cat-psp|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP3|chunkType-child|stype-|voicetype- | 5 | lwg__psp | _ | _ |
7 | मंगलवार | मंगलवार | NNP | n | lex-मंगलवार|cat-n|gen-m|num-sg|pers-3|case-o|vib-0_को|tam-0|chunkId-NP4|chunkType-head|stype-|voicetype- | 32 | k7t | _ | _ |
8 | को | को | PSP | psp | lex-को|cat-psp|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP4|chunkType-child|stype-|voicetype- | 7 | lwg__psp | _ | _ |
9 | गृह | गृह | NNPC | n | lex-गृह|cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP5|chunkType-child|stype-|voicetype- | 10 | pof__cn | _ | _ |
10 | मंत्री | मंत्री | NNP | n | lex-मंत्री|cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP5|chunkType-child|stype-|voicetype- | 12 | nmod__adj | _ | _ |
11 | शिवराज | शिवराज | NNPC | n | lex-शिवराज|cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP5|chunkType-child|stype-|voicetype- | 12 | pof__cn | _ | _ |
12 | पाटिल | पाटिल | NNP | n | lex-पाटिल|cat-n|gen-m|num-sg|pers-3|case-o|vib-0_से|tam-0|chunkId-NP5|chunkType-head|stype-|voicetype- | 32 | k4 | _ | _ |
13 | से | से | PSP | psp | lex-से|cat-psp|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP5|chunkType-child|stype-|voicetype- | 12 | lwg__psp | _ | _ |
14 | मुलाकात | मुलाकात | NN | n | lex-मुलाकात|cat-n|gen-f|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP6|chunkType-head|stype-|voicetype- | 15 | pof | _ | _ |
15 | कर | कर | VM | v | lex-कर|cat-v|gen-any|num-any|pers-any|case-|vib-0|tam-0|chunkId-VGNF|chunkType-head|stype-|voicetype- | 32 | vmod | _ | _ |
16 | आईएएस | आईएएस | NNP | n | lex-आईएएस|cat-n|gen-m|num-sg|pers-3|case-o|vib-0|tam-0|chunkId-NP7|chunkType-head|stype-|voicetype- | 17 | ccof | _ | _ |
17 | और | और | CC | avy | lex-और|cat-avy|gen-|num-|pers-|case-|vib-|tam-|chunkId-CCP|chunkType-head|stype-|voicetype- | 20 | r6 | _ | _ |
18 | आईपीएस | आईपीएस | NNP | n | lex-आईपीएस|cat-n|gen-m|num-sg|pers-3|case-o|vib-0_का|tam-0|chunkId-NP8|chunkType-head|stype-|voicetype- | 17 | ccof | _ | _ |
19 | की | का | PSP | psp | lex-का|cat-psp|gen-f|num-sg|pers-|case-o|vib-|tam-|chunkId-NP8|chunkType-child|stype-|voicetype- | 18 | lwg__psp | _ | _ |
20 | तर्ज | तर्ज | NN | n | lex-तर्ज|cat-n|gen-f|num-sg|pers-3|case-o|vib-0_पर|tam-0|chunkId-NP9|chunkType-head|stype-|voicetype- | 32 | k7 | _ | _ |
21 | पर | पर | PSP | psp | lex-पर|cat-psp|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP9|chunkType-child|stype-|voicetype- | 20 | lwg__psp | _ | _ |
22 | राष्ट्रीय | राष्ट्रीय | JJ | adj | lex-राष्ट्रीय|cat-adj|gen-any|num-any|pers-|case-o|vib-|tam-|chunkId-NP10|chunkType-child|stype-|voicetype- | 23 | nmod__adj | _ | _ |
23 | स्तर | स्तर | NN | n | lex-स्तर|cat-n|gen-m|num-sg|pers-3|case-o|vib-0_पर|tam-0|chunkId-NP10|chunkType-head|stype-|voicetype- | 32 | k7 | _ | _ |
24 | पर | पर | PSP | psp | lex-पर|cat-psp|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP10|chunkType-child|stype-|voicetype- | 23 | lwg__psp | _ | _ |
25 | एक | एक | QC | num | lex-एक|cat-num|gen-any|num-any|pers-|case-any|vib-|tam-|chunkId-NP11|chunkType-child|stype-|voicetype- | 27 | nmod__adj | _ | _ |
26 | खुफिया | खुफिया | JJ | adj | lex-खुफिया|cat-adj|gen-any|num-any|pers-|case-d|vib-|tam-|chunkId-NP11|chunkType-child|stype-|voicetype- | 27 | nmod__adj | _ | _ |
27 | सेवा | सेवा | NN | n | lex-सेवा|cat-n|gen-f|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP11|chunkType-head|stype-|voicetype- | 29 | k2 | _ | _ |
28 | शुरू | शुरू | NN | n | lex-शुरू|cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP12|chunkType-head|stype-|voicetype- | 29 | pof | _ | _ |
29 | करने | कर | VM | v | lex-कर|cat-v|gen-any|num-sg|pers-any|case-o|vib-ना_का|tam-nA|chunkId-VGNN|chunkType-head|stype-|voicetype- | 31 | r6-k2 | _ | _ |
30 | का | का | PSP | psp | lex-का|cat-psp|gen-m|num-sg|pers-|case-d|vib-|tam-|chunkId-VGNN|chunkType-child|stype-|voicetype- | 29 | lwg__psp | _ | _ |
31 | अनुरोध | अनुरोध | NN | n | lex-अनुरोध|cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP13|chunkType-head|stype-|voicetype- | 32 | pof | _ | _ |
32 | किया | कर | VM | v | lex-कर|cat-v|gen-m|num-sg|pers-any|case-|vib-या|tam-yA|chunkId-VGF|chunkType-head|stype-declarative'>|voicetype-active | 0 | main | _ | _ |
33 | । | । | SYM | punc | lex-।|cat-punc|gen-|num-|pers-|case-|vib-|tam-|chunkId-BLK|chunkType-head|stype-|voicetype- | 32 | rsym | _ | _ |
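
The sixth column of the table above packs the morphological attributes into name-value pairs joined by `|`. As a minimal sketch of how such a cell could be unpacked in Python (the function name `parse_feats` is made up for illustration, and the input string is copied from token 15 above):

```python
def parse_feats(feats):
    """Turn a cell such as 'lex-कर|cat-v|case-' into a dict of attribute -> value."""
    result = {}
    for item in feats.split("|"):
        # Split on the first hyphen only; an empty value (e.g. 'case-') becomes ''.
        key, _, value = item.partition("-")
        result[key] = value
    return result


# FEATS cell of token 15 (कर) from the sample sentence.
token15 = ("lex-कर|cat-v|gen-any|num-any|pers-any|case-|vib-0|tam-0|"
           "chunkId-VGNF|chunkType-head|stype-|voicetype-")
feats = parse_feats(token15)
print(feats["cat"], feats["chunkId"], repr(feats["case"]))
```

Splitting on the first hyphen only is a deliberate choice: it keeps attribute values intact even if they happen to contain further hyphens.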
The same sentence with “automatically tagged” morphology. Apparently this means no morphology at all: the lemma, coarse tag and feature columns are empty, so contestants were presumably expected to run their own taggers.
1 | गुजरात | _ | NNP | _ | _ | 3 | r6 | _ | _ |
2 | के | _ | PSP | _ | _ | 1 | lwg__psp | _ | _ |
3 | मुख्यमंत्री | _ | NNP | _ | _ | 5 | nmod | _ | _ |
4 | नरेंद्र | _ | NNPC | _ | _ | 5 | pof__cn | _ | _ |
5 | मोदी | _ | NNP | _ | _ | 32 | k1 | _ | _ |
6 | ने | _ | PSP | _ | _ | 5 | lwg__psp | _ | _ |
7 | मंगलवार | _ | NNP | _ | _ | 32 | k7t | _ | _ |
8 | को | _ | PSP | _ | _ | 7 | lwg__psp | _ | _ |
9 | गृह | _ | NNPC | _ | _ | 10 | pof__cn | _ | _ |
10 | मंत्री | _ | NNP | _ | _ | 12 | nmod__adj | _ | _ |
11 | शिवराज | _ | NNPC | _ | _ | 12 | pof__cn | _ | _ |
12 | पाटिल | _ | NNP | _ | _ | 32 | k4 | _ | _ |
13 | से | _ | PSP | _ | _ | 12 | lwg__psp | _ | _ |
14 | मुलाकात | _ | NN | _ | _ | 15 | pof | _ | _ |
15 | कर | _ | VM | _ | _ | 32 | vmod | _ | _ |
16 | आईएएस | _ | NNP | _ | _ | 17 | ccof | _ | _ |
17 | और | _ | CC | _ | _ | 20 | r6 | _ | _ |
18 | आईपीएस | _ | NNP | _ | _ | 17 | ccof | _ | _ |
19 | की | _ | PSP | _ | _ | 18 | lwg__psp | _ | _ |
20 | तर्ज | _ | NN | _ | _ | 32 | k7 | _ | _ |
21 | पर | _ | PSP | _ | _ | 20 | lwg__psp | _ | _ |
22 | राष्ट्रीय | _ | JJ | _ | _ | 23 | nmod__adj | _ | _ |
23 | स्तर | _ | NN | _ | _ | 32 | k7 | _ | _ |
24 | पर | _ | PSP | _ | _ | 23 | lwg__psp | _ | _ |
25 | एक | _ | QC | _ | _ | 27 | nmod__adj | _ | _ |
26 | खुफिया | _ | NNC | _ | _ | 27 | nmod__adj | _ | _ |
27 | सेवा | _ | NN | _ | _ | 29 | k2 | _ | _ |
28 | शुरू | _ | NN | _ | _ | 29 | pof | _ | _ |
29 | करने | _ | VM | _ | _ | 31 | r6-k2 | _ | _ |
30 | का | _ | PSP | _ | _ | 29 | lwg__psp | _ | _ |
31 | अनुरोध | _ | NN | _ | _ | 32 | pof | _ | _ |
32 | किया | _ | VM | _ | _ | 0 | main | _ | _ |
33 | । | _ | SYM | _ | _ | 32 | rsym | _ | _ |
The first sentence of the development data in the UTF-8 SSF format with gold-standard morphology:
<Sentence id='1'>
1 भाजपा NNP <fs af='भाजपा,n,f,sg,3,o,0_ने,0' name='भाजपा' posn='10' chunkId='NP' drel='k1:लगाया' vpos='vib_2' chunkType='head:NP'>
2 ने PSP <fs af='ने,psp,,,,,,' name='ने' posn='20' drel='lwg__psp:भाजपा' chunkType='child:NP'>
3 केंद्र NNPC <fs name='केंद्र' chunkId='FRAGP' chunkType='head:'FRAGP' drel='ccof:और'>
4 और CC <fs af='और,avy,,,,,,' name='और' posn='40' chunkId='CCP' drel='nmod:सरकार' chunkType='head:CCP'>
5 केरल NNPC <fs name='केरल' chunkId='FRAGP2' chunkType='head:'FRAGP2' drel='ccof:और'>
6 सरकार NNP <fs af='सरकार,n,f,sg,3,o,0_पर,0' name='सरकार' posn='60' chunkId='NP2' drel='k7:लगाया' vpos='vib_2' chunkType='head:NP2'>
7 पर PSP <fs af='पर,psp,,,,,,' name='पर' posn='70' drel='lwg__psp:सरकार' chunkType='child:NP2'>
8 भारतीय JJ <fs af='भारतीय,adj,any,any,,o,,' name='भारतीय' posn='80' drel='nmod__adj:ड्राइवर' chunkType='child:NP3'>
9 ड्राइवर NN <fs af='ड्राइवर,n,m,sg,3,o,0,0' name='ड्राइवर' posn='90' chunkId='NP3' drel='nmod:कुट्टी' chunkType='head:NP3'>
10 एम. NNPC <fs af='एम.,n,m,sg,3,d,0,0' name='एम.' posn='100' drel='pof__cn:कुट्टी' chunkType='child:NP4'>
11 आर. NNPC <fs af='आर.,n,m,sg,3,d,0,0' name='आर.' posn='110' drel='pof__cn:कुट्टी' chunkType='child:NP4'>
12 कुट्टी NNP <fs af='कुट्टी,n,m,sg,3,o,0_का,0' name='कुट्टी' posn='120' chunkId='NP4' drel='r6:हत्या' vpos='vib_4' chunkType='head:NP4'>
13 की PSP <fs af='का,psp,f,sg,,o,,' name='की' posn='130' drel='lwg__psp:कुट्टी' chunkType='child:NP4'>
14 हत्या NN <fs af='हत्या,n,f,sg,3,o,0_के_लिए,0' name='हत्या' posn='140' chunkId='NP5' drel='jjmod:जिम्मेदार' vpos='vib_2_3' chunkType='head:NP5'>
15 के PSP <fs af='के,psp,,,,,,' name='के' posn='150' drel='lwg__psp:हत्या' chunkType='child:NP5'>
16 लिए PSP <fs af='लिए,psp,,,,,,' name='लिए' posn='160' drel='lwg__cont:हत्या' chunkType='child:NP5'>
17 जिम्मेदार JJ <fs af='जिम्मेदार,adj,any,any,,o,,' name='जिम्मेदार' posn='170' chunkId='JJP' drel='nmod:तालिबान' chunkType='head:JJP'>
18 तालिबान NNP <fs af='तालिबान,n,m,sg,3,o,0_के_साथ,0' name='तालिबान' posn='180' chunkId='NP6' drel='ras-k1:लगाया' vpos='vib_2_3' chunkType='head:NP6'>
19 के PSP <fs af='के,psp,,,,,,' name='के2' posn='190' drel='lwg__psp:तालिबान' chunkType='child:NP6'>
20 साथ NST <fs af='साथ,nst,m,sg,3,d,,' name='साथ' posn='200' drel='lwg__cont:तालिबान' chunkType='child:NP6'>
21 निपटने VM <fs af='निपट,v,any,any,any,o,ना_में,nA' name='निपटने' posn='210' chunkId='VGNN' drel='k7:लगाया' vpos='tam_2' chunkType='head:VGNN'>
22 में PSP <fs af='में,psp,,,,,,' name='में' posn='220' drel='lwg__psp:निपटने' chunkType='child:VGNN'>
23 ढिलाई NN <fs af='ढिलाई,n,f,sg,3,d,0,0' name='ढिलाई' posn='230' chunkId='NP7' drel='k2:बरतने' chunkType='head:NP7'>
24 बरतने VM <fs af='बरत,v,any,sg,any,o,ना_का,nA' name='बरतने' posn='240' chunkId='VGNN2' drel='r6:आरोप' vpos='tam_2' chunkType='head:VGNN2'>
25 का PSP <fs af='का,psp,m,sg,,d,,' name='का' posn='250' drel='lwg__psp:बरतने' chunkType='child:VGNN2'>
26 आरोप NN <fs af='आरोप,n,m,sg,3,d,0,0' name='आरोप' posn='260' chunkId='NP8' drel='k2:लगाया' chunkType='head:NP8'>
27 लगाया VM <fs af='लगा,v,m,sg,3,,या_है,yA' name='लगाया' posn='270' chunkId='VGF' chunkType='head:VGF' voicetype='active' vpos='tam_2' stype='declarative'>
28 है VAUX <fs af='है,v,any,sg,3,,है,hE' name='है' posn='280' drel='lwg__vaux:लगाया' chunkType='child:VGF'>
29 । SYM <fs af='।,punc,,,,,,' name='।' posn='290' chunkId='BLK' drel='rsym:लगाया' chunkType='head:BLK'>
</Sentence>
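
Each token in the SSF sample above carries its annotation in an `<fs ...>` feature structure, where `af` holds the comma-separated morphology and `drel` holds the relation and the name of the head. The following Python sketch shows one way to pull these apart. It assumes whitespace-separated fields as shown on this page (the original files are presumably tab-separated) and well-formed single-quoted attributes; the name `parse_ssf_token` is hypothetical. Lines with unbalanced quotes, such as the `chunkType` attribute of tokens 3 and 5 above, would need extra handling.

```python
import re

# attribute='value' pairs inside the <fs ...> feature structure
ATTR = re.compile(r"(\w+)='([^']*)'")


def parse_ssf_token(line):
    """Split one SSF token line into index, form, POS tag and <fs> attributes."""
    index, form, pos, fs = line.split(None, 3)
    attrs = dict(ATTR.findall(fs))
    # drel has the shape 'relation:name-of-head'; it is absent on the root.
    relation, _, head_name = attrs.get("drel", "").partition(":")
    return {
        "index": int(index),
        "form": form,
        "pos": pos,
        "af": attrs.get("af", "").split(","),
        "relation": relation,
        "head": head_name,
        "attrs": attrs,
    }


# Token 2 of the sample sentence.
sample = ("2 ने PSP <fs af='ने,psp,,,,,,' name='ने' posn='20' "
          "drel='lwg__psp:भाजपा' chunkType='child:NP'>")
token = parse_ssf_token(sample)
print(token["form"], token["pos"], token["relation"], token["head"])
```

Note that `drel` points to the head by its `name` attribute and not by its index, so resolving the head requires a lookup over the names in the same sentence.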
Nonprojective attachments are rare in HyDT-Hindi: only 862 of the 77068 chunks in the training and development data of the ICON 2010 version (1.12%) are attached nonprojectively.
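
A count of this kind can be derived from the head column alone. The sketch below uses one common definition: a token is attached nonprojectively if some token lying between it and its head is not a descendant of that head. Whether the figure above was obtained with exactly this definition is an assumption, and the function name and the four-token example are hypothetical.

```python
def nonprojective_tokens(heads):
    """Return the 1-based ids of tokens whose incoming arc is nonprojective.

    heads[i - 1] is the head of token i; 0 marks the artificial root.
    The input is assumed to be a well-formed tree (no cycles).
    """
    def descends_from(node, ancestor):
        # Walk up the tree from node until the root is reached.
        while node != 0:
            node = heads[node - 1]
            if node == ancestor:
                return True
        return False

    result = []
    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        if head == 0:
            continue  # arcs from the artificial root are not tested
        left, right = sorted((head, dep))
        if any(not descends_from(k, head) for k in range(left + 1, right)):
            result.append(dep)
    return result


# Hypothetical 4-token tree: token 2 hangs on token 4 across the root (token 3).
toy_heads = [3, 4, 0, 3]
print(nonprojective_tokens(toy_heads))

# Share of nonprojective attachments quoted in the text.
share = round(100 * 862 / 77068, 2)
print(share)
```

In the toy tree only token 2 is reported, because token 3 lies between it and its head but is not dominated by that head.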
The results of the ICON 2009 NLP tools contest have been published in (Husain, 2009). There were two evaluation rounds, the first with the coarse-grained syntactic tags, the second with the fine-grained syntactic tags. To reward language independence, only systems that parsed all three languages were officially ranked. The following table presents the Hindi/coarse-grained results of the four officially ranked systems.
Parser (Authors) | LAS | UAS |
---|---|---|
Hyderabad (Ambati et al.) | 79.33 | 90.22 |
Malt (Nivre) | 78.20 | 89.36 |
Malt+MST (Zeman) | 73.88 | 88.49 |
Mannem | 76.90 | 88.06 |
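
For readers unfamiliar with the two metrics in the table above: UAS (unlabeled attachment score) is the percentage of tokens that receive the correct head, and LAS (labeled attachment score) additionally requires the correct dependency label. A minimal Python sketch follows. It does not reproduce the contest's evaluation script; in particular, how punctuation was treated there is not known here. The gold pairs are taken from tokens 14 to 17 of the sample sentence, while the predicted pairs are invented.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) in percent for two aligned lists of (head, label) pairs."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    # UAS: head matches; LAS: head and label both match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * las_hits / total, 100.0 * uas_hits / total


# Gold heads and labels of tokens 14-17 in the sample sentence.
gold = [(15, "pof"), (32, "vmod"), (17, "ccof"), (20, "r6")]
# Invented parser output: one wrong label, one wrong head.
predicted = [(15, "pof"), (32, "k7"), (17, "ccof"), (32, "r6")]

las, uas = attachment_scores(gold, predicted)
print(f"LAS = {las:.2f}, UAS = {uas:.2f}")
```

Since every token counted for LAS is also counted for UAS, LAS can never exceed UAS, which matches the pattern in the reported results.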
The results of the ICON 2010 NLP tools contest have been published in (Husain et al., 2010), page 6. These are the best results for Hindi with fine-grained syntactic tags:
Parser (Authors) | LAS | UAS |
---|---|---|
Attardi et al. | 87.49 | 94.78 |
Kosaraju et al. | 88.63 | 94.54 |
Kolachina et al. | 86.22 | 93.25 |