user:zeman:treebanks:hi [2011/12/06 16:32] zeman Sample training Shakti. |
user:zeman:treebanks:hi [2012/12/15 13:15] (current) zeman |
* Shakti Standard Format (SSF; native) | * Shakti Standard Format (SSF; native) |
* CoNLL format | * CoNLL format |
| * HPST 2012 (MTPIL workshop COLING 2012; this version is also called HTB (Hindi Treebank) 0.5) |
| * Shakti Standard Format (SSF; native) |
| * CoNLL format |
| * Hyderabad DT is what they call the old one with little data. This one is the Hindi treebank from the large NSF-sponsored project. |
| |
There has been no official release of the treebank yet. There have been two as-is sample releases for the purposes of the NLP tools contests in parsing Indian languages, attached to the [[http://ltrc.iiit.ac.in/nlptools2009/|ICON 2009]] and [[http://ltrc.iiit.ac.in/nlptools2010/|2010]] conferences. | |
| There has been no official release of the treebank yet. There have been three as-is sample releases for the purposes of the NLP tools contests in parsing Indian languages, attached to the [[http://ltrc.iiit.ac.in/nlptools2009/|ICON 2009]] and [[http://ltrc.iiit.ac.in/nlptools2010/|2010]] conferences and the [[http://ltrc.iiit.ac.in/mtpil2012/|MTPIL]] workshop of [[http://www.coling2012-iitb.org/|COLING 2012]]. |
| |
==== Obtaining and License ==== | ==== Obtaining and License ==== |
| |
There is no standard distribution channel for the treebank after the ICON 2010 evaluation period. Inquire at the LTRC (ltrc (at) iiit (dot) ac (dot) in) about the possibility of getting the data. The ICON 2010 license in short: | There is no standard distribution channel for the treebank after the shared task evaluation period. Inquire at the LTRC (ltrc (at) iiit (dot) ac (dot) in) about the possibility of getting the data. The ICON 2010 and HPST 2012 license in short: |
| |
* non-commercial research usage | * non-commercial research usage |
==== Domain ==== | ==== Domain ==== |
| |
Unknown. | News domain corpus from ISI Kolkata. |
| |
==== Size ==== | ==== Size ==== |
| |
HyDT-Bangla shows dependencies between chunks, not words. The node/tree ratio is thus much lower than in other treebanks. The ICON 2009 version came with a data split into three parts: training, development and test: | HyDT-Hindi contains dependencies on two levels: between chunks and inside chunks. The ICON 2009 CoNLL-formatted version contained only dependencies between chunks, thus the node/tree ratio was much lower than in other treebanks. The ICON 2009 version came with a data split into three parts: training, development and test: |
| |
^ Part ^ Sentences ^ Chunks ^ Ratio ^ | ^ Part ^ Sentences ^ Chunks ^ Ratio ^ |
| Training | 980 | 6449 | 6.58 | | | Training | 1501 | 13779 | 9.18 | |
| Development | 150 | 811 | 5.41 | | | Development | 150 | 1250 | 8.33 | |
| Test | 150 | 961 | 6.41 | | | Test | 150 | 1156 | 7.71 | |
| TOTAL | 1280 | 8221 | 6.42 | | | TOTAL | 1801 | 16185 | 8.99 | |
| |
The ICON 2010 version came with a data split into three parts: training, development and test: | The ICON 2010 version came with a data split into three parts: training, development and test. The intra-chunk dependencies have been added: |
| |
^ Part ^ Sentences ^ Chunks ^ Ratio ^ Words ^ Ratio ^ | ^ Part ^ Sentences ^ Chunks ^ Ratio ^ Words ^ Ratio ^ |
| Training | 979 | 6440 | 6.58 | 10305 | 10.52 | | | Training | 2972 | | | 64452 | 21.69 | |
| Development | 150 | 812 | 5.41 | 1196 | 7.97 | | | Development | 543 | | | 12616 | 23.23 | |
| Test | 150 | 961 | 6.41 | 1350 | 9.00 | | | Test | 321 | | | 6588 | 20.52 | |
| TOTAL | 1279 | 8213 | 6.42 | 12851 | 10.04 | | | TOTAL | 3836 | | | 83656 | 21.81 | |
| |
I have counted the sentences and chunks. The number of words comes from (Husain et al., 2010). Note that the paper gives the number of training sentences as 980 (instead of 979), which is a mistake. The last training sentence has the id 980 but there is no sentence with id 418. | I have counted the sentences and tokens (words) in the ''.conll'' files; there are slight differences from the statistics presented in (Husain et al., 2010). |
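
Counts of this kind can be reproduced with a few lines of Python. The following is only a minimal sketch, assuming the usual CoNLL-X layout (one token per line, tab-separated columns, an empty line after each sentence). The function name ''count_conll'' is hypothetical and is not part of any tool distributed with the treebank.

```python
def count_conll(lines):
    """Return (sentences, tokens, tokens per sentence) for CoNLL-X lines.

    Assumption: every non-empty line is one token and sentences are
    separated by empty (or whitespace-only) lines.
    """
    sentences = 0
    tokens = 0
    in_sentence = False
    for line in lines:
        if line.strip():
            tokens += 1
            if not in_sentence:
                # First token after a separator starts a new sentence.
                sentences += 1
                in_sentence = True
        else:
            in_sentence = False
    ratio = round(tokens / sentences, 2) if sentences else 0.0
    return sentences, tokens, ratio
```

To apply it to a file, open the file with ''encoding="utf-8"'' and pass the file object to ''count_conll''; the third value of the result corresponds to the Ratio column of the tables.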
| |
Apparently the training-development-test data split was more or less identical in both years, except for the minor discrepancies (number of training sentences and development chunks). | The HTB 0.5 (2012) version came with a data split into three parts: training, development and test. The intra-chunk dependencies have been added: |
| |
==== Inside ==== | ^ Part ^ Sentences ^ Chunks ^ Ratio ^ Words ^ Ratio ^ |
| | Training | 12041 | | | 268093 | 22.27 | |
| | Development | 1233 | | | 26416 | 21.42 | |
| | Test | | | | | | |
| | TOTAL | | | | | | |
| |
The text uses the [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/map.pdf|WX encoding]] of Indian letters. If we know what the original script is (Bengali in this case) we can map the WX encoding to the original characters in UTF-8. WX uses English letters so if there was embedded English (or other string using Latin letters) it will probably get lost during the conversion. | ==== Inside ==== |
| |
The CoNLL format contains only the chunk heads. The native SSF format shows the other words in the chunk, too, but it does not capture intra-chunk dependency relations. This is an example of a multi-word chunk: | HTB 0.5 is distributed in Devanagari UTF-8 and in the WX encoding (see below), both in SSF and CoNLL formats, each with gold-standard and automatic morphology. |
| |
<code>3 (( NP <fs af='rumAla,n,,sg,,d,0,0' head="rumAla" drel=k2:VGF name=NP3> | //The rest of this section applies to the ICON datasets. It may or may not still be valid for HTB 0.5.// |
3.1 ekatA QC <fs af='eka,num,,,,,,'> | |
3.2 ledisa JJ <fs af='ledisa,unk,,,,,,'> | |
3.3 rumAla NN <fs af='rumAla,n,,sg,,d,0,0' name="rumAla"> | |
))</code> | |
| |
In the CoNLL format, the CPOS column contains the [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/Chunk-Tag-List.pdf|chunk label]] (e.g. ''NP'' = //noun phrase//) and the POS column contains the [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/POS-Tag-List.pdf|part of speech]] of the chunk head. | The text uses the [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/map.pdf|WX encoding]] of Indian letters. If we know what the original script is (Devanagari in this case) we can map the WX encoding to the original characters in UTF-8. WX uses English letters so if there was embedded English (or other string using Latin letters) it will probably get lost during the conversion. Note that there are (not infrequent) broken characters (''\x{FFFD} REPLACEMENT CHARACTER'') in the WX encoding and the correct characters cannot be recovered automatically. |
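
Tokens damaged by ''U+FFFD REPLACEMENT CHARACTER'' cannot be repaired automatically, but they can at least be located. The following is a minimal sketch, assuming tab-separated CoNLL lines with the word form in the second column. The name ''find_broken_tokens'' is hypothetical.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def find_broken_tokens(lines):
    """Yield (line_number, form) for tokens whose FORM contains U+FFFD.

    Assumption: columns are tab-separated and FORM is the second column.
    """
    for number, line in enumerate(lines, start=1):
        columns = line.rstrip("\n").split("\t")
        if len(columns) > 1 and REPLACEMENT_CHAR in columns[1]:
            yield number, columns[1]
```

The reported line numbers can then be used to inspect or exclude the affected sentences by hand.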
| |
Occasionally there are ''NULL'' nodes that do not correspond to any surface chunk or token. They represent elided participants. | Occasionally there are ''NULL'' nodes that do not correspond to any surface chunk or token. They represent elided participants. |
The [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/dep-tagset.pdf|syntactic tags]] (dependency relation labels) are //karaka// relations, i.e. deep syntactic roles according to the Pāṇinian grammar. There are separate versions of the treebank with [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/mapping_fine-to-coarse.pdf|fine-grained and coarse-grained]] syntactic tags. | The [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/dep-tagset.pdf|syntactic tags]] (dependency relation labels) are //karaka// relations, i.e. deep syntactic roles according to the Pāṇinian grammar. There are separate versions of the treebank with [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/mapping_fine-to-coarse.pdf|fine-grained and coarse-grained]] syntactic tags. |
| |
According to [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/toolscontest10-workshoppaper-final.pdf|(Husain et al., 2010)]], in the ICON 2010 version, the chunk tags, POS tags and inter-chunk dependencies (topology + tags) were annotated manually. The rest (lemma, morphosyntactic features, headword of chunk) was marked automatically. | According to [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/toolscontest10-workshoppaper-final.pdf|(Husain et al., 2010)]], in the ICON 2010 version, the chunk tags, POS tags, lemma, morphosyntactic features and inter-chunk dependencies (topology + tags) were annotated manually. The rest (intra-chunk dependencies, headword of chunk) was marked automatically. The tool for intra-chunk dependency parsing achieves about 96% accuracy. |
| |
Note: There have been cycles in the Hindi part of HyDT but no such problem occurs in the Bengali part. | Note: There have been cycles in the Hindi part of HyDT. |
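
Cycles of this kind can be detected by following the head links from every token. The following is a minimal sketch, assuming that the HEAD values of one sentence are given as a list of integers (token ids start at 1, 0 is the artificial root) and that every value is a valid id. The name ''find_cycle'' is hypothetical.

```python
def find_cycle(heads):
    """Return the token ids on one cycle, or None if there is no cycle.

    heads[i] is the head of token i + 1; 0 denotes the artificial root.
    """
    n = len(heads)
    # 0 = unvisited, 1 = on the path being followed, 2 = known to reach the root
    state = [0] * (n + 1)
    state[0] = 2
    for start in range(1, n + 1):
        path = []
        node = start
        while state[node] == 0:
            state[node] = 1
            path.append(node)
            node = heads[node - 1]
        if state[node] == 1:
            # We came back to a node on the current path: that is a cycle.
            return path[path.index(node):]
        for visited in path:
            state[visited] = 2
    return None
```

Each token is visited at most once, so the check is linear in the sentence length.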
| |
==== Sample ==== | ==== Sample ==== |
</Sentence></code> | </Sentence></code> |
| |
And in the CoNLL format: | The same two sentences converted to the CoNLL format, WX characters decoded back to Devanagari in UTF-8: |
| |
| 1 | Agei | Age | NP | NST | lex-Age<nowiki>|</nowiki>cat-adv<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-Agei<nowiki>|</nowiki>name-NP | 3 | k7t | _ | _ | | | <nowiki>1</nowiki> | <nowiki>बात</nowiki> | <nowiki>बात</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-bAwa|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-10|name-bAwa|chunkId-NP|chunkType-head:NP</nowiki> | <nowiki>3</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 2 | cA | cA | NP | NN | lex-cA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-cA<nowiki>|</nowiki>name-NP2 | 3 | k1 | _ | _ | | | <nowiki>2</nowiki> | <nowiki>गलत</nowiki> | <nowiki>गलत</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-galawa|cat-adj|gend-any|num-any|pers-|case-any|vib-|tam-|posn-20|name-galawa|chunkId-JJP|chunkType-head:JJP</nowiki> | <nowiki>3</nowiki> | <nowiki>k1s</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 3 | ese | As | VGF | VM | lex-As<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-A_yA+Ce<nowiki>|</nowiki>tam-A<nowiki>|</nowiki>head-ese<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ | | | <nowiki>3</nowiki> | <nowiki>हो</nowiki> | <nowiki>हो</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-ho|cat-v|gend-any|num-any|pers-any|case-|vib-0|tam-0|stype-declarative|posn-30|voicetype-active|name-ho|chunkId-VGF|chunkType-head:VGF</nowiki> | <nowiki>11</nowiki> | <nowiki>vmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>4</nowiki> | <nowiki>तो</nowiki> | <nowiki>तो</nowiki> | <nowiki>CC</nowiki> | <nowiki>avy</nowiki> | <nowiki>lex-wo|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-40|name-wo|chunkId-CCP|chunkType-head:CCP</nowiki> | <nowiki>0</nowiki> | <nowiki>main</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
And after conversion of the WX encoding to the Bengali script in UTF-8: | | <nowiki>5</nowiki> | <nowiki>गुस्सा</nowiki> | <nowiki>गुस्सा</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-gussA|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-50|name-gussA|chunkId-NP2|chunkType-head:NP2</nowiki> | <nowiki>9</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>6</nowiki> | <nowiki>सेलेब्रिटिज</nowiki> | <nowiki>सेलेब्रिटिज</nowiki> | <nowiki>NN</nowiki> | <nowiki>unk</nowiki> | <nowiki>lex-selebritija|cat-unk|gend-|num-|pers-|case-|vib-0_ko|tam-|posn-60|vpos-vib_2_RP|name-selebritija|chunkId-NP3|chunkType-head:NP3</nowiki> | <nowiki>9</nowiki> | <nowiki>k4a</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 1 | আগেই | আগে | NP | NST | lex-Age<nowiki>|</nowiki>cat-adv<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-Agei<nowiki>|</nowiki>name-NP | 3 | k7t | _ | _ | | | <nowiki>7</nowiki> | <nowiki>को</nowiki> | <nowiki>को</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-70|chunkType-child:NP3|name-ko</nowiki> | <nowiki>6</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 2 | চা | চা | NP | NN | lex-cA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-cA<nowiki>|</nowiki>name-NP2 | 3 | k1 | _ | _ | | | <nowiki>8</nowiki> | <nowiki>भी</nowiki> | <nowiki>भी</nowiki> | <nowiki>RP</nowiki> | <nowiki>avy</nowiki> | <nowiki>lex-BI|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-80|chunkType-child:NP3|name-BI</nowiki> | <nowiki>6</nowiki> | <nowiki>lwg__rp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 3 | এসে | আস্ | VGF | VM | lex-As<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-A_yA+Ce<nowiki>|</nowiki>tam-A<nowiki>|</nowiki>head-ese<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ | | | <nowiki>9</nowiki> | <nowiki>आना</nowiki> | <nowiki>आ</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-A|cat-v|gend-any|num-any|pers-any|case-d|vib-nA|tam-nA|posn-90|name-AnA|chunkId-VGNN|chunkType-head:VGNN</nowiki> | <nowiki>11</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>10</nowiki> | <nowiki>लाजमी</nowiki> | <nowiki>लाजमी</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-lAjamI|cat-adj|gend-any|num-any|pers-|case-|vib-|tam-|posn-100|name-lAjamI|chunkId-JJP2|chunkType-head:JJP2</nowiki> | <nowiki>11</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>11</nowiki> | <nowiki>है</nowiki> | <nowiki>है</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-hE|cat-v|gend-any|num-sg|pers-3|case-|vib-hE|tam-hE|stype-declarative|posn-110|voicetype-active|name-hE|chunkId-VGF2|chunkType-head:VGF2</nowiki> | <nowiki>4</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>12</nowiki> | <nowiki>.</nowiki> | <nowiki>.</nowiki> | <nowiki>SYM</nowiki> | <nowiki>punc</nowiki> | <nowiki>lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-120|chunkType-child:VGF2|name-.</nowiki> | <nowiki>11</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | |||||||||| |
| | <nowiki>1</nowiki> | <nowiki>बृहस्पतिवार</nowiki> | <nowiki>बृहस्पतिवार</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-bqhaspawivAra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-10|vpos-vib_2|name-bqhaspawivAra|chunkId-NP|chunkType-head:NP</nowiki> | <nowiki>6</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>2</nowiki> | <nowiki>को</nowiki> | <nowiki>को</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-20|chunkType-child:NP|name-ko</nowiki> | <nowiki>1</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>3</nowiki> | <nowiki>ज़ी</nowiki> | <nowiki>जी</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-jI|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-30|vpos-vib_2|name-jZI|chunkId-NP2|chunkType-head:NP2</nowiki> | <nowiki>6</nowiki> | <nowiki>k7</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>4</nowiki> | <nowiki>में</nowiki> | <nowiki>में</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-40|chunkType-child:NP2|name-meM</nowiki> | <nowiki>3</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>5</nowiki> | <nowiki>शुरू</nowiki> | <nowiki>शुरू</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-SurU|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-50|name-SurU|chunkId-NP3|chunkType-head:NP3</nowiki> | <nowiki>6</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>6</nowiki> | <nowiki>हुए</nowiki> | <nowiki>हो</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-ho|cat-v|gend-m|num-sg|pers-any|case-|vib-eM|tam-eM|posn-60|name-hue|chunkId-VGNF|chunkType-head:VGNF</nowiki> | <nowiki>10</nowiki> | <nowiki>nmod__k1inv</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>7</nowiki> | <nowiki>��वें</nowiki> | <nowiki>��वें</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-��veM|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-70|chunkType-child:NP4|name-��veM</nowiki> | <nowiki>10</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>8</nowiki> | <nowiki>अंतर्राष्ट्रीय</nowiki> | <nowiki>अंतर्राष्ट्रीय</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-aMwarrARtrIya|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-80|chunkType-child:NP4|name-aMwarrARtrIya</nowiki> | <nowiki>10</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>9</nowiki> | <nowiki>फिल्म</nowiki> | <nowiki>फिल्म</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-Pilma|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-90|chunkType-child:NP4|name-Pilma</nowiki> | <nowiki>10</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>10</nowiki> | <nowiki>महोत्सव</nowiki> | <nowiki>महोत्सव</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-mahowsava|cat-n|gend-m|num-sg|pers-|case-o|vib-0_kA|tam-0|posn-100|vpos-vib_5|name-mahowsava|chunkId-NP4|chunkType-head:NP4</nowiki> | <nowiki>12</nowiki> | <nowiki>r6</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>11</nowiki> | <nowiki>के</nowiki> | <nowiki>का</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-kA|cat-psp|gend-m|num-sg|pers-|case-o|vib-|tam-|posn-110|chunkType-child:NP4|name-ke</nowiki> | <nowiki>10</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>12</nowiki> | <nowiki>रंग</nowiki> | <nowiki>रंग</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-raMga|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-120|vpos-vib_2|name-raMga|chunkId-NP5|chunkType-head:NP5</nowiki> | <nowiki>17</nowiki> | <nowiki>k7</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>13</nowiki> | <nowiki>में</nowiki> | <nowiki>में</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-130|chunkType-child:NP5|name-meM2</nowiki> | <nowiki>12</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>14</nowiki> | <nowiki>भंग</nowiki> | <nowiki>भंग</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-BaMga|cat-adj|gend-any|num-any|pers-|case-any|vib-|tam-|posn-140|name-BaMga|chunkId-JJP|chunkType-head:JJP</nowiki> | <nowiki>17</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>15</nowiki> | <nowiki>उस</nowiki> | <nowiki>वह</nowiki> | <nowiki>DEM</nowiki> | <nowiki>pn</nowiki> | <nowiki>lex-vaha|cat-pn|gend-any|num-sg|pers-3|case-o|vib-|tam-|posn-150|chunkType-child:NP6|name-usa</nowiki> | <nowiki>16</nowiki> | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>16</nowiki> | <nowiki>समय</nowiki> | <nowiki>समय</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-samaya|cat-n|gend-any|num-sg|pers-3|case-d|vib-0|tam-0|posn-160|name-samaya|chunkId-NP6|chunkType-head:NP6</nowiki> | <nowiki>17</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>17</nowiki> | <nowiki>पड़ा</nowiki> | <nowiki>पड</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-pada|cat-v|gend-any|num-any|pers-any|case-|vib-yA|tam-yA|stype-declarative|posn-170|voicetype-active|name-padZA|chunkId-VGF|chunkType-head:VGF</nowiki> | <nowiki>0</nowiki> | <nowiki>main</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>18</nowiki> | <nowiki>जब</nowiki> | <nowiki>जब</nowiki> | <nowiki>PRP</nowiki> | <nowiki>pn</nowiki> | <nowiki>lex-jaba|cat-pn|gend-|num-|pers-|case-|vib-|tam-|posn-180|coref-samaya|name-jaba|chunkId-NP7|chunkType-head:NP7</nowiki> | <nowiki>32</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>19</nowiki> | <nowiki>वहां</nowiki> | <nowiki>वहाँ</nowiki> | <nowiki>PRP</nowiki> | <nowiki>pn</nowiki> | <nowiki>lex-vahAz|cat-pn|gend-|num-|pers-|case-|vib-0_para|tam-|posn-190|vpos-vib_2|name-vahAM|chunkId-NP8|chunkType-head:NP8</nowiki> | <nowiki>21</nowiki> | <nowiki>jjmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>20</nowiki> | <nowiki>पर</nowiki> | <nowiki>पर</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-para|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-200|chunkType-child:NP8|name-para</nowiki> | <nowiki>19</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>21</nowiki> | <nowiki>तैनात</nowiki> | <nowiki>तैनात</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-wEnAwa|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-210|name-wEnAwa|chunkId-JJP2|chunkType-head:JJP2</nowiki> | <nowiki>22</nowiki> | <nowiki>nmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>22</nowiki> | <nowiki>सुरक्षाकर्मियों</nowiki> | <nowiki>सुरक्षाकर्मी</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-surakRAkarmI|cat-n|gend-m|num-pl|pers-3|case-o|vib-0_ne|tam-0|posn-220|vpos-vib_2|name-surakRAkarmiyoM|chunkId-NP9|chunkType-head:NP9</nowiki> | <nowiki>32</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>23</nowiki> | <nowiki>ने</nowiki> | <nowiki>ने</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ne|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-230|chunkType-child:NP9|name-ne</nowiki> | <nowiki>22</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>24</nowiki> | <nowiki>बॉलीवुड</nowiki> | <nowiki>बॉलीवुड</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-bOYlIvuda|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_kA|tam-0|posn-240|vpos-vib_2|name-bOYlIvuda|chunkId-NP10|chunkType-head:NP10</nowiki> | <nowiki>28</nowiki> | <nowiki>r6</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>25</nowiki> | <nowiki>की</nowiki> | <nowiki>का</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-kA|cat-psp|gend-f|num-sg|pers-|case-o|vib-|tam-|posn-250|chunkType-child:NP10|name-kI</nowiki> | <nowiki>24</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>26</nowiki> | <nowiki>अभिनेत्री</nowiki> | <nowiki>अभिनेत्री</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-aBinewrI|cat-n|gend-f|num-sg|pers-3|case-o|vib-0|tam-0|posn-260|chunkType-child:NP11|name-aBinewrI</nowiki> | <nowiki>27</nowiki> | <nowiki>nmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>27</nowiki> | <nowiki>बिपाशा</nowiki> | <nowiki>बिपाशा</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-bipASA|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-270|chunkType-child:NP11|name-bipASA</nowiki> | <nowiki>28</nowiki> | <nowiki>nmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>28</nowiki> | <nowiki>बसु</nowiki> | <nowiki>बसु</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-basu|cat-n|gend-f|num-sg|pers-3|case-o|vib-0_ke_sAWa|tam-0|posn-280|vpos-vib_vib_vib_4_5|name-basu|chunkId-NP11|chunkType-head:NP11</nowiki> | <nowiki>32</nowiki> | <nowiki>k2</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>29</nowiki> | <nowiki>के</nowiki> | <nowiki>के</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ke|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-290|chunkType-child:NP11|name-ke2</nowiki> | <nowiki>28</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>30</nowiki> | <nowiki>साथ</nowiki> | <nowiki>साथ</nowiki> | <nowiki>NST</nowiki> | <nowiki>nst</nowiki> | <nowiki>lex-sAWa|cat-nst|gend-m|num-sg|pers-3|case-d|vib-|tam-|posn-300|chunkType-child:NP11|name-sAWa</nowiki> | <nowiki>28</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>31</nowiki> | <nowiki>दुव्यर्वहार</nowiki> | <nowiki>दुव्यर्वहार</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-xuvyarvahAra|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-310|name-xuvyarvahAra|chunkId-NP12|chunkType-head:NP12</nowiki> | <nowiki>32</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>32</nowiki> | <nowiki>किया</nowiki> | <nowiki>कर</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-kara|cat-v|gend-m|num-sg|pers-any|case-|vib-yA|tam-yA|stype-declarative|posn-320|voicetype-active|name-kiyA|chunkId-VGF2|chunkType-head:VGF2</nowiki> | <nowiki>16</nowiki> | <nowiki>nmod__relc</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>33</nowiki> | <nowiki>.</nowiki> | <nowiki>.</nowiki> | <nowiki>SYM</nowiki> | <nowiki>punc</nowiki> | <nowiki>lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-330|chunkType-child:VGF2|name-.</nowiki> | <nowiki>32</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| |
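
The FEATS column in the rows above is a list of ''key-value'' pairs joined by vertical bars. The following is a minimal sketch of how such a string could be split into a dictionary, assuming that the key never contains a hyphen (values such as ''head:NP'' or empty values are kept as they are). The name ''parse_feats'' is hypothetical.

```python
def parse_feats(feats):
    """Split a FEATS string such as 'lex-ho|cat-v|gend-any' into a dict.

    Assumption: each item is 'key-value'; only the first hyphen separates
    the key from the value, so values may themselves contain hyphens.
    """
    result = {}
    if feats in ("", "_"):
        return result
    for item in feats.split("|"):
        key, _, value = item.partition("-")
        result[key] = value
    return result
```

For example, ''parse_feats("lex-ko|cat-psp|gend-|chunkType-child:NP3")'' gives a dictionary in which ''gend'' maps to the empty string and ''chunkType'' maps to ''child:NP3''.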
The first sentence of the ICON 2010 development data (with fine-grained syntactic tags) in the Shakti format: | The first sentence of the ICON 2010 development data (with fine-grained syntactic tags) in the Shakti format: |
| |
<code xml><document id=""> | <code xml><document docid="fullnews_id_2489467"> |
<head> | <head> |
<annotated-resource name="HyDT-Bangla" version="0.5" type="dep-interchunk-only" layers="morph,pos,chunk,dep-interchunk-only" language="ben" date-of-release="20100831"> | <caption>jela meM svasWa hE sarabajIwa xo BArawIya aXikAriyoM ne mulAkAwa kI pre isalAmAbAxa.</caption> |
| <language>Hindi </language> |
| <domain_name>News Articles </domain_name> |
| <word_count>524</word_count> |
| <byte_count>64554</byte_count> |
| <availability> |
| <format>CML/SSF</format> |
| <sentence_marker>.</sentence_marker> |
| <normalization>No</normalization> |
| </availability> |
| <encoding_description> |
| <original_encoding>ISO 8859</format> |
| <new_encoding>Unicode UTF8</new_encoding> |
| </encoding_description> |
| <distributor>LTRC, IIIT Hyderabad</distributor> |
| <project_description>NSF Hindi/Urdu Dependency Treebanking Project</place> |
| <creation> |
| </raw_corpus creation_date="" institute_name="IIIT Hyderabad"> |
| </annotated_corpus creation_date="06/01/2009" institute_name="IIIT Hyderabad"> |
| <edition_number>1.0</edition_number> |
| </creation> |
| <publication> |
| <place>New Delhi</place> |
| <date>30/5/2004</date> |
| <type>Newspaper</type> |
| <publisher> |
| <name>Amar Ujala</name> |
| <url>http://www.amarujala.com</url> |
| </publisher> |
| </publication> |
| |
| <annotated-resource name="HyDT-Hindi" version="2.0" type="dep-words" layers="morph,pos,chunk,dep-word" language="hin" date-of-release="20100831"> |
<annotation-standard> | <annotation-standard> |
<morph-standard name="Anncorra-morph" version="1.31" date="20080920" /> | <morph-standard name="Anncorra-morph" version="1.31" date="20080920" /> |
<pos-standard name="Anncorra-pos" version="" date="20061215" /> | <pos-standard name="Anncorra-pos" version="" date="20061215" /> |
<chunk-standard name="Anncorra-chunk" version="" date="20061215" /> | <chunk-standard name="Anncorra-chunk" version="" date="20061215" /> |
| <intrachunk-dependency-standard name="Anncorra-intrachunk-dep" version="1.0" date="" dep-tagset-granularity="5" /> |
<dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" /> | <dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" /> |
</annotation-standard> | </annotation-standard> |
</annotated-resource> | </annotated-resource> |
</head> | </head> |
| <body> |
| <tb number="1" segment="no" bullet="no"> |
| <foreign language="select" writingsystem="LTR"></foreign> |
| <text> |
<Sentence id="1"> | <Sentence id="1"> |
1 (( NP <fs af='parabarwIkAle,adv,,,,,,' head="parabarwIkAle" drel=k7t:VGF name=NP> | 1 kota XC <fs af='kota,n,m,sg,3,d,0,0' posn='10' drel='mod:lAhOra' chunkType='child:NP' name='kota'> |
1.1 parabarwIkAle NN <fs af='parabarwIkAle,adv,,,,,,' name="parabarwIkAle"> | 2 laKapawa XC <fs af='laKapawa,n,m,sg,3,d,0,0' posn='20' drel='mod:lAhOra' chunkType='child:NP' name='laKapawa'> |
)) | 3 jela XC <fs af='jela,n,m,sg,3,d,0,0' posn='30' drel='mod:lAhOra' chunkType='child:NP' name='jela'> |
2 (( NP <fs af='aPisa-biyArAraxera,unk,,,,,,' head="aPisa-biyArAraxera" drel=r6:NP3 name=NP2> | 4 lAhOra NNP <fs af='lAhOra,n,m,sg,3,o,0_meM,0' drel='jjmod:baMxa' posn='40' vpos='vib_5' name='lAhOra' chunkId='NP' chunkType='head:NP'> |
2.1 aPisa-biyArAraxera NN <fs af='aPisa-biyArAraxera,unk,,,,,,' name="aPisa-biyArAraxera"> | 5 meM PSP <fs af='meM,psp,,,,,,' posn='50' drel='lwg__psp:lAhOra' chunkType='child:NP' name='meM'> |
)) | 6 baMxa JJ <fs af='baMxa,adj,any,any,,o,,' drel='nmod:siMha' posn='60' name='baMxa' chunkId='JJP' chunkType='head:JJP'> |
3 (( NP <fs af='nAma,n,,sg,,d,0,0' head="nAma" drel=k2:VGNN name=NP3> | 7 sarabajIwa XC <fs af='sarabajIwa,n,m,sg,3,d,0,0' posn='70' drel='mod:siMha' chunkType='child:NP2' name='sarabajIwa'> |
3.1 nAma NN <fs af='nAma,n,,sg,,d,0,0' name="nAma"> | 8 siMha NNP <fs af='siMha,n,m,sg,3,o,0_ne,0' drel='k1:xIM' posn='80' vpos='vib_3' name='siMha' chunkId='NP2' chunkType='head:NP2'> |
)) | 9 ne PSP <fs af='ne,psp,,,,,,' posn='90' drel='lwg__psp:siMha' chunkType='child:NP2' name='ne'> |
4 (( NP <fs af='GoRaNA,unk,,,,,,' head="GoRaNA" drel=pof:VGNN name=NP4> | 10 maMgalavAra NNP <fs af='maMgalavAra,n,m,sg,3,o,0_ko,0' drel='k7t:xIM' posn='100' vpos='vib_2' name='maMgalavAra' chunkId='NP3' chunkType='head:NP3'> |
4.1 GoRaNA NN <fs af='GoRaNA,unk,,,,,,' name="GoRaNA"> | 11 ko PSP <fs af='ko,psp,,,,,,' posn='110' drel='lwg__psp:maMgalavAra' chunkType='child:NP3' name='ko'> |
)) | 12 BArawIya JJ <fs af='BArawIya,adj,any,any,,o,,' posn='120' drel='nmod__adj:xUwAvAsa' chunkType='child:NP4' name='BArawIya'> |
5 (( VGNN <fs af='kar,n,,,any,,,' head="karAra" drel=r6:NP5 name=VGNN> | 13 xUwAvAsa NN <fs af='xUwAvAsa,n,m,sg,3,o,0_kA,0' drel='r6:aXikAriyoM' posn='130' vpos='vib_3' name='xUwAvAsa' chunkId='NP4' chunkType='head:NP4'> |
5.1 karAra VM <fs af='kar,n,,,any,,,' name="karAra"> | 14 ke PSP <fs af='kA,psp,m,pl,,o,,' posn='140' drel='lwg__psp:xUwAvAsa' chunkType='child:NP4' name='ke'> |
)) | 15 xo QC <fs af='xo,num,any,pl,,o,,' posn='150' drel='nmod__adj:aXikAriyoM' chunkType='child:NP5' name='xo'> |
6 (( NP <fs af='samay,unk,,,,,,' head="samay" drel=k7t:VGF name=NP5> | 16 aXikAriyoM NN <fs af='aXikArI,n,m,pl,3,o,0_ko,0' drel='k4:xIM' posn='160' vpos='vib_3' name='aXikAriyoM' chunkId='NP5' chunkType='head:NP5'> |
6.1 samay NN <fs af='samay,unk,,,,,,' name="samay"> | 17 ko PSP <fs af='ko,psp,,,,,,' posn='170' drel='lwg__psp:aXikAriyoM' chunkType='child:NP5' name='ko2'> |
)) | 18 apane PRP <fs af='apanA,pn,any,sg,1,o,0_bAre_meM,0' drel='k7:xIM' posn='180' vpos='vib_2_3' name='apane' chunkId='NP6' chunkType='head:NP6'> |
7 (( NP <fs af='animeRake,unk,,,,,,' head="animeRake" drel=k2:VGF name=NP6> | 19 bAre PSP <fs af='bAre,psp,,,,,,' posn='190' drel='lwg__psp:apane' chunkType='child:NP6' name='bAre'> |
7.1 animeRake NNP <fs af='animeRake,unk,,,,,,' name="animeRake"> | 20 meM PSP <fs af='meM,psp,,,,,,' posn='200' drel='lwg__psp:apane' chunkType='child:NP6' name='meM2'> |
)) | 21 wamAma JJ <fs af='wamAma,adj,any,any,,d,,' posn='210' drel='nmod__adj:jAnakAriyAM' chunkType='child:NP7' name='wamAma'> |
8 (( VGF <fs af='sariye,unk,,,5,,0_rAKA+ka_ha+la,' head="sariye" name=VGF> | 22 vyakwigawa JJ <fs af='vyakwigawa,adj,any,any,,d,,' posn='220' drel='nmod__adj:jAnakAriyAM' chunkType='child:NP7' name='vyakwigawa'> |
8.1 sariye VM <fs af='sariye,unk,,,,,,' name="sariye"> | 23 jAnakAriyAM NN <fs af='jAnakAriyAM,n,f,pl,3,d,0,0' drel='k2:xIM' posn='230' name='jAnakAriyAM' chunkId='NP7' chunkType='head:NP7'> |
8.2 . SYM <fs af='.,punc,,,,,,'> | 24 xIM VM <fs af='xe,v,f,pl,3,,yA,yA' stype='declarative' posn='240' voicetype='active' name='xIM' chunkId='VGF' chunkType='head:VGF'> |
)) | 25 ki CC <fs af='ki,avy,,,,,,' drel='rs:jAnakAriyAM' posn='250' name='ki' chunkId='CCP' chunkType='head:CCP'> |
| 26 kina WQ <fs af='kOna,pn,any,pl,3,o,,' posn='260' drel='mod__wq:parisWiwiyoM' chunkType='child:NP8' name='kina'> |
| 27 parisWiwiyoM NN <fs af='parisWiwi,n,f,pl,3,o,0_meM,0' drel='k7:kiyA' posn='270' vpos='vib_3' name='parisWiwiyoM' chunkId='NP8' chunkType='head:NP8'> |
| 28 meM PSP <fs af='meM,psp,,,,,,' posn='280' drel='lwg__psp:parisWiwiyoM' chunkType='child:NP8' name='meM3'> |
| 29 use PRP <fs af='vaha,pn,any,sg,3,o,ko,ko' drel='k2:kiyA' posn='290' name='use' chunkId='NP9' chunkType='head:NP9'> |
| 30 giraPwAra JJ <fs af='giraPwAra,adj,any,any,,,,' drel='pof:kiyA' posn='300' name='giraPwAra' chunkId='JJP2' chunkType='head:JJP2'> |
| 31 kiyA VM <fs af='kara,v,m,sg,3,,yA_jA+yA�,yA' drel='ccof:Ora' stype='declarative' posn='310' voicetype='passive' vpos='tam_2' name='kiyA' chunkId='VGF2' chunkType='head:VGF2'> |
| 32 gayA VAUX <fs af='jA,v,m,sg,3,,yA�,yA1' posn='320' drel='lwg__vaux:kiyA' chunkType='child:VGF2' name='gayA'> |
| 33 , SYM <fs af=',s,punc,,,,,' posn='330' drel='rsym:kiyA' chunkType='child:VGF2' name=','> |
| 34 mukaxamA NN <fs af='mukaxamA,n,m,sg,3,d,0,0' drel='k1:calA' posn='340' name='mukaxamA' chunkId='NP10' chunkType='head:NP10'> |
| 35 calA VM <fs af='cala,v,m,sg,3,,yA,yA' hlt='true' drel='ccof:Ora' stype='declarative' posn='350' voicetype='active' name='calA' chunkId='VGF3' chunkType='head:VGF3'> |
| 36 Ora CC <fs af='Ora,avy,,,,,,' drel='ccof:ki' posn='360' name='Ora' chunkId='CCP2' chunkType='head:CCP2'> |
| 37 sajA NN <fs af='sajA,n,f,sg,3,d,0,0' drel='k1:huI' posn='370' name='sajA' chunkId='NP11' chunkType='head:NP11'> |
| 38 huI VM <fs af='ho,v,f,sg,3,,yA,yA' drel='ccof:Ora' stype='declarative' posn='380' voicetype='active' name='huI' chunkId='VGF4' chunkType='head:VGF4'> |
| 39 . SYM <fs af='.,punc,,,,,,' posn='390' drel='rsym:huI' chunkType='child:VGF4' name='.'> |
</Sentence></code> | </Sentence></code> |
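
The `<fs ...>` feature structures in the Shakti samples above can be read with a few lines of code. The following is a minimal sketch and not part of any official SSF tooling. The function name `parse_fs` is hypothetical, and the labels `lex`, `cat`, `gend`, `num`, `pers`, `case`, `vib`, `tam` for the eight comma-separated `af` slots are an assumption borrowed from the feature names visible in the CoNLL rendering of the same data.

```python
import re

# Matches name='value', name="value" and unquoted name=value (all three occur in the samples).
ATTR_RE = re.compile(r"""(\w+)=(?:'([^']*)'|"([^"]*)"|([^\s>]+))""")

# Assumed labels for the eight positional slots of the af attribute.
AF_FIELDS = ("lex", "cat", "gend", "num", "pers", "case", "vib", "tam")


def parse_fs(fs):
    """Turn one SSF feature structure string into a flat dict of strings."""
    attrs = {}
    for match in ATTR_RE.finditer(fs):
        name = match.group(1)
        # Exactly one of the three value groups is set; it may be the empty string.
        value = next(g for g in match.groups()[1:] if g is not None)
        attrs[name] = value
    af = attrs.pop("af", None)
    if af is not None:
        # Naive positional split; see the note below about the comma token.
        for field, value in zip(AF_FIELDS, af.split(",")):
            attrs[field] = value
    return attrs


if __name__ == "__main__":
    sample = "<fs af='xe,v,f,pl,3,,yA,yA' stype='declarative' posn='240' name='xIM'>"
    print(parse_fs(sample))
```

The naive split on commas mishandles the comma token itself: `af=',s,punc,,,,,'` comes out with an empty `lex`, `cat` equal to `s` and `gend` equal to `punc`, which is the same shift that is visible in row 33 of the CoNLL tables.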
| |
And in the CoNLL format: | And in the CoNLL format: |
| |
| 1 | parabarwIkAle | parabarwIkAle | NP | NN | lex-parabarwIkAle<nowiki>|</nowiki>cat-adv<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-parabarwIkAle<nowiki>|</nowiki>name-NP | 8 | k7t | _ | _ | | | <nowiki>1</nowiki> | <nowiki>kota</nowiki> | <nowiki>kota</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-kota|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-10|chunkType-child:NP|name-kota</nowiki> | <nowiki>4</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 2 | aPisa-biyArAraxera | aPisa-biyArAraxera | NP | NN | lex-aPisa-biyArAraxera<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-aPisa-biyArAraxera<nowiki>|</nowiki>name-NP2 | 3 | r6 | _ | _ | | | <nowiki>2</nowiki> | <nowiki>laKapawa</nowiki> | <nowiki>laKapawa</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-laKapawa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-20|chunkType-child:NP|name-laKapawa</nowiki> | <nowiki>4</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 3 | nAma | nAma | NP | NN | lex-nAma<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-nAma<nowiki>|</nowiki>name-NP3 | 5 | k2 | _ | _ | | | <nowiki>3</nowiki> | <nowiki>jela</nowiki> | <nowiki>jela</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-jela|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-30|chunkType-child:NP|name-jela</nowiki> | <nowiki>4</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 4 | GoRaNA | GoRaNA | NP | NN | lex-GoRaNA<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-GoRaNA<nowiki>|</nowiki>name-NP4 | 5 | pof | _ | _ | | | <nowiki>4</nowiki> | <nowiki>lAhOra</nowiki> | <nowiki>lAhOra</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-lAhOra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-40|vpos-vib_5|name-lAhOra|chunkId-NP|chunkType-head:NP</nowiki> | <nowiki>6</nowiki> | <nowiki>jjmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 5 | karAra | kar | VGNN | VM | lex-kar<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-any<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-karAra<nowiki>|</nowiki>name-VGNN | 6 | r6 | _ | _ | | | <nowiki>5</nowiki> | <nowiki>meM</nowiki> | <nowiki>meM</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-50|chunkType-child:NP|name-meM</nowiki> | <nowiki>4</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 6 | samay | samay | NP | NN | lex-samay<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-samay<nowiki>|</nowiki>name-NP5 | 8 | k7t | _ | _ | | | <nowiki>6</nowiki> | <nowiki>baMxa</nowiki> | <nowiki>baMxa</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-baMxa|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-60|name-baMxa|chunkId-JJP|chunkType-head:JJP</nowiki> | <nowiki>8</nowiki> | <nowiki>nmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 7 | animeRake | animeRake | NP | NNP | lex-animeRake<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-animeRake<nowiki>|</nowiki>name-NP6 | 8 | k2 | _ | _ | | | <nowiki>7</nowiki> | <nowiki>sarabajIwa</nowiki> | <nowiki>sarabajIwa</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-sarabajIwa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-70|chunkType-child:NP2|name-sarabajIwa</nowiki> | <nowiki>8</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 8 | sariye | sariye | VGF | VM | lex-sariye<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-0_rAKA+ka_ha+la<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-sariye<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ | | | <nowiki>8</nowiki> | <nowiki>siMha</nowiki> | <nowiki>siMha</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-siMha|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ne|tam-0|posn-80|vpos-vib_3|name-siMha|chunkId-NP2|chunkType-head:NP2</nowiki> | <nowiki>24</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>9</nowiki> | <nowiki>ne</nowiki> | <nowiki>ne</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ne|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-90|chunkType-child:NP2|name-ne</nowiki> | <nowiki>8</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>10</nowiki> | <nowiki>maMgalavAra</nowiki> | <nowiki>maMgalavAra</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-maMgalavAra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-100|vpos-vib_2|name-maMgalavAra|chunkId-NP3|chunkType-head:NP3</nowiki> | <nowiki>24</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>11</nowiki> | <nowiki>ko</nowiki> | <nowiki>ko</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-110|chunkType-child:NP3|name-ko</nowiki> | <nowiki>10</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>12</nowiki> | <nowiki>BArawIya</nowiki> | <nowiki>BArawIya</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-BArawIya|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-120|chunkType-child:NP4|name-BArawIya</nowiki> | <nowiki>13</nowiki> | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>13</nowiki> | <nowiki>xUwAvAsa</nowiki> | <nowiki>xUwAvAsa</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-xUwAvAsa|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_kA|tam-0|posn-130|vpos-vib_3|name-xUwAvAsa|chunkId-NP4|chunkType-head:NP4</nowiki> | <nowiki>16</nowiki> | <nowiki>r6</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>14</nowiki> | <nowiki>ke</nowiki> | <nowiki>kA</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-kA|cat-psp|gend-m|num-pl|pers-|case-o|vib-|tam-|posn-140|chunkType-child:NP4|name-ke</nowiki> | <nowiki>13</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>15</nowiki> | <nowiki>xo</nowiki> | <nowiki>xo</nowiki> | <nowiki>QC</nowiki> | <nowiki>num</nowiki> | <nowiki>lex-xo|cat-num|gend-any|num-pl|pers-|case-o|vib-|tam-|posn-150|chunkType-child:NP5|name-xo</nowiki> | <nowiki>16</nowiki> | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>16</nowiki> | <nowiki>aXikAriyoM</nowiki> | <nowiki>aXikArI</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-aXikArI|cat-n|gend-m|num-pl|pers-3|case-o|vib-0_ko|tam-0|posn-160|vpos-vib_3|name-aXikAriyoM|chunkId-NP5|chunkType-head:NP5</nowiki> | <nowiki>24</nowiki> | <nowiki>k4</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>17</nowiki> | <nowiki>ko</nowiki> | <nowiki>ko</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-170|chunkType-child:NP5|name-ko2</nowiki> | <nowiki>16</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>18</nowiki> | <nowiki>apane</nowiki> | <nowiki>apanA</nowiki> | <nowiki>PRP</nowiki> | <nowiki>pn</nowiki> | <nowiki>lex-apanA|cat-pn|gend-any|num-sg|pers-1|case-o|vib-0_bAre_meM|tam-0|posn-180|vpos-vib_2_3|name-apane|chunkId-NP6|chunkType-head:NP6</nowiki> | <nowiki>24</nowiki> | <nowiki>k7</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>19</nowiki> | <nowiki>bAre</nowiki> | <nowiki>bAre</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-bAre|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-190|chunkType-child:NP6|name-bAre</nowiki> | <nowiki>18</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>20</nowiki> | <nowiki>meM</nowiki> | <nowiki>meM</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-200|chunkType-child:NP6|name-meM2</nowiki> | <nowiki>18</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>21</nowiki> | <nowiki>wamAma</nowiki> | <nowiki>wamAma</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-wamAma|cat-adj|gend-any|num-any|pers-|case-d|vib-|tam-|posn-210|chunkType-child:NP7|name-wamAma</nowiki> | <nowiki>23</nowiki> | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>22</nowiki> | <nowiki>vyakwigawa</nowiki> | <nowiki>vyakwigawa</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-vyakwigawa|cat-adj|gend-any|num-any|pers-|case-d|vib-|tam-|posn-220|chunkType-child:NP7|name-vyakwigawa</nowiki> | <nowiki>23</nowiki> | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>23</nowiki> | <nowiki>jAnakAriyAM</nowiki> | <nowiki>jAnakAriyAM</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-jAnakAriyAM|cat-n|gend-f|num-pl|pers-3|case-d|vib-0|tam-0|posn-230|name-jAnakAriyAM|chunkId-NP7|chunkType-head:NP7</nowiki> | <nowiki>24</nowiki> | <nowiki>k2</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>24</nowiki> | <nowiki>xIM</nowiki> | <nowiki>xe</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-xe|cat-v|gend-f|num-pl|pers-3|case-|vib-yA|tam-yA|stype-declarative|posn-240|voicetype-active|name-xIM|chunkId-VGF|chunkType-head:VGF</nowiki> | <nowiki>0</nowiki> | <nowiki>main</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>25</nowiki> | <nowiki>ki</nowiki> | <nowiki>ki</nowiki> | <nowiki>CC</nowiki> | <nowiki>avy</nowiki> | <nowiki>lex-ki|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-250|name-ki|chunkId-CCP|chunkType-head:CCP</nowiki> | <nowiki>23</nowiki> | <nowiki>rs</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>26</nowiki> | <nowiki>kina</nowiki> | <nowiki>kOna</nowiki> | <nowiki>WQ</nowiki> | <nowiki>pn</nowiki> | <nowiki>lex-kOna|cat-pn|gend-any|num-pl|pers-3|case-o|vib-|tam-|posn-260|chunkType-child:NP8|name-kina</nowiki> | <nowiki>27</nowiki> | <nowiki>mod__wq</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>27</nowiki> | <nowiki>parisWiwiyoM</nowiki> | <nowiki>parisWiwi</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-parisWiwi|cat-n|gend-f|num-pl|pers-3|case-o|vib-0_meM|tam-0|posn-270|vpos-vib_3|name-parisWiwiyoM|chunkId-NP8|chunkType-head:NP8</nowiki> | <nowiki>31</nowiki> | <nowiki>k7</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>28</nowiki> | <nowiki>meM</nowiki> | <nowiki>meM</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-280|chunkType-child:NP8|name-meM3</nowiki> | <nowiki>27</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>29</nowiki> | <nowiki>use</nowiki> | <nowiki>vaha</nowiki> | <nowiki>PRP</nowiki> | <nowiki>pn</nowiki> | <nowiki>lex-vaha|cat-pn|gend-any|num-sg|pers-3|case-o|vib-ko|tam-ko|posn-290|name-use|chunkId-NP9|chunkType-head:NP9</nowiki> | <nowiki>31</nowiki> | <nowiki>k2</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>30</nowiki> | <nowiki>giraPwAra</nowiki> | <nowiki>giraPwAra</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-giraPwAra|cat-adj|gend-any|num-any|pers-|case-|vib-|tam-|posn-300|name-giraPwAra|chunkId-JJP2|chunkType-head:JJP2</nowiki> | <nowiki>31</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>31</nowiki> | <nowiki>kiyA</nowiki> | <nowiki>kara</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-kara|cat-v|gend-m|num-sg|pers-3|case-|vib-yA_jA+yA�|tam-yA|stype-declarative|posn-310|voicetype-passive|vpos-tam_2|name-kiyA|chunkId-VGF2|chunkType-head:VGF2</nowiki> | <nowiki>36</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>32</nowiki> | <nowiki>gayA</nowiki> | <nowiki>jA</nowiki> | <nowiki>VAUX</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-jA|cat-v|gend-m|num-sg|pers-3|case-|vib-yA�|tam-yA1|posn-320|chunkType-child:VGF2|name-gayA</nowiki> | <nowiki>31</nowiki> | <nowiki>lwg__vaux</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>33</nowiki> | <nowiki>,</nowiki> | <nowiki>,</nowiki> | <nowiki>SYM</nowiki> | <nowiki>s</nowiki> | <nowiki>lex-|cat-s|gend-punc|num-|pers-|case-|vib-|tam-|posn-330|chunkType-child:VGF2|name-,</nowiki> | <nowiki>31</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>34</nowiki> | <nowiki>mukaxamA</nowiki> | <nowiki>mukaxamA</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-mukaxamA|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-340|name-mukaxamA|chunkId-NP10|chunkType-head:NP10</nowiki> | <nowiki>35</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>35</nowiki> | <nowiki>calA</nowiki> | <nowiki>cala</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-cala|cat-v|gend-m|num-sg|pers-3|case-|vib-yA|tam-yA|hlt-true|stype-declarative|posn-350|voicetype-active|name-calA|chunkId-VGF3|chunkType-head:VGF3</nowiki> | <nowiki>36</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>36</nowiki> | <nowiki>Ora</nowiki> | <nowiki>Ora</nowiki> | <nowiki>CC</nowiki> | <nowiki>avy</nowiki> | <nowiki>lex-Ora|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-360|name-Ora|chunkId-CCP2|chunkType-head:CCP2</nowiki> | <nowiki>25</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>37</nowiki> | <nowiki>sajA</nowiki> | <nowiki>sajA</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-sajA|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-370|name-sajA|chunkId-NP11|chunkType-head:NP11</nowiki> | <nowiki>38</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>38</nowiki> | <nowiki>huI</nowiki> | <nowiki>ho</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-ho|cat-v|gend-f|num-sg|pers-3|case-|vib-yA|tam-yA|stype-declarative|posn-380|voicetype-active|name-huI|chunkId-VGF4|chunkType-head:VGF4</nowiki> | <nowiki>36</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>39</nowiki> | <nowiki>.</nowiki> | <nowiki>.</nowiki> | <nowiki>SYM</nowiki> | <nowiki>punc</nowiki> | <nowiki>lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-390|chunkType-child:VGF4|name-.</nowiki> | <nowiki>38</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| |
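
The rows above follow the ten-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with the sixth column packing `key-value` pairs separated by `|`. A minimal sketch of reading one such row follows. It assumes tab-separated input, as in the raw CoNLL files rather than this wiki table, and the function names `parse_feats` and `parse_conll_row` are hypothetical.

```python
CONLL_COLUMNS = ("ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
                 "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL")


def parse_feats(feats):
    """Split 'lex-kota|cat-n|gend-m' into {'lex': 'kota', 'cat': 'n', 'gend': 'm'}."""
    result = {}
    if feats == "_":
        return result
    for item in feats.split("|"):
        # Split on the first hyphen only: values such as 'aPisa-biyArAraxera'
        # contain hyphens themselves, and values may also be empty ('gend-').
        key, _, value = item.partition("-")
        result[key] = value
    return result


def parse_conll_row(line):
    """Parse one tab-separated CoNLL-X token line into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(CONLL_COLUMNS):
        raise ValueError("expected 10 columns, got %d" % len(cols))
    row = dict(zip(CONLL_COLUMNS, cols))
    row["ID"] = int(row["ID"])
    row["HEAD"] = int(row["HEAD"])  # 0 marks the root of the sentence
    row["FEATS"] = parse_feats(row["FEATS"])
    return row


if __name__ == "__main__":
    line = "24\txIM\txe\tVM\tv\tlex-xe|cat-v|gend-f|num-pl|pers-3|case-|vib-yA|tam-yA\t0\tmain\t_\t_"
    print(parse_conll_row(line))
```

Partitioning on the first hyphen keeps hyphenated lemmas intact, at the cost of assuming that feature names themselves never contain a hyphen, which holds for every name in the samples on this page.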
And after conversion of the WX encoding to the Bengali script in UTF-8: | And after conversion of the WX encoding to the Devanagari script in UTF-8: |
| |
| 1 | পরবর্তীকালে | পরবর্তীকালে | NP | NN | lex-parabarwIkAle<nowiki>|</nowiki>cat-adv<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-parabarwIkAle<nowiki>|</nowiki>name-NP | 8 | k7t | _ | _ | | | <nowiki>1</nowiki> | <nowiki>कोट</nowiki> | <nowiki>कोट</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-kota|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-10|chunkType-child:NP|name-kota</nowiki> | <nowiki>4</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 2 | অফিস-বিযারারদের | অফিস-বিযারারদের | NP | NN | lex-aPisa-biyArAraxera<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-aPisa-biyArAraxera<nowiki>|</nowiki>name-NP2 | 3 | r6 | _ | _ | | | <nowiki>2</nowiki> | <nowiki>लखपत</nowiki> | <nowiki>लखपत</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-laKapawa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-20|chunkType-child:NP|name-laKapawa</nowiki> | <nowiki>4</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 3 | নাম | নাম | NP | NN | lex-nAma<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-nAma<nowiki>|</nowiki>name-NP3 | 5 | k2 | _ | _ | | | <nowiki>3</nowiki> | <nowiki>जेल</nowiki> | <nowiki>जेल</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-jela|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-30|chunkType-child:NP|name-jela</nowiki> | <nowiki>4</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 4 | ঘোষণা | ঘোষণা | NP | NN | lex-GoRaNA<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-GoRaNA<nowiki>|</nowiki>name-NP4 | 5 | pof | _ | _ | | | <nowiki>4</nowiki> | <nowiki>लाहौर</nowiki> | <nowiki>लाहौर</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-lAhOra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-40|vpos-vib_5|name-lAhOra|chunkId-NP|chunkType-head:NP</nowiki> | <nowiki>6</nowiki> | <nowiki>jjmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 5 | করার | কর্ | VGNN | VM | lex-kar<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-any<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-karAra<nowiki>|</nowiki>name-VGNN | 6 | r6 | _ | _ | | | <nowiki>5</nowiki> | <nowiki>में</nowiki> | <nowiki>में</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-50|chunkType-child:NP|name-meM</nowiki> | <nowiki>4</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 6 | সময্ | সময্ | NP | NN | lex-samay<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-samay<nowiki>|</nowiki>name-NP5 | 8 | k7t | _ | _ | | | <nowiki>6</nowiki> | <nowiki>बंद</nowiki> | <nowiki>बंद</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-baMxa|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-60|name-baMxa|chunkId-JJP|chunkType-head:JJP</nowiki> | <nowiki>8</nowiki> | <nowiki>nmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 7 | অনিমেষকে | অনিমেষকে | NP | NNP | lex-animeRake<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-animeRake<nowiki>|</nowiki>name-NP6 | 8 | k2 | _ | _ | | | <nowiki>7</nowiki> | <nowiki>सरबजीत</nowiki> | <nowiki>सरबजीत</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-sarabajIwa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-70|chunkType-child:NP2|name-sarabajIwa</nowiki> | <nowiki>8</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 8 | সরিযে | সরিযে | VGF | VM | lex-sariye<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-0_rAKA+ka_ha+la<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-sariye<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ | | | <nowiki>8</nowiki> | <nowiki>सिंह</nowiki> | <nowiki>सिंह</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-siMha|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ne|tam-0|posn-80|vpos-vib_3|name-siMha|chunkId-NP2|chunkType-head:NP2</nowiki> | <nowiki>24</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>9</nowiki> | <nowiki>ने</nowiki> | <nowiki>ने</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ne|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-90|chunkType-child:NP2|name-ne</nowiki> | <nowiki>8</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>10</nowiki> | <nowiki>मंगलवार</nowiki> | <nowiki>मंगलवार</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-maMgalavAra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-100|vpos-vib_2|name-maMgalavAra|chunkId-NP3|chunkType-head:NP3</nowiki> | <nowiki>24</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>11</nowiki> | <nowiki>को</nowiki> | <nowiki>को</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-110|chunkType-child:NP3|name-ko</nowiki> | <nowiki>10</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>12</nowiki> | <nowiki>भारतीय</nowiki> | <nowiki>भारतीय</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-BArawIya|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-120|chunkType-child:NP4|name-BArawIya</nowiki> | <nowiki>13</nowiki> | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>13</nowiki> | <nowiki>दूतावास</nowiki> | <nowiki>दूतावास</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-xUwAvAsa|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_kA|tam-0|posn-130|vpos-vib_3|name-xUwAvAsa|chunkId-NP4|chunkType-head:NP4</nowiki> | <nowiki>16</nowiki> | <nowiki>r6</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>14</nowiki> | <nowiki>के</nowiki> | <nowiki>का</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-kA|cat-psp|gend-m|num-pl|pers-|case-o|vib-|tam-|posn-140|chunkType-child:NP4|name-ke</nowiki> | <nowiki>13</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>15</nowiki> | <nowiki>दो</nowiki> | <nowiki>दो</nowiki> | <nowiki>QC</nowiki> | <nowiki>num</nowiki> | <nowiki>lex-xo|cat-num|gend-any|num-pl|pers-|case-o|vib-|tam-|posn-150|chunkType-child:NP5|name-xo</nowiki> | <nowiki>16</nowiki> | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>16</nowiki> | <nowiki>अधिकारियों</nowiki> | <nowiki>अधिकारी</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-aXikArI|cat-n|gend-m|num-pl|pers-3|case-o|vib-0_ko|tam-0|posn-160|vpos-vib_3|name-aXikAriyoM|chunkId-NP5|chunkType-head:NP5</nowiki> | <nowiki>24</nowiki> | <nowiki>k4</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>17</nowiki> | <nowiki>को</nowiki> | <nowiki>को</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-170|chunkType-child:NP5|name-ko2</nowiki> | <nowiki>16</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>18</nowiki> | <nowiki>अपने</nowiki> | <nowiki>अपना</nowiki> | <nowiki>PRP</nowiki> | <nowiki>pn</nowiki> | <nowiki>lex-apanA|cat-pn|gend-any|num-sg|pers-1|case-o|vib-0_bAre_meM|tam-0|posn-180|vpos-vib_2_3|name-apane|chunkId-NP6|chunkType-head:NP6</nowiki> | <nowiki>24</nowiki> | <nowiki>k7</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>19</nowiki> | <nowiki>बारे</nowiki> | <nowiki>बारे</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-bAre|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-190|chunkType-child:NP6|name-bAre</nowiki> | <nowiki>18</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>20</nowiki> | <nowiki>में</nowiki> | <nowiki>में</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-200|chunkType-child:NP6|name-meM2</nowiki> | <nowiki>18</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>21</nowiki> | <nowiki>तमाम</nowiki> | <nowiki>तमाम</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-wamAma|cat-adj|gend-any|num-any|pers-|case-d|vib-|tam-|posn-210|chunkType-child:NP7|name-wamAma</nowiki> | <nowiki>23</nowiki> | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>22</nowiki> | <nowiki>व्यक्तिगत</nowiki> | <nowiki>व्यक्तिगत</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-vyakwigawa|cat-adj|gend-any|num-any|pers-|case-d|vib-|tam-|posn-220|chunkType-child:NP7|name-vyakwigawa</nowiki> | <nowiki>23</nowiki> | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>23</nowiki> | <nowiki>जानकारियां</nowiki> | <nowiki>जानकारियां</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-jAnakAriyAM|cat-n|gend-f|num-pl|pers-3|case-d|vib-0|tam-0|posn-230|name-jAnakAriyAM|chunkId-NP7|chunkType-head:NP7</nowiki> | <nowiki>24</nowiki> | <nowiki>k2</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>24</nowiki> | <nowiki>दीं</nowiki> | <nowiki>दे</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-xe|cat-v|gend-f|num-pl|pers-3|case-|vib-yA|tam-yA|stype-declarative|posn-240|voicetype-active|name-xIM|chunkId-VGF|chunkType-head:VGF</nowiki> | <nowiki>0</nowiki> | <nowiki>main</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>25</nowiki> | <nowiki>कि</nowiki> | <nowiki>कि</nowiki> | <nowiki>CC</nowiki> | <nowiki>avy</nowiki> | <nowiki>lex-ki|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-250|name-ki|chunkId-CCP|chunkType-head:CCP</nowiki> | <nowiki>23</nowiki> | <nowiki>rs</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>26</nowiki> | <nowiki>किन</nowiki> | <nowiki>कौन</nowiki> | <nowiki>WQ</nowiki> | <nowiki>pn</nowiki> | <nowiki>lex-kOna|cat-pn|gend-any|num-pl|pers-3|case-o|vib-|tam-|posn-260|chunkType-child:NP8|name-kina</nowiki> | <nowiki>27</nowiki> | <nowiki>mod__wq</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>27</nowiki> | <nowiki>परिस्थितियों</nowiki> | <nowiki>परिस्थिति</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-parisWiwi|cat-n|gend-f|num-pl|pers-3|case-o|vib-0_meM|tam-0|posn-270|vpos-vib_3|name-parisWiwiyoM|chunkId-NP8|chunkType-head:NP8</nowiki> | <nowiki>31</nowiki> | <nowiki>k7</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>28</nowiki> | <nowiki>में</nowiki> | <nowiki>में</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-280|chunkType-child:NP8|name-meM3</nowiki> | <nowiki>27</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>29</nowiki> | <nowiki>उसे</nowiki> | <nowiki>वह</nowiki> | <nowiki>PRP</nowiki> | <nowiki>pn</nowiki> | <nowiki>lex-vaha|cat-pn|gend-any|num-sg|pers-3|case-o|vib-ko|tam-ko|posn-290|name-use|chunkId-NP9|chunkType-head:NP9</nowiki> | <nowiki>31</nowiki> | <nowiki>k2</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>30</nowiki> | <nowiki>गिरफ्तार</nowiki> | <nowiki>गिरफ्तार</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-giraPwAra|cat-adj|gend-any|num-any|pers-|case-|vib-|tam-|posn-300|name-giraPwAra|chunkId-JJP2|chunkType-head:JJP2</nowiki> | <nowiki>31</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>31</nowiki> | <nowiki>किया</nowiki> | <nowiki>कर</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-kara|cat-v|gend-m|num-sg|pers-3|case-|vib-yA_jA+yA�|tam-yA|stype-declarative|posn-310|voicetype-passive|vpos-tam_2|name-kiyA|chunkId-VGF2|chunkType-head:VGF2</nowiki> | <nowiki>36</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>32</nowiki> | <nowiki>गया</nowiki> | <nowiki>जा</nowiki> | <nowiki>VAUX</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-jA|cat-v|gend-m|num-sg|pers-3|case-|vib-yA�|tam-yA1|posn-320|chunkType-child:VGF2|name-gayA</nowiki> | <nowiki>31</nowiki> | <nowiki>lwg__vaux</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>33</nowiki> | <nowiki>,</nowiki> | <nowiki>,</nowiki> | <nowiki>SYM</nowiki> | <nowiki>s</nowiki> | <nowiki>lex-|cat-s|gend-punc|num-|pers-|case-|vib-|tam-|posn-330|chunkType-child:VGF2|name-,</nowiki> | <nowiki>31</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>34</nowiki> | <nowiki>मुकदमा</nowiki> | <nowiki>मुकदमा</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-mukaxamA|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-340|name-mukaxamA|chunkId-NP10|chunkType-head:NP10</nowiki> | <nowiki>35</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>35</nowiki> | <nowiki>चला</nowiki> | <nowiki>चल</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-cala|cat-v|gend-m|num-sg|pers-3|case-|vib-yA|tam-yA|hlt-true|stype-declarative|posn-350|voicetype-active|name-calA|chunkId-VGF3|chunkType-head:VGF3</nowiki> | <nowiki>36</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>36</nowiki> | <nowiki>और</nowiki> | <nowiki>और</nowiki> | <nowiki>CC</nowiki> | <nowiki>avy</nowiki> | <nowiki>lex-Ora|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-360|name-Ora|chunkId-CCP2|chunkType-head:CCP2</nowiki> | <nowiki>25</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>37</nowiki> | <nowiki>सजा</nowiki> | <nowiki>सजा</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-sajA|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-370|name-sajA|chunkId-NP11|chunkType-head:NP11</nowiki> | <nowiki>38</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>38</nowiki> | <nowiki>हुई</nowiki> | <nowiki>हो</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-ho|cat-v|gend-f|num-sg|pers-3|case-|vib-yA|tam-yA|stype-declarative|posn-380|voicetype-active|name-huI|chunkId-VGF4|chunkType-head:VGF4</nowiki> | <nowiki>36</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>39</nowiki> | <nowiki>.</nowiki> | <nowiki>.</nowiki> | <nowiki>SYM</nowiki> | <nowiki>punc</nowiki> | <nowiki>lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-390|chunkType-child:VGF4|name-.</nowiki> | <nowiki>38</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| |
The first sentence of the ICON 2010 test data (with fine-grained syntactic tags) in the Shakti format: | The first sentence of the ICON 2010 test data (with fine-grained syntactic tags) in the Shakti format: |
| |
<code xml><document id=""> | <code xml><document docid="fullnews_id_2484368"> |
<head> | <head> |
<annotated-resource name="HyDT-Bangla" version="0.5" type="dep-interchunk-only" layers="morph,pos,chunk,dep-interchunk-only" language="ben" date-of-release="20101013"> | <caption>elaosI Kolane para hogI bAwacIwa pre isalAmAbAxa.</caption> |
| <language>Hindi </language> |
| <domain_name>News Articles </domain_name> |
| <word_count>313</word_count> |
| <byte_count>37563</byte_count> |
| <availability> |
| <format>CML/SSF</format> |
| <sentence_marker>.</sentence_marker> |
| <normalization>No</normalization> |
| </availability> |
| <encoding_description> |
| <original_encoding>ISO 8859</format> |
| <new_encoding>Unicode UTF8</new_encoding> |
| </encoding_description> |
| <distributor>LTRC, IIIT Hyderabad</distributor> |
| <project_description>NSF Hindi/Urdu Dependency Treebanking Project</place> |
| <creation> |
| </raw_corpus creation_date="" institute_name="IIIT Hyderabad"> |
| </annotated_corpus creation_date="06/01/2009" institute_name="IIIT Hyderabad"> |
| <edition_number>1.0</edition_number> |
| </creation> |
| <publication> |
| <place>New Delhi</place> |
| <date>28/5/2004</date> |
| <type>Newspaper</type> |
| <publisher> |
| <name>Amar Ujala</name> |
| <url>http://www.amarujala.com</url> |
| </publisher> |
| </publication> |
| |
| <annotated-resource name="HyDT-Hindi" version="2.0" type="dep-words" layers="morph,pos,chunk,dep-word" language="hin" date-of-release="20101013"> |
<annotation-standard> | <annotation-standard> |
<morph-standard name="Anncorra-morph" version="1.31" date="20080920" /> | <morph-standard name="Anncorra-morph" version="1.31" date="20080920" /> |
<pos-standard name="Anncorra-pos" version="" date="20061215" /> | <pos-standard name="Anncorra-pos" version="" date="20061215" /> |
<chunk-standard name="Anncorra-chunk" version="" date="20061215" /> | <chunk-standard name="Anncorra-chunk" version="" date="20061215" /> |
<dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" /> | <intrachunk-dependency-standard name="Anncorra-intrachunk-dep" version="1.0" date="" dep-tagset-granularity="5" /> |
| <dependency-standard name="Anncorra-dep" version="2.0" date="" dep-tagset-granularity="6" /> |
</annotation-standard> | </annotation-standard> |
<annotated-resource> | </annotated-resource> |
</head> | </head> |
| <body> |
| <tb number="1" segment="no" bullet="no"> |
| <foreign language="select" writingsystem="LTR"></foreign> |
| <text> |
<Sentence id="1"> | <Sentence id="1"> |
1 (( NP <fs af='mAXabIlawA,n,,sg,,d,0,0' head="mAXabIlawA" drel=k1:VGF name=NP> | 1 pAkiswAna XC <fs af='pAkiswAna,n,m,sg,3,d,0,0' posn='10' drel='mod:kaSmIra' chunkType='child:NP' name='pAkiswAna'> |
1.1 mAXabIlawA NNP <fs af='mAXabIlawA,n,,sg,,d,0,0' name="mAXabIlawA"> | 2 aXikqwa XC <fs af='aXikqwa,adj,any,any,,o,,' posn='20' drel='mod:kaSmIra' chunkType='child:NP' name='aXikqwa'> |
)) | 3 kaSmIra NNP <fs af='kaSmIra,n,m,sg,3,o,0_meM,0' drel='k7p:Ae' posn='30' vpos='vib_4' name='kaSmIra' chunkId='NP' chunkType='head:NP'> |
2 (( NP <fs af='waKana,pn,,,,d,0,0' head="waKana" drel=k7t:VGF name=NP2> | 4 meM PSP <fs af='meM,psp,,,,,,' posn='40' drel='lwg__psp:kaSmIra' chunkType='child:NP' name='meM'> |
2.1 waKana PRP <fs af='waKana,pn,,,,d,0,0' name="waKana"> | 5 � XC <fs af='�,num,m,sg,3,d,,' posn='50' drel='mod:akwUbara' chunkType='child:NP2' name='�'> |
)) | 6 akwUbara NNP <fs af='akwUbara,n,m,sg,3,o,0_ko,0' drel='k7t:Ae' posn='60' vpos='vib_3' name='akwUbara' chunkId='NP2' chunkType='head:NP2'> |
3 (( NP <fs af='hAwa,n,,sg,,o,era,era' head="hAwera" drel=r6:NP4 name=NP3> | 7 ko PSP <fs af='ko,psp,,,,,,' posn='70' drel='lwg__psp:akwUbara' chunkType='child:NP2' name='ko'> |
3.1 hAwera NN <fs af='hAwa,n,,sg,,o,era,era' name="hAwera"> | 8 Ae VM <fs af='A,v,m,sg,any,,yA,yA' drel='nmod__k1inv:BUkaMpa' posn='80' name='Ae' chunkId='VGNF' chunkType='head:VGNF'> |
)) | 9 BUkaMpa NN <fs af='BUkaMpa,n,m,sg,3,o,0_se,0' drel='rh:macI' posn='90' vpos='vib_2' name='BUkaMpa' chunkId='NP3' chunkType='head:NP3'> |
4 (( NP <fs af='GadZi,unk,,,,,,' head="GadZi" drel=k2:VGNF name=NP4> | 10 se PSP <fs af='se,psp,,,,,,' posn='100' drel='lwg__psp:BUkaMpa' chunkType='child:NP3' name='se'> |
4.1 GadZi NN <fs af='GadZi,unk,,,,,,' name="GadZi"> | 11 macI VM <fs af='maca,v,f,sg,any,,yA,yA' drel='nmod__k1inv:wabAhI' posn='110' name='macI' chunkId='VGNF2' chunkType='head:VGNF2'> |
)) | 12 wabAhI NN <fs af='wabAhI,n,f,sg,3,o,0_kA_bAxa,0' drel='k7t:kareMge' posn='120' vpos='vib_2_3' name='wabAhI' chunkId='NP4' chunkType='head:NP4'> |
5 (( VGNF <fs af='Kul,v,,,5,,ne,ne' head="Kule" drel=vmod:VGF name=VGNF> | 13 ke PSP <fs af='kA,psp,m,sg,3,o,,' posn='130' drel='lwg__psp:wabAhI' chunkType='child:NP4' name='ke'> |
5.1 Kule VM <fs af='Kul,v,,,5,,ne,ne' name="Kule"> | 14 bAxa NST <fs af='bAxa,n,,,,,,' posn='140' drel='lwg__psp:wabAhI' chunkType='child:NP4' name='bAxa'> |
)) | 15 BArawa NNP <fs af='BArawa,n,m,sg,3,d,0,0' drel='ccof:Ora' posn='150' name='BArawa' chunkId='NP5' chunkType='head:NP5'> |
6 (( NP <fs af='tebila,n,,sg,,d,me,me' head="tebile" drel=k7p:VGF name=NP5> | 16 Ora CC <fs af='Ora,avy,,,,,,' drel='k1:kareMge' posn='160' name='Ora' chunkId='CCP' chunkType='head:CCP'> |
6.1 tebile NN <fs af='tebila,n,,sg,,d,me,me' name="tebile"> | 17 pAkiswAna NNP <fs af='pAkiswAna,n,m,sg,3,d,0,0' drel='ccof:Ora' posn='170' name='pAkiswAna2' chunkId='NP6' chunkType='head:NP6'> |
)) | 18 mAnavIya JJ <fs af='mAnavIya,adj,any,any,,o,,' posn='180' drel='nmod__adj:xqRtikoNa' chunkType='child:NP7' name='mAnavIya'> |
7 (( VGF <fs af='rAK,v,,,5,,Cila,Cila' head="rAKaCila" name=VGF> | 19 xqRtikoNa NN <fs af='xqRtikoNa,n,m,sg,3,d,0,0' drel='k2:apanAwe' posn='190' name='xqRtikoNa' chunkId='NP7' chunkType='head:NP7'> |
7.1 rAKaCila VM <fs af='rAK,v,,,5,,Cila,Cila' name="rAKaCila"> | 20 apanAwe VM <fs af='apanA,v,m,pl,any,,wA_ho+yA,wA' drel='vmod:kareMge' posn='200' vpos='tam_2' name='apanAwe' chunkId='VGNF3' chunkType='head:VGNF3'> |
7.2 । SYM | 21 hue VAUX <fs af='ho,v,m,pl,any,,yA,yA' posn='210' drel='lwg__vaux:apanAwe' chunkType='child:VGNF3' name='hue'> |
)) | 22 SanivAra NNP <fs af='SanivAra,n,m,sg,3,o,0_ko,0' drel='k7t:kareMge' posn='220' vpos='vib_2' name='SanivAra' chunkId='NP8' chunkType='head:NP8'> |
| 23 ko PSP <fs af='ko,psp,,,,,,' posn='230' drel='lwg__psp:SanivAra' chunkType='child:NP8' name='ko2'> |
| 24 islAmAbAxa NNP <fs af='isalAmAbAxa,n,m,sg,3,d,0_meM,0' drel='k7p:kareMge' posn='240' vpos='vib_2' name='islAmAbAxa' chunkId='NP9' chunkType='head:NP9'> |
| 25 meM PSP <fs af='meM,psp,,,,,,' posn='250' drel='lwg__psp:islAmAbAxa' chunkType='child:NP9' name='meM2'> |
| 26 niyaMwraNa XC <fs af='niyaMwraNa,n,m,sg,3,d,0,0' posn='260' drel='mod:reKA' chunkType='child:NP10' name='niyaMwraNa'> |
| 27 reKA NN <fs af='reKA,n,f,sg,3,d,0,0' drel='k2:Kolane' posn='270' name='reKA' chunkId='NP10' chunkType='head:NP10'> |
| 28 ( SYM <fs af=',punc,,,,,,' posn='280' drel='rsym:elaosI' chunkType='child:NP11' name='('> |
| 29 elaosI NN <fs af='elaosI,n,m,sg,3,d,0,0' drel='nmod:reKA' posn='290' name='elaosI' chunkId='NP11' chunkType='head:NP11'> |
| 30 ) SYM <fs af=',punc,,,,,,' posn='300' drel='rsym:elaosI' chunkType='child:NP11' name=')'> |
| 31 Kolane VM <fs af='Kola,v,any,sg,any,o,nA_kA,nA' drel='r6:masale' posn='310' vpos='tam_2' name='Kolane' chunkId='VGNN' chunkType='head:VGNN'> |
| 32 ke PSP <fs af='kA,psp,m,sg,,o,,' posn='320' drel='lwg__psp:Kolane' chunkType='child:VGNN' name='ke2'> |
| 33 masale NN <fs af='masalA,n,m,sg,3,o,0_para,0' drel='k7:kareMge' posn='330' vpos='vib_2' name='masale' chunkId='NP12' chunkType='head:NP12'> |
| 34 para PSP <fs af='para,psp,,,,,,' posn='340' drel='lwg__psp:masale' chunkType='child:NP12' name='para'> |
| 35 bAwacIwa NN <fs af='bAwacIwa,n,f,sg,3,d,0,0' drel='pof:kareMge' posn='350' name='bAwacIwa' chunkId='NP13' chunkType='head:NP13'> |
| 36 kareMge VM <fs af='kara,v,m,pl,3,,gA,gA' posn='360' name='kareMge' chunkId='VGF' chunkType='head:VGF'> |
| 37 . SYM <fs af='.,punc,,,,,,' posn='370' drel='rsym:kareMge' chunkType='child:VGF' name='.'> |
</Sentence></code> | </Sentence></code> |
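To make the layout of the word-level SSF lines above easier to follow, here is a minimal Python sketch that splits one token line into its index, form, tag and feature structure. The function name `parse_ssf_token` and the key names in `AF_KEYS` are hypothetical; the order of the eight comma-separated `af` fields (root, category, gender, number, person, case, vibhakti, TAM) is inferred from the sample and from the matching `lex`/`cat`/`gend`/… features in the CoNLL version, not taken from an official SSF specification.

```python
import re

# Assumed order of the comma-separated fields inside af='...'
AF_KEYS = ("lex", "cat", "gend", "num", "pers", "case", "vib", "tam")

# key='value', key="value" or key=value (older samples leave some values unquoted)
ATTR_RE = re.compile(r"""([\w-]+)=(?:'([^']*)'|"([^"]*)"|([^\s>]+))""")


def parse_ssf_token(line):
    """Split one word-level SSF line into index, form, tag and features."""
    before, _, fs = line.partition("<fs")
    index, form, tag = before.split()[:3]

    attrs = {}
    for match in ATTR_RE.finditer(fs):
        # Exactly one of the three value groups is not None.
        value = next(g for g in match.groups()[1:] if g is not None)
        attrs[match.group(1)] = value

    af = attrs.pop("af", None)
    morph = dict(zip(AF_KEYS, af.split(","))) if af is not None else {}

    # drel='k7p:Ae' means: relation k7p, parent is the node named 'Ae'.
    relation, head_name = None, None
    if "drel" in attrs:
        relation, _, head_name = attrs.pop("drel").partition(":")

    return {"index": index, "form": form, "tag": tag, "morph": morph,
            "relation": relation, "head_name": head_name, "attrs": attrs}


if __name__ == "__main__":
    sample = ("3 kaSmIra NNP <fs af='kaSmIra,n,m,sg,3,o,0_meM,0' "
              "drel='k7p:Ae' posn='30' vpos='vib_4' name='kaSmIra' "
              "chunkId='NP' chunkType='head:NP'>")
    print(parse_ssf_token(sample))
```

Note that the parent in `drel` is given by the `name` attribute of another node, not by its numeric index, so resolving heads needs a second pass that maps names to indices within the sentence.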
| |
And in the CoNLL format: | And in the CoNLL format: |
| |
| 1 | mAXabIlawA | mAXabIlawA | NP | NNP | lex-mAXabIlawA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-mAXabIlawA<nowiki>|</nowiki>name-NP | 7 | k1 | _ | _ | | | <nowiki>1</nowiki> | <nowiki>pAkiswAna</nowiki> | <nowiki>pAkiswAna</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-pAkiswAna|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-10|chunkType-child:NP|name-pAkiswAna</nowiki> | <nowiki>3</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 2 | waKana | waKana | NP | PRP | lex-waKana<nowiki>|</nowiki>cat-pn<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-waKana<nowiki>|</nowiki>name-NP2 | 7 | k7t | _ | _ | | | <nowiki>2</nowiki> | <nowiki>aXikqwa</nowiki> | <nowiki>aXikqwa</nowiki> | <nowiki>XC</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-aXikqwa|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-20|chunkType-child:NP|name-aXikqwa</nowiki> | <nowiki>3</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 3 | hAwera | hAwa | NP | NN | lex-hAwa<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-era<nowiki>|</nowiki>tam-era<nowiki>|</nowiki>head-hAwera<nowiki>|</nowiki>name-NP3 | 4 | r6 | _ | _ | | | <nowiki>3</nowiki> | <nowiki>kaSmIra</nowiki> | <nowiki>kaSmIra</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-kaSmIra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-30|vpos-vib_4|name-kaSmIra|chunkId-NP|chunkType-head:NP</nowiki> | <nowiki>8</nowiki> | <nowiki>k7p</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 4 | GadZi | GadZi | NP | NN | lex-GadZi<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-GadZi<nowiki>|</nowiki>name-NP4 | 5 | k2 | _ | _ | | | <nowiki>4</nowiki> | <nowiki>meM</nowiki> | <nowiki>meM</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-40|chunkType-child:NP|name-meM</nowiki> | <nowiki>3</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 5 | Kule | Kul | VGNF | VM | lex-Kul<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-ne<nowiki>|</nowiki>tam-ne<nowiki>|</nowiki>head-Kule<nowiki>|</nowiki>name-VGNF | 7 | vmod | _ | _ | | | <nowiki>5</nowiki> | <nowiki>�</nowiki> | <nowiki>�</nowiki> | <nowiki>XC</nowiki> | <nowiki>num</nowiki> | <nowiki>lex-�|cat-num|gend-m|num-sg|pers-3|case-d|vib-|tam-|posn-50|chunkType-child:NP2|name-�</nowiki> | <nowiki>6</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 6 | tebile | tebila | NP | NN | lex-tebila<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-me<nowiki>|</nowiki>tam-me<nowiki>|</nowiki>head-tebile<nowiki>|</nowiki>name-NP5 | 7 | k7p | _ | _ | | | <nowiki>6</nowiki> | <nowiki>akwUbara</nowiki> | <nowiki>akwUbara</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-akwUbara|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-60|vpos-vib_3|name-akwUbara|chunkId-NP2|chunkType-head:NP2</nowiki> | <nowiki>8</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 7 | rAKaCila | rAK | VGF | VM | lex-rAK<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-Cila<nowiki>|</nowiki>tam-Cila<nowiki>|</nowiki>head-rAKaCila<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ | | | <nowiki>7</nowiki> | <nowiki>ko</nowiki> | <nowiki>ko</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-70|chunkType-child:NP2|name-ko</nowiki> | <nowiki>6</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>8</nowiki> | <nowiki>Ae</nowiki> | <nowiki>A</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-A|cat-v|gend-m|num-sg|pers-any|case-|vib-yA|tam-yA|posn-80|name-Ae|chunkId-VGNF|chunkType-head:VGNF</nowiki> | <nowiki>9</nowiki> | <nowiki>nmod__k1inv</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>9</nowiki> | <nowiki>BUkaMpa</nowiki> | <nowiki>BUkaMpa</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-BUkaMpa|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_se|tam-0|posn-90|vpos-vib_2|name-BUkaMpa|chunkId-NP3|chunkType-head:NP3</nowiki> | <nowiki>11</nowiki> | <nowiki>rh</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>10</nowiki> | <nowiki>se</nowiki> | <nowiki>se</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-se|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-100|chunkType-child:NP3|name-se</nowiki> | <nowiki>9</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>11</nowiki> | <nowiki>macI</nowiki> | <nowiki>maca</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-maca|cat-v|gend-f|num-sg|pers-any|case-|vib-yA|tam-yA|posn-110|name-macI|chunkId-VGNF2|chunkType-head:VGNF2</nowiki> | <nowiki>12</nowiki> | <nowiki>nmod__k1inv</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>12</nowiki> | <nowiki>wabAhI</nowiki> | <nowiki>wabAhI</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-wabAhI|cat-n|gend-f|num-sg|pers-3|case-o|vib-0_kA_bAxa|tam-0|posn-120|vpos-vib_2_3|name-wabAhI|chunkId-NP4|chunkType-head:NP4</nowiki> | <nowiki>36</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>13</nowiki> | <nowiki>ke</nowiki> | <nowiki>kA</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-kA|cat-psp|gend-m|num-sg|pers-3|case-o|vib-|tam-|posn-130|chunkType-child:NP4|name-ke</nowiki> | <nowiki>12</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>14</nowiki> | <nowiki>bAxa</nowiki> | <nowiki>bAxa</nowiki> | <nowiki>NST</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-bAxa|cat-n|gend-|num-|pers-|case-|vib-|tam-|posn-140|chunkType-child:NP4|name-bAxa</nowiki> | <nowiki>12</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>15</nowiki> | <nowiki>BArawa</nowiki> | <nowiki>BArawa</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-BArawa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-150|name-BArawa|chunkId-NP5|chunkType-head:NP5</nowiki> | <nowiki>16</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>16</nowiki> | <nowiki>Ora</nowiki> | <nowiki>Ora</nowiki> | <nowiki>CC</nowiki> | <nowiki>avy</nowiki> | <nowiki>lex-Ora|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-160|name-Ora|chunkId-CCP|chunkType-head:CCP</nowiki> | <nowiki>36</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>17</nowiki> | <nowiki>pAkiswAna</nowiki> | <nowiki>pAkiswAna</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-pAkiswAna|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-170|name-pAkiswAna2|chunkId-NP6|chunkType-head:NP6</nowiki> | <nowiki>16</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>18</nowiki> | <nowiki>mAnavIya</nowiki> | <nowiki>mAnavIya</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-mAnavIya|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-180|chunkType-child:NP7|name-mAnavIya</nowiki> | <nowiki>19</nowiki> | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>19</nowiki> | <nowiki>xqRtikoNa</nowiki> | <nowiki>xqRtikoNa</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-xqRtikoNa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-190|name-xqRtikoNa|chunkId-NP7|chunkType-head:NP7</nowiki> | <nowiki>20</nowiki> | <nowiki>k2</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>20</nowiki> | <nowiki>apanAwe</nowiki> | <nowiki>apanA</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-apanA|cat-v|gend-m|num-pl|pers-any|case-|vib-wA_ho+yA|tam-wA|posn-200|vpos-tam_2|name-apanAwe|chunkId-VGNF3|chunkType-head:VGNF3</nowiki> | <nowiki>36</nowiki> | <nowiki>vmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>21</nowiki> | <nowiki>hue</nowiki> | <nowiki>ho</nowiki> | <nowiki>VAUX</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-ho|cat-v|gend-m|num-pl|pers-any|case-|vib-yA|tam-yA|posn-210|chunkType-child:VGNF3|name-hue</nowiki> | <nowiki>20</nowiki> | <nowiki>lwg__vaux</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>22</nowiki> | <nowiki>SanivAra</nowiki> | <nowiki>SanivAra</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-SanivAra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-220|vpos-vib_2|name-SanivAra|chunkId-NP8|chunkType-head:NP8</nowiki> | <nowiki>36</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>23</nowiki> | <nowiki>ko</nowiki> | <nowiki>ko</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-230|chunkType-child:NP8|name-ko2</nowiki> | <nowiki>22</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>24</nowiki> | <nowiki>islAmAbAxa</nowiki> | <nowiki>isalAmAbAxa</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-isalAmAbAxa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0_meM|tam-0|posn-240|vpos-vib_2|name-islAmAbAxa|chunkId-NP9|chunkType-head:NP9</nowiki> | <nowiki>36</nowiki> | <nowiki>k7p</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>25</nowiki> | <nowiki>meM</nowiki> | <nowiki>meM</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-250|chunkType-child:NP9|name-meM2</nowiki> | <nowiki>24</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>26</nowiki> | <nowiki>niyaMwraNa</nowiki> | <nowiki>niyaMwraNa</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-niyaMwraNa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-260|chunkType-child:NP10|name-niyaMwraNa</nowiki> | <nowiki>27</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>27</nowiki> | <nowiki>reKA</nowiki> | <nowiki>reKA</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-reKA|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-270|name-reKA|chunkId-NP10|chunkType-head:NP10</nowiki> | <nowiki>31</nowiki> | <nowiki>k2</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>28</nowiki> | <nowiki>(</nowiki> | <nowiki>(</nowiki> | <nowiki>SYM</nowiki> | <nowiki>punc</nowiki> | <nowiki>lex-|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-280|chunkType-child:NP11|name-(</nowiki> | <nowiki>29</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>29</nowiki> | <nowiki>elaosI</nowiki> | <nowiki>elaosI</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-elaosI|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-290|name-elaosI|chunkId-NP11|chunkType-head:NP11</nowiki> | <nowiki>27</nowiki> | <nowiki>nmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>30</nowiki> | <nowiki>)</nowiki> | <nowiki>)</nowiki> | <nowiki>SYM</nowiki> | <nowiki>punc</nowiki> | <nowiki>lex-|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-300|chunkType-child:NP11|name-)</nowiki> | <nowiki>29</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>31</nowiki> | <nowiki>Kolane</nowiki> | <nowiki>Kola</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-Kola|cat-v|gend-any|num-sg|pers-any|case-o|vib-nA_kA|tam-nA|posn-310|vpos-tam_2|name-Kolane|chunkId-VGNN|chunkType-head:VGNN</nowiki> | <nowiki>33</nowiki> | <nowiki>r6</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>32</nowiki> | <nowiki>ke</nowiki> | <nowiki>kA</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-kA|cat-psp|gend-m|num-sg|pers-|case-o|vib-|tam-|posn-320|chunkType-child:VGNN|name-ke2</nowiki> | <nowiki>31</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>33</nowiki> | <nowiki>masale</nowiki> | <nowiki>masalA</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-masalA|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_para|tam-0|posn-330|vpos-vib_2|name-masale|chunkId-NP12|chunkType-head:NP12</nowiki> | <nowiki>36</nowiki> | <nowiki>k7</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>34</nowiki> | <nowiki>para</nowiki> | <nowiki>para</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-para|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-340|chunkType-child:NP12|name-para</nowiki> | <nowiki>33</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>35</nowiki> | <nowiki>bAwacIwa</nowiki> | <nowiki>bAwacIwa</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-bAwacIwa|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-350|name-bAwacIwa|chunkId-NP13|chunkType-head:NP13</nowiki> | <nowiki>36</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>36</nowiki> | <nowiki>kareMge</nowiki> | <nowiki>kara</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-kara|cat-v|gend-m|num-pl|pers-3|case-|vib-gA|tam-gA|posn-360|name-kareMge|chunkId-VGF|chunkType-head:VGF</nowiki> | <nowiki>0</nowiki> | <nowiki>main</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>37</nowiki> | <nowiki>.</nowiki> | <nowiki>.</nowiki> | <nowiki>SYM</nowiki> | <nowiki>punc</nowiki> | <nowiki>lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-370|chunkType-child:VGF|name-.</nowiki> | <nowiki>36</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
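The rows above pack all morphological and chunk information into the sixth column as `key-value` pairs separated by `|`. The following Python sketch shows one way to read such a row. It assumes the usual CoNLL-X layout of ten tab-separated columns in the actual data file (the wiki table only shows the cells); the column names and the helper names `parse_feats` and `parse_conll_line` are illustrative, not part of the treebank distribution.

```python
# Column names follow the CoNLL-X convention (an assumption about the file).
CONLL_COLUMNS = ("id", "form", "lemma", "cpostag", "postag",
                 "feats", "head", "deprel", "phead", "pdeprel")


def parse_feats(feats):
    """Turn 'lex-kara|cat-v|case-' into {'lex': 'kara', 'cat': 'v', 'case': ''}."""
    if feats in ("", "_"):
        return {}
    # Split on the first hyphen only, so values such as 'head:VGF' stay intact.
    pairs = (item.partition("-") for item in feats.split("|"))
    return {key: value for key, _, value in pairs}


def parse_conll_line(line):
    """Parse one tab-separated token line into a dict with typed id and head."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLL_COLUMNS):
        raise ValueError("expected %d columns, got %d"
                         % (len(CONLL_COLUMNS), len(values)))
    token = dict(zip(CONLL_COLUMNS, values))
    token["id"] = int(token["id"])
    token["head"] = int(token["head"])  # 0 marks the root of the sentence
    token["feats"] = parse_feats(token["feats"])
    return token


if __name__ == "__main__":
    row = "\t".join([
        "36", "kareMge", "kara", "VM", "v",
        "lex-kara|cat-v|gend-m|num-pl|pers-3|case-|vib-gA|tam-gA|"
        "posn-360|name-kareMge|chunkId-VGF|chunkType-head:VGF",
        "0", "main", "_", "_",
    ])
    token = parse_conll_line(row)
    print(token["form"], token["head"], token["feats"]["chunkType"])
```

Empty values such as `case-` are kept as empty strings rather than dropped, so that the feature set has the same keys for every token.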
| |
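The word forms and lemmas above are written in the WX romanization. As a rough illustration of how WX maps to Devanagari, here is a simplified Python sketch. It is not the converter used to produce the tables on this page: the function name `wx_to_devanagari` is hypothetical, and the sketch covers only the basic consonants, vowels, anusvara, visarga, chandrabindu and nukta, leaving digits and any other symbols unchanged.

```python
# WX consonant letters and the corresponding Devanagari consonants.
CONSONANTS = dict(zip("kKgGfcCjJFtTdDNwWxXnpPbBmyrlvSRsh",
                      "कखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसह"))

# WX vowel -> (independent letter, dependent vowel sign); 'a' has no sign.
VOWELS = {
    "a": ("\u0905", ""),       "A": ("\u0906", "\u093e"),
    "i": ("\u0907", "\u093f"), "I": ("\u0908", "\u0940"),
    "u": ("\u0909", "\u0941"), "U": ("\u090a", "\u0942"),
    "q": ("\u090b", "\u0943"),
    "e": ("\u090f", "\u0947"), "E": ("\u0910", "\u0948"),
    "o": ("\u0913", "\u094b"), "O": ("\u0914", "\u094c"),
}

# Anusvara, visarga, chandrabindu.
MODIFIERS = {"M": "\u0902", "H": "\u0903", "z": "\u0901"}
VIRAMA = "\u094d"  # suppresses the inherent vowel in consonant clusters
NUKTA = "\u093c"   # written as 'Z' after the consonant in WX


def wx_to_devanagari(text):
    """Convert a WX-encoded string to Devanagari (simplified sketch)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            i += 1
            if i < len(text) and text[i] == "Z":
                out.append(NUKTA)
                i += 1
            if i < len(text) and text[i] in VOWELS:
                # A vowel right after a consonant becomes a dependent sign.
                out.append(VOWELS[text[i]][1])
                i += 1
            else:
                # No vowel follows: the consonant is part of a cluster.
                out.append(VIRAMA)
        elif ch in VOWELS:
            out.append(VOWELS[ch][0])
            i += 1
        else:
            out.append(MODIFIERS.get(ch, ch))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ("pAkiswAna", "BUkaMpa", "xqRtikoNa", "Ae"):
        print(word, wx_to_devanagari(word))
```

The key design point is that WX writes the inherent vowel `a` explicitly, so a consonant not followed by a vowel letter receives a virama, which is what produces the conjuncts in words such as `pAkiswAna`.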
And after conversion of the WX encoding to the Bengali script in UTF-8: | And after conversion of the WX encoding to the Devanagari script in UTF-8: |
| |
| 1 | মাধবীলতা | মাধবীলতা | NP | NNP | lex-mAXabIlawA<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-mAXabIlawA<nowiki>|</nowiki>name-NP | 7 | k1 | _ | _ | | | <nowiki>1</nowiki> | <nowiki>पाकिस्तान</nowiki> | <nowiki>पाकिस्तान</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-pAkiswAna|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-10|chunkType-child:NP|name-pAkiswAna</nowiki> | <nowiki>3</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 2 | তখন | তখন | NP | PRP | lex-waKana<nowiki>|</nowiki>cat-pn<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-0<nowiki>|</nowiki>tam-0<nowiki>|</nowiki>head-waKana<nowiki>|</nowiki>name-NP2 | 7 | k7t | _ | _ | | | <nowiki>2</nowiki> | <nowiki>अधिकृत</nowiki> | <nowiki>अधिकृत</nowiki> | <nowiki>XC</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-aXikqwa|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-20|chunkType-child:NP|name-aXikqwa</nowiki> | <nowiki>3</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 3 | হাতের | হাত | NP | NN | lex-hAwa<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-o<nowiki>|</nowiki>vib-era<nowiki>|</nowiki>tam-era<nowiki>|</nowiki>head-hAwera<nowiki>|</nowiki>name-NP3 | 4 | r6 | _ | _ | | | <nowiki>3</nowiki> | <nowiki>कश्मीर</nowiki> | <nowiki>कश्मीर</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-kaSmIra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_meM|tam-0|posn-30|vpos-vib_4|name-kaSmIra|chunkId-NP|chunkType-head:NP</nowiki> | <nowiki>8</nowiki> | <nowiki>k7p</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 4 | ঘড়ি | ঘড়ি | NP | NN | lex-GadZi<nowiki>|</nowiki>cat-unk<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-<nowiki>|</nowiki>tam-<nowiki>|</nowiki>head-GadZi<nowiki>|</nowiki>name-NP4 | 5 | k2 | _ | _ | | | <nowiki>4</nowiki> | <nowiki>में</nowiki> | <nowiki>में</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-40|chunkType-child:NP|name-meM</nowiki> | <nowiki>3</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 5 | খুলে | খুল্ | VGNF | VM | lex-Kul<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-ne<nowiki>|</nowiki>tam-ne<nowiki>|</nowiki>head-Kule<nowiki>|</nowiki>name-VGNF | 7 | vmod | _ | _ | | | <nowiki>5</nowiki> | <nowiki>�</nowiki> | <nowiki>�</nowiki> | <nowiki>XC</nowiki> | <nowiki>num</nowiki> | <nowiki>lex-�|cat-num|gend-m|num-sg|pers-3|case-d|vib-|tam-|posn-50|chunkType-child:NP2|name-�</nowiki> | <nowiki>6</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 6 | টেবিলে | টেবিল | NP | NN | lex-tebila<nowiki>|</nowiki>cat-n<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-sg<nowiki>|</nowiki>pers-<nowiki>|</nowiki>case-d<nowiki>|</nowiki>vib-me<nowiki>|</nowiki>tam-me<nowiki>|</nowiki>head-tebile<nowiki>|</nowiki>name-NP5 | 7 | k7p | _ | _ | | | <nowiki>6</nowiki> | <nowiki>अक्तूबर</nowiki> | <nowiki>अक्तूबर</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-akwUbara|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-60|vpos-vib_3|name-akwUbara|chunkId-NP2|chunkType-head:NP2</nowiki> | <nowiki>8</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 7 | রাখছিল | রাখ্ | VGF | VM | lex-rAK<nowiki>|</nowiki>cat-v<nowiki>|</nowiki>gend-<nowiki>|</nowiki>num-<nowiki>|</nowiki>pers-5<nowiki>|</nowiki>case-<nowiki>|</nowiki>vib-Cila<nowiki>|</nowiki>tam-Cila<nowiki>|</nowiki>head-rAKaCila<nowiki>|</nowiki>name-VGF | 0 | main | _ | _ | | | <nowiki>7</nowiki> | <nowiki>को</nowiki> | <nowiki>को</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-70|chunkType-child:NP2|name-ko</nowiki> | <nowiki>6</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>8</nowiki> | <nowiki>आए</nowiki> | <nowiki>आ</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-A|cat-v|gend-m|num-sg|pers-any|case-|vib-yA|tam-yA|posn-80|name-Ae|chunkId-VGNF|chunkType-head:VGNF</nowiki> | <nowiki>9</nowiki> | <nowiki>nmod__k1inv</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>9</nowiki> | <nowiki>भूकंप</nowiki> | <nowiki>भूकंप</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-BUkaMpa|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_se|tam-0|posn-90|vpos-vib_2|name-BUkaMpa|chunkId-NP3|chunkType-head:NP3</nowiki> | <nowiki>11</nowiki> | <nowiki>rh</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>10</nowiki> | <nowiki>से</nowiki> | <nowiki>से</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-se|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-100|chunkType-child:NP3|name-se</nowiki> | <nowiki>9</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>11</nowiki> | <nowiki>मची</nowiki> | <nowiki>मच</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-maca|cat-v|gend-f|num-sg|pers-any|case-|vib-yA|tam-yA|posn-110|name-macI|chunkId-VGNF2|chunkType-head:VGNF2</nowiki> | <nowiki>12</nowiki> | <nowiki>nmod__k1inv</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>12</nowiki> | <nowiki>तबाही</nowiki> | <nowiki>तबाही</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-wabAhI|cat-n|gend-f|num-sg|pers-3|case-o|vib-0_kA_bAxa|tam-0|posn-120|vpos-vib_2_3|name-wabAhI|chunkId-NP4|chunkType-head:NP4</nowiki> | <nowiki>36</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>13</nowiki> | <nowiki>के</nowiki> | <nowiki>का</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-kA|cat-psp|gend-m|num-sg|pers-3|case-o|vib-|tam-|posn-130|chunkType-child:NP4|name-ke</nowiki> | <nowiki>12</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>14</nowiki> | <nowiki>बाद</nowiki> | <nowiki>बाद</nowiki> | <nowiki>NST</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-bAxa|cat-n|gend-|num-|pers-|case-|vib-|tam-|posn-140|chunkType-child:NP4|name-bAxa</nowiki> | <nowiki>12</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>15</nowiki> | <nowiki>भारत</nowiki> | <nowiki>भारत</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-BArawa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-150|name-BArawa|chunkId-NP5|chunkType-head:NP5</nowiki> | <nowiki>16</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>16</nowiki> | <nowiki>और</nowiki> | <nowiki>और</nowiki> | <nowiki>CC</nowiki> | <nowiki>avy</nowiki> | <nowiki>lex-Ora|cat-avy|gend-|num-|pers-|case-|vib-|tam-|posn-160|name-Ora|chunkId-CCP|chunkType-head:CCP</nowiki> | <nowiki>36</nowiki> | <nowiki>k1</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>17</nowiki> | <nowiki>पाकिस्तान</nowiki> | <nowiki>पाकिस्तान</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-pAkiswAna|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-170|name-pAkiswAna2|chunkId-NP6|chunkType-head:NP6</nowiki> | <nowiki>16</nowiki> | <nowiki>ccof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>18</nowiki> | <nowiki>मानवीय</nowiki> | <nowiki>मानवीय</nowiki> | <nowiki>JJ</nowiki> | <nowiki>adj</nowiki> | <nowiki>lex-mAnavIya|cat-adj|gend-any|num-any|pers-|case-o|vib-|tam-|posn-180|chunkType-child:NP7|name-mAnavIya</nowiki> | <nowiki>19</nowiki> | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>19</nowiki> | <nowiki>दृष्टिकोण</nowiki> | <nowiki>दृष्टिकोण</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-xqRtikoNa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-190|name-xqRtikoNa|chunkId-NP7|chunkType-head:NP7</nowiki> | <nowiki>20</nowiki> | <nowiki>k2</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>20</nowiki> | <nowiki>अपनाते</nowiki> | <nowiki>अपना</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-apanA|cat-v|gend-m|num-pl|pers-any|case-|vib-wA_ho+yA|tam-wA|posn-200|vpos-tam_2|name-apanAwe|chunkId-VGNF3|chunkType-head:VGNF3</nowiki> | <nowiki>36</nowiki> | <nowiki>vmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>21</nowiki> | <nowiki>हुए</nowiki> | <nowiki>हो</nowiki> | <nowiki>VAUX</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-ho|cat-v|gend-m|num-pl|pers-any|case-|vib-yA|tam-yA|posn-210|chunkType-child:VGNF3|name-hue</nowiki> | <nowiki>20</nowiki> | <nowiki>lwg__vaux</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>22</nowiki> | <nowiki>शनिवार</nowiki> | <nowiki>शनिवार</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-SanivAra|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_ko|tam-0|posn-220|vpos-vib_2|name-SanivAra|chunkId-NP8|chunkType-head:NP8</nowiki> | <nowiki>36</nowiki> | <nowiki>k7t</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>23</nowiki> | <nowiki>को</nowiki> | <nowiki>को</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-ko|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-230|chunkType-child:NP8|name-ko2</nowiki> | <nowiki>22</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>24</nowiki> | <nowiki>इस्लामाबाद</nowiki> | <nowiki>इसलामाबाद</nowiki> | <nowiki>NNP</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-isalAmAbAxa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0_meM|tam-0|posn-240|vpos-vib_2|name-islAmAbAxa|chunkId-NP9|chunkType-head:NP9</nowiki> | <nowiki>36</nowiki> | <nowiki>k7p</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>25</nowiki> | <nowiki>में</nowiki> | <nowiki>में</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-meM|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-250|chunkType-child:NP9|name-meM2</nowiki> | <nowiki>24</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>26</nowiki> | <nowiki>नियंत्रण</nowiki> | <nowiki>नियंत्रण</nowiki> | <nowiki>XC</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-niyaMwraNa|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-260|chunkType-child:NP10|name-niyaMwraNa</nowiki> | <nowiki>27</nowiki> | <nowiki>mod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>27</nowiki> | <nowiki>रेखा</nowiki> | <nowiki>रेखा</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-reKA|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-270|name-reKA|chunkId-NP10|chunkType-head:NP10</nowiki> | <nowiki>31</nowiki> | <nowiki>k2</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>28</nowiki> | <nowiki>(</nowiki> | <nowiki>(</nowiki> | <nowiki>SYM</nowiki> | <nowiki>punc</nowiki> | <nowiki>lex-|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-280|chunkType-child:NP11|name-(</nowiki> | <nowiki>29</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>29</nowiki> | <nowiki>एलओसी</nowiki> | <nowiki>एलओसी</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-elaosI|cat-n|gend-m|num-sg|pers-3|case-d|vib-0|tam-0|posn-290|name-elaosI|chunkId-NP11|chunkType-head:NP11</nowiki> | <nowiki>27</nowiki> | <nowiki>nmod</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>30</nowiki> | <nowiki>)</nowiki> | <nowiki>)</nowiki> | <nowiki>SYM</nowiki> | <nowiki>punc</nowiki> | <nowiki>lex-|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-300|chunkType-child:NP11|name-)</nowiki> | <nowiki>29</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>31</nowiki> | <nowiki>खोलने</nowiki> | <nowiki>खोल</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-Kola|cat-v|gend-any|num-sg|pers-any|case-o|vib-nA_kA|tam-nA|posn-310|vpos-tam_2|name-Kolane|chunkId-VGNN|chunkType-head:VGNN</nowiki> | <nowiki>33</nowiki> | <nowiki>r6</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>32</nowiki> | <nowiki>के</nowiki> | <nowiki>का</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-kA|cat-psp|gend-m|num-sg|pers-|case-o|vib-|tam-|posn-320|chunkType-child:VGNN|name-ke2</nowiki> | <nowiki>31</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>33</nowiki> | <nowiki>मसले</nowiki> | <nowiki>मसला</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-masalA|cat-n|gend-m|num-sg|pers-3|case-o|vib-0_para|tam-0|posn-330|vpos-vib_2|name-masale|chunkId-NP12|chunkType-head:NP12</nowiki> | <nowiki>36</nowiki> | <nowiki>k7</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>34</nowiki> | <nowiki>पर</nowiki> | <nowiki>पर</nowiki> | <nowiki>PSP</nowiki> | <nowiki>psp</nowiki> | <nowiki>lex-para|cat-psp|gend-|num-|pers-|case-|vib-|tam-|posn-340|chunkType-child:NP12|name-para</nowiki> | <nowiki>33</nowiki> | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>35</nowiki> | <nowiki>बातचीत</nowiki> | <nowiki>बातचीत</nowiki> | <nowiki>NN</nowiki> | <nowiki>n</nowiki> | <nowiki>lex-bAwacIwa|cat-n|gend-f|num-sg|pers-3|case-d|vib-0|tam-0|posn-350|name-bAwacIwa|chunkId-NP13|chunkType-head:NP13</nowiki> | <nowiki>36</nowiki> | <nowiki>pof</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>36</nowiki> | <nowiki>करेंगे</nowiki> | <nowiki>कर</nowiki> | <nowiki>VM</nowiki> | <nowiki>v</nowiki> | <nowiki>lex-kara|cat-v|gend-m|num-pl|pers-3|case-|vib-gA|tam-gA|posn-360|name-kareMge|chunkId-VGF|chunkType-head:VGF</nowiki> | <nowiki>0</nowiki> | <nowiki>main</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | <nowiki>37</nowiki> | <nowiki>.</nowiki> | <nowiki>.</nowiki> | <nowiki>SYM</nowiki> | <nowiki>punc</nowiki> | <nowiki>lex-.|cat-punc|gend-|num-|pers-|case-|vib-|tam-|posn-370|chunkType-child:VGF|name-.</nowiki> | <nowiki>36</nowiki> | <nowiki>rsym</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| |
| The first sentence of the HPST 2012 training data in the UTF-8 SSF format with gold-standard morphology: |
| |
| <code xml><Sentence id='1'> |
| 1 गुजरात NNP <fs af='गुजरात,n,m,sg,3,o,0_का,0' name='गुजरात' posn='10' chunkId='NP' drel='r6:मुख्यमंत्री' vpos='vib_2' chunkType='head:NP'> |
| 2 के PSP <fs af='का,psp,m,sg,,o,,' name='के' posn='20' drel='lwg__psp:गुजरात' chunkType='child:NP'> |
| 3 मुख्यमंत्री NNP <fs af='मुख्यमंत्री,n,m,sg,3,o,0,0' name='मुख्यमंत्री' posn='30' chunkId='NP2' drel='nmod:मोदी' chunkType='head:NP2'> |
| 4 नरेंद्र NNPC <fs af='नरेंद्र,n,m,sg,3,d,0,0' name='नरेंद्र' posn='40' drel='pof__cn:मोदी' chunkType='child:NP3'> |
| 5 मोदी NNP <fs af='मोदी,n,m,sg,3,o,0_ने,0' name='मोदी' posn='50' chunkId='NP3' drel='k1:किया' vpos='vib_3' chunkType='head:NP3'> |
| 6 ने PSP <fs af='ने,psp,,,,,,' name='ने' posn='60' drel='lwg__psp:मोदी' chunkType='child:NP3'> |
| 7 मंगलवार NNP <fs af='मंगलवार,n,m,sg,3,o,0_को,0' name='मंगलवार' posn='70' chunkId='NP4' drel='k7t:किया' vpos='vib_2' chunkType='head:NP4'> |
| 8 को PSP <fs af='को,psp,,,,,,' name='को' posn='80' drel='lwg__psp:मंगलवार' chunkType='child:NP4'> |
| 9 गृह NNPC <fs af='गृह,n,m,sg,3,d,0,0' name='गृह' posn='90' drel='pof__cn:मंत्री' chunkType='child:NP5'> |
| 10 मंत्री NNP <fs af='मंत्री,n,m,sg,3,d,0,0' name='मंत्री' posn='100' drel='nmod__adj:पाटिल' chunkType='child:NP5'> |
| 11 शिवराज NNPC <fs af='शिवराज,n,m,sg,3,d,0,0' name='शिवराज' posn='110' drel='pof__cn:पाटिल' chunkType='child:NP5'> |
| 12 पाटिल NNP <fs af='पाटिल,n,m,sg,3,o,0_से,0' name='पाटिल' posn='120' chunkId='NP5' drel='k4:किया' vpos='vib_vib_5' chunkType='head:NP5'> |
| 13 से PSP <fs af='से,psp,,,,,,' name='से' posn='130' drel='lwg__psp:पाटिल' chunkType='child:NP5'> |
| 14 मुलाकात NN <fs af='मुलाकात,n,f,sg,3,d,0,0' name='मुलाकात' posn='140' chunkId='NP6' drel='pof:कर' chunkType='head:NP6'> |
| 15 कर VM <fs af='कर,v,any,any,any,,0,0' name='कर' posn='150' chunkId='VGNF' drel='vmod:किया' chunkType='head:VGNF'> |
| 16 आईएएस NNP <fs af='आईएएस,n,m,sg,3,o,0,0' name='आईएएस' posn='160' chunkId='NP7' drel='ccof:और' chunkType='head:NP7'> |
| 17 और CC <fs af='और,avy,,,,,,' name='और' posn='170' chunkId='CCP' drel='r6:तर्ज' chunkType='head:CCP'> |
| 18 आईपीएस NNP <fs af='आईपीएस,n,m,sg,3,o,0_का,0' name='आईपीएस' posn='180' chunkId='NP8' drel='ccof:और' vpos='vib_2' chunkType='head:NP8'> |
| 19 की PSP <fs af='का,psp,f,sg,,o,,' name='की' posn='190' drel='lwg__psp:आईपीएस' chunkType='child:NP8'> |
| 20 तर्ज NN <fs af='तर्ज,n,f,sg,3,o,0_पर,0' name='तर्ज' posn='200' chunkId='NP9' drel='k7:किया' vpos='vib_2' chunkType='head:NP9'> |
| 21 पर PSP <fs af='पर,psp,,,,,,' name='पर' posn='210' drel='lwg__psp:तर्ज' chunkType='child:NP9'> |
| 22 राष्ट्रीय JJ <fs af='राष्ट्रीय,adj,any,any,,o,,' name='राष्ट्रीय' posn='220' drel='nmod__adj:स्तर' chunkType='child:NP10'> |
| 23 स्तर NN <fs af='स्तर,n,m,sg,3,o,0_पर,0' name='स्तर' posn='230' chunkId='NP10' drel='k7:किया' vpos='vib_3' chunkType='head:NP10'> |
| 24 पर PSP <fs af='पर,psp,,,,,,' name='पर2' posn='240' drel='lwg__psp:स्तर' chunkType='child:NP10'> |
| 25 एक QC <fs af='एक,num,any,any,,any,,' name='एक' posn='250' drel='nmod__adj:सेवा' chunkType='child:NP11'> |
| 26 खुफिया JJ <fs af='खुफिया,adj,any,any,,d,,' name='खुफिया' posn='260' drel='nmod__adj:सेवा' chunkType='child:NP11'> |
| 27 सेवा NN <fs af='सेवा,n,f,sg,3,d,0,0' name='सेवा' posn='270' chunkId='NP11' drel='k2:करने' chunkType='head:NP11'> |
| 28 शुरू NN <fs af='शुरू,n,m,sg,3,d,0,0' name='शुरू' posn='280' chunkId='NP12' drel='pof:करने' chunkType='head:NP12'> |
| 29 करने VM <fs af='कर,v,any,sg,any,o,ना_का,nA' name='करने' posn='290' chunkId='VGNN' drel='r6-k2:अनुरोध' vpos='tam_2' chunkType='head:VGNN'> |
| 30 का PSP <fs af='का,psp,m,sg,,d,,' name='का' posn='300' drel='lwg__psp:करने' chunkType='child:VGNN'> |
| 31 अनुरोध NN <fs af='अनुरोध,n,m,sg,3,d,0,0' name='अनुरोध' posn='310' chunkId='NP13' drel='pof:किया' chunkType='head:NP13'> |
| 32 किया VM <fs af='कर,v,m,sg,any,,या,yA' name='किया' posn='320' chunkId='VGF' chunkType='head:VGF' voicetype='active' stype='declarative'> |
| 33 । SYM <fs af='।,punc,,,,,,' name='।' posn='330' chunkId='BLK' drel='rsym:किया' chunkType='head:BLK'> |
| </Sentence></code> |
| |
| And the same in CoNLL format: |
| |
| | 1 | <nowiki>गुजरात</nowiki> | <nowiki>गुजरात</nowiki> | NNP | n | <nowiki>lex-गुजरात|cat-n|gen-m|num-sg|pers-3|case-o|vib-0_का|tam-0|chunkId-NP|chunkType-head|stype-|voicetype-</nowiki> | 3 | r6 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 2 | <nowiki>के</nowiki> | <nowiki>का</nowiki> | PSP | psp | <nowiki>lex-का|cat-psp|gen-m|num-sg|pers-|case-o|vib-|tam-|chunkId-NP|chunkType-child|stype-|voicetype-</nowiki> | 1 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 3 | <nowiki>मुख्यमंत्री</nowiki> | <nowiki>मुख्यमंत्री</nowiki> | NNP | n | <nowiki>lex-मुख्यमंत्री|cat-n|gen-m|num-sg|pers-3|case-o|vib-0|tam-0|chunkId-NP2|chunkType-head|stype-|voicetype-</nowiki> | 5 | nmod | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 4 | <nowiki>नरेंद्र</nowiki> | <nowiki>नरेंद्र</nowiki> | NNPC | n | <nowiki>lex-नरेंद्र|cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP3|chunkType-child|stype-|voicetype-</nowiki> | 5 | <nowiki>pof__cn</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 5 | <nowiki>मोदी</nowiki> | <nowiki>मोदी</nowiki> | NNP | n | <nowiki>lex-मोदी|cat-n|gen-m|num-sg|pers-3|case-o|vib-0_ने|tam-0|chunkId-NP3|chunkType-head|stype-|voicetype-</nowiki> | 32 | k1 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 6 | <nowiki>ने</nowiki> | <nowiki>ने</nowiki> | PSP | psp | <nowiki>lex-ने|cat-psp|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP3|chunkType-child|stype-|voicetype-</nowiki> | 5 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 7 | <nowiki>मंगलवार</nowiki> | <nowiki>मंगलवार</nowiki> | NNP | n | <nowiki>lex-मंगलवार|cat-n|gen-m|num-sg|pers-3|case-o|vib-0_को|tam-0|chunkId-NP4|chunkType-head|stype-|voicetype-</nowiki> | 32 | k7t | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 8 | <nowiki>को</nowiki> | <nowiki>को</nowiki> | PSP | psp | <nowiki>lex-को|cat-psp|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP4|chunkType-child|stype-|voicetype-</nowiki> | 7 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 9 | <nowiki>गृह</nowiki> | <nowiki>गृह</nowiki> | NNPC | n | <nowiki>lex-गृह|cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP5|chunkType-child|stype-|voicetype-</nowiki> | 10 | <nowiki>pof__cn</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 10 | <nowiki>मंत्री</nowiki> | <nowiki>मंत्री</nowiki> | NNP | n | <nowiki>lex-मंत्री|cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP5|chunkType-child|stype-|voicetype-</nowiki> | 12 | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 11 | <nowiki>शिवराज</nowiki> | <nowiki>शिवराज</nowiki> | NNPC | n | <nowiki>lex-शिवराज|cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP5|chunkType-child|stype-|voicetype-</nowiki> | 12 | <nowiki>pof__cn</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 12 | <nowiki>पाटिल</nowiki> | <nowiki>पाटिल</nowiki> | NNP | n | <nowiki>lex-पाटिल|cat-n|gen-m|num-sg|pers-3|case-o|vib-0_से|tam-0|chunkId-NP5|chunkType-head|stype-|voicetype-</nowiki> | 32 | k4 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 13 | <nowiki>से</nowiki> | <nowiki>से</nowiki> | PSP | psp | <nowiki>lex-से|cat-psp|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP5|chunkType-child|stype-|voicetype-</nowiki> | 12 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 14 | <nowiki>मुलाकात</nowiki> | <nowiki>मुलाकात</nowiki> | NN | n | <nowiki>lex-मुलाकात|cat-n|gen-f|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP6|chunkType-head|stype-|voicetype-</nowiki> | 15 | pof | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 15 | कर | कर | VM | v | <nowiki>lex-कर|cat-v|gen-any|num-any|pers-any|case-|vib-0|tam-0|chunkId-VGNF|chunkType-head|stype-|voicetype-</nowiki> | 32 | vmod | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 16 | आईएएस | आईएएस | NNP | n | <nowiki>lex-आईएएस|cat-n|gen-m|num-sg|pers-3|case-o|vib-0|tam-0|chunkId-NP7|chunkType-head|stype-|voicetype-</nowiki> | 17 | ccof | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 17 | और | और | CC | avy | <nowiki>lex-और|cat-avy|gen-|num-|pers-|case-|vib-|tam-|chunkId-CCP|chunkType-head|stype-|voicetype-</nowiki> | 20 | r6 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 18 | <nowiki>आईपीएस</nowiki> | <nowiki>आईपीएस</nowiki> | NNP | n | <nowiki>lex-आईपीएस|cat-n|gen-m|num-sg|pers-3|case-o|vib-0_का|tam-0|chunkId-NP8|chunkType-head|stype-|voicetype-</nowiki> | 17 | ccof | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 19 | <nowiki>की</nowiki> | <nowiki>का</nowiki> | PSP | psp | <nowiki>lex-का|cat-psp|gen-f|num-sg|pers-|case-o|vib-|tam-|chunkId-NP8|chunkType-child|stype-|voicetype-</nowiki> | 18 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 20 | <nowiki>तर्ज</nowiki> | <nowiki>तर्ज</nowiki> | NN | n | <nowiki>lex-तर्ज|cat-n|gen-f|num-sg|pers-3|case-o|vib-0_पर|tam-0|chunkId-NP9|chunkType-head|stype-|voicetype-</nowiki> | 32 | k7 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 21 | पर | पर | PSP | psp | <nowiki>lex-पर|cat-psp|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP9|chunkType-child|stype-|voicetype-</nowiki> | 20 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 22 | <nowiki>राष्ट्रीय</nowiki> | <nowiki>राष्ट्रीय</nowiki> | JJ | adj | <nowiki>lex-राष्ट्रीय|cat-adj|gen-any|num-any|pers-|case-o|vib-|tam-|chunkId-NP10|chunkType-child|stype-|voicetype-</nowiki> | 23 | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 23 | <nowiki>स्तर</nowiki> | <nowiki>स्तर</nowiki> | NN | n | <nowiki>lex-स्तर|cat-n|gen-m|num-sg|pers-3|case-o|vib-0_पर|tam-0|chunkId-NP10|chunkType-head|stype-|voicetype-</nowiki> | 32 | k7 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 24 | पर | पर | PSP | psp | <nowiki>lex-पर|cat-psp|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP10|chunkType-child|stype-|voicetype-</nowiki> | 23 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 25 | एक | एक | QC | num | <nowiki>lex-एक|cat-num|gen-any|num-any|pers-|case-any|vib-|tam-|chunkId-NP11|chunkType-child|stype-|voicetype-</nowiki> | 27 | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 26 | <nowiki>खुफिया</nowiki> | <nowiki>खुफिया</nowiki> | JJ | adj | <nowiki>lex-खुफिया|cat-adj|gen-any|num-any|pers-|case-d|vib-|tam-|chunkId-NP11|chunkType-child|stype-|voicetype-</nowiki> | 27 | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 27 | <nowiki>सेवा</nowiki> | <nowiki>सेवा</nowiki> | NN | n | <nowiki>lex-सेवा|cat-n|gen-f|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP11|chunkType-head|stype-|voicetype-</nowiki> | 29 | k2 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 28 | <nowiki>शुरू</nowiki> | <nowiki>शुरू</nowiki> | NN | n | <nowiki>lex-शुरू|cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP12|chunkType-head|stype-|voicetype-</nowiki> | 29 | pof | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 29 | <nowiki>करने</nowiki> | कर | VM | v | <nowiki>lex-कर|cat-v|gen-any|num-sg|pers-any|case-o|vib-ना_का|tam-nA|chunkId-VGNN|chunkType-head|stype-|voicetype-</nowiki> | 31 | <nowiki>r6-k2</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 30 | <nowiki>का</nowiki> | <nowiki>का</nowiki> | PSP | psp | <nowiki>lex-का|cat-psp|gen-m|num-sg|pers-|case-d|vib-|tam-|chunkId-VGNN|chunkType-child|stype-|voicetype-</nowiki> | 29 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 31 | <nowiki>अनुरोध</nowiki> | <nowiki>अनुरोध</nowiki> | NN | n | <nowiki>lex-अनुरोध|cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP13|chunkType-head|stype-|voicetype-</nowiki> | 32 | pof | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 32 | <nowiki>किया</nowiki> | कर | VM | v | <nowiki>lex-कर|cat-v|gen-m|num-sg|pers-any|case-|vib-या|tam-yA|chunkId-VGF|chunkType-head|stype-declarative'>|voicetype-active</nowiki> | 0 | main | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 33 | <nowiki>।</nowiki> | <nowiki>।</nowiki> | SYM | punc | <nowiki>lex-।|cat-punc|gen-|num-|pers-|case-|vib-|tam-|chunkId-BLK|chunkType-head|stype-|voicetype-</nowiki> | 32 | rsym | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
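The correspondence between the SSF feature structure and the CoNLL FEATS column shown above can be sketched in a few lines of Python. This is only an illustration inferred from the two samples, not the converter used by the organizers: the names ''ssf_token_to_feats'', ''AF_KEYS'' and ''ATTR_RE'' are hypothetical, the fields of a token line are assumed to be separated by whitespace, and the chunk identifier is assumed to be the part of ''chunkType'' after the colon.

```python
import re

# Order of the comma-separated values inside af='...'; the key names are the
# ones that appear in the CoNLL FEATS column of the sample above.
AF_KEYS = ["lex", "cat", "gen", "num", "pers", "case", "vib", "tam"]

# Matches key='value' pairs inside the <fs ...> feature structure.
ATTR_RE = re.compile(r"(\w+)='([^']*)'")


def ssf_token_to_feats(line):
    """Turn one SSF token line into (index, form, pos, feats).

    Assumption: the line has the shape 'index form pos <fs ...>' with
    whitespace between the first three fields.
    """
    index, form, pos, fs = line.split(None, 3)
    attrs = dict(ATTR_RE.findall(fs))

    # Split the abbreviated features and pad to eight values if some are missing.
    af = attrs.get("af", "").split(",")
    af += [""] * (len(AF_KEYS) - len(af))

    # chunkType='head:NP' is split into chunkType-head and chunkId-NP.
    chunk_type, _, chunk_id = attrs.get("chunkType", "").partition(":")

    pairs = list(zip(AF_KEYS, af))
    pairs += [
        ("chunkId", chunk_id),
        ("chunkType", chunk_type),
        ("stype", attrs.get("stype", "")),
        ("voicetype", attrs.get("voicetype", "")),
    ]
    feats = "|".join(f"{key}-{value}" for key, value in pairs)
    return int(index), form, pos, feats


if __name__ == "__main__":
    sample = ("2 के PSP <fs af='का,psp,m,sg,,o,,' name='के' posn='20' "
              "drel='lwg__psp:गुजरात' chunkType='child:NP'>")
    print(ssf_token_to_feats(sample)[3])
```

For token 32 this sketch would produce ''stype-declarative'', whereas the released CoNLL file has ''stype-declarative'>''; the extra characters look like residue of the closing bracket of the SSF feature structure.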
| |
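| The same sentence with “automatically tagged” morphology. Apparently this means no morphology at all: the lemma, coarse category and feature columns contain only underscores, and only the POS tag column is filled in (it can differ from the gold tag, e.g. token 26 is NNC here but JJ above). Contestants should probably use their own taggers to supply the rest. |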
| |
| | 1 | <nowiki>गुजरात</nowiki> | <nowiki>_</nowiki> | NNP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 3 | r6 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 2 | <nowiki>के</nowiki> | <nowiki>_</nowiki> | PSP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 1 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 3 | <nowiki>मुख्यमंत्री</nowiki> | <nowiki>_</nowiki> | NNP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 5 | nmod | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 4 | <nowiki>नरेंद्र</nowiki> | <nowiki>_</nowiki> | NNPC | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 5 | <nowiki>pof__cn</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 5 | <nowiki>मोदी</nowiki> | <nowiki>_</nowiki> | NNP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 32 | k1 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 6 | <nowiki>ने</nowiki> | <nowiki>_</nowiki> | PSP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 5 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 7 | <nowiki>मंगलवार</nowiki> | <nowiki>_</nowiki> | NNP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 32 | k7t | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 8 | <nowiki>को</nowiki> | <nowiki>_</nowiki> | PSP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 7 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 9 | <nowiki>गृह</nowiki> | <nowiki>_</nowiki> | NNPC | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 10 | <nowiki>pof__cn</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 10 | <nowiki>मंत्री</nowiki> | <nowiki>_</nowiki> | NNP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 12 | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 11 | <nowiki>शिवराज</nowiki> | <nowiki>_</nowiki> | NNPC | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 12 | <nowiki>pof__cn</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 12 | <nowiki>पाटिल</nowiki> | <nowiki>_</nowiki> | NNP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 32 | k4 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 13 | <nowiki>से</nowiki> | <nowiki>_</nowiki> | PSP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 12 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 14 | <nowiki>मुलाकात</nowiki> | <nowiki>_</nowiki> | NN | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 15 | pof | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 15 | कर | <nowiki>_</nowiki> | VM | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 32 | vmod | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 16 | आईएएस | <nowiki>_</nowiki> | NNP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 17 | ccof | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 17 | और | <nowiki>_</nowiki> | CC | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 20 | r6 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 18 | <nowiki>आईपीएस</nowiki> | <nowiki>_</nowiki> | NNP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 17 | ccof | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 19 | <nowiki>की</nowiki> | <nowiki>_</nowiki> | PSP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 18 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 20 | <nowiki>तर्ज</nowiki> | <nowiki>_</nowiki> | NN | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 32 | k7 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 21 | पर | <nowiki>_</nowiki> | PSP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 20 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 22 | <nowiki>राष्ट्रीय</nowiki> | <nowiki>_</nowiki> | JJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 23 | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 23 | <nowiki>स्तर</nowiki> | <nowiki>_</nowiki> | NN | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 32 | k7 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 24 | पर | <nowiki>_</nowiki> | PSP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 23 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 25 | एक | <nowiki>_</nowiki> | QC | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 27 | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 26 | <nowiki>खुफिया</nowiki> | <nowiki>_</nowiki> | NNC | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 27 | <nowiki>nmod__adj</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 27 | <nowiki>सेवा</nowiki> | <nowiki>_</nowiki> | NN | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 29 | k2 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 28 | <nowiki>शुरू</nowiki> | <nowiki>_</nowiki> | NN | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 29 | pof | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 29 | <nowiki>करने</nowiki> | <nowiki>_</nowiki> | VM | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 31 | <nowiki>r6-k2</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 30 | <nowiki>का</nowiki> | <nowiki>_</nowiki> | PSP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 29 | <nowiki>lwg__psp</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 31 | <nowiki>अनुरोध</nowiki> | <nowiki>_</nowiki> | NN | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 32 | pof | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 32 | <nowiki>किया</nowiki> | <nowiki>_</nowiki> | VM | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 0 | main | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 33 | <nowiki>।</nowiki> | <nowiki>_</nowiki> | SYM | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 32 | rsym | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| |
| The first sentence of the development data in the UTF-8 SSF format with gold-standard morphology: |
| |
| <code xml><Sentence id='1'> |
| 1 भाजपा NNP <fs af='भाजपा,n,f,sg,3,o,0_ने,0' name='भाजपा' posn='10' chunkId='NP' drel='k1:लगाया' vpos='vib_2' chunkType='head:NP'> |
| 2 ने PSP <fs af='ने,psp,,,,,,' name='ने' posn='20' drel='lwg__psp:भाजपा' chunkType='child:NP'> |
| 3 केंद्र NNPC <fs name='केंद्र' chunkId='FRAGP' chunkType='head:'FRAGP' drel='ccof:और'> |
| 4 और CC <fs af='और,avy,,,,,,' name='और' posn='40' chunkId='CCP' drel='nmod:सरकार' chunkType='head:CCP'> |
| 5 केरल NNPC <fs name='केरल' chunkId='FRAGP2' chunkType='head:'FRAGP2' drel='ccof:और'> |
| 6 सरकार NNP <fs af='सरकार,n,f,sg,3,o,0_पर,0' name='सरकार' posn='60' chunkId='NP2' drel='k7:लगाया' vpos='vib_2' chunkType='head:NP2'> |
| 7 पर PSP <fs af='पर,psp,,,,,,' name='पर' posn='70' drel='lwg__psp:सरकार' chunkType='child:NP2'> |
| 8 भारतीय JJ <fs af='भारतीय,adj,any,any,,o,,' name='भारतीय' posn='80' drel='nmod__adj:ड्राइवर' chunkType='child:NP3'> |
| 9 ड्राइवर NN <fs af='ड्राइवर,n,m,sg,3,o,0,0' name='ड्राइवर' posn='90' chunkId='NP3' drel='nmod:कुट्टी' chunkType='head:NP3'> |
| 10 एम. NNPC <fs af='एम.,n,m,sg,3,d,0,0' name='एम.' posn='100' drel='pof__cn:कुट्टी' chunkType='child:NP4'> |
| 11 आर. NNPC <fs af='आर.,n,m,sg,3,d,0,0' name='आर.' posn='110' drel='pof__cn:कुट्टी' chunkType='child:NP4'> |
| 12 कुट्टी NNP <fs af='कुट्टी,n,m,sg,3,o,0_का,0' name='कुट्टी' posn='120' chunkId='NP4' drel='r6:हत्या' vpos='vib_4' chunkType='head:NP4'> |
| 13 की PSP <fs af='का,psp,f,sg,,o,,' name='की' posn='130' drel='lwg__psp:कुट्टी' chunkType='child:NP4'> |
| 14 हत्या NN <fs af='हत्या,n,f,sg,3,o,0_के_लिए,0' name='हत्या' posn='140' chunkId='NP5' drel='jjmod:जिम्मेदार' vpos='vib_2_3' chunkType='head:NP5'> |
| 15 के PSP <fs af='के,psp,,,,,,' name='के' posn='150' drel='lwg__psp:हत्या' chunkType='child:NP5'> |
| 16 लिए PSP <fs af='लिए,psp,,,,,,' name='लिए' posn='160' drel='lwg__cont:हत्या' chunkType='child:NP5'> |
| 17 जिम्मेदार JJ <fs af='जिम्मेदार,adj,any,any,,o,,' name='जिम्मेदार' posn='170' chunkId='JJP' drel='nmod:तालिबान' chunkType='head:JJP'> |
| 18 तालिबान NNP <fs af='तालिबान,n,m,sg,3,o,0_के_साथ,0' name='तालिबान' posn='180' chunkId='NP6' drel='ras-k1:लगाया' vpos='vib_2_3' chunkType='head:NP6'> |
| 19 के PSP <fs af='के,psp,,,,,,' name='के2' posn='190' drel='lwg__psp:तालिबान' chunkType='child:NP6'> |
| 20 साथ NST <fs af='साथ,nst,m,sg,3,d,,' name='साथ' posn='200' drel='lwg__cont:तालिबान' chunkType='child:NP6'> |
| 21 निपटने VM <fs af='निपट,v,any,any,any,o,ना_में,nA' name='निपटने' posn='210' chunkId='VGNN' drel='k7:लगाया' vpos='tam_2' chunkType='head:VGNN'> |
| 22 में PSP <fs af='में,psp,,,,,,' name='में' posn='220' drel='lwg__psp:निपटने' chunkType='child:VGNN'> |
| 23 ढिलाई NN <fs af='ढिलाई,n,f,sg,3,d,0,0' name='ढिलाई' posn='230' chunkId='NP7' drel='k2:बरतने' chunkType='head:NP7'> |
| 24 बरतने VM <fs af='बरत,v,any,sg,any,o,ना_का,nA' name='बरतने' posn='240' chunkId='VGNN2' drel='r6:आरोप' vpos='tam_2' chunkType='head:VGNN2'> |
| 25 का PSP <fs af='का,psp,m,sg,,d,,' name='का' posn='250' drel='lwg__psp:बरतने' chunkType='child:VGNN2'> |
| 26 आरोप NN <fs af='आरोप,n,m,sg,3,d,0,0' name='आरोप' posn='260' chunkId='NP8' drel='k2:लगाया' chunkType='head:NP8'> |
| 27 लगाया VM <fs af='लगा,v,m,sg,3,,या_है,yA' name='लगाया' posn='270' chunkId='VGF' chunkType='head:VGF' voicetype='active' vpos='tam_2' stype='declarative'> |
| 28 है VAUX <fs af='है,v,any,sg,3,,है,hE' name='है' posn='280' drel='lwg__vaux:लगाया' chunkType='child:VGF'> |
| 29 । SYM <fs af='।,punc,,,,,,' name='।' posn='290' chunkId='BLK' drel='rsym:लगाया' chunkType='head:BLK'> |
| </Sentence></code> |
| |
==== Parsing ==== | ==== Parsing ==== |
| |
Nonprojectivities in HyDT-Bangla are not frequent. Only 78 of the 7252 chunks in the training+development ICON 2010 version are attached nonprojectively (1.08%). | Nonprojectivities in HyDT-Hindi are not frequent. Only 862 of the 77068 chunks in the training+development ICON 2010 version are attached nonprojectively (1.12%). |
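The figures above can be reproduced in principle with a check like the following. It is a minimal sketch under stated assumptions: the function name ''nonprojective_dependents'' is hypothetical, a node counts as nonprojectively attached if some node lying between it and its head is not a descendant of that head, and the input is a plain list of head indices (1-based, 0 for the artificial root). The treebank statistics count chunks, so the nodes would be chunks rather than the tokens used in the example.

```python
def nonprojective_dependents(heads):
    """Return the 1-based indices of nodes attached nonprojectively.

    heads[i - 1] is the head of node i; 0 denotes the artificial root.
    """

    def is_descendant(node, ancestor):
        # Climb the head links from node until the ancestor or the root is met.
        seen = set()
        while node != 0 and node not in seen:
            if node == ancestor:
                return True
            seen.add(node)
            node = heads[node - 1]
        # Every node is treated as a descendant of the artificial root.
        return ancestor == 0

    result = []
    for dep, head in enumerate(heads, start=1):
        left, right = sorted((dep, head))
        # The arc is nonprojective if a node strictly between its two ends
        # is not dominated by the head of the arc.
        if any(not is_descendant(k, head) for k in range(left + 1, right)):
            result.append(dep)
    return result


if __name__ == "__main__":
    # Head column of the first HPST 2012 training sentence shown above.
    hpst_first = [3, 1, 5, 5, 32, 5, 32, 7, 10, 12, 12, 32, 12, 15, 32, 17, 20,
                  17, 18, 32, 20, 23, 32, 23, 27, 27, 29, 29, 31, 29, 32, 0, 32]
    print(nonprojective_dependents(hpst_first))
    # Percentages quoted in the text: nonprojective chunks / all chunks.
    print(round(100 * 862 / 77068, 2), round(100 * 78 / 7252, 2))
```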
| |
The results of the ICON 2009 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2009/CR/intro-husain.pdf|(Husain, 2009)]]. There were two evaluation rounds, the first with the coarse-grained syntactic tags, the second with the fine-grained syntactic tags. To reward language independence, only systems that parsed all three languages were officially ranked. The following table presents the Bengali/coarse-grained results of the four officially ranked systems, and the best Bengali-only* system. | The results of the ICON 2009 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2009/CR/intro-husain.pdf|(Husain, 2009)]]. There were two evaluation rounds, the first with the coarse-grained syntactic tags, the second with the fine-grained syntactic tags. To reward language independence, only systems that parsed all three languages were officially ranked. The following table presents the Hindi/coarse-grained results of the four officially ranked systems. |
| |
^ Parser (Authors) ^ LAS ^ UAS ^ | ^ Parser (Authors) ^ LAS ^ UAS ^ |
| Kolkata (De et al.)* | 84.29 | 90.32 | | | Hyderabad (Ambati et al.) | 79.33 | 90.22 | |
| Hyderabad (Ambati et al.) | 78.25 | 90.22 | | | Malt (Nivre) | 78.20 | 89.36 | |
| Malt (Nivre) | 76.07 | 88.97 | | | Malt+MST (Zeman) | 73.88 | 88.49 | |
| Malt+MST (Zeman) | 71.49 | 86.89 | | | Mannem | 76.90 | 88.06 | |
| Mannem | 70.34 | 83.56 | | |
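For readers unfamiliar with the two scores in the table: UAS is the percentage of nodes that received the correct head, and LAS the percentage that received both the correct head and the correct dependency label. The sketch below shows the usual computation; the function name ''attachment_scores'' is hypothetical, and the official contest evaluation may differ in details (for example in what is counted as a node), so this is not the organizers' scorer.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) in percent.

    gold and predicted are equally long lists of (head, deprel) pairs,
    one pair per node, in the same order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    # Unlabeled: only the head index has to match.
    correct_heads = sum(1 for g, p in zip(gold, predicted) if g[0] == p[0])
    # Labeled: the head index and the dependency label both have to match.
    correct_both = sum(1 for g, p in zip(gold, predicted) if g == p)

    total = len(gold)
    return 100.0 * correct_both / total, 100.0 * correct_heads / total


if __name__ == "__main__":
    # Toy example with labels taken from the samples above.
    gold = [(2, "k1"), (0, "main"), (2, "k2")]
    pred = [(2, "k1"), (0, "main"), (2, "k7")]
    las, uas = attachment_scores(gold, pred)
    print(f"LAS {las:.2f}  UAS {uas:.2f}")
```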
| |
The results of the ICON 2010 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/toolscontest10-workshoppaper-final.pdf|(Husain et al., 2010)]], page 6. These are the best results for Bengali with fine-grained syntactic tags: | The results of the ICON 2010 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/toolscontest10-workshoppaper-final.pdf|(Husain et al., 2010)]], page 6. These are the best results for Hindi with fine-grained syntactic tags: |
| |
^ Parser (Authors) ^ LAS ^ UAS ^ | ^ Parser (Authors) ^ LAS ^ UAS ^ |
| Attardi et al. | 70.66 | 87.41 | | | Attardi et al. | 87.49 | 94.78 | |
| Kosaraju et al. | 70.55 | 86.16 | | | Kosaraju et al. | 88.63 | 94.54 | |
| Kolachina et al. | 70.14 | 87.10 | | | Kolachina et al. | 86.22 | 93.25 | |
| |