
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


user:zeman:treebanks:hi [2011/12/06 22:35]
zeman Inside.
user:zeman:treebanks:hi [2011/12/08 08:38]
zeman Alignment of numbers in tables.
Line 46:
  
 ^ Part ^ Sentences ^ Chunks ^ Ratio ^
-| Training | 1501 | 13779 | 9.18 |
-| Development | 150 | 1250 | 8.33 |
-| Test | 150 | 1156 | 7.71 |
-| TOTAL | 1801 | 16185 | 8.99 |
+| Training |    1501 |  13779 |  9.18 |
+| Development |  150 |   1250 |  8.33 |
+| Test |         150 |   1156 |  7.71 |
+| TOTAL |       1801 |  16185 |  8.99 |
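The Ratio column in this table appears to be the average number of chunks per sentence (13779 / 1501 ≈ 9.18, and the other rows agree). A minimal Python sketch that recomputes it from the counts in the table, assuming that interpretation and rounding to two decimals:

```python
# Counts copied from the table: part -> (sentences, chunks).
# Assumption: "Ratio" means chunks per sentence.
COUNTS = {
    "Training": (1501, 13779),
    "Development": (150, 1250),
    "Test": (150, 1156),
}


def chunks_per_sentence(sentences: int, chunks: int) -> float:
    """Average number of chunks per sentence, rounded to two decimals."""
    return round(chunks / sentences, 2)


def ratio_table(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Return per-part ratios plus a TOTAL row computed from the summed counts."""
    ratios = {part: chunks_per_sentence(s, c) for part, (s, c) in counts.items()}
    total_sentences = sum(s for s, _ in counts.values())
    total_chunks = sum(c for _, c in counts.values())
    ratios["TOTAL"] = chunks_per_sentence(total_sentences, total_chunks)
    return ratios


if __name__ == "__main__":
    for part, ratio in ratio_table(COUNTS).items():
        print(f"{part:12} {ratio:5.2f}")
```

The TOTAL ratio is computed from the summed counts (16185 / 1801), i.e. weighted by sentences, not as the mean of the three per-part ratios.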
  
 The ICON 2010 version came with a data split into three parts: training, development and test. The intra-chunk dependencies have been added:
  
 ^ Part ^ Sentences ^ Chunks ^ Ratio ^ Words ^ Ratio ^
-| Training | 2972 | | | 64452 | 21.69 |
-| Development | 543 | | | 12616 | 23.23 |
-| Test | 321 | | | 6588 | 20.52 |
-| TOTAL | 3836 | | | 83656 | 21.81 |
+| Training |    2972 | | |  64452 |  21.69 |
+| Development |  543 | | |  12616 |  23.23 |
+| Test |         321 | | |   6588 |  20.52 |
+| TOTAL |       3836 | | |  83656 |  21.81 |
  
 I have counted the sentences and tokens (words) on the ''.conll'' files; there are slight differences from the statistics presented in (Husain et al., 2010).
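The counting script itself is not given on this page. A minimal Python sketch of how such counts can be obtained, assuming CoNLL-X-style files with one token per non-empty line and sentences separated by blank lines; the function names are hypothetical:

```python
from pathlib import Path
from typing import Iterable


def count_sentences_and_tokens(lines: Iterable[str]) -> tuple[int, int]:
    """Count sentences and tokens in CoNLL-style lines.

    Assumes one token per non-empty line and blank lines between sentences;
    runs of blank lines and a missing final blank line are both tolerated.
    """
    sentences = tokens = 0
    in_sentence = False
    for line in lines:
        if line.strip():
            tokens += 1
            in_sentence = True
        elif in_sentence:
            # A blank line closes the sentence that was open.
            sentences += 1
            in_sentence = False
    if in_sentence:
        sentences += 1
    return sentences, tokens


def count_file(path: str) -> tuple[int, int]:
    """Apply count_sentences_and_tokens to a UTF-8 encoded .conll file."""
    with Path(path).open(encoding="utf-8") as f:
        return count_sentences_and_tokens(f)


if __name__ == "__main__":
    # Two toy sentences with placeholder tokens (not HyDT data).
    sample = ["1\ta\n", "2\tb\n", "\n", "1\tc\n"]
    sentences, tokens = count_sentences_and_tokens(sample)
    print(sentences, tokens, round(tokens / sentences, 2))  # 2 3 1.5
```

Dividing tokens by sentences gives the words-per-sentence Ratio column of the ICON 2010 table. Differences from published figures can come from details this sketch ignores, for example how empty or malformed lines are treated.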
Line 587:
 ==== Parsing ====
  
-Nonprojectivities in HyDT-Bangla are not frequent. Only 78 of the 7252 chunks in the training+development ICON 2010 version are attached nonprojectively (1.08%).
+Nonprojectivities in HyDT-Hindi are not frequent. Only 862 of the 77068 chunks in the training+development ICON 2010 version are attached nonprojectively (1.12%).
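A node is attached nonprojectively when some node lying between it and its head in surface order is not a descendant of that head. How the figures above were counted is not described here; a minimal Python sketch of one way to detect such attachments, assuming each tree (here, a chunk-level tree) is given as a list of head indices with 0 for the artificial root. The function names are hypothetical:

```python
def dominates(ancestor: int, node: int, heads: list[int]) -> bool:
    """Return True if `ancestor` lies on the path from `node` up to the root.

    heads[i - 1] is the head of node i; nodes are numbered from 1 and
    0 stands for the artificial root, which dominates every node.
    """
    steps = 0
    # Walk up the head chain; the step limit guards against malformed (cyclic) input.
    while node != 0 and steps <= len(heads):
        if node == ancestor:
            return True
        node = heads[node - 1]
        steps += 1
    return ancestor == 0 and node == 0


def nonprojective_nodes(heads: list[int]) -> list[int]:
    """Return the nodes whose arc from their head is nonprojective.

    The arc head -> dep is projective iff every node strictly between
    head and dep in surface order is dominated by head.
    """
    result = []
    for dep, head in enumerate(heads, start=1):
        left, right = sorted((dep, head))
        if any(not dominates(head, k, heads) for k in range(left + 1, right)):
            result.append(dep)
    return result


if __name__ == "__main__":
    # Toy tree (not HyDT data): node 1 hangs on node 3 across node 2, the root's child.
    heads = [3, 0, 2, 2]
    print(nonprojective_nodes(heads))  # [1]
    # Share of nonprojective chunks reported for HyDT-Hindi:
    print(f"{100 * 862 / 77068:.2f}%")  # 1.12%
```

Summing `len(nonprojective_nodes(heads))` over all trees and dividing by the total number of nodes gives a percentage of the kind quoted above.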
  
-The results of the ICON 2009 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2009/CR/intro-husain.pdf|(Husain, 2009)]]. There were two evaluation rounds, the first with the coarse-grained syntactic tags, the second with the fine-grained syntactic tags. To reward language independence, only systems that parsed all three languages were officially ranked. The following table presents the Bengali/coarse-grained results of the four officially ranked systems, and the best Bengali-only* system.
+The results of the ICON 2009 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2009/CR/intro-husain.pdf|(Husain, 2009)]]. There were two evaluation rounds, the first with the coarse-grained syntactic tags, the second with the fine-grained syntactic tags. To reward language independence, only systems that parsed all three languages were officially ranked. The following table presents the Hindi/coarse-grained results of the four officially ranked systems.
  
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| Kolkata (De et al.)* | 84.29 | 90.32 |
-| Hyderabad (Ambati et al.) | 78.25 | 90.22 |
-| Malt (Nivre) | 76.07 | 88.97 |
-| Malt+MST (Zeman) | 71.49 | 86.89 |
-| Mannem | 70.34 | 83.56 |
+| Hyderabad (Ambati et al.) | 79.33 | 90.22 |
+| Malt (Nivre) | 78.20 | 89.36 |
+| Malt+MST (Zeman) | 73.88 | 88.49 |
+| Mannem | 76.90 | 88.06 |
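LAS (labeled attachment score) is the percentage of nodes that receive both the correct head and the correct dependency label; UAS (unlabeled attachment score) only requires the correct head. A minimal Python sketch of both metrics, assuming gold and predicted analyses are aligned node by node; it does not reproduce any contest-specific conventions (such as punctuation handling) of the official evaluation, and the function name is hypothetical:

```python
Arc = tuple[int, str]  # (head index, dependency label); head 0 = root


def attachment_scores(gold: list[Arc], pred: list[Arc]) -> tuple[float, float]:
    """Return (LAS, UAS) as percentages over all nodes.

    UAS counts nodes whose predicted head equals the gold head;
    LAS additionally requires the dependency label to match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must align node by node")
    if not gold:
        raise ValueError("no nodes to evaluate")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_arcs = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100 * correct_arcs / n, 100 * correct_heads / n


if __name__ == "__main__":
    # Illustrative three-node analysis; labels are placeholders.
    gold = [(2, "k1"), (0, "main"), (2, "k2")]
    pred = [(2, "k1"), (0, "main"), (2, "k1")]
    las, uas = attachment_scores(gold, pred)
    print(f"LAS {las:.2f}  UAS {uas:.2f}")  # LAS 66.67  UAS 100.00
```

Because LAS requires everything UAS requires plus the label, LAS can never exceed UAS, which holds for every row of the tables on this page.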
  
-The results of the ICON 2010 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/toolscontest10-workshoppaper-final.pdf|(Husain et al., 2010)]], page 6. These are the best results for Bengali with fine-grained syntactic tags:
+The results of the ICON 2010 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/toolscontest10-workshoppaper-final.pdf|(Husain et al., 2010)]], page 6. These are the best results for Hindi with fine-grained syntactic tags:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| Attardi et al. | 70.66 | 87.41 |
-| Kosaraju et al. | 70.55 | 86.16 |
-| Kolachina et al. | 70.14 | 87.10 |
+| Attardi et al. | 87.49 | 94.78 |
+| Kosaraju et al. | 88.63 | 94.54 |
+| Kolachina et al. | 86.22 | 93.25 |
  
