Differences

This shows you the differences between two versions of the page.

--- user:zeman:treebanks:hi [2011/12/06 22:35]
zeman Inside.
+++ user:zeman:treebanks:hi [2011/12/08 08:38]
zeman Alignment of numbers in tables.
@@ Line 46: / Line 46: @@
 ^ Part ^ Sentences ^ Chunks ^ Ratio ^
-| Training | 1501 | 13779 | 9.18 |
+| Training |    1501 |  13779 |  9.18 |
-| Development | 150 | 1250 | 8.33 |
+| Development |  150 |   1250 |  8.33 |
-| Test | 150 | 1156 | 7.71 |
+| Test |         150 |   1156 |  7.71 |
-| TOTAL | 1801 | 16185 | 8.99 |
+| TOTAL |       1801 |  16185 |  8.99 |
 The ICON 2010 version came with a data split into three parts: training, development and test. The intra-chunk dependencies have been added:
 ^ Part ^ Sentences ^ Chunks ^ Ratio ^ Words ^ Ratio ^
-| Training | 2972 | | | 64452 | 21.69 |
+| Training |    2972 | | |  64452 |  21.69 |
-| Development | 543 | | | 12616 | 23.23 |
+| Development |  543 | | |  12616 |  23.23 |
-| Test | 321 | | | 6588 | 20.52 |
+| Test |         321 | | |   6588 |  20.52 |
-| TOTAL | 3836 | | | 83656 | 21.81 |
+| TOTAL |       3836 | | |  83656 |  21.81 |
 I have counted the sentences and tokens (words) on the ''.conll'' files; there are slight differences from the statistics presented in (Husain et al., 2010).
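The sentence and word counts in the tables above can be reproduced with a short script. The following is only a minimal sketch, not the script actually used for this page: it assumes a CoNLL-X style layout (one token per line, sentences separated by blank lines, no comment lines), and the names `count_conll` and `ratio` are hypothetical.

```python
def count_conll(lines):
    """Count sentences and tokens in CoNLL-style lines.

    Assumption: every non-blank line is one token and blank lines
    separate sentences (no comment lines, as in CoNLL-X).
    """
    sentences = 0
    tokens = 0
    in_sentence = False
    for line in lines:
        if line.strip():
            tokens += 1
            if not in_sentence:
                # First token after a blank line opens a new sentence.
                sentences += 1
                in_sentence = True
        else:
            in_sentence = False
    return sentences, tokens


def ratio(units, sentences):
    """Average number of units (chunks or words) per sentence, 2 decimals."""
    return round(units / sentences, 2)


if __name__ == "__main__":
    # Tiny in-memory example: two sentences with three tokens in total.
    sample = ["1\ta\t_", "2\tb\t_", "", "1\tc\t_", ""]
    print(count_conll(sample))
    # Ratio for the ICON 2010 training part, using the figures from the table.
    print(ratio(64452, 2972))
```

To run it on a real file, pass an open file handle instead of the list, for example `count_conll(open(path, encoding="utf-8"))`, where `path` points to one of the ''.conll'' files.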
@@ Line 587: / Line 587: @@
 ==== Parsing ====
-Nonprojectivities in HyDT-Bangla are not frequent. Only 78 of the 7252 chunks in the training+development ICON 2010 version are attached nonprojectively (1.08%).
+Nonprojectivities in HyDT-Hindi are not frequent. Only 862 of the 77068 chunks in the training+development ICON 2010 version are attached nonprojectively (1.12%).
-The results of the ICON 2009 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2009/CR/intro-husain.pdf|(Husain, 2009)]]. There were two evaluation rounds, the first with the coarse-grained syntactic tags, the second with the fine-grained syntactic tags. To reward language independence, only systems that parsed all three languages were officially ranked. The following table presents the Bengali/coarse-grained results of the four officially ranked systems, and the best Bengali-only* system.
+The results of the ICON 2009 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2009/CR/intro-husain.pdf|(Husain, 2009)]]. There were two evaluation rounds, the first with the coarse-grained syntactic tags, the second with the fine-grained syntactic tags. To reward language independence, only systems that parsed all three languages were officially ranked. The following table presents the Hindi/coarse-grained results of the four officially ranked systems.
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| Kolkata (De et al.)* | 84.29 | 90.32 |
+| Hyderabad (Ambati et al.) | 79.33 | 90.22 |
-| Hyderabad (Ambati et al.) | 78.25 | 90.22 |
+| Malt (Nivre) | 78.20 | 89.36 |
-| Malt (Nivre) | 76.07 | 88.97 |
+| Malt+MST (Zeman) | 73.88 | 88.49 |
-| Malt+MST (Zeman) | 71.49 | 86.89 |
+| Mannem | 76.90 | 88.06 |
-| Mannem | 70.34 | 83.56 |
-The results of the ICON 2010 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/toolscontest10-workshoppaper-final.pdf|(Husain et al., 2010)]], page 6. These are the best results for Bengali with fine-grained syntactic tags:
+The results of the ICON 2010 NLP tools contest have been published in [[http://ltrc.iiit.ac.in/nlptools2010/files/documents/toolscontest10-workshoppaper-final.pdf|(Husain et al., 2010)]], page 6. These are the best results for Hindi with fine-grained syntactic tags:
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| Attardi et al. | 70.66 | 87.41 |
+| Attardi et al. | 87.49 | 94.78 |
-| Kosaraju et al. | 70.55 | 86.16 |
+| Kosaraju et al. | 88.63 | 94.54 |
-| Kolachina et al. | 70.14 | 87.10 |
+| Kolachina et al. | 86.22 | 93.25 |
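The nonprojectivity figure quoted in the Parsing section can be checked along the following lines. This is a sketch under stated assumptions, not the procedure used by the page author: it applies the common definition that a node is attached nonprojectively if some node lying between it and its head is not dominated by that head, with an artificial root at position 0. The function names are hypothetical.

```python
def dominates(heads, ancestor, node):
    """Return True if `ancestor` is reachable by climbing heads from `node`.

    heads[i] is the head of node i + 1; 0 denotes the artificial root.
    The loop is bounded by the sentence length to guard against cycles.
    """
    current = node
    for _ in range(len(heads)):
        current = heads[current - 1]
        if current == ancestor:
            return True
        if current == 0:
            return False
    return False


def nonprojective_nodes(heads):
    """List the nodes (1-based) whose attachment to their head is nonprojective."""
    result = []
    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        lo, hi = sorted((head, dep))
        # The arc is nonprojective if any node strictly between head and
        # dependent is not a descendant of the head.
        if any(not dominates(heads, head, k) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


def percentage(part, total):
    """Share of `part` in `total` as a percentage with 2 decimals."""
    return round(100 * part / total, 2)


if __name__ == "__main__":
    # Node 1 depends on node 3, but node 2 in between hangs on the root.
    print(nonprojective_nodes([3, 0, 2]))
    # Figures quoted in the text for training+development.
    print(percentage(862, 77068))
```

Summing `len(nonprojective_nodes(...))` over all sentences and dividing by the total number of nodes gives the kind of percentage reported in the text.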


Institute of Formal and Applied Linguistics Wiki
