Differences
This shows you the differences between two versions of the page.
user:zeman:treebanks:te [2012/03/22 11:14] zeman created |
user:zeman:treebanks:te [2012/03/22 11:46] zeman ICON 2009 Telugu data size. |
||
---|---|---|---|
Line 46: | Line 46: | ||
^ Part ^ Sentences ^ Chunks ^ Ratio ^ | ^ Part ^ Sentences ^ Chunks ^ Ratio ^ | ||
- | | Training | 980 | 6449 | 6.58 | | + | | Training |
- | | Development | 150 | 811 | 5.41 | | + | | Development | |
- | | Test | 150 | 961 | 6.41 | | + | | Test | |
- | | TOTAL | 1280 | 8221 | 6.42 | | + | | TOTAL |
- | The ICON 2010 version came with a data split into three parts: | + | The data distributed for ICON 2010 was slightly smaller; perhaps it had been cleaned up. Note that the number of training
^ Part ^ Sentences ^ Chunks ^ Ratio ^ Words ^ Ratio ^ | ^ Part ^ Sentences ^ Chunks ^ Ratio ^ Words ^ Ratio ^ | ||
- | | Training | 979 | 6440 | 6.58 | 10305 | 10.52 | | + | | Training | 1400 | ? | ? | 7602 | 5.43 | |
- | | Development | 150 | 812 | 5.41 | 1196 | 7.97 | | + | | Development | 150 | ? | ? | 839 | 5.59 | |
- | | Test | 150 | 961 | 6.41 | 1350 | 9.00 | | + | | Test | 150 | ? | ? | 836 | 5.57 | |
- | | TOTAL | 1279 | 8213 | 6.42 | 12851 | 10.04 | | + | | TOTAL | 1700 | ? | ? | 9277 | 5.46 | |
- | I have counted the sentences | + | We drew our training |
- | Apparently the training-development-test data split was more or less identical in both years, except for the minor discrepancies (number of training sentences and development chunks). | + | ^ Part ^ Sentences ^ Chunks ^ Ratio ^ |
+ | | Training | 1300 | 5125 | 3.94 | | ||
+ | | Test | ||
+ | | TOTAL | 1450 | 5722 | 3.95 | | ||
==== Inside ==== | ==== Inside ==== |
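
The Ratio columns in the size tables above appear to be the number of chunks (or words) divided by the number of sentences, rounded to two decimals. A minimal Python sketch under that assumption follows; the helper name `ratio` is hypothetical and not part of the wiki page, and the figures are copied from the newer revision's tables.

```python
def ratio(units: int, sentences: int) -> float:
    """Average number of units (chunks or words) per sentence, rounded to two decimals.

    Assumption: this is how the Ratio columns were derived.
    """
    return round(units / sentences, 2)


# (units, sentences) pairs copied from the tables in the newer revision.
rows = {
    "ICON 2010 training, words": (7602, 1400),
    "ICON 2010 development, words": (839, 150),
    "ICON 2010 test, words": (836, 150),
    "ICON 2010 total, words": (9277, 1700),
    "Our training, chunks": (5125, 1300),
    "Our total, chunks": (5722, 1450),
}

for label, (units, sentences) in rows.items():
    print(f"{label}: {ratio(units, sentences):.2f}")
```

Under this reading the recomputed values match the Ratio cells of the newer revision for every row listed here.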