Differences
This shows you the differences between two versions of the page.
user:zeman:treebanks:te [2012/03/22 11:30] zeman Size. |
user:zeman:treebanks:te [2012/03/22 11:34] zeman Training data size (both sentences and words) was identical in ICON 2009 and 2010. |
||
---|---|---|---|
Line 43: | Line 43: | ||
==== Size ==== | ==== Size ==== | ||
- | HyDT-Telugu shows dependencies between chunks, not words. The node/tree ratio is thus much lower than in other treebanks. The ICON 2009 version came with a data split into three parts: training, development and test: | + | HyDT-Telugu shows dependencies between chunks, not words. The node/tree ratio is thus much lower than in other treebanks. The ICON 2009 version came with a data split into three parts: training, development and test; the same data was also distributed for ICON 2010: |
- | + | ||
- | ^ Part ^ Sentences ^ Chunks ^ Ratio ^ | + | |
- | | Training | 980 | 6449 | 6.58 | | + | |
- | | Development | 150 | 811 | 5.41 | | + | |
- | | Test | 150 | 961 | 6.41 | | + | |
- | | TOTAL | 1280 | 8221 | 6.42 | | + | |
- | + | ||
- | The ICON 2010 version came with a data split into three parts: training, development and test: | + | |
^ Part ^ Sentences ^ Chunks ^ Ratio ^ Words ^ Ratio ^ | ^ Part ^ Sentences ^ Chunks ^ Ratio ^ Words ^ Ratio ^ |