Differences
This shows you the differences between two versions of the page.
user:zeman:treebanks:fi [2011/12/05 14:11] zeman References. |
user:zeman:treebanks:fi [2011/12/05 14:37] zeman Size. |
||
---|---|---|---|
Line 32: | Line 32: | ||
==== Domain ==== | ==== Domain ==== | ||
- | Mixed: | + | Mixed (Wikipedia, Wikinews, university web-magazine and blogs). |
- | * 388 tailored sentences with movement verbs | + | |
- | * 732 sentences with movement verbs from the Estonian FrameNet corpus | + | |
- | * 175 sentences from the Arborest corpus | + | |
- | * 20 sentences of spoken language | + | |
==== Size ==== | ==== Size ==== | ||
- | All four parts of the treebank together contain 9491 tokens in 1315 sentences, yielding | + | TDT contains 58576 tokens in 4307 sentences, yielding |
- | + | ||
- | ^ File ^ Sentences ^ Terminals ^ Average t/s ^ | + | |
- | | arborest.xml | 175 | 2451 | 14.01 | | + | |
- | | piialaused.xml | 732 | 4505 | 6.15 | | + | |
- | | ratsepalaused.xml | 388 | 2348 | 6.05 | | + | |
- | | sul.xml | 20 | 187 | 9.35 | | + | |
- | | **total** | **1315** | **9491** | **7.22** | | + | |
- | | training | 1184 | 8535 | 7.21 | | + | |
- | | test | 131 | 956 | 7.30 | | + | |
==== Inside ==== | ==== Inside ==== |
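
The "Average t/s" column of the removed Size table can be re-derived from the sentence and terminal counts shown in the diff. The following is a minimal sketch, assuming the counts are copied by hand from the rows above; the names `COUNTS` and `avg_tokens_per_sentence` are illustrative and not part of the wiki page or any treebank tooling.

```python
# Sentence and terminal (token) counts copied from the diff above.
# The dictionary layout is an assumption made for this illustration.
COUNTS = {
    "arborest.xml": (175, 2451),
    "piialaused.xml": (732, 4505),
    "ratsepalaused.xml": (388, 2348),
    "sul.xml": (20, 187),
    "training": (1184, 8535),
    "test": (131, 956),
    "TDT": (4307, 58576),  # figures from the added Size line
}


def avg_tokens_per_sentence(sentences: int, tokens: int) -> float:
    """Return tokens divided by sentences, rounded to two decimals."""
    if sentences <= 0:
        raise ValueError("sentence count must be positive")
    return round(tokens / sentences, 2)


# The four source files of the old revision should add up to its stated total.
FILES = ("arborest.xml", "piialaused.xml", "ratsepalaused.xml", "sul.xml")
total_sentences = sum(COUNTS[name][0] for name in FILES)
total_tokens = sum(COUNTS[name][1] for name in FILES)

for name, (sentences, tokens) in COUNTS.items():
    print(f"{name}: {avg_tokens_per_sentence(sentences, tokens)}")
print(f"total: {total_sentences} sentences, {total_tokens} tokens, "
      f"{avg_tokens_per_sentence(total_sentences, total_tokens)} t/s")
```

Summing the per-file counts also serves as a consistency check against the totals and the training/test split stated in the old revision.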