pub-company:icon2009 [2010/03/24 17:54] stranak added OOV data for Tides+dictfilt (Shabdanjali) |
pub-company:icon2009 [2010/03/25 10:01] (current) stranak |
- set2 = emille-11+danielpipes-11+agrocorp-11+wikiner2008+wikiner2009+acl2005
- set3 = emille-om+danielpipes-11+agrocorp-11+wikiner2008+wikiner2009+acl2005
- dictfilt = Shabdanjali from the web (with many errors, probably from the wx-to-utf8 conversion). It was filtered to remove the errors, then entries with multiple meanings were expanded into separate entries, and finally it was filtered to keep only words that occur in the large Hindi monolingual corpus.
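The dictfilt preparation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script that was actually used: the entry format (an English headword paired with a `;`-separated Hindi field), the "all characters are Devanagari" error check, and the names `is_devanagari` and `filter_dictionary` are all hypothetical. The sketch also checks each meaning individually, so the error filter and the expansion step are merged into one pass.

```python
def is_devanagari(text):
    """Hypothetical error check: non-empty and only Devanagari characters or spaces."""
    return bool(text) and all(
        ch.isspace() or "\u0900" <= ch <= "\u097f" for ch in text
    )


def filter_dictionary(entries, hindi_vocab, sep=";"):
    """Expand multi-meaning entries and keep only clean, attested meanings.

    entries     -- iterable of (english, hindi_field) pairs (assumed format)
    hindi_vocab -- set of word forms seen in the Hindi monolingual corpus
    """
    pairs = []
    for english, hindi_field in entries:
        # One output entry per meaning.
        for meaning in hindi_field.split(sep):
            meaning = meaning.strip()
            # Drop meanings damaged by the encoding conversion.
            if not is_devanagari(meaning):
                continue
            # Keep the meaning only if every word occurs in the corpus.
            if all(word in hindi_vocab for word in meaning.split()):
                pairs.append((english, meaning))
    return pairs


# Toy data, invented for illustration only.
entries = [("water", "पानी; जल"), ("house", "घर; gHara"), ("book", "किताब")]
hindi_vocab = {"पानी", "जल", "घर"}
filtered = filter_dictionary(entries, hindi_vocab)
```

With this toy input, `gHara` is dropped as a conversion leftover and `किताब` is dropped because it is not in the toy corpus vocabulary.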
^ Coverage ^^^^^^^^^^^^^^^
| | **tokens unseen in train** ||||||| **types unseen in train** |||||||
| | //Tides// | //Tides+DP// | //Tides+dict// | //Tides+DP+dict// | //set1// | //set2// | //set3// | //Tides// | //Tides+DP// | //Tides+dict// | //Tides+DP+dict// | //set1// | //set2// | //set3// |
| **Tides-test-en** | 369 | 348 | 363 (1.336%) | 343 (1.262%) | 2524 (9.290%) | 2330 (8.576%) | 2429 (8.940%) | 363 | 343 | 357 (6.011%) | 338 (5.691%) | 1974 (33.238%) | 1824 (30.712%) | 1901 (32.009%) |
| **Tides-test-hi** | 839 | 830 | 836 (2.926%) | 828 (2.898%) | 3480 (12.179%) | 3233 (11.314%) | 3310 (11.584%) | 642 | 633 | 639 (10.882%) | 631 (10.746%) | 2569 (43.750%) | 2412 (41.076%) | 2465 (41.979%) |
| **Tides-dev-en** | 464 | 421 | 462 (2.055%) | 419 (1.863%) | 2072 (9.215%) | 1732 (7.703%) | 1873 (8.330%) | 459 | 418 | 457 (8.167%) | 416 (7.434%) | 1750 (31.272%) | 1498 (26.769%) | 1608 (28.735%) |
| **Tides-dev-hi** | 619 | 607 | 618 (2.537%) | 606 (2.487%) | 2946 (12.092%) | 2546 (10.450%) | 2661 (10.922%) | 580 | 568 | 579 (10.262%) | 567 (10.050%) | 2325 (41.209%) | 2037 (36.104%) | 2129 (37.735%) |
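For readers who want to reproduce figures of this kind, the following is a minimal sketch of how the "unseen in train" counts and percentages in the coverage table could be computed. It assumes whitespace-tokenized text and exact string matching against the training vocabulary; the function name `oov_stats` and the toy sentences are hypothetical, and the tokenization actually used for the table is not stated here.

```python
from collections import Counter


def oov_stats(train_tokens, test_tokens):
    """Return (unseen_tokens, token_pct, unseen_types, type_pct) for a test set."""
    vocab = set(train_tokens)
    test_counts = Counter(test_tokens)

    # Test types that never occur in the training data, with their frequencies.
    unseen = {w: c for w, c in test_counts.items() if w not in vocab}

    unseen_tokens = sum(unseen.values())   # running-word occurrences
    unseen_types = len(unseen)             # distinct word forms
    total_tokens = sum(test_counts.values())
    total_types = len(test_counts)

    token_pct = 100.0 * unseen_tokens / total_tokens if total_tokens else 0.0
    type_pct = 100.0 * unseen_types / total_types if total_types else 0.0
    return unseen_tokens, token_pct, unseen_types, type_pct


# Toy data, invented for illustration only.
train = "the cat sat on the mat".split()
test = "the dog sat on the log".split()
stats = oov_stats(train, test)
```

Token percentages are taken over all running words of the test set, type percentages over its distinct word forms, which is why the type columns in the table are so much higher than the token columns.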