Differences
This shows you the differences between two versions of the page.
user:zeman:treebanks:cs [2013/07/11 17:40] zeman PDT 2.5. |
user:zeman:treebanks:cs [2014/04/04 14:54] (current) zeman PDT 3.0 size. |
||
---|---|---|---|
Line 9: | Line 9: | ||
* PDT 2.0 (2006) | * PDT 2.0 (2006) | ||
* PDT 2.5 (2011) | * PDT 2.5 (2011) | ||
+ | * PDT 3.0 (2013) | ||
* CoNLL 2006 | * CoNLL 2006 | ||
* CoNLL 2007 | * CoNLL 2007 | ||
Line 36: | Line 37: | ||
* Website | * Website | ||
+ | * http:// | ||
* http:// | * http:// | ||
* http:// | * http:// | ||
Line 42: | Line 44: | ||
* Jan Hajič, Eva Hajičová, Petr Pajas, Jarmila Panevová, Petr Sgall: //Prague Dependency Treebank 1.0// ([[http:// | * Jan Hajič, Eva Hajičová, Petr Pajas, Jarmila Panevová, Petr Sgall: //Prague Dependency Treebank 1.0// ([[http:// | ||
* Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Petr Pajas, Jan Štěpánek, | * Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Petr Pajas, Jan Štěpánek, | ||
+ | * Eduard Bejček, Jan Hajič, Jarmila Panevová, Jiří Mírovský, Johanka Spoustová, Jan Štěpánek, | ||
* Principal publications | * Principal publications | ||
* Jan Hajič, Alena Böhmová, Eva Hajičová, Barbora Hladká: [[http:// | * Jan Hajič, Alena Böhmová, Eva Hajičová, Barbora Hladká: [[http:// | ||
- | * Eduard Bejček, Jarmila Panevová, Jan Popelka, Pavel Straňák, Magda Ševčíková, | + | * Eduard Bejček, Jarmila Panevová, Jan Popelka, Pavel Straňák, Magda Ševčíková, |
* Documentation | * Documentation | ||
* Jiří Hana, Daniel Zeman: [[http:// | * Jiří Hana, Daniel Zeman: [[http:// | ||
* Jan Hajič, Jarmila Panevová, Eva Buráňová, | * Jan Hajič, Jarmila Panevová, Eva Buráňová, | ||
+ | * Wiki: [[internal: | ||
==== Domain ==== | ==== Domain ==== | ||
Line 60: | Line 64: | ||
Parts of the following table have been taken from [[http:// | Parts of the following table have been taken from [[http:// | ||
+ | |||
+ | PDT 3.0 also distinguishes d-test and e-test, but I currently have only the summed counts for train and d-test, so the Train columns for PDT 3.0 include d-test. To be updated...
^ Version ^ Train Sentences ^ Train Tokens ^ D-test Sentences ^ D-test Tokens ^ E-test Sentences ^ E-test Tokens ^ Total Sentences ^ Total Tokens ^ Sentence Length ^ | ^ Version ^ Train Sentences ^ Train Tokens ^ D-test Sentences ^ D-test Tokens ^ E-test Sentences ^ E-test Tokens ^ Total Sentences ^ Total Tokens ^ Sentence Length ^ | ||
Line 65: | Line 71: | ||
| PDT 1.0 | 73088 | 1,255,590 | 7319 | 126,030 | 7507 | 125,713 | 87914 | 1,489,748 | 16.95 | | | PDT 1.0 | 73088 | 1,255,590 | 7319 | 126,030 | 7507 | 125,713 | 87914 | 1,489,748 | 16.95 | | ||
| PDT 2.0 | 68562 | 1,172,299 | 9270 | 158,962 | 10148 | 173,586 | 87980 | 1,504,847 | 17.10 | | | PDT 2.0 | 68562 | 1,172,299 | 9270 | 158,962 | 10148 | 173,586 | 87980 | 1,504,847 | 17.10 | | ||
+ | | PDT 3.0 | 77765 | 1,330,152 | train | train | 10148 | 173,586 | 87913 | 1,503,738 | 17.10 | | ||
| CoNLL 2006 | 72703 | 1,249,408 | 365 | 5853 | | | 73068 | 1,255,261 | 17.18 | | | CoNLL 2006 | 72703 | 1,249,408 | 365 | 5853 | | | 73068 | 1,255,261 | 17.18 | | ||
| CoNLL 2007 | 25364 | 432,296 | 286 | 4724 | | | 25650 | 437,020 | 17.04 | | | CoNLL 2007 | 25364 | 432,296 | 286 | 4724 | | | 25650 | 437,020 | 17.04 | |
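
The Sentence Length column appears to be total tokens divided by total sentences, rounded to two decimals. Below is a minimal Python sketch that recomputes it from the totals in the table. The names `TOTALS` and `avg_sentence_length` are illustrative assumptions, not part of the wiki page or of any PDT tool.

```python
# Totals copied from the table above: version -> (total sentences, total tokens).
# The dictionary and function names are hypothetical, chosen for this sketch only.
TOTALS = {
    "PDT 1.0": (87914, 1489748),
    "PDT 2.0": (87980, 1504847),
    "PDT 3.0": (87913, 1503738),
    "CoNLL 2006": (73068, 1255261),
    "CoNLL 2007": (25650, 437020),
}


def avg_sentence_length(tokens: int, sentences: int) -> float:
    """Return the average number of tokens per sentence."""
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    return tokens / sentences


if __name__ == "__main__":
    # Print one line per version, formatted to two decimals like the table.
    for version, (sentences, tokens) in TOTALS.items():
        print(f"{version}: {avg_sentence_length(tokens, sentences):.2f}")
```

For PDT 3.0 the Train columns already include d-test, so the totals are train plus e-test: 77765 + 10148 = 87913 sentences and 1,330,152 + 173,586 = 1,503,738 tokens.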