{{Infobox Resource | name = CoNLL Dependency Treebanks | owner = zeman | path = /fs/clip-corpora/conll | version = 2006 }}

The [[http://nextens.uvt.nl/~conll/index.html|CoNLL-X Shared Task]] involved dependency parsing of the following languages:

  - Arabic (Prague Arabic Dependency Treebank)
  - Bulgarian (BulTreeBank)
  - Chinese (Sinica Treebank)
  - Czech (Prague Dependency Treebank)
  - Danish ([[Danish Dependency Treebank]])
  - Dutch (Alpino Treebank)
  - German (Tiger)
  - Japanese (Verbmobil)
  - Portuguese (Bosque)
  - Slovene (Slovene Dependency Treebank)
  - Spanish (Cast3LB)
  - Swedish ([[Talbanken05]])
  - Turkish (METU-Sabanci Treebank)

Standardized data sets for all of these languages are now available in a unified format. Note that the training/test splits differ from those that the treebanks may define outside the CoNLL context, because the organizers of the shared task needed to keep the test data secret until its official release date.

Licensing differs from treebank to treebank:

  - freely available: da, nl, pt, sv
  - freely available after signing a licence agreement: bg, ja, sl
  - require an LDC licence: ar, cs
  - require their own separate licence: de, tr, zh

I have been able to acquire ar, bg, zh, cs, da, nl, ja, pt, sl, sv. With the exception of da, nl, pt and sv, do not redistribute any of the data without first talking to [[User:Zeman|Dan Zeman]].
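
The unified format mentioned above is the CoNLL-X column format: one token per line, ten tab-separated fields, and a blank line between sentences. The following is a minimal sketch of a reader for that format, not a tool shipped with the corpus. The function name `read_conllx` and the two-token sample sentence are made up for illustration, and the actual file names under /fs/clip-corpora/conll are not assumed here.

```python
from typing import Dict, Iterable, Iterator, List

# The ten columns of the CoNLL-X data format, in order.
CONLLX_COLUMNS = [
    "ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
    "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL",
]


def read_conllx(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of token dicts (hypothetical helper)."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != len(CONLLX_COLUMNS):
            raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
        sentence.append(dict(zip(CONLLX_COLUMNS, fields)))
    # Emit the last sentence if the input does not end with a blank line.
    if sentence:
        yield sentence


# Invented sample data, used only to illustrate the layout.
SAMPLE = (
    "1\tJohn\tjohn\tN\tNNP\t_\t2\tSBJ\t_\t_\n"
    "2\tsleeps\tsleep\tV\tVBZ\t_\t0\tROOT\t_\t_\n"
    "\n"
)

sentences = list(read_conllx(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token["ID"], token["FORM"], "->", token["HEAD"], token["DEPREL"])
```

To read a real file, pass an open file handle instead of `SAMPLE.splitlines()`. All values are kept as strings, so `HEAD` has to be converted with `int()` before it is used as an index; a `HEAD` of 0 marks the root of the sentence.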