Treebanks for Various Languages
To Process
Hi,
I have downloaded the new Spanish dependency treebank IULA (larger than AnCora)
/net/projects/tectomt_shared/data/resources/treebanks/es
License: CC BY 3.0 (Unported)
Web: http://www.iula.upf.edu/recurs01_tbk_uk.htm
Doc: http://www.iula.upf.edu/recurs01_conll_uk.htm
Download: http://repositori.upf.edu/handle/10230/20048
Parsing: http://www.taln.upf.edu/system/files/biblio_files/ijcnlp_final_padro_et_al_2013.pdf
State-of-the-art LAS score: 94.7 using Mate-C
Sentences: 42,000
Tokens: 590,000
The sentences were chosen from the IULA LSP corpus, automatically annotated with POS information, and manually annotated with syntactic information using the DELPH-IN environment. The resulting syntactic analyses are automatically converted to dependencies and delivered in the CoNLL format.
Martin
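
A quick sanity check of the sentence and token counts quoted above could look like the sketch below. It assumes the files follow the usual CoNLL-X layout (one token per line, tab-separated columns, a blank line between sentences); the function name `count_conll` and the tiny sample are illustrative only and are not taken from the treebank. Columns not needed for counting are filled with `_`.

```python
import io


def count_conll(lines):
    """Return (sentences, tokens) for an iterable of CoNLL-style lines.

    Assumes one token per non-blank line and a blank line between sentences.
    """
    sentences = 0
    tokens = 0
    in_sentence = False
    for line in lines:
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        tokens += 1
        in_sentence = True
    # The last sentence may not be followed by a blank line.
    if in_sentence:
        sentences += 1
    return sentences, tokens


# Made-up two-sentence sample: ID, FORM, then placeholder columns; HEAD is column 7.
SAMPLE = (
    "1\tEl\t_\t_\t_\t_\t2\t_\t_\t_\n"
    "2\tgato\t_\t_\t_\t_\t3\t_\t_\t_\n"
    "3\tduerme\t_\t_\t_\t_\t0\t_\t_\t_\n"
    "4\t.\t_\t_\t_\t_\t3\t_\t_\t_\n"
    "\n"
    "1\tHola\t_\t_\t_\t_\t0\t_\t_\t_\n"
)

print(count_conll(io.StringIO(SAMPLE)))  # (2, 5)
```

To run it on the real data, pass an open file handle (for example `open(path, encoding="utf-8")`, where `path` points to one of the downloaded files) instead of the in-memory sample.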