Institute of Formal and Applied Linguistics Wiki


Treebanks for Various Languages

http://ufal.mff.cuni.cz/hamledt/ or HamleDT in the Wiki

To Process

Hi,
I downloaded the new Spanish dependency corpus IULA (larger than AnCora):
/net/projects/tectomt_shared/data/resources/treebanks/es

License: CC BY 3.0 (Unported)
Web: http://www.iula.upf.edu/recurs01_tbk_uk.htm
Doc: http://www.iula.upf.edu/recurs01_conll_uk.htm
Download: http://repositori.upf.edu/handle/10230/20048
Parsing: http://www.taln.upf.edu/system/files/biblio_files/ijcnlp_final_padro_et_al_2013.pdf

        The state-of-the-art LAS (labeled attachment score) is 94.7, obtained with Mate-C.

Sentences: 42,000
Tokens: 590,000

The sentences were chosen from the IULA LSP corpus, automatically annotated with POS information, and manually annotated with syntactic information using the DELPH-IN environment. The resulting syntactic analyses are automatically converted to dependencies and delivered in the CoNLL format.

Martin
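
To check the sentence and token counts quoted above against the downloaded files, something like the following sketch might help. It assumes the usual CoNLL layout (one token per line, tab-separated columns, a blank line between sentences); the exact column set of the IULA release is not verified here, and the function name `count_conll` and the sample rows are made up for illustration.

```python
from typing import Iterable, Tuple


def count_conll(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentence_count, token_count) for CoNLL-style lines.

    Assumption: every non-blank line is one token and blank lines
    separate sentences. Column contents are not inspected.
    """
    sentences = 0
    tokens = 0
    in_sentence = False
    for line in lines:
        if not line.strip():
            # A blank line closes the sentence that is currently open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        tokens += 1
        in_sentence = True
    # Count a final sentence that is not followed by a blank line.
    if in_sentence:
        sentences += 1
    return sentences, tokens


# Tiny made-up sample (not taken from the treebank), truncated to four columns.
sample = [
    "1\tEl\tel\td\n",
    "2\tgato\tgato\tn\n",
    "3\tduerme\tdormir\tv\n",
    "\n",
    "1\tLlueve\tllover\tv\n",
    "2\t.\t.\tf\n",
]

sentence_count, token_count = count_conll(sample)
print(sentence_count, token_count)
```

The function accepts any iterable of lines, so an open file handle for one of the treebank files can be passed directly instead of the in-memory sample.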

