user:zeman:treebanks [2012/03/22 09:57] zeman Tamil, Telugu and Turkish. |
user:zeman:treebanks [2014/07/17 17:43] (current) zeman Croatian. |
====== Treebanks for Various Languages ====== | ====== Treebanks for Various Languages ====== |
| |
http://ufal.mff.cuni.cz/hamledt/ | http://ufal.mff.cuni.cz/hamledt/ or [[hamledt|HamleDT in the wiki]] |
| |
| * [[user:zeman:treebanks:grc|Ancient Greek (grc)]] |
* [[user:zeman:treebanks:ar|Arabic (ar)]] | * [[user:zeman:treebanks:ar|Arabic (ar)]] |
* [[user:zeman:treebanks:bg|Bulgarian (bg)]] | * [[user:zeman:treebanks:eu|Basque (eu)]] |
* [[user:zeman:treebanks:bn|Bengali (bn)]] | * [[user:zeman:treebanks:bn|Bengali (bn)]] |
| * [[user:zeman:treebanks:bg|Bulgarian (bg)]] |
* [[user:zeman:treebanks:ca|Catalan (ca)]] | * [[user:zeman:treebanks:ca|Catalan (ca)]] |
| * [[user:zeman:treebanks:hr|Croatian (hr)]] |
* [[user:zeman:treebanks:cs|Czech (cs)]] | * [[user:zeman:treebanks:cs|Czech (cs)]] |
* [[user:zeman:treebanks:da|Danish (da)]] | * [[user:zeman:treebanks:da|Danish (da)]] |
* [[user:zeman:treebanks:de|German (de)]] | * [[user:zeman:treebanks:nl|Dutch (nl)]] |
* [[user:zeman:treebanks:el|Greek (el)]] | |
* [[user:zeman:treebanks:en|English (en)]] | * [[user:zeman:treebanks:en|English (en)]] |
* [[user:zeman:treebanks:es|Spanish (es)]] | |
* [[user:zeman:treebanks:et|Estonian (et)]] | * [[user:zeman:treebanks:et|Estonian (et)]] |
* [[user:zeman:treebanks:eu|Basque (eu)]] | |
* [[user:zeman:treebanks:fa|Persian (fa)]] | |
* [[user:zeman:treebanks:fi|Finnish (fi)]] | * [[user:zeman:treebanks:fi|Finnish (fi)]] |
* [[user:zeman:treebanks:grc|Ancient Greek (grc)]] | * [[user:zeman:treebanks:de|German (de)]] |
| * [[user:zeman:treebanks:el|Greek (el)]] |
* [[user:zeman:treebanks:hi|Hindi (hi)]] | * [[user:zeman:treebanks:hi|Hindi (hi)]] |
* [[user:zeman:treebanks:hu|Hungarian (hu)]] | * [[user:zeman:treebanks:hu|Hungarian (hu)]] |
* [[user:zeman:treebanks:ja|Japanese (ja)]] | * [[user:zeman:treebanks:ja|Japanese (ja)]] |
* [[user:zeman:treebanks:la|Latin (la)]] | * [[user:zeman:treebanks:la|Latin (la)]] |
* [[user:zeman:treebanks:nl|Dutch (nl)]] | * [[user:zeman:treebanks:fa|Persian (fa)]] |
* [[user:zeman:treebanks:pt|Portuguese (pt)]] | * [[user:zeman:treebanks:pt|Portuguese (pt)]] |
* [[user:zeman:treebanks:ro|Romanian (ro)]] | * [[user:zeman:treebanks:ro|Romanian (ro)]] |
* [[user:zeman:treebanks:ru|Russian (ru)]] | * [[user:zeman:treebanks:ru|Russian (ru)]] |
| * [[user:zeman:treebanks:sk|Slovak (sk)]] |
* [[user:zeman:treebanks:sl|Slovene (sl)]] | * [[user:zeman:treebanks:sl|Slovene (sl)]] |
| * [[user:zeman:treebanks:es|Spanish (es)]] |
* [[user:zeman:treebanks:sv|Swedish (sv)]] | * [[user:zeman:treebanks:sv|Swedish (sv)]] |
* [[user:zeman:treebanks:ta|Tamil (ta)]] | * [[user:zeman:treebanks:ta|Tamil (ta)]] |
* [[user:zeman:treebanks:te|Telugu (te)]] | * [[user:zeman:treebanks:te|Telugu (te)]] |
* [[user:zeman:treebanks:tr|Turkish (tr)]] | * [[user:zeman:treebanks:tr|Turkish (tr)]] |
| |
| ===== To Process ===== |
| |
| Hi,
| I have downloaded the new Spanish dependency corpus IULA (larger than AnCora) to
| /net/projects/tectomt_shared/data/resources/treebanks/es |
| |
| License: CC BY 3.0 (Unported) |
| Web: http://www.iula.upf.edu/recurs01_tbk_uk.htm |
| Doc: http://www.iula.upf.edu/recurs01_conll_uk.htm |
| Download: http://repositori.upf.edu/handle/10230/20048 |
| Parsing: http://www.taln.upf.edu/system/files/biblio_files/ijcnlp_final_padro_et_al_2013.pdf |
| State of the art: LAS 94.7 using Mate-C
| Sentences: 42,000
| Tokens: 590,000
| |
| The sentences were chosen from the IULA LSP corpus, automatically annotated with POS information, and manually annotated with syntactic information using the DELPH-IN environment. The resulting syntactic analysis is automatically converted to dependencies and delivered in the CoNLL format. |
| |
| Martin |
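
To check a downloaded copy against the sentence and token counts quoted above, a minimal sketch like the following could be used. It assumes the usual CoNLL-X layout (one token per line with tab-separated columns, sentences separated by blank lines); the function name `count_conll` and the sample rows, including their tags, are made up for illustration and are not taken from the IULA data.

```python
from typing import Iterable, Tuple


def count_conll(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentence_count, token_count) for CoNLL-style input.

    Assumption: every non-blank line is one token, and one or more
    blank lines separate sentences.
    """
    sentences = 0
    tokens = 0
    in_sentence = False
    for line in lines:
        if not line.strip():
            # A blank line closes the sentence that is currently open, if any.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        tokens += 1
        in_sentence = True
    # The last sentence may not be followed by a blank line.
    if in_sentence:
        sentences += 1
    return sentences, tokens


# Hypothetical two-sentence sample; the tags are illustrative only.
SAMPLE = [
    "1\tEl\tel\td\tDA\t_\t2\tspec\t_\t_\n",
    "2\tgato\tgato\tn\tNC\t_\t3\tsubj\t_\t_\n",
    "3\tduerme\tdormir\tv\tVM\t_\t0\tROOT\t_\t_\n",
    "\n",
    "1\tLlueve\tllover\tv\tVM\t_\t0\tROOT\t_\t_\n",
]

print(count_conll(SAMPLE))
```

For a real file, the same function can be given an open file handle, since iterating over a text file yields its lines one at a time.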