
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks [2011/11/20 21:21]
zeman The individual languages already had to be moved to separate pages.
user:zeman:treebanks [2014/07/17 17:43] (current)
zeman Croatian.
Line 1: Line 1:
 ====== Treebanks for Various Languages ====== ====== Treebanks for Various Languages ======
  
 +http://ufal.mff.cuni.cz/hamledt/ or [[hamledt|HamleDT in the Wiki]]
 +
 +  * [[user:zeman:treebanks:grc|Ancient Greek (grc)]]
   * [[user:zeman:treebanks:ar|Arabic (ar)]]   * [[user:zeman:treebanks:ar|Arabic (ar)]]
-  * [[user:zeman:treebanks:bg|Bulgarian (bg)]]+  * [[user:zeman:treebanks:eu|Basque (eu)]]
   * [[user:zeman:treebanks:bn|Bengali (bn)]]   * [[user:zeman:treebanks:bn|Bengali (bn)]]
 +  * [[user:zeman:treebanks:bg|Bulgarian (bg)]]
   * [[user:zeman:treebanks:ca|Catalan (ca)]]   * [[user:zeman:treebanks:ca|Catalan (ca)]]
 +  * [[user:zeman:treebanks:hr|Croatian (hr)]]
   * [[user:zeman:treebanks:cs|Czech (cs)]]   * [[user:zeman:treebanks:cs|Czech (cs)]]
   * [[user:zeman:treebanks:da|Danish (da)]]   * [[user:zeman:treebanks:da|Danish (da)]]
 +  * [[user:zeman:treebanks:nl|Dutch (nl)]]
 +  * [[user:zeman:treebanks:en|English (en)]]
 +  * [[user:zeman:treebanks:et|Estonian (et)]]
 +  * [[user:zeman:treebanks:fi|Finnish (fi)]]
   * [[user:zeman:treebanks:de|German (de)]]   * [[user:zeman:treebanks:de|German (de)]]
   * [[user:zeman:treebanks:el|Greek (el)]]   * [[user:zeman:treebanks:el|Greek (el)]]
-  * [[user:zeman:treebanks:en|English (en)]]+  * [[user:zeman:treebanks:hi|Hindi (hi)]] 
 +  * [[user:zeman:treebanks:hu|Hungarian (hu)]] 
 +  * [[user:zeman:treebanks:it|Italian (it)]] 
 +  * [[user:zeman:treebanks:ja|Japanese (ja)]] 
 +  * [[user:zeman:treebanks:la|Latin (la)]] 
 +  * [[user:zeman:treebanks:fa|Persian (fa)]] 
 +  * [[user:zeman:treebanks:pt|Portuguese (pt)]] 
 +  * [[user:zeman:treebanks:ro|Romanian (ro)]] 
 +  * [[user:zeman:treebanks:ru|Russian (ru)]] 
 +  * [[user:zeman:treebanks:sk|Slovak (sk)]] 
 +  * [[user:zeman:treebanks:sl|Slovene (sl)]] 
 +  * [[user:zeman:treebanks:es|Spanish (es)]] 
 +  * [[user:zeman:treebanks:sv|Swedish (sv)]] 
 +  * [[user:zeman:treebanks:ta|Tamil (ta)]] 
 +  * [[user:zeman:treebanks:te|Telugu (te)]] 
 +  * [[user:zeman:treebanks:tr|Turkish (tr)]] 
 + 
 +===== To Process ===== 
 + 
 +Hi, 
 +I have downloaded the new Spanish dependency corpus IULA (larger than AnCora) to 
 +/net/projects/tectomt_shared/data/resources/treebanks/es 
 + 
 +License:  CC BY 3.0 (Unported) 
 +Web:      http://www.iula.upf.edu/recurs01_tbk_uk.htm 
 +Doc:      http://www.iula.upf.edu/recurs01_conll_uk.htm 
 +Download: http://repositori.upf.edu/handle/10230/20048 
 +Parsing:  http://www.taln.upf.edu/system/files/biblio_files/ijcnlp_final_padro_et_al_2013.pdf 
 +          state-of-the-art LAS score is 94.7 using Mate-C 
 +sentences  42,000 
 +tokens    590,000 
 + 
 +The sentences have been chosen from the IULA LSP corpus, automatically annotated with POS information and manually annotated with syntactic information using the DELPH-IN environment. The resulting syntactic analysis is automatically converted to dependencies and delivered in the CoNLL format.
  
 +Martin
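
A quick way to sanity-check the sentence and token counts quoted above after downloading the treebank might look like the sketch below. It assumes the usual CoNLL-X conventions (one token per line, tab-separated columns, a blank line between sentences); the note above only says "CoNLL format", so the exact column layout should be checked against the linked documentation. The function name `count_conll` and the sample sentences, tags, and labels are made up for illustration and are not taken from the IULA data.

```python
from typing import Iterable, Tuple


def count_conll(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentences, tokens) for CoNLL-style input.

    Assumption: every non-blank line is one token and blank lines
    separate sentences, as in the CoNLL-X shared task format.
    """
    sentences = 0
    tokens = 0
    in_sentence = False
    for line in lines:
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        tokens += 1
        in_sentence = True
    # The last sentence may not be followed by a blank line.
    if in_sentence:
        sentences += 1
    return sentences, tokens


# Hypothetical two-sentence sample with ten tab-separated columns;
# the tags and dependency labels are placeholders, not IULA's tagset.
SAMPLE = (
    "1\tLa\tel\td\td\t_\t2\tSPEC\t_\t_\n"
    "2\tcasa\tcasa\tn\tn\t_\t3\tSUBJ\t_\t_\n"
    "3\tes\tser\tv\tv\t_\t0\tROOT\t_\t_\n"
    "4\tgrande\tgrande\ta\ta\t_\t3\tATR\t_\t_\n"
    "5\t.\t.\tf\tf\t_\t3\tPUNCT\t_\t_\n"
    "\n"
    "1\tLlueve\tllover\tv\tv\t_\t0\tROOT\t_\t_\n"
    "2\t.\t.\tf\tf\t_\t1\tPUNCT\t_\t_\n"
)

if __name__ == "__main__":
    print(count_conll(SAMPLE.splitlines()))
```

For a real file, an open file handle can be passed directly, since the function only iterates over lines; the file would need to be opened with the encoding the distribution actually uses.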
