
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


user:zeman:treebanks [2012/01/08 13:39]
zeman Latin.
user:zeman:treebanks [2014/05/30 12:54]
zeman Link.
Line 1: Line 1:
 ====== Treebanks for Various Languages ====== ====== Treebanks for Various Languages ======
  
 +http://ufal.mff.cuni.cz/hamledt/ or [[hamledt|HamleDT in the Wiki]]
 +
 +  * [[user:zeman:treebanks:grc|Ancient Greek (grc)]]
   * [[user:zeman:treebanks:ar|Arabic (ar)]]   * [[user:zeman:treebanks:ar|Arabic (ar)]]
-  * [[user:zeman:treebanks:bg|Bulgarian (bg)]]+  * [[user:zeman:treebanks:eu|Basque (eu)]]
   * [[user:zeman:treebanks:bn|Bengali (bn)]]   * [[user:zeman:treebanks:bn|Bengali (bn)]]
 +  * [[user:zeman:treebanks:bg|Bulgarian (bg)]]
   * [[user:zeman:treebanks:ca|Catalan (ca)]]   * [[user:zeman:treebanks:ca|Catalan (ca)]]
   * [[user:zeman:treebanks:cs|Czech (cs)]]   * [[user:zeman:treebanks:cs|Czech (cs)]]
   * [[user:zeman:treebanks:da|Danish (da)]]   * [[user:zeman:treebanks:da|Danish (da)]]
-  * [[user:zeman:treebanks:de|German (de)]] +  * [[user:zeman:treebanks:nl|Dutch (nl)]]
-  * [[user:zeman:treebanks:el|Greek (el)]]+
   * [[user:zeman:treebanks:en|English (en)]]   * [[user:zeman:treebanks:en|English (en)]]
-  * [[user:zeman:treebanks:es|Spanish (es)]] 
   * [[user:zeman:treebanks:et|Estonian (et)]]   * [[user:zeman:treebanks:et|Estonian (et)]]
-  * [[user:zeman:treebanks:eu|Basque (eu)]] 
   * [[user:zeman:treebanks:fi|Finnish (fi)]]   * [[user:zeman:treebanks:fi|Finnish (fi)]]
-  * [[user:zeman:treebanks:grc|Ancient Greek (grc)]]+  * [[user:zeman:treebanks:de|German (de)]] 
 +  * [[user:zeman:treebanks:el|Greek (el)]]
   * [[user:zeman:treebanks:hi|Hindi (hi)]]   * [[user:zeman:treebanks:hi|Hindi (hi)]]
   * [[user:zeman:treebanks:hu|Hungarian (hu)]]   * [[user:zeman:treebanks:hu|Hungarian (hu)]]
Line 20: Line 22:
   * [[user:zeman:treebanks:ja|Japanese (ja)]]   * [[user:zeman:treebanks:ja|Japanese (ja)]]
   * [[user:zeman:treebanks:la|Latin (la)]]   * [[user:zeman:treebanks:la|Latin (la)]]
 +  * [[user:zeman:treebanks:fa|Persian (fa)]]
 +  * [[user:zeman:treebanks:pt|Portuguese (pt)]]
 +  * [[user:zeman:treebanks:ro|Romanian (ro)]]
 +  * [[user:zeman:treebanks:ru|Russian (ru)]]
 +  * [[user:zeman:treebanks:sk|Slovak (sk)]]
 +  * [[user:zeman:treebanks:sl|Slovene (sl)]]
 +  * [[user:zeman:treebanks:es|Spanish (es)]]
 +  * [[user:zeman:treebanks:sv|Swedish (sv)]]
 +  * [[user:zeman:treebanks:ta|Tamil (ta)]]
 +  * [[user:zeman:treebanks:te|Telugu (te)]]
 +  * [[user:zeman:treebanks:tr|Turkish (tr)]]
 +
 +===== To Process =====
 +
 +Hi,
 +I have downloaded the new Spanish dependency corpus IULA (larger than AnCora) to
 +/net/projects/tectomt_shared/data/resources/treebanks/es
 +
 +License:  CC BY 3.0 (Unported)
 +Web:      http://www.iula.upf.edu/recurs01_tbk_uk.htm
 +Doc:      http://www.iula.upf.edu/recurs01_conll_uk.htm
 +Download: http://repositori.upf.edu/handle/10230/20048
 +Parsing:  http://www.taln.upf.edu/system/files/biblio_files/ijcnlp_final_padro_et_al_2013.pdf
 +          state-of-the-art LAS score is 94.7 using Mate-C
 +sentences  42,000
 +tokens    590,000
 +
 +The sentences were chosen from the IULA LSP corpus, automatically annotated with POS information, and manually annotated with syntactic information using the DELPH-IN environment. The resulting syntactic analysis is automatically converted to dependencies and delivered in the CoNLL format.
 +
 +Martin
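As a rough aid for checking the sentence and token counts quoted above, and for reproducing a labeled attachment score (LAS) like the one mentioned, here is a minimal Python sketch. It assumes the files follow the CoNLL-X layout (ten tab-separated columns, with HEAD in column 7 and DEPREL in column 8, and a blank line between sentences); the IULA documentation should be consulted for the actual column layout. The sample sentences, tags and relation labels are invented for illustration and are not taken from the corpus.

```python
from typing import Iterable, Iterator, List, Tuple

# One token reduced to the fields needed here: (form, head index, relation label).
Token = Tuple[str, int, str]


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences from CoNLL-X style lines (assumed column layout)."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        # Assumed CoNLL-X positions: FORM = 2nd, HEAD = 7th, DEPREL = 8th column.
        sentence.append((cols[1], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def las(gold: List[List[Token]], pred: List[List[Token]]) -> float:
    """Percentage of tokens whose head and relation label both match gold."""
    total = 0
    correct = 0
    for gold_sent, pred_sent in zip(gold, pred):
        for (_, g_head, g_rel), (_, p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head and g_rel == p_rel:
                correct += 1
    return 100.0 * correct / total if total else 0.0


# Hypothetical two-sentence sample; tags and labels are made up.
SAMPLE = (
    "1\tEl\tel\td\tda\t_\t2\tSPEC\t_\t_\n"
    "2\tgato\tgato\tn\tnc\t_\t3\tSUBJ\t_\t_\n"
    "3\tduerme\tdormir\tv\tvm\t_\t0\tROOT\t_\t_\n"
    "\n"
    "1\tLlueve\tllover\tv\tvm\t_\t0\tROOT\t_\t_\n"
)

sentences = list(read_conll(SAMPLE.splitlines()))
n_sentences = len(sentences)
n_tokens = sum(len(s) for s in sentences)
print(n_sentences, n_tokens)
```

For a real file, the same function could be fed an open file handle, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points to one of the downloaded CoNLL files.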
