Khresmoi
Medical Information Analysis & Retrieval
http://www.khresmoi.eu/
People and contacts
- JH = Jan Hajič <hajic (at) ufal.mff.cuni.cz>
- PP = Pavel Pecina <pecina (at) ufal.mff.cuni.cz>
- JHla = Jaroslava Hlaváčová <hlava (at) ufal.mff.cuni.cz>
- JD = Jan Dědek <dedek (at) ksi.mff.cuni.cz>
- JB = Jakub Bystroň <jb.elitecode (at) gmail.com>
Data
Corpus | Source | Domain | EN-FR | EN-DE | EN | FR | DE | Note |
---|---|---|---|---|---|---|---|---|
TMX | TDA | in | 13517 Kw | 6797 Kw | PP | |||
CESTA Evaluation Package | ELRA | in | 38 Kw | waiting | ||||
EQueR Evaluation Package | ELRA | in | 140 MiB | waiting | ||||
CESART Evaluation Package | ELRA | in | 9000 Kw | waiting | ||||
French Gigaword | LDC | out | 863Kw | DVD | ||||
Acquis | JRC | out | 1,25M sentences | JHla | ||||
EMEA | European Medicines Agency | in | 373k sentences | JHla | ||||
EMEA | European Medicines Agency | in | 14.9Mw | JHla | ||||
EMEA | European Medicines Agency | in | 26,34Mw | JHla | ||||
MESH | U.S. National Library of Medicine | in | 838kw | JHla | ||||
OrphaNet | OrphaNet | in | ? | negotiating |
Sources (according to PP)
Khresmoi wiki
http://wiki.khresmoi.eu/index.php5/Data_sets_used
http://wiki.khresmoi.eu/index.php5/Data_sets
WMT workshop website
http://www.statmt.org/wmt12/
OPUS corpus
http://opus.lingfil.uu.se/
JRC Acquis
http://langtech.jrc.it/JRC-Acquis.html
ELDA
We have ordered several packages with in-domain data (EN-FR, FR).
TDA
We have credit to download 1 billion words. So far, the EN-FR and EN-DE in-domain data have been downloaded.
LDC
Parallel data
Monolingual data
Documents
SVN
PP, please fill this in.