Institute of Formal and Applied Linguistics Wiki


This is an old revision of the document!


Khresmoi

Medical Information Analysis & Retrieval
http://www.khresmoi.eu/

People and contacts


Data

MT training data available for KHRESMOI

Corpus Source Domain EN-FR EN-DE EN FR DE Note
TDA translation memory TDA in 13517 Kw 6797 Kw 8-) PP
CESTA Evaluation Package ELRA in 38 Kw waiting
EQueR Evaluation Package ELRA in 140 MiB waiting
CESART Evaluation Package ELRA in 9000 Kw waiting
French Gigaword LDC news 863 Kw DVD
Acquis JRC law 1.25 Ms (?3.034 Ms) (3.128 Ms) 8-) JHla
EMEA European Medicines Agency in 373 Ks 8-) JHla
EMEA European Medicines Agency in 14.9 Mw 8-) JHla
EMEA European Medicines Agency in 26.34 Mw 8-) JHla
MESH U.S. National Library of Medicine in 838 kw 8-) JHla
OrphaNet OrphaNet in ? negotiating
Europarl WMT12 parl ? ? JHla
News Commentary WMT12 news ? ? JHla
News monolingual WMT12 news JHla
United Nations WMT12 ? ? JHla
French-English 109 corpus WMT12 web ? JHla
Medpedia wiki Medpedia in ? only EN found
MAREC IPC in ? ? ? contacted JHla
Springer Bilingual Corpus much.more in 1.09 Mw 8-) JB
EMEA CORPUS in 12 Mw 8-) JB

k, M … thousand, million
w, s … words, sentences (for parallel data, only the source-side (English) words are counted)
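
The counting convention in the legend can be sketched in a few lines of Python. This is a minimal illustration, not the script used to produce the figures in the table: the function names `corpus_size` and `with_unit` are hypothetical, and whitespace tokenization is an assumption.

```python
def corpus_size(pairs):
    """Count sentence pairs and source-side words in a parallel corpus.

    `pairs` is any iterable of (source, target) strings, with English
    as the source side. Only the source words are counted, as in the legend.
    """
    sentences = 0
    words = 0
    for source, _target in pairs:
        sentences += 1
        # Assumption: tokens are separated by whitespace.
        words += len(source.split())
    return sentences, words


def with_unit(count, unit):
    """Format a count with the k/M prefixes from the legend.

    `unit` is "w" for words or "s" for sentences.
    """
    if count >= 1_000_000:
        return f"{count / 1_000_000:.2f} M{unit}"
    if count >= 1_000:
        return f"{count / 1_000:.0f} k{unit}"
    return f"{count} {unit}"


if __name__ == "__main__":
    # Toy EN-FR example, invented for illustration only.
    toy = [
        ("the patient has a fever", "le patient a de la fièvre"),
        ("take two tablets", "prenez deux comprimés"),
    ]
    n_sentences, n_words = corpus_size(toy)
    print(with_unit(n_sentences, "s"), with_unit(n_words, "w"))
```

For monolingual data the same word count applies to the single language; passing an empty string as the second element of each pair is enough for this sketch.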

JRC Acquis should contain over 3 Ms:
http://optima.jrc.it/Acquis/JRC-Acquis.3.0/alignmentsHunAlign/index.html

Sources

MAREC
A61 (MEDICAL OR VETERINARY SCIENCE; HYGIENE): 1,589,849 files
Word count unknown; the data does not come as a single package.

Khresmoi wiki
http://wiki.khresmoi.eu/index.php5/Data_sets_used
http://wiki.khresmoi.eu/index.php5/Data_sets

WMT workshop website
http://www.statmt.org/wmt12/

OPUS corpus
http://opus.lingfil.uu.se/

JRC Acquis
http://langtech.jrc.it/JRC-Acquis.html

ELDA

We have ordered several packages with in-domain data (EN-FR, FR).

TDA

We have credit to download 1 billion words. So far, the EN-FR and EN-DE in-domain data have been downloaded.

LDC

Parallel data

EN-FR
EN-DE

Monolingual data

FR
DE
EN


Documents


SVN

PP, please fill this in.


