English-Hindi Translation – Obtaining Mediocre Results with Bad Data and Fancy Models
UNDER CONSTRUCTION!
This page is an add-on to the following paper:
Ondřej Bojar, Pavel Straňák, Daniel Zeman, Gaurav Jain, Michal Hrušecký, Michal Richter, Jan Hajič: English-Hindi Translation – Obtaining Mediocre Results with Bad Data and Fancy Models. In: Proceedings of ICON 2009, Hyderabad, December.
This add-on page documents the data, tools and settings used in the paper in detail, so that other researchers can reproduce the results.
- Data
  - IIIT-TIDES
  - Daniel Pipes
  - EMILLE
  - Agrocorpus
  - Shabdanjali
  - Wikipedia Named Entities
- Tools and their settings
  - Tokenization and normalization of the data
  - Hunalign
  - GIZA++
  - makecls
  - SRILM
  - Moses
  - Joshua
  - Mumbai Tagger
  - Affisix
  - Hindomor
  - HiTBSuf
  - BLEU score calculator
- Links from the tables in the paper to the concrete settings
- Link to the PDF version of the paper; link to Biblio?