Institute of Formal and Applied Linguistics Wiki






English-Hindi Translation – Obtaining Mediocre Results with Bad Data and Fancy Models

UNDER CONSTRUCTION!

This page is an add-on to the following paper:

Ondřej Bojar, Pavel Straňák, Daniel Zeman, Gaurav Jain, Michal Hrušecký, Michal Richter, Jan Hajič: English-Hindi Translation – Obtaining Mediocre Results with Bad Data and Fancy Models. In: Proceedings of ICON 2009, Hyderabad, December.

The purpose of the add-on page is to provide detailed documentation of the data, tools and settings used so that the results can be reproduced by other researchers.

Out of Vocabulary

data                  tokens    types   tokens in train   types in train
Tides-train-en        1226144   48048
Tides-train-hi        1312435   53451
Tides+DP11-train-en   1402536   52947
Tides+DP11-train-hi   1434543   57131
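
For readers who want to compute comparable figures on their own data, the following is a minimal sketch and not the script used for the paper. It assumes whitespace-tokenized text with one sentence per string. It also assumes that the columns "tokens in train" and "types in train" count the tokens and types of a data set that also occur in the training data. The function names `vocab_stats` and `coverage` are hypothetical.

```python
from collections import Counter


def vocab_stats(sentences):
    """Count tokens and types in whitespace-tokenized sentences.

    Returns (number of tokens, number of types, Counter of word frequencies).
    """
    counts = Counter(tok for line in sentences for tok in line.split())
    return sum(counts.values()), len(counts), counts


def coverage(data_sentences, train_sentences):
    """Report how much of a data set is covered by the training vocabulary.

    Assumption: a token or type is 'in train' if the exact same word form
    occurs at least once in the training sentences.
    """
    _, _, train_counts = vocab_stats(train_sentences)
    n_tokens, n_types, data_counts = vocab_stats(data_sentences)
    tokens_in_train = sum(c for w, c in data_counts.items() if w in train_counts)
    types_in_train = sum(1 for w in data_counts if w in train_counts)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "tokens in train": tokens_in_train,
        "types in train": types_in_train,
    }


if __name__ == "__main__":
    # Toy data for illustration only, not taken from Tides or DP11.
    train = ["the cat sat", "the dog ran"]
    test = ["the cat ran away", "the bird sat"]
    print(coverage(test, train))
```

The out-of-vocabulary counts follow as the differences: tokens minus "tokens in train", and types minus "types in train".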
