
Institute of Formal and Applied Linguistics Wiki



This is an old revision of the document!



Addicter

Addicter stands for Automatic Detection and DIsplay of Common Translation ERrors. It will be a set of tools (mostly scripts written in Perl) that help with error analysis for machine translation.

Work on Addicter started at the MT Marathon 2010 in Dublin, within a broader 5-day project called Failfinder (Dan Zeman, Ondřej Bojar, Martin Popel, David Mareček, Jon Clark, Ken Heafield, Qin Gao, Loïc Barrault). The code that resulted from the project can be freely downloaded from https://failfinder.googlecode.com/svn/trunk/. The nucleus that existed just after the MT Marathon (4 Feb 2010) is designated Addicter version 0.1, a number chosen to reflect that it is by no means a final product. Even so, it can already do useful work.

Currently, Addicter can view and browse aligned corpora, look up example words in context, and summarize the known alignments of a given word. Viewing and browsing are handled by a web server that generates pages dynamically (to avoid pre-generating millions of static HTML documents). The obvious drawback is that access to a web server is required.
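To illustrate what "summarizing known alignments of a given word" could mean, here is a minimal sketch in Python. It is not Addicter's own code (which is written in Perl). The function name `summarize_alignments` is hypothetical, and the input format is an assumption: one tokenized sentence per entry, with alignments given as space-separated `i-j` pairs of zero-based source and target token positions.

```python
from collections import Counter


def summarize_alignments(word, src_sentences, tgt_sentences, alignments):
    """Count the target words that `word` is aligned to across the corpus.

    Assumed format (not taken from Addicter): each alignment line is a
    space-separated list of "i-j" links, where i indexes a source token
    and j indexes a target token, both zero-based.
    """
    counts = Counter()
    for src, tgt, ali in zip(src_sentences, tgt_sentences, alignments):
        src_tokens = src.split()
        tgt_tokens = tgt.split()
        for link in ali.split():
            i, j = (int(x) for x in link.split("-"))
            if src_tokens[i] == word:
                counts[tgt_tokens[j]] += 1
    return counts


# Toy two-sentence corpus, invented for illustration only.
src = ["the house is small", "the small house"]
tgt = ["das Haus ist klein", "das kleine Haus"]
ali = ["0-0 1-1 2-2 3-3", "0-0 1-1 2-2"]

summary = summarize_alignments("small", src, tgt, ali)
for target_word, count in summary.most_common():
    print(target_word, count)
```

In this toy corpus the source word "small" is aligned once to "klein" and once to "kleine", which is the kind of per-word overview that helps when analyzing translation errors.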

Installation

How to install and configure Apache

This tutorial currently focuses on installing Apache HTTP Server on Microsoft Windows. If you are an experienced user of another operating system and wish to share advice, please feel free to contact me.

Alignment viewer

Before invoking the viewer, you need to run an indexing script over your aligned corpus. It will create a bunch of index files that will later tell the viewer where to look for examples of a particular word. The indexer needs the following input files:

The indexer splits the output index into multiple files to reduce the size of any individual file. All index files must be stored in the same folder as the viewing CGI scripts.
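The idea of a word index split across several files can be sketched as follows. This is only an illustration in Python under stated assumptions, not the actual Addicter indexer: the file names (`index0.json` and so on), the JSON format, the number of shards, and the choice of a CRC32 hash to pick the shard are all hypothetical.

```python
import json
import os
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical; the real indexer may split differently


def shard_of(word, num_shards=NUM_SHARDS):
    """Map a word to a shard number with a stable hash (an assumption)."""
    return zlib.crc32(word.encode("utf-8")) % num_shards


def build_index(sentences, index_dir, num_shards=NUM_SHARDS):
    """Write one small index file per shard: word -> list of sentence numbers."""
    shards = [defaultdict(list) for _ in range(num_shards)]
    for sent_no, sentence in enumerate(sentences):
        # set() ensures each sentence is listed once per word.
        for word in set(sentence.split()):
            shards[shard_of(word, num_shards)][word].append(sent_no)
    os.makedirs(index_dir, exist_ok=True)
    for k, shard in enumerate(shards):
        path = os.path.join(index_dir, f"index{k}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(shard, f, ensure_ascii=False)


def lookup(word, index_dir, num_shards=NUM_SHARDS):
    """Open only the shard that can contain `word` and return its sentence numbers."""
    path = os.path.join(index_dir, f"index{shard_of(word, num_shards)}.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f).get(word, [])
```

With such a layout, a viewer answering a query for one word reads a single small file rather than the whole index, which is the benefit of splitting that the text describes.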

