Differences
This shows you the differences between two versions of the page.
user:zeman:addicter [2010/02/22 14:36] zeman The first how-to guide. |
user:zeman:addicter [2011/07/16 17:39] zeman Dan's Perl libraries are needed to run Addicter. |
Line 3: | Line 3: | ||
//
- | The work on Addicter started at the MT Marathon 2010 in Dublin, within a broader 5-day project called Failfinder (Dan Zeman, Ondřej Bojar, Martin Popel, David Mareček, Jon Clark, Ken Heafield, Qin Gao, Loïc Barrault). The code that resulted from the project can be freely downloaded from https:// | + | The work on Addicter started at the MT Marathon 2010 in Dublin, within a broader 5-day project called Failfinder (Dan Zeman, Ondřej Bojar, Martin Popel, David Mareček, Jon Clark, Ken Heafield, Qin Gao, Loïc Barrault). The code that resulted from the project can be freely downloaded from https:// |
- | Currently, Addicter can view and browse | + | In 2011, the viewer was complemented by an automatic error recognizer and classifier, thanks to Mark Fishel. Development has since moved to the ÚFAL StatMT SVN repository (i.e. '' |
+ | |||
+ | Currently, Addicter can do the following: | ||
+ | * Find erroneous tokens | ||
+ | * Browse the test data, sentence by sentence, and show aligned source sentence, reference translation and system hypothesis. | ||
+ | * Browse | ||
+ | * Show lines of the phrase table that contain a given word. | ||
+ | * Summarize | ||
+ | * In the near future, we also plan to add searching and grouping of words that share the same lemma, so that morphological errors are highlighted. | ||
+ | |||
+ | Viewing and browsing are performed by a web server that generates pages dynamically (to avoid pre-generating millions of static HTML documents). Words in sentences are clickable, so the user can quickly navigate to examples and summaries of words other than the current one. The obvious drawback is that access to a web server is required. A small subset, the test data browser, can also be generated as static HTML files and viewed without a web server. | ||
+ | |||
+ | There is another Addicter subpage in this wiki; it lies in the external namespace, so it can be used for [[external: | ||
===== Installation =====
* Install a web server, unless you already have access to one (local or remote). For instance, the Apache web server is free and available for at least Linux and MS Windows. Configure your web server to run CGI scripts written in Perl.
- | * To be able to generate alignments that will be displayed by Addicter, you need Giza++ or an equivalent tool. The first few training steps of the Moses suite will do. | + | * To be able to generate alignments that will be displayed by Addicter, you need [[http:// |
- | * Check out Addicter code from the Failfinder | + | * Check out Addicter code from the ÚFAL SVN repository |
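Before going further, it may help to check that the tools the steps above rely on are actually on your PATH. The following is a small shell sketch, not part of Addicter; the function name is hypothetical, and the command names you pass to it (for instance `perl`, `svn` and `GIZA++`) are assumptions that depend on how your system is set up.

```shell
# Hypothetical helper, not part of Addicter: report every command from the
# argument list that cannot be found on PATH.
# Returns 0 if all commands are present, 1 otherwise.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_prereqs perl svn GIZA++` prints one `missing:` line for each tool it cannot find and returns a non-zero status if anything is missing.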
==== How to install and configure Apache ====
+ | |||
+ | === Microsoft Windows === | ||
This tutorial currently focuses on installing the Apache HTTP Server on Microsoft Windows. If you are an experienced user of another operating system and wish to share advice, please feel free to [[mailto:
Line 22: | Line 36: | ||
* Under Windows, you will also want to set <
* Restart the server. In the notification area of the Windows taskbar (typically in the lower right corner) there is a set of icons, including a new one for Apache. Right-click it, select Open Apache Monitor, then Restart.
+ | |||
+ | === Ubuntu Linux === | ||
+ | |||
+ | Install the Apache HTTP server package. After successful installation, | ||
+ | |||
+ | < | ||
+ | ScriptAlias /cgi-bin/ / | ||
+ | < | ||
+ | AllowOverride None | ||
+ | Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch | ||
+ | Order allow,deny | ||
+ | Allow from all | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | Either create a copy of the section with a new alias and path (e.g. '' | ||
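To check that the server really executes Perl CGI scripts before installing Addicter, you can drop a trivial test script into your CGI folder. The shell sketch below is hypothetical (the function `make_test_cgi` and the file name `addicter-test.pl` are made up for this example); pass it the directory that your `ScriptAlias` line maps `/cgi-bin/` to.

```shell
# Hypothetical helper: write a minimal Perl CGI script into the given
# directory, make it executable and print its path.
make_test_cgi() {
  local dir="$1"
  local script="$dir/addicter-test.pl"
  mkdir -p "$dir" || return 1
  # Quoted 'EOF' keeps the shell from expanding anything in the Perl code.
  cat > "$script" <<'EOF'
#!/usr/bin/perl
use strict;
use warnings;
print "Content-type: text/plain\n\n";
print "Perl CGI works\n";
EOF
  chmod 755 "$script"   # CGI scripts must be executable by the server
  echo "$script"
}
```

Run it as a user allowed to write into that directory (on Ubuntu typically via `sudo`), then open the script's URL under `/cgi-bin/` in a browser; you should see the line `Perl CGI works`. On recent Ubuntu releases the CGI module may also need to be enabled with `sudo a2enmod cgi`. Note that `Order allow,deny` and `Allow from all` are Apache 2.2 directives; Apache 2.4 uses `Require all granted` instead, unless `mod_access_compat` is loaded.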
==== How to install Addicter ====
Line 27: | Line 57: | ||
We use ''
- | * Check out the current version of Addicter | + | * Addicter |
- | * All you need at this moment is in the folder | + | export PERL5LIB=~/ |
+ | * Check out the current version of Addicter from the StatMT SVN repository, again using username | ||
+ | * There are two subfolders, '' | ||
+ | * For every experiment whose data will be explored by Addicter, create a subfolder in '' | ||
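Two of the steps above are easy to script. The shell sketch below makes the `PERL5LIB` setting idempotent (sourcing your setup twice does not add the same path twice) and creates one subfolder per experiment. Both function names are hypothetical; use the library path and the experiments folder given in the steps above.

```shell
# Hypothetical helper: prepend a directory to PERL5LIB unless it is
# already listed, then export the variable.
prepend_perl5lib() {
  local dir="$1"
  case ":${PERL5LIB:-}:" in
    *":$dir:"*) ;;                                   # already present
    *) PERL5LIB="$dir${PERL5LIB:+:$PERL5LIB}" ;;     # prepend
  esac
  export PERL5LIB
}

# Hypothetical helper: create one subfolder per experiment name
# under the given root folder.
make_experiment_dirs() {
  local root="$1" exp
  shift
  for exp in "$@"; do
    mkdir -p "$root/$exp" || return 1
  done
}
```

For example, `prepend_perl5lib "$HOME/perllib"` followed by `make_experiment_dirs /path/to/experiments exp1 exp2`, where both paths are placeholders for your own setup.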
===== Alignment viewer =====
Line 43: | Line 76: | ||
* ''
- | The '' | + | <!--The '' |
- | The indexer splits the output index into multiple files to keep each individual file small. All index files must be stored in the same folder as the viewing | + | The indexer splits the output index into multiple files to keep each individual file small. All index files must be stored in the experiment subfolder of '' |
==== How to prepare a corpus for viewing ====
Line 62: | Line 95: | ||
< | < | ||
-trs train.en -trt train.hi -tra train.ali \
- | -s test.en -r test.hi -h test.joshua.hi -ra test.ali -ha test.joshua.ali \ | + | -s test.en -r test.hi -h test.system.hi -ra test.ali -ha test.system.ali \ |
-o $CGI</
The indexer will copy the input files and write all index files into the ''
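The indexer works with parallel files, so a mismatch in the number of lines (for instance a hypothesis file with one sentence missing) is worth catching before indexing. Here is a small hypothetical pre-flight check in shell, assuming every input file holds one sentence, or the alignment of one sentence pair, per line and ends with a newline (`wc -l` counts newline characters).

```shell
# Hypothetical helper: succeed only if all given files are readable
# and have the same number of lines.
same_line_count() {
  local first="" f n
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "cannot read: $f" >&2
      return 1
    fi
    # $(( )) strips the padding that some wc implementations add.
    n=$(( $(wc -l < "$f") ))
    if [ -z "$first" ]; then
      first=$n
    elif [ "$n" -ne "$first" ]; then
      echo "line count mismatch: $f has $n lines, expected $first" >&2
      return 1
    fi
  done
  return 0
}
```

With the file names from the example above: `same_line_count train.en train.hi train.ali && same_line_count test.en test.hi test.system.hi test.ali test.system.ali`.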
+ | |||
+ | ==== How to invoke the error classifier ==== | ||
+ | |||
+ | The error classifier currently uses its own monolingual word alignment of the reference translation and the hypothesis. It is invoked as follows: | ||
+ | |||
+ | <code bash> | ||
+ | ${ADDICTER}/ | ||
+ | ${ADDICTER}/ | ||
+ | |||
+ | Place the files '' | ||
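To illustrate what a monolingual word alignment between a reference and a hypothesis can look like, here is a deliberately naive shell and awk sketch. It is not Addicter's aligner: it links each reference token to the first unused identical token in the hypothesis and ignores lemmas, synonyms and word order. It prints links as 0-based `i-j` pairs, as in Moses alignment files; whether the classifier's own alignment files use exactly this format is an assumption.

```shell
# Hypothetical helper: naive_align REF HYP
# Both files hold one tokenized sentence per line (no tabs inside lines).
# For each sentence pair, print space-separated "i-j" links (0-based).
naive_align() {
  paste -d '\t' "$1" "$2" | awk -F '\t' '
  {
    nr = split($1, ref, " ")
    nh = split($2, hyp, " ")
    split("", used)              # forget links from the previous sentence
    out = ""
    for (i = 1; i <= nr; i++) {
      for (j = 1; j <= nh; j++) {
        # Appending "" forces string comparison of the tokens.
        if (!(j in used) && (ref[i] "") == (hyp[j] "")) {
          used[j] = 1
          sep = (out == "") ? "" : " "
          out = out sep (i - 1) "-" (j - 1)
          break
        }
      }
    }
    print out
  }'
}
```

For a reference line `the cat sat` and a hypothesis line `the sat cat` it prints `0-0 1-2 2-1`.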
==== How to use the viewer ====
If your web server is running and properly configured, and your index and data files are in the correct place, launch your web browser and point it to http://
+ | |||
+ | ===== Acknowledgements ===== | ||
+ | |||
+ | This research has been supported by the grant of the Czech Ministry of Education no. MSM0021620838 (2010), by the grants of the Czech Science Foundation no. P406/ | ||
+ |