Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:addicter [2010/02/22 11:56]
zeman created
user:zeman:addicter [2010/02/22 13:26]
zeman How to install and configure Apache.
Line 6:
  
 Currently, Addicter can view and browse aligned corpora, look up example words in context, and summarize the known alignments of a given word. Viewing and browsing are handled by a web server that generates pages dynamically (to avoid pre-generating millions of static HTML documents). The obvious drawback is that access to a web server is needed.
 +
 +===== Installation =====
 +
 +  * Install a web server, unless you already have access to one (local or remote). For instance, the Apache web server is available for at least Linux and MS Windows, and it's free. Configure your web server to work with CGI scripts written in Perl.
 +  * To be able to generate alignments that will be displayed by Addicter, you need Giza++ or equivalent. The first few training steps of the Moses suite will do.
 +  * Check out the Addicter code from the Failfinder SVN repository.
 +
 +==== How to install and configure Apache ====
 +
 +This tutorial currently focuses on installing Apache HTTP Server on Microsoft Windows. If you are an experienced user of another operating system and wish to share advice, please feel free to [[mailto:zeman@ufal.mff.cuni.cz|contact me]].
 +
 +  * Download the Apache HTTP Server from http://httpd.apache.org/download.cgi. For MS Windows, you can download a package for the Microsoft Installer (''.msi''). Install it by double-clicking on the installation file. I suggest installing Apache as a system service. That way, it will start automatically whenever your computer starts.
 +  * Configure the server. This essentially means editing a configuration file and restarting the server. Depending on your system settings, Apache version etc., the configuration file will reside in a path similar to this: ''C:\Program Files\Apache Software Foundation\Apache2.2\conf\httpd.conf''. Alternatively, you can access it via your Start Menu: Apache -> Apache HTTP Server 2.2 -> Configure Apache Server -> Edit the Apache httpd.conf Configuration File.
 +    * Look for a ''ScriptAlias'' directive. It tells the server: 1. what path on the hard disk contains scripts that can generate dynamic HTML content on the fly, and 2. how the path will be represented in the URL (web address). For example <code>ScriptAlias /cgi/ "C:/Documents and Settings/Dan/Documents/Web/cgi/"</code> says that the URL ''http://localhost/cgi/anyscript.pl'' leads to your script ''C:\Documents and Settings\Dan\Documents\Web\cgi\anyscript.pl'', and that it's a script (i.e., the server shall invoke it and send its output, instead of sending the script itself).
 +    * Under Windows, you will also want to set <code>ScriptInterpreterSource registry</code> This tells the server to use the Windows registry to figure out how to run a script (e.g., that the ''C:\Perl\Perl.exe'' binary must be run to interpret a ''.pl'' script).
 +  * Restart the server. The notification area of the Windows taskbar (typically in the lower right corner of the screen) contains a set of icons, including a new one for Apache. Right-click on it, select Open Apache Monitor, then Restart.
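 +
 +Put together, the two directives above might look like the following fragment of ''httpd.conf''. This is only a sketch for Apache 2.2, reusing the example path from above. The ''<Directory>'' block is an assumption that is not part of the steps above: the default configuration typically denies access to folders outside the document root, so the script folder may need to be opened explicitly. The ''Order''/''Allow'' lines use the Apache 2.2 syntax; Apache 2.4 and later use ''Require all granted'' instead.
 +
 +<code>
 +# Map the URL prefix /cgi/ to the folder holding the CGI scripts.
 +# The path is the example used above; replace it with your own.
 +ScriptAlias /cgi/ "C:/Documents and Settings/Dan/Documents/Web/cgi/"
 +
 +# Assumption: the script folder lies outside the document root,
 +# so access to it has to be allowed explicitly (Apache 2.2 syntax).
 +<Directory "C:/Documents and Settings/Dan/Documents/Web/cgi">
 +    AllowOverride None
 +    Options None
 +    Order allow,deny
 +    Allow from all
 +    # Windows only: look up the script interpreter in the registry.
 +    ScriptInterpreterSource registry
 +</Directory>
 +</code>
 +
 +If the server does not come up after the restart, a typo in ''httpd.conf'' is a likely cause. The Start Menu folder that offers editing the configuration file usually also contains an item for testing the configuration.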
 +
 +===== Alignment viewer =====
 +
 +Before invoking the viewer, you need to run an indexing script over your aligned corpus. It will create a bunch of index files that will later tell the viewer where to look for examples of a particular word. The indexer needs the following input files:
 +
 +  * ''train.src'' ... source side of training corpus
 +  * ''train.tgt'' ... target side of training corpus
 +  * ''train.ali'' ... alignment of training corpus
 +  * ''test.src'' ... source side of test data
 +  * ''test.tgt'' ... reference translation of test data
 +  * ''test.ali'' ... alignment of the source and reference translation of test data
 +  * ''test.system.tgt'' ... system output for test data
 +  * ''test.system.ali'' ... alignment of the source and the system output for test data
 +
 +The indexer splits the output index into multiple files to keep each individual file small. All index files must be stored in the same folder as the viewing CGI scripts.
