
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


user:zeman:addicter [2011/09/06 17:34]
zeman
user:zeman:addicter [2012/06/22 18:05] (current)
bojar emplus acknowledged
Line 5: Line 5:
 The work on Addicter has started at the MT Marathon 2010 in Dublin, within a broader 5-day project called Failfinder (Dan Zeman, Ondřej Bojar, Martin Popel, David Mareček, Jon Clark, Ken Heafield, Qin Gao, Loïc Barrault). The code that resulted from the project can be freely downloaded from https://failfinder.googlecode.com/svn/trunk/. The nucleus that existed just after the MT Marathon (4 Feb 2010) is Addicter version 0.1, to reflect that this was by no means deemed a final product.
  
-In 2011, the viewer was accompanied by an automatic error recognizer and classifier, thanks to Mark Fishel. The development has been moved to ÚFAL StatMT SVN repository (i.e. ''failfinder.googlecode.com'' is currently not maintained).
+In 2011, the viewer was accompanied by an automatic error recognizer and classifier, thanks to Mark Fishel. The development has been moved to ÚFAL StatMT SVN repository (i.e. ''failfinder.googlecode.com'' is currently not maintained). In September 2011 at the Sixth MT Marathon in Trento, Addicter was further developed and thoroughly compared with another tool for error analysis, [[http://www.dfki.de/~mapo02/hjerson/|Hjerson]]. See the [[http://statmt.org/mtm6/?n=Main.AutomaticMTErrorAnalysis|project wiki]]. For further developments, see also [[http://terra.cl.uzh.ch/|the Terra website]].
  
 Currently, Addicter can do the following:
Line 38: Line 38:
     * Look for a ''ScriptAlias'' directive. It tells the server: 1. what path on the hard disk contains scripts that can generate dynamic HTML content on the fly, and 2. how the path will be represented in the URL (web address). For example <code>ScriptAlias /cgi/ "C:/Documents and Settings/Dan/Documents/Web/cgi/"</code> says that the URL ''http://localhost/cgi/anyscript.pl'' leads to your script ''C:\Documents and Settings\Dan\Documents\Web\cgi\anyscript.pl'', and that it's a script (i.e., the server shall invoke it and send its output, instead of sending the script itself).
     * Under Windows, you will also want to set <code>ScriptInterpreterSource registry</code> It tells the server that the Windows registry shall be used to figure out how to run a script (e.g., that ''C:\Perl\Perl.exe'' binary must be run to interpret a ''.pl'' script).
+    * CGI scripts do not run in the same environment as a user's command line. They do not see the ''PERLLIB'' variable, and thus cannot find the libraries, unless Apache is explicitly instructed to pass the variable to the CGI environment: <code>PassEnv PERLLIB PERL5LIB</code>
   * Restart the server. On the main Windows panel, there is (typically in the lower right corner) a set of icons, including a new one for Apache. Right-click on it, select Open Apache Monitor, then Restart.
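
As a rough sanity check on the directives discussed above, the following sketch scans a configuration file for uncommented occurrences of the directives you name. The function name ''check_cgi_directives'' and its argument order are assumptions made for illustration; it is not part of Addicter, and the location of your Apache configuration file depends on your installation.

```shell
# Hypothetical helper: report which directives appear (uncommented)
# in an Apache configuration file.
# Usage: check_cgi_directives CONF_FILE DIRECTIVE...
check_cgi_directives() {
    conf="$1"
    shift
    missing=0
    for directive in "$@"; do
        # Match the directive at the start of a line, allowing leading blanks.
        # Lines commented out with '#' do not match.
        if grep -Eq "^[[:space:]]*${directive}[[:space:]]" "$conf"; then
            echo "found: ${directive}"
        else
            echo "missing: ${directive}"
            missing=1
        fi
    done
    # Non-zero exit status if at least one directive was not found.
    return "$missing"
}
```

For example, ''check_cgi_directives /path/to/httpd.conf ScriptAlias PassEnv'' would list both directives as found or missing; the path shown is a placeholder.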
  
Line 58: Line 59:
 ==== How to install Addicter ====
  
-We use ''$CGI'' to refer to the path you registered with Apache as containing CGI scripts (using the ''ScriptAlias'' directive). **NOTE:** If you are using Addicter's own web server or if Addicter content is the only thing you intend to use the server to serve, probably the easiest thing to do is to set the Addicter's ''cgi'' folder as your ''$CGI''.
+We use ''$CGI'' to refer to the path you registered with Apache as containing CGI scripts (using the ''ScriptAlias'' directive). **NOTE:** If you are using Addicter's own web server or if Addicter content is the only thing you intend to use the server to serve, probably the easiest thing to do is to set the Addicter's ''cgi'' folder as your ''$CGI''. **NOTE 2:** There are a couple of files with static (non-CGI) web content, needed by the CGI scripts. These files (currently ''tabs.gif'' and ''activatables.js'') are in ''$CGI/..''. With Addicter's own web server, this is just fine. If you are using another web server, however, you must copy these files to the appropriate location in your static content directory structure so that the server finds them. They should not be directly in the ''$CGI'' folder because they are not scripts and should not be treated as scripts by the server.
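
For a separate web server, the copy described in NOTE 2 might be scripted roughly as follows. This is only a sketch: the function name ''copy_addicter_static'' and the target directory argument are assumptions, and which directory counts as the "appropriate location" depends on how your server maps static content.

```shell
# Hypothetical helper: copy Addicter's static files from the parent of the
# CGI folder into a static-content directory of another web server.
# Usage: copy_addicter_static CGI_DIR STATIC_DIR
copy_addicter_static() {
    cgi_dir="$1"      # the folder registered via ScriptAlias ($CGI in the text)
    static_dir="$2"   # assumed: a directory your server serves as static content
    for f in tabs.gif activatables.js; do
        # The files live one level above the CGI folder.
        cp "${cgi_dir}/../${f}" "${static_dir}/" || return 1
    done
}
```

The files are copied rather than moved, so Addicter's own web server keeps working from the same checkout.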
  
   * Addicter uses some general-purpose Perl libraries that are maintained in a separate repository. Download these first, using username ''public'' and password ''public''. Then make sure that Perl finds these libraries. In Linux/bash, the following commands will do that: <code bash>svn --username public checkout https://svn.ms.mff.cuni.cz/svn/dzlib ~/lib
Line 107: Line 108:
 The error classifier currently uses its own monolingual word alignment of the reference translation and the hypothesis. It is invoked as follows:
  
-<code bash>${ADDICTER}/testchamber/align-hmm.pl ref.txt hyp.txt > tcali.txt
-${ADDICTER}/testchamber/finderrs.pl src.txt hyp.txt ref.txt tcali.txt > tcerr.txt
-${ADDICTER}/testchamber/errsummary.pl tcerr.txt</code>
+<code bash>${ADDICTER}/prepare/detecter.pl -s srcfile -r reffile -h hypfile [-a alignment] -w workdir</code>
  
-Place the files ''tcali.txt'' and ''tcerr.txt'' in the experiment subfolder of ''$CGI'' and the error classes will be displayed during test data browsing in the viewer.
+and it creates the files ''workdir/tcali.txt'' and ''workdir/tcerr.txt''. The input files (src, ref and hyp) can also be gzipped. A custom alignment between hypothesis and reference can be supplied with ''-a''. If it is not supplied, the default aligner (''${ADDICTER}/testchamber/align-greedy.pl'') is invoked.
+
+Place the files ''tcali.txt'' and ''tcerr.txt'' in the experiment subfolder of ''$CGI'' and the error classes will be displayed during test data browsing in the viewer. The viewer can work with several alternative alignments (perhaps produced by different alignment algorithms) of the same data. For each of those alignments, you have to run ''detecter.pl'' separately.
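
Running ''detecter.pl'' once per alignment could be sketched as a small loop. The function name, the ''work-<alignment name>'' directory naming and the file names in the usage line are assumptions for illustration; only the ''-s'', ''-r'', ''-h'', ''-a'' and ''-w'' options are taken from the invocation shown above. Creating the work directory beforehand is a precaution, not a documented requirement.

```shell
# Hypothetical wrapper: run detecter.pl once for each supplied alignment file,
# giving every run its own work directory.
# Usage: run_detecter_for_alignments SRC REF HYP ALIGNMENT...
run_detecter_for_alignments() {
    src="$1"
    ref="$2"
    hyp="$3"
    shift 3
    for ali in "$@"; do
        name=$(basename "$ali")
        workdir="work-${name%.*}"   # assumed naming scheme, e.g. work-ali-greedy
        mkdir -p "$workdir"         # precaution; may not be required
        "${ADDICTER}/prepare/detecter.pl" -s "$src" -r "$ref" -h "$hyp" \
            -a "$ali" -w "$workdir" || return 1
    done
}
```

A call such as ''run_detecter_for_alignments src.txt ref.txt hyp.txt ali-greedy.txt ali-hmm.txt'' would then leave one ''tcali.txt'' and ''tcerr.txt'' pair in each work directory. How the viewer expects the files of several alignments to be named inside the experiment subfolder is not covered by this sketch.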
  
 ==== How to use the viewer ====
  
-First make sure that your web server is running and configured properly and that your index and data files have been prepared in the correct place. If you do not use your own web server, invoke the script ''server.pl'' in the main Addicter folder. It will save something like
+First make sure that your web server is running and configured properly and that your index and data files have been prepared in the correct place. If you do not use your own web server, invoke the script ''server.pl'' in the main Addicter folder. It will say something like
  
 <code>Please contact me at: <URL:http://localhost:2588/cgi/index.pl></code>
Line 125: Line 126:
 ===== Acknowledgements =====
  
-This research has been supported by the grant of the Czech Ministry of Education no. MSM0021620838 (2010), by the grants of the Czech Science Foundation no. P406/11/1499 and P406/10/P259 and the Estonian Science Foundation target financed theme SF0180078s08 (2011).
+This research has been supported by the grant of the Czech Ministry of Education no. MSM0021620838 (2010), by the grants of the Czech Science Foundation no. P406/11/1499 and P406/10/P259, the Estonian Science Foundation target financed theme SF0180078s08 (2011) and by the project EuroMatrixPlus (FP7-ICT-2007-3-231720 of the EU and 7E09003+7E11051 of the Ministry of Education, Youth and Sports of the Czech Republic; 2011-2012).
+
+===== Publications =====
+
+  * Mark Fishel, Ondřej Bojar, Daniel Zeman, Jan Berka: //[[http://ufal.mff.cuni.cz:8080/bib/?section=publication&id=-8080356071532247643&mode=view|Automatic Translation Error Analysis]].// In: TSD 2011, Plzeň, Czechia, 2011
+  * Daniel Zeman, Mark Fishel, Jan Berka, Ondřej Bojar: //[[http://ufal.mff.cuni.cz:8080/bib/?section=publication&id=-4951860774549764116&mode=view|Addicter: What Is Wrong with My Translations?]]// In: The Prague Bulletin of Mathematical Linguistics, vol. 96, pp. 79–88; MT Marathon, Trento, Italy, 2011
+  * Maja Popović: //[[http://ufal.ms.mff.cuni.cz/pbml/96/art-popovic.pdf|Hjerson: An Open Source Tool for Automatic Error Classification of Machine Translation Output]].// In: The Prague Bulletin of Mathematical Linguistics, vol. 96, pp. 59–68; MT Marathon, Trento, Italy, 2011
+  * Jan Berka, Ondřej Bojar, Mark Fishel, Maja Popović, Daniel Zeman: //[[http://ufal.mff.cuni.cz:8080/bib/?section=publication&id=-6688512557324036032&mode=view|Automatic MT Error Analysis: Hjerson Helping Addicter]].// In: Proceedings of LREC 2012, İstanbul, Turkey, 2012
  
