====== Addicter ======

The introductory page on the Addicter project is [[user:zeman:addicter|here]]. This page lies in the external namespace and is intended for collaboration with people outside of ÚFAL.

==== TODOs ====

  * test alignment with synonym detection (cz_wn required) = separating ''lex::'' and ''disam::''
  * try misplaced phrase detection
    * parse the reference, extrapolate onto the hypothesis via word alignment, get phrases from there
    * group adjacent words aligned to the same word? (imitation)
  * order evaluation
    * currently finds misplaced items, but their shift distances are off
      * not important for ''ows'' vs ''owl'', but
    * to fix -- for every misplaced token
      * if it (and only it) were to be moved in the original permutation, what would be the best place?
      * evaluate with nr. of intersections
  * try domain adaptation for word alignment with the "via source" alignment (EMNLP 2011 paper)
  * technical
    * comb and comment the code
    * add help files
    * integrate with the rest of Addicter
  * approach applicable to learner's corpora
    * see Anne Lüdelig, TLT9
  * try Blast (Sara's program for translation error markup)
  * alternative to reference-based evaluation: "Inconsistencies in Penn parsing", M. Dickinson

==== Word Alignment -- Progress and Results ====

=== Latest best results ===

[[http://mtj.ut.ee/addicter-best.txt|txt]]

=== Alternative model comparison ===

  * hmm = lightweight direct alignment method (in our ACL/TSD article)
  * gizainter = GIZA++, intersection -- applied to hypotheses+references directly
  * gizadiag = GIZA++, grow-diag -- applied to hypotheses+references directly
  * czenginter = align source+CzEng to reference+CzEng, and source+CzEng to hypotheses+CzEng with GIZA++, intersection, extract hypothesis-reference alignments from there ("Dan's method")
  * czengdiag = same, but with GIZA++ grow-diag

|  ^ Precision/Recall/F-score:  ^^^^
|  ^ Lex  ^ Order  ^ Punct  ^ Miss  ^
^ ter*  |0.106/0.387/**0.167**  |0.025/0.191/**0.044**  |0.132/0.936/**0.232**  |0.026/0.170/**0.046**  |
^ meteor  |0.092/0.251/**0.135**  |0.047/0.229/**0.078**  |0.248/0.665/**0.361**  |0.020/0.382/**0.038**  |
^ hmm  |0.162/0.426/**0.234**  |0.069/0.309/**0.112**  |0.281/0.793/**0.415**  |0.025/0.400/**0.047**  |
^ lcs  |0.168/0.462/**0.247**  |0.000/0.000/**0.000**  |0.293/0.848/**0.435**  |0.026/0.374/**0.049**  |
^ gizainter  |0.170/0.483/**0.252**  |0.049/0.137/**0.072**  |0.284/0.878/**0.429**  |0.029/0.409/**0.054**  |
^ gizadiag*  |0.183/0.512/**0.270**  |0.044/0.250/**0.075**  |0.285/0.784/**0.417**  |0.038/0.224/**0.065**  |
^ czengdiag*  |0.187/0.514/**0.275**  |0.069/0.455/**0.120**  |0.230/0.883/**0.365**  |0.035/0.234/**0.061**  |
^ berkeley*  |0.200/0.540/**0.291**  |0.050/0.330/**0.087**  |0.292/0.844/**0.434**  |0.039/0.267/**0.068**  |
^ czenginter  |0.197/0.543/**0.290**  |0.108/0.475/**0.176**  |0.233/0.926/**0.372**  |0.032/0.402/**0.060**  |

* non-1-to-1 alignments, converted to 1-to-1 via "align-hmm.pl -x -a ..."
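
The ''gizainter'' and ''czenginter'' variants keep only the links that both alignment directions agree on. Below is a minimal Python sketch of that intersection step, assuming each directional alignment is available as a set of index pairs. The function names and the toy data are illustrative only; they are not taken from Addicter or from GIZA++.

```python
def intersect_alignments(hyp_to_ref, ref_to_hyp):
    """Keep only the links present in both directional alignments.

    hyp_to_ref: set of (hyp_idx, ref_idx) pairs from one alignment direction.
    ref_to_hyp: set of (ref_idx, hyp_idx) pairs from the opposite direction.
    Returns a set of (hyp_idx, ref_idx) pairs.
    """
    # Flip the second direction so both sets use (hyp_idx, ref_idx) order.
    flipped = {(h, r) for (r, h) in ref_to_hyp}
    return hyp_to_ref & flipped


def is_one_to_one(links):
    """True if no hypothesis index and no reference index occurs twice."""
    hyp_side = [h for h, _ in links]
    ref_side = [r for _, r in links]
    return (len(set(hyp_side)) == len(hyp_side)
            and len(set(ref_side)) == len(ref_side))


# Toy data (hypothetical): two directional alignments of a 4-token pair.
forward = {(0, 0), (1, 2), (2, 2), (3, 3)}   # (hyp, ref)
backward = {(0, 0), (2, 1), (1, 1), (3, 3)}  # (ref, hyp)

links = intersect_alignments(forward, backward)
print(sorted(links))
print(is_one_to_one(forward), is_one_to_one(links))
```

In this toy case the forward alignment links two hypothesis tokens to the same reference token, while the intersection contains no such conflict. This would be consistent with the intersection rows in the table carrying no asterisk, although the page itself does not state that as the reason.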
=== Alignment combinations ===

via weighted HMM

|  ^ Precision/Recall/F-score:  ^^^^
|  ^ Lex  ^ Order  ^ Punct  ^ Miss  ^
^ ter+hmm  |0.116/0.402/**0.180**  |0.030/0.184/**0.051**  |0.145/0.912/**0.251**  |0.026/0.181/**0.046**  |
^ meteor+hmm  |0.162/0.426/**0.234**  |0.068/0.309/**0.112**  |0.286/0.794/**0.421**  |0.025/0.400/**0.047**  |
^ gizadiag+hmm  |0.186/0.515/**0.273**  |0.040/0.215/**0.067**  |0.297/0.836/**0.438**  |0.039/0.238/**0.067**  |
^ gizainter+hmm  |0.194/0.505/**0.281**  |0.062/0.282/**0.101**  |0.299/0.806/**0.436**  |0.033/0.382/**0.061**  |
^ berkeley+hmm  |0.203/0.548/**0.297**  |0.049/0.320/**0.085**  |0.290/0.816/**0.428**  |0.041/0.277/**0.071**  |
^ czengdiag+hmm  |0.190/0.517/**0.278**  |0.073/0.457/**0.126**  |0.291/0.841/**0.432**  |0.039/0.238/**0.067**  |
^ czenginter+hmm  |0.214/0.545/**0.307**  |0.093/0.525/**0.158**  |0.304/0.818/**0.443**  |0.038/0.363/**0.068**  |
|  |||||
^ berk+czengint+hmm  |0.219/0.568/**0.316**  |0.070/0.432/**0.120**  |0.298/0.817/**0.436**  |0.048/0.290/**0.082**  |
^ berk+czengint+gizaint+hmm  |0.220/0.569/**0.317**  |0.068/0.420/**0.118**  |0.298/0.812/**0.436**  |0.048/0.290/**0.083**  |
^ berk+czengint+meteor+hmm  |0.220/0.569/**0.317**  |0.070/0.440/**0.121**  |0.295/0.810/**0.433**  |0.048/0.290/**0.083**  |
^ berk+czengint+meteor+gizaint+hmm  |0.221/0.571/**0.318**  |0.068/0.424/**0.118**  |0.298/0.808/**0.436**  |0.049/0.292/**0.084**  |
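
Each cell in the tables lists precision, recall and a bold F-score. The sketch below recomputes the F-score from the two printed values, assuming the bold figure is the balanced F1 (the harmonic mean of precision and recall); the page does not say so explicitly. Because the printed precision and recall are already rounded to three decimals, the recomputed value can differ from the reported one in the last digit.

```python
def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) copied from a few table cells.
cells = {
    "hmm / Lex": (0.162, 0.426, 0.234),
    "czenginter / Lex": (0.197, 0.543, 0.290),
    "czenginter+hmm / Lex": (0.214, 0.545, 0.307),
    "lcs / Order": (0.000, 0.000, 0.000),
}

for name, (p, r, reported) in cells.items():
    recomputed = f_score(p, r)
    # Small differences are expected from rounding of the printed inputs.
    print(f"{name}: recomputed {recomputed:.3f}, reported {reported:.3f}")
```

The explicit zero check covers cells such as ''lcs'' / Order, where both precision and recall are 0.000 and the harmonic mean would otherwise divide by zero.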