
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

external:addicter [2011/05/16 06:57]
mphi
external:addicter [2011/05/22 09:31] (current)
mphi
Line 5: Line 5:
 This page lies in the external name space and is intended for collaboration with people outside of ÚFAL. This page lies in the external name space and is intended for collaboration with people outside of ÚFAL.
  
-==== Progress, Results and TODOs ====
 +==== TODOs ====
 +  * test alignment with synonym detection (cz_wn required) = separating ''lex::'' and ''disam::'' 
 +  * try misplaced phrase detection 
 +    * parse the reference, project the parse onto the hypothesis via word alignment, get phrases from there
 +    * group adjacent words aligned to the same word? (imitation) 
 +  * order evaluation 
 +    * currently finds misplaced items, but their shift distances are off 
 +    * not important for ''ows'' vs ''owl'', but 
 +      * to fix -- for every misplaced token 
 +        * if it (and only it) were to be moved in the original permutation, what would be the best place? 
 +        * evaluate with nr. of intersections 
 +  * try domain adaptation for word alignment with the "via source" alignment (EMNLP 2011 paper) 
 +  * technical 
 +    * comb and comment the code 
 +    * add help files 
 +    * integrate with the rest of Addicter 
 +  * approach applicable to learner's corpora 
 +    * see Anne Lüdelig, TLT9 
 +  * try Blast (Sara's program for translation error markup) 
 +  * alternative to reference-based evaluation: "Inconsistencies in Penn parsing", M. Dickinson
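
The fix proposed under "order evaluation" above can be sketched as follows. This is a minimal illustration, not Addicter's code: it assumes the hypothesis is represented as a permutation of reference positions, counts "intersections" as crossing alignment links (inversions), and the names `count_intersections` and `best_single_move` are hypothetical.

```python
from typing import List, Tuple


def count_intersections(perm: List[int]) -> int:
    """Count crossing alignment links, i.e. inversions in the permutation."""
    return sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )


def best_single_move(perm: List[int], idx: int) -> Tuple[int, int]:
    """Move only the token at idx and return (best position, intersections left).

    Ties are broken in favour of the position closest to idx (an assumption).
    """
    token = perm[idx]
    rest = perm[:idx] + perm[idx + 1:]
    best = None
    for pos in range(len(perm)):
        candidate = rest[:pos] + [token] + rest[pos:]
        key = (count_intersections(candidate), abs(pos - idx))
        if best is None or key < best[0]:
            best = (key, pos)
    return best[1], best[0][0]


# Hypothesis token 1 is aligned to reference position 3 and is misplaced.
perm = [0, 3, 1, 2, 4]
position, remaining = best_single_move(perm, 1)
shift_distance = position - 1
```

Under these assumptions the shift distance of a misplaced token is the offset between its current position and the position that leaves the fewest intersections.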
  
-===  Word alignment ===
 +==== Word Alignment -- Progress and Results ====
  
-Alternative model comparison
-|| border=1
-||             || Precision/Recall/F-score||||||||
-||                               || Lex      || Order    || Punct    || Miss     ||
-|| meteor    ||0.092/0.251/**0.135** ||0.047/0.229/'''0.078''' ||0.248/0.665/'''0.361''' ||0.020/0.382/'''0.038''' ||
-|| ter*      ||0.106/0.387/**0.167** ||0.025/0.191/'''0.044''' ||0.132/0.936/'''0.232''' ||0.026/0.170/'''0.046''' ||
-|| hmm       ||0.162/0.426/**0.234** ||0.069/0.309/'''0.112''' ||0.281/0.793/'''0.415''' ||0.025/0.400/'''0.047''' ||
-|| lcs       ||0.168/0.462/**0.247** ||0.000/0.000/'''0.000''' ||0.293/0.848/'''0.435''' ||0.026/0.374/'''0.049''' ||
-|| gizadiag* ||0.183/0.512/**0.270** ||0.044/0.250/'''0.075''' ||0.285/0.784/'''0.417''' ||0.038/0.224/'''0.065''' ||
-|| gizainter ||0.170/0.483/**0.252** ||0.049/0.137/'''0.072''' ||0.284/0.878/'''0.429''' ||0.029/0.409/'''0.054''' ||
-|| berkeley* ||0.200/0.540/**0.291** ||0.050/0.330/'''0.087''' ||0.292/0.844/'''0.434''' ||0.039/0.267/'''0.068''' ||
 +=== Latest best results ===
 +[[http://mtj.ut.ee/addicter-best.txt|txt]]
 +
 +
 +
 +
 +
 +
 +
 +
 +
  
-!!! Explicit wrong lex choice detection
-align input+czeng to reference+czeng and input+czeng to hypotheses+czeng
-extract hypothesis-to-reference alignments from there
-|| border=1
-||             || Precision/Recall/F-score: ||||||||
-||                               || Lex      || Order    || Punct    || Miss     ||
-|| czengdiag* ||0.187/0.514/'''0.275''' ||0.069/0.455/'''0.120''' ||0.230/0.883/'''0.365''' ||0.035/0.234/'''0.061''' ||
-|| czenginter ||0.197/0.543/'''0.290''' ||0.108/0.475/'''0.176''' ||0.233/0.926/'''0.372''' ||0.032/0.402/'''0.060''' ||
-TODO: also try domain adaptation for word alignment, EMNLP 2011 paper
 +=== Alternative model comparison ===
 +
 +hmm = lightweight direct alignment method (in our ACL/TSD article)
 +gizainter = GIZA++, intersection -- applied to hypotheses+references directly
 +gizadiag = GIZA++, grow-diag -- applied to hypotheses+references directly
 +czenginter = align source+CzEng to reference+CzEng, and source+CzEng to hypotheses+CzEng with GIZA++, intersection, extract hypothesis-reference alignments from there ("Dan's method")
 +czengdiag = same, but with GIZA++ grow-diag
 +
 +|            ^     Precision/Recall/F-score:     ^^^^
 +|            ^         Lex          ^        Order         ^         Punct        ^         Miss         ^
 +^ ter*       |0.106/0.387/**0.167** |0.025/0.191/**0.044** |0.132/0.936/**0.232** |0.026/0.170/**0.046** |
 +^ meteor     |0.092/0.251/**0.135** |0.047/0.229/**0.078** |0.248/0.665/**0.361** |0.020/0.382/**0.038** |
 +^ hmm        |0.162/0.426/**0.234** |0.069/0.309/**0.112** |0.281/0.793/**0.415** |0.025/0.400/**0.047** |
 +^ lcs        |0.168/0.462/**0.247** |0.000/0.000/**0.000** |0.293/0.848/**0.435** |0.026/0.374/**0.049** |
 +^ gizainter  |0.170/0.483/**0.252** |0.049/0.137/**0.072** |0.284/0.878/**0.429** |0.029/0.409/**0.054** |
 +^ gizadiag*  |0.183/0.512/**0.270** |0.044/0.250/**0.075** |0.285/0.784/**0.417** |0.038/0.224/**0.065** |
 +^ czengdiag* |0.187/0.514/**0.275** |0.069/0.455/**0.120** |0.230/0.883/**0.365** |0.035/0.234/**0.061** |
 +^ berkeley*  |0.200/0.540/**0.291** |0.050/0.330/**0.087** |0.292/0.844/**0.434** |0.039/0.267/**0.068** |
 +^ czenginter |0.197/0.543/**0.290** |0.108/0.475/**0.176** |0.233/0.926/**0.372** |0.032/0.402/**0.060** |
 + 
 +non-1-to-1 alignments converted to 1-to-1 via "align-hmm.pl -x -a ..."
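
The extraction step behind the ''czenginter'' and ''czengdiag'' entries (hypothesis-reference links obtained through the shared source sentence) might look roughly like the sketch below. It covers only the composition of two already computed alignments; the GIZA++ runs and the CzEng concatenation are not shown. The function name `compose_via_source` and the link representation as index pairs are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Link = Tuple[int, int]


def compose_via_source(src_ref: Iterable[Link], src_hyp: Iterable[Link]) -> Set[Link]:
    """Link a hypothesis token to a reference token if both align to the same source token.

    src_ref holds (source_idx, reference_idx) pairs,
    src_hyp holds (source_idx, hypothesis_idx) pairs.
    The result holds (hypothesis_idx, reference_idx) pairs.
    """
    ref_by_src = defaultdict(set)
    for s, r in src_ref:
        ref_by_src[s].add(r)
    return {(h, r) for s, h in src_hyp for r in ref_by_src.get(s, ())}


# Toy example: the reference swaps the last two source tokens, the hypothesis does not.
src_ref = [(0, 0), (1, 2), (2, 1)]
src_hyp = [(0, 0), (1, 1), (2, 2)]
hyp_ref = compose_via_source(src_ref, src_hyp)
```

With intersection alignments on both sides the composed links stay 1-to-1; with grow-diag inputs the result can be many-to-many.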
 + 
 +=== Alignment combinations === 
 +via weighted HMM
 + 
 +|                   ^     Precision/Recall/F-score:     ^^^^
 +|                   ^     Lex     ^    Order    ^    Punct    ^    Miss     ^
 +^ ter+hmm         |0.116/0.402/**0.180** |0.030/0.184/**0.051** |0.145/0.912/**0.251** |0.026/0.181/**0.046** | 
 +^ meteor+hmm      |0.162/0.426/**0.234** |0.068/0.309/**0.112** |0.286/0.794/**0.421** |0.025/0.400/**0.047** | 
 +^ gizadiag+hmm    |0.186/0.515/**0.273** |0.040/0.215/**0.067** |0.297/0.836/**0.438** |0.039/0.238/**0.067** | 
 +^ gizainter+hmm   |0.194/0.505/**0.281** |0.062/0.282/**0.101** |0.299/0.806/**0.436** |0.033/0.382/**0.061** | 
 +^ berkeley+hmm    |0.203/0.548/**0.297** |0.049/0.320/**0.085** |0.290/0.816/**0.428** |0.041/0.277/**0.071** | 
 +^ czengdiag+hmm   |0.190/0.517/**0.278** |0.073/0.457/**0.126** |0.291/0.841/**0.432** |0.039/0.238/**0.067** | 
 +^ czenginter+hmm  |0.214/0.545/**0.307** |0.093/0.525/**0.158** |0.304/0.818/**0.443** |0.038/0.363/**0.068** | 
 +| ||||| 
 +^ berk+czengint+hmm |0.219/0.568/**0.316** |0.070/0.432/**0.120** |0.298/0.817/**0.436**|0.048/0.290/**0.082** | 
 +^ berk+czengint+gizaint+hmm |0.220/0.569/**0.317** |0.068/0.420/**0.118** |0.298/0.812/**0.436**|0.048/0.290/**0.083** | 
 +^ berk+czengint+meteor+hmm |0.220/0.569/**0.317** |0.070/0.440/**0.121** |0.295/0.810/**0.433**|0.048/0.290/**0.083** | 
 +^ berk+czengint+meteor+gizaint+hmm |0.221/0.571/**0.318** |0.068/0.424/**0.118** |0.298/0.808/**0.436** |0.049/0.292/**0.084** |
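
As a quick sanity check on the tables, the bold third value in each cell appears to be the balanced F-score (harmonic mean of precision and recall). That reading is an assumption, but the listed numbers agree with it up to rounding of the inputs. A minimal sketch, with `f_score` as a hypothetical helper name:

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1); defined as 0.0 when both inputs are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# czenginter+hmm, Lex column: 0.214/0.545/0.307
lex_f = f_score(0.214, 0.545)

# lcs, Order column: 0.000/0.000/0.000
order_f = f_score(0.0, 0.0)
```

Small deviations in the third decimal are expected, because the precision and recall values in the tables are themselves rounded.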
  
-!!! alignment combinations via weighed HMM 
-|| border=1 
-||             || Precision/Recall/F-score: |||||||| 
-||                               || Lex      || Order    || Punct    || Miss     || 
-|| meteor+hmm ||0.162/0.426/'''0.234''' ||0.068/0.309/'''0.112''' ||0.286/0.794/'''0.421''' ||0.025/0.400/'''0.047''' || 
-|| ter+hmm ||0.116/0.402/'''0.180''' ||0.030/0.184/'''0.051''' ||0.145/0.912/'''0.251''' ||0.026/0.181/'''0.046''' || 
-|| gizadiag+hmm ||0.186/0.515/'''0.273''' ||0.040/0.215/'''0.067''' ||0.297/0.836/'''0.438''' ||0.039/0.238/'''0.067''' || 
-|| gizainter+hmm ||0.194/0.505/'''0.281''' ||0.062/0.282/'''0.101''' ||0.299/0.806/'''0.436''' ||0.033/0.382/'''0.061''' || 
-|| berkeley+hmm ||0.203/0.548/'''0.297''' ||0.049/0.320/'''0.085''' ||0.290/0.816/'''0.428''' ||0.041/0.277/'''0.071''' || 
-|| czengdiag+hmm ||0.190/0.517/'''0.278''' ||0.073/0.457/'''0.126''' ||0.291/0.841/'''0.432''' ||0.039/0.238/'''0.067''' || 
-|| czenginter+hmm ||0.214/0.545/'''0.307''' ||0.093/0.525/'''0.158''' ||0.304/0.818/'''0.443''' ||0.038/0.363/'''0.068''' || 
-|| |||||||||| 
-|| berkeley+czenginter+hmm ||0.219/0.568/'''0.316''' ||0.070/0.432/'''0.120''' ||0.298/0.817/'''0.436''' ||0.048/0.290/'''0.082''' || 
-|| berkeley+czenginter+gizainter+hmm ||0.220/0.569/'''0.317''' ||0.068/0.420/'''0.118''' ||0.298/0.812/'''0.436''' ||0.048/0.290/'''0.083''' || 
-|| berkeley+czenginter+meteor+hmm ||0.220/0.569/'''0.317''' ||0.070/0.440/'''0.121''' ||0.295/0.810/'''0.433''' ||0.048/0.290/'''0.083''' || 
  
-!!! TODO 
-* test alignment with synonym detection (cz_wn required) = separating @@lex@@ and @@disam@@ 
-* order evaluation 
-** a lot of background research 
-** currently finds misplaced items, but their shift distances are off 
-*** to fix -- for every misplaced token 
-**** if it (and only it) were to be moved in the original permutation, what would be the best place? 
-**** evaluate with nr. of intersections 
-* comb and comment the code 
-* add help files 
-* integrate with the rest of Addicter 
-* learner's corpus 
-** see Anne Lüdelig, TLT9 
-* adapt to Sara's program 
-* alternative to reference-based evaluation: "Inconsistencies in Penn parsing", M. Dickinson 
