Addicter
The introductory page on the Addicter project is here.
This page lies in the external namespace and is intended for collaboration with people outside of ÚFAL.
Word Alignment -- Progress and Results
Latest best results
Alternative model comparison
hmm = lightweight direct alignment method (in our ACL/TSD article)
gizainter = GIZA++, intersection – applied to hypotheses+references directly
gizadiag = GIZA++, grow-diag – applied to hypotheses+references directly
czenginter = GIZA++, intersection – align source+CzEng to reference+CzEng and source+CzEng to hypotheses+CzEng, then extract the hypothesis-reference alignments from these two alignments (“Dan's method”)
czengdiag = same, but with GIZA++ grow-diag
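The two ideas behind the names above can be sketched in a few lines of Python. This is only an illustration, not the project's code: `intersect` shows what the "intersection" symmetrization keeps from two directional runs, and `pivot_through_source` shows one plausible way to derive hypothesis-reference links from two source-side alignments. The page does not spell out how “Dan's method” performs the extraction, so linking tokens that share a source token is an assumption, and both function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# An alignment is a set of (left token index, right token index) links.
Alignment = Set[Tuple[int, int]]


def intersect(forward: Alignment, backward: Alignment) -> Alignment:
    """Keep only the links proposed by both directional runs.

    `forward` holds (hyp, ref) links; `backward` holds (ref, hyp) links
    from the reverse run, so it is flipped before intersecting.
    """
    flipped = {(h, r) for (r, h) in backward}
    return forward & flipped


def pivot_through_source(src_hyp: Alignment, src_ref: Alignment) -> Alignment:
    """Assumed extraction step: link hypothesis token h to reference
    token r whenever both are aligned to the same source token."""
    refs_by_src: Dict[int, Set[int]] = {}
    for s, r in src_ref:
        refs_by_src.setdefault(s, set()).add(r)
    return {(h, r) for (s, h) in src_hyp for r in refs_by_src.get(s, ())}


if __name__ == "__main__":
    print(intersect({(0, 0), (1, 1), (2, 1)}, {(0, 0), (1, 1)}))
    print(pivot_through_source({(0, 0), (1, 2)}, {(0, 1), (1, 2), (1, 3)}))
```

Intersection can only drop links, which is why it tends toward higher precision, while grow-diag starts from the intersection and adds neighbouring links from the union.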
Precision/Recall/F-score | Lex | Order | Punct | Miss |
---|---|---|---|---|
ter* | 0.106/0.387/0.167 | 0.025/0.191/0.044 | 0.132/0.936/0.232 | 0.026/0.170/0.046 |
meteor | 0.092/0.251/0.135 | 0.047/0.229/0.078 | 0.248/0.665/0.361 | 0.020/0.382/0.038 |
hmm | 0.162/0.426/0.234 | 0.069/0.309/0.112 | 0.281/0.793/0.415 | 0.025/0.400/0.047 |
lcs | 0.168/0.462/0.247 | 0.000/0.000/0.000 | 0.293/0.848/0.435 | 0.026/0.374/0.049 |
gizainter | 0.170/0.483/0.252 | 0.049/0.137/0.072 | 0.284/0.878/0.429 | 0.029/0.409/0.054 |
gizadiag* | 0.183/0.512/0.270 | 0.044/0.250/0.075 | 0.285/0.784/0.417 | 0.038/0.224/0.065 |
berkeley* | 0.200/0.540/0.291 | 0.050/0.330/0.087 | 0.292/0.844/0.434 | 0.039/0.267/0.068 |
Explicit wrong lex choice detection | | | | |
czengdiag* | 0.187/0.514/0.275 | 0.069/0.455/0.120 | 0.230/0.883/0.365 | 0.035/0.234/0.061 |
czenginter | 0.197/0.543/0.290 | 0.108/0.475/0.176 | 0.233/0.926/0.372 | 0.032/0.402/0.060 |
* non-1-to-1 alignments, converted to 1-to-1 via “align-hmm.pl -x -a …”
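Each cell in the table packs three numbers into one `P/R/F` string. The sketch below parses such a cell and checks that the third number matches the balanced F-score of the first two. The page does not say which F-measure is used, so F1 = 2PR/(P+R) is an assumption here; the helper names and the tolerance (which allows for the three-decimal rounding of P and R) are hypothetical too.

```python
from typing import Tuple


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (harmonic mean); defined as 0 when both inputs are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def parse_cell(cell: str) -> Tuple[float, float, float]:
    """Split a 'P/R/F' table cell into three floats."""
    p, r, f = (float(part) for part in cell.strip().split("/"))
    return p, r, f


def cell_is_consistent(cell: str, tol: float = 0.002) -> bool:
    """True if the reported F is within `tol` of F1 recomputed from P and R."""
    p, r, f = parse_cell(cell)
    return abs(f_score(p, r) - f) <= tol


if __name__ == "__main__":
    # Two cells copied from the table above: ter* Lex and lcs Order.
    for cell in ("0.106/0.387/0.167", "0.000/0.000/0.000"):
        print(cell, cell_is_consistent(cell))
```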
Alignment combinations
via weighted HMM
Precision/Recall/F-score | Lex | Order | Punct | Miss |
---|---|---|---|---|
ter+hmm | 0.116/0.402/0.180 | 0.030/0.184/0.051 | 0.145/0.912/0.251 | 0.026/0.181/0.046 |
meteor+hmm | 0.162/0.426/0.234 | 0.068/0.309/0.112 | 0.286/0.794/0.421 | 0.025/0.400/0.047 |
gizadiag+hmm | 0.186/0.515/0.273 | 0.040/0.215/0.067 | 0.297/0.836/0.438 | 0.039/0.238/0.067 |
gizainter+hmm | 0.194/0.505/0.281 | 0.062/0.282/0.101 | 0.299/0.806/0.436 | 0.033/0.382/0.061 |
berkeley+hmm | 0.203/0.548/0.297 | 0.049/0.320/0.085 | 0.290/0.816/0.428 | 0.041/0.277/0.071 |
czengdiag+hmm | 0.190/0.517/0.278 | 0.073/0.457/0.126 | 0.291/0.841/0.432 | 0.039/0.238/0.067 |
czenginter+hmm | 0.214/0.545/0.307 | 0.093/0.525/0.158 | 0.304/0.818/0.443 | 0.038/0.363/0.068 |
berk+czengint+hmm | 0.219/0.568/0.316 | 0.070/0.432/0.120 | 0.298/0.817/0.436 | 0.048/0.290/0.082 |
berk+czengint+gizaint+hmm | 0.220/0.569/0.317 | 0.068/0.420/0.118 | 0.298/0.812/0.436 | 0.048/0.290/0.083 |
berk+czengint+meteor+hmm | 0.220/0.569/0.317 | 0.070/0.440/0.121 | 0.295/0.810/0.433 | 0.048/0.290/0.083 |
berk+czengint+meteor+gizaint+hmm | 0.221/0.571/0.318 | 0.068/0.424/0.118 | 0.298/0.808/0.436 | 0.049/0.292/0.084 |
TODOs
- test alignment with synonym detection (cz_wn required) = separating lex:: and disam::
- order evaluation
- a lot of background research
- currently finds misplaced items, but their shift distances are off
- to fix: for every misplaced token, ask where the best place would be if it (and only it) were moved in the original permutation
- evaluate with the number of intersections
- try domain adaptation for word alignment, EMNLP 2011 paper
- comb and comment the code
- add help files
- integrate with the rest of Addicter
- approach applicable to learner corpora
- see Anke Lüdeling, TLT9
- adapt Addicter to Sara's program
- alternative to reference-based evaluation: “Inconsistencies in Penn parsing”, M. Dickinson
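The order-evaluation item in the list above asks where a single misplaced token would best go if it alone were moved. A minimal sketch of that question follows, under stated assumptions: the hypothesis is given as a permutation of reference positions, "best place" means the position that minimizes the number of crossing alignment links (inversions), and ties go to the shorter shift. None of this is taken from the Addicter code, and the function names are hypothetical.

```python
from typing import List, Tuple


def count_inversions(perm: List[int]) -> int:
    """Number of index pairs a < b with perm[a] > perm[b] (crossing links)."""
    return sum(
        1
        for a in range(len(perm))
        for b in range(a + 1, len(perm))
        if perm[a] > perm[b]
    )


def best_single_move(perm: List[int], i: int) -> Tuple[int, int]:
    """Move only perm[i]; return (best position, inversions that remain)."""
    rest = perm[:i] + perm[i + 1:]
    best_pos, best_inv = i, count_inversions(perm)
    for pos in range(len(rest) + 1):
        candidate = rest[:pos] + [perm[i]] + rest[pos:]
        inv = count_inversions(candidate)
        # Prefer fewer crossings; on a tie prefer the shorter shift.
        if inv < best_inv or (inv == best_inv and abs(pos - i) < abs(best_pos - i)):
            best_pos, best_inv = pos, inv
    return best_pos, best_inv


if __name__ == "__main__":
    perm = [1, 2, 3, 0, 4]  # the token with reference position 0 sits at index 3
    pos, inv = best_single_move(perm, 3)
    print("shift distance:", pos - 3, "remaining inversions:", inv)
```

Here `best_single_move([1, 2, 3, 0, 4], 3)` returns `(0, 0)`: moving that one token three places to the left removes every crossing. The brute-force search is quadratic per candidate position, which should be acceptable at sentence length.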