Addicter
The introductory page on the Addicter project is here.
This page lies in the external namespace and is intended for collaboration with people outside of ÚFAL.
TODOs
- test alignment with synonym detection (cz_wn required) = separating lex:: and disam::
- try misplaced phrase detection
- parse the reference, project the parse onto the hypothesis via the word alignment, and get phrases from there (see the first sketch below this list)
- group adjacent words aligned to the same word? (imitation)
- order evaluation
- currently finds misplaced items, but their shift distances are off
- not important for ows vs. owl, but worth fixing: for every misplaced token
- if it (and only it) were to be moved in the original permutation, what would be the best place?
- evaluate with the number of intersections (see the second sketch below this list)
- try domain adaptation for word alignment with the “via source” alignment (EMNLP 2011 paper)
- technical
- clean up and comment the code
- add help files
- integrate with the rest of Addicter
- approach applicable to learner corpora
- see Anke Lüdeling, TLT9
- try Blast (Sara's program for translation error markup)
- alternative to reference-based evaluation: “Inconsistencies in Penn parsing”, M. Dickinson
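One reading of the phrase-projection item above: take each contiguous phrase from the reference parse and map it through the word alignment to a hypothesis span. A minimal sketch of that step, assuming the alignment is given as (hypothesis index, reference index) pairs; all names are illustrative and not part of Addicter:

```python
def project_phrase(ref_span, alignment):
    """Project a reference phrase (a set of reference token indices)
    onto the hypothesis through a word alignment.

    alignment is an iterable of (hyp_idx, ref_idx) pairs.
    Returns the smallest hypothesis span covering all tokens aligned
    to the phrase, as an inclusive (start, end) pair, or None.
    """
    hyp_positions = sorted(h for (h, r) in alignment if r in ref_span)
    if not hyp_positions:
        return None
    return (hyp_positions[0], hyp_positions[-1])


# Reference phrase covering reference tokens 2..4:
alignment = [(0, 0), (1, 2), (2, 3), (4, 4), (5, 6)]
print(project_phrase({2, 3, 4}, alignment))  # -> (1, 4)
```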
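For the order-evaluation item, the "best place" for a single misplaced token can be found by brute force: remove it, try every insertion point, and keep the one with the fewest crossing alignment links. A small sketch under that assumption (not necessarily how Addicter computes shift distances); the permutation is the sequence of reference positions in hypothesis order:

```python
def crossings(perm):
    """Number of crossing link pairs (inversions) in a permutation of
    reference positions listed in hypothesis order."""
    return sum(1 for i in range(len(perm))
                 for j in range(i + 1, len(perm))
                 if perm[i] > perm[j])


def best_position(perm, i):
    """If only the token at hypothesis position i may be moved, find the
    insertion point that minimizes crossings.  Returns (index, crossings)."""
    rest = perm[:i] + perm[i + 1:]
    scored = [(crossings(rest[:k] + [perm[i]] + rest[k:]), k)
              for k in range(len(perm))]
    best_score, best_index = min(scored)
    return best_index, best_score


# The token at hypothesis position 0 carries reference position 3:
print(best_position([3, 0, 1, 2], 0))  # -> (3, 0): moving it to the end removes all crossings
```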
Word Alignment -- Progress and Results
Latest best results
Alternative model comparison
hmm = lightweight direct alignment method (in our ACL/TSD article)
gizainter = GIZA++, intersection – applied to hypotheses+references directly
gizadiag = GIZA++, grow-diag – applied to hypotheses+references directly
czenginter = align source+CzEng to reference+CzEng and source+CzEng to hypotheses+CzEng with GIZA++, take the intersection, and extract the hypothesis-reference alignments from there (“Dan's method”; the composition step is sketched below this list)
czengdiag = the same, but with GIZA++ grow-diag
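The czeng* variants never align hypothesis and reference directly; both are aligned to the same source (together with CzEng data), and the two alignments are then composed through the shared source positions. A minimal sketch of that composition step, assuming alignments are sets of index pairs; the function name is illustrative:

```python
def compose_via_source(src_hyp, src_ref):
    """Compose a source-hypothesis and a source-reference alignment into a
    hypothesis-reference alignment through shared source tokens.

    Both inputs are sets of (src_idx, tgt_idx) pairs; the result is the set
    of (hyp_idx, ref_idx) pairs linked by at least one common source token.
    """
    ref_by_src = {}
    for s, r in src_ref:
        ref_by_src.setdefault(s, set()).add(r)
    return {(h, r) for (s, h) in src_hyp for r in ref_by_src.get(s, ())}


# Source token 1 links hypothesis token 0 with reference token 2:
print(compose_via_source({(1, 0), (2, 3)}, {(1, 2), (5, 4)}))  # -> {(0, 2)}
```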
Precision/Recall/F-score per error category:

| Alignment | Lex | Order | Punct | Miss |
|---|---|---|---|---|
| ter* | 0.106/0.387/0.167 | 0.025/0.191/0.044 | 0.132/0.936/0.232 | 0.026/0.170/0.046 |
| meteor | 0.092/0.251/0.135 | 0.047/0.229/0.078 | 0.248/0.665/0.361 | 0.020/0.382/0.038 |
| hmm | 0.162/0.426/0.234 | 0.069/0.309/0.112 | 0.281/0.793/0.415 | 0.025/0.400/0.047 |
| lcs | 0.168/0.462/0.247 | 0.000/0.000/0.000 | 0.293/0.848/0.435 | 0.026/0.374/0.049 |
| gizainter | 0.170/0.483/0.252 | 0.049/0.137/0.072 | 0.284/0.878/0.429 | 0.029/0.409/0.054 |
| gizadiag* | 0.183/0.512/0.270 | 0.044/0.250/0.075 | 0.285/0.784/0.417 | 0.038/0.224/0.065 |
| czengdiag* | 0.187/0.514/0.275 | 0.069/0.455/0.120 | 0.230/0.883/0.365 | 0.035/0.234/0.061 |
| berkeley* | 0.200/0.540/0.291 | 0.050/0.330/0.087 | 0.292/0.844/0.434 | 0.039/0.267/0.068 |
| czenginter | 0.197/0.543/0.290 | 0.108/0.475/0.176 | 0.233/0.926/0.372 | 0.032/0.402/0.060 |
* non-1-to-1 alignments, converted to 1-to-1 via “align-hmm.pl -x -a …”
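The cells above are precision/recall/F-score of the automatically flagged error tokens against the manual annotation, computed per error category. A minimal sketch of that arithmetic, assuming the flags of each category are represented as sets of token positions; names are illustrative:

```python
def prf(predicted, gold):
    """Precision, recall and F1 of predicted error tokens against the
    gold-annotated ones (both are sets of flagged token positions)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Two of three flagged tokens are correct, two gold errors are missed:
print(prf({1, 5, 9}, {1, 5, 7, 8}))  # -> approximately (0.667, 0.5, 0.571)
```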
Alignment combinations
via weighted HMM (see the sketch after the table below)
Precision/Recall/F-score per error category:

| Combination | Lex | Order | Punct | Miss |
|---|---|---|---|---|
| ter+hmm | 0.116/0.402/0.180 | 0.030/0.184/0.051 | 0.145/0.912/0.251 | 0.026/0.181/0.046 |
| meteor+hmm | 0.162/0.426/0.234 | 0.068/0.309/0.112 | 0.286/0.794/0.421 | 0.025/0.400/0.047 |
| gizadiag+hmm | 0.186/0.515/0.273 | 0.040/0.215/0.067 | 0.297/0.836/0.438 | 0.039/0.238/0.067 |
| gizainter+hmm | 0.194/0.505/0.281 | 0.062/0.282/0.101 | 0.299/0.806/0.436 | 0.033/0.382/0.061 |
| berkeley+hmm | 0.203/0.548/0.297 | 0.049/0.320/0.085 | 0.290/0.816/0.428 | 0.041/0.277/0.071 |
| czengdiag+hmm | 0.190/0.517/0.278 | 0.073/0.457/0.126 | 0.291/0.841/0.432 | 0.039/0.238/0.067 |
| czenginter+hmm | 0.214/0.545/0.307 | 0.093/0.525/0.158 | 0.304/0.818/0.443 | 0.038/0.363/0.068 |
| berk+czengint+hmm | 0.219/0.568/0.316 | 0.070/0.432/0.120 | 0.298/0.817/0.436 | 0.048/0.290/0.082 |
| berk+czengint+gizaint+hmm | 0.220/0.569/0.317 | 0.068/0.420/0.118 | 0.298/0.812/0.436 | 0.048/0.290/0.083 |
| berk+czengint+meteor+hmm | 0.220/0.569/0.317 | 0.070/0.440/0.121 | 0.295/0.810/0.433 | 0.048/0.290/0.083 |
| berk+czengint+meteor+gizaint+hmm | 0.221/0.571/0.318 | 0.068/0.424/0.118 | 0.298/0.808/0.436 | 0.049/0.292/0.084 |
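How the combination “via weighted HMM” is realized internally is not spelled out here; one plausible reading is that the individual alignments cast weighted votes on links, and the vote scores then bias the HMM aligner. The sketch below shows only such a voting step and is an assumption, not Addicter's actual combination code:

```python
from collections import Counter

def weighted_link_votes(alignments, weights=None):
    """Collect weighted votes for every (hyp, ref) link from several
    alignments (each a set of (hyp_idx, ref_idx) pairs).  The scores could
    then be used to bias a downstream HMM-style aligner."""
    if weights is None:
        weights = [1.0] * len(alignments)
    votes = Counter()
    for alignment, weight in zip(alignments, weights):
        for link in alignment:
            votes[link] += weight
    return votes


# A link proposed by both aligners gets twice the weight:
print(weighted_link_votes([{(0, 0), (1, 2)}, {(0, 0), (1, 1)}]))
# Counter({(0, 0): 2.0, (1, 2): 1.0, (1, 1): 1.0})
```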