Alignment by Agreement

Percy Liang, Ben Taskar, Dan Klein, link

Section 2 -- discussion of previous alignment models

IBM Models 1 and 2, and the HMM alignment model

We can have zero distortion (no jump; the alignment stays right where the model expects it).

If the final symmetrization is intersection, we can't get an alignment like this:

1-1 1-2

Each direction is a function (every word picks at most one counterpart), so the intersection is at most one-to-one. (And with the joint model introduced in the paper, such one-to-many alignments should at least be unlikely.)
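A minimal sketch of the set view behind this (alignments as sets of (src, tgt) links; the variable names are mine):

    # In the direction where each TARGET word picks one source word, both
    # links 1-1 and 1-2 can appear (tgt 1 -> src 1, tgt 2 -> src 1). In the
    # reverse direction, source word 1 picks exactly one target word, so at
    # most one of the two links exists there.
    forward  = {(1, 1), (1, 2)}   # (src, tgt) links from one direction
    backward = {(1, 1)}           # the other direction kept only one of them

    print(forward & backward)     # {(1, 1)} -- intersection is at most 1-to-1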

IBM-2's distortion distribution has its maximum at zero.
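The distortion parameterizations in play, as I reconstruct them from the note's later remarks about the floor and the bucketing (an assumption, not a quote from the paper):

    HMM:   c(a_j - a_{j-1})            (jump from the previous aligned position)
    IBM-2: c(a_j - floor(j * l / m))   (displacement from the diagonal)

The point above is that the IBM-2 c peaks at zero displacement, i.e. on the diagonal.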

Figure 2 – higher threshold ⇒ higher precision ⇒ lower (100 − precision). Strictly that quantity is the false-discovery rate (the share of predicted links that are wrong), not the classical FPR.
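A sketch of the thresholded posterior decoding behind such a curve (keep a link iff its posterior clears the threshold; the posterior numbers here are made up):

    # Raising delta drops the least confident links first,
    # which tends to raise precision (and lower recall).
    posteriors = {(0, 0): 0.9, (1, 1): 0.6, (2, 1): 0.3}   # toy values
    gold = {(0, 0), (1, 1)}

    for delta in (0.2, 0.5, 0.8):
        predicted = {link for link, p in posteriors.items() if p > delta}
        precision = len(predicted & gold) / len(predicted)
        print(delta, sorted(predicted), precision)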

Sum of c(pos) minus sum of c(neg): is the distortion biased to one side? Tested on CzEng:
mean(C_HMM) = 1.45
mean(C_IBM2) = 0.4 (disregarding bucketing), -0.01 (removing the floor from the formula)
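My guess at what these means measure (the distortion formulas follow the reconstruction above; loading CzEng is omitted and the toy alignment vector is made up):

    from statistics import mean

    def hmm_jumps(a):
        # HMM distortion values: jump between consecutive aligned positions.
        return [a[j] - a[j - 1] for j in range(1, len(a))]

    def ibm2_displacements(a, l, use_floor=True):
        # IBM-2 distortion values: displacement from the diagonal j*l/m.
        m = len(a)
        diag = (lambda j: int(j * l / m)) if use_floor else (lambda j: j * l / m)
        return [a[j] - diag(j) for j in range(m)]

    a, l = [0, 2, 2, 3], 5        # a[j] = source position of target word j
    print(mean(hmm_jumps(a)))                     # C_HMM analogue
    print(mean(ibm2_displacements(a, l)))         # with floor
    print(mean(ibm2_displacements(a, l, False)))  # floor removed: mean drops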

Main algorithm -- Section 3.

Garbage-collector effect: rare words end up aligned to whatever is left over, precisely because they are rare.

Instead of the directional a_j, we work with z, a latent variable ranging over the space of all alignments, to which the models assign probabilities. This representation is universal – the same z can be used for both sides of the alignment.
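A small sketch of why z is direction-agnostic (function names are mine): cast each directional alignment vector into a set of (src, tgt) links, and both directions land in the same space.

    def to_z_forward(a):            # a[j] = source position of target word j
        return {(a[j], j) for j in range(len(a))}

    def to_z_backward(b):           # b[i] = target position of source word i
        return {(i, b[i]) for i in range(len(b))}

    print(to_z_forward([0, 0, 1]))  # {(0, 0), (0, 1), (1, 2)}
    print(to_z_backward([1, 2]))    # {(0, 1), (1, 2)} -- same kind of object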

Define a joint objective. Instead of maximizing the two directions independently:

log p_1(x ; theta_1) + log p_2(x ; theta_2)
(the thetas are all the model parameters)

Use formula (3). Its agreement term is the inner product of the two posterior vectors over alignments (each summing to one) – maximal when both posteriors concentrate on the same alignments, so it acts as a similarity measure (not literally cosine similarity, though, since the vectors are L1-normalized, not L2-normalized). We want the forward and backward alignments to be similar.
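For reference, formula (3) as I recall it from the paper, in the note's own notation:

    max over theta_1, theta_2 of
    sum_x [ log p_1(x ; theta_1) + log p_2(x ; theta_2)
            + log sum_z p_1(z | x ; theta_1) p_2(z | x ; theta_2) ]

The last log term is the agreement term; the sum over z is the inner product discussed above.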

The latent variable is the alignment. Apply EM, as formulated in 3.2. The updates are, however, not guaranteed to maximize (or even increase) the objective: the exact E step (expectations under the product of the two posteriors) is intractable, so the paper substitutes a heuristic product of per-link marginals.
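A sketch of that heuristic E step as I understand it (matrix shapes and names are mine): multiply the two directions' per-link posterior marginals elementwise and use the products as soft counts in each model's M step.

    def agree(post_fwd, post_bwd):
        # post_*[i][j]: posterior that source i links to target j, per model
        return [[p1 * p2 for p1, p2 in zip(row1, row2)]
                for row1, row2 in zip(post_fwd, post_bwd)]

    post_fwd = [[0.9, 0.1], [0.2, 0.8]]   # toy link posteriors, one direction
    post_bwd = [[0.7, 0.3], [0.4, 0.6]]   # toy link posteriors, other direction
    print(agree(post_fwd, post_bwd))      # links both models trust stay strong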

We should try the Berkeley aligner. It ignores letter case :)