Reading Group Presentation Report from 26.03.2012

Paper

Distributed Word Clustering for Large Scale Class-Based Language Modeling
in Machine Translation by Jakob Uszkoreit and Thorsten Brants

Presented by

Long DT

Report by

Joachim Daiber

A PDF version of this report (with better display of formulas) can be found here.

Overview and Notes from the Paper

Difference between the predictive and the two-sided class-based model

Two-sided class-based model:
$$P(w_i \mid w_1^{i-1}) \approx p_0(w_i \mid c(w_i)) \cdot p_1(c(w_i) \mid c(w_{i-n+1}^{i-1}))$$

Predictive class-based model:
$$P(w_i \mid w_1^{i-1}) \approx p_0(w_i \mid c(w_i)) \cdot p_1(c(w_i) \mid w_{i-n+1}^{i-1})$$

The main difference is that the history of p1 consists of the preceding words themselves rather than their classes.
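
To make the difference concrete, here is a minimal sketch that evaluates both factorizations for the bigram case (n = 2) with plain relative-frequency estimates. The toy corpus, the word-to-class map and all function names are made up for illustration and are not taken from the paper.

```python
from collections import Counter

# Toy corpus and a hypothetical word-to-class map (illustration only).
corpus = "the cat sat on the mat the dog sat on the rug".split()
word_class = {"the": "DET", "on": "PREP", "sat": "VERB",
              "cat": "NOUN", "dog": "NOUN", "mat": "NOUN", "rug": "NOUN"}

bigrams = list(zip(corpus, corpus[1:]))  # (history word v, predicted word w)

# Counts for p0(w | c(w)).
word_count = Counter(corpus)
class_count = Counter(word_class[w] for w in corpus)

# Counts for the two-sided p1(c(w) | c(v)): class history.
class_bigram = Counter((word_class[v], word_class[w]) for v, w in bigrams)
class_hist = Counter(word_class[v] for v, _ in bigrams)

# Counts for the predictive p1(c(w) | v): word history.
word_class_bigram = Counter((v, word_class[w]) for v, w in bigrams)
word_hist = Counter(v for v, _ in bigrams)


def p0(w):
    """Relative frequency of word w within its class."""
    return word_count[w] / class_count[word_class[w]]


def two_sided(w, v):
    """p0(w | c(w)) * p1(c(w) | c(v)): the history is the class of v."""
    c_w, c_v = word_class[w], word_class[v]
    return p0(w) * class_bigram[(c_v, c_w)] / class_hist[c_v]


def predictive(w, v):
    """p0(w | c(w)) * p1(c(w) | v): the history is the word v itself."""
    c_w = word_class[w]
    return p0(w) * word_class_bigram[(v, c_w)] / word_hist[v]


print(two_sided("sat", "cat"), predictive("sat", "cat"))
```

In this toy data the history word "cat" is always followed by a verb, whereas its class NOUN is followed by a verb in only two of three cases, so the predictive model assigns "sat" a probability of 1.0 after "cat" while the two-sided model assigns 2/3.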

Why does this improve the algorithm?

Exchange clustering

Complexity of Exchange Clustering


$$O(I \cdot (2 \cdot B + N_v \cdot N_c \cdot (N_c^{pre} + N_c^{suc})))$$

Predictive Exchange Clustering

(for formula see PDF version or paper)

New complexity


$$O(I \cdot N_c \cdot (B + N_v))$$

The advantage: only two classes are affected when a word is moved from one class to another.
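
The following sketch illustrates why a move touches only two classes. It assumes a bigram model with maximum-likelihood estimates, for which the clustering-dependent part of the log-likelihood of the predictive model can be written as the sum of N(v, c) log N(v, c) over word-class pairs minus the sum of N(c) log N(c) over classes. This objective is derived here from the predictive model above; the report does not reproduce the paper's exact formula, so treat it as an assumption. All function and variable names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def nlogn(n):
    """n * log(n), with the usual convention 0 * log(0) = 0."""
    return n * math.log(n) if n > 0 else 0.0


def build_counts(bigram_counts, word_class):
    """Return N(v, c) (word v followed by a word of class c) and
    N(c) (successor tokens that belong to class c)."""
    pred_class, class_total = Counter(), Counter()
    for (v, w), n in bigram_counts.items():
        pred_class[(v, word_class[w])] += n
        class_total[word_class[w]] += n
    return pred_class, class_total


def objective(bigram_counts, word_class):
    """Clustering-dependent part of the log-likelihood (assumed form)."""
    pred_class, class_total = build_counts(bigram_counts, word_class)
    return (sum(nlogn(n) for n in pred_class.values())
            - sum(nlogn(n) for n in class_total.values()))


def move_delta(w, new_c, word_class, pred_class, class_total, predecessors):
    """Change in the objective if w moves to new_c.
    Only counts of the old and the new class are inspected."""
    old_c = word_class[w]
    if old_c == new_c:
        return 0.0
    n_w = sum(predecessors[w].values())
    delta = 0.0
    for v, n in predecessors[w].items():
        for c, sign in ((old_c, -1), (new_c, +1)):
            before = pred_class[(v, c)]
            delta += nlogn(before + sign * n) - nlogn(before)
    for c, sign in ((old_c, -1), (new_c, +1)):
        before = class_total[c]
        delta -= nlogn(before + sign * n_w) - nlogn(before)
    return delta


# Toy data (illustration only): integer class ids, arbitrary initial clustering.
corpus = "the cat sat on the mat the dog sat on the rug".split()
bigram_counts = Counter(zip(corpus, corpus[1:]))
word_class = {"the": 0, "on": 0, "sat": 1, "cat": 1,
              "dog": 2, "mat": 2, "rug": 2}

predecessors = defaultdict(Counter)  # w -> {v: N(v, w)}
for (v, w), n in bigram_counts.items():
    predecessors[w][v] += n

pred_class, class_total = build_counts(bigram_counts, word_class)
delta = move_delta("cat", 2, word_class, pred_class, class_total, predecessors)

# Cross-check against a full recomputation of the objective.
moved = dict(word_class, cat=2)
full_diff = objective(bigram_counts, moved) - objective(bigram_counts, word_class)
print(abs(delta - full_diff) < 1e-9)
```

The incremental update loops only over the predecessors of the moved word and over the two classes involved, whereas the full recomputation iterates over all counts. A real implementation would also update the count tables in place after accepting a move.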

Distributed clustering

Experiments

Conclusion

Questions discussed in the Reading Group

Checkpoint questions