Institute of Formal and Applied Linguistics Wiki



Reading Group Presentation Report from 26.03.2012

Paper

Distributed Word Clustering for Large Scale Class-Based Language Modeling
in Machine Translation by Jakob Uszkoreit and Thorsten Brants

Presented by

Long DT

Report by

Joachim Daiber

Overview and Notes from the Paper

Difference between the predictive and the two-sided class-based model

Two-sided class-based model:
$ P(w_i \mid w_1^{i-1}) \approx p_0(w_i \mid c(w_i)) \cdot p_1(c(w_i) \mid c(w_{i-n+1}^{i-1})) $

Predictive class-based model:
$ P(w_i \mid w_1^{i-1}) \approx p_0(w_i \mid c(w_i)) \cdot p_1(c(w_i) \mid w_{i-n+1}^{i-1}) $

The main difference is that the history of $p_1$ consists of the preceding words themselves rather than their classes.

Why does this improve the algorithm?

Exchange clustering

Complexity of Exchange Clustering


$ O(I \cdot (2 \cdot B + N_v \cdot N_c \cdot (N_c^{pre} + N_c^{suc}))) $

Predictive Exchange Clustering


$ P(w_i \mid w_1^{i-1}) \approx p_0(w_i \mid c(w_i)) \cdot p_1(c(w_i) \mid w_{i-1}) = \frac{N(w_i)}{N(c(w_i))} \cdot \frac{N(w_{i-1}, c(w_i))}{N(w_{i-1})} $
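The count-based estimate above can be tried out on a toy corpus. The following is a minimal Python sketch, not the implementation from the paper: the function names `train_counts` and `predictive_prob`, the toy sentence and the word-to-class mapping are all made up for illustration. It also assumes that $N(w_{i-1})$ is counted over history positions only (every token except the last), so that $p_1$ sums to one.

```python
from collections import Counter

def train_counts(tokens, word2class):
    """Collect the four count tables of the bigram predictive model."""
    n_word = Counter(tokens)                          # N(w)
    n_class = Counter(word2class[w] for w in tokens)  # N(c)
    # Assumption: N(w_{i-1}) counts a word only where it acts as a history.
    n_hist = Counter(tokens[:-1])
    # N(w_{i-1}, c(w_i)): previous word followed by a word of class c.
    n_hist_class = Counter(
        (prev, word2class[cur]) for prev, cur in zip(tokens, tokens[1:])
    )
    return n_word, n_class, n_hist, n_hist_class

def predictive_prob(prev, cur, word2class, counts):
    """P(cur | prev) ~ N(cur)/N(c(cur)) * N(prev, c(cur))/N(prev)."""
    n_word, n_class, n_hist, n_hist_class = counts
    c = word2class[cur]
    if n_hist[prev] == 0 or n_class[c] == 0:
        return 0.0  # unseen history or empty class; no smoothing in this sketch
    p0 = n_word[cur] / n_class[c]
    p1 = n_hist_class[(prev, c)] / n_hist[prev]
    return p0 * p1

# Hypothetical toy data: three classes, chosen by hand.
tokens = "the cat sat on the mat".split()
word2class = {"the": 0, "on": 0, "cat": 1, "mat": 1, "sat": 2}
counts = train_counts(tokens, word2class)
print(predictive_prob("the", "cat", word2class, counts))  # 0.5
```

Here "cat" and "mat" share a class, so after "the" the class is predicted with probability 1 and the word is then chosen within the class with probability 1/2.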

New complexity


$ O(I \cdot N_c \cdot (B + N_v)) $

The advantage: moving a word from one class to another affects only the counts of those two classes.
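To illustrate why a move touches only two classes, the sketch below updates the bigram count tables $N(c)$ and $N(w_{i-1}, c)$ in place when a word changes class, and checks the result against a full recount. It is a hedged illustration under simple assumptions, not the paper's code: `build_counts`, `move_word`, the toy sentence and the class ids are hypothetical, and the evaluation of the clustering objective itself is left out.

```python
from collections import Counter, defaultdict

def build_counts(tokens, word2class):
    """Count tables for the bigram predictive model, built from scratch."""
    n_word = Counter(tokens)                          # N(w)
    n_class = Counter(word2class[w] for w in tokens)  # N(c)
    n_hist_class = Counter()                          # N(v, c): v followed by class c
    preds = defaultdict(Counter)                      # preds[w][v] = N(v, w)
    for prev, cur in zip(tokens, tokens[1:]):
        n_hist_class[(prev, word2class[cur])] += 1
        preds[cur][prev] += 1
    return n_word, n_class, n_hist_class, preds

def move_word(word, new_class, word2class, n_word, n_class, n_hist_class, preds):
    """Move `word` to `new_class`, touching only old-class and new-class counts."""
    old_class = word2class[word]
    if old_class == new_class:
        return
    n_class[old_class] -= n_word[word]
    n_class[new_class] += n_word[word]
    # Only the distinct predecessors of `word` need to be visited.
    for prev, n in preds[word].items():
        n_hist_class[(prev, old_class)] -= n
        n_hist_class[(prev, new_class)] += n
    word2class[word] = new_class

# Hypothetical toy data: move "mat" from class 1 to class 2.
tokens = "the cat sat on the mat".split()
word2class = {"the": 0, "on": 0, "cat": 1, "mat": 1, "sat": 2}
n_word, n_class, n_hist_class, preds = build_counts(tokens, word2class)
move_word("mat", 2, word2class, n_word, n_class, n_hist_class, preds)

# The incremental update agrees with a full recount (unary + drops zero entries).
fresh = build_counts(tokens, word2class)
assert +n_class == +fresh[1]
assert +n_hist_class == +fresh[2]
```

The loop in `move_word` runs over the predecessors of the moved word only, and every entry it changes belongs to the old or the new class; counts for all other classes stay as they were.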

Distributed clustering

Experiments

Conclusion

Questions discussed in the Reading Group

Checkpoint questions

