
Institute of Formal and Applied Linguistics Wiki




Overview of the article:
1. Introduction
2. Related work
3-5. Three challenges and approaches to overcome them
6. Experiment
7. Conclusion

The article is about overcoming the problem of vocabulary sparsity in SMT. Sparsity arises because many words can be inflected or take different affixes, so not all of these forms appear in the vocabulary.
The authors identify three problems and present their methods to overcome these challenges:
(1) common stems are fragmented into many different forms in training data;
(2) rare and unknown words are frequent in test data;
(3) spelling variation creates additional sparseness problems.
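To make challenge (1) concrete, here is a minimal sketch (not the paper's method; the corpus, the suffix list, and the `stem` helper are all hypothetical) of how inflected forms fragment the counts of a single stem, and how a crude suffix-stripping step merges them back together:

```python
from collections import Counter

# Toy training corpus: inflected forms fragment the counts of one stem.
tokens = ["walk", "walks", "walked", "walking", "walk", "walks"]

surface_counts = Counter(tokens)

# Crude suffix-stripping stemmer (illustrative only, not the paper's technique).
def stem(word):
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stem_counts = Counter(stem(t) for t in tokens)

print(len(surface_counts))  # 4 distinct surface forms
print(len(stem_counts))     # 1 stem after stripping
```

The point is only that four vocabulary entries collapse into one, so each training example contributes evidence to a single, better-estimated unit instead of being spread over rare forms.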

To solve the indicated problems, the authors modify the aligned bilingual training and test data.
For the first challenge they do not intend to do complex morphological analysis; instead they apply a lightweight technique.

We are not absolutely sure about the terminology of the article.
In mathematics, a lattice is a partially ordered set in which any two elements have a unique supremum (the elements' least upper bound; called their join) and an infimum (greatest lower bound; called their meet).
The lattice in Figure 1(b) seems to have a direction, so it might be a confusion network rather than a lattice.
A Confusion Network (CN), also known as a sausage, is a weighted directed graph with the peculiarity that each path from the start node to the end node goes through all the other nodes. Each edge is labeled with a word and a (posterior) probability.
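A small sketch of this definition (the words and probabilities are made up for illustration): a CN can be stored as a sequence of slots, each slot mapping edge labels to posterior probabilities, and every path is forced to pick exactly one edge per slot.

```python
from itertools import product

# Toy confusion network ("sausage"): a sequence of slots; every path from
# the start node to the end node passes through every slot. Each edge
# carries a word and a posterior probability (values are hypothetical).
cn = [
    {"he": 0.8, "the": 0.2},
    {"walks": 0.6, "talks": 0.4},
]

# Enumerate all paths and their joint probabilities.
paths = []
for combo in product(*(slot.items() for slot in cn)):
    words = [w for w, _ in combo]
    prob = 1.0
    for _, p in combo:
        prob *= p
    paths.append((" ".join(words), prob))

best = max(paths, key=lambda x: x[1])
print(best)  # ('he walks', 0.48)
```

Because every path visits every slot, the path probabilities sum to 1 whenever each slot's posteriors do, which is the property that distinguishes a sausage from a general lattice.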

