Multilingual Noise-Robust Supervised Morphological Analysis using the WordFrame Model

Comments

In this paper the author presents a new supervised method for lemmatization, called the WordFrame model.
This new method is compared to the existing End-Of-String method and is shown to perform better in most cases.
A combination of both methods gives even better results.
The results are evaluated on 30 different languages, with a median accuracy of 97.5%.
The WordFrame model trains well on noisy data, so it can be used in co-training with unsupervised methods.

Both models described in this paper are meant to decompose a word into a few basic parts (not morphemes, but similar).

End-Of-String model: decomposition of an inflection into

prefix - concatenation of all prefixes
primary common substring - the stem
point of suffixation change - phonologically induced letter change on the boundary of stem and suffix
suffix/ending - concatenation of all suffixes of the word
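
The decomposition above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes an empty prefix, a small hypothetical suffix list, and that the lemma is already known. The function name decompose_suffix and the suffix list are made up for illustration.

```python
def decompose_suffix(inflection, lemma, suffixes=("ing", "ed", "s")):
    """Split an inflection into (stem, point-of-suffixation change, suffix).

    Illustrative assumptions: no prefix, a hypothetical suffix list,
    and a known lemma to compare against.
    """
    # Pick the longest listed suffix the inflection ends with (may be empty).
    suffix = max((s for s in suffixes if inflection.endswith(s)),
                 key=len, default="")
    body = inflection[:len(inflection) - len(suffix)]

    # The stem is the longest common prefix of the remaining body and the lemma.
    n = 0
    while n < min(len(body), len(lemma)) and body[n] == lemma[n]:
        n += 1
    stem = body[:n]

    # Whatever is left differs at the stem/suffix boundary:
    # (letters in the inflection, letters in the lemma).
    change = (body[n:], lemma[n:])
    return stem, change, suffix


print(decompose_suffix("cried", "cry"))      # ('cr', ('i', 'y'), 'ed')
print(decompose_suffix("hopping", "hop"))    # ('hop', ('p', ''), 'ing')
```

In "cried" the boundary change is i -> y, and in "hopping" the doubled p is the point-of-suffixation change.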

WordFrame model: decomposition of an inflection into

prefix - concatenation of all prefixes
point of prefixation change - phonologically induced letter change on the boundary of the first part of the stem and the prefix
secondary common substring - the part of the stem before the stem vowel change
vowel change - the vowel change inside the stem
primary common substring - the part of the stem after the vowel change
point of suffixation change - phonologically induced letter change on the boundary of stem and suffix
suffix/ending - concatenation of all suffixes of the word
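
The stem-internal part of this decomposition can be sketched as follows. This is only an illustration under simplifying assumptions, not the paper's procedure: the prefix, suffix and boundary changes are assumed to be already stripped, the lemma is known, and the code does not check that the changed letters are actually vowels. The name split_internal_change is hypothetical.

```python
def split_internal_change(inflected_stem, lemma_stem):
    """Return (secondary common substring, internal change, primary common substring).

    Assumes affixes and boundary changes were removed beforehand.
    """
    a, b = inflected_stem, lemma_stem
    shortest = min(len(a), len(b))

    # Secondary common substring: the shared letters before the change.
    i = 0
    while i < shortest and a[i] == b[i]:
        i += 1

    # Primary common substring: the shared letters after the change,
    # taken from what is left once the common prefix is excluded.
    j = 0
    while j < shortest - i and a[len(a) - 1 - j] == b[len(b) - 1 - j]:
        j += 1

    secondary = a[:i]
    primary = a[len(a) - j:]
    change = (a[i:len(a) - j], b[i:len(b) - j])
    return secondary, change, primary


print(split_internal_change("sang", "sing"))   # ('s', ('a', 'i'), 'ng')
print(split_internal_change("feet", "foot"))   # ('f', ('ee', 'oo'), 't')
```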

Doesn't do morphological analysis, only lemmatization
Experiments done only on verbs
The paper doesn't say which option the algorithm selects when there are several possible correct results
The algorithm uses only features based on the word itself, it doesn't use context
With the information given in this paper, we wouldn't be able to write a program to verify the results

Does the term "point of prefixation" mean the same as the term "morpheme boundary"?

Written by Martin Kirschner