Multilingual Noise-Robust Supervised Morphological Analysis using the WordFrame Model
Richard Wicentowski (2004): Multilingual Noise-Robust Supervised Morphological Analysis using the WordFrame Model
Comments
Summary
- In this paper the author presents a new supervised method for lemmatization, called the WordFrame model.
- The new model is compared with the existing End-of-String model and outperforms it in most cases.
- A combination of both methods gives even better results.
- The method is evaluated on 30 different languages, with a median accuracy of 97.5%.
- The WordFrame model trains well on noisy data, so it can be used in co-training with unsupervised methods.
Described models
Both models described in this paper are meant to decompose a word into basic parts (not morphemes, but similar to them).
Extended End-of-String model
An inflected form is decomposed into:
- prefix - concatenation of all prefixes
- primary common substring - the stem
- point of suffixation change - phonologically induced letter change at the boundary between the stem and the suffix
- suffix/ending - concatenation of all suffixes of the word
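To make the decomposition concrete, here is a minimal Python sketch of how one such inflection-to-root rule could be stored and applied. It assumes each component is kept as an (inflection side, root side) pair of strings and that the stem (primary common substring) is copied unchanged into the root. The class `EndOfStringRule`, the function `apply_rule` and the example rules are hypothetical illustrations, not taken from the paper, and the sketch only applies a hand-written rule; it does not learn rules from training data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EndOfStringRule:
    """Hypothetical inflection -> root rule with extended End-of-String parts.

    Each field is an (inflection side, root side) pair. The stem (primary
    common substring) is not stored: it is whatever remains of the word once
    the other parts are stripped, and it is copied unchanged into the root.
    """
    prefix: tuple[str, str] = ("", "")
    suffixation_change: tuple[str, str] = ("", "")  # point of suffixation change
    suffix: tuple[str, str] = ("", "")


def apply_rule(rule: EndOfStringRule, inflection: str) -> Optional[str]:
    """Return the root predicted by `rule`, or None if the rule does not match."""
    infl_prefix, root_prefix = rule.prefix
    infl_change, root_change = rule.suffixation_change
    infl_suffix, root_suffix = rule.suffix
    tail = infl_change + infl_suffix
    if not (inflection.startswith(infl_prefix) and inflection.endswith(tail)):
        return None
    stem = inflection[len(infl_prefix):len(inflection) - len(tail)]
    if not stem:  # require a non-empty stem
        return None
    return root_prefix + stem + root_change + root_suffix


if __name__ == "__main__":
    # Hand-written example rules (not learned, not from the paper).
    hopping = EndOfStringRule(suffixation_change=("p", "e"), suffix=("ing", ""))
    gemacht = EndOfStringRule(prefix=("ge", ""), suffix=("t", "en"))
    print(apply_rule(hopping, "hopping"))  # hope
    print(apply_rule(gemacht, "gemacht"))  # machen
    print(apply_rule(hopping, "walked"))   # None
```

Here the point of suffixation change "p" to "e" plus the suffix "ing" to nothing maps hopping to hope, while deleting the prefix "ge" and rewriting the suffix "t" as "en" maps gemacht to machen.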
WordFrame model
An inflected form is decomposed into:
- prefix - concatenation of all prefixes
- point of prefixation change - phonologically induced letter change at the boundary between the prefix and the first part of the stem
- secondary common substring - the part of the stem before the vowel change
- vowel change - the vowel change inside the stem
- primary common substring - the part of the stem after the vowel change
- point of suffixation change - phonologically induced letter change at the boundary between the stem and the suffix
- suffix/ending - concatenation of all suffixes of the word
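A WordFrame rule can be sketched as one (inflection side, root side) string pair per component, with the secondary and primary common substrings copied unchanged. This is a hypothetical Python illustration, not the paper's implementation: `WordFrameRule`, `apply_wordframe` and the example rules are our own names. When the inflected vowel occurs more than once in the stem, the sketch simply picks the rightmost occurrence; that choice is our assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WordFrameRule:
    """Hypothetical inflection -> root rule with WordFrame-style parts.

    Every field is an (inflection side, root side) pair. The secondary and
    primary common substrings are not stored; they are copied unchanged.
    """
    prefix: tuple[str, str] = ("", "")
    prefixation_change: tuple[str, str] = ("", "")  # point of prefixation change
    vowel_change: tuple[str, str] = ("", "")
    suffixation_change: tuple[str, str] = ("", "")  # point of suffixation change
    suffix: tuple[str, str] = ("", "")


def apply_wordframe(rule: WordFrameRule, inflection: str) -> Optional[str]:
    """Return the root predicted by `rule`, or None if the rule does not match."""
    head_infl = rule.prefix[0] + rule.prefixation_change[0]
    tail_infl = rule.suffixation_change[0] + rule.suffix[0]
    if len(head_infl) + len(tail_infl) > len(inflection):
        return None
    if not (inflection.startswith(head_infl) and inflection.endswith(tail_infl)):
        return None
    # Remainder = secondary common substring + vowel + primary common substring.
    middle = inflection[len(head_infl):len(inflection) - len(tail_infl)]
    vowel_infl, vowel_root = rule.vowel_change
    # Assumption: use the rightmost occurrence of the inflected vowel.
    # With an empty vowel change, rfind("") returns len(middle), so the
    # whole remainder becomes the secondary substring and is copied as-is.
    pos = middle.rfind(vowel_infl)
    if pos < 0:
        return None
    secondary = middle[:pos]
    primary = middle[pos + len(vowel_infl):]
    if not secondary + primary:  # require some common substring
        return None
    head_root = rule.prefix[1] + rule.prefixation_change[1]
    tail_root = rule.suffixation_change[1] + rule.suffix[1]
    return head_root + secondary + vowel_root + primary + tail_root


if __name__ == "__main__":
    # Hand-written example rules (not learned, not from the paper).
    sang = WordFrameRule(vowel_change=("a", "i"))
    gesungen = WordFrameRule(prefix=("ge", ""), vowel_change=("u", "i"),
                             suffix=("en", "en"))
    print(apply_wordframe(sang, "sang"))          # sing
    print(apply_wordframe(gesungen, "gesungen"))  # singen
```

With empty prefix and vowel components, such a rule behaves like an end-of-string rule: a suffixation change "p" to "e" plus the suffix "ing" to nothing still maps hopping to hope.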
Suggested Additional Reading
What do we like about the paper
- Robustness of the algorithm in noisy conditions
- Evaluation on many different languages
What do we dislike about the paper
- It does not perform morphological analysis, only lemmatization
- Experiments are done only on verbs
- The paper does not say which option the algorithm selects when more than one result is possible
- The algorithm uses only features of the word itself; it does not use context
- With the information given in this paper, we would not be able to reimplement the algorithm and reproduce the results
Questions
- Does the term "point of prefixation" mean the same as the term "morpheme boundary"?
Written by Martin Kirschner