
Institute of Formal and Applied Linguistics Wiki



Page: courses:rg:multilingual-noise-robust-supervised-morphological-analysis-using-the-wordframe-model
Revision history: 2011/01/07 15:11, kirschner (page created); 2011/01/09 17:45, kirschner
===== Comments =====
  
=== Summary ===
In this paper, the author presents a new supervised lemmatization method called the WordFrame model.
  * The new method is compared with the existing End-of-String model and performs better in most cases.
    * A combination of both methods gives even better results.
  * The method is evaluated on 30 languages, with a median accuracy of 97.5%.
  * The WordFrame model trains well on noisy data, so it can be used in co-training with unsupervised methods.

=== Described models ===
Both models described in the paper decompose a word into basic parts that resemble morphemes but are not exactly morphemes.

== Extended End-of-String model ==
Decomposes an inflected form into:
  * prefix - //concatenation of all prefixes//
  * primary common substring - //the stem//
  * point of suffixation change - //phonologically induced letter change at the boundary of the stem and the suffix//
  * suffix/ending - //concatenation of all suffixes of the word//
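
A minimal Python sketch of how a single inflection/lemma pair could be split into the four parts above. It is not the paper's training procedure; it assumes small hand-supplied affix lists (the hypothetical `PREFIXES` and `SUFFIXES`) and assumes that the stem is the longest common prefix of the prefix-stripped inflection and the lemma. The names `decompose_eos`, `EosDecomposition` and `lemmatize` are illustrative only.

```python
from typing import NamedTuple, Optional, Tuple

# Hypothetical affix inventories, chosen only for the examples below.
PREFIXES = ("ge",)
SUFFIXES = ("ing", "ed", "en", "s", "t")


class EosDecomposition(NamedTuple):
    prefix: str      # stripped prefix (empty if none)
    stem: str        # primary common substring
    change: str      # point of suffixation change, inflection side
    suffix: str      # ending found in SUFFIXES
    lemma_tail: str  # what the lemma has after the stem


def decompose_eos(inflection: str, lemma: str) -> EosDecomposition:
    # Strip a known prefix that the lemma does not share.
    prefix = next((p for p in PREFIXES
                   if inflection.startswith(p) and not lemma.startswith(p)), "")
    rest = inflection[len(prefix):]
    # Assumption: the stem is the longest common prefix of `rest` and the lemma.
    n = 0
    while n < min(len(rest), len(lemma)) and rest[n] == lemma[n]:
        n += 1
    tail = rest[n:]
    # The longest known suffix ending `tail` is the ending; anything left
    # between the stem and that ending is the point of suffixation change.
    suffix = max((s for s in SUFFIXES if tail.endswith(s)), key=len, default="")
    change = tail[:len(tail) - len(suffix)]
    return EosDecomposition(prefix, rest[:n], change, suffix, lemma[n:])


def lemmatize(word: str, rule: Tuple[str, str]) -> Optional[str]:
    """Apply an end-of-string rewrite rule (strip, add); None if it does not fit."""
    strip, add = rule
    if not word.endswith(strip):
        return None
    return word[:len(word) - len(strip)] + add


d = decompose_eos("cried", "cry")
print(d)  # EosDecomposition(prefix='', stem='cr', change='i', suffix='ed', lemma_tail='y')
print(lemmatize("tried", (d.change + d.suffix, d.lemma_tail)))  # try
```

With the same lists, `hopping`/`hop` yields stem `hop`, change `p` and suffix `ing`, and `gemacht`/`machen` yields prefix `ge`, stem `mach`, suffix `t` and lemma tail `en`. Turning a decomposition into a "strip, add" rule is what lets it be reused on unseen words.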

== WordFrame model ==
Decomposes an inflected form into:
  * prefix - //concatenation of all prefixes//
  * point of prefixation change - //phonologically induced letter change at the boundary of the prefix and the first part of the stem//
  * secondary common substring - //the part of the stem before the vowel change//
  * vowel change - //the vowel change inside the stem//
  * primary common substring - //the part of the stem after the vowel change//
  * point of suffixation change - //phonologically induced letter change at the boundary of the stem and the suffix//
  * suffix/ending - //concatenation of all suffixes of the word//
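
The extra WordFrame parts can be illustrated by aligning a pair of words. The sketch below swaps in a generic technique, Python's `difflib.SequenceMatcher`, rather than the paper's method: after stripping a known prefix (the hypothetical `PREFIXES` list), the `equal` segments approximate the secondary and primary common substrings, an inner non-equal segment approximates the vowel change, and a final non-equal segment approximates the change at the end of the word. `wordframe_segments` is an illustrative name.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical prefix inventory, chosen only for the examples below.
PREFIXES = ("ge",)

Segment = Tuple[str, str, str]  # (opcode, inflection part, lemma part)


def wordframe_segments(inflection: str, lemma: str) -> Tuple[str, List[Segment]]:
    """Strip a known prefix, then align the rest of the inflection with the lemma.

    'equal' segments approximate the secondary and primary common substrings;
    an inner non-equal segment approximates the vowel change, and a final
    non-equal segment approximates the change at the end of the word.
    """
    prefix = next((p for p in PREFIXES
                   if inflection.startswith(p) and not lemma.startswith(p)), "")
    rest = inflection[len(prefix):]
    matcher = SequenceMatcher(None, rest, lemma, autojunk=False)
    segments = [(op, rest[i1:i2], lemma[j1:j2])
                for op, i1, i2, j1, j2 in matcher.get_opcodes()]
    return prefix, segments


print(wordframe_segments("gesungen", "singen"))
# ('ge', [('equal', 's', 's'), ('replace', 'u', 'i'), ('equal', 'ngen', 'ngen')])
print(wordframe_segments("sang", "singen"))
# ('', [('equal', 's', 's'), ('replace', 'a', 'i'), ('equal', 'ng', 'ng'), ('insert', '', 'en')])
```

For `sang`/`singen`, `s` and `ng` play the roles of the secondary and primary common substrings, `a` becoming `i` is the vowel change, and the trailing `insert` covers the point of suffixation change and the ending together. The alignment is only a heuristic: `SequenceMatcher` takes the longest matching block first, which need not coincide with the linguistically correct stem.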

===== Suggested Additional Reading =====
  * [[http://www.cs.swarthmore.edu/~richardw/pubs/thesis.pdf|R. Wicentowski, 2002, PhD thesis]]
  
   
  
===== What do we like about the paper =====
  * Robustness of the algorithm in noisy conditions
  * Evaluation on many different languages
  
===== What do we dislike about the paper =====
  * It does not do full morphological analysis, only lemmatization.
  * Experiments were done only on verbs.
  * The paper does not say which result the algorithm selects when more than one correct result is possible.
  * The algorithm uses only features of the word itself; it does not use context.
  * The information given in the paper would not be enough for us to reimplement the algorithm and verify the results.

===== Questions =====
  * Does the term //point of prefixation// mean the same as the term //morpheme boundary//?
  
Written by Martin Kirschner
