Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:rg:multilingual-noise-robust-supervised-morphological-analysis-using-the-wordframe-model [2011/01/07 17:55]
kirschner
courses:rg:multilingual-noise-robust-supervised-morphological-analysis-using-the-wordframe-model [2011/01/09 17:53] (current)
kirschner
Line 5: Line 5:
 ===== Comments ===== ===== Comments =====
  
-  +=== Summary === 
 +  In this paper the author presents a new supervised method for lemmatization, called the WordFrame model. 
 +  * The new method is compared with the existing End-of-String model and performs better in most cases. 
 +    * A combination of both methods gives even better results. 
 +  * The results are evaluated on 30 different languages, with a median accuracy of 97.5%. 
 +  * The WordFrame model trains well on noisy data, so it can be used in co-training with unsupervised methods.
      
 +=== Described models ===
 +Both models described in this paper are meant to decompose a word into basic parts (not morphemes, but similar).
 +
 +==Extended End-of-String model==
 +Decomposition of inflection into 
 +  * prefix - //concatenation of all prefixes//
 +  * primary common substring - //the stem//
 +  * point of suffixation change - //phonologically induced letter change on the boundary between the stem and the suffix//
 +  * suffix/ending - //concatenation of all suffixes of the word//
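The four-part decomposition above can be illustrated with a small sketch. It is a simplification, not the paper's algorithm: the prefix is assumed to be supplied by the caller, the stem is approximated as the longest common prefix of the remaining string and the lemma, and the point of suffixation change is not separated from the suffix but kept together in one "tail". The names ''EosAnalysis'' and ''analyse_eos'' are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EosAnalysis:
    prefix: str           # concatenation of all prefixes (supplied by the caller)
    stem: str             # primary common substring shared with the lemma
    inflection_tail: str  # suffixation change + suffix in the inflected form
    lemma_tail: str       # the corresponding ending in the lemma


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def analyse_eos(inflection: str, lemma: str, prefix: str = "") -> EosAnalysis:
    """Split an (inflection, lemma) pair into prefix, stem and the two tails.

    Assumption: the stem is simply the longest common prefix once the
    given prefix has been stripped from the inflected form.
    """
    if not inflection.startswith(prefix):
        raise ValueError("inflection does not start with the given prefix")
    body = inflection[len(prefix):]
    n = common_prefix_len(body, lemma)
    return EosAnalysis(prefix, body[:n], body[n:], lemma[n:])


# English "tried" -> "try": stem "tr", end-of-string rewrite "ied" -> "y"
english = analyse_eos("tried", "try")
# German "gespielt" -> "spielen": prefix "ge", stem "spiel", rewrite "t" -> "en"
german = analyse_eos("gespielt", "spielen", prefix="ge")
```

Because everything after the stem is treated as one end-of-string rewrite, a change inside the stem (for example a vowel change) shortens the shared stem and lengthens the rewrite.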
 +
 +==WordFrame model==
 +Decomposition of inflection into 
 +  * prefix - //concatenation of all prefixes//
 +  * point of prefixation change - //phonologically induced letter change on the boundary between the prefix and the first part of the stem//
 +  * secondary common substring - //the part of stem before stem vowel change//
 +  * vowel change - //the vowel change inside the stem//
 +  * primary common substring - //the part of stem after the vowel change//
 +  * point of suffixation change - //phonologically induced letter change on the boundary between the stem and the suffix//
 +  * suffix/ending - //concatenation of all suffixes of the word//
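The seven parts listed above can be written down as a simple record. This is only a sketch of the decomposition as summarized here, not the paper's training or scoring procedure. Representing each changing part as a pair (string in the inflected form, string in the lemma) is an assumption made for illustration, and the names ''WordFrameAnalysis'', ''inflection'' and ''lemma'' are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# (string in the inflected form, string in the lemma)
Change = Tuple[str, str]


@dataclass(frozen=True)
class WordFrameAnalysis:
    prefix: Change               # concatenation of all prefixes
    prefixation_change: Change   # letter change at the prefix/stem boundary
    secondary_substring: str     # part of the stem before the vowel change
    vowel_change: Change         # vowel change inside the stem
    primary_substring: str       # part of the stem after the vowel change
    suffixation_change: Change   # letter change at the stem/suffix boundary
    suffix: Change               # concatenation of all suffixes

    def _join(self, side: int) -> str:
        # side 0 rebuilds the inflected form, side 1 rebuilds the lemma
        return "".join([
            self.prefix[side],
            self.prefixation_change[side],
            self.secondary_substring,
            self.vowel_change[side],
            self.primary_substring,
            self.suffixation_change[side],
            self.suffix[side],
        ])

    def inflection(self) -> str:
        return self._join(0)

    def lemma(self) -> str:
        return self._join(1)


# German "gesungen" -> "singen": prefix "ge" is dropped, stem vowel u -> i
gesungen = WordFrameAnalysis(
    prefix=("ge", ""),
    prefixation_change=("", ""),
    secondary_substring="s",
    vowel_change=("u", "i"),
    primary_substring="ng",
    suffixation_change=("", ""),
    suffix=("en", "en"),
)
```

Setting the prefixation change, the secondary substring and the vowel change to empty strings leaves exactly the four parts of the Extended End-of-String model.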
 +
 ===== Suggested Additional Reading ===== ===== Suggested Additional Reading =====
    * [[http://www.cs.swarthmore.edu/~richardw/pubs/thesis.pdf|R. Wicentowski, 2002, PhD thesis]]    * [[http://www.cs.swarthmore.edu/~richardw/pubs/thesis.pdf|R. Wicentowski, 2002, PhD thesis]]
Line 15: Line 40:
  
 ===== What do we like about the paper ===== ===== What do we like about the paper =====
-  * +  * Robustness of the algorithm in noisy conditions 
 +  * Evaluation on many different languages
  
 ===== What do we dislike about the paper ===== ===== What do we dislike about the paper =====
-  *+  * Doesn't do morphological analysis, only lemmatization 
 +  * Experiments done only on verbs 
 +  * The paper doesn't say which option the algorithm selects when there is more than one possible correct result 
 +  * The algorithm uses only features based on the word itself; it doesn't use context 
 +  * With the information given in the paper, we wouldn't be able to write a program to verify the results 
 + 
 +===== Questions ===== 
 +  * Does the term //point of prefixation// mean the same as the term //morpheme boundary//? 
 +  * In section 4 of the paper - //experimental results presented were done using 10-fold cross validation// 
 +    * On what data did the author tune the models? Aren't the results ? //For example omitting the case of deletion of vowels in the stem?// 
 +  * In section 4.1, Table 5 - shouldn't the WF model always give better results than the EOS model? The division of the word in the EOS model is a simplified version of the division in the WF model, isn't it?
  
 Written by Martin Kirschner Written by Martin Kirschner
