====== Improving morphology induction by learning spelling rules, ACL 2009 ======
=== Jason Naradowsky and Sharon Goldwater ===

Presented by: Loganathan Ramasamy

Report by: Eduard Bejček

===== Introduction =====

  * The paper describes morphology induction using a Bayesian approach
  * It is based on the Minimum Description Length (MDL) principle
  * Baseline: Goldwater et al., 2006: [[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.70.9842&rep=rep1&type=pdf|Interpolating between types and tokens by estimating power-law generators]]
    * each word is analysed as a stem and a suffix only
    * Dirichlet priors over the multinomial distributions for word class, stem and suffix
  * Improvements to the baseline system
    * introduces spelling rules of the form context/change, e.g. "ut_i"/ε→t (inserting "t" in "shut.ing") or "ke_i"/e→ε (deleting "e" in "take.ing"), which are learned simultaneously with the morphological analysis
    * Dirichlet priors are first set by hand so as to prefer empty rules (no change) to deletion/insertion rules

===== What do we dislike about the paper =====

  * Experiments are done only for English and only for verbs, which is too restrictive
  * Results are not convincing enough: F-score and underlying-form accuracy outperform the baseline only for stems (not for suffixes), and precision does not outperform the baseline at all

===== What do we like about the paper =====

  * Loganathan has the code (although he has not managed to compile it yet)
  * Spelling rules are learned jointly with the morphological analysis
  * It is unsupervised and clever: with just a few (hyper)parameters, some of them learned automatically, the authors can describe a wide range of morphological rules within one elegant framework.
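
To make the context/change notation from the Introduction concrete, here is a minimal Python sketch of how a spelling rule such as "ut_i"/ε→t could turn an underlying analysis ("shut" + "ing") into a surface form. It is only an illustration: the names ''SpellingRule'', ''apply_rule'' and ''RULES'' are hypothetical, the rules are applied deterministically, and the two rules are simply the report's examples written out by hand. In the paper the rules are probabilistic and learned together with the segmentation, which this sketch does not attempt to reproduce.

```python
from typing import NamedTuple, Sequence


class SpellingRule(NamedTuple):
    """Hypothetical encoding of a rule written as "left_right"/delete->insert."""
    left: str    # characters required at the end of the stem, e.g. "ut"
    right: str   # characters required at the start of the suffix, e.g. "i"
    delete: str  # stem-final character(s) to remove ("" means none)
    insert: str  # character(s) to insert at the boundary ("" means none)


# The two example rules from the report, written by hand (not learned).
RULES = [
    SpellingRule(left="ut", right="i", delete="", insert="t"),  # "ut_i"/eps->t
    SpellingRule(left="ke", right="i", delete="e", insert=""),  # "ke_i"/e->eps
]


def apply_rule(stem: str, suffix: str, rules: Sequence[SpellingRule]) -> str:
    """Return the surface form of stem+suffix using the first matching rule.

    If no rule's context matches, the empty rule applies and the two
    morphemes are simply concatenated.
    """
    for rule in rules:
        if not (stem.endswith(rule.left) and suffix.startswith(rule.right)):
            continue  # context does not match this stem/suffix boundary
        if rule.delete and not stem.endswith(rule.delete):
            continue  # nothing to delete, so the rule is not applicable
        kept = stem[: len(stem) - len(rule.delete)]
        return kept + rule.insert + suffix
    return stem + suffix  # empty rule: no spelling change


if __name__ == "__main__":
    for stem, suffix in [("shut", "ing"), ("take", "ing"), ("walk", "ing")]:
        print(f"{stem}.{suffix} -> {apply_rule(stem, suffix, RULES)}")
```

With these two hand-written rules, "shut.ing" and "take.ing" come out as "shutting" and "taking", while "walk.ing" matches no context and falls back to plain concatenation, which corresponds to the empty rule that the hand-set priors prefer.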