Improving morphology induction by learning spelling rules, ACL 2009
Jason Naradowsky and Sharon Goldwater
Presented by: Loganathan Ramasamy
Report by: Eduard Bejček
Introduction
- The paper describes morphology induction using a Bayesian approach
- It is based on the Minimum Description Length (MDL) principle
- Baseline: Goldwater et al., 2006: Interpolating between types and tokens by estimating power-law generators
- words are analysed only as stem & suffix
- Dirichlet priors over the multinomial distributions for word class, stem and suffix
- Improvements to the baseline system
- introduces spelling rules (context/change, e.g.: “ut_i”/ε→t (in “shut.ing”) or “ke_i”/e→ε (in “take.ing”)), which are learned simultaneously with the morphological analysis
- Dirichlet priors are initially set by hand, so as to prefer empty rules over deletions/insertions
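The spelling rules above can be made concrete with a toy sketch (my own illustration, not the authors' code): a surface verb form is produced by concatenating a stem and a suffix and then applying a context/change rule at the morpheme boundary.

```python
# Toy illustration (my own sketch, not the authors' implementation) of the
# paper's "context/change" spelling rules applied at the stem-suffix boundary.

def apply_spelling_rule(stem, suffix):
    """Two example rules from the paper's notation:
       "ke_i"/e->eps : stem-final 'e' is deleted before suffix-initial 'i'
                       (take.ing -> taking)
       "ut_i"/eps->t : 't' is inserted between '...ut' and suffix-initial 'i'
                       (shut.ing -> shutting)
       Otherwise the empty rule applies (plain concatenation)."""
    if suffix.startswith("i") and stem.endswith("e"):
        return stem[:-1] + suffix   # deletion rule
    if suffix.startswith("i") and stem.endswith("ut"):
        return stem + "t" + suffix  # insertion rule
    return stem + suffix            # empty rule

print(apply_spelling_rule("take", "ing"))  # taking
print(apply_spelling_rule("shut", "ing"))  # shutting
print(apply_spelling_rule("walk", "ed"))   # walked
```

In the model these rules are not hand-coded conditions as above; both the segmentation and the rules are latent variables inferred jointly.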
What do we dislike about the paper
- Experiments are done only for English and only for verbs – which is too constrained
- Results are not convincing enough – F-score and underlying-form accuracy outperform the baseline only for stems (not for suffixes), and precision does not outperform the baseline at all
What do we like about the paper
- Loganathan has the code (although he has not been able to compile it yet)
- Spelling rules are learned simultaneously with the morphological analysis
- It's unsupervised and clever: with just a couple of (hyper)parameters (some of them learned automatically), the model can describe a wide range of morphological rules – within one elegant framework.
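To show why so few hyperparameters suffice, here is a minimal sketch (my assumption about the usual Dirichlet-multinomial setup, not the paper's exact formula) of the posterior predictive probability such models use during Gibbs sampling: one concentration parameter alpha smooths the observed suffix counts.

```python
# Sketch of a Dirichlet-multinomial posterior predictive (one common
# parameterization: concentration alpha spread uniformly over K outcomes).
# The counts and K below are toy values, not from the paper.
from collections import Counter

def predictive(suffix, counts, alpha, K):
    """p(suffix | counts) = (count[suffix] + alpha/K) / (total + alpha)."""
    total = sum(counts.values())
    return (counts.get(suffix, 0) + alpha / K) / (total + alpha)

counts = Counter({"ing": 50, "ed": 30, "s": 15, "": 5})
K = 4  # number of suffix types in this toy vocabulary
print(round(predictive("ing", counts, alpha=1.0, K=K), 3))   # 0.498
print(round(predictive("est", counts, alpha=1.0, K=K), 3))   # unseen suffix still gets mass
```

A single alpha per distribution (word class, stem, suffix, rule) is enough to control how strongly the model prefers reusing existing types, which is where the power-law behaviour of the baseline comes from.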