From Baby Steps to Leapfrog: How “Less is More” in Unsupervised Dependency Parsing
- Valentin Spitkovsky, Hiyan Alshawi, Daniel Jurafsky
1 Introduction
- Klein and Manning’s (2004) Dependency Model with Valence (DMV) was the first to beat a simple parsing heuristic, the right-branching baseline.
- Headden et al.’s (2009) Extended Valence Grammar (EVG) combats data sparsity with smoothing alone, training on the same small subset of the treebank as the classic implementation of the DMV.
- Cohen and Smith (2009) use more complicated algorithms (variational EM and MBR decoding) and stronger linguistic hints (tying related parts of speech and syntactically similar bilingual data).
2 Intuition
Baby Steps
- Its base case — sentences of length one — has a trivial solution that requires neither initialization nor search, yet reveals something of sentence heads.
- The next step — sentences of length one and two — refines initial impressions of heads, introduces dependents, and exposes their identities and relative positions.
- Step k + 1 augments the training input to include sentence lengths 1, 2, ..., k+1 of the full data set and executes local search starting from the (smoothed) model estimated at step k. This truly is grammar induction (see the sketch after this list).
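A minimal sketch of the Baby Steps schedule, not the authors' implementation: `run_em` and `smooth` are hypothetical stand-ins for local search (e.g. EM over the DMV) and the smoothing applied between steps; only the curriculum over sentence lengths is shown.

```python
from collections import Counter

def baby_steps(corpus, max_length, run_em, smooth):
    """Baby Steps schedule over a corpus of POS-tag sequences (lists of tags).

    run_em(sentences, init) -> model : one round of local search (stand-in for EM on the DMV)
    smooth(model) -> model           : smoothing applied between steps (stand-in)
    Only the training schedule itself is sketched here.
    """
    # Base case: length-one sentences have a trivial solution -- the lone tag
    # must head its sentence, so root statistics can be read off directly.
    model = Counter(s[0] for s in corpus if len(s) == 1)

    for k in range(2, max_length + 1):
        # Step k re-trains on every sentence of length <= k, starting local
        # search from the (smoothed) model estimated at step k - 1.
        subset = [s for s in corpus if len(s) <= k]
        model = run_em(subset, init=smooth(model))
    return model
```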
Less is More
- Just using simple, short sentences is not enough.
- We find a “sweet spot”: training sentences that are neither too long (so the truly daunting examples are excluded) nor too few (so there is still enough accessible information), using Baby Steps’ learning curve as a guide (see the sketch below).
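A minimal sketch of the “Less is More” setup under the same assumptions as above: one batch of training restricted to sentences no longer than a sweet-spot cutoff. The default cutoff of 15 tokens, `run_em`, and `init_model` are illustrative stand-ins, not necessarily the paper's exact configuration.

```python
def less_is_more(corpus, run_em, init_model, cutoff=15):
    """Single-batch training on sentences whose length is at most `cutoff`."""
    # Longer sentences are discarded outright: what remains should be short
    # enough to be accessible yet plentiful enough to be informative.
    sweet_spot = [s for s in corpus if len(s) <= cutoff]
    return run_em(sweet_spot, init=init_model)
```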
Leapfrog
- As an alternative to discarding data, a better use of resources is to combine the results of batch and iterative training up to the sweet spot data gradation, then iterate with a large step size.
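A rough sketch of Leapfrog under stated assumptions: `mix` (how the batch and Baby Steps models trained up to the sweet spot are combined) and the specific cutoffs and step size are illustrative choices, not necessarily the paper's; `run_em` is the same stand-in trainer as above.

```python
def leapfrog(corpus, run_em, mix, baby_steps_model, batch_model,
             sweet_spot=15, big_step=30, max_length=45):
    """Combine batch and iterative results at the sweet spot, then take large steps."""
    # Start from a combination of the two models trained up to the sweet-spot
    # gradation (e.g. by pooling their statistics -- an assumption here).
    model = mix(baby_steps_model, batch_model)
    k = sweet_spot
    while k <= max_length:
        # Re-train on sentences up to length k, then jump ahead by a large
        # step size instead of adding one word of length at a time.
        subset = [s for s in corpus if len(s) <= k]
        model = run_em(subset, init=model)
        k += big_step
    return model
```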