====== From Baby Steps to Leapfrog: How “Less is More” in Unsupervised Dependency Parsing ======

//Talk by Martin Majliš\\ Report by Oldřich Krůza//

==== Introduction ====

On Monday, October 24th 2011, we heard a talk about a paper by Valentin Spitkovsky, Hiyan Alshawi and Daniel Jurafsky on improving unsupervised dependency parsers. The paper focuses on advancing the state of the art in unsupervised parsing and reports accuracy gains on the order of percentage points, which certainly makes it a paper worth noticing.

==== Notes ====

Most of the attendees apparently understood the talk and the paper well, and a lively discussion followed.

One of the first topics of debate was the notion of a skyline presented in the paper. The skyline was somewhat of a supervised element: the authors estimated the initial parameters of a model from gold data and then trained it further. They assumed that a model whose parameters were estimated from gold data could not be beaten by an unsupervisedly trained one. Yet after training, the skyline model's accuracy dropped very significantly. The reasons for this were a surprise to us as well as to the paper's authors.

Complementary to the skyline, the authors presented a baseline that their final model should definitely beat. They called this baseline "uninformed" but were vague about which exact probability distribution it used. We could only speculate that it was a uniform or random distribution.

A point about unsupervised language modeling came up: many linguistic phenomena are annotated in a way that is to some extent arbitrary and reflects the linguistic theory used more than the language itself, so an unsupervised model cannot hope to get them right. The example we discussed was whether the word "should" governs the verb it is bound with, or vice versa. The authors noticed that dependency orientation in general was not a particularly strong point of their parser, and so they also included an evaluation metric that ignores dependency orientation (a sketch of such an undirected score is given in the appendix at the end of this report).

Perhaps the most crucial observation the authors made was that there is a limit beyond which feeding more data to the training hurts the model's accuracy. They progressed from short sentences to longer ones and identified the threshold at which it is best to start ignoring further training data: sentences of length 15 (the second sketch in the appendix illustrates this kind of length-based filtering). However, we were not entirely clear on how they arrived at this constant. If the model is to be fully unsupervised, it remains an open question how to set this threshold, because it cannot be safely assumed to be the same for all languages and setups.

The writing style of the paper was also a matter of differing opinions. Undeniably, it is written in a vocabulary-intensive fashion, bringing readers face to face with words like "unbridled" or "jettison", which I personally had never seen before.

==== Conclusion ====

All in all, it was a paper worth reading, well presented, and thoroughly discussed, bringing useful general ideas as well as interesting details.
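==== Appendix: Illustrative sketches ====

As a footnote to the discussion of evaluation above, here is a minimal sketch of how a directed versus an undirected attachment score can be computed. This is not the authors' evaluation code; the representation of a parse as a list of head indices (with 0 standing for the artificial root) is an assumption made purely for illustration.

<code python>
# Minimal sketch (not the authors' code) of directed vs. undirected
# attachment accuracy. A parse is a list of head indices: heads[i-1] is
# the head of token i, with 0 denoting the artificial root. This
# representation is an assumption for the sake of the example.

def attachment_scores(gold_heads, pred_heads):
    """Return (directed, undirected) attachment accuracy for one sentence."""
    assert len(gold_heads) == len(pred_heads)
    directed = undirected = 0
    for i, (g, p) in enumerate(zip(gold_heads, pred_heads), start=1):
        if p == g:
            directed += 1
            undirected += 1
        # The undirected score also accepts the same edge with flipped
        # orientation: the predicted head p is attached to i in gold.
        elif p >= 1 and gold_heads[p - 1] == i:
            undirected += 1
    n = len(gold_heads)
    return directed / n, undirected / n


if __name__ == "__main__":
    # Toy example: the edge between tokens 2 and 3 has its orientation
    # flipped in the prediction, so it counts only for the undirected score.
    gold = [2, 0, 2]   # token 1 -> 2, token 2 -> root, token 3 -> 2
    pred = [2, 3, 0]   # token 1 -> 2, token 2 -> 3, token 3 -> root
    print(attachment_scores(gold, pred))   # (0.33..., 0.66...)
</code>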
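Similarly, the "less is more" observation boils down to a simple data-selection step before training. The sketch below shows the length cutoff and a baby-steps style schedule of growing training sets; the corpus representation and the placeholder usage are assumptions, not the paper's actual implementation.

<code python>
# Minimal sketch of the two data-selection ideas discussed above:
# the "Baby Steps" curriculum (train on progressively longer sentences)
# and the "Less is More" cutoff (discard sentences above a length
# threshold). The corpus format (a list of token lists) is a hypothetical
# placeholder, not anything taken from the paper's implementation.

MAX_LENGTH = 15  # the threshold the authors identified; in a fully
                 # unsupervised setting it would itself have to be tuned


def less_is_more(corpus, max_length=MAX_LENGTH):
    """Keep only sentences short enough to help rather than hurt training."""
    return [sent for sent in corpus if len(sent) <= max_length]


def baby_steps(corpus, max_length=MAX_LENGTH):
    """Yield growing training sets: sentences of length <= 1, <= 2, ..."""
    for limit in range(1, max_length + 1):
        yield limit, [sent for sent in corpus if len(sent) <= limit]


if __name__ == "__main__":
    corpus = [
        "No .".split(),
        "It works .".split(),
        "This sentence is a little bit longer than the previous ones .".split(),
    ]
    print(len(less_is_more(corpus)))        # sentences kept under the cutoff
    for limit, batch in baby_steps(corpus, max_length=5):
        print(limit, len(batch))            # training-set size at each step
</code>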