====== Faster and Smaller N-Gram Language Model ======
//Presenter: Joachim Daiber\\
Subscriber: Long DT//\\
Date: 12-March-2012\\

==== Overview ====

On Monday, October 24th 2011, we heard a talk about a paper by Valentin
Spitkovsky, Hiyan Alshawi and Daniel Jurafsky on enhancing unsupervised language
parsers. The paper focuses on improving the state of the art in unsupervised
parsing, and reports improvements on the order of percentage points, which
certainly makes it a paper worth noticing.

==== Notes ====

Most of the attendees apparently understood the talk and the paper well, and a
lively discussion followed. One of our first topics of debate was the notion of
a skyline presented in the paper. The skyline was a somewhat supervised element:
the authors estimated the initial parameters of a model from gold data and
trained it afterwards. They assumed that a model with parameters estimated from
gold data cannot be beaten by an unsupervisedly trained model. However, after
training the skyline model, its accuracy dropped very significantly. The reasons
for this were a point of surprise for us as well as for the paper's authors.

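As we understood it, the skyline amounts to seeding the model with
maximum-likelihood estimates from gold trees before unsupervised training takes
over. A minimal sketch, assuming trees are given as lists of (dependent tag,
head tag) pairs; the representation and all names here are our own illustration,
not the paper's:

<code python>
from collections import Counter

def skyline_init(gold_trees):
    """Maximum-likelihood estimate of P(dependent tag | head tag)
    from gold trees, used only to initialize the model."""
    pair_counts = Counter()
    head_counts = Counter()
    for tree in gold_trees:              # tree: [(dep_tag, head_tag), ...]
        for dep_tag, head_tag in tree:
            pair_counts[(head_tag, dep_tag)] += 1
            head_counts[head_tag] += 1
    return {(h, d): count / head_counts[h]
            for (h, d), count in pair_counts.items()}
</code>
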
Complementary to the skyline, the authors presented a baseline which their
final model should definitely beat. They called this baseline "uninformed",
but were vague about the exact probability distribution used in it. We could
only speculate that it was a uniform or random probability distribution.

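If our speculation is right, such an "uninformed" baseline could be as simple
as the following sketch, which attaches every token to a head chosen uniformly
at random; the function name and setup are our assumption, not the paper's:

<code python>
import random

def uninformed_parse(n_tokens):
    """Choose a head uniformly at random for each token.
    Heads are 1-based token positions; 0 is the artificial root."""
    heads = []
    for position in range(1, n_tokens + 1):
        candidates = [h for h in range(n_tokens + 1) if h != position]
        heads.append(random.choice(candidates))
    return heads
</code>

Note that this does not even guarantee a well-formed tree; a uniform
distribution over valid dependency trees would be another plausible reading of
"uninformed".
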
A point about unsupervised language modeling came up: many linguistic phenomena
are annotated in a way that is to some extent arbitrary and reflects the
linguistic theory used more than the language itself, so an unsupervised model
cannot hope to get them right. The example we discussed was whether the word
"should" governs the verb it is bound with, or vice versa. The authors noticed
that dependency orientation in general was not a particularly strong point of
their parser, and so they also included an evaluation metric that ignores
dependency orientations.

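This orientation-ignoring metric can be pictured as an undirected variant of
attachment accuracy: an edge also counts as correct when it is predicted with
flipped direction. A sketch under the usual head-array encoding (our
assumption; the paper's exact definition may differ):

<code python>
def attachment_accuracy(gold_heads, pred_heads):
    """Directed and undirected attachment accuracy.
    Heads are 1-based token positions, 0 = artificial root."""
    n = len(gold_heads)
    directed = sum(g == p for g, p in zip(gold_heads, pred_heads)) / n
    # Undirected: a predicted edge {d, h} also counts if the gold tree
    # contains the same pair with the opposite orientation.
    gold_edges = {frozenset((d, h)) for d, h in enumerate(gold_heads, start=1)}
    undirected = sum(frozenset((d, h)) in gold_edges
                     for d, h in enumerate(pred_heads, start=1)) / n
    return directed, undirected
</code>
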
Perhaps the most crucial observation the authors made was that there is a limit
beyond which feeding more data to the model training hurts its accuracy. They
progressed from short sentences to longer ones, and identified a sentence
length of 15 as the threshold at which it is best to start ignoring any further
training data. However, we were not entirely clear on how they computed this
constant. If the model is to be fully unsupervised, it remains an open question
how to set this threshold, because it cannot safely be assumed to be the same
for all languages and setups.

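The short-to-long progression with a hard cut-off can be sketched as follows;
the cap of 15 is the value reported in the paper, while the names and the
staging scheme are merely our illustration:

<code python>
MAX_TRAIN_LEN = 15  # threshold reported in the paper

def training_stages(corpus, max_len=MAX_TRAIN_LEN):
    """Yield training sets of growing maximum sentence length,
    mimicking the short-to-long progression; sentences longer
    than max_len are ignored entirely."""
    for cap in range(1, max_len + 1):
        stage = [sentence for sentence in corpus if len(sentence) <= cap]
        if stage:
            yield cap, stage
</code>
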
The writing style of the paper was also a matter of differing opinions.
Undeniably, it is written in a vocabulary-intensive fashion, bringing readers
face to face with words like "unbridled" or "jettison", which I personally had
never seen before.

==== Conclusion ====

All in all, it was a paper worth reading, well presented, and thoroughly
discussed, bringing useful general ideas as well as interesting details.
