====== Faster and Smaller N-Gram Language Model ======

//Presenter: Joachim Daiber\\
Reporter: Long DT//\\
Date: 12-March-2012

==== Overview ====

The talk is mainly about techniques to improve the performance of N-gram language models:
how to make them run faster and use a smaller amount of memory.

==== Encoding ====

**I. Encoding the count**\\
In the Web1T corpus, the most frequent n-gram occurs about 95 billion times, but there are only about 770 000 unique count values.\\
=> Maintaining a value-rank array (storing each count's rank in a table of the unique values instead of the count itself) is a good way to encode counts.
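
As a rough illustration of this idea, here is a minimal Python sketch (my own, not the authors' code): instead of storing each huge count directly, store its rank in a table of the unique count values, so every stored number fits in far fewer bits.

<code python>
# Minimal sketch of value-rank encoding for counts (illustrative only).
from bisect import bisect_left

counts = [95_000_000_000, 42, 7, 42, 95_000_000_000, 7, 7]

# Table of unique count values; for Web1T it has only ~770 000 entries,
# so a rank fits in about 20 bits instead of a full 64-bit count.
value_table = sorted(set(counts))

# Store the rank of each count instead of the count itself.
ranks = [bisect_left(value_table, c) for c in counts]

# Decoding a count is a single table lookup.
assert [value_table[r] for r in ranks] == counts
</code>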

**II. Encoding the n-gram**

**//Idea//**\\
encode(w1, w2, ..., wn) = ( c(w1, w2, ..., w(n-1)), wn )\\
where c is the offset function, hence the name "context encoding".
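
To make the recursion concrete, here is a toy Python sketch (for intuition only, not the paper's data structure): the encoding of an n-gram is a pair of its context's encoding and its last word; in the real structure the inner part is simply the integer offset of the context in the lower-order array.

<code python>
# Toy illustration of context encoding (hypothetical, for intuition only).
def encode(words):
    """encode(w1..wn) = (encode(w1..w(n-1)), wn); a unigram is just the word."""
    if len(words) == 1:
        return words[0]
    return (encode(words[:-1]), words[-1])

print(encode(["the", "cat", "sat"]))
# -> (('the', 'cat'), 'sat'); in practice the inner pair is replaced by the
# offset of the bigram "the cat" in the bigram array.
</code>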

**//Implementation//**

Sorted Array (see the sketch below)
  * Use n arrays for an n-gram model (the i-th array is used for i-grams).
  * Each element in the array is a pair (w, c):
    * w: index of the last word in the unigram array
    * c: offset pointer to the context entry in the (i-1)-gram array
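
A minimal Python sketch of the sorted-array layout described above (my own simplification, not the BerkeleyLM implementation): one sorted key array per order with a parallel array of counts, so looking up an n-gram is a binary search at each order.

<code python>
# Minimal sketch of the sorted-array scheme (my own simplification).
# Each key is the (w, c) pair from the list above: w = id of the last word,
# c = offset of the context entry in the (n-1)-gram array.
from bisect import bisect_left

unigrams = ["the", "cat", "sat"]            # offset of a unigram = its word id
word_id  = {w: i for i, w in enumerate(unigrams)}

bigram_keys,  bigram_counts  = [(1, 0), (2, 1)], [10, 7]   # "the cat", "cat sat"
trigram_keys, trigram_counts = [(2, 0)], [5]               # "the cat sat"

def find(keys, key):
    """Binary search in a sorted key array; return the entry's offset."""
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        raise KeyError(key)
    return i

def trigram_count(w1, w2, w3):
    # Resolve bottom-up: unigram offsets -> bigram offset -> trigram offset.
    bi_off  = find(bigram_keys,  (word_id[w2], word_id[w1]))
    tri_off = find(trigram_keys, (word_id[w3], bi_off))
    return trigram_counts[tri_off]

print(trigram_count("the", "cat", "sat"))   # -> 5
</code>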

Most of the attendees apparently understood the talk and the paper well, and a
lively discussion followed. One of our first topics of debate was the notion of
the skyline presented in the paper. The skyline was somewhat of a supervised element
-- the authors estimated initial parameters for a model from gold data and
trained it afterwards. They assumed that a model with parameters estimated from
gold data cannot be beaten by an unsupervisedly trained model. Yet after
training the skyline model, its accuracy dropped very significantly. The reasons
for this were a point of surprise for us as well as for the paper's authors.

Complementary to the skyline, the authors presented a baseline which should
definitely be beaten by their final model. They called this baseline
"uninformed", but were vague about which exact probability distribution they
used in it. We could only speculate that it was a uniform or random
probability distribution.

A point about unsupervised language modeling came up: many linguistic phenomena
are annotated in a way that is to some extent arbitrary and reflects the
linguistic theory used more than the language itself, so an unsupervised model
cannot hope to get them right. The example we discussed was whether the word
"should" governs the verb it is bound with, or vice versa. The authors
noticed that dependency orientation in general was not a particularly strong
point of their parser, and so they also included an evaluation metric that
ignored dependency orientation.

Perhaps the most crucial observation the authors made was that there is a limit
beyond which feeding more data to the model training hurts its accuracy. They
progressed from short sentences to longer ones, and identified the threshold,
beyond which it is best to ignore any further training data, at sentences of
length 15. However, we were not 100% clear on how they computed this constant.
If the model is to be fully unsupervised, it remains a question how to set
this threshold, because it cannot be safely assumed that it would be the same
for all languages and setups.

The writing style of the paper was also a matter of differing opinions.
Undeniably, it is written in a vocabulary-intensive fashion, bringing readers
face to face with words like "unbridled" or "jettison", which I personally had
never seen before.

==== Conclusion ====

All in all, it was a paper worth reading, well presented, and thoroughly
discussed, bringing useful general ideas as well as interesting details.
