====== Faster and Smaller N-Gram Language Model ======
//Presenter: Joachim Daiber\\
Reporter: Long DT//\\
Date: 12-March-2012\\
==== Overview ====
The talk is mainly about the paper //Faster and Smaller N-Gram Language Model//: how an n-gram language model can be made to run faster and to use less memory.
==== Encoding ====

**I. Encoding the count**

In the Web1T corpus, the most frequent n-gram occurs 95 billion times, but the corpus contains only about 770,000 unique count values.

=> Maintaining a value-rank array is therefore a good way to encode the counts: store each n-gram's rank in the sorted array of unique count values instead of the raw count itself.
**II. Encoding the n-gram**

**//Context encoding//**

Encode W1,W2,...,Wn as (c(W1,...,Wn-1), Wn), where c is an offset function; hence the name //context encoding//.

**//Implementation//**

Sorted Array
  + Use n arrays for an n-gram model (the i-th array is used for i-grams)
  + Each element in an array is a pair (w, c)
  + w : index of that word in the unigram array
  + c : offset pointer
  + Sorted based on w

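The sorted-array lookup can be sketched like this (a simplified variant: entries are stored as (context, word) pairs so the context encoding sorts directly and binary search applies; the exact layout in the paper differs in detail):

```python
# Minimal sketch of lookup in a per-order sorted array of context-encoded
# n-grams. Each entry is a (context_offset, word_id) pair kept in sorted
# order, so an n-gram is found by binary search on its context encoding.
import bisect

def lookup(sorted_array, context_offset, word_id):
    """Return the entry's index, or None if the n-gram is absent."""
    key = (context_offset, word_id)
    i = bisect.bisect_left(sorted_array, key)
    if i < len(sorted_array) and sorted_array[i] == key:
        return i  # this index serves as the context offset for (n+1)-grams
    return None

trigram_array = [(0, 3), (0, 9), (7, 2)]  # toy (context, word) pairs
print(lookup(trigram_array, 7, 2))
print(lookup(trigram_array, 1, 1))
```

Binary search makes lookups O(log N) per order, and the returned index doubles as the context offset used when encoding longer n-grams.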
Hash Table
Most of the attendants apparently understood the talk and the paper well, and a
lively discussion followed. One of our first topics of debate was the notion of