
Institute of Formal and Applied Linguistics Wiki



courses:rg:2012:longdtreport [2012/03/12 22:39]
longdt
courses:rg:2012:longdtreport [2012/03/12 22:41]
longdt
  
==== Encoding ====
**I. Encoding the count**
In the Web1T corpus, the most frequent n-gram occurs 95 billion times, yet the corpus contains only about 770,000 unique count values.
=> Maintaining a value-rank array is a good way to encode the counts.
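The value-rank idea can be sketched as follows (a minimal Python sketch with my own helper names, not the paper's implementation): store the few unique count values once in a sorted table, and store for each n-gram only its small rank into that table.

```python
# Sketch of value-rank count encoding. Counts repeat heavily, so each
# n-gram stores a small rank into a table of unique count values
# instead of a full 64-bit count. (Helper names are illustrative.)

def build_value_rank(counts):
    """Return (values, ranks): sorted unique counts + a rank per item."""
    values = sorted(set(counts))
    rank_of = {v: i for i, v in enumerate(values)}
    return values, [rank_of[c] for c in counts]

def decode(values, ranks, i):
    """Recover the original count of the i-th n-gram."""
    return values[ranks[i]]
```

With ~770,000 unique values, each rank fits in about 20 bits, regardless of how large the counts themselves are.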
**II. Encoding the n-gram**
//Idea//
Encode W1,W2,...,Wn as c(W1,W2,...,W(n-1)) Wn, where c is an offset function; this is the so-called context encoding.
//Implementation//
- Sorted Array
  + Use n arrays for an n-gram model (the i-th array is used for i-grams)
  + Each element in an array is a pair (w,c)
    + w: index of the last word in the unigram array
    + c: offset pointer to the context in the (n-1)-gram array
Most of the attendants apparently understood the talk and the paper well, and a
lively discussion followed. One of our first topics of debate was the notion of
