--- courses:rg:2012:longdtreport [2012/03/12 22:41]
longdt
+++ courses:rg:2012:longdtreport [2012/03/12 22:45]
longdt
@@ Line 14: / Line 14: @@
 => Maintaining a value-rank array (each distinct count is stored once, and n-grams keep only its rank) is a good way to encode counts
 **II. Encoding the n-gram**
 **//Idea//**
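 encode W1, W2, ..., Wn as c(W1, W2, ..., W(n-1)) Wn, i.e. the pair (context offset, last word)
 c is an offset function giving the position of the context W1...W(n-1) in the (n-1)-gram array, hence the name context encoding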
-**//Implementation//
-**
+**//Implementation//**
 Sorted Array
   + Use n arrays for an n-gram model (the i-th array stores the i-grams)
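A minimal Python sketch of the value-rank and context-encoding ideas, assuming word ids are small non-negative integers and counts arrive as a dict from n-gram tuples to counts. `build_model` and `lookup` are hypothetical names, not taken from the paper or this report. Each order gets its own sorted array of (w, c) keys, where c is the offset of the context in the previous order's array, and each entry keeps only the rank of its count in a shared table of distinct values. Real implementations pack these fields into bits; plain Python lists are used here only for readability.

```python
from bisect import bisect_left


def build_model(ngram_counts, n):
    """Build one sorted array per order from {n-gram tuple: count}.

    Assumes every prefix (context) of a stored n-gram is also stored,
    as is the case for counts collected from a corpus.
    """
    # Value-rank array: each distinct count is stored once;
    # every entry keeps only the rank (index) of its count.
    values = sorted(set(ngram_counts.values()))
    rank_of = {v: r for r, v in enumerate(values)}

    keys, ranks = [], []  # per order: sorted (w, c) keys and parallel count ranks
    offset_of = {}        # n-gram tuple -> its offset in its own order's array
    for order in range(1, n + 1):
        entries = []
        for gram, count in ngram_counts.items():
            if len(gram) != order:
                continue
            # Context encoding: c is the offset of w1..w(n-1) in the previous array
            c = offset_of[gram[:-1]] if order > 1 else 0
            entries.append(((gram[-1], c), gram, rank_of[count]))
        entries.sort()  # (w, c) keys are unique, so this sorts on w, then c
        for offset, (_, gram, _) in enumerate(entries):
            offset_of[gram] = offset
        keys.append([key for key, _, _ in entries])
        ranks.append([rank for _, _, rank in entries])
    return keys, ranks, values


def lookup(model, gram):
    """Return the count of `gram` (1 <= len(gram) <= n), or None if not stored."""
    keys, ranks, values = model
    c = 0
    for order, w in enumerate(gram, start=1):
        arr = keys[order - 1]
        i = bisect_left(arr, (w, c))  # binary search in the sorted array
        if i == len(arr) or arr[i] != (w, c):
            return None
        c = i  # this entry's offset is the context for the next order
    return values[ranks[len(gram) - 1][c]]


counts = {(1,): 5, (2,): 3, (3,): 5,
          (1, 2): 3, (1, 3): 2, (2, 3): 2,
          (1, 2, 3): 2}
model = build_model(counts, 3)
print(model[2])                  # [2, 3, 5]: distinct counts, stored once
print(lookup(model, (1, 2, 3)))  # 2
print(lookup(model, (2, 1)))     # None
```

A lookup walks upward from the unigram array: each binary search returns an offset that becomes the context c for the next order, so querying an n-gram costs one search per order.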
@@ Line 24: / Line 26: @@
             + w : index of that word in unigram array
             + c : offset pointer to the context in the (n-1)-gram array
+  + Sort based on w
+Improvement: implicitly encode w. All n-grams ending with a particular word wi are stored contiguously, so storing wi in every entry is wasteful. Instead, maintain another array that saves the beginning and the end of each wi's range.
+Hash Table
 Most of the attendees apparently understood the talk and the paper well, and a
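The implicit-w improvement to the sorted array can be sketched the same way, as an illustration under stated assumptions rather than the paper's code. Given one order's (w, c) keys sorted on w and word ids in 0..vocab_size-1 (both assumptions), only the context offsets are kept, plus a single array of range starts in which the end of w's range is the start of w + 1's range. `implicit_w` and `find` are hypothetical names.

```python
from bisect import bisect_left


def implicit_w(sorted_keys, vocab_size):
    """Drop w from one order's sorted (w, c) keys.

    Returns (cs, start): cs[i] is the context offset of entry i, and all
    entries ending in word w occupy positions start[w] .. start[w + 1] - 1.
    Assumes word ids are integers in 0 .. vocab_size - 1.
    """
    cs = [c for _, c in sorted_keys]
    start = [0] * (vocab_size + 1)
    for w, _ in sorted_keys:
        start[w + 1] += 1  # count the entries for each word
    for w in range(vocab_size):
        start[w + 1] += start[w]  # prefix sums turn counts into range starts
    return cs, start


def find(cs, start, w, c):
    """Return the offset of key (w, c), or None if it is not stored."""
    lo, hi = start[w], start[w + 1]  # the range holding all entries ending in w
    i = bisect_left(cs, c, lo, hi)   # binary search on c inside that range only
    return i if i < hi and cs[i] == c else None


# Toy bigram keys: (w, c) pairs already sorted on w, then c.
cs, start = implicit_w([(2, 0), (3, 0), (3, 1)], vocab_size=4)
print(cs, start)              # [0, 0, 1] [0, 0, 0, 1, 3]
print(find(cs, start, 3, 1))  # 2
print(find(cs, start, 1, 0))  # None
```

Entries keep their positions, so the offset returned by `find` is the same one the explicit array would give and context encoding for the next order is unaffected; the per-entry w field is replaced by a table of vocab_size + 1 integers.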

Institute of Formal and Applied Linguistics Wiki