Differences
This shows you the differences between two versions of the page.
courses:rg:2012:longdtreport [2012/03/12 22:41] longdt |
courses:rg:2012:longdtreport [2012/03/12 22:45] longdt |
||
---|---|---|---|
Line 14: | Line 14: | ||
=> Maintaining a value rank array is a good way to encode counts | => Maintaining a value rank array is a good way to encode counts | ||
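A minimal sketch of the value-rank idea, assuming it means storing each distinct count once in a shared array and keeping only a small rank index per n-gram. The function names `rank_encode` and `rank_decode` and the most-frequent-first ordering are illustrative assumptions, not taken from the report.

```python
from collections import Counter


def rank_encode(counts):
    """Replace each count by its rank in a shared array of distinct values."""
    freq = Counter(counts)
    # Assumption: most frequent values get the smallest ranks;
    # ties are broken by the value itself so the result is deterministic.
    values = sorted(freq, key=lambda v: (-freq[v], v))
    rank_of = {v: r for r, v in enumerate(values)}
    ranks = [rank_of[c] for c in counts]
    return values, ranks


def rank_decode(values, ranks):
    """Recover the original counts from the value array and the ranks."""
    return [values[r] for r in ranks]
```

With few distinct count values, each rank needs far fewer bits than the raw count it stands for.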
**II. Encoding the n-gram** | **II. Encoding the n-gram** | ||
+ | |||
**// | **// | ||
encode W1,W2....Wn = c(W1, | encode W1,W2....Wn = c(W1, | ||
c is an offset function, hence the name context encoding | c is an offset function, hence the name context encoding | ||
- | **// | + | |
- | ** | + | **// |
Sorted Array | Sorted Array | ||
+ Use n arrays for an n-gram model (the i-th array is used for i-grams) | + Use n arrays for an n-gram model (the i-th array is used for i-grams) | ||
Line 24: | Line 26: | ||
+ w : index of that word in the unigram array | + w : index of that word in the unigram array | ||
+ c : offset pointer | + c : offset pointer | ||
- | | + | + Sort based on w |
+ | Improvement: implicitly encode w. All n-grams ending with a particular word wi are stored contiguously, so storing w in every entry is wasteful. Instead, maintain another array that saves the beginning and the end of each word's range | ||
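The sorted-array scheme above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each n-gram is a pair (w, c), where w is the index of the last word and c is the offset of the context (the first n-1 words) in the array one order lower. The names `build` and `lookup`, and the requirement that every stored n-gram's context is also stored, are assumptions made for the example.

```python
from bisect import bisect_left


def build(ngrams):
    """Build context-encoded sorted arrays from a collection of word tuples.

    Assumes the context (first n-1 words) of every stored n-gram is stored too.
    """
    grams = set(ngrams)
    vocab = sorted({w for g in grams for w in g})
    word_id = {w: i for i, w in enumerate(vocab)}
    max_order = max(len(g) for g in grams)
    # offset_of maps an n-gram to its position in the array for its order;
    # for unigrams the position is simply the word index.
    offset_of = {(w,): i for w, i in word_id.items()}
    arrays = {}  # order -> (starts, contexts)
    for order in range(2, max_order + 1):
        # Sort on w first, then on the context offset c.
        entries = sorted(
            (word_id[g[-1]], offset_of[g[:-1]], g)
            for g in grams if len(g) == order
        )
        contexts = [c for _, c, _ in entries]
        # starts[w] .. starts[w + 1] delimits the entries whose last word is w,
        # so w itself is never stored in the main array (implicit encoding).
        starts = [0] * (len(vocab) + 1)
        for w, _, _ in entries:
            starts[w + 1] += 1
        for w in range(len(vocab)):
            starts[w + 1] += starts[w]
        for pos, (_, _, g) in enumerate(entries):
            offset_of[g] = pos
        arrays[order] = (starts, contexts)
    return word_id, arrays


def lookup(word_id, arrays, gram):
    """Return the offset of gram in the array for its order, or None if absent."""
    if any(w not in word_id for w in gram):
        return None
    offset = word_id[gram[0]]
    for order in range(2, len(gram) + 1):
        if order not in arrays:
            return None
        starts, contexts = arrays[order]
        w = word_id[gram[order - 1]]
        lo, hi = starts[w], starts[w + 1]
        # Binary search for the context offset inside the range of word w.
        pos = bisect_left(contexts, offset, lo, hi)
        if pos == hi or contexts[pos] != offset:
            return None
        offset = pos
    return offset
```

A lookup walks the n-gram from left to right, so each step is one binary search restricted to the range of the current word. In a real model the returned offset would index a parallel array of counts or probabilities.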
+ | |||
+ | Hash Table | ||
| | ||
Most of the attendees apparently understood the talk and the paper well, and a | Most of the attendees apparently understood the talk and the paper well, and a