Differences
This shows you the differences between two versions of the page.
courses:rg:2012:longdtreport [2012/03/12 20:22] longdt |
courses:rg:2012:longdtreport [2012/03/12 22:39] longdt |
||
---|---|---|---|
Line 9: | Line 9: | ||
How it will run faster and use a smaller amount of memory. | How it will run faster and use a smaller amount of memory. | ||
- | ==== Notes ==== | + | ==== Encoding |
+ | I. Encoding the count | ||
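+ | In the Web1T corpus, the most frequent n-gram occurs 95 billion times, but the corpus contains only 770 000 unique count values. | ||
+ | => Maintaining a value rank array (each count stored as an index into an array of the unique count values, which fits in about 20 bits) is a good way to encode the counts. | ||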
+ | II. Encoding the n-gram | ||
+ | **Idea** | ||
Most of the attendees apparently understood the talk and the paper well, and a | Most of the attendees apparently understood the talk and the paper well, and a | ||
lively discussion followed. One of our first topics of debate was the notion of | lively discussion followed. One of our first topics of debate was the notion of |