
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-13 [2012/01/25 22:53]
straka
courses:mapreduce-tutorial:step-13 [2012/01/25 23:00]
straka
Line 12: Line 12:
 ===== Nonuniform data ===== ===== Nonuniform data =====
  
-The  +Assuming that the integer keys are uniformly distributed is a strong assumption. Try improving your solution so that it sorts keys with any distribution. You can use the exponentially distributed data available here:
-After solving +
  
 ^ Path ^ Size ^ ^ Path ^ Size ^
Line 19: Line 18:
 | /home/straka/hadoop/example-inputs/nonuniform-medium | 160MB | | /home/straka/hadoop/example-inputs/nonuniform-medium | 160MB |
 | /home/straka/hadoop/example-inputs/nonuniform-large | 797MB | | /home/straka/hadoop/example-inputs/nonuniform-large | 797MB |
-After you + 
 +Assume we want to produce //r// output files. One solution is to run two Hadoop jobs:
 +  - Go through the data and sample only a small fraction of the keys. Because the sample is small, it fits in a single reducer.
 +  - Find the best //r-1// integer separators using the sampled data.
 +  - Run the second pass, using the separators to guide the partitioning.
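
The three steps above can be sketched outside Hadoop in plain Python. This is only a minimal illustration under stated assumptions, not the tutorial's reference solution: the function names `sample_keys`, `find_separators` and `partition` are hypothetical, the separators are taken as evenly spaced quantiles of the sorted sample (one reasonable reading of "best"), and a real job would wire the same logic into its own sampling reducer and partitioner.

```python
import bisect
import random


def sample_keys(keys, fraction, seed=0):
    """First pass: keep each key independently with probability `fraction`."""
    rng = random.Random(seed)
    return [k for k in keys if rng.random() < fraction]


def find_separators(sample, r):
    """Pick r-1 separators as evenly spaced quantiles of the sorted sample.

    Assumption: "best" means the separators split the sample into r parts
    of roughly equal size.
    """
    if r < 2 or not sample:
        return []
    ordered = sorted(sample)
    n = len(ordered)
    return [ordered[(i * n) // r] for i in range(1, r)]


def partition(key, separators):
    """Second pass: map a key to an output file index in 0..r-1.

    Keys below the first separator go to file 0, keys at or above the
    last separator go to file r-1. The mapping is monotonic in the key,
    so concatenating the sorted files in index order gives a total order.
    """
    return bisect.bisect_right(separators, key)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for the exponentially distributed input.
    keys = [int(rng.expovariate(0.001)) for _ in range(100000)]
    r = 4
    separators = find_separators(sample_keys(keys, 0.01), r)
    sizes = [0] * r
    for k in keys:
        sizes[partition(k, separators)] += 1
    print("separators:", separators)
    print("keys per output file:", sizes)
```

Because the separators come from a sample of the actual keys, the output files should end up roughly equal in size even for skewed data, which a uniform split of the key range cannot guarantee.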
  

[ Back to the navigation ] [ Back to the content ]