Institute of Formal and Applied Linguistics Wiki


courses:mapreduce-tutorial:step-13 [2012/01/25 15:46]
straka created
courses:mapreduce-tutorial:step-13 [2012/01/25 22:53]
straka
Line 1: Line 1:
-====== MapReduce Tutorial :  ======
 +====== MapReduce Tutorial : Exercise - sorting ======
 + 
 +You are given data consisting of (31-bit integer, string data) pairs. These are available in plain text format: 
 +^ Path ^ Size ^ 
 +| /home/straka/hadoop/example-inputs/numbers-small | 3MB | 
 +| /home/straka/hadoop/example-inputs/numbers-medium | 184MB | 
 +| /home/straka/hadoop/example-inputs/numbers-large | 916MB | 
 +You can assume that the integers are uniformly distributed. 
 + 
 +Your task is to sort these data. Your solution should work for terabytes of data, so you must use multiple reducers. If your job is executed using //r// reducers, the output consists of //r// files which, when concatenated, produce sorted (key, value) pairs. In other words, each output file contains sorted (integer, data) pairs, and for any two output files, all keys in one file are either smaller or larger than all keys in the other file. 
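
One way to meet the requirement above is range partitioning: each key is sent to a reducer chosen by the key's position within the key range, so reducer //i// only sees keys smaller than those of reducer //i//+1. The following is a minimal local sketch in plain Python, not Hadoop code. It assumes keys lie in [0, 2^31) and relies on the uniform distribution stated above to keep the reducers balanced. The names `KEY_RANGE`, `partition` and `simulate_sort` are hypothetical and chosen only for illustration.

```python
KEY_RANGE = 1 << 31  # assumption: 31-bit keys lie in [0, 2**31)


def partition(key, num_reducers):
    """Return the index of the reducer responsible for this key.

    Splits the key range into num_reducers equal-width intervals, so
    reducer i receives only keys smaller than those of reducer i + 1.
    """
    return key * num_reducers // KEY_RANGE


def simulate_sort(pairs, num_reducers):
    """Simulate the job locally: bucket pairs by partition, sort each bucket.

    Returns one sorted list per reducer, standing in for the r output files.
    """
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    # Each "reducer" sorts its own bucket by key only.
    return [sorted(bucket, key=lambda pair: pair[0]) for bucket in buckets]


if __name__ == "__main__":
    sample = [(5, "a"), (2**31 - 1, "b"), (0, "c"), (2**30, "d")]
    for index, output in enumerate(simulate_sort(sample, 4)):
        print(index, output)
```

Concatenating the per-reducer lists in index order gives a globally sorted sequence. Equal-width intervals balance the load only when the keys are roughly uniform; skewed keys would overload some reducers.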
 + 
 +===== Nonuniform data ===== 
 + 
 +The  
 +After solving  
 + 
 +^ Path ^ Size ^ 
 +| /home/straka/hadoop/example-inputs/nonuniform-small | 3MB | 
 +| /home/straka/hadoop/example-inputs/nonuniform-medium | 160MB | 
 +| /home/straka/hadoop/example-inputs/nonuniform-large | 797MB | 
 +After you  
