Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-13 [2012/01/25 23:00]
straka
courses:mapreduce-tutorial:step-13 [2012/01/28 23:10]
majlis Added links to previous and next chapter.
Line 3:
 You are given data consisting of (31-bit integer, string data) pairs. These are available in plain text format:
 ^ Path ^ Size ^
-| /home/straka/hadoop/example-inputs/numbers-small | 3MB |
-| /home/straka/hadoop/example-inputs/numbers-medium | 184MB |
-| /home/straka/hadoop/example-inputs/numbers-large | 916MB |
+| /net/projects/hadoop/examples/inputs/numbers-small | 3MB |
+| /net/projects/hadoop/examples/inputs/numbers-medium | 184MB |
+| /net/projects/hadoop/examples/inputs/numbers-large | 916MB |
 You can assume that the integers are uniformly distributed.
  
Line 15:
  
 ^ Path ^ Size ^
-| /home/straka/hadoop/example-inputs/nonuniform-small | 3MB |
-| /home/straka/hadoop/example-inputs/nonuniform-medium | 160MB |
-| /home/straka/hadoop/example-inputs/nonuniform-large | 797MB |
+| /net/projects/hadoop/examples/inputs/nonuniform-small | 3MB |
+| /net/projects/hadoop/examples/inputs/nonuniform-medium | 160MB |
+| /net/projects/hadoop/examples/inputs/nonuniform-large | 797MB |
  
 Assume we want to produce //r// output files. One of the solutions is to perform two Hadoop jobs:
Line 23:
   - Find best //r-1// integer separators using the sampled data.
   - Run the second pass, using the separators to guide the partitioning.
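
The two steps above can be illustrated outside Hadoop with a minimal Python sketch. It assumes the sampled integers fit in memory and that "best" separators means evenly spaced quantiles of the sample. That is one plausible reading, not necessarily the tutorial's intended solution. The names `choose_separators` and `partition` are hypothetical and are not part of any Hadoop API.

```python
import bisect


def choose_separators(sample, r):
    """Return r-1 separators that split the sample into r roughly equal parts.

    Assumption: the separators are taken at evenly spaced quantiles of the
    sorted sample. The tutorial does not prescribe this choice.
    """
    if r < 1:
        raise ValueError("r must be at least 1")
    if not sample:
        raise ValueError("sample must not be empty")
    ordered = sorted(sample)
    n = len(ordered)
    # Separator i is the sample element at quantile i/r, for i = 1..r-1.
    return [ordered[(i * n) // r] for i in range(1, r)]


def partition(key, separators):
    """Return the index (0..r-1) of the output file that should receive key.

    Keys below separators[0] go to partition 0, keys in
    [separators[0], separators[1]) go to partition 1, and so on.
    """
    return bisect.bisect_right(separators, key)
```

Because `partition` is monotonic in the key, every key sent to partition //i// is smaller than every key sent to partition //i+1//, so sorting each partition independently and concatenating the //r// outputs in order gives a globally sorted result. With nonuniform data, separators taken from a sample keep the partitions close to balanced, which fixed equal-width ranges would not.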
 +
 +
 +----
 +
 +<html>
 +<table style="width:100%">
 +<tr>
 +<td style="text-align:left; width: 33%; "></html>[[step-12|Step 12]]: Additional output from mappers and reducers.<html></td>
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
 +<td style="text-align:right; width: 33%; "></html>[[step-14|Step 14]]: N-gram language model.<html></td>
 +</tr>
 +</table>
 +</html>
  
