Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-13 [2012/01/25 23:00]
straka
+++ courses:mapreduce-tutorial:step-13 [2012/01/31 15:52]
straka
@@ Line 3: / Line 3: @@
 You are given data consisting of (31-bit integer, string data) pairs. These are available in plain text format:
 ^ Path ^ Size ^
-| /home/straka/hadoop/example-inputs/numbers-small | 3MB |
+| /net/projects/hadoop/examples/inputs/numbers-small | 3MB |
-| /home/straka/hadoop/example-inputs/numbers-medium | 184MB |
+| /net/projects/hadoop/examples/inputs/numbers-medium | 184MB |
-| /home/straka/hadoop/example-inputs/numbers-large | 916MB |
+| /net/projects/hadoop/examples/inputs/numbers-large | 916MB |
 You can assume that the integers are uniformly distributed.
-Your task is to sort these data. Your solution should work for TBs of data. For that reason, you must use multiple reducers. If your job is executed using //r// reducers, the output consists of //r// files, which when concatenated would produce sorted (key, value) pairs. In other words, each of the output files contains sorted (integer, data) pairs and all keys in one file are either smaller or larger than in other file.
+Your task is to sort these data, comparing the key numerically and not lexicographically. In the output file, key '1' should be written as '1'.
+Your solution should work for TBs of data. For that reason, you must use multiple reducers. If your job is executed using //r// reducers, the output consists of //r// files which, when concatenated, produce sorted (key, value) pairs. In other words, each output file contains sorted (integer, data) pairs, and all keys in one file are either smaller or larger than all keys in any other file. Your solution should work for any value of //r//; this value is given to [[.:step-8#partitioning|the partitioner]] as its fourth argument.
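Because the keys are assumed to be uniformly distributed 31-bit integers, one possible partitioning rule is to split the key range into //r// equal-width intervals. The following is a minimal standalone Python sketch of that idea, not the tutorial's partitioner API: the function name `uniform_partition` and its two-argument signature are hypothetical and chosen only for illustration.

```python
def uniform_partition(key: int, r: int) -> int:
    """Map a non-negative 31-bit integer key to a partition index in [0, r).

    Hypothetical helper: assumes keys lie in [0, 2**31) and are roughly
    uniform, so equal-width intervals receive roughly equal shares.
    """
    if r < 1:
        raise ValueError("number of reducers must be at least 1")
    if not 0 <= key < 2**31:
        raise ValueError("key must be a non-negative 31-bit integer")
    # key * r / 2**31 is non-decreasing in key, so every key in partition i
    # is less than or equal to every key in partition i + 1.
    return (key * r) >> 31


if __name__ == "__main__":
    for k in (0, 2**30, 2**31 - 1):
        print(k, uniform_partition(k, 4))
```

Since the mapping is non-decreasing in the key, concatenating the reducer outputs in partition order keeps the keys globally sorted, provided each reducer receives its keys in numeric order.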
 ===== Nonuniform data =====
@@ Line 15: / Line 17: @@
 ^ Path ^ Size ^
-| /home/straka/hadoop/example-inputs/nonuniform-small | 3MB |
+| /net/projects/hadoop/examples/inputs/nonuniform-small | 3MB |
-| /home/straka/hadoop/example-inputs/nonuniform-medium | 160MB |
+| /net/projects/hadoop/examples/inputs/nonuniform-medium | 160MB |
-| /home/straka/hadoop/example-inputs/nonuniform-large | 797MB |
+| /net/projects/hadoop/examples/inputs/nonuniform-large | 797MB |
 Assume we want to produce //r// output files. One of the solutions is to perform two Hadoop jobs:
@@ Line 23: / Line 25: @@
   - Find the best //r-1// integer separators using the sampled data.
   - Run the second pass, using the separators to guide the partitioning.
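The last two steps above can be sketched in plain Python, outside of Hadoop. This is only an illustration under stated assumptions: the names `choose_separators` and `partition_for` are hypothetical, "best" is interpreted here as evenly spaced quantiles of the sample, and the sample is assumed to fit in memory.

```python
from bisect import bisect_right
from typing import List, Sequence


def choose_separators(sample: Sequence[int], r: int) -> List[int]:
    """Pick r-1 separators as evenly spaced quantiles of the sorted sample."""
    if r < 1:
        raise ValueError("number of reducers must be at least 1")
    if not sample:
        raise ValueError("sample must not be empty")
    ordered = sorted(sample)
    n = len(ordered)
    # The i-th separator sits roughly i/r of the way through the sample.
    return [ordered[(i * n) // r] for i in range(1, r)]


def partition_for(key: int, separators: Sequence[int]) -> int:
    """Return the partition index for key using binary search.

    Keys smaller than the first separator go to partition 0; a key equal
    to a separator goes to the partition on its right.
    """
    return bisect_right(separators, key)


if __name__ == "__main__":
    sample = [1, 1, 2, 3, 100, 1000, 5000, 9000]
    seps = choose_separators(sample, 4)
    print(seps)
    print([partition_for(k, seps) for k in (1, 2, 99, 100, 5000, 10**6)])
```

Quantile separators adapt to skewed key distributions, which equal-width intervals do not; how well the partitions balance depends on how representative the sample is.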
+----
+<html>
+<table style="width:100%">
+<tr>
+<td style="text-align:left; width: 33%; "></html>[[step-12|Step 12]]: Additional output from mappers and reducers.<html></td>
+<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
+<td style="text-align:right; width: 33%; "></html>[[step-14|Step 14]]: N-gram language model.<html></td>
+</tr>
+</table>
+</html>


Institute of Formal and Applied Linguistics Wiki
