====== MapReduce Tutorial : Hadoop job overview ======
  
A regular Hadoop job consists of:
  * [required] //a mapper// -- processes input (key, value) pairs and produces (key, value) pairs. There can be multiple mappers: each file is divided into splits (32MB by default) and each split is processed by one mapper. The script ''/net/projects/hadoop/bin/compute-splitsize input nr_of_mappers'' can be used to compute a split size such that the resulting job consists of the specified number of mappers. (A minimal Java example follows this list.)
  * [optional] //a reducer// -- processes, in ascending order of keys, each key together with all its associated values, and produces (key, value) pairs. The user can specify the number of reducers: 0, 1 or more; the default is 1.
  * [optional] //a partitioner// -- executed on every (key, value) pair produced by a mapper; outputs the number of the reducer which should process this pair.
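
To make the (key, value) contract above concrete, here is a minimal word-count mapper and reducer. This sketch assumes Hadoop's ''org.apache.hadoop.mapreduce'' ("new") Java API; it is not part of the tutorial's code, and the class names are illustrative only.

<code java>
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
  // The mapper receives (byte offset, line) pairs from its input split
  // and emits a (word, 1) pair for every token on the line.
  public static class TheMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (token.isEmpty()) continue;
        word.set(token);
        context.write(word, ONE);
      }
    }
  }

  // The reducer is called once per key, in ascending order of keys,
  // with all values associated with that key; it emits the total count.
  public static class TheReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) sum += value.get();
      context.write(key, new IntWritable(sum));
    }
  }
}
</code>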
  
An AllReduce Hadoop job ([[.:step-16|Perl version]], [[.:step-31|Java version]]) consists of a mapper only. All the mappers must be executed simultaneously and can communicate using an ''allReduce'' function.
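
The sketch below is //not// the tutorial's ''allReduce'' API (see the linked Perl and Java steps for that); it is only a self-contained, thread-based illustration of the communication pattern: every worker contributes a value, blocks until all workers have contributed, and then all of them observe the same combined result. In a real AllReduce job the workers are mapper processes communicating over the network, not threads.

<code java>
import java.util.concurrent.CyclicBarrier;

public class AllReducePattern {
  private final CyclicBarrier barrier;
  private double sum = 0;   // guarded by this

  public AllReducePattern(int workers) {
    barrier = new CyclicBarrier(workers);
  }

  // Every worker calls this with its local value; the call blocks until
  // all workers have contributed, then returns the global sum to each.
  public double allReduce(double value) throws Exception {
    synchronized (this) { sum += value; }
    barrier.await();        // also publishes sum to all threads
    return sum;
  }

  public static void main(String[] args) {
    final int workers = 4;
    final AllReducePattern comm = new AllReducePattern(workers);
    for (int i = 0; i < workers; i++) {
      final double local = i + 1;   // worker i contributes i+1
      new Thread(new Runnable() {
        public void run() {
          try {
            // Every worker prints the same total: 1+2+3+4 = 10.0.
            System.out.println("total: " + comm.allReduce(local));
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }).start();
    }
  }
}
</code>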

Any Hadoop job can run:
  * on a cluster. A separate process is used for every mapper and reducer.
  * locally. No processes are created; the computation runs in a single thread. Useful for debugging (see the driver sketch below this list). //Warning: in this mode, there cannot be more than 1 reducer. This is a deficiency of Hadoop, which is already fixed in the development version.//
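
Forcing the local debugging mode is, at its core, a single configuration setting. The driver below is a sketch assuming Hadoop 1.x property names (''mapred.job.tracker''); it reuses the ''WordCount'' classes from the first example and is not one of the tutorial's run scripts (those live under ''/net/projects/hadoop/bin'').

<code java>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Run in-process with a single thread instead of on the cluster
    // (Hadoop 1.x property name; useful for debugging).
    conf.set("mapred.job.tracker", "local");
    // Optional: read/write the local filesystem instead of HDFS.
    conf.set("fs.default.name", "file:///");

    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCount.TheMapper.class);
    job.setReducerClass(WordCount.TheReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setNumReduceTasks(1);   // local mode supports at most one reducer

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
</code>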
  
