====== MapReduce Tutorial : Hadoop job overview ======
  
A regular Hadoop job consists of:
  * [required] //a mapper// -- processes input (key, value) pairs and produces (key, value) pairs. There can be multiple mappers: each file is divided into splits (32MB by default) and each split is processed by one mapper. The script ''/net/projects/hadoop/bin/compute-splitsize input nr_of_mappers'' can be used to compute a split size such that the resulting job consists of the specified number of mappers. (A minimal Java example follows this list.)
  * [optional] //a reducer// -- processes, in ascending order of keys, each key together with all its associated values, and produces (key, value) pairs. The user can specify the number of reducers: 0, 1 or more; the default is 1.
  * [optional] //a partitioner// -- executed on every (key, value) pair produced by a mapper; outputs the number of the reducer which should process this pair.
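
To make the (key, value) contract above concrete, here is a minimal word-count mapper and reducer. This sketch assumes Hadoop's ''org.apache.hadoop.mapreduce'' ("new") Java API; it is not part of the tutorial's code, and the class names are illustrative only.

<code java>
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
  // The mapper receives (byte offset, line) pairs from its input split
  // and emits a (word, 1) pair for every token on the line.
  public static class TheMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (token.isEmpty()) continue;
        word.set(token);
        context.write(word, ONE);
      }
    }
  }

  // The reducer is called once per key, in ascending order of keys,
  // with all values associated with that key; it emits the total count.
  public static class TheReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) sum += value.get();
      context.write(key, new IntWritable(sum));
    }
  }
}
</code>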
  
An AllReduce Hadoop job ([[.:step-16|Perl version]], [[.:step-31|Java version]]) consists of a mapper only. All the mappers must be executed simultaneously and can communicate using an ''allReduce'' function.
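
The sketch below is //not// the tutorial's ''allReduce'' API (see the linked Perl and Java steps for that); it is only a self-contained, thread-based illustration of the communication pattern: every worker contributes a value, blocks until all workers have contributed, and then all of them observe the same combined result. In a real AllReduce job the workers are mapper processes communicating over the network, not threads.

<code java>
import java.util.concurrent.CyclicBarrier;

public class AllReducePattern {
  private final CyclicBarrier barrier;
  private double sum = 0;   // guarded by this

  public AllReducePattern(int workers) {
    barrier = new CyclicBarrier(workers);
  }

  // Every worker calls this with its local value; the call blocks until
  // all workers have contributed, then returns the global sum to each.
  public double allReduce(double value) throws Exception {
    synchronized (this) { sum += value; }
    barrier.await();        // also publishes sum to all threads
    return sum;
  }

  public static void main(String[] args) {
    final int workers = 4;
    final AllReducePattern comm = new AllReducePattern(workers);
    for (int i = 0; i < workers; i++) {
      final double local = i + 1;   // worker i contributes i+1
      new Thread(new Runnable() {
        public void run() {
          try {
            // Every worker prints the same total: 1+2+3+4 = 10.0.
            System.out.println("total: " + comm.allReduce(local));
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }).start();
    }
  }
}
</code>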

Any Hadoop job can run:
  * on a cluster. A separate process is used for every mapper and reducer.
  * locally. No processes are created; the computation runs in a single thread. Useful for debugging (see the driver sketch below this list). //Warning: in this mode, there cannot be more than 1 reducer. This is a deficiency of Hadoop, which is already fixed in the development version.//
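
Forcing the local debugging mode is, at its core, a single configuration setting. The driver below is a sketch assuming Hadoop 1.x property names (''mapred.job.tracker''); it reuses the ''WordCount'' classes from the first example and is not one of the tutorial's run scripts (those live under ''/net/projects/hadoop/bin'').

<code java>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Run in-process with a single thread instead of on the cluster
    // (Hadoop 1.x property name; useful for debugging).
    conf.set("mapred.job.tracker", "local");
    // Optional: read/write the local filesystem instead of HDFS.
    conf.set("fs.default.name", "file:///");

    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCount.TheMapper.class);
    job.setReducerClass(WordCount.TheReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setNumReduceTasks(1);   // local mode supports at most one reducer

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
</code>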
  
