====== MapReduce Tutorial : Hadoop job overview ======

A regular Hadoop job consists of the following components (a minimal Java sketch follows the list):
  * [required] //a mapper// -- processes input (key, value) pairs and produces (key, value) pairs. There can be multiple mappers: each input file is divided into splits (32MB by default) and each split is processed by one mapper. The script ''/net/projects/hadoop/bin/compute-splitsize input nr_of_mappers'' can be used to compute a split size such that the resulting job consists of the specified number of mappers.
  * [optional] //a reducer// -- processes each key together with all its associated values, with keys arriving in ascending order, and produces (key, value) pairs. The user can specify the number of reducers: 0, 1 or more; the default is 1.
  * [optional] //a combiner// -- a reducer which is executed locally on the output of a mapper.
  * [optional] //a partitioner// -- executed on every (key, value) pair produced by a mapper; it outputs the number of the reducer which should process that pair.

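The following is a minimal, self-contained Java sketch of such a job (a word-count example) written directly against the standard ''org.apache.hadoop.mapreduce'' API rather than this tutorial's helper scripts; the class names, the choice of two reducers and the partitioning by first character are made up for illustration only.

<code java>
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Mapper: processes input (key, value) pairs, produces (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (token.isEmpty()) continue;
        word.set(token);
        context.write(word, ONE);
      }
    }
  }

  // Reducer: receives each key with all its values (keys arrive in ascending
  // order) and produces (word, total count) pairs. Also usable as a combiner.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) sum += value.get();
      result.set(sum);
      context.write(key, result);
    }
  }

  // Partitioner: for every (key, value) pair produced by a mapper, returns the
  // number of the reducer that should process it (here by the first character).
  public static class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReducers) {
      String word = key.toString();
      return (word.isEmpty() ? 0 : word.charAt(0)) % numReducers;
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);            // required
    job.setCombinerClass(SumReducer.class);               // optional combiner
    job.setReducerClass(SumReducer.class);                // optional reducer
    job.setPartitionerClass(FirstCharPartitioner.class);  // optional partitioner
    job.setNumReduceTasks(2);                             // 0, 1 or more reducers
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
</code>

The combiner here simply reuses the reducer class, which is safe because summation is associative and commutative; the individual tutorial steps show how jobs are actually built and submitted in this course.
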
An AllReduce Hadoop job ([[.:step-16|Perl version]], [[.:step-31|Java version]]) consists of a mapper only. All the mappers must be executed simultaneously and can communicate using an ''allReduce'' function.

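The sketch below only illustrates the idea of such a mapper: it accumulates a local statistic and combines it across all simultaneously running mappers at the end. The ''allReduce'' helper used here is a hypothetical placeholder with a made-up signature; the real interface is described in the linked Perl and Java steps.

<code java>
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AllReduceSketchMapper extends Mapper<Object, Text, Text, DoubleWritable> {
  // Hypothetical placeholder: in a real AllReduce job this call would make all
  // simultaneously running mappers exchange their local values and return the
  // global sum. Here it just returns the local value so the sketch compiles.
  private static double allReduce(double localValue) {
    return localValue;
  }

  private double localSum = 0;

  @Override
  public void map(Object key, Text value, Context context) {
    // Accumulate a local statistic only; nothing is emitted per input pair.
    localSum += value.toString().split("\\s+").length;
  }

  @Override
  public void cleanup(Context context) throws IOException, InterruptedException {
    // Combine the local statistic across all mappers and emit the global value once.
    double globalSum = allReduce(localSum);
    context.write(new Text("total words"), new DoubleWritable(globalSum));
  }
}
</code>
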
Any Hadoop job can run:
  * on a cluster. A separate process is used for every mapper and reducer.
  * locally. No processes are created; the computation runs in a single thread. This is useful for debugging (see the configuration sketch below). //Warning: in this mode, there cannot be more than 1 reducer. This is a deficiency of Hadoop, which is already fixed in the development version.//
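
Which of the two modes is used is a matter of configuration. The sketch below shows the standard Hadoop 1.x properties that force the local single-threaded mode; it is only an illustration of plain Hadoop configuration (the tutorial's own scripts may set these for you), and newer Hadoop versions use ''mapreduce.framework.name'' and ''fs.defaultFS'' instead.

<code java>
import org.apache.hadoop.conf.Configuration;

public class LocalModeConfiguration {
  // Returns a Configuration that makes a job run locally, in a single thread,
  // reading and writing the local filesystem (Hadoop 1.x property names).
  public static Configuration local() {
    Configuration conf = new Configuration();
    conf.set("mapred.job.tracker", "local");   // local job runner instead of the cluster
    conf.set("fs.default.name", "file:///");   // local filesystem instead of HDFS
    return conf;
  }
}
</code>

When a job driver uses ''ToolRunner'', the same properties can also be passed on the command line with the generic ''-D'' option instead of being set in code.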
