Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-12 [2012/01/25 15:46]
straka created
+++ courses:mapreduce-tutorial:step-12 [2012/01/25 21:29]
straka
@@ Line 1: / Line 1: @@
-====== MapReduce Tutorial :  ======
+====== MapReduce Tutorial : Additional output from mappers and reducers ======
+Sometimes it is useful to create output files manually in reducers -- either because multiple files are needed per reducer, or because a specific file format is desired.
+The problem is that the Hadoop framework can spawn several task attempts for the same reducer task -- either because of speculative execution, or because a reduce attempt is presumed to have crashed even though it has not.
+For these reasons Hadoop creates a separate output directory for every reduce attempt it makes. If the attempt finishes successfully, the files in this directory are moved to the job output directory. The user must still ensure that different reducers produce different filenames, usually by including the serial number of the reducer in the file name.
+Both pieces of information are available in the Perl API through environment variables:
+  * ''HADOOP_TASK_ID'' -- available in every mapper and reducer. The serial number of the mapper or reducer task (in range 0..number_of_tasks-1).
+  * ''HADOOP_WORK_OUTPUT_PATH'' -- available in a reducer. It contains the path of an existing directory where the reducer can write files. If the reducer finishes successfully, all files and subdirectories are moved to the output directory of the job.
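The two variables can be combined into a file name that is unique per task. The following is a minimal sketch in Python, used here only because environment variables are language-neutral; in a Perl script the same values can be read from ''%ENV''. The function name ''side_output_path'', the ''prefix'' argument and the five-digit zero padding are assumptions made for illustration and are not part of the tutorial's API.

```python
import os


def side_output_path(prefix, env=os.environ):
    """Return a path for an extra output file that is unique per task.

    Hypothetical helper: only the two environment variable names
    come from the text above.
    """
    # Attempt-specific directory provided by the framework.
    work_dir = env["HADOOP_WORK_OUTPUT_PATH"]
    # Serial number of the task, in range 0..number_of_tasks-1.
    task_id = int(env["HADOOP_TASK_ID"])
    # Zero padding to five digits is an arbitrary choice of this sketch.
    return os.path.join(work_dir, "%s-%05d.txt" % (prefix, task_id))
```

Because the task number is part of the file name, files written by different reducers should not collide once they are moved to the job output directory.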
+===== Reduce-less jobs =====
+If an MR job runs without reducers, the output of the mappers is written to the output directory without further processing. In this case, the environment variable ''HADOOP_WORK_OUTPUT_PATH'' is present even in a mapper, and the files created in this directory are copied to the job output directory.
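Code shared between mappers and reducers may want to check whether ''HADOOP_WORK_OUTPUT_PATH'' is set before writing extra files. The sketch below, again in Python for illustration only, writes a file when the variable is present and does nothing otherwise. The helper name ''write_side_file'' and the file naming scheme are hypothetical, and treating a missing variable as "no side output possible" is an assumption based on the description above, not documented behaviour.

```python
import os


def write_side_file(name, lines, env=os.environ):
    """Write lines to an extra output file; return its path, or None.

    Hypothetical helper: returns None when HADOOP_WORK_OUTPUT_PATH
    is not set (assumed to mean extra output is unavailable here).
    """
    work_dir = env.get("HADOOP_WORK_OUTPUT_PATH")
    if work_dir is None:
        return None
    # Include the task number so that different tasks use different files.
    path = os.path.join(work_dir, "%s-%s" % (name, env["HADOOP_TASK_ID"]))
    with open(path, "w") as out:
        for line in lines:
            out.write(line + "\n")
    return path
```

The file is created inside the attempt-specific directory, so output from an attempt that does not finish successfully should not end up in the job output directory.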


Institute of Formal and Applied Linguistics Wiki
