Differences
This shows you the differences between two versions of the page.
|
courses:mapreduce-tutorial:step-12 [2012/01/25 21:09] straka |
courses:mapreduce-tutorial:step-12 [2012/01/31 09:39] (current) straka Change Perl commandline syntax. |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== MapReduce Tutorial : Additional output from mappers and reducers ====== | ====== MapReduce Tutorial : Additional output from mappers and reducers ====== | ||
| + | |||
| + | Sometimes it would be useful to create output files manually in reducers -- either multiple files are needed per reducer, or a specific file format is desired. | ||
| + | |||
| + | The problem is that the Hadoop framework can spawn several task attempts for the same reducer task -- either because of speculative execution, or because a reduce attempt is presumed to have crashed even though it did not. | ||
| + | |||
| + | For these reasons Hadoop creates an output directory for every reduce attempt it makes. If the reducer finishes successfully, | ||
| + | |||
| + | Both pieces of information are available in the Perl API through environment variables: | ||
| + | * '' | ||
| + | * '' | ||
| + | |||
| + | ===== Reduce-less jobs ===== | ||
| + | If an MR job runs without reducers, the output of the mappers is written to the output directory without further processing. In this case, the environment variable '' | ||
| + | |||
| + | |||
| + | ===== Exercise ===== | ||
| + | Change the word counting script {{: | ||
| + | |||
| + | wget --no-check-certificate ' | ||
| + | # NOW EDIT THE FILE | ||
| + | # $EDITOR step-12-exercise.pl | ||
| + | rm -rf step-12-out-ex; | ||
| + | less step-12-out-ex/ | ||
| + | |||
| + | ==== Solution ==== | ||
| + | You can also download the solution {{: | ||
| + | wget --no-check-certificate ' | ||
| + | # NOW VIEW THE FILE | ||
| + | # $EDITOR step-12-solution.pl | ||
| + | rm -rf step-12-out-sol; | ||
| + | less step-12-out-sol/ | ||
| + | |||
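
The idea of writing additional output into the per-attempt directory can be sketched roughly as follows. This is only an illustration in plain shell, not the Perl API itself: the variable names `ATTEMPT_OUTPUT_DIR` and `ATTEMPT_TASK_NAME` are hypothetical placeholders (the real variable names are not shown here), the second one is assumed to carry some task-unique name, and a temporary directory stands in for the directory Hadoop would create for the attempt.

```shell
# Sketch only: ATTEMPT_OUTPUT_DIR and ATTEMPT_TASK_NAME are hypothetical
# placeholder names, NOT the variables that the Perl API actually sets.

# Write one extra output file into the directory of the current attempt.
# Reads records from stdin and stores them in a file named after the task,
# so that files from different tasks cannot collide.
write_side_output() {
    out_dir=${ATTEMPT_OUTPUT_DIR:?attempt output directory is not set}
    task=${ATTEMPT_TASK_NAME:?task name is not set}
    mkdir -p "$out_dir"
    cat > "$out_dir/extra-$task.txt"
}

# Local demonstration: a temporary directory plays the role of the
# per-attempt directory that the framework would provide.
demo_dir=$(mktemp -d)
export ATTEMPT_OUTPUT_DIR="$demo_dir/attempt_0"
export ATTEMPT_TASK_NAME="r-00000"

# Two tab-separated word counts are written as additional output.
printf 'apple\t3\nbanana\t1\n' | write_side_output

# Show what ended up in the attempt directory.
cat "$demo_dir/attempt_0/extra-r-00000.txt"
```

Writing only inside the directory given for the current attempt, rather than directly into the final output directory, is what keeps files from repeated or speculative attempts of the same task from overwriting one another.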
