Differences
This shows you the differences between two versions of the page.
courses:mapreduce-tutorial:step-3 [2012/01/25 21:33] straka |
courses:mapreduce-tutorial:step-3 [2012/01/27 21:01] straka |
||
---|---|---|---|
Line 3: | Line 3: | ||
The simplest Hadoop job consists of a mapper only. The input data is divided into several parts, each processed by an independent mapper, and the results are collected in one directory, one file per mapper. | The simplest Hadoop job consists of a mapper only. The input data is divided into several parts, each processed by an independent mapper, and the results are collected in one directory, one file per mapper. | ||
- | The Hadoop framework handles | + | The Hadoop framework |
===== Example Perl mapper ===== | ===== Example Perl mapper ===== | ||
- | <file perl mapper.pl> | + | <file perl> |
# | # | ||
Line 34: | Line 34: | ||
The values '' | The values '' | ||
- | Resulting script can be executed locally | + | Resulting script can be executed locally |
perl script.pl run input_directory output_directory | perl script.pl run input_directory output_directory | ||
All files in input_directory are processed. The output_directory must not exist. | All files in input_directory are processed. The output_directory must not exist. | ||
Line 40: | Line 40: | ||
===== Exercise ===== | ===== Exercise ===== | ||
- | To check that your Hadoop environment works, try running a MR job on ''/ | + | To check that your Hadoop environment works, try running a MR job on ''/ |
+ | rm -rf step-3-output; | ||
{{.: | {{.: | ||
+ |
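
The mapper-only behaviour described in this step (input divided into parts, each part handled by an independent mapper, one output file per mapper, output directory must not already exist) can be illustrated with a small local simulation. This is only a sketch in plain Python, not the Perl API used by the tutorial and not Hadoop itself. The names `mapper`, `run_map_only` and the `part-NNNNN` file naming are assumptions made for illustration, and each input file is treated as one part, which is a simplification of how Hadoop splits input.

```python
import os
import tempfile


def mapper(key, value):
    """Hypothetical mapper: emit the line number and the upper-cased line."""
    yield key, value.upper()


def run_map_only(input_dir, output_dir, mapper_fn):
    """Simulate a mapper-only job: one output file per input part."""
    # The output directory must not exist; makedirs raises FileExistsError otherwise.
    os.makedirs(output_dir)
    names = sorted(
        n for n in os.listdir(input_dir)
        if os.path.isfile(os.path.join(input_dir, n))
    )
    written = []
    for part, name in enumerate(names):
        # Assumption: every input file is one part, handled by its own mapper run.
        out_path = os.path.join(output_dir, f"part-{part:05d}")
        with open(os.path.join(input_dir, name), encoding="utf-8") as src, \
                open(out_path, "w", encoding="utf-8") as dst:
            for line_no, line in enumerate(src):
                for k, v in mapper_fn(line_no, line.rstrip("\n")):
                    dst.write(f"{k}\t{v}\n")
        written.append(out_path)
    return written


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    input_dir = os.path.join(base, "input_directory")
    os.makedirs(input_dir)
    with open(os.path.join(input_dir, "a.txt"), "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")
    for path in run_map_only(input_dir, os.path.join(base, "output_directory"), mapper):
        print(path)
```

Because no reducer is involved, the per-part outputs are never merged or sorted across parts; each output file depends only on its own input part, which is why the parts can be processed independently.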