Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-24 [2012/01/27 21:27]
straka
+++ courses:mapreduce-tutorial:step-24 [2012/01/27 21:41]
straka
@@ Line 1: / Line 1: @@
 ====== MapReduce Tutorial : Mappers, running Java Hadoop jobs ======
-We start by exploring a simple Hadoop job with Mapper only. The Mapper outputs only keys starting with ''A''.
+We start by going through a simple Hadoop job with Mapper only.
+A mapper which processes (key, value) pairs of types (Kin, Vin) and produces (key, value) pairs of types (Kout, Vout) must be a subclass of [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Mapper.html|Mapper<Kin, Vin, Kout, Vout>]]. In our case, ''TheMapper'' is a subclass of ''Mapper<Text, Text, Text, Text>''.
+http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Mapper.html
+ The Mapper outputs only keys starting with ''A''.
 <file java MapperOnlyHadoopJob.java>
 import java.io.IOException;
@@ Line 63: / Line 71: @@
 Download the source and compile it.
-The //official// way of running Hadoop jobs is to use the ''/SGE/HADOOP/active/bin/hadoop'' script. Jobs submitted through this script can be configured using Hadoop properties only. Therefore a wrapper script is provided, with similar options as the Perl API runner:
+The official way of running Hadoop jobs is to use the ''/SGE/HADOOP/active/bin/hadoop'' script. Jobs submitted through this script can be configured using Hadoop properties only. Therefore a wrapper script is provided, with similar options as the Perl API runner:
-  * ''net/projects/hadoop/bin/hadoop [-r number_of_reducers] job.jar input_path output_path'' executes the given job locally in a single thread. It is useful for debugging.
+  * ''net/projects/hadoop/bin/hadoop [-r number_of_reducers] job.jar [generic Hadoop properties] input_path output_path'' -- executes the given job locally in a single thread. It is useful for debugging.
-  * ''net/projects/hadoop/bin/hadoop -jt cluster_master [-r number_of_reducers] job.jar input_path output_path'' submits the job to given ''cluster_master''.
+  * ''net/projects/hadoop/bin/hadoop -jt cluster_master [-r number_of_reducers] job.jar [generic Hadoop properties] input_path output_path'' -- submits the job to given ''cluster_master''.
-  * ''net/projects/hadoop/bin/hadoop -c number_of_machines [-w secs_to_wait_after_job_finishes] [-r number_of_reducers] job.jar input_path output_path'' creates a new cluster with specified number of machines, which executes given job, and then waits for specified number of seconds before it stops.
+  * ''net/projects/hadoop/bin/hadoop -c number_of_machines [-w secs_to_wait_after_job_finishes] [-r number_of_reducers] job.jar [generic Hadoop properties] input_path output_path'' -- creates a new cluster with the specified number of machines, which executes the given job and then waits for the specified number of seconds before it stops.
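
The mapper-only job described in the first hunk keeps only (key, value) pairs whose key starts with ''A''. The following is a minimal plain-Java sketch of that filtering step, not the tutorial's ''MapperOnlyHadoopJob.java'' and not code that uses the Hadoop API. The class name ''StartsWithAFilterSketch'', the ''run'' helper, and the ''BiConsumer'' that stands in for Hadoop's ''Context.write'' are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a Mapper<Text, Text, Text, Text>; it does not use Hadoop.
class StartsWithAFilterSketch {

    // Plays the role of Mapper.map(key, value, context):
    // the pair is emitted only when the key starts with "A".
    public static void map(String key, String value, BiConsumer<String, String> emit) {
        if (key.startsWith("A")) {
            emit.accept(key, value);
        }
    }

    // Calls map once per input pair and collects the emitted pairs
    // as tab-separated "key\tvalue" lines. There is no reduce step.
    public static List<String> run(List<String[]> input) {
        List<String> output = new ArrayList<>();
        for (String[] pair : input) {
            map(pair[0], pair[1], (k, v) -> output.add(k + "\t" + v));
        }
        return output;
    }

    public static void main(String[] args) {
        // Made-up sample data, only to exercise the filter.
        List<String[]> input = Arrays.asList(
            new String[] {"Apple", "1"},
            new String[] {"banana", "2"},
            new String[] {"Avocado", "3"});
        for (String line : run(input)) {
            System.out.println(line);
        }
    }
}
```

In a real Hadoop job the same condition would sit inside the ''map'' method of the ''Mapper'' subclass, with the pair written to the job's context instead of a list.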


Institute of Formal and Applied Linguistics Wiki
