courses:mapreduce-tutorial:step-24 [2012/01/27 22:08] straka
Job job = new Job(getConf(), this.getClass().getName()); // Create class representing Hadoop job.
                                                         // Name of the job is the name of the current class.

job.setJarByClass(this.getClass()); // Use jar containing current class.
</file>

Remarks:
  * The filename //must// be the same as the name of the class -- this is enforced by the Java compiler.
  * Multiple jobs can be submitted from one class, either in sequence or in parallel.
  * A mismatch of types is usually detected by the compiler, but sometimes only at runtime. In that case an exception is raised and the program crashes. For example, the default output key class is ''LongWritable'' -- if ''Text'' were not specified, the program would crash.

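The last remark can be made concrete. A minimal sketch of declaring the output types on the ''job'' created above -- ''setOutputKeyClass'' and ''setOutputValueClass'' are standard ''org.apache.hadoop.mapreduce.Job'' methods, while ''IntWritable'' is only an illustrative value type:

<file java>
// Declare output types explicitly, so that a mismatch is caught early.
// Without setOutputKeyClass, Hadoop assumes LongWritable keys and the
// job would fail at runtime once the reducer emits Text keys.
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);   // illustrative value type
</file>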
===== Running the job =====
The official way of running Hadoop jobs is to use the ''/SGE/HADOOP/active/bin/hadoop'' script. Jobs submitted through this script can be configured using Hadoop properties only. Therefore a wrapper script is provided, with options similar to the Perl API runner:
  * ''/net/projects/hadoop/bin/hadoop [-r number_of_reducers] job.jar [generic Hadoop properties] input_path output_path'' -- executes the given job locally in a single thread. It is useful for debugging.
  * ''/net/projects/hadoop/bin/hadoop -c number_of_machines [-w secs_to_wait_after_job_finishes] [-r number_of_reducers] job.jar [generic Hadoop properties] input_path output_path'' -- creates a new cluster with the specified number of machines, executes the given job on it, and then waits the specified number of seconds before the cluster is stopped.
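For example, a local debugging run of a compiled job might look like this (the ''job.jar'', ''input.txt'' and ''output-dir'' names are illustrative):

  /net/projects/hadoop/bin/hadoop -r 1 job.jar input.txt output-dir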

===== Exercise =====
Download ''MapperOnlyHadoopJob.java'', compile it and run it using

  /net/projects/hadoop/bin/hadoop -r 0 MapperOnlyHadoopJob