courses:mapreduce-tutorial:step-24 [2012/01/27 22:08] straka
Job job = new Job(getConf(), this.getClass().getName()); // Create class representing Hadoop job.
                                                         // Name of the job is the name of current class.

job.setJarByClass(this.getClass()); // Use jar containing current class.
</file>

Remarks:
  * The filename //must// be the same as the name of the class -- this is enforced by the Java compiler.
  * Multiple jobs can be submitted from a single class, either in sequence or in parallel.
  * A mismatch of types is usually detected by the compiler, but sometimes only at runtime; in that case an exception is raised and the program crashes.
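The last remark is not specific to Hadoop: in Java, generic type parameters are erased at compile time, so a raw-typed reference can let a wrongly typed value slip past the compiler, and the mismatch only surfaces as a ''ClassCastException'' at run time. A minimal plain-Java sketch (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class RawTypeDemo {
    // Returns true when a wrongly-typed element sneaks past the compiler
    // and the type mismatch is detected only at run time.
    static boolean mismatchCaughtAtRuntime() {
        List<Integer> ints = new ArrayList<>();
        List raw = ints;          // raw type: the compiler only warns here
        raw.add("not an int");    // accepted at compile time
        try {
            int x = ints.get(0);  // implicit cast to Integer fails here
            return false;
        } catch (ClassCastException e) {
            return true;          // mismatch detected only at run time
        }
    }

    public static void main(String[] args) {
        System.out.println(mismatchCaughtAtRuntime()); // prints true
    }
}
```

In the Hadoop API a similar situation arises when the key/value classes configured on the job do not match the generic parameters of the mapper or reducer; such a program compiles but crashes during execution.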

===== Running the job =====
The official way of running Hadoop jobs is to use the ''/SGE/HADOop/active/bin/hadoop'' script. Jobs submitted through this script can be configured using Hadoop properties only, so a wrapper script with options similar to the Perl API runner is provided:
  * ''/net/projects/hadoop/bin/hadoop [-r number_of_reducers] job.jar [generic Hadoop properties] input_path output_path'' -- executes the given job locally in a single thread; useful for debugging.
  * ''/net/projects/hadoop/bin/hadoop -c number_of_machines [-w secs_to_wait_after_job_finishes] [-r number_of_reducers] job.jar [generic Hadoop properties] input_path output_path'' -- creates a new cluster with the specified number of machines, executes the given job on it, and then waits the specified number of seconds before stopping the cluster.

===== Exercise =====
Download ''MapperOnlyHadoopJob.java'', compile it, and run it using
  /net/projects/hadoop/bin/hadoop -r 0 MapperOnlyHadoopJob