courses:mapreduce-tutorial:step-24 [2012/01/30 15:38] majlis
courses:mapreduce-tutorial:step-24 [2012/01/31 09:52] straka: Change Java commandline syntax.
===== Running the job =====
The official way of running Hadoop jobs is to use the ''/SGE/HADOOP/active/bin/hadoop'' script. Jobs submitted through this script can be configured only via Hadoop properties. A wrapper script is therefore provided, with options similar to the Perl API runner:
  * ''/net/projects/hadoop/bin/hadoop job.jar [-r number_of_reducers] [-Dname=value -Dname=value ...] input_path output_path'' -- executes the given job locally in a single thread, which is useful for debugging.
  * ''/net/projects/hadoop/bin/hadoop job.jar -jt cluster_master [-r number_of_reducers] [-Dname=value -Dname=value ...] input_path output_path'' -- submits the job to the given ''cluster_master''.
  * ''/net/projects/hadoop/bin/hadoop job.jar -c number_of_machines [-w secs_to_wait_after_job_finishes] [-r number_of_reducers] [-Dname=value -Dname=value ...] input_path output_path'' -- creates a new cluster with the specified number of machines, executes the given job on it, and then waits the specified number of seconds before stopping the cluster.
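The command-line shape shared by the three invocations above (''job.jar'' first, then wrapper options, then generic ''-Dname=value'' Hadoop properties, then the paths) can be illustrated with a small stand-alone sketch. This is not the wrapper itself (which is a script), just a toy parser in plain Java; the class and field names are this sketch's own:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy parser for the wrapper's command line: job.jar comes first,
// followed by wrapper options (-jt, -c, -w, -r), generic Hadoop
// -Dname=value properties, and finally input_path and output_path.
class WrapperArgs {
    String jobJar;
    String jobTracker;        // -jt cluster_master
    Integer machines;         // -c number_of_machines
    Integer waitSecs;         // -w secs_to_wait_after_job_finishes
    Integer reducers;         // -r number_of_reducers
    Map<String, String> properties = new LinkedHashMap<>(); // -Dname=value
    List<String> paths = new ArrayList<>();                 // input and output paths

    static WrapperArgs parse(String[] args) {
        WrapperArgs a = new WrapperArgs();
        int i = 0;
        a.jobJar = args[i++];                // job.jar is the first argument
        while (i < args.length) {
            String arg = args[i++];
            switch (arg) {
                case "-jt": a.jobTracker = args[i++]; break;
                case "-c":  a.machines = Integer.parseInt(args[i++]); break;
                case "-w":  a.waitSecs = Integer.parseInt(args[i++]); break;
                case "-r":  a.reducers = Integer.parseInt(args[i++]); break;
                default:
                    if (arg.startsWith("-D")) {  // generic Hadoop property
                        int eq = arg.indexOf('=');
                        a.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
                    } else {
                        a.paths.add(arg);        // input_path, then output_path
                    }
            }
        }
        return a;
    }
}
```

For example, ''MapperOnlyHadoopJob.jar -c 2 -r 0 in_path out_path'' parses into ''machines = 2'', ''reducers = 0'' and the two paths; on the real wrapper, the ''-D'' properties are passed through to Hadoop unchanged.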
===== Exercise =====
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-24?codeblock=1' -O 'MapperOnlyHadoopJob.java'
  make -f /net/projects/hadoop/java/Makefile MapperOnlyHadoopJob.jar
  rm -rf step-24-out-sol; /net/projects/hadoop/bin/hadoop MapperOnlyHadoopJob.jar -r 0 /home/straka/wiki/cs-text-small step-24-out-sol
  less step-24-out-sol/part-*