
Institute of Formal and Applied Linguistics Wiki


courses:mapreduce-tutorial:step-24 [2012/01/30 15:04] majlis
courses:mapreduce-tutorial:step-24 [2012/01/31 09:52] straka (Change Java commandline syntax)
===== Running the job =====
The official way of running Hadoop jobs is to use the ''/SGE/HADOOP/active/bin/hadoop'' script. Jobs submitted through this script can be configured using Hadoop properties only. Therefore, a wrapper script with options similar to the Perl API runner is provided:
  * ''/net/projects/hadoop/bin/hadoop job.jar [-r number_of_reducers] [-Dname=value -Dname=value ...] input_path output_path'' -- executes the given job locally in a single thread. It is useful for debugging.
  * ''/net/projects/hadoop/bin/hadoop job.jar -jt cluster_master [-r number_of_reducers] [-Dname=value -Dname=value ...] input_path output_path'' -- submits the job to the given ''cluster_master''.
  * ''/net/projects/hadoop/bin/hadoop job.jar -c number_of_machines [-w secs_to_wait_after_job_finishes] [-r number_of_reducers] [-Dname=value -Dname=value ...] input_path output_path'' -- creates a new cluster with the specified number of machines, executes the given job on it, and then waits the specified number of seconds before stopping the cluster.
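For instance, the three modes could be invoked as follows (a sketch only: the jar name, cluster master address, machine counts, and the ''-D'' property values are illustrative, not taken from the tutorial):

```shell
# Run the job locally in a single thread (good for debugging),
# overriding one Hadoop property on the command line:
/net/projects/hadoop/bin/hadoop WordCount.jar -Dmapred.max.split.size=1000000 input_dir output_dir

# Submit the same job to an already running cluster master
# (the master address is a made-up example):
/net/projects/hadoop/bin/hadoop WordCount.jar -jt machine123:9001 -r 2 input_dir output_dir

# Create a fresh 5-machine cluster, run the job on it, and keep the
# cluster alive for 60 more seconds after the job finishes:
/net/projects/hadoop/bin/hadoop WordCount.jar -c 5 -w 60 -r 2 input_dir output_dir
```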
  
===== Exercise =====
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-24?codeblock=1' -O 'MapperOnlyHadoopJob.java'
  make -f /net/projects/hadoop/java/Makefile MapperOnlyHadoopJob.jar
  rm -rf step-24-out-sol; /net/projects/hadoop/bin/hadoop MapperOnlyHadoopJob.jar -r 0 /home/straka/wiki/cs-text-small step-24-out-sol
  less step-24-out-sol/part-*
  
  * When using ''-r 0'', the job runs faster, as the mappers write the output directly to disk. But there are as many output files as mappers, and the (key, value) pairs are stored in no particular order.
  * When not specifying ''-r 0'' (i.e., using ''-r 1'' with ''IdentityReducer''), the job produces the same (key, value) pairs, but this time they are in one output file, sorted according to the key. Of course, the job runs slower in this case.
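The ordering difference can be imitated with plain shell tools. This is a rough analogy only, not actual Hadoop output; the part file names and contents below are made up:

```shell
#!/bin/sh
# Two "mapper" part files, as with -r 0: each file keeps its own input
# order, and there is no global order across the files.
printf 'dog\t1\ncat\t1\n'   > part-00000
printf 'zebra\t1\nant\t1\n' > part-00001

# What a single IdentityReducer (-r 1) adds on top: all pairs are merged
# into one output file, sorted by key.
sort -k1,1 part-00000 part-00001 > part-single
cat part-single   # part-single now holds ant, cat, dog, zebra (sorted by key)
```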
----

<html>
<table style="width:100%">
<tr>
<td style="text-align:left; width: 33%; "></html>[[step-23|Step 23]]: Predefined formats and types.<html></td>
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
<td style="text-align:right; width: 33%; "></html>[[step-25|Step 25]]: Reducers, combiners and partitioners.<html></td>
</tr>
</table>
</html>
  
