Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-7 [2012/01/29 20:51]
straka
+++ courses:mapreduce-tutorial:step-7 [2012/01/31 09:41]
straka Change Perl commandline syntax.
@@ Line 11: / Line 11: @@
 ===== Using a running cluster =====
 A running cluster is identified by its master. When running a Hadoop job using the Perl API, an existing cluster can be used by
-  perl script.pl run -jt cluster_master:9001 ...
+  perl script.pl -jt cluster_master:9001 ...
 ===== Example =====
@@ Line 18: / Line 18: @@
   wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-6.txt' -O 'step-7-wordcount.pl'
   /net/projects/hadoop/bin/hadoop-cluster -c 1 -w 600
-  rm -rf step-7-out-sol; perl step-7-wordcount.pl run -jt cluster_master:9001 -Dmapred.max.split.size=1000000 /home/straka/wiki/cs-text-medium step-7-out-sol
+  # NOW VIEW THE FILE
+  # $EDITOR step-7-wordcount.pl
+  rm -rf step-7-out-sol; perl step-7-wordcount.pl -jt cluster_master:9001 -Dmapred.max.split.size=1000000 /home/straka/wiki/cs-text-medium step-7-out-sol
   less step-7-out-sol/part-*
 Remarks:
-  * The reducers seem to start running before the mappers finishes. In the web interface, the running time of reducers is divided into thirds: during the first 33%, the mapper outputs are copied
+  * The reducers seem to start running before the mappers finish. In the web interface, the running time of reducers is divided into thirds:
+    * during the first 33%, the mapper outputs are copied to the machine where the reducer runs.
+    * during the second 33%, the (key, value) pairs are sorted.
+    * during the last 33%, the user-defined reducer runs.
 ----
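The three reducer phases listed above can be imitated on a single machine with an ordinary shell pipeline, which may help when reading the progress shown in the web interface. This is a minimal sketch, not Hadoop and not part of the tutorial: it assumes a POSIX shell with `tr`, `grep`, `sort`, `uniq` and `awk`, and the function name `wordcount` is hypothetical. There is no copy phase because everything runs locally; `sort` stands in for the sorting phase and `uniq -c` for the user-defined reducer. It handles ASCII letters reliably, but `tr` may not treat multibyte characters (e.g. Czech letters) as letters.

```shell
#!/bin/sh
# Hypothetical local imitation of map -> sort -> reduce for word count.
# Not Hadoop: everything runs in one process pipeline on one machine.

wordcount() {
  # Map: turn every run of non-letters into a newline, i.e. one word per line.
  tr -cs '[:alpha:]' '\n' |
  # Drop the empty line produced when the input starts with a non-letter.
  grep -v '^$' |
  # Sort: bring identical keys next to each other (the sorting phase).
  sort |
  # Reduce: count each run of identical keys.
  uniq -c |
  # Print "word<TAB>count", similar in shape to Hadoop text output.
  awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Prints: a<TAB>3, b<TAB>2, c<TAB>1 (one per line).
printf 'a b a\nc b a\n' | wordcount
```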


Institute of Formal and Applied Linguistics Wiki
