Differences

This shows you the differences between two versions of the page.

--- spark:recipes:using-perl-via-pipes [2014/11/07 13:51] straka
+++ spark:recipes:using-perl-via-pipes [2014/11/07 14:16] straka
@@ Line 84: / Line 84: @@
 sc = SparkContext()
 (sc.textFile(input)
-   .map(json.dumps).pipe("perl tokenize.pl", os.environ).map(json.loads)
+   .map(json.dumps).pipe("env perl tokenize.pl", os.environ).map(json.loads)
    .flatMap(lambda tokens: map(lambda x: (x, 1), tokens))
    .reduceByKey(lambda x,y: x + y)
    .saveAsTextFile(output))
+sc.stop()
 </file>
-It can be executed using ''spark-submit perl_integration.py input output''.
+It can be executed using ''spark-submit --files tokenize.pl perl_integration.py input output''. Note that the Perl script has to be added to the list of files used by the job.
 ===== Using Scala and JSON =====
@@ Line 113: / Line 114: @@
 rdd.map(encodeJson).pipe("perl script.pl").map(decodeJson[ProcessedType])
 </file>
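
The ''pipe'' step touched by this revision can be tried without a cluster. The following is a minimal local sketch, not the recipe's own code: it assumes Python 3.7 or newer and replaces ''tokenize.pl'' with a hypothetical inline Python tokenizer so that it runs without Perl or Spark. The names ''TOKENIZER'', ''COMMAND'', ''pipe_through'' and ''word_counts'' are illustrative only.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for tokenize.pl (an assumption, not the wiki's script):
# reads one JSON-encoded string per input line and writes one JSON array of
# whitespace-separated tokens per output line.
TOKENIZER = (
    "import sys, json\n"
    "for line in sys.stdin:\n"
    "    print(json.dumps(json.loads(line).split()))\n"
)

# Command run as the external process; a real job would use the Perl command.
COMMAND = [sys.executable, "-c", TOKENIZER]


def pipe_through(records, command):
    """Locally mimic rdd.map(json.dumps).pipe(command).map(json.loads)."""
    # One JSON document per line: json.dumps escapes embedded newlines,
    # so every record occupies exactly one line on the child's stdin.
    payload = "".join(json.dumps(record) + "\n" for record in records)
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    # Each line of the child's stdout is decoded back into a Python value.
    return [json.loads(line) for line in result.stdout.splitlines()]


def word_counts(records, command):
    """Local equivalent of the flatMap / reduceByKey word count."""
    counts = {}
    for tokens in pipe_through(records, command):
        for token in tokens:
            counts[token] = counts.get(token, 0) + 1
    return counts


if __name__ == "__main__":
    print(word_counts(["a b a", "b c"], COMMAND))
```

In the actual job the external command is the ''env perl tokenize.pl'' string from the hunk above, and the script reaches the workers through ''--files''; the sketch only illustrates the line-per-record JSON exchange between the two processes.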

Institute of Formal and Applied Linguistics Wiki