==== Complete Example using a Simple Perl Tokenizer and Python ====
  
Suppose we want to write a program which uses a Perl tokenizer and then produces token counts.

<file python perl_integration.py>
import sys, os, json
from pyspark import SparkContext

input, output = sys.argv[1], sys.argv[2]
sc = SparkContext()
(sc.textFile(input)
   .map(json.dumps).pipe("env perl tokenize.pl", os.environ).map(json.loads)
   .flatMap(lambda tokens: map(lambda x: (x, 1), tokens))
   .reduceByKey(lambda x,y: x + y)
   .saveAsTextFile(output))
</file>
  
It can be executed using ''spark-submit perl_integration.py input output''.
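
For completeness, here is a minimal sketch of what ''tokenize.pl'' can look like, assuming a trivial whitespace tokenizer and the ''JSON'' module from CPAN (the real tokenizer may of course be more elaborate):

<file perl tokenize.pl>
#!/usr/bin/perl
use strict;
use warnings;
use JSON;

# allow_nonref lets us decode a bare JSON string, which is what
# the Python driver sends for every input line
my $json = JSON->new->utf8(1)->allow_nonref;

while (<>) {
    chomp;
    my $line = $json->decode($_);          # JSON string in ...
    my @tokens = split /\s+/, $line;       # trivial whitespace tokenization
    print $json->encode(\@tokens), "\n";   # ... JSON array of tokens out
}
</file>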
  
===== Using Scala and JSON =====
The same JSON-over-pipe pattern can be used from Scala:

<file scala>
rdd.map(encodeJson).pipe("perl script.pl").map(decodeJson[ProcessedType])
</file>
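
The ''encodeJson'' and ''decodeJson'' helpers used above are assumed, not defined here. A minimal sketch of one possible implementation using the json4s library (the library choice, the signatures and the ''ProcessedType'' case class are only illustrative assumptions):

<file scala>
import org.json4s.NoTypeHints
import org.json4s.jackson.Serialization

implicit val formats = Serialization.formats(NoTypeHints)

// serialize a case class (or any other AnyRef) to a single JSON line
def encodeJson[T <: AnyRef](src: T): String =
  Serialization.write(src)

// parse a JSON line back into the requested type
def decodeJson[T: Manifest](src: String): T =
  Serialization.read[T](src)

// example of a type the Perl script might produce, one JSON object per line
case class ProcessedType(tokens: List[String])
</file>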