
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

spark:recipes:using-perl-via-pipes [2014/11/11 09:36]
straka
spark:recipes:using-perl-via-pipes [2024/09/27 09:25] (current)
straka [Complete Example using Simple Perl Tokenizer and Scala]
Line 84:
  sc = SparkContext()
  (sc.textFile(input)
-   .map(json.dumps).pipe("env perl tokenize.pl", os.environ).map(json.loads)
+   .map(json.dumps).pipe("perl tokenize.pl", os.environ).map(json.loads)
     .flatMap(lambda tokens: map(lambda x: (x, 1), tokens))
     .reduceByKey(lambda x,y: x + y)
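The pattern used above — serialize each record to a JSON line with `map(json.dumps)`, stream the lines through an external process via `pipe`, and parse the results back with `map(json.loads)` — can be sketched without Spark using a plain subprocess. The inline child script below is a hypothetical stand-in for the wiki's `tokenize.pl` (it splits on whitespace), not the actual Perl tokenizer:

```python
import json
import subprocess
import sys

# Input records; .map(json.dumps) in the Spark recipe turns each record
# into exactly one JSON-encoded text line before piping.
records = ["Hello world", "Spark pipes records as text lines"]
payload = "\n".join(json.dumps(r) for r in records)

# Stand-in for tokenize.pl: read one JSON string per line from stdin,
# print a JSON list of whitespace tokens per line to stdout.
child = (
    "import json, sys\n"
    "for line in sys.stdin:\n"
    "    print(json.dumps(json.loads(line).split()))\n"
)

proc = subprocess.run([sys.executable, "-c", child],
                      input=payload, capture_output=True,
                      text=True, check=True)

# The .map(json.loads) side: each output line becomes a list of tokens.
tokens = [json.loads(line) for line in proc.stdout.splitlines()]
print(tokens)
```

JSON serialization on both sides of the pipe is what keeps the external process language-agnostic: the child only ever sees and emits one self-contained text line per record, which is the contract `RDD.pipe` relies on.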
Line 155:
 
 After compiling ''perl_integration.scala'' with ''sbt'', we can execute it using
-  spark-submit --class Main --files tokenize.pl target/scala-2.10/perl_integration_2.10-1.0.jar input output
+  spark-submit --files tokenize.pl target/scala-2.12/perl_integration_2.12-1.0.jar input output
  
