Differences
This shows you the differences between two versions of the page.
spark:recipes:using-perl-via-pipes [2014/11/07 10:59] straka |
spark:recipes:using-perl-via-pipes [2014/11/07 14:11] straka |
Line 34: | Line 34: | ||
On the Python side, the Perl script is used in the following way: | On the Python side, the Perl script is used in the following way: | ||
<file python> | <file python> | ||
- | ... | ||
import json | import json | ||
import os | import os | ||
+ | |||
... | ... | ||
+ | |||
# let rdd be an RDD we want to process | # let rdd be an RDD we want to process | ||
rdd.map(json.dumps).pipe(" | rdd.map(json.dumps).pipe(" | ||
</file> | </file> | ||
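The `pipe` call relies on a line-oriented protocol. Each element is written as one line to the external process's standard input, and each line the process prints becomes one element of the resulting RDD. Serializing with `json.dumps` fits this protocol because, with the default `indent=None`, it never emits a raw newline. The sketch reproduces the protocol locally with `subprocess` and does not use Spark. It rests on assumptions: `pipe_lines` is a hypothetical helper, and a tiny Python child process (started via `sys.executable`) stands in for the Perl script by splitting a `text` field on whitespace.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the Perl script (assumption): reads one JSON
# document per line, splits its "text" field on whitespace and prints
# one JSON list per line.
CHILD_SOURCE = r'''
import json, sys
for line in sys.stdin:
    record = json.loads(line)
    print(json.dumps(record["text"].split()))
'''


def pipe_lines(records, command):
    """Mimic the pipe protocol for one partition (hypothetical helper).

    Each record is serialized with json.dumps and written as one line to the
    child's stdin; each line of the child's stdout is parsed with json.loads.
    """
    payload = "".join(json.dumps(r) + "\n" for r in records)
    result = subprocess.run(
        command,
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return [json.loads(line) for line in result.stdout.splitlines()]


if __name__ == "__main__":
    partition = [{"text": "Hello Perl world"}, {"text": "pipes are line based"}]
    print(pipe_lines(partition, [sys.executable, "-c", CHILD_SOURCE]))
```

With `check=True` the sketch raises `subprocess.CalledProcessError` when the child exits with a non-zero status. PySpark's `RDD.pipe` only performs a comparable check when `checkCode=True` is passed.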
- | ==== Complete Example using Simple Perl Tokenizer ==== | + | ==== Complete Example using Simple Perl Tokenizer |
Suppose we want to write a program that uses the Perl tokenizer and then produces token counts. | Suppose we want to write a program that uses the Perl tokenizer and then produces token counts. | ||
Line 83: | Line 84: | ||
sc = SparkContext() | sc = SparkContext() | ||
(sc.textFile(input) | (sc.textFile(input) | ||
- | | + | |
| | ||
| | ||
Line 93: | Line 94: | ||
===== Using Scala and JSON ===== | ===== Using Scala and JSON ===== | ||
+ | The Perl side is the same as in [[# | ||
+ | |||
+ | The Scala side is a bit more complicated than the Python one, because in Scala the '' |
+ | <file scala> | ||
+ | def encodeJson[T <: AnyRef](src: | ||
+ |   implicit val formats = org.json4s.jackson.Serialization.formats(org.json4s.NoTypeHints)
+ |   org.json4s.jackson.Serialization.write[T](src)
+ | } | ||
+ | |||
+ | def decodeJson[T: | ||
+ |   implicit val formats = org.json4s.jackson.Serialization.formats(org.json4s.NoTypeHints)
+ |   org.json4s.jackson.Serialization.read[T](src)
+ | } | ||
+ | |||
+ | ... | ||
+ | |||
+ | // let rdd be an RDD we want to process, creating '' | ||
+ | rdd.map(encodeJson).pipe(" | ||
+ | </file> |