
Institute of Formal and Applied Linguistics Wiki


Page ''spark:recipes:using-perl-via-pipes'': created 2014/11/04 15:33 by straka, revised 2014/11/07 10:38 by straka.
====== Using Perl via Pipes ======
  
Perl can be used to process ''RDD'' elements using pipes. Although this makes Perl libraries for tokenization, parsing, etc. usable from Spark, it is only a limited integration of Perl into Spark. Notably:
  - A //driver// program in Python (or Scala/Java) still has to exist.
  - Perl programs can operate only on individual ''RDD'' elements. Operations that define the order of multiple elements or combine multiple elements (''reduceByKey'', ''union'', ''join'', ''sortBy'', etc.) can therefore be implemented only in the //driver// program.
Still, this functionality is useful when libraries available only in Perl have to be used.
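
The split of work described above can be tried out locally without Spark. The sketch below is a simulation under stated assumptions, not Spark code: the hypothetical helper ''pipe_partition'' roughly mimics what PySpark's ''RDD.pipe'' does for one partition (each element is written as one line to an external process, and every line the process prints becomes an output element), and a small Python one-liner stands in for a Perl tokenizer so that the example runs without Perl. Tokenizing individual elements happens in the external process, while counting across elements, which ''reduceByKey'' would do in Spark, stays in the driver.

```python
import subprocess
import sys
from collections import Counter

# Hypothetical stand-in for a Perl tokenizer such as "perl tokenize.pl":
# reads one element per line on stdin and prints one token per line.
TOKENIZER = [
    sys.executable, "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    for token in line.split():\n"
    "        print(token)\n",
]


def pipe_partition(command, elements):
    """Roughly what RDD.pipe does for a single partition: write each element
    as one line to the external process and return the lines it prints."""
    stdin = "".join(str(e).rstrip("\n") + "\n" for e in elements)
    result = subprocess.run(command, input=stdin, capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()


def word_count(partitions):
    # Per-element work (tokenization) runs in the external process ...
    tokens = [token
              for partition in partitions
              for token in pipe_partition(TOKENIZER, partition)]
    # ... while combining multiple elements, which reduceByKey would do
    # in Spark, has to happen in the driver.
    return Counter(tokens)


print(word_count([["a b", "b c"], ["c c"]]))
```

Each inner list plays the role of one partition; in a real setup the command would be the Perl script and the aggregation would be an ''RDD'' operation in the driver.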
  
Here we show how data can be passed from Python to Perl and back using the JSON format, which preserves data types: ''RDD'' elements can be strings, numbers and arrays (note that Perl has no native tuples, so tuples are passed as arrays).
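
The loss of tuples can be observed with Python's standard ''json'' module alone. In this minimal sketch (the sample element is made up), a ''(key, value)'' pair is written as a JSON array and comes back as a list, so a driver that needs tuples would have to convert them explicitly. On the Perl side, such a line would plausibly be decoded with ''decode_json'' from the core ''JSON::PP'' module, which yields an array reference.

```python
import json

# A (key, value) pair as a Python driver might hold it (made-up sample).
element = ("word", [1, 2.5, "x"])

line = json.dumps(element)   # the line that would be sent to the Perl script
print(line)                  # ["word", [1, 2.5, "x"]]

back = json.loads(line)      # the value read back from the Perl output
print(back)                  # ['word', [1, 2.5, 'x']]
print(type(back).__name__)   # list
print(back == element)       # False: a list never equals a tuple

# JSON has no tuple type, so restore the tuple explicitly where it matters.
restored = tuple(back)
print(restored == element)   # True
```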

===== Using Python and JSON =====

Using the JSON format, we can easily serialize and deserialize the data passed from Python to Perl and back. JSON is used because:
  - It can serialize numbers, strings and arrays.
  - A JSON value serialized without pretty-printing contains no newline characters (newlines inside strings are escaped), which fits the line-oriented Spark piping, where every ''RDD'' element is passed as one line.
  - Libraries for JSON serialization and deserialization are available in both languages.
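
The newline property listed above can be checked directly. The sketch below wraps the standard ''json'' module in two hypothetical helpers, ''to_line'' and ''from_line''. In a PySpark driver they would plausibly be combined as ''rdd.map(to_line).pipe("perl script.pl").map(from_line)'', where ''script.pl'' is a placeholder for a Perl script that reads and writes one JSON value per line; only ''RDD.map'' and ''RDD.pipe'' are actual PySpark methods here.

```python
import json


def to_line(element):
    """Serialize one RDD element to a single line of JSON (hypothetical helper).

    With the default settings (no indent), json.dumps never emits a raw
    newline: a newline inside a string is escaped as backslash + 'n'.
    """
    return json.dumps(element)


def from_line(line):
    """Deserialize one line produced by the Perl script (hypothetical helper)."""
    return json.loads(line)


elements = ["first line\nsecond line", 42, ["tab\there", 3.5]]
lines = [to_line(e) for e in elements]

for line in lines:
    print(line)
# "first line\nsecond line"
# 42
# ["tab\there", 3.5]

# Every element occupies exactly one line, and this round trip is lossless
# because the sample contains no tuples.
print(all("\n" not in line for line in lines))         # True
print([from_line(line) for line in lines] == elements)  # True
```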

===== Using Scala and Java =====

Scala and Java can be used in a similar way as Python to communicate with Perl scripts via pipes. Nevertheless, available JSON libraries
  
