[ Skip to the content ]

Institute of Formal and Applied Linguistics Wiki


[ Back to the navigation ]

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
spark:using-python [2014/11/10 15:36]
straka
spark:using-python [2014/11/10 15:42]
straka
Line 16: Line 16:
 Consider the following simple script computing 10 most frequent words of Czech Wikipedia: Consider the following simple script computing 10 most frequent words of Czech Wikipedia:
 <file python> <file python>
-(sc.textFile("/net/projects/spark-example-data/wiki-cs")+(sc.textFile("/net/projects/spark-example-data/wiki-cs", 3*sc.defaultParallelism)
    .flatMap(lambda line: line.split())    .flatMap(lambda line: line.split())
    .map(lambda word: (word, 1))    .map(lambda word: (word, 1))
Line 36: Line 36:
 ===== Running Python Spark Scripts ===== ===== Running Python Spark Scripts =====
  
-Python Spark scripts can be started using +Python Spark scripts can be started using: 
-  spark-submit+<file>spark-submit</file>
  
 As described in [[running-spark-on-single-machine-or-on-cluster|Running Spark on Single Machine or on Cluster]], environmental variable ''MASTER'' specifies which Spark master to use (or whether to start a local one). As described in [[running-spark-on-single-machine-or-on-cluster|Running Spark on Single Machine or on Cluster]], environmental variable ''MASTER'' specifies which Spark master to use (or whether to start a local one).
Line 56: Line 56:
        
 sc = SparkContext() sc = SparkContext()
-(sc.textFile(input)+(sc.textFile(input, 3*sc.defaultParallelism)
    .flatMap(lambda line: line.split())    .flatMap(lambda line: line.split())
    .map(lambda token: (token, 1))    .map(lambda token: (token, 1))

[ Back to the navigation ] [ Back to the content ]