Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

--- spark:using-python [2014/11/10 15:36] straka
+++ spark:using-python [2014/11/10 15:42] straka
Line 16:
 Consider the following simple script computing 10 most frequent words of Czech Wikipedia:
 <file python>
-(sc.textFile("/net/projects/spark-example-data/wiki-cs")
+(sc.textFile("/net/projects/spark-example-data/wiki-cs", 3*sc.defaultParallelism)
    .flatMap(lambda line: line.split())
    .map(lambda word: (word, 1))
Line 56:

 sc = SparkContext()
-(sc.textFile(input)
+(sc.textFile(input, 3*sc.defaultParallelism)
    .flatMap(lambda line: line.split())
    .map(lambda token: (token, 1))
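
Both hunks make the same change: they pass a second argument to `sc.textFile`, which PySpark treats as `minPartitions`, a suggested minimum number of input partitions. The sketch below shows how the changed call might sit in a complete word count. The diff only shows the pipeline up to `.map(...)`, so the `reduceByKey`, `sortBy` and `take` steps are assumptions about the rest of the script, and the names `top_words`, `n` and `factor` are hypothetical helpers introduced here for illustration.

```python
from operator import add


def top_words(sc, path, n=10, factor=3):
    """Return the n most frequent whitespace-separated tokens in path.

    sc is expected to behave like a pyspark.SparkContext.
    factor is a hypothetical knob standing in for the literal 3 in the diff.
    """
    # Second argument of textFile is minPartitions: request at least
    # factor * defaultParallelism partitions for the input.
    lines = sc.textFile(path, factor * sc.defaultParallelism)
    return (lines
            .flatMap(lambda line: line.split())   # shown in the diff
            .map(lambda word: (word, 1))          # shown in the diff
            # The steps below are assumed; the diff does not show them.
            .reduceByKey(add)
            .sortBy(lambda pair: pair[1], ascending=False)
            .take(n))
```

With a real cluster this could be called as `top_words(SparkContext(), "/net/projects/spark-example-data/wiki-cs")`. Requesting a few partitions per available core is a common way to let faster executors pick up extra tasks, though the multiplier 3 is simply the value chosen in this revision.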