Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

--- spark:spark-introduction [2014/10/03 15:02] straka
+++ spark:spark-introduction [2014/10/06 11:18] straka
Line 27: Line 27:
   words = wiki.flatMap(lambda line: line.split())
   counts = words.map(lambda word: (word, 1)).reduceByKey(lambda c1,c2: c1+c2)
-  sorted = counts.sortBy(lambda (word,count): count)
+  sorted = counts.sortBy(lambda (word,count): count, ascending=False)
   sorted.saveAsTextFile('output')
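
The effect of `ascending=False` can be checked without a Spark installation. The following is a minimal plain-Python sketch of the same pipeline, not the wiki's own code: `top_words` and `sample` are illustrative names, and `Counter` stands in for the `map`/`reduceByKey` pair. The tuple-unpacking form `lambda (word,count): count` used in the page is Python 2 syntax only, so the sketch indexes the pair instead.

```python
from collections import Counter


def top_words(lines, n=10):
    """Count whitespace-separated words and return the n most frequent."""
    # Analogue of flatMap(lambda line: line.split())
    words = [word for line in lines for word in line.split()]
    # Analogue of map(word -> (word, 1)) followed by reduceByKey(c1 + c2)
    counts = Counter(words)
    # Analogue of sortBy(count, ascending=False); reverse=True sorts descending
    ordered = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    # Analogue of take(n)
    return ordered[:n]


sample = ["a b a", "c a b"]  # hypothetical input lines
print(top_words(sample, n=2))  # [('a', 3), ('b', 2)]
```

Without `reverse=True` (the default ascending order in both `sorted` and Spark's `sortBy`), the least frequent words would come first, which is what the revision corrects.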
      
Line 35: Line 35:
      .map(lambda word: (word, 1))
      .reduceByKey(lambda c1,c2: c1+c2)
-     .sortBy(lambda (word,count): count)
-     .take(100)) # Instead of saveAsTextFile, we only print 100 most frequent words
+     .sortBy(lambda (word,count): count, ascending=False)
+     .take(10)) # Instead of saveAsTextFile, we only print 10 most frequent words
 The output of 'saveAsTextFile' is the directory ''output'' -- because the RDD can be distributed on several computers, the output is a directory containing possibly multiple files.
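
Because the result is a directory rather than a single file, reading it back means iterating over its pieces. A minimal sketch, assuming the pieces follow Spark's usual `part-00000`, `part-00001`, ... naming; `read_output` is a hypothetical helper, not part of the wiki page or of Spark:

```python
import glob
import os


def read_output(directory):
    """Yield every line from the part files found in `directory`."""
    # Assumption: pieces are named part-*; other files such as _SUCCESS are skipped.
    for path in sorted(glob.glob(os.path.join(directory, "part-*"))):
        with open(path, encoding="utf-8") as part:
            for line in part:
                yield line.rstrip("\n")
```

Calling `list(read_output('output'))` after the job finishes would return the saved records in part-file order.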
  
Line 43: Line 43:
   val words = wiki.flatMap(line => line.split("\\s"))
   val counts = words.map(word => (word,1)).reduceByKey((c1,c2) => c1+c2)
-  val sorted = counts.sortBy({case (word, count) => count}, false)
+  val sorted = counts.sortBy({case (word, count) => count}, ascending=false)
   sorted.saveAsTextFile('output')
      
Line 50: Line 50:
      .flatMap(_.split("\\s"))
      .map((_,1)).reduceByKey(_+_)
-     .sortBy(_._2, false)
-     .take(100))
+     .sortBy(_._2, ascending=false)
+     .take(10))
  