Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

spark:spark-introduction [2022/12/14 12:36] straka [Word Count Example]
spark:spark-introduction [2022/12/14 12:42] straka [K-Means Example]
Line 63:
 
 ===== K-Means Example =====
-An example implementing [[http://en.wikipedia.org/wiki/K-means_clustering|Standard iterative K-Means algorithm]] follows. Try copying it to open Python shell. Note that this wiki is formating empty lines as lines with one space, which is confusing for ''pyspark'' used without ''IPYTHON=1'', so either use ''IPYTHON=1'' or copy the text paragraph-by-paragraph.
+An example implementing [[http://en.wikipedia.org/wiki/K-means_clustering|Standard iterative K-Means algorithm]] follows.
 <file python>
 import numpy as np
Line 70:
     return min((np.sum((point - centers[i]) ** 2), i) for i in range(len(centers)))[1]
 
-lines = sc.textFile("/net/projects/spark-example-data/points", sc.defaultParallelism)
+lines = sc.textFile("/lnet/troja/data/npfl118/points/points-medium.txt", sc.defaultParallelism)
 data = lines.map(lambda line: np.array(map(float, line.split()))).cache()
 
-K = 50
+K = 100
 epsilon = 1e-3
  
Line 111:
   centers.map(center => (center-point).norm(2)).zipWithIndex.min._2
 
-val lines = sc.textFile("/net/projects/spark-example-data/points", sc.defaultParallelism)
+val lines = sc.textFile("/lnet/troja/data/npfl118/points/points-medium.txt", sc.defaultParallelism)
 val data = lines.map(line => Vector(line.split("\\s+").map(_.toDouble))).cache()
 
-val K = 50
+val K = 100
 val epsilon = 1e-3
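
The Python fragments in this comparison can be tried locally without a Spark context. The following is a minimal sketch, assuming Python 3 and NumPy only. The names ''parse_point'', ''closest_center'' and ''kmeans_step'' are hypothetical helpers chosen for illustration, and the update step is the usual K-Means recomputation of cluster means, not code taken from the page. Under Python 3 ''map'' returns an iterator, so the parser wraps it in ''list(...)'' before calling ''np.array''.

```python
import numpy as np

def parse_point(line):
    # Python 3: map() is lazy, so materialize it before building the array.
    return np.array(list(map(float, line.split())))

def closest_center(point, centers):
    # Index of the center with the smallest squared Euclidean distance.
    return min((np.sum((point - centers[i]) ** 2), i) for i in range(len(centers)))[1]

def kmeans_step(data, centers):
    # One iteration (assumed standard K-Means): assign every point to its
    # nearest center, then replace each center by the mean of its points.
    sums = {}
    for point in data:
        i = closest_center(point, centers)
        total, count = sums.get(i, (0.0, 0))
        sums[i] = (total + point, count + 1)
    # Centers that received no points are kept unchanged.
    return [sums[i][0] / sums[i][1] if i in sums else centers[i]
            for i in range(len(centers))]

# Tiny made-up input standing in for the lines of a points file.
lines = ["0 0", "0 1", "10 10", "10 11"]
data = [parse_point(line) for line in lines]
centers = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
new_centers = kmeans_step(data, centers)
```

In the Spark version the assignment and averaging would run over the cached RDD instead of a Python list; the sketch only illustrates the per-point logic.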
  
