Differences

This shows you the differences between two versions of the page.

--- spark:spark-introduction [2022/12/14 12:36]
straka [Word Count Example]
+++ spark:spark-introduction [2022/12/14 12:42]
straka [K-Means Example]
@@ Line 63: / Line 63: @@
 ===== K-Means Example =====
-An example implementing [[http://en.wikipedia.org/wiki/K-means_clustering|Standard iterative K-Means algorithm]] follows. Try copying it to open Python shell. Note that this wiki is formating empty lines as lines with one space, which is confusing for ''pyspark'' used without ''IPYTHON=1'', so either use ''IPYTHON=1'' or copy the text paragraph-by-paragraph.
+An example implementing [[http://en.wikipedia.org/wiki/K-means_clustering|Standard iterative K-Means algorithm]] follows.
 <file python>
 import numpy as np
@@ Line 70: / Line 70: @@
     return min((np.sum((point - centers[i]) ** 2), i) for i in range(len(centers)))[1]
-lines = sc.textFile("/net/projects/spark-example-data/points", sc.defaultParallelism)
+lines = sc.textFile("/lnet/troja/data/npfl118/points/points-medium.txt", sc.defaultParallelism)
 data = lines.map(lambda line: np.array(map(float, line.split()))).cache()
-K = 50
+K = 100
 epsilon = 1e-3
@@ Line 111: / Line 111: @@
   centers.map(center => (center-point).norm(2)).zipWithIndex.min._2
-val lines = sc.textFile("/net/projects/spark-example-data/points", sc.defaultParallelism)
+val lines = sc.textFile("/lnet/troja/data/npfl118/points/points-medium.txt", sc.defaultParallelism)
 val data = lines.map(line => Vector(line.split("\\s+").map(_.toDouble))).cache()
-val K = 50
+val K = 100
 val epsilon = 1e-3
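
The unchanged line in the Python hunk selects the nearest center by taking the minimum over (squared distance, index) pairs and keeping the index. A minimal sketch of that selection in plain Python 3 follows. It avoids NumPy so that it runs standalone. The names `parse_point` and `closest_center` and the sample values are assumptions for illustration, not taken from the wiki page. On Python 3, `map` returns an iterator, so the parsed values are wrapped in `list(...)` here.

```python
def parse_point(line):
    """Parse a whitespace-separated line of numbers into a list of floats.

    Hypothetical helper; list(...) is needed on Python 3, where map() is lazy.
    """
    return list(map(float, line.split()))


def closest_center(point, centers):
    """Return the index of the center with the smallest squared Euclidean distance.

    Hypothetical name; mirrors the min-over-(distance, index) idiom from the hunk.
    """
    return min(
        (sum((p - c) ** 2 for p, c in zip(point, centers[i])), i)
        for i in range(len(centers))
    )[1]


# Made-up sample data, only to exercise the two helpers.
centers = [[0.0, 0.0], [10.0, 10.0]]
point = parse_point("9 8.5")
index = closest_center(point, centers)
```

Because the tuples compare by distance first, the index only breaks ties, in which case the lower index wins.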

Institute of Formal and Applied Linguistics Wiki