Differences
This shows you the differences between two versions of the page.
spark:spark-introduction [2014/11/11 09:08] straka |
spark:spark-introduction [2022/12/14 12:27] straka [Spark Introduction] |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Spark Introduction ====== | ====== Spark Introduction ====== | ||
- | This introduction shows several simple examples to give you an idea of what programming in Spark is like. See the official [[http:// | + | This introduction shows several simple examples to give you an idea of what programming in Spark is like. See the official [[http:// |
===== Running Spark Shell in Python ===== | ===== Running Spark Shell in Python ===== | ||
Line 51: | Line 51: | ||
val counts = words.map(word => (word, | val counts = words.map(word => (word, | ||
val sorted = counts.sortBy({case (word, count) => count}, ascending=false) | val sorted = counts.sortBy({case (word, count) => count}, ascending=false) | ||
- | sorted.saveAsTextFile('output') | + | sorted.saveAsTextFile("output") |
// Alternatively without variables and using placeholders in lambda parameters: | // Alternatively without variables and using placeholders in lambda parameters: | ||
Line 63: | Line 63: | ||
===== K-Means Example ===== | ===== K-Means Example ===== | ||
- | An example implementing [[http:// | + | An example implementing [[http:// |
<file python> | <file python> | ||
import numpy as np | import numpy as np | ||
Line 71: | Line 71: | ||
lines = sc.textFile("/ | lines = sc.textFile("/ | ||
- | data = lines.map(lambda line: map(float, line.split())).cache() | + | data = lines.map(lambda line: np.array(map(float, line.split()))).cache() |
K = 50 | K = 50 | ||
Line 97: | Line 97: | ||
print "Final centers: " + str(centers) | print "Final centers: " + str(centers) | ||
</ | </ | ||
- | The implementation starts by loading the data and caching them in memory using '' | + | The implementation starts by loading the data points |
Note that explicit broadcasting used for '' | Note that explicit broadcasting used for '' |
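
To make the structure of a single K-Means iteration easier to follow, here is a minimal sketch in plain Python, without Spark or NumPy. It is not the code from the page: the helper names ''squared_distance'', ''closest_center'' and ''kmeans_step'' and the four sample points are assumptions chosen for illustration only. The dictionary of per-center sums and counts plays roughly the role that a map followed by a per-key reduction would play in a distributed version.

```python
# Plain-Python sketch of the assign-then-average step of K-Means.
# All helper names and the sample data below are hypothetical.

def squared_distance(a, b):
    """Squared Euclidean distance between two equally long points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest_center(point, centers):
    """Return the index of the center nearest to `point`."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))

def kmeans_step(points, centers):
    """Assign every point to its nearest center, then average each cluster."""
    sums = {}  # center index -> (coordinate-wise sum, number of points)
    for point in points:
        index = closest_center(point, centers)
        total, count = sums.get(index, ([0.0] * len(point), 0))
        sums[index] = ([t + p for t, p in zip(total, point)], count + 1)
    # A center that received no points keeps its previous position.
    return [
        [t / sums[i][1] for t in sums[i][0]] if i in sums else list(centers[i])
        for i in range(len(centers))
    ]

points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
centers = [[1.0, 1.0], [9.0, 9.0]]
for _ in range(5):
    centers = kmeans_step(points, centers)
print("Final centers: " + str(centers))
```

In this sketch the list of centers is simply passed to every call. In a distributed setting the same list has to reach every worker, which is the situation the note on broadcasting above refers to.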