Differences
This shows you the differences between two versions of the page.
spark:spark-introduction [2014/11/11 09:09] straka
spark:spark-introduction [2022/12/14 12:29] straka [Running Spark Shell in Python]
Line 1: | Line 1:
====== Spark Introduction ====== | ====== Spark Introduction ======
- | This introduction shows several simple examples to give you an idea of what programming in Spark is like. See the official [[http:// | + | This introduction shows several simple examples to give you an idea of what programming in Spark is like. See the official [[http://
===== Running Spark Shell in Python ===== | ===== Running Spark Shell in Python =====
To run an interactive Python shell in local Spark mode, run (on your local workstation or on the cluster using '' | To run an interactive Python shell in local Spark mode, run (on your local workstation or on the cluster using ''
- | | + |
- | The IPYTHON=1 parameter instructs Spark to use '' | + | The PYSPARK_DRIVER_PYTHON=ipython3
- | After a local Spark executor is started, the Python shell starts. | + | After a local Spark executor is started, the Python shell starts.
- | the prompt line, the SparkUI | + | the prompt line, the Spark UI address is listed in the following format:
- | | + |
- | The SparkUI | + | The Spark UI is an HTML interface that displays the state of the application -- whether
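For readers new to RDD transformations, the following plain-Python sketch (no Spark required) mimics the classic word-count pipeline that the Scala example further down builds on: split lines into words, pair each word with a count of 1, sum the counts per word, and sort by count in descending order. The sample lines are made up, and each comment names the RDD operation (`flatMap`, `map`, `reduceByKey`, `sortBy`) that the step loosely corresponds to. Only part of the Scala listing is visible in this excerpt, so treat the mapping as an illustration rather than a transcript of that code.

```python
from collections import defaultdict

# Hypothetical sample input standing in for the lines of a text file.
lines = [
    "the quick brown fox",
    "the lazy dog",
    "the fox",
]

# Like flatMap: split every line into words and flatten into a single list.
words = [word for line in lines for word in line.split()]

# Like map: pair every word with an initial count of 1.
pairs = [(word, 1) for word in words]

# Like reduceByKey: sum the counts of all pairs that share the same word.
counts = defaultdict(int)
for word, count in pairs:
    counts[word] += count

# Like sortBy(count, ascending=false): most frequent words first.
sorted_counts = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

for word, count in sorted_counts:
    print(word, count)
```

In a real Spark shell the same steps would be chained on the RDD returned by `sc.textFile(...)`, with Spark running the per-word reduction in parallel across partitions; this local version only shows what each step computes.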
==== Running Spark Shell in Scala ==== | ==== Running Spark Shell in Scala ====
Line 51: | Line 51:
val counts = words.map(word => (word, | val counts = words.map(word => (word,
val sorted = counts.sortBy({case (word, count) => count}, ascending=false) | val sorted = counts.sortBy({case (word, count) => count}, ascending=false)
- | sorted.saveAsTextFile('output') | + | sorted.saveAsTextFile("output") |
// Alternatively without variables and using placeholders in lambda parameters: | // Alternatively without variables and using placeholders in lambda parameters:
Line 71: | Line 71:
lines = sc.textFile("/ | lines = sc.textFile("/
- | data = lines.map(lambda line: map(float, line.split())).cache() | + | data = lines.map(lambda line: np.array([float(x) for x in line.split()])).cache()
K = 50 | K = 50
Line 97: | Line 97:
print("Final centers: " + str(centers)) | print("Final centers: " + str(centers))
</ | </
- | The implementation starts by loading the data and caching them in memory using '' | + | The implementation starts by loading the data points |
Note that explicit broadcasting used for '' | Note that explicit broadcasting used for ''
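Because the K-means listing is only partly visible in this excerpt, here is a small local NumPy sketch of the arithmetic one iteration performs, written under explicit assumptions: each input line holds one point as whitespace-separated numbers, a point belongs to the center with the smallest squared Euclidean distance, and each center moves to the mean of its points. The helpers `parse_point`, `closest_center` and `kmeans_step`, the sample data, and K = 2 are hypothetical. The `parse_point` helper also spells out a Python 3 pitfall relevant to the parsing line in the listing: `map` returns a lazy iterator, so `np.array(map(float, ...))` produces a 0-d object array instead of a numeric vector.

```python
import numpy as np

def parse_point(line):
    # Materialize the values first: in Python 3, map() returns an iterator,
    # and np.array(map(...)) would produce a 0-d object array, not a vector.
    return np.array([float(x) for x in line.split()])

def closest_center(point, centers):
    # Index of the center with the smallest squared Euclidean distance.
    return int(np.argmin(((centers - point) ** 2).sum(axis=1)))

def kmeans_step(points, centers):
    # Assign each point to its closest center, then move every center to the
    # mean of its assigned points (a center with no points stays where it is).
    new_centers = centers.copy()
    assignments = np.array([closest_center(p, centers) for p in points])
    for k in range(len(centers)):
        members = points[assignments == k]
        if len(members) > 0:
            new_centers[k] = members.mean(axis=0)
    return new_centers

# Hypothetical input lines standing in for the text file read by sc.textFile.
lines = ["0 0", "0 1", "10 10", "10 11"]
points = np.array([parse_point(line) for line in lines])

# Hypothetical K = 2 (the listing uses K = 50); start from points 0 and 2.
centers = points[[0, 2]]
for _ in range(5):
    centers = kmeans_step(points, centers)

print("Final centers: " + str(centers))
```

In the Spark program the assignment and averaging steps run over the cached, distributed RDD rather than a local array, which is why keeping the parsed points in memory pays off across iterations.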