Using Scala
To use Spark in Scala, the environment has to be set up according to Using Spark in UFAL Environment (including sbt).
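For a standalone project, the sbt setup typically amounts to declaring a Spark dependency. A minimal build.sbt sketch; the Scala and Spark version numbers here are illustrative assumptions, not values prescribed by this page:

```scala
// Minimal build.sbt sketch for a Spark application.
// Version numbers below are illustrative assumptions; use the versions
// matching your cluster's Spark installation.
name := "spark-example"
scalaVersion := "2.12.18"
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.0" % "provided"
```

The "provided" scope keeps Spark itself out of the packaged jar, since the cluster supplies it at runtime.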
Starting Interactive Shell
The interactive shell can be started using:
spark-shell
As described in Running Spark on Single Machine or on Cluster, the environment variable MASTER specifies which Spark master to use (or whether to start a local one).
Usage Examples
Consider the following simple script computing the 10 most frequent words of the Czech Wikipedia:
(sc.textFile("/net/projects/spark-example-data/wiki-cs", 3*sc.defaultParallelism)
   .flatMap(_.split("\\s"))
   .map((_,1)).reduceByKey(_+_)
   .sortBy(_._2, ascending=false)
   .take(10))
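The RDD chain above mirrors ordinary Scala collection operations. A plain-Scala sketch of the same logic on a tiny in-memory corpus (no Spark required; the sample data and top-2 cutoff are made up for illustration):

```scala
// The same word-count pipeline on plain Scala collections:
// split lines into words, count occurrences, sort by frequency, take top N.
val lines = Seq("a b a", "b a c")
val top = lines
  .flatMap(_.split("\\s"))                                      // words
  .map((_, 1))                                                  // (word, 1) pairs
  .groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }  // ~ reduceByKey(_+_)
  .toSeq
  .sortBy(-_._2)                                                // descending by count
  .take(2)
// top == Seq(("a", 3), ("b", 2))
```

In the Spark version, reduceByKey additionally combines partial counts on each partition before shuffling, which is why it is preferred over a groupBy-then-sum on large data.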
- run the interactive shell inside spark-qrsh, or start a local Spark cluster using as many threads as there are cores:
spark-shell
- run the interactive shell with a local Spark cluster using one thread:
MASTER=local spark-shell
- start a Spark cluster (10 machines, 1GB RAM each) on SGE and run the interactive shell:
spark-qrsh 10 1G spark-shell