Running Spark on Single Machine or on Cluster
In order to use Spark, the environment has to be set up according to Using Spark in UFAL Environment.
When a Spark computation starts, it uses the environment variable MASTER to determine the mode of computation. The following values are possible:
local: Run locally using a single thread.
local[N] (e.g., local[2] or local[4]): Run locally using N threads.
local[*] (default if the MASTER variable does not exist): Run locally using as many threads as there are processor cores.
spark://master_address:master_port: Run in a distributed fashion using the specified master.
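As an illustration, the mode can be selected by exporting MASTER before starting the computation. A minimal sketch; the script name script.py is a placeholder and the actual launcher is the one described in Using Spark in UFAL Environment:

```shell
# Select Spark's mode of computation via the MASTER environment variable.
export MASTER=local          # single thread
export MASTER='local[4]'     # four threads
export MASTER='local[*]'     # as many threads as there are cores (the default)
export MASTER='spark://master_address:master_port'  # a specific master
echo "MASTER=$MASTER"
# spark-submit script.py     # hypothetical script; launch as usual afterwards
```

Only the last export takes effect, so in practice you would keep a single line with the mode you want.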
Running Spark on Single Machine
Spark computations can be started both on desktop machines and on cluster machines, either by setting MASTER to one of the local modes, or by not specifying MASTER at all (local[*] is used then).
Note that when you use qrsh or qsub, your job can usually use only one core, so you should specify MASTER=local. If you do not, Spark will use all cores on the machine, even though SGE gave you only one.
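For example, inside a single-slot SGE job the single-threaded mode can be forced before launching the computation. A sketch; script.py is a placeholder name:

```shell
# Inside a qrsh session or a qsub job script that was granted one core:
# force single-threaded Spark so it does not oversubscribe the host.
export MASTER=local
echo "Spark will run with MASTER=$MASTER"
# spark-submit script.py    # hypothetical script; launch as usual
```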