Running Spark on Single Machine or on Cluster
In order to use Spark, the environment has to be set up according to Using Spark in UFAL Environment.
When a Spark computation starts, it uses the environment variable MASTER to determine the mode of computation. The following values are possible:
local: Run locally using a single thread.
local[N] (e.g., local[2] or local[4]): Run locally using N threads.
local[*] (default if the MASTER variable does not exist): Run locally using as many threads as there are processor cores.
spark://master_address:master_port: Run in a distributed fashion using the specified master.
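As an illustration, the mode can be selected by exporting MASTER before starting the computation. A minimal sketch; the script name script.py is a placeholder and the actual launcher is the one described in Using Spark in UFAL Environment:

```shell
# Select Spark's mode of computation via the MASTER environment variable.
export MASTER=local          # single thread
export MASTER='local[4]'     # four threads
export MASTER='local[*]'     # as many threads as there are cores (the default)
export MASTER='spark://master_address:master_port'  # a specific master
echo "MASTER=$MASTER"
# spark-submit script.py     # hypothetical script; launch as usual afterwards
```

Only the last export takes effect, so in practice you would keep a single line with the mode you want.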
Running Spark on Single Machine
Spark computations can be started both on desktop machines and on cluster machines, either by setting MASTER to one of the local modes, or by not specifying MASTER at all (local[*] is used then).
Note that when you use qrsh or qsub, your job can usually use only one core, so you should specify MASTER=local. If you do not, Spark will use all cores on the machine, even though SGE gave you only one.
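For example, inside a single-slot SGE job the single-threaded mode can be forced before launching the computation. A sketch; script.py is a placeholder name:

```shell
# Inside a qrsh session or a qsub job script that was granted one core:
# force single-threaded Spark so it does not oversubscribe the host.
export MASTER=local
echo "Spark will run with MASTER=$MASTER"
# spark-submit script.py    # hypothetical script; launch as usual
```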