Using Python
In order to use Spark in Python, the environment has to be set up according to Using Spark in UFAL Environment.
Starting Interactive Shell
An interactive shell can be started using:
pyspark
A better interactive shell with code completion, using ipython
(installed everywhere on the cluster; ask our IT department if you want it installed on your workstation too), can be started using:
IPYTHON=1 pyspark
As described in Running Spark on Single Machine or on Cluster, the environment variable MASTER
specifies which Spark master to use (or whether to start a local one).
Usage Examples
- run interactive shell with local Spark cluster using as many threads as there are cores:
IPYTHON=1 pyspark                     #if MASTER is not defined
MASTER="local[*]" IPYTHON=1 pyspark   #if MASTER is defined differently and must be overridden
- run interactive shell with local Spark cluster using one thread:
MASTER=local IPYTHON=1 pyspark
- start Spark cluster (10 machines, 1GB RAM each) on SGE and run interactive shell:
IPYTHON=1 spark-qrsh 10 1G pyspark
Note that the IPYTHON
variable can be left out (to use the plain Python shell) or set once in .bashrc
(or a similar startup file).
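For example, adding the following line to .bashrc makes every pyspark invocation use IPython without having to specify the variable on the command line:

```shell
# Put this in ~/.bashrc (or a similar startup file) so that
# pyspark always starts the IPython shell:
export IPYTHON=1
```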
Running Python Spark Scripts
Python Spark scripts can be started using spark-submit.
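As a minimal sketch of such a script (the file name, input, and output paths here are hypothetical, and the availability guard at the bottom only exists so the file can also be imported outside a Spark environment), a word count could look like this:

```python
# wordcount.py -- a minimal PySpark script sketch (hypothetical example)
from operator import add


def tokenize(line):
    """Lowercase a line and split it into words."""
    return line.lower().split()


def run(sc, in_path, out_path):
    """Count word occurrences in in_path and save the results to out_path."""
    (sc.textFile(in_path)
       .flatMap(tokenize)            # one record per word
       .map(lambda word: (word, 1))  # pair each word with a count of 1
       .reduceByKey(add)             # sum the counts per word
       .saveAsTextFile(out_path))


def main():
    # pyspark is provided by the Spark installation set up earlier
    from pyspark import SparkContext
    sc = SparkContext(appName="wordcount")
    run(sc, "input.txt", "output")
    sc.stop()


if __name__ == "__main__":
    import importlib.util
    # Guard so the file is importable even where pyspark is absent;
    # spark-submit provides pyspark, so main() runs when submitted.
    if importlib.util.find_spec("pyspark"):
        main()
```

It would then be launched with spark-submit wordcount.py (locally or on the cluster, depending on MASTER).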