spark:using-python [2014/11/11 09:28] straka
spark:using-python [2017/10/16 20:56] ufal [Starting Interactive Shell]

A better interactive shell with code completion, based on ''ipython'' (installed on all cluster machines; ask our IT if you want it installed on your workstation too), can be started using:
<file>PYSPARK_DRIVER_PYTHON=ipython pyspark</file>

As described in [[running-spark-on-single-machine-or-on-cluster|Running Spark on Single Machine or on Cluster]], the environment variable ''MASTER'' specifies which Spark master to use (or whether to start a local one).

  * run the ''word_count.py'' script inside an existing Spark cluster (i.e., inside ''spark-qsub'' or ''spark-qrsh''), or, if there is none, start a local Spark cluster using as many threads as there are cores:
<file>spark-submit word_count.py /net/projects/spark-example-data/wiki-cs outdir</file>
  * run the ''word_count.py'' script with a local Spark cluster using one thread:
<file>MASTER=local spark-submit word_count.py /net/projects/spark-example-data/wiki-cs outdir</file>
  * start a Spark cluster (10 machines, 1GB RAM each) on SGE and run the ''word_count.py'' script:
<file>spark-qsub 10 1G spark-submit word_count.py /net/projects/spark-example-data/wiki-cs outdir</file>
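
The ''word_count.py'' script itself is not shown on this page. The following is a hypothetical sketch of what such a PySpark word count could look like, assuming whitespace tokenization and output sorted by decreasing frequency; the helper names ''split_words'', ''add_counts'' and ''count_words'' are made up for illustration. The sketch does not hard-code a master, so the same file could be used with all three commands above.

```python
# word_count.py: hypothetical sketch of a PySpark word count, NOT the original
# script referenced on this page.
# Usage: spark-submit word_count.py INPUT OUTPUT
import sys


def split_words(line):
    """Tokenize one input line on whitespace (an assumed, simple tokenizer)."""
    return line.split()


def add_counts(a, b):
    """Merge two partial counts of the same word (used by reduceByKey)."""
    return a + b


def count_words(sc, input_path, output_path):
    """Count words in input_path and save (word, count) pairs, most frequent first."""
    counts = (sc.textFile(input_path)
              .flatMap(split_words)              # lines -> words
              .map(lambda word: (word, 1))       # word -> (word, 1)
              .reduceByKey(add_counts)           # sum the ones per word
              .sortBy(lambda pair: pair[1], ascending=False))
    counts.saveAsTextFile(output_path)  # fails if output_path already exists


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: %s input output" % sys.argv[0], file=sys.stderr)
    else:
        # Imported lazily so the helpers above can be used without Spark.
        from pyspark import SparkContext

        # No master is hard-coded: spark-submit supplies it (e.g. from MASTER).
        sc = SparkContext(appName="word_count")
        try:
            count_words(sc, sys.argv[1], sys.argv[2])
        finally:
            sc.stop()
```

Note that ''saveAsTextFile'' refuses to overwrite an existing directory, so remove ''outdir'' before re-running the job.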

===== Using Virtual Environments =====

To run your Spark job with the Python interpreter of a specific virtual environment, use:
<file>PYSPARK_PYTHON=path_to_python_in_venv [pyspark|spark-submit]</file>
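
If you start jobs from Python rather than from the shell, the same variable can be set in the environment of the child process. The sketch below is hypothetical: the helpers ''venv_python'', ''spark_env'' and ''submit'' are not part of Spark, and it assumes a POSIX virtual environment layout with the interpreter in ''bin/python''. Because the executors also start Python through ''PYSPARK_PYTHON'', the virtual environment should live on a path visible from all cluster machines.

```python
# Hypothetical helpers: launch spark-submit with a virtualenv's interpreter,
# equivalent to running "PYSPARK_PYTHON=... [MASTER=...] spark-submit ...".
import os
import subprocess


def venv_python(venv_dir):
    """Return the interpreter path inside a POSIX virtualenv (assumed layout)."""
    return os.path.join(venv_dir, "bin", "python")


def spark_env(venv_dir, master=None, base=None):
    """Copy the environment (os.environ by default) and set PYSPARK_PYTHON.

    MASTER is set only when a master is given; otherwise the inherited
    value (if any) is kept.
    """
    env = dict(os.environ if base is None else base)
    env["PYSPARK_PYTHON"] = venv_python(venv_dir)
    if master is not None:
        env["MASTER"] = master
    return env


def submit(script, args, venv_dir, master=None):
    """Run spark-submit on script with the venv's Python; return its exit code."""
    cmd = ["spark-submit", script] + list(args)
    return subprocess.run(cmd, env=spark_env(venv_dir, master)).returncode
```

For example, ''submit("word_count.py", ["/net/projects/spark-example-data/wiki-cs", "outdir"], "/path/to/venv", master="local")'' would run the word count with one local thread, where ''/path/to/venv'' stands for your own environment.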