spark:using-python [2014/11/10 15:42] straka
spark:using-python [2015/10/23 14:24] ufal
</file>

  * run interactive shell using existing Spark cluster (i.e., inside ''spark-qrsh''), or start local Spark cluster using as many threads as there are cores if there is none:
<file>IPYTHON=1 pyspark</file>
  * run interactive shell with local Spark cluster using one thread:
<file>MASTER=local IPYTHON=1 pyspark</file>

  * run ''word_count.py'' script inside existing Spark cluster (i.e., inside ''spark-qsub'' or ''spark-qrsh''), or start local Spark cluster using as many threads as there are cores if there is none:
<file>spark-submit word_count.py /net/projects/spark-example-data/wiki-cs outdir</file>
  * run ''word_count.py'' script with local Spark cluster using one thread:
<file>MASTER=local spark-submit word_count.py /net/projects/spark-example-data/wiki-cs outdir</file>
  * start Spark cluster (10 machines, 1GB RAM each) on SGE and run ''word_count.py'' script:
<file>spark-qsub 10 1G spark-submit word_count.py /net/projects/spark-example-data/wiki-cs outdir</file>

===== Using Virtual Environments =====

If you want to use a specific virtual environment in your Spark job, use
<file>PYSPARK_PYTHON=path_to_python_in_venv [pyspark|spark-submit]</file>
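For example, assuming a virtual environment at ''~/spark-venv'' (a hypothetical path), the whole workflow might look like this sketch:

```shell
# create a virtual environment (the path is illustrative)
python3 -m venv "$HOME/spark-venv"
# install whatever packages the job needs, e.g.:
# "$HOME/spark-venv/bin/pip" install numpy
# submit the job so that the executors run the venv's interpreter
# (commented out here because it requires a Spark installation):
# PYSPARK_PYTHON="$HOME/spark-venv/bin/python" spark-submit word_count.py /net/projects/spark-example-data/wiki-cs outdir
```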