Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

Old revision: spark:using-python [2017/10/16 20:58] ufal [Using Python]
New revision: spark:using-python [2019/11/12 13:37] straka
Line 8:
 <file>pyspark</file>
 
-Better interactive shell with code completion using ''ipython'' (installed everywhere on cluster; ask our IT if you want to have it installed on your workstations too) can be started using:
+Better interactive shell with code completion using ''ipython3'' (do `pip3 install --user ipython` if you do not have it) can be started using:
-<file>PYSPARK_DRIVER_PYTHON=ipython pyspark</file>
+<file>PYSPARK_DRIVER_PYTHON=ipython3 pyspark</file>
 
 As described in [[running-spark-on-single-machine-or-on-cluster|Running Spark on Single Machine or on Cluster]], environmental variable ''MASTER'' specifies which Spark master to use (or whether to start a local one).
Line 20:
    .map(lambda word: (word, 1))
    .reduceByKey(lambda c1,c2: c1+c2)
-   .sortBy(lambda (word,count): count, ascending=False)
+   .sortBy(lambda word_count: word_count[1], ascending=False)
    .take(10))
 </file>
 
   * run interactive shell using existing Spark cluster (i.e., inside ''spark-qrsh''), or start local Spark cluster using as many threads as there are cores if there is none:
-  <file>PYSPARK_DRIVER_PYTHON=ipython pyspark</file>
+  <file>PYSPARK_DRIVER_PYTHON=ipython3 pyspark</file>
   * run interactive shell with local Spark cluster using one thread:
-  <file>MASTER=local PYSPARK_DRIVER_PYTHON=ipython pyspark</file>
+  <file>MASTER=local PYSPARK_DRIVER_PYTHON=ipython3 pyspark</file>
   * start Spark cluster (10 machines, 1GB RAM each) on SGE and run interactive shell:
-  <file>PYSPARK_DRIVER_PYTHON=ipython spark-qrsh 10 1G pyspark</file>
+  <file>PYSPARK_DRIVER_PYTHON=ipython3 spark-qrsh 10 1G pyspark</file>
 
 Note that ''PYSPARK_DRIVER_PYTHON'' variable can be left out or specified in ''.bashrc'' (or similar).
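
The ''sortBy'' change in this hunk goes together with the move to ''ipython3'': Python 3 no longer accepts tuple parameters in a lambda (PEP 3113), so the key function takes the whole pair and indexes into it. A minimal sketch in plain Python, without Spark; the sample ''pairs'' list is hypothetical data standing in for the output of ''reduceByKey'':

```python
from operator import itemgetter

# Hypothetical sample data standing in for the (word, count) pairs
# that reduceByKey would produce; not taken from the wiki page.
pairs = [("spark", 3), ("python", 7), ("wiki", 1)]

# Python 2 only (a SyntaxError in Python 3):
#     lambda (word, count): count
# Python 3: accept the pair as one argument and index the count.
by_count = sorted(pairs, key=lambda word_count: word_count[1], reverse=True)

# An equivalent key function from the standard library.
by_count_itemgetter = sorted(pairs, key=itemgetter(1), reverse=True)

print(by_count)
```

Both key functions select the second element of each pair, which is why ''word_count[1]'' behaves the same as the old ''(word,count): count'' form.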
Line 60:
    .map(lambda token: (token, 1))
    .reduceByKey(lambda x,y: x + y)
-   .sortBy(lambda (word,count): count, ascending=False)
+   .sortBy(lambda word_count: word_count[1], ascending=False)
    .saveAsTextFile(output))
 sc.stop()
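
For readers without a Spark installation, the stages in the hunk above can be approximated locally. This is only a sketch of what the chain computes, not Spark's implementation: the function name ''local_word_count'' is hypothetical, and a plain dictionary stands in for ''reduceByKey''.

```python
def local_word_count(tokens):
    """Plain-Python analogue of the map / reduceByKey / sortBy chain."""
    # map: emit a (token, 1) pair for every token
    pairs = [(token, 1) for token in tokens]

    # reduceByKey: add up the values that share a key
    totals = {}
    for token, one in pairs:
        totals[token] = totals.get(token, 0) + one

    # sortBy(..., ascending=False): order the pairs by count, largest first
    return sorted(totals.items(),
                  key=lambda token_count: token_count[1],
                  reverse=True)


print(local_word_count("a b a c a b".split()))
```

The Spark version distributes the same steps across partitions and writes the result with ''saveAsTextFile'' instead of returning a list.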
