Institute of Formal and Applied Linguistics Wiki


====== spark:using-scala ======
</file>

  * run interactive shell using existing Spark cluster (i.e., inside ''spark-srun''), or start local Spark cluster using as many threads as there are cores if there is none:
  <file>spark-shell</file>
  * run interactive shell with local Spark cluster using one thread:
  <file>MASTER=local spark-shell</file>
  * start Spark cluster (10 machines, 2GB RAM each) via Slurm and run interactive shell:
  <file>spark-srun 10 2G spark-shell</file>
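Whichever way the shell was started, a quick way to verify that it works is to run a small computation; a minimal sketch (''sc'' is the SparkContext the shell provides, and the path is the example data used in the ''word_count'' example below):
<file>
// count the lines and words of the example data
val lines = sc.textFile("/net/projects/spark-example-data/wiki-cs")
lines.count()
lines.flatMap(_.split("\\s+")).count()
</file>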
  
  
  - replace ''spark-template'' with your project name in the first line (i.e., ''name := "my-best-project"'')
  - run ''sbt package'' to create the JAR (note that the first run of ''sbt'' will take several minutes)
The resulting JAR can be found in the ''target/scala-2.12'' subdirectory, named after your project.
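For example, with the hypothetical project name ''my-best-project'' from above and the ''scalaVersion := "2.12.17"'' used in the example below, the build would go:
<file>
sbt package
ls target/scala-2.12/
# ... my-best-project_2.12-1.0.jar ...
</file>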
  
==== Usage Examples ====
The ''sbt'' project file ''word_count.sbt'':
<file>
name := "word_count"

version := "1.0"

scalaVersion := "2.12.17"

libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.1"
</file>
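For reference, a minimal word-count application matching this project file and the ''spark-submit'' commands below might look as follows (the object name ''Main'' and the two command-line arguments are assumptions):
<file>
import org.apache.spark.SparkContext

object Main {
  def main(args: Array[String]): Unit = {
    // master URL and application name are supplied by spark-submit
    val sc = new SparkContext()
    sc.textFile(args(0))                 // input path
      .flatMap(_.split("\\s+"))          // split lines into words
      .map(word => (word, 1))
      .reduceByKey(_ + _)                // sum the counts of each word
      .saveAsTextFile(args(1))           // output directory
    sc.stop()
  }
}
</file>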
  
  <file>sbt package</file>
  
  * run the ''word_count'' application inside an existing Spark cluster (i.e., inside ''spark-sbatch'' or ''spark-srun''), or start local Spark cluster using as many threads as there are cores if there is none:
  <file>spark-submit target/scala-2.12/word_count_2.12-1.0.jar /net/projects/spark-example-data/wiki-cs outdir</file>
  * run the ''word_count'' application with local Spark cluster using one thread:
  <file>MASTER=local spark-submit target/scala-2.12/word_count_2.12-1.0.jar /net/projects/spark-example-data/wiki-cs outdir</file>
  * start Spark cluster (10 machines, 2GB RAM each) via Slurm and run the ''word_count'' application:
  <file>spark-sbatch 10 2G spark-submit target/scala-2.12/word_count_2.12-1.0.jar /net/projects/spark-example-data/wiki-cs outdir</file>
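In all cases, ''outdir'' must not exist before the run (Spark refuses to overwrite existing output) and the result is written as ''part-*'' files inside it, so it can be inspected for example with:
<file>
cat outdir/part-* | head
</file>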
  
