
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

spark:using-scala [2022/12/14 13:02]
straka [Usage Examples]
spark:using-scala [2022/12/14 13:24]
straka
Line 13:
 Consider the following simple script computing the 10 most frequent words of the Czech Wikipedia:
 <file scala>
-(sc.textFile("/lnet/troja/data/npfl118/wiki/cs/wiki.txt", 3*sc.defaultParallelism)
+(sc.textFile("/net/projects/spark-example-data/wiki-cs", 3*sc.defaultParallelism)
    .flatMap(_.split("\\s"))
    .map((_,1)).reduceByKey(_+_)
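Note: the hunk shows only the beginning of the script. As a rough sketch (not part of either revision), the pipeline presumably continues with a descending sort and ''take(10)'', along these lines:
<file scala>
// Sketch only: the sortBy/take tail is assumed, it is not visible in this hunk.
(sc.textFile("/net/projects/spark-example-data/wiki-cs", 3*sc.defaultParallelism)
   .flatMap(_.split("\\s"))
   .map((_,1)).reduceByKey(_+_)       // count the occurrences of every word
   .sortBy(_._2, ascending = false)   // order by descending count
   .take(10))                         // keep the 10 most frequent words
</file>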
Line 72:
 version := "1.0"
 
-scalaVersion := "2.11.12"
+scalaVersion := "2.12.17"
 
-libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.2"
+libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.1"
 </file>
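For reference, the settings above would sit in a complete ''build.sbt'' roughly like the sketch below; the ''name'' value is an assumption, inferred from the ''word_count_2.12-1.0.jar'' file name used in the commands that follow.
<file scala>
// Sketch of a complete build.sbt; "word_count" is inferred from the jar name, not shown in the hunk.
name := "word_count"

version := "1.0"

scalaVersion := "2.12.17"

libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.1"
</file>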
  
Line 80:
   <file>sbt package</file>
 
-  * run ''word_count'' application inside existing Spark cluster (i.e., inside ''spark-qsub'' or ''spark-qrsh''), or start local Spark cluster using as many threads as there are cores if there is none:
+  * run the ''word_count'' application inside an existing Spark cluster (i.e., inside ''spark-sbatch'' or ''spark-srun''), or start a local Spark cluster using as many threads as there are cores if there is none:
-  <file>spark-submit target/scala-2.11/word_count_2.11-1.0.jar /net/projects/spark-example-data/wiki-cs outdir</file>
+  <file>spark-submit target/scala-2.12/word_count_2.12-1.0.jar /net/projects/spark-example-data/wiki-cs outdir</file>
   * run the ''word_count'' application with a local Spark cluster using one thread:
-  <file>MASTER=local spark-submit target/scala-2.11/word_count_2.11-1.0.jar /net/projects/spark-example-data/wiki-cs outdir</file>
+  <file>MASTER=local spark-submit target/scala-2.12/word_count_2.12-1.0.jar /net/projects/spark-example-data/wiki-cs outdir</file>
-  * start Spark cluster (10 machines, 1GB RAM each) on SGE and run ''word_count'' application:
+  * start a Spark cluster (10 machines, 2GB RAM each) on Slurm and run the ''word_count'' application:
-  <file>spark-qsub 10 1G spark-submit target/scala-2.11/word_count_2.11-1.0.jar /net/projects/spark-example-data/wiki-cs outdir</file>
+  <file>spark-sbatch 10 2G spark-submit target/scala-2.12/word_count_2.12-1.0.jar /net/projects/spark-example-data/wiki-cs outdir</file>
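For reference, the ''word_count'' application packaged by ''sbt package'' and submitted above could look roughly like the following sketch; the object name, argument handling, and the sorted ''saveAsTextFile'' output are assumptions, only the word-count pipeline itself mirrors the script shown earlier.
<file scala>
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of a stand-alone word_count application (names and argument handling are assumed).
object Main {
  def main(args: Array[String]): Unit = {
    // Expect the input path and the output directory as command-line arguments.
    val Array(input, output) = args

    // The master is supplied by spark-submit (or the MASTER variable), not hard-coded here.
    val sc = new SparkContext(new SparkConf().setAppName("word_count"))

    sc.textFile(input, 3 * sc.defaultParallelism)
      .flatMap(_.split("\\s"))
      .map((_, 1)).reduceByKey(_ + _)    // count the occurrences of every word
      .sortBy(_._2, ascending = false)   // order by descending count
      .saveAsTextFile(output)            // write the result to the output directory

    sc.stop()
  }
}
</file>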
  
