====== Spark: Framework for Distributed Computations (Under Construction) ======

[[http://spark.apache.org|{{:spark:spark-logo.png?150 }}]] [[http://spark.apache.org|Spark]] is a framework for distributed computations. Natively it works in Python, Scala and Java, and it can also be used from Perl, to a limited extent, via pipes.

Apart from embarrassingly parallel computations, the Spark framework is suitable for //in-memory// and/or //iterative// computations, making it a good fit even for machine learning and complex data processing. (The Spark framework shares some underlying implementation with [[http://hadoop.apache.org/|Hadoop]], but it is quite different -- the Hadoop framework does not offer in-memory computations and has only limited support for iterative computations.)
The latest supported version of Spark is available in ''/net/projects/spark''. To use it, add
  export PATH="/net/projects/spark/bin:/net/projects/spark/sge:$PATH"
to your ''.bashrc'' or ''.profile'' (and log in again), or to your favourite shell config file. If you want to use Scala and do not have ''sbt'' already installed (or you do not know what ''sbt'' is), also add
  export PATH="/net/projects/spark/sbt/bin:$PATH"

  * [[spark:Running Spark on Single Machine or on Cluster]]
  * [[spark:Using Python]]
  * [[spark:Using Scala]]
===== Recipes =====

  * [[spark:recipes:Reading Text Files]]
  * [[spark:recipes:Writing Text Files]]
  * [[spark:recipes:Storing Data in Binary Format]]
  * [[spark:recipes:Using Perl via Pipes]]