====== Spark: Framework for Distributed Computations (Under Construction) ======

[[http://spark.apache.org|{{:spark:spark-logo.png?150 }}]] [[http://spark.apache.org|Spark]] is a framework for distributed computations. Natively it works in Python, Scala and Java, and it can be used to a limited extent in Perl via pipes.

Apart from embarrassingly parallel computations, the Spark framework supports //in-memory// and/or //iterative// computations, making it suitable even for machine learning and complex data processing. (The Spark framework shares some underlying implementation with [[http://hadoop.apache.org/|Hadoop]], but it is quite different -- the Hadoop framework does not offer in-memory computations and has only limited support for iterative computations.)
  * Official [[http://spark.apache.org/docs/latest/quick-start.html|Quick Start]]
  * Official [[http://spark.apache.org/docs/latest/programming-guide.html|Spark Programming Guide]]
  * Official [[http://spark.apache.org/docs/latest/mllib-guide.html|MLlib Programming Guide]] (Spark's scalable machine learning library of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering and dimensionality reduction, as well as the underlying optimization primitives)
  * Official [[http://spark.apache.org/docs/latest/api/python/index.html|Python API Reference]] / [[http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package|Scala API Reference]]
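To give a taste of the in-memory computation style mentioned above, here is a minimal PySpark sketch (an illustration only, not from the official guides; it assumes a Spark installation whose ''pyspark'' package is importable, such as the one in ''/net/projects/spark''):

```python
from pyspark import SparkContext

# Run locally with two worker threads; on a cluster the master URL differs.
sc = SparkContext("local[2]", "InMemoryExample")

# cache() keeps the distributed dataset in memory, so the repeated
# passes below reuse it instead of recomputing it from scratch.
numbers = sc.parallelize(range(1, 101)).cache()

total = numbers.sum()                         # 5050
squares = numbers.map(lambda x: x * x).sum()  # 338350

sc.stop()
```

Because ''numbers'' is cached, the second pass reuses the in-memory partitions -- this is the property that iterative algorithms (and MLlib) rely on.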

The latest supported version of Spark is available in ''/net/projects/spark''. To use it, add
  export PATH="/net/projects/spark/bin:/net/projects/spark/sge:$PATH"
to your ''.bashrc'' (or to your favourite shell config file). If you want to use Scala and do not have ''sbt'' already installed (or you do not know what ''sbt'' is), also add
  export PATH="/net/projects/spark/sbt/bin:$PATH"

  * [[spark:Running Spark on Single Machine or on Cluster]]
  * [[spark:Using Python]]
  * [[spark:Using Scala]]
===== Recipes =====

  * [[spark:recipes:Reading Text Files]]
  * [[spark:recipes:Writing Text Files]]
  * [[spark:recipes:Storing Data in Binary Format]]
  * [[spark:recipes:Using Perl via Pipes]]