spark:spark-introduction [2014/11/03 17:30] straka
</file>
The output of ''saveAsTextFile'' is the directory ''output'' -- because the RDD can be distributed over several computers, the output is a directory containing possibly multiple files.
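As an aside, the per-partition files in such an output directory are typically named ''part-NNNNN''. The following plain-Python sketch (mock data, no Spark involved) illustrates why reading the result back amounts to concatenating every part file in the directory:

```python
# Plain-Python illustration (no Spark): saveAsTextFile("output") produces a
# directory with one part file per partition; reading the result back means
# concatenating all the part files.  The data below is mocked up.
import glob
import os
import tempfile

workdir = tempfile.mkdtemp()
output = os.path.join(workdir, "output")
os.mkdir(output)

# Simulate two partitions written by two different workers.
for i, lines in enumerate([["a\t1", "b\t2"], ["c\t3"]]):
    with open(os.path.join(output, "part-%05d" % i), "w") as f:
        f.write("\n".join(lines) + "\n")

# Reading the "output" directory back: concatenate every part file.
records = []
for part in sorted(glob.glob(os.path.join(output, "part-*"))):
    with open(part) as f:
        records.extend(f.read().splitlines())

print(records)  # ['a\t1', 'b\t2', 'c\t3']
```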
Note that the ''map'' and ''reduceByKey'' operations exist, so any Hadoop MapReduce computation can be implemented. In addition, several operations like ''join'', ''sortBy'' and ''cogroup'' are available which Hadoop does not offer (at least not directly), making the Spark computational model a strict superset of the Hadoop computational model.
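Since ''reduceByKey'' and ''join'' may be unfamiliar, here is a plain-Python sketch of their semantics on lists of (key, value) pairs (illustration only, no Spark involved; the function names are ours, not Spark's API):

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, f):
    # Group values by key, then fold each group with f -- the
    # semantics of Spark's reduceByKey on (key, value) pairs.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return sorted((k, reduce(f, vs)) for k, vs in groups.items())

def join(left, right):
    # Inner join on key: one output pair per matching combination,
    # mirroring Spark's join on two (key, value) datasets.
    right_groups = defaultdict(list)
    for k, v in right:
        right_groups[k].append(v)
    return sorted((k, (lv, rv))
                  for k, lv in left if k in right_groups
                  for rv in right_groups[k])

print(reduce_by_key([("a", 1), ("b", 1), ("a", 1)], lambda x, y: x + y))
# [('a', 2), ('b', 1)]
print(join([("a", 1), ("b", 2)], [("a", "x"), ("a", "y")]))
# [('a', (1, 'x')), ('a', (1, 'y'))]
```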
The Scala version is quite similar:
.take(10))
</file>
===== K-Means Example =====