Storing Data in Binary Format
It is also possible to store an RDD in a binary format, which has several advantages:
- an RDD can be saved and later loaded while preserving the element structure (e.g., elements can be lists of pairs)
- the resulting file is reasonably compact
- saving and loading is fast
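PySpark's saveAsPickleFile serializes elements with Python's pickle, which is why structure survives. A text-based save stores only each element's string representation. The standard library alone is enough to show the difference. This is a minimal sketch under that assumption. It does not reproduce Spark's on-disk layout, and the sample `data` is hypothetical.

```python
import pickle

# Hypothetical dataset whose elements are lists of (key, value) pairs.
data = [[("a", 1), ("b", 2)], [("c", 3)]]

# Text-style storage: each element becomes its string representation,
# so reading it back yields plain strings, not lists of tuples.
as_text = [str(element) for element in data]
print(type(as_text[0]))      # <class 'str'>

# Pickle-style storage: each element is serialized to bytes and
# deserialized back into an equal object with the same structure.
as_bytes = [pickle.dumps(element) for element in data]
restored = [pickle.loads(blob) for blob in as_bytes]
print(restored == data)      # True
print(type(restored[0][0]))  # <class 'tuple'>
```

With the text version, turning `"[('a', 1), ('b', 2)]"` back into a list of tuples would require parsing. The pickled version needs no parsing step.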
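Note, however, that the binary format differs between Python and Scala. Python pickles the elements, while Scala uses Java serialization, so a file written from one language cannot be read from the other.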
Python
In Python, rdd.saveAsPickleFile saves the RDD to the directory path and sc.pickleFile loads it back:
data.saveAsPickleFile(path)
...
loaded_data = sc.pickleFile(path)
Scala
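In Scala, rdd.saveAsObjectFile saves the RDD and sc.objectFile loads it back. sc.objectFile needs the element type T of the saved RDD as a type parameter (e.g. (String, Int) for an RDD of pairs); without it, the element type is inferred as Nothing:
data.saveAsObjectFile(path)
...
val loadedData = sc.objectFile[T](path)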
