Storing Data in Binary Format
It is also possible to store an RDD in a binary format, which has several advantages:
- an RDD can be saved and later loaded while preserving the element structure (e.g., elements can be lists of pairs)
- the resulting file is reasonably compact
- saving and loading are fast
Nevertheless, note that the binary format is different in Python and in Scala.
Python
In Python, rdd.saveAsPickleFile is used to save the RDD and sc.pickleFile can load it:
data.saveAsPickleFile(path)
...
loaded_data = sc.pickleFile(path)
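The preservation of element structure can be illustrated without a Spark cluster. PySpark's pickle files are based on Python's pickle serialization, so the following minimal sketch uses the standard pickle module directly. It is not the Spark API itself, and the sample data and the roundtrip helper are hypothetical names introduced only for this illustration.

```python
import pickle

# Hypothetical sample data: each element is a list of (key, value) pairs,
# standing in for the elements of an RDD.
elements = [
    [("a", 1), ("b", 2)],
    [("c", 3)],
    [],
]


def roundtrip(element):
    """Serialize one element to bytes with pickle and restore it."""
    return pickle.loads(pickle.dumps(element))


# Serialize and restore every element independently.
restored = [roundtrip(e) for e in elements]

# The nested lists and tuples come back with the same types and values,
# which a plain text file would not guarantee.
print(restored == elements)
print(type(restored[0][0]).__name__)
```

With a text-based format, each element would instead come back as a string and would have to be parsed again by hand.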
Scala
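In Scala, rdd.saveAsObjectFile is used to save the RDD and sc.objectFile can load it. The element type should be given explicitly when loading, because it cannot be inferred from the path alone (T below stands for the element type of data):
data.saveAsObjectFile(path)
...
val loaded_data = sc.objectFile[T](path)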