
Storing Data in Binary Format

It is also possible to store an RDD in a binary format, which has several advantages:

  1. an RDD can be saved and later loaded while preserving the structure of its elements (e.g., elements can be lists of pairs);
  2. the resulting files are reasonably compact;
  3. saving and loading are fast.
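The first advantage follows from how the elements are written. PySpark's pickle files store each element using Python's pickle serialization, while a text file stores only the string form of each element. The plain-Python sketch below needs no Spark. It imitates both round trips for one nested element to show why the structure survives only in the binary case. It illustrates the serialization step only, not the actual on-disk layout, which PySpark wraps in a Hadoop SequenceFile. The element value is a made-up example.

```python
import io
import pickle

# A made-up RDD-style element: a key paired with a list of pairs.
element = ("user1", [("a", 1), ("b", 2)])

# Text-style storage: only str(element) is written, one line per element,
# so reading it back yields a plain string that would have to be re-parsed.
text_file = io.StringIO()
text_file.write(str(element) + "\n")
text_file.seek(0)
from_text = text_file.readline().rstrip("\n")

# Binary (pickle) storage: the object itself is serialized to bytes and
# deserialized back into the same nested tuples and lists.
binary_file = io.BytesIO()
pickle.dump(element, binary_file)
binary_file.seek(0)
from_pickle = pickle.load(binary_file)

print(type(from_text).__name__)   # str
print(from_pickle == element)     # True
print(from_pickle[1][0])          # ('a', 1)
```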

Note, however, that the binary format differs between Python and Scala. A file saved from one language therefore cannot be loaded from the other.

Python

In Python, use rdd.saveAsPickleFile to save an RDD and sc.pickleFile to load it:

data.saveAsPickleFile(path)
...
loaded_data = sc.pickleFile(path)

Scala

In Scala, use rdd.saveAsObjectFile to save an RDD and sc.objectFile to load it:

data.saveAsObjectFile(path)
...
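// objectFile cannot infer the element type, so pass it explicitly, e.g. (String, Int) for pairs
val loadedData = sc.objectFile[(String, Int)](path)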