It is also possible to store an RDD in a binary format, which has several advantages.
Note, however, that the binary format differs between Python and Scala.
In Python, rdd.saveAsPickleFile saves the RDD and sc.pickleFile loads it back:
data.saveAsPickleFile(path)
...
loaded_data = sc.pickleFile(path)
In Scala, rdd.saveAsObjectFile saves the RDD and sc.objectFile loads it back:
data.saveAsObjectFile(path)
...
val loadedData = sc.objectFile[String](path) // element type must be given explicitly; String is assumed here