Writing Text Files
Text files can be written easily by Spark.
Writing Text Files by Lines
To write an RDD to a text file, one element per line, use the RDD method saveAsTextFile:
lines.saveAsTextFile("output_path")
The output_path always specifies a directory in which several output files are created. If the output directory already exists, an error occurs.
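Because saving fails when the directory already exists, a job that is rerun often has to clear the old output first. The following is a minimal sketch, assuming the output lives on the local filesystem (it does not apply to HDFS or other remote storage). The helper name remove_output_dir is hypothetical and is not part of Spark.

```python
import os
import shutil


def remove_output_dir(path):
    """Delete a local output directory if it exists.

    Hypothetical helper, not part of Spark; only valid for local paths.
    Returns True if a directory was removed, False otherwise.
    """
    if os.path.isdir(path):
        shutil.rmtree(path)
        return True
    return False


# Typical use before saving (assumes `lines` is an existing RDD):
# remove_output_dir("output_path")
# lines.saveAsTextFile("output_path")
```

Deleting output is destructive, so it may be safer to call such a helper only for paths the job itself created.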
Several output files, named part-00000, part-00001, etc., are created in the output directory, one for each partition of the RDD.
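Since the result is spread over several part files, reading it back outside Spark means concatenating them in order. A minimal sketch, assuming uncompressed output on the local filesystem; the function name read_output_lines is hypothetical:

```python
import glob
import os


def read_output_lines(output_dir):
    """Return all lines from the part-* files in output_dir, in file-name order."""
    result = []
    # The part numbers are zero-padded, so sorting the file names
    # puts part-00000 before part-00001, and so on.
    for part in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(part, encoding="utf-8") as f:
            result.extend(line.rstrip("\n") for line in f)
    return result


# Example (assumes "output_path" was written by saveAsTextFile):
# all_lines = read_output_lines("output_path")
```

The part-* pattern skips other files that may appear in the directory, such as a _SUCCESS marker.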
Sorting Output
If you want the output to be sorted, use the sortBy method. The sortBy method must be given a function that extracts a key from each element; this key is used during sorting. By default the elements are sorted in ascending order; to sort in descending order, pass the named parameter ascending with the value False (false in Scala).
Python version:
lines.sortBy(lambda line: line)                   # Sort whole lines
lines.sortBy(lambda kv: kv[0])                    # Sort pairs according to the first element
lines.sortBy(lambda line: line, ascending=False)  # Sort in decreasing order
Scala version:
lines.sortBy(line => line)                     // Sort whole lines
lines.sortBy(line => line._1)                  // Sort pairs according to the first element
lines.sortBy(line => line, ascending = false)  // Sort in decreasing order
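A key function can be tried on a small local list before it is used on an RDD. The sketch below is only a local analogy built on Python's sorted, not a Spark call; preview_sort is a hypothetical helper, and the sample pairs are made up for illustration.

```python
def preview_sort(elements, key, ascending=True):
    """Sort a small local list with the same key/ascending idea as sortBy.

    Local analogy only: uses Python's built-in sorted, not Spark.
    """
    return sorted(elements, key=key, reverse=not ascending)


pairs = [("b", 1), ("a", 3), ("c", 2)]

# Sort pairs according to the first element
by_key = preview_sort(pairs, lambda kv: kv[0])
# Sort pairs according to the second element, in decreasing order
by_value_desc = preview_sort(pairs, lambda kv: kv[1], ascending=False)

print(by_key)         # [('a', 3), ('b', 1), ('c', 2)]
print(by_value_desc)  # [('a', 3), ('c', 2), ('b', 1)]
```

The same lambdas can then be passed unchanged to sortBy on an RDD of pairs.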