Writing Text Files
Spark can easily write the contents of an RDD to text files.
Writing Text Files by Lines
To write an RDD to a text file, with each element on its own line, use the RDD method saveAsTextFile:
lines.saveAsTextFile("output_path")
The output_path always specifies a directory in which several output files are created. If the output directory already exists, an error occurs.
The output files are named part-00000, part-00001, etc., one for each partition of the RDD.
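Because the result is split across several part files, reading it back as a single sequence of lines means concatenating those files in name order. The following is a minimal sketch in plain Python, assuming the output directory is on the local filesystem (not HDFS). The helper name read_output_lines is hypothetical and not part of Spark.

```python
import glob
import os


def read_output_lines(output_dir):
    """Return all lines from the part-* files in output_dir, in file-name order.

    Hypothetical helper for illustration; assumes a local output directory.
    """
    lines = []
    # Sorting the names puts part-00000 before part-00001, and so on.
    # The "part-*" pattern skips other files in the directory, such as _SUCCESS.
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path, encoding="utf-8") as part_file:
            lines.extend(line.rstrip("\n") for line in part_file)
    return lines
```

Calling read_output_lines("output_path") after the save has finished returns the lines of part-00000 first, then those of part-00001, and so on.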
Sorting Output
If you want the output to be sorted, use the sortBy method. It takes a function that extracts a sort key from each element. Elements are sorted in ascending order by default; to sort in descending order, pass the named parameter ascending with a false value (ascending=False in Python, ascending=false in Scala).
Python version:
lines.sortBy(lambda line: line)                   # Sort whole lines
lines.sortBy(lambda pair: pair[0])                # Sort pairs according to the first element
lines.sortBy(lambda line: line, ascending=False)  # Sort in decreasing order
Scala version:
lines.sortBy(line => line)                     // Sort whole lines
lines.sortBy(pair => pair._1)                  // Sort pairs according to the first element
lines.sortBy(line => line, ascending = false)  // Sort in decreasing order
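The key function passed to sortBy is an ordinary function of one element, so it can be tried out locally before running a Spark job. The sketch below uses Python's built-in sorted as a stand-in for sortBy. It does not call Spark. The names first_element and pairs are hypothetical, and reverse=True plays the role that ascending=False plays in sortBy.

```python
def first_element(pair):
    """Key function: extract the first element of a (key, value) pair."""
    return pair[0]


# Hypothetical sample data standing in for the elements of an RDD of pairs.
pairs = [("pear", 3), ("apple", 7), ("fig", 1)]

# Local stand-in for: pairs_rdd.sortBy(first_element)
ascending = sorted(pairs, key=first_element)

# Local stand-in for: pairs_rdd.sortBy(first_element, ascending=False)
descending = sorted(pairs, key=first_element, reverse=True)
```

A named function such as first_element can be passed to sortBy in the same way as a lambda, which can be easier to read when the key extraction is more involved.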
