
Institute of Formal and Applied Linguistics Wiki



This is an old revision of the document!



Writing Text Files

Text files can be written easily by Spark.

Writing Text Files by Lines

To write an RDD to a text file, one element per line, use the RDD method saveAsTextFile:

lines.saveAsTextFile("output_path")

The output_path always specifies a directory in which several output files are created. If the output directory already exists, an error occurs.
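Because saving fails when the output directory already exists, a rerun of the same job needs the old directory removed first. The following is a minimal sketch, assuming the output lives on the local filesystem (an HDFS path would need different tooling); the helper name clear_output_dir is hypothetical and not part of Spark:

```python
import os
import shutil
import tempfile


def clear_output_dir(output_path):
    """Delete a local output directory if it exists; return True if it was removed."""
    if os.path.isdir(output_path):
        shutil.rmtree(output_path)
        return True
    return False


# Throwaway directory standing in for the output of an earlier run.
base = tempfile.mkdtemp()
output_path = os.path.join(base, "output_path")
os.makedirs(output_path)

removed = clear_output_dir(output_path)
# The directory is gone now, so lines.saveAsTextFile(output_path) could create it afresh.
```

Deleting output is destructive, so it is probably best to call such a helper only on paths the job itself owns.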

Several output files, named part-00000, part-00001, etc., are created in the output directory, one for each partition of the RDD.
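Once a job has finished, the part files can be read back as one sequence of lines with plain Python. This sketch assumes the output directory is on the local filesystem and uses a hand-made directory in place of real Spark output; read_part_files is a hypothetical helper name:

```python
import glob
import os
import tempfile


def read_part_files(output_dir):
    """Return all lines from the part-* files in output_dir, in file-name order."""
    lines = []
    # Zero-padded names (part-00000, part-00001, ...) sort correctly as strings.
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path, encoding="utf-8") as part:
            lines.extend(line.rstrip("\n") for line in part)
    return lines


# Hand-made directory imitating the layout described above (not produced by Spark).
output_dir = tempfile.mkdtemp()
for name, content in [("part-00001", "c\nd\n"), ("part-00000", "a\nb\n")]:
    with open(os.path.join(output_dir, name), "w", encoding="utf-8") as f:
        f.write(content)

merged = read_part_files(output_dir)  # ['a', 'b', 'c', 'd']
```

The glob pattern matches only names starting with part-, so any other files in the directory are ignored.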

Sorting Output

If you want the output to be sorted, use the sortBy method. It must be given a function that extracts a sorting key from each element. Elements are sorted in ascending order by default; to sort in descending order, pass the named parameter ascending with a false value.
Python version:

lines.sortBy(lambda line: line)  # Sort whole lines
lines.sortBy(lambda kv: kv[0])   # Sort pairs according to the first element
lines.sortBy(lambda line: line, ascending=False) # Sort in decreasing order
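
To check what ordering a key function produces before running a job, the same lambdas can be tried on a small local list. This is only a local analogy using Python's built-in sorted, where reverse=True plays the role of ascending=False; it is not Spark code, and the sample data is made up:

```python
sample_lines = ["pear", "apple", "fig"]
sample_pairs = [("b", 1), ("a", 3), ("c", 2)]

# Same key functions as in the sortBy calls above, applied locally.
whole_lines = sorted(sample_lines, key=lambda line: line)
by_first = sorted(sample_pairs, key=lambda kv: kv[0])
decreasing = sorted(sample_lines, key=lambda line: line, reverse=True)

print(whole_lines)  # ['apple', 'fig', 'pear']
print(by_first)     # [('a', 3), ('b', 1), ('c', 2)]
print(decreasing)   # ['pear', 'fig', 'apple']
```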

Scala version:

lines.sortBy(line => line)      // Sort whole lines
lines.sortBy(line => line._1)   // Sort pairs according to the first element
lines.sortBy(line => line, ascending = false) // Sort in decreasing order

One Output File

