====== Writing Text Files ======

Spark can easily write text files.

===== Writing Text Files by Lines =====

To write an ''RDD'' to a text file, with each element on its own line, use the ''saveAsTextFile'' method:

<code python>
lines.saveAsTextFile("output_path")
</code>

The ''output_path'' always specifies a directory in which several output files are created. If the output directory already exists, an error occurs. The output files, named ''part-00000'', ''part-00001'', etc., are created in the output directory, one for every partition of the ''RDD''.

==== Sorting Output ====

If you want the output to be sorted, use the ''sortBy'' method. It must be given a function that extracts a sorting key from each element. By default the elements are sorted in ascending order; to sort in descending order, pass the named parameter ''ascending'' with the value ''false''.

Python version:

<code python>
lines.sortBy(lambda line: line)                   # Sort whole lines
lines.sortBy(lambda pair: pair[0])                # Sort pairs according to the first element
lines.sortBy(lambda line: line, ascending=False)  # Sort in decreasing order
</code>

Scala version:

<code scala>
lines.sortBy(line => line)                      // Sort whole lines
lines.sortBy(pair => pair._1)                   // Sort pairs according to the first element
lines.sortBy(line => line, ascending = false)   // Sort in decreasing order
</code>

==== One Output File ====

In many cases, only one output file is desirable. In that case, the ''coalesce(1)'' method can be used; it merges all partitions into one.

<code python>
lines.coalesce(1).saveAsTextFile("output_path")
</code>

If sorting is also used, call ''coalesce'' **after** the sorting, so that the sorting is executed in parallel and the partitions are merged only just before the output is written.

===== Writing Text Files by Paragraphs =====

The ''saveAsTextFile'' method always writes a single newline after every element. If you want to separate the elements by two newlines (i.e., by blank lines), append a newline to every element manually:

<code python>
lines.map(lambda line: str(line) + "\n").saveAsTextFile("output_path")
</code>

<code scala>
lines.map(_.toString + "\n").saveAsTextFile("output_path")
</code>
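
As a minimal end-to-end sketch combining the methods above, the following Python program sorts an RDD of lines, merges the partitions, and writes the result as a single text file. The application name and the ''input_path''/''output_path'' values are illustrative placeholders, not part of the original examples:

<code python>
# Illustrative sketch: sort lines, merge partitions, write one output file.
from pyspark import SparkContext

sc = SparkContext(appName="write-sorted-text")   # hypothetical application name

lines = sc.textFile("input_path")                # hypothetical input path

(lines
    .sortBy(lambda line: line)                   # sort runs in parallel across partitions
    .coalesce(1)                                 # merge partitions only after sorting
    .saveAsTextFile("output_path"))              # creates output_path/part-00000

sc.stop()
</code>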