Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

spark:recipes:reading-text-files [2014/11/04 14:11] straka
spark:recipes:reading-text-files [2014/11/04 14:13] straka
Line 67:
 To control the number of partitions, ''repartition'' or ''coalesce'' can be used.
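
As a rough illustration of how the two operations differ, the following is a toy model in plain Python, not Spark code: partitions are modelled as lists, and the helper names ''coalesce'' and ''repartition'' here are hypothetical stand-ins for the RDD methods of the same name. It assumes the usual behaviour that ''coalesce'' only merges existing partitions (so it cannot increase their number without a shuffle), while ''repartition'' redistributes all elements and can produce any number of partitions.

```python
from typing import List, Sequence

Partitions = List[List[str]]


def coalesce(parts: Sequence[Sequence[str]], n: int) -> Partitions:
    """Toy model: merge whole neighbouring partitions into at most n partitions."""
    if n < 1:
        raise ValueError("n must be positive")
    n = min(n, len(parts))  # merging alone cannot create additional partitions
    out: Partitions = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        # Consecutive input partitions are mapped to the same output partition.
        out[i * n // len(parts)].extend(part)
    return out


def repartition(parts: Sequence[Sequence[str]], n: int) -> Partitions:
    """Toy model: redistribute every element round-robin into exactly n partitions."""
    if n < 1:
        raise ValueError("n must be positive")
    out: Partitions = [[] for _ in range(n)]
    flat = (item for part in parts for item in part)
    for i, item in enumerate(flat):
        out[i % n].append(item)
    return out


parts = [["a", "b"], ["c"], ["d", "e", "f"], ["g"]]
merged = coalesce(parts, 2)        # partitions stay intact, only grouped
reshuffled = repartition(parts, 3)  # elements are spread evenly
```

In this model, ''coalesce'' keeps elements of one input partition together, which is why it is the cheaper choice when only reducing the partition count; ''repartition'' moves individual elements and therefore balances partition sizes.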
  
-For example, to read compressed HamleDT Czech CoNLL files, the following can be used:
+For example, to read compressed HamleDT Czech CoNLL files, so that every sentence is one element of the resulting ''RDD'', the following can be used:
 <file python>
 conlls = paragraphFile(sc, "/net/projects/spark-example-data/hamledt-cs-conll").coalesce(3*sc.defaultParallelism)