Differences
This shows you the differences between two versions of the page.
spark:recipes:reading-text-files [2014/11/04 10:49] straka |
spark:recipes:reading-text-files [2016/03/31 22:02] (current) straka |
||
---|---|---|---|
Line 67: | Line 67: | ||
To control the number of partitions, '' | To control the number of partitions, '' | ||
- | For example, to read compressed HamleDT Czech CoNLL files, the following can be used: | + | For example, to read compressed HamleDT Czech CoNLL files, so that every sentence is one element of the resulting '' |
<file python> | <file python> | ||
conlls = paragraphFile(sc, | conlls = paragraphFile(sc, | ||
Line 74: | Line 74: | ||
===== Reading Whole Text Files ===== | ===== Reading Whole Text Files ===== | ||
- | To read a whole text file, or all text files in a given directory, '' | + | To read a whole text file, or all text files in a given directory, '' |
- | + | ||
- | Unfortunately, | + | |
<file python> | <file python> | ||
whole_wiki = sc.wholeTextFiles("/ | whole_wiki = sc.wholeTextFiles("/ | ||
</file> | </file> | ||
+ | |||
+ | By default, every file is read into a separate partition. To control the number of partitions, '' |
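As a rough illustration of the two recipes touched by this revision, the sketch below reads whole files and then splits each file's content into blank-line-separated paragraphs, so that every paragraph (for CoNLL data, every sentence) becomes one RDD element. It is not the page's `paragraphFile` implementation: `split_paragraphs` and `whole_files_as_paragraphs` are hypothetical helper names, and passing `minPartitions` to `sc.wholeTextFiles` is an assumption about how the partition count might be controlled, since the original sentence is truncated.

```python
def split_paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines.

    Hypothetical helper, not part of Spark or of the wiki's own code.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            # Non-blank line: it belongs to the current paragraph.
            current.append(line)
        elif current:
            # Blank line after content: close the current paragraph.
            paragraphs.append("\n".join(current))
            current = []
    if current:
        # The last paragraph may not be followed by a blank line.
        paragraphs.append("\n".join(current))
    return paragraphs


def whole_files_as_paragraphs(sc, path, min_partitions=None):
    """Read whole files with an existing SparkContext `sc` and emit paragraphs.

    `min_partitions` is passed on as a hint for the minimum number of
    partitions (an assumption about the intended mechanism).
    """
    # wholeTextFiles yields (file_name, file_content) pairs.
    files = sc.wholeTextFiles(path, minPartitions=min_partitions)
    return files.flatMap(lambda name_content: split_paragraphs(name_content[1]))
```

Because each file is loaded as a single string, this approach only suits files that fit comfortably in the memory of one executor.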