Differences
This shows you the differences between two versions of the page.
| | spark:recipes:reading-text-files [2016/03/31 22:02] straka | | spark:recipes:reading-text-files [2025/10/15 20:13] (current) straka [Number of Partitions: Multiple Files in a Directory] |
|---|---|---|---|
| | Line 25: | | Line 25: |
| | ==== Number of Partitions: Compressed File ==== | | ==== Number of Partitions: Compressed File ==== |
| - | If the input file is compressed, it is always read as 1 partition, | + | If the input file is compressed |
| - | To create multiple partitions, '' | + | On the other hand, files compressed with **'' |
| - | <file python> | + | |
| - | lines = sc.textFile(compressed_file).repartition(3*sc.defaultParallelism) | + | |
| - | </ | + | |
| | ==== Number of Partitions: Multiple Files in a Directory ==== | | ==== Number of Partitions: Multiple Files in a Directory ==== |
| | Line 36: | | Line 33: |
| | When the input file is a directory, each file is read in separate partitions. The minimum number of partitions given as second argument to '' | | When the input file is a directory, each file is read in separate partitions. The minimum number of partitions given as second argument to '' |
| - | Note that when there are many files (as for example in ''/ | + | Note that when there are many files (thousands or more, as for example in ''/ |
| | <file python> | | <file python> |
| | conll_lines = sc.textFile("/ | | conll_lines = sc.textFile("/ |
