Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-2 [2012/01/28 10:27]
majlis
courses:mapreduce-tutorial:step-2 [2012/01/29 16:03] (current)
straka
Line 19:
   * ''/home/straka/wiki/cs-seq-medium'' -- compressed SequenceFile of Czech Wikipedia, 8MB.
   * ''/home/straka/wiki/cs-seq-small'' -- compressed SequenceFile of Czech Wikipedia, 35kB.
-  * ''/home/straka/wiki/cs-text'' -- uncompressed plain text files of Czech Wikipedia, 200MB.
+  * ''/home/straka/wiki/cs-text'' -- uncompressed plain text files of Czech Wikipedia in the ''KeyValueTextInputFormat'', 200MB.
-  * ''/home/straka/wiki/cs-text-medium'' -- uncompressed plain text files of Czech Wikipedia, 16MB.
+  * ''/home/straka/wiki/cs-text-medium'' -- uncompressed plain text files of Czech Wikipedia in the ''KeyValueTextInputFormat'', 16MB.
-  * ''/home/straka/wiki/cs-text-small'' -- uncompressed plain text files of Czech Wikipedia, 70kB.
+  * ''/home/straka/wiki/cs-text-small'' -- uncompressed plain text files of Czech Wikipedia in the ''KeyValueTextInputFormat'', 70kB.
   * ''/home/straka/wiki/en-seq'' -- compressed SequenceFile of English Wikipedia, 1.9GB.
 +We recommend using the text format in this tutorial, so that both the input and the output files are human-readable.
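
For orientation, the sketch below illustrates how a line in the ''KeyValueTextInputFormat'' is typically interpreted: Hadoop splits each line at the first separator (a tab by default) into a key and a value, and a line without a separator becomes a key with an empty value. The function names and the sample lines are hypothetical and are not taken from the tutorial data; this is a minimal Python illustration of the convention, not the Hadoop implementation.

```python
def parse_key_value_line(line, separator="\t"):
    """Split one text line into (key, value) at the FIRST separator.

    Mirrors the usual KeyValueTextInputFormat convention: everything
    before the first separator is the key, everything after it is the
    value. A line with no separator yields the whole line as the key
    and an empty string as the value.
    """
    line = line.rstrip("\r\n")  # drop the line terminator only
    key, _sep, value = line.partition(separator)
    return key, value


def read_key_value_lines(lines, separator="\t"):
    """Parse an iterable of text lines into a list of (key, value) pairs."""
    return [parse_key_value_line(line, separator) for line in lines]


# Hypothetical sample lines (article title, tab, article text).
sample = [
    "Praha\tPraha je hlavní město\tČeské republiky.\n",
    "Brno\n",
]
records = read_key_value_lines(sample)
```

Because only the first tab acts as the separator, any further tabs stay inside the value, which is why the files remain readable with ordinary text tools.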
  
 ---- ----