courses:mapreduce-tutorial:step-5 [2012/01/29 21:30] straka |
courses:mapreduce-tutorial:step-5 [2012/01/31 15:56] (current) straka |
| |
<file perl> | <file perl> |
package Mapper; | package My::Mapper; |
use Moose; | use Moose; |
with 'Hadoop::Mapper'; | with 'Hadoop::Mapper'; |
} | } |
| |
package Reducer; | package My::Reducer; |
use Moose; | use Moose; |
with 'Hadoop::Reducer'; | with 'Hadoop::Reducer'; |
} | } |
| |
package Main; | package main; |
use Hadoop::Runner; | use Hadoop::Runner; |
| |
my $runner = Hadoop::Runner->new( | my $runner = Hadoop::Runner->new( |
mapper => Mapper->new(), | mapper => My::Mapper->new(), |
reducer => Reducer->new()); | reducer => My::Reducer->new()); |
| |
$runner->run(); | $runner->run(); |
===== Types of keys and values ===== | ===== Types of keys and values ===== |
| |
Currently, in the Perl API, both keys and values are strings, which are stored and loaded as UTF-8. If you need more complex structures, you have to serialize and deserialize them yourself. | Currently, in the Perl API, both keys and values are strings, which are stored and loaded as UTF-8 and compared lexicographically. If you need more complex structures, you have to serialize and deserialize them yourself. |
| |
The Java API offers a wide range of types, including user-defined types, to be used for keys and values. | The Java API offers a wide range of types, including user-defined types, to be used for keys and values. |
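Because keys and values are plain strings in the Perl API, a structured value has to be packed into a string by hand. The following is a minimal sketch of one way to do that for a (article, position) pair; the helper names ''serialize_occurrence'' and ''deserialize_occurrence'' and the ''position:article'' layout are assumptions made for illustration, not part of the Hadoop Perl API. The position is zero-padded because strings are compared lexicographically, so an unpadded ''10'' would sort before ''9''.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of the Hadoop Perl API.
# Pack an (article, position) pair into a single string value.
sub serialize_occurrence {
    my ($article, $position) = @_;
    # Position goes first and is zero-padded so that string comparison
    # orders the values numerically by position.
    return sprintf("%010d:%s", $position, $article);
}

# Unpack a string produced by serialize_occurrence.
sub deserialize_occurrence {
    my ($value) = @_;
    # Limit of 2 keeps any further colons inside the article name.
    my ($position, $article) = split(/:/, $value, 2);
    return ($article, $position + 0);    # "+ 0" drops the zero padding
}

my $value = serialize_occurrence('Wiki:Praha', 42);
my ($article, $position) = deserialize_occurrence($value);
print "serialized: $value\n";
print "article=$article position=$position\n";
```

Putting the free-form field last means the separator may also occur inside the article name without breaking deserialization.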
Run a Hadoop job on ''/home/straka/wiki/cs-text-small'', which counts occurrences of every word in the article texts. You can download the template {{:courses:mapreduce-tutorial:step-5-exercise1.txt|step-5-exercise1.pl}} and execute it. | Run a Hadoop job on ''/home/straka/wiki/cs-text-small'', which counts occurrences of every word in the article texts. You can download the template {{:courses:mapreduce-tutorial:step-5-exercise1.txt|step-5-exercise1.pl}} and execute it. |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-exercise1.txt' -O 'step-5-exercise1.pl' | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-exercise1.txt' -O 'step-5-exercise1.pl' |
rm -rf step-5-out-ex1; perl step-5-exercise1.pl run /home/straka/wiki/cs-text-medium/ step-5-out-ex1 | # NOW EDIT THE FILE |
| # $EDITOR step-5-exercise1.pl |
| rm -rf step-5-out-ex1; perl step-5-exercise1.pl /home/straka/wiki/cs-text-medium/ step-5-out-ex1 |
less step-5-out-ex1/part-* | less step-5-out-ex1/part-* |
| |
You can also download the solution {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-5-solution1.pl}} and check the correct output. | You can also download the solution {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-5-solution1.pl}} and check the correct output. |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-5-solution1.pl' | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-5-solution1.pl' |
rm -rf step-5-out-sol1; perl step-5-solution1.pl run /home/straka/wiki/cs-text-medium/ step-5-out-sol1 | # NOW VIEW THE FILE |
| # $EDITOR step-5-solution1.pl |
| rm -rf step-5-out-sol1; perl step-5-solution1.pl /home/straka/wiki/cs-text-medium/ step-5-out-sol1 |
less step-5-out-sol1/part-* | less step-5-out-sol1/part-* |
| |
Run a Hadoop job on ''/home/straka/wiki/cs-text-small'' that generates an inverted index. An inverted index contains, for each word, all its //occurrences//, where each occurrence is a pair (article of occurrence, position of occurrence). You can download the template {{:courses:mapreduce-tutorial:step-5-exercise2.txt|step-5-exercise2.pl}} and execute it. | Run a Hadoop job on ''/home/straka/wiki/cs-text-small'' that generates an inverted index. An inverted index contains, for each word, all its //occurrences//, where each occurrence is a pair (article of occurrence, position of occurrence). You can download the template {{:courses:mapreduce-tutorial:step-5-exercise2.txt|step-5-exercise2.pl}} and execute it. |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-exercise2.txt' -O 'step-5-exercise2.pl' | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-exercise2.txt' -O 'step-5-exercise2.pl' |
rm -rf step-5-out-ex2; perl step-5-exercise2.pl run /home/straka/wiki/cs-text-tiny/ step-5-out-ex2 | # NOW EDIT THE FILE |
| # $EDITOR step-5-exercise2.pl |
| rm -rf step-5-out-ex2; perl step-5-exercise2.pl /home/straka/wiki/cs-text-small/ step-5-out-ex2 |
less step-5-out-ex2/part-* | less step-5-out-ex2/part-* |
| |
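To make the shape of an inverted index concrete, here is a small stand-alone sketch in plain Perl that imitates the map, shuffle and reduce phases on two made-up articles. It does not use the Hadoop Perl API and is not the provided solution; the toy data, the 0-based word positions and the ''article:position'' encoding of an occurrence are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Toy input: article title => article text (made-up data).
my %articles = (
    'A' => 'hadoop runs perl',
    'B' => 'perl runs',
);

# "Map" phase: emit (word, "article:position") for every word occurrence.
my @pairs;
for my $title (sort keys %articles) {
    my @words = split ' ', $articles{$title};
    for my $pos (0 .. $#words) {
        push @pairs, [ $words[$pos], "$title:$pos" ];
    }
}

# "Shuffle" phase: group all values by key, as the framework would do.
my %grouped;
for my $pair (@pairs) {
    my ($word, $occurrence) = @$pair;
    push @{ $grouped{$word} }, $occurrence;
}

# "Reduce" phase: join all occurrences of a word into one string value.
my %index;
for my $word (keys %grouped) {
    $index{$word} = join(' ', @{ $grouped{$word} });
}

# Print one line per word: the word, a tab, then its occurrences.
print "$_\t$index{$_}\n" for sort keys %index;
```

Since values are strings, the list of occurrences for a word is itself serialized, here by joining the encoded pairs with spaces.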
You can also download the solution {{:courses:mapreduce-tutorial:step-5-solution2.txt|step-5-solution2.pl}} and check the correct output. | You can also download the solution {{:courses:mapreduce-tutorial:step-5-solution2.txt|step-5-solution2.pl}} and check the correct output. |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution2.txt' -O 'step-5-solution2.pl' | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution2.txt' -O 'step-5-solution2.pl' |
rm -rf step-5-out-sol2; perl step-5-solution2.pl run /home/straka/wiki/cs-text-tiny/ step-5-out-sol2 | # NOW VIEW THE FILE |
| # $EDITOR step-5-solution2.pl |
| rm -rf step-5-out-sol2; perl step-5-solution2.pl /home/straka/wiki/cs-text-small/ step-5-out-sol2 |
less step-5-out-sol2/part-* | less step-5-out-sol2/part-* |
| |