MapReduce Tutorial : Setting the environment
Hadoop installation
The tutorial expects you to be logged in to a computer in the UFAL cluster. In this environment, Hadoop is installed in /SGE/HADOOP/active.
You can go through the tutorial even without being connected to the UFAL cluster, but you will need:
- a local Hadoop installation
  - unpack it
  - edit the conf/hadoop-env.sh file and make sure it contains a valid line export JAVA_HOME=/path/to/your/jdk
- the hadoop repository containing the Perl API and Java extensions
  - when using the Perl API, set hadoop_prefix to point to your Hadoop installation
  - when using the Java API, one of the Makefiles contains an absolute path to the hadoop repository – please correct it
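The local-installation steps above can be sketched as follows; the archive name and the JDK path are placeholders only, so substitute the release you actually downloaded and your own JDK location:

```shell
# Unpack a Hadoop release (hadoop-1.2.1.tar.gz is an example name).
tar xzf hadoop-1.2.1.tar.gz
cd hadoop-1.2.1

# Make sure conf/hadoop-env.sh points at your JDK
# (replace /path/to/your/jdk with the real path).
echo 'export JAVA_HOME=/path/to/your/jdk' >> conf/hadoop-env.sh
```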
When using a local Hadoop installation, you must either run all jobs locally in a single thread, or start a local cluster and pass -jt for the jobs to use it (see using-a-running-cluster).
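For illustration, the two modes might be invoked as below; the jar name, class name, and jobtracker address are hypothetical, so use the values your own job and local cluster report:

```shell
# Run the job against a started local cluster by pointing -jt at
# its jobtracker (localhost:9001 is a placeholder address).
bin/hadoop jar my-job.jar org.example.WordCount \
    -jt localhost:9001 input/ output/

# Or force single-threaded local execution instead:
bin/hadoop jar my-job.jar org.example.WordCount \
    -jt local input/ output-local/
```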
The Perl API
To use the Perl MapReduce API, you need:
- the Perl package Moose
- the Perl package Hadoop
The Moose package
The standard Moose package is available in the UFAL environment; just add

. /net/work/projects/perl_repo/admin/bin/setup_platform

to .profile or .bashrc, or type it in the shell:

echo -e "\n#MR Tutorial - Moose" >> ~/.bashrc
echo ". /net/work/projects/perl_repo/admin/bin/setup_platform" >> ~/.bashrc
The Hadoop package
The custom Hadoop package is available in /net/projects/hadoop/perl; just add

export PERLLIB="$PERLLIB:/net/projects/hadoop/perl/"
export PERL5LIB="$PERL5LIB:/net/projects/hadoop/perl"

to .profile, .bash_profile, or .bashrc, or type it in the shell:

echo -e "\n#MR Tutorial - Hadoop" >> ~/.bashrc
echo 'export PERLLIB="$PERLLIB:/net/projects/hadoop/perl/"' >> ~/.bashrc
echo 'export PERL5LIB="$PERL5LIB:/net/projects/hadoop/perl"' >> ~/.bashrc
Overview | Step 2: Input and output format, testing data