MapReduce Tutorial: Setting the environment

Requirements

The tutorial expects you to be logged in to a computer in the UFAL cluster and to be able to submit jobs using SGE. In this environment, Hadoop is installed in /SGE/HADOOP/active.

To use the Perl MapReduce API, you need the following two packages.

The Moose package

The standard Moose package is available in the UFAL environment. To enable it, add

. /net/work/projects/perl_repo/admin/bin/setup_platform

to .profile or .bashrc, or type it in the shell. The following commands append it to ~/.bashrc:

echo -e "\n#MR Tutorial - Moose" >> ~/.bashrc
echo ". /net/work/projects/perl_repo/admin/bin/setup_platform" >> ~/.bashrc
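After sourcing setup_platform (or opening a new shell), it may be worth checking that Perl can actually load Moose. The following is a minimal sketch; perl_module_ok is a hypothetical helper name introduced here for illustration and is not part of the UFAL setup.

```shell
# Hypothetical helper: succeeds if Perl can load the named module.
perl_module_ok() {
    perl -M"$1" -e 1 2>/dev/null
}

# Report whether Moose is visible to the current shell's perl.
if perl_module_ok Moose; then
    echo "Moose is available"
else
    echo "Moose not found; source setup_platform first"
fi
```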

The Hadoop package

The custom Hadoop package is available in /net/projects/hadoop/perl. To use it, add

export PERLLIB="$PERLLIB:/net/projects/hadoop/perl/"
export PERL5LIB="$PERL5LIB:/net/projects/hadoop/perl"

to .profile, .bash_profile, or .bashrc, or type it in the shell. The following commands append the lines to ~/.bashrc:

echo -e "\n#MR Tutorial - Hadoop" >> ~/.bashrc
echo 'export PERLLIB="$PERLLIB:/net/projects/hadoop/perl/"' >> ~/.bashrc
echo 'export PERL5LIB="$PERL5LIB:/net/projects/hadoop/perl"' >> ~/.bashrc
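Note that the echo commands above append their lines every time they are run, so running them twice leaves duplicates in ~/.bashrc. A minimal sketch of an idempotent alternative follows; append_once is a hypothetical helper name introduced here, not something provided by the UFAL environment.

```shell
# Hypothetical helper: append a line to a file only if that exact
# line is not already present in it.
append_once() {
    local file="$1"
    local line="$2"
    touch "$file"
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (modifies ~/.bashrc, so it is left commented out here):
#   append_once ~/.bashrc 'export PERLLIB="$PERLLIB:/net/projects/hadoop/perl/"'
#   append_once ~/.bashrc 'export PERL5LIB="$PERL5LIB:/net/projects/hadoop/perl"'
```

The single quotes in the usage lines keep $PERLLIB and $PERL5LIB unexpanded, so the variables are evaluated when ~/.bashrc is read, not when the line is appended.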

When not logged in to the UFAL cluster

If you are not logged in to the UFAL cluster, you will need:

When using a local Hadoop installation, you must either run all jobs locally in a single thread, or start a local cluster and pass -jt to the jobs so that they use it (see using-a-running-cluster).

