====== MapReduce Tutorial : Setting the environment ======
===== Requirements =====
The tutorial expects you to be logged in to a computer in the UFAL cluster and to be able to submit jobs using SGE. In this environment, Hadoop is installed in ''/SGE/HADOOP/active''.
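A quick sanity check that the shared installation is visible from your machine (this assumes the standard Hadoop layout with ''bin/hadoop'' under the installation root):
  ls /SGE/HADOOP/active
  /SGE/HADOOP/active/bin/hadoop version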
To use the Perl MapReduce API, you need:
* Perl package ''Moose''.
* Perl package ''Hadoop''.
==== The Moose package ====
The standard Moose package is available in the UFAL environment; just add
. /net/work/projects/perl_repo/admin/bin/setup_platform
to your ''.profile'' or ''.bashrc'', or type it in the shell:
echo -e "\n#MR Tutorial - Moose" >> ~/.bashrc
echo ". /net/work/projects/perl_repo/admin/bin/setup_platform" >> ~/.bashrc
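To verify that Moose is available, open a new shell (or source your ''.bashrc'') and run a minimal check:
  perl -MMoose -e 'print "Moose $Moose::VERSION loaded\n"'
If the setup worked, this prints the loaded Moose version instead of a "Can't locate Moose.pm" error.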
==== The Hadoop package ====
The custom Hadoop package is available in ''/net/projects/hadoop/perl''; just add
export PERLLIB="$PERLLIB:/net/projects/hadoop/perl/"
export PERL5LIB="$PERL5LIB:/net/projects/hadoop/perl"
to ''.profile'', ''.bash_profile'', or ''.bashrc'', or type it in the shell:
echo -e "\n#MR Tutorial - Hadoop" >> ~/.bashrc
echo 'export PERLLIB="$PERLLIB:/net/projects/hadoop/perl/"' >> ~/.bashrc
echo 'export PERL5LIB="$PERL5LIB:/net/projects/hadoop/perl"' >> ~/.bashrc
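To verify that the package directory is on the Perl module search path, open a new shell (or source your ''.bashrc'') and inspect ''@INC'' (this only checks the path, it does not assume any particular module name inside the package):
  perl -e 'print "$_\n" for @INC' | grep /net/projects/hadoop/perl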
===== When not logged in the UFAL cluster =====
**If you are not logged in to the UFAL cluster, you will need:**
  * a local Hadoop installation (see the sketch after this list):
    - download ''http://www.apache.org/dist/hadoop/common/hadoop-1.0.0/hadoop-1.0.0.tar.gz''
    - unpack it
    - edit the ''conf/hadoop-env.sh'' file and make sure it contains a valid line ''export JAVA_HOME=/path/to/your/jdk''
  * the ''hadoop'' repository containing the Perl API and the Java extensions
  * when using the Perl API, set ''hadoop_prefix'' to point to your Hadoop installation
  * when using the Java API, one of the ''Makefile''s contains an absolute path to the ''hadoop'' repository -- please correct it
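A possible sequence of the local installation steps (a sketch only; the ''JAVA_HOME'' value is a placeholder that you must replace with the real path to your JDK):
  wget http://www.apache.org/dist/hadoop/common/hadoop-1.0.0/hadoop-1.0.0.tar.gz
  tar xzf hadoop-1.0.0.tar.gz
  cd hadoop-1.0.0
  # make sure conf/hadoop-env.sh exports a valid JAVA_HOME (placeholder path below)
  echo 'export JAVA_HOME=/path/to/your/jdk' >> conf/hadoop-env.sh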
When using a local Hadoop installation, you must either run all jobs locally in a single thread, or start a local cluster and use ''-jt'' so that the jobs use it (see [[.:step-7#using-a-running-cluster]]).
----
[[.|Overview]] | [[step-2|Step 2]]: Input and output format, testing data. |