MapReduce Tutorial : Setting the environment
Hadoop installation
The tutorial expects you to be logged in to a computer in the UFAL cluster. In this environment, Hadoop is installed in /SGE/HADOOP/active.
You can go through the tutorial even without being connected to the UFAL cluster, but you will need:
- a local Hadoop installation
  - unpack it
  - edit the conf/hadoop-env.sh file and make sure it contains a valid line export JAVA_HOME=/path/to/your/jdk
- the hadoop repository containing the Perl API and Java extensions
  - when using the Perl API, set hadoop_prefix to point to your Hadoop installation
  - when using the Java API, one of the Makefiles contains an absolute path to the hadoop repository – please correct it
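The local-installation steps above can be sketched as follows; the archive name and the JDK path are placeholders only, so substitute the release you actually downloaded and your own JDK location:

```shell
# Unpack a Hadoop release (hadoop-1.2.1.tar.gz is an example name).
tar xzf hadoop-1.2.1.tar.gz
cd hadoop-1.2.1

# Make sure conf/hadoop-env.sh points at your JDK
# (replace /path/to/your/jdk with the real path).
echo 'export JAVA_HOME=/path/to/your/jdk' >> conf/hadoop-env.sh
```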
When using a local Hadoop installation, you must either run all jobs locally in a single thread, or start a local cluster and pass -jt for the jobs to use it (see using-a-running-cluster).
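For illustration, the two modes might be invoked as below; the jar name, class name, and jobtracker address are hypothetical, so use the values your own job and local cluster report:

```shell
# Run the job against a started local cluster by pointing -jt at
# its jobtracker (localhost:9001 is a placeholder address).
bin/hadoop jar my-job.jar org.example.WordCount \
    -jt localhost:9001 input/ output/

# Or force single-threaded local execution instead:
bin/hadoop jar my-job.jar org.example.WordCount \
    -jt local input/ output-local/
```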
The Perl API
To use the Perl MapReduce API, you need:
- the Perl package Moose
- the Perl package Hadoop
The Moose package
The standard Moose package is available in the UFAL environment; just add

. /net/work/projects/perl_repo/admin/bin/setup_platform

to .profile or .bashrc, or type it in the shell:

echo -e "\n#MR Tutorial - Moose" >> ~/.bashrc
echo ". /net/work/projects/perl_repo/admin/bin/setup_platform" >> ~/.bashrc
The Hadoop package
The custom Hadoop package is available in /net/projects/hadoop/perl; just add

export PERLLIB="$PERLLIB:/net/projects/hadoop/perl/"
export PERL5LIB="$PERL5LIB:/net/projects/hadoop/perl"

to .profile, .bash_profile, or .bashrc, or type it in the shell:

echo -e "\n#MR Tutorial - Hadoop" >> ~/.bashrc
echo 'export PERLLIB="$PERLLIB:/net/projects/hadoop/perl/"' >> ~/.bashrc
echo 'export PERL5LIB="$PERL5LIB:/net/projects/hadoop/perl"' >> ~/.bashrc
Overview | Step 2: Input and output format, testing data