MapReduce Tutorial : Setting the environment
- Requirements
  - The Moose package
  - The Hadoop package
- When not logged in to the UFAL cluster

Requirements

The tutorial expects you to be logged in to a computer in the UFAL cluster and to be able to submit jobs using SGE. In this environment, Hadoop is installed in /SGE/HADOOP/active.
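A quick way to confirm these prerequisites is sketched below. The function name `check_cluster_env` and its messages are illustrative and not part of the tutorial; the sketch only assumes that SGE provides its usual `qsub` command and that Hadoop lives in the directory named above.

```shell
# check_cluster_env [HADOOP_DIR]: return 0 if the Hadoop directory exists
# and the SGE submission command qsub is on the PATH, 1 otherwise.
# (Hypothetical helper; the default path is the one stated in this tutorial.)
check_cluster_env() {
    hadoop_dir="${1:-/SGE/HADOOP/active}"
    rc=0
    if [ ! -d "$hadoop_dir" ]; then
        echo "Hadoop directory not found: $hadoop_dir" >&2
        rc=1
    fi
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; SGE does not seem to be available" >&2
        rc=1
    fi
    return "$rc"
}
```

Calling `check_cluster_env` without arguments checks the default location; any problems are reported on standard error.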

To use the Perl MapReduce API, you need:

- the Perl package Moose
- the Perl package Hadoop

The Moose package

The standard Moose package is available in the UFAL environment. To enable it, add

. /net/work/projects/perl_repo/admin/bin/setup_platform

to your .profile or .bashrc, or run it directly in the shell. The following commands append it to ~/.bashrc:

echo -e "\n#MR Tutorial - Moose" >> ~/.bashrc
echo ". /net/work/projects/perl_repo/admin/bin/setup_platform" >> ~/.bashrc
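Running the `echo ... >> ~/.bashrc` commands more than once adds the same line repeatedly. A minimal sketch of an idempotent alternative follows; the helper name `append_once` is hypothetical and not something the UFAL environment provides.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line
# is already present. (Hypothetical helper, for illustration only.)
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: fixed string, not a regex
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage (commented out so that sourcing this sketch changes nothing):
# append_once ~/.bashrc '. /net/work/projects/perl_repo/admin/bin/setup_platform'
```

The same helper would work for any other line that should appear in a startup file exactly once.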

The Hadoop package

The custom Hadoop package is available in /net/projects/hadoop/perl. To use it, add

export PERLLIB="$PERLLIB:/net/projects/hadoop/perl/"
export PERL5LIB="$PERL5LIB:/net/projects/hadoop/perl"

to your .profile, .bash_profile, or .bashrc, or run them directly in the shell. The following commands append them to ~/.bashrc:

echo -e "\n#MR Tutorial - Hadoop" >> ~/.bashrc
echo 'export PERLLIB="$PERLLIB:/net/projects/hadoop/perl/"' >> ~/.bashrc
echo 'export PERL5LIB="$PERL5LIB:/net/projects/hadoop/perl"' >> ~/.bashrc

When not logged in to the UFAL cluster

If you are not logged in to the UFAL cluster, you will need:

- a local Hadoop installation:
  1. Download http://www.apache.org/dist/hadoop/common/hadoop-1.0.0/hadoop-1.0.0.tar.gz
  2. Unpack it.
  3. Edit the conf/hadoop-env.sh file and make sure it contains a valid line such as
```
export JAVA_HOME=/path/to/your/jdk
```
- the hadoop repository containing the Perl API and Java extensions:
  - when using the Perl API, set hadoop_prefix to point to your Hadoop installation
  - when using the Java API, one of the Makefiles contains an absolute path to the hadoop repository; please correct it
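The local installation steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the helper `set_java_home` is hypothetical, the archive is assumed to unpack into a directory named hadoop-1.0.0, and /path/to/your/jdk remains a placeholder you must fill in yourself.

```shell
# set_java_home ENV_FILE JDK_DIR: make ENV_FILE contain exactly one
# uncommented "export JAVA_HOME=..." line pointing at JDK_DIR.
# (Hypothetical helper; ENV_FILE must already exist.)
set_java_home() {
    env_file=$1
    jdk_dir=$2
    tmp_file="$env_file.tmp"
    # Drop any existing uncommented JAVA_HOME export, keep everything else.
    # grep exits non-zero when nothing is left to print, hence "|| true".
    grep -v '^[[:space:]]*export[[:space:]]\{1,\}JAVA_HOME=' "$env_file" > "$tmp_file" || true
    printf 'export JAVA_HOME=%s\n' "$jdk_dir" >> "$tmp_file"
    mv "$tmp_file" "$env_file"
}

# Steps 1-3 (commented out because they download and modify files):
# wget http://www.apache.org/dist/hadoop/common/hadoop-1.0.0/hadoop-1.0.0.tar.gz
# tar xzf hadoop-1.0.0.tar.gz
# set_java_home hadoop-1.0.0/conf/hadoop-env.sh /path/to/your/jdk
```

Replacing the line rather than appending a second one keeps hadoop-env.sh unambiguous if the script is run again with a different JDK.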

When using a local Hadoop installation, you must either run all jobs locally in a single thread, or start a local cluster and pass -jt to the jobs so that they use it (see using-a-running-cluster).

Overview

Step 2: Input and output format, testing data.

Institute of Formal and Applied Linguistics Wiki