A lot can go wrong when creating a cluster and submitting a Hadoop job:
- Hadoop::Runner.pm module not found: The Perl Hadoop package is not configured; see Setting the environment.
- ipc.Client: Retrying connect to server: IP_ADDRESS:PORT. Already tried ? time(s): The jobtracker cannot be contacted. If you use the -jt jobtracker:port flag, check that the jobtracker address is correct.
- /net/projects/hadoop/bin/hadoop-cluster fails to start a cluster: Use qstat to find the machine where SGE scheduled the jobtracker. Log in to that machine and investigate the logs in /var/log/hadoop/$USER/$SGE_JOBID/.
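The checks above can be scripted. The following is a minimal bash sketch and is not part of the Perl Hadoop package. The function names check_perl_module, check_jobtracker and hadoop_log_dir are hypothetical, and the 5-second timeout is an arbitrary choice. The reachability test relies on bash's /dev/tcp redirection and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Hypothetical troubleshooting helpers (not part of the Perl Hadoop package).

# "Hadoop::Runner.pm module not found":
# succeeds only if Perl can load the Hadoop::Runner module.
check_perl_module() {
  perl -MHadoop::Runner -e 1 2>/dev/null
}

# "ipc.Client: Retrying connect to server":
# succeeds only if something accepts TCP connections at host:port
# (the same string you pass to -jt). The 5 s timeout is an arbitrary choice.
check_jobtracker() {
  local addr=$1
  local host=${addr%:*}    # everything before the last colon
  local port=${addr##*:}   # everything after the last colon
  if [ "$host" = "$addr" ] || [ -z "$host" ] || [ -z "$port" ]; then
    echo "expected host:port, got '$addr'" >&2
    return 2
  fi
  # Uses bash's /dev/tcp pseudo-device; host and port are passed as $0 and $1.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# "hadoop-cluster fails to start a cluster":
# prints the log directory for a given SGE job ID (look there on the
# machine where the jobtracker was scheduled).
hadoop_log_dir() {
  echo "/var/log/hadoop/${USER:-$(id -un)}/$1/"
}
```

Here is how the helpers map to the errors above:
- check_perl_module || echo "see Setting the environment" reports whether Hadoop::Runner is on Perl's module path.
- check_jobtracker jobtracker:port returns non-zero if nothing accepts connections at the address you pass to -jt.
- ls "$(hadoop_log_dir 1234567)" lists the logs for SGE job 1234567. This is a made-up ID: take the real one from qstat -u "$USER", and run the command on the jobtracker's machine.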
If the cluster works but your job crashes, you can:
- run the job locally, i.e., without the -c and -jt flags. This is more useful for Hadoop jobs written in Java, because you can then use a debugger. When using the Perl API, new subprocesses are created for the Perl tasks anyway.