Institute of Formal and Applied Linguistics Wiki



MapReduce Tutorial: Custom data types

An important feature of the Java API is that it lets you provide custom data types and custom format types. In this step we implement two custom data types.

BERIntWritable

We want to implement BERIntWritable, an int stored in the BER compressed integer format produced by Perl's pack("w", $num). Quoting the Perl documentation: "The bytes represent an unsigned integer in base 128, most significant digit first, with as few digits as possible. Bit eight (the high bit) is set on each byte except the last."
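
To make the format concrete, here is a minimal, Hadoop-free sketch that encodes a few numbers and prints the resulting bytes in hexadecimal. The class name BerEncodingExample and the helper methods encode and hex are made up for this illustration and are not part of the tutorial code.

```java
import java.io.ByteArrayOutputStream;

public class BerEncodingExample {
  // Encodes a non-negative int as base-128 digits, most significant digit first.
  // Every byte except the last has its high bit (0x80) set.
  public static byte[] encode(int value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int shift = 28;  // an int spans at most five 7-bit groups: shifts 28, 21, 14, 7, 0
    while (shift > 0 && (value >>> shift) == 0) shift -= 7;  // skip leading zero groups
    for (; shift > 0; shift -= 7) {
      out.write(0x80 | ((value >>> shift) & 0x7F));
    }
    out.write(value & 0x7F);  // last byte: high bit clear
    return out.toByteArray();
  }

  // Formats bytes as space-separated two-digit hex, e.g. "82 2C".
  public static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) sb.append(String.format("%02X ", b & 0xFF));
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    System.out.println(hex(encode(5)));      // 05
    System.out.println(hex(encode(127)));    // 7F
    System.out.println(hex(encode(128)));    // 81 00
    System.out.println(hex(encode(300)));    // 82 2C
    System.out.println(hex(encode(16384)));  // 81 80 00
  }
}
```

For example, 300 = 2 * 128 + 44, so it is stored as the digit 2 with the high bit set (0x82) followed by the digit 44 (0x2C).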

The new class must implement the Writable interface, i.e., the methods readFields and write:

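import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class BERIntWritable implements Writable {
  // The format encodes unsigned integers, so value is expected to be non-negative.
  private int value;
 
  // Reads base-128 digits, most significant first, until a byte without the high bit.
  public void readFields(DataInput in) throws IOException {
    value = 0;
 
    byte next;
    while (((next = in.readByte()) & 0x80) != 0) {
      value = (value << 7) | (next & 0x7F);
    }
    value = (value << 7) | next;
  }
 
  // Writes the value using as few bytes as possible.
  public void write(DataOutput out) throws IOException {
    int highestPos = 28;  // shift of the topmost 7-bit group of an int
    // Skip leading groups that are all zero.
    while (highestPos > 0 && (value & (0x7F << highestPos)) == 0) highestPos -= 7;
    // Every byte except the last has the high bit set.
    while (highestPos > 0) {
      out.writeByte(0x80 | ((value >> highestPos) & 0x7F));
      highestPos -= 7;
    }
    out.writeByte(value & 0x7F);
  }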

Accessor methods get and set are needed to work with the value. We also override toString, which Hadoop uses when writing to plain text files.

  public int get() { return value; }
  public void set(int value) { this.value = value; }
  public String toString() { return String.valueOf(value); }
}
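
The serialization logic can be checked without a Hadoop installation, because readFields and write only depend on java.io.DataInput and java.io.DataOutput. The following sketch writes several values back to back into a byte buffer and reads them again. The nested class BerInt is a stand-in that copies the logic of BERIntWritable but omits "implements Writable" so that the example compiles on its own; BerIntRoundTrip and roundTrip are names invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class BerIntRoundTrip {
  // Stand-in for BERIntWritable (same logic, no Hadoop dependency).
  public static class BerInt {
    private int value;

    public void readFields(DataInput in) throws IOException {
      value = 0;
      byte next;
      while (((next = in.readByte()) & 0x80) != 0) {
        value = (value << 7) | (next & 0x7F);
      }
      value = (value << 7) | next;
    }

    public void write(DataOutput out) throws IOException {
      int highestPos = 28;
      while (highestPos > 0 && (value & (0x7F << highestPos)) == 0) highestPos -= 7;
      while (highestPos > 0) {
        out.writeByte(0x80 | ((value >> highestPos) & 0x7F));
        highestPos -= 7;
      }
      out.writeByte(value & 0x7F);
    }

    public int get() { return value; }
    public void set(int value) { this.value = value; }
  }

  // Serializes all values into one buffer, then deserializes them in order.
  public static int[] roundTrip(int... values) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      BerInt writable = new BerInt();  // one instance is reused for every value
      for (int v : values) {
        writable.set(v);
        writable.write(out);
      }
      out.flush();

      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      int[] result = new int[values.length];
      for (int i = 0; i < result.length; i++) {
        writable.readFields(in);  // the format is self-delimiting, no length prefix needed
        result[i] = writable.get();
      }
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // prints [0, 127, 128, 300, 2147483647]
    System.out.println(Arrays.toString(roundTrip(0, 127, 128, 300, Integer.MAX_VALUE)));
  }
}
```

Reusing a single instance with set and get mirrors how Hadoop typically reuses Writable objects instead of allocating a new one per record.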

Such an implementation can be used as a value type. To use it as a key type, it must implement WritableComparable instead of just Writable. It is enough to add a compareTo method to the current implementation:

public class BERIntWritable implements WritableComparable<BERIntWritable> {
  ... //Same as before
 
  public int compareTo(BERIntWritable other) {
    return Integer.compare(value, other.value);
  }
}
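
Besides compareTo, a key type usually also overrides equals and hashCode, because Hadoop's default HashPartitioner picks the reducer from key.hashCode(). The sketch below illustrates this without Hadoop: BerIntKey is a hypothetical stand-in that implements plain Comparable instead of WritableComparable, and the partition method reproduces the formula used by HashPartitioner. All names here are chosen for this illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BerIntKeyExample {
  // Hypothetical stand-in for a key type; Comparable replaces WritableComparable here.
  public static final class BerIntKey implements Comparable<BerIntKey> {
    private final int value;

    public BerIntKey(int value) { this.value = value; }

    @Override
    public int compareTo(BerIntKey other) {
      return Integer.compare(value, other.value);  // defines the sort order of keys
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof BerIntKey && ((BerIntKey) o).value == value;
    }

    @Override
    public int hashCode() {
      return value;  // equal keys must yield equal hash codes
    }

    @Override
    public String toString() { return String.valueOf(value); }
  }

  // Sorts keys by compareTo, as the framework does before the reduce phase.
  public static List<BerIntKey> sorted(int... values) {
    List<BerIntKey> keys = new ArrayList<>();
    for (int v : values) keys.add(new BerIntKey(v));
    Collections.sort(keys);
    return keys;
  }

  // Same formula as Hadoop's default HashPartitioner.
  public static int partition(BerIntKey key, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }

  public static void main(String[] args) {
    System.out.println(sorted(300, 5, 128));                // [5, 128, 300]
    System.out.println(partition(new BerIntKey(300), 4));   // 0
    System.out.println(partition(new BerIntKey(5), 4));     // 1
  }
}
```

Without a value-based hashCode, two equal keys created in different map tasks could be sent to different reducers.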

PairWritable<A, B>

As another example, we implement a type consisting of two user-defined Writable implementations:

public static class PairWritable<A extends Writable, B extends Writable> implements Writable {
  private A first;
  private B second;
 
  // Fills the existing components in place, so first and second must be set
  // before readFields is called.
  public void readFields(DataInput in) throws IOException {
    first.readFields(in);
    second.readFields(in);
  }
 
  // Writes both components one after the other.
  public void write(DataOutput out) throws IOException {
    first.write(out);
    second.write(out);
  }
 
  public A getFirst() { return first; }
  public B getSecond() { return second; }
  public void setFirst(A first) { this.first = first; }
  public void setSecond(B second) { this.second = second; }
  public PairWritable(A first, B second) { this.first = first; this.second = second; }
}
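
The following sketch shows how such a pair behaves when serialized and read back. To stay runnable without Hadoop it uses invented stand-ins: SimpleWritable plays the role of Writable, IntBox and TextBox are two toy component types, and Pair copies the structure of PairWritable. None of these names come from the tutorial or from Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PairRoundTrip {
  // Hypothetical stand-in for org.apache.hadoop.io.Writable.
  public interface SimpleWritable {
    void readFields(DataInput in) throws IOException;
    void write(DataOutput out) throws IOException;
  }

  public static class IntBox implements SimpleWritable {
    public int value;
    public IntBox(int value) { this.value = value; }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
  }

  public static class TextBox implements SimpleWritable {
    public String value;
    public TextBox(String value) { this.value = value; }
    public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
    public void write(DataOutput out) throws IOException { out.writeUTF(value); }
  }

  // Same structure as PairWritable: delegates to both components in order.
  public static class Pair<A extends SimpleWritable, B extends SimpleWritable>
      implements SimpleWritable {
    private final A first;
    private final B second;
    public Pair(A first, B second) { this.first = first; this.second = second; }
    public void readFields(DataInput in) throws IOException {
      first.readFields(in);
      second.readFields(in);
    }
    public void write(DataOutput out) throws IOException {
      first.write(out);
      second.write(out);
    }
    public A getFirst() { return first; }
    public B getSecond() { return second; }
  }

  // Serializes one pair and deserializes it into a second, pre-populated pair.
  public static String roundTrip(int number, String text) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      new Pair<>(new IntBox(number), new TextBox(text)).write(out);
      out.flush();

      // The target must already hold component instances: readFields overwrites
      // their contents and would fail with a NullPointerException on null fields.
      Pair<IntBox, TextBox> target = new Pair<>(new IntBox(0), new TextBox(""));
      target.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return target.getFirst().value + ":" + target.getSecond().value;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(42, "answer"));  // 42:answer
  }
}
```

The components are read in the same order in which they were written, which is what keeps the two sides of the pair in sync.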
