Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-29 [2012/02/05 18:23]
straka
courses:mapreduce-tutorial:step-29 [2012/02/05 18:54]
straka
Line 1: Line 1:
 ====== MapReduce Tutorial : Custom sorting and grouping comparators. ======
  
-====== Sorting comparator ======
+====== Custom sorting comparator ======
  
 The keys are sorted before being processed by a reducer, using a
-[[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/RawComparator.html|Raw comparator]].
+[[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/RawComparator.html|Raw comparator]]. The default comparator uses the ''compareTo'' method provided by the key type, which implements [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/WritableComparable.html|WritableComparable]]. Consider, for example, the following ''IntPair'' type:
 + 
 +<code java> 
 +public static class IntPair implements WritableComparable<IntPair> {
 +  private int first = 0;
 +  private int second = 0;
 +
 +  public void set(int left, int right) { first = left; second = right; }
 +  public int getFirst() { return first; }
 +  public int getSecond() { return second; }
 +
 +  // Deserialize both fields in the order in which write() emits them.
 +  public void readFields(DataInput in) throws IOException {
 +    first = in.readInt();
 +    second = in.readInt();
 +  }
 +  // Serialize as two consecutive 4-byte integers.
 +  public void write(DataOutput out) throws IOException {
 +    out.writeInt(first);
 +    out.writeInt(second);
 +  }
 +
 +  // Default ordering: by the first element, ties broken by the second.
 +  public int compareTo(IntPair o) {
 +    if (first != o.first) return first < o.first ? -1 : 1;
 +    else return second < o.second ? -1 : second == o.second ? 0 : 1;
 +  }
 +}
 +</code> 
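The ordering defined by ''compareTo'' above can be checked without a cluster. The following is a minimal sketch, assuming a simplified stand-in class (''PairOrderSketch.Pair'' is a hypothetical name, not part of the tutorial) that copies only the comparison logic and omits the Hadoop serialization methods:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PairOrderSketch {
  // Hypothetical stand-in for IntPair: same comparison logic, no Hadoop dependency.
  public static class Pair implements Comparable<Pair> {
    public final int first;
    public final int second;

    public Pair(int first, int second) {
      this.first = first;
      this.second = second;
    }

    // Order by the first element; break ties using the second element.
    @Override
    public int compareTo(Pair o) {
      if (first != o.first) return first < o.first ? -1 : 1;
      return Integer.compare(second, o.second);
    }

    @Override
    public String toString() {
      return "(" + first + "," + second + ")";
    }
  }

  // Sorts a copy of the given pairs by their natural order and renders the result.
  public static String sorted(List<Pair> pairs) {
    List<Pair> copy = new ArrayList<>(pairs);
    Collections.sort(copy);
    return copy.toString();
  }

  public static void main(String[] args) {
    List<Pair> pairs = new ArrayList<>();
    pairs.add(new Pair(2, 1));
    pairs.add(new Pair(1, 5));
    pairs.add(new Pair(1, 3));
    System.out.println(sorted(pairs)); // prints [(1,3), (1,5), (2,1)]
  }
}
```

Pairs that share the first element are still ordered by the second one here.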
 + 
 +To sort ''IntPair'' keys in a Hadoop job by the first element only, we can provide a ''RawComparator'' and set it using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#setSortComparatorClass(java.lang.Class)|job.setSortComparatorClass]]:
 + 
 +<code java> 
 +public static class IntPair implements WritableComparable<IntPair> {
 +  ...
 +  public static class FirstOnlyComparator implements RawComparator<IntPair> {
 +    // Compare serialized keys directly: the first int occupies the first 4 bytes.
 +    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
 +      int first1 = WritableComparator.readInt(b1, s1);
 +      int first2 = WritableComparator.readInt(b2, s2);
 +      return first1 < first2 ? -1 : first1 == first2 ? 0 : 1;
 +    }
 +    // Compare deserialized keys, again using the first element only.
 +    public int compare(IntPair x, IntPair y) {
 +      return x.getFirst() < y.getFirst() ? -1 : x.getFirst() == y.getFirst() ? 0 : 1;
 +    }
 +  }
 +}
 + 
 +... 
 + 
 +job.setSortComparatorClass(IntPair.FirstOnlyComparator.class); 
 +</code> 
 +Notice that we used the helper function ''readInt'' from the [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/WritableComparator.html|WritableComparator]] class, which provides methods for parsing primitive data types from byte arrays.
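To see why comparing the raw bytes works, the byte-level comparison can be tried outside Hadoop. This is a minimal sketch under stated assumptions: ''RawCompareSketch'', ''serialize'' and ''compareFirst'' are hypothetical names, and the local ''readInt'' is a hand-written stand-in for ''WritableComparator.readInt'' that assumes the big-endian layout produced by ''DataOutput.writeInt''.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RawCompareSketch {
  // Serializes a pair the way IntPair.write does: two big-endian 4-byte ints.
  public static byte[] serialize(int first, int second) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(first);
    out.writeInt(second);
    out.flush();
    return bytes.toByteArray();
  }

  // Stand-in for WritableComparator.readInt (assumption: big-endian int at offset start).
  public static int readInt(byte[] b, int start) {
    return ((b[start] & 0xff) << 24)
         | ((b[start + 1] & 0xff) << 16)
         | ((b[start + 2] & 0xff) << 8)
         | (b[start + 3] & 0xff);
  }

  // Compares two serialized pairs by their first element only, without deserializing them.
  public static int compareFirst(byte[] b1, int s1, byte[] b2, int s2) {
    return Integer.compare(readInt(b1, s1), readInt(b2, s2));
  }

  public static void main(String[] args) throws IOException {
    byte[] a = serialize(1, 100);
    byte[] b = serialize(1, -7);
    byte[] c = serialize(2, 0);
    System.out.println(compareFirst(a, 0, b, 0) == 0); // true: the second element is ignored
    System.out.println(compareFirst(a, 0, c, 0) < 0);  // true: 1 sorts before 2
  }
}
```

Because only the first four bytes of each key are read, the comparison avoids constructing ''IntPair'' objects, which is the main motivation for a raw comparator.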
  
 ====== Grouping comparator ======
