
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-29 [2012/02/05 18:23]
straka
courses:mapreduce-tutorial:step-29 [2012/02/05 18:49]
straka
Line 1: Line 1:
 ====== MapReduce Tutorial : Custom sorting and grouping comparators. ======
  
-====== Sorting comparator ======
+====== Fast sorting comparator ======
  
 The keys are sorted before being processed by a reducer, using a
-[[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/RawComparator.html|Raw comparator]].
+[[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/RawComparator.html|Raw comparator]]. The default comparator uses the ''compareTo'' method provided by the key type, which must implement [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/WritableComparable.html|WritableComparable]]. Consider for example the following ''IntPair'' type:
 + 
 +<code java> 
 +public static class IntPair implements WritableComparable<IntPair> {
 +  private int first = 0;
 +  private int second = 0;
 +
 +  public void set(int left, int right) { first = left; second = right; }
 +  public int getFirst() { return first; }
 +  public int getSecond() { return second; }
 +
 +  // Deserialize both ints in the order in which write() emits them.
 +  public void readFields(DataInput in) throws IOException {
 +    first = in.readInt();
 +    second = in.readInt();
 +  }
 +  // Serialize as two 4-byte ints: first, then second.
 +  public void write(DataOutput out) throws IOException {
 +    out.writeInt(first);
 +    out.writeInt(second);
 +  }
 +
 +  // Order by first, breaking ties by second.
 +  public int compareTo(IntPair o) {
 +    if (first != o.first) return first < o.first ? -1 : 1;
 +    else return second < o.second ? -1 : second == o.second ? 0 : 1;
 +  }
 +}
 +</code> 
 + 
 +To sort ''IntPair'' keys in a Hadoop job by the first element only, we can provide a ''RawComparator'' and set it using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#setSortComparatorClass(java.lang.Class)|job.setSortComparatorClass]].
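
The revision shown here does not include the comparator itself, so the following is only a minimal sketch of the byte-level comparison that such a ''RawComparator'' might perform. It assumes keys are serialized exactly as ''IntPair.write'' does (two big-endian 4-byte ints, first element first). The class name ''FirstIntRawCompare'' and its helper methods are hypothetical, and the Hadoop dependency is deliberately left out so that the sketch compiles with the JDK alone.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Hadoop: compares two serialized IntPair
// keys by their first element only, without deserializing the whole key.
class FirstIntRawCompare {
  // Decode a big-endian int starting at offset start,
  // the layout produced by DataOutput.writeInt.
  static int readInt(byte[] b, int start) {
    return ((b[start] & 0xff) << 24) | ((b[start + 1] & 0xff) << 16)
         | ((b[start + 2] & 0xff) << 8) | (b[start + 3] & 0xff);
  }

  // Same parameter list as RawComparator.compare: two buffers, each with
  // a start offset and a length. Only the first 4 bytes of each key are read.
  static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    return Integer.compare(readInt(b1, s1), readInt(b2, s2));
  }

  // Assumption: mimics IntPair.write. ByteBuffer defaults to big-endian,
  // the same byte order DataOutput.writeInt uses.
  static byte[] serialize(int first, int second) {
    return ByteBuffer.allocate(8).putInt(first).putInt(second).array();
  }

  public static void main(String[] args) {
    byte[] a = serialize(1, 9);
    byte[] b = serialize(2, 0);
    byte[] c = serialize(1, -5);
    // Sign of the comparison for keys with different first elements.
    System.out.println(Integer.signum(compare(a, 0, a.length, b, 0, b.length)));
    // Keys sharing the first element compare as equal; the second is ignored.
    System.out.println(Integer.signum(compare(a, 0, a.length, c, 0, c.length)));
  }
}
```

In an actual job, the ''compare'' method above would presumably live in a class implementing ''RawComparator<IntPair>'' (Hadoop's ''WritableComparator.readInt'' performs the same decoding), and that class would be passed to ''job.setSortComparatorClass''. Decoding the ints as signed values, rather than comparing raw bytes lexicographically, keeps negative first elements ordered before positive ones.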
 + 
  
 ====== Grouping comparator ======
