Institute of Formal and Applied Linguistics Wiki

courses:mapreduce-tutorial:step-29 [2012/02/05 18:49] straka
courses:mapreduce-tutorial:step-29 [2012/02/05 18:54] straka
====== MapReduce Tutorial : Custom sorting and grouping comparators. ======
  
====== Custom sorting comparator ======
  
Before the keys are processed by a reducer, they are sorted using a
[[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/RawComparator.html|Raw comparator]]. The default comparator uses the ''compareTo'' method provided by the key type, which must implement [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/WritableComparable.html|WritableComparable]]. Consider, for example, the following ''IntPair'' type:
  
<code java>
...
</code>
If we want a Hadoop job to sort ''IntPair'' keys using only the first element, we can provide a ''RawComparator'' and set it using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#setSortComparatorClass(java.lang.Class)|job.setSortComparatorClass]]:
  
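<code java>
// Requires org.apache.hadoop.io.RawComparator and org.apache.hadoop.io.WritableComparator.
public static class IntPair implements WritableComparable<IntPair> {
  ...
  // Orders IntPair keys by their first element only; the second element is ignored.
  public static class FirstOnlyComparator implements RawComparator<IntPair> {
    // Compares two serialized keys directly, without deserializing them.
    // IntPair.write is assumed to write the first element first, so it
    // occupies the four bytes starting at offset s1 (resp. s2).
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
      int first1 = WritableComparator.readInt(b1, s1);
      int first2 = WritableComparator.readInt(b2, s2);
      return first1 < first2 ? -1 : first1 == first2 ? 0 : 1;
    }
    // Compares deserialized keys; must give the same order as the raw version above.
    public int compare(IntPair x, IntPair y) {
      return x.getFirst() < y.getFirst() ? -1 : x.getFirst() == y.getFirst() ? 0 : 1;
    }
  }
}

...

job.setSortComparatorClass(IntPair.FirstOnlyComparator.class);
</code>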
Notice that we used the helper function ''readInt'' from the [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/WritableComparator.html|WritableComparator]] class, which provides methods for parsing primitive data types from byte arrays.
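The following standalone sketch illustrates the same first-only raw comparison without a Hadoop dependency. It assumes that ''IntPair.write'' stores the first element before the second using ''DataOutput.writeInt'' (big-endian), and it defines its own ''readInt'' that mirrors what ''WritableComparator.readInt'' does. The class name ''FirstOnlyRawDemo'' and the helpers ''serialize'' and ''compareRaw'' are hypothetical, chosen for illustration only.

<code java>
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class FirstOnlyRawDemo {
  // Hypothetical stand-in for WritableComparator.readInt: decodes a big-endian
  // int starting at bytes[start], the byte order DataOutput.writeInt produces.
  public static int readInt(byte[] bytes, int start) {
    return ((bytes[start] & 0xff) << 24)
        | ((bytes[start + 1] & 0xff) << 16)
        | ((bytes[start + 2] & 0xff) << 8)
        | (bytes[start + 3] & 0xff);
  }

  // Serializes a pair as two ints, first element first (assumed IntPair.write layout).
  public static byte[] serialize(int first, int second) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buffer);
    try {
      out.writeInt(first);
      out.writeInt(second);
      out.flush();
    } catch (IOException e) {
      // Writing to an in-memory buffer does not fail in practice.
      throw new IllegalStateException(e);
    }
    return buffer.toByteArray();
  }

  // Raw comparison on the first element only, mirroring FirstOnlyComparator.
  public static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
    int first1 = readInt(b1, s1);
    int first2 = readInt(b2, s2);
    return first1 < first2 ? -1 : first1 == first2 ? 0 : 1;
  }

  public static void main(String[] args) {
    List<byte[]> keys = new ArrayList<byte[]>();
    keys.add(serialize(3, 1));
    keys.add(serialize(-5, 7));
    keys.add(serialize(3, 0));
    keys.add(serialize(1, 9));

    // Sort the serialized keys without ever deserializing them.
    Collections.sort(keys, new Comparator<byte[]>() {
      public int compare(byte[] x, byte[] y) {
        return compareRaw(x, 0, y, 0);
      }
    });

    // Decode both elements only for printing.
    for (byte[] key : keys) {
      System.out.println(readInt(key, 0) + " " + readInt(key, 4));
    }
  }
}
</code>

Running it should print the pairs in the order (-5, 7), (1, 9), (3, 1), (3, 0). Decoding the integer matters for negative values: with a plain unsigned byte-by-byte comparison of the serialized keys, -5 (whose first byte is 0xFF) would sort after 3. Because ''Collections.sort'' is stable, (3, 1) and (3, 0) keep their input order, since the comparator treats them as equal.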
  
====== Grouping comparator ======