[ Skip to the content ]

Institute of Formal and Applied Linguistics Wiki


[ Back to the navigation ]

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
Next revision Both sides next revision
courses:mapreduce-tutorial:step-29 [2012/01/28 20:17]
straka vytvořeno
courses:mapreduce-tutorial:step-29 [2012/02/05 18:57]
straka
Line 1: Line 1:
-====== MapReduce Tutorial : Running multiple Hadoop jobs ======+====== MapReduce Tutorial : Custom sorting and grouping comparators. ====== 
 + 
 +====== Custom sorting comparator ====== 
 + 
 +The keys are sorted before processed by a reducer, using a 
 +[[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/RawComparator.html|Raw comparator]]. The default comparator uses the ''compareTo'' method provided by the key type, which is a subclass of [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/WritableComparable.html|WritableComparable]]. Consider for example the following ''IntPair'' type: 
 + 
 +<code java> 
 +public static class IntPair implements WritableComparable<IntPair>
 +  private int first = 0; 
 +  private int second = 0; 
 + 
 +  public void set(int left, int right) { first = left; second = right; } 
 +  public int getFirst() { return first; } 
 +  public int getSecond() { return second; } 
 + 
 +  public void readFields(DataInput in) throws IOException { 
 +    first = in.readInt(); 
 +    second = in.readInt(); 
 +  } 
 +  public void write(DataOutput out) throws IOException { 
 +    out.writeInt(first); 
 +    out.writeInt(second); 
 +  } 
 + 
 +  public int compareTo(IntPair o) { 
 +    if (first != o.first) return first < o.first ? -1 : 1; 
 +    else return second < o.second ? -1 : second == o.second ? 0 : 1; 
 +  } 
 +
 +</code> 
 + 
 +If we would like in a Hadoop job to sort the ''IntPair'' using the first element only, we can provide a ''RawComparator'' and set it using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#setSortComparatorClass(java.lang.Class)|job.setSortComparatorClass]]: 
 + 
 +<code java> 
 +public static class IntPair implements WritableComparable<IntPair>
 +  ... 
 +  public static class FirstOnlyComparator implements RawComparator<IntPair>
 +    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) { 
 +      int first1 = WritableComparator.readInt(b1, s1); 
 +      int first2 = WritableComparator.readInt(b2, s2); 
 +      return first1 < first2 ? -1 : first1 == first2 ? 0 : 1; 
 +    } 
 +    public int compare(IntPair x, IntPair y) { 
 +      return x.getFirst() < y.getFirst() ? -1 : x.getFirst() == y.getFirst() ? 0 : 1; 
 +    } 
 +  } 
 +
 + 
 +... 
 + 
 +job.setSortComparatorClass(IntPair.FirstOnlyComparator.class); 
 +</code> 
 +Notice we used helper function ''readInt'' from [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/WritableComparator.html|WritableComparator]] class, which provides means of parsing primitive data types from byte streams. 
 + 
 +====== Grouping comparator ====== 
 + 
 +In a reduce, it is guaranteed that keys are processed in ascending order. Sometimes it would be useful if the //values associated with one key// could also be processed in ascending order. 
 + 
 +That is possible only to some degree. The (key, value) pairs are compared //using the key only//. After the (key, value) pairs are sorted, the (key, value) pairs with the same key are //grouped// together. This grouping can be performed using a custom ''RawComparator'' -- it is therefore possible to group the input pairs using //only a part of the keys//. 
 + 
 +---- 
 + 
 +<html> 
 +<table style="width:100%"> 
 +<tr> 
 +<td style="text-align:left; width: 33%; "></html>[[step-28|Step 28]]: Custom data types.<html></td> 
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> 
 +<td style="text-align:right; width: 33%; "></html>[[step-30|Step 30]]: Custom input formats.<html></td> 
 +</tr> 
 +</table> 
 +</html>
  

[ Back to the navigation ] [ Back to the content ]