Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-27 [2012/01/28 18:45]
straka
courses:mapreduce-tutorial:step-27 [2012/01/31 13:13]
straka
Line 24: Line 24:
  
   public void write(DataOutput out) throws IOException {   public void write(DataOutput out) throws IOException {
-    int highest_pos = 28; +    int mask_shift = 28; 
-    while (highest_pos > 0 && (value & (0x7F << highest_pos)) == 0) highest_pos -= 7; +    while (mask_shift > 0 && (value & (0x7F << mask_shift)) == 0) mask_shift -= 7; 
-    while (highest_pos > 0) { +    while (mask_shift > 0) { 
-      out.writeByte(0x80 | ((value >> highest_pos) & 0x7F)); +      out.writeByte(0x80 | ((value >> mask_shift) & 0x7F)); 
-      highest_pos -= 7;+      mask_shift -= 7;
     }     }
     out.writeByte(value & 0x7F);     out.writeByte(value & 0x7F);
Line 40: Line 40:
 } }
 </code> </code>
 +Remark: If the ''BERIntWritable'' class is not declared top-level, it must be declared **''static''**.
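
To make the byte layout produced by ''write'' concrete, the following stand-alone sketch copies the loop from the diff above and pairs it with a decoder. The class name ''BerIntSketch'', the helpers ''encode'' and ''decode'', and the decoder itself are assumptions made for illustration (the page's own ''readFields'' is not part of this diff); only ''java.io'' is used, so it runs without Hadoop. It assumes non-negative values. For example, 300 is written as the two bytes 0x82 0x2C.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-alone sketch, not the tutorial's BERIntWritable class.
class BerIntSketch {
  // Same loop as write() above: 7 bits per byte, most significant group first,
  // high bit set on every byte except the last. Assumes value >= 0.
  static void write(int value, DataOutput out) throws IOException {
    int mask_shift = 28;
    while (mask_shift > 0 && (value & (0x7F << mask_shift)) == 0) mask_shift -= 7;
    while (mask_shift > 0) {
      out.writeByte(0x80 | ((value >> mask_shift) & 0x7F));
      mask_shift -= 7;
    }
    out.writeByte(value & 0x7F);
  }

  // Assumed decoder: accumulate 7-bit groups until a byte without the high bit.
  static int read(DataInput in) throws IOException {
    int value = 0;
    byte next;
    while (((next = in.readByte()) & 0x80) != 0) {
      value = (value << 7) | (next & 0x7F);
    }
    return (value << 7) | next;
  }

  // Convenience wrapper: encode into an in-memory byte array.
  static byte[] encode(int value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      write(value, new DataOutputStream(bytes));
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Convenience wrapper: decode from an in-memory byte array.
  static int decode(byte[] encoded) {
    try {
      return read(new DataInputStream(new ByteArrayInputStream(encoded)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Small values take a single byte, while values that use the top four bits of a 32-bit ''int'' take five bytes.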
  
 Such an implementation can be used as a type of //values//. To use it as a type of //keys//, we need to implement [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/WritableComparable.html|WritableComparable]] instead of just ''Writable''. It is enough to add a ''compareTo'' method to the current implementation:
Line 55: Line 56:
 ===== PairWritable<A, B> ===== ===== PairWritable<A, B> =====
  
 +As another example, we implement a type consisting of two user-defined ''Writable'' implementations:
 +<code java>
 +public static class PairWritable<A extends Writable, B extends Writable > implements Writable {
 +  private A first;
 +  private B second;
 +
 +  public void readFields(DataInput in) throws IOException {
 +    first.readFields(in);
 +    second.readFields(in);
 +  }
 +
 +  public void write(DataOutput out) throws IOException {
 +    first.write(out);
 +    second.write(out);
 +  }
 +
 +  public A getFirst() { return first; }
 +  public B getSecond() { return second; }
 +  public void setFirst(A first) { this.first = first; }
 +  public void setSecond(B second) { this.second = second; }
 +  public String toString() { return String.format("%s %s", first.toString(), second.toString()); }
 +  public PairWritable(A first, B second) { this.first = first; this.second = second; }
 +}
 +</code>
 +Remark: If the ''PairWritable'' class is not declared top-level, it must be declared **''static''**.
 +
 +We did not define a ''compareTo'' method. To do so, the types ''A'' and ''B'' would have to implement ''WritableComparable'', and ''PairWritable'' could then no longer be used with types that do not provide ''compareTo''. Probably the best solution is to create a new type ''PairWritableComparable<A, B>'' that implements ''WritableComparable'':
 +<code java>
 +public static class PairWritableComparable<A extends WritableComparable, B extends WritableComparable > implements WritableComparable {
 +  private A first;
 +  private B second;
 +
 +  public void readFields(DataInput in) throws IOException {
 +    first.readFields(in);
 +    second.readFields(in);
 +  }
 +
 +  public void write(DataOutput out) throws IOException {
 +    first.write(out);
 +    second.write(out);
 +  }
 +
 +  public int compareTo(Object other) {
 +    PairWritableComparable<A, B> otherPair = (PairWritableComparable<A, B>) other;
 +    int cmpFirst = first.compareTo(otherPair.getFirst());
 +    if (cmpFirst < 0) return -1;
 +    if (cmpFirst > 0) return 1;
 +    return second.compareTo(otherPair.getSecond());
 +  }
 +
 +  public A getFirst() { return first; }
 +  public B getSecond() { return second; }
 +  public void setFirst(A first) { this.first = first; }
 +  public void setSecond(B second) { this.second = second; }
 +  public String toString() { return String.format("%s %s", first.toString(), second.toString()); }  
 +  public PairWritableComparable(A first, B second) { this.first = first; this.second = second; }
 +}
 +</code>
 +Remark: If the ''PairWritableComparable'' class is not declared top-level, it must be declared **''static''**.
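
The ordering defined by ''compareTo'' above is lexicographic: pairs are ordered by ''first'', and ''second'' only breaks ties. The sketch below illustrates that ordering under some assumptions: it uses plain ''java.lang.Comparable'' in place of ''WritableComparable'' so that it runs without Hadoop, it takes a typed parameter instead of casting from ''Object'', and the names ''PairOrderSketch'', ''Pair'' and ''sortedDemo'' are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-alone sketch of the lexicographic pair ordering.
class PairOrderSketch {
  static final class Pair<A extends Comparable<A>, B extends Comparable<B>>
      implements Comparable<Pair<A, B>> {
    final A first;
    final B second;

    Pair(A first, B second) { this.first = first; this.second = second; }

    // Compare by first; fall back to second only when the firsts are equal.
    @Override
    public int compareTo(Pair<A, B> other) {
      int cmpFirst = first.compareTo(other.first);
      if (cmpFirst != 0) return cmpFirst;
      return second.compareTo(other.second);
    }

    @Override
    public String toString() { return first + " " + second; }
  }

  // Sorts three sample pairs and returns their string forms in sorted order.
  static List<String> sortedDemo() {
    List<Pair<String, Integer>> pairs = new ArrayList<>();
    pairs.add(new Pair<>("b", 1));
    pairs.add(new Pair<>("a", 2));
    pairs.add(new Pair<>("a", 1));
    Collections.sort(pairs);

    List<String> result = new ArrayList<>();
    for (Pair<String, Integer> pair : pairs) result.add(pair.toString());
    return result;
  }
}
```

Sorting the three sample pairs places both ''a'' pairs before the ''b'' pair, with ''a 1'' ahead of ''a 2''.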
 +
 +===== Exercise =====
 +
 +Imagine you want to create an inverted index. In the index, for each word and document containing the word, all positions of the word in the document have to be stored.
 +
 +Create a type ''DocWithOccurences<Doctype extends WritableComparable>'' implementing ''WritableComparable''. The type:
 +  * stores a document of type ''Doctype''.
 +  * stores a list of positions of occurrence. A sequence of length //N// should be stored on disk as the number //N// followed by //N// numbers -- the positions of occurrence. The type ''BERIntWritable'' should be used.
 +  * is comparable, comparing using the ''Comparable'' interface of ''Doctype''.
 +  * has methods ''getDoc'', ''setDoc'', ''getOccurrences'', ''addOccurence'', ''toString''.
 +
 +Using this type, create an inverted index -- implement a Hadoop job that, for each word, creates a //sorted// list of ''DocWithOccurences<Text>'' containing the documents that contain this word, including the occurrences.
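
The on-disk layout required for the list of positions (the count //N// followed by //N// numbers) can be sketched as follows. This is only an illustration of the length-prefixed layout, not a solution to the exercise: it uses ''writeInt'' and ''readInt'' as a stand-in where the exercise asks for ''BERIntWritable'', and the names ''OccurrenceListSketch'', ''writePositions'', ''readPositions'' and ''roundTrip'' are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a length-prefixed list; fixed-width ints stand in
// for the BERIntWritable encoding that the exercise asks for.
class OccurrenceListSketch {
  // Write N, then the N positions.
  static void writePositions(List<Integer> positions, DataOutput out) throws IOException {
    out.writeInt(positions.size());
    for (int position : positions) out.writeInt(position);
  }

  // Read N, then exactly N positions.
  static List<Integer> readPositions(DataInput in) throws IOException {
    int count = in.readInt();
    List<Integer> positions = new ArrayList<>(count);
    for (int i = 0; i < count; i++) positions.add(in.readInt());
    return positions;
  }

  // Serialize to memory and read back, to check that the layout is self-describing.
  static List<Integer> roundTrip(List<Integer> positions) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writePositions(positions, new DataOutputStream(bytes));
      return readPositions(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the count comes first, the reader knows how many numbers to consume and needs no terminator value.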
 +
 +
 +
 +----
 +
 +<html>
 +<table style="width:100%">
 +<tr>
 +<td style="text-align:left; width: 33%; "></html>[[step-26|Step 26]]: Compression and job configuration.<html></td>
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
 +<td style="text-align:right; width: 33%; "></html>[[step-28|Step 28]]: Running multiple Hadoop jobs in one source file.<html></td>
 +</tr>
 +</table>
 +</html>
