
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


courses:mapreduce-tutorial:step-27 [2012/01/28 19:04]
straka
courses:mapreduce-tutorial:step-27 [2012/01/28 20:14]
straka
Line 76: Line 76:
   public void setFirst(A first) { this.first = first; }
   public void setSecond(B second) { this.second = second; }
 +  public String toString() { return String.format("%s %s", first.toString(), second.toString()); }
   public PairWritable(A first, B second) { this.first = first; this.second = second; }
 }
Line 109: Line 110:
   public void setFirst(A first) { this.first = first; }
   public void setSecond(B second) { this.second = second; }
 +  public String toString() { return String.format("%s %s", first.toString(), second.toString()); }
   public PairWritableComparable(A first, B second) { this.first = first; this.second = second; }
 }
Line 114: Line 116:
 Remark: If the ''PairWritableComparable'' class is not declared top-level, it must be declared **''static''**.
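
One likely reason for the remark about ''static'' is that Hadoop creates ''Writable'' instances reflectively through a no-argument constructor, and a non-static inner class has no such constructor because every constructor implicitly takes the enclosing instance. The following is a minimal sketch in plain Java without Hadoop; the class names ''NestedWritableDemo'', ''InnerPair'' and ''StaticPair'' are hypothetical and only illustrate the reflection behaviour.

```java
import java.lang.reflect.Constructor;

class NestedWritableDemo {
    // Non-static inner class: its constructor implicitly takes a NestedWritableDemo.
    class InnerPair {
        InnerPair() {}
    }

    // Static nested class: it has a genuine no-argument constructor.
    static class StaticPair {
        StaticPair() {}
    }

    /** Returns true if cls can be instantiated reflectively without arguments. */
    static boolean canInstantiateWithoutArgs(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // For InnerPair the lookup fails with NoSuchMethodException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("static nested: " + canInstantiateWithoutArgs(StaticPair.class));
        System.out.println("inner:         " + canInstantiateWithoutArgs(InnerPair.class));
    }
}
```

A framework that instantiates the class in this way can therefore work with the static nested variant but not with the inner one.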
  
 +===== Exercise =====
 +
 +Imagine you want to create an inverted index. For each word and each document containing it, the index must store all positions at which the word occurs in that document.
 +
 +Create a type ''DocWithOccurences<Doctype extends WritableComparable>'' implementing ''WritableComparable''. The type:
 +  * stores a document of type ''Doctype''.
 +  * stores a list of positions of occurrence. A list of length //N// should be stored on disk as the number //N// followed by //N// numbers -- the positions of occurrence. The type ''BERIntWritable'' should be used.
 +  * is comparable, with the comparison delegated to the ''Comparable'' interface of ''Doctype''.
 +  * has methods ''getDoc'', ''setDoc'', ''getOccurrences'', ''addOccurence'', ''toString''.
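
The on-disk layout described above (the count //N// followed by the //N// positions) can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class ''OccurrenceListCodec'' and its methods are hypothetical, and fixed-width ''writeInt''/''readInt'' calls stand in for ''BERIntWritable'', whose variable-length encoding is not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OccurrenceListCodec {
    /** Writes the list length N, then the N positions. */
    static void write(DataOutput out, List<Integer> positions) throws IOException {
        out.writeInt(positions.size());   // stand-in for a BERIntWritable holding N
        for (int position : positions) {
            out.writeInt(position);       // stand-in for a BERIntWritable per position
        }
    }

    /** Reads N, then exactly N positions. */
    static List<Integer> read(DataInput in) throws IOException {
        int n = in.readInt();
        List<Integer> positions = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            positions.add(in.readInt());
        }
        return positions;
    }

    /** Serializes the list to a byte array. */
    static byte[] encode(List<Integer> positions) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), positions);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deserializes a list from a byte array produced by encode. */
    static List<Integer> decode(byte[] data) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Integer> positions = Arrays.asList(3, 17, 42);
        System.out.println(decode(encode(positions)));
    }
}
```

In the actual ''DocWithOccurences'' type, the same two loops would sit in ''write'' and ''readFields'', after the document itself has been written or read.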
 +
 +Using this type, create an inverted index -- implement a Hadoop job that for each word creates a //sorted// list of ''DocWithOccurences<Text>'' holding the documents that contain the word, together with the occurrences.
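
To make the expected result concrete, the following in-memory sketch builds the same kind of index without Hadoop. It is not the job itself: the class ''InvertedIndexSketch'' is hypothetical, documents are identified by plain strings, words are split on whitespace, and a position is assumed to be the zero-based index of the word in the document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class InvertedIndexSketch {
    /**
     * Maps each word to its documents (sorted by document name),
     * and each document to the positions of the word in it.
     */
    static SortedMap<String, SortedMap<String, List<Integer>>> build(Map<String, String> docs) {
        SortedMap<String, SortedMap<String, List<Integer>>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            String[] words = doc.getValue().trim().split("\\s+");
            for (int position = 0; position < words.length; position++) {
                if (words[position].isEmpty()) {
                    continue; // an empty document yields one empty token
                }
                index.computeIfAbsent(words[position], w -> new TreeMap<>())
                     .computeIfAbsent(doc.getKey(), d -> new ArrayList<>())
                     .add(position);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        docs.put("d2", "b a");
        docs.put("d1", "a b a");
        // Prints {a={d1=[0, 2], d2=[1]}, b={d1=[1], d2=[0]}}
        System.out.println(build(docs));
    }
}
```

In the Hadoop job, the grouping by word would be done by the framework between the map and reduce phases, and each inner entry would correspond to one ''DocWithOccurences<Text>'' value.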
