courses:mapreduce-tutorial:step-28 [2012/01/30 15:44] majlis |
courses:mapreduce-tutorial:step-28 [2012/01/31 13:10] straka |
| |
===== Exercise 1 ===== | ===== Exercise 1 ===== |
| Improve the [[.:step-25#exercise|sorting exercise]] to handle a [[.:step-13#nonuniform-data|nonuniform key distribution]]. As in the [[.:step-13#nonuniform-data|Perl solution]], run two Hadoop jobs (using one Java source file): the first samples the input and creates the separators, the second does the actual sorting. |
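The sample-then-sort idea can be illustrated without Hadoop. The following is only a minimal sketch in plain Java, not the tutorial's solution: the class ''SeparatorSketch'' and both method names are hypothetical, the sample is assumed to fit in memory, and keys are assumed to be compared as strings. It picks separators from a sample so that each range receives roughly the same number of sampled keys, then maps a key to its range the way a partitioner in the second job might.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; plain Java, no Hadoop dependency.
class SeparatorSketch {
    // Choose (parts - 1) separators from the sample so that each of the
    // `parts` ranges holds roughly the same number of sampled keys.
    static String[] chooseSeparators(String[] sample, int parts) {
        String[] sorted = sample.clone();
        Arrays.sort(sorted);
        String[] separators = new String[parts - 1];
        for (int i = 1; i < parts; i++) {
            // Take the key at the i-th quantile of the sorted sample.
            separators[i - 1] = sorted[(int) ((long) i * sorted.length / parts)];
        }
        return separators;
    }

    // Range index for a key: the number of separators strictly smaller than it.
    // Keys equal to a separator go to the lower range.
    static int partitionFor(String key, String[] separators) {
        int part = 0;
        while (part < separators.length && separators[part].compareTo(key) < 0) {
            part++;
        }
        return part;
    }

    public static void main(String[] args) {
        String[] sample = {"d", "a", "c", "b", "a", "a", "b", "a"};
        String[] separators = chooseSeparators(sample, 4);
        System.out.println(Arrays.toString(separators));
        for (String key : new String[] {"a", "bb", "d"}) {
            System.out.println(key + " -> " + partitionFor(key, separators));
        }
    }
}
```

In a real solution the first job would produce the sample and the second job's partitioner would consult the separators; how the separators are passed between the jobs is left open here.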
| |
Improve the last [[.:step-27#exercise|inverted index creation exercise]], such that | ===== Exercise 2 ===== |
| |
| Improve the [[.:step-27#exercise|inverted index creation exercise]], such that |
- in the first job, create a list of unique document names. Number the documents using the order in this list. | - in the first job, create a list of unique document names. Number the documents using the order in this list. |
  - in the second job, create for each word a sorted list of ''DocWithOccurences<IntWritable>'', where the document is identified by its number (contrary to the previous exercise, where ''Text'' was used to identify the document). | - in the second job, create for each word a sorted list of ''DocWithOccurences<IntWritable>'', where the document is identified by its number (contrary to the previous exercise, where ''Text'' was used to identify the document). |
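The numbering step of the first job can be sketched in plain Java. This is an illustration under assumptions, not the tutorial's code: the class ''DocumentNumberingSketch'' is hypothetical, the list of unique names is assumed to be sorted, and numbering is assumed to start at 0.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper for illustration only; plain Java, no Hadoop dependency.
class DocumentNumberingSketch {
    // Deduplicate the document names, sort them, and number them by position.
    static Map<String, Integer> numberDocuments(List<String> names) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String name : new TreeSet<>(names)) {
            // ids.size() is read before the new entry is added: 0, 1, 2, ...
            ids.put(name, ids.size());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("doc-b", "doc-a", "doc-b", "doc-c");
        System.out.println(numberDocuments(names));
    }
}
```

The second job could then look up each document name in such a map and emit the number in place of the ''Text'' name.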
| |
===== Exercise 2 ===== | ===== Exercise 3 ===== |
| |
Implement the [[.:step-15|K-means clustering exercise]] in Java. Instead of a controlling script, use the Java class itself to execute the Hadoop job as many times as necessary. | Implement the [[.:step-15|K-means clustering exercise]] in Java. Instead of a controlling script, use the Java class itself to execute the Hadoop job as many times as necessary. |
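The control flow of such a self-driving class can be sketched in plain Java. This is a minimal sketch under assumptions, not the tutorial's solution: ''KMeansDriverSketch'', ''runIteration'' and ''cluster'' are hypothetical names, ''runIteration'' stands in for submitting one Hadoop job, points are one-dimensional for brevity, and the stopping rule (centers move by at most ''epsilon'', or ''maxIterations'' is reached) is an assumption, since the exercise only says "as many times as necessary".

```java
// Hypothetical driver for illustration only; plain Java, no Hadoop dependency.
class KMeansDriverSketch {
    // One K-means pass: assign each point to its nearest center and return
    // the mean of every cluster. Stands in for a single Hadoop job.
    static double[] runIteration(double[] points, double[] centers) {
        double[] sums = new double[centers.length];
        int[] counts = new int[centers.length];
        for (double p : points) {
            int best = 0;
            for (int c = 1; c < centers.length; c++) {
                if (Math.abs(p - centers[c]) < Math.abs(p - centers[best])) {
                    best = c;
                }
            }
            sums[best] += p;
            counts[best]++;
        }
        double[] next = new double[centers.length];
        for (int c = 0; c < centers.length; c++) {
            // An empty cluster keeps its previous center.
            next[c] = counts[c] == 0 ? centers[c] : sums[c] / counts[c];
        }
        return next;
    }

    // Repeat the pass until no center moves by more than epsilon,
    // or until maxIterations passes have been run.
    static double[] cluster(double[] points, double[] initial, int maxIterations, double epsilon) {
        double[] centers = initial.clone();
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            double[] next = runIteration(points, centers);
            double moved = 0;
            for (int c = 0; c < centers.length; c++) {
                moved = Math.max(moved, Math.abs(next[c] - centers[c]));
            }
            centers = next;
            if (moved <= epsilon) {
                break;
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12};
        double[] centers = cluster(points, new double[] {1, 2}, 100, 1e-9);
        System.out.println(java.util.Arrays.toString(centers));
    }
}
```

In the Hadoop version, each call to ''runIteration'' would correspond to configuring and running a fresh job whose input includes the centers written by the previous one.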
<td style="text-align:left; width: 33%; "></html>[[step-27|Step 27]]: Custom data types.<html></td> | <td style="text-align:left; width: 33%; "></html>[[step-27|Step 27]]: Custom data types.<html></td> |
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> | <td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> |
<td style="text-align:right; width: 33%; "></html>[[step-29|Step 29]]: Custom input formats.<html></td> | <td style="text-align:right; width: 33%; "></html>[[step-29|Step 29]]: Custom sorting and grouping comparators.<html></td> |
</tr> | </tr> |
</table> | </table> |
</html> | </html> |
| |