Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-15 [2012/01/25 22:20]
straka
courses:mapreduce-tutorial:step-15 [2012/01/29 16:40] (current)
straka
Line 1: Line 1:
 ====== MapReduce Tutorial : K-means clustering ======
 +
 +Implement the [[http://en.wikipedia.org/wiki/K-means_clustering#Standard_algorithm|K-means clustering algorithm]]. You can use the following data:
 +^ Path ^ Number of points ^ Number of dimensions ^ Number of clusters ^
 +| ''/net/projects/hadoop/examples/inputs/points-small'' | 10000 | 50 | 50 |
 +| ''/net/projects/hadoop/examples/inputs/points-medium'' | 100000 | 100 | 100 |
 +| ''/net/projects/hadoop/examples/inputs/points-large'' | 500000 | 200 | 200 |
 +
 +Iterative algorithms are usually implemented with one Hadoop job per iteration. The Hadoop ''input_path'' should contain the input points, and each mapper should additionally read the current cluster centers. The reducers aggregate the points assigned to each cluster and output the new cluster centers. A controlling script runs the Hadoop jobs one after another and stops iterating once the algorithm converges.
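
The job structure described above can be sketched in plain Python. This is only an in-memory illustration of the map, group and reduce phases, not Hadoop code: all function names (''map_point'', ''reduce_cluster'', ''run_iteration'', ''kmeans'') are hypothetical, and the convergence test (no center moves by more than ''tolerance'') is an assumption, since the task does not prescribe one.

```python
import math
from collections import defaultdict


def nearest_center(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for index, center in enumerate(centers):
        dist = sum((p - c) ** 2 for p, c in zip(point, center))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def map_point(point, centers):
    """Mapper: emit (cluster index, (point, 1)) for one input point."""
    return nearest_center(point, centers), (point, 1)


def reduce_cluster(values):
    """Reducer: average all points assigned to one cluster."""
    total, count = None, 0
    for point, n in values:
        if total is None:
            total = list(point)
        else:
            total = [t + p for t, p in zip(total, point)]
        count += n
    return [t / count for t in total]


def run_iteration(points, centers):
    """Simulate one job in memory: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for point in points:
        key, value = map_point(point, centers)
        groups[key].append(value)
    # Assumption: a cluster that received no points keeps its previous center.
    return [reduce_cluster(groups[i]) if i in groups else list(centers[i])
            for i in range(len(centers))]


def kmeans(points, centers, tolerance=1e-6, max_iterations=100):
    """Controlling loop: rerun the job until no center moves more than tolerance."""
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        new_centers = run_iteration(points, centers)
        shift = max(math.dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift <= tolerance:
            break
    return centers, iteration
```

For example, ''kmeans([[0, 0], [0, 2], [10, 10], [10, 12]], [[0, 0], [10, 10]])'' returns the centers together with the number of iterations performed. In a real Hadoop setting, ''run_iteration'' would correspond to submitting one job, with the current centers read by every mapper from a separate file.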
 +
 +----
 +
 +<html>
 +<table style="width:100%">
 +<tr>
 +<td style="text-align:left; width: 33%; "></html>[[step-14|Step 14]]: N-gram language model.<html></td>
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
 +<td style="text-align:right; width: 33%; "></html><html></td>
 +</tr>
 +</table>
 +</html>
