Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-15 [2012/01/25 15:46]
straka created
+++ courses:mapreduce-tutorial:step-15 [2012/01/26 00:11]
straka
@@ Line 1: / Line 1: @@
-====== MapReduce Tutorial :  ======
+====== MapReduce Tutorial : K-means clustering ======
+Implement the [[http://en.wikipedia.org/wiki/K-means_clustering#Standard_algorithm|K-means clustering algorithm]]. You can use the following data:
+^ Path ^ Number of points ^ Number of dimensions ^ Number of clusters ^
+| ''/home/straka/hadoop/example-inputs/points-small'' | 10000 | 50 | 50 |
+| ''/home/straka/hadoop/example-inputs/points-medium'' | 100000 | 100 | 100 |
+| ''/home/straka/hadoop/example-inputs/points-large'' | 500000 | 200 | 200 |
+When dealing with iterative algorithms, each iteration is usually implemented as one Hadoop job. The job's input_path contains the input data, and each mapper additionally reads the current cluster centers. The reducers aggregate the data and output the new cluster centers. A controlling script executes the Hadoop jobs and stops iterating once the algorithm converges.
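
The iteration scheme described above can be sketched roughly as follows. This is a plain-Python simulation, not Hadoop code: the function names (map_point, reduce_cluster, run_iteration, kmeans), the convergence tolerance, and the choice to keep the old center for a cluster that receives no points are all assumptions made for illustration. In a real setup, run_iteration would correspond to one Hadoop job and kmeans to the controlling script.

```python
import math
from collections import defaultdict


def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])))


def map_point(point, centers):
    """Mapper (hypothetical): emit (cluster index, (point, 1)) for one input point."""
    return nearest(point, centers), (point, 1)


def reduce_cluster(values):
    """Reducer (hypothetical): average all points assigned to one cluster."""
    dims = len(values[0][0])
    total = [0.0] * dims
    count = 0
    for point, n in values:
        for d in range(dims):
            total[d] += point[d]
        count += n
    return [t / count for t in total]


def run_iteration(points, centers):
    """Simulate one job: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for point in points:
        key, value = map_point(point, centers)
        groups[key].append(value)
    # Assumption: a cluster with no assigned points keeps its previous center.
    return [reduce_cluster(groups[i]) if i in groups else centers[i]
            for i in range(len(centers))]


def kmeans(points, centers, tolerance=1e-9, max_iterations=100):
    """Controlling loop: rerun the job until no center moves more than tolerance."""
    for _ in range(max_iterations):
        new_centers = run_iteration(points, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tolerance:
            break
    return centers
```

Emitting (point, 1) pairs rather than bare points leaves room for a combiner that pre-sums points on the mapper side, which may reduce the data shuffled to the reducers; whether that pays off depends on the data and the cluster.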
