
Institute of Formal and Applied Linguistics Wiki


courses:mapreduce-tutorial:step-15 [2012/01/26 23:19] straka
Implement the [[http://en.wikipedia.org/wiki/K-means_clustering#Standard_algorithm|K-means clustering algorithm]]. You can use the following data:

^ Path ^ Number of points ^ Number of dimensions ^ Number of clusters ^
| ''/net/projects/hadoop/examples/inputs/points-small'' | 10000 | 50 | 50 |
| ''/net/projects/hadoop/examples/inputs/points-medium'' | 100000 | 100 | 100 |
| ''/net/projects/hadoop/examples/inputs/points-large'' | 500000 | 200 | 200 |
  
When dealing with iterative algorithms, each iteration is usually implemented as one Hadoop job. The Hadoop input path contains the input data, and each mapper also reads the current clusters. The reducers aggregate the data and output new cluster centers. A controlling script takes care of executing the Hadoop jobs and stops the iteration when the algorithm converges.
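The structure described above can be sketched locally. The following Python sketch (the function names and the convergence tolerance are illustrative assumptions, not part of the tutorial) shows one K-means iteration split into a mapper, which assigns each point to its nearest center, and a reducer, which averages the points of each cluster; a small driver loop plays the role of the controlling script, rerunning the "job" until the centers stop moving:

```python
# Local sketch of iterative K-means in MapReduce style.
# Function names (kmeans_map, kmeans_reduce, ...) are illustrative only.
import math
from collections import defaultdict

def nearest_cluster(point, clusters):
    """Return the index of the cluster center closest to the point."""
    return min(range(len(clusters)),
               key=lambda i: math.dist(point, clusters[i]))

def kmeans_map(point, clusters):
    """Mapper: emit (cluster index, point) for the nearest center."""
    yield nearest_cluster(point, clusters), point

def kmeans_reduce(index, points):
    """Reducer: emit the new center as the mean of the assigned points."""
    dim = len(points[0])
    yield index, [sum(p[d] for p in points) / len(points) for d in range(dim)]

def kmeans_iteration(points, clusters):
    """One Hadoop-job-like iteration: map all points, reduce per cluster."""
    groups = defaultdict(list)
    for point in points:
        for idx, p in kmeans_map(point, clusters):
            groups[idx].append(p)
    new_clusters = list(clusters)
    for idx, assigned in groups.items():
        for _, center in kmeans_reduce(idx, assigned):
            new_clusters[idx] = center
    return new_clusters

def kmeans(points, clusters, tol=1e-6, max_iter=100):
    """Controlling loop: iterate until centers move less than tol."""
    for _ in range(max_iter):
        new_clusters = kmeans_iteration(points, clusters)
        shift = max(math.dist(a, b) for a, b in zip(clusters, new_clusters))
        clusters = new_clusters
        if shift < tol:
            break
    return clusters
```

In the Hadoop setting, the `kmeans` loop would instead be a shell or driver script: each call to `kmeans_iteration` becomes a job submission whose output directory holds the new centers, which the next job's mappers read as side input.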
  
