The K-means algorithm is an iterative algorithm that tries to partition
a dataset into K predefined, distinct, non-overlapping subgroups
(clusters), where each data point belongs to exactly one group. Read
more about K-means on Wikipedia.
How to run the Visualizer
Specify the number of data points and the number of clusters.
Press the New button to generate new data and clusters.
Press the Start button to start the visualizer.
Press the Restart button to start the visualization from the beginning.
Press the Stop button to stop the visualizer.
Understanding The Algorithm
A centroid is the center of a cluster.
Initially, the exact centers of the clusters are unknown, so we select
random data points and define them as the centroids for each cluster.
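The random initialization described above can be sketched in a few lines of Python. This is only an illustration, not the visualizer's own code: the function name init_centroids, the seed parameter, and the sample points are all assumptions made for the example.

```python
import random


def init_centroids(points, k, seed=None):
    """Pick k distinct data points at random to serve as initial centroids.

    `seed` is a hypothetical convenience parameter so runs can be repeated.
    """
    rng = random.Random(seed)
    # random.sample draws without replacement, so no point is chosen twice.
    return rng.sample(points, k)


# Made-up 2-D data points, used only for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (5.0, 1.0)]
centroids = init_centroids(points, k=2, seed=0)
```

Sampling without replacement keeps two clusters from starting at the same position, which would leave one of them empty after the first assignment.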
Now that the centroids are initialized, the next step is to
assign each data point X_n to its closest
cluster centroid C_k.
In this step, we first calculate the distance between data point X
and centroid C using the Euclidean distance metric.
Then we assign each data point to the cluster whose centroid is at
the minimum distance from it.
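As a rough sketch of this assignment step, the snippet below computes the Euclidean distance from each point to every centroid and keeps the index of the nearest one. The function name assign_clusters and the sample coordinates are assumptions chosen for the example.

```python
import math


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        # math.dist computes the Euclidean distance between two points.
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


# Made-up data: two points near (1, 1) and two near (9, 9).
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
labels = assign_clusters(points, centroids)
print(labels)  # [0, 0, 1, 1]
```

If a point is equally close to two centroids, this sketch picks the one with the lower index. That tie-breaking rule is a choice made here, not something the description above specifies.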
Next, we update the centroids by calculating the average of all
data points in each cluster.
That is, for each cluster, the new centroid is the mean of all the
data points assigned to that cluster.
The position of the centroid is then moved to this newly calculated
mean position.
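The update step might look like the following sketch. The function name update_centroids is hypothetical, and so is the handling of a cluster that ends up with no points: here it simply keeps its previous centroid, which is an assumption the text above does not cover.

```python
def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        # Average each coordinate separately across the cluster's members.
        mean = tuple(sum(axis) / len(members) for axis in zip(*members))
        new_centroids.append(mean)
    return new_centroids


# Made-up data and assignments, used only for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
labels = [0, 0, 1, 1]
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_centroids = update_centroids(points, labels, centroids)
print(new_centroids)  # [(1.25, 1.5), (8.5, 8.75)]
```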
We keep repeating the assignment and update stages (stages 3 and 4)
until the centroids stop moving and the assignments of data points
to clusters no longer change.
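Putting the stages together, the whole loop can be sketched as below. This is a minimal illustration under stated assumptions, not the visualizer's implementation: the function name k_means, the max_iters safety cap, the seed parameter, and the rule that an empty cluster keeps its old centroid are all introduced here for the example.

```python
import math
import random


def k_means(points, k, max_iters=100, seed=None):
    """Run a basic K-means loop and return (centroids, labels)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):  # assumed cap in case convergence is slow
        # Assignment: label each point with the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # assumption: an empty cluster keeps its centroid
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


# Made-up data with two visible groups, used only for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (9.0, 9.0), (9.2, 8.8), (8.8, 9.1)]
centroids, labels = k_means(points, k=2, seed=0)
```

The result depends on which points are picked at the start, so different seeds can lead to different final clusters. The loop only guarantees that the assignments have stopped changing, not that the clustering is the best possible one.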