K-Means Algorithm Visualizer

~Shamika Redkar

Enter the number of data points:

Enter the number of clusters:

Instructions

The K-means algorithm is an iterative algorithm that tries to partition a dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to exactly one cluster. Read more about K-means on Wikipedia.

How to run the Visualizer

Specify the number of data points and the number of clusters.
Press the New button to generate new data points and clusters.
Press the Start button to start the visualizer.
Press the Restart button to restart the visualization from the beginning.
Press the Stop button to stop the visualizer.

Understanding The Algorithm

Stage 1: Initialization
Stage 2: Assignment
Stage 3: Update
Stage 4: Convergence

A centroid is the center of a cluster.
Initially, the exact centers of the clusters are unknown,
so we select random data points and define them as the initial centroids, one for each cluster.
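
As a minimal sketch of this initialization step, the snippet below picks K distinct data points at random and uses them as the starting centroids. The function name init_centroids, the seed parameter, and the sample points are illustrative assumptions, not part of the visualizer itself.

```python
import random

def init_centroids(points, k, seed=None):
    """Pick k distinct data points at random to serve as initial centroids.

    `seed` is a hypothetical convenience parameter so runs can be repeated.
    """
    rng = random.Random(seed)
    # sample() draws without replacement, so no point is chosen twice
    return rng.sample(points, k)

# Illustrative 2-D data points (made up for this example)
points = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (8.0, 8.0), (1.0, 0.6), (9.0, 11.0)]
centroids = init_centroids(points, k=2, seed=0)
print(centroids)
```

Sampling without replacement avoids starting two clusters at the same position, which would leave one of them empty after the first assignment.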

Now that the centroids are initialized, the next step is to assign each data point X_n to its closest cluster centroid C_k.
In this step, we first calculate the distance between each data point X and each centroid C using the Euclidean distance metric.
Then we assign each data point to the cluster whose centroid is at the minimum distance from it.
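
The assignment step can be sketched as follows, assuming points and centroids are stored as coordinate tuples. The function name assign_clusters and the sample data are hypothetical; math.dist (Python 3.8+) computes the Euclidean distance.

```python
import math

def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        # Euclidean distance from this point to every centroid
        distances = [math.dist(p, c) for c in centroids]
        # The cluster is the one whose centroid is at minimum distance
        labels.append(distances.index(min(distances)))
    return labels

# Illustrative data: two points near (1, 1) and two near (9, 9)
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
labels = assign_clusters(points, centroids)
print(labels)  # [0, 0, 1, 1]
```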

Next, we update the centroids.
For each cluster, the new centroid is calculated by taking the average (mean) of all the data points assigned to that cluster.
The position of the centroid is then updated to this newly calculated mean position.
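
A minimal sketch of the update step is shown below. The function name update_centroids is hypothetical, and keeping the old centroid when a cluster has no points is an assumption made here for illustration; the text above does not say how empty clusters are handled.

```python
def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid
            new_centroids.append(old)
            continue
        # Average each coordinate separately across the cluster's members
        mean = tuple(sum(coord) / len(members) for coord in zip(*members))
        new_centroids.append(mean)
    return new_centroids

# Illustrative data with a known assignment
points = [(1.0, 1.0), (3.0, 1.0), (8.0, 9.0), (10.0, 9.0)]
labels = [0, 0, 1, 1]
centroids = [(1.0, 1.0), (10.0, 9.0)]
new_centroids = update_centroids(points, labels, centroids)
print(new_centroids)  # [(2.0, 1.0), (9.0, 9.0)]
```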

We keep repeating stages 2 and 3 (assignment and update) until the centroids stop moving and the assignment of data points to clusters no longer changes. At that point the algorithm has converged.
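
Putting the stages together, the loop below is one possible sketch of the whole procedure: initialize, then alternate assignment and update until the assignments stop changing. The function name k_means, the max_iter safety cap, the seed parameter, and the sample data are assumptions for illustration and are not taken from the visualizer.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=None):
    """Run K-means until assignments stop changing (or max_iter is hit)."""
    rng = random.Random(seed)
    # Stage 1: initialization with k random data points
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Stage 2: assign every point to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stage 4: converged when no assignment changed since last pass
        if new_labels == labels:
            break
        labels = new_labels
        # Stage 3: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # assumption: empty clusters keep their centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Illustrative data: two well-separated groups of three points each
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2, seed=1)
print(centroids, labels)
```

Because the starting centroids are random, K-means can settle on different results from different starts, and in general it reaches a local optimum rather than a guaranteed global one. The max_iter cap is only a safeguard against very long runs.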