A deep dive into partitioning around medoids

Series: Kmeans and Its Variants

In this final article in my mini-series on k-means and its variants, I will talk about the k-medoids algorithm, also commonly called partitioning around medoids (PAM). Its appeal is that it is essentially deterministic and reliably finds very good solutions.
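
Below is a minimal sketch of the two PAM phases in Python with NumPy. It assumes Euclidean distance and a data set small enough for a full pairwise distance matrix. The function name `pam` and its parameters are hypothetical, and the sketch recomputes the cost naively for every candidate swap instead of using the incremental updates an efficient implementation would use. BUILD greedily picks k starting medoids. SWAP then repeatedly applies the best medoid/non-medoid exchange until no exchange lowers the total distance. For the same input it always returns the same medoids, because ties are broken by index order.

```python
import numpy as np


def pam(X, k, max_iter=100):
    """Hypothetical PAM sketch: greedy BUILD, then best-improvement SWAP."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Full Euclidean distance matrix; O(n^2) memory, so only for small n.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))

    def total_cost(meds):
        # Sum over all points of the distance to their nearest medoid.
        return D[:, meds].min(axis=1).sum()

    # BUILD: start with the most central point, then greedily add the
    # point that lowers the total cost the most.
    medoids = [int(D.sum(axis=1).argmin())]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(medoids + [c])))

    # SWAP: evaluate every (medoid, non-medoid) exchange, apply the best
    # one, and stop as soon as no exchange lowers the cost.
    cost = total_cost(medoids)
    for _ in range(max_iter):
        best_cost, best_meds = cost, None
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:i] + [h] + medoids[i + 1:]
                trial_cost = total_cost(trial)
                if trial_cost < best_cost - 1e-12:
                    best_cost, best_meds = trial_cost, trial
        if best_meds is None:
            break
        medoids, cost = best_meds, best_cost

    # Assign each point to its nearest medoid (label = position in medoids).
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels, cost
```

For example, on the points `(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)` with `k=2`, this sketch picks `(10, 10)` and `(0, 0)` as medoids with a total cost of 4.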

How to cluster noisy data sets

Series: Kmeans and Its Variants

Real-world data sets often contain many outliers that you might not be able to remove completely during the data cleanup phase. If you have run into this problem, I want to introduce you to the k-medians algorithm. Because it uses the median instead of the mean, and the more robust Manhattan (L1) distance instead of the squared Euclidean distance, it is much less sensitive to outliers.
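
A minimal Lloyd-style sketch of k-medians with NumPy, assuming the initial centers are passed in explicitly (the function `kmedians` and its parameters are hypothetical). Each point is assigned to the center with the smallest Manhattan distance, and each center is then moved to the coordinate-wise median of its members.

```python
import numpy as np


def kmedians(X, init_centers, max_iter=100):
    """Hypothetical k-medians sketch with L1 assignment and median update."""
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)
    for _ in range(max_iter):
        # Manhattan (L1) distance from every point to every center.
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if the cluster is empty
                new_centers[j] = np.median(members, axis=0)
        if np.allclose(new_centers, centers):
            break  # assignments and centers have stabilized
        centers = new_centers
    return centers, labels
```

With `X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [200, 200]]` and initial centers `[[0, 0], [10, 10]]`, the second center settles at `(10.5, 10.5)` even though the outlier `(200, 200)` belongs to its cluster, whereas the mean of that cluster would be `(57.75, 57.75)`.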

The k-means++ algorithm to kick-start your initialization

Series: Kmeans and Its Variants

k-means is a very simple and ubiquitous clustering algorithm. But it often fails on real problems, for example because of a poor initialization. Fortunately, an improved initialization method, k-means++, can help alleviate this problem by choosing initial centers that are spread out across the data.
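
A minimal sketch of k-means++ seeding with NumPy: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to its squared distance to the nearest center already chosen. The function name `kmeans_pp_init` and its `rng` parameter are hypothetical; its output would then be handed to the ordinary k-means (Lloyd) iterations.

```python
import numpy as np


def kmeans_pp_init(X, k, rng=None):
    """Hypothetical k-means++ seeding sketch; returns k rows of X."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n = len(X)
    # First center: a uniformly random data point.
    first = rng.integers(n)
    centers = [X[first]]
    # Squared distance from each point to its nearest chosen center.
    d2 = ((X - X[first]) ** 2).sum(axis=1)
    for _ in range(1, k):
        total = d2.sum()
        if total == 0:
            raise ValueError("data has fewer than k distinct points")
        # Far-away points are proportionally more likely to be picked.
        idx = rng.choice(n, p=d2 / total)
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers)
```

Because an already chosen point has distance zero, it can never be drawn again, so on three distinct points with `k=3` the function returns all three in some order.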