"KMedoids" (Machine Learning Method)

Details & Suboptions

  • The "KMedoids" method, also known as Partitioning Around Medoids (PAM), is a simple and fast centroid-based method. "KMedoids" works well when clusters have similar sizes and are locally distributed around their centroids (a.k.a. medoids). When clusters have very different sizes, are intertwined, or when outliers are present, "KMedoids" is likely to give poor results.
  • The following plots show the results of the "KMedoids" method applied to toy datasets:
  • The "KMedoids" method aims to find k medoids defining k clusters. Each data point is assigned to its nearest medoid. The points assigned to a given medoid form a cluster.
  • The procedure to find the best k medoids is the same as for "KMeans", except that the medoids are not defined as the mean of a cluster. Instead, a cluster medoid is defined as the data point in the cluster that is the most central, that is, the data point whose average distance to the other points in the cluster is minimal. Because "KMedoids" does not compute means like "KMeans", it can be used in non-numeric spaces (a distance function is sufficient).
  • Since the initial centroids are chosen randomly, results might differ from one evaluation to the next.
  • The suboption "InitialCentroids" can be used to specify the initial centroids as a list of data points. Each initial centroid must match an existing data point.
  • The following suboption can be given:
  • "InitialCentroids"    Automatic    a list of initial centroids
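
The procedure described above (assign each point to its nearest medoid, then pick the most central point of each cluster as the new medoid) can be sketched outside the Wolfram Language. The Python below is a minimal illustration under stated assumptions, not the implementation used by "KMedoids": the names k_medoids, medoid, dist, initial_medoids, max_iter and seed are hypothetical, and the sample data is made up. It only needs a distance function, which is why the same idea carries over to non-numeric data.

```python
import random

def medoid(points, dist):
    """Return the point whose total distance to the points of the cluster is minimal."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))

def k_medoids(data, k, dist, initial_medoids=None, max_iter=100, seed=None):
    """Alternate between assigning points to the nearest medoid and re-picking medoids.

    Illustrative sketch only; names and defaults are assumptions, not a library API.
    """
    rng = random.Random(seed)
    if initial_medoids is not None:
        medoids = list(initial_medoids)
        # Medoids are actual data points, so user-supplied ones must exist in the data.
        if any(m not in data for m in medoids):
            raise ValueError("each initial medoid must match an existing data point")
    else:
        # Random start: this is why results can differ between runs.
        medoids = rng.sample(data, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters = [[] for _ in medoids]
        for p in data:
            nearest = min(range(len(medoids)), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Update step: the most central member of each cluster becomes its medoid
        # (an empty cluster keeps its previous medoid).
        new_medoids = [medoid(c, dist) if c else m for c, m in zip(clusters, medoids)]
        if new_medoids == medoids:
            break  # no medoid changed, so the assignment is stable
        medoids = new_medoids
    return medoids, clusters

# Usage on made-up 1D data with fixed initial medoids, so the run is deterministic.
values = [1, 2, 3, 10, 12, 13, 14, 25]
found_medoids, found_clusters = k_medoids(
    values, 2, lambda a, b: abs(a - b), initial_medoids=[1, 10]
)
# found_medoids == [2, 13]
# found_clusters == [[1, 2, 3], [10, 12, 13, 14, 25]]
```

Note that the outlier 25 is absorbed into the second cluster rather than isolated, which is consistent with the caveat above about outliers.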

Examples

Basic Examples  (3)

Find exactly four clusters of nearby values using the "KMedoids" clustering method:

Create random 2D vectors:

Find clusters in the data using the "KMedoids" method:

Train a ClassifierFunction on a list of strings:

Find the cluster assignments and gather the elements by their cluster:

Options  (3)

"InitialCentroids"  (3)

Find clusters by specifying the "InitialCentroids" suboption:

Generate a list of 100 random colors:

Cluster the colors using the "KMedoids" method:

Specify the initial medoids using the "InitialCentroids" suboption:

Create random 2D vectors:

Find clusters in the data using the "KMedoids" method:

Find clusterings of the data by setting the "InitialCentroids" suboption:

Possible Issues  (1)

Create and visualize noisy 2D moon-shaped training and test datasets:

Train a ClassifierFunction using "KMedoids" for two clusters and find clusters in the test set:

Visualizing the clusters shows that "KMedoids" performs poorly on intertwined clusters: