"KMeans" (Machine Learning Method)

Details & Suboptions

  • "KMeans" is a classic, simple, centroid-based clustering method. "KMeans" works best when clusters have similar sizes and are locally and isotropically distributed around their centroid. When clusters have very different sizes, are anisotropic, are intertwined, or when outliers are present, "KMeans" is likely to give poor results.
  • The following plots show the results of the "KMeans" method applied to toy datasets:
  • The "KMeans" method aims to find k centroids defining k clusters. Each data point is assigned to its nearest centroid. All points assigned to a given centroid form a cluster.
  • The procedure to find the best k centroids is iterative. The search starts by using random centroids and assigning each point to its nearest centroid:
  • Once all clusters are defined, the mean of each cluster becomes a new centroid:
  • This procedure is repeated until the clusters remain unchanged. This iterative procedure is sometimes called "hard EM" (hard Expectation Maximization).
  • The "KMeans" method is similar to the "GaussianMixture" method with a spherical covariance (that is, all clusters are isotropic and have the same size).
  • Since the initial centroids are chosen randomly, results may differ from one evaluation to the next.
  • The suboption "InitialCentroids" can be used to specify the initial centroids as a list of data points.
  • The following suboption can be given:
  • "InitialCentroids"    Automatic    a list of initial centroids
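
The iterative procedure described above can be sketched directly in the Wolfram Language. This only illustrates the assign-and-average loop and is not the built-in implementation. The names lloydStep and kMeansSketch are hypothetical, the initial centroids are drawn with RandomSample, and an empty cluster simply keeps its previous centroid (an assumption made here so that the sketch does not fail):

```wolfram
(* One iteration: assign each point to its nearest centroid, then average each cluster *)
lloydStep[data_, centroids_] := Module[{nf, labels, members},
  nf = Nearest[centroids -> "Index"];
  labels = First[nf[#]] & /@ data;
  Table[
    members = Pick[data, labels, i];
    (* assumption: an empty cluster keeps its previous centroid *)
    If[members === {}, centroids[[i]], Mean[members]],
    {i, Length[centroids]}]]

(* Repeat until the centroids stop changing, with a cap of 100 iterations *)
kMeansSketch[data_, k_Integer] :=
  FixedPoint[lloydStep[data, #] &, RandomSample[data, k], 100]

data = RandomReal[1, {200, 2}];
centroids = kMeansSketch[data, 3]
```

Because the starting centroids are sampled at random, repeated evaluations of kMeansSketch can return different centroids for the same data.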

Examples

Basic Examples  (3)

Find exactly four clusters of nearby values using the "KMeans" clustering method:
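
A minimal sketch of such a call, assuming FindClusters as the clustering function and a made-up list of numbers as input:

```wolfram
(* the input values are illustrative, not taken from the original example *)
FindClusters[{1, 2, 3, 10, 11, 12, 20, 21, 22, 30, 31, 32}, 4, Method -> "KMeans"]
```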

Create random 2D vectors:

Plot computed clusters using the "KMeans" method:
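
One way these two steps might look, assuming 300 uniformly distributed points (the size and distribution are arbitrary choices) and letting FindClusters choose the number of clusters:

```wolfram
(* random 2D vectors in the unit square *)
data = RandomReal[1, {300, 2}];

(* each cluster returned by FindClusters is plotted as its own point series *)
ListPlot[FindClusters[data, Method -> "KMeans"]]
```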

Train a ClassifierFunction on a list of strings:

Find the cluster assignments and gather the elements by their cluster:
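
A sketch of both steps, assuming ClusterClassify is used to build the ClassifierFunction and using a made-up list of strings:

```wolfram
(* illustrative strings, not the original example data *)
data = {"cat", "cats", "catalog", "dog", "dogs", "dogma", "bird", "birds"};

(* train a ClassifierFunction whose classes are the clusters it finds *)
c = ClusterClassify[data, Method -> "KMeans"]

(* cluster assignment for each element *)
clusters = c[data]

(* group the elements that received the same assignment *)
GatherBy[data, c]
```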

Options  (3)

DistanceFunction  (1)

Cluster data using Manhattan distance:
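
A minimal sketch, assuming random 2D input and three clusters (both are arbitrary choices for illustration):

```wolfram
data = RandomReal[1, {200, 2}];

(* ManhattanDistance replaces the default distance when assigning points to centroids *)
FindClusters[data, 3, Method -> "KMeans", DistanceFunction -> ManhattanDistance]
```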

"InitialCentroids"  (2)

Generate a list of 100 random colors:

Cluster the colors without specifying the initial configuration of centroids using the "KMeans" method:

Specify the initial colors to be used as centroids using the "KMeans" method:
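
These steps might be written as follows. Taking the first three colors as initial centroids is an arbitrary choice for illustration, and the sketch assumes that the number of clusters follows from the number of centroids supplied:

```wolfram
(* 100 random colors *)
colors = RandomColor[100];

(* default: initial centroids are chosen automatically *)
FindClusters[colors, Method -> "KMeans"]

(* explicit initial centroids, given as a list of data points *)
FindClusters[colors, Method -> {"KMeans", "InitialCentroids" -> Take[colors, 3]}]
```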

Create random 2D vectors:

Find different clusterings of data using the "KMeans" method by varying the "InitialCentroids":
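
A sketch of this comparison, assuming three centroids drawn at random from the data on each run (the count of three runs and three centroids is arbitrary):

```wolfram
data = RandomReal[1, {300, 2}];

(* each run draws a fresh set of initial centroids from the data *)
Table[
  ListPlot[FindClusters[data,
    Method -> {"KMeans", "InitialCentroids" -> RandomSample[data, 3]}]],
  3]
```

Since uniformly random data has no clear cluster structure, different starting centroids can lead to visibly different partitions.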

Possible Issues  (1)

Create and visualize noisy 2D moon-shaped training and test datasets:

Train a ClassifierFunction using "KMeans" for two clusters and find clusters in the test set:

Visualizing clusters indicates that "KMeans" performs poorly on intertwined clusters:
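
The whole experiment can be sketched as below. The helpers moon and makeMoons are hypothetical, and the arc positions, noise level and sample sizes are assumptions chosen only to produce two interleaved half circles:

```wolfram
(* hypothetical helper: n noisy points on a half circle;
   sign = 1 gives the upper arc, sign = -1 the lower arc *)
moon[n_, center_, sign_] := Table[
  center + {Cos[t], sign Sin[t]} + RandomVariate[NormalDistribution[0, 0.1], 2],
  {t, RandomReal[{0, Pi}, n]}]

(* two interleaved arcs with n points each *)
makeMoons[n_] := Join[moon[n, {0, 0}, 1], moon[n, {1, 0.5}, -1]]

train = makeMoons[200];
test = makeMoons[100];
ListPlot[{train, test}]

(* train on two clusters, then group the test points by predicted cluster *)
c = ClusterClassify[train, 2, Method -> "KMeans"];
ListPlot[GatherBy[test, c]]
```

Because each point is assigned to its nearest centroid, the boundary between the two clusters is a straight line, which cannot follow the curved arcs.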