"KMeans" (Machine Learning Method)
- Method for FindClusters, ClusterClassify and ClusteringComponents.
- Partitions data into a specified number of clusters of similar elements using the k-means clustering algorithm.
Details & Suboptions
- "KMeans" is a classic, simple, centroid-based clustering method. "KMeans" works well when clusters have similar sizes and are locally and isotropically distributed around their centroids. When clusters have very different sizes, are anisotropic or intertwined, or when outliers are present, "KMeans" is likely to give poor results.
- The following plots show the results of the "KMeans" method applied to toy datasets:
- The "KMeans" method aims to find k centroids defining k clusters. Each data point is assigned to its nearest centroid, and all points assigned to a given centroid form a cluster.
- The procedure to find the best k centroids is iterative. The search starts by choosing random centroids and assigning each point to its nearest centroid.
- Once all clusters are defined, the mean of each cluster becomes its new centroid.
- This procedure is repeated until the clusters remain unchanged. This iterative procedure is sometimes called "hard EM" (hard Expectation Maximization).
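The iteration described above can be sketched directly in the Wolfram Language. This is only an illustration of the assign-and-update loop, not the internal implementation of "KMeans". The data, the helper names assign, update and step, the choice of random data points as initial centroids, and the cap of 100 iterations are all assumptions made for this sketch.

```wolfram
(* Made-up data: two well-separated 2D blobs *)
SeedRandom[1];
data = Join[
   RandomVariate[NormalDistribution[0, 1], {50, 2}],
   RandomVariate[NormalDistribution[6, 1], {50, 2}]];
k = 2;

(* Assignment step: for each point, the index of its nearest centroid *)
assign[centroids_, pts_] := Table[
   First[Ordering[SquaredEuclideanDistance[p, #] & /@ centroids, 1]],
   {p, pts}];

(* Update step: each centroid becomes the mean of its assigned points;
   a centroid with no assigned points is left where it is *)
update[centroids_, pts_, labels_] := Table[
   With[{members = pts[[Flatten[Position[labels, j]]]]},
    If[members === {}, centroids[[j]], Mean[members]]],
   {j, Length[centroids]}];

(* One full iteration: assign, then recompute the centroids *)
step[centroids_] := update[centroids, data, assign[centroids, data]];

(* Start from k random data points (an assumption of this sketch) and
   repeat until the centroids stop changing, for at most 100 iterations *)
initial = RandomSample[data, k];
final = FixedPoint[step, initial, 100];

(* Final clusters: the points grouped by their nearest centroid *)
labels = assign[final, data];
clusters = Table[data[[Flatten[Position[labels, j]]]], {j, k}];
```

Because the centroids are a deterministic function of the assignments, the loop stops exactly when the clusters remain unchanged between two consecutive iterations.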
- The "KMeans" method is similar to the "GaussianMixture" method with spherical covariances (that is, all clusters are isotropic and have the same size).
- Since the initial centroids are chosen randomly, results might differ from one evaluation to the next.
- The suboption "InitialCentroids" can be used to specify the initial centroids as a list of data points.
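As a minimal sketch of how this suboption might be passed, the call below uses the usual Method -> {"name", suboption -> value} form. The data and the two chosen starting points are made up for illustration.

```wolfram
(* Made-up data: three points near the origin, three near {5, 5} *)
data = {{0.1, 0.2}, {0.0, -0.1}, {0.3, 0.1},
   {5.1, 4.9}, {4.8, 5.2}, {5.0, 5.0}};

(* Start the search from one data point in each apparent group *)
FindClusters[data, 2,
 Method -> {"KMeans", "InitialCentroids" -> {{0.1, 0.2}, {5.0, 5.0}}}]
```

Fixing the initial centroids removes the random starting point, so repeated evaluations start the search from the same place.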
- The following suboption can be given:
  "InitialCentroids"    Automatic    a list of initial centroids
Examples
Basic Examples (3)
Train a ClassifierFunction on a list of strings:
Possible Issues (1)
Train a ClassifierFunction using "KMeans" for two clusters and find clusters in the test set:
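A minimal sketch of such a call is given below. The training and test data are made up for illustration and are not the data of the original example.

```wolfram
(* Made-up training set: two 2D blobs centered near {0, 0} and {5, 5} *)
SeedRandom[1];
train = Join[
   RandomVariate[NormalDistribution[0, 1], {100, 2}],
   RandomVariate[NormalDistribution[5, 1], {100, 2}]];

(* Made-up test points: one near each blob and one in between *)
test = {{0.2, -0.3}, {4.7, 5.1}, {2.5, 2.5}};

(* Train a ClassifierFunction with two clusters using "KMeans" *)
c = ClusterClassify[train, 2, Method -> "KMeans"];

(* Assign each test point to one of the two clusters *)
c[test]
```

Points lying between the two groups, such as the third test point here, are still assigned to one of the clusters, since every point goes to its nearest centroid.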
Introduced in 2020