ClusteringComponents
ClusteringComponents[array]
gives an array in which each element at the lowest level of array is replaced by an integer index representing the cluster in which the element lies.
ClusteringComponents[array,n]
finds n clusters.
ClusteringComponents[array,n,level]
finds clusters at the specified level in array.
ClusteringComponents[image]
finds clusters of pixels with similar values in image.
ClusteringComponents[image,n]
finds n clusters in image.
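The call forms above can be sketched as follows. The data and the choice of test image are illustrative assumptions, not taken from the original page, and the integer labels returned may be numbered differently from run to run or version to version.

```wolfram
(* Illustrative one-dimensional data (an assumption for this sketch) *)
data = {1, 2, 3, 10, 11, 12};

(* Let ClusteringComponents choose the number of clusters *)
ClusteringComponents[data]

(* Ask for exactly two clusters *)
ClusteringComponents[data, 2]

(* Cluster the rows of a matrix: level 1 treats each row as one element *)
points = {{0., 0.}, {0.1, 0.2}, {5., 5.}, {5.1, 4.9}};
ClusteringComponents[points, 2, 1]

(* Cluster pixels of a built-in test image into three groups
   and color the resulting label matrix for inspection *)
img = ExampleData[{"TestImage", "Mandrill"}];
Colorize[ClusteringComponents[img, 3]]
```

Without the level argument, a matrix is clustered entry by entry, since clustering happens at the lowest level of the array.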
Details and Options
- ClusteringComponents works for a variety of data types, including numerical, textual, and image, as well as dates and times.
- The number of clusters can be specified in the following ways:
    Automatic    find the number of clusters automatically
    n            find exactly n clusters
    UpTo[n]      find at most n clusters
- The following options can be given:
    CriterionFunction      Automatic   criterion for selecting a method
    DistanceFunction       Automatic   the distance function to use
    FeatureExtractor       Identity    how to extract features from which to learn
    FeatureNames           Automatic   feature names to assign for input data
    FeatureTypes           Automatic   feature types to assume for input data
    Method                 Automatic   what method to use
    MissingValueSynthesis  Automatic   how to synthesize missing values
    PerformanceGoal        Automatic   aspect of performance to optimize
    RandomSeeding          1234        what seeding of pseudorandom generators should be done internally
    Weights                Automatic   what weight to give to each example
- By default, ClusteringComponents will preprocess the data automatically unless a DistanceFunction is specified.
- The setting for DistanceFunction can be any distance or dissimilarity function, or a function f defining a distance between two values.
- Possible settings for PerformanceGoal include:
    Automatic    automatic tradeoff among speed, accuracy, and memory
    "Quality"    maximize the accuracy of the clustering
    "Speed"      maximize the speed of the clustering
- Possible settings for Method include:
    Automatic                  automatically select a method
    "Agglomerate"              single linkage clustering algorithm
    "DBSCAN"                   density-based spatial clustering of applications with noise
    "GaussianMixture"          variational Gaussian mixture algorithm
    "JarvisPatrick"            Jarvis–Patrick clustering algorithm
    "KMeans"                   k-means clustering algorithm
    "KMedoids"                 partitioning around medoids
    "MeanShift"                mean-shift clustering algorithm
    "NeighborhoodContraction"  shift data points toward high-density regions
    "SpanningTree"             minimum spanning tree-based clustering algorithm
    "Spectral"                 spectral clustering algorithm
- The methods "KMeans" and "KMedoids" can only be used when the number of clusters is specified.
- The methods "DBSCAN", "GaussianMixture", "JarvisPatrick", "MeanShift" and "NeighborhoodContraction" can only be used when the number of clusters is Automatic.
- [Figure: results of common methods on toy datasets]
- Possible settings for CriterionFunction include:
    "StandardDeviation"  root-mean-square standard deviation
    "RSquared"           R-squared
    "Dunn"               Dunn index
    "CalinskiHarabasz"   Calinski–Harabasz index
    "DaviesBouldin"      Davies–Bouldin index
    Automatic            internal index
- Possible settings for RandomSeeding include:
    Automatic    automatically reseed every time the function is called
    Inherited    use externally seeded random numbers
    seed         use an explicit integer or string as a seed
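The cluster-count specifications and a few of the options listed above can be combined as in the following sketch. The data is synthetic and the seed values are arbitrary assumptions made for illustration; no particular output is implied.

```wolfram
(* Synthetic data: two groups of values around 0 and 8 (illustrative only) *)
SeedRandom[1];
data = Join[
   RandomVariate[NormalDistribution[0, 1], 50],
   RandomVariate[NormalDistribution[8, 1], 50]];

(* Find at most four clusters *)
ClusteringComponents[data, UpTo[4]]

(* "KMeans" needs an explicit number of clusters *)
ClusteringComponents[data, 3, Method -> "KMeans"]

(* "DBSCAN" is used with the number of clusters left automatic *)
ClusteringComponents[data, Method -> "DBSCAN"]

(* Favor speed and use an explicit seed for the internal random generators *)
ClusteringComponents[data, 2, PerformanceGoal -> "Speed", RandomSeeding -> 42]
```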
Examples
Basic Examples (3)
Scope (10)
Options (13)
CriterionFunction (1)
Generate some separated data and visualize it:
Find a cluster assignment with exactly two clusters using different settings for CriterionFunction:
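A minimal sketch of the two steps above, assuming synthetic two-dimensional data drawn from two well-separated normal distributions (the distribution parameters are illustrative, not from the original page):

```wolfram
(* Two separated point clouds in the plane *)
SeedRandom[1];
data = Join[
   RandomVariate[MultinormalDistribution[{0, 0}, IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{6, 6}, IdentityMatrix[2]], 100]];
ListPlot[data]

(* Exactly two clusters; level 1 treats each point as one element *)
ClusteringComponents[data, 2, 1, CriterionFunction -> "StandardDeviation"]
ClusteringComponents[data, 2, 1, CriterionFunction -> "CalinskiHarabasz"]
```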
DistanceFunction (1)
By default, EditDistance is used to cluster a list of strings:
Use HammingDistance to cluster based on the number of characters that disagree:
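The comparison above might look like the following sketch. The word list is an illustrative assumption; all strings have the same length because HammingDistance requires equal-length arguments.

```wolfram
(* Equal-length strings, chosen for illustration *)
words = {"cat", "bat", "hat", "dog", "dig", "dug"};

(* Default distance for strings *)
ClusteringComponents[words, 2]

(* Count the positions at which characters disagree *)
ClusteringComponents[words, 2, DistanceFunction -> HammingDistance]
```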
FeatureExtractor (1)
Find clustering components for a list of images:
Create a custom FeatureExtractor to extract features:
Look at the resulting features:
Use the FeatureExtractor to find new clustering components:
FeatureNames (1)
Use FeatureNames to name features, and refer to their names in further specifications:
FeatureTypes (1)
Use FeatureTypes to enforce the interpretation of the features:
Compare it to the result obtained by assuming nominal features:
Method (5)
Generate normally distributed data and visualize its histogram:
Find cluster assignments for this data using the "GaussianMixture" method:
Visualize the corresponding clustering:
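A sketch of the three steps above, assuming a mixture of two normal distributions as the data (the means, sample sizes, and bin count are illustrative choices). The number of clusters is left automatic, as "GaussianMixture" requires.

```wolfram
(* Bimodal data from two normal distributions *)
SeedRandom[1];
data = Join[
   RandomVariate[NormalDistribution[-3, 1], 300],
   RandomVariate[NormalDistribution[3, 1], 300]];
Histogram[data, 40]

(* One integer label per data point *)
labels = ClusteringComponents[data, Method -> "GaussianMixture"];

(* Group the values by label and plot one histogram per cluster *)
groups = Values[GroupBy[Transpose[{labels, data}], First -> Last]];
Histogram[groups, 40]
```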
Find cluster assignments for a list of strings using the k-medoids method:
Look at the resulting clustering:
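One way the k-medoids example could be written, with an illustrative word list that is not from the original page. The number of clusters is given explicitly, since "KMedoids" requires it.

```wolfram
(* Illustrative strings *)
words = {"apple", "apply", "ample", "house", "mouse", "horse"};

(* One label per string, using partitioning around medoids *)
labels = ClusteringComponents[words, 2, Method -> "KMedoids"];

(* Collect the strings that share a label *)
Values[GroupBy[Transpose[{labels, words}], First -> Last]]
```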
Find color clusters in an image using different methods:
Find color clusters in an image using the "NeighborhoodContraction" method and its suboption:
Find color clusters in an image using the "Spectral" method and its suboption:
PerformanceGoal (1)
Generate 500 random numerical vectors of length 1000:
Compute their clustering and benchmark the operation:
Perform the same operation with PerformanceGoal set to "Quality":
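The benchmark described above can be sketched as follows. The number of clusters (3) is an assumption made for illustration, and timings depend on the machine, so none are shown.

```wolfram
(* 500 random vectors of length 1000 *)
SeedRandom[1];
data = RandomReal[1, {500, 1000}];

(* Time the default clustering, in seconds *)
First[AbsoluteTiming[ClusteringComponents[data, 3, 1]]]

(* Time the same clustering when quality is favored *)
First[AbsoluteTiming[
  ClusteringComponents[data, 3, 1, PerformanceGoal -> "Quality"]]]
```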
RandomSeeding (1)
Generate 500 random numerical vectors in 2 dimensions:
Compute their clustering several times and compare the results:
Compute their clustering several times by changing the RandomSeeding option and compare the results:
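A sketch of the comparison above, assuming three clusters and the "KMeans" method (both are illustrative choices). With the default fixed seed, repeated calls would be expected to agree; with different seeds they may or may not.

```wolfram
(* 500 random points in the plane *)
data = RandomReal[1, {500, 2}];

(* Repeat the same call three times and test whether all results are identical *)
runs = Table[ClusteringComponents[data, 3, 1, Method -> "KMeans"], 3];
SameQ @@ runs

(* Repeat with a different explicit seed each time and compare again *)
seeded = Table[
   ClusteringComponents[data, 3, 1, Method -> "KMeans", RandomSeeding -> seed],
   {seed, {1, 2, 3}}];
SameQ @@ seeded
```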
Applications (2)
Properties & Relations (3)
ClusteringComponents gives an array of cluster indices while FindClusters returns the list of clusters:
Convert the result of ClusteringComponents to partitions of similar elements:
FindClusters yields the same result:
Convert the result of FindClusters to a list of cluster indices:
ClusteringComponents yields the same result:
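The conversions described above can be sketched as follows, using small illustrative data. The helper names fromLabels and fromClusters are hypothetical. Whether the two functions agree exactly can depend on the data and on how clusters are numbered, so no output is asserted here.

```wolfram
data = {1, 2, 3, 10, 11, 12};

labels = ClusteringComponents[data, 2];   (* one cluster index per element *)
clusters = FindClusters[data, 2];         (* list of clusters *)

(* Cluster indices -> partitions of similar elements *)
fromLabels = Values[GroupBy[Transpose[{labels, data}], First -> Last]]

(* List of clusters -> cluster indices: map each element
   to the position of the cluster that contains it *)
rules = Flatten[MapIndexed[Thread[#1 -> First[#2]] &, clusters]];
fromClusters = Replace[data, rules, {1}]
```

The rule-based conversion assumes that the elements are distinct; repeated values would all map to the first cluster in which the value appears.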
Wolfram Research (2010), ClusteringComponents, Wolfram Language function, https://reference.wolfram.com/language/ref/ClusteringComponents.html (updated 2022).