ClusteringComponents

ClusteringComponents[array]

gives an array in which each element at the lowest level of array is replaced by an integer index representing the cluster in which the element lies.

ClusteringComponents[array,n]

finds n clusters.

ClusteringComponents[array,n,level]

finds clusters at the specified level in array.

ClusteringComponents[image]

finds clusters of pixels with similar values in image.

ClusteringComponents[image,n]

finds n clusters in image.

Details and Options

ClusteringComponents works for a variety of data types, including numerical, textual, and image, as well as dates and times.
The number of clusters can be specified in the following ways:

	Automatic	find the number of clusters automatically
	n	find exactly n clusters
	UpTo[n]	find at most n clusters

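As a rough sketch of these three forms, assuming a small made-up list of numbers (the data is illustrative and not taken from this page):

```wolfram
(* Illustrative data: two visibly separated groups of numbers *)
data = {1, 2, 3, 10, 11, 12};

(* No count given: the number of clusters is chosen automatically *)
ClusteringComponents[data]

(* Exactly 2 clusters *)
ClusteringComponents[data, 2]

(* At most 4 clusters *)
ClusteringComponents[data, UpTo[4]]
```

Each call returns a list of integer cluster indices with the same length as data.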
The following options can be given:

CriterionFunction	Automatic	criterion for selecting a method
DistanceFunction	Automatic	the distance function to use
FeatureExtractor	Identity	how to extract features from which to learn
FeatureNames	Automatic	feature names to assign for input data
FeatureTypes	Automatic	feature types to assume for input data
Method	Automatic	what method to use
MissingValueSynthesis	Automatic	how to synthesize missing values
PerformanceGoal	Automatic	aspect of performance to optimize
RandomSeeding	1234	what seeding of pseudorandom generators should be done internally
Weights	Automatic	what weight to give to each example

By default, ClusteringComponents will preprocess the data automatically unless a DistanceFunction is specified.
The setting for DistanceFunction can be any distance or dissimilarity function, or a function f defining a distance between two values.
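A minimal sketch of both kinds of setting, assuming a hypothetical list of 2D points clustered at level 1 so that each point is passed to the distance function as a whole:

```wolfram
(* Illustrative 2D points; N makes the coordinates machine reals *)
pts = N[{{1, 1}, {1, 2}, {2, 1}, {10, 10}, {10, 11}, {11, 10}}];

(* A built-in distance function *)
ClusteringComponents[pts, 2, 1, DistanceFunction -> ManhattanDistance]

(* A user-supplied function f taking two elements and returning a distance *)
ClusteringComponents[pts, 2, 1, DistanceFunction -> (Norm[#1 - #2] &)]
```

The level argument 1 matters here: without it, clustering happens at the lowest level, so the individual coordinates rather than the points would be compared.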
Possible settings for PerformanceGoal include:

	Automatic	automatic tradeoff among speed, accuracy, and memory
	"Quality"	maximize the accuracy of the clustering
	"Speed"	maximize the speed of the clustering

Possible settings for Method include:

	Automatic	automatically select a method
	"Agglomerate"	single linkage clustering algorithm
	"DBSCAN"	density-based spatial clustering of applications with noise
	"GaussianMixture"	variational Gaussian mixture algorithm
	"JarvisPatrick"	Jarvis–Patrick clustering algorithm
	"KMeans"	k-means clustering algorithm
	"KMedoids"	partitioning around medoids
	"MeanShift"	mean-shift clustering algorithm
	"NeighborhoodContraction"	shift data points toward high-density regions
	"SpanningTree"	minimum spanning tree-based clustering algorithm
	"Spectral"	spectral clustering algorithm

The methods "KMeans" and "KMedoids" can only be used when the number of clusters is specified.
The methods "DBSCAN", "GaussianMixture", "JarvisPatrick", "MeanShift" and "NeighborhoodContraction" can only be used when the number of clusters is Automatic.
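The two restrictions above can be sketched as follows, assuming a small hypothetical set of 2D points clustered at level 1:

```wolfram
(* Illustrative 2D points *)
pts = N[{{1, 1}, {1, 2}, {2, 1}, {10, 10}, {10, 11}, {11, 10}}];

(* "KMeans" requires an explicit number of clusters *)
ClusteringComponents[pts, 2, 1, Method -> "KMeans"]

(* "DBSCAN" determines the number of clusters itself, so n is Automatic *)
ClusteringComponents[pts, Automatic, 1, Method -> "DBSCAN"]
```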
The following plots show results of common methods on toy datasets:

Possible settings for CriterionFunction include:

	"StandardDeviation"	root-mean-square standard deviation
	"RSquared"	R-squared
	"Dunn"	Dunn index
	"CalinskiHarabasz"	Calinski–Harabasz index
	"DaviesBouldin"	Davies–Bouldin index
	Automatic	internal index

Possible settings for RandomSeeding include:

	Automatic	automatically reseed every time the function is called
	Inherited	use externally seeded random numbers
	seed	use an explicit integer or string as a seed
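A minimal sketch of the seeding behavior, assuming randomly generated 2D points and the "KMeans" method; the seed value 42 is arbitrary:

```wolfram
(* Fix the illustrative data itself so the example is repeatable *)
SeedRandom[1];
pts = RandomReal[1, {500, 2}];

(* The same explicit seed is expected to give identical labels on repeated calls *)
a = ClusteringComponents[pts, 3, 1, Method -> "KMeans", RandomSeeding -> 42];
b = ClusteringComponents[pts, 3, 1, Method -> "KMeans", RandomSeeding -> 42];
a === b

(* Automatic reseeds on every call, so labels may differ between runs *)
ClusteringComponents[pts, 3, 1, Method -> "KMeans", RandomSeeding -> Automatic]
```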

Examples

Basic Examples (3)

Label two clusters of values in a list:

Label a vector of strings:

Cluster analysis of an MR image:

Scope (10)

Clusters of values in a matrix:

Find color clusters in an image:

Find clusters in a 3D image:

Clustering transform of nested lists:

Find clusters at list level 2:

Find clusters at list level 1:
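The two level settings can be contrasted on one small matrix. This is a sketch with made-up values:

```wolfram
(* Illustrative 3x3 matrix *)
array = {{1, 2, 11}, {1, 3, 12}, {10, 11, 1}};

(* Level 2: each number is an element; the result has the same shape as array *)
ClusteringComponents[array, 2, 2]

(* Level 1: each row is an element; the result is a flat list of 3 labels *)
ClusteringComponents[array, 2, 1]
```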

Find duplicates by specifying a large number of potential clusters:

Labeling clusters in a matrix:

Clustering lists of Booleans:

Clustering a list of Boolean vectors:

Options (13)

CriterionFunction (1)

Generate some separated data and visualize it:

Find a cluster assignment with exactly two clusters using different settings for CriterionFunction:

Compare the two clusterings of the data:

DistanceFunction (1)

By default, EditDistance is used to cluster a list of strings:

Use HammingDistance to cluster based on the number of characters that disagree:
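A sketch of the comparison, assuming a hypothetical list of equal-length words (HammingDistance is only defined for strings of the same length):

```wolfram
(* Illustrative equal-length strings *)
words = {"cart", "card", "care", "mint", "mind", "mine"};

(* Default distance for strings *)
ClusteringComponents[words, 2]

(* Count the positions at which characters disagree *)
ClusteringComponents[words, 2, DistanceFunction -> HammingDistance]
```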

FeatureExtractor (1)

Find clustering components for a list of images:

Create a custom FeatureExtractor to extract features:

Look at the resulting features:

Use the FeatureExtractor to find new clustering components:

Look at the new clustering:

FeatureNames (1)

Use FeatureNames to name features, and refer to their names in further specifications:

FeatureTypes (1)

Use FeatureTypes to enforce the interpretation of the features:

Compare it to the result obtained by assuming nominal features:

Method (5)

Generate normally distributed data and visualize its histogram:

Find cluster assignments for this data using the "GaussianMixture" method:

Visualize the corresponding clustering:

Find cluster assignments for a list of strings using the k-medoids method:

Look at the resulting clustering:

Find color clusters in an image using different methods:

Find color clusters in an image using the "NeighborhoodContraction" method and its suboption:

Find color clusters in an image using the "Spectral" method and its suboption:

PerformanceGoal (1)

Generate 500 random numerical vectors of length 1000:

Compute their clustering and benchmark the operation:

Perform the same operation with PerformanceGoal set to "Quality":

RandomSeeding (1)

Generate 500 random numerical vectors in 2 dimensions:

Compute their clustering several times and compare the results:

Compute their clustering several times by changing the RandomSeeding option and compare the results:

Weights (1)

Obtain cluster assignment for some numerical data:

Look at the cluster assignment when changing the weight given to each number:

Applications (2)

Color segmentation of a microscopic image, after smoothing with a Perona–Malik filter:

Binary segmentation of an image:

Properties & Relations (3)

ClusteringComponents gives an array of cluster indices while FindClusters returns the list of clusters:

Convert the result of ClusteringComponents to partitions of similar elements:
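One possible way to do this conversion, sketched with illustrative data and Pick (other constructions, such as GatherBy, would serve equally well):

```wolfram
(* Illustrative data and its cluster labels *)
data = {1, 2, 3, 10, 11, 12};
labels = ClusteringComponents[data, 2];

(* For each distinct label, collect the elements that carry it *)
clusters = (Pick[data, labels, #] &) /@ Union[labels]

(* For comparison: the clusters returned directly *)
FindClusters[data, 2]
```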

FindClusters yields the same result:

Convert the result of FindClusters to a list of cluster indices:

ClusteringComponents yields the same result:

Possible Issues (1)

The "KMeans" method cannot be used when the mean of a subset of the input does not belong to the input space:
