Legacy Documentation

Digital Image Processing (2000)

This is documentation for an obsolete product.

User's Guide

Image Segmentation

7.5 Segmentation by Clustering

Clustering is a classification technique [Did80]. Each pixel or group of pixels (i.e., region) in an image is described by a vector of N measurements. Because these measurements describe some useful image feature, the vector is also known as a feature vector, and the N-dimensional measurement space as the feature space. If the feature vectors of two pixels or regions are similar, they lie close together (small separation distance) in feature space, so clusters in feature space indicate similar image regions and may be used for segmentation purposes. Clustering methods were among the earliest data segmentation techniques to be developed.
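As a minimal illustration of this idea (a Python sketch, not part of the Digital Image Processing package), each pixel of a tiny made-up RGB image is turned into a feature vector of N = 3 measurements. The image values and the helper names feature_vectors and euclidean are assumptions introduced only for this example.

```python
import math

# Hypothetical 2x2 RGB image: two reddish pixels (top row), two dark pixels (bottom row).
image = [
    [(200, 30, 40), (210, 35, 45)],
    [(20, 20, 25), (15, 25, 20)],
]

def feature_vectors(img):
    """Return ((row, col), feature_vector) pairs, one per pixel."""
    return [((r, c), tuple(float(v) for v in pixel))
            for r, row in enumerate(img)
            for c, pixel in enumerate(row)]

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

features = dict(feature_vectors(image))

# Two reddish pixels are close in feature space; a reddish and a dark pixel are not.
d_similar = euclidean(features[(0, 0)], features[(0, 1)])
d_different = euclidean(features[(0, 0)], features[(1, 0)])
print(d_similar < d_different)
```

Pixels whose feature vectors are separated by a small distance are candidates for the same cluster, which is the property the clustering functions below rely on.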

KMeans[data, seeds, weights]	returns
KMeansList[data, seeds, weights]	returns
ClusterMetric[clstr]	returns

Clustering functions.

KMeans clustering seeks a grouping of the measurements that minimizes the within-cluster sum of squares. Each measurement, represented by a vector of length N, is assigned to exactly one of a fixed number of clusters. The number of clusters is determined by the number of seeds given as the second argument of KMeans. Measurements are transferred from one cluster to another when doing so decreases the within-cluster sum of squares, and the algorithm stops when no more transfers can occur. Here we demonstrate the application of KMeans clustering to a simple image segmentation problem: programmatically manipulating individual red beans in a fragment of the beans image.
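Before turning to the package example, the iteration described above can be sketched in Python. This is not the package's implementation: it uses the standard Lloyd iteration (assign each point to its nearest center, then recompute each center as the mean of its members), which likewise never increases the within-cluster sum of squares. The function name kmeans_sketch, the sample data, and the omission of the weights argument are assumptions made for the illustration.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans_sketch(data, seeds, max_iter=100):
    """Lloyd-style k-means; the number of clusters equals len(seeds).

    Returns (clusters, centers): the partitioned data and the final centers.
    """
    centers = [tuple(float(v) for v in s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: euclidean(p, centers[k]))
            for p in data
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the partition is stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = mean(members)
    clusters = [[p for p, lab in zip(data, labels) if lab == k]
                for k in range(len(centers))]
    return clusters, centers

# Hypothetical 2D data with two well-separated groups and one seed near each.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
seeds = [(0.0, 0.0), (10.0, 10.0)]
clusters, centers = kmeans_sketch(data, seeds)
print(clusters)
print(centers)
```

As with the seeds argument described above, the number of seeds fixes the number of clusters, and the choice of seeds can affect which local minimum the iteration reaches.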

This loads the package.

In[1]:=

This loads and selects a fragment of the example image.

In[2]:=

The following sequence of operations extracts the red beans and enhances the resulting image using grayscale morphology.

In[3]:=

In[4]:=

Out[4]=

This returns the positions of all elements belonging to the foreground objects, namely the red beans.

In[5]:=

The initial cluster centers are selected interactively using the mouse (for details, see Section 5.6).

In[6]:=

This returns the clustering result.

In[7]:=

The result consists of two nested lists. The partitioned data is found in the first list, while the cluster centers are in the second. Here we show the cluster centers returned by KMeans.

In[8]:=

Out[8]=

Here we assign a distinct color to each of the three clusters and plot the data points on a 2D grid. Next we define a helper function to plot the data points (i.e., the pixels) with the typical square geometry.

In[9]:=

In[10]:=

In[11]:=

Out[11]=

The classification of the image pixels into separate regions now permits selective data operations. Here we demonstrate the selective removal of an individual bean from the bilevel image.
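The idea behind such a selective operation can be sketched in Python, independently of the package: once each cluster is a list of pixel positions, clearing one object amounts to resetting those positions to the background value. The function name remove_cluster, the zero-based (row, column) positions, and the small bilevel image are assumptions made for this illustration.

```python
def remove_cluster(image, clusters, index, background=0):
    """Return a copy of a bilevel image with the pixels of one cluster cleared."""
    result = [list(row) for row in image]  # copy so the input stays unchanged
    for r, c in clusters[index]:
        result[r][c] = background
    return result

# Hypothetical bilevel image containing two foreground objects.
image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
# Pixel positions of each object, as a clustering step might return them.
clusters = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(1, 3), (1, 4), (2, 3), (2, 4)],
]

cleaned = remove_cluster(image, clusters, 0)
for row in cleaned:
    print(row)
```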

In[12]:=

This shows the result. Note the missing red bean in the lower-left corner of the image.

In[13]:=

Out[13]=

A small number of cluster analysis functions measure the geometry of the image regions that result from a segmentation operation. They compute bounding boxes or circles, region centers, and borders. The number of pixels in a region's border is an estimate of the length of its perimeter.

RegionArea[pts]	returns
RegionBorder[pts]	returns the border of the list of pixels given by pts using either 4- or 8-connected pixel neighborhoods
RegionCenter[pts]	returns the centroid of the list of pixels given by pts
RegionCircle[pts]	returns
RegionRectangle[pts]	returns the bounding rectangle of the list of pixels given by pts

Cluster analysis functions.
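The measurements these functions describe can be sketched in Python. This is an illustration under stated assumptions, not the package's code: the names pixel_count, region_center, region_rectangle, and region_border are hypothetical, positions are zero-based (row, column) pairs, the pixel count stands in for an area measure, and a border pixel is taken to be a region pixel with at least one neighbor outside the region in the chosen 4- or 8-pixel neighborhood.

```python
def pixel_count(pts):
    """Number of distinct pixels in the region (used here as an area estimate)."""
    return len(set(pts))

def region_center(pts):
    """Centroid: mean row and mean column of the region's pixels."""
    n = len(pts)
    return (sum(r for r, _ in pts) / n, sum(c for _, c in pts) / n)

def region_rectangle(pts):
    """Bounding rectangle as ((min_row, min_col), (max_row, max_col))."""
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    return (min(rows), min(cols)), (max(rows), max(cols))

def region_border(pts, neighborhood=4):
    """Region pixels that have at least one neighbor outside the region."""
    if neighborhood == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif neighborhood == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("neighborhood must be 4 or 8")
    region = set(pts)
    return sorted(p for p in region
                  if any((p[0] + dr, p[1] + dc) not in region
                         for dr, dc in offsets))

# Hypothetical region: a solid 3x3 block of pixels.
square = [(r, c) for r in range(3) for c in range(3)]
border = region_border(square)
print(pixel_count(square), region_center(square), region_rectangle(square))
print(len(border))  # perimeter estimate: 8 border pixels around one interior pixel
```

The choice of neighborhood matters for regions with notches or diagonal steps, where a pixel can be interior under the 4-pixel test but on the border under the 8-pixel test.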

Here are the perimeter lengths of the clusters obtained earlier.

In[14]:=

Out[14]=

Here is the border of one of the regions.

In[15]:=

Out[15]=

The similarity between regions or pixels is determined by measuring a distance in feature space. Many measures of the distance between two points in N-dimensional space have been proposed [Did80]. The most frequently used is the well-known Euclidean distance (EuclideanDistance). For two measurement vectors X = {x₁, x₂, ..., x_N} and Y = {y₁, y₂, ..., y_N}, the Euclidean distance between them, also commonly called the square-root distance, is defined as

$$d(X, Y) = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}$$

Other supported distance measures include the following Version 6 functions: SquaredEuclideanDistance, ManhattanDistance, ChebyshevDistance, CanberraDistance, CosineDistance, CorrelationDistance, and BrayCurtisDistance (see the tutorial Partitioning Data into Clusters), as well as the Digital Image Processing function MinkowskiDistance.
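For reference, several of these measures can be written out in a few lines of Python using their standard textbook definitions. The function names below are illustrative rather than the package's, and the exponent parameter p of the Minkowski sketch is an assumption about how that measure is parameterized.

```python
import math

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    """Sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p=2):
    """p-th root of the sum of p-th powers of absolute differences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
print(euclidean(x, y))          # 5.0
print(squared_euclidean(x, y))  # 25.0
print(manhattan(x, y))          # 7.0
print(chebyshev(x, y))          # 4.0
```

With p = 1 the Minkowski measure reduces to the Manhattan distance, and with p = 2 to the Euclidean distance.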

The default distance function for KMeans, KMeansList, and ClusterMetric is EuclideanDistance. However, any of the supported distance functions may be used instead.

Option name	Default value	Description
DistanceFunction	EuclideanDistance	option for distance measure used in clustering operations

Option for KMeans, KMeansList, and ClusterMetric.
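The effect of changing the distance measure can be illustrated with a short Python sketch in which the distance is passed as a parameter. This mirrors the role of the DistanceFunction option but is not the package's code; the name nearest_center, the distance_function parameter, and the sample points are assumptions made for the example.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def nearest_center(point, centers, distance_function=euclidean):
    """Index of the center closest to point under the given distance."""
    return min(range(len(centers)),
               key=lambda k: distance_function(point, centers[k]))

# Hypothetical cluster centers and a point to classify.
centers = [(3.0, 3.0), (0.0, 4.5)]
point = (0.0, 0.0)

print(nearest_center(point, centers))             # 0: Euclidean distances are about 4.24 and 4.5
print(nearest_center(point, centers, manhattan))  # 1: Manhattan distances are 6.0 and 4.5
```

The same point is assigned to different centers under the two measures, which is why the choice of distance function can change a segmentation result.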
