Clustering is a classification technique [Did80]. Given a vector of
N measurements describing each pixel or group of pixels (i.e., region) in an image, a similarity of the measurement vectors and therefore their clustering in the
N-dimensional measurement space implies similarity of the corresponding pixels or pixel groups. Therefore, clustering in measurement space may be an indicator of similarity of image regions, and may be used for segmentation purposes. The vector of measurements describes some useful image feature and thus is also known as a feature vector. Similarity between image regions or pixels implies clustering (small separation distances) in the feature space. Clustering methods were some of the earliest data segmentation techniques to be developed.
KMeans clustering finds a grouping of the measurements that minimizes the within-cluster sum-of-squares. In this method, each measurement, represented by a vector of length
N, is grouped so that it is assigned to one of a fixed number of clusters. The number of clusters is determined by the number of seeds given as the second argument of
KMeans. Measurements are transferred from one cluster to another when doing so decreases the within-cluster distances. The algorithm stops when no more transfers can occur. Here we will demonstrate the application of
KMeans clustering to a simple image segmentation problem. Consider the problem of programmatically manipulating individual red beans in a fragment of the
beans image.
The determination of similarity between regions or pixels is a result of measuring a distance in feature space. Many distance measures have been proposed between two points in N-dimensional space [Did80]. The most frequently used similarity measure is the well-known
EuclideanDistance. If we take two measurement vectors X={x
1,x
2,...x
n} and Y={y
1,y
2,...y
n}, the Euclidean distance between them, also commonly called the square-root distance, is defined as