Image Processing

The Wolfram Language provides built-in support for both programmatic and interactive image processing, fully integrated with the Wolfram Language's powerful mathematical and algorithmic capabilities. You can create and import images, manipulate them with built-in functions, apply linear and nonlinear filters, and visualize them in any number of ways.

Image Creation and Representation

Images can be created from numerical arrays, from Wolfram Language graphics via cut-and-paste methods, and from external sources via Import.

Image[data]       raster image with pixel values given by data
Import["file"]    import data from a file
CurrentImage[]    capture an image from a camera or other device

Image creation functions.

The simplest way to create an image object is to wrap Image around a matrix of real values ranging from 0 to 1.

Here is a one-channel image created from a matrix of numbers:
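As a minimal sketch (the matrix values are arbitrary and chosen only for illustration), a small grayscale image can be built and inspected like this:

```wolfram
(* 3 rows x 4 columns of real values in the range 0 to 1 *)
data = {{0., 0.25, 0.5, 1.}, {1., 0.5, 0.25, 0.}, {0.5, 0.5, 0.5, 0.5}};
img = Image[data];

ImageDimensions[img]  (* {4, 3}: width first, then height *)
ImageChannels[img]    (* 1 *)
```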

Another way is to copy and paste or drag and drop an image from some other application. You can use Import to obtain an image from a file on the local file system or any accessible remote location.

This imports an image from the Wolfram Language documentation directory ExampleData:

Useful properties of an image can be obtained by calling the following functions.

ImageDimensions[image]     give the pixel dimensions of the raster associated with image
ImageAspectRatio[image]    give the ratio of height to width of image
ImageChannels[image]       give the number of channels present in the data for image
ImageColorSpace[image]     give the color space associated with image
ImageType[image]           give the type of values used for each pixel element in image
ImageQ[image]              give True if image has the form of a valid Image object and False otherwise
Options[symbol]            give the list of default options assigned to a symbol
ImageData[image]           give the array of pixel values in image

Image properties.

This returns the image dimensions:

The image's array of pixel values can be easily extracted using the function ImageData. By default, the function returns real values, but you can ask for a specific type using the optional "type" argument.

This returns a fragment of the image as a matrix of real values scaled to the range 0 to 1:
Here is the same fragment as a matrix of integers in the range 0 to 255:
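A small sketch of the two return types, using a hypothetical 2×2 image instead of the example image (which is not reproduced here):

```wolfram
img = Image[{{0., 1.}, {1., 0.}}];

ImageData[img]          (* real values in the range 0 to 1 *)
ImageData[img, "Byte"]  (* {{0, 255}, {255, 0}} *)
```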

In the case of multichannel images, the raw pixel data is represented by a 3D array arranged in one of two possible ways as determined by the option Interleaving.

This imports a color image:

With the default setting Interleaving->True, the data is organized as a 2D array of lists of color values, a triplet in the common case of images in RGB color space.

This shows the default data organization:

The option setting Interleaving->False can be used to store and retrieve the raw data as a list of matrices, one for each of the color channels.

Here is a fragment of the example image arranged as a list of channel matrices:
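The two layouts can be compared on a tiny synthetic RGB image. This is only a sketch; the pixel values are made up for illustration:

```wolfram
(* 2x2 RGB image: each pixel is an {r, g, b} triplet *)
rgb = Image[{{{1., 0., 0.}, {0., 1., 0.}}, {{0., 0., 1.}, {1., 1., 1.}}}];

Dimensions[ImageData[rgb]]                         (* {2, 2, 3}: rows, columns, channels *)
Dimensions[ImageData[rgb, Interleaving -> False]]  (* {3, 2, 2}: channels, rows, columns *)
```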

A multichannel image can be split into a list of single-channel images and, conversely, a multichannel image can be created from any number of single-channel images.

This splits the example RGB color image into three grayscale images:
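One way to sketch the split and the reverse operation is with ColorSeparate and ColorCombine; a random image stands in for the example image here:

```wolfram
rgb = Image[RandomReal[1, {8, 8, 3}], ColorSpace -> "RGB"];

channels = ColorSeparate[rgb];          (* list of single-channel images *)
Length[channels]                        (* 3 *)

recombined = ColorCombine[channels, "RGB"];
ImageChannels[recombined]               (* 3 *)
```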

Coordinate Systems

Several image processing commands require or return positions in the image domain. To specify pixel positions, a coordinate system is required. Note that there is more than one coordinate system in use. This tutorial distinguishes between index coordinates and image coordinates.

Index Coordinates

Images are arrays of pixel data, and these arrays have row and column indices as inherent coordinates. Consequently, the Wolfram Language's part specification extends naturally as a discrete coordinate system to images.

Define an array of black and white data:
This extracts the top 10 rows and columns 9 through 15 from an array:
The corresponding image command with the same row and column specification:
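A possible version of these three steps, assuming a 20×20 checkerboard as the black and white data (the original array is not reproduced here):

```wolfram
data = Table[Boole[EvenQ[r + c]], {r, 20}, {c, 20}];  (* checkerboard of 0s and 1s *)

part = data[[1 ;; 10, 9 ;; 15]];
Dimensions[part]         (* {10, 7}: rows, then columns *)

sub = ImageTake[Image[data], {1, 10}, {9, 15}];
ImageDimensions[sub]     (* {7, 10}: width, then height *)
```

Note that Dimensions reports rows first, while ImageDimensions reports width first.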

Part specifications given by row and column indices are well-defined. However, the spatial embedding of an array is ambiguous.

The Graphics primitive Raster displays the rows of an array from bottom to top:
Image renders the rows of an array from top to bottom:
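The difference is easy to see with a two-row array. This is a sketch with made-up data:

```wolfram
data = {{1, 1, 1}, {0, 0, 0}};  (* first row white, second row black *)

Graphics[Raster[data]]  (* the white first row is drawn at the bottom *)
Image[data]             (* the white first row is drawn at the top *)
```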

The orientation of column and row coordinates depends on the spatial embedding. The first coordinate enumerating rows runs vertically, pointing up in the case of Raster and down in the case of Image. The second coordinate enumerating columns runs horizontally from left to right.

Index coordinates of a Raster of width $w$ and height $h$:
Index coordinates of an Image of width $w$ and height $h$:

Wolfram Language commands that operate on both images and data arrays adhere to the index coordinate system. These commands first list parameters that refer to the vertical row-coordinate and then list parameters that refer to the horizontal column-coordinate.

First-order Gaussian derivative along the row coordinate, in the vertical direction:
First-order Gaussian derivative along the column coordinate, in the horizontal direction:
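A sketch of both derivatives using GaussianFilter with a derivative-order list; the synthetic test pattern and the radius 3 are arbitrary choices:

```wolfram
img = Image[Table[Sin[r/4.]^2 Cos[c/6.]^2, {r, 64}, {c, 64}]];

dRow = GaussianFilter[img, 3, {1, 0}];  (* first derivative along rows (vertical) *)
dCol = GaussianFilter[img, 3, {0, 1}];  (* first derivative along columns (horizontal) *)

ImageAdjust /@ {dRow, dCol}             (* rescale for display *)
```

The order list follows the index convention: the row (vertical) order comes first, then the column (horizontal) order.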

Image Coordinates

The second coordinate system is not intrinsic to the data but attached to the embedding space. The continuous image coordinate system, like the graphics coordinate system, has its origin in the bottom-left corner of an image, with an $x$ coordinate extending from left to right and a $y$ coordinate running upward. For an image of width $w$ and height $h$, the image domain covers the 2D interval $[0, w] \times [0, h]$.

Standard image coordinates of an Image of width $w$ and height $h$:

Image pixels are covered by intervals between successive integer coordinate values. Thus, noninteger coordinates refer unambiguously to a single pixel. Integer coordinates located on pixel boundaries, however, take all immediate pixel neighbors into account, either by selecting all neighboring pixels or by taking their average color value.

Channel values at a standard image coordinate:
Average channel values at a standard image coordinate between pixels:
Trimmed image spanned by coordinates {5,5} and {16,16} and its neighboring pixels:
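The lookup rules described above can be tried with ImageValue on a tiny image. This is a sketch with made-up pixel values; the outputs are not listed because they depend on the resampling in effect:

```wolfram
img = Image[{{0., 0.25}, {0.5, 0.75}}];  (* rows listed from top to bottom *)

ImageValue[img, {0.5, 0.5}]  (* a position inside the bottom-left pixel *)
ImageValue[img, {1, 1}]      (* the corner shared by all four pixels *)
```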

Image processing commands that are not applicable to arbitrary arrays render their results in standard image coordinates. These standard image coordinates can readily be used in Graphics primitives.

ImageLines returns lines in standard image coordinates:

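For an image of height $h$, the conversion from the index coordinates $(r, c)$ of a pixel (row, column) to the standard image coordinates $(x, y)$ of its center is given by

$x = c - 1/2$ and $y = h - r + 1/2$,

or vice versa by

$r = h - y + 1/2$ and $c = x + 1/2$.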
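The conversion can be sketched as a pair of helper functions. The names toImageCoords and toIndexCoords are hypothetical. The formulas assume that the pixel in row r (counted from the top) and column c is centered at x = c - 1/2, y = h - r + 1/2, which follows from pixels covering the unit intervals between successive integer coordinates:

```wolfram
(* index coordinates {row, column} -> standard image coordinates {x, y} of the pixel center *)
toImageCoords[{r_, c_}, h_] := {c - 1/2, h - r + 1/2}

(* standard image coordinates {x, y} -> index coordinates {row, column} *)
toIndexCoords[{x_, y_}, h_] := {h - y + 1/2, x + 1/2}

toImageCoords[{1, 1}, 10]       (* {1/2, 19/2}: top-left pixel of a 10-row image *)
toIndexCoords[{1/2, 19/2}, 10]  (* {1, 1} *)
```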

A slightly modified version of the standard image coordinate system is the normalized image coordinate system, in which the image width is scaled to 1.

Normalized image coordinates of an Image of width $w$ and height $h$:

At times, it is more convenient to use normalized coordinates to specify operations that are independent of image dimensions.

Shift of an image by a fraction of the image width:
Same shift in pixel units:

Another modified version of the standard image coordinate system is the pixel-aligned coordinate system. The origin of this coordinate system is shifted by 1/2 to the left and down with respect to the standard image coordinate system to align the integer coordinates with pixel centers.

Pixel-aligned image coordinates of an Image of width $w$ and height $h$:

PixelValue[image,{x,y}]                 give the pixel value of image at position {x,y}
PixelValuePositions[image,val]          return a list of pixel positions in image that match the value val
ReplacePixelValue[image,{xp,yp}->val]   change the pixel values at pixel position {xp,yp} in image to val

Commands using pixel-aligned image coordinates.
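A short sketch of these commands on a hypothetical 2×2 image, with values chosen so that each pixel is easy to identify:

```wolfram
img = Image[{{0., 0.25}, {0.5, 0.75}}];  (* rows listed from top to bottom *)

PixelValue[img, {1, 1}]          (* 0.5: bottom-left pixel *)
PixelValue[img, {2, 2}]          (* 0.25: top-right pixel *)
PixelValuePositions[img, 0.75]   (* {{2, 1}}: bottom-right pixel *)

ReplacePixelValue[img, {1, 2} -> 1]  (* sets the top-left pixel to white *)
```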

Basic Image Manipulation

Consider the image manipulation operations that change the image dimensions by cropping or padding. These operations serve a variety of useful purposes. Cropping allows you to create a new image from a selected portion of a larger one, while padding is typically used to extend an image at the borders to ensure uniform treatment of the border pixels in many image processing tasks.

ImageTake[image,n]             give an image consisting of the first n rows of image
ImageCrop[image]               crop image by removing borders of uniform color
ImageTrim[image,{{x1,y1},…}]   trim image to include the specified {xi,yi} pixels
ImagePad[image,m]              pad image on all sides with m background pixels

Image cropping and padding operations.

This selects the first 50 rows of the example image:

ImageCrop conveniently complements ImageTake. Instead of specifying the exact number of rows or columns to be extracted, it allows you to define the desired dimensions of the resulting image, namely, the number of rows or columns that are to be retained. By default, the cropping operation is centered, thus an equal number of rows and columns is deleted from the edges of the image.

Here a 100×100 pixel region is extracted from the center of the example image:
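A sketch of both operations, with a random 160×120 image standing in for the example image:

```wolfram
img = Image[RandomReal[1, {120, 160}]];  (* 120 rows, 160 columns *)

ImageDimensions[ImageTake[img, 50]]          (* {160, 50}: first 50 rows *)
ImageDimensions[ImageCrop[img, {100, 100}]]  (* {100, 100}: centered crop *)
```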

While ImageCrop is primarily used to reduce the dimensions of the source image, it is frequently desirable to pad an image to increase its dimensions. All the most common padding methods are supported.

This shows four different padding methods applied to the right edge of the example image:
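A possible way to compare padding methods on the right edge. The horizontal ramp image and the choice of four paddings are assumptions made for this sketch:

```wolfram
img = Image[Table[c/40., {r, 30}, {c, 40}]];  (* horizontal ramp, 40 wide, 30 high *)

(* pad 20 pixels on the right only: {{left, right}, {bottom, top}} *)
padded = ImagePad[img, {{0, 20}, {0, 0}}, #] & /@ {"Fixed", "Periodic", "Reflected", 0.5};

ImageDimensions /@ padded  (* each {60, 30} *)
```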

It is frequently necessary to change the dimensions of an image by resampling, or to reposition it in some manner. Functions that perform these basic geometric tasks are readily available.

ImageResize[image,w]   give a resized version of image that is w pixels wide
Thumbnail[image]       give a thumbnail version of image
ImageRotate[image]     rotate image counterclockwise by 90°
ImageReflect[image]    reverse image by top-bottom mirror reflection

Spatial operations.

Here, ImageResize is used to increase and diminish the size of the original image, respectively:

ImageRotate is another common spatial operation. It results in an image whose pixel positions are all rotated counterclockwise with respect to a pivot point centered on the image.

This rotates the example image by 30 degrees:
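Resizing and rotation can be sketched as follows, again with a random image in place of the example image:

```wolfram
img = Image[RandomReal[1, {120, 160}]];  (* 160 wide, 120 high *)

ImageDimensions[ImageResize[img, 80]]         (* {80, 60}: aspect ratio preserved *)
ImageDimensions[ImageResize[img, Scaled[2]]]  (* {320, 240} *)

ImageRotate[img, 30 Degree]  (* counterclockwise rotation about the image center *)
```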

Several useful image processing tasks require nothing more than simple arithmetic operations between two images or an image and a constant. For example, you can change brightness by multiplying an image by a constant factor or by adding (subtracting) a constant to (from) an image. More interestingly, the difference of two images can be used to detect change and the product of two images can be used to hide or highlight regions in an image in a process called masking. For this purpose, three basic arithmetic functions are available.

ImageAdd[image,x]        add an amount x to each channel value in image
ImageSubtract[image,x]   subtract a constant amount x from each channel value in image
ImageMultiply[image,x]   multiply each channel value in image by a factor x

Arithmetic operations.

Here is an example of image blending using addition and multiplication:
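One simple form of blending is an equal-weight average of two images. The two ramp images below are hypothetical inputs used only for this sketch:

```wolfram
a = Image[Table[c/64., {r, 64}, {c, 64}]];  (* horizontal ramp *)
b = Image[Table[r/64., {r, 64}, {c, 64}]];  (* vertical ramp *)

(* scale each image by 0.5, then add *)
blend = ImageAdd[ImageMultiply[a, 0.5], ImageMultiply[b, 0.5]]
```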

Image Processing by Point Operations

Point operations constitute a simple but important class of image processing operations. These operations change the luminance values of an image and therefore modify how an image appears when displayed. The terminology originates from the fact that point operations take single pixels as inputs. This can be expressed as

$$g(i, j) = T[f(i, j)]$$

where $T$ is a grayscale transformation that specifies the mapping between the input image $f$ and the result $g$, and $i$, $j$ denote the row and column indices of the pixel. Point operations are a one-to-one mapping between the pixels of the original (input) and modified (output) images according to some function defining the transformation $T$.

Contrast Modification

Contrast modifying point operations frequently encountered in image processing include negation (grayscale or color), gamma correction, which is a power-law transformation, and linear or nonlinear contrast stretching.

Lighter[image]        give a lighter version of an image
Darker[image]         give a darker version of an image
ColorNegate[image]    give the negative of image, in which all colors have been negated
ImageAdjust[image]    adjust the levels in image, rescaling them to cover the range 0 to 1
ImageApply[f,image]   apply f to the list of channel values for each pixel in image

Selected point operators.

One of the simplest examples of a point transformation is negation. For a grayscale image $f$ with values in the range 0 to 1, the negative $g$ is defined by

$$g(i, j) = 1 - f(i, j)$$

It is applied to every pixel in the source image. In the case of multichannel images, the same transformation is applied to each color value of every pixel.

This shows the original example image and its digital negative:

The function ImageAdjust can be used to perform most of the commonly needed contrast stretching and power-law transformations, while ImageApply enables you to realize any desired point transformation whatsoever.

This increases contrast using linear scaling:
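A sketch of negation and linear contrast stretching on a synthetic low-contrast ramp (an assumption standing in for the example image):

```wolfram
(* values run from about 0.3 to 0.7, so the image looks flat *)
img = Image[Table[0.3 + 0.4 c/64., {r, 32}, {c, 64}]];

neg = ColorNegate[img];        (* each value v becomes 1 - v *)
stretched = ImageAdjust[img];  (* rescale so the values cover the range 0 to 1 *)

{Min[#], Max[#]} &[ImageData[stretched]]  (* {0., 1.} *)
```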

As an example of a nonlinear contrast stretching operation, consider the following transformation called sigma scaling. Assuming the default range of 0 to 1, the transformation is defined by


This defines the transformation:
Here are several plots of the transformation for different values of the variance parameter:
This shows the effect of the transformation on the example image:

Image binarization is the operation of converting a multilevel image into a binary image. In a binary image, each pixel value is represented by a single binary digit. In its simplest form, binarization, also called thresholding, is a point-based operation that assigns the value 0 or 1 to each pixel of an input image $f$ based on a comparison with some global threshold value $t$, giving the binary image $g$:

$$g(i, j) = \begin{cases} 1 & \text{if } f(i, j) > t \\ 0 & \text{otherwise} \end{cases}$$

Thresholding is an attractive early processing step because it leads to significant reduction in data storage and results in binary images that are simpler to analyze. Binary images permit the use of powerful morphological operators for shape and structure-based analysis of image content. Binarization is also a form of image segmentation, as it divides an image into distinct regions.

Binarize[image]          create a binary image from image
ColorQuantize[image,n]   give an approximation to image that uses only n distinct colors

Quantization functions.

Color images are first converted to grayscale prior to thresholding. If the threshold value is not explicitly given, an optimal value is calculated using one of several well-known methods.

Here is the default binarization based on Otsu's method for optimal threshold selection:
Here ImageApply is used to return a color image in which each individual channel is binarized, resulting in a maximum of eight distinct colors:
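Both forms of binarization can be sketched on small synthetic images. The fixed threshold 0.5 and the use of UnitStep for the per-channel case are choices made for this illustration, not necessarily those of the original example:

```wolfram
gray = Image[{{0.2, 0.6}, {0.9, 0.4}}];

ImageData[Binarize[gray, 0.5]]  (* {{0, 1}, {1, 0}}: values above 0.5 become 1 *)
Binarize[gray]                  (* threshold chosen automatically *)

(* per-channel thresholding of a color image: every channel value becomes 0 or 1 *)
rgb = Image[RandomReal[1, {16, 16, 3}], ColorSpace -> "RGB"];
perChannel = ImageApply[UnitStep[# - 0.5] &, rgb]
```

Since each of the three channels can only be 0 or 1, the second result contains at most 2^3 = 8 distinct colors.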

Color Conversion

Four color spaces are currently supported: RGB (red, green, and blue), CMYK (cyan, magenta, yellow, and black), HSB (hue, saturation, and brightness), and grayscale.

In practice, the RGB (red, green, blue) color scheme is the most frequently used color representation. The three so-called primary colors are combined (added) in various proportions to produce a composite, full-color image. The RGB color model is universally used in color monitors, video recorders, and cameras. Also, the human visual system is tuned to perceive color as a variable combination of these primary colors. The primary colors added in pairs in equal amounts produce the secondary colors of light: cyan (C), magenta (M), and yellow (Y). These are the primary pigment colors used in the printing industry, hence the relevance of the CMY color model. For image processing applications it is often useful to separate the color information from luminance. The HSB (hue, saturation, brightness) model has this property. Hue represents the dominant color as seen by an observer, saturation refers to the amount of dilution of the color with white light, and brightness defines the average luminance. The luminance component may, therefore, be processed independently of the image's color information.

ColorConvert[expr, colspace]   convert color specifications in expr to refer to the color space represented by colspace

Color conversion function.

This shows the conversion results from an RGB source to the remaining supported color spaces:
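A sketch of the three conversions, checking only the resulting channel counts; a random RGB image is assumed in place of the example image:

```wolfram
rgb = Image[RandomReal[1, {16, 16, 3}], ColorSpace -> "RGB"];

converted = ColorConvert[rgb, #] & /@ {"Grayscale", "HSB", "CMYK"};
ImageChannels /@ converted  (* {1, 3, 4} *)
```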

Note that the RGB->Grayscale transformation uses the weighting coefficients recommended for U.S. broadcast television (NTSC) and later incorporated into the CCIR 601 standard for digital video.

Image Histogram

An important concept common to many image enhancement operations is that of a histogram, which is simply a count (or relative frequency, if normalized) of the gray levels in the image. Analysis of the histogram gives useful information about image contrast. Image histograms are important in many areas of image processing, most notably compression, segmentation, and thresholding.

ImageLevels[image]      give a list of pixel values and counts for each channel in image
ImageHistogram[image]   plot a histogram of the pixel levels for each channel in image

Image histogram functions.

This shows two different histogram visualization methods:
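One possible pair of visualizations is the built-in histogram plot and a line plot of the level counts. The random grayscale image and the bin count of 64 are assumptions for this sketch:

```wolfram
img = Image[RandomReal[1, {64, 64}]];

ImageHistogram[img]                (* built-in histogram plot *)
ListLinePlot[ImageLevels[img, 64]] (* {value, count} pairs for 64 bins *)
```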

Image Processing by Area Operations

Most useful image processing operators are area based. Area-based operations calculate a new pixel value based on the values in a local, typically small, neighborhood. This is usually implemented through a linear or nonlinear filtering operation with a finite-sized operator (i.e., a filter). Without loss of generality, consider a centered and symmetric 3×3 neighborhood of the image pixel at position $(i, j)$, with value $f(i, j)$. A general area-based transformation can be expressed as

$$g(i, j) = T[f(i-1, j-1), f(i-1, j), f(i-1, j+1), f(i, j-1), f(i, j), f(i, j+1), f(i+1, j-1), f(i+1, j), f(i+1, j+1)]$$

where $g$ is the output image resulting from applying transformation $T$ to the 3×3 centered neighborhoods of all the pixels in input image $f$. It should be noted that the spatial dimensions and geometry of the neighborhood are generally determined by the needs of the application. Examples of image processing region-based operations include noise reduction, edge detection, edge sharpening, image enhancement, and segmentation.

Linear and Nonlinear Filtering

Linear image filtering using convolution is one of the most common methods of processing images. To achieve a desired result you must specify an appropriate filter. Tasks such as smoothing, sharpening, edge finding, and zooming are typical examples of image processing tasks that have convolution-based implementations. Other tasks, such as noise removal, are better accomplished using nonlinear processing techniques.

ImageFilter[f,image,r]     apply f to the range r of each pixel in each channel of image
ImageConvolve[image,ker]   give the convolution of image with kernel ker

General filtering operators.

Here is a typical blurring operation using one of the smoothing filters:
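A sketch of two smoothing filters applied to a synthetic disk image; the 3×3 box kernel and the Gaussian radius are arbitrary choices:

```wolfram
img = Image[N[DiskMatrix[12, 41]]];  (* white disk on a black 41x41 background *)

box = ImageConvolve[img, BoxMatrix[1]/9];  (* 3x3 moving average *)
gauss = GaussianFilter[img, 3];            (* Gaussian smoothing, pixel radius 3 *)

{img, box, gauss}
```

Dividing the box kernel by 9 makes its entries sum to 1, so overall brightness is preserved.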

The more general (but slower) ImageFilter function can be used in cases when traditional linear filtering is not possible and the desired operation is not implemented by any of the built-in filtering functions.

This calculates the maximum range of values within a small neighborhood of each pixel:
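The range computation can be sketched with a pure function that receives each 3×3 neighborhood as a matrix (the disk image is a stand-in for the example image):

```wolfram
img = Image[N[DiskMatrix[12, 41]]];

(* Max and Min flatten the neighborhood, so the result is its value range *)
range = ImageFilter[Max[#] - Min[#] &, img, 1]
```

The result is nonzero only where a neighborhood straddles the disk boundary.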

A large number of linear and nonlinear operators are available as built-in functions. Here is a partial listing.

Blur[image]                 give a blurred version of image
Sharpen[image]              give a sharpened version of image
MeanFilter[image,r]         replace every value by the mean value in its range r
GaussianFilter[image,r]     convolve with a Gaussian kernel of pixel radius r
MedianFilter[image,r]       replace every value by the median in its range r
MinFilter[image,r]          replace every value by the minimum in its range r
CommonestFilter[image,r]    replace each pixel with the most common pixel value in its range r

Common linear and nonlinear filtering operators.

One of the more common applications of linear filtering in image processing has been in the computation of approximations of discrete derivatives and consequently edge detection. The well-known methods of Prewitt, Sobel, and Canny are all essentially based on the calculation of two orthogonal derivatives at each point in an image and the gradient magnitude.

Here are the two Sobel filters:
This returns the edges of a grayscale image using Sobel filters:
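A minimal sketch of Sobel edge detection. The test image is synthetic, and the gradient magnitude is computed on the extracted data arrays, which assumes that ImageConvolve keeps negative responses for real-valued images:

```wolfram
img = Image[N[Table[Boole[20 < r < 44 && 20 < c < 44], {r, 64}, {c, 64}]]];  (* white square *)

sx = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};  (* responds to changes along columns *)
sy = Transpose[sx];                         (* responds to changes along rows *)

gx = ImageData[ImageConvolve[img, sx]];
gy = ImageData[ImageConvolve[img, sy]];

edges = ImageAdjust[Image[Sqrt[gx^2 + gy^2]]]  (* gradient magnitude, rescaled for display *)
```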

As a second example, consider the task of removing the impulsive noise, which is called salt noise due to its visual appearance, from an image. This is a classic example contrasting the different outcomes resulting from a linear moving-average and a nonlinear moving-median calculation.

This creates a small image with impulsive noise:
Here is the side-by-side comparison:
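The comparison can be sketched as follows. The noise density of about 5%, the gray level 0.4, and the random seed are all assumptions made for illustration:

```wolfram
SeedRandom[1];
salt = RandomChoice[{0.95, 0.05} -> {0., 1.}, {32, 32}];  (* roughly 5% of entries are 1 *)
noisy = Image[Clip[0.4 + salt, {0., 1.}]];                (* uniform gray with white specks *)

{noisy, MeanFilter[noisy, 1], MedianFilter[noisy, 1]}
```

The mean filter spreads each speck over its 3×3 neighborhood, while the median filter discards isolated outliers.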

Clearly, the median filter returns the better result.

Morphological Processing

Mathematical morphology provides an approach to the processing of digital images that is based on the spatial structure of objects in a scene. In binary morphology, unlike linear and nonlinear operators discussed so far, morphological operators modify the shape of pixel groupings instead of their amplitude. However, in analogy with these operators, binary morphological operators may be implemented using convolution-like algorithms with the fundamental operations of addition and multiplication replaced by logical OR and AND.

Dilation[image,r]   give the dilation with respect to a range-r square
Erosion[image,r]    give the erosion with respect to a range-r square

Fundamental morphological operators.

This shows the dilation (left) and erosion (right) of a binary image (center) using a 5×5 uniform structuring element:
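A sketch with a synthetic binary disk; range 2 corresponds to a 5×5 square structuring element:

```wolfram
bin = Image[ArrayPad[DiskMatrix[8], 6], "Bit"];  (* white disk with a black margin *)

{Dilation[bin, 2], bin, Erosion[bin, 2]}  (* grown disk, original, shrunken disk *)
```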

The definitions of binary morphology extend naturally to the domain of grayscale images with Boolean AND and OR becoming pointwise minimum and maximum operators, respectively. For a uniform, zero-valued structuring element $b$ with domain $B$, the dilation of an image $f$ reduces to the following simple form:

$$(f \oplus b)(i, j) = \max_{(k, l) \in B} f(i - k, j - l)$$

This shows the dilation (left) and erosion (right) of the example color image (center) using a 5×5 uniform structuring element:

These operators can be used in combinations using a single structuring element or a list of such elements to perform many useful image processing tasks. A partial listing includes thinning, thickening, edge and corner detection, and background normalization.

This uses dilation and erosion to detect edges in a grayscale image:
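One common combination is the difference between the dilation and the erosion, often called the morphological gradient. The sketch below assumes a synthetic disk image and a 3×3 structuring element:

```wolfram
img = Image[N[DiskMatrix[12, 41]]];

(* large only where the local maximum and the local minimum differ, i.e. at edges *)
edges = ImageSubtract[Dilation[img, 1], Erosion[img, 1]]
```
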
GeodesicDilation[marker,mask]    give the fixed point of the geodesic dilation of the image marker constrained by the image mask
GeodesicErosion[marker,mask]     give the fixed point of the geodesic erosion of the image marker constrained by the image mask
DistanceTransform[image]         give the distance transform of image, in which the value of each pixel is replaced by its distance to the nearest background pixel
MorphologicalComponents[image]   give an array in which each pixel of image is replaced by an integer index representing the connected foreground image component in which the pixel lies

Selected morphological functions.

An important category of morphological algorithms, called morphological reconstruction, is based on repeated application of dilation (or erosion) to a marker image, while the result of each step is constrained by a second image, the mask. The process ends when a fixed point is reached. Interestingly, many image processing tasks have a natural formulation in terms of reconstruction. Peak and valley detection, hole filling, region flooding, and hysteresis threshold are just a few examples. The latter, also known as a double threshold, is an integral part of the widely used Canny edge detector. Pixels falling below the low threshold are rejected, pixels above the high threshold are accepted, while pixels in the intermediate range are accepted only if they are "connected" to the high threshold pixels. Connectivity may be established using a variety of algorithms, but reconstruction gives an effective and very simple solution.

Here are the low, high, and double threshold images, respectively:
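Hysteresis thresholding by reconstruction can be sketched on a synthetic image with one strong and one weak blob. The thresholds 0.3 and 0.7 and the blob shapes are assumptions for this illustration:

```wolfram
(* strong blob peaking at 1 and a separate weak blob peaking at 0.5 *)
img = Image[Table[
    Max[Exp[-((r - 20)^2 + (c - 20)^2)/60.], 0.5 Exp[-((r - 48)^2 + (c - 48)^2)/60.]],
    {r, 64}, {c, 64}]];

low = Binarize[img, 0.3];   (* contains both blobs *)
high = Binarize[img, 0.7];  (* contains only the core of the strong blob *)

(* grow the high-threshold marker inside the low-threshold mask *)
double = GeodesicDilation[high, low];

{low, high, double}
```

The weak blob passes the low threshold but is not connected to any high-threshold pixel, so it should be absent from the double threshold result.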
This clears all the symbols: