8.8 Example: Block Transform Coding
The transform coding method compresses image data by representing the original signal with a small number of transform coefficients. It exploits the fact that for typical images a large amount of signal energy is concentrated in a small number of coefficients.
The goal of transform coding is to minimize the number of retained transform coefficients while keeping distortion at an acceptable level.
Transform coding is an integral part of one of the most widely known standards for lossy image compression, the JPEG (Joint Photographic Experts Group) standard.
What is the best transform to use for image compression? The discrete Karhunen-Loève (KL) transform, also known as the Hotelling transform, maximizes the number of coefficients that are small enough to be discarded. Equivalently, the KL transform minimizes the mean squared error between the original and reconstructed images for a given number of retained coefficients. However, since no fast algorithm exists for computing the KL transform, the discrete cosine transform (DCT) is typically used instead.
This loads the package.
In[1]:=
This loads one of the example images.
In[2]:=
Block transform coding divides an image into blocks of equal size and processes each block independently. Block processing allows the coder to adapt to local image statistics, exploit the correlation present among neighboring image pixels, and reduce computational and storage requirements. JPEG uses 8 × 8 blocks. Larger blocks do not offer significantly better compression, since the correlation between pixels tends to decrease as the distance between them, and hence the block size, grows. A fixed block size allows the design of optimized software and hardware. Typically the blocks do not overlap, which significantly reduces computational complexity. However, overlapping block processing has proven useful in reducing distortion at very high compression factors.
Here we compute the block DCT of an example image. As required by JPEG, the image is first normalized with respect to the middle gray level, that is, the middle gray level is subtracted from every pixel.
In[3]:=
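The block DCT step can be sketched outside the package as follows. This is a minimal plain-Python illustration, not the package's own implementation: the names `dct_matrix`, `matmul`, and `block_dct` are hypothetical, an orthonormal DCT-II is assumed, the middle gray level is assumed to be 128 (8-bit data), and the image dimensions are assumed to be multiples of the block size.

```python
import math

N = 8  # JPEG block size


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: row u holds the u-th cosine basis vector."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def block_dct(image, n=N, mid_gray=128):
    """Subtract mid_gray, then transform each non-overlapping n x n block.

    Returns a dict mapping (block_row, block_col) to an n x n coefficient block.
    Assumes the image height and width are multiples of n.
    """
    d = dct_matrix(n)
    dt = transpose(d)
    blocks = {}
    for r in range(0, len(image), n):
        for c in range(0, len(image[0]), n):
            tile = [[image[r + i][c + j] - mid_gray for j in range(n)]
                    for i in range(n)]
            blocks[(r // n, c // n)] = matmul(matmul(d, tile), dt)
    return blocks


# Usage: a flat 8 x 16 image (two blocks) at gray level 138.
flat = [[138] * 16 for _ in range(8)]
coeffs = block_dct(flat)
print(sorted(coeffs))               # the two block positions
print(round(coeffs[(0, 0)][0][0]))  # DC term; all AC terms are ~0 for a flat block
```

For a constant block all of the energy lands in the single DC coefficient, which is the extreme case of the energy compaction that transform coding relies on.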
Here a multiblock segment of the original image and the corresponding DCT blocks are shown.
In[4]:=
Out[4]=
This selects an 8×8 block of DCT coefficients for use in the next few calculations.
In[5]:=
Out[5]//MatrixForm=
An inspection of the block cosine transform reveals that a large number of the cosine coefficients have relatively small values. For purposes of compression, it is of interest to discard these small values. This may be implemented in a number of alternative ways. The retained coefficients may be selected based on their position in the DCT block, typically the low-frequency zone of the transform domain. Alternatively, they may be selected based on their value relative to a fixed threshold. The zone method is simpler, but inefficient, since large-valued coefficients outside of the prescribed zone may be discarded, increasing the approximation error. For this reason the threshold method, with its ability to capture all large-valued coefficients and adapt to the local image statistics, is preferred.
Here is a typical triangular zone mask that retains 15 of a total of 64 DCT coefficients.
In[6]:=
Out[6]//MatrixForm=
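A triangular zone mask of this size can be built by keeping the positions on the first five anti-diagonals of the block. The sketch below is one such construction in plain Python; the function names are hypothetical, and the exact shape of the mask in the original output is an assumption.

```python
def triangular_zone_mask(n=8, diagonals=5):
    """1 where row + column < diagonals (the low-frequency corner), else 0."""
    return [[1 if i + j < diagonals else 0 for j in range(n)] for i in range(n)]


def apply_mask(block, mask):
    """Elementwise product: coefficients outside the zone become 0."""
    return [[b * m for b, m in zip(brow, mrow)]
            for brow, mrow in zip(block, mask)]


mask = triangular_zone_mask()
retained = sum(map(sum, mask))  # 1 + 2 + 3 + 4 + 5 = 15 of 64 positions
for row in mask:
    print(row)
```

Because the mask depends only on position, the decoder knows in advance which coefficients were kept, which is what makes the zone method simple.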
Here are the DCT coefficients of the block after application of the mask.
In[7]:=
Out[7]//MatrixForm=
The 15 remaining coefficients define an approximation of the original signal. The distortion, or approximation error, may be measured in a number of ways. A commonly used metric is the signal-to-noise ratio (SNR). In the present case, the noise is the approximation error, that is, the difference between the original and reconstructed signals. SNR may be defined as

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{i,j} C[i,j]^{2}}{\sum_{i,j} \left(C[i,j] - C_{Z}[i,j]\right)^{2}}$$

where $C[i,j]$ and $C_{Z}[i,j]$ represent the original and zone-masked DCT coefficients, respectively. The SNR is typically given in units of decibels (dB), hence the factor $10 \log_{10}$. The numerator is the signal power, while the denominator is the error signal power. The SNR in decibels can be calculated using the ImageProcessing function ImageEnergy.
In[8]:=
Out[8]=
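The SNR definition (signal power over error power, expressed in decibels) can be checked by hand with a few lines of plain Python. This is a stand-in for illustration only; it does not reproduce the package's ImageEnergy function, and the name `snr_db` and the small 2 × 2 arrays are hypothetical.

```python
import math


def snr_db(original, approx):
    """10*log10(signal power / error power) for two equally sized 2D arrays."""
    signal = sum(o * o for row in original for o in row)
    noise = sum((o - a) ** 2
                for orow, arow in zip(original, approx)
                for o, a in zip(orow, arow))
    if noise == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal / noise)


# Illustrative values: masking removes the two small coefficients.
original = [[100.0, 10.0], [10.0, 0.0]]
masked = [[100.0, 0.0], [0.0, 0.0]]
value = snr_db(original, masked)
print(value)  # signal power 10200, error power 200, roughly 17.1 dB
```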
We now turn to the threshold method. We pick a threshold, say T=30, and discard all coefficients with magnitudes smaller than this threshold. The threshold was selected to yield the same number of coefficients as the zone mask.
In[9]:=
Out[9]//MatrixForm=
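Threshold masking keeps a coefficient only if its magnitude reaches the threshold, regardless of its position in the block. A minimal sketch, with a hypothetical function name and illustrative input values rather than the example image's data:

```python
def threshold_mask(block, t=30):
    """Zero every coefficient whose magnitude is smaller than t."""
    return [[v if abs(v) >= t else 0 for v in row] for row in block]


# Illustrative row of coefficients.
row = [[-474, 29, -30, 12, 102, -5, 0, 41]]
kept = threshold_mask(row, 30)
print(kept)  # [[-474, 0, -30, 0, 102, 0, 0, 41]]
```

Unlike the zone mask, the surviving positions differ from block to block, so they have to be conveyed to the decoder along with the values.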
The SNR improves by approximately 4 dB when threshold masking is applied instead of zone masking.
In[10]:=
Out[10]=
The threshold masking method improves image fidelity, but complicates the reconstruction process. Note that the positions of the coefficients need to be known in order to correctly reconstruct an approximation of the original image. The JPEG standard adopted a clever scheme to order the coefficients in a 1D list, so that a coefficient's position in this list uniquely identifies its place in the 2D coefficient array. This is accomplished by ordering all the coefficients according to the following zig-zag scanning pattern:
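 0  1  5  6 14 15 27 28
 2  4  7 13 16 26 29 42
 3  8 12 17 25 30 41 43
 9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63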
Additionally, all zeros beyond the last nonzero value are dropped and replaced by a special end-of-block (EOB) symbol. Here is the list of coefficients within the block, listed in zig-zag order:
{-474,-68,0,0,-62,102,0,-71,73,0,0,0,41,-58,45,70,0,64,-42,0,0,0,0,0,0,0,-33,-57,42,EOB}
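The zig-zag scan and the EOB truncation can be sketched in plain Python. The function names are hypothetical, and the scan is assumed to follow the usual JPEG convention (start at the top-left corner, move right first, then alternate direction along each anti-diagonal). The list `scan` is the coefficient list quoted above.

```python
N = 8
EOB = "EOB"


def zigzag_positions(n=N):
    """(row, col) pairs in zig-zag order: walk anti-diagonals, alternating direction."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Odd diagonals run top-right to bottom-left (row ascending),
    # even diagonals run bottom-left to top-right (column ascending).
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else p[1]))


def zigzag_encode(block):
    """Flatten a block in zig-zag order, drop trailing zeros, append EOB."""
    flat = [block[i][j] for i, j in zigzag_positions(len(block))]
    while flat and flat[-1] == 0:
        flat.pop()
    return flat + [EOB]


def zigzag_decode(seq, n=N):
    """Rebuild the n x n block; positions after EOB are filled with zeros."""
    values = [v for v in seq if v != EOB]
    values += [0] * (n * n - len(values))
    block = [[0] * n for _ in range(n)]
    for v, (i, j) in zip(values, zigzag_positions(n)):
        block[i][j] = v
    return block


scan = [-474, -68, 0, 0, -62, 102, 0, -71, 73, 0, 0, 0, 41, -58, 45, 70,
        0, 64, -42, 0, 0, 0, 0, 0, 0, 0, -33, -57, 42]
block = zigzag_decode(scan + [EOB])
for row in block:
    print(row)
```

Decoding and re-encoding returns the same list, which shows that a coefficient's index in the 1D list is enough to recover its place in the 2D array.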
The retained coefficients need to be quantized and coded to complete the compression process. In practice, the threshold and quantization steps are combined by dividing the coefficient values by scaling factors stored in a quantization table Q and rounding the results to integers, so that small coefficients become zero. The standard luminance quantization table has the following scaling factors.
In[11]:=
Here is the result of scaling the block with quantization mask Q.
In[12]:=
Out[12]//MatrixForm=
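The combined threshold-and-quantize step amounts to an elementwise division followed by rounding. The sketch below assumes the example luminance table from Annex K of the JPEG standard (the values should be checked against the standard before being relied on) and uses hypothetical function names. Python's `round` resolves exact halves to the nearest even integer, which may differ from other rounding conventions.

```python
# Example luminance quantization table (JPEG standard, Annex K); assumed values.
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(block, table=Q_LUMA):
    """Divide each coefficient by its scaling factor and round to an integer."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(block, table)]


def dequantize(levels, table=Q_LUMA):
    """Approximate inverse: multiply each level by its scaling factor."""
    return [[l * q for l, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]


# Illustrative block: a few large low-frequency values and one small
# high-frequency value.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[0][2], block[1][1] = -474, -68, 102, -62
block[7][7] = 5
levels = quantize(block)
print(levels[0][:3], levels[1][1], levels[7][7])  # [-30, -6, 10] -5 0
```

The large scaling factors toward the bottom-right corner push small high-frequency coefficients to zero, so a separate threshold pass is not needed.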
The scaling factors may be changed according to a quality factor q with values
. A factor of
results in images practically indistinguishable from the original. As the factor increases, the distortion in the compressed image increases. Here we compute the quantized block DCT coefficients of the example image.
In[13]:=
The ratio of the number of retained coefficients to the total number of DCT coefficients (i.e., $N^{2}$) may serve as an approximate measure of compression due to the transform coding operations.
In[14]:=
Out[14]=
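This ratio is a simple count of nonzero quantized coefficients over all coefficients. A minimal sketch with a hypothetical function name and illustrative 2 × 2 blocks:

```python
def retained_ratio(blocks):
    """Fraction of coefficients that are nonzero across a list of 2D blocks."""
    total = 0
    kept = 0
    for block in blocks:
        for row in block:
            total += len(row)
            kept += sum(1 for v in row if v != 0)
    return kept / total


# Illustrative data: 3 nonzero values out of 8 coefficients.
blocks = [[[-30, 0], [2, 0]], [[5, 0], [0, 0]]]
ratio = retained_ratio(blocks)
print(ratio)  # 0.375
```

The figure is only approximate as a compression measure because it ignores the cost of coding the positions and the values themselves.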
JPEG decoding reverses the processing steps described above. We now compute an approximation of our example image using the standard quantization table and evaluate the approximation error.
In[15]:=
In[16]:=
Out[16]=
Here are the error signal's dynamic range and SNR (in dB).
In[17]:=
Out[17]=
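The whole encode and decode round trip for a single block can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the package's procedure: an orthonormal DCT-II, a middle gray level of 128, the example luminance table from Annex K of the JPEG standard, a synthetic gradient block in place of the example image, and hypothetical function names throughout.

```python
import math

N = 8
Q_LUMA = [  # example luminance table (JPEG standard, Annex K); assumed values
    [16, 11, 10, 16, 24, 40, 51, 61], [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56], [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77], [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101], [72, 92, 95, 98, 112, 100, 103, 99],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Orthonormal DCT-II matrix D and its transpose, so D * D^T is the identity.
D = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
     for u in range(N)]
DT = [list(row) for row in zip(*D)]


def dct2(block):
    return matmul(matmul(D, block), DT)


def idct2(coeffs):
    return matmul(matmul(DT, coeffs), D)


def encode_block(pixels, table=Q_LUMA):
    """Level shift, forward DCT, then divide by the table and round."""
    shifted = [[p - 128 for p in row] for row in pixels]
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(dct2(shifted), table)]


def decode_block(levels, table=Q_LUMA):
    """Rescale, inverse DCT, undo the level shift, round and clip to 0..255."""
    coeffs = [[l * q for l, q in zip(lrow, qrow)]
              for lrow, qrow in zip(levels, table)]
    return [[min(255, max(0, round(v + 128))) for v in row]
            for row in idct2(coeffs)]


def error_stats(original, decoded):
    """Return (min error, max error, SNR in dB) in the pixel domain."""
    errors = [o - d for orow, drow in zip(original, decoded)
              for o, d in zip(orow, drow)]
    signal = sum(o * o for row in original for o in row)
    noise = sum(e * e for e in errors)
    snr = math.inf if noise == 0 else 10 * math.log10(signal / noise)
    return min(errors), max(errors), snr


# Synthetic 8 x 8 gradient block (gray levels 100..156).
pixels = [[100 + 5 * i + 3 * j for j in range(N)] for i in range(N)]
decoded = decode_block(encode_block(pixels))
lo, hi, snr = error_stats(pixels, decoded)
print("error range:", (lo, hi), "SNR (dB):", snr)
```

Without the quantization step the inverse DCT recovers the block exactly, up to floating-point error, so all of the distortion reported here comes from quantization and the final rounding.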
This ends the example.
THIS IS DOCUMENTATION FOR AN OBSOLETE PRODUCT.