Transform Coding
• Predictive coding is a spatial-domain technique, since it operates
directly on the pixel values.
• Transform coding techniques operate on the coefficients of a reversible
linear transform of the image (e.g., DCT, DFT, Walsh, etc.).
[Block diagram: Encoder: input image (N × N) → construct n × n subimages → forward transform → quantizer → symbol encoder → compressed image. Decoder: compressed image → symbol decoder → inverse transform → merge n × n subimages → decompressed image.]
• Input N × N image is subdivided into subimages of size n × n .
• n × n subimages are converted into transform arrays. This tends to
decorrelate pixel values and pack as much information as possible
in the smallest number of coefficients.
• Quantizer selectively eliminates or coarsely quantizes the
coefficients with least information.
• Symbol encoder uses a variable-length code to encode the
quantized coefficients.
• Any of the above steps can be adapted to each subimage (adaptive
transform coding), based on local image information, or fixed for
all subimages.
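• As a concrete illustration, here is a minimal sketch of this encoder/decoder chain in Python, assuming 8 × 8 subimages, the 2-D DCT as the forward transform (SciPy's dctn/idctn), and a simple uniform quantizer. The function names (encode, decode) are illustrative, and the symbol encoding/decoding stage is omitted:

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode(image, n=8, step=16):
    """Subdivide into n x n subimages, transform each, and uniformly quantize."""
    h, w = image.shape
    blocks = {}
    for i in range(0, h, n):
        for j in range(0, w, n):
            coeffs = dctn(image[i:i + n, j:j + n], norm='ortho')   # forward transform
            blocks[(i, j)] = np.round(coeffs / step)               # quantizer
    return blocks                                                  # symbol encoder omitted

def decode(blocks, shape, n=8, step=16):
    """Dequantize, inverse transform, and merge the n x n subimages."""
    image = np.zeros(shape)
    for (i, j), q in blocks.items():
        image[i:i + n, j:j + n] = idctn(q * step, norm='ortho')
    return image

# Illustrative run on a random "image"
image = np.random.rand(256, 256) * 255
rec = decode(encode(image), image.shape)
print("RMSE:", np.sqrt(np.mean((image - rec) ** 2)))
```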
Walsh Transform (1-D case)
• Given a one-dimensional image (sequence) $f(m)$, $m = 0, 1, \ldots, N-1$, with $N = 2^q$, its Walsh transform $W(u)$ is defined as:

$$W(u) = \frac{1}{N} \sum_{m=0}^{N-1} f(m) \prod_{i=0}^{q-1} (-1)^{b_i(m)\, b_{q-1-i}(u)}, \quad u = 0, 1, \ldots, N-1,$$

where $b_k(z)$ is the $k$th bit (from the LSB) in the binary representation of $z$.
• Note that the Walsh-Hadamard transform discussed in the text is
very similar to the Walsh transform defined above.
• Example: Suppose $z = 6 = 110$ in binary representation. Then
$b_0(6) = 0$, $b_1(6) = 1$, and $b_2(6) = 1$.
• The inverse Walsh transform of $W(u)$ is given by

$$f(m) = \sum_{u=0}^{N-1} W(u) \prod_{i=0}^{q-1} (-1)^{b_i(m)\, b_{q-1-i}(u)}, \quad m = 0, 1, \ldots, N-1.$$
• Verify that the “inverse works.” Let

$$g(m) = \sum_{u=0}^{N-1} \left[ \frac{1}{N} \sum_{n=0}^{N-1} f(n) \prod_{i=0}^{q-1} (-1)^{b_i(n)\, b_{q-1-i}(u)} \right] \prod_{i=0}^{q-1} (-1)^{b_i(m)\, b_{q-1-i}(u)}.$$

The bracketed quantity is $W(u)$, with $m$ replaced by the summation variable $n$. Then

$$g(m) = \frac{1}{N} \sum_{n=0}^{N-1} f(n) \sum_{u=0}^{N-1} \prod_{i=0}^{q-1} (-1)^{\left(b_i(n) + b_i(m)\right)\, b_{q-1-i}(u)} \stackrel{?}{=} f(m) \quad \text{(HW problem!!)}$$
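• A minimal NumPy sketch of the 1-D Walsh transform and its inverse exactly as defined above (the names walsh_1d and inverse_walsh_1d are illustrative):

```python
import numpy as np

def bit(z, k):
    """k-th bit (counting from the LSB) of the integer z."""
    return (z >> k) & 1

def walsh_1d(f):
    """Forward 1-D Walsh transform of a length-N sequence, N = 2**q."""
    N = len(f)
    q = int(np.log2(N))
    W = np.zeros(N)
    for u in range(N):
        for m in range(N):
            # product over i of (-1)^(b_i(m) * b_{q-1-i}(u))
            sign = (-1) ** sum(bit(m, i) * bit(u, q - 1 - i) for i in range(q))
            W[u] += f[m] * sign
    return W / N

def inverse_walsh_1d(W):
    """Inverse 1-D Walsh transform (same kernel, but without the 1/N factor)."""
    N = len(W)
    q = int(np.log2(N))
    f = np.zeros(N)
    for m in range(N):
        for u in range(N):
            sign = (-1) ** sum(bit(m, i) * bit(u, q - 1 - i) for i in range(q))
            f[m] += W[u] * sign
    return f

# Quick check of the "inverse works" claim on a random sequence
f = np.random.rand(8)
print(np.allclose(inverse_walsh_1d(walsh_1d(f)), f))   # expected: True
```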
Walsh Transform (2-D case)
• Given a two-dimensional $N \times N$ image $f(m, n)$, with $N = 2^q$, its Walsh transform $W(u, v)$ is defined as:

$$W(u, v) = \frac{1}{N} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f(m, n) \prod_{i=0}^{q-1} (-1)^{b_i(m)\, b_{q-1-i}(u) + b_i(n)\, b_{q-1-i}(v)}, \quad u, v = 0, 1, \ldots, N-1.$$

• Similarly, the inverse Walsh transform is given by

$$f(m, n) = \frac{1}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} W(u, v) \prod_{i=0}^{q-1} (-1)^{b_i(m)\, b_{q-1-i}(u) + b_i(n)\, b_{q-1-i}(v)}, \quad m, n = 0, 1, \ldots, N-1.$$
• The Walsh transform is
Separable (the 2-D transform can be performed in terms of the 1-D
transform; see the sketch after this list).
Symmetric (the operations on the variables m, n are identical).
Forward and inverse transforms are identical.
Involves no trigonometric functions (just +1 and –1), so is
computationally simpler.
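• Because the transform is separable and its kernel matrix is symmetric, the 2-D Walsh transform can be computed with matrix products against the 1-D kernel. A minimal NumPy sketch, with illustrative function names:

```python
import numpy as np

def walsh_kernel(N):
    """N x N Walsh kernel K[m, u] = prod_i (-1)^(b_i(m) * b_{q-1-i}(u))."""
    q = int(np.log2(N))
    K = np.ones((N, N))
    for m in range(N):
        for u in range(N):
            exponent = sum(((m >> i) & 1) * ((u >> (q - 1 - i)) & 1) for i in range(q))
            K[m, u] = (-1) ** exponent
    return K

def walsh_2d(f):
    """2-D Walsh transform via separability: 1-D transform along rows, then columns."""
    N = f.shape[0]
    K = walsh_kernel(N)
    return (K @ f @ K) / N          # (1/N) K f K, using the symmetric kernel

def inverse_walsh_2d(W):
    """Inverse 2-D Walsh transform; same kernel and same 1/N factor."""
    N = W.shape[0]
    K = walsh_kernel(N)
    return (K @ W @ K) / N

f = np.random.rand(8, 8)
print(np.allclose(inverse_walsh_2d(walsh_2d(f)), f))   # expected: True
```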
Discrete Cosine Transform (DCT)
• Given a two-dimensional $N \times N$ image $f(m, n)$, its discrete cosine transform (DCT) $C(u, v)$ is defined as:

$$C(u, v) = \alpha(u)\,\alpha(v) \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f(m, n) \cos\frac{(2m+1)u\pi}{2N} \cos\frac{(2n+1)v\pi}{2N}, \quad u, v = 0, 1, \ldots, N-1,$$

where

$$\alpha(u) = \begin{cases} \sqrt{1/N}, & u = 0 \\ \sqrt{2/N}, & u = 1, 2, \ldots, N-1. \end{cases}$$

• Similarly, the inverse discrete cosine transform (IDCT) is given by

$$f(m, n) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} \alpha(u)\,\alpha(v)\, C(u, v) \cos\frac{(2m+1)u\pi}{2N} \cos\frac{(2n+1)v\pi}{2N}, \quad m, n = 0, 1, \ldots, N-1.$$
• The DCT is
Separable (the 2-D transform can be performed in terms of the 1-D
transform; see the sketch below).
Symmetric (the operations on the variables m, n are identical).
Forward and inverse transformation kernels are identical.
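• A minimal sketch that evaluates the 2-D DCT directly from the definition above and, assuming SciPy is available, checks it against the fast separable implementation scipy.fft.dctn(..., norm='ortho') (the name dct2_direct is illustrative):

```python
import numpy as np
from scipy.fft import dctn   # SciPy's separable, orthonormal DCT-II

def dct2_direct(f):
    """2-D DCT computed directly from the definition (O(N^4); for illustration only)."""
    N = f.shape[0]
    alpha = np.full(N, np.sqrt(2.0 / N))
    alpha[0] = np.sqrt(1.0 / N)
    m = np.arange(N)
    C = np.zeros((N, N))
    for u in range(N):
        for v in range(N):
            # basis[m, n] = cos((2m+1)u*pi/2N) * cos((2n+1)v*pi/2N)
            basis = np.outer(np.cos((2 * m + 1) * u * np.pi / (2 * N)),
                             np.cos((2 * m + 1) * v * np.pi / (2 * N)))
            C[u, v] = alpha[u] * alpha[v] * np.sum(f * basis)
    return C

f = np.random.rand(8, 8)
# The direct formula should agree with the fast, separable implementation.
print(np.allclose(dct2_direct(f), dctn(f, norm='ortho')))   # expected: True
```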
• The DCT is the most popular transform for image compression
algorithms like JPEG (still images), MPEG (motion pictures).
• The more recent JPEG2000 standard uses wavelet transforms
instead of DCT.
• We will now look at a simple example of image compression using
DCT. We will come back to this in detail later.
Discrete Cosine Transform Example
[Figure: Original image and reconstructions using a fraction of the DCT coefficients: fraction 0.85, MSE 0.45; fraction 0.65, MSE 1.6; fraction 0.41, MSE 4; fraction 0.19, MSE 7.7; fraction 0.08, MSE 12.]
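• The experiment above can be sketched as follows: compute the 2-D DCT of the image, zero all but a given fraction of the largest-magnitude coefficients, inverse transform, and measure the MSE. The image and function names here are illustrative; the MSE values quoted above refer to the actual test image in the figures.

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_with_dct(image, fraction):
    """Keep only the given fraction of DCT coefficients (largest magnitude); zero the rest."""
    C = dctn(image, norm='ortho')
    k = int(fraction * C.size)                       # number of coefficients to retain
    thresh = np.sort(np.abs(C), axis=None)[-k]       # magnitude of the k-th largest coefficient
    C_truncated = np.where(np.abs(C) >= thresh, C, 0.0)
    return idctn(C_truncated, norm='ortho')

# Illustrative run on a random "image"
image = np.random.rand(256, 256) * 255
for fraction in (0.85, 0.65, 0.41, 0.19, 0.08):
    rec = compress_with_dct(image, fraction)
    mse = np.mean((image - rec) ** 2)
    print(f"fraction kept = {fraction:.2f}, MSE = {mse:.2f}")
```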
Transform Selection
• Commonly used ones are Karhunen-Loeve (Hotelling) transform
(KLT), discrete cosine transform (DCT), discrete Fourier
transform (DFT), Walsh-Hadamard transform (WHT).
• Choice depends on the computational resources available and the
reconstruction error that can be tolerated.
• The transform itself is lossless and does not lead to compression; it is the
quantization of the resulting coefficients that produces compression.
• The KLT is optimum in terms of packing the most information for
any given fixed number of coefficients.
• However, the KLT is data dependent. Computing it requires
computing the correlation matrix of the image pixel values.
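• A minimal sketch of how a KLT basis could be estimated from the n × n subimages of a single image (the name klt_basis is illustrative; in practice an ensemble of images or a statistical image model may be used instead):

```python
import numpy as np

def klt_basis(image, n=8):
    """Estimate a KLT basis from the n x n subimages of one image.

    Each subimage is treated as an (n*n)-dimensional vector; the basis vectors are
    the eigenvectors of the sample covariance matrix, ordered by decreasing eigenvalue.
    """
    h, w = image.shape
    blocks = (image[i:i + n, j:j + n].ravel()
              for i in range(0, h - n + 1, n)
              for j in range(0, w - n + 1, n))
    X = np.stack(list(blocks))                  # one row per subimage vector
    cov = np.cov(X, rowvar=False)               # (n*n) x (n*n) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix, so eigh
    order = np.argsort(eigvals)[::-1]           # sort by decreasing variance
    return eigvecs[:, order]                    # columns are the KLT basis vectors

# KLT coefficients of one 8 x 8 block of an illustrative random image
image = np.random.rand(256, 256)
A = klt_basis(image, n=8)
coeffs = A.T @ image[:8, :8].ravel()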
• The “non-sinusoidal” transforms like the WHT are easy to
compute and implement (no multiplications and no trigonometric
function evaluation).
• Performance of “sinusoidal” transforms like DCT, DFT, in terms
of information packing capability, closely approximates that of the
KLT.
• DCT is by far the most popular choice and is used in the JPEG
(Joint Photographic Experts Group) image standard.
[Figure: Reconstructed and error images for the DCT (RMSE = 0.018), DFT (RMSE = 0.028), and WHT (RMSE = 0.023).]
DCT/DFT/WHT comparison for 8 × 8 subimages, with 25% of the coefficients (those with the largest magnitude) retained. Note also the blocking artifact.
Subimage Size Selection
• Images are subdivided into subimages of size n × n to reduce the
correlation (redundancy) between adjacent subimages.
• Usually $n = 2^k$ for some integer k. This simplifies the computation
of the transforms (e.g., the FFT algorithm).
• Typical block sizes used in practice are 8 × 8 and 16 × 16.
[Plot: Comparison of different transforms and block sizes for transform coding; normalized RMSE versus subimage size for the DCT, DFT, and Hadamard transforms.]
Bit Allocation
• After transforming each subimage, only a fraction of the
coefficients are retained. This can be done in two ways:
Zonal coding: Transform coefficients with large variance are
retained. Same set of coefficients retained in all subimages.
Threshold coding: Transform coefficients with large
magnitude in each subimage are retained. Different set of
coefficients retained in different subimages.
• The retained coefficients are quantized and then encoded.
• The overall process of truncating, quantizing, and coding the
transformed coefficients of the subimage is called bit-allocation.
[Figure: Reconstructed and error images for threshold coding (RMSE = 0.029) and zonal coding (RMSE = 0.038).]
Comparison of zonal and threshold coding for 8 × 8 DCT subimages, with 12.5% of the coefficients retained in each case.
Zonal Coding:
• Transform coefficients with large variance carry most of the
information about the image. Hence a fraction of the coefficients
with the largest variance is retained.
• The variance of each coefficient is calculated based on the
ensemble of $(N/n)^2$ transformed subimages, or using a statistical
image model. Recall Project #2 on DCT.
• The coefficients with maximum variance are usually located
around the origin of an image transform. This is usually
represented as a 0-1 mask.
• The same set of coefficients are retained in each subimage.
• The retained coefficients are then quantized and coded. Two
possible ways:
The retained coefficients are normalized with respect to their
standard deviations and are all allocated the same number of
bits. A uniform quantizer is then used.
A fixed number of bits is distributed among all the coefficients
(based on their relative importance). An optimal quantizer such
as a Lloyd-Max quantizer is designed for each coefficient.
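• A minimal sketch of zonal coding along these lines, assuming an 8 × 8 DCT and omitting the quantization and coding of the retained coefficients (function names are illustrative):

```python
import numpy as np
from scipy.fft import dctn, idctn

def zonal_mask(image, n=8, keep=8):
    """0-1 zonal mask keeping the `keep` DCT coefficients with the largest
    variance over the ensemble of n x n subimages of the image."""
    h, w = image.shape
    coeffs = np.stack([dctn(image[i:i + n, j:j + n], norm='ortho')
                       for i in range(0, h, n) for j in range(0, w, n)])
    variance = coeffs.var(axis=0)                 # per-coefficient variance over the ensemble
    mask = np.zeros((n, n))
    top = np.argsort(variance, axis=None)[::-1][:keep]
    mask[np.unravel_index(top, (n, n))] = 1       # same mask used for every subimage
    return mask

def zonal_code(image, mask, n=8):
    """Apply the zonal mask to every subimage and reconstruct (no quantization shown)."""
    out = np.zeros_like(image, dtype=float)
    h, w = image.shape
    for i in range(0, h, n):
        for j in range(0, w, n):
            C = dctn(image[i:i + n, j:j + n], norm='ortho')
            out[i:i + n, j:j + n] = idctn(C * mask, norm='ortho')
    return out

image = np.random.rand(256, 256)
mask = zonal_mask(image, n=8, keep=8)        # retain 8/64 = 12.5% of the coefficients
rec = zonal_code(image, mask)
print("RMSE:", np.sqrt(np.mean((image - rec) ** 2)))
```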
Threshold Coding:
• In each subimage, the transform coefficients of largest magnitude
contribute most significantly and are therefore retained.
• A different set of coefficients is retained in each subimage. So this
is an adaptive transform coding technique.
• The thresholding can be represented as $T(u, v)\, m(u, v)$, where
$m(u, v)$ is a masking function:

$$m(u, v) = \begin{cases} 0 & \text{if } T(u, v) \text{ satisfies some truncation criterion} \\ 1 & \text{otherwise.} \end{cases}$$
• The elements of $T(u, v)\, m(u, v)$ are reordered in a predefined
manner to form a 1-D sequence of length $n^2$. This sequence has
several long runs of zeros, which are run-length encoded.

Typical threshold mask:
1 1 1 0 1 0 0 0
1 1 1 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Zigzag ordering of coefficients:
 0  1  5  6 14 15 27 28
 2  4  7 13 16 26 29 42
 3  8 12 17 25 30 41 43
 9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
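• A minimal sketch of the N-largest variant of threshold coding described below, together with the zigzag scan and a simple run-length encoding of the zero runs (the symbol format here is illustrative, not the exact JPEG one):

```python
import numpy as np
from scipy.fft import dctn

def zigzag_order(n=8):
    """(row, col) indices of an n x n block in zigzag scan order."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def n_largest_block(C, n_keep):
    """Keep only the n_keep largest-magnitude coefficients of one transform block."""
    top = np.argsort(np.abs(C), axis=None)[::-1][:n_keep]
    mask = np.zeros_like(C)
    mask[np.unravel_index(top, C.shape)] = 1
    return C * mask

def run_length_encode(seq):
    """Encode a 1-D sequence as (zero-run length, value) pairs, ending with (trailing zeros, 0)."""
    pairs, zeros = [], 0
    for x in seq:
        if x == 0:
            zeros += 1
        else:
            pairs.append((zeros, x))
            zeros = 0
    pairs.append((zeros, 0))
    return pairs

# Illustrative use on one 8 x 8 block of a random image
block = np.random.rand(8, 8)
C = n_largest_block(dctn(block, norm='ortho'), n_keep=8)
scan = [C[i, j] for i, j in zigzag_order(8)]
print(run_length_encode(scan))
```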
• The retained coefficients are encoded using a suitable variable-
length code.
• The thresholding itself can be done in three different ways,
depending on the “truncation criterion:”
A single global threshold is applied to all subimages. The level
of compression differs from image to image depending on the
number of coefficients that exceed the threshold.
N-largest coding: The largest N coefficients are retained in
each subimage. Therefore, a different threshold is used for each
subimage. The resulting code rate (total # of bits required) is
fixed and known in advance.
Threshold is varied as a function of the location of each
coefficient in the subimage. This results in a variable code rate
(compression ratio).
Here, the thresholding and quantization steps can together be
represented as

$$\hat{T}(u, v) = \operatorname{round}\!\left[\frac{T(u, v)}{Z(u, v)}\right],$$

where $T(u, v)$ is the original transform coefficient, $\hat{T}(u, v)$
is the thresholded and quantized value, and $Z(u, v)$ is a
normalization factor. $Z(u, v)$ is an element of a transform
normalization matrix $Z$; a typical example is shown below.
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
The values in the Z matrix weight the transform coefficients
according to heuristically determined perceptual or psycho-
visual importance. The larger the value of $Z(u, v)$, the smaller
the importance of that coefficient.
The Z matrix may be scaled to obtain different levels of
compression.
At the decoding end, $\hat{T}(u, v)\, Z(u, v) \approx T(u, v)$ is used to
denormalize the transform coefficients before inverse
transformation.
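• A minimal sketch of this normalization/denormalization step on one 8 × 8 block, using the Z matrix above (the JPEG luminance quantization table). Scaling Z by 8 corresponds to the 8Z case in the example that follows; function names are illustrative:

```python
import numpy as np
from scipy.fft import dctn, idctn

# Transform normalization matrix Z from the notes (JPEG luminance quantization table)
Z = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=float)

def quantize_block(block, scale=1.0):
    """Threshold/quantize one 8 x 8 block: T_hat = round(T / (scale * Z))."""
    T = dctn(block, norm='ortho')
    return np.round(T / (scale * Z))

def dequantize_block(T_hat, scale=1.0):
    """Denormalize (T_hat * scale * Z) and inverse transform."""
    return idctn(T_hat * (scale * Z), norm='ortho')

block = np.random.rand(8, 8) * 255       # illustrative block
for scale in (1.0, 8.0):                 # Z versus 8Z
    T_hat = quantize_block(block, scale)
    rec = dequantize_block(T_hat, scale)
    print(f"scale={scale:g}: nonzero coefficients={np.count_nonzero(T_hat)}, "
          f"RMSE={np.sqrt(np.mean((block - rec) ** 2)):.3f}")
```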
Example: Coding with different Z
[Figure: Reconstructed and error images using quantization matrix Z (RMSE = 0.023, 9152 non-zero coefficients) and quantization matrix 8Z (RMSE = 0.048, 2389 non-zero coefficients).]