
1-D DFT

The unitary Discrete Fourier Transform (DFT) of a sequence u(n) is defined by

v(k) = (1/√N) Σ_{n=0}^{N−1} u(n) W_N^{kn},  0 ≤ k ≤ N − 1,   (11.1)

where W_N = e^{−j2π/N}. Note that in equation (11.1), we have the factor 1/√N. This has
been introduced to make this transform unitary, as we shall see shortly. Let the sequence
u(n) be represented as a vector u of length N, and let the transformed vector likewise
be v. The unitary DFT can then be written as

v=Fu

where F is the N × N DFT kernel matrix with elements

F(k, n) = (1/√N) W_N^{kn},  0 ≤ k, n ≤ N − 1.

It turns out that the matrix F is unitary.

Definition 11.1 A matrix A is said to be unitary if

AᵗA = I

where t represents the conjugate transpose operation and I is the identity matrix.

Fact 1: A matrix is unitary if and only if its rows (or columns) are orthonormal.

It is quite straightforward to show that the DFT kernel matrix F is unitary.
The i-th row of F is

f_i = (1/√N) [1, W_N^i, W_N^{2i}, …, W_N^{(N−1)i}].

Therefore, the inner product of rows r and l is

(1/N) Σ_{n=0}^{N−1} W_N^{rn} (W_N^{ln})* = (1/N) Σ_{n=0}^{N−1} W_N^{pn},

where p = r − l. By invoking the orthogonality of discrete exponentials,

(1/N) Σ_{n=0}^{N−1} W_N^{pn} = δ(p) = δ(r − l).

The above clearly implies that F is unitary. The inverse DFT is therefore

u = F⁻¹v = Fᵗv.

Since F is also symmetric,

u = F*v,

where * represents the complex conjugate operation. This equation can be expanded as

u(n) = (1/√N) Σ_{k=0}^{N−1} v(k) W_N^{−kn},  0 ≤ n ≤ N − 1.

The DFT can be viewed as expanding the input sequence in terms of N basis
vectors.
This is true for any unitary transformation. Consider the unitary transformation

v = Au.

Let

Aᵀ = [a_0 a_1 … a_{N−1}],

where each a_i is a column vector. Since A is unitary, A⁻¹ = Aᵗ, and therefore

u = Aᵗv
  = [a*_0 a*_1 … a*_{N−1}] v
  = a*_0 v(0) + a*_1 v(1) + … + a*_{N−1} v(N−1).

The input is therefore represented in terms of the transform coefficients
v(i) and the basis vectors. The basis vectors are the conjugates of the rows of the
transform kernel matrix. In other words, the basis vectors a*_i are the columns of Aᵗ.
EXAMPLE 11.1 Consider the 1-D unitary DFT of u = [8 1 0 1]ᵀ. The DFT
coefficients are given by
v = Fu

The basis vectors are

The input can therefore be represented as

which can be easily verified by substituting the basis vectors.
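The numbers in this example are easy to check numerically. The following sketch in Python/NumPy (the variable names are ours, not from the text) builds the 4-point unitary DFT kernel, computes v = Fu for u = [8 1 0 1]ᵀ, and reconstructs u from the basis vectors, i.e., the columns of Fᵗ.

import numpy as np

N = 4
n = np.arange(N)
# Unitary DFT kernel: F[k, n] = (1/sqrt(N)) * exp(-j*2*pi*k*n/N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

u = np.array([8.0, 1.0, 0.0, 1.0])
v = F @ u                      # DFT coefficients (here [5, 4, 3, 4])

# Check unitarity: F^t F = I, where t is the conjugate transpose
assert np.allclose(F.conj().T @ F, np.eye(N))

# Basis vectors are the columns of F^t (conjugates of the rows of F)
basis = F.conj().T
u_rec = sum(v[i] * basis[:, i] for i in range(N))
assert np.allclose(u_rec, u)   # input recovered from the basis expansion
print(np.round(v.real, 4))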


Now we address the reason for using a unitary transformation. Clearly,
orthonormality is an inherent property of a unitary matrix. This property leads to
minimization of the sum of squared errors (SSE) in the representation of the input.
General Orthonormal Transform
The general orthonormal transform pair is given by

where

Let the series expansion of the input given by (11.17) be truncated to P terms as

In other words, the input is now reconstructed from a subset of the transform
coefficients. This is typical in image compression applications, where the
coefficients with small magnitude are not transmitted (or coded). Now we define
the sum of squared errors (SSE) between u(n) and u_P(n) as

The following result is a powerful motivation for using orthonormal transforms.


Theorem 11.1 The orthonormality property assures that the sum of squared errors, S,
is minimized in an orthonormal transform.
Proof. The SSE is
The derivative of the SSE can be found as

Setting the derivative equal to zero gives the optimum coefficients,

The above gives

Using the orthonormality property,

Equating the real and the conjugate terms, we get

Theorem 11.1 states that no other coefficients can yield a lower SSE than the
unitary transform coefficients.
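A small numerical illustration of Theorem 11.1, assuming nothing beyond the unitary DFT defined earlier (the script and its names are ours): reconstructing from the P largest-magnitude coefficients gives an SSE equal to the energy of the discarded coefficients, and perturbing the retained coefficients can only increase the error.

import numpy as np

rng = np.random.default_rng(0)
N, P = 16, 4
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

u = rng.standard_normal(N)
v = F @ u

# Keep the P largest-magnitude coefficients, zero the rest
keep = np.argsort(np.abs(v))[-P:]
v_trunc = np.zeros_like(v)
v_trunc[keep] = v[keep]
u_P = F.conj().T @ v_trunc

sse = np.sum(np.abs(u - u_P) ** 2)
# For a unitary transform, the SSE equals the discarded energy (Parseval)
assert np.isclose(sse, np.sum(np.abs(v) ** 2) - np.sum(np.abs(v[keep]) ** 2))

# Any other choice of the retained coefficient values does at least as badly
v_other = v_trunc.copy()
v_other[keep] += 0.1 * rng.standard_normal(P)
sse_other = np.sum(np.abs(u - F.conj().T @ v_other) ** 2)
assert sse_other >= sse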
2-D DFT
Now let us revisit the 2-D DFT algorithm we established in Chapter 2. It
was shown that given a 2-D signal (image) matrix U, its 2-D DFT can be obtained
efficiently by first taking the 1-D DFT of each of its columns and then performing
the 1-D DFT of the rows of the resulting matrix. To express this mathematically,
let the N1 × N2 input matrix be

U = [u_0 u_1 … u_{N2−1}],

where u_i is the i-th column of the matrix. Let F_{N1} represent the N1 × N1 1-D DFT
kernel matrix. The above algorithm suggests that we first perform

G = F_{N1} U,

where g_i is the i-th row of G. Next we transform each row of G with the N2 × N2
1-D DFT kernel matrix F_{N2}:

V = G F_{N2}ᵀ = F_{N1} U F_{N2}ᵀ.

(Since F_{N2} is symmetric, this can also be written as F_{N1} U F_{N2}.) This
row-column decomposition algorithm works because the 2-D DFT is a separable transform.
Indeed, this method works for any 2-D separable transform, not just the DFT.
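A sketch of the row-column decomposition in Python/NumPy (the helper name is ours), checked against a direct 2-D FFT; the scaling follows the 1/√N unitary convention used above.

import numpy as np

def unitary_dft_matrix(N):
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

N1, N2 = 8, 6
U = np.random.default_rng(1).standard_normal((N1, N2))

F1, F2 = unitary_dft_matrix(N1), unitary_dft_matrix(N2)

G = F1 @ U            # 1-D DFT of each column
V = G @ F2.T          # 1-D DFT of each row of G (F2 is symmetric, so F2.T == F2)

# Same result as the library 2-D FFT, apart from the unitary scaling
assert np.allclose(V, np.fft.fft2(U) / np.sqrt(N1 * N2))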
For any given unitary matrices A_{N1} and A_{N2}, the transform pair is

V = A_{N1} U A_{N2}ᵀ,   U = A_{N1}ᵗ V A_{N2}*.

Now we proceed to represent an N × N matrix U as a linear combination of basis
images. There is a set of N² basis images, and we now show how to construct
them. Define

where a*_i is the i-th column of Aᵗ.
From (11.31),

The basis images are therefore given by

EXAMPLE 11.2 Consider the N = 4 2-D DFT. The first 4 of the 16 total basis
images are

Let the image be

The 2-D DFT is given by

Since V has four nonzero entries, U can be expressed in terms of 4 of the 16
basis images as
Now consider the 2-D DFT of an image. The Fourier spectrum of an image
has a very large dynamic range, typically higher than the display device. Therefore,
if we plot this spectrum as an image, only the very bright spots show up on the
screen. Because of this difficulty, the DFT spectrum |F(k1, k2)| is often plotted on
the following scale:

log(1 + |F(k1, k2)|).

This will be referred to as the log-magnitude spectrum, and it performs the


compression of the dynamic range. Consider the "Lena" image of Fig. 11.2(a) as an
example. Part (b) of this figure shows the phase of the image. The log-magnitude
spectrum of the unitary DFT of the image is shown in part (c). The low frequencies
are at the corners of the image. We can use the MATLAB command fftshift to
move the low frequencies to the center, which gives part (d) of the figure. Very
few coefficients have such a large magnitude, and they are all located at the lower
frequencies. So, let us investigate what happens if we use only a few of the large
coefficients to reconstruct the image using the inverse DFT. This is done in the next
figure. The original image is 512 x 512, and so is its DFT. We now use the central M x M
coefficients of the (centrally) shifted DFT to reconstruct the image. The other
coefficients are simply assumed to be zero. The results for M = 60, 80, and 120 are
given in parts (b), (c), and (d), respectively, of that figure. As we increase the number
of DFT coefficients in our reconstruction, the image gets better. For M = 120, the
reconstructed image is almost as good as the original. If we transmitted the original
image, it would require sending 512 x 512 elements. In this case, we are only
sending 120 x 120 complex coefficients. This represents a compression ratio of
approximately 9. This is a very simple example where we assume that all the
coefficients are coded with the same number of bits, which is not usually done in
practice. Nevertheless, this example demonstrates the power of transform coding.
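The experiment just described can be sketched as follows; the image array img and the block size M are placeholders, and any 512 × 512 grayscale image would do.

import numpy as np

def reconstruct_from_central_dft(img, M):
    """Keep only the central M x M coefficients of the shifted DFT
    and inverse-transform; all other coefficients are set to zero."""
    N = img.shape[0]
    V = np.fft.fftshift(np.fft.fft2(img))   # low frequencies moved to the center
    mask = np.zeros_like(V)
    c = N // 2
    mask[c - M // 2:c + M // 2, c - M // 2:c + M // 2] = 1.0
    rec = np.fft.ifft2(np.fft.ifftshift(V * mask))
    return np.real(rec)

# Usage sketch: rec = reconstruct_from_central_dft(img, 120)
# With M = 120 the reconstruction is close to the 512 x 512 original.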
1-D DCT
The Discrete Cosine Transform (DCT) is similar to the DFT. There are
several motivations for this transform, notably: (a) the DCT coefficients are purely
real; (b) the DCT has a near-optimal energy compaction property; and (c) the
DCT can be computed efficiently using fast algorithms that are computationally
comparable to the FFT. As a result of these advantages, the DCT is used
extensively in image compression.
The DCT can be formulated in the following way. Consider a length-N
sequence x(n), an example of which is shown in Fig. 11.4(a). The first step is to
make this sequence symmetric by the following mapping:

x̂(n) = x(n),            0 ≤ n ≤ N − 1,
x̂(n) = x(2N − 1 − n),   N ≤ n ≤ 2N − 1.

For the example at hand, the resulting sequence x̂(n) is shown in Fig. 11.4(b). This
sequence is of length 2N and is symmetric about n = N − 1/2. The next step is to take
the DFT of this symmetric sequence. The 2N-point DFT of x̂(n) is

Figure: (a) An example of x(n); (b) x̂(n) is a symmetric extension of x(n)


Performing the change of variables p = 2N − 1 − n,

Using the fact that W_{2N}^{2Nk} = 1 and letting p = n in the second summation,

Notice that 0 ≤ k ≤ 2N − 1 in the above equation. Finally, the DCT is defined as
W_{2N}^{k/2} X̂(k) in the range 0 ≤ k ≤ N − 1 and zero otherwise. This will make the
DCT purely real. The DCT of x(n) is then

This is the forward transform. The actual DCT is scaled so that it is unitary. This
gives the transform

C_x(k) = α(k) Σ_{n=0}^{N−1} x(n) cos( πk(2n + 1) / (2N) ),  0 ≤ k ≤ N − 1,   (11.42)

where

α(0) = √(1/N),  α(k) = √(2/N) for 1 ≤ k ≤ N − 1.   (11.43)

The inverse unitary DCT is

x(n) = Σ_{k=0}^{N−1} α(k) C_x(k) cos( πk(2n + 1) / (2N) ),  0 ≤ n ≤ N − 1.   (11.44)

We need to prove that (11.42) and (11.44) together form a unitary DCT pair.
The proof of the fact that the pair is unitary is assigned as Problem 9. We now
show that they indeed form a DCT transform pair. This can be done by
substituting (11.44) into the right-hand side of (11.42) and showing that it simplifies to
C_x(k). We do this in two parts: first for k = 0 and then for 1 ≤ k ≤ N − 1.
Case 1: k = 0
In this case, we need to prove that

With x(n) as in (11.44), the RHS of the above equation is

By using Euler's formula and after some algebraic manipulations, it can be shown
(Problem 1) that T(k) = 0, which gives the desired result.
Case 2: 1 ≤ k ≤ N − 1
The RHS of (11.42) is

It can be shown that (Problem 6)

Then (11.47) reduces to (with l = k)

It is easy to show that

P(k) = N.

Then

which is the desired result.

Properties of the DCT


Define the DCT kernel matrix as C, so that for a given vector x, the transformed
vector is
v = Cx.
The transform kernel matrix can easily be found from (11.42) as

Property 1: The DCT is real and orthonormal; that is,

C = C*  and  C Cᵀ = I.

The first part is clearly trivial. The proof of the second part is left as an exercise
(Problem 9).
Property 2: The DCT is a fast transform. That is, the DCT of an N-vector can be
computed in O(N log₂ N) operations. This is shown below.
The derivation is based on taking the even samples, x(2n), and odd samples,
x(2n + 1), and constructing a new sequence as follows:

x̃(n) = x(2n),  x̃(N − 1 − n) = x(2n + 1),  0 ≤ n ≤ N/2 − 1.

The DCT of the sequence x(n) is

The obvious change of variables is now m = N − n − 1. This gives

Thus, the DCT of x(n) can be computed via an FFT of the sequence x̃(n).
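The reordering argument translates directly into code. The sketch below (our naming) computes the unitary DCT with a single length-N FFT of the reordered sequence and compares it with SciPy's orthonormal DCT-II; the phase factor e^{−jπk/(2N)} plays the role of the W_{2N}^{k/2} term above.

import numpy as np
from scipy.fft import dct

def dct_via_fft(x):
    """Unitary DCT-II computed from one N-point FFT of the even/odd
    reordered sequence x~(n) = x(2n), x~(N-1-n) = x(2n+1)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    x_tilde = np.concatenate([x[0::2], x[1::2][::-1]])
    V = np.fft.fft(x_tilde)
    k = np.arange(N)
    C = np.real(np.exp(-1j * np.pi * k / (2 * N)) * V)
    alpha = np.full(N, np.sqrt(2.0 / N))
    alpha[0] = np.sqrt(1.0 / N)
    return alpha * C

x = np.random.default_rng(2).standard_normal(32)
assert np.allclose(dct_via_fft(x), dct(x, type=2, norm='ortho'))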
Property 3: The DCT has excellent energy compaction for highly correlated data.
For a signal x(n) that generates a smooth extension for x(n) + x(N − n − 1), the DCT
is well localized; that is, the energy is packed into a small number of coefficients.
This is the case for most images, and therefore the 2-D DCT has become very
popular for image compression.
The 2-D unitary DCT of an N × N sequence x(n1, n2) is defined by

where α(k1) and α(k2) are defined as in (11.43). Notice that this transform is
separable. Therefore, the 2-D DCT can be performed in terms of 1-D DCTs using
the row-column decomposition method. The inverse unitary DCT is given by


For a given input matrix X, its 2-D unitary DCT can be written as

V = C X Cᵀ.   (11.60)

This was shown earlier for separable transforms. Since C is real and unitary, the
2-D inverse DCT is

X = Cᵀ V C.   (11.61)

Equations (11.60) and (11.61) are compact representations of the 2-D DCT pair in
terms of the 1-D DCT kernel matrix.
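In code, the kernel-matrix form of the 2-D DCT pair is just a pair of matrix products. A sketch (helper names are ours), verified against SciPy's separable 2-D DCT:

import numpy as np
from scipy.fft import dctn

def dct_kernel(N):
    # C[k, n] = alpha(k) * cos(pi * k * (2n + 1) / (2N)), the unitary 1-D DCT kernel
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.cos(np.pi * k * (2 * n + 1) / (2 * N)) * np.sqrt(2.0 / N)
    C[0, :] = np.sqrt(1.0 / N)
    return C

N = 8
X = np.random.default_rng(3).standard_normal((N, N))
C = dct_kernel(N)

V = C @ X @ C.T            # forward 2-D DCT
X_rec = C.T @ V @ C        # inverse 2-D DCT

assert np.allclose(V, dctn(X, type=2, norm='ortho'))
assert np.allclose(X_rec, X)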
An example of a 2-D DCT of a 512 x 512 image is shown in Fig. 11.5. The log-
magnitude of the DCT is plotted as in equation (11.36). The origin of (k1, k2) is the
upper left corner. The magnitude of the DCT coefficients decreases rapidly as the
pair (k1, k2) increases. That is, some decorrelation of the DCT coefficients has
occurred. Now, we use a subset of these coefficients to reconstruct the image using
the inverse DCT. The rest of the coefficients are discarded (assumed zero). In
particular, we use the first M × M DCT coefficients, since the energy of the image
is concentrated in the low values of (k1, k2). The results for M = 60, 80, and 110 are
shown in Fig. 11.6. For M = 110, the result is visibly about the same as we
obtained using M = 120 for the DFT. In addition, all of the DCT coefficients are
real. This gives us a compression ratio of approximately 22.
Discrete Sine Transform
The Discrete Sine Transform (DST) of a 1-D sequence x(n) of length N is
defined as

The inverse DST is

The 1-D DST kernel is therefore given by

The DST is then

x_s = S x,

where x_s is the N × 1 transformed vector.
The 2-D DST is defined as a separable transform. For an N × N matrix X, the 2-D
DST is an N × N matrix V, which can be computed as

V = S X Sᵀ = S X S.

The inverse 2-D DST is then

X = Sᵗ V S* = S V S.
Properties of the DST
Property 1: The DST is real and symmetric. It is also orthonormal; that is,
S* = S = Sᵀ = S⁻¹.
The proof is assigned as Problem 11.
Property 2: The DST is a fast transform; that is, it can be computed via an FFT.
This can be proved in a similar fashion to that of the DCT.
Property 3: The DST has good to excellent energy compaction property for
images.
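These statements can be checked numerically once a concrete kernel is written down. The sketch below assumes the commonly used DST kernel S(k, n) = √(2/(N+1)) sin(π(k+1)(n+1)/(N+1)) — an assumption on our part, since the defining equation is not reproduced here — which is real, symmetric, and orthonormal, as Property 1 requires.

import numpy as np

def dst_kernel(N):
    # Assumed kernel: S[k, n] = sqrt(2/(N+1)) * sin(pi*(k+1)*(n+1)/(N+1))
    k = np.arange(1, N + 1)[:, None]
    n = np.arange(1, N + 1)[None, :]
    return np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * k * n / (N + 1))

N = 8
S = dst_kernel(N)
assert np.allclose(S, S.T)                 # symmetric
assert np.allclose(S @ S, np.eye(N))       # S is its own inverse, hence orthonormal

X = np.random.default_rng(4).standard_normal((N, N))
V = S @ X @ S                              # 2-D DST
assert np.allclose(S @ V @ S, X)           # inverse recovers X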
Hadamard Transform
The Hadamard transform is a unique and interesting transform. The kernel
matrix is a Hadamard matrix, whose elements are all equal to either +1 or -1. This
obviously implies that the Hadamard transform does not require any
multiplications. Many interesting theories have been presented on Hadamard
matrices. Here we present the very basics of Hadamard matrix theory and then
proceed to define this transform.
Definition 11.2 A Hadamard matrix H of side¹ N is an N × N matrix with elements
±1 and the property that

HHᵀ = NI.

¹We use the term side to denote the row or column dimension of a square matrix. This is
common in the discrete mathematics literature.
The Hadamard matrix is therefore orthogonal, which is a desirable property of image
transforms. The orthogonality property leads to the following result.
Theorem 11.2 An N x N matrix with elements hij is Hadamard if and only if

Proof. Suppose h_ij = ±1 for matrix H. The (i, j)-th element of HHᵀ is given by

The hypothesis (11.70) is true if and only if HHᵀ = NI. If i = j, the diagonal
elements are

Theorem 11.3 Let K be a Hadamard matrix of side N and L be a Hadamard matrix
of side R. Then

H = K ⊗ L

is also a Hadamard matrix.
Proof. Let k_ij denote the elements of matrix K. The Kronecker product K ⊗ L is

Matrix H has size NR × NR. It can also be considered to be an N × N
array of R × R submatrices. Matrix HHᵀ is of the same size. The submatrix in the
(i, j) position of HHᵀ is

Since K is Hadamard,

Therefore

which implies that H is also Hadamard.


In the following, we present two more interesting results on Hadamard
matrices. The proofs are omitted and can be found in [1] along with many other
interesting results.
Theorem 11.4 If there is a Hadamard matrix of side N, then N is 1,2, or a multiple
of 4.
Theorem 11.5 If H is a Hadamard matrix, then K is also Hadamard if it is obtained
from H in one of the following ways:
(a) By negating some or all rows.
(b) By negating some or all columns.
(c) By permuting the rows and permuting the columns.
Now we consider some examples of Hadamard matrices. Let H_n denote a
Hadamard matrix of side 2ⁿ. Then

H_1 = [ 1   1 ]
      [ 1  −1 ]

By Theorem 11.3, we can find

H_2 = H_1 ⊗ H_1,  H_3 = H_1 ⊗ H_2,

and so on.
The 1-D unitary Hadamard transform of an N × 1 vector x is defined as

v = (1/√N) H x,

where H is a Hadamard matrix of side N. The inverse transform is

x = (1/√N) H v.
The 2-D Hadamard transform for an N × N matrix X is

V = (1/N) H X Hᵀ = (1/N) H X H.

The inverse transform is

X = (1/N) H V H.
Properties of the Hadamard Transform
Property 1: The Hadamard transform kernel (1/√N)H is real, symmetric, and
orthonormal.
Property 2: The Hadamard transform is a fast transform. The 1-D transform of a
length-N vector can be computed in O(N log₂ N) additions and subtractions.
Property 3: The Hadamard transform has a good to very good energy compaction
property for most images.
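A sketch of both constructions in Python/NumPy (function names are ours): Hadamard matrices of side 2ⁿ built with Kronecker products as in Theorem 11.3, and the 1-D transform computed with the iterative fast Walsh-Hadamard butterfly, which uses only additions and subtractions.

import numpy as np

def hadamard(n):
    """Hadamard matrix of side 2**n via repeated Kronecker products."""
    H1 = np.array([[1, 1], [1, -1]])
    H = np.array([[1]])
    for _ in range(n):
        H = np.kron(H1, H)
    return H

def fwht(x):
    """Fast Walsh-Hadamard transform: O(N log2 N) additions and subtractions."""
    x = np.array(x, dtype=float)
    N = x.size
    h = 1
    while h < N:
        for i in range(0, N, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

n = 4
N = 2 ** n
H = hadamard(n)
assert np.allclose(H @ H.T, N * np.eye(N))        # H H^T = N I

u = np.random.default_rng(5).standard_normal(N)
v = fwht(u) / np.sqrt(N)                          # unitary 1-D Hadamard transform
assert np.allclose(v, (H @ u) / np.sqrt(N))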
KL Transform
The Karhunen-Loeve (KL) transform was originally introduced by
Karhunen [2] and Loeve [3] and used as a series expansion for random processes.
The KL expansion is essentially the same as the principal component analysis
introduced by Hotelling. It is therefore also called the Hotelling transform, or
principal component transform [4]. It is also known as the eigenvector transform
because the transform is based on the eigenvectors of the autocorrelation matrix of
the input sequence, as we now define.
Let R denote the autocorrelation matrix of an N × 1 input vector x. If λ_k and
φ_k are, respectively, the k-th eigenvalue and corresponding eigenvector of R, then

R φ_k = λ_k φ_k

for 1 ≤ k ≤ N. By using the eigenvectors φ_k as the columns, form a new matrix Ψ.
The KL transform is defined as

v = Ψᵗ x.

By using Property 6,

That is, the autocorrelation matrix is unitarily diagonalizable by a matrix of its


eigenvectors.
EXAMPLE 11.3 [6] Consider a 2 × 1 random vector consisting of two realizations
of equal probability given by x1 = [2 1]ᵀ and x2 = [3 2]ᵀ. The autocorrelation matrix
is therefore

R = (1/2)(x1 x1ᵀ + x2 x2ᵀ) = [ 6.5  4.0 ]
                             [ 4.0  2.5 ]

The eigenvalues and eigenvectors can be found as

λ1 = 8.972, λ2 = 0.0279

This gives the KL transform kernel matrix

The transform of each of the input vectors is therefore

In each transform vector above, one coefficient is dominant. That is, most of the
energy is packed into that coefficient. The autocorrelation matrix of the
transformed vector is

Thus, the KL transform has produced transform coefficients that are uncorrelated.
The variances of these transform coefficients are clearly equal to the eigenvalues
of R.
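Example 11.3 can be reproduced in a few lines; the code below is our own check, not part of the text. The eigenvalues come out as approximately 8.972 and 0.028, and each transformed vector has one dominant coefficient.

import numpy as np

x1 = np.array([2.0, 1.0])
x2 = np.array([3.0, 2.0])

# Autocorrelation matrix of the two equally likely realizations
R = 0.5 * (np.outer(x1, x1) + np.outer(x2, x2))   # [[6.5, 4.0], [4.0, 2.5]]

lam, Phi = np.linalg.eigh(R)                      # eigenvalues/eigenvectors of R
order = np.argsort(lam)[::-1]                     # sort in decreasing order
lam, Phi = lam[order], Phi[:, order]
print(np.round(lam, 4))                           # approx. [8.972, 0.0279]

# KL transform: v = Psi^t x, with the eigenvectors as the columns of Psi
v1, v2 = Phi.T @ x1, Phi.T @ x2                   # one dominant coefficient each

# The transform coefficients are uncorrelated, with variances = eigenvalues
Rv = 0.5 * (np.outer(v1, v1) + np.outer(v2, v2))
assert np.allclose(Rv, np.diag(lam))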
Example 11.3 is actually a demonstration of the following important
property of the KL transform.
Theorem 11.6 If the input x has zero mean (E{x} = 0), then the KL transform
coefficients have zero mean and are uncorrelated; that is,

E{v} = 0,
E{v(k) v*(l)} = λ_k δ(k − l).
The proof is straightforward and assigned as a problem. Another important
property of the KL transform deals with energy compaction in the first m transform
coefficients, which is obviously important for compression. The following theorem
presents this important result. The proof is omitted and can be found in [5].
Theorem 11.7 Among all unitary transforms v = Ax, the KL transform packs the
maximum average energy into m ≤ N samples of v. Define the variance of the k-th
coefficient as

It is assumed that the coefficients are ordered in decreasing values of variance as

Define the energy of the first m coefficients as

Then for any fixed m,

For the 2-D transform definition, we assume separability of the image


autocorrelation function. This is done for computational advantages as we now
explain. An N × N image has an autocorrelation function r(m, n; m', n'). Its
correlation matrix therefore has N⁴ elements. In general, we have N² eigenvectors
and eigenvalues. Finding these is not computationally practical. Let r(m, n; m', n') be
modeled as separable, that is,

r(m, n; m', n') = r1(m, m') r2(n, n').
Then, the elements of the basis functions can be expressed as

This is again separable. So, the 2-D KL transform is now defined as

and
For an N × N matrix, the separability assumption reduces the computation from
O(N⁶) to O(N³) [5].
The obvious disadvantage of the KL transform is its computational complexity. In
general, there is no fast KL transform. In addition, the KL transform depends on
the statistics of the input, which are usually not known a priori. In many cases,
these statistics can be modeled, and the kernel matrix thus obtained can be used for
images in the same class. Because of these difficulties, the KL transform is not
used in practice for data compression. However, it is commonly used as a
benchmark for evaluating the performance of fast transforms.
Huffman Coding
Huffman coding is an optimal coding scheme that results in the lowest possible
average bit rate. It is very easy to implement, as we now show with an example.
Consider a set of five symbols {a1, a2, a3, a4, a5} to be coded. Let the probabilities
(histogram) of these symbols be

The algorithm is now described using the following steps with reference to Figure.
Step 1 Arrange the symbols as nodes in descending order of their probabilities.
Consider these nodes as the leaf nodes of a tree.
Step 2 Select the two nodes with the lowest probabilities (a4, a5) and combine them to
form a new node (a6). The probability of this new node is the sum of the
probabilities of the original nodes. Again, we combine the two nodes with the lowest
probabilities (a3, a6) to get a new node (a7). This process is repeated until we obtain
a final node with probability 1. This is called the root node.
Step 3 Arbitrarily assign 1 and 0 to each pair of branches that lead to a node.
Step 4 The code for each symbol is found by reading the 1 or 0 on each branch
sequentially from the root node to the leaf node.
For example, a5 has the code 1 1 1 1; a4 has the code 1 1 1 0; and so on.
Figure: Example of codeword generation in Huffman coding.
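The steps above map directly onto a small priority-queue implementation. The sketch below uses an illustrative probability table, since the exact values used in the figure are not reproduced here.

import heapq

def huffman_codes(probs):
    """Build Huffman codewords for a dict {symbol: probability}."""
    # Each heap entry: (probability, tie-breaker, symbol-or-subtree)
    heap = [(p, i, sym) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # two lowest-probability nodes
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, counter, (left, right)))
        counter += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: descend both branches
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Illustrative probabilities (not the ones in the figure)
p = {"a1": 0.4, "a2": 0.3, "a3": 0.15, "a4": 0.1, "a5": 0.05}
codes = huffman_codes(p)
avg_len = sum(p[s] * len(codes[s]) for s in p)   # compare with 3 bits/symbol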
Now we compare the performance of the Huffman code to uniform-length
coding. Since we have five symbols, uniform-length coding requires 3 bits per
symbol. For Huffman coding, we find the average length of the codeword as

The entropy of the source is

In this example, Huffman coding yields an average codeword length that is very
close to the entropy of the source. This is quite typical of Huffman coding in most
cases. However, Huffman coding has the following disadvantages that prohibit its
use in some applications.
(a) For a large number of symbols, the Huffman tree can be very large, which
implies a high computational cost.
(b) The probability histogram of the image must be computed, requiring a total of
two passes through the data, once for computing the histogram and again for
coding.
LZW Coding
Ziv and Lempel [10] invented a universal algorithm for data compression in
1977. A universal algorithm is one that is independent of the symbol statistics. The
Lempel-Ziv algorithm was later modified by Welch [9] in 1984, and the resulting
algorithm is called the LZW algorithm. This algorithm is based on a dictionary (or
translation table), that maps the input symbols into fixed-length codes. The
dictionary has a prefix property. This means that for every string in the dictionary,
its prefix string is also in the dictionary. For example, let s be a string of symbols
and c be a single symbol. If sc is in the dictionary, then s must also be in the
dictionary. The dictionary is built by concatenating new characters to previously
encountered strings. Therefore, as the dictionary grows, the length of the strings is
coded, and compression is achieved. The parser observes the incoming symbols.
As long as the new symbol sequence after the last phrase (string) coincides with an
existing phrase, no new phrase is introduced and another symbol is considered. As
soon as the new phrase is different from anything in the dictionary, it is recognized
as a new phrase, encoded, and also saved in the dictionary.
Consider the following sequence of symbols as they arrive from left to right:
abaccabcbabcabca. . .
The formation of the dictionary and the encoding process are shown in Fig.
11.9. We start with an empty dictionary, which has a set of indices (or memory
locations). Initially, we have null symbol {} in index 0. The parser then observes
the first symbol a, which is not in the dictionary. So, it saves a in index 1 and
encodes it as 0a. The next symbol is b, which is also not in the dictionary. So b is
saved in index 2 and encoded as 0b. The next symbol is a, which is already in the
dictionary in index 1. So, the parser considers the next symbol, c, and forms the
string ac. This string is not yet in the dictionary, so it is saved in index 3 and encoded
as 1c, which means that it is a string composed of whatever is in index 1 followed
by c. This process is continued, and we have shown up to index 8 in the figure. The
decoding process is quite straightforward. The decoder also starts with an empty
dictionary and with null {} in index 0. When it first receives 0a, it takes whatever
is in index 0 (null), follows it by a, and puts it into index 1. Then it receives 0b, and
so it puts b in index 2. Then it gets 1c, which means that a new string is created
with the contents of index 1 (which is a) followed by c. The new string ac is saved
in index 3. This process is continued. The decoder therefore builds the dictionary
in the same way as was done in the encoder. The encoding and decoding schemes
are quite simple to implement. The size of the dictionary is, of course, finite.
Therefore, when it gets full, it must be purged. The rules for purging must be the
same for the encoder and the decoder; otherwise there will be errors in decoding.
When the dictionary gets full, one option is to start over with a brand-new empty
dictionary. This is a simple implementation scheme but is not the most efficient.
Another option is to leave the single-symbol strings in the dictionary and purge the
rest. Many other variations are possible, and the best possible one would depend on
the application and the computational complexity that can be tolerated.

Figure: Example of LZW encoding scheme.
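A minimal encoder for the parsing rule just described, run on the example string; index 0 holds the null string, and each emitted pair (index, symbol) corresponds to one new dictionary entry. This is a sketch of the idea, not a full 12-bit implementation.

def lzw_encode(data):
    """Encode a string as (index, symbol) pairs, building the dictionary on the fly."""
    dictionary = {"": 0}           # index 0 holds the null string
    output = []
    phrase = ""
    for c in data:
        if phrase + c in dictionary:
            phrase += c            # keep extending the current phrase
        else:
            # Emit (index of longest known prefix, new symbol) and store the new phrase
            output.append((dictionary[phrase], c))
            dictionary[phrase + c] = len(dictionary)
            phrase = ""
    if phrase:                     # flush a trailing phrase, if any
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output, dictionary

pairs, d = lzw_encode("abaccabcbabcabca")
# First few pairs: (0,'a'), (0,'b'), (1,'c'), ... matching the walk-through above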


A typical LZW implementation uses 12-bit codes with 8-bit input symbols.
This implies a dictionary size of 4096 locations. Observe that each transmitted
string consists of an index (12-bit) followed by a single symbol (8-bit). In most
coding schemes, transmission is done in bytes. Therefore, this would require 3
bytes of data per code. However, we can do better by a technique called "bit
packing." In this method, several codes are used and partitioned into appropriate
bytes. In other words, we do not use up 3 bytes for 20 bits (code + symbol), but
instead we get the next code and partition the bits appropriately. This increases the
efficiency of compression but adds to the complexity of the encoder and decoder.
The LZW algorithm is well suited for data compression for many
applications. It adapts to the data being processed and therefore needs no
programmer guidance or pre-analysis of data. Its simplicity permits very-high-
speed execution. The LZW algorithm is used in the "compress" program used in
Unix, in the "zip" program for Windows, and in the "gif" image format standard.

JPEG Lossy Image Compression Standard


Block diagrams of the JPEG encoder and decoder are shown in Fig. 11.19. The
decoder performs the inverse operations of the encoder and in reverse order.
DCT
The first step in the encoder is the DCT. The input image is partitioned into 8 x 8
blocks of data. For each block, an 8 x 8 2-D DCT is performed. The DCT
coefficients corresponding to the low-frequency bins are usually large in
magnitude, and they are deemed to be perceptually most significant. The higher
frequency coefficients are typically smaller. These features of the DCT coefficients
are exploited in designing the quantizer, where the bulk of the compression takes
place.
Quantizer
In the quantizer, each DCT coefficient X(k1, k2) in the 8 × 8 block is quantized in a
special way depending on its index (k1, k2). A predefined 8 × 8 quantization table is
used for this purpose. Let Q(k1, k2) denote the entries of this quantization table.
Each of these entries

Figure: (a) JPEG encoder, (b) JPEG decoder.

is a number between 0 and 255, for 8-bit images. The quantized value for each
DCT coefficient is given by

Xq(k1, k2) = round( X(k1, k2) / Q(k1, k2) ).   (11.101)
A large value of Q(k1, k2) implies a small (in magnitude) quantized value, whereas
a small value of Q(k1, k2) yields a large quantized value. The quantization tables
for the luminance and the chrominance components are given in the figure. Notice
that the Q(k1, k2) entries for lower values of (k1, k2) are small, which implies that the
quantized values will be large. For higher values of (k1, k2), the quantized
coefficients will be small. That is, the lower frequency coefficients will require a
larger number of bits than the higher frequency components. The quantization
tables given here were obtained from numerous psychovisual experiments. These
tables have become widely accepted, although the JPEG standard does not specify
these particular tables or any others. Notice that the quantization table for the
chrominance component has larger values than that for the luminance. This will yield
smaller quantized coefficients for the chrominance. This is done because the
human visual system is less sensitive to the chrominance component than to the
luminance.
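A sketch of the quantization step. The table below is the widely cited luminance table from the informative annex of the standard (as noted above, the standard does not mandate any particular table), and the function names are ours.

import numpy as np

# Commonly cited JPEG luminance quantization table (informative annex of the standard)
Q_LUM = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def quantize(X, Q=Q_LUM):
    """Xq(k1, k2) = round(X(k1, k2) / Q(k1, k2)), as in (11.101)."""
    return np.round(X / Q).astype(int)

def dequantize(Xq, Q=Q_LUM):
    """Decoder side: rescale the quantized coefficients."""
    return Xq * Q

# Usage sketch: X is the 8 x 8 DCT of an image block (placeholder name)
# Xq = quantize(X); X_hat = dequantize(Xq)  # most high-frequency entries become 0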
As an example, consider the 8 × 8 block of data shown in Fig. 11.21(a). This
is a block from the image "earth" that is found in the MATLAB image processing
toolbox. The DCT of this block is given in part (b) of the figure. The DCT matrix is
then quantized according to (11.101) with the luminance table given earlier. The
quantized coefficients are given in part (c). There is a large number of zeros in
this matrix. This makes the array especially suitable for run-length coding (RLC),
which is explained in the next subsection.
Figure: Quantization tables: (a) luminance, (b) chrominance.

Example: (a) an 8 x 8 image matrix, A, (b) DCT matrix of A, (c) quantized DCT
matrix.

Symbol Creator and Coder


This block creates symbols for the coefficients of the quantized 8 x 8 DCT
matrix, denoted by Xq(k1, k2). The symbols are created in a very special way so that
compression is achieved. The matrix of quantized DCT coefficients is converted to
a vector sequence by traversing the matrix in a zig-zag way as shown in Fig. 11.22.
As we know from the last example, the quantized DCT matrix has a large number
of zeros in the high-frequency bins. Forming the sequence in this zig-zag way
assures that the zeros are mostly toward the end. This will lead to efficient coding
by RLC, as we shall see shortly.
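The zig-zag scan itself can be generated directly from the anti-diagonals of the index grid; a short sketch (our helper) follows.

def zigzag_order(n=8):
    """Return the (row, col) indices of an n x n block in JPEG zig-zag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col, one anti-diagonal at a time
        rng = range(max(0, s - n + 1), min(s, n - 1) + 1)
        rows = reversed(rng) if s % 2 == 0 else rng
        order.extend((i, s - i) for i in rows)
    return order

# zigzag_order()[:6] -> [(0,0), (0,1), (1,0), (2,0), (1,1), (0,2)]
# Flatten a quantized block Xq with:  seq = [Xq[i][j] for i, j in zigzag_order()]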
The DC coefficient, Xq(0, 0), is handled very differently than the rest
because a high degree of correlation exists between the DC coefficients in two
adjacent 8 x 8 blocks of data. These coefficients are therefore differentially coded.
Let Xq^(i)(0, 0) denote the quantized DC coefficient in the i-th block. Then the
difference

d_i = Xq^(i)(0, 0) − Xq^(i−1)(0, 0)
is mapped to a symbol. For an 8-bit image, the DCT coefficients are signed and lie
in the range [-1023,1023]. Therefore, di lies in the range [-2046,2046]. This is a
very large range and will require too much computation and storage to create the
Huffman coding tree. Therefore, this range is subdivided into categories, and these
categories are

Figure: Zigzag procedure to create a vector sequence from the 8 x 8 quantized


DCT matrix.
Huffman coded. Each of the positive and negative ranges can be represented by 11
bits. That is, they can be subdivided into 12 different categories. The categories are
defined as follows:
Category k: element range [2^(k−1), 2^k − 1] and [−(2^k − 1), −2^(k−1)].
We will now consider some example categories. Let δ_i⁺ and δ_i⁻ denote the possible
positive and negative values of the difference signal. Then we have the following
examples:
k = 3:  4 ≤ δ_i⁺ ≤ 7,   and  −7 ≤ δ_i⁻ ≤ −4
k = 4:  8 ≤ δ_i⁺ ≤ 15,  and  −15 ≤ δ_i⁻ ≤ −8
k = 5:  16 ≤ δ_i⁺ ≤ 31, and  −31 ≤ δ_i⁻ ≤ −16, and so on.
The categories are Huffman coded, and the amplitude of the signal within the
category is coded in k bits and transmitted as is. At the receiver, the Huffman
decoder decodes a string of bits into a category, say k 0. Then the next k0 bits are
interpreted as the amplitude. The subsequent bits are then decoded for a category,
and the next corresponding bits are the amplitude, and so on. Positive numbers are
coded in binary, whereas negative numbers are coded in one's complement. As an
example, consider the DC values of the quantized DCT coefficients of three
consecutive blocks to be 48, 60, and 44. The following symbols are generated:

Notice that in the case of k = 5, if we assume straight binary, the amplitude


is 15. But this is not an option for this category. Therefore, it must be interpreted as
a negative one's complement number, which yields the desired result.
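A sketch of the category/amplitude mapping for the DC differences (the helper names are ours): the category is simply the number of bits needed for |d|, and negative amplitudes use one's complement as described.

def dc_category(d):
    """Category k such that 2**(k-1) <= |d| <= 2**k - 1; category 0 means d == 0."""
    k = 0
    m = abs(d)
    while m:
        m >>= 1
        k += 1
    return k

def dc_amplitude_bits(d):
    """k amplitude bits: straight binary for d > 0, one's complement for d < 0."""
    k = dc_category(d)
    if k == 0:
        return ""
    if d > 0:
        return format(d, "0{}b".format(k))
    return format((1 << k) - 1 + d, "0{}b".format(k))   # one's complement of |d|

# DC values 48, 60, 44 of three consecutive blocks give differences 12 and -16:
# dc_category(12) == 4 with bits '1100';  dc_category(-16) == 5 with bits '01111'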
Now we discuss the JPEG standard for coding the AC coefficients, many of
which are zero, as we have already seen. Recall that in the case of DC coefficients,
we Huffman code each category, which is a symbol between 0 and 11. In this
case, the symbol to be Huffman coded is composed of (a) run-length and (b)
category. There are 4 bits reserved for the category. The run-length is the number
of zero coefficients before a nonzero coefficient is encountered in the sequence. In
the beginning of the 64-coefficient sequence, there are not many zeros and the run-
lengths will be mostly zero. But as we proceed toward the end of the sequence, the
run-lengths will be large numbers. There are 4 bits reserved for the run-length for
each coefficient, which allows for up to 15 zeros encountered before a nonzero
coefficient. If there are more than 15 zeros in a row, then the run-length is 15 and
the coefficient will be taken as zero. The symbol to be Huffman coded is composed
of 8 bits: 4 bits of run-length followed by 4 bits of category. The category is found
as in the case of DC coefficients. Also, if the category is k, then k bits of amplitude
are transmitted, which are not Huffman coded. At the receiver, once the 8-bit
symbol (run-length + category) is decoded, it tells the receiver how many zeros
were encountered prior to the current coefficient and the category value k.
Therefore, the receiver will interpret the following k bits as the amplitude and also
insert an appropriate number of zeros before the current coefficient, thereby
reconstructing the transmitted coefficients exactly.
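A sketch of symbol creation for the AC coefficients (our naming; the Huffman stage is omitted): walk the zig-zag sequence, count runs of zeros capped as described, and pair each run with the category of the next nonzero coefficient.

def ac_symbols(ac_coeffs):
    """Yield (run_length, category, amplitude) for a zig-zag AC sequence.
    (0, 0, None) marks end-of-block; (15, 0, None) stands for a run of 16 zeros."""
    symbols = []
    run = 0
    for c in ac_coeffs:
        if c == 0:
            run += 1
            if run == 16:                 # longest representable run of zeros
                symbols.append((15, 0, None))
                run = 0
        else:
            k = int(abs(c)).bit_length()  # category, as for the DC differences
            symbols.append((run, k, c))
            run = 0
    if run:
        symbols.append((0, 0, None))      # end-of-block: only zeros remain
    return symbols

# For the sequence [16, 0, 0, 2, 10, 9, -6, 0, 0, 0, 2] the first symbols are
# (0,5,16), (2,2,2), (0,4,10), (0,4,9), (0,3,-6), (3,2,2)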
As a special case, if after a nonzero coefficient, all the rest of the coefficients
in the 64-point block are zero, then the symbol of 0 (binary: 0000 0000) is
transmitted. This denotes the end of block. As an example of this procedure,
consider the example of the quantized DCT coefficient block of Fig. 11.21(c).
By using the zig-zag traversing process (Fig. 11.22), we get the first few AC
coefficients as [16, 0, 0, 2, 10, 9, -6, 0, 0, 0, 2,…]. The coding of these coefficients
is as follows:

The coding process in this table is continued until we complete all the 64
coefficients in the block, or we encounter all zeros after a point, in which case the
end of block symbol is the last one to be transmitted. The method of generating the
Huffman codes is identical to what we have described earlier in this chapter.
However, Huffman tables must be transmitted along with the data. More on the
generation and transmission of these tables is explained in the context of the JPEG
lossless compression standard.

JPEG Lossless Image Compression Standard


The original JPEG lossless compression standard was adopted at about the same
time as the lossy compression standard was approved in 1992. The lossless
standard had little in common with the DCT-based lossy standard. The lossless
standard was supposed to achieve a compression ratio of about 2 : 1 on similar sets
of images. The JPEG lossless compression standard is based on predictive coding,
followed by Huffman coding. For prediction, as the encoder scans the input image
from left to right and from top to bottom, each pixel is predicted as a linear
combination of previous pixels. The prediction error is then encoded using
Huffman coding. The standard allows the user to choose from eight different
predictors. Let a 2 x 2 block of pixels be given as follows:

The predictor options are denoted by modes and are as follows:
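The eight modes are the standard JPEG lossless predictors. Labeling the neighbors of the current pixel x as a (left), b (above), and c (above-left) — the text's own labels are in the omitted figure, so these names are ours — they can be sketched as follows.

def jpeg_lossless_predict(mode, a, b, c):
    """Standard JPEG lossless predictors: a = left, b = above, c = above-left."""
    predictors = {
        0: None,                 # no prediction (used in hierarchical mode)
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + (b - c) // 2,
        6: b + (a - c) // 2,
        7: (a + b) // 2,
    }
    return predictors[mode]

# The prediction error x - jpeg_lossless_predict(mode, a, b, c) is what gets
# Huffman coded, using the same category/amplitude scheme as the DC differences.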
The difference between the predicted value and the actual value is Huffman coded.
The coding scheme is exactly the same as that used in encoding the DC values in
the lossy JPEG scheme. Each symbol is represented by a category and an
amplitude. The category symbol is Huffman coded, and the amplitude is simply
transmitted as a string of bits. The Huffman code must meet the following
conditions:
1. The maximum code length is 16 bits.
2. A canonical Huffman code is used. If there are k codewords of length n, they
are given by x, x + 1, . . . , x + k − 1, where x is obtained by adding one to the
largest codeword of length n − 1 and left-shifting the result by one bit.
These two conditions are used for fast encoding and decoding of the
Huffman table. The JPEG bit stream must also contain the Huffman table, which is
coded as follows. Two lists are specified, BITS and HUFFVAL [11]. BITS is a 16-
byte array in which the n-th byte gives the number of codewords of length n.
HUFFVAL is a list of symbol values in increasing order of codeword length. In the
case of two symbols having the same codeword length, the symbol with the
smaller magnitude is given first. This encoding scheme significantly speeds up the
decoding. The decoding process requires two passes through the image.
For lossy compression, the Huffman coding scheme is not strictly specified
as above. Many hardware implementations of the lossy standard do not implement
this scheme, but simply directly input the Huffman table into the JPEG stream. The
informative section of the standard provides example Huffman tables for the
luminance and chrominance components. These tables work quite well in practice
for a wide variety of images for lossy compression, and so they are often used.
With the use of standard tables, the decoder requires only one pass through the
image. But this is not possible in the case of lossless compression.
The JPEG lossless standard based on Huffman coding has been adopted widely
and is used in some public domain implementations. However, research in the
early 1990s showed that performance could be significantly improved by using
better predictors and new algorithms. In 1994, JPEG solicited proposals for
improved lossless compression standards that could be decoded with one single
pass of the image. Several proposals were received, and in 1997, the JPEG-LS
standard was approved. Details of the standard can be found in [12] and [13]. The
new standard works significantly better than the old one, and several of its
extensions are currently under consideration for standardization.
