MIP Unit 1
Overview of Image Processing systems and the human visual system - Image representation - pixels
and voxels, gray scale and color models - Medical image file formats - DICOM, ANALYZE 7.5, NIFTI
and INTERFILE - Discrete sampling model and quantization - Relationships between pixels,
arithmetic and logical operations - Image quality and signal-to-noise ratio - Image transforms - 2D DFT,
DCT, KLT.
INTRODUCTION:
An image contains descriptive information about the object it represents. An image is defined as
a two-dimensional function, f(x, y) that carries some information, where x and y are known as
spatial or plane coordinates. The amplitude of ‘f’ at any pair of coordinates (x, y) is called the
intensity or gray level of the image at that point.
PIXELS:
Pixels are the small individual elements of a digital image, also known as picture elements or
image elements. Each pixel has a particular location and a brightness or intensity value. A finite
number of pixels forms a digital image.
IMAGE PROCESSING:
Image processing is defined as the process of analyzing and manipulating images using a
computer.
ANALOG IMAGE PROCESSING:
Any image processing task which is conducted on two-dimensional analog signals by analog
means is known as analog image processing.
DIGITAL IMAGE PROCESSING:
Using computer algorithms to perform image processing on digital images is referred to as digital
image processing, i.e., processing digital images by means of a digital computer.
MEDICAL IMAGE PROCESSING:
MIP mainly includes medical image segmentation and registration, structural analysis, 3D
reconstruction and motion analysis. These and other research directions rely on accurate image
segmentation, so medical image segmentation is the cornerstone of medical image processing.
Advantages of DIP:
It allows a wide range of algorithms to be applied to the input data.
It avoids the noise build-up and signal-distortion problems that occur in analog processing.
Fundamental Steps
The fundamental steps in digital image processing are
Image acquisition
Image enhancement
Image restoration
Image compression
Image segmentation
Image representation and description.
Applications - Digital image processing is mainly applied in the following fields,
Gamma - Ray Imaging
X-ray Imaging
Imaging in the Ultra-Violet (UV) Band
Imaging in the Visible and Infrared (IR) Band
Imaging in the Microwave Band
Imaging in the Radio Band
Ultrasound imaging
Block Diagram
a) Image sensors:
Image sensing or image acquisition is used to acquire (i.e. obtain) digital images. It
requires two elements: a physical device that is sensitive to the energy radiated by the object
to be imaged (example: a digital video camera), and a digitizer to convert the output of the
physical sensing device into digital form.
c) Computer:
The computer used in an image processing system is a general-purpose
computer. It can range from a personal computer (PC) to a supercomputer. In
some dedicated applications, specially designed computers are used to achieve
the required performance.
d) Image processing software
The software for image processing has specialized modules which perform specific
tasks. Some software packages have the facility for the user to write code using
the specialized modules.
e) Mass storage
Since images require large amounts of storage space, mass storage capability is
very important in image processing applications. Example: a 1024 x 1024 image
with each pixel represented by 8 bits requires 1 megabyte of storage space,
without compression.
Measurement: Storage is measured in the following units:
1 Byte = 8 bits
K bytes (Kilobytes) = One thousand bytes
M bytes (Megabytes) = One million bytes
G bytes (Gigabytes) = One billion bytes
T bytes (Terabytes) = One trillion bytes
f) Image displays:
Commonly used displays are color TV monitors. These monitors are driven by the
outputs of the image and graphics display cards which are part of the computer
system. If stereo displays are needed in some applications, a headgear containing
two small displays is used.
g) Hardcopy devices
Hardcopy devices are used for recording images. These devices include,
Laser printers
Film cameras
Heat - sensitive devices
Inkjet units
Digital units like optical and CD-ROM disks, etc.
Camera film provides the highest resolution, but paper is the preferred medium for written
material.
h) Network:
Networking is a standard function in all computer systems today. Since image
processing applications involve large amounts of data, the main consideration here is
bandwidth. Communication with remote sites is done through the Internet, which uses
optical fiber and other broadband technologies.
Vision is the most advanced human sense. So, images play the most important role in
human perception. Human visual perception is very important because the selection of image
processing techniques is based only on visual judgements.
Structure of the Human Eye
The human eye is nearly in the shape of a sphere. Its average diameter is approximately
20 mm. The eye, also called the optic globe, is enclosed by three membranes:
(1) the cornea and sclera outer cover,
(2) the choroid, and
(3) the retina.
Representation is the process of characterizing the quantity represented by each pixel. It is
done after the segmentation of an image, because the output of image segmentation is raw pixel
data (a group of pixels). Representation converts this raw pixel data into a form suitable for
further computer processing.
Types: a region can be represented in terms of its external characteristics (its boundary) or in terms of its internal characteristics (the pixels comprising the region).
Description
Representation alone does not make the data useful to a computer. The next task,
description of the region based on the chosen representation, is also needed to complete the
process.
Objective:
The objective of the description process is to 'capture' the essential differences between
objects or classes of objects while maintaining as much independence as possible from changes
in factors such as location, size and orientation.
Features:
Some of the features used to describe the boundary of a region are:
Length
The orientation of the straight line joining the extreme points
The number of concavities i.e. curves in the boundary etc.
Whatever the type of representation, the features selected for description should be
insensitive to the following variations:
Size
Translation and
Rotation
4. RELATIONSHIPS BETWEEN PIXELS
The relationships between the pixels in an image should be known clearly to
understand the image processing techniques. Those relationships for an image f(x, y) are
explained below:
Neighbors of a Pixel
The 3x3 neighborhood of a pixel p at coordinates (x, y) consists of the pixels at

(x - 1, y - 1)   (x - 1, y)   (x - 1, y + 1)
(x, y - 1)       (x, y)       (x, y + 1)
(x + 1, y - 1)   (x + 1, y)   (x + 1, y + 1)

The four horizontal and vertical neighbors of p, denoted N4(p), have coordinates
(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1).
Here, each pixel in N4(p) is at unit distance from (x, y), as shown in fig. 1.24. If (x, y) is on the border
of the image, some of the neighbors of p lie outside the digital image.
The four diagonal neighbors of p, denoted ND(p), have coordinates (x + 1, y + 1), (x + 1, y - 1),
(x - 1, y + 1) and (x - 1, y - 1); together with the 4-neighbors they form the 8-neighbors of p,
denoted N8(p). Here also, some of the neighbors lie outside the image if (x, y) is on the border of the
image.
Adjacency
Let V be the set of gray-level values used to define adjacency (for example, a subset of the values
0 to 255). There are three types of adjacency:
(1) 4-adjacency
(2) 8-adjacency
(3) m-adjacency (mixed adjacency)
4 – Adjacency:
Two pixels p and q with values from {V} in an image are 4 – adjacent if q is in the set N4(p).
8 – Adjacency:
Two pixels p and q with values from {V} in an image are 8 – adjacent if q is in the set N8(p).
Here, all three pixels are 8-adjacent to the center pixel, but the multiple 8-adjacent paths between
them create an ambiguity (i.e. confusion). This ambiguity is removed by using m-adjacency.
Two pixels p and q of the same value (or specified similarity) are m-adjacent if either
i. q is in N4(p), or
ii. q is in ND(p) (p and q are diagonally adjacent) and p and q do not have any common 4-adjacent
neighbors with values from V. Conditions (i) and (ii) cannot both hold at the same time.
Path
A path is also known as a digital path or curve. A path from pixel p with coordinates (x, y) to
pixel q with coordinates (s, t) is defined as a sequence of distinct pixels with coordinates
(x0, y0), (x1, y1), ..., (xn, yn)
where (x0, y0) = (x, y), (xn, yn) = (s, t), and pixels (xi, yi) and (xi-1, yi-1) are adjacent for 1 <= i <= n.
Path Length:
Path length is the number of pixels present in a path. It is given by the value of ‘n’
here.
Closed Path:
In a path, if (x0, y0) = (xn, yn) i.e. the first and last pixel are the same, it is known as a
‘closed path’.
Types:
(1) 4 - path
(2) 8 - path
(3) m - path
Connectivity
Connectivity between pixels is a fundamental concept of digital image processing. Let S represent a
subset of pixels in an image; two pixels p and q are said to be connected in S if there exists a path
between them consisting entirely of pixels in S.
Connected Component:
For any pixel p in S, the set of pixels in S that are connected to p is called a "connected
component" of S.
Connected Set:
If S has only one connected component, then S is called a 'connected set'.
Region
Let R represent a subset of pixels in an image. R is called a region of the image if R is a connected set.
Boundary
Boundary is also known as border or contour (outline). The boundary of a region
R is defined as the set of pixels in the region that have one or more neighbors which are not in
the same region R. If R is an entire image, its boundary is defined as the set of pixels in the
first and last rows and columns of the image. The boundary of a finite region forms a closed
path and is therefore a "global" concept.
Edge
Edges are formed by pixels whose derivative values are higher than a preset threshold.
Thus, edges are considered as gray-level or intensity discontinuities. Therefore, it is a
“local” concept.
Distance Measures
Various distance measures are used to determine the distance between different pixels.
Conditions:
Consider three pixels p, q and z, where p has coordinates (x, y), q has coordinates (s, t) and z has
coordinates (v, w). For these pixels, D is a distance function or metric if
(a) D(p, q) >= 0, with D(p, q) = 0 if and only if p = q,
(b) D(p, q) = D(q, p), and
(c) D(p, z) <= D(p, q) + D(q, z).
The commonly used distance measures are (1) the Euclidean distance De(p, q) = [(x - s)^2 + (y - t)^2]^(1/2),
(2) the city-block distance D4(p, q) = |x - s| + |y - t|, (3) the chessboard distance
D8(p, q) = max(|x - s|, |y - t|), and (4) the Dm distance, the length of the shortest m-path between p and q.
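The first three distance measures can be computed directly from the coordinate differences. A minimal Python sketch (the function and variable names are illustrative, not from the text):

def d_euclidean(p, q):
    # De(p, q) = [(x - s)^2 + (y - t)^2]^(1/2)
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def d4(p, q):
    # City-block distance: D4(p, q) = |x - s| + |y - t|
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    # Chessboard distance: D8(p, q) = max(|x - s|, |y - t|)
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

print(d4((3, 4), (0, 0)), d8((3, 4), (0, 0)), d_euclidean((3, 4), (0, 0)))   # 7 4 5.0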
To create a digital image, we need to convert the continuous sensed data into digital form. This involves two
processes: sampling and quantization. An image may be continuous with respect to the x and y coordinates and
also in amplitude. To convert it into digital form we have to sample the function in both coordinates and in
amplitude; digitizing the coordinate values is called sampling, and digitizing the amplitude values is called quantization.
To form a digital image, the gray-level values must also be converted (quantized) into discrete quantities.
Suppose we divide the gray-level scale into eight discrete levels ranging from black to white. The vertical tick marks
assign a specific value to each of the eight levels. The continuous gray levels are quantized simply by assigning one
of the eight discrete gray levels to each sample; the assignment is made according to the vertical proximity of a
sample to a tick mark. Starting at the top of the image and carrying out this procedure line by line produces a
two-dimensional digital image.
if a signal is sampled at more than twice its highest frequency component, then it can be reconstructed exactly
from its samples.
But, if it is sampled at less than that frequency (called under sampling), then aliasing will result.
This causes frequencies to appear in the sampled signal that were not in the original signal.
Note that subsampling of a digital image will cause under sampling if the subsampling rate is less than twice
the maximum frequency in the digital image.
Aliasing can be prevented if the signal is filtered to eliminate high frequencies, so that its highest frequency
component is less than half the sampling rate.
Gating function: a gating function exists for all space (or time) and has value zero everywhere except over a finite
range of space/time. It is often used for theoretical analysis of signals; however, although a gating signal is
mathematically well defined, its spectrum contains unboundedly high frequencies (it is not bandlimited).
A periodic signal, x(t) = x(t + T) for all t where T is the period, whose spectrum contains only a finite number of
harmonics has a finite maximum frequency component, and so is a bandlimited signal.
Sampling at a higher sampling rate (usually twice or more) than necessary to prevent aliasing is called
oversampling.
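A short NumPy sketch illustrating undersampling: a 9 Hz sinusoid sampled at only 12 samples per second (below its Nyquist rate of 18 Hz) yields exactly the same samples as a 3 Hz alias. The frequencies are illustrative choices, not taken from the text.

import numpy as np

fs = 12.0                                # sampling rate in Hz, below the Nyquist rate for a 9 Hz tone
t = np.arange(0, 1, 1 / fs)              # one second of sample instants
x_true = np.sin(2 * np.pi * 9 * t)       # 9 Hz signal, undersampled
x_alias = np.sin(2 * np.pi * 3 * t)      # 3 Hz alias (9 Hz folds down to 12 - 9 = 3 Hz)
print(np.allclose(x_true, -x_alias))     # True: the sample values coincide (apart from sign)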
A pixel is usually represented as a square or a dot on a display screen such as a mobile phone, TV,
or computer monitor. It is the smallest unit of a digital graphic that can be illuminated on a display
screen, and a set of such illuminated pixels forms an image on the screen. Pixels can be regarded as
the building blocks of a digital image and can be controlled to accurately show the desired picture.
The quality of a picture, in terms of clarity, size and colour, is largely controlled by the number and
density of pixels in the display: the higher the resolution, the smaller the pixel size and the better the
clarity, and vice versa. Each pixel has a unique geometric coordinate, dimensions (length and
breadth), a bit depth (eight bits or more), and the ability to display a multitude of colours.
VOXELS:
Voxels are fairly complicated to understand but can be defined most simply as volumetric pixels.
In 3D printing, a voxel can be defined as a value on a grid in three-dimensional space, like a pixel
with volume.
Each voxel contains certain volumetric information which helps to create a three
dimensional object with required properties. Voxel is the smallest distinguishable element
of any 3D printed object and represents a certain grid value. However, unlike a pixel, voxel
does not have a specific position in three-dimensional space.
They are not bound by absolute coordinates but are defined by the relative position
of the surrounding voxels. We can equate a voxel to bricks, where the position of a brick is
defined by the relative position of the neighbouring bricks. One important aspect of every
voxel is the ability of repeatability. Voxels have a defined shape and size and can be
stacked over each other to create a 3D object.
A voxel need not be a cube; it can take many forms, such as spherical, triangular, square,
rectangular or diamond shapes, as long as it follows the principal rule of repeatability.
A pixel has no volume, whereas a voxel has a volume, which is why it is also called a 'volumetric pixel'.
COLOR MODELS
A color model is a specification of a coordinate system and a subspace within that system where each
color is represented by a single point. The color model can also be called as color space or color system.
It is used to specify the colors easily in a standard and accepted way.
Classification
A number of color models are in use today. Based on the use, they can be classified as
below.
RGB (Red, Green, Blue) model for color monitors and color video cameras.
CMY (Cyan, Magenta, Yellow) model and CMYK (Cyan, Magenta, Yellow, Black) model
for color printing.
The RGB model is based on a Cartesian Coordinate System. The RGB color cube is a color subspace
which is shown in fig. 1.29.
Fig. 1.29 The RGB Color Cube
Applications:
Color monitors
Color video cameras etc.
Advantages
Creating colors in this model is an easy process, and therefore it is an ideal tool for image color
generation. Conversion to other models such as CMY is straightforward, and the model is suitable
for hardware implementation.
The RGB system is based on the strong perception of human vision to red, green and blue primaries.
Disadvantages
It is not intuitive to think of a color image as being formed by combining three primary images.
This model is not suitable for describing colors in a way which is practical for human
interpretation.
For example, the color of a flower is not represented by the percentage of each of the primaries present
in that flower. Therefore, it is less suitable for color description.
Hue: a color attribute that describes a pure color.
Saturation: a measure of the degree to which a pure color is diluted by white light.
Representation of medical images requires two major components, namely metadata and image data,
as shown in Fig. 1. Metadata provides the structure of, and information about, the image and can be
added to an image automatically by the capturing device.
Metadata is stored as a header, either at the beginning of the image file or in a separate file, containing
technical, descriptive and administrative metadata. Technical metadata includes the image dimensions, pixel
depth, spatial resolution, camera settings, and dots per inch. Descriptive metadata contains the image creator,
keywords related to the image, captions, titles and comments. Administrative metadata is added manually and
includes information such as licensing rights, restrictions on reuse, and contact information for the owner.
Image data consists of the raw intensity values that represent the pixels. Each pixel has its own memory space
to store its intensity, expressed as a number of bits (the pixel depth). In a binary image, each pixel is stored in a
single bit, either zero or one. X-ray, CT and MRI produce grayscale images with 8 or 16 bits per pixel; the number
of gray levels between black and white depends on the number of bits used to represent each pixel. PET and
SPECT scanners produce colour images with 24 bits per pixel in a red-green-blue palette. The size of the image
data is calculated from the metadata, as in Eqn. (1), as the number of rows times the number of columns times
the number of bits per pixel (or voxel).
Fig. 1. Medical image file format components
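A small Python sketch of this size calculation (the function name and example values are illustrative):

def image_data_size_bytes(rows, cols, bits_per_pixel, slices=1):
    # Raw image data size = rows x columns x slices x bits per pixel, expressed in bytes.
    return rows * cols * slices * bits_per_pixel // 8

# Example: a 512 x 512 CT slice at 16 bits per pixel occupies 524,288 bytes (0.5 MB).
print(image_data_size_bytes(512, 512, 16))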
DICOM :
DICOM (Digital Imaging and Communications in Medicine) is a standard protocol for the
management and transmission of medical images and related data and is used in many healthcare
facilities. DICOM was originally developed by the National Electrical Manufacturers Association
(NEMA) and the American College of Radiology (ACR). It is a registered trademark of NEMA and is
governed by the DICOM Standards Committee, a collaboration of users across all medical imaging
specialties with an interest in the standardization of medical imaging information .
DICOM is the international standard to communicate and manage medical images and data. Its mission
is to ensure the interoperability of systems used to produce, store, share, display, send, query, process, retrieve
and print medical images, as well as to manage related workflows. Vendors who manufacture imaging equipment
-- e.g., MRIs -- imaging information systems -- e.g., PACS -- and related equipment often observe DICOM
standards, according to NEMA.
These standards can apply to any field of medicine where medical imaging technology is predominantly
used, such as radiology, cardiology, oncology, obstetrics and dentistry. Medical imaging is typically a
noninvasive process of creating a visual representation of the interior of a patient, which is otherwise hidden
beneath the skin, muscle, and surrounding organ systems, for diagnostic purposes.
The term non-invasive in this context means instruments are not introduced into the patient's body -- in
most cases -- when performing a scan. Medical images are used for clinical analysis, diagnosis and treatment as
part of a patient's care plan. The information collected can be used to identify any anatomical and physiological
abnormalities, chart the progress of treatment, and provide clinicians with a database of normal patient scans for
later reference.
Imaging information systems, in compliance with DICOM, have largely eliminated the need for film-
based images and the physical storage of these items. Instead, these days, medical images, as well as related non-
image data, can be securely stored digitally, whether on premises or in the cloud.
Importance of DICOM:
With the introduction of advanced imaging technologies, such as CT scans, and the growing use of
computing in clinical work, ACR and NEMA saw a need for a standard method to transfer images and associated
information between different vendor devices, according to the International Organization for Standardization.
These devices produce a variety of digital image formats.
Today, DICOM is used worldwide to store, exchange and transmit medical images, enabling the
integration of medical imaging devices from multiple manufacturers. Patient data and related images are
exchanged and stored in a standardized format. Without a standards-based approach, it would be difficult to share
data between different imaging devices because they would need to interpret multiple image formats.
With DICOM, physicians have easier access to images and reports, allowing them to make a diagnosis,
potentially from anywhere in the world. In turn, patients obtain more efficient care.
Not all medical images follow a DICOM format, which has led to the development of cross-document
sharing, or XDS. An extension known as XDS-I is specific to imaging and allows the storage of multiple image
formats. Many medical imaging system vendors offer features that interpret DICOM and non-DICOM formats.
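In practice, DICOM files are usually read with a library rather than by parsing the standard by hand. The sketch below uses the third-party pydicom package (an assumption; it is not mentioned in the text), and the file name is illustrative:

import pydicom

ds = pydicom.dcmread("ct_slice.dcm")                   # parse the DICOM header (metadata) and pixel data
print(ds.PatientID, ds.Modality, ds.Rows, ds.Columns)  # a few standard DICOM attributes
pixels = ds.pixel_array                                # stored image as a NumPy array
print(pixels.shape, pixels.dtype)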
Analyze 7.5 :
Analyze 7.5 is a file format for storing MRI data, developed by the Biomedical Imaging Resource (BIR)
at the Mayo Foundation in the late 1980s. Analyze 7.5 uses two files to represent the image data and the
metadata, with the extensions ".img" and ".hdr".
The image database is the system of files that the ANALYZE package uses to organize and access
image data on the disk. Facilities are provided for converting data from a number of sources for use with the
package. A description of the database format is provided to aid developers in porting images from other sources
for use with the ANALYZE system. An ANALYZE image database consists of at least two files:
• an image file
• a header file
The files have the same name and are distinguished by the extensions .img for the image file and .hdr
for the header file. Thus, for the image database heart, there are the UNIX files heart.img and heart.hdr.
The ANALYZE programs all refer to this pair of files as a single entity named heart.
Image File:
The format of the image file is very simple, usually containing uncompressed pixel data for the images
in one of several possible pixel formats. The .img file contains the raw voxel data, while the .hdr file
contains information about the .img file, such as the number of voxels in each dimension and the voxel
size. The Analyze 7.5 header is 348 bytes in size.
Header File:
The header file is represented as a C structure which describes the dimensions and history of the
pixel data. The header consists of three substructures: header_key (40 bytes), image_dimension
(108 bytes) and data_history (200 bytes). The header_key and image_dimension substructures are
required, while data_history is optional. The Analyze header is flexible and can be extended with new
user-defined data types.
The header_key substructure contains several elements, namely sizeof_header, extents and regular. The
element sizeof_header indicates the size of the header in bytes, and the element regular indicates that all
images in the volume are the same size. The image_dimension substructure contains elements such as the
X, Y and Z dimensions of the data, the spatial units of measure for a voxel, the data type, and the pixel
dimensions in millimetres.
NIFTI :
NIFTI, developed in the early 2000s by the National Institutes of Health, overcomes the drawbacks of
Analyze 7.5. The NIFTI file format is largely compatible with Analyze 7.5, and data stored in NIFTI
format can also use a pair of files, ".img" and ".hdr". Working with a pair of files is inconvenient and
error-prone, so NIFTI also allows the header and data to be stored in a single file with the ".nii"
extension. When a NIFTI file is transmitted over a network, the deflate algorithm can be used to compress
and decompress it.
In the NIFTI file format, the first three dimensions store the three spatial axes x, y and z, and the fourth
dimension is reserved for the time points t. The NIFTI header is 348 bytes in the case of ".hdr/.img" and
352 bytes in the case of ".nii". The NIFTI header parameters are very similar to those of ANALYZE 7.5,
as given in Table 3. The header structure can be read through the niftiinfo() function in MATLAB. The
NIFTI header info of the Brain Imaging of Normal Subjects (BRAINS) dataset is given in Table 4.
NIFTI was created for handling neuro-imaging but can be used in other fields as well. The raw data are
saved as a 3D image, and the header contains two affine coordinate definitions that relate the voxel index
to a spatial location. NIFTI has the advantage that a whole scan can be stored in a single file (or one
".hdr/.img" pair) instead of handling multiple Analyze files. A NIFTI file can also store additional
parameters such as key acquisition parameters and the experimental design.
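Both Analyze 7.5 and NIfTI volumes can be loaded in Python with the third-party nibabel package (an assumption; the text itself mentions only MATLAB's niftiinfo()). The file name below is illustrative:

import nibabel as nib

img = nib.load("brain.nii")          # also accepts .nii.gz and .hdr/.img pairs
print(img.header["dim"])             # dimension array from the NIfTI/Analyze header
print(img.header.get_zooms())        # voxel sizes (pixdim)
data = img.get_fdata()               # the volume as a NumPy array, e.g. (x, y, z) or (x, y, z, t)
print(data.shape)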
Concept:
Unlike other variable-length codes, arithmetic coding generates non-block codes. There is no
one-to-one correspondence between source symbols and code symbols in which a code word exists for each
source symbol. Instead, a whole set or sequence of source symbols is assigned a single arithmetic code word.
Features:
An interval of real numbers between 0 and 1 is defined by the code words for each
sequence of source symbols.
When the number of symbols in the message increases, two changes can happen:
(i) The interval for representing the message becomes smaller according to the
probability of each symbol.
(ii) The number of bits for representing the interval becomes larger.
These operations are applied on a pixel-by-pixel basis. So, to add two images together, we add the value at
pixel (0 , 0) in image 1 to the value at pixel (0 , 0) in image 2 and store the result in a new image at pixel (0 , 0).
Then we move to the next pixel and repeat the process, continuing until all pixels have been visited. Clearly, this
can work properly only if the two images have identical dimensions. If they do not, then combination is still
possible, but a meaningful result can be obtained only in the area of overlap. If our images have dimensions of
w1*h1, and w2*h2 and we assume that their origins are aligned, then the new image will have dimensions w*h,
where:
w = min (w1, w2)
h = min (h1, h2)
Addition and Averaging:
If we add two 8-bit gray scale images, then pixels in the resulting image can have values in the range 0-510.
We should therefore choose a 16-bit representation for the output image or divide every pixel value by two. If we
do the latter, then we are computing an average of the two images. The main application of image averaging is
noise removal. Every image acquired by a real sensor is afflicted by some degree of random noise. However, the
level of noise represented in the image can be reduced, provided that the scene is static and unchanging, by
averaging multiple observations of that scene. This works because the noise distribution can be regarded as
approximately symmetrical with a mean of zero. As a result, positive perturbations of a pixel value are
as likely as negative perturbations of the same magnitude, and there is a tendency for the perturbations to
cancel out when several noisy values are added. Addition can also be used to combine the information of two
images, as in image morphing, in motion pictures.
Algorithm 1: image addition
read input-image1 into in-array1;
read input-image2 into in-array2;
for i = 1 to no-of-rows do
  for j = 1 to no-of-columns do begin
    out-array(i,j) = in-array1(i,j) + in-array2(i,j);
    if (out-array(i,j) > 255) then out-array(i,j) = 255;
  end
write out-array to out-image;
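Algorithm 1 maps directly onto array operations. A NumPy sketch of saturating 8-bit addition and of averaging (the variable names are illustrative):

import numpy as np

def add_clipped(img1, img2):
    # Sum in 16 bits to avoid overflow, then clip back to the 8-bit range 0-255.
    total = img1.astype(np.uint16) + img2.astype(np.uint16)
    return np.clip(total, 0, 255).astype(np.uint8)

def average(img1, img2):
    # Dividing the sum by two keeps the result inside 0-255 without clipping.
    total = img1.astype(np.uint16) + img2.astype(np.uint16)
    return (total // 2).astype(np.uint8)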
Subtraction:
Subtracting two 8-bit grayscale images can produce values between -255 and +255. This necessitates the use of
16-bit signed integers in the output image, unless the sign is unimportant, in which case we can simply take the
modulus of the result and store it using 8-bit integers:
g(x,y) = | f1(x,y) - f2(x,y) |
The main application of image subtraction is change detection (or motion detection). If we make two
observations of a scene and compute their difference using the above equation, then changes are indicated by
pixels in the difference image which have non-zero values. Sensor noise, slight changes in illumination and
various other factors can result in small differences which are of no significance, so it is usual to apply a
threshold to the difference image. Differences below this threshold are set to zero; differences above the
threshold can, if desired, be set to the maximum pixel value. Subtraction can also be used in medical
imaging to remove static background information.
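A NumPy sketch of this change-detection scheme, absolute difference followed by thresholding (the threshold value is an illustrative choice):

import numpy as np

def change_mask(f1, f2, threshold=30):
    # g(x, y) = |f1(x, y) - f2(x, y)|, computed in a signed type to avoid wrap-around.
    diff = np.abs(f1.astype(np.int16) - f2.astype(np.int16))
    # Differences below the threshold are set to 0; the rest to the maximum pixel value.
    return np.where(diff > threshold, 255, 0).astype(np.uint8)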
Multiplication and Division:
Multiplication and division can be used to adjust the brightness of an image. Multiplication of pixel values by
a number greater than one will brighten the image, and division by a factor greater than one will darken the image.
Brightness adjustment is often used as a preprocessing step in image enhancement. One of the principal uses of
image multiplication (or division) is to correct gray-level shading resulting from non-uniformities in illumination
or in the sensor used to acquire the image.
LOGICAL OPERATION:
Logical operations apply only to binary images, whereas arithmetic operations apply to multi-valued
pixels. Logical operations are basic tools in binary image processing, where they are used for tasks such as
masking, feature detection, and shape analysis. Logical operations on entire image are performed pixel by pixel.
Because the AND operation of two binary variables is 1 only when both variables are 1, the result at any location
in a resulting AND image is 1 only if the corresponding pixels in the two input images are 1. As logical operation
involve only one pixel location at a time, they can be done in place, as in the case of arithmetic operations. The
XOR (exclusive OR) operation yields a 1 when one or other pixel (but not both) is 1, and it yields a 0 otherwise.
The operation is unlike the OR operation, which is 1, when one or the other pixel is 1, or both pixels are 1.
Logical AND & OR operations are useful for the masking and compositing of images. For example, if we
compute the AND of a binaryimage with some other image, then pixels for which the corresponding value in the
binary image is 1 will be preserved, but pixels for which the corresponding binary value is 0 will be set to 0
(erased). Thus the binary image acts as a mask that removes information from certain parts of the image. On the
other hand, if we compute the OR of a binary image with some other image, the pixels for which the
corresponding value in the binary image is 0 will be preserved, but pixels for which the corresponding binary
value is 1 will be set to 1 (forced to the maximum value).
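A NumPy sketch of AND-style masking of a grayscale image with a binary mask, as described above (the names are illustrative):

import numpy as np

def apply_mask(image, mask):
    # mask is a binary image (0 or 1): pixels where mask == 1 are preserved,
    # pixels where mask == 0 are erased (set to 0), i.e. a per-pixel logical AND.
    return (image * (mask > 0)).astype(image.dtype)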
Image quality can refer to the level of accuracy with which different imaging systems capture, process,
store, compress, transmit and display the signals that form an image. Another definition refers to image quality
as "the weighted combination of all of the visually significant attributes of an image". The difference between
the two definitions is that one focuses on the characteristics of signal processing in different imaging systems
and the latter on the perceptual assessments that make an image pleasant for human viewers. Image quality
should not be mistaken for image fidelity. Image fidelity refers to the ability of a process to render a given
copy in a perceptually similar way to the original (without distortion or information loss), i.e., through
a digitization or conversion process from analog media to digital image.
Noise is a random variation of image density, visible as grain in film and pixel level variations in digital images.
It arises from the effects of basic physics— the photon nature of light and the thermal energy of heat— inside
image sensors. Typical noise reduction (NR) software reduces the visibility of noise by smoothing the image,
excluding areas near contrast boundaries. This technique works well, but it can obscure fine, low contrast detail.
Dynamic range (or exposure range) is the range of light levels a camera can capture, usually measured in f-
stops, EV (exposure value), or zones (all factors of two in exposure). It is closely related to noise: high noise
implies low dynamic range.
Tone reproduction is the relationship between scene luminance and the reproduced image brightness.
Contrast, also known as gamma, is the slope of the tone reproduction curve in a log-log space. High contrast
usually involves loss of dynamic range — loss of detail, or clipping, in highlights or shadows.
Color accuracy is an important but ambiguous image quality factor. Many viewers prefer enhanced color
saturation; the most accurate color isn't necessarily the most pleasing. Nevertheless, it is important to measure a
camera's color response: its color shifts, saturation, and the effectiveness of its white balance algorithms.
Distortion is an aberration that causes straight lines to curve. It can be troublesome for architectural
photography and metrology (photographic applications involving measurement). Distortion tends to be
noticeable in low-cost cameras, including cell phones, and low-cost DSLR lenses. It is usually very easy to see
in wide-angle photos. It can now be corrected in software.
Vignetting, or light falloff, darkens images near the corners. It can be significant with wide angle lenses.
Exposure accuracy can be an issue with fully automatic cameras and with video cameras where there is little or
no opportunity for post-exposure tonal adjustment. Some even have exposure memory: exposure may change
after very bright or dark objects appear in a scene.
Lateral chromatic aberration (LCA), also called "color fringing", including purple fringing, is a lens
aberration that causes colors to focus at different distances from the image center. It is most visible near corners
of images. LCA is worst with asymmetrical lenses, including ultrawides, true telephotos and zooms. It is
strongly affected by demosaicing.
Lens flare, including "veiling glare" is stray light in lenses and optical systems caused by reflections between
lens elements and the inside barrel of the lens. It can cause image fogging (loss of shadow detail and color) as
well as "ghost" images that can occur in the presence of bright light sources in or near the field of view.
Color moiré is artificial color banding that can appear in images with repetitive patterns of high spatial
frequencies, like fabrics or picket fences. It is affected by lens sharpness, the anti-aliasing (lowpass) filter
(which softens the image), and demosaicing software. It tends to be worst with the sharpest lenses.
Artifacts - software (especially operations performed during RAW conversion) can cause significant visual
artifacts, including data compression and transmission losses (e.g. low-quality JPEG), oversharpening "halos",
and loss of fine, low-contrast detail.
Signal-to-noise ratio (SNR) is used in imaging to characterize image quality. The sensitivity of
a (digital or film) imaging system is typically described in the terms of the signal level that yields a threshold
level of SNR. Industry standards define sensitivity in terms of the ISO film speed equivalent, using SNR
thresholds (at average scene luminance) of 40:1 for "excellent" image quality and 10:1 for "acceptable" image
quality.
SNR is sometimes quantified in decibels (dB) of signal power relative to noise power, though in the
imaging field the concept of "power" is sometimes taken to be the power of a voltage signal proportional to
optical power; so a 20 dB SNR may mean either 10:1 or 100:1 optical power, depending on which definition is
in use.
DEFINITION OF SNR:
Traditionally, SNR is defined to be the ratio of the average signal value to the standard deviation of the signal,
when the signal is an optical intensity, or as the square of this value if the signal and noise are viewed as
amplitudes (field quantities).
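Under this definition (mean of the signal divided by its standard deviation, for an intensity signal), the SNR of an image region can be estimated directly. A minimal NumPy sketch; the decibel conversion below uses the amplitude (field-quantity) convention mentioned above:

import numpy as np

def snr(region):
    # SNR = average signal value / standard deviation of the signal (intensity definition).
    region = region.astype(np.float64)
    return region.mean() / region.std()

def snr_db(region):
    # Decibel form under the amplitude (field-quantity) convention: 20 * log10(ratio).
    return 20.0 * np.log10(snr(region))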
TRANSFORMATION:
Transformation is a function: a function that maps one set to another set after performing some operations.
A function applied inside a digital system that processes an input image and converts it into an output image can
be called a transformation function, since it expresses the relation describing how image 1 is converted into image 2.
Image transformation:
g(x,y) = T{ f(x,y) }
In this equation, f(x,y) is the input image, g(x,y) is the processed output image, and T is the transformation
function (operator) applied to f. This relation between the input image and the processed output image can also be
represented as
s = T(r)
where r is the pixel value (gray-level intensity) of f(x,y) at any point and s is the pixel value (gray-level
intensity) of g(x,y) at that point. The basic gray-level transformations have been discussed in the tutorial on
basic gray-level transformations; here we discuss some of the very basic transformation functions.
Examples
Now if you look at this particular graph, you will see a straight transition line between the input image and the
output image. It shows that for each pixel or intensity value of the input image, the output image has the same
intensity value; that is, the output image is an exact replica of the input image. It can be mathematically
represented as:
g(x,y) = f(x,y)
All the processing methods of digital images can be broadly divided into two categories:
(1) methods whose outputs are images, and
(2) methods whose outputs are image attributes.
Knowledge Base
The knowledge base may either be simple such as the details of image regions or it may be
complex such as an image database containing high-resolution satellite images for change-
detection applications.
It guides the operation of each processing module in Fig. 1.1.
- Image enhancement is the process of manipulating an image so that the result is more
suitable than the original image for specific application.
- There are a variety of enhancement techniques that use so many different image
processing approaches.
- Color is one of the most important features extracted from an image.
- Color image processing techniques process an image considering its color as one of the
important attributes, in addition to other attributes.
Recognition
The output of processing an image can be obtained from any stage of the modules shown in fig. 1.1.
The modules required for an application depend entirely on the problem to be solved.
UNIT II ENHANCEMENT TECHNIQUES 9+3
Gray level transformation- Log transformation, Power law transformation, Piecewise linear transformation.
Histogram processing- Histogram equalization, Histogram Matching. Spatial domain Filtering-Smoothing filters,
sharpening filters. Frequency domain filtering- Smoothing filters, Sharpening filters- Homomorphic filtering -
Medical image enhancement using Hybrid filters- Performance measures for enhancement techniques.
Experiment with various filtering techniques for noise reduction and enhancement in medical images using
Matlab.
The principal objective of enhancement is to process an image so that the result is more suitable than
the original image for a specific application. Image enhancement approaches fall into two broad categories:
Spatial domain methods
Frequency domain methods
The term spatial domain refers to the image plane itself, and approaches in this category are based on direct
manipulation of the pixels in an image.
g(x,y)=T[f(x,y)]
Spatial domain processes are denoted by the expression above, where f(x,y) is the input image, g(x,y) is the
processed image, and T is an operator on f defined over some neighborhood of (x,y). The neighborhood of a
point (x,y) can be defined by using a square or rectangular sub-image area centered at (x,y).
The center of the sub-image is moved from pixel to pixel, starting at the top left corner. The operator T is
applied at each location (x,y) to find the output g at that location. The process utilizes only the pixels in the
area of the image spanned by the neighborhood. The simplest form of the transformation occurs when the
neighborhood is of size 1x1. In this case g depends only on the value of f at (x,y), and T becomes a gray-level
transformation function of the form
s = T(r)
where r denotes the gray level of f(x,y) and s denotes the gray level of g(x,y) at any point (x,y).
Because enhancement at any point in an image depends only on the gray level at that point, techniques
in this category are referred to as point processing. There are basically three kinds of functions used in
gray-level transformation: linear (negative and identity), logarithmic (log and inverse-log), and power-law
(nth power and nth root) transformations.
Point Processing:
Contrast stretching - It produces an image of higher contrast than the original one. The operation is
performed by darkening the levels below m and brightening the levels above m in the original image.
In this technique, the values of r below m are compressed by the transformation function into a narrow
range of s towards black, and the opposite effect takes place for the values of r above m.
Thresholding function - It is a limiting case where T(r) produces a two-level binary image: the values
below m are transformed to black and the values above m are transformed to white.
Image negatives - The negative of an image with gray levels in the range [0, L-1] is obtained by the
transformation s = L - 1 - r. Reversing the intensity levels of an image in this manner produces the
equivalent of a photographic negative. This type of processing is particularly suited for enhancing white
or gray details embedded in dark regions of an image, especially when the black areas are dominant in size.
Log transformations:
The general form of the log transformation is
s = c log(1 + r)
where c is a constant and r >= 0.
This transformation maps a narrow range of low gray-level values in the input image into a wider range of
output gray levels; the opposite is true for higher values of the input levels. We would use this transformation
to expand the values of dark pixels in an image while compressing the higher-level values. The opposite is true
of the inverse log transformation. The log transformation has the important characteristic that it compresses
the dynamic range of images with large variations in pixel values.
Eg: the Fourier spectrum.
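A NumPy sketch of the log transformation s = c log(1 + r), with c chosen so that the output again spans 0-255 (a common choice, not specified in the text):

import numpy as np

def log_transform(image):
    r = image.astype(np.float64)
    c = 255.0 / np.log(1.0 + r.max())    # scale constant so the brightest pixel maps to 255
    s = c * np.log(1.0 + r)              # s = c * log(1 + r)
    return s.astype(np.uint8)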
Power-law transformations have the form s = c r^γ, where c and γ are positive constants. Power-law curves
with fractional values of γ map a narrow range of dark input values into a wider range of output values, with
the opposite being true for higher values of input gray levels. We may get various curves by varying the value of γ.
A variety of devices used for image capture, printing and display respond according to a power law. The
process used to correct this power-law response phenomenon is called gamma correction. For example, CRT
devices have an intensity-to-voltage response that is a power function.
Gamma correction is important if an image is to be displayed accurately on a computer screen; images that
are not corrected properly can look either bleached out or too dark. Color reproduction also uses the concept of
gamma correction, and it is becoming more important with the use of images over the Internet. It is also useful
in general-purpose contrast manipulation: to make an image darker we use γ > 1, and γ < 1 to make it brighter.
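A NumPy sketch of the power-law (gamma) transformation s = c r^γ applied to normalized intensities; γ = 0.5 and c = 1 are illustrative values:

import numpy as np

def gamma_correct(image, gamma=0.5, c=1.0):
    r = image.astype(np.float64) / 255.0    # normalize gray levels to [0, 1]
    s = c * np.power(r, gamma)              # s = c * r^gamma; gamma < 1 brightens, gamma > 1 darkens
    return np.clip(s * 255.0, 0, 255).astype(np.uint8)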
Contrast Stretching
It is the simplest piecewise-linear transformation function. Low-contrast images can result from various
causes, such as a lack of illumination, problems in the imaging sensor, or a wrong setting of the lens aperture
during image acquisition. The idea behind contrast stretching is to increase the dynamic range of gray levels
in the image being processed.
The locations of the points (r1, s1) and (r2, s2) control the shape of the transformation curve.
a) If r1 = r2 and s1 = s2, the transformation is a linear function that produces no change in
gray levels.
b) If r1 = r2, s1 = 0, and s2 = L-1, the transformation becomes a thresholding function that
creates a binary image.
c) Intermediate values of (r1, s1) and (r2, s2) produce various degrees of spread in the gray
values of the output image, thus affecting its contrast. Generally r1 <= r2 and s1 <= s2, so that
the function is single-valued and monotonically increasing. A sketch of this mapping follows.
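A NumPy sketch of the piecewise-linear stretch controlled by (r1, s1) and (r2, s2); it assumes 0 <= r1 < r2 <= 255, as required for a single-valued, monotonically increasing mapping:

import numpy as np

def contrast_stretch(image, r1, s1, r2, s2):
    # Piecewise-linear mapping: [0, r1] -> [0, s1], [r1, r2] -> [s1, s2], [r2, 255] -> [s2, 255].
    r = image.astype(np.float64)
    s = np.interp(r, [0, r1, r2, 255], [0, s1, s2, 255])
    return s.astype(np.uint8)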
Gray-level slicing highlights a specific range of gray levels in an image. (1) The first method is to display
a high value for all gray levels in the range of interest and a low value for all other gray levels. (2) The
second method is to brighten the desired range of gray levels but preserve the background and the gray-level
tonalities in the image.
P(rk) gives an estimate of the probability of occurrence of gray level rk. The sum of all components of a
normalized histogram is equal to 1. Histogram plots are simply plots of the number of pixels at each gray
level (or of P(rk)) versus rk.
In a dark image the components of the histogram are concentrated on the low (dark) side of the gray scale. In
a bright image the histogram components are biased towards the high side of the gray scale. The histogram of
a low-contrast image will be narrow and centred towards the middle of the gray scale.
The components of the histogram of a high-contrast image cover a broad range of the gray scale. The net effect
is an image that shows a great deal of gray-level detail and has a high dynamic range.
Histogram Equalization
Histogram equalization is a common technique for enhancing the appearance of images. Suppose we have an
image which is predominantly dark. Then its histogram would be skewed towards the lower end of the grey
scale and all the image detail would be compressed into the dark end of the histogram. If we could 'stretch out'
the grey levels at the dark end to produce a more uniformly distributed histogram, then the image would become
much clearer.
Let r represent the gray levels of the image to be enhanced, treated as a continuous quantity. The range of r
is [0, 1], with r = 0 representing black and r = 1 representing white. The transformation function is of the form
s = T(r)
which produces a level s for every pixel value r in the original image. The transformation function is
assumed to fulfil two conditions:
(a) T(r) is single-valued and monotonically increasing in the interval 0 <= r <= 1;
(b) 0 <= T(r) <= 1 for 0 <= r <= 1.
The transformation function should be single-valued so that the inverse transformation exists, and the
monotonically increasing condition preserves the increasing order from black to white in the output image.
The second condition guarantees that the output gray levels will be in the same range as the input levels.
The gray levels of the image may be viewed as random variables in the interval [0, 1]. The most fundamental
descriptor of a random variable is its probability density function (PDF); let Pr(r) and Ps(s) denote the
probability density functions of the random variables r and s respectively. A basic result from elementary
probability theory states that if Pr(r) and T(r) are known and T^(-1)(s) satisfies condition (a), then the
probability density function of the transformed variable s is Ps(s) = Pr(r) |dr/ds|.
For discrete values we deal with probabilities and summations instead of probability density functions
and integrals.
The transformation function is
sk = T(rk) = Σ (j = 0 to k) pr(rj) = Σ (j = 0 to k) nj / n
a function that seeks to produce an output image with a uniform histogram. It is a good approach when
automatic enhancement is needed. Thus, an output image is obtained by mapping each pixel with level rk in
the input image into a corresponding pixel with level sk; equalization automatically determines this
transformation.
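For an 8-bit image the discrete mapping above becomes sk = 255 times the cumulative sum of p(rj). A NumPy sketch (it assumes a uint8 input image):

import numpy as np

def histogram_equalize(image):
    hist, _ = np.histogram(image, bins=256, range=(0, 256))   # histogram of gray levels 0..255
    cdf = hist.cumsum() / image.size                           # cumulative distribution of p(r_j)
    mapping = np.round(255.0 * cdf).astype(np.uint8)           # s_k = (L - 1) * sum of p(r_j)
    return mapping[image]                                      # apply the mapping to every pixel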
Enhancement Using Arithmetic/Logic Operations
These operations are performed on a pixel-by-pixel basis between two or more images, excluding the
NOT operation, which is performed on a single image. Whether the actual mechanism of implementation is
sequential, parallel or simultaneous depends on the hardware and/or software.
Logic operations are also generally performed on a pixel-by-pixel basis. Only the AND, OR and NOT
logical operators are needed for functional completeness, because all other operators can be implemented
using them. When applying the operations to gray-scale images, pixel values are processed as strings of
binary numbers. The NOT logic operation performs the same function as the negative transformation.
Image masking is also referred to as region-of-interest (RoI) processing. It is done to highlight a
particular area and to differentiate it from the rest of the image. Of the four arithmetic operations,
subtraction and addition are the most useful for image enhancement.
Image Subtraction
The difference between two images f(x,y) and h(x,y) is expressed as
g(x,y)=f(x,y)-h(x,y)
It is obtained by computing the difference between all pairs of corresponding pixels from f
and h.
The key usefulness of subtraction is the enhancement of difference between images.
This concept is used in another gray scale transformation for enhancement known as
bit plane slicing. The higher order bit planes of an image carry a significant amount of
visually relevant detail while the lower planes contribute to fine details.
If we subtract the four least significant bit planes from the image, the result will be nearly identical,
but there will be a slight drop in the overall contrast due to less variability in the gray-level values of
the image.
The use of image subtraction is seen in the medical imaging area known as mask mode radiography.
The mask h(x,y) is an X-ray image of a region of a patient's body, captured by an intensified TV camera
located opposite the X-ray source. A contrast medium is then injected into the patient's bloodstream, and
a series of images is taken of the same region as h(x,y). The mask is then subtracted from the series of
incoming images. This subtraction yields the difference between f(x,y) and h(x,y), which appears as
enhanced detail in the output image.
This produces a movie showing how the contrast medium propagates through the various arteries of the
area being viewed. Most images in use today are 8-bit images, so their values lie in the range 0 to 255,
while values in the difference image can lie in the range -255 to 255. For this reason we have to do some
sort of scaling to display the results.
There are two methods to scale an image
(i) Add 255 to every pixel and then divide by 2. This guarantees that pixel values will be in the
range 0 to 255, but it does not guarantee that the entire 8-bit range will be covered. It is a simple and
fast method, but it does not utilize the entire gray-scale range to display the results.
(ii) Another approach is:
(a) Obtain the value of the minimum difference.
(b) Add the negative of the minimum value to every pixel in the difference image (this gives a
modified image whose minimum value is 0).
(c) Scale the difference image by multiplying each pixel by the quantity 255/Max, where Max is
the maximum value of the modified difference image.
This approach is more complicated and more difficult to implement. Image subtraction is also used in
segmentation applications.
Image Averaging
Consider a noisy image g(x,y) formed by the addition of noise η(x,y) to the original image
f(x,y)
One simple way to reduce this granular noise is to take several identical pictures and average them,
thus smoothing out the randomness.
Assume that at every point of coordinates (x,y) the noise is uncorrelated and has zero average value.
The objective of image averaging is to reduce the noise content by averaging a set of noisy images
{gi(x,y)}. If an image is formed by averaging K different noisy images, then E{g(x,y)} = f(x,y), which
means that the averaged image g(x,y) approaches f(x,y) as the number of noisy images used in the
averaging process increases. Image averaging is important in various applications, such as in the field
of astronomy, where images are acquired at very low light levels.
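A NumPy sketch of image averaging for noise reduction; the synthetic scene and noise level below are purely illustrative, and the residual noise shrinks roughly as 1/sqrt(K):

import numpy as np

def average_images(noisy_images):
    # noisy_images: a list of K images of the same static scene.
    stack = np.stack([im.astype(np.float64) for im in noisy_images])
    return stack.mean(axis=0)

rng = np.random.default_rng(0)
clean = np.full((64, 64), 100.0)                                   # synthetic static scene
noisy = [clean + rng.normal(0, 20, clean.shape) for _ in range(16)]
print((average_images(noisy) - clean).std())                       # roughly 20 / sqrt(16) = 5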
For linear spatial filtering, the response is given by a sum of products of the filter coefficients and the
corresponding image pixels in the area spanned by the filter mask.
The result R of linear filtering with the filter mask at point (x,y) in the image is the sum of products of
the mask coefficients with the corresponding pixels directly under the mask. The coefficient w(0,0) coincides
with the image value f(x,y), indicating that the mask is centered at (x,y) when the computation of the sum of
products takes place.
For a mask of size m x n we assume m = 2a + 1 and n = 2b + 1, where a and b are non-negative integers; that
is, all masks are of odd size.
In general, linear filtering of an image f of size M x N with a filter mask of size m x n is given by the expression
g(x,y) = Σ (s = -a to a) Σ (t = -b to b) w(s,t) f(x + s, y + t)
The general implementation for filtering an M x N image with a weighted averaging filter of size m x n is
obtained by dividing this sum of products by the sum of the mask coefficients; a sketch follows.
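The sum of products above can be evaluated with scipy.ndimage (a third-party library, assumed available) or with explicit loops. Below, a 3x3 weighted-averaging (smoothing) mask whose coefficients sum to 16:

import numpy as np
from scipy import ndimage

# 3x3 weighted averaging mask; dividing by 16 normalizes it so the output stays in range.
mask = np.array([[1, 2, 1],
                 [2, 4, 2],
                 [1, 2, 1]], dtype=np.float64) / 16.0

def smooth(image):
    # R = sum over (s, t) of w(s, t) * f(x + s, y + t), evaluated at every (x, y).
    return ndimage.correlate(image.astype(np.float64), mask, mode="nearest")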
Median filter
The best-known order-statistics filter is the median filter, which, as its name implies,
replaces the value of a pixel by the median of the gray levels in the neighborhood of that pixel:
The original value of the pixel is included in the computation of the median. Median filters are quite
popular because, for certain types of random noise, they provide excellent noise-reduction capabilities,
with considerably less blurring than linear smoothing filters of similar size. Median filters are
particularly effective in the presence of both bipolar and unipolar impulse noise. In fact,the median filter
yields excellent results for images corrupted by this type of noise.
The 100th percentile filter is the Max filter, which is useful for finding the brightest points in an image.
Also, because pepper noise has very low values, it is reduced by this filter as a result of the max selection
process in the sub-image area S. The 0th percentile filter is the Min filter, which is useful for finding the
darkest points in an image.
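Order-statistics filters replace each pixel by a ranked value from its neighborhood. A sketch using scipy.ndimage (a third-party library, assumed available); the 3x3 neighborhood size is an illustrative choice:

from scipy import ndimage

def order_statistics(image, size=3):
    med = ndimage.median_filter(image, size=size)    # 50th percentile: removes salt-and-pepper noise
    mx = ndimage.maximum_filter(image, size=size)    # 100th percentile (Max filter): finds bright points
    mn = ndimage.minimum_filter(image, size=size)    # 0th percentile (Min filter): finds dark points
    return med, mx, mn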
The principal objective of sharpening is to highlight fine details in an image or to
enhance details that have been blurred either in error or as a natural effect of particular
method for image acquisition.
The applications of image sharpening range from electronic printing and medical imaging to industrial
inspection and autonomous guidance in military systems.
Just as smoothing can be achieved by integration, sharpening can be achieved by spatial differentiation.
The strength of the response of a derivative operator is proportional to the degree of discontinuity of the
image at the point at which the operator is applied. Thus image differentiation enhances edges and other
discontinuities and de-emphasizes areas with slowly varying gray levels.
It is a common practice to approximate the magnitude of the gradient by using absolute
values instead of square and square roots.
A basic definition of a first order derivative of a one dimensional function f(x) is the difference.
The Laplacian highlights gray-level discontinuities in an image and de-emphasizes regions of slowly varying
gray levels; this tends to produce an image with a black background. The background texture can be recovered
by adding the original and Laplacian images.
• To sharpen an image, the Laplacian of the image is subtracted from the original image.
A slight further generalization of unsharp masking is called high-boost filtering. A high-boost filtered
image is defined at any point (x,y) as
fhb(x,y) = A f(x,y) - f_lp(x,y),   A >= 1
where f_lp is a blurred (lowpass-filtered) version of f; A = 1 gives standard unsharp masking. A sketch follows.
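A NumPy/scipy sketch of high-boost filtering with a simple box blur standing in for the lowpass filter (the library and parameter choices are assumptions):

import numpy as np
from scipy import ndimage

def high_boost(image, A=1.5, blur_size=3):
    f = image.astype(np.float64)
    f_lp = ndimage.uniform_filter(f, size=blur_size)   # blurred (lowpass) version of f
    fhb = A * f - f_lp                                  # f_hb(x, y) = A * f(x, y) - f_lp(x, y)
    return np.clip(fhb, 0, 255).astype(np.uint8)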
Basis of Filtering in the Frequency Domain
Basic steps of filtering in the frequency domain:
i) Multiply the input image by (-1)^(x+y) to centre the transform.
ii) Compute F(u,v), the Fourier transform of the image.
iii) Multiply F(u,v) by a filter function H(u,v).
iv) Compute the inverse DFT of the result of (iii).
v) Obtain the real part of the result of (iv).
vi) Multiply the result in (v) by (-1)^(x+y).
H(u,v) is called a filter because it suppresses certain frequencies in the image while leaving others
unchanged.
Gaussian lowpass filter: the transfer function is
H(u,v) = e^( -D^2(u,v) / (2 D0^2) )
where D(u,v) is the distance of the point (u,v) from the center of the frequency rectangle (the center of the
transform) and σ = D0 is the specified cutoff frequency. The filter has the important characteristic that its
inverse Fourier transform is also Gaussian.
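The six steps above translate directly into NumPy FFT calls; np.fft.fftshift plays the role of the (-1)^(x+y) centering trick. A sketch of Gaussian lowpass filtering in the frequency domain (the cutoff D0 is an illustrative parameter):

import numpy as np

def gaussian_lowpass(image, d0=30.0):
    f = image.astype(np.float64)
    F = np.fft.fftshift(np.fft.fft2(f))              # steps (i)-(ii): centred Fourier transform
    M, N = f.shape
    u = np.arange(M) - M / 2
    v = np.arange(N) - N / 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2           # squared distance D^2(u, v) from the centre
    H = np.exp(-D2 / (2.0 * d0 ** 2))                # Gaussian lowpass transfer function H(u, v)
    G = F * H                                        # step (iii): multiply by the filter
    g = np.real(np.fft.ifft2(np.fft.ifftshift(G)))   # steps (iv)-(vi): inverse transform, real part
    return g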
Sharpening Frequency-Domain Filters: High-pass Filtering
Image sharpening can be achieved by a high-pass filtering process, which attenuates the low-frequency
components without disturbing the high-frequency information. These filters are radially symmetric and
completely specified by a cross section. If we have the transfer function of a lowpass filter, the
corresponding highpass filter can be obtained using the equation
Hhp(u,v) = 1 - Hlp(u,v)
f(x) = (1/N) Σ (u = 0 to N-1) F(u) e^( j2πux/N )   for x = 0, 1, 2, ..., N-1
The above two equations, (e) and (f), comprise a discrete Fourier transform pair. According to
Euler's formula,
e^( jx ) = cos x + j sin x
Substituting this into equation (e),
F(u) = Σ (x = 0 to N-1) f(x) [cos(2πux/N) - j sin(2πux/N)]   for u = 0, 1, 2, ..., N-1
The Fourier transform separates a function into its various frequency components; these components are
complex quantities. For a two-dimensional image f(x, y) of size M x N, the forward transform F{f(x, y)} is
F(u,v) = Σ (x = 0 to M-1) Σ (y = 0 to N-1) f(x,y) e^( -j2π(ux/M + vy/N) )
and the inverse Fourier transformation is given by the equation
f(x,y) = (1/MN) Σ (u = 0 to M-1) Σ (v = 0 to N-1) F(u,v) e^( j2π(ux/M + vy/N) )
When images are sampled in a square array, i.e. M = N, we can write
F(u,v) = Σ Σ f(x,y) e^( -j2π(ux + vy)/N ),   f(x,y) = (1/N^2) Σ Σ F(u,v) e^( j2π(ux + vy)/N )