1. The Pinhole Perspective Imaging Model
Imagine a completely dark box with a tiny hole in one of its sides. If you place an object in front
of this pinhole, an inverted image of the object will form on the opposite side of the box. This
simple setup is the essence of the pinhole perspective imaging model.
Here's a breakdown of its key aspects:
   ● Pinhole as the Center of Projection: The tiny hole acts as the center from which all light
       rays originating from the object pass through. This single point of projection is a
       fundamental characteristic of the model.
   ● Straight Line Projection: Light rays are assumed to travel in straight lines from the
       object, through the pinhole, and onto the image plane. This linear projection simplifies the
       geometry significantly.
   ● Inverted Image Formation: As the light rays cross at the pinhole, the image formed on
       the image plane is inverted both horizontally and vertically with respect to the object.
   ● Perspective Effect: Objects farther away from the pinhole appear smaller in the image,
       while closer objects appear larger. This is the perspective effect that our eyes and most
       cameras naturally exhibit.
   ● No Lenses: The ideal pinhole model doesn't involve any lenses. The pinhole itself
       restricts the light rays, creating a focused image (in theory, with an infinitely small
       pinhole).
   ● Image Plane: The surface where the image is formed is called the image plane. In the
       simplest model, this plane is assumed to be flat and perpendicular to the optical axis (the
       line passing through the pinhole and the center of the image plane).
In essence, the pinhole camera model provides a simplified yet powerful geometric
framework for understanding how a 3D scene is projected onto a 2D image plane. It forms
the basis for many concepts in computer vision and graphics, even though real cameras use
lenses to gather more light and focus the image more effectively.
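Under this model, a 3D point (X, Y, Z) in camera coordinates projects to image coordinates (x, y) = (f X / Z, f Y / Z), where f is the distance from the pinhole to the image plane (using the common convention of a virtual, non-inverted image plane in front of the pinhole). A minimal NumPy sketch of this projection, with an illustrative focal length f:

```python
import numpy as np

def pinhole_project(points_3d, f=1.0):
    """Project 3D camera-frame points onto a virtual image plane at distance f.

    points_3d: (N, 3) array of (X, Y, Z) with Z > 0 (in front of the pinhole).
    Returns an (N, 2) array of image coordinates (x, y) = (f*X/Z, f*Y/Z).
    """
    points_3d = np.asarray(points_3d, dtype=float)
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    return np.stack([f * X / Z, f * Y / Z], axis=1)

# A point twice as far away projects to half the size (the perspective effect).
print(pinhole_project([[1.0, 1.0, 2.0], [1.0, 1.0, 4.0]], f=1.0))
# [[0.5  0.5 ]
#  [0.25 0.25]]
```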
2. Transforming Between RGB and CIE XYZ
The transformation between RGB color spaces and the CIE XYZ color space is indeed a linear
transformation. This means that each component of one color space is a linear combination of
the components of the other color space.
Let's represent the RGB color vector as \begin{bmatrix} R \\ G \\ B \end{bmatrix} and the CIE
XYZ color vector as \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}.
RGB to CIE XYZ:
The transformation from RGB to CIE XYZ can be expressed as:
\qquad \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \mathbf{M}_{RGB \to XYZ} \begin{bmatrix} R \\
G \\ B \end{bmatrix} = \begin{bmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23}
\\ M_{31} & M_{32} & M_{33} \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}
Where M_{ij} are the elements of the 3 \times 3 transformation matrix \mathbf{M}_{RGB \to
XYZ}. These elements are derived from the color matching functions of the specific RGB color
space (e.g., sRGB, Adobe RGB) and the CIE color matching functions. Each row of the matrix
essentially represents how the R, G, and B primaries contribute to the X, Y, and Z tristimulus
values, respectively. For instance:
\qquad X = M_{11}R + M_{12}G + M_{13}B \qquad Y = M_{21}R + M_{22}G + M_{23}B \qquad
Z = M_{31}R + M_{32}G + M_{33}B
CIE XYZ to RGB:
Similarly, the transformation from CIE XYZ back to RGB is also a linear transformation,
represented by the inverse of the \mathbf{M}_{RGB \to XYZ} matrix:
\qquad \begin{bmatrix} R \\ G \\ B \end{bmatrix} = \mathbf{M}_{XYZ \to RGB} \begin{bmatrix} X \\
Y \\ Z \end{bmatrix} = \mathbf{M}_{RGB \to XYZ}^{-1} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix} M'_{11} & M'_{12} & M'_{13} \\ M'_{21} & M'_{22} & M'_{23} \\ M'_{31} & M'_{32}
& M'_{33} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
Where M'_{ij} are the elements of the inverse transformation matrix \mathbf{M}_{XYZ \to RGB}.
These elements determine how the X, Y, and Z tristimulus values combine to produce the R, G,
and B color components. For example:
\qquad R = M'_{11}X + M'_{12}Y + M'_{13}Z \qquad G = M'_{21}X + M'_{22}Y + M'_{23}Z
\qquad B = M'_{31}X + M'_{32}Y + M'_{33}Z
The specific numerical values of the matrix elements depend on the chosen RGB color space's
primaries and white point.
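As an illustrative sketch (not part of the derivation above): for linear sRGB with a D65 white point, the widely published matrix values can be used directly. This assumes linear RGB components in [0, 1], i.e., the sRGB gamma encoding has already been removed.

```python
import numpy as np

# Standard linear-sRGB (D65) -> CIE XYZ matrix; other RGB spaces use different values.
M_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
M_XYZ_TO_RGB = np.linalg.inv(M_RGB_TO_XYZ)

def rgb_to_xyz(rgb):
    """rgb: (..., 3) linear RGB values in [0, 1] (gamma already removed)."""
    return np.asarray(rgb) @ M_RGB_TO_XYZ.T

def xyz_to_rgb(xyz):
    return np.asarray(xyz) @ M_XYZ_TO_RGB.T

white = rgb_to_xyz([1.0, 1.0, 1.0])   # approximately the D65 white point (0.9505, 1.0000, 1.0889)
print(white, xyz_to_rgb(white))       # round-trips back to [1, 1, 1]
```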
3. Separable Convolution
Let's consider an image I(x, y) of size N \times N and a discrete, separable 2D filter kernel K(x,
y) of size (2k + 1) \times (2k + 1). Separability means that the 2D kernel can be expressed as
the outer product of two 1D kernels: a horizontal (row) kernel K_h(x) of size 1 \times (2k + 1) and a vertical (column) kernel K_v(y) of size (2k + 1) \times 1.
\qquad K(x, y) = K_v(y) K_h(x)
Direct 2D Convolution:
To compute the convolution of the image with the 2D kernel at each pixel (i, j), we perform the
following summation:
\qquad (I * K)(i, j) = \sum_{x=-k}^{k} \sum_{y=-k}^{k} I(i - x, j - y) K(x, y)
For each output pixel, this involves (2k + 1) \times (2k + 1) multiplications and approximately the
same number of additions. Since there are N \times N output pixels, the total number of
operations is roughly N^2 (2k + 1)^2 multiplications and N^2 (2k + 1)^2 additions.
Convolution with Two 1D Kernels:
Using the separability property, we can perform the convolution in two steps:
   1. Convolve the image with the 1D horizontal kernel K_h(x) along each row: \qquad I'(i, j) = \sum_{x=-k}^{k} I(i - x, j) K_h(x) For each of the N \times N pixels, this requires (2k + 1) multiplications and 2k additions, so this step costs approximately N^2 (2k + 1) multiplications and N^2 (2k) additions.
   2. Convolve the intermediate result I'(i, j) with the 1D vertical kernel K_v(y) along each column: \qquad (I * K)(i, j) = \sum_{y=-k}^{k} I'(i, j - y) K_v(y) Again, for each of the N \times N pixels, this requires (2k + 1) multiplications and 2k additions, so this step also costs approximately N^2 (2k + 1) multiplications and N^2 (2k) additions.
Total Operations for Separable Convolution:
The total number of operations for the separable convolution is approximately 2 \times N^2 (2k +
1) multiplications and 2 \times N^2 (2k) additions.
Estimate of Operations Saved:
The number of multiplications saved is approximately:
\qquad N^2 (2k + 1)^2 - 2 N^2 (2k + 1) = N^2 (2k + 1) [(2k + 1) - 2] = N^2 (2k + 1) (2k - 1) = N^2
(4k^2 - 1)
The number of additions saved is approximately:
\qquad N^2 (2k + 1)^2 - 2 N^2 (2k) = N^2 [(4k^2 + 4k + 1) - 4k] = N^2 (4k^2 + 1)
For larger kernel sizes (k > 1), the savings in the number of operations by using separable
convolution can be significant, as the complexity reduces from O(N^2 k^2) to O(N^2 k).
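A minimal NumPy/SciPy sketch illustrating the equivalence: the same output is obtained from one 2D convolution or from two 1D passes (the 5 \times 5 binomial kernel below is an illustrative choice).

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

rng = np.random.default_rng(0)
image = rng.random((256, 256))

# Example separable kernel: 5x5 binomial, built as the outer product of two 1D kernels.
k1d = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
k1d /= k1d.sum()
k2d = np.outer(k1d, k1d)                                   # K(x, y) = K_v(y) K_h(x)

direct = convolve(image, k2d, mode='reflect')              # direct 2D convolution
rows = convolve1d(image, k1d, axis=1, mode='reflect')      # 1D pass along each row
separable = convolve1d(rows, k1d, axis=0, mode='reflect')  # 1D pass along each column

print(np.allclose(direct, separable))  # True: same result, far fewer operations
```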
4. Convolution with Delta Functions
Let's consider a continuous function f(t) (the results extend analogously to 2D functions). The
Dirac delta function \delta(t) is defined by the following properties:
    1. \delta(t) = 0 for t \neq 0
    2. \int_{-\infty}^{\infty} \delta(t) dt = 1
    3. \int_{-\infty}^{\infty} f(\tau) \delta(t - \tau) d\tau = f(t) (sifting property)
Convolution with a Delta Function:
The convolution of f(t) with \delta(t) is given by:
\qquad (f * \delta)(t) = \int_{-\infty}^{\infty} f(\tau) \delta(t - \tau) d\tau
Using the sifting property of the delta function, where the delta function "sifts out" the value of
f(\tau) at \tau = t, we get:
\qquad (f * \delta)(t) = f(t)
Thus, convolving a function with a delta function reproduces the original function.
Convolution with a Shifted Delta Function:
Now, let's consider a shifted delta function \delta(t - a), where a is a constant shift. The
convolution of f(t) with \delta(t - a) is:
\qquad (f * \delta(t - a))(t) = \int_{-\infty}^{\infty} f(\tau) \delta((t - a) - \tau) d\tau =
\int_{-\infty}^{\infty} f(\tau) \delta(\tau - (t - a)) d\tau
Again, using the sifting property, but this time the delta function is centered at \tau = t - a, we
get:
\qquad (f * \delta(t - a))(t) = f(t - a)
This shows that convolving a function f(t) with a shifted delta function \delta(t - a) results in a
shifted version of the original function, f(t) shifted by a. If a > 0, the function is shifted to the right
(delayed), and if a < 0, the function is shifted to the left (advanced).
These properties of convolution with the delta function are fundamental in signal processing and
image processing, particularly when dealing with impulse responses and spatial
transformations.
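A discrete analogue of these properties can be demonstrated with NumPy: convolving a sequence with a unit impulse reproduces it, and convolving with a delayed impulse shifts it (a sketch using full-length convolution, which pads with zeros).

```python
import numpy as np

f = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

delta = np.array([1.0])                     # discrete unit impulse
shifted_delta = np.array([0.0, 0.0, 1.0])   # impulse delayed by a = 2 samples

print(np.convolve(f, delta))          # [1. 3. 2. 5. 4.]          -> reproduces f
print(np.convolve(f, shifted_delta))  # [0. 0. 1. 3. 2. 5. 4.]    -> f shifted right by 2
```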
Implementation Algorithm of Background Subtraction
Background subtraction is a technique used to identify moving objects in a video stream by
differentiating them from a static background. Here's a common implementation algorithm:
   1. Initialization (Background Model Creation):
          ○ Collect Initial Frames: Acquire a sequence of initial video frames that ideally
              contain only the static background without any foreground objects.
          ○ Build the Background Model: There are several ways to build the background
              model:
                 ■ Simple Averaging: Calculate the pixel-wise average of the initial frames.
                     This creates a single background image.
                 ■ Median Filtering: Compute the pixel-wise median of the initial frames. This is
                     more robust to transient noise or small moving objects.
                 ■ Gaussian Mixture Model (GMM): Model each pixel's history as a mixture of
                     Gaussian distributions. Background pixels will typically form one or more
                     dominant Gaussians. This is more adaptive to gradual changes in the
                     background (e.g., lighting).
   2. Foreground Detection (Frame Processing):
          ○ Acquire a New Frame: Read the current frame from the video stream.
          ○ Compare with Background Model: For each pixel in the current frame, compare
              its color or intensity value with the corresponding pixel in the background model.
          ○ Determine Foreground Pixels: A pixel is classified as foreground if the difference
              between its current value and the background model exceeds a predefined
              threshold. The threshold needs to be chosen carefully to balance sensitivity to
              motion and robustness to noise or minor background variations.
                 ■ For simple averaging/median: Calculate the absolute difference
                     |I_{current}(x, y) - B(x, y)| > T, where I_{current} is the current frame, B is the
                     background model, and T is the threshold.
                 ■ For GMM: A pixel is considered foreground if its current value does not fit
                     any of the background Gaussian distributions with a certain confidence level.
          ○ Create a Binary Mask: Generate a binary image (the foreground mask) where
              foreground pixels are white (or 1) and background pixels are black (or 0).
   3. Post-processing (Refinement):
          ○ Noise Reduction: The initial foreground mask often contains noise (isolated white
              pixels or small groups). Apply morphological operations like erosion (to remove
              small white regions) followed by dilation (to restore the shape of the remaining
              foreground objects) to clean the mask.
          ○ Blob Analysis: Group connected foreground pixels into distinct regions or "blobs."
              This helps in identifying individual moving objects.
          ○ Object Tracking (Optional): If the goal is to track the detected objects over time,
              assign unique IDs to the blobs and follow their movement across frames.
   4. Update Background Model (Optional):
          ○ Adaptive Background: To handle gradual changes in the background (e.g.,
              shadows, lighting changes, waving trees), the background model can be updated
              over time. This is typically done by slowly incorporating the current frame's
              information into the background model, but only for pixels classified as background.
            The learning rate for this update needs to be carefully chosen.
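A minimal NumPy/SciPy sketch of the simplest variant described above (median background model, absolute differencing against an illustrative threshold, morphological opening, and a conservative adaptive update); the function names and parameter values are illustrative, not a fixed API.

```python
import numpy as np
from scipy.ndimage import binary_opening

def build_background(frames):
    """Median background model from a stack of grayscale frames, shape (T, H, W)."""
    return np.median(np.asarray(frames, dtype=float), axis=0)

def foreground_mask(frame, background, threshold=25.0):
    """Binary foreground mask via absolute differencing and thresholding."""
    diff = np.abs(frame.astype(float) - background)
    mask = diff > threshold
    # Morphological opening (erosion then dilation) removes isolated noise pixels.
    return binary_opening(mask, structure=np.ones((3, 3)))

def update_background(background, frame, mask, alpha=0.05):
    """Slowly adapt the model, but only where the pixel was classified as background."""
    bg = ~mask
    background[bg] = (1 - alpha) * background[bg] + alpha * frame[bg]
    return background
```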
What is Computer Vision?
Computer Vision is an interdisciplinary field of artificial intelligence (AI) that enables computers
to "see" and interpret the visual world. It involves developing algorithms and techniques that
allow computers to acquire, process, analyze, and understand images and videos, much like
human vision does. The goal is to extract meaningful information from visual data and use it for
various tasks.
Research and Application Areas of Computer Vision
Computer vision is a rapidly evolving field with numerous research and application areas,
including:
Research Areas:
   ● Image Classification: Categorizing images into predefined classes (e.g., cat vs. dog, car
       vs. pedestrian).
   ● Object Detection: Identifying and localizing specific objects within an image or video
       (e.g., detecting faces, cars, and traffic signs in a scene).
   ● Image Segmentation: Dividing an image into meaningful regions or segments (e.g.,
       separating different objects or parts of an object).
   ● Object Tracking: Following the movement of objects over time in a video sequence.
   ● Pose Estimation: Determining the 3D pose (position and orientation) of objects or
       humans from images or videos.
   ● Scene Understanding: Developing a comprehensive understanding of the content,
       context, and relationships between objects in a visual scene.
   ● Image Generation: Creating new images from textual descriptions or other input.
   ● Video Analysis: Understanding and interpreting events, activities, and behaviors in video
       data.
   ● 3D Reconstruction: Creating 3D models of objects or scenes from multiple images or
       videos.
   ● Visual Recognition: A broad area encompassing various tasks like image classification,
       object detection, and instance segmentation.
   ● Explainable AI (XAI) for Vision: Understanding why a computer vision model makes a
       particular decision.
   ● Adversarial Attacks and Robustness: Studying vulnerabilities of vision models and
       developing robust models.
   ● Self-Supervised Learning for Vision: Training vision models without explicit human
       annotations.
Application Areas:
   ● Autonomous Vehicles: Object detection, lane keeping, traffic sign recognition,
       pedestrian detection.
   ● Robotics: Navigation, object manipulation, inspection, human-robot interaction.
   ● Surveillance and Security: Intruder detection, anomaly detection, crowd analysis, facial
       recognition.
   ● Medical Imaging: Disease diagnosis, image-guided surgery, medical image analysis.
   ● Manufacturing and Quality Control: Defect detection, part inspection, assembly
       verification.
  ● Retail: Inventory management, customer behavior analysis, product recommendation.
  ● Agriculture: Crop monitoring, disease detection, yield prediction.
  ● Augmented and Virtual Reality: Scene understanding, object tracking, virtual object
     placement.
  ● Human-Computer Interaction: Gesture recognition, eye tracking, emotion recognition.
  ● Entertainment: Special effects in movies, gaming, content creation.
  ● Search and Retrieval: Image and video search engines, content-based image retrieval.
  ● Accessibility: Assisting visually impaired individuals with scene description and object
     recognition.
Define Image Processing, Pattern Recognition, and Photogrammetry
  ● Image Processing: Focuses on manipulating and transforming digital images to enhance
     their quality, extract specific features, or prepare them for further analysis. The input and
     output of image processing are typically images. Common tasks include noise reduction,
     image enhancement (contrast adjustment, sharpening), geometric transformations
     (scaling, rotation), and image restoration.
  ● Pattern Recognition: A broader field that aims to classify or categorize data (including
     images, audio, text, etc.) into predefined classes or patterns. It involves developing
     algorithms that can learn from data and make predictions or decisions based on the
     identified patterns. Computer vision tasks like object detection and image classification
     are considered subfields of pattern recognition. The process typically involves feature
     extraction, followed by a classification or clustering algorithm.
  ● Photogrammetry: The science and technology of obtaining reliable information about
     physical objects and the environment through the process of recording, measuring, and
     interpreting photographic images and patterns of electromagnetic radiant energy and
     other phenomena. It is primarily concerned with creating accurate 2D and 3D
     measurements from images. Applications include surveying, mapping, 3D modeling of
     terrain and buildings, and industrial metrology. While it uses images as input, its primary
     goal is geometric reconstruction and measurement, rather than high-level understanding
     of the image content like in computer vision.
Explain Snell’s Law for Refraction
Snell's Law describes the relationship between the angles of incidence and refraction when light
(or other waves) passes through the interface between two different homogeneous media with
different refractive indices.
Let:
   ● n_1 be the refractive index of the first medium.
   ● n_2 be the refractive index of the second medium.
   ● \theta_1 be the angle of incidence (the angle between the incident ray and the normal to
       the interface).
   ● \theta_2 be the angle of refraction (the angle between the refracted ray and the normal to
       the interface).
Snell's Law states:
\qquad n_1 \sin(\theta_1) = n_2 \sin(\theta_2)
In simpler terms:
The ratio of the sine of the angle of incidence to the sine of the angle of refraction is equal to the
inverse ratio of the refractive indices of the two media.
Explanation:
   ● When light travels from a medium with a lower refractive index (n_1) to a medium with a
       higher refractive index (n_2), it bends towards the normal (\theta_2 < \theta_1). This is
       because light travels slower in a denser medium (higher refractive index).
   ● Conversely, when light travels from a medium with a higher refractive index (n_1) to a
       medium with a lower refractive index (n_2), it bends away from the normal (\theta_2 >
       \theta_1).
   ● If the refractive indices of the two media are the same (n_1 = n_2), then \sin(\theta_1) =
       \sin(\theta_2), which implies \theta_1 = \theta_2. In this case, there is no bending of light
       at the interface.
Snell's Law is a fundamental principle in optics and explains phenomena like the bending of light
as it passes from air to water (or glass) and the functioning of lenses and prisms.
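A small sketch that applies the formula directly (angles in degrees; the helper name and test values are illustrative):

```python
import numpy as np

def refraction_angle(n1, n2, theta1_deg):
    """Angle of refraction (degrees) from Snell's law: n1 sin(theta1) = n2 sin(theta2).

    Returns None when the arcsin argument exceeds 1, i.e. total internal reflection,
    which can occur when going from a denser to a less dense medium.
    """
    s = n1 * np.sin(np.radians(theta1_deg)) / n2
    if abs(s) > 1.0:
        return None
    return np.degrees(np.arcsin(s))

print(refraction_angle(1.0, 1.33, 30.0))   # air -> water: ~22.1 deg (bends toward the normal)
print(refraction_angle(1.33, 1.0, 30.0))   # water -> air: ~41.7 deg (bends away from the normal)
```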
What is a Negative Image? How Can We Generate a Negative Image?
A negative image is an image in which the tonal values are inverted. Bright areas in the original
image appear dark in the negative image, and dark areas appear bright. Essentially, it's like
looking at the photographic negative of a print.
Generating a Negative Image:
For a digital grayscale image with pixel intensity values ranging from 0 (black) to L-1 (white),
where L is the number of possible intensity levels (e.g., for an 8-bit image, L = 2^8 = 256), the
negative image can be generated by subtracting each pixel's intensity value from the maximum
intensity value:
\qquad I_{negative}(x, y) = (L - 1) - I_{original}(x, y)
Where:
   ● I_{negative}(x, y) is the intensity of the pixel at coordinates (x, y) in the negative image.
   ● I_{original}(x, y) is the intensity of the pixel at coordinates (x, y) in the original image.
   ● L is the total number of intensity levels.
For a color image with RGB components (each ranging from 0 to 255 for an 8-bit image),
the negative image is generated by inverting each color channel independently:
\qquad R_{negative}(x, y) = 255 - R_{original}(x, y) \qquad G_{negative}(x, y) = 255 -
G_{original}(x, y) \qquad B_{negative}(x, y) = 255 - B_{original}(x, y)
The result is an image where the colors are also inverted (e.g., red becomes cyan, green
becomes magenta, and blue becomes yellow).
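A one-line NumPy sketch for 8-bit images (grayscale or RGB; the inversion is applied per channel):

```python
import numpy as np

def negative(image):
    """Negative of an 8-bit image (uint8, values in 0..255)."""
    return 255 - image

gray = np.array([[0, 64], [128, 255]], dtype=np.uint8)
print(negative(gray))
# [[255 191]
#  [127   0]]
```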
Implementation Algorithm of Mean Shift Segmentation
Mean shift segmentation is a non-parametric clustering algorithm used to segment an image
into regions of similar color or intensity. It works by iteratively shifting each data point (pixel)
towards the average of the data points within its neighborhood until convergence.
Here's a common implementation algorithm:
   1. Initialization:
          ○ Define Search Window: Choose a kernel (e.g., Gaussian, flat) and a bandwidth
              (radius) for the search window. This window determines the neighborhood of pixels
              considered for each mean shift iteration. The bandwidth is a crucial parameter that
              affects the size and granularity of the resulting segments.
          ○ Initialize Cluster Centers: Each pixel in the image is initially considered as a
             potential cluster center.
  2. Iteration (Mean Shift Process):
         ○ For each pixel p_i in the image:
                ■ Define the Search Window: Center the search window around the current
                     pixel p_i.
                ■ Calculate the Mean Shift Vector: Find all pixels p_j within the search
                     window of p_i. Calculate the weighted average (mean) of these neighboring
                     pixels, where the weights are determined by the kernel function (e.g.,
                     Gaussian weight decreases with distance). The mean shift vector
                     \mathbf{m}(p_i) is the difference between this weighted mean and the current
                     pixel p_i: \qquad \mathbf{m}(p_i) = \frac{\sum_{p_j \in W(p_i)} w(p_j - p_i)
                     p_j}{\sum_{p_j \in W(p_i)} w(p_j - p_i)} - p_i where W(p_i) is the set of pixels
                     within the search window of p_i, and w(\cdot) is the kernel function.
                ■ Update Pixel Position: Shift the current pixel p_i by the mean shift vector:
                     \qquad p_i^{new} = p_i + \mathbf{m}(p_i)
                ■ Repeat: Continue this iterative shifting process until the mean shift vector
                     becomes smaller than a predefined threshold, indicating convergence. The
                     final converged position is considered the mode (peak) of the local density of
                     pixels.
  3. Clustering (Mode Assignment):
         ○ Assign Pixels to Modes: After the mean shift process converges for all pixels,
             group the pixels based on the modes they converged to. Pixels that converge to the
             same mode are considered to belong to the same segment.
         ○ Merge Close Modes (Optional): If two modes are very close to each other in the
             feature space (e.g., color space and spatial space), they can be merged into a
             single segment to reduce over-segmentation. A distance threshold is used for this
             merging.
  4. Output:
         ○ Segmentation Map: Create a segmented image where each segment is
             represented by a unique color or label. This is done by assigning the same color
             (e.g., the color of the mode) to all pixels that belong to the same segment.
Key Considerations:
  ● Bandwidth Selection: The bandwidth of the kernel is the most critical parameter. A small
      bandwidth leads to fine-grained segmentation (many small segments), while a large
      bandwidth results in coarser segmentation (fewer large segments).
  ● Kernel Choice: Common kernels include the flat (uniform) kernel and the Gaussian
      kernel. The Gaussian kernel gives more weight to closer pixels.
  ● Feature Space: Mean shift can be applied in different feature spaces. For color
      segmentation, the feature space is typically the RGB or Lab color space. Spatial
      information (pixel coordinates) can also be included in the feature vector to encourage
      spatially connected segments.
  ● Computational Cost: Mean shift can be computationally intensive, especially for large
      images, as each pixel undergoes an iterative process.
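As an illustrative sketch of the procedure described above, scikit-learn's MeanShift can be run on a joint color + spatial feature space; the bandwidth and spatial weighting below are placeholder values that must be tuned, and for large images the pixels are usually subsampled first to keep the cost manageable.

```python
import numpy as np
from sklearn.cluster import MeanShift

def mean_shift_segment(image, spatial_weight=0.5, bandwidth=20.0):
    """Segment an RGB image with mean shift in a joint color + spatial feature space.

    A sketch only: bandwidth and spatial_weight are illustrative and data-dependent.
    """
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Feature vector per pixel: (R, G, B, weighted row, weighted column).
    features = np.column_stack([
        image.reshape(-1, 3).astype(float),
        spatial_weight * ys.ravel(),
        spatial_weight * xs.ravel(),
    ])
    labels = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(features).labels_
    return labels.reshape(h, w)   # segmentation map: one mode label per pixel
```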
What is a Histogram? Explain Histogram Equalization.
A histogram of a digital image is a graphical representation of the distribution of pixel intensity
values. For a grayscale image with intensity levels ranging from 0 to L-1, the histogram plots the
frequency (number of pixels) of each intensity level. The horizontal axis represents the intensity
values, and the vertical axis represents the number of pixels at that intensity.
Histogram Equalization is a technique used to enhance the contrast of an image by
redistributing the pixel intensity values to approximate a uniform distribution. The goal is to
stretch out the intensity range, making better use of all possible intensity levels and thereby
increasing the overall contrast of the image.
Algorithm for Histogram Equalization:
   1. Calculate the Histogram: Compute the histogram h(r_k) of the input image, where r_k
       represents the k-th intensity level (from 0 to L-1) and h(r_k) is the number of pixels with
       that intensity.
   2. Calculate the Normalized Histogram (Probability Density Function): Normalize the
       histogram by dividing each frequency by the total number of pixels N in the image: \qquad
       p(r_k) = \frac{h(r_k)}{N}, for k = 0, 1, ..., L-1 Here, p(r_k) represents the probability of
       occurrence of the intensity level r_k.
   3. Calculate the Cumulative Distribution Function (CDF): Compute the cumulative sum
       of the normalized histogram: \qquad cdf(r_k) = \sum_{i=0}^{k} p(r_i) = \sum_{i=0}^{k}
       \frac{h(r_i)}{N} The CDF represents the probability that a pixel's intensity level is less than
       or equal to r_k.
   4. Map the Intensity Values: Use the CDF to create a transformation function that maps the
       original intensity levels to new intensity levels s_k. For an output image with L intensity
       levels, the mapping function is: \qquad s_k = \text{round}((L - 1) \times cdf(r_k)) Here, (L -
       1) scales the CDF to the full range of output intensity levels, and the rounding operation
       ensures that the output intensities are integers.
   5. Create the Equalized Image: Apply the mapping function to each pixel in the original
       image. If a pixel in the original image has intensity r_k, its corresponding pixel in the
       equalized image will have intensity s_k.
The resulting image will have a histogram that is approximately uniform, leading to increased
contrast and better visibility of details.
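A compact NumPy sketch of the algorithm above for an integer grayscale image (assuming L = 256 levels by default):

```python
import numpy as np

def equalize_histogram(image, L=256):
    """Histogram equalization for an integer grayscale image with L intensity levels."""
    hist = np.bincount(image.ravel(), minlength=L)          # h(r_k)
    cdf = np.cumsum(hist) / image.size                      # cdf(r_k)
    mapping = np.round((L - 1) * cdf).astype(image.dtype)   # s_k = round((L-1) * cdf(r_k))
    return mapping[image]                                   # apply the lookup table per pixel
```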
Sketching the Histogram and Equalized Histogram
Let's analyze the given 3-bit image (8 intensity levels, 0-7) of size 64 \times 64 = 4096 pixels
with the intensity distribution:
r_k         0           1        2          3           4           5          6         7
n_k         790         1023     850        656         329         245        122       81
1. Sketch the Histogram:
The histogram will have 8 bins, corresponding to the 8 intensity levels (0 to 7). The height of
each bar will represent the number of pixels (n_k) at that intensity level.
   ● Intensity 0: Height = 790
   ● Intensity 1: Height = 1023 (highest peak)
   ● Intensity 2: Height = 850
   ● Intensity 3: Height = 656
   ● Intensity 4: Height = 329
   ● Intensity 5: Height = 245
   ● Intensity 6: Height = 122
   ● Intensity 7: Height = 81 (lowest count)
The histogram will show a distribution where most pixels have intensities around 1 and 2, with
fewer pixels at the extreme ends (0 and 7).
2. Calculate the Normalized Histogram (Probability Density Function):
Total number of pixels N = 64 \times 64 = 4096.
r_k                               n_k                              p(r_k) = n_k / N
0                                 790                              790 / 4096 ≈ 0.193
1                                 1023                             1023 / 4096 ≈ 0.250
2                                 850                              850 / 4096 ≈ 0.208
3                                 656                              656 / 4096 ≈ 0.160
4                                 329                              329 / 4096 ≈ 0.080
5                                 245                              245 / 4096 ≈ 0.060
6                                 122                              122 / 4096 ≈ 0.030
7                                 81                               81 / 4096 ≈ 0.020
3. Calculate the Cumulative Distribution Function (CDF):
r_k                               p(r_k)                           cdf(r_k)
0                                 0.193                            0.193
1                                 0.250                            0.193 + 0.250 = 0.443
2                                 0.208                            0.443 + 0.208 = 0.651
3                                 0.160                            0.651 + 0.160 = 0.811
4                                 0.080                            0.811 + 0.080 = 0.891
5                                 0.060                            0.891 + 0.060 = 0.951
6                                 0.030                            0.951 + 0.030 = 0.981
7                                 0.020                            0.981 + 0.020 = 1.001 ≈ 1.00
4. Map the Intensity Values:
Using the formula s_k = \text{round}((L - 1) \times cdf(r_k)), where L = 8 and L - 1 = 7:
r_k        cdf(r_k)        s_k = \text{round}(7 \times cdf(r_k))
0          0.193           round(7 * 0.193) = round(1.351) = 1
1          0.443           round(7 * 0.443) = round(3.101) = 3
2          0.651           round(7 * 0.651) = round(4.557) = 5
3          0.811           round(7 * 0.811) = round(5.677) = 6
4          0.891           round(7 * 0.891) = round(6.237) = 6
5          0.951           round(7 * 0.951) = round(6.657) = 7
6          0.981           round(7 * 0.981) = round(6.867) = 7
7          1.000           round(7 * 1.000) = round(7.000) = 7
5. Sketch the Equalized Histogram:
Now, we need to find the number of pixels at each new intensity level s_k. This is done by
summing the number of pixels from the original histogram that map to the same new intensity
level.
   ● s_k = 1: Corresponds to r_k = 0, so n_{s=1} = n_{r=0} = 790
   ● s_k = 3: Corresponds to r_k = 1, so n_{s=3} = n_{r=1} = 1023
   ● s_k = 5: Corresponds to r_k = 2, so n_{s=5} = n_{r=2} = 850
   ● s_k = 6: Corresponds to r_k = 3 and r_k = 4, so n_{s=6} = n_{r=3} + n_{r=4} = 656 + 329
       = 985
   ● s_k = 7: Corresponds to r_k = 5, r_k = 6, and r_k = 7, so n_{s=7} = n_{r=5} + n_{r=6} +
       n_{r=7} = 245 + 122 + 81 = 448
   ● s_k = 0, 2, and 4: no original intensity level maps to these values, so they contain 0 pixels.
The equalized histogram will have the following approximate distribution:
   ● Intensity 0: Height = 0
   ● Intensity 1: Height = 790
   ● Intensity 2: Height = 0
   ● Intensity 3: Height = 1023
   ● Intensity 4: Height = 0
   ● Intensity 5: Height = 850
   ● Intensity 6: Height = 985
   ● Intensity 7: Height = 448
The equalized histogram shows a more spread-out distribution of pixel intensities compared to
the original histogram, indicating increased contrast in the equalized image.
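The mapping and the equalized counts above can be checked numerically with a few lines of NumPy:

```python
import numpy as np

n_k = np.array([790, 1023, 850, 656, 329, 245, 122, 81])   # given distribution, N = 4096
cdf = np.cumsum(n_k) / n_k.sum()                            # cdf(r_k)
s_k = np.round(7 * cdf).astype(int)
print(s_k)                                                  # [1 3 5 6 6 7 7 7]

# Sum the original counts that map to each new intensity level.
equalized = np.bincount(s_k, weights=n_k, minlength=8).astype(int)
print(equalized)                                            # [0 790 0 1023 0 850 985 448]
```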
Explain Lowpass Gaussian Filter Kernels
A lowpass Gaussian filter kernel is a type of linear filter used in image processing to blur an
image and reduce noise. It works by convolving the image with a Gaussian function.
Gaussian Function in 1D:
The 1D Gaussian function is defined as:
\qquad G(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}}
where:
   ● x is the distance from the center of the kernel.
   ● \sigma (sigma) is the standard deviation of the Gaussian distribution. It controls the extent
      of the blurring. A larger \sigma results in more blurring.
Gaussian Function in 2D:
For a 2D image, the Gaussian function is often defined as a separable function of x and y:
\qquad G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} =
\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}} \times \frac{1}{\sqrt{2\pi\sigma^2}}
e^{-\frac{y^2}{2\sigma^2}}
Gaussian Filter Kernel:
A discrete Gaussian filter kernel is a finite-sized matrix derived from sampling the 2D Gaussian
function. The size of the kernel is typically (2k + 1) \times (2k + 1), where k is an integer that
determines the radius of the kernel. The values in the kernel represent the weights applied to
neighboring pixels during convolution.
Properties of Gaussian Filters:
   ● Lowpass: They attenuate high-frequency components in the image, which correspond to
       sharp edges and noise, while preserving low-frequency components (smooth regions).
       This results in a blurring effect.
   ● Spatially Local: The weights in the kernel decrease with distance from the center,
       meaning that closer pixels have a greater influence on the filtered pixel value than farther
       pixels.
   ● Smoothness: The Gaussian function is smooth and continuous, which helps to produce
       smooth blurring without sharp transitions or ringing artifacts that can occur with other
       types of lowpass filters (e.g., box filter).
   ● Separability: The 2D Gaussian function is separable into the product of two 1D Gaussian
       functions. This property allows for efficient implementation of the 2D convolution by
       performing two 1D convolutions (one horizontal and one vertical), as discussed earlier.
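A small NumPy sketch that samples a 1D Gaussian (truncated at roughly 3\sigma, an illustrative choice) and builds the 2D kernel via the outer product, exploiting separability:

```python
import numpy as np

def gaussian_kernel_1d(sigma, k=None):
    """Sampled 1D Gaussian, normalized to sum to 1. Radius k defaults to ~3*sigma."""
    if k is None:
        k = int(np.ceil(3 * sigma))
    x = np.arange(-k, k + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

g = gaussian_kernel_1d(sigma=1.0)
kernel_2d = np.outer(g, g)   # separability: 2D kernel = outer product of two 1D kernels
print(kernel_2d.shape)       # (7, 7) for sigma = 1 with k = 3
```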
Which filtering techniques do we have to use to remove salt and pepper noise?
Salt and pepper noise is characterized by random occurrences of black (pepper) and white
(salt) pixels in an image. Since these are sharp, isolated noise points, linear filters like the
Gaussian filter are not very effective at removing them without also blurring the image
significantly.
The most effective filtering techniques for removing salt and pepper noise are non-linear filters,
specifically order-statistic filters. The most common and effective one is the median filter.
Median Filter:
The median filter works by replacing the value of each pixel with the median value of its
neighboring pixels within a defined window (kernel). The median is the middle value in the
sorted set of neighboring pixel values.
   ● How it works for salt and pepper noise: If the central pixel is a noisy black or white
       pixel, the neighboring pixels are more likely to have the original, uncorrupted intensity
       values. The median operation effectively replaces the extreme noisy value with a more
       representative value from its neighborhood.
Other less common but potentially useful non-linear filters for salt and pepper noise
include:
   ● Min Filter: Replaces the central pixel with the minimum value in its neighborhood.
       Effective for removing salt noise (white pixels).
   ● Max Filter: Replaces the central pixel with the maximum value in its neighborhood.
       Effective for removing pepper noise (black pixels).
   ● Midpoint Filter: Replaces the central pixel with the average of the minimum and
       maximum values in its neighborhood.
The choice of filter and the size of the filter kernel depend on the density of the salt and pepper
noise. Higher noise densities might require larger kernel sizes.
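A minimal SciPy sketch: corrupt a flat test image with roughly 5% salt and pepper noise and remove it with a 3 \times 3 median filter (the noise density and window size are illustrative):

```python
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
image = np.full((64, 64), 128, dtype=np.uint8)

# Corrupt about 5% of the pixels with pepper (0) and salt (255) noise.
noise = rng.random(image.shape)
noisy = image.copy()
noisy[noise < 0.025] = 0
noisy[noise > 0.975] = 255

denoised = median_filter(noisy, size=3)   # 3x3 median window
print(np.abs(denoised.astype(int) - image.astype(int)).mean())  # close to 0
```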
What should an object recognition system do? Explain the current
strategies for object recognition.
An object recognition system should take an image or video as input and perform the
following key tasks:
   1. Identify the presence of specific objects: Determine if any of the pre-defined object
       categories are present in the input.
   2. Localize the objects: If an object is detected, determine its spatial location within the
       image, typically by drawing a bounding box around it.
   3. Classify the detected objects: Assign a specific category label to each detected object
       (e.g., "car," "person," "cat").
   4. (Optional) Provide additional information: Depending on the application, the system
       might also need to provide more detailed information, such as:
          ○ Instance segmentation: Identifying the precise pixel boundaries of each object
              instance.
          ○ Pose estimation: Determining the 3D orientation and pose of the object.
          ○ Attributes: Recognizing specific properties of the object (e.g., color, size, make of
              a car).
Current Strategies for Object Recognition:
Modern object recognition systems heavily rely on deep learning, particularly Convolutional
Neural Networks (CNNs). Here are the main current strategies:
1. End-to-End Deep Learning Approaches:
   ● Region-Based CNNs (R-CNN family):
          ○ R-CNN (Region-based Convolutional Neural Network): Proposes a set of
              candidate object regions using a selective search algorithm, extracts features from
              each region using a CNN, and then classifies these regions using Support Vector
              Machines (SVMs) and refines the bounding boxes using linear regressors.
          ○ Fast R-CNN: Improves upon R-CNN by extracting features from the entire image
              once and then using a Region of Interest (RoI) pooling layer to extract fixed-size
              feature vectors for each proposed region, making it significantly faster.
          ○ Faster R-CNN: Further enhances speed by replacing the selective search with a
              Region Proposal Network (RPN) that is also a CNN, allowing the network to learn to
              propose regions directly.
          ○ Mask R-CNN: Extends Faster R-CNN by adding a branch for predicting
              segmentation masks for each detected object, enabling instance segmentation.
   ● Single-Shot Detectors (SSDs):
          ○ SSD (Single Shot MultiBox Detector): Predicts bounding boxes and class
              probabilities directly from feature maps at multiple scales in a single forward pass of
              the network, making it very fast.
          ○ YOLO (You Only Look Once): Divides the image into a grid and predicts bounding
              boxes and class probabilities for each grid cell. It also operates in a single forward
              pass and is known for its speed. Various versions (YOLOv2, YOLOv3, YOLOv4,
              YOLOv5, YOLOR, YOLOv7, YOLOv8) have been developed with significant
              improvements in accuracy and efficiency.
          ○ RetinaNet: Addresses the class imbalance problem in single-shot detectors using a
              focal loss function, achieving state-of-the-art accuracy while maintaining reasonable
              speed.
   ● Transformers for Object Detection:
          ○ More recently, transformer-based architectures, like DETR (DEtection
              TRansformer), have shown promising results. DETR uses a transformer
              encoder-decoder architecture along with a set of learnable object queries to directly
              predict a fixed number of object bounding boxes and their classes in parallel,
              eliminating the need for explicit region proposal or anchor box generation.
2. Key Components and Techniques Used in These Strategies:
   ● Convolutional Neural Networks (CNNs): Serve as the backbone for feature extraction,
      learning hierarchical representations of visual data. Architectures like VGG, ResNet,
      Inception, EfficientNet, and others are commonly used.
   ● Feature Pyramids: Processing features at multiple scales to handle objects of different
      sizes. Techniques like Feature Pyramid Networks (FPN) are widely used.
   ● Anchor Boxes (or Prior Boxes): A set of pre-defined bounding boxes with different sizes
      and aspect ratios used in some detectors (like Faster R-CNN and SSD) to facilitate the
      prediction of object locations.
   ● Non-Maximum Suppression (NMS): A post-processing step used to eliminate redundant
      overlapping bounding box predictions for the same object, keeping only the most
      confident one.
   ● Data Augmentation: Techniques like random cropping, flipping, scaling, and color
      jittering are used to increase the diversity of the training data and improve the robustness
      of the models.
   ● Transfer Learning: Pre-training CNNs on large-scale image datasets (like ImageNet) and
      then fine-tuning them on the specific object recognition task with a smaller dataset.
   ● Loss Functions: Carefully designed loss functions that penalize incorrect classifications
      and inaccurate bounding box predictions are crucial for training effective models.
The field of object recognition is continuously advancing, with ongoing research focusing on
improving accuracy, speed, robustness to variations (e.g., lighting, viewpoint, occlusion), and
reducing the need for large amounts of labeled data.
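As an illustration of one of the key components listed above, here is a minimal NumPy sketch of greedy non-maximum suppression (the box format [x1, y1, x2, y2] and the IoU threshold are illustrative):

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes whose IoU with it is too high.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of the boxes that are kept.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]      # process boxes from most to least confident
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the current best box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]   # discard heavily overlapping boxes
    return keep
```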