COMPUTER VISION
DEPARTMENTAL ELECTIVE
BY : DR. SHEFALI ARORA CHOUHAN
ASSISTANT PROFESSOR, DEPT. OF CSE
COURSE OUTCOMES
► Understand key features of Computer Vision to analyse and interpret the
visible world around us
► Design and implement multi-dimensional signal processing, feature
extraction, pattern analysis, visual geometric modelling, and stochastic
optimization
► Apply computer vision concepts to biometrics, medical diagnosis,
document processing, mining of visual content, surveillance, and advanced
rendering
What is Computer Vision?
A field of computer science that enables computers to see, identify,
and process images in a way analogous to human vision, and to produce
useful output
What is an Image?
► An image is a signal that can be modelled as a 2D or 3D function.
[Diagram comparing CV with CG and DIP]
► CV — Input: image; Output: image analysis, interpretation, scene understanding
► CG, DIP — Output: image (image filtering, recovery, reconstruction)
Applications of Computer Vision
► Object Tracking
► Boundary Detection
► Shape and Texture Identification
► Grouping of similar objects in an image
► Character Recognition
► Medical Image Analysis
► Biometric Recognition
THE 3Rs OF COMPUTER VISION
► The central problems in computer vision are recognition, reconstruction
and reorganization
► Recognition is about attaching semantic category labels to objects and
scenes as well as to events and activities.
► Reconstruction is traditionally about estimating shape, spatial layout,
reflectance and illumination – which could be used together to render the
scene to produce an image.
► Reorganization is our term for what is usually called “perceptual
organization” in human vision; the “re” prefix makes the analogy with
recognition and reconstruction more salient.
BASIC CONCEPTS
► Involves preprocessing of images, object segmentation, feature extraction,
and classification
► In monochrome images the minimum value corresponds to black and the
maximum to white.
► The different values the intensity function can take are called gray levels.
► Gray level indicates brightness of a pixel
DIGITIZATION
► Sampling of a digital image from an analog signal along the x and y directions
► Discretization: the function f(x,y) is sampled into an M x N array; each
element of this matrix is called a pixel
► Quantization: assigning discrete intensity values to the image components
► Digital images can be quantized to up to 256 gray levels
► Grayscale images: each pixel value (0–255) represents a shade of gray,
where 0 is black and 255 is white.
► Color images: each pixel value (0–255) corresponds to an entry in a color
palette (a table of 256 colors), i.e. an 8-bit system
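To make quantization concrete, here is a minimal NumPy sketch (reusing the Colab-style image path from later slides) that requantizes an 8-bit grayscale image down to a smaller number of gray levels:

import cv2
import numpy as np

img = cv2.imread('/content/2.jpg', cv2.IMREAD_GRAYSCALE)   # 8-bit, 256 gray levels

levels = 4                        # target number of gray levels
step = 256 // levels              # width of each quantization bin
quantized = (img // step) * step  # map each pixel to its bin's base value

print(np.unique(quantized))      # at most `levels` distinct gray values remain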
COLOR QUANTIZATION
► Coloured images can be quantized into three vectors, one each for red,
green, and blue
► Any colour is a combination of these three primary colours
► There are various colour models
► Typically, an 8-bit integer is used to represent the intensity value in each
channel, giving rise to (2^8)^3 = 16,777,216 ≈ 1.68 x 10^7 colors
Color Models
► Grayscale image: 2D array of size M x N containing scalar intensity values
(graylevels).
► Color image: typically represented as a 3D array of size M x N x 3 again
containing scalar values. But each pixel location now has three values –
called as R(red),G(green), B(blue) intensity values.
► All file formats store color images based on this representation.
► Pixel depth = the number of bits used to store one RGB pixel (8 + 8 + 8 = 24 bits)
Color Models
► Images in the RGB color model consist of three component images, one
for each primary color – Red, Green, Blue
► In CMY model, Cyan, magenta and yellow are called secondary colors of
light, or primary colors of pigments, used for color printing.
► RGB, CMY are not intuitive from the point of view of human
perception/description.
► We tend to think of color in terms of the HSI components: hue (the origin
of the color we see, in its purest form), saturation (the dilution of the
color with white light), and intensity (the amount of black mixed into the
color, i.e. dark red versus bright red)
import cv2
from matplotlib import pyplot as plt

im = cv2.imread('/content/shapes.jpg')
im_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

# Display the image using matplotlib
plt.imshow(im_rgb)
plt.show()

# Split the RGB image into its three channels
r, g, b = cv2.split(im_rgb)

plt.subplot(131)
plt.imshow(r, cmap='Reds')
plt.title('Red Channel')

plt.subplot(132)
plt.imshow(g, cmap='Greens')
plt.title('Green Channel')

plt.subplot(133)
plt.imshow(b, cmap='Blues')
plt.title('Blue Channel')

plt.show()
►
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

src = cv2.imread('/content/shapes.jpg')
print(src.shape)

# OpenCV loads images in BGR order, so index 2 is the red channel
red_channel = src[:, :, 2]

# Build an image that keeps only the red channel
red_img = np.zeros(src.shape, dtype=np.uint8)
red_img[:, :, 2] = red_channel
cv2_imshow(red_img)
MORE PARAMETERS
► Color constancy
► Shadows
► Lighting
► Brightness
► Contrast
# img is loaded with cv2.imread, so the channels are in BGR order
b, g, r = cv2.split(img)

plt.imshow(r, cmap='gray')
plt.show()
plt.imshow(g, cmap='gray')
plt.show()
plt.imshow(b, cmap='gray')
plt.show()
HSI MODEL
► Represents colors the way the human visual system perceives them
► Three components: Hue, Saturation & Intensity
► Saturation and intensity range from 0 to 1
[Figure: HSI color solid, showing the saturation and intensity axes]
HSI MODEL
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

bgr_img = cv2.imread('/content/shapes.jpg')

# OpenCV provides HSV, a close relative of the HSI model
hsv_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV)
cv2_imshow(hsv_img)
IMAGE PRE-PROCESSING
► Series of operations at the lowest level of abstraction
► Both the input and the output are images
► Objective is to improve the image
► Remove distortions
► Improve quality or highlight important features
COMMON OPERATIONS
► Gray level transformations
► Histograms
► Geometric transformations
► Arithmetic Operations
► Convolution
► Smoothing
What will this do?
colored_negative = abs(255 - im_rgb)
cv2_imshow(colored_negative)
RGB TO GRAYSCALE CONVERSION
► Average method is the simplest one. Since it is an RGB image, add the R,
G, and B values and divide by 3 to get the desired grayscale image.
Grayscale = (R + G + B) / 3
► Weighted method: the human eye is most sensitive to green light and least
sensitive to blue, so the green channel is given the largest contribution,
the blue channel the smallest, and the red channel a contribution in
between.
New grayscale image = (0.3 * R) + (0.59 * G) + (0.11 * B)
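A minimal NumPy sketch of both conversions (image path hypothetical); the cast to float avoids uint8 overflow when summing channels:

import cv2
import numpy as np

img = cv2.imread('/content/shapes.jpg')      # loaded in BGR order
b, g, r = cv2.split(img.astype(np.float32))  # cast before adding channels

gray_avg = ((r + g + b) / 3).astype(np.uint8)                  # average method
gray_weighted = (0.3*r + 0.59*g + 0.11*b).astype(np.uint8)     # weighted method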
BRIGHTNESS OF IMAGE
► Load the image
► Define a variable with the amount of brightness to be increased, then add
it and clip the result to the valid range:

brightness_increase = 50
# Cast to a wider type first; adding to a raw uint8 array would wrap around at 255
brightened_image = np.clip(img.astype(np.int16) + brightness_increase, 0, 255).astype(np.uint8)
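OpenCV can achieve the same with saturating arithmetic; a one-line sketch using cv2.convertScaleAbs, which computes alpha*img + beta and clips the result to [0, 255]:

# Saturating add: values above 255 are clipped rather than wrapped
brightened = cv2.convertScaleAbs(img, alpha=1.0, beta=50)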
GRAY LEVEL HISTOGRAMS
► Depicts frequency of occurrence of each gray value
► Can be interpreted as a probability density function (PDF)
► PDF represents the likelihood of a pixel having a particular intensity value
► To convert a histogram to a PDF, you need to normalize it.
► Normalization involves dividing each bin count by the total number of pixels in
the image.
► The normalized histogram values then represent probabilities, indicating the
likelihood of a pixel having a specific intensity value.
► The area under the PDF curve should sum to 1, as it represents the probability of
a pixel having an intensity value within the entire intensity range.
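A minimal NumPy sketch of this normalization (assuming img is a grayscale uint8 array, as in the code below):

import numpy as np

hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
pdf = hist / hist.sum()   # divide each bin count by the total number of pixels
print(pdf.sum())          # the probabilities sum to 1.0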
import cv2
import numpy as np
from matplotlib import pyplot as plt
from google.colab.patches import cv2_imshow

path = '/content/2.jpg'
img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
cv2_imshow(img)

# calcHist expects its image argument wrapped in a list
dst = cv2.calcHist([img], [0], None, [256], [0, 256])

plt.hist(img.ravel(), 256, [0, 256])
plt.title('Histogram for gray scale image')
plt.show()
# Re-read the image in color (the previous img is grayscale) and resize it
image = cv2.imread(path)
image = cv2.resize(image, (200, 200))

b, g, r = cv2.split(image)
hist_b = cv2.calcHist([b], [0], None, [256], [0, 256])
hist_g = cv2.calcHist([g], [0], None, [256], [0, 256])
hist_r = cv2.calcHist([r], [0], None, [256], [0, 256])

plt.plot(hist_b, color='blue', label='Blue Channel')
plt.plot(hist_g, color='green', label='Green Channel')
plt.plot(hist_r, color='red', label='Red Channel')

plt.title('RGB Histogram')
plt.xlabel('Pixel Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()
HISTOGRAM EQUALIZATION
► Histogram equalization is a method in image processing of contrast adjustment
using the image’s histogram
► This allows for areas of lower local contrast to gain a higher contrast
► The goal is to create an image with evenly distributed gray levels
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

img = cv2.imread('/content/2.jpg', 0)   # 0 = load as grayscale
equ = cv2.equalizeHist(img)

# Stack the input and the equalized result side by side
res = np.hstack((img, equ))
cv2_imshow(res)
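Under the hood, equalization remaps gray levels through the normalized cumulative distribution function (CDF); a minimal NumPy sketch of that mapping (assuming img is the grayscale uint8 array loaded above):

hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
cdf = hist.cumsum()
# Scale the CDF to [0, 255] to build a lookup table of new gray levels
lut = np.round((cdf - cdf.min()) / (cdf.max() - cdf.min()) * 255).astype(np.uint8)
equalized = lut[img]   # apply the mapping to every pixel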
[Figure: Equalized histogram]
AFFINE TRANSFORMATIONS
► An affine transformation is a geometric transformation that preserves lines
and parallelism (but not necessarily distances and angles)
► To find the transformation matrix, we need three points from the input image
and their corresponding locations in the output image
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

image = cv2.imread('/content/2.jpg')
height, width = image.shape[:2]

# Translate by a quarter of the width and height
tx, ty = width / 4, height / 4

# Create the 2x3 translation matrix as a NumPy array
translation_matrix = np.array([
    [1, 0, tx],
    [0, 1, ty]
], dtype=np.float32)

translated_image = cv2.warpAffine(src=image, M=translation_matrix, dsize=(width, height))

# Display the translated image
cv2_imshow(translated_image)
AFFINE TRANSFORMATIONS
img = cv2.imread('/content/2.jpg')
rows, cols, ch = img.shape

# Three points in the input image and their target locations in the output
pts1 = np.float32([[50, 50], [200, 50], [50, 200]])
pts2 = np.float32([[10, 100], [200, 50], [100, 250]])

M = cv2.getAffineTransform(pts1, pts2)
dst = cv2.warpAffine(img, M, (cols, rows))
cv2_imshow(dst)
DIFFERENTIATE BETWEEN
TRANSFORMATIONS
► Euclidean
► Affine
► Projective
Affine vs. Non-Affine
► Affine: includes scaling, translation, rotation, shearing, and reflection;
parallelism is preserved
► Non-affine: includes projective transformations (also called homographies);
parallelism is not preserved
Euclidean transformation
► In affine transformations we can perform rotation, shearing, translation,
scaling, etc. (a 2x3 matrix is used; refer to the earlier slides)
► In a Euclidean transformation we can perform only rotation and translation
► Subset of the affine transform
► Preserves distances and shape
► Parallelism is maintained
► Also called an isometric transform
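A minimal sketch of a Euclidean (rotation plus translation) transform in OpenCV; the angle and translation are arbitrary illustrative values:

import cv2
import numpy as np
from google.colab.patches import cv2_imshow

img = cv2.imread('/content/2.jpg')
rows, cols = img.shape[:2]

# 2x3 rotation matrix about the image centre; scale=1.0 keeps the transform Euclidean
M = cv2.getRotationMatrix2D((cols / 2, rows / 2), 30, 1.0)
M[:, 2] += (20, 10)   # append a translation of (20, 10) pixels

rotated = cv2.warpAffine(img, M, (cols, rows))
cv2_imshow(rotated)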
Projective/Perspective Transformation
► 3D scenes are projected onto a 2D image plane
► The resulting image depends on the camera's viewpoint
► Ratios and dimensions of objects change
► May not preserve angles
► Parallelism is not preserved
► A generalization of the affine transform
► For an affine transformation, the projection vector is equal to 0; thus, an
affine transformation can be considered a particular case of a perspective
transformation.
► Since the transformation matrix M is defined by 8 constants (degrees of
freedom), we select 4 points in the input image and map them to the desired
locations in the output image according to the use case; this gives 8
equations in 8 unknowns, which can be solved for M.
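A minimal sketch of this four-point mapping in OpenCV (the point coordinates are arbitrary illustrative values):

import cv2
import numpy as np
from google.colab.patches import cv2_imshow

img = cv2.imread('/content/2.jpg')
rows, cols = img.shape[:2]

# Four corners in the input image and their desired locations in the output
pts1 = np.float32([[0, 0], [cols-1, 0], [0, rows-1], [cols-1, rows-1]])
pts2 = np.float32([[40, 40], [cols-40, 20], [20, rows-20], [cols-20, rows-40]])

# Solving the resulting 8 equations in 8 unknowns gives the 3x3 matrix M
M = cv2.getPerspectiveTransform(pts1, pts2)
dst = cv2.warpPerspective(img, M, (cols, rows))
cv2_imshow(dst)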