Lecturer: Bui Ha Duc, PhD
Email: ducbh@hcmute.edu.vn
▪ Identify and locate the target objects in an image
▪ Object detection involves 2 processes:
▪ Locate the presence of regions of interest (ROIs)
▪ Classify these ROIs to detect the object
Where are the Bulldogs?
Difficulties in detection:
• Scene constancy
• Noise and distraction
• Rotation
• Presence of multiple objects
• Size
• Occlusion
In this lecture you will learn
▪ Color, Edge and Shape detection operators
▪ Template Matching
▪ Machine learning approaches
[Detection pipeline: Image → Feature Detector (color, edge, shape) → Hypothesis formation → Candidate objects → Hypothesis verification (against a model database, e.g. templates) → Detected objects]
Tennibot
Which features should be detected?
How is the scene constancy?
What could be the noise?
Are multiple objects present?
Do we need to deal with occlusion?
[Annotated scene: detection challenges for the Tennibot include different ball sizes, multiple objects, background color/contrast, shade and color change, noise, light conditions, and gaps in the ball fuzz; the usable features are the balls' circular shape, size, and color.]
▪ A color is a combination of the 3 primary colors: Red, Green, and Blue
▪ The RGB color space reflects real-life color
▪ The distribution of intensity for each color is linear on a scale from 0 to 255
▪ However, the color clusters of materials under different brightness vary with a non-linear trend
https://www.sciencedirect.com/science/article/pii/S0898122112002787
▪ To detect color, it is better to transform the RGB color space into another color space
▪ HSV: close to how humans perceive color
▪ 3 components:
▪ Hue (H): the color, ranging from 0 → 360
▪ Saturation (S): the amount of gray/shade in a particular color, ranging from 0 (gray) → 255 (pure color)
▪ Value (V): the brightness or intensity of the color, ranging from 0 (black) → 255 (brightest)
Distribution of the color clusters in RGB and HSL color space
▪ Traditional method
▪ OpenCV function:
inRange(src,min_array,max_array,dst)
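A minimal Python (cv2) sketch of this thresholding approach; the HSV bounds and the file name are illustrative assumptions to tune per application:

import cv2
import numpy as np

img = cv2.imread("scene.jpg")                  # hypothetical input image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)     # note: OpenCV stores hue as 0..179
lower = np.array([40, 60, 60])                 # assumed lower HSV bound (greenish)
upper = np.array([80, 255, 255])               # assumed upper HSV bound
mask = cv2.inRange(hsv, lower, upper)          # 255 where the pixel falls in range
result = cv2.bitwise_and(img, img, mask=mask)  # keep only the detected color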
▪ Sobel filter: simple tool to detect edges
Sobel operator kernels:
$$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}, \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix}$$
▪ Sobel filter
https://docs.opencv.org/4.1.1/d2/d2c/tutorial_sobel_derivatives.html
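A short Python (cv2) sketch of the Sobel derivatives, in the spirit of the tutorial linked above; the file name is an assumption:

import cv2

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)       # horizontal derivative Gx
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)       # vertical derivative Gy
mag = cv2.magnitude(gx, gy)                           # gradient magnitude
edges = cv2.convertScaleAbs(mag)                      # scale back to 8-bit for display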
▪ Canny filter: a widely used tool to perform edge detection
▪ OpenCV function:
Canny(src,edges,threshold1,threshold2,[Sobel_kernel_size,L2Gradient])
threshold1 and threshold2 are the lower and upper hysteresis thresholds
http://justin-liang.com/tutorials/canny/
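A minimal Python (cv2) Canny sketch; the 50/150 hysteresis thresholds are illustrative values:

import cv2

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
edges = cv2.Canny(gray, 50, 150)                      # lower/upper hysteresis thresholds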
▪ Method 1: using contour features
▪ Number of vertices → contour approximation
▪ Area of contour
▪ Method 2: using Hough Transform
▪ HoughLines → detect lines → shape
▪ HoughCircles → detect circles
https://docs.opencv.org/3.4/dd/d49/tutorial_py_contour_features.html
▪ Approximates a polygonal curve with Ramer–Douglas–Peucker
algorithm
https://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm
▪ OpenCV function:
approxPolyDP(contours,contour_Polys,epsilon,is_close)
epsilon: the maximum distance between the original curve and its approximation
is_close: whether the approximated curve is closed or not
▪ Find minimum enclosing circles
minEnclosingCircle(contour,center,radius)
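A Python (cv2) sketch combining both functions to classify simple shapes by vertex count; the 2% epsilon and the 0.8 circle-fill ratio are illustrative assumptions:

import cv2
import math

img = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)    # hypothetical input
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    eps = 0.02 * cv2.arcLength(cnt, True)               # epsilon as 2% of the perimeter
    approx = cv2.approxPolyDP(cnt, eps, True)           # closed-curve approximation
    (x, y), r = cv2.minEnclosingCircle(cnt)             # smallest circle around the contour
    if len(approx) == 3:
        label = "triangle"
    elif len(approx) == 4:
        label = "quadrilateral"
    elif cv2.contourArea(cnt) > 0.8 * math.pi * r * r:  # fills most of its enclosing circle
        label = "circle"
    else:
        label = "other polygon"
    print(label, len(approx))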
▪ Hough Transform: a transform used to detect straight lines or circles
▪ It maps the 2D image space (Cartesian coordinate system) to the Hough space (polar coordinate system)
https://docs.opencv.org/4.1.1/d9/db0/tutorial_hough_lines.html
▪ Hough Transform
Line Detection using the Hough Transform Algorithm
https://towardsdatascience.com/lines-detection-with-hough-transform-84020b3b1549
▪ Hough Transform
HoughLines(src,lines,r,theta,threshold,0,0)
lines: output vector of detected lines, each represented by (r, theta)
r: distance resolution, in pixels
theta: angle resolution, in radians
threshold: minimum number of intersections (votes) required to detect a line
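A Python (cv2) sketch of HoughLines run on a Canny edge map; the resolutions and vote threshold are typical illustrative values:

import cv2
import numpy as np

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
edges = cv2.Canny(gray, 50, 150)
lines = cv2.HoughLines(edges, 1, np.pi / 180, 150)    # r = 1 px, theta = 1 degree, 150 votes
if lines is not None:
    for r, theta in lines[:, 0]:                      # each line is represented by (r, theta)
        print(r, theta)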
▪ Hough Circle: a transform used to detect circles
HoughCircles(src,circles, method,dp,minDist,para1,para2,r_min, r_max)
circles: a vector that stores 3 values (x_c, y_c, r) for each detected circle
method: detection method; currently the only one available is HOUGH_GRADIENT
dp: inverse ratio of the accumulator resolution to the image resolution
minDist: minimum distance between the centers of the detected circles
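A Python (cv2) HoughCircles sketch; all parameter values and the radius limits are assumptions to tune per image:

import cv2
import numpy as np

gray = cv2.imread("balls.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical input
gray = cv2.medianBlur(gray, 5)                          # denoising helps the transform
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                           param1=100, param2=30, minRadius=5, maxRadius=60)
if circles is not None:
    for xc, yc, r in np.round(circles[0]).astype(int):  # one (xc, yc, r) triple per circle
        print(xc, yc, r)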
Ball detection pipeline (RoboCup 2017):
1. Detect high-contrast ROIs
2. Detect black pentagons
3. Construct triangle
4. Select best triangle
5. Hough correction
6. Classify with neural network
https://www.robocup2017.org/file/symposium/RoboCup_Symposium_2017_paper_20.pdf
▪ Fish packing plant: separate sea bass from salmon using optical sensing
▪ Physical differences: length, lightness, width, number and shape of fins, position of the mouth
▪ Noise: variations in lighting, position of the fish on the conveyor, "static" due to the electronics of the camera itself
Histograms for the length feature for the two categories
Histograms for the lightness feature for the two categories
[Figure: the two features, lightness and width, for sea bass and salmon, with a decision boundary]
How would our system automatically determine the decision boundary?
▪ AdaBoost (page 831, Learning OpenCV)
▪ Old
▪ Powerful and highly accurate
▪ Sensitive to noisy data and outliers
▪ Cascade filter (page 876)
▪ Quick
▪ Requires less computational resources
▪ Can work in real time
▪ Support vector machine (page 897)
▪ Very accurate
▪ Requires a powerful computer
▪ Hard to run in real time
▪ Deep learning (Faster R-CNNs, Single Shot Detectors, YOLO)
▪ New trend, very powerful
▪ Requires huge computational resources
▪ 'Boosting': a family of algorithms which converts weak learners to strong learners.
$$H(x) = \mathrm{sign}\left(\sum_{i=1}^{n} \alpha_i h_i(x)\right)$$
$h_i(x)$: the weak learners
$\alpha_i$: the weight of learner $h_i$
https://www.globalsoftwaresupport.com/boosting-adaboost-in-machine-learning/
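A tiny numpy sketch of the weighted-vote rule $H(x) = \mathrm{sign}(\sum_i \alpha_i h_i(x))$; the decision stumps and their weights are made-up values (in real AdaBoost they are learned from the weighted training error):

import numpy as np

def stump(threshold, polarity=1):
    # a weak learner h_i(x): +1 on one side of the threshold, -1 on the other
    return lambda x: polarity * (1 if x > threshold else -1)

learners = [stump(0.2), stump(-0.5), stump(1.0, polarity=-1)]  # assumed weak learners
alphas = [0.8, 0.4, 0.3]                                       # assumed weights alpha_i

def H(x):
    return np.sign(sum(a * h(x) for a, h in zip(alphas, learners)))

print(H(0.6), H(-1.0))  # strong-classifier votes for two sample inputs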
▪ Weak learners for image recognition
[Figure: common Haar filter features]
160,000+ possible features are associated with each 24 x 24 window
Prepare data:
▪ Negative images: images which do not contain the target object
▪ Positive images: images which contain the target object
A proportion of 2:1 or higher between negative and positive samples is considered acceptable
▪ Load the cascade filter in OpenCV
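A minimal Python (cv2) sketch; the frontal-face cascade is one of the XML files shipped with OpenCV, and the image name is an assumption:

import cv2

cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
img = cv2.imread("people.jpg")                           # hypothetical input
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
objects = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in objects:                             # one bounding box per detection
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)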
▪ The digital computer is the most important technology developed in the 20th century
▪ A 2 GHz CPU can execute 2 billion operations per second
▪ The fastest computer (2005): 280.6 trillion calculations per second
▪ How many operations can a computer execute at any given instant?
▪ Only ONE! The operations are serial: one after another!
A PC can run billions of operations per second; a typical firing rate for a neuron is around 100 spikes per second.
Can a computer beat the human brain now? Yes and no.
How can the brain make up for the slow rate of operation?
Massively parallel!
A huge number of nerve cells (neurons) and interconnections among them. The number of neurons is estimated to be in the range of $10^{10}$, with $60 \times 10^{12}$ synapses (interconnections).
The function of a biological neuron seems to be much more complex than that of a logic gate.
The brain is a highly complex, non-linear, parallel information processing system. It performs tasks such as pattern recognition and perception many times faster than the fastest digital computers.
▪ A typical biological neuron is composed of:
▪ A cell body;
▪ Dendrites: input channels
▪ Axon: output cable; it usually branches.
▪ The major job of a neuron:
▪ receives information, usually in the form of electrical pulses, from many other neurons
▪ sums these inputs in a complex dynamic way
▪ sends out information in the form of a stream of electrical impulses down its axon and on to many other neurons
▪ The connections (synapses) are crucial for excitation, inhibition or modulation of the cells.
▪ Learning is possible by adjusting the synapses!
How can we build a mathematical model of the neuron?
▪ Simplest model
[Diagram: inputs → System → outputs]
[Diagram: inputs $x_i$ → System → output $y$]
Relationship: $y = \sum_{i=1}^{m} x_i$
But:
• The neuron only fires when it is sufficiently excited
• The firing rate has an upper bound
▪ Modified model:
$$y = \varphi\left(\sum_{i=1}^{m} x_i - b\right) \quad\leftarrow \varphi \text{ is a squash/activation function}$$
▪ b: threshold (bias) → the neuron will not fire until the summed input is high enough.
Based upon this model, is it possible for the inputs to inhibit the activation of the neuron?
The synaptic weights!
$$u_k = \sum_{i=1}^{m} w_i x_i, \qquad v_k = u_k + b_k, \qquad y_k = \varphi(v_k)$$
▪ Three basic components for the model of a neuron:
▪ A set of synapses or connecting links: characterized by a weight or strength of its
own.
▪ An adder for summing the input signals, weighted by the respective synapses of
the neuron (a linear combiner).
▪ An activation function for limiting the amplitude of the neuron output
▪ Mathematical model:
$$u_k = \sum_{i=1}^{m} w_i x_i, \qquad y_k = \varphi(u_k + b_k)$$
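A tiny numpy sketch of this neuron model; the inputs, weights, and bias are made-up illustrative values:

import numpy as np

def neuron(x, w, b, phi):
    u = np.dot(w, x)          # linear combiner: u_k = sum_i w_i * x_i
    return phi(u + b)         # activation applied to v_k = u_k + b_k

step = lambda v: 1.0 if v >= 0 else 0.0   # McCulloch-Pitts threshold activation
x = np.array([0.5, -1.0, 2.0])            # assumed inputs
w = np.array([0.4, 0.1, 0.6])             # assumed synaptic weights
print(neuron(x, w, b=-1.0, phi=step))     # fires only if w.x + b >= 0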
▪ Threshold function (McCulloch-Pitts model, 1943):
$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases}$$
▪ Piecewise-linear function:
$$\varphi(v) = \begin{cases} 1 & v \ge 0.5 \\ v + 0.5 & -0.5 \le v \le 0.5 \\ 0 & v \le -0.5 \end{cases}$$
▪ Logistic function:
$$\varphi(v) = \frac{1}{1 + e^{-av}}$$
▪ Hyperbolic tangent function:
$$\varphi(v) = \tanh(v) = \frac{e^{v} - e^{-v}}{e^{v} + e^{-v}}$$
▪ Gaussian function:
$$\varphi(v) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{v-\mu}{\sigma}\right)^2\right)$$
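For reference, a numpy sketch of these activation functions; the default parameter values (a, mu, sigma) are illustrative:

import numpy as np

threshold = lambda v: np.where(v >= 0, 1.0, 0.0)    # McCulloch-Pitts step
piecewise = lambda v: np.clip(v + 0.5, 0.0, 1.0)    # linear ramp on [-0.5, 0.5]
logistic  = lambda v, a=1.0: 1.0 / (1.0 + np.exp(-a * v))
gaussian  = lambda v, mu=0.0, s=1.0: np.exp(-0.5 * ((v - mu) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

v = np.linspace(-2.0, 2.0, 5)
print(threshold(v), logistic(v), np.tanh(v), gaussian(v))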
▪ Network architecture defines how nodes are connected.
▪ Learning is a process by which the free parameters of a neural
network are adapted through a process of stimulation by the
environment in which the network is embedded.
▪ Process of learning:
▪ The NN is stimulated by an environment
▪ The NN undergoes changes in its free parameters as a result of this stimulation.
▪ The NN responds in a new way to the environment because of the changes that
have occurred in its internal structure.
How can the network adjust the weights?
▪ The perceptron is built around the McCulloch-Pitts model.
▪ Goal: to correctly classify the set of externally applied stimuli x1, x2, ..., xm into one of two classes, C1 and C2
▪ The input vector: $x(n) = [x_1(n), x_2(n), \ldots, x_m(n)]^T$; the weight vector: $w(n) = [w_1(n), w_2(n), \ldots, w_m(n)]^T$, where n denotes the iteration step
▪ Output of the neuron: $y(n) = \varphi(w^T(n)\, x(n))$
▪ What is the decision boundary?
Decision boundary: $w^T x = 0$
▪ m = 1: ?
▪ m = 2: ?
▪ m = 3: ?
▪ How to choose the proper weights?
Two basic methods can be employed to select a suitable weight vector:
▪ By off-line calculation of weights (without learning)
▪ Possible if the system is relatively simple
▪ By a learning procedure
▪ The weight vector is determined from a given (training) set of input-output vectors (exemplars) in such a way as to achieve the best classification of the training vectors
Example: truth table of NAND
Three points (0,0), (0,1) and (1,0) belong to one class, and (1,1) belongs to the other class.
The decision boundary is the straight line described by the following equation:
$$x_1 + x_2 = 1.5 \quad\text{or}\quad -x_1 - x_2 + 1.5 = 0 \;\Rightarrow\; w = (1.5, -1, -1)$$
Is the decision line unique for this problem?
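A quick numpy check (illustrative) that the weight vector w = (1.5, -1, -1) separates the NAND classes, using the augmented input [1, x1, x2]:

import numpy as np

w = np.array([1.5, -1.0, -1.0])              # (bias, w1, w2) from the slide
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    v = w @ np.array([1.0, x1, x2])          # w^T x with augmented input
    print((x1, x2), "C1" if v > 0 else "C2") # only (1,1) falls on the other side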
▪ If C1 and C2 are linearly separable, there exists a weight vector $w_0$ such that:
$$w_0^T x > 0 \;\; \forall x \in C_1, \qquad w_0^T x < 0 \;\; \forall x \in C_2$$
Given a training set $X = X_1 \cup X_2$ ($X_1 \cap X_2 = \emptyset$) where $X_1 \subset C_1$ and $X_2 \subset C_2$:
▪ Training target: find a weight vector w such that the perceptron can correctly classify the training set X.
▪ Feed a pattern x to the perceptron with weight vector w; it will produce a binary output y (1 or 0). First consider the case
$$w^T x < 0 \Rightarrow y = 0$$
▪ If the correct label (all the labels of the training samples are known) is d = 0, should we update the weights?
▪ If the desired output is d = 1, assume the new weight vector is w'; then we have:
$$w' = w + \Delta w$$
▪ But how to choose Δw?
▪ $w'^T x - w^T x = \Delta w^T x > 0$
▪ $\Delta w = \eta x$, $\eta > 0 \;\Rightarrow\; \Delta w^T x = \eta x^T x > 0$
▪ If the true label is d = 1 and the perceptron makes a mistake, its synaptic weights are adjusted by
$$w' = w + \eta x$$
▪ Now consider the case
$$w^T x > 0 \Rightarrow y = 1$$
▪ Only adjust the weights when the perceptron makes a mistake (d = 0):
$$w' = w + \Delta w, \qquad w'^T x - w^T x = \Delta w^T x < 0$$
$$\Delta w = -\eta x,\; \eta > 0 \;\Rightarrow\; \Delta w^T x = -\eta x^T x < 0$$
▪ If the true label is d = 0 and the perceptron makes a mistake, its synaptic weights are adjusted by
$$w' = w - \eta x$$
▪ To unify this algorithm, consider the error signal e = d - y:
▪ the error signal when d = 1 and y = 0: e = 1 - 0 = 1
▪ the error signal when d = 0 and y = 1: e = 0 - 1 = -1
▪ Then
$$w' = w + \eta e x$$
▪ Algorithm Perceptron
Start with a randomly chosen weight vector w(1);
while there exist input vectors that are misclassified by w(n):
    Let x(n) be a misclassified input vector;
    Update the weight vector to w(n+1) = w(n) + η e(n) x(n);
    Increment n;
end-while
▪ Example: let us consider a simple classification problem where the input space is one-dimensional, i.e., a real line:
▪ Class 1 (d = 1): x = 0.5, 2
▪ Class 2 (d = 0): x = -1, -2
▪ Solution: a minimal training sketch follows below
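A minimal numpy sketch of the perceptron algorithm applied to this 1-D example, using the augmented input [1, x] so the bias is learned as a weight; the learning rate and zero initialization are illustrative choices:

import numpy as np

data = [(0.5, 1), (2.0, 1), (-1.0, 0), (-2.0, 0)]  # (x, d) pairs from the example
w = np.zeros(2)                                    # [bias, w1], assumed initial weights
eta = 1.0                                          # assumed learning rate

converged = False
while not converged:
    converged = True
    for x, d in data:
        xa = np.array([1.0, x])                    # augmented input
        y = 1 if w @ xa >= 0 else 0                # threshold output
        if y != d:                                 # misclassified: w' = w + eta*e*x
            w += eta * (d - y) * xa
            converged = False

print("final weights:", w)                         # changes stop once the data are separated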
▪ Perceptron Convergence Theorem:
If C1 and C2 are linearly separable, then after a finite number of steps the weights stop changing.
▪ Multilayer perceptrons (MLPs)
▪ Generalization of the single-layer perceptron
▪ Consists of
▪ An input layer
▪ One or more hidden layers of computation nodes
▪ An output layer of computation nodes
▪ Architectural graph of a multilayer perceptron with two hidden layers
▪ Is there an "optimal" way to separate the data?
The theory of support vector machines provides a systematic method for separating the data "optimally".
▪ Useful properties and capabilities:
▪ High computational power, due to:
▪ a massively parallel distributed structure
▪ generalization: producing reasonable outputs for inputs not encountered during training (learning)
▪ Nonlinearity: most physical systems are nonlinear
▪ Adaptivity (plasticity): a built-in capability to adapt its synaptic weights to changes in the environment
▪ Fault tolerance: if a neuron or its connecting links are damaged, the overall response may still be acceptable (due to the distributed nature of information stored in the network)