Case 2: Object Detection

Lecturer: Bui Ha Duc, PhD

Email: ducbh@hcmute.edu.vn
▪ Identify and locate the target objects in an image
▪ Object detection involves two processes:
  ▪ Locating candidate regions of interest (ROIs)
  ▪ Classifying these ROIs to detect the objects

Where are the bulldogs?

Difficulties in detection:
• Scene constancy
• Noise and distraction
• Rotation
• Presence of multiple objects
• Size
• Occlusion
In this case you will learn
▪ Color, Edge and Shape detection operators
▪ Template Matching
▪ Machine learning approaches
Detection pipeline:
Image → Feature Detector → Features → Hypothesis Formation → Candidate Objects → Hypothesis Verification → Objects
  ▪ The feature detector draws on: color, edge, shape, template
  ▪ Hypothesis formation and verification draw on a model database
Tennibot (a tennis-ball-collecting robot):
▪ Which features should be detected?
▪ How is the scene constancy?
▪ What could be the noise?
▪ Are multiple objects present?
▪ Do we need to deal with occlusion?
Observations from the annotated scene:
▪ Challenges: different sizes, multiple objects, background color/contrast, shade → color change, noise, light conditions, fuzz gaps
▪ Features to detect: circular shape, size, color
▪ A color is a combination of 3 primary colors: Red, Green, and Blue
▪ The RGB color space reflects real-life color
▪ The intensity of each color channel is distributed linearly on a scale from 0 to 255
▪ However, the color clusters of a material under different brightness levels vary along a non-linear trend

https://www.sciencedirect.com/science/article/pii/S0898122112002787
▪ To detect color, it is better to transform the RGB color space into another color space
▪ HSV: close to how humans perceive color
▪ 3 components:
  ▪ Hue (H): the color itself, range from 0 → 360
  ▪ Saturation (S): the amount of gray/shade in a particular color, range from 0 (gray) → 255 (pure color)
  ▪ Value (V): the brightness or intensity of the color, range from 0 (black) → 255 (brightest)

Distribution of the color clusters in RGB and HSL color space
▪ Traditional method: threshold the image within a color range

▪ OpenCV function:
  inRange(src, min_array, max_array, dst)
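A minimal Python sketch of this thresholding approach; the file name and the HSV bounds (roughly a tennis-ball yellow-green) are assumptions to tune. Note that OpenCV stores hue as 0-179 for 8-bit images, not 0-360:

import cv2
import numpy as np

img = cv2.imread("scene.jpg")                  # hypothetical input image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)     # hue range in OpenCV is 0-179

# Assumed bounds for a yellow-green ball; tune for your lighting
lower = np.array([25, 80, 80])
upper = np.array([45, 255, 255])

mask = cv2.inRange(hsv, lower, upper)          # 255 where the pixel is in range
result = cv2.bitwise_and(img, img, mask=mask)  # keep only in-range pixels
cv2.imwrite("mask.png", mask)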
▪ Sobel filter: a simple tool to detect edges

$$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix}$$
https://docs.opencv.org/4.1.1/d2/d2c/tutorial_sobel_derivatives.html
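A minimal sketch of Sobel edge detection in Python/OpenCV, following the linked tutorial's approach; the input file name is an assumption:

import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
img = cv2.GaussianBlur(img, (3, 3), 0)               # reduce noise first

# Derivatives in x and y; 16-bit depth keeps negative responses
grad_x = cv2.Sobel(img, cv2.CV_16S, 1, 0, ksize=3)
grad_y = cv2.Sobel(img, cv2.CV_16S, 0, 1, ksize=3)

# Approximate the gradient magnitude as in the OpenCV tutorial
abs_x = cv2.convertScaleAbs(grad_x)
abs_y = cv2.convertScaleAbs(grad_y)
grad = cv2.addWeighted(abs_x, 0.5, abs_y, 0.5, 0)
cv2.imwrite("sobel_edges.png", grad)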
▪ Canny filter: a widely used edge detection tool

▪ OpenCV function:
  Canny(src, edges, threshold1, threshold2, [Sobel_kernel_size, L2Gradient])
  ▪ threshold1, threshold2: the lower and upper hysteresis thresholds
http://justin-liang.com/tutorials/canny/
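A minimal sketch of the Python binding, where Canny returns the edge map instead of filling an output argument; the file name and thresholds are assumptions:

import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
img = cv2.GaussianBlur(img, (5, 5), 0)

# Pixels above 150 are strong edges; pixels between 50 and 150 are kept
# only if connected to a strong edge (hysteresis)
edges = cv2.Canny(img, 50, 150, apertureSize=3, L2gradient=True)
cv2.imwrite("canny_edges.png", edges)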
▪ Method 1: using contour features
  ▪ Number of vertices → contour approximation
  ▪ Area of the contour

▪ Method 2: using the Hough Transform
  ▪ HoughLines → detect lines → shape
  ▪ HoughCircles → detect circles
https://docs.opencv.org/3.4/dd/d49/tutorial_py_contour_features.html
▪ Approximates a polygonal curve with the Ramer–Douglas–Peucker algorithm

https://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm

▪ OpenCV function:
  approxPolyDP(contours, contour_Polys, epsilon, is_close)
  ▪ epsilon: the maximum distance between the original curve and its approximation
  ▪ is_close: whether the approximated curve is closed or not
▪ Find the minimum enclosing circle
  minEnclosingCircle(contours, center, radius)
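A minimal sketch combining both helpers to label simple shapes by vertex count; the 0.04 × perimeter epsilon and the 0.8 circularity cutoff are common heuristics, not from the slides:

import cv2

img = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    peri = cv2.arcLength(cnt, True)
    approx = cv2.approxPolyDP(cnt, 0.04 * peri, True)  # closed curve
    if len(approx) == 3:
        label = "triangle"
    elif len(approx) == 4:
        label = "rectangle"
    else:
        # Many vertices: compare area with the minimum enclosing circle
        (x, y), r = cv2.minEnclosingCircle(cnt)
        circ_area = 3.14159 * r * r
        label = "circle" if cv2.contourArea(cnt) / circ_area > 0.8 else "other"
    print(label, cv2.contourArea(cnt))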
▪ Hough Transform: a transform used to detect straight lines or circles
▪ A line in the 2D image space (Cartesian coordinates) maps to a single point in the Hough space (polar coordinates):

$$r = x\cos\theta + y\sin\theta$$

https://docs.opencv.org/4.1.1/d9/db0/tutorial_hough_lines.html
▪ Hough Transform

Line Detection using the Hough Transform Algorithm

https://towardsdatascience.com/lines-detection-with-hough-transform-84020b3b1549
▪ Hough Transform
  HoughLines(src, lines, r, theta, threshold, 0, 0)
  ▪ lines: output vector of detected lines, each represented by (r, theta)
  ▪ r: distance resolution of the accumulator, in pixels
  ▪ theta: angle resolution of the accumulator, in radians (e.g. CV_PI/180 for 1 degree)
  ▪ threshold: minimum number of accumulator intersections (votes) needed to detect a line
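A minimal sketch of the Python binding, which returns the lines rather than filling an output argument; the vote threshold of 150 is an assumption:

import cv2
import numpy as np

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input
edges = cv2.Canny(img, 50, 150)                       # Hough works on an edge map

# 1-pixel distance resolution, 1-degree angle resolution (in radians)
lines = cv2.HoughLines(edges, 1, np.pi / 180, 150)

out = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
for line in lines if lines is not None else []:
    r, theta = line[0]
    a, b = np.cos(theta), np.sin(theta)
    x0, y0 = a * r, b * r                             # closest point on the line
    p1 = (int(x0 + 1000 * (-b)), int(y0 + 1000 * a))  # extend in both directions
    p2 = (int(x0 - 1000 * (-b)), int(y0 - 1000 * a))
    cv2.line(out, p1, p2, (0, 0, 255), 2)
cv2.imwrite("hough_lines.png", out)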
▪ Hough Circle: a transform used to detect circles
  HoughCircles(src, circles, method, dp, minDist, para1, para2, r_min, r_max)
  ▪ circles: output vector storing a set of 3 values (xc, yc, r) for each detected circle
  ▪ method: detection method; currently the only one available is HOUGH_GRADIENT
  ▪ dp: inverse ratio of the accumulator resolution to the image resolution
  ▪ minDist: minimum distance between the centers of the detected circles
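A minimal sketch in Python, where HoughCircles returns the circle array; dp, minDist and the two internal parameters are assumed starting values to tune:

import cv2
import numpy as np

img = cv2.imread("balls.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input
img = cv2.medianBlur(img, 5)                          # smoothing reduces false circles

circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=30,
                           param1=100,   # upper Canny threshold used internally
                           param2=40,    # accumulator threshold for centers
                           minRadius=10, maxRadius=80)

out = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
if circles is not None:
    for xc, yc, r in np.uint16(np.around(circles[0])):
        cv2.circle(out, (xc, yc), r, (0, 255, 0), 2)
cv2.imwrite("hough_circles.png", out)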
Example pipeline (RoboCup ball detection):
1. Detect high-contrast ROIs
2. Detect black pentagons
3. Construct triangles
4. Select the best triangle
5. Hough correction
6. Classify with a neural network
https://www.robocup2017.org/file/symposium/RoboCup_Symposium_2017_paper_20.pdf
▪ Fish packing plant: separate sea bass from salmon using optical sensing
▪ Physical differences: length, lightness, width, number and shape of fins, position of the mouth
▪ Noise: variations in lighting, position of the fish on the conveyor, "static" due to the electronics of the camera itself
Histograms for the length feature for the two categories

Histograms for the lightness feature for the two categories

The two features of lightness and width for sea bass and salmon, with a decision boundary

How would our system automatically determine the decision boundary?
▪ AdaBoost (page 831, Learning OpenCV)
  ▪ Old
  ▪ Powerful and highly accurate
  ▪ Sensitive to noisy data and outliers

▪ Cascade filter (page 876)
  ▪ Quick
  ▪ Requires less computational resources
  ▪ Can work in real time

▪ Support vector machine (page 897)
  ▪ Very accurate
  ▪ Requires a powerful computer
  ▪ Hard to run in real time

▪ Deep learning (Faster R-CNNs, Single Shot Detectors, YOLO)
  ▪ New trend, very powerful
  ▪ Requires huge computational resources
▪ 'Boosting': a family of algorithms which converts weak learners into a strong learner.

$$H(x) = \mathrm{sign}\left(\sum_{i=1}^{n} \alpha_i h_i(x)\right)$$

  ▪ $h_i(x)$: weak learners
  ▪ $\alpha_i$: weight of learner $i$

https://www.globalsoftwaresupport.com/boosting-adaboost-in-machine-learning/
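A minimal sketch of the weighted vote above, with hypothetical decision-stump learners and weights (not trained here; AdaBoost would fit them iteratively):

import numpy as np

# Hypothetical weak learners: decision stumps thresholding one feature
def stump(feature_idx, thresh):
    return lambda x: 1 if x[feature_idx] > thresh else -1

learners = [stump(0, 0.5), stump(1, 1.0), stump(0, 2.0)]  # h_i(x)
alphas = np.array([0.7, 0.4, 0.2])                        # assumed weights

def boosted_predict(x):
    votes = np.array([h(x) for h in learners])
    return int(np.sign(alphas @ votes))    # H(x) = sign(sum a_i * h_i(x))

print(boosted_predict(np.array([1.2, 0.3])))  # -> 1 or -1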
▪ Weak learners for image recognition: Haar filters over common features
▪ There are 160,000+ possible features associated with each 24 x 24 window
Prepare data:
▪ Negative images: images which do not contain the target object
▪ Positive images: images which contain the target object
▪ A proportion of 2:1 or higher between negative and positive samples is considered acceptable
▪ Load the cascade filter in OpenCV
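A minimal sketch of loading and running a pretrained cascade in Python; the XML path assumes the frontal-face cascade bundled with the opencv-python package:

import cv2

# Assumed: the frontal-face cascade that ships with opencv-python
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("people.jpg")                      # hypothetical input
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Slide a detection window over the image at multiple scales
boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.png", img)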
▪ The digital computer is among the most important technologies developed in the 20th century
▪ A 2 GHz CPU can execute 2 billion operations per second
▪ The fastest computer (2005): 280.6 trillion calculations per second
▪ How many operations can a computer execute at any given instant?
  ▪ Only ONE! The operations are serial: one after another!
A PC can run billions of operations per second, while a typical firing rate for a neuron is around 100 spikes per second.

Can a computer beat the human brain now? Yes and no.

How can the brain make up for its slow rate of operation? It is massively parallel!
A huge number of nerve cells (neurons) and interconnections among them: the number of neurons is estimated to be in the range of $10^{10}$, with $60 \times 10^{12}$ synapses (interconnections).
The function of a biological neuron seems to be much more complex than that of a logic gate.
The brain is a highly complex, non-linear, parallel information-processing system. It performs tasks such as pattern recognition and perception many times faster than the fastest digital computers.
▪ A typical biological neuron is composed of:
  ▪ A cell body
  ▪ Dendrites: input channels
  ▪ Axon: output cable; it usually branches
▪ The major jobs of a neuron:
  ▪ receives information, usually in the form of electrical pulses, from many other neurons
  ▪ sums these inputs in a complex dynamic way
  ▪ sends out information in the form of a stream of electrical impulses down its axon and on to many other neurons
▪ The connections (synapses) are crucial for excitation, inhibition or modulation of the cells.
▪ Learning is possible by adjusting the synapses!

How can we build a mathematical model of the neuron?
▪ Simplest model: a system mapping inputs to outputs

  inputs → System → outputs
Input $\mathbf{x}$ → System → output $y$

$$\text{Relationship:}\quad y = \sum_{i=1}^{m} x_i$$

But:
• The neuron only fires when it is sufficiently excited
• The firing rate has an upper bound
▪ Modified model:

$$y = \varphi\left(\sum_{i=1}^{m} x_i - b\right) \qquad \varphi:\ \text{squash/activation function}$$

▪ b: threshold/bias → the neuron will not fire until its excitation is "high" enough.

Based upon this model, is it possible for the inputs to inhibit the activation of the neuron?
→ The synaptic weights!
$$u_k = \sum_{i=1}^{m} w_i x_i \qquad v_k = u_k + b_k \qquad y_k = \varphi(u_k + b_k) = \varphi(v_k)$$
▪ Three basic components for the model of a neuron:
  ▪ A set of synapses or connecting links, each characterized by a weight or strength of its own.
  ▪ An adder for summing the input signals, weighted by the respective synapses of the neuron (a linear combiner).
  ▪ An activation function for limiting the amplitude of the neuron output.

▪ Mathematical model:

$$u_k = \sum_{i=1}^{m} w_i x_i \qquad y_k = \varphi(u_k + b_k)$$
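A minimal numpy sketch of this neuron model, using a threshold activation as an assumption (any of the functions on the next slides could be substituted); the inputs and weights are hypothetical:

import numpy as np

def neuron(x, w, b, phi):
    # One artificial neuron: weighted sum, bias, then activation
    u = np.dot(w, x)          # linear combiner: u_k = sum_i w_i * x_i
    return phi(u + b)         # y_k = phi(u_k + b_k)

threshold = lambda v: 1.0 if v >= 0 else 0.0   # McCulloch-Pitts activation

x = np.array([0.5, -1.0, 2.0])   # hypothetical inputs
w = np.array([0.4, 0.3, 0.8])    # hypothetical synaptic weights
print(neuron(x, w, b=-1.0, phi=threshold))     # -> 1.0 (the neuron fires)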
▪ Threshold function (McCulloch-Pitts model, 1943):

$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases}$$

▪ Piecewise-linear function:

$$\varphi(v) = \begin{cases} 1 & v \ge 0.5 \\ v + 0.5 & -0.5 \le v \le 0.5 \\ 0 & v \le -0.5 \end{cases}$$
▪ Logistic function:

$$\varphi(v) = \frac{1}{1 + e^{-av}}$$

▪ Hyperbolic tangent function:

$$\varphi(v) = \tanh(v) = \frac{e^{v} - e^{-v}}{e^{v} + e^{-v}}$$
▪ Gaussian function:

$$\varphi(v) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2}\left(\frac{v-\mu}{\sigma}\right)^2\right)$$
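A minimal numpy sketch of these activation functions; the parameter defaults (a = 1, mu = 0, sigma = 1) are assumptions:

import numpy as np

def threshold(v):
    return np.where(v >= 0, 1.0, 0.0)

def piecewise_linear(v):
    return np.clip(v + 0.5, 0.0, 1.0)   # linear in [-0.5, 0.5], saturates outside

def logistic(v, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * v))

def gaussian(v, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

v = np.linspace(-2, 2, 5)
print(logistic(v))   # smooth S-curve between 0 and 1
print(np.tanh(v))    # hyperbolic tangent, between -1 and 1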
▪ Network architecture defines how nodes are connected.

▪ Learning is a process by which the free parameters of a neural network are adapted through stimulation by the environment in which the network is embedded.
▪ Process of learning:
  ▪ The NN is stimulated by an environment.
  ▪ The NN undergoes changes in its free parameters as a result of this stimulation.
  ▪ The NN responds in a new way to the environment because of the changes that have occurred in its internal structure.

How can the network adjust the weights?
▪ The perceptron is built around the McCulloch-Pitts model.
▪ Goal: to correctly classify the set of externally applied stimuli x1, x2, ..., xm into one of two classes, C1 and C2.

▪ The input vector and the weight vector, where n denotes the iteration step (the bias is absorbed as the first weight):

$$\mathbf{x}(n) = [+1,\, x_1(n),\, \ldots,\, x_m(n)]^T \qquad \mathbf{w}(n) = [b(n),\, w_1(n),\, \ldots,\, w_m(n)]^T$$

▪ Output of the neuron:

$$y(n) = \varphi\big(\mathbf{w}^T(n)\,\mathbf{x}(n)\big)$$

▪ What is the decision boundary?
Decision boundary:
▪ m = 1: a point
▪ m = 2: a line
▪ m = 3: a plane (in general, a hyperplane)

▪ How do we choose the proper weights?
Two basic methods can be employed to select a suitable weight vector:
▪ Off-line calculation of weights (without learning)
  ▪ Possible if the system is relatively simple
▪ A learning procedure
  ▪ The weight vector is determined from a given (training) set of input-output vectors (exemplars) in such a way as to achieve the best classification of the training vectors
Example: truth table of NAND
▪ Three points (0,0), (0,1) and (1,0) belong to one class, and (1,1) belongs to another class.
▪ The decision boundary is the straight line described by the following equation:

$$x_1 + x_2 = 1.5 \quad\text{or}\quad -x_1 - x_2 + 1.5 = 0 \;\Rightarrow\; \mathbf{w} = (1.5,\, -1,\, -1)$$

▪ Is the decision line unique for this problem?
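A minimal sketch verifying that w = (1.5, -1, -1) separates the NAND truth table, using the augmented input [+1, x1, x2]:

import numpy as np

w = np.array([1.5, -1.0, -1.0])       # [bias, w1, w2] from the slide

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array([1.0, x1, x2])       # augmented input [+1, x1, x2]
    y = 1 if w @ x > 0 else 0         # threshold activation
    print((x1, x2), "->", y)          # NAND outputs: 1, 1, 1, 0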
▪ If C1 and C2 are linearly separable, there exists a weight vector $\mathbf{w}_0$ such that:

$$\mathbf{w}_0^T\mathbf{x} > 0 \;\;\forall \mathbf{x} \in C_1 \qquad \mathbf{w}_0^T\mathbf{x} < 0 \;\;\forall \mathbf{x} \in C_2$$

▪ Given a training set $X = X_1 \cup X_2$ (with $X_1 \cap X_2 = \emptyset$), where $X_1 \subset C_1$ and $X_2 \subset C_2$.
▪ Training target: find a weight vector w such that the perceptron can correctly classify the training set X.
▪ Feed a pattern x to the perceptron with weight vector w; it will produce a binary output y (1 or 0). First consider the case

$$\mathbf{w}^T\mathbf{x} < 0 \;\Rightarrow\; y = 0$$

▪ If the correct label (all the labels of the training samples are known) is d = 0, should we update the weights? No: the output is already correct.
▪ If the desired output is d = 1, assume the new weight vector is w'; then we have

$$\mathbf{w}' = \mathbf{w} + \Delta\mathbf{w}$$

▪ But how do we choose Δw?
▪ We need $\mathbf{w}'^T\mathbf{x} - \mathbf{w}^T\mathbf{x} = \Delta\mathbf{w}^T\mathbf{x} > 0$
▪ Choosing $\Delta\mathbf{w} = \eta\mathbf{x}$ with $\eta > 0$ gives $\Delta\mathbf{w}^T\mathbf{x} = \eta\,\mathbf{x}^T\mathbf{x} > 0$
▪ So if the true label is d = 1 and the perceptron makes a mistake, its synaptic weights are adjusted by

$$\mathbf{w}' = \mathbf{w} + \eta\mathbf{x}$$
▪ Now consider the case

$$\mathbf{w}^T\mathbf{x} > 0 \;\Rightarrow\; y = 1$$

▪ We only adjust the weights when the perceptron makes a mistake (d = 0):

$$\mathbf{w}' = \mathbf{w} + \Delta\mathbf{w}, \qquad \mathbf{w}'^T\mathbf{x} - \mathbf{w}^T\mathbf{x} = \Delta\mathbf{w}^T\mathbf{x} < 0$$
$$\Delta\mathbf{w} = -\eta\mathbf{x},\; \eta > 0 \;\Rightarrow\; \Delta\mathbf{w}^T\mathbf{x} = -\eta\,\mathbf{x}^T\mathbf{x} < 0$$

▪ If the true label is d = 0 and the perceptron makes a mistake, its synaptic weights are adjusted by

$$\mathbf{w}' = \mathbf{w} - \eta\mathbf{x}$$
▪ To unify this algorithm, consider the error signal e = d - y:
  ▪ the error signal when d = 1 (and y = 0): e = 1 - 0 = 1
  ▪ the error signal when d = 0 (and y = 1): e = 0 - 1 = -1

▪ Then both cases reduce to

$$\mathbf{w}' = \mathbf{w} + \eta\,e\,\mathbf{x}$$
▪ Algorithm Perceptron:
  Start with a randomly chosen weight vector w(1);
  while there exist input vectors that are misclassified by w(n):
    Let x(n) be a misclassified input vector;
    Update the weight vector to w(n+1) = w(n) + η e(n) x(n);
    Increment n;
  end-while
▪ Example: let us consider a simple classification problem where the input space is a one-dimensional space, i.e., a real line:
  ▪ Class 1 (d = 1): x = 0.5, 2
  ▪ Class 2 (d = 0): x = -1, -2

▪ Solution:
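A minimal sketch of the training loop applied to this one-dimensional example; the learning rate η = 1 and the zero initial weights are assumptions:

import numpy as np

# Augmented inputs [+1, x] and desired labels d for the 1-D example
X = np.array([[1, 0.5], [1, 2], [1, -1], [1, -2]], dtype=float)
D = np.array([1, 1, 0, 0])

w = np.zeros(2)      # assumed initial weights [b, w1]
eta = 1.0            # assumed learning rate

for epoch in range(100):
    errors = 0
    for x, d in zip(X, D):
        y = 1 if w @ x > 0 else 0
        if y != d:                    # only update on mistakes
            w += eta * (d - y) * x    # w' = w + eta * e * x
            errors += 1
    if errors == 0:                   # converged: all samples classified
        break

print("w =", w)   # a separating weight vector; boundary near x = 0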
▪ Perceptron Convergence Theorem: if C1 and C2 are linearly separable, then after a finite number of steps the weights stop changing.
▪ Multilayer perceptrons (MLPs)
  ▪ Generalization of the single-layer perceptron
  ▪ Consist of:
    ▪ An input layer
    ▪ One or more hidden layers of computation nodes
    ▪ An output layer of computation nodes
  ▪ Architectural graph of a multilayer perceptron with two hidden layers
▪ Is there an "optimal" way to separate the data?
▪ The theory of support vector machines provides a systematic method for separating the data "optimally".
▪ Useful properties and capabilities of neural networks:
  ▪ High computational power
  ▪ Generalization: producing reasonable outputs for inputs not encountered during training (learning)
  ▪ A massively parallel distributed structure
  ▪ Nonlinearity: most physical systems are nonlinear
  ▪ Adaptivity (plasticity): a built-in capability to adapt the synaptic weights to changes in the environment
  ▪ Fault tolerance: if a neuron or its connecting links are damaged, the overall response may still be acceptable (due to the distributed nature of information stored in the network)
