NNDL-unit 3

The document outlines the syllabus for a course on Third-Generation Neural Networks, covering topics such as Spiking Neural Networks, Convolutional Neural Networks, and Extreme Learning Machine Models. It discusses the principles of these networks, their architectures, learning mechanisms, challenges, and applications in fields like computer vision and image processing. Additionally, it highlights the advantages and disadvantages of these neural network types, emphasizing their efficiency and robustness compared to traditional neural networks.

UNIT III
Third-Generation Neural Networks

Syllabus
Spiking Neural Networks - Convolutional Neural Networks - Deep Learning Neural Networks - Extreme Learning Machine Model - Convolutional Neural Networks : The Convolution Operation - Motivation - Pooling - Variants of the basic Convolution Function - Structured Outputs - Data Types - Efficient Convolution Algorithms - Neuroscientific Basis - Applications : Computer Vision, Image Generation, Image Compression.

Contents
3.1 Spiking Neural Networks
3.2 Convolutional Neural Networks
3.3 Extreme Learning Machine Model
3.4 The Convolution Operation
3.5 Pooling
3.6 Variants of the Basic Convolution Function
3.7 Structured Outputs
3.8 Data Types
3.9 Efficient Convolution Algorithms
3.10 Neuroscientific Basis
3.11 Applications : Computer Vision
3.12 Two Marks Questions with Answers
Neural Networks and Deep Learning    3 - 2    Third-Generation Neural Networks

3.1 Spiking Neural Networks


• Spiking Neural Networks (SNN) were originally inspired by the brain and the communication scheme that neurons use for information transformation via discrete action potentials (spikes) in time through adaptive synapses.
• SNN is different from the traditional neural networks known in the machine learning community. A spiking neural network operates on spikes. Spikes are discrete events taking place at specific points of time. Thus it is different from Artificial Neural Networks that use continuous values. Differential equations represent the various biological processes in the event of a spike.
• One of the most critical processes is the membrane capacity of the neuron. A neuron spikes when it reaches a specific potential. After a neuron spikes, the potential is re-established for that neuron. It takes some time for a neuron to return to its stable state after firing an action potential. This time interval after reaching membrane potential is known as the refractory period.
• An SNN architecture consists of spiking neurons and interconnecting synapses that are modeled by adjustable scalar weights. The first step in implementing an SNN is to encode the analog input data into spike trains using either a rate-based method, some form of temporal coding, or population coding.
• Spike trains in a network of spiking neurons are propagated through synaptic connections. A synapse can be either excitatory, which increases the neuron's membrane potential upon receiving input, or inhibitory, which decreases the neuron's membrane potential.
• The strength of the adaptive synapses (weights) can be changed as a result of learning. The learning rule of an SNN is its most challenging component for developing multi-layer (deep) SNNs, because the non-differentiability of spike trains limits the use of the popular backpropagation algorithm.
• Spiking neural networks are a type of neural network that can simulate the firing of neurons in the brain. These networks are designed to better model how the brain works, and they have the potential to be more efficient and powerful than traditional neural networks.
• Spiking neural networks are made up of neurons that fire in response to input. The strength of the connections between the neurons, the rate at which a neuron fires and the pattern of firing can be adjusted as the network learns.

Common learning mechanisms in SNNs are as follows :
• SNN uses both unsupervised and supervised learning mechanisms.
1. Unsupervised learning via spike-timing-dependent plasticity (STDP) :
• Data is delivered without a label and the network receives no feedback on its performance. Detecting and reacting to statistical correlations in data is a common activity. Hebbian learning and its spiking generalizations, such as STDP, are a good example of this. The identification of correlations can be a goal in and of itself, but it can also be utilized to cluster or classify data later on.
• STDP is defined as a process that strengthens a synaptic weight if the post-synaptic neuron activates soon after the pre-synaptic neuron fires, and weakens it if the post-synaptic neuron fires later. This conventional form of STDP, on the other hand, is merely one of the numerous physiological forms of STDP.
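The pair-based STDP rule described above can be sketched as follows; the learning rates and time constant used here are illustrative assumptions, not values from the text.

```python
import math

def stdp_update(w, t_pre, t_post, a_plus=0.05, a_minus=0.05, tau=20.0):
    """Pair-based STDP: strengthen the weight if the post-synaptic spike
    follows the pre-synaptic spike, weaken it if it precedes it.
    Spike times are in ms; the constants are illustrative."""
    dt = t_post - t_pre
    if dt > 0:        # post fires after pre -> potentiation
        w += a_plus * math.exp(-dt / tau)
    else:             # post fires before (or with) pre -> depression
        w -= a_minus * math.exp(dt / tau)
    return max(0.0, min(1.0, w))  # keep the weight in [0, 1]

w_up = stdp_update(0.5, t_pre=10.0, t_post=15.0)   # causal pair: weight grows
w_down = stdp_update(0.5, t_pre=15.0, t_post=10.0) # anti-causal: weight shrinks
```

Note that the update depends only on the relative spike timing, which is what makes the rule unsupervised.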

2. Supervised learning
• In supervised learning, data (the input) is accompanied by labels (the targets) and the learning device's purpose is to correlate (classes of) inputs with the target outputs.
• An error signal is computed between the target and the actual output and utilized to update the network's weights.
• Supervised learning allows us to use the targets to directly update parameters, whereas reinforcement learning just provides us with a generic error signal ("reward") that reflects how well the system is functioning. In practice, the line between these two types of learning is blurred.

Challenges with SNN


1. One challenge is that these networks are still relatively new and therefore not well understood.
2. Training spiking neural networks can be difficult and time-consuming, as they require specialized hardware and software.
3. Another challenge is that these networks can be very sensitive to changes in input data, meaning that they can be difficult to deploy in real-world applications.
4. Spiking neural networks can be power-hungry, which can be a problem for mobile or battery-powered devices.

TECHNICAL PUBLICATIONS® - an up-thrust for knowledge


Benefits of SNN
• SNNs are more efficient than Traditional Neural Networks (TNNs) because they only transmit information when necessary, which reduces the amount of energy required to operate the network.


• SNNs are more robust to noise and errors.
• SNNs can be implemented in hardware more easily than TNNs, which makes them well-suited for real-time applications.
• SNNs have been used to develop successful control systems for robots and other machines.

3.2 Convolutional Neural Networks


• Convolutional Neural Network (CNN) is a deep learning neural network designed for processing structured arrays of data such as images. A CNN is a feed-forward neural network, often with up to 20 or 30 layers. The power of a convolutional neural network comes from a special kind of layer called the convolutional layer.
• Convolutional neural network is also called ConvNet.
• In CNN, 'convolution' is referred to as the mathematical function. It's a type of linear operation in which you can multiply two functions to create a third function that expresses how one function's shape can be changed by the other.
• In simple terms, two images that are represented in the form of two matrices are multiplied to provide an output that is used to extract information from the image.
• CNN represents the input data in the form of multidimensional arrays. It works well for a large number of labeled data. CNN extracts each and every portion of the input image, which is known as the receptive field. It assigns weights for each neuron based on the significant role of the receptive field.
• Instead of preprocessing the data to derive features like textures and shapes, a CNN takes just the image's raw pixel data as input and "learns" how to extract these features, and ultimately infer what object they constitute.

• The goal of CNN is to reduce the images so that it would be easier to process without losing features that are valuable for accurate prediction.
• A convolutional neural network is made up of numerous layers, such as convolution layers, pooling layers and fully connected layers, and it uses a back-propagation algorithm to learn spatial hierarchies of data automatically and adaptively.
• To understand the concept of Convolutional Neural Networks (CNNs), let us take an example of the images our brain can interpret.



• As soon as we see an image, our brain starts categorizing it based on the color, shape and sometimes also the message that image is conveying. A similar thing can be done through machines, even after a rigorous training. But the difficulty is that there is a huge difference in what humans interpret and what machines do. For a machine, the image is merely an array of pixels. There is a unique pattern included in each object in the image, and the computer tries to find out these patterns to get the information about the image.
• Machines can be trained by giving tons of images to increase their ability to recognize the objects included in a given input image.
• Most of the digital companies have opted for CNNs for image recognition; some of these include Google, Amazon, Instagram, Pinterest, Facebook, etc.
• Hence, we define a convolutional neural network as : "A neural network consisting of multiple convolutional layers which are used mainly for image processing, classification, segmentation and other correlated data".

Advantages and Disadvantages of CNN


1. Advantages :
• CNN automatically detects the important features without any human supervision.
• CNN is also computationally efficient.
• Higher accuracy.
• Weight sharing is another major advantage of CNNs.
• Convolutional neural networks also minimize computation in comparison with a regular neural network.
• CNNs make use of the same knowledge across all image locations.

2. Disadvantages :
• Adversarial attacks are cases of feeding the network 'bad' examples to cause misclassification.
• CNN requires a lot of training data.
• CNNs tend to be much slower because of operations like maxpool.

Applications of CNN
• CNN is mostly used for image classification, for example identifying images containing mountains and valleys, and for other recognition tasks; image and signal processing, etc. are among the areas where CNNs are applied.
• Object detection : Self-driving cars, AI-powered surveillance systems and smart homes often use CNN to be able to identify and mark objects. CNN can identify objects in photos and, in real time, classify and label them.
• Voice synthesis : Google Assistant's voice synthesizer uses DeepMind's WaveNet ConvNet model.
• Astrophysics : They are used to make sense of radio telescope data and predict the probable visual image to represent that data.

Basic Structure of CNN

Fig. 3.2.1 Basic architecture of CNN (convolution, pooling and fully connected layers)

• A convolutional neural network, as discussed above, has the following layers that are useful for various deep learning algorithms. Let us see the working of these layers taking an example of an image having dimension of 12 x 12 x 4. These are :
1. Input layer : This layer will accept the image of width 12, height 12 and depth 4.
2. Convolution layer : It computes the volume of the image by getting the dot product between the possible image filters and the image patch. For example, if there are 10 filters, then the volume will be computed as 12 x 12 x 10.
3. Activation function layer : This layer applies an activation function to each element in the output of the convolutional layer. Some of the well accepted activation functions are ReLU, Sigmoid, Tanh, Leaky ReLU, etc. These functions will not change the volume obtained at the convolutional layer and hence it will remain equal to 12 x 12 x 10.
4. Pool layer : This function mainly reduces the volume of the intermediate outputs, which enables fast computation of the network model, thus preventing it from overfitting.
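The shape bookkeeping in the 12 x 12 x 4 example above can be sketched as follows; 'same' padding in the convolution layer (which the 12 x 12 x 10 figure implies) and a 2 x 2 pool are assumed.

```python
# Track the (height, width, channels) volume through the example layers.
def conv_shape(h, w, num_filters):
    # 'same' padding: spatial size preserved, depth = number of filters
    return (h, w, num_filters)

def pool_shape(h, w, c, size=2):
    # pooling halves each spatial dimension, depth unchanged
    return (h // size, w // size, c)

shape = (12, 12, 4)              # input layer
shape = conv_shape(12, 12, 10)   # 10 filters -> (12, 12, 10)
# the activation layer leaves the volume unchanged: still (12, 12, 10)
shape = pool_shape(*shape)       # 2 x 2 pooling -> (6, 6, 10)
```

The activation layer is absent from the sketch because, as the text notes, it does not change the volume.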




3.3 Extreme Learning Machine Model


• Extreme Learning Machine (ELM) was proposed by Guang-Bin Huang and Qin-Yu Zhu, and was aimed to train Single-hidden Layer Feedforward Networks (SLFNs). ELM is a training algorithm for the Single-hidden Layer Feedforward Neural Network (SLFN), which converges much faster than traditional methods. ELM converges much faster than traditional algorithms because it learns without iteration.
• ELM assigns random values to the weights between the input and hidden layer and the biases in the hidden layer, and these parameters are frozen during training. Fig. 3.3.1 shows the architecture of the ELM.

Fig. 3.3.1 Architecture of the ELM (input neurons, hidden neurons and output neurons)

• ELM is a single-hidden layer feed-forward network with three parts : input neurons, hidden neurons and output neurons.
• In particular, h(x) = [h_1(x), ..., h_L(x)] is the nonlinear feature mapping of ELM, with the form h_j(x) = g(w_j · x + b_j), and β_j = [β_j1, ..., β_jm]^T, j = 1, ..., L are the output weights between the jth hidden node and the output nodes.
• The basic training of ELM can be regarded as two steps : random initialization and linear parameter solution.
1. Firstly, ELM uses random parameters w_i and b_i in its hidden layer and they are frozen during the whole training process. The input vector is mapped into a random feature space with random settings and nonlinear activation functions, which is more efficient


than those of trained parameters. With nonlinear piecewise continuous activation functions, ELM has the universal approximation capability.
2. In the second step, β can be obtained by the Moore-Penrose inverse, as it is a linear problem Hβ = T.
• In ELM, the main idea involves the hidden layer weights. Furthermore, the biases are randomly generated, and the calculation of the output weights is done using the least-squares solution.
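The two-step procedure above can be sketched in a few lines of numpy; the hidden layer size, the tanh activation and the toy regression data are illustrative choices, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, L=50):
    """ELM training: a random, frozen hidden layer followed by a
    linear least-squares solve for the output weights beta."""
    W = rng.normal(size=(X.shape[1], L))  # random input-to-hidden weights (frozen)
    b = rng.normal(size=L)                # random hidden biases (frozen)
    H = np.tanh(X @ W + b)                # nonlinear random feature map h(x)
    beta = np.linalg.pinv(H) @ T          # Moore-Penrose solution of H beta = T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: fit T = x1 * x2 + 0.1 on random points
X = rng.uniform(-1, 1, size=(200, 2))
T = X[:, :1] * X[:, 1:] + 0.1
W, b, beta = elm_train(X, T)
err = np.mean((elm_predict(X, W, b, beta) - T) ** 2)
```

Note that no gradient descent appears anywhere: the only "learning" is the single pseudo-inverse solve, which is why ELM converges without iteration.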

3.4 The Convolution Operation


• Convolution operation focuses on extracting/preserving important features from the input. Convolution operation allows the network to detect horizontal and vertical edges of an image and then, based on those edges, build high-level features in the following layers of the neural network.
• In general form, convolution is an operation on two functions of a real valued argument. To motivate the definition of convolution, we start with examples of two functions we might use.
• Suppose we are tracking the location of a spaceship with a laser sensor. The laser sensor provides a single output x(t), the position of the spaceship at time t. Both "x" and "t" are real-valued, i.e., we can get a different reading from the laser sensor at any instant in time.
• Now suppose that our laser sensor is somewhat noisy. To obtain a less noisy estimate of the spaceship's position, we would like to average together several measurements. Of course, more recent measurements are more relevant, so we will want this to be a weighted average that gives more weight to recent measurements.
• We can do this with a weighting function w(a), where "a" is the age of a measurement. If we apply such a weighted average operation at every moment, we obtain a new function providing a smoothed estimate of the position "s" of the spaceship.
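The weighted average just described is exactly the convolution operation; written out in the standard notation:

```latex
s(t) = \int x(a)\, w(t-a)\, da \;=\; (x * w)(t),
\qquad
s(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t-a) \quad \text{(discrete case)}
```

In CNN terminology, the first argument x is the input, the second argument w is the kernel, and the output s is the feature map.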
• Convolution operation uses three parameters : input image, feature detector and feature map.
• Convolution operation involves an input matrix and a filter, also known as the kernel. The input matrix can be the pixel values of a grayscale image, whereas a filter is a relatively small matrix that detects edges by darkening areas of the input image where there are transitions from brighter to darker areas. There can be different types of filters depending upon what type of features we want to detect, e.g. vertical, horizontal, or diagonal, etc.
• The input image is converted into binary 1 and 0. The convolution operation, shown in Fig. 3.4.1, is known as the feature detector of a CNN. The input to a convolution can be raw data or a feature map output from another convolution. It is often interpreted as a filter in which the kernel filters input data for certain kinds of information.
• Sometimes a 5 x 5 or a 7 x 7 matrix is used as a feature detector. The feature detector is often referred to as a "kernel" or a "filter". At each step, the kernel is multiplied by the input data values within its bounds, creating a single entry in the output feature map.
Fig. 3.4.1 Convolution operation (the kernel slides over the input data to produce the convolved feature map)

• Generally, an image can be considered as a matrix whose elements are numbers between 0 and 255. The size of the image matrix is : image height * image width * number of image channels.
• A grayscale image has 1 channel, whereas a colour image has 3 channels.
• Kernel : A kernel is a small matrix of numbers that is used in image convolutions. Differently sized kernels containing different patterns of numbers produce different results under convolution. The size of a kernel is arbitrary but 3 x 3 is often used. Fig. 3.4.2 shows an example of a kernel.
0 1 0

1 1 1

0 1 0

Fig. 3.4.2 Example of kernel
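The sliding-window computation of Fig. 3.4.1 can be sketched as follows. This is a "valid" convolution in the cross-correlation form that CNN libraries actually implement; the input and kernel values are illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel over the image
    and take the dot product at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1     # output shrinks by k - 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # elementwise product of the kernel with the patch under it
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[1, 1, 1, 0],
                  [0, 1, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]])
kernel = np.array([[1, 0],
                   [0, 1]])
feature_map = conv2d(image, kernel)       # 4x4 input, 2x2 kernel -> 3x3 map
```

Each entry of `feature_map` is one "single entry in the output feature map" in the sense described above.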

• Convolutional layers perform transformations on the input data volume that are a function of the activations in the input volume and the parameters.
• In reality, convolutional neural networks develop multiple feature detectors and use them to develop several feature maps, which are referred to as convolutional layers, as shown in Fig. 3.4.3.



• Through training, the kernel learns which features it finds important, in order for it to be able to scan images of the layer and categorize them accurately.
• Convolutional layers have parameters for the layer and additional hyper-parameters. Gradient descent is used to train the parameters in this layer such that the class scores are consistent with the labels in the training set.
Fig. 3.4.3 Feature detectors (many feature maps are created from the input image to obtain the first convolution layer)

• Components of convolutional layers are as follows :
a) Filters
b) Activation maps
c) Parameter sharing
d) Layer-specific hyper-parameters
• Filters are a function that has a width and height smaller than the width and height of the input volume. The filters are tensors and they are used to convolve the input tensor when the tensor is passed to the layer instance. The random values inside the filter tensors are the weights of the convolutional layer.
• Sliding each filter across the spatial dimensions (width, height) of the input volume during the forward pass of information through the CNN produces a two-dimensional output called an activation map for that specific filter.

Parameter Sharing
• Parameter sharing is used in CNN to control the total parameter count. Convolutional layers reduce the parameter count further by using a technique called parameter sharing.
• Third-Generation Neural Networks
• The user can reduce the number of parameters by making an assumption that if one feature is useful to compute at some spatial position (x, y), then it is useful to compute it at a different place as well.
• In other words, we denote a single 2-dimensional slice of depth as a depth slice. For example, during back-propagation, every neuron in the network will compute the gradient for its weights, but these gradients will be added up across each depth slice and only update a single set of weights per slice.
• If all neurons in a single depth slice are using the same weight vector, then the forward pass of the convolutional layer can be computed in each depth slice as a convolution of the neuron's weights with the input volume. This is the reason why it is common to refer to the sets of weights as a filter (or a kernel) that is convolved with the input.
• Fig. 3.4.4 shows that convolution shares the same parameters across all spatial locations.

Fig. 3.4.4 Convolution shares the same parameters across all spatial locations
Equivariant Representation
• The convolution function is equivariant to translation. This means that shifting the input and then applying convolution is equivalent to applying convolution to the input and then shifting the output.
• If we move the object in the input, its representation will move the same amount in the output.
• General definition : if representation(transform(x)) = transform(representation(x)), then the representation is equivariant to the transform.
• Convolution is equivariant to translation. This is a direct consequence of parameter sharing.
• It is useful when detecting structures that are common in the input, for example, edges in an image.
• Equivariance in early layers is good : we are able to achieve translation-invariance (via max-pooling) due to this property.
• Convolution is not equivariant to other operations such as change in scale or rotation.

• Example of equivariance : With 2D images, convolution creates a map where certain features appear in the input. If we move the object in the input, the representation will move the same amount in the output. This is useful to detect edges in the first layer of a convolutional network. The same edges appear everywhere in an image, so it is practical to share parameters across the entire image.
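The translation-equivariance property can be checked numerically; a small sketch with a naive single-channel "same" convolution (the 3 x 3 averaging kernel and toy input are illustrative):

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same' convolution with one ring of zero padding
    (a 3 x 3 kernel is assumed)."""
    p = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(p[i:i+3, j:j+3] * k)
    return out

x = np.zeros((6, 6)); x[1, 1] = 1.0       # a single "object" pixel
k = np.ones((3, 3)) / 9.0                  # averaging kernel
shifted = np.roll(x, shift=2, axis=1)      # move the object right by 2

# shift-then-convolve equals convolve-then-shift (away from the borders)
a = conv2d_same(shifted, k)
b = np.roll(conv2d_same(x, k), shift=2, axis=1)
```

Here `a` and `b` agree exactly, which is the equivariance statement representation(transform(x)) = transform(representation(x)) for the translation transform.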

Padding
• Padding is the process of adding one or more pixels of zeros all around the boundaries of an image, in order to increase its effective size. Zero padding helps to make the output dimensions and kernel size independent.
• One observation is that the convolution operation reduces the size of the (q + 1)th layer in comparison with the size of the qth layer. This type of reduction in size is not desirable in general, because it tends to lose some information along the borders of the image. This problem can be resolved by using padding.
• 3 common zero padding strategies are :
a) Valid convolution : Extreme case in which no zero-padding is used whatsoever, and the convolution kernel is only allowed to visit positions where the entire kernel is contained entirely within the input. For a kernel of size k in any dimension, an input of size m in that direction will become m - k + 1 in the output. This shrinkage restricts architecture depth.
b) Same convolution : Just enough zero-padding is added to keep the size of the output equal to the size of the input. Essentially, for a dimension where kernel size is k, the input is padded by k - 1 zeros in that dimension.
c) Full convolution : Other extreme case where enough zeroes are added for every pixel to be visited k times in each direction, resulting in an output image of width m + k - 1.
d) The 1D block is composed of a configurable number of filters, where the filter has a set size; a convolution operation is performed between the vector and the filter, producing as output a new vector with as many channels as the number of filters. Every value in the tensor is then fed through an activation function to introduce nonlinearity.
• When padding is not used, the resulting convolution is also referred to as valid padding. Valid padding generally does not work well from an experimental point of view. In the case of valid padding, the contributions of the pixels on the borders of the layer will be under-represented compared to the central pixels in the next hidden layer, which is undesirable.
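The three strategies give the following output sizes along one dimension (stride 1; m is the input size and k the kernel size, as above):

```python
def output_size(m, k, padding):
    """Output length along one dimension for the three zero-padding
    strategies described above (stride 1)."""
    if padding == "valid":   # no padding: output shrinks
        return m - k + 1
    if padding == "same":    # k - 1 zeros added: size preserved
        return m
    if padding == "full":    # every pixel visited k times: output grows
        return m + k - 1
    raise ValueError(padding)

sizes = [output_size(12, 3, p) for p in ("valid", "same", "full")]
# a 12-wide input with a 3-wide kernel gives widths 10, 12 and 14
```

The "valid" shrinkage of k - 1 per layer is exactly why the text says valid convolution restricts architecture depth.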


Stride
• Convolution functions used in practice differ slightly compared to the convolution operation as it is usually understood in the mathematical literature.
· · t
In general a convolution layer consis ts of applic ation of several different kernels to the mpu ·
' • Tl ·
.
This allows the extraction of severa 1 differe nt features at all locations in the mput. us
kernels, are used as
means that in each layer, a single kernel is not applied. Multiple
di ffe rent feature detectors.
channel convolutions
• The input is generally not real-valued but instead vector valued. Multi-

are commutative only if numb er of output and mput .
channels 1s the same.
° convolutions can be
• In rd er to allow for calculation of features at a coarser level strided
lution followed by a
tised. The effect of st rided convolution is the same as that of a convo
down sampling stage. This can be used to reduce the representation size.
,I

' r.
and vertically over the . '/:
• The stride indicates the pace by which the filter move·s horizontally
convolution.
pixels of the input image during convolution . Fig. 3.4.5 shows stride during
I
Fig. 3.4.5 Stride during convolution (stride = 1)

• Stride is a parameter of the neural network's filter that modifies the amount of movement over the image or video. Stride is a component for the compression of image and video data. For example, if a neural network's stride is set to 1, the filter will move one pixel, or unit, at a time.
• Stride depends on what we expect in our output image. We prefer a smaller stride size if we expect several fine-grained features to reflect in our output. On the other hand, if we are only interested in the macro-level of features, we choose a larger stride size.
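The claim above, that strided convolution equals convolution followed by downsampling, can be checked in one dimension (the edge-detecting kernel [1, -1] is an illustrative choice):

```python
import numpy as np

def conv1d_valid(x, k, stride=1):
    """1-D valid cross-correlation with a configurable stride."""
    n, m = len(x), len(k)
    return np.array([np.dot(x[i:i+m], k)
                     for i in range(0, n - m + 1, stride)])

x = np.arange(10, dtype=float)
k = np.array([1.0, -1.0])

strided = conv1d_valid(x, k, stride=2)           # strided convolution
downsampled = conv1d_valid(x, k, stride=1)[::2]  # convolve, then keep every 2nd
```

The two results are identical, but the strided version never computes the outputs that downsampling would discard, which is where the efficiency gain comes from.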

Typical Setting


• For square images, we use stride sizes of 1 in most settings. Even when strides are used, small strides of size 2 are used. In cases where the input images are not square, preprocessing is used to enforce this property.
• For example, one can extract square patches of the image to create the training data. The number of filters in each layer is often a power of 2, because this often results in more efficient processing. Such an approach also leads to hidden layer depths that are powers of 2.

ReLU Layer


• In this layer we remove every negative value from the filtered image and replace it with zero. This function only activates when the node input is above a certain quantity. So, when the input is below zero, the output is zero.
• However, when the input rises above a certain threshold, it has a linear relationship with the dependent variable. This means that it is able to accelerate the speed of training a data set in a deep neural network faster than other activation functions.
• In traditional neural networks, the activation function is combined with a linear transformation with a matrix of weights to create the next layer of activations.
• The reason why the rectifier function is typically used as the activation function in a convolutional neural network is to increase the nonlinearity of the data set. By removing negative values from the neurons' input signals, the rectifier function is effectively removing black pixels from the image and replacing them with gray pixels.

Sparse Interactions


• Sparse interactions are also referred to as sparse connectivity or sparse weights. Sparse interaction is implemented by using kernels or feature detectors smaller than the input image, i.e. making the kernel smaller than the input.
• If we have an input image of the size 256 by 256, edges in the image may occupy only a smaller subset of its pixels, so they can be detected with a kernel that is much smaller than the input. This means that we need to store fewer parameters, which both reduces the memory requirements of the model and improves its statistical efficiency. It also means that computing the output requires fewer operations.
• The sparse interaction idea uses a convolution kernel to interact with a local region in the image. This region is called the receptive field, which improves the parameter count and efficiency compared with a fully connected layer.
• For example, when processing a three-channel picture, the image may contain thousands of pixels, but when we only need to detect the edge information in the image, we do not need to connect to the pixels of the whole picture; we only need to use a convolution kernel containing hundreds of pixels. This calculation method not only improves the calculation efficiency, but also saves a large part of the parameter space.
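The parameter savings can be made concrete with a back-of-the-envelope count (the 3 x 3 kernel size and single-channel, no-bias setting are assumed for illustration, not stated in the text):

```python
# Parameter count on a 256 x 256 input: fully connected layer vs. one shared kernel.
input_pixels = 256 * 256                  # 65,536 input values
fc_params = input_pixels * input_pixels   # every output unit sees every input pixel
kernel_size = 3 * 3
conv_params = kernel_size                 # one small kernel shared across all locations

print(fc_params)    # 4294967296
print(conv_params)  # 9
```

The fully connected layer needs billions of weights for a single output map of the same size, while the convolutional layer with a shared 3 x 3 kernel needs only nine.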

TECHNICAL PUBLICATIONS® - an up-thrust for knowledge
Pooling
• Pooling helps the representation become slightly invariant to small translations of the input.
• A pooling function takes the output of the previous layer at a certain location L and computes a summary of the neighborhood around L.
• The pooling layer reduces the height and width of the input. It helps reduce computation, as well as helps make feature detectors more invariant to their position in the input.
• The function of the pooling layer is to progressively reduce the spatial size of the representation, to reduce the amount of parameters and computation in the network, and hence to also control overfitting. No learning takes place on the pooling layers.
• The addition of a pooling layer after the convolutional layer is a common pattern used for ordering layers within a convolutional neural network that may be repeated one or more times in a given model.

• The pooling layer operates upon each feature map separately to create a new set of the same number of pooled feature maps. Pooling involves selecting a pooling operation, much like a filter to be applied to feature maps.
• The size of the pooling operation or filter is smaller than the size of the feature map. Typically it reduces the size of each feature map by a factor of 2, i.e. each dimension is halved, reducing the number of pixels or values in each feature map to one quarter the size.
• For example, a pooling layer applied to a feature map of 6 x 6 (36 pixels) will result in an output pooled feature map of 3 x 3 (9 pixels). The pooling operation is specified, rather than learned.
• The pooling operation, also called subsampling, is used to reduce the dimensionality of feature maps from the convolution operation. Max pooling and average pooling are the most common pooling operations used in the CNN.
• Pooling units are obtained using functions like max pooling, average pooling and even L2-norm pooling. At the pooling layer, forward propagation results in an N x N pooling block being reduced to a single value - the value of the "winning unit". Back-propagation of the pooling layer then computes the error which is acquired by this single-value "winning unit".
• Pooling layers, also known as downsampling, conduct dimensionality reduction, reducing the number of parameters in the input. Similar to the convolutional layer, the pooling operation sweeps a filter across the entire input, but the difference is that this filter does not have any weights. Instead, the kernel applies an aggregation function to the values within the receptive field, populating the output array. There are two main types of pooling :

• Max pooling : As the filter moves across the input, it selects the pixel with the maximum value to send to the output array. This approach tends to be used more often compared to average pooling.
• Average pooling : As the filter moves across the input, it calculates the average value within the receptive field to send to the output array.
• Invariance to local translation can be useful if we care more about whether a certain feature is present rather than exactly where it is.
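Both operations can be sketched with NumPy (a minimal sketch; the `pool2d` helper and the 4 x 4 feature map values are illustrative assumptions, not from the text):

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping size x size pooling over a 2-D feature map."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]                   # trim to a multiple of the window
    blocks = x.reshape(h // size, size, w // size, size)  # group pixels into windows
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [0, 1, 3, 4]], dtype=float)

print(pool2d(fmap, 2, "max"))   # the 4 x 4 map shrinks to 2 x 2: each dim halved
print(pool2d(fmap, 2, "mean"))  # average of each 2 x 2 window instead of the max
```

Note how each 2 x 2 window is summarized by a single "winning" value (max) or an average, reducing the pixel count to one quarter, exactly as described above.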
tio n
Variants of the Basic Convolution Function
• Various types of convolution functions are as follows :
1. Stride
2. Zero padding
3. Unshared convolution
4. Partial connectivity between channels
5. Tiled convolution
• Convolution functions used in practice differ slightly compared to the convolution operation as it is usually understood in the mathematical literature.
• In general a convolution layer consists of application of several different kernels to the input. This allows the extraction of several different features at all locations in the input. This means that in each layer, a single kernel is not applied; multiple kernels are used as different feature detectors.
• The input is generally not real-valued but instead vector-valued. Multi-channel convolutions are commutative only if the number of output and input channels is the same.
• In order to allow for calculation of features at a coarser level, strided convolutions can be used. The effect of strided convolution is the same as that of a convolution followed by a downsampling stage. This can be used to reduce the representation size.
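That equivalence can be checked numerically (a sketch; `conv2d_valid` is a hypothetical helper implementing the "valid" cross-correlation that the deep learning literature calls convolution):

```python
import numpy as np

def conv2d_valid(x, k, stride=1):
    """Plain 'valid' cross-correlation with an optional stride."""
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i*stride:i*stride+kh, j*stride:j*stride+kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
k = rng.standard_normal((3, 3))

strided = conv2d_valid(x, k, stride=2)                 # compute every second output directly
downsampled = conv2d_valid(x, k, stride=1)[::2, ::2]   # full convolution, then subsample
print(np.allclose(strided, downsampled))               # True
```

The strided version is cheaper because it never computes the outputs that downsampling would discard.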

Tiled Convolution
• Tiled convolution learns a set of kernels that is rotated through as we move through space, rather than learning a separate set of weights at every spatial location as in a locally connected layer.
• It offers a compromise between a convolutional layer and a locally connected layer.
• Memory requirements for storing the parameters will increase only by a factor of the size of this set of kernels.

• Let k be a 6-D tensor, where two of the dimensions correspond to different locations in the output map. Rather than having a separate index for each location in the output map, output locations cycle through a set of t different choices of kernel stack in each direction. If t is equal to the output width, this is the same as a locally connected layer.
Transposed and Dilated Convolutions
• Transposed convolutions : These types of convolutions are also known as deconvolutions or fractionally strided convolutions. A transposed convolutional layer carries out a regular convolution but reverts its spatial transformation.
• Fig. 3.6.1 shows how transposed convolution with a 2 x 2 kernel is computed for a 2 x 2 input tensor.
[Figure: a 2 x 2 input tensor is transposed-convolved with a 2 x 2 kernel, giving the 3 x 3 output
0 0 1
0 4 6
4 12 9]
Fig. 3.6.1 Transposed convolution with a 2 x 2 kernel

• The shaded portions are a portion of an intermediate tensor as well as the input and kernel tensor elements used for the computation.
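The arithmetic behind Fig. 3.6.1 can be sketched as follows. The input and kernel values [[0, 1], [2, 3]] are assumptions (the figure prints only the output), chosen because they are consistent with the 3 x 3 output shown:

```python
import numpy as np

def transposed_conv2d(x, k):
    """Each input element scales the kernel and adds it into the output at its offset."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros((h + kh - 1, w + kw - 1))
    for i in range(h):
        for j in range(w):
            out[i:i + kh, j:j + kw] += x[i, j] * k
    return out

x = np.array([[0.0, 1.0], [2.0, 3.0]])  # assumed input values
k = np.array([[0.0, 1.0], [2.0, 3.0]])  # assumed kernel values
print(transposed_conv2d(x, k))
# values [[0, 0, 1], [0, 4, 6], [4, 12, 9]], matching Fig. 3.6.1
```

Where a regular 2 x 2 "valid" convolution shrinks a 3 x 3 input to 2 x 2, the transposed operation does the reverse spatial mapping, expanding 2 x 2 to 3 x 3.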
• Dilated convolution operation expands the window size without increasing the number of weights, by inserting zero-values into convolution kernels. Dilated convolutions can be used in real-time applications and in applications where the processing power is less, as the RAM requirements are less intensive.
• Dilated convolution is also called atrous convolution. The central idea is that a new dilation parameter (d) is introduced, which decides on the spacing between the filter weights while performing convolution.
• Fig. 3.6.2 shows convolution with a dilated filter where the dilation factor is d = 2.
• Dilation by a factor of "d" means that the original filter is expanded by d - 1 spaces between each element, and the intermediate empty locations are filled in with zeros.
[Figure with panels (a) - (d)]
Fig. 3.6.2 Convolution with a dilated filter where the dilation factor is d = 2
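The expansion rule — d - 1 zeros between filter elements — can be sketched directly (the `dilate_kernel` helper and the 2 x 2 filter values are illustrative assumptions):

```python
import numpy as np

def dilate_kernel(k, d):
    """Insert d - 1 zeros between the elements of a filter (dilation factor d)."""
    kh, kw = k.shape
    out = np.zeros((d * (kh - 1) + 1, d * (kw - 1) + 1))
    out[::d, ::d] = k  # original weights land on a spaced grid; gaps stay zero
    return out

k = np.array([[1, 2],
              [3, 4]])
print(dilate_kernel(k, 2))
# a 2 x 2 filter becomes 3 x 3: [[1, 0, 2], [0, 0, 0], [3, 0, 4]]
```

The dilated filter covers a wider receptive field (3 x 3 here) while still holding only the original four weights.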

Structured Outputs
• Convolutional networks can be used to output a high-dimensional structured object, rather than just predicting a class label for a classification task or a real value for a regression task.
• Typically this object is just a tensor, emitted by a standard convolutional layer.
• For example, the model might emit a tensor S, where S_{i,j,k} is the probability that pixel (j, k) of the input to the network belongs to class i.
• This allows the model to label every pixel in an image and draw precise masks that follow the outlines of individual objects. Fig. 3.7.1 shows an example of a recurrent convolutional network for pixel labeling.
• The input is an image tensor X, with axes corresponding to image rows, image columns and channels (red, green, blue). The goal is to output a tensor of labels Ŷ, with a probability distribution over labels for each pixel. This tensor has axes corresponding to image rows, image columns, and the different classes.
• Rather than outputting Ŷ in a single shot, the recurrent network iteratively refines its estimate Ŷ by using a previous estimate of Ŷ as input for creating a new estimate. The same parameters are used for each updated estimate, and the estimate can be refined as many times as we wish.

Fig. 3.7.1 Example of a recurrent convolutional network for pixel labeling
• The tensor of convolution kernels U is used on each step to compute the hidden representation given the input image.
• The kernel tensor V is used to produce an estimate of the labels given the hidden values.
• On all but the first step, the kernels W are convolved over Ŷ to provide input to the hidden layer. On the first time step, this term is replaced by zero. Because the same parameters are used on each step, this is an example of a recurrent network.
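The label tensor S described above can be sketched by applying a softmax over the class axis of a logit tensor, so that each pixel gets a probability distribution over classes (shapes, values, and the helper name are illustrative assumptions):

```python
import numpy as np

def pixel_class_probabilities(logits):
    """Softmax over the class axis of a (classes, rows, cols) tensor S,
    so S[i, j, k] is the probability that pixel (j, k) belongs to class i."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=0, keepdims=True)

logits = np.random.default_rng(1).standard_normal((3, 4, 4))  # 3 classes, 4 x 4 image
S = pixel_class_probabilities(logits)

print(S.shape)                           # (3, 4, 4)
print(np.allclose(S.sum(axis=0), 1.0))   # True: each pixel's class probabilities sum to 1
```

In the recurrent scheme, a tensor like S would be produced at every refinement step and fed back in as input for the next one.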
Data Types
• The data used with a convolutional network usually consists of several channels, each channel being the observation of a different quantity at some point in space or time.
• One advantage of convolutional networks is that they can also process inputs with varying spatial extents. These kinds of input simply cannot be represented by traditional, matrix-multiplication-based neural networks. This provides a compelling reason to use convolutional networks even when computational cost and overfitting are not significant issues.
• For example, consider a collection of images, where each image has a different width and height. It is unclear how to model such inputs with a weight matrix of fixed size. Convolution is straightforward to apply; the kernel is simply applied a different number of times depending on the size of the input, and the output of the convolution operation scales accordingly.
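A small sketch of that scaling: the same fixed kernel slides over whatever input it is given, and only the output size changes (a "valid" convolution is assumed; the shapes are illustrative):

```python
def conv_output_shape(input_shape, kernel_shape):
    """'Valid' convolution output size: input size minus kernel size plus one, per axis."""
    return tuple(i - k + 1 for i, k in zip(input_shape, kernel_shape))

kernel = (3, 3)  # one fixed set of weights, reused for every input
for image in [(32, 48), (100, 100), (7, 200)]:
    print(image, "->", conv_output_shape(image, kernel))
# (32, 48) -> (30, 46)
# (100, 100) -> (98, 98)
# (7, 200) -> (5, 198)
```

A fixed-size weight matrix would need a separate parameter set for each input shape; the shared kernel needs none.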
• Convolution may be viewed as matrix multiplication; the same convolution kernel induces a different size of doubly block circulant matrix for each size of input. Sometimes the output of the network is allowed to have variable size as well as the input, for example if we want to assign a class label to each pixel of the input.
• Data sizes and types : number vs. volume
• 1 billion tweets - 130 GB (140 bytes per tweet)




• ImageNet : over 14 million labeled high-resolution images belonging to roughly 22,000 categories - 150 GB
• Normalize the data to [0, 1] : deep learning likes normalized vectors.
► When Not to Use Convolution ?
• The use of convolution for processing variably sized inputs makes sense only for inputs that have variable size because they contain varying amounts of observation of the same kind of thing - different lengths of recordings over time, different widths of observations over space, etc.
• Convolution does not make sense if the input has variable size because it can optionally include different kinds of observations.
• Example : If we are processing college applications and our features consist of both grades and standardized test scores, but not every applicant took the standardized test, then it does not make sense to convolve the same weights over the features corresponding to the grades as well as the features corresponding to the test scores.
Efficient Convolution Algorithms
• Convolution is equivalent to converting both the input and the kernel to the frequency domain using a Fourier transform, performing point-wise multiplication of the two signals, and converting back to the time domain using an inverse Fourier transform. For some problem sizes, this can be faster than the naive implementation of discrete convolution.
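The frequency-domain route can be verified against NumPy's direct convolution (a 1-D sketch; the `fft_convolve` helper name and the signal values are illustrative assumptions):

```python
import numpy as np

def fft_convolve(x, k):
    """Full 1-D convolution via pointwise multiplication in the frequency domain."""
    n = len(x) + len(k) - 1                     # length of the full convolution
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 0.0, -1.0])
print(np.allclose(fft_convolve(x, k), np.convolve(x, k)))  # True
```

The FFT route costs O(n log n) versus O(n * m) for the direct sum, which is why it wins for large signals and kernels.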
• When a d-dimensional kernel can be expressed as the outer product of d vectors, one vector per dimension, the kernel is called separable. When the kernel is separable, naive convolution is inefficient.
• It is equivalent to composing d one-dimensional convolutions with each of these vectors. The composed approach is significantly faster than performing one d-dimensional convolution with their outer product.
• The kernel also takes fewer parameters to represent as vectors. If the kernel is w elements wide in each dimension, then naive multidimensional convolution requires O(w^d) runtime and parameter storage space, while separable convolution requires O(w × d) runtime and parameter storage space. Not every convolution can be represented in this way.
Neuroscientific Basis
• Convolutional networks are perhaps the greatest success story of biologically inspired artificial intelligence. The history of convolutional networks begins with neuroscientific experiments long before the relevant computational models were developed.

• Neurophysiologists David Hubel and Torsten Wiesel collaborated for several years to determine many of the most basic facts about how the mammalian vision system works. Their accomplishments were eventually recognized with a Nobel prize. Their work helped to characterize many aspects of brain function.
• In this simplified view, we focus on a part of the brain called V1, also known as the primary visual cortex. V1 is the first area of the brain that begins to perform significantly advanced processing of visual input.
• In the cartoon view, images are formed by light arriving in the eye and stimulating the retina, the light-sensitive tissue in the back of the eye. The neurons in the retina perform some simple preprocessing of the image but do not substantially alter the way it is represented.
• The image then passes through the optic nerve and a brain region called the lateral geniculate nucleus. A convolutional network layer is designed to capture three properties of V1 :
1. V1 is arranged in a spatial map. It actually has a two-dimensional structure mirroring the structure of the image in the retina.
2. V1 contains many simple cells. A simple cell's activity can to some extent be characterized by a linear function of the image in a small, spatially localized receptive field. The detector units of a convolutional network are designed to emulate these properties of simple cells.
3. V1 also contains many complex cells. These cells respond to features that are similar to those detected by simple cells, but complex cells are invariant to small shifts in the position of the feature. This inspires the pooling units of convolutional networks.

Applications : Computer Vision


• With the help of convolutional neural networks, deep learning is able to perform the following tasks :
a) Object recognition b) Face recognition c) Motion detection d) Pose estimation
e) Semantic segmentation
a) Object recognition (detection) : Nowadays AI is able to recognize both static and dynamically moving objects with 99 % accuracy. In general, it is a matter of dividing the image into fragments and letting algorithms find the similarities to one of the existing objects in order to assign it to one of the classes. Classification plays an important role in this process, and the success of object recognition largely depends on the richness of the object database.



b) Face recognition : It is the identification of a specific person known to the system from the database.
• .
c) Motion detection : Motion detection is a key part of any surveillance system. This may be used to trigger an alarm, send a notification to someone or simply record the event for later analysis. One way to detect motion is by using a motion detector, which detects changes between frames of an image sequence. The simplest form of motion detection is thresholding.
d) Pose estimation : Human pose recognition is a challenging computer vision task due to the wide variety of human shapes and appearance, difficult illumination and crowded scenery. For these tasks, photographs, image sequences, depth images, or skeletal data from motion capture devices are used to estimate the location of human joints.
e) Semantic segmentation is a type of deep learning that attempts to classify each pixel in an image into one of several classes, such as road, sky or grass. These labels are then used during training so that when new images are processed they can also be segmented into these categories based on what they look like compared with previously seen pictures.

Image Compression


• Image compression has an important role in data transfer and storage, especially due to the data explosion that is increasing significantly faster than Moore's Law.
• The architecture shown in Fig. 3.11.1 below has two distinct parts : ComCNN and RecCNN.
[Figure: the original image is encoded by ComCNN (image encoder) into a compact representation; the compact representation is sent over the channel and decoded by RecCNN (image decoder) into the decoded image]
Fig. 3.11.1


• Convolutional layers stacked in this way can capture features of an image. Because of the use of multi-layer CNNs, this architecture can maintain the structural composition of an image as well.
• The ComCNN is a network responsible for compressing the images in such a way that they can be effectively reconstructed by the reconstruction network RecCNN. ComCNN consists of three convolutional layers, with the second layer followed by a batch normalization layer.
• Since the first convolutional layer uses a stride of two, the image size is reduced by half.
• RecCNN uses twenty neural network layers. Apart from the first and the last layer, each layer in this formation carries out convolution and batch normalization operations.
• The network was trained using 400 gray scale images and 50 epochs. SSIM and PSNR metrics of these images are better than JPEG images.
• In lossy image compression techniques, artifacts of image compression algorithms are visible in images. An example of such artifacts is visible on images for which tiling was used for quantization. In such images, these tile boundaries continue to remain in the images.

Two Marks Questions with Answers


Q.1 What is a spiking neural network ?
Ans. : A spiking neural network is a type of artificial neural network that uses discrete time steps to simulate the firing of neurons in the brain. This type of neural network is more efficient than traditional artificial neural networks and can more accurately model the brain's processing of information.
Q.2 Define convolutional networks.
Ans. : Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.
Q.3 How are sparse interactions used in convolutional networks ? What are its benefits ?
Ans. : Sparse interaction is implemented by using kernels or feature detectors smaller than the input image, i.e. making the kernel smaller than the input.

Q.4 Why are sparse interactions beneficial ?
Ans. :
• Fewer parameters : Reduces the memory requirements of the model and improves its statistical efficiency.
• Computing the output requires fewer operations.
tional
Q.5 Would sparse interactions cause a reduction in performance of convolutional networks ?
Ans. : Not really : since we have deep layers, even though direct connections in a convolutional net are very sparse, units in the deeper layers can be indirectly connected to all or most of the input image.

Q.6 What is equivariance representation ?
Ans. : In case of convolution, the particular form of parameter sharing causes the layer to have a property called equivariance to translation.
Q.7 List the types of pooling.
Ans. : Types of pooling are max pooling, average pooling, L2 norm pooling and weighted average pooling.

Q.8 Explain pros of tiled convolution.
Ans. :
• It offers a compromise between a convolutional layer and a locally connected layer.
• Memory requirements for storing the parameters will increase only by a factor of the size of this set of kernels.
Q.9 What is a convolution ?

Ans. : Convolution is an orderly procedure where two sources of information are intertwined;
it's an operation that changes a function into something else.
Q.10 Which are four main operations in a CNN ?
Ans. : Four main operations in a CNN are convolution, non-linearity (ReLU), pooling or sub-sampling and classification (fully connected layer).


Q.11 Define full convolution.

Ans. : Full convolution applies the maximum possible padding to the input feature maps before convolution. The maximum possible padding is the one where at least one valid input value is involved in all convolution cases.
Q.12 What is an autoencoder ?

Ans. : Autoencoders are neural networks that can learn to compress and reconstruct input data, such as images, using a hidden layer of neurons. An autoencoder model consists of two parts : an encoder and a decoder.
Q.13 What is aim of autoencoder ?

Ans. : The aim of an autoencoder is to learn a lower-dimensional representation (encoding) for higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input image.



Q.14 What is regularization in autoencoders ?
Ans. : Regularized autoencoders use a loss function that encourages the model to have other properties besides copying its input to its output.
Q.15 Is autoencoder supervised or unsupervised ?
Ans. : An autoencoder is a neural network model that seeks to learn a compressed representation of an input. They are an unsupervised learning method, although technically, they are trained using supervised learning methods, referred to as self-supervised.
Q.16 Why do we use autoencoder ?
Ans. : An autoencoder aims to learn a lower-dimensional representation (encoding) for higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input image.
□□□
