Adaptive Hough Transform With Optimized Deep Learning Followed by Dynamic Time Warping For Hand Gesture Recognition
https://doi.org/10.1007/s11042-021-11469-9
Abstract
Hand gesture is a natural interaction method, and hand gesture recognition is familiar in human–computer interaction. Yet, the variations and the complexity of hand gestures, such as self-structural characteristics, views, and illuminations, make hand gesture recognition a challenging task. Nowadays, advances in the human–computer interaction area have driven interest in dynamic hand gesture segmentation as the basis of gesture recognition systems. Despite considerable practical success, dynamic hand gesture segmentation through webcam vision remains challenging due to light effects, partial occlusion, and complicated environments. Hence, to segment the entire hand gesture region and enhance the segmentation accuracy, this paper develops an improved segmentation and deep learning-based strategy for dynamic hand gesture recognition. The data is gathered from the ISL benchmark dataset, which consists of both static and dynamic images. The initial process of the proposed model is pre-processing, performed by grey scale conversion and histogram equalization. Further, the segmentation of gestures is done by the novel Adaptive Hough Transform (AHT), in which the theta angle is tuned. Once the segmentation of gestures is done, the optimized Deep Convolutional Neural Network (Deep CNN) is used for gesture recognition; its learning rate, epoch count, and hidden neurons are tuned by the same heuristic concept. As the main contribution, the segmentation and classification are enhanced by the hybridization of Electric Fish Optimization (EFO) and the Whale Optimization Algorithm (WOA), called the Electric Fish-based Whale Optimization Algorithm (E-WOA). The training of the optimized Deep CNN is handled by Dynamic Time Warping (DTW) to avoid redundant frames, thus enhancing the performance on dynamic hand gestures. Quantitative measurement is accomplished for evaluating hand gesture segmentation and recognition, which portrays the superior behaviour of the proposed model.
Keywords Dynamic and static hand gestures · Hand gesture recognition · Adaptive Hough transform · Deep convolutional neural network · Electric fish-based whale optimization algorithm · Dynamic time warping
* Manisha Kowdiki
manisha.kowdiki@mitwpu.edu.in
Extended author information available on the last page of the article
1 Introduction
Hand gestures act as a natural way of interacting with other people. Motivated by the sound and vision of human interaction, the utilization of hand gestures is among the most effective and powerful methods in Human–Computer Interaction (HCI). Hand gestures constitute a powerful inter-human communication modality; hence, they are regarded as a convenient and intuitive means of communication between machines and humans. This explains the research interest in the advancement and development of hand gesture technologies. The majority of the traditional solutions for gesture recognition are modeled for dynamic [19, 13] or static gestures [4, 11, 34]. There exist only a few solutions that handle both dynamic and static gestures [32] simultaneously. The majority of the present solutions handle two hands [24] or a single hand [29]; there exist no solutions for recognizing and tracking gestures from multiple hands. The outcomes are mostly reported as average recognition rates.
The DTW algorithm matches data sequences of distinct lengths, selecting and storing the minimum cumulative cost recursively [8, 23]. DTW is an effective classification technique, and most research in this field aims to enhance its accuracy and speed. Yet, the technique is limited for the following reasons. (1) The existing DTW technique works in an image-to-image manner when calculating the distances among samples, which lessens the generalization ability beyond the training samples. (2) In general, DTW techniques model objects as holistic time-series curves; local patterns are not examined alongside the global view, and this reliance on global features lessens flexibility [37].
Nowadays, deep learning approaches such as RNNs and CNNs have offered reasonable outcomes in computer vision research, and work is still ongoing to enhance their use in gesture recognition [39]. During gesture recognition, the main challenge faced by deep learning is the efficient representation of gesture movement. Deep learning-oriented approaches need a huge database for training the inputs efficiently enough to attain sufficient outcomes during testing. The entire processing is done inside the hidden layers; hence, researchers face a challenge in investigating the training process [12]. By contrast, the flexibility of DTW and its need for only a small database during training make it a good tool for the matching process, and it therefore acts as a flexible technique for analysing the extracted features. At present, accelerometer-oriented gesture recognition technologies depend on Dynamic Time Warping (DTW) [27, 40], Fuzzy Neural Networks (FNN) [5], Hidden Markov Models (HMM) [30], etc. Even though they offer some advantages, they are limited by spatial attitude variation of the mobile phone, accelerometer specifications, individual differences among users, and various other problems.
The major contributions of this paper are as follows:
• To recognize the gestures in the final step using the optimized Deep CNN, in which the enhancement is made by optimizing its learning rate, epoch count, and hidden neuron count using the proposed E-WOA, thus improving the recognition accuracy.
• To train the optimized Deep CNN using the DTW technique to avoid redundant frames and improve performance, and to compare the proposed E-WOA-Deep CNN with several optimization algorithms and machine learning algorithms to determine its superiority on both static and dynamic images.
The organization of the paper is as follows: Section 1 provides the introduction to the hand gesture recognition process. The literature related to hand gesture recognition methods is described in Section 2. The dataset description and pre-processing steps involved in static and dynamic hand gesture recognition are explained in Section 3. Section 4 portrays the improved gesture segmentation for static and dynamic hand gesture recognition. Section 5 describes the intelligent static and dynamic hand gesture recognition using deep learning with DTW. Section 6 provides the results and discussions. The paper is concluded in Section 7.
2 Literature Survey
2.1 Related Works
In 2016, Plouffe and Cretu [25] detailed the development of a natural gesture user interface based on depth data gathered using a Kinect sensor. The hand was assumed to be the nearest object. A novel algorithm enhanced the scanning time for recognizing the initial pixel, and a directional search algorithm permitted the identification of the complete hand contour. The fingertips were located using the k-curvature algorithm, and the gesture candidates were chosen by DTW. The observed gesture was compared with a prerecorded reference gesture series. This method exceeded the majority of the solutions in terms of static recognition and was comparable with respect to dynamic and static recognition of the sign language alphabet and familiar signs. The solution handled several hands inside the interest space. The evaluated areas comprised the natural control of a software interface and gesture and sign digit interpretation.
In 2016, Srivastava and Sinha [31] revealed a quaternion-oriented QDTW approach that characterized distinct hand/arm gestures and movements. The case study was related to the outdoor tennis game. A novel technique was developed for training several tennis shots; it also assessed the consistency and shots of a player. The accuracy was enhanced for both detection and classification. In 2018, Tang et al. [33] developed a Structured DTW technique for continuous hand trajectory recognition. An automatic continuous trajectory segmentation technique joined the velocity and template information for spotting the starting and finishing points. Distinct weights were assigned on the basis of the structured information. It was demonstrated on the Continuous Letter Trajectory (CLT) database and was robust in terms of diversity.
In 2016, Cheng et al. [6] defined an Image-to-Class DTW technique for recognizing both 3D hand trajectory gestures and 3D static hand gestures. The approach was twofold. Initially, it calculated the image-to-class DTW distance, by which better generalization ability was attained. In the second step, fingerlets were developed for the static gesture representation, and strokelets were proposed for the trajectory gesture representation. The DTW distance was calculated between a gesture category and a data sample. It was demonstrated on distinct 3D hand gesture datasets; the UESTC-HTG was gathered by means of a Kinect device. The recognition accuracy was enhanced on both trajectory gestures and static gestures.
In 2015, Wang and Li [35] developed a novel accelerometer-oriented gesture recognition system. The data collection was delimited automatically using the acceleration waveform, which handled the problems produced by the amplitude range. The angle offset was minimized using coordinate transformation theory. During training, the exemplars and clusters were extracted by Affinity Propagation (AP) and DTW. The classification enhanced the resolution, and the system returned better performance on the Android platform.
In 2018, Choi and Kim [7] introduced a modified DTW algorithm that differentiated gesture-position sequences on the basis of the gestural movement direction, since standard comparison did not take into account the two-dimensional behaviour of the user's movement; hence, the sequence comparison needed to be enhanced. Similar gestures were chosen by means of a filtering process, and the difference was computed using the ratio of the proportional distance and the Euclidean distance. The recognition-decline issue was handled by the chosen spline interpolation. The experiment was conducted on public databases such as G3D and MSRC-12, which demonstrated the enhanced performance of the proposed method.
In 2020, Ameur et al. [1] developed a dynamic hand gesture recognition technique based on touchless hand motions over a Leap Motion device. A bidirectional LSTM and a basic unidirectional LSTM were exploited separately. The temporal and spatial dependencies were considered among the network layers and the Leap Motion data during the backward and forward passes. This model was tested on the RIT dataset and the LeapGestureDB dataset and achieved better performance with respect to computational complexity and accuracy.
In 2013, Angel et al. [2] addressed a probability-oriented DTW for gesture recognition. A Gaussian-oriented probabilistic model was constructed using distinct samples, and the DTW cost was then redefined on the basis of this novel method. It was tested on challenging databases; from the analysis, better gesture recognition was attained on RGB-D data.
In 2021, Lv [15] proposed a somatosensory sensor for implementing gesture recognition, a capability that is very important in human–computer interaction. Initially, the gesture was converted into a gesture sequence consisting of micro-gestures that describe the different directions. Then, the gesture sequences were compared with gesture templates, and the gesture was identified from the pattern matching result. From the results, more than 92% of gestures could be identified by the proposed recognition method.
In 2021, Blazkiewicz [3] aimed to quantify the degree of asymmetry of kinematic and kinetic parameters caused by different ankle orthosis settings, using Dynamic Time Warping (DTW). Barefoot gait and gait with four different walker settings were tested in eighteen healthy persons. Kinematic and kinetic parameters were measured using the Vicon system and Kistler plates. From the results, the orthosis position of this study fulfils its protective function, but the 15DF gait setting can lead to overload of the knee and hip joints.
2.2 Review
Hand gesture recognition is an important area in language technology and computer science that aims to interpret hand gestures through mathematical algorithms; gestures can emerge from any state or bodily motion. It can be accomplished using several methods, such as DTW, deep learning, and the Hough transform. But drawbacks exist: the recognition is not always precise, long conversations are not possible, information gets distorted, and understanding is complex. Table 1 lists the features and the challenges of the existing models associated with hand gesture recognition. The k-curvature algorithm [25] can be employed in the natural control of a software application and is utilized for the real-time interpretation of familiar gestures and sign digits; but it does not speed up gesture recognition using a distinct version of DTW. QDTW [31] offers better accuracy and can be utilized in outdoor swing-oriented sports and indoor gaming; still, it does not propose a ranking system for a player. SDTW [33] segments the trajectories automatically by joining the SVM classifier and the DTW algorithm and also improves the significance related to the structure information; yet it does not minimize the overlap problem between complex and simple letter trajectories. I2C-DTW [6] attains better generalization ability and is also useful when multiple users appear together; but it does not automatically learn the few parameters needed for handling multiple-user gesture recognition and continuous gesture recognition. MVSAMP [35] is more robust to angle offset and waveform distortion and also offers better performance in user-independent and user-dependent recognition; still, the offset produced by the yaw angle is not resolved. The modified DTW algorithm [7] alleviates the matching errors and also provides a lower learning pressure and a simple calculation process; yet it is very time-consuming. HBU-LSTM [1] enhances the model performance by considering the temporal and spatial dependencies and effectively classifies the input data attained from the LMC; but it cannot be implemented on GPU. Probability-based DTW [2] benefits from both the temporal warping ability and the generalization ability and can also solve multiple deformations in data, but the gesture-discriminative features are not attained. The Kinect sensor approach [15] is used for checking whether the threshold is small, but its computational cost is very high. DTW [3] has a simple calculation process and a lower learning pressure, but it also has a very high computational cost. Thus, it is required to develop novel deep learning methods that generate effective performance for hand gesture recognition while handling both dynamic and static images.
3.1 Dataset Description
The dataset used here is the benchmark IIITA-ROBITA Indian Sign Language (ISL) Ges-
ture Database. The recognition and interpretation of ISL gestures are becoming an inter-
esting area of research in the area of Human–Robot Interaction. The IIITA-ROBITA ISL
Gesture Database is offered by the Indian Institute of Information Technology Allahabad
(IIITA)-ROBITA to the gesture recognition researchers for generating future enhancement
in this area [21, 22]. The data was generated at the AI and Robotics Lab, IIIT-Allahabad, from July 2009.
Table 1 Features and challenges of state-of-the-art hand gesture recognition methods
Plouffe and Cretu [25], k-curvature algorithm. Features: can be used for the real-time interpretation of familiar gestures and sign digits; can be employed in the natural control of a software application. Challenges: the gesture recognition is not speeded up by a distinct version of DTW.
Srivastava and Sinha [31], QDTW. Features: can be utilized in outdoor swing-oriented sports and indoor gaming; provides better accuracy. Challenges: a ranking system is not proposed for a player.
Tang et al. [33], SDTW. Features: the significance related to the structure information is improved; the trajectories are segmented automatically by joining the SVM classifier and the DTW algorithm. Challenges: the overlap problem between complex and simple letter trajectories is not minimized.
Cheng et al. [6], I2C-DTW. Features: helpful when multiple users appear together; attains better generalization ability. Challenges: the few parameters are not learned automatically for handling multiple-user gesture recognition and continuous gesture recognition.
Wang and Li [35], MVSAMP. Features: provides better performance in user-independent and user-dependent recognition; more robust to angle offset and waveform distortion. Challenges: cannot resolve the offset produced by the yaw angle.
Choi and Kim [7], Modified DTW algorithm. Features: offers a lower learning pressure and a simple calculation process; the matching errors are alleviated. Challenges: very time-consuming.
Ameur et al. [1], HBU-LSTM. Features: the input data attained from the LMC is effectively classified; the model performance is enhanced with consideration of the temporal and spatial dependencies. Challenges: cannot be implemented on GPU.
Angel et al. [2], Probability-based DTW. Features: can handle multiple deformations in data; benefits from both the temporal warping ability and the generalization ability. Challenges: the gesture-discriminative features are not attained.
Lv [15], Kinect sensor. Features: used for checking whether the threshold is small. Challenges: the computational cost is very high.
Blazkiewicz [3], DTW. Features: a simple calculation process and a lower learning pressure. Challenges: very high computational cost.
The dataset is composed of 23 distinct gestures captured at 320 × 240 pixels and 30 fps with a Sony Handycam. A constant background is maintained under several light illumination conditions. The ISL gesture dataset consists of sequences of RGB frames for 23 isolated ISL gestures. Some sample static as well as dynamic images used here for hand gesture recognition are displayed in Figs. 1 and 2.
3.2 Image Pre‑processing
In the proposed hand gesture recognition model, the image pre-processing is performed using grey scale conversion and histogram equalization. In the static type, images are considered for processing, and in the dynamic type, video frames are utilized. Assume the database used for recognition is $QA = \{MZ_{cy}\}$, in which $cy = 1, 2, \cdots, CY$ and $CY$ represents the total number of hand gesture images or videos used for recognition. The description of the two methods adopted in the pre-processing phase is given below.
Fig. 2 Sample dynamic images of ISL dataset for different frames for different signs
3.2.1 Grey scale conversion

The grey scale image is formed by passing the colour intensity $EQJQ$ of all pixels in the colour image through the trees of a random forest. At every tree node, the group of pixels is divided according to the stored binary test $\varphi^{*}$ and is sent to the right or left child node until a leaf node is met. The path of the $iq$-th pixel through all trees of the forest ends in a group of leaf nodes $LFQ_{iq}$. The values stored in $LFQ_{iq}$ comprise the covariance matrix $\Sigma_{lq}$ and the leaf node value $\widehat{IQ}_{lq}$. Using these values, the RGB colours of the $iq$-th pixel are transformed to the grey scale value $GVQ_{iq}$ as in Eq. (1):

$$GVQ_{iq} = \sum_{lq \in LFQ_{iq}} \omega_{lq}\, dec\!\left(\widehat{IQ}_{lq}\right) \qquad (1)$$
3.2.2 Histogram equalization
The histogram equalization approach [9] varies the intensity distribution of an image, thereby improving its contrast. Assume $MZ_{cy}$ is the given image, represented by a $kq_{lq} \times kq_{mq}$ grey matrix with pixel intensities within 0 to 1, and the possible intensity value count is represented by $PIVQ$, which is typically equal to 256. The normalized histogram $NHQ$ of $MZ_{cy}$, having a bin for every possible intensity $heq$, is given in Eq. (2):

$$NHQ = \frac{\text{number of pixels with intensity } heq}{\text{total number of pixels}} \qquad (2)$$

Hence, the final pre-processed image obtained using histogram equalization is represented by $MZ_{cy}^{his}$, and this is further employed for the process of segmentation.
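For concreteness, the two pre-processing steps can be sketched in a few lines of Python. This is a minimal illustration assuming OpenCV is available; it approximates the forest-based grey scale conversion of Eq. (1) with the standard colourimetric conversion, so it is not the authors' exact implementation, and the file path is a placeholder.

```python
import cv2
import numpy as np

def preprocess(image_bgr: np.ndarray) -> np.ndarray:
    """Grey-scale conversion followed by histogram equalization.

    A minimal sketch of the pre-processing stage; the paper's
    random-forest decolourisation (Eq. 1) is approximated here by
    OpenCV's standard BGR-to-grey conversion.
    """
    grey = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)   # grey scale conversion
    equalized = cv2.equalizeHist(grey)                   # spread out frequent intensities
    return equalized

# usage on a single hand gesture image (path is illustrative)
img = cv2.imread("isl_gesture.jpg")
pre = preprocess(img)
```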
Nowadays, a major challenging task in computer vision is hand gesture recognition. Present methodologies have revealed preliminary outcomes on simple scenarios, yet they are far from human performance. Owing to the huge count of potential applications that include human gesture recognition in fields such as clinical assistance, sign language recognition, or surveillance, among others, there exists a vast active research community handling this problem. Hand gesture recognition is an illustration of sequential learning. The major problem arises from the distinct temporal duration of the data sequences, which may even consist of fundamentally distinct groups of component elements. Two major families of approaches are associated with this problem: methods like Conditional Random Fields (CRF) or Hidden Markov Models (HMM), which are generally used for handling the problem from a probabilistic viewpoint in the case of classification problems, and key pose techniques for hand gesture recognition. Dynamic programming-oriented algorithms are utilized for both the clustering and the alignment of temporal series; the most familiar dynamic programming technique utilized for hand gesture recognition is Dynamic Time Warping (DTW). Yet, applying these methods in complex settings becomes a challenging task owing to highly changing environmental conditions. Familiar problems are partial occlusions, illumination variations, unexpected object appearance, speed, human action spontaneity, human movement continuity, background influence, the wide variety of human pose configurations, and distinct viewpoints. These impacts produce dramatic variations in the specific gesture description, thereby producing great intra-class variability. The architecture of the proposed hand gesture recognition model is displayed in Fig. 3.
The proposed hand gesture recognition model is composed of four phases: data collection, pre-processing, segmentation, and recognition. In the initial phase, the data is gathered from the standard ISL dataset, which consists of static as well as dynamic images. After collecting the data, the pre-processing phase begins. It is done to enhance the image, suppressing unwanted distortions and improving the features of the image that are significant for further processing. Here, the pre-processing is accomplished using grey-scale conversion and histogram equalization. Pre-processing is useful to expose how the data should be structured on the basis of domain knowledge. In the grey scale conversion, the RGB values are converted into grey scale values, which measure the intensity of light present in images; less information is carried in each pixel, which makes the processing simpler. The histogram equalization is used to enhance the contrast present in the images by efficiently spreading out the most frequent intensity values. Once the pre-processed image is attained, it is subjected to the third phase of segmentation. Here, the AHT is used to perform the segmentation process. The Hough transform is used to find imperfect instances of objects within a specific class of shapes. An improvement is made in the Hough transform by optimizing the theta parameter using the proposed E-WOA; hence, the segmentation is called adaptive. The Hough transform is very useful when detecting lines with short breaks in images due to noise. This segmented image is subjected to the final recognition phase, in which the recognition is performed by the optimized Deep CNN. Here, an enhancement is made in the Deep CNN by optimizing its learning rate, epoch count, and hidden neuron count with the same proposed E-WOA. E-WOA has the capability of avoiding local optima and is also useful for solving unconstrained and constrained problems. The learning rate is a tuning parameter defining the step size at every iteration. An epoch represents the count of passes over the complete training dataset that the algorithm has taken. The hidden neurons accomplish the nonlinear transformations of the inputs subjected to the network. The training of the Deep CNN is done with the DTW to avoid redundant frames and improve the performance when dealing with dynamic data. The implementation of the Deep CNN is very fast. DTW permits comparisons of two time-series sequences having differing speeds and lengths; it alleviates the matching errors and also provides a lower learning pressure. In the final step, the Deep CNN returns the recognized gesture as output. Owing to the advantages of these processing steps, the gesture segmentation and recognition are performed efficiently.
Fig. 3 Architecture of the proposed hand gesture recognition model: pre-processing (grey-scale conversion and histogram equalization), segmentation (AHT with the theta angle tuned by the proposed E-WOA), and recognition (optimized Deep CNN with learning rate, epoch count, and hidden neuron count tuned by E-WOA, trained with DTW) yielding the recognized gesture
The Hough transform [20] performs reliably even in the availability of noise. It acts as a robust tool for extracting features like ellipses, circles, or straight edges. The primitives are described by polygons and are defined in a parametric format. It is employed as one step in a processing chain. It has the capability to find, quantify, and extract shapes, recognizing those features and shapes even for incomplete or broken outlines or noise-corrupted thresholded images. The shape of interest is transformed into its parameter space. A line in a Cartesian coordinate system $(x_h, y_h)$ is defined as in Eq. (4):
$$y_h = m_h x_h + b_h \qquad (4)$$

Here, the term $b_h$ describes its intercept with the $y_h$ axis, and the constant $m_h$ denotes the slope. Every line is characterized uniquely by the constants $b_h$ and $m_h$; hence, any line is described by a point in a coordinate system $(b_h, m_h)$. Conversely, any point $(x_h, y_h)$ is linked with a group of values for $b_h$ and $m_h$, and hence Eq. (4) is rewritten as in Eq. (5):

$$m_h = \frac{y_h}{x_h} - \frac{1}{x_h}\, b_h \qquad (5)$$
Every point $(x_h, y_h)$ is defined by a line in $(m_h, n_h)$ space. The values present in $(m_h, n_h)$ space are not defined for vertical lines. When a group of points $(yh_{kh}, xh_{kh})$ lying on a line defined by $y_h = MH \cdot x_h + BH$ is converted into $(m_h, n_h)$ space, known as parameter space or Hough space, every point is defined by a line in Hough space as in Eq. (6):

$$m_h = \frac{yh_{kh}}{xh_{kh}} - \frac{1}{xh_{kh}}\, b_h \qquad (6)$$
Assume that all such lines meet at one point $(MH, NH)$. The Hough transform can also use polar coordinates, as in Eq. (7):

$$\rho = x_h \cos\theta + y_h \sin\theta \qquad (7)$$
Here, the angle of the line with the horizontal axis is defined by $\theta$, the minimum distance to the origin is defined by $\rho$, and these are linked to $b_h$ and $m_h$ via Eq. (8) and Eq. (9):

$$m_h = -\frac{\cos\theta}{\sin\theta} \qquad (8)$$

$$b_h = \frac{\rho}{\sin\theta} \qquad (9)$$
The Hough space is bi-dimensional, having coordinates $\theta$ and $\rho$. The initial step in the Hough transform method is the formation of a 2D parameter space. Every element of this matrix corresponds to a straight line, so the parameter matrix describes only a finite number of lines. Next, a counter is incremented for every point that votes for an element of the parameter space; hence, the parameter matrix is generally known as an accumulator. The counters are initialized to zero before initiating the transformation. Here, the improvement is made in the Hough transform by optimizing its $\theta$ value with the proposed E-WOA; hence, it is called AHT. The main objective of the proposed E-WOA-based gesture segmentation is to maximize the accuracy by optimizing the theta value of the Hough transform. This characteristic is defined in Eq. (10):

$$Obn_1 = \arg\max_{\{\theta\}} (Acr) \qquad (10)$$
Here, the term $Obn_1$ denotes the objective function for the gesture segmentation, $\theta$ denotes the theta angle of the Hough transform that is to be optimized by the proposed E-WOA, and $Acr$ denotes the accuracy. Accuracy is defined as "the degree to which the result of a measurement conforms to the correct value or a standard". The formula for accuracy is defined in Eq. (11):

$$Acr = \frac{p_t + n_t}{p_t + n_t + p_f + n_f} \qquad (11)$$

Here, the terms $p_t$, $p_f$, $n_t$, and $n_f$ denote the true positive, false positive, true negative, and false negative, respectively. The bounding limit of the theta value lies between 0.01 and 0.10. The solution encoding of the E-WOA-based gesture segmentation is portrayed in Fig. 4.
The major drawback of the Hough transform is that it produces misleading outcomes when objects appear to be aligned by chance, and the detected lines are infinite lines defined by their $(m_h, n_h)$ values rather than finite lines with defined end points. Hence, to overcome these drawbacks, the parameter $\theta$ is optimized in the Hough transform. The AHT is tolerant of gaps in the edges, unaffected by occlusion in the image, and also not affected by noise. Thus, the final AHT-based gesture segmented image is represented as $MZ_{cy}^{hough}$.

Fig. 4 Fitness evaluation of the E-WOA-based gesture segmentation: the accuracy of the segmented output is checked; if it is not high, the solution is updated by E-WOA, and the optimized segmented output is returned at termination
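To make the tunable parameter concrete, a minimal sketch of exposing the theta resolution of the line Hough transform to the optimizer is given below, assuming OpenCV. The probabilistic Hough variant and the `accuracy_fn` oracle are illustrative assumptions rather than the authors' exact pipeline.

```python
import cv2
import numpy as np

def hough_segment(pre_img: np.ndarray, theta_res: float) -> np.ndarray:
    """Detect line segments after Canny edge detection.

    theta_res is the angular resolution (radians) of the Hough
    accumulator; in the proposed AHT this is the value tuned by
    E-WOA within its bounding limits (0.01 to 0.10).
    """
    edges = cv2.Canny(pre_img, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=theta_res, threshold=50,
                            minLineLength=20, maxLineGap=5)
    mask = np.zeros_like(pre_img)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(mask, (x1, y1), (x2, y2), 255, 2)   # draw detected segments
    return mask

def fitness(theta_res: float, images, accuracy_fn) -> float:
    """Objective Obn1 of Eq. (10): accuracy of the segmented output.

    accuracy_fn is a hypothetical oracle comparing the segmented
    masks against ground-truth masks.
    """
    masks = [hough_segment(img, theta_res) for img in images]
    return accuracy_fn(masks)
```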
4.2 Proposed E‑WOA
The proposed E-WOA is used to optimize the theta parameter of the Hough transform in the segmentation phase and the learning rate, epoch count, and hidden neuron count of the Deep CNN in the recognition phase. The WOA [17] mimics humpback whales and is motivated by the bubble-net hunting method. After identifying the best search agent, the remaining search agents update their positions toward the best search agent, as depicted in Eqs. (12) and (13):
$$\vec{DF} = \left| \vec{CF} \cdot \vec{XA}^{*}(iaja) - \vec{XA}(iaja) \right| \qquad (12)$$

$$\vec{XA}(iaja + 1) = \vec{XA}^{*}(iaja) - \vec{AF} \cdot \vec{DF} \qquad (13)$$

Here, the position vector of the best solution is represented by $\vec{XA}^{*}$, element-by-element multiplication is represented by '$\cdot$', the current iteration is represented by $iaja$, the position vector is represented by $\vec{XA}$, the coefficient vectors are represented by $\vec{AF}$ and $\vec{CF}$, and the absolute value is represented by $|\cdot|$. The coefficient vectors are measured as in Eqs. (14) and (15):

$$\vec{AF} = 2\vec{af} \cdot \vec{ra} - \vec{af} \qquad (14)$$

$$\vec{CF} = 2 \cdot \vec{ra} \qquad (15)$$
In the above equations, $\vec{af}$ is decreased from 2 to 0 and $\vec{ra}$ represents a random vector in $[0, 1]$. Two approaches are defined to describe the bubble-net characteristics, called the exploitation phase of the humpback whales. In the first approach, called the shrinking encircling mechanism, the term $\vec{af}$ is decreased. In the second approach, called spiral updating position, the distance is measured between the whale positioned at $(XA, YF)$ and the prey positioned at $(XA^{*}, YF^{*})$. This behaviour is shown in Eq. (16):

$$\vec{XA}(iaja + 1) = \vec{DF}' \cdot e^{bf \cdot lf} \cdot \cos(2\pi lf) + \vec{XA}^{*}(iaja) \qquad (16)$$

Here, the term $lf$ represents a random number, $bf$ represents a constant, '$\cdot$' represents element-by-element multiplication, and $\vec{DF}' = \left| \vec{XA}^{*}(iaja) - \vec{XA}(iaja) \right|$ represents the distance of the $ia$-th whale to the prey. The spiral updating position is modelled as in Eq. (17):

$$\vec{XA}(iaja + 1) = \begin{cases} \vec{XA}^{*}(iaja) - \vec{AF} \cdot \vec{DF} & \text{if } pf < 0.5 \\ \vec{DF}' \cdot e^{bf \cdot lf} \cdot \cos(2\pi lf) + \vec{XA}^{*}(iaja) & \text{if } pf \geq 0.5 \end{cases} \qquad (17)$$
In the above equation, the random number in $[0, 1]$ is represented by $pf$. The prey is searched in a random manner. The search for prey is the exploration phase, based on the variation of the $\vec{AF}$ vector. The search agent in the exploration phase is updated on the basis of a randomly selected search agent rather than the best search agent attained so far. This mechanism, together with $|\vec{AF}| > 1$, performs the exploration and permits a global search, as shown in Eqs. (18) and (19):

$$\vec{DF} = \left| \vec{CF} \cdot \vec{XA}_{rand} - \vec{XA} \right| \qquad (18)$$

$$\vec{XA}(iaja + 1) = \vec{XA}_{rand} - \vec{AF} \cdot \vec{DF} \qquad (19)$$
The WOA offers several advantages, such as better exploitation, exploration, convergence behaviour, and local optima avoidance. But it suffers from some shortcomings, such as weak search space exploration. Hence, to overcome the shortcomings, EFO is integrated into it, and the resulting algorithm is called E-WOA. The EFO has several advantages, such as highly compelling convergence ability and better global searchability. The EFO algorithm [38] is motivated by the communication characteristics and prey location of the electric fish. The passive and active electrolocation ability of these fishes makes them the best candidate for handling the global as well as the local search. Electric fish constitute only a fraction of all fish species; they live in muddy water and are nocturnal. They possess a species-specific ability called electrolocation, which serves as the distinguishing ability for locating obstacles and prey. They are classified as weakly or strongly electric fish on the basis of the electric field strength produced. Strongly electric fish use the electrolocation ability for offensive purposes, whereas weakly electric fish employ it to detect objects, communicate, and navigate. The behaviours that rely on self-organization are multiple interactions, fluctuation, negative feedback, and positive feedback. The intelligent characteristics of electric fish are modelled as Electric Organ Discharge (EOD) amplitude, EOD frequency, passive electrolocation, and active electrolocation.
In general, for the traditional WOA, if (pf < 0.5), it checks whether (|AF| < 1). If it is
satisfied, then the current search agent is updated using Eq. (12), and if (|AF| ≥ 1), the
current search agent position is updated using Eq. (19). But, in the proposed E-WOA, if
(pf < 0.5), it checks the condition whether (|AF| ≥ 1). If this condition is fulfilled, then the
current search agent position is updated using Eq. (19). Otherwise, if (|AF| < 1), then the
update takes place using EFO as in Eq. (20).
$$XA_{iaja}^{cand} = XA_{iaja} + \phi \left( XA_{kaja} - XA_{iaja} \right) \qquad (20)$$

Here, the term $ka$ denotes a randomly selected individual from the neighbour group of the $ia$-th individual, the dimension is defined by $ja \mid ja \in \{1, 2, \cdots, da\}$, the candidate location of the $ia$-th individual is defined by $XA_{iaja}^{cand}$, and a random number produced from a uniform distribution is defined by $\phi \in [-1, 1]$. In the other case, it checks whether $(pf \geq 0.5)$, and if this condition is satisfied, then the current search agent position of WOA is updated using Eq. (16). The pseudocode of the proposed E-WOA is shown in Algorithm 1, and the flowchart of the developed E-WOA is depicted in Fig. 5.
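Under the equations above, a single E-WOA position update can be sketched in NumPy as follows. This is a simplified illustration, not the authors' full implementation: the EFO neighbour group is reduced to a random population member, the constant bf is set to 1, and the magnitude test on the vector AF uses its Euclidean norm.

```python
import numpy as np

def e_woa_step(pop, best, af_scalar, rng):
    """One E-WOA position update over a population matrix.

    pop      : (n_agents, dim) current positions
    best     : (dim,) best solution XA* found so far
    af_scalar: the control value af, decreased from 2 to 0 over iterations
    """
    n, dim = pop.shape
    new_pop = np.empty_like(pop)
    for i in range(n):
        pf = rng.random()
        AF = 2 * af_scalar * rng.random(dim) - af_scalar      # Eq. (14)
        CF = 2 * rng.random(dim)                              # Eq. (15)
        if pf < 0.5:
            if np.linalg.norm(AF) >= 1:                       # exploration, Eqs. (18)-(19)
                rand = pop[rng.integers(n)]
                DF = np.abs(CF * rand - pop[i])
                new_pop[i] = rand - AF * DF
            else:                                             # EFO move, Eq. (20)
                ka = pop[rng.integers(n)]                     # random neighbour (simplified)
                phi = rng.uniform(-1, 1, dim)
                new_pop[i] = pop[i] + phi * (ka - pop[i])
        else:                                                 # spiral update, Eq. (16), bf = 1
            lf = rng.uniform(-1, 1)
            DFp = np.abs(best - pop[i])
            new_pop[i] = DFp * np.exp(lf) * np.cos(2 * np.pi * lf) + best
    return new_pop

rng = np.random.default_rng(0)
pop = rng.random((10, 3))       # population size 10, as in the experiments
best = pop[0]
pop = e_woa_step(pop, best, af_scalar=1.5, rng=rng)
```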
A hybrid algorithm [16] joins two or more algorithms for handling a critical problem by selecting one algorithm or switching among them over the course of the run. This is usually performed to combine the desirable features of each, such that the new algorithm is better than the individual components. It is modelled to provide better performance when compared with the individual algorithms. It exploits the good properties of distinct techniques by applying them to the sub-problems they can handle effectively. It can also solve multi-objective optimization engineering problems having inequality constraints.
While considering the dynamic image (video), the training as well as the testing involves a number of frames that offer distinct information. Therefore, frames carrying repeated information, which differ minimally from the earlier frame, are eradicated.
DTW [26] model measures the disparity between two data series that are attained at dif-
ferent times. A matrix comprising of the Euclidean distances at aligned points above the
two series is utilized for computing the minimal cost between the two series. Further, the
direction of the shortest path selection is associated with specific regulations and rules.
In particular, the movement is lessened to diagonal, horizontal, and vertical directions. A
weight is associated with these directions. The shortest path is limited in the case of thresh-
old to the two series. These are needed to be equivalent ones. Thus, the measurement of the
distance between the two frames helps to remove the repeated frames based on the DTW
concept.
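A minimal NumPy sketch of this DTW cost and the resulting redundant-frame filter is shown below. The per-frame feature representation (each frame's rows treated as a series) and the redundancy threshold are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np

def dtw_distance(seq_a: np.ndarray, seq_b: np.ndarray) -> float:
    """Classic DTW with diagonal, horizontal, and vertical moves.

    seq_a, seq_b are 2D arrays (time steps x feature dim), e.g. a
    frame's row-wise feature profile.
    """
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])   # Euclidean local cost
            cost[i, j] = d + min(cost[i - 1, j],              # vertical
                                 cost[i, j - 1],              # horizontal
                                 cost[i - 1, j - 1])          # diagonal
    return float(cost[n, m])

def drop_redundant_frames(frames, threshold=1.0):
    """Keep a frame only if it differs enough from the previously kept one."""
    kept = [frames[0]]
    for f in frames[1:]:
        if dtw_distance(kept[-1], f) > threshold:   # threshold is illustrative
            kept.append(f)
    return kept
```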
The optimized Deep CNN is used for gesture recognition, in which the improvement is made by optimizing its learning rate, epoch count, and hidden neuron count with the proposed E-WOA. The Deep CNN [28] is a feedforward network in which information flows in a single direction; these are biologically inspired networks. CNN architectures are composed of pooling and convolutional layers grouped in the form of modules. The modules are arranged one above the other to generate a deep model. One or several fully connected layers are fed by these representations, and the final fully connected layer produces the class label as output.
5.2.1 Convolutional layers
These act as feature extractors; hence, feature representations of the input images are learned. The neurons are arranged in the form of feature maps. Every neuron has a receptive field that is linked to its neighbourhood in the earlier layer through a group of trainable weights called a filter bank. A new feature map is computed by convolving the inputs with the learned weights, and the outcomes are passed through a nonlinear activation function. The neurons of a feature map are constrained to share equal weights; different feature maps have distinct weights, and hence it is possible to extract various features at every location. The $kz$-th output feature map $YZ_{kz}$ is measured as in Eq. (21):

$$YZ_{kz} = f\left( WZ_{kz} * xz \right) \qquad (21)$$

Here, the convolutional filter is represented by $WZ_{kz}$, the input image is represented by $xz$, the sign $*$ represents the 2D convolution operator, and $f(\cdot)$ defines the nonlinear activation function. Nonlinear features are extracted by the nonlinear activation functions. Earlier methods used hyperbolic tangent and sigmoid functions; Rectified Linear Units (ReLUs) are very familiar these days.
5.2.2 Pooling layers
This layer minimizes the spatial resolution of the feature maps. Average pooling propagates the average of the input values to the next layer, whereas max-pooling aggregation layers propagate the maximum value inside a receptive field, choosing the largest element in every receptive field as in Eq. (22):

$$YZ_{kzizjz} = \max_{(pz,\, qz)\, \in\, R_{izjz}} xz_{kzpzqz} \qquad (22)$$

In the above equation, the pooling operation output is represented by $YZ_{kzizjz}$, the element at the location $(pz, qz)$ is represented by $xz_{kzpzqz}$, and $R_{izjz}$ denotes the receptive field around position $(iz, jz)$.
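Equations (21) and (22) can be made concrete with a short NumPy sketch of one convolutional feature map followed by ReLU activation and 2 × 2 max pooling; the random input and filter below are placeholders, not learned weights.

```python
import numpy as np

def conv2d_valid(xz: np.ndarray, wz: np.ndarray) -> np.ndarray:
    """Single-channel 'valid' 2D convolution, i.e. WZ_kz * xz in Eq. (21)."""
    kh, kw = wz.shape
    H, W = xz.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(xz[i:i + kh, j:j + kw] * wz)
    return out

def relu(a: np.ndarray) -> np.ndarray:
    """Nonlinear activation f(.) of Eq. (21)."""
    return np.maximum(a, 0.0)

def max_pool2x2(a: np.ndarray) -> np.ndarray:
    """Propagate the largest element of every 2x2 receptive field (Eq. 22)."""
    H, W = a.shape
    a = a[:H - H % 2, :W - W % 2]                  # crop to even size
    return a.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

xz = np.random.rand(8, 8)          # toy input image
wz = np.random.rand(3, 3)          # convolutional filter (random placeholder)
feature_map = max_pool2x2(relu(conv2d_valid(xz, wz)))
```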
5.2.3 Fully connected layers

Various convolutional and pooling layers are stacked above each other to extract increasingly abstract feature representations. The fully connected layers that follow accomplish the function of high-level reasoning. For classification problems, a softmax operator is utilized on top of the DCNN.
5.2.4 Training
The free parameters of the CNN are adjusted by learning algorithms to achieve the necessary network output. The familiar algorithm used here is backpropagation, which measures the gradient of an objective; the parameters are adjusted to reduce the errors. A problem that occurs in training is overfitting, which damages the capability of the model to generalize to unseen data and remains a major challenge addressed through regularization. DCNNs need various hyperparameters, such as the epoch count for running the model and the learning rate. Batch normalization permits higher learning rates. The learning rate governs the amount by which the weights are updated in the process of training; it is a configurable hyperparameter employed in the training of the Deep CNN and takes a small positive value, usually in the range between 0 and 1. An epoch describes the number of passes over the whole training dataset that the optimization algorithm has completed. If the batch size equals the entire training dataset, then the epoch count is nothing but the iteration count.
The major objective of the proposed E-WOA-based classification is to maximize the precision by optimizing the learning rate, epoch count, and hidden neuron count of the Deep CNN. This behaviour is mathematically modelled as in Eq. (23):

$$Obn_2 = \arg\max_{\{LR,\, EC,\, HNC\}} (Prec) \qquad (23)$$

In the above equation, the term $Obn_2$ represents the objective function for the classification; $LR$ denotes the learning rate, $EC$ denotes the epoch count, and $HNC$ denotes the hidden neuron count of the Deep CNN that are to be optimized by the proposed E-WOA; and $Prec$ denotes the precision. Precision is defined as "the ratio of positive observations that are predicted exactly to the total number of observations that are positively predicted". The formula for precision is defined in Eq. (24):

$$Prec = \frac{p_t}{p_t + p_f} \qquad (24)$$

Here, the terms $p_t$ and $p_f$ denote the true positive and false positive, respectively. The bounding limit of the learning rate lies between 0.01 and 0.09, the epoch count lies between 5 and 10, and the hidden neuron count lies between 5 and 256. The architectural representation of the proposed optimized Deep CNN-based recognition is depicted in Fig. 6.
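A minimal sketch of a Deep CNN exposing the three tuned hyperparameters is given below, assuming Keras/TensorFlow. The input shape (240 × 320 grey images) and class count (23 ISL gestures) follow the dataset description; the layer layout itself is an illustrative assumption, not the authors' exact architecture.

```python
import tensorflow as tf

def build_deep_cnn(learning_rate: float, hidden_neurons: int,
                   input_shape=(240, 320, 1), n_classes=23):
    """Deep CNN with the E-WOA-tuned hyperparameters as arguments."""
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hidden_neurons, activation="relu"),  # tuned in [5, 256]
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),  # tuned in [0.01, 0.09]
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# candidate hyperparameters proposed by E-WOA (within the stated bounds)
model = build_deep_cnn(learning_rate=0.01, hidden_neurons=128)
# model.fit(x_train, y_train, epochs=epoch_count)  # epoch count tuned in [5, 10]
```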
In the traditional Deep CNN, the orientation and position of the objects are not encoded
into their predictions. It does not have the capability to be spatially invariant to the input
data. It also entirely loses the information regarding the position and composition of the
components. Hence, the optimized Deep CNN is proposed to overcome these drawbacks. This
optimized Deep CNN can minimize the losses and offer the most accurate outcomes.
6.1 Experimental setup
The proposed E-WOA-Deep CNN-based hand gesture segmentation and recognition was implemented in Python, and the experiments were executed. The dataset was gathered from the ISL dataset, which consists of static as well as dynamic images. The population size considered was 10, and the maximum number of iterations was 25. The proposed E-WOA-Deep CNN was compared with various optimization algorithms such as EFO-Deep CNN [38], WOA-Deep CNN [17], GWO-Deep CNN [18], and PSO-Deep CNN [36], and with distinct machine learning algorithms like DH-GWO-NN [22], Deep CNN [28], RNN [14], and VGG16 [10], in terms of performance measures such as "accuracy, sensitivity, specificity, precision, FPR, FNR, FDR, NPV, F1 Score and MCC".
6.2 Performance measures
(c) Specificity: "the number of true negatives, which are determined precisely".

$$Spe = \frac{n_t}{n_t + p_f} \qquad (26)$$

(f) FNR: "the proportion of positives which yield negative test outcomes with the test".

$$FNR = \frac{n_f}{n_f + p_t} \qquad (28)$$

(g) NPV: "probability that subjects with a negative screening test truly don't have the disease".

$$NPV = \frac{n_t}{n_t + n_f} \qquad (29)$$

(h) FDR: "the number of false positives in all of the rejected hypotheses".

$$FDR = \frac{p_f}{p_f + p_t} \qquad (30)$$

(i) F1 Score: "harmonic mean between precision and recall. It is used as a statistical measure to rate performance".

$$F1Scre = \frac{2 \cdot Sens \cdot Prec}{Prec + Sens} \qquad (31)$$

(j) MCC: "correlation coefficient computed by four values".

$$MCC = \frac{p_t \times n_t - p_f \times n_f}{\sqrt{(p_t + p_f)(p_t + n_f)(n_t + p_f)(n_t + n_f)}} \qquad (32)$$
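Given the confusion counts pt, pf, nt, and nf, the measures above reduce to a few lines of Python; the sketch below mirrors Eqs. (11), (24), and (26)-(32), and the example counts are arbitrary.

```python
import math

def metrics(pt: int, pf: int, nt: int, nf: int) -> dict:
    """Derive the reported measures from confusion-matrix counts."""
    sens = pt / (pt + nf)                       # sensitivity (recall)
    prec = pt / (pt + pf)                       # precision, Eq. (24)
    denom = math.sqrt((pt + pf) * (pt + nf) * (nt + pf) * (nt + nf))
    return {
        "accuracy":    (pt + nt) / (pt + nt + pf + nf),   # Eq. (11)
        "sensitivity": sens,
        "specificity": nt / (nt + pf),                    # Eq. (26)
        "precision":   prec,
        "FPR":         pf / (pf + nt),
        "FNR":         nf / (nf + pt),                    # Eq. (28)
        "NPV":         nt / (nt + nf),                    # Eq. (29)
        "FDR":         pf / (pf + pt),                    # Eq. (30)
        "F1":          2 * sens * prec / (prec + sens),   # Eq. (31)
        "MCC":         (pt * nt - pf * nf) / denom,       # Eq. (32)
    }

print(metrics(pt=90, pf=10, nt=85, nf=15))
```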
6.3 Segmentation analysis
The experimental outcomes of the hand gesture recognition are displayed in Figs. 7 and 8, which include the original images, pre-processed images, Canny edge detection images, and Hough transformed images.

The performance analysis of the developed and traditional heuristic-oriented AHT and Deep CNN for hand gesture recognition using static and dynamic images, obtained by changing the learning percentages for the different performance measures, is depicted in Figs. 9 and 10. The positive measures reveal an increased result, and negative measures reveal a decreased result, defining the superiority of the proposed E-WOA-Deep CNN.
Fig. 7 Experimental outcomes of pre-processing and segmentation for hand gesture recognition in terms of
static images
Fig. 8 Experimental outcomes of pre-processing and segmentation for hand gesture recognition in terms of
dynamic images
From Fig. 9a, the accuracy of the E-WOA-Deep CNN at 65% learning percentage is 1.68%, 0.52%, 1.36%, and 1.25% higher than EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN. Considering Fig. 9b, at 65% learning percentage, the F1 Score of the E-WOA-Deep CNN is 0.92%, 0.31%, 0.82%, and 0.72% better than EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN.

Fig. 9 Performance analysis of the developed and traditional heuristic-oriented AHT and Deep CNN for hand gesture recognition using static images by changing the learning percentages for the measures a accuracy, b F1 score, c FDR, d FNR, e FPR, f MCC, g NPV, h precision, i sensitivity, and j specificity

Fig. 10 Performance analysis of the developed and traditional heuristic-oriented AHT and Deep CNN for hand gesture recognition using dynamic images by changing the learning percentages for the measures a accuracy, b F1 score, c FDR, d FNR, e FPR, f MCC, g NPV, h precision, i sensitivity, and j specificity
In Fig. 9c, the FDR of the E-WOA-Deep CNN at 85% learning percentage is 18.34%, 12.10%, 9.80%, and 13.21% better than EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN. Considering Fig. 9d, at 65% learning percentage, the FNR of E-WOA-Deep CNN is 8.28%, 21.05%, 44.44%, and 42.31% improved over EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN. Similarly, in the case of Fig. 10a, at 65% learning percentage, the accuracy of the E-WOA-Deep CNN is 0.63%, 0.21%, 0.84%, and 0.10% better than EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN. From Fig. 10e, the FPR at 85% learning percentage is 2.20%, 1.16%, 1.39%, and 0.54% superior to EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN. In the case of Fig. 10f, at 55% learning percentage, the MCC of the E-WOA-Deep CNN is 15.38%, 9.76%, 2.17%, and 4.26% improved over EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN. Moreover, in Fig. 10g, the NPV at 75% learning percentage is 1.79%, 4.26%, 0.41%, and 3.89% higher than EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN. Hence, better performance is achieved by the proposed E-WOA-Deep CNN for both the static and the dynamic images than by all the traditional algorithms.
The performance analysis over several traditional machine learning algorithms in terms of distinct performance measures is displayed in Figs. 11 and 12, respectively. The measures reveal better outcomes in the case of the proposed E-WOA-Deep CNN. From Fig. 11a, at 85% learning percentage, the accuracy of E-WOA-Deep CNN is 1.06%, 2.15%, and 3.26% better than CNN, RNN, and VGG16. Considering Fig. 11b, at 85% learning percentage, the F1 Score of E-WOA-Deep CNN is 6.38%, 5.26%, and 4.17% higher than CNN, RNN, and VGG16. In the case of Fig. 11c, the FDR at 85% learning percentage is 13.33%, 18.75%, and 1.47% improved over CNN, RNN, and VGG16. Considering Fig. 11d, the FNR at 75% learning percentage is 17.65%, 28.57%, and 25.93% superior to CNN, RNN, and VGG16. Further, in Fig. 12a, the accuracy at 55% learning percentage is 2.17%, 3.33%, and 1.08% superior to CNN, RNN, and VGG16. Considering Fig. 12e, the FPR at 45% learning percentage is 2.0%, 10.71%, and 4.17% better than CNN, RNN, and VGG16. From Fig. 12f, the MCC at 35% learning percentage is 16.28%, 19.05%, and 2.04% better than CNN, RNN, and VGG16. Similarly, in the case of Fig. 12g, the NPV at 85% learning percentage is 2.04%, 4%, and 2.13% better than CNN, RNN, and VGG16. Hence, in the machine learning comparison, the proposed E-WOA-Deep CNN performs better than the state-of-the-art methods on both the static and dynamic images.
The overall algorithmic analysis of several optimization algorithms against the proposed E-WOA-Deep CNN is listed in Tables 2 and 3 for the static and the dynamic images, respectively. The outcomes hold better for the proposed E-WOA-Deep CNN. In the case of Table 2 for static images, the accuracy of the E-WOA-Deep CNN is 1.02%, 1.14%, 1.02%, and 0.76% better than EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN. Additionally, the FPR of the E-WOA-Deep CNN is 1.16%, 0.71%, 1.82%, and 0.20% superior to EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN.
Fig. 11 Performance analysis of the developed and traditional machine learning algorithms for hand gesture
recognition using static images by changing the learning percentages for the measures a accuracy, b F1
score, c FDR, d FNR, e FPR, f MCC, g NPV, h precision, i sensitivity, and j specificity
Fig. 12 Performance analysis of the developed and traditional machine learning algorithms for hand gesture
recognition using dynamic images by changing the learning percentages for the measures a accuracy, b F1
score, c FDR, d FNR, e FPR, f MCC, g NPV, h precision, i sensitivity, and j specificity
Table 2 Overall performance analysis of developed and traditional heuristic-oriented AHT and Deep CNN
models for hand gesture recognition in static format
Performance measures | WOA-Deep CNN [17] | PSO-Deep CNN [36] | EFO-Deep CNN [38] | GWO-Deep CNN [18] | E-WOA-Deep CNN
Considering Table 3, for the dynamic images, the accuracy of the E-WOA-Deep CNN is 0.56%, 0.65%, 1.18%, and 0.92% higher than EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN. Similarly, the MCC of the E-WOA-Deep CNN for the dynamic images is 2.76%, 1.57%, 11.63%, and 4.26% higher than EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN. Thus, the proposed E-WOA-Deep CNN produced better outcomes for both the static and the dynamic images than all the traditional methods.
The classifier analysis of several machine learning algorithms against the proposed E-WOA-Deep CNN for both the static and dynamic images is given in Tables 4 and 5, respectively.
Table 3 Overall performance analysis of developed and traditional heuristic-oriented AHT and Deep CNN
models for hand gesture recognition in dynamic format
Performance measures | EFO-Deep CNN [38] | GWO-Deep CNN [18] | PSO-Deep CNN [36] | WOA-Deep CNN [17] | E-WOA-Deep CNN
Table 4 Overall classifier analysis of developed and traditional machine learning algorithms for hand ges-
ture recognition in a static format
Performance measures | RNN [14] | VGG16 [10] | Deep CNN [28] | DH-GWO-NN [22] | E-WOA-Deep CNN
The outcomes are superior for the proposed E-WOA-Deep CNN. Considering Table 4, the accuracy of E-WOA-Deep CNN is 2.20%, 3.05%, and 3.16% better than DH-GWO-NN, Deep CNN, VGG16, and RNN. The precision for static images is 4.19%, 0.30%, 0.23%, and 0.01% higher than DH-GWO-NN, Deep CNN, VGG16, and RNN. Similarly, the accuracy for dynamic images is 8.26%, 0.74%, 1.04%, and 1.03% superior to DH-GWO-NN, VGG16, RNN, and Deep CNN. Further, the specificity for dynamic images is 49.13%, 0.93%, 0.28%, and 1.19% improved over DH-GWO-NN, VGG16, RNN, and Deep CNN. Therefore, the classifier analysis outcomes hold good for the proposed E-WOA-Deep CNN for both the static and dynamic images when compared with several traditional machine learning algorithms.
Table 5 Overall classifier analysis of developed and traditional machine learning algorithms for hand ges-
ture recognition in dynamic format
Performance measures | Deep CNN [28] | RNN [14] | VGG16 [10] | DH-GWO-NN [22] | E-WOA-Deep CNN
7 Conclusion
This paper has proposed an improved segmentation and deep learning-based strategy for dynamic hand gesture recognition. The data was collected from the ISL benchmark dataset, which consists of both static and dynamic images. The pre-processing was accomplished by grey scale conversion and histogram equalization. The segmentation of gestures was performed by the AHT, with the theta angle tuned using the E-WOA. Next, the optimized Deep CNN was used for gesture recognition, where the learning rate, epoch count, and number of hidden neurons were optimized using the proposed E-WOA. The training of the optimized Deep CNN was done with the DTW, which avoided redundant frames and thus improved performance. From the analysis, the accuracy for static images of the E-WOA-Deep CNN was 1.02%, 1.14%, 1.02%, and 0.76% better than EFO-Deep CNN, WOA-Deep CNN, GWO-Deep CNN, and PSO-Deep CNN. Similarly, the accuracy for dynamic images was 8.26%, 0.74%, 1.04%, and 1.03% superior to DH-GWO-NN, VGG16, RNN, and Deep CNN. Therefore, the outcomes clearly demonstrated that the proposed E-WOA-Deep CNN yields superior results for gesture recognition for both static and dynamic images compared with the state-of-the-art methods.
References
1. Ameur S, Ben Khalifa A, Bouhlel MS (2020) A novel hybrid bidirectional unidirectional LSTM network for
dynamic hand gesture recognition with Leap Motion. Entertain Comput 35:100373
2. Bautista MA, Hernandez-Vela A, Ponce V, Perez-Sala X, Baro X, Pujol O, Angulo C, Escalera S (2013) Probability-based dynamic time warping for gesture recognition on RGB-D data. Lecture Notes Comput Sci 7854:126–135
3. Blazkiewicz M, Lann Vel Lace K, Hadamus A (2021) Gait symmetry analysis based on dynamic time
warping. Symmetry 13(5):836
4. Chen Q, Georganas ND, Petriu EM (2008) Hand gesture recognition using Haar-like features and a
stochastic context-free grammar. IEEE Trans Instrum Meas 57(8):1562–1571
5. Cheng H, Yang L, Liu Z (2016) Survey on 3d hand gesture recognition. IEEE Trans Circuits Syst
Video Technol 26(9):1659–1673
6. Cheng H, Dai Z, Liu Z, Zhao Y (2016) An image-to-class dynamic time warping approach for both 3D
static and trajectory hand gesture recognition. Pattern Recogn 55:137–147
7. Choi H-R, Kim TY (2018) Modified dynamic time warping based on direction similarity for fast ges-
ture recognition. Pattern Recogn 2018:9
8. Dardas NH, Georganas ND (2011) Real-time hand gesture detection and recognition using bag-of-
features and support vector machine techniques. IEEE Trans Instrum Meas 60(11):3592–3607
9. Dorothy R, Joany RM, Rathish J, Santhana Prabha S, Rajendran S, Joseph S (2015) Image enhancement
by Histogram equalization. Int J Nano Corros Sci Eng 2:21–30
10. Guan Q, Wang Y, Ping B, Li D, Du J, Qin Y, Lu H, Wan X, Xiang J (2019) Deep convolutional neural network VGG-16 model for differential diagnosing of papillary thyroid carcinomas in cytological images: a pilot study. J Cancer 10(20):4876–4882
11. Hsieh C-C, Liou D-H (2015) Novel Haar features for real-time hand gesture recognition using SVM. J
Real-Time Image Process 10(2):357–370
12. Ibañez R, Soria Á, Teyseyre A, Rodríguez G, Campo M (2017) Approximate string matching: a lightweight
approach to recognize gestures with kinect. Pattern Recogn 62:73–86
13. Kollorz K, Penne J, Hornegger J, Barke A (2008) Gesture recognition with a time-of-flight camera. Int
J Intell Syst Technol Appl 5(3–4):334–343
14. Li F, Liu M (2019) A hybrid convolutional and recurrent neural network for hippocampus analysis in
Alzheimer’s disease. J Neurosci Methods 323:108–118
15. Lv W (2021) Gesture recognition in somatosensory game via kinect sensor. Internet Technol Lett.
https://doi.org/10.1002/itl2.311
16. Marsaline Beno M, Valarmathi IR, Swamy SM, Rajakumar BR (2014) Threshold prediction for segmenting
tumour from brain MRI scans. Int J Imaging Syst Technol 24(2):129–137
17. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67
18. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61
19. Mitra S, Acharya T (2007) Gesture recognition: a survey. IEEE Trans Syst Man Cybernet C Appl Rev
37(3):311–324
20. Murillo-Bracamontesa EA, Martinez-Rosas ME, Miranda-Velasco MM, Martinez-Reyes HL, Martinez-
Sandoval JR, Cervantes-de-Avila H (2012) Implementation of Hough transform for fruit image segmenta-
tion. In: International meeting of electrical engineering research ENIINVIE 2012, vol 35, pp 230–239
21. Nandy A, Mondal S, Prasad JS, Chakraborty P, Nandi GC (2010) Recognizing & interpreting Indian
sign language gesture for human robot interaction. In: The proceeding of ICCCT-10, IEEE Xplore
Digital Library, pp 712–717
22. Nandy A, Mondal S, Prasad JS, Chakraborty P, Nandi GC (2010) Recognition of isolated indian sign
language gesture in real time. In: Das VV et al (eds) Information processing and management, LNCS-
CCIS, vol 70. Springer, Berlin, pp 102–107
23. Palacios JM, Sagüés C, Montijano E, Llorente S (2013) Human–computer interaction based on hand gestures using RGB-D sensors. Sensors 13(9):11842–11860
24. Pedersoli F, Benini S, Adami N, Leonardi R (2014) XKin: An open source framework for hand pose
and gesture recognition using Kinect. Vis Comput 30(10):1107–1122
25. Plouffe G, Cretu A-M (2016) Static and dynamic hand gesture recognition in depth data using dynamic
time warping. IEEE Trans Instrum Meas 65(2):305–316
26. Plouffe G, Cretu A (2016) Static and dynamic hand gesture recognition in depth data using dynamic
time warping. IEEE Trans Instrum Meas 65(2):305–316
27. Poularakis S, Katsavounidis I (2016) Low-complexity hand gesture recognition system for continuous
streams of digits and letters. IEEE Trans Cybernet 46(9):2094–2108
28. Rawat W, Wang Z (2017) Deep convolutional neural networks for image classification: a comprehensive
review. Neural Comput 29:2352–2449
29. Ren Z, Yuan J, Meng J, Zhang Z (2013) Robust part-based hand gesture recognition using Kinect sensor.
IEEE Trans Multimedia 15(5):1110–1120
30. Ren Z, Yuan J, Meng J et al (2013) Robust part-based hand gesture recognition using kinect sensor.
IEEE Trans Multimedia 15(5):1110–1120
31. Srivastava R, Sinha P (2016) Hand movements and gestures characterization using quaternion dynamic
time warping technique. IEEE Sens. J 16(5):1333–1341
32. Tang M (2011) Recognizing hand gestures with Microsoft’s Kinect. Stanfordedu 14(4):303–313
33. Tang J, Cheng H, Zhao Y, Guo H (2018) Structured dynamic time warping for continuous hand trajectory
gesture recognition. Pattern Recogn 80:21–31
34. Várkonyi-Kóczy AR, Tusor B (2011) Human–computer interaction for smart environment applications
using fuzzy hand posture and gesture models. IEEE Trans Instrum Meas 60(5):1505–1514
35. Wang H, Li Z (2015) Accelerometer-based gesture recognition using dynamic time warping and sparse
representation. Multimedia Tools Appl 75:8637–8655
36. Wang D, Tan D, Liu L (2018) Particle swarm optimization algorithm: an overview. Soft Comput 22:387–408
37. Yao Y, Fu Y (2014) Contour model-based hand-gesture recognition using the Kinect sensor. IEEE
Trans Circuits Syst Video Technol 24(11):1935–1944
38. Yilmaz S, Sen S (2020) Electric fish optimization: a new heuristic algorithm inspired by electroloca-
tion. Neural Comput Appl 32:11543–11578
39. Yoon H-S, Soh J, Bae YJ, Yang HS (2001) Hand gesture recognition using combined features of location, angle and velocity. Pattern Recogn 34(7):1491–1501
40. Zhou Y, Jiang G, Lin Y (2016) A novel finger and hand pose estimation technique for real-time hand
gesture recognition. Pattern Recogn 49:102–114
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.