
The International Arab Journal of Information Technology, Vol. 17, No. 1, January 2020

A Combined Method of Skin- and Depth-based Hand Gesture Recognition

Tukhtaev Sokhib¹ and Taeg Keun Whangbo²
¹Department of IT Convergence Engineering, Gachon University, Korea
²Department of Computer Science, Gachon University, Korea

Abstract: Kinect is a promising acquisition device that provides useful information on a scene through color and depth data. There has been a keen interest in utilizing Kinect in many computer vision areas such as gesture recognition. Given the advantages that Kinect provides, hand gesture recognition can be deployed efficiently with minor drawbacks. This paper proposes a simple yet efficient way of recognizing hand gestures by segmenting the hand region from both color and depth data acquired by Kinect v1. The Inception image recognition model is used to check the reliability of the proposed method. Experimental results are derived from a sample dataset of Microsoft Kinect hand acquisitions. Under the appropriate conditions, it is possible to achieve high accuracy in close to real time.

Keywords: Gesture recognition, Microsoft Kinect, inception model, depth.

Received September 21, 2017; accepted September 23, 2018


https://doi.org/10.34028/iajit/17/1/16

1. Introduction

It is widely recognized today that people can use their hands to control devices without actually touching them. Hands were themselves the very first means of communication; thus, it is natural and intuitive to use hands while interacting with a gesture-driven system. Being a non-verbal means of conveyance, hand gestures are a common form of communication, much like speech. Some difficulties of hand gesture recognition have been resolved via solutions involving computer vision [23]. Another way that hand gestures can be understood by a computer is through the data-glove approach. Parvini et al. [19] proposed a gesture recognition system that utilizes bio-mechanical characteristics, where the range of motion of each section of the hand participating in a sign, relative to non-participating sections, is a user-independent characteristic of that sign. They used the CyberGlove as a sensory device. Similarly, Deyou [4] proposed a gesture recognition approach that uses a data glove. However, gloves must be worn, which can become cumbersome and cause sweaty hands that adversely affect personal hygiene [10]. Therefore, a system that could interact with humans without using haptic devices such as controllers, gloves, or world-grounded devices would be highly valuable. Even though methods have previously been proposed for gesture recognition, none of them is perfect [16, 19, 23], and as a result, scholars and researchers have been motivated to learn more about hand gestures. Recent investigations show machine learning methods to be a promising tool when a large enough dataset can be supplied [3], and for hand gesture recognition tasks Kinect has shown to be favorable [16]. The demand for a large dataset is undeniable because typical models of the hand have many Degrees Of Freedom (DoF). Several hand gesture recognition methods have been offered by computer vision researchers over the past decade, most of which are based on images derived from consumer depth cameras [1, 6, 21].

The structure of this paper is as follows: Section 2 discusses other related research works on gesture recognition via depth cameras. Section 3 provides in-depth information concerning the proposed method for hand gesture recognition, and Section 4 reveals the experimental results obtained from the Inception classification model. Section 5 highlights some limitations of the proposed method. Finally, Section 6 completes the manuscript with conclusions and future directions.

2. Literature Review

Since the introduction of depth cameras, there have been many research investigations regarding hand gesture recognition. Our method relies upon color and depth images derived from Kinect, so going forward we will limit the discussion to the most relevant depth camera-based prior works.

A distance metric called the Finger-Earth Mover's Distance (FEMD) has been used to handle gesture variance [22]. FEMD is capable of differentiating hand gestures using a time series curve of the fingers only, not including the whole hand. Another vision-based gesture recognition method was proposed by Vaezi and Nekouie [29]. They used a 2D image and a 3D model for their estimation to avoid the computational complexity induced by reconstruction and inverse kinematics. One deficiency of their work is its requirement of color markers to locate the exact angles of the fingers with respect to the other parts of the hand in a 2D color image. The work by Dominio et al. [5] is also relevant to our study. They extracted the hand region from depth and color data as well and introduced two new relevant features: the distance from the palm center to the finger, and the curvature of the hand shape. The method we propose is similar to that of [5]. It is mainly intended for sedentary activities that do not require substantial changes of the scene. Under such conditions, we extract the closest points from the depth data. Other detailed information is provided in the upcoming chapter.
Some attempts have successfully adapted machine learning methods to gesture recognition using Kinect's skeletal data. Bhattacharya et al. [2] applied Support Vector Machines and Decision Trees to recognize aircraft gestures used in the military. Another promising method for hand gesture recognition is to create a model of the hand with all possible DoFs included and match the model with the current shape from Kinect. Similar work has been conducted by the Microsoft Research group in [25]. The hand region of interest is derived from the depth input stream coming directly from Kinect and sent to the reinitializer to assume a distribution over poses, and the "golden energy" compares the output from the reinitializer with the actual shape of the hand. The output with the least error is derived from the golden energy and shown as the best matched result in a real-time application. This work [25] is outstanding in comparison with other similar approaches reported in [19, 21], but it lacks the ability to render a good match when practically implemented with other physical objects like pens or balls. The rich information of Kinect is not only applicable to hand or face recognition but can also be used for full body pose recognition, as claimed in [26]. A 3D range camera has also been used to recognize hand gestures in real time. Li and Jarvis [13] created a system to identify hand gestures from a 3D range camera. They captured depth data with a range camera to segment the hand and locate it in 3D space. The system has some limitations when the user holds his hand still such that the forearm and hand are the same distance from the camera. In this situation, it is difficult to segment only the hand as distinguished from the forearm, and therefore a color web camera is required to segment the hand using color information. Pavlovic et al. [20] surveyed several approaches to the modeling, analysis, and recognition of hand gestures for visual interpretation. Hand gesture and arm motion have been considered promising candidates for indexing visual commands to control the computer. Specific to our hand gestures, there is work done by Trigo and Pellegrino [28], which analyzed the use of invariant moments, k-curvature features and template matching to classify hand gestures for HCI and proposed shape descriptors. Kinect enables the manipulation of windows and gaming applications too. Guan-Feng He et al. [8] proposed a vision-based HCI system in which users can control windows and games such as the popular Angry Birds. Madani et al. [15] present another approach where gesture signals from a Wiimote device are classified to deal with numeric gestures.

Human-Robot Interaction (HRI) has also become a hot topic, and many works have aimed at creating robotic devices that can mimic human actions. Gestures are crucial in HRI for robots to follow what humans do or respond to their gestural commands [7, 27]. Apart from hand gesture recognition, palm center and finger recognition have been researched over the past several years. Kinect provides full body skeleton tracking, which is especially useful when combined with depth data to detect finger position.

Skin color detection has been found to have a great impact on detecting bare hands even under fairly changing backgrounds [14]. It is difficult to have a single method for skin color detection tasks, as human skin color tone differs markedly for people from different places. Since the Red, Green and Blue (RGB) color space is not a good option for skin color detection, as explained in [24], there are a variety of other color spaces that can be used for this specific task. Through the years researchers have found the YCbCr color space to be pertinent compared to its counterparts, and Shaik et al. [24] presented a comparative study of skin color detection in Hue, Saturation and Value (HSV) and Luminance, Chroma-blue, Chroma-red (YCbCr). Inspired by [24], we chose the YCbCr color space to help detect human skin color in color images.

3. Proposed Method

The method we propose uses color images and depth images captured via Kinect. However, with Kinect, it is not possible to capture scene information at very near or very far distances [11]. To achieve good results, the user must stand facing the Kinect at a relevant range, that is, 0.8 m to 5.5 m. At distances closer than 0.8 m, Kinect records a scene with a lot of noise and flickering. Similarly, at distances further than 5.5 m, the depth image varies substantially in brightness. As stated above, there are more than 20 DoFs for a single hand to be accurately modeled. In this paper, we used only images, and a limited number of gestures were included. On the one hand, in VR-based gaming applications or in other fields like health care or entertainment where hand gestures are used, there is not much necessity for multiple variations of hand shapes. On the other hand, it would be hard to memorize and mimic each gesture for different tasks, even if the system could include many gestures.
Having considered the previous aspects, a dataset of hand gestures [16] was selected to experimentally test our method. This dataset contains gestures performed by 14 different people, each performing 10 different gestures repeated 10 times each, for a total of 1400 gestures. We did not use all the available data, though; the dataset also includes an additional subset acquired with a Leap Motion device alongside the Kinect data. Figure 1 pictorially explains the overall procedure of our approach.

Figure 1. Overall schematic of the proposed method demonstrated by pictures.

3.1. Skin Color Segmentation

Detection of skin color in images is a very popular and useful technique for face and hand gesture recognition. For most image applications, an RGB color model is preferred [9]. The RGB color model is additive, that is, three light beams are added to form a final color. However, it is difficult to isolate a specific color in the RGB color model [9]. For this reason, skin color based techniques exploit other color models such as HSV or Luminance In-phase Quadrature (YIQ). We selected the YCbCr color space over all other available color spaces because color and brightness are split nicely, allowing us to perform computations on individual channels. In the YCbCr color space the luminance information is stored separately in the Y component, and the chrominance information is stored in the Cb (blue) and Cr (red) components, respectively. There is a simple transformation to convert from the RGB color model to the YCbCr color model:

Y' = 0.299 * R' + 0.587 * G' + 0.114 * B'
Cb = -0.169 * R' - 0.331 * G' + 0.500 * B'    (1)
Cr = 0.500 * R' - 0.419 * G' - 0.081 * B'

As such, it is possible to derive each channel separately and detect skin color values from these components. A color image is acquired from Kinect; at the same time, a depth image is also acquired, and both are sent to the preprocessing stage, where the resolution of the color image is down-sampled to fit the resolution of the depth map. After converting the color image from the RGB to the YCbCr color space, we are able to detect skin color intensities. To accomplish this, we first extract the two main components, Cb (chrominance-blue) and Cr (chrominance-red), from the image in the YCbCr space. The threshold values for extracting skin color intensities are determined by trial over likely skin colors in the chosen dataset. In the case of our experimental dataset, intensity values greater than 78 and less than 126 in the Cb channel, together with intensity values greater than 134 and less than 173 in the Cr channel, are found to be the most likely skin color intensity values; these ranges were derived after experimenting with many combinations. Pixel intensities that are not considered to be skin color are converted to black, which helps when implementing the coordinate matching method. Figure 2 shows sample images from the dataset to give a clear understanding of the process.

a) Actual images. b) Images after skin segmentation is applied.
Figure 2. Samples derived from skin color extraction.
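To make this step concrete, the segmentation described above can be sketched as follows. This is a minimal illustration of ours rather than the authors' code: it assumes an 8-bit BGR color frame and uses OpenCV's built-in YCrCb conversion in place of Equation (1); the threshold values are the ones reported above.

import cv2
import numpy as np

# Skin ranges reported above (8-bit YCbCr): 78 < Cb < 126 and 134 < Cr < 173.
CB_MIN, CB_MAX = 78, 126
CR_MIN, CR_MAX = 134, 173

def segment_skin(bgr_frame):
    """Return the frame with non-skin pixels set to black, plus the binary skin mask."""
    # OpenCV orders the converted channels as (Y, Cr, Cb).
    ycrcb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2YCrCb)
    cr = ycrcb[:, :, 1]
    cb = ycrcb[:, :, 2]
    skin = (cb > CB_MIN) & (cb < CB_MAX) & (cr > CR_MIN) & (cr < CR_MAX)
    out = bgr_frame.copy()
    out[~skin] = 0                      # non-skin pixels become black
    return out, skin.astype(np.uint8) * 255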
3.2. Depth Image Processing

Next, the depth data that was captured at the same time as the color image is processed. To work with depth data from Kinect, we need a clear understanding of its nature. The depth stream of Kinect is made up of pixels that contain the distance, measured in millimeters, from the camera plane to the nearest object. This feature is very useful when there is a need to remove the background scene from an image. The depth image comes in three different resolutions: 640x480 (the default), 320x240, and 80x60. Reasonably, the largest resolution is what we need because we want to acquire the highest quality possible. In our case, we acquire a depth image with the largest resolution and extract a candidate hand region from it. There are several ways of extracting a candidate hand region from the depth image. One of them is to enable skeletal tracking and identify the human shape. When skeletal tracking is enabled, the Kinect runtime processes the depth data to segment the player from the depth map and removes background pixel intensities.
Furthermore, it is natural and intuitive for users who control a Kinect-driven system with hand gestures to raise their hands in front of their torso and move their hands a bit forward in order to clearly represent the gesture. There should be nothing that occludes the user or the Kinect, because it would obviously block the gesture. Since a gesture of the identified human is made in front of the camera, the closest intensity values in the near vicinity can be derived by using a threshold value. For our dataset of hand gestures, the threshold is chosen to keep intensity values greater than 15 and less than 23. These exact numbers were chosen after examining the frequency of pixel values of the depth image. In normal cases, the user's body covers 8-10 neighboring pixel intensity values; we arrived at this conclusion after observing the histogram distributions of several depth images. Figure 3 presents some of the histograms of those depth images in one plot.

Figure 3. Histogram distributions of 10 different depth images colored in different tones.

As can be seen in Figure 3, the most frequent intensity values range from 24 to 32, and these values belong to the human body. We are not considering the black values, since they represent invalid distance information and can therefore be safely ignored. Users perform gestures a bit closer to the Kinect, meaning that intensity values smaller than 24 and greater than 15 are possible hand candidates. These candidate values sometimes include some noise as well as parts of the forearm. We remove those types of noise using a median filter with a slightly larger threshold, and forearm parts are eliminated by coordinate matching, which is performed using a color image of the same frame. Pixels that fall outside this threshold are set to black, and the result will be used for coordinate matching. Figure 4 illustrates output images after the threshold value has been applied to the depth images.

a) Depth images from Kinect. b) Thresholded results.
Figure 4. Samples derived after the threshold value is applied.
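The thresholding just described can be illustrated with the short sketch below. It is our illustration, not the paper's code, and it assumes the raw Kinect depth frame has already been scaled to an 8-bit intensity image so that the quoted bounds of 15 and 23 apply directly.

import numpy as np

def threshold_hand_candidates(depth_8bit, lo=15, hi=23):
    """Keep only intensities strictly between lo and hi; everything else becomes black.
    depth_8bit is a 2D uint8 array in which 0 marks an invalid distance reading."""
    candidates = (depth_8bit > lo) & (depth_8bit < hi)
    return np.where(candidates, depth_8bit, 0).astype(np.uint8)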

3.3. Coordinate Matching and Hand Candidate Region Extraction

The coordinate matching stage is not difficult to complete, but it plays a significant role in extracting a hand and neighboring hand candidate regions. First, we obtain both the skin-segmented and thresholded images from the two processes given in the sub-chapters above. Then the skin color image is converted to a grayscale intensity image, because a color image is three dimensional while the depth image to be matched is two dimensional; therefore we need both images to have the same dimensionality. Converting an RGB image to grayscale does not hinder our method because, after skin segmentation, non-skin pixel values are converted to black, which means that black remains black while skin-color pixel values assume some value of gray. After all these processes are complete, we compare each single pixel of the thresholded image and the gray image; if the pixel values at the corresponding location of both images are not black, we mark this pixel as white; if the condition is not met, we mark it as black. The pseudo code of the algorithm is given below for one image pair:
Algorithm 1. RegionExtraction
ti = thresh_image
gi = gray_image
rt = number_of_rows_of_ti
ct = number_of_columns_of_ti
// both images have the same resolution after down-sampling,
// so one pair of loops visits every corresponding pixel location
for (i = 1 to rt)
  for (j = 1 to ct)
  {
    if (ti[i,j] != 0 && gi[i,j] != 0)
    {
      ti[i,j] = 255    // both images are non-black here: mark as white
    }
    else
    {
      ti[i,j] = 0      // otherwise mark as black
    }
  }
return ti

With this technique, it is possible to extract a hand and its closest regions more accurately, because the forearm parts in the color image are removed by the thresholded image, and other noise in the depth image is removed by the color image. Figure 5 displays an acquired image after using this technique.

Figure 5. A result after coordinate matching is applied.
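For reference, Algorithm 1 reduces to a single vectorized comparison. The sketch below is our NumPy rendering, not the authors' implementation, and it assumes both images have already been brought to the same resolution.

import numpy as np

def coordinate_match(thresh_img, gray_img):
    """A pixel is kept (white) only where the thresholded depth image and the
    gray skin-segmented image are both non-black at the same coordinates."""
    both_nonzero = (thresh_img != 0) & (gray_img != 0)
    return np.where(both_nonzero, 255, 0).astype(np.uint8)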
The results after coordinate matching are good but still not sufficient. Therefore, we further improve the image using preprocessing tools such as a 3x3 median filter, after which the largest region in the image is extracted using an object extraction method. Median filtering not only removes salt-and-pepper noise but also smooths the image content. This helps recover the hand image. Figure 6 depicts the median filtering process.

a) An original image. b) An image after median filtering. c) The final result.
Figure 6. Preprocessing results.
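One plausible rendering of this clean-up step is sketched below. It is our OpenCV-based illustration; the exact object-extraction routine used by the authors is not specified, so the largest connected component is taken here as the extracted object.

import cv2
import numpy as np

def clean_hand_mask(mask):
    """Apply a 3x3 median filter, then keep only the largest white region of the mask."""
    filtered = cv2.medianBlur(mask, 3)          # removes salt-and-pepper noise
    count, labels, stats, _ = cv2.connectedComponentsWithStats(filtered, connectivity=8)
    if count <= 1:                              # nothing but background remained
        return filtered
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])   # label 0 is the background
    return np.where(labels == largest, 255, 0).astype(np.uint8)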

4. Experiment and Results

In this chapter, we assess the performance of our method utilizing Inception, an image recognition system. We used the results derived from the preprocessing stage as the input images. The dataset of binary images was created by applying our method to the existing dataset of Microsoft Kinect hand acquisitions. The dataset consists of 1200 images of 10 different gestures. The gestures are based on American Sign Language (ASL), as used in [12]. Figure 7 depicts some example images of the dataset.

Figure 7. Dataset samples originated from the Microsoft Kinect hand acquisitions [16].

We performed a process referred to as inductive transfer, which means applying learnings from previous training sessions to a new training session. Inception is a large image classification model with millions of parameters that can differentiate many images. For our purpose we only used the last layer of the network and adapted it for gesture classification. We classified our gesture images using a laptop with an Advanced Micro Devices (AMD) A10 processor, Gallium 0.4 on AMD Aruba graphics, and 8 GB RAM, running an Ubuntu OS. We took advantage of a virtual software container (Docker) on our machine to implement our training without actually installing the necessary dependencies. TensorFlow is an open source software platform released by Google to build and train deep learning models; we employed this tool to train our network and show the robustness of our method. We used 10 gestures, and the results are measured by values ranging between 0 and 1; the closer the value is to 1, the more accurate the model is. From Table 1 to Table 10, we present the accuracy results of our method for 10 gestures with 15 examples each. Gesture types that did not have any probability, or had only minor probability, were excluded from the tables for space efficiency.
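The retraining itself replaced only the final layer of a pre-trained Inception network. As a hedged illustration of such last-layer retraining with the current tf.keras API (our sketch, not the original retraining script; the directory name is hypothetical and is assumed to contain one sub-folder of images per gesture class):

import tensorflow as tf

IMG_SIZE = (299, 299)                  # Inception's expected input size
NUM_GESTURES = 10

# "gesture_data/" is a hypothetical folder with one sub-directory per gesture class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "gesture_data/", image_size=IMG_SIZE, batch_size=32)

base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg", input_shape=IMG_SIZE + (3,))
base.trainable = False                 # keep the pre-trained features frozen

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0),        # scale pixels to [-1, 1]
    base,
    tf.keras.layers.Dense(NUM_GESTURES, activation="softmax"),  # the only trained layer
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)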

Table 1. The accuracy results for G0.


G\№ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
G0 0.99 0.99 0.93 0.98 0.99 0.97 0.97 0.96 0.97 0.99 0.99 0.99 0.94 0.99 0.99
G1 0.02 0.01
G4 0.05 0.02 0.03

Table 2. The accuracy results for G1.


G\№ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
G0 0.05
G1 0.87 0.88 0.87 0.91 0.95 0.84 0.96 0.84 0.89 0.89 0.86 0.87 0.96 0.86 0.90
G2 0.04 0.13 0.02
G3 0.05
G4 0.05 0.02 0.03 0.01 0.04 0.11 0.04
G7 0.05 0.05 0.05

Table 3. The accuracy results for G2.


G\№ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
G1 0.01 0.08 0.04
G2 0.97 0.83 0.86 0.90 0.99 0.95 0.95 0.99 0.91 0.98 0.90 0.89 0.94 0.86 0.93
G3 0.03 0.02 0.10 0.03
G7 0.06
G8 0.052 0.03
G9 0.04 0.06

Table 4. The accuracy results for G3.


G\№ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
G0 0.05
G2 0.04 0.11
G3 0.98 0.96 0.92 0.93 0.93 0.93 0.97 0.91 0.90 0.87 0.95 0.97 0.94 0.97 0.97
G9 0.02 0.06 0.03 0.03 0.02 0.01 0.03 0.01 0.02 0.02

Table 5. The accuracy results for G4.


G\№ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
G0 0.04 0.07
G1 0.06 0.06 0.03 0.03 0.05 0.03 0.06 0.05
G4 0.91 0.89 0.86 0.90 0.90 0.93 0.92 0.95 0.92 0.88 0.82 0.87 0.89 0.91 0.92
G6 0.08 0.05
G7 0.07 0.09 0.11

Table 6. The accuracy results for G5.


G\№ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
G5 0.99 0.95 0.99 0.95 0.99 0.98 0.99 0.99 0.93 0.99 0.99 0.99 0.99 0.99 0.90
G8 0.04 0.03
G9 0.02 0.08

Table 7. The accuracy results for G6.


G\№ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
G4 0.04
G5 0.08
G6 0.97 0.99 0.82 0.99 0.96 0.98 0.94 0.86 0.94 0.92 0.89 0.89 0.96 0.96 0.94
G7 0.04 0.07 0.02
G8 0.17 0.02 0.01 0.09 0.04
G9 0.03

Table 8. The accuracy results for G7.


G\№ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
G1 0.10
G4 0.06 0.02
G7 0.97 0.89 0.97 0.85 0.99 0.89 0.99 0.98 0.90 0.96 0.92 0.99 0.91 0.98 0.95
G8 0.02 0.06 0.05 0.03 0.07
G9 0.04

Table 9. The accuracy results for G8.


G\№ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
G2 0.02
G3 0.04
G6 0.04 0.06 0.04
G7 0.03 0.06 0.03 0.02 0.03
G8 0.96 0.95 0.99 0.91 0.97 0.94 0.92 0.92 0.93 0.94 0.96 0.88 0.93 0.92 0.98
G9 0.03 0.07

Table 10. The accuracy results for G9.


G\№ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
G2 0.02 0.05 0.05 0.06 0.05 0.05 0.03
G3 0.05 0.11 0.01
G4 0.05
G5 0.01
G8 0.06 0.05
G9 0.92 0.92 0.93 0.86 0.93 0.91 0.90 0.91 0.99 0.93 0.97 0.90 0.93 0.97 0.92

The leftmost column defines a gesture type according to Figure 7, and the bold row represents the observed gesture type; this applies to Tables 1-10.

As seen in the tables, the algorithm works satisfactorily. We tallied the average percentage of each gesture and created a confusion matrix out of the 15 test images for each gesture. The accuracy results in Tables 1-10 were converted to integer percentages, and contributions that did not reach 1% were removed from the table. All the gestures had a recognition rate of at least 90%. With this we can effectively verify the robustness of our algorithm. Table 11 presents the average percentage for each gesture.

Table 11. Average results for 10 gestures with 15 examples each.
G0 98 1
G1 90 1 5 1
G2 1 93 1
G3 1 94 2
G4 1 3 90 1 2
G5 98 1 1
G6 94 4 2
G7 1 1 95 2
G8 1 1 94 1
G9 2 1 1 93
   G0 G1 G2 G3 G4 G5 G6 G7 G8 G9
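As a small illustration of how a row of Table 11 follows from the corresponding table above (our reading of the averaging step; prob_table is a hypothetical array holding one gesture's 15 rows of class probabilities):

import numpy as np

def average_row(prob_table):
    """Average one gesture's per-sample class probabilities (shape 15 x 10) and
    express them as integer percentages; entries below 1% are then dropped."""
    return np.rint(100.0 * prob_table.mean(axis=0)).astype(int)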
Moreover, we successfully integrated the Kinect device and the proposed algorithm as a standalone system to measure the timings in real-time situations. The algorithm was able to detect a gesture in 0.1-0.3 seconds, so most existing gesture-based applications could make use of the proposed method in real time.

5. Limitations

While experimenting in a real-time environment, we observed that the system occasionally takes more time to identify a gesture, and the user needs to hold up his hand for 0.2-0.3 seconds. These shortcomings can be attributed to global illumination effects in a room, and the system needs to be improved in terms of the skin color detection process. According to our recognition conditions, a user can raise their left hand only; however, after implementing a mirroring effect on the frame we were able to recognize right-handed gestures as well. Notwithstanding, the recognition rate was reduced by approximately 6%. Furthermore, the system fails to recognize gestures when a user is occluded or shows a gesture that has not been trained before.

6. Conclusions and Future Studies

This article presents a novel approach to gesture recognition for Kinect-based applications that involve non-moving, seated activities. With a smaller number of gestures, this work differs from other existing approaches [16, 17, 18] in that its recognition rate is high and it can be implemented in close to real time. There are some theoretical and practical issues that need to be addressed. On the theoretical side, illumination problems and the hand segmentation parts must be improved by implementing relatively advanced software. On the practical side, the number of users must be increased to 3-4 users. As the next step, we aim to include more gesture types from ASL and to segment gestures performed with both hands. In the near future, we plan to create a dataset of gestures performed with both hands having more than 10 gestures.

References

[1] Ahn Y., Park Y., Choi K., Park W., Seo H., and Jung K., "TOF Depth Camera 3D Gesture Interaction System," in Proceedings of the International Conference on Advances in Information Technology, Bangkok, pp. 11-17, 2012.
[2] Bhattacharya S., Czejdo B., and Perez N., "Gesture Classification with Machine Learning Using Kinect Sensor Data," in Proceedings of the 3rd International Conference on Emerging Applications of Information Technology, Kolkata, pp. 727-731, 2012.
[3] Dean J., Corrado G., Monga R., Chen K., Devin M., Le Q., Mao M., Ranzato M., Senior A., Tucker P., Yang K., and Ng A., "Large Scale Distributed Deep Networks," in Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, pp. 1232-1240, 2012.
[4] Deyou X., "A Network Approach for Hand Gesture Recognition in Virtual Reality Driving Training System of SPG," in Proceedings of the 18th International Conference on Pattern Recognition, Hong Kong, pp. 519-522, 2006.
[5] Dominio F., Donadeo M., Marin G., Zanuttigh P., and Cortelazzo G., "Hand Gesture Recognition with Depth Data," in Proceedings of the 4th ACM/IEEE International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Stream, Barcelona, pp. 9-16, 2013.
[6] Elmezain M., Al-Hamadi A., Appenrodt J., and Michaelis B., "A Hidden Markov Model-Based Continuous Gesture Recognition System for Hand Motion Trajectory," in Proceedings of the 19th International Conference on Pattern Recognition, Tampa, pp. 1-4, 2008.
[7] Gai S., Jung J., and Yi J., "Mobile Shopping Cart Application Using Kinect," in Proceedings of the 10th International Conference on Ubiquitous Robots and Ambient Intelligence, Jeju, pp. 289-291, 2013.
[8] He G., Kang S., Song W., and Jung S., "Real-Time Gesture Recognition Using 3D Depth Camera," in Proceedings of the IEEE 2nd International Conference on Software Engineering and Service Science, Beijing, pp. 187-190, 2011.
[9] Ibraheem N., Hasan M., Khan R., and Mishra P., "Understanding Color Models: A Review," ARPN Journal of Science and Technology, vol. 2, no. 3, pp. 265-275, 2012.
[10] Jerald J., The VR Book: Human-Centered Design for Virtual Reality, Association for Computing Machinery and Morgan and Claypool, New York, 2016.
[11] Khoshelham K., "Accuracy Analysis of Kinect Depth Data," in ISPRS Workshop Laser Scanning, Calgary, 2011.
[12] Kulkarni V. and Lokhande S., "Appearance Based Recognition of American Sign Language Using Gesture Segmentation," International Journal on Computer Science and Engineering, vol. 2, no. 3, pp. 560-565, 2010.
[13] Li Z. and Jarvis R., "Real Time Hand Gesture Recognition Using a Range Camera," in Proceedings of the Australasian Conference on Robotics and Automation, Sydney, Australia, pp. 529-534, 2009.
[14] Liu L., Sang N., Yang S., and Huang R., "Real-Time Skin Color Detection under Rapidly Changing Illumination Conditions," IEEE Transactions on Consumer Electronics, vol. 57, no. 3, pp. 1295-1302, 2011.
[15] Madani T., Tahir M., Ziauddin S., Raza S., Ahmed M., Khan M., and Ashraf M., "An Accelerometer-Based Approach to Evaluate 3D Unistroke Gestures," The International Arab Journal of Information Technology, vol. 12, no. 4, pp. 389-394, 2015.
[16] Marin G., Dominio F., and Zanuttigh P., "Hand Gesture Recognition with Leap Motion and Kinect Devices," in Proceedings of the IEEE International Conference on Image Processing, Paris, pp. 1565-1569, 2014.
[17] Nagarajan S. and Subashini T., "Static Hand Gesture Recognition for Sign Language Alphabets Using Edge Oriented Histogram and Multi Class SVM," International Journal of Computer Applications, vol. 82, no. 4, pp. 28-35, 2013.
[18] Pansare J., Gawande S., and Ingle M., "Real-Time Static Hand Gesture Recognition for American Sign Language (ASL) in Complex Background," Journal of Signal and Information Processing, vol. 3, no. 3, pp. 364-367, 2012.
[19] Parvini F., McLeod D., Shahabi C., Navai B., Zali B., and Ghandeharizadeh S., "An Approach to Glove-Based Gesture Recognition," in Proceedings of the International Conference on Human-Computer Interaction, San Diego, pp. 236-245, 2009.
[20] Pavlovic V., Sharma R., and Huang T., "Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 677-695, 1997.
[21] Ren Z., Meng J., and Yuan J., "Depth Camera Based Hand Gesture Recognition and its Applications in Human-Computer-Interaction," in Proceedings of the 8th International Conference on Information, Communications and Signal Processing, Singapore, pp. 1-5, 2011.
[22] Ren Z., Yuan J., and Zhang Z., "Robust Hand Gesture Recognition Based on Finger-Earth Mover's Distance with a Commodity Depth Camera," in Proceedings of the 19th ACM International Conference on Multimedia, Scottsdale, pp. 1093-1096, 2011.
[23] Samantaray A., Nayak S., and Mishra A., "Hand Gesture Recognition Using Computer Vision," International Journal of Scientific and Engineering Research, vol. 4, no. 6, pp. 1602-1609, 2013.
[24] Shaik K., Ganesan P., Kalist V., Sathish B., and Jenitha M., "Comparative Study of Skin Color Detection and Segmentation in HSV and YCbCr Color Space," Procedia Computer Science, vol. 57, pp. 41-48, 2015.
[25] Sharp T., Keskin C., Robertson D., Taylor J., Shotton J., Kim D., Rhemann C., Leichter I., Vinnikov A., Wei Y., Freedman D., Kohli P., Krupka E., Fitzgibbon A., and Izadi S., "Accurate, Robust, and Flexible Real-Time Hand Tracking," in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, Seoul, pp. 3633-3642, 2015.
[26] Shotton J., Fitzgibbon A., Cook M., Sharp T., Finocchio M., Moore R., Kipman A., and Blake A., "Real-Time Human Pose Recognition in Parts from a Single Depth Image," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, pp. 1297-1304, 2011.
[27] Song Y., Gu Y., Wang P., Liu Y., and Li A., "A Kinect Based Gesture Recognition Algorithm Using GMM and HMM," in Proceedings of the 6th International Conference on Biomedical Engineering and Informatics, Hangzhou, pp. 750-754, 2013.
[28] Trigo T., Roberto S., and Pellegrino S., "An Analysis of Features for Hand-Gesture Classification," in Proceedings of the 17th International Conference on Systems, Signals and Image Processing, Rio de Janeiro, pp. 412-415, 2010.
[29] Vaezi M. and Nekouie M., "3D Human Hand Posture Reconstruction Using a Single 2D Image," International Journal of Human Computer Interaction, vol. 1, no. 4, pp. 83-94, 2011.

Tukhtaev Sokhib received the B.S. degree from Tashkent University of Information Technologies, Tashkent, Uzbekistan, in 2015, and the M.S. degree from the faculty of IT Convergence Engineering, Gachon University, Korea, in 2018. His research interests include Computer Vision, Image Processing, Machine/Deep Learning and Artificial Intelligence.

Taeg Keun Whangbo received the M.S. degree from the City University of New York in 1988 and the Ph.D. degree from the Stevens Institute of Technology in 1995, both in Computer Science. Currently, he is a professor in the Department of Computer Science, Gachon University, Korea. Before joining Gachon University, he was a software developer at Q-Systems in New Jersey from 1988 to 1993. He was also a researcher at Samsung Electronics from 2005 to 2007. From 2006 to 2008, he was the president of the Association of Korea Cultural Technology. His research areas include Computer Graphics, HCI and VR/AR.
