Abstract: Kinect is a promising acquisition device that provides useful information about a scene through color and depth data. There has been keen interest in utilizing Kinect in many computer vision areas, such as gesture recognition. Given the advantages that Kinect provides, hand gesture recognition can be deployed efficiently with only minor drawbacks. This paper proposes a simple yet efficient approach to hand gesture recognition that segments the hand region from both the color and depth data acquired by Kinect v1. The Inception image recognition model is used to check the reliability of the proposed method. Experimental results are derived from a sample dataset of Microsoft Kinect hand acquisitions. Under appropriate conditions, it is possible to achieve high accuracy in close to real time.
induced by reconstruction and inverse kinematics. One deficiency of their work is its requirement of color markers to locate the exact angles of the fingers with respect to the other parts of the hand in a 2D color image. The work by Dominio et al. [5] is also relevant to our study. They, too, extracted the hand region from depth and color data, and introduced two new features: the distance from the palm center to the fingers, and the curvature of the hand shape. The method we propose is similar to that of [5]. It is mainly intended for sedentary activities that do not require substantial changes of the scene. Under such conditions, we extract the closest points from the depth data. Further details are provided in the upcoming section.

Some attempts have successfully adapted machine learning methods to gesture recognition using Kinect's skeletal data. Bhattacharya et al. [2] applied Support Vector Machines and Decision Trees to recognize aircraft gestures used in the military. Another promising method for hand gesture recognition is to create a model of the hand with all possible DoFs included and match the model against the current shape from Kinect. Similar work has been conducted by the Microsoft Research group in [25]. The hand region of interest is derived from the depth input stream coming directly from Kinect and sent to the reinitializer to assume a distribution over poses, and the "golden energy" compares the output of the reinitializer with the actual shape of the hand. The output with the least error is derived from the golden energy and shown as the best-matched result in a real-time application. This work [25] is outstanding in comparison with other similar approaches reported in [19, 21], but it lacks the ability to render a good match when practically implemented with other physical objects such as pens or balls. The rich information from Kinect is applicable not only to hand or face recognition but also to full-body pose recognition, as claimed in [26]. A 3D range camera has also been used to recognize hand gestures in real time. Li and Jarvis [13] created a system to identify hand gestures from a 3D range camera. They captured depth data with the range camera to segment the hand and locate it in 3D space. The system has some limitations when the user holds the hand still such that the forearm and hand are at the same distance from the camera. In this situation, it is difficult to segment the hand alone, distinguished from the forearm, and therefore a color web camera is required to segment the hand using color information. Pavlovic et al. [20] surveyed several approaches to the modeling, analysis, and recognition of hand gestures for visual interpretation. Hand gestures and arm motion have been considered promising candidates for indexing visual commands to control the computer. Specific to our hand gestures, there is a work by Trigo and Pellegrino [28] which analyzed the use of invariant moments, k-curvature features, and template matching to classify hand gestures for HCI and proposed shape descriptors. Kinect also enables the manipulation of windows and gaming applications. Guan-Feng He et al. [8] proposed a vision-based HCI system in which users can control windows and games such as the popular Angry Birds. Madani et al. [15] present another relevant approach, in which gesture signals from a Wiimote device are classified to deal with numeric gestures.

Human-Robot Interaction (HRI) has also become a hot topic, and many works have aimed at creating robotic devices that can mimic human actions. Gestures are crucial in HRI for robots to follow what humans do or to respond to their gestural commands [7, 27]. Apart from hand gesture recognition, palm center and finger recognition have been researched over the past several years. Kinect provides full-body skeleton tracking, which is especially useful when combined with depth data to detect finger positions.

Skin color detection has been found to have a great impact on detecting bare hands even under fairly changing backgrounds [14]. It is difficult to have a single method for skin color detection tasks, as human skin tone differs markedly between people from different places. Since the Red, Green, and Blue (RGB) color space is not a good option for skin color detection, as explained in [24], there are a variety of other color spaces that can be used for this specific task. Over the years, researchers have found the YCbCr color space to be pertinent compared to its counterparts, and Shaik et al. [24] presented a comparative study of skin color detection in the Hue, Saturation, and Value (HSV) and Luminance, Chroma-blue, Chroma-red (YCbCr) color spaces. Inspired by [24], we chose the YCbCr color space to help detect human skin color in color images.

3. Proposed Method

The method we propose uses color images and depth images captured via Kinect. However, with Kinect it is not possible to capture scene information at very near or very far distances [11]. To achieve good results, the user must stand facing the Kinect at a relevant range, that is, 0.8 m to 5.5 m. At distances closer than 0.8 m, Kinect records the scene with a lot of noise and flickering. Similarly, at distances beyond 5.5 m, the depth image varies substantially in brightness.
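As a concrete illustration of this operating-range constraint, the following minimal Python sketch (our own illustration, not code from the paper; the function name and the 800-5500 mm bounds simply restate the 0.8 m to 5.5 m range above) discards depth readings outside Kinect v1's reliable interval:

```python
import numpy as np

KINECT_MIN_MM = 800    # below ~0.8 m the sensor output is noisy and flickers
KINECT_MAX_MM = 5500   # beyond ~5.5 m depth brightness varies substantially

def valid_depth(depth_mm):
    """Zero out depth readings outside Kinect v1's reliable range.

    depth_mm: H x W uint16 array of distances in millimeters,
    where 0 already marks 'no reading'.
    """
    d = depth_mm.copy()
    d[(d < KINECT_MIN_MM) | (d > KINECT_MAX_MM)] = 0
    return d
```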
As stated above, more than 20 DoFs are needed for a single hand to be accurately modeled. In this paper, we used only images, and a limited number of gestures were included. On the one hand, in VR-based gaming applications or in other fields such as health care or entertainment where hand gestures are used, there is not much need for multiple variations of hand shapes. On the other hand, it would be hard to memorize and mimic each gesture for different tasks, even if the system could include many gestures.
Having considered the previous aspects, a dataset of
hand gestures [16] was selected to experimentally test our method. This dataset contains gestures performed by 14 different people, each performing 10 different gestures repeated 10 times, for a total of 1400 gestures. We did not use all the available data, though; there is an additional built-in dataset, assembled with a Leap Motion device, that comes with the Kinect data. Figure 1 pictorially explains the overall procedure of our approach.
3.1. Skin Color Extraction

RGB is an additive color model; that is, three light beams are added to form a final color. However, it is difficult to determine a specific color in the RGB color model [9]. For this reason, skin-color-based techniques exploit other color models such as HSV or Luminance, In-phase, Quadrature (YIQ).
We selected the YCbCr color space over all other available color spaces because color and brightness are separated nicely, allowing us to perform computations on individual channels. In the YCbCr color space, the luminance information is stored separately in the Y component, and the chrominance information is stored in Cb and Cr for blue and red, respectively. Moreover, there is a simple transformation from the RGB color model to the YCbCr color model:
Y' = 0.299 * R' + 0.587 * G' + 0.114 * B'
Cb = -0.169 * R' - 0.331 * G' + 0.500 * B'        (1)
Cr = 0.500 * R' - 0.419 * G' - 0.081 * B'
As such, it is possible to derive each channel separately and detect skin color values from its components.
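To make Equation (1) concrete, here is a minimal Python sketch of the conversion (a sketch of our own, not the paper's code; the +128 offset on Cb and Cr is an assumed convention that keeps all three channels in the unsigned 8-bit range, which is also what the threshold values quoted below presume):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an 8-bit RGB image (H x W x 3) to Y'CbCr via Equation (1).

    Cb and Cr are offset by 128 so all three channels fit the usual
    unsigned 8-bit range (an assumption; the excerpt does not state
    its offset convention explicitly).
    """
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128.0
    cr =  0.500 * r - 0.419 * g - 0.081 * b + 128.0
    return np.clip(np.stack([y, cb, cr], axis=-1), 0, 255).astype(np.uint8)
```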
A color image is acquired from Kinect; at the same time, a depth image is also acquired, and both are sent to the preprocessing stage, where the resolution of the color image is down-sampled to match the resolution of the depth map. After converting the color image from the RGB to the YCbCr color space, we are able to detect skin color intensities. To accomplish this, we first extract the two main components, Cb (chrominance blue) and Cr (chrominance red), from the image in the YCbCr space. The threshold values for extracting skin color intensities are determined by trialing likely skin colors in the chosen dataset. In the case of our experimental dataset, intensity values between 78 and 126 in the Cb channel and between 134 and 173 in the Cr channel were found to be the most likely skin color values, derived after experimenting with many combinations. Pixel intensities that are not considered skin color are converted to black, which helps when implementing the coordinate matching method. Figure 2 shows sample images from the dataset to give a clear understanding of the process.

Figure 2. Samples derived from skin color extraction.
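A minimal sketch of this thresholding step, assuming the offset Y'CbCr convention from the previous snippet (the function names, range parameters, and the zeroing of all three output channels are our own illustration):

```python
import numpy as np

def skin_mask(ycbcr, cb_range=(78, 126), cr_range=(134, 173)):
    """Boolean mask of pixels whose Cb and Cr fall inside the skin
    ranges found experimentally for the dataset."""
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))

def apply_skin_mask(rgb, ycbcr):
    """Convert non-skin pixels to black, keeping skin pixels intact."""
    out = rgb.copy()
    out[~skin_mask(ycbcr)] = 0  # non-skin pixels -> black
    return out
```

Zeroing the non-skin pixels mirrors the conversion to black described above, which simplifies the later coordinate matching step.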
3.2. Depth Image Processing

Next, the depth data that was captured at the same time as the color image is processed. To work with depth data from Kinect, we need a clear understanding of its nature. The depth stream of Kinect is made up of pixels that contain the distance from the camera plane to the nearest object, measured in millimeters. This feature is very useful when there is a need to remove the background scene from an image. The depth image comes in three resolutions: 640x480 (the default), 320x240, and 80x60. Reasonably, the largest resolution is what we need, because we want to acquire the highest quality possible. In our case, we acquire a depth image at the largest resolution and extract a candidate hand region from it. There are several ways of extracting a candidate hand region from the depth image. One of them is to enable skeletal tracking and identify the human shape. When skeletal tracking is enabled, the Kinect runtime processes the depth data to segment the player from the map and removes background pixel intensities. Furthermore, it is natural and intuitive for the users who control the Kinect-driven system with hand
histograms of those depth images in one plot.

Figure 4. Samples derived after the threshold value is applied.
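Although the intervening pages are missing from this excerpt, a minimal sketch of extracting the closest points from a depth frame as the candidate hand region, in the spirit of the approach described earlier, might look as follows (the band_mm tolerance is an illustrative assumption, not a value given in the text above):

```python
import numpy as np

def hand_candidate(depth_mm, band_mm=120):
    """Keep only pixels within band_mm of the nearest valid depth
    reading. Under the assumption that the user faces the sensor
    with the hand held in front of the body, this band isolates a
    candidate hand region. band_mm = 120 is an illustrative value."""
    valid = depth_mm > 0                      # 0 means 'no reading'
    if not valid.any():
        return np.zeros_like(depth_mm, dtype=bool)
    nearest = depth_mm[valid].min()           # closest point to the camera
    return valid & (depth_mm <= nearest + band_mm)
```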
[5] Dominio F., Donadeo M., Marin G., Zanuttigh P., and Cortelazzo G., "Hand Gesture Recognition with Depth Data," in Proceedings of the 4th ACM/IEEE International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Stream, Barcelona, pp. 9-16, 2013.
[6] Elmezain M., Al-Hamadi A., Appenrodt J., and Michaelis B., "A Hidden Markov Model-Based Continuous Gesture Recognition System for Hand Motion Trajectory," in Proceedings of the 19th International Conference on Pattern Recognition, Tampa, pp. 1-4, 2008.
[7] Gai S., Jung J., and Yi J., "Mobile Shopping Cart Application Using Kinect," in Proceedings of the 10th International Conference on Ubiquitous Robots and Ambient Intelligence, Jeju, pp. 289-291, 2013.
[8] He G., Kang S., Song W., and Jung S., "Real-Time Gesture Recognition Using 3D Depth Camera," in Proceedings of the IEEE 2nd International Conference on Software Engineering and Service Science, Beijing, pp. 187-190, 2011.
[9] Ibraheem N., Hasan M., Khan R., and Mishra P., "Understanding Color Models: A Review," ARPN Journal of Science and Technology, vol. 2, no. 3, pp. 265-275, 2012.
[10] Jerald J., The VR Book: Human-Centered Design for Virtual Reality, Association for Computing Machinery and Morgan and Claypool, New York, 2016.
[11] Khoshelham K., "Accuracy Analysis of Kinect Depth Data," in Proceedings of the ISPRS Workshop Laser Scanning, Calgary, 2011.
[12] Kulkarni V. and Lokhande S., "Appearance Based Recognition of American Sign Language Using Gesture Segmentation," International Journal on Computer Science and Engineering, vol. 2, no. 3, pp. 560-565, 2010.
[13] Li Z. and Jarvis R., "Real Time Hand Gesture Recognition Using a Range Camera," in Proceedings of the Australasian Conference on Robotics and Automation, Sydney, pp. 529-534, 2009.
[14] Liu L., Sang N., Yang S., and Huang R., "Real-Time Skin Color Detection under Rapidly Changing Illumination Conditions," IEEE Transactions on Consumer Electronics, vol. 57, no. 3, pp. 1295-1302, 2011.
[15] Madani T., Tahir M., Ziauddin S., Raza S., Ahmed M., Khan M., and Ashraf M., "An Accelerometer-Based Approach to Evaluate 3D Unistroke Gestures," The International Arab Journal of Information Technology, vol. 12, no. 4, pp. 389-394, 2015.
[16] Marin G., Dominio F., and Zanuttigh P., "Hand Gesture Recognition with Leap Motion and Kinect Devices," in Proceedings of the IEEE International Conference on Image Processing, Paris, pp. 1565-1569, 2014.
[17] Nagarajan S. and Subashini T., "Static Hand Gesture Recognition for Sign Language Alphabets Using Edge Oriented Histogram and Multi Class SVM," International Journal of Computer Applications, vol. 82, no. 4, pp. 28-35, 2013.
[18] Pansare J., Gawande S., and Ingle M., "Real-Time Static Hand Gesture Recognition for American Sign Language (ASL) in Complex Background," Journal of Signal and Information Processing, vol. 3, no. 3, pp. 364-367, 2012.
[19] Parvini F., McLeod D., Shahabi C., Navai B., Zali B., and Ghandeharizadeh S., "An Approach to Glove-Based Gesture Recognition," in Proceedings of the International Conference on Human-Computer Interaction, San Diego, pp. 236-245, 2009.
[20] Pavlovic V., Sharma R., and Huang T., "Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 677-695, 1997.
[21] Ren Z., Meng J., and Yuan J., "Depth Camera Based Hand Gesture Recognition and its Applications in Human-Computer Interaction," in Proceedings of the 8th International Conference on Information, Communications and Signal Processing, Singapore, pp. 1-5, 2011.
[22] Ren Z., Yuan J., and Zhang Z., "Robust Hand Gesture Recognition Based on Finger-Earth Mover's Distance with a Commodity Depth Camera," in Proceedings of the 19th International Conference on Multimedia, Scottsdale, pp. 1093-1096, 2011.
[23] Samantaray A., Nayak S., and Mishra A., "Hand Gesture Recognition Using Computer Vision," International Journal of Scientific and Engineering Research, vol. 4, no. 6, pp. 1602-1609, 2013.
[24] Shaik K., Ganesan P., Kalist V., Sathish B., and Jenitha M., "Comparative Study of Skin Color Detection and Segmentation in HSV and YCbCr Color Space," Procedia Computer Science, vol. 57, pp. 41-48, 2015.
[25] Sharp T., Keskin C., Robertson D., Taylor J., Shotton J., Kim D., Rhemann C., Leichter I., Vinnikov A., Wei Y., Freedman D., Kohli P., Krupka E., Fitzgibbon A., and Izadi S., "Accurate, Robust, and Flexible Real-Time Hand Tracking," in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, Seoul, pp. 3633-3642, 2015.
[26] Shotton J., Fitzgibbon A., Cook M., Sharp T., Finocchio M., Moore R., Kipman A., and Blake A., "Real-Time Human Pose Recognition in Parts from a Single Depth Image," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, pp. 1297-1304, 2011.