What is a Gesture:-
A movement of a limb or the body as an expression of thought or
feeling.
Gesture Recognition = Complex Task:-
• Motion modeling
• Motion analysis
• Pattern recognition
• Machine learning
• Psycholinguistic studies
• …
Introduction:-
• Interaction with computers is often not a comfortable experience
• Computers should be able to communicate with people using body language
• Hand gesture recognition is becoming important
  • Interactive human-machine interfaces and virtual environments
• Two common technologies for hand gesture recognition:
  • Glove-based method
    • Uses a special glove-based device to extract hand posture
    • Annoying to wear
  • Vision-based method
    • 3D hand/arm modeling
    • Appearance modeling
Background and Trends:-
• In today's world:
  • Many devices with integrated cameras
  • Many personal webcams
• Our goal:
  • To understand how to take advantage of these single-camera systems
Mood, emotion:-
• Mood and emotion are expressed by body language
  • Facial expressions
  • Tone of voice
• Recognizing them allows computers to interact with human beings in a more natural way
Tasks to be performed:-
• Making gestures in front of the camera
• Detecting gestures at a suitable frame rate
• Capturing the gestures and storing them in a .jpg file
• Training the system to recognize the gestures with a low error rate
• Executing events on the webpage upon successful gesture recognition
Human Gesture Representation:-
• Psycholinguistic research by Stokoe identifies four components:
  • Hand shape
  • Position
  • Orientation
  • Movement
• Application scenarios of gestures:
  • Conversational
  • Controlling
    • e.g. vision-based interfaces
  • Manipulative
    • e.g. interacting with virtual objects
  • Communicative
    • e.g. sign language → highly structured
CSL and Pre-processing:-
• Sign language
  • Relied upon by the deaf community
  • Two main elements:
    • A low, simple level: the signed alphabet, which mimics the letters of the native spoken language
    • A higher level: the signed language proper, which uses actions to mimic the meaning or description of the sign
• CSL is the abbreviation for Chinese Sign Language
  • 30 letters in the CSL alphabet ↔ the objects to be recognized
Human Computer Interface using Gesture:-
• Replace mouse and keyboard
• Pointing gestures
• Navigate in a virtual environment
• Pick up and manipulate virtual objects
• Interact with a 3D world
• No physical contact with computer
• Communicate at a distance
Gesture Making:-
A small set of gestures (raised fingers) is used.
Each raised finger performs some predefined navigation action on the webpage.
System capabilities can be programmed to accommodate other human gestures as well.
Detection errors can be reduced by training.
Pre-processing of Hand Gesture Recognition:-
• Detection of hand gesture regions
  • Aim: fix on the valid frames and locate the hand region within the rest of the image
  • Low time consumption → fast processing rate → real-time speed
• Detect the skin region in the image by using color
  • Each color has three components: hue, saturation, and value
  • Chroma, consisting of hue and saturation, is separated from value
  • Chroma is invariant under different lighting conditions
• Color is represented in RGB space, and also in YUV and YIQ space
• In YUV space:
  • Saturation → chroma magnitude: C = √(U² + V²)
  • Hue → chroma angle: θ = tan⁻¹(V / U)
• In YIQ space:
  • The color saturation cue I is combined with θ to reinforce the segmentation effect
  • Skin tones lie between red and yellow
• Transform each color pixel P from RGB to YUV and YIQ space
• The skin region is defined by:
  • 105° ≤ θ ≤ 150°
  • 30 ≤ I ≤ 100
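The chroma-based skin segmentation above can be sketched as follows. The conversion coefficients are the standard RGB→YUV/YIQ matrices; the function name `skin_mask` and the assumption of 8-bit RGB input are illustrative, not from the slides:

```python
import numpy as np

def skin_mask(rgb):
    """Return a boolean skin mask for an RGB image (H x W x 3, values 0-255).

    Thresholds follow the slides: 105 deg <= theta <= 150 deg (chroma angle
    from YUV) and 30 <= I <= 100 (saturation cue from YIQ).
    """
    rgb = rgb.astype(np.float64)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # Chrominance components of YUV (standard conversion coefficients).
    U = -0.147 * R - 0.289 * G + 0.436 * B
    V = 0.615 * R - 0.515 * G - 0.100 * B

    # Chroma angle theta = atan2(V, U) in degrees; the magnitude
    # C = sqrt(U^2 + V^2) is the saturation cue (not thresholded here).
    theta = np.degrees(np.arctan2(V, U)) % 360.0

    # I component of YIQ (the in-phase color saturation cue).
    I = 0.596 * R - 0.274 * G - 0.322 * B

    return (theta >= 105) & (theta <= 150) & (I >= 30) & (I <= 100)
```

Applied per pixel, this yields the binary region mask used in the later steps.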
• Hands and faces
  • An on-line video stream containing hand gestures can be considered as a signal S(x, y, t)
    • (x, y) denotes the image coordinates
    • t denotes time
  • Convert the image from RGB to HSI to extract the intensity signal I(x, y, t)
  • Based on the YUV and YIQ representation, skin pixels are detected and form a binary image sequence M′(x, y, t) – the region mask
  • Another binary image sequence M′′(x, y, t), reflecting the motion information between every consecutive pair of intensity images, is produced – the motion mask
  • M(x, y, t) delineates the moving skin region, obtained by a logical AND between the corresponding region mask and motion mask sequences
• Normalization
  • The detection results are transformed into gray-scale images of 36×36 pixels
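A minimal sketch of the mask combination and 36×36 normalization, assuming simple frame differencing for the motion mask and nearest-neighbour resampling (the slides specify neither detail; all function names are illustrative):

```python
import numpy as np

def motion_mask(I_prev, I_curr, thresh=15):
    """Binary motion mask M'' from two consecutive intensity frames.
    The difference threshold (15) is an assumed value."""
    return np.abs(I_curr.astype(np.int32) - I_prev.astype(np.int32)) > thresh

def moving_skin_region(region_mask, motion_m):
    """M(x, y, t): logical AND of the region mask M' and motion mask M''."""
    return region_mask & motion_m

def normalize_patch(gray, mask, size=36):
    """Crop the bounding box of the masked region and resample it to a
    size x size gray-scale image (nearest-neighbour, to stay dependency-free)."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:                       # nothing detected in this frame
        return np.zeros((size, size), dtype=gray.dtype)
    patch = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    ry = np.arange(size) * patch.shape[0] // size
    rx = np.arange(size) * patch.shape[1] // size
    return patch[np.ix_(ry, rx)]
```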
Gesture Detection:-
Gestures are detected at a suitable frame rate.
The camera captures the hand gesture, and the Canny edge detection algorithm is applied before the gestures are stored.
Locally Linear Embedding:-
• Sparse data vs. a high-dimensional space
  • 30 different gestures, 120 samples/gesture
  • 36×36 pixels
  • 3600 training samples vs. dimensionality d = 1296
  • Difficult to describe the data distribution
• Reduce the dimensionality of the hand gesture images
• Locally Linear Embedding maps the high-dimensional data to a single global coordinate system while preserving the neighbouring relations
• Given n input vectors {x1, x2, …, xn} in R^d → LLE algorithm → {y1, y2, …, yn} in R^m (m << d)
  • Find the k nearest neighbours of each point xi
  • Measure the reconstruction error from approximating each point by its neighbours, and compute the reconstruction weights that minimize this error
  • Compute the low-dimensional embedding by minimizing an embedding cost function with the reconstruction weights
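The three steps above are exactly what scikit-learn's `LocallyLinearEmbedding` performs; a sketch with synthetic data standing in for the 1296-dimensional gesture images (the slides' own implementation is not shown, and the parameter values here are assumptions):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Synthetic stand-in for the gesture data: n samples of d = 1296 features
# (36 x 36 pixel images flattened into vectors).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1296))

# n_neighbors is the k of step 1; n_components is the target dimension m.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)
print(Y.shape)  # (200, 2)
```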
Sign Language:-
• 5000 gestures in the vocabulary
• Each gesture consists of a hand shape, a hand motion, and a location in 3D space
• Facial expressions are important
• Full grammar and syntax
• Each country has its own sign language
  • Irish Sign Language is different from British Sign Language or American Sign Language
[Figure: fingerspelled letters A, B, C]
System Training:-
System training is done using Neuroph, an open-source image recognition tool that takes images as input and produces a neural network.
This neural network can be trained to recognize the gestures.
It can be integrated into our application through Java classes, using the plug-in provided with the tool.
Datagloves:-
• Datagloves provide very accurate measurements of hand shape
• But they are cumbersome to wear
• Expensive
• Connected by wires, which restricts freedom of movement
Datagloves - the future:-
• Will get lighter and more flexible
• Will get cheaper ~ $100
• Wireless?
Our vision-based system:-
• Wireless & flexible
• No specialised hardware
• Single camera
• Real-time
Coloured Gloves:-
• The user must wear coloured gloves
• Very cheap
• Easy to put on
• BUT they get dirty
• Eventually we wish to use natural skin
Feature Space:-
• Each point represents a different image
• Clusters of points represent different hand shapes
• The distance between points depends on how similar the images are
• A continuous gesture creates a trajectory in feature space
• We can project a new image onto the trajectory
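Projecting a new image onto a trajectory can be done by finding the closest point on the polyline formed by the gesture's feature points; a sketch under that assumption (the slides do not give the projection method, and `project_onto_trajectory` is a hypothetical name):

```python
import numpy as np

def project_onto_trajectory(traj, p):
    """Project feature point p onto the polyline 'traj' (an n x d array of
    consecutive feature points from a continuous gesture).
    Returns the closest point on the polyline and the index of the
    segment it lies on."""
    best, best_d2, best_i = None, np.inf, -1
    for i in range(len(traj) - 1):
        a, b = traj[i], traj[i + 1]
        ab = b - a
        # Parameter of the orthogonal projection, clamped to the segment.
        t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        q = a + t * ab
        d2 = np.dot(p - q, p - q)
        if d2 < best_d2:
            best, best_d2, best_i = q, d2, i
    return best, best_i
```

The segment index indicates how far through the gesture the new image falls, which is what makes trajectory matching useful for recognizing gestures in progress.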
Experiments:-
• 4125 images covering all 30 hand gestures
• 60% for training, 40% for testing
• For each image:
  • 320×240 pixels, 24-bit color depth
  • Taken from the camera at different distances and orientations
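The 60/40 split can be reproduced with scikit-learn's `train_test_split` (assumed tooling; the slides do not say how the split was made, and the data here is synthetic):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 4125 normalized gesture images
# (36 x 36 = 1296 features) and their 30 class labels.
rng = np.random.default_rng(0)
X = rng.random((4125, 1296))
y = rng.integers(0, 30, size=4125)

# 60%/40% split, stratified so each gesture class keeps its proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=0)
print(len(X_train), len(X_test))  # 2475 1650
```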
Experiment Results:-
Data # of Samples Recognized Recognition Rate
Samples (%)
Training 2475 2309 93.3
Testing 1650 1495 90.6
Total 4125 3804 92.2
Conclusion:-
• Robust against similar postures under different lighting conditions and backgrounds
• Fast detection process, allowing real-time video applications with low-cost hardware such as a PC and a USB camera