0% found this document useful (0 votes)
54 views17 pages

IACV - Mini Project Report - 214

The document presents a project on hand gesture recognition aimed at accurately interpreting gestures made by both hands, focusing on numerical digits. Utilizing computer vision and deep learning techniques, the system is designed for real-time processing and aims to enhance human-computer interaction, particularly for individuals with mobility impairments. The project highlights the potential applications in various fields, including virtual reality and assistive technologies, while addressing challenges in gesture recognition under diverse conditions.

Uploaded by

neminsheth24
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
54 views17 pages

IACV - Mini Project Report - 214

The document presents a project on hand gesture recognition aimed at accurately interpreting gestures made by both hands, focusing on numerical digits. Utilizing computer vision and deep learning techniques, the system is designed for real-time processing and aims to enhance human-computer interaction, particularly for individuals with mobility impairments. The project highlights the potential applications in various fields, including virtual reality and assistive technologies, while addressing challenges in gesture recognition under diverse conditions.

Uploaded by

neminsheth24
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

Hand Gesture Recognition

Submitted in partial fulfillment of the requirements


of the course of

Image Analysis and


Computer Vision

by

Nemin Sheth 60003210214


Harsh Chheda 60003210218
Rushit Jhaveri 60003210221

under guidance of

Prof. Neha Katre

Department of Information Technology


Dwarkadas J. Sanghvi College of Engineering,
Mumbai – 400 056
2023-24
Abstract
In recent years, hand gesture recognition has garnered significant attention due to its potential
applications in various fields, including human-computer interaction, sign language recognition,
and virtual reality. This project presents a comprehensive system for hand gesture recognition
that can detect and interpret gestures made by both hands simultaneously, with a focus on
recognizing numbers. The proposed system utilizes computer vision techniques and deep
learning algorithms to analyze live video input from a camera, extract hand gestures, and classify
them into predefined categories corresponding to numerical digits and other basic gestures. The
system's architecture incorporates Convolutional Neural Networks (CNNs) for feature extraction
and classification, along with techniques for hand detection, tracking, and pose estimation.
Additionally, real-time processing capabilities are achieved through optimization techniques to
ensure efficient performance. The experimental evaluation demonstrates the effectiveness and
robustness of the proposed system in accurately recognizing hand gestures of numbers, under
various environmental conditions and with different hand orientations. The system's versatility
and accuracy make it suitable for a wide range of applications, such as gesture-based input
devices, virtual reality interfaces, and assistive technologies for individuals with disabilities.
Through this project, we aim to contribute to the advancement of hand gesture recognition
technology and its practical applications in various domains. By enabling accurate and efficient
recognition of hand gestures, our system opens up new possibilities for intuitive human-
computer interaction, accessibility solutions, and innovative user interfaces. Moreover, the
insights gained from this research can inform future developments in gesture recognition
systems and pave the way for novel applications in fields such as healthcare, education, and
entertainment.

vi
Table of Contents

1. Introduction
Motivation / Objective
2. Literature Review
Literature Related to Existing Systems
3. Proposed Methodology/ Approach
Problem Definition
Scope
Proposed Approach to build the system
4. System Design
Proposed System Architecture
5. Implementation
Description of Datasets
Description of Tools used
Interface Design
Code
6. Conclusion
7. References (list of papers, books, websites/ blogs, any other resources referred)

vi
i
1. Introduction

Motivation / Objective

The project aims to develop a sophisticated hand gesture recognition system capable of
accurately interpreting gestures made by both hands simultaneously, with a specific emphasis on
recognizing numerical digits. This endeavor is driven by the increasing demand for intuitive
human-computer interaction methods that harness natural hand movements, particularly in
scenarios where traditional input devices may be impractical or cumbersome. By enabling users
to communicate with computers through hand gestures, the system seeks to enhance user
experience and streamline interactions across various computing platforms, including desktop,
mobile, and immersive technologies.

One key motivation behind the project is to promote accessibility and inclusivity by providing
alternative input methods for individuals with mobility impairments or those who prefer non-
verbal communication modalities. Recognizing hand gestures, including numbers, can facilitate
easier access to digital platforms and services for a broader range of users. Moreover, the project
endeavors to advance technological solutions for virtual reality (VR) and augmented reality
(AR) applications by enabling seamless integration of hand tracking and interaction in virtual
environments. This not only fosters immersive experiences but also enhances user engagement
and interaction within VR/AR applications.

Furthermore, the project seeks to contribute to the advancement of computer vision techniques
and deep learning algorithms by leveraging technological advancements in these domains. By
addressing challenges such as occlusions, varying lighting conditions, and complex hand poses,
the developed system aims to push the boundaries of hand gesture recognition technology.
Beyond academic research, the system holds potential for practical applications in industries
such as robotics, smart manufacturing, and human-robot collaboration, where intuitive human-
machine interaction is crucial for enhancing productivity and safety. Overall, the project on hand
gesture recognition with both hands, including recognizing numbers, endeavors to bridge the gap
between humans and machines, opening up new possibilities for seamless and intuitive
interaction in diverse contexts.

v
ii
i`
2. Literature Review

The comprehensive review presented by Mohamed et. al. in [1] highlights the significant
progress and ongoing research efforts in vision-based hand gesture recognition systems from
2014 to 2020. The analysis of 98 articles extracted from reputable online databases sheds light
on key areas of focus within the field. Notably, the majority of research concentrates on data
acquisition, data environment, and hand gesture representation, reflecting the foundational
pillars of gesture recognition technology. However, despite advancements, there remains a
notable gap in continuous gesture recognition, suggesting the need for further exploration and
refinement in this aspect to achieve practical implementation of vision-based gesture recognition
systems.

Addressing the challenges of real-time hand gesture recognition in complex environments, Wu


et. al. [2] presents a novel approach leveraging deep learning techniques for unmanned vehicle
control. By integrating the ssd_mobilenet model for hand detection and Convolutional Pose
Machines (CPMs) for keypoint detection, the system achieves impressive real-time performance
with a recognition accuracy of 96.7%. The utilization of multi-frame recursion further enhances
robustness, demonstrating the applicability of the proposed method in real-world scenarios.

Introducing a novel technique for computer interaction through hand gestures, Hussain et. al. [3]
proposes a comprehensive framework encompassing hand shape recognition, dynamic hand
tracing, and gesture command conversion. By leveraging computer vision algorithms, the system
achieves a remarkable accuracy of 93.09% across six static and eight dynamic hand gestures.
This research contributes to the development of more natural and intuitive human-machine
interfaces, offering new avenues for interaction design and usability enhancement.

With the increasing demand for human-machine interaction, Sun et. al. [4] addresses the
practical application of hand gesture recognition in various domains such as robot control and
intelligent furniture. The proposed methodology combines skin color segmentation, real-time
hand gesture tracking, and convolutional neural network (CNN) recognition to achieve an
impressive accuracy of 98.3% in recognizing common digits. This underscores the efficacy of
integrating computer vision techniques with machine learning algorithms for accurate and
efficient gesture recognition.

Focusing on the accessibility aspect, Swati et. al. [5] presents a simple yet effective technique
for gesture recognition using Python libraries such as OpenCV and NumPy. By considering
parameters like area ratio and convexity defects, the system distinguishes between different
gestures, offering potential applications for entertainment and accessibility purposes. This
research contributes to the democratization of gesture recognition technology, making it
accessible to a wider audience, including individuals with disabilities.

ix
3. Proposed Methodology/ Approach

Problem Definition

The problem addressed in this project is the development of a robust hand gesture recognition
system capable of accurately interpreting gestures made by both hands, with a specific focus on
recognizing numerical digits. Traditional input methods such as keyboards and touchscreens
may not always be suitable, especially in scenarios where users require hands-free interaction or
have mobility impairments. Therefore, there is a need for an intuitive and efficient system that
can understand and interpret hand gestures, enabling users to interact with computers and
devices seamlessly.

Scope

The scope of this project encompasses the following aspects:

1. Hand Gesture Recognition: The system will be designed to detect and recognize hand gestures
made by users in real-time. This includes recognizing numerical digits as well as potentially
other predefined gestures.

2. Simultaneous Recognition of Both Hands: The system will be capable of recognizing gestures
made by both hands simultaneously, allowing for more complex interactions and applications.

3. Real-time Performance: Emphasis will be placed on achieving real-time processing


capabilities to ensure smooth and responsive interaction with the system.

4. Versatility and Adaptability: The system will be designed to adapt to different hand shapes,
sizes, and orientations, enhancing its usability across diverse user populations and environments.

5. Accuracy and Robustness: The system will aim to achieve high accuracy in gesture
recognition while being robust to variations in lighting conditions, background clutter, and
occlusions.

6. Integration Potential: The system will be designed with the potential for integration into
various applications and devices, such as assistive technologies, virtual reality interfaces, and
interactive displays.

Proposed Approach to Build System

The proposed approach to building the hand gesture recognition system involves the following
steps:

1. Data Acquisition: Gather a diverse dataset of hand gestures, including numerical digits,
performed by different individuals in various environments. This dataset will serve as the
training data for the system.

2. Preprocessing: Preprocess the input images to enhance contrast, remove noise, and normalize
lighting conditions. This step ensures that the input data is clean and standardized for further
processing.

3. Feature Extraction: Utilize computer vision techniques to extract relevant features from the
hand gestures, such as key landmarks and spatialx relationships between them. This step
transforms the raw input data into a more meaningful
` representation that can be used for gesture
classification.

4. Model Training: Train a deep learning model, such as a Convolutional Neural Network
(CNN), using the preprocessed data to recognize hand gestures. The model will learn to map
input images to corresponding gesture labels, enabling accurate classification.

5. Real-time Processing: Implement the trained model into a real-time processing pipeline,
where live video input from a camera is continuously fed into the system. The system will detect
and recognize hand gestures in real-time, allowing for immediate feedback and interaction.

6. Evaluation and Optimization: Evaluate the performance of the system using metrics such as
accuracy, precision, and recall. Identify areas for improvement and optimize the system to
enhance its performance and robustness.

7. Integration and Deployment: Integrate the hand gesture recognition system into the desired
applications or devices, ensuring compatibility and seamless integration. Deploy the system for
practical use, where users can interact with computers and devices using hand gestures
effectively and intuitively.

xi
4. System Design

MediaPipe Architecture

x
ii
`
5. Implementation

Decription of Datasets

For training and testing the hand gesture recognition system, diverse datasets of hand gestures
will be utilized. These datasets will include examples of numerical digits (0-9) performed by
different individuals in various environments. Additionally, the datasets may contain other
predefined gestures relevant to the intended applications of the system. The datasets will be
annotated with labels corresponding to the gestures performed in each image or video frame. To
ensure robustness and generalization of the system, the datasets will encompass variations in
hand shapes, sizes, orientations, lighting conditions, and backgrounds. Some commonly used
datasets for hand gesture recognition include the American Sign Language (ASL) Alphabet
dataset, ChaLearn Looking at People dataset, and custom datasets collected specifically for the
project.

Description of Tools Used

1. OpenCV (Open Source Computer Vision Library): OpenCV will be used for image and video
processing tasks, including input/output operations, preprocessing, and feature extraction. It
provides a comprehensive set of functions and algorithms for computer vision tasks, making it
suitable for implementing various components of the hand gesture recognition system.

2. MediaPipe: MediaPipe offers pre-trained models and pipelines for hand tracking and gesture
recognition, which can be integrated into the system for efficient hand detection and tracking. It
provides high-level abstractions and APIs for building real-time applications with complex
computer vision tasks.

3. TensorFlow/Keras: TensorFlow and Keras will be utilized for training and deploying deep
learning models for gesture recognition. These frameworks offer powerful tools and utilities for
building and training convolutional neural networks (CNNs), which are commonly used for
image classification tasks.

4. NumPy: NumPy will be used for numerical computations and array manipulations,
particularly in preprocessing steps and data manipulation tasks. Its efficient array operations and
mathematical functions make it suitable for handling large datasets and performing complex
computations.

5. Matplotlib: Matplotlib will be used for data visualization and analysis, allowing for the
visualization of training/validation metrics, model performance, and debugging.

Interface Design

The interface design of the hand gesture recognition system will focus on providing a user-
friendly and intuitive experience for interacting with the system. The interface will consist of the
following components:

1. Live Video Feed: The interface will display a live video feed from the camera, showing the
user's hand gestures in real-time.

2. Gesture Recognition Output: The interface will provide visual feedback indicating the
recognized gestures, such as highlighting the detected hand regions and displaying
xi
corresponding numerical digits or predefined gesture labels.
ii
3. User Instructions: Clear and concise instructions will be provided to guide users on how to
perform different gestures and interact with the system effectively.

4. Control Options: The interface may include options for controlling the system settings, such
as adjusting sensitivity levels, switching between different gesture recognition modes, and
calibrating the camera.

5. Accessibility Features: Consideration will be given to incorporating accessibility features,


such as adjustable font sizes, high contrast modes, and voice-guided instructions, to
accommodate users with diverse needs and preferences.

x
i
v
Code:

import cv2
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands

# For webcam input:


cap = cv2.VideoCapture(0)
with mp_hands.Hands(
model_complexity=0,
min_detection_confidence=0.5,
min_tracking_confidence=0.5) as hands:
while cap.isOpened():
success, image = cap.read()
if not success:
print("Ignoring empty camera frame.")
# If loading a video, use 'break' instead of 'continue'.
continue

# To improve performance, optionally mark the image as not writeable to


# pass by reference.
image.flags.writeable = False
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
results = hands.process(image)

# Draw the hand annotations on the image.


image.flags.writeable = True
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

# Initially set finger count to 0 for each cap


fingerCount = 0

if results.multi_hand_landmarks:

for hand_landmarks in results.multi_hand_landmarks:


# Get hand index to check label (left or right)
handIndex = results.multi_hand_landmarks.index(hand_landmarks)
handLabel = results.multi_handedness[handIndex].classification[0].label

# Set variable to keep landmarks positions (x and y)


handLandmarks = []

# Fill list with x and y positions of each landmark


for landmarks in hand_landmarks.landmark:
handLandmarks.append([landmarks.x, landmarks.y])

# Test conditions for each finger: Count is increased if finger is


# considered raised.
# Thumb: TIP x position must be greater orxvlower than IP x position,
# deppeding on hand label.
if handLabel == "Left" and handLandmarks[4][0] > handLandmarks[3][0]:
fingerCount = fingerCount+1
elif handLabel == "Right" and handLandmarks[4][0] < handLandmarks[3][0]:
fingerCount = fingerCount+1

# Other fingers: TIP y position must be lower than PIP y position,


# as image origin is in the upper left corner.
if handLandmarks[8][1] < handLandmarks[6][1]: #Index finger
fingerCount = fingerCount+1
if handLandmarks[12][1] < handLandmarks[10][1]: #Middle finger
fingerCount = fingerCount+1
if handLandmarks[16][1] < handLandmarks[14][1]: #Ring finger
fingerCount = fingerCount+1
if handLandmarks[20][1] < handLandmarks[18][1]: #Pinky
fingerCount = fingerCount+1

# Draw hand landmarks


mp_drawing.draw_landmarks(
image,
hand_landmarks,
mp_hands.HAND_CONNECTIONS,
mp_drawing_styles.get_default_hand_landmarks_style(),
mp_drawing_styles.get_default_hand_connections_style())

# Display finger count


cv2.putText(image, str(fingerCount), (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 3, (255,
0, 0), 10)

# Display image
cv2.imshow('MediaPipe Hands', image)
if cv2.waitKey(5) & 0xFF == 27:
break
cap.release()

x
v
i`
Output

Gestures with one hand:

xv
ii
Gestures with both hands

x
v
ii
xi
x
Conclusion

In conclusion, the development of a hand gesture recognition system capable of accurately


interpreting gestures made by both hands, including the recognition of numerical digits,
represents a significant advancement in human-computer interaction technology. Through the
integration of computer vision techniques, deep learning algorithms, and real-time processing
capabilities, the system offers users a seamless and intuitive means of interacting with computers
and devices. The diverse datasets, tools, and interface design considerations outlined in this
project lay the foundation for the creation of a robust and versatile system that can be deployed
across various applications and environments.

Moving forward, further research and development efforts will focus on refining the system's
accuracy, robustness, and usability, addressing challenges such as variations in hand poses,
lighting conditions, and background clutter. Additionally, ongoing advancements in computer
vision, machine learning, and hardware technologies will enable the continuous enhancement
and evolution of hand gesture recognition systems, opening up new possibilities for innovative
applications in fields such as healthcare, education, entertainment, and assistive technologies.
Ultimately, the convergence of interdisciplinary expertise and collaborative efforts will drive the
realization of more natural, intuitive, and inclusive human-machine interaction paradigms,
shaping the future of computing and interaction design.

x
x
`
References

1. A Review of the Hand Gesture Recognition System: Current Progress and Future Directions.
(2021). IEEE Journals & Magazine | IEEE Xplore.
https://ieeexplore.ieee.org/document/9622242

2. Real-time Hand Gesture Recognition Based on Deep Learning in Complex Environments.


(2019, June 1). IEEE Conference Publication | IEEE Xplore.
https://ieeexplore.ieee.org/document/8833328

3. Hand gesture recognition using deep learning. (2017, November 1). IEEE Conference
Publication | IEEE Xplore. https://ieeexplore.ieee.org/document/8368821

4. Research on the Hand Gesture Recognition Based on Deep Learning. (2018, December 1).
IEEE Conference Publication | IEEE Xplore. https://ieeexplore.ieee.org/document/8634348

5. Hand Gesture Recognition to Facilitate Tasks for the Disabled. (2022, February 23). IEEE
Conference Publication | IEEE Xplore. https://ieeexplore.ieee.org/document/9743056

xx
i

You might also like