Hand Gesture Recognition
Submitted in partial fulfillment of the requirements
                  of the course of
            Image Analysis and
             Computer Vision
                        by
         Nemin Sheth 60003210214
         Harsh Chheda 60003210218
         Rushit Jhaveri 60003210221
                under guidance of
                Prof. Neha Katre
    Department of Information Technology
  Dwarkadas J. Sanghvi College of Engineering,
               Mumbai – 400 056
                   2023-24
                                         Abstract
In recent years, hand gesture recognition has garnered significant attention due to its potential
applications in various fields, including human-computer interaction, sign language recognition,
and virtual reality. This project presents a comprehensive system for hand gesture recognition
that can detect and interpret gestures made by both hands simultaneously, with a focus on
recognizing numbers. The proposed system utilizes computer vision techniques and deep
learning algorithms to analyze live video input from a camera, extract hand gestures, and classify
them into predefined categories corresponding to numerical digits and other basic gestures. The
system's architecture incorporates Convolutional Neural Networks (CNNs) for feature extraction
and classification, along with techniques for hand detection, tracking, and pose estimation.
Additionally, real-time processing capabilities are achieved through optimization techniques to
ensure efficient performance. The experimental evaluation demonstrates the effectiveness and
robustness of the proposed system in accurately recognizing hand gestures of numbers, under
various environmental conditions and with different hand orientations. The system's versatility
and accuracy make it suitable for a wide range of applications, such as gesture-based input
devices, virtual reality interfaces, and assistive technologies for individuals with disabilities.
Through this project, we aim to contribute to the advancement of hand gesture recognition
technology and its practical applications in various domains. By enabling accurate and efficient
recognition of hand gestures, our system opens up new possibilities for intuitive human-
computer interaction, accessibility solutions, and innovative user interfaces. Moreover, the
insights gained from this research can inform future developments in gesture recognition
systems and pave the way for novel applications in fields such as healthcare, education, and
entertainment.
                                               vi
                                 Table of Contents
1. Introduction
       Motivation / Objective
2. Literature Review
       Literature Related to Existing Systems
3. Proposed Methodology/ Approach
       Problem Definition
       Scope
       Proposed Approach to build the system
4. System Design
       Proposed System Architecture
5. Implementation
       Description of Datasets
       Description of Tools used
       Interface Design
       Code
6. Conclusion
7. References (list of papers, books, websites/ blogs, any other resources referred)
                                               vi
                                               i
1. Introduction
Motivation / Objective
The project aims to develop a sophisticated hand gesture recognition system capable of
accurately interpreting gestures made by both hands simultaneously, with a specific emphasis on
recognizing numerical digits. This endeavor is driven by the increasing demand for intuitive
human-computer interaction methods that harness natural hand movements, particularly in
scenarios where traditional input devices may be impractical or cumbersome. By enabling users
to communicate with computers through hand gestures, the system seeks to enhance user
experience and streamline interactions across various computing platforms, including desktop,
mobile, and immersive technologies.
One key motivation behind the project is to promote accessibility and inclusivity by providing
alternative input methods for individuals with mobility impairments or those who prefer non-
verbal communication modalities. Recognizing hand gestures, including numbers, can facilitate
easier access to digital platforms and services for a broader range of users. Moreover, the project
endeavors to advance technological solutions for virtual reality (VR) and augmented reality
(AR) applications by enabling seamless integration of hand tracking and interaction in virtual
environments. This not only fosters immersive experiences but also enhances user engagement
and interaction within VR/AR applications.
Furthermore, the project seeks to contribute to the advancement of computer vision techniques
and deep learning algorithms by leveraging technological advancements in these domains. By
addressing challenges such as occlusions, varying lighting conditions, and complex hand poses,
the developed system aims to push the boundaries of hand gesture recognition technology.
Beyond academic research, the system holds potential for practical applications in industries
such as robotics, smart manufacturing, and human-robot collaboration, where intuitive human-
machine interaction is crucial for enhancing productivity and safety. Overall, the project on hand
gesture recognition with both hands, including recognizing numbers, endeavors to bridge the gap
between humans and machines, opening up new possibilities for seamless and intuitive
interaction in diverse contexts.
                                                v
                                                ii
                                                i`
2. Literature Review
The comprehensive review presented by Mohamed et. al. in [1] highlights the significant
progress and ongoing research efforts in vision-based hand gesture recognition systems from
2014 to 2020. The analysis of 98 articles extracted from reputable online databases sheds light
on key areas of focus within the field. Notably, the majority of research concentrates on data
acquisition, data environment, and hand gesture representation, reflecting the foundational
pillars of gesture recognition technology. However, despite advancements, there remains a
notable gap in continuous gesture recognition, suggesting the need for further exploration and
refinement in this aspect to achieve practical implementation of vision-based gesture recognition
systems.
Addressing the challenges of real-time hand gesture recognition in complex environments, Wu
et. al. [2] presents a novel approach leveraging deep learning techniques for unmanned vehicle
control. By integrating the ssd_mobilenet model for hand detection and Convolutional Pose
Machines (CPMs) for keypoint detection, the system achieves impressive real-time performance
with a recognition accuracy of 96.7%. The utilization of multi-frame recursion further enhances
robustness, demonstrating the applicability of the proposed method in real-world scenarios.
Introducing a novel technique for computer interaction through hand gestures, Hussain et. al. [3]
proposes a comprehensive framework encompassing hand shape recognition, dynamic hand
tracing, and gesture command conversion. By leveraging computer vision algorithms, the system
achieves a remarkable accuracy of 93.09% across six static and eight dynamic hand gestures.
This research contributes to the development of more natural and intuitive human-machine
interfaces, offering new avenues for interaction design and usability enhancement.
With the increasing demand for human-machine interaction, Sun et. al. [4] addresses the
practical application of hand gesture recognition in various domains such as robot control and
intelligent furniture. The proposed methodology combines skin color segmentation, real-time
hand gesture tracking, and convolutional neural network (CNN) recognition to achieve an
impressive accuracy of 98.3% in recognizing common digits. This underscores the efficacy of
integrating computer vision techniques with machine learning algorithms for accurate and
efficient gesture recognition.
Focusing on the accessibility aspect, Swati et. al. [5] presents a simple yet effective technique
for gesture recognition using Python libraries such as OpenCV and NumPy. By considering
parameters like area ratio and convexity defects, the system distinguishes between different
gestures, offering potential applications for entertainment and accessibility purposes. This
research contributes to the democratization of gesture recognition technology, making it
accessible to a wider audience, including individuals with disabilities.
                                                 ix
3. Proposed Methodology/ Approach
Problem Definition
The problem addressed in this project is the development of a robust hand gesture recognition
system capable of accurately interpreting gestures made by both hands, with a specific focus on
recognizing numerical digits. Traditional input methods such as keyboards and touchscreens
may not always be suitable, especially in scenarios where users require hands-free interaction or
have mobility impairments. Therefore, there is a need for an intuitive and efficient system that
can understand and interpret hand gestures, enabling users to interact with computers and
devices seamlessly.
Scope
The scope of this project encompasses the following aspects:
1. Hand Gesture Recognition: The system will be designed to detect and recognize hand gestures
made by users in real-time. This includes recognizing numerical digits as well as potentially
other predefined gestures.
2. Simultaneous Recognition of Both Hands: The system will be capable of recognizing gestures
made by both hands simultaneously, allowing for more complex interactions and applications.
3. Real-time Performance: Emphasis will be placed on achieving real-time processing
capabilities to ensure smooth and responsive interaction with the system.
4. Versatility and Adaptability: The system will be designed to adapt to different hand shapes,
sizes, and orientations, enhancing its usability across diverse user populations and environments.
5. Accuracy and Robustness: The system will aim to achieve high accuracy in gesture
recognition while being robust to variations in lighting conditions, background clutter, and
occlusions.
6. Integration Potential: The system will be designed with the potential for integration into
various applications and devices, such as assistive technologies, virtual reality interfaces, and
interactive displays.
Proposed Approach to Build System
The proposed approach to building the hand gesture recognition system involves the following
steps:
1. Data Acquisition: Gather a diverse dataset of hand gestures, including numerical digits,
performed by different individuals in various environments. This dataset will serve as the
training data for the system.
2. Preprocessing: Preprocess the input images to enhance contrast, remove noise, and normalize
lighting conditions. This step ensures that the input data is clean and standardized for further
processing.
3. Feature Extraction: Utilize computer vision techniques to extract relevant features from the
hand gestures, such as key landmarks and spatialx relationships between them. This step
transforms the raw input data into a more meaningful
                                                `      representation that can be used for gesture
classification.
4. Model Training: Train a deep learning model, such as a Convolutional Neural Network
(CNN), using the preprocessed data to recognize hand gestures. The model will learn to map
input images to corresponding gesture labels, enabling accurate classification.
5. Real-time Processing: Implement the trained model into a real-time processing pipeline,
where live video input from a camera is continuously fed into the system. The system will detect
and recognize hand gestures in real-time, allowing for immediate feedback and interaction.
6. Evaluation and Optimization: Evaluate the performance of the system using metrics such as
accuracy, precision, and recall. Identify areas for improvement and optimize the system to
enhance its performance and robustness.
7. Integration and Deployment: Integrate the hand gesture recognition system into the desired
applications or devices, ensuring compatibility and seamless integration. Deploy the system for
practical use, where users can interact with computers and devices using hand gestures
effectively and intuitively.
                                               xi
4. System Design
MediaPipe Architecture
                         x
                         ii
                         `
5. Implementation
Decription of Datasets
For training and testing the hand gesture recognition system, diverse datasets of hand gestures
will be utilized. These datasets will include examples of numerical digits (0-9) performed by
different individuals in various environments. Additionally, the datasets may contain other
predefined gestures relevant to the intended applications of the system. The datasets will be
annotated with labels corresponding to the gestures performed in each image or video frame. To
ensure robustness and generalization of the system, the datasets will encompass variations in
hand shapes, sizes, orientations, lighting conditions, and backgrounds. Some commonly used
datasets for hand gesture recognition include the American Sign Language (ASL) Alphabet
dataset, ChaLearn Looking at People dataset, and custom datasets collected specifically for the
project.
Description of Tools Used
1. OpenCV (Open Source Computer Vision Library): OpenCV will be used for image and video
processing tasks, including input/output operations, preprocessing, and feature extraction. It
provides a comprehensive set of functions and algorithms for computer vision tasks, making it
suitable for implementing various components of the hand gesture recognition system.
2. MediaPipe: MediaPipe offers pre-trained models and pipelines for hand tracking and gesture
recognition, which can be integrated into the system for efficient hand detection and tracking. It
provides high-level abstractions and APIs for building real-time applications with complex
computer vision tasks.
3. TensorFlow/Keras: TensorFlow and Keras will be utilized for training and deploying deep
learning models for gesture recognition. These frameworks offer powerful tools and utilities for
building and training convolutional neural networks (CNNs), which are commonly used for
image classification tasks.
4. NumPy: NumPy will be used for numerical computations and array manipulations,
particularly in preprocessing steps and data manipulation tasks. Its efficient array operations and
mathematical functions make it suitable for handling large datasets and performing complex
computations.
5. Matplotlib: Matplotlib will be used for data visualization and analysis, allowing for the
visualization of training/validation metrics, model performance, and debugging.
Interface Design
The interface design of the hand gesture recognition system will focus on providing a user-
friendly and intuitive experience for interacting with the system. The interface will consist of the
following components:
1. Live Video Feed: The interface will display a live video feed from the camera, showing the
user's hand gestures in real-time.
2. Gesture Recognition Output: The interface will provide visual feedback indicating the
recognized gestures, such as highlighting the detected hand regions and displaying
                                                xi
corresponding numerical digits or predefined gesture  labels.
                                                ii
3. User Instructions: Clear and concise instructions will be provided to guide users on how to
perform different gestures and interact with the system effectively.
4. Control Options: The interface may include options for controlling the system settings, such
as adjusting sensitivity levels, switching between different gesture recognition modes, and
calibrating the camera.
5. Accessibility Features: Consideration will be given to incorporating accessibility features,
such as adjustable font sizes, high contrast modes, and voice-guided instructions, to
accommodate users with diverse needs and preferences.
                                                 x
                                                 i
                                                 v
Code:
import cv2
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands
# For webcam input:
cap = cv2.VideoCapture(0)
with mp_hands.Hands(
  model_complexity=0,
  min_detection_confidence=0.5,
  min_tracking_confidence=0.5) as hands:
 while cap.isOpened():
  success, image = cap.read()
  if not success:
    print("Ignoring empty camera frame.")
    # If loading a video, use 'break' instead of 'continue'.
    continue
  # To improve performance, optionally mark the image as not writeable to
  # pass by reference.
  image.flags.writeable = False
  image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
  results = hands.process(image)
  # Draw the hand annotations on the image.
  image.flags.writeable = True
  image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
  # Initially set finger count to 0 for each cap
  fingerCount = 0
  if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
     # Get hand index to check label (left or right)
     handIndex = results.multi_hand_landmarks.index(hand_landmarks)
     handLabel = results.multi_handedness[handIndex].classification[0].label
     # Set variable to keep landmarks positions (x and y)
     handLandmarks = []
     # Fill list with x and y positions of each landmark
     for landmarks in hand_landmarks.landmark:
      handLandmarks.append([landmarks.x, landmarks.y])
     # Test conditions for each finger: Count is increased if finger is
     # considered raised.
     # Thumb: TIP x position must be greater orxvlower than IP x position,
     # deppeding on hand label.
     if handLabel == "Left" and handLandmarks[4][0] > handLandmarks[3][0]:
     fingerCount = fingerCount+1
    elif handLabel == "Right" and handLandmarks[4][0] < handLandmarks[3][0]:
     fingerCount = fingerCount+1
    # Other fingers: TIP y position must be lower than PIP y position,
    # as image origin is in the upper left corner.
    if handLandmarks[8][1] < handLandmarks[6][1]:        #Index finger
      fingerCount = fingerCount+1
    if handLandmarks[12][1] < handLandmarks[10][1]: #Middle finger
      fingerCount = fingerCount+1
    if handLandmarks[16][1] < handLandmarks[14][1]: #Ring finger
      fingerCount = fingerCount+1
    if handLandmarks[20][1] < handLandmarks[18][1]: #Pinky
      fingerCount = fingerCount+1
    # Draw hand landmarks
    mp_drawing.draw_landmarks(
      image,
      hand_landmarks,
      mp_hands.HAND_CONNECTIONS,
      mp_drawing_styles.get_default_hand_landmarks_style(),
      mp_drawing_styles.get_default_hand_connections_style())
   # Display finger count
   cv2.putText(image, str(fingerCount), (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 3, (255,
0, 0), 10)
  # Display image
  cv2.imshow('MediaPipe Hands', image)
  if cv2.waitKey(5) & 0xFF == 27:
    break
cap.release()
                                            x
                                            v
                                            i`
Output
Gestures with one hand:
                          xv
                          ii
Gestures with both hands
                           x
                           v
                           ii
xi
x
Conclusion
In conclusion, the development of a hand gesture recognition system capable of accurately
interpreting gestures made by both hands, including the recognition of numerical digits,
represents a significant advancement in human-computer interaction technology. Through the
integration of computer vision techniques, deep learning algorithms, and real-time processing
capabilities, the system offers users a seamless and intuitive means of interacting with computers
and devices. The diverse datasets, tools, and interface design considerations outlined in this
project lay the foundation for the creation of a robust and versatile system that can be deployed
across various applications and environments.
Moving forward, further research and development efforts will focus on refining the system's
accuracy, robustness, and usability, addressing challenges such as variations in hand poses,
lighting conditions, and background clutter. Additionally, ongoing advancements in computer
vision, machine learning, and hardware technologies will enable the continuous enhancement
and evolution of hand gesture recognition systems, opening up new possibilities for innovative
applications in fields such as healthcare, education, entertainment, and assistive technologies.
Ultimately, the convergence of interdisciplinary expertise and collaborative efforts will drive the
realization of more natural, intuitive, and inclusive human-machine interaction paradigms,
shaping the future of computing and interaction design.
                                                x
                                                x
                                                `
References
1. A Review of the Hand Gesture Recognition System: Current Progress and Future Directions.
(2021). IEEE Journals & Magazine | IEEE Xplore.
https://ieeexplore.ieee.org/document/9622242
2. Real-time Hand Gesture Recognition Based on Deep Learning in Complex Environments.
(2019, June 1). IEEE Conference Publication | IEEE Xplore.
https://ieeexplore.ieee.org/document/8833328
3. Hand gesture recognition using deep learning. (2017, November 1). IEEE Conference
Publication | IEEE Xplore. https://ieeexplore.ieee.org/document/8368821
4. Research on the Hand Gesture Recognition Based on Deep Learning. (2018, December 1).
IEEE Conference Publication | IEEE Xplore. https://ieeexplore.ieee.org/document/8634348
5. Hand Gesture Recognition to Facilitate Tasks for the Disabled. (2022, February 23). IEEE
Conference Publication | IEEE Xplore. https://ieeexplore.ieee.org/document/9743056
                                              xx
                                              i