Project report
On
HAND GESTURE RECOGNITION
By
GOWTHAMI 22261A04A5
SREESHANTH KUMAR 22261A04B0
PRASHANTH 22261A04B5
ECE-2
Guided By
Mrs. ARCHANA YADAV
Assistant Professor
Department of ECE
Department of Electronics and Communication Engineering
MAHATMA GANDHI INSTITUTE OF TECHNOLOGY
(Autonomous)
(Kokapet Village, Gandipet, Hyderabad, Telangana – 500 075)
2023
MAHATMA GANDHI INSTITUTE OF TECHNOLOGY
(Autonomous)
(Kokapet Village, Gandipet, Hyderabad, Telangana – 500 075)
Department of Electronics and Communication Engineering
CERTIFICATE
Date:
This is to certify that the project entitled HAND GESTURE RECOGNITION has been
submitted to the Department of Electronics and Communication Engineering, Mahatma Gandhi
Institute of Technology, in partial fulfillment of the requirements of the B.Tech. 5th-semester
Artificial Intelligence course.
Signature                                        Signature
Mrs. ARCHANA YADAV                               Dr. S P Singh
Project Guide                                    Professor & Head of ECE
INDEX
List of Contents
1. Abstract
2. Introduction
3. Code
4. Code Explanation
5. Advantages
6. Result
ABSTRACT
Hand gesture recognition is a rapidly advancing field within computer vision, aiming to interpret human hand
movements as input commands for various applications such as human-computer interaction, sign language
translation, and virtual reality. This project presents a hand gesture recognition system utilizing MediaPipe, a
powerful framework by Google for real-time computer vision applications, in conjunction with OpenCV for
capturing webcam images and visualizing the results. The system tracks key hand landmarks, detects the
number of raised fingers, and classifies gestures in real time.
The recognition process involves detecting and tracking the position of key hand landmarks using MediaPipe's
Hand model, which captures hand shapes and movements with high accuracy. The number of fingers raised is
used as a basic form of gesture recognition, providing a simple and efficient way to interact with the system.
The output is displayed on a live webcam feed, allowing real-time interaction and feedback. The project
focuses on providing a user-friendly interface, with real-time hand gesture tracking and counting.
The system can be extended to support more complex gestures, enabling the development of intuitive
applications for gaming, accessibility tools for the differently-abled, and smart environments. The flexibility
and efficiency of this hand gesture recognition framework make it a viable solution for practical and
innovative uses in various fields of technology and user interaction.
INTRODUCTION
Hand gesture recognition is an innovative and exciting field of study within the domain of computer vision and
human-computer interaction (HCI). It aims to enable computers to interpret and respond to human gestures,
primarily through hand movements. These gestures can be used to convey commands, emotions, or other forms
of information, making hand gesture recognition an essential component of interactive systems, particularly in
applications such as virtual reality, gaming, assistive technologies, and sign language interpretation.
Traditional methods of human-computer interaction, such as keyboards, mice, and touch screens, often require
direct physical contact or precise movements. However, these methods can be limiting in certain environments,
especially in situations where touch is impractical or when hands-free interaction is preferred. Hand gestures
provide an intuitive, natural, and contactless alternative, which makes gesture recognition an attractive
approach for many emerging technologies.
This project leverages MediaPipe, a powerful framework by Google designed for real-time computer vision
tasks, to perform hand gesture recognition. The MediaPipe Hand model detects key landmarks on the hand,
and by analyzing these landmarks, the system can track and identify specific gestures such as the number of
fingers raised. Additionally, the system uses OpenCV for capturing webcam frames and displaying the results
in real-time.
The primary goal of this project is to implement a robust and responsive hand gesture recognition system that
can count the number of raised fingers as a simple gesture. By using computer vision techniques, this system
can detect and interpret hand gestures in real-time, enabling hands-free interaction with various applications.
The system’s versatility and simplicity make it suitable for a wide range of real-time applications, from sign
language translation to intuitive control in gaming and virtual environments.
This project serves as the foundation for developing more complex gesture recognition systems, with potential
future expansions to recognize more intricate hand gestures and provide richer, more interactive user
experiences.
CODE
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
import numpy as np

# Initialize MediaPipe hands
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
hands = mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.7,
                       min_tracking_confidence=0.7)

def display_frame(frame):
    """Display the frame using Matplotlib."""
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # Convert BGR to RGB
    plt.imshow(rgb_frame)
    plt.axis("off")
    plt.pause(0.001)  # Pause to allow real-time updates

def count_fingers(hand_landmarks):
    tips = [8, 12, 16, 20]  # Index, middle, ring, pinky fingertips
    thumb_tip = 4
    finger_status = []
    for tip in tips:
        # A finger is raised if its tip is above the joint two landmarks below it
        if hand_landmarks.landmark[tip].y < hand_landmarks.landmark[tip - 2].y:
            finger_status.append(1)
        else:
            finger_status.append(0)
    # Thumb extends sideways, so compare x-coordinates instead of y
    if hand_landmarks.landmark[thumb_tip].x < hand_landmarks.landmark[thumb_tip - 1].x:
        finger_status.append(1)
    else:
        finger_status.append(0)
    return sum(finger_status)

# Open the webcam
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Error: Could not open webcam.")
    exit()

plt.ion()  # Turn on interactive mode for Matplotlib

try:
    while True:
        ret, frame = cap.read()
        if not ret:
            print("Error: Failed to capture frame.")
            break
        frame = cv2.flip(frame, 1)  # Flip the frame horizontally
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = hands.process(rgb_frame)
        total_fingers = 0
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(frame, hand_landmarks,
                                          mp_hands.HAND_CONNECTIONS)
                total_fingers += count_fingers(hand_landmarks)
        # Draw the total finger count on the frame
        cv2.putText(frame, f"Total Fingers: {total_fingers}", (50, 100),
                    cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 3)
        display_frame(frame)
except KeyboardInterrupt:
    print("Exiting...")
finally:
    cap.release()
    plt.close()
CODE EXPLANATION
1. Import Libraries
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
import numpy as np
cv2: This is OpenCV, a popular library for computer vision tasks. It is used for video capture,
image processing, and displaying the webcam feed.
mediapipe: This is a framework by Google that provides ready-to-use solutions for tasks like hand
tracking, pose detection, etc. It is used here to track and recognize hand landmarks.
matplotlib.pyplot: Used for displaying images or video frames with Matplotlib. Though it is
not the most efficient method for real-time display, it is included here for visualization.
numpy: A powerful library for numerical computing, often used in image processing
tasks. It is not called directly here, but OpenCV's Python bindings depend on it.
2. Initialize MediaPipe for Hand Tracking
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
hands = mp_hands.Hands(max_num_hands=2,
                       min_detection_confidence=0.7,
                       min_tracking_confidence=0.7)
• mp_hands: This accesses the hand-tracking model in MediaPipe.
• mp_drawing: This is used to draw hand landmarks on the frame.
• hands: Initializes the hand detection model to track up to 2 hands with a minimum
confidence of 0.7 for detection and tracking.
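The role of the two confidence thresholds can be sketched in plain Python. The filter_detections helper and its mock score dictionaries below are hypothetical, used only to illustrate how a 0.7 threshold gates low-confidence detections; they are not part of MediaPipe's API:

```python
def filter_detections(detections, min_confidence=0.7):
    """Keep only detections whose score meets the threshold,
    mirroring the role of min_detection_confidence."""
    return [d for d in detections if d["score"] >= min_confidence]

# Three mock detections: only the 0.9 and 0.75 scores clear the 0.7 bar
mock = [{"score": 0.9}, {"score": 0.5}, {"score": 0.75}]
print(len(filter_detections(mock)))  # 2
```

Raising the threshold trades missed detections for fewer false positives; 0.7 is a middle-ground choice.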
3. Display Frame Function
def display_frame(frame):
    """Display the frame using Matplotlib."""
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # Convert BGR to RGB
    plt.imshow(rgb_frame)
    plt.axis("off")
    plt.pause(0.001)  # Pause to allow real-time updates
• This function is used to show the video frame with the finger count overlay. It converts the color
format from BGR (used by OpenCV) to RGB (used by Matplotlib) and displays it.
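What the conversion does at the pixel level can be shown without OpenCV. A minimal sketch, using a single hand-written pixel rather than a real frame:

```python
# A single "pixel" in OpenCV's BGR channel order: pure red is [0, 0, 255]
bgr_pixel = [0, 0, 255]

# Reversing the channel order yields RGB — the per-pixel effect of
# cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
rgb_pixel = bgr_pixel[::-1]
print(rgb_pixel)  # [255, 0, 0]
```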
4. Count Fingers Function
def count_fingers(hand_landmarks):
    tips = [8, 12, 16, 20]  # Fingertips of index, middle, ring, pinky fingers
    thumb_tip = 4
    finger_status = []
    for tip in tips:
        if hand_landmarks.landmark[tip].y < hand_landmarks.landmark[tip - 2].y:
            finger_status.append(1)
        else:
            finger_status.append(0)
    if hand_landmarks.landmark[thumb_tip].x < hand_landmarks.landmark[thumb_tip - 1].x:
        finger_status.append(1)
    else:
        finger_status.append(0)
    return sum(finger_status)
• This function counts how many fingers are raised by checking the position of each finger's tip.
It compares the position of each finger's tip with the joint below it to determine if the finger is raised.
It returns the total number of raised fingers.
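The same tip-versus-joint comparison can be exercised without a webcam by feeding count_fingers a mock landmark object. The SimpleNamespace landmarks below are fabricated test data, not real MediaPipe output; only the indexing convention (21 landmarks, tips at 4, 8, 12, 16, 20) matches MediaPipe's hand model:

```python
from types import SimpleNamespace

def count_fingers(hand_landmarks):
    # Identical logic to the project's count_fingers function
    tips = [8, 12, 16, 20]
    thumb_tip = 4
    finger_status = []
    for tip in tips:
        if hand_landmarks.landmark[tip].y < hand_landmarks.landmark[tip - 2].y:
            finger_status.append(1)
        else:
            finger_status.append(0)
    if hand_landmarks.landmark[thumb_tip].x < hand_landmarks.landmark[thumb_tip - 1].x:
        finger_status.append(1)
    else:
        finger_status.append(0)
    return sum(finger_status)

# 21 mock landmarks all at one point, then raise only the index finger:
# image y grows downward, so a smaller y means the tip is above its joint
landmarks = [SimpleNamespace(x=0.5, y=0.5) for _ in range(21)]
landmarks[8] = SimpleNamespace(x=0.5, y=0.3)  # index tip above landmark 6
hand = SimpleNamespace(landmark=landmarks)
print(count_fingers(hand))  # 1 — only the index finger registers as raised
```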
5. Capture Webcam Video
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Error: Could not open webcam.")
    exit()
cv2.VideoCapture(0): Opens the default webcam to capture video.
If the webcam cannot be opened, the program prints an error message and exits.
6. Main Loop for Processing and Displaying the Frame
try:
    while True:
        ret, frame = cap.read()
        if not ret:
            print("Error: Failed to capture frame.")
            break
• The loop captures each frame from the webcam and processes it continuously.
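The capture-and-check pattern can be sketched with a stand-in for the camera. MockCapture below is a hypothetical class that mimics only the (ret, frame) contract of cap.read(), returning False once its frames run out:

```python
class MockCapture:
    """Mimics cv2.VideoCapture.read(): yields (True, frame) while frames
    remain, then (False, None), which is what ends the loop."""
    def __init__(self, frames):
        self.frames = list(frames)

    def read(self):
        if self.frames:
            return True, self.frames.pop(0)
        return False, None

cap = MockCapture(["frame1", "frame2"])
captured = []
while True:
    ret, frame = cap.read()
    if not ret:
        break  # same exit condition as the project's main loop
    captured.append(frame)
print(captured)  # ['frame1', 'frame2']
```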
7. Process the Frame and Detect Hands
frame = cv2.flip(frame, 1) # Flip the frame horizontally
rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
results = hands.process(rgb_frame)
• cv2.flip(frame, 1): Flips the frame horizontally (like looking in a mirror).
• hands.process(rgb_frame): Processes the frame to detect hand landmarks using the
MediaPipe hand model.
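The mirroring step can be illustrated on a toy row of labels; cv2.flip(frame, 1) performs this reversal along every row of the image:

```python
# One row of a tiny "frame", labelled for clarity
row = ["left", "center", "right"]

# A horizontal flip reverses each row — the effect of cv2.flip(frame, 1)
mirrored = row[::-1]
print(mirrored)  # ['right', 'center', 'left']
```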
8. Draw Hand Landmarks and Count Fingers
total_fingers = 0
if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        mp_drawing.draw_landmarks(frame, hand_landmarks,
                                  mp_hands.HAND_CONNECTIONS)
        total_fingers += count_fingers(hand_landmarks)
• If hands are detected, it draws the hand landmarks on the frame.
• Then, it calculates how many fingers are raised for each hand and adds them up.
9. Display the Finger Count on the Frame
cv2.putText(frame, f"Total Fingers: {total_fingers}", (50, 100),
            cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 3)
• This adds the total number of raised fingers as text on the frame. It uses green color and a
large font size for visibility.
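The overlay string itself is ordinary f-string formatting, which can be checked in isolation with an example count:

```python
total_fingers = 3  # example value, as if three fingers were detected

# The exact label string passed to cv2.putText in the main loop
label = f"Total Fingers: {total_fingers}"
print(label)  # Total Fingers: 3
```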
10. Show the Frame
display_frame(frame)
• This function displays the processed frame using Matplotlib, showing the hand
landmarks and the total number of raised fingers.
11. Exit Condition
except KeyboardInterrupt:
    print("Exiting...")
finally:
    cap.release()
    plt.close()
ADVANTAGES
Hands-Free Interaction: The system allows users to interact with devices without needing to
physically touch them. This is especially useful for applications in environments where hands-free control
is necessary, such as in healthcare, smart homes, or virtual reality.
Real-Time Processing: The system provides real-time feedback, meaning it can detect and count
fingers almost instantly as you raise or lower them. This makes it suitable for interactive applications like
games or controlling devices in real time.
Simple and Intuitive: Gesture-based control is intuitive for most users. Raising fingers to indicate
numbers or actions is natural and easy to understand. This makes the system accessible to a wide range of
users without requiring complex setups or learning.
Accessibility: It can be used to help individuals with disabilities who may have difficulty using
traditional input devices like keyboards or mice. For example, sign language interpretation could be one
potential application, where raised fingers or gestures correspond to specific signs.
Cost-Effective: The system leverages widely available hardware like webcams, and uses open-source
libraries like OpenCV and MediaPipe, which makes it cost-effective for development and deployment. No
specialized sensors are required beyond a standard camera.
Multiple Hands Detection: The system can detect and count fingers from multiple hands
simultaneously, making it more versatile. This is useful in applications where multiple people interact
with the system at once, such as during collaborative tasks or group activities.
RESULT
When you run the Hand Gesture Recognition program, the result will be a live video feed from your
webcam displaying your hand(s) in real-time. The system detects and tracks your hand landmarks (such
as the tips of the fingers and joints) and draws them on the screen. As you raise or lower your fingers, the
program counts the raised fingers and displays the total number in the top-left corner of the screen,
updating dynamically. For example, if you raise three fingers, the display will show "Total Fingers: 3". If
both hands are visible, it will sum the fingers from both hands. The system works smoothly in real-time,
providing instant feedback as you make gestures. The program runs continuously until manually stopped,
and it can detect multiple hands simultaneously, making it useful for interactive applications such as
games, sign language recognition, or control systems that rely on hand gestures.