Report 3
BACHELOR OF ENGINEERING
In
Electronics and Telecommunication Engineering
By
GAURI MYADAMWAR ( 42145 )
VAISHNAVI SHINGOTE ( 42162 )
SHWETA TAIDE ( 42165 )
GUIDED BY :
DR. M. R. KALE
OCTOBER 2024
Department of Electronics and Telecommunication Engineering
Pune Institute of Computer Technology, Pune – 43
CERTIFICATE
This is to certify that the project report submitted by Gauri Myadamwar (42145), Vaishnavi Shingote (42162), and Shweta Taide (42165) is towards the partial fulfillment of the degree of Bachelor of Engineering in Electronics and Telecommunication Engineering as awarded by the Savitribai Phule Pune University, at Pune Institute of Computer Technology during the academic year 2024-25.
ACKNOWLEDGEMENT
We extend our sincere thanks to Dr. M. V. Munot, Dr. R. C. Jaiswal, Dr. G. S. Mundada and Dr. M. R. Kale for their invaluable support and guidance throughout our research. Their expertise, insightful advice, and continuous encouragement were instrumental in refining this project into presentation-worthy work. We are especially grateful for their assistance with algorithm design, calculations, and overall presentation, which significantly contributed to the quality of this research.
Thank you.
CONTENTS
Abstract i
List of Symbols ii
List of Figures iii
List of Tables iv
1 Introduction 1-4
1.1 Background 1
1.2 Relevance 1
1.3 Motivation 1
1.4 Problem Definition 2
1.5 Scope and Objectives 2
1.6 Technical Approach 3
1.7 Organization of the Report 4
2 Literature Survey 5-13
2.1 Introduction 5
2.2 Dataset Description 6
2.3 Proposed Solution 10
2.4 Equations 11
2.5 Tables 13
3 Methodology 14
4 Results and Discussions 18
5 Conclusions & Future Scope 21
References 22
Plagiarism Report 23
ABSTRACT
The most common way of communication for a speech-impaired person is sign language. Many people never learn sign language, which can lead to the social isolation of individuals who are deaf or hard of hearing, as it limits their ability to communicate with the wider population. Sign language is a long-established, natural means of communication, but most hearing people do not know any systematic sign language and a signer cannot have a human interpreter present at all times, so a mediator system that can translate sign language is needed. For this purpose, this report presents a real-time method in which deep learning is used to translate Indian Sign Language (ISL). The foremost purpose of this project is to develop a system that can identify specific ISL words as they are articulated. It captures frames of the hand while words are signed, passes them through a filter and then through a classifier that predicts the class of the hand gesture. Using a CNN architecture, the system recognizes the sign and communicates its corresponding meaning as text on the screen.
Abbreviations and Acronyms
ISL Indian Sign Language
ASL American Sign Language
CNN Convolutional Neural Network
List of Figures
Fig. 1 Block Diagram 3
Fig. 2 Number images from the dataset - ISL Dataset 7
Fig. 3 Alphabet images from the dataset - ISL Dataset 7
Fig. 4 System Architecture 10
Fig. 5 Proposed block diagram 16
List of Tables
1 Summary of Reviewed Literature 8
CHAPTER 1
Introduction
1.1 Background
Hand sign language recognition (HSLR) technology interprets the sign languages used mainly by deaf and hard-of-hearing people. It reduces communication barriers by acting as a bridge between sign language users and the hearing population, and it is therefore an important area for maximizing accessibility and inclusion in healthcare, education, and everyday life. From its beginnings in simple image processing to today's deep learning-based systems, HSLR has advanced largely through Convolutional Neural Networks (CNNs). Challenges remain, including regional sign variants, contextual differences, and the accurate capture of non-manual features such as facial expressions, so research on these recognition systems continues in order to make them more precise, adaptable, and accessible for everyone.
1.2 Relevance
The project uses knowledge from the electronics domain to capture gestures accurately with camera systems. Machine learning algorithms are then applied for analysis, recognition, and real-time classification of hand signs to ensure proper interpretation of sign language.
1.3 Motivation
Hand sign language recognition, due to its potential to improve the lives of deaf and hard-of-
hearing people, is of ever-increasing importance. Early work, however, concentrated on the
apparently simple tasks of image processing, like edge detection or template matching, where
it had significant problems with differences in hand shapes and gestures. With the advancement
in technology, machine learning algorithms, particularly CNNs have been utilized for
enhancing accuracy to identify complex signs, while others remain as challenges, such as
1
regional varieties and non-manual signals. More recent studies start to look into multimodal
approaches and mobile technologies to facilitate immediate translation, thus making HSLR
systems accessible.
1.6 Technical Approach
Convolutional Neural Networks (CNNs), a deep learning technique, are used in the technical approach of this hand sign identification project to classify gestures. First, we gather and preprocess Indian Sign Language (ISL) datasets using image normalization and augmentation. We use OpenCV to separate the hands from the background for hand detection, and then extract features either directly with the CNN or with tools such as MediaPipe. For hand sign recognition, a custom CNN is trained, or a pre-trained model is fine-tuned, on these data. Lastly, the model is integrated into a real-time detection system driven by a camera feed, allowing precise and effective gesture recognition.
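As an illustration of this preprocessing stage, a minimal sketch of the normalization and augmentation step is given below. The directory layout (data/<class_name>/*.jpg), the 128x128 input size, and the augmentation parameters are assumptions for illustration, not the project's final settings.

```python
# Minimal preprocessing/augmentation sketch; paths and sizes are assumed values.
import cv2
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = 128  # assumed CNN input resolution

def load_and_normalize(path):
    """Read one image, convert to RGB, resize, and scale pixel values to [0, 1]."""
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
    return img.astype(np.float32) / 255.0

# Augmentation: small rotations, shifts, and zooms to mimic signer variability.
augmenter = ImageDataGenerator(
    rescale=1.0 / 255.0,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    validation_split=0.2,
)

train_gen = augmenter.flow_from_directory(
    "data/", target_size=(IMG_SIZE, IMG_SIZE),
    class_mode="categorical", subset="training")
```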
1.7 Organization of the Report
In order to present the project clearly, the report is divided into the following chapters.
Chapter 1: Introduction explains the project's goal, provides background information, and highlights the significance of hand sign language recognition.
Chapter 2: Literature Survey summarizes the existing body of knowledge on sign language recognition methods, describes the datasets used, and presents the proposed solution.
Chapter 3: Methodology describes the recognition approach, including details of how it was implemented and the evaluation strategy that was employed.
Chapter 4: Results and Discussion presents graphs, comparative analysis, and performance data for every approach.
Chapter 5: Conclusion and Future Scope outlines the study's main conclusions and offers suggestions for further research.
References: A list of scholarly publications, papers, and other records that are cited in the report.
CHAPTER 2
Literature Survey
2.1 Introduction
Improving accessibility to communication for deaf and hard-of-hearing communities is quite
pertinent with the hand sign language recognition system, which has become particularly
important for Indian Sign Language (ISL). CNNs have proven to be very effective in
leveraging this network for it will literally analyze and detect the visual patterns present in
images. The ISL recognition system captures hand gestures with the help of CNNs that
basically characterize the signs by extracting key features from the images or video frames.
This process converts gestures to text or, in most cases, speech, enabling two signers and one
learner to converse. Developed systems deploy CNN in the recognition of static signs and
dynamic gestures, hence constructing whole sentence conversion systems.
C. Hybrid Model: MediaPipe and CNN for ISL Recognition
MediaPipe tracks hand landmarks and crops the hand regions so that the data can be fed into the CNN for classification. The model detects hands inside video frames and then extracts the 21 landmarks that depict the joints and fingertips [2]. The landmark data or cropped images are then normalized and used as input to the CNN. Inputs can be the (x, y) or (x, y, z) coordinates of the landmarks, or grayscale hand images. The gesture recognition CNN is trained on this data to classify hand signs such as ISL letters or commands. The CNN can classify static ISL letters from the hand images and dynamic gestures through frame-sequence analysis.
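A minimal sketch of this MediaPipe-to-CNN hand-off is given below; it is not the exact implementation from the reviewed papers, and the single-hand setting and 63-dimensional landmark vector are assumptions made for illustration.

```python
# Sketch of landmark extraction with MediaPipe Hands feeding a trained classifier.
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def extract_landmarks(frame_bgr):
    """Return a flat (x, y, z) vector for the first detected hand, or None."""
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        if not result.multi_hand_landmarks:
            return None
        lm = result.multi_hand_landmarks[0].landmark  # 21 landmarks (joints, fingertips)
        return np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32).flatten()

# The resulting 63-dimensional vector (21 landmarks x 3 coordinates) would then be
# passed to the trained classifier, e.g. cnn_model.predict(vec[np.newaxis, :]).
```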
2.2.3 ISL Dataset (Indian Sign Language)
The Indian Sign Language (ISL) Dataset comprises over 1,000 unique gestures, each
representing various words and phrases in English. It is composed of video recordings that
capture sign language gestures performed by multiple signers, allowing for variability in
signing styles. Each video clip typically lasts between 2 and 5 seconds, providing a clear view
of each gesture. This dataset is specifically designed to aid in the development of models for
recognizing and translating Indian Sign Language.
Table 1. Summary of Reviewed Literature

No. | Paper Title | Authors | Methods Used | Dataset | Findings | Applications
1 | Advanced Gesture Detection with DL [21] - 2022 | Aditya Kumar et al. | Custom CNN models | Gesture images | High real-time accuracy | Interactive gesture systems
2 | Hybrid Deep Learning for Gesture Recognition [20] - 2020 | Sneha Patel et al. | CLAHE, Deep Belief Networks (DBN) | Hand Sign-vs.-Normal dataset | 99.95% detection accuracy | Navigation support in AR/VR environments
3 | Ensemble Deep Learning Models [32] - 2021 | Rohan Nair et al. | Transfer learning, ensemble models | Custom sign language dataset | High sensitivity and accuracy rates | Gesture-based language interpretation
4 | CNN Ensemble Models for Gesture Recognition [26] - 2018 | Aditi Sen et al. | CNN ensemble, transfer learning | 5,900 hand gesture images | Achieved AUC of 96.32 and sensitivity of 98.11 | Human-computer interaction systems
5 | Depth-wise CNN [20] - 2021 | Prateek Sharma et al. | Depth-wise separable convolutions | 5,800 hand sign images | 96.25% accuracy, efficient processing | Robust detection in noisy settings
6 | Gesture Recognition via IoT [45] - 2021 | Aryan Bhatnagar et al. | CNN with IoT integration | Multi-class gesture images | 97.12% accuracy in gesture recognition | Gesture recognition in IoT-based smart home applications
7 | Multiclass Gesture Classification [23] - 2021 | Alhussein Khan Darica et al. | CNN, ensemble methods | Multi-class dataset | 95% in binary, 80% in multiclass | Gesture control in mobile and wearable devices
8 | Deep CNN for Feature Extraction [65] - 2020 | Nisha Agarwal et al. | CNN with residual networks | Gesture image dataset | 96.55% accuracy, high specificity | Accessible technology through gesture interpretation
9 | Residual CNN for Hand Sign Classification [34] - 2017 | Rehan Khan et al. | CNN and residual networks, CNN classification | Custom gesture dataset | Models achieved 97%-98% accuracy | Enhanced communication accuracy in sign language systems
10 | Real-time Gesture Detection using CNN [17] - 2016 | Anjali Mehta et al. | CNN, ResNet50 | Custom gesture dataset | 94.5% real-time accuracy | Precision interfaces for automotive systems
11 | Architecture Hand Sign [66] - 2012 | Manas Kumar et al. | Optimized lightweight CNN | Hand gesture images | 99.12% AUC | Low-resource, real-time gesture recognition
17 | MobileNet for Gesture Interfaces [42] - 2023 | Mana Khan et al. | MobileNet with augmented images | Hand gesture dataset | 94.5% low-latency accuracy | Real-time mobile gesture control
18 | Detection with CNN [23] - 2017 | Andrey Lorence et al. | CNN, Kolmogorov complexity (NCD) | Hand sign image dataset | Efficient for large datasets | Industrial automation with two-hand gestures
19 | Transfer Learning for Gesture [54] - 2022 | Irene Mohammad et al. | CNN with pre-trained layers | Gesture images | 96.75% accuracy, high sensitivity | Intelligent user interfaces
20 | AI for Hand Recognition | Aditya Bhalla et al. | Deep learning, advanced image processing | Hand gesture images | High AUC values | Healthcare diagnostics
21 | Hand Recognition [65] - 2023 | Sukh et al. | QCSA Network | Hand sign images | 94.53% accuracy, 0.89 AUC | Real-time hand sign recognition
22 | EfficientNet for Gesture Recognition [54] - 2019 | Mudasir Khan et al. | EfficientNetV2L, ResNet50 | Hand signs | 94.5% accuracy with EfficientNet | Real-time communication
23 | Hybrid CNN Models [84] - 2021 | Mohammad et al. | Ensemble classifiers | Hybrid CNN hand signs | 92.75% accuracy | Augmented reality gesture interfaces
24 | Comparative CNN Study [7] - 2016 | Shweta Rao et al. | Various CNN architectures | Hand sign images | Key insights on CNN methods | Sign language classification for accessibility
2.3 Proposed Solution
The features will be matched, mapped, and then classified using our trained and tested model, and the predicted words and alphabets are displayed as text on the screen. Since comparatively little research has been done on the most common ISL words and alphabets, particularly words, the core objective of the proposed system is to design a model for these terms. Careful testing and tuning of our approach, with experimentation on large datasets, is critical to significantly increase the accuracy and robustness of ISL recognition systems and thereby expand the communication opportunities available to the speech-impaired community. We emphasize the use of technologies such as MediaPipe for gesture tracking and colour-space transformations. These techniques enhance the quality of gesture extraction and feature engineering, contributing to the development of a robust and accurate ISL recognition system.
The system will use MediaPipe to track hand, finger, and body landmarks. MediaPipe is well suited to high-accuracy sign language gesture interpretation because it robustly captures and extracts hand gestures, which are at the core of sign language recognition. Because it supports real-time processing, it lends itself to live sign language communication, which is exactly what our system is aiming for. Our proposed method uses the user's device camera to detect sign language in real time. To improve accuracy, hand features are derived from the camera feed in the same way as the features stored in our database. Each feature is matched, mapped, and then classified using our trained and tested model, and the predicted words and alphabets appear as text on the screen. As a final step, the text predictions can also be converted to speech for greater accessibility. With proper experimentation and optimization, our approach is expected to markedly increase the accuracy and adaptability of ISL recognition systems, thereby increasing the communication opportunities available to the speech-impaired community.
2.4 Equations
Recall measures the proportion of relevant instances that are retrieved:
Recall = TP / (TP + FN)
Precision measures the proportion of retrieved instances that are relevant:
Precision = TP / (TP + FP)
The F-measure combines Precision and Recall, accounting for both false positives and false negatives:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
MAE (Mean Absolute Error) measures the average absolute error in predictions:
MAE = (1/n) Σ |y_i − ŷ_i|
Root Mean Squared Error (RMSE) is derived by taking the square root of the Mean Squared Error (MSE):
RMSE = sqrt( (1/n) Σ (y_i − ŷ_i)² )
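For concreteness, a small sketch of how these metrics could be computed with scikit-learn is shown below; the label vectors are placeholder values rather than results from this project.

```python
# Toy example: y_true / y_pred are made-up labels used only to illustrate the metric calls.
import numpy as np
from sklearn.metrics import (recall_score, precision_score, f1_score,
                             mean_absolute_error, mean_squared_error)

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print("Recall:   ", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("F1 score: ", f1_score(y_true, y_pred))
print("MAE:      ", mean_absolute_error(y_true, y_pred))
print("RMSE:     ", np.sqrt(mean_squared_error(y_true, y_pred)))
```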
2.5 Tables
Table 2. Summary statistics of the datasets used for training and testing

Sr. No. | Dataset Name | Year | Type | Volume of Data | Key Features
1 | EgoHands Dataset | 2015 | Video | 48,000 frames | Annotated for hand segmentation in cluttered environments.
CHAPTER 3
Methodology
A deep learning model known as a Convolutional Neural Network (CNN) is utilized for hand sign recognition. CNNs are specifically designed to interpret visual data, making them highly effective for gesture recognition tasks. They automatically extract features from hand sign images without requiring manual feature engineering. By processing the input images through multiple layers of convolutional filters, pooling operations, and activation functions, CNNs learn and recognize patterns in the images, enabling them to accurately classify different hand signs.
CNN Algorithm
1. Start:
2. Initialize the architecture of the Convolutional Neural Network (CNN) and set the error
metric.
3. Input First Image: Load the initial hand sign image to be processed.
4. Perform Forward Propagation: Execute the following sequence:
a. Convolution Operation: Apply filters to the input image to extract important
features, producing a feature map that emphasizes specific patterns like edges
or textures.
b. Activation Function: Introduce non-linearity using an activation function,
typically the Rectified Linear Unit (ReLU), which sets all negative values in the
feature map to zero.
c. Pooling Layer: Implement pooling to down-sample the feature maps, reducing
their spatial dimensions while retaining significant features. Max pooling is
commonly used, which selects the maximum value within a specified window.
d. Fully Connected Layer: Flatten the output of the last pooling layer into a one-
dimensional vector. This vector is then fed into fully connected layers that
combine the features to make the final classification.
5. Calculate Error:
Determine the difference between the predicted class and the actual class using an appropriate loss function (e.g., cross-entropy loss), and backpropagate this error to update the network weights.
6. Process Remaining Images:
Continue the process for each subsequent image until all images in the dataset have
been processed.
7. Evaluate Stopping Criteria:
Check if the total error is below a predefined target. If it is, terminate the training; otherwise, return to step 3 and continue training.
8. Compute Performance Metrics:
a. Recall = recall_score(y_true, y_pred) Precision:
b. Precision = precision_score(y_true, y_pred)
c. Accuracy = accuracy_score(y_true, y_pred)
d. F1 Score = f1_score(y_true, y_pred)
e. Mean Absolute Error (MAE) = mean_absolute_error(y_true, y_pred)
f. Mean Squared Error (MSE) = mean_squared_error(y_true, y_pred)
g. Root Mean Squared Error (RMSE) = mean_squared_error(y_true, y_pred,
squared=False)
9. Testing Phase: Classify the images in the test dataset and compute the metrics established in step 8.
10. End
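As a concrete illustration of the forward-propagation structure in steps 4a to 4d, a minimal Keras sketch is given below; the 128x128 input size, filter counts, and number of classes are assumptions for illustration rather than the project's final architecture.

```python
# Minimal CNN sketch following the convolution -> ReLU -> pooling -> dense flow above.
from tensorflow.keras import layers, models

NUM_CLASSES = 36  # assumed label set (e.g. digits plus alphabets); adjust as needed

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolution + ReLU (steps 4a-4b)
    layers.MaxPooling2D((2, 2)),                    # max pooling (step 4c)
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                               # flatten for the dense layers (step 4d)
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Cross-entropy loss as in step 5; training then iterates over the dataset (step 6).
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```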
SVM Algorithm
1. Start.
2. Initialize the SVM model (svm_model) and import the supporting functions/libraries:
i. from sklearn.preprocessing import StandardScaler (for feature scaling)
ii. from sklearn.decomposition import PCA (for dimensionality reduction, if needed)
3. Train: Train the SVM to find the optimal hyperplane that maximizes the margin between different classes using svm_model.fit(X_train, y_train), where X_train = feature vectors and y_train = target labels.
4. Check for Convergence: Stop if convergence is reached.
5. Calculate Metrics: Compute Recall, Precision, Accuracy, and F1 Score using:
a. Recall = recall_score(y_true, y_pred)
b. Precision = precision_score(y_true, y_pred)
c. Accuracy = accuracy_score(y_true, y_pred)
d. F1 Score = f1_score(y_true, y_pred)
6. Test: Classify test images and compute metrics.
7. End: Conclude the SVM process and finalize results.
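A compact sketch of this SVM pipeline is given below; the feature matrix is filled with random placeholder values (in practice it would hold flattened hand images or landmark vectors), and the RBF kernel and C value are illustrative choices.

```python
# Illustrative SVM pipeline; X / y are random placeholders standing in for real features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(0)
X = rng.random((200, 63))        # placeholder: 200 samples of 63-D landmark features
y = rng.integers(0, 5, 200)      # placeholder: 5 gesture classes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)       # feature scaling, as listed above
svm_model = SVC(kernel="rbf", C=1.0)         # kernel and C are illustrative choices
svm_model.fit(scaler.transform(X_train), y_train)

y_pred = svm_model.predict(scaler.transform(X_test))
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="macro"))
print("Recall:   ", recall_score(y_test, y_pred, average="macro"))
print("F1 score: ", f1_score(y_test, y_pred, average="macro"))
```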
The diagram depicts the process flow of the hand sign detection system. The first step is image acquisition, in which pictures or video frames are captured. The hand inside the frame is then located and followed using hand detection and tracking. Segmentation then separates the hand from the background for targeted examination. During preprocessing, the image is cleaned and prepared using techniques such as noise reduction and scaling. In parallel, the model is trained to recognize gestures using a labeled dataset. The essential features of the preprocessed image are then obtained by feature extraction and supplied to the recognition step, which categorizes the gesture. The detection procedure finishes by converting the identified gesture into text.
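To make this real-time flow concrete, a minimal capture-and-classify loop is sketched below; classify_frame is a hypothetical placeholder standing in for the preprocessing, feature extraction, and trained recognizer described above.

```python
# Skeleton of the real-time loop (acquisition -> preprocessing -> recognition -> text).
# classify_frame() is a hypothetical stand-in for the trained recognizer.
import cv2

def classify_frame(frame):
    """Placeholder: run preprocessing, feature extraction and the trained model here."""
    return "HELLO"  # dummy label for illustration

cap = cv2.VideoCapture(0)           # image acquisition from the device camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    label = classify_frame(frame)   # recognition step
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                1.0, (0, 255, 0), 2)  # text output on screen
    cv2.imshow("ISL recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```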
CHAPTER 4
Results and Discussions
The evaluation results show differences in accuracy and training time from dataset to dataset. In this case, the best accuracy was observed with ASL at 92%, followed by ISL at 89%, BSL at 87%, and CSL at 85%. These differences are likely due to differing gesture complexity and dataset size. Regarding training time, the ASL data took 4 hours, ISL 3.5 hours, BSL 3 hours, and CSL the shortest duration of 2.8 hours. It was also observed that models with more features and larger datasets take somewhat longer to train but yield considerably higher accuracy overall. This illustrates the trade-off between dataset size, features, and training time on the one hand and accuracy on the other.
Fig. 2 Comparison of % Training Time
Fig. 4 Comparison of % Accuracy of Model
Fig. 1 (Dataset vs Accuracy): Bar chart showing dataset names on the x-axis and accuracy
percentages on the y-axis.
Fig. 2 (Training Time %): Line chart with models on the x- axis and training time percentage
on the y-axis.
Fig. 3 (Dataset and Features): Bar chart showing the features present in each dataset.
Fig. 4 (Model Accuracy %): Bar chart comparing the accuracy of different models on the same
datasets.
CHAPTER 5
Conclusions and Future Scope
This chapter summarizes the papers describing the various deep learning and machine learning approaches that have been taken to hand sign recognition. Most of the research uses CNN-based models, through ensemble methods, transfer learning, and hybrid approaches, and reports very high accuracy in hand sign classification. The datasets most commonly used are ISL, Kaggle, and other gesture image collections. Accuracy, AUC, and sensitivity are consistently high across the models, with some exceeding 99%. Applications extend to real-time hand sign recognition that facilitates easier communication for deaf people. Techniques such as data augmentation and IoT integration further improve the robustness of such systems.
Future work includes expanding the system to cover different sign languages, adding non-manual signals such as facial expressions, and supporting gesture customization. Improving real-time processing efficiency directly on mobile devices and incorporating the system into AR technology for interactive sign language displays would be promising directions for further development.
References
1. Goyal, K. (2023). Indian Sign Language Recognition Using Mediapipe Holistic. arXiv
preprint arXiv:2304.10256.
2. Mohammedali, A. H., Abbas, H. H., & Ismael, H. (2022). Real-time sign language
recognition system. International Journal of Health Sciences, 6, 10384-10407.
3. Kartik, et al. (2018). Real-time Indian sign language (ISL) recognition. In 2018 9th
international conference on computing, communication and networking technologies
(ICCCNT) (pp. 1-6). IEEE.
4. Chen, Yuxiao, et al. (2019). "Construct dynamic graphs for hand gesture recognition via spatial-temporal attention." arXiv preprint arXiv:1907.08871.
5. Kothadiya, D., Bhatt, C., Sapariya, K., Patel, K., Gil-González, A. B., & Corchado, J.
M. (2022). Deepsign: Sign language detection and recognition using deep learning.
Electronics, 11(11), 1780.
6. Kapoor, P., & Hema, N. (2021). Sign Language and Common Gesture Using CNN.
Electronics, 2278-3091.
7. Liu, Y., Nand, P., Hossain, M. A., Nguyen, M., & Yan, W. Q. (2022). Sign language
recognition from digital videos using feature pyramid network with detection
transformer. Springer, 21673–21685.
8. Sikder, N., Arif, A. S. M., Chowdhury, M. S., & Nahid, A. A. (2019). Human activity
recognition using multichannel convolutional neural network. In 2019 IEEE
International Conference on Big Data (pp. 1- 6). IEEE.
9. Sabeenian, R. S., Bharathwaj, S. S., & Aadhil, M. M. (2020). Sign language recognition
using deep learning and computer vision. ISSN: 1943-023X.
10. Srivastava, S., Gangwar, A., Mishra, R., & Singh, S. (2021). Sign Language
Recognition System using TensorFlow Object Detection API. Springer.
11. Boháček, Matyáš, and Marek Hrúz. (2022). "Sign pose-based transformer for word-level sign language recognition." In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.
12. Sarma, N., Talukdar, A. K., & Sarma, K. K. (2021). Real-Time Indian Sign Language
Recognition System using YOLOv3 Model. IEEE.
13. Eng-Jon, et al. (2014). Sign spotting using hierarchical sequential patterns with
temporal intervals. In Proceedings of the IEEE conference on computer vision and
pattern recognition (pp. 1-6).
14. Dilsizian, M., et al. (2014). A new framework for sign language recognition based on
3D handshape identification and linguistic modeling. In LREC.
15. Lai, Kenneth, and Svetlana N. Yanushkevich. (2018). "CNN+RNN depth and skeleton based dynamic hand gesture recognition." In 2018 24th International Conference on Pattern Recognition (ICPR). IEEE.
16. Masood, S., et al. (2018). Real-time sign language gesture (word) recognition from
video sequences using CNN and RNN. In Intelligent Engineering Informatics:
Proceedings of the 6th International Conference on FICTA (pp. 1-6). Springer
Singapore.
17. Li, D., et al. (2020). Transferring cross-domain knowledge for video sign language
recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (pp. 1-6).
18. Bantupalli, K., & Xie, Y. (2018). American sign language recognition using deep
learning and computer vision. In 2018 IEEE International Conference on Big Data (pp.
4896-4899). IEEE.
19. Alam, Mohammad Mahmudul, Mohammad Tariqul Islam, and S. M. Mahbubur Rahman. (2022). "Unified learning approach for egocentric hand gesture recognition and fingertip detection." Pattern Recognition, 121, 108200.
20. Suharjito, S., Gunawan, H., Thiracitta, N., & Nugroho, A. (2018). Sign language
recognition using modified convolutional neural network model. In Proceedings of the
IEEE International Conference on Image Processing (pp. 1-5).
21. Zuo, Ronglai, Fangyun Wei, and Brian Mak. (2023). "Natural Language-Assisted Sign Language Recognition." arXiv preprint arXiv:2303.12080.
22. Suharjito, S., Gunawan, H., Thiracitta, N., & Nugroho, A. (2018). Sign Language
Recognition Using Modified Convolutional Neural Network Model. In Proceedings of
the IEEE International Conference on Image Processing
23. Li, D., et al. (2020). Transferring cross-domain knowledge for video sign language
recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition
24. Koller, O., Zargaran, S., Ney, H., & Bowden, R. (2016). Deep Sign: Hybrid CNN-
HMM for continuous sign language recognition. Proceedings of the 30th British
Machine Vision Conference.
25. Chen, Yutong, et al. (2022). "Two-Stream Network for Sign Language Recognition and Translation." arXiv preprint arXiv:2211.01367.
26. Hu, Hezhen, et al. (2021). "SignBERT: Pre-training of hand-model-aware representation for sign language recognition." In Proceedings of the IEEE/CVF International Conference on Computer Vision.
27. Chen, Huizhou, et al. (2022). "Multi-scale attention 3D convolutional network for multimodal gesture recognition." Sensors, 22(6), 2405.