Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 224 (2023) 425–430
www.elsevier.com/locate/procedia
10th International Symposium on Emerging Inter-networks, Communication and Mobility (EICM-2023)
August 14-16, 2023, Halifax, Nova Scotia, Canada
An Innovative Arabic Text Sign Language Translator

Amani Abdalla, Aayah Alsereidi, Nouf Alyammahi, Fatima Ba Qehaizel, Henry Alexander Ignatious, Hesham El-Sayed∗

College of Information Technology, United Arab Emirates University, Al Ain, UAE
Abstract

In recent years, several software and hardware solutions have been proposed for object detection, motion tracking, and gesture identification. However, these solutions failed to efficiently identify appropriate gestures and track their motion in a stipulated time range. To overcome the above-mentioned challenges, we propose a novel sign language translator application, which uses the MediaPipe Holistic model along with Long Short-Term Memory (LSTM), integrated with a Neural Network (NN) model, to build the proposed translator model. This model will pick up the gestures from different angles quickly and accurately to translate them into the appropriate text. This paper has been divided into two parts: the first part involves research and data collection, and the second part involves training and testing the collected data for real-time implementation. The model has been trained and tested with real-time data collected from the Leap Motion controller and cameras. The model provides appreciable results.
© 2023 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the Conference Program Chairs.
Keywords: Sign language; hand gesture recognition; machine learning; artificial intelligence (AI); convolutional neural networks (CNN)
1. Introduction
Impaired people often use sign language as their mode of communication. There is no universal sign language, but impaired people representing many countries use their unique signs to communicate. The shape, position, and movement of the hands are the three main elements that make up a signal to express their feelings and communicate. It is easy for people to understand and interact with a community when the individual can describe their perceived environment visually. American Sign Language (ASL) is the most commonly used sign language in the United States.
∗ Corresponding author. Hesham El-Sayed
E-mail address: helsayed@uaeu.ac.ae
1877-0509 © 2023 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the Conference Program Chairs.
10.1016/j.procs.2023.09.059
Sign language is the link that connects us to the world of persons who are deaf or have difficulty speaking. A variety of
hand, finger, arm, head, and facial expressions are used to communicate with impaired persons. It enables individuals
to comprehend the world around them through visual descriptions and hence contribute to society. Sign language helps impaired persons in many ways: it assists physically challenged persons to communicate among themselves and also with other common people, motivates and helps impaired students in their studies, increases the self-esteem of the disabled, and instills a sense of social responsibility as well as sensitivity in non-deaf volunteers who learn sign language in order to communicate with the disabled. Young people quickly adapt to this language because of its simplicity. Many convolutional neural network (CNN)-based models have recently demonstrated exceptional performance in a variety of visual tasks, including image classification, object detection, semantic
segmentation, and action recognition [1]. With today’s level of computer vision and machine learning technology,
interpreting sign language demands a high level of spatial and temporal awareness and is thus viewed as extremely
challenging. It’s worth noting that sign languages differ from hand (finger) languages in that hand languages only
represent each letter in an alphabet with the shape of a single hand, whereas the linguistic meaning of each sign is
determined by subtle differences in the shape and movement of the signer’s body, hands, and, in some cases, facial
expression [2].
In this paper, we are building a system that recognizes sign language gestures and converts them into a text format.
The system will record videos, analyze frames to find objects and extract features, classify recorded gestures, and
then translate them into appropriate text. Our system is designed to improve communication and interaction between
impaired persons. In our proposed approach, we have used the MediaPipe framework for processing time-series data. Unlike the Leap Motion controller, MediaPipe nominally generates a 3D point for each joint, including the fingers, from a single image [3]. It detects not only the hand but also the pose and the facial gestures. Hand gestures will be translated into text labels, and their motion is recognized using our device's standard camera, which is another
essential feature for accurate sign language identification.
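To make the key-point extraction step concrete, the following is a minimal Python sketch, assuming the standard MediaPipe Holistic and OpenCV APIs; the helper name extract_keypoints and the webcam loop are our own illustration, not the exact code used in this paper.

# Illustrative sketch only: extracting per-frame pose, face, and hand key points
# with MediaPipe Holistic from a standard webcam.
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def extract_keypoints(results):
    """Flatten the Holistic landmarks of one frame into a single feature vector."""
    pose = np.array([[lm.x, lm.y, lm.z, lm.visibility] for lm in
                     results.pose_landmarks.landmark]).flatten() \
        if results.pose_landmarks else np.zeros(33 * 4)
    face = np.array([[lm.x, lm.y, lm.z] for lm in
                     results.face_landmarks.landmark]).flatten() \
        if results.face_landmarks else np.zeros(468 * 3)
    lh = np.array([[lm.x, lm.y, lm.z] for lm in
                   results.left_hand_landmarks.landmark]).flatten() \
        if results.left_hand_landmarks else np.zeros(21 * 3)
    rh = np.array([[lm.x, lm.y, lm.z] for lm in
                   results.right_hand_landmarks.landmark]).flatten() \
        if results.right_hand_landmarks else np.zeros(21 * 3)
    return np.concatenate([pose, face, lh, rh])      # 1662 values per frame

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB images; OpenCV delivers BGR frames.
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        keypoints = extract_keypoints(results)       # one feature vector per frame
cap.release()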
The main aim of our paper is to develop a software interface that interprets American Sign Language to text in real-
time scenarios, which will assist impaired people to communicate among themselves effectively and eliminate the
need for a manual translator. This paper aims to achieve the highest possible accuracy and least time consumption for
the prediction of symbols compared to the existing models in the literature. The other key goals of this paper are listed
below:
• Detect the hand gestures successfully.
• Translate the identified hand gestures into suitable words.
• Frame the translated words into meaningful and accurate sentences.
• Develop a user-friendly graphical user interface (GUI) to simplify the translation process.
The paper is organized in the following structure. Section 2 discusses the background and the existing literature related to the paper. Section 3 covers the proposed approach. Section 4 elaborates on the experimental analysis, and Section 5 concludes this paper with a brief summary of its importance along with its key inferences.
2. Background and Related Work
The most used sign language is American Sign Language (ASL), which many impaired people find difficult to inter-
pret. So, several models have been developed to translate sign language into readable text and bridge the communi-
cation gap between impaired people and their non-disabled peers. There have been many advances in sign language
translation in the last few years. In sign language, a technique known as using moments is detailed where dynamic
gestures are distinguished from static ones. To display body components and identify hand movements, the method
breaks down the video into frames and segments the skin color [1]. In addition, K-Nearest Neighbor (K-NN) and Hidden Markov Models (HMM) are used for classification. In [4], a smartphone application has been developed to translate Kannada sign language into text. The training stage involves the application of a filter (noise reduction technique,
edge detection, or detection format) and the extraction of the features using the histogram-oriented gradient (HOG).
As a final step, k-means clusters were used to categorize each input video clip using two label probabilities for each
feature type [5]. Authors in [6] have highlighted the importance of the Leap Motion Controller device in capturing
gestures and training any machine learning model to recognize and translate gestures to the appropriate text.
In [7] the authors present two hidden Markov model-based systems for recognizing sentence-level continuous Amer-
ican Sign Language (ASL) using a single camera to track the user's unadorned hands. In [8], the authors show that, using two wearable armbands called Myo, American Sign Language (ASL) can be converted to speech; they applied the Gaussian Mixture Model Hidden Markov Model (GMM-HMM) technique to achieve classification rates of up to 97% for ASL words (gestures). In [9], the proposed solution scans for the gesture signs using image processing techniques and extracts the information from the given image as input to the system. Though many studies have proposed
sign language translators for the Arabic language, none has come up with a complete solution that covers all the
mandatory translation tasks. Moreover, most of the models use a small set of gestures for translation. The models used
for training are obsolete and perform poorly in terms of both accuracy and efficiency. None of the studies explains the mechanism behind the translation after identifying the gestures. As a result of these drawbacks, this paper develops a translation tool for American Sign Language. For object detection, feature extraction, and classification, the proposed approach will use MediaPipe Holistic with LSTM NN models [10][11][12]. We have referred to many articles
which use advanced artificial intelligence (AI) concepts to rephrase sentences. The articles guided and initiated us in
developing our proposed text rephrasing model [13] [14] [15].
2.1. Requirements and Analysis
The functional requirements are listed below:
FR1. The user scans appropriate actions or gestures using the system by connecting to the webcam.
FR2. The system captures the images.
FR3. The system detects the actions.
FR4. The system recognizes the appropriate movement of the hand.
FR5. The system searches the database to match the movement with the predefined gestures.
FR6. The system adds the meaning of the action to the text box above the image in real-time testing.
FR7. The software converts the actions into Text format.
FR8. The system displays the final output to the user in a text format.
2.2. Non-Functional Requirements
The non-functional requirements are listed below:
NFR1. The software must be compatible with the Windows operating system.
NFR2. The software must be responsive to all the inputs given by the user.
NFR3. The user interface must satisfy the Windows design guidelines.
NFR4. The system must be accessible 24/7 for the users working around the clock.
NFR5. The application interface must be simple and easy to use.
NFR6. The system must not exceed 2GB RAM.
3. Proposed Work
The proposed system is designed to translate the signs captured using the webcam into readable words. The proposed architecture
is discussed below in two sections, one for training the captured gestures and the other for testing the gestures. Fig.
1 illustrates the flow of the proposed training model and testing models. For the training part, we collect the gestures
through a webcam and pass them to the Mediapipe holistic framework to extract the key points associated with the
gestures. Further, our proposed system converts the captured video along with its mandatory features into appropriate
image frames. The image frames are fed into a deep neural network model integrated with an LSTM layer to train the
model in sequences [16][17]. High-level applications are developed using Python language along with its extended
packages to perform various activities related to the proposed model.
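As an illustration of how the extracted key points could be organized into labeled training sequences, the sketch below assumes that each gesture's per-frame key-point vectors are saved as .npy files under data/<gesture_label>/<sequence_id>/<frame_id>.npy and that each sequence holds thirty frames (as described in Section 4); the directory layout, sizes, and the 10% test split are our assumptions, not the authors' exact setup.

# Minimal sketch, assuming per-frame key points saved as .npy files under
# data/<gesture_label>/<sequence_id>/<frame_id>.npy; layout and split are assumed.
import os
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

DATA_DIR, NUM_FRAMES = "data", 30
gestures = sorted(os.listdir(DATA_DIR))              # e.g. ["hello", "thanks", ...]
label_map = {g: i for i, g in enumerate(gestures)}

sequences, labels = [], []
for gesture in gestures:
    for seq in sorted(os.listdir(os.path.join(DATA_DIR, gesture))):
        frames = [np.load(os.path.join(DATA_DIR, gesture, seq, f"{i}.npy"))
                  for i in range(NUM_FRAMES)]
        sequences.append(frames)
        labels.append(label_map[gesture])

X = np.array(sequences)                               # shape: (num_sequences, 30, 1662)
y = to_categorical(labels)                            # one-hot gesture labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)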
Moving to the testing part, we use the captured videos to test the trained LSTM-based deep neural network model.
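A minimal sketch of how such an LSTM network with dense layers could be assembled in Keras is given below; the layer sizes, optimizer, epoch count, and gesture-vocabulary size are illustrative assumptions rather than the exact configuration reported in this paper.

# Minimal sketch: a sequential LSTM model over 30-frame sequences of 1662
# MediaPipe key points per frame; all hyperparameters are illustrative choices.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

NUM_FRAMES, NUM_FEATURES, NUM_GESTURES = 30, 1662, 10   # assumed vocabulary size

model = Sequential([
    LSTM(64, return_sequences=True, activation='relu',
         input_shape=(NUM_FRAMES, NUM_FEATURES)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(NUM_GESTURES, activation='softmax'),           # one probability per gesture
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])

# model.fit(X_train, y_train, epochs=200)   # X_train, y_train from the data sketch above

At testing time, the same network is applied to key-point sequences extracted from the live camera feed, and the class with the highest softmax probability is taken as the recognized gesture.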
(a) Proposed Training Model (b) Proposed Testing Model
Fig. 1: Proposed Models
Fig. 2: Proposed Text Framing Model
3.1. Converting gestures to text
Different gestures collected represent different classes. This study creates a knowledge base of simple vocabulary
representing different gestures. An innovative algorithm that uses advanced AI concepts is used in this study to phrase
the matching gestures to the appropriate text. The proposed algorithm follows simple compiler design techniques.
After the text is framed from the identified gestures, the words will be in a jumbled state; hence the proposed algorithm rephrases the jumbled words into meaningful text. The algorithm rearranges the words accordingly and adds the subject, verb, and adjective to the gesture objects to frame the text. Fig. 2 illustrates the flow of the proposed text-framing
mechanism [13] [15].
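Since the paper does not spell out the rephrasing algorithm, the following is only a simplified, rule-based illustration of the idea of rearranging jumbled gesture words into a subject-verb-object order; the vocabulary sets and fallback words are our own assumptions for the sketch, not the proposed algorithm itself.

# Simplified, rule-based illustration of the text-framing idea; the vocabulary
# sets and fallback words below are assumptions made for this sketch.
SUBJECTS = {"i", "you", "we", "he", "she"}
VERBS = {"want", "go", "eat", "drink", "help"}
ADJECTIVES = {"happy", "tired", "hungry", "thirsty"}

def frame_sentence(gesture_words):
    """Rearrange jumbled gesture words into subject, verb, objects, adjectives."""
    subject = next((w for w in gesture_words if w in SUBJECTS), "I")
    verb = next((w for w in gesture_words if w in VERBS), "am")
    rest = [w for w in gesture_words if w not in SUBJECTS and w not in VERBS]
    objects = [w for w in rest if w not in ADJECTIVES]
    adjectives = [w for w in rest if w in ADJECTIVES]
    return " ".join([subject, verb] + objects + adjectives).capitalize() + "."

print(frame_sentence(["hungry", "i"]))   # -> "I am hungry."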
4. Experimental Analysis
Advanced Python packages such as TensorFlow, TensorBoard, Keras, OpenCV, MediaPipe, and Scikit-learn, together with LSTM neural networks with dense layers and MATLAB, were used to design and develop the proposed models. The system was built in several stages. MediaPipe Holistic was first used to set the key points of the detected actions; the actions were then captured with their holistic key points and stored in labeled files; each sequence of actions was divided into thirty frames; and these sequences were trained using a sequential model with an LSTM neural network integrated with dense layers. We merged the above two approaches to minimize the data required to produce a highly accurate model. Moreover, the proposed approach consumes less time to train the models. After completing
the training and ensuring the accuracy of recognition, we started to implement the model, and the result showed that
the system performs efficiently, quickly, and with excellent accuracy. The following steps were followed to evaluate
the performance of our approach.
• Initially check whether the right gesture sign is captured.
• Check whether the right image frame is fed as an input to the training model.
• Test if the system can match the detected gesture sign with the standard ASL gestures, to estimate the accuracy of the output.
• Estimate the execution time of the proposed models.
• Test the accuracy of the proposed models.

Fig. 3: Various gestures and their corresponding translated text
Fig. 4: Comparison with other Machine Learning Models
Fig. 3 illustrates various gesture signs along with their matching text. For the purpose of model evaluation, we used Scikit-learn metrics to compute the results. By using TensorBoard, we were able to observe a significant improvement in the accuracy of the system and a decrease in the loss; this is due to selecting and reducing the neural network model parameters when compared with other machine learning models. The results obtained are portrayed in Fig. 4.
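For completeness, a minimal sketch of how the Scikit-learn metrics and TensorBoard monitoring mentioned above could be wired together is shown below; the variable names (model, X_train, y_train, X_test, y_test) and the log directory are placeholders referring to the earlier sketches, not the authors' exact evaluation code.

# Illustrative evaluation sketch: accuracy and per-class confusion matrices with
# Scikit-learn, and a TensorBoard callback for monitoring training.
import numpy as np
from sklearn.metrics import accuracy_score, multilabel_confusion_matrix
from tensorflow.keras.callbacks import TensorBoard

tb = TensorBoard(log_dir="logs")          # inspect later with: tensorboard --logdir logs
# model.fit(X_train, y_train, epochs=200, callbacks=[tb])

def evaluate(model, X_test, y_test):
    """Compare predicted gesture classes against one-hot ground-truth labels."""
    y_pred = np.argmax(model.predict(X_test), axis=1)
    y_true = np.argmax(y_test, axis=1)
    print("Accuracy:", accuracy_score(y_true, y_pred))
    print(multilabel_confusion_matrix(y_true, y_pred))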
5. Conclusion
To sum up, we believe that sign language translation is an important area of research. We also believe that the proposed
system will transform the lives of impaired persons. The suggested translator is a full-fledged solution that converts a broader set of gesture signs to their matching text. The advanced machine learning models used in the proposed system help
to efficiently recognize, extract features, and classify the gesture signs. Moreover, the suggested pattern-matching
algorithm intelligently translates the recognized gestures into the appropriate text. The solution will help impaired
people to have a clear view of the identified gestures representing a real-time event. In the future, we intend to train the proposed model with more gesture signs in order to transform it into a fully functional system.
Acknowledgements
This work was supported by United Arab Emirates University Summer Undergraduate Research Experiences (SURE
PLUS Project Grant G00003898).
References
[1] R. Thakare, P. K. Pathan, M. Lokhande, and N. Waje, “Dynamic and static gesture recognition system using moments,” Int. J. Adv. Eng. Res.
Sci., vol. 4, no. 2, pp. 69–72, 2017.
[2] S. Bhushan, M. Alshehri, I. Keshta, A. K. Chakraverti, J. Rajpurohit, and A. Abugabah, “An experimental analysis of various machine learning
algorithms for hand gesture recognition,” Electronics, vol. 11, no. 6, p. 968, 2022.
[3] S. Bowen, “KROS Development Blog,” https://kros.dev/2021/09/15/leap-motion-vs-mediapipe/, 2022, [Online; accessed 19-Jan-2023].
[4] R. M. Kagalkar and S. V. Gumaste, “Mobile application based translation of sign language to text description in kannada language.” Int. J.
Interact. Mob. Technol., vol. 12, no. 2, pp. 92–112, 2018.
[5] B. Khelil, H. Amiri, T. Chen, F. Kammüller, I. Nemli, and C. Probst, “Hand gesture recognition using leap motion controller for recognition
of arabic sign language,” in 3rd International conference ACECS, vol. 16, 2016, pp. 233–238.
[6] Á. Aguilera-Rubio, I. M. Alguacil-Diego, A. Mallo-López, and A. Cuesta-Gómez, “Use of the leap motion controller® system in the re-
habilitation of the upper limb in stroke. a systematic review,” Journal of Stroke and Cerebrovascular Diseases, vol. 31, no. 1, p. 106174,
2022.
[7] T. Starner, J. Weaver, and A. Pentland, “Real-time american sign language recognition using desk and wearable computer based video,” IEEE
Transactions on pattern analysis and machine intelligence, vol. 20, no. 12, pp. 1371–1375, 1998.
[8] B. Oktekin, “Development of turkish sign language recognition application,” Near East University, 2018.
[9] A. Sharma, A. Mittal, S. Singh, and V. Awatramani, “Hand gesture recognition using image processing and feature extraction techniques,”
Procedia Computer Science, vol. 173, pp. 181–190, 2020.
[10] M. A. Rau and J. P. Beier, “Exploring the effects of gesture-based collaboration on students’ benefit from a perceptual training.” Journal of
Educational Psychology, vol. 115, no. 2, p. 267, 2023.
[11] K. T. Alhamazani, J. Alshudukhi, S. Aljaloud, S. Abebaw et al., “Hand gesture of recognition pattern analysis by image treatment techniques,”
Computational and Mathematical Methods in Medicine, vol. 2022, 2022.
[12] A.-K. Jokipohja and N. Lilja, “Depictive hand gestures as candidate understandings,” Research on Language and Social Interaction, vol. 55,
no. 2, pp. 123–145, 2022.
[13] X. Zhao, “Leveraging artificial intelligence (ai) technology for english writing: Introducing wordtune as a digital writing assistant for efl
writers,” RELC Journal, p. 00336882221094089, 2022.
[14] A. Gupta, A. Agarwal, P. Singh, and P. Rai, “A deep generative framework for paraphrase generation,” in Proceedings of the aaai conference
on artificial intelligence, vol. 32, no. 1, 2018.
[15] J. Becker, J. P. Wahle, T. Ruas, and B. Gipp, “Paraphrase detection: Human vs. machine content,” arXiv preprint arXiv:2303.13989, 2023.
[16] J. S. Clarke, N. Llewellyn, J. Cornelissen, and R. Viney, “Gesture analysis and organizational research: The development and application of a
protocol for naturalistic settings,” Organizational Research Methods, vol. 24, no. 1, pp. 140–171, 2021.
[17] A. Sharma, A. Chopra, M. Singh, and A. Pandey, “American sign language gesture analysis using tensorflow and integration in a drive-
through,” in Advances in Computing and Data Sciences: 6th International Conference, ICACDS 2022, Kurnool, India, April 22–23, 2022,
Revised Selected Papers, Part I. Springer, 2022, pp. 399–414.