ASL to Text/Speech App Development
Abstract—This paper presents an analysis of the performance of different techniques that have been used for the conversion of sign language to text/speech format. Using the best-performing method from this analysis, an Android application is developed that can convert real-time ASL (American Sign Language) signs to text/speech.

Keywords—sign language, ASL, image processing, machine learning

I. INTRODUCTION

According to the World Health Organization (WHO), 466 million people worldwide have disabling hearing loss (more than 5 percent of the world's population), 34 million of whom are children. Studies expect these figures to surpass 900 million by 2050. Moreover, most cases of disabling hearing loss are concentrated in low- and middle-income countries.

Sign languages allow deaf and mute people to communicate with each other and with the rest of the world. There are over 135 different sign languages around the world, including American Sign Language (ASL), British Sign Language (BSL) and Australian Sign Language (Auslan).

American Sign Language was created to reach a wider public and acts as the primary sign language of the Deaf populations in the United States and much of Anglophone Canada; it is also used in most of West Africa and areas of Southeast Asia.

People with hearing impairments are left behind in online conferences, office sessions and schools. They usually converse through basic text chat, a method that is far from optimal. With the growing adoption of telehealth, deaf people need to be able to communicate naturally with their healthcare network, colleagues and peers regardless of whether the other person knows sign language.

Achieving a uniform sign language translation machine is not a simple task; however, there are two common methods used to address this problem, namely sensor-based sign language recognition and vision-based sign language recognition. Sensor-based sign language recognition [12] uses designs such as a robotic arm with a sensor, a smart glove or a golden glove for the conversion of ASL signs to speech. The issue is that few people use such devices, and one must spend money to purchase a glove, which is not easily available. Vision-based sign language translation [13][14] uses digital image processing; it is a framework used to recognize and interpret continuous gesture-based communication as English text. In vision-based gesture recognition, a camera is used as input, and videos are broken down into frames before processing. Vision-based methods are therefore preferred over sensor-based approaches, as anyone with a smartphone can convert sign language to text/speech, making the approach relatively cost-effective.

In this paper, the development of an Android application is demonstrated for the vision-based approach to sign-language-to-text/speech conversion: without any sensors, by only capturing video of the hand gestures, and completely free of any cost.

II. METHODOLOGY

A. Overview
In this paper, 26 ASL alphabet signs are used, along with one customized symbol for ‘Space’, all of which are to be recognized in real time using a smartphone. For this purpose, a OnePlus 6 smartphone running OxygenOS (based on Android Oreo) has been used. The algorithm is developed on top of a Java-based OpenCV wrapper. The entire system was developed using images of 200 x 200 pixels in RGB format.
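The paper does not show its OpenCV setup code, but for context, the following is a minimal sketch of how the Java-based OpenCV wrapper is typically initialized inside an Android activity. The class name MainActivity and the use of static initialization via OpenCVLoader.initDebug() are illustrative assumptions, not the authors' actual code.

import android.app.Activity;
import android.os.Bundle;
import org.opencv.android.OpenCVLoader;

public class MainActivity extends Activity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // Load the bundled OpenCV native library before any org.opencv.*
        // class is used; initDebug() returns false if loading failed.
        if (!OpenCVLoader.initDebug()) {
            finish(); // OpenCV unavailable; the rest of the app cannot run
        }
    }
}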
To design an appropriate model, the first step is to understand which features are the most appropriate to extract from static images. Examples of such features include the radial signature, the Histogram of Oriented Gradients (HOG) [1], the centroid distance signature and Fourier descriptors.

The technique most appropriate for this scenario is the Histogram of Oriented Gradients (HOG) descriptor. HOG is preferred because the appearance and shape of a local object can be easily detected by means of intensity gradients or edge directions. The image is divided into small connected regions called cells, and a histogram of gradient directions is compiled for the pixels within each cell. The descriptor is the concatenation of these histograms. For higher accuracy, local histograms are contrast-normalized by measuring the intensity variance over a wider area of the image, called a block, and then using this value to normalize all cells within the block. This normalization results in greater invariance to shifts in lighting and shadowing.
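As an illustration of the descriptor just described, the following is a minimal sketch of computing a HOG feature vector for one 200 x 200 training image using OpenCV's Java bindings. The 16 x 16 block, 8 x 8 stride, 8 x 8 cell and 9-bin configuration are standard HOG defaults assumed here, not parameters reported in the paper.

import org.opencv.core.*;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import org.opencv.objdetect.HOGDescriptor;

public class HogFeatures {
    public static float[] describe(String path) {
        // Load the 200 x 200 training image and convert it to grayscale,
        // since HOG operates on single-channel intensity values.
        Mat img = Imgcodecs.imread(path);
        Mat gray = new Mat();
        Imgproc.cvtColor(img, gray, Imgproc.COLOR_BGR2GRAY);

        // Window = whole image; 16x16 blocks sliding by 8 px over 8x8
        // cells, with 9 orientation bins per cell.
        HOGDescriptor hog = new HOGDescriptor(
                new Size(200, 200), new Size(16, 16),
                new Size(8, 8), new Size(8, 8), 9);

        MatOfFloat descriptors = new MatOfFloat();
        hog.compute(gray, descriptors);
        return descriptors.toArray(); // one fixed-length vector per image
    }
}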
A Support Vector Machine (SVM) [8][9][10], a machine learning algorithm, uses the HOG descriptors as the features of the image. Hence, an SVM is used to train our model, and this experimentation deals with three different array parameters for the SVM, comparing the results of each. The three array parameters are the detection method, the kernel and the dimensionality reduction type. The following are the values of each array parameter used for training (a training sketch follows the list):

Detection Method - Contour Mask, Canny Edges, Skeleton
Kernel - Linear, Radial Basis Function (RBF)
Dimensionality Reduction Type - None, Principal Component Analysis (PCA)
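To make the training step concrete, here is a minimal sketch of fitting one such SVM on rows of HOG features with OpenCV's Java ML module. The C value of 1.0 and the surrounding class are illustrative assumptions; the detection-method preprocessing and the optional PCA step are left out.

import org.opencv.core.Mat;
import org.opencv.ml.Ml;
import org.opencv.ml.SVM;

public class SignClassifier {
    // samples: one HOG feature vector per row (CV_32F);
    // labels: one class id per row, 0..26 (CV_32S).
    public static SVM train(Mat samples, Mat labels, boolean useRbf) {
        SVM svm = SVM.create();
        svm.setType(SVM.C_SVC);                       // multi-class classification
        svm.setKernel(useRbf ? SVM.RBF : SVM.LINEAR); // one of the two kernels compared
        svm.setC(1.0);                                // illustrative regularization value
        svm.train(samples, Ml.ROW_SAMPLE, labels);
        return svm;
    }

    public static int classify(SVM svm, Mat hogRow) {
        return (int) svm.predict(hogRow); // predicted class (26 letters + space)
    }
}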
B. Dataset Used
The dataset used for this paper is the ASL Kaggle dataset [2], which contains 3000 images for every alphabet of the English vocabulary. Here, another character has been introduced that is distinct from all other hand gestures and acts as an indication of the completion of the previous word. This special sign, called ‘Space’, allows the user to form sentences in a very simple fashion. For example, the space gesture is used to separate ‘hello’ and ‘world’ in the sentence ‘hello world’.

Figure 1: Hand gesture for ‘Space’

Since our dataset has 3000 images of every other character, it was reduced to 100 distinct images of each character for training, since the SVM algorithm works more precisely with smaller datasets. The dataset created for ‘Space’ consists of 100 images of the gesture as well. In total, there are 27 classes (26 ‘Alphabet’ classes + 1 ‘Space’ class), with ‘Space’ considered a separate class.

Figure 2: Architecture of the Android application

III. IMPLEMENTATION

The application is designed and implemented using Android Studio and OpenCV [15] functions in Java.

A. Calibration
Here, color-based segmentation has been implemented using libraries provided by OpenCV. This can be done by understanding all the different skin tones and their HSVA (Hue, Saturation, Value, Alpha) configurations. The following lower and upper bounds define all the possible skin tones. Only if the image possesses pixel values in this range will the frame be considered for classification; otherwise it will be discarded.

// H lowerBound.val[0] = 0;  upperBound.val[0] = 25;
// S lowerBound.val[1] = 40; upperBound.val[1] = 255;
// V lowerBound.val[2] = 60; upperBound.val[2] = 255;
// A lowerBound.val[3] = 0;  upperBound.val[3] = 255;
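Applied in code, these bounds translate into a single OpenCV inRange call. The sketch below assumes the camera delivers RGBA frames and drops the alpha channel before the HSV conversion, since the alpha bound of 0 to 255 accepts every pixel anyway.

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Scalar;
import org.opencv.imgproc.Imgproc;

public class SkinSegmenter {
    // Calibration bounds from the H, S and V lines above.
    private static final Scalar LOWER = new Scalar(0, 40, 60);
    private static final Scalar UPPER = new Scalar(25, 255, 255);

    // Returns a binary mask in which skin-colored pixels are white (255).
    public static Mat skinMask(Mat rgbaFrame) {
        Mat rgb = new Mat();
        Mat hsv = new Mat();
        Imgproc.cvtColor(rgbaFrame, rgb, Imgproc.COLOR_RGBA2RGB);
        Imgproc.cvtColor(rgb, hsv, Imgproc.COLOR_RGB2HSV);
        Mat mask = new Mat();
        Core.inRange(hsv, LOWER, UPPER, mask);
        return mask;
    }
}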
The image is then blurred using a Gaussian blur for easier processing. The next step is to find the contours of the largest area of the frame in which skin color is present. The main contour is applied to the largest area, and child contours are also applied within the largest skin-colored area, so that even if there are two patches of skin, say one full hand and someone's finger, the two can easily be differentiated. A matrix is used to represent the contours of the skin area.
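A sketch of this blur-and-contour step with OpenCV's Java API is shown below. The 5 x 5 Gaussian kernel is an assumed value, and RETR_CCOMP is used here because it retains exactly the two-level parent/child contour hierarchy the paragraph describes.

import java.util.ArrayList;
import java.util.List;
import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;

public class ContourFinder {
    // Blurs the skin mask and returns the contour enclosing the largest
    // skin-colored region (assumed to be the signing hand).
    public static MatOfPoint largestSkinContour(Mat skinMask) {
        Mat blurred = new Mat();
        Imgproc.GaussianBlur(skinMask, blurred, new Size(5, 5), 0);

        List<MatOfPoint> contours = new ArrayList<>();
        Mat hierarchy = new Mat();
        // RETR_CCOMP keeps a two-level hierarchy: outer contours and the
        // child contours (holes) inside them.
        Imgproc.findContours(blurred, contours, hierarchy,
                Imgproc.RETR_CCOMP, Imgproc.CHAIN_APPROX_SIMPLE);

        MatOfPoint largest = null;
        double maxArea = 0;
        for (MatOfPoint c : contours) {
            double area = Imgproc.contourArea(c);
            if (area > maxArea) { maxArea = area; largest = c; }
        }
        return largest; // null if no skin-colored region was found
    }
}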
B. Processing of Frame
The following diagram summarizes the steps involved in the processing of the frame.

Figure 3: Frame Processing Diagram

The steps involved in the processing of the frame are:

An input image is read in BGR (Blue, Green, Red) format, since OpenCV uses BGR instead of RGB; the image is then converted to RGB for image processing.
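For instance, this first step can be expressed as a single cvtColor call; the file name and wrapper class below are hypothetical.

import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class FrameReader {
    // Imgcodecs.imread returns pixels in OpenCV's default BGR channel
    // order, so the frame is converted to RGB before further processing.
    public static Mat readAsRgb(String path) {
        Mat bgr = Imgcodecs.imread(path);
        Mat rgb = new Mat();
        Imgproc.cvtColor(bgr, rgb, Imgproc.COLOR_BGR2RGB);
        return rgb;
    }
}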
C. Detection Method
1. Contour masking
Contours may be defined precisely as curves that connect all the continuous points (along a boundary) sharing the same color or intensity. Contours are a valuable resource for the study of structure and for the identification and recognition of an object such as a hand, as shown in figure 6.