International Conference on Electronics, Communication and Aerospace Technology
ICECA 2017
                Indian Sign Language Converter System
                        Using An Android App.
                   Pranali Loke                                  Juilee Paranjpe                                    Sayli Bhabal
           Department of Computer                          Department of Computer                           Department of Computer
      Vidyalankar Institute of Technology             Vidyalankar Institute of Technology              Vidyalankar Institute of Technology
          Wadala(E), Mumbai, India.                       Wadala(E), Mumbai, India.                        Wadala(E), Mumbai, India.
       Email: pranali.loke18@gmail.com                Email: juileeparanjpe11@gmail.com                 Email: saylibhabal26@gmail.com
           Ketan Kanere
     Department of Computer
Vidyalankar Institute of Technology
    Wadala(E), Mumbai, India.
 Email: ketan.kanere9@gmail.com
Abstract—In this paper, we introduce a sign language converter system that uses hand gesture recognition to recognize gestures in Indian sign language and convert them to a natural language. Sign language is a large set of hand gestures used to communicate with the hearing impaired and mute. The proposed system uses the Hue, Saturation, Value (HSV) color model for hand tracking and segmentation. We have used supervised learning to train the neural network responsible for data classification. An Android application captures images of hand gestures and sends them to a web hosting server, from where they are given as input to the neural network in MATLAB for pattern recognition. The hand gesture is mapped to its equivalent in natural language in MATLAB and the converted text is sent back to the user's device. It is an easy-to-use and inexpensive approach that recognizes single-handed as well as double-handed gestures accurately. This system will facilitate communication for millions of deaf and mute people and aid them in communicating with people who don't understand sign language.

Index Terms—Hand gesture recognition, HSV color model, neural network, supervised learning, pattern recognition, hand tracking and segmentation.

I. INTRODUCTION

Gesture recognition is the interpretation of gestures using mathematical algorithms. It encompasses gestures made by the face or hands. This technique has a wide range of applications and uses. Gesture recognition can be used to control devices merely through gestures, without any physical link to the actual machine. Using gesture recognition, a person can point at a computer or mobile screen and use sign language to select and operate different applications on the device. Sign language used by deaf and mute people can be interpreted with this technique and converted to text, enabling better communication between the deaf and mute and the people interacting with them.

Image processing is commonly used for gesture recognition, as it provides features like pattern recognition, texture understanding, content-based retrieval, compression and many more. The HSV (Hue, Saturation and Value) color model is used for skin detection in image processing. The OpenCV library is used with MATLAB to convert images from the RGB color space to HSV.

An Android application with a user-friendly graphical interface that captures gestures and interprets them as text in a natural language will enable disabled people to connect and interact with the masses without the need for a controller. All input images are captured by the camera present in an Android device. The captured images are sent to MATLAB via an online server and converted into binary form of a predefined size, so that pattern recognition takes less time and memory. The converted text is then sent back to the user's device.

The rest of the paper is organized as follows. Section II covers the existing technologies and the need for a new technology for hand gesture recognition. Section III describes the proposed system and discusses the architecture used to implement it. Section IV discusses the implementation of the Android application that sends the images to the server. Section V covers image processing and data classification using a neural network in MATLAB. Section VI covers sending data from a smartphone to the neural network via a web hosting server and sending the converted text back to the user's device. The paper ends with the conclusion and future scope.

II. LITERATURE SURVEY

Applications of image processing in diverse areas are growing rapidly. One such application is hand gesture recognition using pattern recognition algorithms. Different approaches are currently used for tracking and recognizing hand gestures.

Previously, systems have been developed to convert sign languages like American Sign Language (ASL) to natural languages. However, no such system exists for converting
                                                    978-1-5090-5686-6/17/$31.00 ©2017 IEEE                                             436
Indian Sign Language (ISL) to a natural language. Two such existing technologies for recognizing hand gestures are discussed below. The two main approaches used are the data-glove-based technology and the vision-based approach.

A. Data Glove Technology

A data glove is an electronic device that senses the movements of the hand, and of the fingers individually, and sends these movements to the computer in the form of analog or digital signals. These digital signals are then mapped to the task to be performed in the virtual environment[5]. Various sensors are placed on the glove to detect the global position and relative configuration of the hand[1]. Data gloves are used in fields like 3D animation, 3D sketching and also in the medical world; they are basically used to interact with 3D models[5]. The device is expensive because of the multiple sensors used. Because of this shortcoming of data-glove-based systems, vision-based systems are used, which are wireless and need only one or more cameras. Using motion-sensing input devices such as the Microsoft Kinect is another method, but this device is also expensive.

B. Vision-based Techniques

There are several vision-based gesture recognition techniques commonly used for static and dynamic gesture recognition. A few of them are discussed below:

1) Support Vector Machine: A Support Vector Machine (SVM) is a supervised machine learning algorithm used in pattern recognition to analyse data and recognize patterns. It is used for classification and regression analysis. It builds a training model and predicts the category of an unknown sample. The objective of SVM modelling is to construct a hyperplane as the decision plane, separating the positive and negative binary classes with the largest margin. SVM is not limited to two-class classification problems; it can also be used for multi-class classification using pair-wise, top-down and bottom-up classification methods[6].

2) K-Nearest Neighbours (K-NN): The K-nearest-neighbour classifier is a machine learning algorithm that relies on the distance between feature vectors. It classifies an unknown data object by finding the most common class among the k nearest examples in the feature space. The algorithm stores the feature vectors and labels of the training images. A major disadvantage of the K-NN algorithm is that it weights all features equally when computing similarities, which can lead to classification errors, especially when only a small subset of the features is useful for classification[7].

C. Convolutional Neural Network

Another important technique used for hand gesture recognition and classification is the convolutional neural network. Convolutional neural networks are similar to ordinary neural networks, with a few major changes. A convolutional network takes raw image pixels at one end and produces class scores at the other. Convolutional network architectures make the explicit assumption that the inputs are images, which allows certain properties to be encoded into the architecture. Hence, the implementation of the forward function becomes efficient and the number of parameters in the network is greatly reduced. Also, the neurons in a convolutional neural network are arranged in three dimensions: width, height and depth. One disadvantage associated with this technique is its high computational cost: a computer with a good GPU is required to train such a network[4].

III. PROPOSED SYSTEM

Fig. 1: Flowchart of Proposed System

We propose to implement the sign language converter system as an Android application. This application will be able to detect gestures in Indian Sign Language (ISL). The user will capture images using this app by positioning the camera in front of the person making the gestures. These images will then be sent to the server. The server will send the images to MATLAB, where gesture recognition using a neural network will take place. The neural network is trained to recognize ISL gestures. The image will be mapped to the corresponding ISL gesture and converted to text. The converted text will be sent back to the Android application via the server, and the output will be displayed on the user's smartphone.

For the classification of gestures, many applications make use of regression analysis. However, regression analysis is based on the assumption that the cause-and-effect relationship between the variables remains unchanged. This may not always be the case, and hence estimating the variable values from the regression equation may lead to erroneous results. Hence, in this project, we present a vision-based system which uses the built-in camera of an Android device.
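Our system uses a trained neural network for this classification step; for comparison, the K-NN baseline described in Section II-B can be sketched in a few lines. The sketch below is plain Python over made-up angle-value feature vectors (the vectors and labels are illustrative, not data from our system):

```python
import math
from collections import Counter

def knn_classify(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training vectors (Euclidean distance in feature space)."""
    dists = sorted(
        (math.dist(vec, query), lab) for vec, lab in zip(train, labels)
    )
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical angle-value feature vectors for two gestures.
train = [(10.0, 85.0), (12.0, 80.0), (60.0, 20.0), (65.0, 25.0)]
labels = ["A", "A", "B", "B"]

print(knn_classify(train, labels, (11.0, 82.0)))  # majority of neighbours: "A"
```

Note that, as Section II-B observes, every feature contributes equally to the distance here, which is exactly the weakness that motivates a trained classifier.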
For classification, we make use of a neural network trained with the scaled conjugate gradient backpropagation algorithm. For feature extraction of images and classification using the neural network, we use MathWorks MATLAB. We collected a vast and varied dataset for training the neural network. This system will allow the user to capture images from remote locations; these images will be sent to a server, which will forward them to MATLAB for processing, where the trained neural network resides. The response will be sent back to the server, and the server will send its response, in the form of the textual meaning, to the Android device.

Fig. 2: System Architecture

This system will be implemented using a client-server architecture[9]. On the client machine, there will be an Android application used to capture images of the gesture. Once an image is captured, it will be uploaded to the server using the Android application. Upon reaching the server, feature extraction of the user-uploaded image will take place in the feature extraction module. The features extracted are the angles calculated between different parts of the edges of the gesture. These features, in the form of angle values, are sent to a neural network trained to classify different gestures to their textual meaning.

IV. ANDROID IMPLEMENTATION

The Android app for the system was developed using Android Studio. In this system, we use okhttp3, an HTTP library that makes networking for Android apps easier and faster. It facilitates automatic scheduling of network requests, integrates easily with any protocol, and provides support for raw strings, images, and JSON. For accessing the camera on the Android device, the camera2 class provided by Android was used; the app invokes the camera built into the device. For this, various permissions, such as camera access, reading and writing external storage, reading phone state, and internet access, were granted. The captured images were then sent to the server and provided as input to MATLAB for pattern recognition and classification.

V. DATA CLASSIFICATION USING NEURAL NETWORK IN MATLAB

We perform two major operations on an image to extract the information that acts as the input to the neural network. These operations are explained briefly below.

Fig. 3: Preprocessing and tokenizing of the alphabet A of ISL: (a) original color image; (b) black-and-white image after preprocessing; (c) tokenized image.

1) Preprocessing: Here we convert the colored image to a black-and-white image using the HSV skin color extraction technique. In this technique, the HSV color space is used instead of the RGB color space[10]. Hue represents the dominant color, saturation indicates the intensity of the color, and value represents the brightness[11]. This technique distinguishes the skin pixels, so that the majority of the gesture is retained while the unnecessary background and noise are eliminated. Figure 3b shows a preprocessed image.

2) Tokenizing: We use the Sobel edge detection technique to trace the boundary. Sobel edge detection performs a 2D spatial gradient measurement on an image, highlighting points with high spatial frequency[12]. We then sample points on the edge of the gesture and obtain the angle between the vectors formed by consecutive points. This operation can also be called feature extraction[2]. Figure 3c shows a tokenized image.

Using the complete image as input to the neural network would be inefficient; hence, it is advisable to extract features from the image and use them as inputs, since they sufficiently represent a particular image and are smaller in dimension. A detailed explanation of the preprocessing operation is given below.

Algorithm: Preprocessing.
Input: Image with RGB color map which contains the gesture.
Output: Binary image where gesture pixels are 1 and the rest are 0.
Steps:
1. Resize the input image rgbimg to a fixed predefined size: [rows, cols].
2. Convert the RGB color channels of rgbimg to HSV channels, denoted by hsvimg.
3. Set the HSV thresholds for skin color detection[3]:
[0 <= H <= 50]
[0.05 <= S <= 0.8]
[0.25 <= V <= 1]
4. Apply these thresholds to hsvimg to filter out only the skin-colored component (i.e., a pixel value of 1 if it falls within the thresholds).
5. Eliminate blobs that are smaller than the permissible size, [150 px], and store the result as noisereducedimg.
6. Smooth the contours of the white blobs in noisereducedimg using a disk structuring element; call the result smoothimg.
7. Fill all closed blobs which have holes in them; call the result filledimg.
8. Retain only the blob in filledimg having the largest filled area and call the new image gesturebwimg.
9. Return gesturebwimg.

VI. SENDING TEXT FROM MATLAB TO ANDROID

The mapping of an image to a gesture gives us the text corresponding to that gesture. This text is sent to the Android application via a server. For this, MATLAB uses a RESTful API. REST stands for representational state transfer; it is a common architectural style for web services. RESTful APIs or interfaces provide standard HTTP methods such as GET, PUT, POST, or DELETE. The RESTful API is used for the request-response relationship with the server, and JSON is used to represent MATLAB datatypes.

A JSON object is used to send text from the server to the Android device. JSON stands for JavaScript Object Notation and is used to exchange data to and from a web server. Android provides four different classes to manipulate JSON data. We have used the JSONObject class to receive text from the server[8].

VII. CONCLUSION AND FUTURE SCOPE

In this paper, a new approach was proposed for static gesture recognition on resource-constrained devices like smartphones. The focus of this research is the application of the HSV color model for feature extraction. From our study of skin color segmentation methods, we conclude that hand segmentation with a normal RGB camera is problematic against skin-colored backgrounds and under changing light conditions. The proposed sign language recognition system recognizes gestures in a constrained environment, such as against a dark background. The performance of the algorithm can degrade under varying lighting conditions, lighter backgrounds and background noise. Further work should therefore focus on hand segmentation methods for resource-constrained devices under varying light conditions and against skin-colored backgrounds. This system can be further developed to recognize gestures in real time and in video format. Many other gestures of the sign language can also be made part of the database.

REFERENCES

[1] Pragati Garg, Naveen Aggarwal and Sanjeev Sofat, "Vision Based Hand Gesture Recognition."
[2] Jagdish L. Raheja, A. Singhal, Sadab and Ankit Chaudhary, "Android Based Portable Hand Sign Recognition System."
[3] V. A. Oliveira and A. Conci, "Skin Detection Using HSV Color Space."
[4] CS231n Convolutional Neural Networks for Visual Recognition [Online]. Available: http://cs231n.github.io/convolutional-networks/ [Accessed: 23-Mar-2017].
[5] Piyush Kumar, Jyoti Verma and Shitala Prasad, "Hand Data Glove: A Wearable Real-Time Device for Human-Computer Interaction."
[6] A. R. Patil and S. S. Subbaraman, "A Review on Vision Based Hand Gesture Recognition Approach Using Support Vector Machines."
[7] Jinho Kim, Byung-Soo Kim and Silvio Savarese, "Comparing Image Classification Methods: K-Nearest-Neighbor and Support-Vector-Machines."
[8] Android - JSON Parser [Online]. Available: https://www.tutorialspoint.com/android/android_json_parser.htm [Accessed: 25-Mar-2017].
[9] Pieter Vermeulen and Todd F. Mozer, "Client/Server Architecture for Text-to-Speech Synthesis."
[10] Michael W. Schwarz, William B. Cowan and John C. Beatty, "An Experimental Comparison of RGB, YIQ, LAB, HSV, and Opponent Color Models."
[11] Gururaj P. Surampalli, Dayanand J and Dhananjay M, "An Analysis of Skin Pixel Detection Using Different Skin Color Extraction Techniques."
[12] Muthukrishnan R. and M. Radha, "Edge Detection Techniques for Image Segmentation."
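As an illustration of the preprocessing algorithm of Section V, steps 2-4 (RGB-to-HSV conversion and skin thresholding) can be sketched in plain Python. The standard-library colorsys module stands in for the MATLAB/OpenCV color-space conversion, and the sample pixel values are hypothetical; the thresholds are exactly those listed in step 3:

```python
import colorsys

# HSV skin thresholds from Section V, step 3:
# H in [0, 50] degrees, S in [0.05, 0.8], V in [0.25, 1].
H_MAX_DEG = 50.0
S_MIN, S_MAX = 0.05, 0.8
V_MIN, V_MAX = 0.25, 1.0

def is_skin(r, g, b):
    """Return True if an 8-bit RGB pixel falls inside the skin
    thresholds after conversion to HSV (step 4 of the algorithm)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (h * 360.0 <= H_MAX_DEG
            and S_MIN <= s <= S_MAX
            and V_MIN <= v <= V_MAX)

def binarize(rgb_image):
    """Steps 2-4 over a whole image: a list of rows of (r, g, b)
    pixels -> binary mask (1 = skin/gesture pixel, 0 = background)."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in rgb_image]

# Hypothetical 1x2 image: one skin-toned pixel, one dark background pixel.
print(binarize([[(220, 170, 140), (10, 10, 30)]]))  # -> [[1, 0]]
```

The remaining steps (blob removal, smoothing with a disk structuring element, hole filling) correspond to morphological operations available in MATLAB's Image Processing Toolbox and are omitted from this sketch.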
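The server-to-app exchange of Section VI amounts to serializing the recognized text as a JSON object and parsing it on the device. A minimal sketch of that round trip, in Python for illustration (the field names are hypothetical; on Android the JSONObject class plays the parsing role):

```python
import json

# Hypothetical server response carrying the recognized gesture text.
response_body = json.dumps({"gesture": "A", "text": "letter A"})

# Device side: parse the JSON object and read the text field,
# as the app does with JSONObject after the REST call returns.
parsed = json.loads(response_body)
print(parsed["text"])  # -> letter A
```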