Wheelchair Control Using Speech Recognition
P. B. Ghule, M. G. Bhalerao, R. H. Chile and V. G. Asutkar
Abstract—In this paper, a speech-controlled wheelchair for physically disabled persons is developed which can be used with different languages. A speech recognition system using Mel Frequency Cepstral Coefficients (MFCC) was developed on a laptop with an interactive and user-friendly GUI, and a normal wheelchair was converted into an electric wheelchair by applying a gear mechanism to the wheels with a DC motor attached to the gear. An Arduino Uno board is used to acquire the control signal from MATLAB and pass it to a relay driver circuit, which in turn results in the motion of the wheelchair in the desired direction. Speech inputs such as forward, back, left, right and stop are acquired from the user, and the wheelchair then moves according to the respective command. The performance of MFCC in the presence of noise and for different languages was studied to assess the reliability of the algorithm under different conditions.
I. INTRODUCTION

The principal motto behind research, i.e., the benefit of the research to common people and to the country, is given the least attention by researchers. There is a need to think about products that will benefit people in society along with their benefit to the nation. We focus on one of the issues faced by people, namely physical disability, and try to provide an engineering solution: an electric wheelchair with maximum liberty and minimum cost. The need to reduce cost was elaborated by Dr. Amartya Sen in his keynote address at the World Bank's conference on the issue of disability: the poverty line for physically disabled people should also take into account the greater expenses they incur in exercising what purchasing power they have. A study in the U.K. found that the poverty rate for disabled people was 23.1% as compared to 17.9% for non-disabled people, but when the greater expenses associated with their disability were considered, the poverty rate for people with disabilities shot up to 47.4%. This illustrates the higher expenses involved and the need to reduce this cost.

The application of technology in the field of wheelchairs was first tried by George Klein in 1953 [4]; since then this area of technology, i.e., the electric wheelchair, has continuously flourished and expanded with discoveries that aim to make the user more competent and capable in society. In 1986, Arizona State University, U.S., launched a wheelchair which used machine vision to identify landmarks and center the wheelchair in a hallway [1]. The technology of voice recognition was first used in 1999 by Siamo, University of Alcala, Spain [1]. They designed a wheelchair which was controlled by head gestures, with voice commands as a secondary input. In 1999-2000 in India, CEERI developed a voice cum auto-steer wheelchair which had a line-following mode along with a voice-control mode [1]. In 2002, Mr. Huri at Yonsei University, Korea, designed a wheelchair with multiple control modes, including facial gestures, EMG signals from the neck and voice commands [1]. In 2007, M. Nishimory and T. Saitoh proposed a voice-controlled intelligent wheelchair; the user could control the wheelchair through voice commands in the Japanese language [2]. A voice-controlled wheelchair using the DSK TMS320C6711 was proposed in April 2009 by Qadri M. T. and Ahmed S. A., using a DSP processor from Texas Instruments for voice signal processing [3]; zero crossing count and standard deviation of the spoken words are the algorithms they used for voice recognition. In July 2009, robust speech recognition was applied to a voice-driven wheelchair by Akira Sasou and Hiroaki Kojima [4], who used an array of microphones attached to the wheelchair for voice input; this wheelchair had the disadvantage of a longer processing time for voice recognition. In 2013, A. Ruiz Errano and R. P. Gomez developed a dual control system which was capable of driving a wheelchair through tongue and speech signals [5].

Most of the electric wheelchairs developed so far run with the help of a joystick. Further solutions proposed for making control more comfortable are controlling the wheelchair using tongue movement [5], hand gestures [6], voice commands [7] and a brain control interface [8]. Tongue control is not very feasible, since the user cannot talk while using the tongue to control the wheelchair, and it may be tiring for long-term use. Hand gestures [6] are a better option than the tongue, but they can cause pain and discomfort after an ample amount of time. The brain control interface [8] is effective but is a very costly solution for wheelchair control and can give a tiring experience to the user; it also requires a lot of setup for acquiring the brain signal, processing it and extracting exact information for use. The voice-controlled wheelchair provides a far better platform for wheelchair control considering the accessibility and comfort of the user. We finally want our user to be a capable citizen of the country.
Speech is a natural mode of communication used by humans. It is a resilient way to exchange information between two persons. This concept motivated many researchers to use speech as a communication channel for man-machine interaction, which gave rise to speech processing. Speech recognition came into existence in the 1920s, when the first machine to recognize speech commercially, a toy named Radio Rex, was manufactured. Advanced research in speech processing began in early 1936 at Bell Labs. In 1939, Bell Labs demonstrated a speech synthesis machine invented by them at the World Fair, New York. In the decade 1940-1950, many researchers tried to utilize the basic ideas of acoustics, phonetics and speech properties.

To boost the magnitude of the higher frequencies, the input speech waveform is pre-emphasized by a first-order filter with transfer function

H(z) = 1 - \alpha z^{-1}, \quad 0.9 \le \alpha \le 1.0 \qquad (1)
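As an illustration of this pre-emphasis step, the sketch below applies the filter of equation (1) with NumPy; the coefficient value α = 0.97 and the synthetic test signal are assumptions made for the example, not values stated in the paper.

```python
import numpy as np

def pre_emphasize(speech: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """Apply the first-order pre-emphasis filter y[n] = x[n] - alpha * x[n-1]."""
    # The first sample has no predecessor, so it is passed through unchanged.
    return np.append(speech[0], speech[1:] - alpha * speech[:-1])

# Example with a synthetic tone; a real system would use the recorded command.
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000.0)
emphasized = pre_emphasize(speech, alpha=0.97)
```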
The transformed frequency-domain signal contains both real and imaginary values. This signal is converted to real values using the formula below to reduce the mathematical complexity:

|X_p(k)|^2 = \mathrm{Re}(X_p(k))^2 + \mathrm{Im}(X_p(k))^2

This eliminates the imaginary part and thus brings down the mathematical computation drastically.
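A minimal sketch of this magnitude computation is shown below, assuming one pre-emphasized frame, a Hamming window and a 512-point FFT; these framing details are assumptions for the example, not settings stated in the paper.

```python
import numpy as np

def power_spectrum(frame: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """Return Re(X_p(k))^2 + Im(X_p(k))^2 for one speech frame."""
    windowed = frame * np.hamming(len(frame))       # taper the frame edges
    spectrum = np.fft.rfft(windowed, n=n_fft)       # complex frequency-domain signal
    return spectrum.real ** 2 + spectrum.imag ** 2  # real-valued power spectrum

# Example: one 25 ms frame (400 samples at 16 kHz) of a synthetic signal.
frame = np.sin(2 * np.pi * 440 * np.arange(400) / 16000.0)
p = power_spectrum(frame)
```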
E. Mel-Filter bank creation

In signal processing, raw data is converted into informative, non-redundant features, leading to better interpretation. Generally, feature extraction is used to reduce the size of the vectors. When the input is too large to be handled and is expected to be sparse, it can be altered to a condensed form called a feature vector; this process is called feature extraction. The Mel-frequency cepstrum (MFC), used in speech processing, is a depiction of the short-term power spectrum of a sound, based on a linear cosine transform of the log power spectrum on a nonlinear mel scale of frequency.

Triangular membership functions are generated using the formula given below, where lf(k) is the linear frequency of the k-th FFT bin and lf_c(m) is the centre frequency of the m-th filter:

M(m,k) =
\begin{cases}
0 & \text{for } lf(k) < lf_c(m-1) \\
\dfrac{lf(k) - lf_c(m-1)}{lf_c(m) - lf_c(m-1)} & \text{for } lf_c(m-1) \le lf(k) < lf_c(m) \\
\dfrac{lf(k) - lf_c(m+1)}{lf_c(m) - lf_c(m+1)} & \text{for } lf_c(m) \le lf(k) < lf_c(m+1) \\
0 & \text{for } lf(k) \ge lf_c(m+1)
\end{cases}
F. Linear to Mel-frequency conversion

Stevens, Volkmann and Newman proposed the mel scale in 1937; it is a perceptual scale of pitches judged to be equidistant from each other by listeners. The name mel arises from the word melody, to indicate that the scale is based on pitch comparisons. The formula to convert a frequency f in hertz to the mel scale is:

\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \qquad (3)
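The sketch below ties subsections E and F together: it uses equation (3) to place the filter centre frequencies uniformly on the mel scale and then builds the triangular weights M(m, k). The sampling rate, FFT size and number of filters are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)        # equation (3)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)      # inverse of equation (3)

def mel_filter_bank(n_filters=20, n_fft=512, sample_rate=16000):
    """Build triangular filters M(m, k) on a mel-spaced set of centre frequencies."""
    # Centre frequencies equally spaced on the mel scale, then mapped back to Hz.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    # Linear frequency lf(k) of each FFT bin k.
    lf = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    filters = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        rising = (lf - left) / (centre - left)       # rising slope of the triangle
        falling = (right - lf) / (right - centre)    # falling slope of the triangle
        filters[m - 1] = np.clip(np.minimum(rising, falling), 0.0, 1.0)
    return filters

filters = mel_filter_bank()   # shape (20, 257): one row of weights per filter
```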
I. Mean Square Error

The classifier is used to classify the input and give the recognized word. There are many classifiers such as ANN, GMM and HMM, but we have used the mean square error (MSE) as the classifier because of its mathematical simplicity.

MSE is a frequently used measure of the difference between an estimated vector and the actual or ideal vector. In this application, MSE is used to calculate the distance between the cepstral coefficients of the newly recorded signal and the cepstral coefficients of the pre-recorded signals. The MSE will be minimum for the best-matched feature vectors. If R1 is the MFCC vector of a pre-recorded signal and R2 is the MFCC vector of the newly recorded signal, then the MSE can be calculated using the equation below:

\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(R_2(i) - R_1(i)\right)^2 \qquad (4)

where n is the length of both vectors. The MSE of intra-class feature vectors (different recordings of the same word) is small, while the MSE of inter-class feature vectors (different recordings of different words) is larger. Hence one can find the relation between two feature vectors by calculating the MSE between them.
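A minimal sketch of this matching step, assuming MFCC feature vectors have already been computed for the stored command templates and for the new recording; the command set and the dictionary of templates are assumptions made for the example.

```python
import numpy as np

def mse(r1: np.ndarray, r2: np.ndarray) -> float:
    """Equation (4): mean square error between two feature vectors of equal length."""
    return float(np.mean((r2 - r1) ** 2))

def classify(new_features: np.ndarray, templates: dict) -> str:
    """Return the command whose pre-recorded MFCC template has the lowest MSE."""
    return min(templates, key=lambda word: mse(templates[word], new_features))

# Example with made-up 13-dimensional MFCC templates for the five commands.
rng = np.random.default_rng(0)
templates = {w: rng.normal(size=13) for w in ["forward", "back", "left", "right", "stop"]}
new_features = templates["left"] + 0.01 * rng.normal(size=13)
print(classify(new_features, templates))   # -> "left"
```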
III. HARDWARE IMPLEMENTATION

A. Mechanical Design

In this project, a readily available travelling-type wheelchair is used and modified. A pair of DC geared motors is connected to the rear wheels by a sprocket-and-chain mechanism, as shown in Fig. 2. The front wheels are of the caster type and are free to rotate through 360 degrees.

B. Hardware arrangement

An Arduino board is connected to the PC. A relay driver circuit is attached to the Arduino board, and the relay driver drives the motors attached to the chain-and-sprocket arrangement.
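One possible realisation of this PC-to-Arduino link is sketched below using the pyserial package (the paper sends the control signal from MATLAB; Python is used here purely for illustration). The serial port name, baud rate and single-character command codes are assumptions, not values given in the paper.

```python
import serial  # pyserial: pip install pyserial
import time

# Hypothetical single-character codes for the five voice commands.
COMMANDS = {"forward": b"F", "back": b"B", "left": b"L", "right": b"R", "stop": b"S"}

def send_command(word: str, port: str = "/dev/ttyACM0", baud: int = 9600) -> None:
    """Open the serial link to the Arduino and transmit the recognized command."""
    with serial.Serial(port, baud, timeout=1) as link:
        time.sleep(2)               # allow the Arduino to reset after the port opens
        link.write(COMMANDS[word])  # the Arduino sketch would switch the relays accordingly

send_command("forward")
```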
Fig. 5. Speech recognition from database

TABLE II
CORRECT RECOGNITIONS FOR DIFFERENT LINGUISTIC SIGNALS USING MFCC

Language     No. of Recognised Signals
Hindi        1000
Marathi      1000
Bengali      1000
Kannada      1000
Tamil        1000
Telugu       1000
Malayalam    1000

F. Control Signal Generation