Comparative Analysis Study of Human Activity
Recognition Using Various Techniques
Muhammad Hassan*, Tasweer Ahmad†, Sadaf Ali‡
*Department of Electrical Engineering, Government College University, Lahore, Pakistan
†Department of Electrical Engineering, Government College University, Lahore, Pakistan
‡Department of Computer Science, University of Lahore, Pakistan
Email: *Engr.muhamadhassan@gmail.com, †tasveerahmad@gmail.com, ‡Sadatkamboh66@gmail.com
Abstract: Many efforts have been devoted to human activity recognition, including face
recognition and motion analysis. The major problems in recognition are: 1. Scale. 2. Shift.
3. Projection. Human activities are identified by the pose of the body, such as standing or
sitting, and some activities relate to emotions and facial expressions. Generally, human activity
recognition has so far been achieved with two groups of instruments: wearable sensors and
environment sensing instruments. Wearable sensors include accelerometers, gyroscopes, and
microphones; environment sensing instruments include video recorders and shock sensors. Since
installing environment sensing instruments is intrusive and quite expensive, wearable sensors
are preferred. Human activity is recognized by combining information on human pose, human
location, and elapsed time. The main objective of this research is to recognize and summarize
various activities of the human body with the help of techniques such as machine learning and
sensor technology. This will be helpful in commercial settings, such as restaurants and food
places staffed by automated robots; in the medical field, such as fully automated nurse robots
for patient care; and for security purposes.

I. INTRODUCTION

Recognizing human activities has become very challenging nowadays. Its application areas range
from pedestrian surveillance and crime prevention to human-computer interaction. Activity
recognition increasingly relies on technologies such as robust foreground segmentation, human
subject tracking, and occlusion handling [1], but these technologies still need improvement. A
large number of human behaviors are considered human activities, which makes this field very
broad.

Vision-based activity recognition operates at two levels [2]: low-resolution tasks and
high-resolution tasks. The information required for a low-resolution task treats the body as a
whole [3], e.g., center of mass, area, volume, and velocity of the entire body [4]. The
information required for a high-resolution task is the measurement [5] and relation of individual
body parts, e.g., pose, silhouette, position, and velocity of the hands, feet, and head [6, 7].

There are two types of shape-based features: silhouette and contour. The silhouette method can
detect that a person is carrying some object, and the silhouette can be divided into small
segments using pixels [8]. Each input image is first segmented into regions. Then, consistency
constraints within individual images and across multiple views, expressed as projection and
geometric intersection, are enforced to carve out the silhouette in each image. As an extension,
a space-carving-like method can produce the visual hull of the object without the silhouettes.
Silhouettes provide important geometric information for both computer vision and graphics. In
computer vision, shape-from-silhouette methods construct a 3D shape as the intersection of the
visual cones defined by the rays from the camera centers through all pixels on the silhouettes.
These shape-from-silhouette methods are especially successful for real-time virtual reality
applications, as they are simple and efficient.

The contour method extracts the boundary. Contours are the boundary lines of geometric shapes
within digital images. Since the identification of contours is crucial for analyzing the contents
of an image, contour extraction is one of the most important problems in computer vision and
pattern recognition. Contour methods include wavelets, the Fourier transform, and the Hough
transform. Geometric algorithms, such as polygonal fitting, can be used to efficiently find
approximations of contours; however, the knowledge required is not always available, so they are
not generally applicable. Global methods, such as the Hough transform, do not rely on any kind of
prior knowledge, and try to find sets of pixels which lie on curves of specific shapes. These
three classes of methods all present some drawbacks: local methods ignore valuable global
information about the geometric proximity of pixels, since they only look at a very small
neighborhood; regional methods require prior knowledge about which pixels are part of which
contour; and global methods such as the Hough transform can only be used to find certain types of
shapes.

Researchers also use vision-based sensors for human activity recognition. Owing to major advances
in micro-sensor technology, low-power wireless communication, and wireless sensor networks,
sensor systems are low-cost, effective, and privacy-aware alternatives.

Mobile devices such as cellular phones have played an important role as powerful sensors. Their
sensors include GPS, microphones, cameras, and accelerometers. Because these devices are small,
they are used for data mining research and applications.
ISBN: 978-1-4799-5754-5/14/$26.00 ©2014 IEEE
Accelerometers have a high potential for use in ambulatory three-dimensional movement analysis
systems: they are small, do not need to be attached to a reference, and provide a signal which
incorporates acceleration and inclination information. However, the analysis of movements from
accelerometer data is generally not straightforward, because the information comprises several
components: the output signal of an accelerometer consists of an actual acceleration component
and a gravitational acceleration component. Image's tri-axial accelerometer principle is a
patented sensor solution for motion measurement in three dimensions. All three axes of motion are
measured with a single silicon chip sensor, which is useful for measuring vibrations or
accelerations in tight spaces. The following properties are the key features in recognizing
activities: 1) Robustness: features should be obtained from sequences with reasonable accuracy.
2) Discriminative: features should be discriminative rather than generative. 3) Rejection:
activity recognition should not be expected to be exhaustive in the foreseeable future.
Robustness is defined as the ability of a system to resist change without adapting its initial
stable configuration.

In real-life activity recognition systems, a hierarchical approach is usually followed. At the
lower levels, background-foreground segmentation, tracking, and object detection are performed.
At the mid level, action recognition is performed. At the high level, reasoning engines encode
the activity [9].

II. ACTIVITY RECOGNITION USING DISCRETE COSINE TRANSFORM AND PRINCIPAL COMPONENT ANALYSIS

The main issue in ubiquitous and wearable computing is context awareness. Here, a single
tri-axial accelerometer is used for the recognition of activities. The user can easily place the
accelerometer in his/her pocket. The discrete cosine transform (DCT) is an orthogonal transform
[10]. Principal component analysis (PCA) is applied in the DCT domain to extract only the
discriminating features for recognition [11]. With this combination, more DCT coefficients are
kept and the most discriminating features can be extracted. Multi-class support vector machines
(SVM) are then adopted to identify different human activities. The overall idea is shown in
Fig. 1.

A support vector machine (SVM) is a discriminative classifier formally defined by a separating
hyper-plane. In other words, given labeled training data (supervised learning), the algorithm
outputs an optimal hyper-plane which categorizes new examples.

A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of
cosine functions oscillating at different frequencies. DCTs are important to numerous
applications in science and engineering, from lossy compression of audio (where small
high-frequency components can be discarded) to spectral methods for the numerical solution of
partial differential equations. The use of cosine rather than sine functions is critical in these
applications: for compression, cosine functions are much more efficient (fewer functions are
needed to approximate a typical signal), whereas for differential equations the cosines express a
particular choice of boundary conditions.

Fig. 1. Overview of the model (blocks: Feature Database, Training, Testing, SVM Classifier, Results)

The data are generated by the accelerometer and transmitted to a PC via Bluetooth [12]. The data
have the following attributes: 1. Time. 2. Acceleration along the x axis. 3. Acceleration along
the y axis. 4. Acceleration along the z axis. Four daily activities are collected: 1. Running.
2. Still. 3. Jumping. 4. Walking. To achieve robustness, the position of the sensor is important.
The person should place the accelerometer in his pocket. Since the sensor is not fixed to the
body, it is in a continuous state of random motion, which causes vibrations. The magnitudes of
the DCT coefficients are used as features. These features are extracted from each acceleration
signal.

$$X_c(0) = \sqrt{\frac{1}{N}} \sum_{n=0}^{N-1} x(n)$$

$$X_c(k) = \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} x(n) \cos\frac{(2n+1)k\pi}{2N}, \quad k = 1, 2, \ldots, (N-1)$$

where $X_c(k)$ is the k-th DCT coefficient. The N DCT coefficients can be computed using a
2N-point Fast Fourier Transform. $X_c(k)$ acts as a band-pass filter with center frequency at
(2k+1)/2N when the sampling frequency is normalized to 1. Hence the magnitude of the output of
$X_c(k)$ for small k is generally large.

The dimensionality is reduced by eliminating some of the original features. This reduction is
achieved by feature reduction techniques: the original features are mapped onto a
lower-dimensional subspace by a feature transform method. Here PCA is used to reduce the
dimension of the features. This retains the essential information in the acceleration signal,
which characterizes the human activity.

The SVM is fundamentally a binary classifier, so the multi-class case is handled indirectly. To
solve the multi-class classification problem, it is decomposed into many binary problems. The
One-versus-One (OVO) strategy constructs a classifier from the data of each pair of classes. The
output is produced by the MAX-WINS strategy.
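
The two computational steps of this section, DCT-magnitude features and max-wins voting over one-versus-one decisions, can be illustrated with a minimal sketch. It is not the implementation used in [12]. It assumes the orthonormal DCT scaling, i.e. X_c(0) = sqrt(1/N) * sum of x(n) and X_c(k) = sqrt(2/N) * sum of x(n) cos((2n+1)k*pi/(2N)). The function names, the `keep` parameter, and the `pairwise_decide` callable (standing in for trained binary SVMs) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def dct_coefficients(x):
    """Orthonormal DCT of a real sequence, computed directly in O(N^2)."""
    n_samples = len(x)
    coeffs = []
    for k in range(n_samples):
        # k = 0 uses sqrt(1/N); every other coefficient uses sqrt(2/N).
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n_samples)
        total = sum(
            x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_samples))
            for n in range(n_samples)
        )
        coeffs.append(scale * total)
    return coeffs


def dct_magnitude_features(window, keep):
    """Magnitudes of the first `keep` DCT coefficients of one axis window."""
    return [abs(c) for c in dct_coefficients(window)[:keep]]


def window_features(ax, ay, az, keep=8):
    """Concatenate per-axis features for one tri-axial window (assumed layout)."""
    return (dct_magnitude_features(ax, keep)
            + dct_magnitude_features(ay, keep)
            + dct_magnitude_features(az, keep))


def max_wins_predict(features, classes, pairwise_decide):
    """One-versus-one voting: each class pair casts one vote, most votes wins.

    `pairwise_decide(a, b, features)` is a hypothetical stand-in for a trained
    binary classifier and must return either a or b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, features)] += 1
    # Ties are broken in favour of the class listed first in `classes`.
    return max(classes, key=lambda c: votes[c])
```

In practice a PCA projection would sit between `window_features` and the classifier, and a fast transform would replace the direct summation. Both are omitted here to keep the sketch free of external dependencies.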
III. FUZZY INFERENCE SYSTEM

An accelerometer is used to obtain acceleration data of human movement. The system classifies 4
different activities of a person: 1. Moving forward. 2. Jumping. 3. Going upstairs. 4. Going
downstairs. The input to the fuzzy system is based on 3 features obtained from every axis of the
accelerometer: 1. Peak-to-peak amplitude. 2. Standard deviation. 3. Correlation. The observed
values of these features are used to derive the fuzzy rules and membership functions.

Fig. 2. Structure of a fuzzy inference system (input, fuzzy processing, output)

A FIS has the property of decision making; therefore it is used to identify different patterns of
user motion. A FIS is used to identify different activities because of its: 1. Ease of
understanding. 2. Flexibility. 3. Tolerance of imprecise data [13].

The process of mapping the input to the output using fuzzy logic is known as fuzzy inference.
Decisions are made with the help of this mapping. Fuzzy inference involves: 1. Membership
functions. 2. Logic operations. 3. If-then rules. Membership functions (MF) are used to define
the inputs and outputs of the system. Fuzzy if-then rules are statements in the form of an
antecedent and a consequent. These statements are obtained by applying fuzzy operators to the
inputs and outputs of the system. The fuzzy operators are usually: 1. AND. 2. OR. The
decision-making unit applies the fuzzy operations to the rules. Here, 'min' is used for the AND
operator and 'max' for the OR operator. The process of building the fuzzy set in the consequent
on the basis of the antecedent in a FIS is called implication.

The fuzzification interface takes the inputs and uses the membership functions to determine their
degree of membership in the fuzzy sets. Aggregation then combines the outputs of all rules into a
single fuzzy set; 'max' is used for this purpose. Finally, defuzzification is performed to obtain
the output, using the centroid calculation.

With a tri-axial accelerometer, 9 features are used as inputs to the FIS. These are computed from
the acceleration data for all 3 axes. For each data stream, every feature is obtained:
1. Peak-to-Peak Amplitude comprises x-PPA, y-PPA and z-PPA. The acceleration amplitudes on all
axes are maximal in the case of jumping, minimal in the case of walking forward, and
intermediate in the case of moving upstairs and moving downstairs.
2. Standard Deviation comprises x-STD, y-STD and z-STD. It is only used to confirm the results
obtained from the peak-to-peak amplitude.
3. Correlation between Axes comprises xy-CORR, xz-CORR, and yz-CORR. The correlation between two
axes produces a data stream, so the difference between its maximum and minimum values is
calculated.

The following rules are defined on the 9 features:
Rule 2: If (x-PPA AND y-PPA AND z-PPA AND x-STD AND y-STD AND z-STD) are intermediate AND
(xy-CORR AND xz-CORR AND yz-CORR) are low, then the activity is going downstairs.
Rule 3: If (x-PPA AND y-PPA AND z-PPA AND x-STD AND y-STD AND z-STD) are intermediate AND
(xy-CORR AND xz-CORR AND yz-CORR) are high, then the activity is going upstairs.
Rule 4: If (x-PPA AND y-PPA AND z-PPA AND x-STD AND y-STD AND z-STD) are high, then the activity
is jumping.

Fig. 3. Direction of the Accelerations in a Tri-axial Accelerometer

IV. WIRELESS SENSORS USING EVOLVING FUZZY SYSTEMS

This technique re-trains the system continuously, until the application is stopped, without any
delay in performance. The approach is formulated on adaptive signal processing techniques, which
leads to the construction of evolving models for pattern recognition. Evolving fuzzy systems have
the capacity for progressive evolution, which is also present in human nature [14].

We constructed the prototype by connecting a wireless sensor to an accelerometer. The main
concept here is encoding intelligence in wearable sensors. The comparative analysis of our
algorithm with other techniques can be reviewed using the following datasets: 1. The Offline
Dataset is based on the raw data from the sensors. 2. The Online Dataset is based on features
selected in a real-time fashion. Evolving Fuzzy Systems (EFS) are fully recursive; therefore we
use the Online Dataset. A classifier maps from the feature space to the class label space.
Evolutionary algorithms are used to train classifiers off-line. The e-Class family is trained for
on-line applications.

Our on-line algorithm works like adaptive control; between 2 samples, 2 phases occur:
1. Classification. 2. Classifier update. In the first phase the class label is unknown and is
predicted; in the second phase the label is known. The difference
between e-Class classifiers lies in: 1. The open structure of the rule base. 2. The on-line
learning mechanism.

The advantages of the on-line approach stem from its computational simplicity: 1. Low memory
requirements. 2. Short response time. 3. Re-learning ability. An on-line classifier forms stable
patterns for classification when similar motion patterns persist for a long time. We study and
experiment on 3 activities in this order: 1. Climbing stairs. 2. Stretching. 3. Walking. Since
climbing and walking have almost similar patterns, we put stretching in between them. The
response time must be as short as possible so that the data buffers can be cleared. The time of
operation must be constant for real-time operation. A maximum response time is enforced through
operational time constraints.

A fixed threshold technique cannot be used in a real-time environment, because it is a weak
approach that is only suitable for specific conditions and not for generic purposes. The
acceleration waveform varies when a fall occurs. During walking, the maximum lies between 2
delimited cluster centers. Whenever a fall occurs, there is an increase in the acceleration data.
While climbing stairs, detecting a fall becomes more complex. When the subject is walking, the
time taken exceeds 1 second and varies from 1 to 3 seconds.

V. RESULTS AND CONCLUSIONS

Feature extraction with DCT and PCA is a very efficient method with a high recognition accuracy
rate. Support vector machines were used for activity recognition in this system. The experimental
results show an accuracy rate of more than 95% for 4 activities.

A fuzzy inference system (FIS) with rules and membership functions can recognize the patterns
acceptably. The FIS shows better results than various classifiers. The accuracy of the proposed
algorithm can be increased by combining it with a neural network, which is termed a neuro-fuzzy
system. Combining the fuzzy inference system with a genetic algorithm would also lead to a higher
recognition accuracy.

A real-time evolving classifier (e-Class) is used to recognize activities from 2-axial
accelerometer data obtained from sensor nodes. The success rate was approximately 99% for 3
unrelated activities.

Table 1. Results of Various Techniques

Sr. No.   Title                                                   Accuracy Rate
01.       Activity Recognition Using Discrete Cosine Transform    95%
          and Principal Component Analysis
02.       Fuzzy Inference System                                  97%
03.       Wireless Sensors Using Evolving Fuzzy Systems           99%

REFERENCES

[1] J. K. Aggarwal and Q. Cai, "Human Motion Analysis: A Review," Computer Vision and Image
Understanding, vol. 73, no. 3, pp. 428-440, March 1999.
[2] L. Fiore, "Multi-Camera Human Activity Monitoring," Journal of Intelligent and Robotic
Systems, vol. 52, no. 1, May 2008.
[3] R. Bodor, B. Jackson, and N. Papanikolopoulos, "Vision-Based Human Tracking and Activity
Recognition," Proc. of the 11th Mediterranean Conf. on Control and Automation, June 2003.
[4] B. Maurin, O. Masoud, and N. Papanikolopoulos, "Monitoring Crowded Traffic Scenes," Proc. of
the IEEE 5th Int. Conf. on Intelligent Transportation Systems (ITSC 2002), pp. 19-24, Singapore,
September 3-6, 2002.
[5] D. Beymer and K. Konolige, "Real-Time Tracking of Multiple People Using Continuous
Detection," Proc. of the International Conference on Computer Vision, 1999.
[6] R. Fablet and M. J. Black, "Automatic Detection and Tracking of Human Motion with a
View-Based Representation," European Conf. on Computer Vision (ECCV'02), May 2002.
[7] I. Haritaoglu, D. Harwood, and L. Davis, "W4: Real-Time Surveillance of People and Their
Activities," IEEE Trans. on Pattern Analysis and Machine Intelligence, August 2000.
[8] http://www.nlpr.ia.ac.cn
[9] P. Turaga, R. Chellappa, V. S. Subrahmanian, and O. Udrea, "Machine Recognition of Human
Activities: A Survey," IEEE Transactions on Circuits and Systems for Video Technology, October
2008.
[10] Z. He, "Gesture recognition based on 3D accelerometer for cell phones interaction," IEEE
Asia Pacific Conference on Circuits and Systems (APCCAS 2008), December 2008.
[11] Z. He, "Accelerometer Based Gesture Recognition Using Fusion Features and SVM," Chinese
Conference on Pattern Recognition (CCPR), 2010.
[12] Z. He, "Activity recognition from acceleration data based on discrete cosine transform and
SVM," IEEE International Conference on Systems, Man and Cybernetics (SMC 2009), October 2009.
[13] M. Helmi, "Human activity recognition using a fuzzy inference system," IEEE International
Conference on Fuzzy Systems (FUZZ-IEEE 2009), August 2009.
[14] J. Andreu and P. Angelov, "Real-time human activity recognition from wireless sensors using
evolving fuzzy systems," IEEE International Conference on Fuzzy Systems (FUZZ), July 2010.
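
As a supplementary illustration of the fuzzy inference steps described in Section III ('min' for AND, 'max' for aggregation, centroid defuzzification), the sketch below evaluates Rules 2 to 4 on the 9 features. It is not the system of [13]. Several parts are assumptions made only for this example: the triangular membership shapes, the normalisation of every feature to [0, 1], the numeric activity codes on the output axis, 'min' as the implication operator, and all function and variable names. Rule 1 is not reproduced in the text and is therefore left out.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Assumed input sets for features normalised to [0, 1].
LEVELS = {
    "low": (-0.5, 0.0, 0.5),
    "intermediate": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}
# Assumed output sets: one triangle per activity, peaked at an arbitrary code.
OUTPUT = {
    "going downstairs": (0.0, 1.0, 2.0),
    "going upstairs": (1.0, 2.0, 3.0),
    "jumping": (2.0, 3.0, 4.0),
}
AMP = ["x-PPA", "y-PPA", "z-PPA", "x-STD", "y-STD", "z-STD"]
CORR = ["xy-CORR", "xz-CORR", "yz-CORR"]


def degree(feats, names, level):
    """AND over several features with the 'min' operator."""
    return min(tri(feats[name], *LEVELS[level]) for name in names)


def infer(feats, steps=400):
    """Return (crisp value, activity label), or None if no rule fires."""
    strengths = {
        "going downstairs": min(degree(feats, AMP, "intermediate"),
                                degree(feats, CORR, "low")),    # Rule 2
        "going upstairs": min(degree(feats, AMP, "intermediate"),
                              degree(feats, CORR, "high")),     # Rule 3
        "jumping": degree(feats, AMP, "high"),                  # Rule 4
    }
    num = den = 0.0
    for i in range(steps + 1):
        y = 4.0 * i / steps
        # 'min' clips each consequent; 'max' aggregates the clipped sets.
        mu = max(min(s, tri(y, *OUTPUT[name])) for name, s in strengths.items())
        num += y * mu
        den += mu
    if den == 0.0:
        return None
    crisp = num / den  # centroid of the aggregated fuzzy set
    label = min(OUTPUT, key=lambda name: abs(OUTPUT[name][1] - crisp))
    return crisp, label
```

For example, calling `infer` with every amplitude and deviation feature at 1.0 fires only Rule 4 and yields the label "jumping". Real membership functions would be tuned from recorded data rather than fixed as above.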