Stray Dogs Behavior Detection in Urban
Area Video Surveillance Streams
Marius Baba1, Dan Pescaru1, Vasile Gui2, Ionel Jian1
1 Politehnica University of Timisoara, Department of Computer and Information Technology,
V. Parvan no. 2, Timisoara, Romania
emails: mariusb007@yahoo.com, dan.pescaru@upt.ro, ionel.jian@upt.ro
2 Politehnica University of Timisoara, Department of Communications,
V. Parvan no. 2, Timisoara, Romania
email: vasile.gui@upt.ro
Abstract—Smart city solutions are increasingly important in the context of continuous population growth. The approach presented in this paper provides a solution for the automatic detection of stray dog attacks in urban areas. It is based on existing urban video-camera infrastructure and uses movement feature extraction and classification algorithms to discriminate between normal and dangerous behavior of a group of stray dogs. The output can be further used by an emergency on-line alert system.

Keywords—video surveillance, video processing, intelligent systems

I. INTRODUCTION

Intelligent video surveillance systems become more and more important in conjunction with present-day problems such as terrorist attacks, natural disasters caused by global warming, and city overcrowding. They have also spread, with spectacular results, to various other domains, from healthcare for in-house patient monitoring and fall incident detection [2] in elder care, to endangered species surveillance in harsh environments [1].

Most solutions in this domain are based on intelligent video analytics algorithms. The main goal is to assist or even replace the hard work of surveillance officers, who face the problem of monitoring several screens for long periods of time. Medical experiments demonstrate that human eyes tire after just a few minutes when attention must be divided among different screens. Ref. [3] shows that even with a single screen, over a period of 30 minutes a human subject can miss more than 80% of the activity in a video sequence. Therefore, an automatic system can be seen as a logical way to increase surveillance efficiency. However, the problem is complex due to the pseudo-random behavior of the subjects, which makes pattern definition a harder task.

The solution proposed in this paper falls into the category of automatic event detection in public video surveillance scenes. The problem analyzed, in the context of the smart city concept, is the detection of stray dog attacks in urban areas. The severity of this issue is demonstrated by the significant number of results (over 1,200,000 in May 2016) of a simple Google search on "stray dogs attack".

Our approach is based on the existing infrastructure of surveillance cameras that is common to most modern cities. It starts with basic feature extraction from video sequences, followed by shape classification for dog/human recognition. The last step involves scene interpretation using a classifier for dog group behavior based on various scene parameters such as movement parameters, trajectories, and group interaction.

The rest of the paper is organized as follows. The next section presents a brief overview of automatic video surveillance solutions for smart cities. In Section III we introduce our approach for the automatic detection of stray dog attacks. Section IV describes the foreground extraction steps and shape classification. High-level processing of the basic features is discussed in the next section. Finally, conclusions and some future work are presented in the last section.

II. AUTOMATIC VIDEO SURVEILLANCE IN SMART CITIES

Video surveillance systems are used in urban areas to capture and process data in order to predict, detect, and observe incidents in public spaces. They are widespread nowadays and support the implementation of the smart city concept.

Ref. [4] presents such a solution for city surveillance monitoring. The proposed method makes use of inexpensive video sensors positioned all over the city. Each one has a built-in video processing unit. The partially processed information is sent over the network to the control center for complex analysis.

In [5], a framework for extending video coverage is presented. It allows accessing private camera streams on demand. The event-based solution from [6] detects incidents and stores them in the cloud. When needed, the video and its metadata can be retrieved from the server archive.

Reference [7] aims to classify offensive team formations in American football plays via a linear SVM classifier on video data. Frame gradient information is used as input for foreground object extraction. The frame-to-frame mean square error is then calculated. The goal is to determine the scene motion level.

In [8], pedestrian group detection is achieved by unsupervised K-means feature clustering. Activity zones are automatically detected.
The Leader Clustering Algorithm [9] is applied on extracted track points to obtain activity zones. Foreground pixels are determined by means of a Gaussian mixture model algorithm. Then, a bag-of-words concept is used for action classification.
Another approach uses feature quantification by the Random Forest algorithm [10]. The action is labeled by a previously trained SVM classifier.

A more advanced approach uses deep learning algorithms for object detection, tracking, and recognition [11]. Despite their high computational resource requirements, such algorithms have proved suitable for big cities involving large-scale data analysis.
III. ATTACK PATTERN DETECTION
The proposed algorithm analyzes each frame captured from a surveillance video camera at HD resolution (1280x720, 25 fps). A block diagram of the processing chain is shown in Fig. 1. In the low-level processing stage, foreground objects are extracted by applying background subtraction to the current frame. The moving pixels representing objects are analyzed by a shape classification algorithm, resulting in two classes: humans or dogs. To generate trajectories, we track each object and store its position, velocity, and acceleration at each frame. The resulting data is stored in a feature vector Tj. Objects are tracked only for a period of N frames, since a feature vector has a fixed maximum length. Its content is periodically updated as new frames arrive.

The high-level processing stage has the role of identifying the type of activity taking place in the video. It consists of two steps: trajectory feature extraction and event detection. A minimal sketch of the per-object data this chain maintains is given below.
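The following is a minimal structural sketch of the data layout implied by this pipeline; the names FeatureSample and Trajectory, and the window length value, are illustrative assumptions, not taken from the paper.

#include <cstddef>
#include <deque>
#include <opencv2/core.hpp>

// Per-frame record for one tracked object: position, velocity, acceleration.
// This corresponds to the per-frame feature vector Tj described above.
struct FeatureSample {
    cv::Point2f position;
    cv::Point2f velocity;
    cv::Point2f acceleration;
};

// Fixed-length trajectory window: only the last N frames are kept, and the
// content is updated as each new frame arrives.
struct Trajectory {
    static constexpr std::size_t N = 25;   // assumed window length (1 s at 25 fps)
    std::deque<FeatureSample> samples;
    void push(const FeatureSample& s) {
        samples.push_back(s);
        if (samples.size() > N) samples.pop_front();
    }
};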
IV. LOW LEVEL PROCESSING

A. Foreground extraction

To extract the pixels of interest from the video data we apply several filtering algorithms.

Background subtraction, as in [12], is used to generate the foreground mask. It uses the mixture of Gaussians (MOG) model to classify the video pixels. The foreground pixels are extracted by applying the MOG mask to the original frame.

The results are groups of pixels that represent the moving objects. Because of background subtraction imperfections and image noise introduced by the camera, parasite pixels may appear in the foreground. They are single pixels or, in some cases, very small blobs. To eliminate such errors, we apply morphological opening on the mask. A rectangular 3x3 structuring element is used for both the erosion and dilation operations needed by the morphological opening algorithm.

Foreground objects are segmented using an approach similar to the one presented in [13]. The output of the process is a vector of contours. If an object is at a large distance from the video camera, its features, such as arms or legs, are very small. Therefore, such objects become harder to recognize. The noise present in the video also affects the already distorted object shapes. To overcome this, we propose blob filtering by size to disregard distant objects. The blob area is compared against a reference area using a threshold. If the area is smaller than the threshold value, the object is disregarded. A sketch of this extraction stage is given below.
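The sketch below assumes OpenCV's BackgroundSubtractorMOG2 as a stand-in for the MOG model of [12]; the minimum-area threshold is an illustrative tuning parameter.

#include <opencv2/opencv.hpp>
#include <vector>

// Low-level stage: MOG foreground mask, 3x3 morphological opening,
// contour extraction (cf. [13]), and filtering of small (distant) blobs.
std::vector<std::vector<cv::Point>> extractBlobs(
        cv::Ptr<cv::BackgroundSubtractorMOG2>& mog,
        const cv::Mat& frame, double minArea) {
    cv::Mat mask;
    mog->apply(frame, mask);                       // MOG foreground mask

    // Remove single-pixel noise and very small parasite blobs.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::morphologyEx(mask, mask, cv::MORPH_OPEN, kernel);

    // Segment foreground objects as a vector of contours.
    std::vector<std::vector<cv::Point>> contours, kept;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    // Disregard distant objects by comparing blob area to a threshold.
    for (const auto& c : contours)
        if (cv::contourArea(c) >= minArea) kept.push_back(c);
    return kept;
}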
B. Shape classification
In [14], the shape is decomposed into base and strand structures. If the number of strand structures is equal to that of the template and the base is the same as the template reference, a match is found. Such a strategy is very sensitive to blob occlusions. This drawback makes it unusable for our application, since the urban scenes captured by the cameras are usually crowded, with many occlusions.
Object skeletons can be obtained by applying a thinning algorithm [15] on the extracted blobs. We tried to use this feature in our application, but it proved to be inadequate. The skeleton is very sensitive to minor object shape changes: a small change of the object contour leads to a significantly different skeleton shape.
In this paper, we propose a robust shape classification algorithm based on the minimum bounding circle. For each blob we calculate its bounding circle (Fig. 2). The circle radii R1, R2 represent object size. By comparing bounding circle radii we can establish whether an object represents a human or a dog silhouette, since dog silhouettes are much smaller than human ones. This classification alone is not sufficient: silhouettes may have the same size and yet represent different objects.
To address this problem we use blob area information as an additional feature. If the area of the blob is subtracted from that of its bounding circle, the remaining area ΔS (A1 and A2, respectively, in Fig. 2) is obtained. Objects representing humans have a ΔS much larger than objects representing dog silhouettes. A sketch of the combined test is given below.
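The sketch below combines the radius gate and the ΔS comparison; both threshold values are assumed for illustration, since the paper does not specify them.

#include <opencv2/opencv.hpp>
#include <vector>

enum class Silhouette { Human, Dog };

// Classify a blob by its minimum bounding circle (size) and by the area
// DeltaS left between the blob and the circle (shape).
Silhouette classifySilhouette(const std::vector<cv::Point>& contour,
                              float dogMaxRadius = 40.0f,      // assumed, in pixels
                              double humanMinRatio = 0.6) {    // assumed DeltaS ratio
    cv::Point2f center;
    float radius = 0.0f;
    cv::minEnclosingCircle(contour, center, radius);           // bounding circle

    double circleArea = CV_PI * radius * radius;
    double deltaS     = circleArea - cv::contourArea(contour); // remaining area DeltaS

    // Small circle and little uncovered area: most likely a dog silhouette;
    // human silhouettes leave much more of the circle uncovered.
    if (radius < dogMaxRadius && deltaS / circleArea < humanMinRatio)
        return Silhouette::Dog;
    return Silhouette::Human;
}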
Since most dog attacks occur in places with no vehicles, we focused on discriminating only between humans and dogs. The test video sequences were selected to contain just these two object classes.

V. HIGH LEVEL PROCESSING

In attack scenarios, the group of dogs runs toward the victim. At some point, all of them are around the victim, attempting to get as close to it as possible. For detecting such situations, we propose trajectory and blob velocity analysis.
A. Trajectory analysis

By analyzing blob trajectories, we can anticipate a possible attack. In this scenario, the trajectories of the running dogs all converge to one small spot. We can say that if the trajectories are convergent, an attack is taking place. However, there are some exceptions to this rule. Although the situation is rare, in some cases not all dogs are attacking the victim. To treat such situations, we analyze the crowd and disregard the trajectories of isolated objects.

The trajectories may be classified into one of two possible classes: convergent or divergent. If the trajectories are convergent, there may be a possible attack. Divergent trajectories indicate that the group is splitting apart. A simple convergence test is sketched below.
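One possible minimal test, assuming convergence can be approximated by shrinking group dispersion over the tracked window; the 0.8 shrink factor is an illustrative threshold, not taken from the paper, and isolated objects are assumed to be removed upstream.

#include <cmath>
#include <opencv2/core.hpp>
#include <vector>

// Mean distance of the objects to their common centroid.
static double dispersion(const std::vector<cv::Point2f>& pts) {
    cv::Point2f c(0.0f, 0.0f);
    for (const auto& p : pts) c += p;
    c *= 1.0f / static_cast<float>(pts.size());
    double sum = 0.0;
    for (const auto& p : pts) sum += std::hypot(p.x - c.x, p.y - c.y);
    return sum / static_cast<double>(pts.size());
}

// first[i] and last[i] hold object i's position at the start and at the end
// of the N-frame window; convergent trajectories squeeze the group together.
bool trajectoriesConverge(const std::vector<cv::Point2f>& first,
                          const std::vector<cv::Point2f>& last) {
    return dispersion(last) < 0.8 * dispersion(first);
}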
B. Movement analysis

Velocity, in combination with trajectory analysis, can tell whether a dog attack is taking place.

The attack scenario can be divided into three phases:

• beginning (acceleration > 0)
• intermediary (acceleration = 0, velocity > 0)
• final (acceleration < 0)
In the beginning phase, each dog starts to run toward the victim. The acceleration of each dog increases drastically.

If the group is at a greater distance from the victim, at some point the acceleration becomes equal to zero. This represents the intermediary phase: each individual in the group has reached its maximum speed. Conversely, if the group is very close to the victim, the intermediary phase disappears; the dogs can never reach their maximum speed, because the distance is too short. In the final phase, while getting closer to the victim, they start to decelerate. The acceleration has negative values and the velocity decreases until it equals zero.

C. Trajectory feature extraction

Following the analysis in sections A and B, for scene classification we extract the features defined below. The first feature is the trajectory itself, consisting of N points.

To generate the trajectory we determine the object position at each frame. Image moments are used to compute the blob mass center P(x,y). It represents the current position of the object. For each object, a vector of successive points P is buffered. The acquired data represents the object trajectories. Additional features, the velocity and the acceleration, are easily extracted from the frames.

By measuring the elapsed time and the distance between two successive points, we calculate the object velocity at time t:

vt = (Pt – Pt-1) / dt .    (1)

Acceleration is computed using the velocities vt-1 and vt:

at = (vt – vt-1) / dt .    (2)

Then, we construct a feature vector Tj for each object position xj, yj:

Tj = [ xj, yj, vj, aj ] .    (3)

A trajectory feature corresponds to N image frames and is defined by:

T = [ T1, T2, ..., TN ]T .    (4)

It precisely describes the object's static and dynamic state over each group of frames, as transcribed in the sketch below.
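The following is a direct transcription of Eqs. (1)-(3), reusing the FeatureSample record from the sketch in Section III; dt is the frame period (1/25 s for the streams considered here).

#include <opencv2/opencv.hpp>
#include <vector>

// Blob mass center P(x, y) from image moments.
cv::Point2f massCenter(const std::vector<cv::Point>& contour) {
    cv::Moments m = cv::moments(contour);
    return cv::Point2f(static_cast<float>(m.m10 / m.m00),
                       static_cast<float>(m.m01 / m.m00));
}

// Finite-difference velocity and acceleration from three successive
// mass-center positions, assembled into the per-frame sample Tj.
FeatureSample makeSample(const cv::Point2f& pPrev2, const cv::Point2f& pPrev,
                         const cv::Point2f& pCurr, float dt) {
    cv::Point2f vPrev = (pPrev - pPrev2) * (1.0f / dt);  // v(t-1), Eq. (1)
    cv::Point2f vCurr = (pCurr - pPrev) * (1.0f / dt);   // v(t),   Eq. (1)
    cv::Point2f accel = (vCurr - vPrev) * (1.0f / dt);   // a(t),   Eq. (2)
    return {pCurr, vCurr, accel};                        // Tj,     Eq. (3)
}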
D. Event detection

We use an SVM algorithm to classify video frames as either dangerous or normal. It has been proved to obtain good performance in similar applications [7, 8]. The feature vector Tj is generated at each frame. The result is stored in a circular buffer keeping only the last N vectors. The SVM is called only when the buffer is full. All buffered data is passed to the classifier as input, as sketched below.
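The sketch below uses OpenCV's cv::ml::SVM as a stand-in for the paper's (unspecified) SVM implementation and reuses the FeatureSample record from Section III; the 0/1 class labels are assumed.

#include <cstddef>
#include <deque>
#include <opencv2/ml.hpp>
#include <opencv2/opencv.hpp>

// Push the newest sample, keep only the last N, and query the trained SVM
// only once the circular buffer is full.
bool frameIsDangerous(const cv::Ptr<cv::ml::SVM>& svm,
                      std::deque<FeatureSample>& buffer,
                      const FeatureSample& latest, std::size_t N) {
    buffer.push_back(latest);
    if (buffer.size() > N) buffer.pop_front();
    if (buffer.size() < N) return false;         // wait until the buffer is full

    // Flatten the N samples into one row: T = [T1, T2, ..., TN].
    cv::Mat row(1, static_cast<int>(N) * 6, CV_32F);
    int k = 0;
    for (const FeatureSample& s : buffer) {
        row.at<float>(k++) = s.position.x;       row.at<float>(k++) = s.position.y;
        row.at<float>(k++) = s.velocity.x;       row.at<float>(k++) = s.velocity.y;
        row.at<float>(k++) = s.acceleration.x;   row.at<float>(k++) = s.acceleration.y;
    }
    return svm->predict(row) > 0.5f;             // assumed labels: 1 = attack, 0 = normal
}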
The algorithm has fairly good performance and low computational complexity. It classifies the input well, even when it was previously trained with a small data set. The decision is based on support vectors, therefore the actual class distributions have little influence on the results. This property makes the SVM fast, with very good classification power.

VI. APPROACH VALIDATION

To validate the approach we implemented a prototype software. As performance is important, we chose C++ as the implementation language for low-level video processing. For basic image processing algorithms, we use the OpenCV library.
The prototype generates a feature vector as output. All data is saved in the Weka Attribute-Relation File Format (ARFF), as sketched below.
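A hypothetical sketch of the export step follows; the relation name, attribute names, and layout are illustrative, since the paper does not list the exact ARFF schema.

#include <fstream>

// Write a minimal ARFF header: numeric attributes for the trajectory
// features plus a nominal class label, followed by the data section marker.
void writeArffHeader(std::ofstream& out) {
    out << "@RELATION stray_dog_scenes\n\n"
        << "@ATTRIBUTE x NUMERIC\n"
        << "@ATTRIBUTE y NUMERIC\n"
        << "@ATTRIBUTE velocity NUMERIC\n"
        << "@ATTRIBUTE acceleration NUMERIC\n"
        << "@ATTRIBUTE class {attack,normal}\n\n"
        << "@DATA\n";
}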
We loaded the generated ARFF files into the Weka learning environment. The SVM classifier was trained with a set of 89 videos. These videos were a carefully selected subset of public Internet sources such as [18][19]. We avoided videos with a bad camera angle or taken in fog or rain conditions.
For testing, we collected data from another set of 30 videos. At each iteration a group of trajectories was generated, representing a total of 503 items, of which 230 represent trajectories taken from videos containing attack scenarios.
Under these conditions, the SVM successfully distinguished attack from normal scenarios. The confusion matrix for the 30 test videos is presented in Table I. It reveals that 99% of the trajectories are correctly classified and just 0.99% are incorrectly classified.
TABLE I. CONFUSION MATRIX FOR THE 30 TEST VIDEOS

                            Predicted class
                            attack    normal
  Actual class   attack        230         0
                 normal          5       268
VII. CONCLUSIONS
The approach presented in this paper provides an automatic way to detect dangerous behavior of groups of stray dogs. It is based on existing urban video camera infrastructure and makes use of video processing algorithms and an SVM classifier. Preliminary tests on over 130 selected videos show promising results, with a success rate of over 99%.
Further work needs to be performed to address problems encountered during tests, in order to cover movement parameter extraction in hard conditions such as difficult camera angles, fog, rain, or snow.
ACKNOWLEDGMENT
This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS – UEFISCDI, project number PN-II-RU-TE-2014-4-0731.

REFERENCES

[1] C. J. Cohen, D. Haanpaa, and J. P. Zott, "Machine vision algorithms for robust animal species identification," 2015 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), IEEE Press, 2015.
[2] S. T. Londei, J. Rousseau, F. Ducharme, A. St-Arnaud, J. Meunier, J. Saint-Arnaud, and F. Giroux, "An intelligent videomonitoring system for fall detection at home: perceptions of elderly people," Journal of Telemedicine and Telecare, vol. 15, no. 8, pp. 383–390, 2009.
[3] D. Elliott, "Intelligent video solution: a definition," Security Magazine, vol. 47, no. 6, pp. 46–48, June 2010.
[4] J. Fernández, L. Calavia, C. Baladrón, J. M. Aguiar, B. Carro, A. Sánchez-Esguevillas, J. A. Alonso-López, and Z. Smilansky, "An intelligent surveillance platform for large metropolitan areas with dense sensor deployment," 2013.
[5] S. O. Ajiboye, P. Birch, C. Chatwin, and R. Young, "Hierarchical video surveillance architecture, a chassis for video big data analytics and exploration," 2015.
[6] D. Chattopadhyay, R. Dasgupta, R. Banerjee, and A. Chakraborty, "Event driven video surveillance system using city cloud, a solution compliant with Sensor Web Enablement architecture," 2013.
[7] I. Atmosukarto, B. Ghanem, S. Ahuja, K. Muthuswamy, and N. Ahuja, "Automatic recognition of offensive team formation in American football plays," 2013.
[8] M. Andersson, L. Patino, G. J. Burghouts, A. Flizikowski, M. Evans, D. Gustafsson, H. Petersson, K. Schutte, and J. Ferryman, "Activity recognition and localization on a truck parking lot," 2013.
[9] R. Duda, P. Hart, and D. Stork, Pattern Classification, Wiley, 2012.
[10] F. Moosmann, B. Triggs, and F. Jurie, "Randomized clustering forests for building fast and discriminative visual vocabularies," NIPS, 2006.
[11] L. Wang and D. Sng, "Deep learning algorithms with applications to video analytics for a smart city: a survey," 2015.
[12] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," 2004.
[13] S. Suzuki and K. Abe, "Topological structural analysis of digitized binary images by border following," CVGIP, vol. 30, no. 1, pp. 32–46, 1985.
[14] A. Temlyakov, B. C. Munsell, J. W. Waggoner, and S. Wang, "Two perceptually motivated strategies for shape classification."
[15] T. Y. Zhang and C. Y. Suen, "A fast parallel algorithm for thinning digital patterns."
[16] L. Calavia, C. Baladrón, J. M. Aguiar, B. Carro, and A. Sánchez-Esguevillas, "A semantic autonomous video surveillance system for dense camera networks in smart cities," 2012.
[17] R. Chellappa, A. Veeraraghavan, and G. Aggarwal, "Pattern recognition in video," 2005.
[18] Norsk Kennel Klub (NKK), "European Dog Show 2015 - Day 3," online video clip, YouTube, Sep 6, 2015.
[19] Canal de PES10vsFIFA10, "Mujer atacada por manada de perros callejeros," online video clip, YouTube, Jul 20, 2010.