Muhammad 2018
PII: S0167-8655(18)30384-2
DOI: https://doi.org/10.1016/j.patrec.2018.08.003
Reference: PATREC 7267
Please cite this article as: Khan Muhammad , Tanveer Hussain , Sung Wook Baik , Efficient CNN
based summarization of surveillance videos for resource-constrained devices, Pattern Recognition
Letters (2018), doi: https://doi.org/10.1016/j.patrec.2018.08.003
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Highlights
We propose an energy-efficient CNN-based framework for summarization of surveillance videos
Our system uses a novel shot segmentation scheme based on deep features
Keyframe selection is based on image memorability, providing a diverse and interesting summary
Efficient CNN based summarization of surveillance videos for resource-constrained devices

Khan Muhammad, Tanveer Hussain, Sung Wook Baik*
Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul-143-747, Republic of Korea
ABSTRACT
The widespread usage of surveillance cameras in smart cities has resulted in a gigantic volume of video data whose indexing, retrieval, and management is a challenging issue. Video summarization aims to detect important visual data from the surveillance stream and can help in efficient indexing and retrieval of required data from huge surveillance datasets. In this research article, we propose an efficient convolutional neural network (CNN) based summarization method for surveillance videos of resource-constrained devices. Shot segmentation is considered the backbone of video summarization methods and it affects the overall quality of the generated summary. Thus, we propose an effective shot segmentation method using deep features. Furthermore, our framework maintains the interestingness of the generated summary using image memorability and entropy. Within each shot, the frame with the highest memorability and entropy score is considered a keyframe. The proposed method is evaluated on two benchmark video datasets and the results are encouraging compared to state-of-the-art video summarization methods.
applications such as disaster management [1, 2], e-health [3], surveillance [4, 5], ubiquitous communication [6, 7], and security systems [8-14]. Despite their maturity up to some extent, their real implementation is a challenge due to the increased processing and transmission power needed for visual sensors to process and transmit the video data. This also affects the decision-making process, which is based on observation of the surveillance environment. Recently, intelligent surveillance mechanisms have been presented, attempting to reduce the processing and bandwidth consumption. For instance, Irfan et al. [15] used salient motion detection for image prioritization in multi-camera surveillance networks with reduced processing time and reliable dissemination of important contents to the base station via a sink node. Bradai et al. [16] presented "EMCOS", which is a computationally friendly approach for streaming multimedia data in cognitive radio networks. Considering the increasing volume of video data, Almeida et al. [17] presented an online video summarization (VS) method in the compressed domain, which was later extended by Wang et al. [18] for video synopsis. Another VS method for online applications using the compression domain is presented in [19]. Mundur et al. [20] used Delaunay triangulation for keyframe-based summarization. Furini et al. [23] proposed an approach known as "still and moving" (STIMO) where storyboards can be generated on-the-fly. De Avila et al. [24] focused on static VS using color features and k-means clustering as well as nomination of a novel summary evaluation approach. Mahmoud et al. [25] enhanced the VS pipeline via a density-assisted spatial clustering approach. Xu et al. [26] used clustering along with semantic, emotional, and shoot-quality clues for summarization of user-generated videos.

Besides compression domain and clustering, numerous actions, activities, and events based summarization methods have also been proposed in recent years. For instance, Meghdadi et al. [27] explored surveillance videos using summarization of their shots. A surveillance VS method using abnormality detection is presented in [28]. Mademlis et al. [29] investigated an activity-based VS approach. Other related works using event detection for summarization are explored in [30] and [31]. Continuing the usage of recognition, Wang et al. [32] investigated web videos for summarization using an event-driven approach with tags and recognition of key shots. Thomas et al. [33] emphasized detection of perceptually important events for VS. Rabbouch et al.
[34] used a cluster analysis based VS approach with application to automatically counting and recognizing vehicles in surveillance video streams. Zhang et al. [35] based their VS on spatio-temporal analysis for detection of object motion, from which an attention curve is generated and keyframes are extracted. Besides clustering, graph-based VS methods have also been exploited. For instance, Kuanar et al. [36] presented a VS approach with intelligent utilization of bipartite graph matching for modelling inter-view dependencies in multi-view videos and an optimum-path forest scheme for clustering. In continuation with this method, [37] used a graph-assisted hierarchical method for VS.

In an attempt to extend the usefulness of VS, Bagheri et al. [38] proposed a method for temporal mapping of surveillance videos for indexing and searching. Kannan et al. [39] enriched the summarization pipeline with personalization for movies data. Varini et al. [40] explored personalized VS for egocentric videos with consideration of user preferences. Chen et al. [41] focused on resource allocation for personalized VS. Hamza et al. [42] used both personalization and privacy preservation for medical video summarization. Sparse coding has also been used in the summarization pipeline. For instance, Mei et al. [43] used a minimum sparse reconstruction strategy for summarization. Sparse coding with context-awareness for surveillance videos is presented by [44].

The authors of [45] used a four-threshold approach for shot segmentation, based on which an anchor person is detected in news videos. Priya et al. [46] used shot segmentation for indexing and retrieval of ecological videos. Li et al. [47] used sparse coding for shot segmentation, which in turn is used for summarization. The mentioned methods are domain specific and their performance for surveillance videos is limited. Considering this, we investigated deep features for shot segmentation to intelligently divide the video stream into meaningful shots. Our approach extracts deep features from two consecutive frames to determine whether the underlying frames belong to the same or different shots. Features are extracted from the fully connected layer (FC7) of a CNN model trained using the MobileNet architecture (version 2) on the ImageNet dataset. This model is originally trained for classification but has learned globally discriminative features, which can be used for other purposes. After our analysis, we found that the deep features learned at the higher fully connected layer (FC7) of this model are suitable for shot segmentation; thus, we used it in our framework.

After features extraction, the extracted deep features are compared using the Euclidean distance between them, whose formula is given in Equation 1:

ED(A, B) = sqrt( Σ_{i=1}^{n} (A_i - B_i)^2 )    (1)

where A and B are the deep feature vectors of two consecutive frames and n is the feature dimensionality.
recommendation for future research.

2. The proposed framework

Motivated from the strength of CNNs for various applications, we propose a CNN-based framework for summarization of surveillance videos.

Sample keyframes with their memorability scores are shown in Fig. 2. The frames having more objects in the center received higher memorability scores compared to frames with either few objects or objects at the sides of the frame.

Fig. 2. Sample frames with memorability scores predicted by the trained image memorability model

Selecting keyframes using only the memorability score may result in a summary that does not adequately represent the entire video. Memorability does not preserve the heterogeneity of keyframes, and thus we added an entropy measure to our framework to maintain the diversity of our summary. The entropy of an image shows the amount of information inside it, and its score is directly proportional to that information. Thus, the features extraction phase includes memorability prediction and computing the entropy score.

2.3 Keyframe extraction

Keyframes extraction is the final step of the VS pipeline, for which an attention curve is constructed using the computed features. In our case, we consider image memorability and entropy as the metrics to generate an attention curve for each shot and eliminate the redundant frames. This step uses the color histogram difference to discard frames of the same shots that are selected as keyframes. The advantage of this step is explained through an example in Section 3.

3. Experimental results and discussion

The proposed method is tested on two different video datasets and the results are compared with state-of-the-art methods using precision, recall, and F-measure.

Precision: It corresponds to the possibility of correct extraction over all extracted keyframes.

Precision = N_matched / N_extracted    (2)

Herein "Nmatched" represents the number of keyframes matched between the ground truth summary and the summary generated by our method, and "Nextracted" represents the number of extracted keyframes in a summary.

Recall: It corresponds to the possibility of extraction over all ground truth keyframes.

Recall = N_matched / N_groundtruth    (3)

Herein "Ngroundtruth" represents the number of keyframes in the ground truth summary. The F-measure is calculated using Eq. 4:

F-measure = 2 * (Precision * Recall) / (Precision + Recall)    (4)

A set of representative videos is shown in Table 1 along with their F-measure scores. From Table 1, it can be observed that the F-measure of the proposed system is much higher than that of every single existing technique on the same video. The last row in Table 1 shows the average F-measure for each method under observation and shows the superiority of our scheme.

The second dataset contains videos belonging to different genres, ranging from 1 to 10 minutes. The results of our method on this dataset are compared with 5 user-generated summaries, VSUMM [24], and the method of Fei et al. [54]. The main difference between the proposed method and [54] is the way shots are segmented. They have used perceptual hashing for shot segmentation, whereas we have employed deep features, which are much more powerful than traditional methods. The details of shot segmentation are already given in Section 2.1.
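Equations 2-4 translate directly into code. The sketch below is illustrative only; the `match` predicate, which decides whether an extracted keyframe corresponds to a ground-truth keyframe, is an assumption, since the matching criterion is not detailed in this excerpt.

```python
def summary_metrics(extracted, ground_truth, match):
    """Precision, recall, and F-measure of a generated summary (Eqs. 2-4).

    extracted:    keyframes selected by the summarizer.
    ground_truth: keyframes annotated by users.
    match:        predicate deciding whether two keyframes correspond
                  (hypothetical; evaluations typically use visual similarity).
    """
    # N_matched: ground-truth keyframes with a counterpart in the summary.
    n_matched = sum(1 for g in ground_truth if any(match(g, e) for e in extracted))
    precision = n_matched / len(extracted) if extracted else 0.0
    recall = n_matched / len(ground_truth) if ground_truth else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

For example, with frame indices as keyframes and a tolerance-based match, `summary_metrics([10, 50, 90], [10, 52, 200], lambda a, b: abs(a - b) <= 2)` matches two of three ground-truth keyframes, giving precision, recall, and F-measure of 2/3 each.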
Table 2: Mean F-measure of each category sample video of the proposed method with VSUMM, [54], and user summaries

Category            Video #   Users (Worst / Average / Best)   VSUMM [24]   [54]   Proposed Method
Cartoons            V11       0.67 / 0.74 / 0.77               0.68         0.82   0.82
Cartoons            V12       0.65 / 0.71 / 0.76               0.65         0.67   0.73
News                V88       0.68 / 0.76 / 0.84               0.67         0.75   0.78
Home                V108      0.57 / 0.67 / 0.79               0.52         0.73   0.73
Average F-measure             0.64 / 0.72 / 0.79               0.63         0.74   0.76
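The keyframe selection and post-processing of Section 2.3 can be sketched as below. This is a minimal illustration, not the authors' implementation: the memorability scores are assumed to come from an external memorability model, the simple additive combination of memorability and entropy is an assumption, and the histogram-difference threshold for dropping near-duplicate keyframes is left to the caller.

```python
import numpy as np

def entropy_score(gray: np.ndarray, bins: int = 256) -> float:
    # Shannon entropy of the intensity histogram: H = -sum(p * log2(p)).
    hist, _ = np.histogram(gray, bins=bins, range=(0, bins))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def pick_keyframe(frames, memorability_scores) -> int:
    """Index of the frame with the highest combined memorability + entropy
    score within one shot. Memorability scores are assumed to come from an
    external model; the additive combination is an illustrative choice."""
    combined = [m + entropy_score(f) for f, m in zip(frames, memorability_scores)]
    return int(np.argmax(combined))

def hist_difference(a: np.ndarray, b: np.ndarray, bins: int = 256) -> float:
    # Normalized L1 histogram difference, used to discard a keyframe that is
    # too similar to one already kept (the post-processing step of Fig. 5).
    ha, _ = np.histogram(a, bins=bins, range=(0, bins))
    hb, _ = np.histogram(b, bins=bins, range=(0, bins))
    return float(np.abs(ha - hb).sum()) / a.size
```

A keyframe whose `hist_difference` to every already-selected keyframe exceeds a chosen threshold would be kept; otherwise it is treated as redundant.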
Fig. 3. Video summary of our method, users, VSUMM [24], and [54] for video “v11”
Fig. 4. Video summary of our method, users, VSUMM [24], and [54] for video “v88”
Fig. 5. Effect of post-processing on the video summary: (a) original summary generated by our method; (b) summary after removing redundancy through histogram difference

entropy score prediction, and summary generation. Shot segmentation is the most critical step of video summarization; therefore, we performed it through deep features to intelligently divide the surveillance video into meaningful shots. Next, the memorability of each frame within the shot is predicted using a fine-tuned image memorability prediction model, followed by the entropy measure. Finally, the frame with the highest memorability and entropy score within each shot is picked to constitute the final summary. The obtained results based on two benchmark datasets show the significance of this method for adaptability in resource-constrained surveillance networks for summarization.

The current method can summarize a video stream at 18 fps. Further research is needed to improve this processing rate and to combine it with spectrum sensing technologies for smarter surveillance. We also plan to apply and extend this work to other resource-constrained environments such as wireless sensor networks [55, 56] and the internet of things [57, 58].

Acknowledgment

References

"Detection in Surveillance Videos," IEEE Access, vol. 6, pp. 18174-18183, 2018.
[3] K. Wang, Y. Shao, L. Shu, C. Zhu, and Y. Zhang, "Mobile big data fault-tolerant processing for ehealth networks," IEEE Network, vol. 30, pp. 36-42, 2016.
[4] X. Chang, Z. Ma, Y. Yang, Z. Zeng, and A. G. Hauptmann, "Bi-level semantic representation analysis for multimedia event detection," IEEE Transactions on Cybernetics, vol. 47, pp. 1180-1197, 2017.
[5] X. Chang, Y.-L. Yu, Y. Yang, and E. P. Xing, "Semantic pooling for complex event analysis in untrimmed videos," IEEE Transactions on Pattern Analysis and Machine Intelligence.
"Framework for IoT systems using Probabilistic Image Encryption," IEEE Transactions on Industrial Informatics, 2018.
[10] A. B. Mabrouk and E. Zagrouba, "Abnormal behavior recognition for intelligent video surveillance systems: A review," Expert Systems with Applications, 2017.
[11] X. Chang and Y. Yang, "Semisupervised feature analysis by mining correlations among multiple tasks," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, pp. 2294-2305, 2017.
[12] X. Chang, Z. Ma, M. Lin, Y. Yang, and A. G. Hauptmann, "Feature interaction augmented sparse learning for fast kinect motion detection," IEEE Transactions on Image Processing, vol. 26, pp. 3911-3920, 2017.
[13] K. Muhammad, M. Sajjad, and S. W. Baik, "Dual-Level Security based Cyclic18 Steganographic Method and its Application for Secure Transmission of Keyframes during Wireless Capsule Endoscopy," Journal of Medical Systems, vol. 40, p. 114, 2016.
[14] M. Sajjad, S. Khan, T. Hussain, K. Muhammad, A. K. Sangaiah, A. Castiglione, et al., "CNN-based anti-spoofing two-tier multi-factor authentication system," Pattern Recognition Letters, 2018.
[15] I. Mehmood, M. Sajjad, W. Ejaz, and S. W. Baik, "Saliency-directed prioritization of visual data in wireless surveillance networks," Information Fusion, vol. 24, pp. 16-30, 2015.
[16] A. Bradai, K. Singh, A. Rachedi, and T. Ahmed, "EMCOS: Energy-efficient mechanism for multimedia streaming over cognitive radio sensor networks," Pervasive and Mobile Computing, vol. 22, pp. 16-32, 2015.
[17] J. Almeida, N. J. Leite, and R. d. S. Torres, "Online video summarization on compressed domain," Journal of Visual Communication and Image Representation, vol. 24, pp. 729-738, 2013.
[18] S.-z. Wang, Z.-y. Wang, and R.-m. Hu, "Surveillance video synopsis in the compressed domain for fast video browsing," Journal of Visual Communication and Image Representation, vol. 24, pp. 1431-1442, 2013.
[19] J. Almeida, N. J. Leite, and R. d. S. Torres, "Vison: Video summarization for online applications," Pattern Recognition Letters, vol. 33, pp. 397-409, 2012.
[20] P. Mundur, Y. Rao, and Y. Yesha, "Keyframe-based video summarization using Delaunay clustering," International Journal on Digital Libraries, vol. 6, pp. 219-232, 2006.
[21] M. S. Drew and J. Au, "Clustering of compressed illumination-invariant chromaticity signatures for efficient video summarization," Image and Vision Computing, vol. 21, pp. 705-716, 2003.
[22] D. P. Papadopoulos, V. S. Kalogeiton, S. A. Chatzichristofis, and N. Papamarkos, "Automatic summarization and annotation of videos with lack of metadata information," Expert Systems with Applications, vol. 40, pp. 5765-5778, 2013.
[23] M. Furini, F. Geraci, M. Montangero, and M. Pellegrini, "STIMO: STIll and MOving video storyboard for the web scenario," Multimedia Tools and Applications, vol. 46, pp. 47-69, 2010.
[24] S. E. F. De Avila, A. P. B. Lopes, A. da Luz, and A. de Albuquerque Araújo, "VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method," Pattern Recognition Letters, vol. 32, pp. 56-68, 2011.
[25] K. M. Mahmoud, M. A. Ismail, and N. M. Ghanem, "Vscan: an enhanced video summarization using density-based spatial clustering," in International conference on image analysis and processing, 2013, pp. 733-742.
[26] B. Xu, X. Wang, and Y.-G. Jiang, "Fast Summarization of User-Generated Videos: Exploiting Semantic, Emotional, and Quality Clues," IEEE MultiMedia.
[32] M. Wang, R. Hong, G. Li, Z.-J. Zha, S. Yan, and T.-S. Chua, "Event driven web video summarization by tag localization and key-shot identification," IEEE Transactions on Multimedia, vol. 14, pp. 975-985, 2012.
[33] S. S. Thomas, S. Gupta, and V. K. Subramanian, "Perceptual Video Summarization—A New Framework for Video Summarization," IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, pp. 1790-1802, 2017.
[34] H. Rabbouch, F. Saâdaoui, and R. Mraihi, "Unsupervised video summarization using cluster analysis for automatic vehicles counting and recognizing," Neurocomputing, vol. 260, pp. 157-173, 2017.
[35] Y. Zhang, R. Tao, and Y. Wang, "Motion-state-adaptive video summarization via spatiotemporal analysis," IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, pp. 1340-1352, 2017.
[36] S. K. Kuanar, K. B. Ranga, and A. S. Chowdhury, "Multi-view video summarization using bipartite matching constrained optimum-path forest clustering," IEEE Transactions on Multimedia, vol. 17, pp. 1166-1173, 2015.
[37] L. dos Santos Belo, C. A. Caetano Jr, Z. K. G. do Patrocínio Jr, and S. J. F. Guimarães, "Summarizing video sequence using a graph-based hierarchical approach," Neurocomputing, vol. 173, pp. 1001-1016, 2016.
[38] S. Bagheri, J. Y. Zheng, and S. Sinha, "Temporal mapping of surveillance video for indexing and summarization," Computer Vision and Image Understanding, vol. 144, pp. 237-257, 2016.
[39] R. Kannan, G. Ghinea, and S. Swaminathan, "What do you wish to see? A summarization system for movies based on user preferences," Information Processing & Management, vol. 51, pp. 286-305, 2015.
[40] P. Varini, G. Serra, and R. Cucchiara, "Personalized Egocentric Video Summarization of Cultural Tour on User Preferences Input," IEEE Transactions on Multimedia, vol. 19, pp. 2832-2845, 2017.
[41] F. Chen, C. De Vleeschouwer, and A. Cavallaro, "Resource allocation for personalized video summarization," IEEE Transactions on Multimedia, vol. 16, pp. 455-469, 2014.
[42] R. Hamza, K. Muhammad, Z. Lv, and F. Titouna,
"Efficient visual attention driven framework for key frames extraction from hysteroscopy videos," Biomedical Signal Processing and Control, vol. 33, pp. 161-168, 2017.
[53] J. D. Mitchell-Jackson, "Energy needs in an internet economy: A closer look at data centers," M.S. thesis, Energy and Resources Group, University of California at Berkeley, 2001.
[54] M. Fei, W. Jiang, and W. Mao, "Memorable and rich video summarization," Journal of Visual Communication and Image Representation, vol. 42, pp. 207-217, 2017.
[55] G. Han, L. Liu, S. Chan, R. Yu, and Y. Yang,
"HySense: A hybrid mobile crowdsensing framework
for sensing opportunities compensation under dynamic
coverage constraint," IEEE Communications Magazine,
vol. 55, pp. 93-99, 2017.
[56] Y. Wu, G. Min, K. Li, and B. Javadi, "Modeling and
analysis of communication networks in multicluster
systems under spatio-temporal bursty traffic," IEEE
Transactions on Parallel and Distributed Systems, vol.
23, pp. 902-912, 2012.
[57] G. Han, L. Zhou, H. Wang, W. Zhang, and S. Chan, "A