
From Lab to Field: Real-World Evaluation of an AI-Driven Smart Video Solution to Enhance Community Safety

Shanle Yao1*, Babak Rahimi Ardabili2*, Armin Danesh Pazho1, Ghazal Alinezhad Noghre1, Christopher Neff1, Lauren Bourque1, Hamed Tabkhi1
1 Department of Electrical and Computer Engineering, University of North Carolina at Charlotte
2 Public Policy Program, University of North Carolina at Charlotte
9201 University City Blvd, Charlotte, NC 28223 USA
syao@charlotte.edu, brahimia@charlotte.edu
* These authors contributed equally.

arXiv:2312.02078v2 [cs.CV] 4 Sep 2024

Abstract

This article adopts and evaluates an AI-enabled Smart Video Solution (SVS) designed to enhance safety in the real world. The system integrates with existing infrastructure camera networks, leveraging recent advancements in AI for easy adoption. Prioritizing privacy and ethical standards, pose-based data is used for downstream AI tasks such as anomaly detection. Cloud-based infrastructure and a mobile app are deployed, enabling real-time alerts within communities. The SVS employs innovative data representation and visualization techniques, such as the Occupancy Indicator, Statistical Anomaly Detection, Bird's Eye View, and Heatmaps, to understand pedestrian behaviors and enhance public safety. Evaluation of the SVS demonstrates its capacity to convert complex computer vision outputs into actionable insights for stakeholders, community partners, law enforcement, urban planners, and social scientists. This article presents a comprehensive real-world deployment and evaluation of the SVS, implemented in a community college environment across 16 cameras. The system integrates AI-driven visual processing, supported by statistical analysis, database management, cloud communication, and user notifications. Additionally, the article evaluates the end-to-end latency from the moment an AI algorithm detects anomalous behavior in real time at the camera level to the time stakeholders receive a notification. The results demonstrate the system's robustness, effectively managing 16 CCTV cameras with a consistent throughput of 16.5 frames per second (FPS) over a 21-hour period and an average end-to-end latency of 26.76 seconds between anomaly detection and alert issuance.

Introduction

Closed-Circuit Television (CCTV) cameras are widely deployed in public spaces, traditionally serving as passive surveillance tools aimed primarily at security and monitoring (Ashby 2017). However, these systems often underutilize the potential of the visual data they capture, missing opportunities to contribute to broader civic applications such as urban planning, public health, and resource management (Angelidou et al. 2018). By transforming CCTV networks into Smart Video Solutions (SVS), we can unlock new functionalities that go beyond mere surveillance to enhance community living and environments (Javed et al. 2022).

SVS leverage advancements in AI and machine learning to analyze video data in real time, providing actionable insights that can improve city infrastructure and community services (Ardabili et al. 2023b). For example, the results can be used to inform urban planners about pedestrian flows for better traffic management (Rashvand et al. 2023), help public health officials monitor social distancing (Usmani et al. 2023), and assist in the efficient allocation of community resources (Dechouniotis et al. 2020). Such applications demonstrate the potential of SVS to transform video data into valuable information for civic improvements.

While traditional research has focused on the development of these systems in controlled environments (Pazho et al. 2023a), it is crucial to transition and evaluate SVS in real-world settings to address challenges such as latency, scalability, and privacy concerns (Ardabili et al. 2023a; Xu et al. 2018). Moving from lab to field ensures that these systems are robust and adaptable, capable of handling the complexities of dynamic urban environments and diverse community needs (Noghre et al. 2023).

The architecture of the SVS is illustrated in Figure 1, which represents the end-to-end design (Ardabili et al. 2023b). It begins with the environment, where video data is captured by existing CCTV cameras. The data is then processed by the AI module, where computer vision techniques are applied for human behavior analysis. The processed information is sent to the server module for further data analysis and aggregation (Pazho et al. 2023a). To ensure privacy, only metadata is stored in the cloud module, which also hosts a mobile application. This setup allows end users to receive notifications and access insights on their devices, fostering real-time engagement and decision-making.

This article introduces the deployment and evaluation of an AI-enabled SVS designed to enhance safety and civic functionalities in community spaces such as educational areas, parking lots, and smart city applications. By employing privacy-preserving techniques, the system utilizes pose-based data for human behavior analysis, aligning with ethical standards. The SVS applies innovative data visualization tools, including Occupancy Indicators, Statistical Anomaly Detection, Bird's Eye View, and Heatmaps, to provide insights into pedestrian behaviors and resource allocation, thereby aiding urban planning and enhancing public safety.

Figure 1: The end-to-end system architecture coupled with the notification data flow. S1 represents the first scene that detects a behavior anomaly. The yellow line shows the notification data flow to the notification service. Sn shows the nth scene where a suspicious object has been detected. The red line shows the notification data flow when detecting object anomalies.

The SVS has been implemented in a community college to evaluate its real-world effectiveness. Our findings demonstrate the system's capability to manage 16 CCTV cameras with a throughput of 16.5 frames per second (FPS) over a 21-hour period, maintaining an average latency of 26.76 seconds for detecting behavioral anomalies and notifying users.

In summary, the contributions of this article are:

• Adopting an AI-enabled SVS with existing CCTV infrastructure in the real world, emphasizing civic applications beyond traditional security measures.
• Presenting innovative data representation techniques that enhance urban planning and resource allocation by providing insights into pedestrian behaviors.
• Providing a comprehensive real-world evaluation of the SVS, demonstrating its effectiveness, scalability, and privacy-preserving capabilities in community settings.

Related Works

Recent advancements in AI-driven video systems have leveraged state-of-the-art algorithms and system designs to perform high-level AI tasks using computer vision. Ancilia, for instance, employs a multi-stage computer vision pipeline aimed at enhancing public safety. While we have implemented this system as the baseline for our deployment, it is important to note that the original study reports results in a controlled environment (Pazho et al. 2023a).

The deployment of SVS in real-world settings is essential to validate their effectiveness and address challenges such as latency, scalability, and privacy. While traditional research has focused on laboratory-based studies for real-time object tracking and anomaly detection (Pazho et al. 2023b; S. et al. 2021), there is a growing recognition of the need to evaluate these systems in practical environments. Real-world evaluation ensures that this technology can adapt to dynamic urban settings and meet diverse community needs.

Recent advancements in machine learning have enabled smart video technology to perform complex data analysis, offering insights for civic applications. For instance, Pramanik et al. demonstrated lightweight models for edge devices to reduce latency in urban environments (Pramanik, Sarkar, and Maiti 2021). Singh et al. introduced an AI-based video technology for fall detection, highlighting the potential for applications beyond traditional safety measures (Singh et al. 2023).

In 2023, the E2E-VSDL method by Gandapur utilized BiGRU and CNN for anomaly detection, achieving high accuracy (Gandapur 2022). Similarly, TAC-Net's deep learning approach excelled in addressing anomaly scenarios (Huang et al. 2021).

Integrating these systems into smart city applications bridges the gap between theoretical research and practical implementation. Alshammari et al. developed an SVS using surveillance cameras in real-world settings (Alshammari and Rawat 2019), and RV College of Engineering Bengaluru improved object detector accuracy (Franklin and Dabbagol 2020). These efforts highlight the necessity of robust testbed support for evaluating SVS, addressing scalability and privacy challenges (Ma et al. 2019).

The combination of machine learning and IoT devices has transformed urban planning, providing insights for managing congestion and enhancing civic participation. Ardabili et al. introduced an end-to-end system design that integrates Ancilia with an end-user device for direct communication. We have adopted this comprehensive system design for our study's deployment, allowing us to assess its effectiveness in practical, real-world environments (Ardabili et al. 2023b). Mahdavinejad et al. emphasized machine learning's role in forecasting urban congestion (Mahdavinejad et al. 2018), while Zanella et al. advocated for accessible urban IoT data (Zanella et al. 2014). These studies illustrate how data-driven approaches contribute to civic applications.

Despite progress in SVS research, comprehensive testbed support for evaluating real-world system performance remains limited. Our study addresses this gap by deploying a state-of-the-art SVS in a community college, demonstrating its capabilities and potential for enhancing civic applications in public settings.
Figure 2: End-to-end detailed system. C0 represents the camera ID; for each camera, one AI Module N0, comprising a multi-model AI vision pipeline, is assigned. All AI Modules send processed data to a single Server Module database, and the Server Module re-identifies human track IDs based on the feature-extractor data. The statistical analyzer processes all data stored in the database across all cameras and communicates the results to the Cloud Module. Cloud-native services are utilized to host the end users' applications.

Figure 3: Locations of cameras on the campus. A maximum of 16 cameras are used.

Software System Features

The AI-based real-time video solution seamlessly integrates with existing CCTV infrastructures, creating a Physical-Cyber-Physical (PCP) system that delivers actionable information to end users. The system consists of four key components: AI Modules, Server Module, Cloud Module, and End-User Devices, as illustrated in Figure 2. The design of the AI and server modules was inspired by prior work (Pazho et al. 2023a), while the cloud module and user device components were adapted from another study (Ardabili et al. 2023b).

AI Module/Modules is a multi-stage pipeline optimized for real-time computer vision tasks. It processes image data in batches of 30 frames, where an object detector identifies objects and flags anomalies (Pazho et al. 2023a). A tracking algorithm then creates tracklets, and a pose estimator extracts 2D skeletal data for individuals. Object anomaly detection and behavioral anomaly detection (Noghre, Pazho, and Tabkhi 2024; Pazho et al. 2024) are performed within this module, with alerts sent to end-user devices for real-time response (Pazho et al. 2023b; Yao et al. 2024).

In our implementation, YOLOv8 (Jocher, Chaurasia, and Qiu 2023), ByteTrack (Zhang et al. 2022), HRNet (Sun et al. 2019), GEPC (Markovitz et al. 2020a), and OSNet (Zhou et al. 2019) are used for further evaluation. The detailed decision-making process is provided in the supplementary materials for replication and further investigation.
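To make the stage ordering concrete, the following minimal sketch wires up the batch flow described above. The stub classes and function names are illustrative stand-ins, not the authors' code, for YOLOv8 (detection), ByteTrack (tracking), HRNet (pose), and GEPC (behavioral scoring).

```python
# Minimal sketch of the per-camera AI Module pipeline, assuming stub stages
# in place of the real models named in the text.
from dataclasses import dataclass
from typing import List

BATCH_SIZE = 30  # frames per batch, as described above

@dataclass
class Detection:
    bbox: tuple          # (x, y, w, h)
    class_id: int
    track_id: int = -1

def detect(frames: List) -> List[Detection]:
    """Stand-in for an object detector (e.g., YOLOv8)."""
    return [Detection(bbox=(0, 0, 10, 20), class_id=0)]

def track(dets: List[Detection]) -> List[Detection]:
    """Stand-in for a tracker (e.g., ByteTrack): assigns local track IDs."""
    for i, d in enumerate(dets):
        d.track_id = i
    return dets

def estimate_pose(frames, dets):
    """Stand-in for a pose estimator (e.g., HRNet): 2D keypoints per person."""
    return {d.track_id: [(0.0, 0.0)] * 17 for d in dets}

def behavior_score(poses) -> dict:
    """Stand-in for pose-based behavioral anomaly detection (e.g., GEPC)."""
    return {tid: 0.0 for tid in poses}

def process_batch(frames: List) -> dict:
    """One AI Module step: detect -> track -> pose -> anomaly scores."""
    assert len(frames) == BATCH_SIZE
    dets = track(detect(frames))
    poses = estimate_pose(frames, dets)
    scores = behavior_score(poses)
    # Only metadata (boxes, poses, scores) is forwarded to the Server Module;
    # raw frames never leave this module, which preserves privacy.
    return {"detections": dets, "poses": poses, "anomaly_scores": scores}

print(process_batch([None] * BATCH_SIZE)["anomaly_scores"])
```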
Server Module centralizes metadata and historical data from the AI modules, stored in a MySQL database as outlined in Table 1. It handles global tracking and statistical analysis, using cosine similarity to re-identify individuals across cameras in order to analyze patterns (Pazho et al. 2023a). The design prioritizes privacy by avoiding raw data transmission to the Cloud Module.
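As a rough illustration of this re-identification step, the sketch below matches a track's appearance feature against a gallery of global identities by cosine similarity. The 0.7 threshold and 512-dimensional features are assumptions for illustration, not values reported here.

```python
# Sketch of cross-camera re-identification via cosine similarity.
# The gallery maps global IDs to appearance features (e.g., OSNet embeddings).
import numpy as np

SIM_THRESHOLD = 0.7  # assumed; a real deployment would tune this

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_global_id(feature: np.ndarray, gallery: dict) -> int:
    """Return an existing global ID if a stored feature is similar enough,
    otherwise register the track under a new global ID."""
    best_id, best_sim = None, -1.0
    for gid, gfeat in gallery.items():
        sim = cosine_sim(feature, gfeat)
        if sim > best_sim:
            best_id, best_sim = gid, sim
    if best_id is not None and best_sim >= SIM_THRESHOLD:
        return best_id
    new_id = max(gallery, default=1000) + 1
    gallery[new_id] = feature
    return new_id

gallery = {1001: np.random.rand(512)}
print(assign_global_id(np.random.rand(512), gallery))
```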
Table 1: Example visualization of the database at the Server Module

Record Time | Camera | Class ID | Bounding Box | Feature | Local ID | Global ID | Anomaly Score
00:00:00 | 1 | 0 | [x, y, w, h] | Tensors | 15 | 1001 | 40
00:00:01 | 2 | 0 | [x, y, w, h] | Tensors | 21 | 1001 | 40
... | ... | ... | ... | ... | ... | ... | ...
23:59:59 | 1 | 0 | [x, y, w, h] | Tensors | 9999 | 1001 | 40
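The table's layout can be expressed as a schema along the following lines. This sketch uses SQLite from Python's standard library to stay self-contained, whereas the deployment described uses MySQL, and the column types are inferred from Table 1 rather than taken from the paper.

```python
# Sketch of the Server Module's metadata table (see Table 1).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE detections (
        record_time   TEXT,     -- e.g., '00:00:01'
        camera        INTEGER,  -- camera ID
        class_id      INTEGER,  -- object class (0 = person)
        bounding_box  TEXT,     -- serialized [x, y, w, h]
        feature       BLOB,     -- appearance-feature tensor bytes
        local_id      INTEGER,  -- per-camera track ID
        global_id     INTEGER,  -- cross-camera identity
        anomaly_score REAL
    )
""")
conn.execute(
    "INSERT INTO detections VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("00:00:01", 2, 0, "[x, y, w, h]", b"", 21, 1001, 40.0),
)
print(conn.execute("SELECT global_id, anomaly_score FROM detections").fetchall())
```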

Cloud Module leverages cloud-native services for data storage, management, and API generation (Dahunsi, Idogun, and Olawumi 2021). It minimizes lag between anomaly detection and notification, using a rule-based messaging service to send real-time alerts via email, text, or app notifications. A low-latency database supports real-time data access, while APIs generated through an application development kit optimize data retrieval (Mohammed et al. 2022; Ardabili et al. 2023b).
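A rule-based alert router of the kind described might look like the following sketch; the specific rules, thresholds, and delivery stubs are assumptions for illustration, not the system's actual policy.

```python
# Sketch of a rule-based messaging service: route an anomaly event to
# email, text, or app notification channels based on matching rules.
def send(channel: str, message: str) -> None:
    # Stand-in for real email/SMS/push integrations.
    print(f"[{channel}] {message}")

RULES = [
    # (predicate, channels) -- assumed policy for illustration only
    (lambda e: e["type"] == "object" and e["score"] > 0.9, ["text", "app"]),
    (lambda e: e["type"] == "behavior", ["email", "app"]),
]

def route_alert(event: dict) -> None:
    for predicate, channels in RULES:
        if predicate(event):
            for ch in channels:
                send(ch, f"{event['type']} anomaly on camera {event['camera']}")
            return  # first matching rule wins in this sketch

route_alert({"type": "behavior", "score": 0.8, "camera": 3})
```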

End User Devices are designed to promptly notify users of detected anomalies via a smartphone application (Ardabili et al. 2023a). The app provides real-time data and analysis, ensuring consistent functionality across different devices and operating systems, thereby enhancing accessibility and delivering information efficiently.

Figure 4: An example of four pipelines running in the testbed, with IP cameras (Cameras 1-4) deployed at different locations.

Figure 5: Comparing different views of one camera. Graph (a), generated directly from the database, shows the original camera view; graph (b) represents the average bird's eye view coordinates.
Deployment and Setup

In lab settings, each advanced software feature demonstrates significant potential individually; however, to fully realize its benefits, a comprehensive end-to-end evaluation in a real-world environment is essential. Such an evaluation allows for assessing the system's load handling and endurance under actual conditions, ensuring that it can perform reliably and effectively when deployed, which might not be evident in a controlled environment.

In our study, an existing network of 16 AXIS IP cameras operating at 30 FPS with 720p resolution across the college campus is used, covering approximately 35,000 square feet indoors and 60,000 square feet outdoors. Figure 3 visually represents the camera placements. Three cameras monitor outdoor areas, while the remaining 13 oversee various indoor locations, including entry points, vending machines, hallways, and communal spaces. The indoor cameras are mounted at 7 feet 6 inches, and the outdoor cameras at 10 feet 8 inches. Each camera has a varifocal lens with a horizontal field of view of 100 to 36 degrees. The system operates on a dedicated server equipped with a 16-core CPU, 252 GB of memory, and four GPUs, each with 24 GB of VRAM.

As shown in Figure 4, a sample output from the Local Node during a scenario with four operational cameras is presented, with privacy-preserving instance segmentation applied.

Applications and Visualizations

This section illustrates how the data collected from the real world can be analyzed and visualized to enhance situational awareness in applications such as urban planning, resource allocation, and crowd management.

Descriptive Data

In our descriptive analysis, the Global ID from the human feature data serves as the unique identifier for tracking across data streams. We present several key metrics to provide a general overview of traffic flow and occupancy trends: real-time count of people, hourly average number of people per camera per location, total number of people, and peak-hour analysis over time. These metrics help to monitor occupancy, understand distribution trends, and identify peak traffic times, aiding in resource allocation and emergency planning.

Situational Awareness

Situational awareness is crucial for effective decision-making across various domains. Our study explores the following four key visualization techniques:

Occupancy Indicator provides insight into the number of people in a specific location by comparing current occupancy levels to historical data, which is particularly useful in emergencies or public health scenarios. To calculate occupancy, the system compares current detections with historical percentiles, categorizing occupancy as "Low," "Normal," or "High."
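A minimal version of this percentile comparison is sketched below; the 25th/75th percentile cutoffs are assumed values, since the exact thresholds are not specified here.

```python
# Sketch of the Occupancy Indicator: compare the current person count with
# historical percentiles. The 25th/75th cutoffs are assumed for illustration.
import numpy as np

def occupancy_level(current_count: int, history: np.ndarray) -> str:
    low, high = np.percentile(history, [25, 75])
    if current_count < low:
        return "Low"
    if current_count > high:
        return "High"
    return "Normal"

history = np.array([3, 5, 8, 12, 6, 4, 9, 15, 7, 5])  # counts per interval
print(occupancy_level(14, history))  # -> "High"
```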
Statistical Anomaly Detection identifies unusual traffic flow. By analyzing historical data, the system establishes a baseline and identifies deviations as anomalies. It calculates these by comparing current detections with historical data, flagging events that exceed two standard deviations from the mean. In our analysis, zero detections were excluded to avoid skewing the data.
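The two-standard-deviation rule with zero-exclusion can be captured in a few lines, as in this sketch:

```python
# Sketch of statistical anomaly flagging: exclude zero detections, then flag
# counts more than two standard deviations from the historical mean.
import numpy as np

def flag_anomalies(counts: np.ndarray) -> np.ndarray:
    nonzero = counts[counts > 0]          # zeros excluded to avoid skew
    mean, std = nonzero.mean(), nonzero.std()
    return np.abs(counts - mean) > 2 * std

counts = np.array([4, 5, 0, 6, 5, 21, 4])   # one unusually busy interval
print(flag_anomalies(counts))               # flags only the count of 21
```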
technologies, which 42% rated as not effective or only mod-
Bird's Eye View provides an accurate spatial representation, ideal for crowd management and area planning. The system normalizes object dimensions and uses a scale factor to accurately position objects within this view. Figure 5 compares the original and processed Bird's Eye View from one camera, showing how this technique refines and clarifies spatial data.
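One way to realize this normalize-and-scale mapping is a planar homography per camera, as sketched below. The matrix H and the scale factor are placeholders that a real deployment would obtain from camera calibration; they are not values from this system.

```python
# Sketch of a Bird's Eye View mapping: project each person's bounding-box
# ground point through a per-camera homography, then scale to the BEV canvas.
import numpy as np

H = np.array([[1.0, 0.2, 5.0],
              [0.0, 1.5, 2.0],
              [0.0, 0.001, 1.0]])  # assumed image-to-ground homography
SCALE = 0.5                        # assumed pixels-per-unit on the BEV canvas

def to_birds_eye(bbox):
    x, y, w, h = bbox
    foot = np.array([x + w / 2.0, y + h, 1.0])  # bottom-center ground point
    gx, gy, gw = H @ foot                       # homogeneous projection
    return (SCALE * gx / gw, SCALE * gy / gw)

print(to_birds_eye((100, 200, 40, 160)))
```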
Heatmaps visually represent data distribution, useful for monitoring congestion and spatial usage. The system generates 2D heatmaps from Bird's Eye View coordinates, applying Gaussian smoothing for clarity. Figure 6 presents heatmaps for different 24-hour intervals: graphs (a) and (c) show weekday pedestrian movement, while graphs (b) and (d) represent weekend congestion in the same areas. Comparing the density patterns during weekdays and weekends reflects differences in area usage.

Figure 6: Heatmap representation of two cameras during weekdays and weekends. Graphs (a) and (c) represent the heatmaps during weekdays, while graphs (b) and (d) show congestion of the same areas during weekends.

The detailed algorithms for all visualization tools are available in the supplementary materials for replication and further investigation.
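As a concrete illustration of the heatmap step described above, the sketch below bins Bird's Eye View coordinates into a 2D histogram and applies Gaussian smoothing (requires NumPy and SciPy); the grid size and smoothing sigma are assumed values.

```python
# Sketch of heatmap generation: bin BEV coordinates into a 2D histogram,
# then smooth with a Gaussian filter for a cleaner congestion picture.
import numpy as np
from scipy.ndimage import gaussian_filter

def make_heatmap(xs, ys, bins=64, sigma=2.0):
    hist, _, _ = np.histogram2d(xs, ys, bins=bins, range=[[0, 800], [0, 800]])
    return gaussian_filter(hist, sigma=sigma)

rng = np.random.default_rng(0)
xs, ys = rng.uniform(0, 800, 500), rng.uniform(0, 800, 500)  # toy BEV points
heat = make_heatmap(xs, ys)
print(heat.shape, round(float(heat.max()), 2))
```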
150 seconds and operated at a frame rate of 30 frames per
Community Engagement

To gauge public perception of such a solution, we conducted seven rounds of survey studies in six public places across Charlotte, NC, during July and August 2023. A total of 410 respondents participated, providing insights into their concerns about safety and surveillance.

The results indicated that respondents were most concerned about safety in parking lots (54%), public transit (44%), and entertainment venues (42%). Additionally, significant concerns were raised about potential biases and discrimination in current passive surveillance technologies, with these concerns being important or very important to respondents across all demographic groups, highlighting a widespread sensitivity to this issue.[1]

Interestingly, vulnerable groups, such as females and people of color, were statistically more likely to agree with the statement, "The presence of video surveillance makes me feel safer." Furthermore, respondents found our AI-driven video solution more beneficial, with over 50% rating it as very or extremely beneficial, compared to current passive technologies, which 42% rated as not effective or only moderately effective. Our solution also raised fewer privacy concerns, with 66% of respondents reporting no privacy concerns or only moderate concerns. These findings suggest a strong preference for AI-driven surveillance solutions over traditional passive technologies, especially in public places.

[1] This study has been previously published, but in adherence to the double-blind review policy of the AAAI conference, the reference will be provided in the camera-ready version upon acceptance.

System Evaluation and Results

Load Stress Evaluation

This section evaluates the system's performance under increasing input loads by varying the crowd densities of input videos. Key metrics such as average latency and throughput were measured as we scaled the system across different numbers of pipeline nodes (one, four, eight, and twelve) to assess scalability and adaptability under increased workloads.

To rigorously assess the system's performance under varying crowd densities and parallel node operations, we prepared ten videos with density levels ranging from 0 to 9 from the CHAD dataset (Pazho et al. 2022). Each video lasted 150 seconds and operated at a frame rate of 30 frames per second. To ensure the statistical validity of our results, we averaged metrics over the central 100 batches in each run, excluding the initial 25 batches for system warm-up and the last 25 batches for cool-down effects.

Figure 8 graphically presents the trends for throughput and latency. The X-axis, labeled "Density (count)," represents the average number of humans detected per batch (comprising 30 frames) for each input node count in the experiment. With one and four camera nodes, our system maintained latency under 10 seconds, showcasing stability regardless of scene crowdedness. However, with eight nodes, latency increased beyond 20 seconds at higher density levels, particularly from level 5 onwards. A notable increase in latency was observed with 12 nodes, especially at density levels 8 and 9, correlating with processing 108 individuals (12 nodes x 9 density).

Throughput consistently declined linearly with increasing density and node count, reaching a low of 4.56 FPS at the highest density level 9 with 12 nodes. The performance decrease is primarily due to the computational demands of the pose estimator (Sun et al. 2019) processing high-density keypoints, and is further constrained by CPU computation limits. As node count and density increase, these factors collectively create a bottleneck, leading to the observed degradation in latency and throughput.
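This trimmed averaging is straightforward to express in code; the sketch below assumes 150 batches per run (150 seconds at 30 FPS in 30-frame batches), matching the setup described above.

```python
# Sketch of the per-run metric averaging: drop the first 25 warm-up and
# last 25 cool-down batches, then average the central 100 batches.
def central_average(per_batch_values):
    assert len(per_batch_values) == 150, "150 batches expected per run"
    central = per_batch_values[25:-25]   # the 100 central batches
    return sum(central) / len(central)

latencies = [5.0] * 25 + [2.0] * 100 + [6.0] * 25   # toy per-batch latencies
print(central_average(latencies))  # -> 2.0
```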
Figure 8: Latency and throughput trends with respect to crowd densities across different numbers of nodes running in parallel.

Real-World Endurance Evaluation

This evaluation assessed the system's long-term stability using all 16 cameras in the testbed. We conducted extended trials with eight, twelve, and sixteen nodes running continuously for 21 hours, and a week-long test using eight nodes to ensure the system's robustness. This setup aimed to validate the system's performance in realistic, extended operational conditions.

• Eight-Node Setup: Covers key indoor areas, including entry points, hallways, and common spaces.
• Twelve-Node Setup: Adds outdoor monitoring of parking lots.
• Sixteen-Node Setup: Utilizes all cameras for comprehensive coverage.

Figure 7: Latency and throughput trends concerning crowd densities during a week-long period for 8 camera nodes.

As shown in Figure 7, the system's performance over a week, running with eight cameras, demonstrated consistent latency and throughput results with respect to crowd density. Specifically, the latency ranged from 2.5-7.8 s, and throughput varied from 22.3-29.8 FPS. These findings confirm that as crowd density increased, latency rose while FPS decreased, aligning with expectations and underscoring that the length of the videos does not significantly impact the system's performance.

It is essential to note that the specific data spikes observed within this evaluation could stem from various factors, including network irregularities within the environment (as the input streams originate from IP cameras) or CPU memory usage nearing its limits. However, in light of the comprehensive experiment, these occasional data-point spikes remain within acceptable bounds. The system consistently maintained optimal performance levels, showcasing its robustness and capacity to effectively manage and adapt to various operational scenarios.

For configurations with 8, 12, and 16 camera nodes, we observed similar trends over the 21-hour-long experiment with varying performance metrics: average latencies ranged from 2.6-4.8 s, 5.3-6.5 s, and 6.7-10.5 s, respectively, while corresponding throughputs ranged from 28.5-26.5 FPS, 20.5-18 FPS, and 16.5-14.5 FPS. Detailed results with visualizations for 8, 12, and 16 camera nodes are provided in the supplementary materials for replication and further investigation.
Physical-Cyber-Physical Evaluation (Anomaly Detection)

This section evaluates the system's Physical-Cyber-Physical (PCP) latency, representing the end-to-end time from when an anomaly appears to when the end user receives a notification. We assessed both object and behavioral anomaly detection (Markovitz et al. 2020b) using four, eight, twelve, and sixteen AI pipeline nodes reading input frames from the same camera node, conducting twenty experiments per anomaly type and collecting 400 data samples. The latency measures the time from detecting anomalies, such as high-priority objects or dangerous behaviors like fighting, to alerting the end user. A three-second transmission delay from cameras to AI nodes was observed, slightly higher than in other IP systems, due to our unique network setup. This delay is included in the overall PCP measurements, though manual recording of latency data may introduce minor discrepancies.
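Conceptually, each PCP sample reduces to a pair of timestamps; the sketch below shows the bookkeeping, with toy values in place of the manually recorded measurements.

```python
# Sketch of PCP latency bookkeeping: pair each anomaly's appearance time
# with its notification time, then summarize mean and standard deviation.
from statistics import mean, stdev

def pcp_latencies(events):
    """events: list of (t_anomaly_appears, t_notification_received) in seconds."""
    return [t_notify - t_appear for t_appear, t_notify in events]

trials = [(0.0, 9.1), (60.0, 69.6), (120.0, 129.2)]  # toy timestamps
lats = pcp_latencies(trials)
print(f"mean={mean(lats):.2f}s stdev={stdev(lats):.2f}s")
```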
sixteen camera nodes, to assess its performance under real-
The results indicate that increasing the number of cameras significantly impacts PCP latency. For example, object anomaly latency increased from 4.7 seconds with four cameras to 19.68 seconds with sixteen, while behavioral anomaly latency rose from 9.34 seconds to 26.76 seconds. This increase is attributed to system traffic, multitasking overhead, and the processing capacity of the system.

Figure 9: Latency rug plots for the different anomaly scenarios, with 50 data points for each camera count.

Figure 9 displays the distribution of 400 data points from our PCP experiments. Graph (a) shows the distribution for object anomaly detection, with less variation in the four-camera setup compared to the sixteen-camera setup. Graph (b) illustrates the distribution for behavioral anomaly detection, where the data is more scattered with an increased number of cameras. The standard deviation for object detection latency increased from 0.65 to 1.12 seconds as the camera count increased, indicating a slight broadening of latency due to system complexity and resource contention.

This evaluation demonstrates the system's capability to enhance safety and operational efficiency in real-world scenarios. The ability to notify end users within approximately 20 seconds is crucial for managing anomalies in public spaces, aiding in the prevention of potential dangers and improving resource allocation.

Overall, these real-world evaluations allow researchers to study how key parameters, such as the number of cameras and crowd density, affect system performance measures such as latency and throughput across different public safety scenarios; in an emergency, this understanding is crucial for making decisions and taking proper action. For example, retail and transport hubs could analyze foot traffic and crowd density, offering insights into peak hours and emergency response planning (Hosseini et al. 2024).

Conclusion

This paper presented a comprehensive evaluation of an AI-based real-time video solution, integrated with existing CCTV infrastructures, to enhance situational awareness through object and behavioral anomaly detection. The system was tested across various configurations, including up to sixteen camera nodes, to assess its performance under real-world conditions.

Our evaluation highlighted the system's ability to maintain operational efficiency with increasing node counts, despite a corresponding rise in PCP latency. Object and behavioral anomaly detection latencies ranged from 4.7 seconds with four cameras to 26.76 seconds with sixteen, reflecting the system's scalability and robustness. Through extensive testing, the system demonstrated its capability to deliver timely notifications to end users, crucial for managing anomalies in public spaces. This work underscores the importance of real-world evaluations in optimizing AI-driven video solutions, ensuring they meet the demands of dynamic environments and contribute to enhanced public safety and operational efficiency.
References

Alshammari, A.; and Rawat, D. 2019. Intelligent multi-camera video surveillance system for smart city applications. In 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), 0317-0323. IEEE.

Angelidou, M.; Psaltoglou, A.; Komninos, N.; Kakderi, C.; Tsarchopoulos, P.; and Panori, A. 2018. Enhancing sustainable urban development through smart city applications. Journal of Science and Technology Policy Management, 9(2): 146-169.

Ardabili, B.; Pazho, A.; Noghre, G.; Neff, C.; Bhaskararayuni, S.; Ravindran, A.; Reid, S.; and Tabkhi, H. 2023a. Understanding Policy and Technical Aspects of AI-Enabled Smart Video Surveillance to Address Public Safety. arXiv preprint, arXiv:2302.04310.

Ardabili, B. R.; Pazho, A. D.; Noghre, G. A.; Neff, C.; Bhaskararayuni, S. D.; Ravindran, A.; Reid, S.; and Tabkhi, H. 2023b. Understanding Policy and Technical Aspects of AI-Enabled Smart Video Surveillance to Address Public Safety. Computational Urban Science, 3(1): 21.

Ashby, M. P. 2017. The value of CCTV surveillance cameras as an investigative tool: An empirical analysis. European Journal on Criminal Policy and Research, 23(3): 441-459.

Dahunsi, F.; Idogun, J.; and Olawumi, A. 2021. Commercial cloud services for a robust mobile application backend data storage. Indonesian Journal of Computing, Engineering and Design (IJoCED), 3(1): 31-45.

Dechouniotis, D.; Athanasopoulos, N.; Leivadeas, A.; Mitton, N.; Jungers, R.; and Papavassiliou, S. 2020. Edge computing resource allocation for dynamic networks: The DRUID-NET vision and perspective. Sensors, 20(8): 2191.

Franklin, R.; and Dabbagol, V. 2020. Anomaly detection in videos for video surveillance applications using neural networks. In 2020 Fourth International Conference on Inventive Systems and Control (ICISC), 632-637. IEEE.

Gandapur, M. 2022. E2E-VSDL: End-to-end video surveillance-based deep learning model to detect and prevent criminal activities. Image and Vision Computing, 123: 104467.

Hosseini, S. S.; Ardabili, B. R.; Azarbayjani, M.; Pulugurtha, S.; and Tabkhi, H. 2024. Understanding the Transit Gap: A Comparative Study of On-Demand Bus Services and Urban Climate Resilience in South End, Charlotte, NC and Avondale, Chattanooga, TN. arXiv preprint arXiv:2403.14671.

Huang, C.; Wu, Z.; Wen, J.; Xu, Y.; Jiang, Q.; and Wang, Y. 2021. Abnormal event detection using deep contrastive learning for intelligent video surveillance system. IEEE Transactions on Industrial Informatics, 18(8): 5171-5179.

Javed, A. R.; Shahzad, F.; ur Rehman, S.; Zikria, Y. B.; Razzak, I.; Jalil, Z.; and Xu, G. 2022. Future smart cities: Requirements, emerging technologies, applications, challenges, and future aspects. Cities, 129: 103794.

Jocher, G.; Chaurasia, A.; and Qiu, J. 2023. YOLO by Ultralytics (Version 8.0.0). https://github.com/ultralytics/ultralytics.

Ma, M.; Preum, S. M.; Ahmed, M. Y.; Tärneberg, W.; Hendawi, A.; and Stankovic, J. A. 2019. Data Sets, Modeling, and Decision Making in Smart Cities: A Survey. ACM Transactions on Cyber-Physical Systems, 4(2).

Mahdavinejad, M. S.; Rezvan, M.; Barekatain, M.; Adibi, P.; Barnaghi, P.; and Sheth, A. P. 2018. Machine learning for Internet of Things data analysis: A survey. Digital Communications and Networks, 4(3): 161-175.

Markovitz, A.; Sharir, G.; Friedman, I.; Zelnik-Manor, L.; and Avidan, S. 2020a. Graph embedded pose clustering for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10539-10547.

Markovitz, A.; Sharir, G.; Friedman, I.; Zelnik-Manor, L.; and Avidan, S. 2020b. Graph embedded pose clustering for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10539-10547.

Mohammed, S.; Fiaidhi, J.; Sawyer, D.; and Lamouchie, M. 2022. Developing a GraphQL SOAP Conversational Micro Frontends for the Problem Oriented Medical Record (QL4POMR). In Proceedings of the 6th International Conference on Medical and Health Informatics, 52-60.

Noghre, G. A.; Pazho, A. D.; Katariya, V.; and Tabkhi, H. 2023. Understanding the challenges and opportunities of pose-based anomaly detection. arXiv preprint arXiv:2303.05463.

Noghre, G. A.; Pazho, A. D.; and Tabkhi, H. 2024. An exploratory study on human-centric video anomaly detection through variational autoencoders and trajectory prediction. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 995-1004.

Pazho, A.; Noghre, G.; Ardabili, B.; Neff, C.; and Tabkhi, H. 2022. CHAD: Charlotte Anomaly Dataset. arXiv preprint, arXiv:2212.09258.

Pazho, A. D.; Neff, C.; Noghre, G. A.; Ardabili, B. R.; Yao, S.; Baharani, M.; and Tabkhi, H. 2023a. Ancilia: Scalable intelligent video surveillance for the artificial intelligence of things. IEEE Internet of Things Journal.

Pazho, A. D.; Noghre, G. A.; Katariya, V.; and Tabkhi, H. 2024. VT-Former: An Exploratory Study on Vehicle Trajectory Prediction for Highway Surveillance through Graph Isomorphism and Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5651-5662.

Pazho, A. D.; Noghre, G. A.; Purkayastha, A. A.; Vempati, J.; Martin, O.; and Tabkhi, H. 2023b. A survey of graph-based deep learning for anomaly detection in distributed systems. IEEE Transactions on Knowledge and Data Engineering, 36(1): 1-20.

Pramanik, A.; Sarkar, S.; and Maiti, J. 2021. A real-time video surveillance system for traffic pre-events detection. Accident Analysis and Prevention, 154: 106019.

Rashvand, N.; Hosseini, S. S.; Azarbayjani, M.; and Tabkhi, H. 2023. Real-Time Bus Arrival Prediction: A Deep Learning Approach for Enhanced Urban Mobility. arXiv preprint arXiv:2303.15495.
S., J.; C., S.; E., Y.; and GP., J. 2021. Real time object detection and tracking system for video surveillance system. Multimedia Tools and Applications, 80: 3981-96.

Singh, R.; Srivastava, H.; Gautam, H.; Shukla, R.; and Dwivedi, R. 2023. An Intelligent Video Surveillance System using Edge Computing based Deep Learning Model. In 2023 International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), 439-444. IEEE.

Sun, K.; Xiao, B.; Liu, D.; and Wang, J. 2019. Deep high-resolution representation learning for human pose estimation. In CVPR.

Usmani, U. A.; Watada, J.; Jaafar, J.; Aziz, I. A.; and Roy, A. 2023. A Deep Learning Algorithm to Monitor Social Distancing in Real-Time Videos: A Covid-19 Solution. In Interpretable Cognitive Internet of Things for Healthcare, 73-90. Springer.

Xu, R.; Nikouei, S.; Chen, Y.; Polunchenko, A.; Song, S.; Deng, C.; and Faughnan, T. 2018. Real-time human objects tracking for smart surveillance at the edge. In 2018 IEEE International Conference on Communications (ICC), 1-6. IEEE.

Yao, S.; Noghre, G. A.; Pazho, A. D.; and Tabkhi, H. 2024. Evaluating the Effectiveness of Video Anomaly Detection in the Wild: Online Learning and Inference for Real-world Deployment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 4832-4841.

Zanella, A.; Bui, N.; Castellani, A.; Vangelista, L.; and Zorzi, M. 2014. Internet of things for smart cities. IEEE Internet of Things Journal, 1(1): 22-32.

Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; and Wang, X. 2022. ByteTrack: Multi-object tracking by associating every detection box. In Computer Vision - ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXII, 1-21. Cham: Springer Nature Switzerland.

Zhou, K.; Yang, Y.; Cavallaro, A.; and Xiang, T. 2019. Omni-scale feature learning for person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 3702-3712.
