Detection, Bird's Eye View, and Heatmaps, to provide insights into pedestrian behaviors and resource allocation, thereby aiding urban planning and enhancing public safety. The SVS has been implemented in a community college to evaluate its real-world effectiveness. Our findings demonstrate the system's capability to manage 16 CCTV cameras with a throughput of 16.5 frames per second (FPS) over a 21-hour period, maintaining an average latency of 26.76 seconds for detecting behavioral anomalies and notifying users.

In summary, the contributions of this article are:

• Adopting an AI-enabled SVS with existing CCTV infrastructure in the real world, emphasizing civic applications beyond traditional security measures.

• Presenting innovative data representation techniques that enhance urban planning and resource allocation by providing insights into pedestrian behaviors.

• Providing a comprehensive real-world evaluation of the SVS, demonstrating its effectiveness, scalability, and privacy-preserving capabilities in community settings.

Related Works

Recent advancements in AI-driven video systems have leveraged state-of-the-art algorithms and system designs to perform high-level AI tasks using computer vision. Ancilia, for instance, employs a multi-stage computer vision pipeline aimed at enhancing public safety (Pazho et al. 2023a). While we have implemented this system as the baseline for our deployment, it is important to note that the original study reports results only in a controlled environment.

The deployment of SVS in real-world settings is essential to validate these systems' effectiveness and address challenges such as latency, scalability, and privacy. While traditional research has focused on laboratory-based studies for real-time object tracking and anomaly detection (Pazho et al. 2023b; S. et al. 2021), there is growing recognition of the need to evaluate these systems in practical environments. Real-world evaluation ensures that this technology can adapt to dynamic urban settings and meet diverse community needs.

Recent advancements in machine learning have enabled smart video technology to perform complex data analysis, offering insights for civic applications. For instance, Pramanik et al. demonstrated lightweight models for edge devices to reduce latency in urban environments (Pramanik, Sarkar, and Maiti 2021). Singh et al. introduced an AI-based video technology for fall detection, highlighting the potential for applications beyond traditional safety measures (Singh et al. 2023).

The E2E-VSDL method by Gandapur utilized Bi-GRU and CNN for anomaly detection, achieving high accuracy (Gandapur 2022). Similarly, TAC-Net's deep learning approach excelled in addressing anomaly scenarios (Huang et al. 2021).

Integrating these systems into smart city applications bridges the gap between theoretical research and practical implementation. Alshammari et al. developed an SVS using surveillance cameras in real-world settings (Alshammari and Rawat 2019), and RV College of Engineering Bengaluru improved object detector accuracy (Franklin and Dabbagol 2020). These efforts highlight the necessity of robust testbed support for evaluating SVS, addressing scalability and privacy challenges (Ma et al. 2019).

The combination of machine learning and IoT devices has transformed urban planning, providing insights for managing congestion and enhancing civic participation. Ardabili et al. introduced an end-to-end system design that integrates Ancilia with an end-user device for direct communication; we have adopted this comprehensive system design for our study's deployment, allowing us to assess its effectiveness in practical, real-world environments (Ardabili et al. 2023b). Mahdavinejad et al. emphasized machine learning's role in forecasting urban congestion (Mahdavinejad et al. 2018), while Zanella et al. advocated for accessible urban IoT data (Zanella et al. 2014). These studies illustrate how data-driven approaches contribute to civic applications.

Despite progress in SVS research, comprehensive testbed support for evaluating real-world system performance remains limited. Our study addresses this gap by deploying a state-of-the-art SVS in a community college, demonstrating its capabilities and potential for enhancing civic applications in public settings.
Figure 2: End-to-end detailed system. C0 represents the camera ID; for each camera, one AI Module N0 containing the multi-AI-vision-model pipeline (crop selection, object detector, tracker, pose estimator, feature extractor, and anomaly detector) is assigned. All AI Modules send processed data to a single Server Module database, and the Server Module re-identifies human track IDs based on the feature-extractor data (global tracker). The statistical analyzer processes the data stored in the database across all cameras and communicates the results to the Cloud Module, where cloud-native services (low-latency database, authentication, messaging service, and app development) host the end users' applications.
Figure 3: Locations of cameras on the campus (indoor and outdoor). A maximum of 16 cameras are used.

Software System Features

The AI-based real-time video solution seamlessly integrates with existing CCTV infrastructures, creating a Physical-Cyber-Physical (PCP) system that delivers actionable information to end users. The system consists of four key components: AI Modules, Server Module, Cloud Module, and End-User Devices, as illustrated in Figure 2. The design of the AI and Server Modules was inspired by prior work (Pazho et al. 2023a), while the Cloud Module and end-user device components were adapted from another study (Ardabili et al. 2023b).

AI Module is a multi-stage pipeline optimized for real-time computer vision tasks. It processes image data in batches of 30 frames, where an object detector identifies objects and flags anomalies (Pazho et al. 2023a). A tracking algorithm then creates tracklets, and a pose estimator extracts 2D skeletal data for individuals. Object anomaly detection and behavioral anomaly detection (Noghre, Pazho, and Tabkhi 2024; Pazho et al. 2024) are performed within this module, with alerts sent to end-user devices for real-time response (Pazho et al. 2023b; Yao et al. 2024). In our deployment, YOLOv8 (Jocher, Chaurasia, and Qiu 2023), ByteTrack (Zhang et al. 2022), HRNet (Sun et al. 2019), GEPC (Markovitz et al. 2020a), and OSNet (Zhou et al. 2019) are used; the detailed model-selection process is provided in the supplementary materials for replication and further investigation. A minimal sketch of the detect-and-track stage follows this section.

Server Module centralizes metadata and historical data from the AI Modules, stored in a MySQL database as outlined in Table 1. It handles global tracking and statistical analysis, using cosine similarity to re-identify individuals across cameras and analyze patterns (Pazho et al. 2023a). The design prioritizes privacy by avoiding raw data transmission to the Cloud Module.

Cloud Module leverages cloud-native services for data storage, management, and API generation (Dahunsi, Idogun, and Olawumi 2021). It minimizes lag between anomaly detection and notification, using a rule-based messaging service to send real-time alerts via email, text, or app notifications. A low-latency database supports real-time data access, while APIs generated through an application development kit optimize data retrieval (Mohammed et al. 2022; Ardabili et al. 2023b).

End-User Devices are designed to promptly notify users of detected anomalies via a smartphone application (Ardabili et al. 2023a). The app provides real-time data and analysis, ensuring consistent functionality across different devices and operating systems, thereby enhancing accessibility and usability.
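To make the detect-and-track stage concrete, the following is a minimal sketch in Python using the public YOLOv8 and ByteTrack interfaces of the ultralytics package. The video source, model size, and batch handling are illustrative assumptions; the deployed pipeline's pose-estimation, feature-extraction, and anomaly stages are omitted.

# Minimal sketch of the AI Module's detection + tracking stage.
# Assumes the `ultralytics` package; HRNet, GEPC, and OSNet stages
# of the full pipeline are omitted for brevity.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # object detector (YOLOv8)

# Stream a camera feed and associate detections into tracklets
# with ByteTrack, mirroring the detector -> tracker hand-off.
results = model.track(
    source="camera0.mp4",      # placeholder input; an RTSP URL in deployment
    tracker="bytetrack.yaml",  # ByteTrack association
    classes=[0],               # person class only
    stream=True,               # yield results frame by frame
)

batch = []
for result in results:
    batch.append(result)
    if len(batch) == 30:       # the module processes 30-frame batches
        for frame in batch:
            track_ids = frame.boxes.id   # per-person track IDs (tracklets)
            boxes = frame.boxes.xyxy     # bounding boxes
            # pose estimation and anomaly detection would consume
            # these tracklets here
        batch.clear()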
Table 1: Example visualization of the database at the Server Module.

Figure 5: Original image-plane coordinates (X, Y) and processed Bird's Eye View coordinates (BirdsEye_X, BirdsEye_Y) for tracked global IDs (G_ID 563–634) from Camera 1 and Camera 2, panels (a) and (b).
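As an illustration of the Server Module's global-tracking step over records like those in Table 1, the sketch below matches per-camera track embeddings by cosine similarity. The 512-dimensional features, the 0.7 threshold, and the gallery layout are illustrative assumptions, not the deployed schema.

# Illustrative sketch of cross-camera re-identification by cosine
# similarity over appearance embeddings (e.g., OSNet-style features).
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_global_id(feature, gallery, next_id, threshold=0.7):
    # Compare a local track's embedding against every known global ID
    # and reuse the best match above the threshold; otherwise mint a
    # new global ID (G_ID).
    best_id, best_sim = None, threshold
    for g_id, known in gallery.items():
        sim = cosine_similarity(feature, known)
        if sim > best_sim:
            best_id, best_sim = g_id, sim
    if best_id is None:
        best_id, next_id = next_id, next_id + 1
    gallery[best_id] = feature  # keep the most recent embedding
    return best_id, next_id

gallery, next_id = {}, 0
for feature in np.random.rand(4, 512):  # stand-in embeddings
    g_id, next_id = assign_global_id(feature, gallery, next_id)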
Figure 6: Heatmap representation of two cameras during weekdays and weekends. Graphs (a) and (c) show the heatmaps of Camera 2 and Camera 7 during weekdays, while graphs (b) and (d) show congestion in the same areas during weekends.
Anomaly detection flags statistically unusual activity levels. The system calculates these by comparing current detections with historical data, flagging events that exceed two standard deviations from the mean. In our analysis, zero detections were excluded to avoid skewing the data.
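A minimal sketch of this two-standard-deviation rule follows; the hourly detection counts and aggregation granularity are illustrative assumptions.

# Sketch of the statistical flagging rule: an interval is anomalous
# if its detection count deviates from the historical mean by more
# than two standard deviations; zero-detection intervals are dropped.
import numpy as np

def flag_anomalies(counts):
    counts = np.asarray(counts)
    counts = counts[counts > 0]          # exclude zero detections
    mu, sigma = counts.mean(), counts.std()
    return counts[np.abs(counts - mu) > 2 * sigma]

hourly = [12, 15, 0, 14, 13, 90, 11, 16]  # illustrative data
print(flag_anomalies(hourly))             # -> [90]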
Bird's Eye View provides an accurate spatial representation, ideal for crowd management and area planning. The system normalizes object dimensions and uses a scale factor to accurately position objects within this view. Figure 5 compares the original and processed Bird's Eye View from one camera, showing how this technique refines and clarifies spatial data.
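The following is a simplified sketch of the projection described above, assuming a per-camera scale factor applied to each detection's ground-contact point; the scale values are illustrative, and the exact calibration used in the deployment is in the supplementary materials.

# Simplified Bird's Eye View mapping: take each bounding box's
# bottom-center (ground contact) point and scale it into BEV
# coordinates. The per-axis scale factors are illustrative.
def to_birds_eye(box, scale_x=0.55, scale_y=0.75):
    """box = (x1, y1, x2, y2) in image pixels."""
    foot_x = (box[0] + box[2]) / 2.0   # bottom-center x
    foot_y = box[3]                    # bottom edge y
    return foot_x * scale_x, foot_y * scale_y

print(to_birds_eye((320, 180, 380, 420)))  # -> (192.5, 315.0)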
Heatmaps visually represent data distribution, useful for monitoring congestion and spatial usage. The system generates 2D heatmaps from Bird's Eye View coordinates, applying Gaussian smoothing for clarity. Figure 6 presents heatmaps for different 24-hour intervals: graphs (a) and (c) show weekday pedestrian movement, while graphs (b) and (d) represent weekend congestion in the same areas. Comparing the density patterns between weekdays and weekends reveals differences in area usage. A sketch of the heatmap construction follows below.

The detailed algorithms for all visualization tools are available in the supplementary materials for replication and further investigation.
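As a compact sketch of the heatmap construction described above, BEV coordinates are binned into a 2D histogram and smoothed with a Gaussian filter; the grid size and smoothing bandwidth here are illustrative assumptions.

# Sketch of heatmap generation from Bird's Eye View coordinates:
# a 2D histogram of positions followed by Gaussian smoothing.
import numpy as np
from scipy.ndimage import gaussian_filter

def birds_eye_heatmap(xs, ys, bins=(80, 60), extent=(800, 600), sigma=2.0):
    hist, _, _ = np.histogram2d(
        xs, ys, bins=bins, range=[[0, extent[0]], [0, extent[1]]]
    )
    smooth = gaussian_filter(hist, sigma=sigma)
    return smooth / smooth.sum()        # normalize to a density

xs = np.random.uniform(0, 800, 1000)    # illustrative BEV positions
ys = np.random.uniform(0, 600, 1000)
heat = birds_eye_heatmap(xs, ys)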
Community Engagement

To gauge public perception of such a solution, we conducted seven rounds of survey studies in six public places across Charlotte, NC, during July and August 2023. A total of 410 respondents participated, providing insights into their concerns about safety and surveillance.

The results indicated that respondents were most concerned about safety in parking lots (54%), public transit (44%), and entertainment venues (42%). Additionally, significant concerns were raised about potential biases and discrimination in current passive surveillance technologies, with these concerns rated important or very important by respondents across all demographic groups, highlighting a widespread sensitivity to this issue.¹

Interestingly, vulnerable groups, such as females and people of color, were statistically more likely to agree with the statement, "The presence of video surveillance makes me feel safer." Furthermore, respondents found our AI-driven video solution more beneficial, with over 50% rating it as very or extremely beneficial, compared to current passive technologies, which 42% rated as not effective or only moderately effective. Our solution also raised fewer privacy concerns, with 66% of respondents reporting no privacy concerns or only moderate concerns. These findings suggest a strong preference for AI-driven surveillance solutions over traditional passive technologies, especially in public places.

¹This study has been previously published, but in adherence to the double-blind review policy of the AAAI conference, the reference will be provided in the camera-ready version upon acceptance.

System Evaluation and Results

Load Stress Evaluation

This section evaluates the system's performance under increasing input loads by varying the crowd densities of input videos. Key metrics such as average latency and throughput were measured as we scaled the system across different numbers of pipeline nodes (one, four, eight, and twelve) to assess scalability and adaptability under increased workloads.

To rigorously assess the system's performance under varying crowd densities and parallel node operations, we prepared ten videos with density levels ranging from 0 to 9 from the CHAD dataset (Pazho et al. 2022). Each video lasted 150 seconds and ran at 30 frames per second. To ensure the statistical validity of our results, we averaged metrics over the central 100 batches in each run, excluding the initial 25 batches for system warm-up and the last 25 batches for cool-down effects; a sketch of this trimming step follows below.

Figure 8 graphically presents the trends for throughput and latency. The X-axis, labeled "Density" (count), represents the average number of humans detected per batch (comprising 30 frames) for each node count in the experiment. With one and four camera nodes, our system maintained latency under 10 seconds, showing stability regardless of scene crowdedness. With eight nodes, however, latency increased beyond 20 seconds at higher density levels, particularly from level 5 onwards. A notable decrease in latency was observed with 12 nodes, especially at density levels 8 and 9, correlating with processing 108 individuals (12 nodes × density 9).

Throughput consistently declined linearly with increasing density and node count, reaching a low of 4.56 FPS at the highest density level of 9 with 12 nodes. The performance decrease is primarily due to the computational demands of processing denser scenes across more parallel nodes.
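The batch-trimming step of the evaluation protocol can be expressed directly; this is a sketch assuming per-batch latency samples, with illustrative values.

# Sketch of the evaluation protocol: average metrics over the
# central 100 batches, dropping 25 warm-up and 25 cool-down batches.
import numpy as np

def central_average(per_batch_values, warmup=25, cooldown=25):
    core = np.asarray(per_batch_values)[warmup:-cooldown]
    return float(core.mean())

latencies = np.random.uniform(5.0, 9.0, size=150)  # illustrative 150 batches
print(central_average(latencies))                  # mean over batches 25..124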
Figure 7: Latency and throughput trends with respect to crowd densities during a week-long period (Wednesday through Wednesday) for 8 camera nodes. The panels plot latency (s), throughput (FPS), and density (count) against time of day.
Figure 8: Latency and throughput trends with respect to crowd densities across different numbers of nodes (4, 8, and 12) running in parallel: (a) latency trends and (b) throughput trends.

As shown in Figure 7, the system's performance over a week, running with eight cameras, demonstrated consistent latency and throughput results with respect to crowd density. Specifically, latency ranged from 2.5 to 7.8 s, and throughput varied from 22.3 to 29.8 FPS. These findings confirm that latency increases and FPS decreases as crowd density rises, aligning with expectations and underscoring that the length of the run does not significantly impact the system's performance.
Physical-Cyber-Physical Evaluation (Anomaly Detection)

This section evaluates the system's Physical-Cyber-Physical (PCP) latency, representing the end-to-end time from when an anomaly appears to when the end user receives a notification. We assessed both object and behavioral anomaly detection (Markovitz et al. 2020b) using four, eight, twelve, and sixteen AI pipeline nodes reading input frames from the same camera node, conducting twenty experiments per anomaly type and collecting 400 data samples in total. The latency measures the time from detecting an anomaly, such as a high-priority object or a dangerous behavior like fighting, to alerting the end user. A three-second transmission delay from the cameras to the AI nodes was observed, slightly higher than in other IP systems due to our unique network setup; this delay is included in the overall PCP measurements.

Figure 9: Latency rug plots for the different anomaly scenarios, with 50 data points for each camera count; panel (b) shows behavioral anomalies over a latency range of roughly 8–30 s.
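As a sketch of how end-to-end PCP latency can be logged, the record layout and timestamps below are illustrative; in deployment, the interval runs from the anomaly's appearance on camera to the end-user notification, inclusive of the roughly three-second transmission delay.

# Sketch of PCP latency bookkeeping: the interval from the moment an
# anomaly occurs in the scene to the moment the user is notified.
# Timestamps are illustrative; in deployment they come from the
# camera feed, the AI Module, and the messaging service.
from dataclasses import dataclass

@dataclass
class PcpSample:
    anomaly_ts: float        # when the anomaly appears on camera
    notify_ts: float         # when the end-user device is alerted

    @property
    def latency(self) -> float:
        return self.notify_ts - self.anomaly_ts  # includes ~3 s transmission

samples = [PcpSample(100.0, 104.7), PcpSample(230.0, 236.1)]
mean_latency = sum(s.latency for s in samples) / len(samples)
print(f"mean PCP latency: {mean_latency:.2f}s")  # -> 5.40s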
Conclusion

This paper presented a comprehensive evaluation of an AI-based real-time video solution, integrated with existing CCTV infrastructures, to enhance situational awareness through object and behavioral anomaly detection. The system was tested across various configurations, including up to sixteen camera nodes, to assess its performance under real-world conditions.

Our evaluation highlighted the system's ability to maintain operational efficiency with increasing node counts, despite a corresponding rise in PCP latency. Object and behavioral anomaly detection latencies ranged from 4.7 seconds with four cameras to 26.76 seconds with sixteen, reflecting the system's scalability and robustness. Through extensive testing, the system demonstrated its capability to deliver timely notifications to end users, crucial for managing anomalies in public spaces. This work underscores the importance of real-world evaluations in optimizing AI-driven video solutions, ensuring they meet the demands of dynamic environments and contribute to enhanced public safety and operational efficiency.

References

Alshammari, A.; and Rawat, D. 2019. Intelligent multi-camera video surveillance system for smart city applications. In 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), 0317–0323. IEEE.

Angelidou, M.; Psaltoglou, A.; Komninos, N.; Kakderi, C.; Tsarchopoulos, P.; and Panori, A. 2018. Enhancing sustainable urban development through smart city applications. Journal of Science and Technology Policy Management, 9(2): 146–169.

Ardabili, B.; Pazho, A.; Noghre, G.; Neff, C.; Bhaskararayuni, S.; Ravindran, A.; Reid, S.; and Tabkhi, H. 2023a. Understanding Policy and Technical Aspects of AI-Enabled Smart Video Surveillance to Address Public Safety. arXiv preprint arXiv:2302.04310.

Ardabili, B. R.; Pazho, A. D.; Noghre, G. A.; Neff, C.; Bhaskararayuni, S. D.; Ravindran, A.; Reid, S.; and Tabkhi, H. 2023b. Understanding Policy and Technical Aspects of AI-Enabled Smart Video Surveillance to Address Public Safety. Computational Urban Science, 3(1): 21.

Ashby, M. P. 2017. The value of CCTV surveillance cameras as an investigative tool: An empirical analysis. European Journal on Criminal Policy and Research, 23(3): 441–459.

Dahunsi, F.; Idogun, J.; and Olawumi, A. 2021. Commercial cloud services for a robust mobile application backend data storage. Indonesian Journal of Computing, Engineering and Design (IJoCED), 3(1): 31–45.

Dechouniotis, D.; Athanasopoulos, N.; Leivadeas, A.; Mitton, N.; Jungers, R.; and Papavassiliou, S. 2020. Edge computing resource allocation for dynamic networks: The DRUID-NET vision and perspective. Sensors, 20(8): 2191.

Franklin, R.; and Dabbagol, V. 2020. Anomaly detection in videos for video surveillance applications using neural networks. In 2020 Fourth International Conference on Inventive Systems and Control (ICISC), 632–637. IEEE.

Gandapur, M. 2022. E2E-VSDL: End-to-end video surveillance-based deep learning model to detect and prevent criminal activities. Image and Vision Computing, 123: 104467.

Hosseini, S. S.; Ardabili, B. R.; Azarbayjani, M.; Pulugurtha, S.; and Tabkhi, H. 2024. Understanding the Transit Gap: A Comparative Study of On-Demand Bus Services and Urban Climate Resilience in South End, Charlotte, NC and Avondale, Chattanooga, TN. arXiv preprint arXiv:2403.14671.

Huang, C.; Wu, Z.; Wen, J.; Xu, Y.; Jiang, Q.; and Wang, Y. 2021. Abnormal event detection using deep contrastive learning for intelligent video surveillance system. IEEE Transactions on Industrial Informatics, 18(8): 5171–5179.

Javed, A. R.; Shahzad, F.; ur Rehman, S.; Zikria, Y. B.; Razzak, I.; Jalil, Z.; and Xu, G. 2022. Future smart cities: Requirements, emerging technologies, applications, challenges, and future aspects. Cities, 129: 103794.

Jocher, G.; Chaurasia, A.; and Qiu, J. 2023. YOLO by Ultralytics (Version 8.0.0). https://github.com/ultralytics/ultralytics.

Ma, M.; Preum, S. M.; Ahmed, M. Y.; Tärneberg, W.; Hendawi, A.; and Stankovic, J. A. 2019. Data Sets, Modeling, and Decision Making in Smart Cities: A Survey. ACM Transactions on Cyber-Physical Systems, 4(2).

Mahdavinejad, M. S.; Rezvan, M.; Barekatain, M.; Adibi, P.; Barnaghi, P.; and Sheth, A. P. 2018. Machine learning for Internet of Things data analysis: A survey. Digital Communications and Networks, 4(3): 161–175.

Markovitz, A.; Sharir, G.; Friedman, I.; Zelnik-Manor, L.; and Avidan, S. 2020a. Graph embedded pose clustering for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10539–10547.

Markovitz, A.; Sharir, G.; Friedman, I.; Zelnik-Manor, L.; and Avidan, S. 2020b. Graph embedded pose clustering for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10539–10547.

Mohammed, S.; Fiaidhi, J.; Sawyer, D.; and Lamouchie, M. 2022. Developing a GraphQL SOAP Conversational Micro Frontends for the Problem Oriented Medical Record (QL4POMR). In Proceedings of the 6th International Conference on Medical and Health Informatics, 52–60.

Noghre, G. A.; Pazho, A. D.; Katariya, V.; and Tabkhi, H. 2023. Understanding the challenges and opportunities of pose-based anomaly detection. arXiv preprint arXiv:2303.05463.

Noghre, G. A.; Pazho, A. D.; and Tabkhi, H. 2024. An exploratory study on human-centric video anomaly detection through variational autoencoders and trajectory prediction. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 995–1004.

Pazho, A.; Noghre, G.; Ardabili, B.; Neff, C.; and Tabkhi, H. 2022. CHAD: Charlotte Anomaly Dataset. arXiv preprint arXiv:2212.09258.

Pazho, A. D.; Neff, C.; Noghre, G. A.; Ardabili, B. R.; Yao, S.; Baharani, M.; and Tabkhi, H. 2023a. Ancilia: Scalable intelligent video surveillance for the artificial intelligence of things. IEEE Internet of Things Journal.

Pazho, A. D.; Noghre, G. A.; Katariya, V.; and Tabkhi, H. 2024. VT-Former: An Exploratory Study on Vehicle Trajectory Prediction for Highway Surveillance through Graph Isomorphism and Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5651–5662.

Pazho, A. D.; Noghre, G. A.; Purkayastha, A. A.; Vempati, J.; Martin, O.; and Tabkhi, H. 2023b. A survey of graph-based deep learning for anomaly detection in distributed systems. IEEE Transactions on Knowledge and Data Engineering, 36(1): 1–20.

Pramanik, A.; Sarkar, S.; and Maiti, J. 2021. A real-time video surveillance system for traffic pre-events detection. Accident Analysis and Prevention, 154: 106019.

Rashvand, N.; Hosseini, S. S.; Azarbayjani, M.; and Tabkhi, H. 2023. Real-Time Bus Arrival Prediction: A Deep Learning Approach for Enhanced Urban Mobility. arXiv preprint arXiv:2303.15495.

S., J.; C., S.; E., Y.; and GP., J. 2021. Real time object detection and tracking system for video surveillance system. Multimedia Tools and Applications, 80: 3981–3996.

Singh, R.; Srivastava, H.; Gautam, H.; Shukla, R.; and Dwivedi, R. 2023. An Intelligent Video Surveillance System using Edge Computing based Deep Learning Model. In 2023 International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), 439–444. IEEE.

Sun, K.; Xiao, B.; Liu, D.; and Wang, J. 2019. Deep high-resolution representation learning for human pose estimation. In CVPR.

Usmani, U. A.; Watada, J.; Jaafar, J.; Aziz, I. A.; and Roy, A. 2023. A Deep Learning Algorithm to Monitor Social Distancing in Real-Time Videos: A Covid-19 Solution. In Interpretable Cognitive Internet of Things for Healthcare, 73–90. Springer.

Xu, R.; Nikouei, S.; Chen, Y.; Polunchenko, A.; Song, S.; Deng, C.; and Faughnan, T. 2018. Real-time human objects tracking for smart surveillance at the edge. In 2018 IEEE International Conference on Communications (ICC), 1–6. IEEE.

Yao, S.; Noghre, G. A.; Pazho, A. D.; and Tabkhi, H. 2024. Evaluating the Effectiveness of Video Anomaly Detection in the Wild: Online Learning and Inference for Real-world Deployment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 4832–4841.

Zanella, A.; Bui, N.; Castellani, A.; Vangelista, L.; and Zorzi, M. 2014. Internet of Things for smart cities. IEEE Internet of Things Journal, 1(1): 22–32.

Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; and Wang, X. 2022. ByteTrack: Multi-object tracking by associating every detection box. In Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII, 1–21. Cham: Springer Nature Switzerland.

Zhou, K.; Yang, Y.; Cavallaro, A.; and Xiang, T. 2019. Omni-scale feature learning for person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 3702–3712.