  Real-time social distancing detector using SocialdistancingNet-19
  deep learning network
  Rinkal Keniya · Ninad Mehendale
  Abstract The COVID-19 pandemic has undoubtedly brought the world to a halt. The world we lived in a few months ago is completely different from what it is now. The virus is spreading quickly and is a danger to the human race. Given the urgency of the situation, certain precautions must always be taken, one of which is social distancing. Maintaining social distancing during COVID-19 is a must to slow the growth rate of new cases. Our manuscript focuses on detecting whether the people in a scene are maintaining social distancing. We use our self-developed model, SocialdistancingNet-19, to detect the frame of each person and display labels: people are marked as unsafe if the distance between them is less than a certain value, and as safe otherwise. This system can be used for monitoring people via CCTV video surveillance. Our model achieved an accuracy of 92.8 %.

  Keywords Social distancing · Object detection · COVID

  1 Introduction

  COVID-19 is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was first detected in Wuhan, China, in December 2019 and has since spread across the world. The virus spreads mainly between individuals in close contact, including through tiny droplets formed when sneezing or coughing. Droplets that would otherwise fall to the ground can travel through the air and enter the body of another person. The infection is most contagious during the first three days. Typical symptoms include nausea, dry cough, and fatigue. Other symptoms may include sore throat and headache. The severe and harmful consequences for humans have contributed to a worldwide halt. It takes a fortnight for a person with mild symptoms to recover. The duration of recovery for individuals with severe symptoms depends on the extent of the symptoms and on the individual's immune capability. The main diagnostic approach is real-time reverse transcription polymerase chain reaction (rRT-PCR) on a nasopharyngeal swab. Chest CT imaging is also useful for diagnosing people with an elevated probability of infection based on signs and risk factors. Seeing the devastating spread of the disease, the World Health Organization (WHO) recommended social distancing. To slow the rate of spread of the disease, it is necessary to maintain physical distance. Maintaining a distance of two meters between two individuals is a must to remain safe and to get back to the world we lived in a few months ago. Following the outbreak of COVID-19, the CDC redefined social distancing as staying out of congregate settings, avoiding public gatherings, and maintaining, when possible, a distance of around six feet (two meters) from others. Recent findings have shown that droplets from a sneeze, or from deep breathing during exercise, can travel more than six meters. Hence, following the norm of social distancing is a necessity and is also to our benefit for a safer and healthier life. Our work proposes to determine whether or not an individual is following the rule of social distancing. The findings are verified using both a live stream and a video feed. By measuring the distance between the centroids of the frames of two people, we can determine whether or not a person is maintaining social distancing. Each person is also labelled as safe or unsafe.

  * Corresponding author
  N. Mehendale
  B-412, K. J. Somaiya College of Engineering, Mumbai, India
  Tel.: +91-9820805405
  E-mail: ninad@somaiya.edu
This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=3669311
  Fig. 1 A video stream or an image is fed as input to our self-developed model, SocialdistancingNet-19. Each person is
  detected and classified as maintaining social distancing or not, depending on the distance between two individuals. People
  are marked with frames of different colours, and a label is shown for each of them.
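
The distance check summarised in Fig. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the box format `(x, y, w, h)`, the function names `centroid` and `label_people`, and the 75-pixel threshold are all hypothetical choices made for the example. A person is labelled unsafe when the Euclidean distance from their centroid to any other centroid falls below the threshold.

```python
import math
from itertools import combinations


def centroid(box):
    """Return the centre (cx, cy) of a box given as (x, y, w, h).

    The (x, y, w, h) layout with (x, y) as the top-left corner is an
    assumption made for this sketch.
    """
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def label_people(boxes, min_distance_px):
    """Label each box 'unsafe' if its centroid lies closer than
    min_distance_px to any other centroid, otherwise 'safe'."""
    centres = [centroid(b) for b in boxes]
    unsafe = set()
    # Compare every unordered pair of detected people once.
    for i, j in combinations(range(len(centres)), 2):
        if math.dist(centres[i], centres[j]) < min_distance_px:
            unsafe.update((i, j))
    return ["unsafe" if i in unsafe else "safe" for i in range(len(boxes))]


# Three hypothetical detections: the first two stand close together.
boxes = [(10, 20, 40, 100), (60, 25, 40, 100), (400, 30, 40, 100)]
labels = label_people(boxes, min_distance_px=75)
print(labels)  # ['unsafe', 'unsafe', 'safe']
```

Because the threshold is expressed in pixels, it would need to be calibrated for each camera view; the sketch does not attempt any perspective correction.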
  Fig. 2 The model is first trained on the loaded dataset. The trained model is then loaded and used to detect objects in
  the image or video stream. Depending on the distance between people, frames are drawn around them along with labels
  indicating whether they are maintaining or violating social distancing.
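
As a small illustration of the data-loading step in Fig. 2, the sketch below partitions 295 image paths into the 60 % / 10 % / 30 % training, validation, and test sets reported in the methodology. The file names, the helper `split_dataset`, and the fixed random seed are assumptions made for the example; the paper does not state how the split was randomised.

```python
import random


def split_dataset(items, train_pct=60, val_pct=10, seed=0):
    """Shuffle items and split them into train / validation / test lists.

    Percentages are integers so the set sizes are computed exactly;
    whatever remains after train and validation goes to the test set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the 295 labelled images.
paths = [f"images/img_{i:03d}.jpg" for i in range(295)]
train, val, test = split_dataset(paths)
print(len(train), len(val), len(test))  # 177 29 89
```

Since 10 % of 295 is not a whole number, the validation set is rounded down and the remainder is assigned to the test set.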
  Fig. 3 SocialdistancingNet-19 has an architecture of 19 layers. The input image is passed through a convolution, a batch
  normalization, and a rectified linear unit (ReLU) layer. It is then passed through a single max pooling layer, two convolution
  layers, two batch normalization layers, two ReLU layers, and a single addition layer, followed by a single convolution, batch
  normalization, and ReLU layer. Finally, it is passed through a fully connected layer and a softmax layer, which yields the
  classification output.
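
The layer sizes reported in the methodology (224x224x3 input, then 112x112x64, 56x56x64, 56x56x32, 1x1x32, and 1x1x10) can be sanity-checked with the standard convolution output-size formula, out = floor((in + 2p - k) / s) + 1. The sketch below is a shape trace only, not the network itself. The kernel sizes, strides, and paddings are assumptions (ResNet-style choices) picked because they reproduce the reported sizes; the paper does not state them.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_hw=224):
    """Return (stage name, (height, width, channels)) for each stage.

    Channel counts come from the paper; kernel, stride, and padding
    values are assumptions made for this sketch.
    """
    shapes = [("input", (input_hw, input_hw, 3))]

    s = conv_out(input_hw, kernel=7, stride=2, padding=3)  # assumed 7x7, stride 2
    shapes.append(("conv + batch norm + ReLU", (s, s, 64)))

    s = conv_out(s, kernel=3, stride=2, padding=1)  # assumed 3x3 max pool, stride 2
    shapes.append(("max pool", (s, s, 64)))

    # Two size-preserving 3x3 convolutions feeding the addition layer (assumed).
    s = conv_out(conv_out(s, 3, 1, 1), 3, 1, 1)
    shapes.append(("2 x (conv + batch norm + ReLU) + addition", (s, s, 64)))

    s = conv_out(s, kernel=1, stride=1, padding=0)  # assumed 1x1 convolution
    shapes.append(("conv + batch norm + ReLU", (s, s, 32)))

    shapes.append(("global average pool", (1, 1, 32)))
    shapes.append(("fully connected + softmax", (1, 1, 10)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:45s} {shape}")
```

Other kernel and stride combinations could yield the same sizes, so the trace confirms only that the reported dimensions are mutually consistent.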
  Model                                        Accuracy (%)
  Yadav et al. [1]                             91
  Sener et al. [2]                             93.3
  Liu et al. [3] (SSD300)                      74.3
  Liu et al. [3] (SSD512)                      76.8
  ResNet-50                                    86.5
  ResNet-18                                    85.3
  SocialdistancingNet-19 (Proposed method)     92.8
  Table 1 Comparison of the accuracy of the different methods. SocialdistancingNet-19 (92.8 %) gave a higher accuracy than
  all the other models except that of Sener et al. [2] (93.3 %).
  2 Literature review

  Various research works on social distancing have been carried out using different techniques. Yadav et al. [1] proposed a system that used a Raspberry Pi 4 with a camera to automatically monitor public spaces in real time to prevent the spread of COVID-19. The model, trained on a custom dataset, was installed on the Raspberry Pi 4, and the camera was attached to it. The camera feeds real-time video of public places to the model on the Raspberry Pi 4, which continuously and automatically monitors public places, detects whether people keep a safe social distance, and also checks whether or not those people wear masks. Their method operates in two stages: first, when a person is identified without a mask, their photo is taken and sent to a control center at the State Police Headquarters; and second, when a social distancing violation by individuals is detected continuously for a threshold time, an alarm rings to instruct people to maintain social distance, and a critical alert is sent to the control center of the State Police Headquarters for further action. They achieved an accuracy of 91 %. Singh Punn et al. [4] proposed a real-time deep learning framework to monitor social distancing using object detection and tracking approaches. The number of violations was obtained by computing the number of groups formed, and a violation index was computed as the ratio of the number of people to the number of groups. Different object detection models were used, namely Faster RCNN, SSD, and YOLO v3, of which YOLO v3 showed a balanced performance in terms of FPS and mAP score. An AI-based real-time system using a monocular camera to monitor social distancing was proposed by Yang et al. [5]. The proposed method uses a
  Fig. 4 Results obtained when input video and images were given to the model. In (a) and (b), people were detected as
  maintaining social distancing or not depending on the distance between two individuals, and were marked with frames of
  different colours. A green frame marks those violating social distancing, labelled as unsafe. A purple frame marks those
  maintaining social distancing, labelled as safe. In (c) and (d), people were detected and frames were marked according to
  the distance between two individuals. The number of violations was also counted.
  critical social density to avoid overcrowding by modulating inflow to the region of interest. The method was verified using three different pedestrian crowd datasets. There were some missed detections in the train station dataset, because in some areas the density of pedestrians is very high and occlusion occurs. However, after some analysis, they concluded that most pedestrians were captured and that the idea of social density is valid. In the method proposed by Sener et al. [2], the motion of the interacting people was extracted from the region of each detected individual. Visual descriptors for two persons were then created. As the relative spatial positions of interacting people are likely to complement the visual descriptors, they proposed to use spatial multiple instance embedding, which implicitly integrates the distances between people into the multiple instance learning process. Experimental findings on two benchmark datasets validated that the use of two-person visual descriptors together with spatial multiple instance learning provides an efficient way to infer the type of interaction. They achieved an accuracy of 93.3 %. Bielecki et al. [6] studied 508 male soldiers with an average age of 21 years. They followed the soldiers in two groups. Of the 354 soldiers infected before social distancing was introduced, 30 % became sick from COVID-19, whereas no soldier became sick in the group of 154, in which infections occurred after social distancing had been introduced. An innovative localization method was proposed by Nadikattu et al. [7] to track people's positions in the surroundings based on sensors. This AI smart device is not only handy for maintaining social distancing but also detects
  symptoms of COVID-19 in a person, if any. The system warns the user if anyone comes within the vital six-foot radius. Ghorai et al. [8] proposed a deep learning solution that alerts a person as soon as they violate social distancing. A video stream is captured from the CCTV camera, people are detected with the PoseNet model, and the number of people present in the video stream is tracked. If the distance between the frames of two people is less than the allowed value, the authorities in charge are alerted. Ramadass et al. [9] proposed a drone that uses deep learning techniques to inspect social distancing and also to check whether a person is wearing a mask. The YOLOv3 algorithm, trained on a custom dataset, is installed on the camera of the drone. The drone camera runs the YOLOv3 algorithm and determines whether or not social distance is maintained and whether the individuals in the crowd are wearing masks. The drone is designed to operate automatically. Reluga et al. [10] proposed a differential game to model how individuals adopt social distancing and associated self-protective behaviors during an outbreak. The differential game is used to study the potential value of social distancing as a mitigation measure by calculating the equilibrium behaviors under several cost functions. Numerical methods are used to calculate the total cost of an epidemic under equilibrium behaviors as a function of the time until mass vaccination, following detection of the outbreak. The main parameters in the study are the basic reproduction number and the baseline efficiency of social distancing. To slow the spread of the COVID-19 virus via airborne transmission, a "social distancing" policy of around 1.83 m (6 feet) was recommended; its effectiveness was studied numerically by Feng et al. [11]. It was found that the wind effect on droplet transport and deposition is dynamic and depends strongly on the local wake flow patterns, the secondary flow intensities between the two simulated humans, and calm-air currents. High relative humidity (RH = 99.5 %) leads to higher deposition fractions on both human bodies and the ground, which is not necessarily related to higher exposure risks. High RH = 99.5 % can enhance the condensation effect, and the cough droplets keep growing during their transport in the air until the partial pressure at the droplet surface equals the saturation pressure of water vapor. In contrast, RH = 40 % triggers the evaporation of the water in cough droplets, thereby reducing droplet size, which may lead to a longer suspension time in the air. Venkateswaran et al. [12] proposed a System Dynamics (SD) model of the spread of the COVID-19 pandemic in India. The model is an age-structured, compartment-based approach that explores different modes of disease propagation and greatly extends the traditional SEIR approach. The model was adapted for India using the appropriate population pyramid, contact rate matrices, and external arrivals. They also specifically modelled the effects of measures such as contact tracing, isolation of COVID-positive patients, quarantining, use of masks, better hygiene practices, and social distancing, with contact rates in various places such as home, college, school, and other locations. The simulation results suggest that a non-trivial number of infections will remain even after a prolonged lockdown and that the pandemic will resurface. Liu et al. [3] presented a method for detecting objects in images using a single deep neural network. The model, named single shot multibox detector (SSD), discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. The results on the PASCAL VOC, COCO, and ILSVRC datasets showed that SSD has accuracy competitive with methods that use an additional object proposal step and is much faster, while providing a unified framework for both training and inference. The accuracy was 74.3 % for SSD300 and 76.8 % for SSD512.

  3 Methodology

  We loaded 295 images from the dataset, where each image contained one or more labels that were used for training the model. Further images and labels were generated using an auxiliary dataset. The auxiliary dataset consists of variations of the images in terms of rotation (-5 to +5), scaling (0.95 to 1), and cropping (0.95 to 1). The dataset was then stored in two columns: the first holds the image file path and the second the corresponding label. The dataset was then split into three parts: 60 % was selected for training, 10 % for validation, and the remaining 30 % for testing the trained detectors. We used the SocialdistancingNet-19 architecture for training. Box labels were used to create the data for training and evaluation, with a rectangular box marking each object.

  The network comprises two sub-networks: feature extraction and feature detection. The feature extraction was carried out by a pre-trained convolutional neural network (CNN) model. We also used reduced ResNet-50, MobileNet-V2, and ResNet-18 networks. The detection sub-network is a small CNN compared to the feature extraction network and is composed of a few convolutional layers specific to the YOLO object detection model. The YOLO detection model is a single-stage detector. The algorithm treats object recognition as a regression problem, taking a given input image or video stream and simultaneously learning the bounding box coordinates and the corresponding class label probabilities. YOLO has three tuning parameters: network input size, anchor boxes, and feature extraction network.

  First, the people in the frame are detected. We then compute the bounding box coordinates and derive the center of each bounding box. The top-left coordinates are derived from the box coordinates. The frame is then processed to give three results for each person: confidence, bounding box, and centroid. The Euclidean distance between the centroids is calculated. The distance between the centroids of two individuals is then compared with the minimum distance in terms of pixels. The pairs are marked as red or green depending on whether or not they have violated social distancing.

  The user specifies the input size and the number of classes when choosing a network. By using the minimum input size for the network, the size of the training images and the computational cost were optimized. We tried to find the best model for the given input size and set of training images, and to optimize it to handle larger datasets than the current one. SocialdistancingNet-19 has an architecture of 19 layers. The network is fed with an input image of dimension 224x224x3. The image is passed through a convolution, a batch normalization, and a rectified linear unit (ReLU) layer, each of dimension 112x112x64. It is then passed through a single max pooling layer, two convolution layers, two batch normalization layers, two ReLU layers, and a single addition layer, each of dimension 56x56x64. Next, it is passed through a single convolution, batch normalization, and ReLU layer, each of dimension 56x56x32, and then through a global average pooling layer of dimension 1x1x32. Finally, it is passed through a fully connected layer and a softmax layer, each of dimension 1x1x10, which yields the classification output.

  To reduce the computational cost, an input size of 224x224x3 was used, which was the minimum size required to run the network. Image resizing was the only pre-processing operation required before training. To account for the resizing of the images before training, the training data used for estimating the anchor boxes was resized as well. The training data was transformed accordingly, and the anchor boxes were then estimated directly from the resized, pre-processed data. The ReLU activation layer 40 was selected as the feature extraction layer, and the layers after it were replaced with the detection sub-network. The feature extraction layer outputs feature maps that are downsampled by a factor of 16. This amount of downsampling is a good trade-off between spatial resolution and the strength of the extracted features: features extracted further down the network encode stronger image features at the cost of spatial resolution.

  Data augmentation was carried out to improve the accuracy by randomly transforming the data during training. Data augmentation adds more variety to the training data and effectively increases the number of labelled training samples. The transform used for augmentation during training randomly flips the images horizontally, and the associated box labels are flipped with them. Augmentation was not performed on the validation and test data, so the evaluation is unbiased because the data is unmodified. An NVIDIA GTX 1660 GPU with 1408 CUDA cores, 6 GB of GDDR5 memory, and a 192-bit memory bus was used to train the network.

  4 Results and discussion

  The accuracy of the developed model, SocialdistancingNet-19, was 92.8 %. The accuracy of the ResNet-50 network was 86.5 %, and that of ResNet-18 was 85.3 %. We tested our model using a video stream and images, and observed proper detection of people according to the distance between each pair. The frames were labelled as safe or unsafe accordingly. The number of violations was also counted and constantly updated. When using the webcam, people need to keep moving continuously, otherwise the detection becomes incorrect. This could be due to the detection method, in which the entire frame is processed before the distance between the centroids is calculated and compared. The results obtained by the model are displayed in Fig. 4. The purple and green frames displayed along with the labels indicate whether or not the person is maintaining social distancing. Table 1 compares the accuracy of the different models we tested with that of the models found in the literature. The maximum accuracy was 93.3 % and the minimum accuracy was 74.3 %.

  5 Conclusions

  Our work identifies social distancing patterns and classifies each of them as either a violation of social distancing or
  maintaining the social distancing norm. Additionally, it displays labels according to the object detection. The classifier was implemented for both live video streams and images. This system can be used with CCTV for surveillance of people during pandemics. Mass screening is possible, and hence the system can be used in crowded places like railway stations, bus stops, markets, streets, mall entrances, schools, colleges, etc. By monitoring the distance between two individuals, we can make sure that individuals are maintaining social distancing correctly, which will help to curb the virus.

  6 Acknowledgement

  The authors would like to thank all colleagues from the COVID research group.

  Compliance with Ethical Standards

  Conflicts of interest

  Authors R. Keniya and N. Mehendale declare that they have no conflict of interest.

  Involvement of human participant and animals

  This article does not contain any studies with animals or humans performed by any of the authors. All the necessary permissions were obtained from the Institute Ethical Committee and concerned authorities.

  Information about informed consent

  No informed consent was required as the study does not involve any human participants.

  Funding information

  No funding was involved in the present work.

  References

   1. S. Yadav, Deep learning based safe social distancing and face mask detection in public areas for COVID-19 safety guidelines adherence
   2. F. Sener, N. Ikizler-Cinbis, Two-person interaction recognition via spatial multiple instance embedding, Journal of Visual Communication and Image Representation 32, 63 (2015)
   3. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.Y. Fu, A.C. Berg, SSD: Single shot multibox detector, in European conference on computer vision (Springer, 2016), pp. 21–37
   4. N. Singh Punn, S.K. Sonbhadra, S. Agarwal, Monitoring COVID-19 social distancing with person detection and tracking via fine-tuned YOLO v3 and DeepSORT techniques, arXiv pp. arXiv–2005 (2020)
   5. D. Yang, E. Yurtsever, V. Renganathan, K.A. Redmill, U. Ozgüner, A vision-based social distancing and critical density detection system for COVID-19, arXiv e-prints pp. arXiv–2007 (2020)
   6. M. Bielecki, R. Züst, D. Siegrist, D. Meyerhofer, G.A.G. Crameri, Z.G. Stanga, A. Stettbacher, T.W. Buehrer, J.W. Deuel, Social distancing alters the clinical course of COVID-19 in young adults: A comparative cohort study, Clinical Infectious Diseases (2020)
   7. R.R. Nadikattu, S.M. Mohammad, P. Whig, Novel economical social distancing smart device for COVID-19, International Journal of Electrical Engineering and Technology (IJEET) (2020)
   8. A. Ghorai, S. Gawde, D. Kalbande, Digital solution for enforcing social distancing, Available at SSRN 3614898 (2020)
   9. L. Ramadass, S. Arunachalam, Z. Sagayasree, Applying deep learning algorithm to maintain social distance in public place through drone technology, International Journal of Pervasive Computing and Communications (2020)
  10. T.C. Reluga, Game theory of social distancing in response to an epidemic, PLoS Comput Biol 6(5), e1000793 (2010)
  11. Y. Feng, T. Marchal, T. Sperry, H. Yi, Influence of wind and relative humidity on the social distancing effectiveness to prevent COVID-19 airborne transmission: A numerical study, Journal of Aerosol Science p. 105585 (2020)
  12. J. Venkateswaran, O. Damani, Effectiveness of testing, tracing, social distancing and hygiene in tackling COVID-19 in India: A system dynamics model, arXiv preprint arXiv:2004.08859 (2020)