Accident Detection
ABSTRACT Car accidents cause a large number of deaths and disabilities every day, a certain proportion of which result from untimely treatment and secondary accidents. To some extent, automatic car accident detection can shorten the response time of rescue agencies and of vehicles around the accident, thereby improving rescue efficiency and traffic safety. In this paper, we propose an automatic car accident detection method based on Cooperative Vehicle Infrastructure Systems (CVIS) and machine vision. First, a novel image dataset, CAD-CVIS, is established to improve the accuracy of accident detection based on intelligent roadside devices in CVIS. In particular, CAD-CVIS consists of various accident types, weather conditions and accident locations, which improves the self-adaptability of accident detection methods among different traffic situations. Secondly, we develop a deep neural network model, YOLO-CA, based on CAD-CVIS and deep learning algorithms to detect accidents. In the model, we utilize Multi-Scale Feature Fusion (MSFF) and a loss function with dynamic weights to enhance the performance of detecting small objects. Finally, our experimental study evaluates the performance of YOLO-CA for detecting car accidents, and the results show that our proposed method can detect a car accident in 0.0461 seconds (21.6 FPS) with 90.02% average precision (AP). In addition, we compare YOLO-CA with other object detection models, and the results demonstrate a comprehensive performance improvement in accuracy and real-time performance over these models.
INDEX TERMS Car accident detection, CVIS, machine vision, deep learning
video features in traffic accidents, such as vehicle collisions, rollovers and so on. To some extent, these features can be used to detect or predict car accidents. Accordingly, some researchers have applied machine vision technology based on deep learning to car accident detection. These methods extract and process complex image features instead of a single vehicle motion parameter, which improves the accuracy of detecting car accidents. However, the datasets used by these methods are mostly captured by car cameras or the cell phones of pedestrians, which is not suitable for roadside devices in CVIS. In addition, the reliability and real-time performance of these methods need to be improved to meet the requirements of car accident detection.

In this paper, we propose a data-driven car accident detection method based on CVIS, whose goal is to improve the efficiency and accuracy of car accident response. With this goal, we focus on a general application scenario: when there is an accident on the road, roadside intelligent devices recognize and locate it efficiently. First, we build a novel dataset, the Car Accident Detection for Cooperative Vehicle Infrastructure System (CAD-CVIS) dataset, which is more suitable for car accident detection based on roadside intelligent devices in CVIS. Then, a deep learning model, YOLO-CA, is developed on CAD-CVIS to detect car accidents. In particular, we optimize the network of the traditional deep learning model YOLO [21] to build the YOLO-CA network, which is more accurate and faster in detecting car accidents. In addition, considering the wide shooting scope of roadside cameras in CVIS, a multi-scale feature fusion method and a loss function with dynamic weights are utilized to improve the performance of detecting small objects.

The rest of this paper is organized as follows: Section 2 gives an overview of related work. We present the details of our proposed method in Section 3. The performance evaluation is discussed in Section 4. Finally, Section 5 concludes this paper.

II. RELATED WORK
Car accident detection and notification is a challenging issue and has attracted a lot of attention from researchers, who have proposed and applied various car accident detection methods. In general, car accident detection methods are divided into two kinds: those based on the vehicle running condition and those based on accident video features.

A. METHOD BASED ON VEHICLE RUNNING CONDITION
When an accident occurs, the motion state of the vehicle changes dramatically. Therefore, many researchers have proposed accident detection methods that monitor motion parameters such as acceleration and velocity. [22] used the On Board Diagnosis (OBD) system to monitor speed and engine status to detect a crash, and utilized a smartphone to report the accident over Wi-Fi or a cellular network. [23] developed an accident detection and reporting system using GPS, GPRS, and GSM; the speed of the vehicle obtained from a high-sensitivity GPS receiver is used as the index for detecting accidents, and the GSM/GPRS modem is utilized to send the location of the accident. [24] presented a prototype system called e-NOTIFY, which monitors changes in acceleration to detect accidents and utilizes V2X communication technologies to report them. To a certain extent, these methods can detect and report car accidents in a short time and improve the efficiency of car accident warning. However, the vehicle running condition before a car accident is complex and unpredictable, and the accuracy of accident detection based only on speed and acceleration may be low. In addition, these methods rely heavily on vehicular monitoring and communication equipment, which may be unreliable or damaged in some extreme circumstances, such as heavy canopy, underground tunnels, and serious car accidents.

B. METHOD BASED ON VIDEO FEATURES
With the development of machine vision and artificial neural network technology, more and more applications based on video processing have been applied in transportation and vehicle fields. Against this background, some researchers have utilized video features of car accidents to detect them. [25] presented a Dynamic-Spatial-Attention Recurrent Neural Network (RNN) for anticipating accidents in dashcam videos, which can predict accidents about 2 seconds before they occur with 80% recall and 56.14% precision. [26] proposed a car accident detection system based on first-person videos, which detects anomalies by predicting the future locations of traffic participants and then monitoring the prediction accuracy and consistency metrics. These methods also have some limitations because of the low penetration of vehicular intelligent devices and shielding effects between vehicles.

There are also other methods which use roadside devices instead of vehicular equipment to obtain and process video. [27] proposed a novel accident detection system at intersections, which composes background images from image sequences and detects accidents using a Hidden Markov Model. [28] outlined a novel method for modeling the interaction among multiple moving objects, and used the Motion Interaction Field to detect and localize car accidents. [29] proposed an approach for automatic road accident detection based on detecting damaged vehicles in footage received from surveillance cameras installed on roads; in this method, Histogram of Oriented Gradients (HOG) and Gray Level Co-occurrence Matrix features were used to train support vector machines. [30] presented a novel dataset for car accident analysis based on traffic Closed-Circuit Television (CCTV) footage, and combined Faster Region-based Convolutional Neural Network (R-CNN) and Context Mining to detect and predict car accidents; this method achieved 1.68 seconds in terms of the Time-To-Accident measure with an Average Precision of 47.25%. [8] proposed a novel framework for automatic car accident detection, which learns feature representations from spatio-temporal volumes of raw pixel intensity instead of traditional hand-crafted features. The experiments in [8] demonstrated that it
can detect on average 77.5% of accidents correctly with 22.5% false alarms.

Compared with the methods based on the vehicle running condition, these video-based methods improve detection accuracy, and some of them can even predict accidents about 2 seconds before they occur. To some extent, these methods are significant in decreasing the accident rate and improving traffic safety. However, the detection accuracy of these methods is still low and the error rate is high, and wrong accident information will have a great impact on normal traffic flow. Concerning the core issues mentioned above, and in order to avoid the drawbacks of vehicular cameras, our proposed method utilizes roadside intelligent edge devices to obtain traffic video and process images. Moreover, to improve the accuracy of accident detection based on intelligent roadside devices, we establish the CAD-CVIS dataset based on video sharing websites, which consists of various accident types, weather conditions and accident locations. Furthermore, we develop the YOLO-CA model, which combines deep learning algorithms and the MSFF method, to improve reliability and real-time performance among different traffic conditions.

III. METHODS
A. METHOD OVERVIEW

FIGURE 1. The application scenario of the automatic car accident detection method based on CVIS.

Fig. 1 shows the application principle of our proposed car accident detection method based on CVIS. Firstly, the car accident detection application program with the YOLO-CA model, which is developed based on CAD-CVIS and deep learning algorithms, is deployed on the edge server. Then the edge server receives and processes the real-time images captured by roadside cameras. Finally, the roadside communication unit broadcasts accident emergency messages to the relevant vehicles and rescue agencies over DSRC and 5G networks. In the rest of this section, we present the details of the CAD-CVIS dataset and the YOLO-CA model.
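To make the Fig. 1 pipeline concrete, the following is a minimal sketch of the edge-side detection loop. The detector object yolo_ca, the RSU interface rsu.broadcast_alert, and the confidence threshold are illustrative assumptions rather than parts of the released system:

```python
import cv2  # OpenCV is assumed for reading the roadside camera stream


def run_edge_detection(stream_url, yolo_ca, rsu, score_threshold=0.5):
    """Continuously run accident detection on a roadside camera stream."""
    capture = cv2.VideoCapture(stream_url)
    while True:
        ok, frame = capture.read()
        if not ok:
            break  # stream ended or camera unavailable
        # Assumed interface: the detector returns (bounding_box, confidence)
        # pairs for the accident class in the current frame.
        detections = yolo_ca.detect(frame)
        for box, confidence in detections:
            if confidence >= score_threshold:
                # The RSU broadcasts the accident location to nearby vehicles
                # and rescue agencies over DSRC/5G.
                rsu.broadcast_alert(location=box, confidence=confidence)
    capture.release()
```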
B. CAD-CVIS

1) Data collection and annotation
There are two major challenges in collecting car accident data: (1) Access: access to roadside traffic camera data is often limited. In addition, accident data from transportation administrations is often not available for public use for legal reasons. (2) Abnormality: car accidents are rare on the road compared with normal traffic conditions. In this work, we draw support from video sharing websites to search for videos and images that include car accidents, such as news reports and documentaries. In order to improve the applicability of our proposed method to roadside edge devices, we only pick out the videos and images captured from traffic CCTV footage.

FIGURE 2. Data collection and annotation for the CAD-CVIS dataset.

Through the above steps, we obtain 633 car accident scenes, 3255 accident key frames and 225206 normal frames. Moreover, the car accident scene only occupies a small part of each accident frame. We utilize LabelImg [31] to annotate the location of the accident in each frame in detail to enhance the accuracy of locating accidents. The high accuracy enables emergency messages to be sent more efficiently to the vehicles travelling in the same direction as the accident and decreases the impact on the vehicles travelling in the opposite direction. The whole data collection and annotation procedure is shown in Fig. 2. The CAD-CVIS dataset is made available for research use through https://github.com/zzzzzzc/Car-accident-detection.
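LabelImg exports bounding boxes as Pascal VOC-style XML by default; the sketch below shows how such an annotation file could be parsed into training targets. The class name "accident" and the file layout are assumptions about CAD-CVIS rather than details stated in the paper:

```python
import xml.etree.ElementTree as ET


def load_voc_boxes(xml_path, class_name="accident"):
    """Parse a LabelImg (Pascal VOC) XML file and return accident boxes.

    Returns the image size and a list of (xmin, ymin, xmax, ymax) tuples
    for objects whose label matches class_name.
    """
    root = ET.parse(xml_path).getroot()
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    boxes = []
    for obj in root.iter("object"):
        if obj.find("name").text != class_name:
            continue
        bb = obj.find("bndbox")
        boxes.append((
            int(float(bb.find("xmin").text)),
            int(float(bb.find("ymin").text)),
            int(float(bb.find("xmax").text)),
            int(float(bb.find("ymax").text)),
        ))
    return (width, height), boxes
```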
2) Statistics of the CAD-CVIS
Statistics of the CAD-CVIS dataset can be found in Fig. 3. It can be seen that the CAD-CVIS dataset includes various types of car accidents, which can improve the adaptability of our method to different conditions. According to the number of vehicles involved in the accident, the CAD-CVIS dataset includes 323 Single Vehicle Accident frames, 2449 Double Vehicle Accident frames and 483 Multiple Vehicle Accident frames. Moreover, the CAD-CVIS dataset covers a variety of weather conditions, with 2769 accident frames under sunny conditions, 268 frames under foggy conditions, 52 accident frames under rainy conditions and 166 accident frames under snowy conditions. Besides, there are 2588 frames of accidents in the daytime and 667 accident frames at night. In addition, the CAD-CVIS dataset contains 2281 frames of accidents occurring at intersections, 596 frames on urban roads, 189 frames on expressways and 189 frames on highways.

A comparison between CAD-CVIS and related datasets can be found in Table 1. Column A in Table 1 indicates that the dataset contains annotations of car accidents, R indicates that the videos and frames were captured from roadside CCTV footage, and M indicates that the dataset contains multiple road conditions. Compared with CUHK Avenue [32], UCSD Ped2 [33] and DAD [25], CAD-CVIS contains more car accident scenes, which can improve the adaptability of models trained on CAD-CVIS. Moreover, the frames of CAD-CVIS are all captured from roadside CCTV footage, which makes it more suitable for accident detection methods based on intelligent roadside devices in CVIS.
FIGURE 3. Number of accident frames in CAD-CVIS categorized by different indexes. (a) Accident type (b) Weather condition (c) Accident time (d) Accident location

TABLE 1. Comparison between CAD-CVIS and related datasets

    Dataset name    Scenes    Frames or Duration     A     R     M
    UCSD Ped2         77      1636 frames            no    yes   no
    CUHK Avenue       47      3820 frames            no    no    yes
    DAD              620      2.4 hours              yes   no    yes
    CADP            1416      5.2 hours              no    yes   yes
    CAD-CVIS         632      3255+225206 frames     yes   yes   yes
C. OUR PROPOSED DEEP NEURAL NETWORK MODEL
In the task of car accident detection, we must not only judge whether there is a car accident in the image but also accurately locate it, because an accurate location guarantees that the RSU can broadcast the emergency message to the vehicles affected by the accident. Classification and location algorithms can be divided into two kinds: (1) Two-stage models, such as R-CNN [34], Fast R-CNN [35], Faster R-CNN [36] and Faster R-CNN with FPN [37]. These algorithms utilize selective search or a Region Proposal Network (RPN) to select about 2000 proposal regions in the image, and then detect objects from the features of these regions extracted by a CNN. These region-based models locate objects accurately, but extracting proposals takes a great deal of time. (2) One-stage models, which provide an end-to-end detection service. By eliminating the process of selecting proposal regions, these algorithms are very fast while still guaranteeing accuracy. Considering that accident detection requires high real-time performance, we design our deep neural network based on the one-stage model YOLO [21].
1) Network Design
YOLO utilizes its particular CNN to complete the classification and location of multiple objects in an image at one time. In the training process of YOLO, each image is divided into S × S grids. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object [39]. This design improves the detection speed dramatically, and detection accuracy benefits from the reference to global features. However, it also causes serious detection errors when there is more than one object in a grid. Roadside cameras have a wide shooting scope, so the accident area may be small in the image. Inspired by the multi-scale feature fusion (MSFF) network, and in order to improve the performance of the model in detecting small objects, we utilize 24 layers to achieve image upsampling and obtain two output tensors of different dimensions. This new car accident detection model is called YOLO-CA, and its network structure is shown in Fig. 4.

FIGURE 4. The network structure of YOLO-CA (input 416x416x3; DBL blocks with upsampling (UpS) and concatenation (Concat); output tensors at two scales, e.g. 26x26x18).
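As a concrete illustration of the grid-based assignment described above, the sketch below maps a ground-truth box center to its responsible grid cell; the grid size S and the image size are illustrative values, not the exact configuration of YOLO-CA:

```python
def responsible_cell(box_center_x, box_center_y, image_w, image_h, S=13):
    """Return the (row, col) of the grid cell responsible for a ground-truth box.

    The image is divided into S x S cells; the cell containing the box center
    is responsible for predicting that object, as in YOLO-style training.
    """
    col = int(box_center_x / image_w * S)
    row = int(box_center_y / image_h * S)
    # Clamp in case the center lies exactly on the right/bottom border.
    col = min(col, S - 1)
    row = min(row, S - 1)
    return row, col


# Example: in a 416x416 image, an accident box centered at (300, 120)
# falls into cell (3, 9) when S = 13.
print(responsible_cell(300, 120, 416, 416, S=13))
```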
As shown in Fig. 4, YOLO-CA is composed of 228 neural network layers, and the number of each kind of layer is shown in Table 2. These layers constitute several kinds of basic components of the YOLO-CA network, such as DBL and ResN. DBL is the minimum component of the YOLO-CA network, composed of a Convolution layer, a Batch Normalization layer and a Leaky ReLU layer. ResN consists of a Zero Padding layer, a DBL and N Resblock_units [40], and is designed to avoid the neural network degradation caused by increased depth. UpS in Fig. 4 is the upsampling layer, which is utilized to improve the performance of YOLO-CA in detecting small objects. Concat is the concatenation layer, which is used to concatenate a layer of Darknet-53 with the upsampling layer.

TABLE 2. Composition of YOLO-CA Network

    Layer name             Number
    Input                       1
    Convolution                65
    Batch Normalization        65
    Leaky ReLU                 65
    Zero Padding                5
    Add                        23
    Upsampling                  1
    Concatenate                 1
    Total                     228
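The DBL component described above (Convolution, Batch Normalization, Leaky ReLU) can be written compactly; the following PyTorch sketch assumes common YOLOv3-style choices (bias-free convolution, 0.1 negative slope) that the paper does not specify:

```python
import torch.nn as nn


class DBL(nn.Module):
    """DBL block: Convolution + Batch Normalization + Leaky ReLU.

    The negative slope and the bias-free convolution follow common YOLOv3
    practice and are assumptions, not values given in the paper.
    """

    def __init__(self, in_channels, out_channels, kernel_size, stride=1):
        super().__init__()
        padding = kernel_size // 2  # keep spatial size when stride == 1
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=padding, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


# Example: a first DBL of a Darknet-53-style backbone maps a 416x416x3
# input to 32 feature channels with a 3x3 convolution.
first_dbl = DBL(in_channels=3, out_channels=32, kernel_size=3)
```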
2) Detection principle
Fig. 5 shows the detection principle of YOLO-CA, which includes extracting the feature map and predicting bounding boxes.
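Since YOLO-CA inherits the YOLO detection head, its bounding-box prediction presumably follows the standard YOLO-style decoding; the sketch below illustrates that decoding under assumed anchor sizes, not the exact parameterization of YOLO-CA:

```python
import math


def decode_box(tx, ty, tw, th, cell_col, cell_row, anchor_w, anchor_h,
               grid_size, image_size):
    """Decode raw YOLO-style offsets into an absolute box (cx, cy, w, h).

    tx and ty are passed through a sigmoid so the center stays inside its cell;
    tw and th scale the anchor box exponentially, as in YOLOv2/v3.
    """
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    stride = image_size / grid_size          # pixels covered by one cell
    cx = (cell_col + sigmoid(tx)) * stride   # box center x in pixels
    cy = (cell_row + sigmoid(ty)) * stride   # box center y in pixels
    w = anchor_w * math.exp(tw)              # box width in pixels
    h = anchor_h * math.exp(th)              # box height in pixels
    return cx, cy, w, h
```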
    Loss = (1/b) * Σ_{k=1}^{b} Loss_img_k                                   (7)

where the total loss is the average of the per-image losses Loss_img_k over the b images of a training batch.
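As a minimal illustration of Eq. (7), assuming the per-image losses have already been computed, the batch loss is simply their mean:

```python
def batch_loss(per_image_losses):
    """Eq. (7): average the per-image losses Loss_img_k over a batch of b images."""
    b = len(per_image_losses)
    return sum(per_image_losses) / b


# Example: a batch of 4 images with individual losses.
print(batch_loss([1.8, 2.3, 0.9, 1.4]))  # -> 1.6
```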
[Figure residue: training curves of Recall, IoU and Loss versus training batches, and a precision-recall comparison including Fast R-CNN (AP = 77.65%). On the training set, IoU finally stabilizes above 0.8.]
TABLE 4. AP and IoU results of different models among different scales of object

In order to compare and analyze the performance of the models in detail, the objects of the test set are divided into three parts according to object scale: (1) Large: the area of the object is larger than one tenth of the image size. (2) Medium: the area of the object falls within the interval [1/100, 1/10] of the image size. (3) Small: the area of the object is less than one hundredth of the image size.
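A minimal sketch of this partition, assuming the size ratio is the object box area divided by the image area:

```python
def object_scale(box_w, box_h, image_w, image_h):
    """Classify an object as 'large', 'medium' or 'small' by its area ratio."""
    ratio = (box_w * box_h) / (image_w * image_h)
    if ratio > 1 / 10:
        return "large"
    if ratio >= 1 / 100:
        return "medium"
    return "small"


# Example: a 60x50 accident box in a 416x416 frame covers about 1.7% of the
# image area, so it is classified as a medium object.
print(object_scale(60, 50, 416, 416))
```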
Table 4 shows the AP and IoU results of the seven models among the different object scales. We can intuitively see that the scale of objects significantly affects the accuracy and locating performance of the detection models. It can be found that our proposed YOLO-CA has obvious advantages in AP and Average IoU over Fast R-CNN, Faster R-CNN and YOLOv3 without MSFF, especially for small objects. There is no MSFF process in these three models, so they detect objects relying only on top-level features. However, although top-level features contain rich semantic information, the location information of objects in them is rough, which does not help to locate the bounding box of objects correctly. On the contrary, there is little semantic information in low-level features with high resolution, but the location information of objects is precise.
The dynamic weights in the loss function of YOLO-CA balance the loss among different scales of objects. This process increases the error punishment for small objects, because the same errors in x, y, w, h have a more serious impact on the detection of a small object than on that of a large object. Consequently, YOLO-CA has obvious advantages over YOLOv3 in AP and Average IoU for small objects. The MSFF processes of Faster R-CNN with FPN and YOLO-CA are similar: a feature pyramid network is used to extract feature maps of different scales and fuse them to obtain features with high semantics and high resolution. Faster R-CNN utilizes the RPN to select about 20000 proposal regions, whereas there are only 13*13*3 + 26*26*3 = 2535 candidate bounding boxes in YOLO-CA. This difference gives Faster R-CNN a slight advantage in accuracy over YOLO-CA, but also causes serious disadvantages in real-time performance.
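The excerpt does not reproduce the exact dynamic-weight formula; a common way to realize such a weight, assumed in the sketch below, is to scale each box's coordinate loss by a factor such as 2 - w*h (as in typical YOLOv3-style implementations), which grows as the normalized box area shrinks:

```python
def weighted_coord_loss(pred, target):
    """Coordinate loss with a dynamic weight that emphasizes small boxes.

    pred and target are (x, y, w, h) tuples with w, h normalized to [0, 1].
    The factor (2 - w*h) is an assumption borrowed from common YOLOv3-style
    implementations, not the exact formula used by YOLO-CA.
    """
    x, y, w, h = target
    scale = 2.0 - w * h  # close to 2 for tiny boxes, close to 1 for huge ones
    return scale * sum((p - t) ** 2 for p, t in zip(pred, target))


# A box covering 0.25% of the image gets nearly double the weight of a
# box covering the whole image, for the same coordinate error.
print(weighted_coord_loss((0.52, 0.5, 0.06, 0.06), (0.5, 0.5, 0.05, 0.05)))
```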
Due to the time cost of selecting proposal regions with the RPN, Faster R-CNN achieves only about 3.5 FPS on the test set (Faster R-CNN: 3.5, Faster R-CNN with FPN: 3.6).

Although Faster R-CNN obtains a significant improvement in real-time performance compared with Fast R-CNN, there is still a big gap with the one-stage models. That is because one-stage models abandon the process of selecting proposal regions and utilize one CNN to implement both the location and the classification of objects. As shown in Fig. 9, SSD achieves 15.6 FPS on the test set. The other three models based on YOLO utilize the Darknet-53 backbone instead of the VGG-16 used in SSD, and the computation of the former network is significantly less than that of the latter because of its residual networks. Therefore, the real-time performance of SSD is lower than that of the YOLO-based models in our experiments. In addition, our proposed YOLO-CA simplifies the MSFF networks of YOLOv3, so YOLO-CA achieves 21.7 FPS, which is higher than that of YOLOv3 (about 19.1). Because YOLOv3 without MSFF lacks the MSFF process, it has better real-time performance (about 23.6 FPS) than YOLO-CA, but this lack results in serious AP penalties.

FIGURE 10. Some visual results of the seven models among different scales of objects (rows: large, medium, small; frames (1)-(3)).

Fig. 10 shows some visual results of the seven models among different scales of objects. It can be found that there is a false positive in the large-object detection results of Fast R-CNN, but the other six models all show high accuracy and locating performance on large objects in Fig. 10. However, the locating performance of Fast R-CNN, Faster R-CNN, SSD, and YOLOv3 without MSFF decreases significantly in medium object frame (1), and the predicted bounding box cannot fit the contour of the car accident. Moreover, Fast R-CNN, SSD, and YOLOv3 without MSFF cannot detect the car accident in small object frame (1). In addition, except for Faster R-CNN with FPN and YOLO-CA, the other models have serious location errors in small object frame (3).

3) Comparison of comprehensive performance and practicality
As analyzed above, our proposed YOLO-CA has performance advantages over Fast R-CNN, Faster R-CNN, SSD, and YOLOv3 in detecting car accidents, in terms of accuracy, locating and real-time performance. For YOLOv3 without MSFF, its FPS (23.6) is higher than that of YOLO-CA (21.7), and this difference is acceptable in the practical application of detecting car accidents. However, the AP of YOLO-CA is significantly higher than that of YOLOv3 without MSFF, especially for small objects (76.51% vs 58.89%). Compared with Faster R-CNN with FPN, YOLO-CA approaches its AP (90.66% vs 90.03%) with an obvious speed advantage: Faster R-CNN with FPN costs about 277 ms on average to detect one frame, whereas YOLO-CA only needs 46 ms, which illustrates that YOLO-CA is about 6x faster. Car accident detection in CVIS requires high real-time performance because of the high dynamics of vehicles. To summarize, our proposed YOLO-CA has higher practicality and better comprehensive performance in terms of accuracy and real-time detection.

4) Comparison with other car accident detection methods
Other car accident detection methods utilize small private datasets and do not make them public, so comparing with them may not be entirely fair at this stage. Still, we list the performance achieved by these methods on their individual datasets. ARRS [3] achieves about 63% AP with 6% false alarms. The method of [42] achieves 89.50% AP. DSA-RNN [25] achieves about 80% recall and 56.14% AP. The method in [30] achieves about 47.25% AP. The method of [8] achieves 77.5% AP with 22.5% false alarms. Moreover, the number of accident scenes in the datasets utilized by these methods is limited, which will result in poor adaptability to new scenarios.

V. CONCLUSION
In this paper, we have proposed an automatic car accident detection method based on CVIS. First of all, we present the application principles of our proposed method in the CVIS. Secondly, we build a novel image dataset, CAD-CVIS, which is more suitable for car accident detection methods based on intelligent roadside devices in CVIS. Then we develop the car accident detection model YOLO-CA based on CAD-CVIS and deep learning algorithms. In the model, we combine multi-scale feature fusion and a loss function with dynamic weights to improve the real-time performance and accuracy of YOLO-CA. Finally, we show the experimental results of our method, which demonstrate that our proposed method can detect a car accident in 0.0461 seconds with 90.02% AP. Moreover, the comparative experiment results show that YOLO-CA has comprehensive performance advantages over other detection models in detecting car accidents, in terms of accuracy and real-time performance.
REFERENCES
[1] WHO, "Global status report on road safety 2018," https://www.who.int/violence_injury_prevention/road_safety_status/2018/en/.
[2] H. L. Wang and M. A. Jia-Liang, "A design of smart car accident rescue system combined with wechat platform," Journal of Transportation Engineering, 2017.
[3] Y. Ki and D. Lee, "A traffic accident recording and reporting model at intersections," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 188–194, June 2007.
[4] W. Hao and J. Daniel, "Motor vehicle driver injury severity study under various traffic control at highway-rail grade crossings in the united states," Journal of Safety Research, vol. 51, pp. 41–48, 2014.
[5] J. White, C. Thompson, H. Turner, B. Dougherty, and D. C. Schmidt, "Wreckwatch: Automatic traffic accident detection and notification with smartphones," Mobile Networks and Applications, vol. 16, no. 3, pp. 285–303, 2011.
[6] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Real-time automatic traffic accident recognition using hfg," in International Conference on Pattern Recognition, 2010.
[7] A. Shaik, N. Bowen, J. Bole, G. Kunzi, D. Bruce, A. Abdelgawad, and K. Yelamarthi, "Smart car: An iot based accident detection system," in 2018 IEEE Global Conference on Internet of Things (GCIoT). IEEE, 2018, pp. 1–5.
[8] D. Singh and C. K. Mohan, "Deep spatio-temporal representation for detection of road accidents using stacked autoencoder," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 879–887, March 2019.
[9] M. Zheng, T. Li, R. Zhu, J. Chen, Z. Ma, M. Tang, Z. Cui, and Z. Wang, "Traffic accident's severity prediction: A deep-learning approach-based cnn network," IEEE Access, vol. 7, pp. 39897–39910, 2019.
[10] L. Zheng, Z. Peng, J. Yan, and W. Han, "An online learning and unsupervised traffic anomaly detection system," Advanced Science Letters, vol. 7, no. 1, pp. 449–455, 2012.
[11] F. Yang, S. Wang, J. Li, Z. Liu, and Q. Sun, "An overview of internet of vehicles," China Communications, vol. 11, no. 10, pp. 1–15, Oct 2014.
[12] C. Ma, W. Hao, A. Wang, and H. Zhao, "Developing a coordinated signal control system for urban ring road under the vehicle-infrastructure connected environment," IEEE Access, vol. 6, pp. 52471–52478, 2018.
[13] S. Zhang, J. Chen, F. Lyu, N. Cheng, W. Shi, and X. Shen, "Vehicular communication networks in the automated driving era," IEEE Communications Magazine, vol. 56, no. 9, pp. 26–32, 2018.
[14] Y. Wang, D. Zhang, Y. Liu, B. Dai, and L. H. Lee, "Enhancing transportation systems via deep learning: A survey," Transportation Research Part C: Emerging Technologies, 2018.
[15] G. Wu, F. Chen, X. Pan, M. Xu, and X. Zhu, "Using the visual intervention influence of pavement markings for rutting mitigation–part i: preliminary experiments and field tests," International Journal of Pavement Engineering, vol. 20, no. 6, pp. 734–746, 2019.
[16] S. Ramos, S. Gehrig, P. Pinggera, U. Franke, and C. Rother, "Detecting unexpected obstacles for self-driving cars: Fusing deep learning and geometric modeling," in 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 1025–1032.
[17] T. Qu, Q. Zhang, and S. Sun, "Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks," Multimedia Tools and Applications, vol. 76, no. 20, pp. 21651–21663, 2017.
[18] D. Dooley, B. McGinley, C. Hughes, L. Kilmartin, E. Jones, and M. Glavin, "A blind-zone detection method using a rear-mounted fisheye camera with combination of vehicle detection methods," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 1, pp. 264–278, Jan 2016.
[19] X. Changzhen, W. Cong, M. Weixin, and S. Yanmei, "A traffic sign detection algorithm based on deep convolutional neural network," in 2016 IEEE International Conference on Signal and Image Processing (ICSIP), Aug 2016, pp. 676–679.
[20] S. Zhang, C. Bauckhage, and A. B. Cremers, "Efficient pedestrian detec-