A Survey of Autonomous Driving: Common Practices and Emerging Technologies
ABSTRACT Automated driving systems (ADSs) promise a safe, comfortable and efficient driving experience. However, fatalities involving vehicles equipped with ADSs are on the rise. The full potential of ADSs cannot be realized unless the robustness of the state-of-the-art is improved further. This paper discusses unsolved problems and surveys the technical aspects of automated driving. Studies regarding present challenges, high-level system architectures, emerging methodologies and core functions including localization, mapping, perception, planning, and human-machine interfaces were thoroughly reviewed. Furthermore, many state-of-the-art algorithms were implemented and compared on our own platform in a real-world driving setting. The paper concludes with an overview of available datasets and tools for ADS development.
INDEX TERMS Autonomous vehicles, control, robotics, automation, intelligent vehicles, intelligent
transportation systems.
DARPA Urban Challenge, several more automated driving competitions such as [27]–[30] were held in different countries.
Common practices in system architecture have been established over the years. Most ADSs divide the massive task of automated driving into subcategories and employ an array of sensors and algorithms on various modules. More recently, end-to-end driving has started to emerge as an alternative to modular approaches. Deep learning models have become dominant in many of these tasks [31].
The Society of Automotive Engineers (SAE) refers to hardware-software systems that can execute dynamic driving tasks (DDT) on a sustainable basis as ADSs [32]. There are also vernacular alternative terms such as "autonomous driving" and "self-driving car" in use. Nonetheless, despite being commonly used, SAE advises against these terms because they are unclear and misleading. In this paper we follow SAE's convention.
The present paper attempts to provide a structured and comprehensive overview of state-of-the-art automated driving related hardware-software practices. Moreover, emerging trends such as end-to-end driving and connected systems are discussed in detail. There are overview papers on the subject that cover several core functions [15], [16] or concentrate only on the motion planning aspect [18], [19]. However, a survey that covers present challenges, available and emerging high-level system architectures, and individual core functions such as localization, mapping, perception, planning, vehicle control, and the human-machine interface altogether does not exist. The aim of this paper is to fill this gap in the literature with a thorough survey. In addition, a detailed summary of available datasets, software stacks, and simulation tools is presented here. Another contribution of this paper is the detailed comparison and analysis of alternative approaches through implementation. We implemented some state-of-the-art algorithms on our platform using open-source software. A comparison of existing overview papers and our work is shown in Table 1.
The remainder of this paper is written in eight sections. Section II is an overview of present challenges. Details of automated driving system components and architectures are given in Section III. Section IV presents a summary of state-of-the-art localization techniques, followed by Section V, an in-depth review of perception models. Assessment of the driving situation and planning are discussed in Sections VI and VII, respectively. In Section VIII, current trends and shortcomings of human-machine interfaces are introduced. Datasets and available tools for developing automated driving systems are given in Section IX.

II. PROSPECTS AND CHALLENGES
A. SOCIAL IMPACT
Widespread usage of ADSs is not imminent. Yet it is still possible to foresee their potential impact and benefits to a certain degree:
1) Problems that can be solved: preventing traffic accidents, mitigating traffic congestion, reducing emissions
2) Arising opportunities: reallocation of driving time, transporting the mobility impaired
3) New trends: consuming Mobility as a Service (MaaS), logistics revolution
Widespread deployment of ADSs can reduce the societal loss caused by erroneous human behavior such as distraction, driving under the influence and speeding [3].
Globally, the elderly group (over 60 years old) is growing faster than the younger groups [33]. Increasing the mobility of the elderly with ADSs can have a huge impact on the quality of life and productivity of a large portion of the population.
A shift from personal vehicle ownership towards consuming Mobility as a Service (MaaS) is an emerging trend. Currently, ride-sharing has lower costs compared to vehicle ownership under 1000 km annual mileage [34]. The ratio of owned to shared vehicles is expected to be 50:50 by 2030 [35]. Large-scale deployment of ADSs can accelerate this trend.

B. CHALLENGES
ADSs are complicated robotic systems that operate in indeterministic environments. As such, there are myriad scenarios
FIGURE 2. Information flow diagrams of: (a) a generic modular system, and (b) an end-to-end driving system.
TABLE 2. Common end-to-end driving approaches.
Core functions of a modular ADS can be summarized as:
localization and mapping, perception, assessment, planning
and decision making, vehicle control, and human-machine
interface. Typical pipelines [15], [47], [51]–[56], [56]–[58]
start with feeding raw sensor inputs to localization and object
detection modules, followed by scene prediction and decision
making. Finally, motor commands are generated at the end of
the stream by the control module [31], [68].
Developing individual modules separately divides the chal-
lenging task of automated driving into an easier-to-solve set
of problems [69]. These sub-tasks have their corresponding
literature in robotics [70], computer vision [71] and vehicle
dynamics [36], which makes the accumulated know-how and
expertise directly transferable. This is a major advantage of
modular systems. In addition, functions and algorithms can
be integrated or built upon each other in a modular design. For example, a safety constraint [72] can be implemented on top of a sophisticated planning module to enforce hard-coded emergency rules without modifying the inner workings of the planner. This enables designing redundant but reliable architectures.
The major disadvantages of modular systems are being prone to error propagation [31] and over-complexity. In the unfortunate Tesla accident, an error in the perception module, in the form of a misclassification of a white trailer as sky, propagated down the pipeline until failure, causing the first ADS-related fatality [46].

3) END-TO-END DRIVING
End-to-end driving, referred to as direct perception in some studies [59], generates ego-motion directly from sensory inputs. Ego-motion can be either the continuous operation of the steering wheel and pedals or a discrete set of actions, e.g., accelerating and turning left. There are three main approaches to end-to-end driving: direct supervised deep learning [59]–[63], neuroevolution [66], [67] and the more recent deep reinforcement learning [64], [65]. The flow diagram of a generic end-to-end driving system is shown in Figure 2 and a comparison of the approaches is given in Table 2.
The earliest end-to-end driving attempt dates back to ALVINN [60], where a 3-layer fully connected network was trained to output the direction that the vehicle should follow. An end-to-end driving system for off-road driving was introduced in [61]. With the advances in artificial neural network research, deep convolutional and temporal networks became feasible for automated driving tasks. A deep convolutional neural network that takes an image as input and outputs steering was proposed in [62]. A spatiotemporal network, an FCN-LSTM architecture, was developed for predicting ego-vehicle motion in [63]. DeepDriving is another convolutional model that tries to learn a set of discrete perception indicators from the image input [59]. This approach is not entirely end-to-end though; the proper driving actions for the perception indicators have to be generated by another module. All of the mentioned methods follow direct supervised training strategies. As such, ground truth is required for training. Usually, the ground truth is the ego-action sequence of an expert human driver, and the network learns to imitate the driver. This raises an important design question: should the ADS drive like a human?
A novel deep reinforcement learning model, Deep Q Networks (DQN), combined reinforcement learning with deep learning [73]. In summary, the goal of the network is to select a set of actions that maximize cumulative future rewards. A deep convolutional neural network was used to approximate the optimal action reward function. Actions are generated first with random initialization. Then, the network adjusts its parameters with experience instead of direct supervised learning. An automated driving framework using DQN was introduced in [64], where the network was tested in a simulation environment. The first real-world run with DQN was achieved on a countryside road without traffic [65]. DQN-based systems do not imitate the human driver; instead, they learn the optimum way of driving.
Neuroevolution refers to using evolutionary algorithms to train artificial neural networks [74]. End-to-end driving with neuroevolution is not as popular as DQN and direct supervised learning. To the best of our knowledge, real-world end-to-end driving with neuroevolution has not been achieved yet. However, some promising simulation results were obtained [66], [67]. ALVINN was trained with neuroevolution and outperformed the direct supervised learning version [66]. An RNN was trained with neuroevolution in [67] using a driving simulator. The biggest advantage of neuroevolution is the removal of backpropagation and, hence, the need for direct supervision.
End-to-end driving is promising; however, it has not been implemented in real-world urban scenes yet, except for limited demonstrations. The biggest shortcomings of end-to-end driving in general are the lack of hard-coded safety measures and interpretability [69]. In addition, DQN and neuroevolution have one major disadvantage over direct supervised learning: these networks must interact with the environment online and fail repeatedly to learn the desired behavior. On the contrary, direct supervised networks can be trained offline with human driving data, and once the training is done, the system is not expected to fail during operation.
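The DQN formulation summarized above can be made concrete with a short sketch. The following is a minimal PyTorch example, assuming a hypothetical set of five discrete ego-actions and random tensors in place of camera frames and logged transitions; it shows the Bellman-target update the text describes (regressing Q(s, a) toward r + γ max Q(s', a')), not the actual networks used in [64] or [65].

```python
import copy
import torch
import torch.nn as nn

GAMMA = 0.99
N_ACTIONS = 5  # hypothetical discrete ego-actions, e.g. keep lane / left / right / accelerate / brake

# Q-network: maps an 84x84 RGB observation to one value per discrete action.
q_net = nn.Sequential(
    nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
    nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
    nn.Linear(256, N_ACTIONS),
)
target_net = copy.deepcopy(q_net)      # periodically synced copy used for the bootstrap target
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

def dqn_update(obs, action, reward, next_obs, done):
    """One gradient step on a batch of (s, a, r, s', done) transitions."""
    q_sa = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)     # Q(s, a)
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values             # max_a' Q_target(s', a')
        target = reward + GAMMA * (1.0 - done) * next_q             # Bellman target
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy call with random tensors standing in for camera frames and logged experience.
batch = 8
loss = dqn_update(torch.rand(batch, 3, 84, 84),
                  torch.randint(0, N_ACTIONS, (batch,)),
                  torch.rand(batch),
                  torch.rand(batch, 3, 84, 84),
                  torch.zeros(batch))
```

In a full system the transitions would come from an experience replay buffer filled by online interaction, which is exactly the trial-and-error requirement discussed above.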
4) CONNECTED SYSTEMS
There is no operational connected ADS in use yet; however, some researchers believe this emerging technology will be the future of driving automation [48]–[50]. With the use of Vehicular Ad hoc NETworks (VANETs), the basic operations of automated driving can be distributed amongst agents. V2X is a term that stands for "vehicle to everything." From the mobile devices of pedestrians to stationary sensors on a traffic light, an immense amount of data can be accessed by the vehicle with V2X [22]. By sharing detailed information about the traffic network amongst peers [75], shortcomings of ego-only platforms such as sensing range, blind spots, and computational limits may be eliminated. More V2X applications that will increase safety and traffic efficiency are expected to emerge in the foreseeable future [76].
VANETs can be realized in two different ways: conventional IP-based networking and Information-Centric Networking (ICN) [48]. For vehicular applications, a lot of data has to be distributed amongst agents over intermittent and less than ideal connections while maintaining high mobility [50]. The conventional IP-host based Internet protocol cannot function properly under these conditions. On the other hand, in information-centric networking, vehicles stream query messages to an area instead of a direct address and accept corresponding responses from any sender [49]. Since vehicles are highly mobile and dispersed on the road network, the identity of the information source becomes less relevant. In addition, local data often carries more crucial information for immediate driving tasks such as avoiding a rapidly approaching vehicle in a blind spot.
Early works, such as the CarSpeak system [82], proved that vehicles can utilize each other's sensors and use the shared information to execute some dynamic driving tasks. However, without reducing the huge amount of continuous driving data, sharing information between hundreds of thousands of vehicles in a city could not become feasible. A semiotic framework that integrates different sources of information and converts raw sensor data into meaningful descriptions was introduced in [83] for this purpose. In [84], the term Vehicular Cloud Computing (VCC) was coined and its main advantages over conventional Internet cloud applications were introduced. Sensors are the primary cause of the difference. In VCC, sensor information is kept on the vehicle and only shared if there is a local query from another vehicle. This potentially saves the cost of uploading/downloading a constant stream of sensor data to the web. Besides, the high relevance of local data increases the feasibility of VCC. Regular cloud computing was compared to vehicular cloud computing and it was reported that VCC is technologically feasible [85]. The term "Internet of Vehicles" (IoV) was proposed for describing a connected ADS [48] and the term "vehicular fog" was introduced in [49].
Establishing an efficient VANET with thousands of vehicles in a city is a huge challenge. For an ICN-based VANET, some of the challenging topics are security, mobility, routing, naming, caching, reliability and multi-access computing [86]. In summary, even though the potential benefits of a connected system are huge, the additional challenges increase the complexity of the problem to a significant degree. As such, there is no operational connected system yet.

B. SENSORS AND HARDWARE
State-of-the-art ADSs employ a wide selection of onboard sensors. High sensor redundancy is needed in most tasks for robustness and reliability. Hardware units can be categorized into five groups: exteroceptive sensors for perception, proprioceptive sensors for internal vehicle state monitoring tasks, communication arrays, actuators, and computational units.
Exteroceptive sensors are mainly used for perceiving the environment, which includes dynamic and static objects, e.g., drivable areas, buildings, and pedestrian crossings. Camera, lidar, radar and ultrasonic sensors are the most commonly used modalities for this task. A detailed comparison of exteroceptive sensors is given in Table 3.

1) MONOCULAR CAMERAS
Cameras can sense color and are passive, i.e., they do not emit any signal for measurements.
FIGURE 3. Ricoh Theta V panoramic images collected using our data collection platform, in Nagoya University campus. Note some distortion still remains on the periphery of the image.
FIGURE 4. DAVIS240 events, overlaid on the image (left) and corresponding RGB image from a different camera (right), collected by our data collection platform, at a road crossing near Nagoya University. The motion of the cyclist and vehicle causes brightness changes which trigger events.
Sensing color is extremely important for tasks such as traffic light recognition. Furthermore, 2D computer vision is an established field with remarkable state-of-the-art algorithms. Moreover, a passive sensor does not interfere with other systems since it does not emit any signals. However, cameras have certain shortcomings. Illumination conditions affect their performance drastically, and depth information is difficult to obtain from a single camera. There are promising studies [87] to improve monocular camera based depth perception, but modalities that are not negatively affected by illumination and weather conditions are still necessary for dynamic driving tasks. Other camera types gaining interest for ADS include flash cameras [77], thermal cameras [79], [80], and event cameras [78].

2) OMNIDIRECTIONAL CAMERA
For 360° 2D vision, omnidirectional cameras are used as an alternative to camera arrays. They have seen widespread use, with increasingly compact and high performance hardware being constantly released. A panoramic view is particularly desirable for applications such as navigation, localization and mapping [88]. An example panoramic image is shown in Figure 3.

3) EVENT CAMERAS
Event cameras are among the newer sensing modalities that have seen use in ADSs [89]. Event cameras record data asynchronously for individual pixels with respect to visual stimulus. The output is therefore an irregular sequence of data points, or events, triggered by changes in brightness. The response time is in the order of microseconds [90]. The main limitation of current event cameras is pixel size and image resolution. For example, the DAVIS240 image shown in Figure 4 has a pixel size of 18.5×18.5 µm and a resolution of 240 × 180. Recently, a driving dataset with event camera data has been published [89].

4) RADAR
Radar, lidar and ultrasonic sensors are very useful in covering the shortcomings of cameras. Depth information, i.e., distance to objects, can be measured effectively with these sensors to retrieve 3D information, and they are not affected by illumination conditions. However, they are active sensors. Radars emit radio waves that bounce back from objects and measure the time of each bounce. Emissions from active sensors can interfere with other systems. Radar is a well-established technology that is both lightweight and cost-effective. For example, radars can fit inside side-mirrors. Radars are cheaper and can detect objects at longer distances than lidars, but the latter are more accurate.

5) LIDAR
Lidar operates on a principle similar to that of radar, but it emits infrared light waves instead of radio waves. It has
7) FULL SIZE CARS
There are numerous instrumented vehicles introduced by different research groups, such as Stanford's Junior [15], which employs an array of sensors with different modalities for perceiving external and internal variables. Boss won the DARPA Urban Challenge with an abundance of sensors [47]. RobotCar [53] is a cheaper research platform aimed at data collection. In addition, different levels of driving automation have been introduced by the industry; Tesla's Autopilot [92] and Google's self-driving car [93] are some examples. Bertha [57], developed by Daimler, has four 120° short-range radars, two long-range radars on the sides, a stereo camera, a wide-angle monocular color camera on the dashboard, and another wide-angle camera for the back. Our vehicle is shown in Figure 5. A detailed comparison of the sensor setups of 10 different full-size ADSs is given in Table 4.

8) LARGE VEHICLES AND TRAILERS
The earliest intelligent trucks were developed for the PATH program in California [102], which utilized magnetic markers on the road. Fuel economy is an essential topic in freight transportation, and methods such as platooning have been developed for this purpose. Platooning is a well-studied phenomenon; it reduces drag and therefore fuel consumption [103]. In semi-autonomous truck platooning, the lead truck is driven by a human driver and several automated trucks follow it, forming a semi-autonomous road-train as defined in [104]. The Sartre European Union project [105] introduced such a system that satisfies three core conditions: using the already existing public road network, sharing the traffic with non-automated vehicles and not modifying the road infrastructure. A platoon consisting of three automated trucks was formed in [103] and significant fuel savings were reported.
The tractor-trailer setup poses an additional challenge for automated freight transport. Conventional control methods such as feedback linearization [106] and fuzzy control [107] were used for path tracking without considering the jackknifing constraint. The possibility of jackknifing, the collision of the truck and the trailer with each other, increases the difficulty of the task [108]. A control safety governor design was proposed in [108] to prevent jackknifing while reversing.

IV. LOCALIZATION AND MAPPING
Localization is the task of finding the ego-position relative to a reference frame in an environment [17], and it is fundamental to any mobile robot. It is especially crucial for ADSs [21]; the vehicle must use the correct lane and position itself in it accurately. Furthermore, localization is an elemental requirement for global navigation.
The remainder of this section details the three most common approaches that use solely on-board sensors: Global Positioning System and Inertial Measurement Unit (GPS-IMU) fusion, Simultaneous Localization And Mapping (SLAM), and state-of-the-art a priori map-based localization. Readers are referred to [17] for a broader localization overview. A comparison of localization methods is given in Table 5.
A. GPS-IMU FUSION
The main principle of GPS-IMU fusion is correcting the accumulated errors of dead reckoning at intervals with absolute position readings [109]. In a GPS-IMU system, changes in position and orientation are measured by the IMU, and this information is processed to localize the vehicle with dead reckoning. There is a significant drawback of the IMU, and of dead reckoning in general: errors accumulate with time and often lead to failure in long-term operations [110]. With the integration of GPS readings, the accumulated errors of the IMU can be corrected at intervals.
GPS-IMU systems by themselves cannot be used for vehicle localization as they do not meet the performance criteria [111]. In the 2004 DARPA Grand Challenge, the red team from Carnegie Mellon University [96] failed the race because of a GPS error. The accuracy required for urban automated driving is too high for the current GPS-IMU systems used in production cars. Moreover, in dense urban environments, the accuracy drops further, and the GPS stops functioning from time to time because of tunnels [109] and high buildings.
Even though GPS-IMU systems by themselves do not meet the performance requirements and can only be utilized for high-level route planning, they are used for initial pose estimation in tandem with lidar and other sensors in state-of-the-art localization systems [111].
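The correction-in-intervals idea described above can be sketched in a few lines. The example below is a toy NumPy simulation with assumed noise levels and a hypothetical 1 Hz GPS rate: a dead-reckoned pose drifts between fixes and is pulled back toward each absolute GPS reading. A production system would use a proper Kalman or particle filter rather than the simple gain used here.

```python
import numpy as np

DT = 0.01          # IMU update period (100 Hz), assumed for this sketch
GPS_PERIOD = 100   # a GPS fix arrives every 100 IMU steps (1 Hz), also an assumption
GPS_GAIN = 0.3     # how strongly a fix pulls the estimate toward the GPS reading

def dead_reckon_step(pose, speed, yaw_rate):
    """Propagate (x, y, heading) with a simple kinematic model."""
    x, y, th = pose
    th = th + yaw_rate * DT
    x = x + speed * np.cos(th) * DT
    y = y + speed * np.sin(th) * DT
    return np.array([x, y, th])

pose = np.zeros(3)            # estimated pose (dead reckoning + GPS correction)
true_pose = np.zeros(3)       # simulated ground truth, only for this toy example
rng = np.random.default_rng(0)

for k in range(1000):
    speed, yaw_rate = 10.0, 0.05                   # constant motion, for illustration
    true_pose = dead_reckon_step(true_pose, speed, yaw_rate)
    # Noisy IMU-derived increments make the dead-reckoned estimate drift over time.
    pose = dead_reckon_step(pose, speed + rng.normal(0, 0.3), yaw_rate + rng.normal(0, 0.01))
    if k % GPS_PERIOD == 0:
        gps_xy = true_pose[:2] + rng.normal(0, 1.0, 2)   # absolute but coarse position fix
        pose[:2] += GPS_GAIN * (gps_xy - pose[:2])       # pull accumulated error back toward the fix

print("final drift [m]:", np.linalg.norm(pose[:2] - true_pose[:2]))
```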
B. SIMULTANEOUS LOCALIZATION AND MAPPING
Simultaneous localization and mapping (SLAM) is the act of making a map online while localizing the vehicle in it at the same time. A priori information about the environment is not required in SLAM. It is a common practice in robotics, especially in indoor environments. However, due to the high computational requirements and environmental challenges, running SLAM algorithms outdoors, which is the operational domain of ADSs, is less efficient than localization with a pre-built map [112].
Team MIT used a SLAM approach in the DARPA Urban Challenge [113] and finished in 4th place, whereas the winner, Carnegie Mellon's Boss [47], and the runner-up, Stanford's Junior [15], both utilized a priori information. In spite of not having the same level of accuracy and efficiency, SLAM techniques have one major advantage over a priori methods: they can work anywhere.
SLAM-based methods have the potential to replace a priori techniques if their performance can be increased further [20]. We refer the readers to [21] for a detailed SLAM survey in the intelligent vehicle domain.

C. A PRIORI MAP-BASED LOCALIZATION
The core idea of a priori map-based localization techniques is matching: localization is achieved through the comparison of online readings to the information on a detailed pre-built map and finding the location of the best possible match [111]. Often an initial pose estimate, for example from a GPS, is used at the beginning of the matching process. There are various approaches to map building and preferred modalities.
Changes in the environment affect the performance of map-based methods negatively. This effect is especially prevalent in rural areas where past information in the map can deviate from the actual environment because of changes in roadside vegetation and construction [114]. Moreover, this method requires the additional step of map making.
There are two different map-based approaches: landmark search and matching.

1) LANDMARK SEARCH
Landmark search is computationally less expensive in comparison to point cloud matching. It is a robust localization technique as long as a sufficient number of landmarks exists. In an urban environment, poles, curbs, signs and road markers can be used as landmarks.
A road marking detection method using lidar and Monte Carlo Localization (MCL) was used in [98]. In this method, road markers and curbs were matched to a 3D map to find the location of the vehicle. A vision-based road marking detection method was introduced in [115]. Road markings detected by a single front camera were compared and matched to a low-volume digital marker map with global coordinates. Then, a particle filter was employed to update the position and heading of the vehicle with the detected road markings and GPS-IMU output. A road marking detection based localization technique using two cameras directed towards the ground, GPS-IMU dead reckoning, odometry, and a precise marker location map was proposed in [116]. Another vision-based method with a single camera and geo-referenced traffic signs was presented in [117].
This approach has one major disadvantage: landmark dependency makes the system prone to fail where the number of landmarks is insufficient.
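A landmark-based Monte Carlo Localization step, as used in the lidar and vision methods cited above, can be sketched as a predict-update-resample loop. The code below is a minimal NumPy illustration with invented landmark positions, noise parameters and range measurements; it is a generic MCL sketch, not the formulation of [98] or [115]–[117].

```python
import numpy as np

rng = np.random.default_rng(42)
landmarks = np.array([[10.0, 5.0], [20.0, -3.0], [35.0, 8.0]])   # hypothetical mapped poles/signs (x, y)
N = 500                                                          # number of particles

particles = rng.normal([0.0, 0.0, 0.0], [2.0, 2.0, 0.2], size=(N, 3))   # (x, y, heading) hypotheses
weights = np.full(N, 1.0 / N)

def predict(particles, speed, yaw_rate, dt=0.1):
    """Propagate every particle with the motion command plus process noise."""
    th = particles[:, 2] + (yaw_rate + rng.normal(0, 0.02, N)) * dt
    v = speed + rng.normal(0, 0.5, N)
    particles[:, 0] += v * np.cos(th) * dt
    particles[:, 1] += v * np.sin(th) * dt
    particles[:, 2] = th
    return particles

def update(particles, weights, measured_ranges, sigma=0.5):
    """Re-weight particles by how well predicted landmark ranges match the measurements."""
    for lm, z in zip(landmarks, measured_ranges):
        expected = np.linalg.norm(particles[:, :2] - lm, axis=1)
        weights *= np.exp(-0.5 * ((expected - z) / sigma) ** 2)
    weights += 1e-300                      # avoid an all-zero weight vector
    weights /= weights.sum()
    return weights

def resample(particles, weights):
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# One predict-update-resample cycle with made-up odometry and range measurements.
particles = predict(particles, speed=5.0, yaw_rate=0.0)
weights = update(particles, weights, measured_ranges=[11.0, 20.5, 35.2])
particles, weights = resample(particles, weights)
print("pose estimate:", particles.mean(axis=0))
```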
2) POINT CLOUD MATCHING
State-of-the-art localization systems use multi-modal point cloud matching based approaches. In summary, the online-scanned point cloud, which covers a smaller area, is translated and rotated around its center iteratively to be compared against the larger a priori point cloud map. The position and orientation that give the best match between the two point clouds give the localized position of the sensor relative to the map. For initial pose estimation, GPS is commonly used along with dead reckoning. We used this approach to localize our vehicle. The matching process is shown in Figure 6 and the map-making in Figure 7.
FIGURE 7. Creating a 3D pointcloud map with congregation of scans. We used Autoware [122] for mapping.
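The translate-and-rotate matching loop described above is essentially what the Iterative Closest Point (ICP) algorithm does. The following NumPy sketch aligns a 2D toy scan to a prior map with nearest-neighbour association and a closed-form SVD pose update; real systems operate on 3D clouds, use k-d trees or NDT-style map representations, and are far more robust than this illustration.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form SVD solution for the rotation/translation aligning matched 2D points."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # avoid a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(scan, map_points, iterations=30):
    """Iteratively match the online scan to the prior map and refine the pose."""
    R_total, t_total = np.eye(2), np.zeros(2)
    current = scan.copy()
    for _ in range(iterations):
        # Nearest-neighbour association (brute force; a k-d tree would be used in practice).
        d = np.linalg.norm(current[:, None, :] - map_points[None, :, :], axis=2)
        matches = map_points[d.argmin(axis=1)]
        R, t = best_rigid_transform(current, matches)
        current = current @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

# Toy example: the "map" is a noisy, shifted and rotated copy of the scan.
rng = np.random.default_rng(1)
scan = rng.uniform(-10, 10, size=(200, 2))
angle = 0.1
R_true = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
map_points = scan @ R_true.T + np.array([1.5, -0.7]) + rng.normal(0, 0.02, (200, 2))

R_est, t_est = icp(scan, map_points)
print("estimated translation:", t_est)    # should be close to (1.5, -0.7)
```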
In the seminal work of [111], a point cloud map collected with lidar was used to augment inertial navigation and localization. A particle filter maintained a three-dimensional vector of the 2D coordinates and the yaw angle. A multi-modal approach with probabilistic maps was utilized in [100] to achieve localization in urban environments with less than 10 cm RMS error. Instead of comparing two point clouds point by point and discarding the mismatched reads, the variance of all observed data was modeled and used for the matching task. A matching algorithm for lidar scans using multi-resolution Gaussian Mixture Maps (GMM) was proposed in [119]. Iterative Closest Point (ICP) was compared against the Normal Distribution Transform (NDT) in [118], [120]. In NDT, accumulated sensor readings are

3) 2D TO 3D MATCHING
Matching online 2D readings to a 3D a priori map is an emerging technology. This approach requires only a camera on the ADS-equipped vehicle instead of the more expensive lidar. The a priori map still needs to be created with a lidar.
A monocular camera was used to localize the vehicle in a point cloud map in [123]. With an initial pose estimate, 2D synthetic images were created from the offline 3D point cloud map and compared, using normalized mutual information, to the online images received from the camera. This method increases the computational load of the localization task. Another vision matching algorithm was introduced in [124], where a stereo camera setup was utilized to compare online readings to synthetic depth images generated from the 3D prior.
Camera-based localization approaches could become popular in the future as the hardware requirement is cheaper than lidar-based systems.

V. PERCEPTION
Perceiving the surrounding environment and extracting information which may be critical for safe navigation is a central objective for ADSs. A variety of tasks, using different sensing modalities, fall under the category of perception. Building on decades of computer vision research, cameras are the most commonly used sensor for perception, with 3D vision becoming a strong alternative/supplement.
The remainder of this section is divided into core perception tasks. We discuss image-based object detection in Section V-A1, semantic segmentation in Section V-A2, 3D object detection in Section V-A3, road and lane detection in Section V-C and object tracking in Section V-B.

A. DETECTION
1) IMAGE-BASED OBJECT DETECTION
Object detection refers to identifying the location and size of objects of interest. Both static objects, from traffic lights and signs to road crossings, and dynamic objects such as other vehicles, pedestrians or cyclists are of concern to ADSs.
TABLE 6. Comparison of 2D bounding box estimation architectures on the test set of ImageNet1K, ordered by Top-5 error. The number of parameters (Num. Params) and number of layers (Num. Layers) hint at the computational cost of each algorithm.
Generalized object detection has a long-standing history as a central problem in computer vision, where the goal is to determine if objects of specific classes are present in an image,
then to determine their size via a rectangular bounding box.
This section mainly discusses state-of-the-art object detection
methods, as they represent the starting point of several other
tasks in an ADS pipeline, such as object tracking and scene
understanding.
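Two small building blocks recur in virtually all bounding-box detectors discussed in this subsection: the intersection-over-union (IoU) overlap measure and greedy non-maximum suppression (NMS) for removing duplicate detections. The NumPy sketch below illustrates both on made-up boxes; it is a generic illustration, not the post-processing of any specific detector cited here.

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the best-scoring box, drop heavily overlapping ones."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) < iou_threshold]
    return keep

# Two overlapping "vehicle" detections and one distant one.
boxes = np.array([[100, 100, 200, 180], [105, 98, 205, 185], [400, 120, 460, 170]], dtype=float)
scores = np.array([0.9, 0.75, 0.6])
print(nms(boxes, scores))     # -> [0, 2]
```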
Object recognition research started more than 50 years
ago, but only recently, in the late 1990s and early 2000s,
has algorithm performance reached a level of relevance for
driving automation. In 2012, the deep convolutional neu-
ral network (DCNN) AlexNet [4] shattered the ImageNet
image recognition challenge [131]. This resulted in a near
complete shift of focus to supervised learning and in par-
ticular deep learning for object detection. There exists a
number of extensive surveys on general image-based object detection [132]–[134]. Here, the focus is on the state-of-the-art methods that could be applied to ADSs.
While state-of-the-art methods all rely on DCNNs, there currently exists a clear distinction between them:
1) Single stage detection frameworks use a single network to produce object detection locations and class predictions simultaneously.
2) Region proposal detection frameworks use two distinct stages, where general regions of interest are first proposed, then categorized by separate classifier networks.
Region proposal methods are currently leading detection benchmarks, but at the cost of requiring high computation power and generally being difficult to implement, train and fine-tune. Meanwhile, single stage detection algorithms tend to have fast inference times and low memory cost, which is well-suited for real-time driving automation. YOLO (You Only Look Once) [135] is a popular single stage detector, which has been improved continuously [128], [136]. Its network uses a DCNN to extract image features on a coarse grid, significantly reducing the resolution of the input image. A fully-connected neural network then predicts class probabilities and bounding box parameters for each grid cell and class. This design makes YOLO very fast, with the full model operating at 45 FPS and a smaller model operating at 155 FPS for a small accuracy trade-off. More recent versions of this method, YOLOv2, YOLO9000 [136] and YOLOv3 [128], briefly took over the PASCAL VOC and MS COCO benchmarks while maintaining low computation and memory cost.
Another widely used algorithm, even faster than YOLO, is the Single Shot Detector (SSD) [137], which uses standard DCNN architectures such as VGG [130] to achieve competitive results on public benchmarks. SSD performs detection on a coarse grid similar to YOLO, but also uses higher resolution features obtained early in the DCNN to improve the detection and localization of small objects.
Considering both accuracy and computational cost is essential for detection in ADSs; the detection needs to be reliable, but also operate better than real-time, to allow as much time as possible for the planning and control modules to react to those objects. As such, single stage detectors are often the detection algorithms of choice for ADSs. However, as shown in Table 6, region proposal networks (RPN), used in two-stage detection frameworks, have proven to be unmatched in terms of object recognition and localization accuracy, and their computational cost has improved greatly in recent years. They are also better suited for other tasks related to detection, such as semantic segmentation, as discussed in Section V-A2. Through transfer learning, RPNs achieving multiple perception tasks simultaneously are becoming increasingly feasible for online applications [138]. RPNs may replace single stage detection networks for ADS applications in the near future.

a: OMNIDIRECTIONAL AND EVENT CAMERA-BASED PERCEPTION
360-degree vision, or at least panoramic vision, is necessary for higher levels of automation. This can be achieved through camera arrays, though precise extrinsic calibration between each camera is then necessary to make image stitching possible. Alternatively, omnidirectional cameras can be used, or a smaller array of cameras with very wide angle fisheye lenses. These are however difficult to intrinsically calibrate; the spherical images are highly distorted and the camera model used must account for mirror reflections or fisheye lens distortions, depending on the camera producing the panoramic images [141], [142]. The accuracy of the model and calibration dictates the quality of the undistorted images produced, on which the aforementioned 2D vision algorithms are used. An example of fisheye lenses producing two spherical images then combined into one panoramic image is shown in Figure 3. Some distortions inevitably remain, but despite these challenges in calibration, omnidirectional cameras have been used for many applications such as SLAM [143] and 3D reconstruction [144].
Event cameras are a fairly new modality which output asynchronous events usually caused by movement in the observed scene, as shown in Figure 4. This makes the sensing modality interesting for dynamic object detection.
FIGURE 8. An urban scene near Nagoya University, with camera and lidar data collected by our experimental vehicle and
object detection outputs from state-of-the-art perception algorithms. (a) A front facing camera’s view, with bounding box
results from YOLOv3 [128] and (b) instance segmentation results from MaskRCNN [138]. (c) Semantic segmentation masks
produced by DeepLabv3 [139]. (d) The 3D Lidar data with object detection results from SECOND [140]. Amongst the four,
only the 3D perception algorithm outputs range to detected objects.
The other appealing factor is their response time on the order of microseconds [90], as frame rate is a significant limitation for high-speed driving. The sensor resolution remains an issue, but new models are rapidly improving. They have been used for a variety of applications closely related to ADSs. A recent survey outlines progress in pose estimation and SLAM, visual-inertial odometry and 3D reconstruction, as well as other applications [145]. Most notably, a dataset for end-to-end driving with event cameras was recently published, with preliminary experiments showing that the output of an event camera can, to some extent, be used to predict car steering angle [89].

b: POOR ILLUMINATION AND CHANGING APPEARANCE
The main drawback of using cameras is that changes in lighting conditions can significantly affect their performance. Low light conditions are inherently difficult to deal with, while changes in illumination due to shifting shadows, intemperate weather, or seasonal changes can cause algorithms to fail, in particular supervised learning methods. For example, snow drastically alters the appearance of scenes and hides potentially key features such as lane markings. An easy alternative is to use an alternate sensing modality for perception, but lidar also has difficulties with some weather conditions like fog and snow [146], and radars lack the necessary resolution for many perception tasks [51]. A sensor fusion strategy is often employed to avoid any single point of failure [147].
Thermal imaging through infrared sensors is also used for object detection in low light conditions, which is particularly effective for pedestrian detection [148]. Camera-only methods which attempt to deal with dynamic lighting conditions directly have also been developed. Both extracting lighting-invariant features [149] and assessing the quality of features [150] have been proposed. Pre-processed, illumination-invariant images have been applied to ADSs [151] and were shown to improve localization, mapping and scene classification capabilities over long periods of time. Still, dealing with the unpredictable conditions brought forth by inadequate or changing illumination remains a central challenge preventing the widespread implementation of ADSs.

2) SEMANTIC SEGMENTATION
Beyond image classification and object detection, computer vision research has also tackled the task of image segmentation. This consists of classifying each pixel of an image with a class label. This task is of particular importance to driving automation as some objects of interest are poorly defined by bounding boxes, in particular roads, traffic lines, sidewalks and buildings. A segmented scene in an urban area can be seen in Figure 8. As opposed to semantic segmentation, which labels pixels based on a class, instance segmentation algorithms further separate instances of the same class, which is important in the context of driving automation. In other words, objects which may have different trajectories and behaviors must be differentiated from each other. We used the COCO dataset [152] to train the instance segmentation algorithm Mask R-CNN [138], with a sample result shown in Figure 8.
Segmentation has recently become feasible for real-time applications.
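The per-pixel labelling interface of such networks can be shown with a few lines of code. The sketch below, assuming a recent torchvision that provides deeplabv3_resnet50, runs a randomly initialised DeepLab-style model on a stand-in camera frame and takes an argmax over the class dimension to obtain a segmentation mask; the class count of 19 is an assumption borrowed from common road-scene benchmarks, and no claim is made that this matches the configuration used on our platform.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Randomly initialised network (weights=None) -- enough to show the per-pixel labelling interface.
model = deeplabv3_resnet50(weights=None, num_classes=19)   # 19 classes, a road-scene-style assumption
model.eval()

image = torch.rand(1, 3, 360, 640)           # stand-in for a normalised camera frame
with torch.no_grad():
    logits = model(image)["out"]             # shape: (1, num_classes, H, W)
mask = logits.argmax(dim=1)                  # per-pixel class id, shape (1, H, W)

print(mask.shape)
```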
FIGURE 9. Outline of a traditional method for object detection from 3D pointcloud data. Various filtering and data reduction methods are
used first, followed by clustering. The resulting clusters are shown by the different colored points in the 3D lidar data of pedestrians
collected by our data collection platform.
Generally, developments in this field progress in parallel with image-based object detection. The aforementioned Mask R-CNN [138] is a generalization of Faster R-CNN [153]. The multi-task R-CNN network can achieve accurate bounding box estimation and instance segmentation simultaneously and can also be generalized to other tasks, like pedestrian pose estimation, with minimal domain knowledge. Running at 5 fps, it is approaching the realm of real-time use for ADSs.
Unlike Mask R-CNN's architecture, which is more akin to those used for object detection through its use of region proposal networks, segmentation networks usually employ a combination of convolutions for feature extraction. These are followed by deconvolutions, also called transposed convolutions, to obtain pixel-resolution labels [154], [155]. Feature pyramid networks are also commonly used, for example in PSPNet [156], which also introduced dilated convolutions for segmentation. This idea of sparse convolutions was then used to develop DeepLab [157], with the most recent version being the current state-of-the-art for object segmentation [139]. We employed DeepLab with our ADS, and a segmented frame is shown in Figure 8.
While most segmentation networks are as of yet too slow and computationally expensive to be used in ADSs, it is important to note that many of these segmentation networks are initially trained for different tasks, such as bounding box estimation, then generalized to segmentation networks. Furthermore, these networks were shown to learn universal feature representations of images and can be generalized for many tasks. This suggests the possibility that single, generalized perception networks may be able to tackle all perception tasks required for an ADS.

3) 3D OBJECT DETECTION
Given their affordability, availability and widespread research, cameras are used by nearly all algorithms presented so far as the primary perception modality. However, cameras have limitations that are critical to ADSs. Aside from illumination, which was previously discussed, camera-based object detection occurs in the projected image space and therefore the scale of the scene is unknown. To make use of this information for dynamic driving tasks like obstacle avoidance, it is necessary to bridge the gap from 2D image-based detection to the 3D, metric space. Depth estimation is therefore necessary, which is in fact possible with a single camera [158], though stereo or multi-view systems are more robust [159]. These algorithms necessarily need to solve an expensive image matching problem, which adds a significant amount of processing cost to an already complex perception pipeline.
A relatively new sensing modality, the 3D lidar, offers an alternative for 3D perception. The 3D data collected inherently solves the scale problem, and since lidars have their own emission source, they are far less dependent on lighting conditions and less susceptible to intemperate weather. The sensing modality collects sparse 3D points representing the surfaces of the scene, as shown in Figure 9, which are challenging to use for object detection and classification. The appearance of objects changes with range, and after some distance, very few data points per object are available to detect an object. This poses some challenges for detection, but since the data is a direct representation of the world, it is more easily separable. Traditional methods often use Euclidean clustering [160] or region-growing methods [161] for grouping points into objects. This approach has been made much more robust through various filtering techniques, such as ground filtering [162] and map-based filtering [163]. We implemented a 3D object detection pipeline to get clustered objects from raw point cloud input. An example of this process is shown in Figure 9.
As with image-based methods, machine learning has also recently taken over 3D detection methods. These methods
FIGURE 10. A depth image produced from synthetic lidar data, generated in the CARLA [164] simulator.
FIGURE 11. Bird's eye view perspective of 3D lidar data, a sample from the KITTI dataset [177].
have low cost and are robust to poor weather, while lidar
offer precise object localization capabilities, as discussed in
Section IV.
Another sensor similar to the radar is the sonar device, though its extremely short range of <2 m and poor angular resolution limit its use to very near obstacle
detection [146].
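The traditional clustering step of the lidar pipeline described in Section V-A3 (grouping ground-filtered points into objects by spatial proximity) can be approximated with a density-based clusterer. The sketch below uses scikit-learn's DBSCAN on synthetic points as a stand-in for Euclidean clustering; the blob positions and the eps and min_samples values are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)
# Stand-in for a ground-filtered lidar scan: two pedestrian-sized blobs plus scattered clutter.
cluster_a = rng.normal([5.0, 2.0, 0.9], 0.15, size=(120, 3))
cluster_b = rng.normal([8.0, -1.5, 0.9], 0.15, size=(150, 3))
clutter = rng.uniform([-2, -10, 0], [20, 10, 2], size=(60, 3))
points = np.vstack([cluster_a, cluster_b, clutter])

# Euclidean-style clustering: points closer than `eps` metres are grouped; sparse clutter becomes noise (-1).
labels = DBSCAN(eps=0.4, min_samples=10).fit_predict(points)

for lab in sorted(set(labels) - {-1}):
    obj = points[labels == lab]
    print(f"object {lab}: {len(obj)} points, centroid {np.round(obj.mean(axis=0), 2)}")
```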
B. OBJECT TRACKING
FIGURE 12. A scene with several tracked pedestrians and a cyclist with a basic particle filter at an urban road intersection. Past trajectories are shown in white, with the current heading and speed shown by the direction and magnitude of the arrow; sample collected by our data collection platform.
Object tracking is also often referred to as multiple object tracking (MOT) [183] or detection and tracking of multiple objects (DATMO) [184]. For fully automated driving in complex and high-speed scenarios, estimating location alone is insufficient. It is necessary to estimate dynamic objects' heading and velocity so that a motion model can be applied to track the object over time and predict the future trajectory to
FIGURE 13. Annotating a 3D point cloud map with topological information. A large number of annotators were employed to build the map
shown on the right-hand side. The point-cloud and annotated maps are available on [197].
road structures and the ability to detect several lanes at long ranges [199]. Annotated maps as shown in Figure 13 are extremely useful for understanding lane semantics.
Current methods for road understanding typically first rely on exteroceptive data preprocessing. When cameras are used, this usually means performing image color corrections to normalize lighting conditions [203]. For lidar, several filtering methods can be used to reduce clutter in the data, such as ground extraction [162] or map-based filtering [163]. For any sensing modality, identifying dynamic objects which conflict with the static road scene is an important pre-processing step. Then, road and lane feature extraction is performed on the corrected data. Color statistics and intensity information [204], gradient information [205], and various other filters have been used to detect lane markings. Similar methods have been used for road estimation, where the usual uniformity of roads and the elevation gap at the edge allow region growing methods to be applied [206]. Stereo camera systems [207], as well as 3D lidars [204], have been used to determine the 3D structure of roads directly. More recently, machine learning-based methods which either fuse maps with vision [200] or use fully appearance-based segmentation [208] have been used.
Once surfaces are estimated, model fitting is used to establish the continuity of the road and lanes. Geometric fitting through parametric models such as lines [209] and splines [204] has been used, as well as non-parametric continuous models [210]. Models that assume parallel lanes have been used [201], and more recently models integrating topological elements such as lane splitting and merging were proposed [204].
Temporal integration completes the road and lane segmentation pipeline. Here, vehicle dynamics are used in combination with a road tracking system to achieve smooth results. Dynamic information can also be used alongside Kalman filtering [201] or particle filtering [207] to achieve smoother results.
Road and lane estimation is a well-researched field and many methods have already been integrated successfully into lane keeping assistance systems. However, most methods remain riddled with assumptions and limitations, and truly general systems which can handle complex road topologies have yet to be developed. Through standardized road maps which encode topology and emerging machine learning-based road and lane classification methods, robust systems for driving automation are slowly taking shape.

VI. ASSESSMENT
A robust ADS should constantly evaluate the overall risk level of the situation and predict the intentions of surrounding human drivers and pedestrians. A lack of an acute assessment mechanism can lead to accidents. This section discusses assessment under three subcategories: overall risk and uncertainty assessment, human driving behavior assessment, and driving style recognition.

A. RISK AND UNCERTAINTY ASSESSMENT
Overall assessment can be summarized as quantifying the uncertainties and the risk level of the driving scene. It is a promising methodology that can increase the safety of ADS pipelines [31].
Using Bayesian methods to quantify and measure the uncertainties of deep neural networks was proposed in [212]. A Bayesian deep learning architecture was designed for propagating uncertainty throughout an ADS pipeline, and its advantage over conventional approaches was shown in a hypothetical scenario [31]. In summary, each module conveys and accepts probability distributions instead of exact outcomes throughout the pipeline, which increases the overall robustness of the system.
An alternative approach is to assess the overall risk level of the driving scene separately, i.e., outside the pipeline. Sensory inputs were fed into a risk inference framework in [83], [213] to detect unsafe lane change events using Hidden Markov Models (HMMs) and language models. Recently, a deep spatiotemporal network that infers the overall risk level of a driving scene was introduced in [211]. An implementation of this method is available open-source.1 We employed this method to assess the risk level of a lane change as shown in Figure 14.
1 https://github.com/Ekim-Yurtsever/DeepTL-Lane-Change-Classification

B. SURROUNDING DRIVING BEHAVIOR ASSESSMENT
Understanding surrounding human driver intention is most relevant to medium- to long-term prediction and decision making.
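One common, lightweight way to obtain the per-module probability distributions mentioned above is Monte Carlo dropout: keeping dropout active at inference time and treating the spread of repeated forward passes as a proxy for predictive uncertainty. The PyTorch sketch below illustrates the idea on an invented two-class (safe/risky) head; it is not the Bayesian architecture of [31] or [212].

```python
import torch
import torch.nn as nn

# A small classifier head with dropout; keeping dropout stochastic at test time gives a
# cheap approximation of a distribution over predictions instead of a single point estimate.
model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(), nn.Dropout(p=0.3),
    nn.Linear(128, 2),          # hypothetical {safe, risky} output for an encoded scene descriptor
)

def predict_with_uncertainty(x, n_samples=30):
    model.train()               # keep dropout active on purpose
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.std(dim=0)    # predictive mean and spread per class

features = torch.rand(1, 64)                       # stand-in for an encoded driving-scene feature vector
mean, std = predict_with_uncertainty(features)
print("risk probability:", mean[0, 1].item(), "+/-", std[0, 1].item())
```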
FIGURE 14. Assessing the overall risk level of driving scenes. We employed an open-source1 deep spatiotemporal
video-based risk detection framework [211] to assess the image sequences shown in this figure.
TABLE 8. Driving style categorization of human-driven ego vehicles.
In order to increase the prediction horizon of surrounding object behavior, human traits should be considered and incorporated into the prediction and evaluation steps. Understanding surrounding driver intention from the perspective of an ADS is not a common practice in the field; as such, a state-of-the-art is not established yet.
In [214], a target vehicle's future behavior was predicted with a hidden Markov model (HMM) and the prediction time horizon was extended 56% by learning human driving traits. The proposed system tagged observations with predefined maneuvers. Then, the features of each type were learned in a data-centric manner with HMMs. Another learning-based approach was proposed in [215], where a Bayesian network classifier was used to predict maneuvers of individual drivers on highways. A framework for long-term driver behavior prediction using a combination of a hybrid state system and HMM was introduced in [216]. Surrounding vehicle information was integrated with ego-behavior through a symbolization framework in [83], [213]. Detecting dangerous cut-in maneuvers was achieved with an HMM framework that was trained on safe and dangerous data in [217]. Lane change events were predicted 1.3 seconds in advance with support vector machines (SVM) and Bayesian filters [218].
The main challenges are the short observation window for understanding the intention of humans and the real-time high-frequency computation requirements. ADSs can typically observe a surrounding vehicle only for seconds. Complicated driving behavior models that require longer observation periods cannot be utilized under these circumstances.

C. DRIVING STYLE RECOGNITION
In 2016, Google's self-driving car collided with an oncoming bus [8] during a lane change where it assumed that the bus driver was going to yield. However, the bus driver accelerated instead. This accident might have been prevented if the ADS had understood the bus driver's individual, unique driving style and predicted his behavior.
Driving style is a broad term without an established common definition. Furthermore, recognizing surrounding human driving styles is a severely understudied topic. However, thorough reviews of driving style categorization of human-driven ego vehicles can be found in [220] and in [221]. Readers are referred to these papers for a complete review. The remainder of this subsection gives a brief overview of human-driven ego vehicle-based driving style recognition.
Typically, driving style is defined with respect to either aggressiveness [222]–[226] or fuel consumption [227]–[231]. For example, [232] introduced a rule-based model that classified driving styles with respect to jerk. This model decides whether a maneuver is aggressive or calm by a set of rules and jerk thresholds. Drivers were categorized with respect to their average speed in [233]. In conventional methods, the total number and meaning of driving style classes are predefined beforehand. The vast majority of the driving style recognition literature uses two [83], [213], [222], [223], [227] or three [234]–[236] classes. Representing driving style in a continuous domain is uncommon, but there are some studies. In [237], driving style was depicted as a continuous value between −1 and +1, which stands for mild and active respectively. Details of classification methods are given in Table 8.
More recently, machine learning based approaches have been utilized for driving style recognition. Principal component analysis was used and five distinct driving classes were detected in an unsupervised manner in [238].
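An unsupervised pipeline of this kind (dimensionality reduction followed by clustering of trip-level statistics) can be sketched briefly. The scikit-learn example below uses invented per-trip features and two clusters purely for illustration; it mirrors the general PCA-plus-clustering idea rather than the exact method of [238] or [240].

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Hypothetical per-trip features: [mean speed, speed std, mean |longitudinal accel|, mean |jerk|].
calm = rng.normal([12.0, 2.0, 0.6, 0.3], [1.5, 0.4, 0.1, 0.05], size=(40, 4))
aggressive = rng.normal([17.0, 5.0, 1.8, 1.1], [1.5, 0.6, 0.3, 0.2], size=(40, 4))
trips = np.vstack([calm, aggressive])

embedded = PCA(n_components=2).fit_transform(trips)        # compress correlated trip features
styles = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedded)

print("cluster sizes:", np.bincount(styles))
```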
FIGURE 15. Global plan and the local paths. The annotated vector map shown in Figure 13 was utilized by the planner. We employed
OpenPlanner [219], which is a graph-based planner, to illustrate a typical planning approach.
GMM based driver model was used to identify individ- Route planning is formulated as finding the point-to-point
ual drivers with success in [241]. Car-following and pedal shortest path in a directed graph, and conventional meth-
operation behavior was investigated separately in the latter ods are examined under four categories in [248]. These are;
study. Another GMM based driving style recognition model goal-directed, separator-based, hierarchical and bounded-hop
was proposed for electric vehicle range prediction in [242]. techniques. A* search [249] is a standard goal-directed path
In [222], aggressive event detection with dynamic time warp- planning algorithm and used extensively in various fields for
ing was presented where the authors reported a high success almost 50 years.
score. Bayesian approaches were utilized in [243] for model- The main idea of separator-based techniques is to remove a
ing driving style on roundabouts and in [244] to asses criti- subset of vertices [250] or arcs from the graph and compute an
cal braking situations. Bag-of-words and K-means clustering overlay graph over it. Using the overlayed graph to calculate
was used to represent individual driving features in [245]. the shortest path results in faster queries.
A stacked autoencoder was used to extract unique driving Hierarchical techniques take advantage of the road hier-
signatures from different drivers, and then macro driving style archy. For example, the road hierarchy in the US can be
centroids were found with clustering [240]. Another autoen- listed from top to bottom as freeways, arterials, collectors and
coder network was used to extract road-type specific driving local roads respectively. For a route query, the importance
features [246]. Similarly, driving behavior was encoded in of hierarchy increases as the distance between origin and
a 3-channel RGB space with a deep sparse autoencoder to destination gets longer. The shortest path may not be the
visualize individual driving styles [247]. fastest nor the most desirable route anymore. Getting away
A successful integration of driving style recognition into a from the destination thus making the route a bit longer to
real world ADS pipeline is not reported yet. However, these take the closest highway ramp may result in faster travel time
studies are promising and point to a possible new direction in in comparison to following the shortest path of local roads.
ADS development. Contraction Hierarchies (CH) method was proposed in [251]
VII. PLANNING AND DECISION MAKING
Planning can be divided into two sub-tasks: global route planning and local path planning. Figure 15 illustrates a typical planning approach in detail. The remainder of this section gives a brief overview of the subject. For more information, studies such as [18], [23], [248] can be referred to.

A. GLOBAL PLANNING
The global planner is responsible for finding the route on the road network from the origin to the final destination. The user usually defines the final destination. Global navigation is a well-studied subject, and high performance has been an industry standard for more than a decade. Almost all modern production cars are equipped with navigation systems that utilize GPS and offline maps to plan a global route. With the road network represented as a directed graph with weighted edges, global routing amounts to finding the shortest path in a directed graph, and conventional methods are examined under four categories in [248]. These are goal-directed, separator-based, hierarchical and bounded-hop techniques. A* search [249] is a standard goal-directed path planning algorithm and has been used extensively in various fields for almost 50 years.

The main idea of separator-based techniques is to remove a subset of vertices [250] or arcs from the graph and compute an overlay graph over it. Using the overlay graph to calculate the shortest path results in faster queries.

Hierarchical techniques take advantage of the road hierarchy. For example, the road hierarchy in the US can be listed from top to bottom as freeways, arterials, collectors and local roads. For a route query, the importance of the hierarchy increases as the distance between origin and destination grows: the shortest path may no longer be the fastest or the most desirable route. Moving away from the destination, and thus making the route slightly longer, in order to reach the closest highway ramp can result in a shorter travel time than following the shortest path over local roads. The Contraction Hierarchies (CH) method was proposed in [251] for exploiting road hierarchy.

Precomputing distances between selected vertices and utilizing them at query time is the basis of bounded-hop techniques. Precomputed shortcuts can be utilized partly or exclusively for navigation. However, the naive approach of precomputing all possible routes between every pair of vertices is impractical for large networks. One possible solution is hub labeling (HL) [252]. This approach also requires preprocessing: a label associated with a vertex consists of nearby hub vertices and the distances to them, and these labels satisfy the condition that at least one shared hub vertex must exist between the labels of any two vertices. HL is the fastest query-time algorithm for route planning [248], at the expense of high storage usage.

A combination of the above algorithms is popular in state-of-the-art systems. For example, [253] combined a separator-based method with a bounded-hop method and created the Transit Node Routing with Arc Flags (TNR + AF) algorithm. Modern route planners can make a query in milliseconds.
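As a concrete illustration of the goal-directed family, the sketch below runs A* over a tiny weighted directed graph with a straight-line-distance heuristic. The graph, node coordinates and edge costs are made up for illustration; production route planners operate on road graphs with millions of vertices and the preprocessing-heavy techniques described above.

```python
# Minimal A* sketch for goal-directed routing on a weighted directed graph.
# The tiny graph, coordinates and heuristic are illustrative only.
import heapq, math

graph = {            # node -> list of (neighbor, edge cost)
    "A": [("B", 2.0), ("C", 5.0)],
    "B": [("C", 2.0), ("D", 4.0)],
    "C": [("D", 1.5)],
    "D": [],
}
coords = {"A": (0, 0), "B": (2, 0), "C": (3, 1), "D": (4, 0)}  # for the heuristic

def heuristic(u, goal):
    (x1, y1), (x2, y2) = coords[u], coords[goal]
    return math.hypot(x2 - x1, y2 - y1)   # admissible straight-line estimate

def a_star(start, goal):
    frontier = [(heuristic(start, goal), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, cost in graph[node]:
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + heuristic(nxt, goal), g2, nxt, path + [nxt]))
    return None, float("inf")

print(a_star("A", "D"))   # expected: (['A', 'B', 'C', 'D'], 5.5)
```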
B. LOCAL PLANNING
The objective of the local planner is to execute the global plan without failing. In other words, in order to complete its trip, the ADS must find trajectories that avoid obstacles and satisfy optimization criteria in the configuration space (C-space), given a starting point and a destination point. A detailed local planning review is presented in [19], where the taxonomy of motion planning is divided into four groups: graph-based planners, sampling-based planners, interpolating curve planners and numerical optimization approaches. After a summary of these conventional planners, the emerging deep learning-based planners are introduced at the end of this section. Table 9 gives a brief summary of local planning methods.

TABLE 9. Local planning techniques.
Graph-based local planners use the same techniques as graph-based global planners, such as Dijkstra [254] and A* [249], which output discrete paths rather than continuous ones. This can lead to jerky trajectories [19]. A more advanced graph-based planner is the state lattice algorithm. Like all graph-based methods, the state lattice discretizes the decision space. High-dimensional lattice nodes, which typically encode 2D position, heading and curvature [255], are used to create a grid first. Then, the connections between the nodes are precomputed with an inverse path generator to build the state lattice. During the planning phase, a cost function, which usually considers proximity to obstacles and deviation from the goal, is utilized to find the best path among the precomputed path primitives. State lattices can handle high dimensions and are well suited to local planning in dynamic environments; however, the computational load is high and the discretization resolution limits the planner's capacity [19].
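The cost-based selection over precomputed primitives can be made concrete with a small sketch: a handful of hypothetical path primitives are scored by obstacle proximity and deviation from the goal, and the cheapest one is selected. The primitives, obstacle layout and cost weights are made up for illustration and do not reflect a full lattice expansion.

```python
# Minimal sketch of lattice-style primitive selection: score a few
# precomputed path primitives by obstacle proximity and goal deviation.
# Primitives, obstacles and weights are illustrative assumptions.
import numpy as np

goal = np.array([10.0, 0.0])
obstacles = np.array([[4.0, 0.5], [7.0, -1.0]])   # 2D obstacle centers

# Each primitive: a short sequence of (x, y) points starting at the ego pose.
primitives = {
    "straight":     np.column_stack([np.linspace(0, 5, 20), np.zeros(20)]),
    "swerve_left":  np.column_stack([np.linspace(0, 5, 20), np.linspace(0, 1.5, 20)]),
    "swerve_right": np.column_stack([np.linspace(0, 5, 20), np.linspace(0, -1.5, 20)]),
}

def cost(path, w_obs=1.0, w_goal=0.2):
    # Obstacle term: inverse of the closest distance from any path point.
    d_obs = np.linalg.norm(path[:, None, :] - obstacles[None, :, :], axis=2).min()
    # Goal term: distance from the primitive's end point to the goal.
    d_goal = np.linalg.norm(path[-1] - goal)
    return w_obs / (d_obs + 1e-6) + w_goal * d_goal

best = min(primitives, key=lambda k: cost(primitives[k]))
print("selected primitive:", best)
```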
A detailed overview of Sampling Based Planning (SBP) methods can be found in [267]. In summary, SBP tries to build the connectivity of the C-space by randomly sampling paths in it. The Randomized Potential Planner (RPP) [256] is one of the earliest SBP approaches, where random walks are generated to escape local minima. The Probabilistic roadmap method (PRM) [259] and the rapidly-exploring random tree (RRT) [257] are the most commonly used SBP algorithms. PRM first samples the C-space during its learning phase and then makes a query with the predefined origin and destination points on the roadmap. RRT, on the other hand, is a single-query planner. The path between the start and goal configurations is built incrementally with random tree-like branches. RRT is faster than PRM, and both are probabilistically complete [257], which means that, given enough runtime, a path satisfying the given conditions is guaranteed to be found if one exists. RRT* [258], an extension of RRT, provides paths closer to optimal instead of completely random ones, at the cost of computational efficiency. The main disadvantage of SBP in general is, again, the jerky trajectories [19].
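A minimal 2D RRT sketch is given below, assuming a point robot in a square workspace with a single circular obstacle; collision checking, steering and the goal test are deliberately crude and only meant to show the incremental tree growth described above.

```python
# Minimal 2D RRT sketch: point robot, circular obstacle, straight-line steering.
# Workspace bounds, obstacle layout and step size are illustrative assumptions.
import math, random

random.seed(0)
OBSTACLES = [(5.0, 5.0, 1.5)]          # (x, y, radius)
START, GOAL = (1.0, 1.0), (9.0, 9.0)
STEP, GOAL_TOL, MAX_ITERS = 0.5, 0.5, 5000

def collision_free(p):
    return all(math.hypot(p[0] - ox, p[1] - oy) > r for ox, oy, r in OBSTACLES)

def rrt():
    nodes = [START]
    parent = {START: None}
    for _ in range(MAX_ITERS):
        # Occasionally sample the goal itself to bias growth toward it.
        sample = GOAL if random.random() < 0.05 else (random.uniform(0, 10), random.uniform(0, 10))
        nearest = min(nodes, key=lambda n: math.dist(n, sample))
        d = math.dist(nearest, sample)
        if d == 0.0:
            continue
        new = (nearest[0] + STEP * (sample[0] - nearest[0]) / d,
               nearest[1] + STEP * (sample[1] - nearest[1]) / d)
        if not collision_free(new):
            continue
        nodes.append(new)
        parent[new] = nearest
        if math.dist(new, GOAL) < GOAL_TOL:       # goal reached: backtrack the branch
            path, n = [], new
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1]
    return None

path = rrt()
print("path length:", len(path) if path else "not found")
```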
Interpolating curve planners fit a curve to a known set of points [19], e.g. way-points generated from the global plan or a discrete set of future points from another local planner. The main obstacle avoidance strategy is to interpolate new collision-free paths that first deviate from, and then re-enter, the initially planned trajectory. The new path is generated by fitting a curve to a new set of points: an exit point from the currently traversed trajectory, newly sampled collision-free points, and a re-entry point on the initial trajectory. The resultant trajectory is smooth; however, the computational load is usually higher compared to other methods. Commonly used curve families include clothoids [260], polynomials [261], Bezier curves [262] and splines [104].
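The sketch below fits a cubic spline through a handful of waypoints, parameterized by cumulative chord length, which is one common way to turn discrete points into a smooth reference path with well-defined headings. The waypoints are made up for illustration.

```python
# Minimal interpolating-curve sketch: cubic spline through 2D waypoints,
# parameterized by cumulative chord length. Waypoints are illustrative.
import numpy as np
from scipy.interpolate import CubicSpline

waypoints = np.array([[0.0, 0.0], [5.0, 1.0], [10.0, 0.5], [15.0, -1.0], [20.0, 0.0]])

# Chord-length parameterization keeps the spline well conditioned.
d = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(waypoints, axis=0), axis=1))])
sx, sy = CubicSpline(d, waypoints[:, 0]), CubicSpline(d, waypoints[:, 1])

s = np.linspace(0.0, d[-1], 200)               # dense, smooth reference path
path = np.column_stack([sx(s), sy(s)])
heading = np.arctan2(sy(s, 1), sx(s, 1))       # first derivatives give heading
print(path.shape, heading.shape)
```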
Optimization-based motion planners improve the quality of already existing paths with optimization functions. A* trajectories were optimized with numeric non-linear functions in [263]. The Potential Field Method (PFM) was improved in [264] by solving its inherent oscillation problem with Newton's method, obtaining C1 continuity.
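To illustrate the idea of refining an existing path with an optimization function, the sketch below smooths a jerky polyline by minimizing a weighted sum of squared second differences (a simple curvature proxy) and deviation from the original points, with the endpoints pinned. The objective form and weights are illustrative assumptions, not the formulations of [263] or [264].

```python
# Minimal optimization-based smoothing sketch: trade off fidelity to an
# existing (jerky) path against a squared-second-difference smoothness term.
# Objective form and weights are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

raw = np.array([[0, 0], [1, 0.8], [2, -0.5], [3, 0.9], [4, -0.3], [5, 0.0]], float)
w_smooth, w_fit = 5.0, 1.0

def objective(flat):
    p = flat.reshape(-1, 2)
    smooth = np.sum((p[:-2] - 2 * p[1:-1] + p[2:]) ** 2)   # curvature proxy
    fit = np.sum((p - raw) ** 2)                            # stay near the input
    return w_smooth * smooth + w_fit * fit

# Pin the first and last waypoint; let the optimizer move the interior ones.
bounds = ([(v, v) for v in raw[0]]
          + [(None, None)] * (2 * (len(raw) - 2))
          + [(v, v) for v in raw[-1]])
res = minimize(objective, raw.ravel(), method="L-BFGS-B", bounds=bounds)
print(np.round(res.x.reshape(-1, 2), 2))
```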
Recently, deep learning (DL) and reinforcement learning based local planners have started to emerge as an alternative. Fully convolutional 3D neural networks can generate future paths from sensory input such as lidar point clouds [265]. An interesting take on the subject is to segment image data with path proposals using a deep segmentation network [266]. Planning a safe path through occluded intersections was achieved in a simulation environment using deep reinforcement learning in [268]. The main difference between end-to-end driving and deep learning based local planners is the output: the former outputs direct vehicle control signals such as steering and pedal operation, whereas the latter generates a trajectory. This enables DL planners to be integrated into conventional pipelines [24].

Deep learning based planners are promising, but they are not widely used in real-world systems yet. The lack of hard-coded safety measures, generalization issues and the need for labeled data are some of the issues that need to be addressed.

VIII. HUMAN MACHINE INTERACTION
Vehicles communicate with their drivers/passengers through their HMI module. The nature of this communication greatly depends on the objective, which can be divided into two: primary driving tasks and secondary tasks. The interaction intensity of these tasks depends on the automation level. Whereas a manually operated, level zero conventional car requires constant user input for operation, a level five ADS may need user input only at the beginning of the trip. Furthermore, the purpose of the interaction affects its intensity: a shift from executing primary driving tasks to monitoring the automation process raises new HMI design requirements.

There are several investigations, such as [269], [270], of automotive HMI technologies, mostly from the distraction point of view. Manual user interfaces for secondary tasks are preferred over their visual counterparts [269]. The main reason is that vision is absolutely necessary, and has no alternative, for primary driving tasks. Visual interface interactions require glances with durations between 0.6 and 1.6 seconds, with a mean of 1.2 seconds [269]. As such, secondary task interfaces that require vision are distracting and detrimental to driving.

Auditory User Interfaces (AUIs) are good alternatives to visually taxing HMI designs. AUIs are omni-directional: even if the user is not attending, the auditory cues are hard to miss [271]. The main challenge of audio interaction is automatic speech recognition (ASR). ASR is a very mature field; however, the vehicle domain poses additional challenges, such as low performance caused by uncontrollable cabin conditions like wind and road noise [272]. Beyond simple voice commands, conversational natural language interaction with an ADS is still an unrealized concept with many unsolved challenges [273].

The biggest HMI challenge is at automation levels three and four. The user and the ADS need to have a mutual understanding; otherwise, they will not be able to grasp each other's intentions [270]. The transition from manual to automated driving, and vice versa, is prone to fail in the state-of-the-art. Recent research showed that drivers exhibit low cognitive load when monitoring automated driving compared to doing a secondary task [288]. Even though some experimental systems can recognize driver activity with a driver-facing camera based on head and eye tracking [289], and can prepare the driver for handover with visual and auditory cues [290] in simulation environments, a real-world system with an efficient handover interaction module does not exist at the moment. This is an open problem [291], and future research should focus on delivering better methods to inform and prepare the driver for easing the transition [41].

IX. DATASETS AND AVAILABLE TOOLS
A. DATASETS AND BENCHMARKS
Datasets are crucial for researchers and developers because most of the algorithms and tools have to be tested and trained before going on the road.

Typically, sensory inputs are fed into a stack of algorithms with various objectives. A common practice is to test and validate these functions separately on annotated datasets. For example, the output of cameras, 2D vision, can be fed into an object detection algorithm to detect surrounding vehicles and pedestrians. Then, this information can be used in another algorithm for planning purposes. Even though these two algorithms are connected in the stack of this example, the object detection part can be worked on and validated separately during the development process. Since computer vision is a well-studied field, there are annotated datasets specifically for object detection and tracking. The existence of these datasets accelerates the development process and enables interdisciplinary research teams to work with each other much more efficiently. For end-to-end driving, the dataset has to include additional ego-vehicle signals, chiefly steering and longitudinal control signals.

As learning approaches emerged, so did training datasets to support them. The PASCAL VOC dataset [292], which grew from 2005 to 2012, was one of the first datasets featuring a large amount of data with relevant classes for ADSs. However, the images often featured single objects, in scenes and scales that are not representative of what is encountered in driving scenarios. In 2012, the KITTI Vision Benchmark [177] remedied this situation by providing a relatively large amount of labeled driving scenes. It remains one of the most widely used datasets for applications related to driving automation. Yet, in terms of quantity of data and number of labeled classes, it is far inferior to generic image databases such as ImageNet [131] and COCO [152]. While no doubt useful for training, these generic image databases lack the adequate context to test the capabilities of an ADS. UC Berkeley DeepDrive [275] is a recent dataset with annotated image data. The Oxford RobotCar [53] was used to collect over 1000 km of driving data with six cameras, lidar, GPS and INS in the UK, but it is not annotated. ApolloScape is a very recent dataset that is not fully public yet [278]. Cityscapes [274] is commonly used as a benchmark for computer vision algorithms. Mapillary Vistas is a large image dataset with annotations [276]. The TorontoCity benchmark [286] is a very detailed dataset; however, it is not public yet. The nuScenes dataset is the most recent urban driving dataset with lidar and image sensors [178]. Comma.ai has released a part of their dataset [293], which includes 7.25 hours of driving. DDD17 [89] contains around 12 hours of driving data. The LiVi-Set [281] is a new dataset with lidar, image and driving behavior data. CommonRoad [294] is a new benchmark for motion planning.

Naturalistic driving data is another type of dataset that concentrates on the individual element of driving: the driver. SHRP2 [283] includes over 3000 volunteer participants' driving data over a 3-year collection period. Other naturalistic driving datasets are the 100-Car study [284], euroFOT [285] and NUDrive [282]. Table 10 shows a comparison of these datasets.

B. OPEN-SOURCE FRAMEWORKS AND SIMULATORS
Open source frameworks are very useful for both researchers and the industry. These frameworks can ''democratize'' ADS development. Autoware [122], Apollo [295], Nvidia DriveWorks [296] and openpilot [297] are amongst the most used software stacks capable of running an ADS platform in the real world. We utilized Autoware [122] to realize core automated driving functions in this study.

Simulations also have an important place in ADS development. Since instrumenting an experimental vehicle still has a high cost and experiments on public road networks are highly regulated, a simulation environment is beneficial for developing certain algorithms and modules before road tests. Furthermore, highly dangerous scenarios, such as a collision with a pedestrian, can be tested in simulations with ease. CARLA [164] is an urban driving simulator developed for this purpose. TORCS [298] was developed for race track simulation. Some researchers have even used computer games such as Grand Theft Auto V [299]. Gazebo [300] is a common simulation environment for robotics. For traffic simulations, SUMO [301] is a widely used open-source platform. Different concepts for integrating real-world measurements into the simulation environment were proposed in [302].
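For illustration, the snippet below shows the kind of minimal client script CARLA's Python API supports: connecting to a simulator, spawning a vehicle and letting the built-in autopilot drive it. The host, port, blueprint choice and loop length are assumptions about a default local setup, and a CARLA server is assumed to be running already.

```python
# Minimal CARLA client sketch (assumes a CARLA server is running locally on
# the default port and the carla Python package is installed).
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(5.0)
world = client.get_world()

# Pick any vehicle blueprint and a free spawn point from the current map.
blueprint = world.get_blueprint_library().filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(blueprint, spawn_point)

try:
    vehicle.set_autopilot(True)          # hand control to the traffic manager
    for _ in range(100):                 # observe the ego pose for a short while
        world.wait_for_tick()
        t = vehicle.get_transform()
        print(f"x={t.location.x:.1f}, y={t.location.y:.1f}")
finally:
    vehicle.destroy()
```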
X. CONCLUSIONS
In this survey on automated driving systems, we outlined some of the key innovations as well as existing systems. While the promise of automated driving is enticing and already marketed to consumers, this survey has shown that clear gaps remain in the research. Several architecture models have been proposed, from fully modular to completely end-to-end, each with their own shortcomings. The optimal sensing modality for localization, mapping and perception is still disagreed upon, algorithms still lack accuracy and efficiency, and the need for a proper online assessment has become apparent. Less-than-ideal road conditions are still an open problem, as is dealing with intemperate weather. Vehicle-to-vehicle communication is still in its infancy, while centralized, cloud-based information management has yet to be implemented due to the complex infrastructure required. Human-machine interaction is an under-researched field with many open problems.

The development of automated driving systems relies on the advancement of both scientific disciplines and new technologies. As such, we discussed the recent research developments that are likely to have a significant impact on automated driving technology, either by overcoming the weaknesses of previous methods or by proposing an alternative. This survey has shown that, through inter-disciplinary academic collaboration and support from industry and the general public, the remaining challenges can be addressed. With directed efforts towards ensuring robustness at all levels of automated driving systems, safe and efficient roads are just beyond the horizon.

REFERENCES
[1] S. Singh, ''Critical reasons for crashes investigated in the national motor vehicle crash causation survey,'' Washington, DC, USA, Tech. Rep. DOT HS 812 115, 2015.
[2] T. J. Crayton and B. M. Meier, ''Autonomous vehicles: Developing a public health research agenda to frame the future of transportation policy,'' J. Transp. Health, vol. 6, pp. 245–252, Sep. 2017.
[3] W. D. Montgomery, R. Mudge, E. L. Groshen, S. Helper, J. P. MacDuffie, and C. Carson, ''America's workforce and the self-driving future: Realizing productivity gains and spurring economic growth,'' Securing America's Future Energy, Washington, DC, USA, Tech. Rep., 2018.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ''ImageNet classification with deep convolutional neural networks,'' in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[5] B. Schwarz, ''LIDAR: Mapping the world in 3D,'' Nature Photon., vol. 4, no. 7, p. 429, 2010.
[6] S. Hecker, D. Dai, and L. Van Gool, ''End-to-end learning of driving models with surround-view cameras and route planners,'' in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 435–453.
[7] D. Lavrinc. This is How Bad Self-Driving Cars Suck in Rain. Accessed: Dec. 16, 2018. [Online]. Available: https://jalopnik.com/this-is-how-bad-self-driving-cars-suck-in-the-rain-1666268433
[8] A. Davies. Google’s Self-Driving Car Caused Its First Crash. Accessed: [30] C. Englund, L. Chen, J. Ploeg, E. Semsar-Kazerooni, A. Voronov,
Dec. 16, 2018. [Online]. Available: https://www.wired.com/2016/ H. H. Bengtsson, and J. Didoff, ‘‘The grand cooperative driving challenge
02/googles-self-driving-car-may-caused-first-crash/ 2016: Boosting the introduction of cooperative automated vehicles,’’
[9] M. McFarland. Who’s Responsible When an Autonomous Car IEEE Wireless Commun., vol. 23, no. 4, pp. 146–152, Aug. 2016.
Crashes? Accessed: Jun. 4, 2019. [Online]. Available: https://money. [31] R. McAllister, Y. Gal, A. Kendall, M. van der Wilk, A. Shah, R. Cipolla,
cnn.com/2016/07/07/technology/tesla-liability-risk/index.html and A. Weller, ‘‘Concrete problems for autonomous vehicle safety:
[10] T. B. Lee. Autopilot Was Active When a Tesla Crashed Into a Advantages of Bayesian deep learning,’’ in Proc. 26th Int. Joint Conf.
Truck, Killing Driver. Accessed: May 19, 2019. [Online]. Available: Artif. Intell., Aug. 2017, pp. 1–9.
https://arstechnica.com/cars/2019/05/feds-autopilot-was-active-during- [32] SAE, ‘‘Taxonomy and definitions for terms related to driving automation
deadly-march-tesla-crash/ systems for on-road motor vehicles,’’ SAE Tech. Paper J3016_201806,
[11] Eureka. E! 45: Programme For a European Traffic System With Highest 2016.
Efficiency and Unprecedented Safety. Accessed: May 19, 2019. [Online]. [33] DESA, UN, ‘‘World population prospects the 2017 revision key findings
Available: https://www.eurekanetwork.org/project/id/45 and advance tables,’’ in Proc. World Popul. Prospect., 2017, pp. 1–46.
[12] B. Ulmer, ‘‘VITA II-active collision avoidance in real traffic,’’ in Proc. [34] Deloitte. 2019 Deloitte Global Automotive Consumer Study—
Intell. Vehicles Symp., 1994, pp. 1–6. Advanced Vehicle Technologies and Multimodal Transportation,
[13] M. Buehler, K. Iagnemma, and S. Singh, Eds., The 2005 DARPA Grand
Global Focus Countries. Accessed: May 19, 2019. [Online]. Available:
Challenge: The Great Robot Race, vol. 36. Berlin, Germany: Springer,
https://www2.deloitte.com/content/dam/Deloitte/us/Documents/
2007.
manufacturing/us-global-automotive-consumer-study-2019.pdf
[14] D. Feng, C. Haase-Schuetz, L. Rosenbaum, H. Hertlein, C. Glaeser,
[35] Federatione Internationale de l’Automobile (FiA) Region 1.
F. Timm, W. Wiesbeck, and K. Dietmayer, ‘‘Deep multi-modal object
The Automotive Digital Transformation and the Economic Impacts
detection and semantic segmentation for autonomous driving: Datasets,
of Existing Data Access Models. Accessed: May 19, 2019. [Online].
methods, and challenges,’’ 2019, arXiv:1902.07830. [Online]. Available:
Available: https://www.fiaregion1.com/wp-content/uploads/2019/03/
http://arxiv.org/abs/1902.07830
[15] J. Levinson, J. Askeland, J. Becker, J. Dolson, D. Held, S. Kammel, The-Automotive-Digital-Transformation_Full-study.pdf
J. Z. Kolter, D. Langer, O. Pink, V. Pratt, M. Sokolsky, G. Stanek, [36] R. Rajamani, Vehicle Dynamics and Control, vol. 25. Berlin, Germany:
D. Stavens, A. Teichman, M. Werling, and S. Thrun, ‘‘Towards fully Springer, 2006, p. 471.
autonomous driving: Systems and algorithms,’’ in Proc. IEEE Intell. [37] M. R. Hafner, D. Cunningham, L. Caminiti, and D. Del Vecchio,
Vehicles Symp. (IV), Jun. 2011, pp. 163–168. ‘‘Cooperative collision avoidance at intersections: Algorithms and exper-
[16] M. Campbell, M. Egerstedt, J. P. How, and R. M. Murray, ‘‘Autonomous iments,’’ IEEE Trans. Intell. Transp. Syst., vol. 14, no. 3, pp. 1162–1175,
driving in urban environments: Approaches, lessons and challenges,’’ Sep. 2013.
Phil. Trans. Roy. Soc. A, Math., Phys. Eng. Sci., vol. 368, no. 1928, [38] A. Colombo and D. Del Vecchio, ‘‘Efficient algorithms for collision
pp. 4649–4672, Oct. 2010. avoidance at intersections,’’ in Proc. 15th ACM Int. Conf. Hybrid Syst.,
[17] S. Kuutti, S. Fallah, K. Katsaros, M. Dianati, F. Mccullough, and Comput. Control (HSCC), 2012, pp. 145–154.
A. Mouzakitis, ‘‘A survey of the state-of-the-art localization techniques [39] P. E. Ross, ‘‘The audi a8: The World’s first production car to achieve
and their potentials for autonomous vehicle applications,’’ IEEE Internet level 3 autonomy,’’ IEEE Spectr., vol. 1, 2017. [Online]. Available:
Things J., vol. 5, no. 2, pp. 829–846, Apr. 2018. https://spectrum.ieee.org/cars-that-think/transportation/self-driving/the-
[18] B. Paden, M. Cap, S. Z. Yong, D. Yershov, and E. Frazzoli, ‘‘A survey of audi-a8-the-worlds-first-production-car-to-achieve-level-3-autonomy
motion planning and control techniques for self-driving urban vehicles,’’ [40] C. Gold, M. Körber, D. Lechner, and K. Bengler, ‘‘Taking over control
IEEE Trans. Intell. Vehicles, vol. 1, no. 1, pp. 33–55, Mar. 2016. from highly automated vehicles in complex traffic situations: The role of
[19] D. Gonzalez, J. Perez, V. Milanes, and F. Nashashibi, ‘‘A review of motion traffic density,’’ Hum. Factors, J. Hum. Factors Ergonom. Soc., vol. 58,
planning techniques for automated vehicles,’’ IEEE Trans. Intell. Transp. no. 4, pp. 642–652, Jun. 2016.
Syst., vol. 17, no. 4, pp. 1135–1145, Apr. 2016. [41] N. Merat, A. H. Jamson, F. C. H. Lai, M. Daly, and O. M. J. Carsten,
[20] J. Van Brummelen, M. O’Brien, D. Gruyer, and H. Najjaran, ‘‘Transition to manual: Driver behaviour when resuming control from a
‘‘Autonomous vehicle perception: The technology of today and tomor- highly automated vehicle,’’ Transp. Res. F, Traffic Psychol. Behaviour,
row,’’ Transp. Res. C, Emerg. Technol., vol. 89, pp. 384–406, Apr. 2018. vol. 27, pp. 274–282, Nov. 2014.
[21] G. Bresson, Z. Alsayed, L. Yu, and S. Glaser, ‘‘Simultaneous localization [42] E. Ackerman, ‘‘Toyota’s gill pratt on self-driving cars and the
and mapping: A survey of current trends in autonomous driving,’’ IEEE reality of full autonomy,’’ IEEE Spectr., May 2017. [Online].
Trans. Intell. Vehicles, vol. 2, no. 3, pp. 194–220, Sep. 2017. Available: https://spectrum.ieee.org/cars-that-think/transportation/self-
[22] K. Abboud, H. A. Omar, and W. Zhuang, ‘‘Interworking of DSRC and driving/toyota-gill-pratt-on-the-reality-of-full-autonomy
cellular network technologies for V2X communications: A survey,’’ IEEE [43] J. D’Onfro. ‘I Hate Them’: Locals Reportedly Are Frustrated With
Trans. Veh. Technol., vol. 65, no. 12, pp. 9457–9470, Dec. 2016. Alphabet’s Self-Driving Cars. Accessed: May 19, 2019. [Online].
[23] C. Badue, R. Guidolini, R. Vivacqua Carneiro, P. Azevedo, V. B. Cardoso, Available: https://www.cnbc.com/2018/08/28/locals-reportedly-
A. Forechi, L. Jesus, R. Berriel, T. Paixão, F. Mutz, L. Veronese, frustrated-with-alphabets-waymo-self-driving-cars.html
T. Oliveira-Santos, and A. F. De Souza, ‘‘Self-driving cars: A survey,’’ [44] J.-F. Bonnefon, A. Shariff, and I. Rahwan, ‘‘The social dilemma of
2019, arXiv:1901.04407. [Online]. Available: http://arxiv.org/ autonomous vehicles,’’ Science, vol. 352, no. 6293, pp. 1573–1576,
abs/1901.04407 Jun. 2016.
[24] W. Schwarting, J. Alonso-Mora, and D. Rus, ‘‘Planning and decision-
[45] J.-F. Bonnefon, A. Shariff, and I. Rahwan, ‘‘The social dilemma of
making for autonomous vehicles,’’ Annu. Rev. Control, Robot., Auto.
autonomous vehicles,’’ 2015, arXiv:1510.03346. [Online]. Available:
Syst., vol. 1, no. 1, pp. 187–210, May 2018.
[25] S. Lefèvre, D. Vasquez, and C. Laugier, ‘‘A survey on motion prediction http://arxiv.org/abs/1510.03346
and risk assessment for intelligent vehicles,’’ ROBOMECH J., vol. 1, [46] Y. Tian, K. Pei, S. Jana, and B. Ray, ‘‘DeepTest: Automated testing of
no. 1, p. 1, Dec. 2014. deep-neural-network-driven autonomous cars,’’ in Proc. 40th Int. Conf.
[26] M. Buehler, K. Iagnemma, and S. Singh, Eds., The DARPA Urban Chal- Softw. Eng. (ICSE), 2018, pp. 303–314.
lenge: Autonomous Vehicles in City Traffic, vol. 56. Berlin, Germany: [47] C. Urmson et al., ‘‘Autonomous driving in urban environments: Boss
Springer, 2009. and the urban challenge,’’ J. Field Robot., vol. 25, no. 8, pp. 425–466,
[27] A. Broggi, P. Cerri, M. Felisa, M. C. Laghi, L. Mazzei, and P. P. Porta, 2008.
‘‘The VisLab intercontinental autonomous challenge: An extensive test [48] M. Gerla, E.-K. Lee, G. Pau, and U. Lee, ‘‘Internet of vehicles:
for a platoon of intelligent vehicles,’’ Int. J. Vehicle Auto. Syst., vol. 10, From intelligent grid to autonomous cars and vehicular clouds,’’
no. 3, p. 147, 2012. in Proc. IEEE World Forum Internet Things (WF-IoT), Mar. 2014,
[28] A. Broggi, P. Cerri, S. Debattisti, M. Chiara Laghi, P. Medici, D. Molinari, pp. 241–246.
M. Panciroli, and A. Prioletti, ‘‘PROUD—Public road urban driverless- [49] E.-K. Lee, M. Gerla, G. Pau, U. Lee, and J.-H. Lim, ‘‘Internet of vehicles:
car test,’’ IEEE Trans. Intell. Transp. Syst., vol. 16, no. 6, pp. 3508–3519, From intelligent grid to autonomous cars and vehicular fogs,’’ Int. J. Dis-
Dec. 2015. trib. Sensor Netw., vol. 12, no. 9, Sep. 2016, Art. no. 155014771666550.
[29] P. Cerri, G. Soprani, P. Zani, J. Choi, J. Lee, D. Kim, K. Yi, and A. Broggi, [50] M. Amadeo, C. Campolo, and A. Molinaro, ‘‘Information-centric net-
‘‘Computer vision at the Hyundai autonomous challenge,’’ in Proc. 14th working for connected vehicles: A survey and future perspectives,’’ IEEE
Int. IEEE Conf. Intell. Transp. Syst. (ITSC), Oct. 2011, pp. 777–783. Commun. Mag., vol. 54, no. 2, pp. 98–104, Feb. 2016.
[51] J. Wei, J. M. Snider, J. Kim, J. M. Dolan, R. Rajkumar, and B. Litkouhi, [75] H. T. Cheng, H. Shan, and W. Zhuang, ‘‘Infotainment and road safety
‘‘Towards a viable autonomous driving research platform,’’ in Proc. IEEE service support in vehicular networking: From a communication per-
Intell. Vehicles Symp. (IV), Jun. 2013, pp. 763–770. spective,’’ Mech. Syst. Signal Process., vol. 25, no. 6, pp. 2020–2038,
[52] A. Broggi, M. Buzzoni, S. Debattisti, P. Grisleri, M. C. Laghi, P. Medici, Aug. 2011.
and P. Versari, ‘‘Extensive tests of autonomous driving technolo- [76] J. Wang, Y. Shao, Y. Ge, and R. Yu, ‘‘A survey of vehicle to everything
gies,’’ IEEE Trans. Intell. Transp. Syst., vol. 14, no. 3, pp. 1403–1415, (V2X) testing,’’ Sensors, vol. 19, no. 2, p. 334, 2019.
Sep. 2013. [77] C. H. Jang, C. S. Kim, K. C. Jo, and M. Sunwoo, ‘‘Design factor
[53] W. Maddern, G. Pascoe, C. Linegar, and P. Newman, ‘‘1 year, 1000 km: optimization of 3D flash lidar sensor based on geometrical model for
The Oxford RobotCar dataset,’’ Int. J. Robot. Res., vol. 36, no. 1, automated vehicle and advanced driver assistance system applications,’’
pp. 3–15, Jan. 2017. Int. J. Automot. Technol., vol. 18, no. 1, pp. 147–156, Feb. 2017.
[54] N. Akai, L. Y. Morales, T. Yamaguchi, E. Takeuchi, Y. Yoshihara, [78] A. I. Maqueda, A. Loquercio, G. Gallego, N. Garcia, and D. Scaramuzza,
H. Okuda, T. Suzuki, and Y. Ninomiya, ‘‘Autonomous driving based on ‘‘Event-based vision meets deep learning on steering prediction for self-
accurate localization using multilayer LiDAR and dead reckoning,’’ in driving cars,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit.,
Proc. IEEE 20th Int. Conf. Intell. Transp. Syst. (ITSC), Oct. 2017, pp. 1–6. Jun. 2018, pp. 5419–5427.
[55] E. Guizzo, ‘‘How Google’s self-driving car works,’’ IEEE Spectr., vol. 18, [79] C. Fries and H.-J. Wuensche, ‘‘Autonomous convoy driving by night:
no. 7, pp. 1132–1141, Oct. 2011. The vehicle tracking system,’’ in Proc. IEEE Int. Conf. Technol. Practical
[56] H. Somerville, P. Lienert, and A. Sage. (Mar. 2018). Uber’s Use Robot Appl. (TePRA), May 2015, pp. 1–6.
of Fewer Safety Sensors Prompts Questions After Arizona Crash. [80] Q. Ha, K. Watanabe, T. Karasawa, Y. Ushiku, and T. Harada, ‘‘MFNet:
Business News, Reuters. Accessed: Dec. 16, 2018. [Online]. Towards real-time semantic segmentation for autonomous vehicles with
Available: https://www.reuters.com/article/us-uber-selfdriving-sensors- multi-spectral scenes,’’ in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst.
insight/ubers-use-of-fewer-safety-sensors-prompts-questions-after- (IROS), Sep. 2017, pp. 5108–5115.
arizona-crash-idUSKBN1H337Q [81] T. B. Lee. How 10 Leading Companies Are Trying To Make Powerful,
[57] J. Ziegler et al., ‘‘Making Bertha drive—An autonomous journey on a Low-Cost Lidar. Accessed: May 19, 2019. [Online]. Available:
historic route,’’ IEEE Intell. Transp. Syst. Mag., vol. 6, no. 2, pp. 8–20, https://arstechnica.com/cars/2019/02/the-ars-technica-guide-to-the-
Summer 2014. lidar-industry/
[58] Baidu. Apollo Auto. Accessed: May 1, 2019. [Online]. Available: [82] S. Kumar, L. Shi, N. Ahmed, S. Gil, D. Katabi, and D. Rus, ‘‘CarSpeak:
https://github.com/ApolloAuto/apollo A content-centric network for autonomous driving,’’ in Proc. ACM SIG-
[59] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, ‘‘DeepDriving: Learning COMM Conf. Appl., Technol., Archit., Protocols Comput. Commun.,
affordance for direct perception in autonomous driving,’’ in Proc. IEEE 2012, pp. 259–270.
Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 2722–2730. [83] E. Yurtsever, S. Yamazaki, C. Miyajima, K. Takeda, M. Mori, K. Hitomi,
[60] D. A. Pomerleau, ‘‘ALVINN: An autonomous land vehicle in a neural and M. Egawa, ‘‘Integrating driving behavior and traffic context through
network,’’ in Proc. Adv. Neural Inf. Process. Syst., 1989, pp. 305–313. signal symbolization for data reduction and risky lane change detection,’’
[61] U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun, ‘‘Off-road IEEE Trans. Intell. Vehicles, vol. 3, no. 3, pp. 242–253, Sep. 2018.
obstacle avoidance through end-to-end learning,’’ in Proc. Adv. Neural [84] M. Gerla, ‘‘Vehicular cloud computing,’’ in Proc. 11th Annu. Medit. Ad
Inf. Process. Syst., 2006, pp. 739–746. Hoc Netw. Workshop (Med-Hoc-Net), 2012, pp. 152–155.
[62] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, [85] M. Whaiduzzaman, M. Sookhak, A. Gani, and R. Buyya, ‘‘A sur-
P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, vey on vehicular cloud computing,’’ J. Netw. Comput. Appl., vol. 40,
J. Zhao, and K. Zieba, ‘‘End to end learning for self-driving cars,’’ 2016, pp. 325–344, Apr. 2014.
arXiv:1604.07316. [Online]. Available: http://arxiv.org/abs/1604.07316 [86] I. Din, B.-S. Kim, S. Hassan, M. Guizani, M. Atiquzzaman, and
[63] H. Xu, Y. Gao, F. Yu, and T. Darrell, ‘‘End-to-end learning of driving mod- J. Rodrigues, ‘‘Information-centric network-based vehicular communica-
els from large-scale video datasets,’’ 2017, arXiv:1612.01079. [Online]. tions: Overview and research opportunities,’’ Sensors, vol. 18, no. 11,
Available: http://arxiv.org/abs/1612.01079 p. 3957, 2018.
[64] A. Sallab, M. Abdou, E. Perot, and S. Yogamani, ‘‘Deep reinforcement [87] A. Saxena, S. H. Chung, and A. Y. Ng, ‘‘Learning depth from sin-
learning framework for autonomous driving,’’ Electron. Imag., vol. 2017, gle monocular images,’’ in Proc. Adv. Neural Inf. Process. Syst., 2006,
no. 19, pp. 70–76, Jan. 2017. pp. 1161–1168.
[65] A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J.-M. Allen, [88] J. Janai, F. Güney, A. Behl, and A. Geiger, ‘‘Computer vision for
V.-D. Lam, A. Bewley, and A. Shah, ‘‘Learning to drive in a day,’’ 2018, autonomous vehicles: Problems, datasets and state of the art,’’ Apr. 2017,
arXiv:1807.00412. [Online]. Available: http://arxiv.org/abs/1807.00412 arXiv:1704.05519. [Online]. Available: https://arxiv.org/abs/1704.05519
[66] S. Baluja, ‘‘Evolution of an artificial neural network based autonomous [89] J. Binas, D. Neil, S.-C. Liu, and T. Delbruck, ‘‘DDD17: End-to-end
land vehicle controller,’’ IEEE Trans. Syst., Man Cybern. B, Cybern., DAVIS driving dataset,’’ 2017, arXiv:1711.01458. [Online]. Available:
vol. 26, no. 3, pp. 450–463, Jun. 1996. http://arxiv.org/abs/1711.01458
[67] J. Koutník, G. Cuccu, J. Schmidhuber, and F. Gomez, ‘‘Evolving [90] P. Lichtsteiner, C. Posch, and T. Delbruck, ‘‘A 128 × 128 120 dB 15 µs
large-scale neural networks for vision-based reinforcement learning,’’ in latency asynchronous temporal contrast vision sensor,’’ IEEE J. Solid-
Proc. 15th Annu. Conf. Genetic Evol. Comput. Conf. (GECCO), 2013, State Circuits, vol. 43, no. 2, pp. 566–576, Feb. 2008.
pp. 1061–1068. [91] B. Schoettle, ‘‘Sensor fusion: A comparison of sensing capabilities
[68] S. Behere and M. Torngren, ‘‘A functional architecture for autonomous of human drivers and highly automated vehicles,’’ Sustain.
driving,’’ in Proc. 1st Int. Workshop Automot. Softw. Archit. (WASA), Worldwide Transp., Univ. Michigan, Ann Arbor, MI, USA,
May 2015, pp. 3–10. Tech. Rep. SWT-2017-12, Aug. 2017.
[69] L. Chi and Y. Mu, ‘‘Deep steering: Learning end-to-end driving [92] Tesla Motors. Autopilot Press Kit. Accessed: Dec. 16, 2018. [Online].
model from spatial and temporal visual cues,’’ 2017, arXiv:1708.03798. Available: https://www.tesla.com/presskit/autopilot#autopilot
[Online]. Available: http://arxiv.org/abs/1708.03798 [93] SXSW Interactive. (2016). Chris Urmson Explains Google Self-
[70] J.-P. Laumond, Robot Motion Planning and Control, vol. 229. Berlin, Driving Car Project. Accessed: Dec. 16, 2018. [Online]. Available:
Germany: Springer, 1998. https://www.sxsw.com/interactive/2016/chris-urmson-explain-googles-
[71] R. Jain, R. Kasturi, and B. G. Schunck, Machine Vision, vol. 5. New York, self-driving-car-project/
NY, USA: McGraw-Hill, 1995. [94] M. A. Al-Khedher, ‘‘Hybrid GPS-GSM localization of automo-
[72] S. J. Anderson, S. B. Karumanchi, and K. Iagnemma, ‘‘Constraint-based bile tracking system,’’ 2012, arXiv:1201.2630. [Online]. Available:
planning and control for safe, semi-autonomous operation of vehicles,’’ http://arxiv.org/abs/1201.2630
in Proc. IEEE Intell. Vehicles Symp., Jun. 2012, pp. 383–388. [95] K. S. Chong and L. Kleeman, ‘‘Accurate odometry and error modelling
[73] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, for a mobile robot,’’ in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), vol. 4,
M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, Apr. 1997, pp. 2783–2788.
G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, [96] C. Urmson, J. Anhalt, M. Clark, T. Galatali, J. P. Gonzalez, J. Gowdy,
D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, ‘‘Human-level A. Gutierrez, S. Harbaugh, M. Johnson-Roberson, H. Kato, P. L. Koon,
control through deep reinforcement learning,’’ Nature, vol. 518, K. Peterson, B. K. Smith, S. Spiker, E. Tryzelaar, and W. L. Whittaker,
no. 7540, pp. 529–533, Feb. 2015. ‘‘High speed navigation of unrehearsed terrain: Red team technology for
[74] D. Floreano, P. Dürr, and C. Mattiussi, ‘‘Neuroevolution: From architec- grand challenge 2004,’’ Robot. Inst., Carnegie Mellon Univ., Pittsburgh,
tures to learning,’’ Evol. Intell., vol. 1, no. 1, pp. 47–62, Mar. 2008. PA, USA, Tech. Rep. CMU-RI-TR-04-37, 2004.
[97] T. Bailey and H. Durrant-Whyte, ‘‘Simultaneous localization and map- [119] R. W. Wolcott and R. M. Eustice, ‘‘Fast LIDAR localization using mul-
ping (SLAM): Part II,’’ IEEE Robot. Autom. Mag., vol. 13, no. 3, tiresolution Gaussian mixture maps,’’ in Proc. IEEE Int. Conf. Robot.
pp. 108–117, Sep. 2006. Autom. (ICRA), May 2015, pp. 2814–2821.
[98] A. Hata and D. Wolf, ‘‘Road marking detection using LIDAR reflec- [120] M. Magnusson, A. Nuchter, C. Lorken, A. J. Lilienthal, and J. Hertzberg,
tive intensity data and its application to vehicle localization,’’ in ‘‘Evaluation of 3D registration reliability and speed—A comparison of
Proc. 17th Int. IEEE Conf. Intell. Transp. Syst. (ITSC), Oct. 2014, ICP and NDT,’’ in Proc. IEEE Int. Conf. Robot. Autom., May 2009,
pp. 584–589. pp. 3907–3912.
[99] T. Ort, L. Paull, and D. Rus, ‘‘Autonomous vehicle navigation in rural [121] R. Valencia, J. Saarinen, H. Andreasson, J. Vallvé, J. Andrade-Cetto,
environments without detailed prior maps,’’ in Proc. IEEE Int. Conf. and A. J. Lilienthal, ‘‘Localization in highly dynamic environments
Robot. Autom. (ICRA), May 2018, pp. 2040–2047. using dual-timescale NDT-MCL,’’ in Proc. IEEE Int. Conf. Robot.
[100] J. Levinson and S. Thrun, ‘‘Robust vehicle localization in urban environ- Automat. (ICRA), May/Jun. 2014, pp. 3956–3962.
ments using probabilistic maps,’’ in Proc. IEEE Int. Conf. Robot. Autom., [122] S. Kato, E. Takeuchi, Y. Ishiguro, Y. Ninomiya, K. Takeda, and
May 2010, pp. 4372–4378. T. Hamada, ‘‘An open approach to autonomous vehicles,’’ IEEE Micro,
[101] E. Takeuchi and T. Tsubouchi, ‘‘A 3-D scan matching using improved vol. 35, no. 6, pp. 60–68, Dec. 2015.
3-D normal distributions transform for mobile robotic mapping,’’ [123] R. W. Wolcott and R. M. Eustice, ‘‘Visual localization within LIDAR
in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 2006, maps for automated urban driving,’’ in Proc. IEEE/RSJ Int. Conf. Intell.
pp. 3068–3073. Robots Syst., Sep. 2014, pp. 176–183.
[102] S. E. Shladover, ‘‘PATH at 20—History and major milestones,’’ IEEE [124] C. Mcmanus, W. Churchill, A. Napier, B. Davis, and P. Newman,
Trans. Intell. Transp. Syst., vol. 8, no. 4, pp. 584–592, Dec. 2007. ‘‘Distraction suppression for vision-based pose estimation at
[103] A. Alam, B. Besselink, V. Turri, J. MåRtensson, and K. H. Johansson, city scales,’’ in Proc. IEEE Int. Conf. Robot. Autom., May 2013,
‘‘Heavy-duty vehicle platooning for sustainable freight transportation: pp. 3762–3769.
A cooperative method to enhance safety and efficiency,’’ IEEE Control [125] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, ‘‘Inception-
Syst. Mag., vol. 35, no. 6, pp. 34–56, Dec. 2015. v4, inception-ResNet and the impact of residual connections on
[104] C. Bergenhem, S. Shladover, E. Coelingh, C. Englund, S. Shladover, and learning,’’ Feb. 2016, arXiv:1602.07261. [Online]. Available: https://
S. Tsugawa, ‘‘Overview of platooning systems,’’ in Proc. 19th ITS World arxiv.org/abs/1602.07261
Congr., Vienna, Austria, Oct. 2012, pp. 1–8. [126] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for
[105] E. Chan, ‘‘Sartre automated platooning vehicles,’’ Towards Innov. Freight
image recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
Logistics, vol. 2, pp. 137–150, May 2016.
(CVPR), Jun. 2016, pp. 770–778.
[106] A. Keymasi Khalaji and S. A. A. Moosavian, ‘‘Robust adaptive controller
[127] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, ‘‘Densely
for a tractor-trailer mobile robot,’’ IEEE/ASME Trans. Mechatronics,
connected convolutional networks,’’ Aug. 2016, arXiv:1608.06993.
vol. 19, no. 3, pp. 943–953, Jun. 2014.
[Online]. Available: https://arxiv.org/abs/1608.06993
[107] J. Cheng, B. Wang, and Y. Xu, ‘‘Backward path tracking control for
[128] J. Redmon and A. Farhadi, ‘‘YOLOv3: An incremental improve-
mobile robot with three trailers,’’ in Proc. Int. Conf. Neural Inf. Process.
ment,’’ Apr. 2018, arXiv:1804.02767. [Online]. Available: https://arxiv.
Cham, Switzerland: Springer, Nov. 2017, pp. 32–41.
[108] M. Hejase, J. Jing, J. M. Maroli, Y. Bin Salamah, L. Fiorentini, org/abs/1804.02767
and U. Ozguner, ‘‘Constrained backward path tracking control using a [129] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan,
plug-in jackknife prevention system for autonomous tractor-trailers,’’ V. Vanhoucke, and A. Rabinovich, ‘‘Going deeper with convolutions,’’
in Proc. 21st Int. Conf. Intell. Transp. Syst. (ITSC), Nov. 2018, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015,
pp. 2012–2017. pp. 1–9.
[109] F. Zhang, H. Stahle, G. Chen, C. C. C. Simon, C. Buckl, and A. Knoll, [130] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for
‘‘A sensor fusion approach for localization with cumulative error elimina- large-scale image recognition,’’ Apr. 2015, arXiv:1409.1556. [Online].
tion,’’ in Proc. IEEE Int. Conf. Multisensor Fusion Integr. for Intell. Syst. Available: https://arxiv.org/abs/1409.1556
(MFI), Sep. 2012, pp. 1–6. [131] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ‘‘ImageNet:
[110] W.-W. Kao, ‘‘Integration of GPS and dead-reckoning navigation sys- A large-scale hierarchical image database,’’ in Proc. IEEE Conf. Comput.
tems,’’ in Proc. Vehicle Navigat. Inf. Syst. Conf., vol. 2, Oct. 1991, Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
pp. 635–643. [132] A. Andreopoulos and J. K. Tsotsos, ‘‘50 years of object recognition:
[111] J. Levinson, M. Montemerlo, and S. Thrun, ‘‘Map-based precision vehicle Directions forward,’’ Comput. Vis. Image Understand., vol. 117, no. 8,
localization in urban environments,’’ in Robotics: Science and Systems pp. 827–891, Aug. 2013.
III, W. Burgard, O. Brock, and C. Stachniss, Eds. Cambridge, MA, USA: [133] Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, ‘‘Object detection with deep
MIT Press, 2007, ch. 16, pp. 4372–4378. learning: A review,’’ IEEE Trans. Neural Netw. Learn. Syst., vol. 30,
[112] A. Ranganathan, D. Ilstrup, and T. Wu, ‘‘Light-weight localization for no. 11, pp. 3212–3232, Nov. 2019.
vehicles using road markings,’’ in Proc. IEEE/RSJ Int. Conf. Intell. Robots [134] L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, and
Syst., Nov. 2013, pp. 921–927. M. Pietikäinen, ‘‘Deep learning for generic object detection:
[113] J. Leonard, J. How, S. Teller, M. Berger, S. Campbell, G. Fiore, A survey,’’ Sep. 2018, arXiv:1809.02165. [Online]. Available:
L. Fletcher, E. Frazzoli, A. Huang, S. Karaman, O. Koch, Y. Kuwata, https://arxiv.org/abs/1809.02165
D. Moore, E. Olson, S. Peters, J. Teo, R. Truax,and M. Walter, [135] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, ‘‘You only look once:
‘‘A perception-driven autonomous urban vehicle,’’ J. Field Robot., Unified, real-time object detection,’’ in Proc. IEEE Conf. Comput. Vis.
vol. 25, no. 10, pp. 727–774, 2008. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788.
[114] N. Akai, L. Y. Morales, E. Takeuchi, Y. Yoshihara, and Y. Ninomiya, [136] J. Redmon and A. Farhadi, ‘‘YOLO9000: Better, faster, stronger,’’ in
‘‘Robust localization using 3D NDT scan matching with experimentally Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017,
determined uncertainty and road marker matching,’’ in Proc. IEEE Intell. pp. 6517–6525.
Vehicles Symp. (IV), Jun. 2017, pp. 1356–1363. [137] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu,
[115] J. K. Suhr, J. Jang, D. Min, and H. G. Jung, ‘‘Sensor fusion-based low- and A. C. Berg, ‘‘SSD: Single shot MultiBox detector,’’ Dec. 2015,
cost vehicle localization system for complex urban environments,’’ IEEE arXiv:1512.02325. [Online]. Available: https://arxiv.org/abs/1512.02325
Trans. Intell. Transp. Syst., vol. 18, no. 5, pp. 1078–1086, May 2017. [138] K. He, G. Gkioxari, P. Dollár, and R. Girshick, ‘‘Mask R-CNN,’’ in Proc.
[116] D. Gruyer, R. Belaroussi, and M. Revilloud, ‘‘Accurate lateral positioning IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2980–2988.
from map data and road marking detection,’’ Expert Syst. Appl., vol. 43, [139] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam,
pp. 1–8, Jan. 2016. ‘‘Encoder-decoder with atrous separable convolution for semantic
[117] X. Qu, B. Soheilian, and N. Paparoditis, ‘‘Vehicle localization using image segmentation,’’ Feb. 2018, arXiv:1802.02611. [Online]. Available:
mono-camera and geo-referenced traffic signs,’’ in Proc. IEEE Intell. https://arxiv.org/abs/1802.02611
Vehicles Symp. (IV), Jun. 2015, pp. 605–610. [140] Y. Yan, Y. Mao, and B. Li, ‘‘SECOND: Sparsely embedded convolutional
[118] M. Magnusson, ‘‘The three-dimensional normal-distributions detection,’’ Sensors, vol. 18, no. 10, p. 3337, Oct. 2018.
transform—An efficient representation for registration, surface analysis, [141] C. Geyer and K. Daniilidis, ‘‘A unifying theory for central panoramic
and loop detection,’’ Ph.D. dissertation, Graduate School Robot. systems and practical implications,’’ in Computer Vision—ECCV. Berlin,
Automat. Process Control, Örebro Univ., Örebro, Sweden, 2009. Germany: Springer, 2000, pp. 445–461.
[142] D. Scaramuzza, A. Martinelli, and R. Siegwart, ‘‘A toolbox for easily [163] J. Lambert, L. Liang, Y. Morales, N. Akai, A. Carballo, E. Takeuchi,
calibrating omnidirectional cameras,’’ in Proc. IEEE/RSJ Int. Conf. Intell. P. Narksri, S. Seiya, and K. Takeda, ‘‘Tsukuba challenge 2017 dynamic
Robots Syst., Oct. 2006, pp. 5695–5701. object tracks dataset for pedestrian behavior analysis,’’ J. Robot.
[143] D. Scaramuzza and R. Siegwart, ‘‘Appearance-guided monocular omni- Mechtron., vol. 30, no. 4, pp. 598–612, Aug. 2018.
directional visual odometry for outdoor ground vehicles,’’ IEEE Trans. [164] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, ‘‘CARLA:
Robot., vol. 24, no. 5, pp. 1015–1026, Oct. 2008. An open urban driving simulator,’’ 2017, arXiv:1711.03938. [Online].
[144] M. Schonbein and A. Geiger, ‘‘Omnidirectional 3D reconstruction in aug- Available: http://arxiv.org/abs/1711.03938
mented manhattan worlds,’’ in Proc. IEEE/RSJ Int. Conf. Intell. Robots [165] S. Song and J. Xiao, ‘‘Sliding shapes for 3D object detection in depth
Syst., Sep. 2014, pp. 716–723. images,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV). Cham, Switzerland:
[145] G. Gallego, T. Delbruck, G. Orchard, C. Bartolozzi, B. Taba, Springer, 2014, pp. 634–651.
A. Censi, S. Leutenegger, A. Davison, J. Conradt, K. Daniilidis, [166] D. Z. Wang and I. Posner, ‘‘Voting for voting in online point cloud object
and D. Scaramuzza, ‘‘Event-based vision: A survey,’’ Apr. 2019, detection,’’ in Proc. Robot., Sci. Syst., Jul. 2015, pp. 1–9.
arXiv:1904.08405. [Online]. Available: https://arxiv.org/abs/1904.08405 [167] Y. Zhou and O. Tuzel, ‘‘VoxelNet: End-to-end learning for point cloud
[146] R. H. Rasshofer and K. Gresser, ‘‘Automotive radar and lidar systems based 3D object detection,’’ Nov. 2017, arXiv:1711.06396. [Online].
for next generation driver assistance functions,’’ Adv. Radio Sci., vol. 3, Available: https://arxiv.org/abs/1711.06396
pp. 205–209, May 2005. [168] X. Chen, K. Kundu, Y. Zhu, A. G. Berneshawi, H. Ma, S. Fidler, and
[147] P. Radecki, M. Campbell, and K. Matzen, ‘‘All weather perception: R. Urtasun, ‘‘3D object proposals for accurate object class detection,’’
Joint data association, tracking, and classification for autonomous in Advances in Neural Information Processing Systems 28, C. Cortes,
ground vehicles,’’ May 2016, arXiv:1605.02196. [Online]. Available: N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds. New York,
https://arxiv.org/abs/1605.02196 NY, USA: Curran Associates, Inc., 2015, pp. 424–432.
[148] P. Hurney, F. Morgan, M. Glavin, E. Jones, and P. Waldron, ‘‘Review of [169] D. Lin, S. Fidler, and R. Urtasun, ‘‘Holistic scene understanding for 3D
pedestrian detection techniques in automotive far-infrared video,’’ IET object detection with RGBD cameras,’’ in Proc. IEEE Int. Conf. Comput.
Intell. Transp. Syst., vol. 9, no. 8, pp. 824–832, Oct. 2015. Vis., Dec. 2013, pp. 1417–1424.
[149] N. Carlevaris-Bianco and R. M. Eustice, ‘‘Learning visual feature descrip- [170] B. Li, T. Zhang, and T. Xia, ‘‘Vehicle detection from 3D lidar using fully
tors for dynamic lighting conditions,’’ in Proc. IEEE/RSJ Int. Conf. Intell. convolutional network,’’ in Proc. Robot., Sci. Syst., Jun. 2016, pp. 1–8.
Robots Syst., Sep. 2014, pp. 2769–2776. [171] L. Liu, Z. Pan, and B. Lei, ‘‘Learning a rotation invariant detector with
[150] V. Peretroukhin, W. Vega-Brown, N. Roy, and J. Kelly, ‘‘PROBE-GK: rotatable bounding box,’’ Nov. 2017, arXiv:1711.09405. [Online]. Avail-
Predictive robust estimation using generalized kernels,’’ in Proc. IEEE able: https://arxiv.org/abs/1711.09405
Int. Conf. Robot. Automat. (ICRA), May 2016, pp. 817–824. [172] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, ‘‘Multi-view 3D object
[151] W. Maddern, A. Stewart, C. McManus, B. Upcroft, W. Churchill, and detection network for autonomous driving,’’ in Proc. IEEE Conf. Comput.
P. Newman, ‘‘Illumination invariant imaging: Applications in robust Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 6526–6534.
vision-based localisation, mapping and classification for autonomous
[173] M. Ren, A. Pokrovsky, B. Yang, and R. Urtasun, ‘‘SBNet: Sparse blocks
vehicles,’’ in Proc. Vis. Place Recognit. Changing Environ. Workshop,
network for fast inference,’’ in Proc. IEEE/CVF Conf. Comput. Vis.
IEEE Int. Conf. Robot. Automat. (ICRA), Hong Kong, vol. 2, May 2014,
Pattern Recognit., Jun. 2018, pp. 8711–8720.
p. 3.
[174] W. Ali, S. Abdelkarim, M. Zahran, M. Zidan, and A. El Sallab,
[152] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays,
‘‘YOLO3D: End-to-end real-time 3D oriented object bounding box detec-
P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollár, ‘‘Microsoft COCO:
tion from LiDAR point cloud,’’ Aug. 2018, arXiv:1808.02350. [Online].
Common Objects in Context,’’ May 2014, arXiv:1405.0312. [Online].
Available: https://arxiv.org/abs/1808.02350
Available: https://arxiv.org/abs/1405.0312
[175] B. Yang, W. Luo, and R. Urtasun, ‘‘PIXOR: Real-time 3D object detection
[153] S. Ren, K. He, R. Girshick, and J. Sun, ‘‘Faster R-CNN: Towards
from point clouds,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
real-time object detection with region proposal networks,’’ Jun. 2015,
Recognit., Jun. 2018, pp. 7652–7660.
arXiv:1506.01497. [Online]. Available: https://arxiv.org/abs/1506.01497
[154] H. Noh, S. Hong, and B. Han, ‘‘Learning deconvolution network [176] D. Feng, L. Rosenbaum, and K. Dietmayer, ‘‘Towards safe autonomous
for semantic segmentation,’’ in Proc. IEEE Int. Conf. Comput. Vis. driving: Capture uncertainty in the deep neural network for lidar 3D
(ICCV). Washington, DC, USA: IEEE Computer Society, Dec. 2015, vehicle detection,’’ Apr. 2018, arXiv:1804.05132. [Online]. Available:
pp. 1520–1528, doi: 10.1109/ICCV.2015.178. https://arxiv.org/abs/1804.05132
[155] O. Ronneberger, P. Fischer, and T. Brox, ‘‘U-Net: Convolutional networks [177] A. Geiger, P. Lenz, and R. Urtasun, ‘‘Are we ready for autonomous driv-
for biomedical image segmentation,’’ May 2015, arXiv:1505.04597. ing? The KITTI vision benchmark suite,’’ in Proc. IEEE Conf. Comput.
[Online]. Available: https://arxiv.org/abs/1505.04597 Vis. Pattern Recognit., Jun. 2012, pp. 3354–3361.
[156] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, ‘‘Pyramid scene [178] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. Erin Liong, Q. Xu,
parsing network,’’ Dec. 2016, arXiv:1612.01105. [Online]. Available: A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, ‘‘NuScenes: A mul-
https://arxiv.org/abs/1612.01105 timodal dataset for autonomous driving,’’ 2019, arXiv:1903.11027.
[157] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, [Online]. Available: http://arxiv.org/abs/1903.11027
‘‘DeepLab: Semantic image segmentation with deep convolutional [179] S. Shi, X. Wang, and H. Li, ‘‘PointRCNN: 3D object proposal generation
nets, atrous convolution, and fully connected CRFs,’’ Jun. 2016, and detection from point cloud,’’ Dec. 2018, arXiv:1812.04244. [Online].
arXiv:1606.00915. [Online]. Available: https://arxiv.org/abs/1606.00915 Available: https://arxiv.org/abs/1812.04244
[158] X. Ma, Z. Wang, H. Li, W. Ouyang, X. Fan, and P. Zhang, ‘‘Accurate [180] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, ‘‘Point-
monocular object detection via color-embedded 3D reconstruction for Pillars: Fast encoders for object detection from point clouds,’’ Dec. 2018,
autonomous driving,’’ Mar. 2019, arXiv:1903.11444. [Online]. Available: arXiv:1812.05784. [Online]. Available: https://arxiv.org/abs/1812.05784
https://arxiv.org/abs/1903.11444 [181] Z. Yang, Y. Sun, S. Liu, X. Shen, and J. Jia, ‘‘IPOD: Intensive point-based
[159] X. Cheng, P. Wang, and R. Yang, ‘‘Learning depth with convolutional spa- object detector for point cloud,’’ Dec. 2018, arXiv:1812.05276. [Online].
tial propagation network,’’ 2018, arXiv:1810.02695. [Online]. Available: Available: https://arxiv.org/abs/1812.05276
https://arxiv.org/abs/1810.02695 [182] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas, ‘‘Frustum PointNets for
[160] R. B. Rusu, ‘‘Semantic 3D object maps for everyday manipulation 3D object detection from RGB-D data,’’ Nov. 2017, arXiv:1711.08488.
in human living environments,’’ KI–Künstliche Intelligenz, vol. 24, [Online]. Available: https://arxiv.org/abs/1711.08488
pp. 345–348, Oct. 2009. [183] W. Luo, J. Xing, A. Milan, X. Zhang, W. Liu, X. Zhao, and
[161] W. Wang, K. Sakurada, and N. Kawaguchi, ‘‘Incremental and enhanced T.-K. Kim, ‘‘Multiple object tracking: A literature review,’’ Sep. 2014,
scanline-based segmentation method for surface reconstruction of sparse arXiv:1409.7618. [Online]. Available: https://arxiv.org/abs/1409.7618
LiDAR data,’’ Remote Sens., vol. 8, no. 11, p. 967, 2016. [184] A. Azim and O. Aycard, ‘‘Detection, classification and tracking of moving
[162] P. Narksri, E. Takeuchi, Y. Ninomiya, Y. Morales, N. Akai, and objects in a 3D environment,’’ in Proc. IEEE Intell. Vehicles Symp.,
N. Kawaguchi, ‘‘A slope-robust cascaded ground segmentation in 3D Jun. 2012, pp. 802–807.
point cloud for autonomous vehicles,’’ in Proc. 21st Int. Conf. Intell. [185] J. Shi and Tomasi, ‘‘Good features to track,’’ in Proc. IEEE Conf. Comput.
Transp. Syst. (ITSC), Nov. 2018, pp. 497–504. Vis. Pattern Recognit. (CVPR), Jun. 1994, pp. 593–600.
[186] M.-P. Dubuisson and A. K. Jain, ‘‘A modified Hausdorff distance for [209] A. Borkar, M. Hayes, and M. T. Smith, ‘‘Robust lane detection and
EKIM YURTSEVER (Member, IEEE) received the B.S. and M.S. degrees from Istanbul Technical University, in 2012 and 2014, respectively, and the Ph.D. degree in information science from Nagoya University, Japan, in 2019. Since 2019, he has been a Postdoctoral Researcher with the Department of Electrical and Computer Engineering, Ohio State University. His research interests include artificial intelligence, machine learning, and computer vision. He is currently working on machine learning and computer vision tasks in the intelligent vehicle domain.

JACOB LAMBERT (Student Member, IEEE) received the B.S. degree (Hons.) in physics from McGill University, Montreal, Canada, in 2014, and the M.A.Sc. degree from the University of Toronto, Canada, in 2017. He is currently pursuing the Ph.D. degree with Nagoya University, Japan. His current research focuses on 3D perception through lidar sensors for autonomous robotics.

ALEXANDER CARBALLO (Member, IEEE) received the Dr.Eng. degree from the Intelligent Robot Laboratory, University of Tsukuba, Japan. From 1996 to 2006, he worked as a Lecturer at the School of Computer Engineering, Costa Rica Institute of Technology. From 2011 to 2017, he worked in research and development at Hokuyo Automatic Co., Ltd. Since 2017, he has been a Designated Assistant Professor with the Institutes of Innovation for Future Society, Nagoya University, Japan. His main research interests are lidar sensors, robotic perception, and autonomous driving.

KAZUYA TAKEDA (Senior Member, IEEE) received the B.E.E., M.E.E., and Ph.D. degrees from Nagoya University, Japan. Since 1985, he has been working with the Advanced Telecommunication Research Laboratories and the KDD R&D Laboratories, Japan. In 1995, he started a research group for signal processing applications at Nagoya University. He is currently a Professor with the Institutes of Innovation for Future Society, Nagoya University, and with Tier IV Inc. He is also serving as a member of the Board of Governors of the IEEE ITS Society. His main focus is investigating driving behavior using data-centric approaches, utilizing signal corpora of real driving behavior.