0% found this document useful (0 votes)

44 views18 pages

A Hybrid Spatial-Temporal Deep Learning Architecture For Lane Detection

Uploaded by

suyash

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

44 views18 pages

A Hybrid Spatial-Temporal Deep Learning Architecture For Lane Detection

Uploaded by

suyash

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 18

DOI: 10.1111/mice.

12829

RESEARCH ARTICLE

A hybrid spatial–temporal deep learning architecture for

lane detection

Yongqi Dong1 Sandeep Patil2 Bart van Arem1 Haneen Farah1

1 Department of Transport and Planning,

Faculty of Civil Engineering and Abstract

Geosciences, Delft University of Accurate and reliable lane detection is vital for the safe performance of lane-
Technology, Delft, The Netherlands
keeping assistance and lane departure warning systems. However, under certain
2Faculty of Mechanical, Maritime and
Materials Engineering, Delft University of
challenging circumstances, it is difficult to get satisfactory performance in accu-
Technology, Delft, The Netherlands rately detecting the lanes from one single image as mostly done in current lit-
erature. Since lane markings are continuous lines, the lanes that are difficult
Correspondence
Haneen Farah, Department of Transport to be accurately detected in the current single image can potentially be better
and Planning, Faculty of Civil deduced if information from previous frames is incorporated. This study pro-
Engineering and Geosciences, Delft
poses a novel hybrid spatial–temporal (ST) sequence-to-one deep learning archi-
University of Technology, Delft, The
Netherlands. tecture. This architecture makes full use of the ST information in multiple contin-
Email: h.farah@tudelft.nl uous image frames to detect the lane markings in the very last frame. Specifically,
the hybrid model integrates the following aspects: (a) the single image feature
Funding information
Applied and Technical Sciences (TTW), a extraction module equipped with the spatial convolutional neural network; (b)
subdomain of the Dutch Institute for the ST feature integration module constructed by ST recurrent neural network;
Scientific Research (NWO), Grant/Award
Number: 17187
(c) the encoder–decoder structure, which makes this image segmentation prob-
lem work in an end-to-end supervised learning format. Extensive experiments
reveal that the proposed model architecture can effectively handle challenging
driving scenes and outperforms available state-of-the-art methods.

1 INTRODUCTION ronmental perception, camera-based lane detection is

important, as it allows the vehicle to position itself within
The interest in developing automated driving functional- the lane. This is also the foundation of most lane-keeping
ities, and in the end, fully automated vehicles, has been assistance and lane departure warning systems (Andrade
increasing vastly over the last decade. The safety of these et al., 2019; Bar Hillel et al., 2014; W. Chen et al., 2020; Liang
automated functionalities is a crucial element and a pri- et al., 2020; Xing et al., 2018).
ority for academic researchers, manufacturers, policymak- Traditional vision-based lane-detection methods rely on
ers, and their potential future users. Automated driving hand-crafted low-level features (e.g., color, gradient, and
requires a full understanding of the environment around ridge features) and usually work in a four-step proce-
the automated vehicle through its sensors. Vision-based dure, that is, image pre-processing, feature extraction, line
methods have lately been boosted by advancements in detection and fitting, and post-processing (Bar Hillel et al.,
computer vision and machine learning. Regarding envi- 2014; Haris & Glowacz, 2021). Traditional computer vision

1
2 DONG ET AL.

techniques, for example, Inverse Perspective Mapping (Aly, not yet make the utmost of the spatial-temporal information
2008; Wang et al., 2014), Hough transform (Berriel et al., together with correlation and dependencies in the continuous
2017; Jiao et al., 2019; Zheng et al., 2018), Gaussian filters driving frames. Thus, for certain extremely challenging
(Aly, 2008; Sivaraman and Trivedi, 2013; Wang et al., 2012), driving scenes, their detection results are still unsatisfactory.
and Random Sample Consensus (RANSAC) (Aly, 2008; Choi In this paper, lane detection is treated as a segmentation
et al., 2018; Du et al., 2018; Guo et al., 2015; Lu et al., 2019), task, in which a novel hybrid spatial-temporal sequence-to-one
are usually adopted in the 4-step procedure. The problems of deep learning architecture is developed for lane detection
traditional methods are: (a) hand-crafted features are through a continuous sequence of images in an end-to-end
cumbersome to manage and not always useful, suitable, or approach. To cope with challenging driving situations, the
powerful; and (b) the detection results are always based on one hybrid model takes multiple continuous frames of an image
single image. Thus, the detection accuracies are relatively not sequence as inputs, and integrates the single image feature
high. extraction module, the spatial-temporal feature integration
During the last decade, with the advancements in deep module, together with the encoder-decoder structure to make
learning algorithms and computational power, many deep full use of the spatial-temporal information in the image
neural network based methods have been developed for lane sequence. The single image feature extraction module utilizes
detection with good performance. There are generally two modified common backbone networks with embedded spatial
dominant approaches (Tabelini et al., 2020b), i.e., (1) convolutional neural network (SCNN) (Pan et al., 2018) layers
segmentation-based pipeline (Kim and Park, 2017; Ko et al., to extract the features in every single image throughout the
2020; T. Liu et al., 2020; Pan et al., 2018; Zhang et al., 2021; continuous driving scene. SCNN is powerful in extracting
Zou et al., 2020), in which predictions are made on the per- spatial features and relationships in one single image,
pixel basis, classifying each pixel as either lane or not; (2) the especially for long continuous shape structures. Next, the
pipeline using row-based prediction (Hou et al., 2020; Qin et extracted features are fed into spatial-temporal recurrent neural
al., 2020; Yoo et al., 2020), in which the image is split into a network (ST-RNN) layers to capture the spatial-temporal
(horizontal) grid and the model predicts the most probable dependencies and correlations among the continuous frames.
location to contain a part of a lane marking in each row. An encoder-decoder structure is adopted with the encoder
Recently, Liu et al. (2021) summarized two additional consisting of SCNN and several fully-convolution layers to
categories of deep learning based lane detection methods, i.e., downsample the input image and abstract the features, while
the anchor-based approach (Chen et al., 2019; Li et al., 2020; the decoder, constructed by CNNs, upsample the abstracted
Tabelini et al., 2020b; Xu et al., 2020), which focuses on outputs of previous layers to the same size as the input image.
optimizing the line shape by regressing the relative coordinates With the labelled ground truth of the very last image in the
with the help of predefined anchors, and the parametric continuous frames, the model training works in an end-to-end
prediction based method which directly outputs parametric way as a supervised learning approach. To train and validate
lines expressed by curve equation (R. Liu et al., 2020; Tabelini the proposed model on two large-scale open-sourced datasets,
et al., 2020a). Apart from these dominant approaches, some i.e., tvtLANE (Zou et al., 2020) and TuSimple, a
other less common methods were proposed recently. For corresponding training strategy has been also developed. To
instance, Lin et al. (2020) fused the adaptive anchor scheme summarize, the main contributions of this paper lie in:
(designed by formulating a bilinear interpolation algorithm) • A hybrid spatial-temporal sequence-to-one deep neural
aided informative feature extraction and object detection into network architecture integrating the advantages of the
a single deep convolutional neural network for lane detection encoder-decoder structure, SCNN embedded single image
from a top-view perspective. Philion (2019) developed a novel feature extraction module, and ST-RNN module, is proposed;
learning-based approach with a fully convolutional model to • The proposed model architecture is the first attempt that
decode the lane structures directly rather than delegating tries to strengthen both spatial relation feature extraction in
structure inference to post-processing, plus an effective every single image frame and spatial-temporal correlation
approach to adapt the model to new contexts by unsupervised together with dependencies among continuous image frames
transfer learning. for lane detection;
Similar to traditional vision-based lane-detection methods, • The implementation utilized two widely used neural
most available deep learning models utilize only the current network backbones, i.e., UNet (Ronneberger et al., 2015) and
image frame to perform the detection. Until very recently, a SegNet (Badrinarayanan et al., 2017) and included extensive
few studies have explored the combination of convolutional evaluation experiments on commonly used datasets,
neural network (CNN) and recurrent neural network (RNN) to demonstrating the effectiveness and strength of the proposed
detect lane markings or simulate autonomous driving using model architecture;
continuous driving scenes (Chen et al., 2020; Zhang et al.,
• The proposed model can tackle lane detection in
2021; Zou et al., 2020). However, the available methods do not
challenging scenes such as curves, dirty roads, serious vehicle
take full advantage of the essential properties of the lane being
occlusions, etc., and outperforms all the available state-of-the-
long continuous solid or dashed line structures. Also, they do
art baseline models in most cases with a large margin.
DONG ET AL. 3

Skip connection
Concatenate for UNet-based backbone;
Pooling indices reuse for SegNet-based backbone.

ST-RNN Layer 0 ST-RNN Layer 1 ST-RNN Layer r

Encoded(Xt0)
ST-RNN(0,0) ST-RNN(1,0) ST-RNN(r,0)

I t 0 
Encoded(Xt1)
ST-RNN(0,1) ST-RNN(1,1) ST-RNN(r,1)

I t 1
Encoded(Xt2)
ST-RNN(0,2) ST-RNN(1,2) ST-RNN(r,2)
It 2
SCNN

I t  n-1
Encoded(Xtn-1)
ST-RNN(0,n-1) ST-RNN(1,n-1) ST-RNN(r,n-1)
Prediction of I t  n 

It n Encoded(Xtn)
ST-RNN(0,n) ST-RNN(1,n) ST-RNN(r,n)
Single image
feature extraction Spatio-temporal feature integration module
Continuous Images module

Input Encoder ST-RNN blocks Decoder Output

SCNN structure
C SCNN_DOWN SCNN_UP SCNN_RIGHT SCNN_LEFT
w
W C

H
H
C

W Next hidden layer

Selected layer

FIGURE 1. The architecture of the proposed model.

• Under the proposed architecture, the light version model encoder-decoder structure, to tackle lane detection in
variant can achieve beyond state-of-the-art performance while challenging driving scenes.
using fewer parameters.
2.1 Overview of the proposed model architecture
2 PROPOSED METHOD
The proposed deep neural network architecture adopts a
Although many sophisticated methods have been proposed sequence-to-one end-to-end encoder-decoder structure as
for lane detection, most of the available methods use only one shown in Figure 1.
single image resulting in unsatisfactory performance under Here "sequence-to-one" means that the model gets a
some extremely challenging scenarios, e.g., dazzle lighting, sequence of multi images as input and outputs the detection
and serious occlusion. This study proposes a novel hybrid result of the last image (please note that essentially the model
spatial-temporal sequence-to-one deep neural network is still utilizing sequence-to-sequence neural networks); "end-
architecture for lane detection. The architecture was inspired to-end" means that the learning algorithm goes directly from
by: (a) the successful precedents of hybrid deep neural network the input to the desired output, which refers to the lane
architectures which fuse CNN and RNN to make use of by: (a) detection result in this paper, bypassing the intermediate states
the successful precedents of hybrid deep neural network (Levinson et al., 2011; Neven et al., 2017); the encoder-
architectures which fuse CNN and RNN to make use of decoder structure is a modular structure that consists of an
information in continuous multiple frames (Zhang et al., 2021; encoder network and a decoder network, and is often employed
Zou et al., 2020); (b) the domain prior knowledge that traffic in sequence-to-sequence tasks, such as language translation
lanes are long continuous shape line structure with strong e.g., (Sutskever et al., 2014), and speech recognition e.g., (Wu
spatial relationship. The architecture integrates two modules et al., 2017). Here, the proposed model adopts encoder CNN
utilizing two distinctive neural networks with complementary with SCNN layers and decoder CNN using fully convolutional
merits, i.e., SCNN and convolutional Long Short Term layers. The encoder takes a sequence of continuous image
Memory (ConvLSTM) neural network, under an end-to-end frames, i.e., time-series-images, as input and abstracts the
feature map(s) in smaller sizes. To make use of the prior
4 DONG ET AL.

knowledge that traffic lanes are solid- or dashed- line detailed structure of SCNN is demonstrated in the bottom part
structures with a continuous shape, one special kind of CNN, of Figure 1.
i.e., SCNN, is adopted after the first CNN hidden layer. With SCNN can propagate the spatial information in one image
the help of SCNN, spatial features and relationships in every through four directions, as shown with the suffix "DOWN",
single image will be better extracted. Following this, the "UP", "RIGHT", "LEFT" in Figure 1, which denotes
extracted feature maps of the continuous frames, constructed downward, upward, rightward, and leftward, respectively.
in a time-series manner, will be fed to ST-RNN blocks for Take the "SCNN_DOWN" module for an example,
sequential feature extraction and spatial-temporal information considering that SCNN is adopted on a three dimensional
integration. Finally, the decoder network upsamples the tensor of size C × W × H, where in the lane detection task, C,
abstracted feature maps obtained from the ST-RNN and W, and H denote the number of channels, image (or its feature
decodes the content to the original input image size with the map) width, and heights respectively. For SCNN_D, the input
detection results. The proposed model architecture is tensor would be split into H slices, and the first slice will then
implemented with two backbones, UNet (Ronneberger et al., be sent into a convolution operation layer with C kernels of
2015) and SegNet (Badrinarayanan et al., 2017). Note, in the size C × w, in which w is the kernel width. Different from the
UNet based architecture, similar to (Ronneberger et al., 2015), traditional CNN in which the output of one convolution layer
the proposed model employs the skip connection between the is introduced into the next layer directly, in SCNN_D the
encoder and decoder phase by concatenating operation to reuse output is added to the next adjacent slice to produce a new
features and retain information from previous encoder layers slice, and iteratively to the next convolution layer continuing
for more accurate predictions; while in the SegNet based until the last slice in the selected direction is updated. The
networks, at the decoder stage, similar to (Badrinarayanan et convolution kernel weights are shared throughout all slices,
al., 2017), the proposed model reuses the pooling indices to and the same mechanism works for other directions of SCNNs.
capture, store, and make use of the vital boundary information With the above properties, SCNN has demonstrated its
in the encoder feature maps. The detailed network strengths in extracting spatial relationships in the image, which
implementation is elaborated in the remaining parts of Section makes it suitable for detecting long continuous shape
2. structures, e.g., traffic lanes, poles, and walls (Pan et al., 2018).
However, using only one image to do the detection, SCNN still
2.2 Network design could not produce satisfying performance under extremely
1) End-to-end encoder-decoder: Regarding lane detection challenging conditions. And that is why a sequence-to-one
as an image segmentation problem, the encoder-decoder architecture with continuous image frames as inputs and ST-
structure based neural network can be implemented and trained RNN blocks to capture the spatial-temporal correlations in the
in an end-to-end way. Inspired by the excellent performance of continuous frames is proposed in this paper.
CNN-based encoder-decoder for image semantic- 3) ST-RNN module: In this proposed framework, the
segmentation tasks in various domains (Badrinarayanan et al., multiple continuous frames of images are modelled as "image-
2017; Wang et al., 2020; Yasrab et al., 2017), this study also time-series" inputs. To capture the spatial-temporal
adopts the "symmetrical" encoder-decoder as the main dependencies and correlations among the image-time-series,
backbone structure. Convolution and pooling operations are the ST-RNN module is embedded in the middle of the encoder-
employed to extract and abstract the features in every image in decoder structure, which takes over the output extracted
the encoder stage; while in the decoder subset, the inverted features of the encoder as its input and outputs the integrated
convolution and upsampling operation are adopted to grasp the spatial-temporal information to the decoder.
extracted high-order features and construct the outputs layer Various versions of RNNs have been proposed, e.g., Long
by layer with regards to the targets. By setting the output target Short Term Memory (LSTM) together with its multivariate
size the same as the input image size, the whole network can version, i.e., fully connected LSTM (FC-LSTM), and Gated
work in an end-to-end approach. In the implementation, two Recurrent Unit (GRU), to tackle time-series data in different
widely used backbones, U-Net and Seg-Net, are adopted. To application domains. In this paper, two state-of-the-art RNN
better extract and make use of the spatial relations in every networks, i.e., ConvLSTM (Shi et al., 2015) and Convolutional
image frame, the SCNN layer is introduced in the encoder part Gated Recurrent Unit (ConvGRU) (Ballas et al., 2016), are
of the single image feature extraction module. Furthermore, to employed. These models, considering their abilities in spatial-
excavate and make use of the spatial-temporal correlations and temporal feature extraction, generally outperform other
dependencies among the input continuous image frames, ST- traditional RNN models.
RNN blocks are embedded in the middle of the encoder- A general critical problem for the vanilla RNN model is the
decoder networks. gradients vanishing (Hochreiter and Schmidhuber, 1997;
2) SCNN: The Spatial Convolutional Neural Network Pascanu et al., 2013; Ribeiro, 2020). For this, LSTM
(SCNN) was first proposed by Pan et al. (2018). The "spatial" introduces memory cells and gates to control the information
here means that the specially designed CNN can propagate flow to trap the gradient preventing it from vanishing during
spatial information via slice-by-slice message passing. The the back-propagation. In LSTM, the information of the new
DONG ET AL. 5

time-series inputs will be accumulated to the memory cell 𝒞𝑡 previous hidden state is supposed to be forgotten through an
if the input gate 𝑖𝑡 is on. In contrast, if the information is not element-wise multiplication operation when calculating
"important", the past cell status 𝒞𝑡−1 could be "forgotten" current candidate hidden representation. From the equations, it
by activating the forget gate 𝑓𝑡 . Also, there is the output gate is concluded that the information of ℋ𝑡 mainly comes from
ℋ̃𝑡 , while ℋ𝑡−1 as the previous hidden-state representation
𝑜𝑡 which decides whether the latest cell output 𝒞𝑡 will be
propagated to the final state ℋ𝑡 . The traditional FC-LSTM also contributes to the process of computing the final
contains too much redundancy for spatial information, which representation of ℋ𝑡 , thus the temporal dependencies are
makes it time-consuming and computational-expensive. To captured.
address this, the ConvLSTM (Shi et al., 2015) is selected to In practice, both ConvLSTM and ConvGRU with different
build the ST-RNN block of the proposed framework. In numbers of hidden layers were employed to serve as the ST-
ConvLSTM, the convolutional structures and operations are RNN module in the proposed architecture, and the
introduced in both the input-to-state and state-to-state corresponding performances were evaluated, respectively. To
transitions to do spatial information encoding, which also be specific, in the proposed network, the input and the output
alleviates the problem of time- and computation-consuming. sizes of the ST-RNN block are equivalent to the feature map
The key formulation of the ConvLSTM is shown by size extracted through the encoder, which are 8 ×16 and 4 × 8
equations (1)-(5), where ⊙ denotes the Hadamard product, ∗ for the UNet based and SegNet based backbone, respectively.
denotes the convolution operation, 𝜎(∙) represents the sigmoid The convolutional kernel size in ConvLSTM and ConvGRU is
function, and tanh(∙) represents the hyperbolic tangent 3 × 3, and the dimension of each hidden layer is 512. The
function; 𝑋𝑡 , 𝒞𝑡 , and ℋ𝑡 are the input (i.e., the extracted detailed implementations are described in the following
features from the encoder in the proposed framework), section.
memory cell status, and output at time 𝑡; 𝑖𝑡 , 𝑓𝑡 , and 𝑜𝑡 are the
2.3 Detailed implementation
function values of the input gate, forget gate, and output gate,
respectively; 𝑊 denotes the weight matrices, whose subscripts 1) Network Design Details: The proposed spatial-temporal
indicate the two corresponding variables are connected by this sequence-to-one neural network was developed for the lane
matrix. For instance, 𝑊𝑥𝑐 is the weight matrix between the detection task with K (in this paper K=5 if not specified)
input extracted features 𝑋𝑡 and the memory cell 𝒞𝑡 ; ′𝑏′s are continuous image frames as inputs. The image frames were
biases of the gates, e.g., 𝑏𝑖 is the input gate’s bias. firstly fed into the encoder for feature extraction and
𝑖𝑡 = 𝜎(𝑊𝑥𝑖 ∗ 𝑋𝑡 + 𝑊ℎ𝑖 ∗ ℋ𝑡−1 + 𝑊𝑐𝑖 ⊙ 𝒞𝑡−1 + 𝑏𝑖 ) (1) abstraction. Different from the normal CNN-based encoder,
𝑓𝑡 = 𝜎(𝑊𝑥𝑓 ∗ 𝑋𝑡 + 𝑊ℎ𝑓 ∗ ℋ𝑡−1 + 𝑊𝑐𝑓 ⊙ 𝒞𝑡−1 + 𝑏𝑓 ) (2) the SCNN layer was utilized to effectively extract the spatial
𝒞𝑡 = 𝑓𝑡 ⊙ 𝒞𝑡−1 + 𝑖𝑡 ⊙ tanh(𝑊𝑥𝑐 ∗ 𝑋𝑡 + 𝑊ℎ𝑐 ∗ ℋ𝑡−1 + relationships within every image. Different locations of the
𝑏𝑐 )(3) SCNN layer were tested, i.e., embedding the SCNN layer after
𝑜𝑡 = 𝜎(𝑊𝑥𝑜 ∗ 𝑋𝑡 + 𝑊ℎ𝑜 ∗ ℋ𝑡−1 + 𝑊𝑐𝑜 ⊙ 𝒞𝑡 + 𝑏𝑜 ) (4) the first hidden convolutional layer or at the very beginning.
ℋ𝑡 = 𝑜𝑡 ⊙ tanh(𝐶𝑡 ) (5) The outputs of the encoder network were modelled in a time-
The ConvGRU (Ballas et al., 2016) further lightens the series manner and fed into the ST-RNN blocks (i.e.,
computational complexity by reducing a gate structure but ConvLSTM or ConvGRU layers) to further extract more
could perform similarly or slightly better compared with the useful and accurate features, especially the spatial-temporal
traditional RNNs or even ConvLSTM. The procedure of dependencies and correlations among different image frames.
computing different gates and hidden states/outputs of In short, the encoder network is primarily responsible for
ConvGRU is demonstrated with equations (6)-(9), in which the spatial feature extraction and abstraction transforming input
symbols have the same meaning as described before, while images into specified feature maps, while the ST-RNN blocks
additional 𝑧𝑡 and 𝑟𝑡 mean the update gate and the reset gate, accept the extracted features from the continuous image frames
̃ represents the current candidate hidden in a time-series manner to capture the spatial-temporal
respectively, plus ℋ
dependencies.
representation.
The outputs of the ST-RNN blocks were then transferred
𝑧𝑡 = 𝜎(𝑊𝑧𝑥 ∗ 𝑋𝑡 + 𝑊𝑧ℎ ∗ ℋ𝑡−1 + 𝑏𝑧 ) (6) into the decoder network that adopts deconvolution and
𝑟𝑡 = 𝜎(𝑊𝑟𝑥 ∗ 𝑋𝑡 + 𝑊𝑟ℎ ∗ ℋ𝑡−1 + 𝑏𝑟 ) (7) upsampling operations to highlight and make full use of the
features and rebuild the target to the original size of the input
̃𝑡 = tanh(𝑊𝑜𝑥 ∗ 𝑋𝑡 + 𝑊𝑜ℎ ∗ (𝑟𝑡 ⊙ ℋ𝑡−1 ) + 𝑏𝑜 )
ℋ (8) image. Note there is the skip concatenate connection (for
̃ + (1-𝑧𝑡 )ℋ𝑡−1
ℋ𝑡 = 𝑧𝑡 ℋ (9) UNet-based architecture) or pooling indices reusing (for
SegNet-based architecture) between the encoder and decoder
In ConvGRU, there are only two gate structures, i.e., the to reuse the retained features from previous encoder layers for
update gate 𝑧𝑡 and the reset gate 𝑟𝑡 . It is the update gate 𝑧𝑡 that more accurate predictions at the decoder phase. After the
decides how to update the hidden representation when decoder phase, the lane detection result is obtained as an image
generating the ultimate result of ℋ𝑡 at the current layer, as in the equivalent size to the input image frame. With the
shown in equation (9). While the reset gate 𝑟𝑡 is served to labelled ground truth and the help of the encoder-decoder
control to what extent the feature information captured in the structure, the proposed model can be trained and implemented
6 DONG ET AL.

in an end-to-end way. The detailed input, output sizes, together ℎ𝜃 (𝑥𝑖 ))] (10)
with parameters of the layers in the entire neural network are
where 𝑆 is number of training examples, 𝑤 stands for the
listed in Appendix Table A1 and Table A2.
weight which is set according to the ratio between the total lane
For both SegNet-based and UNet-based implementations,
pixel quantities and none-lane pixel quantities throughout the
two types of RNN layers, i.e., ConvLSTM and ConvGRU,
whole training set, 𝑦𝑖 is the true target label for training
were tested to serve as the ST-RNN block. Besides, the ST-
RNN blocks were tested with 1 hidden layer and 2 hidden example 𝑖, 𝑥𝑖 is the input for training example 𝑖, and ℎ𝜃 stands
layers, respectively. So there are four variants of in the for the model with neural network weights 𝜃.
proposed SegNet-based models, i.e., 3) Training details: The proposed neural networks with
SCNN_SegNet_ConvGRU1, SCNN_SegNet_ConvGRU2, different variants, together with the baseline models were
SCNN_SegNet_ConvLSTM1, and trained on the Dutch high-performance supercomputer
SCNN_SegNet_ConvLSTM2. SCNN_SegNet_ConvGRU1 clusters, Cartesius and Lisa, using 4 Titan RTX GPUs with the
means the model is using SegNet as the backbone with SCNN data parallel mechanism in PyTorch. The input image size was
layer embedded encoder, and 1 hidden layer of ConvGRU as set as 128 × 256 to reduce the computational payload. The
the ST-RNN block. This naming rule applies to the other 3 batch size was set to be as large as possible (e.g., 64 for UNet-
variants. Also, there are four variants of the proposed UNet- based network architecture, 100 for SegNet based ones, and
based models, with a similar naming rule. 136 for UNetLight based ones), and the learning rate was
In the proposed models with U-Net as the backbone, the initially set to 0.03. The RAdam optimizer (Liu et al., 2019)
number of kernels used in the last convolutional block of the was first used in this work for training the model at the
encoder part differs from the original U-Net’s settings. Here, beginning. At the later stage, when the training accuracy was
the number of output kernels (channels) of the last beyond 95%, the optimizer was switched to the Stochastic
convolutional block in the proposed encoder does not double Gradient Descent (SGD) (Bottou, 2010) optimizer with decay.
its input kernels, which applies to all the previous With the labelled ground truth, the models were trained
convolutional blocks. This is done, similar to (Zou et al., through iteratively updating the parameters in the weight
2020), to better connect the output of the encoder with the ST- matrixes and the losses on the basis of the deviation between
RNN block (ConvLSTM or ConvGRU layers). To do so, the outputs of the proposed neural network and the ground truth
parameters of the full-connection layer are designed to be using the backpropagation mechanism. To speed up the
quadrupled while the side lengths of the feature maps reduced training process, the pre-trained weights of SegNet and U-Net
to half, at the same time, the number of kernels remains on ImageNet (Deng et al., 2009) were adopted.
unchanged. This strategy also somewhat contributes to 3 EXPERIMENTS AND RESULTS
reducing the parameter size of the whole network.
Extensive experiments were carried out to inspect and
A modified light version of UNet (UNetLight) was also
verify the accuracy, effectiveness, and robustness of the
tested to serve as the network backbone to reduce the total
proposed lane detection model using two large-scale open-
parameter size, increase the model’s ability to operate in real-
sourced datasets. The proposed models were evaluated on
time, and also further verify the proposed network
different driving scenes and were compared with several state-
architecture’s effectiveness. The UNetLight has a similar
of-the-art baseline lane detection methods which also employ
network design to the demonstration in Table A2. The only
deep learning, e.g., U-Net (Ronneberger et al., 2015), Seg-Net
difference is that all the numbers of kernels in the ConvBlocks
(Badrinarayanan et al., 2017), SCNN (Pan et al., 2018),
are reduced to half except for the Input in In_ConvBlock (with
LaneNet (Neven et al., 2018), UNet_ConvLSTM (Zou et al.,
the input channel of 3 unchanged) and Output in
2020), and SegNet_ConvLSTM (Zou et al., 2020).
Out_ConvBlock (with the output channel of 2 unchanged). To
save space, the parameter settings of UNetLight based 3.1 Datasets
implementation will not be illustrated. 1) tvtLANE training set: To verify the proposed model
2) Loss function: Since the lane detection is modeled as a performance, the tvtLANE dataset (Zou et al., 2020) based
segmentation task and a pixel-wise binary classification upon the TuSimple lane marking challenge dataset, was first
problem, cross-entropy is a suitable candidate to serve as the utilized for training, validating, and testing. The original
loss function. However, because the pixels classified to be dataset of the TuSimple lane marking challenge includes 3,626
lanes are always quite less than those classified to be the clips of training and 2,782 clips of testing which are collected
background (meaning that it is an imbalanced binary under various weather conditions and during different periods.
classification and discriminative segmentation task), in the In each clip, there are 20 continuous frames saved in the same
implementation, the loss was built upon the weighted cross- folder. In each clip, only the lane marking lines of the very last
entropy. The adopted loss function as the standard weighted frame, i.e., the 20th frame, are labelled with the ground truth
binary cross-entropy function is given as in equation (10), officially. Zou et al. (2020) additionally labelled every 13th
1 image in each clip and added their own collected lane dataset
𝐿𝑜𝑠𝑠 = − ∑𝑆𝑖=1[𝑤 ∗ 𝑦𝑖 ∗ 𝑙𝑜𝑔(ℎ𝜃 (𝑥𝑖 )) + (1-𝑦𝑖 ) ∗ 𝑙𝑜𝑔(1 −
𝑆
DONG ET AL. 7

which includes 1,148 sequences of rural driving scenes TABLE 1. Trainset and testset in tvtLANE.
collected in China. This immensely expanded the variety of the
road and driving conditions since the original TuSimple Trainset
dataset only covers the highway driving conditions. K Subset Labled Images Num
continuous frames of each clip are used as the inputs with the Original TuSimple Dataset (Highway) 7,252
ground truth of the labelled 13th or 20th frame to train the Zou et al. (2020) added (Rural Road) 2,296
models. Sample Methods
To further augment the training dataset, crop, flip, and Sample
Labled Ground Truth Train Sample Frames
Stride
rotation operations were employed, thus a total number of
3 1st, 4th, 7th, 10th, 13th
(3,626 + 1,148) × 4 = 19,096 continuous sequences were th
13 2 5th, 7th, 9th, 11th, 13th
produced, in which 38,192 images are labelled with ground
1 9th, 10th, 11th, 12th, 13th
truth. To adapt to different driving speeds, the input image
3 8th, 11th, 14th, 17th, 20th
sequences were sampled at 3 strides with a frame interval of 1,
20th 2 12th, 14th, 16th, 18th,20th
2, or 3, respectively. Then, 3 sampling methods were employed
1 16th, 17th, 18th, 19th,20th
to construct the training samples regarding the labelled 13th
Testset
and 20th frames in each sequence, as demonstrated in Table 1. Labled Labled
2) tvtLANE testing set: Two different datasets were used for Sample
Subset Images Ground Test Sample Frames
Stride
testing, i.e., Testset #1 (normal) and Testset #2 (challenging), Num Truth
th
which are also formatted with 5 continuous images as the input Testset #1 13 1 9th, 10th, 11th, 12th, 13th
540
Normal 20th 1 16th, 17th, 18th,19th,20th
to detect the lane markings in the very last frame with the
labelled ground truth. To be specific, Testset #1 is built upon 1st, 2nd, 3rd, 4th, 5th
the original TuSimple test set for normal driving scene testing; Testset #2 2nd, 3rd, 4th, 5th, 6th
728 All 1
Challenging 3rd, 4th, 5th, 6th, 7th
while Testset #2 is constructed with 12 challenging driving
situations, especially used for robustness evaluation. The ⋯
detailed descriptions of the trainset and testset in tvtLANE are
illustrated in Table 1, with examples shown in Figure 2. (a)

3.1 Qualitative evaluation (b)

Qualitative evaluation with the visualization of the lane (c)

detection results is the most intuitive approach to compare and
evaluate the properties of different models, and it helps to find (d)
insights regarding their pros and cons.
1) tvtLANE Testset #1: normal situations FIGURE 2. Samples data in trainset and testset. (a) original TuSimple
dataset (Highway), (b) Zou et al., (2020) added Rural Road situations, (c)
Samples of the lane detection results on tvtLANE testset #1 Testset #1 Normal situations, and (d) Testset #2 Challenging situations.
of the proposed models and other state-of-the-art models are In each row, the first five images are the input image sequence the last
demonstrated in Figure 3(1). All these results are without post- image is the labelled ground truth.
processing. points around, thinner predicted lane lines indicate a more
In general, a good lane detection should include the precise model prediction of the lane position.
following 5 properties: • The predicted lane lines should not merge or be broken.
• The number of lines need to be predicted correctly. A As illustrated in the 1st, 2nd, 6th, 7th, and 8th columns of Figure
wrong detection or a misprediction might cause the automated 3(1), some baseline models’ output lane lines either merge at
vehicles to consider unsafe or unreachable areas as drivable the far end or break the continuity with dashed lines. The
areas resulting in potential accidents. As illustrated in the 1st proposed models perform slightly better although in a few
and 2nd columns in Figure 3(1), the proposed models can cases the lines are also discontinuous.
identify the correct number of lane lines, while the baseline • The lanes should be predicted correctly even at the
models, especially the ones using a single image, somewhat boundary of the image. As can be found in Figure 3(1), some
cannot detect the correct number of lines compared with baseline models, e.g., row (c), (d), and (e), run across
ground truth. difficulties at the top boundary of the image with merge lanes
• The positions of each lane marking line should be on the top. This also accords with the aforementioned property.
predicted precisely accords with the ground truth. As • The lane detection models should deliver accurate
illustrated in Figure 3(1), the proposed models in row (j) with predictions under different driving scenes, even under some
the model named by SCNN_SegNet_ConvLSTM2 and row (n) challenging situations. For example, in the 2 nd, 3rd, 5th, and 7th
with the model named by SCNN_UNet_ConvLSTM2, could columns of Figure 3(1), vehicles are occluding the lanes. A
deliver better lane location predictions with thinner lines, good lane detection model should be able to handle these. The
compared with the baseline models. Superior to scattering proposed models perform well under these slightly challenging
cases, more challenging situations are further discussed later.
8 DONG ET AL.

Input images: (a)

(a)
Ground truth: (b)
(b)

Baseline Models: (c) SegNet; (d) UNet; (e) SegNet_ConvLSTM; (f) UNet_ConvLSTM
(c)

(d)

(e)

(f)

Proposed Models SegNet-based: (g) SCNN_SegNet_ConvGRU1; (h) SCNN_SegNet_ConvGRU2;

(i) SCNN_SegNet_ConvLSTM1; (j) SCNN_SegNet_ConvLSTM2

(g)

(h)

(i)

(j)

Proposed Models UNet-based: (k) SCNN_UNet_ConvGRU1; (l) SCNN_UNet_ConvGRU2;

(m) SCNN_UNet_ConvLSTM1; (n) SCNN_UNet_ConvLSTM2

(k)

(l)

(m)

(n)

Proposed Models UNetLight-based: (o) SCNN_UNetLight_ConvGRU1;

(p) SCNN_UNetLight_ConvGRU2; (q) SCNN_UNetLight_ConvLSTM1; (r) SCNN_UNetLight_ConvLSTM2

(o)

(p)

(q)

(r)

(1) Visualization of the lane-detection results on tvtLANE Testset #1 (normal situations).

DONG ET AL. 9

Input images: (a)

bright dim&occlude dirty&occlude occlude curve blur tunnel blur&curve

(a)
Ground truth: (b)
(b)

Baseline Models: (c) SegNet; (d) UNet; (e) SegNet_ConvLSTM; (f) UNet_ConvLSTM
(c)

(d)

(e)

(f)

Proposed Models SegNet-based: (g) SCNN_SegNet_ConvGRU1; (h) SCNN_SegNet_ConvGRU2;

(i) SCNN_SegNet_ConvLSTM1; (j) SCNN_SegNet_ConvLSTM2
(g)

(h)

(i)

(j)

Proposed Models Unet-based: (k) SCNN_UNet_ConvGRU1; (l) SCNN_UNet_ConvGRU2;

(m) SCNN_UNet_ConvLSTM1; (n) SCNN_UNet_ConvLSTM2
(k)

(l)

(m)

(n)

Proposed Models UNetLight-based: (o) SCNN_UNetLight_ConvGRU1;

(p) SCNN_UNetLight_ConvGRU2; (q) SCNN_UNetLight_ConvLSTM1; (r) SCNN_UNetLight_ConvLSTM2

(o)

(p)

(q)

(r)
(2) Visualization of the lane-detection results on tvtLANE Testset #2 (challenging situations)
FIGURE 3. Qualitative evaluation: visualization of the lane-detection results on (1) tvtLANE Testset #1 and (2) tvtLANE Testset #2
10 DONG ET AL.

2) tvtLANE Testset #2: 12 challenging driving cases false positive means the number of image pixels that are
Figure 3(2) shows the comparison of the proposed models background but are wrongly classified as lane markings; false
with the baseline models under some extremely challenging negative stands for the number of image pixels which are lane
driving scenes in the tvtLANE testset#2. All the results are not marking but are wrongly classified as the background.
post-processed. These challenging scenes cover wide Specifically, this study chooses 𝛽 = 1, which corresponds
situations including serious vehicle occlusion, bad lighting to the F1-measure (harmonic mean) shown in equation (15).
conditions (e.g., shadow, dim), tunnel situations, and dirt road Precision∗Recall
conditions. In some extremely challenging cases, the lanes are F1-measure = 2 ∗ (15)
Precision+Recall
totally occluded by vehicles, other objects, and/or shadows, The F1-Measure, which balances Precision and Recall, is
which could be very difficult even for humans to do the always selected as the main benchmark for model evaluation,
detection. e.g., (Liu et al., 2021; Pan et al., 2018; Xu et al., 2020; Zhang
As can be observed in Figure 3(2), although all the baseline et al., 2021; Zou et al., 2020).
models fail in these challenging cases, the proposed models, Furthermore, the model parameter size, i.e., Params (M),
especially the one named SCNN_SegNet_ConvLSTM2 together with the multiply-accumulate (MAC) operations, i.e.,
illustrated in the row (k), could still deliver good predictions in MACs (G), are provided as indicators of the model complexity.
almost every situation listed in Figure 3(2). The only flaw is The two indicators are commonly used to estimate models’
that in the 3rd column where vehicle occlusion and blur road computational complexities and real-time capabilities.
conditions happen simultaneously, the proposed models also
find it hard to predict precisely. With the results in the 4 th, 7th, 2) Performance and comparisons on tvtLANE testset
and 8th columns, the robustness of #1(normal situations)
SCNN_SegNet_ConvLSTM2’s property in detecting the As shown in Table 2, the proposed model of
correct number of lane lines is further verified, especially, one SCNN_UNet_ConvLSTM2, performs the best when
can observe in the 4th column, where almost all the other evaluating on tvtLANE Testset#1, with the highest Accuracy
models are defeated, SCNN_SegNet_ConvLSTM2 can still and F1-Measure, while the proposed model of
predict the correct number of lanes. SCNN_SegNet_ConvLSTM2 delivers the best Precision.
Furthermore, it should be noticed that correct lane location
predictions in these challenging situations are of vital TABLE 2. Model performance comparison on tvtLANE
importance for safe driving. For example, regarding the testset #1 (normal situations)
situation in the last column where a heavy vehicle totally Test_Acc
(%) Precision Recall
F1- MACs Params
Measure (G) (M)
shadows the field of vision on the left side, it will be very Baseline Models
dangerous if the automated vehicle is driving according to the Models
using
U-Net 96.54 0.790 0.985 0.877 15.5 13.4
lane detection results demonstrated in the 3 rd to 5th rows. single SegNet 96.93 0.796 0.962 0.871 50.2 29.4
image
SCNN* 96.79 0.654 0.808 0.722 77.7 19.2
as input
3.2 Quantitative evaluation LaneNet* 97.94 0.875 0.927 0.901 44.5 19.7

1) Evaluation metrics: This subsection examines the SegNet_ConvLSTM** 97.92 0.874 0.931 0.901 217.0 67.2

proposed models’ properties regarding quantitative UNet_ConvLSTM** 98.00 0.857 0.958 0.904 69.0 51.1

evaluations. When treated as a pixel-wise classification task, Proposed Models (SegNet-Based)

SCNN_SegNet_ConvGRU1 98.00 0.878 0.935 0.905 219.2 43.7
accuracy must be the most simple criterion for the performance
SCNN_SegNet_ConvGRU2 98.05 0.888 0.918 0.903 221.5 57.9
evaluation of lane detection (Zou et al., 2017), which
SCNN_SegNet_ConvLSTM1 98.01 0.881 0.935 0.907 220.0 48.5
represents the overall classification performance in terms of
SCNN_SegNet_ConvLSTM2 98.07 0.893 0.928 0.910 223.0 67.3
correctly classified pixels, indicated in equation (11). Models
𝑇𝑟𝑢𝑙𝑦 𝐶𝑙𝑎𝑠𝑠𝑖𝑓𝑖𝑒𝑑 𝑃𝑖𝑥𝑒𝑙𝑠 using Proposed Models (UNet-Based)
Accuracy = (11) continuous
SCNN_UNet_ConvGRU1 98.13 0.878 0.957 0.916 77.9 27.7
𝑇𝑜𝑡𝑎𝑙 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑃𝑖𝑥𝑒𝑙𝑠 images
However, since it is an imbalanced binary classification sequence SCNN_UNet_ConvGRU2 98.19 0.887 0.950 0.917 87.0 41.9
as inputs
SCNN_UNet_ConvLSTM1 98.18 0.886 0.948 0.916 81.0 32.4
problem, where the lanes pixels are far less than the
SCNN_UNet_ConvLSTM2 98.19 0.889 0.950 0.918 93.0 51.3
background pixels, using only accuracy to evaluate the model
Proposed Models (Light Version UNet-Based)
is not suitable. Thus, Precision, Recall, and F-measure,
SCNN_UNetLight_ConvGRU1 97.83 0.850 0.960 0.902 19.6 6.9
illustrated by equation (12)-(14), are commonly employed.
SCNN_UNetLight_ConvGRU2 98.01 0.863 0.950 0.905 21.9 10.5
True Positive
Precision = (12) SCNN_UNetLight_ConvLSTM1 97.71 0.830 0.950 0.886 20.4 8.1
True Positive+False Positive
True Positive SCNN_UNetLight_ConvLSTM2 97.76 0.840 0.953 0.893 23.4 12.8
Recall = (13)
True Positive+False Negative

F-measure = (1 + 𝛽 ) 2 Precision∗Recall
(14) * Results reported in (Zhang et al., 2021).
𝛽 2 Precision+Recall ** There are two hidden layers of ConvLSTM in
In the above equation, true positive indicates the number of SegNet_ConvLSTM and UNet_ConvLSTM.
image pixels that are lane marking and are correctly identified;
DONG ET AL. 11

TABLE 3. Model performance comparison on tvtLANE testset #2 (12 types of challenging scenes)
PRECISION
Challenging Scenes 1-
2-
6- 8-
10-
12-
curve 3- 4- 5- dirty 7- blur 9- 11- dim
shadow- shadow- overall
Models & bright occlude curve & urban & blur tunnel &
bright dark
occlude occlude curve occlude

U-Net 0.7018 0.7441 0.6717 0.6517 0.7443 0.3994 0.4422 0.7612 0.8523 0.7881 0.7009 0.5968 0.6754

SegNet 0.6810 0.7067 0.5987 0.5132 0.7738 0.2431 0.3195 0.6642 0.7091 0.7499 0.6225 0.6463 0.6080

UNet_ConvLSTM 0.7591 0.8292 0.7971 0.6509 0.8845 0.4513 0.5148 0.8290 0.9484 0.9358 0.7926 0.8402 0.7784

SegNet_ConvLSTM 0.8176 0.8020 0.7200 0.6688 0.8645 0.5724 0.4861 0.7988 0.8378 0.8832 0.7733 0.8052 0.7563

SCNN_SegNet_ConvGRU1 0.8107 0.7951 0.7225 0.6830 0.8503 0.4640 0.5071 0.6699 0.8481 0.8994 0.7804 0.8429 0.7477

SCNN_SegNet_ConvGRU2 0.7952 0.8087 0.7770 0.6444 0.8689 0.5067 0.5171 0.7147 0.8423 0.8744 0.7979 0.8757 0.7572

SCNN_SegNet_ConvLSTM1 0.7945 0.8078 0.7600 0.6417 0.8525 0.5252 0.3686 0.7582 0.7715 0.8702 0.7778 0.8517 0.7348

SCNN_SegNet_ConvLSTM2 0.8326 0.7497 0.7470 0.7369 0.8647 0.6196 0.4333 0.7371 0.8566 0.9125 0.8153 0.8466 0.7673

SCNN_UNet_ConvGRU1 0.8492 0.8306 0.8163 0.7845 0.8819 0.4025 0.4493 0.7378 0.8291 0.8928 0.8198 0.8040 0.7639

SCNN_UNet_ConvGRU2 0.8678 0.7873 0.8548 0.7654 0.8805 0.5319 0.4735 0.8064 0.8765 0.8431 0.7112 0.7388 0.7640

SCNN_UNet_ConvLSTM1 0.8602 0.7844 0.8119 0.7807 0.8871 0.4066 0.4652 0.7445 0.8321 0.8972 0.7507 0.7068 0.7531

SCNN_UNet_ConvLSTM2 0.8182 0.8362 0.8189 0.7359 0.8365 0.5872 0.5377 0.8046 0.8770 0.8722 0.7952 0.7817 0.7784

SCNN_UNetLight_ConvGRU1 0.8212 0.7454 0.7189 0.6996 0.8521 0.3499 0.3999 0.7851 0.7282 0.8686 0.6940 0.6289 0.7011

SCNN_UNetLight_ConvGRU2 0.8147 0.8349 0.7390 0.7004 0.8591 0.4039 0.3360 0.6811 0.8300 0.8533 0.8125 0.7996 0.7238

SCNN_UNetLight_ConvLSTM1 0.7222 0.7450 0.6533 0.6203 0.8039 0.2635 0.2716 0.7341 0.7546 0.7319 0.6298 0.7406 0.6377

SCNN_UNetLight_ConvLSTM2 0.7618 0.7416 0.7067 0.6537 0.8096 0.1921 0.2639 0.6857 0.6830 0.6931 0.6391 0.6022 0.6190

F1-MEASURE
Challenging Scenes 1-
2-
6- 8-
10-
12-
curve 3- 4- 5- dirty 7- blur 9- 11- dim
shadow- shadow- overall
Models & bright occlude curve & urban & blur tunnel &
bright dark
occlude occlude curve occlude

U-Net 0.8200 0.8408 0.7946 0.7337 0.7827 0.3698 0.5658 0.8147 0.7715 0.6619 0.5740 0.4646 0.6985

SegNet 0.8042 0.7900 0.7023 0.6127 0.8639 0.2110 0.4267 0.7396 0.7286 0.7675 0.6935 0.5822 0.6727

UNet_ConvLSTM 0.8465 0.8891 0.8411 0.7245 0.8662 0.2417 0.5682 0.8323 0.7852 0.6404 0.4741 0.5718 0.7143

SegNet_ConvLSTM 0.8852 0.8544 0.7688 0.6878 0.9069 0.4128 0.5317 0.7873 0.7575 0.8503 0.7865 0.7947 0.7609

SCNN_SegNet_ConvGRU1 0.8821 0.8626 0.7734 0.7185 0.9039 0.3027 0.5288 0.7229 0.7866 0.8658 0.7759 0.7763 0.7547

SCNN_SegNet_ConvGRU2 0.8710 0.8630 0.8094 0.6989 0.9005 0.3963 0.5497 0.7470 0.7637 0.8525 0.7798 0.7396 0.7591

SCNN_SegNet_ConvLSTM1 0.8768 0.8801 0.8185 0.7166 0.9083 0.3750 0.4516 0.7806 0.7320 0.8622 0.8029 0.8245 0.7629

SCNN_SegNet_ConvLSTM2 0.8956 0.8237 0.7909 0.7468 0.9108 0.4398 0.4858 0.7379 0.7546 0.8729 0.7963 0.8074 0.7666

SCNN_UNet_ConvGRU1 0.8608 0.8745 0.8393 0.7802 0.9005 0.3181 0.5143 0.7833 0.7567 0.5554 0.3503 0.3703 0.6839

SCNN_UNet_ConvGRU2 0.8706 0.8556 0.8304 0.7647 0.8532 0.3515 0.5253 0.8345 0.7399 0.5405 0.3567 0.2855 0.6722

SCNN_UNet_ConvLSTM1 0.8971 0.8493 0.8234 0.7633 0.8997 0.3054 0.5307 0.7424 0.7436 0.6243 0.5568 0.5366 0.6992

SCNN_UNet_ConvLSTM2 0.8670 0.8866 0.8405 0.7565 0.7955 0.4179 0.5933 0.7880 0.7285 0.6296 0.4747 0.4134 0.7024

SCNN_UNetLight_ConvGRU1 0.8896 0.8212 0.7819 0.7517 0.8913 0.3043 0.4961 0.8133 0.7000 0.5635 0.3086 0.2733 0.6637

SCNN_UNetLight_ConvGRU2 0.8593 0.8730 0.7878 0.7406 0.8889 0.3335 0.4266 0.7263 0.7782 0.6498 0.5280 0.5257 0.6910

SCNN_UNetLight_ConvLSTM1 0.8115 0.8056 0.7168 0.6882 0.8179 0.2613 0.3681 0.7834 0.7576 0.5701 0.5281 0.5081 0.6418

SCNN_UNetLight_ConvLSTM2 0.8377 0.8158 0.7620 0.6971 0.8365 0.2209 0.3577 0.7551 0.6594 0.4597 0.3545 0.3559 0.6079

Incorporating the quantitative evaluation with the (iv) the thinness of the predicted lanes with less blurriness,
qualitative evaluation, it could be easily interpreted that the which accords with (ii). The correct prediction directly reduces
highest Precision, Accuracy, and F1-Measure are mainly the number of False Positives, and a good Precision contributes
derived from (i) the correct lane number, (ii) the accurate lane to better Accuracy and F1-Measure. Considering the structure
position, (iii) the sound continuity in the detected lanes, and of the proposed model architecture, a further explanation of the
12 DONG ET AL.

high F1-Measure, Accuracy, and Precision can be explained as Table 3 provides the Precision and F1-Measure for the
follows: evaluation reference.
Firstly, the SCNN layer embedded in the encoder equips the As indicated by the bold numbers, the proposed model,
proposed model with better information extracting ability SCNN_SegNet_ConvLSTM2, results in the best F1-Measure
regarding the low-level features and spatial relations in each at the overall level and in more situations, while the
image. UNet_ConvLSTM results in the best Precision at the overall
Secondly, the ST-RNN blocks, i.e., ConvLSTM / level and in more situations. Incorporating with the qualitative
ConvGRU layers, can effectively capture the temporal evaluation in Figure 3(2), it is shown that UNet_ConvLSTM
dependencies among the continuous image frames, which tends to not classify pixels into lane lines for uncertain areas
could be very helpful for challenging situations where the lanes under some challenging situations (e.g., the 2nd and 7th columns
are shadowed or covered by other objects in the current frame. in Figure 3(2)). This might be the reason for its obtaining better
Finally, the proposed architecture could make the best of Precision. To further confirm this speculation, Figure 4
the spatial-temporal information among the processed K compares the lane detection results of
continuous frames by regulating the weights of the SCNN_SegNet_ConvLSTM2 and UNet_ConvLSTM under
convolutional kernels within the SCNN and ConvLSTM / challenging situations 8-blur&curve, and 10-shadow-dark,
ConvGRU layers. where UNet_ConvLSTM delivers very good Precisions.
All in all, with the proposed architecture the proposed (a)
model tries to not only strengthen feature extraction regarding
(b)
spatial relation in one image frame but also the spatial-
(c)
temporal correlation and dependencies among image frames
for lane detection. (d)

Looking at the main metric, F1-Measure, it is demonstrated (e)

that increasing only Precision or only Recall will not improve
(f)
the F1-Measure. Although the bassline models of U-Net, (1) Challenging situation 8-blur&curve
SegNet, and SegNet_ConvLSTM get better Recalls, they do (a)
not deliver good F1-Measure since their Precisions is much
(b)
lower than the proposed model of
(c)
SCNN_SegNet_ConvLSTM2 or SCNN_UNet_ConvLSTM2.
Regarding the good Recall of U-Net and SegNet, it could be (d)

speculated from the qualitative evaluation, where one can find (e)
that U-Net and SegNet tend to produce thicker lane lines. With
(f)
thicker lines and blurry areas, the two models can somehow
(2) Challenging situation 10-shadow-dark
reduce the False Negative, which will contribute to better FIGURE 4. Visual comparison of the lane-detection results on
Recall. This also demonstrates that Recall and Precision challenging driving situations for UNet_ConvLSTM and the proposed
antagonize each other which further proves that F1-Measure model SCNN_SegNet_ConvLSTM2. All the results are not post-
processed. (a) Input images. (b) Ground truth. (c) Detection results of
should be a more reasonable evaluation measure compared UNet_ConvLSTM. (d) Detection results of UNet_ConvLSTM
with Precision and Recall. overlapping on the original images. (e) Detection results of
3) Performance and comparisons on tvtLANE testset #2 SCNN_SegNet_ConvLSTM2. (f) Detection results of
SCNN_SegNet_ConvLSTM2 overlapping on the original images. The
(challenging situations)
upper part (1) is for challenging situation 8-blur&curve, while the down
To further evaluate the proposed models’ performance and part (2) is for situation 10-shadow-dark.
verify the models’ robustness, the models were evaluated on a
As illustrated in Figure 4, truly UNet_ConvLSTM tries not
brand-new dataset, i.e., the tvtLANE Testset #2. As introduced
to classify pixels into lane lines under uncertain areas as much
in 3.1 Datasets, tvtLANE Testset #2 includes 728 images in
as possible. This leads to fewer False Negatives which helps
highway, urban, and rural driving scenes. These challenging
for raising a better Precision. However, in real application
driving scenes’ data were obtained by data recorders at various
scenarios, this is not wise and not acceptable. On the contrary,
locations, outside and inside the car front windshield under
the proposed model SCNN_SegNet_ConvLSTM2 tries to
different road and weather conditions. Testset #2 is a
make tough but valuable detections classifying candidate
challenging and comprehensive dataset for model evaluation,
points into lane lines in the challenging uncertain areas with
from which some cases would be difficult enough for humans
dirt, dark road conditions, and/or vehicle occlusions. This may
to do the correct detection.
lead to more False Negatives and a worse Precision but is
Table 3 demonstrates the model performance comparison
praiseworthy. These analyses further demonstrate that F1-
on the 12 types of challenging scenes in tvtLANE Testset #2.
Measure is a better measure compared with Precision. Finally,
Following the results and discussions in 2) Performance and
it can be concluded that the proposed model,
comparisons on tvtLANE testset #1(normal situations), here
SCNN_SegNet_ConvLSTM2, delivers the best performance
DONG ET AL. 13

Normal Challenging #2 shadow-bright Challenging #8 blur&curve

on the challenging tvtLANE Testset #2, which verified the
proposed model architecture’s robustness.
To sum up, the proposed model architecture demonstrates (a)
its effectiveness in both normal and challenging driving
scenes, with the UNet based model,
SCNN_UNet_ConvLSTM2, beats the baseline models with a (b)
large margin on normal situations, while the SegNet based
model, SCNN_SegNet_ConvLSTM2 performs the best
handling almost all the challenging driving scenes. The finding (c)
that, compared with UNet based models, SegNet based neural
network models are more robust coping with challenging
driving environments accords with results in (Zou et al., 2020). FIGURE 5. Visualization of the extracted low-level features at
Down_ConvBlock_1 for UNet based models. (a) Original image. (b)
Results of UNet_ConvLSTM (without SCNN layers). (c) Results of the
3.3 Parameter analysis and ablation study SCNN_UNet_ConvLSTM2 (with SCNN layers).
1) The added value of SCNN updating weight parameters of the neural networks) and thus
Regarding the neural network architecture, the effects of affect the model’s performance regarding the marking
SCNN were investigated by evaluating performances of the detection results. It might further influence the final detection
model variants with and without SCNN layers. As results. In contrast, with SCNN layers, the extracted features
demonstrated in Figure 3 and Figure 4, together with the of the lanes are more inerratic, clear, and evident as shown in
quantitative results in Table 2 and Table 3, the proposed Figure 5 (c). There are fewer interferences surrounding the
SegNet and UNet based models with SCNN embedded detected lane features. This verifies SCNN’s powerful strength
encoder, i.e., SCNN_SegNet_ConvLSTM, in detecting the spatial relations in every single image with its
SCNN_SegNet_ConvGRU, SCNN_UNet_ConvLSTM, and message passing mechanism.
SCNN_UNet_ConvGRU, outperform SegNet_ConvLSTM All the above results demonstrate the adding of the SCNN
and UNet_ConvLSTM which are also SegNet or UNet based layer embedded in the encoder does contribute to the spatial
sequential model using multiple continuous image frames as feature extraction, with which the model could better make the
inputs but without SCNN. Especially, utmost use of the spatial-temporal information among the
SCNN_UNet_ConvLSTM2 obtains the best result in normal continuous image frames.
testing while SCNN_SegNet_ConvLSTM2 delivers the best 2) Different locations of SCNN layer
performance in challenging situations. TABLE 4. Model performance comparison with different
For normal cases’ testing on tvtLANE Testset#1, as shown locations of SCNN layer on tvtLANE testset #1 and #2.
in Table 2, by adding SCNN layer in the encoder, almost all Testing Datasets
the proposed models with SCNN embedded encoder
Testset #1 Testset #2
outperform the baseline models with better F1-Measure. To be (Normal Situations) (Challenging Scenes)
specific, SCNN_SegNet_ConvLSTM2 improves the lane Models
s
detection accuracy by around 0.3% and F1-measure by around
Location Test_ Precisi F1-
Test_
Precisi F1-
1%, and these improvements are from the already very good Acc Recall Acc Recall
of SCNN (%) on Measure
(%)
on Measure
results obtained by SegNet_ConvLSTM. Similarly, SegNet_Conv
LSTM
Without 97.92 0.874 0.931 0.901 97.83 0.756 0.765 0.761
SCNN_UNet_ConvLSTM2 overperforms UNet_ConvLSTM
SCNN_SegNet_ Conv1_1 98.00 0.884 0.921 0.902 97.92 0.757 0.757 0.757
with even larger margins regarding both Accuracy, Precision, ConvLSTM2 Conv2_1 98.07 0.893 0.928 0.910 97.90 0.767 0.766 0.767
and F1-measure. UNet_Conv
Without 98.00 0.857 0.957 0.904 97.93 0.778 0.660 0.714
For challenging situations, adding the SCNN layer also LSTM
helps the proposed model, SCNN_SegNet_ConvLSTM2, beat SCNN_UNet_ In_Conv_1 98.28 0.896 0.939 0.917 98.08 0.776 0.593 0.672
ConvLSTM2 Conv1_1 98.19 0.889 0.950 0.918 97.95 0.778 0.640 0.702
other baseline models, and deliver the best F1-Measure as
indicated in Table 3. Results of testing different locations of the SCNN layer in
Figure 5 visualizes the extracted features at the proposed model architecture are shown in Table 4. The
Down_ConvBlock_1 layer for UNet based models, with and results reveal that: (a) Compared with baseline models without
without SCNN. Clearly, vast differences can be witnessed SCNN layers, the embedding of SCNN layers really help to
between the baseline model UNet_ConvLSTM and the improve the models’ performance, which further verifies the
proposed model SCNN_UNet_ConvLSTM2. In Figure 5 (b), added-value of SCNN and accords with the aforementioned
the CNN-based UNet layers identify the low-level features in results in 1); (b) In terms of the main evaluation metric F1-
the images regarding the target lane lines. However, the measure, embedding SCNN layer after the Conv1_1 (in
extracted features are not so clear, i.e., there are some SegNet based model) or In_Conv_1 (in UNet based model)
interference signals, especially as visualized in the third image layer delivers better results compared with embedding it at the
of row (b), which is supposed to affect the model training (i.e., very beginning or early layers of the encoder; (c) For UNet
based model, embedding SCNN layer at the very beginning
14 DONG ET AL.

delivers quite good Precision and Accuracy, but worse Recall, network backbone to reduce the total parameter size and
which means there are fewer False Positives but more False improve the model’s ability to operate in real-time. The
Negatives. This should be related to the properties of the UNet UNetLight backbone has a similar network design with UNet
style neural network. These results further confirm the whose parameter settings are demonstrated in Table A2. The
effectiveness of the proposed model architecture. only difference is that all the numbers of kernels in the
3) Type and number of ST-RNN layers ConvBlocks are reduced to half except for the Input in
As described in Section 3, in the proposed model In_ConvBlock (with the input channel of 3 unchanged) and
architecture two types of RNNs, i.e., ConvLSTM and Output in Out_ConvBlock (with the output channel of 2
ConvGRU, are employed to serve in the ST-RNN block, to unchanged). From the testing results in Table 2, it is shown that
capture and make use of the spatial-temporal dependencies and the model named SCNN_UNetLight_ConvGRU2, with fewer
correlations among the continuous image sequences. The parameters than all the baseline models, beat the baselines
number of hidden ConvLSTM and ConvGRU layers were also exhibiting better performance regarding both Accuracy and
tested from 1 to 2. The quantitative results are demonstrated in F1-Measure. To be specific, compared with the best baseline
Table 2 and Table 3, while some intuitive qualitative insights model, i.e., UNet_ConvLSTM,
could be drawn from Figure 3 and Figure 4. SCNN_UNetLight_ConvGRU2 only uses less than one-fifth
From Table 2, it is illustrated that in general models of the parameter size but delivers better evaluation metrics in
adopting ConvLSTM layers in the ST-RNN block perform testing Accuracy, Precision, and F1-Measure.
better than those adopting ConvGRU layers with improved F1- Regarding UNetLight based models, models using
measure, except for the UNetLight based. This could be ConvGRU layers in the ST-RNN block perform better than
explained by ConvLSTM’s better properties in extracting those adopting ConvLSTM. The reason could be that light
spatial-temporal features and capturing time dependencies by version UNet cannot implement high-quality feature extraction
more control gates and thus more parameters compared with which does not feed enough information for ConvLSTM,
ConvGRU. Furthermore, from Table 2 and Table 3, it is while ConvGRU, with fewer control gates, is more robust
observed that models with two hidden ST-RNN layers, for when low-level features are not that fully extracted.
both ConvLSTM and ConvGRU, generally perform better than All these results further verify the proposed network
those with only one hidden ST-RNN layer. This could be architecture’s effectiveness and strength.
speculated that with two hidden ST-RNN layers, one layer can
serve for sequential feature extraction, and the other can 4 CONCLUSION
achieve spatial-temporal feature integration. The In this paper, a novel spatial-temporal sequence-to-one model
improvements of two ST-RNN layers over one are not that framework with a hybrid neural network architecture is
significant which might be due to (a) models employing one proposed for robust lane detection under various normal and
ST-RNN layer already obtain good results; (b) since the length challenging driving scenes. This architecture integrates single
of the continuous image frames is only five, one ST-RNN layer image feature extraction module with SCNN, spatial-temporal
might be already enough to do the spatial-temporal feature feature integration module with ST-RNN, together with the
extraction, so when incorporating longer image sequences the encoder-decoder structure. The proposed architecture achieved
superiorities of two ST-RNN layers could be promoted. significantly better results in comparison to baseline models
However, longer image sequences require more computational that use a single frame (e.g., U-Net, SegNet, and LaneNet), as
resources and longer training time, which could not be well as the state-of-art models adopting "CNN+RNN"
afforded at the present stage in this study. This could be the structures (e.g., UNet_ConvLSTM, SegNet_ConvLSTM),
future research direction. with the best testing Accuracy, Precision, F1-measure on the
4) Number of parameters and real-time capability normal driving dataset (i.e., tvtLANE Testset #1) and the best
As shown in Table 2, the two proposed candidate models, F1-measure on 12 challenging driving scenarios dataset
i.e., SCNN_SegNet_ConvLSTM2 and (tvtLANE Testset #2). The results demonstrate the
SCNN_UNet_ConvLSTM2, possess a bit more parameters effectiveness of strengthening spatial relation abstraction in
compared with the baseline SegNet_ConvLSTM and every single image with SCNN layer, plus the employment of
UNet_ConvLSTM, respectively. However, almost all of the multiple continuous image sequences as inputs. The results
proposed model variants with different types and numbers of also demonstrate the proposed model architecture’s ability in
ST-RNN layers outperform the baselines, and some of them making the best of the spatial-temporal information in
are even with low parameter sizes e.g., continuous image frames. Extensive experimental results show
SCNN_SegNet_ConvGRU1, SCNN_SegNet_ConvLSTM1, the superiorities of the sequence-to-one "SCNN +
SCNN_UNet_ConvGRU1, SCNN_UNet_ConvLSTM1. ConvLSTM" over "SCNN + ConvGRU" and ordinary "CNN
Generally speaking, lower numbers of model parameters mean + ConvLSTM" regarding sequential spatial-temporal feature
better real-time capability. extracting and learning, together with target-information
In addition, four model variants were implemented with a classification for robust lane detection. In addition, testing
modified light version of UNet, i.e., UNetLight, serving as the results of the model variants with the modified light version of
DONG ET AL. 15

UNet (i.e., UNetLight) as the backbone, demonstrate the Berriel, R.F., de Aguiar, E., de Souza, A.F., Oliveira-Santos, T., 2017.
proposed model architecture’s potential regarding real-time Ego-Lane Analysis System (ELAS): Dataset and algorithms. Image
Vis. Comput. https://doi.org/10.1016/j.imavis.2017.07.005
capability. Bottou, L., 2010. Large-scale machine learning with stochastic gradient
To the best of the authors’ knowledge, the proposed model descent, in: Proceedings of COMPSTAT 2010 - 19th International
is the first attempt that tries to strengthen both spatial relations Conference on Computational Statistics, Keynote, Invited and
Contributed Papers. https://doi.org/10.1007/978-3-7908-2604-3_16
regarding feature extraction in every image frame together Chen, S., Leng, Y., Labi, S., 2020. A deep learning algorithm for
with the spatial-temporal correlations and dependencies simulating autonomous driving considering prior knowledge and
among image frames for lane detection, and the extensive temporal information. Comput. Civ. Infrastruct. Eng. 35, 305–321.
https://doi.org/10.1111/mice.12495
evaluation experiments demonstrate the strength of this
Chen, W., Wang, W., Wang, K., Li, Z., Li, H., Liu, S., 2020. Lane
proposed architecture. Therefore, it is recommended in future departure warning systems and lane line detection methods based on
research to incorporate both aspects to obtain better image processing and semantic segmentation–a review. J. Traffic
performance. Transp. Eng. (English Ed. https://doi.org/10.1016/j.jtte.2020.10.002
Chen, Z., Liu, Q., Lian, C., 2019. PointLaneNet: Efficient end-to-end
In this paper, the challenging cases do not include night CNNs for accurate real-time lane detection. IEEE Intell. Veh. Symp.
driving, rainy or wet road conditions, neither do they include Proc. 2019-June, 2563–2568.
situations in which the input images are defective (e.g., partly https://doi.org/10.1109/IVS.2019.8813778
Choi, Y., Park, J.H., Jung, H., 2018. Lane Detection Using Labeling
masked or blurred). There are demands to build larger test sets Based RANSAC Algorithm. International Journal of Computer and
with comprehensive challenging situations to further validate Information Engineering, 12(4), 245–248.
the model’s robustness. Since a large amount of unlabeled Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., Fei-Fei, L., 2009.
driving scene data involving various challenging cases was Imagenet: A large-scale hierarchical image database. In 2009 IEEE
conference on computer vision and pattern recognition (pp. 248-255).
collected within the research group, a future research direction https://doi.org/10.1109/cvprw.2009.5206848.
might be to develop semi-supervised learning methods and Du, H., Xu, Z., Ding, Y., 2018. The fast lane detection of road using
employ domain adaption to label the collected data, and then RANSAC algorithm, in: Advances in Intelligent Systems and
Computing. https://doi.org/10.1007/978-3-319-67071-3_1
open source them for boosting the research in the field of Guo, J., Wei, Z., Miao, D., 2015. Lane Detection Method Based on
robust lane detection. Furthermore, to further enhance the lane Improved RANSAC Algorithm, in: Proceedings - 2015 IEEE 12th
detection model, customed loss function, pre-trained International Symposium on Autonomous Decentralized Systems,
ISADS 2015. https://doi.org/10.1109/ISADS.2015.24
techniques adopted in image-inpainting task, e.g., masked
Haris, M., Glowacz, A., 2021. Lane line detection based on object feature
autoencoders, plus sequential attention mechanism could be distillation. Electron, 10(9), 1102.
introduced and integrated into the proposed framework. https://doi.org/10.3390/electronics10091102
Hochreiter, S., Schmidhuber, J., 1997. Long Short Term Memory. Neural
ACKNOWLEDGMENT Computation. Neural computation, 9(8), 1735-1780.
Hou, Y., Ma, Z., Liu, C., Hui, T.W., Loy, C.C., 2020. Inter-Region
This work was supported by the Applied and Technical Affinity Distillation for Road Marking Segmentation. Proc. IEEE
Sciences (TTW), a subdomain of the Dutch Institute for Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 12483–125492.
Scientific Research (NWO) through the Project Safe and https://doi.org/10.1109/CVPR42600.2020.01250
Jiao, X., Yang, D., Jiang, K., Yu, C., Wen, T., Yan, R., 2019. Real-time
Efficient Operation of Automated and Human-Driven lane detection and tracking for autonomous vehicle applications.
Vehicles in Mixed Traffic (SAMEN) under Contract 17187. Proc. Inst. Mech. Eng. Part D J. Automob. Eng.
The authors thank Dr. Qin Zou, Hanwen Jiang, and Qiyu Dai https://doi.org/10.1177/0954407019866989
Kim, J., Park, C., 2017. End-To-End Ego Lane Estimation Based on
from Wuhan University, as well as Jiyong Zhang from
Sequential Transfer Learning for Self-Driving Cars, in: IEEE
Southwest Jiaotong University for their tips in using the Computer Society Conference on Computer Vision and Pattern
tvtLANE dataset. Recognition Workshops. https://doi.org/10.1109/CVPRW.2017.158
Ko, Y., Lee, Y., Azam, S., Munir, F., Jeon, M., Pedrycz, W., 2020. Key
REFERENCES Points Estimation and Point Instance Segmentation Approach for
Lane Detection 1–10. https://doi.org/10.1109/tits.2021.3088488
Aly, M., 2008. Real time detection of lane markers in urban streets. IEEE
Levinson, J., Askeland, J., Becker, J., Dolson, J., Held, D., Kammel, S.,
Intell. Veh. Symp. Proc. 7–12.
Kolter, J.Z., Langer, D., Pink, O., Pratt, V., Sokolsky, M., Stanek, G.,
https://doi.org/10.1109/IVS.2008.4621152
Stavens, D., Teichman, A., Werling, M., Thrun, S., 2011. Towards
Andrade, D.C., Bueno, F., Franco, F.R., Silva, R.A., Neme, J.H.Z.,
fully autonomous driving: Systems and algorithms, in: IEEE
Margraf, E., Omoto, W.T., Farinelli, F.A., Tusset, A.M., Okida, S.,
Intelligent Vehicles Symposium, Proceedings.
Santos, M.M.D., Ventura, A., Carvalho, S., Amaral, R.D.S., 2019. A
https://doi.org/10.1109/IVS.2011.5940562
Novel Strategy for Road Lane Detection and Tracking Based on a
Li, X., Li, J., Hu, X., Yang, J., 2020. Line-CNN: End-to-End Traffic Line
Vehicle’s Forward Monocular Camera. IEEE Trans. Intell. Transp.
Detection with Line Proposal Unit. IEEE Trans. Intell. Transp. Syst.
Syst. 20, 1497–1507. https://doi.org/10.1109/TITS.2018.2856361
21, 248–258. https://doi.org/10.1109/TITS.2019.2890870
Badrinarayanan, V., Kendall, A., Cipolla, R., 2017. SegNet: A Deep
Liang, D., Guo, Y.C., Zhang, S.K., Mu, T.J., Huang, X., 2020. Lane
Convolutional Encoder-Decoder Architecture for Image
Detection: A Survey with New Results. J. Comput. Sci. Technol. 35,
Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–
493–505. https://doi.org/10.1007/s11390-020-0476-4
2495. https://doi.org/10.1109/TPAMI.2016.2644615
Lin, C., Li, L., Cai, Z., Wang, K.C.P., Xiao, D., Luo, W., Guo, J.G., 2020.
Ballas, N., Yao, L., Pal, C., Courville, A., 2016. Delving deeper into
Deep Learning-Based Lane Marking Detection using A2-LMDet.
convolutional networks for learning video representations, in: 4th
Transp. Res. Rec. https://doi.org/10.1177/0361198120948508
International Conference on Learning Representations, ICLR 2016 -
Liu, L., Chen, X., Zhu, S., Tan, P., 2021. CondLaneNet: a Top-to-down
Conference Track Proceedings.
Lane Detection Framework Based on Conditional Convolution. arXiv
Bar Hillel, A., Lerner, R., Levi, D., Raz, G., 2014. Recent progress in road
preprint arXiv:2105.05003.
and lane detection: A survey. Mach. Vis. Appl. 25, 727–745.
https://doi.org/10.1007/s00138-011-0404-2
16 DONG ET AL.

Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J., 2019. On recognition. IEEE J. Sel. Top. Signal Process.
the Variance of the Adaptive Learning Rate and Beyond. arXiv https://doi.org/10.1109/JSTSP.2017.2756439
preprint arXiv:1908.03265. Xing, Y., Lv, C., Chen, L., Wang, Huaji, Wang, Hong, Cao, D., Velenis,
Liu, R., Yuan, Z., Liu, T., Xiong, Z., 2020. End-to-end Lane Shape E., Wang, F.Y., 2018. Advances in Vision-Based Lane Detection:
Prediction with Transformers 3694–3702. Algorithms, Integration, Assessment, and Perspectives on ACP-
https://doi.org/10.1109/wacv48630.2021.00374 Based Parallel Vision. IEEE/CAA J. Autom. Sin. 5, 645–661.
Liu, T., Chen, Z., Yang, Y., Wu, Z., Li, H., 2020. Lane Detection in Low- https://doi.org/10.1109/JAS.2018.7511063
light Conditions Using an Efficient Data Enhancement : Light Xu, H., Wang, S., Cai, X., Zhang, W., Liang, X., Li, Z., 2020. CurveLane-
Conditions Style Transfer. IEEE Intell. Veh. Symp. Proc. 2020-May, NAS: Unifying Lane-Sensitive Architecture Search and Adaptive
1394–1399. https://doi.org/10.1109/IV47402.2020.9304613 Point Blending. arXiv preprint arXiv:2007.12147.
Lu, Z., Xu, Y., Shan, X., Liu, L., Wang, X., Shen, J., 2019. A lane Yasrab, R., Gu, N., Zhang, X., 2017. An encoder-decoder based
detection method based on a ridge detector and regional G-RANSAC. Convolution Neural Network (CNN) for future Advanced Driver
Sensors (Switzerland). https://doi.org/10.3390/s19184028 Assistance System (ADAS). Appl. Sci.
Neven, D., De Brabandere, B., Georgoulis, S., Proesmans, M., Van Gool, https://doi.org/10.3390/app7040312
L., 2018. Towards End-to-End Lane Detection: An Instance Yoo, S., Seok Lee, H., Myeong, H., Yun, S., Park, H., Cho, J., Hoon Kim,
Segmentation Approach. IEEE Intell. Veh. Symp. Proc. 2018-June, D., 2020. End-to-end lane marker detection via row-wise
286–291. https://doi.org/10.1109/IVS.2018.8500547 classification. IEEE Comput. Soc. Conf. Comput. Vis. Pattern
Neven, D., De Brabandere, B., Georgoulis, S., Proesmans, M., Van Gool, Recognit. Work. 2020-June, 4335–4343.
L., 2017. Fast Scene Understanding for Autonomous Driving. https://doi.org/10.1109/CVPRW50498.2020.00511
Pan, X., Shi, J., Luo, P., Wang, X., Tang, X., 2018. Spatial as deep: Spatial Zhang, J., Deng, T., Yan, F., Liu, W., 2021. Lane Detection Model Based
CNN for traffic scene understanding, in: 32nd AAAI Conference on on Spatio-Temporal Network With Double Convolutional Gated
Artificial Intelligence, AAAI 2018. AAAI press, pp. 7276–7283. Recurrent Units. IEEE Trans. Intell. Transp. Syst.
Pascanu, R., Mikolov, T., Bengio, Y., 2013. On the difficulty of training https://doi.org/10.1109/TITS.2021.3060258
recurrent neural networks, in: 30th International Conference on Zheng, F., Luo, S., Song, K., Yan, C.W., Wang, M.C., 2018. Improved
Machine Learning, ICML 2013. Lane Line Detection Algorithm Based on Hough Transform. Pattern
Philion, J., 2019. FastDraw: Addressing the long tail of lane detection by Recognit. Image Anal. 28, 254–260.
adapting a sequential prediction network. Proc. IEEE Comput. Soc. https://doi.org/10.1134/S1054661818020049
Conf. Comput. Vis. Pattern Recognit. 2019-June, 11574–11583. Zou, Q., Jiang, H., Dai, Q., Yue, Y., Chen, L., Wang, Q., 2020. Robust
https://doi.org/10.1109/CVPR.2019.01185 lane detection from continuous driving scenes using deep neural
Qin, Z., Wang, H., Li, X., 2020. Ultra Fast Structure-aware Deep Lane networks. IEEE Trans. Veh. Technol. 69, 41–54.
Detection. In Computer Vision–ECCV 2020: 16th European https://doi.org/10.1109/TVT.2019.2949603
Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part Zou, Q., Ni, L., Wang, Q., Li, Q., Wang, S., 2017. Robust Gait
XXIV 16 (pp. 276-291). Springer International Publishing. Recognition by Integrating Inertial and RGBD Sensors. IEEE Trans.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional Cybern. https://doi.org/10.1109/TCYB.2017.2682280
networks for biomedical image segmentation. Lect. Notes Comput.
Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes
Bioinformatics) 9351, 234–241. https://doi.org/10.1007/978-3-319-
24574-4_28
Ribeiro, A. H., Tiels, K., Aguirre, L. A., & Schön, T. (2020). Beyond
exploding and vanishing gradients: analysing RNN training using
attractors and smoothness. In International Conference on Artificial
Intelligence and Statistics (pp. 2370-2380). PMLR.
Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.C.,
2015. Convolutional LSTM network: A machine learning approach
for precipitation nowcasting, in: Advances in Neural Information
Processing Systems.
Sivaraman, S., Trivedi, M.M., 2013. Integrated lane and vehicle detection,
localization, and tracking: A synergistic approach. IEEE Trans. Intell.
Transp. Syst. 14, 906–917.
https://doi.org/10.1109/TITS.2013.2246835
Sutskever, I., Vinyals, O., Le, Q. V., 2014. Sequence to sequence learning
with neural networks, in: Advances in Neural Information Processing
Systems (pp. 3104-3112).
Tabelini, L., Berriel, R., Paixão, T.M., Badue, C., de Souza, A.F.,
Oliveira-Santos, T., 2020a. PolyLaneNet: Lane estimation via deep
polynomial regression. arXiv preprint arXiv:2004.10924.
Tabelini, L., Berriel, R., Paixão, T.M., Badue, C., De Souza, A.F.,
Olivera-Santos, T., 2020b. Keep your Eyes on the Lane: Attention-
guided Lane Detection. arXiv e-prints, arXiv-2010
Wang, B.F., Qi, Z.Q., Ma, G.C., 2014. Robust lane recognition for
structured road based on monocular vision. J. Beijing Inst. Technol.
(English Ed). 23, 345–351.
Wang, S., Hou, X., Zhao, X., 2020. Automatic Building Extraction from
High-Resolution Aerial Imagery via Fully Convolutional Encoder-
Decoder Network with Non-Local Block. IEEE Access.
https://doi.org/10.1109/ACCESS.2020.2964043
Wang, Y., Dahnoun, N., Achim, A., 2012. A novel system for robust lane
detection and tracking. Signal Processing.
https://doi.org/10.1016/j.sigpro.2011.07.019
Wu, B., Li, K., Ge, F., Huang, Z., Yang, M., Siniscalchi, S.M., Lee,
C.H.L., 2017. An end-to-end deep learning approach to simultaneous
speech dereverberation and acoustic modeling for robust speech
DONG ET AL. 17

APPENDIX
See Table A1 and Table A2.
TABLE A1. Parameter settings for each layer of the SegNet-based neural network.

Input Output
Layer Kernel Padding Stride Activation
(channel×hight×width) (channel×hight×width)
Conv_1_1 3×128×256 64×128×256 3×3 (1,1) 1 ReLU
Down_ConvBlock_1 Conv_1_2 64×128×256 64×128×256 3×3 (1,1) 1 ReLU
Maxpool1 64×128×256 64×64×128 2×2 (0,0) 2 ---
SCNN_Down 64×1×128 64×1×128 1×9 (0,4) 1 ReLU
SCNN_Up 64×1×128 64×1×128 1×9 (0,4) 1 ReLU
SCNN
SCNN_Right 64×64×1 64×64×1 9×1 (4,0) 1 ReLU
SCNN_Left 64×64×1 64×64×1 9×1 (4,0) 1 ReLU
Conv_2_1 64×64×128 128×64×128 3×3 (1,1) 1 ReLU
Down_ConvBlock_2 Conv_2_2 128×64×128 128×64×128 3×3 (1,1) 1 ReLU
Maxpool2 128×64×128 128×32×64 2×2 (0,0) 2 ---
Conv_3_1 128×32×64 256×32×64 3×3 (1,1) 1 ReLU
Conv_3_2 256×32×64 256×32×64 3×3 (1,1) 1 ReLU
Down_ConvBlock_3
Conv_3_3 256×32×64 256×32×64 3×3 (1,1) 1 ReLU
Maxpool3 256×64×128 256×16×32 2×2 (0,0) 2 ---
Conv_4_1 256×16×32 512×16×32 3×3 (1,1) 1 ReLU
Conv_4_2 512×16×32 512×16×32 3×3 (1,1) 1 ReLU
Down_ConvBlock_4
Conv_4_3 512×16×32 512×16×32 3×3 (1,1) 1 ReLU
Maxpool4 512×16×32 512×8×16 2×2 (0,0) 2 ---
Conv_5_1 512×8×16 512×8×16 3×3 (1,1) 1 ReLU
Conv_5_2 512×8×16 512×8×16 3×3 (1,1) 1 ReLU
Down_ConvBlock_5
Conv_5_3 512×8×16 512×8×16 3×3 (1,1) 1 ReLU
Maxpool5 512×8×16 512×4×8 2×2 (0,0) 2 ---
5 * ConvLSTMCell(input=(512×4×8), kernel=(3,3), stride=(1,1), padding=(1,1)) Or
ST-RNN Layer1*
5 * ConvGRUCell(input=(512×4×8), kernel=(3,3), stride=(1,1), padding=(1,1), dropout(0.5))
5 * ConvLSTMCell(input=(512×4×8), kernel=(3,3), stride=(1,1), padding=(1,1)) Or
ST-RNN Layer2**
5 * ConvGRUCell(input=(512×4×8), kernel=(3,3), stride=(1,1), padding=(1,1), dropout(0.5))
MaxUnpool1 512×4×8 512×8×16 2×2 (0,0) 2 ---
Up_Conv_5_1 512×8×16 512×8×16 3×3 (1,1) 1 ReLU
Up_ConvBlock_5
Up_Conv_5_2 512×8×16 512×8×16 3×3 (1,1) 1 ReLU
Up_Conv_5_3 512×8×16 512×8×16 3×3 (1,1) 1 ReLU
MaxUnpool2 512×8×16 512×16×32 2×2 (0,0) 2 ---
Up_Conv_4_1 512×16×32 512×16×32 3×3 (1,1) 1 ReLU
Up_ConvBlock_4
Up_Conv_4_2 512×16×32 512×16×32 3×3 (1,1) 1 ReLU
Up_Conv_4_3 512×16×32 256×16×32 3×3 (1,1) 1 ReLU
MaxUnpool3 256×16×32 256×32×64 2×2 (0,0) 2 ---
Up_Conv_3_1 256×32×64 256×32×64 3×3 (1,1) 1 ReLU
Up_ConvBlock_3
Up_Conv_3_2 256×32×64 256×32×64 3×3 (1,1) 1 ReLU
Up_Conv_3_3 256×32×64 128×32×64 3×3 (1,1) 1 ReLU
MaxUnpool4 128×32×64 128×64×128 2×2 (0,0) 2 ---
Up_ConvBlock_2 Up_Conv_2_1 128×64×128 128×64×128 3×3 (1,1) 1 ReLU
Up_Conv_2_2 128×64×128 64×64×128 3×3 (1,1) 1 ReLU
MaxUnpool5 64×64×128 64×128×256 2×2 (0,0) 2 ---
Up_ConvBlock_1 Up_Conv_1_1 64×128×256 64×128×256 3×3 (1,1) 1 ReLU
Up_Conv_1_2 64×128×256 2×128×256 3×3 (1,1) 1 LogSoftmax

Abbreviations: ConvGRU, convolutional gated recurrent unit; ConvLSTM, convolutional long short-term memory; SCNN, spatial convolutional
neural network; ST-RNN, spatial-temporal recurrent neural network; ReLU, Rectified Linear Unit.

* Two types of ST-RNN, i.e., ConvLSTM and ConvGRU are tested;

** ST-RNN blocks are tested with 1 hidden layer or 2 hidden layers.
18 DONG ET AL.

TABLE A2. Parameter settings for each layer of the UNet-based neural network.

Input Output
Layer Kernel Padding Stride Activation
(channel×hight×width) (channel×hight×width)
In_Conv_1 3×128×256 64×128×256 3×3 (1,1) 1 ReLU
In_ConvBlock
In_Conv_2 64×128×256 64×128×256 3×3 (1,1) 1 ReLU
SCNN_Down 64×1×256 64×1×256 1×9 (0,4) 1 ReLU
SCNN_Up 64×1×256 64×1×256 1×9 (0,4) 1 ReLU
SCNN
SCNN_Right 64×128×1 64×128×1 9×1 (4,0) 1 ReLU
SCNN_Left 64×128×1 64×128×1 9×1 (4,0) 1 ReLU
Maxpool1 64×128×256 64×64×128 2×2 (0,0) 2 ---
Down_ConvBlock_1 Conv_1_1 64×64×128 128×64×128 3×3 (1,1) 1 ReLU
Conv_1_2 128×64×128 128×64×128 3×3 (1,1) 1 ReLU
Maxpool2 128×64×128 128×32×64 2×2 (0,0) 2 ---
Down_ConvBlock_2 Conv_2_1 128×32×64 256×32×64 3×3 (1,1) 1 ReLU
Conv_2_2 256×32×64 256×32×64 3×3 (1,1) 1 ReLU
Maxpool3 256×32×64 256×16×32 2×2 (0,0) 2 ---
Down_ConvBlock_3 Conv_3_1 256×16×32 512×16×32 3×3 (1,1) 1 ReLU
Conv_3_2 512×16×32 512×16×32 3×3 (1,1) 1 ReLU
Maxpool4 512×16×32 512×8×16 2×2 (0,0) 2 ---
Down_ConvBlock_4 Conv_4_1 512×8×16 512×8×16 3×3 (1,1) 1 ReLU
Conv_4_2 512×8×16 512×8×16 3×3 (1,1) 1 ReLU
5 * ConvLSTMCell(input=(512×8×16), kernel=(3,3), stride=(1,1), padding=(1,1)) Or
ST-RNN Layer1*
5 * ConvGRUCell(input=(512×8×16), kernel=(3,3), stride=(1,1), padding=(1,1), dropout(0.5))
5 * ConvLSTMCell(input=(512×8×16), kernel=(3,3), stride=(1,1), padding=(1,1)) Or
ST-RNN Layer2**
5 * ConvGRUCell(input=(512×8×16), kernel=(3,3), stride=(1,1), padding=(1,1), dropout(0.5))
UpsamplingBilinear2D_1 512×8×16 512×16×32 2×2 (0,0) 2 ---
Up_ConvBlock_4 Up_Conv_4_1 1024×16×32 256×16×32 3×3 (1,1) 1 ReLU
Up_Conv_4_2 256×16×32 256×16×32 3×3 (1,1) 1 ReLU
UpsamplingBilinear2D_2 256×16×32 256×32×64 2×2 (0,0) 2 ---
Up_ConvBlock_3 Up_Conv_3_1 512×32×64 128×32×64 3×3 (1,1) 1 ReLU
Up_Conv_3_2 128×32×64 128×32×64 3×3 (1,1) 1 ReLU
UpsamplingBilinear2D_3 128×32×64 128×64×128 2×2 (0,0) 2 ---
Up_ConvBlock_2 Up_Conv_2_1 256×64×128 64×64×128 3×3 (1,1) 1 ReLU
Up_Conv_2_2 64×64×128 64×64×128 3×3 (1,1) 1 ReLU
UpsamplingBilinear2D_4 64×64×128 64×128×256 2×2 (0,0) 2 ---
Up_ConvBlock_1 Up_Conv_1_1 128×128×256 64×128×256 3×3 (1,1) 1 ReLU
Up_Conv_1_2 64×128×256 64×128×256 3×3 (1,1) 1 ReLU
Out_ConvBlock Out_Conv 64×128×256 2×128×256 1×1 (0,0) 1 ---

* Similar to the SegNet-based network architecture, two types of ST-RNN, i.e., ConvLSTM and ConvGRU, are tested;
** ST-RNN blocks are tested with one hidden layer or two hidden layers.

A Hybrid Lane Detection Model For Wild Road Conditions-1
No ratings yet
A Hybrid Lane Detection Model For Wild Road Conditions-1
10 pages
Deep Learning for Lane Detection
No ratings yet
Deep Learning for Lane Detection
15 pages
End-To-End Lane Shape Prediction With Transformers: Lrj466097290@stu - Xjtu.edu - CN Yuan - Ze.jian@xjtu - Edu.cn
No ratings yet
End-To-End Lane Shape Prediction With Transformers: Lrj466097290@stu - Xjtu.edu - CN Yuan - Ze.jian@xjtu - Edu.cn
9 pages
LANE Detection - 2
No ratings yet
LANE Detection - 2
15 pages
Lane Detection Report
No ratings yet
Lane Detection Report
84 pages
LVLane Deep Learning For Lane Detection and Classification in Challenging Conditions
No ratings yet
LVLane Deep Learning For Lane Detection and Classification in Challenging Conditions
7 pages
Autonomous Drivingg
No ratings yet
Autonomous Drivingg
12 pages
Robust Lane Detection From Continuous Driving Scenes Using Deep Neural Networks
No ratings yet
Robust Lane Detection From Continuous Driving Scenes Using Deep Neural Networks
14 pages
Sensors 23 06661 v2
No ratings yet
Sensors 23 06661 v2
21 pages
1 s2.0 S2667241323000186 Main
No ratings yet
1 s2.0 S2667241323000186 Main
7 pages
Real-Time Vehicle and Lane Detection Using Modified OverFeat CNN
No ratings yet
Real-Time Vehicle and Lane Detection Using Modified OverFeat CNN
9 pages
1 s2.0 S1877050920300892 Main
No ratings yet
1 s2.0 S1877050920300892 Main
7 pages
Polylanenet: Lane Estimation Via Deep Polynomial Regression
No ratings yet
Polylanenet: Lane Estimation Via Deep Polynomial Regression
7 pages
12301-Article Text-15829-1-2-20201228
No ratings yet
12301-Article Text-15829-1-2-20201228
8 pages
Advances in Multimedia
No ratings yet
Advances in Multimedia
18 pages
Houghlanenet Lane Detection With Deep Hough Transform and Dynamic Convolution
No ratings yet
Houghlanenet Lane Detection With Deep Hough Transform and Dynamic Convolution
12 pages
Towards EndtoEnd Lane Detection An Instance Segmentation
No ratings yet
Towards EndtoEnd Lane Detection An Instance Segmentation
7 pages
Lane Detection Project Report
No ratings yet
Lane Detection Project Report
15 pages
Spatial As Deep: Spatial CNN For Traffic Scene Understanding
No ratings yet
Spatial As Deep: Spatial CNN For Traffic Scene Understanding
8 pages
Virtual Lane For Autonomous Vehicles1
No ratings yet
Virtual Lane For Autonomous Vehicles1
11 pages
FusionLane：使用深度神经网络进行车道标记语义分割的多传感器融合
No ratings yet
FusionLane：使用深度神经网络进行车道标记语义分割的多传感器融合
10 pages
Sensors 23 00789
No ratings yet
Sensors 23 00789
21 pages
Research Paper
No ratings yet
Research Paper
21 pages
Lane Line and Object Detection Using Yolo v3
No ratings yet
Lane Line and Object Detection Using Yolo v3
5 pages
YOLOP You Only Look Once For Panoptic Driving Perception
No ratings yet
YOLOP You Only Look Once For Panoptic Driving Perception
9 pages
Research Paper
No ratings yet
Research Paper
6 pages
Advancing Autonomous Vehicle Intelligence: Deep Learning and Multimodal LLM For Traffic Sign Recognition and Robust Lane Detection
No ratings yet
Advancing Autonomous Vehicle Intelligence: Deep Learning and Multimodal LLM For Traffic Sign Recognition and Robust Lane Detection
11 pages
Major Project Research Paper (Team 03)
No ratings yet
Major Project Research Paper (Team 03)
6 pages
Major Project Research Paper (Batch 03)
No ratings yet
Major Project Research Paper (Batch 03)
6 pages
38802-Article Text-149939-1-10-20250331
No ratings yet
38802-Article Text-149939-1-10-20250331
12 pages
Monocular 3D Lane Detection For Autonomous Driving Recent Achievements Challenge
No ratings yet
Monocular 3D Lane Detection For Autonomous Driving Recent Achievements Challenge
21 pages
Major Research Paper
No ratings yet
Major Research Paper
6 pages
ACM1
No ratings yet
ACM1
7 pages
High Accuracy Lane Line Detection System Using Enhanced Yolo V3
No ratings yet
High Accuracy Lane Line Detection System Using Enhanced Yolo V3
6 pages
LaneScanNET: Deep Learning for Autonomous Driving
No ratings yet
LaneScanNET: Deep Learning for Autonomous Driving
15 pages
Lane Detection in Autonomous Vehicles A Systematic Review
No ratings yet
Lane Detection in Autonomous Vehicles A Systematic Review
37 pages
Zheng CLRNet Cross Layer Refinement Network For Lane Detection CVPR 2022 Paper
No ratings yet
Zheng CLRNet Cross Layer Refinement Network For Lane Detection CVPR 2022 Paper
10 pages
1 s2.0 S0950705121010807 Main
No ratings yet
1 s2.0 S0950705121010807 Main
17 pages
Rethinking Efficient Lane Detection Via Curve Modeling: Voldemortx/Pytorch-Auto-Drive
No ratings yet
Rethinking Efficient Lane Detection Via Curve Modeling: Voldemortx/Pytorch-Auto-Drive
15 pages
Anchor-Based Lane Detection Guide
No ratings yet
Anchor-Based Lane Detection Guide
25 pages
1 s2.0 S2215098617317317 Main
No ratings yet
1 s2.0 S2215098617317317 Main
12 pages
A Real-Time Collision Detection System For Vehicles
No ratings yet
A Real-Time Collision Detection System For Vehicles
6 pages
Major PRC-1 ppt-1
No ratings yet
Major PRC-1 ppt-1
12 pages
Real-Time Lane Detection with UNet
No ratings yet
Real-Time Lane Detection with UNet
18 pages
03 Content
No ratings yet
03 Content
5 pages
Chen Yue 2021 Thesis
No ratings yet
Chen Yue 2021 Thesis
119 pages
PersFormer - 3D Lane Detection Via Perspective Transformer and The OpenLane Benchmark
No ratings yet
PersFormer - 3D Lane Detection Via Perspective Transformer and The OpenLane Benchmark
33 pages
Keep Your Eyes On The Lane: Real-Time Attention-Guided Lane Detection
No ratings yet
Keep Your Eyes On The Lane: Real-Time Attention-Guided Lane Detection
9 pages
1902 07830
No ratings yet
1902 07830
27 pages
Comparative Analysis of Feature Descriptors and Classifiers For Real-Time Object Detection
No ratings yet
Comparative Analysis of Feature Descriptors and Classifiers For Real-Time Object Detection
11 pages
Enhancing Object Detection in Self Driving Cars Using A 3nb1910g
No ratings yet
Enhancing Object Detection in Self Driving Cars Using A 3nb1910g
12 pages
Key Points Estimation and Point Instance Segmentat
No ratings yet
Key Points Estimation and Point Instance Segmentat
7 pages
Detecting Lane and Road Markings at A Distance With Perspective Transformer Layers
No ratings yet
Detecting Lane and Road Markings at A Distance With Perspective Transformer Layers
6 pages
Sensors 22 07682 With Cover
No ratings yet
Sensors 22 07682 With Cover
22 pages
Lane Detection Method Based On MCA-UFLD
No ratings yet
Lane Detection Method Based On MCA-UFLD
7 pages
Improved Deep Network for Self-Driving Scene Classification
No ratings yet
Improved Deep Network for Self-Driving Scene Classification
14 pages
Lane Departure Warning Systems and Lane Line Detec
No ratings yet
Lane Departure Warning Systems and Lane Line Detec
27 pages
173 Road Lane Detection
No ratings yet
173 Road Lane Detection
6 pages
Algorithm For Classifying Vehicle Lanes in Video Images
No ratings yet
Algorithm For Classifying Vehicle Lanes in Video Images
9 pages
Suspension Systems 1
No ratings yet
Suspension Systems 1
34 pages
Assignment 11
No ratings yet
Assignment 11
3 pages
1.1 - Unsteady Convection Diffusion
No ratings yet
1.1 - Unsteady Convection Diffusion
6 pages
CNN Based Lane Detection With Instance Segmentation in Edge-Cloud Computing
No ratings yet
CNN Based Lane Detection With Instance Segmentation in Edge-Cloud Computing
10 pages
4 - DE - CFD - Energy Equation
No ratings yet
4 - DE - CFD - Energy Equation
30 pages
Traffic Signs Recognition With Deep Learning
No ratings yet
Traffic Signs Recognition With Deep Learning
5 pages
Tire Model Parameters via ANN
No ratings yet
Tire Model Parameters via ANN
16 pages
Zhang Wang 2017 A Combination Application of Tandem Blade and Endwall Boundary Layer Suction in A Highly Loaded
No ratings yet
Zhang Wang 2017 A Combination Application of Tandem Blade and Endwall Boundary Layer Suction in A Highly Loaded
15 pages
Numerical Investigation On The Effect of Vortex Generator On Axial Compressor Performance
No ratings yet
Numerical Investigation On The Effect of Vortex Generator On Axial Compressor Performance
10 pages
Tire-Stiffness and Vehicle-State Estimation Based On Noise-Adaptive Particle Filtering
No ratings yet
Tire-Stiffness and Vehicle-State Estimation Based On Noise-Adaptive Particle Filtering
17 pages
DigitalFilters Apracticalguide
No ratings yet
DigitalFilters Apracticalguide
67 pages
Infino Pc-Abs Hp1000xa (Natural)
No ratings yet
Infino Pc-Abs Hp1000xa (Natural)
1 page
American Wide Flange Steel Beams W Beam Letter 1
No ratings yet
American Wide Flange Steel Beams W Beam Letter 1
7 pages
Dams Notes
No ratings yet
Dams Notes
28 pages
PROVerXL 4030V2 User Manual v1.0-202306
No ratings yet
PROVerXL 4030V2 User Manual v1.0-202306
98 pages
Excel Macros for Beginners
No ratings yet
Excel Macros for Beginners
6 pages
Basic Probability
No ratings yet
Basic Probability
16 pages
أهمية المساندة الاجتماعية في تحقيق الشعور بالأمن النفسي لدى طلبة الجامعة
No ratings yet
أهمية المساندة الاجتماعية في تحقيق الشعور بالأمن النفسي لدى طلبة الجامعة
17 pages
Webex Services - Port Numbers and Protocols
No ratings yet
Webex Services - Port Numbers and Protocols
3 pages
Thevenin Theorem: Simplifying Circuits
No ratings yet
Thevenin Theorem: Simplifying Circuits
9 pages
XXXXX
No ratings yet
XXXXX
9 pages
Topology Concepts
No ratings yet
Topology Concepts
12 pages
ATI TEAS V7 NURSE CHEUNG Math PowerPoint V2
No ratings yet
ATI TEAS V7 NURSE CHEUNG Math PowerPoint V2
182 pages
Metrology and Measurement Systems: Index 330930, ISSN 0860-8229 WWW - Metrology.pg - Gda.pl
No ratings yet
Metrology and Measurement Systems: Index 330930, ISSN 0860-8229 WWW - Metrology.pg - Gda.pl
12 pages
Construction Project Cost Analysis
60% (5)
Construction Project Cost Analysis
7 pages
STAT-205 (IT) Mid Term Paper
No ratings yet
STAT-205 (IT) Mid Term Paper
2 pages
Class 12 Chemistry Project Electrochemistry
No ratings yet
Class 12 Chemistry Project Electrochemistry
4 pages
DRAM Controller Design Guide
No ratings yet
DRAM Controller Design Guide
1 page
Avl Tree
No ratings yet
Avl Tree
38 pages
6.0m Abutment Design (4 P.C Girder-12 Pile) (Final & Ok)
100% (1)
6.0m Abutment Design (4 P.C Girder-12 Pile) (Final & Ok)
30 pages
Lab Plan
No ratings yet
Lab Plan
2 pages
Types of Bread and Pastry Explained
No ratings yet
Types of Bread and Pastry Explained
22 pages
EE214 Electrical Technology & Instrumentation PDF
No ratings yet
EE214 Electrical Technology & Instrumentation PDF
2 pages
ME130-2: Fluid Mechanics: Fluid Properties & Fluid Statics
No ratings yet
ME130-2: Fluid Mechanics: Fluid Properties & Fluid Statics
9 pages
Constructability Guide
100% (1)
Constructability Guide
31 pages
Nurul Syazwani Binti Mohamad Fauzi DSK1A 18DSK2F2012 Exercise Page 44
No ratings yet
Nurul Syazwani Binti Mohamad Fauzi DSK1A 18DSK2F2012 Exercise Page 44
4 pages
Dilip M. Salwi - Story of Zero - Children's Book Trust, New Delhi (1988) PDF
100% (1)
Dilip M. Salwi - Story of Zero - Children's Book Trust, New Delhi (1988) PDF
28 pages
B.Sc. 2nd Sem
No ratings yet
B.Sc. 2nd Sem
2 pages
Bulk Materials Requirements Checklist
No ratings yet
Bulk Materials Requirements Checklist
1 page
MS WORD Icons and Uses
No ratings yet
MS WORD Icons and Uses
12 pages

A Hybrid Spatial-Temporal Deep Learning Architecture For Lane Detection

Uploaded by

A Hybrid Spatial-Temporal Deep Learning Architecture For Lane Detection

Uploaded by

DOI: 10.1111/mice.

A hybrid spatial–temporal deep learning architecture for

Yongqi Dong1 Sandeep Patil2 Bart van Arem1 Haneen Farah1

1 Department of Transport and Planning,

Faculty of Civil Engineering and Abstract

1 INTRODUCTION ronmental perception, camera-based lane detection is

ST-RNN Layer 0 ST-RNN Layer 1 ST-RNN Layer r

Input Encoder ST-RNN blocks Decoder Output

W Next hidden layer

FIGURE 1. The architecture of the proposed model.

3.1 Qualitative evaluation (b)

Qualitative evaluation with the visualization of the lane (c)

Input images: (a)

Proposed Models SegNet-based: (g) SCNN_SegNet_ConvGRU1; (h) SCNN_SegNet_ConvGRU2;

Proposed Models UNet-based: (k) SCNN_UNet_ConvGRU1; (l) SCNN_UNet_ConvGRU2;

Proposed Models UNetLight-based: (o) SCNN_UNetLight_ConvGRU1;

(1) Visualization of the lane-detection results on tvtLANE Testset #1 (normal situations).

Input images: (a)

Proposed Models SegNet-based: (g) SCNN_SegNet_ConvGRU1; (h) SCNN_SegNet_ConvGRU2;

Proposed Models Unet-based: (k) SCNN_UNet_ConvGRU1; (l) SCNN_UNet_ConvGRU2;

Proposed Models UNetLight-based: (o) SCNN_UNetLight_ConvGRU1;

evaluations. When treated as a pixel-wise classification task, Proposed Models (SegNet-Based)

Looking at the main metric, F1-Measure, it is demonstrated (e)

Normal Challenging #2 shadow-bright Challenging #8 blur&curve

* Two types of ST-RNN, i.e., ConvLSTM and ConvGRU are tested;

You might also like