Self-driving Car using Convolution Neural Network
and Road Lane Detection
                    Sudarsan Ghosh1, Vishal Bansal2, Subhankar Dey3
                     School of Electronics and Telecommunication
                      Kalinga Institute of Industrial Technology
                                Bhubaneswar, India
     sudarsa.ghosh.15@gmail.com1, vbmeinkampf465@gmail.com2, skrdy009@gmail.com3
   Abstract—The development of computer vision has recently grown
beyond imagination. Many tech companies and researchers are now
developing self-driving cars with the help of deep learning and other
related technologies. Google's latest Inception v3 model is one of
the most powerful deep learning architectures and can extract
information from an image in real time. In addition, a road lane
detector is used to give additional information about the road from a
video frame and can help in the decision-making process of the
self-driving car. In this paper, we use Google Inception v3 as the
information extractor for real-time video of a simulated environment.
For the simulation environment, we used GTA-5, a very complex
open-world game whose visual information is very much comparable to a
real-world environment. We use an NVIDIA GTX 970 with 4 GB of RAM for
the computation.
   Index Terms—self-driving car, convolutional neural network,
Inception v3, road lane detector
                          I. INTRODUCTION
   Autonomous vehicles are one possible solution to the most common
cause of traffic accidents: driver error due to lack of attention, as
reported in several studies, such as the one by L. C. Davis in 2004
[1]. In the past few years, computer technology has advanced very
fast and simulation has begun to be used in more and more projects.
Simulations are safer, more efficient and less expensive than live
testing on real cars. They also allow the creation of more complex
and varied environments, which might not be possible in real-world
testing. So, testing complex systems in simulated environments is the
ideal way to test quickly, with minimum risk and more possibilities.
   Computer vision is the part of AI that deals with the extraction
of information from an image. Many papers have been published on deep
learning, especially on Convolutional Neural Networks (CNNs). CNN
methods are achieving great success in image processing, video
analysis, pattern recognition and many other areas. GoogLeNet [2],
the winner of ILSVRC 2014, uses a deep CNN architecture with 22
layers, known as Inception layers. Faster R-CNN [3], a CNN method
that shares its convolutional network with Fast R-CNN [4], uses a
Region Proposal Network (RPN) to generate the region proposals that
are then used by Fast R-CNN for object detection.
   When we drive, we use our eyes to see and decide where to go. The
lines on the road act as a constant reference for where to steer the
car. So, one of the first things we need to do in developing a
self-driving car is to detect the lanes. There are many methods that
can be used for edge detection to help in detecting the road lane,
such as Sobel and Laplacian. Sobel edge detection [5] takes advantage
of the gradient magnitude, the greatest rate of change in light
intensity, using two kernels. On the other hand, Laplacian edge
detection [6] uses only one kernel. The downside of these methods is
their extreme sensitivity to noise.
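   For illustration only, a minimal OpenCV comparison of the two
operators might look like the sketch below; the input file name and
the kernel size are placeholder choices, not values from the paper.

# Illustrative comparison of Sobel and Laplacian edge detection in OpenCV.
# "road_frame.png" and ksize=3 are placeholders, not values from the paper.
import cv2

gray = cv2.imread("road_frame.png", cv2.IMREAD_GRAYSCALE)

# Sobel uses two kernels, one for the horizontal and one for the vertical gradient.
sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
sobel_edges = cv2.magnitude(sobel_x, sobel_y)

# Laplacian uses a single second-derivative kernel and reacts more strongly to noise.
laplacian_edges = cv2.Laplacian(gray, cv2.CV_64F, ksize=3)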
   In order to build a robust self-driving car, we need a good
information extractor for the image and a lane detector for
additional help. In this paper we combine these two parts to build a
robust model that can perform well in the simulated environment. The
proposed method combines Inception v3 as the image information
extractor with a linear lane detector, and it can be used for
steering a car in a virtual environment.
   The body of the paper is organized as follows: Section I gives the
background and introduces our proposed method, Section II contains
the methodologies used in the experiment, Section III describes the
problem setup and the data generation methods, Section IV contains
the implementation and experimental results of the methods, Section V
discusses several issues and limitations of our methods, and Section
VI summarizes the conclusion and possible ideas for improvement.

          II. OVERVIEW OF INCEPTION V3 AND ROAD LANE DETECTOR

A. Inception v3
   GoogLeNet won the ILSVRC in 2014 and is based on a repetition of
the Inception module. This Inception module has six convolutions and
one max-pooling layer. Four of these convolutions use a 1 x 1 kernel,
which is introduced to increase the width and the depth of the
network while reducing the dimensionality when necessary. For this
reason a 1 x 1 convolution is performed before the other two
convolutions in the module, a 3 x 3 and a 5 x 5 convolution. After
all the computation, the output of the module is calculated as the
concatenation of the outputs of the convolutions. This module is
repeated 9 times, and a dropout layer is used at the end (see Fig. 1).
Fig. 1. Inception module where each 5 x 5 convolution is replaced by two
3 x 3 convolutions

   Inception v3 is a modification of GoogLeNet. The base Inception
module is changed by replacing each 5 x 5 kernel with two 3 x 3
kernels. The resulting network is made up of 10 Inception modules,
and the base module is modified further as the network goes deeper.
Five modules are changed by replacing the n x n convolution with a
1 x 7 and a 7 x 1 kernel to reduce the computation cost. The last two
modules replace the last two 3 x 3 convolutions with a 1 x 3 and a
3 x 1 convolution. Finally, the first 7 x 7 convolution is also
replaced by three 3 x 3 convolutions. In total, Inception v3 has 42
learnable layers.
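   As a rough sketch (not the authors' implementation), an
Inception-style module in which the 5 x 5 convolution is factorized
into two stacked 3 x 3 convolutions can be written with the Keras
functional API as follows; the filter counts are illustrative only.

# Sketch of a factorized Inception-style module; filter counts are illustrative.
from tensorflow.keras import layers

def inception_module(x, f1x1=64, f3x3=96, f5x5=64, fpool=32):
    # 1 x 1 branch (1 x 1 convolutions are also used below for dimensionality reduction)
    b1 = layers.Conv2D(f1x1, 1, padding="same", activation="relu")(x)

    # 3 x 3 branch preceded by a 1 x 1 reduction
    b2 = layers.Conv2D(f3x3 // 2, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(f3x3, 3, padding="same", activation="relu")(b2)

    # "5 x 5" branch replaced by two stacked 3 x 3 convolutions (as in Fig. 1)
    b3 = layers.Conv2D(f5x5 // 2, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f5x5, 3, padding="same", activation="relu")(b3)
    b3 = layers.Conv2D(f5x5, 3, padding="same", activation="relu")(b3)

    # max-pooling branch followed by a 1 x 1 projection
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(fpool, 1, padding="same", activation="relu")(b4)

    # the module output is the concatenation of the branch outputs
    return layers.Concatenate()([b1, b2, b3, b4])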
B. Road Lane Detector
   In the first stage, we need to preprocess the raw image data,
munge it and turn it into a working dataset using some form of
vectorization. We used OpenCV, a terrific library for image
manipulation at the pixel level using matrix operations. We select
the yellow and white color ranges and obtain a binary mask of the
frame; these yellow and white markings decide the width of the lane
(see Fig. 2).

Fig. 2. Binary mask of our raw frame

   Next, we apply Canny edge detection (using the cv2.Canny function
provided by the OpenCV package) and the Hough line transform (using
the cv2.HoughLinesP function provided by the OpenCV package). Canny
edge detection is an algorithm that calculates the intensity
gradients of the image and applies a double threshold to determine
edges. Once we have single edges from Canny detection, we can connect
them into lines with the help of the Hough line transform. Finally,
polynomial regression (using the np.polyfit function provided by the
NumPy package) enables us to approximate the whole lane from the
output of the Hough line transform.
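   A minimal sketch of this pipeline is given below. The HSV color
ranges and the Canny/Hough thresholds are illustrative values, not
the ones used in the paper, and a first-degree fit is used to match
the linear lane model mentioned in the introduction.

# Sketch of the lane-detection pipeline: color mask -> Canny -> Hough -> polyfit.
# All thresholds and color ranges below are illustrative, not the paper's values.
import cv2
import numpy as np

def detect_lane(frame_bgr):
    # 1. Binary mask keeping only white and yellow road markings.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    white = cv2.inRange(hsv, np.array([0, 0, 200]), np.array([180, 30, 255]))
    yellow = cv2.inRange(hsv, np.array([15, 80, 120]), np.array([35, 255, 255]))
    mask = cv2.bitwise_or(white, yellow)

    # 2. Canny edge detection with a double threshold on the intensity gradients.
    edges = cv2.Canny(mask, 50, 150)

    # 3. Probabilistic Hough transform connects single edge pixels into line segments.
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=30,
                            minLineLength=20, maxLineGap=100)
    if lines is None:
        return None

    # 4. Fit one polynomial through all segment end points to approximate the lane.
    xs = lines[:, 0, [0, 2]].ravel()
    ys = lines[:, 0, [1, 3]].ravel()
    return np.polyfit(ys, xs, deg=1)   # x = m*y + c, a linear lane model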
                         III. PROBLEM SETUP
A. Transfer Learning and Fine-Tuning Inception v3
   In practice we don't usually train an entire Convolutional Neural
Network from scratch, because of the limited data set available and
the computation power required to train the whole network. Rather, we
use a pre-trained network, trained on a large data set, and then use
it for another task where the data set is limited. The pre-trained
network acts as a feature extractor for the new task, and we give
those features to a classifier for our purpose.
   As a feature extractor, Inception v3 fits our problem well: the
objects that we are going to find on the road are very similar to
those in the data set on which the Inception model was trained. For
our purpose, we are trying to predict the possible steering
suggestion when a frame from the virtual environment is given to the
network. The features extracted by the Inception model are fed to a
classifier that predicts one of 9 classes, each corresponding to a
precise steering direction, and the classifier is then trained on our
new data set from the virtual environment. We used a supervised
learning method to train the model for our new task.
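   A minimal sketch of this transfer-learning setup, assuming a
Keras-style workflow and an illustrative classifier head (the paper
does not specify the exact head or its hyperparameters), is shown
below. Unfreezing some of the top Inception layers afterwards would
turn the same setup into fine-tuning.

# Sketch: pre-trained Inception v3 as a frozen feature extractor plus a
# 9-way softmax steering classifier. Head size and optimizer are illustrative.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

base = InceptionV3(weights="imagenet", include_top=False, pooling="avg",
                   input_shape=(299, 299, 3))
base.trainable = False          # use Inception v3 purely as a feature extractor

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(9, activation="softmax"),   # 9 steering-direction classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_frames, train_labels, ...)  # placeholder names for our data set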
B. Data set generation and Lane detection
   For the supervised learning method used to train the model, we
need frames from the car's environment and the correct steering
directions. The data set is generated by recording frames of the
virtual environment and registering the key strokes while a driver is
driving the car. We have trained the model on 100,000 frames. We had
to balance the data, as the raw generated data was highly unbalanced.
These frames are then processed by the lane detector, and the result
is fed to the Inception model for training, which gives better
performance.
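   The paper does not describe its exact balancing procedure; one
simple approach, sketched below with placeholder variable names, is
to shuffle the recorded pairs and truncate every steering class to
the size of the rarest class.

# Sketch of a simple balancing step for the recorded (frame, steering) pairs.
import random
from collections import defaultdict

def balance(samples):
    """samples: list of (frame, steering_class) pairs."""
    by_class = defaultdict(list)
    for frame, label in samples:
        by_class[label].append((frame, label))

    smallest = min(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        random.shuffle(items)
        balanced.extend(items[:smallest])   # keep the same count for every class

    random.shuffle(balanced)
    return balanced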
Fig. 3. Inception V3 Model Architecture

                 IV. IMPLEMENTATION AND RESULTS
A. Implementations
   The proposed method's implementation and observations are computed
using TensorFlow, NumPy and other libraries in a Python environment.
We used the TensorFlow library to load Inception v3 on our machine
and trained it on the new data. In the output layer of the
classifier, a Softmax activation is used to obtain a probability
distribution over all the output classes. We used CUDA as the GPU
driver. For the operating system and other hardware support, we used
Windows 10 Pro, an Intel i5 processor, 16 GB of RAM, and an NVIDIA
GTX 970 GPU with 4 GB of RAM.
   The model was trained on the frames and the key strokes that were
collected from the virtual environment. After training, the model was
tested in the game environment.
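   A sketch of the test-time loop implied by this setup is given
below. grab_frame, detect_lane, preprocess and press_key stand for
the screen-capture, lane-detection, input-preparation and
key-emulation utilities, which are not named in the paper, and the
key mapping to the 9 classes is illustrative.

# Sketch of the driving loop: frame -> lane detector -> network -> key press.
# All helper functions passed in are placeholders, not the authors' code.
import numpy as np

KEYS = ["W", "WA", "WD", "A", "D", "S", "SA", "SD", "NONE"]  # 9 illustrative classes

def drive(model, preprocess, grab_frame, detect_lane, press_key):
    while True:
        frame = grab_frame()                    # raw frame from the virtual environment
        lanes = detect_lane(frame)              # lane-detector output for the frame
        x = preprocess(frame, lanes)            # resize/normalize to the network input
        probs = model.predict(x[np.newaxis])    # softmax over the 9 steering classes
        press_key(KEYS[int(np.argmax(probs))])  # act on the most probable suggestion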
                                                                    have tried different algorithms to make the steering smoother
B. Results                                                          and stable. We have tried Long Short Term Memory(LSTM)
   The resultant frame from the implementation is shown in          but it did not perform better than the proposed model. We
the Fig.4 and consistent steering suggestions are also achieved.    also tried the car to follow a specified path in the map to
From some complex situation it seems to be confused in              reach a desired destination.
steering direction.
              VI. CONCLUSION AND FUTURE WORKS
   We have demonstrated a detailed, step-wise approach for lane
detection and a Convolutional Neural Network model to steer a vehicle
in a virtual environment. The approach works well in highway driving
situations but fails on complex urban roads and in sharp turning
situations. We will try to improve the system and make it robust
enough to handle all possible situations. In the future we will train
the model with much more diverse data and experiment with different
algorithms for collision avoidance and other vital tasks.
                        ACKNOWLEDGMENT
   The authors of this paper would like to thank Google Inc. for
releasing the Inception model and the TensorFlow library, allowing
researchers like us to experiment in this fascinating field.
                             REFERENCES
[1] L. C. Davis, Effect of adaptive cruise control systems on traffic flow,
    Physical Review E, vol. 69, no. 6, 2004.
[2] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan,
    V. Vanhoucke, and A. Rabinovich, Going deeper with convolutions,
    Proceedings of the IEEE Conference on Computer Vision and Pattern
    Recognition (CVPR), pp. 1-9, 2015.
[1] S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: towards real-time
    object detection with region proposal networks, Advances in Neural
    Information Processing Systems (NIPS), 2015.
[4] R. Girshick, Fast R-CNN, Proceedings of the IEEE International Confer-
    ence on Computer Vision (ICCV), pp. 1440-1448, 2015.
[5] N. Kanopoulos, N. Vasanthavada, and R. L. Baker, Design of an image
    edge detection filter using the Sobel operator, IEEE Journal of Solid-
    State Circuits, vol. 23, no. 2, 1988.
[6] G. T. Shrivakshan, A comparison of various edge detection techniques
    used in image processing, IJCSI International Journal of Computer
    Science Issues, vol. 9, no. 3, pp. 358-367, 2012.