Article
Intelligent Video Surveillance Systems for Vehicle
Identification Based on Multinet Architecture
Jacobo González-Cepeda *, Álvaro Ramajo and José María Armingol
Department of Electric, Electronic and Automatic Engineering, Carlos III University, 28912 Madrid, Spain;
aramajo@pa.uc3m.es (Á.R.); armingol@ing.uc3m.es (J.M.A.)
* Correspondence: 100307736@alumnos.uc3m.es
Abstract: Security cameras have been proven to be particularly useful in preventing and combating
crime through identification tasks. Here, two areas can be mainly distinguished: person and vehicle
identification. Automatic license plate readers are the most widely used tool for vehicle identification.
Although these systems are very effective, they are not reliable enough in certain circumstances. For
example, due to traffic jams, vehicle position or weather conditions, the sensors cannot capture an
image of the entire license plate. However, there is still a lot of additional information in the image
which may also be of interest and needs to be analysed quickly and accurately. The correct
use of processing mechanisms can significantly reduce analysis time, increasing the efficiency
of video cameras. To solve this problem, we have designed a solution based on two
technologies: license plate recognition and vehicle re-identification. For its development and testing,
we have also created several datasets recreating a real environment. In addition, this article
reviews some of the main artificial intelligence techniques for these technologies, as they have
served as the starting point for this research.
work has been performed directly on the signal processing system, focusing on artificial
intelligence techniques such as those described in the following paragraphs (mainly deep
learning) and combining them in a multinet solution. The second purpose is to show
different methods and solutions that we consider (through experience) to be the most
practical, optimal and updated, and compile them in a small survey. We want to collect
what we consider to be the most applicable solutions for real environments in relation to
these specific tasks (vehicle re-identification and license plate reading). It is important to
highlight that these methods and solutions have served as a starting point for this research,
and have helped in devising possible solutions.
The paper is structured as follows. Right after this paragraph, we will define some
concepts that are important for setting the framework. Section 2 will identify the state-of-
the-art in license plate readers and different methods. Section 3 will be similar to Section 2,
but concerning the collection of vehicle re-identification methods. In Section 4, certain
considerations for designing a solution will be detailed. In Section 5, our own solution
will be detailed, based on a combination of the methods described above. Finally, certain
conclusions will be outlined.
Figure 1. Example of an access control camera for license plate reading [8].
Focusing on the first type of cameras, they will need the capability of identifying the
license plates and registering the vehicles entering and leaving. To this end, all elements
of the system shall be installed to obtain images of license plates with the highest possible
quality and contrast.

In this specific scenario, the signal processing system will recognise the license plates,
extract the characters, store them and register the different vehicles. The rest of the elements
of the system will enable this task. Hence, the camera and lighting will be large, visible,
and they will focus directly on the license plate. In addition, the work of acquiring the
frames of the license plate is easier, thanks to the fact that the car must remain stopped
until the process has been completed.

Most security systems operate similarly to this previous one. They are designed for
controlled environments with static cameras installed, accompanied by large optics
(which contribute to a better resolution in the image) and lighting systems that are
designed to eliminate any type of aberration. However, these ideal circumstances are not
always possible, such as, for example, on a highway.

1.4. Motivation: Identifying the Scenario and Looking for Solutions

The specific scenario in the use of video surveillance systems will be vehicle
identification and tracking. The vast majority of commercial systems focus mainly on
license plates. This is why most of them employ specific cameras to facilitate signal
processing. However, there are many situations where these cameras do not work
properly.

These situations can be mainly due to two factors: vehicle position in the images and
lighting conditions. The first one directly affects the information available in the frames
obtained. For example, when a vehicle exceeds the speed limit while it is overtaking
another car, a situation can happen where the speed trap captures both vehicles and it is
impossible to obtain all of the characters of the license plate corresponding to the offender.
Or, when a camera is monitoring a certain place such as a roundabout or a corner, it may
happen that the sensors capture the image of the target car, but that the frame does not
include the license plate.

The second scenario occurs when weather or lighting conditions affect the capacities
of the sensors. By law, European cars have license plate backlighting. Although this was
thought to facilitate license plate reading, it is very common in the evening or at night that
this indirect light burns the images captured, causing the opposite effect.

As previously mentioned, cameras are fundamental for investigation and crime
solving. That being said, all of the information provided can be critical. A perfect example may
be a robbery that is recorded by a security camera. Due to the above-mentioned scenarios,
this camera may have captured images of the vehicle used by the offenders, but not its
license plate. In this case, we have very important but incomplete information. Although
this camera has not been able to read the license plate, maybe there is another one that
could obtain the information in time. Without an automatic image processing tool, this
may take hours or days, and time is a vital resource. The later that data is analysed, the less
chance there is to capture the perpetrators.
Some systems try to solve these problems by offering additional capacities such as
detecting the brand or the colour of the cars (such as, for example, OpenALPR). This could
be very useful, but it is not enough in some scenarios where, for example, we want to detect
a certain vehicle while it drives along a motorway, and we need to determine where it
leaves the motorway. On the other hand, many traffic control cameras do not have enough
resolution to identify a car by its license plate alone.
Vehicle tracking for security requires a high standard of precision. In fact, in critical
situations where there is no margin of error, there will always be an officer verifying the
information. A problem appears when there is a large amount of information to analyse.
Linking with the robbery example, traditional license plate readers would not be useful
because we do not have the license plate characters, and pure vehicle re-identification
systems may provide too much information. However, a combination of both elements can
guide the investigation faster because it would be possible to both synthesise the images of
cars that can be matched to our target, and to know the license plate data. This is the main
advance compared to the other solutions shown here, which act in isolation.
Through this research, we want to look for a robust solution that not only improves
license plate reading capacities, but also one that can be applied in a wide range of scenarios
and that satisfies real needs in security systems that are used for vehicle identification.
Figure 2. Example of DORI concept: (a) Detection, (b) Observation, (c) Recognition and
(d) Identification.
Detection (Figure 2a) is the ability of the system to capture some movement or event.
This would be the first step to “activate” the perimetral security. It can be associated with
a motion detection system. In fact, it is usually combined with physical sensors or with
movement detection mechanisms (such as event alerts caused by pixel variations, or by
infrared detectors). Using the first image as an example, detection would be the capability
to detect a new active element; in this case, a person.

Observation (Figure 2b) would correspond to the capacity to appreciate the possible
movements of this new asset. It will allow for an analysis or a study of its “intentions”. As
is shown in the second image, it is usually possible when the new target comes closer to
the camera, which allows it to be seen with better resolution, but not enough to recognise it.

Recognition (Figure 2c) is the capability to observe whether the asset is previously
known or not. In this case, this term is usually linked with the CCTV operator, as this is
the entity that is able to make this recognition. In Figure 2, it is the distance or the number
of pixels when the user is able to know who is in the images.

Finally, identification (Figure 2d) is the ability to see enough characteristic elements
in the image to make the asset recognisable on subsequent occasions. In the previous
example, it is possible to observe enough of the person’s features to give an unambiguous
picture of him or her (as in the images). These last two capabilities, recognition and
identification, are the main objectives pursued by a video surveillance system. In fact, the
vast majority of research lines have focused on developing these capabilities, exploiting
the increment in the resolution of the sensors. The higher the resolution, the sharper the
image, and the greater the distance at which that recognition and identification can be
conducted.

The recognition and identification tasks developed by video surveillance systems tend
to be mainly performed upon people and vehicles. To identify a person, there are different
biometric characteristics that are individual and impossible to replicate, such as the face,
voice, eyes or even the arrangement of blood vessels [14,15] (in vehicles, it is normal to
resort to license plates).

All these systems need images with a minimum resolution to be able to fulfil their
objectives. Hence, lighting mechanisms, optics and sensors try to obtain the images in
the best possible conditions, so that the treatment of these signals offers the best results.
Even though they are different procedures, the operation behind them is exactly the same; that
is, detect a face/license plate, extract it, and match it with a database to know who the
vehicle’s owner is, or who that person is. However, is this an identification? Or is it really
a recognition? In order to develop this idea, we must delve deeper into the differences
between identification and recognition.
Due to the large computational cost required by the artificial intelligence algorithms
applied, the vast majority of identification systems actually perform large-scale recognition
tasks [16]. An identification can be understood as a recognition in which a sample “n”
is checked against a base “N”, where “N” contains “n” as well as a large number of other elements.
For example, when referring to a facial identification system, the system does not
really identify any faces. It will compare a sample against a very large database (such as an
ID Card database). Faces are encoded as an array of unique and unrepeatable features, and
compared with the rest of the feature arrays stored.
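To make this 1:N scheme concrete, the minimal sketch below compares one encoded sample against a large gallery of stored feature arrays; the 128-dimensional vectors and random data are placeholders for the output of a real face or vehicle encoder.

```python
import numpy as np

# Toy 1:N identification as described above: the probe "n" is compared
# against every stored feature array in the base "N". Sizes are placeholders.
rng = np.random.default_rng(0)
gallery = rng.standard_normal((10000, 128))          # the stored feature arrays
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

probe = rng.standard_normal(128)                     # the captured sample
probe /= np.linalg.norm(probe)

scores = gallery @ probe                             # cosine similarity vs. all N
best = int(np.argmax(scores))
print(f"closest stored identity: {best} (similarity {scores[best]:.3f})")
```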
The first case would refer to a classification network (such as the one explained in the
previous section), in which there would be as many categories as the number of people
among whom the identification is made. This generates a problem. Under a traditional
approach, to perform a correct classification, it would be necessary to have a very specific
dataset with as many categories as elements (faces) to classify. In addition, being very
similar elements (faces), it is also necessary to have a very large volume of images. It is
not the same for the classifier to distinguish between people and vehicles as it is, for
example, to distinguish only between different types of vehicles.
To simplify this problem, a different approach must be applied. For example, let us
imagine an access control system with facial recognition. The system, when it has the frame
corresponding to the person who wants to enter, will compare that face with those
stored in its database. That is, it will indicate whether or not that face is in the database.
Broadly speaking, this problem can be understood as a binary classification, where
there are really two categories: yes (the image corresponds to one of the stored ones) or no
(it does not correspond). Under this perspective, it is possible to simplify a classification
problem to a mere comparison. This is called re-identification [17,18], and it is the approach
chosen to develop our investigation upon.
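A minimal sketch of this binary view follows, assuming normalised feature vectors from any encoder; the 0.7 threshold is an illustrative value that would be tuned on validation pairs.

```python
import numpy as np

# Re-identification reduced to a mere comparison: "yes" if the two feature
# vectors are close enough, "no" otherwise. The threshold is an assumption.
def same_identity(feat_a, feat_b, threshold=0.7):
    a = feat_a / np.linalg.norm(feat_a)
    b = feat_b / np.linalg.norm(feat_b)
    return float(a @ b) >= threshold
```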
Figure 4. License plate detection methods.
From a practical point of view, this classification can be reduced to traditional
computer vision techniques and deep learning classifiers. The first group, especially
edge-based methods, can be implemented through OpenCV libraries. An example is [20],
which uses a combination of geometric transformations and Haar cascades [21–23] to
detect license plates in real time. The idea behind traditional techniques is to take
advantage of the characteristics and the dimensions of the license plates (which are
usually standardised [24]).

On the other hand, deep-learning classifiers are trained to detect license plates and
to extract an ROI. Here, the development of YOLO [25] has been crucial. Its different
versions and evolutions (especially the “Tiny” ones) have reduced inference time and
computational costs, being able to operate in real time. In fact, thanks to these classifiers,
it is becoming common to use them to pre-detect cars in images, reducing the amount of
information processed.

In summary, we can conclude that deep learning methods tend to be more robust,
but they require more resources as they are computationally more complex [19] than the
“traditional methods”. The influence of these computational costs will be explained in depth
in Section 4. However, we have to balance this dichotomy between results and costs; in fact,
as will be shown at the end of this section, this issue is guiding recent research.
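As a rough sketch of the traditional route, the following OpenCV snippet combines the library's stock Haar plate cascade with the aspect-ratio check suggested by the standardised plate dimensions; the parameter values are illustrative, and a cascade trained on local plates could be substituted.

```python
import cv2

# Traditional license plate detection: Haar cascade plus a geometric filter.
# haarcascade_russian_plate_number.xml ships with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_russian_plate_number.xml")

def detect_plates(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    candidates = cascade.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=4, minSize=(60, 20))
    # Keep only boxes with a plate-like (wide) aspect ratio.
    return [(x, y, w, h) for (x, y, w, h) in candidates if w > 2.0 * h]
```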
2.1.2. Pre-Processing
Although this step could be omitted, it is very much recommended. It will depend on
the quality of the information previously detected.
Here, the aim is to prepare the characters inside the license plate to be easily legible.
In this case, we differ from [19] in that we consider traditional pre-processing and
character segmentation as a single step, since both are intended to help character
recognition.
In this step, the most common techniques (which can also be implemented through the
OpenCV libraries) are binarization, thresholding and morphological transformations such
as erosion or dilation, which remove noise from the image and highlight the characters
against the background.
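A minimal sketch of such a chain, with a placeholder kernel size and automatic (Otsu) thresholding:

```python
import cv2
import numpy as np

# Pre-processing in the spirit described above: binarize the plate crop and
# clean it with morphology so characters stand out against the background.
def preprocess_plate(plate_bgr):
    gray = cv2.cvtColor(plate_bgr, cv2.COLOR_BGR2GRAY)
    # Otsu chooses the binarization threshold automatically.
    _, binary = cv2.threshold(
        gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    # Erosion followed by dilation (opening) removes speckle noise.
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```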
Sometimes, it will be necessary to apply strong geometric transformations to make
the image as horizontal as possible. This will depend on the character recognition
method. This is where deep learning is offering new possibilities, with “spatial transformer
networks” [26]. For example, WPOD-net [27] employs spatial transformations and turns
images to a horizontal plane.
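For illustration, a perspective correction of this kind can be sketched as follows, assuming the four plate corners come from a detector (e.g., the quadrilateral regressed by WPOD-net); the 520 × 110 target mirrors standard European plate proportions.

```python
import cv2
import numpy as np

# Illustrative geometric correction: warp four detected plate corners onto a
# horizontal rectangle before character recognition.
def rectify_plate(image, corners):
    src = np.float32(corners)  # top-left, top-right, bottom-right, bottom-left
    dst = np.float32([[0, 0], [520, 0], [520, 110], [0, 110]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, M, (520, 110))
```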
been trained with specific datasets, and training methods such as transfer learning [37–40].
For example, [37] shows an end-to-end double-stage method for Persian characters, where
license plate recognition is the final step right after car detection and license plate detection,
with an accuracy of 99.37% in character recognition. Another example is [38], a
single-stage method likewise based on CNNs for multiple-font characters, with an accuracy
of 98.13% across several types of vehicles (such as cars or motorbikes).
On the other hand, there is another very interesting approach that consists of deploying
these systems on low-resource devices in real time, emulating real environments. For
example, in [41], the authors run their tool on a CPU-based system with 8 GB of RAM, and
obtain a precision of 66.1% with MobileNet SSDv2 [42] at a rate of 27.2 FPS; in [43],
detection reaches 90% and character recognition 98.73% with just a Raspberry Pi 3B+
as hardware support. Going further, we can find Android-based systems such as [44]
or [45], which are designed to be deployed on mobile phones and operate in real time. These
procedures and their results are shown in Table 1.
License plates can be very different depending on the country (even more so when
considering Asian characters), making it very difficult to obtain good generic results. To
solve this, it is very important to train our system with a specific dataset that should meet
the following requirements:
• Variable lighting conditions: day, night, rain, fog...;
• Different angles and viewpoints;
• Several license plates per frame;
• Different backgrounds: street, road, paths...;
• Different qualities and aspect ratios;
• Different sensors.
The dataset has to cover as many real conditions as possible, and not only ideal ones.
That being said, the license plates must be recognisable in the different images, and the
dataset has to be big enough (at least 5000 different samples). We have developed this idea,
creating our own datasets, which will be detailed in Section 5.
characteristic patterns that specifically identify the image of each vehicle, but without
having a predefined focus.
Figure 5. Representation of the “anchor” (a) and a network that applies the triplet loss function (b).
As the main advantage, this type of mechanism usually presents quite accurate metrics.
However, these methods have problems when it comes to performing correct training, and
it is necessary to apply advanced techniques, since they can easily fall into overtraining.
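For reference, a minimal PyTorch sketch of the triplet loss from Figure 5 (the margin value is illustrative):

```python
import torch
import torch.nn.functional as F

# Triplet loss as in Figure 5: pull the anchor towards the positive (same
# vehicle) and push it from the negative (different vehicle) by a margin.
def triplet_loss(anchor, positive, negative, margin=0.3):
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()

# Example with random 128-D embeddings for a batch of 8 triplets.
a, p, n = (torch.randn(8, 128) for _ in range(3))
print(triplet_loss(a, p, n))
```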
3.1.4. Methods Based on Unsupervised Learning
In this case, the idea behind the system is to use a type of unsupervised training CNN,
called GANs (Generative Adversarial Networks), to perform re-identification tasks. These
networks have two main modules: a generator and a discriminator. The generator creates
new virtual images from real images, introducing pseudorandom patterns, which in
some cases are imperceptible to the human eye. The discriminator is in charge of discerning
whether, after this modification, the original image and the one obtained at
the discriminator output are similar or not. Thus, it is not necessary to have a dataset that
contains labelled images.

This type of network was initially created to protect against steganographic attacks
(in which digital noise is included in an image, so that if it is analysed visually, it may be
the same as the original, but if its digital encoding is analysed, it is completely different).
It is a type of unsupervised learning (unlike the rest of the systems, which are considered
supervised), since the network does not know whether the pairs of images analysed by
the discriminator are similar or not to each other; the network determines this
autonomously. In fact, this is the purpose of the learning process.

The idea, therefore, in vehicle re-identification, is that the generator induces geometric
modifications in the images so that the network is able to discriminate between similar and
dissimilar images (Figure 6).
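A compact sketch of the two modules follows; the fully connected layers and sizes are placeholders, as real re-identification GANs are convolutional.

```python
import torch.nn as nn

# The two GAN modules described above, in miniature. The generator maps a
# seed vector to a fake image; the discriminator scores how real an image looks.
generator = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 64 * 64 * 3), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(64 * 64 * 3, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)
```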
Figure 7. Example representation of attention maps [53].
The studies in [62,63] are examples of metric learning methods. The first one presents
a “novel viewpoint-aware triplet loss” to solve re-identification considering intra-view
triplet loss (between different classes) and inter-view triplet loss (similarities in the same
class) as the final output of the net. The second one proposes the “support neighbours
(SN) loss” derived from the KNN (k nearest neighbours) algorithm, and it is also valid for
person and vehicle re-identification. The study in [64] is another very interesting and novel
approach, due to its combination of “3D viewpoint alignment”.
3.3. Datasets
In a classification problem, datasets have a deep impact on network metrics. The
more categories to classify, the more images are needed. We have already mentioned that
re-identification can be considered a specific classification model. Therefore, datasets are
fundamental (as is the case for license plates). The particularity in re-identification is the
similarity between the classified objects. This requires a large sample of car images, with
very different viewpoints.
One of the first datasets was published in 2013 by Stanford University [65] (Figure 8).
Its original purpose was the three-dimensional classification of very similar objects.
However, it started to serve as an initial point for several studies focused on
the development of CNNs. In fact, this dataset can be used as a starting point to evaluate
certain CNNs as classifiers.
Figure 8. Example of images from the Stanford Cars Dataset [65].
Figure 9. Examples of images from the VeRi dataset [67] and VeRi-Wild [55].
To increase the re-identification possibilities, the same authors of the previous dataset
created “VeRi-776”. It is an evolution of VeRi, which increases the total volume of images
(up to 20% more), with up to 776 different vehicles contained in 50,000 images. It also
incorporates new bounding boxes of the license plates in these images, reinforcing the
re-identification by complementing the images of vehicles with the characters of the
license plates (something fundamental in the work of police investigation). Finally, it
includes an annotation of the spatial–temporal relationship, providing the location of the
camera capturing the image, the direction of the captured vehicle and the trajectory it is
taking.

VeRi-Wild is the ultimate expression of a dataset obtained through real-time video
captures. For this purpose, continuous video sequences were obtained 24 h a day for a full
month, captured by 174 different cameras. This achieved 416,314 images of vehicles with
40,671 different identities, with different optic variants such as occlusions, variant
trajectories, perspectives, etc.

We consider these three datasets, due to their resemblance to the reality of work
within security, as well as their wide range of data, to be very important tools for
developing a valid and useful re-identification tool for real work.

That being said, there are other datasets (see Table 4) with a big influence on
re-identification studies, such as PKU Vehicle [69], which contains tens of millions of vehicle
images captured by real-world surveillance cameras in several Chinese cities, “including
several locations (e.g., highways, streets, intersections), weather conditions (e.g., sunny,
rainy, foggy), illuminations (e.g., daytime and evening), shooting angles (e.g., front, side,
Finally, the massive proliferation of UAVs, mainly in security, has opened up a new
field in vehicle re-identification research. The authors of [73] propose “the view-decision
based compound matching learning model (VD-CML)”. To verify the effectiveness of their
proposal, they have created the first vehicle re-identification dataset (VeRi-UAV) captured
by an UAV. This dataset has impelled new research, such as [74,75].
instead of GPU, although they sacrifice precision, such as [41] or [43]. It is important
to mention that in critical scenarios, there will always be a human controller behind the
system, so these tools may deliver false positives rather than false negatives (and the
human controller will discriminate whether each match is right or not).
Figure 12. Highway Gantry Dataset sample.
The second dataset has been called the Operational Urban Dataset (OUD) (Figure 13).
It is divided into two different recording locations (v1 and v2), with a multitude of different
perspectives and occlusions between the vehicles and with the vegetation. Each of the
scenes has been captured simultaneously by two different cameras (c1 and c2), emulating
a real operational environment. In addition, having two input sources allows for
searching for vehicles annotated from one camera in the other, with different perspectives,
which is the main objective pursued in this work. It groups a total of 1255 images of
69 classes, with slightly different annotation criteria. V1 contains all types of images,
including very distant and partial views, while v2 only collects vehicles with a minimum
recognizable size.
Figure 13. Operational Urban Dataset sample.
5.2.2. Dataset for License Plate Recognition

As mentioned, one of the main problems in ALPR systems is having access to specific
license plate datasets. Thus, we have created a specific dataset with different images of
European license plates (mainly Spanish) from very different views, angles and lighting
conditions (including day and night). It contains more than 2035 images, 2531 license
plates and over 15,000 characters, both manually labelled and rescaled to 640 × 640 pixels
per labelled image. We have called it the Spanish ALPR Dataset (Figure 14).

Figure 14. Spanish ALPR Dataset examples.
This dataset is twofold. The first part (license plates) has been used for license plate
detection and ROI extraction with YOLOv5s. For character recognition, we have used
the second part (the labelled characters) to train YOLOv5s again, obtaining a 37-category
classifier (27 letters and 10 numbers).
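A hedged sketch of how the two trained models chain together at inference time, loading them through torch.hub from the ultralytics/yolov5 repository; the weight file names are assumptions for this example.

```python
import torch

# Two-stage pipeline: plate detection, then character recognition on each
# crop. 'plates.pt' and 'chars.pt' are assumed names for the trained weights.
plate_det = torch.hub.load("ultralytics/yolov5", "custom", path="plates.pt")
char_rec = torch.hub.load("ultralytics/yolov5", "custom", path="chars.pt")

crops = plate_det("frame.jpg").crop(save=False)   # one ROI per detected plate
for crop in crops:
    chars = char_rec(crop["im"]).pandas().xyxy[0]
    # Order detected characters left-to-right to rebuild the plate string.
    print("".join(chars.sort_values("xmin")["name"]))
```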
5.3. Training
5.3.1. Vehicle Re-Identification Model Training

Vehicle re-identification is based on the FastReid Toolbox; a comparison between some
state-of-the-art models and several trained backbone models has been tested. For vehicle
re-identification purposes, the FastReid Toolbox Repository [83] offers already pretrained
and optimised architectures. As a backbone, different training strategies with the
EfficientNet [84] family have been performed, with max pooling and convolutional layers
appended to their output for fine tuning.
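A sketch of this backbone strategy using torchvision is shown below; the layer sizes, dropout ratio, learning rate and class count are illustrative placeholders rather than the exact training configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# EfficientNet-B0 feature extractor with max pooling and a convolutional
# layer appended for fine tuning, as described above.
base = models.efficientnet_b0(weights="IMAGENET1K_V1").features  # 1280 ch out

model = nn.Sequential(
    base,
    nn.Conv2d(1280, 512, kernel_size=1),  # appended convolutional layer
    nn.ReLU(),
    nn.AdaptiveMaxPool2d(1),              # max pooling over the feature map
    nn.Flatten(),
    nn.Dropout(p=0.3),                    # dropout ratio tuned during training
    nn.Linear(512, 196),                  # e.g., 196 Stanford Cars classes
)

# A reduced learning rate updates weights more slowly but more stably.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```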
Firstly, it has been trained with the Stanford Cars dataset [65] (Figure 15). To adjust the
dropout and learning rate, the first test training was performed with a reduced version of
the dataset. The first value refers to the ratio of neurons in certain layers that are
randomly “turned off” during training. In this way, feature extraction is performed via
several paths (the “firing” neurons) and thus, the model is better generalised. The learning
rate refers to the speed at which the weights are updated. A reduced value allows many
more weights to be added, but at the cost of a longer training time, so it is advisable to
adjust it optimally.
Figure 16. EfficientNet B0-B3-B7 comparison in Stanford-Cars training.
The graphics show a very similar performance between the three models, and so we
have adopted the B0 model, as it is the lightest and the fastest. Additionally, we have
conducted another training run (Figure 17) with the VeRi-776 dataset [67], changing the
output classes. The precision remains similar between the three models, confirming
the choice of EfficientNet-B0.
Figure 17. EfficientNet B0-B3-B7 comparison in VeRi-776 training.
5.3.2. License Plate Detection/Character Recognition Training

The Spanish ALPR Dataset has been used to train both license plate detection and
character recognition. In the first case, we have made a fast-training run (Figure 18) with
50 epochs in YOLOv5s and the Adam optimizer, obtaining this result:
Figure 19. Character recognition training graph.
5.4. Results

The following tables show the final training results. It is important to mention that
we are combining two different methods for vehicle identification within real scenarios,
so these metrics are relative to our own datasets, which have the aim of recreating these
real conditions. On the one hand, we will show results for vehicle re-identification, tested
on the HGD and OUD datasets (Tables 5 and 6). On the other hand, we will show the
license plate detection and recognition results, trained and tested on our own dataset. The
use of a specific dataset for ALPR training is due to the lack of big enough European
license plate datasets.
Table 5. Accuracy in positive–negative pair test in HGD and OUD datasets for vehicle
re-identification. All values are precision.

Model                 HGD     OUD v1c1   OUD v2c1   OUD v1 ¹   OUD v2 ¹
FastReid (VeRi-776)   97.9%   94.0%      96.6%      87.8%      91.5%

¹ Dataset with 2 different input cameras.
Table 6. Rank@1 and rank@10 metrics for vehicle re-identification in OUD datasets.

                      Rank@1              Rank@10
Model                 OUD v1   OUD v2    OUD v1   OUD v2
FastReid (VeRi-776)   75.4%    90.6%     94.0%    99.4%
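For reference, a small sketch of how such rank@k figures can be computed from a query–gallery similarity matrix (names and shapes are illustrative):

```python
import numpy as np

# rank@k: a query counts as a hit if its true identity appears among the k
# most similar gallery entries. Inputs are assumed to be numpy arrays.
def rank_at_k(similarity, query_ids, gallery_ids, k):
    order = np.argsort(-similarity, axis=1)        # best matches first
    hits = [query_ids[i] in gallery_ids[order[i, :k]]
            for i in range(len(query_ids))]
    return float(np.mean(hits))
```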
Looking at the inference time (Table 8), the results show that it is possible to
perform a whole vehicle identification in less than 10 ms, although only under very ideal
conditions. This time can easily increase due to several factors, such as the overall
amount of data in the processed image.
6. Conclusions
The aim of this article was to develop new solutions for vehicle identification in real
time, and under real operational conditions. To this end, we first pointed out some problems
that concern traditional surveillance systems that are used for vehicle identification, mainly
due to the images taken by the sensors under operational conditions. In addition, we
have highlighted the possibility of solving them by the means of two-factor authentication
mechanisms: a combination of license plates and vehicle characteristics. To help us in our
research, we have compiled the state-of-the-art methods of license plate reading, as well as
vehicle re-identification. After identifying the scarcity of suitable European license plate
datasets, we created our own to develop the system. We have also created two others
to test vehicle re-identification under real conditions. Thus, we believe that we offer a fresh
point of view in vehicle identification, through the combination of two robust solutions,
based on updated CNNs, as well as the creation of specific datasets.
Author Contributions: This work is the result of research conducted by the authors in the field of
security. Conceptualization, J.G.-C., Á.R. and J.M.A.; methodology, J.G.-C. and J.M.A.; software, J.G.-C.
and Á.R.; validation, J.G.-C., Á.R. and J.M.A.; formal analysis, J.G.-C. and J.M.A.; investigation, J.G.-C.
and Á.R.; resources, J.M.A.; data curation, J.G.-C. and Á.R.; writing—original draft preparation,
J.G.-C.; writing—review and editing, J.G.-C., Á.R. and J.M.A.; project administration, J.M.A.; funding
acquisition, J.M.A. All authors have read and agreed to the published version of the manuscript.
Funding: This research has been funded by the following research projects: CCAD (AEI), SEGVAUTO-
4.0-CM and AMBULATE (CM).
Data Availability Statement: Not applicable.
Acknowledgments: Grants PID2019-104793RB-C31 and PDC2021-121517-C31, funded by MCIN/AEI/
10.13039/501100011033, and by the European Union, “NextGenerationEU/PRTR” and the Comu-
nidad de Madrid, through SEGVAUTO-4.0-CM (P2018/EMT-4362). New paradigm for emergency
transport services management: ambulance. AMBULATE-CM. This article is part of the agreement
between the Comunidad de Madrid (Consejería de Educación, Universidades, Ciencia y Portavocía)
and uc3m for the direct award of aid to fund research projects on SARS-CoV-2 and COVID-19 dis-
ease financed with the React-UE resources of the European Regional Development Fund, “A way
for Europe”.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Ghosh, G.; Sood, M.; Verma, S. Internet of things based video surveillance systems for security applications. J. Comput. Theor.
Nanosci. 2020, 17, 2582–2588. [CrossRef]
2. Elharrouss, O.; Almaadeed, N.; Al-Maadeed, S. A review of video surveillance systems. J. Vis. Commun. Image Represent. 2021, 77,
103116. [CrossRef]
3. Zhang, S.; Chan, S.C.; Qiu, R.D.; Ng, K.T.; Hung, Y.S.; Lu, W. On the design and implementation of a high definition multi-
view intelligent video surveillance system. In Proceedings of the 2012 IEEE International Conference on Signal Processing,
Communication and Computing (ICSPCC 2012), Hong Kong, China, 12–15 August 2012; pp. 353–357. [CrossRef]
4. Sreenu, G.; Durai, S. Intelligent video surveillance: A review through deep learning techniques for crowd analysis. J. Big Data
2019, 6, 48. [CrossRef]
5. Fernandes, A.O.; Moreira, L.F.E.; Mata, J.M. Machine vision applications and development aspects. In Proceedings of the 2011
9th IEEE International Conference on Control and Automation (ICCA), Santiago, Chile, 19–21 December 2011; pp. 1274–1278.
[CrossRef]
6. Wang, V.; Tucker, J.V. Surveillance and identity: Conceptual framework and formal models. J. Cybersecur. 2017, 3, 145–158.
[CrossRef]
7. Gonzalez, R.C.; Woods, R.E. Digital Image Processing; Prentice Hall: Upper Saddle River, NJ, USA, 2008; ISBN 13: 9788131712863.
8. Safie, S.; Azmi, N.M.A.N.; Yusof, R.; Yunus, M.R.M.; Sayuti, M.F.Z.C.; Fai, K.K. Object Localization and Detection for Real-Time
Automatic License Plate Detection (ALPR) System Using RetinaNet Algorithm. In Intelligent Systems and Applications; IntelliSys
2019; Advances in Intelligent Systems and Computing; Bi, Y., Bhatia, R., Kapoor, S., Eds.; Springer: Cham, Switzerland, 2020;
Volume 1037.
9. Aloul, F.; Zahidi, S.; El-Hajj, W. Two factor authentication using mobile phones. In Proceedings of the 2009 IEEE/ACS International
Conference on Computer Systems and Applications, Rabat, Morocco, 10–13 May 2009; pp. 641–644.
10. De Cristofaro, E.; Du, H.; Freudiger, J.; Norcie, G. A comparative usability study of two-factor authentication. arXiv 2013,
arXiv:1309.5344.
11. Gope, P.; Sikdar, B. Lightweight and privacy-preserving two-factor authentication scheme for IoT devices. IEEE Internet Things J.
2018, 6, 580–589. [CrossRef]
12. Lee, S.; Ong, I.; Lim, H.T.; Lee, H.J. Two factor authentication for cloud computing. J. Inf. Commun. Converg. Eng. 2010, 8, 427–432.
[CrossRef]
13. IEC EN62676-4; Video Surveillance Systems for Use in Security Applications—Part 4: Application Guidelines. International Stan-
dard: Geneva, Switzerland, 2015. Available online: https://standards.globalspec.com/std/9939964/EN%2062676-4 (accessed on
29 June 2022).
14. Bouchrika, I. A survey of using biometrics for smart visual surveillance: Gait recognition. In Surveillance in Action; Springer:
Cham, Switzerland, 2018; pp. 3–23.
15. Devasena, C.L.; Revathí, R.; Hemalatha, M. Video Surveillance Systems—A Survey. Int. J. Comput. Sci. Issues (IJCSI) 2011, 8,
635–642.
16. Renninger, L.W.; Malik, J. When is scene identification just texture recognition? Vis. Res. 2004, 44, 2301–2311. [CrossRef]
17. Gong, S.; Xiang, T. Person Re-identification. In Visual Analysis of Behaviour; Springer: London, UK, 2011. [CrossRef]
18. Layne, R.; Hospedales, T.M.; Gong, S.; Mary, Q. Person re-identification by attributes. BMVC 2012, 2, 8.
19. Shashirangana, J.; Padmasiri, H.; Meedeniya, D.; Perera, C. Automated license plate recognition: A survey on methods and
techniques. IEEE Access 2020, 9, 11203–11225. [CrossRef]
20. García Serrano, A. Aplicación de Sistemas de Percepción Para la Seguridad Vial; Departamento de Ingeniería Eléctrica, Electrónica y
Automática, Universidad Carlos III: Madrid, Spain, 2020.
21. Guevara, M.L.; Echeverry, J.D.; Urueña, W.A. Detección de rostros en imágenes digitales usando clasificadores en cascada. Sci.
Tech. 2008, 1, 38.
22. Sharma, P.S.; Roy, P.K.; Ahmad, N.; Ahuja, J.; Kumar, N. Localisation of License Plate and Character Recognition Using
Haar Cascade. In Proceedings of the 2019 6th International Conference on Computing for Sustainable Global Development
(INDIACom), New Delhi, India, 13–15 March 2019; pp. 971–974.
23. Cuimei, L.; Zhiliang, Q.; Nan, J.; Jianhua, W. Human face detection algorithm via Haar cascade classifier combined with three
additional classifiers. In Proceedings of the 2017 13th IEEE International Conference on Electronic Measurement & Instruments
(ICEMI), Yangzhou, China, 20–22 October 2017; pp. 483–487.
24. Real Decreto 2822/1998, de 23 de Diciembre, por el que se Aprueba el Reglamento General de Vehículos. Spain (1998, mod. 2021).
Available online: https://www.boe.es/buscar/act.php?id=BOE-A-1999-1826 (accessed on 29 June 2022).
25. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
26. Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 2015, 28. Available
online: https://proceedings.neurips.cc/paper/2015/hash/33ceb07bf4eeb3da587e268d663aba1a-Abstract.html (accessed on 29
June 2022).
27. Silva, S.M.; Jung, C.R. License plate detection and recognition in unconstrained scenarios. In Proceedings of the European
Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 580–596.
28. Smith, R. An overview of the Tesseract OCR engine. In Proceedings of the Ninth International Conference on Document Analysis
and Recognition (ICDAR 2007), Curitiba, Brazil, 23–26 September 2007; Volume 2, pp. 629–633.
29. Patel, C.; Patel, A.; Patel, D. Optical character recognition by open-source OCR tool tesseract: A case study. Int. J. Comput. Appl.
2012, 55, 50–56. [CrossRef]
30. Singh, J.; Bhushan, B. Real Time Indian License Plate Detection using Deep Neural Networks and Optical Character Recognition
using LSTM Tesseract. In Proceedings of the 2019 International Conference on Computing, Communication, and Intelligent
Systems (ICCCIS), Greater Noida, India, 18–19 October 2019; pp. 347–352.
31. Goel, T.; Tripathi, K.C.; Sharma, M.L. Single Line License Plate Detection Using OPENCV and tesseract. Int. Res. J. Eng. Technol.
2020, 07, 5884–5887.
32. Dias, C.; Jagetiya, A.; Chaurasia, S. Anonymous vehicle detection for secure campuses: A framework for license plate recognition
using deep learning. In Proceedings of the 2019 2nd International Conference on Intelligent Communication and Computational
Techniques (ICCT), Jaipur, India, 28–29 September 2019; pp. 79–82.
33. Zherzdev, S.; Gruzdev, A. Lprnet: License plate recognition via deep neural networks. arXiv 2018, arXiv:1806.10447.
34. Silva, S.M.; Jung, C.R. Real-time brazilian license plate detection and recognition using deep convolutional neural networks. In
Proceedings of the 2017 30th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Niteroi, Brazil, 17–20 October
2017; pp. 55–62.
35. Li, H.; Wang, P.; Shen, C. Toward end-to-end car license plate detection and recognition with deep neural networks. IEEE Trans.
Intell. Transp. Syst. 2018, 20, 1126–1136. [CrossRef]
36. Xu, Z.; Yang, W.; Meng, A.; Lu, N.; Huang, H.; Ying, C.; Huang, L. Towards end-to-end license plate detection and recognition: A
large dataset and baseline. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14
September 2018; pp. 255–271.
37. Pirgazi, J.; Kallehbasti, M.M.P.; Ghanbari Sorkhi, A. An End-to-End Deep Learning Approach for Plate Recognition in Intelligent
Transportation Systems. Wirel. Commun. Mob. Comput. 2022, 2022, 3364921. [CrossRef]
38. Kaur, P.; Kumar, Y.; Ahmed, S.; Alhumam, A.; Singla, R.; Ijaz, M.F. Automatic License Plate Recognition System for Vehicles Using
a CNN. CMC-Comput. Mater. Contin. 2022, 71, 35–50.
39. Hossain, S.N.; Hassan, M.; Masba, M.; Al, M. Automatic License Plate Recognition System for Bangladeshi Vehicles Using Deep
Neural Network. In Proceedings of the International Conference on Big Data, IoT, and Machine Learning; Springer: Singapore, 2022;
pp. 91–102.
40. Zandi, M.S.; Rajabi, R. Deep Learning Based Framework for Iranian License Plate Detection and Recognition. Multimed. Tools Appl. 2022, 81, 15841–15858.
41. Ashrafee, A.; Khan, A.M.; Irbaz, M.S.; Nasim, A.; Abdullah, M.D. Real-time Bangla License Plate Recognition System for Low
Resource Video-based Applications. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision,
Waikoloa, HI, USA, 4–8 January 2022; pp. 479–488.
42. Chiu, Y.C.; Tsai, C.Y.; Ruan, M.D.; Shen, G.Y.; Lee, T.T. MobileNet-SSDv2: An improved object detection model for embedded
systems. In Proceedings of the 2020 International Conference on System Science and Engineering (ICSSE), Kagawa, Japan,
31 August–3 September 2020; pp. 1–5.
43. Padmasiri, H.; Shashirangana, J.; Meedeniya, D.; Rana, O.; Perera, C. Automated License Plate Recognition for Resource-
Constrained Environments. Sensors 2022, 22, 1434. [CrossRef] [PubMed]
44. Ali, F.; Rathor, H.; Akram, W. License Plate Recognition System. In Proceedings of the 2021 International Conference on Advance
Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 4–5 March 2021; pp. 1053–1055.
[CrossRef]
45. Yang, C.; Zhou, L. Design and Implementation of License Plate Recognition System Based on Android. In Proceedings of the 11th
International Conference on Computer Engineering and Networks; Springer: Singapore, 2022; pp. 211–219.
46. Kessentini, Y.; Besbes, M.D.; Ammar, S.; Chabbouh, A. A two-stage deep neural network for multi-norm license plate detection
and recognition. Expert Syst. Appl. 2019, 136, 159–170. [CrossRef]
47. Laroca, R.; Severo, E.; Zanlorensi, L.A.; Oliveira, L.S.; Gonçalves, G.R.; Schwartz, W.R.; Menotti, D. A robust real-time automatic
license plate recognition based on the YOLO detector. In Proceedings of the 2018 International Joint Conference on Neural
Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–10.
48. OpenALPR. OpenALPR-EU Dataset. 2016. Available online: https://github.com/openalpr/benchmarks/tree/master/endtoend/eu (accessed on 29 June 2022).
49. Chan, L.Y.; Zimmer, A.; da Silva, J.L.; Brandmeier, T. European Union Dataset and Annotation Tool for Real Time Automatic
License Plate Detection and Blurring. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation
Systems (ITSC), Rhodes, Greece, 20–23 September 2020; pp. 1–6.
50. Yang, H.; Cai, J.; Zhu, M.; Liu, C.; Wang, Y. Traffic-Informed Multi-Camera Sensing (TIMS) System Based on Vehicle Re-
Identification. IEEE Trans. Intell. Transp. Syst. 2022. [CrossRef]
51. Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823.
52. Wang, Y. Deep learning technology for re-identification of people and vehicles. In Proceedings of the 2022 IEEE International
Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), Changchun, China, 25–27 February 2022; pp. 972–975.
53. Wang, H.; Hou, J.; Chen, N. A survey of vehicle re-identification based on deep learning. IEEE Access 2019, 7, 172443–172469.
[CrossRef]
54. Mai, L.; Chen, X.Z.; Yu, C.W.; Chen, Y.L. Multi-view Vehicle Re-Identification Method Based on Siamese Convolutional Neural
Network Structure. In Proceedings of the 2020 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-Taiwan),
Taoyuan, Taiwan, 28–30 September 2020; pp. 1–2.
55. Lou, Y.; Bai, Y.; Liu, J.; Wang, S.; Duan, L. VERI-Wild: A large dataset and a new method for vehicle re-identification in the wild. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019;
pp. 3235–3243.
56. Zheng, Z.; Ruan, T.; Wei, Y.; Yang, Y.; Mei, T. VehicleNet: Learning robust visual representation for vehicle re-identification. IEEE
Trans. Multimed. 2020, 23, 2683–2693. [CrossRef]
57. Khan, S.D.; Ullah, H. A survey of advances in vision-based vehicle re-identification. Comput. Vis. Image Underst. 2019, 182, 50–63.
[CrossRef]
58. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. Available online: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html (accessed on 29 June 2022).
59. Bai, S.; Zheng, Z.; Wang, X.; Lin, J.; Zhang, Z.; Zhou, C.; Yang, H.; Yang, Y. Connecting language and vision for natural language-
based vehicle retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN,
USA, 20–25 June 2021; pp. 4034–4043.
60. He, L.; Liao, X.; Liu, W.; Liu, X.; Cheng, P.; Mei, T. FastReID: A PyTorch toolbox for general instance re-identification. arXiv 2020,
arXiv:2006.02631.
61. Tian, X.; Pang, X.; Jiang, G.; Meng, Q.; Zheng, Y. Vehicle Re-Identification Based on Global Relational Attention and Multi-
Granularity Feature Learning. IEEE Access 2022, 10, 17674–17682. [CrossRef]
62. Li, Y.; Liu, K.; Jin, Y.; Wang, T.; Lin, W. VARID: Viewpoint-Aware Re-IDentification of Vehicle Based on Triplet Loss. IEEE Trans.
Intell. Transp. Syst. 2022, 23, 1381–1390. [CrossRef]
63. Li, K.; Ding, Z.; Li, K.; Zhang, Y.; Fu, Y. Vehicle and Person Re-Identification with Support Neighbor Loss. IEEE Trans. Neural
Netw. Learn. Syst. 2022, 33, 826–838. [CrossRef] [PubMed]
64. Meng, D.; Li, L.; Liu, X.; Gao, L.; Huang, Q. Viewpoint Alignment and Discriminative Parts Enhancement in 3D Space for Vehicle
ReID. IEEE Trans. Multimed. 2022. [CrossRef]
65. Krause, J.; Stark, M.; Deng, J.; Li, F.-F. 3D object representations for fine-grained categorization. In Proceedings of the IEEE
International Conference on Computer Vision, Sydney, Australia, 2–8 December 2013.
66. Liu, H.; Tian, Y.; Yang, Y.; Pang, L.; Huang, T. Deep relative distance learning: Tell the difference between similar vehicles.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016;
pp. 2167–2175.
67. Liu, X.; Liu, W.; Ma, H.; Fu, H. Large-scale vehicle re-identification in urban surveillance videos. In Proceedings of the 2016 IEEE
International Conference on Multimedia and Expo (ICME), Seattle, WA, USA, 11–15 July 2016; pp. 1–6.
68. Liu, X.; Liu, W.; Mei, T.; Ma, H. A deep learning-based approach to progressive vehicle re-identification for urban surveillance. In
European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 869–884.
69. Bai, Y.; Lou, Y.; Gao, F.; Wang, S.; Wu, Y.; Duan, L.Y. Group-sensitive triplet embedding for vehicle reidentification. IEEE Trans.
Multimed. 2018, 20, 2385–2399. [CrossRef]
70. Guo, H.; Zhao, C.; Liu, Z.; Wang, J.; Lu, H. Learning Coarse-to-Fine Structured Feature Embedding for Vehicle Re-Identification.
In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, LA, USA, 2–7 February
2018; pp. 6853–6860.
71. Tang, Z.; Naphade, M.; Liu, M.Y.; Yang, X.; Birchfield, S.; Wang, S.; Kumar, R.; Anastasiu, D.; Hwang, J.N. CityFlow: A city-scale
benchmark for multi-target multi-camera vehicle tracking and re-identification. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8797–8806.
72. ElRashidy, A.; Ghoneima, M.; Abd El Munim, H.E.; Hammad, S. Recent Advances in Vision-based Vehicle Re-identification
Datasets and Methods. In Proceedings of the 2021 16th International Conference on Computer Engineering and Systems (ICCES),
Cairo, Egypt, 15–16 December 2021; pp. 1–6.
73. Song, Y.; Liu, C.; Zhang, W.; Nie, Z.; Chen, L. View-Decision Based Compound Match Learning for Vehicle Re-identification in
UAV Surveillance. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp.
6594–6601.
74. Liu, C.; Song, Y.; Chang, F.; Li, S.; Ke, R.; Wang, Y. Posture Calibration Based Cross-View & Hard-Sensitive Metric Learning for
UAV-Based Vehicle Re-Identification. IEEE Trans. Intell. Transp. Syst. 2022. [CrossRef]
75. Yao, A.; Qi, J.; Zhong, P. Self-aligned Spatial Feature Extraction Network for UAV Vehicle Re-identification. arXiv 2022,
arXiv:2201.02836.
76. Jocher, G. YOLOv5 by Ultralytics. Available online: https://github.com/ultralytics/yolov5 (accessed on 29 June 2022).
77. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
78. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
79. Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training
sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA,
13–19 June 2020; pp. 9759–9768.
80. Liu, S.; Huang, D.; Wang, Y. Learning spatial fusion for single-shot object detection. arXiv 2019, arXiv:1911.09516.
81. Lee, Y.; Park, J. CenterMask: Real-time anchor-free instance segmentation. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 13906–13915.
82. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in
context. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 740–755.
83. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the
European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland; pp. 21–37.
84. JDAI Computer Vision. FastReID Repository. 2021. Available online: https://github.com/JDAI-CV/fast-reid (accessed on 29
June 2022).