Towards an Optimised Vehicle Detection Algorithm for Multi-Object Tracking in Traffic Surveillance

Soumi Mitra3, Nandhini Reddy Aileni3, Irina Tal1,3, and Malika Bendechache1,2,3

1 Lero – The Irish Software Research Centre
2 ADAPT – Science Foundation Ireland Research Centre
3 School of Computing, Dublin City University, Ireland
{soumi.mitra2, nandhini.aileni2}@mail.dcu.ie
{irina.tal, malika.bendechache}@dcu.ie

Abstract. Smart-city deployments have increased in recent years, using computer vision techniques to monitor traffic and reduce problems such as congestion. Computer vision-based detection and tracking methods built on convolutional neural networks, the deep learning algorithms used to analyse visual imagery, have proven successful in smart cities, relying on low-cost street cameras for the Internet of Things. However, existing methods often detect the same vehicle multiple times in a frame at a given timestamp, which increases the time complexity of the algorithm. This paper proposes a new optimised algorithm that solves this vehicle duplication problem. Our approach extends and optimises the state-of-the-art Point-RCNN (Region-based Convolutional Neural Network) algorithm by combining it with the D-Hash (difference hash) algorithm. D-Hash is a robust image hashing algorithm used to identify duplicate images; after deduplication, the images are processed by Point-RCNN, a 3D multiple object tracking system used to bound and identify vehicles. The proposed algorithm was tested on the KITTI 3D object detection benchmark dataset. Our experiments show that the vehicle duplication issue is eliminated without sacrificing object detection accuracy. In addition, our approach reduces time complexity by processing 70 more frames per second (FPS) than the Point-RCNN baseline, improving execution speed by almost 34%.

Keywords: Vehicle Detection, Point-RCNN, Vehicle Duplication, D-Hash.

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Transportation is one of the most significant domains where actionable insights drawn from data gathered by camera sensors can be beneficial [10]. In the last few years, increased computing power has made it possible to process video data with a variety of analytical approaches, enabling new applications. Traffic analysis is an important aspect of smart-city implementation, as it helps reduce traffic-related issues such as congestion and accidents.

Deep learning is a technique that derives insights from data by learning its patterns and thereby classifying it [20][1]. Computer vision is the field that enables computers to understand digital images and videos [3]. Deep learning models, in particular convolutional neural networks, support object detection. Object detection combines image classification, which assigns a class label to an image, with object localization, which places a bounding box around each object in that image.
Object detection thus yields bounding boxes around the objects of interest in an image [2]. The detections are assigned unique IDs, which in turn allow objects to be tracked across the frames of a video [13]. Deep learning models and computer vision thereby help in monitoring traffic.

3D object detection and Multiple Object Tracking are considered essential applications for autonomous driving [18]. 3D object detection captures an object's size, position, and orientation in the world, and it is more accurate than 2D object detection, especially in real-time systems. LiDAR (Light Detection and Ranging) point clouds are used to obtain 3D detections [13]. 3D object detection automatically produces semantic masks for 3D object segmentation, whereas 2D detection provides only weak semantic segmentation [12]. Multiple Object Tracking is defined as a visual system that tracks multiple moving objects in a dynamic environment.

However, existing approaches and algorithms suffer from a vehicle duplication issue in which the same vehicle is identified multiple times at a particular timestamp. In this paper, we present a new algorithm that merges the baseline Point-RCNN with the D-Hash algorithm, resolving the vehicle duplication issue without sacrificing the accuracy of the baseline.

The paper is organised as follows. Section 2 reviews related work on object detection and tracking using deep learning techniques. Section 3 describes the dataset, the pre-processing techniques, and the algorithms used in our approach. Section 4 introduces our proposed approach. Section 5 presents the results and their evaluation metrics. We make concluding remarks in Section 6.

2 Literature Review

This section summarises how machine learning, particularly deep learning algorithms such as region-based convolutional neural networks (RCNN), has been used extensively for traffic analysis on video and image data. It also summarises the KITTI benchmark dataset.

2.1 Deep Learning Algorithms Used

In [11], a new method was proposed for vehicle detection with sub-class training, in which a convolutional neural network based on the Faster RCNN architecture is trained on sub-class categories in order to improve detection performance by identifying different categories of vehicles in different orientations and climatic conditions. A transfer learning approach is also evaluated for fine-tuning a pre-trained RCNN model: the COCO dataset is used to train the network and extract features, which are then tested on the UA-DETRAC dataset. Average precision is the evaluation metric, and the proposed method and its variations were compared on the validation set against four baselines: Deformable Part Model (DPM), Aggregate Channel Features (ACF), RCNN, and CompACT. The accuracy of vehicle detection using Faster RCNN was 93.43%. However, multiple bounding boxes from different sub-classes can overlap on the same object.

In 2019, the authors of [8] proposed a low-confidence track filtering extension to the Deep SORT tracking algorithm, which significantly reduces the false-positive tracks generated by Deep SORT. Tracks whose average detection confidence is low over their first few frames are deleted; in this way, the detection confidence threshold can be set low, or even to zero, without missing detections. They also generated a vehicle re-identification dataset from the UA-DETRAC dataset to train Deep SORT for vehicle data association. Experiments on the UA-DETRAC test set show that the extension outperforms state-of-the-art trackers by notable margins. The evaluation metrics used were PR-MOTA and the average detection confidence threshold. However, this paper does not address the issue of vehicle duplication.
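This filtering rule lends itself to a short illustration. The sketch below is not the authors' implementation: the Track structure, the probation length of three frames, and the 0.4 threshold are hypothetical values chosen only to show the idea of deleting tracks whose average detection confidence over their initial frames is low.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    track_id: int
    confidences: list = field(default_factory=list)  # detection confidence per frame

def filter_low_confidence(tracks, probation=3, min_avg=0.4):
    """Drop tracks whose average confidence over their first frames is low.

    `probation` and `min_avg` are illustrative values, not taken from [8].
    """
    kept = []
    for track in tracks:
        initial = track.confidences[:probation]
        if initial and sum(initial) / len(initial) >= min_avg:
            kept.append(track)
    return kept
```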
In 2019, [9] proposed a multi-scale detector for accurate vehicle detection on the UA-DETRAC dataset, in which additional prediction layers using spatial pyramid pooling (SPP) are integrated into conventional Yolo-v3. An MS-COCO pre-trained Yolo-v3 model initialises the Darknet-53 backbone, which is trained end-to-end with Stochastic Gradient Descent (SGD). The backbone produces feature maps that feed a Feature Pyramid Network (FPN); two additional prediction layers and five SPP networks with batch normalisation account for the various object scales. Since feature maps from layers at different stages have different dimensions, a sampling operation is performed to combine them effectively, and pooling layers inserted between successive convolution layers progressively reduce the dimensionality of the feature representation. Overall mean Average Precision (mAP) was calculated for DPM (Deformable Part-based Models), ACF (Aggregate Channel Features), RCNN, Faster RCNN2, SA-FRCNN, NANO, CompACT (Complexity Aware Cascade Training), EB (Evolving Boxes), R-FCN (Region-based Fully Convolutional Network), GP-FRCNN (Geometric Proposals for Faster RCNN), HAVD, SSD-VDIG, and the proposed method, i.e. Yolo-v3 with spatial pyramid pooling and two additional prediction layers. The proposed method outperformed the existing algorithms with an accuracy of 85.29%; however, its detection speed was low compared with RCNN.

In 2020, [14] proposed a parallel architecture with embedded motion priors for surveillance vehicle detection. The key idea is to leverage motion properly by decoupling moving objects from the full set of vehicles, enhancing vehicle appearance while carefully suppressing false positives in the background. Following the UA-DETRAC protocol, the authors submitted the results of their detector, with an input size of 512×512, to the public testing server for evaluation, using true positives and false positives as metrics. They achieved an overall accuracy of 80.76% AP while maintaining the fastest speed among the compared detectors at 14 FPS. Across weather conditions, the approach obtained competitive results on the cloudy and sunny subsets and outperformed the other methods on the night and rainy subsets. The authors attribute this stable performance, especially in bad weather, to the proper use of motion priors: detectors that rely only on geometric features are susceptible to unexpected environments in real traffic, so motion is critical for robust predictions in surveillance vehicle detection.
Traffic congestion and occlusion are among the major challenges in vehicle detection. In 2020, [15] developed a new methodology for vehicle detection under complicated conditions, combining the MOG2 (Mixture of Gaussians) algorithm with the H-SqueezeNet algorithm to accurately identify vehicles and their categories. MOG2 acts as a background subtraction model that generates regions of interest (ROIs) from a set of video frames; these ROIs help avoid the bounding box problem. H-SqueezeNet then identifies the vehicle categories, with the final classification produced by a Softmax classifier. Evaluation was performed on traffic data from an intersection in Suzhou, China, and on the CDNet 2014 and UA-DETRAC datasets. H-SqueezeNet was compared with the original SqueezeNet and with other state-of-the-art networks such as VGG16, VGG19, Inception-v3, ResNet, and Darknet-53; across the metrics calculated, accuracy and model size showed that the proposed model outperformed the state of the art, and the use of MOG2 with H-SqueezeNet reduced false positives. However, vehicle categories other than cars, trucks, and buses are not identified, and detection speed is comparatively low.

A new methodology was proposed in [12], in which 3D objects are detected with Point-RCNN from the raw point cloud. Objects are detected and bounded in two stages. In the first stage, a bottom-up approach generates high-quality 3D proposals from the point cloud, together with global semantic features that separate foreground from background points. In the second stage, local spatial features are generated by refining the proposals in canonical coordinates. The global semantic features of stage one and the local spatial features of stage two are then combined, so that 3D objects are bounded accurately. Evaluation on the KITTI dataset obtained a recall of 96.01%.

A 3D multiple object tracking system was proposed in [18], where objects are detected from the LiDAR point cloud of the KITTI dataset. A 3D Kalman filter combined with the Hungarian algorithm is used for data association and state estimation [17]. Rather than defining the Kalman filter's state space on the image plane, the state of each object is extended to 3D, including location, size, velocity, and orientation. The algorithm was evaluated on 2D and 3D MOT systems using MOTA, MOTP, IDS, and FPS as metrics, and achieved the highest speed among 3D MOT systems at 207.4 FPS (frames per second).

In [4], several image hashing algorithms are introduced and compared for detecting similar images in a large social network dataset. The A-Hash, P-Hash, D-Hash, and W-Hash algorithms were evaluated for robustness using precision, recall, and F1 score. The experiments showed that P-Hash outperformed the other three algorithms, with an F1 score of 0.864 at precision and recall values of 0.926 and 0.81 respectively, for a distance threshold N = 16. It was followed by D-Hash, with an F1 score of 0.846 at precision and recall values of 0.952 and 0.761 respectively, for a distance threshold N = 14. The W-Hash and A-Hash algorithms obtained lower F1 scores.
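As a quick sanity check, these F1 scores follow from the reported precision and recall via F1 = 2PR / (P + R): for P-Hash, 2 × 0.926 × 0.81 / (0.926 + 0.81) ≈ 0.864, and for D-Hash, 2 × 0.952 × 0.761 / (0.952 + 0.761) ≈ 0.846.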
2.2 KITTI Dataset

The KITTI benchmark was developed [6] for stereo, optical flow, visual odometry, 3D object detection, and 3D tracking. The stereo and optical flow estimation benchmark uses 194 training and 195 testing image pairs at a resolution of 1240 × 376 pixels and includes difficulties such as reflecting surfaces; evaluation results are provided both for all non-occluded pixels and for all ground-truth pixels. The odometry dataset consists of 22 stereo videos with a total length of 40 kilometres, with GPS ground-truth trajectories. The proposed evaluation metrics minimise bias by computing errors over all possible sub-sequences for a given trajectory length or driving speed, and the online evaluation server reports submitted results as a function of these two variables, capturing different sources of error. The 3D object detection, object orientation, and tracking benchmarks provide accurate 3D information in the form of 3D bounding boxes for object classes such as cars, vans, pedestrians, and cyclists; the 3D ground truth is generated by annotating 3D bounding box tracklets for all objects visible in the images.

The goal of our proposed approach is to improve the baseline algorithms by solving the vehicle duplication problem without sacrificing object detection accuracy.

3 Methodology

3.1 Dataset Description

The KITTI dataset was recorded from a moving platform in Karlsruhe, Germany [7]. It was created by capturing highway and rural scenes using high-resolution stereo cameras (both greyscale and colour), a Velodyne HDL-64E laser scanner producing more than one million 3D points per second, and an OXTS RT 3003 localisation system combining GPS, GLONASS, an IMU, and RTK correction signals. The cameras, laser scanner, and localisation system are calibrated and synchronised, providing accurate ground-truth values. Figure 1 shows one frame of the KITTI 3D Vision Benchmark Suite dataset.

Fig. 1. KITTI 3D Vision Benchmark Suite

Our approach uses part of the KITTI dataset, the 3D object detection benchmark, comprising 7,481 training images and 7,518 testing images with 80,256 labelled objects, gathered together with the corresponding point clouds.

Object detection and 3D orientation estimation are the key tasks of the KITTI 3D object benchmark [6]. Accurate 3D bounding boxes are provided for object classes such as cars, cyclists, and pedestrians, obtained by manually labelling objects in the 3D point clouds produced by the Velodyne scanner. The benchmark data were chosen with a greedy approach that uses 100 non-occluded objects per class along with 16 object-orientation classes. Average Orientation Similarity (AOS) is the evaluation metric used; it is defined here as the ratio of true positives to the sum of true positives and false negatives:

AOS = TP / (TP + FN)

A detection counts as a true positive if it overlaps the ground truth by more than 50%, and multiple detections of the same object are counted as false negatives.

3.2 Data pre-processing

In our proposed algorithm, we merge the D-Hash algorithm with Point-RCNN to remove the vehicle duplication issue and to improve detection and tracking. To apply D-Hash, we first convert all dataset images to grayscale, then resize and flatten them so that the Hamming distance, an important quantity in our proposed algorithm, can be computed. The resized images also reduce the time complexity and thereby increase the efficiency of the algorithm. Figure 2 shows the data preprocessing steps.

Fig. 2. Data Pre-processing Steps
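A minimal sketch of these steps, assuming Pillow and NumPy are available; the 9×8 target size anticipates the D-Hash block size described in Section 3.3, and preprocess() is a hypothetical helper name, not the exact code used in our experiments.

```python
from PIL import Image
import numpy as np

def preprocess(path, size=(9, 8)):
    """Grayscale-convert, resize, and flatten one image (illustrative sketch).

    The 9x8 target matches the D-Hash block size described in Section 3.3;
    `path` is any image file from the dataset.
    """
    img = Image.open(path).convert("L")               # grayscale
    img = img.resize(size, Image.LANCZOS)             # shrink to 9x8
    return np.asarray(img, dtype=np.int16).flatten()  # flatten to 72 values
```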
3.3 Algorithms Used

In this paper, we propose a new algorithm that uses point cloud technology together with a perceptual image hashing technique: we merge the D-Hash algorithm with Point-RCNN. The following sections discuss the two algorithms.

Point-RCNN (The Baseline): Point-RCNN is a three-dimensional framework for object detection from raw point clouds. A point cloud is a set of data points describing a three-dimensional form or shape, where each point has its own X, Y, and Z coordinates. Point clouds are generated by methods such as photogrammetry, in which many photos are taken from different viewpoints, and remote sensing, in which satellites or aircraft collect images or data about the Earth. LiDAR (Light Detection and Ranging) sensors are also used to collect data from the Earth's surface and generate point clouds [5].

We adopt this 3D object detection approach because it generates more accurate output than conventional 2D methods. Point-RCNN produces detections in two stages. The first stage uses a bottom-up approach to generate 3D proposals; the second refines the proposals in canonical coordinates, the X, Y, Z coordinates of the point cloud, to obtain the final detections, which are more accurate than those of 2D methods. The process also achieves higher speed than conventional methods, as it avoids placing a large number of 3D anchor boxes throughout the 3D space, saving time and effort. Evaluated on the KITTI 3D object detection benchmark, Point-RCNN outperforms the conventional methods in terms of time complexity and effort. Figure 3 shows the architectural diagram of Point-RCNN.

Fig. 3. Point-RCNN architectural diagram

D-Hash: Among the many perceptual hashing algorithms for image hashing, we chose D-Hash because it provides the best combination of accuracy and speed [4]. In the D-Hash procedure, images are converted to grayscale and reduced in size before hashing, and the hash encodes the differences between adjacent pixels, so it essentially captures the picture's structure. Hamming distance is used as the comparison measure to detect duplicate images in a dataset, and duplicates detected by D-Hash are removed from our dataset with Python's os.remove() function.

The method is very simple to implement. First, each image is converted to grayscale and reduced to a block size of 9×8, i.e. 72 pixels. Then the difference between each adjacent pair of pixels in a row is calculated, giving 8 differences per row; 8 rows of 8 differences produce 64 bits. Each bit is set according to whether the left or the right pixel is the brighter one. Two images are considered the same if the Hamming distance between their hashes is less than 5. Hamming distance compares two binary strings of equal length and is calculated here with the XOR operation; it is widely used in computer networking and coding theory for error detection and correction [19]. Evaluated on the KITTI 3D object detection benchmark, D-Hash helps eliminate vehicle duplication in vehicle detection and tracking.
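The sketch below illustrates the D-Hash pipeline as just described: the 9×8 grayscale reduction, row-wise brightness differences packed into a 64-bit hash, XOR-based Hamming distance, and removal of near-duplicates below the threshold of 5. It builds on the hypothetical preprocess() helper from Section 3.2 and is an illustration under those assumptions, not the exact implementation used in our experiments.

```python
import os

def dhash(pixels, width=9, height=8):
    """Pack row-wise left-vs-right brightness comparisons into a 64-bit integer.

    `pixels` is the flattened 9x8 grayscale array from preprocess().
    """
    bits = 0
    for row in range(height):
        for col in range(width - 1):
            left = pixels[row * width + col]
            right = pixels[row * width + col + 1]
            bits = (bits << 1) | (1 if left > right else 0)
    return bits  # 8 rows x 8 differences per row = 64 bits

def hamming(h1, h2):
    """Number of differing bits between two hashes, via XOR."""
    return bin(h1 ^ h2).count("1")

def remove_duplicates(paths, threshold=5):
    """Delete images whose hash is within `threshold` bits of one already kept."""
    kept = []  # (hash, path) pairs for retained images
    for path in paths:
        h = dhash(preprocess(path))
        if any(hamming(h, k) < threshold for k, _ in kept):
            os.remove(path)  # near-duplicate: drop it from the dataset
        else:
            kept.append((h, path))
    return [p for _, p in kept]
```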
4 Our Proposed Approach

Figure 4 shows an overview of the proposed approach. Our proposed algorithm merges the D-Hash algorithm with Point-RCNN to eliminate the vehicle duplication issue and to improve detection and tracking. We adopt the 3D Kalman filter combined with the Hungarian technique from the baseline study [18], which uses a 3D object detector to extract 3D detections from the LiDAR point cloud.

Fig. 4. Architectural Diagram of proposed 3D MOT system

Kalman filters are mostly used in dynamic systems where the data is uncertain; they can estimate what a real-time system will do next and are widely used in systems where data changes continuously over time, which makes them very effective for multi-object tracking. Because Kalman filters do not store historical data, they are memory-efficient and very fast to execute, making them well suited to real-time MOT systems [16].

The Hungarian technique is a combinatorial optimisation algorithm that solves the assignment problem in polynomial time. It works iteratively and in a very optimised way, which makes it useful in real-time applications such as 3D MOT (multiple object tracking) systems.

For the car and cyclist splits, we used Point-RCNN detections on the KITTI 3D object detection dataset [18]. In addition, to reduce time complexity, we applied the D-Hash technique to delete duplicate images from the track sequences. Unlike other filter-based MOT systems, which define the filter's state space on the image plane, the state of each object is extended to three dimensions, including 3D location, size, velocity, and orientation [18]. Figure 5 shows how 3D objects are detected using our approach.

Fig. 5. 3D Object Detection using Proposed Algorithm

We merged the Point-RCNN and D-Hash algorithms and evaluated them on the KITTI 3D object detection dataset. The data is passed through the 3D MOT system together with the D-Hash layer, so that duplicate images are removed and each unique object is enclosed in a 3D bounding box. This technique achieves the same accuracy as the baseline Point-RCNN while running at a higher FPS (frames per second). The KITTI 3D object detection dataset has three object types: cars, pedestrians, and cyclists; for our approach, we trained and tested on cars and cyclists.
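To make the data-association step concrete, here is a minimal sketch of one tracking update in the spirit of the pipeline above: detections surviving D-Hash deduplication are matched to predicted track states by solving an assignment problem over pairwise distances with SciPy's Hungarian solver. The 3D state vectors, the Euclidean-distance cost, and the gating value are simplified placeholders; the baseline [18] associates detections using 3D IoU over full Kalman-filter states.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_states, detections, gate=2.0):
    """Match predicted track centres to new detection centres (sketch).

    track_states: (N, 3) array of predicted x, y, z object centres.
    detections:   (M, 3) array of detected x, y, z object centres.
    Returns (track_index, detection_index) pairs within the `gate` distance.
    """
    if len(track_states) == 0 or len(detections) == 0:
        return []
    # Pairwise Euclidean distances form the assignment cost matrix.
    cost = np.linalg.norm(track_states[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
```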
5 Evaluation & Results

Our proposed approach is evaluated on the KITTI 3D object detection dataset, which includes LiDAR point clouds and ground-truth 3D bounding box trajectories. We used the KITTI validation set for the 3D MOT assessment, because the KITTI test set only allows 2D MOT evaluation and its ground truth is not available to users [18]. Following previous research, we compared results on the car and cyclist subsets of the KITTI dataset.

We used the same evaluation metrics as the baseline Point-RCNN: MOTA (Multi-Object Tracking Accuracy), MOTP (Multi-Object Tracking Precision), MODA (Multi-Object Detection Accuracy), MODP (Multi-Object Detection Precision), sMOTA (scaled MOTA), AMOTA (average MOTA), and AMOTP (average MOTP), along with other CLEAR metrics such as precision, recall, and F1. Running our tracker on the KITTI 3D object detection validation set with our proposed algorithm's detections, we achieved an sMOTA of 93.28%, AMOTA of 45.43%, and AMOTP of 77.41% for cars, and an sMOTA of 72.94%, AMOTA of 37.95%, and AMOTP of 63.03% for cyclists, exactly matching the baseline. Table 1 summarises the results and shows that the proposed algorithm reproduces the baseline metrics.

Table 1. Comparing Point-RCNN and Proposed Algorithm

                        Car                                Cyclist
Evaluation metrics   Point-RCNN   Proposed Algorithm   Point-RCNN   Proposed Algorithm
MOTA                 86.24%       86.24%               79.82%       79.82%
MOTP                 78.43%       78.43%               76.55%       76.55%
MODA                 86.24%       86.24%               79.82%       79.82%
MODP                 83.11%       83.11%               95.70%       95.70%
sMOTA                93.28%       93.28%               72.94%       72.94%
AMOTA                45.43%       45.43%               37.95%       37.95%
AMOTP                77.41%       77.41%               63.03%       63.03%
Recall               92.17%       92.17%               84.49%       84.49%
Precision            96.22%       96.22%               95.55%       95.55%
F1                   94.15%       94.15%               89.68%       89.68%

Our proposed algorithm runs at 277.7 FPS (frames per second, i.e., the number of distinct frames processed per second in a sequence), while the baseline runs at 207.4 FPS, so the proposed algorithm is almost 34% faster than the baseline. Figure 6 shows the FPS comparison between Point-RCNN and the proposed algorithm.

Fig. 6. Comparing FPS of Point-RCNN and Proposed Algorithm

6 Conclusion & Future Work

The existing baseline algorithm suffered from a vehicle duplication issue in which the same vehicle was identified multiple times at a particular timestamp, increasing the algorithm's time complexity. Our algorithm eliminates this vehicle duplication issue in multi-object detection and tracking for traffic surveillance: merging the baseline Point-RCNN with the D-Hash algorithm yields a multi-object detection and tracking solution with the same accuracy as the baseline while reducing time complexity by improving speed by almost 34%.

To evaluate our algorithm, we used only the KITTI 3D object detection benchmark suite. In the future, we hope to evaluate it on other similar datasets, such as the UA-DETRAC benchmark suite, and to improve its performance under various weather conditions (sunny, rainy, etc.).

Acknowledgement

This work was supported in part by the Science Foundation Ireland grants 13/RC/2094 P2 (Lero) and 13/RC/2106 P2 (Adapt).

References
[1] Sweta Bhattacharya et al. A Review on Deep Learning for Future Smart Cities. May 2020. doi: 10.1002/itl2.187.
[2] Jason Brownlee. A Gentle Introduction to Object Recognition With Deep Learning. https://machinelearningmastery.com/object-recognition-with-deep-learning/. May 2019. (Visited on 08/05/2021).
[3] Deep Learning for Computer Vision. https://machinelearningmastery.com/deep-learning-for-computer-vision/.
[4] Andrea Drmic et al. Evaluating robustness of perceptual image hashing algorithms. 2017. doi: 10.23919/MIPRO.2017.7973569.
[5] FME Community. https://community.safe.com/s/article/what-is-a-point-cloud-what-is-lidar.
[6] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. 2012.
[7] Andreas Geiger et al. Vision meets robotics: the KITTI dataset. Sept. 2013. doi: 10.1177/0278364913491297.
[8] Xinyu Hou, Yi Wang, and Lap-Pui Chau. Vehicle Tracking Using Deep SORT with Low Confidence Track Filtering. 2019.
[9] Kwang-Ju Kim et al. Multi-Scale Detector for Accurate Vehicle Detection in Traffic Surveillance Data. 2019. doi: 10.1109/ACCESS.2019.2922479.
[10] How Computer Vision is shaping smart cities. https://www.phase1vision.com/blog/how-computer-vision-is-shaping-smart-cities. Nov. 2020. (Visited on 08/05/2021).
[11] Sitapa Rujikietgumjorn and Nattachai Watcharapinchai. Vehicle detection with sub-class training using R-CNN for the UA-DETRAC benchmark. Aug. 2017. doi: 10.1109/AVSS.2017.8078520.
[12] Shaoshuai Shi, Xiaogang Wang, and Hongsheng Li. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. 2019. arXiv: 1812.04244 [cs.CV].
[13] Shaoshuai Shi et al. From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network. 2020. arXiv: 1907.03670 [cs.CV].
[14] Xiaolian Wang et al. Illuminating Vehicles With Motion Priors For Surveillance Vehicle Detection. 2020. doi: 10.1109/ICIP40778.2020.9190727.
[15] Zhiyuan Wang et al. A Robust Vehicle Detection Scheme for Intelligent Traffic Surveillance Systems in Smart Cities. 2020. doi: 10.1109/ACCESS.2020.3012995.
[16] Greg Welch, Gary Bishop, et al. An introduction to the Kalman filter. 1995.
[17] Xinshuo Weng et al. 3D Multi-Object Tracking: A Baseline and New Evaluation Metrics. 2020. arXiv: 1907.03961 [cs.CV].
[18] Xinshuo Weng et al. AB3DMOT: A Baseline for 3D Multi-Object Tracking and New Evaluation Metrics. 2020. arXiv: 2008.08063 [cs.CV].
[19] What is Hamming Distance. https://www.tutorialspoint.com/what-is-hamming-distance.
[20] What is Object Tracking - An Introduction. https://viso.ai/deep-learning/object-tracking/. July 2021. (Visited on 08/05/2021).