PEDESTRIAN DETECTION IN DIFFERENT LIGHTING CONDITIONS
                     USING DEEP NEURAL NETWORKS
                          Jason Nataprawira, Yanlei Gu, Koki Asami, Igor Goncharenko
 College of Information Science and Engineering, Ritsumeikan University, Japan, guyanlei@fc.ritsumei.ac.jp


ABSTRACT

Pedestrian safety is one of the most significant issues in the development of advanced driver assistance systems
and autonomous vehicles. Deep Neural Networks (DNNs), or deep learning, have been applied effectively in many
applications, especially object classification. In addition, several well-known DNNs, e.g. Faster R-CNN
(Faster Region-based Convolutional Neural Network), YOLO (You Only Look Once) and SSD (Single Shot Detector),
have recently been applied to pedestrian detection. However, most pedestrian detection research has dealt with
detection in either daytime or nighttime alone; only a few studies have addressed both daytime and nighttime
environments. This paper evaluates and compares the performance of the DNN-based pedestrian detection
algorithm YOLO in both daytime and nighttime environments. The evaluation was conducted on a pedestrian
dataset that includes RGB images captured in both daytime and nighttime conditions. The experimental results
indicate that the performance of DNN-based pedestrian detection is significantly affected by the lighting
conditions. In the daytime condition, 45% precision on person detection could be achieved, but only 20% precision
was obtained in the nighttime condition.
Key words: Pedestrian Detection, Lighting conditions, Autonomous Driving.

1. INTRODUCTION

Pedestrian safety is one of the most significant issues in the development of modern transportation. In 2017, the
European Commission released a report implying that about 21% of traffic accidents were caused by pedestrians
(2017 Road Safety Statistics: What Is behind the Figures?, 2017). Advanced driver assistance systems and
autonomous vehicles are being developed intensively to reduce accidents and improve the effectiveness of
transportation. However, the current achievements are still inadequate, e.g. about 65 traffic accidents of Tesla
autonomous vehicles involved pedestrians (“Tesla Deaths,” 2020). As a result, pedestrian detection is an extremely
important task to solve before autonomous vehicles are commercialized.
    Recently, Deep Neural Networks (DNNs), or deep learning, have been used effectively in many applications,
especially object detection (Zhao et al., 2019). Since pedestrian detection is one of the object detection tasks,
researchers have studied the application of DNNs to pedestrian detection (Zhang et al., 2016). For some specific
tasks, researchers have proposed new DNNs tailored to pedestrian detection, such as the one introduced by
Tian et al. (2015).
    Despite the successful application of DNNs to pedestrian detection, some obstacles remain in the pedestrian
detection domain. Hwang et al. (2017) mentioned that one of them is the lighting condition. Most pedestrian
detection research has dealt with detection in either daytime or nighttime alone. Only a few studies focus on
pedestrian detection in both daytime and nighttime environments. Nonetheless, an autonomous vehicle should
behave reliably in all light conditions. In addition, it is better to develop a unified algorithm and system for
pedestrian detection in all light conditions, which avoids switching between a daytime model and a nighttime
model, because switching correctly is itself a challenging problem.
    This paper evaluates and compares the performance of a DNN-based pedestrian detection algorithm in
nighttime and daytime environments, and shows how the performance of DNN-based pedestrian detection is
affected by different light conditions. The results of this research can inform the further development of
pedestrian detection for autonomous vehicles, e.g. the relevance of DNN-based pedestrian detection and the
necessity of hardware improvements for pedestrian detection in different lighting conditions.
    This paper is organized in five sections. The first section introduces the background. The second section
reviews related works on pedestrian detection and DNN techniques. The third section explains the methodology.
The fourth section presents the experimental results. The final section concludes the paper and discusses
possible future works.


Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

 IICST2020: 5th International Workshop on Innovations in Information and Communication Science and Technology, Malang, Indonesia
                                                                                         Nataprawira J., Gu Y., Asami K., Goncharenko I.


2. RELATED WORKS

Deep Neural Networks (DNNs), or deep learning, have developed rapidly over the past decade. In particular, the
type of DNN best suited to object detection is the Convolutional Neural Network (CNN). A CNN-based method for
object detection was first proposed by Girshick et al. (2014) and is called R-CNN (Region-based Convolutional
Neural Network). The method generates about 2000 region proposals and extracts features from each of them to
be fed into the network. However, this first version was slow because the feature computation was repeated for
every region proposal. Consequently, Girshick (2015) improved the method by creating Fast R-CNN. A novelty of
Fast R-CNN is that the output layer branches into two layers: a “cls” layer for the classification task and a
“bbox” regressor for the regression task. The network can therefore run the classification and regression tasks
simultaneously, which shortens the computation time. Fast R-CNN was in turn improved by Faster R-CNN (Ren et
al., 2017), which introduced a Region Proposal Network (RPN) and anchors. The RPN generates region proposals
from the features of the input image, allowing the network to learn by itself where to locate region proposals.
An anchor is centered at each sliding-window position, as illustrated in figure 1. The anchor concept became a
foundation of the YOLO (You Only Look Once) method (Redmon et al., 2016). The R-CNN family networks and
YOLO have been widely used for pedestrian detection in daytime (Lan et al., 2018; Tomè et al., 2016; Zhang et
al., 2016).


Fig. 1. Constructing anchors at sliding window in Faster R-CNN (Ren et al., 2017)

    For pedestrian detection in nighttime environments, only a few researchers have investigated the problem.
The multispectral approach is one of the best-known methods for detecting pedestrians at nighttime. After the
multispectral pedestrian dataset was published by Hwang et al. (2015), research on this dataset followed
immediately. Choi et al. (2016) fed both RGB images and FIR (far-infrared), or thermal, images into a CNN at the
same time. Similarly, SSD (Single Shot Detector) (Liu et al., 2016) was applied to multispectral pedestrian
detection (Hou et al., 2018). Instead of applying SSD directly, the authors applied pixel-level image fusion to
obtain the best feature information. Furthermore, an RPN was applied to the multispectral method (Konig et al.,
2017), with boosted decision trees (Zhang et al., 2016) used for the classification task. Combining RGB and
thermal images in the RPN proved to produce better results.
    In contrast, Kruthiventi et al. (2017) proposed a method that extracts multi-modal-like features, similar to
those of thermal images, while using only RGB images as input. They used ResNet50 as the base network to build
two networks called “ResNet-teacher” and “ResNet-student”. An overview of the networks is shown in figure 2.
They claimed that their “ResNet-student” network achieved the best average miss rate among the pedestrian
detection methods proposed for nighttime environments that use only RGB images.
    While most researchers focused on multispectral models, Chebrolu and Kumar (2019) applied Faster R-CNN
(Ren et al., 2017) to pedestrian detection in daytime and nighttime. They proposed a “brightness awareness”
model, which first determines whether the lighting environment is day or night and then detects pedestrians.
For daytime they used RGB camera images, whereas for nighttime they used thermal images.




Fig. 2. Architecture of the ResNet-teacher and ResNet-student networks (Kruthiventi et al., 2017)

    This research focuses on pedestrian detection in both daytime and nighttime environments. A unified
algorithm, YOLO v3 (Redmon and Farhadi, 2018), is used for pedestrian detection in all light conditions. By
comparing the detection performance in different light conditions, this paper shows how the performance of
DNN-based pedestrian detection is affected by lighting. The results can be used as a reference for the
development of the pedestrian safety function of autonomous vehicles. This research focuses on RGB images in
assessing the performance of the pedestrian detection task. An open-source implementation (Packyan, 2019) of
the single-stage detector YOLO v3 (Redmon and Farhadi, 2018) was adopted because it has a shorter processing
time than Faster R-CNN.

3. METHODOLOGY

3.1     Algorithm
YOLO (Redmon et al., 2016; Redmon and Farhadi, 2017, 2018) was used in this experiment. Unlike the
aforementioned R-CNN family methods, YOLO is a single-stage detector; that is, the classification and regression
tasks are run simultaneously.
    Three YOLO versions have been published since 2016. YOLO v1 (Redmon et al., 2016) pioneered the
single-stage detector. As depicted in figure 3, YOLO divides an image into an S × S grid. Each grid cell then
predicts 2 bounding boxes, whose parameters are x, y, width, height, and confidence. The role of x and y in YOLO
is similar to that of the anchor in Faster R-CNN. The confidence indicates whether an object exists in the box and
is obtained by comparing the predicted box with the ground truth bounding box through the IoU (Intersection
over Union).
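
The grid assignment and the IoU comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the experiment: the function names `responsible_cell` and `iou` are hypothetical, boxes are assumed to be given as (x_center, y_center, width, height), and the default S = 7 is the value reported in the YOLO v1 paper rather than a value stated in this text.

```python
def responsible_cell(x, y, S=7):
    """Return the (row, col) of the S x S grid cell containing a normalized center (x, y)."""
    # Coordinates are assumed to lie in [0, 1]; clamp so that 1.0 maps to the last cell.
    col = min(S - 1, int(x * S))
    row = min(S - 1, int(y * S))
    return row, col


def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_center, y_center, width, height)."""
    # Convert center format to corner format.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

    # Overlap is zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```

For example, two 4 × 4 boxes whose centers are 2 pixels apart horizontally overlap in a 2 × 4 region, so their IoU is 8 / 24 = 1/3.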


Fig. 3. General Flow of YOLO (Redmon et al., 2016)

    YOLO v1 is implemented in Darknet (Redmon, 2016), an open-source neural network framework. Its network
consists of 24 convolutional layers followed by 2 fully connected (FC) layers. 1×1 reduction layers are used to
reduce the feature space and are followed by 3×3 convolutional layers. The loss is given in equation (1):



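\begin{equation}
\begin{aligned}
& \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}}
  \left[ \left( \sqrt{\omega_i} - \sqrt{\hat{\omega}_i} \right)^2 + \left( \sqrt{h_i} - \sqrt{\hat{h}_i} \right)^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left( C_i - \hat{C}_i \right)^2 \\
& + \lambda_{\mathrm{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\mathrm{obj}} \sum_{c \in \mathrm{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
\tag{1}
\end{equation}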


where the loss combines the coordinate error, the objectness score, and the classification error. All notations
follow the explanation in the previous paragraph. C represents the confidence score, $\mathbb{1}_{i}^{\mathrm{obj}}$
indicates that an object exists in cell i, and $\mathbb{1}_{ij}^{\mathrm{obj}}$ extends this notation, where j is
the bounding box predictor in cell i. In addition, the symbols with a hat represent the ground truth values, and
the symbols without a hat are the predicted values.
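
As a rough illustration of how the five terms of equation (1) combine, the loss can be written with NumPy as below. This is a minimal sketch under stated assumptions, not the implementation used in this research: the function name `yolo_v1_loss` and the array layout are hypothetical, the responsibility mask is assumed to be given, the cell-level indicator is derived as "any predictor in the cell is responsible", and the default weights 5 and 0.5 are the values reported in the YOLO v1 paper rather than values stated in this text.

```python
import numpy as np


def yolo_v1_loss(pred_box, true_box, pred_cls, true_cls, obj_ij,
                 lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared YOLO v1 style loss.

    pred_box, true_box: arrays of shape (S*S, B, 5) holding x, y, w, h, C
    pred_cls, true_cls: arrays of shape (S*S, num_classes) with class probabilities
    obj_ij: 0/1 array of shape (S*S, B); 1 where predictor j of cell i is responsible
    """
    obj_ij = obj_ij.astype(float)
    noobj_ij = 1.0 - obj_ij
    obj_i = obj_ij.max(axis=1)  # assumption: a cell holds an object if any predictor is responsible

    # Squared errors per cell and predictor (widths and heights must be non-negative).
    xy_err = ((pred_box[..., 0:2] - true_box[..., 0:2]) ** 2).sum(axis=-1)
    wh_err = ((np.sqrt(pred_box[..., 2:4]) - np.sqrt(true_box[..., 2:4])) ** 2).sum(axis=-1)
    conf_err = (pred_box[..., 4] - true_box[..., 4]) ** 2
    cls_err = ((pred_cls - true_cls) ** 2).sum(axis=-1)

    return float(lambda_coord * (obj_ij * xy_err).sum()
                 + lambda_coord * (obj_ij * wh_err).sum()
                 + (obj_ij * conf_err).sum()
                 + lambda_noobj * (noobj_ij * conf_err).sum()
                 + (obj_i * cls_err).sum())
```

Because every term is a squared difference, the sketch returns zero when predictions and ground truth coincide, and the two weights control how strongly localization errors and background confidences are penalized.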
    Following the success of YOLO v1, YOLO v2 (Redmon and Farhadi, 2017) was published. It aimed to improve on
YOLO v1, which was still unable to detect occluded objects or many objects located side by side. The
improvements in YOLO v2 include batch normalization, a new architecture, and the anchor box approach. Batch
normalization improved the mAP by more than 2%. In addition, YOLO v2 used Darknet-19, which consists of 19
convolutional layers and 5 max-pooling layers. The critical improvement was the introduction of anchor boxes
for bounding box prediction. The anchor box in YOLO v2 serves as the reference from which the bounding box is
predicted, similar to the one introduced in Faster R-CNN (Ren et al., 2017).
    Finally, YOLO v3 was released in 2018 (Redmon and Farhadi, 2018). Compared to its predecessor, the changes
were incremental. First, it uses a new architecture called Darknet-53. As its name implies, it has 53
convolutional layers, which makes it deeper than Darknet-19, and it is built from successive 3×3 and 1×1
convolutional layers. YOLO v3 also predicts an objectness score for each bounding box using logistic regression.
If a bounding box prior overlaps a ground truth object more than any other bounding box prior, its objectness
score should be 1. A prior that is not assigned to a ground truth object incurs no loss for the coordinate or
class predictions. In addition, YOLO v3 makes predictions at 3 different scales, an idea adopted from the Feature
Pyramid Network (FPN) concept (Lin et al., 2017). At each scale, it predicts three parts for every box: the
bounding box, the objectness, and the class predictions (80 classes for the COCO dataset). It then takes the
feature map from 2 layers earlier and upsamples it, and after several more convolutional layers it predicts a
similar tensor at the next scale. Finally, the same procedure is applied once more to obtain the third scale. In
this research, YOLO v3 is used to evaluate the performance of pedestrian detection in different light conditions.
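
To make the three prediction scales more concrete, the following sketch computes the grid size of each scale and the total number of predicted boxes. It is an illustration under assumptions that are not stated in this text: a square 416 × 416 input and downsampling strides of 32, 16 and 8, which are the values commonly used in YOLO v3 configurations; the function names are hypothetical.

```python
def yolo_v3_output_shapes(input_size=416, num_classes=80,
                          boxes_per_cell=3, strides=(32, 16, 8)):
    """Return the (rows, cols, depth) of the prediction tensor at each scale."""
    # Each box carries 4 offsets, 1 objectness score and one score per class.
    depth = boxes_per_cell * (4 + 1 + num_classes)
    return [(input_size // s, input_size // s, depth) for s in strides]


def total_predicted_boxes(shapes, boxes_per_cell=3):
    """Count the boxes predicted over all scales before any thresholding."""
    return sum(rows * cols * boxes_per_cell for rows, cols, _ in shapes)


shapes = yolo_v3_output_shapes()
print(shapes)                         # [(13, 13, 255), (26, 26, 255), (52, 52, 255)]
print(total_predicted_boxes(shapes))  # 10647
```

The finer grids at the later scales are what allow smaller objects, such as distant pedestrians, to be assigned to their own cells.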

3.2     Dataset Preparation

The KAIST Multispectral Pedestrian dataset (Hwang et al., 2015) is one of the best-known datasets for evaluating
pedestrian detection in different light conditions. It was produced by the Korea Advanced Institute of Science
and Technology in South Korea. It contains two types of pictures: one captured by an RGB camera and the other
captured by an infrared camera. Three kinds of places were recorded: campus, downtown, and road. Each place has
day and night scenarios.
    This dataset follows the annotation format used in the Caltech dataset (Dollar et al., 2010). All annotations
give the object locations in pixel coordinates. They are saved in one .txt file per frame, and each file shares its
filename with both the RGB image and the infrared image. The dataset labels three object classes: person, people,
and cyclist. The label “people” refers to a group of several persons, although there is no clear definition of
what constitutes a group. Additionally, if a group of several persons has been labeled as “people”, the
individual persons are not labeled again as “person”. However, some images contain several “person” labels
that are not grouped under a “people” label. In the training dataset, the objects are
contained in 14100 images of daytime scenario and in 8058 images of nighttime scenario. In addition, about 2800
daytime images and 1600 nighttime images in the test dataset are used for the evaluation of the performance of the
pedestrian detection.
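
Since the annotations are stored as pixel coordinates while YOLO-style training labels are usually normalized, a conversion step of the following kind is typically needed. The sketch below is hypothetical and not taken from the code used in this research. It assumes that each object line has the form `<label> <x> <y> <width> <height> ...` with (x, y) the top-left corner, that header lines start with `%`, that the images are 640 × 512 pixels, and that the class ids follow the mapping in `CLASS_IDS`; all of these are assumptions that should be checked against the dataset files.

```python
# Hypothetical class-id mapping for the three labels described in the text.
CLASS_IDS = {"person": 0, "people": 1, "cyclist": 2}


def parse_annotation_line(line):
    """Parse one object line into (label, x, y, w, h) in pixels, or None if not an object."""
    parts = line.split()
    if len(parts) < 5 or parts[0].startswith("%"):
        return None  # blank, header, or malformed line
    label = parts[0]
    x, y, w, h = (float(v) for v in parts[1:5])
    return label, x, y, w, h


def to_yolo_row(label, x, y, w, h, img_w=640, img_h=512):
    """Convert a top-left pixel box to (class_id, x_center, y_center, width, height) in [0, 1]."""
    if label not in CLASS_IDS:
        return None  # skip labels outside the three classes
    return (CLASS_IDS[label],
            (x + w / 2) / img_w,
            (y + h / 2) / img_h,
            w / img_w,
            h / img_h)


parsed = parse_annotation_line("person 100 200 40 80 0 0 0 0 0 0 0")
print(to_yolo_row(*parsed))  # (0, 0.1875, 0.46875, 0.0625, 0.15625)
```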

Table 1. Training and Validation Configuration.
    Parameter                                                          Value
    Epoch                                                              50
    Batch Size                                                         10
    Weights                                                            Darknet-53.conv.74
    Learning Rate                                                      0.001
    IoU threshold                                                      0.5
    Confidence threshold                                               0.8
    NMS threshold                                                      0.4

3.3    Training and Validation

The PyTorch open-source code used in this research was created by Packyan (2019). The configuration was the
same as that of the original YOLO v3, except that the network architecture was modified to match the number of
classes. As the KAIST Multispectral Pedestrian dataset has three object classes, the number of filters in the
convolutional layers that produce the detection outputs follows equation (2):

                                             filters = 3 × (4 + 1 + number of classes)                                              (2)

where 3 stands for the 3 predicted boxes, 4 for the 4 bounding box offsets, and 1 for the objectness prediction.
With three classes, equation (2) yields 24 filters for these convolutional layers.
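
As a worked instance of equation (2), the filter count for the three KAIST classes can be compared with the count for the 80 class predictions mentioned in section 3.1:

```latex
\begin{aligned}
\text{filters}_{3\ \text{classes}}  &= 3 \times (4 + 1 + 3)  = 3 \times 8  = 24 \\
\text{filters}_{80\ \text{classes}} &= 3 \times (4 + 1 + 80) = 3 \times 85 = 255
\end{aligned}
```

Only the layers that feed the detection outputs change with the number of classes; the rest of the network keeps its original filter counts.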
    The training parameters are listed in table 1. The number of epochs was 50 and the batch size was 10. The
pretrained Darknet-53 weights provided by Redmon (2016) were loaded. The learning rate was set to 0.001. The
network was trained on an NVIDIA RTX 2070 Super GPU and an Intel Xeon E5-1650 v4 3.60 GHz CPU.
    For validation, mAP (mean Average Precision) was used to evaluate the performance, and some sample
detection images are also shown. The thresholds are listed in table 1: the IoU (Intersection over Union)
threshold was 0.5, the confidence threshold was 0.8, and the NMS (Non-Maximum Suppression) threshold was 0.4.
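
The role of the confidence and NMS thresholds in table 1 can be illustrated with a generic greedy, per-class non-maximum suppression. This is a sketch of the standard technique and may differ in detail from the post-processing in the adopted repository; the function names and the detection tuple layout (x1, y1, x2, y2, score, class_id) are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as corner coordinates (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_thres=0.8, nms_thres=0.4):
    """Keep confident detections and suppress overlapping ones of the same class.

    detections: list of tuples (x1, y1, x2, y2, score, class_id)
    """
    # Step 1: drop detections below the confidence threshold, then rank by score.
    candidates = sorted((d for d in detections if d[4] >= conf_thres),
                        key=lambda d: d[4], reverse=True)
    kept = []
    # Step 2: repeatedly keep the best box and remove same-class boxes that overlap it too much.
    while candidates:
        best = candidates.pop(0)
        kept.append(best)
        candidates = [d for d in candidates
                      if d[5] != best[5] or box_iou(best[:4], d[:4]) <= nms_thres]
    return kept
```

With a confidence threshold as high as 0.8, few low-scoring boxes survive the first step, so the reported precision reflects only the detections the network is most certain about.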

4. EXPERIMENTAL RESULTS

After each training epoch, validation was conducted on the test dataset and the mAP was recorded. The values
are plotted in figure 4. The solid line denotes the performance in daytime, whereas the dashed line indicates the
performance in nighttime. In the daytime experiment, the detection precision for “person” was around 45%. For
“people” and “cyclist”, the precision values were relatively low. This was due to class imbalance among the
“person”, “people”, and “cyclist” labels: very few training images include the “people” and “cyclist” labels.
As a result, the mAP averaged over all classes was lowered. Thus, this paper focuses on the detection of
“person”. Figure 4 clearly shows that the pedestrian detection algorithm performed better in the daytime
environment than in the nighttime environment.
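
For readers unfamiliar with the metric, the average precision of one class can be computed from ranked detections roughly as follows. The exact evaluation procedure of the adopted code is not described in this paper, so this is only a sketch of the common all-point interpolated form; the function name and its inputs are hypothetical. A detection is assumed to have already been marked as a true positive when it matches an unmatched ground truth box with IoU of at least 0.5.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for a single class."""
    if num_ground_truth == 0 or not scores:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for k in order:
        if is_true_positive[k]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Make precision non-increasing from right to left (precision envelope).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangle areas between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

The mAP is then the mean of this value over the classes, which is why the poorly detected “people” and “cyclist” classes pull the overall figure down.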
    Figure 5 shows sample pictures of the ground truth and the detection results in the daytime environment.
Each row shows two pairs of pictures, where the left picture of each pair is the ground truth and the right
picture is the detection result. In the ground-truth pictures, the “person”, “people”, and “cyclist” bounding
boxes are colored green, pink, and yellow, respectively. The detection results show a bounding box and a text
label. The last two images in the first row of figure 5 illustrate the problem that “people” is incorrectly
recognized as several “persons”. This proved to be the main issue in the daytime experiment. In the future, the
labels will be optimized to overcome this issue.
    Figure 6 visualizes the experimental results in the nighttime environment. The pictures fall into two
situations depending on the environment: darkness and over-exposure. In both situations, “person” could not
be detected correctly, as shown in both pairs of the first row of figure 6. In the over-exposure environment in
particular, the pedestrian’s appearance blends into the background, which causes missed detections. In the
darkness environment, the pedestrian is sometimes not visible in the RGB images, as presented in the first pair
of the first row. Future improvements may focus on darkness and over-exposure.




Fig. 4. Mean Average Precision for both daytime (solid line) and nighttime test (dashed line)

5. CONCLUSION

This paper presented a performance comparison of DNN-based pedestrian detection in different lighting
conditions, in order to answer the research question: how much is the performance of DNN-based pedestrian
detection affected by the lighting conditions? This research adopted YOLO as the pedestrian detection algorithm
and assessed its performance on the KAIST Multispectral Pedestrian dataset. The experimental results
indicated that the performance of DNN-based pedestrian detection was significantly affected by the lighting
conditions. In the daytime condition, 45% precision for person detection could be achieved, but only 20%
precision was obtained for person detection in the nighttime condition. One reason for the incorrect detection
results in the daytime experiment is the type of labels in the dataset. By comparing the detection results in
daytime and nighttime environments, this research found that both darkness and over-exposure could affect the
performance of DNN-based pedestrian detection in nighttime environments.
    In the future, an infrared camera may be considered to mitigate the problem caused by darkness, and
brightness suppression and adaptation on the RGB camera may be studied to reduce incorrect detections in
over-bright environments. In addition, the dataset may be re-labeled to allow a more accurate evaluation.




Fig. 5. Sample pictures of ground truth and detection results at daytime environment (in each pair, left: ground
truth; right: detection)


Fig. 6. Sample pictures of ground truth and detection results at nighttime environment (in each pair, left: ground
truth; right: detection)




REFERENCES

2017 Road Safety Statistics: What Is behind the Figures? (2017). Brussels, Belgium.
Chebrolu, K.N.R., and Kumar, P.N. (2019). Deep learning based pedestrian detection at all light conditions, In:
     Proceedings of the 2019 IEEE International Conference on Communication and Signal Processing, ICCSP
     2019, 838–842. https://doi.org/10.1109/ICCSP.2019.8698101
Choi, H., Kim, S., Park, K., and Sohn, K. (2016). Multi-spectral pedestrian detection based on accumulated object
     proposal with fully convolutional networks, In: Proceedings - International Conference on Pattern
     Recognition. https://doi.org/10.1109/ICPR.2016.7899703
Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014). Rich feature hierarchies for
     accurate object detection and semantic segmentation, In: Proceedings of the IEEE Computer Society
     Conference on Computer Vision and Pattern Recognition, 1, 5000. https://doi.org/10.1109/CVPR.2014.81
Girshick, R. (2015). Fast R-CNN, In: Proceedings of the IEEE International Conference on Computer Vision.
     https://doi.org/10.1109/ICCV.2015.169
Hou, Y.L., Song, Y., Hao, X., Shen, Y., Qian, M., and Chen, H. (2018). Multispectral pedestrian detection based
     on deep convolutional neural networks. Infrared Physics and Technology, 94, 69–77.
     https://doi.org/10.1016/j.infrared.2018.08.029
Hwang, S., Park, J., Kim, N., Choi, Y., and Kweon, I.S. (2015). Multispectral pedestrian detection: Benchmark
     dataset and baseline, In: Proceedings of the IEEE Computer Society Conference on Computer Vision and
     Pattern Recognition, 07-12-June, 1037–1045. https://doi.org/10.1109/CVPR.2015.7298706
Konig, D., Adam, M., Jarvers, C., Layher, G., Neumann, H., and Teutsch, M. (2017). Fully Convolutional Region
     Proposal Networks for Multispectral Person Detection, In: IEEE Computer Society Conference on Computer
     Vision and Pattern Recognition Workshops, 2017-July, 243–250. https://doi.org/10.1109/CVPRW.2017.36
Kruthiventi, S.S.S., Sahay, P., and Biswal, R. (2017). Low-light pedestrian detection from RGB images using
     multi-modal knowledge distillation, In: 2017 IEEE International Conference on Image Processing (ICIP),
     4207–4211. https://doi.org/10.1109/ICIP.2017.8297075
Lan, W., Dang, J., Wang, Y., and Wang, S. (2018). Pedestrian detection based on yolo network model, In:
     Proceedings of 2018 IEEE International Conference on Mechatronics and Automation, ICMA 2018.
     https://doi.org/10.1109/ICMA.2018.8484698
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017). Feature pyramid networks for
     object detection, In: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition,
     CVPR 2017, 2017-Janua, 936–944. https://doi.org/10.1109/CVPR.2017.106
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016). SSD: Single shot
     multibox detector. Lecture Notes in Computer Science, 9905 LNCS, 21–37.
     https://doi.org/10.1007/978-3-319-46448-0_2
Packyan. (2019). PyTorch-Yolov3-kitti. Retrieved from https://github.com/packyan/PyTorch-YOLOv3-kitti
Redmon, J. (2016). Darknet: Open Source Neural Networks in C. Retrieved from http://pjreddie.com/darknet/
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You only look once: Unified, real-time object
     detection, In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
     Recognition. https://doi.org/10.1109/CVPR.2016.91
Redmon, J., and Farhadi, A. (2017). YOLO9000: Better, faster, stronger, In: Proceedings - 30th IEEE Conference
     on Computer Vision and Pattern Recognition, CVPR 2017. https://doi.org/10.1109/CVPR.2017.690
Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. Retrieved from
     http://arxiv.org/abs/1804.02767
Ren, S., He, K., Girshick, R., and Sun, J. (2017). Faster R-CNN: Towards Real-Time Object Detection with Region
     Proposal Networks, In: IEEE Transactions on Pattern Analysis and Machine Intelligence.
     https://doi.org/10.1109/TPAMI.2016.2577031
Tesla Deaths. (2020). Retrieved January 31, 2020, from https://www.tesladeaths.com/
Tian, Y., Luo, P., Wang, X., and Tang, X. (2015). Deep learning strong parts for pedestrian detection, In:
     Proceedings of the IEEE International Conference on Computer Vision.
     https://doi.org/10.1109/ICCV.2015.221
Tomè, D., Monti, F., Baroffio, L., Bondi, L., Tagliasacchi, M., and Tubaro, S. (2016). Deep Convolutional Neural
     Networks for pedestrian detection. Signal Processing: Image Communication, 47, 482-489.
     https://doi.org/10.1016/j.image.2016.05.007
Zhang, L., Lin, L., Liang, X., and He, K. (2016). Is faster R-CNN doing well for pedestrian detection? Lecture
     Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes
     in Bioinformatics). https://doi.org/10.1007/978-3-319-46475-6_28
Zhao, Z. Q., Zheng, P., Xu, S. T., and Wu, X. (2019). Object Detection with Deep Learning: A Review, In: IEEE
     Transactions on Neural Networks and Learning Systems. https://doi.org/10.1109/TNNLS.2018.2876865