<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detection of different types of vehicles from aerial imagery</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jonas Uus</string-name>
          <email>jonas.uus@bpti.lt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomas Krilavičius</string-name>
          <email>tomas.krilavicius@bpti.lt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Applied Informatics faculty, Vytautas Magnus University</institution>
          ,
          <addr-line>Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vytautas Magnus University</institution>
          ,
          <addr-line>Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
          ,
          <institution>Baltic Institute of Advanced Technology</institution>
          ,
          <addr-line>Vilnius</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
      </contrib-group>
      <fpage>80</fpage>
      <lpage>85</lpage>
      <abstract>
        <p>Accurate detection of vehicles in large amounts of imagery is one of the harder object detection tasks, as the image resolution can be as high as 16K or sometimes even higher. Variation in vehicle size and orientation (the direction they face) is another challenge to overcome to achieve acceptable detection quality. Vehicles can also be partially obstructed or cut off, and it may be hard to differentiate between an object's colour and its background. The small size of vehicles in high resolution images complicates accurate detection even more. CNNs are among the most promising methods for image processing; hence, it was decided to use their implementation in YOLO V3. To deal with large high resolution images, a method for splitting/recombining and augmenting images was developed. The proposed approach achieved 81.72% average precision in vehicle detection. The results show the practical applicability of this approach for vehicle detection; however, to reach higher accuracy on the tractor, off-road and van categories, the vehicle counts in the different categories need to be balanced, i.e. more examples of the mentioned vehicles are required.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Vehicles’ detection from aerial photography is a very important and quite difficult task, especially when it is performed in real time or when high resolution aerial or satellite images are used, such as the 18000x18000 px. images in the COWC [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] dataset. As drones are used in more and more sectors (according to CB Insights, unmanned aerial vehicles (UAVs) could currently be used in 38 different sectors [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), the volume of video and photo material from drones is increasing, and the need for solutions that make use of this unprecedented amount of data has become pronounced (at the time of writing, YouTube returns more than 3.3 million results for the "aerial footage" query). Manually annotating vehicles in videos or high resolution images takes a lot of resources; thus, the vehicle detection task needs to be automated.
      </p>
      <p>
        In this paper we investigate the applicability of Convolutional Neural Networks. Due to its good performance [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we use the YOLO V3 (You Only Look Once) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] CNN as the tool with which we apply the proposed image splitting/merging method.
      </p>
      <p>Moreover, we split the image into fixed overlapping rectangular frames (a sliding window method).</p>
      <p>© 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        Some results show that YOLO V2 performs quite well with aerial imagery once modifications are applied: "First making the net shallower to increase its output resolution. Second changing the net shape to more closer match the aspect ratio of the data." [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        In another vehicle detection solution, a newer YOLO version was used [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Images were taken from 3 publicly available datasets: VEDAI, COWC and DOTA. The model had good test results for small, rotated, compact and dense objects, with 76.7% mAP and 92% recall.
      </p>
      <p>None of these solutions used a splitting and remerging technique with overlapping images; they used already pre-split images.</p>
    </sec>
    <sec id="sec-2">
      <title>II. PROBLEM</title>
      <p>As computing speed increases and neural networks are being optimised, it was decided to apply the best image augmentation/splitting/remerging methods to vehicle detection. In the application of a neural network, the following set of problems becomes apparent:
1) The dataset contains images of various resolutions (HD, Full HD, 2K ...).
2) Vehicle sizes in the dataset are uneven, influenced by different ground sample distances (GSD).
3) Vehicle counts per category are uneven, with more cars than all other vehicle categories combined.
4) Almost all fully connected convolutional neural networks have a fixed-size first layer, and all images must be resized to fit it.
5) Vehicles can be partially obstructed (only part of the vehicle can be seen).
6) It is hard to differentiate vehicles from the background (for example, a black car parked in a shadow).
7) Vehicles may face multiple directions depending on the camera flight direction and its rotation.
8) Available vehicle detection solutions are limited to detecting a small number of features.
9) After re-merging the split images, the same vehicle may be detected multiple times.</p>
      <p>
        Currently existing vehicles’ detection solutions are subject to company trade secrets, and companies do not openly discuss technical specifications or application results (for example, the web platform Supervisely [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). That is why it is difficult, or sometimes impossible, to adapt them or add functionality; some solutions are based on older versions of neural networks (for as long as they are functional) and detect only a few vehicle categories. For example, one of the vehicle detection solutions [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] detects vehicle features based only on their size (either a small or a large vehicle). Also, currently available solutions which use CNNs mostly work with fixed-size input images or rescale them to a fixed size, as existing deep convolutional neural networks (CNNs) require fixed-size (e.g. 224x224) input images [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. As rescaling is detrimental to the features of small objects within images, the images are instead split into smaller pieces; after vehicles are detected in each piece individually, the pieces are remerged into the full-sized image. For example, if a high resolution image such as 4K is rescaled to 608 by 608 pixels, the width shrinks about 6 times and the height about 3.5 times, so a car's rear window of about 20 by 10 px. decreases to about 3 by 3 px.; as a result it becomes harder to differentiate between a van and a car, and the probability of misidentification increases. In the case of multiple detections in the overlapping image pieces, NMS (Non-Maximum Suppression) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is used to remove duplicates, as NMS retains only the overlapping bounding box with the highest probability (if its area overlaps by more than a preset value). The practice of YOLO application discussed herein attempts to solve all of the above problems.
      </p>
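      <p>The NMS step described here can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper; the box format, the category-aware matching and the 0.5 overlap threshold are assumptions.</p>

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def nms(detections, iou_threshold=0.5):
    """Keep, per category, only the highest-probability box among
    overlapping detections. detections: (box, category, probability)."""
    kept = []
    # Visit boxes from the most to the least confident; a box is kept
    # only if it does not overlap an already kept box of its category.
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        box, cat, _ = det
        if all(k[1] != cat or iou(box, k[0]) <= iou_threshold
               for k in kept):
            kept.append(det)
    return kept
```

      <p>With this filter, two heavily overlapping "car" boxes collapse into the more confident one, while a distant "van" box is unaffected.</p>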
    </sec>
    <sec id="sec-3">
      <title>III. DATASET</title>
      <p>
        The MAFAT tournament [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] provided the images used for training and validation, together with a csv file of boxes and classes; however, the csv file was created with a classification task in mind and was not used. The images were adapted for the object detection task, as the original dataset was created for classification: not every object was annotated, and some false positives [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] (also called false detections, where a vehicle is annotated where there is none) were present. Every image was manually annotated, and some were removed. The removed images were not taken orthogonally to the ground but at an angle; only images with a top-down view were kept. For image augmentation, horizontal and vertical flipping and rotation at 45° intervals were used.
      </p>
      <p>The dataset image counts are as follows:
1) 1712 images were chosen as training images, about 80% of the original training dataset images.
2) After splitting the training images into 500x500 pixel pieces, the image count rose to 9141.
3) 1986 images were chosen for validation, about 78% of the original validation dataset images.
4) 12 227 vehicles were manually annotated in the training dataset, Fig. 1.
5) 10 914 vehicles were manually annotated in the validation dataset, Fig. 2.</p>
      <p>The number of vehicles used in the training images is presented in Fig. 1, and in the validation dataset in Fig. 2.
The characteristics of the dataset images:
1) Images were taken in a variety of locations, some in cities, others in rural areas.
2) Images were taken at different times of day.
3) Vehicles were lit from different sides.
4) Image resolutions differed, from 900x600 px. to 4010x3668 px.
5) Some parts of images were darkened out (for example, one half of an image was made completely black while the other half retained the picture).
6) The GSD (ground sample distance) of the images varied between 5 and 15 cm.
7) Objects in images might have been obstructed by trees or cut off, so that only part of a vehicle could be seen (for example, a car parked in a garage, or a car near the edge of the image).</p>
      <p>A couple of example images taken from the dataset are shown in Fig. 3.</p>
    </sec>
    <sec id="sec-4">
      <title>Dataset summary</title>
      <p>The variation in image resolutions can be seen in Table I. The categories of vehicles being detected were cars, jeeps, large vehicles, vans and tractors; examples of images in the dataset are shown in Fig. 3.</p>
      <p>The above dataset was considered sufficient for the evaluation of the developed method.</p>
    </sec>
    <sec id="sec-5">
      <title>IV. PROPOSED SOLUTION</title>
      <p>The objective was to develop a method for the identification of diverse vehicles.</p>
      <sec id="sec-5-1">
        <title>Image resolution and sizes</title>
        <p>
          The use of CNNs is complicated because the dataset contains images of different resolutions (HD, Full HD, 2K ...) and vehicles of uneven sizes, see Sect. III. The different vehicle sizes in the images result from different ground sample distances (GSD) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. As almost all convolutional neural networks have a fixed-size first layer [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], all images are resized to that layer's size, so if an image's resolution is as high as 16K and it is resized to, for example, 608x608 px., all of the small vehicle features will disappear from the resulting image. For this reason we propose to split the image into fixed overlapping rectangular frames (a sliding window method). This produces a double detection problem, as a vehicle may be detected in both of two overlapping frames. To remove the duplicates, NMS (Non-Maximum Suppression) is used [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. If two or more bounding boxes of the same vehicle category overlap, the box with the highest detection probability is kept, while the others are removed. The amount of overlap is determined by finding the largest possible vehicle size in the dataset. This ensures that if a vehicle is cut off in one of the frames, it is fully visible in another.
        </p>
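        <p>A minimal sketch of the splitting step and of mapping per-frame detections back to full-image coordinates. The 500 px frame size matches Sect. III, while the 100 px overlap and the helper names are assumptions for illustration.</p>

```python
def split_into_frames(width, height, frame=500, overlap=100):
    """Return (x, y) origins of fixed-size overlapping frames that
    cover a width x height image (a sliding window). Images smaller
    than the frame get a single origin at (0, 0) and would be padded."""
    step = frame - overlap
    xs = list(range(0, max(width - frame, 0) + 1, step))
    ys = list(range(0, max(height - frame, 0) + 1, step))
    # Always cover the right and bottom edges of the image.
    if xs[-1] + frame < width:
        xs.append(width - frame)
    if ys[-1] + frame < height:
        ys.append(height - frame)
    return [(x, y) for y in ys for x in xs]

def to_image_coords(frame_origin, box):
    """Shift a box detected inside a frame back into full-image
    coordinates so detections from all frames can be merged."""
    ox, oy = frame_origin
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)
```

        <p>After every frame's boxes are shifted into image coordinates, NMS removes the duplicates produced in the overlap zones; the overlap itself should be at least the largest vehicle size in the dataset, so a vehicle cut off in one frame appears whole in a neighbouring one.</p>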
      </sec>
      <sec id="sec-5-2">
        <title>Image obstruction</title>
        <p>One more problem with vehicle detection is that vehicles can be partially obstructed (only part of the vehicle can be seen), for example when a car is half parked in a garage, when a car is parked alongside a tree whose branches obstruct its features, or when a car is at the edge of the image.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Orientation</title>
        <p>As vehicles' orientation in the images is not constant, they may face multiple directions depending on the camera flight direction and its rotation. To solve this orientation problem, the images are augmented with rotation at 45° intervals, Fig. 4.</p>
        <p>Fig. 4: The same image rotated at 45° intervals: (a) 0° (original), (b) 45°, (c) 90°, (d) 135°, (e) 180°, (f) 225°, (g) 270°, (h) 315°.</p>
        <p>To increase the image count, images were augmented by rotating them at 45-degree intervals. Additionally, dataset images were augmented by flipping them vertically, horizontally, and both horizontally and vertically, Fig. 5.</p>
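        <p>A sketch of the flip augmentations, using numpy arrays for the axis-aligned cases; this is illustrative, not the paper's code, and the intermediate 45° rotations additionally require an interpolating rotate (e.g. Pillow's Image.rotate with expand=True), which is omitted here.</p>

```python
import numpy as np

def augment(image):
    """Return the axis-aligned augmentations of an image array:
    horizontal flip, vertical flip, both flips, and 90-degree
    rotations. The 45-degree steps need an interpolating rotation
    and are left out of this sketch."""
    return {
        "original": image,
        "flip_h": np.flip(image, axis=1),       # mirror left-right
        "flip_v": np.flip(image, axis=0),       # mirror top-bottom
        "flip_hv": np.flip(image, axis=(0, 1)),  # both mirrors
        "rot90": np.rot90(image, 1),
        "rot180": np.rot90(image, 2),
        "rot270": np.rot90(image, 3),
    }
```

        <p>Note that the bounding box annotations must be transformed with the same flip or rotation as the image, otherwise the labels no longer match the pixels.</p>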
      </sec>
    </sec>
    <sec id="sec-6">
      <title>V. EXPERIMENTS</title>
      <sec id="sec-6-1">
        <title>A. Tools</title>
        <p>For the experiments, the convolutional neural network YOLO V3 was used on the Darknet framework. The YOLO V3 architecture is presented in Fig. 6.</p>
        <p>
          On the original YOLO repository, the detection loss climbed to infinity during training whenever any single parameter was changed; thus another forked repository [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] from github, which does not have this issue, was used instead. For YOLO V3 to work with the splitting/merging workflow, the original source code was modified. To know when training had to be terminated, the average loss value was observed.
        </p>
        <p>Fig. 5: Augmented images: (a) original, (b) flipped horizontally, (c) flipped vertically, (d) flipped vertically and horizontally.</p>
        <p>
          Fig. 6: YOLO V3 architecture [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]
        </p>
        <p>It was observed that if any bigger change was to be carried out on the neural network, such as adding a new object category, the network should be trained from previous weights in which it had been more generic at detections. Training from scratch after changing parameters would be even better, but would take longer. It was also observed that YOLO detects a new class better when the previous best weights are not used.</p>
        <p>Also, it is hard to differentiate an off-road vehicle from a car when looking from above, as the body shape of an off-road may differ only slightly (for example, by being wider); thus off-roads were annotated as cars. The jeep category is hard too, as the only difference between a car and a jeep is that a jeep has a rear spare tire attached or has a truck bed (like a pickup).</p>
      </sec>
      <sec id="sec-6-2">
        <title>B. Dataset</title>
        <p>Vehicle categories such as cars, jeeps, large vehicles, vans and tractors need to be detected in the aerial photographs, and their positions need to be marked by drawing a bounding box around each object. At first, the car class had been divided into hatchbacks and sedans, but during manual annotation it was observed that if a car is half obstructed and only its front part can be seen, it is impossible to tell whether it is a sedan or a hatchback, as the differentiating factors, the size of the rear glass and the trunk/boot, cannot be seen. For this reason, sedans and hatchbacks were merged into one vehicle class.</p>
        <p>As the dataset contained mostly cars, YOLO learned that, if unsure, it should ascribe an object to the car category; that way it could reach a better mAP in the long run than by guessing rarer classes. This non-homogeneous dataset problem appears whenever the dataset has a different number of vehicles per class. It could be solved by adding images in which the rarer classes' vehicles appear, or by augmenting a larger number of rarer-class images than images of other vehicles.</p>
        <p>The cross-validation statistical method was used during YOLO training: the dataset was divided into training and validation images. The neural network cannot see any of the validation images during training; it sees them only when its performance is validated. This method is used to prevent overfitting. The following modifications were performed on the training and validation images in the dataset:
1) Modifying the images' slicing/overlapping parameter values.
2) Fixing wrongly annotated vehicle data and bounding box locations in the datasets.
3) Changing the vehicle class count by adding classes, merging existing ones and then reannotating the dataset.
4) Choosing images from the dataset for training/validation.
5) Experimenting with image manipulations (vertical/horizontal flipping, image rotation), which drastically increased the dataset size. These manipulations were coded manually, as YOLO, unlike Tensorflow, does not have them integrated.</p>
        <p>The following modifications were done on YOLO:
1) Changing the YOLO layer resolution (mostly the first layer, as all images are resized to the first layer's resolution).
2) Experimenting with different YOLO configurations and different layer counts.
3) Changing network parameters (such as anchors, recalculating certain layer sizes after vehicle class modifications, and the learning rate).
4) Adding a module to darknet for easier work with split images and for external communication with other programs.</p>
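        <p>As an illustration of recalculating anchors after class or dataset modifications (item 3 above), a plain k-means over the training boxes' widths and heights can be sketched as follows. This is an assumption-laden sketch: YOLO's own tooling clusters with an IOU-based distance rather than Euclidean distance, and the function name and parameters here are invented for illustration.</p>

```python
import random

def kmeans_anchors(boxes, k=9, iters=50, seed=0):
    """Cluster (width, height) pairs of training boxes into k anchor
    sizes. Plain Euclidean k-means on box dimensions; YOLO's own
    anchor tooling uses an IOU-based distance instead."""
    random.seed(seed)
    centers = random.sample(boxes, k)
    for _ in range(iters):
        # Assign each box to its nearest center.
        groups = [[] for _ in range(k)]
        for w, h in boxes:
            i = min(range(k), key=lambda j: (w - centers[j][0]) ** 2
                                            + (h - centers[j][1]) ** 2)
            groups[i].append((w, h))
        # Move each center to the mean of its group (empty groups
        # keep their previous center).
        centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return sorted(centers)
```

        <p>The resulting anchor sizes would then be written into the YOLO configuration file in place of the defaults.</p>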
      </sec>
      <sec id="sec-6-3">
        <title>C. Experiments results</title>
        <p>
          To evaluate performance, the PASCAL VOC evaluation metrics were used and the results were compared using AP (average precision) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. This metric uses the Jaccard index [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] to calculate the IOU (intersection over union) between ground truth and detection boxes.
        </p>
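        <p>The Jaccard index over two boxes reduces to intersection area divided by union area; a minimal worked example, with the corner-coordinate box format assumed:</p>

```python
def jaccard(a, b):
    """IOU of two boxes (x1, y1, x2, y2): |A ∩ B| / |A ∪ B|."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0  # disjoint boxes share no area
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1])
             - inter)
    return inter / union

# A 20x20 ground-truth box and a detection shifted by 5 px on each
# axis: intersection 15x15 = 225, union 2*400 - 225 = 575, IOU ≈ 0.39.
print(jaccard((0, 0, 20, 20), (5, 5, 25, 25)))
```

        <p>In the PASCAL VOC protocol, a detection counts as a true positive when its IOU with a ground-truth box reaches at least 0.5.</p>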
        <p>After training, the YOLO V3 neural network managed to detect cars with 78.69% average precision (AP), Fig. 7, and large vehicles with 44.85% AP, Fig. 8. Other vehicle categories, such as jeeps, vans and tractors, were detected but often wrongly categorised, which is why their detection AP was very low. To solve this problem, the dataset needs a more balanced count of vehicles in every category.</p>
        <p>Fig. 8: Precision and recall curve for large vehicle category</p>
        <p>
          The above figures show how precision and recall are correlated: for example, if we choose precision at 95%, then 45% of the cars in the validation images were detected at that level of precision. The F-score [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] at this precision level is 0.61; if recall increases to 80%, then precision drops to 75%, and the F-score becomes 0.77. When all categories were merged into one and the results were validated again, average precision increased to 81.72%, Fig. 9. This indicates that for detection precision to increase, YOLO V3 needs to classify categories more accurately.
        </p>
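        <p>The F-score values quoted above can be reproduced from the harmonic mean of precision and recall; a small check using the two operating points read off the car precision/recall curve:</p>

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Operating points from the car precision/recall curve:
print(round(f_score(0.95, 0.45), 2))  # 0.61
print(round(f_score(0.75, 0.80), 2))  # 0.77
```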
        <p>Fig. 9: Precision and recall graph when all vehicles are merged into one category</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>VI. CONCLUSIONS</title>
      <p>This application could be used for statistics (counting how many vehicles there are in a given image), vehicle tracking, prediction of further vehicle movement direction, and real-time vehicle detection from a live video feed. The vehicles' detection application was created so that users could easily configure it and carry out the detection task more easily: the user only needs to input images and a couple of parameters to execute vehicle detection with the CNN.</p>
      <p>Results:
1) The dataset was prepared for the vehicles' detection task by manually annotating all of the vehicles in the dataset images.
2) Images were augmented to increase the dataset size.
3) A method combining the splitting and joining of images with a convolutional neural network for vehicles' detection was proposed.
4) The proposed method's performance was tested using the YOLO V3 CNN.</p>
      <p>Conclusions:
1) When YOLO V3 is used together with the proposed method, it is capable of detecting cars with 79% accuracy and large vehicles with 45% accuracy.
2) When the proposed method is used, YOLO V3 still has difficulty detecting the characteristics of other vehicles, such as off-roads, tractors and vans, which lowers the final detection result.
3) The proposed method helps to avoid losing vehicles and their features that would otherwise be lost by resizing high resolution images.
4) The dataset used for training and validation should have a more balanced count of vehicle categories (more photos with tractors, large vehicles and jeeps should be added to the dataset).</p>
    </sec>
    <sec id="sec-8">
      <title>Future work</title>
      <p>For future work, R-CNN and SSD networks will be trained on the Tensorflow framework, as those are also widely used CNNs for object detection tasks, and they will be tested using the same proposed method on the currently available datasets and on photos taken from drones. As the dataset should have a more balanced count of vehicle categories, more photos with tractors, large vehicles and jeeps should be added to it.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>T. Nathan</given-names> <surname>Mundhenk</surname></string-name>
          ,
          <string-name><given-names>Goran</given-names> <surname>Konjevod</surname></string-name>
          ,
          <string-name><given-names>Wesam A.</given-names> <surname>Sakla</surname></string-name>
          , and
          <string-name><given-names>Kofi</given-names> <surname>Boakye</surname></string-name>
          .
          <article-title>A large contextual dataset for classification, detection and counting of cars with deep learning</article-title>
          .
          <source>arXiv:1609.04453</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>CBINSIGHTS.</surname>
          </string-name>
          <article-title>38 ways drones will impact society: From fighting war to forecasting weather, uavs change everything</article-title>
          . Accessed:
          <year>2019</year>
          .
          <volume>02</volume>
          .22.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>Joseph</given-names> <surname>Redmon</surname></string-name>
          and
          <string-name><given-names>Ali</given-names> <surname>Farhadi</surname></string-name>
          .
          <article-title>YOLO: Real-time object detection</article-title>
          . Accessed: 2019.02.22.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Joseph</given-names>
            <surname>Redmon</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ali</given-names>
            <surname>Farhadi</surname>
          </string-name>
          .
          <article-title>Yolov3: An incremental improvement</article-title>
          .
          <source>CoRR</source>
          , abs/
          <year>1804</year>
          .02767,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Carlet</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bernard</given-names>
            <surname>Abayowa</surname>
          </string-name>
          .
          <article-title>Fast vehicle detection in aerial imagery</article-title>
          .
          <source>CoRR, abs/1709.08666</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <article-title>A vehicle detection method for aerial image based on yolo</article-title>
          .
          <source>Journal of Computer and Communications</source>
          , pages
          <fpage>98</fpage>
          -
          <lpage>107</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Supervise.</surname>
          </string-name>
          <article-title>The leading platform for entire computer vision lifecycle</article-title>
          . Accessed:
          <year>2019</year>
          .
          <volume>02</volume>
          .22.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Alexey</surname>
          </string-name>
          .
          <article-title>Object detection on satellite images</article-title>
          . Accessed:
          <year>2019</year>
          .
          <volume>02</volume>
          .22.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>Kaiming</given-names> <surname>He</surname></string-name>
          ,
          <string-name><given-names>Xiangyu</given-names> <surname>Zhang</surname></string-name>
          ,
          <string-name><given-names>Shaoqing</given-names> <surname>Ren</surname></string-name>
          , and
          <string-name><given-names>Jian</given-names> <surname>Sun</surname></string-name>
          .
          <article-title>Spatial pyramid pooling in deep convolutional networks for visual recognition</article-title>
          .
          <source>arXiv:1406.4729</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Adrian</given-names>
            <surname>Rosebrock</surname>
          </string-name>
          .
          <article-title>Non-maximum suppression for object detection in python</article-title>
          . Accessed:
          <year>2019</year>
          .
          <volume>02</volume>
          .22.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><surname>yuvalsh</surname></string-name>
          .
          <article-title>MAFAT Challenge - fine-grained classification of objects from aerial imagery</article-title>
          . Accessed: 2019.02.22.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><surname>Google</surname></string-name>
          .
          <article-title>Classification: True vs. false and positive vs. negative</article-title>
          . Accessed: 2019.02.22.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><surname>Wikipedia contributors</surname></string-name>
          .
          <article-title>Ground sample distance</article-title>
          . Accessed: 2019.02.22.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name><given-names>Ayoosh</given-names> <surname>Kathuria</surname></string-name>
          .
          <article-title>What's new in YOLO v3?</article-title>
          . Accessed: 2019.02.22.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Alexey</surname>
          </string-name>
          .
          <article-title>Yolo-v3 and yolo-v2 for windows and linux</article-title>
          . Accessed:
          <year>2019</year>
          .
          <volume>02</volume>
          .22.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Hui</surname>
          </string-name>
          .
          <article-title>map (mean average precision) for object detection</article-title>
          . Accessed:
          <year>2019</year>
          .
          <volume>02</volume>
          .22.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Wikipedia</surname>
          </string-name>
          .
          <article-title>Jaccard index</article-title>
          . Accessed:
          <year>2019</year>
          .
          <volume>02</volume>
          .22.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name><given-names>Marina</given-names> <surname>Sokolova</surname></string-name>
          ,
          <string-name><given-names>Nathalie</given-names> <surname>Japkowicz</surname></string-name>
          , and
          <string-name><given-names>Stan</given-names> <surname>Szpakowicz</surname></string-name>
          .
          <article-title>Beyond accuracy, f-score and roc: A family of discriminant measures for performance evaluation</article-title>
          . Volume
          <volume>4304</volume>
          , pages
          <fpage>1015</fpage>
          -
          <lpage>1021</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>