<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Combining airborne images and open data to retrieve knowledge of construction sites</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Technical University of Darmstadt</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technical University of Munich</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Construction site planning is based on both explicit knowledge, as retrieved from regulations, and implicit knowledge, arising from experience. To retrieve and formalize rules from implicit knowledge, past construction projects can be analyzed. In this paper, we present an image analysis pipeline to retrieve information on past construction sites from airborne images. We fuse machine learning based image analysis with georeferencing and openly available geospatial data to retrieve a detailed description with true dimensions of the construction site at hand.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Image analysis, as part of computer vision, is a heavily researched topic that has gained even more
attention through recent advances in autonomous driving and related machine learning fields.
For effective and efficient image analysis and object recognition, machine learning algorithms
have been used increasingly during the last decades. In 2012, the convolutional neural network
(CNN) “AlexNet”
        <xref ref-type="bibr" rid="ref5">(Krizhevsky, Sutskever, and Hinton, 2012)</xref>
        achieved a top-5 error of 15.3%
in the prestigious ImageNet Large Scale Visual Recognition Challenge
        <xref ref-type="bibr" rid="ref9">(Russakovsky et al., 2015)</xref>
        . These results were surprisingly accurate at the time, proving the advantages of using
CNNs. On this account, the software industry shifted towards using CNNs for all machine
learning based image processing tasks
        <xref ref-type="bibr" rid="ref8">(LeCun, Bengio, and Hinton, 2015)</xref>
        .
      </p>
      <p>
        Image analysis on construction sites, on the other hand, is a rather new topic. Since one of the
key aspects of machine learning is the collection of large datasets, current approaches focus on
data gathering.
        <xref ref-type="bibr" rid="ref16">Tajeen and Zhu (2014)</xref>
        present an image dataset containing numerous annotated
images of construction equipment, however centering on the excavation phase (excavator, loader,
dozer, roller and backhoe) and on images taken from the ground.
        <xref ref-type="bibr" rid="ref6">Kropp, Koch, and König (2018)</xref>
        detect indoor construction elements based on similarities, focusing on radiators. In the scope of
automated construction progress monitoring, Han et al. published an approach for Amazon
Turk based labelling
        <xref ref-type="bibr" rid="ref2">(Han and Golparvar-Fard, 2017)</xref>
        .
        <xref ref-type="bibr" rid="ref1">Bügler et al. (2017)</xref>
        combined
photogrammetric methods and video analysis to assess the progress of earthworks. To this end,
they created point clouds to measure the volume of excavated soil and detected truck dumpers
on images using foreground detection.
        <xref ref-type="bibr" rid="ref4">Jahr, Braun, and Borrmann (2018)</xref>
        use an artificial
intelligence approach to detect formwork elements on UAV imagery of construction sites.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>To generalize implicit knowledge from airborne images, we propose an image analysis pipeline
utilizing machine learning algorithms, georeferencing, as well as data retrieval (see Figure 1).
In a first step, we detect construction sites by using an object detection algorithm on the airborne
images. If at least one construction site is detected, an instance segmentation algorithm is used
to detect individual elements of the construction. In this paper, we concentrate on detecting
tower cranes, as they highly affect construction progress. To enable georeferencing, surveying
information on the images is required. To be able to estimate element dimensions, the images
are orthorectified. Additional information relevant to the construction site, such as building
dimensions, property lines, or neighboring constructions, can be retrieved by georeferencing
the image and linking it to spatial information retrieved from the cadastral map or other
geoinformation services, such as OpenStreetMap or city models. Finally, all information
retrieved is stored in a database for reliable data management and access.
</p>
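The pipeline stages described above can be sketched as follows. All function and type names here are illustrative placeholders rather than our actual implementation; the stage functions are injected so the concrete models (YOLOv3 detection, Mask R-CNN segmentation, georeferencing, OSM enrichment) stay interchangeable.

```python
from dataclasses import dataclass, field

@dataclass
class SiteRecord:
    """Collects everything retrieved for one detected construction site."""
    bounding_box: tuple
    elements: list = field(default_factory=list)      # e.g. tower crane masks
    gps_corners: list = field(default_factory=list)   # from georeferencing
    surroundings: dict = field(default_factory=dict)  # OSM / cadastral data

def analyze_image(image, detect, segment, georeference, enrich, store):
    """Run one airborne image through the pipeline stages of Figure 1."""
    records = []
    for box in detect(image):                      # construction site detection
        record = SiteRecord(bounding_box=box)
        record.elements = segment(image, box)      # construction element detection
        record.gps_corners = georeference(box)     # orthorectification / localization
        record.surroundings = enrich(record.gps_corners)  # data enrichment
        store.append(record)                       # knowledge database
        records.append(record)
    return records
```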
    </sec>
    <sec id="sec-4">
      <title>3.1 Image analysis on airborne images</title>
      <p>To evaluate the photographs, we use two different CNNs: the first network detects construction
sites on airborne images, while the second network segments the resulting cropped images.</p>
    </sec>
    <sec id="sec-5">
      <title>Image analysis using convolutional neural networks</title>
      <p>
        There are different tasks to be solved by image processing algorithms. Well-known tasks include classification, where classes of
single-object images are recognized; object detection, where several objects in one image may
be classified and localized within the image; and image segmentation, where individual pixels
are classified
        <xref ref-type="bibr" rid="ref14">(Rusk, 2015)</xref>
        . In this paper, we focus on object detection and instance
segmentation using CNNs.
      </p>
      <p>
        CNNs are structured in locally interconnected layers with shared weights (see Figure 2). Each
layer comprises multiple calculation units, or neurons. The neurons of the first layer (input
layer) represent the pixels of the analyzed image, the last layer (output layer) comprises the
predictable object classes. Between input and output layer, any number of hidden layers can be
arranged. To adapt to different problem domains, such as recognizing construction site
elements, CNNs are trained. During training, the connections between certain neurons are
strengthened, while the connections between other neurons are weakened; that is, the weights connecting
consecutive layers are adjusted. The training is usually carried out using supervised
backpropagation, meaning the network is fed with exemplary input-output pairs
        <xref ref-type="bibr" rid="ref14">(Rusk, 2015)</xref>
        .
The expected output, viz. correct solution, for each input is called ground truth. To train a CNN
towards reliable predictions, a significant amount of training data is required, which has to be
prepared in a preprocessing step. To accelerate the training processes, weights of previously
trained CNNs can be used. To adapt pretrained CNNs, usually the last layers are replaced with
layers representing the new problem domain before training with new data.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Detecting construction sites using convolutional neural networks</title>
      <p>
        To collect information on construction sites, they first have to be localized. The task of classifying and localizing objects
on images containing several objects of different types is called object detection
        <xref ref-type="bibr" rid="ref14">(Rusk, 2015)</xref>
        .
To solve this task, algorithms usually predict rectangular areas (bounding boxes) with high
probabilities of object occurrences, as well as the corresponding object class. To measure the
performance of an algorithm, the intersection over union (IoU) between prediction and ground truth
is examined. Acknowledged measures include precision (how many of the predictions are
correct?), recall (how many of the relevant items are predicted?) and mAP (mean average precision,
calculated from recall and precision).
      </p>
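These measures can be made concrete with a minimal sketch in plain Python. Boxes are given as (x_min, y_min, x_max, y_max); the greedy matching rule is a simplified version of what detection benchmarks use, and all values are illustrative.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned bounding boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall(predictions, ground_truth, threshold=0.5):
    """A prediction counts as correct if it overlaps a still-unmatched
    ground-truth box with IoU above the threshold."""
    unmatched = list(ground_truth)
    true_pos = 0
    for p in predictions:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= threshold:
            true_pos += 1
            unmatched.remove(best)
    precision = true_pos / len(predictions) if predictions else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```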
      <p>
        In this paper, we used a “you only look once” network (YOLOv3, see Figure 2) in the Darknet
framework
        <xref ref-type="bibr" rid="ref13">(Redmon &amp; Farhadi, 2018)</xref>
        . YOLOv3 is a single shot detector, which enables
reasonable prediction rates with very fast training and prediction times compared to other
leading algorithms. YOLOv3 divides the input image in a grid, where each cell predicts only
one object. Predictions of objects of varying sizes are enabled by a feature pyramid network—
YOLOv3 makes predictions at three different scales for each location. To predict the bounding
box of the detected object, YOLOv3 uses anchor boxes with dimensions tailored to the specific
problem domain.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Segmenting construction elements using convolutional neural networks</title>
      <p>As exact information on the whereabouts and dimensions of the site equipment is desired, each
piece of equipment has to be detected as precisely as possible. We want to know the exact shape of the
object rather than its bounding box. Therefore, we need an instance segmentation algorithm that
labels images pixelwise and is able to distinguish not only between several classes, but between
several objects of the same class.</p>
      <p>
        A very capable algorithm for instance segmentation is Mask R-CNN
        <xref ref-type="bibr" rid="ref3">(He, Gkioxari, Dollár, and
Girshick, 2017)</xref>
        . Mask R-CNN predicts instance masks for detected objects in two stages:
firstly, a region proposal network locates bounding boxes and classes for possible objects,
with features extracted per region via RoIAlign (region of interest alignment). Secondly, a
semantic segmentation branch determines the exact object outlines within each bounding box.
Since only one object should be contained in each bounding box, a binary classifier mapping
each pixel to 1 or 0 is sufficient, with 1 representing the presence and 0 the absence of an object.
      </p>
    </sec>
    <sec id="sec-8">
      <title>3.2 Orthorectification of images</title>
      <p>Aerial images are not only a source of information about the content of the captured scene
but can also deliver information about the localization of objects and metric information. To be
able to measure and localize objects in aerial images, the following information is needed:</p>
      <list list-type="bullet">
        <list-item><p>Exterior orientation parameters for each image (position and orientation of the camera
during acquisition)</p></list-item>
        <list-item><p>Interior orientation parameters of the camera (obtained in the calibration process)</p></list-item>
        <list-item><p>Meshed digital surface model (DSM) generated from the images themselves, if stereo pairs are
provided, or a DSM from another source</p></list-item>
      </list>
      <p>
        Georeferencing. Determining the exterior orientation of the camera in the world coordinate
system is called georeferencing. This can be done either by using ground control points (GCPs),
or by using the GNSS (global navigation satellite system) position together with an inertial
measurement unit (IMU) and a system calibration for the camera orientation (direct georeferencing).
Direct georeferencing has the advantage that the manual effort of measuring the GCPs is
avoided. The accuracy of direct georeferencing depends on the quality of the GNSS signal
and can vary from a few decimeters to a few meters
        <xref ref-type="bibr" rid="ref12">(Pfeifer, Glira, and Briese, 2012)</xref>
        .
      </p>
      <p>
        Ground Sample Distance. To estimate true dimensions in aerial photographs, the distance
between two pixels on the ground (Ground Sample Distance, GSD) must be known. In traditional
aerial photogrammetry, images are acquired in nadir view. Assuming a locally flat Earth
surface and knowing the flight altitude, sensor size and camera constant (focal length), the GSD
can be calculated. The altitude is determined relative to the reference surface and not the
terrain (Figure 3). Therefore, changes in terrain height and the presence of other objects (e.g.
high buildings) lead to a varying GSD. Furthermore, the flight altitude does not remain constant
over the whole flight campaign, and a vertical acquisition geometry cannot always be
ensured. Accordingly, only an approximate GSD can be indicated. Modern aerial photogrammetric
camera systems use a combination of nadir and oblique view cameras, delivering additional
views on building facades and other 3D objects. The GSD in oblique images, however, cannot be
determined.
      </p>
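Under these flat-surface, nadir-view assumptions, the GSD follows directly from the central projection by similar triangles. The flight parameters below are illustrative, not from our campaign.

```python
def ground_sample_distance(altitude_m, focal_length_mm, pixel_pitch_um):
    """Approximate nadir GSD: by similar triangles of the central projection,
    GSD / pixel size = altitude / focal length (flat-surface assumption)."""
    pixel_size_m = pixel_pitch_um * 1e-6
    focal_length_m = focal_length_mm * 1e-3
    return altitude_m * pixel_size_m / focal_length_m

# Illustrative: 1000 m above the reference surface, 50 mm lens, 6 µm pixels
gsd = ground_sample_distance(1000.0, 50.0, 6.0)  # 0.12 m per pixel
```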
      <p>Orthophoto. During orthorectification, aerial images in central projection are transformed into
orthogonal projection in order to unify the GSD and allow direct measurements in the images
(Figure 4). For each cell of the DSM mesh, the corresponding part of the image is identified
and transformed onto the DSM. The resulting mesh is then ortho-projected on a regular grid
created on the reference surface with a defined GSD. We orthorectify not only the color image,
but also the obtained labels.</p>
    </sec>
    <sec id="sec-9">
      <title>3.3 Additional sources and data enrichment</title>
      <p>The surrounding amenities and conditions of a construction site strongly influence the site
equipment. For example, neighboring buildings might influence tower crane positions and
minimum dimensions to ensure safe operation without interference; access roads are highly
relevant for material supply and will influence construction roads and storage area positioning.
If the location of a construction site has been determined through georeferencing the image, the
information on that construction site can be enriched using spatial information on the
surrounding amenities. Spatial information is available from different sources, e.g. cadastral
maps. Cadastral maps show property lines and ownerships and may include additional
information such as parcel numbers and existing structures.</p>
      <p>A wide selection of digital geodata is available on OpenStreetMap (OSM). OSM aims to
collect and provide data under an open license. The geodata provided includes, inter alia, roads,
parks, building outlines, and amenities such as fire hydrants and post boxes. While the data can
be viewed as map representation, several APIs are available for data access, of which Overpass
API is currently well maintained. Overpass works with queries either in XML or in its native
language, Overpass QL. In this paper, we used the Overpass API with Overpass QL to retrieve
information on neighboring buildings and roads.</p>
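A minimal sketch of such a query, using only the Python standard library: the bounding-box form of an Overpass QL query with illustrative tag filters and coordinates. Actually querying a public endpoint requires network access.

```python
import json
import urllib.parse
import urllib.request

def build_query(south, west, north, east):
    """Overpass QL asking for buildings and roads within a bounding box."""
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:json][timeout:25];"
        f'(way["building"]({bbox});way["highway"]({bbox}););'
        "out body;>;out skel qt;"
    )

def query_overpass(south, west, north, east,
                   url="https://overpass-api.de/api/interpreter"):
    """POST the query to a public Overpass endpoint (network access required)."""
    payload = urllib.parse.urlencode(
        {"data": build_query(south, west, north, east)}).encode()
    with urllib.request.urlopen(url, payload) as response:
        return json.load(response)["elements"]
```

The trailing `>;out skel qt;` recurses down to the member nodes of the returned ways, so the building and road geometries can be reconstructed from the response.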
      <p>Additional 3-dimensional information could be retrieved from city models. Depending on the
model's Level of Detail (LoD), buildings are represented as 3D blocks (base area and height)
or with increasing detailing, such as roof shape, window areas and even interior construction.
City models are widely available. For storing and exchanging city models, the open standard
CityGML, among other data models, might be used.</p>
    </sec>
    <sec id="sec-10">
      <title>4. Case study</title>
      <p>The presented image analysis steps were implemented individually to demonstrate the proposed
approach. The results are presented in the following sections.</p>
    </sec>
    <sec id="sec-11">
      <title>4.1 Generating ground truth data for image analysis</title>
      <p>
        To create ground truth data for both CNNs, we used airborne images provided by the German
Aerospace Center (DLR)
        <xref ref-type="bibr" rid="ref7">(Kurz et al., 2012)</xref>
        . The aerial photographs were not commissioned
for this paper, but rather repurposed from other applications, therefore causing no additional
data recording effort. The images were gathered in Germany, both by airplane and helicopter.
In total, approximately 4,500 high resolution images with varying image sizes have been
sighted and labeled. In a first step, we manually extracted all images that show construction
sites within the construction phase, characterized by the use of tower cranes, visible material
storage, and first erected construction elements (see Figure 5). Subsequently, to generate
the object detection dataset, we added bounding boxes for all construction sites using the
labeling platform “Labelbox”. We translated the labels from Labelbox format to YOLO format
and split the data into training, testing and validation data sets (Table 1).
      </p>
      <p>
To prepare the segmentation dataset, we used images of tower cranes taken by UAVs, and
images taken by hand cameras (mostly from below) (Figure 6). Additionally, we further
processed images from the object detection dataset. We added a margin around the construction
site bounding boxes to ensure that all relevant information is contained (especially tower cranes
reaching outside construction field) and cropped along the labels. Corresponding image areas
for neighboring construction sites with overlapping bounding boxes are contained in both crops.
We again used Labelbox to add polygonal labels for all tower cranes. For training the Mask
R-CNN network, we decided to use the COCO data format. In this case, too, we split the data into
70% training data, 20% testing data and 10% validation data.
      </p>
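The 70/20/10 split can be sketched as follows; the fixed seed makes the split reproducible, and the seed value and item type are arbitrary.

```python
import random

def split_dataset(items, seed=42):
    """Shuffle and split into 70% training, 20% testing, 10% validation."""
    items = list(items)
    random.Random(seed).shuffle(items)   # seeded shuffle: reproducible split
    n_train = int(len(items) * 0.7)
    n_test = int(len(items) * 0.2)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])

train, test, val = split_dataset(range(100))  # 70 / 20 / 10 items
```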
    </sec>
    <sec id="sec-12">
      <title>4.2 Training CNNs for object detection and instance segmentation</title>
      <p>
        Object detection with YOLOv3. For training the construction detection network, we use a
YOLOv3 architecture in Darknet
        <xref ref-type="bibr" rid="ref13">(Redmon and Farhadi, 2018)</xref>
        . To better adapt the network to
the construction site dataset, we regenerated the anchor sizes. To that end, we used k-means
clustering on the aggregate of bounding boxes in the construction site dataset. The resulting 9
clusters for bounding box width and height, normalized to the respective image size (see Figure
7), are used as the widths and heights of the anchors.
      </p>
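The anchor regeneration can be sketched with a plain k-means on normalized (width, height) pairs. Note that YOLO implementations typically use an IoU-based distance rather than the Euclidean distance used here, and the box data below is illustrative, not from our dataset.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Euclidean k-means; returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: (p[0] - centers[j][0]) ** 2
                                        + (p[1] - centers[j][1]) ** 2)
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep the old
        # center if a cluster happens to be empty.
        centers = [(sum(p[0] for p in c) / len(c),
                    sum(p[1] for p in c) / len(c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Normalized (width, height) of labeled construction site boxes (illustrative);
# the resulting cluster centers become the anchor dimensions.
boxes = [(0.10, 0.12), (0.11, 0.10), (0.40, 0.35),
         (0.38, 0.40), (0.80, 0.75), (0.82, 0.80)]
anchors = kmeans(boxes, k=3)  # the paper uses k=9 on the full dataset
```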
      <p>To reduce training time and retrieve better results with our limited data set, we use the pretrained
weights of the Darknet53 network. Training for 1000 epochs took approximately 10h on an
Nvidia DGX-1. Examples of resulting bounding boxes are depicted in Figure 8. While
bounding boxes for smaller construction sites are predicted very well, the CNN is not yet
sufficiently adapted to larger construction sites. When reviewing the dataset, it becomes apparent
that smaller construction sites are predominant in residential areas and make up the majority
of the dataset. To cover a broader variety of construction sites, the training dataset is currently
being expanded. Another step to improve the object detection algorithm entails more extensive
preprocessing of the data.</p>
    </sec>
    <sec id="sec-13">
      <title>Tower crane segmentation with Mask R-CNN</title>
      <p>
        For image segmentation, we used Mask R-CNN in Keras with a TensorFlow backend
        <xref ref-type="bibr" rid="ref3">(He et al., 2017)</xref>
        . We used pretrained weights from
the COCO Dataset
        <xref ref-type="bibr" rid="ref11">(Lin et al., 2014)</xref>
        . Mask R-CNN adapted very quickly to the tower crane dataset,
leading to a low loss after few epochs (Figure 9). Examples of resulting instance bounding boxes
and masks are depicted in Figure 10. Object bounding boxes are predicted reliably, while, in
some cases, the masks tend to be disconnected. To improve the predictions, the training data set is
being extended continuously.
      </p>
    </sec>
    <sec id="sec-14">
      <title>4.3 Georeferenced information</title>
      <p>In Figure 11, we present results for orthorectification. First, the DSM was generated (left) and
then the orthophoto was calculated (right). Here, the seamline dividing the areas covered by
color information from two images cuts a tower crane into two parts, which makes it difficult
to detect tower cranes directly in the orthophoto. Therefore, cranes were labeled in original
aerial images (Figure 12, left) and the labels were orthoprojected on the DSM (Figure 12, right).
As we see in Figure 11, crane 1 is not present in the DSM at all, and from crane 2 only the tower
of the crane is mapped in the DSM. This is because the crane structure is not easy to reconstruct
from images. As a very thin structure, cranes do not provide many points in the point cloud.
The points that are detected lie at a much higher level than their surroundings; therefore, many
algorithms treat these points as outliers and remove them from the DSM. In addition, cranes may
move during data acquisition, which leads to further difficulties in the reconstruction. The
remedy for this situation is using true orthophotos based on high-density and high-accuracy
point clouds, which contain entire tower cranes. True orthophotos, however, require high
overlaps between the images (80% in flight direction and 60% between the image strips). Here,
not only the difficult geometry of the crane must be considered, but also its dynamic behavior.
This aspect should be a subject of our future investigations. Currently, we select for
orthorectification labels which are located close to the image center and orthorectify labels
object-wise, which means that we prevent seamlines splitting the labels.</p>
    </sec>
    <sec id="sec-15">
      <title>4.4 Data enrichment using Overpass API</title>
      <p>To retrieve surrounding amenities to the construction site in question from OSM, we used
Overpass API. Using the GPS coordinates of the construction’s outer corners, we queried
adjacent roads and footways as well as neighboring buildings. The Overpass API returns the queried
data text-based (i.e. as JSON or XML). For monitoring the results, we used Overpass turbo (Figure
13). Overpass turbo is a web-based tool able to run Overpass API queries. The results are shown
as interactive map, where further information on the queried nodes can be retrieved. The
additional data collected from OSM is added to the information retrieved from the images.</p>
    </sec>
    <sec id="sec-16">
      <title>5. Summary</title>
      <p>In this contribution, we presented an image analysis pipeline capable of detecting construction
sites as well as construction elements on airborne images. To gain further information on the
situation on site, we retrieve real dimensions of the construction field and tower cranes by
orthorectifying the images. Further data sources can be used to enrich the information. Using
the Overpass API, we retrieve information on the site’s surroundings from OpenStreetMap. In
the end, we gained a knowledge database for construction sites in Germany, which will be
dynamically extended. Next steps include the extension with further construction site elements
such as containers or vehicles, the connection to city models to retrieve 3D information, and the
advancement of available algorithms using the knowledge database for solving the construction
site layout planning (CSLP) problem.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Bügler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borrmann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ogunmakin</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vela</surname>
            ,
            <given-names>P. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Teizer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Fusion of Photogrammetry and Video Analysis for Productivity Assessment of Earthwork Processes</article-title>
          .
          <source>Computer-Aided Civil and Infrastructure Engineering</source>
          ,
          <volume>32</volume>
          (
          <issue>2</issue>
          ),
          <fpage>107</fpage>
          -
          <lpage>123</lpage>
          . https://doi.org/10.1111/mice.12235
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>K. K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Golparvar-Fard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Potential of big visual data and building information modeling for construction performance analytics: An exploratory study</article-title>
          .
          <source>Automation in Construction</source>
          ,
          <volume>73</volume>
          ,
          <fpage>184</fpage>
          -
          <lpage>198</lpage>
          . https://doi.org/10.1016/j.autcon.2016.11.004
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gkioxari</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollár</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Mask R-CNN</article-title>
          .
          <source>Proceedings of the IEEE International Conference on Computer Vision</source>
          ,
          <fpage>2961</fpage>
          -
          <lpage>2969</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Jahr</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Braun</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Borrmann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Formwork detection in UAV pictures of construction sites</article-title>
          .
          <source>Proc. of the 12th European Conference on Product and Process Modelling</source>
          , Copenhagen, Denmark.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G. E.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Kropp</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>König</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Interior construction state recognition with 4D BIM registered image sequences</article-title>
          .
          <source>Automation in Construction</source>
          ,
          <volume>86</volume>
          ,
          <fpage>11</fpage>
          -
          <lpage>32</lpage>
          . https://doi.org/10.1016/j.autcon.2017.10.027
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Kurz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meynberg</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenbaum</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Türmer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reinartz</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Schroeder</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Low-cost optical camera system for disaster monitoring</article-title>
          .
          <source>Int. Archives of the Photogrammetry, Remote Sens. and Spatial Information Sci</source>
          ,
          <volume>39</volume>
          ,
          <fpage>B8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Deep learning</article-title>
          .
          <source>Nature</source>
          ,
          <volume>521</volume>
          (
          <issue>7553</issue>
          ),
          <fpage>436</fpage>
          -
          <lpage>444</lpage>
          . https://doi.org/10.1038/nature14539
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Russakovsky</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , …
          <string-name>
            <surname>Fei-Fei</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>ImageNet Large Scale Visual Recognition Challenge</article-title>
          .
          <source>International Journal of Computer Vision</source>
          ,
          <volume>115</volume>
          (
          <issue>3</issue>
          ),
          <fpage>211</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>T.-Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maire</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belongie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hays</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perona</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramanan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , …
          <string-name>
            <surname>Zitnick</surname>
            ,
            <given-names>C. L.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Microsoft COCO: Common objects in context</article-title>
          .
          <source>European Conference on Computer Vision</source>
          ,
          <fpage>740</fpage>
          -
          <lpage>755</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Pfeifer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glira</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Briese</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Direct georeferencing with on board navigation components of light weight UAV platforms</article-title>
          .
          <source>International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences</source>
          ,
          <volume>39</volume>
          (
          <issue>B7</issue>
          ),
          <fpage>487</fpage>
          -
          <lpage>492</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Redmon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Farhadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Yolov3: An incremental improvement</article-title>
          . arXiv preprint arXiv:1804.02767.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Rusk</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Deep learning</article-title>
          .
          <source>Nature Methods</source>
          ,
          <volume>13</volume>
          ,
          <fpage>35</fpage>
          . https://doi.org/10.1038/nmeth.3707
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Russakovsky</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krause</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Satheesh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , …
          <string-name>
            <surname>Fei-Fei</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>ImageNet Large Scale Visual Recognition Challenge</article-title>
          .
          <source>International Journal of Computer Vision</source>
          ,
          <volume>115</volume>
          (
          <issue>3</issue>
          ),
          <fpage>211</fpage>
          -
          <lpage>252</lpage>
          . https://doi.org/10.1007/s11263-015-0816-y
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Tajeen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Image dataset development for measuring construction equipment recognition performance</article-title>
          .
          <source>Automation in Construction</source>
          ,
          <volume>48</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . https://doi.org/10.1016/j.autcon.2014.07.006
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>