<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Traffic Density Estimation via Unsupervised Domain Adaptation</article-title>
        <subtitle>(Discussion Paper)</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luca Ciampi</string-name>
          <email>luca.ciampi@isti.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Santiago</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>João Paulo Costeira</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Gennaro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Amato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Information Science and Technologies - National Research Council - Pisa</institution>, <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Instituto Superior Técnico (LARSyS/IST) - Lisbon</institution>, <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Monitoring traffic flows in cities is crucial to improving urban mobility, and images are the best sensing modality to perceive and assess the flow of vehicles in large areas. However, current machine learning-based technologies using images hinge on large quantities of annotated data, preventing their scalability to city scale as new cameras are added to the system. We propose a new methodology to design image-based vehicle density estimators with few labeled data via an unsupervised domain adaptation technique.</p>
      </abstract>
      <kwd-group>
        <kwd>Unsupervised Domain Adaptation</kwd>
        <kwd>Synthetic Datasets</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Counting Vehicles</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Traffic problems are constantly increasing, and tomorrow’s cities can only be smart if they enable
smart mobility. This concept is becoming ever more critical, since traffic congestion, caused by the
increasing number of people using different road infrastructures to travel, imposes
extra costs that make all activities more expensive and hamper development.</p>
      <p>Smart mobility applications such as smart parking and road traffic management are nowadays
widely employed worldwide, making our cities more livable: they improve quality of life, reduce
costs, and improve energy usage.</p>
      <p>Images are probably the best sensing modality to perceive and assess the flow of vehicles
in large areas. Like no other sensing mechanism, networks of city cameras can cover such
large areas and simultaneously provide visual data to AI systems that extract relevant
information from this deluge of data.</p>
      <p>In this work, we propose a CNN-based system that can estimate traffic density and count the
vehicles present in urban scenes directly on board smart city cameras, analyzing the images
they capture.</p>
      <p>
        Current systems address the counting problem as a supervised learning process. They fall
into two main classes of methods: a) detection-based approaches [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] that try to identify
and localize single instances of objects in the image, and b) density-based techniques that rely
on regression to estimate a density map from the image, with the final count given by
summing all pixel values [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Figure 1 illustrates the mapping performed by such a regression.
Concerning vehicle counting in urban spaces, where images have low resolution and most
objects are partially occluded, density-based methods have a clear advantage over detection-based
methods [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ][
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
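      <p>As a minimal illustration of the density-based formulation (a toy sketch of ours, not code
from any cited work), the final count is recovered simply by summing the map:</p>
      <preformat>
import numpy as np

# Toy 4x4 density map: two vehicles, each contributing a total mass of 1
# spread over a few pixels.
density_map = np.array([
    [0.00, 0.25, 0.25, 0.00],
    [0.00, 0.25, 0.25, 0.00],
    [0.50, 0.00, 0.00, 0.00],
    [0.50, 0.00, 0.00, 0.00],
])

count = density_map.sum()  # 2.0 vehicles
print(count)
      </preformat>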
      <p>
        However, since this class of approaches requires pixel-level ground truth for supervised
learning, they may not generalize well to unseen images, especially when there is a large
domain gap between the training (source) and the test (target) sets, such as different camera
perspectives, weather, or illumination. The direct transfer of the learned features between
different domains does not work well because their distributions are different. Thus, a model
trained on the source domain usually experiences a drastic drop in performance when applied
to the target domain. This problem is commonly referred to as Domain Shift [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and it severely
hampers the application of counting methods to very large-scale scenarios since annotating
images for all the possible cases is unfeasible.
      </p>
      <p>To mitigate this problem, we introduce a methodology that performs Unsupervised Domain
Adaptation (UDA) among different scenarios. UDA techniques address the domain shift by taking
a labeled source dataset and an unlabeled target one. The challenge is to automatically infer
some knowledge from the target data that reduces the gap between the two domains. Specifically,
in this work we propose an end-to-end CNN-based UDA algorithm for traffic density estimation
and counting, based on adversarial learning performed directly on the generated density maps,
i.e., in the output space, given that in this specific case the output space contains valuable
information such as scene layout and context. We focus on vehicle counting, but the approach
is suitable for counting any other type of object.</p>
      <p>Another contribution of this work is the creation of two new per-pixel
annotated datasets made available to the scientific community. One of the two novel datasets is
a collection of synthetic images taken from a photo-realistic video game, where the labels are
automatically assigned by interacting with the API of the graphical engine. We conducted
our experiments considering these two datasets and another collection of images already
present in the literature, validating our approach over different types of domain shifts: i) the
Camera2Camera domain shift, where the source images belong to some specific cameras while
the target ones are taken from different perspectives and contexts; ii) the Day2Night
domain shift, where the source domain consists of images taken during the day and the
target domain of pictures taken at night; iii) the Synthetic2Real domain shift, where source
images are collected using a video game and automatically annotated, while the target ones are
real urban pictures. Experiments show a significant improvement compared to the performance
of the model without domain adaptation.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Datasets</title>
      <sec id="sec-2-1">
        <title>2.1. NDISPark Dataset</title>
        <p>This section describes the datasets exploited in this work, focusing mainly on the two novel
datasets created on purpose in this work.</p>
        <p>The NDISPark - Night and Day Instance Segmented Park dataset is a small, manually annotated
dataset for counting cars in parking lots, consisting of about 250 images. This dataset is
challenging and depicts the most difficult situations that can be found in a real scenario:
seven different cameras capture the images under various weather conditions and angles of
view. Furthermore, it is worth noting that pictures are taken during both the day and the night,
showing utterly different light conditions. The images are precisely annotated with instance
segmentation labels, which allowed us to generate accurate ground truth density maps usable
for the counting task.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. GTA Dataset</title>
        <p>The GTA - Grand Traffic Auto dataset is a vast collection of about 15,000 synthetic images of
urban traffic scenes collected from the highly photo-realistic video game GTA V - Grand Theft
Auto V. We deploy a framework that can automatically and precisely annotate the vehicles
present in the scene with per-pixel annotations. To the best of our knowledge, it is the first
instance segmentation synthetic dataset of city traffic scenarios. Figure 2 shows some examples
of images belonging to this dataset together with the annotations.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. WebCamT Dataset</title>
        <p>The WebCamT dataset is a collection of traffic scenes recorded using city cameras, introduced
by [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. It is particularly challenging for analysis due to the low resolution (352 × 240), high
occlusion, and large perspective. We considered images belonging to different cameras and
consequently having different views.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Method</title>
      <p>Our method relies on a CNN model trained end-to-end with adversarial learning in the output
space (i.e., the density maps), which contains rich information such as scene layout and context.</p>
      <p>The peculiarity of our adversarial learning scheme is that it forces the predicted density maps
in the target domain to have local similarities with the ones in the source domain.</p>
      <p>Figure 3 depicts the proposed framework, consisting of two modules: 1) a CNN that predicts
traffic density maps, from which we estimate the number of vehicles in the scene, and 2) a
discriminator that identifies whether a density map (received from the density map estimator)
was generated from an image of the source domain or the target domain.</p>
      <p>In the training phase, the density map predictor learns to map images to densities based on
annotated data from the source domain. At the same time, it learns to predict realistic density
maps for the target domain by trying to fool the discriminator with an adversarial loss. The
discriminator’s output is a pixel-wise classification of a low-resolution map, as illustrated in
Figure 3, where each pixel corresponds to a small region in the density map. Consequently,
the output space is forced to be locally similar for both the source and target domains. In the
inference phase, the discriminator is discarded, and only the density map predictor is used for
the target images. We describe each module and how it is trained in the following subsections.</p>
      <sec id="sec-3-1">
        <title>3.1. Density Estimation Network</title>
        <p>
          We formulate the counting task as a density map estimation problem [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The density (intensity)
of each pixel in the map depends on its proximity to a vehicle centroid and on the size of the vehicle
in the image, so that each vehicle contributes a total value of 1 to the map. Therefore,
it provides statistical information about the vehicles’ location and allows the count to be
estimated by summing all density values.
        </p>
        <p>
          This task is performed by a CNN-based model [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], whose goal is to automatically determine
the vehicle density map associated with a given input image. Formally, the density map estimator,
Ψ : ℛ^(C×H×W) ↦ ℛ^(H×W), transforms a W × H input image ℐ with C channels into a density
map, D = Ψ(ℐ) ∈ ℛ^(H×W).
        </p>
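        <p>To make this formulation concrete, the following sketch (our illustration, assuming
isotropic Gaussian kernels whose spread reflects vehicle size; the paper does not prescribe a
specific kernel) builds a ground truth density map in which each vehicle integrates to 1:</p>
        <preformat>
import numpy as np

def ground_truth_density(h, w, centroids, sigmas):
    """Build an h x w density map where each vehicle, given by its centroid
    (row, col) and a size-dependent sigma, contributes a total mass of 1."""
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros((h, w), dtype=np.float64)
    for (cy, cx), sigma in zip(centroids, sigmas):
        kernel = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        density += kernel / kernel.sum()  # normalize: each vehicle adds 1
    return density

# Two hypothetical vehicles; summing the map recovers the count.
d = ground_truth_density(240, 352, centroids=[(120, 100), (60, 300)], sigmas=[8.0, 5.0])
print(round(d.sum()))  # 2
        </preformat>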
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Discriminator Network</title>
        <p>The discriminator network, denoted by Θ, also consists of a CNN model. It takes as input
the density map, D, estimated by the network Ψ. Its output is a lower-resolution probability
map where each pixel represents the probability that the corresponding region (from the input
density map) comes either from the source or the target domain. The goal of the discriminator
is to learn to distinguish between density maps belonging to the source or target domain. Through
an adversarial loss, this discriminator will, in turn, force the density estimator to provide density
maps with similar distributions in both domains. In other words, the target domain density
maps have to look realistic, even though the network Ψ was not trained with an annotated
training set from that domain.</p>
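        <p>A minimal sketch of such a fully-convolutional discriminator is shown below (our
illustration: the paper does not detail the architecture, so depth, kernel sizes, and channel
widths are assumptions):</p>
        <preformat>
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Fully-convolutional discriminator: maps a 1-channel density map to a
    lower-resolution map of per-region source/target logits."""
    def __init__(self, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, base, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, density_map):
        # Output spatial size is 1/8 of the input in each dimension.
        return self.net(density_map)

# Example: a 240 x 352 density map yields a 30 x 44 logit map.
logits = Discriminator()(torch.randn(1, 1, 240, 352))
print(logits.shape)  # torch.Size([1, 1, 30, 44])
        </preformat>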
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Domain Adaptation Learning</title>
        <p>The proposed framework is trained based on an alternating optimization of the density estimation
network, Ψ, and the discriminator network, Θ. Regarding the former, the training process relies
on two components: 1) density estimation using pairs of images and ground truth density maps,
which we assume are only available in the source domain; and 2) adversarial training, which
aims to make the discriminator fail to distinguish between the source and target domains. As for
the latter, images from both domains are used to train the discriminator to correctly classify
each pixel of the probability map as either source or target.</p>
        <p>To implement the above training procedure, we use two loss functions: one is employed in
the first step of the algorithm to train the network Ψ, and the other is used in the second step to
train the discriminator Θ. These loss functions are detailed next.</p>
        <p>Network Ψ Training. We formulate the loss function for Ψ as the sum of two main
components:</p>
        <p>ℒ(ℐ_S, ℐ_T) = ℒ_density(ℐ_S) + λ_adv ℒ_adv(ℐ_T),    (1)
where ℒ_density is the loss computed using ground truth annotations available in the source
domain, while ℒ_adv is the adversarial loss that is responsible for making the distributions of
the target and the source domains closer to each other. In particular, we define the density loss
ℒ_density as the mean squared error between the predicted and ground truth density maps, i.e.,
ℒ_density = MSE(D, D_GT).</p>
        <p>To compute the adversarial loss ℒ_adv, we first forward the images belonging to the target
domain through the network Ψ to generate the predicted density maps D_T. Then, we forward
D_T through the network Θ to generate the probability map P = Θ(Ψ(ℐ_T)) ∈ [0, 1]^(H′×W′), where
H′ &lt; H and W′ &lt; W. The adversarial loss is given by
ℒ_adv(ℐ_T) = −∑_{h,w} log(P_{h,w}),    (2)
where the subscript h, w denotes a pixel in P. This loss makes the distribution of D_T closer to
D_S by forcing Ψ to fool the discriminator, through the maximization of the probability of D_T
being locally classified as belonging to the source domain.</p>
        <p>Network Θ Training. Given an image ℐ and the corresponding predicted density map D,
we feed D as input to the fully-convolutional discriminator Θ to obtain the probability map P.
The discriminator is trained by comparing P with the ground truth label map Y ∈ {0, 1}^(H′×W′)
using a pixel-wise binary cross-entropy loss
ℒ_disc(ℐ) = −∑_{h,w} [(1 − Y_{h,w}) log(1 − P_{h,w}) + Y_{h,w} log(P_{h,w})],    (3)
where Y_{h,w} = 0 ∀ h, w if ℐ is taken from the target domain, and Y_{h,w} = 1 otherwise.</p>
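        <p>Putting the two losses together, one alternating optimization step could look like the
following sketch (our reading of the procedure in PyTorch; the estimator Ψ and discriminator Θ
are passed in as modules, and the weight lambda_adv is a hypothetical hyperparameter):</p>
        <preformat>
import torch
import torch.nn.functional as F

def training_step(psi, theta, opt_psi, opt_theta,
                  img_src, gt_src, img_tgt, lambda_adv=0.01):
    """One alternating step: (1) update the estimator psi with the density
    and adversarial losses, (2) update the discriminator theta with BCE."""
    # Step 1: update psi (discriminator weights stay fixed).
    opt_psi.zero_grad()
    loss_density = F.mse_loss(psi(img_src), gt_src)   # supervised, source only
    p_tgt = theta(psi(img_tgt))
    # Fool theta: push target regions toward the "source" label (1).
    loss_adv = F.binary_cross_entropy_with_logits(p_tgt, torch.ones_like(p_tgt))
    (loss_density + lambda_adv * loss_adv).backward()
    opt_psi.step()

    # Step 2: update theta on detached density maps (source=1, target=0).
    opt_theta.zero_grad()
    p_src = theta(psi(img_src).detach())
    p_tgt = theta(psi(img_tgt).detach())
    loss_disc = (F.binary_cross_entropy_with_logits(p_src, torch.ones_like(p_src))
                 + F.binary_cross_entropy_with_logits(p_tgt, torch.zeros_like(p_tgt)))
    loss_disc.backward()
    opt_theta.step()
        </preformat>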
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <p>We validate the proposed UDA method for density estimation and counting of traffic scenes
under different settings. First, we employ the NDISPark dataset and test the Day2Night
domain shift, considering pictures taken during the day as the source domain and night
images as the target domain. Then, we utilize the WebCamT dataset to take into account
the Camera2Camera performance gap, tackling the domain shift that takes place when we
consider a camera different from the ones used during the training phase. Finally, we use the
GTA dataset to assess the Synthetic2Real domain difference, training the algorithm using the
synthetic images and then testing it on real data, considering the WebCamT dataset again.</p>
      <p>For all the experiments, we base the evaluation of the models on three metrics widely used
for the counting task: (i) Mean Absolute Error (MAE), which measures the absolute count error
of each image; (ii) Mean Squared Error (MSE), which instead quantifies the squared count error
for each image; (iii) Average Relative Error (ARE), which measures the absolute count error
divided by the true count. Note that, as a result of the squaring of each error, the MSE effectively
penalizes large errors more heavily than small ones. Instead, the ARE is the only metric that
considers the relation between the error and the total number of vehicles present in each image.
Results are summarized in Table 1. We achieved better results compared to the basic model in
all the considered scenarios and for all three metrics.</p>
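      <p>For reference, the three metrics can be computed from per-image counts as in the following
sketch (ours; pred and true are hypothetical arrays of predicted and ground truth counts):</p>
      <preformat>
import numpy as np

def counting_metrics(pred, true):
    """MAE, MSE, and ARE over per-image predicted and true counts."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    err = np.abs(pred - true)
    mae = err.mean()
    mse = (err ** 2).mean()
    are = (err / true).mean()  # assumes each image contains at least one vehicle
    return mae, mse, are

print(counting_metrics(pred=[23, 11, 7], true=[25, 10, 7]))
# (1.0, 1.666..., 0.06)
      </preformat>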
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this article, we tackled the problem of determining the density and the number of objects
present in large sets of images. Building on a CNN-based density estimator, the proposed
methodology can generalize to new data sources for which there are no annotations available.
We achieved this generalization by exploiting an Unsupervised Domain Adaptation strategy,
whereby a discriminator attached to the output forces similar density distributions in the target
and source domains. Experiments show a significant improvement relative to the performance
of the model without domain adaptation. To the best of our knowledge, we are the first to
introduce a UDA scheme for counting to reduce the gap between the source and the target
domain without using additional labels. Given the conventional structure of the estimator, the
improvement obtained by just monitoring the output entails a great capacity to generalize
learned knowledge, thus suggesting the application of similar principles to the inner layers of
the network.</p>
      <p>Another contribution is the creation of two new per-pixel annotated datasets
made available to the scientific community. One of the two novel datasets is a synthetic dataset
created from a photo-realistic video game. Here the labels are automatically assigned while
interacting with the API of the graphical engine. Using this synthetic dataset, we demonstrated
that it is possible to train a model with a precisely annotated and automatically generated
synthetic dataset and perform UDA toward a real-world scenario, obtaining very good performance
without using additional manual annotations.</p>
      <p>In our view, this work’s outcome opens new perspectives for dealing with the scalability of
learning methods for large physical systems with scarce supervisory resources.</p>
      <p>This work was partially supported by the H2020 project AI4EU under GA 825619.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] G. Amato, L. Ciampi, F. Falchi, C. Gennaro, Counting vehicles with deep learning in onboard UAV imagery, in: 2019 IEEE Symposium on Computers and Communications, ISCC 2019, Barcelona, Spain, June 29 - July 3, 2019, IEEE, 2019, pp. 1-6. doi:10.1109/ISCC47284.2019.8969620.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] L. Ciampi, G. Amato, F. Falchi, C. Gennaro, F. Rabitti, Counting vehicles with cameras, in: S. Bergamaschi, T. D. Noia, A. Maurino (Eds.), Proceedings of the 26th Italian Symposium on Advanced Database Systems, Castellaneta Marina (Taranto), Italy, June 24-27, 2018, volume 2161 of CEUR Workshop Proceedings, CEUR-WS.org, 2018, pp. 1-8. URL: http://ceur-ws.org/Vol-2161/paper12.pdf.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] G. Amato, P. Bolettieri, D. Moroni, F. Carrara, L. Ciampi, G. Pieri, C. Gennaro, G. R. Leone, C. Vairo, A wireless smart camera network for parking monitoring, in: IEEE Globecom Workshops, GC Wkshps 2018, Abu Dhabi, United Arab Emirates, December 9-13, 2018, IEEE, 2018, pp. 1-6. doi:10.1109/GLOCOMW.2018.8644226.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] V. S. Lempitsky, A. Zisserman, Learning to count objects in images, in: J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, A. Culotta (Eds.), Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, 6-9 December 2010, Vancouver, British Columbia, Canada, Curran Associates, Inc., 2010, pp. 1324-1332. URL: https://proceedings.neurips.cc/paper/2010/hash/fe73f687e5bc5280214e0486b273a5f9-Abstract.html.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Y. Li, X. Zhang, D. Chen, CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes, in: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, IEEE Computer Society, 2018, pp. 1091-1100. doi:10.1109/CVPR.2018.00120.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] S. Zhang, G. Wu, J. P. Costeira, J. M. F. Moura, Understanding traffic density from large-scale web camera data, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, IEEE Computer Society, 2017, pp. 4264-4273. doi:10.1109/CVPR.2017.454.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] A. Torralba, A. A. Efros, Unbiased look at dataset bias, in: The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011, IEEE Computer Society, 2011, pp. 1521-1528. doi:10.1109/CVPR.2011.5995347.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>