<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Traffic Density Estimation via Unsupervised Domain Adaptation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Luca</forename><surname>Ciampi</surname></persName>
							<email>luca.ciampi@isti.cnr.it</email>
							<affiliation key="aff0">
								<orgName type="department">Institute of Information Science and Technologies</orgName>
								<orgName type="institution">National Research Council, Pisa</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Carlos</forename><surname>Santiago</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Instituto Superior Técnico (LARSyS/IST)</orgName>
								<address>
									<settlement>Lisbon</settlement>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">João</forename><forename type="middle">Paulo</forename><surname>Costeira</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Instituto Superior Técnico (LARSyS/IST)</orgName>
								<address>
									<settlement>Lisbon</settlement>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Claudio</forename><surname>Gennaro</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Information Science and Technologies</orgName>
								<orgName type="institution">National Research Council, Pisa</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giuseppe</forename><surname>Amato</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Information Science and Technologies</orgName>
								<orgName type="institution">National Research Council, Pisa</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Traffic Density Estimation via Unsupervised Domain Adaptation</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">AD78D85FF54F7BF032B44DB51C418A4A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T02:52+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Unsupervised Domain Adaptation</term>
					<term>Synthetic Datasets</term>
					<term>Deep Learning</term>
					<term>Counting Vehicles</term>
					<term>0000-0002-6985-0439 (L. Ciampi)</term>
					<term>0000-0002-4737-0020 (C. Santiago)</term>
					<term>0000-0001-6769-2935 (J. P. Costeira)</term>
					<term>0000-0002-3715-149X (C. Gennaro)</term>
					<term>0000-0003-0171-4315 (G. Amato)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Monitoring traffic flows in cities is crucial to improving urban mobility, and images are the best sensing modality to perceive and assess the flow of vehicles in large areas. However, current machine-learning-based technologies using images hinge on large quantities of annotated data, preventing their scalability to city scale as new cameras are added to the system. We propose a new methodology to design image-based vehicle density estimators that require little labeled data, via an unsupervised domain adaptation technique.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Traffic problems are constantly increasing, and tomorrow's cities can only be smart if they enable smart mobility. This issue is becoming ever more critical: congestion caused by the growing number of people traveling on road infrastructure imposes extra costs that make all activities more expensive and hamper development.</p><p>Smart mobility applications such as smart parking and road traffic management are nowadays widely employed worldwide, making our cities more livable, improving quality of life, reducing costs, and improving energy usage.</p><p>Images are probably the best sensing modality to perceive and assess the flow of vehicles in large areas. Like no other sensing mechanism, networks of city cameras can observe such large areas and simultaneously provide visual data to AI systems that extract relevant information from this deluge of data.</p><p>In this work, we propose a CNN-based system that can estimate traffic density and count the vehicles present in urban scenes directly on board smart city cameras, analyzing the images they capture. Current systems address the counting problem as a supervised learning process. They fall into two main classes of methods: a) detection-based approaches <ref type="bibr" target="#b0">[1]</ref><ref type="bibr" target="#b1">[2]</ref><ref type="bibr" target="#b2">[3]</ref> that try to identify and localize single instances of objects in the image, and b) density-based techniques that rely on regression to estimate a density map from the image, where the final count is given by summing all pixel values <ref type="bibr" target="#b3">[4]</ref>. Figure <ref type="figure" target="#fig_0">1</ref> illustrates the mapping of such a regression. 
Concerning vehicle counting in urban spaces, where images are of low resolution and most objects are partially occluded, density-based methods have a clear advantage over detection-based methods <ref type="bibr" target="#b4">[5]</ref><ref type="bibr" target="#b5">[6]</ref>.</p><p>However, since this class of approaches requires pixel-level ground truth for supervised learning, it may not generalize well to unseen images, especially when there is a large domain gap between the training (source) and test (target) sets, such as different camera perspectives, weather, or illumination. The direct transfer of features learned in one domain to another does not work well because the distributions differ. Thus, a model trained on the source domain usually experiences a drastic drop in performance when applied to the target domain. This problem is commonly referred to as Domain Shift <ref type="bibr" target="#b6">[7]</ref>, and it severely hampers the application of counting methods to very large-scale scenarios, since annotating images for all possible cases is unfeasible.</p><p>To mitigate this problem, we introduce a methodology that performs Unsupervised Domain Adaptation (UDA) among different scenarios. UDA techniques address the domain shift by exploiting a labeled source dataset and an unlabeled target one. The challenge is to automatically infer some knowledge from the target data that reduces the gap between the two domains. Specifically, in this work, we propose an end-to-end CNN-based UDA algorithm for traffic density estimation and counting, based on adversarial learning performed directly on the generated density maps, i.e., in the output space, since in this specific case the output space contains valuable information such as scene layout and context. 
We focus on vehicle counting, but the approach is suitable for counting other types of objects as well.</p><p>Another contribution of this work is the creation of two new per-pixel-annotated datasets made available to the scientific community. One of them is a collection of synthetic images taken from a photo-realistic video game, where the labels are automatically assigned by interacting with the API of the graphical engine. We conducted our experiments on these two datasets and on another collection of images already present in the literature, validating our approach over different types of domain shifts: i) the Camera2Camera domain shift, where the source images belong to specific cameras and the target ones are taken from different perspectives and contexts; ii) the Day2Night domain shift, where the source domain comprises images taken during the day and the target domain pictures taken at night; iii) the Synthetic2Real domain shift, where source images are collected using a video game and automatically annotated, while the target ones are real urban pictures. Experiments show a significant improvement compared to the performance of the model without domain adaptation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">The Datasets</head><p>This section describes the datasets exploited in this work, focusing mainly on the two novel datasets purposely created for it.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">NDISPark Dataset</head><p>The NDISPark (Night and Day Instance Segmented Park) dataset is a small, manually annotated dataset for counting cars in parking lots, consisting of about 250 images. This dataset is challenging and describes the most difficult situations that can be found in a real scenario: seven different cameras capture the images under various weather conditions and angles of view. Furthermore, pictures are taken both during the day and at night, showing markedly different lighting conditions. The images are precisely annotated with instance segmentation labels, which allowed us to generate accurate ground-truth density maps for the counting task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">GTA Dataset</head><p>The GTA (Grand Traffic Auto) dataset is a vast collection of about 15,000 synthetic images of urban traffic scenes collected from the highly photo-realistic video game GTA V (Grand Theft Auto V). We deploy a framework that can automatically and precisely annotate the vehicles present in the scene with per-pixel annotations. To the best of our knowledge, it is the first instance-segmentation synthetic dataset of city traffic scenarios. Figure <ref type="figure" target="#fig_1">2</ref> shows some example images belonging to this dataset, together with their annotations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">WebCamT Dataset</head><p>The WebCamT dataset is a collection of traffic scenes recorded using city cameras, introduced by <ref type="bibr" target="#b5">[6]</ref>. It is particularly challenging for analysis due to the low resolution (352 × 240), high occlusion, and strong perspective distortion. We considered images belonging to different cameras, consequently having different views.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Proposed Method</head><p>Our method relies on a CNN model trained end-to-end with adversarial learning in the output space (i.e., the density maps), which contains rich information such as scene layout and context. The peculiarity of our adversarial learning scheme is that it forces the predicted density maps in the target domain to have local similarities with the ones in the source domain.</p><p>Figure <ref type="figure" target="#fig_2">3</ref> depicts the proposed framework, consisting of two modules: 1) a CNN that predicts traffic density maps, from which we estimate the number of vehicles in the scene, and 2) a discriminator that identifies whether a density map (produced by the density map estimator) was generated from an image of the source domain or the target domain.</p><p>In the training phase, the density map predictor learns to map images to densities based on annotated data from the source domain. At the same time, it learns to predict realistic density maps for the target domain by trying to fool the discriminator with an adversarial loss. The discriminator's output is a pixel-wise classification of a low-resolution map, as illustrated in Figure <ref type="figure" target="#fig_2">3</ref>, where each pixel corresponds to a small region in the density map. Consequently, the output space is forced to be locally similar for both the source and target domains. In the inference phase, the discriminator is discarded, and only the density map predictor is used for the target images. We describe each module and how it is trained in the following subsections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Density Estimation Network</head><p>We formulate the counting task as a density map estimation problem <ref type="bibr" target="#b3">[4]</ref>. The density (intensity) of each pixel in the map depends on its proximity to a vehicle centroid and on the size of the vehicle in the image, so that each vehicle contributes a total value of 1 to the map. The map therefore provides statistical information about the vehicles' locations and allows the count to be estimated by summing all density values. This task is performed by a CNN-based model <ref type="bibr" target="#b4">[5]</ref>, whose goal is to automatically determine the vehicle density map associated with a given input image. Formally, the density map estimator, Ψ : ℛ^(𝒞×ℋ×𝒲) → ℛ^(ℋ×𝒲), transforms a 𝒲 × ℋ input image ℐ with 𝒞 channels into a density map 𝐷 = Ψ(ℐ) ∈ ℛ^(ℋ×𝒲).</p></div>
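As a concrete illustration of this formulation, a ground-truth density map with per-vehicle unit mass can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's actual pipeline (which derives per-vehicle extents from instance segmentation masks); the function name and the fixed Gaussian spreads are our own assumptions.

```python
import numpy as np

def gaussian_density_map(shape, centroids, sigmas):
    """Build a ground-truth density map: each vehicle adds a Gaussian blob
    centered on its centroid, normalized so it contributes total mass 1."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    density = np.zeros(shape)
    for (cy, cx), s in zip(centroids, sigmas):
        # sigma would be chosen from the vehicle's apparent size in the image
        blob = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * s ** 2))
        density += blob / blob.sum()  # per-vehicle normalization: sums to 1
    return density
```

Because each blob is normalized to unit mass, summing the map recovers the vehicle count, which is exactly the property the counting-by-density formulation relies on.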
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Discriminator Network</head><p>The discriminator network, denoted by Θ, also consists of a CNN model. It takes as input the density map, 𝐷, estimated by the network Ψ. Its output is a lower resolution probability map where each pixel represents the probability that the corresponding region (from the input density map) comes either from the source or the target domain. The goal of the discriminator is to learn to distinguish between density maps belonging to source or target domains. Through an adversarial loss, this discriminator will, in turn, force the density estimator to provide density maps with similar distributions in both domains. In other words, the target domain density maps have to look realistic, even though the network Ψ was not trained with an annotated training set from that domain.</p></div>
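A toy version of such a fully-convolutional discriminator can be sketched with plain NumPy strided convolutions. This is only an illustration of the input/output behavior (a density map in, a lower-resolution probability map out); the real discriminator is a CNN trained by backpropagation, and the kernel size, stride, and leaky-ReLU activation below are our assumptions.

```python
import numpy as np

def conv2d(x, k, stride=2):
    """Valid-mode strided 2D convolution for a single-channel map."""
    kh, kw = k.shape
    H, W = x.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * k)
    return out

def discriminator(density_map, kernels, stride=2):
    """Tiny fully-convolutional discriminator: stacked strided convolutions
    with leaky-ReLU, then a sigmoid, yielding a low-resolution map where
    each pixel is the probability that the region comes from the source."""
    x = density_map
    for k in kernels[:-1]:
        x = conv2d(x, k, stride)
        x = np.where(x > 0, x, 0.1 * x)  # leaky ReLU
    x = conv2d(x, kernels[-1], stride)
    return 1.0 / (1.0 + np.exp(-x))      # sigmoid -> probabilities in (0, 1)
```

Each output pixel aggregates a receptive field in the input density map, which is what makes the adversarial signal local rather than a single image-level real/fake decision.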
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Domain Adaptation Learning</head><p>The proposed framework is trained based on an alternate optimization of the density estimation network, Ψ, and the discriminator network, Θ. Regarding the former, the training process relies on two components: 1) density estimation using pairs of images and ground truth density maps, which we assume are only available in the source domain; and 2) adversarial training, which aims to make the discriminator fail to distinguish between the source and target domains. As for the latter, images from both domains are used to train the discriminator on correctly classifying each pixel of the probability map as either source or target. To implement the above training procedure, we use two loss functions: one is employed in the first step of the algorithm to train network Ψ, and the other is used in the second step to train the discriminator Θ. These loss functions are detailed next.</p><p>Network Ψ Training. We formulate the loss function for Ψ as the sum of two main components:</p><formula xml:id="formula_0">ℒ(ℐ 𝒮 , ℐ 𝒯 ) = ℒ 𝑑𝑒𝑛𝑠𝑖𝑡𝑦 (ℐ 𝒮 ) + 𝜆 𝑎𝑑𝑣 ℒ 𝑎𝑑𝑣 (ℐ 𝒯 ),<label>(1)</label></formula><p>where ℒ 𝑑𝑒𝑛𝑠𝑖𝑡𝑦 is the loss computed using ground truth annotations available in the source domain, while ℒ 𝑎𝑑𝑣 is the adversarial loss that is responsible for making the distribution of the target and the source domain closer to each other. In particular, we define the density loss ℒ 𝑑𝑒𝑛𝑠𝑖𝑡𝑦 as the mean square error between the predicted and ground truth density maps, i.e. ℒ 𝑑𝑒𝑛𝑠𝑖𝑡𝑦 = 𝑀 𝑆𝐸(𝐷 𝒮 , 𝐷 𝒮_𝒢𝒯 ).</p><p>To compute the adversarial loss ℒ 𝑎𝑑𝑣 , we first forward the images belonging to the target domain through network Ψ, to generate the predicted density maps 𝐷 𝒯 . Then, we forward 𝐷 𝒯 through network Θ, to generate the probability map 𝑃 = Θ(Ψ(ℐ 𝒯 )) ∈ [0, 1] 𝐻 ′ ×𝑊 ′ , where 𝐻 ′ &lt; 𝐻 and 𝑊 ′ &lt; 𝑊 . 
The adversarial loss is given by</p><formula xml:id="formula_1">ℒ 𝑎𝑑𝑣 (ℐ 𝒯 ) = − ∑ ℎ,𝑤 log(𝑃 ℎ,𝑤 ),<label>(2)</label></formula><p>where the subscript ℎ, 𝑤 denotes a pixel in 𝑃 . This loss makes the distribution of 𝐷 𝒯 closer to 𝐷 𝒮 by forcing Ψ to fool the discriminator, through the maximization of the probability of 𝐷 𝒯 being locally classified as belonging to the source domain.</p><p>Network Θ Training. Given an image ℐ and the corresponding predicted density map 𝐷, we feed 𝐷 as input to the fully-convolutional discriminator Θ to obtain the probability map 𝑃 . The discriminator is trained by comparing 𝑃 with the ground truth label map 𝑌 ∈ {0, 1} 𝐻 ′ ×𝑊 ′ using a pixel-wise binary cross-entropy loss</p><formula xml:id="formula_2">ℒ 𝑑𝑖𝑠𝑐 (ℐ) = − ∑ ℎ,𝑤 (1 − 𝑌 ℎ,𝑤 ) log(1 − 𝑃 ℎ,𝑤 ) + 𝑌 ℎ,𝑤 log(𝑃 ℎ,𝑤 ),<label>(3)</label></formula><p>where 𝑌 ℎ,𝑤 = 0 ∀ ℎ, 𝑤 if ℐ is taken from the target domain and 𝑌 ℎ,𝑤 = 1 otherwise.</p></div>
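The three loss terms of this training scheme (the MSE density loss, the adversarial loss of Eq. (2), and the discriminator loss of Eq. (3)) translate directly into code. The following is a minimal NumPy sketch, assuming the predicted maps and the discriminator's probability map are given; the small epsilon guarding the logarithms is our addition for numerical safety.

```python
import numpy as np

EPS = 1e-12  # avoids log(0); not part of the formulas in the paper

def density_loss(D_pred, D_gt):
    """Source-domain term: mean squared error between predicted and GT maps."""
    return np.mean((D_pred - D_gt) ** 2)

def adversarial_loss(P_target):
    """Eq. (2): pushes target density maps to be locally classified as
    source (i.e., probabilities P close to 1), fooling the discriminator."""
    return -np.sum(np.log(P_target + EPS))

def discriminator_loss(P, from_source):
    """Eq. (3): pixel-wise binary cross-entropy with label map Y,
    where Y = 1 everywhere for source images and Y = 0 for target images."""
    Y = np.ones_like(P) if from_source else np.zeros_like(P)
    return -np.sum((1 - Y) * np.log(1 - P + EPS) + Y * np.log(P + EPS))
```

In the alternating scheme described above, Ψ would be updated on `density_loss + λ_adv * adversarial_loss` while Θ is updated on `discriminator_loss` for batches from both domains.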
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Results</head><p>We validate the proposed UDA method for density estimation and counting of traffic scenes under different settings. First, we employ the NDISPark dataset and test the Day2Night domain shift, considering pictures taken during the day as the source domain and night images as the target domain. Then, we utilize the WebCamT dataset to take into account the Camera2Camera performance gap, tackling the domain shift that takes place when we consider a camera different from the ones used during the training phase. Finally, we use the GTA dataset to assess the Synthetic2Real domain difference, training the algorithm on the synthetic images and then testing it on real data, considering the WebCamT dataset again. For all the experiments, we base the evaluation of the models on three metrics widely used for the counting task: (i) Mean Absolute Error (MAE), which measures the absolute count error of each image; (ii) Mean Squared Error (MSE), which instead quantifies the squared count error for each image; (iii) Average Relative Error (ARE), which measures the absolute count error divided by the true count. Note that, as a result of the squaring of each error, the MSE effectively penalizes large errors more heavily than small ones. The ARE, instead, is the only metric that considers the relation between the count error and the total number of vehicles present in each image. Results are summarized in Table <ref type="table" target="#tab_0">1</ref>. We achieved better results than the baseline model in all the considered scenarios and for all three metrics.</p></div>
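The three evaluation metrics can be computed as follows (a straightforward NumPy sketch; the helper name is ours):

```python
import numpy as np

def counting_metrics(pred_counts, true_counts):
    """MAE, MSE, and ARE over per-image predicted and true vehicle counts."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    err = pred - true
    mae = np.mean(np.abs(err))         # Mean Absolute Error
    mse = np.mean(err ** 2)            # Mean Squared Error (penalizes outliers)
    are = np.mean(np.abs(err) / true)  # Average Relative Error (error vs. true count)
    return mae, mse, are
```

Note how the same absolute error of two vehicles yields a different ARE depending on how crowded the image is, which is why ARE complements MAE and MSE.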
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>In this article, we tackled the problem of determining the density and the number of objects present in large sets of images. Building on a CNN-based density estimator, the proposed methodology can generalize to new data sources for which no annotations are available. We achieved this generalization by exploiting an Unsupervised Domain Adaptation strategy, whereby a discriminator attached to the output forces similar density distributions in the target and source domains. Experiments show a significant improvement relative to the performance of the model without domain adaptation. To the best of our knowledge, we are the first to introduce a UDA scheme for counting that reduces the gap between the source and the target domain without using additional labels. Given the conventional structure of the estimator, the improvement obtained by monitoring just the output entails a great capacity to generalize learned knowledge, thus suggesting the application of similar principles to the inner layers of the network. Another contribution is the creation of two new per-pixel-annotated datasets made available to the scientific community. One of them is a synthetic dataset created from a photo-realistic video game, where the labels are automatically assigned by interacting with the API of the graphical engine. 
Using this synthetic dataset, we demonstrated that it is possible to train a model with a precisely annotated and automatically generated synthetic dataset and perform UDA toward a real-world scenario, obtaining very good performance without using additional manual annotations.</p><p>In our view, this work's outcome opens new perspectives to deal with the scalability of learning methods for large physical systems with scarce supervisory resources.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Example of an image with the bounding box annotations (left) and the corresponding density map that sums to the counting value (right).</figDesc><graphic coords="2,298.24,84.19,88.01,88.07" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Some examples of images of our Grand Traffic Auto dataset, together with the automatically generated instance segmentation annotations.</figDesc><graphic coords="4,128.18,148.93,104.17,61.16" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Algorithm overview.Given 𝐶 × 𝐻 × 𝑊 images from source and target domains, we pass them through the density map estimation network to obtain output predictions. A density loss is computed for source predictions based on the ground truth. In order to improve target predictions, a discriminator is used to locally classify whether a density map belongs to the source or target domain. Then, an adversarial loss is computed on the target prediction and is back-propagated to the density map estimation and counting network.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Experimental results obtained for the three considered domain shifts in terms of MAE, MSE, and ARE. We achieved performance improvements in all scenarios, considering all three metrics.</figDesc><table><row><cell></cell><cell>MAE</cell><cell>MSE</cell><cell>ARE</cell></row><row><cell cols="3">Day2Night Domain Shift -NDISPark Dataset</cell><cell></cell></row><row><cell>Baseline -CSRNet [5]</cell><cell>3.95</cell><cell>27.45</cell><cell>0.43</cell></row><row><cell>Our Approach</cell><cell>3.49</cell><cell>20.90</cell><cell>0.39</cell></row><row><cell cols="3">Camera2Camera Domain Shift -WebCamT Dataset [6]</cell><cell></cell></row><row><cell>Baseline -CSRNet [5]</cell><cell>3.24</cell><cell>16.83</cell><cell>0.21</cell></row><row><cell>Our Approach</cell><cell>2.86</cell><cell>13.03</cell><cell>0.19</cell></row><row><cell cols="3">Synthetic2Real Domain Shift -GTA Dataset</cell><cell></cell></row><row><cell>Baseline -CSRNet [5]</cell><cell>4.10</cell><cell>25.83</cell><cell>0.28</cell></row><row><cell>Our Approach</cell><cell>3.88</cell><cell>23.80</cell><cell>0.27</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was partially supported by H2020 project AI4EU under GA 825619.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Counting vehicles with deep learning in onboard UAV imagery</title>
		<author>
			<persName><forename type="first">G</forename><surname>Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ciampi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Falchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gennaro</surname></persName>
		</author>
		<idno type="DOI">10.1109/ISCC47284.2019.8969620</idno>
		<ptr target="https://doi.org/10.1109/ISCC47284.2019.8969620" />
	</analytic>
	<monogr>
		<title level="m">2019 IEEE Symposium on Computers and Communications, ISCC 2019</title>
				<meeting><address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2019-07-03">June 29 - July 3, 2019</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Counting vehicles with cameras</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ciampi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Falchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gennaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rabitti</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-2161/paper12.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th Italian Symposium on Advanced Database Systems</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">S</forename><surname>Bergamaschi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><forename type="middle">D</forename><surname>Noia</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Maurino</surname></persName>
		</editor>
		<meeting>the 26th Italian Symposium on Advanced Database Systems<address><addrLine>Castellaneta Marina (Taranto), Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">June 24-27, 2018</date>
			<biblScope unit="volume">2161</biblScope>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A wireless smart camera network for parking monitoring</title>
		<author>
			<persName><forename type="first">G</forename><surname>Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bolettieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Moroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Carrara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ciampi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Pieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gennaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">R</forename><surname>Leone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Vairo</surname></persName>
		</author>
		<idno type="DOI">10.1109/GLOCOMW.2018.8644226</idno>
		<ptr target="https://doi.org/10.1109/GLOCOMW.2018.8644226" />
	</analytic>
	<monogr>
		<title level="m">IEEE Globecom Workshops, GC Wkshps 2018</title>
				<meeting><address><addrLine>Abu Dhabi, United Arab Emirates</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">December 9-13, 2018</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Learning to count objects in images</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">S</forename><surname>Lempitsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper/2010/hash/fe73f687e5bc5280214e0486b273a5f9-Abstract.html" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010</title>
				<editor>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Lafferty</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><forename type="middle">K I</forename><surname>Williams</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Shawe-Taylor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Zemel</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Culotta</surname></persName>
		</editor>
		<meeting><address><addrLine>Vancouver, British Columbia, Canada</addrLine></address></meeting>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2010-12-09">December 6-9, 2010</date>
			<biblScope unit="page" from="1324" to="1332" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Csrnet: Dilated convolutional neural networks for understanding the highly congested scenes</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2018.00120</idno>
		<ptr target="http://openaccess.thecvf.com/content_cvpr_2018/html/Li_CSRNet_Dilated_Convolutional_CVPR_2018_paper.html" />
	</analytic>
	<monogr>
		<title level="m">IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018</title>
				<meeting><address><addrLine>Salt Lake City, UT, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2018-06-18">June 18-22, 2018</date>
			<biblScope unit="page" from="1091" to="1100" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Understanding traffic density from large-scale web camera data</title>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Costeira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M F</forename><surname>Moura</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2017.454</idno>
		<ptr target="http://doi.ieeecomputersociety.org/10.1109/CVPR.2017.454" />
	</analytic>
	<monogr>
		<title level="m">2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017</title>
				<meeting><address><addrLine>Honolulu, HI, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2017">July 21-26, 2017</date>
			<biblScope unit="page" from="4264" to="4273" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Unbiased look at dataset bias</title>
		<author>
			<persName><forename type="first">A</forename><surname>Torralba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Efros</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2011.5995347</idno>
		<ptr target="https://doi.org/10.1109/CVPR.2011.5995347" />
	</analytic>
	<monogr>
		<title level="m">The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011</title>
				<meeting><address><addrLine>Colorado Springs, CO, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2011-06-25">June 20-25, 2011</date>
			<biblScope unit="page" from="1521" to="1528" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
