<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Multi-robot Sanitization of Railway Stations Based on Deep Q-Learning</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Riccardo</forename><surname>Caccavale</surname></persName>
							<email>riccardo.caccavale@unina.it</email>
							<affiliation key="aff0">
								<orgName type="institution">Università degli studi di Napoli &quot;Federico II&quot;</orgName>
								<address>
									<settlement>Naples</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vincenzo</forename><surname>Calà</surname></persName>
							<email>v.cala@rfi.it</email>
							<affiliation key="aff1">
								<orgName type="institution">Rete Ferroviaria Italiana</orgName>
								<address>
									<settlement>Rome</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mirko</forename><surname>Ermini</surname></persName>
							<email>mi.ermini@rfi.it</email>
							<affiliation key="aff2">
								<orgName type="institution">Rete Ferroviaria Italiana</orgName>
								<address>
									<addrLine>Firenze Osmannoro</addrLine>
									<settlement>Florence</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alberto</forename><surname>Finzi</surname></persName>
							<email>alberto.finzi@unina.it</email>
							<affiliation key="aff0">
								<orgName type="institution">Università degli studi di Napoli &quot;Federico II&quot;</orgName>
								<address>
									<settlement>Naples</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vincenzo</forename><surname>Lippiello</surname></persName>
							<email>vincenzo.lippiello@unina.it</email>
							<affiliation key="aff0">
								<orgName type="institution">Università degli studi di Napoli &quot;Federico II&quot;</orgName>
								<address>
									<settlement>Naples</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Fabrizio</forename><surname>Tavano</surname></persName>
							<email>fabrizio.tavano@unina.it</email>
							<affiliation key="aff0">
								<orgName type="institution">Università degli studi di Napoli &quot;Federico II&quot;</orgName>
								<address>
									<settlement>Naples</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Rete Ferroviaria Italiana</orgName>
								<address>
									<settlement>Rome</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Multi-robot Sanitization of Railway Stations Based on Deep Q-Learning</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">EA4EC7A14D616D5214305B15A6178583</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T15:00+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Deep Reinforcement Learning</term>
					<term>Multi-robot Systems</term>
					<term>Experience Replay Buffer</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Sanitizing railway stations has become a pressing issue, especially in light of the recent evolution of the Covid-19 pandemic. In this work, we propose a multi-robot approach to sanitizing railway stations based on a distributed Deep Q-Learning technique. The framework relies on anonymous information from existing WiFi networks to localize passengers inside the station and to build a map of potentially risky areas to be sanitized. Starting from this map, a swarm of cleaning robots, each endowed with a robot-specific convolutional neural network, learns online how to cooperate inside the station in order to maximize the sanitized area depending on the presence of passengers.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In recent years, the spread of diseases such as Covid-19 has emphasized the problem of sanitizing large and crowded public environments like railway stations. In the present work, our aim is to design a sanitizing solution based on the Deep Q-Learning technique for a real case study of interest to the Italian railway infrastructure manager RFI S.p.A., set in a real environment: Roma Termini, the most important railway station of the Italian capital. The framework relies on anonymous information from existing WiFi networks to localize passengers inside the station and to build a map of potentially risky areas to be sanitized. Starting from this map, we propose a decentralized approach where a swarm of cleaning robots, each endowed with a robot-specific convolutional neural network, learns online how to cooperate inside the station in order to maximize the sanitized area depending on the presence of passengers. In the multi-robot sanitizing literature, the prominent approach is based on coverage path planning (CPP) <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref>, where the area to sanitize is divided among the agents in order to cover the whole space. These approaches are suitable for cleaning and sanitizing the environment with a scalable number of robots, but prioritization issues are rarely considered. Multi-agent reinforcement learning (MARL) frameworks are often proposed to ensure flexibility and scalability in different applications like exploration <ref type="bibr" target="#b5">[6]</ref>, construction <ref type="bibr" target="#b6">[7]</ref>, or target-capturing <ref type="bibr" target="#b7">[8]</ref>, but in this case too priority-based cleaning is not commonly addressed. 
An interesting approach is proposed in <ref type="bibr" target="#b7">[8]</ref>, where multiple agents learn a collaborative policy in a distributed fashion within a shared environment, using the A3C training method to accomplish a target-capturing task. Inspired by these approaches, we propose a scalable multi-robot sanitizing framework where multiple mobile robots learn to cooperate during the execution of cleaning tasks in large crowded environments, introducing a priority-based strategy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">The architecture</head><p>Our multi-robot sanitizing problem can be described as follows. Starting from a grid map 𝑀 representing the environment to be sanitized, we define 𝑆 as the set of possible heatmaps (i.e., priority distributions) on the map 𝑀, and 𝑋 as the set of possible obstacle-free positions of 𝑀. In this setting, we assume 𝑘 agents tasked with sanitizing the environment 𝑀, each endowed with a set of single-agent actions 𝐴. Our aim is to find a set of agent-specific strategies (𝜋₁, …, 𝜋ₖ) such that each 𝜋ᵢ : 𝑆 × 𝑋 → 𝐴 drives an agent towards prioritized areas, in coordination with the other agents, in order to maximize the global cleaning effect. This distributed approach is mainly designed to support scalability: we adopt a client-server approach, where each agent (client) learns a decoupled agent-specific strategy by communicating with a central system (server).</p><p>The overall architecture is depicted in Figure <ref type="figure" target="#fig_0">1</ref>. The framework is composed of a set of intelligent agents, representing mobile cleaning robots, each communicating with the central server. The role of the server (server-side) is to merge the outcomes of the agents' activities with (anonymized) data about people's positions in order to produce a heatmap of the risky areas to be sanitized. The role of each agent (agent-side) is to process the heatmap by means of an agent-specific Deep Q-Network (DQN) and to update the local strategy 𝜋ᵢ considering the environmental settings and the different priorities in the map. In this framework, the cleaning priority is defined as a heatmap, whose hot/cold points are high/low priority areas to be sanitized. Following this perspective, a state-position pair (𝑠, 𝑥) ∈ 𝑆 × 𝑋 is defined as a 2-channel matrix of size 𝑚 × 𝑛 × 2, where 𝑚 and 𝑛 are the width and the height of the environment, respectively. 
The first channel 𝑠 of the matrix represents the cleaning priority over the environment; its elements are real numbers in the interval [0, 1], where 1 is the maximum priority and 0 means that no cleaning is needed. The second channel 𝑥 is a binary matrix representing the position and size of the cleaning area of the robot, which is 1 for the portions of the environment that are within the range of the robot's cleaning effect, and 0 otherwise. The priority channel can be displayed as a heatmap (see the map in Figure <ref type="figure" target="#fig_0">1</ref>), where black pixels have 0 priority, while colors from red to yellow denote increasingly higher priorities.</p><p>In our framework, the update of priorities is performed by the server, which collects the outputs of the single agents and integrates them considering the position of people and obstacles. More specifically, the cleaning priority is computed from the position of clusters of people by modeling the possible spreading of viruses or bacteria. In our setting, we exploit the periodic convolution of a Gaussian filter 𝒩(𝜇, 𝜎²) every 𝜓 steps, where 𝜇, 𝜎² and 𝜓 are suitable parameters that can be tuned depending on the meters/pixels ratio, the timestep, and the considered type of spreading (in this work we assume a setting inspired by the airborne diffusion of Covid-19 <ref type="bibr" target="#b8">[9]</ref>). Here, starting from a set of randomly generated clusters, the probability distribution evolves through the iterative convolution of the Gaussian filter. Each application of the convolution incrementally reduces the magnitude of the elements of the heatmap matrix, while distributing the priority over a wider area. Convolution is exploited here to simulate the attenuation and the spreading of the contamination over time. We have chosen the parameters of the Gaussian function so that the area affected by the infection has a radius of 5 meters (𝜇 = 0, 𝜎 = 0.9). 
This value is selected considering two factors, which together account for the 5-meter radius: the position of a cluster of people is known with an average WiFi positioning error of about 3 meters, as described in <ref type="bibr" target="#b9">[10]</ref>, and the safety distance is about 2 meters between people wearing the recommended surgical masks during the current emergency caused by the spread of Covid-19 <ref type="bibr" target="#b8">[9]</ref>. In the map (see Figure <ref type="figure" target="#fig_1">2</ref>, right) there are several black areas (0 priority), which are regions associated with the static obstacles of the environment (shops, rooms and walls inside the station). These areas are assumed to be always clean, hence unattractive for the robots. When an agent moves in the environment with an action 𝑎ᵢ ∈ 𝐴, the region in the neighborhood of the newly reached position is cleaned by the server, which sets the associated priority level to 0. In our framework, we propose a simple multi-agent variation of the experience replay method proposed in <ref type="bibr" target="#b10">[11]</ref>. Following this approach, each of the 𝑘 agents is endowed with a specific replay buffer, along with specific target and main DQNs, which are synchronously updated with respect to the position of the agent and to the shared environment provided by the server (see Figure <ref type="figure" target="#fig_0">1</ref>). The local reward function 𝑟ᵢ is designed to drive the agents toward prioritized areas of the environment (hot points), while avoiding obstacles and already visited areas (cold points). To this end, we first introduce a cumulative priority function 𝑐𝑝ᵢ that summarizes the importance of a cleaned area:</p><formula xml:id="formula_0" notation="TeX">cp_i = \sum_{(j,l)} s_i(j,l)\, x_i(j,l)<label>(1)</label></formula><p>defined as the sum of the element-wise priorities from matrix 𝑠ᵢ in the area sanitized by agent 𝑖 (i.e., where 𝑥ᵢ(𝑗, 𝑙) = 1). 
The value in Equation 1 is then exploited to define the reward 𝑟ᵢ for agent 𝑖:</p><formula xml:id="formula_1" notation="TeX">r_i = \begin{cases} cp_i &amp; \text{if } cp_i &gt; 0 \\ penalty &amp; \text{otherwise} \end{cases}<label>(2)</label></formula><p>Specifically, when an agent 𝑖 sanitizes a priority area, the reward is equal to the cumulative value 𝑐𝑝ᵢ; otherwise, if no priority is associated with the cleaned area (i.e., 𝑐𝑝ᵢ = 0), a negative reward 𝑝𝑒𝑛𝑎𝑙𝑡𝑦 &lt; 0 is earned (we empirically set 𝑝𝑒𝑛𝑎𝑙𝑡𝑦 = −2 for our case studies). This way, agents receive a reward that is proportional to the importance of the sanitized area, while routes toward zero-priority areas, such as obstacles or clean regions, are discouraged. Notice that in this framework, when the action of an agent leads to an obstacle (collision), no motion is performed. This behavior penalizes the agent (no further cleaning is performed), thus producing an indirect drive towards collision-free paths. Moreover, as an agent moves through the environment it leaves a wake of cleaned space behind. Since the priority of already visited areas is 0, agents can indirectly observe each other's behavior from the priority update, thus avoiding explicit communication. Hence, the robots in our experiments are not directly aware of the positions of the other agents, which are indirectly estimated from their paths.</p></div>
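<div xmlns="http://www.tei-c.org/ns/1.0"><p>The priority update and the reward of Equations 1 and 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the square cleaning footprint, the kernel truncated at two pixels, 𝜎 expressed in pixels, and the zero-valued border are all assumptions made for illustration, and obstacles are not modeled. The reward is computed on the heatmap before the server resets the cleaned cells, which is also an assumption about the ordering.</p>
<p><code lang="python"><![CDATA[
```python
import math

PENALTY = -2.0  # penalty value reported in the text for the case studies


def gaussian_kernel(sigma, half_width):
    """Zero-mean 1-D Gaussian weights, normalized to sum to 1."""
    offsets = range(-half_width, half_width + 1)
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in offsets]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(grid, kernel):
    """Convolve each row with the kernel; cells outside the map count as 0."""
    half = len(kernel) // 2
    cols = len(grid[0])
    return [
        [
            sum(
                kernel[d + half] * row[c + d]
                for d in range(-half, half + 1)
                if 0 <= c + d < cols
            )
            for c in range(cols)
        ]
        for row in grid
    ]


def transpose(grid):
    """Swap rows and columns so that blur_rows can act on columns."""
    return [list(column) for column in zip(*grid)]


def diffuse(heat, sigma=0.9, half_width=2):
    """One Gaussian convolution of the heatmap, applied as two 1-D passes."""
    kernel = gaussian_kernel(sigma, half_width)
    return transpose(blur_rows(transpose(blur_rows(heat, kernel)), kernel))


def footprint(rows, cols, position, radius=1):
    """Binary channel x_i: 1 inside an (assumed) square cleaning range."""
    r0, c0 = position
    return [
        [1 if abs(r - r0) <= radius and abs(c - c0) <= radius else 0
         for c in range(cols)]
        for r in range(rows)
    ]


def cumulative_priority(s, x):
    """Equation 1: sum over all cells of s(j, l) * x(j, l)."""
    return sum(
        s_jl * x_jl
        for s_row, x_row in zip(s, x)
        for s_jl, x_jl in zip(s_row, x_row)
    )


def reward(s, x):
    """Equation 2: cp_i if it is positive, the penalty otherwise."""
    cp = cumulative_priority(s, x)
    return cp if cp > 0 else PENALTY


def clean(s, x):
    """Server-side reset: priority becomes 0 wherever x is 1."""
    return [
        [0.0 if x_jl else s_jl for s_jl, x_jl in zip(s_row, x_row)]
        for s_row, x_row in zip(s, x)
    ]


# A single hot cell in a 5 x 5 map, diffused once and then visited twice.
heat = [[0.0] * 5 for _ in range(5)]
heat[2][2] = 1.0
heat = diffuse(heat)
area = footprint(5, 5, (2, 2))
first_visit = reward(heat, area)    # positive: the area still has priority
heat = clean(heat, area)
second_visit = reward(heat, area)   # equals PENALTY: nothing left to clean
```
]]></code></p>
<p>In this sketch, one diffusion pass lowers the peak of the hot cell and spreads its priority to the neighbouring cells, while a second pass of the agent over the region it has just cleaned yields the penalty. This is the mechanism the text relies on to discourage agents from revisiting areas that are already clean.</p></div>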
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Case Studies</head><p>A graphical representation of the environment is shown in Figure <ref type="figure" target="#fig_1">2</ref>. We selected a region of 100 × 172 meters in front of the rails, where people usually stand waiting for incoming trains. Within that region we also marked shops, stairs and walls as obstacles to be avoided by the robots during the sanitizing process (black areas in Figure 2, right). Agents can move by one pixel in any direction, hence the set 𝐴 includes 8 actions (4 linear and 4 diagonal); if an action leads to an inconsistent location (obstacle or out of bounds), the agent stays in its current location. In this setting, we propose two case studies: in the first one we assess the system performance during the learning phase considering different numbers of robots (2 to 8 robots) while, in the second one, a more realistic scenario is considered, where the cleaning performance of the robots is assessed considering an increasing number of moving clusters. In the first case study, we show how the learning performance of the proposed approach scales with the number of cleaning agents. The starting point of every robot in the heatmap is set at random, because we want to find a solution that is independent of this initial condition. We designed a training process where, at the beginning of each episode, a random number of clusters is selected and each cluster is randomly positioned inside the station. Specifically, each obstacle-free location of the map has a 0.02 probability of generating a cluster. Each episode ends when the agents have cleaned 98% of the map or when a timeout is reached (400 steps are performed). During the training process we collect the overall reward, as the sum of the single agents' rewards, and the number of steps needed to accomplish the task. 
This setting is intentionally designed to train agents to address a generic distribution of priorities, which can be generated during daily cleaning processes. As for the execution time, the number of steps needed to accomplish the task, namely to clean 98% of the map, decreases as the number of agents increases. Specifically, the 2-agent configuration needs 174 steps on average to accomplish the task, while the 4-, 6- and 8-agent configurations need 127, 112, and 94 steps, with a time reduction of 27%, 12%, and 16% relative to the preceding configuration, respectively. Also in this case, the time reduction indicates that the proposed approach successfully scales to different numbers of robots.</p><p>In order to assess the performance of the system in more realistic scenarios, we propose a different setting by considering different numbers of clusters and a simulated WiFi server that periodically updates the position of the clusters at a specific rate (once every 15 steps). The numbers of clusters have been selected according to the average number of visitors per hour in the considered portion of the station (see Figure <ref type="figure" target="#fig_1">2</ref>); moreover, during the runs, the values are designed to be randomly reduced by up to 30% in order to simulate the departure/arrival of passengers in the station.</p></div>
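<div xmlns="http://www.tei-c.org/ns/1.0"><p>The time reductions reported for the first case study can be checked with a few lines of Python. The sketch below assumes that each percentage is computed relative to the configuration with two fewer agents, which is the reading that reproduces the figures in the text; the variable and function names are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
# Average steps needed to clean 98% of the map, as reported in the text.
AVG_STEPS = {2: 174, 4: 127, 6: 112, 8: 94}


def step_reductions(avg_steps):
    """Percentage reduction of each configuration w.r.t. the previous one."""
    agents = sorted(avg_steps)
    reductions = {}
    for prev, cur in zip(agents, agents[1:]):
        saved = avg_steps[prev] - avg_steps[cur]
        reductions[cur] = round(100 * saved / avg_steps[prev])
    return reductions


print(step_reductions(AVG_STEPS))  # {4: 27, 6: 12, 8: 16}
```
]]></code></p>
<p>Under this reading, the largest relative gain comes from moving from 2 to 4 agents, while later additions still shorten the task but by a smaller fraction.</p></div>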
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>In this work we proposed a scalable multi-robot sanitizing framework based on a distributed Deep Q-Learning technique, suitable for the efficient cleaning of large and crowded indoor environments such as railway stations. The proposed simulated experiments indicate that, as expected, the cleaning performance of the framework improves with the number of robots and degrades as the number of people in the station grows. To assess the performance of our framework we proposed a worst-case test, where a large number of moving people is scattered (uniformly distributed) all around the station and robots must cover a wide area to perform the task. This setting is challenging compared to a real railway station, where people are often grouped near specific areas like shops, info points or ticket offices (see the example in Figure <ref type="figure" target="#fig_1">2</ref>, left) and robots can easily converge on those areas to maximize the sanitization effect. As future research activities, we plan to extend our pilot study by testing the proposed framework in a more realistic scenario, considering more complex robotic models and daily recorded data about the real distribution of people in the station.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Graphical representation of the framework including multiple agents (left) endowed with agent-specific experience replay buffers and networks, and a single server (right) exploiting WiFi statistics to provide a heatmap of priorities (red to yellow spots) for the agents.</figDesc><graphic coords="2,280.22,132.48,184.60,138.49" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Example of the distribution of people inside the Termini station as retrieved from the Cisco Meraki WiFi network (left) and comparison of the 0 to 8 robot settings (5 pictures on the right) considering the simulated environment with 700 random dynamic clusters after 90 steps of execution.</figDesc><graphic coords="5,183.44,58.43,300.95,225.76" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Complete coverage navigation of cleaning robots using triangular-cell-based map</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Oh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">H</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">B</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zheng</surname></persName>
		</author>
		<idno type="DOI">10.1109/TIE.2004.825197</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Industrial Electronics</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="page" from="718" to="726" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Scalable coverage path planning for cleaning robots using rectangular map decomposition on large environments</title>
		<author>
			<persName><forename type="first">X</forename><surname>Miao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B.-Y</forename><surname>Kang</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2018.2853146</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="38200" to="38215" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Sector-based maximal online coverage of unknown environments for cleaning robots with limited sensing</title>
		<author>
			<persName><forename type="first">T.-K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Baek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-Y</forename><surname>Oh</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.robot.2011.05.005</idno>
		<ptr target="https://doi.org/10.1016/j.robot.2011.05.005" />
	</analytic>
	<monogr>
		<title level="j">Robotics and Autonomous Systems</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="page" from="698" to="710" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Complete coverage algorithm based on linked smooth spiral paths for mobile robots</title>
		<author>
			<persName><forename type="first">T.-K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-H</forename><surname>Baek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-Y</forename><surname>Oh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-H</forename><surname>Choi</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICARCV.2010.5707264</idno>
	</analytic>
	<monogr>
		<title level="m">11th International Conference on Control Automation Robotics Vision</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="609" to="614" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Smooth coverage path planning and control of mobile robots based on high-resolution grid map representation</title>
		<author>
			<persName><forename type="first">T.-K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-H</forename><surname>Baek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-H</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-Y</forename><surname>Oh</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.robot.2011.06.002</idno>
		<ptr target="https://doi.org/10.1016/j.robot.2011.06.002" />
	</analytic>
	<monogr>
		<title level="j">Robotics and Autonomous Systems</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="page" from="801" to="812" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Mrcdrl: Multi-robot coordination with deep reinforcement learning</title>
		<author>
			<persName><forename type="first">D</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Pan</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.neucom.2020.04.028</idno>
		<ptr target="https://doi.org/10.1016/j.neucom.2020.04.028" />
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">406</biblScope>
			<biblScope unit="page" from="68" to="76" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Deep decentralized multi-task multi-agent reinforcement learning under partial observability</title>
		<author>
			<persName><forename type="first">S</forename><surname>Omidshafiei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pazis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>How</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vian</surname></persName>
		</author>
		<ptr target="http://proceedings.mlr.press/v70/omidshafiei17a.html" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 34th International Conference on Machine Learning</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Precup</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><forename type="middle">W</forename><surname>Teh</surname></persName>
		</editor>
		<meeting>the 34th International Conference on Machine Learning<address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">70</biblScope>
			<biblScope unit="page" from="2681" to="2690" />
		</imprint>
	</monogr>
	<note>Proceedings of Machine Learning Research</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Distributed reinforcement learning for multi-robot decentralized collective construction</title>
		<author>
			<persName><forename type="first">G</forename><surname>Sartoretti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Paivine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">K S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Koenig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Choset</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Distributed Autonomous Robotic Systems</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Correll</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Schwager</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Otte</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="35" to="49" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Airborne transmission route of COVID-19: Why 2 meters/6 feet of inter-personal distance could not be enough</title>
		<author>
			<persName><forename type="first">L</forename><surname>Setti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Passarini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>De Gennaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Barbieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Perrone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Borelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Palmisani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Di Gilio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Piscitelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Miani</surname></persName>
		</author>
		<ptr target="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7215485/" />
	</analytic>
	<monogr>
		<title level="j">International Journal of Environmental Research and Public Health</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Review of indoor positioning: Radio wave technology</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kim Geok</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zar Aung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sandar Aung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Thu Soe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Abdaziz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">Pao</forename><surname>Liew</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hossain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">P</forename><surname>Tso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">H</forename><surname>Yong</surname></persName>
		</author>
		<idno type="DOI">10.3390/app11010279</idno>
		<ptr target="https://www.mdpi.com/2076-3417/11/1/279" />
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Human-level control through deep reinforcement learning</title>
		<author>
			<persName><forename type="first">V</forename><surname>Mnih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kavukcuoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Silver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Rusu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Veness</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Bellemare</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Graves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Riedmiller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Fidjeland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Ostrovski</surname></persName>
		</author>
		<ptr target="https://www.nature.com/articles/nature14236#article-info" />
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
