<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Learning Automata as a Basis for Multi Agent Reinforcement Learning</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ann</forename><surname>Nowé</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Vrije Universiteit Brussel</orgName>
								<address>
									<addrLine>Pleinlaan 2</addrLine>
									<postCode>1050</postCode>
									<settlement>Brussel</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Katja</forename><surname>Verbeeck</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Vrije Universiteit Brussel</orgName>
								<address>
									<addrLine>Pleinlaan 2</addrLine>
									<postCode>1050</postCode>
									<settlement>Brussel</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Maarten</forename><surname>Peeters</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Vrije Universiteit Brussel</orgName>
								<address>
									<addrLine>Pleinlaan 2</addrLine>
									<postCode>1050</postCode>
									<settlement>Brussel</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Learning Automata as a Basis for Multi Agent Reinforcement Learning</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">1D5046FEABE97E58BD17943AF4F2E23C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:54+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Learning Automata (LA) are adaptive decision making devices suited for operation in unknown environments <ref type="bibr" target="#b11">[12]</ref>. Originally developed in the area of mathematical psychology to model observed behavior, LA are in their current form closely related to Reinforcement Learning (RL) approaches and are most popular in engineering. LA combine fast and accurate convergence with low computational complexity, and have been applied to a broad range of modeling and control problems. Moreover, the intuitive yet analytically tractable concept of learning automata also makes them very suitable as a theoretical framework for Multi-Agent Reinforcement Learning (MARL).</p><p>RL is an established and well-founded theoretical framework for learning in stand-alone or single-agent systems. Extending RL to multi-agent systems (MAS), however, does not preserve the same theoretical grounding. As long as the environment an agent experiences is Markovian and the agent can experiment sufficiently, RL guarantees convergence to the optimal strategy. In a MAS, however, the reinforcement an agent receives may depend on the actions taken by the other agents acting in the same environment. The Markov property therefore no longer holds, and convergence guarantees are lost. In the light of this problem it is important to fully understand the dynamics of multi-agent reinforcement learning.</p><p>Although they are not always recognized as such, LA are valuable tools for current MARL research. LA are updated strictly on the basis of the response of the environment, and not on the basis of any knowledge regarding the other automata, i.e., neither their strategies nor their feedback. As such, LA agents are very simple. Moreover, LA can be treated analytically: convergence proofs exist for a variety of settings, ranging from a single automaton acting in a simple stationary random environment to a distributed automata model interacting in a complex environment.</p><p>In this paper we argue that LA are very interesting building blocks for learning in multi-agent systems. LA can be viewed as policy iterators that update their action probabilities based on private information only. This is a very attractive property in applications where communication is expensive, and it makes LA particularly appealing in games with stochastic payoffs. We then move to collections of learning automata that can independently converge to interesting solution concepts. We study the single-stage setting, including the analytical results, and then generalize to interconnected learning automata that can deal with multi-agent multi-stage problems. We also show how Ant Colony Optimization can be mapped onto the interconnected learning automata setting.</p></div>
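The abstract's notion of an LA as a policy iterator that updates its action probabilities from the environment's feedback alone can be illustrated with the classic linear reward-inaction (L_R-I) scheme from Narendra and Thathachar <ref type="bibr" target="#b11">[12]</ref>. The sketch below is illustrative Python, not code from the paper; the class and parameter names are our own.

```python
import random

class LearningAutomaton:
    """Minimal linear reward-inaction (L_R-I) automaton: action
    probabilities are updated using only the environment's feedback
    signal beta in [0, 1], with no knowledge of other automata."""

    def __init__(self, n_actions, rate=0.1):
        self.p = [1.0 / n_actions] * n_actions  # uniform initial policy
        self.rate = rate                        # learning rate lambda

    def choose(self):
        # Sample an action according to the current probability vector.
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, beta):
        # Reward-inaction: on reward (beta > 0) move probability mass
        # toward the chosen action; on failure (beta == 0) do nothing.
        for j in range(len(self.p)):
            if j == action:
                self.p[j] += self.rate * beta * (1.0 - self.p[j])
            else:
                self.p[j] -= self.rate * beta * self.p[j]
```

Because the gain on the chosen action equals the total loss on the others, the probabilities stay normalized after every update; in a multi-agent game, each agent would run its own automaton and call `update` with only its private payoff.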
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body/>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Swarm Intelligence, From Natural to Artificial Systems</title>
		<author>
			<persName><forename type="first">Eric</forename><surname>Bonabeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marco</forename><surname>Dorigo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guy</forename><surname>Theraulaz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Santa Fe Institute studies in the sciences of complexity</title>
				<imprint>
			<publisher>Oxford University Press</publisher>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Planning, learning and coordination in multiagent decision processes</title>
		<author>
			<persName><forename type="first">C</forename><surname>Boutilier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge</title>
				<meeting>the 6th Conference on Theoretical Aspects of Rationality and Knowledge<address><addrLine>Renesse, Holland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="195" to="210" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Sequential optimality and coordination in multiagent systems</title>
		<author>
			<persName><forename type="first">C</forename><surname>Boutilier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th International Joint Conference on Artificial Intelligence</title>
				<meeting>the 16th International Joint Conference on Artificial Intelligence<address><addrLine>Stockholm, Sweden</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="478" to="485" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Stochastic Models for Learning</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>Bush</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Mosteller</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1958">1958</date>
			<publisher>Wiley</publisher>
			<pubPlace>New York</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The dynamics of reinforcement learning in cooperative multiagent systems</title>
		<author>
			<persName><forename type="first">C</forename><surname>Claus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Boutilier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th National Conference on Artificial Intelligence</title>
				<meeting>the 15th National Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="746" to="752" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Heuristics from nature for hard combinatorial optimization problems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Colorni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dorigo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Maffioli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Maniezzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Righini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Trubian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Transactions in Operational Research</title>
		<imprint>
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Ant algorithms for discrete optimization</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dorigo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">Di</forename><surname>Caro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Gambardella</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Life</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="137" to="172" />
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The ant colony optimization meta-heuristic</title>
		<author>
			<persName><forename type="first">Marco</forename><surname>Dorigo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gianni</forename><surname>Di Caro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">New Ideas In Optimization</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Corne</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Dorigo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Glover</surname></persName>
		</editor>
		<meeting><address><addrLine>Maidenhead, UK</addrLine></address></meeting>
		<imprint>
			<publisher>McGraw-Hill</publisher>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">The ant system: Optimization by a colony of cooperating agents</title>
		<author>
			<persName><forename type="first">Marco</forename><surname>Dorigo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vittorio</forename><surname>Maniezzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alberto</forename><surname>Colorni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Systems, Man, and Cybernetics, Part B</title>
		<imprint>
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Ant Colony Optimization</title>
		<author>
			<persName><forename type="first">Marco</forename><surname>Dorigo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Stützle</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2004">2004</date>
			<publisher>The MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Markov games as a framework for multi-agent reinforcement learning</title>
		<author>
			<persName><forename type="first">M</forename><surname>Littman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th International Conference on Machine Learning</title>
				<meeting>the 11th International Conference on Machine Learning</meeting>
		<imprint>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="322" to="328" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Learning Automata: An Introduction</title>
		<author>
			<persName><forename type="first">K</forename><surname>Narendra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Thathachar</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1989">1989</date>
			<publisher>Prentice-Hall International, Inc</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Learning automata approach to hierarchical multiobjective analysis</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Narendra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Parthasarathy</surname></persName>
		</author>
		<idno>No. 8811</idno>
		<imprint>
			<date type="published" when="1988">1988</date>
			<pubPlace>New Haven, Connecticut</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Electrical Engineering, Yale University</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Learning automata as a basis for multi agent reinforcement learning</title>
		<author>
			<persName><forename type="first">Ann</forename><surname>Nowé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Katja</forename><surname>Verbeeck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maarten</forename><surname>Peeters</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Learning and Adaptation in Multi-Agent Systems</title>
		<title level="s">Lecture Notes in Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">3898</biblScope>
			<biblScope unit="page" from="71" to="85" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Continuous learning automata solutions to the capacity assignment problem</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">J</forename><surname>Oommen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">D</forename><surname>Roberts</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Computers</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="page" from="608" to="620" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Reinforcement Learning: An Introduction</title>
		<author>
			<persName><forename type="first">R</forename><surname>Sutton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barto</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<publisher>MIT Press</publisher>
			<pubPlace>Cambridge, MA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Automaton theory and modelling of biological systems</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Tsetlin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Mathematics in Science and Engineering</title>
		<imprint>
			<biblScope unit="volume">102</biblScope>
			<date type="published" when="1973">1973</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Multiple stochastic learning automata for vehicle path control in an automated highway system</title>
		<author>
			<persName><forename type="first">C</forename><surname>Unsal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kachroo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Bay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Systems, Man, and Cybernetics, Part A</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="120" to="128" />
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Coordinated Exploration in Multi-Agent Reinforcement Learning</title>
		<author>
			<persName><forename type="first">K</forename><surname>Verbeeck</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2004">2004</date>
			<pubPlace>Belgium</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Computational Modeling Lab, Vrije Universiteit Brussel</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD thesis</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Multi-agent reinforcement learning in stochastic single and multi-stage games</title>
		<author>
			<persName><forename type="first">K</forename><surname>Verbeeck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nowé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Tuyls</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Peeters</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Adaptive Agents and Multi-Agent Systems II</title>
				<editor>
			<persName><surname>Kudenko</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2005">2005</date>
			<biblScope unit="volume">3394</biblScope>
			<biblScope unit="page" from="275" to="294" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
