<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Article Summary</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ann Nowé</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katja Verbeeck</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maarten Peeters</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vrije Universiteit Brussel</institution>
          ,
          <addr-line>Pleinlaan 2, 1050 Brussel</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Learning Automata (LA) are adaptive decision-making devices suited for operation in unknown environments [12]. Originally developed in the area of mathematical psychology for modeling observed behavior, LA are in their current form closely related to Reinforcement Learning (RL) approaches and are most popular in engineering. LA combine fast and accurate convergence with low computational complexity, and have been applied to a broad range of modeling and control problems. Moreover, the intuitive yet analytically tractable concept of a learning automaton makes LA very suitable as a theoretical framework for Multi-Agent Reinforcement Learning (MARL). RL is already an established and profound theoretical framework for learning in stand-alone, single-agent systems, but extending RL to multi-agent systems (MAS) does not guarantee the same theoretical grounding. As long as the environment an agent experiences is Markov and the agent can experiment sufficiently, RL guarantees convergence to the optimal strategy. In a MAS, however, the reinforcement an agent receives may depend on the actions taken by the other agents acting in the same environment; the Markov property then no longer holds, and the guarantees of convergence are lost. In the light of this problem it is important to fully understand the dynamics of multi-agent reinforcement learning. Although they are not fully recognized as such, LA are valuable tools for current MARL research. An automaton is updated strictly on the basis of the response of the environment, and not on the basis of any knowledge regarding the other automata, i.e., neither their strategies nor their feedback. As such, LA agents are very simple. Moreover, LA can be treated analytically: convergence proofs exist for a variety of settings, ranging from a single automaton acting in a simple stationary random environment to a distributed automata model interacting in a complex environment.</p>
        <p>In this paper we argue that LA are very interesting building blocks for learning in multi-agent systems. LA can be viewed as policy iterators that update their action probabilities based on private information only, a very attractive property in applications where communication is expensive. LA are particularly appealing in games with stochastic payoffs. We first study collections of learning automata in the single-stage setting, including the analytical results, and show that they can independently converge to interesting solution concepts. We then generalize to interconnected learning automata, which can deal with multi-agent multi-stage problems, and we show how Ant Colony Optimization can be mapped to the interconnected learning automata setting. This extended abstract is a summary of an article published earlier [14].</p>
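<p>As an illustration of the update scheme described above, the following is a minimal sketch of a learning automaton using the standard linear reward-inaction (L_R-I) rule from the LA literature; the two-action setup, class name, and learning rate are illustrative assumptions, not taken from the article.</p>

```python
import random

class LearningAutomaton:
    """Linear reward-inaction (L_R-I) learning automaton.

    Maintains a probability vector over actions and updates it using
    only the environment's binary feedback (1 = reward, 0 = penalty),
    with no knowledge of any other automaton's strategy or feedback.
    """

    def __init__(self, n_actions, learning_rate=0.1):
        self.probs = [1.0 / n_actions] * n_actions
        self.lam = learning_rate  # step size (illustrative choice)

    def choose(self):
        # Sample an action from the current probability vector.
        r, acc = random.random(), 0.0
        for a, p in enumerate(self.probs):
            acc += p
            if r < acc:
                return a
        return len(self.probs) - 1

    def update(self, action, reward):
        # Reward-inaction: move probability mass toward the chosen
        # action on reward; leave the vector unchanged on penalty.
        if reward:
            for a in range(len(self.probs)):
                if a == action:
                    self.probs[a] += self.lam * (1.0 - self.probs[a])
                else:
                    self.probs[a] -= self.lam * self.probs[a]
```

<p>In a stationary random environment each action returns a reward with a fixed probability, and repeated choose/update rounds drive the probability vector toward the optimal action; in a multi-automaton game the same update runs in each agent on its private payoff only.</p>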
      </abstract>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <ref id="ref0a">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Eric</given-names>
            <surname>Bonabeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Marco</given-names>
            <surname>Dorigo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Guy</given-names>
            <surname>Theraulaz</surname>
          </string-name>
          .
          <source>Swarm Intelligence: From Natural to Artificial Systems</source>
          . Santa Fe Institute Studies in the Sciences of Complexity. Oxford University Press,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref0b">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Boutilier</surname>
          </string-name>
          .
          <article-title>Sequential optimality and coordination in multiagent systems</article-title>
          .
          <source>In Proceedings of the 16th International Joint Conference on Artificial Intelligence</source>
          , pages
          <fpage>478</fpage>
          -
          <lpage>485</lpage>
          , Stockholm, Sweden,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref1">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.R.</given-names>
            <surname>Bush</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Mosteller</surname>
          </string-name>
          .
          <source>Stochastic Models for Learning</source>
          . Wiley, New York,
          <year>1958</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Claus</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Boutilier</surname>
          </string-name>
          .
          <article-title>The dynamics of reinforcement learning in cooperative multiagent systems</article-title>
          .
          <source>In Proceedings of the 15th National Conference on Artificial Intelligence</source>
          , pages
          <fpage>746</fpage>
          -
          <lpage>752</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Colorni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dorigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Maffioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Maniezzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Righini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Trubian</surname>
          </string-name>
          .
          <article-title>Heuristics from nature for hard combinatorial optimization problems</article-title>
          .
          <source>International Transactions in Operational Research</source>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dorigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. Di</given-names>
            <surname>Caro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.M.</given-names>
            <surname>Gambardella</surname>
          </string-name>
          .
          <article-title>Ant algorithms for discrete optimization</article-title>
          .
          <source>Artificial Life</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>137</fpage>
          -
          <lpage>172</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Marco</given-names>
            <surname>Dorigo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gianni</given-names>
            <surname>Di Caro</surname>
          </string-name>
          .
          <article-title>The ant colony optimization meta-heuristic</article-title>
          . In D. Corne, M. Dorigo, and F. Glover (Eds.), New Ideas in Optimization. Maidenhead, UK: McGraw-Hill,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Marco</given-names>
            <surname>Dorigo</surname>
          </string-name>
          , Vittorio Maniezzo, and
          <string-name>
            <given-names>Alberto</given-names>
            <surname>Colorni</surname>
          </string-name>
          .
          <article-title>The ant system: Optimization by a colony of cooperating agents</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Marco</given-names>
            <surname>Dorigo</surname>
          </string-name>
          and Thomas Stützle.
          <source>Ant Colony Optimization</source>
          . The MIT Press,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Littman</surname>
          </string-name>
          .
          <article-title>Markov games as a framework for multi-agent reinforcement learning</article-title>
          .
          <source>In Proceedings of the 11th International Conference on Machine Learning</source>
          , pages
          <fpage>322</fpage>
          -
          <lpage>328</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Narendra</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Thathachar</surname>
          </string-name>
          .
          <source>Learning Automata: An Introduction</source>
          .
          Prentice-Hall International, Inc.,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.S.</given-names>
            <surname>Narendra</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Parthasarathy</surname>
          </string-name>
          .
          <article-title>Learning automata approach to hierarchical multiobjective analysis</article-title>
          .
          <source>Technical Report No. 8811</source>
          , Electrical Engineering Yale University, New Haven, Connecticut,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Ann</given-names>
            <surname>Nowé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Katja</given-names>
            <surname>Verbeeck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Maarten</given-names>
            <surname>Peeters</surname>
          </string-name>
          .
          <article-title>Learning automata as a basis for multi agent reinforcement learning</article-title>
          .
          <source>Learning and Adaptation in Multi-Agent Systems, Lecture Notes in Artificial Intelligence</source>
          ,
          <volume>3898</volume>
          :
          <fpage>71</fpage>
          -
          <lpage>85</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.J.</given-names>
            <surname>Oommen</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.D.</given-names>
            <surname>Roberts</surname>
          </string-name>
          .
          <article-title>Continuous learning automata solutions to the capacity assignment problem</article-title>
          .
          <source>IEEE Transactions on Computations</source>
          ,
          <volume>49</volume>
          :
          <fpage>608</fpage>
          -
          <lpage>620</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sutton</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Barto</surname>
          </string-name>
          .
          <source>Reinforcement Learning: An Introduction</source>
          . MIT Press, Cambridge, MA,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.L.</given-names>
            <surname>Tsetlin</surname>
          </string-name>
          .
          <article-title>Automaton theory and modelling of biological systems</article-title>
          .
          <source>Mathematics in Science and Engineering</source>
          ,
          <volume>102</volume>
          ,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>C.</given-names>
            <surname>Unsal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kachroo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.S.</given-names>
            <surname>Bay</surname>
          </string-name>
          .
          <article-title>Multiple stochastic learning automata for vehicle path control in an automated highway system</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part A</source>
          ,
          <volume>29</volume>
          :
          <fpage>120</fpage>
          -
          <lpage>128</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>Verbeeck</surname>
          </string-name>
          .
          <article-title>Coordinated Exploration in Multi-Agent Reinforcement Learning</article-title>
          .
          <source>PhD thesis</source>
          , Computational Modeling Lab, Vrije Universiteit Brussel, Belgium,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>K.</given-names>
            <surname>Verbeeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nowé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tuyls</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Peeters</surname>
          </string-name>
          .
          <article-title>Multi-agent reinforcement learning in stochastic single and multi-stage games</article-title>
          . In Kudenko et al. (Eds.): Adaptive Agents and Multi-Agent Systems II
          , pages
          <fpage>275</fpage>
          -
          <lpage>294</lpage>
          . Springer LNAI 3394,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>