<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>munication through a Trust Factor in Multi-Agent Reinforcement Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simone Di Rienzo</string-name>
          <email>dirienzo.1844531@studenti.uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Frattolillo</string-name>
          <email>frattolillo@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Cipollone</string-name>
          <email>cipollone@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Fanti</string-name>
          <email>fanti@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolo' Brandizzi</string-name>
          <email>brandizzi@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Iocchi</string-name>
          <email>iocchi@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer IAIS</institution>
          ,
          <addr-line>Schloss Birlinghoven, 1, 53757 Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MultiTTrust: 3rd Workshop on Multidisciplinary Perspectives on Human-AI Team</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Sapienza University of Rome</institution>
          ,
          <addr-line>Via Ariosto, 25, 00185 Roma RM</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2040</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The concept of trust has long been studied, initially in the context of human interactions and, more recently, in human-machine or human-agent interactions. Despite extensive studies, defining trust remains challenging due to its inherent complexities and the diverse factors that influence its dynamics in multi-agent environments. This paper focuses on a specific formalization of a trust factor: predictive reliability, defined as the ability of agents to accurately forecast the actions of their peers in a shared environment. By realizing this trust factor within the framework of multi-agent reinforcement learning (MARL), we integrate it as a criterion for agents to assess and select collaborators. This approach enhances the functionality of MARL systems, promoting improved cooperation and overall effectiveness.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-Agent Systems</kwd>
        <kwd>Reinforcement Learning</kwd>
        <kwd>Trust Factor</kwd>
        <kwd>Computational Modeling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction and Background</title>
      <p>
        With the advent of artificial intelligence, a growing number of applications require
intelligent agents and humans to coexist and interact. Such
applications include autonomous vehicles [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], industrial robotics [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], healthcare robotics [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
service robotics [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], agricultural robotics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and many more [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In this context, trust
becomes essential: it fosters cooperation and collaboration between humans and
robots, improving efficiency and user satisfaction, and it instills confidence in the reliability and
predictability of robotic systems, which is crucial for their acceptance and adoption.
      </p>
      <p>
        Trust is a concept that has been defined numerous times in the literature [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], yet there is still
no single universally accepted definition. However, the many factors that influence trust
are easier to study when analyzed separately. Such factors may be associated with the
trustor (the one who trusts), with the trustee (the one being trusted), or with
the context [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In this article, to formalize one such “trust factor”, we take
inspiration from the definition given by Gambetta [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]:
      </p>
      <disp-quote>
        <p>“trust (or, symmetrically, distrust) is a particular level of the subjective probability
with which an agent assesses that another agent or group of agents will perform a
particular action, both before he can monitor such action (or independently of his
capacity ever to be able to monitor it) and in a context in which it affects his own
action”</p>
      </disp-quote>
      <p>
According to this notion, the trust between two agents can be related to the trustor’s
expectations about the choices made by the trustee in a context of mutual interaction. We
formalize this definition in a Multi-Agent Reinforcement Learning (MARL) setting, where
autonomous agents can benefit from reasoning about other agents’ intentions and can use
this information to improve their performance and to select which agent to communicate with.
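      </p>
      <p>
        Problem and Solution Formulation. We consider the common scenario in which agents do
not have complete knowledge of the environment. This scenario is formalized by the Decentralized
Partially Observable Markov Decision Process (DEC-POMDP) framework [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], defined as a
tuple <inline-formula><tex-math>\langle n, S, A, T, R, \Omega, O \rangle</tex-math></inline-formula>, where <inline-formula><tex-math>n</tex-math></inline-formula> is the number of agents; <inline-formula><tex-math>S</tex-math></inline-formula> is the set of environment states
shared by all agents; <inline-formula><tex-math>A</tex-math></inline-formula> is the set of joint actions; <inline-formula><tex-math>T : S \times A \times S \to [0, 1]</tex-math></inline-formula> is the transition function;
<inline-formula><tex-math>R : S \times A \to \mathbb{R}</tex-math></inline-formula> is the reward function; <inline-formula><tex-math>\Omega</tex-math></inline-formula> is the set of joint observations; and <inline-formula><tex-math>O : S \times \Omega \to [0, 1]</tex-math></inline-formula>
is the observation function, which returns the probability of each joint observation.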
      </p>
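      <p>A MARL solution for a DEC-POMDP is a set of <inline-formula><tex-math>n</tex-math></inline-formula> functions, called policies <inline-formula><tex-math>\pi_i : \Omega_i \to A_i</tex-math></inline-formula>,
which map the local observations of each agent to its actions, in order to maximize the expected
joint sum of discounted rewards <inline-formula><tex-math>\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)</tex-math></inline-formula>, where <inline-formula><tex-math>0 \le \gamma &lt; 1</tex-math></inline-formula> is the discount factor.</p>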
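      <p>As a small illustration of the objective above, the following Python sketch computes the discounted sum of rewards for one finite episode. The function name <monospace>discounted_return</monospace>, the finite episode length, and the example reward values are illustrative assumptions and are not taken from the experiments.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * rewards[t] for one finite episode.

    `rewards` holds the joint reward received at each time step
    (hypothetical values; a real run would read them from the environment).
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    total = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    joint_rewards = [1.0, 0.0, 2.0]  # illustrative values only
    # 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
    print(discounted_return(joint_rewards, gamma=0.5))
```
]]></preformat>
      <p>The backward accumulation avoids computing powers of the discount factor explicitly; a policy that maximizes the expectation of this quantity is a solution in the sense defined above.</p>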
      <p>
        Trust factors. We refer to Frattolillo et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for a general definition of trust factors in
MARL systems. Specifically, any trust factor is computed with respect to a specific Trustor X, a
Trustee Y, and a task Γ:
      </p>
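      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>\mathrm{TrustFactor}(Y \mid X, \Gamma) = f\left(o_X, a_Y, r, [B_{X \to Y}], [K_{X \to Y}], [K_{Y \to X}]\right)</tex-math>
        </disp-formula>
        In this template, <inline-formula><tex-math>o_X</tex-math></inline-formula> denotes the observation of the trustor, <inline-formula><tex-math>a_Y</tex-math></inline-formula> is the action of the trustee, <inline-formula><tex-math>r</tex-math></inline-formula>
is the immediate reward, and <inline-formula><tex-math>B_{X \to Y}</tex-math></inline-formula> is the belief that the trustor currently maintains
with respect to the trustee. Finally, <inline-formula><tex-math>K_{X \to Y}</tex-math></inline-formula> and <inline-formula><tex-math>K_{Y \to X}</tex-math></inline-formula> represent known facts that result from
communication from trustor to trustee and vice versa. The brackets denote
optional components.</p>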
    </sec>
    <sec id="sec-3">
      <title>2. Method</title>
      <p>
        In this work, we propose a specific instantiation of the function f in Eq. (1) that captures the
dependency identified above between the actions of the trustee and the trustor’s expectations.
Specifically, we model this trust factor as the ability of one agent, acting as trustor,
to predict the actions of another agent, acting as trustee. To this end, we select one agent
as the primary agent, which, in addition to learning its own policy, learns
to predict the actions chosen by the other agents. For each trustee Y, the primary
agent estimates a Trust Score, defined as the ratio of correctly predicted actions to the
number of actions actually taken by the trustee. To predict the others’ actions, we adopt a simple neural network,
which we call PredNet, that is trained in a supervised fashion: it takes as input the
observations of the other agents and returns a prediction of their actions. The action predicted
by PredNet is concatenated to the state of the primary agent, so that the primary agent’s
decisions can take the other agents’ intentions into account. All agents used in the experiments are trained in a
decentralized way with a MARL algorithm called Independent PPO [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
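      <p>The Trust Score and the concatenation step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function names <monospace>trust_score</monospace> and <monospace>augment_observation</monospace> are hypothetical, actions are assumed to be discrete integer indices, the predicted action is assumed to be appended as a one-hot vector, and an empty history is assumed to yield a score of zero. The predictions themselves would come from PredNet, which is not modeled here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Sequence


def trust_score(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of the trustee's actions that the trustor predicted correctly."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0  # Assumption: no evidence yet means no trust.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def augment_observation(
    observation: Sequence[float], predicted_action: int, num_actions: int
) -> List[float]:
    """Append a one-hot encoding of the predicted action to the observation.

    The one-hot encoding is an assumption; the text only states that the
    predicted action is concatenated to the primary agent's state.
    """
    if not 0 <= predicted_action < num_actions:
        raise ValueError("predicted_action is out of range")
    one_hot = [0.0] * num_actions
    one_hot[predicted_action] = 1.0
    return list(observation) + one_hot


if __name__ == "__main__":
    # Three of four predictions match the trustee's true actions.
    print(trust_score([0, 1, 2, 3], [0, 1, 2, 4]))   # 0.75
    print(augment_observation([0.5, 0.2], 1, 3))     # [0.5, 0.2, 0.0, 1.0, 0.0]
```
]]></preformat>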
    </sec>
    <sec id="sec-4">
      <title>3. Experiments</title>
      <p>The environment is a customized version of Level-based Foraging (LBF) [13], a
grid-world multi-agent environment in which agents must navigate and cooperate to collect
food. A food item can be collected only if the sum of the levels of the agents involved is equal to or higher than the
level of the food. In our experiments, the primary agent selects which agent to communicate
with, among the agents in its field of view, based on the trust scores learned during training.
We ran experiments in a three-agent environment in which the other two agents are,
respectively, trustable and unreliable. The trustable agent acts according to
its learned policy, and its goal is to cooperate with the primary agent. The
unreliable agent, instead, follows a degraded policy that, with a certain probability,
selects an incorrect action. The results of the experiment are shown in Figure 1.
The trust score for the trustable agent (b) is much higher than the trust score for
the unreliable agent (c), and the average return of the primary agent is substantially
higher when it relies on the former (a). In conclusion, using the trust score as a
mechanism to select which agent to communicate with improves performance when
one of the agents is unreliable.</p>
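      <p>The two rules described in this section, the LBF collection condition and the trust-based choice of a communication partner, can be sketched as follows. This is an illustrative sketch only: the function names, the agent identifiers, and the numeric trust scores are hypothetical, and choosing the visible agent with the highest trust score is an assumption about how the selection is made.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Iterable, Optional


def can_collect(agent_levels: Iterable[int], food_level: int) -> bool:
    """LBF rule: food is collected only if the summed agent levels reach its level."""
    return sum(agent_levels) >= food_level


def select_partner(
    trust_scores: Dict[str, float], visible_agents: Iterable[str]
) -> Optional[str]:
    """Pick the visible agent with the highest trust score (assumed argmax rule).

    Returns None when no agent with a known trust score is in the field of view.
    """
    candidates = [agent for agent in visible_agents if agent in trust_scores]
    if not candidates:
        return None
    return max(candidates, key=lambda agent: trust_scores[agent])


if __name__ == "__main__":
    # Hypothetical scores; real values would be learned during training.
    scores = {"trustable": 0.9, "unreliable": 0.4}
    print(select_partner(scores, ["trustable", "unreliable"]))  # trustable
    print(select_partner(scores, ["unreliable"]))               # unreliable
    print(select_partner(scores, []))                           # None
    print(can_collect([1, 2], food_level=3))                    # True
    print(can_collect([1], food_level=3))                       # False
```
]]></preformat>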
    </sec>
    <sec id="sec-5">
      <title>4. Acknowledgments</title>
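      <p>This work is supported by the Air Force Office of Scientific Research under award number
FA8655-23-1-7257.</p>
      <ref-list>
        <ref id="ref13">
          <mixed-citation>
            [13]
            <string-name>
              <given-names>F.</given-names>
              <surname>Christianos</surname>
            </string-name>
            ,
            <string-name>
              <given-names>L.</given-names>
              <surname>Schäfer</surname>
            </string-name>
            ,
            <string-name>
              <given-names>S. V.</given-names>
              <surname>Albrecht</surname>
            </string-name>
            ,
            <article-title>Shared experience actor-critic for multi-agent reinforcement learning</article-title>
            ,
            <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
            ,
            <year>2020</year>
            .
          </mixed-citation>
        </ref>
      </ref-list>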
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Parekh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Poddar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rajpurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chahal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. P.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <article-title>A review on autonomous vehicles: Progress, methods and challenges</article-title>
          ,
          <source>Electronics</source>
          <volume>11</volume>
          (
          <year>2022</year>
          ). URL: https://www.mdpi.com/2079-9292/11/14/2162. doi:10.3390/electronics11142162.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Villani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Leali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Secchi</surname>
          </string-name>
          ,
          <article-title>Survey on human-robot collaboration in industrial settings: Safety, intuitive interfaces and applications</article-title>
          ,
          <source>Mechatronics</source>
          <volume>55</volume>
          (
          <year>2018</year>
          )
          <fpage>248</fpage>
          -
          <lpage>266</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0957415818300321. doi:10.1016/j.mechatronics.2018.02.009.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kyrarini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lygerakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rajavenkatanarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sevastopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Nambiappan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Chaitanya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Babu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mathew</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Makedon</surname>
          </string-name>
          ,
          <article-title>A survey of robots in healthcare</article-title>
          ,
          <source>Technologies</source>
          <volume>9</volume>
          (
          <year>2021</year>
          ). URL: https://www.mdpi.com/2227-7080/9/1/8. doi:10.3390/technologies9010008.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Zachiotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Andrikopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nakamura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nikolakopoulos</surname>
          </string-name>
          ,
          <article-title>A survey on the application trends of home service robotics</article-title>
          ,
          <source>in: 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1999</fpage>
          -
          <lpage>2006</lpage>
          . doi:10.1109/ROBIO.2018.8665127.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Vasconez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Kantor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Auat Cheein</surname>
          </string-name>
          ,
          <article-title>Human-robot interaction in agriculture: A survey and current challenges</article-title>
          ,
          <source>Biosystems Engineering</source>
          <volume>179</volume>
          (
          <year>2019</year>
          )
          <fpage>35</fpage>
          -
          <lpage>48</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S1537511017309625. doi:10.1016/j.biosystemseng.2018.12.005.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dahiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Aroyo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dautenhahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>A survey of multi-agent human-robot interaction systems</article-title>
          ,
          <source>Robotics and Autonomous Systems</source>
          <volume>161</volume>
          (
          <year>2023</year>
          )
          104335. URL: https://www.sciencedirect.com/science/article/pii/S092188902200224X. doi:10.1016/j.robot.2022.104335.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shahrdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Menezes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nojoumian</surname>
          </string-name>
          ,
          <article-title>A survey on trust in autonomous systems</article-title>
          ,
          <source>Advances in Intelligent Systems and Computing</source>
          <volume>857</volume>
          (
          <year>2019</year>
          )
          <fpage>368</fpage>
          -
          <lpage>386</lpage>
          . URL: https://link.springer.com/chapter/10.1007/978-3-030-01177-2_27. doi:10.1007/978-3-030-01177-2_27.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hancock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. T.</given-names>
            <surname>Kessler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stowers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Brill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Billings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Schaefer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Szalma</surname>
          </string-name>
          ,
          <article-title>How and why humans trust: A meta-analysis and elaborated model</article-title>
          ,
          <source>Frontiers in Psychology</source>
          <volume>14</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gambetta</surname>
          </string-name>
          ,
          <article-title>Can We Trust Trust?</article-title>
          ,
          <source>Trust: Making and Breaking Cooperative Relations, electronic edition</source>
          , Department of Sociology, University of Oxford (
          <year>2000</year>
          )
          <fpage>213</fpage>
          -
          <lpage>237</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Givan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Immerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zilberstein</surname>
          </string-name>
          ,
          <article-title>The Complexity of Decentralized Control of Markov Decision Processes</article-title>
          ,
          <source>Mathematics of Operations Research</source>
          <volume>27</volume>
          (
          <year>2002</year>
          )
          <fpage>819</fpage>
          -
          <lpage>840</lpage>
          . URL: https://pubsonline.informs.org/doi/10.1287/moor.27.4.819.297. doi:10.1287/moor.27.4.819.297.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Frattolillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Brandizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cipollone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Iocchi</surname>
          </string-name>
          ,
          <article-title>Towards computational models for reinforcement learning in human-ai teams</article-title>
          ,
          <source>2nd International Workshop on Multidisciplinary Perspectives on Human-AI Team Trust</source>
          (
          <year>2023</year>
          ). URL: https://ceur-ws.org/Vol-3634/paper9.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C. S.</given-names>
            <surname>de Witt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Makoviichuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Makoviychuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H. S.</given-names>
            <surname>Torr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Whiteson</surname>
          </string-name>
          ,
          <article-title>Is independent learning all you need in the StarCraft multi-agent challenge?</article-title>
          , CoRR abs/2011.09533 (2020). URL: https://arxiv.org/abs/2011.09533. arXiv:2011.09533.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>