Developing Targeted Communication through a Trust Factor in Multi-Agent Reinforcement Learning

Simone Di Rienzo¹, Francesco Frattolillo¹, Roberto Cipollone¹, Andrea Fanti¹, Nicolo’ Brandizzi¹,² and Luca Iocchi¹
¹ Sapienza University of Rome, Via Ariosto 25, 00185 Roma RM, Italy
² Fraunhofer IAIS, Schloss Birlinghoven 1, 53757 Sankt Augustin, Germany

Abstract
The concept of trust has long been studied, initially in the context of human interactions and, more recently, in human-machine or human-agent interactions. Despite extensive studies, defining trust remains challenging due to its inherent complexities and the diverse factors that influence its dynamics in multi-agent environments. This paper focuses on a specific formalization of a trust factor: predictive reliability, defined as the ability of agents to accurately forecast the actions of their peers in a shared environment. By realizing this trust factor within the framework of multi-agent reinforcement learning (MARL), we integrate it as a criterion for agents to assess and select collaborators. This approach enhances the functionality of MARL systems, promoting improved cooperation and overall effectiveness.

Keywords
Multi-Agent Systems, Reinforcement Learning, Trust Factor, Computational Modeling

1. Introduction and Background
With the advent of artificial intelligence, the number of applications that require co-existence and interaction between intelligent agents and humans is increasing over time. Such applications include autonomous vehicles [1], industrial robotics [2], healthcare robotics [3], service robotics [4], agricultural robotics [5], and many more [6]. In this context, the concept of trust becomes essential, as it fosters cooperation and collaboration between humans and robots, enhancing efficiency and user satisfaction. It instills confidence in the reliability and predictability of robotic systems, which is crucial for their acceptance and adoption.
MultiTTrust: 3rd Workshop on Multidisciplinary Perspectives on Human-AI Team, June 11, 2024, Malmo, Sweden
dirienzo.1844531@studenti.uniroma1.it (S. Di Rienzo); frattolillo@diag.uniroma1.it (F. Frattolillo); cipollone@diag.uniroma1.it (R. Cipollone); fanti@diag.uniroma1.it (A. Fanti); brandizzi@diag.uniroma1.it (N. Brandizzi); iocchi@diag.uniroma1.it (L. Iocchi)
ORCID: 0000-0002-2040-3355 (F. Frattolillo); 0000-0002-0421-5792 (R. Cipollone); 0009-0003-0764-3965 (A. Fanti); 0000-0002-3191-6623 (N. Brandizzi); 0000-0001-9057-8946 (L. Iocchi)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Trust is a concept that has been defined numerous times in the literature [7], yet there is still no single universally accepted definition. However, the numerous factors that influence trust are much easier to study when analyzed separately. Such factors may be associated with the trustor, the one who trusts, with the trustee, the one being trusted, or may depend on the context [8]. In this article, to formalize one such “trust factor”, we take inspiration from the definition given by Gambetta [9]:

“trust (or, symmetrically, distrust) is a particular level of the subjective probability with which an agent assesses that another agent or group of agents will perform a particular action, both before he can monitor such action (or independently of his capacity ever to be able to monitor it) and in a context in which it affects his own action”

According to this notion, the trust between two agents can be correlated with the trustor’s expectations about the choices made by the trustee in a context of mutual interaction. We formalize this definition in a Multi-Agent Reinforcement Learning (MARL) setting.
Here, autonomous agents can benefit from reasoning about other agents’ intentions, and they can use this information to improve their performance and select which agent to communicate with.

Problem and Solution Formulation. We consider the common scenario in which agents do not have complete knowledge of the environment, which is formalized by the Decentralized Partially Observable Markov Decision Process (DEC-POMDP) [10] framework, defined as a tuple ⟨𝐷, 𝒮, 𝒜, 𝑇, 𝑅, Ω, 𝑂⟩, where 𝐷 is the number of agents; 𝒮 is the set of environment states shared by all agents; 𝒜 is the set of joint actions; 𝑇 : 𝒮 × 𝒜 × 𝒮 → [0, 1] is the transition function; 𝑅 : 𝒮 × 𝒜 → ℝ is the reward function; Ω is the set of joint observations; and 𝑂 : 𝒮 × Ω → [0, 1] is the observation function, returning the probability of each joint observation. A MARL solution for a DEC-POMDP is a set of 𝐷 functions, called policies, 𝜋_𝑖 : Ω_𝑖 → 𝒜_𝑖, which map the local observations of each agent to its actions, in order to maximize the expected joint sum of discounted rewards ∑_{𝑘=𝑡}^{𝑇} 𝛾^𝑘 𝑅(𝑠_𝑘, 𝑎_𝑘), where 0 ≤ 𝛾 < 1.

Trust factors. We refer to Frattolillo et al. [11] for a general definition of trust factors in MARL systems. Specifically, any trust factor is computed with respect to a specific trustor 𝑋, a trustee 𝑌, and a task Γ:

TrustFactor(𝑋 | 𝑌, Γ) = 𝑓(𝑜^𝑋, 𝑎^𝑌, 𝑟, [𝑏^{𝑋→𝑌}], [𝑐^{𝑋→𝑌}], [𝑐^{𝑌→𝑋}])    (1)

In this template, 𝑜^𝑋 denotes the observation of the trustor, 𝑎^𝑌 is the action of the trustee, 𝑟 is the immediate reward, and 𝑏^{𝑋→𝑌} is the belief that the trustor currently maintains with respect to the trustee. Finally, 𝑐^{𝑋→𝑌} and 𝑐^{𝑌→𝑋} represent known facts that result from communication from trustor to trustee and vice versa. The brackets denote that these components are optional.

2. Method
In this work, we propose a specific instantiation of the function 𝑓 in Eq. (1) to capture the dependency identified above between the actions of the trustee and the trustor’s expectations.
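To make the template concrete, the predictive-reliability instantiation of 𝑓 can be sketched as a running per-trustee statistic. The class and method names below are ours, not from the paper, and a full instantiation of 𝑓 would additionally receive the observation, reward, and communication terms of Eq. (1); this minimal sketch keeps only the prediction-accuracy part.

```python
from collections import defaultdict

class PredictiveReliability:
    """Minimal sketch of one instantiation of the trust-factor template:
    the running fraction of a trustee's actions that the trustor
    predicted correctly (its predictive reliability)."""

    def __init__(self):
        self.correct = defaultdict(int)  # correctly predicted actions, per trustee
        self.total = defaultdict(int)    # observed (true) actions, per trustee

    def update(self, trustee, predicted_action, true_action):
        # One interaction step: compare the trustor's prediction with
        # the action the trustee actually performed.
        self.correct[trustee] += int(predicted_action == true_action)
        self.total[trustee] += 1

    def score(self, trustee):
        # Trust score = correctly predicted actions / true actions
        # (0 while no actions of this trustee have been observed yet).
        if self.total[trustee] == 0:
            return 0.0
        return self.correct[trustee] / self.total[trustee]
```

For example, after observing four actions of a trustee with three correct predictions, its score is 0.75; an unobserved trustee scores 0.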
Specifically, we model one of these trust factors as the ability of one agent, acting as trustor, to predict the actions of another agent, acting as trustee. Therefore, among all agents, we select one to be the primary agent which, in addition to learning its own policy, learns how to predict the actions computed by the other agents. Specifically, for each trustee 𝑖, the primary agent estimates the Trust Score, defined as the number of correctly predicted actions over the number of true actions. For predicting the others’ actions, we adopt a simple neural network, which we call PredNet, trained in a supervised fashion: it takes as input the observations of the other agents and returns a prediction of their actions. The action predicted by PredNet is concatenated to the state of the primary agent; this allows its decisions to be influenced by the other agents’ intentions. All agents used in the experiment are trained in a decentralized way through the MARL algorithm Independent PPO [12].

Figure 1: (a) comparison between the mean score of the primary agent when it uses the predictions from the trustable agent and from the unreliable one; trust score with respect to (b) the trustable agent and (c) the unreliable agent.

3. Experiments
The environment used is a customized version of Level-Based Foraging (LBF) [13]. This is a grid-world multi-agent environment in which agents must navigate and cooperate to collect food, which can be collected only if the sum of the levels of the participating agents is equal to or higher than the level of the food. In our experiments, the primary agent selects which agent to communicate with, among the agents in its field of view, based on the trust scores learned during training. We ran experiments in a three-agent environment, where the other two agents are defined respectively as trustable and unreliable.
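The pipeline around PredNet can be sketched as follows: a small feed-forward network maps a trustee's observation to a distribution over its actions, the prediction is concatenated to the primary agent's observation, and the trustee with the highest trust score is selected for communication. The two-layer architecture, layer sizes, and all names here are illustrative assumptions; the paper only specifies that PredNet is a simple network trained in a supervised fashion.

```python
import numpy as np

rng = np.random.default_rng(0)

class PredNet:
    """Sketch of a PredNet-style predictor: observation -> action distribution."""

    def __init__(self, obs_dim, n_actions, hidden=32):
        # Small two-layer network with random (untrained) weights; in the
        # paper it is trained supervised on (observation, action) pairs.
        self.W1 = rng.normal(0.0, 0.1, (obs_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def predict(self, obs):
        h = np.tanh(obs @ self.W1 + self.b1)
        logits = h @ self.W2 + self.b2
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum()                 # probabilities over actions

def augment_state(primary_obs, predicted_action_probs):
    # The prediction about the trustee's action is concatenated to the
    # primary agent's own observation before it chooses its action.
    return np.concatenate([primary_obs, predicted_action_probs])

def select_trustee(trust_scores, visible_agents):
    # Communicate with the visible agent holding the highest trust score.
    return max(visible_agents, key=lambda a: trust_scores[a])
```

In the three-agent setting described above, `select_trustee` would pick the trustable agent as soon as its running trust score exceeds the unreliable agent's.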
The trustable agent executes its actions according to its learned policy, and its goal is to cooperate with the primary agent. In contrast, the unreliable agent acts according to a bad policy that, with a certain probability, leads to incorrect actions. The results of the experiment are shown in Figure 1. The trust score with respect to the trustable agent (b) is much higher than the one for the unreliable agent (c), and additionally, the average return of the primary agent is drastically better when it relies on the former (a). In conclusion, we showed that using the trust score as a mechanism to select which agent to communicate with improves performance in the case where an agent is not reliable.

4. Acknowledgments
This work is supported by the Air Force Office of Scientific Research under award number FA8655-23-1-7257.

References
[1] D. Parekh, N. Poddar, A. Rajpurkar, M. Chahal, N. Kumar, G. P. Joshi, W. Cho, A review on autonomous vehicles: Progress, methods and challenges, Electronics 11 (2022). URL: https://www.mdpi.com/2079-9292/11/14/2162. doi:10.3390/electronics11142162.
[2] V. Villani, F. Pini, F. Leali, C. Secchi, Survey on human–robot collaboration in industrial settings: Safety, intuitive interfaces and applications, Mechatronics 55 (2018) 248–266. URL: https://www.sciencedirect.com/science/article/pii/S0957415818300321. doi:10.1016/j.mechatronics.2018.02.009.
[3] M. Kyrarini, F. Lygerakis, A. Rajavenkatanarayanan, C. Sevastopoulos, H. R. Nambiappan, K. K. Chaitanya, A. R. Babu, J. Mathew, F. Makedon, A survey of robots in healthcare, Technologies 9 (2021). URL: https://www.mdpi.com/2227-7080/9/1/8. doi:10.3390/technologies9010008.
[4] G. A. Zachiotis, G. Andrikopoulos, R. Gornez, K. Nakamura, G. Nikolakopoulos, A survey on the application trends of home service robotics, in: 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), 2018, pp. 1999–2006.
doi:10.1109/ROBIO.2018.8665127.
[5] J. P. Vasconez, G. A. Kantor, F. A. Auat Cheein, Human–robot interaction in agriculture: A survey and current challenges, Biosystems Engineering 179 (2019) 35–48. URL: https://www.sciencedirect.com/science/article/pii/S1537511017309625. doi:10.1016/j.biosystemseng.2018.12.005.
[6] A. Dahiya, A. M. Aroyo, K. Dautenhahn, S. L. Smith, A survey of multi-agent human–robot interaction systems, Robotics and Autonomous Systems 161 (2023) 104335. URL: https://www.sciencedirect.com/science/article/pii/S092188902200224X. doi:10.1016/j.robot.2022.104335.
[7] S. Shahrdar, L. Menezes, M. Nojoumian, A survey on trust in autonomous systems, Advances in Intelligent Systems and Computing 857 (2019) 368–386. URL: https://link.springer.com/chapter/10.1007/978-3-030-01177-2_27. doi:10.1007/978-3-030-01177-2_27.
[8] P. Hancock, T. T. Kessler, A. D. Kaplan, K. Stowers, J. C. Brill, D. R. Billings, K. E. Schaefer, J. L. Szalma, How and why humans trust: A meta-analysis and elaborated model, Frontiers in Psychology 14 (2023).
[9] D. Gambetta, Can We Trust Trust?, in: Trust: Making and Breaking Cooperative Relations, electronic edition, Department of Sociology, University of Oxford, 2000, pp. 213–237.
[10] D. S. Bernstein, R. Givan, N. Immerman, S. Zilberstein, The Complexity of Decentralized Control of Markov Decision Processes, Mathematics of Operations Research 27 (2002) 819–840. URL: https://pubsonline.informs.org/doi/10.1287/moor.27.4.819.297. doi:10.1287/moor.27.4.819.297.
[11] F. Frattolillo, N. Brandizzi, R. Cipollone, L. Iocchi, Towards computational models for reinforcement learning in human-AI teams, in: 2nd International Workshop on Multidisciplinary Perspectives on Human-AI Team Trust, 2023. URL: https://ceur-ws.org/Vol-3634/paper9.pdf.
[12] C. S. de Witt, T. Gupta, D. Makoviichuk, V. Makoviychuk, P. H. S. Torr, M. Sun, S.
Whiteson, Is independent learning all you need in the StarCraft multi-agent challenge?, CoRR abs/2011.09533 (2020). URL: https://arxiv.org/abs/2011.09533. arXiv:2011.09533.
[13] F. Christianos, L. Schäfer, S. V. Albrecht, Shared experience actor-critic for multi-agent reinforcement learning, in: Advances in Neural Information Processing Systems (NeurIPS), 2020.