Developing Targeted Communication through a Trust Factor in Multi-Agent Reinforcement Learning

Simone Di Rienzo¹, Francesco Frattolillo¹, Roberto Cipollone¹, Andrea Fanti¹, Nicolò Brandizzi¹,², Luca Iocchi¹

¹ Sapienza University of Rome, Via Ariosto, 25, 00185 Roma RM, Italy
² Fraunhofer IAIS, Schloss Birlinghoven, 1, 53757 Sankt Augustin, Germany

CEUR Workshop Proceedings, ISSN 1613-0073

Keywords: Multi-Agent Systems, Reinforcement Learning, Trust Factor, Computational Modeling

The concept of trust has long been studied, initially in the context of human interactions and, more recently, in human-machine or human-agent interactions. Despite extensive studies, defining trust remains challenging due to its inherent complexities and the diverse factors that influence its dynamics in multi-agent environments. This paper focuses on a specific formalization of a trust factor: predictive reliability, defined as the ability of agents to accurately forecast the actions of their peers in a shared environment. By realizing this trust factor within the framework of multi-agent reinforcement learning (MARL), we integrate it as a criterion for agents to assess and select collaborators. This approach enhances the functionality of MARL systems, promoting improved cooperation and overall effectiveness.

Introduction and Background

With the advent of artificial intelligence, the number of applications that require intelligent agents and humans to coexist and interact is steadily increasing. Such applications include autonomous vehicles [1], industrial robotics [2], healthcare robotics [3], service robotics [4], agricultural robotics [5], and many more [6]. In this context, trust becomes essential: it fosters cooperation and collaboration between humans and robots, enhancing efficiency and user satisfaction, and it instills confidence in the reliability and predictability of robotic systems, which is crucial for their acceptance and adoption.

Trust has been defined numerous times in the literature [7], yet there is still no universally accepted definition. The many factors that influence trust, however, are much easier to study when analyzed separately. Such factors may be associated with the trustor (the party who trusts), with the trustee (the party being trusted), or with the context [8]. In this article, to formalize one such "trust factor", we take inspiration from the definition given by Gambetta [9]:

"trust (or, symmetrically, distrust) is a particular level of the subjective probability with which an agent assesses that another agent or group of agents will perform a particular action, both before he can monitor such action (or independently of his capacity ever to be able to monitor it) and in a context in which it affects his own action"

MultiTTrust: 3rd Workshop on Multidisciplinary Perspectives on Human-AI Team, June 11, 2024, Malmö, Sweden
dirienzo.1844531@studenti.uniroma1.it (S. Di Rienzo); frattolillo@diag.uniroma1.it (F. Frattolillo); cipollone@diag.uniroma1.it (R. Cipollone); fanti@diag.uniroma1.it (A. Fanti); brandizzi@diag.uniroma1.it (N. Brandizzi); iocchi@diag.uniroma1.it (L. Iocchi)
ORCID: 0000-0002-2040-3355 (F. Frattolillo); 0000-0002-0421-5792 (R. Cipollone); 0009-0003-0764-3965 (A. Fanti); 0000-0002-3191-6623 (N. Brandizzi); 0000-0001-9057-8946 (L. Iocchi)

According to this notion, the trust between two agents can be correlated with the trustor's expectations about the choices made by the trustee in a context of mutual interaction. We formalize this definition in a Multi-Agent Reinforcement Learning (MARL) setting, in which autonomous agents can benefit from reasoning about other agents' intentions and can use this information both to improve their performance and to select which agent to communicate with.

Problem and Solution Formulation

We consider the common scenario in which agents do not have complete knowledge of the environment. This setting is formalized by the Decentralized Partially Observable Markov Decision Process (DEC-POMDP) framework [10], defined as a tuple $\langle D, \mathcal{S}, \mathcal{A}, T, R, \Omega, O \rangle$, where $D$ is the number of agents; $\mathcal{S}$ is the set of environment states shared by all agents; $\mathcal{A}$ is the set of joint actions; $T: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0, 1]$ is the transition function; $R: \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ is the reward function; $\Omega$ is the set of joint observations; and $O: \mathcal{S} \times \Omega \to [0, 1]$ is the observation function, which returns the probability of each joint observation.
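To make the tuple concrete, the following is a minimal tabular sketch in Python, assuming small finite sets stored as dictionaries. The class name `DecPOMDP`, its field names, and the `step` helper are illustrative assumptions, not part of the paper; we also assume, following a common convention, that the joint observation is drawn from $O$ evaluated at the resulting state $s'$.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
JointAction = Tuple[Hashable, ...]  # one action per agent
JointObs = Tuple[Hashable, ...]     # one local observation per agent


@dataclass
class DecPOMDP:
    """Tabular container for the tuple <D, S, A, T, R, Omega, O> (illustrative)."""
    D: int                                                   # number of agents
    states: List[State]                                      # S
    joint_actions: List[JointAction]                         # A (joint actions)
    T: Dict[Tuple[State, JointAction], Dict[State, float]]   # T(s, a, s')
    R: Callable[[State, JointAction], float]                 # R(s, a)
    joint_observations: List[JointObs]                       # Omega
    O: Dict[State, Dict[JointObs, float]]                    # O(s, omega), assumed evaluated at s'

    def step(self, s: State, a: JointAction, rng: random.Random):
        """Sample s' ~ T(s, a, .), then omega ~ O(s', .); return (s', omega, R(s, a))."""
        next_dist = self.T[(s, a)]
        s_next = rng.choices(list(next_dist), weights=list(next_dist.values()))[0]
        obs_dist = self.O[s_next]
        omega = rng.choices(list(obs_dist), weights=list(obs_dist.values()))[0]
        return s_next, omega, self.R(s, a)
```

In decentralized execution, agent $i$ would receive only the $i$-th component of the sampled joint observation $\omega$.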

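A MARL solution for a DEC-POMDP is a set of $D$ functions, called policies, $\pi_i: \Omega_i \to \mathcal{A}_i$, where $\Omega_i$ and $\mathcal{A}_i$ denote agent $i$'s local observations and actions. The policies map the local observations of each agent to its actions so as to maximize the expected joint sum of discounted rewards $\mathbb{E}\left[\sum_{k=t}^{T} \gamma^{k} R(s_k, a_k)\right]$, where $0 \le \gamma < 1$ and $T$ here denotes the horizon (not the transition function).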
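The quantity inside the expectation can be computed for one sampled episode and averaged over episodes to obtain a Monte Carlo estimate of the objective. The sketch below assumes the rewards of an episode are stored in a list indexed by the time step $k$; the function names are hypothetical. It keeps the paper's exponent $\gamma^k$ (absolute step), whereas many implementations discount with $\gamma^{k-t}$; for a fixed $t$ the two differ only by the constant factor $\gamma^t$.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float, t: int = 0) -> float:
    """Sum of gamma**k * R(s_k, a_k) for k = t..T, with rewards[k] = R(s_k, a_k).

    The horizon T is len(rewards) - 1; the exponent is the absolute step k.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return sum(gamma ** k * rewards[k] for k in range(t, len(rewards)))


def estimate_objective(episodes: List[Sequence[float]], gamma: float, t: int = 0) -> float:
    """Monte Carlo estimate of the expectation: average return over sampled episodes."""
    return sum(discounted_return(ep, gamma, t) for ep in episodes) / len(episodes)
```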

Trust factors

We refer to Frattolillo et al. [11] for a general definition of trust factors in MARL systems. Any trust factor is computed with respect to a specific trustor $X$, a trustee $Y$, and a task $\Gamma$:

$$\mathrm{TrustFactor}(X \mid Y, \Gamma) = f\left(o_X,\, a_Y,\, r,\, [b_{X \to Y}],\, [c_{X \to Y}],\, [c_{Y \to X}]\right) \tag{1}$$

In this template, $o_X$ denotes the observation of the trustor, $a_Y$ the action of the trustee, $r$ the immediate reward, and $b_{X \to Y}$ the belief that the trustor currently maintains about the trustee. Finally, $c_{X \to Y}$ and $c_{Y \to X}$ represent known facts resulting from communication from trustor to trustee and vice versa. The square brackets denote optional components.
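As a reading aid, the template can be written as a typed function signature in which the bracketed components become optional fields. This is a sketch under our own assumptions: the names `TrustFactorInputs`, `TrustFactor`, and `prediction_match` are hypothetical, the task $\Gamma$ is kept implicit (one function per task), and the example instance assumes that the belief field stores the trustor's predicted action for the trustee.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class TrustFactorInputs:
    """Arguments of the template f in eq. (1) for one trustor X, trustee Y and task.

    Types are deliberately generic (Any): the template does not fix how
    observations, actions, beliefs or messages are represented.
    """
    o_x: Any                         # o_X: observation of the trustor
    a_y: Any                         # a_Y: action taken by the trustee
    r: float                         # r: immediate reward
    b_x_to_y: Optional[Any] = None   # [b_{X->Y}]: trustor's belief about the trustee (optional)
    c_x_to_y: Optional[Any] = None   # [c_{X->Y}]: facts communicated from X to Y (optional)
    c_y_to_x: Optional[Any] = None   # [c_{Y->X}]: facts communicated from Y to X (optional)


# For a fixed (X, Y, task), a trust factor is any function of these inputs.
TrustFactor = Callable[[TrustFactorInputs], float]


def prediction_match(inputs: TrustFactorInputs) -> float:
    """Hypothetical per-step instance: assumes the belief field holds X's predicted
    action for Y; returns 1.0 for a correct prediction and 0.0 otherwise."""
    if inputs.b_x_to_y is None:
        raise ValueError("this instance requires the optional belief component")
    return 1.0 if inputs.b_x_to_y == inputs.a_y else 0.0
```

Averaging `prediction_match` over the trustee's observed actions yields the fraction of correct predictions, i.e. the Trust Score defined in the Method section.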

Method

In this work, we propose a specific instantiation of the function $f$ in Eq. (1) to capture the dependency identified above between the trustee's actions and the trustor's expectations. In particular, we model one such trust factor as the ability of one agent, acting as trustor, to predict the actions of another agent, acting as trustee. Among all agents, we therefore designate one as the primary agent, which, in addition to learning its own policy, learns to predict the actions chosen by the other agents. For each trustee $i$, the primary agent estimates a Trust Score, defined as the number of correctly predicted actions of $i$ divided by the total number of actions actually taken by $i$. To predict the other agents' actions, we adopt a simple neural network, which we call PredNet; it is trained in a supervised fashion, takes as input the observations of the other agents, and outputs a prediction of their actions. The action predicted by PredNet is concatenated to the state of the primary agent, so that the primary agent's decisions can be influenced by the other agents' intentions. All agents in the experiments are trained in a decentralized way with Independent PPO [12], a MARL algorithm.
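The bookkeeping around PredNet can be sketched as follows: a per-trustee counter for the Trust Score and a helper that appends the predicted action to the primary agent's state. PredNet itself is not shown. The class and function names, the one-hot encoding of the predicted action, and the 0.0 score for a trustee with no observed actions are our assumptions; the text only states that the predicted action is concatenated to the state.

```python
from typing import Dict, Hashable

import numpy as np


class TrustScoreTracker:
    """Running Trust Score per trustee: correct predictions / observed actions."""

    def __init__(self) -> None:
        self.correct: Dict[Hashable, int] = {}
        self.total: Dict[Hashable, int] = {}

    def update(self, trustee: Hashable, predicted_action: int, true_action: int) -> None:
        # Count every observed action of the trustee, and those PredNet got right.
        self.total[trustee] = self.total.get(trustee, 0) + 1
        if predicted_action == true_action:
            self.correct[trustee] = self.correct.get(trustee, 0) + 1

    def score(self, trustee: Hashable) -> float:
        # Assumption: a trustee with no observed actions gets a score of 0.0.
        n = self.total.get(trustee, 0)
        return self.correct.get(trustee, 0) / n if n else 0.0


def augment_state(own_state: np.ndarray, predicted_action: int, n_actions: int) -> np.ndarray:
    """Concatenate a one-hot encoding of the predicted action (an assumed
    encoding) to the primary agent's state vector."""
    one_hot = np.zeros(n_actions, dtype=own_state.dtype)
    one_hot[predicted_action] = 1
    return np.concatenate([own_state, one_hot])
```

A one-hot encoding keeps discrete actions from being read as ordinal values by the policy network; appending the raw integer index would also be a concatenation, but it imposes an artificial ordering on the actions.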

Experiments

We use a customized version of the Level-Based Foraging (LBF) environment [13]. LBF is a grid-world multi-agent environment in which agents must navigate and cooperate to collect food; a food item can be collected only if the sum of the levels of the agents collecting it is equal to or greater than the level of the food. In our experiments, the primary agent selects which agent to communicate with, among the agents in its field of view, based on the trust scores learned during training. We ran experiments in a three-agent environment in which the two agents other than the primary one are, respectively, trustable and unreliable. The trustable agent acts according to its learned policy and aims to cooperate with the primary agent. In contrast, the unreliable agent follows a poor policy that, with a certain probability, leads it to perform incorrect actions.

The results are shown in Figure 1. The trust score with respect to the trustable agent (b) is much higher than that with respect to the unreliable agent (c), and the average return of the primary agent is drastically better when it relies on the former (a). In conclusion, these results show that using the trust score to select which agent to communicate with improves performance when an agent is not reliable.
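The selection rule and the LBF collection condition described above can be sketched in a few lines. The toy simulation is not the paper's experiment and does not reproduce Figure 1: it assumes that PredNet predicts the trustee's intended action perfectly, that the unreliable agent replaces that action with a uniformly random different one with probability `error_prob` (a hypothetical parameter, since the paper does not report this probability), and that there are 6 discrete actions. All names are illustrative.

```python
import random
from typing import Dict, Hashable, List, Optional


def can_collect(agent_levels: List[int], food_level: int) -> bool:
    """LBF rule as described in the text: the summed levels of the agents
    collecting the food must be at least the food level."""
    return sum(agent_levels) >= food_level


def select_partner(trust_scores: Dict[Hashable, float],
                   visible_agents: List[Hashable]) -> Optional[Hashable]:
    """Pick the visible agent with the highest trust score; visible agents
    without a score are ignored, and None is returned if no candidate remains."""
    candidates = [a for a in visible_agents if a in trust_scores]
    if not candidates:
        return None
    return max(candidates, key=lambda a: trust_scores[a])


def simulate_trust_score(error_prob: float, n_steps: int = 1000,
                         n_actions: int = 6, seed: int = 0) -> float:
    """Toy trust score for an agent that deviates from its intended action
    with probability error_prob (hypothetical stand-in for the unreliable agent)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_steps):
        intended = rng.randrange(n_actions)
        predicted = intended  # assumption: the predictor has learned the intended policy
        if rng.random() < error_prob:
            actual = rng.choice([a for a in range(n_actions) if a != intended])
        else:
            actual = intended
        correct += predicted == actual
    return correct / n_steps
```

Under these assumptions the expected trust score is `1 - error_prob`, so selecting the highest-scoring visible agent favors the trustable one.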

Figure 1: (a) Comparison between the mean score of the primary agent when it uses the predictions from the trustable agent and when it uses those from the unreliable agent. Trust score with respect to (b) the trustable agent and (c) the unreliable agent.

Acknowledgments

This work is supported by the Air Force Office of Scientific Research under award number FA8655-23-1-7257.

References

[1] D. Parekh, N. Poddar, A. Rajpurkar, M. Chahal, N. Kumar, G. P. Joshi, W. Cho, A review on autonomous vehicles: Progress, methods and challenges, Electronics 11 (2022). doi:10.3390/electronics11142162.
[2] V. Villani, F. Pini, F. Leali, C. Secchi, Survey on human-robot collaboration in industrial settings: Safety, intuitive interfaces and applications, Mechatronics 55 (2018). doi:10.1016/j.mechatronics.2018.02.009.
[3] M. Kyrarini, F. Lygerakis, A. Rajavenkatanarayanan, C. Sevastopoulos, H. R. Nambiappan, K. K. Chaitanya, A. R. Babu, J. Mathew, F. Makedon, A survey of robots in healthcare, Technologies 9 (2021). doi:10.3390/technologies9010008.
[4] G. A. Zachiotis, G. Andrikopoulos, R. Gomez, K. Nakamura, G. Nikolakopoulos, A survey on the application trends of home service robotics, in: 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), 2018. doi:10.1109/ROBIO.2018.8665127.
[5] J. P. Vasconez, G. A. Kantor, F. A. Auat Cheein, Human-robot interaction in agriculture: A survey and current challenges, Biosystems Engineering 179 (2019). doi:10.1016/j.biosystemseng.2018.12.005.
[6] A. Dahiya, A. M. Aroyo, K. Dautenhahn, S. L. Smith, A survey of multi-agent human-robot interaction systems, Robotics and Autonomous Systems 161 (2023) 104335. doi:10.1016/j.robot.2022.104335.
[7] S. Shahrdar, L. Menezes, M. Nojoumian, A survey on trust in autonomous systems, Advances in Intelligent Systems and Computing 857 (2019). doi:10.1007/978-3-030-01177-2_27.
[8] P. Hancock, T. T. Kessler, A. D. Kaplan, K. Stowers, J. C. Brill, D. R. Billings, K. E. Schaefer, J. L. Szalma, How and why humans trust: A meta-analysis and elaborated model, Frontiers in Psychology 14 (2023).
[9] D. Gambetta, Can we trust trust?, in: Trust: Making and Breaking Cooperative Relations, electronic edition, Department of Sociology, University of Oxford, 2000.
[10] D. S. Bernstein, R. Givan, N. Immerman, S. Zilberstein, The complexity of decentralized control of Markov decision processes, Mathematics of Operations Research 27 (2002). doi:10.1287/moor.27.4.819.297.
[11] F. Frattolillo, N. Brandizzi, R. Cipollone, L. Iocchi, Towards computational models for reinforcement learning in human-AI teams, in: 2nd International Workshop on Multidisciplinary Perspectives on Human-AI Team Trust, 2023.
[12] C. S. de Witt, T. Gupta, D. Makoviichuk, V. Makoviychuk, P. H. S. Torr, M. Sun, S. Whiteson, Is independent learning all you need in the StarCraft multi-agent challenge?, CoRR abs/2011.09533 (2020).
[13] F. Christianos, L. Schäfer, S. V. Albrecht, Shared experience actor-critic for multi-agent reinforcement learning, in: Advances in Neural Information Processing Systems (NeurIPS), 2020.