<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Sentimental Agents: Exploring Deliberation, Cognitive Biases, and Decision-making in LLM-based Multiagent Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elizabeth A. Ondula</string-name>
          <email>ondula@usc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Orner</string-name>
          <email>daniele@braveventurelabs.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nick Mumero Mwangi</string-name>
          <email>nick@braveventurelabs.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Casandra Rusti</string-name>
          <email>rusti@usc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Brave Venture Labs</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Southern California</institution>
          ,
          <addr-line>Los Angeles</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <conference>
        <conf-name>30th ACM KDD Conference</conf-name>
      </conference>
      <abstract>
        <p>How does sentiment affect deliberative opinion dynamics in multi-agent systems using Large Language Models (LLMs)? In this paper, we introduce Sentimental Agents, a framework designed to study collaborative decision-making in a society of agents, each equipped with a distinct Mental Model of Self. We propose a method to integrate sentiment analysis and a non-Bayesian update mechanism, to analyze and interpret agents' beliefs and interactions systematically. This method allows us to observe the volatility of the sentiment associated with different agent statements, as well as the change in opinion throughout the agents' conversation. We further use it to model and compare collaborative decision-making approaches. We situate these agents in a simulated Human Resource recruiting environment as a case study to evaluate a candidate's fit for a role. We present a set of metrics to assess the quality of the agents' output. Finally, we explore cognitive biases in the agents' individual and collective opinion formation, a fundamental step to enhance decision-making capabilities and mitigate distortions in the system and the agents' collective reasoning.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-Agent Systems</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Cognitive Biases</kwd>
        <kwd>Decision-Making</kwd>
        <kwd>Opinion Dynamics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR Workshop Proceedings (ceur-ws.org)</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Multi-agent systems (MAS), composed of interactive
agents, have been pivotal in modeling social phenomena,
decision-making processes, and collaborative tasks. Large
Language Models (LLMs) such as GPT-4 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] have opened
new possibilities for exploring complex social dynamics
through the simulation of linguistic interactions among
agents. These models can provide the necessary
capabilities for simulating communication scenarios. Integrating
LLMs into MAS facilitates the study of conversations and
interaction patterns in a more detailed manner.
      </p>
      <p>
        LLMs have demonstrated exceptional performance in
sentiment analysis tasks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, the effect
of sentiment on deliberative opinion dynamics within
an artificial society of agents is a domain that has not
yet been fully explored. Traditional agent models may
not adequately account for the influence of behavioral
states like sentiment and cognitive biases on the
decision-making process. Our work adopts a nuanced approach to
understanding how the output of LLM agents influences
one another within these frameworks.
      </p>
      <p>https://eondula.github.io/ (E. A. Ondula); https://bravelabs.ai/
(D. Orner); https://bravelabs.ai/ (N. M. Mwangi);
https://www.linkedin.com/in/casandrarusti/ (C. Rusti)</p>
      <p>ORCID: 0000-0003-0403-0306 (E. A. Ondula); 0009-0005-1264-1985
(D. Orner); 0009-0004-6654-2635 (N. M. Mwangi)</p>
      <p>We introduce Sentimental Agents, a framework
designed to study and analyze collaborative decision
processes. These agents are not only equipped with language
capabilities but also possess a unique Mental Model of
Self. This allows them to process and exhibit behaviors
that can offer a comprehensive view of how opinions are
formed and evolve in a multi-agent setting.</p>
      <p>Our system is designed primarily to observe
deliberation. We do not currently include objectives, reward
functions, utility metrics or payoffs in our model. The focus
is on the natural evolution of interactions among agents
without imposing external incentives or goals. Our study
concentrates on non-strategic interactions. Unlike
strategic agents, which model the behavior of others and act
based on these predictions, our non-strategic agents do
not possess such models. This distinction is crucial as it
means our agents are not engaging in behaviors such as
scheming or deceiving to achieve a specific objective. If
LLM-based multi-agent systems are ultimately to be used
to support decision-making, it is critical to understand
and explain how their decisions are made. This is
especially true in the hypothetical case of such systems being
designed to evaluate, rank or recommend humans.</p>
      <p>
        Additionally, this work introduces metrics for assessing
the quality of conversation and decision-making in
language model-based multi-agent systems. These metrics,
namely nuance, platitudinal score, drift, and defensibility,
offer a toolkit for evaluating the effectiveness of such
systems in diverse scenarios. Furthermore, we
evaluate cognitive biases including negativity, positivity, and
saliency biases. This assessment offers valuable insights
into the cognitive influences and tendencies within
multi-agent decision-making processes. Finally, the framework
is applied in a simulated Human Resource recruiting
environment, serving as a practical case study. This
application not only validates the theoretical model but
also highlights the practical potential of the approach in
real-world settings.
      </p>
      <p>
        Existing approaches do not systematically analyze the
opinions and interactions of these agents, and the
potential correlation between the two. To remediate this, we
make the following key contributions:
• We develop a framework, Sentimental Agents [],
to explore and study collective decision-making
processes in a society of agents.
• We propose using sentiment analysis as a method
to quantify content generated by LLM-based
agents for evaluation and recommendation tasks.
• We propose a method to apply a non-Bayesian
model for opinion dynamics within a multi-agent
system. This offers a perspective on how
opinions are formed and altered in a sentiment-driven
environment.
      </p>
    </sec>
    <sec id="sec-rw">
      <title>2. Related Works</title>
      <sec id="sec-rw-1">
        <title>2.1. Multi-Agent Collaboration</title>
        <p>
          In the study of multi-agent systems, understanding how
agents collaborate to achieve collective objectives is
essential. One interesting approach [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] examines
the use of LLMs in multi-agent settings with a focus on
"Theory of Mind" (ToM), which is the ability of an agent
to understand and predict the mental states and
intentions of others. Although crucial for collaboration, our
focus looks more at how agents make decisions rather
than understanding others' mental states. Other
studies, like those of [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], look at how agents can debate and
make collective decisions using a method known as
gradual semantics, where agents exchange arguments and
progressively update their opinions to reach a shared
decision. Our approach is different in that it explains the
agent interactions and decision processes leveraging a
mental model of self and sentiment tracking. Further,
our agents don't have access to other agents' memories.
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] explores how agents coordinate in complex tasks that
necessitate both working together on the same task
(cooperation) and dividing the task into smaller parts to be
done individually (divide-and-conquer). This study
highlights the need for flexible strategies to manage tasks that
require both joint and individual efforts, differing from
our work which doesn't focus on specific task
coordination but rather on general deliberations on various topics.
        </p>
        <p>
          Similarly, [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] demonstrate the potential of collaborative
mechanisms with LLMs in enhancing social interactions
among agents, providing valuable insights into how these
technologies can foster collaborative intelligence within
multi-agent settings.
        </p>
      </sec>
      <sec id="sec-2-1">
        <title>2.2. LLM-based Multi-Agent Frameworks</title>
        <p>
          An LLM-based agent is defined as an AI system
comprising three core components: the brain, perception, and
action modules [7]. The brain module stores knowledge
and memories, facilitating information processing and
decision-making, essential for reasoning and handling
new tasks. The perception module extends the agent's
sensory capabilities to include textual, auditory, and
visual modalities. This enhances its understanding of the
environment. The action module enables the agent to
perform physical tasks and interact with its environment.
        </p>
        <p>In terms of operating mechanism, the agents use natural
language for communication, with the brain processing
information from the perception module to form
strategies and make decisions. In our work, we introduce the
concept of a Mental Model of Self (MMS). This concept has
been discussed in social psychology [8]. It refers to an
integrated theory and understanding that an agent forms
to organize and make sense of one's self-knowledge,
experiences and memories into broader principles that can
guide anticipation of future behaviors and consequences.</p>
        <p>In our implementation, it serves an important
organizational function in making sense of self-knowledge. We
summarize and show differences between the Sentimental
Agents framework and prior works in Table 1.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Non-strategic Multi-Agent Systems</title>
        <p>Opinion dynamics has been extensively explored for
over six decades, predominantly in the fields of
sociology and psychology. It delves into the mechanisms and
principles that dictate the formation and alteration of
individual opinions under the influence of others. This
involves examining a range of models and frameworks
to comprehend collective behaviors and the process of
consensus formation [9]. Our work focuses on a
non-strategic model within opinion dynamics, meaning the
model does not incorporate game theory principles, nor
does it involve agents optimizing specific utilities.</p>
        <p>Non-Bayesian updating, in this context, signifies a
process wherein opinions are modified not based on a
factual or probabilistic framework that converts prior
probabilities into posterior probabilities. Instead, this
approach entails agents updating their opinions
influenced by the views of others, without basing these on an
unknown state of nature. The updating mechanism in
such models can be either synchronous, where all agents
update their opinions simultaneously, or asynchronous,
where updates occur at different times. A recent survey
categorizes and discusses various models prevalent in
existing literature [10].</p>
        <p>We further use Sentiment Analysis to investigate
opinions which manifest as either positive or negative
[11]. Studies have shown that generative models, such as
Large Language Models (LLMs), are capable of producing
text, which can include opinions with specific sentiments,
depending on their application [12].</p>
      </sec>
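<p>The synchronous/asynchronous distinction above can be made concrete with a toy averaging model. This is our own simplified illustration, not a model taken from the cited survey: each agent moves its opinion toward the mean of the others' opinions, either from a shared snapshot (synchronous) or using the latest values as they change (asynchronous).</p>
<preformat>
```python
# Toy non-Bayesian averaging update. Synchronous: all agents update
# from the same snapshot of opinions; asynchronous: each update sees
# the values already modified earlier in the same round.

def sync_round(opinions, weight=0.5):
    snapshot = list(opinions)
    n = len(snapshot)
    return [
        (1 - weight) * snapshot[i]
        + weight * sum(snapshot[:i] + snapshot[i + 1:]) / (n - 1)
        for i in range(n)
    ]

def async_round(opinions, weight=0.5):
    ops = list(opinions)
    for i in range(len(ops)):
        others = ops[:i] + ops[i + 1:]
        ops[i] = (1 - weight) * ops[i] + weight * sum(others) / len(others)
    return ops

print(sync_round([1.0, 0.0, -1.0]))   # symmetric contraction toward 0
print(async_round([1.0, 0.0, -1.0]))  # order of updates now matters
```
</preformat>
<p>Starting from the same opinions, the two schedules yield different trajectories, which is why surveys of non-Bayesian models treat them as distinct mechanisms.</p>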
      <sec id="sec-2-3">
        <title>2.4. Evaluating LLM-based Systems</title>
        <p>
          Evaluation for LLMs is emerging as a discipline to
assess the performance of different AI systems.
Currently, for LLMs, there is no single benchmark or
protocol that emerges as universally superior. This reflects the
diversity of tasks and model capabilities. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] provides
an exhaustive summary and discussion based on
existing works. This work covers evaluation tasks, methods
and benchmarks that are crucial for assessing the
performance of LLMs. In our work, we adopt a nuanced
approach to evaluation. We define specific metrics to
assess the conversation quality. These metrics include
nuance, platitudinal score, drift and defensibility scores,
which are detailed in Section 5.6.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminaries</title>
      <sec id="sec-3-1">
        <title>3.1. Conversation protocols</title>
        <p>Consider a conversational simulation system with a set
of agents denoted as M = {m_1, m_2, …, m_n}. Each agent
m_i ∈ M is initialized with a Mental Model of Self (MMS)
and a memory component for storing an opinion log. In
this system, the engagement among agents in each round
is ordered with equal participation.</p>
        <p>Definition 1. An Argument (a) is a component of an
opinion that contributes to its overall sentiment. For each
argument a, a sentiment value s_a is assigned by the
sentiment mapping function s, which maps the argument to a
spectrum of sentiment values (positive, negative, neutral,
and their intensities):
s_a = s(a).   (1)</p>
        <p>Definition 2. An Opinion O(m_i, r) is the opinion of
agent m_i in a given round r; it is a set of arguments a.
The sentiment of an opinion S_O is the average of the
sentiment values s_a of all its arguments:
S_O(m_i, r) = (1 / |O(m_i, r)|) Σ_{a ∈ O(m_i, r)} s_a.   (2)</p>
        <p>After each round, the sentiment S_i for each agent m_i
is updated to reflect the sentiment of the newly formed
opinion. This process considers the sentiment values s_a
of the arguments within the opinion O. Here
S̄_i(r) = S_O(m_i, r)   (3)
represents the average sentiment of all the arguments
expressed by agent m_i at round r. The sentiment update is
executed using a non-Bayesian method, mathematically
represented by:
S_i(r) = (1 − α) S_i(r − 1) + α S̄_i(r).   (4)
The parameter α is a weighting factor that determines the
influence of the new opinion's average sentiment on the
agent's updated sentiment.</p>
        <p>The change in sentiment ΔS_i(r) for agent m_i is then
calculated as the absolute difference between the updated
sentiment value S_i(r) at round r and the agent's previous
sentiment value S_i(r − 1) at round r − 1:
ΔS_i(r) = |S_i(r) − S_i(r − 1)|.   (5)
When the conversation ends, we take the total sentiment.</p>
        <p>Table 1 (a comparison of different language model-based
multi-agent frameworks [13] [14] [15] [16] [17] [18] [19] [20])
contrasts prior systems with Sentimental Agents along
dimensions such as memory type (belief stores,
store/retrieve, chat history, dynamic memory, opinion logs),
the presence of an internal critic, specialized roles,
user-driven interaction, rationale analysis, and the
treatment of biases (e.g., confirmation bias, fact-checking,
user preference, credibility checks, opinion classification,
cognitive bias).</p>
      </sec>
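<p>As an illustrative sketch, the per-round bookkeeping of Equations (1)–(5) can be written in a few lines of Python. This is our own minimal rendering of the update rule, not the framework's code; the argument sentiment values are assumed to be supplied by an external sentiment analyzer.</p>
<preformat>
```python
# Sketch of the per-round sentiment bookkeeping (Equations 1-5).
# Argument sentiments s_a in [-1, 1] are assumed to come from an
# external sentiment analyzer and are passed in directly here.

def opinion_sentiment(argument_sentiments):
    """Average sentiment of an opinion's arguments (Equation 2)."""
    return sum(argument_sentiments) / len(argument_sentiments)

def update_sentiment(prev, new_avg, alpha):
    """Non-Bayesian convex-combination update (Equation 4)."""
    return (1 - alpha) * prev + alpha * new_avg

def sentiment_change(curr, prev):
    """Absolute per-round change (Equation 5)."""
    return abs(curr - prev)

s_prev = 0.0                                   # sentiment at round r-1
s_bar = opinion_sentiment([0.8, -0.2, 0.6])    # new opinion's average, 0.4
s_curr = update_sentiment(s_prev, s_bar, alpha=0.5)   # 0.2
delta = sentiment_change(s_curr, s_prev)              # 0.2
```
</preformat>
<p>With α = 0.5 an agent moves halfway from its previous sentiment toward the average sentiment of its newly formed opinion; smaller α values make opinions stickier.</p>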
      <sec id="sec-3-2">
        <title>3.2. Collective decision protocols</title>
        <p>When the conversation ends, we have each agent's final
sentiment score for every item; for the gut-feeling protocol
we additionally use the change in sentiment across rounds.</p>
        <p>Definition 3. Borda Count Protocol: A method to
collectively rank a list of items, given each individual's order
of preference. Given k, the number of items, each agent
m_i ranks these items. The point assignment for an item j
by agent m_i is P_{i,j}, with the top-ranked item receiving k
points and the last receiving 1 point. The total points for
each item j are calculated as P_j = Σ_{i=1}^{|M|} P_{i,j}, and
items are ranked in descending order of their total points P_j.</p>
        <p>Definition 4. Tiered List Protocol: A method to
collectively classify a list of items in 3 tiers, given the items that
each individual can't accept, and the items they like the
most. The valence V_j for each item j is determined from
the sentiment of opinion S_j: for S_j &lt; −0.5, V_j = −1; for
−0.5 ≤ S_j ≤ 0.5, V_j = 0; and for S_j &gt; 0.5, V_j = 1. Items
are classified into three tiers according to V_j: Tier 1 for
V_j = 1, Tier 2 for V_j = 0, and Tier 3 for V_j = −1.</p>
        <p>Definition 5. Gut-feeling List Protocol: A method to
collectively rank a list of items based on the confidence of
individuals' feelings toward each item. The volatility
v_{i,j} of agent m_i's sentiment towards item j over several
rounds is calculated. Conviction c_{i,j} is derived as a
function of both the volatility v_{i,j} and the final sentiment
score S_{i,j} for item j. The Gut-feeling list is then generated
using a Borda count based on c_{i,j} for each item across all
agents: the total conviction points C_j = Σ_{i=1}^{|M|} c_{i,j} are
computed, and items are ranked in descending order of C_j.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Applying the Framework</title>
      <p>Our framework is applied to a simulated environment
inspired by Human Resource recruiting to evaluate the
effectiveness of Sentimental Agents. These agents are
designed to generate opinions reflecting their unique
expertise, contributing to collective decision-making. The
simulation explores opinion formation and decision-making
processes within an LLM-based multi-agent setting,
mirroring real-world HR recruitment where employers
assess candidates through discussions with various experts.
In this context, LLM-based agents are expected to engage
in conversation and form diverse opinions that influence
their decision-making in a simulated recruiting scenario.</p>
      <sec id="sec-4-1">
        <title>4.1. Configuration</title>
        <p>In the HR recruiting simulation, advisor agents analyze
candidates' CVs and engage in discussions to provide
opinions about each candidate. These agents, with
expertise in roles like Chief Financial Officer (CFO), Vice
President of Engineering, and Recycling Plant Manager,
evaluate profiles and generate text reports. They also
score candidates and, through collective decision-making
protocols like the Borda Count, rank candidates or select
the top performers.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Dataset</title>
          <p>We sourced our dataset from the study conducted by
[21]. This dataset is a collection of resumes represented
in a multi-label format. To facilitate easy access and
integration of this dataset into our framework, we have
developed a script that automates the process of
downloading and parsing the data.</p>
        </sec>
      </sec>
    </sec>
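<p>The Borda Count and Tiered List protocols used here (Definitions 3 and 4) can be sketched directly in Python. This is a minimal illustration with hypothetical item names; the helper names are ours, not the framework's.</p>
<preformat>
```python
# Borda Count (Definition 3): with k items, the top-ranked item in an
# agent's ranking earns k points and the last earns 1; items are then
# ordered by total points across all agents.
def borda(rankings):
    k = len(rankings[0])
    points = {}
    for ranking in rankings:
        for pos, item in enumerate(ranking):
            points[item] = points.get(item, 0) + (k - pos)
    return sorted(points, key=points.get, reverse=True)

# Tiered List (Definition 4): valence from thresholded sentiment.
def valence(s):
    if s > 0.5:
        return 1       # Tier 1
    if s >= -0.5:
        return 0       # Tier 2
    return -1          # Tier 3

rankings = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
print(borda(rankings))                         # collective order
print([valence(s) for s in (0.7, 0.1, -0.8)])  # tier valences
```
</preformat>
<p>With these three rankings, item A collects 8 points, B collects 6, and C collects 4, so the collective order is A, B, C.</p>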
    <sec id="sec-5">
      <title>5. Sentimental Agents Framework</title>
      <p>The system design, as shown in Figure 1, consists of 7
modules. We describe each of them here.</p>
      <sec id="sec-5-1">
        <title>5.1. Brief Module</title>
        <p>The module provides a configuration interface for
system initialization with four components: input type,
output type, task type, and context. It handles
single and multiple item formats for input and output and
requires user-defined context specifying task object
and subject, with optional Knowledge base integration.
Predefined rules in the module automatically associate
Input, Output, and Task Types. The logic enforces
specific task types Evaluate, Score, Classify for single-item
inputs and broader tasks for multi-item inputs. For rank
tasks, the output is structured as a list to match task
requirements. This design ensures alignment between
input/output formats and system functionality.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Agent Initialization</title>
        <p>The Agent Initialization module includes two main
elements: Mental Model of Self (MMS) and Memory with
qualitative and quantitative opinion logs. It configures
agent interaction types for opinion formation as dynamic
or independent. The module requires user input to set the
number of agents and their expertise, which informs the
creation of detailed agent profiles, including priorities,
objectives, and evaluation criteria. Figure 3 shows an
instance of an MMS. Key parameters include the tolerance
level, affecting opinion change propensity, and the drift
metric, which tracks MMS variability. Strategies for
maintaining agent consistency involve controlled character
prompts and setting MMS prompt temperature. The
opinion formation, generated via boolean input, influences
the nature of agents' decision-making processes. Each
agent's opinion log is stored in a central memory system,
ensuring decision-making is based on comprehensive
and transparent data. Figure 2 shows the agent
initialization prompt.</p>
      </sec>
      <sec id="sec-5-od">
        <title>5.3. Opinion Dynamics Module</title>
        <p>This module coordinates agent conversations and
decision-making, consisting of conversation and
decision-making protocols. It focuses on: defining the number of
agents, engagement type and stopping mechanism. The
current implementation employs ordered engagement
with equal participation. For the stop mechanism, the
module uses a non-strategic approach, differentiating
from strategic interactions. In this non-strategic context,
conversations conclude based on non-Bayesian updating
as shown in Algorithm 1, where they end once agents'
opinions reach stability. This contrasts with strategic
interactions, which involve different mechanisms like
rewards or objectives.</p>
      </sec>
      <sec id="sec-5-conv">
        <title>5.4. Conversation Module</title>
        <p>This module analyzes agents' statements in
conversations, comprising four components: Argumentation,
which breaks down statements into arguments for
qualitative logging; Sentiment Analysis, which evaluates and
quantitatively logs the sentiment of each argument;
Opinion Change, using non-Bayesian updating to monitor
sentiment shifts; and Conversation Trends, gauging
significant changes across rounds to infer opinion stabilization
and conversation conclusion.</p>
      </sec>
      <sec id="sec-5-dec">
        <title>5.5. Decision Module</title>
        <p>The Decision Making Protocol module is designed to
accommodate various decision-making protocols, including
Borda Count, Tiered List, and Gut Feeling List, as detailed
in Section 3.2.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.6. Evaluation and Cognitive Bias Modules</title>
        <p>This module evaluates the quality of conversations
through various metrics.</p>
        <p>• Nuance: Examines the diversity of themes and
perspectives, quantified by the number of
topics identified within individual statements or the
entire conversation.
• Platitudinal Score: Calculated using cosine
similarity, it measures the uniqueness of outcomes
in the conversation rounds, with higher scores
indicating less similarity between different runs.
• Drift : Assesses the stability of each agent’s
Mental Model of Self, monitoring the relevance of
results to the advisors’ profiles and checking for
consistency throughout the conversation.
• Defensibility: Evaluates the strength and
evidence backing of the agents’ arguments, ensuring
they are well-supported and referenceable.</p>
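<p>As a sketch of the platitudinal idea, cosine similarity between two run outcomes can be computed over bag-of-words vectors; this dependency-free illustration is our own simplification of the metric, with hypothetical example texts.</p>
<preformat>
```python
# Platitudinal-style score sketch: 1 minus the cosine similarity of
# bag-of-words vectors for two run outcomes, so a higher score means
# the runs were LESS similar (less platitudinal repetition).
import math
from collections import Counter

def cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb)

def platitudinal(run_a, run_b):
    return 1.0 - cosine(run_a, run_b)

print(platitudinal("strong fit for the role", "strong fit for the role"))  # 0.0
```
</preformat>
<p>Identical outcomes score 0.0; completely disjoint vocabularies would score 1.0. A production version would typically use embeddings rather than raw word counts.</p>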
        <p>In this research, we examine three cognitive biases:
negativity, positivity, and saliency. Negativity bias
might lead agents to give undue weight to adverse
opinions [22] [23], while positivity bias could result in an
overemphasis on favorable views [24] [25]. Saliency bias,
on the other hand, might cause agents to focus on the
most prominent or emotionally striking aspects of an
opinion, potentially overshadowing other relevant
information [26].</p>
        <sec id="sec-5-4-1">
          <title>Algorithm 1 Non-Bayesian Updating</title>
          <p>1: for each round r do
2:   if r &gt; 0 then
3:     for each agent m_i ∈ M do
4:       compute S_i(r) using Equation 4
5:       compute ΔS_i(r) = |S_i(r) − S_i(r−1)| using Equation 5
6:   if ΔS_i(r) &lt; threshold for each m_i ∈ M, or r = max_rounds, then
7:     set conversation_active to False</p>
        </sec>
      </sec>
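<p>The stopping loop of Algorithm 1 can be sketched as a plain Python skeleton. This is an illustrative skeleton of ours: the LLM opinion-generation step is stubbed out, and each round simply supplies the agents' new average argument sentiments.</p>
<preformat>
```python
# Sketch of the Algorithm 1 loop: each round applies the non-Bayesian
# update per agent; the conversation stops once every agent's change
# falls below a tolerance, or after a maximum number of rounds.

def run_conversation(new_avgs_per_round, alpha=0.5, tol=1e-5, max_rounds=10):
    sentiments = [0.0] * len(new_avgs_per_round[0])
    rounds_run = 0
    for round_avgs in new_avgs_per_round[:max_rounds]:
        rounds_run += 1
        deltas = []
        for i, s_bar in enumerate(round_avgs):
            updated = (1 - alpha) * sentiments[i] + alpha * s_bar
            deltas.append(abs(updated - sentiments[i]))
            sentiments[i] = updated
        if not any(d > tol for d in deltas):  # every agent stable: stop
            break
    return sentiments, rounds_run

# Two agents whose argument sentiment repeats each round: sentiments
# converge geometrically toward 0.4 and -0.2.
final, rounds = run_conversation([[0.4, -0.2]] * 10)
```
</preformat>
<p>In the real framework the per-round averages come from sentiment analysis of freshly generated agent statements, so stability reflects the conversation itself rather than a fixed input stream.</p>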
    </sec>
    <sec id="sec-6">
      <title>6. Experiments</title>
      <p>In this study, we aim to investigate the dynamics of
sentiment and opinion formation in an LLM-based
multi-agent system. We focus on understanding how agents'
opinions evolve through deliberation, and how sentiment
influences their decision-making processes. Our research
questions are as follows:
1. How do agents' opinions change as a result of
deliberating with each other, and can we quantify
these changes?
2. Do agents adopt each other's arguments during
the deliberation process, and can we observe this
in qualitative results?
3. Does the sentiment of an argument (valence,
arousal) affect its adoption by other agents?
4. Do agents exhibit cognitive biases in their opinion
formation, and how can we identify and mitigate
these biases?</p>
      <sec id="sec-6-1">
        <title>6.1. Experimental Setting</title>
        <p>In our experiments, we conducted the simulation with 3
agents and 10 candidates. We used the data set within the
simulation environment described in Section 4. For the
LLM, we used the gpt-3.5-turbo-0613 version of ChatGPT
[27]. For the results shown, the language model
parameters were set as alpha α = 0.5, tolerance = 0.00001, and
temperature = 1.5.</p>
        <p>6.2. Results</p>
        <p>6.2.1. Evaluation Metrics</p>
        <p>The non-Bayesian updating data from the simulation,
shown in Figure 4, reveals sentiment fluctuations among
agents. For instance, Figure 5a shows the VP of
Engineering exhibiting the most dramatic change, especially
in the final round. This volatility, captured by sentiment
and change metrics, highlights the dynamic nature of
opinion formation in multi-agent conversations and suggests
that agents' opinions evolve and respond to the unfolding
discourse, capturing the effectiveness of non-Bayesian
updates in emphasizing real-time perspective shifts.</p>
        <p>Drift scores. In Table 2 it is observed that the CFO
agent generally exhibits moderate drift, while the VP
of Engineering (VPE) and the Recycling Plant Manager
(RPM) show higher drift values, suggesting a more
dynamic adaptation of their MMS in response to the
conversation. This variability in drift signifies the agents'
differing levels of adaptability and potential reevaluation
of their initial stances.</p>
        <p>Table 2: Agent Drift Values for hypothetical candidates
Candidate          CFO      VPE      RPM
Kimberly Carr      0.6224   0.7556   0.5810
Melissa Morgan     0.4504   0.6308   0.7138
Emily Marshall     0.4678   0.7998   0.7390
Mikayla Garrison   0.3254   0.5878   0.5720
Justin Davis       0.3458   0.3638   0.3940
Tamara Brown       0.3842   0.6030   0.6574
Taylor Mahoney     0.3814   0.4794   0.4154
Joshua Alvarado    0.3756   0.5238   0.5788
Melissa Baldwin    0.4228   0.7988   0.5714
James Wallace      0.4240   0.6342   0.6926</p>
        <p>Platitudinal score. The inter-agent similarities
heatmap shown in Figure 5 reveals a contrast in
sentiment alignment among the agents. This divergence
contributes to an overall lower platitudinal score for this
specific run for the given candidate. Such diversity in
sentiment, as captured by the platitudinal metric,
underscores the variation in decision-making approaches
within the agent group, emphasizing the balance between
consensus and individual thought in the simulation
outcomes.</p>
        <p>Nuance Scores. We use Latent Dirichlet Allocation
(LDA) to identify topics. The text is preprocessed by
tokenization and removal of stop words and unwanted
words. A dictionary and corpus are constructed using
the Gensim library. The LDA model identifies 5 topics,
with the top 10 words per topic being most significant.
Figure 6 and ?? show the number of unique words per
topic and word clouds for each candidate, respectively.¹</p>
        <p>¹For brevity, we only show results for 5 candidates, but the
experiment was conducted with 10 candidates for the
platitudinal scores.</p>
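<p>The paper's nuance metric counts topics identified by Gensim's LDA; as a dependency-free stand-in for that pipeline, the underlying idea (thematic diversity after stop-word removal) can be illustrated with a simple distinct-content-word count. This proxy and its word lists are our own, not the paper's implementation.</p>
<preformat>
```python
# Simplified nuance proxy: number of distinct content words across
# statements after stop-word removal. The paper itself fits an LDA
# topic model with Gensim; this only stands in for the intuition.
STOP = {"the", "a", "an", "is", "are", "for", "and", "of", "to", "in"}

def nuance_proxy(statements):
    words = set()
    for s in statements:
        words.update(w for w in s.lower().split() if w not in STOP)
    return len(words)

statements = [
    "the candidate shows strong leadership and budgeting skills",
    "leadership is strong but budgeting experience is limited",
]
print(nuance_proxy(statements))  # 9 distinct content words
```
</preformat>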
        <sec id="sec-6-1-1">
          <title>2https://github.com/langchain-ai/langchain</title>
          <p>texts.</p>
          <p>In our sensitivity analysis, we varied key parameters:
setting alpha to 0.3, 0.5, and 0.7; tolerance to 0.001, 0.005,
and 0.0001; and temperature to 0.7, 1, and 1.5, to
evaluate their impact on sentiment changes. The outcome,
depicted in Figure 7 for ten random candidates, provides
insight into negativity and positivity biases through the
slopes of the OLS regressions. Our findings on this
variation of model parameters show a modest positivity bias,
evidenced by the positive slope being approximately 29%
steeper than its negative counterpart. A slight positivity
or negativity bias trend persisted across varied
parameter settings, with some scenarios, notably alpha = 0.3,
tolerance = 0.005, and temperature = 1.5, showing a more
pronounced positivity bias with a slope more than twice
as steep on the positive side than on the negative side.</p>
          <p>The absence of saliency bias was noted in all
experiments, as indicated by slopes remaining below 1. Linear
based on our evaluation of the  2 values. Notably in the
shown experiment, agents displayed a tendency towards
expressing stronger negative sentiments, with the most
negative reaching -0.76, compared to a maximum
posinegative expressions was marked in most scenarios.
Additionally, the alpha parameter was observed to
significantly influence sentiment ranges, with lower alpha
values yielding more constrained ranges.</p>
          <p>For future studies, we aim to extend our examination of
the cognitive bias to larger candidate sample sizes. This
expansion will enable us to deepen our understanding
of how parameter tuning influences cognitive biases and
decision-making processes within our framework.</p>
          <p>6.2.3. Collective decision-making</p>
          <p>The decision-making data reveals diverse agent
preferences, as evidenced by the variation in candidate ranks
across Borda Count, Tier, and Conviction. We use the
average sentiment score from equation 4, evaluated at
the last round, as the basis for collective decision-making.</p>
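<p>As a sketch, if the average sentiment of equation 4 is the running mean of an agent’s per-round scores (an assumption here, since the equation is not restated in this section), the collective-decision input is that mean evaluated at the final round:</p>

```python
def average_sentiment(scores):
    """Mean of an agent's per-round sentiment scores; evaluated
    over all rounds up to the last one, it is the input to the
    collective decision (sketch of equation 4)."""
    return sum(scores) / len(scores)

rounds = [0.1, 0.3, 0.5, 0.7]   # hypothetical per-round scores
basis = average_sentiment(rounds)
```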
          <p>While some candidates consistently rank higher or lower,
suggesting a consensus on their suitability, discrepancies
across the ranking methods point to differing assessments of
candidate qualities.</p>
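<p>The Borda Count aggregation used for the collective decision can be sketched as follows; the per-agent rankings here are illustrative, not the values from Table 5.</p>

```python
def borda(rankings):
    """Aggregate per-agent rankings (best first) into a collective
    ranking via the Borda Count: a candidate receives n-1 points
    for first place, n-2 for second, and so on."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for place, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (n - 1 - place)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from the three agents (CFO, VPE, RPM).
cfo = ["Baldwin", "Garrison", "Mahoney", "Morgan"]
vpe = ["Baldwin", "Garrison", "Morgan", "Mahoney"]
rpm = ["Garrison", "Baldwin", "Mahoney", "Morgan"]
collective = borda([cfo, vpe, rpm])
```

<p>Even though the RPM ranks Garrison first, the aggregated points still place Baldwin at the top, illustrating how the method smooths individual disagreements.</p>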
          <p>Table 4 shows the overall sentiment scores. The
CFO shows the highest sentiment score, 0.46, towards
Melissa Baldwin, indicating a strong positive inclination.</p>
          <p>6.2.2. Cognitive Bias Testing</p>
          <p>We hypothesize that agents’ updates in sentiment during
conversational rounds might be influenced by their peers’
positive, negative, or prominent opinions. To investigate
this, we chart each agent’s sentiment change, from the
second round onwards, against the recent sentiments
of the other agents. This analysis reveals the correlation
between an agent’s changing sentiment and the influence
of peer opinions.</p>
          <p>We apply Ordinary Least Squares (OLS) regression to
these charted sentiment changes.</p>
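<p>A minimal version of this regression is sketched below, assuming sentiment histories are available as per-round lists (the data layout is an assumption): each agent’s round-to-round change is regressed on the mean sentiment its peers expressed in the previous round, yielding the slope and R² discussed above.</p>

```python
import numpy as np

def peer_influence_regression(history):
    """history: dict mapping agent -> list of per-round sentiment
    scores. Regress each agent's round-to-round sentiment change
    on the mean sentiment of its peers in the previous round and
    return the OLS slope and R^2."""
    xs, ys = [], []
    agents = list(history)
    rounds = len(history[agents[0]])
    for a in agents:
        for t in range(1, rounds):
            peers = [history[b][t - 1] for b in agents if b != a]
            xs.append(np.mean(peers))
            ys.append(history[a][t] - history[a][t - 1])
    x, y = np.array(xs), np.array(ys)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    r2 = 1.0 - resid.var() / y.var()
    return slope, r2
```

<p>A slope near zero would indicate that peers’ statements barely move an agent’s sentiment, while the R² indicates how well the linear model fits.</p>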
          <p>Defensibility Scores. Candidate resumes are
processed through the Langchain embedding2 and
transformed into a format suitable for detailed analysis.
The llama index libraries, VectorStoreIndex and
ServiceContext, are used to create an indexed
repository of the vectorized documents. This index serves as a
searchable database, allowing efficient retrieval of text
segments that are contextually similar to a given input
and that provide support for the argument. If no relevant
text is found, a default score is assigned.</p>
          <p>In contrast, the CFO’s lowest sentiment score is -0.74,
towards Melissa Morgan, signaling a significant negative
view. Similarly, the VPE aligns with the CFO in favoring
Melissa Baldwin with the highest score of 0.34, but
diverges in its lowest sentiment, which is directed
towards Taylor Mahoney with a score of -0.55. The RPM,
on the other hand, exhibits the most positive sentiment
towards Mikayla Garrison with a score of 0.59, while
sharing the CFO’s negative sentiment towards Melissa
Morgan, albeit at a less intense level of -0.46. The
sentiment scores from the conversations directly influence
the ranking of candidates, as shown in Table 5. Applying
the Borda Count method to the combined rankings yields
a collective decision. Although individual agents might
rank candidates differently based on their interactions,
the aggregated results provide a more comprehensive
assessment. This approach demonstrates how sentiment
analysis combined with a voting system could inform
hiring decisions in a multi-agent setting.</p>
          <p>Table 6: Valence for each candidate (CFO, VPE, RPM).
Kimberly Carr: -1, 0, 0. Melissa Morgan: -1, 0, 0.
Mikayla Garrison: 0, 0, 0. Emily Marshall: 0, 0, 0.
Justin Davis: 0, 0, 0. Tamara Brown: 0, 0, 0.
Taylor Mahoney: 0, -1, 0. Joshua Alvarado: 0, 0, 0.
Melissa Baldwin: 0, 0, 0. James Wallace: 0, 0, 0.</p>
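<p>The retrieval step behind the defensibility scores can be sketched as a plain cosine-similarity search; the embed() function below is a hypothetical deterministic stand-in for the Langchain embedding model, and the scoring rule is an assumption rather than the pipeline’s actual API.</p>

```python
import zlib

import numpy as np

def embed(text, dim=8):
    """Hypothetical stand-in for a real embedding model; a real
    pipeline would call the embedding model instead of hashing."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    return rng.normal(size=dim)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def retrieve(query, segments, min_sim=0.0):
    """Return resume segments ranked by cosine similarity to the
    query; an empty result corresponds to the 'no relevant text
    found' case that yields a default defensibility score."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in segments]
    return [s for sim, s in sorted(scored, reverse=True) if sim >= min_sim]
```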
          <p>The intensity of an agent’s final sentiment score
determines the valence score. In Table 6, this occurred only
three times: the CFO attributed a negative valence to two
candidates, while the VPE attributed a negative valence
to one candidate. Consequently, no candidate was
classified as Tier 1, with most classified as Tier 2, except for
the three candidates with negative valence, who were
classified as Tier 3.</p>
          <p>The sentiment volatility of the agents, as shown in
Table 7, was mostly moderate, indicating strong conviction
in their opinions. However, there were instances of high
volatility, such as the CFO’s sentiment towards Kimberly
Carr and Mikayla Garrison, and the VPE’s sentiment
towards Kimberly, Joshua, and James. The RPM’s
sentiment was volatile towards Mikayla and Melissa Baldwin.</p>
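<p>The valence and tier assignment can be sketched as below. The 0.5 intensity cutoff is an illustrative assumption (it is, however, consistent with the reported finals of -0.74 and -0.55 receiving negative valence while 0.46 does not), as is the tier rule.</p>

```python
def valence(final_sentiment, threshold=0.5):
    """Map a final sentiment in [-1, 1] to a valence of -1, 0, or
    +1 when its intensity exceeds the threshold (assumed 0.5)."""
    if final_sentiment >= threshold:
        return 1
    if final_sentiment <= -threshold:
        return -1
    return 0

def tier(valences):
    """Tier 1 if any agent assigns positive valence, Tier 3 if any
    assigns negative valence, otherwise Tier 2 -- a sketch
    consistent with the classification reported in the text."""
    if any(v > 0 for v in valences):
        return 1
    if any(v < 0 for v in valences):
        return 3
    return 2
```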
          <p>The agents’ conviction in their opinions is calculated by
dividing the final sentiment by the volatility, with higher
values indicating stronger intuition about a candidate’s
suitability for the role.</p>
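<p>Conviction, and the Gut Feeling re-ranking it produces, can be sketched as below; the example scores and the guard against zero volatility are assumptions for illustration.</p>

```python
def conviction(final_sentiment, volatility, eps=1e-9):
    """Conviction = final sentiment divided by volatility; a
    larger magnitude means the agent is more certain."""
    return final_sentiment / max(volatility, eps)

def gut_feeling_rank(final_scores, volatilities):
    """Re-rank candidates by conviction instead of raw sentiment,
    so opinions the agent is unsure about carry less weight."""
    convictions = {c: conviction(final_scores[c], volatilities[c])
                   for c in final_scores}
    return sorted(convictions, key=convictions.get, reverse=True)

# Hypothetical scores: the agent is volatile about "Kimberly",
# so she is ranked more generously than raw sentiment suggests.
finals = {"Joshua": 0.5, "Kimberly": -0.6, "James": -0.3}
vols = {"Joshua": 0.2, "Kimberly": 0.8, "James": 0.2}
ranked = gut_feeling_rank(finals, vols)
```

<p>Here raw sentiment would place Kimberly last, but her high volatility dilutes the negative opinion and moves her up one place, mirroring the revision described in the text.</p>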
          <p>Table 7: Sentiment Volatility</p>
          <p>The Gut Feeling Rank for each candidate is a revised
ranking that takes into account an agent’s conviction in its
own sentiment. In Table 9, the Gut Feeling of the RPM
toward Joshua still ranks him in first place, but
the CFO revises its ranking of Kimberly from 9th
place to 8th. The more generous ranking can
be interpreted as a result of the agent’s “acknowledgement”
that it is not sure of its opinion toward
Kimberly.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this paper, we introduce Sentimental Agents,
LLM-based agents that generate opinions for collective
decision-making within conversational settings. Our
proposed framework integrates a non-Bayesian updating
mechanism to track sentiment volatility and opinion
evolution. In a simulated HR recruiting scenario, we assess
these agents’ decision-making abilities, noting their
diverse opinions and preference shifts over multiple rounds.</p>
      <p>The findings suggest that model parameters, such as alpha
and tolerance, significantly influence sentiment
expression, and thus cognitive bias, within the system. This
research offers a foundation for advanced tool development
applicable to domains such as HR recruiting, medical
diagnostics, and education.</p>
    </sec>
  </body>
  <back>
  </back>
</article>