Sentimental Agents: Exploring Deliberation, Cognitive Biases, and Decision-making in LLM-based Multiagent Systems

Elizabeth A. Ondula (ondula@usc.edu), University of Southern California, Los Angeles, USA

Daniele Orner (daniele@braveventurelabs.com), Brave Venture Labs

Nick Mumero Mwangi, Brave Venture Labs

Casandra Rusti (rusti@usc.edu), University of Southern California, Los Angeles, USA

ISSN 1613-0073

Keywords: Multi-Agent Systems, Large Language Models, Sentiment Analysis, Cognitive Biases, Decision-Making, Opinion Dynamics

How does sentiment affect deliberative opinion dynamics in multi-agent systems using Large Language Models (LLMs)? In this paper, we introduce Sentimental Agents, a framework designed to study collaborative decision-making in a society of agents, each equipped with a distinct Mental Model of Self. We propose a method to integrate sentiment analysis and a non-Bayesian update mechanism, to analyze and interpret agents' beliefs and interactions systematically. This method allows us to observe the volatility of the sentiment associated with different agent statements, as well as the change in opinion throughout the agents' conversation. We further use it to model and compare collaborative decision-making approaches. We situate these agents in a simulated Human Resource recruiting environment as a case study to evaluate a candidate's fit for a role. We present a set of metrics to assess the quality of the agents' output. Finally, we explore cognitive biases in the agents' individual and collective opinion formation, a fundamental step to enhance decision-making capabilities and mitigate distortions in the system and the agents' collective reasoning.

Introduction

Multi-agent systems (MAS), composed of interactive agents, have been pivotal in modeling social phenomena, decision-making processes and collaborative tasks. Large Language Models (LLMs) such as GPT-4 [1] have opened new possibilities for exploring complex social dynamics through the simulation of linguistic interactions among agents. These models provide the capabilities needed to simulate communication scenarios, and integrating them into MAS makes it possible to study conversations and interaction patterns in more detail.

LLMs have demonstrated exceptional performance in generating text that embodies sentiment and in executing sentiment analysis tasks [2]. However, the effect of sentiment on deliberative opinion dynamics within an artificial society of agents is a domain that has not yet been fully explored. Traditional agent models may not adequately account for the influence of behavioral states like sentiment and cognitive biases on the decision-making process. Our work examines how the outputs of LLM agents influence one another within these frameworks.

We introduce Sentimental Agents, a framework designed to study and analyze collaborative decision processes. These agents are not only equipped with language capabilities but also possess a unique Mental Model of Self. This allows them to process and exhibit behaviors that can offer a comprehensive view of how opinions are formed and evolve in a multi-agent setting.

Our system is designed primarily to observe and describe agents' behavior, rather than to design or direct it. We do not currently include objectives, reward functions, utility metrics or payoffs in our model. The focus is on the natural evolution of interactions among agents without imposing external incentives or goals. Our study concentrates on non-strategic interactions. Unlike strategic agents, which model the behavior of others and act based on these predictions, our non-strategic agents do not possess such models. This distinction is crucial as it means our agents are not engaging in behaviors such as scheming or deceiving to achieve a specific objective. If LLM-based multi-agent systems are ultimately to be used to support decision-making, it is critical to understand and explain how their decisions are made. This is especially true in the hypothetical case of such systems being designed to evaluate, rank or recommend humans. At present, there are no unified solutions that can systematically analyze the opinions and interactions of these agents, and the potential correlation between the two. To remedy this, we make the following key contributions:

• We develop a framework, Sentimental Agents, to explore and study collective decision-making processes in a society of agents.
• We propose using sentiment analysis as a method to quantify content generated by LLM-based agents for evaluation and recommendation tasks.
• We propose a method to apply a non-Bayesian model for opinion dynamics within a multi-agent system. This offers a perspective on how opinions are formed and altered in a sentiment-driven environment.

Additionally, this work introduces metrics for assessing the quality of conversation and decision-making in language model-based multi-agent systems. These metrics, namely nuance, platitudinal score, drift, and defensibility, offer a toolkit for evaluating the effectiveness of such systems in diverse scenarios. Furthermore, we evaluate cognitive biases including negativity, positivity, and saliency biases. This assessment offers valuable insights into the cognitive influences and tendencies within multiagent decision-making processes. Finally, the framework is applied in a simulated Human Resource recruiting environment, serving as a practical case study. This application not only validates the theoretical model but also highlights the practical potential of the approach in real-world settings.

Related Works

Multi-Agent Collaboration

In the study of multi-agent systems, understanding how agents collaborate to achieve collective objectives is essential. One approach, explored in [3], examines the use of LLMs in multi-agent settings with a focus on "Theory of Mind" (ToM), which is the ability of an agent to understand and predict the mental states and intentions of others. Although ToM is crucial for collaboration, our focus is on how agents make decisions rather than on how they understand others' mental states. Other studies, such as [4], look at how agents can debate and make collective decisions using a method known as gradual semantics, where agents exchange arguments and progressively update their opinions to reach a shared decision. Our approach differs in that it explains agent interactions and decision processes by leveraging a mental model of self and sentiment tracking. Further, our agents do not have access to other agents' memories.

[5] explores how agents coordinate in complex tasks that necessitate both working together on the same task (cooperation) and dividing the task into smaller parts to be done individually (divide-and-conquer). This study highlights the need for flexible strategies to manage tasks that require both joint and individual efforts, differing from our work which doesn't focus on specific task coordination but rather on general deliberations on various topics.

Similarly, [6] demonstrate the potential of collaborative mechanisms with LLMs in enhancing social interactions among agents, providing valuable insights into how these technologies can foster collaborative intelligence within multi-agent settings.

LLM-based Multi-Agent Frameworks

An LLM-based agent is defined as an AI system comprising three core components: the brain, perception, and action modules [7]. The brain module stores knowledge and memories, facilitating information processing and decision-making, essential for reasoning and handling new tasks. The perception module extends the agent's sensory capabilities to include textual, auditory, and visual modalities. This enhances its understanding of the environment. The action module enables the agent to perform physical tasks and interact with its environment.

In terms of operating mechanism, the agent uses natural language for communication, with the brain processing information from the perception module to form strategies and make decisions. In our work, we introduce the concept of a Mental Model of Self (MMS). This concept has been discussed in social psychology [8]. It refers to an integrated theory and understanding that an agent forms to organize and make sense of its self-knowledge, experiences and memories, turning them into broader principles that can guide anticipation of future behaviors and consequences.

In our implementation, it serves an important organizational function in making sense of self-knowledge. We summarize the differences between the Sentimental Agents framework and prior works in Table 1.

Non-strategic Multi-Agent Systems

Opinion dynamics has been extensively explored for over six decades, predominantly in the fields of sociology and psychology. It delves into the mechanisms and principles that dictate the formation and alteration of individual opinions under the influence of others. This involves examining a range of models and frameworks to comprehend collective behaviors and the process of consensus formation [9]. Our work focuses on a nonstrategic model within opinion dynamics, meaning the model does not incorporate game theory principles, nor does it involve agents optimizing specific utilities. Non-Bayesian updating, in this context, signifies a process wherein opinions are modified not based on a factual or probabilistic framework that converts prior probabilities into posterior probabilities. Instead, this approach entails agents updating their opinions influenced by the views of others, without basing these on an unknown state of nature. The updating mechanism in such models can be either synchronous, where all agents update their opinions simultaneously, or asynchronous, where updates occur at different times. A recent survey categorizes and discusses various models prevalent in existing literature [10].

We further use Sentiment Analysis to investigate opinions which manifest as either positive or negative [11]. Studies have shown that generative models, such as Large Language Models (LLMs), are capable of producing text, which can include opinions with specific sentiments, depending on their application [12].

Evaluating LLM-based Systems

Evaluation of LLMs is emerging as a discipline to assess the performance of different AI systems. Currently, for LLMs, there is no single benchmark or protocol that emerges as universally superior. This reflects the diversity of tasks and model capabilities. [2] provides an exhaustive summary and discussion based on existing works, covering the evaluation tasks, methods and benchmarks that are crucial for assessing the performance of LLMs. In our work, we define specific metrics to assess conversation quality. These metrics include nuance, platitudinal score, drift and defensibility scores, which are detailed in Section 5.6.

Preliminaries

Conversation protocols

Consider a conversational simulation system with a set of agents denoted as $\mathcal{M} = \{m_1, m_2, \dots, m_n\}$. Each agent $m_i \in \mathcal{M}$ is initialized with a Mental Model of Self (MMS) and a memory component for storing an opinion log. In this system, the engagement among agents in each round $t$ is ordered with equal participation.

Definition 1. Argument (𝐴) is a component of an opinion that contributes to its overall sentiment.

For each argument $A$, a sentiment value $S_A$ is assigned, mapping the argument to a spectrum of sentiment values (positive, negative, neutral, and their intensities):

$$S_A = f(A) \tag{1}$$

where $f : A \mapsto S_A$ is the sentiment mapping function for arguments.

Definition 2. The opinion $O(m_i, t)$ of agent $m_i$ in a given round $t$ is a set of arguments $A$.

The sentiment of an opinion $S_O$ is the average of the sentiment values $S_A$ of all its arguments:

$$S_O(m_i, t) = \frac{1}{|O(m_i, t)|} \sum_{A \in O(m_i, t)} S_A \tag{2}$$
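As a minimal illustration of Equation 2, the sketch below averages per-argument sentiment values into an opinion sentiment. It assumes the mapping $f$ has already produced numeric scores for each argument (for example in the range [-1, 1]); the function name `opinion_sentiment` is hypothetical and is not taken from the framework's code.

```python
from statistics import fmean


def opinion_sentiment(argument_sentiments):
    """Equation 2: mean of the per-argument sentiment values S_A.

    `argument_sentiments` is assumed to be the list of scores f(A)
    for every argument A in one opinion O(m_i, t).
    """
    if not argument_sentiments:
        raise ValueError("an opinion needs at least one argument")
    return fmean(argument_sentiments)


# One opinion made of three arguments (illustrative values only).
print(opinion_sentiment([0.5, -0.25, 0.5]))
```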

The Ordered Engagement in the system is represented by a function $E : \mathcal{M} \times t \to m_i$, which establishes the speaking order of agents in each round $t$. Under this model, each agent $m_i$ contributes exactly one opinion per round. The collective state of opinions at any given round $t$ is represented as a vector:

$$X_E(t) = [O(m_1, t), O(m_2, t), \dots, O(m_n, t)] \tag{3}$$

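In each conversation round $t$, the sentiment value $S_O(m_i, t)$ for each agent $m_i$ is updated to reflect the sentiment of the newly formed opinion. This process considers the sentiment values $S_A$ of the arguments within the opinion $O$.

Table 1: A comparison of different language model-based multi-agent frameworks.

| Related Work | Sentiment Analysis | Engagement type | Memory | Decision module | Bias Evaluation |
|---|---|---|---|---|---|
| [13] | No | Ordered | Belief | No | Confirmation bias |
| [14] | No | Ordered | Store/Retrieve | Yes | No |
| [15] | No | Ordered | Internal critic | Yes | Fact-checking |
| [16] | No | Varies | Chat history | Yes | No |
| [17] | No | Ordered | Specialized roles | Yes | No |
| [18] | No | Ordered | User-driven | Yes | User-preference |
| [19] | No | Ordered | Rationale analysis | Yes | Credibility check |
| [20] | No | Ordered | Dynamic Memory | No | Opinion classifier |
| Sentimental Agents | Yes | Ordered | Opinion logs | Yes | Cognitive bias |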

The sentiment update is executed using a Non-Bayesian method, mathematically represented by:

$$S_O^i(t) = \alpha \cdot \left( \frac{1}{|O(m_i, t)|} \sum_{A \in O(m_i, t)} S_A \right) + (1 - \alpha) \cdot S_O^i(t - 1) \tag{4}$$

Here, $S_O^i(t)$ is the updated sentiment of agent $m_i$ at round $t$. The term in parentheses is the average sentiment of all the arguments expressed by agent $m_i$ at round $t$, that is, $S_O(m_i, t)$ from Equation 2, with each argument $A$ having its sentiment value $S_A = f(A)$. The parameter $\alpha$ is a weighting factor that determines the influence of the new opinion's average sentiment on the agent's updated sentiment, relative to its previous sentiment $S_O^i(t - 1)$.

The change in sentiment $\Delta S_O^i(t)$ for agent $m_i$ is then calculated as the absolute difference between the updated sentiment value $S_O^i(t)$ at round $t$ and the agent's previous sentiment value $S_O^i(t - 1)$ at round $t - 1$:

$$\Delta S_O^i(t) = |S_O^i(t) - S_O^i(t - 1)| \tag{5}$$
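Equations 4 and 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the framework's implementation: the function names `update_sentiment` and `sentiment_change` are hypothetical, and the argument sentiments are assumed to be available as plain floats.

```python
def update_sentiment(previous, argument_sentiments, alpha=0.5):
    """Equation 4: blend the new opinion's mean sentiment with the previous value."""
    if not argument_sentiments:
        raise ValueError("an opinion needs at least one argument")
    new_opinion = sum(argument_sentiments) / len(argument_sentiments)
    return alpha * new_opinion + (1 - alpha) * previous


def sentiment_change(updated, previous):
    """Equation 5: absolute change between two consecutive rounds."""
    return abs(updated - previous)


# An agent starting from a neutral sentiment hears itself argue mildly in favor.
previous = 0.0
updated = update_sentiment(previous, [1.0, 0.0], alpha=0.5)
print(updated, sentiment_change(updated, previous))
```

With $\alpha = 1$ the agent adopts the new opinion's sentiment outright, and with $\alpha = 0$ it never moves, so $\alpha$ acts as a smoothing weight between the two.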

Collective decision protocols

When the conversation ends, we take the final sentiment score of each agent. For the Gut-feeling protocol, we additionally use the average of $S_O$ over the conversation.

Definition 3. Borda Count Protocol: A method to collectively rank a list of items, given each individual's order of preference.

Given $n$, the number of items, each agent $m_i$ ranks these items. The point assignment for an item $j$ by agent $m_i$ is $P_{m_i,j}$, with the top-ranked item receiving $n$ points and the last receiving 1 point. The total points for each item $j$ is calculated as $T_j = \sum_{i=1}^{|\mathcal{M}|} P_{m_i,j}$, and items are ranked in descending order of their total points $T_j$.
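The Borda Count Protocol of Definition 3 can be sketched as follows. The function name `borda_count` and the alphabetical tie-break are assumptions made for this illustration; the text does not specify how ties are resolved.

```python
def borda_count(rankings):
    """Aggregate per-agent rankings (best first) into Borda totals T_j.

    Each ranking must list the same n items; the top item earns n points
    and the last item earns 1 point.
    """
    items = list(rankings[0])
    n = len(items)
    totals = {item: 0 for item in items}
    for ranking in rankings:
        if len(ranking) != n or set(ranking) != set(totals):
            raise ValueError("every agent must rank the same set of items")
        for position, item in enumerate(ranking):
            totals[item] += n - position
    # Assumption: ties are broken alphabetically by item name.
    ordered = sorted(totals, key=lambda item: (-totals[item], item))
    return totals, ordered


# Three agents ranking three hypothetical candidates.
totals, ordered = borda_count([
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "C", "B"],
])
print(totals, ordered)
```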

Definition 4. Tiered List Protocol: A method to collectively classify a list of items in 3 tiers, given the items that each individual can't accept, and the items they like the most.

The Valence $V_j$ for each item $j$ is determined based on the sentiment of opinion $S_O$. For $S_O < -0.5$, $V_j = -1$; for $-0.5 \le S_O \le 0.5$, $V_j = 0$; and for $S_O > 0.5$, $V_j = 1$. Items are classified into three tiers according to $V_j$: Tier 1 for $V_j = 1$, Tier 2 for $V_j = 0$, and Tier 3 for $V_j = -1$.
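The thresholds of Definition 4 translate directly into code. The sketch below maps a single sentiment score to a valence and a tier; how several agents' sentiments are combined into one $S_O$ per item is not specified in the text, so that step is left out. The function names are hypothetical.

```python
def valence(sentiment):
    """Map a sentiment score S_O to a valence of -1, 0 or 1 (Definition 4)."""
    if sentiment < -0.5:
        return -1
    if sentiment > 0.5:
        return 1
    return 0


def tier(valence_value):
    """Tier 1 for valence 1, Tier 2 for valence 0, Tier 3 for valence -1."""
    return {1: 1, 0: 2, -1: 3}[valence_value]


for score in (-0.74, 0.26, 0.69):
    print(score, valence(score), tier(valence(score)))
```

Note that both boundary values, -0.5 and 0.5, fall into the neutral band and therefore into Tier 2.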

Definition 5. Gut-feeling List Protocol: A method to collectively rank a list of items based on the confidence of individuals' feeling toward each item.

The volatility $\nu_{m_i,j}$ of agent $m_i$'s sentiment towards item $j$ over several rounds is calculated. Conviction $I_{m_i,j}$ is derived as a function of both volatility $\nu_{m_i,j}$ and the final sentiment score $S_{m_i,j}$ for item $j$. The Gut-feeling list is then generated using a Borda count based on $I_{m_i,j}$ for each item across all agents, and items are ranked based on the total Conviction points $T^I_j = \sum_{i=1}^{|\mathcal{M}|} I_{m_i,j}$ in descending order.
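A possible reading of Definition 5 is sketched below. Two points are assumptions of this illustration: volatility is taken to be the population standard deviation of an agent's sentiment across rounds, and a small `eps` guards against division by zero when the sentiment never moves. Conviction is computed as the final sentiment divided by the volatility, and items are ordered by the summed conviction $T^I_j$. All names are hypothetical.

```python
from statistics import pstdev


def conviction(sentiment_by_round, eps=1e-9):
    """Final sentiment divided by volatility (assumed: population std deviation)."""
    volatility = pstdev(sentiment_by_round)
    return sentiment_by_round[-1] / max(volatility, eps)


def gut_feeling_ranking(histories):
    """Rank items by total conviction across agents.

    `histories[agent][item]` is the list of that agent's sentiment
    towards the item, one value per round.
    """
    totals = {}
    for per_item in histories.values():
        for item, series in per_item.items():
            totals[item] = totals.get(item, 0.0) + conviction(series)
    ordered = sorted(totals, key=lambda item: (-totals[item], item))
    return totals, ordered


histories = {
    "agent_1": {"x": [0.0, 0.4], "y": [-0.1, -0.3]},
    "agent_2": {"x": [0.2, 0.6], "y": [0.1, 0.3]},
}
print(gut_feeling_ranking(histories))
```

Under this reading, a strongly positive sentiment that barely fluctuates yields a large conviction, while the same final sentiment reached after large swings yields a smaller one.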

Applying the Framework

Our framework is applied to a simulated environment inspired by Human Resource recruiting to evaluate the effectiveness of Sentimental Agents. These agents are designed to generate opinions reflecting their unique expertise, contributing to collective decision-making. The simulation explores opinion formation and decision-making processes within an LLM-based multi-agent setting, mirroring real-world HR recruitment where employers assess candidates through discussions with various experts. In this context, LLM-based agents are expected to engage in conversation and form diverse opinions that influence their decision-making in a simulated recruiting scenario.

Configuration

In the HR recruiting simulation, advisor agents analyze candidates' CVs and engage in discussions to provide opinions about each candidate. These agents, with expertise in roles like Chief Financial Officer (CFO), Vice President of Engineering, and Recycling Plant Manager, evaluate profiles and generate text reports. They also score candidates and, through collective decision-making protocols like the Borda Count, rank candidates or select the top performers.

Dataset

We sourced our dataset from the study conducted by [21]. This dataset is a collection of resumes represented in a multi-label format. To facilitate easy access and integration of this dataset into our framework, we have developed a script that automates the process of downloading and parsing the data.

Sentimental Agents Framework

The system design, shown in Figure 1, consists of 7 modules. We describe each of them here.

Brief Module

The module provides a configuration interface for system initialization with four components: input type, output type, task type, and context. It handles single and multiple item formats for input and output and requires user-defined context specifying task object and subject, with optional Knowledge base integration. Predefined rules in the module automatically associate Input, Output, and Task Types. The logic enforces specific task types Evaluate, Score, Classify for single-item inputs and broader tasks for multi-item inputs. For rank tasks, the output is structured as a list to match task requirements. This design ensures alignment between input/output formats and system functionality.

Agent Initialization

The Agent Initialization module includes two main elements: Mental Model of Self (MMS) and Memory with qualitative and quantitative opinion logs. It configures agent interaction types for opinion formation as dynamic or independent. The module requires user input to set the number of agents and their expertise, which informs the creation of detailed agent profiles, including priorities, objectives, and evaluation criteria. Figure 3 shows an instance of an MMS. Key parameters include the tolerance level, affecting opinion change propensity, and the drift metric, which tracks MMS variability. Strategies for maintaining agent consistency involve controlled character prompts and setting the MMS prompt temperature. The opinion formation type, set via a boolean input, influences the nature of agents' decision-making processes. Each agent's opinion log is stored in a central memory system, ensuring decision-making is based on comprehensive and transparent data. Figure 2 shows the agent initialization prompt.

Figure 2: Series of prompts used to create a group of agents' Mental Model of Self, for one instance of the system.

Opinion Dynamics Module

This module coordinates agent conversations and decision-making, and consists of conversation and decision-making protocols. It defines the number of agents, the engagement type and the stopping mechanism. The current implementation employs ordered engagement with equal participation. For the stopping mechanism, the module uses a non-strategic approach: conversations conclude based on non-Bayesian updating, as shown in Algorithm 1, ending once agents' opinions reach stability. This contrasts with strategic interactions, which involve different mechanisms such as rewards or objectives.
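The stopping rule described here can be sketched as a small loop. This is an illustration under assumptions, not the framework's code: `opinion_fn` is a hypothetical stand-in for the LLM call plus sentiment analysis and returns the argument sentiments of one agent in one round, the initial sentiments are supplied by the caller, and `threshold` plays the role of the tolerance parameter.

```python
def run_conversation(initial, opinion_fn, alpha=0.5, threshold=1e-5, max_rounds=10):
    """Run ordered rounds until every agent's sentiment change falls below threshold.

    initial:    dict mapping agent name -> starting sentiment (an assumption).
    opinion_fn: callable (agent, round) -> list of argument sentiments.
    Returns the per-round history of agent sentiments, starting with `initial`.
    """
    sentiment = dict(initial)
    history = [dict(sentiment)]
    for t in range(1, max_rounds + 1):
        changes = {}
        # Ordered engagement: agents speak in insertion order, once per round.
        for agent in sentiment:
            args = opinion_fn(agent, t)
            new_opinion = sum(args) / len(args)
            updated = alpha * new_opinion + (1 - alpha) * sentiment[agent]
            changes[agent] = abs(updated - sentiment[agent])
            sentiment[agent] = updated
        history.append(dict(sentiment))
        if all(change < threshold for change in changes.values()):
            break  # opinions have stabilized
    return history


# A single agent that always argues with sentiment 1.0 converges geometrically.
history = run_conversation({"advisor": 0.0}, lambda agent, t: [1.0], threshold=0.1)
print(len(history) - 1, history[-1])
```

A smaller threshold keeps the conversation going for more rounds, so the tolerance and `max_rounds` together bound the cost of a run.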

Conversation Module

This module analyzes agents' statements in conversations, comprising four components: Argumentation, which breaks down statements into arguments for qualitative logging; Sentiment Analysis, which evaluates and quantitatively logs the sentiment of each argument; Opinion Change, using non-Bayesian updating to monitor sentiment shifts; and Conversation Trends, gauging significant changes across rounds to infer opinion stabilization and conversation conclusion.

Decision Module

The Decision Making Protocol module is designed to accommodate various decision-making protocols, including Borda Count, Tiered List, and Gut Feeling List, as detailed in the preliminaries (Section 3). It operates by capturing the final sentiment of each agent and the average sentiment throughout the conversation. The functionality and outcomes of these different decision-making processes are further explored and discussed in the results (Section 6).

Evaluation and Cognitive Bias Modules

This module evaluates the quality of conversations through various metrics.

• Nuance: Examines the diversity of themes and perspectives, quantified by the number of topics identified within individual statements or the entire conversation.
• Platitudinal Score: Calculated using cosine similarity, it measures the uniqueness of outcomes in the conversation rounds, with higher scores indicating less similarity between different runs.
• Drift: Assesses the stability of each agent's Mental Model of Self, monitoring the relevance of results to the advisors' profiles and checking for consistency throughout the conversation.
• Defensibility: Evaluates the strength and evidence backing of the agents' arguments, ensuring they are well-supported and referenceable.
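The cosine similarity underlying the platitudinal score can be sketched as below. The choice of vectors is an assumption of this illustration: each agent is represented by its per-round sentiment values. The text does not pin down how the pairwise similarities are turned into a final score, so the sketch stops at the mean pairwise similarity. Function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all pairs of agents."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_similarity(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Per-round sentiment trajectories for three hypothetical agents.
trajectories = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.0]}
print(mean_pairwise_similarity(trajectories))
```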

In this research, we examine three cognitive biases: negativity, positivity, and saliency. Negativity bias might lead agents to give undue weight to adverse opinions [22] [23], while positivity bias could result in an overemphasis on favorable views [24] [25]. Saliency bias, on the other hand, might cause agents to focus on the most prominent or emotionally striking aspects of an opinion, potentially overshadowing other relevant information [26].

Algorithm 1 Non-Bayesian Updating

Experiments

In this study, we aim to investigate the dynamics of sentiment and opinion formation in an LLM-based multiagent system. We focus on understanding how agents' opinions evolve through deliberation, and how sentiment influences their decision-making processes. Our research questions are as follows:

1. How do agents' opinions change as a result of deliberating with each other, and can we quantify these changes?
2. Do agents adopt each other's arguments during the deliberation process, and can we observe this in qualitative results?
3. Does the sentiment of an argument (valence, arousal) affect its adoption by other agents?
4. Do agents exhibit cognitive biases in their opinion formation, and how can we identify and mitigate these biases?

Experimental Setting

In our experiments, we conducted the simulation with 3 agents and 10 candidates. We used the data set within the simulation environment described in Section 4. For the LLM, we used the gpt-3.5-turbo-0613 version of ChatGPT [27]. For the result shown, the language model parameters were set as alpha 𝛼 = 0.5, tolerance = 0.00001, and temperature = 1.5.


Results

Evaluation Metrics

The non-Bayesian updating data from the simulation, shown in Figure 4, reveals sentiment fluctuations among agents. For instance, Figure 5a shows the VP of Engineering exhibiting the most dramatic change, especially in the final round. This volatility, captured by sentiment and change metrics, highlights the dynamic nature of opinion formation in multi-agent conversations and suggests that agents' opinions evolve and respond to the unfolding discourse, emphasizing the effectiveness of non-Bayesian updating in capturing real-time perspective shifts (see footnote 1).

Platitudinal score. The inter-agent similarities heatmap shown in Figure 5 reveals a contrast in sentiment alignment among the agents. This divergence contributes to an overall lower platitudinal score for this specific run for the given candidate. Such diversity in sentiment, as captured by the platitudinal metric, underscores the variation in decision-making approaches within the agent group, emphasizing the balance between consensus and individual thought in the simulation outcomes.

Drift scores. In Table 3 it is observed that the CFO agent generally exhibits moderate drift, while the VP of Engineering (VPE) and the Recycling Plant Manager (RPM) show higher drift values, suggesting a more dynamic adaptation of their MMS in response to the conversation. This variability in drift signifies the agents' differing levels of adaptability and potential reevaluation of their initial stances.

Nuance scores. Latent Dirichlet Allocation (LDA) is used to extract topics from text statements. The data is preprocessed by tokenization and removal of stop words and unwanted words. A dictionary and corpus are constructed using the Gensim library. The LDA model identifies 5 topics, with the top 10 words per topic being most significant. Figure 6 and ?? show the number of unique words per topic and word clouds for each candidate, respectively.

Defensibility scores. Candidate resumes are processed through the LangChain embedding (see footnote 2) and transformed into a format suitable for detailed analysis. The LlamaIndex classes VectorStoreIndex and ServiceContext are used to create an indexed repository of the vectorized documents. This index serves as a searchable database, allowing efficient retrieval of text segments that are contextually similar to a given input. When evaluating agents' arguments, the indexed space is searched to find text segments from the resumes that closely match the argument. The similarity between an agent's argument and the retrieved text is quantified as a score, with higher scores indicating stronger support for the argument. If no relevant text is found, a score of zero is assigned, suggesting an unsupported argument.

Footnote 1: For brevity, we only show results for 5 candidates, but the experiment was conducted with 10 candidates for the platitudinal scores.


Cognitive Bias Testing

We hypothesize that agents' updates in sentiment during conversational rounds might be influenced by their peers' positive, negative, or prominent opinions. To investigate this, we chart each agent's sentiment change from the second round onwards, against the recent sentiments of other agents. This analysis reveals the correlation between an agent's changing sentiment and the influence of peer opinions.

We apply Ordinary Least Squares (OLS) regression to analyze negative and positive sentiments separately, setting the y-intercept at zero to indicate that neutral peer statements might not impact an agent's sentiment. Analyzing the regression's strength ($R^2$) and the slope, as well as the data point distribution, provides insights into the cognitive tendencies of the agents. Additionally, by adjusting our three parameters, alpha, tolerance, and temperature, we aim to better understand how these factors affect agents' cognitive biases. This study offers important insights into the decision-making processes in multi-agent systems, particularly in sentiment-influenced contexts.

Footnote 2: https://github.com/langchain-ai/langchain
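A regression through the origin has a closed-form slope, $\sum x y / \sum x^2$, which the sketch below applies separately to negative and positive peer sentiments. It is a stdlib-only illustration with hypothetical function names and made-up data. One assumption to flag: $R^2$ is computed in its uncentered form, $1 - \sum (y - \hat{y})^2 / \sum y^2$, which is a common convention for models without an intercept; the text does not say which form was used.

```python
def ols_through_origin(x, y):
    """Fit y = slope * x with the intercept fixed at zero; return (slope, R^2)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain a non-zero value")
    slope = sum(a * b for a, b in zip(x, y)) / sxx
    ss_res = sum((b - slope * a) ** 2 for a, b in zip(x, y))
    return slope, 1 - ss_res / syy  # uncentered R^2 (assumption)


def bias_slopes(peer_sentiment, own_change):
    """Fit the negative and positive halves of the peer-sentiment axis separately."""
    pairs = list(zip(peer_sentiment, own_change))
    negative = [(p, c) for p, c in pairs if p < 0]
    positive = [(p, c) for p, c in pairs if p > 0]
    return {
        "negative": ols_through_origin(*zip(*negative)),
        "positive": ols_through_origin(*zip(*positive)),
    }


# Made-up data: the response to positive peers is three times steeper.
result = bias_slopes([-1.0, -0.5, 0.5, 1.0], [-0.2, -0.1, 0.3, 0.6])
print(result)
```

Comparing the two slopes gives a simple indicator: a steeper positive slope suggests a positivity bias, and a steeper negative slope suggests a negativity bias.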

In our sensitivity analysis, we varied key parameters: setting alpha to 0.3, 0.5, and 0.7; tolerance to 0.001, 0.005, and 0.0001; and temperature to 0.7, 1, and 1.5, to evaluate their impact on sentiment changes. The outcome, depicted in Figure 7 for ten random candidates, provides insight into negativity and positivity biases through the slopes of the OLS regressions. Our findings on this variation of model parameters show a modest positivity bias, evidenced by the positive slope being approximately 29% steeper than its negative counterpart. A slight positivity or negativity bias trend persisted across varied parameter settings, with some scenarios, notably alpha = 0.3, tolerance = 0.005, and temperature = 1.5, showing a more pronounced positivity bias with a slope more than twice as steep on the positive side than on the negative side.

The absence of saliency bias was noted in all experiments, as indicated by slopes remaining below 1. Linear regression was determined as the most suitable model based on our evaluation of the 𝑅 2 values. Notably in the shown experiment, agents displayed a tendency towards expressing stronger negative sentiments, with the most negative reaching -0.76, compared to a maximum positive sentiment of 0.62. This inclination towards stronger negative expressions was marked in most scenarios. Additionally, the alpha parameter was observed to significantly influence sentiment ranges, with lower alpha values yielding more constrained ranges.

For future studies, we aim to extend our examination of the cognitive bias to larger candidate sample sizes. This expansion will enable us to deepen our understanding of how parameter tuning influences cognitive biases and decision-making processes within our framework.

Collective decision-making

The decision-making data reveals diverse agent preferences, as evidenced by the variation in candidate ranks across Borda Count, Tier, and Conviction. We use the average sentiment score, $S_O^i(t)$, from Equation 4, where $t$ is the last round, as the basis for collective decision-making. While some candidates consistently rank higher or lower, suggesting a consensus on their suitability, discrepancies in ranks among agents could reflect unique valuations of candidate qualities.

Table 4 shows the overall sentiment scores. The CFO shows the highest sentiment score, of 0.46 towards Melissa Baldwin, indicating a strong positive inclination. The sentiment volatility of the agents, as shown in Table 7, was mostly moderate, indicating strong conviction in their opinions. However, there were instances of high volatility, such as the CFO's sentiment towards Kimberly Carr and Mikayla Garrison, and the VPE's sentiment towards Kimberly, Joshua, and James. The RPM's sentiment was volatile towards Mikayla and Melissa Baldwin.

The agents' conviction in their opinions is calculated by dividing the final sentiment by the volatility, with higher values indicating stronger intuition about a candidate's suitability for the role. The Gut Feeling Rank for each candidate is a revised ranking that takes into account an agent's conviction in its own sentiment. In Table 9, the Gut Feeling of the RPM toward Joshua is still to rank him in first place. But the CFO revises its ranking of Kimberly, from 9th place to 8th place. The more generous ranking can be interpreted as a result of the "acknowledgement" of the RPM agent that it is not sure of its opinion toward Kimberly.


Conclusion

In this paper, we introduce Sentimental Agents, LLM-based agents that generate opinions for collective decision-making within conversational settings. Our proposed framework integrates a non-Bayesian updating mechanism to track sentiment volatility and opinion evolution. In a simulated HR recruiting scenario, we assess these agents' decision-making abilities, noting their diverse opinions and preference shifts over multiple rounds.

The findings suggest model parameters, such as alpha and tolerance, significantly influence sentiment expression and thus cognitive bias within the system. This research offers a foundation for advanced tool development applicable to domains such as HR recruiting, medical diagnostics, or education.

Table 9: Gut feeling for each candidate (ranking of candidates that combines both the sentiment score and the conviction an agent has in this sentiment).

Figure 1 :1Figure 1: The Sentimental Agents framework consists of 7 modules: The Brief, Agent Initialization, Opinion Dynamics, Conversation, Decision, Cognitive Bias and Evaluation Modules.

Figure 3: An instance of an agent's Mental Model of Self in a simulated HR environment. In this case, the agent took a Job Title as input, and generated a Description, Priorities, and Evaluation Criteria for a given Job Description.

for each round $t$ do
    for each agent $m_i \in \mathcal{M}$ do
        $S_O^i(t) = \alpha \cdot S_O(m_i, t) + (1 - \alpha) \cdot S_O^i(t-1)$    (Equation 4)
        $\Delta S_O^i(t) = |S_O^i(t) - S_O^i(t-1)|$
    if all $\Delta S_O^i(t) < \text{threshold}$ for each $m_i \in \mathcal{M}$ or $t = \text{max\_rounds}$ then
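The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `update_opinions`, `run_rounds`, and `sentiment_fn` are hypothetical, the action after "then" is missing from the listing and is assumed here to be "stop the conversation", and the default values (alpha = 0.5, threshold = 0.00001, five rounds) are borrowed from the captions of Figures 4 and 7 on the assumption that "tolerance" there plays the role of the threshold.

```python
def update_opinions(previous, observed, alpha):
    """Weighted update: S_i(t) = alpha * S_O(m_i, t) + (1 - alpha) * S_i(t-1)."""
    return [alpha * o + (1 - alpha) * p for o, p in zip(observed, previous)]


def run_rounds(initial, sentiment_fn, alpha=0.5, threshold=1e-5, max_rounds=5):
    """Run the update until every agent's change is below threshold or
    max_rounds is reached. Returns the list of opinion vectors per round.

    sentiment_fn(i, t) is a hypothetical callback returning the sentiment
    score of agent i's statement in round t.
    """
    opinions = list(initial)
    history = [opinions]
    for t in range(1, max_rounds + 1):
        observed = [sentiment_fn(i, t) for i in range(len(opinions))]
        updated = update_opinions(opinions, observed, alpha)
        # Delta S_i(t) = |S_i(t) - S_i(t-1)| for each agent
        deltas = [abs(u - p) for u, p in zip(updated, opinions)]
        opinions = updated
        history.append(opinions)
        if all(d < threshold for d in deltas):
            break  # assumed stopping action; not stated in the listing
    return history


if __name__ == "__main__":
    # Two agents whose statements always carry a sentiment of 0.4.
    for round_index, state in enumerate(run_rounds([0.0, 0.8], lambda i, t: 0.4)):
        print(round_index, [round(s, 4) for s in state])
```

With a larger alpha the running opinion follows each new statement more closely; with a smaller alpha it retains more of the previous rounds, so the per-round change shrinks more slowly.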

Figure 4: Sentiment change, and corresponding Opinion Change in conversations for two different candidates. Each conversation stops after five rounds.

Figure 6: The nuance score for each candidate, shown as the number of unique words used across all topics.

Figure 7: The results of the cognitive bias testing using a sample of ten candidates and model parameters of alpha = 0.5, tolerance = 0.00001, and temperature = 1.5.

Table 1: A comparison of different language model-based multi-agent frameworks.

Ordered; Dynamic Memory; No; Opinion classifier
Sentimental Agents: Yes; Ordered; Opinion logs; Yes; Cognitive bias

Table 4: Final Sentiment Scores for each candidate

Candidate           CFO     VPE     RPM
Kimberly Carr       -0.51   -0.13    0.26
Melissa Morgan      -0.74   -0.21   -0.46
Mikayla Garrison     0.69    0.38    0.59
Emily Marshall       0.29   -0.29    0.05
Justin Davis        -0.01   -0.12    0.37
Tamara Brown         0.11    0.36    0.20
Taylor Mahoney      -0.49   -0.55   -0.41
Joshua Alvarado      0.28    0.58    0.25
Melissa Baldwin      0.46    0.34    0.17
James Wallace       -0.30    0.11   -0.30

Table 5: Rank of each candidate, including the final Rank taking into account each agent's individual rankings (calculated through Borda count)

Candidate           Borda Count
Kimberly Carr       9
Melissa Morgan      10
Mikayla Garrison    2
Emily Marshall      3
Justin Davis        5
Tamara Brown        6
Taylor Mahoney      8
Joshua Alvarado     4
Melissa Baldwin     1
James Wallace       7

Per-agent rank columns (CFO, VPE, RPM) as extracted, with row and column alignment not recoverable: 5 6 9 8 4 7 10 3 1 2; 3 10 4 6 2 7 9 1 5 8; 4 2 7 4 8 3 1 9 10 4

Table 3: Candidate Ranking Metrics

Candidate   Borda Count   Tiered   Gut-Feeling
Kimberly    9             3        10
Melissa     10            3        8
Mikayla     2             2        1
Emily       3             2        3
Justin      5             2        5
Tamara      6             2        6
Taylor      8             3        7
Joshua      4             2        4
Melissa     1             2        2
James       7             2        9

In contrast, the CFO's lowest sentiment score is -0.74 towards Melissa Morgan, signaling a significant negative view. Similarly, the VPE aligns with the CFO in favoring Melissa Baldwin with the highest score of 0.34, but diverges in its lowest sentiment, which is directed towards Taylor Mahoney with a score of -0.55. The RPM, on the other hand, exhibits the most positive sentiment towards Mikayla Garrison with a score of 0.59, while sharing the CFO's negative sentiment towards Melissa Morgan, albeit at a less intense level of -0.46.

The sentiment scores from the conversations directly influence the ranking of candidates as shown in Table 5. Applying the Borda Count method to the combined rankings yields a collective decision. Although individual agents might rank candidates differently based on their interactions, the aggregated results provide a more comprehensive assessment. This approach demonstrates how sentiment analysis combined with a voting system could inform hiring decisions in a multi-agent setting.

The intensity of an agent's final sentiment score determines the valence score. In Table 6, this occurred only three times: the CFO attributed a negative valence to two candidates, while the VPE attributed a negative valence to one candidate. Consequently, no candidate was classified as Tier 1, with most classified as Tier 2, except for the three candidates with negative valence, who were classified as Tier 3.

Table 6: Valence for each candidate

Candidate           CFO   VPE   RPM
Kimberly Carr       -1    0     0
Melissa Morgan      -1    0     0
Mikayla Garrison     0    0     0
Emily Marshall       0    0     0
Justin Davis         0    0     0
Tamara Brown         0    0     0
Taylor Mahoney       0    -1    0
Joshua Alvarado      0    0     0
Melissa Baldwin      0    0     0
James Wallace        0    0     0
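To make the aggregation step concrete, the sketch below turns each agent's sentiment scores into a ranking and combines the rankings with a Borda count. It is only an illustration: the text does not say which Borda variant or tie-breaking rule was used, so the scoring (a candidate ranked r out of n receives n - r points) and the alphabetical tie-break are assumptions, and the candidates A, B, C with their scores are made-up toy data rather than values from Table 4.

```python
def ranks_from_scores(scores):
    """Rank candidates 1..n by descending score (ties broken by name)."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def borda_count(rankings):
    """Sum Borda points over all rankings: rank r out of n earns n - r points."""
    n = len(rankings[0])
    totals = {name: 0 for name in rankings[0]}
    for ranking in rankings:
        for name, rank in ranking.items():
            totals[name] += n - rank
    return totals


# Toy sentiment scores for three hypothetical agents and candidates.
agent_scores = [
    {"A": 0.5, "B": 0.1, "C": -0.3},
    {"A": 0.2, "B": 0.4, "C": -0.1},
    {"A": 0.3, "B": -0.2, "C": 0.1},
]

rankings = [ranks_from_scores(scores) for scores in agent_scores]
totals = borda_count(rankings)
collective_order = sorted(totals, key=lambda name: (-totals[name], name))

if __name__ == "__main__":
    print(totals)
    print(collective_order)
```

In this toy case the agents disagree on the top candidate, yet the summed points still produce a single collective order, which is the behavior the paragraph on aggregated results describes.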

Table 7: Sentiment Volatility for each candidate

Candidate           CFO     VPE     RPM
Kimberly Carr        0.58    0.64    0.40
Melissa Morgan       0.16    0.32    0.47
Mikayla Garrison     0.31   -0.47    0.17
Emily Marshall       0.17    0.42    0.21
Justin Davis         0.49    0.16    0.34
Tamara Brown        -0.06   -0.27   -0.16
Taylor Mahoney       0.13    0.26    0.17
Joshua Alvarado      0.09    0.09    0.42
Melissa Baldwin      0.14    0.46    0.61
James Wallace        0.39    0.54    0.39

Table 8: Conviction for each candidate (the Sentiment Score given by agents to a candidate, taking into account the Sentiment Volatility during a conversation about this candidate)

Candidate           CFO     VPE     RPM
Kimberly Carr       -0.30   -0.08    0.11
Melissa Morgan      -0.12   -0.07   -0.22
Mikayla Garrison     0.21   -0.18    0.10
Emily Marshall       0.05   -0.12    0.01
Justin Davis         0.00   -0.02    0.13
Tamara Brown        -0.01   -0.10   -0.03
Taylor Mahoney      -0.06   -0.14   -0.07
Joshua Alvarado      0.02    0.05    0.10
Melissa Baldwin      0.06    0.16    0.10
James Wallace       -0.12    0.06   -0.12
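As a worked check of how the three tables relate, the sketch below takes three rows copied from Tables 4, 7, and 8 and compares the tabulated conviction against two candidate combinations of sentiment and volatility. The text describes conviction as the final sentiment divided by the volatility; for the rows used here, the printed values appear to be matched much more closely by the product of the two. This is an observation about the numbers as they appear in the tables, not a statement about the authors' implementation, and the helper name `max_abs_error` is hypothetical.

```python
# Rows copied from Tables 4, 7 and 8 (columns: CFO, VPE, RPM).
SENTIMENT = {
    "Melissa Morgan": (-0.74, -0.21, -0.46),
    "Emily Marshall": (0.29, -0.29, 0.05),
    "Melissa Baldwin": (0.46, 0.34, 0.17),
}
VOLATILITY = {
    "Melissa Morgan": (0.16, 0.32, 0.47),
    "Emily Marshall": (0.17, 0.42, 0.21),
    "Melissa Baldwin": (0.14, 0.46, 0.61),
}
CONVICTION = {
    "Melissa Morgan": (-0.12, -0.07, -0.22),
    "Emily Marshall": (0.05, -0.12, 0.01),
    "Melissa Baldwin": (0.06, 0.16, 0.10),
}


def max_abs_error(combine):
    """Largest gap between combine(sentiment, volatility) and the tabulated conviction."""
    worst = 0.0
    for name, scores in SENTIMENT.items():
        for s, v, c in zip(scores, VOLATILITY[name], CONVICTION[name]):
            worst = max(worst, abs(combine(s, v) - c))
    return worst


product_error = max_abs_error(lambda s, v: s * v)
quotient_error = max_abs_error(lambda s, v: s / v)

if __name__ == "__main__":
    print(f"max error, sentiment * volatility: {product_error:.3f}")
    print(f"max error, sentiment / volatility: {quotient_error:.3f}")
```

Because the table entries are rounded to two decimals, a small residual error is expected for whichever combination was actually used.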
