<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Sentimental Agents: Exploring Deliberation, Cognitive Biases, and Decision-making in LLM-based Multiagent Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elizabeth A. Ondula</string-name>
          <email>ondula@usc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Orner</string-name>
          <email>daniele@braveventurelabs.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nick Mumero Mwangi</string-name>
          <email>nick@braveventurelabs.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Casandra Rusti</string-name>
          <email>rusti@usc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Brave Venture Labs</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Southern California</institution>
          ,
          <addr-line>Los Angeles</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <conference>
        <conf-name>30th ACM KDD Conference</conf-name>
      </conference>
      <abstract>
        <p>How does sentiment affect deliberative opinion dynamics in multi-agent systems using Large Language Models (LLMs)? In this paper, we introduce Sentimental Agents, a framework designed to study collaborative decision-making in a society of agents, each equipped with a distinct Mental Model of Self. We propose a method to integrate sentiment analysis and a non-Bayesian update mechanism, to analyze and interpret agents' beliefs and interactions systematically. This method allows us to observe the volatility of the sentiment associated with different agent statements, as well as the change in opinion throughout the agents' conversation. We further use it to model and compare collaborative decision-making approaches. We situate these agents in a simulated Human Resource recruiting environment as a case study to evaluate a candidate's fit for a role. We present a set of metrics to assess the quality of the agents' output. Finally, we explore cognitive biases in the agents' individual and collective opinion formation, a fundamental step to enhance decision-making capabilities and mitigate distortions in the system and the agents' collective reasoning.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-Agent Systems</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Cognitive Biases</kwd>
        <kwd>Decision-Making</kwd>
        <kwd>Opinion Dynamics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR Workshop Proceedings (ceur-ws.org)</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Multi-agent systems (MAS), composed of interactive
agents, have been pivotal in modeling social phenomena,
decision-making processes, and collaborative tasks. Large
Language Models (LLMs) such as GPT-4 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] have opened
new possibilities for exploring complex social dynamics
through the simulation of linguistic interactions among
agents. These models can provide the necessary
capabilities for simulating communication scenarios. Integrating
LLMs into MAS facilitates the study of conversations and
interaction patterns in a more detailed manner.
      </p>
      <p>
        LLMs have demonstrated exceptional performance in
sentiment analysis tasks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, the effect
of sentiment on deliberative opinion dynamics within
an artificial society of agents is a domain that has not
yet been fully explored. Traditional agent models may
not adequately account for the influence of behavioral
states like sentiment and cognitive biases on the
decision-making process. Our work adopts a nuanced approach to
understanding how the output of LLM agents influences
one another within these frameworks.
      </p>
      <p>https://eondula.github.io/ (E. A. Ondula); https://bravelabs.ai/
(D. Orner); https://bravelabs.ai/ (N. M. Mwangi);
https://www.linkedin.com/in/casandrarusti/ (C. Rusti)</p>
      <p>ORCID: 0000-0003-0403-0306 (E. A. Ondula); 0009-0005-1264-1985
(D. Orner); 0009-0004-6654-2635 (N. M. Mwangi)</p>
      <p>We introduce Sentimental Agents, a framework
designed to study and analyze collaborative decision
processes. These agents are not only equipped with language
capabilities but also possess a unique Mental Model of
Self. This allows them to process and exhibit behaviors
that can offer a comprehensive view of how opinions are
formed and evolve in a multi-agent setting.</p>
      <p>Our system is designed primarily to observe
deliberation. We do not currently include objectives, reward
functions, utility metrics or payoffs in our model. The focus
is on the natural evolution of interactions among agents
without imposing external incentives or goals. Our study
concentrates on non-strategic interactions. Unlike
strategic agents, which model the behavior of others and act
based on these predictions, our non-strategic agents do
not possess such models. This distinction is crucial as it
means our agents are not engaging in behaviors such as
scheming or deceiving to achieve a specific objective. If
LLM-based multi-agent systems are ultimately to be used
to support decision-making, it is critical to understand
and explain how their decisions are made. This is
especially true in the hypothetical case of such systems being
designed to evaluate, rank or recommend humans.</p>
      <p>
        Additionally, this work introduces metrics for assessing
the quality of conversation and decision-making in
language model-based multi-agent systems. These metrics,
namely nuance, platitudinal score, drift, and defensibility,
offer a toolkit for evaluating the effectiveness of such
systems in diverse scenarios. Furthermore, we
evaluate cognitive biases including negativity, positivity, and
saliency biases. This assessment offers valuable insights
into the cognitive influences and tendencies within
multi-agent decision-making processes. Finally, the framework
is applied in a simulated Human Resource recruiting
environment, serving as a practical case study. This
application not only validates the theoretical model but
also highlights the practical potential of the approach in
real-world settings.
      </p>
      <p>
        Existing approaches do not systematically analyze the
opinions and interactions of these agents, and the
potential correlation between the two. To remediate this, we
make the following key contributions:
• We develop a framework, Sentimental Agents [],
to explore and study collective decision-making
processes in a society of agents.
• We propose using sentiment analysis as a method
to quantify content generated by LLM-based
agents for evaluation and recommendation tasks.
• We propose a method to apply a non-Bayesian
model for opinion dynamics within a multi-agent
system. This offers a perspective on how
opinions are formed and altered in a sentiment-driven
environment.
      </p>
    </sec>
    <sec id="sec-rw">
      <title>2. Related Works</title>
      <sec id="sec-rw-1">
        <title>2.1. Multi-Agent Collaboration</title>
        <p>
          In the study of multi-agent systems, understanding how
agents collaborate to achieve collective objectives is
essential. One interesting approach [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] examines
the use of LLMs in multi-agent settings with a focus on
"Theory of Mind" (ToM), which is the ability of an agent
to understand and predict the mental states and
intentions of others. Although crucial for collaboration, our
focus looks more at how agents make decisions rather
than understanding others' mental states. Other
studies, like those of [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], look at how agents can debate and
make collective decisions using a method known as
gradual semantics, where agents exchange arguments and
progressively update their opinions to reach a shared
decision. Our approach is different in that it explains the
agent interactions and decision processes leveraging a
mental model of self and sentiment tracking. Further,
our agents don't have access to other agents' memories.
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] explores how agents coordinate in complex tasks that
necessitate both working together on the same task
(cooperation) and dividing the task into smaller parts to be
done individually (divide-and-conquer). This study
highlights the need for flexible strategies to manage tasks that
require both joint and individual efforts, differing from
our work which doesn't focus on specific task
coordination but rather on general deliberations on various topics.
        </p>
        <p>
          Similarly, [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] demonstrate the potential of collaborative
mechanisms with LLMs in enhancing social interactions
among agents, providing valuable insights into how these
technologies can foster collaborative intelligence within
multi-agent settings.
        </p>
      </sec>
      <sec id="sec-2-1">
        <title>2.2. LLM-based Multi-Agent Frameworks</title>
        <p>
          An LLM-based agent is defined as an AI system
comprising three core components: the brain, perception, and
action modules [7]. The brain module stores knowledge
and memories, facilitating information processing and
decision-making, essential for reasoning and handling
new tasks. The perception module extends the agent's
sensory capabilities to include textual, auditory, and
visual modalities. This enhances its understanding of the
environment. The action module enables the agent to
perform physical tasks and interact with its environment.
        </p>
        <p>In terms of operating mechanism, the agents use natural
language for communication, with the brain processing
information from the perception module to form
strategies and make decisions. In our work, we introduce the
concept of a Mental Model of Self (MMS). This concept has
been discussed in social psychology [8]. It refers to an
integrated theory and understanding that an agent forms
to organize and make sense of one's self-knowledge,
experiences and memories into broader principles that can
guide anticipation of future behaviors and consequences.</p>
        <p>In our implementation, it serves an important
organizational function in making sense of self-knowledge. We
summarize and show differences between the Sentimental
Agents framework and prior works in Table 1.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Non-strategic Multi-Agent Systems</title>
        <p>Opinion dynamics has been extensively explored for
over six decades, predominantly in the fields of
sociology and psychology. It delves into the mechanisms and
principles that dictate the formation and alteration of
individual opinions under the influence of others. This
involves examining a range of models and frameworks
to comprehend collective behaviors and the process of
consensus formation [9]. Our work focuses on a
non-strategic model within opinion dynamics, meaning the
model does not incorporate game theory principles, nor
does it involve agents optimizing specific utilities.</p>
        <p>Non-Bayesian updating, in this context, signifies a
process wherein opinions are modified not based on a
factual or probabilistic framework that converts prior
probabilities into posterior probabilities. Instead, this
approach entails agents updating their opinions
influenced by the views of others, without basing these on an
unknown state of nature. The updating mechanism in
such models can be either synchronous, where all agents
update their opinions simultaneously, or asynchronous,
where updates occur at different times. A recent survey
categorizes and discusses various models prevalent in
existing literature [10].</p>
        <p>We further use Sentiment Analysis to investigate
opinions which manifest as either positive or negative
[11]. Studies have shown that generative models, such as
Large Language Models (LLMs), are capable of producing
text, which can include opinions with specific sentiments,
depending on their application [12].</p>
      </sec>
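<p>The synchronous/asynchronous distinction above can be made concrete with a toy averaging model. This is our own simplified illustration, not a model taken from the cited survey: each agent moves its opinion toward the mean of the others' opinions, either from a shared snapshot (synchronous) or using the latest values as they change (asynchronous).</p>
<preformat>
```python
# Toy non-Bayesian averaging update. Synchronous: all agents update
# from the same snapshot of opinions; asynchronous: each update sees
# the values already modified earlier in the same round.

def sync_round(opinions, weight=0.5):
    snapshot = list(opinions)
    n = len(snapshot)
    return [
        (1 - weight) * snapshot[i]
        + weight * sum(snapshot[:i] + snapshot[i + 1:]) / (n - 1)
        for i in range(n)
    ]

def async_round(opinions, weight=0.5):
    ops = list(opinions)
    for i in range(len(ops)):
        others = ops[:i] + ops[i + 1:]
        ops[i] = (1 - weight) * ops[i] + weight * sum(others) / len(others)
    return ops

print(sync_round([1.0, 0.0, -1.0]))   # symmetric contraction toward 0
print(async_round([1.0, 0.0, -1.0]))  # order of updates now matters
```
</preformat>
<p>Starting from the same opinions, the two schedules yield different trajectories, which is why surveys of non-Bayesian models treat them as distinct mechanisms.</p>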
      <sec id="sec-2-3">
        <title>2.4. Evaluating LLM-based Systems</title>
        <p>
          Evaluation for LLMs is emerging as a discipline to
assess the performance of different AI systems.
Currently, for LLMs, there is no single benchmark or
protocol that emerges as universally superior. This reflects the
diversity of tasks and model capabilities. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] provides
an exhaustive summary and discussion based on
existing works. This work covers evaluation tasks, methods
and benchmarks that are crucial for assessing the
performance of LLMs. In our work, we adopt a nuanced
approach to evaluation. We define specific metrics to
assess the conversation quality. These metrics include
nuance, platitudinal score, drift and defensibility scores,
which are detailed in Section 5.6.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminaries</title>
      <sec id="sec-3-1">
        <title>3.1. Conversation protocols</title>
        <p>Consider a conversational simulation system with a set
of agents denoted as M = {m_1, m_2, …, m_n}. Each agent
m_i ∈ M is initialized with a Mental Model of Self (MMS)
and a memory component for storing an opinion log. In
this system, the engagement among agents in each round
is ordered with equal participation.</p>
        <p>Definition 1. An Argument (a) is a component of an
opinion that contributes to its overall sentiment. For each
argument a, a sentiment value s_a is assigned by the
sentiment mapping function s, which maps the argument to a
spectrum of sentiment values (positive, negative, neutral,
and their intensities):
s_a = s(a).   (1)</p>
        <p>Definition 2. An Opinion O(m_i, r) is the opinion of
agent m_i in a given round r; it is a set of arguments a.
The sentiment of an opinion S_O is the average of the
sentiment values s_a of all its arguments:
S_O(m_i, r) = (1 / |O(m_i, r)|) Σ_{a ∈ O(m_i, r)} s_a.   (2)</p>
        <p>After each round, the sentiment S_i for each agent m_i
is updated to reflect the sentiment of the newly formed
opinion. This process considers the sentiment values s_a
of the arguments within the opinion O. Here
S̄_i(r) = S_O(m_i, r)   (3)
represents the average sentiment of all the arguments
expressed by agent m_i at round r. The sentiment update is
executed using a non-Bayesian method, mathematically
represented by:
S_i(r) = (1 − α) S_i(r − 1) + α S̄_i(r).   (4)
The parameter α is a weighting factor that determines the
influence of the new opinion's average sentiment on the
agent's updated sentiment.</p>
        <p>The change in sentiment ΔS_i(r) for agent m_i is then
calculated as the absolute difference between the updated
sentiment value S_i(r) at round r and the agent's previous
sentiment value S_i(r − 1) at round r − 1:
ΔS_i(r) = |S_i(r) − S_i(r − 1)|.   (5)
When the conversation ends, we take the total sentiment.</p>
        <p>Table 1 (a comparison of different language model-based
multi-agent frameworks [13] [14] [15] [16] [17] [18] [19] [20])
contrasts prior systems with Sentimental Agents along
dimensions such as memory type (belief stores,
store/retrieve, chat history, dynamic memory, opinion logs),
the presence of an internal critic, specialized roles,
user-driven interaction, rationale analysis, and the
treatment of biases (e.g., confirmation bias, fact-checking,
user preference, credibility checks, opinion classification,
cognitive bias).</p>
      </sec>
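<p>As an illustrative sketch, the per-round bookkeeping of Equations (1)–(5) can be written in a few lines of Python. This is our own minimal rendering of the update rule, not the framework's code; the argument sentiment values are assumed to be supplied by an external sentiment analyzer.</p>
<preformat>
```python
# Sketch of the per-round sentiment bookkeeping (Equations 1-5).
# Argument sentiments s_a in [-1, 1] are assumed to come from an
# external sentiment analyzer and are passed in directly here.

def opinion_sentiment(argument_sentiments):
    """Average sentiment of an opinion's arguments (Equation 2)."""
    return sum(argument_sentiments) / len(argument_sentiments)

def update_sentiment(prev, new_avg, alpha):
    """Non-Bayesian convex-combination update (Equation 4)."""
    return (1 - alpha) * prev + alpha * new_avg

def sentiment_change(curr, prev):
    """Absolute per-round change (Equation 5)."""
    return abs(curr - prev)

s_prev = 0.0                                   # sentiment at round r-1
s_bar = opinion_sentiment([0.8, -0.2, 0.6])    # new opinion's average, 0.4
s_curr = update_sentiment(s_prev, s_bar, alpha=0.5)   # 0.2
delta = sentiment_change(s_curr, s_prev)              # 0.2
```
</preformat>
<p>With α = 0.5 an agent moves halfway from its previous sentiment toward the average sentiment of its newly formed opinion; smaller α values make opinions stickier.</p>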
      <sec id="sec-3-2">
        <title>3.2. Collective decision protocols</title>
        <p>When the conversation ends, we have each agent's final
sentiment score for every item; for the gut-feeling protocol
we additionally use the change in sentiment across rounds.</p>
        <p>Definition 3. Borda Count Protocol: A method to
collectively rank a list of items, given each individual's order
of preference. Given k, the number of items, each agent
m_i ranks these items. The point assignment for an item j
by agent m_i is P_{i,j}, with the top-ranked item receiving k
points and the last receiving 1 point. The total points for
each item j are calculated as P_j = Σ_{i=1}^{|M|} P_{i,j}, and
items are ranked in descending order of their total points P_j.</p>
        <p>Definition 4. Tiered List Protocol: A method to
collectively classify a list of items in 3 tiers, given the items that
each individual can't accept, and the items they like the
most. The valence V_j for each item j is determined from
the sentiment of opinion S_j: for S_j &lt; −0.5, V_j = −1; for
−0.5 ≤ S_j ≤ 0.5, V_j = 0; and for S_j &gt; 0.5, V_j = 1. Items
are classified into three tiers according to V_j: Tier 1 for
V_j = 1, Tier 2 for V_j = 0, and Tier 3 for V_j = −1.</p>
        <p>Definition 5. Gut-feeling List Protocol: A method to
collectively rank a list of items based on the confidence of
individuals' feelings toward each item. The volatility
v_{i,j} of agent m_i's sentiment towards item j over several
rounds is calculated. Conviction c_{i,j} is derived as a
function of both the volatility v_{i,j} and the final sentiment
score S_{i,j} for item j. The Gut-feeling list is then generated
using a Borda count based on c_{i,j} for each item across all
agents: the total conviction points C_j = Σ_{i=1}^{|M|} c_{i,j} are
computed, and items are ranked in descending order of C_j.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Applying the Framework</title>
      <p>Our framework is applied to a simulated environment
inspired by Human Resource recruiting to evaluate the
effectiveness of Sentimental Agents. These agents are
designed to generate opinions reflecting their unique
expertise, contributing to collective decision-making. The
simulation explores opinion formation and decision-making
processes within an LLM-based multi-agent setting,
mirroring real-world HR recruitment where employers
assess candidates through discussions with various experts.
In this context, LLM-based agents are expected to engage
in conversation and form diverse opinions that influence
their decision-making in a simulated recruiting scenario.</p>
      <sec id="sec-4-1">
        <title>4.1. Configuration</title>
        <p>In the HR recruiting simulation, advisor agents analyze
candidates' CVs and engage in discussions to provide
opinions about each candidate. These agents, with
expertise in roles like Chief Financial Officer (CFO), Vice
President of Engineering, and Recycling Plant Manager,
evaluate profiles and generate text reports. They also
score candidates and, through collective decision-making
protocols like the Borda Count, rank candidates or select
the top performers.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Dataset</title>
          <p>We sourced our dataset from the study conducted by
[21]. This dataset is a collection of resumes represented
in a multi-label format. To facilitate easy access and
integration of this dataset into our framework, we have
developed a script that automates the process of
downloading and parsing the data.</p>
        </sec>
      </sec>
    </sec>
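<p>The Borda Count and Tiered List protocols used here (Definitions 3 and 4) can be sketched directly in Python. This is a minimal illustration with hypothetical item names; the helper names are ours, not the framework's.</p>
<preformat>
```python
# Borda Count (Definition 3): with k items, the top-ranked item in an
# agent's ranking earns k points and the last earns 1; items are then
# ordered by total points across all agents.
def borda(rankings):
    k = len(rankings[0])
    points = {}
    for ranking in rankings:
        for pos, item in enumerate(ranking):
            points[item] = points.get(item, 0) + (k - pos)
    return sorted(points, key=points.get, reverse=True)

# Tiered List (Definition 4): valence from thresholded sentiment.
def valence(s):
    if s > 0.5:
        return 1       # Tier 1
    if s >= -0.5:
        return 0       # Tier 2
    return -1          # Tier 3

rankings = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
print(borda(rankings))                         # collective order
print([valence(s) for s in (0.7, 0.1, -0.8)])  # tier valences
```
</preformat>
<p>With these three rankings, item A collects 8 points, B collects 6, and C collects 4, so the collective order is A, B, C.</p>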
    <sec id="sec-5">
      <title>5. Sentimental Agents Framework</title>
      <p>The system design, as shown in Figure 1, consists of 7
modules. We describe each of them here.</p>
      <sec id="sec-5-1">
        <title>5.1. Brief Module</title>
        <p>The module provides a configuration interface for
system initialization with four components: input type,
output type, task type, and context. It handles
single and multiple item formats for input and output and
requires user-defined context specifying task object
and subject, with optional Knowledge base integration.
Predefined rules in the module automatically associate
Input, Output, and Task Types. The logic enforces
specific task types Evaluate, Score, Classify for single-item
inputs and broader tasks for multi-item inputs. For rank
tasks, the output is structured as a list to match task
requirements. This design ensures alignment between
input/output formats and system functionality.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Agent Initialization</title>
        <p>The Agent Initialization module includes two main
elements: Mental Model of Self (MMS) and Memory with
qualitative and quantitative opinion logs. It configures
agent interaction types for opinion formation as dynamic
or independent. The module requires user input to set the
number of agents and their expertise, which informs the
creation of detailed agent profiles, including priorities,
objectives, and evaluation criteria. Figure 3 shows an
instance of an MMS. Key parameters include the tolerance
level, affecting opinion change propensity, and the drift
metric, which tracks MMS variability. Strategies for
maintaining agent consistency involve controlled character
prompts and setting MMS prompt temperature. The
opinion formation, generated via boolean input, influences
the nature of agents' decision-making processes. Each
agent's opinion log is stored in a central memory system,
ensuring decision-making is based on comprehensive
and transparent data. Figure 2 shows the agent
initialization prompt.</p>
      </sec>
      <sec id="sec-5-od">
        <title>5.3. Opinion Dynamics Module</title>
        <p>This module coordinates agent conversations and
decision-making, consisting of conversation and
decision-making protocols. It focuses on: defining the number of
agents, engagement type and stopping mechanism. The
current implementation employs ordered engagement
with equal participation. For the stop mechanism, the
module uses a non-strategic approach, differentiating
from strategic interactions. In this non-strategic context,
conversations conclude based on non-Bayesian updating
as shown in Algorithm 1, where they end once agents'
opinions reach stability. This contrasts with strategic
interactions, which involve different mechanisms like
rewards or objectives.</p>
      </sec>
      <sec id="sec-5-conv">
        <title>5.4. Conversation Module</title>
        <p>This module analyzes agents' statements in
conversations, comprising four components: Argumentation,
which breaks down statements into arguments for
qualitative logging; Sentiment Analysis, which evaluates and
quantitatively logs the sentiment of each argument;
Opinion Change, using non-Bayesian updating to monitor
sentiment shifts; and Conversation Trends, gauging
significant changes across rounds to infer opinion stabilization
and conversation conclusion.</p>
      </sec>
      <sec id="sec-5-dec">
        <title>5.5. Decision Module</title>
        <p>The Decision Making Protocol module is designed to
accommodate various decision-making protocols, including
Borda Count, Tiered List, and Gut Feeling List, as detailed
in Section 3.2.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.6. Evaluation and Cognitive Bias Modules</title>
        <p>This module evaluates the quality of conversations
through various metrics.</p>
        <p>• Nuance: Examines the diversity of themes and
perspectives, quantified by the number of
topics identified within individual statements or the
entire conversation.
• Platitudinal Score: Calculated using cosine
similarity, it measures the uniqueness of outcomes
in the conversation rounds, with higher scores
indicating less similarity between different runs.
• Drift : Assesses the stability of each agent’s
Mental Model of Self, monitoring the relevance of
results to the advisors’ profiles and checking for
consistency throughout the conversation.
• Defensibility: Evaluates the strength and
evidence backing of the agents’ arguments, ensuring
they are well-supported and referenceable.</p>
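<p>As a sketch of the platitudinal idea, cosine similarity between two run outcomes can be computed over bag-of-words vectors; this dependency-free illustration is our own simplification of the metric, with hypothetical example texts.</p>
<preformat>
```python
# Platitudinal-style score sketch: 1 minus the cosine similarity of
# bag-of-words vectors for two run outcomes, so a higher score means
# the runs were LESS similar (less platitudinal repetition).
import math
from collections import Counter

def cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb)

def platitudinal(run_a, run_b):
    return 1.0 - cosine(run_a, run_b)

print(platitudinal("strong fit for the role", "strong fit for the role"))  # 0.0
```
</preformat>
<p>Identical outcomes score 0.0; completely disjoint vocabularies would score 1.0. A production version would typically use embeddings rather than raw word counts.</p>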
        <p>In this research, we examine three cognitive biases:
negativity, positivity, and saliency. Negativity bias
might lead agents to give undue weight to adverse
opinions [22] [23], while positivity bias could result in an
overemphasis on favorable views [24] [25]. Saliency bias,
on the other hand, might cause agents to focus on the
most prominent or emotionally striking aspects of an
opinion, potentially overshadowing other relevant
information [26].</p>
        <sec id="sec-5-4-1">
          <title>Algorithm 1 Non-Bayesian Updating</title>
          <p>1: for each round r do
2:   if r &gt; 0 then
3:     for each agent m_i ∈ M do
4:       compute S_i(r) using Equation 4
5:       compute ΔS_i(r) = |S_i(r) − S_i(r−1)| using Equation 5
6:   if ΔS_i(r) &lt; threshold for each m_i ∈ M, or r = max_rounds, then
7:     set conversation_active to False</p>
        </sec>
      </sec>
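<p>The stopping loop of Algorithm 1 can be sketched as a plain Python skeleton. This is an illustrative skeleton of ours: the LLM opinion-generation step is stubbed out, and each round simply supplies the agents' new average argument sentiments.</p>
<preformat>
```python
# Sketch of the Algorithm 1 loop: each round applies the non-Bayesian
# update per agent; the conversation stops once every agent's change
# falls below a tolerance, or after a maximum number of rounds.

def run_conversation(new_avgs_per_round, alpha=0.5, tol=1e-5, max_rounds=10):
    sentiments = [0.0] * len(new_avgs_per_round[0])
    rounds_run = 0
    for round_avgs in new_avgs_per_round[:max_rounds]:
        rounds_run += 1
        deltas = []
        for i, s_bar in enumerate(round_avgs):
            updated = (1 - alpha) * sentiments[i] + alpha * s_bar
            deltas.append(abs(updated - sentiments[i]))
            sentiments[i] = updated
        if not any(d > tol for d in deltas):  # every agent stable: stop
            break
    return sentiments, rounds_run

# Two agents whose argument sentiment repeats each round: sentiments
# converge geometrically toward 0.4 and -0.2.
final, rounds = run_conversation([[0.4, -0.2]] * 10)
```
</preformat>
<p>In the real framework the per-round averages come from sentiment analysis of freshly generated agent statements, so stability reflects the conversation itself rather than a fixed input stream.</p>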
    </sec>
    <sec id="sec-6">
      <title>6. Experiments</title>
      <p>In this study, we aim to investigate the dynamics of
sentiment and opinion formation in an LLM-based
multi-agent system. We focus on understanding how agents'
opinions evolve through deliberation, and how sentiment
influences their decision-making processes. Our research
questions are as follows:
1. How do agents' opinions change as a result of
deliberating with each other, and can we quantify
these changes?
2. Do agents adopt each other's arguments during
the deliberation process, and can we observe this
in qualitative results?
3. Does the sentiment of an argument (valence,
arousal) affect its adoption by other agents?
4. Do agents exhibit cognitive biases in their opinion
formation, and how can we identify and mitigate
these biases?</p>
      <sec id="sec-6-1">
        <title>6.1. Experimental Setting</title>
        <p>In our experiments, we conducted the simulation with 3
agents and 10 candidates. We used the data set within the
simulation environment described in Section 4. For the
LLM, we used the gpt-3.5-turbo-0613 version of ChatGPT
[27]. For the results shown, the language model
parameters were set as alpha α = 0.5, tolerance = 0.00001, and
temperature = 1.5.</p>
        <p>6.2. Results</p>
        <p>6.2.1. Evaluation Metrics</p>
        <p>The non-Bayesian updating data from the simulation,
shown in Figure 4, reveals sentiment fluctuations among
agents. For instance, Figure 5a shows the VP of
Engineering exhibiting the most dramatic change, especially
in the final round. This volatility, captured by sentiment
and change metrics, highlights the dynamic nature of
opinion formation in multi-agent conversations and suggests
that agents' opinions evolve and respond to the unfolding
discourse, capturing the effectiveness of non-Bayesian
updates in emphasizing real-time perspective shifts.</p>
        <p>Drift scores. In Table 2 it is observed that the CFO
agent generally exhibits moderate drift, while the VP
of Engineering (VPE) and the Recycling Plant Manager
(RPM) show higher drift values, suggesting a more
dynamic adaptation of their MMS in response to the
conversation. This variability in drift signifies the agents'
differing levels of adaptability and potential reevaluation
of their initial stances.</p>
        <p>Table 2: Agent Drift Values for hypothetical candidates
Candidate          CFO      VPE      RPM
Kimberly Carr      0.6224   0.7556   0.5810
Melissa Morgan     0.4504   0.6308   0.7138
Emily Marshall     0.4678   0.7998   0.7390
Mikayla Garrison   0.3254   0.5878   0.5720
Justin Davis       0.3458   0.3638   0.3940
Tamara Brown       0.3842   0.6030   0.6574
Taylor Mahoney     0.3814   0.4794   0.4154
Joshua Alvarado    0.3756   0.5238   0.5788
Melissa Baldwin    0.4228   0.7988   0.5714
James Wallace      0.4240   0.6342   0.6926</p>
        <p>Platitudinal score. The inter-agent similarities
heatmap shown in Figure 5 reveals a contrast in
sentiment alignment among the agents. This divergence
contributes to an overall lower platitudinal score for this
specific run for the given candidate. Such diversity in
sentiment, as captured by the platitudinal metric,
underscores the variation in decision-making approaches
within the agent group, emphasizing the balance between
consensus and individual thought in the simulation
outcomes.</p>
        <p>Nuance Scores. We use Latent Dirichlet Allocation
(LDA) to identify topics. The text is preprocessed by
tokenization and removal of stop words and unwanted
words. A dictionary and corpus are constructed using
the Gensim library. The LDA model identifies 5 topics,
with the top 10 words per topic being most significant.
Figure 6 and ?? show the number of unique words per
topic and word clouds for each candidate, respectively.¹</p>
        <p>¹For brevity, we only show results for 5 candidates, but the
experiment was conducted with 10 candidates for the
platitudinal scores.</p>
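<p>The paper's nuance metric counts topics identified by Gensim's LDA; as a dependency-free stand-in for that pipeline, the underlying idea (thematic diversity after stop-word removal) can be illustrated with a simple distinct-content-word count. This proxy and its word lists are our own, not the paper's implementation.</p>
<preformat>
```python
# Simplified nuance proxy: number of distinct content words across
# statements after stop-word removal. The paper itself fits an LDA
# topic model with Gensim; this only stands in for the intuition.
STOP = {"the", "a", "an", "is", "are", "for", "and", "of", "to", "in"}

def nuance_proxy(statements):
    words = set()
    for s in statements:
        words.update(w for w in s.lower().split() if w not in STOP)
    return len(words)

statements = [
    "the candidate shows strong leadership and budgeting skills",
    "leadership is strong but budgeting experience is limited",
]
print(nuance_proxy(statements))  # 9 distinct content words
```
</preformat>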
        <sec id="sec-6-1-1">
          <title>2https://github.com/langchain-ai/langchain</title>
          <p>texts.</p>
          <p>In our sensitivity analysis, we varied key parameters:
setting alpha to 0.3, 0.5, and 0.7; tolerance to 0.001, 0.005,
and 0.0001; and temperature to 0.7, 1, and 1.5, to
evaluate their impact on sentiment changes. The outcome,
depicted in Figure 7 for ten random candidates, provides
insight into negativity and positivity biases through the
slopes of the OLS regressions. Our findings on this
variation of model parameters show a modest positivity bias,
evidenced by the positive slope being approximately 29%
steeper than its negative counterpart. A slight positivity
or negativity bias trend persisted across varied
parameter settings, with some scenarios, notably alpha = 0.3,
tolerance = 0.005, and temperature = 1.5, showing a more
pronounced positivity bias with a slope more than twice
as steep on the positive side than on the negative side.</p>
          <p>The absence of saliency bias was noted in all
experiments, as indicated by slopes remaining below 1. Linear
based on our evaluation of the  2 values. Notably in the
shown experiment, agents displayed a tendency towards
expressing stronger negative sentiments, with the most
negative reaching -0.76, compared to a maximum
posinegative expressions was marked in most scenarios.
Additionally, the alpha parameter was observed to
significantly influence sentiment ranges, with lower alpha
values yielding more constrained ranges.</p>
          <p>For future studies, we aim to extend our examination of
the cognitive bias to larger candidate sample sizes. This
expansion will enable us to deepen our understanding
of how parameter tuning influences cognitive biases and
decision-making processes within our framework.</p>
          <p>6.2.3. Collective decision-making</p>
          <p>The decision-making data reveals diverse agent
preferences, as evidenced by the variation in candidate ranks
across Borda Count, Tier, and Conviction. We use the
average sentiment score from equation 4, evaluated at
the last round, as the basis for collective decision-making.</p>
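<p>As a sketch, if the average sentiment of equation 4 is the running mean of an agent’s per-round scores (an assumption here, since the equation is not restated in this section), the collective-decision input is that mean evaluated at the final round:</p>

```python
def average_sentiment(scores):
    """Mean of an agent's per-round sentiment scores; evaluated
    over all rounds up to the last one, it is the input to the
    collective decision (sketch of equation 4)."""
    return sum(scores) / len(scores)

rounds = [0.1, 0.3, 0.5, 0.7]   # hypothetical per-round scores
basis = average_sentiment(rounds)
```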
          <p>While some candidates consistently rank higher or lower,
suggesting a consensus on their suitability, discrepancies
across the ranking methods point to differing assessments of
candidate qualities.</p>
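<p>The Borda Count aggregation used for the collective decision can be sketched as follows; the per-agent rankings here are illustrative, not the values from Table 5.</p>

```python
def borda(rankings):
    """Aggregate per-agent rankings (best first) into a collective
    ranking via the Borda Count: a candidate receives n-1 points
    for first place, n-2 for second, and so on."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for place, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (n - 1 - place)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from the three agents (CFO, VPE, RPM).
cfo = ["Baldwin", "Garrison", "Mahoney", "Morgan"]
vpe = ["Baldwin", "Garrison", "Morgan", "Mahoney"]
rpm = ["Garrison", "Baldwin", "Mahoney", "Morgan"]
collective = borda([cfo, vpe, rpm])
```

<p>Even though the RPM ranks Garrison first, the aggregated points still place Baldwin at the top, illustrating how the method smooths individual disagreements.</p>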
          <p>Table 4 shows the overall sentiment scores. The
CFO shows the highest sentiment score, 0.46, towards
Melissa Baldwin, indicating a strong positive inclination.</p>
          <p>6.2.2. Cognitive Bias Testing</p>
          <p>We hypothesize that agents’ updates in sentiment during
conversational rounds might be influenced by their peers’
positive, negative, or prominent opinions. To investigate
this, we chart each agent’s sentiment change, from the
second round onwards, against the recent sentiments
of the other agents. This analysis reveals the correlation
between an agent’s changing sentiment and the influence
of peer opinions.</p>
          <p>We apply Ordinary Least Squares (OLS) regression to
these charted sentiment changes.</p>
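<p>A minimal version of this regression is sketched below, assuming sentiment histories are available as per-round lists (the data layout is an assumption): each agent’s round-to-round change is regressed on the mean sentiment its peers expressed in the previous round, yielding the slope and R² discussed above.</p>

```python
import numpy as np

def peer_influence_regression(history):
    """history: dict mapping agent -> list of per-round sentiment
    scores. Regress each agent's round-to-round sentiment change
    on the mean sentiment of its peers in the previous round and
    return the OLS slope and R^2."""
    xs, ys = [], []
    agents = list(history)
    rounds = len(history[agents[0]])
    for a in agents:
        for t in range(1, rounds):
            peers = [history[b][t - 1] for b in agents if b != a]
            xs.append(np.mean(peers))
            ys.append(history[a][t] - history[a][t - 1])
    x, y = np.array(xs), np.array(ys)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    r2 = 1.0 - resid.var() / y.var()
    return slope, r2
```

<p>A slope near zero would indicate that peers’ statements barely move an agent’s sentiment, while the R² indicates how well the linear model fits.</p>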
          <p>Defensibility Scores. Candidate resumes are
processed through the Langchain embedding2 and
transformed into a format suitable for detailed analysis.
The llama index libraries, VectorStoreIndex and
ServiceContext, are used to create an indexed
repository of the vectorized documents. This index serves as a
searchable database, allowing efficient retrieval of text
segments that are contextually similar to a given input
and that provide support for the argument. If no relevant
text is found, a default score is assigned.</p>
          <p>In contrast, the CFO’s lowest sentiment score is -0.74,
towards Melissa Morgan, signaling a significant negative
view. Similarly, the VPE aligns with the CFO in favoring
Melissa Baldwin with the highest score of 0.34, but
diverges in its lowest sentiment, which is directed
towards Taylor Mahoney with a score of -0.55. The RPM,
on the other hand, exhibits the most positive sentiment
towards Mikayla Garrison with a score of 0.59, while
sharing the CFO’s negative sentiment towards Melissa
Morgan, albeit at a less intense level of -0.46. The
sentiment scores from the conversations directly influence
the ranking of candidates, as shown in Table 5. Applying
the Borda Count method to the combined rankings yields
a collective decision. Although individual agents might
rank candidates differently based on their interactions,
the aggregated results provide a more comprehensive
assessment. This approach demonstrates how sentiment
analysis combined with a voting system could inform
hiring decisions in a multi-agent setting.</p>
          <p>Table 6: Valence for each candidate (CFO, VPE, RPM).
Kimberly Carr: -1, 0, 0. Melissa Morgan: -1, 0, 0.
Mikayla Garrison: 0, 0, 0. Emily Marshall: 0, 0, 0.
Justin Davis: 0, 0, 0. Tamara Brown: 0, 0, 0.
Taylor Mahoney: 0, -1, 0. Joshua Alvarado: 0, 0, 0.
Melissa Baldwin: 0, 0, 0. James Wallace: 0, 0, 0.</p>
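<p>The retrieval step behind the defensibility scores can be sketched as a plain cosine-similarity search; the embed() function below is a hypothetical deterministic stand-in for the Langchain embedding model, and the scoring rule is an assumption rather than the pipeline’s actual API.</p>

```python
import zlib

import numpy as np

def embed(text, dim=8):
    """Hypothetical stand-in for a real embedding model; a real
    pipeline would call the embedding model instead of hashing."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    return rng.normal(size=dim)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def retrieve(query, segments, min_sim=0.0):
    """Return resume segments ranked by cosine similarity to the
    query; an empty result corresponds to the 'no relevant text
    found' case that yields a default defensibility score."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in segments]
    return [s for sim, s in sorted(scored, reverse=True) if sim >= min_sim]
```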
          <p>The intensity of an agent’s final sentiment score
determines the valence score. In Table 6, this occurred only
three times: the CFO attributed a negative valence to two
candidates, while the VPE attributed a negative valence
to one candidate. Consequently, no candidate was
classified as Tier 1, with most classified as Tier 2, except for
the three candidates with negative valence, who were
classified as Tier 3.</p>
          <p>The sentiment volatility of the agents, as shown in
Table 7, was mostly moderate, indicating strong conviction
in their opinions. However, there were instances of high
volatility, such as the CFO’s sentiment towards Kimberly
Carr and Mikayla Garrison, and the VPE’s sentiment
towards Kimberly, Joshua, and James. The RPM’s
sentiment was volatile towards Mikayla and Melissa Baldwin.</p>
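<p>The valence and tier assignment can be sketched as below. The 0.5 intensity cutoff is an illustrative assumption (it is, however, consistent with the reported finals of -0.74 and -0.55 receiving negative valence while 0.46 does not), as is the tier rule.</p>

```python
def valence(final_sentiment, threshold=0.5):
    """Map a final sentiment in [-1, 1] to a valence of -1, 0, or
    +1 when its intensity exceeds the threshold (assumed 0.5)."""
    if final_sentiment >= threshold:
        return 1
    if final_sentiment <= -threshold:
        return -1
    return 0

def tier(valences):
    """Tier 1 if any agent assigns positive valence, Tier 3 if any
    assigns negative valence, otherwise Tier 2 -- a sketch
    consistent with the classification reported in the text."""
    if any(v > 0 for v in valences):
        return 1
    if any(v < 0 for v in valences):
        return 3
    return 2
```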
          <p>The agents’ conviction in their opinions is calculated by
dividing the final sentiment by the volatility, with higher
values indicating stronger intuition about a candidate’s
suitability for the role.</p>
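<p>Conviction, and the Gut Feeling re-ranking it produces, can be sketched as below; the example scores and the guard against zero volatility are assumptions for illustration.</p>

```python
def conviction(final_sentiment, volatility, eps=1e-9):
    """Conviction = final sentiment divided by volatility; a
    larger magnitude means the agent is more certain."""
    return final_sentiment / max(volatility, eps)

def gut_feeling_rank(final_scores, volatilities):
    """Re-rank candidates by conviction instead of raw sentiment,
    so opinions the agent is unsure about carry less weight."""
    convictions = {c: conviction(final_scores[c], volatilities[c])
                   for c in final_scores}
    return sorted(convictions, key=convictions.get, reverse=True)

# Hypothetical scores: the agent is volatile about "Kimberly",
# so she is ranked more generously than raw sentiment suggests.
finals = {"Joshua": 0.5, "Kimberly": -0.6, "James": -0.3}
vols = {"Joshua": 0.2, "Kimberly": 0.8, "James": 0.2}
ranked = gut_feeling_rank(finals, vols)
```

<p>Here raw sentiment would place Kimberly last, but her high volatility dilutes the negative opinion and moves her up one place, mirroring the revision described in the text.</p>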
          <p>Table 7: Sentiment Volatility</p>
          <p>The Gut Feeling Rank for each candidate is a revised
ranking that takes into account an agent’s conviction in its
own sentiment. In Table 9, the Gut Feeling of the RPM
toward Joshua still ranks him in first place, but
the CFO revises its ranking of Kimberly from 9th
place to 8th. The more generous ranking can
be interpreted as a result of the agent’s “acknowledgement”
that it is not sure of its opinion toward
Kimberly.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this paper, we introduce Sentimental Agents,
LLM-based agents that generate opinions for collective
decision-making within conversational settings. Our
proposed framework integrates a non-Bayesian updating
mechanism to track sentiment volatility and opinion
evolution. In a simulated HR recruiting scenario, we assess
these agents’ decision-making abilities, noting their
diverse opinions and preference shifts over multiple rounds.</p>
      <p>The findings suggest that model parameters, such as alpha
and tolerance, significantly influence sentiment
expression, and thus cognitive bias, within the system. This
research offers a foundation for advanced tool development
applicable to domains such as HR recruiting, medical
diagnostics, and education.</p>
    </sec>
  </body>
  <back>
  </back>
</article>