<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Can LLMs Mediate Synchronous Dispute Dialogues?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>James Hale</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>HanMoe Kim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ahyoung Choi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter H. Kim</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jonathan Gratch</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Gachon University</institution>
          ,
          <addr-line>Seongnam</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Southern California</institution>
          ,
          <addr-line>Los Angeles, California</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <fpage>117</fpage>
      <lpage>131</lpage>
      <abstract>
<p>Characterized by elevated hostility, disputes often leave disputants unable to resolve their differences. This emphasizes the role of a mediator; however, mediators are typically highly specialized, difficult to find, and expensive. While alternatives exist, such as acquaintances or online moderators, these prove less effective. This raises the question of whether AI can effectively facilitate contentious disputes. We assess the potential of LLMs as mediators. First (Study 1), we focus on a large open-source corpus of customer service disputes that are objectively classified in terms of two key reasons for mediation: (1) whether the dispute ended in success or failure, and (2) to what extent the participants reported frustration with each other. We examine whether LLMs could be prompted to predict in advance when a mediator should intervene, and show that the decision to intervene is correctly sensitive to these two factors. We then conducted a user study that compared AI to human suggestions on when and how to intervene. We find that the LLMs were rated significantly better at predicting when to intervene, rated as providing a better rationale for intervening, and rated as providing a more effective mediation message to the disputants. Remarkably, observers preferred the AI 2-to-1 over the human mediation decisions. Second (Study 2), we collect a small corpus of mediated dispute dialogues and analyze the effectiveness of novice human mediators. Quantifying the similarity of novice mediators' behaviors to those of an LLM, we find that the closer one mediates to an LLM, the better the outcome — both subjectively and objectively. These results indicate LLMs not only outperform novice mediators but may serve to identify effective mediators.
While these results do not definitively demonstrate that LLMs can mediate human disputes, we do show they can sense escalation indicators and generate sensible messages — this work serves as a first step toward AI mediators in human conflict.</p>
      </abstract>
      <kwd-group>
        <kwd>Adaptive Dialogue Management</kwd>
        <kwd>Human-Computer Interaction</kwd>
        <kwd>Mediation</kwd>
        <kwd>Dispute</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        This paper describes two studies assessing the ability of Large Language Models (LLMs) to mediate
contentious and emotional disputes. Disputes arise when one party in a relationship makes a claim that
another party rejects, thus threatening the future of that relationship [
        <xref ref-type="bibr" rid="ref1 ref27">1</xref>
        ]. For example, in a customer
service dispute, a customer might demand that they deserve a refund, but the store owner rejects
this claim. These relate to negotiation, which has been studied extensively in artificial intelligence
research [
        <xref ref-type="bibr" rid="ref2 ref28 ref29 ref3 ref30 ref31 ref4 ref5 ref6 ref7">2, 3, 4, 5, 6, 7</xref>
        ], but involve unique social processes [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Negotiations are forward-looking:
parties focus on the potential gains of making a deal and establishing a new relationship. Disputes are
backward-looking: parties are focused on a perceived injustice by the other party and the potential costs
of ending an existing relationship. As a result, disputes are characterized by strong emotions such as
anger. Whereas prior work has shown expressions of anger promote compromise in negotiation [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ],
it can provoke retaliation in a dispute [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. For example, disputants leverage appeals to justice (“You
violated my rights!”) or threatening harm (“I will sue you!”) as they attempt to overpower their
counterpart — the effect of which can be an escalatory spiral, including threats of physical violence [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">12,
13, 11</xref>
        ]. As such, the consequences of a spiraling dispute can outweigh the original perceived harm. The
key to successful dispute resolution is to get parties to shift their focus away from the past dispute and
forward to a potential negotiated agreement. Software that could forecast these negative tendencies
and intervene before parties reach an impasse could have enormous societal benefits.
      </p>
      <p>
        In evaluating LLMs as mediators, we start by examining a large corpus of dispute dialogues in which
online participants acted as buyers or sellers in a simulated purchase conflict [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Hale et al. crafted
this scenario in collaboration with a dispute resolution expert to evoke strong emotions while adhering
to ethical guidelines for human experimentation. Participants were recruited online and could receive a
substantial bonus if they reached an agreement. However, many of the disputes escalated and ended
without agreement, thus forfeiting their bonus. Even when agreements were reached, disputants often
reported high frustration with their partner. We want to see if LLMs, on this open-source corpus, could
forecast impasses in advance and show potential to use tactics to steer parties towards agreement.
Building on this, we create a new corpus — using the same scenario — introducing human novices to
mediate the dispute, making the task triadic. Figure 1 depicts one such example. Through analysis of
these two corpora, we hope to demonstrate the potential for LLMs to outperform human novices.
      </p>
      <p>
        First, using the open-source dyadic dispute corpus KODIS [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], we set out to demonstrate that LLMs are
sensitive to features salient in disputes — such as disputant Frustration and dispute outcome — in
determining whether to intervene, without directly observing those features. We further run a small user study
to show third parties prefer LLM mediations to novice human ones. Next, with the triadic mediation
corpus we collect, we attempt to quantify the similarity of each human mediator to what an LLM
would have done, and show that the more LLM-like a novice mediator behaves, the better the outcomes
(objective and subjective) the parties achieve.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The dispute resolution literature outlines empirical and theoretical guides for mediation best practices.
For example, several studies show the effect of mediator bias on disputant outcomes [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ], emphasizing
the need for impartiality. Additionally, much work on the role of emotion in mediation exists [
        <xref ref-type="bibr" rid="ref17 ref18 ref19">17, 18, 19</xref>
        ],
with Boland and Ross underscoring the need for a mediator to possess emotional intelligence [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
Importantly, in a dispute context, escalatory spirals often manifest, where a disputant exhibits
increased displays of hostility in response to hostility [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] — this raises the importance of a mediator
understanding the complex emotional dynamics at play so that they can prevent derailment. With
the recent proliferation of artificial intelligence, conflict researchers question whether AI agents can
adequately understand the dispute context and dynamics, and effectively employ mediation tactics to
guide disputants away from impasse.
      </p>
      <p>
        The AI research community has surveyed the growing capabilities of AI in mediation and moderation
tasks. Prior work has examined the potential of AI to detect and potentially intervene in disputes,
primarily over posts on social media. Much of this work has focused on recognizing overtly toxic
comments after they have been posted, such as detecting personal attacks [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] or general toxicity [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
Methods have also explored whether AI could suggest helpful comments to resolve the dispute. For
example, Cho et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] evaluated AI conversational moderators that intervene in emotional disputes
on Reddit. More recent work has explored whether models could forecast if a conversation was likely
to derail in the near future, again focusing on social media disputes [
        <xref ref-type="bibr" rid="ref24 ref25">24, 25</xref>
        ]. Lai et al. propose a
human-agent interaction pattern in which human and AI moderators work together, finding that human-agent
teams achieve superior precision in moderating content [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. In contrast to moderation, Govers et al.
analyze the extent to which AI can act as a mediator in online environments – their work demonstrates
that large language models can effectively depolarize online communities [27]. Tessler et al. demonstrate
that AI (LLMs) outperforms human mediators in facilitating agreement in contentious debates on divisive
topics – i.e., the AI-mediated groups more often found common ground than human-mediated ones [28].
Tan et al. compared LLM mediators with novice human mediators in hand-crafted dispute resolution
dialogues, where they found that LLM mediators operated at or above the performance level of novice
humans [29]. Our work differentiates itself in the following two studies:
• First (Study 1), we demonstrate the potential for LLMs to identify salient dispute features and
effectively guide disputants toward resolution, outperforming human novices, by leveraging a
pre-collected corpus of dispute dialogues.
• Second (Study 2), we collect triadic mediation dialogues, where human disputants interact with a
human mediator — this design allows for deeper analysis of objective and subjective outcomes for
disputants. In subsequent analysis, we found humans who mediate as the LLM mediator would
have done achieve better outcomes.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Study 1: When &amp; How to Intervene</title>
      <p>
        Study 1 motivates the potential of LLMs to act as mediators in a dispute context, leveraging an
open-source corpus of dispute resolutions, KODIS [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. We examine the two questions of when — i.e., the
LLM determines to intervene or not given the progress of the dispute — and how — i.e., the type of
message generated for disputants — LLMs mediate disputes. Firstly, we analyze if LLMs can discern
salient aspects of the dispute — e.g., Frustration and Outcome — to intervene appropriately; i.e., when
determining when to intervene, we expect the LLM to do so in disputes heading toward impasse or
those with high self-reported frustration. Secondly, to address whether LLMs can decide how to
mediate, we run a small user study to evaluate intervention messages generated by an LLM against
messages generated by novice human mediators — expecting LLMs to craft more effective mediation
messages than human novices.
      </p>
      <sec id="sec-3-1">
        <title>3.1. When to Intervene</title>
        <sec id="sec-3-1-1">
          <title>3.1.1. Methodology</title>
          <p>First, we investigate if an LLM (gpt-4-0613) can effectively determine when to intervene in disputes
pulled from the KODIS corpus. In KODIS, a buyer and seller dispute over an online order of a basketball
jersey. The buyer reads a role-play prompt stating that they ordered a Kobe Bryant jersey from an
online seller but received a generic one instead; on reaching out, the seller denies their refund
request; and each side then posts negative reviews about the other. The seller reads a similar prompt,
but believes they never described the product as a Kobe Bryant jersey. Thus, the scenario primes the
buyer and seller to argue over facts — a primary characteristic of disputes — as they attempt to resolve
four core issues: whether the buyer receives a refund, whether the buyer removes their negative review
of the seller, whether the seller removes their negative review of the buyer, and whether each side
apologizes. KODIS contains many participant responses, though we focus on the objective outcome
(whether the dispute ended in resolution or impasse) and self-reported frustration (from the Tactics
scale [31], an average of the frustration-related questions from each side). Using pre-collected dialogues
from this corpus, we experiment with LLM mediators. (For these studies, we received IRB approval
and consent from participants, which they could revoke at any time; Study 1 extends previous work by
the authors [30].)</p>
          <p>Specifically, we analyze whether an LLM can pick up on salient features in a dispute (e.g., frustration
and outcome) in determining to intervene. We iterate through each dialogue exchange in each dispute,
giving the LLM the conversation history thus far and asking it to determine whether to intervene at
the current point (see Figure 7 for the prompt used). We construct the prompt ensuring the model
understands its role as a mediator; identifies the severity of the situation on a scale from one to ten
(Intervention Score); selects the reason for intervention from four categories (Escalation of conflict ,
Impasse, Miscommunication, or Unreasonable demands); and generates an appropriate response to guide
the parties. We expect the LLM to ascribe higher Intervention Scores if participants report higher
frustration and if the dispute ends in an impasse.
</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Results</title>
          <p>We perform a moderated regression (F(3, 1419) = 323.6, p &lt; .001, R² = .41) to determine whether the
differences in the Mean Intervention Score (the average of all Intervention Scores generated for each
exchange in a given dispute) over all (N = 1,782) human-human dialogues significantly differ between
two factors — 1) whether the dialogue impasses or resolves (Impasse), and 2) how much Frustration
participants self-report (Z-scored). The test yielded main effects on Mean Intervention Score for each
independent variable. We find a significant main effect of Impasse on Mean Intervention Score (B = 1.75,
SE = 0.12, t = 14.14, p &lt; .001), where Tukey's post-hoc test revealed the LLM scored dialogues resulting
in impasse significantly (p &lt; 0.01) higher (M = 5.51, SD = 1.69) than those resulting in resolution
(M = 3.16, SD = 1.61); we also find a significant main effect of Frustration on the Mean Intervention
Score (B = 0.76, SE = 0.04, t = 17.83, p &lt; .001); lastly, we find no significant interaction between the
independent variables (B = 0.08, SE = 0.11, t = 0.75, p = .45). Thus, we find that an LLM can perceive
and act when disputants become frustrated with one another and when a dispute moves to an impasse.
Figure 2a visualizes these results. Figure 2b illustrates the average LLM intervention score over time
broken out by factor, with Frustration binarized into high or low using a median split. One can see the
LLM's intervention score rise as the dialogue continues if it ultimately ends in an impasse, and
higher intervention scores in high-frustration dialogues.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. How to Intervene</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Methodology</title>
          <p>Given the previous section establishes LLMs can competently determine when to intervene, the question
remains of whether an LLM can formulate an effective message at an appropriate point — i.e., can an
LLM decide how to intervene? We compare LLM mediations against those of novice human mediators
and ask crowd-sourced annotators to rate each on several subjective measures — e.g., appropriateness
of the intervention point, the effectiveness of the message, and whether an accompanying justification
supports the action — and ultimately to pick which they felt was more effective at guiding toward a
resolution. We expect to find that LLMs significantly outperform novice human mediators.</p>
          <p>We use crowd-sourced mediations gathered from Prolific as a baseline against the LLM — i.e., the
novice mediators. As participants enter the online survey, we tell them they will role-play a mediator,
working for an online retailer, overseeing a buyer and seller as they dispute over a purchase gone wrong.
Further, we instruct them to intervene only if they believe the dialog will otherwise stall
at an impasse; we incentivize performance by offering an additional bonus of $0.50 for each dialog (five
total) in which a participant intervenes and it ends in an impasse, or remains inactive and it ends in
resolution. When ready, they enter a page with the chat interface depicted in Figure 3 where they can
navigate through a KODIS dispute dialog utterance-by-utterance (“Continue”), intervening (“Intervene,”
with an accompanying message) when and how they decide. Through this, we collect the intervention
point (the point in the dialog where the participant intervenes) and their message for 198 dialogues.</p>
          <p>A different set of Prolific crowd-workers compared LLM mediations against novice human ones on
a subset (n = 20) of the dialogues where both the LLM and the novice human mediator elected to
intervene. Specifically, this was a within-subjects design, where we ask crowd workers (N = 106), given
a single random mediated dialog up to an intervention point as well as the intervention/justification, to
evaluate and compare the attempts of an LLM and a human mediator (blind to which) on three subjective
measures (1-10 Likert scale) — appropriateness of the intervention point, the effectiveness of the message,
and whether an accompanying justification supports the action (see Table 1 for phrasing) — and to
ultimately pick which they felt was more effective at resolving the dispute.
</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Results</title>
          <p>We use a two-tailed t-test to test our hypothesis that human annotators prefer LLM mediations to human
ones and find significance across all three questions, supporting as much. We see participants view the
LLM's mediations as making resolution more likely, having more appropriate timing, and giving better
justification. Table 1 summarizes the statistics discussed. Lastly, a Chi-squared test on a forced choice
between the LLM or human mediations yielded a significant result (χ²(1, N = 106) = 6.29, p = .01),
where 71 participants selected the LLM-generated mediation compared to 35 for the human-crafted one
— i.e., the human evaluators preferred the LLM's mediations at a rate of two-to-one.</p>
          <p>Table 1 question phrasing: "I believe this mediation increases the probability of a resolution." "The
supervisor intervened at an appropriate point." "The supervisor provided appropriate justification for
intervention."</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Discussion</title>
        <p>Here, we demonstrated that an LLM could appropriately act as a mediator and intervene with some
accuracy — doing so in disputes where participants became frustrated or headed toward an impasse.
Secondly, we demonstrated that LLMs could craft compelling mediation messages to rival human
mediators. Crowd-sourced annotators evaluated the LLM as more likely to induce resolution, intervening
at a more appropriate point, and providing better justification than human mediators; participants also
overwhelmingly marked LLM mediations as better than human in a forced-choice question — choosing
the LLM-generated mediation by a margin of two-to-one. In explaining the superior performance of
LLMs in this task, one might consider that LLMs possess an innate lack of fatigue, broad training data,
and a high level of consistency. This, we posit, lays the groundwork for using LLMs as mediators in
complex dispute settings. However, one may recognize the lack of interactivity as a limitation of this
work, as we cannot gauge whether these mediations positively impact subjective evaluations or impasse
rate. We aim to address this in the next section via analysis of three-way mediated disputes.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Study 2: Three-way Mediation Experiment</title>
      <p>This second study, building on the first, addresses the primary limitation of those analyses — namely, the
lack of interactivity between the disputants and mediator. That is, in the previous study, mediators annotated
where they would intervene and what they would say on previously collected dialogues, which does
not allow one to examine their effectiveness. To evaluate that effectiveness, we conduct this second
study. Specifically, we collect a smaller version of the previously used corpus,
with a third participant in the chat-room acting as a mediator; we quantify the similarity of the human
mediators with LLM mediators (gpt-4o-2024-08-06); and we test whether there exist significant
outcome effects when a human mediator behaves more similarly to an LLM one. We find such effects in
terms of subjective (e.g., SVI) and objective (e.g., impasse vs. resolution) outcomes.</p>
      <sec id="sec-4-1">
        <title>4.1. Methodology</title>
        <sec id="sec-4-1-1">
          <title>4.1.1. Data Collection</title>
          <p>
            We use Lioness Labs [32], a tool used by prior work [
            <xref ref-type="bibr" rid="ref14">14, 33</xref>
            ], to collect mediated dispute dialogues online
through Prolific. Lioness allows matching of participants online to complete multi-party behavioral
experiments – in our case, we match two participants role-playing as disputants with a third acting as a
mediator (see Figure 1). We collected N = 98 dialogues (294 participants). We pull the scenario from
Hale et al.’s [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ] KODIS, as described in Section 3.1.1 — only now we add a participant as a mediator,
making it a three-party task. Before the buyer and seller interact with each other, we tell them that a
mediator will be present and may intervene. The mediator's instructions outline a few potential reasons
to intervene, which we ground in the literature (this study was IRB approved):
• Escalation of Conflict: If the conversation becomes heated, with parties resorting to personal
attacks or hostile language [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ].
• Impasse: When parties reach a deadlock and are unable to move forward [34, 35].
• Miscommunication: There are signs the parties misunderstand each other’s points [36].
• Unreasonable demands: If one party makes unreasonable demands the other cannot meet [37].
          </p>
          <p>After completing the mediated dispute, participants answer some post-task questions. The buyer and
seller role players fill out Curhan et al.'s Subjective Value Inventory (SVI) questionnaire [38], which
contains four sub-scales measuring subjective feelings about the dispute — e.g., feelings about the
instrumental outcome, process fairness, self, and relationship. We also derive questions from those
sub-scales, explicitly invoking the mediator, to gauge the impression of the disputants and mediator
about the mediator's performance.</p>
          <p>• Outcome: (You / The Mediator) helped achieve a satisfactory outcome.
• Relationship: (You / The Mediator) helped repair both parties' relationship.
• Process: (You / The Mediator) helped facilitate a more fair outcome.
• Self: (You / The Mediator) helped (parties / me) to keep face, act to (their / my) principles, and
negotiate competently.</p>
          <p>• Avoid Impasse: (You / The Mediator) helped avoid a walkaway.</p>
          <p>We compensated parties $3.50 for a task that took approximately 20 minutes. Each player could earn
an additional bonus of up to $3 depending on how well they achieved their objectives — e.g., the buyer
and seller settling the refund, review, and apology issues, and the mediator encouraging both sides to
come to a resolution. We intended this bonus to motivate participants, in this online setting, to immerse
themselves in their role and to achieve their objectives.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Intervention Alignment</title>
          <p>In addition to the metrics recorded during data collection, we create various metrics to gauge the
effectiveness of LLM mediators. Here, we outline how we quantify the similarity of intervention
patterns between two mediators (human and LLM). Assume we have two vectors representing the
intervention patterns of the human and LLM — h for the human and a for the LLM. Consider h (we
define a the same way): we set h_i = 1 if the human intervenes after the i-th message and zero otherwise.
Whereas we construct h from the collected dialogue, we create a afterwards on that same dialogue,
removing the human mediator messages. Specifically, given some dialogue, we remove the mediation
messages; have an LLM iterate through each utterance and determine whether to intervene, in the same
manner as Study 1 (though the LLM does not output an Intervention Score, rather 1 to intervene or 0 to
not); and construct a such that a_i = 1 if the LLM intervened after utterance i, and zero otherwise. Thus,
we have h and a, which represent the intervention patterns of the human and LLM, respectively, for a
dialogue. Given these, we wish to gauge the similarity of each human mediator to the LLM's behavior.</p>
          <p>We derive a measure from the Earth Mover's Distance (EMD) to quantify the similarity of the human
and LLM mediator behavior — i.e., comparing h and a. As outlined below, we iterate through each
element of h and a in parallel, keeping track of the cumulative difference up to each element (CD_i
for element i); the Earth Mover's Distance sums over the absolute value of each CD_i.</p>
          <p>=0
CD = ∑   −</p>
          <p>EMD
= ∑ |CD |

=0
(1)
Our decision of EMD stems from requiring a metric that considers two vectors as more similar if they
contain interventions closer in proximity, rather than judging them as equivalent. For example, consider
vectors  ,  , and  where   = 1,  +1 = 1,  +2 = 1, and all else is 0; we expect EMD(, ) &lt;
EMD(, )
since  intervened closer to where  did than  . With EMD, this is true; however, with other metrics,
such as Euclidean distance, it does not hold. Of note, EMD measures distance, whereas we would like to
measure similarity — thus, we scale this metric by negative one. Further, we Z-score this for subsequent
analysis. Going forward, we refer to this measure as Intervention Alignment. Of note, this metric
does not consider what the mediators say to disputants; we next outline a method to quantify that.</p>
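<p>The 1-D EMD over binary intervention vectors can be sketched as follows (our implementation of Equation 1, returning the raw distance before the paper's negation and Z-scoring). The example vectors check the property stated above: intervening one utterance away from the human is scored as closer than intervening two away.</p>

```python
from itertools import accumulate

def intervention_alignment_distance(h, a):
    """Earth Mover's Distance between two binary intervention vectors:
    h[i] = 1 if the human intervened after utterance i, likewise a[i] for
    the LLM. CD_i is the cumulative difference up to element i; the EMD is
    the sum of |CD_i|. (The paper negates and Z-scores this distance to
    obtain the Intervention Alignment similarity; we return the raw EMD.)"""
    diffs = [hi - ai for hi, ai in zip(h, a)]
    return sum(abs(cd) for cd in accumulate(diffs))

# Property from the text: a vector intervening one step away is closer
# (smaller EMD) than one intervening two steps away.
h = [0, 1, 0, 0, 0]
v = [0, 0, 1, 0, 0]   # intervenes one utterance later
w = [0, 0, 0, 1, 0]   # intervenes two utterances later
print(intervention_alignment_distance(h, v))  # 1
print(intervention_alignment_distance(h, w))  # 2
```

Note that Euclidean distance would score v and w identically (both differ from h in exactly two positions), which is why EMD is used here.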
        </sec>
        <sec id="sec-4-1-3">
          <title>4.1.3. Justification Alignment</title>
          <p>Building on the aforementioned intervention similarity metric, we construct a metric to measure the
similarity of the messages that the human and LLM mediators send. Contrasting with the previous
approach, we do not allow the LLM to decide where to intervene — rather, we fix the intervention
points to where the human novice intervened. Then, given the dialogue history, we prompt the LLM
to generate a message and select one of the justifications from Section 4.1.1 (or NA if none fit) — at
the same time, we categorize the human’s message into one of those justifications. We compute our
Justification Alignment dependent variable, then, as the proportion of interventions with matching
categories for a given dialogue — discarding those where the novice chose not to intervene. Therefore,
this variable gives a sense of whether the messages the human and LLM send have similar intents.</p>
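<p>A minimal sketch of this proportion, assuming per-utterance category labels (the label strings and `None` encoding for non-intervention points are our illustrative choices, not the paper's data format):</p>

```python
def justification_alignment(human_labels, llm_labels):
    """Proportion of the human's interventions whose justification category
    matches the LLM's label at the same (fixed) intervention point.
    human_labels[i] is the category of the human's intervention after
    utterance i, or None where the human did not intervene (discarded)."""
    pairs = [(hl, ll) for hl, ll in zip(human_labels, llm_labels) if hl is not None]
    if not pairs:
        return 0.0
    return sum(1 for hl, ll in pairs if hl == ll) / len(pairs)

# Hypothetical dialogue with three human interventions; the LLM, prompted at
# the same points, agrees on two of the three categories.
human = [None, "Escalation of Conflict", None, "Impasse", "Miscommunication"]
llm   = [None, "Escalation of Conflict", None, "Impasse", "Unreasonable demands"]
print(justification_alignment(human, llm))  # 2 of 3 match -> 0.666...
```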
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <sec id="sec-4-2-1">
          <title>4.2.1. Novice Mediation Outcomes</title>
          <p>We begin by examining the impact of the novice human mediations on subjective and objective outcomes.
For each dialogue, we ascribe a binary factor denoting whether the novice mediator intervened at all
(the novice mediators intervene at least once in about 83% of dialogues) — in the subsequent statistical tests,
we consider this alongside another binary factor representing the outcome (impasse or resolution). We
run four two-by-two ANOVAs considering the effect of intervention and outcome on subjective outcome
(SVI). For the four SVI sub-scales, we do not find any significant main effects of this intervention factor;
rather, we find main effects of outcome (p &lt; .001 for each) such that disputants report a higher SVI
if they come to a resolution. Of note, in three of the SVIs, we see disputants report worse subjective
feelings about the outcome if the mediator intervened. Further, considering the impact of intervention
on the objective outcome, we see disputes resolved 81% of the time if the mediator intervened, compared
to 77% of the time otherwise — this small difference does not reach significance (p = .91) by Chi-squared.</p>
          <p>Figure 4: Depicts mean SVI sub-scale scores by outcome (impasse vs. resolution) and whether the human
mediator intervened.</p>
          <p>Given these seemingly unimpressive results, we proceed to analyze the impression from each party
of the mediator's performance — specifically, Figure 5 depicts the responses from each party to the
questions overviewed in Section 4.1.1. We see the mediators actually thought they performed well,
relative to the buyer and seller evaluations. Next, we analyze whether we can quantify the extent to
which these novices behave like LLMs, and if there exist any effects of that on the outcomes.</p>
<p>Figure 5: Mediator, buyer, and seller ratings across the dimensions Outcome, Relationship, Process, Self, and Avoid Impasse.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Comparing Against AI</title>
<p>First, we consider the similarity of the intervention pattern between the human and LLM (Intervention
Alignment). We conduct a regression analysis to examine the extent to which Intervention Alignment
predicts subjective (SVI) and objective (impasse vs. resolution) outcomes. By linear regression,
it significantly predicted each of the four SVI sub-scales, as seen in Table 2. As the human's intervention
pattern becomes more similar to the LLM's, the subjective outcome improves across the board;
Figure 6 (top row) illustrates these relationships. Next, we consider the objective outcome, i.e., a
binary dependent variable of whether the dispute ended in a resolution (coded as zero) or impasse
(coded as one). We run a logistic regression to test whether there exists a significant effect of the
similarity metric on the outcome. The regression (B = −.41, SE = .23, z = −1.75, p = .08) shows a trend
whereby the more similarly one acts to the LLM, the less likely an impasse. Further, we see an impasse rate
of 29% in dialogues in the bottom half of similarity, which reduces to 11% for the upper half.</p>
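<p>For intuition, the logistic coefficient reported above can be converted to an odds ratio. A short sketch, assuming the reported coefficient of −.41 per one-unit increase in Intervention Alignment (on the metric's own scale):</p>
<p>
```python
import math

# The reported logistic coefficient (B = -0.41) is on the log-odds scale;
# exponentiating gives the multiplicative change in the odds of impasse
# per one-unit increase in Intervention Alignment.
B = -0.41
odds_ratio = math.exp(B)
print(round(odds_ratio, 2))  # about 0.66: higher alignment lowers impasse odds
```
</p>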
<p>Table 2: Linear regressions of Intervention Alignment predicting each SVI sub-scale.
Relationship: B = 0.37, SE = 0.09, p &lt; .001
Self: B = 0.21, SE = 0.10, p = .035
Process: B = 0.36, SE = 0.10, p &lt; .001
Outcome: B = 0.31, SE = 0.10, p = .002</p>
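<p>Each row of Table 2 is a one-predictor linear regression of an SVI sub-scale on Intervention Alignment. A minimal pure-Python sketch of that computation follows; the data points are made up for illustration, so the printed coefficients do not correspond to any row of the table.</p>
<p>
```python
import math

# Illustrative sketch: ordinary least squares of an SVI score on
# Intervention Alignment. All data below are hypothetical.

def ols_slope_se(x, y):
    """Return (B, SE) for y = a + B*x by ordinary least squares."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(resid_ss / (n - 2) / sxx)
    return b, se

alignment = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical alignment scores
svi = [3.1, 3.4, 3.3, 3.9, 4.0]        # hypothetical SVI sub-scale means
b, se = ols_slope_se(alignment, svi)
print(round(b, 2), round(se, 2))  # 1.15 0.26 with these illustrative points
```
</p>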
<p>Mirroring the analysis for Intervention Alignment, we conducted a regression analysis to test the effect
of the Justification Alignment metric on the various outcome measures. Starting with the subjective
outcomes, we run four linear regressions, the results of which appear in Table 2; these yielded
significant results for all subjective outcomes aside from Self. We again run a logistic regression to
analyze the effect of Justification Alignment on impasse. We find a significant result (B = −.84, SE = .34,
z = −2.47, p = .013) such that if the novice mediator tended to send messages in the same category as
the LLM, the chance of an impasse reduced significantly.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Discussion</title>
<p>This study, building on the first, demonstrates that LLMs possess the potential to mediate human
disputes in a triadic setting. The first study demonstrated LLMs can tune into salient features of
pre-collected dispute dialogues, intervening, as expected, in dialogues with high frustration and
those ending in impasse; in this second study, we analyze three-party human mediations, finding that novice
mediators acting more similarly to LLMs achieve better outcomes. Importantly, whereas in the first
study we could not measure the effect of an intervention on disputes (we leveraged a pre-collected
corpus), in the second we could. In Study 2, we initially see uninspiring results related to impasse rate
and disputants’ reported subjective outcomes (SVI), where it seems as though, if anything, mediators
imposed an adverse effect. Alternatively, one may consider that mediators intervene more often in disputes
heading toward impasse or with high frustration, and parties could take a mediator’s interjection as
a signal that the outcome may be worse; this could partially explain the differences in subjective
evaluations in cases of impasse between dialogues where the mediator intervened versus not (see
Figure 4). Despite this, we see mediators rate themselves relatively higher compared to evaluations from
disputants across several dimensions, potentially an example of the self-serving bias [39]. Considering
these results, we next explore whether quantifying how similarly these novices behave to an LLM can
demonstrate LLMs’ potential to act as mediators.</p>
<p>We create two variables to capture how similarly a novice mediator acts to an LLM: 1) Intervention
Alignment, which captures the similarity of the mediators’ intervention patterns, and 2) Justification
Alignment, which captures whether the two mediators intervene for similar reasons. In terms of objective
outcomes, we find a trend and a significant effect, respectively, where higher similarity to the LLM
reduces the probability of an impasse. Further, for nearly every SVI sub-scale (aside from Self for
Justification Alignment), we see significant effects. This implies that when a novice mediator behaves
similarly to an LLM, outcomes improve. A limitation of Study 2 remains its lack of interactivity:
disputants never interact with an LLM mediator directly; rather, we collected three-party dialogues and
quantified how much an LLM “agreed” with a novice mediator.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion &amp; Future Work</title>
<p>This work demonstrates that LLMs respond to escalation markers and generate sensible mediation messages;
further, we show human mediators acting more similarly to LLMs induce better outcomes. Together,
these findings indicate LLMs could operate as a mediator in a zero-shot setting. We have two primary
results: 1) we show LLMs can effectively determine when and how to intervene on a pre-collected corpus,
and 2) we demonstrate the potential of LLM mediators in a triadic dispute setting via elevated subjective
evaluations from disputants and lower impasse rates. However, this work has several limitations that
we aim to address in future work. First, we test only one LLM (GPT-4o) with one prompt configuration
(zero-shot). One might imagine performance improvements when endowing an LLM with expert
strategies for mediation [40], or when leveraging popular prompting techniques (e.g., chain-of-thought).
Second, in the work’s current paradigm, we have the LLM intervene only when it detects a potential
issue in the dialogue; however, one could imagine a mediator might attain greater effectiveness
by proactively intervening, preventing issues before they arise. We will consider this notion going
forward. Lastly, while an improvement over Study 1, Study 2 still did not place an LLM mediator directly
with human participants; rather, we evaluate the effectiveness of the LLM mediators through the
performance of the novices. Another study, placing the LLM directly between two human disputants,
must occur before making a definitive statement on how well LLMs can mediate; nevertheless, this work
illustrates their promise.</p>
    </sec>
    <sec id="sec-6">
      <title>Ethical Impact Statement</title>
      <p>These types of social influence technologies carry the potential for harm, especially considering the
intricate emotional dynamics at play. E.g., as Schluger et al. [41] note, technologies that proactively
work to prevent escalatory spirals in conversation — as does the LLM mediator in our scenario —
may infringe on one’s freedom of speech if they shut down the conversation or intervene based on a
prediction of future bad behavior. Further, prior work demonstrates LLMs struggle to generalize across
cultures with emotion [42] — given the salience of emotion in disputes, this warrants consideration.</p>
<p>Further, over-reliance on AI can cause harm, especially in emotionally charged settings like dispute
resolution. E.g., Passi and Vorvoreanu [43] note several manifestations of over-reliance: users may be
biased to favor automatically generated results; a user may over-rely on AI if it performs well initially, even if
it fails later; and AI explanations may themselves cause over-reliance. Generally, a mediator should not dominate
a dispute or exert pressure in such a way that disputants defer to them entirely; rather, a mediator
should guide disputants toward a solution [44]. As such, over-reliance on an AI mediator may hamper
the mediation process.</p>
    </sec>
    <sec id="sec-7">
<title>Acknowledgments</title>
<p>This work is supported by the U.S. Government, including the Air Force Office of Scientific Research
(grant FA9550-23-1-0320) and the National Science Foundation (grant 2150187). The views and
conclusions contained in this document are those of the authors and should not be interpreted as representing
the official policies, either expressed or implied, of the Army Research Office or the U.S. Government.
The U.S. Government is authorized to reproduce and distribute reprints for Government purposes
notwithstanding any copyright notation herein.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT5.2 and Grammarly to help with writing
clarity, and to assist with Python coding. After using these tools, the authors reviewed, verified, and
edited the content as needed and take full responsibility for the publication’s content.
delegation: A case study of content moderation, in: Proceedings of the 2022 CHI Conference on
Human Factors in Computing Systems, 2022, pp. 1–18.
[27] J. Govers, E. Velloso, V. Kostakos, J. Goncalves, AI-driven mediation strategies for audience
depolarisation in online debates, in: Proceedings of the 2024 CHI Conference on Human Factors
in Computing Systems, CHI ’24, Association for Computing Machinery, New York, NY, USA, 2024.</p>
      <p>URL: https://doi.org/10.1145/3613904.3642322. doi:10.1145/3613904.3642322.
[28] M. H. Tessler, M. A. Bakker, D. Jarrett, H. Sheahan, M. J. Chadwick, R. Koster, G. Evans,
L. Campbell-Gillingham, T. Collins, D. C. Parkes, et al., AI can help humans find common ground in democratic
deliberation, Science 386 (2024) eadq2852.
[29] J. Tan, H. Westermann, N. R. Pottanigari, J. Šavelka, S. Meeùs, M. Godet, K. Benyekhlef, Robots in
the middle: Evaluating LLMs in dispute resolution, arXiv preprint arXiv:2410.07053 (2024).
[30] J. Hale, H. Kim, A. Choi, J. Gratch, AI-mediated dispute resolution, in: Proceedings of the AAAI</p>
      <p>Symposium Series, volume 5, 2025, pp. 67–70.
[31] S. Aslani, J. Ramirez-Marin, J. Brett, J. Yao, Z. Semnani-Azad, Z.-X. Zhang, C. Tinsley, L. Weingart,
W. Adair, Dignity, face, and honor cultures: A study of negotiation strategy and outcomes in three
cultures, Journal of Organizational Behavior 37 (2016) 1178–1201.
[32] M. Giamattei, K. S. Yahosseini, S. Gächter, L. Molleman, Lioness lab: a free web-based platform for
conducting interactive experiments online, Journal of the Economic Science Association 6 (2020)
95–111.
[33] K. Chawla, J. Ramirez, R. Clever, G. Lucas, J. May, J. Gratch, Casino: A corpus of campsite
negotiation dialogues for automatic negotiation systems, in: Proceedings of the 2021 Conference
of the North American Chapter of the Association for Computational Linguistics: Human Language
Technologies, 2021, pp. 3167–3185.
[34] A. Vannucci, C. M. Ohannessian, K. M. Flannery, A. De Los Reyes, S. Liu, Associations between
friend conflict and affective states in the daily lives of adolescents, Journal of Adolescence 65
(2018) 155–166.
[35] C. K. De Dreu, L. R. Weingart, Task versus relationship conflict, team performance, and team
member satisfaction: a meta-analysis., Journal of applied Psychology 88 (2003) 741.
[36] J. Van Veenen, Dealing with miscommunication, distrust, and emotions in online dispute resolution
(2010).
[37] P. F. Kirgis, Bargaining with consequences: Leverage and coercion in negotiation, Harv. Negot. L.</p>
      <p>Rev. 19 (2014) 69.
[38] J. R. Curhan, H. A. Elfenbein, H. Xu, What do people value when they negotiate? mapping the
domain of subjective value in negotiation., Journal of personality and social psychology 91 (2006)
493.
[39] D. R. Forsyth, Self-serving bias (2008).
[40] S. B. Goldberg, J. M. Brett, B. Blohorn-Brenneur, How mediation works: Theory, research, and
practice, Emerald Publishing Limited, 2017.
[41] C. Schluger, J. P. Chang, C. Danescu-Niculescu-Mizil, K. Levy, Proactive moderation of online
discussions: Existing practices and the potential for algorithmic support, Proceedings of the ACM
on Human-Computer Interaction 6 (2022) 1–27.
[42] S. Havaldar, B. Singhal, S. Rai, L. Liu, S. C. Guntuku, L. Ungar, Multilingual language models are not
multicultural: A case study in emotion, in: J. Barnes, O. De Clercq, R. Klinger (Eds.), Proceedings
of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, &amp; Social Media
Analysis, Association for Computational Linguistics, Toronto, Canada, 2023, pp. 202–214. URL:
https://aclanthology.org/2023.wassa-1.19/. doi:10.18653/v1/2023.wassa- 1.19.
[43] S. Passi, M. Vorvoreanu, Overreliance on ai literature review, Microsoft Research 339 (2022) 340.
[44] S. B. Goldberg, J. M. Brett, B. Blohorn-Brenneur, The roles of the mediator and the disputing
parties at each step of the mediation process, in: How Mediation Works, Emerald Publishing
Limited, 2017, pp. 17–57.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Prompts</title>
      <sec id="sec-9-1">
        <title>Prompt: Study 1 — Mediation</title>
        <p>Imagine you are playing the role of a mediator in a buyer/seller purchase dispute. Your goal is to
allow participants to resolve their dispute on their own if possible, but to intervene if necessary.
Some reasons to intervene include:
1. Escalation of Conflict: if the conversation becomes heated with parties resorting to personal
attacks or hostile language
2. Impasse: when parties reach a deadlock and are unable to move forward
3. Miscommunication: if there are signs that the parties are misunderstanding each other’s
points
4. Unreasonable Demands: if one party is making unreasonable demands that the other party
can’t possibly meet
You will be given the conversation so far. Rate the situation on a scale from 1 to 10 with 1 meaning
definitely don’t intervene and 10 meaning definitely intervene. Provide:
• (a) the rating on whether to intervene,
• (b) the reason to intervene, selected from the list above,
• (c) a one-sentence statement you might tell the parties at this point.</p>
        <p>You do not need to intervene every turn, and should consider how recently you’ve intervened before
making a decision.</p>
<p>Here is the conversation so far…</p>
<p>You are a mediator for a buyer/seller purchase dispute. Let parties resolve issues themselves when
possible; intervene only if needed.</p>
      </sec>
      <sec id="sec-9-2">
        <title>Valid reasons to intervene:</title>
<p>Evaluate only the last Buyer/Seller utterance above. Respond with only JSON in the format:
{
  "intervene": 0 or 1,
  "reason": "Escalation of Conflict" | "Impasse" | "Miscommunication" | "Unreasonable demands" | "NA",
  "intervention": "one sentence suggestion (empty if intervene == 0)"
}</p>
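<p>A downstream consumer of this prompt would need to validate the returned JSON before using it; a minimal illustrative sketch follows (the response string and function name are hypothetical, not from the paper).</p>
<p>
```python
import json

# Illustrative sketch: validating the JSON the prompt above asks the
# model to return. The raw_response string is a made-up example.
VALID_REASONS = {"Escalation of Conflict", "Impasse", "Miscommunication",
                 "Unreasonable demands", "NA"}

def parse_mediator_reply(raw_response):
    """Parse and sanity-check the model's intervention decision."""
    reply = json.loads(raw_response)
    assert reply["intervene"] in (0, 1)
    assert reply["reason"] in VALID_REASONS
    if reply["intervene"] == 0:
        assert reply["intervention"] == ""
    return reply

raw_response = '{"intervene": 1, "reason": "Impasse", '
raw_response += '"intervention": "Let us find one concrete concession each side can make."}'
decision = parse_mediator_reply(raw_response)
print(decision["reason"])  # Impasse
```
</p>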
<p>Prompt: Study 2 — Justification Alignment</p>
<p>You are an expert mediation evaluator for a buyer/seller purchase dispute.</p>
      </sec>
      <sec id="sec-9-3">
        <title>Valid categories:</title>
        <p>1. Escalation of Conflict
2. Impasse
3. Miscommunication
4. Unreasonable demands
5. NA</p>
      </sec>
      <sec id="sec-9-4">
        <title>Conversation so far:</title>
        <p>{prev}
Classify only the last mediator utterance into exactly one category. Respond with only JSON in the
format:
{
"category": "Escalation of Conflict" | "Impasse" | "Miscommunication" |
"Unreasonable demands" | "NA"
}</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Felstiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Abel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarat</surname>
          </string-name>
          ,
          <article-title>The emergence and transformation of disputes: Naming, blaming, claiming…</article-title>
          ,
          <source>in: Theoretical and Empirical Studies of Rights</source>
          , Routledge,
          <year>2017</year>
          , pp.
          <fpage>255</fpage>
          -
          <lpage>306</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Baarslag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaisers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gerding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Jonker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gratch</surname>
          </string-name>
          ,
          <article-title>When will negotiation agents be able to represent us? the challenges and opportunities for autonomous negotiators</article-title>
          ,
          <source>International Joint Conferences on Artificial Intelligence</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Faratin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sierra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. R.</given-names>
            <surname>Jennings</surname>
          </string-name>
          ,
          <article-title>Negotiation decision functions for autonomous agents</article-title>
          ,
          <source>Robotics and Autonomous Systems</source>
          <volume>24</volume>
          (
          <year>1998</year>
          )
          <fpage>159</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kraus</surname>
          </string-name>
          ,
          <article-title>Negotiation and cooperation in multi-agent environments</article-title>
          ,
          <source>Artificial intelligence 94</source>
          (
          <year>1997</year>
          )
          <fpage>79</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Jonker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. V.</given-names>
            <surname>Hindriks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wiggers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Broekens</surname>
          </string-name>
,
<article-title>Negotiating agents</article-title>
,
<source>AI Magazine</source>
          <volume>33</volume>
          (
          <year>2012</year>
          )
          <fpage>79</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Aydoğan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Baarslag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fujita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gratch</surname>
          </string-name>
          , D. De Jonge,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mohammad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nakadai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Morinaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Osawa</surname>
          </string-name>
          , et al.,
<article-title>Challenges and main results of the automated negotiating agents competition (ANAC) 2019</article-title>
,
<source>in: Multi-Agent Systems and Agreement Technologies: 17th European Conference, EUMAS 2020, and 7th International Conference, AT 2020, Thessaloniki, Greece, September 14-15, 2020, Revised Selected Papers 17</source>
, Springer,
<year>2020</year>
, pp.
<fpage>366</fpage>
-
<lpage>381</lpage>
.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gratch</surname>
          </string-name>
,
<string-name>
<given-names>D.</given-names>
<surname>DeVault</surname>
</string-name>
,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Lucas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marsella</surname>
          </string-name>
          ,
          <article-title>Negotiation as a challenge problem for virtual humans</article-title>
          ,
          <source>in: Intelligent Virtual Agents: 15th International Conference, IVA</source>
          <year>2015</year>
          , Delft,
          <source>The Netherlands, August 26-28</source>
          ,
          <year>2015</year>
          , Proceedings 15, Springer,
          <year>2015</year>
          , pp.
          <fpage>201</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Brett</surname>
          </string-name>
          ,
          <article-title>Negotiating globally: How to negotiate deals, resolve disputes, and make decisions across cultural boundaries</article-title>
          , John Wiley &amp; Sons,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Van Kleef</surname>
          </string-name>
          ,
<string-name>
<given-names>C. K.</given-names>
<surname>De Dreu</surname>
</string-name>
,
<string-name>
<given-names>A. S.</given-names>
<surname>Manstead</surname>
</string-name>
          ,
<article-title>The interpersonal effects of anger and happiness in negotiations</article-title>
          .,
          <source>Journal of personality and social psychology 86</source>
          (
          <year>2004</year>
          )
          <fpage>57</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
<string-name>
<given-names>C. M.</given-names>
<surname>de Melo</surname>
</string-name>
,
<string-name>
<given-names>P.</given-names>
<surname>Carnevale</surname>
</string-name>
,
<string-name>
<given-names>J.</given-names>
<surname>Gratch</surname>
</string-name>
,
<article-title>The effect of expression of anger and happiness in computer agents on negotiations with humans</article-title>
          ,
          <source>in: The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume</source>
          <volume>3</volume>
          ,
          <year>2011</year>
          , pp.
          <fpage>937</fpage>
          -
          <lpage>944</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Pruitt</surname>
          </string-name>
          ,
          <article-title>Conflict escalation in organizations, in: The psychology of conflict and conflict management in organizations</article-title>
          , Psychology Press,
          <year>2007</year>
          , pp.
          <fpage>261</fpage>
          -
          <lpage>282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
<string-name>
<given-names>J. M.</given-names>
<surname>Brett</surname>
</string-name>
,
<string-name>
<given-names>D. L.</given-names>
<surname>Shapiro</surname>
</string-name>
,
<string-name>
<given-names>A. L.</given-names>
<surname>Lytle</surname>
</string-name>
          ,
          <article-title>Breaking the bonds of reciprocity in negotiations</article-title>
          ,
          <source>Academy of Management Journal</source>
          <volume>41</volume>
          (
          <year>1998</year>
          )
          <fpage>410</fpage>
          -
          <lpage>424</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Halperin</surname>
          </string-name>
          ,
          <article-title>Group-based hatred in intractable conflict in israel</article-title>
          ,
          <source>Journal of Conflict resolution 52</source>
          (
          <year>2008</year>
          )
          <fpage>713</fpage>
          -
          <lpage>736</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Hale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rakshit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Brett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gratch</surname>
          </string-name>
          ,
          <article-title>Kodis: A multicultural dispute resolution dialogue corpus, in: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers</article-title>
          ),
          <year>2025</year>
          , pp.
          <fpage>12771</fpage>
          -
          <lpage>12785</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G. L.</given-names>
            <surname>Welton</surname>
          </string-name>
          ,
<string-name>
<given-names>D. G.</given-names>
<surname>Pruitt</surname>
</string-name>
,
<article-title>The mediation process: The effects of mediator bias and disputant power</article-title>
          ,
          <source>Personality and Social Psychology Bulletin</source>
          <volume>13</volume>
          (
          <year>1987</year>
          )
          <fpage>123</fpage>
          -
          <lpage>133</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
<string-name>
<given-names>J. M.</given-names>
<surname>Wittmer</surname>
</string-name>
,
<string-name>
<given-names>P.</given-names>
<surname>Carnevale</surname>
</string-name>
,
<string-name>
<given-names>M. E.</given-names>
<surname>Walker</surname>
</string-name>
          ,
          <article-title>General alignment and overt support in biased mediation</article-title>
          ,
          <source>Journal of Conflict Resolution</source>
          <volume>35</volume>
          (
          <year>1991</year>
          )
          <fpage>594</fpage>
          -
          <lpage>610</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gino</surname>
          </string-name>
          ,
<string-name>
<given-names>M. I.</given-names>
<surname>Norton</surname>
</string-name>
          ,
<article-title>The surprising effectiveness of hostile mediators</article-title>
          ,
          <source>Management Science</source>
          <volume>63</volume>
          (
          <year>2017</year>
          )
          <fpage>1972</fpage>
          -
          <lpage>1992</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>C.</given-names>
            <surname>Picard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Siltanen</surname>
          </string-name>
          ,
          <article-title>Exploring the significance of emotion for mediation practice</article-title>
          ,
          <source>Conflict Resolution Quarterly</source>
          <volume>31</volume>
          (
          <year>2013</year>
          )
          <fpage>31</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <article-title>Emotion in mediation: Implications, applications, opportunities, and challenges, The Blackwell handbook of mediation: Bridging theory, research, and practice (</article-title>
          <year>2017</year>
          )
          <fpage>277</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Boland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. H.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <article-title>Emotional intelligence and dispute mediation in escalating and de-escalating situations</article-title>
          ,
          <source>Journal of Applied Social Psychology</source>
          <volume>40</volume>
          (
          <year>2010</year>
          )
          <fpage>3059</fpage>
          -
          <lpage>3105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>E.</given-names>
            <surname>Wulczyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Thain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dixon</surname>
          </string-name>
          ,
          <article-title>Ex machina: Personal attacks seen at scale</article-title>
          ,
          <source>in: Proceedings of the 26th international conference on world wide web</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1391</fpage>
          -
          <lpage>1399</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pavlopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Malakasiotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Androutsopoulos</surname>
          </string-name>
          ,
          <article-title>Deeper attention to abusive user content moderation</article-title>
          ,
          <source>in: Proceedings of the 2017 conference on empirical methods in natural language processing</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1125</fpage>
          -
          <lpage>1135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rizk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gratch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>May</surname>
          </string-name>
          ,
          <article-title>Can language model moderators improve the health of online discourse?</article-title>
          , in:
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bethard</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)</source>
          , Association for Computational Linguistics, Mexico City, Mexico,
          <year>2024</year>
          , pp.
          <fpage>7478</fpage>
          -
          <lpage>7496</lpage>
          . URL: https://aclanthology.org/2024.naacl-long.415. doi:10.18653/v1/2024.naacl-long.415.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Danescu-Niculescu-Mizil</surname>
          </string-name>
          ,
          <article-title>Trouble on the horizon: Forecasting the derailment of online conversations as they develop</article-title>
          , in:
          <string-name>
            <given-names>K.</given-names>
            <surname>Inui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wan</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          , Association for Computational Linguistics, Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>4743</fpage>
          -
          <lpage>4754</lpage>
          . URL: https://aclanthology.org/D19-1481/. doi:10.18653/v1/D19-1481.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Conversation modeling to predict derailment</article-title>
          ,
          <source>in: Proceedings of The International AAAI Conference on Web and Social Media</source>
          , volume
          <volume>17</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>926</fpage>
          -
          <lpage>935</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>V.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Carton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bhatnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Human-ai collaboration via conditional</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>1. Escalation of Conflict</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>2. Impasse</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>3. Miscommunication</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>4. Unreasonable demands</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          5.
          <string-name>
            <surname>NA</surname>
          </string-name>
          <article-title>(use when not recommending intervention)</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>