<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LLMediator: GPT-4 Assisted Online Dispute Resolution</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hannes Westermann</string-name>
          <email>hannes.westermann@umontreal.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaromir Savelka</string-name>
          <email>jsavelka@cs.cmu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karim Benyekhlef</string-name>
          <email>karim.benyekhlef@umontreal.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cyberjustice Laboratory, Faculté de droit, Université de Montréal</institution>
          ,
          <addr-line>Québec</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LLMediator</institution>
          ,
          <addr-line>GPT-4, Access to Justice, Large Language Models, Online Dispute Resolution, ODR, Augmented Intelligence</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Laboratory and implemented with the Quebec consumer</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>OPC, an ODR platform developed by the Cyberjustice</institution>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>School of Computer Science, Carnegie Mellon University</institution>
          ,
          <addr-line>Pittsburgh</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this article, we introduce LLMediator, an experimental platform designed to enhance online dispute resolution (ODR) by utilizing capabilities of state-of-the-art large language models (LLMs) such as GPT-4. In the context of high-volume, low-intensity legal disputes, alternative dispute resolution methods such as negotiation and mediation offer accessible and cooperative solutions for laypeople. These approaches can be carried out online on ODR platforms. LLMediator aims to improve the efficacy of such processes by leveraging GPT-4 to reformulate user messages, draft mediator responses, and potentially autonomously engage in the discussions. We present and discuss several features of LLMediator and conduct initial qualitative evaluations, demonstrating the potential for LLMs to support ODR and facilitate amicable settlements. The initial proof of concept is promising and opens up avenues for further research in AI-assisted negotiation and mediation.</p>
      </abstract>
      <kwd-group>
        <kwd>LLMediator</kwd>
        <kwd>GPT-4</kwd>
        <kwd>Access to Justice</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Online Dispute Resolution</kwd>
        <kwd>ODR</kwd>
        <kwd>Augmented Intelligence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>High-volume, low-intensity legal disputes (such as debt, consumer and employment issues) often affect laypeople, who may often not understand what legal rules apply to their situation, which can be an impediment to enforcing their rights [2]. Even if they are aware of their rights, they may struggle to come to a resolution (see, e.g., [3]). Bringing the dispute to court could entail significant monetary, temporal and psychological costs [4]. Unresolved legal issues can be very stressful and they may even impact individuals’ health. Unresolved legal issues were estimated to cost society 746 million dollars in Canada alone in 2014 [1]. Globally, the United Nations Task Force on Justice estimates that 1.5 billion individuals are unable to resolve their legal issues at any point in time [5].</p>
    </sec>
    <sec id="sec-2">
      <p>A popular way of increasing access to justice is the use of so-called alternative dispute resolution (ADR) methods, i.e., methods of solving disputes outside of the traditional judicial system [6]. Such methods include, e.g., negotiation or mediation. During negotiation, the parties communicate with each other, attempting to reach an amicable agreement. In mediation, a neutral third party is expected to establish a rapport with the parties [12].</p>
      <p>Workshop on Artificial Intelligence for Access to Justice (AI4AJ 2023).</p>
    </sec>
    <sec id="sec-3">
      <p>To foster further improvements in the usability and efficacy of ODR systems, we present LLMediator, an experimental platform focused on exploring how large language models (LLMs) such as GPT-4 may be used to support the parties and the mediator in arriving at a settlement. LLMs have recently risen to fame as very powerful AI systems. Models such as GPT-4 [13] or LLaMA [14] are able to perform various tasks based on a textual prompt, which is often not too far from instructions understandable to a human. The models can also be accessed via an application programming interface (API), which allows developers to integrate their capabilities into their own applications.</p>
    </sec>
    <sec id="sec-4">
      <title>LLMediator is an experimental system leveraging GPT</title>
    </sec>
    <sec id="sec-5">
      <title>4 to enhance an ODR platform. The system can refor</title>
      <p>mulate users’ messages to be, e.g., less emotional and
more conducive to reaching a settlement. Further,
LLof so-called alternative dispute resolution (ADR) meth- language models (LLMs) such as GPT-4 may be used to
(or facilitator), taking into account previous messages cations. Yu et al. applied GPT-3 to the COLIEE legal
by the parties and special instructions by the mediator. entailment task that is based on the Japanese Bar exam,
Finally, LLMediator ofers an experimental feature to au- substantially improving over the existing state-of-the-art
tonomously respond to the discussion by the parties. In [28]. Similarly, Bommarito II and Katz utilized GPT-3.5
this paper, we demonstrate selected features of LLMedi- for the Multistate Bar Examination [29]. The model
perator and perform initial qualitative evaluations of this formed surprisingly well but not well enough to pass
system. the exam. Later, Katz et al. applied GPT-4 to the
entire Uniform Bar Examination (UBE) and observed the
system passing the exam [30]. The use of GPT models
2. Related Work has further been explored in the context of the
assessment of trademark distinctiveness [31], legal reasoning
[32, 33], U.S. Supreme court judgment modeling [34] and
giving legal information [35]. Likewise, GPT-4 has been
used to annotate textual legal data based on annotation
guidelines [36, 37] and to generate explanations of legal
concepts based on sentences from statutes [38]. Here, we
use the models to increase access to justice, by supporting
parties in negotiation and mediation.</p>
      <p>In this work, the key uses we explore employ
GPT4 as an augmented intelligence tool, that supports the
users in their decision-making as opposed to directly
providing instructions or advice, which can decrease the
risk associated with such tools. Previously, this approach
has been explored in the legal domain to generate boolean
search rules [39], perform annotations more eficiently
[40], and provide legal information to laypeople [19, 41].</p>
    </sec>
    <sec id="sec-6">
      <title>Using artificial intelligence (AI) to increase access to jus</title>
      <p>tice is an often studied question. For example, systems
have been built that seek to help users understand their
rights in certain situations (see, e.g., [15, 16, 17, 18, 19]).</p>
      <p>In this work, we present an approach to facilitate dispute
resolution via augmented intelligence tools for
negotiation and mediation.</p>
      <p>Integrating AI into the negotiation process has
previously been explored by multiple authors. One possible
such integration is the display of a so-called “Best
Alternative to the Negotiated Agreement” (BATNA)
during negotiation, which can provide contextual
information helping the users find a fair agreement (see e.g.
[6, 20, 21, 22, 17]). Other approaches include using
gametheory methods to elicit an acceptable settlement (see,
e.g., [23, 24]), and by structuring the dialogue between
parties [25]. Here, we detect messages by the parties
containing inflammatory language, and suggest alterna- 3. Example use cases
tive ways of conveying the message, that may be more
amenable to amicable resolution. LLMediator is a web-based application that contains the</p>
      <p>Similarly, AI systems have been built to support medi- necessary features to explore and evaluate the integration
ation. Branting et al. developed a system that monitors of LLMs into the negotiation and mediation processes.
messages exchanged during negotiation for situations Upon entering the system, the user is exposed to a chat
that require an immediate human intervention, such as interface, that allows them to discuss their issue with the
conflict escalation or inactivity. The system can further other party, with the aim to come to an amicable solution,
suggest standard messages, that may be relevant to a par- as can be seen in Figure 1. In the experimental prototype,
ticular dispute, to the facilitator [10]. Researchers have the user can select whether they wish to intervene as one
further investigated the possibility of using ChatGPT to of the parties (John or Jane) or the mediator. This part of
suggest relevant questions and potential party priorities the platform is similar to other existing ODR platforms
to to support the mediator in their role [26]. In this work, that allow the parties to communicate in a convenient
we evaluate the ability of LLMs to generate suggestions and asynchronous manner without having to physically
for the mediator, or potentially even intervene in a ne- relocate.
gotiation autonomously, which could support mediators However, the LLMediator platform also has several
or even be used to provide certain mediation services novel aspects, that use the power of LLMs to support
in areas where traditional mediation is not feasible (e.g., various tasks in the mediation platform. These features
due to the costs). include:</p>
      <p>Huang and Ouyang proposed a technique for
developing a judge perspective generation model in the context
of ODR. The approach employs pre-trained LLMs, and is
adaptable to various scenarios through minimal low-code
ifne-tuning [ 27].</p>
      <p>Our work explores the use of LLM models, e.g., GPT-4,
to support legal tasks. There has been a growing interest
in exploring capabilities of GPT models in such
appli1. F1 - Suggesting reformulations of inflammatory
messages to be less emotional and, hence, more
conducive to finding an amicable settlement
2. F2 - Drafting message suggestions for mediators</p>
      <p>to more eficiently intervene in a discussion
3. F3 - Potentially, autonomously intervening in a
discussion to guide the parties towards an
amicable solution</p>
    </sec>
    <sec id="sec-7">
      <p>The technical implementation of these features is described in Section 4.</p>
    </sec>
    <sec id="sec-8">
      <title>The mediator can press a button in the interface to</title>
      <p>generate a draft intervention. This feature uses GPT-4 to
3.1. F1 - Reformulating inflammatory read the previously sent messages and draft a suggested
messages message that gently guides the parties toward a friendly
resolution. Figure 3 shows such a suggested intervention.</p>
      <p>Emotions can run high in negotiation situations. In Fig- This message may serve as a starting point for the
ure 2, the user intends to send a message that is inflam- mediator to decide how to intervene. They have the
matory, and may derail attempts to find a mutually ac- option to send the message, or edit it and then send it.
ceptable solution. The LLMediator platform detects this Further, they can provide additional instructions to
GPTlanguage, and ofers an alternative formulation to the 4, guiding the model to emphasize certain aspects. For
user, generated by GPT-4. The user can choose to send example, the mediator may guide the parties to consider
the original message, send the reformulated message di- the packaging of the camera. Figure 4 shows a screenshot
rectly, or edit and then send the reformulated message. of what happens when the mediator instructs the model
The suggestion may nudge the user toward stating their to explore this issue with the parties.
request in a more neutral fashion, which could be more
helpful for arriving at an amicable resolution to their
dispute.</p>
      <sec id="sec-8-1">
        <title>3.3. F3 - Autonomously intervening in the</title>
        <p>negotiation?</p>
      </sec>
      <sec id="sec-8-2">
        <title>3.2. F2 - Drafting messages for the mediator</title>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>When negotiation is in deadlock or does not result in</title>
      <p>a settlement, it may be useful for a neutral, third-party
mediator to intervene in the discussion. Let us consider
the role of the mediator to see how the LLMediator can
support them in the task of encouraging the parties to
arrive to a friendly resolution.</p>
    </sec>
    <sec id="sec-10">
      <title>In some situations, it may make sense to allow the model</title>
      <p>to automatically intervene in the negotiation. This could
be the case when the dispute value is too low for
employing a human mediator, or if there are not enough
mediators to attend to all disputes in a certain area.</p>
      <p>In Figure 5, one of the parties requested an
intervention by the LLMediator, to move the discussion forward.
The LLMediator autonomously generated a message and
sent it to the parties, suggesting a few possible options
to encourage a settlement. There may also be other
triggers to move the system to intervene, such as a period
of inactivity, or inflammatory messages being sent (see
Section 4.4.1). As will be discussed in Section 5.4, this
approach may carry with it certain risks that would need
to be carefulyl studied before it could be deployed.</p>
      <p>We have now seen the three current capabilities of the
LLMediator to support the parties in a mediation. As
we can see, the system suggests relevant and
contextadapted reformulations and interventions, that may
encourage the parties to use calmer language, or save
mediator’s time. Next, let us take a look at the technical
considerations behind the system.</p>
      <sec id="sec-10-1">
        <title>4.1. Large language model used</title>
        <p>For the current version of the system, we used GPT-4
(Generative Pre-trained Transformer). GPT-4 is a
multimodal pre-trained LLM created by OpenAI [13]. The
model has shown impressive performance on a wide
variety of tasks, including passing the Uniform Bar
Examination [30]. Some researchers even argue that the model
shows sparks of general intelligence [42]. We interacted
with the 8k token version of the model via the OpenAI
API.1</p>
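        <p>As a concrete illustration of how an application can talk to the model, the snippet below assembles a chat-completions request body of the kind sent to the OpenAI API. It is a minimal sketch: the helper name and the example prompt are our assumptions, and the actual HTTP call (e.g., via the openai Python package referenced in footnote 1) is omitted so the example has no external dependencies.</p>

```python
# Sketch: assemble a chat-completions request body for one system + one user
# turn. Only the payload is built here; sending it (e.g., with the `openai`
# package) is left out. The helper name and prompt text are illustrative.

def build_chat_request(system_prompt: str, user_content: str,
                       model: str = "gpt-4") -> dict:
    """Return a request body with a system instruction and a user message."""
    return {
        "model": model,  # the paper used the 8k-token GPT-4 variant
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    }

# Example: the shape of a request an LLMediator-style feature would send.
request = build_chat_request(
    "You are an ODR platform.",
    "Please reformulate my message.",
)
```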
      </sec>
      <sec id="sec-10-2">
        <title>4.2. F1 - Reformulating inflammatory messages</title>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>4. Technical considerations</title>
      <p>For the features described in the preceding section, a number of choices need to be considered. These involve questions such as when the features should be activated, how the LLM should be prompted to achieve the best results, and what to do with the resulting text. In this section, we will explore these considerations.</p>
      <p>The first capability of the LLMediator is the reformulation of inflammatory messages. The motivation behind this feature is that individuals may have a lot of emotions attached to their dispute, which could be reflected in the tone of the messages. Such messages may increase the tension and aggravation between the parties rather than decrease it, making it less likely that the parties will find an amicable agreement, which is the goal of the negotiation process.</p>
    </sec>
    <sec id="sec-12">
      <title>1GitHub: OpenAI Python Library. Available at: https://github.com/</title>
      <p>openai/openai-python [Accessed 2023-02-09]
immediate intervention was necessary, and
suggest standard message responses.
3. Using an of-the-shelf machine learning model.</p>
      <p>Sentiment detection is a well-studied field, with
many available models. These models could be
used to detect messages with a negative
polarity, or even to detect certain emotions (such as
angry or sad), and use thresholds to trigger an
intervention.
4. Using keyword search. The simplest approach in
detecting inflammatory words is simply to scan
the message sent by the user for inflammatory
words, such as “a**hole”.
5. The user may decide that they would like a
suggestion to reformulate a message, e.g., if they are
aware that their message may come across as
angry or defensive, and would like a suggestion as
to how they may express themselves in a more
neutral fashion.</p>
      <p>For the LLMediator platform, the main focus is
investigating whether the model is able to reformulate the
message in a useful way. Therefore, we chose options 4
and 5. The system scans the message for inflammatory
terms, and suggests a reformulation if any such terms
are found. Further, the user has the option to manually
activate the reformulation dialog. More sophisticated
methods to detect inflammatory messages will be
investigated in future work.</p>
      <sec id="sec-12-1">
        <title>4.2.2. Reformulating the message</title>
        <p>Once the inflammatory message has been detected, the
next step is to prepare a reformulated message with a
more constructive tone, while maintaining the core
meaning of the message. The message is thus sent to an LLM,
together with a prompt instructing the LLM to
reformulate the message. For the LLMediator, we are using
the GPT-4 API provided by OpenAI, together with the
following prompt:</p>
        <p>“You are an ODR platform. You are given the chat
message of a party. Reformulate the message to maintain the
content, but make it less confrontational and more
encouraging for an amicable settlement. Respond directly with
the reformulated message, do not explain.”</p>
        <p>In order to overcome the issue, the LLMediator
contains a feature that aims to reformulate the message of
the user to a more neutral tone, while preserving the
overall semantic content of the message. This way, the
parties are able to communicate their opinions, without
escalating the tone of the conversation.</p>
      </sec>
      <sec id="sec-12-2">
        <title>4.2.1. Detect a message requiring intervention</title>
        <p>In order to only intervene when necessary, the
LLMediator needs a way to detect whether a message contains
inflammatory language and would benefit from
reformulation. Thus, the first step is to classify the tone of the
message. It should be noted that the level of emotions and
anger in a message that should lead to an intervention is
a policy decision—diferent thresholds may lead to
diferent results. Further, one could imagine multiple methods
of detecting whether a message may be inflammatory.</p>
        <p>These methods include:
1. Contacting an LLM, such as GPT-4, with every
message and inquiring whether it is inflammatory
or not. Depending on the volume of messages
on a platform, this method may be expensive and 4.2.3. Showing the reformulation to the user
introduce latency into the platform, which may
be confusing for the user, since the message needs Once the inflammatory message has been detected and
to be analyzed before it is sent to the other party. reformulated, it needs to be re-introduced into the system
in a useful manner. Ways of doing this include:
2. Training a bespoke machine learning model to
detect whether a message warrants intervention.</p>
        <p>This approach was used by the authors in [10].</p>
        <p>The researchers trained a model on previous
messages from an ODR court, to detect whether an</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>1. The reformulated message could be sent directly</title>
      <p>to the other party.
2. The user could be provided with the reformulated
version of their message, and given the option to
send or edit the reformulated message, or send
their original message.</p>
      <p>For the LLMediator, we chose the second option, since
this preserves the control of the user over their
communication, while potentially nudging them toward a more
positive tone. The user is thus shown the reformulated
message, and has the option to either accept and send
the reformulated version, edit the reformulated version
and then send it, or to send the original message.</p>
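      <p>The detection and reformulation flow described above can be sketched as follows. This is a minimal illustration, not the paper’s implementation: the term list and helper names are our assumptions, while the prompt is the one quoted in Section 4.2.2 and the LLM call is abstracted behind a callable.</p>

```python
# Sketch of the F1 flow: scan an outgoing message for inflammatory terms
# (options 4 and 5 of Section 4.2.1) and, if one is found, ask the LLM for a
# reformulation that the user can accept, edit, or discard (Section 4.2.3).
from typing import Callable, Optional

REFORMULATION_PROMPT = (
    "You are an ODR platform. You are given the chat message of a party. "
    "Reformulate the message to maintain the content, but make it less "
    "confrontational and more encouraging for an amicable settlement. "
    "Respond directly with the reformulated message, do not explain."
)

# Illustrative list; a deployed system would use a curated lexicon or a classifier.
INFLAMMATORY_TERMS = {"idiot", "liar", "scam", "ridiculous"}

def needs_reformulation(message: str) -> bool:
    """Option 4: flag the message if it contains a known inflammatory term."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return bool(words & INFLAMMATORY_TERMS)

def suggest_reformulation(message: str,
                          llm_call: Callable[[str, str], str]) -> Optional[str]:
    """Return an LLM suggestion for flagged messages, or None to send as-is.

    `llm_call(system_prompt, user_content)` abstracts the GPT-4 API call;
    the user keeps the choice to send the original, the suggestion, or an edit.
    """
    if not needs_reformulation(message):
        return None
    return llm_call(REFORMULATION_PROMPT, message)
```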
      <sec id="sec-13-1">
        <title>4.2.4. Additional examples</title>
        <p>that a mediator may want to use, based on a discussion,
were previously explored in [10]. In our research, the
suggested messages are generated ad hoc using an LLM,
in our case the GPT-4 API provided by OpenAI. The
mediator can launch the draft feature by clicking a button
in their chat interface.</p>
      </sec>
      <sec id="sec-13-2">
        <title>4.3.1. Generating the message suggestion</title>
        <p>In order to generate the message, we use the following
prompt:</p>
        <p>“ You are a mediator. Your goal is to guide the
discussion of two parties towards an amicable settlement that is
acceptable to both parties. Respond to this communication
between the parties. Stick to the role of the mediator - do
not complete the dialog of the parties. Remain neutral, do
not take the side of any party. ”</p>
        <p>This prompt was arrived at after empirical
investigations. Further, we provide the model with a context of
the 10 latest messages in the conversation. In case the
mediator enters an additional instruction (see figure 4,
these instructions are also sent to the model. Based on
this context, the model can gain an understanding of the
discussion and hopefully provide a helpful draft message
to the mediator.</p>
      </sec>
      <sec id="sec-13-3">
        <title>4.3.2. Further examples</title>
        <p>Table 2 shows a number of examples where the
situation described above in section 3 was used, after an
additional follow-up by John inquiring why he should
be liable when the postal service is clearly at fault. The
table presents the suggested intervention by the model,
optionally together with additional instructions provided
by the mediator.</p>
        <sec id="sec-13-3-1">
          <title>4.4. F3 - Autonomously intervening in the negotiation?</title>
          <p>The most radical use of the LLMediator is the direct
intervention of the system in the negotiation. Here, the
system acts as a mediator itself. While this could be a
very interesting and powerful use-case, it also poses a
4.3. F2 - Drafting messages for the number of substantial risks, as will be explored in section
mediator 5.4. Thus, considerable studies would have to be
performed prior to the deployment of such a system. Here,
Another feature of the LLMediator is to provide a sug- it is presented as purely experimental feature.
gestion for an intervention that a mediator might want
to send to the parties of a dispute. In mediation, a third- 4.4.1. Triggers
party mediator aims to intervene in a discussion to guide
it toward a friendly settlement between the parties. Such For the LLMediator to be able to intervene, there needs to
an intervention can be very powerful and positive. be a trigger to determine when it should send a messages.</p>
          <p>In the LLMediator, we provide a feature that allows Some possibilities for such a trigger include:
the mediator to receive a suggestion for a message that
they may consider as an intervention in a discussion 1. Intervening when there has been a period of
inby the parties. The suggestion of pre-defined messages activity (compare [10])</p>
        </sec>
      </sec>
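      <p>The draft-generation step in Section 4.3.1 can be sketched as the assembly of a request: the quoted mediator prompt, a window of the 10 latest messages, and any additional instructions from the mediator. The plain-text transcript encoding below is our assumption; the paper does not specify the exact format.</p>

```python
# Sketch: build the chat messages for an F2 mediator draft (Section 4.3.1).
# The prompt is the one quoted in the paper; the formatting is illustrative.

MEDIATOR_PROMPT = (
    "You are a mediator. Your goal is to guide the discussion of two parties "
    "towards an amicable settlement that is acceptable to both parties. "
    "Respond to this communication between the parties. Stick to the role of "
    "the mediator - do not complete the dialog of the parties. Remain "
    "neutral, do not take the side of any party."
)

def build_draft_request(history, instructions=""):
    """Build chat messages for a draft intervention.

    history: (speaker, text) pairs, oldest first; only the 10 latest are sent.
    instructions: optional extra guidance entered by the mediator (Figure 4).
    """
    system = MEDIATOR_PROMPT + ((" " + instructions) if instructions else "")
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in history[-10:])
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]
```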
      <sec id="sec-13-4">
        <title>Mediator instructions</title>
        <p>None
Inquire whether there might be an insurance ofered by the
trading platform used
Ask the parties to clarify the model, value and state of the
sold good.</p>
      </sec>
      <sec id="sec-13-5">
        <title>Suggested intervention</title>
        <p>Thank you for expressing your concern, John. It’s clear
that the situation isn’t ideal for either party. One possible
solution could be to file a claim with the postal service to
seek reimbursement for the damaged camera. That way,
Jane can receive compensation for the broken camera and
the responsibility would shift to the postal service. Would
both of you be open to trying this approach to reach a
resolution?
John, I understand your concern. It might be possible that
the trading platform you have used for the transaction ofers
some form of insurance or buyer/seller protection. In order
to consider this as an option, could you please let us know
which platform you used for the transaction and if they
ofer anything in this regard? This might help both of you
reach a fair and amicable resolution.</p>
        <p>I understand your concerns, John. However, it’s important
to consider that part of the responsibility lies in the
packaging of the item to ensure its safe delivery. In order to
evaluate the options more fairly, could both of you please
provide more information about the camera, such as the
model and the estimated value, as well as its condition at
the time of the sale? This will allow us to further discuss the
possible solutions mentioned earlier and find a resolution
that both parties find satisfactory.</p>
      </sec>
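      <p>The triggers listed in Section 4.4.1 can be sketched as a simple predicate. The threshold values below are illustrative assumptions, and, as noted in the text, the current prototype only implements the explicit party request.</p>

```python
# Sketch of the four F3 triggers from Section 4.4.1: explicit party request,
# heated discussion, inactivity, and a fixed message cadence. The thresholds
# (24 hours, every 10 messages) are illustrative assumptions.

def should_intervene(seconds_since_last_message: float,
                     is_heated: bool,
                     message_count: int,
                     party_requested: bool,
                     inactivity_limit: float = 24 * 3600,
                     cadence: int = 10) -> bool:
    """Return True if any trigger from Section 4.4.1 fires."""
    if party_requested:                                     # 4: explicit request
        return True
    if is_heated:                                           # 2: heated discussion
        return True
    if seconds_since_last_message > inactivity_limit:       # 1: inactivity
        return True
    if message_count > 0 and message_count % cadence == 0:  # 3: every N messages
        return True
    return False
```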
      <sec id="sec-13-6">
        <title>4.4.2. Generating the message</title>
        <p>Since the focus of the LLMediator is to evaluate the
capability of the LLM models to interact with the user in
a useful way, we have currently only implemented the
last option.</p>
        <p>2. Intervening when the discussions between the the messages generated by the system.
parties are getting heated (compare [10] and
section 4.2.1).
3. Intervening every few messages (e.g., every 10 5. Discussion</p>
        <p>messages sent) to shape and guide the discussion. In this paper, we have demonstrated selected capabilities
4. Intervening when asked to by either of the par- of LLMediator, an experimental platform to explore the
ties. use of LLMs in an ODR context. The platform currently
ofers several features, including (F1) the reformulation
of messages to a more neutral tone, (F2) the assisting
of the mediators by providing suggested interventions,
and (F3) the optional direct intervention of the system in
the negotiation. Let us discuss selected aspects of these
diferent features, including their potential and possible
risks.</p>
        <p>Once it has been determined that the system should
intervene, the previous messages are sent to the LLM, which
will generate a response that can be provided by the
system to the user. In order to generate the message, the
same procedure as described above in section 4.3.1 is
used. However, in this case, the message is not given
to the human mediator, but rather sent directly to the
parties in the conversation.</p>
      </sec>
      <sec id="sec-13-7">
        <title>4.4.3. Additional examples</title>
      </sec>
    </sec>
    <sec id="sec-14">
      <p>To better understand the types of messages suggested by the LLMediator, Table 3 lists a few example disputes, and the messages generated by the system.</p>
      <sec id="sec-14-1">
        <title>5.1. The large language models</title>
      </sec>
    </sec>
    <sec id="sec-15">
      <p>An important takeaway of this research is the power of GPT-4. We use the model to reformulate messages to be less inflammatory, and to suggest or directly send interventions to negotiating parties.</p>
      <p>The capability of the model to accomplish these tasks is very promising. While a more in-depth evaluation is required to better understand the different failure modes and to fully understand the performance, the model seems to be capable of understanding a wide variety of contexts.</p>
      <sec id="sec-15-1">
        <title>Dispute summary</title>
        <p>A tenant (Jane) wants specific performance and damages for
a water leak that they supposedly informed their landlord
(John) of 5 weeks ago, claiming the landlord did not respond.
The landlord claims that they were never informed.
A person (John) wants damages for lost wages, after falling
on a road in front of the other parties apartment. The other
party (Jane) claims that they did clear the snow that day.
John borrowed 400 CAD from Jane to repay some credit
card debt, but became unemployed and is now unable to
repay. Jane wants the money back with interest. Multiple
messages have been exchanged back and forth.
Jane ordered seeds from Johns website, but never received
them. John claims to habe no record of the order. A heated
conversation ensues.</p>
      </sec>
      <sec id="sec-15-2">
        <title>Generated intervention</title>
        <p>As a mediator, I would like to help Jane and John resolve
this issue. It appears there may be a misunderstanding
about the communication taken place. Firstly, let’s try to
establish the facts. Jane, could you please provide more
information about when and how you informed John about
the water leak? And John, is there any possibility that you
might have missed or overlooked this communication? Let’s
work together to find a fair and acceptable solution for both
parties.</p>
        <p>As your mediator, I understand that both of you have
concerns and perspectives on this issue. John, you experienced
an accident that resulted in lost wages due to the snow
outside Jane’s apartment. Jane, you claim that you had
cleared the snow earlier that day. To move forward, let’s
first establish the extent of responsibility each party has in
this situation. This includes discussing the circumstances
of the accident further and any relevant information, such
as local laws or regulations regarding snow removal. Would
you both be willing to discuss in more detail the specifics of
the incident and the snow removal practices at Jane’s
apartment? This way we can better understand the situation and
work towards an amicable agreement.</p>
        <p>Thank you for providing more context about the situation.</p>
        <p>It seems like the initial agreement was informal and based
on John’s promise to repay when he got his next paycheck.</p>
        <p>However, John is currently unemployed, which makes the
repayment more challenging. As a suggestion, would both
of you be open to discussing a repayment plan that takes
John’s current financial situation into consideration without
burdening Jane? This could include re-assessing the interest
or agreeing on a feasible timeframe to repay the debt.</p>
        <p>As the mediator in this situation, I would like to remind both
parties to remain respectful during this discussion. Jane,
I understand that you have concerns regarding the status
of your order, and John, I hear that you have no record of
the transaction. Let’s try to work together to identify and
resolve the issue. Jane, would you mind providing any
evidence or details regarding your order, such as a transaction
ID, order number, or a confirmation email? This will help
John to verify your order in his system. John, please be
patient while we gather this information, and once we have it,
I kindly ask you to look into your system to confirm Jane’s
order. By acting in a respectful and cooperative manner we
can work on finding a solution that satisfies both parties.
</p>
        <p>contexts, and providing relevant and appropriate responses, clearly taking into account the provided instructions and user messages. When reformulating a message, the model appears to maintain the important elements of a message, while at the same time changing the tone. When drafting messages for the mediator, the model has no problem understanding the messages between the parties, and generating a relevant response.</p>
        <p>It is interesting how much of these capabilities are emergent behaviour from the model. The prompts we provided are very short, and an interesting approach for future work would be to examine the impact of further “tuning” these prompts to achieve better results. However, even so, the model seems to have a remarkable behaviour that corresponds well with what it is expected to do. For example, as we can see in Section 4.3.1, the model is essentially only told that it should act as a mediator and encourage the parties to come to a settlement. Based on this simple prompt, the model uses a number of interesting and relevant techniques, such as establishing the facts when they are unclear, reminding the parties to be respectful if necessary, and even suggesting relevant solutions to the parties. It seems likely that the model, through being trained on billions of pages of text by OpenAI, has learnt what kind of interventions a mediator may undertake, and even which solutions could be appropriate for different kinds of disputes.</p>
        <p>We want to highlight the fact that all of these tasks were accomplished without any training on our part whatsoever. Previously, a model able to read a message and provide a reformulated message, or draft context-specific interventions, would likely have required a dataset of thousands of existing examples. Even then, it is questionable whether the performance would reach that of the GPT-4 model used in this case, as these are very complex tasks, especially considering the variety of different domains the model seems to be able to handle. In a domain such as law, where training data is often difficult to create, finding useful ways to leverage such models may lead to very effective approaches.</p>
        <p>At the same time, the models have limitations. For example, they are sometimes prone to hallucination, i.e., making up untruthful information. While this is not something that we witnessed in our experiments, depending on the use case, this may involve certain risks. Further, the texts given by the model may not be accurate. For example, in the final example in Table 2, the suggested intervention posits that the seller is responsible for the packaging of the product. This could be seen as giving an opinion, and may not be desirable in a mediation context.</p>
        <p>To overcome these potential risks, the LLMediator primarily employs the LLMs to generate suggestions that have to be reviewed by a person before being sent out. Let us discuss the different use-cases for LLMs in the system.</p>
      </sec>
    </sec>
    <sec id="sec-16">
      <title>5.2. F1 - The reformulation of inflammatory messages</title>
      <p>The first LLM-enhanced feature of the LLMediator is the automatic reformulation of messages, to make the messages less inflammatory. This could help maintain the discussion at a positive and neutral tone, and increase the likelihood of the parties finding an amicable settlement. This feature is based on detecting inflammatory messages and then making a request to GPT-4 with the message and a prompt telling the LLM to reformulate it using a more neutral tone.</p>
      <p>We discuss two ways this message could then be used by the system in section 4.2.3. The first potential way is to automatically reformulate the message, and send the reformulated version to the other party directly. However, this approach has potential negative effects. Any inaccuracies in the reformulated message may lead to misunderstandings, and the sending party may be frustrated that they are not allowed to express themselves freely. Further, it raises interesting questions with regards to self-expression—is it really fair to put words in someone’s mouth in this manner?</p>
      <p>Similar points were discussed in the context of a study that investigated the introduction of a limit on the number of messages that can be sent in a certain time period for individuals discussing controversial topics. Experiment participants did indicate that the limitation made the messages written more intentional. However, they also expressed frustration regarding the imposed limit and its impact on building rapport between the parties [43].</p>
      <p>Instead, we chose a more feasible and less risky approach, of merely showing the reformulated message to the user, and giving them the choice of which message to send. Here, the AI system acts as an augmented intelligence that can help the parties convey their meaning in a more neutral way. Seeing a notification by the system telling the user that their message may contain inflammatory language could nudge them towards reconsidering their message. Such an approach has previously been tried by platforms such as Twitter, which found that users that were prompted to reconsider harmful or offensive tweet replies were 34% likely to reformulate their reply, or not send it at all [44].</p>
      <p>Our approach uses LLMs to additionally suggest a reformulated version to the user. This could be a very powerful way to further encourage them to write their message using a more acceptable tone, by making a concrete suggestion, which requires only the push of a button to send. However, the user is always in charge, and thus able to correct any mistakes in the message, or update it to more closely align with what they want to say, while retaining the more neutral tone.</p>
      <p>Thus, this system has a lot of potential in supporting the parties to settle their disputes in a friendly and efficient manner. At the same time, as an augmented intelligence system, the risks of hallucinations are relatively low—the system supports the user by suggesting a reformulation, but does not force them to write their message in a certain way. While a deployment of such a system to a real-world ODR platform would, of course, require more empirical evaluation of the risks of such a system, the framing as an augmented intelligence system significantly reduces the risk of such a deployment.</p>
    </sec>
    <sec>
      <title>5.3. F2 - Drafting messages for the mediator</title>
      <p>The second LLM-enhanced feature of the LLMediator is the drafting of a possible intervention for a mediator. In order to utilize this feature, the mediator can initiate a request for a suggestion, which asks the LLM to draft a message based on the previous messages exchanged by the parties. This message is then shown to the mediator, who has the option of editing the suggestion, or sending it to the parties. Further, they have the option to provide additional instructions to the model, which are taken into account by the LLM in drafting a suggestion.</p>
      <p>In this task, the model has significantly more freedom than in F1. We are not asking it simply to reformulate a message in a new tone, but to autonomously decide how to respond to the messages of the parties. The responses we examined do this by dynamically adapting to the conversation between the parties, e.g., by calming the discussion, trying to establish key facts, or even suggesting possible compromises that the parties may consider. None of these approaches were part of the prompt—rather, they were chosen by the model itself, when told to mediate the discussion. This kind of emergent capability is very interesting, and an impressive demonstration of the power of GPT-4 to perform different tasks.</p>
      <p>At the same time, just like F1, F2 is based on the concept of augmented intelligence. The system does not prescribe to the mediator what they should do, but rather offers a draft of the intervention message. The mediator can use the draft directly, or change it if they do not think it is appropriate. They can also give additional instructions to the model, which will be incorporated in the prompt. As can be seen in Table 2, the model has an impressive ability to take into account both of these instructions and the previous messaging context.</p>
      <p>Supporting the mediator in this way could be very important in areas where sufficient numbers of trained mediators or facilitators are not available (compare [10]). Verifying that a message is appropriate and helpful could be faster for an experienced mediator than drafting messages from scratch. Thus, fewer mediators could mediate more disputes, which could be an important contribution to access to justice.</p>
      <p>At the same time, it is important to be aware of the potential risks of deploying such a system. The interaction between the mediator and the LLM should be closely studied to make sure that the results are not harmful. For example, perhaps a suggested message could anchor a mediator to a certain type of message, even if another approach could have been more helpful. Likewise, the mediator may start to overly rely on the LLM-provided message to understand the previous messages written by the parties, which could be risky if the model starts to hallucinate. Finally, the assumption underlying the AI suggestion feature is that it is faster for a mediator to review and/or edit a suggested message than drafting a new message—empirical studies are needed to confirm whether this is the case.</p>
    </sec>
    <sec>
      <title>5.4. F3 - Autonomously intervening in the negotiation?</title>
      <p>The most radical use of the LLMediator is the direct intervention of the system in the negotiation. Here, the system acts as a mediator directly. Just like in F2, a message is drafted. However, instead of sending the message as a suggestion to the mediator, the messages are now sent directly to the parties.</p>
      <p>This approach carries with it substantial risks—since under this use, no humans verify the message before it goes to the user, any potential biased or inaccurate information it contains may have an effect on the negotiation between the parties. An example of this can be seen in Table 2 in the final example, where the LLM seems to indicate that one of the parties was responsible for packaging the product safely. In doing so, it could be seen to take the side of one of the parties, which may not be desirable for a mediator.</p>
      <p>Of course, the messages would very clearly indicate that they are written by an AI-based system, and the parties are free to disregard the messages of the mediator—they are, after all, a neutral third party aiming to help the parties, and not an adjudicator. However, even so, biased or inaccurate messages may lead the parties to lose trust in the platform, or even influence the negotiation towards an unfair outcome. Thus, considerable studies would have to be performed before the deployment of such a system.</p>
      <p>However, if the model can be prompted and deployed in a manner that is sufficiently safe, this approach could have tremendous implications for access to justice. Every single dispute, no matter how small, could be given a personalized, always available mediator. This could enable a form of mediation to be used in areas where disputes are too small to warrant a mediator intervention, or there are simply not enough mediators available. If these types of interventions could be shown to increase the rate of settlement between parties, this could be a significant contribution to society, by potentially creating a new way for individuals to resolve their legal problems.</p>
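<p>To make the described prompt flow concrete, the following Python sketch illustrates how the reformulation (F1) and intervention-drafting (F2) requests could be wired to an LLM. The prompt wording, the function names, and the injected llm callable are our own illustrative assumptions, not the actual LLMediator implementation; the fake_llm stub stands in for a real GPT-4 API call.</p>

```python
from typing import Callable, List, Tuple

# Illustrative prompts only -- assumed for this sketch, not the
# prompts actually used by the LLMediator system.
REFORMULATE_PROMPT = (
    "You will receive a message sent during a dispute negotiation. "
    "Reformulate it using a neutral, non-inflammatory tone, while "
    "keeping all important factual elements of the original message."
)

MEDIATE_PROMPT = (
    "You are a mediator in the dispute below. Encourage the parties "
    "to come to a settlement."
)

def reformulate_message(message: str, llm: Callable[[str, str], str]) -> str:
    """F1: ask the LLM to redraft one party's message in a neutral tone.

    The suggestion is shown to the author, who decides whether to send
    the original or the reformulated version (augmented intelligence).
    """
    return llm(REFORMULATE_PROMPT, message)

def draft_intervention(history: List[Tuple[str, str]],
                       llm: Callable[[str, str], str],
                       extra_instructions: str = "") -> str:
    """F2: draft a mediator intervention from the message history.

    Optional mediator instructions are appended to the prompt; the
    draft is reviewed and possibly edited by the human mediator.
    """
    system = MEDIATE_PROMPT
    if extra_instructions:
        system += " Additional instructions: " + extra_instructions
    transcript = "\n".join(f"{who}: {text}" for who, text in history)
    return llm(system, transcript)

# Stand-in for a GPT-4 API call, so the sketch runs offline.
def fake_llm(system_prompt: str, user_content: str) -> str:
    return f"[{system_prompt[:12]}...] {user_content[:20]}..."
```

<p>Under F3, the same drafted intervention would be posted directly to the parties rather than first being surfaced to the human mediator.</p>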
    </sec>
    <sec id="sec-17">
      <title>6. Future Work</title>
      <p>This work lays the groundwork for a number of important research directions.</p>
      <p>First, an important avenue is the empirical evaluation of the presented system. Understanding the actual efficacy of the system in dealing with conflicts is crucial in understanding whether the generated reformulations and interventions are relevant and useful, how they may be adapted, and how such a system may eventually be deployed. Such an evaluation might be done with experienced mediators who could assess the quality of the messages, and students taking the roles of parties in a negotiation. The evaluation should also examine whether the system presents any biases.</p>
      <p>Another avenue to explore is the further improvement and adaptation of the system. For example, while the prompts appear to work well, so-called “prompt engineering” may be an important step towards further improving the results, by altering the prompt sent to the LLM. There are also multiple features described in this paper that are not yet fully developed, such as the function to trigger a reformulation suggestion (see section 4.2.1) and the function to trigger an intervention by the AI-mediator (see 4.4.1). These are important features which need to be explored and elaborated.</p>
      <p>Finally, there may be other interesting ways of integrating LLMs into an ODR platform. For example, such a system could summarize the conversation held between the parties for the mediator or adjudicator. Further building on the LLMediator to explore such use-cases will be an important avenue for future work.</p>
    </sec>
    <sec>
      <title>7. Conclusion</title>
      <p>We presented the LLMediator, an experimental platform built to support negotiation and mediation using LLMs, such as GPT-4. Currently, the system is able to reformulate inflammatory messages, and suggest possible interventions to the mediator, or even send them directly to the parties. We show a few examples of the system being used in fictitious disputes, and discuss its efficacy. Overall, the approach seems promising and opens up the door for a lot of further research.</p>
    </sec>
    <sec>
      <title>Acknowledgments</title>
      <p>Hannes Westermann and Karim Benyekhlef would like to thank the Cyberjustice Laboratory at Université de Montréal, the LexUM Chair on Legal Information and the Autonomy through Cyberjustice Technologies (ACT) project for their support of this research.</p>
    </sec>
    <sec>
      <title>References</title>
      <p>[1] T. C. Farrow, A. Currie, N. Aylwin, L. Jacobs, D. Northrup, L. Moore, Everyday legal problems and the cost of justice in canada: Overview report, Osgoode Legal Studies Research Paper (2016).</p>
      <p>[2] A. Currie, The legal problems of everyday life, in: Access to justice, Emerald Group Publishing Limited, 2009.</p>
      <p>[3] L. Savage, S. McDonald, Experiences of serious problems or disputes in the canadian provinces, 2021, Juristat: Canadian Centre for Justice Statistics (2022) 1–28.</p>
      <p>[4] N. Semple, The cost of seeking civil justice in canada, Can. B. Rev. 93 (2015) 639.</p>
      <p>[5] Justice For All - Final Report, Technical Report, The Task Force on Justice, New York: Center on International Cooperation, 2019.</p>
      <p>[6] D. Carneiro, P. Novais, F. Andrade, J. Zeleznikow, J. Neves, Online dispute resolution: an artificial intelligence perspective, Artificial Intelligence Review 41 (2014) 211–240.</p>
      <p>[7] H. M. Erichson, The dark side of consensus and creativity: What mediators of mass disputes need to know about agency risks, Fordham L. Rev. 88 (2019) 2155.</p>
      <p>[8] S. R. Cole, K. M. Blankley, Online mediation: Where we have been, where we are now, and where we should be, U. Tol. L. Rev. 38 (2006) 193.</p>
      <p>[9] K. Benyekhlef, F. Gélinas, Online dispute resolution, Lex Electronica 10 (2005) i.</p>
      <p>[10] K. Branting, S. McLeod, S. Howell, B. Weiss, B. Profitt, J. Tanner, I. Gross, D. Shin, A computational model of facilitation in online dispute resolution, Artificial Intelligence and Law (2022) 1–26.</p>
      <p>[11] A. J. Schmitz, Measuring “access to justice” in the rush to digitize, Fordham L. Rev. 88 (2019) 2381.</p>
      <p>[12] K. Benyekhlef, J. Zhu, At the intersection of odr and artificial intelligence: Traditional justice at the crossroads, Lex Electronica 25 (2020) 34.</p>
      <p>[13] OpenAI, Gpt-4 technical report, 2023. arXiv:2303.08774.</p>
      <p>[14] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, Llama: Open and efficient foundation language models, 2023. arXiv:2302.13971.</p>
      <p>[15] L. K. Branting, Advisory systems for pro se litigants, in: Proceedings of the 8th international conference on Artificial intelligence and law, 2001, pp. 139–146.</p>
      <p>[16] D. Thompson, Creating new pathways to justice using simple artificial intelligence and online dispute resolution, IJODR 2 (2015) 4.</p>
      <p>[17] E. A. Bickel, M. A. van Dijk, E. Giebels, Online legal advice and conflict support: A dutch experience, Report, University of Twente (2015).</p>
      <p>[18] L.-C. Paquin, F. Blanchard, C. Thomasset, Loge–expert: from a legal expert system to an information system for non-lawyers, in: ICAIL 1991, 1991, pp. 254–259.</p>
      <p>[19] H. Westermann, K. Benyekhlef, Justicebot: A methodology for building augmented intelligence tools for laypeople to increase access to justice, in: Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law, 2023.</p>
      <p>[20] J. Zeleznikow, Can artificial intelligence and online dispute resolution enhance efficiency and effectiveness in courts, in: IJCA, volume 8, HeinOnline, 2016, p. 30.</p>
      <p>[21] R. Susskind, Online courts and the future of justice (2019).</p>
      <p>[22] K. Benyekhlef, J. Zhu, Intelligence artificielle et justice: justice prédictive, conflits de basse intensité et données massives, Intelligence 30 (2018).</p>
      <p>[23] E. Bellucci, J. Zeleznikow, Developing negotiation decision support systems that support mediators: a case study of the family_winner system, AI &amp; L. 13 (2005) 233.</p>
      <p>[24] A. Lodder, E. Thiessen, The role of artificial intelligence in online dispute resolution, in: Workshop on online dispute resolution at the international conference on artificial intelligence and law, Edinburgh, UK, 2003.</p>
      <p>[25] L. Kistemaker, Rechtwijzer and uitelkaar.nl. dutch experiences with odr for divorce, Family Court Review 59 (2021) 232–243.</p>
      <p>[26] R. Bergman, Chatgpt and mediation, 2023. URL: https://mediate.com/chatgpt-and-mediation/.</p>
      <p>[27] Q. Huang, W. Ouyang, Generate judge-view of online dispute resolution based on pretrained-model method, in: Intelligent Computing Methodologies: 18th International Conference, ICIC 2022, Xi’an, China, August 7–11, 2022, Proceedings, Part III, Springer, 2022, pp. 162–169.</p>
      <p>[28] F. Yu, L. Quartey, F. Schilder, Legal prompting: Teaching a language model to think like a lawyer, 2022. URL: https://arxiv.org/abs/2212.01326. doi:10.48550/ARXIV.2212.01326.</p>
      <p>[29] M. Bommarito II, D. M. Katz, Gpt takes the bar exam, arXiv preprint arXiv:2212.14402 (2022).</p>
      <p>[30] D. M. Katz, M. J. Bommarito, S. Gao, P. Arredondo, Gpt-4 passes the bar exam, Available at SSRN 4389233 (2023).</p>
      <p>[31] J. Goodhue, Y. Wei, Classification of trademark distinctiveness using openai gpt 3.5 model, Available at SSRN 4351998 (2023).</p>
      <p>[32] A. Blair-Stanek, N. Holzenberger, B. Van Durme, Can gpt-3 perform statutory reasoning?, arXiv preprint arXiv:2302.06100 (2023).</p>
      <p>[33] H.-T. Nguyen, R. Goebel, F. Toni, K. Stathis, K. Satoh, How well do sota legal reasoning models support abductive reasoning?, arXiv preprint arXiv:2304.06912 (2023).</p>
      <p>[34] S. Hamilton, Blind judgement: Agent-based supreme court modelling with gpt, arXiv preprint arXiv:2301.05327 (2023).</p>
      <p>[35] J. Tan, H. Westermann, K. Benyekhlef, Chatgpt as an artificial lawyer?, in: Artificial Intelligence for Access to Justice (AI4AJ 2023), 2023.</p>
      <p>[36] J. Savelka, K. Ashley, M. Gray, H. Westermann, H. Xu, Can gpt-4 support analysis of textual data in tasks requiring highly specialized domain expertise?, in: ASAIL’23: 6th Workshop on Automated Semantic Analysis of Information in Legal Text, 2023.</p>
      <p>[37] J. Savelka, Unlocking practical applications in legal domain: Evaluation of gpt for zero-shot semantic annotation of legal texts, arXiv preprint arXiv:2305.04417 (2023).</p>
      <p>[38] J. Savelka, K. Ashley, M. Gray, H. Westermann, H. Xu, Explaining legal concepts with augmented large language models (gpt-4), in: AI4Legs 2023: AI for Legislation, 2023.</p>
      <p>[39] H. Westermann, J. Savelka, V. R. Walker, K. D. Ashley, K. Benyekhlef, Computer-assisted creation of boolean search rules for text classification in the legal domain, in: JURIX, 2019, pp. 123–132.</p>
      <p>[40] H. Westermann, J. Savelka, V. R. Walker, K. D. Ashley, K. Benyekhlef, Sentence embeddings and high-speed similarity search for fast computer assisted annotation of legal documents, in: Legal Knowledge and Information Systems: JURIX 2020: The Thirty-third Annual Conference, Brno, Czech Republic, December 9–11, 2020, volume 334, IOS Press, 2020, p. 164.</p>
      <p>[41] H. Westermann, S. Meeùs, M. Godet, A. Troussel, J. Tan, J. Savelka, K. Benyekhlef, Bridging the gap: Mapping layperson narratives to legal issues with language models, in: ASAIL’23: 6th Workshop on Automated Semantic Analysis of Information in Legal Text, 2023.</p>
      <p>[42] S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg, et al., Sparks of artificial general intelligence: Early experiments with gpt-4, arXiv preprint arXiv:2303.12712 (2023).</p>
      <p>[43] T. W. Masrani, J. Jamieson, N. Yamashita, H. A. He, Slowing it down: Towards facilitating interpersonal mindfulness in online polarizing conversations over social media, Proceedings of the ACM on Human-Computer Interaction 7 (2023) 1–27.</p>
      <p>[44] A. Butler, A. Parrella, Tweeting with consideration, 2021. URL: https://blog.twitter.com/en_us/topics/product/2021/tweeting-with-consideration.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>