LLMediator: GPT-4 Assisted Online Dispute Resolution

Hannes Westermann1,*, Jaromir Savelka2 and Karim Benyekhlef1
1 Cyberjustice Laboratory, Faculté de droit, Université de Montréal, Québec, Canada
2 School of Computer Science, Carnegie Mellon University, Pittsburgh, USA

Abstract
In this article, we introduce LLMediator, an experimental platform designed to enhance online dispute resolution (ODR) by utilizing capabilities of state-of-the-art large language models (LLMs) such as GPT-4. In the context of high-volume, low-intensity legal disputes, alternative dispute resolution methods such as negotiation and mediation offer accessible and cooperative solutions for laypeople. These approaches can be carried out online on ODR platforms. LLMediator aims to improve the efficacy of such processes by leveraging GPT-4 to reformulate user messages, draft mediator responses, and potentially autonomously engage in the discussions. We present and discuss several features of LLMediator and conduct initial qualitative evaluations, demonstrating the potential for LLMs to support ODR and facilitate amicable settlements. The initial proof of concept is promising and opens up avenues for further research in AI-assisted negotiation and mediation.

Keywords
LLMediator, GPT-4, Access to Justice, Large Language Models, Online Dispute Resolution, ODR, Augmented Intelligence

Workshop on Artificial Intelligence for Access to Justice (AI4AJ 2023), June 19, 2023, Braga, Portugal.
* Corresponding author.
hannes.westermann@umontreal.ca (H. Westermann); jsavelka@cs.cmu.edu (J. Savelka); karim.benyekhlef@umontreal.ca (K. Benyekhlef)
ORCID: 0000-0002-4527-7316 (H. Westermann); 0000-0002-3674-5456 (J. Savelka); 0000-0001-9390-556X (K. Benyekhlef)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Many individuals face difficulties while resolving their legal disputes. This is especially the case in areas of high-volume, low-intensity disputes (such as debt, consumer and employment issues), which often affect laypeople, i.e., individuals without legal training [1]. Laypeople may often not understand what legal rules apply to their situation, which can be an impediment to enforcing their rights [2]. Even if they are aware of their rights, they may struggle to come to a resolution (see, e.g., [3]). Bringing the dispute to court could entail significant monetary, temporal and psychological costs [4]. Unresolved legal issues can be very stressful and may even impact individuals' health. Unresolved legal issues were estimated to cost society 746 million dollars in Canada alone in 2014 [1]. Globally, the United Nations Task Force on Justice estimates that 1.5 billion individuals are unable to resolve their legal issues at any point in time [5].

A popular way of increasing access to justice is the use of so-called alternative dispute resolution (ADR) methods, i.e., methods of solving disputes outside of the traditional judicial system [6]. Such methods include, e.g., negotiation and mediation. During negotiation, the parties communicate with each other, attempting to reach an amicable agreement. In mediation, a neutral third party is expected to establish a rapport with the parties, to bring creativity in thinking about potential solutions, and to identify overlapping interests between the parties that may be helpful in achieving a consensus between them [7, 8]. These processes can be carried out online, in so-called online dispute resolution (ODR) [9]. This may have several advantages, including being more efficient and convenient for the individuals, who do not need to physically attend a court. The cooperative and non-adversarial nature of these approaches to dispute resolution may help the parties devise an amicable solution that both parties feel ownership of [6, 9, 10, 11]. ODR can be a very powerful approach—e.g., in PARLe-OPC, an ODR platform developed by the Cyberjustice Laboratory and implemented with the Quebec consumer protection authority, 70% of cases are settled, 45% at the stage of negotiation and 25% at the stage of mediation [12].

To foster further improvements in the usability and efficacy of ODR systems, we present LLMediator—an experimental platform focused on exploring how large language models (LLMs) such as GPT-4 may be used to support the parties and the mediator in arriving at a settlement. LLMs have recently risen to fame as very powerful AI systems. Models such as GPT-4 [13] or LLaMA [14] are able to perform various tasks based on a textual prompt, which is often not too far from instructions understandable to a human. The models can also be accessed via an application programming interface (API), which allows developers to integrate the capabilities into their own applications.

LLMediator is an experimental system leveraging GPT-4 to enhance an ODR platform. The system can reformulate users' messages to be, e.g., less emotional and more conducive to reaching a settlement. Further, LLMediator can suggest draft messages for the mediator (or facilitator), taking into account previous messages by the parties and special instructions by the mediator. Finally, LLMediator offers an experimental feature to autonomously respond to the discussion by the parties. In this paper, we demonstrate selected features of LLMediator and perform initial qualitative evaluations of this system.

2. Related Work

Using artificial intelligence (AI) to increase access to justice is an often studied question. For example, systems have been built that seek to help users understand their rights in certain situations (see, e.g., [15, 16, 17, 18, 19]). In this work, we present an approach to facilitate dispute resolution via augmented intelligence tools for negotiation and mediation.

Integrating AI into the negotiation process has previously been explored by multiple authors. One possible such integration is the display of a so-called "Best Alternative to the Negotiated Agreement" (BATNA) during negotiation, which can provide contextual information helping the users find a fair agreement (see, e.g., [6, 20, 21, 22, 17]). Other approaches include using game-theory methods to elicit an acceptable settlement (see, e.g., [23, 24]), and structuring the dialogue between parties [25]. Here, we detect messages by the parties containing inflammatory language, and suggest alternative ways of conveying the message that may be more amenable to amicable resolution.

Similarly, AI systems have been built to support mediation. Branting et al. developed a system that monitors messages exchanged during negotiation for situations that require an immediate human intervention, such as conflict escalation or inactivity. The system can further suggest standard messages, that may be relevant to a particular dispute, to the facilitator [10]. Researchers have further investigated the possibility of using ChatGPT to suggest relevant questions and potential party priorities to support the mediator in their role [26]. In this work, we evaluate the ability of LLMs to generate suggestions for the mediator, or potentially even intervene in a negotiation autonomously, which could support mediators or even be used to provide certain mediation services in areas where traditional mediation is not feasible (e.g., due to the costs).

Huang and Ouyang proposed a technique for developing a judge perspective generation model in the context of ODR. The approach employs pre-trained LLMs, and is adaptable to various scenarios through minimal low-code fine-tuning [27].

Our work explores the use of LLM models, e.g., GPT-4, to support legal tasks. There has been a growing interest in exploring capabilities of GPT models in such applications. Yu et al. applied GPT-3 to the COLIEE legal entailment task that is based on the Japanese Bar exam, substantially improving over the existing state-of-the-art [28]. Similarly, Bommarito II and Katz utilized GPT-3.5 for the Multistate Bar Examination [29]. The model performed surprisingly well, but not well enough to pass the exam. Later, Katz et al. applied GPT-4 to the entire Uniform Bar Examination (UBE) and observed the system passing the exam [30]. The use of GPT models has further been explored in the context of the assessment of trademark distinctiveness [31], legal reasoning [32, 33], U.S. Supreme Court judgment modeling [34] and giving legal information [35]. Likewise, GPT-4 has been used to annotate textual legal data based on annotation guidelines [36, 37] and to generate explanations of legal concepts based on sentences from statutes [38]. Here, we use the models to increase access to justice, by supporting parties in negotiation and mediation.

In this work, the key uses we explore employ GPT-4 as an augmented intelligence tool that supports the users in their decision-making, as opposed to directly providing instructions or advice, which can decrease the risk associated with such tools. Previously, this approach has been explored in the legal domain to generate boolean search rules [39], perform annotations more efficiently [40], and provide legal information to laypeople [19, 41].

3. Example use cases

LLMediator is a web-based application that contains the necessary features to explore and evaluate the integration of LLMs into the negotiation and mediation processes. Upon entering the system, the user is exposed to a chat interface that allows them to discuss their issue with the other party, with the aim to come to an amicable solution, as can be seen in Figure 1. In the experimental prototype, the user can select whether they wish to intervene as one of the parties (John or Jane) or the mediator. This part of the platform is similar to other existing ODR platforms that allow the parties to communicate in a convenient and asynchronous manner without having to physically relocate.

However, the LLMediator platform also has several novel aspects that use the power of LLMs to support various tasks in the mediation platform. These features include:

1. F1 - Suggesting reformulations of inflammatory messages to be less emotional and, hence, more conducive to finding an amicable settlement
2. F2 - Drafting message suggestions for mediators to more efficiently intervene in a discussion
3. F3 - Potentially, autonomously intervening in a discussion to guide the parties towards an amicable solution

Figure 1: A screenshot of the LLMediator interface, showing a dispute regarding a broken camera.

Figure 2: The LLMediator has detected a message that is not helpful for an amicable settlement. It suggests a reformulation that the user may consider to send instead.

The technical implementation of these features is described in Section 4.

3.1. F1 - Reformulating inflammatory messages

Emotions can run high in negotiation situations. In Figure 2, the user intends to send a message that is inflammatory, and may derail attempts to find a mutually acceptable solution. The LLMediator platform detects this language, and offers an alternative formulation to the user, generated by GPT-4. The user can choose to send the original message, send the reformulated message directly, or edit and then send the reformulated message. The suggestion may nudge the user toward stating their request in a more neutral fashion, which could be more helpful for arriving at an amicable resolution to their dispute.

3.2. F2 - Drafting messages for the mediator

When negotiation is in deadlock or does not result in a settlement, it may be useful for a neutral, third-party mediator to intervene in the discussion. Let us consider the role of the mediator to see how the LLMediator can support them in the task of encouraging the parties to arrive at a friendly resolution.

The mediator can press a button in the interface to generate a draft intervention. This feature uses GPT-4 to read the previously sent messages and draft a suggested message that gently guides the parties toward a friendly resolution. Figure 3 shows such a suggested intervention.

This message may serve as a starting point for the mediator to decide how to intervene. They have the option to send the message, or edit it and then send it. Further, they can provide additional instructions to GPT-4, guiding the model to emphasize certain aspects. For example, the mediator may guide the parties to consider the packaging of the camera. Figure 4 shows a screenshot of what happens when the mediator instructs the model to explore this issue with the parties.

Figure 3: The LLMediator suggests a possible intervention by the mediator.

Figure 4: The LLMediator incorporates specific instructions for the mediator in generating a message.

3.3. F3 - Autonomously intervening in the negotiation?

In some situations, it may make sense to allow the model to automatically intervene in the negotiation. This could be the case when the dispute value is too low for employing a human mediator, or if there are not enough mediators to attend to all disputes in a certain area.

In Figure 5, one of the parties requested an intervention by the LLMediator, to move the discussion forward. The LLMediator autonomously generated a message and sent it to the parties, suggesting a few possible options.
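The autonomous flow just described can be made concrete with a small sketch. The function and variable names below are hypothetical and not taken from the LLMediator code base; only the party-request trigger and the ten-message context window described later in the paper (Sections 4.4.1 and 4.3.1) are assumed:

```python
# Hypothetical sketch of feature F3's trigger logic; names are
# illustrative and not taken from the actual LLMediator system.

def should_intervene(transcript, intervention_requested):
    """The platform currently only intervenes when a party explicitly
    requests it (see Section 4.4.1); other triggers, such as periods of
    inactivity, could be added here."""
    return intervention_requested and len(transcript) > 0

def build_context(transcript, max_messages=10):
    """Section 4.3.1 describes providing the model with the 10 latest
    messages of the conversation as context."""
    return transcript[-max_messages:]
```

Under these assumptions, `build_context` would pass only the most recent exchanges to GPT-4, keeping the prompt short for the model version used.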
These suggestions aimed to encourage a settlement. There may also be other triggers to move the system to intervene, such as a period of inactivity, or inflammatory messages being sent (see Section 4.4.1). As will be discussed in Section 5.4, this approach may carry with it certain risks that would need to be carefully studied before it could be deployed.

Figure 5: The LLMediator could potentially automatically intervene in a discussion.

We have now seen the three current capabilities of the LLMediator to support the parties in a mediation. As we can see, the system suggests relevant and context-adapted reformulations and interventions that may encourage the parties to use calmer language, or save the mediator's time. Next, let us take a look at the technical considerations behind the system.

4. Technical considerations

For the features described in the preceding section, a number of choices need to be considered. These involve questions such as when the features should be activated and how the LLM should be prompted to achieve the best results. In this section, we will explore these considerations.

4.1. Large language model used

For the current version of the system, we used GPT-4 (Generative Pre-trained Transformer). GPT-4 is a multimodal pre-trained LLM created by OpenAI [13]. The model has shown impressive performance on a wide variety of tasks, including passing the Uniform Bar Examination [30]. Some researchers even argue that the model shows sparks of general intelligence [42]. We interacted with the 8k token version of the model via the OpenAI API.1

1 GitHub: OpenAI Python Library. Available at: https://github.com/openai/openai-python [Accessed 2023-02-09]

4.2. F1 - Reformulating inflammatory messages

The first capability of the LLMediator is the reformulation of inflammatory messages. The motivation behind this feature is that individuals may have a lot of emotions attached to their dispute, which could be reflected in the tone of their messages. Such messages may increase the tension and aggravation between the parties rather than decrease it, making it less likely that the parties will find an amicable agreement, which is the goal of the negotiation process.

In order to overcome this issue, the LLMediator contains a feature that aims to reformulate the message of the user to a more neutral tone, while preserving the overall semantic content of the message. This way, the parties are able to communicate their opinions without escalating the tone of the conversation.

4.2.1. Detect a message requiring intervention

In order to only intervene when necessary, the LLMediator needs a way to detect whether a message contains inflammatory language and would benefit from reformulation. Thus, the first step is to classify the tone of the message. It should be noted that the level of emotion and anger in a message that should lead to an intervention is a policy decision—different thresholds may lead to different results. Further, one could imagine multiple methods of detecting whether a message may be inflammatory. These methods include:

1. Contacting an LLM, such as GPT-4, with every message and inquiring whether it is inflammatory or not. Depending on the volume of messages on a platform, this method may be expensive and introduce latency into the platform, which may be confusing for the user, since the message needs to be analyzed before it is sent to the other party.
2. Training a bespoke machine learning model to detect whether a message warrants intervention. This approach was used by the authors in [10]. The researchers trained a model on previous messages from an ODR court, to detect whether immediate intervention was necessary, and suggest standard message responses.
3. Using an off-the-shelf machine learning model. Sentiment detection is a well-studied field, with many available models. These models could be used to detect messages with a negative polarity, or even to detect certain emotions (such as angry or sad), and use thresholds to trigger an intervention.
4. Using keyword search. The simplest approach is to scan the message sent by the user for inflammatory words, such as "a**hole".
5. Letting the user decide that they would like a suggestion to reformulate a message, e.g., if they are aware that their message may come across as angry or defensive, and would like a suggestion as to how they may express themselves in a more neutral fashion.

For the LLMediator platform, the main focus is investigating whether the model is able to reformulate the message in a useful way. Therefore, we chose options 4 and 5. The system scans the message for inflammatory terms, and suggests a reformulation if any such terms are found. Further, the user has the option to manually activate the reformulation dialog. More sophisticated methods to detect inflammatory messages will be investigated in future work.

4.2.2. Reformulating the message

Once the inflammatory message has been detected, the next step is to prepare a reformulated message with a more constructive tone, while maintaining the core meaning of the message. The message is thus sent to an LLM, together with a prompt instructing the LLM to reformulate the message. For the LLMediator, we are using the GPT-4 API provided by OpenAI, together with the following prompt:

"You are an ODR platform. You are given the chat message of a party. Reformulate the message to maintain the content, but make it less confrontational and more encouraging for an amicable settlement. Respond directly with the reformulated message, do not explain."

4.2.3. Showing the reformulation to the user

Once the inflammatory message has been detected and reformulated, it needs to be re-introduced into the system in a useful manner. Ways of doing this include:

1. The reformulated message could be sent directly to the other party.
2. The user could be provided with the reformulated version of their message, and given the option to send or edit the reformulated message, or send their original message.

For the LLMediator, we chose the second option, since this preserves the control of the user over their communication, while potentially nudging them toward a more positive tone. The user is thus shown the reformulated message, and has the option to either accept and send the reformulated version, edit the reformulated version and then send it, or to send the original message.

4.2.4. Additional examples

Table 1 shows a few additional examples of reformulated messages:

Original message: What the ****? I told you about the water leak weeks ago and you did nothing! Fix it or I will see you in court!
Reformulated message: I wanted to remind you that I brought up the water leak issue a few weeks ago. It would be great if we could find a solution to address it before considering legal action.

Original message: You still have not repaid me the 1000 USD I lent you! You are the worst friend ever, we are done!
Reformulated message: It seems that the 1000 USD I lent you hasn't been repaid yet. As friends, let's discuss this issue and work towards resolving it amicably.

Original message: Here is what happened: I told you that the tree was hanging over my lawn on the 3rd of April. On the 15th, it was still there, so I cut it down. This is your ******* fault, you could have fixed it!!
Reformulated message: I noticed on April 3rd that the tree was overhanging my lawn. Despite addressing the issue, it remained unchanged by the 15th, which led me to cut it down. I believe this situation could have been avoided if timely action was taken on your part.

Table 1: Example of original and reformulated messages.

4.3. F2 - Drafting messages for the mediator

Another feature of the LLMediator is to provide a suggestion for an intervention that a mediator might want to send to the parties of a dispute. In mediation, a third-party mediator aims to intervene in a discussion to guide it toward a friendly settlement between the parties. Such an intervention can be very powerful and positive.

In the LLMediator, we provide a feature that allows the mediator to receive a suggestion for a message that they may consider as an intervention in a discussion by the parties. The suggestion of pre-defined messages that a mediator may want to use, based on a discussion, was previously explored in [10]. In our research, the suggested messages are generated ad hoc using an LLM, in our case the GPT-4 API provided by OpenAI. The mediator can launch the draft feature by clicking a button in their chat interface.

4.3.1. Generating the message suggestion

In order to generate the message, we use the following prompt:

"You are a mediator. Your goal is to guide the discussion of two parties towards an amicable settlement that is acceptable to both parties. Respond to this communication between the parties. Stick to the role of the mediator - do not complete the dialog of the parties. Remain neutral, do not take the side of any party."

This prompt was arrived at after empirical investigations. Further, we provide the model with a context of the 10 latest messages in the conversation. In case the mediator enters an additional instruction (see Figure 4), these instructions are also sent to the model. Based on this context, the model can gain an understanding of the discussion and hopefully provide a helpful draft message to the mediator.

4.3.2. Further examples

Table 2 shows a number of examples where the situation described above in Section 3 was used, after an additional follow-up by John inquiring why he should be liable when the postal service is clearly at fault. The table presents the suggested intervention by the model, optionally together with additional instructions provided by the mediator.

Mediator instructions: None
Suggested intervention: Thank you for expressing your concern, John. It's clear that the situation isn't ideal for either party. One possible solution could be to file a claim with the postal service to seek reimbursement for the damaged camera. That way, Jane can receive compensation for the broken camera and the responsibility would shift to the postal service. Would both of you be open to trying this approach to reach a resolution?

Mediator instructions: Inquire whether there might be an insurance offered by the trading platform used
Suggested intervention: John, I understand your concern. It might be possible that the trading platform you have used for the transaction offers some form of insurance or buyer/seller protection. In order to consider this as an option, could you please let us know which platform you used for the transaction and if they offer anything in this regard? This might help both of you reach a fair and amicable resolution.

Mediator instructions: Ask the parties to clarify the model, value and state of the sold good.
Suggested intervention: I understand your concerns, John. However, it's important to consider that part of the responsibility lies in the packaging of the item to ensure its safe delivery. In order to evaluate the options more fairly, could both of you please provide more information about the camera, such as the model and the estimated value, as well as its condition at the time of the sale? This will allow us to further discuss the possible solutions mentioned earlier and find a resolution that both parties find satisfactory.

Table 2: Example of suggested interventions, based on the dispute around the broken camera and optional instructions provided by the mediator.

4.4. F3 - Autonomously intervening in the negotiation?

The most radical use of the LLMediator is the direct intervention of the system in the negotiation. Here, the system acts as a mediator itself. While this could be a very interesting and powerful use-case, it also poses a number of substantial risks, as will be explored in Section 5.4. Thus, considerable studies would have to be performed prior to the deployment of such a system. Here, it is presented as a purely experimental feature.

4.4.1. Triggers

For the LLMediator to be able to intervene, there needs to be a trigger to determine when it should send a message. Some possibilities for such a trigger include:

1. Intervening when there has been a period of inactivity (compare [10]).
2. Intervening when the discussions between the parties are getting heated (compare [10] and Section 4.2.1).
3. Intervening every few messages (e.g., every 10 messages sent) to shape and guide the discussion.
4. Intervening when asked to by either of the parties.

Since the focus of the LLMediator is to evaluate the capability of the LLM models to interact with the user in a useful way, we have currently only implemented the last option.

4.4.2. Generating the message

Once it has been determined that the system should intervene, the previous messages are sent to the LLM, which will generate a response that can be provided by the system to the user. In order to generate the message, the same procedure as described above in Section 4.3.1 is used. However, in this case, the message is not given to the human mediator, but rather sent directly to the parties in the conversation.

4.4.3. Additional examples

To better understand the types of messages suggested by the LLMediator, Table 3 lists a few example disputes, and the messages generated by the system.

5. Discussion

In this paper, we have demonstrated selected capabilities of LLMediator, an experimental platform to explore the use of LLMs in an ODR context. The platform currently offers several features, including (F1) the reformulation of messages to a more neutral tone, (F2) the assisting of the mediators by providing suggested interventions, and (F3) the optional direct intervention of the system in the negotiation. Let us discuss selected aspects of these different features, including their potential and possible risks.

5.1. The large language models

An important takeaway of this research is the power of GPT-4. We use the model to reformulate messages to be less inflammatory, and to suggest or directly send interventions to negotiating parties.

The capability of the model to accomplish these tasks is very promising. While a more in-depth evaluation is required to better understand the different failure modes and fully understand the performance, the model seems to be capable of understanding a wide variety of contexts.

Dispute summary: A tenant (Jane) wants specific performance and damages for a water leak that they supposedly informed their landlord (John) of 5 weeks ago, claiming the landlord did not respond. The landlord claims that they were never informed.
Generated intervention: As a mediator, I would like to help Jane and John resolve this issue. It appears there may be a misunderstanding about the communication that has taken place. Firstly, let's try to establish the facts. Jane, could you please provide more information about when and how you informed John about the water leak? And John, is there any possibility that you might have missed or overlooked this communication? Let's work together to find a fair and acceptable solution for both parties.

Dispute summary: A person (John) wants damages for lost wages, after falling on a road in front of the other party's apartment. The other party (Jane) claims that they did clear the snow that day.
Generated intervention: As your mediator, I understand that both of you have concerns and perspectives on this issue. John, you experienced an accident that resulted in lost wages due to the snow outside Jane's apartment.
Jane, you claim that you had cleared the snow earlier that day. To move forward, let’s first establish the extent of responsibility each party has in this situation. This includes discussing the circumstances of the accident further and any relevant information, such as local laws or regulations regarding snow removal. Would you both be willing to discuss in more detail the specifics of the incident and the snow removal practices at Jane’s apart- ment? This way we can better understand the situation and work towards an amicable agreement. John borrowed 400 CAD from Jane to repay some credit Thank you for providing more context about the situation. card debt, but became unemployed and is now unable to It seems like the initial agreement was informal and based repay. Jane wants the money back with interest. Multiple on John’s promise to repay when he got his next paycheck. messages have been exchanged back and forth. However, John is currently unemployed, which makes the repayment more challenging. As a suggestion, would both of you be open to discussing a repayment plan that takes John’s current financial situation into consideration without burdening Jane? This could include re-assessing the interest or agreeing on a feasible timeframe to repay the debt. Jane ordered seeds from Johns website, but never received As the mediator in this situation, I would like to remind both them. John claims to habe no record of the order. A heated parties to remain respectful during this discussion. Jane, conversation ensues. I understand that you have concerns regarding the status of your order, and John, I hear that you have no record of the transaction. Let’s try to work together to identify and resolve the issue. Jane, would you mind providing any evi- dence or details regarding your order, such as a transaction ID, order number, or a confirmation email? This will help John to verify your order in his system. 
John, please be patient while we gather this information, and once we have it, I kindly ask you to look into your system to confirm Jane's order. By acting in a respectful and cooperative manner, we can work on finding a solution that satisfies both parties.

Table 3: Examples of generated interventions based on different disputes.

contexts, and providing relevant and appropriate responses, clearly taking into account the provided instructions and user messages. When reformulating a message, the model appears to maintain the important elements of the message, while at the same time changing the tone. When drafting messages for the mediator, the model has no problem understanding the messages exchanged between the parties and generating a relevant response.

It is interesting how many of these capabilities are emergent behaviors of the model. The prompts we provided are very short, and an interesting approach for future work would be to examine the impact of further "tuning" these prompts to achieve better results. However, even so, the model seems to exhibit remarkable behavior that corresponds well with what it is expected to do. For example, as we can see in Section 4.3.1, the model is essentially only told that it should act as a mediator and encourage the parties to come to a settlement. Based on this simple prompt, the model uses a number of interesting and relevant techniques, such as establishing the facts when they are unclear, reminding the parties to be respectful if necessary, and even suggesting relevant solutions to the parties. It seems likely that the model, through being trained on billions of pages of text by OpenAI, has learnt what kind of interventions a mediator may undertake, and even which solutions could be appropriate for different kinds of disputes.

We want to highlight the fact that all of these tasks were accomplished without any training on our part whatsoever. Previously, a model able to read a message and provide a reformulated version, or draft context-specific interventions, would likely have required a dataset of thousands of existing examples. Even then, it is questionable whether the performance would reach that of the GPT-4 model used in this case, as these are very complex tasks, especially considering the variety of different domains the model seems to be able to handle. In a domain such as law, where training data is often difficult to create, finding useful ways to leverage such models may lead to very effective approaches.

At the same time, the models have limitations. For example, they are sometimes prone to hallucination, i.e., making up untruthful information. While this is not something that we witnessed in our experiments, depending on the use case, this may involve certain risks. Further, the texts given by the model may not be accurate. For example, in the final example in Table 2, the suggested intervention posits that the seller is responsible for the packaging of the product. This could be seen as giving an opinion, and may not be desirable in a mediation context. To overcome these potential risks, the LLMediator primarily employs the LLMs to generate suggestions that have to be reviewed by a person before being sent out. Let us discuss the different use-cases for LLMs in the system.

5.2. F1 - The reformulation of inflammatory messages

The first LLM-enhanced feature of the LLMediator is the automatic reformulation of messages, to make the messages less inflammatory. This could help maintain the discussion at a positive and neutral tone, and increase the likelihood of the parties finding an amicable settlement. This feature is based on detecting inflammatory messages and then making a request to GPT-4 with the message and a prompt telling the LLM to reformulate it using a more neutral tone.

We discuss two ways this message could then be used by the system in section 4.2.3. The first potential way is to automatically reformulate the message, and send the reformulated version to the other party directly. However, this approach has potential negative effects. Any inaccuracies in the reformulated message may lead to misunderstandings, and the sending party may be frustrated that they are not allowed to express themselves freely. Further, it raises interesting questions with regards to self-expression—is it really fair to put words in someone's mouth in this manner?

Similar points were discussed in the context of a study that investigated the introduction of a limit on the number of messages that can be sent in a certain time period for individuals discussing controversial topics. Experiment participants did indicate that the limitation made the written messages more intentional. However, they also expressed frustration regarding the imposed limit and its impact on building rapport between the parties [43].

Instead, we chose a more feasible and less risky approach of merely showing the reformulated message to the user, and giving them the choice of which message to send. Here, the AI system acts as an augmented intelligence that can help the parties convey their meaning in a more neutral way. Seeing a notification by the system telling the user that their message may contain inflammatory language could nudge them towards reconsidering their message. Such an approach has previously been tried by platforms such as Twitter, which found that 34% of users who were prompted to reconsider harmful or offensive tweet replies reformulated their reply or did not send it at all [44].

Our approach uses LLMs to additionally suggest a reformulated version to the user. This could be a very powerful way to further encourage them to write their message using a more acceptable tone, by making a concrete suggestion that requires only the push of a button to send. However, the user is always in charge, and thus able to correct any mistakes in the message, or update it to more closely align with what they want to say, while retaining the more neutral tone.
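As a concrete illustration of this flow, a minimal sketch in Python follows. The prompt wording, function names, and the `complete` callback (standing in for a GPT-4 chat-completion call) are our own assumptions for illustration, not the actual LLMediator implementation.

```python
# Sketch of the F1 flow: ask the LLM to rewrite a potentially
# inflammatory message in a neutral tone, then return both versions
# so the user decides which one is sent.
# Prompt wording and names are illustrative assumptions.

REFORMULATE_SYSTEM_PROMPT = (
    "You are assisting parties in an online dispute resolution process. "
    "Rewrite the user's message in a neutral, non-inflammatory tone, "
    "keeping the factual content and intended meaning."
)

def build_reformulation_request(message):
    """Build the chat messages for a GPT-4 style chat-completion call."""
    return [
        {"role": "system", "content": REFORMULATE_SYSTEM_PROMPT},
        {"role": "user", "content": message},
    ]

def suggest_reformulation(message, complete):
    """Return both versions, so the user chooses what is actually sent.

    `complete` is a callback that takes the chat messages and returns
    the model's reply text (e.g., a thin wrapper around an LLM API).
    """
    suggestion = complete(build_reformulation_request(message))
    return {"original": message, "suggestion": suggestion}

# Example with a stand-in for the model:
result = suggest_reformulation(
    "Your seeds never showed up, this is a scam!",
    lambda msgs: "I was disappointed that the order did not arrive.",
)
```

Returning both versions keeps the user in charge: the suggestion can be accepted, edited, or ignored.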
Thus, this system has a lot of potential in supporting the parties to settle their disputes in a friendly and efficient manner. At the same time, as an augmented intelligence system, the risks of hallucinations are relatively low—the system supports the user by suggesting a reformulation, but does not force them to write their message in a certain way. While a deployment of such a system to a real-world ODR platform would, of course, require more empirical evaluation of its risks, the framing as an augmented intelligence system significantly reduces the risks of such a deployment.

5.3. F2 - Drafting messages for the mediator

The second LLM-enhanced feature of the LLMediator is the drafting of a possible intervention for a mediator. In order to utilize this feature, the mediator can initiate a request for a suggestion, which asks the LLM to draft a message based on the previous messages exchanged by the parties. This message is then shown to the mediator, who has the option of editing the suggestion or sending it to the parties. Further, they have the option to provide additional instructions to the model, which are taken into account by the LLM in drafting the suggestion.
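A minimal sketch of this drafting step follows; the prompt wording, function names, and the `complete` callback (standing in for a GPT-4 chat-completion call) are assumptions for illustration, not the actual LLMediator code.

```python
# Sketch of the F2 flow: build a drafting prompt from the parties'
# prior messages, optionally folding in extra instructions from the
# mediator, and return a draft for the mediator to review.
# Prompt wording and names are illustrative assumptions.

MEDIATOR_SYSTEM_PROMPT = (
    "You are a neutral mediator in an online dispute resolution process. "
    "Draft a message to the parties that encourages them to reach an "
    "amicable settlement."
)

def build_mediator_request(history, instructions=""):
    """Build chat messages from (party_name, message) pairs, plus
    optional additional instructions from the mediator."""
    system = MEDIATOR_SYSTEM_PROMPT
    if instructions:
        system += " Additional instructions from the mediator: " + instructions
    transcript = "\n".join(f"{name}: {text}" for name, text in history)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]

def draft_intervention(history, complete, instructions=""):
    """Return a draft for the mediator to review, edit, or discard.

    `complete` wraps the LLM call and returns the model's reply text.
    """
    return complete(build_mediator_request(history, instructions))
```

The draft is only a suggestion: the mediator reviews, edits, or discards it before anything reaches the parties.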
In this task, the model has significantly more freedom than in F1. We are not asking it simply to reformulate a message in a new tone, but to autonomously decide how to respond to the messages of the parties. The responses we examined do this by dynamically adapting to the conversation between the parties, e.g., by calming the discussion, trying to establish key facts, or even suggesting possible compromises that the parties may consider. None of these approaches were part of the prompt—rather, they were chosen by the model itself, when told to mediate the discussion. This kind of emergent capability is very interesting, and an impressive demonstration of the power of GPT-4 to perform different tasks.

At the same time, just like F1, F2 is based on the concept of augmented intelligence. The system does not prescribe to the mediator what they should do, but rather offers a draft of the intervention message. The mediator can use the draft directly, or change it if they do not think it is appropriate. They can also give additional instructions to the model, which will be incorporated in the prompt. As can be seen in Table 2, the model has an impressive ability to take into account both of these instructions and the previous messaging context.

Supporting the mediator in this way could be very important in areas where sufficient numbers of trained mediators or facilitators are not available (compare [10]). Verifying that a message is appropriate and helpful could be faster for an experienced mediator than drafting messages from scratch. Thus, fewer mediators could mediate more disputes, which could be an important contribution to access to justice.

At the same time, it is important to be aware of the potential risks of deploying such a system. The interaction between the mediator and the LLM should be closely studied to make sure that the results are not harmful. For example, perhaps a suggested message could anchor a mediator to a certain type of message, even if another approach could have been more helpful. Likewise, the mediator may start to overly rely on the LLM-provided message to understand the previous messages written by the parties, which could be risky if the model starts to hallucinate. Finally, the assumption underlying the AI suggestion feature is that it is faster for a mediator to review and/or edit a suggested message than to draft a new one—empirical studies are needed to confirm whether this is the case.
5.4. F3 - Autonomously intervening in the negotiation?

The most radical use of the LLMediator is the direct intervention of the system in the negotiation. Here, the system acts as a mediator directly. Just like in F2, a message is drafted. However, instead of being shown to the mediator as a suggestion, the message is sent directly to the parties.

This approach carries with it substantial risks—since no human verifies the message before it reaches the parties, any biased or inaccurate information it contains may affect the negotiation between the parties. An example of this can be seen in the final example in Table 2, where the LLM seems to indicate that one of the parties was responsible for packaging the product safely. In doing so, it could be seen to take the side of one of the parties, which may not be desirable for a mediator.

Of course, the messages would very clearly indicate that they are written by an AI-based system, and the parties are free to disregard the messages of the mediator—the mediator is, after all, a neutral third party aiming to help the parties, and not an adjudicator. However, even so, biased or inaccurate messages may lead the parties to lose trust in the platform, or even influence the negotiation towards an unfair outcome. Thus, considerable studies would have to be performed before the deployment of such a system.

However, if the model can be prompted and deployed in a manner that is sufficiently safe, this approach could have tremendous implications for access to justice. Every single dispute, no matter how small, could be given a personalized, always available mediator. This could enable a form of mediation to be used in areas where disputes are too small to warrant a mediator's intervention, or where there are simply not enough mediators available. If these types of interventions could be shown to increase the rate of settlement between parties, this could be a significant contribution to society, by potentially creating a new way for individuals to resolve their legal problems.
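Structurally, this direct-intervention variant can be sketched as a thin layer on top of the drafting step: the drafted text is clearly labelled as machine-generated and delivered to the parties without human review. The disclosure wording and the `post_message` callback below are illustrative assumptions, not the actual LLMediator code.

```python
# Sketch of the F3 flow: label a drafted intervention as AI-generated
# and deliver it directly to the conversation, without a human in the
# loop. Names and the disclosure wording are illustrative assumptions.

AI_DISCLOSURE = "[This message was generated automatically by an AI mediator.]"

def autonomous_intervention(draft, post_message):
    """Label a drafted intervention as AI-generated and deliver it.

    `post_message` is a callback that appends the message to the
    conversation thread visible to both parties.
    """
    labelled = f"{AI_DISCLOSURE}\n{draft}"
    post_message(labelled)
    return labelled

# Example: collect delivered messages in a list instead of a real platform.
outbox = []
sent = autonomous_intervention("Let us try to establish the key facts.", outbox.append)
```

The explicit disclosure reflects the requirement, discussed in this section, that the parties can clearly see the message was written by an AI-based system.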
6. Future Work

This work lays the groundwork for a number of important research directions.

First, an important avenue is the empirical evaluation of the presented system. Understanding the actual efficacy of the system in dealing with conflicts is crucial in determining whether the generated reformulations and interventions are relevant and useful, how they may be adapted, and how such a system may eventually be deployed. Such an evaluation might be done with experienced mediators, who could assess the quality of the messages, and students taking the roles of parties in a negotiation. The evaluation should also examine whether the system presents any biases.

Another avenue to explore is the further improvement and adaptation of the system. For example, while the prompts appear to work well, so-called "prompt engineering" may be an important step towards further improving the results, by altering the prompt sent to the LLM. There are also multiple features described in this paper that are not yet fully developed, such as the function to trigger a reformulation suggestion (see section 4.2.1) and the function to trigger an intervention by the AI-mediator (see section 4.4.1). These are important features which need to be explored and elaborated.
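For the not-yet-developed trigger function for reformulation suggestions (section 4.2.1), one simple direction would be to ask the LLM itself to classify the message. The following is a hedged sketch of that idea; the prompt wording and the yes/no protocol are assumptions for illustration, not a design decision of the paper.

```python
# Possible sketch of a trigger function: ask the LLM whether a message
# is inflammatory, and only surface a reformulation suggestion when it
# answers "yes". Prompt and protocol are illustrative assumptions.

TRIGGER_PROMPT = (
    "Answer with exactly 'yes' or 'no': is the following message "
    "inflammatory or likely to escalate a dispute?"
)

def is_inflammatory(message, complete):
    """Classify a message via the LLM.

    `complete` wraps the LLM call and returns the raw reply text.
    """
    reply = complete([
        {"role": "system", "content": TRIGGER_PROMPT},
        {"role": "user", "content": message},
    ])
    return reply.strip().lower().startswith("yes")
```

Whether such an LLM-based trigger is reliable enough in practice is exactly the kind of question the empirical evaluation described above would need to answer.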
Finally, there may be other interesting ways of integrating LLMs into an ODR platform. For example, such a system could summarize the conversation held between the parties for the mediator or adjudicator. Further building on the LLMediator to explore such use-cases will be an important avenue for future work.

7. Conclusion

We presented the LLMediator, an experimental platform built to support negotiation and mediation using LLMs such as GPT-4. Currently, the system is able to reformulate inflammatory messages, suggest possible interventions to the mediator, or even send them directly to the parties. We show a few examples of the system being used in fictitious disputes, and discuss its efficacy. Overall, the approach seems promising and opens up the door for a lot of further research.

Acknowledgments

Hannes Westermann and Karim Benyekhlef would like to thank the Cyberjustice Laboratory at Université de Montréal, the LexUM Chair on Legal Information and the Autonomy through Cyberjustice Technologies (ACT) project for their support of this research.

References

[1] T. C. Farrow, A. Currie, N. Aylwin, L. Jacobs, D. Northrup, L. Moore, Everyday legal problems and the cost of justice in canada: Overview report, Osgoode Legal Studies Research Paper (2016).
[2] A. Currie, The legal problems of everyday life, in: Access to justice, Emerald Group Publishing Limited, 2009.
[3] L. Savage, S. McDonald, Experiences of serious problems or disputes in the canadian provinces, 2021, Juristat: Canadian Centre for Justice Statistics (2022) 1–28.
[4] N. Semple, The cost of seeking civil justice in canada, Can. B. Rev. 93 (2015) 639.
[5] Justice For All - Final Report, Technical Report, The Task Force on Justice, New York: Center on International Cooperation, 2019.
[6] D. Carneiro, P. Novais, F. Andrade, J. Zeleznikow, J. Neves, Online dispute resolution: an artificial intelligence perspective, Artificial Intelligence Review 41 (2014) 211–240.
[7] H. M. Erichson, The dark side of consensus and creativity: What mediators of mass disputes need to know about agency risks, Fordham L. Rev. 88 (2019) 2155.
[8] S. R. Cole, K. M. Blankley, Online mediation: Where we have been, where we are now, and where we should be, U. Tol. L. Rev. 38 (2006) 193.
[9] K. Benyekhlef, F. Gélinas, Online dispute resolution, Lex Electronica 10 (2005) i.
[10] K. Branting, S. McLeod, S. Howell, B. Weiss, B. Profitt, J. Tanner, I. Gross, D. Shin, A computational model of facilitation in online dispute resolution, Artificial Intelligence and Law (2022) 1–26.
[11] A. J. Schmitz, Measuring "access to justice" in the rush to digitize, Fordham L. Rev. 88 (2019) 2381.
[12] K. Benyekhlef, J. Zhu, At the intersection of odr and artificial intelligence: Traditional justice at the crossroads, Lex Electronica 25 (2020) 34.
[13] OpenAI, Gpt-4 technical report, 2023. arXiv:2303.08774.
[14] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, Llama: Open and efficient foundation language models, 2023. arXiv:2302.13971.
[15] L. K. Branting, Advisory systems for pro se litigants, in: Proceedings of the 8th international conference on Artificial intelligence and law, 2001, pp. 139–146.
[16] D. Thompson, Creating new pathways to justice using simple artificial intelligence and online dispute resolution, IJODR 2 (2015) 4.
[17] E. A. Bickel, M. A. van Dijk, E. Giebels, Online legal advice and conflict support: A dutch experience, Report, University of Twente (2015).
[18] L.-C. Paquin, F. Blanchard, C. Thomasset, Loge–expert: from a legal expert system to an information system for non-lawyers, in: ICAIL 1991, 1991, pp. 254–259.
[19] H. Westermann, K. Benyekhlef, Justicebot: A methodology for building augmented intelligence tools for laypeople to increase access to justice, in: Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law, 2023.
[20] J. Zeleznikow, Can artificial intelligence and online dispute resolution enhance efficiency and effectiveness in courts, in: IJCA, volume 8, HeinOnline, 2016, p. 30.
[21] R. Susskind, Online courts and the future of justice (2019).
[22] K. Benyekhlef, J. Zhu, Intelligence artificielle et justice: justice prédictive, conflits de basse intensité et données massives, Intelligence 30 (2018).
[23] E. Bellucci, J. Zeleznikow, Developing negotiation decision support systems that support mediators: a case study of the family_winner system, AI & L. 13 (2005) 233.
[24] A. Lodder, E. Thiessen, The role of artificial intelligence in online dispute resolution, in: Workshop on online dispute resolution at the international conference on artificial intelligence and law, Edinburgh, UK, 2003.
[25] L. Kistemaker, Rechtwijzer and uitelkaar.nl. Dutch experiences with odr for divorce, Family Court Review 59 (2021) 232–243.
[26] R. Bergman, Chatgpt and mediation, 2023. URL: https://mediate.com/chatgpt-and-mediation/.
[27] Q. Huang, W. Ouyang, Generate judge-view of online dispute resolution based on pretrained-model method, in: Intelligent Computing Methodologies: 18th International Conference, ICIC 2022, Xi'an, China, August 7–11, 2022, Proceedings, Part III, Springer, 2022, pp. 162–169.
[28] F. Yu, L. Quartey, F. Schilder, Legal prompting: Teaching a language model to think like a lawyer, 2022. URL: https://arxiv.org/abs/2212.01326. doi:10.48550/ARXIV.2212.01326.
[29] M. Bommarito II, D. M. Katz, Gpt takes the bar exam, arXiv preprint arXiv:2212.14402 (2022).
[30] D. M. Katz, M. J. Bommarito, S. Gao, P. Arredondo, Gpt-4 passes the bar exam, Available at SSRN 4389233 (2023).
[31] J. Goodhue, Y. Wei, Classification of trademark distinctiveness using openai gpt 3.5 model, Available at SSRN 4351998 (2023).
[32] A. Blair-Stanek, N. Holzenberger, B. Van Durme, Can gpt-3 perform statutory reasoning?, arXiv preprint arXiv:2302.06100 (2023).
[33] H.-T. Nguyen, R. Goebel, F. Toni, K. Stathis, K. Satoh, How well do sota legal reasoning models support abductive reasoning?, arXiv preprint arXiv:2304.06912 (2023).
[34] S. Hamilton, Blind judgement: Agent-based supreme court modelling with gpt, arXiv preprint arXiv:2301.05327 (2023).
[35] J. Tan, H. Westermann, K. Benyekhlef, Chatgpt as an artificial lawyer?, in: Artificial Intelligence for Access to Justice (AI4AJ 2023), 2023.
[36] J. Savelka, K. Ashley, M. Gray, H. Westermann, H. Xu, Can gpt-4 support analysis of textual data in tasks requiring highly specialized domain expertise?, in: ASAIL'23: 6th Workshop on Automated Semantic Analysis of Information in Legal Text, 2023.
[37] J. Savelka, Unlocking practical applications in legal domain: Evaluation of gpt for zero-shot semantic annotation of legal texts, arXiv preprint arXiv:2305.04417 (2023).
[38] J. Savelka, K. Ashley, M. Gray, H. Westermann, H. Xu, Explaining legal concepts with augmented large language models (gpt-4), in: AI4Legs 2023: AI for Legislation, 2023.
[39] H. Westermann, J. Savelka, V. R. Walker, K. D. Ashley, K. Benyekhlef, Computer-assisted creation of boolean search rules for text classification in the legal domain, in: JURIX, 2019, pp. 123–132.
[40] H. Westermann, J. Savelka, V. R. Walker, K. D. Ashley, K. Benyekhlef, Sentence embeddings and high-speed similarity search for fast computer assisted annotation of legal documents, in: Legal Knowledge and Information Systems: JURIX 2020: The Thirty-third Annual Conference, Brno, Czech Republic, December 9-11, 2020, volume 334, IOS Press, 2020, p. 164.
[41] H. Westermann, S. Meeùs, M. Godet, A. Troussel, J. Tan, J. Savelka, K. Benyekhlef, Bridging the gap: Mapping layperson narratives to legal issues with language models, in: ASAIL'23: 6th Workshop on Automated Semantic Analysis of Information in Legal Text, 2023.
[42] S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg, et al., Sparks of artificial general intelligence: Early experiments with gpt-4, arXiv preprint arXiv:2303.12712 (2023).
[43] T. W. Masrani, J. Jamieson, N. Yamashita, H. A. He, Slowing it down: Towards facilitating interpersonal mindfulness in online polarizing conversations over social media, Proceedings of the ACM on Human-Computer Interaction 7 (2023) 1–27.
[44] A. Butler, A. Parrella, Tweeting with consideration, 2021. URL: https://blog.twitter.com/en_us/topics/product/2021/tweeting-with-consideration.