<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LLMediator: GPT-4 Assisted Online Dispute Resolution</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hannes Westermann</string-name>
          <email>hannes.westermann@umontreal.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaromir Savelka</string-name>
          <email>jsavelka@cs.cmu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karim Benyekhlef</string-name>
          <email>karim.benyekhlef@umontreal.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cyberjustice Laboratory, Faculté de droit, Université de Montréal</institution>
          ,
          <addr-line>Québec</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LLMediator</institution>
          ,
          <addr-line>GPT-4, Access to Justice, Large Language Models, Online Dispute Resolution, ODR, Augmented Intelligence</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Laboratory and implemented with the Quebec consumer</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>OPC, an ODR platform developed by the Cyberjustice</institution>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>School of Computer Science, Carnegie Mellon University</institution>
          ,
          <addr-line>Pittsburgh</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this article, we introduce LLMediator, an experimental platform designed to enhance online dispute resolution (ODR) by utilizing capabilities of state-of-the-art large language models (LLMs) such as GPT-4. In the context of high-volume, low-intensity legal disputes, alternative dispute resolution methods such as negotiation and mediation offer accessible and cooperative solutions for laypeople. These approaches can be carried out online on ODR platforms. LLMediator aims to improve the efficacy of such processes by leveraging GPT-4 to reformulate user messages, draft mediator responses, and potentially autonomously engage in the discussions. We present and discuss several features of LLMediator and conduct initial qualitative evaluations, demonstrating the potential for LLMs to support ODR and facilitate amicable settlements. The initial proof of concept is promising and opens up avenues for further research in AI-assisted negotiation and mediation.</p>
      </abstract>
      <kwd-group>
        <kwd>LLMediator</kwd>
        <kwd>GPT-4</kwd>
        <kwd>Access to Justice</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Online Dispute Resolution</kwd>
        <kwd>ODR</kwd>
        <kwd>Augmented Intelligence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>High-volume, low-intensity legal disputes (such as debt, consumer and employment issues) often affect laypeople, who may often not understand what legal rules apply to their situation, which can be an impediment to enforcing their rights [2]. Even if they are aware of their rights, they may struggle to come to a resolution (see, e.g., [3]). Bringing the dispute to court could entail significant monetary, temporal and psychological costs [4]. Unresolved legal issues can be very stressful and they may even impact individuals’ health. Unresolved legal issues were estimated to cost society 746 million dollars in Canada alone in 2014 [1]. Globally, the United Nations Task Force on Justice estimates that 1.5 billion individuals are unable to resolve their legal issues at any point in time [5].</p>
    </sec>
    <sec id="sec-2">
      <p>A popular way of increasing access to justice is the use of so-called alternative dispute resolution (ADR) methods, i.e., methods of solving disputes outside of the traditional judicial system [6]. Such methods include, e.g., negotiation or mediation. During negotiation, the parties communicate with each other, attempting to reach an amicable agreement. In mediation, a neutral third party is expected to establish a rapport with the parties [12].</p>
      <p>Workshop on Artificial Intelligence for Access to Justice (AI4AJ 2023).</p>
    </sec>
    <sec id="sec-3">
      <p>To foster further improvements in the usability and efficacy of ODR systems, we present LLMediator, an experimental platform focused on exploring how large language models (LLMs) such as GPT-4 may be used to support the parties and the mediator in arriving at a settlement. LLMs have recently risen to fame as very powerful AI systems. Models such as GPT-4 [13] or LLaMA [14] are able to perform various tasks based on a textual prompt, which is often not too far from instructions understandable to a human. The models can also be accessed via an application programming interface (API), which allows developers to integrate their capabilities into their own applications.</p>
    </sec>
    <sec id="sec-4">
      <title>LLMediator is an experimental system leveraging GPT</title>
    </sec>
    <sec id="sec-5">
      <title>4 to enhance an ODR platform. The system can refor</title>
      <p>mulate users’ messages to be, e.g., less emotional and
more conducive to reaching a settlement. Further,
LLof so-called alternative dispute resolution (ADR) meth- language models (LLMs) such as GPT-4 may be used to
(or facilitator), taking into account previous messages cations. Yu et al. applied GPT-3 to the COLIEE legal
by the parties and special instructions by the mediator. entailment task that is based on the Japanese Bar exam,
Finally, LLMediator ofers an experimental feature to au- substantially improving over the existing state-of-the-art
tonomously respond to the discussion by the parties. In [28]. Similarly, Bommarito II and Katz utilized GPT-3.5
this paper, we demonstrate selected features of LLMedi- for the Multistate Bar Examination [29]. The model
perator and perform initial qualitative evaluations of this formed surprisingly well but not well enough to pass
system. the exam. Later, Katz et al. applied GPT-4 to the
entire Uniform Bar Examination (UBE) and observed the
system passing the exam [30]. The use of GPT models
2. Related Work has further been explored in the context of the
assessment of trademark distinctiveness [31], legal reasoning
[32, 33], U.S. Supreme court judgment modeling [34] and
giving legal information [35]. Likewise, GPT-4 has been
used to annotate textual legal data based on annotation
guidelines [36, 37] and to generate explanations of legal
concepts based on sentences from statutes [38]. Here, we
use the models to increase access to justice, by supporting
parties in negotiation and mediation.</p>
      <p>In this work, the key uses we explore employ
GPT4 as an augmented intelligence tool, that supports the
users in their decision-making as opposed to directly
providing instructions or advice, which can decrease the
risk associated with such tools. Previously, this approach
has been explored in the legal domain to generate boolean
search rules [39], perform annotations more eficiently
[40], and provide legal information to laypeople [19, 41].</p>
    </sec>
    <sec id="sec-6">
      <title>Using artificial intelligence (AI) to increase access to jus</title>
      <p>tice is an often studied question. For example, systems
have been built that seek to help users understand their
rights in certain situations (see, e.g., [15, 16, 17, 18, 19]).</p>
      <p>In this work, we present an approach to facilitate dispute
resolution via augmented intelligence tools for
negotiation and mediation.</p>
      <p>Integrating AI into the negotiation process has
previously been explored by multiple authors. One possible
such integration is the display of a so-called “Best
Alternative to the Negotiated Agreement” (BATNA)
during negotiation, which can provide contextual
information helping the users find a fair agreement (see e.g.
[6, 20, 21, 22, 17]). Other approaches include using
gametheory methods to elicit an acceptable settlement (see,
e.g., [23, 24]), and by structuring the dialogue between
parties [25]. Here, we detect messages by the parties
containing inflammatory language, and suggest alterna- 3. Example use cases
tive ways of conveying the message, that may be more
amenable to amicable resolution. LLMediator is a web-based application that contains the</p>
      <p>Similarly, AI systems have been built to support medi- necessary features to explore and evaluate the integration
ation. Branting et al. developed a system that monitors of LLMs into the negotiation and mediation processes.
messages exchanged during negotiation for situations Upon entering the system, the user is exposed to a chat
that require an immediate human intervention, such as interface, that allows them to discuss their issue with the
conflict escalation or inactivity. The system can further other party, with the aim to come to an amicable solution,
suggest standard messages, that may be relevant to a par- as can be seen in Figure 1. In the experimental prototype,
ticular dispute, to the facilitator [10]. Researchers have the user can select whether they wish to intervene as one
further investigated the possibility of using ChatGPT to of the parties (John or Jane) or the mediator. This part of
suggest relevant questions and potential party priorities the platform is similar to other existing ODR platforms
to to support the mediator in their role [26]. In this work, that allow the parties to communicate in a convenient
we evaluate the ability of LLMs to generate suggestions and asynchronous manner without having to physically
for the mediator, or potentially even intervene in a ne- relocate.
gotiation autonomously, which could support mediators However, the LLMediator platform also has several
or even be used to provide certain mediation services novel aspects, that use the power of LLMs to support
in areas where traditional mediation is not feasible (e.g., various tasks in the mediation platform. These features
due to the costs). include:</p>
      <p>Huang and Ouyang proposed a technique for
developing a judge perspective generation model in the context
of ODR. The approach employs pre-trained LLMs, and is
adaptable to various scenarios through minimal low-code
ifne-tuning [ 27].</p>
      <p>Our work explores the use of LLM models, e.g., GPT-4,
to support legal tasks. There has been a growing interest
in exploring capabilities of GPT models in such
appli1. F1 - Suggesting reformulations of inflammatory
messages to be less emotional and, hence, more
conducive to finding an amicable settlement
2. F2 - Drafting message suggestions for mediators</p>
      <p>to more eficiently intervene in a discussion
3. F3 - Potentially, autonomously intervening in a
discussion to guide the parties towards an
amicable solution</p>
    </sec>
    <sec id="sec-7">
      <p>The technical implementation of these features is described in Section 4.</p>
    </sec>
    <sec id="sec-8">
      <title>The mediator can press a button in the interface to</title>
      <p>generate a draft intervention. This feature uses GPT-4 to
3.1. F1 - Reformulating inflammatory read the previously sent messages and draft a suggested
messages message that gently guides the parties toward a friendly
resolution. Figure 3 shows such a suggested intervention.</p>
      <p>Emotions can run high in negotiation situations. In Fig- This message may serve as a starting point for the
ure 2, the user intends to send a message that is inflam- mediator to decide how to intervene. They have the
matory, and may derail attempts to find a mutually ac- option to send the message, or edit it and then send it.
ceptable solution. The LLMediator platform detects this Further, they can provide additional instructions to
GPTlanguage, and ofers an alternative formulation to the 4, guiding the model to emphasize certain aspects. For
user, generated by GPT-4. The user can choose to send example, the mediator may guide the parties to consider
the original message, send the reformulated message di- the packaging of the camera. Figure 4 shows a screenshot
rectly, or edit and then send the reformulated message. of what happens when the mediator instructs the model
The suggestion may nudge the user toward stating their to explore this issue with the parties.
request in a more neutral fashion, which could be more
helpful for arriving at an amicable resolution to their
dispute.</p>
      <sec id="sec-8-1">
        <title>3.3. F3 - Autonomously intervening in the</title>
        <p>negotiation?</p>
      </sec>
      <sec id="sec-8-2">
        <title>3.2. F2 - Drafting messages for the mediator</title>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>When negotiation is in deadlock or does not result in</title>
      <p>a settlement, it may be useful for a neutral, third-party
mediator to intervene in the discussion. Let us consider
the role of the mediator to see how the LLMediator can
support them in the task of encouraging the parties to
arrive to a friendly resolution.</p>
    </sec>
    <sec id="sec-10">
      <title>In some situations, it may make sense to allow the model</title>
      <p>to automatically intervene in the negotiation. This could
be the case when the dispute value is too low for
employing a human mediator, or if there are not enough
mediators to attend to all disputes in a certain area.</p>
      <p>In Figure 5, one of the parties requested an
intervention by the LLMediator, to move the discussion forward.
The LLMediator autonomously generated a message and
sent it to the parties, suggesting a few possible options
to encourage a settlement. There may also be other
triggers to move the system to intervene, such as a period
of inactivity, or inflammatory messages being sent (see
Section 4.4.1). As will be discussed in Section 5.4, this
approach may carry with it certain risks that would need
to be carefulyl studied before it could be deployed.</p>
      <p>We have now seen the three current capabilities of the
LLMediator to support the parties in a mediation. As
we can see, the system suggests relevant and
contextadapted reformulations and interventions, that may
encourage the parties to use calmer language, or save
mediator’s time. Next, let us take a look at the technical
considerations behind the system.</p>
      <sec id="sec-10-1">
        <title>4.1. Large language model used</title>
        <p>For the current version of the system, we used GPT-4
(Generative Pre-trained Transformer). GPT-4 is a
multimodal pre-trained LLM created by OpenAI [13]. The
model has shown impressive performance on a wide
variety of tasks, including passing the Uniform Bar
Examination [30]. Some researchers even argue that the model
shows sparks of general intelligence [42]. We interacted
with the 8k token version of the model via the OpenAI
API.1</p>
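        <p>As a concrete illustration of how an application can talk to the model, the snippet below assembles a chat-completions request body of the kind sent to the OpenAI API. It is a minimal sketch: the helper name and the example prompt are our assumptions, and the actual HTTP call (e.g., via the openai Python package referenced in footnote 1) is omitted so the example has no external dependencies.</p>

```python
# Sketch: assemble a chat-completions request body for one system + one user
# turn. Only the payload is built here; sending it (e.g., with the `openai`
# package) is left out. The helper name and prompt text are illustrative.

def build_chat_request(system_prompt: str, user_content: str,
                       model: str = "gpt-4") -> dict:
    """Return a request body with a system instruction and a user message."""
    return {
        "model": model,  # the paper used the 8k-token GPT-4 variant
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    }

# Example: the shape of a request an LLMediator-style feature would send.
request = build_chat_request(
    "You are an ODR platform.",
    "Please reformulate my message.",
)
```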
      </sec>
      <sec id="sec-10-2">
        <title>4.2. F1 - Reformulating inflammatory messages</title>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>4. Technical considerations</title>
      <p>For the features described in the preceding section, a number of choices need to be considered. These involve questions such as when the features should be activated, how the LLM should be prompted to achieve the best results, and what to do with the resulting text. In this section, we will explore these considerations.</p>
      <p>The first capability of the LLMediator is the reformulation of inflammatory messages. The motivation behind this feature is that individuals may have a lot of emotions attached to their dispute, which could be reflected in the tone of the messages. Such messages may increase the tension and aggravation between the parties rather than decrease it, making it less likely that the parties will find an amicable agreement, which is the goal of the negotiation process.</p>
    </sec>
    <sec id="sec-12">
      <title>1GitHub: OpenAI Python Library. Available at: https://github.com/</title>
      <p>openai/openai-python [Accessed 2023-02-09]
immediate intervention was necessary, and
suggest standard message responses.
3. Using an of-the-shelf machine learning model.</p>
      <p>Sentiment detection is a well-studied field, with
many available models. These models could be
used to detect messages with a negative
polarity, or even to detect certain emotions (such as
angry or sad), and use thresholds to trigger an
intervention.
4. Using keyword search. The simplest approach in
detecting inflammatory words is simply to scan
the message sent by the user for inflammatory
words, such as “a**hole”.
5. The user may decide that they would like a
suggestion to reformulate a message, e.g., if they are
aware that their message may come across as
angry or defensive, and would like a suggestion as
to how they may express themselves in a more
neutral fashion.</p>
      <p>For the LLMediator platform, the main focus is
investigating whether the model is able to reformulate the
message in a useful way. Therefore, we chose options 4
and 5. The system scans the message for inflammatory
terms, and suggests a reformulation if any such terms
are found. Further, the user has the option to manually
activate the reformulation dialog. More sophisticated
methods to detect inflammatory messages will be
investigated in future work.</p>
      <sec id="sec-12-1">
        <title>4.2.2. Reformulating the message</title>
        <p>Once the inflammatory message has been detected, the
next step is to prepare a reformulated message with a
more constructive tone, while maintaining the core
meaning of the message. The message is thus sent to an LLM,
together with a prompt instructing the LLM to
reformulate the message. For the LLMediator, we are using
the GPT-4 API provided by OpenAI, together with the
following prompt:</p>
        <p>“You are an ODR platform. You are given the chat
message of a party. Reformulate the message to maintain the
content, but make it less confrontational and more
encouraging for an amicable settlement. Respond directly with
the reformulated message, do not explain.”</p>
        <p>In order to overcome the issue, the LLMediator
contains a feature that aims to reformulate the message of
the user to a more neutral tone, while preserving the
overall semantic content of the message. This way, the
parties are able to communicate their opinions, without
escalating the tone of the conversation.</p>
      </sec>
      <sec id="sec-12-2">
        <title>4.2.1. Detect a message requiring intervention</title>
        <p>In order to only intervene when necessary, the
LLMediator needs a way to detect whether a message contains
inflammatory language and would benefit from
reformulation. Thus, the first step is to classify the tone of the
message. It should be noted that the level of emotions and
anger in a message that should lead to an intervention is
a policy decision—diferent thresholds may lead to
diferent results. Further, one could imagine multiple methods
of detecting whether a message may be inflammatory.</p>
        <p>These methods include:
1. Contacting an LLM, such as GPT-4, with every
message and inquiring whether it is inflammatory
or not. Depending on the volume of messages
on a platform, this method may be expensive and 4.2.3. Showing the reformulation to the user
introduce latency into the platform, which may
be confusing for the user, since the message needs Once the inflammatory message has been detected and
to be analyzed before it is sent to the other party. reformulated, it needs to be re-introduced into the system
in a useful manner. Ways of doing this include:
2. Training a bespoke machine learning model to
detect whether a message warrants intervention.</p>
        <p>This approach was used by the authors in [10].</p>
        <p>The researchers trained a model on previous
messages from an ODR court, to detect whether an</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>1. The reformulated message could be sent directly</title>
      <p>to the other party.
2. The user could be provided with the reformulated
version of their message, and given the option to
send or edit the reformulated message, or send
their original message.</p>
      <p>For the LLMediator, we chose the second option, since
this preserves the control of the user over their
communication, while potentially nudging them toward a more
positive tone. The user is thus shown the reformulated
message, and has the option to either accept and send
the reformulated version, edit the reformulated version
and then send it, or to send the original message.</p>
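      <p>The detection and reformulation flow described above can be sketched as follows. This is a minimal illustration, not the paper’s implementation: the term list and helper names are our assumptions, while the prompt is the one quoted in Section 4.2.2 and the LLM call is abstracted behind a callable.</p>

```python
# Sketch of the F1 flow: scan an outgoing message for inflammatory terms
# (options 4 and 5 of Section 4.2.1) and, if one is found, ask the LLM for a
# reformulation that the user can accept, edit, or discard (Section 4.2.3).
from typing import Callable, Optional

REFORMULATION_PROMPT = (
    "You are an ODR platform. You are given the chat message of a party. "
    "Reformulate the message to maintain the content, but make it less "
    "confrontational and more encouraging for an amicable settlement. "
    "Respond directly with the reformulated message, do not explain."
)

# Illustrative list; a deployed system would use a curated lexicon or a classifier.
INFLAMMATORY_TERMS = {"idiot", "liar", "scam", "ridiculous"}

def needs_reformulation(message: str) -> bool:
    """Option 4: flag the message if it contains a known inflammatory term."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return bool(words & INFLAMMATORY_TERMS)

def suggest_reformulation(message: str,
                          llm_call: Callable[[str, str], str]) -> Optional[str]:
    """Return an LLM suggestion for flagged messages, or None to send as-is.

    `llm_call(system_prompt, user_content)` abstracts the GPT-4 API call;
    the user keeps the choice to send the original, the suggestion, or an edit.
    """
    if not needs_reformulation(message):
        return None
    return llm_call(REFORMULATION_PROMPT, message)
```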
      <sec id="sec-13-1">
        <title>4.2.4. Additional examples</title>
        <p>that a mediator may want to use, based on a discussion,
were previously explored in [10]. In our research, the
suggested messages are generated ad hoc using an LLM,
in our case the GPT-4 API provided by OpenAI. The
mediator can launch the draft feature by clicking a button
in their chat interface.</p>
      </sec>
      <sec id="sec-13-2">
        <title>4.3.1. Generating the message suggestion</title>
        <p>In order to generate the message, we use the following
prompt:</p>
        <p>“ You are a mediator. Your goal is to guide the
discussion of two parties towards an amicable settlement that is
acceptable to both parties. Respond to this communication
between the parties. Stick to the role of the mediator - do
not complete the dialog of the parties. Remain neutral, do
not take the side of any party. ”</p>
        <p>This prompt was arrived at after empirical
investigations. Further, we provide the model with a context of
the 10 latest messages in the conversation. In case the
mediator enters an additional instruction (see figure 4,
these instructions are also sent to the model. Based on
this context, the model can gain an understanding of the
discussion and hopefully provide a helpful draft message
to the mediator.</p>
      </sec>
      <sec id="sec-13-3">
        <title>4.3.2. Further examples</title>
        <p>Table 2 shows a number of examples where the
situation described above in section 3 was used, after an
additional follow-up by John inquiring why he should
be liable when the postal service is clearly at fault. The
table presents the suggested intervention by the model,
optionally together with additional instructions provided
by the mediator.</p>
        <sec id="sec-13-3-1">
          <title>4.4. F3 - Autonomously intervening in the negotiation?</title>
          <p>The most radical use of the LLMediator is the direct
intervention of the system in the negotiation. Here, the
system acts as a mediator itself. While this could be a
very interesting and powerful use-case, it also poses a
4.3. F2 - Drafting messages for the number of substantial risks, as will be explored in section
mediator 5.4. Thus, considerable studies would have to be
performed prior to the deployment of such a system. Here,
Another feature of the LLMediator is to provide a sug- it is presented as purely experimental feature.
gestion for an intervention that a mediator might want
to send to the parties of a dispute. In mediation, a third- 4.4.1. Triggers
party mediator aims to intervene in a discussion to guide
it toward a friendly settlement between the parties. Such For the LLMediator to be able to intervene, there needs to
an intervention can be very powerful and positive. be a trigger to determine when it should send a messages.</p>
          <p>In the LLMediator, we provide a feature that allows Some possibilities for such a trigger include:
the mediator to receive a suggestion for a message that
they may consider as an intervention in a discussion 1. Intervening when there has been a period of
inby the parties. The suggestion of pre-defined messages activity (compare [10])</p>
        </sec>
      </sec>
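      <p>The draft-generation step in Section 4.3.1 can be sketched as the assembly of a request: the quoted mediator prompt, a window of the 10 latest messages, and any additional instructions from the mediator. The plain-text transcript encoding below is our assumption; the paper does not specify the exact format.</p>

```python
# Sketch: build the chat messages for an F2 mediator draft (Section 4.3.1).
# The prompt is the one quoted in the paper; the formatting is illustrative.

MEDIATOR_PROMPT = (
    "You are a mediator. Your goal is to guide the discussion of two parties "
    "towards an amicable settlement that is acceptable to both parties. "
    "Respond to this communication between the parties. Stick to the role of "
    "the mediator - do not complete the dialog of the parties. Remain "
    "neutral, do not take the side of any party."
)

def build_draft_request(history, instructions=""):
    """Build chat messages for a draft intervention.

    history: (speaker, text) pairs, oldest first; only the 10 latest are sent.
    instructions: optional extra guidance entered by the mediator (Figure 4).
    """
    system = MEDIATOR_PROMPT + ((" " + instructions) if instructions else "")
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in history[-10:])
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]
```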
      <sec id="sec-13-4">
        <title>Mediator instructions</title>
        <p>None
Inquire whether there might be an insurance ofered by the
trading platform used
Ask the parties to clarify the model, value and state of the
sold good.</p>
      </sec>
      <sec id="sec-13-5">
        <title>Suggested intervention</title>
        <p>Thank you for expressing your concern, John. It’s clear
that the situation isn’t ideal for either party. One possible
solution could be to file a claim with the postal service to
seek reimbursement for the damaged camera. That way,
Jane can receive compensation for the broken camera and
the responsibility would shift to the postal service. Would
both of you be open to trying this approach to reach a
resolution?
John, I understand your concern. It might be possible that
the trading platform you have used for the transaction ofers
some form of insurance or buyer/seller protection. In order
to consider this as an option, could you please let us know
which platform you used for the transaction and if they
ofer anything in this regard? This might help both of you
reach a fair and amicable resolution.</p>
        <p>I understand your concerns, John. However, it’s important
to consider that part of the responsibility lies in the
packaging of the item to ensure its safe delivery. In order to
evaluate the options more fairly, could both of you please
provide more information about the camera, such as the
model and the estimated value, as well as its condition at
the time of the sale? This will allow us to further discuss the
possible solutions mentioned earlier and find a resolution
that both parties find satisfactory.</p>
      </sec>
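      <p>The triggers listed in Section 4.4.1 can be sketched as a simple predicate. The threshold values below are illustrative assumptions, and, as noted in the text, the current prototype only implements the explicit party request.</p>

```python
# Sketch of the four F3 triggers from Section 4.4.1: explicit party request,
# heated discussion, inactivity, and a fixed message cadence. The thresholds
# (24 hours, every 10 messages) are illustrative assumptions.

def should_intervene(seconds_since_last_message: float,
                     is_heated: bool,
                     message_count: int,
                     party_requested: bool,
                     inactivity_limit: float = 24 * 3600,
                     cadence: int = 10) -> bool:
    """Return True if any trigger from Section 4.4.1 fires."""
    if party_requested:                                     # 4: explicit request
        return True
    if is_heated:                                           # 2: heated discussion
        return True
    if seconds_since_last_message > inactivity_limit:       # 1: inactivity
        return True
    if message_count > 0 and message_count % cadence == 0:  # 3: every N messages
        return True
    return False
```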
      <sec id="sec-13-6">
        <title>4.4.2. Generating the message</title>
        <p>Since the focus of the LLMediator is to evaluate the
capability of the LLM models to interact with the user in
a useful way, we have currently only implemented the
last option.</p>
        <p>2. Intervening when the discussions between the the messages generated by the system.
parties are getting heated (compare [10] and
section 4.2.1).
3. Intervening every few messages (e.g., every 10 5. Discussion</p>
        <p>messages sent) to shape and guide the discussion. In this paper, we have demonstrated selected capabilities
4. Intervening when asked to by either of the par- of LLMediator, an experimental platform to explore the
ties. use of LLMs in an ODR context. The platform currently
ofers several features, including (F1) the reformulation
of messages to a more neutral tone, (F2) the assisting
of the mediators by providing suggested interventions,
and (F3) the optional direct intervention of the system in
the negotiation. Let us discuss selected aspects of these
diferent features, including their potential and possible
risks.</p>
        <p>Once it has been determined that the system should
intervene, the previous messages are sent to the LLM, which
will generate a response that can be provided by the
system to the user. In order to generate the message, the
same procedure as described above in section 4.3.1 is
used. However, in this case, the message is not given
to the human mediator, but rather sent directly to the
parties in the conversation.</p>
      </sec>
      <sec id="sec-13-7">
        <title>4.4.3. Additional examples</title>
      </sec>
    </sec>
    <sec id="sec-14">
      <p>To better understand the types of messages suggested by the LLMediator, Table 3 lists a few example disputes, and the messages generated by the system.</p>
      <sec id="sec-14-1">
        <title>5.1. The large language models</title>
      </sec>
    </sec>
    <sec id="sec-15">
      <p>An important takeaway of this research is the power of GPT-4. We use the model to reformulate messages to be less inflammatory, and to suggest or directly send interventions to negotiating parties.</p>
      <p>The capability of the model to accomplish these tasks is very promising. While a more in-depth evaluation is required to better understand the different failure modes and to fully understand the performance, the model seems to be capable of understanding a wide variety of contexts.</p>
      <sec id="sec-15-1">
        <title>Dispute summary</title>
        <p>A tenant (Jane) wants specific performance and damages for
a water leak that they supposedly informed their landlord
(John) of 5 weeks ago, claiming the landlord did not respond.
The landlord claims that they were never informed.
A person (John) wants damages for lost wages, after falling
on a road in front of the other parties apartment. The other
party (Jane) claims that they did clear the snow that day.
John borrowed 400 CAD from Jane to repay some credit
card debt, but became unemployed and is now unable to
repay. Jane wants the money back with interest. Multiple
messages have been exchanged back and forth.
Jane ordered seeds from Johns website, but never received
them. John claims to habe no record of the order. A heated
conversation ensues.</p>
      </sec>
      <sec id="sec-15-2">
        <title>Generated intervention</title>
        <p>As a mediator, I would like to help Jane and John resolve
this issue. It appears there may be a misunderstanding
about the communication taken place. Firstly, let’s try to
establish the facts. Jane, could you please provide more
information about when and how you informed John about
the water leak? And John, is there any possibility that you
might have missed or overlooked this communication? Let’s
work together to find a fair and acceptable solution for both
parties.</p>
        <p>As your mediator, I understand that both of you have
concerns and perspectives on this issue. John, you experienced
an accident that resulted in lost wages due to the snow
outside Jane’s apartment. Jane, you claim that you had
cleared the snow earlier that day. To move forward, let’s
first establish the extent of responsibility each party has in
this situation. This includes discussing the circumstances
of the accident further and any relevant information, such
as local laws or regulations regarding snow removal. Would
you both be willing to discuss in more detail the specifics of
the incident and the snow removal practices at Jane’s
apartment? This way we can better understand the situation and
work towards an amicable agreement.</p>
        <p>Thank you for providing more context about the situation.</p>
        <p>It seems like the initial agreement was informal and based
on John’s promise to repay when he got his next paycheck.</p>
        <p>However, John is currently unemployed, which makes the
repayment more challenging. As a suggestion, would both
of you be open to discussing a repayment plan that takes
John’s current financial situation into consideration without
burdening Jane? This could include re-assessing the interest
or agreeing on a feasible timeframe to repay the debt.</p>
        <p>As the mediator in this situation, I would like to remind both
parties to remain respectful during this discussion. Jane,
I understand that you have concerns regarding the status
of your order, and John, I hear that you have no record of
the transaction. Let’s try to work together to identify and
resolve the issue. Jane, would you mind providing any
evidence or details regarding your order, such as a transaction
ID, order number, or a confirmation email? This will help
John to verify your order in his system. John, please be
patient while we gather this information, and once we have it,
I kindly ask you to look into your system to confirm Jane’s
order. By acting in a respectful and cooperative manner we
can work on finding a solution that satisfies both parties.
</p>
        <p>contexts, and providing relevant and appropriate responses, clearly taking into account the provided instructions and user messages. When reformulating a message, the model appears to maintain the important elements of a message, while at the same time changing the tone. When drafting messages for the mediator, the model has no problem understanding the messages between the parties, and generating a relevant response.</p>
        <p>It is interesting how much of these capabilities are emergent behaviour from the model. The prompts we provided are very short, and an interesting approach for future work would be to examine the impact of further “tuning” these prompts to achieve better results. However, even so, the model seems to have a remarkable behaviour that corresponds well with what it is expected to do. For example, as we can see in Section 4.3.1, the model is essentially only told that it should act as a mediator and encourage the parties to come to a settlement. Based on this simple prompt, the model uses a number of interesting and relevant techniques, such as establishing the facts when they are unclear, reminding the parties to be respectful if necessary, and even suggesting relevant solutions to the parties. It seems likely that the model, through being trained on billions of pages of text by OpenAI, has learnt what kind of interventions a mediator may undertake, and even which solutions could be appropriate for different kinds of disputes.</p>
        <p>We want to highlight the fact that all of these tasks were accomplished without any training on our part whatsoever. Previously, a model able to read a message and provide a reformulated message, or draft context-specific interventions, would likely have required a dataset of thousands of existing examples. Even then, it is questionable whether the performance would reach that of the GPT-4 model used in this case, as these are very complex tasks, especially considering the variety of different domains the model seems to be able to handle. In a domain such as law, where training data is often difficult to create, finding useful ways to leverage such models may lead to very effective approaches.</p>
        <p>At the same time, the models have limitations. For example, they are sometimes prone to hallucination, i.e., making up untruthful information. While this is not something that we witnessed in our experiments, depending on the use case, this may involve certain risks. Further, the texts given by the model may not be accurate. For example, in the final example in Table 2, the suggested intervention posits that the seller is responsible for the packaging of the product. This could be seen as giving an opinion, and may not be desirable in a mediation context.</p>
        <p>To overcome these potential risks, the LLMediator primarily employs the LLMs to generate suggestions that have to be reviewed by a person before being sent out. Let us discuss the different use-cases for LLMs in the system.</p>
      </sec>
    </sec>
    <sec id="sec-16">
      <title>5.2. F1 - The reformulation of inflammatory messages</title>
      <p>The first LLM-enhanced feature of the LLMediator is the automatic reformulation of messages, to make the messages less inflammatory. This could help maintain the discussion at a positive and neutral tone, and increase the likelihood of the parties finding an amicable settlement. This feature is based on detecting inflammatory messages and then making a request to GPT-4 with the message and a prompt telling the LLM to reformulate it using a more neutral tone.</p>
      <p>We discuss two ways this message could then be used by the system in section 4.2.3. The first potential way is to automatically reformulate the message, and send the reformulated version to the other party directly. However, this approach has potential negative effects. Any inaccuracies in the reformulated message may lead to misunderstandings, and the sending party may be frustrated that they are not allowed to express themselves freely. Further, it raises interesting questions with regards to self-expression—is it really fair to put words in someone’s mouth in this manner?</p>
      <p>Similar points were discussed in the context of a study that investigated the introduction of a limit on the number of messages that can be sent in a certain time period for individuals discussing controversial topics. Experiment participants did indicate that the limitation made the messages written more intentional. However, they also expressed frustration regarding the imposed limit and its impact on building rapport between the parties [43].</p>
      <p>Instead, we chose a more feasible and less risky approach, of merely showing the reformulated message to the user, and giving them the choice of which message to send. Here, the AI system acts as an augmented intelligence that can help the parties convey their meaning in a more neutral way. Seeing a notification by the system telling the user that their message may contain inflammatory language could nudge them towards reconsidering their message. Such an approach has previously been tried by platforms such as Twitter, which found that users that were prompted to reconsider harmful or offensive tweet replies were 34% likely to reformulate their reply, or not send it at all [44].</p>
      <p>Our approach uses LLMs to additionally suggest a reformulated version to the user. This could be a very powerful way to further encourage them to write their message using a more acceptable tone, by making a concrete suggestion, which requires only the push of a button to send. However, the user is always in charge, and thus able to correct any mistakes in the message, or update it to more closely align with what they want to say, while retaining the more neutral tone.</p>
      <p>Thus, this system has a lot of potential in supporting the parties to settle their disputes in a friendly and efficient manner. At the same time, as an augmented intelligence system, the risks of hallucinations are relatively low—the system supports the user by suggesting a reformulation, but does not force them to write their message in a certain way. While a deployment of such a system to a real-world ODR platform would, of course, require more empirical evaluation of the risks of such a system, the framing as an augmented intelligence system significantly reduces the risk of such a deployment.</p>
    </sec>
    <sec>
      <title>5.3. F2 - Drafting messages for the mediator</title>
      <p>The second LLM-enhanced feature of the LLMediator is the drafting of a possible intervention for a mediator. In order to utilize this feature, the mediator can initiate a request for a suggestion, which asks the LLM to draft a message based on the previous messages exchanged by the parties. This message is then shown to the mediator, who has the option of editing the suggestion, or sending it to the parties. Further, they have the option to provide additional instructions to the model, which are taken into account by the LLM in drafting a suggestion.</p>
      <p>In this task, the model has significantly more freedom than in F1. We are not asking it simply to reformulate a message in a new tone, but to autonomously decide how to respond to the messages of the parties. The responses we examined do this by dynamically adapting to the conversation between the parties, e.g., by calming the discussion, trying to establish key facts, or even suggesting possible compromises that the parties may consider. None of these approaches were part of the prompt—rather, they were chosen by the model itself, when told to mediate the discussion. This kind of emergent capability is very interesting, and an impressive demonstration of the power of GPT-4 to perform different tasks.</p>
      <p>At the same time, just like F1, F2 is based on the concept of augmented intelligence. The system does not prescribe to the mediator what they should do, but rather offers a draft of the intervention message. The mediator can use the draft directly, or change it if they do not think it is appropriate. They can also give additional instructions to the model, which will be incorporated in the prompt. As can be seen in Table 2, the model has an impressive ability to take into account both of these instructions and the previous messaging context.</p>
      <p>Supporting the mediator in this way could be very important in areas where sufficient numbers of trained mediators or facilitators are not available (compare [10]). Verifying that a message is appropriate and helpful could be faster for an experienced mediator than drafting messages from scratch. Thus, fewer mediators could mediate more disputes, which could be an important contribution to access to justice.</p>
      <p>At the same time, it is important to be aware of the potential risks of deploying such a system. The interaction between the mediator and the LLM should be closely studied to make sure that the results are not harmful. For example, perhaps a suggested message could anchor a mediator to a certain type of message, even if another approach could have been more helpful. Likewise, the mediator may start to overly rely on the LLM-provided message to understand the previous messages written by the parties, which could be risky if the model starts to hallucinate. Finally, the assumption underlying the AI suggestion feature is that it is faster for a mediator to review and/or edit a suggested message than drafting a new message—empirical studies are needed to confirm whether this is the case.</p>
    </sec>
    <sec>
      <title>5.4. F3 - Autonomously intervening in the negotiation?</title>
      <p>The most radical use of the LLMediator is the direct intervention of the system in the negotiation. Here, the system acts as a mediator directly. Just like in F2, a message is drafted. However, instead of sending the message as a suggestion to the mediator, the messages are now sent directly to the parties.</p>
      <p>This approach carries with it substantial risks—since under this use, no humans verify the message before it goes to the user, any potential biased or inaccurate information it contains may have an effect on the negotiation between the parties. An example of this can be seen in Table 2 in the final example, where the LLM seems to indicate that one of the parties was responsible for packaging the product safely. In doing so, it could be seen to take the side of one of the parties, which may not be desirable for a mediator.</p>
      <p>Of course, the messages would very clearly indicate that they are written by an AI-based system, and the parties are free to disregard the messages of the mediator—they are, after all, a neutral third party aiming to help the parties, and not an adjudicator. However, even so, biased or inaccurate messages may lead the parties to lose trust in the platform, or even influence the negotiation towards an unfair outcome. Thus, considerable studies would have to be performed before the deployment of such a system.</p>
      <p>However, if the model can be prompted and deployed in a manner that is sufficiently safe, this approach could have tremendous implications for access to justice. Every single dispute, no matter how small, could be given a personalized, always available mediator. This could enable a form of mediation to be used in areas where disputes are too small to warrant a mediator intervention, or there are simply not enough mediators available. If these types of interventions could be shown to increase the rate of settlement between parties, this could be a significant contribution to society, by potentially creating a new way for individuals to resolve their legal problems.</p>
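<p>To make the described prompt flow concrete, the following Python sketch illustrates how the reformulation (F1) and intervention-drafting (F2) requests could be wired to an LLM. The prompt wording, the function names, and the injected llm callable are our own illustrative assumptions, not the actual LLMediator implementation; the fake_llm stub stands in for a real GPT-4 API call.</p>

```python
from typing import Callable, List, Tuple

# Illustrative prompts only -- assumed for this sketch, not the
# prompts actually used by the LLMediator system.
REFORMULATE_PROMPT = (
    "You will receive a message sent during a dispute negotiation. "
    "Reformulate it using a neutral, non-inflammatory tone, while "
    "keeping all important factual elements of the original message."
)

MEDIATE_PROMPT = (
    "You are a mediator in the dispute below. Encourage the parties "
    "to come to a settlement."
)

def reformulate_message(message: str, llm: Callable[[str, str], str]) -> str:
    """F1: ask the LLM to redraft one party's message in a neutral tone.

    The suggestion is shown to the author, who decides whether to send
    the original or the reformulated version (augmented intelligence).
    """
    return llm(REFORMULATE_PROMPT, message)

def draft_intervention(history: List[Tuple[str, str]],
                       llm: Callable[[str, str], str],
                       extra_instructions: str = "") -> str:
    """F2: draft a mediator intervention from the message history.

    Optional mediator instructions are appended to the prompt; the
    draft is reviewed and possibly edited by the human mediator.
    """
    system = MEDIATE_PROMPT
    if extra_instructions:
        system += " Additional instructions: " + extra_instructions
    transcript = "\n".join(f"{who}: {text}" for who, text in history)
    return llm(system, transcript)

# Stand-in for a GPT-4 API call, so the sketch runs offline.
def fake_llm(system_prompt: str, user_content: str) -> str:
    return f"[{system_prompt[:12]}...] {user_content[:20]}..."
```

<p>Under F3, the same drafted intervention would be posted directly to the parties rather than first being surfaced to the human mediator.</p>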
    </sec>
    <sec id="sec-17">
      <title>6. Future Work</title>
      <p>This work lays the groundwork for a number of important research directions.</p>
      <p>First, an important avenue is the empirical evaluation of the presented system. Understanding the actual efficacy of the system in dealing with conflicts is crucial in understanding whether the generated reformulations and interventions are relevant and useful, how they may be adapted, and how such a system may eventually be deployed. Such an evaluation might be done with experienced mediators who could assess the quality of the messages, and students taking the roles of parties in a negotiation. The evaluation should also examine whether the system presents any biases.</p>
      <p>Another avenue to explore is the further improvement and adaptation of the system. For example, while the prompts appear to work well, so-called “prompt engineering” may be an important step towards further improving the results, by altering the prompt sent to the LLM. There are also multiple features described in this paper that are not yet fully developed, such as the function to trigger a reformulation suggestion (see section 4.2.1) and the function to trigger an intervention by the AI-mediator (see 4.4.1). These are important features which need to be explored and elaborated.</p>
      <p>Finally, there may be other interesting ways of integrating LLMs into an ODR platform. For example, such a system could summarize the conversation held between the parties for the mediator or adjudicator. Further building on the LLMediator to explore such use-cases will be an important avenue for future work.</p>
    </sec>
    <sec>
      <title>7. Conclusion</title>
      <p>We presented the LLMediator, an experimental platform built to support negotiation and mediation using LLMs, such as GPT-4. Currently, the system is able to reformulate inflammatory messages, and suggest possible interventions to the mediator, or even send them directly to the parties. We show a few examples of the system being used in fictitious disputes, and discuss its efficacy. Overall, the approach seems promising and opens up the door for a lot of further research.</p>
    </sec>
    <sec>
      <title>Acknowledgments</title>
      <p>Hannes Westermann and Karim Benyekhlef would like to thank the Cyberjustice Laboratory at Université de Montréal, the LexUM Chair on Legal Information and the Autonomy through Cyberjustice Technologies (ACT) project for their support of this research.</p>
    </sec>
    <sec>
      <title>References</title>
      <p>[1] T. C. Farrow, A. Currie, N. Aylwin, L. Jacobs, D. Northrup, L. Moore, Everyday legal problems and the cost of justice in canada: Overview report, Osgoode Legal Studies Research Paper (2016).</p>
      <p>[2] A. Currie, The legal problems of everyday life, in: Access to justice, Emerald Group Publishing Limited, 2009.</p>
      <p>[3] L. Savage, S. McDonald, Experiences of serious problems or disputes in the canadian provinces, 2021, Juristat: Canadian Centre for Justice Statistics (2022) 1–28.</p>
      <p>[4] N. Semple, The cost of seeking civil justice in canada, Can. B. Rev. 93 (2015) 639.</p>
      <p>[5] Justice For All - Final Report, Technical Report, The Task Force on Justice, New York: Center on International Cooperation, 2019.</p>
      <p>[6] D. Carneiro, P. Novais, F. Andrade, J. Zeleznikow, J. Neves, Online dispute resolution: an artificial intelligence perspective, Artificial Intelligence Review 41 (2014) 211–240.</p>
      <p>[7] H. M. Erichson, The dark side of consensus and creativity: What mediators of mass disputes need to know about agency risks, Fordham L. Rev. 88 (2019) 2155.</p>
      <p>[8] S. R. Cole, K. M. Blankley, Online mediation: Where we have been, where we are now, and where we should be, U. Tol. L. Rev. 38 (2006) 193.</p>
      <p>[9] K. Benyekhlef, F. Gélinas, Online dispute resolution, Lex Electronica 10 (2005) i.</p>
      <p>[10] K. Branting, S. McLeod, S. Howell, B. Weiss, B. Profitt, J. Tanner, I. Gross, D. Shin, A computational model of facilitation in online dispute resolution, Artificial Intelligence and Law (2022) 1–26.</p>
      <p>[11] A. J. Schmitz, Measuring “access to justice” in the rush to digitize, Fordham L. Rev. 88 (2019) 2381.</p>
      <p>[12] K. Benyekhlef, J. Zhu, At the intersection of odr and artificial intelligence: Traditional justice at the crossroads, Lex Electronica 25 (2020) 34.</p>
      <p>[13] OpenAI, Gpt-4 technical report, 2023. arXiv:2303.08774.</p>
      <p>[14] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, Llama: Open and efficient foundation language models, 2023. arXiv:2302.13971.</p>
      <p>[15] L. K. Branting, Advisory systems for pro se litigants, in: Proceedings of the 8th international conference on Artificial intelligence and law, 2001, pp. 139–146.</p>
      <p>[16] D. Thompson, Creating new pathways to justice using simple artificial intelligence and online dispute resolution, IJODR 2 (2015) 4.</p>
      <p>[17] E. A. Bickel, M. A. van Dijk, E. Giebels, Online legal advice and conflict support: A dutch experience, Report, University of Twente (2015).</p>
      <p>[18] L.-C. Paquin, F. Blanchard, C. Thomasset, Loge–expert: from a legal expert system to an information system for non-lawyers, in: ICAIL 1991, 1991, pp. 254–259.</p>
      <p>[19] H. Westermann, K. Benyekhlef, Justicebot: A methodology for building augmented intelligence tools for laypeople to increase access to justice, in: Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law, 2023.</p>
      <p>[20] J. Zeleznikow, Can artificial intelligence and online dispute resolution enhance efficiency and effectiveness in courts, in: IJCA, volume 8, HeinOnline, 2016, p. 30.</p>
      <p>[21] R. Susskind, Online courts and the future of justice (2019).</p>
      <p>[22] K. Benyekhlef, J. Zhu, Intelligence artificielle et justice: justice prédictive, conflits de basse intensité et données massives, Intelligence 30 (2018).</p>
      <p>[23] E. Bellucci, J. Zeleznikow, Developing negotiation decision support systems that support mediators: a case study of the family_winner system, AI &amp; L. 13 (2005) 233.</p>
      <p>[24] A. Lodder, E. Thiessen, The role of artificial intelligence in online dispute resolution, in: Workshop on online dispute resolution at the international conference on artificial intelligence and law, Edinburgh, UK, 2003.</p>
      <p>[25] L. Kistemaker, Rechtwijzer and uitelkaar.nl. dutch experiences with odr for divorce, Family Court Review 59 (2021) 232–243.</p>
      <p>[26] R. Bergman, Chatgpt and mediation, 2023. URL: https://mediate.com/chatgpt-and-mediation/.</p>
      <p>[27] Q. Huang, W. Ouyang, Generate judge-view of online dispute resolution based on pretrained-model method, in: Intelligent Computing Methodologies: 18th International Conference, ICIC 2022, Xi’an, China, August 7–11, 2022, Proceedings, Part III, Springer, 2022, pp. 162–169.</p>
      <p>[28] F. Yu, L. Quartey, F. Schilder, Legal prompting: Teaching a language model to think like a lawyer, 2022. URL: https://arxiv.org/abs/2212.01326. doi:10.48550/ARXIV.2212.01326.</p>
      <p>[29] M. Bommarito II, D. M. Katz, Gpt takes the bar exam, arXiv preprint arXiv:2212.14402 (2022).</p>
      <p>[30] D. M. Katz, M. J. Bommarito, S. Gao, P. Arredondo, Gpt-4 passes the bar exam, Available at SSRN 4389233 (2023).</p>
      <p>[31] J. Goodhue, Y. Wei, Classification of trademark distinctiveness using openai gpt 3.5 model, Available at SSRN 4351998 (2023).</p>
      <p>[32] A. Blair-Stanek, N. Holzenberger, B. Van Durme, Can gpt-3 perform statutory reasoning?, arXiv preprint arXiv:2302.06100 (2023).</p>
      <p>[33] H.-T. Nguyen, R. Goebel, F. Toni, K. Stathis, K. Satoh, How well do sota legal reasoning models support abductive reasoning?, arXiv preprint arXiv:2304.06912 (2023).</p>
      <p>[34] S. Hamilton, Blind judgement: Agent-based supreme court modelling with gpt, arXiv preprint arXiv:2301.05327 (2023).</p>
      <p>[35] J. Tan, H. Westermann, K. Benyekhlef, Chatgpt as an artificial lawyer?, in: Artificial Intelligence for Access to Justice (AI4AJ 2023), 2023.</p>
      <p>[36] J. Savelka, K. Ashley, M. Gray, H. Westermann, H. Xu, Can gpt-4 support analysis of textual data in tasks requiring highly specialized domain expertise?, in: ASAIL’23: 6th Workshop on Automated Semantic Analysis of Information in Legal Text, 2023.</p>
      <p>[37] J. Savelka, Unlocking practical applications in legal domain: Evaluation of gpt for zero-shot semantic annotation of legal texts, arXiv preprint arXiv:2305.04417 (2023).</p>
      <p>[38] J. Savelka, K. Ashley, M. Gray, H. Westermann, H. Xu, Explaining legal concepts with augmented large language models (gpt-4), in: AI4Legs 2023: AI for Legislation, 2023.</p>
      <p>[39] H. Westermann, J. Savelka, V. R. Walker, K. D. Ashley, K. Benyekhlef, Computer-assisted creation of boolean search rules for text classification in the legal domain, in: JURIX, 2019, pp. 123–132.</p>
      <p>[40] H. Westermann, J. Savelka, V. R. Walker, K. D. Ashley, K. Benyekhlef, Sentence embeddings and high-speed similarity search for fast computer assisted annotation of legal documents, in: Legal Knowledge and Information Systems: JURIX 2020: The Thirty-third Annual Conference, Brno, Czech Republic, December 9–11, 2020, volume 334, IOS Press, 2020, p. 164.</p>
      <p>[41] H. Westermann, S. Meeùs, M. Godet, A. Troussel, J. Tan, J. Savelka, K. Benyekhlef, Bridging the gap: Mapping layperson narratives to legal issues with language models, in: ASAIL’23: 6th Workshop on Automated Semantic Analysis of Information in Legal Text, 2023.</p>
      <p>[42] S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg, et al., Sparks of artificial general intelligence: Early experiments with gpt-4, arXiv preprint arXiv:2303.12712 (2023).</p>
      <p>[43] T. W. Masrani, J. Jamieson, N. Yamashita, H. A. He, Slowing it down: Towards facilitating interpersonal mindfulness in online polarizing conversations over social media, Proceedings of the ACM on Human-Computer Interaction 7 (2023) 1–27.</p>
      <p>[44] A. Butler, A. Parrella, Tweeting with consideration, 2021. URL: https://blog.twitter.com/en_us/topics/product/2021/tweeting-with-consideration.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>