LLMediator: GPT-4 Assisted Online Dispute Resolution

Hannes Westermann1,*, Jaromir Savelka2 and Karim Benyekhlef1
1 Cyberjustice Laboratory, Faculté de droit, Université de Montréal, Québec, Canada
2 School of Computer Science, Carnegie Mellon University, Pittsburgh, USA

Abstract
In this article, we introduce LLMediator, an experimental platform designed to enhance online dispute resolution (ODR) by utilizing capabilities of state-of-the-art large language models (LLMs) such as GPT-4. In the context of high-volume, low-intensity legal disputes, alternative dispute resolution methods such as negotiation and mediation offer accessible and cooperative solutions for laypeople. These approaches can be carried out online on ODR platforms. LLMediator aims to improve the efficacy of such processes by leveraging GPT-4 to reformulate user messages, draft mediator responses, and potentially autonomously engage in the discussions. We present and discuss several features of LLMediator and conduct initial qualitative evaluations, demonstrating the potential for LLMs to support ODR and facilitate amicable settlements. The initial proof of concept is promising and opens up avenues for further research in AI-assisted negotiation and mediation.

Keywords
LLMediator, GPT-4, Access to Justice, Large Language Models, Online Dispute Resolution, ODR, Augmented Intelligence

Workshop on Artificial Intelligence for Access to Justice (AI4AJ 2023), June 19, 2023, Braga, Portugal.
* Corresponding author.
hannes.westermann@umontreal.ca (H. Westermann); jsavelka@cs.cmu.edu (J. Savelka); karim.benyekhlef@umontreal.ca (K. Benyekhlef)
ORCID: 0000-0002-4527-7316 (H. Westermann); 0000-0002-3674-5456 (J. Savelka); 0000-0001-9390-556X (K. Benyekhlef)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Many individuals face difficulties while resolving their legal disputes. This is especially the case in areas of high-volume, low-intensity disputes (such as debt, consumer and employment issues), which often affect laypeople, i.e., individuals without legal training [1]. Laypeople may often not understand what legal rules apply to their situation, which can be an impediment to enforcing their rights [2]. Even if they are aware of their rights, they may struggle to come to a resolution (see, e.g., [3]). Bringing the dispute to court could entail significant monetary, temporal and psychological costs [4]. Unresolved legal issues can be very stressful and may even impact individuals' health. Unresolved legal issues were estimated to cost society 746 million dollars in Canada alone in 2014 [1]. Globally, the United Nations Task Force on Justice estimates that 1.5 billion individuals are unable to resolve their legal issues at any point in time [5].

A popular way of increasing access to justice is the use of so-called alternative dispute resolution (ADR) methods, i.e., methods of solving disputes outside of the traditional judicial system [6]. Such methods include, e.g., negotiation and mediation. During negotiation, the parties communicate with each other, attempting to reach an amicable agreement. In mediation, a neutral third party is expected to establish a rapport with the parties, to bring creativity in thinking about potential solutions, and to identify overlapping interests between the parties that may be helpful in achieving a consensus between them [7, 8]. These processes can be carried out online, in so-called online dispute resolution (ODR) [9]. This may have several advantages, including being more efficient and convenient for the individuals, who do not need to physically attend a court. The cooperative and non-adversarial nature of these approaches to dispute resolution may help the parties devise an amicable solution that both parties feel ownership of [6, 9, 10, 11]. ODR can be a very powerful approach—e.g., in PARLe-OPC, an ODR platform developed by the Cyberjustice Laboratory and implemented with the Quebec consumer protection authority, 70% of cases are settled, 45% at the stage of negotiation and 25% at the stage of mediation [12].

To foster further improvements in the usability and efficacy of ODR systems, we present LLMediator—an experimental platform focused on exploring how large language models (LLMs) such as GPT-4 may be used to support the parties and the mediator in arriving at a settlement. LLMs have recently risen to fame as very powerful AI systems. Models such as GPT-4 [13] or LLaMA [14] are able to perform various tasks based on a textual prompt, which is often not too far from instructions understandable to a human. The models can also be accessed via an application programming interface (API), which allows developers to integrate the capabilities into their own applications.

LLMediator is an experimental system leveraging GPT-4 to enhance an ODR platform. The system can reformulate users' messages to be, e.g., less emotional and more conducive to reaching a settlement. Further, LLMediator can suggest draft messages for the mediator (or facilitator), taking into account previous messages by the parties and special instructions by the mediator. Finally, LLMediator offers an experimental feature to autonomously respond to the discussion by the parties. In this paper, we demonstrate selected features of LLMediator and perform initial qualitative evaluations of this system.

2. Related Work

Using artificial intelligence (AI) to increase access to justice is an often studied question. For example, systems have been built that seek to help users understand their rights in certain situations (see, e.g., [15, 16, 17, 18, 19]). In this work, we present an approach to facilitate dispute resolution via augmented intelligence tools for negotiation and mediation.

Integrating AI into the negotiation process has previously been explored by multiple authors. One possible such integration is the display of a so-called "Best Alternative to the Negotiated Agreement" (BATNA) during negotiation, which can provide contextual information helping the users find a fair agreement (see, e.g., [6, 20, 21, 22, 17]). Other approaches include using game-theory methods to elicit an acceptable settlement (see, e.g., [23, 24]), and structuring the dialogue between parties [25]. Here, we detect messages by the parties containing inflammatory language, and suggest alternative ways of conveying the message that may be more amenable to amicable resolution.

Similarly, AI systems have been built to support mediation. Branting et al. developed a system that monitors messages exchanged during negotiation for situations that require an immediate human intervention, such as conflict escalation or inactivity. The system can further suggest standard messages, that may be relevant to a particular dispute, to the facilitator [10]. Researchers have further investigated the possibility of using ChatGPT to suggest relevant questions and potential party priorities to support the mediator in their role [26]. In this work, we evaluate the ability of LLMs to generate suggestions for the mediator, or potentially even intervene in a negotiation autonomously, which could support mediators or even be used to provide certain mediation services in areas where traditional mediation is not feasible (e.g., due to the costs).

Huang and Ouyang proposed a technique for developing a judge perspective generation model in the context of ODR. The approach employs pre-trained LLMs, and is adaptable to various scenarios through minimal low-code fine-tuning [27].

Our work explores the use of LLM models, e.g., GPT-4, to support legal tasks. There has been a growing interest in exploring capabilities of GPT models in such applications. Yu et al. applied GPT-3 to the COLIEE legal entailment task that is based on the Japanese Bar exam, substantially improving over the existing state-of-the-art [28]. Similarly, Bommarito II and Katz utilized GPT-3.5 for the Multistate Bar Examination [29]. The model performed surprisingly well, but not well enough to pass the exam. Later, Katz et al. applied GPT-4 to the entire Uniform Bar Examination (UBE) and observed the system passing the exam [30]. The use of GPT models has further been explored in the context of the assessment of trademark distinctiveness [31], legal reasoning [32, 33], U.S. Supreme Court judgment modeling [34] and giving legal information [35]. Likewise, GPT-4 has been used to annotate textual legal data based on annotation guidelines [36, 37] and to generate explanations of legal concepts based on sentences from statutes [38]. Here, we use the models to increase access to justice, by supporting parties in negotiation and mediation.

In this work, the key uses we explore employ GPT-4 as an augmented intelligence tool that supports the users in their decision-making, as opposed to directly providing instructions or advice, which can decrease the risk associated with such tools. Previously, this approach has been explored in the legal domain to generate boolean search rules [39], perform annotations more efficiently [40], and provide legal information to laypeople [19, 41].

3. Example use cases

LLMediator is a web-based application that contains the necessary features to explore and evaluate the integration of LLMs into the negotiation and mediation processes. Upon entering the system, the user is exposed to a chat interface that allows them to discuss their issue with the other party, with the aim to come to an amicable solution, as can be seen in Figure 1. In the experimental prototype, the user can select whether they wish to intervene as one of the parties (John or Jane) or the mediator. This part of the platform is similar to other existing ODR platforms that allow the parties to communicate in a convenient and asynchronous manner without having to physically relocate.

However, the LLMediator platform also has several novel aspects that use the power of LLMs to support various tasks in the mediation platform. These features include:

1. F1 - Suggesting reformulations of inflammatory messages to be less emotional and, hence, more conducive to finding an amicable settlement
2. F2 - Drafting message suggestions for mediators to more efficiently intervene in a discussion
3. F3 - Potentially, autonomously intervening in a discussion to guide the parties towards an amicable solution

Figure 1: A screenshot of the LLMediator interface, showing a dispute regarding a broken camera.

Figure 2: The LLMediator has detected a message that is not helpful for an amicable settlement. It suggests a reformulation that the user may consider to send instead.

The technical implementation of these features is described in Section 4.

3.1. F1 - Reformulating inflammatory messages

Emotions can run high in negotiation situations. In Figure 2, the user intends to send a message that is inflammatory, and may derail attempts to find a mutually acceptable solution. The LLMediator platform detects this language, and offers an alternative formulation to the user, generated by GPT-4. The user can choose to send the original message, send the reformulated message directly, or edit and then send the reformulated message. The suggestion may nudge the user toward stating their request in a more neutral fashion, which could be more helpful for arriving at an amicable resolution to their dispute.

3.2. F2 - Drafting messages for the mediator

When negotiation is in deadlock or does not result in a settlement, it may be useful for a neutral, third-party mediator to intervene in the discussion. Let us consider the role of the mediator to see how the LLMediator can support them in the task of encouraging the parties to arrive at a friendly resolution.

The mediator can press a button in the interface to generate a draft intervention. This feature uses GPT-4 to read the previously sent messages and draft a suggested message that gently guides the parties toward a friendly resolution. Figure 3 shows such a suggested intervention.

This message may serve as a starting point for the mediator to decide how to intervene. They have the option to send the message, or edit it and then send it. Further, they can provide additional instructions to GPT-4, guiding the model to emphasize certain aspects. For example, the mediator may guide the parties to consider the packaging of the camera. Figure 4 shows a screenshot of what happens when the mediator instructs the model to explore this issue with the parties.

Figure 3: The LLMediator suggests a possible intervention by the mediator.

Figure 4: The LLMediator incorporates specific instructions for the mediator in generating a message.

3.3. F3 - Autonomously intervening in the negotiation?

In some situations, it may make sense to allow the model to automatically intervene in the negotiation. This could be the case when the dispute value is too low for employing a human mediator, or if there are not enough mediators to attend to all disputes in a certain area.

In Figure 5, one of the parties requested an intervention by the LLMediator, to move the discussion forward. The LLMediator autonomously generated a message and sent it to the parties, suggesting a few possible options.
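The autonomous flow just described can be made concrete with a small sketch. The function and variable names below are hypothetical and not taken from the LLMediator code base; only the party-request trigger and the ten-message context window described later in the paper (Sections 4.4.1 and 4.3.1) are assumed:

```python
# Hypothetical sketch of feature F3's trigger logic; names are
# illustrative and not taken from the actual LLMediator system.

def should_intervene(transcript, intervention_requested):
    """The platform currently only intervenes when a party explicitly
    requests it (see Section 4.4.1); other triggers, such as periods of
    inactivity, could be added here."""
    return intervention_requested and len(transcript) > 0

def build_context(transcript, max_messages=10):
    """Section 4.3.1 describes providing the model with the 10 latest
    messages of the conversation as context."""
    return transcript[-max_messages:]
```

Under these assumptions, `build_context` would pass only the most recent exchanges to GPT-4, keeping the prompt short for the model version used.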
These suggestions aimed to encourage a settlement. There may also be other triggers to move the system to intervene, such as a period of inactivity, or inflammatory messages being sent (see Section 4.4.1). As will be discussed in Section 5.4, this approach may carry with it certain risks that would need to be carefully studied before it could be deployed.

Figure 5: The LLMediator could potentially automatically intervene in a discussion.

We have now seen the three current capabilities of the LLMediator to support the parties in a mediation. As we can see, the system suggests relevant and context-adapted reformulations and interventions that may encourage the parties to use calmer language, or save the mediator's time. Next, let us take a look at the technical considerations behind the system.

4. Technical considerations

For the features described in the preceding section, a number of choices need to be considered. These involve questions such as when the features should be activated and how the LLM should be prompted to achieve the best results. In this section, we will explore these considerations.

4.1. Large language model used

For the current version of the system, we used GPT-4 (Generative Pre-trained Transformer). GPT-4 is a multimodal pre-trained LLM created by OpenAI [13]. The model has shown impressive performance on a wide variety of tasks, including passing the Uniform Bar Examination [30]. Some researchers even argue that the model shows sparks of general intelligence [42]. We interacted with the 8k token version of the model via the OpenAI API.1

1 GitHub: OpenAI Python Library. Available at: https://github.com/openai/openai-python [Accessed 2023-02-09]

4.2. F1 - Reformulating inflammatory messages

The first capability of the LLMediator is the reformulation of inflammatory messages. The motivation behind this feature is that individuals may have a lot of emotions attached to their dispute, which could be reflected in the tone of their messages. Such messages may increase the tension and aggravation between the parties rather than decrease it, making it less likely that the parties will find an amicable agreement, which is the goal of the negotiation process.

In order to overcome this issue, the LLMediator contains a feature that aims to reformulate the message of the user to a more neutral tone, while preserving the overall semantic content of the message. This way, the parties are able to communicate their opinions without escalating the tone of the conversation.

4.2.1. Detect a message requiring intervention

In order to only intervene when necessary, the LLMediator needs a way to detect whether a message contains inflammatory language and would benefit from reformulation. Thus, the first step is to classify the tone of the message. It should be noted that the level of emotion and anger in a message that should lead to an intervention is a policy decision—different thresholds may lead to different results. Further, one could imagine multiple methods of detecting whether a message may be inflammatory. These methods include:

1. Contacting an LLM, such as GPT-4, with every message and inquiring whether it is inflammatory or not. Depending on the volume of messages on a platform, this method may be expensive and introduce latency into the platform, which may be confusing for the user, since the message needs to be analyzed before it is sent to the other party.
2. Training a bespoke machine learning model to detect whether a message warrants intervention. This approach was used by the authors in [10]. The researchers trained a model on previous messages from an ODR court, to detect whether immediate intervention was necessary, and suggest standard message responses.
3. Using an off-the-shelf machine learning model. Sentiment detection is a well-studied field, with many available models. These models could be used to detect messages with a negative polarity, or even to detect certain emotions (such as angry or sad), and use thresholds to trigger an intervention.
4. Using keyword search. The simplest approach is to scan the message sent by the user for inflammatory words, such as "a**hole".
5. Letting the user decide that they would like a suggestion to reformulate a message, e.g., if they are aware that their message may come across as angry or defensive, and would like a suggestion as to how they may express themselves in a more neutral fashion.

For the LLMediator platform, the main focus is investigating whether the model is able to reformulate the message in a useful way. Therefore, we chose options 4 and 5. The system scans the message for inflammatory terms, and suggests a reformulation if any such terms are found. Further, the user has the option to manually activate the reformulation dialog. More sophisticated methods to detect inflammatory messages will be investigated in future work.

4.2.2. Reformulating the message

Once the inflammatory message has been detected, the next step is to prepare a reformulated message with a more constructive tone, while maintaining the core meaning of the message. The message is thus sent to an LLM, together with a prompt instructing the LLM to reformulate the message. For the LLMediator, we are using the GPT-4 API provided by OpenAI, together with the following prompt:

"You are an ODR platform. You are given the chat message of a party. Reformulate the message to maintain the content, but make it less confrontational and more encouraging for an amicable settlement. Respond directly with the reformulated message, do not explain."

4.2.3. Showing the reformulation to the user

Once the inflammatory message has been detected and reformulated, it needs to be re-introduced into the system in a useful manner. Ways of doing this include:

1. The reformulated message could be sent directly to the other party.
2. The user could be provided with the reformulated version of their message, and given the option to send or edit the reformulated message, or send their original message.

For the LLMediator, we chose the second option, since this preserves the control of the user over their communication, while potentially nudging them toward a more positive tone. The user is thus shown the reformulated message, and has the option to either accept and send the reformulated version, edit the reformulated version and then send it, or to send the original message.

4.2.4. Additional examples

Table 1 shows a few additional examples of reformulated messages:

Original message: What the ****? I told you about the water leak weeks ago and you did nothing! Fix it or I will see you in court!
Reformulated message: I wanted to remind you that I brought up the water leak issue a few weeks ago. It would be great if we could find a solution to address it before considering legal action.

Original message: You still have not repaid me the 1000 USD I lent you! You are the worst friend ever, we are done!
Reformulated message: It seems that the 1000 USD I lent you hasn't been repaid yet. As friends, let's discuss this issue and work towards resolving it amicably.

Original message: Here is what happened: I told you that the tree was hanging over my lawn on the 3rd of April. On the 15th, it was still there, so I cut it down. This is your ******* fault, you could have fixed it!!
Reformulated message: I noticed on April 3rd that the tree was overhanging my lawn. Despite addressing the issue, it remained unchanged by the 15th, which led me to cut it down. I believe this situation could have been avoided if timely action was taken on your part.

Table 1: Example of original and reformulated messages.

4.3. F2 - Drafting messages for the mediator

Another feature of the LLMediator is to provide a suggestion for an intervention that a mediator might want to send to the parties of a dispute. In mediation, a third-party mediator aims to intervene in a discussion to guide it toward a friendly settlement between the parties. Such an intervention can be very powerful and positive.

In the LLMediator, we provide a feature that allows the mediator to receive a suggestion for a message that they may consider as an intervention in a discussion by the parties. The suggestion of pre-defined messages that a mediator may want to use, based on a discussion, was previously explored in [10]. In our research, the suggested messages are generated ad hoc using an LLM, in our case the GPT-4 API provided by OpenAI. The mediator can launch the draft feature by clicking a button in their chat interface.

4.3.1. Generating the message suggestion

In order to generate the message, we use the following prompt:

"You are a mediator. Your goal is to guide the discussion of two parties towards an amicable settlement that is acceptable to both parties. Respond to this communication between the parties. Stick to the role of the mediator - do not complete the dialog of the parties. Remain neutral, do not take the side of any party."

This prompt was arrived at after empirical investigations. Further, we provide the model with a context of the 10 latest messages in the conversation. In case the mediator enters an additional instruction (see Figure 4), these instructions are also sent to the model. Based on this context, the model can gain an understanding of the discussion and hopefully provide a helpful draft message to the mediator.

4.3.2. Further examples

Table 2 shows a number of examples where the situation described above in Section 3 was used, after an additional follow-up by John inquiring why he should be liable when the postal service is clearly at fault. The table presents the suggested intervention by the model, optionally together with additional instructions provided by the mediator.

Mediator instructions: None
Suggested intervention: Thank you for expressing your concern, John. It's clear that the situation isn't ideal for either party. One possible solution could be to file a claim with the postal service to seek reimbursement for the damaged camera. That way, Jane can receive compensation for the broken camera and the responsibility would shift to the postal service. Would both of you be open to trying this approach to reach a resolution?

Mediator instructions: Inquire whether there might be an insurance offered by the trading platform used
Suggested intervention: John, I understand your concern. It might be possible that the trading platform you have used for the transaction offers some form of insurance or buyer/seller protection. In order to consider this as an option, could you please let us know which platform you used for the transaction and if they offer anything in this regard? This might help both of you reach a fair and amicable resolution.

Mediator instructions: Ask the parties to clarify the model, value and state of the sold good.
Suggested intervention: I understand your concerns, John. However, it's important to consider that part of the responsibility lies in the packaging of the item to ensure its safe delivery. In order to evaluate the options more fairly, could both of you please provide more information about the camera, such as the model and the estimated value, as well as its condition at the time of the sale? This will allow us to further discuss the possible solutions mentioned earlier and find a resolution that both parties find satisfactory.

Table 2: Example of suggested interventions, based on the dispute around the broken camera and optional instructions provided by the mediator.

4.4. F3 - Autonomously intervening in the negotiation?

The most radical use of the LLMediator is the direct intervention of the system in the negotiation. Here, the system acts as a mediator itself. While this could be a very interesting and powerful use-case, it also poses a number of substantial risks, as will be explored in Section 5.4. Thus, considerable studies would have to be performed prior to the deployment of such a system. Here, it is presented as a purely experimental feature.

4.4.1. Triggers

For the LLMediator to be able to intervene, there needs to be a trigger to determine when it should send a message. Some possibilities for such a trigger include:

1. Intervening when there has been a period of inactivity (compare [10]).
2. Intervening when the discussions between the parties are getting heated (compare [10] and Section 4.2.1).
3. Intervening every few messages (e.g., every 10 messages sent) to shape and guide the discussion.
4. Intervening when asked to by either of the parties.

Since the focus of the LLMediator is to evaluate the capability of the LLM models to interact with the user in a useful way, we have currently only implemented the last option.

4.4.2. Generating the message

Once it has been determined that the system should intervene, the previous messages are sent to the LLM, which will generate a response that can be provided by the system to the user. In order to generate the message, the same procedure as described above in Section 4.3.1 is used. However, in this case, the message is not given to the human mediator, but rather sent directly to the parties in the conversation.

4.4.3. Additional examples

To better understand the types of messages suggested by the LLMediator, Table 3 lists a few example disputes, and the messages generated by the system.

5. Discussion

In this paper, we have demonstrated selected capabilities of LLMediator, an experimental platform to explore the use of LLMs in an ODR context. The platform currently offers several features, including (F1) the reformulation of messages to a more neutral tone, (F2) the assisting of the mediators by providing suggested interventions, and (F3) the optional direct intervention of the system in the negotiation. Let us discuss selected aspects of these different features, including their potential and possible risks.

5.1. The large language models

An important takeaway of this research is the power of GPT-4. We use the model to reformulate messages to be less inflammatory, and to suggest or directly send interventions to negotiating parties.

The capability of the model to accomplish these tasks is very promising. While a more in-depth evaluation is required to better understand the different failure modes and fully understand the performance, the model seems to be capable of understanding a wide variety of contexts.

Dispute summary: A tenant (Jane) wants specific performance and damages for a water leak that they supposedly informed their landlord (John) of 5 weeks ago, claiming the landlord did not respond. The landlord claims that they were never informed.
Generated intervention: As a mediator, I would like to help Jane and John resolve this issue. It appears there may be a misunderstanding about the communication that has taken place. Firstly, let's try to establish the facts. Jane, could you please provide more information about when and how you informed John about the water leak? And John, is there any possibility that you might have missed or overlooked this communication? Let's work together to find a fair and acceptable solution for both parties.

Dispute summary: A person (John) wants damages for lost wages, after falling on a road in front of the other party's apartment. The other party (Jane) claims that they did clear the snow that day.
Generated intervention: As your mediator, I understand that both of you have concerns and perspectives on this issue. John, you experienced an accident that resulted in lost wages due to the snow outside Jane's apartment.
Jane, you claim that you had cleared the snow earlier that day. To move forward, let’s first establish the extent of responsibility each party has in this situation. This includes discussing the circumstances of the accident further and any relevant information, such as local laws or regulations regarding snow removal. Would you both be willing to discuss in more detail the specifics of the incident and the snow removal practices at Jane’s apart- ment? This way we can better understand the situation and work towards an amicable agreement. John borrowed 400 CAD from Jane to repay some credit Thank you for providing more context about the situation. card debt, but became unemployed and is now unable to It seems like the initial agreement was informal and based repay. Jane wants the money back with interest. Multiple on John’s promise to repay when he got his next paycheck. messages have been exchanged back and forth. However, John is currently unemployed, which makes the repayment more challenging. As a suggestion, would both of you be open to discussing a repayment plan that takes John’s current financial situation into consideration without burdening Jane? This could include re-assessing the interest or agreeing on a feasible timeframe to repay the debt. Jane ordered seeds from Johns website, but never received As the mediator in this situation, I would like to remind both them. John claims to habe no record of the order. A heated parties to remain respectful during this discussion. Jane, conversation ensues. I understand that you have concerns regarding the status of your order, and John, I hear that you have no record of the transaction. Let’s try to work together to identify and resolve the issue. Jane, would you mind providing any evi- dence or details regarding your order, such as a transaction ID, order number, or a confirmation email? This will help John to verify your order in his system. 
John, please be patient while we gather this information, and once we have it, I kindly ask you to look into your system to confirm Jane's order. By acting in a respectful and cooperative manner, we can work on finding a solution that satisfies both parties.

Table 3: Examples of generated interventions based on different disputes.

contexts, and providing relevant and appropriate responses, clearly taking into account the provided instructions and user messages. When reformulating a message, the model appears to maintain the important elements of the message, while at the same time changing the tone. When drafting messages for the mediator, the model has no problem understanding the messages exchanged between the parties and generating a relevant response.

It is interesting how many of these capabilities are emergent behaviors of the model. The prompts we provided are very short, and an interesting approach for future work would be to examine the impact of further "tuning" these prompts to achieve better results. However, even so, the model seems to exhibit remarkable behavior that corresponds well with what it is expected to do. For example, as we can see in Section 4.3.1, the model is essentially only told that it should act as a mediator and encourage the parties to come to a settlement. Based on this simple prompt, the model uses a number of interesting and relevant techniques, such as establishing the facts when they are unclear, reminding the parties to be respectful if necessary, and even suggesting relevant solutions to the parties. It seems likely that the model, through being trained on billions of pages of text by OpenAI, has learnt what kind of interventions a mediator may undertake, and even which solutions could be appropriate for different kinds of disputes.

We want to highlight the fact that all of these tasks were accomplished without any training on our part whatsoever. Previously, a model able to read a message and provide a reformulated version, or draft context-specific interventions, would likely have required a dataset of thousands of existing examples. Even then, it is questionable whether the performance would reach that of the GPT-4 model used in this case, as these are very complex tasks, especially considering the variety of different domains the model seems to be able to handle. In a domain such as law, where training data is often difficult to create, finding useful ways to leverage such models may lead to very effective approaches.

At the same time, the models have limitations. For example, they are sometimes prone to hallucination, i.e., making up untruthful information. While this is not something that we witnessed in our experiments, depending on the use case, this may involve certain risks. Further, the texts given by the model may not be accurate. For example, in the final example in Table 2, the suggested intervention posits that the seller is responsible for the packaging of the product. This could be seen as giving an opinion, and may not be desirable in a mediation context. To overcome these potential risks, the LLMediator primarily employs the LLMs to generate suggestions that have to be reviewed by a person before being sent out. Let us discuss the different use-cases for LLMs in the system.

5.2. F1 - The reformulation of inflammatory messages

The first LLM-enhanced feature of the LLMediator is the automatic reformulation of messages, to make the messages less inflammatory. This could help maintain the discussion at a positive and neutral tone, and increase the likelihood of the parties finding an amicable settlement. This feature is based on detecting inflammatory messages and then making a request to GPT-4 with the message and a prompt telling the LLM to reformulate it using a more neutral tone.

We discuss two ways this message could then be used by the system in section 4.2.3. The first potential way is to automatically reformulate the message, and send the reformulated version to the other party directly. However, this approach has potential negative effects. Any inaccuracies in the reformulated message may lead to misunderstandings, and the sending party may be frustrated that they are not allowed to express themselves freely. Further, it raises interesting questions with regards to self-expression—is it really fair to put words in someone's mouth in this manner?

Similar points were discussed in the context of a study that investigated the introduction of a limit on the number of messages that can be sent in a certain time period for individuals discussing controversial topics. Experiment participants did indicate that the limitation made the written messages more intentional. However, they also expressed frustration regarding the imposed limit and its impact on building rapport between the parties [43].

Instead, we chose a more feasible and less risky approach of merely showing the reformulated message to the user, and giving them the choice of which message to send. Here, the AI system acts as an augmented intelligence that can help the parties convey their meaning in a more neutral way. Seeing a notification by the system telling the user that their message may contain inflammatory language could nudge them towards reconsidering their message. Such an approach has previously been tried by platforms such as Twitter, which found that 34% of users who were prompted to reconsider harmful or offensive tweet replies reformulated their reply or did not send it at all [44].

Our approach uses LLMs to additionally suggest a reformulated version to the user. This could be a very powerful way to further encourage them to write their message using a more acceptable tone, by making a concrete suggestion that requires only the push of a button to send. However, the user is always in charge, and thus able to correct any mistakes in the message, or update it to more closely align with what they want to say, while retaining the more neutral tone.
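As a concrete illustration of this flow, a minimal sketch in Python follows. The prompt wording, function names, and the `complete` callback (standing in for a GPT-4 chat-completion call) are our own assumptions for illustration, not the actual LLMediator implementation.

```python
# Sketch of the F1 flow: ask the LLM to rewrite a potentially
# inflammatory message in a neutral tone, then return both versions
# so the user decides which one is sent.
# Prompt wording and names are illustrative assumptions.

REFORMULATE_SYSTEM_PROMPT = (
    "You are assisting parties in an online dispute resolution process. "
    "Rewrite the user's message in a neutral, non-inflammatory tone, "
    "keeping the factual content and intended meaning."
)

def build_reformulation_request(message):
    """Build the chat messages for a GPT-4 style chat-completion call."""
    return [
        {"role": "system", "content": REFORMULATE_SYSTEM_PROMPT},
        {"role": "user", "content": message},
    ]

def suggest_reformulation(message, complete):
    """Return both versions, so the user chooses what is actually sent.

    `complete` is a callback that takes the chat messages and returns
    the model's reply text (e.g., a thin wrapper around an LLM API).
    """
    suggestion = complete(build_reformulation_request(message))
    return {"original": message, "suggestion": suggestion}

# Example with a stand-in for the model:
result = suggest_reformulation(
    "Your seeds never showed up, this is a scam!",
    lambda msgs: "I was disappointed that the order did not arrive.",
)
```

Returning both versions keeps the user in charge: the suggestion can be accepted, edited, or ignored.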
Thus, this system has a lot of potential in supporting the parties to settle their disputes in a friendly and efficient manner. At the same time, as an augmented intelligence system, the risks of hallucinations are relatively low—the system supports the user by suggesting a reformulation, but does not force them to write their message in a certain way. While a deployment of such a system to a real-world ODR platform would, of course, require more empirical evaluation of its risks, the framing as an augmented intelligence system significantly reduces the risks of such a deployment.

5.3. F2 - Drafting messages for the mediator

The second LLM-enhanced feature of the LLMediator is the drafting of a possible intervention for a mediator. In order to utilize this feature, the mediator can initiate a request for a suggestion, which asks the LLM to draft a message based on the previous messages exchanged by the parties. This message is then shown to the mediator, who has the option of editing the suggestion or sending it to the parties. Further, they have the option to provide additional instructions to the model, which are taken into account by the LLM in drafting the suggestion.
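A minimal sketch of this drafting step follows; the prompt wording, function names, and the `complete` callback (standing in for a GPT-4 chat-completion call) are assumptions for illustration, not the actual LLMediator code.

```python
# Sketch of the F2 flow: build a drafting prompt from the parties'
# prior messages, optionally folding in extra instructions from the
# mediator, and return a draft for the mediator to review.
# Prompt wording and names are illustrative assumptions.

MEDIATOR_SYSTEM_PROMPT = (
    "You are a neutral mediator in an online dispute resolution process. "
    "Draft a message to the parties that encourages them to reach an "
    "amicable settlement."
)

def build_mediator_request(history, instructions=""):
    """Build chat messages from (party_name, message) pairs, plus
    optional additional instructions from the mediator."""
    system = MEDIATOR_SYSTEM_PROMPT
    if instructions:
        system += " Additional instructions from the mediator: " + instructions
    transcript = "\n".join(f"{name}: {text}" for name, text in history)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]

def draft_intervention(history, complete, instructions=""):
    """Return a draft for the mediator to review, edit, or discard.

    `complete` wraps the LLM call and returns the model's reply text.
    """
    return complete(build_mediator_request(history, instructions))
```

The draft is only a suggestion: the mediator reviews, edits, or discards it before anything reaches the parties.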
In this task, the model has significantly more freedom than in F1. We are not asking it simply to reformulate a message in a new tone, but to autonomously decide how to respond to the messages of the parties. The responses we examined do this by dynamically adapting to the conversation between the parties, e.g., by calming the discussion, trying to establish key facts, or even suggesting possible compromises that the parties may consider. None of these approaches were part of the prompt—rather, they were chosen by the model itself, when told to mediate the discussion. This kind of emergent capability is very interesting, and an impressive demonstration of the power of GPT-4 to perform different tasks.

At the same time, just like F1, F2 is based on the concept of augmented intelligence. The system does not prescribe to the mediator what they should do, but rather offers a draft of the intervention message. The mediator can use the draft directly, or change it if they do not think it is appropriate. They can also give additional instructions to the model, which will be incorporated in the prompt. As can be seen in Table 2, the model has an impressive ability to take into account both of these instructions and the previous messaging context.

Supporting the mediator in this way could be very important in areas where sufficient numbers of trained mediators or facilitators are not available (compare [10]). Verifying that a message is appropriate and helpful could be faster for an experienced mediator than drafting messages from scratch. Thus, fewer mediators could mediate more disputes, which could be an important contribution to access to justice.

At the same time, it is important to be aware of the potential risks of deploying such a system. The interaction between the mediator and the LLM should be closely studied to make sure that the results are not harmful. For example, perhaps a suggested message could anchor a mediator to a certain type of message, even if another approach could have been more helpful. Likewise, the mediator may start to overly rely on the LLM-provided message to understand the previous messages written by the parties, which could be risky if the model starts to hallucinate. Finally, the assumption underlying the AI suggestion feature is that it is faster for a mediator to review and/or edit a suggested message than to draft a new one—empirical studies are needed to confirm whether this is the case.
5.4. F3 - Autonomously intervening in the negotiation?

The most radical use of the LLMediator is the direct intervention of the system in the negotiation. Here, the system acts as a mediator directly. Just like in F2, a message is drafted. However, instead of being shown to the mediator as a suggestion, the message is sent directly to the parties.

This approach carries with it substantial risks—since no human verifies the message before it reaches the parties, any biased or inaccurate information it contains may affect the negotiation between the parties. An example of this can be seen in the final example in Table 2, where the LLM seems to indicate that one of the parties was responsible for packaging the product safely. In doing so, it could be seen to take the side of one of the parties, which may not be desirable for a mediator.

Of course, the messages would very clearly indicate that they are written by an AI-based system, and the parties are free to disregard the messages of the mediator—the mediator is, after all, a neutral third party aiming to help the parties, and not an adjudicator. However, even so, biased or inaccurate messages may lead the parties to lose trust in the platform, or even influence the negotiation towards an unfair outcome. Thus, considerable studies would have to be performed before the deployment of such a system.

However, if the model can be prompted and deployed in a manner that is sufficiently safe, this approach could have tremendous implications for access to justice. Every single dispute, no matter how small, could be given a personalized, always available mediator. This could enable a form of mediation to be used in areas where disputes are too small to warrant a mediator's intervention, or where there are simply not enough mediators available. If these types of interventions could be shown to increase the rate of settlement between parties, this could be a significant contribution to society, by potentially creating a new way for individuals to resolve their legal problems.
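Structurally, this direct-intervention variant can be sketched as a thin layer on top of the drafting step: the drafted text is clearly labelled as machine-generated and delivered to the parties without human review. The disclosure wording and the `post_message` callback below are illustrative assumptions, not the actual LLMediator code.

```python
# Sketch of the F3 flow: label a drafted intervention as AI-generated
# and deliver it directly to the conversation, without a human in the
# loop. Names and the disclosure wording are illustrative assumptions.

AI_DISCLOSURE = "[This message was generated automatically by an AI mediator.]"

def autonomous_intervention(draft, post_message):
    """Label a drafted intervention as AI-generated and deliver it.

    `post_message` is a callback that appends the message to the
    conversation thread visible to both parties.
    """
    labelled = f"{AI_DISCLOSURE}\n{draft}"
    post_message(labelled)
    return labelled

# Example: collect delivered messages in a list instead of a real platform.
outbox = []
sent = autonomous_intervention("Let us try to establish the key facts.", outbox.append)
```

The explicit disclosure reflects the requirement, discussed in this section, that the parties can clearly see the message was written by an AI-based system.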
6. Future Work

This work lays the groundwork for a number of important research directions.

First, an important avenue is the empirical evaluation of the presented system. Understanding the actual efficacy of the system in dealing with conflicts is crucial in determining whether the generated reformulations and interventions are relevant and useful, how they may be adapted, and how such a system may eventually be deployed. Such an evaluation might be done with experienced mediators, who could assess the quality of the messages, and students taking the roles of parties in a negotiation. The evaluation should also examine whether the system presents any biases.

Another avenue to explore is the further improvement and adaptation of the system. For example, while the prompts appear to work well, so-called "prompt engineering" may be an important step towards further improving the results, by altering the prompt sent to the LLM. There are also multiple features described in this paper that are not yet fully developed, such as the function to trigger a reformulation suggestion (see section 4.2.1) and the function to trigger an intervention by the AI-mediator (see section 4.4.1). These are important features which need to be explored and elaborated.
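For the not-yet-developed trigger function for reformulation suggestions (section 4.2.1), one simple direction would be to ask the LLM itself to classify the message. The following is a hedged sketch of that idea; the prompt wording and the yes/no protocol are assumptions for illustration, not a design decision of the paper.

```python
# Possible sketch of a trigger function: ask the LLM whether a message
# is inflammatory, and only surface a reformulation suggestion when it
# answers "yes". Prompt and protocol are illustrative assumptions.

TRIGGER_PROMPT = (
    "Answer with exactly 'yes' or 'no': is the following message "
    "inflammatory or likely to escalate a dispute?"
)

def is_inflammatory(message, complete):
    """Classify a message via the LLM.

    `complete` wraps the LLM call and returns the raw reply text.
    """
    reply = complete([
        {"role": "system", "content": TRIGGER_PROMPT},
        {"role": "user", "content": message},
    ])
    return reply.strip().lower().startswith("yes")
```

Whether such an LLM-based trigger is reliable enough in practice is exactly the kind of question the empirical evaluation described above would need to answer.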
Finally, there may be other interesting ways of integrating LLMs into an ODR platform. For example, such a system could summarize the conversation held between the parties for the mediator or adjudicator. Further building on the LLMediator to explore such use-cases will be an important avenue for future work.

7. Conclusion

We presented the LLMediator, an experimental platform built to support negotiation and mediation using LLMs such as GPT-4. Currently, the system is able to reformulate inflammatory messages, suggest possible interventions to the mediator, or even send them directly to the parties. We show a few examples of the system being used in fictitious disputes, and discuss its efficacy. Overall, the approach seems promising and opens up the door for a lot of further research.

Acknowledgments

Hannes Westermann and Karim Benyekhlef would like to thank the Cyberjustice Laboratory at Université de Montréal, the LexUM Chair on Legal Information and the Autonomy through Cyberjustice Technologies (ACT) project for their support of this research.

References

[1] T. C. Farrow, A. Currie, N. Aylwin, L. Jacobs, D. Northrup, L. Moore, Everyday legal problems and the cost of justice in canada: Overview report, Osgoode Legal Studies Research Paper (2016).
[2] A. Currie, The legal problems of everyday life, in: Access to justice, Emerald Group Publishing Limited, 2009.
[3] L. Savage, S. McDonald, Experiences of serious problems or disputes in the canadian provinces, 2021, Juristat: Canadian Centre for Justice Statistics (2022) 1–28.
[4] N. Semple, The cost of seeking civil justice in canada, Can. B. Rev. 93 (2015) 639.
[5] Justice For All - Final Report, Technical Report, The Task Force on Justice, New York: Center on International Cooperation, 2019.
[6] D. Carneiro, P. Novais, F. Andrade, J. Zeleznikow, J. Neves, Online dispute resolution: an artificial intelligence perspective, Artificial Intelligence Review 41 (2014) 211–240.
[7] H. M. Erichson, The dark side of consensus and creativity: What mediators of mass disputes need to know about agency risks, Fordham L. Rev. 88 (2019) 2155.
[8] S. R. Cole, K. M. Blankley, Online mediation: Where we have been, where we are now, and where we should be, U. Tol. L. Rev. 38 (2006) 193.
[9] K. Benyekhlef, F. Gélinas, Online dispute resolution, Lex Electronica 10 (2005) i.
[10] K. Branting, S. McLeod, S. Howell, B. Weiss, B. Profitt, J. Tanner, I. Gross, D. Shin, A computational model of facilitation in online dispute resolution, Artificial Intelligence and Law (2022) 1–26.
[11] A. J. Schmitz, Measuring "access to justice" in the rush to digitize, Fordham L. Rev. 88 (2019) 2381.
[12] K. Benyekhlef, J. Zhu, At the intersection of odr and artificial intelligence: Traditional justice at the crossroads, Lex Electronica 25 (2020) 34.
[13] OpenAI, Gpt-4 technical report, 2023. arXiv:2303.08774.
[14] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, Llama: Open and efficient foundation language models, 2023. arXiv:2302.13971.
[15] L. K. Branting, Advisory systems for pro se litigants, in: Proceedings of the 8th international conference on Artificial intelligence and law, 2001, pp. 139–146.
[16] D. Thompson, Creating new pathways to justice using simple artificial intelligence and online dispute resolution, IJODR 2 (2015) 4.
[17] E. A. Bickel, M. A. van Dijk, E. Giebels, Online legal advice and conflict support: A dutch experience, Report, University of Twente (2015).
[18] L.-C. Paquin, F. Blanchard, C. Thomasset, Loge–expert: from a legal expert system to an information system for non-lawyers, in: ICAIL 1991, 1991, pp. 254–259.
[19] H. Westermann, K. Benyekhlef, Justicebot: A methodology for building augmented intelligence tools for laypeople to increase access to justice, in: Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law, 2023.
[20] J. Zeleznikow, Can artificial intelligence and online dispute resolution enhance efficiency and effectiveness in courts, in: IJCA, volume 8, HeinOnline, 2016, p. 30.
[21] R. Susskind, Online courts and the future of justice (2019).
[22] K. Benyekhlef, J. Zhu, Intelligence artificielle et justice: justice prédictive, conflits de basse intensité et données massives, Intelligence 30 (2018).
[23] E. Bellucci, J. Zeleznikow, Developing negotiation decision support systems that support mediators: a case study of the family_winner system, AI & L. 13 (2005) 233.
[24] A. Lodder, E. Thiessen, The role of artificial intelligence in online dispute resolution, in: Workshop on online dispute resolution at the international conference on artificial intelligence and law, Edinburgh, UK, 2003.
[25] L. Kistemaker, Rechtwijzer and uitelkaar.nl. Dutch experiences with odr for divorce, Family Court Review 59 (2021) 232–243.
[26] R. Bergman, Chatgpt and mediation, 2023. URL: https://mediate.com/chatgpt-and-mediation/.
[27] Q. Huang, W. Ouyang, Generate judge-view of online dispute resolution based on pretrained-model method, in: Intelligent Computing Methodologies: 18th International Conference, ICIC 2022, Xi'an, China, August 7–11, 2022, Proceedings, Part III, Springer, 2022, pp. 162–169.
[28] F. Yu, L. Quartey, F. Schilder, Legal prompting: Teaching a language model to think like a lawyer, 2022. URL: https://arxiv.org/abs/2212.01326. doi:10.48550/ARXIV.2212.01326.
[29] M. Bommarito II, D. M. Katz, Gpt takes the bar exam, arXiv preprint arXiv:2212.14402 (2022).
[30] D. M. Katz, M. J. Bommarito, S. Gao, P. Arredondo, Gpt-4 passes the bar exam, Available at SSRN 4389233 (2023).
[31] J. Goodhue, Y. Wei, Classification of trademark distinctiveness using openai gpt 3.5 model, Available at SSRN 4351998 (2023).
[32] A. Blair-Stanek, N. Holzenberger, B. Van Durme, Can gpt-3 perform statutory reasoning?, arXiv preprint arXiv:2302.06100 (2023).
[33] H.-T. Nguyen, R. Goebel, F. Toni, K. Stathis, K. Satoh, How well do sota legal reasoning models support abductive reasoning?, arXiv preprint arXiv:2304.06912 (2023).
[34] S. Hamilton, Blind judgement: Agent-based supreme court modelling with gpt, arXiv preprint arXiv:2301.05327 (2023).
[35] J. Tan, H. Westermann, K. Benyekhlef, Chatgpt as an artificial lawyer?, in: Artificial Intelligence for Access to Justice (AI4AJ 2023), 2023.
[36] J. Savelka, K. Ashley, M. Gray, H. Westermann, H. Xu, Can gpt-4 support analysis of textual data in tasks requiring highly specialized domain expertise?, in: ASAIL'23: 6th Workshop on Automated Semantic Analysis of Information in Legal Text, 2023.
[37] J. Savelka, Unlocking practical applications in legal domain: Evaluation of gpt for zero-shot semantic annotation of legal texts, arXiv preprint arXiv:2305.04417 (2023).
[38] J. Savelka, K. Ashley, M. Gray, H. Westermann, H. Xu, Explaining legal concepts with augmented large language models (gpt-4), in: AI4Legs 2023: AI for Legislation, 2023.
[39] H. Westermann, J. Savelka, V. R. Walker, K. D. Ashley, K. Benyekhlef, Computer-assisted creation of boolean search rules for text classification in the legal domain, in: JURIX, 2019, pp. 123–132.
[40] H. Westermann, J. Savelka, V. R. Walker, K. D. Ashley, K. Benyekhlef, Sentence embeddings and high-speed similarity search for fast computer assisted annotation of legal documents, in: Legal Knowledge and Information Systems: JURIX 2020: The Thirty-third Annual Conference, Brno, Czech Republic, December 9-11, 2020, volume 334, IOS Press, 2020, p. 164.
[41] H. Westermann, S. Meeùs, M. Godet, A. Troussel, J. Tan, J. Savelka, K. Benyekhlef, Bridging the gap: Mapping layperson narratives to legal issues with language models, in: ASAIL'23: 6th Workshop on Automated Semantic Analysis of Information in Legal Text, 2023.
[42] S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg, et al., Sparks of artificial general intelligence: Early experiments with gpt-4, arXiv preprint arXiv:2303.12712 (2023).
[43] T. W. Masrani, J. Jamieson, N. Yamashita, H. A. He, Slowing it down: Towards facilitating interpersonal mindfulness in online polarizing conversations over social media, Proceedings of the ACM on Human-Computer Interaction 7 (2023) 1–27.
[44] A. Butler, A. Parrella, Tweeting with consideration, 2021. URL: https://blog.twitter.com/en_us/topics/product/2021/tweeting-with-consideration.