Classification of Breach of Contract Court Decision Sentences

Wai Yin Mok, The University of Alabama in Huntsville, Huntsville, AL 35899, USA (mokw@uah.edu)
Jonathan R. Mok, Legal Services Alabama, Birmingham, AL 35203, USA (jmok@alsp.org)

In: Proceedings of the Third Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL 2019), June 21, 2019, Montreal, QC, Canada. © 2019 Copyright held by the owner/author(s). Copying permitted for private and academic purposes. Published at http://ceur-ws.org

ABSTRACT
This research project develops a methodology to utilize machine-learning analysis of live disputes to assist legal professionals in narrowing issues and comparing relevant precedents. An important part of this research project is an ontology of target court decisions, which is first constructed from relevant legal statutes, probably assisted by human legal experts. Utilizing the ontology, we extract and classify sentences in court decisions according to type. Since court decisions are typically written in natural languages, such as English and Chinese, chosen court decisions are pre-processed with the Natural Language Processing library spaCy, and then fed into the machine learning component of the system to extract and classify the sentences of the input court decisions.

KEYWORDS
Ontology; Machine Learning; spaCy; Natural Language Processing

1 INTRODUCTION
This research project develops a methodology to utilize machine-learning analysis of live disputes to assist legal professionals in narrowing issues and comparing relevant precedents. At this early stage of the research, we limit the scope to a specific methodology for legal research of breach of contract issues that also creates a general template for all other legal issues with common elements and/or factors. Our first step is to extract and classify sentences in breach of contract court decisions according to type. Such court decisions have five basic sentence types: sentences on contract law, sentences on contract holding, sentences on contract issues, sentences on contract reasoning, and sentences on contract facts. Classifying sentences is an important first step of our long-term research goal: developing a machine learning system that analyzes hundreds or thousands of past breach of contract court decisions similar to the case at hand, thus providing a powerful tool to legal professionals when faced with new cases and issues. Further downstream processing can be made possible by this project, such as constructing decision trees that predict the likely outcome for the case at hand, displaying the rationales on which court decisions are based, and calculating the similarity of previous legal precedents.

An understanding of the document structure of breach of contract court decisions is crucial to this research. Ontologies, or formal representations, have been created for many applications of diverse academic disciplines, including Artificial Intelligence and Philosophy. An ontology of the target court decisions, which formally defines the components and the relationships thereof, is therefore first constructed from relevant legal statutes, likely assisted by human legal experts. The constructed ontology will later be utilized in the machine-learning phase of the system.

Court decisions are typically written in natural languages, such as English and Chinese; hence, the proposed system must be able to process natural language. At this early stage of the research we focus on English. In recent years, Natural Language Processing (NLP) has made significant progress. Years of research in linguistics and NLP have produced many industrial-strength NLP Python libraries. Due to its ease of use and generality, spaCy [10] is chosen for this research project. Chosen court decisions are first preprocessed with spaCy. After that, the court decisions, and the information added by spaCy, are fed into the machine learning component of the system.

The rest of the paper is organized as follows. Relevant past research activities are presented in Section 2, which also puts our research into a broader perspective. The ontology of breach of contract court decisions is given in Section 3. Classification of the sentences with respect to the chosen sample court decisions, based on our implementation of the logistic regression algorithm and that of scikit-learn [5], is demonstrated in Section 4. We present possible research directions in Section 5 and we conclude the paper in Section 6.

2 LITERATURE REVIEW
The organization and formalization of legal information for computer processing to support decision making and enhance search, retrieval, and knowledge management is not recent; the concept dates back to the late 1940s and early 1950s, with the first legal information systems being developed in the 1950s [2, 15].

Knowledge of the legal domain is expressed in natural languages, such as English and Chinese, but such knowledge does not provide a well-defined structure to be used by machines for reasoning tasks [7]. Furthermore, because language is an expression of societal conduct, it is imprecise and fluctuating, which leads to challenges in formalization, as noted in a panel of the 1985 International Joint Conference on Artificial Intelligence, that include: (a) legal domain knowledge is not strictly orderly, but remains very much an experience-based, example-driven field; (b) legal reasoning and argumentation requires expertise beyond rote memorization of a large number of cases and case synthesis; (c) legal domain knowledge contains a large body of formal rules that purport to define and regulate activity, but are often deliberately ambiguous, contradictory, and incomplete; (d) legal reasoning combines many different types of reasoning processes such as rule-based, case-based, analogical, and hypothetical; (e) the legal domain field is in a constant state of change, so expert legal reasoning systems must be easy to modify and update; (f) such expert legal reasoning systems are unusual due to the expectation that experts will disagree; and (g) legal reasoning, natural language, and common sense understanding are intertwined and difficult to categorize and formalize [13]. Other challenges include the (h) representation of context in legal domain knowledge; (i) formalizing degrees of similarity and analogy; (j) formalizing probabilistic, default, or non-monotonic reasoning, including problems of priors, the weight of evidence, and reference classes; and (k) formalizing discrete versus continuous issues [6].

To address these challenges, expert system developers have turned to legal ontology engineering, which converts natural language texts into machine-extractable and minable data through categorization [1]. An ontology is defined as a conceptualization of a domain into a human-understandable, machine-readable format consisting of entities, attributes, relationships, and axioms [9]. Ontologies may be regarded as advanced taxonomical structures, where concepts formalized as classes (e.g. "mammal") are defined with axioms, enriched with descriptions of attributes or constraints (e.g. "cardinality"), and linked to other classes through properties (e.g., "eats" or "is_eaten_by") [2]. The bounds of the domain are set by expert system developers with formal terms to represent knowledge and determine what "exists" for the system [8]. This practice has been successfully applied to fields such as biology and family history to format machine-unreadable data into readable data and is practiced in legal ontology engineering [4, 11, 14].

NLP can be employed to assist in legal ontology engineering; NLP is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things [3]. After years of research in linguistics and NLP, many NLP libraries have been produced and are ready for real-life applications. Examples are NLTK, TextBlob, Stanford CoreNLP, spaCy, and Gensim. These libraries can parse and add useful information about the input text, which makes machine understanding of legal texts possible.
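As a minimal illustration of the kind of information such a library adds, the following sketch uses spaCy's rule-based sentencizer; the example sentences are ours and are not drawn from the sample court decisions.

```python
import spacy

# A blank English pipeline with a rule-based sentence segmenter; a full
# pretrained pipeline (e.g., en_core_web_sm) would also add POS tags,
# lemmas, and dependency parses to each token.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical sentences written in the style of a court decision.
doc = nlp("The defendant breached the contract. The plaintiff seeks damages.")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```

Segmenting a decision into sentences in this way is the first step before any sentence can be classified by type.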
In addition, machine learning has gained tremendous attention in recent years because of its ability to learn from examples. With the additional information added by the NLP libraries, machine learning can be applied to legal ontological engineering because the concepts identified in the ontology can be populated with real-life facts, rules, and reasoning extracted from past legal cases of interest. Many mature machine learning libraries have also been produced. To name a few, there are TensorFlow, scikit-learn, PyTorch, and Apache Mahout.

3 ONTOLOGY
In terms of Computer Science, an ontology is a conceptual model for a certain system, which defines its components, and the attributes of, and the relationships among, those components. Following the notation of [11, 14], the ontology created for this paper is shown in Figure 1.

Each rectangle in the model denotes a set of objects. The rectangle "Breach of Contract Court Decision" represents the set of all the breach of contract court decisions in the state of Alabama. A filled triangle denotes a whole-part relationship, which means that an object of the rectangle connected to the apex of the triangle is a whole while an object of the rectangle connected to the base of the triangle is a part. Hence, a breach of contract court decision is composed of many sentences. An open triangle denotes a superset-subset relationship where the set of objects connected to its apex is a superset of the set of objects connected to its base. Thus, the set "Sentence" has six subsets: "Title", "Law", "Holding", "Fact", "Issue", and "Reasoning". The plus sign + inside of the open triangle means that in addition to the superset-subset relationship, no two subsets can have any common element. This means that no two of the six subsets intersect. A line in the model denotes a relationship set between the rectangles (object sets) connected by the line. A relationship set contains all of the relationships of interest between the set of objects located at one end and the set of objects located at the other end of the line. The connection point of a relationship set and a rectangle is associated with a participation constraint. A participation constraint 1:* means that the minimum is 1 and the maximum is *, where * can mean any positive integer. A participation constraint 1 is a shorthand symbol for 1:1. The pentagon in the figure means a 5-way relationship set, rather than the usual binary relationship sets.

[Figure 1: An ontology created for breach of contract court decisions. Diagram not reproduced; it connects the rectangles "Breach of Contract Court Decision", "Plaintiff", "Defendant", and "Sentence", with the six subsets "Title", "Law", "Holding", "Fact", "Issue", and "Reasoning", annotated with the participation constraints described above.]

Title sentences are not as useful as the other five categories because a title sentence is a part of a court decision's title or format, although it might contain useful information, like the names of the plaintiffs and defendants.

To justify the other categories of sentences, note that legal data is composed of written text that is broken down into sentences. Sentences in legal texts can be further categorized into five sentence types relevant to legal analysis: (1) fact sentences, (2) law sentences, (3) issue sentences, (4) holding sentences, and (5) reasoning sentences.
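For later machine processing, the disjoint subsets of "Sentence" in Figure 1 can be written down directly. The following is one possible encoding; the Python names are ours, not part of the ontology notation.

```python
from enum import Enum

class SentenceType(Enum):
    """The six mutually exclusive subsets of "Sentence" in the ontology."""
    TITLE = "Title"
    LAW = "Law"
    HOLDING = "Holding"
    FACT = "Fact"
    ISSUE = "Issue"
    REASONING = "Reasoning"

# The + sign in the figure means the subsets are disjoint, so each
# sentence receives exactly one of these labels during classification.
labels = [t.value for t in SentenceType]
print(labels)
```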
In essence, legal texts arise from disputes between two or more parties that are ultimately decided by a third-party arbitrator, which in most cases is a judge. In this study, machine learning will be strictly applied to case law legal texts, which are written decisions by a judge that detail a dispute between two or more parties and the application of law to the dispute to reach a conclusion. In each case decision, there are facts that form the background of the dispute. Fact sentences contain the relevant facts as determined by the judge that underlie any written decision. This is discretionary in nature and relies on the judge to correctly evaluate the facts underlying the dispute and explicate them. Law sentences are statements of law that the judge deems applicable to the dispute. Law sentences are always followed by legal citations. Issue sentences are statements made by either party in the dispute, or by the judge, that can be (a) assertions made by either party of what are correct applications of law or true determinations of facts, or (b) statements by the judge of what he or she determines are the relevant issue(s) underlying the dispute, and how he or she frames such issue(s). Issue statements include the arguments that either party makes to support their respective interests; such sentences are not considered fact sentences or law sentences because only the judge is able to make dispositive statements of fact and law as the final arbitrator of the dispute. Holding sentences are determinations by the judge that apply the law to the dispute and reach a conclusion, also known as a case holding. Holding sentences are the judge's determination of how the law is applied to the facts of the dispute and the corollary conclusion that sides with one party or the other. Holding sentences are actually considered new statements of law as application of law towards a new set of facts (as no dispute is identical) made by a judge that sets precedent for future disputes; holding sentences are arguably the most important sentences of a case decision as such sentences are the only operative sentences that have bearing on future disputes. Only direct application of law to a present set of facts is considered a holding of the case and binding precedent. Reasoning sentences are statements by the judge that explain how he or she reached the conclusion in the case. Reasoning sentences will typically detail the steps the judge made to reach his or her conclusion, interpretation of law and/or fact, comparison and differentiation of prior case law, discussion of societal norms, and other analytical methods judges utilize to reach conclusions. Holding sentences and reasoning sentences are similar, but can be distinguished in the regard that holding sentences are direct applications of law to the present facts of the present dispute, whereas reasoning sentences do not provide any direct applications of law to present facts of present disputes and only explain the mental gymnastics the judge has performed to reach his or her conclusion. At times it is even difficult for legal professionals to distinguish holding sentences from reasoning sentences, but any hypothetical application of law to fact is considered reasoning.

As such, the five types of sentences are related in a nontrivial way. Hence, the five-way relationship set in Figure 1 denotes such a complicated relationship among the five categories of sentences.

4 CLASSIFICATION OF SENTENCES

4.1 Test Cases
In this early stage of our research project, three sample breach of contract court decisions are considered. As the research progresses, more court decisions will be added.

(1) 439 So.2d 36, Supreme Court of Alabama, GULF COAST FABRICATORS, INC. v. G.M. MOSLEY, etc., 81-1042, Sept. 23, 1983.
(2) 256 So.3d 119, Court of Civil Appeals of Alabama, Phillip JONES and Elizabeth Jones v. THE VILLAGE AT LAKE MARTIN, LLC, 2160650, January 12, 2018.
(3) 873 F.Supp. 1519, United States District Court, M.D. Alabama, Northern Division, Kimberly KYSER-SMITH, Plaintiff, v. UPSCALE COMMUNICATIONS, INC., Bovanti Communication, Inc., Sally Beauty Company, Inc., Defendants, Civ. A. No. 94-D-58-N, Jan. 17, 1995.

The industrial-strength NLP library spaCy is chosen to extract the sentences from the selected court decisions because of its ease of use and cleanliness of design [10].

4.2 Logistic Regression Algorithm
We have not only implemented the logistic regression algorithm, but have also compared our implementation with the one provided by scikit-learn [5], which is highly optimized for performance purposes. The chief finding is that our implementation consistently has more correct predictions than the one provided by scikit-learn.

4.2.1 Recording the key words of the sentences. Classification of the sentences in a court decision requires identifying the features of the sentences. For this purpose, we generate a list of key words. spaCy generates a total of 529 sentences and 1861 key words from the three sample court decisions. We thus use a 529×1861 two-dimensional NumPy array X to store the number of appearances of each keyword in each sentence. In X, X[i,j] records the number of times keyword j appears in sentence i.

4.2.2 Our implementation of the logistic regression algorithm. Although our implementation began with the Adaptive Linear Neuron algorithm in [12], we have made numerous modifications. The following code segment demonstrates the most salient parts of our implementation (the methods belong to our classifier class, and np refers to NumPy).

```python
def __init__(self, eta=0.01, n_iter=50, slopeA=0.01):
    self.eta = eta
    self.n_iter = n_iter
    self.slopeErrorAllowance = slopeA

def activationProb(self, X, w):
    z = np.dot(X, w[1:]) + w[0]
    phiOfz = 1/(1+np.exp(-z))
    return phiOfz

def fit_helper(self, X, y, w):
    stopVal = self.slopeErrorAllowance * X.shape[1]
    for i in range(self.n_iter):
        phiOfz = self.activationProb(X, w)
        output = np.where(phiOfz >= 0.5, 1, 0)
        errors = (y - output)
        slopes = X.T.dot(errors)
        w[1:] += self.eta * slopes
        w[0] += self.eta * errors.sum()
        slopeSum = abs(slopes).sum()
        if slopeSum <= stopVal:
            break

def fit(self, X, aL):
    self.ansList = np.unique(aL)
    self.ws = np.zeros((len(self.ansList), 1+X.shape[1]))
    for k in range(len(self.ansList)):
        y = np.copy(aL)
        y = np.where(y == self.ansList[k], 1, 0)
        self.fit_helper(X, y, self.ws[k])

def predict(self, X):
    numSamples = X.shape[0]
    self.probs = np.zeros((len(self.ansList), numSamples))
    for k in range(len(self.ansList)):
        self.probs[k] = self.activationProb(X, self.ws[k])
    self.results = [""] * numSamples
    for i in range(numSamples):
        maxk = 0
        for k in range(len(self.ansList)):
            if self.probs[k][i] >= self.probs[maxk][i]:
                maxk = k
        self.results[i] = self.ansList[maxk]
    return np.array(self.results)
```

Regarding the three sample court decisions, the parameter w of the function activationProb is a one-dimensional NumPy array of 1862 floating-point numbers. w[1:], of size 1861, stores the current weights for the 1861 key words, and w[0] is the constant term of the net input z. The function activationProb first calculates the net input z by computing the dot product of the rows of X and w. In the next step the function calculates the probabilities ϕ(z) for the sample sentences (rows) of X based on the net input z. (Technically, the function calculates the conditional probability that an input sentence has a certain sentence type given the sample sentences in X. However, they are simply called probabilities in this paper to avoid being verbose.) It then returns the one-dimensional NumPy array phiOfz, which has the activation probability for each sample sentence in X.

The function fit_helper receives three parameters: the same two-dimensional NumPy array X, the one-dimensional NumPy array y that stores the correct outputs for the sample sentences of X, and the one-dimensional NumPy array w that has the weights and the constant term calculated for the 1861 key words. The function first calls activationProb to calculate the probabilities for the sample sentences of X. It then calculates the output for each sample sentence: 1 if the probability ≥ 0.5; 0 otherwise. The errors for the sample sentences are then calculated accordingly. The most important part of the function is the calculation of the slopes (partial derivatives) of the current weights, which are updated in the next line. The constant term w[0] is updated after that.

Since there are 1861 weights, there are 1861 slopes. The ideal scenario is that each slope will become zero in the calculation, which is practically impossible. We thus allow an allowance for such a small discrepancy, which is stored in self.slopeErrorAllowance. Thus, the total allowance for the 1861 slopes is self.slopeErrorAllowance * 1861 (= X.shape[1]), which is then stored in stopVal. If the sum of the absolute values of the 1861 slopes is less than stopVal, the function exits the for loop and no more updates are necessary. Otherwise, the for loop continues for self.n_iter times, a number that is set to 500. Note that the learning rate self.eta is set to 0.0001.

The function fit accepts two parameters: X, the two-dimensional NumPy array, and aL, the one-dimensional array that stores the correct sentence types of the sample sentences of X. It first finds the unique sentence types in aL. Then, it applies the one-versus-the-rest approach for each unique sentence type to calculate the weights and the constant term for the 1861 key words. To do so, each appearance of the sentence type in y, which is a copy of aL, is replaced with a 1, and 0 elsewhere. The function fit then calls self.fit_helper to calculate the 1861 weights and the constant term for that particular sentence type, which will be used to make predictions.

The calculated weights and constant term for each sentence type are stored in self.ws, which is a 6×1862 NumPy array. To make a prediction for a sentence, we use the predict function, which calculates the probability of each sentence type on the input sentence and assigns the sentence type that has the greatest probability to the sentence.

4.2.3 scikit-learn's implementation of the logistic regression algorithm. scikit-learn's implementation of the logistic regression algorithm can be straightforwardly applied. We first randomly shuffle the 529 sentences and their corresponding sentence types. We then apply the function train_test_split to split the 529 sentences and their sentence types into two sets: 370 training sentences and 159 testing sentences. Afterwards, a logistic regression object is created and fitted with the training data.

Table 1: Comparison of our implementation and scikit-learn's implementation of the logistic regression algorithm

    Methods          Tests   Correct Guesses   Total Guesses
    Ours             100     8707              15900
    scikit-learn's   100     8454              15900

4.2.4 Discussions. To compare our implementation and that of scikit-learn for the logistic regression algorithm, we randomly select 370 training sentences and 159 testing sentences from the 529 sentences of the three sample court decisions and feed them to our implementation. The results are shown in Table 1, which indicates that our implementation has more correct predictions than the highly optimized implementation of scikit-learn. Furthermore, since we are familiar with our code, our implementation can serve as a platform for future improvements.

Our result is remarkable if we compare our implementation with randomly guessing the sentence types for the 159 sentences with six possibilities for each sentence. Note that the probability of randomly making a correct guess is p = 1/6 = 0.166667. Assuming each guess is an independent trial, making exactly 87 correct predictions and making 87 or more correct predictions should follow the binomial distribution. Thus, applying Microsoft Excel's binomial distribution statistical functions BINOM.DIST and BINOM.DIST.RANGE to the problems, we obtain the probabilities in Table 2.
Table 2: Probabilities of randomly guessing with exactly 87 or with 87 or more correct predictions for the 159 testing sentences

    Event        Formula                          Probability
    Exactly 87   BINOM.DIST(87,159,p,FALSE)       9.08471E-28
    87 or more   BINOM.DIST.RANGE(159,p,87,159)   1.08519E-27

The probabilities for both events are close to zero. Hence, our keyword approach to capturing the characteristics of the sentences of breach of contract court decisions is on the right track, although much improvement remains to be made.

5 FUTURE RESEARCH DIRECTIONS
In our experiments, we discover that if the sum of the absolute values of the slopes does not reduce to zero, it usually goes through cycles with respect to the number of iterations determined by self.n_iter, which is currently set to 500. Our next goal is to find the right number for self.n_iter so that the sum of the absolute values of the slopes is as small as possible.

The ontology in Figure 1 requires more details. More details will result in more accurate characterization of the five types of sentences, which will lead to more accurate classification algorithms.

The word-vector approach of calculating the similarity of two sentences might yield good results. Thus, we will utilize Word2vec, which is one such algorithm for learning a word embedding from a text corpus [16].

6 CONCLUSIONS
In this early stage of the research, we focus on breach of contract court decisions. Five critical contract sentence types have been identified: contract fact sentences, contract law sentences, contract holding sentences, contract issue sentences, and contract reasoning sentences. spaCy, an NLP Python library, is used to parse the court decisions. Three sample breach of contract court decisions are chosen to test our own implementation and the highly optimized scikit-learn implementation of the logistic regression algorithm. Our implementation consistently has more correct predictions than the one provided by scikit-learn, and it can serve as a platform for future improvements. Our keyword approach is a good starting point because the number of correct predictions is far greater than the number produced by randomly guessing the sentence types for the 159 testing sentences. Lastly, many possible future improvements have also been identified.

REFERENCES
[1] Joost Breuker, André Valente, and Radboud Winkels. 2004. Legal Ontologies in Knowledge Engineering and Information Management. Artificial Intelligence and Law 12, 4 (01 Dec 2004), 241–277. https://doi.org/10.1007/s10506-006-0002-1
[2] Nuria Casellas. 2011. Legal Ontology Engineering: Methodologies, Modelling Trends, and the Ontology of Professional Judicial Knowledge (1 ed.). Law, Governance and Technology Series, Vol. 3. Springer Netherlands. https://doi.org/10.1007/978-94-007-1497-7
[3] Gobinda G. Chowdhury. 2003. Natural language processing. Annual Review of Information Science and Technology 37, 1 (2003), 51–89. https://doi.org/10.1002/aris.1440370103
[4] Kim Clark, Deepak Sharma, Rui Qin, Christopher G. Chute, and Cui Tao. 2014. A use case study on late stent thrombosis for ontology-based temporal reasoning and analysis. Journal of Biomedical Semantics 5, 49 (2014). https://doi.org/10.1186/2041-1480-5-49
[5] David Cournapeau et al. [n. d.]. scikit-learn: Machine Learning in Python. https://scikit-learn.org/stable/. Accessed: 2019-01-25.
[6] James Franklin. 2012. Discussion paper: how much of commonsense and legal reasoning is formalizable? A review of conceptual obstacles. Law, Probability and Risk 11, 2-3 (06 2012), 225–245. https://doi.org/10.1093/lpr/mgs007
[7] Mirna Ghosh, Hala Naja, Habib Abdulrab, and Mohamad Khalil. 2017. Ontology Learning Process as a Bottom-up Strategy for Building Domain-specific Ontology from Legal Texts. In 9th International Conference on Agents and Artificial Intelligence. 473–480. https://doi.org/10.5220/0006188004730480
[8] Thomas R. Gruber. 1995. Toward principles for the design of ontologies used for knowledge sharing? International Journal of Human-Computer Studies 43, 5 (1995), 907–928. https://doi.org/10.1006/ijhc.1995.1081
[9] Nicola Guarino and Pierdaniele Giaretta. 1995. Ontologies and knowledge bases: towards a terminological clarification. In Towards Very Large Knowledge Bases, N.J.I. Mars (Ed.). IOS Press, Amsterdam, 25–32.
[10] Matthew Honnibal et al. [n. d.]. spaCy: Industrial-Strength Natural Language Processing in Python. https://spacy.io/. Accessed: 2019-01-25.
[11] Deryle Lonsdale, David W. Embley, Yihong Ding, Li Xu, and Martin Hepp. 2010. Reusing ontologies and language components for ontology generation. Data & Knowledge Engineering 69, 4 (2010), 318–330. https://doi.org/10.1016/j.datak.2009.08.003
[12] Sebastian Raschka. 2015. Python Machine Learning (1 ed.). Packt Publishing.
[13] Edwina Rissland. 1985. AI and Legal Reasoning. In Proceedings of the 9th International Joint Conference on Artificial Intelligence, Volume 2. 1254–1260.
[14] Cui Tao and David W. Embley. 2009. Automatic Hidden-web Table Interpretation, Conceptualization, and Semantic Annotation. Data & Knowledge Engineering 68, 7 (July 2009), 683–703. https://doi.org/10.1016/j.datak.2009.02.010
[15] Donald A. Waterman, Jody Paul, and Mark Peterson. 1986. Expert systems for legal decision making. Expert Systems 3, 4 (1986), 212–226. https://doi.org/10.1111/j.1468-0394.1986.tb00203.x
[16] Wikipedia contributors. [n. d.]. Word2vec. https://en.wikipedia.org/wiki/Word2vec/. Accessed: 2019-01-26.