Classification of Breach of Contract Court Decision Sentences

Wai Yin Mok, The University of Alabama in Huntsville, Huntsville, AL 35899, USA (mokw@uah.edu)
Jonathan R. Mok, Legal Services Alabama, Birmingham, AL 35203, USA (jmok@alsp.org)

In: Proceedings of the Third Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL 2019), June 21, 2019, Montreal, QC, Canada. © 2019 Copyright held by the owner/author(s). Copying permitted for private and academic purposes. Published at http://ceur-ws.org

ABSTRACT
This research project develops a methodology to utilize machine-learning analysis of live disputes to assist legal professionals in narrowing issues and comparing relevant precedents. An important part of this research project is an ontology of target court decisions, which is first constructed from relevant legal statutes, probably assisted by human legal experts. Utilizing the ontology, we extract and classify sentences in court decisions according to type. Since court decisions are typically written in natural languages, such as English and Chinese, chosen court decisions are pre-processed with the Natural Language Processing library spaCy, and then fed into the machine learning component of the system to extract and classify the sentences of the input court decisions.

KEYWORDS
Ontology; Machine Learning; spaCy; Natural Language Processing

1 INTRODUCTION
This research project develops a methodology to utilize machine-learning analysis of live disputes to assist legal professionals in narrowing issues and comparing relevant precedents. At this early stage of the research, we limit the scope to a specific methodology for legal research of breach of contract issues that also creates a general template for all other legal issues with common elements and/or factors. Our first step is to extract and classify sentences in breach of contract court decisions according to type. Such court decisions have five basic sentence types: sentences on contract law, sentences on contract holding, sentences on contract issues, sentences on contract reasoning, and sentences on contract facts. Classifying sentences is an important first step of our long-term research goal: developing a machine learning system that analyzes hundreds or thousands of past breach of contract court decisions similar to the case at hand, thus providing a powerful tool to legal professionals when faced with new cases and issues. Further downstream processing can be made possible by this project, such as constructing decision trees that predict the likely outcome for the case at hand, displaying the rationales on which court decisions are based, and calculating the similarity of previous legal precedents.

An understanding of the document structure of breach of contract court decisions is crucial to this research. Ontologies, or formal representations, have been created for many applications of diverse academic disciplines, including Artificial Intelligence and Philosophy. An ontology of the target court decisions, which formally defines the components and the relationships thereof, is therefore first constructed from relevant legal statutes, likely assisted by human legal experts. The constructed ontology will later be utilized in the machine-learning phase of the system.

Court decisions are typically written in natural languages, such as English and Chinese; hence, the proposed system must be able to process natural language. At this early stage of the research we focus on English. In recent years, Natural Language Processing (NLP) has made significant progress. Years of research in linguistics and NLP have produced many industrial-strength NLP Python libraries. Due to its ease of use and generality, spaCy [10] is chosen for this research project. Chosen court decisions are first preprocessed with spaCy. After that, the court decisions, and the information added by spaCy, are fed into the machine learning component of the system.

The rest of the paper is organized as follows. Relevant past research activities are presented in Section 2, which also puts our research into a broader perspective. The ontology of breach of contract court decisions is given in Section 3. Classification of the sentences with respect to the chosen sample court decisions, based on our implementation of the logistic regression algorithm and that of scikit-learn [5], is demonstrated in Section 4. We present possible research directions in Section 5 and we conclude the paper in Section 6.

2 LITERATURE REVIEW
The organization and formalization of legal information for computer processing to support decision making and enhance search, retrieval, and knowledge management is not recent; the concept dates back to the late 1940s and early 1950s, with the first legal information systems being developed in the 1950s [2, 15].

Knowledge of the legal domain is expressed in natural languages, such as English and Chinese, but such knowledge does not provide a well-defined structure to be used by machines for reasoning tasks [7]. Furthermore, because language is an expression of societal conduct, it is imprecise and fluctuating, which leads to challenges in formalization, as noted in a panel of the 1985 International Joint Conference on Artificial Intelligence, that include: (a) legal domain knowledge is not strictly orderly, but remains very much an experience-based, example-driven field; (b) legal reasoning and argumentation requires expertise beyond rote memorization of a large number of cases and case synthesis; (c) legal domain knowledge contains a large body of formal rules that purport to define and regulate activity, but are often deliberately ambiguous, contradictory, and incomplete; (d) legal reasoning combines many different types of reasoning processes such as rule-based, case-based, analogical, and hypothetical; (e) the legal domain field is in a constant state of change, so expert legal reasoning systems must be easy to modify and update; (f) such expert legal reasoning systems are unusual due to the expectation that experts will disagree; and (g) legal reasoning, natural language, and common sense understanding are intertwined and difficult to categorize and formalize [13]. Other challenges include the (h) representation of context in legal domain knowledge; (i) formalizing degrees of similarity and analogy; (j) formalizing probabilistic, default, or non-monotonic reasoning, including problems of priors, the weight of evidence, and reference classes; and (k) formalizing discrete versus continuous issues [6].

To address these challenges, expert system developers have turned to legal ontology engineering, which converts natural language texts into machine-extractable and minable data through categorization [1]. An ontology is defined as a conceptualization of a domain into a human-understandable, machine-readable format consisting of entities, attributes, relationships, and axioms [9]. Ontologies may be regarded as advanced taxonomical structures, where concepts formalized as classes (e.g. "mammal") are defined with axioms, enriched with descriptions of attributes or constraints (e.g. "cardinality"), and linked to other classes through properties (e.g., "eats" or "is_eaten_by") [2]. The bounds of the domain are set by expert system developers with formal terms to represent knowledge and determine what "exists" for the system [8]. This practice has been successfully applied to fields such as biology and family history to format machine-unreadable data into readable data and is practiced in legal ontology engineering [4, 11, 14].

NLP can be employed to assist in legal ontology engineering; NLP is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things [3]. After years of research in linguistics and NLP, many NLP libraries have been produced and are ready for real-life applications. Examples are NLTK, TextBlob, Stanford CoreNLP, spaCy, and Gensim. These libraries can parse and add useful information about the input text, which makes machine understanding of legal texts possible.
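As a minimal illustration of the kind of information such a library adds, the following sketch uses spaCy's rule-based sentencizer; the example sentences are ours and are not drawn from the sample court decisions.

```python
import spacy

# A blank English pipeline with a rule-based sentence segmenter; a full
# pretrained pipeline (e.g., en_core_web_sm) would also add POS tags,
# lemmas, and dependency parses to each token.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical sentences written in the style of a court decision.
doc = nlp("The defendant breached the contract. The plaintiff seeks damages.")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```

Segmenting a decision into sentences in this way is the first step before any sentence can be classified by type.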
In addition, machine learning has gained tremendous attention in recent years because of its ability to learn from examples. With the additional information added by the NLP libraries, machine learning can be applied to legal ontological engineering because the concepts identified in the ontology can be populated with real-life facts, rules, and reasoning extracted from past legal cases of interest. Many mature machine learning libraries have also been produced. To name a few, there are TensorFlow, scikit-learn, PyTorch, and Apache Mahout.

3 ONTOLOGY
In terms of Computer Science, an ontology is a conceptual model for a certain system, which defines its components, and the attributes of, and the relationships among, those components. Following the notation of [11, 14], the ontology created for this paper is shown in Figure 1.

Each rectangle in the model denotes a set of objects. The rectangle "Breach of Contract Court Decision" represents the set of all the breach of contract court decisions in the state of Alabama. A filled triangle denotes a whole-part relationship, which means that an object of the rectangle connected to the apex of the triangle is a whole while an object of the rectangle connected to the base of the triangle is a part. Hence, a breach of contract court decision is composed of many sentences. An open triangle denotes a superset-subset relationship where the set of objects connected to its apex is a superset of the set of objects connected to its base. Thus, the set "Sentence" has six subsets: "Title", "Law", "Holding", "Fact", "Issue", and "Reasoning". The plus sign + inside of the open triangle means that in addition to the superset-subset relationship, no two subsets can have any common element. This means that no two of the six subsets intersect. A line in the model denotes a relationship set between the rectangles (object sets) connected by the line. A relationship set contains all of the relationships of interest between the set of objects located at one end and the set of objects located at the other end of the line. The connection point of a relationship set and a rectangle is associated with a participation constraint. A participation constraint 1:* means that the minimum is 1 and the maximum is *, where * can mean any positive integer. A participation constraint 1 is a shorthand symbol for 1:1. The pentagon in the figure means a 5-way relationship set, rather than the usual binary relationship sets.

[Figure 1: An ontology created for breach of contract court decisions. Diagram not reproduced; it connects the rectangles "Breach of Contract Court Decision", "Plaintiff", "Defendant", and "Sentence", with the six subsets "Title", "Law", "Holding", "Fact", "Issue", and "Reasoning", annotated with the participation constraints described above.]

Title sentences are not as useful as the other five categories because a title sentence is a part of a court decision's title or format, although it might contain useful information, like the names of the plaintiffs and defendants.

To justify the other categories of sentences, note that legal data is composed of written text that is broken down into sentences. Sentences in legal texts can be further categorized into five sentence types relevant to legal analysis: (1) fact sentences, (2) law sentences, (3) issue sentences, (4) holding sentences, and (5) reasoning sentences.
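For later machine processing, the disjoint subsets of "Sentence" in Figure 1 can be written down directly. The following is one possible encoding; the Python names are ours, not part of the ontology notation.

```python
from enum import Enum

class SentenceType(Enum):
    """The six mutually exclusive subsets of "Sentence" in the ontology."""
    TITLE = "Title"
    LAW = "Law"
    HOLDING = "Holding"
    FACT = "Fact"
    ISSUE = "Issue"
    REASONING = "Reasoning"

# The + sign in the figure means the subsets are disjoint, so each
# sentence receives exactly one of these labels during classification.
labels = [t.value for t in SentenceType]
print(labels)
```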
In essence, legal texts arise from disputes between two or more parties that are ultimately decided by a third-party arbitrator, which in most cases is a judge. In this study, machine learning will be strictly applied to case law legal texts, which are written decisions by a judge that detail a dispute between two or more parties and the application of law to the dispute to reach a conclusion. In each case decision, there are facts that form the background of the dispute. Fact sentences contain the relevant facts as determined by the judge that underlie any written decision. This is discretionary in nature and relies on the judge to correctly evaluate the facts underlying the dispute and explicate them. Law sentences are statements of law that the judge deems applicable to the dispute. Law sentences are always followed by legal citations. Issue sentences are statements made by either party in the dispute, or by the judge, that can be (a) assertions made by either party of what are correct applications of law or true determinations of facts, or (b) statements by the judge of what he or she determines are the relevant issue(s) underlying the dispute, and how he or she frames such issue(s). Issue statements include the arguments that either party makes to support their respective interests; such sentences are not considered fact sentences or law sentences because only the judge is able to make dispositive statements of fact and law as the final arbitrator of the dispute. Holding sentences are determinations by the judge that apply the law to the dispute and reach a conclusion, also known as a case holding. Holding sentences are the judge's determination of how the law is applied to the facts of the dispute and the corollary conclusion that sides with one party or the other. Holding sentences are actually considered new statements of law as application of law towards a new set of facts (as no dispute is identical) made by a judge that sets precedent for future disputes; holding sentences are arguably the most important sentences of a case decision as such sentences are the only operative sentences that have bearing on future disputes. Only direct application of law to a present set of facts is considered a holding of the case and binding precedent. Reasoning sentences are statements by the judge that explain how he or she reached the conclusion in the case. Reasoning sentences will typically detail the steps the judge made to reach his or her conclusion, interpretation of law and/or fact, comparison and differentiation of prior case law, discussion of societal norms, and other analytical methods judges utilize to reach conclusions. Holding sentences and reasoning sentences are similar, but can be distinguished in the regard that holding sentences are direct applications of law to the present facts of the present dispute, whereas reasoning sentences do not provide any direct applications of law to present facts of present disputes and only explain the mental gymnastics the judge has performed to reach his or her conclusion. At times it is even difficult for legal professionals to distinguish holding sentences from reasoning sentences, but any hypothetical application of law to fact is considered reasoning.

As such, the five types of sentences are related in a nontrivial way. Hence, the five-way relationship set in Figure 1 denotes such a complicated relationship among the five categories of sentences.

4 CLASSIFICATION OF SENTENCES

4.1 Test Cases
In this early stage of our research project, three sample breach of contract court decisions are considered. As the research progresses, more court decisions will be added.

(1) 439 So.2d 36, Supreme Court of Alabama, GULF COAST FABRICATORS, INC. v. G.M. MOSLEY, etc., 81-1042, Sept. 23, 1983.
(2) 256 So.3d 119, Court of Civil Appeals of Alabama, Phillip JONES and Elizabeth Jones v. THE VILLAGE AT LAKE MARTIN, LLC, 2160650, January 12, 2018.
(3) 873 F.Supp. 1519, United States District Court, M.D. Alabama, Northern Division, Kimberly KYSER-SMITH, Plaintiff, v. UPSCALE COMMUNICATIONS, INC., Bovanti Communication, Inc., Sally Beauty Company, Inc., Defendants, Civ. A. No. 94-D-58-N, Jan. 17, 1995.

The industrial-strength NLP library spaCy is chosen to extract the sentences from the selected court decisions because of its ease of use and cleanliness of design [10].

4.2 Logistic Regression Algorithm
We have not only implemented the logistic regression algorithm, but have also compared our implementation with the one provided by scikit-learn [5], which is highly optimized for performance purposes. The chief finding is that our implementation consistently has more correct predictions than the one provided by scikit-learn.

4.2.1 Recording the key words of the sentences. Classification of the sentences in a court decision requires identifying the features of the sentences. For this purpose, we generate a list of key words. spaCy generates a total of 529 sentences and 1861 key words from the three sample court decisions. We thus use a 529×1861 two-dimensional NumPy array X to store the number of appearances of each keyword in each sentence. In X, X[i,j] records the number of times keyword j appears in sentence i.

4.2.2 Our implementation of the logistic regression algorithm. Although our implementation began with the Adaptive Linear Neuron algorithm in [12], we have made numerous modifications. The following code segment demonstrates the most salient parts of our implementation (the methods belong to our classifier class, and np refers to NumPy).

```python
def __init__(self, eta=0.01, n_iter=50, slopeA=0.01):
    self.eta = eta
    self.n_iter = n_iter
    self.slopeErrorAllowance = slopeA

def activationProb(self, X, w):
    z = np.dot(X, w[1:]) + w[0]
    phiOfz = 1/(1+np.exp(-z))
    return phiOfz

def fit_helper(self, X, y, w):
    stopVal = self.slopeErrorAllowance * X.shape[1]
    for i in range(self.n_iter):
        phiOfz = self.activationProb(X, w)
        output = np.where(phiOfz >= 0.5, 1, 0)
        errors = (y - output)
        slopes = X.T.dot(errors)
        w[1:] += self.eta * slopes
        w[0] += self.eta * errors.sum()
        slopeSum = abs(slopes).sum()
        if slopeSum <= stopVal:
            break

def fit(self, X, aL):
    self.ansList = np.unique(aL)
    self.ws = np.zeros((len(self.ansList), 1+X.shape[1]))
    for k in range(len(self.ansList)):
        y = np.copy(aL)
        y = np.where(y == self.ansList[k], 1, 0)
        self.fit_helper(X, y, self.ws[k])

def predict(self, X):
    numSamples = X.shape[0]
    self.probs = np.zeros((len(self.ansList), numSamples))
    for k in range(len(self.ansList)):
        self.probs[k] = self.activationProb(X, self.ws[k])
    self.results = [""] * numSamples
    for i in range(numSamples):
        maxk = 0
        for k in range(len(self.ansList)):
            if self.probs[k][i] >= self.probs[maxk][i]:
                maxk = k
        self.results[i] = self.ansList[maxk]
    return np.array(self.results)
```

Regarding the three sample court decisions, the parameter w of the function activationProb is a one-dimensional NumPy array of 1862 floating-point numbers. w[1:], of size 1861, stores the current weights for the 1861 key words, and w[0] is the constant term of the net input z. The function activationProb first calculates the net input z by computing the dot product of the rows of X and w. In the next step the function calculates the probabilities ϕ(z) for the sample sentences (rows) of X based on the net input z. (Technically, the function calculates the conditional probability that an input sentence has a certain sentence type given the sample sentences in X. However, they are simply called probabilities in this paper to avoid being verbose.) It then returns the one-dimensional NumPy array phiOfz, which has the activation probability for each sample sentence in X.

The function fit_helper receives three parameters: the same two-dimensional NumPy array X, the one-dimensional NumPy array y that stores the correct outputs for the sample sentences of X, and the one-dimensional NumPy array w that has the weights and the constant term calculated for the 1861 key words. The function first calls activationProb to calculate the probabilities for the sample sentences of X. It then calculates the output for each sample sentence: 1 if the probability ≥ 0.5; 0 otherwise. The errors for the sample sentences are then calculated accordingly. The most important part of the function is the calculation of the slopes (partial derivatives) of the current weights, which are updated in the next line. The constant term w[0] is updated after that.

Since there are 1861 weights, there are 1861 slopes. The ideal scenario is that each slope will become zero in the calculation, which is practically impossible. We thus allow an allowance for such a small discrepancy, which is stored in self.slopeErrorAllowance. Thus, the total allowance for the 1861 slopes is self.slopeErrorAllowance * 1861 (= X.shape[1]), which is then stored in stopVal. If the sum of the absolute values of the 1861 slopes is less than stopVal, the function exits the for loop and no more updates are necessary. Otherwise, the for loop continues for self.n_iter times, a number that is set to 500. Note that the learning rate self.eta is set to 0.0001.

The function fit accepts two parameters: X, the two-dimensional NumPy array, and aL, the one-dimensional array that stores the correct sentence types of the sample sentences of X. It first finds the unique sentence types in aL. Then, it applies the one-versus-the-rest approach for each unique sentence type to calculate the weights and the constant term for the 1861 key words. To do so, each appearance of the sentence type in y, which is a copy of aL, is replaced with a 1, and 0 elsewhere. The function fit then calls self.fit_helper to calculate the 1861 weights and the constant term for that particular sentence type, which will be used to make predictions.

The calculated weights and constant term for each sentence type are stored in self.ws, which is a 6×1862 NumPy array. To make a prediction for a sentence, we use the predict function, which calculates the probability of each sentence type on the input sentence and assigns the sentence type that has the greatest probability to the sentence.

4.2.3 scikit-learn's implementation of the logistic regression algorithm. scikit-learn's implementation of the logistic regression algorithm can be straightforwardly applied. We first randomly shuffle the 529 sentences and their corresponding sentence types. We then apply the function train_test_split to split the 529 sentences and their sentence types into two sets: 370 training sentences and 159 testing sentences. Afterwards, a logistic regression object is created and fitted with the training data.

Table 1: Comparison of our implementation and scikit-learn's implementation of the logistic regression algorithm

    Methods          Tests   Correct Guesses   Total Guesses
    Ours             100     8707              15900
    scikit-learn's   100     8454              15900

4.2.4 Discussions. To compare our implementation and that of scikit-learn for the logistic regression algorithm, we randomly select 370 training sentences and 159 testing sentences from the 529 sentences of the three sample court decisions and feed them to our implementation. The results are shown in Table 1, which indicates that our implementation has more correct predictions than the highly optimized implementation of scikit-learn. Furthermore, since we are familiar with our code, our implementation can serve as a platform for future improvements.

Our result is remarkable if we compare our implementation with randomly guessing the sentence types for the 159 sentences with six possibilities for each sentence. Note that the probability of randomly making a correct guess is p = 1/6 = 0.166667. Assuming each guess is an independent trial, making exactly 87 correct predictions and making 87 or more correct predictions should follow the binomial distribution. Thus, applying Microsoft Excel's binomial distribution statistical functions BINOM.DIST and BINOM.DIST.RANGE to the problems, we obtain the probabilities in Table 2.
Table 2: Probabilities of randomly guessing with exactly 87 or with 87 or more correct predictions for the 159 testing sentences

    Event        Formula                          Probability
    Exactly 87   BINOM.DIST(87,159,p,FALSE)       9.08471E-28
    87 or more   BINOM.DIST.RANGE(159,p,87,159)   1.08519E-27

The probabilities for both events are close to zero. Hence, our keyword approach to capturing the characteristics of the sentences of breach of contract court decisions is on the right track, although much improvement remains to be made.

5 FUTURE RESEARCH DIRECTIONS
In our experiments, we discover that if the sum of the absolute values of the slopes does not reduce to zero, it usually goes through cycles with respect to the number of iterations determined by self.n_iter, which is currently set to 500. Our next goal is to find the right number for self.n_iter so that the sum of the absolute values of the slopes is as small as possible.

The ontology in Figure 1 requires more details. More details will result in more accurate characterization of the five types of sentences, which will lead to more accurate classification algorithms.

The word-vector approach of calculating the similarity of two sentences might yield good results. Thus, we will utilize Word2vec, which is one such algorithm for learning a word embedding from a text corpus [16].

6 CONCLUSIONS
In this early stage of the research, we focus on breach of contract court decisions. Five critical contract sentence types have been identified: contract fact sentences, contract law sentences, contract holding sentences, contract issue sentences, and contract reasoning sentences. spaCy, an NLP Python library, is used to parse the court decisions. Three sample breach of contract court decisions are chosen to test our own implementation and the highly optimized scikit-learn implementation of the logistic regression algorithm. Our implementation consistently has more correct predictions than the one provided by scikit-learn, and it can serve as a platform for future improvements. Our keyword approach is a good starting point because the number of correct predictions is far greater than the number produced by randomly guessing the sentence types for the 159 testing sentences. Lastly, many possible future improvements have also been identified.

REFERENCES
[1] Joost Breuker, André Valente, and Radboud Winkels. 2004. Legal Ontologies in Knowledge Engineering and Information Management. Artificial Intelligence and Law 12, 4 (01 Dec 2004), 241–277. https://doi.org/10.1007/s10506-006-0002-1
[2] Nuria Casellas. 2011. Legal Ontology Engineering: Methodologies, Modelling Trends, and the Ontology of Professional Judicial Knowledge (1 ed.). Law, Governance and Technology Series, Vol. 3. Springer Netherlands. https://doi.org/10.1007/978-94-007-1497-7
[3] Gobinda G. Chowdhury. 2003. Natural language processing. Annual Review of Information Science and Technology 37, 1 (2003), 51–89. https://doi.org/10.1002/aris.1440370103
[4] Kim Clark, Deepak Sharma, Rui Qin, Christopher G. Chute, and Cui Tao. 2014. A use case study on late stent thrombosis for ontology-based temporal reasoning and analysis. Journal of Biomedical Semantics 5, 49 (2014). https://doi.org/10.1186/2041-1480-5-49
[5] David Cournapeau et al. [n. d.]. scikit-learn: Machine Learning in Python. https://scikit-learn.org/stable/. Accessed: 2019-01-25.
[6] James Franklin. 2012. Discussion paper: how much of commonsense and legal reasoning is formalizable? A review of conceptual obstacles. Law, Probability and Risk 11, 2-3 (06 2012), 225–245. https://doi.org/10.1093/lpr/mgs007
[7] Mirna Ghosh, Hala Naja, Habib Abdulrab, and Mohamad Khalil. 2017. Ontology Learning Process as a Bottom-up Strategy for Building Domain-specific Ontology from Legal Texts. In 9th International Conference on Agents and Artificial Intelligence. 473–480. https://doi.org/10.5220/0006188004730480
[8] Thomas R. Gruber. 1995. Toward principles for the design of ontologies used for knowledge sharing? International Journal of Human-Computer Studies 43, 5 (1995), 907–928. https://doi.org/10.1006/ijhc.1995.1081
[9] Nicola Guarino and Pierdaniele Giaretta. 1995. Ontologies and knowledge bases: towards a terminological clarification. In Towards Very Large Knowledge Bases, N.J.I. Mars (Ed.). IOS Press, Amsterdam, 25–32.
[10] Matthew Honnibal et al. [n. d.]. spaCy: Industrial-Strength Natural Language Processing in Python. https://spacy.io/. Accessed: 2019-01-25.
[11] Deryle Lonsdale, David W. Embley, Yihong Ding, Li Xu, and Martin Hepp. 2010. Reusing ontologies and language components for ontology generation. Data & Knowledge Engineering 69, 4 (2010), 318–330. https://doi.org/10.1016/j.datak.2009.08.003
[12] Sebastian Raschka. 2015. Python Machine Learning (1 ed.). Packt Publishing.
[13] Edwina Rissland. 1985. AI and Legal Reasoning. In Proceedings of the 9th International Joint Conference on Artificial Intelligence, Volume 2. 1254–1260.
[14] Cui Tao and David W. Embley. 2009. Automatic Hidden-web Table Interpretation, Conceptualization, and Semantic Annotation. Data & Knowledge Engineering 68, 7 (July 2009), 683–703. https://doi.org/10.1016/j.datak.2009.02.010
[15] Donald A. Waterman, Jody Paul, and Mark Peterson. 1986. Expert systems for legal decision making. Expert Systems 3, 4 (1986), 212–226. https://doi.org/10.1111/j.1468-0394.1986.tb00203.x
[16] Wikipedia contributors. [n. d.]. Word2vec. https://en.wikipedia.org/wiki/Word2vec/. Accessed: 2019-01-26.