<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The LATO Knowledge Model for Automated Knowledge Extraction and Enrichment from Court Decisions Corpora</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Silvana Castano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mattia Falduti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfio Ferrara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Montanelli</string-name>
          <email>stefano.montanelli@unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Milano, Department of Computer Science</institution>
          ,
          <addr-line>Via Celoria 18, 20133 Milano</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Knowledge extraction systems are in strong demand in the legal domain to provide legal actors, such as judges and lawyers, with useful and relevant information for the knowledge-based evaluation and judgement of new cases. In this paper, we present LATO-KM, a three-layer legal knowledge model in which the terms featuring legal knowledge, from both law and case-law, are formalized as entities and relationships and implemented in the LATO ontology using SKOS. The LATO ontology is the core component of CRIKE (CRIme Knowledge Extraction), a data-science approach and related tool environment conceived to support legal knowledge extraction and enrichment from a corpus of Court Decision documents.</p>
      </abstract>
      <kwd-group>
        <kwd>Legal Knowledge Model</kwd>
        <kwd>Legal Ontology</kwd>
        <kwd>Knowledge Extraction</kwd>
        <kwd>Knowledge Enrichment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Law is the set of rules that govern human conduct. It is stated in general and
abstract terminology, because it has to be applicable to many cases and events. By
contrast, Court Decisions (CDs) are written in specific and concrete terminology,
because they provide a contextualized, case-oriented interpretation of the law that
reflects how judges and lawyers apply law statements to the specific circumstances
of the case at hand. Both law and case-law (that is, the set of CDs) are prominent
knowledge sources for the knowledge-based evaluation and judgement of a new case:
the law provides the general legal framework, and case-law provides the specific
interpretations adopted for already processed cases. When a new case is received
for judgement, the knowledge-based evaluation process takes into account the legal
knowledge relevant to defining the CD, that is, knowledge deriving from i) the law,
for understanding the general rules that are relevant to the current case [<xref ref-type="bibr" rid="ref12 ref9">9, 12</xref>],
and ii) case-law, for detecting relevant interpretations of law terminology in the
history of similar CDs [<xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>]. In this context, automated legal knowledge
extraction systems are in strong demand to support the annotation of legal documents
and the extraction of legal knowledge from them, so as to provide legal actors (e.g.,
judges, lawyers) with useful and relevant suggestions for managing new incoming
cases [<xref ref-type="bibr" rid="ref11">11</xref>]. Some contributions are appearing in the literature [<xref ref-type="bibr" rid="ref1 ref8">1, 8</xref>]. In [<xref ref-type="bibr" rid="ref14">14</xref>],
the authors propose to combine Natural Language Processing (NLP) and machine
learning techniques for mining relevant legal terms from documents. The LUIMA
approach, characterized by sentence-level annotations and reranking techniques,
has also been proposed to enforce retrieval over a CD dataset [<xref ref-type="bibr" rid="ref4">4</xref>]. Moreover, a
particularly relevant contribution is provided in [<xref ref-type="bibr" rid="ref13">13</xref>] about the extraction of
case-law sentences for the argumentation of statutory terms. However, the accuracy
of the above solutions depends on the completeness of the term-sets associated with
concepts, and, due to the variety of terminology adopted by judges in legal documents
such as Court Decisions, accurate and complete term-sets are very hard to build. A
further challenge for effective legal knowledge extraction is developing knowledge
models and related ontology tools that link the general and abstract knowledge
expressed by law terminology in law texts with the specific and concrete knowledge
expressed in CD texts. In fact, discovering where and how abstract law terms have
been applied by judges in Court Decisions is currently a task for human experts and,
in most cases, a time-consuming activity [<xref ref-type="bibr" rid="ref3">3</xref>].
      </p>
      <p>In this paper, we present LATO-KM (Legal Abstract Term Ontology - Knowledge Model), a three-layer knowledge model in which the terms featuring legal knowledge, from both law and case-law, are formalized as entities and relationships and implemented in the LATO ontology using SKOS. LATO-KM and the related LATO ontology constitute a core component of CRIKE (CRIme Knowledge Extraction), a data-science approach and related tool environment conceived to support knowledge extraction and enrichment from a corpus of Court Decision documents. Knowledge extraction in CRIKE is based on multi-label classification techniques that aim at associating CD documents with appropriate concepts in the LATO ontology. Knowledge enrichment in CRIKE is based on black-box model explanation techniques that aim at selecting the document features (i.e., terms) that are candidates for the enrichment of the LATO ontology.</p>
      <p>The paper is organized as follows. Section 2 presents the LATO knowledge model and its formalization in the LATO ontology. The CRIKE techniques for knowledge extraction and enrichment are described in Section 3. In Section 4, we discuss experimental results on a real CD dataset. Concluding remarks are finally provided in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>The LATO legal knowledge model</title>
      <p>
        The legal knowledge model of LATO captures and formalizes the features and
nature of the terminology used in law and case-law documents. A design challenge is
to find a suitable way of modeling the different nature of the terms appearing in law
and case-law, as well as their meanings and roles [<xref ref-type="bibr" rid="ref2">2</xref>]. To model legal knowledge
and capture these requirements, we define LATO-KM, a three-layer knowledge
model based on the following entities and relationships (see Fig. 1):
      </p>
      <p>[Fig. 1: LATO-KM with its three layers (functional category, legal concept, term-set) and the is-a and instance-of relationships. Example entities: Drug Minor Offense, Act, Drug Trafficking, Illinois Controlled Substances Act (terms: Illinois Contr. Sub. Act, ICSA, …), Deal (terms: Deal, Dealing, to deal, …), Unit of Measure, Weight, Kilo (terms: Kilo, kilogram, kg., …).]</p>
      <p>
        <list list-type="bullet">
          <list-item>
            <p>Legal concept: a legal concept C<sub>i</sub> denotes a general rule, fact, or element defined in the law (e.g., Act, Illinois Controlled Substances Act) and is labelled with the terminology that appears in law texts. Legal concepts constitute the intermediate layer of LATO-KM.</p>
          </list-item>
          <list-item>
            <p>Term-set: a term-set T<sub>i</sub> represents the concrete interpretation of a legal concept C<sub>i</sub> as a set of term occurrences that can be found in case-law texts. A term in a term-set is a string of characters in the language of the case-law texts; multi-word expressions are also considered terms in LATO (e.g., both Illinois Contr. Sub. Act and the acronym ICSA are terms). Term-sets constitute the bottom layer of LATO-KM.</p>
          </list-item>
          <list-item>
            <p>Functional category: a functional category represents the kind or role of a legal concept in the law formulation, namely statutory, descriptive, modifier, or abstract. A statutory category describes a legal concept featuring something that is directly or indirectly defined in the law specification itself (e.g., Act). A descriptive category describes a legal concept featuring actions, human activities, and any real-life object in the law specification (e.g., Drug Trafficking). A modifier category describes a legal concept featuring quantitative or qualitative aspects of things or actions in the law specification (e.g., Weight). An abstract category describes a legal concept featuring something indeterminate that requires a concrete application to be actually defined (e.g., Drug Minor Offense) [<xref ref-type="bibr" rid="ref2">2</xref>]. Functional categories constitute the top layer of LATO-KM.</p>
          </list-item>
        </list>
      </p>
      <p>According to LATO-KM, the concrete meaning of legal concepts is fully defined by referring to the specific terminology (i.e., term-set) that appears in real CDs. Moreover, legal concepts are classified with respect to the role they play in the law formulation by means of functional categories. Formally, a legal concept C<sub>i</sub> is defined as a triple of the form:</p>
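      <p>
        <disp-formula><tex-math>C_i = \langle n(C_i), \mathcal{C}(C_i), T_i^{*} \rangle</tex-math></disp-formula>
        where:
        <list list-type="bullet">
          <list-item><p><inline-formula><tex-math>n(C_i)</tex-math></inline-formula> is the label of the legal concept;</p></list-item>
          <list-item><p><inline-formula><tex-math>\mathcal{C}(C_i) \in \{SC, DC, MC, AC\}</tex-math></inline-formula> is the functional category of C<sub>i</sub>, either statutory (SC), descriptive (DC), modifier (MC), or abstract (AC);</p></list-item>
          <list-item><p><inline-formula><tex-math>T_i = \{t_1, \ldots, t_n\}</tex-math></inline-formula> is the term-set of the concept C<sub>i</sub>, namely the language terms concretely used in legal document corpora (i.e., Court Decisions) to refer to C<sub>i</sub>. The asterisk symbol (“*”) denotes optionality, in that some legal concepts may not yet be associated with a term-set. For instance, abstract concepts are not directly associated with a specific term-set; rather, they are indirectly expressed through the term-sets associated with the legal concepts to which the abstract concept is related.</p></list-item>
        </list>
      </p>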
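      <p>As a reading aid, the triple defined above can be mirrored by a small Python data structure. This is a minimal sketch under our own assumptions: the names <monospace>FunctionalCategory</monospace>, <monospace>LegalConcept</monospace>, and <monospace>is_lexicalized_by</monospace> are hypothetical, the optional term-set is represented by <monospace>None</monospace>, and the statutory category given to the Illinois Controlled Substances Act is assumed for illustration. It is not taken from the LATO or CRIKE code base.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class FunctionalCategory(Enum):
    """Top layer of LATO-KM: the role a legal concept plays in the law formulation."""
    SC = "statutory"
    DC = "descriptive"
    MC = "modifier"
    AC = "abstract"


@dataclass
class LegalConcept:
    """Hypothetical encoding of C_i = <n(C_i), C(C_i), T_i*>.

    term_set is None when the concept has no term-set yet (the optional T_i*),
    e.g. for abstract concepts expressed only through related concepts.
    """
    label: str
    category: FunctionalCategory
    term_set: Optional[Set[str]] = None

    def is_lexicalized_by(self, term: str) -> bool:
        """True if the term belongs to T_i (the instance-of relationship)."""
        return self.term_set is not None and term in self.term_set


# Statutory category assumed for illustration (by analogy with "Act").
icsa = LegalConcept(
    "Illinois Controlled Substances Act",
    FunctionalCategory.SC,
    {"Illinois Contr. Sub. Act", "ICSA"},
)
# Abstract concept without a term-set, as in the Drug Minor Offense example.
minor = LegalConcept("Drug Minor Offense", FunctionalCategory.AC)

print(icsa.is_lexicalized_by("ICSA"))   # True
print(minor.is_lexicalized_by("drug"))  # False
```
]]></preformat>
      <p>Using <monospace>None</monospace> rather than an empty set keeps “no term-set yet” distinct from “a term-set that happens to be empty”, which matches the optionality expressed by the asterisk.</p>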
      <p>Intra- and inter-layer relationships are defined in LATO-KM to capture the semantic relationships that hold between pairs of entities. The following intra-layer binary relationships are defined in LATO-KM:
        <list list-type="bullet">
          <list-item>
            <p>Term-to-Term: a binary relationship between a pair of terms t and t′ in a term-set T<sub>i</sub> at the bottom layer, which holds due to either a morphological or a linguistic relationship between the terms. Examples of morphological relationships are:
              <list list-type="bullet">
                <list-item><p>paradigm (e.g., to deal - dealt - dealt);</p></list-item>
                <list-item><p>conjugation for verbs (e.g., dealt - deals - dealing);</p></list-item>
                <list-item><p>declension for nouns (e.g., drug - drugs - drug's);</p></list-item>
                <list-item><p>abbreviation (e.g., Illinois Contr. Sub. Act - ICSA);</p></list-item>
                <list-item><p>string similarity (e.g., Substances Act - substances act - Subs. Act).</p></list-item>
              </list>
              An example of a linguistic relationship is synonymy (e.g., Paragraph - Section).</p>
          </list-item>
          <list-item>
            <p>Concept-to-Concept: a binary relationship between two legal concepts C<sub>i</sub> and C<sub>j</sub> at the intermediate layer, capturing the semantic relationships that hold between them in the law formulation. In particular, we introduce the kind-of relationship between two concepts to represent a generalization/specialization relationship between them. For example, Drug Minor Offense kind-of Minor Offense expresses the fact that the former is a more specific crime than the latter in the law. Moreover, we introduce the related relationship between two concepts to represent a generic positive relationship between them. For example, Drug Minor Offense related Drug expresses the fact that the crime of drug minor offense involves the possession of drugs in some quantity.</p>
          </list-item>
        </list>
      </p>
      <p>The following inter-layer binary relationships are defined in LATO-KM:
        <list list-type="bullet">
          <list-item>
            <p>Term-to-Concept: a binary relationship between a term t ∈ T<sub>i</sub> and a legal concept C<sub>i</sub>, denoting that C<sub>i</sub> can be “lexicalized” by t in a CD text. A Term-to-Concept relationship is defined through the instance-of relationship for each term t ∈ T<sub>i</sub> and the corresponding legal concept C<sub>i</sub> at the intermediate layer of LATO-KM. For example, ICSA instance-of Illinois Controlled Substances Act expresses that the term ICSA belongs to the term-set of the concept Illinois Controlled Substances Act.</p>
          </list-item>
          <list-item>
            <p>Concept-to-Category: a binary relationship between a legal concept C<sub>i</sub> and a functional category <inline-formula><tex-math>\mathcal{C}(C_i)</tex-math></inline-formula> expressing the nature of the concept in the law formulation. A Concept-to-Category relationship is defined through the is-a relationship. For example, Act is-a Statutory expresses that the notion of Act is directly defined in the law.</p>
          </list-item>
        </list>
      </p>
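      <p>The four relationship kinds can be pictured as typed edges between entities of the three layers. The sketch below is illustrative only: the triple list, the helper names <monospace>build_index</monospace>, <monospace>targets</monospace>, and <monospace>generalizations</monospace> are hypothetical, the Cocaine kind-of Drug edge is taken from the SKOS example in Section 2.1, and treating kind-of as transitive is our assumption rather than a stated property of LATO-KM.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

# Hypothetical triple store for LATO-KM relationships: (source, relation, target).
EDGES = [
    # Term-to-Concept (inter-layer)
    ("ICSA", "instance-of", "Illinois Controlled Substances Act"),
    ("Illinois Contr. Sub. Act", "instance-of", "Illinois Controlled Substances Act"),
    # Concept-to-Concept (intra-layer)
    ("Drug Minor Offense", "kind-of", "Minor Offense"),
    ("Cocaine", "kind-of", "Drug"),
    ("Drug Minor Offense", "related", "Drug"),
    # Concept-to-Category (inter-layer)
    ("Act", "is-a", "Statutory"),
    ("Drug Minor Offense", "is-a", "Abstract"),
]


def build_index(edges):
    """Map (source, relation) to the set of its targets."""
    index = defaultdict(set)
    for source, relation, target in edges:
        index[(source, relation)].add(target)
    return dict(index)


INDEX = build_index(EDGES)


def targets(source, relation):
    return INDEX.get((source, relation), set())


def generalizations(concept):
    """All concepts reachable from `concept` by following kind-of transitively."""
    seen, stack = set(), [concept]
    while stack:
        for parent in targets(stack.pop(), "kind-of"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(targets("ICSA", "instance-of"))         # {'Illinois Controlled Substances Act'}
print(generalizations("Drug Minor Offense"))  # {'Minor Offense'}
print(targets("Act", "is-a"))                 # {'Statutory'}
```
]]></preformat>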
      <sec id="sec-2-1">
        <title>The LATO ontology structure</title>
        <p>
          The LATO-KM is implemented in a LATO ontology by using the Simple
Knowledge Organization System (SKOS) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Table 2.1 provides a summary view of the
SKOS concepts and relations used in the LATO ontology to implement entities
and relationships of the LATO-KM (see Fig. 1).
        </p>
        <p>The legal concepts of the intermediate layer are implemented as SKOS concepts in LATO. Concept-to-Concept relationships are specified through a corresponding SKOS relation. In particular, the kind-of relationship of LATO-KM is specified through the skos:broader relation. For instance, a skos:broader relation is defined between the concept Cocaine and the concept Drug. The related relationship of LATO-KM is specified through the skos:related relation. For instance, a skos:related relation is defined between the concept Drug Minor Offense and the concept Drug.</p>
        <p>The term-sets of the bottom layer are implemented using labels of SKOS concepts. In particular, for each SKOS concept i) a skos:prefLabel is defined to implement the instance-of relationship, and ii) a number of skos:altLabel relations are defined to implement the various Term-to-Term relationships denoting possible alternative terms for the considered SKOS concept. For instance, a skos:prefLabel relation is defined between the Drug LATO concept and the Drug term, while a skos:altLabel relation is defined between the Drug term and the Narcotics term.</p>
        <p>[Table 2.1: SKOS implementation of the LATO-KM relationships; columns: Relation, Relation name, SKOS implementation, Example; rows: Concept-to-Category, Concept-to-Concept, Term-to-Concept, Term-to-Term.]</p>
        <p>Finally, the functional categories of the top layer are implemented as SKOS concepts, too. Concept-to-Category is-a relationships are expressed through the skos:broader relation. For instance, a skos:broader relation is defined between the Drug LATO concept and the Descriptive category concept.</p>
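        <p>To make the SKOS mapping concrete, the sketch below prints the examples of this section as Turtle, using only the Python standard library (a library such as rdflib could be used instead). The SKOS namespace IRI is the standard one; the <monospace>lato:</monospace> namespace <monospace>http://example.org/lato#</monospace>, the CamelCase local names, the <monospace>@en</monospace> language tag, and the helper names are hypothetical choices made for illustration, not the published LATO ontology.</p>
        <preformat preformat-type="code"><![CDATA[
```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
LATO_NS = "http://example.org/lato#"  # hypothetical namespace, not the real LATO IRI


def literal(text):
    """Turtle string literal with an English tag; backslashes and quotes are escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'


def concept_block(local_name, pref, alts=(), broader=(), related=()):
    """One SKOS concept: prefLabel/altLabel for terms, broader/related for links."""
    props = ["a skos:Concept", f"skos:prefLabel {literal(pref)}"]
    props += [f"skos:altLabel {literal(a)}" for a in alts]
    props += [f"skos:broader lato:{b}" for b in broader]
    props += [f"skos:related lato:{r}" for r in related]
    return f"lato:{local_name} " + " ;\n    ".join(props) + " ."


def to_turtle(blocks):
    header = f"@prefix skos: <{SKOS_NS}> .\n@prefix lato: <{LATO_NS}> .\n"
    return header + "\n" + "\n\n".join(blocks) + "\n"


ttl = to_turtle([
    concept_block("Descriptive", "Descriptive"),  # functional category (top layer)
    concept_block("Drug", "Drug", alts=["Narcotics"], broader=["Descriptive"]),  # is-a
    concept_block("Cocaine", "Cocaine", broader=["Drug"]),  # kind-of
    concept_block("DrugMinorOffense", "Drug Minor Offense", related=["Drug"]),  # related
])
print(ttl)
```
]]></preformat>
        <p>Note that skos:broader serves two LATO-KM relationships here (kind-of and is-a); the target concept tells them apart, since functional categories are themselves SKOS concepts.</p>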
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Knowledge extraction and enrichment in CRIKE</title>
      <p>The LATO ontology is a core component of the CRIKE approach, which enforces knowledge extraction and enrichment over a given corpus of Court Decision documents (see Figure 3). The goal of CRIKE is to progressively enrich the knowledge specified in a reference LATO ontology by extracting the concrete terminology associated with concept applications and interpretations occurring in the considered document corpus. At the beginning, CRIKE relies on an initial version of the LATO ontology in which domain experts manually define a starting set of legal concepts of interest with their associated term-sets. CRIKE is enforced as a cyclic, incremental approach in which each execution of the knowledge extraction and knowledge enrichment tasks produces a new, enriched version of the LATO ontology. The enrichment task consists in discovering terms to populate the term-sets of the legal concepts that have been recognized in the text of the CD documents. The new ontology version is then exploited to trigger the execution of a new CRIKE cycle to further enrich the LATO ontology. CRIKE cycles stop when the enrichment of the LATO ontology terminates, namely when no additional terms can be detected and extracted for insertion in the LATO ontology. As a result, the knowledge available in the LATO ontology can be exploited to support legal actors such as judges and lawyers in managing new incoming legal trials and taking appropriate Court Decisions.</p>
      <p>[Figure 3: The CRIKE (CRIme Knowledge Extraction) approach: a corpus of Court Decisions, the knowledge extraction and knowledge enrichment tasks, the LATO ontology structured according to LATO-KM, and legal actors (e.g., judges, lawyers) producing Court Decisions on incoming trials.]</p>
      <p>Knowledge extraction in CRIKE is based on multi-label classification techniques in which the training set is built from the ontology contents, without the need for manual annotation. In other words, CRIKE works as a sort of self-training scheme that can be considered a kind of semi-supervised learning approach. Extraction is articulated in three main steps as follows:</p>
      <p>Document annotation. For each CD document d, the goal of annotation is to determine the set of associated legal concepts C<sub>d</sub> as follows:
        <disp-formula><tex-math>C_d = \left\{ C_i : \sum_{t \in T_i} w(t, d) \geq th \right\}</tex-math></disp-formula>
        where w(t, d) is the weight of a term t in the document d, computed with standard information retrieval techniques based on tokenization and tf-idf, and th is a threshold that sets the minimum cumulative weight of the terms t ∈ T<sub>i</sub> required for associating the corresponding concept C<sub>i</sub> with the document d.</p>
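      <p>The annotation rule can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions: the regular-expression tokenizer, the smoothed idf variant (log((1 + N)/(1 + df)) + 1), and the helper names <monospace>tokenize</monospace>, <monospace>occurrences</monospace>, <monospace>weight</monospace>, and <monospace>annotate</monospace> are ours; the paper only states that w(t, d) relies on tokenization and tf-idf. Multi-word terms such as narcotic drugs are matched as token sequences.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
import re


def tokenize(text):
    """Lower-case word tokens (a simple stand-in for the paper's tokenizer)."""
    return re.findall(r"\w+", text.lower())


def occurrences(tokens, term):
    """Number of times a (possibly multi-word) term occurs in the token list."""
    words = tokenize(term)
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)


def weight(term, tokens, corpus):
    """w(t, d): raw term frequency times a smoothed idf (an assumed tf-idf variant)."""
    df = sum(1 for doc in corpus if occurrences(doc, term) > 0)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return occurrences(tokens, term) * idf


def annotate(tokens, corpus, term_sets, th):
    """C_d: concepts whose term-set reaches a cumulative weight of at least th in d."""
    return {
        concept
        for concept, terms in term_sets.items()
        if sum(weight(t, tokens, corpus) for t in terms) >= th
    }


corpus = [tokenize(s) for s in [
    "The defendant sold cocaine and other narcotic drugs.",
    "The court reviewed the evidence presented at trial.",
]]
term_sets = {"Drug": {"cocaine", "narcotic drugs"}, "Evidence": {"evidence"}}
print(annotate(corpus[0], corpus, term_sets, th=1.0))  # {'Drug'}
print(annotate(corpus[1], corpus, term_sets, th=1.0))  # {'Evidence'}
```
]]></preformat>
      <p>Because the weights of all terms in T<sub>i</sub> are summed, a concept can be assigned either through one highly weighted term or through several weaker ones; th controls how much evidence is required.</p>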
      <p>
        Document vector representation. For each document d in the corpus, a
vector-based representation <bold>d</bold> is derived by exploiting doc2vec techniques [<xref ref-type="bibr" rid="ref7">7</xref>]. In
particular, doc2vec is an unsupervised algorithm that learns fixed-length feature
representations from variable-length pieces of text (e.g., documents). The algorithm
represents each document by a dense vector that is trained to predict the words in
the document. In addition, each document vector <bold>d</bold> is associated with a concept
vector <bold>c</bold><sub>d</sub>, where each dimension corresponds to a concept C<sub>i</sub> in the LATO
ontology and its value is 1 if C<sub>i</sub> ∈ C<sub>d</sub>, and 0 otherwise.
      </p>
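      <p>The concept vector <bold>c</bold><sub>d</sub> is a multi-hot encoding of C<sub>d</sub>. The sketch below shows only that encoding, over the six concepts used in Section 4; the doc2vec embedding <bold>d</bold> itself is usually computed with a library implementation (for example, the <monospace>Doc2Vec</monospace> class of gensim) and is not reproduced here. The function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
CONCEPTS = ["drug", "drug trafficking", "unit of measure",
            "illinois legislation", "criminal procedure", "evidence"]


def concept_vector(annotated, concepts=CONCEPTS):
    """Multi-hot c_d: position i is 1 if concept i belongs to C_d, else 0."""
    return [1 if c in annotated else 0 for c in concepts]


def concepts_from_vector(vector, concepts=CONCEPTS):
    """Inverse mapping, from a 0/1 vector back to the set of concepts."""
    return {c for c, bit in zip(concepts, vector) if bit == 1}


c_d = concept_vector({"drug", "evidence"})
print(c_d)  # [1, 0, 0, 0, 0, 1]
```
]]></preformat>
      <p>A fixed concept order is what makes the vector meaningful: the same list must be used when building training targets and when decoding classifier outputs.</p>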
      <p>Document classification. A multi-label classifier is employed to generate a model capable of predicting the association of CD documents with legal concepts. In CRIKE, we employ a Convolutional Neural Network (CNN) with the goal of generalizing the terminology of the documents, so that legal concepts can be associated with Court Decisions that contain terms other than those already included in LATO. For each document d, the CNN receives the document vector representation <bold>d</bold> as input and produces the corresponding concept vector representation <bold>c</bold><sub>d</sub> as output. As a result, a multi-label classification model M is generated to map corpus documents to legal concepts in the LATO ontology. In particular, we write C<sub>i</sub> ∈ M(d) to denote that the document d is associated with the legal concept C<sub>i</sub> through the model M.</p>
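      <p>To illustrate how a prediction M(d) can be read off the network, the sketch below runs a forward pass in plain Python: a 1-D convolution with ReLU, a dense output layer, and one sigmoid per concept. Section 4 states that the network has three layers with a ReLU-activated convolution filter between input and output, but the per-concept sigmoid, the 0.5 cut-off, the single filter, and the hand-picked toy weights are our assumptions; training is omitted and the function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def conv1d(x, kernel, bias=0.0):
    """'Valid' 1-D convolution (cross-correlation, as in common deep-learning libraries)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def dense(values, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def predict(doc_vec, kernel, weights, biases, concepts, cut=0.5):
    """M(d): every concept whose independent sigmoid score reaches `cut`."""
    hidden = relu(conv1d(doc_vec, kernel))
    scores = sigmoid(dense(hidden, weights, biases))
    return {c for c, s in zip(concepts, scores) if s >= cut}


# Toy 4-dimensional "document vector" and hand-picked weights, for illustration only.
doc_vec = [0.2, 0.5, -0.1, 0.4]
print(predict(doc_vec, kernel=[1.0, -1.0],
              weights=[[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]],
              biases=[0.0, 0.0], concepts=["drug", "evidence"]))  # {'drug'}
```
]]></preformat>
      <p>Independent sigmoids (rather than a softmax) let several concepts be predicted for the same document, which is what a multi-label setting requires.</p>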
      <sec id="sec-3-1">
        <title>Knowledge enrichment</title>
        <p>Knowledge enrichment in CRIKE is based on black-box model explanation techniques that aim at selecting the document features (i.e., terms) that play a major role in the decision of the CNN classifier (see the knowledge extraction step above) about the association of concepts with the corpus documents. The selected document features are candidates for the enrichment of the LATO ontology. Enrichment is articulated in three main steps as follows:</p>
        <p>
          Classification explanation. Black-box model explanation is enforced by
relying on LIME (Local Interpretable Model-agnostic Explanations) [<xref ref-type="bibr" rid="ref10">10</xref>]. LIME
has been proposed to provide local explanations of black-box models, that is, to
explain why (i.e., due to which features) a black-box model assigns a given class to
a certain document [<xref ref-type="bibr" rid="ref5">5</xref>]. Given the classification model M generated by the CNN,
LIME calculates a score σ(t, d) for each term t ∈ d, where σ(t, d) is directly
proportional to the importance of t in determining the model decision C<sub>i</sub> ∈ M(d).
Moreover, we exploit LIME to extend the black-box model explanation to the concept
layer of LATO as follows. Given a concept C<sub>i</sub>, we consider all the documents
D<sub>C<sub>i</sub></sub> = {d : C<sub>i</sub> ∈ M(d)} and all the terminology that is potentially relevant for
C<sub>i</sub>, that is:
        </p>
        <p>
          <disp-formula><tex-math>T_{C_i} = \left\{ t : t \in \bigcup_{d \in D_{C_i}} d \right\}</tex-math></disp-formula>
        </p>
        <p>Then, we associate each term t ∈ T<sub>C<sub>i</sub></sub> with a degree of relevance ρ<sub>C<sub>i</sub></sub>(t) as follows:
          <disp-formula><tex-math>\rho_{C_i}(t) = \sum_{d \in D_{C_i}} \sigma(t, d)</tex-math></disp-formula>
          The set T<sub>C<sub>i</sub></sub> contains the terms that are candidates to enrich the term-set associated with C<sub>i</sub> in LATO.</p>
        <p>Note on the choice of the classifier: the choice of a CNN is due to the positive experimental results we observed in a number of case studies. As a general remark, different kinds of multi-label classifiers, such as random forest and kNN, can be employed for document classification.</p>
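        <p>The computation of D<sub>C<sub>i</sub></sub>, T<sub>C<sub>i</sub></sub>, and ρ<sub>C<sub>i</sub></sub>(t) can be sketched as follows, assuming the per-document LIME scores have already been computed and are stored in a dictionary. The data layout, the function name <monospace>concept_relevance</monospace>, the convention σ(t, d) = 0 when LIME reports no score for t in d, and all numbers below are hypothetical, loosely modeled on the d<sub>1</sub>/d<sub>2</sub> example of Figure 4.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def concept_relevance(concept, predictions, documents, lime_scores):
    """Return D_Ci, T_Ci and rho_Ci(t) for one concept.

    predictions: doc id -> M(d), the set of concepts predicted for d
    documents:   doc id -> set of terms occurring in d
    lime_scores: (doc id, concept) -> {term: sigma(t, d)}  (hypothetical layout)
    """
    d_ci = {d for d, labels in predictions.items() if concept in labels}
    t_ci = set().union(*(documents[d] for d in d_ci))
    # sigma(t, d) is taken as 0 when LIME reports no score for t in d.
    rho = {
        t: sum(lime_scores.get((d, concept), {}).get(t, 0.0) for d in d_ci)
        for t in t_ci
    }
    return d_ci, t_ci, rho


# Hypothetical predictions and LIME scores, for illustration only.
predictions = {"d1": {"Drug"}, "d2": {"Drug"}, "d3": {"Evidence"}}
documents = {
    "d1": {"narcotic drugs", "cannabis", "opium", "paragraph"},
    "d2": {"controlled substances", "defendant"},
    "d3": {"suitcase"},
}
lime_scores = {
    ("d1", "Drug"): {"narcotic drugs": 0.4, "cannabis": 0.3, "opium": 0.2},
    ("d2", "Drug"): {"controlled substances": 0.5},
}
d_ci, t_ci, rho = concept_relevance("Drug", predictions, documents, lime_scores)
print(sorted(d_ci))                  # ['d1', 'd2']
print(rho["controlled substances"])  # 0.5
```
]]></preformat>
        <p>Summing over all documents of D<sub>C<sub>i</sub></sub> rewards terms that are consistently influential for the concept, not only in a single decision.</p>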
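        <p>Expert validation. For each concept C<sub>i</sub>, legal experts are involved to validate the terms in the set T<sub>C<sub>i</sub></sub> \ T<sub>i</sub>, that is, the candidate terms that are not yet in the term-set of C<sub>i</sub>. The legal experts then define the set R<sub>i</sub> ⊆ (T<sub>C<sub>i</sub></sub> \ T<sub>i</sub>) containing the terms that are relevant for C<sub>i</sub>. In the validation step, the experts exploit the degree of relevance ρ<sub>C<sub>i</sub></sub>(t) i) to filter out terms whose association with the concept C<sub>i</sub> is weak (i.e., low values of ρ<sub>C<sub>i</sub></sub>(t)), and ii) to select terms whose association with the concept C<sub>i</sub> is strong (i.e., high values of ρ<sub>C<sub>i</sub></sub>(t)).</p>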
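        <p>A simple way to support the expert is to present the new candidates ranked by relevance. In the sketch below, the top-k cut-off mirrors the top-20 review used in Section 4; the expert decision is modeled by a hypothetical callback <monospace>is_relevant</monospace>, and the helper names and scores are illustrative assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def review_queue(t_ci, t_i, rho, k=20):
    """Candidates T_Ci minus T_i, ranked by decreasing relevance rho (ties by name)."""
    return sorted(t_ci - t_i, key=lambda t: (-rho.get(t, 0.0), t))[:k]


def validate(t_ci, t_i, rho, is_relevant, k=20):
    """R_i: the reviewed candidates accepted by the expert (is_relevant is hypothetical)."""
    return {t for t in review_queue(t_ci, t_i, rho, k) if is_relevant(t)}


t_i = {"narcotic drugs", "cannabis"}
t_ci = t_i | {"controlled substances", "opium", "coca leaves", "paragraph"}
rho = {"controlled substances": 0.5, "opium": 0.2, "coca leaves": 0.1, "paragraph": 0.0}
print(review_queue(t_ci, t_i, rho, k=3))  # ['controlled substances', 'opium', 'coca leaves']
r_i = validate(t_ci, t_i, rho, is_relevant=lambda t: t != "paragraph")
```
]]></preformat>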
        <p>Ontology enrichment. According to the results of the legal expert validation, the term-set T<sub>i</sub> associated with each concept C<sub>i</sub> is enriched. Given the current CRIKE cycle k, enrichment is enforced as follows:
          <disp-formula><tex-math>T_i^{k+1} = T_i^{k} \cup R_i</tex-math></disp-formula>
          where T<sub>i</sub><sup>k</sup> is the term-set associated with the concept C<sub>i</sub> at the beginning of cycle k and T<sub>i</sub><sup>k+1</sup> is the term-set associated with C<sub>i</sub> after enrichment.</p>
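        <p>The update rule, together with the stopping condition of CRIKE (no additional terms can be extracted), amounts to iterating towards a fixpoint. The sketch below assumes a hypothetical <monospace>discover</monospace> callable standing in for one full extraction, explanation, and expert-validation pass; <monospace>enrich</monospace>, <monospace>run_crike</monospace>, and the <monospace>max_cycles</monospace> safeguard are our own illustrative names.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def enrich(term_sets, validated):
    """One enrichment step: T_i^(k+1) = T_i^k union R_i, for every concept."""
    return {c: terms | validated.get(c, set()) for c, terms in term_sets.items()}


def run_crike(term_sets, discover, max_cycles=10):
    """Repeat cycles until no term-set grows; return the term-sets and the enriching cycles.

    `discover` is a hypothetical stand-in for one extraction + explanation + expert
    validation pass: it maps the current term-sets to {concept: R_i}.
    """
    for cycle in range(max_cycles):
        enriched = enrich(term_sets, discover(term_sets))
        if enriched == term_sets:  # nothing new: enrichment has terminated
            return term_sets, cycle
        term_sets = enriched
    return term_sets, max_cycles


start = {"Drug": {"narcotic drugs", "cannabis"}}
final, cycles = run_crike(start, lambda ts: {"Drug": {"controlled substances", "opium"}})
print(sorted(final["Drug"]), cycles)
# ['cannabis', 'controlled substances', 'narcotic drugs', 'opium'] 1
```
]]></preformat>
        <p>Since term-sets only grow and each cycle adds terms drawn from a finite corpus, the loop terminates; the explicit cap simply guards against a misbehaving discovery step.</p>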
        <p>Example. In Figure 4, we report an example of two court decision fragments, d1 and d2, that are associated with the legal concept Drug.</p>
        <p>d1: [...] Paragraph 14 of section 1 of the same act provides: “Narcotic Drugs means coca leaves, opium, cannabis, and each substance neither chemically nor physically distinguishable from them.” [...]</p>
        <p>d2: [...] Defendant, who was charged by indictment with violation of 402 of the Illinois Controlled Substances Act" [...]</p>
        <p>The Court Decision d1 is included in the corpus used for training the multi-label classification model M. The model classifies both d1 and d2 as documents related to the legal concept Drug. In the case of d1, the decision of the classifier is trivial, since d1 is labelled as Drug-related in the training set and contains the terms narcotic drugs and cannabis. The decision of classifying d2 as Drug-related is less trivial, because d2 does not contain any of the terms provided by the experts as part of the term-set of the Drug concept. However, the two documents are semantically similar and, as a consequence, close enough in the feature (i.e., term) space to motivate the classifier decision of associating both with the concept Drug. Thus, we can use LIME to detect the terms of d1 and d2 that have the main impact on the classifier decision, namely the terms that, if deleted from the court decision, would more likely produce a different classification result. According to the LIME results, we obtain the following terms for the concept Drug: narcotic drugs, controlled substances, cannabis, coca leaves, opium. Among them, narcotic drugs and cannabis are already present in LATO, while the others (underlined in Figure 4) are validated by the legal experts and included in an enriched version of the term-set layer of LATO.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental results</title>
      <p>The goal of our experimentation is to assess the capability of our approach to discover new terms for enriching the term-sets of legal concepts in LATO. For the experiments, we select six concepts from our legal ontology, namely drug, drug trafficking, unit of measure, illinois legislation, criminal procedure, and evidence. These concepts are all related to the criminal drug legislation of the State of Illinois. The court decision corpus used for the experiments is composed of 14,000,000 sentences taken from about 180,000 decisions of courts of the State of Illinois, retrieved from the Caselaw Access Project (CAP), which provides public access to U.S. case law (https://case.law/bulk/download) digitized from the collection of the Harvard Law Library. Sentences are indexed by exploiting standard techniques for tokenization and compound term detection. The initial term-sets associated with the selected concepts have been manually defined by a legal expert and are shown in Table 1.</p>
      <p>By using CRIKE, we select a subset of 115,993 court decision sentences that constitutes the training set for the classification step. The training set is prepared for classification by embedding each document in a 100-dimensional vector using doc2vec, which yields a 115,993 × 100 corpus matrix. The six concepts selected for the experiment are associated with CD documents through the document annotation process discussed in Section 3. The classifier used to generate the model M is a neural network organized in three layers: between the input and the output layer, we use a convolution filter activated by ReLU. The accuracy of M obtained by cross-validation is 0.77. M is then used to perform terminology enrichment using LIME. For each legal concept C<sub>i</sub>, we obtain a new set of terms T<sub>C<sub>i</sub></sub>, where each term t is associated with the degree of relevance ρ<sub>C<sub>i</sub></sub>(t). In the experiment, a legal expert validated the top-20 terms of T<sub>C<sub>i</sub></sub> for each concept C<sub>i</sub>. In particular, the expert associates each term t with a numerical value in {−1, 0, 1}, where T<sup>−1</sup> denotes the set of terms that were not in LATO and are not relevant for the concept C<sub>i</sub>; T<sup>0</sup> denotes the set of terms that were already in LATO (and thus have already been validated as relevant); and T<sup>1</sup> denotes the set of terms that were not in LATO but are relevant for the concept C<sub>i</sub>. An overview of the results of knowledge enrichment is shown in Table 2, which reports |T<sub>C<sub>i</sub></sub>|, |T<sup>−1</sup>|, |T<sup>0</sup>|, and |T<sup>1</sup>| for each concept.</p>
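      <p>The expert labels in {−1, 0, 1} partition the reviewed terms into T<sup>−1</sup>, T<sup>0</sup>, and T<sup>1</sup>, from which two summary ratios can be derived: the share of relevant terms, (|T<sup>0</sup>| + |T<sup>1</sup>|) / |T<sub>C<sub>i</sub></sub>|, and the share of new terms among the relevant ones, |T<sup>1</sup>| / (|T<sup>0</sup>| + |T<sup>1</sup>|). Our reading is that the 83% and 34% figures discussed next correspond to these two ratios aggregated over the six concepts. The function <monospace>summarize</monospace> and the labels below are hypothetical and are not the paper's data.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def summarize(labels):
    """Summarize expert labels in {-1, 0, 1} for the reviewed terms of one concept.

    Returns (counts, share of relevant terms, share of new terms among relevant ones).
    """
    counts = Counter(labels.values())
    relevant = counts[0] + counts[1]  # |T^0| + |T^1|
    share_relevant = relevant / len(labels) if labels else 0.0
    share_new = counts[1] / relevant if relevant else 0.0
    return counts, share_relevant, share_new


# Hypothetical labels for illustration; these are not the paper's data.
labels = {
    "narcotic drugs": 0, "cannabis": 0,     # already in LATO (T^0)
    "controlled substances": 1, "opium": 1,  # new and relevant (T^1)
    "coca leaves": 1,
    "paragraph": -1,                         # new but not relevant (T^-1)
}
counts, share_relevant, share_new = summarize(labels)
print(counts[-1], counts[0], counts[1])     # 1 2 3
print(round(share_relevant, 2), share_new)  # 0.83 0.6
```
]]></preformat>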
      <p>The relevant terms retrieved through knowledge enrichment (i.e., terms in T<sup>0</sup> or T<sup>1</sup>) amount to 83% of the total number of terms validated by the expert (T<sub>C<sub>i</sub></sub>), and 34% of these terms were not in the term-sets of LATO. As expected, the increment of new relevant terms is higher for the concepts that were associated with small term-sets, such as illinois legislation, criminal procedure, and evidence. The number of irrelevant terms T<sup>−1</sup> is limited, with the exception of the concept evidence: in criminal cases, evidence usually consists of common objects used in a criminal context. These objects are thus associated with a generic terminology (e.g., garbage, suitcase) that, according to the legal expert, cannot be associated per se with evidence. The new relevant terms are finally included in the term-sets of LATO, and a new CRIKE cycle is then executed. The new term-sets are exploited in the knowledge extraction steps, and a new training set of 158,398 CD sentences is extracted (+37% with respect to the first execution). These sentences are then used to train a new model M and to enforce the execution of the knowledge enrichment steps. Finally, the accuracy of the new model M obtained by cross-validation is 0.81 (+5.2%).</p>
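      <p>Both increments reported above are relative changes with respect to the first cycle. As a quick check, the sketch below recomputes them from the reported figures; note that +5.2% is relative to 0.77 and corresponds to an absolute gain of 0.04 in accuracy (4 percentage points). The helper name is ours.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def relative_increase(before, after):
    """Percentage change of `after` with respect to `before`."""
    return (after - before) / before * 100


print(round(relative_increase(115_993, 158_398)))  # 37  (training-set size)
print(round(relative_increase(0.77, 0.81), 1))     # 5.2 (cross-validation accuracy)
```
]]></preformat>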
    </sec>
    <sec id="sec-5">
      <title>Concluding remarks</title>
      <p>In this paper, we presented LATO-KM for automated knowledge extraction and enrichment from Court Decision corpora. In CRIKE, knowledge extraction relies on multi-label classification, while knowledge enrichment relies on black-box model explanation techniques. Preliminary results on a corpus of Court Decision documents show that our approach achieves promising results in discovering new terminology for enriching the term-sets associated with legal concepts in the LATO ontology. Ongoing work concerns the extension of the LATO knowledge model to enforce rule-based extraction and classification techniques, with the goal of improving the accuracy in recognizing the application of abstract legal concepts in CD documents. We aim to exploit reasoning techniques based on ontology rules defined over legal concepts to detect concept instances in documents where specific constraints are satisfied.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ashley</surname>
          </string-name>
          , K.D.:
          <article-title>Artificial Intelligence and Legal Analytics: New Tools for Law Practice in the Digital Age</article-title>
          . Cambridge University Press (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Castano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Falduti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montanelli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Crime Knowledge Extraction: An Ontology-Driven Approach for Detecting Abstract Terms in Case Law Decisions</article-title>
          .
          <source>In: Proc. of the 17th Int. Conf. on Artificial Intelligence and Law</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Grabmair</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ashley</surname>
          </string-name>
          , K.D.:
          <article-title>Facilitating Case Comparison Using Value Judgments and Intermediate Legal Concepts</article-title>
          .
          <source>In: Proc. of the 13th Int. Conference on Artificial Intelligence and Law</source>
          . pp.
          <volume>161</volume>
          –
          <fpage>170</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Grabmair</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ashley</surname>
            ,
            <given-names>K.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sureshkumar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nyberg</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>V.R.</given-names>
          </string-name>
          :
          <article-title>Introducing LUIMA: an Experiment in Legal Conceptual Retrieval of Vaccine Injury Decisions Using a UIMA Type System and Tools</article-title>
          .
          <source>In: Proc. of the 15th Int. Conference on Artificial Intelligence and Law</source>
          . pp.
          <fpage>69</fpage>
          –
          <lpage>78</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Guidotti</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monreale</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruggieri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giannotti</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedreschi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A Survey of Methods for Explaining Black Box Models</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>51</volume>
          (
          <issue>5</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>42</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Isaac</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Summers</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>SKOS Simple Knowledge Organization System Primer</article-title>
          .
          <source>Tech. rep., Working Group Note</source>
          ,
          <publisher-name>W3C</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Distributed Representations of Sentences and Documents</article-title>
          .
          <source>In: Proc. of the 31st Int. Conference on Machine Learning</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nazarenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wyner</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Legal NLP Introduction</article-title>
          .
          <source>Traitement Automatique des Langues</source>
          <volume>58</volume>
          (
          <issue>2</issue>
          ),
          <fpage>7</fpage>
          –
          <lpage>19</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Poudyal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A machine learning approach to argument mining in legal documents</article-title>
          .
          <source>In: Proc. of the Int. Workshop on AI Approaches to the Complexity of Legal Systems</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ribeiro</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guestrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>"Why Should I Trust You?" Explaining the Predictions of Any Classifier</article-title>
          .
          <source>In: Proc. of the 22nd ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <fpage>1135</fpage>
          –
          <lpage>1144</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ducato</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haapio</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passera</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmirani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Legal Design Patterns: Towards a New Language for Legal Information Design</article-title>
          .
          <source>In: Proc. of the 22nd Int. Legal Informatics Symposium</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Sartor</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Casanovas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biasiotti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez-Barrera</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <source>Approaches to Legal Ontologies: Theories, Domains, Methodologies</source>
          , vol.
          <volume>1</volume>
          .
          <publisher-name>Springer Science &amp; Business Media</publisher-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Savelka</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ashley</surname>
            ,
            <given-names>K.D.</given-names>
          </string-name>
          :
          <article-title>Extracting Case Law Sentences for Argumentation about the Meaning of Statutory Terms</article-title>
          .
          <source>In: Proc. of the 3rd Int. Workshop on Argument Mining</source>
          . pp.
          <fpage>50</fpage>
          –
          <lpage>59</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Savelka</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grabmair</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ashley</surname>
            ,
            <given-names>K.D.</given-names>
          </string-name>
          :
          <article-title>Mining Information from Statutory Texts in Multi-Jurisdictional Settings</article-title>
          .
          <source>In: Proc. of the Int. Conference on Legal Knowledge and Information Systems</source>
          . pp.
          <fpage>133</fpage>
          –
          <lpage>142</lpage>
          .
          <publisher-name>IOS Press</publisher-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Thammaboosadee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiattisin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darakorn</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watanapa</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Sentence Identification System based on Criminal Law Ontology</article-title>
          .
          <source>International Review of Law, Computers &amp; Technology</source>
          <volume>31</volume>
          (
          <issue>3</issue>
          ) (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Wagh</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          :
          <article-title>Knowledge Discovery from Legal Documents Dataset Using Text Mining Techniques</article-title>
          .
          <source>International Journal of Computer Applications</source>
          <volume>66</volume>
          (
          <issue>23</issue>
          ) (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>