<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Service Architecture for AI-based Legal Knowledge Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerio Bellandi</string-name>
          <email>valerio.bellandi@unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Silvana Castano</string-name>
          <email>silvana.castano@unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Montanelli</string-name>
          <email>stefano.montanelli@unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Riva</string-name>
          <email>davide.riva1@unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Milano, Department of Computer Science</institution>
          ,
          <addr-line>Via Celoria 18 - 20133 Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper presents a reference service architecture for legal knowledge extraction based on a combination of Natural Language Processing and Machine Learning techniques/services. A case-study as well as experimental results are presented based on a pilot dataset of civil court decisions in the framework of the NGUPP project funded by the Italian Ministry of Justice.</p>
      </abstract>
      <kwd-group>
        <kwd>Legal Knowledge Extraction</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Legal Knowledge Graph</kwd>
        <kwd>Digital Justice</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Legal documents constantly produced by Parliaments, Courts, and other institutional bodies
constitute a prominent source of information and knowledge not only for legal actors like
judges or lawyers, but also for general subjects like citizens or private and public organizations.
To improve both efficiency and effectiveness of courthouses and legal record offices and to
foster digital justice, a significant effort is being devoted in almost all countries to digital
transformation projects, by developing legal information systems and modular architectures
providing a variety of services for acquisition, management, classification, exploration, and
retrieval of legal documents. In this context, knowing how to navigate the complex structure
and content of legal documents is an arduous task, and the availability of a legal knowledge
extraction service is not only desirable but even mandatory, to capture and formalize the features
and variety of legal terminology into representative concepts enabling the retrieval of pertinent
and relevant chunks of information within large corpora of legal documents.</p>
      <p>In this paper, we present a reference service architecture for legal knowledge extraction based
on a combination of Natural Language Processing and Machine Learning techniques/services,
with application and experimentation on a pilot dataset of civil court decisions in the framework
of the NGUPP project funded by the Italian Ministry of Justice. We also discuss some
preliminary evaluations of the proposed legal knowledge extraction service on the EurLex
dataset.</p>
      <p>
        Related Work. Legal knowledge extraction relates to the extraction of terms, rules, and
concepts from legal documents [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Competitions in this research field helped to develop
methods for these and other tasks. COLIEE has addressed legal information extraction and
entailment on case law and statutes [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. TREC Legal Track and AILA (Artificial Intelligence for
Legal Assistance) track have focused mainly on legal document retrieval [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        Compared with document retrieval, knowledge extraction from legal documents has received
less attention so far. The work in this direction has mainly favored the conceptualization of
domain ontology models [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ], and the use of ontologies and thesauri to extract specific kinds
of knowledge (e.g., abstract terms [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], or legal rules [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). In text mining, knowledge extraction
has traditionally been performed by frequency-based [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and rule-based approaches [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Since
their advent, transformer-based language models have been widely adopted for such tasks. For
instance, [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] improves Named Entity Recognition using the pre-trained BERT model [13]; [14]
relies on BERT to extract topics and their associated terminology; [15] uses the BART generative
model [16] to extract (subject, relation, object) triples. Since the legal domain suffers from a
lack of annotated data, pre-trained language models can be effectively used in the context of
Zero-Shot Classification (ZSC), namely the task of classifying data instances with labels that
were never observed in the data [17]. While transformer models fine-tuned on Italian legal
language have been developed (e.g., LamBERTa [18] and Italian Legal BERT [19]), we adopt
Sentence-BERT pre-trained models [20] due to a preference for consistent sentence semantic
representation over token representation.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. The proposed service architecture</title>
      <p>The modules/components of the proposed service architecture are shown in Figure 1.</p>
      <p>A data ingestion layer is defined to acquire the corpus of legal documents that needs to
be managed (e.g., laws, judgments, court decisions). Ingestion is executed as a stream operation,
meaning that the documents of the corpus are acquired when they become available and they
can be progressively added without system downtime. A storage layer is defined to maintain i)
the document database for the raw ingested documents and corresponding texts; ii) the document
annotations as well as the index system for full text, metadata, and annotation search; iii) the
graph database for the Entity Registry (ER) to store a unique entry for the entities extracted
from documents; iv) the system logs and related data to monitor the overall system. The storage
layer exposes the ER APIs to manage both the entity types (the ER metamodel) and the entity
instances as described in [21]. Document texts and metadata are stored in an ElasticSearch
instance, while annotations in a SQL database as discussed in our previous work [22]. As a
graph database for ER, we employed Neo4j.</p>
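      <p>As a minimal illustration of the storage layer (not the project's actual code), the following Python sketch shows how a raw document and its metadata could be indexed in an Elasticsearch instance; the index name and field names are hypothetical.</p>
      <preformat>
# Minimal sketch of the ingestion path into the storage layer.
# Assumptions: a running Elasticsearch instance and the official
# "elasticsearch" Python client; index and field names are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def ingest_document(doc_id: str, text: str, metadata: dict) -> None:
    """Store raw text and metadata for full-text and metadata search."""
    es.index(index="legal-documents", id=doc_id,
             document={"text": text, **metadata})

ingest_document(
    "decision-0001",
    "Full text of the court decision...",
    {"year": 2020, "subject_code": "172011", "district": "north-west"},
)
      </preformat>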
      <p>The layers for back-end components and front-end components constitute the backbone of the
proposed architecture. In back-end components, we distinguish modules for document manager,
service catalogue, and NLP services. The document manager modules, with their APIs, can
be considered proxies for client programs to index, filter, and fetch data from the storage.
Configurable service catalogue modules are also defined for exploitation at ingestion time. The
catalogue provides functionalities to process the incoming data and to create manipulated
versions of the original documents through cleaning, pre-processing, summarization, and so on.
The catalogue can also provide analysis functionalities over data, to be invoked for analytics
purposes. Finally, the service catalogue can support orchestration functionalities to manage the
workflow of articulated services. A set of NLP services is included in the back-end. They provide
specific services according to the kind of mining operations that the system aims to support,
like for example Named Entity Recognition (NER) and Linking (NEL), and concept extraction.
A Kafka queue is created when an NLP service is invoked, with all the information needed to get
the proper data. All the NLP services must expose standard APIs to be called by the system;
they must read the Kafka queue and use the parameters found in it to obtain the input texts.
At the end, they pass back their output to the document manager modules for storage in the
annotation database and the entity registry.</p>
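      <p>The following Python sketch illustrates, under stated assumptions rather than as the actual implementation, how an NLP service could consume its invocation queue; the topic name and message fields are hypothetical, and the three helpers are placeholders for calls to the document manager APIs.</p>
      <preformat>
# Minimal sketch of an NLP service reading its Kafka queue.
# Assumptions: the "kafka-python" client; the topic name and message
# fields ("doc_ids", "service_params") are hypothetical.
import json
from kafka import KafkaConsumer

def fetch_texts(doc_ids):        # placeholder: document manager APIs
    return {d: "..." for d in doc_ids}

def run_nlp(texts, params):      # placeholder: e.g., NER or concept extraction
    return {d: [] for d in texts}

def store_annotations(ann):      # placeholder: annotation DB / entity registry
    pass

consumer = KafkaConsumer(
    "nlp-service-requests",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    request = message.value
    texts = fetch_texts(request["doc_ids"])
    annotations = run_nlp(texts, request["service_params"])
    store_annotations(annotations)
      </preformat>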
      <p>In front-end components, we extend our previous work in [22] and we distinguish modules for
exploration, search/query, and analytics. These modules expose APIs to enforce the interaction of
users with the back-end components. Exploration allows users to move from one document to another
according to similarity-based criteria. The idea is to provide a service for browsing the corpus
documents according to their common entities and/or concepts extracted by NLP services. Search/query
allows users to retrieve pertinent documents according to input entities/concepts of interest provided
by the final user. Analytics allows users to examine the corpus through summary/statistical views
built over data, such as the distribution of an entity or concept in the corpus, the
shortest path (through documents) between given concepts or entities, the centrality of entities
and concepts, and so on.</p>
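      <p>As an example of such an analytics view, the sketch below runs a shortest-path query over the concept graph stored in Neo4j through the official Python driver; the node label, property names, and credentials are assumptions for illustration, not the project's actual schema.</p>
      <preformat>
# Minimal sketch of a shortest-path analytics query on the concept
# graph in Neo4j. The "Concept" label and "label" property are
# hypothetical.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

def shortest_concept_path(label_a: str, label_b: str):
    query = (
        "MATCH (a:Concept {label: $a}), (b:Concept {label: $b}), "
        "p = shortestPath((a)-[*..10]-(b)) "
        "RETURN [n IN nodes(p) | n.label] AS path"
    )
    with driver.session() as session:
        record = session.run(query, a=label_a, b=label_b).single()
        return record["path"] if record else None

print(shortest_concept_path("sponsorship", "business"))
      </preformat>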
      <p>Finally, appropriate modules are included in the architecture to provide conventional access
control and user management functionalities.</p>
      <p>Some modules, like the ingestion and the document management, are executed in the
backend, in a transparent mode with respect to the user. A document becomes available for the
front-end services when the ingestion stage is completed. Some other modules, like the NLP
services and the front-end services, are invoked on demand in response to a user request. In a typical
scenario, the document manager modules are invoked to ingest the (set of) documents to import.
Documents are stored as is, then cleaning and pre-processing are executed to extract and store
cleaned copies upon which full text and metadata indexing are executed. NLP services are then
invoked by front-end components to enforce the specific service functionality required by the
final user. In this stage, the filtering module of the back-end can be invoked before the NLP
service to define the subset of documents to use for satisfying the user request.</p>
      <p>In the remainder of the paper, we present a knowledge extraction pipeline and an example of
exploration service based on a case-study of Italian court decisions.</p>
    </sec>
    <sec id="sec-3">
      <title>3. The knowledge extraction service</title>
      <p>The knowledge extraction service exploits the ingested documents to mine a set of featuring
concepts that provide a topic-oriented description of their textual contents. The concepts
extracted from the documents are organized in a graph, where a pair of similar concepts is
linked by an edge. Each concept is also connected to the document portions from which the
concept emerged, meaning that we can explore the pertinent document segments where a
certain concept somehow occurs. Our solution exploits Natural Language Processing (NLP)
techniques based on zero-shot learning and context-aware embedding models to enforce concept
extraction. A detailed description of the proposed zero-shot learning approach to classification
of legal documents is provided in [23]. In the following, we discuss how such an approach to
knowledge extraction has been integrated as a pipeline in the infrastructure of Figure 1.</p>
      <sec id="sec-3-1">
        <title>3.1. Data pre-processing</title>
        <p>For knowledge extraction, the data pre-processing stage is based on a tokenization step, where
the text of each ingested document d is split into a set of chunks. A document chunk c represents
the text unit to consider for classification and it determines the granularity of the document
that can be associated with a concept. We stress that the size of the document chunk should be
large enough for the context to be captured, but not so large as to produce segments
that are long to read and potentially noisy due to the presence of multiple concepts. In this
paper, we choose to tokenize documents by defining a chunk for every few sentences/phrases detected
in a document, up to a maximum size of 512 words. This is particularly appropriate for legal
actors (e.g., lawyers, practitioners) that are typically interested in retrieving precise document
excerpts in which a given concept of interest appears and can be rapidly read/assimilated.</p>
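        <p>A minimal Python sketch of this tokenization step is given below; the sentence splitter is a naive regular-expression stand-in for the NLP tokenizer actually used in the pipeline.</p>
        <preformat>
# Minimal sketch of chunking: group consecutive sentences into chunks
# of at most 512 words (the limit used in the paper).
import re

def chunk_document(text: str, max_words: int = 512) -> list:
    sentences = re.split(r"(?&lt;=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
        </preformat>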
        <p>As a further pre-processing step, the terms appearing in document chunks are lemmatized
and a vector-based representation of each document chunk is finally built. The use of embedding
techniques to represent chunks allows mapping the document contents onto a semantic vector
space where the similarity of two chunks can be measured by comparing the corresponding
vector representations through a similarity metric (e.g., cosine similarity). For embedding
construction, Sentence-BERT [20], a modification of the original BERT model based on siamese
and triplet networks, is employed to derive a semantically meaningful embedding for a given
sentence/phrase. As such, a document chunk c is associated with the set of terms T<sub>c</sub> therein
contained. Any term is described as t = (l<sub>t</sub>, d<sub>t</sub>, t̄), where l<sub>t</sub> is the label of the term (i.e., the
lemma), d<sub>t</sub> is a description of the term meaning taken from a reference dictionary/vocabulary
(e.g., WordNet), and t̄ is the corresponding vector-based representation according to
Sentence-BERT. A document chunk c has the form c = (txt<sub>c</sub>, c̄), where txt<sub>c</sub> is the original
textual content of the chunk and c̄ is the corresponding vector-based representation calculated
as the mean of the term vectors t̄ with t ∈ T<sub>c</sub>. Embedding models have the capability to represent
and compare the meaning of entire text blocks like document chunks. On such a target,
context-aware embedding models fine-tuned on document similarity tasks, like Sentence-BERT, are
appropriate. In the legal field, the phrase structure can be highly articulated, and some common
terms can have a precise technical meaning when used in a court (e.g., citation, clemency,
designation). Sentence-BERT can handle such situations, in which meanings may strongly deviate
from those of everyday conversation.</p>
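        <p>The sketch below illustrates this representation: chunk vectors are computed as the mean of the Sentence-BERT embeddings of the chunk lemmas, and compared through cosine similarity. The pre-trained model name is an assumption; any multilingual Sentence-BERT model could be used.</p>
        <preformat>
# Minimal sketch of the vector-based chunk representation of Section 3.1:
# the chunk vector is the mean of its term (lemma) vectors.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def chunk_vector(lemmas: list) -> np.ndarray:
    return model.encode(lemmas).mean(axis=0)   # one vector per lemma, then mean

c1 = chunk_vector(["concorrenza", "sleale", "imprenditore"])
c2 = chunk_vector(["concorrenza", "mercato", "impresa"])
print(cosine(c1, c2))                          # similarity of the two chunks
        </preformat>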
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Concept extraction</title>
        <p>The document chunks are exploited by zero-shot learning techniques to enforce a multi-label
classification process with the aim of detecting a set of featuring concepts. Zero-shot learning is
an unsupervised classification technique, characterized by the capability to enforce classification
without requiring any pre-existing annotation of the considered documents.</p>
        <p>Initially, seed knowledge is defined as a set of textual descriptions, each one featuring a
concept of interest, namely a seed concept, to consider for classification. Typically, for a seed
concept, a basic, coarse-grained description is provided as a short text (e.g., one or two phrases)
or a list of keywords. As an example, for a seed concept about banking contract, a corresponding
textual description used for embedding is bank deposit, safe deposit box, bank credit opening, bank
advance, bank account, bank discount. Further concepts are derived from seed ones during the
extraction process, and they usually provide a more fine-grained description of the concept
instances occurring in the document chunks. A concept k, either seed or derived, is defined
as a pair k = (l<sub>k</sub>, k̄), where l<sub>k</sub> is a label featuring the meaning of the concept expressed in a
synthetic and human-understandable way, and k̄ is a vector-based concept representation. Each
concept k is initially associated with the set of terms T<sub>k</sub> extracted from the textual description
of k. The concept vector k̄ is built as the mean of the vectors of all the terms in T<sub>k</sub>. Finally,
the label l<sub>k</sub> corresponds to the label l<sub>t</sub> of the term t ∈ T<sub>k</sub> whose vector representation t̄ is
closest to the concept vector k̄. Concept extraction is defined as a progressive, iterative process
articulated in the following three steps:</p>
        <p>Zero-shot classification. Given a set of concepts (i.e., the seed concepts at the beginning
of the process), the document chunks are classified through zero-shot learning. A similarity
measure sim, e.g., cosine similarity, is calculated over any pair of embeddings between chunks
and concepts. A document chunk c is classified with the concept k when the similarity value
satisfies sim(c̄, k̄) ≥ th<sub>c</sub>, with th<sub>c</sub> defined as a similarity threshold configured in the system. The
value of th<sub>c</sub> is empirically determined according to experimental results. In this paper, the value
th<sub>c</sub> = 0.3 is employed in the proposed case-studies and experiments.</p>
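        <p>A minimal sketch of this classification step, assuming the chunk and concept vectors built as in Section 3.1, is the following:</p>
        <preformat>
# Minimal sketch of zero-shot classification: a chunk is labeled with
# every concept whose cosine similarity with the chunk vector reaches
# the threshold th_c (0.3 in the paper). Multi-label by construction.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def classify_chunk(chunk_vec, concepts: dict, th_c: float = 0.3) -> list:
    """concepts maps a concept label to its vector."""
    return [label for label, k_vec in concepts.items()
            if cosine(chunk_vec, k_vec) >= th_c]
        </preformat>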
        <p>Terminology enrichment. Given a document chunk c classified with the concept k, the terms
in T<sub>c</sub> are exploited for enriching the term set T<sub>k</sub>. The idea is that the initial description of
the concept k can become more detailed if we add terminology taken from chunks that are
pertinent to (i.e., classified with) k. This is done by summing, for each t ∈ T<sub>c</sub>, the similarities
sim(t̄, k̄) and sim(t̄, m̄<sub>k</sub>), where m̄<sub>k</sub> denotes the average embedding of the document chunks
classified with k. Terms t ∈ T<sub>c</sub> satisfying a system-defined similarity threshold th<sub>t</sub> are inserted in T<sub>k</sub>.</p>
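        <p>A sketch of the enrichment step follows; for brevity, it assumes that the average embedding of the chunks classified with the concept has already been computed.</p>
        <preformat>
# Minimal sketch of terminology enrichment: a term of a classified
# chunk joins the concept term set T_k when the sum of its similarities
# to the concept vector and to the mean vector of the chunks classified
# with the concept reaches the threshold th_t.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def enrich(concept_terms: dict, chunk_terms: dict,
           concept_vec, mean_chunk_vec, th_t: float = 0.3) -> dict:
    """Both term dicts map a lemma to its vector."""
    for lemma, t_vec in chunk_terms.items():
        if cosine(t_vec, concept_vec) + cosine(t_vec, mean_chunk_vec) >= th_t:
            concept_terms[lemma] = t_vec
    return concept_terms
        </preformat>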
        <p>Concept derivation. By enriching the term set T<sub>k</sub>, it is possible that more fine-grained concepts
emerge from k, and they can be generated as new concepts. The discovery of possible new
concepts emerging from k is enforced by clustering the embedding vectors t̄ of the terms in T<sub>k</sub>.
The Affinity Propagation (AP) algorithm is adopted to this end, since it allows detecting the
emergence of sub-groups of similar terms within T<sub>k</sub> without requiring the number of clusters
to be defined a priori. A new concept k′ is created for each cluster returned by AP
on the terms T<sub>k</sub> of a concept k. A link is defined between a concept k′ and k to denote that
k′ is derived from k and that they are somehow similar/related in content. The concept k is then
updated, since the terms in T<sub>k</sub> can have changed due to enrichment. As a consequence, l<sub>k</sub> and k̄
are re-calculated.</p>
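        <p>The derivation step can be sketched with scikit-learn's Affinity Propagation implementation as follows; the labeling of each new concept by its centroid-closest term mirrors the rule given above for seed concepts.</p>
        <preformat>
# Minimal sketch of concept derivation: cluster the term vectors of a
# concept with Affinity Propagation; each cluster yields a candidate
# new concept, labeled by the term closest to the cluster mean.
import numpy as np
from sklearn.cluster import AffinityPropagation

def derive_concepts(term_set: dict) -> list:
    """term_set maps a lemma to its vector; returns (label, vector) pairs."""
    lemmas = list(term_set)
    vectors = np.stack([term_set[l] for l in lemmas])
    labels = AffinityPropagation(random_state=0).fit(vectors).labels_
    derived = []
    for cluster_id in np.unique(labels):
        member_idx = np.where(labels == cluster_id)[0]
        centroid = vectors[member_idx].mean(axis=0)
        closest = member_idx[np.argmin(
            np.linalg.norm(vectors[member_idx] - centroid, axis=1))]
        derived.append((lemmas[closest], centroid))
    return derived
        </preformat>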
        <p>The set of concepts obtained after derivation can trigger the execution of a new cycle based
on the above three steps. Newly derived concepts can contribute to improving the classification of
chunks with more fine-grained concepts. Further new concepts can also be discovered through
a new execution of enrichment and derivation on the basis of a refined classification result.
As such, concept extraction is characterized by a predefined endpoint condition based on a
termination threshold. When the number of new concepts created in the derivation step is lower
than the threshold, the concept extraction process is concluded. A final concept graph providing
a topic-based description of the underlying document corpus is stored in the entity registry for
subsequent exploitation by the front-end services. An example of concept graph extracted from
a case-study of Italian legal documents will be discussed in Section 4.</p>
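        <p>Putting the three steps together, the overall process can be sketched as the loop below, reusing classify_chunk, enrich, and derive_concepts from the previous sketches. For brevity, the re-calculation of concept labels and vectors after enrichment is elided, and the per-chunk vector stands in for the mean embedding of all chunks classified with a concept.</p>
        <preformat>
# Minimal sketch of the iterative concept extraction loop with its
# termination condition: stop when the derivation step creates fewer
# new concepts than the termination threshold.
def extract_concepts(chunk_vecs, chunk_terms, concepts, concept_terms,
                     termination_threshold: int = 1):
    """chunk_vecs: chunk id to vector; concepts: label to vector;
    chunk_terms / concept_terms: id or label to {lemma: vector}."""
    while True:
        # steps 1-2: zero-shot classification and terminology enrichment
        for chunk_id, c_vec in chunk_vecs.items():
            for label in classify_chunk(c_vec, concepts):
                enrich(concept_terms[label], chunk_terms[chunk_id],
                       concepts[label], c_vec)
        # step 3: concept derivation
        new_concepts = {}
        for label in list(concepts):
            for new_label, new_vec in derive_concepts(concept_terms[label]):
                if new_label not in concepts:
                    new_concepts[new_label] = new_vec
                    # simplification: the new concept inherits the
                    # parent's term set
                    concept_terms[new_label] = dict(concept_terms[label])
        concepts.update(new_concepts)
        if len(new_concepts) &lt; termination_threshold:
            return concepts
        </preformat>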
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Application to the Italian context and evaluation</title>
      <p>In the following, we discuss some application examples and evaluation results by considering a
corpus of Italian court decisions collected in the framework of the Next Generation UPP (NGUPP)
project, funded by the Italian Ministry of Justice.</p>
      <p>We consider a case-study about “unfair competition” as subject matter and we invoke our
knowledge extraction service with the aim to explore the concepts extracted from the corpus on
such a subject. The user can enforce a preliminary filtering step over the document metadata to
select the set of court decisions to consider for concept exploration. The example is based on a
dataset of 34 documents resulting from the following filtering operations: first level of judgment,
judicial district in North-Western Italy, year of decision from 2008 onwards, subject matter
corresponding to 172011 or 172012, which are subject codes related to unfair competition in the
Italian law. In Figure 2, we show the concept graph returned by the knowledge extraction service
for describing the filtered dataset on unfair competition. We note that most of the graph concepts
pertain to the domain of trade justice (e.g., “consortium”, “partnership”, “transaction”), by also
describing specific aspects concerned with unfair competition. Through links, it is possible to
move from specific concepts (e.g., “sponsorship”) to more general ones (e.g., “business”), and
vice-versa. In the example, general concepts are usually associated with more chunks than
specific concepts. We also note that some concept labels appear many times (e.g., “business”,
“sponsorship”), meaning that they refer to different senses of the concept label.</p>
      <p>For evaluation of our concept extraction process, we consider EurLex57k, a dataset of
57,000 EU legislative documents annotated with labels representing entities, concepts, and topics
from the EuroVoc thesaurus (https://eur-lex.europa.eu/browse/eurovoc.html?locale=en). The goal
of the evaluation is to assess whether our extracted
concepts correspond to the labels of EuroVoc used for annotating the EurLex57k dataset. As
a baseline, we consider BERTopic [14] since it is a topic modeling approach based on BERT and
the mined topics can be straightforwardly compared to our extracted concepts. In Figure 3, we
show the precision-recall curve obtained by our concept extraction service when various values
of the th<sub>c</sub> and th<sub>t</sub> thresholds are employed. We note that our solution outperforms the BERTopic
baseline: despite a 0.05 decrease, precision remains higher than the baseline even when recall
increases (i.e., when more concepts are extracted).</p>
      <p>As a further experiment, we consider the results of the zero-shot classification and we
evaluate the correspondence of our extracted concepts assigned to chunks w.r.t. the EuroVoc
labels assigned to documents. Results in terms of precision and recall are shown in Table 1,
providing mean and standard deviation at the document level. In the experiment, the
following thresholds are set: th<sub>c</sub> = th<sub>t</sub> = 0.3. We note that precision and recall of our concept
extraction service are not only higher, but also significantly less variable than the ones obtained
by BERTopic according to the standard deviation.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Precision and recall (mean and standard deviation at the document level) of our system and the BERTopic baseline on the EurLex57k dataset.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Model</th><th>Precision</th><th>Recall</th></tr>
          </thead>
          <tbody>
            <tr><td>Our system</td><td>0.593 (0.061)</td><td>0.681 (0.078)</td></tr>
            <tr><td>BERTopic</td><td>0.455 (0.306)</td><td>0.422 (0.287)</td></tr>
          </tbody>
        </table>
      </table-wrap>
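      <p>For completeness, the document-level scoring behind Table 1 can be sketched as follows; the predicted and gold label sets per document are assumed to be given.</p>
      <preformat>
# Minimal sketch of the evaluation: per-document precision and recall
# of predicted concepts against gold EuroVoc labels, aggregated as
# mean and standard deviation across documents.
import numpy as np

def precision_recall(predicted: set, gold: set):
    tp = len(predicted.intersection(gold))
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

def document_level_stats(predictions: dict, gold_labels: dict):
    """Both dicts map a document id to a set of concept labels."""
    scores = [precision_recall(predictions[d], gold_labels[d])
              for d in predictions]
    p, r = zip(*scores)
    return (np.mean(p), np.std(p)), (np.mean(r), np.std(r))
      </preformat>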
    </sec>
    <sec id="sec-5">
      <title>5. Concluding remarks</title>
      <p>In this paper, we presented a service architecture for legal knowledge extraction based on NLP
services. A case-study has been presented by considering a dataset of Italian court decisions
within the NGUPP project funded by the Italian Ministry of Justice. Preliminary results are
promising. Ongoing activities concern the development of a Proof-of-Concept of the proposed
architecture, where a larger dataset of court decisions from multiple legal-subject
areas will be considered. The integration of more services is under development, as well as the
capability to orchestrate complex workflows where multiple services are involved. Moreover,
future research work concerns the comparison and possible extension of our NLP services through
alternative mechanisms for document annotation (e.g., Semantic Role Labeling).</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is partially supported by i) the Next Generation UPP project within the PON
programme of the Italian Ministry of Justice, and ii) the project SERICS (PE00000014) under the
MUR NRRP funded by the EU - NextGenerationEU.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.-F.</given-names>
            <surname>Moens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Uyttendaele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dumortier</surname>
          </string-name>
          ,
          <article-title>Information extraction from legal texts: the potential of discourse analysis</article-title>
          ,
          <source>International Journal of Human-Computer Studies</source>
          <volume>51</volume>
          (
          <year>1999</year>
          )
          <fpage>1155</fpage>
          -
          <lpage>1171</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rabelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Goebel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yoshioka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Satoh</surname>
          </string-name>
          ,
          <article-title>Coliee 2020: methods for legal document retrieval and entailment</article-title>
          ,
          <source>in: New Frontiers in Artificial Intelligence: JSAI-isAI 2020 Workshops, JURISIN, LENLS 2020 Workshops, Virtual Event, November 15-17, 2020, Revised Selected Papers 12</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>196</fpage>
          -
          <lpage>210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G. V.</given-names>
            <surname>Cormack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Grossman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hedin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <article-title>Overview of the trec 2010 legal track</article-title>
          ,
          <source>in: TREC</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Fire 2019 aila track: Artificial intelligence for legal assistance</article-title>
          ,
          <source>in: Proceedings of the 11th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hoekstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Breuker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Di</given-names>
            <surname>Bello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Boer</surname>
          </string-name>
          , et al.,
          <article-title>The lkif core ontology of basic legal concepts</article-title>
          ,
          <source>LOAIT</source>
          <volume>321</volume>
          (
          <year>2007</year>
          )
          <fpage>43</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Barabucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Di</given-names>
            <surname>Iorio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vitali</surname>
          </string-name>
          ,
          <article-title>Integration of legal datasets: from meta-model to implementation</article-title>
          ,
          <source>in: Proceedings of International Conference on Information Integration and Web-based Applications &amp; Services</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>585</fpage>
          -
          <lpage>594</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmirani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bartolini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Robaldo</surname>
          </string-name>
          ,
          <article-title>Pronto: Privacy ontology for legal reasoning</article-title>
          ,
          <source>in: Electronic Government and the Information Systems Perspective: 7th International Conference, EGOVIS</source>
          <year>2018</year>
          , Regensburg, Germany, September 3-5, 2018
          , Proceedings 7, Springer,
          <year>2018</year>
          , pp.
          <fpage>139</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Castano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Falduti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montanelli</surname>
          </string-name>
          ,
          <article-title>Crime knowledge extraction: an ontologydriven approach for detecting abstract terms in case law decisions</article-title>
          ,
          <source>in: Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>179</fpage>
          -
          <lpage>183</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dragoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Villata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Governatori</surname>
          </string-name>
          ,
          <article-title>Combining nlp approaches for rule extraction from legal documents</article-title>
          ,
          <source>in: 1st Workshop on MIning and REasoning with Legal texts (MIREL</source>
          <year>2016</year>
          ),
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tarau</surname>
          </string-name>
          ,
          <article-title>Textrank: Bringing order into text</article-title>
          ,
          <source>in: Proceedings of the 2004 conference on empirical methods in natural language processing</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>404</fpage>
          -
          <lpage>411</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Stanovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dagan</surname>
          </string-name>
          ,
          <article-title>Creating a large benchmark for open information extraction</article-title>
          ,
          <source>in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Austin, Texas,
          <year>2016</year>
          , pp.
          <fpage>2300</fpage>
          -
          <lpage>2305</lpage>
          . URL: https://aclanthology.org/D16-1252. doi:10.18653/v1/D16-1252.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Er</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Bond: Bert-assisted open-domain named entity recognition with distant supervision, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, 2020, pp. 1054–1064.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] M. Grootendorst, Bertopic: Neural topic modeling with a class-based tf-idf procedure, arXiv preprint arXiv:2203.05794 (2022).</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] G. Rossiello, F. Chowdhury, N. Mihindukulasooriya, O. Cornec, A. Gliozzo, Knowgl: Knowledge generation and linking from text, arXiv preprint arXiv:2210.13952 (2022).</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer, Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, arXiv preprint arXiv:1910.13461 (2019).</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] M.-W. Chang, L.-A. Ratinov, D. Roth, V. Srikumar, Importance of semantic representation: Dataless classification, in: AAAI, volume 2, 2008, pp. 830–835.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] A. Tagarelli, A. Simeri, Unsupervised law article mining based on deep pre-trained language representation models with application to the Italian civil code, Artificial Intelligence and Law (2021) 1–57.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] D. Licari, G. Comandè, Italian-legal-bert: A pre-trained transformer language model for Italian law, in: Companion Proceedings of the 23rd International Conference on Knowledge Engineering and Knowledge Management, Bozen-Bolzano (Italy), 2022.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks, arXiv preprint arXiv:1908.10084 (2019).</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] V. Bellandi, S. Siccardi, An Entity Registry: A Model for a Repository of Entities Found in a Document Set, Computer Science &amp; Information Technology (CS &amp; IT) 13 (2023).</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] C. Batini, V. Bellandi, P. Ceravolo, F. Moiraghi, M. Palmonari, S. Siccardi, Semantic Data Integration for Investigations: Lessons Learned and Open Challenges, in: Proc. of the IEEE Int. Conference on Smart Data Services (SMDS), Chicago, IL, USA, 2021.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] V. Bellandi, S. Castano, P. Ceravolo, E. Damiani, A. Ferrara, S. Montanelli, S. Picascia, A. Polimeno, D. Riva, Knowledge-Based Legal Document Retrieval: A Case Study on Italian Civil Court Decisions, in: Proc. of the 1st Int. Knowledge Management for Law Workshop (KM4LAW), Bozen-Bolzano, Italy, 2022.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>