<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A pipeline for data management, knowledge extraction and semantic analysis of unstructured legal judgments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chiara Bonfanti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Colombino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgia Iacobellis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rachele Mignone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Spada</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laurentiu Jr Marius Zaharia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marinella Quaranta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marianna Molinari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Susanna Marta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ilaria Angela Amantea</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Audrito</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emilio Sulis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luigi Di Caro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guido Boella</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department - University of Turin</institution>
          ,
          <addr-line>Via Pessinetto 12, 10149, Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes a pipeline for data management, knowledge extraction and semantic analysis of unstructured legal judgments on a digital database. The research focuses on the storage of judgments, the processing of textual content through the use of Natural Language Processing and AI technologies and the advanced semantic navigation of the database. These results are obtained from the research group of the University of Torino in the NGUPP project.</p>
      </abstract>
      <kwd-group>
        <kwd>Legal informatics</kwd>
        <kwd>Legal document classification</kwd>
        <kwd>Legal document similarity</kwd>
        <kwd>Principles of Law</kwd>
        <kwd>Text embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The digitalization of justice concerns both the direct activity of judges and lawyers and the sources from which they draw information on precedents and laws. A more efficient exploitation of the stock of knowledge embodied in the decisions issued by the Courts implies a corresponding efficiency gain for the justice system as a whole. Legal informatics aims to provide a feasible way to increase the efficiency of the justice system by unlocking this potential. This work describes a pipeline for processing judgments and creating a unified digital database for national Courts, through the adoption of a Web App aimed at the storage of judgments, the processing of textual content through the use of Natural Language Processing / AI technologies, and the advanced semantic navigation of the database thus created.</p>
      <sec id="sec-1-1">
        <title>Office for Trial</title>
        <p>The Office for Trial (UPP) is an organizational structure made up of court assistants, operating in the judicial offices. The UPP aims to ensure a reasonable length of proceedings through the innovation of organizational models, the increase in human resources and a more efficient use of technologies. It is provided for in Article 16-octies of Decree-Law No. 179/2012, which first highlighted a link between technological innovation, organization and quality of justice. It has recently been recognized as a stable organizational structure by the latest Italian justice reform, and is therefore destined to operate even after the objectives of the National Recovery and Resilience Plan (NRRP) have been achieved.</p>
      </sec>
      <sec id="sec-1-2">
        <title>Research project</title>
        <p>The Next Generation UPP project (NGUPP) aims at improving the efficiency of the judicial system in north-western Italy by testing, throughout the 35 judicial offices involved, new collaborative schemes between universities and judicial offices, in order to provide UPP employees with transversal skills that ensure the effective functioning of a contemporary judicial system and to support the process of digitalization and technological innovation. NGUPP stems from the NRRP, by which Italy engaged with the European Commission in order to define actions and interventions to overcome the economic and social impact of the pandemic, acting on the country’s structural nodes and successfully facing the environmental, technological and social challenges of our time. In an effort to identify feasible solutions for the fulfilment of the undertakings given to the European Union through a multidisciplinary approach, using legal, business and IT skills, our research led us to the implementation of a tool that would not only be up-to-date but could also be used by legal practitioners in post-project phases. This paper describes the results obtained by the research unit of the University of Torino. In the following, Section 2 introduces the background with related works, definitions, and dataset. Section 3 describes the methodology, while first results are detailed in Section 4. Section 5 concludes the paper.</p>
        <p>Ital-IA 2023: 3rd National Conference on Artificial Intelligence, organized by CINI, May 29–31, 2023, Pisa, Italy.</p>
        <p>Email: chiara.bonfanti@edu.unito.it (C. Bonfanti); michele.colombino@edu.unito.it (M. Colombino); giorgia.iacobellis@edu.unito.it (G. Iacobellis); rachele.mignone@unito.it (R. Mignone); ivan.spada@unito.it (I. Spada); laurentiu.zaharia@edu.unito.it (L. J. M. Zaharia); marinella.quaranta@unito.it (M. Quaranta); marianna.molinari@unito.it (M. Molinari); susanna.marta@unito.it (S. Marta); ilariaangela.amantea@unito.it (I. A. Amantea); d.audrito@unito.it (D. Audrito); emilio.sulis@unito.it (E. Sulis); luigi.dicaro@unito.it (L. D. Caro); guido.boella@unito.it (G. Boella).</p>
        <p>ORCID: 0009-0007-8015-7786 (C. Bonfanti); 0009-0007-3248-1661 (M. Colombino); 0009-0003-1730-7711 (G. Iacobellis); 0009-0009-2699-8730 (R. Mignone); 0009-0002-0459-1189 (I. Spada); 0009-0002-3559-8367 (L. J. M. Zaharia); 0000-0003-2691-0611 (M. Quaranta); 0000-0003-1329-1858 (I. A. Amantea); 0000-0002-9239-5358 (D. Audrito); 0000-0003-1746-3733 (E. Sulis); 0000-0002-7570-637X (L. D. Caro); 0000-0001-8804-3379 (G. Boella).</p>
        <p>© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>Related work. The present work follows the research approach of legal informatics [1], where computational methods and AI applications are increasingly relevant [2], especially in the area of e-Justice and the analysis of judicial decisions. Judicial citations are approached with network analysis to address, for instance, the decisions of the CJEU [3, 4]. As concerns automatic judicial interpretation and prediction, a variety of supervised [5] and unsupervised [6] methodologies are applied, e.g. to public procurement fraud detection [7], paying attention to explainability [8]. Other research lines aim to extract and classify argumentative patterns in judgments [9] and to model the most effective standards [10] and design-ontological techniques [11] for representing legal text sources. Recently, a promising research domain has focused on analyzing the process of harmonization of EU and domestic legislation [12].</p>
      <p>Definitions. This paragraph defines the terms and keywords on which this paper is based. A judgment (i.e. Sentenza) is identified by “code and year”. The code is a sequential number released by the court when the judgment becomes definitive and is inserted in the court’s official records. The year is the year in which the judgment was published. The NRG, which stands for “number of general register”, is a chronological number assigned to a specific case (and its files, including the judgment) and is used to link and store all the acts and documents related to the case in a unique folder. The subject (i.e. Materia) pinpoints a Macro Area of the domain of the judgment, as well as the section of the court that issued it. The label (i.e. Voce) identifies a specific subset of the Macro Area: Salary (i.e. Retribuzione), Contribution (i.e. Contribuzione) and Individual dismissal (i.e. Licenziamento individuale) are different labels of the subject Work (i.e. Lavoro).</p>
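      <p>As an illustration of the identifiers defined above, the following Python sketch models a judgment by its code and year, the general register number (NRG) of its case, its subject and its optional label, and groups judgments by case as a folder would. The class, field and function names (CodeYear, Judgment, group_by_case) and the sample values are hypothetical and are not taken from the project code base; treating the NRG as a code and year pair is an assumption.</p>
      <preformat>
```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodeYear:
    """Identifier made of a sequential code and a year."""
    code: int
    year: int


@dataclass(frozen=True)
class Judgment:
    ident: CodeYear               # judgment code and publication year
    nrg: CodeYear                 # general register number of the case (assumed code-year)
    subject: str                  # Materia, e.g. "Lavoro"
    label: Optional[str] = None   # Voce, e.g. "Retribuzione"; None if unlabelled


def group_by_case(judgments):
    """Collect the judgments filed under the same NRG, as a case folder would."""
    folders = defaultdict(list)
    for judgment in judgments:
        folders[judgment.nrg].append(judgment)
    return dict(folders)


# Hypothetical sample values, for illustration only.
sample = [
    Judgment(CodeYear(101, 2020), CodeYear(55, 2019), "Lavoro", "Retribuzione"),
    Judgment(CodeYear(102, 2020), CodeYear(77, 2019), "Lavoro"),
]
folders = group_by_case(sample)
```
      </preformat>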
      <sec id="sec-2-1">
        <title>Dataset</title>
        <p>The dataset used for the present work comprises data provided by the Court of Turin (i.e. Tribunale), which supplied a gross total of 27,477 judgments from the labour law division (i.e. Sezione lavoro). The decisions were delivered in the following file formats: real-pdf, docx, doc, docm. A subset of 4,804 judgments was provided with a specific label. The total number of labels is 309. It is important to note that the distribution of judgments over the different labels is skewed, as shown in Figure 1.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>In order to digitize legal archives and provide a system that can be easily used by judges and UPPs, a platform is being developed to host the resources and process them so that the collection is automatically catalogued and indexed. Semantic information extraction allows navigation by metadata and similarity.</p>
      <sec id="sec-3-1">
        <title>3.1. Information Retrieval and Segmentation</title>
        <p>An important step towards the achievement of the various tasks discussed below is the automatic extraction and segmentation of text. The approach used to structure the data was to mirror the segmentation pattern used by domain experts. The task involved the following pipeline of operations: 1) Conversion of judgments to .docx format: we decided to convert files in different formats to a single data representation to facilitate the text extraction process. 2) Removal of less informative paragraphs: stakeholders’ information was disregarded in order to perform classification tasks on clean data. 3) Structuring of textual content in JSON format.</p>
        <p>The process of information extraction led to the definition of two different JSON representations for each judgment, by metadata and by content. The following metadata was collected: court, section, subject, judgment code-year, NRG code-year. The content is organized as follows: 1) Oggetto: the subject matter of the case addressed by the judgment; it is typically very informative about the subject to which the judgment belongs. 2) Conclusioni: some indications about the conclusion of the proceedings concerning the parties. 3) Svolgimento del processo: the central part of the judgment, where the facts of the case and the reasons for the judge’s decision are addressed. 4) P.Q.M.: the final verdict. 5) Voce: the indication of the label, where present.</p>
        <p>We were able to obtain a labelled dataset of Turin judgments through a matching process. Given a list of indexes of items, matching was conducted by comparing the judgments’ code-year and NRG code-year to those reported within the indexes. An index is a file containing all references to a case, organized by voce. Each case is associated with an NRG code, which can be found in the oggetto section of the judgment.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.2. Preprocessing</title>
        <p>To enhance the quality of the data and preserve privacy, it was necessary to apply a preprocessing pipeline consisting of 1) pseudo-anonymization, 2) conversion to lowercase, 3) removal of special characters (accents, punctuation symbols and non-UTF-8 characters), 4) removal of URLs and HTML tags, 5) conversion of word numbers to their numeric form, 6) removal of stopwords, 7) lemmatization. The pseudo-anonymization phase overwrites proper names, surnames and tax codes. This phase allows us to use the dataset without directly processing this kind of personal data of the people involved in the judgments. In addition, the use of specific tags that replace the data just mentioned preserves the semantics of the sentence and the relationships between entities in the text. Only afterwards was the text cleaned of irrelevant components, so as not to compromise the previous phase, since some sensitive information includes stopwords, capital letters and punctuation symbols. The lemmatization phase was performed using Morph-it! [13] to speed up the computation on the Italian dataset.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.3. Classification</title>
        <p>Datasets. One of the main tasks addressed in this work is the automatic classification of judgments. Given the imbalance of the dataset, the classification tests were conducted on a limited dataset; not all judgments from the Turin dataset were taken into account. Specifically, two main corpora were produced, which we refer to below as “corpus_8_classes” and “corpus_15_classes”: the first was generated using 800 judgments distributed equally over 8 entries, and the second using 1,872 judgments distributed over 15 entries. The entries considered are, in order, the first 15 illustrated in Figure 1.</p>
        <p>For the creation of the datasets we employed vector space modelling techniques. Starting from these representations we trained several models; details, results and discussion are given in Section 4.1. The data used in this paper for the creation of the datasets corresponds to the content of the following JSON fields: “oggetto”, “conclusioni”, “svolgimento del processo”, “P.Q.M.” and “voce”. Starting from these fields we defined 8 different datasets, 4 for each corpus. At the end of the preprocessing pipeline on “corpus_8_classes”, the use of TF [14] and TF-IDF [15] led us to define two sparse matrices of dimension 23,618 x 800, while on the second corpus the TF and TF-IDF vectorization returned two 28,319 x 1,872 sparse matrices. To provide a comparison with recent state-of-the-art embedding representations, the remaining 4 datasets were created using the following resources:</p>
        <list list-type="bullet">
          <list-item>
            <p>Doc2Vec: Doc2Vec [16] is an unsupervised neural network model that learns fixed-length feature vectors for representing textual data. The network architecture, as for word2vec [17], provides two different algorithms for embedding generation: “Continuous Bag of Words” (CBOW) and “Skip-Gram” (SG) [17]. For the learning process we considered the first one, CBOW, whose implementation is available in the Python library gensim.models.Doc2Vec<sup>1</sup>. After a preprocessing step specifically required by this implementation of the algorithm, the model was trained for 30 epochs with the following hyperparameters: vector_size=300, negative=5, hs=0, min_count=2, sample=0, alpha=0.025, min_alpha=0.001.</p>
          </list-item>
          <list-item>
            <p>Italian-Legal-BERT: Italian-Legal-BERT [18] is a version of a pretrained BERT-based [19] model (Italian XXL BERT<sup>2</sup>) trained on Italian legal texts. The embeddings of this model are obtained by running an additional round of training for 4 epochs on 3.7 GB of preprocessed text from the National Jurisprudential Archive, using the Huggingface PyTorch-Transformers library<sup>3</sup>.</p>
          </list-item>
        </list>
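        <p>To make the TF and TF-IDF representations mentioned in the Datasets paragraph more concrete, the following pure-Python sketch builds both sparse term-by-document matrices for a toy corpus. It assumes whitespace tokenization of already preprocessed text and the common weighting idf = log(N / df); the paper does not state which weighting variant or library was used, and the function names and sample documents are hypothetical.</p>
        <preformat>
```python
import math
from collections import Counter


def term_frequencies(docs):
    """Return one Counter of raw term counts per document."""
    return [Counter(doc.split()) for doc in docs]


def tf_matrix(docs):
    """Sparse term-by-document matrix: {term: {doc_index: count}}."""
    matrix = {}
    for j, counts in enumerate(term_frequencies(docs)):
        for term, count in counts.items():
            matrix.setdefault(term, {})[j] = count
    return matrix


def tfidf_matrix(docs):
    """Weight each count by idf = log(N / df), one common TF-IDF variant."""
    tf = tf_matrix(docs)
    n_docs = len(docs)
    return {
        # len(row) is the number of documents that contain the term (df)
        term: {j: count * math.log(n_docs / len(row)) for j, count in row.items()}
        for term, row in tf.items()
    }


# Hypothetical, already preprocessed toy documents.
docs = [
    "licenziamento individuale lavoratore",
    "retribuzione lavoratore retribuzione",
    "contribuzione lavoratore",
]
tf = tf_matrix(docs)
tfidf = tfidf_matrix(docs)
shape = (len(tf), len(docs))  # (number of terms, number of documents)
```
        </preformat>
        <p>The layout is terms by documents, matching the orientation of the matrix dimensions reported in the text. Under this idf variant a term that occurs in every document, such as “lavoratore” in the toy corpus, receives weight zero.</p>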
        <p>Models. Our classification work focused more on data representation than on the use of neural models and fine-tuning of networks. In a first experiment we used a multiclass SVM [20] as a baseline model. Assuming nonlinearly separable data, we trained the SVM model using the kernel trick with an “rbf” kernel<sup>4</sup>. Secondly, considering the dimensions of the datasets, we conducted some tests using a Logistic Regression<sup>5</sup> model with the “lbfgs” solver. In the presence of sparse and poor data, these models tend to show the same behaviour. Furthermore, we considered a Random Forest classifier [21] with a maximum of 2,000 trees, which instead performs better on datasets with a limited number of features. Finally, the same tests were repeated as an Ensemble Learning task with a simple Voting classifier<sup>6</sup> using all the previous models.</p>
        <p><sup>1</sup> https://radimrehurek.com/gensim/models/doc2vec.html</p>
        <p><sup>2</sup> https://huggingface.co/dbmdz/bert-base-italian-xxl-cased</p>
        <p><sup>3</sup> https://huggingface.co/docs/transformers/index</p>
        <p><sup>4</sup> https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html</p>
        <p><sup>5</sup> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html</p>
        <p><sup>6</sup> https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html</p>
      </sec>
      <sec id="sec-3-4-1">
        <title>3.4. Similarity</title>
        <p>Judgments contain a set of sections that describe the focal points of the document, specifically parties (i.e. Parti), subject matter (i.e. Oggetto), fact (i.e. Fatto), reasoning (i.e. Motivi) and decision (i.e. Decisione). These sections represent a substantial amount of information meticulously describing judgments, some of which share characteristics that suggest similarity and relatedness between judgments on multiple levels. Sections include citations (e.g. judgments, legal articles) that relate resources to one another; in particular, judgments with the same (or similar) citations may discuss similar issues and treat the facts in a similar manner. Citations can be weighed differently depending on their position in the text, domain, and specific moment in time. These relationships between resources provided the input for an additional dataset feature, which enables semantic similarity search within the online catalogue of judgments. The domain of application constrains judgments to recurring structures and terminologies [22], which guides the treatment of data from an entropy perspective, with the aim of finding the most relevant components in the text that constitute the discriminating features. We opted for a hybrid approach oriented to the analysis of know-how and the reproduction of some methodologies applied by domain experts. The goal is to complete the task while enriching it with an attempt to explain the results provided by the system, which would allow greater transparency of the platform.</p>
      </sec>
      <sec>
        <title>3.5. Principles of Law</title>
        <p>In Case-law. Defining what can be considered a principle of law is not straightforward. Whether the countries considered in our analysis follow a common law or a civil law system, we found a broadly shared definition, applied with a similar gauge. Principles of law are used to fill the gap between what is defined by legal doctrine and reality. Legal doctrine can be an imposition, as happened in many countries that were colonized [23], or can rest on sets of laws written centuries earlier [24]. In Italy, a country following a civil law approach to legislation, principles of law are official interpretations given by the Supreme Court (i.e. Corte di Cassazione), whose purpose is to provide a generalized interpretation and application of a rule.</p>
        <p>In Computer Science. In this project, as mentioned in the previous paragraphs of this section, we addressed topics such as Classification and Similarity. Our hypothesis is that, given a correct set of methods to recognize the ways in which principles of law are expressed in a sentence, we are able to find new metadata, useful in the development of the aforementioned tasks.</p>
      </sec>
    </sec>
    <sec>
      <title>4. Results</title>
      <sec>
        <title>4.1. Classification</title>
        <p>In this section, we show in more detail all the results of our experiments. All data reported in the following tables are obtained by applying 10-fold cross-validation to the datasets and models defined in the previous section. Table 1 shows the results for the main evaluation metrics we considered: accuracy, precision, and recall. Reading the table by columns, the Random Forest classifier (2,000 trees) is the model with the best results. The limited structure of these datasets favoured this model, whose performance in general tends to decrease as the number of classes and features increases. It is interesting to note from Table 1 that the dataset yielding the highest performance is the one obtained using Doc2Vec: all the models applied to this dataset return high precision and recall values.</p>
        <p>Table 2 describes the results of the models on “corpus_15_classes”. A first observation is that the nature of this corpus had a significant impact on the performance of the models, which decreased compared to the previous test. All the results obtained from the different models, except for the dataset created with Doc2Vec embeddings, reflect our expectations about the decrease in performance. In both corpora, Italian-Legal-BERT reported the worst results, due to the excessive sparseness of the data, while Doc2Vec appears to guarantee excellent performance even with the baseline models.</p>
      </sec>
    </sec>
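    <p>The evaluation protocol of Section 4.1 (10-fold cross-validation scored with accuracy, precision and recall) can be sketched in plain Python as follows. The sketch assumes macro-averaged precision and recall and contiguous, unshuffled folds; the paper does not specify the averaging scheme or how the folds were drawn, and the helper names and sample labels are hypothetical.</p>
    <preformat>
```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    # The first `extra` folds get one more sample so every sample is used once.
    fold_sizes = [base + 1] * extra + [base] * (k - extra)
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision_recall(y_true, y_pred):
    """Average per-class precision and recall over the classes in y_true."""
    classes = sorted(set(y_true))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        actual = sum(t == c for t in y_true)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


# Hypothetical labels and predictions for a single test fold.
y_true = ["Retribuzione", "Retribuzione", "Contribuzione", "Contribuzione"]
y_pred = ["Retribuzione", "Contribuzione", "Contribuzione", "Contribuzione"]
acc = accuracy(y_true, y_pred)
precision, recall = macro_precision_recall(y_true, y_pred)
folds = list(k_fold_indices(25, k=10))
```
    </preformat>
    <p>In a full run, a model would be fitted on each training split and scored on the matching test split, and the per-fold scores would then be averaged.</p>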
    <sec id="sec-4">
      <title>5. Conclusions and Future Work</title>
      <p>In this paper, we presented a pipeline for providing a system that facilitates some of the activities of magistrates and UPPs relating to the automatic classification, semantic information search, and navigation of legal texts by metadata and similarity. We explored some baseline solutions, focusing on data representation rather than on the use of state-of-the-art neural models and fine-tuning of networks. Despite the composition of the corpora and the lack of data, we obtained excellent results, showing that it is possible to achieve good performance even with simple models; in the future, however, models beyond the baselines remain to be explored and evaluated. Another approach to the classification task, which we will consider in future work, could be a combination of similarity techniques and machine learning models. In fact, the use of some similarity metrics could help us to outperform the classification models: if two judgments are more similar, it is more likely that they belong to the same category.</p>
      <p>With regard to the principles of law, we speculate that it is possible to identify relationships of interest, useful to model the connections between entities explicitly stated in a legal text such as a judgment.</p>
    </sec>
  </body>
  <back>
    <ack>
      <title>Acknowledgments</title>
      <p>The project is part of the “Unitary project for the dissemination of the Office for Trial and the implementation of innovative operating models in the judicial offices for the disposal of the backlog”, promoted by the Ministry of Justice as part of the PON Governance and Institutional Capacity 2014-2020 (Axis I – Action 1.4.1) and implemented in synergy with the interventions envisaged by the National Recovery and Resilience Plan (NRRP) in support of the justice reform.</p>
    </ack>
    <ref-list>
      <title>References</title>
      <ref><label>[1]</label><mixed-citation>G. Contissa, F. Godano, G. Sartor, Computation, Cybernetics and the Law at the Origins of Legal Informatics, Springer, Cham, 2021, pp. 91–110. doi:10.1007/978-3-030-54522-2_7.</mixed-citation></ref>
      <ref><label>[2]</label><mixed-citation>L. Robaldo, S. Villata, A. Wyner, M. Grabmair, Introduction for artificial intelligence and law: special issue "natural language processing for legal texts", Artif. Intell. Law 27 (2019) 113–115. doi:10.1007/s10506-019-09251-2.</mixed-citation></ref>
      <ref><label>[3]</label><mixed-citation>M. Derlén, J. Lindholm, Is it Good Law? Network Analysis and the CJEU’s Internal Market Jurisprudence, Journal of International Economic Law 20 (2017) 257–277.</mixed-citation></ref>
      <ref><label>[4]</label><mixed-citation>G. Sartor, P. Santin, D. Audrito, E. Sulis, L. Di Caro, Automated extraction and representation of citation network: A CJEU case-study, in: R. Guizzardi, B. Neumayr (Eds.), Advances in Conceptual Modeling, Springer, Cham, 2022, pp. 102–111.</mixed-citation></ref>
      <ref><label>[5]</label><mixed-citation>F. Galli, G. Grundler, A. Fidelangeli, A. Galassi, F. Lagioia, E. Palmieri, F. Ruggeri, G. Sartor, P. Torroni, Predicting outcomes of italian vat decisions 1, in: Legal Knowledge and Information Systems, IOS Press, 2022, pp. 188–193.</mixed-citation></ref>
      <ref><label>[6]</label><mixed-citation>R. A. Shaikh, T. Sahu, V. Anand, Predicting outcomes of legal cases based on legal factors using classifiers, Procedia Computer Science 167 (2020) 2393–2402. doi:10.1016/j.procs.2020.03.292.</mixed-citation></ref>
      <ref><label>[7]</label><mixed-citation>R. Nai, E. Sulis, R. Meo, Public procurement fraud detection and artificial intelligence techniques: a literature review, in: Symeonidou et al. (Ed.), EKAW, volume 3256 of CEUR Workshop Proceedings, CEUR-WS.org, 2022. URL: https://ceur-ws.org/Vol-3256/km4law4.pdf.</mixed-citation></ref>
      <ref><label>[8]</label><mixed-citation>R. Meo, R. Nai, E. Sulis, Explainable, interpretable, trustworthy, responsible, ethical, fair, verifiable AI... what’s next?, in: S. Chiusano, T. Cerquitelli, R. Wrembel (Eds.), ADBIS 2022, Turin, Italy, September 5-8, 2022, Proceedings, volume 13389 of LNCS, Springer, 2022, pp. 25–34. doi:10.1007/978-3-031-15740-0_3.</mixed-citation></ref>
      <ref><label>[9]</label><mixed-citation>G. Grundler, P. Santin, A. Galassi, F. Galli, F. Godano, F. Lagioia, E. Palmieri, F. Ruggeri, G. Sartor, P. Torroni, Detecting arguments in CJEU decisions on fiscal state aid, in: Proc. of the 9th Workshop on Argument Mining, International Conference on Computational Linguistics, Korea, 2022, pp. 143–157. URL: https://aclanthology.org/2022.argmining-1.14.</mixed-citation></ref>
      <ref><label>[10]</label><mixed-citation>M. Palmirani, F. Vitali, Akoma-Ntoso for Legal Documents, Springer Netherlands, Dordrecht, 2011, pp. 75–100.</mixed-citation></ref>
      <ref><label>[11]</label><mixed-citation>D. Audrito, E. Sulis, L. Humphreys, L. Di Caro, Analogical lightweight ontology of EU criminal procedural rights in judicial cooperation, Artificial Intelligence and Law (2022) 1–24.</mixed-citation></ref>
      <ref><label>[12]</label><mixed-citation>E. Sulis, L. B. Humphreys, D. Audrito, L. D. Caro, Exploiting textual similarity techniques in harmonization of laws, in: S. B. et al. (Ed.), AIxIA 2021, volume 13196 of LNCS, Springer, 2021, pp. 185–197. doi:10.1007/978-3-031-08421-8_13.</mixed-citation></ref>
      <ref><label>[13]</label><mixed-citation>E. Zanchetta, M. Baroni, Morph-it, A free corpus-based morphological resource for the Italian language. Corpus Linguistics 1 (2005) 2005.</mixed-citation></ref>
      <ref><label>[14]</label><mixed-citation>H. P. Luhn, The automatic creation of literature abstracts, IBM J. Res. Dev. 2 (1958) 159–165.</mixed-citation></ref>
      <ref><label>[15]</label><mixed-citation>K. S. Jones, A statistical interpretation of term specificity and its application in retrieval, J. Documentation 60 (2021) 493–502.</mixed-citation></ref>
      <ref><label>[16]</label><mixed-citation>Q. V. Le, T. Mikolov, Distributed representations of sentences and documents, in: International Conference on Machine Learning, 2014.</mixed-citation></ref>
      <ref><label>[17]</label><mixed-citation>T. Mikolov, K. Chen, G. S. Corrado, J. Dean, Efficient estimation of word representations in vector space, in: International Conference on Learning Representations, 2013.</mixed-citation></ref>
      <ref><label>[18]</label><mixed-citation>D. Licari, G. Comandè, ITALIAN-LEGAL-BERT: A Pre-trained Transformer Language Model for Italian Law, in: Symeonidou et al. (Ed.), EKAW, volume 3256 of CEUR Workshop Proceedings, CEUR, Bozen-Bolzano, Italy, 2022. URL: https://ceur-ws.org/Vol-3256/#km4law3.</mixed-citation></ref>
      <ref><label>[19]</label><mixed-citation>J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: ACL: HLT, Vol. 1, ACL, Minnesota, 2019, pp. 4171–4186. doi:10.18653/v1/N19-1423.</mixed-citation></ref>
      <ref><label>[20]</label><mixed-citation>B. E. Boser, I. M. Guyon, V. N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the fifth annual workshop on Computational learning theory, 1992, pp. 144–152.</mixed-citation></ref>
      <ref><label>[21]</label><mixed-citation>L. Breiman, Random forests, Machine Learning 45 (2001) 5–32.</mixed-citation></ref>
      <ref><label>[22]</label><mixed-citation>X. Li, J. Gao, D. Inkpen, W. Alschner, Detecting relevant differences between similar legal texts, in: Proceedings of the Natural Legal Language Processing Workshop 2022, 2022, pp. 256–264.</mixed-citation></ref>
      <ref><label>[23]</label><mixed-citation>N. L. Mahao, Can african juridical principles redeem and legitimise contemporary human rights jurisprudence?, Comparative and International Law Journal of Southern Africa 49 (2016) 455–476.</mixed-citation></ref>
      <ref><label>[24]</label><mixed-citation>F. Galindo, Juridical principles for juridical applications. the derinfo methodology, in: D. Karagiannis (Ed.), Database and Expert Systems Applications, Springer Vienna, Vienna, 1991, pp. 425–430.</mixed-citation></ref>
    </ref-list>
  </back>
</article>