<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Absolve or Condemn Defendants</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vithor G. F. Bertalan</string-name>
          <email>vithor.bertalan@polymtl.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evandro E. S. Ruiz</string-name>
          <email>evandro@usp.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Polytechnique Montreal, Departement de Genie Informatique et Genie Logiciel</institution>
          ,
          <addr-line>2500 Chem. de Polytechnique, Montreal</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidade de São Paulo, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto</institution>
          ,
          <addr-line>Av. Bandeirantes, 3900, Ribeirão Preto, SP - Brazil</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Legal prediction is one of the most critical subfields in Natural Language Processing. Researchers use state-of-the-art machine learning and artificial intelligence methodologies to predict specific judicial facets, such as the judicial outcome. For this research, we built a web text crawler to extract homicide case data from Brazilian electronic legal systems. We then used word embeddings, processed by several neural networks, to test the ability of our models to predict the outcome of the cases from their textual characteristics and features, finding that Gated Recurrent Units (GRU) showed the best performance. Afterwards, we applied Hierarchical Attention Networks (HAN) to obtain a sample of the most important words used to absolve or convict defendants. We also analyzed those results to find whether we could track patterns in each of the outcomes.</p>
        <p>Keywords: Legal prediction, Attention weights, Natural language processing, Machine learning</p>
        <p>SCIA-2022: 1st International Workshop on Social Communication and Information Activity in Digital Humanities, October 20, 2022, Lviv, Ukraine</p>
        <p>ORCID: 0000-0002-1585-7694 (V. G. F. Bertalan); 0000-0002-7434-897X (E. E. S. Ruiz)</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Law is among the most text-dependent areas of human activity. Every day, new court decisions, appeals,
and other written legal instruments are produced by the parties to legal proceedings, such as lawyers, judges,
defendants, and plaintiffs. All of them have different needs that intelligent systems could serve.
Branting et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] exemplify this potential for using new technologies. The authors argue
that the automation of legal reasoning and problem-solving has long been
an objective of research in Computer Science. Therefore, it is legitimate to consider that Artificial
Intelligence (AI) can be used to assist and optimize the daily lives of legal professionals.
      </p>
      <p>
        With computing resources increasingly present in society, both criminals and law enforcement
are taking advantage of the opportunities offered by AI. In criminal law, deepfake technologies and
algorithmic profiling contribute to new types of crime. In criminal procedural law, AI can be
used as a law enforcement technology, for example, for predictive policing or cyber agent technology.
Machine Learning (ML) is also widely used to improve the legal sector. Ashley and Brüninghaus [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
have reported that two long-standing goals in ML and law research are the automatic classification of
case texts and more precise prediction of case outcomes, to help attorneys understand them. In the
article by Sulea and colleagues [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] we also find arguments that legal professionals benefit significantly
from the kind of automation provided by machine learning.
      </p>
      <p>
        Concerning the data used to feed these AI and ML algorithms, it is worth mentioning that judges
have very particular writing styles, as stated in Alarie et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and often develop particular writing
skills to customize how they present information. Le and his co-authors [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] also characterize legal
documents as very formal, with a structure that is sometimes as essential as their readability.
Branting et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] state that even if the laws are faithfully represented in the texts, the terms of these
same laws are often impossible for a layperson to interpret.
      </p>
      <p>2022 Copyright for this paper by its authors.</p>
      <p>
        In addition, legal texts have specific properties and structures. The work of Aletras et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] shows
that the formal facts of a court case are the most important predictors of its outcome. Beyond
such descriptive passages, legal texts also have other specific
characteristics that differentiate them from other types of narratives. For example, references in a legal
text have a specific structure that differs from references in the public domain [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we see that
the textual content and the various parts of a case are vital factors influencing
court outcomes.
      </p>
      <p>
        In this project, we applied Hierarchical Attention Networks (HAN) to see a sample of the most
important words used to absolve or convict defendants. We also analyzed those results to find whether
we could track patterns in each of the outcomes. Our main inspiration for this work is the research
performed by Aletras et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], in which textual features were used to predict judicial decisions. To the
best of our knowledge, our work is the first to use this methodology, adapted to our final goal of
checking the weights of each word in the comprehensive documents to predict judicial decisions in
Brazilian Portuguese. The complete research can be found in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In this first section of the paper, we
present the introduction and motivation for our research.
      </p>
      <p>In Section 2, we review related research that has inspired our work. In Section 3, we
describe the methodology used in this work, as well as the algorithms chosen. Section 4 presents the
numerical results and the attention weights of tokens in our judicial texts. In Section 5, we discuss
those results. Finally, in Section 6, we
finish with the conclusion, problems found, and possible directions for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        The main objective of this research lies in the field of sentence prediction. The primary reference for
this project is the work of Aletras et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In that work, the authors used a dataset of cases from the
European Court of Human Rights concerning alleged violations of three articles of its Convention,
namely: Article 3, which prohibits torture and inhuman and degrading treatment; Article 6, which protects
the right to a fair trial; and Article 8, which provides for the right to respect for private and family life,
home and correspondence. From this set of decisions, an equal number of cases that violate (+1) and
do not violate (-1) each of the articles were selected and labeled. Pre-processing tools and regular
expressions were used to extract the texts. The n-grams were tagged and retrieved from the sections
Procedure, Circumstances, Facts, Relevant Law, Law, and Full Case. These same n-grams were grouped
in a vector space model to find the main topics of each article. Classification using Support Vector
Machines (SVM) with the topics and circumstances features was 78% accurate for Article 3, 84%
accurate for Article 6, and 78% accurate for
Article 8.
      </p>
      <p>
        Katz, Bommarito, and Blackman [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] used Random Forests to
predict the behavior of the US Supreme Court. Their dataset, drawn from United States Supreme
Court cases, contains 240 variables, among them chronological variables, case antecedent
variables, justice-specific variables, and outcome variables. Qualitative texts were labeled ‘Reversed’,
‘Affirmed’, or ‘Other’. The remaining categorical variables were coded with binary or indicator
attributes. The authors used both Support Vector Machines and Random Forests to classify and
predict sentences, and pointed to Random Forests as the best algorithm for their dataset.
Using this methodology, the authors achieved a recall of up to 77% for the ‘Affirmed’ class, while in
binary labeling, they had a recall of 78% for the ‘Not Reversed’ cases. The authors also studied the
weights assigned to terms by the SVM method to find the terms that most impacted each violation of
the laws.
      </p>
      <p>
        Sulea and colleagues [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] also carried out a similar investigation in sentence
prediction, this time involving the areas of Criminal Law and Social or Commercial Law. This group
drew on the texts of French Supreme Court decisions (Cour de cassation) and also used
SVMs. The decision collection had 131,830 decision documents combined with
descriptive metadata. This standardized metadata consisted of a label for the legal field, a timestamp,
the case decision (e.g., forfeiture, rejection, expropriation, etc.), a description of the case, and the cited
laws. After pre-processing, the dataset was reduced to 126,865 different court decisions. Each of these
documents contained a description of the case and four different types of labels: a) the legal area, b) the
date of the decision, c) the process itself, and d) a list of articles and laws cited in the description. The
most significant attributes were selected using hierarchical clustering, and SVMs were then used
to classify the dataset. The authors reached a precision of 90.2% when classifying the area
of Law, 96.9% when classifying the judicial decision using a 6-class SVM, and 74.3% using a 7-class
SVM to estimate the date of the process and the decision.
      </p>
      <p>
        In another study, Gokhale and Fasli [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] developed a learning algorithm for
classifying human rights abuses using SVM and logistic regression. They used a domain
ontology developed for human rights as background knowledge to extract initial terms, from which
labeled training data were generated. In the paper by Le and colleagues [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the authors used index
extraction with sentence structure information for Japanese legal documents to assign each token
a weight, a statistical score showing its importance.
      </p>
      <p>
        Court verdicts have distinct structures that are particularly suitable for text analysis.
Ashley and Brüninghaus [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] followed this line of thought. The authors developed a model that
extracts information from textual descriptions of the facts of decided cases and applies this
information to predict the results of issues. In a similar study, Bertalan [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] used the textual
characteristics of tokens to predict court outcomes. However, as the team
led by Nicolas Sannier [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] points out, a crucial complexity in analyzing legal texts is that legal
provisions are usually interconnected and spread over different texts that cannot be taken in isolation
from each other.
      </p>
      <p>
        In the paper by Sukanya and Priyadarshini [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] we can see several methods that use attention to
predict judicial outcomes. All these works indicate that Artificial Intelligence and
Natural Language Processing applied to Law form a growing field that offers promising opportunities for
research and application. Law is one of the most active areas of the Applied Social Sciences,
with a wide range of applications, and it generally produces a significant amount of text.
Consequently, researchers can produce substantial results with specific applications. Computational
prediction models for sentences that offer satisfactory results for a sizeable court,
such as the São Paulo Justice Court (TJSP), can be of great use to other courts.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>This section explains the characterization of domains and the steps necessary to complete the
research.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Data Collection</title>
      <p>
        We based our work on the data provided by the eSAJ system, the electronic document management
system used by the São Paulo Justice Court. The São Paulo Justice Court is the world’s largest
court by number of legal proceedings [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. A crawler was designed and implemented to
query and save the documents retrieved.
      </p>
      <p>When querying the case database through eSAJ, the user can select from many fields to display
judicial opinions, such as subjects, judges by name, and process numbers, to choose
the appropriate texts to be retrieved. From the many attributes, we selected the following: Judicial
Class, Judicial Subject, Date of availability, and Text. Classes are types of judicial documents. For
example, repeals and termination of contracts would be judicial classes under Brazilian law. Subjects
are the type of judicial case being conducted, such as drug trafficking or feminicide. The field Text
contains the court decision.</p>
      <p>Every process is labeled with its outcome (absolved or condemned). The average word count for
each document is 1,019 words, with the longest text reaching 5,846 words. The crawler
gathers the data within a predefined date interval to keep the number of selected cases manageable. The initial
and final dates of availability are passed as arguments to capture
a restricted corpus. The documents retrieved were made available from July 2018 to March 2019. We
collected 1,681 homicide cases, accounting for 844 absolutions and 837 condemnations.</p>
      <p>For the labeling process, we hired two different Brazilian criminal lawyers, who were responsible
for labeling 40% and 60% of the cases, respectively. This step is necessary due to the complexity of
legal terms, and because eSAJ provides no direct information on the outcome of a judicial decision.
Hence, the lawyers read the whole content of each decision to determine whether the defendant had
been absolved or convicted.</p>
    </sec>
    <sec id="sec-5">
      <title>3.2. Data Transformation</title>
      <p>
        Instead of using the words in the documents directly, we transformed them into embeddings.
Turian, Ratinov, and Bengio [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] define word embeddings as vectors composed of real numbers
distributed over a multidimensional space, induced by semi-supervised learning. Since machine
learning algorithms work with numbers, not words, word embeddings are a very effective way of
transforming words into mathematical values. Embeddings make it possible to measure the
similarity between two words as the cosine similarity of their vectors. Each vector dimension represents a feature
intended to capture a word’s semantic, syntactic, or morphological properties in a
distributed way.
      </p>
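      <p>The cosine similarity mentioned above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study: the function name, the four-dimensional toy vectors, and the example words are all assumptions made for the sake of the example, whereas the embeddings used in this work have 600 dimensions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional vectors (assumption: invented values, not GloVe output).
toy_vectors = {
    "juiz": [0.9, 0.1, 0.3, 0.0],
    "magistrado": [0.8, 0.2, 0.4, 0.1],
    "banana": [0.0, 0.9, 0.0, 0.7],
}
sim_related = cosine_similarity(toy_vectors["juiz"], toy_vectors["magistrado"])
sim_unrelated = cosine_similarity(toy_vectors["juiz"], toy_vectors["banana"])
```
]]></preformat>
      <p>With these invented vectors, the two legal terms point in nearly the same direction and score close to 1, while the unrelated word scores close to 0.</p>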
      <p>
        GloVe [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] was the method chosen to construct the word embeddings used in the project.
GloVe learns word vectors such that their dot product equals the logarithm of the words’ probability of
co-occurrence. Rather than using a window to define local context, GloVe constructs an explicit
word-context, or word co-occurrence, matrix using statistics across the whole text corpus. It can combine local
and global representations of a term by mixing the features of two model families: global matrix
factorization and local context window methods. We used the pre-trained GloVe vectors
developed by the team led by Sandra Aluisio [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] for Brazilian Portuguese.
      </p>
    </sec>
    <sec id="sec-6">
      <title>3.3. Machine Learning Processing</title>
      <p>We used five different neural network algorithms to process the dataset: Multilayer
Perceptron (MLP), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Gated
Recurrent Unit (GRU), and Hierarchical Attention Networks (HAN). We applied a 10-fold
cross-validation procedure to all five algorithms. Every file was embedded using Brazilian Portuguese GloVe
word vectors with 600 embedding dimensions. These embeddings were processed by each of the neural networks
to test the ability of the model to predict the outcome of the cases based on their textual characteristics
and features. Both the learning rate and the loss function define the
convergence criteria. The learning rate was lowered by 0.2 every three epochs. Concerning the loss function,
the algorithm stops if, after five epochs, it does not decrease by at least 0.001. Results were evaluated
through the standard quality evaluation measures: accuracy, precision, recall, and F-measure. In our
tests, we tried several different combinations of hyperparameters
to find the optimal combination for each of the algorithms.</p>
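      <p>The convergence criteria described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the training code used in the study: the function names and the initial learning rate of 0.001 are hypothetical, and "lowered by 0.2" is read here as multiplying the rate by 0.2, which is only one possible interpretation of the text. The stopping rule uses the stated values of five epochs and a minimum decrease of 0.001.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def learning_rate_at(epoch, initial_lr=0.001, factor=0.2, step=3):
    """Step decay: scale the rate by `factor` once every `step` epochs.

    Assumptions: `initial_lr` is a hypothetical value, and the decay is
    treated as multiplicative (one reading of "lowered by 0.2").
    """
    return initial_lr * (factor ** (epoch // step))


def stopping_epoch(losses, patience=5, min_delta=0.001):
    """Return the epoch index at which training stops, or None if it never does.

    Training stops once `patience` consecutive epochs pass without the loss
    improving on the best value seen so far by at least `min_delta`.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(losses):
        if best - loss >= min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Hypothetical loss curve that plateaus after the third epoch.
example_losses = [0.70, 0.60, 0.55, 0.5495, 0.5494, 0.5493, 0.5492, 0.5491, 0.5490]
stop_at = stopping_epoch(example_losses)
```
]]></preformat>
      <p>Deep learning libraries usually provide both behaviors as ready-made training callbacks; the plain functions above only make the two rules explicit.</p>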
    </sec>
    <sec id="sec-7">
      <title>3.4. Using Hierarchical Attention Networks</title>
      <p>
        Although we used many different neural networks to evaluate the dataset, the research’s core
revolves around utilizing Hierarchical Attention Networks (HANs). Yang et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] describe a HAN as
a neural network architecture that highlights the importance of individual words, or phrases, in building
the representation of a document. This model emphasizes the most important sequences of terms that
affect the classification of a document. In theory, it is known that not all terms are equally crucial for
the classification of a text and that not all sentences represent the same meaning either.
      </p>
      <p>
        The result of processing a text with a HAN is an attention coefficient associated with
each term, indicating the importance of that term in its sentence. Figure 1, following Yang
et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], illustrates the application of a HAN. Note that sentence 1 ("The woman was
present at the crime scene") is more critical than sentence 2 for the overall text label (the distinguishable
shades of gray exemplify this difference).
      </p>
      <p>This sentence was highlighted in the HAN network because the network ranked this sentence among
the most important in this text - that is, this sentence received a relatively greater weight of attention
than the others. Also, note that in this sentence, some words are marked in red. These words constitute
essential terms in these sentences, as they have the highest attention weights of the words.</p>
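      <p>To make the notion of an attention weight concrete, the sketch below follows the word-level attention step described by Yang et al.: each encoder output is projected through a tanh layer, scored against a context vector, and the scores are normalized with a softmax. It is an illustration only, written in plain Python; the function names, the identity projection, and the two-dimensional toy vectors are assumptions, and in a trained HAN the projection and the context vector are learned.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_attention(hidden_states, weight, bias, context):
    """Attention weights and sentence vector for one sentence.

    hidden_states: one encoder output vector per word.
    weight, bias:  projection applied to every hidden state.
    context:       word-level context vector (learned in a real model).
    """
    scores = []
    for h in hidden_states:
        projected = [
            math.tanh(sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(weight, bias)
        ]
        scores.append(sum(p * c for p, c in zip(projected, context)))
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    # The sentence vector is the attention-weighted sum of the hidden states.
    sentence = [
        sum(a * h[i] for a, h in zip(alphas, hidden_states)) for i in range(dim)
    ]
    return alphas, sentence


# Hypothetical two-dimensional encoder outputs for a three-word sentence.
states = [[0.1, 0.0], [2.0, 1.5], [0.2, 0.1]]
identity = [[1.0, 0.0], [0.0, 1.0]]
alphas, sentence = word_attention(states, identity, [0.0, 0.0], [1.0, 1.0])
```
]]></preformat>
      <p>The weights in alphas sum to 1, and the word whose projected state aligns best with the context vector (the second word in this toy example) receives the largest share. The same mechanism is applied one level up, over sentence vectors, to weight the sentences of a document.</p>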
    </sec>
    <sec id="sec-8">
      <title>4. Results</title>
      <p>In this section, we describe the numerical results and the textual analysis performed on the attention
weights of the HANs.</p>
    </sec>
    <sec id="sec-9">
      <title>4.1. Numerical Evaluation of Results</title>
      <p>We used four distinct networks to evaluate the numerical results: a Multilayer Perceptron (MLP) network,
a Recurrent Neural Network (RNN), a Long Short-Term Memory (LSTM) network, and, finally, a
Gated Recurrent Unit (GRU) network.</p>
      <p>For the Multilayer Perceptron network, we adopted an architecture
of 3 hidden layers, with 512, 512, and 250 neurons, respectively, trained for 25 epochs. For the RNNs,
we used a hidden layer with 128 units, with a dropout probability of 0.5 for the hidden layer
and a dropout probability of 0.2 for the inputs. The sigmoid function was used as the activation function
and binary cross-entropy as the loss function, with training run for 25 epochs.</p>
      <p>For the LSTM networks, a hidden layer with 128 units was also used, with a dropout
probability of 0.2 for both the hidden layer and the inputs. For the tests using
the GRU network, we likewise used a single hidden layer with 128 units and a dropout probability of
0.2 for both the hidden layer and the inputs. All HANs, in turn, used 600
embedding dimensions, with a word encoder and a sentence encoder. The results can be found in Table 1.</p>
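      <p>Table 1: Precision, Recall, F-Score, and Accuracy of the evaluated networks. Values recovered from the table: 0.98, 0.86, 0.98, 0.99, 0.96 (their row and column assignment is not recoverable).</p>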
      <p>
        Our results are similar to those of Chung et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Both show that the GRU is faster than other neural
network architectures while achieving comparable accuracy. Chung et al. also mention that the choice
of the type of network may depend heavily on the dataset and corresponding task. For this
particular dataset, our research shows that GRU is the best choice.
      </p>
    </sec>
    <sec id="sec-10">
      <title>4.2. Computation of Attention Weights</title>
      <p>Attention weights were calculated for every word in the datasets. Homonymous words appear
several times in the dataset, corresponding to their different meanings. Note that the same word can have
different attention weights for each class involved, acquittal or condemnation.</p>
      <p>We recorded 248,460 single-word tokens for the acquittal texts in alleged murders and 466,461 single-word
tokens for the murder conviction texts. An attention weight was calculated for each of these tokens.</p>
      <p>A histogram of word attention weights is shown in Figure 2. The slope of this histogram may be
explained through Zipf’s Law. By this law, given a relevance ranking,
the frequency of any word is inversely proportional to its rank. Accordingly, the relevant terms account
for a small proportion of our dataset.</p>
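      <p>The inverse relation between rank and frequency stated by Zipf’s Law can be illustrated with a short sketch. The corpus below is a hypothetical toy example, constructed so that each count is exactly 12 divided by the rank; real corpora, including ours, follow the law only approximately. The function name is likewise an assumption.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, token, count) triples, most frequent token first."""
    counts = Counter(tokens).most_common()
    return [(rank, tok, n) for rank, (tok, n) in enumerate(counts, start=1)]


# Hypothetical toy corpus built so that count equals 12 / rank.
corpus = ["réu"] * 12 + ["crime"] * 6 + ["pena"] * 4 + ["vítima"] * 3
table = rank_frequency(corpus)

# Under Zipf's Law, rank times count stays roughly constant.
products = [rank * n for rank, _, n in table]
```
]]></preformat>
      <p>For this toy corpus every product of rank and count equals 12, which is the idealized form of the law: a handful of top-ranked items dominate, and the long tail contributes little individually.</p>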
      <p>Figure 3 omits the top 10% and the bottom 10% of the words and
shows only the remaining middle 80%. We can see that Zipf’s Law continues to explain the
behavior of the terms. Although Figure 3 refers to the condemnation documents, the same behavior was
observed in the absolution documents.</p>
      <p>After the classification had been performed, the words in each dataset were ordered by their attention
weights. Each word occurrence received a value ranging from 0 (no importance for document
classification) to 1 (maximum importance for document classification). It is worth
mentioning that a word might have different attention weights in distinct sentences.</p>
      <p>As an example, consider these two sentences: "The defendant robbed a bank" and "The defendant did
not participate in the robbery, because he was going to a blood bank". Both contain the word bank, but in
two very different contexts. In the first sentence, the word would be a vital contributor to the
condemnation, while the same word would contribute to the absolution in the second sentence.
Therefore, words appear more than once in our final calculations, with different attention
weights. The top 50 words for each outcome are listed in Table 2.</p>
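      <p>The ranking step described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function name, the tokens, and the weights are invented, and each list holds one entry per word occurrence so that the same word can appear several times with different weights.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def top_tokens(weighted_tokens, k):
    """Return the k (token, weight) pairs with the highest weights.

    A token that occurs in several sentences appears once per occurrence,
    so it may show up more than once in the result.
    """
    return sorted(weighted_tokens, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical (token, attention weight) pairs, one list per outcome.
by_outcome = {
    "absolution": [("banco", 0.12), ("mãe", 0.47), ("banco", 0.31)],
    "condemnation": [("disparos", 0.58), ("banco", 0.44), ("comarca", 0.39)],
}
top_two = {label: top_tokens(pairs, 2) for label, pairs in by_outcome.items()}
```
]]></preformat>
      <p>In this toy example the word "banco" is ranked in both lists, with a different weight in each, which mirrors the bank example given in the text.</p>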
    </sec>
    <sec id="sec-11">
      <title>5. Discussion</title>
      <p>The first item we can analyze from the results is the difference in the number of words between absolutions
and condemnations: 248,460 unique word tokens for absolutions and 466,461 unique tokens for
condemnations, an increase of roughly 88% from the first group to the second. We therefore infer that judges, as well as
their clerks, tend to write more in condemnation texts and dispatches or, at least, that they try to use a
more refined and distinctive vocabulary in those cases. This may reflect a larger amount of
established jargon in condemnation cases, indicating that those outcomes have particular
words, or that legal professionals tend to write more, and more differently, in those cases.</p>
      <p>Also, when we analyze the magnitude of the weights, we see that the condemnation tokens have a
more substantial weight on the outcome of the case. The top token is coincidentally the same for both
outcomes ("bo") and has the same weight in each list. Further down the lists, however,
the condemnation words keep a high weight: the 50th weighs 0.373, while the 50th in the
absolution list weighs 0.310. From that, we can observe that the unique tokens in the condemnation list
have a higher impact on the outcomes than those in the absolution list. This matches the previous observation,
indicating that the more advanced vocabulary used in those cases significantly affects the text as a
whole. We can see this difference graphically in Figure 4.</p>
      <p>Looking directly at the meaning of words, we can also see patterns in the lists of both outcomes. In
the absolution list, we find many words relating to the context of the defendant, such as bem (good),
origem (origin), infância (childhood), social (social), tititi (gossip), cor (color), mãe (mother), anos
(years), and soubessem (if they knew). Although further analysis is required, this suggests that many
absolution texts consider the socio-cultural aspects of the defendant to substantiate the decision.</p>
      <p>On the other hand, the condemnation tokens contain several references to violent terms, such as
homicídio (homicide), qualificado (aggravated), disparos (gun shots), golpes (blows), socos (punches),
lesões (lesions), and colisão (collision). They also contain several terms relating to the judicial process, such as
infração (infraction), penal (criminal), sentença (sentence), acusação (accusation), and comarca
(county). These terms show that condemnation texts tend to heavily emphasize the nature of
the crime and the judicial process to substantiate the criminal penalty.</p>
      <p>When reading these lists, it helps to recall that a word might have different attention weights in distinct
sentences. The word bank, for example, would be a vital contributor to the condemnation in "The defendant
robbed a bank", while it would contribute to the absolution in "The defendant did not
participate in the robbery, because he was going to a blood bank".</p>
    </sec>
    <sec id="sec-12">
      <title>6. Conclusions</title>
      <p>By analyzing the attention weights of each outcome, we could see that words do have substantial
effects on the meaning of the text. All the attention weights are shown
in Table 2. A HAN analysis offers a significant advantage over a simple
word count: with attention neural networks, we can mathematically capture the impact of each
word in a single sentence, and of each sentence in the overall text.</p>
      <p>
        However, we have to consider the social and technical issues of such a technique. As for technical issues,
as described by Surden [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], there are certain well-known limitations to applying AI in Law. First, the
model will be helpful only if a future case shares features with the cases previously analyzed
in the training set. Second, the model does not capture subtle changes in judicial thinking over time, except
when these changes are represented by a significant volume of training data.
Surden also presents an example: not every law firm has several cases that are
similar enough to each other that a previous case has elements helpful in predicting future outcomes.
Therefore, one might infer that only large law firms will possess the financial and technological power
to develop such models.
      </p>
      <p>
        Regarding the social issues, Katz, Bommarito, and Blackman [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] state that qualitatively oriented
legal experts tend to suggest improvements in the model based on anecdotes or their untested mental
model rather than reliable facts and factual data. The authors suggest that, to make a case for the
future applicability of a model, it must consistently outperform a baseline comparison. This prerequisite
is necessary not only for scientific purposes but also to gain lawyers’ trust in the model. Potential
strategies to address gaps in social problems with Artificial Intelligence can be found in Santoni and
Mecacci [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p>Future research could also consider the sentences, not just the unique tokens, to perform a
contextual analysis of the meaning of the texts. This way, the most important sentences for each
outcome could be combined with the analysis in this paper to determine
which complex expressions have the highest weight in each of the outcomes. Another possibility is to
extend the research to the legal systems of other countries or languages, to identify whether our findings are
consistent across those scenarios or whether each language has its own pattern of behavior in its legal texts.</p>
    </sec>
    <sec id="sec-13">
      <title>7. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L. K.</given-names>
            <surname>Branting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Weiss</surname>
          </string-name>
          , E. Merkhofer,
          <string-name>
            <given-names>B.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <article-title>Inducing Predictive Models for Decision Support in Administrative Adjudication</article-title>
          , in: U. Pagallo,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmirani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Casanovas</surname>
          </string-name>
          , G. Sartor, S. Villata (Eds.),
          <source>AI Approaches to the Complexity of Legal Systems</source>
          , Springer International Publishing, Cham,
          <year>2018</year>
          , pp.
          <fpage>465</fpage>
          -
          <lpage>477</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K. D.</given-names>
            <surname>Ashley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brüninghaus</surname>
          </string-name>
          ,
          <article-title>Automatically classifying case texts and predicting outcomes</article-title>
          ,
          <source>Artificial Intelligence and Law</source>
          <volume>17</volume>
          (
          <year>2009</year>
          )
          <fpage>125</fpage>
          -
          <lpage>165</lpage>
          . doi:10.1007/s10506-009-9077-9.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O.-M.</given-names>
            <surname>Sulea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>van Genabith</surname>
          </string-name>
          ,
          <article-title>Exploring the Use of Text Classification in the Legal Domain</article-title>
          , in:
          <source>Proceedings of the 2nd Workshop on Automated Semantic Analysis of Information in Legal Texts (ASAIL)</source>
          ,
          <year>2017</year>
          . URL: http://arxiv.org/abs/1710.09306. arXiv:1710.09306.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Alarie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Niblett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <article-title>How Artificial Intelligence Will Affect the Practice of Law</article-title>
          , in:
          <source>Artificial Intelligence, Technology and the Future of Law</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3066816.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T. T. N.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shirai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shimazu</surname>
          </string-name>
          ,
          <article-title>Extracting indices from Japanese legal documents</article-title>
          ,
          <source>Artificial Intelligence and Law</source>
          <volume>23</volume>
          (
          <year>2015</year>
          )
          <fpage>315</fpage>
          -
          <lpage>344</lpage>
          . doi:10.1007/s10506-015-9168-8.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L. K.</given-names>
            <surname>Branting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Weiss</surname>
          </string-name>
          , E. Merkhofer,
          <string-name>
            <given-names>B.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <article-title>Inducing predictive models for decision support in administrative adjudication</article-title>
          , in:
          <source>AI Approaches to the Complexity of Legal Systems</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>465</fpage>
          -
          <lpage>477</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Aletras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tsarapatsanis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Preoţiuc-Pietro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lampos</surname>
          </string-name>
          ,
          <article-title>Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective</article-title>
          ,
          <source>PeerJ Computer Science</source>
          <volume>2</volume>
          (
          <year>2016</year>
          )
          <elocation-id>e93</elocation-id>
          . URL: https://peerj.com/articles/cs-93. doi:10.7717/peerj-cs.93.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>O. T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. X.</given-names>
            <surname>Ngo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shimazu</surname>
          </string-name>
          ,
          <article-title>Automated reference resolution in legal texts</article-title>
          ,
          <source>Artificial Intelligence and Law</source>
          <volume>22</volume>
          (
          <year>2014</year>
          )
          <fpage>29</fpage>
          -
          <lpage>60</lpage>
          . doi:10.1007/s10506-013-9149-8.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V. G. F.</given-names>
            <surname>Bertalan</surname>
          </string-name>
          ,
          <article-title>Using natural language processing methods to predict judicial outcomes</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Universidade de São Paulo,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Bommarito</surname>
            <suffix>II</suffix>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Blackman</surname>
          </string-name>
          ,
          <article-title>A general approach for predicting the behavior of the Supreme Court of the United States</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>12</volume>
          (
          <year>2017</year>
          )
          <elocation-id>e0174698</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Gokhale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fasli</surname>
          </string-name>
          ,
          <article-title>Deploying A Co-training Algorithm to Classify Human-Rights Abuses</article-title>
          , in:
          <source>2017 International Conference on the Frontiers and Advances in Data Science (FADS)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>108</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>V. G. F.</given-names>
            <surname>Bertalan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. E. S.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          ,
          <article-title>Predicting judicial outcomes in the Brazilian legal system using textual features</article-title>
          , in:
          <source>DHandNLP@PROPOR</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>22</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sannier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Adedjouma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sabetzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Briand</surname>
          </string-name>
          ,
          <article-title>An automated framework for detection and resolution of cross references in legal texts</article-title>
          ,
          <source>Requirements Engineering</source>
          <volume>22</volume>
          (
          <year>2017</year>
          )
          <fpage>215</fpage>
          -
          <lpage>237</lpage>
          . doi:10.1007/s00766-015-0241-3.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sukanya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Priyadarshini</surname>
          </string-name>
          ,
          <article-title>A meta analysis of attention models on legal judgment prediction system</article-title>
          ,
          <source>International Journal of Advanced Computer Science and Applications</source>
          <volume>12</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <article-title>Tribunal de Justica de Sao Paulo - Quem Somos</article-title>
          ,
          <year>2022</year>
          . URL: https://www.tjsp.jus.br/QuemSomos.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Turian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ratinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Word representations: a simple and general method for semi-supervised learning</article-title>
          , in:
          <source>Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics
          ,
          <year>2010</year>
          , pp.
          <fpage>384</fpage>
          -
          <lpage>394</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          ,
          in:
          <source>Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          . URL: http://www.aclweb.org/anthology/D14-1162.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hartmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fonseca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shulby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Treviso</surname>
          </string-name>
          , J. da Silva, S. Aluisio,
          <article-title>Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks</article-title>
          ,
          <source>arXiv preprint arXiv:1708.06025</source>
          (
          <year>2017</year>
          ). URL: https://arxiv.org/abs/1708.06025.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smola</surname>
          </string-name>
          , E. Hovy,
          <article-title>Hierarchical Attention Networks for Document Classification</article-title>
          , in:
          <source>Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Association for Computational Linguistics, San Diego, California,
          <year>2016</year>
          , pp.
          <fpage>1480</fpage>
          -
          <lpage>1489</lpage>
          . URL: https://www.aclweb.org/anthology/N16-1174. doi:10.18653/v1/N16-1174.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gulcehre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Empirical evaluation of gated recurrent neural networks on sequence modeling</article-title>
          ,
          <source>arXiv preprint arXiv:1412.3555</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>H.</given-names>
            <surname>Surden</surname>
          </string-name>
          ,
          <article-title>Artificial Intelligence and Law</article-title>
          ,
          <source>Washington Law Review</source>
          <volume>89</volume>
          (
          <year>2014</year>
          )
          <fpage>87</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>F.</given-names>
            <surname>Santoni de Sio</surname>
          </string-name>
          , G. Mecacci,
          <article-title>Four responsibility gaps with artificial intelligence: Why they matter and how to address them</article-title>
          ,
          <source>Philosophy &amp; Technology</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>