Predicting Judicial Outcomes in the Brazilian
         Legal System Using Textual Features

                 Vithor Gomes Ferreira Bertalan¹ [0000-0002-1585-7694] and
                    Evandro Eduardo Seron Ruiz² [0000-0002-7434-897X]

¹ PPG-CA, DCM, FFCLRP, University of São Paulo (USP), vbertalan@usp.br
² Department of Computing and Mathematics, FFCLRP, University of São Paulo, evandro@usp.br


            Abstract. The combination of Natural Language Processing and Artificial
            Intelligence for the field of Law is a growing area, with the potential
            to radically change the daily routine of legal professionals. The amount
            of text these professionals generate is enormous and, to date, still
            largely unexplored by Computer Science. One of the most acclaimed research
            fields covering both knowledge areas is Legal Prediction, in which
            intelligent systems try to predict specific judicial characteristics, such as
            the judicial outcome or the judicial class of a given case. This research
            aims to create a classifier to predict judicial outcomes in the Brazilian
            legal system. First, we developed a text crawler to retrieve judicial
            outcomes from the official Brazilian electronic legal systems. Afterward,
            a few judicial subjects were selected, and some of their features were
            extracted. Finally, a set of different classifiers was applied to predict
            the legal outcomes from these textual features.

            Keywords: Legal prediction · Digital humanities · Artificial Intelligence
            and Law


1         Introduction and Research Objectives
1.1        Introduction
The combination of Natural Language Processing and Artificial Intelligence has
been revolutionizing many different fields of expertise. Subfields of Computer
Science such as Natural Language Processing (NLP) have steadily improved a
myriad of professional and scientific activities. NLP studies how computers can
process and analyze large amounts of natural language data and their meanings.
Even simple NLP mechanisms, such as dictionaries and word counts, can reveal
underlying facts that are not always noticeable without effective processing.
     Copyright © 2020 for this paper by its authors. Use permitted under Creative Com-
     mons License Attribution 4.0 International (CC BY 4.0). DHandNLP, 2 March 2020,
     Évora, Portugal.
       Bertalan and Ruiz

    Law is one of the knowledge areas most dependent on textual data. Millions
of legislative documents, court decisions, and appeals are produced daily, and
many different actors, such as lawyers, judges, defendants, and plaintiffs, have
needs that intelligent systems could address.
    In recent years, researchers have applied NLP software and Machine Learning
methods to legal case texts in order to predict judicial case outcomes (see
Section 1.3). However, to the best of our knowledge, as of 2019 no research with
this aim had been conducted in Brazilian Portuguese for Brazilian courts.
    Branting et al. [7] note that automating legal reasoning and problem-solving
has been a goal of Computer Science research since its earliest days. However,
according to the authors, broad adoption of legal computer systems never
occurred, and Computer Science and Law remained a niche research area with
little practical impact.
    Hyman et al. [10] point to Zubulake v. UBS Warburg, a series of trials and
decisions on what data a litigant must preserve and under what circumstances the
parties must pay for search and production costs, as a seminal case for AI in
Law. According to the authors, this case became a landmark for practical
applications of research in the area. The case centered on the defendant's
inability to retrieve hundreds to thousands of emails that the plaintiff claimed
were relevant to the main issue of the lawsuit.


1.2   Research Objectives

As a primary objective, this research aims to develop a framework to predict
judicial outcomes in the São Paulo Justice Court, in the Brazilian state of São
Paulo. The São Paulo Justice Court is the largest court in the world by number of
cases per year. We believe that if a predictive model offers consistent results
for this court, it could be transferred, after fine-tuning, to other courts.
First, we develop a text crawler to retrieve data from the legal outcomes. To
preprocess these data, we combine NLP tools to extract characteristics from the
text, selecting what we believe to be the primary information that can lead to
valid predictions. We then feed these preprocessed data into machine learning
frameworks. Finally, we evaluate all the methods and their respective results
against the actual judicial outcomes of the court.


1.3   Related Works

Recent advances have significantly improved the state of the art in legal
prediction. In one of the most influential works, Aletras et al. [3] used cases
from the European Court of Human Rights to predict whether the Court found a
violation of Article 3 (prohibition of torture and inhuman or degrading
treatment), Article 6 (right to a fair trial), or Article 8 (right to respect for
one's private and family life, home, and correspondence) of the
Convention. Katz et al. [12] have used random forests to predict the behavior of
the Supreme Court of the United States.
     In another influential study, Şulea et al. [18] carried out a similar
investigation, predicting the law area and the decisions of the French Supreme
Court using lexical features and a support vector machine (SVM). The authors
used a diachronic collection of rulings from the French Supreme Court (Cour de
cassation).
     Recent studies have successfully applied machine learning to legal
decision-making. Gokhale and Fasli [9] developed a co-training algorithm to
classify human rights abuses, using SVM and Logistic Regression. Branting et al.
[7] used hierarchical attention networks, SVMs, and maximum entropy classifiers
for decision support in administrative adjudication, such as routine licensing,
permitting, immigration, and benefits decisions. SVMs were also used by
Fornaciari and Poesio [8] to automatically detect deception, such as defamation
and false testimony, in Italian court cases. Remmits [16] used Latent Dirichlet
Allocation to discover the main topics of discussion in judicial outcomes of the
United States Supreme Court. Mochales and Moens [15] used argumentation mining to
better structure legal arguments, capturing the main issues and evidence in a
given corpus.
     Although outside the scope of legal prediction addressed in this paper,
some recent research has also been conducted in Brazilian Portuguese. Aires et
al. [1] used deontic logic to identify norm conflicts in contracts, while Araujo,
Rigo and Barbosa [5] used ontology-based algorithms to extract and classify
information from Brazilian legal documents.
     Liu and Chen [14] also note that natural language processing and machine
learning have expanded the possibilities for exploring the semantics of legal and
case texts. According to Barraud [6], NLP and AI have also pushed the limits of
justice, transforming it into a predictive, quantitative, statistical, and
simulative justice. As stated by Alarie, Niblett and Yoon [2], intelligent
judicial systems can help not only lawyers, by providing timely and objective
assessments of their claims, but also governments, which can use legal
classifiers to evaluate claims and manage litigation risks.


2     Methodology
2.1   Domain Characterization
A corpus of judicial rulings, along with their outcomes, was collected from
eSAJ, the electronic system of the São Paulo Justice Court (TJSP). To restrict
the number of documents retrieved, two previously defined judicial subjects were
selected: second-degree murder (in Brazilian Portuguese, homicídio simples) and
active corruption (in Brazilian Portuguese, corrupção ativa).
    Only subjects with well-defined outcomes were selected. By well-defined
outcomes, we mean judicial outcomes that end in either the conviction or the
acquittal of the defendant. Many other legal subjects, such
as divorce proceedings, do not end in an explicit conviction or acquittal. It is
therefore essential to select judicial subjects with clear, well-established
results. We then assigned binary numerical labels to convictions and acquittals.
These labels are used to build a statistical model that predicts the outcome of
the cases described above from their text.
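The selection rule above (keep only rulings that end in a conviction or an acquittal, then map them to binary labels) can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the outcome strings and the helpers `label_outcome` and `select_labelled` are hypothetical, how each ruling's outcome is determined is not specified here, and the +1/-1 encoding follows the labels described in Section 2.2.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical outcome names; this sketch covers only the mapping step,
# not how the outcome of each ruling is identified.
OUTCOME_LABELS = {"conviction": 1, "acquittal": -1}


def label_outcome(outcome: str) -> Optional[int]:
    """Return +1 for a conviction, -1 for an acquittal, and None otherwise."""
    return OUTCOME_LABELS.get(outcome.strip().lower())


def select_labelled(cases: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Keep only (text, label) pairs whose outcome is well defined.

    Rulings with any other outcome (e.g. a divorce decree) get None from
    label_outcome and are dropped.
    """
    selected = []
    for text, outcome in cases:
        label = label_outcome(outcome)
        if label is not None:
            selected.append((text, label))
    return selected
```

For example, `select_labelled([("texto A", "conviction"), ("texto B", "divorce granted")])` returns `[("texto A", 1)]`: the second ruling has no well-defined outcome and is discarded.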

2.2   Data Collection
We implemented a web text crawler in Python to retrieve data from eSAJ. Because
users can filter the displayed judicial opinions by many different fields, such
as classes, subjects, judges, and case numbers, the crawler must be able to
choose among these query options to compose a raw text file. The following fields
were collected: judicial class, judicial subject, judge, county, release date,
and full text of the ruling.
     In addition to the fields collected from the website, two fields were
derived from the collected information: the gender of the judge, and a Boolean
field indicating whether the county was the state capital (the city of São
Paulo) or another city in the state.
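The collected and derived fields can be held in a simple record type. The Python sketch below is illustrative only: the class `Ruling` and its field names are hypothetical, the capital check assumes the county is stored as the plain city name "São Paulo", and the judge's gender is kept as a plain field because the text does not say how it was derived. Unicode normalization is included because accented Portuguese names can arrive in composed or decomposed form.

```python
import unicodedata
from dataclasses import dataclass
from datetime import date

CAPITAL = "São Paulo"  # assumed spelling of the capital in the county field


def _normalize(name: str) -> str:
    """Compose accents (NFC), trim, and casefold for robust name comparison."""
    return unicodedata.normalize("NFC", name).strip().casefold()


@dataclass
class Ruling:
    """One collected ruling (hypothetical field names)."""
    judicial_class: str
    judicial_subject: str
    judge: str
    county: str
    release_date: date
    full_text: str
    judge_gender: str = ""  # derived field; its derivation is not specified

    @property
    def is_capital(self) -> bool:
        """Derived Boolean: True if the county is the state capital."""
        return _normalize(self.county) == _normalize(CAPITAL)
```

Making `is_capital` a computed property rather than a stored field keeps it consistent with `county` if that field is later corrected.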
     To classify each judicial outcome, we created binary labels for conviction
(+1) and acquittal (-1). We also relied on the professional guidance of Brazilian
lawyers to better understand the texts, since legal language, in Brazil as
elsewhere, can be notoriously obscure to laypeople.
     Data collection poses a significant challenge, as the amount of data
displayed on the TJSP website is substantial. As of June 1st, 2018, a simple
search for the Brazilian Portuguese word for rape (estupro), for example,
returned 5,658 hits, and a search for the term for drug (droga) returned 138,956
hits. Each hit is a judicial opinion of its own, containing many sentences and
text topics. Moreover, each judicial class in the Brazilian legal system covers
different topics: the topics in texts from the class divorce papers (in Brazilian
Portuguese, documentos de divórcio) differ substantially from those in texts from
the class release permits (in Brazilian Portuguese, alvará de soltura).
     Table 1 illustrates the number of documents retrieved from the eSAJ system.

               Table 1. Number of texts collected by judicial subject

Judicial Subject        Number of Cases   Acquittals   Convictions
Second-degree Murder                591          255           336
Active Corruption                   191           31           158
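A useful reference point when reading the accuracies reported later is the majority-class baseline, i.e. the accuracy of a classifier that always predicts the more frequent label. The short Python sketch below computes it from the counts in Table 1. For active corruption the two label columns sum to 189 rather than the 191 cases listed, so the sketch divides by the sum of the labelled columns; dividing by 191 instead gives about 0.827.

```python
# Counts copied from Table 1: (acquittals, convictions)
TABLE_1 = {
    "Second-degree Murder": (255, 336),
    "Active Corruption": (31, 158),
}


def majority_baseline(acquittals: int, convictions: int) -> float:
    """Accuracy of a classifier that always predicts the more frequent label."""
    return max(acquittals, convictions) / (acquittals + convictions)


for subject, (acq, conv) in TABLE_1.items():
    print(f"{subject}: {majority_baseline(acq, conv):.4f}")
```

This prints 0.5685 for second-degree murder and 0.8360 for active corruption. The homicide baseline is close to the SVM accuracy in Table 2, and the corruption baseline is close to the Logistic Regression and SVM accuracies in Table 3; in those rows recall is 1.000 and precision equals accuracy, which is consistent with a classifier that predicts conviction for every case.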


2.3   Data Preprocessing
The data retrieved was pre-processed to remove unnecessary information. We
began by tokenizing the text and eliminating stopwords. For this task, we used
the Natural Language Toolkit (NLTK) for Python. NLTK supports languages other
than English, including Portuguese, which makes it a strong candidate for this
research. Unusual characters, such as hyphens and parentheses, were also removed.
We then stemmed the resulting text. For the testing rounds, we used 10-fold
cross-validation.
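The preprocessing steps above can be sketched with NLTK as follows. This is a minimal sketch under stated assumptions: the text does not name the stemmer, so NLTK's Snowball stemmer for Portuguese is used here (NLTK's `RSLPStemmer` is an alternative), and the stopword set is a small illustrative subset rather than NLTK's full Portuguese list, so that the example runs without downloading NLTK data.

```python
from typing import List

from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import RegexpTokenizer

# Illustrative subset only; the full list is nltk.corpus.stopwords.words("portuguese"),
# which requires nltk.download("stopwords") beforehand.
STOPWORDS = {"a", "o", "os", "as", "de", "do", "da", "e", "em", "que", "foi", "por"}

# \w+ keeps runs of word characters (accented letters included, since \w is
# Unicode-aware for str patterns) and discards hyphens, parentheses, and punctuation.
TOKENIZER = RegexpTokenizer(r"\w+")

# Assumed stemmer choice; the paper only says the text was stemmed.
STEMMER = SnowballStemmer("portuguese")


def preprocess(text: str) -> List[str]:
    """Lowercase, tokenize, drop stopwords, then stem each remaining token."""
    tokens = TOKENIZER.tokenize(text.lower())
    kept = [t for t in tokens if t not in STOPWORDS]
    return [STEMMER.stem(t) for t in kept]


if __name__ == "__main__":
    print(preprocess("O réu (acusado) foi condenado por homicídio-simples."))
```

Because the tokenizer splits on hyphens, a compound such as "homicídio-simples" becomes two tokens; whether that is desirable depends on the subject terms being modelled.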


2.4   Data Transformation

In this step, we transformed the data into numerical vectors that can be fed to
machine learning (ML) algorithms. We used TF-IDF (term frequency–inverse document
frequency) to transform each judicial text into a vector of numbers, which is
then processed by the various ML algorithms.
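A minimal sketch of the TF-IDF step, assuming scikit-learn's `TfidfVectorizer` (the text does not say which implementation was used). With its default settings, each term's weight is its raw count in the document multiplied by a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1 for n documents, and each document vector is then scaled to unit Euclidean length. The documents below are hypothetical stemmed strings.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, already-preprocessed documents (stems joined by spaces).
docs = [
    "réu condenad homicídi",
    "réu absolv homicídi",
    "réu condenad corrupç ativ",
]

# Defaults: smooth_idf=True, sublinear_tf=False, norm="l2".
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

print(X.shape)  # (3, 6): three documents, six distinct terms
print(sorted(vectorizer.vocabulary_))
print(np.round(X.toarray(), 3))
```

Since "réu" appears in every document, its idf is ln(4/4) + 1 = 1, the minimum possible value, so it carries less weight than rarer stems such as "absolv".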


3     Preliminary Results

3.1   Homicides Data Set

We tested both data sets with various ML methods: Logistic Regression (LR),
Linear Discriminant Analysis (LDA), Gaussian Naïve Bayes (GNB), K-Nearest
Neighbors (KN), Support Vector Machines (SVM), and Regression Trees (RT). All
tests were initially run with 75% of the data set used for training and the
remaining 25% used for testing.
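The experimental setup above can be sketched with scikit-learn. This is an illustration under assumptions, not the authors' code: the library is not named in the text, synthetic data from `make_classification` stands in for the TF-IDF features, default hyperparameters are used, and `DecisionTreeClassifier` is assumed to be the counterpart of the "Regression Trees" method because the labels are binary.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the TF-IDF matrix. With real TF-IDF output, call
# X.toarray() first: GaussianNB and LDA expect dense input.
X, y01 = make_classification(n_samples=300, n_features=20, random_state=0)
y = 2 * y01 - 1  # map {0, 1} to {-1, +1}: -1 = acquittal, +1 = conviction

MODELS = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "GNB": GaussianNB(),
    "KN": KNeighborsClassifier(),
    "SVM": SVC(),
    "RT": DecisionTreeClassifier(random_state=0),
}

# 75% training / 25% test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

accuracies = {}
for name, model in MODELS.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

A single 75/25 split gives one accuracy per model, which can vary noticeably with the split on small data sets such as these; the cross-validation described below averages over ten splits instead.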
    For the first data set, the homicide legal texts, we obtained the calibration
plot in Fig. 1. In calibration plots using Platt scaling, the closer a curve is
to the perfectly calibrated diagonal, the better. From this figure, we infer that
Regression Trees show the best results. Support Vector Machines presented the
lowest performance, predicting values very different from those that would be
expected based on the reviewed literature.
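A calibration curve with Platt scaling can be sketched with scikit-learn, where `CalibratedClassifierCV(..., method="sigmoid")` fits Platt's sigmoid mapping from classifier scores to probabilities and `calibration_curve` produces the points of the plot. The data are synthetic and the number of bins is an arbitrary choice; the text does not give the configuration behind Fig. 1.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# method="sigmoid" is Platt scaling: a logistic fit on the SVM decision scores,
# learned on internal cross-validation folds of the training data.
platt_svm = CalibratedClassifierCV(SVC(), method="sigmoid", cv=5)
platt_svm.fit(X_train, y_train)
prob_pos = platt_svm.predict_proba(X_test)[:, 1]  # probability of class 1

# For each bin: observed fraction of positives vs. mean predicted probability.
# A perfectly calibrated model lies on the diagonal prob_true == prob_pred.
prob_true, prob_pred = calibration_curve(y_test, prob_pos, n_bins=5)
for p_hat, p_obs in zip(prob_pred, prob_true):
    print(f"predicted {p_hat:.2f} -> observed {p_obs:.2f}")
```

Plotting `prob_pred` against `prob_true` for each model, together with the diagonal, yields a plot of the kind shown in Fig. 1.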
    Finally, we ran 10-fold cross-validation for each of the algorithms. The
results are shown in Table 2 and Fig. 2. In these tests, LDA and Logistic
Regression were the best choices among all the algorithms.
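The 10-fold evaluation can be sketched with scikit-learn's `cross_validate`. Assumptions: folds are stratified and shuffled (the text does not say), the data are synthetic, and conviction (+1) is treated as the positive class for recall, precision, and F1. The per-fold scores are averaged, so the F1 obtained this way is the mean of per-fold F1 values, which generally differs slightly from the F1 computed from mean precision and mean recall.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=300, n_features=20, random_state=0)
y = 2 * y01 - 1  # +1 = conviction (positive class), -1 = acquittal

# Ten folds, each used once as the held-out set; stratification keeps the
# conviction/acquittal ratio similar across folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
metrics = ["accuracy", "recall", "precision", "f1"]
scores = cross_validate(
    DecisionTreeClassifier(random_state=0), X, y, cv=cv, scoring=metrics
)

# Mean over the 10 folds, plus the standard deviation of accuracy,
# mirroring the columns of Table 2.
summary = {m: scores[f"test_{m}"].mean() for m in metrics}
summary["accuracy_std"] = scores["test_accuracy"].std()
print(summary)
```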


Table 2. Mean results on the homicides data set, using 10-fold cross-validation

Algorithm                 Accuracy   Accuracy (σ)   Recall     Precision   F1 Score
Logistic Regression       0.893390   0.040169       0.912864   0.896453    0.903230
LDA                       0.908588   0.035791       0.909055   0.922748    0.914758
K-Neighbors               0.815621   0.045490       0.873097   0.813255    0.839270
Regression Trees          0.881610   0.019635       0.887465   0.907709    0.895757
Gaussian Naïve Bayes      0.700678   0.085623       0.968779   0.661592    0.783059
Support Vector Machines   0.568644   0.080258       1.000000   0.568644    0.900421
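Since the mean F1 over folds need not equal the F1 of the mean precision and recall, small gaps between the two are expected; a large gap is worth checking. The arithmetic sketch below recomputes the harmonic mean from Table 2's precision and recall columns, with an arbitrary flagging threshold of 0.05. For Support Vector Machines the mean recall is 1.000, so every fold has recall 1 and each fold's F1 equals 2p/(1 + p); because this function is concave, the mean F1 over folds cannot exceed 2p/(1 + p) evaluated at the mean precision p = 0.568644, about 0.725. The reported 0.900421 in that row therefore appears inconsistent with its precision and recall, and it coincides with the F1 reported for Logistic Regression and SVM in Table 3.

```python
# Mean precision, mean recall, and reported mean F1, copied from Table 2.
TABLE_2 = {
    "Logistic Regression": (0.896453, 0.912864, 0.903230),
    "LDA": (0.922748, 0.909055, 0.914758),
    "K-Neighbors": (0.813255, 0.873097, 0.839270),
    "Regression Trees": (0.907709, 0.887465, 0.895757),
    "Gaussian Naive Bayes": (0.661592, 0.968779, 0.783059),
    "Support Vector Machines": (0.568644, 1.000000, 0.900421),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


TOLERANCE = 0.05  # arbitrary threshold for flagging a row

for name, (p, r, reported) in TABLE_2.items():
    implied = f1(p, r)
    flag = "  <- check" if abs(implied - reported) > TOLERANCE else ""
    print(f"{name}: implied {implied:.4f}, reported {reported:.4f}{flag}")
```

Every row except Support Vector Machines agrees to within about 0.004.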

        Fig. 1. Calibration plot for the homicides database.


       Fig. 2. Candlestick plots for the algorithms under 10-fold cross-validation.
3.2   Corruption Data Set

For the second data set, the corruption legal texts, we obtained the calibration
plot shown in Fig. 3. As before, the closer a curve is to the perfectly
calibrated diagonal, the better. As with the homicides data set, we infer that
Regression Trees are the best choice. K-Neighbors showed the lowest performance,
predicting values very different from those that would be expected. All tests
were initially run with 75% of the data used for training and 25% for testing.
Table 3 and Fig. 4 show the results of 10-fold cross-validation on this data set.


               Fig. 3. Calibration plot for the corruption database.


   These results suggest that, at least for the two data sets evaluated,
Regression Trees are the best choice for predicting the outcomes of legal texts:
in both data sets, they were the best choice or among the best choices available.


4     Conclusions and Future Steps

Based on these tests, we infer that Regression Trees are the best method for
predicting outcomes in both data sets analyzed. The other algorithms showed more
variable results: SVM, for example, performed well on the corruption data set
but had the lowest accuracy on the homicides data set, whereas Regression Trees
consistently produced good predictions. These results are consistent with

Table 3. Mean results on the corruption data set, using 10-fold cross-validation

Algorithm                 Accuracy   Accuracy (σ)   Recall     Precision   F1 Score
Logistic Regression       0.824269   0.100378       1.000000   0.824269    0.900421
LDA                       0.840351   0.108212       0.963918   0.864229    0.907444
K-Neighbors               0.824854   0.096681       0.963918   0.846981    0.897678
Regression Trees          0.973684   0.042433       0.988070   0.974167    0.980524
Gaussian Naïve Bayes      0.866374   0.101778       0.989474   0.876901    0.925169
Support Vector Machines   0.824269   0.100378       1.000000   0.824269    0.900421


               Fig. 4. Candlestick plots for the algorithms under 10-fold cross-validation.
findings reported by other studies in the legal area in many different
countries, such as the one conducted by Kastellec [11], who obtained good results
using classification trees in the American legal system. The author argues that
such trees are able to capture legal concepts, revealing patterns that other
methods cannot reproduce as effectively.
    Other researchers have also used this method, such as Rios-Figueroa [17], who
used Regression Trees to analyze judicial independence and corruption among
Supreme Courts in Latin America; Antonucci et al. [4], who adopted Regression
Trees to measure the efficiency of Italian courts; and Kufandirimbwa and
Kuranga [13], who used the same method to predict outcomes in Zimbabwe.
    These studies suggest that, although legal systems differ greatly across
countries and languages, such as Brazil, the USA, Italy, and Zimbabwe, they share
characteristics that can be effectively measured with appropriate algorithms.
    As future work, we plan to adopt different methods for combining word
embeddings into representations of whole texts, so that we can also use methods
such as neural networks and compare them with the ones presented in this work.

References
 1. Aires, J.P., Pinheiro, D., De Lima, V.S., Meneguzzi, F.: Norm conflict identification
    in contracts. Artificial Intelligence and Law 25(4), 397–428 (2017)
 2. Alarie, B., Niblett, A., Yoon, A.H.: How artificial intelligence will affect the practice
    of law. University of Toronto Law Journal 68(supplement 1), 106–124 (2018)
 3. Aletras, N., Tsarapatsanis, D., Preoţiuc-Pietro, D., Lampos, V.: Predicting judicial
    decisions of the European Court of Human Rights: A natural language processing
    perspective. PeerJ Computer Science 2, e93 (2016)
 4. Antonucci, L., Crocetta, C., D’Ovidio, F.D.: Evaluation of Italian judicial system.
    Procedia Economics and Finance 17, 121–130 (2014).
    https://doi.org/10.1016/s2212-5671(14)00886-7
 5. de Araujo, D.A., Rigo, S.J., Barbosa, J.L.V.: Ontology-based information extraction
    for juridical events with case studies in Brazilian legal realm. Artificial
    Intelligence and Law 25(4), 379–396 (2017)
 6. Barraud, B.: An algorithm that can predict judges’ decisions: Toward a robotization
    of justice? Les Cahiers de la Justice (1), 121–139 (2017)
 7. Branting, L., Yeh, A., Weiss, B., Merkhofer, E., Brown, B.: Inducing predictive
    models for decision support in administrative adjudication. In: AI Approaches
    to the Complexity of Legal Systems: AICOL International Workshops 2015-2017:
    AICOL-VI@ JURIX 2015, AICOL-VII@ EKAW 2016, AICOL-VIII@ JURIX 2016,
    AICOL-IX@ ICAIL 2017, and AICOL-X@ JURIX 2017, Revised Selected Papers.
    vol. 10791, p. 465. Springer (2018)
 8. Fornaciari, T., Poesio, M.: Automatic deception detection in Italian court cases.
    Artificial Intelligence and Law 21(3), 303–340 (2013)
 9. Gokhale, R., Fasli, M.: Deploying a co-training algorithm to classify human-rights
    abuses. In: 2017 International Conference on the Frontiers and Advances in Data
    Science (FADS). pp. 108–113. IEEE (2017)
10. Hyman, H., Sincich, T., Will, R., Agrawal, M., Padmanabhan, B., Fridy, W.: A
    process model for information retrieval context learning and knowledge discovery.
    Artificial Intelligence and Law 23(2), 103–132 (2015)
11. Kastellec, J.P.: The Statistical Analysis of Judicial Decisions and Legal Rules
    with Classification Trees. Journal of Empirical Legal Studies 7(2), 202–230 (2010).
    https://doi.org/10.1111/j.1740-1461.2010.01176.x
12. Katz, D.M., Bommarito II, M.J., Blackman, J.: A general approach for predicting the
    behavior of the Supreme Court of the United States. PLoS ONE 12(4) (2017)
13. Kufandirimbwa, O., Kuranga, C.: Towards judicial data mining: Arguing for
    adoption in the judicial system 1(2), 15–21 (2012)
14. Liu, Z., Chen, H.: A predictive performance comparison of machine learning models
    for judicial cases. In: 2017 IEEE Symposium Series on Computational Intelligence
    (SSCI). pp. 1–6. IEEE (2017)
15. Mochales, R., Moens, M.F.: Argumentation mining. Artificial Intelligence and Law
    19(1), 1–22 (2011)
16. Remmits, Y.L.J.A.: Finding the topics of case law: Latent Dirichlet allocation on
    Supreme Court decisions. B.Sc. thesis, Faculteit der Sociale Wetenschappen (July
    2017), supervisors: Kachergis, G.E.; Kuppevelt, D. van; Dijck, G. van
17. Rios-Figueroa, J.: Judicial independence and corruption: An analysis of Latin
    America. SSRN Electronic Journal (212) (2011).
    https://doi.org/10.2139/ssrn.912924
18. Şulea, O.M., Zampieri, M., Vela, M., van Genabith, J.: Predicting the law area
    and decisions of French Supreme Court cases. In: Proceedings of the International
    Conference Recent Advances in Natural Language Processing, RANLP 2017. pp.
    716–722 (2017)