Building and Analyzing the Brazilian Legal
Knowledge Graph
Rilder S. Pires¹,*,†, Henrique Santos²,†, Ricardo Guedes¹,†, João A. Monteiro Neto¹,†,
Carlos Caminha¹,† and Vasco Furtado¹,†
¹ Universidade de Fortaleza, Fortaleza CE, Brazil
² Rensselaer Polytechnic Institute, Troy NY, USA


Abstract
Artificial Intelligence has proven effective in streamlining processes in several domains. The
Brazilian judiciary, in particular, faces a number of cases that exceeds the working capacity of the
courts, which makes it urgent to create methods that support access to and manipulation of
unstructured data. This paper presents the construction of a Knowledge Graph of Brazilian legislation
using Semantic Web standards, which allows an understanding of how Brazilian laws interact with each
other. The Knowledge Graph was quantitatively evaluated using complex network analysis and was
found to be useful in supporting experts in understanding Brazilian legislation by detecting special
nodes, namely the “bridge-like nodes”, that play an important role in the structure of the graph.


1. Introduction
There is a growing demand for the application of Artificial Intelligence (AI) resources in all
branches of human activity. Many of the tasks performed by people can be performed by
machines with greater speed and reliability. In this context, knowledge graphs (KGs) have been
successfully applied in many domains as a way to encode knowledge for supporting downstream
applications [1]. They represent knowledge in the form of subject-predicate-object triples and
are constructed based on domain sources, including authoritative documents, regulations, and
domain practitioners.
   These tools can be effectively applied in the legal domain, where access to more structured
data about legal documents, especially laws, and an understanding of their connections and
relationships can provide useful information not only for legal scholars investigating how legal
topics evolve but also for the emerging Legal Tech industry. A well-established source that
accurately represents the law, its topics, and its relationships could provide a path to more
effective and reliable legal knowledge, which could support applications ranging from automatic
classification of legal documents to the deployment of predictive decision-making support.
Joint Proceedings of ISWC2022 Workshops: the International Workshop on Artificial Intelligence Technologies for Legal
Documents (AI4LEGAL) and the International Workshop on Knowledge Graph Summarization (KGSum) (2022)
* Corresponding author.
† These authors contributed equally.
Email: rilder@unifor.br (R. S. Pires); oliveh@rpi.edu (H. Santos); ricardobmg@edu.unifor.br (R. Guedes);
joaoneto@unifor.br (J. A. M. Neto); caminha@unifor.br (C. Caminha); vasco@unifor.br (V. Furtado)
ORCID: 0000-0003-4873-5308 (R. S. Pires); 0000-0002-2110-6416 (H. Santos)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

   In addition, there are several approaches for encoding legislation and laws, including as KGs,
mainly in Europe [2, 3]. In this sense, using AI to streamline processes in the Courts of Justice
is a promising application [4], especially in Brazil, where the Judiciary has a very large number
of cases that overwhelm its current capacity [5] and reinforce the notion of “slow justice”.
   In this paper, we present the construction of a Knowledge Graph for Brazilian Federal
Legislation through the examination of relationships between laws, including citations, changes,
and repeals. For this, we leveraged the LexML data portal, which serves as a source of
semi-structured Brazilian legal data [6]. To represent the data as a KG, we explore methods and
Semantic Web standards, including established legal vocabularies, to model and build an
RDF-based Brazilian Legal Knowledge Graph in which nodes represent legal documents and edges
represent the relationships between them. Finally, we use graph theory to extract information from
the KG that allows us to build a profile of the structure of Brazilian legislation. In addition, we
test the hypothesis that special nodes, namely the “bridge-like nodes” (nodes that belong to the
neighborhood of at least two nodes associated with Brazilian Legal Codes), can be successfully
used to identify legal documents that play an important role in the legislation.


2. Related Work
Semantic Web and Data Science applications for the legal domain have been very common in
the last 15 years [7, 8, 9, 10, 11, 12, 13]. In general, these applications are focused on the problem
of transforming a representation of legal information in natural language into representations
of structured data, such as creating ontologies [14], including investigating design patterns to
create these knowledge graphs in the legal domain [10]. By using the Resource Description
Framework (RDF) as a means of representing legal information, Ebenhoch outlines the challenges
of describing legal resources and highlights that the main approach to enriching legal data is to
enhance it with metadata [9].
   In recent years, systems for processing legal domain content have received a lot of attention.
Correspondingly, efforts have been made to create knowledge graphs for the legal
domain. An example of such efforts is the EU-funded project Lynx [15], which has demonstrated many
use cases in the legal domain. Saias and Quaresma, in turn, describe the problem of a lack
of semantics in legal information retrieval systems and propose an ontology to enrich legal
data based on semantics in the Portuguese legal system [16]. Although a series of semantic
representations of the legal domain have been proposed in the literature, we
identified a gap, which we address by proposing a Knowledge Graph of Brazilian Legislation.


3. Legal Data Sourcing / Knowledge Graph Construction
We sourced data by extracting information from the LexML portal [6], a website that special-
izes in legal and legislative information in Brazil. The portal uses Uniform Resource Names
(URNs) [17] to uniquely identify each legal document. URNs are persistent and unambiguous
identifications and we leverage them to unify, organize and facilitate access to descriptive
information about the legislation. This allowed our approach to remain comprehensive since
the URN can be unequivocally and internationally recognized [18].
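As an illustration of how such identifiers can be used to organize descriptive information, the following minimal sketch splits a URN into its parts. It assumes the shape `urn:lex:<jurisdiction>:<authority>:<type>:<date>;<number>`, which should be checked against the LexML URN documentation [18]; the `LexUrn` type, its field names, and `parse_lex_urn` are hypothetical names introduced here and are not part of the original pipeline.

```python
from typing import NamedTuple


class LexUrn(NamedTuple):
    """Parts of a LexML-style URN (field names are illustrative)."""
    jurisdiction: str
    authority: str
    doc_type: str
    date: str
    number: str


def parse_lex_urn(urn: str) -> LexUrn:
    """Split a URN of the ASSUMED form
    urn:lex:<jurisdiction>:<authority>:<type>:<date>;<number>."""
    parts = urn.split(":")
    if len(parts) != 6 or parts[0] != "urn" or parts[1] != "lex":
        raise ValueError(f"not a LexML-style URN: {urn!r}")
    # The last segment carries the date and the document number.
    date, sep, number = parts[5].partition(";")
    if not sep:
        raise ValueError(f"missing ';<number>' in {urn!r}")
    return LexUrn(parts[2], parts[3], parts[4], date, number)


# Example identifier for Decree-Law No. 2848 of 1940 (assumed URN shape).
penal_code = parse_lex_urn("urn:lex:br:federal:decreto.lei:1940-12-07;2848")
print(penal_code.doc_type, penal_code.date, penal_code.number)
```

Keeping the parsed fields in a named tuple makes them usable as dictionary keys, which is convenient when the same document is reached through several relationships.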


                         Legislation type                       Quantity
                         Ordinary Laws                          15741
                         Decrees                                164066
                         Complementary Laws                     189
                         Decree-Laws                            12547
                         Delegated Laws                         13
                         Provisional Measures AE32              6189
                         Provisional Measures PE32              1080
                         Constitutions                          5
                         Legislative Decrees                    14577
                         Total                                  214407

Table 1
Legislation extracted from LexML portal


   The data extraction was performed using Python’s Requests module [19], which allowed us
to send HTTP/1.1 requests to the LexML portal and retrieve information about the federal
legislation. Table 1 lists the legislation types extracted from the LexML portal and the
number of documents extracted for each type.
   The database created using this approach is composed of information about legislation, such
as type, name, date of approval, date of publication, and summary. Besides that, it contains
information about relationships between elements of the legislation such as changes, citations,
regulations, and repeals. This amount of collected information allowed us to create a Legal
Knowledge Graph capable of representing the different elements of the legislation and the
different relationships between them.
   In Figure 1, we show a complete visualization of the Legal Knowledge Graph. This graph
has a total of 214,407 nodes, which correspond to the legislation listed in Table 1, and a total of
142,734 edges, of which 28,451 represent changes, 17,452 citations, 17,452 regulations and 93,905
repeals. We highlight that different legislation types (nodes) can interact through different types
of relationships.

3.1. Employing Semantic Web Standards
Although KGs are not defined by the technology they adopt, Semantic Web standards have
proven useful when used to represent knowledge in a graph structure, including provenance [20],
named-entity disambiguation [21], and domain knowledge representation [1]. Another impor-
tant advantage is making the content easier to use by downstream applications, allowing access
to it through the use of standardized query languages [22] and vocabularies [23, 24].
   Knowledge graphs often employ ontologies as a way to abstract graph content, supporting
graph cohesion by providing standardized terminology for nodes and edges. Ontologies help
during automatic KG construction (supporting link prediction) as well as to enable graph
querying. We used the European Legislation Identifier (ELI)¹ ontology, which provides a set of
fundamental legal concepts and can reasonably be reused across many legal sources.
¹ https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52012XG1026(01)

We also developed routines for the automatic generation of legal resource metadata in RDF
from the legislation database presented in the previous section. Listing 1 shows an example
containing Federal Decree-Law No. 2848 of 1940 and Law No. 14110 of 2020 that changes it.
        @prefix eli: <http://data.europa.eu/eli/ontology#> .
        @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

        # Law No. 14110 of 2020, which changes the Penal Code
        <#LOF#14.110#2020-12-18> a eli:LegalResource ;
           eli:date_publication "2020-12-18"^^xsd:date ;
           eli:in_force "2020-12-21"^^xsd:date ;
           eli:changes <#DLF#2.848#1940-12-7> ;
           eli:description "Altera o art. 339 do Decreto-Lei n. 2.848, de 7 de dezembro de 1940 (Codigo Penal), para dar nova redacao ao crime de denunciacao caluniosa" ;
           eli:type_document "Lei Ordinaria Federal" ;
           eli:number "14.110" .

        # Decree-Law No. 2848 of 1940 (Penal Code); xsd:date literals need a two-digit day
        <#DLF#2.848#1940-12-7> a eli:LegalResource ;
           eli:date_publication "1940-12-07"^^xsd:date ;
           eli:in_force "1940-12-07"^^xsd:date ;
           eli:changed_by <#LOF#14.110#2020-12-18> ;
           eli:description "Codigo Penal" ;
           eli:type_document "Decreto-Lei Federal" ;
           eli:number "2.848" .

         Listing 1: Example of Brazilian legal resources in RDF using the ELI ontology

   The ELI ontology provides the high-level LegalResource class and the type_document
predicate that allows the characterization of the legal resource according to a domain-specific
ontology. Other predicates, including changes, amends, and repeals, help to build the graph
by allowing the proper identification of connections between laws. We provide an online version
of the Legal Knowledge Graph using this ontology at https://github.com/hansidm/br-legal-kg.
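The automatic generation of RDF metadata from database records can be sketched as follows. This is a minimal illustration, not the routines used to build the published graph: the record layout (`type_code`, `number`, `date`, `type_name`, `summary`, `changes`) and the helper names are hypothetical, the example summary is shortened from the description in Listing 1, and the node identifiers simply mimic the style of Listing 1.

```python
PREFIXES = (
    "@prefix eli: <http://data.europa.eu/eli/ontology#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def node_id(type_code: str, number: str, date: str) -> str:
    """Build a node identifier in the style used by Listing 1."""
    return f"<#{type_code}#{number}#{date}>"


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(record: dict) -> str:
    """Render one (hypothetical) database record as a Turtle resource."""
    subject = node_id(record["type_code"], record["number"], record["date"])
    pairs = [
        ("a", "eli:LegalResource"),
        ("eli:date_publication", f'"{record["date"]}"^^xsd:date'),
        ("eli:description", f'"{escape(record["summary"])}"'),
        ("eli:type_document", f'"{escape(record["type_name"])}"'),
        ("eli:number", f'"{record["number"]}"'),
    ]
    # One eli:changes statement per document changed by this record.
    pairs += [("eli:changes", node_id(*t)) for t in record.get("changes", [])]
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
    return f"{subject} {body} ."


law = {
    "type_code": "LOF",
    "number": "14.110",
    "date": "2020-12-18",
    "type_name": "Lei Ordinaria Federal",
    "summary": "Altera o art. 339 do Decreto-Lei n. 2.848 (Codigo Penal)",
    "changes": [("DLF", "2.848", "1940-12-7")],
}
print(PREFIXES + to_turtle(law))
```

Plain string formatting keeps the sketch dependency-free; in practice an RDF library would take care of literal escaping and serialization.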


Figure 1: Graphical visualization of the Legal Knowledge Graph. The nodes correspond to the legislation
presented in Table 1. The edges represent the different relationships between the legislation according
to the colors presented in the upper right corner of the figure.


4. Complex Network Analysis
The approach described in the previous sections has the advantage of allowing the application
of complex network techniques to explore Brazilian legislation. Here, we perform a direct
simplification of the constructed KG by defining a network where nodes represent unique
legislation and edges represent any type of relationship described in the previous section. Then,
we use graph theory to extract topological information from the network that allows us to build
a profile of the structure of Brazilian legislation. This process resulted in a directed unweighted
graph with 𝑁𝑣 = 214,407 nodes and 𝑁𝑒 = 141,568 edges. It is important to note that the
ratio 𝑁𝑒/𝑁𝑣 ≈ 0.660 is low, which indicates that the graph has a low edge density. This
suggests that the network is not a connected graph and is likely split into many small fragments.
To explore this property, we perform a connected components analysis of the graph. By doing
this, we find a total of 98,710 weakly connected components, of which the largest has 110,466
(51.5%) nodes and 95,484 are isolated nodes (44.5% of all nodes). The remaining 8,457 nodes
are divided among the other 3,225 components. This indicates that a large number of legal documents
are independent, that is, they do not relate to any other document.
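The connected components analysis described above can be sketched on a toy graph. The node names and edges below are invented for illustration, and the union-find routine is one common way to find weakly connected components, not necessarily the one used in this work.

```python
from collections import Counter


def weak_components(nodes, edges):
    """Map each node to a component root, ignoring edge direction (union-find)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for src, dst in edges:
        parent[find(src)] = find(dst)
    return {n: find(n) for n in nodes}


# Toy directed graph: six documents and three relationships (hypothetical).
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("C", "B"), ("D", "E")]

roots = weak_components(nodes, edges)
sizes = sorted(Counter(roots.values()).values(), reverse=True)
isolated = sum(1 for size in sizes if size == 1)

print("edges per node:", len(edges) / len(nodes))  # 0.5
print("component sizes:", sizes)                   # [3, 2, 1]
print("isolated nodes:", isolated)                 # 1
```

As in the full network, a ratio of edges to nodes below one already hints that the graph cannot be connected, and the component sizes make the fragmentation explicit.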
   To quantify the global importance of a given node, it is often convenient to look at
node-centrality measures. In this case, a simple and natural choice for measuring importance
in our context is in-degree centrality [25]. In Figure 2, we show the rank plot for the
in-degree centrality of the nodes in the network described above. As we can see, this centrality
is a good measure of importance, since the nodes with the highest in-degree are the Legal Codes
of Brazilian Legislation. In the inset of this figure, we show that the in-degree distribution
follows a power law with exponent 2.65, which indicates scale-free behavior, common in
several phenomena [26, 27, 28, 13, 29].

       Legislation                           Code name                               𝑘𝑖𝑛
       Decree-Law No. 5452 of 01/05/1943     Consolidation of Labor Laws             313
       Law No. 5172 of 25/10/1966            National Tax Code                       257
       Decree-Law No. 2848 of 07/12/1940     Penal Code                              128
       Decree-Law No. 3689 of 03/10/1941     Penal Procedure Code                    78
       Law No. 5869 of 11/01/1973            Civil Procedure Code (1973)             78
       Law No. 4737 of 15/07/1965            Electoral Code                          73
       Law No. 10406 of 10/01/2002           Civil Code                              54
       Law No. 9503 of 23/09/1997            Brazilian Traffic Code                  50
       Law No. 4117 of 27/08/1962            Brazilian Telecommunications Code       36
       Law No. 8078 of 11/09/1990            Consumer Protection Code                31
       Law No. 7565 of 19/12/1986            Brazilian Aeronautics Code              24
       Decree-Law No. 227 of 28/02/1967      Mining Code                             22
       Law No. 12651 of 25/05/2012           Forest Code                             18
       Decree-Law No. 1002 of 21/10/1969     Military Penal Procedure Code           13
       Decree-Law No. 1001 of 21/10/1969     Military Penal Code                     9
       Law No. 13105 of 16/03/2015           Civil Procedure Code (2015)             8
       Decree No. 24643 of 10/07/1934        Water Code                              3

Table 2
List of Brazilian Legal Codes.

Figure 2: Rank plot for in-degree centrality for the simplification of the Legal Knowledge Graph. In the
inset, the in-degree distribution shows a power-law behavior with an exponent of 2.65.

   In Table 2, we show some Brazilian legal codes² and the corresponding in-degree 𝑘𝑖𝑛 of the
nodes associated with them. In this table, the laws are sorted in descending order of 𝑘𝑖𝑛. As
we can see, the highest in-degree node is associated with the “Consolidation of Labor Laws”,
followed by the “National Tax Code” and the “Penal Code”. Surprisingly, some Legal
Codes have a very low in-degree despite their importance. Nonetheless, in-degree is still
a good proxy for the relevance of a Legal Code, in the same way that it is for ordinary nodes. This
fact simply indicates that a high in-degree is a sufficient but not a necessary condition for a
node to be important.
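The in-degree ranking discussed above amounts to counting, for each node, the edges that point to it. A minimal sketch on invented data follows; the node names and edges are hypothetical and only illustrate the computation.

```python
from collections import Counter


def in_degree(edges):
    """Count incoming edges per node for a list of (source, target) pairs."""
    return Counter(target for _, target in edges)


# Toy edge list: each pair means "source cites, changes, regulates or repeals target".
edges = [
    ("law_1", "code_A"),
    ("law_2", "code_A"),
    ("law_3", "code_A"),
    ("law_2", "code_B"),
    ("law_3", "law_1"),
]

k_in = in_degree(edges)
ranking = k_in.most_common()  # nodes sorted by descending in-degree

for node, degree in ranking:
    print(node, degree)
```

Nodes that never appear as a target, such as `law_2` here, have in-degree zero and are simply absent from the ranking.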

4.1. Detecting Bridge-Like Nodes
Brazilian law practitioners usually have a qualitative notion of what specific laws are of interest,
without specific metrics that would reveal them from a computational perspective. We have
worked with domain practitioners to identify elements that would characterize a legal document
as of interest. In doing so, we define the bridge-like nodes as the nodes that belong to the
neighborhood of at least two nodes associated with Brazilian Legal Codes. We hypothesize that
if a node is bridge-like, then this node represents a legal document that is of interest according
to the domain practitioners.
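The definition of bridge-like nodes can be turned into a short procedure. The sketch below makes two assumptions that the text does not state explicitly: the neighborhood of a code ignores edge direction, and the codes themselves are not counted as bridge-like. The node names and edges are invented for illustration.

```python
from collections import defaultdict


def bridge_like_nodes(edges, codes):
    """Return non-code nodes adjacent (in either direction) to at least two codes."""
    codes = set(codes)
    adjacent_codes = defaultdict(set)
    for src, dst in edges:
        if dst in codes and src not in codes:
            adjacent_codes[src].add(dst)
        if src in codes and dst not in codes:
            adjacent_codes[dst].add(src)
    return {node: cs for node, cs in adjacent_codes.items() if len(cs) >= 2}


# Toy data: two legal codes and three ordinary laws (hypothetical).
codes = {"penal_code", "tax_code"}
edges = [
    ("law_x", "penal_code"),
    ("law_x", "tax_code"),
    ("law_y", "penal_code"),
    ("tax_code", "law_z"),
    ("penal_code", "tax_code"),
]

bridges = bridge_like_nodes(edges, codes)
print(sorted(bridges))  # ['law_x']
```

Returning the set of codes reached by each bridge-like node, rather than only the node itself, makes it easy to see which parts of the legislation a given law connects.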
   Proceeding in that way, we detect 55 bridge-like nodes connected to 16 of the Brazilian Legal
Codes shown in Table 2. There is only one code that has no bridge-like nodes connected to it, namely
the “Water Code”. A convenient representation of the relationship between the bridge-like nodes
and the Brazilian legal codes can be obtained by drawing the subgraph defined by the Brazilian
Legal Codes, the bridge-like nodes, and the edges between them. In Figure 3, we show this
subgraph. There, the Brazilian Legal Codes are represented by the large orange circles, and the
bridge-like nodes are represented by the blue circles. The subgraph defined by Brazilian Legal
Codes and bridge-like nodes has a total of 76 connected nodes and is part of the largest
component of the network.

² The list of Brazilian Legal Codes used for this analysis can be found at:
  http://www4.planalto.gov.br/legislacao/portal-legis/legislacao-1/codigos-1.

Figure 3: The subgraph defined by the Brazilian Legal Codes, the bridge-like nodes, and the edges
between them. The orange circles represent the Brazilian Legal Codes shown in Table 2, the blue circles
represent the bridge-like nodes and the gray lines represent the connections between different nodes.

   In Table 3, we show the five bridge-like nodes with the highest 𝑘𝑖𝑛. The first bridge-like
node corresponds to “Law No. 8383 of 30/12/1991”. This law changes the income tax legislation
and establishes the Tax Reference Unit, a tax correction factor created in a period of very high
inflation in Brazil. The second node corresponds to “Law No. 8884 of 11/06/1994”. It provides
for the prevention and repression of infractions against the economic order and transforms
the Administrative Council for Economic Defense (Cade) into an Autarchy, which is a type of
indirect public administration entity with administrative and financial autonomy. The third
node corresponds to “Law No. 13146 of 06/07/2015”. It establishes the Brazilian Inclusion Law of
Persons with Disabilities, an important Brazilian law that later became known as the “Statute
of Persons with Disabilities”. These three examples show that, although the bridge-like nodes
have a low in-degree, they are of great importance to Brazilian Legislation.

                        Legislation                                𝑘𝑖𝑛
                        Law No. 8383 of 30/12/1991                 45
                        Law No. 8884 of 11/06/1994                 21
                        Law No. 13146 of 06/07/2015                18
                        Law No. 11340 of 07/08/2006                16
                        Law No. 7730 of 31/01/1989                 15

Table 3
Top five bridge-like nodes by 𝑘𝑖𝑛.


5. Conclusion
We presented an approach for constructing a Knowledge Graph for Brazilian Legislation. Legal
data from the LexML portal was extracted and integrated into a database that serves as input
to a method that represents this data as linked data in RDF using Semantic Web standards.
Then, we used graph theory to extract information from the Knowledge Graph that allowed
us to characterize the structure of Brazilian legislation, essentially building a
graph profile. We found indications of low edge density in the graph, which was confirmed by
performing a connected component analysis. We also calculated the in-degree centrality for the
network nodes and observed that it is a good measure of the global importance of a node in our
network, since the nodes with the highest in-degree coincide with the Legal Codes of Brazilian
Legislation.
   This approach also allowed us to detect “bridge-like nodes” (nodes that belong to the
neighborhood of at least two nodes associated with Brazilian Legal Codes). These nodes have
been shown to play an important role in the structure of the network since they connect different
important parts of Brazilian Legislation. We believe that a careful analysis of these bridge-like
nodes would allow a more detailed understanding of the inner structure of the network. It
could also bring insights into the processes that lead to the formation of the Legal Knowledge
Graph. The information generated by the bridge-like nodes approach can be very useful for
legal scholars seeking to understand the real impact of laws, as it provides a new way to identify
the legal dimensions to which a particular law is connected.
   For future work, in terms of graph applications, we intend to carry out a deeper analysis of
the Knowledge Graph in order to better understand its underlying topological structure and
characterize temporal correlations between laws that can help us to understand the evolution
of Brazilian Legislation. In terms of Semantic Web standards, we intend to refine the legal
modeling in the graph by exploring the formal definition of the extensions needed for further
characterization of legal data in the form of an ontology. We also intend to explore the inner
structure of the legal documents, which could shed light on the relations between parts of
documents. As the information provided by LexML is high-level, we can refine the representation by
providing contextualized terminology for the Brazilian scenario.

Acknowledgments
We gratefully acknowledge CNPq, CAPES, FUNCAP, and the Edson Queiroz Foundation for
financial support.


References
 [1] B. Abu-Salih, Domain-specific knowledge graphs: A survey, Journal of Network and
     Computer Applications 185 (2021) 103076.
 [2] J. Breuker, P. Casanovas, M. C. A. Klein, E. Francesconi, The Flood, the Channels and the
     Dykes: Managing Legal Information in a Globalized and Digital World, Law, Ontologies
     and the Semantic Web (2009) 3–18.
 [3] E. Filtz, S. Kirrane, A. Polleres, The linked legal data landscape: linking legal data across
     different countries, Artificial Intelligence and Law (2021).
 [4] L. F. Salomão, Artificial Intelligence: Technology Applied to Conflict Resolution in the
     Brazilian Judiciary, Research report, 2021. URL: https://ciapj.fgv.br/sites/ciapj.fgv.br/files/
     report_ai_ciapj.pdf.
 [5] Conselho Nacional de Justiça, GovRisk The International Governance & Risk Institute, UK-
     Brazil Cooperation: improving efficiency and performance in Brazil’s Judiciary, 2016/17
     (2017).
 [6] LexML Brasil: Rede de Informação Legislativa e Jurídica, https://www.lexml.gov.br.
 [7] C. Biagioli, E. Francesconi, A. Passerini, S. Montemagni, C. Soria, Automatic semantics
     extraction in law documents, in: Proceedings of the 10th international conference on
     Artificial intelligence and law, 2005, pp. 133–140.
 [8] A. Boer, R. Hoekstra, R. Winkels, T. v. Engers, F. Willaert, Proposal for a dutch legal xml
     standard, in: International Conference on Electronic Government, Springer, 2002, pp.
     142–149.
 [9] M. P. Ebenhoch, Legal knowledge representation using the resource description framework
     (rdf), in: 12th International Workshop on Database and Expert Systems Applications, IEEE,
     2001, pp. 369–373.
[10] A. Gangemi, Design patterns for legal ontology construction, Design Patterns for Legal
     Ontology Construction (2007) 1000–1021.
[11] E. Filtz, Building and processing a knowledge-graph for legal data, in: European Semantic
     Web Conference, Springer, 2017, pp. 184–194.
[12] F. Amato, A. Mazzeo, A. Penta, A. Picariello, Building rdf ontologies from semi-structured
     legal documents, in: 2008 International Conference on Complex, Intelligent and Software
     Intensive Systems, IEEE, 2008, pp. 997–1002.
[13] J. L. B. de Araújo, J. A. M. Neto, F. Siqueira, C. M. Santos, R. Vasconcelos, E. A. Oliveira,
     C. Caminha, V. Furtado, Characteristics of effective paths in brazilian legal processes
     (2021).
[14] G. Lame, Using nlp techniques to identify legal ontology components: concepts and
     relations, in: Law and the Semantic Web, Springer, 2005, pp. 169–184.
[15] J. M. Schneider, G. Rehm, E. Montiel-Ponsoda, V. Rodríguez-Doncel, P. Martín-Chozas,
     M. Navas-Loro, M. Kaltenböck, A. Revenko, S. Karampatakis, C. Sageder, et al., Lynx: A
     knowledge-based ai service platform for content processing, enrichment and analysis for
     the legal domain, Information Systems 106 (2022) 101966.
[16] J. Saias, P. Quaresma, Semantic enrichment of a web legal information retrieval system,
     in: JURIX, 2002, pp. 11–20.
[17] URIs, URLs, and URNs: Clarifications and Recommendations 1.0, W3C Note, W3C, 2001.
     https://www.w3.org/TR/2001/NOTE-uri-clarification-20010921/.
[18] Rede de Informação Legislativa e Jurídica,
     https://projeto.lexml.gov.br/documentacao/Parte-2-LexML-URN.pdf.
[19] Requests: HTTP for Humans, https://docs.python-requests.org/en/latest/.
[20] L. F. Sikos, D. Philp, Provenance-Aware Knowledge Representation: A Survey of Data
     Models and Contextualized Knowledge Graphs, Data Science and Engineering 5 (2020)
     293–316.
[21] I. O. Mulang’, K. Singh, C. Prabhu, A. Nadgeri, J. Hoffart, J. Lehmann, Evaluating the
     Impact of Knowledge Graph Context on Entity Disambiguation Models, in: Proceedings of
     the 29th ACM International Conference on Information & Knowledge Management, CIKM
     ’20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 2157–2160.
[22] SPARQL 1.1 Overview, W3C Recommendation, W3C, 2013.
     https://www.w3.org/TR/2013/REC-sparql11-overview-20130321/.
[23] R. Guha, D. Brickley, RDF Schema 1.1, W3C Recommendation, W3C, 2014.
     https://www.w3.org/TR/2014/REC-rdf-schema-20140225/.
[24] OWL 2 Web Ontology Language Document Overview (Second Edition), Technical Report,
     W3C, 2012. https://www.w3.org/TR/2012/REC-owl2-overview-20121211/.
[25] M. Newman, Networks: An Introduction, Oxford University Press, London, 2010.
[26] C. Caminha, V. Furtado, T. H. Pequeno, C. Ponte, H. P. Melo, E. A. Oliveira, J. S. Andrade Jr,
     Human mobility in large cities as a proxy for crime, PloS one 12 (2017) e0171609.
[27] C. Caminha, V. Furtado, V. Pinheiro, C. Ponte, Graph mining for the detection of over-
     crowding and waste of resources in public transport, Journal of Internet Services and
     Applications 9 (2018) 1–11.
[28] C. Ponte, H. P. M. Melo, C. Caminha, J. S. Andrade Jr, V. Furtado, Traveling heterogeneity
     in public transportation, EPJ Data Science 7 (2018) 1–10.
[29] R. Albert, Scale-free networks in cell biology, Journal of cell science 118 (2005) 4947–4957.

