Towards legal change analysis: clustering of Polish Civil Code
                           amendments
                                                                           Łukasz Górski
                                    Interdisciplinary Centre for Mathematical and Computational Modelling
                                                              University of Warsaw
                                                               lgorski@icm.edu.pl

In: Proceedings of the Third Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL 2019), June 21, 2019, Montreal, QC, Canada.
© 2019 Copyright held by the owner/author(s). Copying permitted for private and academic purposes.
Published at http://ceur-ws.org

ABSTRACT
Due to the growing activity of legislators, lawyers need tools that allow them to better understand an ever-growing corpus of legislative materials. Herein we propose a tool that visualizes and clusters thematically similar amending acts, allowing a lawyer to quickly review related provisions and thus giving an insight into the legislative history of a given legal institution. The methods suggested herein (based on TF-IDF, word and paragraph embeddings, PCA and k-means clustering) are evaluated on the provisions of the Polish Civil Code.

1    INTRODUCTION
This paper describes the first steps undertaken in the development of a software solution for the visualization of legal change, which aims to provide the user with means to effectively explore a database of amending acts. We aim to develop a solution which is able to group together amending acts that are thematically similar, in an unsupervised manner.

The proof-of-concept implementation studied herein has been tested using the Polish Civil Code and relevant amending acts issued from its enactment in 1965 up to November 2018. This legal act was chosen as a basis for experiments for the following reasons:

    (i) While the Code has been in force, the Polish economy has undergone a transformation from socialism to capitalism, and later its law had to be adapted to the law of the European Union. The processes pertaining to the recognition of information and communication technologies in the domain of law were also reflected in the Code. The turbulent times in which the Code has existed made it subject to almost 90 amending acts. Some of the sections composing the Civil Code were, in fact, changed multiple times; see the heat map (Fig. 1) for a graphical representation of the number of times a given legal section was amended. Therefore this research aimed to assess whether modern machine learning approaches would be able to recognize and discover discrete categories of changes (not necessarily the three mentioned hereinbefore), based only on the text of relevant legal provisions.
   (ii) Even though amending acts should be as straightforward to understand and as precise as possible, legislative practice does not always live up to this standard. For example, the titles of the amending acts do not help in the clustering task, as those are often called simply "Statute amending Civil Code" (and sometimes statutes amending the Code focus on different pieces of legislation at the same time and the Code may not be mentioned in their title at all) or the amending provisions can be scattered in a number of statutes that hold other substantive provisions.
  (iii) While there has been an extensive body of research pertaining to the use of machine learning in the area of law in the English language, the body of research pertaining to Polish law is obviously smaller.

2    RELATED WORK
Practising lawyers need tools that would allow them to track legal changes, especially due to the increasing activity of legislatures. For example, in the Polish legal system it has been noted a number of times that currently the legal system is undergoing the process of "inflation of law". This notion was recognized by theorists [16] and even the courts, one of which explicitly stated that the legislature is currently multiplying the numbers of unnecessary statutes, which makes accessing ... sources of law difficult [6].

The problem of orientation in a dynamically changing system of statutes can be mitigated to a degree by the introduction of consolidated texts of acts. In practice, in Poland, the process of consolidation of legal texts is two-fold. On the one hand, there are official consolidated texts of legal acts published by the authorities. In practice, however, those are seldom used. Lawyers routinely use the legal databases and search engines that are developed by private companies (legal information systems) instead. Currently, the market remains split between C.H. Beck, developer of the Legalis information system, and Wolters Kluwer Polska, with their Lex system. The editorial offices of both of these systems carefully analyse every amending act and issue their versions of the consolidated text. Obviously, the consolidated texts published by those privately-owned enterprises do not have a formal force of law, yet the convenience offered by them makes those closed and paid platforms a go-to solution for professionals. As far as the recognition of amendments goes, both of these systems offer, inter alia, a clear diff-like view of the legislative history of a given legal provision (Fig. 2).

However, those solutions do not employ any form of graphical presentation of amendments. In fact, artificial intelligence methods are used sparsely in those types of software: for example, the consolidated versions of statutes are created mainly by hand [7]. Therefore this research, independent of the aforementioned commercial solutions, aims to look into means of extending already existing systems.

As far as the analysis of amending acts in the AI and Law community goes, the focus up to this time was mainly on the automatic


Figure 1: Heatmap showing the number of times each section of the Polish Civil Code was amended. All sections were sorted
sequentially by their numbers. Inspired by the traditional division put forth by the 19th-century German school of Pandectists,
the Code is divided into four books - the starting articles for those books were marked for reference.


Figure 2: Diff view of an amended statute in the Legalis system. Additions are in blue and underlined, while deletions are in red and crossed out. The competing Lex system offers a similar view. See Table 1 for the English translation of the passage.


consolidation of legal texts. For example, the authors of [15] created a tool for semiautomatic implementation of amending acts. A similar subject was undertaken in [1], in which the feasibility of using an SGML-based engine for amendment processing was explored. Dually, in [2] a drafting environment was prototyped, which generated amending acts based on amendments introduced by a drafter into a principal act.

Whilst this research uses word embedding techniques as well as older TF-IDF-based methods, the feasibility of using word embeddings in eDiscovery procedures has in fact already been explored. In [18] the Disco system is described, which uses word2vec word embeddings to help a legal expert refine her document database search queries. In Poland, a doc2vec model was already used in SAOS, a Polish courts' judgment analysis system, as a basis for a similarity analysis module [4]. [8] focused on the explainability of AI methods and supplemented text similarity measures (based on TF-IDF and word embeddings) with a metric showing how much each word contributes to the overall similarity result when comparing text phrases. K-means clustering, employed in this research, was used with, inter alia, embedding-based methods for grouping controversial issues that were extracted from Chinese legal texts [17]. Similarly, other authors clustered documents regarding Chinese criminal cases [5].

3   METHODS
In pursuit of the aim outlined in the preceding section, a pipeline of existing tools has been created, with all of them instrumented by Python programs. Python 3.6.8 from Anaconda was used for text processing instrumentation, as well as: gensim 3.4.0 for TF-IDF and embeddings calculations, scikit-learn 0.20.2 for clustering, pandas 0.24.0 for data manipulation, nltk 3.4 for text processing and matplotlib 3.0.2 for visualization.

The text processing pipeline can be divided into the following phases:
    • The generation phase involves reading the consolidated versions of a given statute and extracting the differences between each successive version. In this phase a textual representation of changes, similar to that shown in Fig. 2, is created. The amending acts are not directly processed: this problem, while itself interesting, is out of the scope of this paper. Usable diffs can be created using the Linux wdiff command. In fact, for the purpose of this study, a number of diff-generating tools were tested, yet wdiff seemed to be best suited for our needs, offering the clearest results (Table 1 can be consulted for examples of differences between the output of various diff-generating tools).
      The extraction of diffs allowed the creation of three different amendment corpora. For their detailed description and examples, Table 2 should be consulted. The first corpus version (C1) consisted of the complete text of given legal sections after amending; the second version (C2) included only the words that were inserted into a given legal section. However, neither of these corpora included the texts that were deleted by an amending act. Yet the provisions or parts of them that were struck down can carry at least the same amount of semantic meaning as those that were left untouched or added by the legislature. Moreover, in contemporary legal systems, legislative action is not the only means of changing the statute. For example, in Poland, the Constitutional Tribunal was called a "negative legislator". This term means that, in principle, the Tribunal is unable to amend a given legal act by adding some provisions, yet is perfectly capable of striking a given provision down. While this position is overly simplistic (as the Tribunal in practice was able to pass, inter alia, interpretative judgments, in which it concludes that a given provision is in accordance with the Constitution as long as its interpretation is in line with the one put forth by the Tribunal [19]), we should be able to include in our clustering endeavour the effects of removal of a given statutory provision. To achieve this aim, for the purpose of this study, a third version of the corpus (C3) included the parts of the legal provisions that were inserted by the amending acts alongside the deleted ones. The disadvantage of this technique is that it distorts the natural flow of the text

Table 1: Differences between the old and new versions of a given legal provision (section 78¹ § 1 of the Civil Code, as amended by the amending act published in Dz.U. [Journal of Laws] from 2016, item 1579), as shown by different implementations of diff (for illustrative purposes the English translation of the original Polish passage is used; see footnote 1).

 Diff command              Result
 used
 wdiff                     § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a declaration of intent in
                           electronic form and provide it with a secure electronic signature verified with a valid qualified certificate. electronic
                           signature.
 diff                      § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a declaration of intent in
                           electronic form and provide it with a secure electronic signature verified with a valid qualified certificate.
                           § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a declaration of intent in
                           electronic form and provide it with a qualified electronic signature.
 difflib (Python           § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a declaration of intent in
 library)                  electronic form and provide it with a secquralifiedelectronic signature verified with a valid qualified certificate.
 simplediff                § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a declaration of intent in
 (Python library)          electronic form and provide it with a secquralifiedelectronic signature verified with a valid qualified certificate.

Table 2: Corpus types used in downstream processing.

 Symbol       Description                           Example from the Civil Code
    C1        Legal section’s text after            § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a
              amendments                            declaration of intent in electronic form and provide it with a qualified electronic signature.
    C2        Only words inserted by the            electronic signature
              amending act
    C3        Using crossed out parts of a          § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a
              given section alongside the           declaration of intent in electronic form and provide it with a secure electronic signature
              inserted ones                         verified with a valid qualified certificate electronic signature.
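The three corpus variants of Table 2 can be illustrated with a short sketch. It is not the pipeline used in this study, which relied on the wdiff command; here Python's standard difflib.SequenceMatcher is applied to whitespace-separated words as a stand-in, and the function name build_corpora is hypothetical. Because the word alignment may differ from the one wdiff produces, the resulting C2 and C3 strings need not match the examples in Table 2 exactly.

```python
import difflib


def build_corpora(old_text, new_text):
    """Return illustrative C1, C2 and C3 variants for one amended section."""
    old_words = old_text.split()
    new_words = new_text.split()
    # Compare word sequences rather than characters (stand-in for wdiff).
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    inserted = []  # words added by the amending act (C2)
    merged = []    # deleted and inserted words kept side by side (C3)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Old words are always kept: unchanged ones and struck-out ones alike.
        merged.extend(old_words[i1:i2])
        if tag != "equal":
            # 'insert' or 'replace' contribute new words; 'delete' adds none.
            merged.extend(new_words[j1:j2])
            inserted.extend(new_words[j1:j2])

    return {
        "C1": new_text,               # full text after the amendment
        "C2": " ".join(inserted),     # inserted words only
        "C3": " ".join(merged),       # inserted words alongside deleted ones
    }


old = ("provide it with a secure electronic signature "
       "verified with a valid qualified certificate.")
new = "provide it with a qualified electronic signature."
corpora = build_corpora(old, new)
for name, text in corpora.items():
    print(name, "->", text)
```

The C3 string interleaves struck-out and inserted words, which makes the loss of natural word order discussed in the text easy to see.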


1 The English translation of the amended text comes from the Legalis legal information system, which in turn references The Polish Law Collection database by Translegis publishing house. The crossed-out sections were translated from Polish to English by the authors of this paper.

      and might not fare well with a paragraph embedding method that depends on the natural sequence of words in a sentence, and might be better suited for methods that employ the bag-of-words technique.
    • In the preprocessing phase these three variations of corpora were processed using the standard NLP pipeline: stopwords were removed and lemmatization was performed (using the Polish Polimorfologik dictionary [11]). As Polish is a highly inflected language, lemmatization had to be used instead of stemming. On the other hand, stopword removal and lemmatization are not always utilized with more advanced techniques of text representation, like word or paragraph embeddings. Seminal papers that introduced those techniques do not mention stemming or lemmatization at all (cf. [10]). Therefore we have decided to test the clustering algorithm with either a preprocessed corpus (i.e. with stopwords removed and lemmatization performed) or without preprocessing. Six distinct corpora for clustering were thus prepared, half of them preprocessed (those will be hereinafter denoted as C1P, C2P, C3P), half of them not (hereinafter C1¬P, C2¬P, C3¬P).
    • The processing stage involved using K-means clustering to group together similar documents from each corpus. The number of clusters, for the sake of the experiments, was set to 10. The visualization module uses PCA to display the clustering results.
      The following methods were used to generate document vectors as a basis of clustering:
      – TF-IDF, which used corpora C1P, C2P, C3P as well as C1¬P, C2¬P, C3¬P.
      – word2vec, using the same corpora as TF-IDF. We have used pretrained word embeddings for this part, which were generated for Polish by other research groups [12]. Those were based on the National Corpus of Polish database (built using excerpts from newspapers, magazines, text extracted from the internet as well as conversation transcripts) [14], in addition to the Wikipedia database. Two versions of the word embeddings were put under scrutiny, both holding forms for all parts of speech in Polish, with vectors consisting of 300 elements. Both models were trained using the negative sampling algorithm and differed in architecture: one used CBOW, the other Skip-Gram (hereinafter those will be denoted as word2vec(CBOW)

Table 3: Internal evaluation results of different text representation methods and corpora (bold numbers represent the best
results)

 Silhouette coefficient: higher = better defined clusters
 Calinski-Harabaz index: higher = better defined clusters
 Davies-Bouldin index:   lower = clusters better separated

 Text representation    Corpus   Silhouette    Calinski-Harabaz   Davies-Bouldin
                                 coefficient
 word2vec(CBOW)         C1P      0.16           8.03              1.69
 word2vec(CBOW)         C2P      0.14           9.75              1.08
 word2vec(CBOW)         C3P      0.1            5.72              1.65
 word2vec(CBOW)         C1¬P     0.17           9.74              1.15
 word2vec(CBOW)         C2¬P     0.23          10.77              0.54
 word2vec(CBOW)         C3¬P     0.09           6.28              1.66
 word2vec(skip-gram)    C1P      0.16          10.27              1.28
 word2vec(skip-gram)    C2P      0.29          12.01              0.99
 word2vec(skip-gram)    C3P      0.13           6.04              1.56
 word2vec(skip-gram)    C1¬P     0.18          14.17              1.35
 word2vec(skip-gram)    C2¬P     0.38          15.65              0.7
 word2vec(skip-gram)    C3¬P     0.11           6.47              1.67
 doc2vec                C1P      0.11           3.17              1.61
 doc2vec                C1¬P     0.09           2.5               1.7
 TF-IDF                 C1P      0.12           2.87              1.52
 TF-IDF                 C2P      0.05           2.12              2.97
 TF-IDF                 C3P      0.09           2.7               2.8
 TF-IDF                 C1¬P     0.08           2.25              1.5
 TF-IDF                 C2¬P     0.05           1.52              1.16
 TF-IDF                 C3¬P     0.08           2.18              2.8
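To make the first metric in Table 3 concrete, the following is a minimal, dependency-free sketch of the silhouette coefficient. It is an illustration only: the study itself relied on scikit-learn [13], and the function name, the toy points and the two labelings below are assumptions made for this example. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the point's score is (b - a) / max(a, b); the coefficient is the mean over all points.

```python
from math import dist  # Euclidean distance, available in Python 3.8+
from statistics import mean


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points; expects at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not change the sum)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(mean(dist(point, q) for q in members)
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


# Toy document vectors: two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_coefficient(points, [0, 0, 1, 1])  # groups kept together
bad = silhouette_coefficient(points, [0, 1, 0, 1])   # groups split apart
print(good, bad)
```

A labeling that follows the two groups scores close to 1, while one that cuts across them scores below 0, which is the sense in which higher values in Table 3 indicate better defined clusters.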


        and word2vec(skip-gram)). As word2vec holds embeddings for single words, a vector summarizing each document in a corpus was created by averaging the word vectors of all the words present in a given amending act.
      – paragraph vectors (with gensim's doc2vec implementation). Here the model was trained using commentaries to the Polish Civil Code. The texts of the commentaries were divided into 44,518 paragraphs, each consisting on average of 50 words. This corpus was used to create paragraph embeddings. The following parameters were used for embeddings generation: vector size = 1600, window = 10, training epochs = 20, training algorithm = PV-DBOW. Their values were determined experimentally. The C1P and C1¬P corpora were the only ones that keep the natural flow of the text; they were the only ones tested with doc2vec embeddings.
    • The results were put under scrutiny in the evaluation stage. There are in general two main types of verification metrics for clustering algorithms. Firstly, internal evaluation considers not a given ground truth, but the model itself. Metrics for internal evaluation presented herein include: the silhouette coefficient and the Calinski-Harabaz index (both evaluate how well the clusters are defined) as well as the Davies-Bouldin index (which assesses the separation between clusters) [13].
      The external evaluation methods compare machine-generated clusters with some pre-existing gold standard, thus allowing the introduction of standard measures of precision, recall or the F-score. However, the creation of such a standard in the context of this research is not a straightforward task. The obvious method of creating it involves the classification of existing data by a legal expert. There are however a number of concerns regarding this method. Firstly, it is necessarily subjective. Secondly, machine learning methods are conceived as means to discover latent patterns existing in the data that are easily missed by humans (cf. [3]). Using a human-generated gold standard therefore defeats the purpose of using machine learning methods in the first place. Thirdly, putting the subjectivity aside, the creation of such a gold standard is a cumbersome and tiresome task. Unfortunately, we did not have enough resources to pursue that avenue of inquiry further. For external evaluation we have therefore settled on qualitative methods of evaluation in place of quantitative ones. The clustering results, after being generated, were assessed for their distinctiveness by a human actor and the best ones were selected. The grading procedure called for each result set to be reviewed and scored on a 1-10 scale based on the subjective impression of the quality of the results. Qualities such as the thematic homogeneity of clustered amendments, as well as their distinctiveness, were accounted for in this procedure. The relative sizes of each cluster were also considered (for example, results in which a single cluster held over 75% of all amendments were considered not very useful).

Figure 3: Clusters of amendments to the Civil Code as generated by the word2vec(CBOW) model, using the C1P corpus. Color-coded dots represent clusters of amending acts. Cluster types were determined by a human actor.

4   RESULTS
The internal evaluation results of clustering are shown in Table 3. Generally, the word2vec(skip-gram) model achieved the best results as far as the internal evaluation is concerned, and the model worked best when it was run with the C2¬P corpus. It scored the best in terms of silhouette coefficient and Calinski-Harabaz index and well in terms of Davies-Bouldin index. Whilst preprocessing was rather detrimental to the quality of internal evaluation of results in the


case of various word embedding implementations, in the case of the TF-IDF metric it allowed an increase of the aforementioned quality.

In the case of external evaluation, the word2vec(CBOW) model with the C1P corpus was ranked the highest, even though the internal evaluation results might not have pointed to that. Fig. 3 shows the visualization of the clusters as generated by this model. The results suggest that contemporary word embedding methods should be considered when preparing a clustering legal assistant. The data preprocessing phase does not have to include lemmatization, stemming or stopword removal. However, the creation of a training set and the training itself remain a computationally intensive challenge.

5     CONCLUSION AND FUTURE WORK
We have shown a proof-of-concept system capable of providing a lawyer with a visual representation of legal change. The work presented herein was concerned with the Civil Code; however, other areas of law (e.g. criminal law) should be put under scrutiny as well. Similarly, as far as the created word embeddings are concerned, we should try to create ones that use larger training sets or are more domain-oriented. Whilst this paper used traditional and well-understood methods for clustering and data dimensionality reduction, more modern techniques should be tested as well.

This work has been based on the legal change as caused by the amending acts. It should be noted that this extremely positivist (or formalistic) point of view should be supplemented with more general notions, in which the statutes themselves do not change, but the practice of officials (e.g. judges) who apply given laws does. Two examples of such practices may be given, one stemming from the practice of the Polish legal system, the other based on the European human rights protection system. As for the former, we have already mentioned that the Polish Constitutional Tribunal sometimes resorts to pointing out that there exists a certain interpretation of the statute that makes it compatible with the constitutional provisions. As for the latter, the European Court of Human Rights has on a number of occasions called the European Convention on Human Rights a "living instrument" and has stressed that its provisions, even if unchanged, should always be interpreted in the light of present circumstances [9]. Therefore a support system should be able to recognize the change in practice as well, which itself is a challenging problem.

This work, which is concerned with legislative change, should therefore be viewed in the light of the broader subject of legal change. In future work we aim to employ machine learning techniques to discover and visualize changes stemming not only from the actions of the legislature, but also from those of other legal actors.

REFERENCES
 [1] Timothy Arnold-Moore. 1995. Automatically processing amendments to legislation. In Proceedings of the 5th international conference on Artificial intelligence and law. ACM, 297–306.
 [2] Timothy Arnold-Moore. 1997. Automatic generation of amendment legislation. In Proceedings of the 6th international conference on Artificial intelligence and law. ACM, 56–62.
 [3] Kevin D Ashley. 2017. Artificial intelligence and legal analytics: new tools for law practice in the digital age. Cambridge University Press.
 [4] Krystyna Chodorowska. 2018. Automatic court judgment similarity analysis. Master’s thesis. Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw.
 [5] Shihchieh Chou and Tai-Ping Hsing. 2010. Text mining technique for Chinese written judgment of criminal case. In Pacific-Asia workshop on intelligence and security informatics. Springer, 113–125.
 [6] Provincial Administrative Court in Gliwice. [n. d.]. Judgment of 12th of March 2007, IV SA/Gl 1455/06.
 [7] M. Kokoszczyński and G. Wierczyński. 2011. System informacji prawnej w pracy sędziego. Wolters Kluwer. https://books.google.pl/books?id=vmFSAwAAQBAJ
 [8] Jörg Landthaler, Ingo Glaser, and Florian Matthes. 2018. Towards Explainable Semantic Text Matching. In Legal Knowledge and Information Systems: JURIX 2018: The Thirty-first Annual Conference, Vol. 313. IOS Press, 200.
 [9] George Letsas. [n. d.]. The ECHR as a Living Instrument: Its Meaning and its Legitimacy (March 14, 2012). Available at SSRN 2021836 ([n. d.]).
[10] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[11] Marcin Miłkowski. [n. d.]. Polimorfologik. https://github.com/morfologik/polimorfologik
[12] IPI PAN. [n. d.]. Word2Vec. http://dsmodels.nlp.ipipan.waw.pl/
[13] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[14] Adam Przepiórkowski, Rafal L Górski, Barbara Lewandowska-Tomaszyk, and Marek Lazinski. 2008. Towards the National Corpus of Polish. In LREC.
[15] PierLuigi Spinosa, Gerardo Giardiello, Manola Cherubini, Simone Marchi, Giulia Venturi, and Simonetta Montemagni. 2009. NLP-based metadata extraction for legal text consolidation. In Proceedings of the 12th International Conference on Artificial Intelligence and Law. ACM, 40–49.
[16] Franciszek Studnicki. 1978. Wprowadzenie do informatyki prawniczej: zautomatyzowane wyszukiwanie informacji prawnej. Państwowe Wydawnictwo Naukowe.
[17] Xin Tian, Yin Fang, Yang Weng, Yawen Luo, Huifang Cheng, and Zhu Wang. 2018. K-Means Clustering for Controversial Issues Merging in Chinese Legal Texts. In Legal Knowledge and Information Systems - JURIX 2018: The Thirty-first Annual Conference, Groningen, The Netherlands, 12-14 December 2018. 215–219. https://doi.org/10.3233/978-1-61499-935-5-215
[18] Ngoc Phuoc An Vo, Caroline Privault, and Fabien Guillot. 2017. Experimenting word embeddings in assisting legal review. In Proceedings of the 16th edition of the International Conference on Artificial Intelligence and Law. ACM, 189–198.
[19] Tomasz Woś. 2016. Wyroki interpretacyjne i zakresowe w orzecznictwie Trybunału Konstytucyjnego. Studia Iuridica Lublinensia 25, 3 (2016), 985–995.