Towards legal change analysis: clustering of Polish Civil Code amendments

Łukasz Górski
Interdisciplinary Centre for Mathematical and Computational Modelling
University of Warsaw
lgorski@icm.edu.pl

In: Proceedings of the Third Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL 2019), June 21, 2019, Montreal, QC, Canada. © 2019 Copyright held by the owner/author(s). Copying permitted for private and academic purposes. Published at http://ceur-ws.org

ABSTRACT

Due to the growing activity of legislators, lawyers are in need of tools that would allow them to get a better understanding of an ever-growing corpus of legislative materials. Herein we propose a tool that visualizes and clusters thematically similar amending acts, allowing a lawyer to quickly review related provisions, thus giving an insight into the legislative history of a given legal institution. The methods suggested herein (based on TF-IDF, word and paragraph embeddings, PCA and k-means clustering) are evaluated on the provisions of the Polish Civil Code.

1 INTRODUCTION

This paper describes the first steps undertaken in the development of a software solution for the visualization of legal change, which aims to provide the user with means to effectively explore a database of amending acts. We aim to develop a solution which is able to group together amending acts that are thematically similar, in an unsupervised manner.

The proof-of-concept implementation studied herein has been tested using the Polish Civil Code and the relevant amending acts issued from its enactment in 1965 up to November 2018. This legal act was chosen as a basis for experiments for the following reasons:

(i) While the Code was in force, the Polish economy underwent a transformation from socialism to capitalism and later its law had to be adapted to the law of the European Union. The processes pertaining to the recognition of information and communication technologies in the domain of law were also reflected in the Code. The turbulent times in which the Code existed made it subject to almost 90 amending acts. Some of the sections composing the Civil Code were, in fact, subject to change multiple times - please consult the heat map (Fig. 1) for a graphical representation of the number of times a given legal section was amended. Therefore this research aimed to assess whether modern machine learning approaches would be able to recognize and discover discrete categories of changes (not necessarily the three mentioned hereinbefore), based only on the text of the relevant legal provisions.

(ii) Even though the amending acts should be as straightforward to understand and as precise as possible, the legislative practice does not always live up to this standard. For example, the titles of the amending acts do not help in the clustering task, as those are often called simply "Statute amending Civil Code" (and sometimes statutes amending the Code focus on different pieces of legislation at the same time and the Code may not be mentioned in their title at all), or the amending provisions can be scattered over a number of statutes that hold other substantive provisions.

(iii) While there has been an extensive body of research pertaining to the use of machine learning in the area of law in the English language, the body of research pertaining to Polish law is considerably smaller.

Figure 1: Heatmap showing the number of times each section of the Polish Civil Code was amended. All sections were sorted sequentially by their numbers. Inspired by the traditional division put forth by the 19th-century German school of Pandectists, the Code is divided into four books - the starting articles for those books were marked for reference.

2 RELATED WORK

Practising lawyers need tools that would allow them to track legal changes, especially due to the increasing activity of legislatures. For example, it has been noted a number of times that the Polish legal system is currently undergoing a process of "inflation of law". This notion was recognized by theorists [16] and even the courts, one of which explicitly stated that the legislature is currently multiplying the numbers of unnecessary statutes, which makes accessing ... sources of law difficult [6].

The problem of orientation in a dynamically changing system of statutes can be mitigated to a degree by the introduction of consolidated texts of acts. In practice, in Poland, the process of consolidation of legal texts is two-fold. On the one hand, there are official consolidated texts of legal acts published by the authorities. In practice, however, those are seldom used. Lawyers routinely use the legal databases and search engines developed by private companies (legal information systems) instead. Currently, the market remains split between C.H. Beck, developer of the Legalis information system, and Wolters Kluwer Polska, with their Lex system. The editorial offices of both of these systems carefully analyse every amending act and issue their own versions of the consolidated text. The consolidated texts published by those privately-owned enterprises do not have the formal force of law, yet the convenience they offer makes those closed and paid platforms a go-to solution for professionals. As far as the recognition of amendments goes, both of these systems offer, inter alia, a clear diff-like view of the legislative history of a given legal provision (Fig. 2).

Figure 2: Diff view of an amended statute in the Legalis system. Additions are in blue and underlined, while deletions are denoted by red and crossed out. The competing Lex system offers a similar view. See Table 1 for the English translation of the passage.

However, those solutions do not employ any form of graphical presentation of amendments. In fact, artificial intelligence methods are used sparsely in those types of software: for example, the consolidated versions of statutes are created mainly by hand [7]. Therefore this research, independent of the aforementioned commercial solutions, aims to look into means of extending already existing systems.

As far as the analysis of amending acts in the AI and Law community goes, the focus up to this time has been mainly on the automatic consolidation of legal texts. For example, the authors of [15] created a tool for semiautomatic implementation of amending acts. A similar subject was undertaken in [1], in which the feasibility of using an SGML-based engine for amendments processing was explored. Conversely, in [2] a drafting environment was prototyped, which generated amending acts based on amendments introduced by a drafter into a principal act.

Whilst this research uses word embedding techniques as well as older TF-IDF-based methods, the feasibility of using word embeddings in eDiscovery procedures has in fact already been explored. In [18] the Disco system is described, which uses word2vec word embeddings to help a legal expert with refining her document database search queries. In Poland, a doc2vec model was already used in SAOS, a Polish courts' judgment analysis system, as a basis for its similarity analysis module [4]. The authors of [8] focused on the explainability of AI methods and supplemented text similarity measures (based on TF-IDF and word embeddings) with a metric showing how much each word contributes to the overall similarity result when comparing text phrases. K-means clustering, employed in this research, was used with, inter alia, embedding-based methods for grouping controversial issues that were extracted from Chinese legal texts [17]. Similarly, other authors clustered documents regarding Chinese criminal cases [5].

3 METHODS

In pursuit of the aim outlined in the preceding section, a pipeline of existing tools has been created, with all of them instrumented by Python programs. Python 3.6.8 from Anaconda was used for text processing instrumentation, as well as: gensim 3.4.0 for TF-IDF and embeddings calculations, scikit-learn 0.20.2 for clustering, pandas 0.24.0 for data manipulation, nltk 3.4 for text processing and matplotlib 3.0.2 for visualization.

The text processing pipeline can be divided into the following phases:

• The generation phase involves reading the consolidated versions of a given statute and extracting the differences between each successive version. In this phase a textual representation of changes, similar to that shown in Fig. 2, is created. The amending acts are not directly processed: this problem, while itself interesting, is out of the scope of this paper. Usable diffs can be created using the Linux wdiff command. In fact, for the purpose of this study, a number of diff-generating tools were tested, yet wdiff seemed to be best suited for our needs, offering the clearest results (Table 1 can be consulted for examples of differences between the output of various diff-generating tools).
The extraction of diffs allowed the creation of three different corpora of amendments. For their detailed description and examples, Table 2 should be consulted. The first corpus version (C1) consisted of the complete text of given legal sections after amending; the second version (C2) included only the words that were inserted into a given legal section. However, neither of these corpora included the texts that were deleted by an amending act. Yet, the provisions or parts of them that were struck down can carry at least the same amount of semantic meaning as those that were left untouched or added by the legislature. Moreover, in contemporary legal systems, legislative action is not the only means of changing a statute. For example, in Poland, the Constitutional Tribunal has been called a "negative legislator". This term means that, in principle, the Tribunal is unable to amend a given legal act by adding some provisions, yet is perfectly capable of striking a given provision down. While this position is overly simplistic (as the Tribunal in practice has been able to pass, inter alia, interpretative judgments, in which it concludes that a given provision is in accordance with the Constitution as long as its interpretation is in line with the one put forth by the Tribunal [19]), we should be able to include in our clustering endeavour the effects of removal of a given statutory provision. To achieve this aim, for the purpose of this study, a third version of the corpus (C3) included the parts of the legal provisions that were inserted by the amending acts alongside the deleted ones. The disadvantage of this technique is that it distorts the natural flow of the text and might not fare well with a paragraph embedding method that depends on the natural sequence of words in a sentence; it might be better suited for methods that employ the bag-of-words technique.

• In the preprocessing phase these three variations of corpora were processed using the standard NLP pipeline - stopwords were removed and lemmatization was performed (using the Polish Polimorfologik dictionary [11]). As Polish is a highly inflected language, lemmatization had to be used instead of stemming. On the other hand, stopword removal and lemmatization are not always utilized with more advanced techniques of text representation, like word or paragraph embeddings. Seminal papers that introduced those techniques do not mention stemming or lemmatization at all (cf. [10]). Therefore we have decided to test the clustering algorithm both with a preprocessed corpus (i.e. with stopwords removed and lemmatization performed) and without preprocessing. Six distinct corpora for clustering were thus prepared, half of them preprocessed (hereinafter denoted as C1P, C2P, C3P), half of them not (hereinafter C1¬P, C2¬P, C3¬P).

• The processing stage involved using K-means clustering to group together similar documents from each corpus. The number of clusters, for the sake of the experiments, was set to 10. The visualization module uses PCA to display the clustering results.
The following methods were used to generate document vectors as a basis of clustering:
– TF-IDF, which used corpora C1P, C2P, C3P as well as C1¬P, C2¬P, C3¬P.
– word2vec, using the same corpora as TF-IDF. We have used pretrained word embeddings for this part, which were generated for Polish by other research groups [12]. Those were based on the National Corpus of Polish database (built using excerpts from newspapers, magazines, text extracted from the internet as well as conversation transcripts) [14], in addition to the Wikipedia database. Two versions of the word embeddings were put under scrutiny, both holding forms for all parts of speech in Polish, with vectors consisting of 300 elements. Both models were trained using the negative sampling algorithm and differed in the architecture - one used CBOW, the other the Skip-Gram architecture (hereinafter those will be denoted as word2vec(CBOW) and word2vec(skip-gram)). As word2vec holds embeddings for single words, a vector summarizing each document belonging to a corpus was created by averaging the word vectors for all the words that were present in a given amending act.
– paragraph vectors (with gensim's doc2vec implementation). Here the model was trained using commentaries to the Polish Civil Code. The texts of the commentaries were divided into 44,518 paragraphs, each consisting on average of 50 words. This corpus was used to create paragraph embeddings. The following parameters were used for embeddings generation: vector size = 1600, window = 10, training epochs = 20, training algorithm = PV-DBOW. Their values were determined experimentally. The C1P and C1¬P corpora were the only ones that keep the natural flow of the text; they were the only ones tested with doc2vec embeddings.

• The results were put under scrutiny in the evaluation stage. There are in general two main types of verification metrics for clustering algorithms. Firstly, internal evaluation considers not a given ground truth, but the model itself. Metrics for internal evaluation presented herein include: the silhouette coefficient and the Calinski-Harabasz index (both evaluate how well the clusters are defined) as well as the Davies-Bouldin index (which assesses the separation between clusters) [13].
The external evaluation methods compare machine-generated clusters with some pre-existing gold standard, thus allowing the introduction of standard measures of precision, recall or the F-score. However, the creation of such a standard in the context of this research is not a straightforward task. The obvious method of creating such a standard involves classification of the existing data by a legal expert. There are however a number of concerns regarding this method. Firstly, it is necessarily subjective. Secondly, machine learning methods are conceived as means to discover latent patterns existing in the data that are easily missed by humans (cf. [3]). Using a human-generated gold standard therefore defeats the purpose of using machine learning methods in the first place. Thirdly, putting the subjectivity aside, creation of such a gold standard is a cumbersome and tiresome task. Unfortunately, we did not have enough resources to push that avenue of inquiry further. For external evaluation we have therefore settled on qualitative methods of evaluation in place of quantitative ones. The clustering results, after being generated, were assessed for their distinctiveness by a human actor and the best ones were selected. The grading procedure called for each result set to be reviewed and scored on a 1-10 scale based on the subjective impression of the results' quality. Qualities such as the thematic homogeneity of clustered amendments, as well as their distinctiveness, were accounted for in this procedure. The relative sizes of the clusters were also considered (for example, results in which a single cluster held over 75% of all amendments were considered not very useful).

Table 1: Differences between old and new versions of a given legal provision (section 78¹ § 1 of the Civil Code, as amended by the amending act published in Dz.U. [Journal of Laws] from 2016, item 1579), as shown by different implementations of diff (for illustrative purposes the English translation of the original Polish passage is used; see footnote 1).

Diff command used: wdiff
Result: § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a declaration of intent in electronic form and provide it with a secure electronic signature verified with a valid qualified certificate. electronic signature.

Diff command used: diff
Result: § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a declaration of intent in electronic form and provide it with a secure electronic signature verified with a valid qualified certificate. § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a declaration of intent in electronic form and provide it with a qualified electronic signature.

Diff command used: difflib (Python library)
Result: § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a declaration of intent in electronic form and provide it with a secquralifiedelectronic signature verified with a valid qualified certificate.

Diff command used: simplediff (Python library)
Result: § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a declaration of intent in electronic form and provide it with a secquralifiedelectronic signature verified with a valid qualified certificate.

Footnote 1: The English translation of the amended text comes from the Legalis legal information system, which in turn references The Polish Law Collection database by the Translegis publishing house. The crossed-out sections were translated from Polish to English by the authors of this paper.

Table 2: Corpus types for further processing down the line.

Symbol: C1
Description: Legal section's text after amendments
Example from the Civil Code: § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a declaration of intent in electronic form and provide it with a qualified electronic signature.

Symbol: C2
Description: Only words inserted by the amending act
Example from the Civil Code: electronic signature

Symbol: C3
Description: Using crossed out parts of a given section alongside the inserted ones
Example from the Civil Code: § 1. In order to observe the electronic form of an act in law it shall be sufficient to make a declaration of intent in electronic form and provide it with a secure electronic signature verified with a valid qualified certificate electronic signature.

4 RESULTS

The internal evaluation results of clustering are shown in Table 3. Generally, the word2vec(skip-gram) model achieved the best results as far as the internal evaluation is concerned, and the model worked best when it was run with the C2¬P corpus. It scored the best in terms of the silhouette coefficient and the Calinski-Harabasz index and well in terms of the Davies-Bouldin index. Whilst preprocessing was rather detrimental to the internal evaluation results in the case of the various word embedding implementations, in the case of TF-IDF it allowed an increase in the aforementioned quality.

Table 3: Internal evaluation results of different text representation methods and corpora (an asterisk marks the best result for each metric). Silhouette coefficient and Calinski-Harabasz index: higher = better defined clusters. Davies-Bouldin index: lower = clusters better separated.

Text representation    Corpus   Silhouette   Calinski-Harabasz   Davies-Bouldin
word2vec(CBOW)         C1P      0.16         8.03                1.69
word2vec(CBOW)         C2P      0.14         9.75                1.08
word2vec(CBOW)         C3P      0.1          5.72                1.65
word2vec(CBOW)         C1¬P     0.17         9.74                1.15
word2vec(CBOW)         C2¬P     0.23         10.77               0.54*
word2vec(CBOW)         C3¬P     0.09         6.28                1.66
word2vec(skip-gram)    C1P      0.16         10.27               1.28
word2vec(skip-gram)    C2P      0.29         12.01               0.99
word2vec(skip-gram)    C3P      0.13         6.04                1.56
word2vec(skip-gram)    C1¬P     0.18         14.17               1.35
word2vec(skip-gram)    C2¬P     0.38*        15.65*              0.7
word2vec(skip-gram)    C3¬P     0.11         6.47                1.67
doc2vec                C1P      0.11         3.17                1.61
doc2vec                C1¬P     0.09         2.5                 1.7
TF-IDF                 C1P      0.12         2.87                1.52
TF-IDF                 C2P      0.05         2.12                2.97
TF-IDF                 C3P      0.09         2.7                 2.8
TF-IDF                 C1¬P     0.08         2.25                1.5
TF-IDF                 C2¬P     0.05         1.52                1.16
TF-IDF                 C3¬P     0.08         2.18                2.8

In the case of external evaluation, the word2vec(CBOW) model with the C1P corpus was ranked the highest, even though the internal evaluation results might not have pointed to that. Fig. 3 shows the visualization of the clusters as generated by this model. The results suggest that contemporary word embedding methods should be considered when preparing a clustering legal assistant. The data preprocessing phase does not have to include lemmatization, stemming or stopword removal. However, the creation of a training set and the training itself remain a computationally intensive challenge.

Figure 3: Clusters of amendments to the Civil Code as generated by the word2vec(CBOW) model, using the C1P corpus. Color-coded dots represent clusters of amending acts. Cluster types were determined by a human actor.

5 CONCLUSION AND FUTURE WORK

We have shown a proof-of-concept system capable of providing a lawyer with a visual representation of legal change. The work presented herein was concerned with the Civil Code; however, other areas of law (e.g. criminal law) should be put under scrutiny as well. Similarly, as far as the created word embeddings are concerned, we should try to create ones that use larger training sets or are more domain-oriented. Whilst this paper used traditional and well-understood methods for clustering and data dimensionality reduction, more modern techniques should be tested as well.

This work has been based on legal change as caused by the amending acts. It should be noted that this extremely positivist (or formalistic) point of view should be supplemented with more general notions, in which the statutes themselves do not change, but the practice of officials (e.g. judges) who apply given laws does. Two examples of such practices may be given, one stemming from the practice of the Polish legal system, the other based on the European human rights protection system. As for the former, we have already mentioned that the Polish Constitutional Tribunal sometimes resorts to pointing out that there exists a certain interpretation of the statute that makes it compatible with the constitutional provisions. As for the latter, the European Court of Human Rights has on a number of occasions called the European Convention on Human Rights a "living instrument" and has stressed that its provisions, even if unchanged, should always be interpreted in the light of present circumstances [9]. Therefore a support system should be able to recognize the change in practice as well, which itself is a challenging problem. This work, which is concerned with legislative change, should therefore be viewed in the light of the broader subject of legal change. In future work we aim to employ machine learning techniques to discover and visualize changes stemming not only from the actions of the legislature, but also from those of other legal actors.

REFERENCES

[1] Timothy Arnold-Moore. 1995. Automatically processing amendments to legislation. In Proceedings of the 5th international conference on Artificial intelligence and law. ACM, 297–306.
[2] Timothy Arnold-Moore. 1997. Automatic generation of amendment legislation. In Proceedings of the 6th international conference on Artificial intelligence and law. ACM, 56–62.
[3] Kevin D Ashley. 2017. Artificial intelligence and legal analytics: new tools for law practice in the digital age. Cambridge University Press.
[4] Krystyna Chodorowska. 2018. Automatic court judgment similarity analysis. Master's thesis. Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw.
[5] Shihchieh Chou and Tai-Ping Hsing. 2010. Text mining technique for Chinese written judgment of criminal case. In Pacific-Asia workshop on intelligence and security informatics. Springer, 113–125.
[6] Provincial Administrative Court in Gliwice. [n. d.]. Judgment of 12th of March 2007, IV SA/Gl 1455/06.
[7] M. Kokoszczyński and G. Wierczyński. 2011. System informacji prawnej w pracy sędziego. Wolters Kluwer. https://books.google.pl/books?id=vmFSAwAAQBAJ
[8] Jörg Landthaler, Ingo Glaser, and Florian Matthes. 2018. Towards Explainable Semantic Text Matching. In Legal Knowledge and Information Systems: JURIX 2018: The Thirty-first Annual Conference, Vol. 313. IOS Press, 200.
[9] George Letsas. [n. d.]. The ECHR as a Living Instrument: Its Meaning and its Legitimacy (March 14, 2012). Available at SSRN 2021836 ([n. d.]).
[10] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[11] Marcin Miłkowski. [n. d.]. Polimorfologik. https://github.com/morfologik/polimorfologik
[12] IPI PAN. [n. d.]. Word2Vec. http://dsmodels.nlp.ipipan.waw.pl/
[13] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[14] Adam Przepiórkowski, Rafal L Górski, Barbara Lewandowska-Tomaszyk, and Marek Lazinski. 2008. Towards the National Corpus of Polish. In LREC.
[15] PierLuigi Spinosa, Gerardo Giardiello, Manola Cherubini, Simone Marchi, Giulia Venturi, and Simonetta Montemagni. 2009. NLP-based metadata extraction for legal text consolidation. In Proceedings of the 12th International Conference on Artificial Intelligence and Law. ACM, 40–49.
[16] Franciszek Studnicki. 1978. Wprowadzenie do informatyki prawniczej: zautomatyzowane wyszukiwanie informacji prawnej. Państwowe Wydawnictwo Naukowe.
[17] Xin Tian, Yin Fang, Yang Weng, Yawen Luo, Huifang Cheng, and Zhu Wang. 2018. K-Means Clustering for Controversial Issues Merging in Chinese Legal Texts. In Legal Knowledge and Information Systems - JURIX 2018: The Thirty-first Annual Conference, Groningen, The Netherlands, 12-14 December 2018. 215–219. https://doi.org/10.3233/978-1-61499-935-5-215
[18] Ngoc Phuoc An Vo, Caroline Privault, and Fabien Guillot. 2017. Experimenting word embeddings in assisting legal review. In Proceedings of the 16th edition of the International Conference on Artificial Intelligence and Law. ACM, 189–198.
[19] Tomasz Woś. 2016. Wyroki interpretacyjne i zakresowe w orzecznictwie Trybunału Konstytucyjnego. Studia Iuridica Lublinensia 25, 3 (2016), 985–995.
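The generation phase described in Section 3 turns a word-level diff into the three corpus variants C1, C2 and C3. A minimal sketch of that step is given here, assuming the diff is available in wdiff's default markup (deletions wrapped in [- -], insertions in {+ +}). The function name build_corpora, the helper _squash and the sample string are hypothetical and chosen only for illustration. The paper does not describe its own extraction code, and C3 is interpreted here as in Table 2, i.e. the section text with deleted and inserted parts both kept in place.

```python
import re

# wdiff's default markers: [-deleted-] and {+inserted+}
DELETED = re.compile(r"\[-(.*?)-\]", re.DOTALL)
INSERTED = re.compile(r"\{\+(.*?)\+\}", re.DOTALL)


def _squash(text):
    """Collapse the whitespace runs left behind after removing markup."""
    return " ".join(text.split())


def build_corpora(wdiff_output):
    """Return the C1, C2 and C3 variants for one amended section.

    Assumption: wdiff_output uses wdiff's default markup.
    """
    # C1: text after the amendment (drop deletions, unwrap insertions).
    c1 = INSERTED.sub(r"\1", DELETED.sub("", wdiff_output))
    # C2: only the words inserted by the amending act.
    c2 = " ".join(INSERTED.findall(wdiff_output))
    # C3: deleted and inserted parts both kept, markup removed.
    c3 = INSERTED.sub(r"\1", DELETED.sub(r"\1", wdiff_output))
    return {"C1": _squash(c1), "C2": _squash(c2), "C3": _squash(c3)}


# Hypothetical wdiff output, loosely modelled on the passage in Table 1.
sample = (
    "provide it with a [-secure-] {+qualified+} electronic "
    "[-signature verified with a valid qualified certificate.-] {+signature.+}"
)

if __name__ == "__main__":
    for name, text in build_corpora(sample).items():
        print(name, "->", text)
```

Keeping the three variants in one function makes it easy to see why C3 distorts the natural word order: deleted and inserted fragments end up adjacent in a single string, which suits bag-of-words representations better than paragraph embeddings.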