=Paper=
{{Paper
|id=Vol-3033/paper26
|storemode=property
|title=A First Step Towards Automatic Consolidation of Legal Acts: Reliable Classification of Textual Modifications
|pdfUrl=https://ceur-ws.org/Vol-3033/paper26.pdf
|volume=Vol-3033
|authors=Samuel Fabrizi,Maria Iacono,Andrea Tesei,Lorenzo De Mattei
|dblpUrl=https://dblp.org/rec/conf/clic-it/FabriziITM21
}}
==A First Step Towards Automatic Consolidation of Legal Acts: Reliable Classification of Textual Modifications==
Samuel Fabrizi, Maria Iacono, Andrea Tesei and Lorenzo De Mattei

Aptus.AI / Pisa, Italy

{samuel,maria,andrea,lorenzo}@aptus.ai

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===Abstract===

The automatic consolidation of legal texts, i.e. the integration of a text with its successive amendments and corrigenda, might have an important practical impact on public institutions, citizens and organizations. This process involves two steps: a) the classification of the textual modifications in amendment acts and b) the integration of such modifications within a single document. In this work we propose a methodology to solve step a) by exploiting Machine Learning and Natural Language Processing techniques on the Italian versions of European Regulations. Our results suggest that the proposed methodology is a reliable first milestone towards the automatic consolidation of legal texts.

===1 Introduction===

Consolidation consists of the integration in a legal act of its successive amendments and corrigenda (Eur-Lex, About consolidation, https://bit.ly/2VFyGhv). Consolidated texts are very important for legal practitioners, but their maintenance is a tedious task. Some regulatory publishers, such as Normattiva (https://www.normattiva.it/), provide continuously updated consolidated texts; others, such as Eur-Lex (https://eur-lex.europa.eu/), do so only from time to time; some do not provide them at all. Automating this process would let institutions save resources and practitioners access continuously updated consolidated documents, and it would make it easier for organizations to stay compliant with the applicable regulations.

The consolidation process involves two main steps: a) the identification and classification of the textual modifications in amendment acts; b) the integration within a single document of the textual modifications identified in the previous step. The first step can be expressed as the automatic classification of textual modifications inside a legal document. In this work, we focus on step a).

Several authors have tried to solve this task using standard Natural Language Processing (NLP) techniques. Ogawa et al. (2008) showed that the amendment clauses of Japanese statutes can be formalized in terms of sixteen regular expressions. Lesmo et al. (2009) tried to identify and classify integrations, substitutions and deletions using a three-step approach: 1) prune text fragments that do not convey relevant information, 2) perform the syntactic analysis of the retrieved sentences, 3) semantically annotate the provision with a rule-based approach operating on the parse tree; in this last step, they also used a knowledge base describing the provisions taxonomy (Arnold-Moore, 1997). A legislative provision represents the meaning of a part of a law from a legal point of view; obligations, definitions and modifications are specific types of provision. Brighi et al. (2008) and Spinosa et al. (2009) followed a similar approach: in both cases, semantic analysis is carried out on the syntactically pre-processed text using a rule-based approach, and the difference lies in the starting point of the semantic analysis, since the former's system relied on a deep semantic analysis of the textual modifications while the latter started from the shallow syntactically parsed text. Garofalakis et al. (2016) presented a semi-automatic system, based on regular expressions, for the consolidation of Greek legislative texts. Francesconi and Passerini (2007) defined a module that automatically classifies paragraphs into provision types: each paragraph is represented as a bag of words, either with TF-IDF weighting (Salton and Buckley, 1988) or with binary weights, and the authors presented an experimental comparison of the different representations using Naive Bayes and Multiclass Support Vector Machine (MSVM) models.

This paper describes our approach to the classification of textual modifications, namely substitutions, additions, repeals and abolitions. The proposed approach is based on standard statistical NLP techniques (Manning and Schütze, 1999). Our method involves i) the use of XML-based standards for the annotation of legislative documents, ii) the construction of the dataset by assigning a label to each word according to the tagging format used, and iii) the implementation of NLP models to identify and classify textual modifications. We carried out a systematic comparison among several feature extraction techniques and models. The main contribution of this paper is the application of machine learning models to the classification of textual modifications. In contrast to rule-based or regular-expression techniques, our models do not need expert knowledge of the application domain: they learn the formulas used to introduce a textual modification without requiring an explicit definition of all such formulas. Our approach leads to lower maintenance costs and, hopefully, to increased robustness of the system.

===2 Data===

We extracted the data from Daitomic (https://www.daitomic.com/), a product that contains all the regulations from a set of legal sources, automatically encoded in the Akoma Ntoso standard format (Palmirani and Vitali, 2011). From this product we collected all the Italian versions of the amendment documents originally extracted from Eur-Lex, and we randomly sampled 260 legal documents for manual labelling.

Following the Eur-Lex web service specifications (Eur-Lex, How to use the webservice?, https://bit.ly/393qt9Z), we identified seven different types of textual modifications:

* replacement annotates a substitution, which may concern a part of a sentence (expression, word, date, amount) or a whole subdivision of the document (article, paragraph, indent). Usually, this type of textual modification also includes the following subcategories:
** from annotates the replaced words ("novellando");
** to annotates the words that replace the previous ones ("novella").
* replacement ref is a type of replacement used to handle textual modifications that involve attachments.
* addition annotates textual modifications that add or complete a part of a legal document.
* repeal indicates the removal or reversal of a law, invalidating its provisions altogether.
* abolition indicates the removal of a part of a law, replacing it with an updated, amended or related law. This textual modification can involve single words or whole subdivisions, as in replacements.

{| class="wikitable"
|+ Table 1: Total number of textual modifications for each category
! Category !! Total
|-
| replacement || 308
|-
| from || 95
|-
| to || 95
|-
| replacement ref || 34
|-
| addition || 96
|-
| repeal || 93
|-
| abolition || 92
|}

{| class="wikitable"
|+ Table 2: Annotation examples
! Category !! Example
|-
| replacement || All'articolo 7 della decisione 2005/692/CE, la data del ≪31 dicembre 2010≫ è sostituita da ≪30 giugno 2012≫
|-
| replacement ref || L'allegato II al regolamento (CE) n. 998/2003 è sostituito dal testo dell'allegato al presente regolamento.
|-
| addition || È aggiunto il seguente allegato: "ALLEGATO III [...]"
|-
| repeal || Il regolamento (CEE) n. 160/88 è abrogato.
|-
| abolition || nel titolo i termini "raccolti nel 1980" sono soppressi
|}

Table 2 reports an example for each of the mentioned categories, and Table 1 shows the total number of textual modifications per category. The number of replacement examples is greater than that of the other modification types because substitutions can be introduced by several different formulas, each determining a specific meaning. Indeed, a preliminary experiment showed a relationship of proportionality between the number of formulas used to introduce a type of textual modification and the number of examples needed to train the models; for this reason, each category required a different number of training examples.

Given the differences among the modification types, we preferred to split the original problem into five subtasks, namely:

# replacement classification, which also covers the replacement ref category;
# addition classification;
# repeal classification;
# abolition classification;
# from to classification.

The manual annotation consisted in assigning to each token of the selected documents, for each subtask, one label indicating whether or not it belongs to a textual modification. We defined three different tagging formats: Inside-Outside-Beginning (IOB), Inside-Outside (IO) and Limit-Limit (LL). The first two are standard (Breckbaldwin, Coding Chunkers as Taggers: IO, BIO, BMEWO, and BMEWO+, https://bit.ly/3DzuqBc); the last one uses the prefix "L-" to mark a token as either the beginning or the end of a textual modification. We adopted a specific tagging format for each model based on our preliminary results: the tagging format turned out to be one of the most critical choices for improving model performance. The sketch below illustrates the three formats.
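To make the formats concrete, the following is a minimal illustrative sketch (not the authors' code) of how a replacement span could be labelled under the three formats. The paper only specifies the "L-" prefix for the boundary tokens of the LL format; how its interior tokens are tagged is our assumption.

<syntaxhighlight lang="python">
# Illustrative sketch: labelling one replacement span under IOB, IO and LL.
tokens = ["la", "data", "del", "QUOTES_TEXT", "è", "sostituita", "da", "QUOTES_TEXT"]
start, end = 0, 7  # the whole sentence is one replacement span (inclusive bounds)

def tag(n_tokens, start, end, fmt, label="replacement"):
    tags = []
    for i in range(n_tokens):
        if i < start or i > end:
            tags.append("O")  # outside any textual modification
        elif fmt == "IO":
            tags.append("I-" + label)
        elif fmt == "IOB":
            tags.append(("B-" if i == start else "I-") + label)
        else:  # LL: "L-" marks the two boundaries; interior "I-" is our assumption
            tags.append(("L-" if i in (start, end) else "I-") + label)
    return tags

for fmt in ("IOB", "IO", "LL"):
    print(fmt, tag(len(tokens), start, end, fmt))
</syntaxhighlight>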
The dataset used for the last subtask is different. Since the from and to tags are always enclosed within replacement tags, none of our tagging formats could be used directly, because their syntax does not permit nesting (Dai, 2018). We therefore changed the dataset itself: we considered only the tokens inside the sentences representing a replacement and tagged them using the aforementioned formats, thus avoiding the nesting issue.

====2.1 Preprocessing====

Each model needs a different preprocessing method for the raw text of the legal documents, depending on the feature extractor used. Only a few preprocessing operations are common to all models:

# substitution of the special characters ≪ and ≫ with quote marks;
# substitution of the words between quote marks with the special token QUOTES_TEXT. This step allowed us to limit the number of tokens in each paragraph: the words between quote marks often constitute a whole article (for example, the one to be substituted or added), and since they are redundant for our task we replaced them with a single special token, which improved the performance of all models. In the from to subtask, however, we avoided substituting the quoted text, as keeping it led to a performance improvement.

A minimal sketch of these two operations follows.
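The exact regular expressions and the ≪/≫ code points below are our assumptions, not the authors' code.

<syntaxhighlight lang="python">
import re

def preprocess(text, substitute_quoted=True):
    # 1) replace the special characters ≪ and ≫ with ordinary quote marks
    text = text.replace("\u226a", '"').replace("\u226b", '"')
    # 2) collapse each quoted span (often a whole quoted article) into one
    #    special token; skipped in the "from to" subtask, where the quoted
    #    text itself is informative
    if substitute_quoted:
        text = re.sub(r'"[^"]*"', "QUOTES_TEXT", text)
    return text

print(preprocess("la data del \u226a31 dicembre 2010\u226b è sostituita da \u226a30 giugno 2012\u226b"))
# -> la data del QUOTES_TEXT è sostituita da QUOTES_TEXT
</syntaxhighlight>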
===3 Experiments===

For each subtask, we gathered the documents containing one or more occurrences of the corresponding modification type. We then split the dataset into a training set and a test set, using an 80/20 ratio with a stratified technique (Trost, 1986). The training set was used to tune the hyperparameters of each model; once the final models were computed, the test set was used to measure their generalization ability. It is important to emphasise that we never used the internal test set before the definition of the final models.

The general pipeline is composed of the following steps (a sketch of the splitting and cross-validation protocol is given after this list):

# The annotated documents are tokenized.
# Each token is associated with one label per category, following the tagging formats previously defined.
# From each token we extract a representation using hand-crafted features, character-level n-grams or word embeddings; both the tagging format and the feature extraction depend on the model used.
# We perform the model selection phase through K-fold cross-validation. In our experiments we set K to 3, so that the validation sets have a reasonable size. The purpose of this step is to find the best hyperparameters of each model.
# For each subtask, we choose the model with the best performance in the previous step.
# After choosing the best configuration of each model, we compute and compare their performances over the test set.
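The sketch below illustrates, under our assumptions, the splitting and cross-validation protocol with scikit-learn; the variable names and toy data are hypothetical stand-ins for the real corpus.

<syntaxhighlight lang="python">
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hypothetical stand-ins: one entry per document, labelled with the
# modification category it contains.
documents = ["doc_%d" % i for i in range(20)]
doc_labels = ["replacement", "addition", "repeal", "abolition"] * 5

# Stratified 80/20 train/test split (Trost, 1986).
train_docs, test_docs, y_train, y_test = train_test_split(
    documents, doc_labels, test_size=0.2, stratify=doc_labels, random_state=0)

# 3-fold cross-validation on the training set drives hyperparameter selection;
# the internal test set is touched only after the final models are fixed.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, val_idx in cv.split(train_docs, y_train):
    pass  # fit each candidate configuration on train_idx, score macro F1 on val_idx
</syntaxhighlight>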
====3.1 Feature Extraction====

We applied several feature extraction techniques to figure out which one was the most effective; this section describes them in detail. Given the nature of the task, all features are extracted at the word level, and we define different feature sets according to the models' needs. We logically divided our features into hand-crafted features, n-gram features and word embeddings.

The hand-crafted features, computed for each token, are the following (a sketch of their extraction is given after this list):

* is_upper: boolean value indicating whether the token is in uppercase;
* is_lower: boolean value indicating whether the token is in lowercase;
* is_title: boolean value indicating whether the token is in titlecase;
* is_alpha: boolean value indicating whether the token consists of alphabetic characters;
* is_digit: boolean value indicating whether the token consists of digits;
* is_punct: boolean value indicating whether the token is a punctuation mark;
* pos_val_cg: coarse-grained part-of-speech from the Universal POS tag set (Kumawat and Jain, 2015); the text has been POS tagged with the spaCy Italian model (https://spacy.io/models/it);
* is_alnum: boolean value indicating whether all characters in the token are alphanumeric (either letters or digits);
* word_lower: the token in lowercase;
* word[-3:]: the last three characters of the token;
* word[-2:]: the last two characters of the token.
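The function below is a sketch of how the listed features can be computed for one token; the feature names follow the list above, while the function signature and the punctuation test are our assumptions (the paper relies on the spaCy Italian pipeline for POS tagging).

<syntaxhighlight lang="python">
def token_features(token, pos_tag):
    """Hand-crafted features for one token; `pos_tag` is the coarse-grained
    POS produced by the spaCy Italian model."""
    return {
        "is_upper": token.isupper(),
        "is_lower": token.islower(),
        "is_title": token.istitle(),
        "is_alpha": token.isalpha(),
        "is_digit": token.isdigit(),
        # simple stand-in for a punctuation check (spaCy offers token.is_punct)
        "is_punct": len(token) > 0 and all(not c.isalnum() for c in token),
        "pos_val_cg": pos_tag,
        "is_alnum": token.isalnum(),
        "word_lower": token.lower(),
        "word[-3:]": token[-3:],
        "word[-2:]": token[-2:],
    }

print(token_features("sostituito", "VERB"))
</syntaxhighlight>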
We then moved to a more complex representation: a count vectorizer (Sarlis and Maglogiannis, 2020) computed over all the Italian legal documents contained in EUR-Lex at the date of its creation. It converts a collection of text documents into a matrix of n-gram counts, producing for each set of words a sparse vector representation that captures a large number (376,037) of character n-gram features.

Finally, we used a word embedding lexicon, since word embeddings have been shown to provide good performance on other Italian tasks (De Mattei et al., 2018; Cimino et al., 2018). We tested several in-domain and general-purpose embedding lexicons trained with both fastText (Bojanowski et al., 2017) and word2vec (Mikolov et al., 2013), and we obtained the best results with the pretrained Italian fastText model (Grave et al., 2018).

The features extracted from a single token do not contain enough information to discriminate the true amendment class. For this reason, we introduced the sliding window concept (Dietterich, 2002): a set of tokens that precede and/or follow each token, like a "window" of fixed size that moves forward through the text. For each feature extraction technique we introduced two parameters, window_size and is_bilateral_window. The former indicates the size of the window; the latter is a boolean value indicating whether the window considers only the preceding tokens (False) or both the preceding and the following tokens (True). For example, the sentence "È aggiunto il seguente allegato" with a bilateral sliding window of size 1 becomes ⟨(PAD, È, aggiunto), (È, aggiunto, il), (aggiunto, il, seguente), (il, seguente, allegato), (seguente, allegato, PAD)⟩, where PAD indicates the padding value. The introduction of the sliding window improved the evaluation metric of all models.
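A minimal sketch of the sliding window, reproducing the worked example above; the parameter names follow the paper, while the implementation is ours.

<syntaxhighlight lang="python">
def sliding_window(tokens, window_size=1, is_bilateral_window=True, pad="PAD"):
    """Return one window of context tokens per position in the sequence."""
    left = window_size
    right = window_size if is_bilateral_window else 0
    padded = [pad] * left + list(tokens) + [pad] * right
    return [tuple(padded[i:i + left + right + 1]) for i in range(len(tokens))]

print(sliding_window(["È", "aggiunto", "il", "seguente", "allegato"]))
# [('PAD', 'È', 'aggiunto'), ('È', 'aggiunto', 'il'), ('aggiunto', 'il', 'seguente'),
#  ('il', 'seguente', 'allegato'), ('seguente', 'allegato', 'PAD')]
</syntaxhighlight>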
====3.2 Models====

We want a fully automatic approach based on the extraction of informative features. For this reason, we carried out a systematic comparison among three models: a Support Vector Machine (SVM) with n-gram features, a Conditional Random Field (CRF) with hand-crafted features, and a Neural Network (NN) that uses word embeddings. The latter is a rather general convolutional architecture: its input is the sliding window of words represented as a matrix, where each row is the word embedding of one token. We chose a convolutional layer for its efficiency in terms of both representation and speed, since it captures local and position-invariant features (Yin et al., 2017) that are useful for our purpose. We then added a Batch Normalization layer, as normalization significantly reduces the training time of feedforward neural networks (Ba et al., 2016); during the experimental phase, we observed that the normalization layer offers a speedup over the baseline model without normalization and stabilizes the training of the model. We also tried a Bidirectional Long Short-Term Memory model with an additional CRF layer (Bi-LSTM-CRF) (Huang et al., 2015), but it led to poor performance in terms of both scores and speed. The results obtained show the need to solve our task with simple models that are able to discover local patterns.
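For illustration, here is a sketch of a convolutional tagger of the kind described above, written with Keras under our assumptions: the paper does not report layer sizes, kernel width or the framework used, so all such values are placeholders. The input is a bilateral window of size 5 (11 tokens) of 300-dimensional fastText embeddings, and the output classes correspond to the LL tags of one subtask.

<syntaxhighlight lang="python">
import tensorflow as tf

WINDOW, EMB_DIM, N_CLASSES = 11, 300, 3  # e.g. O / I-replacement / L-replacement

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, EMB_DIM)),  # one embedding per window token
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),  # local patterns
    tf.keras.layers.BatchNormalization(),            # speeds up and stabilizes training
    tf.keras.layers.GlobalMaxPooling1D(),            # position-invariant summary
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
</syntaxhighlight>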
===4 Results===

The objective of the evaluation was a systematic comparison of the models' performance in terms of macro F1, precision and recall. In the model selection step, we used the macro F1 score as the evaluation metric, since the frequency distribution of the labels turned out to be strongly unbalanced in all the subtasks.

After some preliminary experiments, we fixed the sliding window size and the tagging format for each model. We found that, from a performance perspective, the CRF and NN models favour a bigger sliding window (size 5) than the SVM models (size 1); we think this difference comes from the curse of dimensionality that can affect the SVM models (Bengio et al., 2005). Concerning the tagging format, we adopted the LL tagging for all the models: our experiments show that it increases the F1 score by about 20 percentage points.

Table 3 reports the mean results across the 3 folds obtained by the best configuration of each model.

{| class="wikitable"
|+ Table 3: Average results in terms of macro F1 score obtained in the validation phase
! Subtask !! SVM !! CRF !! NN
|-
| Replacement || 0.868 || 0.881 || 0.841
|-
| Addition || 0.825 || 0.852 || 0.796
|-
| Repeal || 0.915 || 0.938 || 0.924
|-
| Abolition || 0.823 || 0.878 || 0.939
|-
| From To || 0.748 || 0.873 || 0.800
|}

The CRF outperforms the other models in almost all the subtasks. We think this is due to the nature of the model: CRFs naturally account for state-to-state and feature-to-state dependencies (Lafferty et al., 2001). Once the model selection phase was completed, we chose the best model and its configuration for each subtask, considering both the mean and the standard deviation of the F1 metric across the folds, and re-trained the best model on the whole training set.

Table 4 reports the precision, recall and F1 scores of the best model for each subtask over the internal test set. The precision score is higher than the recall in all subtasks except one, which can be desirable from an application perspective.

{| class="wikitable"
|+ Table 4: Precision, recall and F1 scores of the best model for each subtask
! Subtask !! Model !! Prec. !! Rec. !! F1
|-
| Replacement || CRF || 0.949 || 0.864 || 0.902
|-
| Addition || CRF || 0.790 || 0.865 || 0.823
|-
| Repeal || CRF || 0.937 || 0.912 || 0.924
|-
| Abolition || NN || 0.951 || 0.912 || 0.931
|-
| From To || CRF || 0.977 || 0.841 || 0.899
|}

The models' performances improve over the results achieved in the model selection phase, probably thanks to the larger training set.

===5 Conclusion===

We presented and analysed a machine learning approach to the problem of classifying textual modifications. We compared different tagging formats, feature extraction techniques and machine learning models. Our experiments show that the sliding window approach, combined with a character count vectorizer or word embeddings, allows the models to capture most of the formulas that introduce textual modifications. Following Occam's razor, we defined simple models that obtained good performance in all the subtasks. Our approach does not require any expertise in the legal field, since it learns to formalize the rules that identify textual modifications; we use different NLP techniques to extract hidden features from the words inside a window.

The results validate our approach in terms of both correctness and stability, and represent a first step towards a fully automatic model capable of identifying and integrating textual modifications.
===References===

* Timothy Arnold-Moore. 1997. Automatic generation of amendment legislation. In ICAIL '97, pages 56–62.
* Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450.
* Yoshua Bengio, Olivier Delalleau, and Nicolas Le Roux. 2005. The curse of dimensionality for local kernel machines. Technical Report 1258.
* Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
* Raffaella Brighi, Leonardo Lesmo, Alessandro Mazzei, Monica Palmirani, and Daniele Radicioni. 2008. Towards semantic interpretation of legal modifications through deep syntactic analysis. Volume 189, pages 202–206.
* Andrea Cimino, Lorenzo De Mattei, and Felice Dell'Orletta. 2018. Multi-task learning in deep neural networks at EVALITA 2018. In Proceedings of the Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, pages 86–95.
* Xiang Dai. 2018. Recognizing complex entity mentions: A review and future directions. In Proceedings of ACL 2018, Student Research Workshop, pages 37–44, Melbourne, Australia. Association for Computational Linguistics.
* Lorenzo De Mattei, Andrea Cimino, and Felice Dell'Orletta. 2018. Multi-task learning in deep neural network for sentiment polarity and irony classification. In NL4AI@AI*IA, pages 76–82.
* Thomas G. Dietterich. 2002. Machine learning for sequential data: A review. In Terry Caelli, Adnan Amin, Robert P. W. Duin, Dick de Ridder, and Mohamed Kamel, editors, Structural, Syntactic, and Statistical Pattern Recognition, pages 15–30, Berlin, Heidelberg. Springer.
* Enrico Francesconi and A. Passerini. 2007. Automatic classification of provisions in legislative texts. Artificial Intelligence and Law, 15:1–17.
* John Garofalakis, Konstantinos Plessas, and Athanasios Plessas. 2016. A semi-automatic system for the consolidation of Greek legislative texts. In Proceedings of the 20th Pan-Hellenic Conference on Informatics, PCI '16, New York, NY, USA. Association for Computing Machinery.
* Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and Tomas Mikolov. 2018. Learning word vectors for 157 languages. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).
* Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.
* Deepika Kumawat and Vinesh Jain. 2015. POS tagging approaches: A comparison. International Journal of Computer Applications, 118(6).
* John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML.
* Leonardo Lesmo, Alessandro Mazzei, and Daniele Radicioni. 2009. Extracting semantic annotations from legal texts. In HT '09, pages 167–172.
* Christopher Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press.
* Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc.
* Yasuhiro Ogawa, Shintaro Inagaki, and Katsuhiko Toyama. 2008. Automatic consolidation of Japanese statutes based on formalization of amendment sentences. In Ken Satoh, Akihiro Inokuchi, Katashi Nagao, and Takahiro Kawamura, editors, New Frontiers in Artificial Intelligence, pages 363–376, Berlin, Heidelberg. Springer.
* Monica Palmirani and Fabio Vitali. 2011. Akoma-Ntoso for legal documents, pages 75–100. Springer Netherlands, Dordrecht.
* Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523.
* S. Sarlis and I. Maglogiannis. 2020. On the reusability of sentiment analysis datasets in applications with dissimilar contexts. In Ilias Maglogiannis, Lazaros Iliadis, and Elias Pimenidis, editors, Artificial Intelligence Applications and Innovations, pages 409–418, Cham. Springer International Publishing.
* Pierluigi Spinosa, Gerardo Giardiello, Manola Cherubini, Simone Marchi, Giulia Venturi, and Simonetta Montemagni. 2009. NLP-based metadata extraction for legal text consolidation. In ICAIL, pages 40–49.
* Jan E. Trost. 1986. Statistically nonrepresentative stratified sampling: A sampling technique for qualitative studies. Qualitative Sociology, 9(1):54–57.
* Wenpeng Yin, Katharina Kann, Mo Yu, and Hinrich Schütze. 2017. Comparative study of CNN and RNN for natural language processing. arXiv preprint arXiv:1702.01923.