Title is (Not) All You Need for EuroVoc Multi-Label Classification of European Laws

Lorenzo Bocchi1,†, Alessio Palmero Aprosio1,*,†
1 University of Trento, Italy

Abstract
Machine Learning and Artificial Intelligence approaches within Public Administration (PA) have grown significantly in recent years. Specifically, new guidelines from various governments recommend employing the EuroVoc thesaurus for the classification of documents issued by the PA. In this paper, we explore some methods to perform document classification in the legal domain, in order to mitigate the length limitation for input texts in BERT models. We first collect data from the European Union, already tagged with the aforementioned taxonomy. Then we reorder the sentences included in the text, with the aim of bringing the most informative part of the document to the beginning of the text. Results show that the title and the context are both important, while the order of the text may not be. Finally, we release on GitHub both the dataset and the source code used for the experiments.

Keywords
EuroVoc taxonomy, Sentence reordering, Text classification


1. Introduction

The presence of Machine Learning and Artificial Intelligence techniques has become almost ubiquitous in many fields, from hobbyist projects to industrial and government usage. Inside the Italian Public Administration, too, there have been efforts to digitize and modernize processes for more than a decade. In particular, some documents released by the Italian PA suggest the use of EuroVoc,1 a multilingual thesaurus developed and maintained by the Publications Office of the European Union (EU) that covers a wide range of subjects (law, economics, environment, ...) organized hierarchically. Outside Italy, the Portuguese [1] and Croatian [2] communities are making efforts to automatically tag official regulations using EuroVoc. In addition, in 2010 the EU organized the Eurovoc Conference in Luxembourg,2 in order to facilitate the comprehension and use of the taxonomy.

The classification of a document with respect to the EuroVoc taxonomy has previously been addressed by several studies (see Section 2), since at present the classification of PA documentation is carried out manually, a task that can be very expensive in the long run.

In this context, we concentrate our work on automatically assigning EuroVoc labels to a document, starting from existing approaches in document and text classification that use pretrained large language models followed by a fine-tuning phase on a specific task. Unfortunately, these families of language models have an intrinsic limit on the maximum number of words in an input text (usually 512). For documents that can be quite large, like legal ones, it is important to make sure that the key information about a text is included in the chosen set of words. Previous research deals with this limit by concatenating the title with the raw text, and then clipping it to the limit.

In some countries (such as Italy, see [3]) the title is usually very well formulated and is very important for correctly classifying a document. On the contrary, the text of a law is usually very redundant, and its most representative part often comes after a notable sequence of preambles.

Given these premises, we investigate how previous approaches work on European laws and apply different strategies to create a summarized version of a text by reordering its sentences. The results show that in this specific case both the title and the context are important, and that the best approach for regulations enacted by the European Parliament is to fill the 512-word limit with as much information as possible.

The paper is structured as follows: Section 2 presents the related work; Section 3 describes the data; the approach and the experiments are described in Section 4; the results are then discussed in Section 5. Finally, both the software and the dataset are available for download, as described in Section 6.

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04 — 06, 2024, Pisa, Italy
* Corresponding author.
† These authors contributed equally.
lorenzo.bocchi@unitn.it (L. Bocchi); a.palmeroaprosio@unitn.it (A. Palmero Aprosio)
ORCID: 0000-0002-1484-0882 (A. Palmero Aprosio)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1 https://bit.ly/eurovoc-ds
2 https://bit.ly/eurovoc-conference




CEUR Workshop Proceedings (ceur-ws.org, ISSN 1613-0073)
2. Related work

A number of studies have explored the classification of European legislation with EuroVoc labels.

JRC EuroVoc Indexer [4] is a tool that allows the categorization of documents with EuroVoc classifiers in 22 languages. The data used is contained in an old dataset [5] with documents up to 2006. The algorithm involves generating a collection of lemma frequencies and weights. These frequencies are associated with specific descriptors, referred to as associates or topic signatures in the paper. When classifying a new document, the algorithm selects the descriptors from the topic signatures that exhibit the highest similarity to the lemma frequency list of the new document.

The research described in [6] explored the usage of Recurrent Neural Networks on extreme multi-label classification datasets, including RCV1 [7], Amazon-13K [8], Wiki-30K and Wiki-500K [9], and an older EUR-Lex dataset from 2007 [10].

In [11] the authors explore the usage of different deep-learning architectures. Furthermore, the authors also released a dataset of 57,000 tagged documents from EUR-Lex.

There are also other monolingual studies on the topic, mainly concentrating on Italian [12], Croatian [13], and Portuguese [1].

More recent works on multi-language classification on EuroVoc are described in Chalkidis et al. [14], Shaheen et al. [15], and Wang et al. [16].

3. Dataset

3.1. EUR-Lex

The primary source for European legislation is EUR-Lex,3 a web portal offering comprehensive access to EU legal documents. It is available in all 24 official languages of the European Union and is updated daily by its Publications Office. Most documents on EUR-Lex are manually categorized using EuroVoc concepts.

3.2. EuroVoc

EuroVoc's hierarchical structure is divided into three layers: Thesaurus Concept (TC), Micro Thesaurus (MT, previously known as the "sub-sector" level), and Domain (DO, previously known as the "main sector" level). Each layer contains descriptors for documents, covering a broad range of EU-related subjects such as law, economics, social affairs, and the environment, each at varying levels of detail. The TC level is the foundational layer where all key concepts reside, and documents on EUR-Lex are tagged with labels from this level. Each TC is linked to an MT, which is in turn part of a specific DO.

The version of EuroVoc used for our studies is 4.17, released on 31st January 2023, containing 7,382 TCs, 127 MTs, and 21 DOs.

3.3. Dataset collection

To collect the documents for our task, we built a set of tools written in Python that can be customized to obtain different subsets of the data (year, language, etc.). In total, after filtering out the documents not tagged with EuroVoc or not containing an easily accessible text (for instance, old documents only available as scanned PDFs), we collected around 1.1 million documents in four languages (English, Italian, Spanish, French).

As a subsequent step, we also removed labels that have been deprecated by the EuroVoc developers throughout the years.4 Following previous work [11], we also removed labels having fewer than 10 examples.

Finally, by looking at the data, we noticed that the labelling became consistent starting from 2004, while many deprecated labels are still present in documents, especially those published before 2010. We therefore consider only documents published in the interval 2010-2022.

The final dataset consists of 471,801 documents. On average, each law is labelled with 6 EuroVoc concepts. Table 1 shows some statistics about the dataset used.

4. Experiments

In this section, we describe the experiments performed on the above-described data.

4.1. Data split

To keep our experiments consistent with previous similar approaches [17], we split the data into train, dev, and test sets with an approximate 80/10/10 ratio.

In order to make the training reproducible and to avoid that a single random extraction turns out to be particularly (un)lucky, we repeat the split with a pseudo-random number generator using three different seeds.

Each partition into train/dev/test is done using Iterative Stratification [18, 19], in order to preserve the concept balance.

Unless differently specified, all the results in the rest of the paper refer to the average of the values obtained by our experiments on the three splits.

3 https://eur-lex.europa.eu/
4 https://bit.ly/eurovoc-handbook
                                                                               English     Italian   Spanish   French
                Total documents                                                195,236    177,952    178,444   183,068
                Documents with text and EuroVoc labels                         118,296    117,711    117,882   117,912
                Number of EuroVoc labels used before filtering                   6,098       6,088     6,098     6,088
                Number of EuroVoc labels having less than 10 documents           2,070       2,077     2,070     2,070
                Final number of labels                                           4,028       4,011     4,028     4,018
                Removed documents                                                    3           3         3         3
Table 1
Number of documents in English, Italian, Spanish, and French relative to the time interval 2010-2022.



4.2. Methodology

Our models are trained using BERT [20] and its derivatives.

The choice of the pre-trained model is very important for the accuracy of the classification performed with the model obtained after fine-tuning. In particular, [21] shows that classification tasks in the legal domain obtain better performance with models pre-trained on legal corpora. Nevertheless, in some preliminary experiments we tried BERT models pre-trained on various datasets (among them, of course, legal ones), and the results did not always favor models built from legal texts.

Although the difference was not statistically significant, we decided to use these models anyway (from HuggingFace5):

• legal-bert-base-uncased [22], trained on 12 GB of diverse English legal text from several fields (e.g., legislation, court cases, contracts) scraped from publicly available resources;
• bert-base-italian-xxl-cased [23], the main Italian BERT model, trained on a recent Wikipedia dump, various texts from the OPUS corpora collection,6 and data from the Italian part of the OSCAR corpus;7
• bert-base-spanish-wwm-cased [24], also called BETO, a BERT model trained on a big Spanish corpus8 of 3 billion words;
• camembert-base [25], a state-of-the-art language model for French based on the RoBERTa model [26].

4.3. Basic configurations

The basic configurations consist of using the sole title, the sole text, and the concatenation of the title and the text. Note that, apart from some rare outliers, title length is consistently less than 50 tokens.

4.4. Pre-processing

The text of the laws is preprocessed using spaCy,9 a Natural Language Processing pipeline that can extract information from texts in 24 languages. In particular, we used it to perform sentence splitting, part-of-speech tagging, and named-entity recognition, which in turn are used to extract content words from the text and to select the sentences used in the task.

5 https://huggingface.co/
6 http://opus.nlpl.eu/
7 https://traces1.inria.fr/oscar/
8 https://bit.ly/big-spanish-corpora
9 https://spacy.io/

4.5. Summarization

Given that the input length for these BERT models is 512 tokens, while legislative texts are usually longer, summarizing the text by keeping its most important parts, so that they fit in the input, is an important step.

As underlined in the Introduction, the text of a law is usually very redundant, and its most representative part often comes after a notable sequence of preambles.

Since the limit of 512 tokens is very strict compared to the usual length of a legal document, we concentrate our summarization effort on reordering the sentences inside a single document, so that the most informative part of the text is brought to the beginning and therefore included in the first 512 tokens.

We use two different approaches to reach this goal: TF-IDF and centroid-based. In both cases, we perform training both with the sole reordered text and with the concatenation of the title and the reordered text.

4.5.1. TF-IDF

TF-IDF (Term Frequency-Inverse Document Frequency) is a widely used technique in information retrieval and text mining to quantify the importance of terms in a document within a larger collection of documents. It aims to highlight terms that are both frequent within a document and relatively rare in the overall collection, thus capturing their discriminative power.

The TF-IDF score of a term in a document is calculated by multiplying two factors: the term frequency (TF) and
the inverse document frequency (IDF).

Let $t$ be the term and $d$ the document:

$$\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}$$

$$\mathrm{idf}(t, D) = \log \frac{N}{1 + |\{d \in D : t \in d\}|}$$

where $f_{t,d}$ is the frequency of term $t$ in document $d$, and $N = |D|$ is the number of documents in the set $D$.

Beyond the usual TF-IDF, we also perform a label-based approach that considers one document for each label, obtained by concatenating all the texts belonging to the laws having that label.

Once all the documents have gone through this process, the TF-IDF matrix is calculated using TfidfVectorizer from the Python package scikit-learn10 over the content words (see Section 4.4) of the texts.

10 https://scikit-learn.org

After obtaining the TF-IDF matrix, the final step is to assign a score to each sentence. For each valid base form, its score is determined from the TF-IDF matrix by selecting the highest value within the corresponding column (which represents a word). These scores are collected into a list for each sentence. Once a sentence is processed, the maximum or average of its scores is calculated ("max" and "mean" in the results, respectively), and this value becomes the sentence's score. The process is repeated for all sentences in every document.

4.5.2. Centroid

In this approach, described in [27], the centroid of the word vectors in the text is calculated, and a score is assigned to each sentence based on its cosine distance from the centroid: the closer a sentence is to the centroid, the higher its score. In our approach, we use fastText [28] for word embeddings.

The words used to compute the centroid are those that have been extracted as content words (see Section 4.4) and have a TF-IDF higher than a certain threshold $t$, which in this case was 0.3. The centroid is computed as the mean of the word embeddings of the previously selected words:

$$C = \frac{\sum_{w \in D_t} E[\mathrm{idx}(w)]}{|D_t|}$$

where $D_t$ is the set of words with $\mathrm{tfidf}(w) > t$.

Each sentence in the document is transformed into a single embedding representation by averaging the embedding vectors of its words:

$$S_j = \frac{\sum_{w \in S_j} E[\mathrm{idx}(w)]}{|S_j|}$$

where $S_j$ is the $j$-th sentence in document $D$.

After obtaining the embedding of a sentence, its score is computed from the cosine similarity between the centroid and the embedding:

$$\mathrm{sim}(C, S_j) = 1 - \frac{C^T \cdot S_j}{\|C\| \times \|S_j\|}$$

By using the previously described approach, every text is converted into a list of ranked sentences, each with its own score.

4.6. Random

Because of the obtained results (see Section 4.7), we also added two configurations that use a random ordering of the sentences (one concatenated with the title, the other containing only the randomly ordered text).

4.7. Evaluation

The evaluation of our experiments is performed using the F1 score, macro-averaged so that each label has the same weight (this metric rewards models that perform better on less-represented labels). Since we are dealing with a multi-label classification task, we have to choose between always considering the same number $K$ of results ($P@K$, $R@K$, $F_1@K$) or keeping only the labels whose confidence is higher than a particular threshold (usually between 0 and 1). In our experiments, we chose the second approach, since the number of concepts per document in the dataset is not constant. Based on an evaluation performed on the development set, we set the threshold to 0.5.

4.8. Results

Table 2 shows the results of the different configurations in the four languages. The first column contains the description of the experiment, while columns TC, MT, and DO show the results in terms of Thesaurus Concept (TC), Micro Thesaurus (MT), and Domain (DO), as described in Section 3.

5. Discussion

Results show that the best performances are reached when the title is included in the text (see the rows without "not"), with the exception of the simple use of the text without reordering. An interesting outcome is that the experiment using title+random obtains very good results when compared to the best configurations.

On the contrary, using random text without the title, or using the sole title, results in a decrease in global performance.
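To make the two scoring strategies of Section 4.5 concrete, here is a minimal pure-Python sketch. It follows the formulas above, but with simplifications that are ours: documents are plain token lists rather than spaCy content words, the sentence score uses per-document TF-IDF values, and toy embedding vectors stand in for fastText.

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF per the formulas above: tf = f_td / sum_t' f_t'd and
    idf = log(N / (1 + df_t)). docs: list of token lists.
    Returns one {term: score} dict per document."""
    N = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    out = []
    for doc in docs:
        freq = Counter(doc)
        total = sum(freq.values())
        out.append({t: (c / total) * math.log(N / (1 + df[t]))
                    for t, c in freq.items()})
    return out

def rank_by_tfidf(sentences, scores, mode="max"):
    """Reorder sentences by the max (or mean) TF-IDF of their words,
    most informative first ("max" and "mean" in the results)."""
    def sent_score(sent):
        vals = [scores.get(t, 0.0) for t in sent]
        return max(vals) if mode == "max" else sum(vals) / len(vals)
    return sorted(sentences, key=sent_score, reverse=True)

def rank_by_centroid(sentences, emb, scores, t=0.05):
    """Centroid variant: the centroid is the mean embedding of words whose
    TF-IDF exceeds t; sentences are ranked by cosine similarity to it."""
    def mean(vectors):
        return [sum(xs) / len(xs) for xs in zip(*vectors)]
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
    centroid = mean([emb[w] for w, s in scores.items() if s > t and w in emb])
    def sent_vec(sent):
        vs = [emb[w] for w in sent if w in emb]
        return mean(vs) if vs else [0.0] * len(centroid)
    return sorted(sentences, key=lambda s: cosine(centroid, sent_vec(s)),
                  reverse=True)
```

The reordered sentence list is then concatenated (optionally after the title) and clipped to the 512-token input window.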
                                           English                        Italian                      Spanish                  French
                                    TC       MT         DO        TC        MT          DO      TC       MT    DO        TC       MT      DO
        basic                      0.484    0.729      0.812     0.450     0.709       0.798   0.493    0.732 0.818     0.383    0.666   0.775
        basic-not                  0.474    0.722      0.808     0.453     0.710       0.799   0.483    0.726 0.811     0.370    0.655   0.765
        centroid                   0.468    0.720      0.806     0.454     0.710       0.799   0.479    0.719 0.810     0.372    0.658   0.764
        centroid-not               0.426    0.692      0.784     0.405     0.673       0.774   0.430    0.687 0.784     0.335    0.627   0.745
        title-only                 0.432    0.682      0.772     0.407     0.665       0.758   0.444    0.684 0.771     0.320    0.600   0.716
        tfidf-max-doc              0.476    0.724      0.811     0.427     0.693       0.788   0.459    0.711 0.804     0.345    0.642   0.754
        tfidf-max-lab              0.477    0.728      0.812     0.459     0.711       0.802   0.483    0.724 0.813     0.378    0.660   0.767
        tfidf-mean-doc             0.479    0.726      0.812     0.427     0.693       0.786   0.484    0.726 0.812     0.381    0.663   0.774
        tfidf-mean-lab             0.481    0.726      0.813     0.428     0.693       0.788   0.485    0.726 0.813     0.338    0.633   0.749
        tfidf-max-doc-not          0.427    0.692      0.787     0.379     0.657       0.763   0.422    0.682 0.786     0.301    0.607   0.726
        tfidf-max-lab-not          0.433    0.696      0.791     0.411     0.678       0.779   0.425    0.685 0.782     0.298    0.608   0.728
        tfidf-mean-doc-not         0.433    0.696      0.790     0.415     0.682       0.781   0.442    0.700 0.796     0.332    0.626   0.742
        tfidf-mean-lab-not         0.436    0.697      0.792     0.388     0.667       0.771   0.428    0.684 0.784     0.296    0.598   0.723
        random                     0.472    0.722      0.808     0.423     0.692       0.787   0.482    0.723 0.807     0.372    0.652   0.767
        random-not                 0.429    0.693      0.788     0.398     0.671       0.774   0.439    0.693 0.778     0.318    0.611   0.724
Table 2
Results of our experiments (macro 𝐹1 ).
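The threshold-based macro-averaged F1 used in these tables (see Section 4.7) can be sketched as follows; the function name and the toy data are illustrative assumptions, not the paper's code, which relies on standard library implementations.

```python
def macro_f1_at_threshold(confidences, gold, labels, threshold=0.5):
    """Macro-averaged F1 with threshold-based label selection: a label is
    predicted when its confidence exceeds the threshold, and per-label F1
    scores are averaged so that every label has the same weight."""
    f1_scores = []
    for lab in labels:
        tp = fp = fn = 0
        for conf, g in zip(confidences, gold):
            predicted = conf.get(lab, 0.0) > threshold
            if predicted and lab in g:
                tp += 1
            elif predicted:
                fp += 1
            elif lab in g:
                fn += 1
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

Because every label contributes equally to the average, this metric rewards models that also perform well on rare EuroVoc concepts.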



   By looking at the statistical significance,11 we find out                     are used to select it.
that we can split, more or less, the experiments into two                           French results bring significantly lower accuracy: this
big groups: the ones that in the English part of the table                       is not expected and is probably due to the choice of the
have a DO 𝐹1 above 0.80 and the remaining ones that are                          BERT pre-trained model.
below 0.79. The exception is the “title-only” configura-
tion, which obtains lower accuracy in all languages and
contrasts with the results obtained in a similar previous                        6. Release
work applied to Italian laws [3], where the use of the sole
                                                                                 The source code for all the experiments (from the retrieval
title results in an increase in performance with respect
                                                                                 of the documents to the training of the models), the data
to the concatenation between title and text.
                                                                                 downloaded from EUR-Lex, and the models are available
   By listing the documents where EuroVoc labels are
                                                                                 on the project Github page.12
not extracted correctly, it seems that in the European
legislation it is quite common to find very generic ti-
tles. For instance, the title of the document with ID                            7. Conclusions and Future Work
“CELEX:32011Q0624(01)” is “Rules of procedure for the
appeal committee (Regulation (EU) No 182/2011)”, from                            In this paper, we presented some approaches to perform
which is very hard to extract relevant information about                         document classification on long documents, by reorder-
the topic. One can find other similar documents, such                            ing their sentences before the fine-tuning phase. The
as “Action brought on 2 March 2011 — Attey v Council”,                           best results are obtained when all the 512 tokens allowed
title of law with ID “CELEX:62011TN0118”.                                        in the BERT paradigm are filled, possibly including the
   In general, our experiments show that the classifica-                         title of the law.
tion of European laws obtains the best performance on                               In the future, we want to extend this approach to other
BERT when all the possible tokens are filled, possibly                           languages, trying to understand whether the same re-
using the title and some parts of the text. The high accu-                       ordering algorithm leads to some improvement in the
racy obtained in the experiments performed by randomly                           classification task. We will also investigate other sum-
reordering the sentences demonstrates that the context                           marization approaches, or new architectures that rely on
is important per se, even when no particular strategies                          Local, Sparse, and Global attention [29] so that longer
                                                                                 texts (up to 16K tokens) can be used to train the model.
11 To calculate statistical significance, a one-tailed t-test with a significance level of .05 was applied to the scores of the five runs, with the null hypothesis that no difference is observed and the alternative hypothesis that the score obtained with the summarized text is significantly greater than the one obtained with the normal text.
12 https://github.com/bocchilorenzo/AutoEuroVoc
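The significance test of footnote 11 can be reproduced with a few lines of code. The function below is an illustrative, stdlib-only sketch of a one-tailed two-sample Student's t-test; 1.860 is the standard critical value at α = .05 for df = 8 (two groups of five runs), and the equal-variance assumption is ours for simplicity.

```python
import math
from statistics import mean, variance

def one_tailed_t_test(summarized, normal, t_crit=1.860):
    """One-tailed two-sample Student's t-test.

    H0: the two score samples have equal means.
    H1: mean(summarized) > mean(normal).
    t_crit = 1.860 is the one-tailed critical value at alpha = .05
    for df = len(summarized) + len(normal) - 2 = 8 (five runs each).
    """
    n1, n2 = len(summarized), len(normal)
    # Pooled variance under the equal-variance assumption.
    sp2 = ((n1 - 1) * variance(summarized)
           + (n2 - 1) * variance(normal)) / (n1 + n2 - 2)
    t = (mean(summarized) - mean(normal)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Reject H0 when t exceeds the critical value.
    return t, t > t_crit
```

With SciPy available, `scipy.stats.ttest_ind(summarized, normal, alternative='greater')` yields the same decision via an exact p-value.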
References

 [1] D. Caled, M. Won, B. Martins, M. J. Silva, A hierarchical label network for multi-label EuroVoc classification of legislative contents, in: Digital Libraries for Open Knowledge: 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway, September 9-12, 2019, Proceedings, Springer-Verlag, Berlin, Heidelberg, 2019, pp. 238–252. URL: https://doi.org/10.1007/978-3-030-30760-8_21. doi:10.1007/978-3-030-30760-8_21.
 [2] T. D. Prekpalaj, The role of key words and the use of the multilingual EuroVoc thesaurus when searching for legal regulations of the Republic of Croatia - research results, in: 2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO), 2021, pp. 1470–1475. doi:10.23919/MIPRO52101.2021.9597043.
 [3] M. Rovera, A. P. Aprosio, F. Greco, M. Lucchese, S. Tonelli, A. Antetomaso, Italian legislative text classification for Gazzetta Ufficiale (2022).
 [4] R. Steinberger, M. Ebrahim, M. Turchi, JRC EuroVoc indexer JEX - a freely available multi-label categorisation tool, arXiv preprint arXiv:1309.5223 (2013).
 [5] R. Steinberger, B. Pouliquen, A. Widiger, C. Ignat, T. Erjavec, D. Tufiş, D. Varga, The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages, in: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), European Language Resources Association (ELRA), Genoa, Italy, 2006. URL: http://www.lrec-conf.org/proceedings/lrec2006/pdf/340_pdf.pdf.
 [6] R. You, Z. Zhang, Z. Wang, S. Dai, H. Mamitsuka, S. Zhu, AttentionXML: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification, Advances in Neural Information Processing Systems 32 (2019).
 [7] D. D. Lewis, Y. Yang, T. G. Rose, F. Li, RCV1: A new benchmark collection for text categorization research, J. Mach. Learn. Res. 5 (2004) 361–397.
 [8] J. McAuley, J. Leskovec, Hidden factors and hidden topics: Understanding rating dimensions with review text, in: Proceedings of the 7th ACM Conference on Recommender Systems, RecSys ’13, Association for Computing Machinery, New York, NY, USA, 2013, pp. 165–172. URL: https://doi.org/10.1145/2507157.2507163. doi:10.1145/2507157.2507163.
 [9] A. Zubiaga, Enhancing navigation on Wikipedia with social tags, arXiv preprint arXiv:1202.5469 (2012).
[10] E. Loza Mencía, J. Fürnkranz, Efficient multilabel classification algorithms for large-scale problems in the legal domain, 2010. URL: http://dx.doi.org/10.1007/978-3-642-12837-0_11. doi:10.1007/978-3-642-12837-0_11.
[11] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, I. Androutsopoulos, Large-scale multi-label text classification on EU legislation, arXiv preprint arXiv:1906.02192 (2019).
[12] G. Boella, L. Di Caro, L. Lesmo, D. Rispoli, Multi-label classification of legislative text into EuroVoc, Legal Knowledge and Information Systems: JURIX 2012: the Twenty-Fifth Annual Conference 250 (2013) 21. doi:10.3233/978-1-61499-167-0-21.
[13] F. Saric, B. D. Basic, M.-F. Moens, J. Šnajder, Multi-label classification of Croatian legal documents using EuroVoc thesaurus, 2014.
[14] I. Chalkidis, M. Fergadiotis, I. Androutsopoulos, MultiEURLEX - a multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2021, pp. 6974–6996. URL: https://aclanthology.org/2021.emnlp-main.559. doi:10.18653/v1/2021.emnlp-main.559.
[15] Z. Shaheen, G. Wohlgenannt, E. Filtz, Large scale legal text classification using transformer models, 2020. arXiv:2010.12871.
[16] L. Wang, Y. W. Teh, M. A. Al-Garadi, Adopting the multi-answer questioning task with an auxiliary metric for extreme multi-label text classification utilizing the label hierarchy, 2023. arXiv:2303.01064.
[17] A. Avram, V. F. Pais, D. Tufis, PyEuroVoc: A tool for multilingual legal document classification with EuroVoc descriptors, CoRR abs/2108.01139 (2021). URL: https://arxiv.org/abs/2108.01139. arXiv:2108.01139.
[18] K. Sechidis, G. Tsoumakas, I. Vlahavas, On the stratification of multi-label data, Machine Learning and Knowledge Discovery in Databases (2011) 145–158.
[19] P. Szymański, T. Kajdanowicz, A network perspective on stratification of multi-label data, in: L. Torgo, B. Krawczyk, P. Branco, N. Moniz (Eds.), Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, volume 74 of Proceedings of Machine Learning Research, PMLR, ECML-PKDD, Skopje, Macedonia, 2017, pp. 22–35.
[20] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, CoRR abs/1810.04805 (2018). URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.
[21] I. Chalkidis, E. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, Extreme multi-label legal text classification: A case study in EU legislation, in: Proceedings of the Natural Legal Language Processing Workshop 2019, Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 78–87. URL: https://aclanthology.org/W19-2209. doi:10.18653/v1/W19-2209.
[22] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, LEGAL-BERT: The muppets straight out of law school, in: Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, Online, 2020, pp. 2898–2904. URL: https://aclanthology.org/2020.findings-emnlp.261. doi:10.18653/v1/2020.findings-emnlp.261.
[23] S. Schweter, Italian BERT and ELECTRA models, 2020. URL: https://doi.org/10.5281/zenodo.4263142. doi:10.5281/zenodo.4263142.
[24] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained BERT model and evaluation data, in: PML4DC at ICLR 2020, 2020.
[25] L. Martin, B. Muller, P. J. O. Suárez, Y. Dupont, L. Romary, É. V. de la Clergerie, D. Seddah, B. Sagot, CamemBERT: a tasty French language model, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020.
[26] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, CoRR abs/1907.11692 (2019). URL: http://arxiv.org/abs/1907.11692. arXiv:1907.11692.
[27] G. Rossiello, P. Basile, G. Semeraro, Centroid-based text summarization through compositionality of word embeddings, in: Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres, 2017, pp. 12–21.
[28] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics 5 (2017) 135–146.
[29] C. Condevaux, S. Harispe, LSG attention: Extrapolation of pretrained transformers to long sequences, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2023, pp. 443–454.