<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>the official journal of records of the Italian government</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>KEVLAR: the Complete Resource for EuroVoc Classification of Legal Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lorenzo Bocchi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Camilla Casula</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Palmero Aprosio</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Trento</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>9</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>The use of Machine Learning and Artificial Intelligence in the Public Administration (PA) has increased in recent years. In particular, recent guidelines proposed by various governments for the classification of documents released by the PA suggest using the EuroVoc thesaurus. In this paper, we present KEVLAR, an all-in-one solution for performing the above-mentioned task on acts belonging to the Public Administration. First, we create a collection of 8 million documents in 24 languages, tagged with EuroVoc labels and taken from EUR-Lex, the web portal of European Union legislation. Then, we train different pre-trained BERT-based models, comparing the performance of base models with domain-specific and multilingual ones. We release the corpus, the best-performing models, and a Docker image containing the source code of the trainer, the REST API, and the web interface. This image can be employed out-of-the-box for document classification.</p>
      </abstract>
      <kwd-group>
        <kwd>EuroVoc taxonomy</kwd>
        <kwd>multilingual text classification</kwd>
        <kwd>BERT</kwd>
        <kwd>web interface</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>EuroVoc is a multilingual and multidisciplinary thesaurus
that has seen a significant rise in its use and importance
in recent years. In particular, the taxonomy used in this
thesaurus has become crucial for a number of activities
of European Public Administrations, shaping the way
information is organized, disseminated, and accessed.</p>
      <p>Containing over 7,000 concepts, EuroVoc acts as a
reliable and efficient indexing system for a vast range of
documents, legislative texts, and reports. For this reason, a
growing number of governmental institutions around
Europe have begun to use it internally for document
categorization.</p>
      <p>
        The Spanish government, for instance, has recommended
the adoption of EuroVoc since 2014 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and has more
recently started using it regularly in its official open
data portal,1 and in the Portal de la Administración
Electrónica website.2 Similarly, the German and French public
administrations are following the same strategy, in the
DCAT-AP.de3 and data.gouv.fr4 portals respectively.
      </p>
      <p>
        Furthermore, Rovera et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] presented a preliminary
study on the classification of Italian legislative texts.
The contributions of this paper are the following:
1. First, we release a collection of more than 8
million documents from EUR-Lex, the European
Union’s official web portal, which gives
comprehensive access to EU legal documents, spanning
more than 70 years of EU legislation (1948-2022)
and covering 24 languages. Over half of these
texts are already tagged with the corresponding
EuroVoc concepts.
2. Secondly, we perform a series of experiments for
automatic tagging of the documents using the
EuroVoc taxonomy, comparing different approaches
and language models.
3. Finally, we develop a web interface (see Figure 1)
and a REST API that anyone (citizen or public
administration) could use both to easily try
automatic classification of documents and to integrate
such categorization in any systems that might
need it.
      </p>
      <p>The models used for the web demo and the release
are the best-performing ones we found, as described in
Section 5. All the data and tools (the set of documents
labeled with EuroVoc labels, the models, and the demo
code) are freely available for download.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Several investigations have delved into the categorization
of European legislation using EuroVoc labels. Notably,
the task can be regarded as Extreme Multilabel
Classification, as recognized in Liu et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
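Framed as Extreme Multilabel Classification, each document receives an independent score for every one of the thousands of EuroVoc concepts, and all labels above a threshold are kept. A minimal sketch of that prediction step (the label names and logit values below are illustrative, not taken from the paper):

```python
import math

def predict_labels(logits, threshold=0.5):
    """Multi-label prediction: an independent sigmoid per label,
    keeping every label whose probability exceeds the threshold."""
    probs = {label: 1 / (1 + math.exp(-z)) for label, z in logits.items()}
    return sorted(label for label, p in probs.items() if p > threshold)

# Illustrative logits for three EuroVoc-style concepts.
scores = {"data protection": 2.1, "fishery": -3.0, "confidentiality": 0.8}
print(predict_labels(scores))  # ['confidentiality', 'data protection']
```

Unlike single-label classification, no softmax competition is imposed between concepts: a document may legitimately carry several EuroVoc descriptors at once.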
      <p>
        The JRC EuroVoc Indexer, detailed in Steinberger et al.
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], stands as a tool facilitating document categorization
through EuroVoc classifiers across 22 languages.
However, the dataset used for this tool [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is limited to documents up to 2006. Their method entails the creation
of lemma frequencies and associated weights, linked to
specific descriptors referred to as associates or topic
signatures in the research. When classifying a new document,
the algorithm selects descriptors from the topic
signatures exhibiting the highest resemblance to the lemma
frequency list of the new document.
      </p>
      <p>
        Later, You et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] explored the application of Recurrent Neural Networks (RNNs)
to extreme multi-label classification datasets, encompassing RCV1 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Amazon-13K
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], Wiki-30K, Wiki-500K [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and an older EUR-Lex dataset from 2007 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Attention-based RNNs proved to
be particularly effective, outperforming other methods
in 4 out of 5 datasets.
      </p>
      <p>
        Chalkidis et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] explored diverse deep learning
architectures for this task. Among these, a fine-tuned
BERT-base model [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] showed the highest performance,
achieving a micro-averaged F1 score of 0.732 (considering
all labels). Furthermore, they released a dataset consisting
of 57,000 tagged documents from EUR-Lex.5
5https://bit.ly/eurlex57k
      </p>
      <p>One of the most complete contributions to document
classification using EuroVoc is PyEuroVoc, outlined in
Avram et al. [15]. This study employs various pre-trained
BERT models in 22 different languages, which were
fine-tuned for the task. The source code in Python is publicly
released, but it cannot be used out-of-the-box, and a known
bug6 may have led to unreliable results.
6https://bit.ly/pyeurovoc-bug</p>
      <p>Some similar recent works on multi-language
classification are described in Chalkidis et al. [16], Shaheen
et al. [17], and Wang et al. [18]. Outside of the EuroVoc
ecosystem, two large-sized legal datasets were released
by Niklaus et al. [19, 20] for language model creation.</p>
      <p>3. Dataset description</p>
      <p>3.1. EUR-Lex</p>
      <p>The reference for European legislation is EUR-Lex,7 a web
portal that grants users comprehensive access to EU legal
documents. It is available in all of the European Union’s
24 official languages and is updated daily by its
Publications Office. Most of the documents present in EUR-Lex
are manually categorized using EuroVoc concepts.
7https://eur-lex.europa.eu/</p>
      <p>3.2. EuroVoc</p>
      <p>EuroVoc’s hierarchical structure is organized into three
different layers: Thesaurus Concept (TC), Micro
Thesaurus (MT, previously referred to as “sub-sector” level),
and Domain (DO, previously referred to as “main
sector” level). The TC level is the base level, where all the
key concepts are found. The documents on EUR-Lex are
tagged with labels from this level. Every TC is assigned
to an MT, which in turn is part of a specific DO. For
example, the label “Confidentiality”8 is assigned to the
MT “Information and information processing”, which
belongs to the DO concept “Education and communication”.
Figure 2 shows a small subset of the EuroVoc taxonomy.</p>
      <p>The experiments of this work have been launched on
version 4.17 of EuroVoc, which contains 7,382 TCs, 127 MTs,
and 21 DOs.</p>
    </sec>
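The three-layer hierarchy can be modeled as a simple mapping from each TC to its MT and from each MT to its DO. A minimal sketch, seeded with the "Confidentiality" example from the text (the dictionaries would in practice hold all 7,382 TCs and 127 MTs):

```python
# TC -> MT and MT -> DO mappings, using the example from the text.
TC_TO_MT = {"Confidentiality": "Information and information processing"}
MT_TO_DO = {"Information and information processing": "Education and communication"}

def resolve(tc):
    """Return the full (TC, MT, DO) chain for a Thesaurus Concept."""
    mt = TC_TO_MT[tc]
    return tc, mt, MT_TO_DO[mt]

print(resolve("Confidentiality"))
```

Since documents are only tagged at the TC level, MT- and DO-level labels can always be derived by this kind of upward traversal.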
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
      <p>3.3. Dataset collection</p>
      <p>KEVLAR was collected by downloading the documents
from EUR-Lex. We built a set of tools written in Python
that can be customized to obtain different subsets of the
data (year, language, etc.).</p>
      <p>In total, 8,368,328 documents were collected in 24
languages, 5,158,438 of which are annotated with EuroVoc
descriptors, for a total of 32,021,783 tags. On average, 6.2
tags are associated with each document.</p>
      <p>After filtering out these documents,9 around 1.1 million
texts with EuroVoc labels are collected.</p>
      <p>Figure 3 shows the number of documents per year in
English. The blue bars show the total number of documents
retrieved for the year, while the orange bars show
the number of documents that were labelled and have
full text. The reduction is quite significant, especially
before the year 2000.
8http://eurovoc.europa.eu/92
9Laws without any EuroVoc concept associated are not useful for
our study. Regarding documents available in PDF format only, one
could extract the text from them using OCR: this could be done in
future work.
10https://bit.ly/eurovoc-handbook</p>
      <p>In this section we provide a detailed account of the
experiments conducted on document classification with
respect to the EuroVoc taxonomy.</p>
      <p>4.1. Deprecated labels and label frequency</p>
      <p>The EuroVoc thesaurus was initially developed in the
1980s and has constantly been updated and revised. Some
labels started being used much earlier than others, and
some are even deprecated for modern use but are still
present in older documents.10 This means that certain
topics could stop being used in the future, potentially
resulting in concepts being replaced or merged with other
existing concepts in future releases of EuroVoc.</p>
      <p>Figure 4 shows the total occurrences of deprecated
labels on a yearly basis. The result shows that from
2010 the usage of these labels decreased dramatically
compared to the previous decade.</p>
      <p>In addition to this, there is a strong imbalance in the
assignment of EuroVoc labels. For example, the most
frequent label in the Italian documents, “economic
concentration” with ID 69, is used more than 13,000 times,
while the least frequent ones were assigned to just one
document.</p>
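The label imbalance described above can be measured directly from the per-document tag lists. A minimal sketch with `collections.Counter` (the toy corpus below is illustrative, not the actual KEVLAR data):

```python
from collections import Counter

# Toy corpus: each document carries its list of EuroVoc tags.
documents = [
    ["economic concentration"],
    ["economic concentration", "fishery"],
    ["economic concentration", "data protection"],
]

# Count how often each label is assigned across the whole corpus.
freq = Counter(tag for doc in documents for tag in doc)
print(freq.most_common(1))  # [('economic concentration', 3)]
```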
      <p>Each partition into train/dev/test is done using
Iterative Stratification [27, 28], in order to preserve the
concept balance.</p>
      <p>Unless differently specified, all the results in the rest
of the paper refer to the average of the values obtained
by our experiments on the three seeds.</p>
      <p>4.4. Training</p>
      <p>To keep our experiments consistent with previous similar
approaches (e.g. Avram et al. [15]), we split the data into
train, dev, and test sets with an approximate ratio of
80/10/10, respectively.</p>
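A seeded shuffle makes such a partition reproducible. A minimal sketch of one 80/10/10 cut (plain random splitting for brevity; the paper itself uses iterative stratification to preserve label balance):

```python
import random

def split(ids, seed):
    """Shuffle document ids with a fixed seed, then cut 80/10/10."""
    rng = random.Random(seed)   # pseudorandom generator seeded explicitly
    ids = list(ids)
    rng.shuffle(ids)
    n = len(ids)
    n_train, n_dev = int(n * 0.8), int(n * 0.1)
    return ids[:n_train], ids[n_train:n_train + n_dev], ids[n_train + n_dev:]

train, dev, test = split(range(100), seed=42)
print(len(train), len(dev), len(test))  # 80 10 10
```

Re-running with the same seed reproduces the identical split; repeating with three seeds, as in the paper, averages out a single (un)lucky extraction.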
      <p>In order to make the training reproducible and to avoid
a single random extraction that could be too (un)lucky, we
repeat the split using three different seeds and a
pseudorandom number generator.
11joelniklaus/legal-swiss-roberta-large</p>
      <p>Base and legal models used for each language:
en: bert-base-uncased (base), nlpaueb/legal-bert-base-uncased (legal)
fr: flaubert/flaubert_base_uncased (base), joelniklaus/legal-french-roberta-base (legal)
it: dbmdz/bert-base-italian-cased (base), dlicari/Italian-Legal-BERT (legal)
es: dccuchile/bert-base-spanish-wwm-cased (base), joelniklaus/legal-spanish-roberta-base (legal)
de: bert-base-german-cased (base), joelniklaus/legal-german-roberta-base (legal)</p>
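The identifiers above can be organized as a per-language registry so the right checkpoint is picked at run time. A sketch mirroring the table (actually loading a checkpoint would then go through a library such as Hugging Face transformers, which is not shown here):

```python
# Per-language model registry mirroring the table above.
MODELS = {
    "en": {"base": "bert-base-uncased", "legal": "nlpaueb/legal-bert-base-uncased"},
    "fr": {"base": "flaubert/flaubert_base_uncased", "legal": "joelniklaus/legal-french-roberta-base"},
    "it": {"base": "dbmdz/bert-base-italian-cased", "legal": "dlicari/Italian-Legal-BERT"},
    "es": {"base": "dccuchile/bert-base-spanish-wwm-cased", "legal": "joelniklaus/legal-spanish-roberta-base"},
    "de": {"base": "bert-base-german-cased", "legal": "joelniklaus/legal-german-roberta-base"},
}

def checkpoint(lang, flavor="base"):
    """Return the checkpoint identifier for a language/flavor pair."""
    return MODELS[lang][flavor]

print(checkpoint("it", "legal"))  # dlicari/Italian-Legal-BERT
```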
    </sec>
    <sec id="sec-4">
      <title>5. Discussion</title>
    </sec>
    <sec id="sec-5">
      <title>6. Release and demo</title>
      <p>All the data12 and models13 described in this paper are
available for download under the CC BY 4.0 license.</p>
      <p>In addition to the documents, we also release on
GitHub the code used to train and evaluate the models.14</p>
      <p>Given that one of the main objectives of our research
is to offer a comprehensive solution for aiding public
administrations in document classification, we have also
shared the source code for a REST API and a
demonstration interface (see Figure 1), alongside a Docker
image for effortless deployment.</p>
      <p>While the training phase requires GPUs for optimal
performance, the models discussed in this article –
accessible through package installation via Docker – can
be utilized efficiently with CPU processing. Upon tool
installation, users have the flexibility to select the
desired languages, allowing only the necessary models to be
downloaded and loaded into memory.
12https://bit.ly/kevlar-2024
13https://dh.fbk.eu/software/kevlar-models
14https://github.com/dhfbk/kevlar</p>
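The select-only-needed-languages behavior can be sketched as a lazy registry that loads a model the first time its language is requested (the class and loader below are hypothetical illustrations, not the actual KEVLAR code):

```python
class LazyModels:
    """Load a model only the first time its language is requested."""

    def __init__(self, loader, languages):
        self._loader = loader           # callable: lang -> model object
        self._allowed = set(languages)  # languages selected at install time
        self._cache = {}                # models already resident in memory

    def get(self, lang):
        if lang not in self._allowed:
            raise KeyError(f"language {lang!r} was not selected")
        if lang not in self._cache:     # first use: download/load, then keep
            self._cache[lang] = self._loader(lang)
        return self._cache[lang]

# Toy loader standing in for an actual model download.
models = LazyModels(loader=lambda lang: f"model-{lang}", languages=["en", "it"])
print(models.get("it"))  # model-it
```

Only the selected languages ever occupy memory, which matches the CPU-friendly deployment described above.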
      <p>A running instance of the API and the web demo is
available for testing purposes.15</p>
    </sec>
    <sec id="sec-6">
      <title>7. Conclusions and Future Work</title>
      <p>In this paper, we release KEVLAR, an all-in-one solution
for performing the document classification task on acts
belonging to the Public Administration. We collected
more than 8 million documents in 24 languages,
compared different BERT- and RoBERTa-based models on the
classification of documents with respect to the EuroVoc
taxonomy, and built an out-of-the-box tool for easily
applying the classification to any text.</p>
      <p>In the future, we will continue the exploration of novel
methods to address this task with potentially better
performance, for example using better-performing models
or exploiting generation-based solutions.
15https://dh-server.fbk.eu/kevlar-ui/</p>
      <p>Results per language and model type (base, legal, legal-ml); each row reports three scores:
en (base) 0,455 0,714 0,800
en (legal) 0,484 0,729 0,812
en (legal-ml) 0,544 0,769 0,842
it (base) 0,450 0,709 0,798
it (legal) 0,330 0,619 0,736
it (legal-ml) 0,487 0,735 0,818
fr (base) 0,529 0,750 0,827
fr (legal) 0,461 0,719 0,808
fr (legal-ml) 0,495 0,737 0,822
de (base) 0,435 0,689 0,786
de (legal) 0,371 0,656 0,766
de (legal-ml) 0,514 0,738 0,823
es (base) 0,485 0,730 0,812
es (legal) 0,408 0,686 0,783
es (legal-ml) 0,523 0,754 0,830
nl (legal-ml) 0,400 0,669 0,774
cs (legal-ml) 0,406 0,675 0,778
da (legal-ml) 0,359 0,633 0,746
et (legal-ml) 0,413 0,677 0,775
fi (legal-ml) 0,412 0,672 0,772
pt (legal-ml) 0,385 0,662 0,769
hu (legal-ml) 0,438 0,695 0,792
lt (legal-ml) 0,302 0,608 0,732
sv (legal-ml) 0,429 0,684 0,783
bg (legal-ml) 0,399 0,669 0,771
el (legal-ml) 0,414 0,680 0,782
ga (legal-ml) 0,213 0,298 0,494
hr (legal-ml) 0,386 0,660 0,770
lv (legal-ml) 0,299 0,600 0,727
mt (legal-ml) 0,371 0,646 0,756
pl (legal-ml) 0,434 0,688 0,786
ro (legal-ml) 0,417 0,680 0,781
sk (legal-ml) 0,390 0,665 0,770
sl (legal-ml) 0,391 0,663 0,768</p>
      <p>[15] A. Avram, V. F. Pais, D. Tufis, PyEuroVoc: A tool for multilingual legal document classification with EuroVoc descriptors, CoRR abs/2108.01139 (2021). URL: https://arxiv.org/abs/2108.01139. arXiv:2108.01139.
[16] I. Chalkidis, M. Fergadiotis, I. Androutsopoulos, MultiEURLEX - a multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2021, pp. 6974–6996. URL: https://aclanthology.org/2021.emnlp-main.559. doi:10.18653/v1/2021.emnlp-main.559.
[17] Z. Shaheen, G. Wohlgenannt, E. Filtz, Large scale legal text classification using transformer models, 2020. arXiv:2010.12871.
[18] L. Wang, Y. W. Teh, M. A. Al-Garadi, Adopting the multi-answer questioning task with an auxiliary metric for extreme multi-label text classification utilizing the label hierarchy, 2023. arXiv:2303.01064.
[19] J. Niklaus, V. Matoshi, M. Stürmer, I. Chalkidis, D. E. Ho, MultiLegalPile: A 689GB multilingual legal corpus, 2023. arXiv:2306.02069.
[20] J. Niklaus, V. Matoshi, P. Rani, A. Galassi, M. Stürmer, I. Chalkidis, LEXTREME: A multi-lingual and multi-task benchmark for the legal domain, 2023. arXiv:2301.13126.
[21] H. Le, L. Vial, J. Frej, V. Segonne, M. Coavoux, B. Lecouteux, A. Allauzen, B. Crabbé, L. Besacier, D. Schwab, FlauBERT: Unsupervised language model pre-training for French, in: Proceedings of The 12th Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2020, pp. 2479–2490. URL: https://www.aclweb.org/anthology/2020.lrec-1.302.
[22] S. Schweter, Italian BERT and ELECTRA models, 2020. URL: https://doi.org/10.5281/zenodo.4263142. doi:10.5281/zenodo.4263142.
[23] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained BERT model and evaluation data, in: PML4DC at ICLR 2020, 2020.
[24] B. Chan, S. Schweter, T. Möller, German's next language model, in: Proceedings of the 28th International Conference on Computational Linguistics, International Committee on Computational Linguistics, Barcelona, Spain (Online), 2020, pp. 6788–6796. URL: https://aclanthology.org/2020.coling-main.598. doi:10.18653/v1/2020.coling-main.598.
[25] D. Licari, G. Comandè, ITALIAN-LEGAL-BERT: A pre-trained transformer language model for Italian law, in: D. Symeonidou, R. Yu, D. Ceolin, M. Poveda-Villalón, D. Audrito, L. D. Caro, F. Grasso, R. Nai, E. Sulis, F. J. Ekaputra, O. Kutz, N. Troquard (Eds.), Companion Proceedings of the 23rd International Conference on Knowledge Engineering and Knowledge Management, volume 3256 of CEUR Workshop Proceedings, CEUR, Bozen-Bolzano, Italy, 2022. URL: https://ceur-ws.org/Vol-3256/#km4law3. ISSN: 1613-0073.
[26] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, LEGAL-BERT: The muppets straight out of law school, in: Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, Online, 2020, pp. 2898–2904. URL: https://aclanthology.org/2020.findings-emnlp.261. doi:10.18653/v1/2020.findings-emnlp.261.
[27] K. Sechidis, G. Tsoumakas, I. Vlahavas, On the stratification of multi-label data, Machine Learning and Knowledge Discovery in Databases (2011) 145–158.
[28] P. Szymański, T. Kajdanowicz, A network perspective on stratification of multi-label data, in: L. Torgo, B. Krawczyk, P. Branco, N. Moniz (Eds.), Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, volume 74 of Proceedings of Machine Learning Research, PMLR, ECML-PKDD, Skopje, Macedonia, 2017, pp. 22–35.
[29] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[30] I. Chalkidis, A. Jana, D. Hartung, M. Bommarito, I. Androutsopoulos, D. Katz, N. Aletras, LexGLUE: A benchmark dataset for legal language understanding in English, in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 4310–4330. URL: https://aclanthology.org/2022.acl-long.297. doi:10.18653/v1/2022.acl-long.297.
[31] I. Beltagy, M. E. Peters, A. Cohan, Longformer: The long-document transformer, arXiv:2004.05150 (2020).
[32] M. Zaheer, G. Guruganesh, K. A. Dubey, J. Ainslie, C. Alberti, S. Ontanon, P. Pham, A. Ravula, Q. Wang, L. Yang, et al., Big Bird: Transformers for longer sequences, Advances in Neural Information Processing Systems 33 (2020) 17283–17297.
[33] I. Chalkidis, E. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, Extreme multi-label legal text classification: A case study in EU legislation, in: Proceedings of the Natural Legal Language Processing Workshop 2019, Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 78–87. URL: https://aclanthology.org/W19-2209. doi:10.18653/v1/W19-2209.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.-J.</given-names>
            <surname>Martínez-Méndez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>López-Carreño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-A.</given-names>
            <surname>Pastor-Sánchez</surname>
          </string-name>
          ,
          <article-title>Open data en las administraciones públicas españolas: categorías temáticas y apps</article-title>
          ,
          <source>Profesional de la información 23</source>
          (
          <year>2014</year>
          )
          <fpage>415</fpage>
          -
          <lpage>423</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rovera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Aprosio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Greco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lucchese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tonelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Antetomaso</surname>
          </string-name>
          ,
          <article-title>Italian legislative text classification for Gazzetta Ufficiale</article-title>
          , AI per la Pubblica Amministrazione workshop at Ital-IA (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T. D.</given-names>
            <surname>Prekpalaj</surname>
          </string-name>
          ,
          <article-title>The role of key words and the use of the multilingual eurovoc thesaurus when searching for legal regulations of the republic of croatia - research results</article-title>
          ,
          <source>in: 2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1470</fpage>
          -
          <lpage>1475</lpage>
          . doi:10.23919/MIPRO52101.2021.9597043.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Caled</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Won</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <article-title>A hierarchical label network for multi-label eurovoc classification of legislative contents, in: Digital Libraries for Open Knowledge: 23rd Interna-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-C.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Deep learning for extreme multi-label text classification</article-title>
          ,
          <source>in: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '17, Association for Computing Machinery, New York, NY, USA,
          <year>2017</year>
          , pp.
          <fpage>115</fpage>
          -
          <lpage>124</lpage>
          . URL: https://doi.org/10.1145/3077136.3080834. doi:10.1145/3077136.3080834.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Steinberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ebrahim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Turchi</surname>
          </string-name>
          ,
          <article-title>Jrc eurovoc indexer jex-a freely available multi-label categorisation tool</article-title>
          , arXiv preprint arXiv:1309.5223 (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Steinberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pouliquen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Widiger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ignat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Erjavec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tufiş</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Varga</surname>
          </string-name>
          ,
          <article-title>The JRCAcquis: A multilingual aligned parallel corpus with 20+ languages</article-title>
          ,
          <source>in: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)</source>
          ,
          <source>European Language Resources Association (ELRA)</source>
          , Genoa, Italy,
          <year>2006</year>
          . URL: http://www.lrec-conf.org/proceedings/ lrec2006/pdf/340_pdf.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>You</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mamitsuka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Attentionxml: Label tree-based attentionaware deep model for high-performance extreme multi-label text classification</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Rose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Rcv1: A new benchmark collection for text categorization research</article-title>
          ,
          <source>J. Mach. Learn. Res</source>
          .
          <volume>5</volume>
          (
          <year>2004</year>
          )
          <fpage>361</fpage>
          -
          <lpage>397</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>McAuley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Hidden factors and hidden topics: Understanding rating dimensions with review text</article-title>
          ,
          <source>in: Proceedings of the 7th ACM Conference on Recommender Systems</source>
          , RecSys '13,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2013</year>
          , p.
          <fpage>165</fpage>
          -
          <lpage>172</lpage>
          . URL: https://doi.org/10.1145/2507157.2507163. doi:10.1145/2507157.2507163.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>Enhancing navigation on wikipedia with social tags</article-title>
          ,
          <source>arXiv preprint arXiv:1202.5469</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Loza Mencía</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fürnkranz</surname>
          </string-name>
          ,
          <article-title>Eficient multilabel classification algorithms for large-scale problems in the legal domain</article-title>
          ,
          <year>2010</year>
          . URL: http://dx.doi.org/10.1007/978-3-642-12837-0_11. doi:10.1007/978-3-642-12837-0_11.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chalkidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fergadiotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Malakasiotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Androutsopoulos</surname>
          </string-name>
          ,
          <article-title>Large-scale multi-label text classification on EU legislation</article-title>
          , arXiv preprint arXiv:1906.02192 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , in: J. Burstein, C. Doran, T. Solorio (Eds.),
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          , Association for Computational Linguistics, Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>