<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Topic Modelling on the European AI Act</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ettore Carbone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alex Giulio Berton</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Purbasha Chowdhury</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Teresa Scantamburlo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Falcarin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ca' Foscari University of Venice</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>European Centre for Living Technology</institution>
          ,
          <addr-line>Ca' Bottacin, Dorsoduro 3911, Calle Crosera, 30123 Venice</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Trieste</institution>
          ,
          <addr-line>via Economo 12/3, 34123 Trieste</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>This work explores knowledge acquisition and representation tools for automatically creating a high-level model representation of the European regulation on artificial intelligence, commonly known as the AI Act. We used BERTopic to extract topics and compared it with other language models on the basis of the extracted topics and their representations. Natural language processing and comprehension of legal text is becoming important, as legal texts are often interconnected with a large number of other related materials. Therefore, legal text analysis requires technologies that are able to extract important topics and represent them in a comprehensive form, in order to correctly inform the requirements engineering process.</p>
      </abstract>
      <kwd-group>
        <kwd>Requirements Elicitation</kwd>
        <kwd>Legal Compliance</kwd>
        <kwd>Information extraction</kwd>
        <kwd>Topic Modelling</kwd>
        <kwd>AI ACT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        Our aim is to apply topic modelling to legal texts, with particular focus on EU directives that affect
software systems and services. We use the EU AI Act as a case study, as recent work on AI Act
compliance showed the limits of manual elicitation and validation of requirements from a large corpus
of legal text [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <sec>
        <title>2.1. The AI Act</title>
        <p>The AI Act is part of a broader EU strategy designed to enhance Europe’s potential to compete globally
in regulating the digital sector. A number of proposed laws would establish regulatory bodies at the
EU and member state level, with broad investigative and enforcement powers, including a European
Artificial Intelligence Board to oversee the AI Act, while a European Board for Digital Services is
responsible for the Digital Services Act.</p>
        <p>The AI Act proposal delineates four distinct risk categories and sets specific requirements accordingly.
These categories are:
• Unacceptable Risk (Title II - Art. 5 and following);
• High Risk (Title III - Art. 6 and following);
• Limited Risk (Art. 52);
• Minimal Risk / No Risk.
Companies developing or deploying high-risk AI systems must comply with various requirements,
including having an appropriate risk management system, logging capabilities, and human oversight
(see Chapter 2 of the AI Act). For systems deemed to pose an unacceptable risk, which are outright
prohibited, the Act provides explicit examples and exceptions, including the use of real-time
remote biometric identification in public spaces (such as facial recognition), social scoring systems
(classifying individuals based on behaviour, socio-economic status, or personal characteristics), and the
use of subliminal manipulation techniques targeting specific vulnerable groups (Art. 5).</p>
        <p>High-risk systems are permitted, but because they can negatively affect safety or fundamental
rights, they must comply with multiple requirements and undergo a compliance assessment throughout
their life cycle, both before and after deployment. High-risk systems are further divided into
two categories (Art. 6, Annexes II and III):
• AI systems intended to serve as safety components in products covered by the legislation listed
in Annex II, or subject to third-party ex-ante conformity assessment (e.g., toys, aviation, cars,
medical devices, and lifts).
• Stand-alone AI systems with mainly fundamental rights implications, listed in Annex III, that
will have to be registered in an EU database.</p>
        <p>Examples of high-risk systems include those related to critical infrastructure management, systems
used in hiring processes or employee ratings, credit scoring systems, and systems with critical impact on
law enforcement and the interpretation of law. For compliance purposes, the most recent version of the AI Act
divides AI systems into three main risk categories: Unacceptable, High, and General Purpose AI
(GPAI). Originally, Limited and Minimal Risk were separate categories, but the final version folds
these into general guidelines with minimal obligations. The compromise agreement dated 26 January
2024 formally introduces GPAI models (Articles 52a–52e), whose risk level needs to be assessed on a
case-by-case basis.</p>
        <p>Similar to the GDPR, proper data governance must be applied to users’ (and, more broadly, data
subjects’) data. Compared with the data protection regulatory framework, however, the AI Act goes further
by requiring data governance for data used in the training, testing, and validation of AI systems.</p>
      </sec>
      <sec id="sec-2-1">
        <title>2.2. Related work</title>
        <p>
          Topic modelling is used to extract common themes or to cluster similar documents, often serving
as a foundation for more complex NLP tasks in the legal domain, such as retrieving similar cases or
classifying legal documents. It can also assist legal experts in the annotation of legal texts by providing a
preliminary grouping of related materials, thereby improving efficiency in a labour- and time-intensive
task. Alternatively, topic modelling can be used to enhance datasets manually curated by experts,
leading to improved accuracy in document classification [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. It is also a relevant methodology to
automatically annotate metadata as demonstrated by Tuarob et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>
          More broadly, topic modelling contributes to ongoing efforts to leverage NLP techniques for the
analysis and management of legal texts (for a complete overview of the field, see the survey [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]),
or what has been called Legal AI [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Topic modelling of legal texts can be done in various ways.
A standard approach is based on Latent Dirichlet Allocation (LDA) techniques [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ], a generative
approach that models words in documents as being probabilistically sampled from underlying latent
variables [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>
          A more recent approach clusters vector representations of the text
(embeddings), where the vectors are created using transformer-based models [
          <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
          ]. Cabot et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]
have proposed a method to verify GDPR compliance in data processing agreements using NLP to
compare contractual language with mandatory legal provisions. Similarly, Lippi et al. [18] introduced
a new framework that uses machine learning and NLP techniques to evaluate legal compliance by
automatically extracting processes for obligations and constraints from the regulatory texts. Natural
Language Processing (NLP) approaches to the analysis of legal documents have largely relied on topic
modelling. Motivated by the overabundance of often interlinked materials, this technique has been
applied across various pieces of legislation ranging from court decisions [19] to national and regional
statutory laws [20, 21].
        </p>
        <p>A recent study investigated the effectiveness of combining topic modelling techniques with
contextualized embeddings and various preprocessing strategies to organize and analyse large collections
of Brazilian legal documents across diverse formats and lengths [22]. Another related study applies
topic modelling to improve the semantic retrieval and summarization of court judgments, enabling
more effective matching of user queries and highlighting the most relevant content to enhance search
efficiency and user understanding [23].</p>
        <p>Our work applies and compares topic modelling techniques to the analysis of the European AI
Act, with the aim of exploring how extracted topics could be used to represent the legal text and,
possibly, support preliminary phases of legal compliance. For example, topic modelling could serve
as a foundation for creating or extending legal ontologies or for building structured representations
of legal texts that support machine-readable compliance frameworks, similar to approaches used for
cybersecurity regulation in IoT governance [24].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>In our work, the pipeline adopted to retrieve the topics from the whole legal text is divided into the
following two main steps:
1. Segmentation: the text was segmented into shorter passages to serve as input for topic modelling.
2. Topic modelling: we applied a topic modelling approach to cluster similar chunks of the legal text
and present them with a human-understandable representation.</p>
      <p>For the segmentation step, we applied three different approaches in order to produce different topic
representations of the whole original text.</p>
      <p>
        We design our experiments to mimic what an average skilled practitioner might want to apply to
create a topical representation of a legal text, and therefore we employ techniques that are widely
adopted among the machine-learning community and that are easily accessible. In particular, to perform
topic modelling we applied: Latent Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], BERTopic [25], and prompting of Large Language
Models (LLMs) [26], specifically ChatGPT-4o [27].
      </p>
      <sec id="sec-3-1">
        <title>3.1. Text segmentation</title>
        <p>To prepare the dataset for topic modelling, we retrieved the text of the AI Act from the
official web page of the legislation and split it into small chunks. By examining the HTML structure of the document, we devised a
hierarchical division, where nested tags represented different levels of paragraphs. First, we separated
the recitals, enacting terms, and annexes. When present, we further divided the content into chapters
and sections based on the document structure. The individual articles were easily identifiable, as they
were enclosed in div tags, and their internal structure followed three distinct levels of depth using
table and p tags. The final output was a table in which each row contained a paragraph along with its
corresponding metadata (e.g. the part and article it belongs to) and a unique identifier. We chose the
numbered paragraph as the unit of our topic analysis, as it represents a coherent piece of text conveying
self-contained semantics. A summary of the dataset’s statistics is presented in Table 1.</p>
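        <p>The segmentation described above can be sketched with Python's standard html.parser module. This is a minimal illustration rather than the script used in this work: the sample markup, the div ids, and the names ParagraphCollector and segment are assumptions made for the example, and the real page nests table and p tags more deeply.</p>
        <preformat><![CDATA[
```python
from html.parser import HTMLParser

# Toy markup standing in for the official page; ids and layout are assumptions.
SAMPLE_HTML = """
<div id="art_1">
  <p>1. The purpose of this Regulation is to improve the internal market.</p>
  <p>2. This Regulation lays down harmonised rules.</p>
</div>
<div id="art_2">
  <p>1. This Regulation applies to providers.</p>
</div>
"""


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> with the id of the enclosing <div>."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one dict per paragraph
        self._divs = []       # ids of the currently open <div> elements
        self._buffer = None   # text pieces of the <p> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._divs.append(dict(attrs).get("id"))
        elif tag == "p":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Normalise whitespace and store the paragraph with its metadata.
            text = " ".join("".join(self._buffer).split())
            if text:
                article = next((d for d in reversed(self._divs) if d), None)
                self.rows.append(
                    {"id": len(self.rows), "article": article, "text": text}
                )
            self._buffer = None
        elif tag == "div" and self._divs:
            self._divs.pop()


def segment(html):
    """Return one row per paragraph found in the given HTML string."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = segment(SAMPLE_HTML)
for row in rows:
    print(row["id"], row["article"], row["text"])
```
]]></preformat>
        <p>Each row carries a unique identifier, the enclosing article, and the paragraph text, which mirrors the output table in a simplified form.</p>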
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Topic modelling</title>
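        <p>We assume that the segmentation technique employed outputs a set of paragraphs <inline-formula><tex-math>\{p_i\}_{i=0}^{N}</tex-math></inline-formula>, where <inline-formula><tex-math>p_i</tex-math></inline-formula>
represents the <inline-formula><tex-math>i</tex-math></inline-formula>-th paragraph, and <inline-formula><tex-math>N</tex-math></inline-formula> is the total number of paragraphs; in our case, <inline-formula><tex-math>N = 2594</tex-math></inline-formula>. Note
that these correspond to the numbered paragraphs in the text of the AI Act. We assume that each <inline-formula><tex-math>p_i</tex-math></inline-formula> can
be associated with at least one topic, and to allow the comparison between different topic modelling
strategies, we further assume that each paragraph <inline-formula><tex-math>p_i</tex-math></inline-formula> can be broadly represented with only one topic <inline-formula><tex-math>t_k</tex-math></inline-formula>,
with <inline-formula><tex-math>1 \leq k \leq K</tex-math></inline-formula>, where <inline-formula><tex-math>K</tex-math></inline-formula> is the total number of topics present in the collection of paragraphs. Thus, we
define a topic modelling technique as a function <inline-formula><tex-math>f</tex-math></inline-formula> which, given a paragraph <inline-formula><tex-math>p_i</tex-math></inline-formula>, returns a topic <inline-formula><tex-math>t_k</tex-math></inline-formula>, i.e.,
<inline-formula><tex-math>f(p_i) = t_k</tex-math></inline-formula>. In the following subsections, we provide a brief description of the three methods adopted.</p>
        <p>LDA. LDA is a standard technique for topic modelling, first presented by
Blei et al. [28] in 2001. It is still considered one of the main baselines for comparative
analysis, and a vast literature of applications and improved implementations has been developed in
recent years [29].</p>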
        <p>LDA, using a standard statistical terminology, is a mixture model, in which each document (or
paragraph) is a mixture of topics, and each topic is a mixture of words. The model is usually described as
a Bayesian Network, and in our experiments, we adopted an online variational Bayes algorithm [30] to
approximate the target posterior distributions using the implementation provided by scikit-learn [31].</p>
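        <p>Since LDA describes every paragraph as a mixture of topics, a single label per paragraph can be obtained by keeping the most probable topic. The sketch below assumes a hypothetical doc_topic matrix of per-paragraph topic probabilities, such as the one a fitted LDA model returns; the numbers are invented for illustration, the function name is hypothetical, and topic indices are 0-based.</p>
        <preformat><![CDATA[
```python
def assign_topics(doc_topic):
    """Return, for each row of probabilities, the index of the largest entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


# Hypothetical topic distributions for three paragraphs over three topics.
doc_topic = [
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
    [0.30, 0.40, 0.30],
]

labels = assign_topics(doc_topic)
print(labels)
```
]]></preformat>
        <p>When two topics tie, this sketch keeps the one with the lower index; a different tie-breaking rule could be substituted.</p>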
        <p>It should be noted that LDA does not associate a single topic with each paragraph; instead, it associates a
probability distribution over topics. However, to allow a better comparison with the other
strategies, we pick the topic with the highest probability as the topic associated with the paragraph.
Formally, let <inline-formula><tex-math>P_{LDA}(p_i, t_k)</tex-math></inline-formula> be the LDA estimate of the probability that the topic <inline-formula><tex-math>t_k</tex-math></inline-formula> is present
in a specific paragraph <inline-formula><tex-math>p_i</tex-math></inline-formula>. Then, given a paragraph <inline-formula><tex-math>p_i</tex-math></inline-formula>, the LDA-based topic model <inline-formula><tex-math>f_{LDA}</tex-math></inline-formula> returns the
topic with the highest probability among all the available topics:
<disp-formula><tex-math>f_{LDA}(p_i) = \arg\max_{t_k} P_{LDA}(p_i, t_k), \quad 1 \leq k \leq K.</tex-math></disp-formula></p>
        <p>BERTopic. The BERTopic [25] approach is instead based on the assumption that the vector
representation produced by an encoder-only pre-trained language model, such as BERT, can represent the
paragraphs semantically in the vector space created at the end of the encoding phase.</p>
        <p>In particular, BERTopic assigns topics to a set of paragraphs through the following steps:
1. The paragraphs are encoded into the vector space (also called embedding space).
2. The embeddings are down-projected to a lower-dimensional space.
3. The down-projected embeddings are clustered together using a specific similarity measure
between vectors.
4. A representation for each cluster is provided to the end user.</p>
        <p>For the first step, we used the default SBERT model to generate the embeddings of the documents. For
the second step, we decided to use CountVectorizer from scikit-learn, removing English stop words. For
the third step, we used K-means, experimenting with different numbers of clusters and selecting the
one that yielded the highest in-group similarity.</p>
        <p>LLM Prompting. The last strategy we explored leverages the direct use of a large language model
(LLM) to assign topics to each paragraph through prompt engineering. In this case, we employed
ChatGPT-4o. The motivation for adopting this approach lies in its accessibility: unlike traditional topic
modelling algorithms, which often require technical expertise for data preparation, parameter tuning,
and model evaluation, an LLM-based solution can, in principle, be used by any layperson without
specialized knowledge. Through a simple natural language prompt, non-experts can obtain a high-level
overview of the topics addressed in the AI Act, as well as identify where these topics are discussed
across the text. In our research, we wanted to simulate this possibility and compare it with the other
methods. For this experiment, we provided ChatGPT-4o with the following instruction: “Perform topic
modelling of the column ’text’ of the attached data file in CSV format and provide the result of the topic
modelling in a CSV format including topic number, keywords, and number of documents associated.”
Notably, we did not supply any example of the expected output format, but the model successfully
produced the requested results in the desired structure.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental evaluation</title>
      <p>All experiments were conducted using the Python programming language, employing several libraries
to facilitate model implementation and evaluation. For the BERTopic-based approach, we utilized
the BERTopic library, with KMeans as the clustering algorithm and CountVectorizer (both from the
scikit-learn library) employed for vectorization, removing English stop words from the input.</p>
      <p>For the Latent Dirichlet Allocation (LDA) model, we also relied on the implementation provided by
scikit-learn. Hyperparameters such as the maximum number of iterations and the learning method
were optimized through empirical testing across multiple runs.</p>
      <p>To determine the optimal number of topics for both BERTopic and LDA, we developed a heuristic
algorithm that systematically evaluated numbers of clusters ranging from 5 to 10. For each candidate number of
clusters, we computed a cohesion score based on the average pairwise similarity of the topic keywords.
Specifically, we encoded the keywords using Sentence Transformer embeddings and calculated their
pairwise cosine similarity, averaging these scores first within each topic and then across all topics at a given
number of clusters. The number of topics yielding the highest average similarity score was selected, as it
indicates greater internal coherence.</p>
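      <p>The selection heuristic can be sketched as follows. The sketch assumes that the keyword embeddings are already available as plain sequences of floats: the two-dimensional toy vectors stand in for Sentence Transformer embeddings, the candidate sizes are smaller than the 5 to 10 range used in the experiments, and all function names are hypothetical.</p>
      <preformat><![CDATA[
```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_cohesion(keyword_vectors):
    """Average pairwise cosine similarity among one topic's keyword vectors."""
    pairs = list(combinations(keyword_vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def model_cohesion(topics):
    """Average of the per-topic cohesion scores for one candidate model."""
    return sum(topic_cohesion(t) for t in topics) / len(topics)


def select_num_topics(candidates):
    """Pick the number of topics whose model has the highest cohesion.

    candidates maps a number of topics to a list with one entry per topic,
    each entry being the list of that topic's keyword vectors.
    """
    scores = {k: model_cohesion(topics) for k, topics in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy keyword vectors: the 2-topic model has perfectly aligned keywords.
candidates = {
    2: [
        [(1.0, 0.0), (1.0, 0.0)],
        [(0.0, 1.0), (0.0, 2.0)],
    ],
    3: [
        [(1.0, 0.0), (0.0, 1.0)],
        [(1.0, 1.0), (1.0, -1.0)],
        [(1.0, 0.0), (1.0, 0.0)],
    ],
}

best_k, scores = select_num_topics(candidates)
print(best_k, scores)
```
]]></preformat>
      <p>Averaging within each topic before averaging across topics gives every topic the same weight, regardless of how many keywords it has.</p>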
      <p>In the case of ChatGPT-4o, no constraints were imposed on the number of topics. The large language
model was prompted directly and allowed to determine both the number of topics and their composition
based on its internal understanding of the text. Following topic assignment at the paragraph level,
post-processing was performed using the Pandas library to aggregate the results by topic, enabling
consistent comparison with the outputs generated by the other methodologies.</p>
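      <p>The aggregation step amounts to grouping the paragraph-level assignments by topic. The experiments used Pandas for this; the sketch below uses only the Python standard library as a stand-in, and the paragraph identifiers, topic numbers, and function name are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Hypothetical paragraph-level output: (paragraph id, assigned topic).
assignments = [
    (0, 1),
    (1, 3),
    (2, 1),
    (3, 2),
]


def aggregate_by_topic(rows):
    """Group paragraph ids by topic and report the size of each group."""
    groups = defaultdict(list)
    for paragraph_id, topic in rows:
        groups[topic].append(paragraph_id)
    return {
        topic: {"n_paragraphs": len(ids), "paragraphs": ids}
        for topic, ids in sorted(groups.items())
    }


summary = aggregate_by_topic(assignments)
for topic, info in summary.items():
    print(topic, info["n_paragraphs"], info["paragraphs"])
```
]]></preformat>
      <p>The per-topic counts produced this way are the quantities needed to compare topic distributions across methods.</p>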
      <sec id="sec-4-1">
        <title>4.1. Analysis of results</title>
        <sec id="sec-4-1-1">
          <p>We decided to present the results as the top ten keywords per topic, as shown in Table 2.</p>
          <p>The distribution of topics for each methodology is shown in Figure 1. The three approaches produced
a different number of topics: BERTopic generated 6 topics, LDA produced 10 topics, and ChatGPT
identified 5 topics. As can be observed, the BERTopic model produces an almost uniform distribution
across topics, with the exception of Topic 1, which contains a significantly higher number of paragraphs
compared to the others. ChatGPT yields similar results, although two of its topics include more than
250 paragraphs each. Conversely, the LDA model generates the highest number of topics, resulting in a
more uneven distribution: Topic 1 includes approximately 300 paragraphs, while Topics 5 and 6 contain
fewer than 10 paragraphs each.</p>
          <p>A preliminary qualitative analysis of the identified topics and their corresponding clustered
paragraphs revealed mixed results. The topic modelling outcomes were manually examined and discussed
in meetings among the authors. A more in-depth analysis (e.g., through coding and thematic analysis)
is planned for future work to obtain more fine-grained insights into the results of the topic modelling
techniques. Qualitative analysis suggests that, in some cases, the topic keywords were well-aligned with
the content of the assigned paragraphs, indicating meaningful and coherent clusters. For example, in
the case of Topic 6 from the LDA division, the content of paragraph 3 of article 2 shows high coherence
with the identified keywords. However, for other topics, the clustered paragraphs were considerably
more heterogeneous, suggesting that the models may have grouped together paragraphs covering
diverse or loosely related content under a single topic label. For instance, recital 98 states: “Large
generative AI models are a typical example of a general-purpose AI model, given that they allow for
flexible generation of content, such as in the form of text, audio, images or video, that can readily
accommodate a wide range of distinctive tasks.” This paragraph was grouped under Topic 5, even
though none of the top keywords appeared in the text and its content lacked coherence with the other
documents assigned to that topic.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future work</title>
      <p>
        In this paper, we propose a method to perform topic mining on the legal articles of the European AI
Act. We used BERTopic, LDA and ChatGPT-4o to perform topic analysis, with the goal of analysing
legal requirements using a higher-level representation. We also aimed to compare different topic
modelling approaches in order to test their behaviour on the same task and assess their limits. Building
on approaches such as those proposed by Hagen et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], domain experts could assess both the
quality of the generated topic keywords and the coherence between these keywords and the associated
paragraphs.
      </p>
      <p>Qualitative analysis has been used to create formal taxonomies out of cybersecurity reports in natural
language [32]. Similarly, automated topic modelling could support the development of such formal
representations for cybersecurity [33], but also for legal texts, such as the AI Act. This hybrid approach
may serve as a premise for constructing a structured knowledge base that facilitates systematic
exploration of the regulation’s thematic content. Additionally, we aim to apply topic modelling to
other legal directives, such as NIS2 and cyber-resilience legislation, as well as to the software requirement specifications
of different enterprises. The methodologies of this work could also serve as groundwork for
other tasks, such as semantic search as proposed by Ma et al. [34], summarization as done by Haghighi
et al. [35], or as a guiding tool in the construction of knowledge graphs from the AI Act and other legal
directives [36], in order to improve the translation from unstructured text to a data model for requirement
compliance analysis.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The authors want to thank Dr Alberto Veneri for his useful insights and for reviewing our work.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <sec id="sec-7-1">
        <p>The author(s) have not employed any Generative AI tools.</p>
        <p>[18] M. Lippi, P. Pałka, G. Contissa, F. Lagioia, H.-W. Micklitz, G. Sartor, P. Torroni, Claudette: an
automated detector of potentially unfair clauses in online terms of service, Artificial Intelligence
and Law 27 (2019) 117–139.
[19] B. J. U. Razon, G. A. Solano, L. T. B. Ranera, Topic modelling supreme court case decisions
using latent dirichlet allocation, in: 2022 13th International Conference on Information and
Communication Technology Convergence (ICTC), IEEE, 2022, pp. 284–289.
[20] J. O’Neill, C. Robin, L. O’Brien, P. Buitelaar, An analysis of topic modelling for legislative texts,
in: Proc of the 2nd Workshop on Automated Semantic Analysis of Information in Legal Texts
(ASAIL’17), CEUR Workshop Proceedings, London, UK, 2017.
[21] A. Dyevre, M. Glavina, M. Ovádek, The voices of european law: legislators, judges and law
professors, German Law Journal 22 (2021) 956–982.
[22] D. Vianna, E. S. de Moura, A. S. da Silva, A topic discovery approach for unsupervised organization
of legal document collections, Artificial Intelligence and Law 32 (2024) 1045–1074.
[23] T.-H. Wu, B. Kao, F. Chan, A. S. Cheung, M. M. Cheung, G. Yuan, Y. Chen, Semantic search and
summarization of judgments using topic modeling, in: Legal Knowledge and Information Systems,
IOS Press, 2021, pp. 100–106.
[24] S. S. Chennu, L. Elluri, G. Batra, Bridging AI and legal compliance: Knowledge graphs for IoT
cybersecurity regulations, in: AMCIS 2025 Proceedings, 22, Assoc. for Information Systems, 2025.
[25] M. Grootendorst, Bertopic: Neural topic modeling with a class-based tf-idf procedure, arXiv
preprint arXiv:2203.05794 (2022).
[26] Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang, W. Ye,
Y. Zhang, Y. Chang, P. S. Yu, Q. Yang, X. Xie, A survey on evaluation of large language models,
ACM Trans. Intell. Syst. Technol. 15 (2024). doi:10.1145/3641289.
[27] OpenAI, Hello GPT-4o, https://openai.com/index/hello-gpt-4o/, 2025. Accessed: 2025-06-10.
[28] D. Blei, A. Ng, M. Jordan, Latent Dirichlet allocation, Advances in neural information processing
systems 14 (2001).
[29] U. Chauhan, A. Shah, Topic modeling using latent dirichlet allocation: A survey, ACM Comput. Surv. 54 (2021). doi:10.1145/3462478.
[30] M. D. Hoffman, D. M. Blei, C. Wang, J. Paisley, Stochastic variational inference, J. Mach. Learn. Res. 14 (2013) 1303–1347.
[31] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,
R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay,
Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011).
[32] M. Ceccato, P. Tonella, C. Basile, P. Falcarin, M. Torchiano, B. Coppens, B. De Sutter, Understanding
the behaviour of hackers while performing attack tasks in a professional setting and in a public
challenge, Empirical Software Engineering 24 (2019) 240–286.
[33] P. Falcarin, F. Dainese, Building a cybersecurity knowledge graph with Cybergraph, in: Proc of
the 2024 ACM/IEEE 4th International Workshop on Engineering and Cybersecurity of Critical
Systems (EnCyCriS) and 2024 IEEE/ACM Second International Workshop on Software Vulnerability,
EnCyCriS/SVM ’24, ACM, 2024, pp. 29–36. doi:10.1145/3643662.3643962.
[34] B. Ma, N. Zhang, G. Liu, L. Li, H. Yuan, Semantic search for public opinions on urban affairs: A
probabilistic topic modeling-based approach, Information Processing &amp; Management 52 (2016).
[35] A. Haghighi, L. Vanderwende, Exploring content models for multi-document summarization,
in: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North
American Chapter of the Association for Computational Linguistics, NAACL ’09, NAACL, 2009.
[36] P. Falcarin, P. Chowdhury, E. Carbone, T. Scantamburlo, R. Tripodi, S. Vascon, Legal
requirements compliance using NLP and Knowledge Graphs, in: Proc. of the 1st Intern. Workshop on
Requirements Engineering for Accountable and Conscious Human-centered AI, REACH-AI 25,
IEEE, 2025.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>European</given-names>
            <surname>Union</surname>
          </string-name>
          ,
          <article-title>Laying down harmonised rules on artificial intelligence (artificial intelligence act) and amending certain union legislative acts</article-title>
          ,
          <year>2021</year>
          . URL: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A52021PC0206.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bertaina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Biganzoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Desiante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fontanella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Inverardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. G.</given-names>
            <surname>Penco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Cosentini</surname>
          </string-name>
          ,
          <article-title>Fundamental rights and artificial intelligence impact assessment: A new quantitative methodology in the upcoming era of AI act</article-title>
          ,
          <source>Computer Law &amp; Security Review</source>
          <volume>56</volume>
          (
          <year>2025</year>
          )
          <fpage>106101</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <collab>UN Office of the Secretary-General's Envoy on Technology</collab>
          ,
          <article-title>High-level advisory body on AI</article-title>
          ,
          <year>2023</year>
          . URL: https://www.un.org/techenvoy/ai-advisory-body.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Burrell</surname>
          </string-name>
          ,
          <article-title>How the machine 'thinks': Understanding opacity in machine learning algorithms</article-title>
          ,
          <source>Big Data &amp; Society</source>
          <volume>3</volume>
          (
          <year>2016</year>
          )
          <fpage>2053951715622512</fpage>
          . doi:10.1177/2053951715622512.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Barocas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Selbst</surname>
          </string-name>
          ,
          <article-title>Big data's disparate impact</article-title>
          ,
          <source>SSRN Electronic Journal</source>
          (
          <year>2016</year>
          ). doi:10.2139/ssrn.2477899.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Scantamburlo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Charlesworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cristianini</surname>
          </string-name>
          ,
          <article-title>Machine decisions and human consequences</article-title>
          , CoRR abs/1811.06747 (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1811.06747. arXiv:1811.06747.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Scantamburlo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Falcarin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Veneri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fabris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gallese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Billa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rotolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Marcuzzi</surname>
          </string-name>
          ,
          <article-title>Software systems compliance with the AI Act: Lessons learned from an international challenge</article-title>
          , in:
          <source>Proc. of the 2nd International Workshop on Responsible AI Engineering</source>
          , RAIE '24,
          <publisher-name>ACM</publisher-name>
          ,
          <year>2024</year>
          , pp.
          <fpage>44</fpage>
          -
          <lpage>51</lpage>
          . doi:10.1145/3643691.3648589.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L. J. G.</given-names>
            <surname>Freitas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Edokawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farias</surname>
          </string-name>
          ,
          <article-title>Text clustering applied to data augmentation in legal contexts</article-title>
          ,
          <source>arXiv preprint arXiv:2404.08683</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tuarob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Pouchard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Giles</surname>
          </string-name>
          ,
          <article-title>A generalized topic modeling approach for automatic document annotation</article-title>
          ,
          <source>International Journal on Digital Libraries</source>
          <volume>16</volume>
          (
          <year>2015</year>
          )
          <fpage>111</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ariai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <article-title>Natural language processing for the legal domain: A survey of tasks, datasets, models, and challenges</article-title>
          ,
          <source>ACM Comput. Surv.</source>
          <volume>1</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence</article-title>
          , in:
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schluter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tetreault</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>5218</fpage>
          -
          <lpage>5230</lpage>
          . URL: https://aclanthology.org/2020.acl-main.466/.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mandal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Measuring similarity among legal court case documents</article-title>
          , in:
          <source>Proceedings of the 10th Annual ACM India Compute Conference</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <article-title>Content analysis of e-petitions with topic modeling: How to train and evaluate LDA models?</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>54</volume>
          (
          <year>2018</year>
          ). doi:10.1016/j.ipm.2018.05.006.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <article-title>Latent Dirichlet allocation</article-title>
          ,
          <source>J. Mach. Learn. Res.</source>
          <volume>3</volume>
          (
          <year>2003</year>
          )
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Silveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Fernandes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Araujo</given-names>
            <surname>Monteiro Neto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Furtado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Pimentel Filho</surname>
          </string-name>
          ,
          <article-title>Topic modelling of legal documents via Legal-BERT</article-title>
          , in:
          <source>Proc. of the 1st Intern. Workshop RELATED - Relations in the Legal Domain 2021</source>
          ,
          <year>2021</year>
          . doi:10.2139/ssrn.4539091.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H.</given-names>
            <surname>Sargeant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Izzidien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Stefek</surname>
          </string-name>
          ,
          <article-title>Topic classification of case law using a large language model and a new taxonomy for UK law: AI insights into summary judgment</article-title>
          ,
          <source>Artificial Intelligence and Law</source>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.-L. H.</given-names>
            <surname>Cabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <article-title>REBEL: Relation extraction by end-to-end language generation</article-title>
          , in:
          <source>Findings of the Association for Computational Linguistics: EMNLP 2021</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2370</fpage>
          -
          <lpage>2381</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>