<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Leveraging Pre-trained LLMs for GDPR Compliance in Online Privacy Policies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giovanni Ciaramella</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Petrillo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Margaret Varilek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Mercaldo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Comandé</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Martinelli</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IMT School for Advanced Studies Lucca</institution>
          ,
          <addr-line>Lucca</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for High Performance Computing and Networking of CNR</institution>
          ,
          <addr-line>Rende</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Informatics and Telematics of CNR</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Sant'Anna School of Advanced Studies</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Molise</institution>
          ,
          <addr-line>Campobasso</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This article explores the use of Large Language Models (LLMs) to determine whether online privacy policies comply with the General Data Protection Regulation (GDPR), since privacy policies do not always adhere to all relevant GDPR requirements. It proposes a method to classify privacy policies as compliant or not with a single duty within Article 13(2)(b) of the GDPR, which mandates that data subjects be informed of their right to rectification or erasure of personal data. To this end, we employed several LLMs, such as BERT-base-uncased, roBERTa-base, distilBERT-base-uncased, t5-base, and ERNIE-2.0-base-en, on a dataset built by the authors from European website domains. Once the dataset was built, a legal expert from our research team manually classified a set of privacy policies to support the contextual sentence similarity task. As the final step, we used a set of unseen privacy policies to test the models, obtaining results that demonstrate moderate accuracy in identifying compliant phrases using similarity-score thresholds. Future research could expand the analysis to encompass other GDPR requirements and refine the models.</p>
      </abstract>
      <kwd-group>
<kwd>LLM</kwd>
        <kwd>Privacy Policy</kwd>
        <kwd>Compliance</kwd>
        <kwd>Legal</kwd>
        <kwd>AI</kwd>
        <kwd>Cybersecurity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the European Union, privacy policies communicated by the data controller or processor to inform data
subjects visiting a website cannot easily be evaluated in terms of “compliant” versus “non-compliant”
by a human being. There may be aspects within a policy that might be compliant, and aspects which
are likely to be considered not aligned with the General Data Protection Regulation (EU) 2016/679.
GDPR compliance and privacy policy compliance are not black-and-white concepts to be
determined for the entirety of a policy based on the face of its text alone. Therefore,
to be “checkable” this bigger goal had to be broken down into smaller, verifiable steps. GDPR article
13(2)(b) (duty to inform the data subject of the right to rectification or erasure) was chosen to be tested
as a first experiment. Since this task can be challenging for a human being, it can also be difficult
for an AI model. However, it is still possible to leverage powerful Large Language
Models (LLMs) to assist humans in performing this time-consuming task. These models can be trained
to dissect the text of privacy policies into specific terms and clauses and flag those which are relevant
to selected articles of the GDPR. Additionally, they can analyze the phrasing within privacy policies
and compare this to the language of the GDPR. They can spot inconsistencies, missing information or
overly vague language that could fall short of compliance benchmarks. LLMs have the advantage of
working with many texts in a short amount of time, making it possible to look at several privacy policies
from various organizations or services at once. This scalability is helpful for big companies looking to
ensure compliance over separate platforms, such as Google, which operates various services in Europe:
Google Search, YouTube, and Google Drive, each requiring adherence to different privacy policies.
Using these capabilities, the goal is that eventually enterprises could utilize LLMs to support their
compliance activities, mitigate risks, and confirm that their privacy agreements meet the jurisdiction’s
legal requirements, e.g., the GDPR. We chose the most suitable LLM to identify the language corresponding to
the roBERTa-selected text in privacy policies. This would then allow testing whether a sentence appears
"compliant" with a single requirement. In order to achieve this goal, we employed state-of-the-art
text embedding models (like all-mpnet-base-v2 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and all-distilroBERTa-v1 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). They excel at tasks
that require understanding the semantic meaning of sentences, such as semantic similarity, clustering,
and information retrieval. In addition, we relied on LLMs (like state-of-the-art BERT and variants)
since they are designed for a broader range of tasks related to natural language understanding and
generation, such as text generation, summarization, translation, and question answering. Preliminary
results show that text classifiers returned “compliant” phrases, as verified by a human check. This
step was performed by calculating three similarity scores (cosine similarity, dot product, and Euclidean
distance) between the manually selected privacy policy sentences and Art. 13(2)(b). These metric
values are used to determine whether a given policy sentence complies with GDPR Article 13(2)(b). Future
research could expand to additional GDPR requirements that make up a typical privacy policy, such as
GDPR Art. 13(2)(d), which requires data subjects to be informed of the right to lodge a complaint with a
supervisory authority. An important limit to bear in mind in this research process of using LLMs to
check compliance of policies is that the quality of the result must be demonstrable in the words of the
text alone.
      </p>
<p>The paper proceeds as follows: the next section gives an overview of the state of the art on
the adoption of AI techniques in the legal field; Section 3 presents the proposed method; Section 4 reports
the results of the experimental evaluation; finally, conclusions
and future research lines are drawn in the last section.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
Over the years, Artificial Intelligence has played a crucial role in several fields. AI has a profound impact,
primarily through Natural Language Processing (NLP) [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] and Large Language Models (LLMs) [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
In this Section, we provide a review of the literature, highlighting contributions to the
state-of-the-art in which experts employ AI techniques in the legal field.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], researchers presented a new model named ITALIAN LEGAL-BERT, which can interpret the
semantic nuances of Italian legal texts with unprecedented accuracy. In detail, they proposed two
versions of that model using two different methods of domain adaptation: one based on Italian civil
cases and another on Italian legal documents based on the CamemBERT architecture. Differently from
them, in our experiment, we proposed the usage of LLMs to verify the compliance of online privacy
policies, using a small dataset manually validated by a legal expert. Once the validation phase was
concluded, we used the dataset to fine-tune five different state-of-the-art models. After
the appearance of LLMs, the approach changed significantly, at least partially abandoning the BERT
architectures.
      </p>
      <p>
        Rodriguez et al. in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] presented a study in which the researchers employed LLMs to analyze privacy
policies by automating the extraction and interpretation of critical information at scale. In detail, the
authors reported the performance obtained using ChatGPT and Llama 2, i.e., popular LLM-based tools. At
the end of the experiments, they achieved interesting results. Differently from our work, the researchers
employed four different datasets (MAPP, OPP-115, APP-350, and IT-100) belonging to several fields,
whereas our study focused only on the privacy policies of websites with European domains, written in
English. The privacy policies were selected by the criterion that they were typical in the content and
style of phrasing of privacy policies found online, so that they would be a more representative sample.
The text of online privacy policies tends to use standard phrases, and therefore the samples that
were selected are consistent with the numerous policies that were read for this exercise. Moreover, we
employed a large set of LLMs such as BERT-base-uncased, roBERTa-base, distilBERT-base-uncased,
t5-base, and ERNIE-2.0-base-en.
      </p>
      <p>
        Garza et al. in [
        <xref ref-type="bibr" rid="ref9">9</xref>
] designed a novel approach named PrivComp-KG to manage and verify privacy policy
compliance efficiently. In detail, they leveraged LLMs and Retrieval Augmented Generation (RAG) to
identify relevant privacy policy sections and match them with the corresponding legal rules, achieving
high accuracy. In our proposed method, on the other hand, instead of using an open-science
dataset, we created a dataset from scratch by downloading several privacy policies from the web (using
a Python script written by the authors), which were then manually validated by the legal expert on our
research team; we selected a specific clause, GDPR Article 13(2)(b), which imposes the duty to inform the
data subject of the right to rectification and erasure of personal data.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. The Method</title>
      <p>This Section provides an overview of the proposed method to classify a privacy policy that complies
with Art.13(2)(b) of the GDPR. Figure 1 illustrates the steps performed in detail.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset Composition</title>
<p>To automate the evaluation of whether a given privacy policy complies with Art. 13(2)(b) of the GDPR,
as a preliminary stage, we developed a Python script to retrieve the privacy policy page of several
websites located within the EU. In detail, we defined a list of slugs
typically employed by websites to store privacy policies, such as /privacy, /privacy-policy, /privacy-notice,
/privacy-statement, /legal/privacy-policy, /legal/privacy, /policies/privacy, /about/privacy, /about-us/privacy,
/info/privacy, /support/privacy, /help/privacy, /terms/privacy, /resources/privacy, and /company/privacy.
Once we identified these websites, considering only websites with European domains, we employed
another Python script developed by the authors to extract the text from the retrieved URLs. The script
was designed to clean and format the extracted content, removing webpage elements such as buttons
and other non-text components. In detail, we considered only privacy policies in English, which were
divided into sentences, removing stop words and punctuation marks.</p>
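<p>The collection step above can be sketched as follows. This is a minimal illustration rather than the authors' actual script: the slug list is taken from the text, while the function names, the sentence-splitting regex, and the stop-word handling are our assumptions.</p>

```python
import re
from urllib.parse import urljoin

# Slug list from Section 3.1; helper names and cleaning details are our assumptions.
PRIVACY_SLUGS = [
    "/privacy", "/privacy-policy", "/privacy-notice", "/privacy-statement",
    "/legal/privacy-policy", "/legal/privacy", "/policies/privacy",
    "/about/privacy", "/about-us/privacy", "/info/privacy", "/support/privacy",
    "/help/privacy", "/terms/privacy", "/resources/privacy", "/company/privacy",
]

def candidate_urls(domain):
    """Build the candidate privacy-policy URLs to probe for one website."""
    base = "https://" + domain
    return [urljoin(base, slug) for slug in PRIVACY_SLUGS]

def clean_policy_text(text, stop_words):
    """Split extracted policy text into sentences, lowercase them, and drop
    stop words and punctuation, mirroring the preprocessing in Section 3.1."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    cleaned = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        kept = [t for t in tokens if t not in stop_words]
        if kept:
            cleaned.append(" ".join(kept))
    return cleaned
```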
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Manual Classification</title>
        <p>After composing the dataset and converting all of the privacy policies into text files, we moved on to
the next stage of the experiment: the manual classification phase to ensure a qualitative dataset. Human
annotators can guarantee that labels are accurate and consistent, which is crucial for training effective
machine learning models, especially in a supervised learning task as delicate as legal compliance.
Additionally, this process can provide transparency and help others understand how the dataset was
created. Given that this type of task is time-consuming and resource-intensive, we decided to take a
subset of privacy policies retrieved (10 out of 50). Due to the small amount of data (less than 30 samples),
we calculated the confidence interval using the t-distribution. The calculated confidence interval
(5.91,8.29) represents the range within which we are 95% confident the true mean of the population
lies. This interval is based on the sample mean (7.1) and the sample standard deviation (1.66), statistical
estimates which were derived from our data. The critical t-value (2.262) at a 95% confidence level sets the
margin of error, reflecting how much we expect the sample mean to vary from the population mean. The
resulting interval captures the inherent uncertainty in estimating population parameters from a small
sample and provides an actionable range for making informed decisions with high confidence. Once the
number of samples was identified, a legal expert in the group manually classified the ten selected privacy
policies. Regarding the manual classification phase, the first challenge, then, was to select a suitable
article from the GDPR to test the LLM and determine if its requirements were satisfied within the text
of online privacy policies. Article 13, “Information to be provided where personal data are collected
from the data subject” and Article 14, “Information to be provided where personal data have not been
obtained from the data subject” were the best choices in terms of checking for website information
because these two articles list the information that must be supplied to the data subject. Informing
data subjects of their rights (as part of the principle of transparency) is demonstrated by using clear
and plain language provided in an easy display, in a place where the data subject is likely to find it
(GDPR Rec. 39). More specifically, the clause 13(2)(b) “the existence of the right to request from the
controller access to and rectification or erasure of personal data” is a duty to inform of this right that
could be demonstrated as compliant on its face (meaning in its language alone). Essentially then this
means being compliant with the applicable articles of the GDPR that would be required to be present in
a privacy policy on a website that intends to gather or process personal data. Several sample privacy
policies that were viewed for this project were determined to be a mix of compliant and non-compliant
wording. A manual selection of privacy policies on websites was made using the criteria that they were
(1) in English and (2) in the EU. Other parameters that may help demonstrate “compliance”, such as
location on the website and size of font, are outside the scope of this research project. The selection
of privacy policies would serve as one small test scenario on which the method could be tested, and
further expanded based on positive results, or at least adjusted and measured against reliable results.</p>
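<p>For reference, the interval reported above can be reproduced with a short calculation. The sketch below uses the sample statistics from the text (n = 10, mean 7.1, standard deviation 1.66) and the stated critical t-value 2.262; the function name is ours.</p>

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """Confidence interval for the population mean from a small sample,
    using the t-distribution: mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)  # margin of error
    return mean - margin, mean + margin

# Sample statistics reported in the text: n = 10, mean 7.1, sd 1.66,
# critical t-value 2.262 (95% confidence, 9 degrees of freedom).
lo, hi = t_confidence_interval(mean=7.1, sd=1.66, n=10, t_crit=2.262)
# (lo, hi) reproduces the interval (5.91, 8.29) reported above.
```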
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Metrics Extraction</title>
        <p>
          The dataset composition is essential to training a model. For this reason, we decided to use a larger
dataset that provides more examples for the model to learn from, which can improve its ability to
generalize to unseen data [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. With only this data (those retrieved in Section 3.2), models may memorize
the training examples rather than learning the underlying patterns, leading to overfitting. A larger
dataset helps mitigate this risk by providing more diverse examples. Also, a larger dataset is more likely
to capture a wide variety of scenarios, edge cases, and variations in the data. This diversity helps the
model learn to handle different situations and improves its robustness. To speed up the process and
increase the dataset size, this work aims to carry out a semantic sentence similarity process. The goal is
to rely on Sentence Transformers, like [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], and LLMs like [
          <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 13, 14</xref>
          ] to obtain contextual embeddings
from the labeled sentences and calculate their similarity with Art. 13(2)(b). Once we calculated this
similarity, we used this score as a basis to label new unseen sentences from other privacy policies.
The assumption is that the calculated score between the manually labeled sentences and Art. 13(2)(b)
can be used as a threshold to evaluate the compliance of new sentences. Sentence transformers are
models designed to convert sentences into fixed-size vector representations, capturing their semantic
meaning. These models are used for tasks like semantic similarity, clustering, and information retrieval,
demonstrating excellent results [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. On the other hand, LLMs are advanced neural networks trained
on vast amounts of text data to understand and generate human-like text, enabling them to perform
a wide range of language-related tasks, demonstrating excellent results for this purpose [
          <xref ref-type="bibr" rid="ref15 ref16 ref17">15, 16, 17</xref>
          ].
Regarding the similarity score, we used three metrics commonly used in Natural Language Processing
(NLP): cosine similarity, dot product, and Euclidean distance. Cosine similarity measures the cosine of
the angle between two non-zero vectors in a multi-dimensional space. It is beneficial for measuring
the similarity of text data represented as vectors because it focuses on the orientation of the vectors
rather than their magnitude, making it practical for comparing documents of different lengths. The dot
product of two vectors is a scalar value, the sum of the products of their corresponding components.
It can indicate the degree of similarity between two vectors. A higher dot product suggests greater
similarity, but it is sensitive to the magnitude of the vectors, which can be a limitation when comparing
vectors of different lengths; this does not affect Sentence Transformers, since they produce
fixed-size vector representations. The Euclidean distance, in turn, measures the straight-line
distance between two points (or vectors) in Euclidean space. It provides a measure of the absolute
distance between two vectors. In NLP, it can be used to assess how similar or dissimilar two sentences are
based on their vector representations. Using the score obtained with the previously described metrics,
between each manually labeled sentence and Art. 13(2)(b), we aimed to label a new unseen privacy
policy sentence as compliant if its score is equal to or greater than that threshold.
        </p>
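<p>The three metrics described above can be written out explicitly. The sketch below is dependency-free and operates on plain Python lists; in the actual pipeline these measures are computed on the contextual embeddings produced by the models.</p>

```python
import math

def dot(a, b):
    """Dot product: sum of products of corresponding components."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors; depends on
    orientation only, not magnitude."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance between two vectors; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```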
        <table-wrap id="tab1">
          <caption>
            <p>Examples of manually classified privacy policy sentences.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Text</th><th>Link</th><th>Compliant</th></tr>
            </thead>
            <tbody>
              <tr>
                <td>Right to rectification and erasure: If your data is incorrect or incomplete, you can request us to amend or supplement it. You can also request that your personal data be deleted. We strive to process your request as quickly as possible, but at the latest within four weeks. Please note that we may not be able to delete all your data because we are bound by certain laws, such as the fiscal retention obligation.</td>
                <td>https://www.twence.com/privacy-statement</td>
                <td>Yes</td>
              </tr>
              <tr>
                <td>Esri will permit you to access, correct, or delete your information in our database by contacting accounts@esri.com or by logging in to your account at https://my.esri.com/ and making the appropriate changes or selecting the “Remove my account” option. We will respond to all requests for access within a reasonable timeframe.</td>
                <td>https://www.esri.com/en-us/privacy/privacy-statements/privacy-statement</td>
                <td>Yes</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Experiments</title>
        <p>
          As explained, we obtained the sentence embeddings from both Sentence Transformers and Large
Language Models to perform the contextual sentence similarity task. Regarding the first type, we
selected the five best models based on their average performance on encoding sentences over 14 diverse
tasks from different domains 1, reported in Table 2. Regarding the LLMs, we selected the BERT model [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ],
which is the first bi-directional (or non-directional) pre-trained language model (BERT-base-uncased),
and two different variants of the latter, DistilBERT [
          <xref ref-type="bibr" rid="ref18">18</xref>
] (distilBERT-base-uncased), which aims to
optimize BERT’s performance, and RoBERTa [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] (roBERTa-base), short for "Robustly Optimized BERT
Approach." In addition, we employed the T5 model (t5-base), or Text-to-Text Transfer Transformer, a
large language model [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] developed by Google AI that uses a text-to-text approach. Finally, ERNIE
[
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], Enhanced Representation through kNowledge IntEgration (ERNIE-2.0-base-en) consists of two
stacked modules: a textual encoder and a knowledgeable encoder, which is responsible for integrating
extra token-oriented knowledge information into textual information.
        </p>
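<p>For illustration, Sentence Transformer models typically derive a single sentence vector from the token embeddings of a BERT-style encoder via attention-mask-aware mean pooling. The sketch below shows that pooling step in plain Python; real implementations operate on framework tensors, and the function name is ours.</p>

```python
def mean_pool(token_embeddings, attention_mask):
    """Average only the embeddings of real (non-padding) tokens.

    token_embeddings: list of token vectors (lists of floats).
    attention_mask: list of 0/1 flags, 1 marking a real token.
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:  # skip padding positions
            count += 1
            for i, value in enumerate(vec):
                summed[i] += value
    return [s / count for s in summed]
```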
        <sec id="sec-3-4-1">
          <title>1https://www.sbert.net/docs/sentence_transformer/pretrained_models.html#original-models</title>
          <p>all-mpnet-base-v2
multi-qa-mpnet-base-dot-v1
all-distilroberta-v1
all-MiniLM-L12-v2
multi-qa-distilbert-cos-v1
Description
All-round model tuned for many use-cases. Trained on a large
and diverse dataset of over 1 billion training pairs
This model was tuned for semantic search: Given a
query/question, it can find relevant passages. It was trained on a large and
diverse set of (question, answer) pairs
All-round model tuned for many use-cases. Trained on a large
and diverse dataset of over 1 billion training pairs
All-round model tuned for many use-cases. Trained on a large
and diverse dataset of over 1 billion training pairs
This model was tuned for semantic search: Given a
query/question, it can find relevant passages. It was trained on a large and
diverse set of (question, answer) pairs</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
<p>As described in the previous section, we used the ten manually classified policy sentences and calculated
their similarity scores with respect to GDPR Art. 13(2)(b), the right to rectification and erasure. Table 3 reports
the average score metrics for each model. We used these scores as thresholds to classify new unseen
privacy policy sentences automatically. In order to test this approach, we randomly extracted a privacy
policy from the dataset created in Section 3.1. We split it into sentences, submitted them to each model,
and calculated the metrics described in Section 3.3, i.e., cosine similarity, dot product, and Euclidean
distance against Art. 13(2)(b). After this process, we obtained 168 privacy policy sentences. Whenever one of the
calculated metrics satisfied the previously calculated threshold, we labeled the sentence as compliant.
Table 4 shows the results of this phase after the predictions extracted by the models were manually
checked. The Table shows the false positives (sentences predicted as compliant that
were not) and the false negatives (sentences predicted as not compliant that actually were
compliant). While certain models exhibited a high success rate, some failed due to a high number of false
positives. Among the models tested, multi-qa-distilBERT-cos-v1 and all-MiniLM-L12-v2 were the
best-performing models, as each produced two false positives and one false negative. This implies
that these models possess a rough but dependable general semantic understanding, with no adjustment
or fine-tuning for legal text. The roBERTa-base and distilBERT-base-uncased models predicted many
non-compliant sentences as compliant: roBERTa-base misclassified 55 sentences as compliant
because their semantics superficially overlap with those of compliant sentences. Since all models used
in this experiment were pre-trained and not
fine-tuned on GDPR-specific datasets, their performance reflects their general-purpose capabilities
rather than specialized legal expertise; the models were not legal experts
in any particular area but general-purpose models. Nevertheless, the pre-trained models were able to assist in
providing information about the semantic similarity of privacy policy sentences to
the requirements of the GDPR.</p>
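<p>The labeling rule applied in this phase can be sketched as follows. This is our reading of the procedure described above: a sentence is marked compliant when at least one metric meets the model's threshold. The function name, the dictionary layout, and the assumption that a lower Euclidean distance counts as satisfying its threshold are ours.</p>

```python
def label_sentence(scores, thresholds):
    """Return True (compliant) if any metric meets the model's threshold.

    scores / thresholds: dicts keyed by metric name. For cosine similarity
    and dot product, higher means more similar; for Euclidean distance,
    lower means more similar (direction assumed here).
    """
    if scores["cosine"] >= thresholds["cosine"]:
        return True
    if scores["dot"] >= thresholds["dot"]:
        return True
    if scores["euclidean"] <= thresholds["euclidean"]:
        return True
    return False
```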
    </sec>
    <sec id="sec-5">
      <title>5. Future Work</title>
      <p>In this preliminary experiment, we designed a methodology to identify phrases within websites’
European English language privacy policies, so that these phrases could be used to determine if they
fulfill the requirement imposed by the GDPR on data controllers to inform the data subject of the right
to rectify or erase personal data collected from them (Art. 13(2)(b)) by leveraging LLMs. As a first step,
we employed a Python script written by authors to retrieve websites’ policies and create a preliminary
dataset. Next, we manually classified ten privacy policies and then we calculated several metrics like
Cosine Similarity Mean, Dot Product Mean, and Euclidean Distance Mean leveraging pre-trained models.
We tested the model using a single privacy policy that was not included in the training phase.
To move our research forward, we will refine our model using a Small Language Model (SLM) with
cosine similarity as the evaluation metric. Also as future work, a concern to be addressed is
the risk of a research design that leads to a self-fulfilling prophecy: Since many privacy policies in their
information sheets “mirror” the text of the GDPR, it is possible that the similarity score returns positive
results from a semantic point of view without reflecting actual compliance. Other next steps could be to
expand to other GDPR requirements that make up a typical privacy policy. Additionally, we will expand
our dataset by retrieving a diverse set of internet privacy policies. By leveraging few-shot learning
techniques, we strive to achieve high accuracy with limited data, optimizing our model’s ability to
generalize effectively to new examples.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by EU DUCA, EU CyberSecPro, SYNAPSE, PTR 22-24 P2.01
(Cybersecurity) and SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded
by the EU - NextGenerationEU projects, by MUR - REASONING: foRmal mEthods for computAtional
analySis for diagnOsis and progNosis in imagING - PRIN, e-DAI (Digital ecosystem for integrated
analysis of heterogeneous health data related to high-impact diseases: innovative model of care and
research), Health Operational Plan, FSC 2014-2020, PRIN-MUR-Ministry of Health, the National Plan
for NRRP Complementary Investments D∧3 4 Health: Digital Driven Diagnostics, prognostics and
therapeutics for sustainable Health care, Progetto MolisCTe, Ministero delle Imprese e del Made in
Italy, Italy, CUP: D33B22000060001, FORESEEN: FORmal mEthodS for attack dEtEction in autonomous
driviNg systems CUP N.P2022WYAEW and ALOHA: a framework for monitoring the physical and
psychological health status of the Worker through Object detection and federated machine learning, Call
for Collaborative Research BRiC -2024, INAIL - NextGenerationEU projects, SMAUG, Horizon Europe
GA number 101121129, Fit4MedRob: This work was supported by the Italian Ministry of Research, under
the complementary actions to the NRRP “Fit4MedRob - Fit for Medical Robotics” Grant (# PNC0000007)</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] all-mpnet-base-v2, https://huggingface.co/sentence-transformers/all-mpnet-base-v2,
          <year>2024</year>
          . Online; accessed 14 Dec 2024.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] all-distilroberta-v1, https://huggingface.co/sentence-transformers/all-distilroberta-v1,
          <year>2024</year>
          . Online; accessed 12 Dec 2024.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ariai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <article-title>Natural language processing for the legal domain: A survey of tasks, datasets, models, and challenges</article-title>
          ,
          <source>arXiv preprint arXiv:2410.21306</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hartung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gerlach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Bommarito II</surname>
          </string-name>
          ,
          <article-title>Natural language processing in the legal domain</article-title>
          ,
          <source>arXiv preprint arXiv:2302.12039</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Philip</surname>
          </string-name>
          ,
          <article-title>Large language models in law: A survey</article-title>
          ,
          <source>AI Open</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Anh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-T.</given-names>
            <surname>Do</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Le Minh</surname>
          </string-name>
          ,
          <article-title>The impact of large language modeling on natural language processing in legal texts: a comprehensive survey</article-title>
          ,
          <source>in: 2023 15th International Conference on Knowledge and Systems Engineering (KSE)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Licari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Comandè</surname>
          </string-name>
          ,
          <article-title>ITALIAN-LEGAL-BERT models for improving natural language processing tasks in the Italian legal domain</article-title>
          ,
          <source>Computer Law &amp; Security Review</source>
          <volume>52</volume>
          (
          <year>2024</year>
          )
          <fpage>105908</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Del Alamo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sadeh</surname>
          </string-name>
          ,
          <article-title>Large language models: a new approach for privacy policy analysis at scale</article-title>
          ,
          <source>Computing</source>
          <volume>106</volume>
          (
          <year>2024</year>
          )
          <fpage>3879</fpage>
          -
          <lpage>3903</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Garza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Elluri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kotal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piplai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <article-title>PrivComp-KG: Leveraging knowledge graph and large language models for privacy policy compliance verification</article-title>
          ,
          <source>arXiv preprint arXiv:2404.19744</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rajput</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Evaluation of a decided sample size in machine learning applications</article-title>
          ,
          <source>BMC Bioinformatics</source>
          <volume>24</volume>
          (
          <year>2023</year>
          )
          <fpage>48</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>CoRR abs/1810.04805</source>
          (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sarzynska-Wawer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wawer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pawlak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Szymanowska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stefaniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarkiewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Okruszek</surname>
          </string-name>
          ,
          <article-title>Detecting formal thought disorder by deep contextualized word representations</article-title>
          ,
          <source>Psychiatry Research</source>
          <volume>304</volume>
          (
          <year>2021</year>
          )
          <fpage>114135</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>XLNet: Generalized autoregressive pretraining for language understanding</article-title>
          ,
          <source>CoRR abs/1906.08237</source>
          (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1906.08237. arXiv:1906.08237.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ormerod</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martínez del Rincón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Devereux</surname>
          </string-name>
          ,
          <article-title>Predicting semantic similarity between clinical sentence pairs using transformer models: Evaluation and representational analysis</article-title>
          ,
          <source>JMIR Medical Informatics</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>e23099</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Freestone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K. K.</given-names>
            <surname>Santu</surname>
          </string-name>
          ,
          <article-title>Word embeddings revisited: Do LLMs offer something new?</article-title>
          ,
          <source>arXiv preprint arXiv:2402.11094</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M. T. R.</given-names>
            <surname>Laskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hoque</surname>
          </string-name>
          ,
          <article-title>Contextualized embeddings based transformer encoder for sentence similarity modeling in answer selection task</article-title>
          ,
          <source>in: Proceedings of the Twelfth Language Resources and Evaluation Conference</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>5505</fpage>
          -
          <lpage>5514</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <article-title>DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</article-title>
          ,
          <source>ArXiv abs/1910.01108</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          ,
          <source>CoRR abs/1907.11692</source>
          (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1907.11692. arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          . URL: http://jmlr.org/papers/v21/20-074.html.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>ERNIE: Enhanced representation through knowledge integration</article-title>
          ,
          <source>arXiv preprint arXiv:1904.09223</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>