<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Lupus Alberto: A Transformer-Based Approach for SLE Information Extraction from Italian Clinical Reports</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Livia Lilli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Antenucci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Augusta Ortolan</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Silvia Laura Bosello</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Antonietta D'Agostino</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Patarnello</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlotta Masciocchi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacopo Lenkowicz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Catholic University of the Sacred Heart</institution>
          ,
          <addr-line>Rome, 00168</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Real World Data Facility, Gemelli Generator, Fondazione Policlinico Universitario Agostino Gemelli IRCCS</institution>
          ,
          <addr-line>Rome, 00168</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>UOC di Reumatologia, Fondazione Policlinico Universitario A Gemelli IRCCS</institution>
          ,
          <addr-line>00168 Roma</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>Natural Language Processing (NLP) is widely used across several fields, such as medicine, where information often originates from unstructured data sources. This creates the need for automated systems to classify text and extract information from Electronic Health Records (EHRs). However, a significant challenge lies in the limited availability of pre-trained models for less common languages, such as Italian, and for specific medical domains. Our study aims to develop an NLP approach to extract Systemic Lupus Erythematosus (SLE) information from Italian EHRs at the Gemelli Hospital in Rome. We introduce Lupus Alberto, a fine-tuned version of AlBERTo, trained to classify categories derived from three distinct domains: Diagnosis, Therapy and Symptom. We evaluated Lupus Alberto's performance by comparing it with other baseline approaches, selected from the available BERT-based models for the Italian language and fine-tuned for the same tasks. Evaluation results show that Lupus Alberto achieves overall F-Scores equal to 79%, 87%, and 76% for the Diagnosis, Therapy, and Symptom domains, respectively. Furthermore, our approach outperformed the other baseline models in the Diagnosis and Symptom domains, demonstrating superior performance in identifying and categorizing relevant SLE information, thereby supporting clinical decision-making and patient management.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Systemic Lupus Erythematosus</kwd>
        <kwd>Text Classification</kwd>
        <kwd>Italian Language</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Natural Language Processing (NLP) is used in many applications, such as in the medical domain, where the huge amount of unstructured data coming from Electronic Health Records (EHRs) generates the need to develop automated systems for text classification and information extraction. However, employing such methods is challenging due to the scarcity of pre-trained models for less common languages like Italian, and for specific medical domains.</p>
      <p>In this study, we explored Systemic Lupus Erythematosus (SLE), a complex pathology which involves different organ domains and can occur in patients at several levels of severity. For this reason, information about diagnoses, symptoms and therapies is used by physicians to characterize Lupus patients and to make better informed decisions about therapy changes or the time for the next contact visit. However, these Lupic features are not always available in a structured format, so there is the need for NLP approaches in order to interpret clinical reports and extract the desired data. Based on the literature, large language models (LLMs) and transformer-based architectures represent the state of the art for EHR classification tasks [1, 2, 3, 4].</p>
      <p>This work aims to develop a transformer-based approach to identify SLE information from unstructured EHRs at the Italian Gemelli Hospital of Rome. We then propose Lupus Alberto, a fine-tuned version of AlBERTo [5], the available BERT-based model for the Italian language trained on Italian tweets. In order to assess the Lupus Alberto performance, we compare it with other baseline approaches, choosing among the BERT-based models available for the Italian language, always fine-tuned on the same tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>Hospitals may not have structured data sources and often
there is a need for advanced and automated approaches
for the extraction of specific features from clinical
reports. For this reason, there are several studies related
to information extraction and text classification in the
medical domain, in the context of different diseases and
languages.</p>
      <p>Specifically for SLE, we found the work of Deng et al. [6], who applied rule-based and logistic regression approaches to identify the SLE patient population from unstructured EHRs in the English language. Turner et al. [7] also investigated NLP techniques for SLE characterization from clinical notes, using Bag-of-Words and cTAKES to transform input EHR texts into features eligible for Machine Learning algorithms. They then used several models, such as Neural Networks, Random Forest, Support Vector Machines, Naïve Bayes and Word2Vec Bayesian inversion, for the final text classification. Furthermore, in the studies of Lilli et al. [8] and Ortolan et al. [9], a rule-based approach combined with BERT-based topic modelling is proposed for the identification of longitudinal features in Italian EHRs of SLE patients.</p>
      <p>Figure 1: Diversity of the fine-tuned categories. The inner circle shows the three classification domains, while the outer circle represents the related categories.</p>
      <p>We then found more recent techniques applied in other pathological contexts in the Italian language, based on transformers and large language models. For example, the work of Paolo et al. [10] presented a NER transformer-based approach in the lung cancer domain, on Italian EHRs. Additionally, Crema et al. [11] delivered an Italian dataset for the neuropsychiatric domain, training a transformer-based model for NER tasks. Regarding text classification, Torri et al. [<xref ref-type="bibr" rid="ref2">12</xref>] exploited text classification models to extract relevant clinical variables, comparing rule-based, recurrent neural network and BERT-based models in the ST-Elevation Myocardial Infarction domain, from an Italian hospital. Finally, Lilli et al. [13] proposed an ensemble of Llama with a BERT-based model for metastasis classification of Italian EHRs in the Breast Cancer domain.</p>
      <p>Based on the previous findings, our study aims to propose a transformer-based approach for the Italian language, specifically for SLE. To this scope, we searched for suitable methods to extract multiple Lupic features from the clinical reports of our Italian hospital. We relied on the models delivered by Polignano et al. [5], who trained Albert [14] on Italian tweets, and by Buonocore et al. [15], who proposed transformer-based models pre-trained on neural-machine translations of English resources and on natively Italian-written medical texts.</p>
    </sec>
    <sec id="sec-2a">
      <title>3. Methods</title>
      <sec id="sec-2a-1">
        <title>3.1. Data Corpus</title>
        <p>In this paper, we used data from the SLE Data Mart of the Gemelli Hospital of Rome, which comprises an extensive collection of structured and unstructured data related to Lupus patients. We selected the outpatient clinical reports, considered by physicians as the most informative for extracting information such as diagnoses, therapies and symptoms. Because of their length, we also chose to treat EHRs at the paragraph level, complying with the token limit of the BERT models. The final classification was then aggregated over the entire report through a logical OR.</p>
      </sec>
      <sec id="sec-2a-2">
        <title>3.2. Data Annotation</title>
        <p>The training set for the fine-tuning consisted of a silver standard made up of annotations from a rule-based algorithm, developed ad hoc for the study [8]. In particular, we formulated rules and expressions for tagging each EHR paragraph with the presence of the categories shown in Figure 1, excluding the possible negations. Rules consist of personalized regexes and checks on the distances among words.</p>
        <p>The gold standard for the evaluation was built by physicians, who annotated a set of EHRs in two steps. Manual annotation was performed by a first team of two physicians with medical knowledge of SLE, who annotated the reports of each patient with respect to the target information. A second team of two specialist rheumatologists reviewed the manual annotations for quality assessment. For labelling data, an interactive dashboard was developed ad hoc for the project, where the user assigned the corresponding tags to each EHR. The dashboard URL is accessible only from the hospital's internal network, so it is not sharable. However, Figure 2 provides a screenshot of the home and annotation pages.</p>
        <p>The Inter Annotator Agreement (IAA) among the annotations of the two groups was also computed as a quality assurance measure of data and annotations [16]. For this purpose, we chose the Cohen's Kappa metric, which measures the agreement of two annotators while considering the agreement that could occur by chance [17]:</p>
        <p>k = (p<sub>o</sub> - p<sub>e</sub>) / (1 - p<sub>e</sub>) (1)</p>
      </sec>
    </sec>
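<p>As a minimal sketch of the agreement computation described above, Cohen's Kappa can be reproduced in a few lines of plain Python (the example annotation arrays below are invented; in the study, the inputs were the binary annotations of the two physician teams):</p>

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two binary annotation lists:
    k = (p_o - p_e) / (1 - p_e), as in Equation 1."""
    n = len(a)
    # Observed agreement: fraction of items both annotators label identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each annotator's empirical label frequencies.
    pa1 = sum(a) / n
    pb1 = sum(b) / n
    p_e = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical paragraph-level annotations from two annotator teams.
team_a = [1, 0, 1, 1, 0, 1, 0, 0]
team_b = [1, 0, 1, 0, 0, 1, 0, 1]
print(cohen_kappa(team_a, team_b))  # prints 0.5
```

<p>For binary labels, the cohen_kappa_score function of Scikit-Learn, which the study used, returns the same value.</p>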
    <sec id="sec-3">
      <title>3.3. Fine-Tuning and Classification</title>
      <p>This study aimed to extract information about diagnoses, therapies and symptoms from the EHRs of the Gemelli Hospital of Rome. Our purpose was to identify, for each of the three domains, a set of SLE-related categories provided by our team of rheumatologists. As shown in Figure 1, we then trained our model on 8 different types of diagnoses, 4 therapies, and 7 symptoms.</p>
      <p>For this purpose, we fine-tuned AlBERTo (https://github.com/marcopoli/AlBERTo-it), a BERT-based model for the Italian language proposed by Polignano et al. [5]. The fine-tuning followed the approach of Polignano et al. [5], treating every category as a single binary task with its own training set of labelled texts, randomly sampled from the original data corpus. We thus obtained multiple binary classifiers, one for each category to extract.</p>
      <p>Fine-tuning and inference were implemented at the paragraph level rather than on entire reports, in order to comply with the token limit imposed by BERT models. The final evaluation was then applied at the overall EHR level, comparing the gold standard reports to the paragraph classifications combined at the EHR level through a logical OR: if at least one paragraph is positive for a specific category, the corresponding report is classified with that category.</p>
    </sec>
    <sec id="sec-3b">
      <title>4. Experiments</title>
      <sec id="sec-3b-1">
        <title>4.1. Dataset</title>
        <p>For this study, we started from the SLE data mart of the Gemelli Hospital of Rome, selecting among the 13299 available EHRs of outpatient visits.</p>
        <p>For our training set, we sampled 1000 training texts for each binary category shown in Figure 1, balancing them between positive and negative samples, such that each category had 50% of its training samples labelled as positive. The training set was composed of EHR paragraphs, in order to comply with the 512-token limit imposed by BERT models.</p>
        <p>The gold standard set was composed of 750 EHRs randomly sampled from the data mart, after verifying that their paragraphs were not already in the training set. The gold standard set was annotated by two groups of physicians through the annotation dashboard in Figure 2. The same set of gold standard reports was used for the evaluation of all the classification domains.</p>
        <p>Details about the dataset are shown in Table 1, where some statistics are reported for each domain, distinguished by training set and gold standard. In particular, for each case we report the number of categories to classify, the total number of paragraphs processed during training and inference, the overall number of EHRs, and the mean number of tokens and characters over the paragraphs. Tokens were computed through the BERT tokenizer (google-bert/bert-base-uncased) [19] available on Hugging Face [20].</p>
        <p>For privacy reasons, the dataset used in this study is not publicly available. We therefore provide the descriptive summary metrics in Table 1.</p>
      </sec>
      <sec id="sec-3b-2">
        <title>4.2. Inter Annotator Agreement</title>
        <p>In Equation 1, p<sub>o</sub> is the observed agreement, while p<sub>e</sub> is the expected agreement when both annotators randomly assign labels; the latter is estimated using a per-annotator empirical prior over the class labels [18].</p>
        <p>In order to measure the Inter Annotator Agreement on the gold standards, we used the cohen_kappa_score function provided by the Python Scikit-Learn package [21]. As inputs to the function, we considered the arrays containing the binary annotations performed by the two groups of annotators, respectively. Additionally, we performed the analysis grouping the annotations by the three domains: Diagnosis, Therapy and Symptom. Results are shown in Table 2. Following the grid proposed by Landis and Koch [22] for the interpretation of the coefficient, we have an almost perfect quality of annotation for the Diagnosis and Therapy domains (k &gt; 0.80), and a substantial level for the Symptom case (k = 0.69). Although acceptable according to literature standards [16], the latter k score is lower than the others because of the greater difficulty of identifying symptoms from text. Symptoms at the current contact are in fact more complex concepts to identify, compared to therapies and diagnoses, which are usually mentioned more explicitly in the EHR. So, even when analyzed by clinical experts, the same report can present inconsistent annotations, due to the poor quality of text semantics.</p>
        <table-wrap id="tbl2">
          <label>Table 2</label>
          <caption><p>The Inter Annotator Agreement (IAA) computed between the two groups of physicians, through the Cohen's Kappa metric, distinguished by the three classification domains.</p></caption>
          <table>
            <thead><tr><th>Domain</th><th>Cohen's Kappa (k)</th></tr></thead>
            <tbody>
              <tr><td>Diagnosis</td><td>0.88</td></tr>
              <tr><td>Therapy</td><td>0.93</td></tr>
              <tr><td>Symptom</td><td>0.69</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3b-3">
        <title>4.3. Modeling</title>
        <p>The AlBERTo fine-tuning was performed through the PyTorch Trainer of the Hugging Face Transformers library [20], using 10 epochs (for further implementation details, see Appendix A). Fine-tuning was performed for each of the 19 categories, in order to obtain a classifier for each binary task.</p>
        <p>In order to assess the Lupus Alberto performance, we then compared the model to other baselines, always fine-tuned on the same binary tasks, choosing among several BERT-based models for text classification. In particular, we considered the three models proposed by Buonocore et al. [15], BioBIT (IVN-RIN/bioBIT), MedBIT (IVN-RIN/medBIT) and MedBIT-r3-plus (IVN-RIN/medBIT-r3-plus), which are pre-trained for the Italian language in the medical context. Additionally, we also tried the two base versions of Albert (albert/albert-base-v1, albert/albert-base-v2) [14], the base model used by Polignano et al. [5] to release AlBERTo.</p>
        <p>The inference for all the models was performed at the paragraph level instead of the whole-report level, and the final classification was aggregated at the EHR level through a logical OR. For example, if at least one paragraph is positive for the Articular Diagnosis, the overall EHR is classified as positive for that category.</p>
      </sec>
      <sec id="sec-3b-4">
        <title>4.4. Results and Discussion</title>
        <p>For the evaluation, we compared Lupus Alberto to the other baseline models (fine-tuned on the same tasks) in terms of F-Score at the single category level. Additionally, to quantify the overall performances, we also computed the mean F-Score for the Diagnosis, Therapy and Symptom domains.</p>
        <p>As shown in Table 3, Lupus Alberto presents the highest F-Score for the Therapy domain, with a value of 87%. The Diagnosis and Symptom domains follow, with overall metrics of 79% and 76%, respectively. These performances reflect the IAA results in Table 2, which show that Therapy presents a higher quality of annotations compared to Diagnosis and Symptom.</p>
        <p>Concerning the baselines, Lupus Alberto outperforms the other experiments for Diagnosis and Symptom, while the Therapy domain presents the highest metric value with the fine-tuned MedBIT-r3-plus [15], whose score equals 88%.</p>
        <p>At the single category level, the Hematologic and Renal diagnoses present the highest performance metrics in their domain, with values of 98% and 94%, respectively. Glucocorticoid is the therapy with the best F-Score, equal to 97%. Finally, Papula and Raynaud's Phenomenon are the best-performing symptoms, with scores equal to 89% and 87%, respectively.</p>
        <p>In all three domains, the second version of the Albert model presents the lowest performance values, with F-Scores equal to 69%, 78% and 44%, respectively, compared to our Lupus Alberto and to the fine-tuned models of Buonocore et al. [15]. Thus, as demonstrated by the above results, fine-tuning models specifically trained on the Italian language improved the final classification performance.</p>
      </sec>
    </sec>
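<p>The paragraph-to-report aggregation described above can be sketched as follows (a toy illustration, assuming each per-category binary classifier has already produced one prediction per paragraph; all names and values below are invented):</p>

```python
# Aggregate paragraph-level binary predictions to the EHR (report) level
# with a logical OR: a report is positive for a category if at least one
# of its paragraphs is classified as positive.
def aggregate_report(paragraph_predictions):
    """paragraph_predictions: dict mapping category -> list of 0/1
    paragraph-level predictions for one EHR."""
    return {cat: int(any(preds)) for cat, preds in paragraph_predictions.items()}

# Hypothetical output of three binary classifiers on a four-paragraph EHR.
preds = {
    "Articular Diagnosis": [0, 1, 0, 0],
    "Glucocorticoid": [0, 0, 0, 0],
    "Raynaud's Phenomenon": [1, 0, 0, 1],
}
print(aggregate_report(preds))
```

<p>This mirrors the evaluation setup: the gold standard is defined on whole reports, so paragraph classifications must be reduced to one label per report and category before computing the F-Scores.</p>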
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>This study aims to deliver a transformer-based approach to extract SLE information from real-world data of the Gemelli Hospital of Rome. The scarcity of models available for the Italian language and specialized in Lupus prompted us to develop a solution to automate the extraction of SLE information from Italian EHRs. We especially focused on identifying features in the domains of Diagnosis, Therapy and Symptom, reported as of interest for SLE. Our work shows that Lupus Alberto presents competitive performance compared to other baseline methods, outperforming them especially in the classification of information in the Diagnosis and Symptom domains, achieving F-Scores of 79% and 76%, respectively.</p>
    </sec>
    <sec id="sec-4a">
      <title>6. Limitations</title>
      <p>While our proposed approach presents higher performance compared to the baselines, many aspects could be investigated in future studies in order to enhance the final performance. This includes the usage of a larger set of training data for the model fine-tuning. Additionally, new research could be conducted by extracting Lupus features through LLMs and comparing the results with the traditional transformer-based classifiers. Finally, a first release of Lupus Alberto could be implemented using differential privacy techniques to ensure the protection of data from inference risks [23].</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>For this study, the use of electronic health records was essential for training and testing our new technology. However, these data contain sensitive patient information, and it was fundamental to adhere to strict privacy and confidentiality guidelines. To this purpose, the dataset used in this paper was fully de-identified, and we received approval from our institution to conduct the presented research. The approval protocol number from the relevant Ethics Committee can be provided on request.</p>
    </sec>
    <sec id="sec-5a">
      <title>References</title>
      <p>[1] Y. Li, S. Rao, J. R. A. Solares, A. Hassaine, R. Ramakrishnan, D. Canoy, Y. Zhu, K. Rahimi, G. Salimi-Khorshidi, Behrt: transformer for electronic health records, Scientific Reports 10 (2020) 7155.</p>
      <p>[2] V. Yogarajan, J. Montiel, T. Smith, B. Pfahringer, Transformers for multi-label classification of medical text: an empirical comparison, in: International Conference on Artificial Intelligence in Medicine, Springer, 2021, pp. 114–123.</p>
      <p>[3] M. Rupp, O. Peter, T. Pattipaka, Exbehrt: Extended transformer for electronic health records, in: International Workshop on Trustworthy Machine Learning for Healthcare, Springer, 2023, pp. 73–84.</p>
      <p>[4] Z. Yang, A. Mitra, W. Liu, D. Berlowitz, H. Yu, Transformehr: transformer-based encoder-decoder generative model to enhance prediction of disease outcomes using electronic health records, Nature Communications 14 (2023) 7857.</p>
      <p>[5] M. Polignano, P. Basile, M. De Gemmis, G. Semeraro, V. Basile, et al., Alberto: Italian bert language understanding model for nlp challenging tasks based on tweets, in: CEUR Workshop Proceedings, volume 2481, CEUR, 2019, pp. 1–6.</p>
      <p>[6] Y. Deng, J. A. Pacheco, A. Ghosh, A. Chung, C. Mao, J. C. Smith, J. Zhao, W.-Q. Wei, A. Barnado, C. Dorn, et al., Natural language processing to identify lupus nephritis phenotype in electronic health records, BMC Medical Informatics and Decision Making 22 (2022) 348.</p>
      <p>[7] C. A. Turner, A. D. Jacobs, C. K. Marques, J. C. Oates, D. L. Kamen, P. E. Anderson, J. S. Obeid, Word2vec inversion and traditional text classifiers for phenotyping lupus, BMC Medical Informatics and Decision Making 17 (2017) 1–11.</p>
      <p>[8] L. Lilli, S. L. Bosello, L. Antenucci, S. Patarnello, A. Ortolan, J. Lenkowicz, M. Gorini, G. Castellino, A. Cesario, M. A. D'Agostino, et al., A comprehensive natural language processing pipeline for the chronic lupus disease, in: Digital Health and Informatics Innovations for Sustainable Health Care Systems, IOS Press, 2024, pp. 909–913.</p>
      <p>[9] A. Ortolan, L. Lilli, S. Bosello, L. Antenucci, C. Masciocchi, J. Lenkowicz, P. Cerasuolo, L. Lanzo, S. Pinno, G. Castellino, et al., Pos1142 development and validation of a rule-based framework for automated identification of longitudinal clinical features about systemic lupus erythematosus patients from electronic health records, Annals of the Rheumatic Diseases 83 (2024) 1014.</p>
      <p>[10] D. Paolo, A. Bria, C. Greco, M. Russano, S. Ramella, P. Soda, R. Sicilia, Named entity recognition in italian lung cancer clinical reports using transformers, in: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, 2023, pp. 4101–4107.</p>
      <p>[11] C. Crema, T. M. Buonocore, S. Fostinelli, E. Parimbelli, F. Verde, C. Fundarò, M. Manera, M. C. Ramusino, M. Capelli, A. Costa, et al., Advancing italian biomedical information extraction with transformers-based models: Methodological insights and multicenter practical application, Journal of Biomedical Informatics 148 (2023) 104557.</p>
      <p>[<xref ref-type="bibr" rid="ref2">12</xref>] V. Torri, S. Mazzucato, S. Dalmiani, U. Paradossi, C. Passino, S. Moccia, S. Micera, F. Ieva, Structuring clinical notes of italian st-elevation myocardial infarction patients, in: Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024, 2024, pp. 37–43.</p>
      <p>[13] L. Lilli, S. Patarnello, C. Masciocchi, V. Masiello, F. Marazzi, T. Luca, N. Capocchiano, Llamamts: Optimizing metastasis detection with llama instruction tuning and bert-based ensemble in italian clinical reports, in: Proceedings of the 6th Clinical Natural Language Processing Workshop, 2024, pp. 162–171.</p>
      <p>[14] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, R. Soricut, Albert: A lite bert for self-supervised learning of language representations, arXiv preprint arXiv:1909.11942 (2019).</p>
      <p>[15] T. M. Buonocore, C. Crema, A. Redolfi, R. Bellazzi, E. Parimbelli, Localizing in-domain adaptation of transformer-based biomedical language models, Journal of Biomedical Informatics 144 (2023) 104431.</p>
      <p>[16] K. L. Soeken, P. A. Prescott, Issues in the use of kappa to estimate reliability, Medical Care (1986) 733–741.</p>
      <p>[17] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1960) 37–46.</p>
      <p>[18] R. Artstein, M. Poesio, Inter-coder agreement for computational linguistics, Computational Linguistics 34 (2008) 555–596.</p>
      <p>[19] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).</p>
      <p>[20] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al., arXiv preprint arXiv:1910.03771 (2019).</p>
    </sec>
    <sec id="sec-6">
      <title>A. Implementation Details</title>
      <p>The fine-tuning was performed through the PyTorch Trainer of the Hugging Face Transformers library [20], on a desktop Nvidia RTX 5000 Graphics Processing Unit (GPU) with 16GB of RAM, on a machine running Ubuntu 20.04.3 LTS. 20% of the training set was used as eval_dataset, while the remainder was employed as train_dataset. The learning rate was set to 2e-5, the batch size to 16, and the weight decay to 0.01.</p>
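<p>As a configuration sketch only (not the project's actual training script; the output directory is a hypothetical name, and model and dataset loading are omitted), the hyperparameters above would map onto the Transformers Trainer roughly as follows:</p>

```python
from transformers import TrainingArguments, Trainer

# Hyperparameters as reported in this appendix: 10 epochs,
# learning rate 2e-5, batch size 16, weight decay 0.01.
args = TrainingArguments(
    output_dir="lupus-alberto-checkpoints",  # hypothetical path
    num_train_epochs=10,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    weight_decay=0.01,
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset,  # 80% of labelled paragraphs
#                   eval_dataset=eval_dataset)    # remaining 20%
# trainer.train()
```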
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[21] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[22] J. R. Landis, G. G. Koch, The measurement of observer agreement for categorical data, Biometrics 33 (1977) 159–174.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[23] M. Miranda, E. S. Ruzzetti, A. Santilli, F. M. Zanzotto, et al., Preserving privacy in large language models: A survey on current threats and solutions, arXiv preprint arXiv:2408.05212 (2024).</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>