<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.48550/ARXIV.2407</article-id>
      <title-group>
        <article-title>DART: A Structured Dataset of Regulatory Drug Documents in Italian for Clinical NLP</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mariano Barone</string-name>
          <email>mariano.barone@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Laudante</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Riccio</string-name>
          <email>giuseppe.riccio3@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Romano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Postiglione</string-name>
          <email>marco.postiglione@northwestern.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincenzo Moscato</string-name>
          <email>vincenzo.moscato@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Consorzio Interuniversitario Nazionale per l'Informatica (CINI) - ITEM National Lab, Complesso Universitario Monte S.Angelo</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Evanston</institution>
          ,
          <addr-line>IL 60208</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Northwestern University, Department of Computer Science, McCormick School of Engineering and Applied Science</institution>
          ,
          <addr-line>2233 Tech Dr</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Naples Federico II, Department of Electrical Engineering and Information Technology (DIETI)</institution>
          ,
          <addr-line>Via Claudio, 21 - Naples</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>15576</volume>
      <fpage>9</fpage>
      <lpage>11</lpage>
      <abstract>
        <p>The extraction of pharmacological knowledge from regulatory documents has become a key focus in biomedical natural language processing, with applications ranging from adverse event monitoring to AI-assisted clinical decision support. However, research in this field has predominantly relied on English-language corpora such as DrugBank, leaving a significant gap in resources tailored to other healthcare systems. To address this limitation, we introduce DART (Drug Annotation from Regulatory Texts), the first structured corpus of Italian Summaries of Product Characteristics derived from the official repository of the Italian Medicines Agency (AIFA). The dataset was built through a reproducible pipeline encompassing web-scale document retrieval, semantic segmentation of regulatory sections, and clinical summarization using a few-shot-tuned large language model with low-temperature decoding. DART provides structured information on key pharmacological domains such as indications, adverse drug reactions, and drug-drug interactions. To validate its utility, we implemented an LLM-based drug interaction checker that leverages the dataset to infer clinically meaningful interactions. Experimental results show that instruction-tuned LLMs can accurately infer potential interactions and their clinical implications when grounded in the structured textual fields of DART. We publicly release our code on GitHub: https://github.com/PRAISELab-PicusLab/DART.</p>
      </abstract>
      <kwd-group>
        <kwd>Pharmacological Text Mining</kwd>
        <kwd>Adverse Drug Reactions</kwd>
        <kwd>Drug-Drug Interactions</kwd>
        <kwd>Italian Biomedical NLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, extracting and organizing pharmacological information from regulatory documents has
taken on a pivotal role in biomedical natural language processing (NLP). This line of research
aims to automate the integration of clinical and regulatory data into decision-oriented
processes [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], thereby supporting applications such as prescription support systems and pharmacovigilance
tools. Among these regulatory resources, the Summary of Product Characteristics (SmPC),
referred to in Italy as the Riassunto delle Caratteristiche del Prodotto (RCP), stands out
as a comprehensive and reliable document published by the Italian Medicines Agency (AIFA). Designed
for healthcare professionals, the RCP serves as the “identity card” of a medicinal product, providing
standardized and regularly updated information on efficacy, safety, therapeutic use, contraindications,
adverse drug reactions (ADRs), drug-drug interactions (DDIs), and other essential clinical characteristics.
      </p>
      <p>CEUR Workshop Proceedings (ISSN 1613-0073)</p>
      <p>Despite their importance, such texts remain underrepresented in the literature, with most prior work
focusing exclusively on English-language corpora and overlooking the linguistic and structural particularities
of national regulatory frameworks. In the Italian context, the absence of tailored resources hampers the
development of clinically grounded AI (Artificial Intelligence) systems that align with local healthcare
practices and regulatory standards. To address this gap, we present DART (Drug Annotation from
Regulatory Texts), a structured corpus of RCPs in Italian, developed through a scalable and reproducible
pipeline. The dataset is built by automatically retrieving documents from AIFA, extracting and
semantically segmenting their contents, and organizing the information into structured fields that correspond to
standard regulatory sections. Additionally, DART is enhanced with clinical summaries generated using
large language models (LLMs) via few-shot learning and low-temperature decoding strategies. These
summaries are intended to support downstream applications such as interaction checking, knowledge
graph construction, and automated risk profiling. With more than 16,000 processed RCPs and over
95 million tokens, DART represents a high-value asset for the Italian clinical NLP community and the
broader healthcare data science ecosystem. It provides a robust foundation for the training, evaluation,
and deployment of large-scale language models in both regulatory and clinical contexts. Furthermore,
DART contributes significantly to the healthcare Big Data ecosystem by offering a high-resolution corpus
of regulatory texts that supports the training of LLMs, the development of interpretable knowledge
graphs, and the implementation of AI-driven clinical decision-making tools.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>The extraction of pharmacological knowledge from regulatory texts, such as Summaries of Product
Characteristics (SmPCs), is a growing area in biomedical NLP. These documents offer authoritative
information on adverse drug reactions (ADRs), drug–drug interactions (DDIs), contraindications, and
indications, and form the normative basis for safe prescribing. However, most existing work has
focused on English-language corpora, leaving national regulatory texts, especially Italian RCPs,
underrepresented. Early ADR extraction relied on classical machine learning models, including ensemble
methods and multilayer perceptrons [2, 3]. The adoption of transformer-based architectures such
as BERT, BioBERT, and PubMedBERT significantly improved performance [4], though non-English
texts still require costly adaptation and fine-tuning [5]. More recently, large language models (LLMs)
like GPT-4 have shown strong zero- and few-shot performance in biomedical tasks, including ADR
detection, outperforming traditional baselines and enhancing interpretability in pharmacovigilance
pipelines [6, 7, 8]. Retrieval-augmented generation and agent-based simulation approaches have further
demonstrated the value of context-aware models [9, 10]. DDI prediction has similarly evolved toward
hybrid and graph-based architectures. Recent studies integrate knowledge graphs (KGs) with LLMs to
produce accurate and explainable predictions [11, 12], while in-context learning techniques have
improved interaction detection [13]. Medication recommendation systems now incorporate regulatory text
and clinical narratives, outperforming structured-code-based methods, especially in multilingual and
safety-aware settings [14, 15, 16]. Ongoing work also explores explainability in recommendations [17]
and the combination of symbolic and generative approaches in medical summarization [18]. Despite
this progress, Italian regulatory documents remain largely unexplored. Resources like DrugBank [19]
include Italian drug names but abstract away regulatory phrasing and section structure. Challenges
such as DIMMI [20] and aggregation efforts [21] highlight the need for domain-specific resources
aligned with the Italian context. Real-world data sources like the National Pharmacovigilance Network
(RNF) [22, 23], VALORE [24], and regional datasets [25] offer complementary insights but are often
incomplete, misaligned with regulatory language, or dependent on patient self-reporting [26]. In contrast,
RCPs provide standardized, high-quality knowledge that enables direct modeling of pharmacological
phenomena [27]. To fill this gap, we present DART, a structured dataset derived from full-text Italian
RCPs, designed to support the development of LLM-based systems grounded in official regulatory
content. DART enables validation of LLM outputs against both normative sources and observational
datasets, fostering a bidirectional loop between automated pharmacological reasoning and real-world
clinical safety evidence.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset Construction</title>
        <p>The dataset DART was constructed through a three-step pipeline designed for reproducibility and
scalability. Specifically: (i) automated retrieval of URLs for the Summary of Product Characteristics
(RCPs) from the AIFA portal, (ii) semantic parsing and segmentation of the extracted RCP text, and (iii)
data structuring, filtering, and validation. All modules were implemented in Python using open-source
libraries. The complete construction workflow is illustrated in Figure 1.</p>
        <p>[Figure 1: DART construction workflow. AIFA (Agenzia Italiana del Farmaco) → Automatic URL acquisition of AIFA RCPs → RCPs → Text Extraction and Section Segmentation → Final dataset in tabular format (columns: Name, AIC, 4.x Clinical particulars, 6.x Pharmaceutical particulars).]</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Automated Retrieval of RCP URLs</title>
          <p>The first phase of the pipeline involved the programmatic retrieval of RCP PDFs by interrogating
undocumented but publicly accessible RESTful APIs exposed by the AIFA web portal. Because the
website is a Single Page Application (SPA) built with frameworks such as
Angular, static DOM scraping was ineffective. Instead, a detailed network traffic analysis was conducted
via browser developer tools (DevTools, “Network” tab), which led to the identification of two critical
endpoints. The first is a search endpoint, which requires a zero-padded AIC code (e.g., 123456 becomes
00123456) and returns a JSON payload containing metadata for each drug, including the keys CodiceSis
and aic6. These values are then used to query a second endpoint that provides a direct URL to the
corresponding RCP PDF. This two-step API interaction is illustrated in Figure 2. Although these APIs
are unofficial and subject to change without notice, they provided the only viable and scalable access to
RCP documents at the time of this study (June 2025). A web spider was implemented in Python using
the requests library and seeded with a list of valid AIC codes, sourced from public datasets or inferred
from known numerical intervals, in compliance with applicable ethical and legal constraints. For each
AIC code, the system performed four steps: querying the search endpoint, parsing the JSON response, constructing
the PDF URL, and downloading the file. Failures and exceptions were handled using structured logging.</p>
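          <p>The two-step URL construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' spider: the helper names (pad_aic, search_url, rcp_url) are hypothetical, the padding width of 8 is inferred only from the example 123456 → 00123456, and no HTTP request is issued because the layout of the JSON response is not documented here.</p>
          <preformat>
```python
from urllib.parse import urlencode

# Base path of the (unofficial) AIFA API, as quoted in the text.
BASE = "https://api.aifa.gov.it/aifa-bdf-eif-be/1.0.0"


def pad_aic(aic_code, width=8):
    """Zero-pad an AIC code. width=8 is an assumption inferred from
    the example 123456 -> 00123456."""
    return str(aic_code).zfill(width)


def search_url(aic_code):
    """Step 1: build the search-endpoint URL for one AIC code."""
    query = urlencode({
        "query": pad_aic(aic_code),
        "spellingCorrection": "true",
        "page": 0,
    })
    return f"{BASE}/formadosaggio/ricerca?{query}"


def rcp_url(codice_sis, aic6):
    """Step 2: build the URL that resolves to the RCP PDF, using the
    CodiceSis and aic6 values returned by the search endpoint."""
    return f"{BASE}/organizzazione/{codice_sis}/farmaci/{aic6}/stampati?ts=RCP"
```
          </preformat>
          <p>In a real crawler, the output of search_url would be fetched (for example with the requests library mentioned above), the CodiceSis and aic6 values read from the JSON payload, and the result of rcp_url used to download the PDF.</p>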
        </sec>
        <sec id="sec-3-1-3">
          <title>Example API Call for RCP PDF Retrieval</title>
        </sec>
        <sec id="sec-3-1-4">
          <title>Step 1: Query the Search Endpoint</title>
          <p>https://api.aifa.gov.it/aifa-bdf-eif-be/1.0.0/formadosaggio/ricerca?query={AIC_code}&amp;spellingCorrection=true&amp;page=0</p>
          <p>Note: The input {AIC_code} must be a zero-padded version of the original AIC code (e.g., 123456 → 00123456).</p>
          <p>Response: JSON object containing:</p>
          <list list-type="bullet">
            <list-item><p>CodiceSis (e.g., 10004290)</p></list-item>
            <list-item><p>aic6 (e.g., 123456)</p></list-item>
          </list>
        </sec>
        <sec id="sec-3-1-5">
          <title>Step 2: Construct the PDF Download URL</title>
          <p>https://api.aifa.gov.it/aifa-bdf-eif-be/1.0.0/organizzazione/{CodiceSis}/farmaci/{aic6}/stampati?ts=RCP</p>
          <p>Example: https://api.aifa.gov.it/.../organizzazione/10004290/farmaci/123456/stampati?ts=RCP</p>
          <p>Output: Direct link to the corresponding RCP PDF.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Text Extraction and Section Segmentation</title>
          <p>Once the documents were collected, the pipeline proceeded with text extraction and semantic
segmentation. Text was extracted using the PyMuPDF library, selected for its robustness in handling complex
layouts, preserving reading order, and maintaining basic spatial formatting where feasible. This
approach proved effective in most cases, except for PDFs consisting solely of rasterized images (i.e.,
scanned documents), which lack an embedded text layer. Approximately 4.1% of collected PDFs were
excluded for this reason, as they are incompatible with text-based
parsing. These cases are flagged for future integration through OCR modules, which are currently under
development. The text structuring phase was based on the automatic identification of section headers,
which follow well-defined regulatory conventions in RCPs (e.g., “04.1 Therapeutic Indications”, “04.8
Undesirable Effects”). A robust regular expression was designed to recognize both the numerical and
textual components of the headers, accounting for typographic variability (e.g., spacing, punctuation,
capitalization). This enabled segmentation of each document into blocks corresponding to individual
sections, each assigned a standardized label. Sections not detected were marked as “N/A” in the resulting
dataset, preserving the structural consistency of the data model.</p>
          <p>Tabular Content Handling. Special attention was devoted to Section 04.8 (“Undesirable Effects”), which
often includes tabular structures. While full table parsing was out of scope in this version, raw text
within tables was preserved using PyMuPDF’s line-by-line reading mechanism, which retains spatial
alignment. Although columnar relationships are not explicitly modeled, the output allows partial
semantic interpretation. Future iterations will integrate table extraction tools such as pdfplumber,
camelot, or layout-aware parsing models.</p>
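          <p>The header-based segmentation described in this section can be illustrated with a small sketch. The pattern below is a hypothetical stand-in for the authors' regular expression, which is not reproduced in the text: it accepts an optional leading zero, tolerates variable spacing and a trailing punctuation mark, and requires the header text to start with an uppercase letter so that dosages such as "2.5 mg" at the start of a line are not mistaken for headers. The list of expected labels is likewise an assumption.</p>
          <preformat>
```python
import re

# Hypothetical header pattern: optional leading zero, section digit, dot,
# subsection number, optional punctuation, then a capitalized title.
HEADER_RE = re.compile(
    r"^[ \t]*0?(\d)[ \t]*\.[ \t]*(\d{1,2})[ \t]*[.:\-]?[ \t]*([A-Z].*)$",
    re.MULTILINE,
)

# Assumed set of sections to report; undetected ones are filled with "N/A".
EXPECTED = ["04.1", "04.2", "04.3", "04.4", "04.5", "04.6", "04.8"]


def segment(text, expected=EXPECTED):
    """Split RCP text into {standardized label: section body}."""
    matches = list(HEADER_RE.finditer(text))
    sections = {label: "N/A" for label in expected}
    for i, m in enumerate(matches):
        # Normalize "4.1", "04.1", "04 . 1" to the same label "04.1".
        label = f"0{m.group(1)}.{m.group(2)}"
        # A section body runs until the next detected header (or end of text).
        end = matches[i + 1].start() if i + 1 != len(matches) else len(text)
        sections[label] = text[m.end():end].strip()
    return sections
```
          </preformat>
          <p>Real RCPs use Italian header text and show more typographic variation than this pattern covers, so it should be read as a starting point only.</p>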
        </sec>
        <sec id="sec-3-1-6">
          <title>3.1.3. Data Structuring and Validation</title>
          <p>Extracted data were finally mapped into a tabular dataset, where each row corresponds to an RCP
document and each column to a specific regulatory section. Final validation included completeness
checks (e.g., verifying the presence of expected sections), spot comparisons between raw documents
and extracted text, and analysis of error logs produced by the spider. The combined application of these
methods enabled the construction of a coherent, scalable dataset suitable for downstream analyses in
pharmacological, linguistic, and computational research contexts. Validation steps included logging
and error tracking using loguru and structured output reports. On a random sample of 300 documents,
over 97% of expected sections were correctly identified and segmented. Remaining errors were mostly
due to non-standard formatting or OCR failures.</p>
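          <p>As an illustration of the tabular mapping and the completeness check, the sketch below turns segmented sections into one row per document and computes per-section coverage. The column names and the helper functions to_row and coverage are hypothetical; the actual schema of DART may differ.</p>
          <preformat>
```python
# Assumed section columns; the real dataset schema may differ.
SECTION_COLUMNS = ["04.1", "04.2", "04.3", "04.4", "04.5", "04.6", "04.8"]


def to_row(name, aic, sections):
    """Map one document to a flat row; missing sections become "N/A"."""
    row = {"name": name, "aic": aic}
    for label in SECTION_COLUMNS:
        row[label] = sections.get(label, "N/A")
    return row


def coverage(rows):
    """Fraction of rows in which each section column is present."""
    if not rows:
        return {}
    total = len(rows)
    return {
        label: sum(1 for r in rows if r[label] != "N/A") / total
        for label in SECTION_COLUMNS
    }
```
          </preformat>
          <p>Applied to a random sample, a function like coverage gives the kind of per-section completeness figure reported for the validation step.</p>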
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Preprocessing and Filtering</title>
        <p>After the initial structuring, DART contained 21,502 drugs. A preprocessing step was then applied
to improve the consistency and correctness of the dataset. This phase involved regex-based cleaning
to standardize formatting: eliminating excess whitespace, resolving punctuation inconsistencies, and
normalizing typographic variants in section headers. Documents with empty or flawed text were
identified and excluded. The “05.0 Pharmacological Properties” section was removed entirely due to a
high rate of missing or unusable data, which reduced content density and usability. The result is a dataset with
strong structural consistency and semantic integrity, appropriate for various clinical NLP applications.
Ultimately, 16,029 documents (74.55%) were correctly segmented into at least 5 mandatory regulatory
sections, while the remaining 25.45% were removed due to structural problems or incomplete content,
often originating from OCR errors or missing data.</p>
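        <p>A minimal sketch of the cleaning and filtering step is given below. The specific substitutions in clean and the helper keep are assumptions chosen to match the description (whitespace, punctuation, and a threshold of at least 5 mandatory sections); the list of sections counted as mandatory is not stated in the text and is therefore passed in as a parameter.</p>
        <preformat>
```python
import re


def clean(text):
    """Regex-based normalization of whitespace and punctuation spacing."""
    text = re.sub(r"[ \t\r\f\v]+", " ", text)   # collapse runs of spaces/tabs
    text = re.sub(r" ?\n ?", "\n", text)         # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)       # at most one blank line
    text = re.sub(r" +([,.;:])", r"\1", text)    # no space before punctuation
    return text.strip()


def keep(row, mandatory, min_sections=5):
    """Keep a document only if enough mandatory sections are non-empty."""
    present = sum(
        1 for label in mandatory if row.get(label, "N/A") not in ("", "N/A")
    )
    return present >= min_sections
```
        </preformat>
        <p>The reported retention rate follows directly from the counts above: 16,029 / 21,502 ≈ 74.55%.</p>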
      </sec>
      <sec id="sec-3-3">
        <title>3.3. RCP Summarization</title>
        <p>To improve the usability of the dataset for both human users and NLP systems, a summarization phase
was introduced to condense long and heterogeneous regulatory sections into standardized clinical
summaries. This step facilitates tasks such as text classification, knowledge extraction, semantic search,
and decision support, while also enabling rapid inspection by clinicians and analysts. Summaries
were generated using LLaMA 3.1-405B [28], a state-of-the-art large language model accessed through the
Nvidia NIM API, with a low temperature setting (0.2) to favor consistent output and limit hallucination. Each
summary was limited to 450 words and aimed to capture key information on drug interactions, adverse
events, contraindications, warnings, and pregnancy-related considerations. To guide the generation,
we employed a structured prompt combined with a few-shot learning strategy. Handcrafted examples
were prepended to the prompt to ensure alignment with regulatory tone, content structure, and domain
terminology. Input text was extracted from seven RCP sections (04.1, 04.2, 04.3, 04.4, 04.5, 04.6, and 04.8)
and dynamically inserted into the prompt. The resulting summaries were integrated into the dataset as
an additional field, enhancing its value for downstream applications and enabling comparisons with
real-world DDI/ADR evidence. A manual review of 100 generated summaries showed a high degree of
factual consistency (95%) and minimal hallucination. Most deviations involved stylistic variation or
omission of low-priority details. An expert-based validation protocol is currently under development.</p>
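        <p>The prompt assembly described above (handcrafted few-shot examples prepended, seven RCP sections inserted dynamically, a 450-word limit, temperature 0.2) can be sketched as follows. The instruction wording, the delimiter strings, and the function names are hypothetical, since the authors' prompt is not reproduced in this section, and the call to the model itself is deliberately left out.</p>
        <preformat>
```python
SECTIONS = ["04.1", "04.2", "04.3", "04.4", "04.5", "04.6", "04.8"]
TEMPERATURE = 0.2   # low-temperature decoding, as reported in the text
MAX_WORDS = 450     # summary length limit, as reported in the text

# Hypothetical instruction wording; not the authors' actual prompt.
INSTRUCTION = (
    f"Summarize the RCP sections below in at most {MAX_WORDS} words, covering "
    "drug interactions, adverse events, contraindications, warnings, and "
    "pregnancy-related considerations."
)


def build_prompt(rcp_sections, examples):
    """Prepend few-shot examples, then insert the seven RCP sections."""
    parts = [INSTRUCTION]
    for ex in examples:
        parts.append(
            "### Example input\n" + ex["input"]
            + "\n### Example summary\n" + ex["summary"]
        )
    body = "\n".join(
        "[" + label + "] " + rcp_sections.get(label, "N/A") for label in SECTIONS
    )
    parts.append("### Input\n" + body + "\n### Summary")
    return "\n\n".join(parts)


def within_limit(summary, max_words=MAX_WORDS):
    """True when a generated summary respects the word limit."""
    return max_words >= len(summary.split())
```
        </preformat>
        <p>The resulting string would be sent to the model with the temperature set to TEMPERATURE, and within_limit offers a simple post-hoc check on the returned summary.</p>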
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Dataset Analysis</title>
        <p>DART consists of 16,029 documents, spanning multiple therapeutic areas and regulatory reimbursement
classes. The dataset was last updated in May 2025. The corpus comprises over 95 million tokens, with a
vocabulary of 102,749 unique terms. Document lengths are generally compact: the mean token count
per document is 177.5 (median: 168.3), with a maximum of 9,512 tokens.</p>
        <p>[...] essential reimbursed drugs, respectively. The subclasses C-nn and C-bis are nested under C, which
explains the sum exceeding the total document count.</p>
        <p>Section Coverage and Quality Metrics. We evaluated the presence of key regulatory sections
across documents to assess completeness and usability for NLP tasks. Table 3 reports the coverage
of selected sections critical for pharmacological information extraction. The results indicate a high
degree of consistency, with most sections present in over 90% of the RCPs, ensuring reliable availability
of therapeutic indications, dosage information, contraindications, interactions, and adverse effects
for computational analysis. Only section 04.6 Pregnancy/Lactation has slightly lower
coverage (89.6%).</p>
        <p>Lexical and Semantic Insights. The vocabulary size (≈103k unique terms) includes a significant
portion of technical jargon, multi-token entities, and standard pharmaceutical terminology. Key
pharmacological terms (e.g., “interactions”, “pregnancy”, “contraindications”) occur with high frequency
across classes, supporting targeted NLP extraction. Further lexical analysis is ongoing to quantify
the proportion of domain-specific terms and evaluate term distribution across reimbursement and
therapeutic classes.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Applications</title>
      <p>The DART dataset derived from RCPs supports high-precision tasks in computational pharmacovigilance,
structured biomedical information extraction, and explainable clinical decision support systems. The
dataset is produced by a semantic structuring pipeline that categorizes regulatory text into standardized
groups (e.g., interactions, contraindications, adverse effects). This allows traceable links to source sections,
aligns with regulatory demands, and enhances interpretability, notably during clinical or legal audits.
This structured, authoritative data anchors AI-driven systems, fostering the development of reliable,
explainable tools for clinicians, pharmacists, and health IT systems. Key applications of this dataset are
outlined below.</p>
      <sec id="sec-4-1">
        <title>4.1. LLM-based Drug-Drug Interaction Checker</title>
        <p>To assess the effectiveness of the DART dataset in the context of automated processing of regulatory
information, we designed and implemented an advanced system for the identification of drug–drug
interactions (DDIs), leveraging the capabilities of LLMs. The system takes as input a set of drugs
<italic>F</italic> = (<italic>F</italic><sub>1</sub>, <italic>F</italic><sub>2</sub>, ..., <italic>F</italic><sub><italic>n</italic></sub>), each represented through its active ingredient and the structured sections of the RCP,
extracted directly from the DART dataset. RCPs, being rich and complex technical documents, contain
relevant information for DDI detection dispersed across heterogeneous sections such as “Warnings
and Precautions”, “Interactions”, or “Pharmacokinetic Properties”. However, direct analysis of the full
text proves suboptimal for LLMs due to both input length limitations and high semantic dispersion. To
tackle these issues, as outlined in Section 3.3, we implemented a regulatory summarization
module in which each drug is paired with a corresponding summary, denoted as <italic>F</italic><sup>S</sup> = (<italic>F</italic><sub>1</sub><sup>S</sup>, <italic>F</italic><sub>2</sub><sup>S</sup>, ..., <italic>F</italic><sub><italic>n</italic></sub><sup>S</sup>).</p>
        <p>[Figure 3: DDI detection pipeline. Each RCP is condensed into a summarized RCP; the summaries are passed to the LLM-as-Drug-Drug-Interaction module, which outputs a severity level: Absent, Minor, Moderate, or Major.]</p>
        <p>
This feature, built upon an LLM, produces an organized summary concentrating solely on components
that could be pertinent to analyzing pharmacological interactions. In this initial phase, the system
markedly reduces the complexity of the regulatory text, directing the model’s focus toward clinically
relevant concepts while ensuring compliance with the computational constraints of current LLMs. The
resulting summary is then forwarded to the LLM-as-DDI module, which—through the application of
targeted prompt engineering techniques—detects potential interactions between active pharmaceutical
ingredients, elucidates the underlying pharmacological mechanisms (such as receptor synergies or
enzymatic pathways), and assesses the clinical relevance of each interaction based on the patient’s
profile. This process is followed by the formulation of context-specific recommendations—such as
dosage adjustments or monitoring requirements—and the assignment of a severity level to each identified
interaction. The full pipeline is illustrated in Figure 3 and an end-to-end example is shown in Figure 4.
Interactions are categorized into four ascending levels of clinical severity—Absent, Minor, Moderate,
and Major —in accordance with taxonomies commonly employed in scientific research. Specifically,
Absent indicates the lack of any known or clinically meaningful interaction between the drugs; Minor
denotes a pharmacological interaction of negligible clinical relevance, typically not requiring any
intervention; Moderate refers to a clinically significant interaction that may necessitate monitoring
or dosage adjustments; and Major implies a severe interaction, which is either contraindicated or
requires substantial modifications to the therapeutic regimen. To facilitate comparison with widely used
online tools such as Drugs.com (https://www.drugs.com/drug_interactions.html),
Medscape (https://reference.medscape.com/drug-interactionchecker),
WebMD (https://www.webmd.com/interaction-checker/default.htm), and
RxList (https://www.rxlist.com/drug-interaction-checker.htm), which adopt a binary classification
framework, we employed a simplified model that consolidates the Minor, Moderate, and Major categories
into a single class, labeled Interaction, while retaining Absent as a distinct category. This adaptation
ensures compatibility with systems commonly used in clinical practice. Performance was evaluated
using standard metrics (Precision, Recall, F1-score, and Accuracy) on a manually annotated test set
comprising 100 examples. Particular emphasis was placed on Recall, as it serves as a critical metric
for assessing the system’s ability to detect all clinically relevant drug–drug interactions (DDIs). In
medical contexts, achieving high Recall is essential to minimize false negatives and thereby ensure
patient safety. Table 4 highlights the substantial advantages of the proposed framework. The upper
section of the table presents the results obtained from four established web-based tools. A range of
large language models (LLMs) were tested in standalone configuration, including both closed-source
models (GPT-4o, Claude, Gemini) and open-source models (LLaMA, Mistral, Gemma). Additionally,
the table includes performance data for open-source models enhanced with regulatory summaries
generated via the DART system. Comparative analysis reveals that certain closed-source models, such as
Claude-3.5 and GPT-4o, achieve performance comparable to or exceeding that of conventional clinical
tools. However, the incorporation of DART summarization emerges as a pivotal factor. For instance, the
configuration LLaMA-3.1-8B + DART achieves a Recall of 0.843, substantially outperforming the same
model without summarization (Recall = 0.229), and surpassing most of the evaluated web-based systems.
These findings underscore the critical role of guided regulatory summarization in enhancing DDI
detection capabilities without compromising precision. Overall, the results validate the effectiveness of
the proposed framework: integrating the DART dataset with advanced language models enables even
lightweight open-source architectures to effectively identify complex pharmacological interactions.
This approach demonstrates the potential to rival high-end proprietary systems, offering an optimal
balance of accuracy, coverage, and computational efficiency, factors essential for practical deployment
in both clinical and regulatory domains.</p>
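        <p>The binary evaluation protocol described above can be made concrete with a short sketch: the four severity levels are collapsed into Absent versus Interaction, and Precision, Recall, F1-score, and Accuracy are computed with Interaction as the positive class. The function names are hypothetical, and the labels used in any example call are illustrative rather than taken from the annotated test set.</p>
        <preformat>
```python
def to_binary(label):
    """Collapse Minor/Moderate/Major into a single 'Interaction' class."""
    return "Absent" if label == "Absent" else "Interaction"


def binary_metrics(gold, pred):
    """Precision, recall, F1, and accuracy with 'Interaction' as positive."""
    pairs = [(to_binary(g), to_binary(p)) for g, p in zip(gold, pred)]
    tp = sum(1 for g, p in pairs if g == "Interaction" and p == "Interaction")
    fp = sum(1 for g, p in pairs if g == "Absent" and p == "Interaction")
    fn = sum(1 for g, p in pairs if g == "Interaction" and p == "Absent")
    tn = sum(1 for g, p in pairs if g == "Absent" and p == "Absent")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```
        </preformat>
        <p>Under this scheme a false negative is a true interaction of any severity that the system labels Absent, which is why Recall is the metric most directly tied to patient safety.</p>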
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Other Applications</title>
        <sec id="sec-4-2-1">
          <title>Training and Fine-tuning of Multilingual NLP Models</title>
          <p>The DART dataset serves as a natural
benchmark for training and fine-tuning NLP models specialized in Named Entity Recognition (NER)
and Relation Extraction (RE) in Italian, particularly for regulatory and clinical pharmacology domains.
It includes entities such as active substances, administration routes, pharmacokinetic mechanisms, and
pregnancy risk categories. Thanks to its semantic consistency and structural regularity, the dataset
supports both supervised training and distant supervision, filling a critical gap in the multilingual
biomedical NLP landscape, which remains heavily English-centric.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>Illustrative Example of Drug–Drug Interaction Detection using the DART Framework</title>
        </sec>
        <sec id="sec-4-2-3">
          <title>Input:</title>
          <p>Drug F1: Warfarin (Active Ingredient: Warfarin)</p>
          <p>Drug F2: Ibuprofen (Active Ingredient: Ibuprofen)</p>
        </sec>
        <sec id="sec-4-2-4">
          <title>Step 2 – Extract Summarized RCPs:</title>
          <p>RCP for Warfarin → Summarized RCP F1 (≈ 450 words)</p>
          <p>RCP for Ibuprofen → Summarized RCP F2 (≈ 450 words)</p>
        </sec>
        <sec id="sec-4-2-5">
          <title>Step 3 – Compare Summaries to Detect Interaction:</title>
          <p>The LLM is prompted with the summarized RCPs F1 and F2 → evaluates interaction
risk, mechanism, and severity</p>
        </sec>
        <sec id="sec-4-2-6">
          <title>Output:</title>
          <p>Interaction Detected: Major</p>
          <p>Drug Pair: Warfarin + Ibuprofen</p>
          <p>Mechanism: Inhibition of CYP2C9 by ibuprofen increases the bleeding risk associated with warfarin</p>
          <p>Recommendation: Avoid co-administration or closely monitor INR levels</p>
        </sec>
        <sec id="sec-4-2-7">
          <title>Fine-tuning of Domain-specific LLMs or SLMs.</title>
          <p>The normalized corpus of RCP texts offers a unique foundation for domain-specific fine-tuning of LLMs
or Small Language Models (SLMs) tailored to the Italian pharmaceutical regulatory domain. Potential
downstream applications include automatic classification of clinical risks from free text, assisted
generation of pharmacovigilance reports, and controlled rewriting of regulatory documents (e.g., technical
leaflets, RCPs). Such models could enhance automation and consistency in regulatory workflows,
particularly in contexts requiring traceable and explainable outputs.</p>
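          <p>One way to prepare RCP passages for supervised fine-tuning on a task such as clinical risk
classification is to serialize them as prompt/completion pairs in JSON Lines form. The sketch below is only
an illustration: the field names, the label "interaction", the section title, and the Italian sentence are
invented for the example and are not taken from DART.</p>
          <preformat>
```python
import json


def to_training_record(drug: str, section: str, text: str, label: str) -> str:
    """Serialize one RCP passage as a prompt/completion pair (one JSON Lines row)."""
    record = {
        "prompt": (
            f"Drug: {drug}\nRCP section: {section}\nText: {text}\n"
            "Clinical risk class:"
        ),
        "completion": f" {label}",
    }
    # ensure_ascii=False keeps accented Italian characters readable in the file.
    return json.dumps(record, ensure_ascii=False)


# Invented example passage; a real pipeline would iterate over dataset rows.
line = to_training_record(
    drug="Warfarin",
    section="Interactions",
    text="L'uso concomitante con FANS può aumentare il rischio di sanguinamento.",
    label="interaction",
)
print(line)
```
          </preformat>
          <p>Writing one such line per passage yields a file that most fine-tuning toolchains can ingest after
a small amount of field renaming.</p>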
          <p>Construction of Regulatory Knowledge Graphs. The extracted relational triples (e.g., active
substance → causes → adverse effect, drug → interacts with → compound) can be transformed into
semantic knowledge graphs (KGs). These KGs support automated inference over contraindications and
interactions, and allow structured linking between regulatory sources (e.g., RCPs) and observational
data (e.g., national registries such as RNF or VALORE). Moreover, KGs facilitate the generation of
explainable clinical decision rules, increasing transparency and trust in AI-powered systems.</p>
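          <p>To make the triple-to-graph step concrete, the following minimal sketch indexes
(subject, relation, object) triples and runs a simple inference: finding interacting pairs within a
prescription. The triples, relation names, and the functions build_graph and interactions_in are
illustrative assumptions rather than DART's actual schema.</p>
          <preformat>
```python
from collections import defaultdict

# Hypothetical triples in the (subject, relation, object) form described above.
TRIPLES = [
    ("warfarin", "interacts_with", "ibuprofen"),
    ("warfarin", "causes", "bleeding"),
]


def build_graph(triples):
    """Index triples as graph[subject][relation] -> set of objects."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        graph[subject][relation].add(obj)
        if relation == "interacts_with":
            # Interactions are symmetric, so store the reverse edge as well.
            graph[obj][relation].add(subject)
    return graph


def interactions_in(graph, prescription):
    """Return sorted, de-duplicated pairs of prescribed drugs that interact."""
    drugs = set(prescription)
    pairs = set()
    for drug in drugs:
        # .get avoids creating empty entries for drugs missing from the graph.
        for other in graph.get(drug, {}).get("interacts_with", ()):
            if other in drugs:
                pairs.add(tuple(sorted((drug, other))))
    return sorted(pairs)


graph = build_graph(TRIPLES)
found = interactions_in(graph, ["ibuprofen", "warfarin", "paracetamol"])
print(found)
```
          </preformat>
          <p>A production knowledge graph would more likely use RDF or a graph database, but the same
subject/relation/object indexing underlies the contraindication and interaction queries described above.</p>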
        </sec>
        <sec id="sec-4-2-8">
          <title>Semi-automated Population of Clinical Decision Support Systems (CDSS).</title>
          <p>The structured dataset is well suited for integration into next-generation Clinical Decision
Support Systems that combine structured knowledge (e.g., ontologies, terminologies) with unstructured
textual evidence. Data extracted from RCPs can be used to populate modules within Electronic Health
Records (EHRs), generate safety alerts in hospital pharmacy systems, or support real-time prescription
checks. The goal is to enhance patient safety and prescribing appropriateness by anchoring decisions to
verified, regulatory-grade information.</p>
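          <p>A real-time prescription check of the kind mentioned above might look like the sketch below,
which compares a newly prescribed drug against a patient's active medications and emits alerts that carry
their provenance. The rule table, the InteractionRule fields, and the source string are hypothetical; the
single rule reuses the Warfarin and Ibuprofen example from this paper.</p>
          <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionRule:
    severity: str
    recommendation: str
    source: str  # provenance, e.g. the RCP section the rule was extracted from


# Hypothetical rule table keyed by an unordered drug pair.
RULES = {
    frozenset({"warfarin", "ibuprofen"}): InteractionRule(
        severity="Major",
        recommendation="Avoid co-administration or closely monitor INR levels",
        source="RCP interaction section (placeholder)",
    ),
}


def check_prescription(new_drug, active_drugs, rules=RULES):
    """Return one alert string per known interaction with an active drug."""
    alerts = []
    for current in active_drugs:
        rule = rules.get(frozenset({new_drug.lower(), current.lower()}))
        if rule is not None:
            alerts.append(
                f"[{rule.severity}] {new_drug} + {current}: "
                f"{rule.recommendation} (source: {rule.source})"
            )
    return alerts


alerts = check_prescription("Ibuprofen", ["Warfarin", "Metformin"])
for alert in alerts:
    print(alert)
```
          </preformat>
          <p>Keeping the source field on every rule is what lets an alert be traced back to the regulatory text
it came from, which is the property the paragraph above emphasizes.</p>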
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion &amp; Future Work</title>
      <p>This work introduces a structured and scalable method for transforming Italian Summaries of Product
Characteristics (RCPs) into machine-readable resources for biomedical AI. Through semantic parsing
and organization, we demonstrate their applicability in multiple domains, including pharmacological
interaction checking, domain-specific model tuning, knowledge graph creation, and clinical decision
support systems. Despite their linguistic variability, RCPs offer a strong foundation for transparent and
regulation-compliant AI systems. The resulting dataset serves both as a benchmark for multilingual
biomedical NLP and as a driver for innovation in pharmacovigilance and clinical AI. Future developments
will aim to extend coverage to additional regulatory document types and therapeutic areas, improve
prompt and alignment strategies, introduce validation processes with domain experts, and publish
reusable tools and subsets to support open research in regulatory science.</p>
      <p>Limitations. Although DART represents a relevant step for Italian biomedical NLP, some limitations
apply. It includes only RCPs, thus lacking real-world clinical nuances such as patient adherence or
off-label use. Some AIFA-listed medicines are missing because of technical issues such as malformed or
inaccessible documents, potentially underrepresenting some drug categories. Additionally, the
LLM-based components, while optimized for factual consistency, may miss rare or context-specific details
and remain sensitive to prompt design and model variability.</p>
      <p>Ethical Issues. Since DART relies exclusively on publicly available regulatory texts intended for
healthcare professionals, it presents minimal direct ethical risks. However, caution is necessary when
using generated outputs in clinical contexts, as language models may propagate inaccuracies, particularly
in sensitive areas like drug safety. Human oversight remains essential, and future work should include
expert review and mechanisms to flag uncertainty.</p>
      <p>Data License and Copyright Issues. All documents were sourced from the official website of the
Italian Medicines Agency (AIFA) under public access policies. The DART dataset is released under
the Creative Commons Attribution 4.0 International License (CC BY 4.0), allowing reuse with proper
attribution. However, the original RCPs remain property of AIFA, and downstream use must respect
ethical and regulatory guidelines.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was conducted with the financial support of (1) the PNRR MUR project PE0000013-FAIR and
(2) the Italian Ministry of Economic Development, via the ICARUS (Intelligent Contract Automation for
Rethinking User Services) project (CUP: B69J23000270005).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and DeepL for grammar and
spelling checks. After using these tools, the authors reviewed and edited the content as
needed and take full responsibility for the publication’s content.
</p>
    </sec>
  </body>
  <back>
  </back>
</article>