<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Augmenting Public Procurement Event Logs with Large Language Models: a Legal Process Mining Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ivan Spada</string-name>
          <email>ivan.spada@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emilio Sulis</string-name>
          <email>emilio.sulis@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Audrito</string-name>
          <email>davide.audrito2@unibo.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vittoria Margherita Sofia Trifiletti</string-name>
          <email>vittoriamargheritasofia.trifiletti@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Nai</string-name>
          <email>roberto.nai@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, University of Turin</institution>
          ,
          <addr-line>Corso Svizzera 185, 10149, Torino, IT</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Law, University of Turin</institution>
          ,
          <addr-line>C.so Lungo Dora Siena 100, 10153, Torino, IT</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Legal Studies, University of Bologna</institution>
          ,
          <addr-line>Via Zamboni 27, 40126, Bologna, IT</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>In process mining, the construction of an event log is the foundation for all subsequent analysis. The enrichment of event logs with information extracted from unstructured data is a promising area with great potential that remains largely unexplored. Nevertheless, recent advances in Large Language Models present new opportunities in the field. This paper proposes a framework to extract valuable information from large volumes of textual data. In particular, we demonstrate our approach through a proof-of-concept in the legal domain, analysing textual information from tender notices. We demonstrate how LLMs help identify events and temporal data, enriching event logs and enabling more comprehensive process analysis. Two domain experts in the legal field provide their expertise, e.g. validating the approach, as well as improving the text processing task. Finally, the results obtained are discussed to explore the integration of LLMs to support process mining through the identification of “hidden” information from unstructured data, as in the case of legal texts.</p>
      </abstract>
      <kwd-group>
        <kwd>Process Mining</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Event log enrichment</kwd>
        <kwd>Legal text</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Process mining has emerged as a foundational methodology for analysing and optimising business
processes by systematically examining event logs extracted from information systems. Within the
business process lifecycle, the construction of comprehensive event logs constitutes the critical first
step upon which all subsequent analysis is based. The integrity of process mining outcomes, including
insights into process performance, compliance verification, and efficiency assessments, is fundamentally
contingent on the accuracy, completeness, and granularity of these event logs [
        <xref ref-type="bibr" rid="ref1 ref9">1</xref>
        ].
      </p>
      <p>Conventional approaches primarily derive event logs from structured data repositories, such as
Enterprise Resource Planning systems. However, this established paradigm neglects the substantial
volume of potentially valuable process information embedded within unstructured textual data. Recent
advancements in Natural Language Processing (NLP), particularly the emergence of Large Language
Models (LLMs), have created new opportunities for extracting meaningful insights from unstructured
textual data. These sophisticated models, trained on vast and diverse corpora, demonstrate remarkable
capabilities in context handling, pattern recognition, and entity identification within complex documents.</p>
      <p>This paper presents a systematic investigation into the integration of LLMs within process mining
workflows. Our research focuses specifically on enhancing event log fidelity and comprehensiveness
through the extraction of information embedded in unstructured textual sources. In particular, we
propose a framework to investigate the automated extraction of new events and dates from legal
text. Such information can be incorporated into event logs, enhancing the accuracy of the following
process analysis. The proposed framework highlights the potential of combining LLM-driven extraction
techniques with traditional process mining methodologies, paving the way for more comprehensive
and data-informed process optimisation strategies. In a practical case study, we focus on a specific
application within the legal domain. Specifically, we analyse tender notices that contain rich, yet
unstructured, information about public procurement processes. As a proof-of-concept (POC), we focus
on the complementary information contained in Tenders Electronic Daily (TED) documents from 2022.
The research question investigates how LLMs can extract meaningful process-related information from
unstructured legal texts and enhance the completeness of event logs in public procurement process
analysis.</p>
      <p>The paper is organised as follows: Section 2 synthesises pertinent literature regarding the utilisation
of unstructured data in process mining methodologies. Section 3 delineates our proposed framework,
employed in a case study described in Section 4. Section 5 elucidates the experimental configuration and
preliminary findings, followed by a comprehensive analytical discussion in Section 6. Finally, Section 7
presents concluding observations and implications for future research trajectories.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Process Mining (PM) is a research area that focuses on extracting knowledge from event logs to analyse
and improve business processes [
        <xref ref-type="bibr" rid="ref10 ref2">2</xref>
        ]. The discipline combines techniques from data mining, business
process management, and machine learning to discover, monitor, and optimise workflows based on
real execution data. PM techniques include methods for event log generation, process discovery,
conformance checking, and predictive analytics [
        <xref ref-type="bibr" rid="ref11">3</xref>
        ].
      </p>
      <p>
        Although traditional PM methods rely on structured event logs, the increasing availability of
unstructured data sources has led to a growing interest in combining NLP techniques and process analysis [
        <xref ref-type="bibr" rid="ref12">4</xref>
        ].
NLP techniques can therefore be applied in PM whenever textual information related to business processes is available,
i.e. documents that may contain valuable process-related knowledge to be extracted and
structured into event logs. Such techniques can also be applied to legal processes, as confirmed by the
growing attention to process mining in the legal field [
        <xref ref-type="bibr" rid="ref13">5</xref>
        ].
      </p>
      <p>
        Recent work on LLMs and other transformer-based models, such as BERT and GPT, has focused on
process modelling [
        <xref ref-type="bibr" rid="ref14">6, 7</xref>
        ], abstraction techniques [8], semantics-aware process mining tasks [9], and event
abstraction [10]. A recent work investigates the use of LLMs to create SQL queries for extracting
event logs from databases [11]. The identification of events within textual data can also be
performed with rule-based linguistic techniques (e.g., a recent work investigates
event log enrichment from unstructured legal text with regular expressions [12]). In this work, we
propose the adoption of LLMs to perform the task.
      </p>
      <p>A crucial aspect of NLP-driven PM is the enrichment of event logs by integrating information
extracted from unstructured text. Beyond traditional event identification, event log enrichment involves
associating extracted entities with structured process representations, enhancing timestamps, detecting
missing events, and linking unstructured descriptions to predefined process activities. We address this
challenge with LLM-based automatic extraction of events and dates from text, within a
human-in-the-loop validation framework [13] based on legal domain experts involved in the research.</p>
      <p>
        The increasing adoption of these techniques across various domains further demonstrates the potential
of NLP-enhanced process mining to address complex, real-world challenges. Process mining techniques
have been increasingly applied across diverse domains, including education [14, 15], administrative and
legal settings [16, 17], public procurement [
        <xref ref-type="bibr" rid="ref3">18</xref>
        ], fraud detection [
        <xref ref-type="bibr" rid="ref4">19</xref>
        ], and regulatory compliance [
        <xref ref-type="bibr" rid="ref5">20</xref>
        ],
often in combination with NLP methods [
        <xref ref-type="bibr" rid="ref6 ref7">21, 22</xref>
        ]. These cross-domain applications demonstrate the
flexibility of PM methodologies and highlight the growing relevance of integrating textual information
into process-aware analyses. Building on this trend, recent efforts have begun exploring the application
of NLP-enhanced PM in healthcare scenarios, where the complexity and critical nature of clinical
workflows require explainability [
        <xref ref-type="bibr" rid="ref8">23</xref>
        ], traceability, and robust integration of structured and unstructured data.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodological framework</title>
      <p>In this section, we introduce a methodological framework for enriching process mining event logs
through the extraction of events from unstructured legal documents, which has been implemented in a
POC.</p>
      <p>The proposed framework consists of a pipeline designed to transform unstructured textual information
into structured event log entries compatible with standard process mining algorithms. Although our
starting point was the extraction of a new event with its date, the key steps proposed here can also be
applied to other types of textual information. This pipeline encompasses document
preprocessing, information extraction, validation mechanisms, and description of newly identified
process elements. We detail distinct modules of the methodological pipeline, as illustrated in Figure 1.</p>
      <sec id="sec-3-1">
        <title>3.0.1. Data Inspection and Pattern Identification.</title>
        <p>Following the identification of the domain and initial document corpus, a preliminary qualitative data
inspection conducted by domain experts facilitates the identification of specific cases, requirements, and
patterns. Through this analytical process, relevant textual segments containing pertinent information
emerge, along with additional requirements such as the presence of keywords that identify specific
events or activities beneficial for inclusion in the log file for subsequent process mining tasks.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.0.2. Data Filtering.</title>
        <p>Once the textual segments to be processed have been identified according to the domain-specific requirements, it may be
necessary to apply filtering mechanisms to reduce dimensionality and, consequently, resource
consumption. For instance, events may fall exclusively within a subset of possible activities that can be
identified through natural language processing methodologies, including stemming and lemmatisation.
With respect to dates, those outside the temporal range of interest may be excluded.</p>
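        <p>As an illustration of this module, the following minimal Python sketch keeps only the text segments containing at least one stem and checks whether a date lies within a window of interest. The three stems are taken from the set used later in the proof-of-concept, while the helper names (contains_stem, in_window), the sample segments, and the upper bound of the window are assumptions made only for this example.</p>
        <preformat><![CDATA[
```python
import re
from datetime import date

# Stems taken from the paper's list; the date window is an assumption
# chosen only for illustration.
STEMS = ("deliber", "determin", "decret")
WINDOW = (date(2019, 1, 1), date(2022, 12, 31))


def contains_stem(text, stems=STEMS):
    """Return True if any token of the text starts with one of the stems."""
    tokens = re.findall(r"\w+", text.lower())
    return any(tok.startswith(stem) for tok in tokens for stem in stems)


def in_window(day, window=WINDOW):
    """Return True if the date lies inside the inclusive window."""
    start, end = window
    return start <= day <= end


# Hypothetical text segments, not taken from the TED corpus.
segments = [
    "Determinazione a contrarre n. 12 del 03/02/2022",
    "Il sopralluogo non e' obbligatorio",
]
kept = [s for s in segments if contains_stem(s)]
```
]]></preformat>
        <p>Matching on token prefixes is a lightweight stand-in for stemming; a proper stemmer or lemmatiser could replace contains_stem without changing the rest of the pipeline.</p>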
      </sec>
      <sec id="sec-3-3">
        <title>3.0.3. Event Detection.</title>
        <p>This module extracts the data of interest, e.g. event names and their
corresponding dates, from the (filtered) dataset. When data exhibit standard structures, classical natural
language processing methodologies may be employed (e.g., regular expressions). Alternatively, LLMs
or comparable approaches can be used when the unstructured nature of the data necessitates more
advanced semantic context management.</p>
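        <p>For the rule-based branch of this module, a regular expression may be enough when dates follow a fixed layout. The sketch below assumes the day/month/year layout and a hypothetical helper find_dates; it is not the extraction code used in the proof-of-concept.</p>
        <preformat><![CDATA[
```python
import re
from datetime import datetime

# Candidate dates written as day/month/year with a four-digit year.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def find_dates(text):
    """Return the valid calendar dates found in the text, in order."""
    found = []
    for match in DATE_RE.finditer(text):
        try:
            found.append(datetime.strptime(match.group(0), "%d/%m/%Y").date())
        except ValueError:
            # The pattern matched, but it is not a real date (e.g. 31/02).
            continue
    return found
```
]]></preformat>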
      </sec>
      <sec id="sec-3-4">
        <title>3.0.4. Evaluation on a sample of cases.</title>
        <p>The results obtained must undergo an evaluation to assess the performance of the methodology and
the selected requirements relative to the domain, dataset, and data type utilised. The evaluation may
proceed through automatic or manual approaches contingent upon the existence of a gold standard
or alternative automatic evaluation methods; otherwise, manual inspection would confer enhanced
experimental relevance. Feedback collected at this stage guides the final extraction and
suggests improvements to the data through post-processing.</p>
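        <p>When a gold standard is available, the automatic branch of the evaluation can be as simple as comparing sets of event-date pairs. The following sketch assumes that pairs are represented as tuples and uses a hypothetical function precision_recall; the metrics are standard, but their use here is an illustration rather than the evaluation protocol of the proof-of-concept, which relies on manual annotation.</p>
        <preformat><![CDATA[
```python
def precision_recall(extracted, gold):
    """Compare extracted (event, date) pairs with a gold standard."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Guard against empty inputs to avoid a division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```
]]></preformat>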
      </sec>
      <sec id="sec-3-5">
        <title>3.0.5. Post-processing.</title>
        <p>Results obtained in the LLM extraction phase may contain noise, necessitating further investigation or
removal. For instance, the corresponding documents containing keywords and dates within a specific
range might also encompass event-date pairs irrelevant to log file enrichment.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.0.6. Event log Creation and Enrichment.</title>
        <p>Upon acquisition of new information, as in the case of new event-date pairs, incorporation into either
new log files or existing ones becomes necessary. In the latter scenario, newly inserted entries must
maintain consistent formatting, contain compatible case identifiers (which could align with existing
records in the log file belonging to the same case), and establish default values for potentially incomplete
fields.</p>
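        <p>The enrichment step can be sketched as follows. The column names (case_id, activity, timestamp, resource), the default value, and the function enrich_log are assumptions for illustration; the only behaviour taken from the text is that new entries must match an existing case identifier, share the log format, and receive defaults for missing fields.</p>
        <preformat><![CDATA[
```python
from datetime import date


def enrich_log(log, new_events, default_resource="unknown"):
    """Add (case_id, activity, date) triples to an existing event log."""
    known_cases = {row["case_id"] for row in log}
    enriched = list(log)
    for case_id, activity, day in new_events:
        if case_id not in known_cases:
            # Skip events that cannot be aligned with an existing case.
            continue
        enriched.append({
            "case_id": case_id,
            "activity": activity,
            "timestamp": day,
            "resource": default_resource,  # default for an incomplete field
        })
    # Keep each trace in chronological order.
    return sorted(enriched, key=lambda row: (row["case_id"], row["timestamp"]))
```
]]></preformat>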
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Proof-of-concept</title>
      <p>A case study has been selected to apply the proposed framework. This section describes the data, the
practical steps, and an example of utility in process mining. In particular, we implemented a
proof-of-concept focused on public procurement processes. Specifically, we relied on a previous work in which
we extracted an event log with 6 events, from the publication of the notice to the
closing of the contract [12]. The goal of the present POC is to extract new data, when present, that is
related to the procedure under consideration.</p>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
        <p>We utilised a corpus of TED documents from 2022, which represent official public procurement notices
from the European Union<sup>1</sup>. These documents were selected due to their standardised yet linguistically
complex nature, containing rich procedural information often embedded within legal terminology and
narrative text.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Data on tenders.</title>
          <p>Each tender is identified by an alphanumeric code referred to as the document-number, which serves as
the key value. The TED dataset includes several relevant attributes: the sector each tender falls under
(Services, Works, or Supplies); the NUTS (Nomenclature of Territorial Units for Statistics) code of the
contracting authority responsible for issuing the tender notice; the type of contracting authority (such
as a Ministry, European Institution, or Regional/local authority); and the amount associated with the
tender.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. The legal process.</title>
          <p>We specifically focused on regional authorities within the Italian cases. The legal process of public
procurement includes the following five main events. The process starts with the “publication”
of the tender, followed by the “participation” phase, during which individual entities submit their bids.
After this step, the evaluation process leads to the “award”, where the contract is assigned to the selected
bidder. Subsequently, the contract becomes legally effective (“contract-start”), and the process concludes
upon reaching the deadline specified in the tender notice (“contract-end”).
<fn id="fn1"><label>1</label><p>TED portal: https://data.europa.eu/data/datasets/ted-csv?locale=en</p></fn></p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4.1.3. Overview of the dataset.</title>
          <p>The full dataset explored in our proof-of-concept concerns 27,841 tenders, available in PDF and XML
format. As our main interest is to develop an automated method to extract new activities and the
corresponding dates from the unstructured text of public tenders, we manage the TED dataset in XML
format to extract data of interest. The documents are in the Italian language and contain sections such
as “contracting authority”, “subject”, “legal, economic, financial, and technical information”, “procedure”,
and “additional information”. A recent work provides a broader description of the dataset [16]. In our
POC, we focus on a subset of the Italian tender dataset for the year 2022<sup>2</sup>.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Methods and Technologies</title>
        <p>This subsection describes the application of the framework proposed in Section 3 to the specific legal
case of interest. The development of the modules in Figure 1 is detailed below according to the two
macro sets of modules related to data management and event extraction for insertion into log files.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Document handling.</title>
          <p>Inspection of the previously described dataset showed that the information useful for enriching existing
log files is contained in the “additional information” section, which corresponds to the &lt;INFO_ADD&gt; tags
nested within &lt;COMPLEMENTARY_INFO&gt; in the XML format. Considering the legend of acronyms
and abbreviations used in the Regulations database section<sup>3</sup>, a set of stems<sup>4</sup> and acronyms<sup>5</sup> was defined
to identify relevant entries.</p>
          <p>Because complementary information is an optional section, it is present in only 10781 documents.
Among these, 3505 documents contain dates in the Italian format "dd/mm/yyyy".
Subsequently, the TED documents containing at least one of the selected stems in their complementary
information were filtered, yielding 2352 instances on which to experiment. At this point, a mapping
was stored in a separate document with the caseid, the TED document identifier, and the related
complementary information.</p>
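          <p>The document handling step can be illustrated with the Python standard library. The sketch below collects the text of every INFO_ADD element found inside a COMPLEMENTARY_INFO element. The sample document is a hypothetical, heavily simplified fragment: real TED exports are namespaced and far richer, which is why tags are compared by local name. The helper names are assumptions as well.</p>
          <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified notice; not an actual TED document.
SAMPLE = """<TED_EXPORT><FORM_SECTION><COMPLEMENTARY_INFO>
<INFO_ADD><P>Determina a contrarre n. 45 del 10/01/2022.</P><P>Chiarimenti entro il 20/02/2022.</P></INFO_ADD>
</COMPLEMENTARY_INFO></FORM_SECTION></TED_EXPORT>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def complementary_info(xml_text):
    """Return the text of all INFO_ADD blocks, one block per line."""
    root = ET.fromstring(xml_text)
    chunks = []
    for block in root.iter():
        if local_name(block.tag) != "COMPLEMENTARY_INFO":
            continue
        for node in block.iter():
            if local_name(node.tag) == "INFO_ADD":
                # Join nested paragraph texts and normalise whitespace.
                chunks.append(" ".join(" ".join(node.itertext()).split()))
    return "\n".join(chunks)
```
]]></preformat>
          <p>An empty result signals that the optional section is absent, so such documents can be skipped before any further filtering.</p>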
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Event processing and evaluation.</title>
          <p>The unstructured nature of relevant information contained in TED documents and the flexibility with
which events can be described in the extracted sections led to the choice of leveraging an LLM-based
approach for the event detection task. We selected GPT-4o<sup>6</sup> for the experiment, as it is considered one of
the best-performing models.</p>
          <p>The prompt (Appendix A) requests the extraction of event-date pairs from documents following
six rules useful for describing the task without making the prompt too specific. Subsequently, the
complementary information of the instance is inserted, and the requested output is a table with the
following columns: "event" and "date". This method reduces the randomness of the generated output
structure and produces a result that is both machine-readable and easy for humans to interpret. The
results, in markdown format, were stored in a mapping with the respective caseids.</p>
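          <p>Because the model is asked for a two-column markdown table, its reply can be turned into event-date pairs with a few lines of parsing. The following is a minimal sketch under the assumption that the header row is exactly "event" and "date"; the function parse_markdown_table and the sample reply are hypothetical.</p>
          <preformat><![CDATA[
```python
def parse_markdown_table(markdown):
    """Extract (event, date) rows from a two-column markdown table."""
    rows = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore any prose surrounding the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2:
            continue  # not a two-column row
        if cells[0].lower() == "event" or set(cells[0]) <= set("-: "):
            continue  # header row or separator row
        rows.append((cells[0], cells[1]))
    return rows
```
]]></preformat>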
          <p>In total, 2959 event entries (called "name-of-activity") and dates were obtained. To clean the outputs
and obtain entries to be annotated with actual events, entries whose "name-of-activity" was not contained in
the complementary information were removed, leaving 1015 entries. According to the legal experts,
entries with dates prior to January 1st, 2019, were ignored: they precede the advertisement of the notice
by too long to refer to the case of interest.
<fn id="fn2"><label>2</label><p>The 2022 Italian TED dataset used is available at https://bit.ly/46JZe37</p></fn>
<fn id="fn3"><label>3</label><p>Legend of acronyms/abbreviations used in the Normative section database: "tipologia norma". Available at: https://www.pim.mi.it/legenda-dei-termini-utilizzati-nella-sezione-normativa</p></fn>
<fn id="fn4"><label>4</label><p>["deliber", "det", "determin", "dirett", "provved", "decret", "atti", "comunic", "atto", "ordin"]</p></fn>
<fn id="fn5"><label>5</label><p>["DL", "DM", "DCM", "DPCM", "DLGS", "DAR", "DCR", "DGR", "DUPCR", "DCCR", "DPGR", "DDS", "DSGR", "DDUO", "DDG", "DCIPE", "CU", "OCDPC", "DSM", "DCMM", "DCoM", "DA", "DCC", "DCUC", "DGC", "DCS", "DD", "DRA", "DDP", "DPP", "DCP", "DGP", "CdG", "DComP"]</p></fn>
<fn id="fn6"><label>6</label><p>OpenAI, GPT-4o. Available at: https://openai.com/index/hello-gpt-4o</p></fn></p>
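          <p>The two cleaning rules described above can be sketched as a single filter: an entry is kept only if its activity name occurs in the complementary information of the same case and its date is not earlier than January 1st, 2019. The tuple layout, the case-insensitive comparison, and the function clean_entries are assumptions for illustration.</p>
          <preformat><![CDATA[
```python
from datetime import date, datetime

CUTOFF = date(2019, 1, 1)  # dates before this day are ignored


def clean_entries(entries, source_texts, cutoff=CUTOFF):
    """Filter (case_id, activity, 'dd/mm/yyyy') triples from the LLM."""
    kept = []
    for case_id, activity, raw_date in entries:
        text = source_texts.get(case_id, "")
        # Rule 1: the activity name must occur in the source text.
        if activity.lower() not in text.lower():
            continue
        try:
            day = datetime.strptime(raw_date, "%d/%m/%Y").date()
        except ValueError:
            continue  # unparsable date
        # Rule 2: discard dates before the cutoff.
        if day < cutoff:
            continue
        kept.append((case_id, activity, day))
    return kept
```
]]></preformat>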
          <p>Evaluation. To evaluate the effectiveness of the experiment, two legal experts explored 50 instances
each, selected randomly. Since each instance corresponds to a caseid, each document may contain
multiple event-date pairs within the complementary information, producing a total of 121 entries.
Doubtful cases helped clarify aspects of extraction, as well as identify new possible events to be
extracted.</p>
          <p>The guidelines<sup>7</sup> describe the annotation task, which requires, for each entry, indicating a label and
any notes. The label is "correct" if the extracted pair is suitable for enriching the log
file, "doubt" if there is a problem in the extraction, with the doubts specified in the notes, and "error"
if the extraction is considered incorrect. The considerations on doubtful cases and errors are then
discussed to propose improvements in the extraction or post-processing phase.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>This section presents the main outcomes of the different phases of the POC. We focus initially on
the first results of the extraction evaluation, provided by two domain experts, and then dwell on the
extraction carried out by the LLM. Finally, we provide an example of how information extracted from
language models can significantly enrich the log for legal process analysis.</p>
      <sec id="sec-5-1">
        <title>5.1. Evaluation results</title>
        <p>The evaluation of the sample of LLM results offers interesting insights into several aspects.
The LLM provided the correct result in 71.9% of cases. The "error" cases were about 10%, a non-negligible
percentage. Nevertheless, a manual inspection of the error cases shows that almost all of them can
be handled with post-processing of the LLM results. For instance, the most common error
was the use of a portion of text that did not contain the keywords used in the identification
phase. A post-processing script was therefore sufficient to remove the occurrences that did not
contain the keywords.</p>
        <p>With "doubt" cases, we obtain meaningful insights. First, the manual inspection enables the
identification of new keywords (e.g., "Disposizione"), which can be added to the set of keywords for a
new extraction. Additionally, a verification reveals the identification of new events (e.g., "request for
clarification"). We have opted not to include this event in the actual process analysis, but it could be
considered for future work. Moreover, it was observed that in a few cases, the LLM identifies two
different dates for the same event. This is not an error, as the legal text contains two similar events
occurring on distinct but closely related dates. In such cases, our heuristic retains the earlier date as the
most accurate.</p>
        <p>Finally, we consider it natural that some errors occur during automatic extraction,
but the proposed arrangements (initial keywords to identify the paragraph, stemming, and
post-processing) allow us to achieve a very positive result by minimising errors. In addition, the domain
experts noticed new events of interest with their corresponding dates in the LLM results (e.g., request
for clarification), which, however, are not considered in the current POC, as we are interested in the
administrative decision preceding the publication of the tender.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Event log enrichment with LLMs</title>
        <p>Based on the annotators’ feedback, entries were removed if their "name-of-activity" field did not
contain one of the selected stems (after legal acts had been excluded from consideration). As already
mentioned, dates prior to 2019 were disregarded as they were not relevant to this study.
<fn id="fn7"><label>7</label><p>The annotation guidelines and the annotated spreadsheet are available at https://bit.ly/46JZe37</p></fn></p>
        <p>In cases where multiple dates were detected for the same caseid, the annotators suggested inserting
only the earliest date in the log file, as it is the only one of interest in the process. In this POC, we add
the new event "Administrative decision" to the event log, which can be explored through traditional
process discovery algorithms.</p>
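        <p>The rule suggested by the annotators, keeping only the earliest date per case, reduces to a single pass over the cleaned entries. The sketch below assumes (case_id, date) tuples and a hypothetical function earliest_per_case.</p>
        <preformat><![CDATA[
```python
from datetime import date


def earliest_per_case(entries):
    """Map each case identifier to the earliest of its extracted dates."""
    earliest = {}
    for case_id, day in entries:
        if case_id not in earliest or day < earliest[case_id]:
            earliest[case_id] = day
    return earliest
```
]]></preformat>
        <p>Each resulting pair can then be written to the log as one "Administrative decision" event for the corresponding case.</p>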
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Process Discovery</title>
        <p>To conclude the presentation of our POC, we present the legal process discovered from the enriched
log. The extracted event concerns a subset of cases, as not all TED texts contain such a section, and
not all sections contain the event of interest. By filtering the log for cases starting with the new event,
we obtain new activity diagrams that include 6 events, for both activity frequency and performance
(arcs are weighted by the median duration between activities), as depicted in Figure 2.</p>
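        <p>The frequency and performance views described here both derive from the directly-follows relation of the log. As a hedged illustration, and not a reconstruction of the tool used to produce the figure, the sketch below computes for each arc its frequency and the median duration in days. The triple layout and the function directly_follows are assumptions.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from datetime import date
from statistics import median


def directly_follows(log):
    """Return {(a, b): (frequency, median_days)} for consecutive activities."""
    traces = defaultdict(list)
    for case_id, activity, day in log:
        traces[case_id].append((day, activity))
    durations = defaultdict(list)
    for events in traces.values():
        events.sort()  # chronological order within the case
        for (d1, a1), (d2, a2) in zip(events, events[1:]):
            durations[(a1, a2)].append((d2 - d1).days)
    return {arc: (len(days), median(days)) for arc, days in durations.items()}
```
]]></preformat>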
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>In the present analysis, we identified the full range of prodromal acts that occur before the tender process,
noting that the legal document initiating this process may be designated
in various ways depending on the procurement type and the issuing authority. This variability is
attributable to the fact that different sectors and public bodies operate under distinct procedural
requirements and legal traditions. As a consequence, the fundamental act - the formal decision to
commence the tender process - can be labelled in different ways. Examples of such designations include
"Authorisation to Proceed", "Decision to Contract", "Procurement Decision", "Awarding Decision", and
"Notice of Intended Procurement", reflecting the nuanced influence of legal and administrative practices
on the naming of these essential documents.</p>
      <p>The utilisation of large language models for the extraction of data from legal texts constitutes a
substantial progression in optimising the efficacy of legal document analysis. The evaluation task under
scrutiny highlights the model’s capacity for rapid extraction of key information, including events and
dates. However, the presence of errors in the outputs of LLMs necessitates a cautious approach, given
the inherent complexity and nuanced nature of legal language, which can lead to misinterpretations.
While the risk of embedded biases within the model’s training data appears minimal in this context, it
remains essential that automated data extraction undergoes comprehensive human review to ensure
accuracy and to mitigate any potential for erroneous or discriminatory outcomes. Furthermore, the
capability to accurately ascertain the point in time at which preliminary actions were commenced
during the tender process is of immense legal significance. This enables a more profound comprehension
of the procedural issues related to adjudication and facilitates the identification of any inconsistencies
on the part of the public administration in adhering to the legally mandated time frame between the
prodromal steps and adjudication. An interesting additional consideration concerns the evaluation
phase and the role of domain experts involved in the proposed framework. As the slight disagreement
between the two annotators in the evaluation stage on doubt cases showed, the legal domain presents
considerable complexity. The human-in-the-loop approach therefore still proves to be the appropriate one
for this type of analysis, for which full automation does not seem easily achievable at present.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions and future work</title>
      <p>We introduced a framework that enhances process mining by leveraging LLMs to extract meaningful
information from unstructured data. Through a case study in the legal domain, we demonstrated how
language models can identify events and dates in tender notices, enriching event logs to enhance
process analysis. Our results highlight the potential of integrating LLMs into automated
process analysis to improve the accuracy and depth of event representations. In the future, we intend
to apply this technique to conduct a comprehensive analysis of the distinctions among various public
procurement procedures, taking into account the new implications introduced by the updated Italian
procurement code. First, we aim to conduct a more extensive annotation task involving a larger number
of legal cases. In addition, we intend to systematically analyse instances of annotator disagreement and
compute inter-annotator agreement metrics to assess the reliability and consistency of the annotations,
based on the guidelines proposed and tested in the current work. Moreover, we plan to compare GPT-4o
with other competitor models among both open- and closed-source options. Finally, it is of interest to
extract new events from the data, as we noted in the evaluation phase (request for clarification) and
consider legal documents in different languages.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The research work in this article was partially conducted as part of the following projects: the Circular
Health European Digital Innovation Hub (CHEDIH) - Grant Agreement n. 101083745; PiemontAIs - PR
FESR 2021/2027 Grant Agreement n. 187173.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>
        During the preparation of this work, the authors used Grammarly to perform a grammar and spelling
check. After using this tool, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
      <p>
[
        <xref ref-type="bibr" rid="ref11">3</xref>
        ] W. M. P. van der Aalst, J. Carmona (Eds.), Process Mining Handbook, volume 448 of LNBIP, Springer,
2022. doi:10.1007/978-3-031-08848-3.
[
        <xref ref-type="bibr" rid="ref12">4</xref>
        ] B. Estrada-Torres, A. del-Río-Ortega, M. Resinas, Mapping the landscape: Exploring large language
model applications in business process management, in: H. van der Aa, D. Bork, R. Schmidt,
A. Sturm (Eds.), Enterprise, Business-Process and Information Systems Modeling, volume 511 of
LNBIP, Springer, 2024, pp. 22–31. doi:10.1007/978-3-031-61007-3\_3.
[
        <xref ref-type="bibr" rid="ref13">5</xref>
        ] L. Genga, H. A. López, E. Sulis, Emerging challenges in legal informatics from machine learning
to llms-preface to the proceedings of the 1st plc workshop, in: 1st International Workshop on
Processes, Laws and Compliance, CEUR-WS, 2024. URL: https://ceur-ws.org/Vol-3850/preface.pdf.
[
        <xref ref-type="bibr" rid="ref14">6</xref>
        ] H. Kourani, A. Berti, D. Schuster, W. M. P. van der Aalst, Process modeling with large language
models, in: H. van der Aa, D. Bork, R. Schmidt, A. Sturm (Eds.), Enterprise, Business-Process and
Information Systems Modeling, Springer Nature Switzerland, Cham, 2024, pp. 229–244.
[7] A. Norouzifar, H. Kourani, M. Dees, W. M. P. van der Aalst, Bridging domain knowledge and process
discovery using large language models, in: K. Gdowska, M. T. Gómez-López, J.-R. Rehse (Eds.),
BPM Workshops, Springer Nature Switzerland, Cham, 2025, pp. 44–56.
[8] A. Berti, D. Schuster, W. M. P. van der Aalst, Abstractions, scenarios, and prompt definitions
for process mining with LLMs: A case study, in: J. D. Weerdt, L. Pufahl (Eds.), Business
Process Management Workshops - BPM 2023 International Workshops, Utrecht, The Netherlands,
September 11-15, 2023, Revised Selected Papers, volume 492 of LNBIP, Springer, 2023, pp. 427–439.
doi:10.1007/978-3-031-50974-2_32.
[9] A. Rebmann, F. D. Schmidt, G. Glavaš, H. van der Aa, Evaluating the ability of LLMs to solve
semantics-aware process mining tasks, in: 2024 6th International Conference on Process Mining
(ICPM), 2024, pp. 9–16. doi:10.1109/ICPM63005.2024.10680677.
[10] E. Brzychczy, K. Kluza, L. Szala, Enhancement of low-level event abstraction with large language
models (LLMs), in: K. Gdowska, M. T. Gómez-López, J. Rehse (Eds.), BPM Workshops, Krakow,
Poland, Sept. 1-6, 2024, volume 534 of Lecture Notes in Business Information Processing, Springer,
2024, pp. 209–220. doi:10.1007/978-3-031-78666-2_16.
[11] V. S. Dani, M. Dees, H. Leopold, K. Busch, I. Beerepoot, J. M. E. M. van der Werf, H. A. Reijers,
Event log extraction for process mining using large language models, in: M. Comuzzi, D. Grigori,
M. Sellami, Z. Zhou (Eds.), Cooperative Information Systems - 30th International Conference,
CoopIS 2024, Porto, Portugal, November 19-21, 2024, Proceedings, volume 15506 of Lecture Notes
in Computer Science, Springer, 2024, pp. 56–72. doi:10.1007/978-3-031-81375-7_4.
[12] R. Nai, E. Sulis, L. Genga, Automated analysis with event log enrichment of the European public
procurement processes, in: T. P. Sales, J. Araújo, J. Borbinha, G. Guizzardi (Eds.), Advances in
Conceptual Modeling - ER 2023 Workshops, JUSMOD, Lisbon, Portugal, November 6-9, 2023,
Proceedings, volume 14319 of Lecture Notes in Computer Science, Springer, 2023, pp. 178–188.
doi:10.1007/978-3-031-47112-4_17.
[13] C. Fernandez-Llatas, Interactive Process Mining in Healthcare: An Introduction, Springer
International Publishing, Cham, 2021, pp. 1–9. doi:10.1007/978-3-030-53993-1_1.
[14] R. Nai, E. Sulis, L. Genga, Enhancing e-learning effectiveness: a process mining approach for
short-term tutorials, J. Intell. Inf. Syst. 62 (2024) 1773–1794. doi:10.1007/S10844-024-00874-9.
[15] R. Nai, E. Sulis, E. Marengo, M. Vinai, S. Capecchi, Process mining on students’ web learning
traces: A case study with an ethnographic analysis, in: O. Viberg, I. Jivet, P. J. Muñoz-Merino, M. A.
Perifanou, T. Papathoma (Eds.), Responsive and Sustainable Educational Futures - 18th EC-TEL
2023, Aveiro, Portugal, September 4-8, 2023, Proceedings, volume 14200 of LNCS, Springer, 2023,
pp. 599–604. doi:10.1007/978-3-031-42682-7_48.
[16] R. Nai, E. Sulis, R. Meo, ITH: an open database on Italian tenders 2016–2023, Scientific Data 11
(2024) 1452. doi:10.1038/s41597-024-04342-5.
[17] R. Nai, E. Sulis, P. Pasteris, M. Giunta, R. Meo, Exploitation and merge of information sources
for public procurement improvement, in: Machine Learning and Principles and Practice of
Knowledge Discovery in Databases - International Workshops of ECML PKDD 2022, Grenoble,
France, September 19-23, 2022, Proceedings, Part I, 2022. doi:10.1007/978-3-031-23618-1_6.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mendling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Reijers</surname>
          </string-name>
          , Fundamentals of Business Process Management, Springer,
          <year>2018</year>
          . doi:10.1007/978-3-662-56509-4.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , Process Mining - Data Science in Action, Springer,
          <year>2016</year>
          . doi:10.1007/978-3-662-49851-4.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [18]
          <string-name><given-names>R.</given-names> <surname>Nai</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Sulis</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Meo</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Gorgerino</surname></string-name>,
          <string-name><given-names>G. M.</given-names> <surname>Racca</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Genga</surname></string-name>,
          <article-title>Process mining on a public procurement dataset: A case study</article-title>,
          in: R. Meo, F. Silvestri (Eds.),
          <source>International Workshops of ECML PKDD 2023</source>, Turin, Italy, Sept. 18-22, 2023,
          Revised Selected Papers, Part I, volume <volume>2133</volume> of Communications in Computer and
          Information Science, Springer, <year>2023</year>, pp. <fpage>477</fpage>–<lpage>492</lpage>.
          doi:10.1007/978-3-031-74630-7_35.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [19]
          <string-name><given-names>R.</given-names> <surname>Nai</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Sulis</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Meo</surname></string-name>,
          <article-title>Public procurement fraud detection and artificial intelligence techniques: a literature review</article-title>,
          in: D. Symeonidou, R. Yu, D. Ceolin, M. Poveda-Villalón, D. Audrito, L. D. Caro, F. Grasso, R. Nai,
          E. Sulis, F. J. Ekaputra, O. Kutz, N. Troquard (Eds.),
          <source>Companion Proceedings of the 23rd EKAW Conference</source>, Bozen-Bolzano, Italy,
          September 26-29, volume <volume>3256</volume> of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2022</year>. URL: http://ceur-ws.org/Vol-3256/km4law4.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [20]
          <string-name><given-names>R.</given-names> <surname>Nai</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Sulis</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Audrito</surname></string-name>,
          <string-name><given-names>V. M. S.</given-names> <surname>Trifiletti</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Meo</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Genga</surname></string-name>,
          <article-title>Leveraging process mining and event log enrichment in European public procurement analysis: a case study</article-title>,
          <source>Computer Law &amp; Security Review</source>
          <volume>57</volume> (<year>2025</year>) 106144.
          doi:10.1016/j.clsr.2025.106144.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [21]
          <string-name><given-names>R.</given-names> <surname>Nai</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Meo</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Morina</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Pasteris</surname></string-name>,
          <article-title>Public tenders, complaints, machine learning and recommender systems: a case study in public administration</article-title>,
          <source>Comput. Law Secur. Rev.</source>
          <volume>51</volume> (<year>2023</year>) 105887.
          doi:10.1016/J.CLSR.2023.105887.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [22]
          <string-name><given-names>R.</given-names> <surname>Nai</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Sulis</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Fatima</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Meo</surname></string-name>,
          <article-title>Large language models and recommendation systems: A proof-of-concept study on public procurements</article-title>,
          in: A. Rapp, L. D. Caro, F. Meziane, V. Sugumaran (Eds.),
          <source>NLDB 2024</source>, Turin, Italy, June 25-27, 2024, Proceedings,
          volume <volume>14763</volume> of Lecture Notes in Computer Science, Springer,
          <year>2024</year>, pp. <fpage>280</fpage>–<lpage>290</lpage>.
          doi:10.1007/978-3-031-70242-6_27.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [23]
          <string-name><given-names>R.</given-names> <surname>Meo</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Nai</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Sulis</surname></string-name>,
          <article-title>Explainable, interpretable, trustworthy, responsible, ethical, fair, verifiable AI... what's next?</article-title>,
          in: S. Chiusano, T. Cerquitelli, R. Wrembel (Eds.),
          <source>ADBIS</source>, Turin, Italy, Sept. 5-8, 2022, Proceedings,
          volume <volume>13389</volume> of Lecture Notes in Computer Science, Springer,
          <year>2022</year>, pp. <fpage>25</fpage>–<lpage>34</lpage>.
          doi:10.1007/978-3-031-15740-0_3.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>1. Only include dates in dd/mm/yyyy format</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          2. Ignore incomplete dates (missing day, month, or year)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          3. Event names contain the type of document and document number
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          4. Present results in a markdown table
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          5. Each row contains the event name and its associated date
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          6. No additional text before or after the table
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>