<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhancing Threat Model Validation: A White-Box Approach based on Statistical Model Checking and Process Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roberto Casaluce</string-name>
          <email>roberto.casaluce@santannapisa.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Burratin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesca Chiaromonte</string-name>
          <email>francesca.chiaromonte@santannapisa.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Lluch Lafuente</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Vandin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DTU Technical University of Denmark</institution>
          ,
          <addr-line>Lyngby</addr-line>
          ,
          <country country="DK">Denmark</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Pisa</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Dept. of Statistics and Huck Institutes of the Life Sciences, Penn State University</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Institute of Economics and L'EMbeDS, Sant'Anna School of Advanced Studies</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Our method addresses the challenge of validating threat models by comparing actual behavior with expected behavior. Statistical Model Checking (SMC) is frequently the more appropriate technique for validating models, as it relies on statistically relevant samples to analyze systems with potentially infinite state spaces. In the case of black-box systems, where it is not possible to make complete assumptions about the transition structure, black-box SMC becomes necessary. However, the numeric results of the SMC analysis lack insights on the model's dynamics, prompting our proposal to enhance SMC analysis by incorporating visual information on the behavior that led to a given estimation. Our method improves traditional model validation using SMC by enriching its analyses with Process Mining (PM) techniques. Our approach takes simulated event logs as inputs, and uses PM techniques to reconstruct an observed model to be compared with the graphical representation of the original model, obtaining a dif model highlighting discrepancies among expected and actual behavior. This allows the modeler to address unexpected or missing behaviors. In this paper we further customize the dif model for aspects specific to threat model analysis, incorporating features such as new colored edges to symbolize an attacker's initial assets and a automatic fix for simple classes of modeling errors which generate unexpected deadlocks in the simulated model. Our approach ofers an efective and scalable solution for threat model validation, contributing to the evolving landscape of risk modeling and analysis.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Threat models</kwd>
        <kwd>Probabilistic modeling</kwd>
        <kwd>Statistical model checking</kwd>
        <kwd>Process mining</kwd>
        <kwd>Attack-defense trees</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Our work introduces an approach to validate threat models by providing accessible tools for observing
and comparing the actual behavior of a model with the expected one. When dealing with quantitative
aspects in model behavior validation, exact or statistical analysis techniques are commonly employed.
Exact techniques, while providing precise quantitative properties, may face challenges in handling
complex models due to the state-space explosion problem [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. On the contrary, statistical analysis
techniques, like Statistical Model Checking (SMC) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], rely on statistically relevant samples, ofering a
viable solution for analyzing complex systems with potentially infinite state spaces, albeit providing
statistically reliable estimations rather than exact results. In the case of black-box systems, where it
is not possible to assume the complete transition structure, employing a black-box SMC [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] becomes
necessary for analyzing dynamics without prior system knowledge, yielding numerical estimates and
plots. However, these come without an explanation for the obtained numerical results, i.e., we do not
get an explanation on the observed behavior and on why it led to such results.
      </p>
      <p>Legend Diff Model
Nodes Edges Description</p>
      <p>Edges and Nodes in both models
Initial assets
Edges and Nodes missing in the original model</p>
      <p>Edges and Nodes missing in the simulated model</p>
      <sec id="sec-1-1">
        <title>Input Model</title>
      </sec>
      <sec id="sec-1-2">
        <title>Output Models</title>
        <sec id="sec-1-2-1">
          <title>Graphical representation of procedural</title>
          <p>part of the RisQFLan model</p>
        </sec>
        <sec id="sec-1-2-2">
          <title>Reconstruction of input model</title>
          <p>from SMC simulations via PM</p>
        </sec>
        <sec id="sec-1-2-3">
          <title>Comparison between reconstruction and input models (Diff model)</title>
          <p>
            Consider a classic threat model depicting a thief’s strategies for robbing a bank [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], analyzed using
SMC to quantify the probability of a successful robbery, resulting in an unexpected low value. This
prompts questions about the reasons behind the extreme value and whether it aligns with the model’s
intended dynamic or whether it signifies a bug. The numeric results do not provide any insights into
why the analysis yielded those results or if there are issues in the model. On this basis, we formulated
our proposal [
            <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
            ]. Our approach enhances SMC analysis results by automatically incorporating explicit
visual information highlighting discrepancies between the model and specified criteria. This not only
aids in model refinement but also serves as a comprehensive testing method, allowing experimentation
with various settings within the same model structure to assess simulated model correctness. To be
more specific, our method involves enriching SMC analyses by post-processing and analyzing SMC
byproducts, such as log files stored while performing the simulations. We use data-driven techniques
like Process Mining (PM). PM helps to identify process patterns, bottlenecks, and other model issues
through visualizing activity flow [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ].
          </p>
          <p>Figure 1 depicts an abstract example of a model validated using our method. The input model
represents an abstract model with diferent states, and actions used to move between those states. The
simulator starts from the node A by choosing the corresponding action to move to other states of the
model. To validate the output of an SMC, without our methodology, a modeler can only use the obtained
numerical results, such as the probability of reaching a certain node in the model. Conversely, our
approach enables the modeler to scrutinize simulation results in a model-to-model language, as opposed
to model-to-numbers. Our method takes as inputs the simulated event logs, stored while performing
the simulations necessary for statistical model checking. These logs are mined with Process Mining
techniques and, then, used to reconstruct the first "Output Model". The rightmost model depicts the
graph obtained by comparing the “Input Model“ and the reconstructed one, highlighting the diferences
between the original model and the simulated one. The diferences highlighted in the rightmost "Output
Model," i.e., the dif model, enable the modeler to quickly identify issues, such as unexpected or missing
behaviors.</p>
          <p>
            Several studies on risk modeling and analysis utilize SMC to evaluate the properties of a threat
model [
            <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
            ]. Additionally, there are other works that perform threat modeling analysis using PM - refer
to [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] for an overview of works on cybersecurity and PM. Nevertheless, to the best of our knowledge, the
ifrst preliminary work in which SMC results were enhanced with PM techniques to aid the modeler in
visually identifying unwanted behaviors was presented in our previous works [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. We further extended
such work in [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] by automatically computing a dif model based on the original model specification
language. The latter study demonstrated the efectiveness and scalability of our method, showcasing its
applicability in both Product Line Engineering (PLE), and threat analysis domains. In the current work,
we expand upon the dif model introduced in [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] and further customize it for threat model analysis.
The updated dif model incorporates a new type of edge colored in green, symbolizing the initial assets
possessed by the attacker before initiating the attack. For instance, an initial asset could be one of
the two combinations required to open the bank vault, held by a bank clerk who decides to rob the
bank. Our enhanced dif model takes into account the transitions and states in the model, depicting
the attacker’s initial assets by singling them out from other types of nodes and edges. This distinction
helps to separate them from other transitions that are missing due to an error in the formal model.
Additionally, we have incorporated a component in our method that is able to automatically fix specific
classes of issues discovered in the modeler, in particular when unexpected deadlocks are detected in the
simulated model. In Section 4, we conducted experiments on the RobBank model [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] to illustrate the
efectiveness of our method, and we exemplify the automatic fixes proposed by our framework for the
deadlocks identified in the original model.
          </p>
          <p>The remainder of the paper is structured as follows: Section 2 introduces necessary background
material, along with a brief introduction to the RobBank model as a running example. Following this,
Section 3 presents our methodology for the dif model and the approach to fixing deadlocks. Section 4
validates our method on the presented case study. Finally, Section 5 concludes the paper and outlines
future work.</p>
          <p>We make available the use of our updated tool with all the models, the full-size figures and the
replication material for this paper are available at https://t.ly/FVs6Z.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        2.1. Modeling in RisQFLan
In the present work, we use RisQFLan [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a domain-specific instantiation of QFLan [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ] applied
to the risk modeling and analysis domain. RisQFLan provides more capabilities than other risk analysis
tools since it allows for quantitative analysis of the probability of the attack strategies in security threat
models. RisQFLan allows for SMC analysis and the Probability Model Checking (PMC) technique. In
RisQFLan, the probabilistic simulator and SMC (MultiVeStA) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] interact to estimate the properties of
the model statistically.
      </p>
      <p>
        For the experiments, we adopt the threat model from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], outlining potential attack strategies for
a bank robbery. The authors emphasized that the model has infinite state space which necessitates
using SMC for analysis due to the impracticality of exhaustive exploration. RisQFLan models consist
of a declarative part (Attack-defence tree) and a procedural part (probabilistic attacker). Refer to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
for a detailed presentation of RisQFLan. The Attack-defence tree (ADT) illustrates the potential attack
strategies of an attacker to exploit a system. Attack trees are widely used in several domains like, e.g.,
defense (e.g., their use is recommended by NATO [14]), aerospace [15], or safety-critical cyber-physical
systems [16].
      </p>
      <p>Figure 2 illustrates the ADT depicting the root attack goal of robbing the bank and sub-attacks
like OpenVault and BlowUp. OpenVault refines into LearnCombo and GetToVault, connected by an
AND-relation. LearnCombo further breaks down into three nodes, FindCode, governed by a
2-out-of-3relation. An attack node succeeds only if its refinements permit it. Defense nodes, such as Memo and
LockDown, decrease attack success probabilities. Countermeasures, like LockDown, are dynamic defenses
activated by monitored attacks. The ADT suggests two robbery strategies: opening the vault or blowing
it up, both requiring access to the vault. Opening the vault necessitates learning the combination, itself
dependent on discovering at least two codes.</p>
      <p>In addition, RisQFLan enables users to specify probabilistic attacker behavior, thereby facilitating
specific analyses on diferent types of attackers. Figure 3 illustrates the probabilistic attacker behavior
in the original RobBank model. The capacity to define various attacker types is beneficial for evaluating
OpenVault
BlowUp
Attributes</p>
      <p>Cost = 90.0</p>
      <p>Detection Rate = 1.0
LearnCombo</p>
      <p>GetToVault
FindCode1</p>
      <p>Attributes
Cost = 5.0</p>
      <p>FindCode3</p>
      <p>Attributes</p>
      <p>Cost = 5.0
2:3
2:3</p>
      <p>2:3
FindCode2</p>
      <p>Attributes
Cost = 5.0</p>
      <p>Memo
Defense Effectiveness</p>
      <p>
        ALL : FindCode2 = 0.5
system vulnerabilities and allocating resources for system protection. Furthermore, this probabilistic
attacker behavior in RisQFLan may enable the incorporation of human factors into the model. For
instance, this can be achieved by modifying probabilities associated with specific vulnerabilities in
the system due to user interaction, such as clicking on infected links or falling for phishing emails.
Alternatively, users may design attacker behaviors that account for these human factors. In RisQFLan,
the attacker behavior is modeled as a rated transition system. The transitions are labeled with the
actions executed, and the rates used to compute the probability of completing one of the actions. These
transitions also include the so-called guards that represent the conditions in which an action can be
executed and the possible efects of executing a specific action. Besides the transition guards on a
specific action, RisQFLan supports quantitative constraints that allow the imposition of limitations
on the attacker’s behavior by adding a total cost that an attacker can spend during the attack on a
system. Any attempts to perform an attack will cost the attacker even if the attack fails, and they can
keep trying to complete the attack until the limit imposed by the total cost is not reached. Once the
attacker has run out of resources, they cannot try any other attacks on the system. These quantitative
constraints allow us to test the system against diferent types of attackers with diferent resources.
2.2. SMC
Black-box statistical model checking is a method employed in analyzing RisQFLan models using the
tool MultiVeStA. This approach is applied to quantitative properties, such as the likelihood of robbing the
bank. MultiVeStA conducts numerous probabilistic simulations, ensuring statistically reliable estimations
of the specified property. It operates as a simulation-based technique, making no assumptions about
the overarching model behavior. MultiVeStA, a black-box SMC tool, facilitates simultaneous statistical
analyses of multiple properties, ofering scalability and the generation of confidence intervals for
user-specified parameters. For instance, when estimating the expected value E[X] for robbing the
bank, MultiVeStA computes the mean x from n independent simulations within a confidence interval.
It has been successful applied in diverse domains, including economic agent-based models [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
highlyconfigurable systems [
        <xref ref-type="bibr" rid="ref10 ref11">11, 10</xref>
        ], public transportation systems [17, 18], security risk modeling [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], lending
pools in decentralized finance [ 19], business process modeling [20], crowd steering scenarios [21], and
robotic scenarios with planning capabilities [22].
      </p>
      <p>1. Model creation</p>
      <p>2. Logs generation
Design model</p>
      <p>SMC analysis
Event logs
for process</p>
      <p>mining
3. Logs pre-processing
Graph. rep.
of proced.</p>
      <p>part
of QFlan
model</p>
      <p>Numerical results and
single counter-example</p>
      <p>Black-box
evaluation of
numerical
results
White-box
testing with
process
mining
Mined PM</p>
      <p>model
4. Process mining</p>
      <p>Conversion</p>
      <p>into
procedural
part of QFlan
model</p>
      <p>Behavioral
evaluation of</p>
      <p>PM results
5. Automatic diff</p>
      <p>Diff model</p>
      <p>
        Unexpected behavior discovered with process mining and numerical results
2.3. Process mining
This section provides a brief overview of the primary tasks involved in process-oriented data-driven
techniques, commonly referred to as Process Mining (PM). PM serves as the interdisciplinary field that
connects data-centric and model-based analysis techniques [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. It utilizes real process executions to
derive pertinent insights into how the observed behavior deviates from the intended behavior. PM
encompasses three key activities: (control-flow) discovery, enhancement, and conformance checking.
Discovery seeks to unveil an abstract representation of the executed process by consolidating all
observed instances into a unified model. Once the process model is discovered, it can be enriched
with additional information, such as the frequency of activity/path executions—an activity known as
enhancement. Finally, conformance checking aims to identify cases where the actual executions deviate
from the expected normative model. For the current work, we utilize discovery techniques to extract
information from the execution traces obtained from SMC analyses. We then apply the mined model to
evaluate how well the generated behavior aligns with the initial expectations.
      </p>
      <p>In the following Section 3, we detail our method for constructing the updated dif model, which
graphically highlights disparities between the simulated model’s behavior and the model originally
intended by the modeler. Additionally, we outline the steps employed to automatically resolve deadlocks
identified in the simulated model.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>
        Our method enriches SMC with PM techniques to overcome the limitations of the classic SMC black-box
validation which relies only on numerical results and counterexamples. PM allows us to incorporate a
white-box analysis of the behavior of the model by leveraging mining techniques on simulated model
executions. However, we go beyond a simple discovery algorithm applied to event logs and, instead, we
integrate these two techniques, i.e., SMC and PM, through a graphical component given in the model
specification language to facilitate the automatic discovery of missing or undesirable behaviors in the
model. Figure 5 illustrates our methodology presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The steps employed to create the dif
model are listed below:
• 1. Model creation - Our methodology starts with creating the model where the modeler defines
all the system components, e.g., attack nodes, a list of constraints, and its procedural part.
RisQFLan will then automatically generate the corresponding Attack-defence Tree and a graphical
representation of the procedural part of the RisQFLan model describing the attacker’s behavior.
• 2. Log creation - The second step consists of the simulation of the model in which the modeler
chooses the properties of the modeler that will be evaluated using the SMC tool MultiVeStA.
MultiVeStA has been extended with log capabilities in which two new functionalities enable the
creation of an empty log file, invoked once per SMC analysis, and to add the rows to the log,
invoked whenever an event (of interest) is to be recorded.
• 3. Log pre-processing - In this step, we preprocess the event logs before applying PM techniques.
      </p>
      <p>
        To maintain the correct order and prevent the loss of information about transitions in the logs, we
rename actions by adding names of origin and target states. This step helps avoid losing crucial
information about the executed process.
• 4. Process mining - After pre-processing the event logs, we mine them using the Heuristic Miner
(HM) [
        <xref ref-type="bibr" rid="ref6">23, 6</xref>
        ] algorithm. Once the Mined PM model is discovered, we convert it from a PM model
into a mined RisQFLan model (representing the procedural part of a RisQFLan model). In Figure 1,
this corresponds to the first “Output Model“. In this step, we return the names of the actions to
their original ones which helps to compare the original RisQFLan model with the mined one –
the “Input model“ with the first “Output Model“ in Figure 1.
• 5. Automatic dif - The final step involves parsing graphical representations of the original
RisQFLan model and the mined one to create a dif model, highlighting diferences. The dif
model includes common edges and nodes in both models colored in black, while edges and
nodes unique to one model are highlighted in red. Dashed red edges indicate transitions present
in the original model but missing in the mined one, suggesting constraints preventing those
transitions. Continuous red edges represent transitions only in the mined model, potentially
indicating simulator deadlock states or errors. Our methodology identifies and automatically
ifxes such bugs, providing opportunities to enhance the model beyond classic SMC black-box
validation. Additionally, the dif model includes green edges and nodes, as depicted in Figure 1
in the rightmost "Output model," representing the initial nodes acquired by the attacker before
initiating the system breach.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>4.1. Experiments on the original RobBank model</p>
      <p>To showcase the capability of our approach in identifying undesired behaviors within this domain,
we employ MultiVeStA for the analysis of the query presented in Figure 7. This directs MultiVeStA to
assess the likelihood of success for eight diferent attacks (line 7 in Figure 7) at each simulation step
RisQFLan bbt file
Original model</p>
      <p>Mined PM</p>
      <p>Model</p>
      <p>Extracting the original
components</p>
      <p>Automatic</p>
      <p>Refined</p>
      <p>AB model
Fill in the template</p>
      <p>
        RisQFLan bbt file
Refined model
ranging from 1 to 100. Similar to the approach outlined in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we presume that the attacker possesses a
LaserCutter capable of bypassing the Lockdown defense and has already acquired the initial code for
the vault (line 2 FindoCode1, Figure 7). Our goal is to demonstrate that our methodology efectively
identifies issues in the model and prompts a way to fix them.
4.1.1. Results analysis on the original RobBank model.
      </p>
      <p>MultiVeStA instructed the probabilistic simulator to execute 320 simulations. The analysis results
for step 100 are summarized in Table 1. As expected, the probability of a complete lockdown is zero,
attributed to the presence of the LaserCutter. Consequently, the likelihood of FindCode1 is 1, as both
already belong to the attacker’s initial assets. The SMC results show that the probability of a successful
root attack is 0.19 even with LockDown disabled. This represents a low percentage (approximately 19%)
of simulations concluding with a successful bank robbery. Understanding why this occurs proves to be
challenging through a mere black-box examination of numerical outcomes.</p>
      <p>Illustrated in Figure 8 is the dif model generated by our methodology. It exhibits two red edges
and a red node, two green edges while the remaining edges and nodes are black. The green edges
represent transitions that the simulator never traversed because they were already included in the
initial setup of the attacker (in Figure 7 line 2 FindCode1 is already gained by the attacker). The red
fail(BlowUp) choose add(BlowUp)</p>
      <p>Start
choose add(OpenVault)
fail(OpenVault)
tryAction tryGTV add(GetToVault) fail(GetToVault)
add(RobBank) fail(RobBank)
TryOpenVault</p>
      <p>TryGetToVault</p>
      <p>Complete
tryAction add(LearnCombo) fail(LearnCombo)</p>
      <p>TryLearnCombo
tryAction add(FindCode1) fail(FindCode1) add(FindCode2) fail(FindCode2) add(FindCode3) fail(FindCode3)</p>
      <p>TryFindCode
deadlock
node signifies a unique addition to our methodology: a deadlock node, indicating simulations that
ended unexpectedly due to no enabled transitions. Examining the two red edges reveals that some
simulations ended unexpectedly in the states TryFindCode and TryBlowUp, diminishing the overall
success probability.</p>
      <p>Issue in the model, and an automatic fix. The model has an issue with states TryFindCode and
TryBlowUp, as highlighted in Figure 8. These states can only transition to attempt attacks, and if the
attacker lacks suficient funds for the corresponding attack, they get stuck due to a maximum cost
constraint of 100 EUR (line 6 in Figure 4). To address this, the modeler should add escape transitions
in these nodes, leading back to their parent nodes with no cost, resolving the deadlock problem and
ensuring a more accurate evaluation of properties. Using the mined model, we have implemented an
automatic refinement of the model that adds an escape action in the presence of a deadlock due to
violating the quantitative constraints.
4.2. Refiend model
In Figure 9, we enhance our model by introducing a goBack action, creating transitions from TryBlowUp
to Start and from TryFindCode to TryLearnCombo in Figure 9. The refined attacker behavior, shown
in orange, assigns a low weight of 0.001 to these transitions, encouraging the simulator to select them
only when the other transitions are not allowed.1
4.2.1. Results on the refined model</p>
      <p>Start
tryAction tryGTV add(GetToVault) fail(GetToVault) choose add(BlowUp) fail(BlowUp) goBack</p>
      <p>TryGetToVault TryBlowUp
add(RobBank) fail(RobBank)</p>
      <p>Complete
Thanks to the introduction of these new transitions, the observed dynamics no longer manifest the
previously discussed deadlocks. To ensure this outcome, we utilize the identical query as depicted
in Figure 7. In this particular case, MultiVeStA necessitated the execution of 320 simulations. The
numerical results are listed in Table 2. Except LockDown, which remains static at zero and FindoCode1
which is again equal to 1, all other properties have exhibited an increment Now, the probability of
successfully robbing the bank is approximately 45%. The resultant dif model is shown in Figure 10,
where the absence of red edges or nodes means that the identified issues in the original model are fixed.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and future research</title>
      <p>
        In the present work, we have expanded our approach for validating simulation models, with a specific
focus on threat model analysis, and we are now making our updated tool available for practitioners. The
methodology involves a combination of simulation-based analysis techniques derived from Statistical
Model Checking (SMC)[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and process-oriented data-driven techniques from Process Mining (PM)[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
More specifically, we employ PM to elucidate SMC analyses, generating a graphical representation
of the system behavior as observed in the SMC simulations. This graphical representation has been
enhanced from the original implementation by incorporating additional information beneficial for the
modeler when examining the results of our method. In our experimental evaluation, we demonstrate
that: (1) our methodology aids in identifying issues in the model and distinguishes them from other
information that are not errors, i.e., transitions belonging to the initial assets of the attacker, and (2) it
generates a refined model when errors are detected in the simulated model.
      </p>
      <p>In future investigations, we will focus on developing an automated tool designed to construct a
comprehensive probabilistic attacker behavior based on an attack tree. This tool will assist a modeler
capable of describing a system using an attack tree but unfamiliar with creating a probabilistic attacker
behavior in RisQFLan. The tool will derive the system rules governing the actions permitted or restricted
for the attacker from the attack tree. For example, consider an attack on a parent node, feasible only
when all its child nodes are acquired. In this scenario, the sequence of attacks initiated by the simulator
should be restricted from gaining the parent node before obtaining all the child nodes. This restriction
is crucial, particularly when considering, for instance, an attacker attempting a lateral movement step
1Figure 9 is the preview of the new attacker behavior model created with our automatic tool.
before gaining access to the system. Allowing such a sequence of attacks would make the model
unrealistic. The new automated tool will generate a complete RisQFLan file with a realistic attacker
behavior that understands the rules. The modeler can easily refine it based on the type of attacker
they intend to simulate. Furthermore, as part of future research, we will aim to extend the use of PM
techniques beyond discovering techniques to help the modeler validate a model. Indeed, our dif model
assists the modeler in checking if the behavior of the simulated model corresponds to the behavior
intended by the modeler in the original model. However, the dif model can only highlight mismatches
between the two models without checking if the order of sequences of attacks is correct, which, as
explained above, is crucial. To ensure that the model’s behavior also follows the correct order in
simulating the attack on the system, we need to extract all the possible paths allowed from the attack
tree. We use these paths to mine a process model, and thanks to the use of conformance checking
techniques, we can ensure that the simulated model does not include cases in which the actual behavior
difers from that imposed by the attack tree.
[14] Research and Technology Organisation of NATO, Improving Common Security Risk Analysis
report, RTO Technical Report TR-IST-049, 2008.
[15] U.S. Department of Defense, Defense Acquisition Guidebook, Section 8.5.3.3, 2009. URL: https:
//bit.ly/3NJjs07.
[16] J. Hu, H. Niu, J. Carrasco, B. Lennox, F. Arvin, Fault-tolerant cooperative navigation of networked
uav swarms for forest fire monitoring, Aerospace Science and Technology 123 (2022) 107494.
[17] S. Gilmore, M. Tribastone, A. Vandin, An analysis pathway for the quantitative evaluation of
public transport systems, in: IFM, 2014.
[18] V. Ciancia, D. Latella, M. Massink, R. Paškauskas, A. Vandin, A tool-chain for statistical
spatiotemporal model checking of bike sharing systems, in: ISOLA’17, 2017.
[19] M. Bartoletti, J. H. Chiang, T. Junttila, A. Lluch-Lafuente, M. Mirelli, A. Vandin, Formal analysis of
lending pools in decentralized finance, in: T. Margaria, B. Stefen (Eds.), Leveraging Applications
of Formal Methods, Verification and Validation. Adaptation and Learning - 11th International
Symposium, ISoLA 2022, Rhodes, Greece, October 22-30, 2022, Proceedings, Part III, volume 13703
of Lecture Notes in Computer Science, Springer, 2022, pp. 335–355. URL: https://doi.org/10.1007/
978-3-031-19759-8_21. doi:10.1007/978-3-031-19759-8\_21.
[20] F. Corradini, F. Fornari, A. Polini, B. Re, F. Tiezzi, A. Vandin, A formal approach for the analysis of</p>
      <p>BPMN collaboration models, JSS 180 (2021) 111007.
[21] D. Pianini, S. Sebastio, A. Vandin, Distributed statistical analysis of complex systems modeled
through a chemical metaphor, in: HPCS, 2014, pp. 416–423.
[22] L. Belzner, R. De Nicola, A. Vandin, M. Wirsing, Reasoning (on) service component ensembles in
rewriting logic, in: Spec., Alg., and Soft., 2014, pp. 188–211.
[23] A. Weijters, W. M. van Der Aalst, A. A. De Medeiros, Process mining with the heuristics
mineralgorithm, Technische Universiteit Eindhoven, Tech. Rep. WP 166 (2006) 1–34.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>M. H. ter Beek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Legay</surname>
            ,
            <given-names>A. L.</given-names>
          </string-name>
          <string-name>
            <surname>Lafuente</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Vandin</surname>
          </string-name>
          ,
          <article-title>Quantitative security risk modeling and analysis with RisQFLan</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>109</volume>
          (
          <year>2021</year>
          )
          <fpage>102381</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Agha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Palmskog</surname>
          </string-name>
          ,
          <article-title>A survey of statistical model checking</article-title>
          ,
          <source>ACM Trans. Model. Comp. Simul</source>
          .
          <volume>28</volume>
          (
          <year>2018</year>
          ) 6:
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          :
          <fpage>39</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Younes</surname>
          </string-name>
          ,
          <article-title>Probabilistic verification for “black-box” systems</article-title>
          , in: CAV 2015, Springer,
          <year>2005</year>
          , pp.
          <fpage>253</fpage>
          -
          <lpage>265</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Casaluce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chiaromonte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vandin</surname>
          </string-name>
          ,
          <article-title>Process mining meets statistical model checking: Towards a novel approach to model validation and enhancement</article-title>
          , in: C.
          <string-name>
            <surname>Cabanillas</surname>
            ,
            <given-names>N. F.</given-names>
          </string-name>
          <string-name>
            <surname>Garmann-Johnsen</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Koschmider (Eds.),
          <source>Business Process Management Workshops</source>
          , Springer International Publishing, Cham,
          <year>2023</year>
          , pp.
          <fpage>243</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Casaluce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chiaromonte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Lafuente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vandin</surname>
          </string-name>
          ,
          <article-title>White-box validation of quantitative product lines by statistical model checking and process mining</article-title>
          ,
          <source>Journal of Systems and Software</source>
          <volume>210</volume>
          (
          <year>2024</year>
          )
          <article-title>111983</article-title>
          . URL: https://www.sciencedirect.com/science/article/pii/ S0164121224000268. doi:https://doi.org/10.1016/j.jss.
          <year>2024</year>
          .
          <volume>111983</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>W. M. van der Aalst</surname>
          </string-name>
          , Process Mining, 2nd ed., Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Vallant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stojanović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Božić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hofer-Schmitz</surname>
          </string-name>
          ,
          <article-title>Threat modelling and beyond-novel approaches to cyber secure the smart energy system</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>11</volume>
          (
          <year>2021</year>
          )
          <fpage>5149</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jawad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jaskolka</surname>
          </string-name>
          ,
          <article-title>Analyzing the impact of cyberattacks on industrial control systems using timed automata, in: 2021 IEEE 21st international conference on software quality, reliability and security (QRS)</article-title>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>966</fpage>
          -
          <lpage>977</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Macak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Daubner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Sani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Buhnova</surname>
          </string-name>
          ,
          <article-title>Cybersecurity analysis via process mining: A systematic literature review</article-title>
          ,
          <source>in: International Conference on Advanced Data Mining and Applications</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>393</fpage>
          -
          <lpage>407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vandin</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. H. ter Beek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Legay</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lluch-Lafuente</surname>
          </string-name>
          ,
          <article-title>QFLan: A Tool for the Quantitative Analysis of Highly Reconfigurable Systems</article-title>
          , in: FM,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>M. H. ter Beek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Legay</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lluch-Lafuente</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Vandin</surname>
          </string-name>
          ,
          <article-title>A Framework for Quantitative Modeling and Analysis of Highly (Re)configurable Systems</article-title>
          ,
          <source>IEEE Trans. Software Eng</source>
          .
          <volume>46</volume>
          (
          <year>2020</year>
          )
          <fpage>321</fpage>
          -
          <lpage>345</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>M. H. ter Beek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Legay</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lluch-Lafuente</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Vandin</surname>
          </string-name>
          ,
          <article-title>Statistical analysis of probabilistic models of software product lines with quantitative constraints</article-title>
          , in: D. C. Schmidt (Ed.),
          <source>Proceedings of the 19th International Conference on Software Product Line, SPLC</source>
          <year>2015</year>
          ,
          <article-title>Nashville</article-title>
          ,
          <string-name>
            <surname>TN</surname>
          </string-name>
          , USA, July
          <volume>20</volume>
          -
          <issue>24</issue>
          ,
          <year>2015</year>
          , ACM,
          <year>2015</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>15</lpage>
          . URL: https://doi.org/10.1145/2791060.2791087. doi:
          <volume>10</volume>
          .1145/ 2791060.2791087.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vandin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Giachini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lamperti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chiaromonte</surname>
          </string-name>
          ,
          <article-title>Automated and distributed statistical analysis of economic agent-based models</article-title>
          ,
          <source>Journal of Economic Dynamics and Control</source>
          (
          <year>2022</year>
          )
          <fpage>104458</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>