<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Process Mining meets Statistical Model Checking to Explain Threat Models: Novel Approach to Model Validation and Enhancement (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roberto Casaluce</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Pisa</institution>
          ,
          <addr-line>Lungarno Pacinotti 43 56126 Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sant'Anna School of Adv. Studies</institution>
          ,
          <addr-line>Piazza Martiri della Libertà, 33 - 56127 Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>13</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>Formal verification of the dynamics of a system can be conducted by employing statistical analysis techniques, such as Statistical Model Checking (SMC). SMC techniques resort to probabilistic simulations to evaluate the system properties to help to circumvent the state space explosion problem, the wellknown curse of the classic model checking techniques. Nevertheless, SMC provides only estimations and confidence intervals of the evaluated properties of the system without explaining why the analysis estimated a particular property value. This project aims to present a novel methodology that integrates SMC with process-oriented data-driven techniques known as process mining (PM) applied to threat models. This methodology will empower modelers to see their models' unfolded behavior instead of just numerical aggregated values obtained by SMC analysis. In the present work, there are two research goals. The primary research goal focus on implementing and validating the novel methodology in which we enrich SMC techniques with PM techniques. The secondary research goals focus on implementing an approach to extract an attack pattern from its textual description and another to extract a textual description of the salient information from the process model discovered using PM techniques. The secondary research goals add the necessary means to assist a modeler in using this novel methodology.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Statistical Model Checking (SMC) techniques have been applied to various domains to analyze
the dynamics of the systems, even when those systems have complex dynamics. Indeed, those
complex dynamic systems can only be statistically analyzed by resorting to the simulations of
their properties to avoid running into the space explosion problem common to the other classic
numerical techniques [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. When SMC is used to analyze the dynamics of a system without prior
knowledge of the overarching behavior of the formal model is called black-box SMC [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. As
discussed in the next section, the state-of-the-art methodology for validating a model through
SMC techniques can mainly rely on plots, numerical results, or counterexamples to study the
properties of the model. However, when the results of the analyses are not what the modeler
SMC analysis
(MultiVeStA)
expected, they need to make an informed guess on how to intervene in the model to fix an
unexpected mode’s behavior. The present project aims to help the modeler rely upon more than
just an informed guess to improve the model by integrating SMC with process mining (PM)
techniques.
      </p>
      <p>Two main research goals are presented below, to which we give diferent levels of priority.
Our project’s primary research goal, with the highest priority, is dedicated to working on
the central part of the novel methodology where we enrich SMC with PM techniques. The
secondary research goals are directed at completing the primary research goal. One of the
secondary research goals is to implement ways to automate the extraction of the attack model
and the attacker behavior from a textual description. The other secondary research goal is
directly to present through a textual description the most salient information discovered by
mNuimneirnicagl simulations of the formal model using PM techniques. When the salient information
results and
coeuxnsttienrrgealxecamtepd from the simulations comevmBallauucaknt-iboioncxoaftes the presence in the model of some unwanted
behleaviors, we could use this informnuamteiroicnal rteosulfixts the formal model automatically.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Research goals</title>
      <p>White-box eBvaelhuaavtiioonraolf
2.1. Primary RepsroecaenasasrlymsciisnhingGoal process mining</p>
      <p>
        results
The first research goal is to propose a novel methodology in which SMC is enriched with PM
techniques that assists a modeler in validating and identifying flows in the model or enhancing
opportunities. Fig. 1 depicts the life cycle of the state-of-the-art methodology based on SMC
that starts with the model creation followed by SMC analysis that returns numeric results, plots,
and counteUrneexxpaemctepdlbeehsa[vi3or]d.isWcovhereendwtithheprnocuesmsmeirniincgaalndrenusmuelrticsalarreseulntsot what the modeler expected, e.g., if
the probability of success of an attack is too low, the modeler needs to make an informed guess
on how to change the model to improve its performance. In our previous work, on the first
experimental results of our methodology [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we called this way to validate a model SMC-guided
black-box model validation since the modeler evaluates the model and makes changes based
only on the numerical results without any other information.
      </p>
      <p>Model creation</p>
      <p>The novel methodology proposed here, depicted in Fig. 2, is defined as an SMC- and PM-guided
white-box behavioral model validation in which the state-of-the-art method, depicted in Fig. 1, is
augmented with PM techniques. The additions of the novel methodology are colored in green
in Fig. 2. Now, besides the numerical results, the modeler also has, thanks to PM techniques,
a behavioral evaluation of the model to support them in identifying flaws and improving the
model. Indeed, this is defined as a white-box behavioral model validation precisely because the
graphical results obtained by the PM analysis represent a closer representation of the original
model displaying an overview of its overarching behavior.</p>
      <p>
        Methodology. We use RisQFLan, a quantitative graphic-based framework for risk modeling
and analysis [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] belonging to the so-called QFLan family [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. RisQFLan uses graph-based
security models called Attack Defense Trees, a variant of the classic AttacrNekusumltestraircnaedles that can
single
represent an intuitive visual language to describe an attack scenario. In adcodunitterieoxanmp, RisQFLan
le
supports SMC analysis using MultiVeStA [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], a black-box statiSsMtCiacnaalylsismodel checker thatBlaccka-bnox
be integrated with diferent simulators to add more reMloidael bcrelaetionstatisti(cMualtliVeaStAn) alysis techniquesn.umWericael results
evaluation of
have enriched MultiVeStA with PM-oriented logging capabilities by extending its interface
with simulators with new features that allow saving the complete traces ofInefoarmcehdguess driven by numericalaressults
simulation
event logs. Although we implement the novel methodology proposed using MultiVeStA and
RisQFLan, this can be applied to any simulation models and SMC tools since it does not rely on
the internal mechanics of either the analyzer or the model but exploits logs of the computed
simulations. We implement PM techniques using Fluxicon Disco and PM4PY, a Python library,
to experiment with diferent PM algorithms.
2.2. Secondary Research Goals
Secondary research goals mainly aim to help non-experts apply this novel methodology to
validate threat models. On the one hand, we could automate the extraction of the attack
models and the attacker behaviors directly using the textual description from diferent data
sources. Currently, the model creation (Fig. 2) is performed by manually encoding the model in
RisQFLan; therefore, an automatic creation from a textual description of the attack model could
improve the usability of our methodology. Moreover, a model extracted from a description
could help the modeler automatically enhance the threat model when new attack strategies
are available to the attacker. On the other hand, another way to increase the usability of the
novel methodology is achieved by describing the most salient information of the process model
obtained from the event logs. Indeed, a process model extracted from the simulated event
logs representing the model’s behavior might need to be explained to the non-expert of PM
when they visually inspect the process model. Here, we could present the user with a textual
description of the unwanted behaviors of the model discovered to help the modeler to identify
what needs to be correct in the formal model.
      </p>
      <p>
        Methodology. To implement the automatic creation of the attack model and attacker behavior,
we experiment with NLP techniques using pre-trained models, such as the GPT3 model that
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] used with promising results. In their work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], they demonstrated how it is possible
to extract meaningful information from the text describing a process using Large Language
Models (LLM). In our case, we would extract from the textual description of a threat model the
activities/strategies an attacker can use to complete their attack. We will investigate techniques
to enhance the performance of the GPT-3 model by improving the prompt design [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ].
Then, we could fill in templates with the information extracted and feed them into RisQFLan
to create the actual attack models. Furthermore, within the secondary research goals, we
could implement a method to automatically extract a textual description of the most salient
information found in the process model by employing fuzzy quantification techniques and
natural language generation (NLG) tools [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This approach would help identify unwanted
behaviors in the formal model during the phase of evaluating the model’s behavior discovered
by mining the event logs.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Planned Research</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we presented a prototypical instantiation of our approach. We demonstrated how even a
trivial threat model could display unexpected behaviors. Indeed, we could find and fix unwanted
behaviors in the formal model thanks to PM techniques. Although these experimental results
showed the potentiality of the methodology, this needs to be validated with experiments on
complete threat models. Therefore, the next step in our project would be to work on real threat
models to validate our methodology. At the same time, we will experiment with diferent
discovery PM algorithms or create an ad hoc algorithm to extract the attack models from the
event logs. If an ad hoc algorithm is needed, we will also work on a model-to-model metric to
measure how much the normative models overlay with the discovered ones. We will experiment
with the secondary research goals once we have completed the previous steps for the primary
research goal. The last step would be to put together the primary and secondary research goals
in a final work.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>Simulation-based validation approaches run statistical analysis to evaluate the properties of
a simulated model returning numerical estimations of those properties. However, without
providing behavioral explanations on why the analyses returned those results, the modeler
can make only an informed guess based on the numeral results obtained to adjust the model
and fix unwanted behaviors. The present project proposes a novel methodology that can help
reason upon the whole behavior of the model and understand why the analysis of the model’s
properties has returned some estimations of those properties. We achieve that by integrating
the simulation-based analysis technique from formal quantitative methods known as SMC
with the data- and process-driven techniques known as PM. Thanks to the widespread use of
simulation models, this new methodology would be valuable among several other disciplines to
help identify issues in the model (validation) or suggest relevant improvements (enhancement)
to the modeler.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Agha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Palmskog</surname>
          </string-name>
          ,
          <article-title>A survey of statistical model checking</article-title>
          ,
          <source>ACM Trans. Model. Comp. Simul</source>
          .
          <volume>28</volume>
          (
          <year>2018</year>
          ) 6:
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          :
          <fpage>39</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Viswanathan</surname>
          </string-name>
          , G. Agha,
          <article-title>Statistical model checking of black-box probabilistic systems</article-title>
          , in: International Conference on Computer Aided Verification, Springer,
          <year>2004</year>
          , pp.
          <fpage>202</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bulychev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>David</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. G.</given-names>
            <surname>Larsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mikucionis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Poulsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Legay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          , UPPAAL-SMC:
          <article-title>statistical model checking for priced timed automata</article-title>
          ,
          <source>in: Proc. QAPL'12</source>
          , volume
          <volume>85</volume>
          ,
          <year>2012</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Casaluce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chiaromonte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vandin</surname>
          </string-name>
          ,
          <article-title>Process mining meets statistical model checking: Towards a novel approach to model validation and enhancement</article-title>
          , in: C.
          <string-name>
            <surname>Cabanillas</surname>
            ,
            <given-names>N. F.</given-names>
          </string-name>
          <string-name>
            <surname>Garmann-Johnsen</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Koschmider (Eds.),
          <source>Business Process Management Workshops</source>
          , Springer International Publishing, Cham,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>M. H. ter Beek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Legay</surname>
            ,
            <given-names>A. L.</given-names>
          </string-name>
          <string-name>
            <surname>Lafuente</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Vandin</surname>
          </string-name>
          ,
          <article-title>Quantitative security risk modeling and analysis with RisQFLan</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>109</volume>
          (
          <year>2021</year>
          )
          <fpage>102381</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vandin</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. H. ter Beek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Legay</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lluch-Lafuente</surname>
          </string-name>
          ,
          <article-title>QFLan: A Tool for the Quantitative Analysis of Highly Reconfigurable Systems</article-title>
          , in: FM,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sebastio</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Vandin,
          <article-title>MultiVeStA: statistical model checking for discrete event simulators</article-title>
          ,
          <source>in: 7th Int. Conf ValueTools'13</source>
          , ICST/ACM,
          <year>2013</year>
          , pp.
          <fpage>310</fpage>
          -
          <lpage>315</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vandin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Giachini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lamperti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chiaromonte</surname>
          </string-name>
          ,
          <article-title>Automated and distributed statistical analysis of economic agent-based models</article-title>
          ,
          <source>Journal of Economic Dynamics and Control</source>
          (
          <year>2022</year>
          )
          <fpage>104458</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bellan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dragoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghidini</surname>
          </string-name>
          ,
          <article-title>Leveraging pre-trained language models for conversational information seeking from text</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Narayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Orr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Guha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bhatia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Chami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ré</surname>
          </string-name>
          ,
          <article-title>Ask me anything: A simple strategy for prompting language models</article-title>
          ,
          <source>arXiv preprint arXiv:2210.02441</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Saparov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Language models are greedy reasoners: A systematic formal analysis of chain-of-thought</article-title>
          ,
          <source>arXiv preprint arXiv:2210.01240</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fontenla-Seco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>González-Salvado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Peña-Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bugarín-Diz</surname>
          </string-name>
          ,
          <article-title>A framework for the automatic description of healthcare processes in natural language: Application in an aortic stenosis integrated care process</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>128</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>