=Paper= {{Paper |id=Vol-3299/Paper03 |storemode=property |title=Process Mining meets Statistical Model Checking to Explain Threat Models: Novel Approach to Model Validation and Enhancement (Extended Abstract) |pdfUrl=https://ceur-ws.org/Vol-3299/Paper03.pdf |volume=Vol-3299 |authors=Roberto Casaluce |dblpUrl=https://dblp.org/rec/conf/icpm/Casaluce22 }} ==Process Mining meets Statistical Model Checking to Explain Threat Models: Novel Approach to Model Validation and Enhancement (Extended Abstract)== https://ceur-ws.org/Vol-3299/Paper03.pdf
Process Mining meets Statistical Model Checking to
Explain Threat Models: Novel Approach to Model
Validation and Enhancement (Extended Abstract)
Roberto Casaluce1,2
1
    Department of Computer Science, University of Pisa, Lungarno Pacinotti 43 56126 Pisa, Italy
2
    Sant’Anna School of Adv. Studies, Piazza Martiri della Libertà, 33 - 56127 Pisa, Italy


                                         Abstract
                                         Formal verification of the dynamics of a system can be conducted by employing statistical analysis
                                         techniques, such as Statistical Model Checking (SMC). SMC techniques resort to probabilistic simulations
                                         to evaluate the system properties to help to circumvent the state space explosion problem, the well-
                                         known curse of the classic model checking techniques. Nevertheless, SMC provides only estimations
                                         and confidence intervals of the evaluated properties of the system without explaining why the analysis
                                         estimated a particular property value. This project aims to present a novel methodology that integrates
                                         SMC with process-oriented data-driven techniques known as process mining (PM) applied to threat
                                         models. This methodology will empower modelers to see their models’ unfolded behavior instead of
                                         just numerical aggregated values obtained by SMC analysis. In the present work, there are two research
                                         goals. The primary research goal focus on implementing and validating the novel methodology in which
                                         we enrich SMC techniques with PM techniques. The secondary research goals focus on implementing
                                         an approach to extract an attack pattern from its textual description and another to extract a textual
                                         description of the salient information from the process model discovered using PM techniques. The
                                         secondary research goals add the necessary means to assist a modeler in using this novel methodology.

                                         Keywords
                                         Process Mining, Statistical Model Checking, Validation, Natural Language Processing, Natural Language
                                         Generation




1. Introduction
Statistical Model Checking (SMC) techniques have been applied to various domains to analyze
the dynamics of the systems, even when those systems have complex dynamics. Indeed, those
complex dynamic systems can only be statistically analyzed by resorting to the simulations of
their properties to avoid running into the space explosion problem common to the other classic
numerical techniques [1]. When SMC is used to analyze the dynamics of a system without prior
knowledge of the overarching behavior of the formal model is called black-box SMC [2]. As
discussed in the next section, the state-of-the-art methodology for validating a model through
SMC techniques can mainly rely on plots, numerical results, or counterexamples to study the
properties of the model. However, when the results of the analyses are not what the modeler

ICPM 2022 Doctoral Consortium and Tool Demonstration Track
$ roberto.casaluce@santannapisa.it (R. Casaluce)
 0000000257869167 (R. Casaluce)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)




                                                                                                          13
                expected, they need to make an informed guess on how to intervene in the model to fix an
                unexpected mode’s behavior. The present project aims to help the modeler rely upon more than
                just an informed guess to improve the model by integrating SMC with process mining (PM)
                techniques.
                     Two main research goals are presented below, to which we give different levels of priority.
                Our project’s primary research goal, with the highest priority, is dedicated to working on
                the central part of the novel methodology where we enrich SMC with PM techniques. The
                secondary research goals are directed at completing the primary research goal. One of the
                secondary research goals is to implement ways to automate the extraction of the attack model
                and the attacker behavior from a textual description. The other secondary research goal is
                directly to present through a textual description the most salient information discovered by
                mining
                 Numerical simulations of the formal model using PM techniques. When the salient information
                 results and
                extracted
                    single
               counterexamp
                             from the simulations communicates
                                                        Black-box
                                                       evaluation of   the presence in the model of some unwanted
                                                     numerical results
                behaviors, we could use this information to fix the formal model automatically.
                      le


SMC analysis
(MultiVeStA)
               2. Research goals
                                        White-box           Behavioral
                                      process mining       evaluation of
                                                          process mining
               2.1. Primary Research Goalanalysis
                                                              results


               The first research goal is to propose a novel methodology in which SMC is enriched with PM
 Event logs
               techniques that assists a modeler in validating and identifying flows in the model or enhancing
               opportunities. Fig. 1 depicts the life cycle of the state-of-the-art methodology based on SMC
               that starts with the model creation followed by SMC analysis that returns numeric results, plots,
               and counterexamples        [3].discovered
                           Unexpected behavior  Whenwith theprocess
                                                              numerical
                                                                    mining andresults
                                                                               numericalare   not what the modeler expected, e.g., if
                                                                                         results

               the probability of success of an attack is too low, the modeler needs to make an informed guess
               on how to change the model to improve its performance. In our previous work, on the first
               experimental results of our methodology [4], we called this way to validate a model SMC-guided
               black-box model validation since the modeler evaluates the model and makes changes based
               only on the numerical results without any other information.

                                                                                    Numerical
                                                                                    results and
                                                                                       single
                                                                                  counterexamp
                                                                                         le


                                                           SMC analysis                                     Black-box
                                      Model creation       (MultiVeStA)                                    evaluation of
                                                                                                         numerical results




                                                                                Informed guess driven by numerical results



               Figure 1: SMC-guided black-box model validation.


                 The novel methodology proposed here, depicted in Fig. 2, is defined as an SMC- and PM-guided
               white-box behavioral model validation in which the state-of-the-art method, depicted in Fig. 1, is




                                                                           14
augmented with PM techniques. The additions of the novel methodology are colored in green
in Fig. 2. Now, besides the numerical results, the modeler also has, thanks to PM techniques,
a behavioral evaluation of the model to support them in identifying flaws and improving the
model. Indeed, this is defined as a white-box behavioral model validation precisely because the
graphical results obtained by the PM analysis represent a closer representation of the original
model displaying an overview of its overarching behavior.

                                                   Numerical
                                                   results and
                                                      single                                            Black-box
                                                 counterexamp                                          evaluation of
                                                        le                                           numerical results



                 Model creation   SMC analysis
                                  (MultiVeStA)


                                                                               White-box                Behavioral
                                                                             process mining            evaluation of
                                                                                analysis              process mining
                                                                                                          results



                                   Event logs




                                                                 Unexpected behavior discovered with process mining and numerical results



Figure 2: SMC- and PM-guided white-box behavioral model validation.


Methodology. We use RisQFLan, a quantitative graphic-based framework for risk modeling
and analysis [5] belonging to the so-called QFLan family [6]. RisQFLan uses graph-based
security models called Attack Defense Trees, a variant of the classic Attack         resultstrees
                                                                                     Numerical
                                                                                             and       that can
                                                                                        single
represent an intuitive visual language to describe an attack scenario. In addition,counterexamp
                                                                                          le        RisQFLan
supports SMC analysis using MultiVeStA [7, 8], a black-box statistical
                                                                   SMC analysis
                                                                               model  checker           thatBlack-box
                                                                                                                 can
                                                    Model creation                                           evaluation of
be integrated with different simulators to add more reliable statistical     analysis techniques.
                                                                   (MultiVeStA)
                                                                                                                  We
                                                                                                           numerical results


have enriched MultiVeStA with PM-oriented logging capabilities by extending its interface
with simulators with new features that allow saving the complete traces of Informed
                                                                                  eachguess  simulation
                                                                                                 driven by numericalas
                                                                                                                     results

event logs. Although we implement the novel methodology proposed using MultiVeStA and
RisQFLan, this can be applied to any simulation models and SMC tools since it does not rely on
the internal mechanics of either the analyzer or the model but exploits logs of the computed
simulations. We implement PM techniques using Fluxicon Disco and PM4PY, a Python library,
to experiment with different PM algorithms.

2.2. Secondary Research Goals
Secondary research goals mainly aim to help non-experts apply this novel methodology to
validate threat models. On the one hand, we could automate the extraction of the attack
models and the attacker behaviors directly using the textual description from different data
sources. Currently, the model creation (Fig. 2) is performed by manually encoding the model in
RisQFLan; therefore, an automatic creation from a textual description of the attack model could
improve the usability of our methodology. Moreover, a model extracted from a description
could help the modeler automatically enhance the threat model when new attack strategies
are available to the attacker. On the other hand, another way to increase the usability of the




                                                                  15
novel methodology is achieved by describing the most salient information of the process model
obtained from the event logs. Indeed, a process model extracted from the simulated event
logs representing the model’s behavior might need to be explained to the non-expert of PM
when they visually inspect the process model. Here, we could present the user with a textual
description of the unwanted behaviors of the model discovered to help the modeler to identify
what needs to be correct in the formal model.

Methodology. To implement the automatic creation of the attack model and attacker behavior,
we experiment with NLP techniques using pre-trained models, such as the GPT3 model that
[9] used with promising results. In their work [9], they demonstrated how it is possible
to extract meaningful information from the text describing a process using Large Language
Models (LLM). In our case, we would extract from the textual description of a threat model the
activities/strategies an attacker can use to complete their attack. We will investigate techniques
to enhance the performance of the GPT-3 model by improving the prompt design [10, 11].
Then, we could fill in templates with the information extracted and feed them into RisQFLan
to create the actual attack models. Furthermore, within the secondary research goals, we
could implement a method to automatically extract a textual description of the most salient
information found in the process model by employing fuzzy quantification techniques and
natural language generation (NLG) tools [12]. This approach would help identify unwanted
behaviors in the formal model during the phase of evaluating the model’s behavior discovered
by mining the event logs.


3. Planned Research
In [4], we presented a prototypical instantiation of our approach. We demonstrated how even a
trivial threat model could display unexpected behaviors. Indeed, we could find and fix unwanted
behaviors in the formal model thanks to PM techniques. Although these experimental results
showed the potentiality of the methodology, this needs to be validated with experiments on
complete threat models. Therefore, the next step in our project would be to work on real threat
models to validate our methodology. At the same time, we will experiment with different
discovery PM algorithms or create an ad hoc algorithm to extract the attack models from the
event logs. If an ad hoc algorithm is needed, we will also work on a model-to-model metric to
measure how much the normative models overlay with the discovered ones. We will experiment
with the secondary research goals once we have completed the previous steps for the primary
research goal. The last step would be to put together the primary and secondary research goals
in a final work.


4. Conclusion
Simulation-based validation approaches run statistical analysis to evaluate the properties of
a simulated model returning numerical estimations of those properties. However, without
providing behavioral explanations on why the analyses returned those results, the modeler
can make only an informed guess based on the numeral results obtained to adjust the model




                                               16
and fix unwanted behaviors. The present project proposes a novel methodology that can help
reason upon the whole behavior of the model and understand why the analysis of the model’s
properties has returned some estimations of those properties. We achieve that by integrating
the simulation-based analysis technique from formal quantitative methods known as SMC
with the data- and process-driven techniques known as PM. Thanks to the widespread use of
simulation models, this new methodology would be valuable among several other disciplines to
help identify issues in the model (validation) or suggest relevant improvements (enhancement)
to the modeler.


References
 [1] G. Agha, K. Palmskog, A survey of statistical model checking, ACM Trans. Model. Comp.
     Simul. 28 (2018) 6:1–6:39.
 [2] K. Sen, M. Viswanathan, G. Agha, Statistical model checking of black-box probabilistic
     systems, in: International Conference on Computer Aided Verification, Springer, 2004, pp.
     202–215.
 [3] P. E. Bulychev, A. David, K. G. Larsen, M. Mikucionis, D. B. Poulsen, A. Legay, Z. Wang,
     UPPAAL-SMC: statistical model checking for priced timed automata, in: Proc. QAPL’12,
     volume 85, 2012, pp. 1–16.
 [4] R. Casaluce, A. Burattin, F. Chiaromonte, A. Vandin, Process mining meets statistical
     model checking: Towards a novel approach to model validation and enhancement, in:
     C. Cabanillas, N. F. Garmann-Johnsen, A. Koschmider (Eds.), Business Process Management
     Workshops, Springer International Publishing, Cham, 2022.
 [5] M. H. ter Beek, A. Legay, A. L. Lafuente, A. Vandin, Quantitative security risk modeling
     and analysis with RisQFLan, Computers & Security 109 (2021) 102381.
 [6] A. Vandin, M. H. ter Beek, A. Legay, A. Lluch-Lafuente, QFLan: A Tool for the Quantitative
     Analysis of Highly Reconfigurable Systems, in: FM, 2018.
 [7] S. Sebastio, A. Vandin, MultiVeStA: statistical model checking for discrete event simulators,
     in: 7th Int. Conf ValueTools’13, ICST/ACM, 2013, pp. 310–315.
 [8] A. Vandin, D. Giachini, F. Lamperti, F. Chiaromonte, Automated and distributed statistical
     analysis of economic agent-based models, Journal of Economic Dynamics and Control
     (2022) 104458.
 [9] P. Bellan, M. Dragoni, C. Ghidini, Leveraging pre-trained language models for conversa-
     tional information seeking from text, arXiv (2022).
[10] S. Arora, A. Narayan, M. F. Chen, L. J. Orr, N. Guha, K. Bhatia, I. Chami, F. Sala, C. Ré,
     Ask me anything: A simple strategy for prompting language models, arXiv preprint
     arXiv:2210.02441 (2022).
[11] A. Saparov, H. He, Language models are greedy reasoners: A systematic formal analysis
     of chain-of-thought, arXiv preprint arXiv:2210.01240 (2022).
[12] Y. Fontenla-Seco, M. Lama, V. González-Salvado, C. Peña-Gil, A. Bugarín-Diz, A framework
     for the automatic description of healthcare processes in natural language: Application in
     an aortic stenosis integrated care process, Journal of Biomedical Informatics 128 (2022).




                                               17