
Workshop on Artificial Intelligence and Formal Verification, Logic, Automata, and Synthesis, November 28, 2022

Multi-Models and Multi-Formulas Finite Model Checking for Modal Logic Formulas Induction

Mauro Milella, Giovanni Pagliarini (1, 2), Andrea Paradiso, Ionel Eduard Stan (1, 2)

(1) ACLAI Lab., Dept. of Math. and Comp. Sci., University of Ferrara, Italy
(2) Dept. of Math., Phy., and Comp. Sci., University of Parma, Italy

Modal symbolic learning is the subfield of artificial intelligence that brings together machine learning and modal logic to design algorithms that extract modal logic theories from data. The generalization of model checking to multiple models and multiple formulas is key for the entire inductive process (with modal logics). We investigate this generalization by first pointing out the need for finite model checking in automatic inductive reasoning, and then showing how to solve it efficiently. We release an open-source implementation of our simulations.

Keywords: Modal Logic, Machine Learning, Modal Symbolic Learning, Model Checking

1. Introduction

[...] the finite state space of the system to assess whether some specification (i.e., property of the system) is true or not.

In this work, we argue that model checking plays a crucial role in the induction of modal formulas, and that an efficient model checking algorithm is essential for modal symbolic learning methods. We show how to extend the typical model checking machinery to check multiple formulas on multiple models, and we show experimentally how memoization (i.e., storing model checking results for later reuse) can be leveraged to drastically reduce model checking computational time. Finally, we release an open-source implementation to reproduce the experiments of this work.

2. Model Checking with Memoization for Modal Logic Formulas Induction

Given a set of propositional letters $\mathcal{P}$ as the alphabet, the well-formed formulas of the propositional modal logic (ML) are obtained by the following grammar:

$$\varphi ::= p \mid \neg\varphi \mid \varphi \vee \varphi \mid \Diamond\varphi,$$

where $p \in \mathcal{P}$, and the remaining classic Boolean operators can be obtained as shortcuts. In what follows, we use $\Box\varphi$ to denote $\neg\Diamond\neg\varphi$. The modality $\Diamond$ (resp., $\Box$) is usually referred to as "it is possible that" (resp., "it is necessary that"). We refer to $\mathcal{F}$ as the set of formulas produced by the above grammar, and call the height of a formula the height of its syntax tree.
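For illustration only, the grammar above can be encoded as an algebraic data type. The following Python sketch is not taken from the authors' released implementation. The class names `Atom`, `Not`, `Or`, `Diamond` and the helpers `box` and `height` are hypothetical. The sketch assumes that a propositional letter has height 0.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical AST for ML formulas: phi ::= p | ¬phi | phi ∨ phi | ♢phi.
# frozen=True makes nodes immutable and hashable (usable as dictionary keys).
@dataclass(frozen=True)
class Atom:
    name: str  # a propositional letter from the alphabet P


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Diamond:
    sub: "Formula"


Formula = Union[Atom, Not, Or, Diamond]


def box(phi: Formula) -> Formula:
    """□phi is only a shortcut for ¬♢¬phi."""
    return Not(Diamond(Not(phi)))


def height(phi: Formula) -> int:
    """Height of the syntax tree; a propositional letter has height 0 (assumption)."""
    if isinstance(phi, Atom):
        return 0
    if isinstance(phi, (Not, Diamond)):
        return 1 + height(phi.sub)
    if isinstance(phi, Or):
        return 1 + max(height(phi.left), height(phi.right))
    raise TypeError(f"not an ML formula: {phi!r}")


print(height(Or(Atom("p"), Diamond(Atom("q")))))  # 2
print(height(box(Atom("p"))))                     # 3
```

Under this encoding, $\Box p$ is unfolded as $\neg\Diamond\neg p$, so its height is 3. $\Box$ is a shortcut rather than a primitive.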

ML is paradigmatic of temporal, spatial, and spatio-temporal logics, and it is an extension of propositional logic (PL). Its semantics is given in terms of Kripke models. A Kripke model $K = (W, R, V)$ consists of two parts. The first is a Kripke frame $(W, R)$, composed of a non-empty (possibly infinite, but countable) set of worlds $W$ and a binary accessibility relation $R \subseteq W \times W$; $W$ contains a distinguished world $w_0$, called the initial world. The second is a valuation function $V : W \to 2^{\mathcal{P}}$, which associates each world with the set of propositional letters that are true on it. The truth relation $K, w \Vdash \varphi$, for a generic (Kripke) model $K$, a world $w$ (in that model), and a formula $\varphi$ (to be interpreted on that model), is given by the following clauses:

$$
\begin{array}{lll}
K, w \Vdash p & \text{if} & p \in V(w);\\
K, w \Vdash \neg\varphi & \text{if} & K, w \nVdash \varphi \text{ (i.e., it is not the case that } K, w \Vdash \varphi\text{)};\\
K, w \Vdash \varphi_1 \vee \varphi_2 & \text{if} & K, w \Vdash \varphi_1 \text{ or } K, w \Vdash \varphi_2;\\
K, w \Vdash \Diamond\varphi & \text{if} & \exists w' \text{ s.t. } w R w' \text{ and } K, w' \Vdash \varphi.
\end{array}
$$

In the following, we use $K \Vdash \varphi$ to denote $K, w_0 \Vdash \varphi$.

In modal symbolic learning, Kripke models can be used to represent data such as time series, images, audio, videos, and graphs. In the era of big data, such data account for 80–95% of the existing data [2]. This data is commonly referred to as unstructured, as it has neither a well-studied data model nor a predefined structure. It is typically contrasted with structured, tabular data, organized in rows and columns, as found in spreadsheets and relational databases.
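The four truth clauses of ML can be made concrete with a minimal Python sketch. It assumes a finite Kripke model whose worlds are integers, whose relation is a set of pairs, and whose valuation is a dictionary. The names `KripkeModel`, `holds`, and `satisfies` are hypothetical. Each clause is evaluated recursively, with no memoization.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Diamond:
    sub: "Formula"


Formula = Union[Atom, Not, Or, Diamond]


@dataclass
class KripkeModel:
    """A finite Kripke model K = (W, R, V) with a distinguished initial world."""
    worlds: FrozenSet[int]                # W
    relation: FrozenSet[Tuple[int, int]]  # R ⊆ W × W
    valuation: Dict[int, FrozenSet[str]]  # V : W → 2^P
    initial: int = 0                      # w_0


def holds(K: KripkeModel, w: int, phi: Formula) -> bool:
    """K, w ⊩ phi, following the four truth clauses literally (no memoization)."""
    if isinstance(phi, Atom):
        return phi.name in K.valuation[w]
    if isinstance(phi, Not):
        return not holds(K, w, phi.sub)
    if isinstance(phi, Or):
        return holds(K, w, phi.left) or holds(K, w, phi.right)
    if isinstance(phi, Diamond):
        # ∃w' such that w R w' and K, w' ⊩ phi
        return any(holds(K, v, phi.sub) for (u, v) in K.relation if u == w)
    raise TypeError(f"not an ML formula: {phi!r}")


def satisfies(K: KripkeModel, phi: Formula) -> bool:
    """K ⊩ phi, i.e., truth at the initial world w_0."""
    return holds(K, K.initial, phi)


# Three worlds in a chain 0 → 1 → 2; p holds in world 1, q in world 2.
K = KripkeModel(
    worlds=frozenset({0, 1, 2}),
    relation=frozenset({(0, 1), (1, 2)}),
    valuation={0: frozenset(), 1: frozenset({"p"}), 2: frozenset({"q"})},
)
print(satisfies(K, Diamond(Atom("p"))))           # True
print(satisfies(K, Diamond(Atom("q"))))           # False
print(satisfies(K, Diamond(Diamond(Atom("q")))))  # True
```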

Among the many interesting mathematical problems studied over the years in the field of ML, there is model checking [1]. Formally, the model checking problem is the problem of establishing whether $K \Vdash \varphi$, where $K$ is a Kripke model and $\varphi$ is a formula of ML. Canonically, model checking is the problem of verifying properties expressed in modal temporal logics on infinite-state, finitely represented, abstract models (i.e., Kripke models) of concrete systems (e.g., reactive systems). Depending on the logical formalism, model checking may not be a trivial task [3].

The learning approaches based on ML and its variants (see, e.g., [4–12]) share a common denominator: the model checking they require, which is key for the entire learning process, is in fact finite. To some extent, this trivializes the problem itself (in general, it becomes PTIME). Nevertheless, the model checking problem still raises some difficulties and leaves room for algorithmic optimizations. Finite model checking of a single ML formula on a single model can be achieved by simply adapting the well-known Emerson-Clarke algorithm for checking CTL* formulas [13]. This procedure relies on a memoization schema based on a structure $\mathcal{H} : \mathcal{F} \to 2^{W}$, which maps each subformula to the set of worlds where it is true. $\mathcal{H}$ is filled in a bottom-up fashion, so that the truth values of all subformulas on all worlds are computed.
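The following is a minimal sketch of memoized finite model checking on a single model, in the spirit of the bottom-up labeling described above. The dictionary `memo` plays the role of $\mathcal{H}$: it maps each subformula to the frozenset of worlds where it holds. The encoding and the names `extension` and `check` are assumptions for illustration, not the authors' code. Subformulas are labeled in post-order by recursion rather than by an explicit bottom-up loop.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Diamond:
    sub: "Formula"


Formula = Union[Atom, Not, Or, Diamond]


@dataclass
class KripkeModel:
    worlds: FrozenSet[int]
    relation: FrozenSet[Tuple[int, int]]
    valuation: Dict[int, FrozenSet[str]]
    initial: int = 0


def extension(K: KripkeModel, phi: Formula, memo: Dict[Formula, FrozenSet[int]]) -> FrozenSet[int]:
    """Set of worlds of K where phi holds; every subformula is labeled once in memo."""
    if phi in memo:  # subformula already labeled: reuse it
        return memo[phi]
    if isinstance(phi, Atom):
        result = frozenset(w for w in K.worlds if phi.name in K.valuation[w])
    elif isinstance(phi, Not):
        result = K.worlds - extension(K, phi.sub, memo)
    elif isinstance(phi, Or):
        result = extension(K, phi.left, memo) | extension(K, phi.right, memo)
    elif isinstance(phi, Diamond):
        targets = extension(K, phi.sub, memo)
        # u satisfies ♢phi iff some R-successor v of u satisfies phi
        result = frozenset(u for (u, v) in K.relation if v in targets)
    else:
        raise TypeError(f"not an ML formula: {phi!r}")
    memo[phi] = result
    return result


def check(K: KripkeModel, phi: Formula) -> bool:
    """K ⊩ phi with a fresh memoization structure for this single call."""
    memo: Dict[Formula, FrozenSet[int]] = {}
    return K.initial in extension(K, phi, memo)


K = KripkeModel(frozenset({0, 1, 2}), frozenset({(0, 1), (1, 2)}),
                {0: frozenset(), 1: frozenset({"p"}), 2: frozenset({"q"})})
memo: Dict[Formula, FrozenSet[int]] = {}
phi = Or(Diamond(Atom("p")), Not(Diamond(Atom("p"))))
print(sorted(extension(K, phi, memo)))  # [0, 1, 2]
print(len(memo))                        # 4: ♢p is labeled only once
```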

In real-world contexts, it is common to check many formulas on many Kripke models. In fact, modal symbolic learning is an inductive statistical process that learns a general theory, seen as a set of ML formulas, from datasets of Kripke models. This sets the stage for the more general problem of finite model checking of multiple models against multiple formulas. Let $\mathcal{K} = \{K_1, \ldots, K_m\}$ be a collection of Kripke models and $\Phi = \{\varphi_1, \ldots, \varphi_n\}$ be a collection of ML formulas. The multi-models and multi-formulas finite model checking problem is then the problem of deciding, for all $K \in \mathcal{K}$ and for all $\varphi \in \Phi$, whether $K \Vdash \varphi$.

The straightforward approach calls the single-model checking procedure $m \cdot n$ times. However, the memoization results generated by one call can be reused by a later call on the same model. In fact, a single memoization structure per model can be shared for checking all formulas. Ideally, this reduces the overall computational load, but the larger shared structure requires more memory accesses, which in turn introduce overhead. This tradeoff can be mitigated by sharing only the memoized (sub)formulas with height no greater than a fixed parameter $h_{th}$. The underlying intuition is that shorter (sub)formulas are more likely to be checked again in the future.
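Shared memoization with a height threshold can be sketched in Python using a small hypothetical encoding of formulas and finite models. The threshold is called `h_th` in this sketch. For each model, a `shared` memo survives across formulas and stores only subformulas of height at most `h_th`, while a `local` memo is discarded after each formula. Setting `h_th = -1` is a convention of this sketch: it shares nothing and so reproduces the non-shared baseline. The function names `extension` and `check_all` are illustrative, and the memory-access overhead discussed above is not modeled.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Dict, FrozenSet, List, Tuple, Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Diamond:
    sub: "Formula"


Formula = Union[Atom, Not, Or, Diamond]
Memo = Dict[Formula, FrozenSet[int]]


@dataclass
class KripkeModel:
    worlds: FrozenSet[int]
    relation: FrozenSet[Tuple[int, int]]
    valuation: Dict[int, FrozenSet[str]]
    initial: int = 0


@lru_cache(maxsize=None)
def height(phi: Formula) -> int:
    """Syntax-tree height (letters have height 0), cached per formula."""
    if isinstance(phi, Atom):
        return 0
    if isinstance(phi, (Not, Diamond)):
        return 1 + height(phi.sub)
    return 1 + max(height(phi.left), height(phi.right))


def extension(K: KripkeModel, phi: Formula, shared: Memo, local: Memo, h_th: int) -> FrozenSet[int]:
    """Worlds of K where phi holds. Results for subformulas of height <= h_th are
    stored in `shared` (kept across formulas); all others go to `local`."""
    for memo in (shared, local):
        if phi in memo:
            return memo[phi]
    if isinstance(phi, Atom):
        result = frozenset(w for w in K.worlds if phi.name in K.valuation[w])
    elif isinstance(phi, Not):
        result = K.worlds - extension(K, phi.sub, shared, local, h_th)
    elif isinstance(phi, Or):
        result = (extension(K, phi.left, shared, local, h_th)
                  | extension(K, phi.right, shared, local, h_th))
    elif isinstance(phi, Diamond):
        targets = extension(K, phi.sub, shared, local, h_th)
        result = frozenset(u for (u, v) in K.relation if v in targets)
    else:
        raise TypeError(f"not an ML formula: {phi!r}")
    (shared if height(phi) <= h_th else local)[phi] = result
    return result


def check_all(models: List[KripkeModel], formulas: List[Formula], h_th: int) -> Dict[Tuple[int, int], bool]:
    """Decide K_i ⊩ phi_j for every pair (i, j); h_th = -1 shares nothing."""
    results: Dict[Tuple[int, int], bool] = {}
    for i, K in enumerate(models):
        shared: Memo = {}  # one shared structure per model, reused by all formulas
        for j, phi in enumerate(formulas):
            local: Memo = {}  # per-formula memo, discarded afterwards
            results[(i, j)] = K.initial in extension(K, phi, shared, local, h_th)
    return results


K1 = KripkeModel(frozenset({0, 1, 2}), frozenset({(0, 1), (1, 2)}),
                 {0: frozenset(), 1: frozenset({"p"}), 2: frozenset({"q"})})
K2 = KripkeModel(frozenset({0, 1}), frozenset({(0, 1), (1, 1)}),
                 {0: frozenset({"q"}), 1: frozenset({"p"})})
formulas = [Diamond(Atom("p")), Not(Diamond(Atom("p"))), Diamond(Diamond(Atom("q")))]
baseline = check_all([K1, K2], formulas, h_th=-1)  # non-shared approach
print(baseline)
# {(0, 0): True, (0, 1): False, (0, 2): True, (1, 0): True, (1, 1): False, (1, 2): False}
```

Every choice of `h_th` only changes where results are cached. All settings should therefore return the same verdicts; only the amount of reused work differs.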

We assess and quantify the benefits of a shared memoization structure in an experimental setting. The Kripke models are generated in two steps. First, a single Kripke frame with 20 worlds is fixed, randomly generated using the Fan-in/Fan-out method from [14]. Then, each world is assigned a random subset of true propositional letters. The formulas are first generated, using a random procedure, as formulas of a fixed height $h_{max}$. On top of this, a pruning strategy is adopted for reducing each formula: in a top-down fashion, each node of the syntax tree is cut with probability $\rho$ and substituted with a random propositional letter. As a result, all formulas have height at most $h_{max}$, and they are generally smaller in size for greater values of $\rho$. We fix an alphabet size of $|\mathcal{P}| = 16$ and $m = 50$ models, and use different parametrizations for $h_{max}$ and $\rho$. For each parametrization, $n = 1000 \cdot h_{max}$ formulas are checked on all models using two approaches: the non-shared memoization approach, and the shared memoization approach with different values of $h_{th} \leq h_{max}$. We let $h_{max} \in \{1, 2, 4, 8\}$, $\rho \in \{0.2, 0.5\}$, and $h_{th} \in \{0, 1, 2, 4, 8\}$. Note that when $h_{th} = 0$, only the results for the propositional letters are shared, and when $h_{th} = h_{max}$, the results for all subformulas are shared.
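The formula generator described above can be sketched in Python. The text does not specify the random procedure for full formulas, so this sketch assumes one. Every internal node picks ¬, ∨, or ♢ uniformly at random until height `h_max` is reached, with both children of ∨ getting height `h_max - 1`, and leaves are uniform random letters. Pruning then follows the description above, with the cut probability called `rho` here. The function names `random_full_formula`, `prune`, and `random_formula` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Diamond:
    sub: "Formula"


Formula = Union[Atom, Not, Or, Diamond]


def height(phi: Formula) -> int:
    if isinstance(phi, Atom):
        return 0
    if isinstance(phi, (Not, Diamond)):
        return 1 + height(phi.sub)
    return 1 + max(height(phi.left), height(phi.right))


def random_full_formula(rng: random.Random, letters: List[str], h: int) -> Formula:
    """Random formula of height exactly h (assumed scheme: uniform operator at
    each internal node, uniform letter at each leaf)."""
    if h == 0:
        return Atom(rng.choice(letters))
    op = rng.choice(["not", "or", "diamond"])
    if op == "not":
        return Not(random_full_formula(rng, letters, h - 1))
    if op == "diamond":
        return Diamond(random_full_formula(rng, letters, h - 1))
    return Or(random_full_formula(rng, letters, h - 1),
              random_full_formula(rng, letters, h - 1))


def prune(rng: random.Random, letters: List[str], phi: Formula, rho: float) -> Formula:
    """Top-down pruning: each visited node is cut with probability rho and
    replaced by a random propositional letter (its subtree is discarded)."""
    if rng.random() < rho:
        return Atom(rng.choice(letters))
    if isinstance(phi, Atom):
        return phi
    if isinstance(phi, Not):
        return Not(prune(rng, letters, phi.sub, rho))
    if isinstance(phi, Diamond):
        return Diamond(prune(rng, letters, phi.sub, rho))
    return Or(prune(rng, letters, phi.left, rho), prune(rng, letters, phi.right, rho))


def random_formula(rng: random.Random, letters: List[str], h_max: int, rho: float) -> Formula:
    return prune(rng, letters, random_full_formula(rng, letters, h_max), rho)


rng = random.Random(0)
letters = [f"p{k}" for k in range(16)]  # |P| = 16
h_max, rho = 4, 0.5
formulas = [random_formula(rng, letters, h_max, rho) for _ in range(1000 * h_max)]
print(all(height(f) <= h_max for f in formulas))  # True
```

Because pruning only replaces subtrees with letters, heights can only decrease. With `rho = 0.0`, every formula keeps height exactly `h_max`.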

The cumulative times of the different approaches are illustrated in Fig. 1. Sharing at least the propositional letters is beneficial in all cases; in other words, the shared memoization approach improves over the non-shared one. Among the parametrizations of the shared approach, within the given experimental setting, $h_{th} = 1$ and $h_{th} = 2$ tend to improve over the other values. Overall, the speedup achieved by shared memoization ranges from 185% to 471%. The speedup is calculated as the ratio between the total cumulative times of the non-shared approach and of the best shared approach. The open-source implementation of our experiments, together with more results revealing similar trends, is available at https://github.com/aclai-lab/OVERLAY2022.jl.

[Figure 1: cumulative model checking time versus the i-th checked formula, for the non-shared approach and for the shared approach with different values of $h_{th}$. Recoverable panels: (a) $h_{max} = 1$, (c) $h_{max} = 4$.]

3. Conclusions

In this work, we considered modal logic as paradigmatic of temporal, spatial, and spatio-temporal logics, and noted how Kripke models can be used for representing data with no predefined structure (which, nowadays, amounts to the vast majority of data). We pointed out the importance of finite model checking in automatic inductive reasoning. We generalized this problem to a multi-model, multi-formula setting, showing the benefits of a shared memoization approach. This study is part of a broader investigation on modal symbolic learning, which aims at learning qualitative patterns from unstructured objects, seen as Kripke models. Such patterns cannot be captured with the limited expressive power of propositional logic.

[1] E. M. Clarke, O. Grumberg, D. Kroening, D. Peled, H. Veith, Model Checking, 2nd ed., MIT Press, 2018.

[2] A. Gandomi, M. Haider, Beyond the hype: Big data concepts, methods, and analytics, International Journal of Information Management 35 (2015) 137–144.

[3] A. P. Sistla, E. M. Clarke, The Complexity of Propositional Linear Temporal Logics, Journal of the ACM 32 (1985) 733–749.

[4] E. Bartocci, L. Bortolussi, G. Sanguinetti, Data-driven statistical learning of temporal logic properties, in: Proceedings of the 12th International Conference on Formal Modeling and Analysis of Timed Systems (FORMATS), volume 8711 of Lecture Notes in Computer Science, Springer, 2014, pp. 23–37.

[5] G. Bombara, C. I. Vasile, F. Penedo, H. Yasuoka, C. Belta, A Decision Tree Approach to Data Classification using Signal Temporal Logic, in: Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control (HSCC), 2016, pp. 1–10.

[6] A. Jones, Z. Kong, C. Belta, Anomaly detection in cyber-physical systems: A formal methods approach, in: Proceedings of the 53rd IEEE Conference on Decision and Control (CDC), 2014, pp. 848–853.

[7] Z. Kong, A. Jones, A. Medina Ayala, E. Aydin Gol, C. Belta, Temporal logic inference for classification and prediction from data, in: Proceedings of the 17th International Conference on Hybrid Systems: Computation and Control (HSCC), 2014, pp. 273–282.

[8] R. Grosu, S. A. Smolka, F. Corradini, A. Wasilewska, E. Entcheva, E. Bartocci, Learning and detecting emergent behavior in networks of cardiac myocytes, Communications of the ACM 52 (2009) 97–105.

[9] A. Brunello, G. Sciavicco, I. E. Stan, Interval Temporal Logic Decision Tree Learning, in: Proceedings of the 16th European Conference on Logics in Artificial Intelligence (JELIA), volume 11468 of Lecture Notes in Computer Science, Springer, 2019, pp. 778–793.

[10] G. Sciavicco, I. E. Stan, Knowledge Extraction with Interval Temporal Logic Decision Trees, in: Proceedings of the 27th International Symposium on Temporal Representation and Reasoning (TIME), volume 178 of LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020, pp. 9:1–9:16.

[11] E. Lucena-Sánchez, G. Sciavicco, I. E. Stan, Feature and Language Selection in Temporal Symbolic Regression for Interpretable Air Quality Modelling, Algorithms 14 (2021) 76.

[12] G. Pagliarini, G. Sciavicco, Decision Tree Learning with Spatial Modal Logics, in: Proceedings of the 12th International Symposium on Games, Automata, Logics, and Formal Verification (GandALF), volume 346 of EPTCS, 2021, pp. 273–290.

[13] E. M. Clarke, E. A. Emerson, Design and synthesis of synchronization skeletons using branching-time temporal logic, in: D. Kozen (Ed.), Logics of Programs, Workshop, Yorktown Heights, New York, USA, May 1981, 1981, pp. 52–71.

[14] D. Cordeiro, G. Mounié, S. Perarnau, D. Trystram, J.-M. Vincent, F. Wagner, Random graph generation for scheduling simulations, in: Proceedings of the 3rd International Conference on Simulation Tools and Techniques (SIMUTools), 2010, p. 60.