<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop on Artificial Intelligence and Formal Verification, Logic, Automata, and Synthesis,
November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Multi-Models and Multi-Formulas Finite Model Checking for Modal Logic Formulas Induction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mauro Milella</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Pagliarini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Paradiso</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ionel Eduard Stan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ACLAI Lab., Dept. of Math. and Comp. Sci., University of Ferrara</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Math.</institution>
          ,
          <addr-line>Phy., and Comp. Sci.</addr-line>
          ,
          <institution>University of Parma</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>28</volume>
      <issue>2022</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Modal symbolic learning is the subfield of artificial intelligence that brings together machine learning and modal logic to design algorithms that extract modal logic theories from data. The generalization of model checking to multi-models and multi-formulas is key for the entire inductive process (with modal logics). We investigate such generalization by, first, pointing out the need for finite model checking in automatic inductive reasoning, and, then, showing how to eficiently solve it. We release an open-source implementation of our simulations.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Modal Logic</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Modal Symbolic Learning</kwd>
        <kwd>Model Checking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>the finite state space of the system to assess whether some specification (i.e., property of the
system) is true or not.</p>
      <p>In this work, we argue that model checking plays a crucial role in the induction of modal
formulas, and that an eficient model checking algorithm is essential for modal symbolic
learning methods. We show how to extend the typical model checking machinery to check
multiple formulas on multiple models, and experimentally show how memoization (i.e. storing
model checking results for later reuse) can be leveraged to drastically reduce model checking
computational time. Finally, we release an open-source implementation to reproduce the
experiments of this work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Model Checking with Memoization for Modal Logic Formulas</title>
    </sec>
    <sec id="sec-3">
      <title>Induction</title>
      <p>modal logic (ML) are obtained by the following grammar:
Given a set of propositional letters  as the alphabet, the well-formed formulas of the propositional</p>
      <p>::=  | ¬ |  ∨  | ♢ ,
and call the height of a formula the height of its syntax tree.
where the remaining classic Boolean operators can be obtained as shortcuts. In what follows,
we use □  to denote ¬♢ ¬ . The modality ♢ (resp., □ ) is usually referred to as it is possible that
(resp., it is necessary that). We refer to ℱ as the set of formulas produced by the above grammar,</p>
      <p>ML is paradigmatic of temporal, spatial, and spatio-temporal logics, and it is an extension
of propositional logic (PL). Its semantics is given in terms of Kripke models. A Kripke model
 = (, ,  ) is a Kripke frame (, ) composed by a non-empty (possible infinite, but
countable) set of worlds  (which contains a distinguished world 0, called initial world),
a binary accessibility relation  ⊆
 ×</p>
      <p>, and a valuation function  :  → 2 , which
associates each world with the set of propositional letters that are true on it. The truth relation
,  ⊩  , for a generic (Kripke) model , a world  (in that model), and a formula  (to be
interpreted on that model), is given by the following clauses:
,  ⊩ 
,  ⊩
¬</p>
      <p>
        ,  ⊩ ♢ 
,  ⊩  1 ∨  2 if
if
if
if
,  ⊩  1 or ,  ⊩  2;
∃′ s.t. ′ and , ′ ⊩ .
 ∈  ();
,  ⊮  (i.e., it is not the case that ,  ⊩  );
In the following, we use  ⊩  to denote , 0 ⊩  . In modal symbolic learning, Kripke
models can be used to represent data such as time series, images, audio, videos, and graphs,
which, in the era of big data, accounts for 80% − 95% of the existing data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This data is
commonly referred to as unstructured, as it does not have a well-studied data model, nor a
predefined structure, and it is typically counterposed to structured, tabular data, organized in
rows and columns, which is found in spreadsheets and relational databases.
      </p>
      <p>
        Among the many interesting mathematical problems studied over the years in the field of ML
there is model checking [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Formally, the model checking problem is the problem of establishing
if  ⊩  , where  is a Kripke model and  is a formula of ML. Canonically, model checking is
the problem of verifying properties of modal temporal logics on infinite state , finitely represented ,
abstract models (i.e., Kripke models) of concrete ones (e.g., reactive systems). Depending on
the logical formalism, model checking may not be a trivial task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The common denominator
of the ML logical approaches (see, e.g., [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref4 ref5 ref6 ref7 ref8 ref9">4, 5, 6, 7, 8, 9, 10, 11, 12</xref>
        ]) is that the kind of model
checking, which is key for the entire learning process, is in fact finite , which, to some extent,
trivializes the problem itself (in general, it becomes PTIME). Nevertheless, the model checking
problem still raises some dificulties, and leaves room for algorithmic optimizations. Finite
model checking a single ML formula on a single model can be achieved by simply adapting the
well-known Emerson-Clarke algorithm for checking CTL* formulas [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This procedure relies
on a memoization schema where a structure ℋ : ℱ → 2 is filled, in a bottom-up fashion,
with the truth values of all subformulas on all worlds.
      </p>
      <p>In real-world contexts, it is common to check many formulas on many Kripke models. In
fact, modal symbolic learning is an inductive statistical process, that learns a general theory,
seen as multiple ML formulas, from datasets of Kripke models. This sets the stage for the more
general problem of finite model checking of multiple models against multiple formulas. Let
 = {1, . . . , } be a collection of  Kripke models and Φ = { 1, . . . ,  } be a collection
of  ML formulas, then the multi-models and multi-formulas finite model checking problem
is the problem of deciding, for all  ∈  and for all  ∈ Φ, if  ⊩  . The straightforward
approach involves calling the single model checking procedure  ·  times; however, the
memoization results generated by a single call can be reused for a later call on the same model.
In fact, the memoization structure for a single model can be shared for checking all formulas;
ideally, this reduces the overall computational load, but it requires more memory accesses
which, in turn, introduce overhead. This tradeof can be mitigated by only sharing memoized
(sub)formulas with height no greater than a fixed parameter ℎℎ, leveraging the fact that
shorter (sub)formulas are more likely to be checked in the future.</p>
      <p>
        We prove and quantify the benefits of a shared memoization structure in an experimental
setting. The  Kripke models are generated by fixing the same Kripke frame with 20 worlds,
randomly generated using the Fan-in/Fan-out method from [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and by assigning to each
world a random subset of true propositional letters. The  formulas are, first, generated as
formulas of a fixed height ℎmax using a random procedure. On top of this, a pruning strategy
is adopted for reducing each formula: in a top-down fashion, each node of the syntax tree
is cut with probability  and substituted with a random propositional letter. As a result, all
formulas have maximum height ℎmax, and are generally smaller in size with greater values of
. We fix an alphabet size of || = 16 and  = 50 models. Diferent parametrizations are
used for ℎmax and . For each parametrization,  = 1000 · ℎmax formulas are checked on all
models using the non-shared memoization approach and the shared memoization approach with
diferent values of ℎℎ, with ℎℎ ≤ ℎmax. We let ℎmax ∈ {1, 2, 4, 8},  ∈ {0.2, 0.5},
and ℎℎ ∈ {0, 1, 2, 4, 8}; note that when ℎℎ = 0, only the results for the propositional
letters are shared, and when ℎℎ = ℎmax, those for all subformulas are shared.
      </p>
      <p>The cumulative times of the diferent approaches are illustrated in Fig. 1. It can be observed
that sharing at least the propositional letters is beneficial in all cases; in other words, the
shared memoization approach improves over the non-shared one. When comparing diferent
0.6
)

(
e
itm0.4
e
v
i
t
la0.2
u
m
u
C0.0
)
(6
e
m
i
te4
v
i
t
a
lu2
m
u
C
0
non-shared
ℎℎ = 0
ℎℎ = 1
400 600
-th formula
(a) ℎ = 1
non-shared
ℎℎ = 0
ℎℎ = 1
ℎℎ = 2
ℎℎ = 4
2000
-th formula
(c) ℎ = 4
parametrizations of the shared approach, it appears that, within the given experimental setting,
ℎℎ = 1 and ℎℎ = 2 tend to improve over the other values. Overall, the speedup
achieved by shared memoization, calculated as the ratio between the total cumulative times
of the non-shared and best shared approaches, ranges from 185% to 471%. The open-source
implementation of our experiments, together with more results revealing similar trends, is
available at https://github.com/aclai-lab/OVERLAY2022.jl.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Conclusions</title>
      <p>In this work, we considered modal logic as paradigmatic of temporal, spatial, and
spatiotemporal logics, and noted how Kripke models can be used for representing data with no
predefined structure (which, nowadays, amounts to the vast majority of data). We pointed
out the importance of finite model checking in automatic inductive reasoning. We generalized
this problem to a multiple models and multiple formulas setting, showing the benefits of a
shared memoization approach. This study is part of a bigger investigation on modal symbolic
learning, which attempts at learning qualitative patterns from unstructured objects, seen as
Kripke models, which is not possible with the limited expressive power of propositional logic.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grumberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kroening</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Peled</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Veith</surname>
          </string-name>
          , Model Checking, 2nd ed., MIT Press,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gandomi</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Haider, Beyond the hype: Big data concepts, methods, and analytics</article-title>
          ,
          <source>International Journal of Information Management</source>
          <volume>35</volume>
          (
          <year>2015</year>
          )
          <fpage>137</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sistla</surname>
          </string-name>
          ,
          <string-name>
            <surname>E. Clarke,</surname>
          </string-name>
          <article-title>The Complexity of Propositional Linear Temporal Logics</article-title>
          ,
          <source>Journal of the ACM</source>
          <volume>32</volume>
          (
          <year>1985</year>
          )
          <fpage>733</fpage>
          -
          <lpage>749</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bartocci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bortolussi</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Sanguinetti, Data-driven statistical learning of temporal logic properties</article-title>
          ,
          <source>in: Proceedings of the 12th International Conference on Formal Modeling and Analysis of Timed Systems (FORMATS)</source>
          , volume
          <volume>8711</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2014</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bombara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. I.</given-names>
            <surname>Vasile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Penedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yasuoka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Belta</surname>
          </string-name>
          ,
          <article-title>A Decision Tree Approach to Data Classification using Signal Temporal Logic</article-title>
          ,
          <source>in: Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control (HSCC)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Belta</surname>
          </string-name>
          ,
          <article-title>Anomaly detection in cyber-physical systems: A formal methods approach</article-title>
          ,
          <source>in: Proceedings of the 53rd IEEE Conference on Decision and Control (CDC)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>848</fpage>
          -
          <lpage>853</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Medina</given-names>
            <surname>Ayala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. Aydin</given-names>
            <surname>Gol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Belta</surname>
          </string-name>
          ,
          <article-title>Temporal logic inference for classification and prediction from data</article-title>
          ,
          <source>in: Proceedings of the 17th International Conference on Hybrid Systems: Computation and Control (HSCC)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>273</fpage>
          -
          <lpage>282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Grosu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Smolka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Corradini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wasilewska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Entcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bartocci</surname>
          </string-name>
          ,
          <article-title>Learning and detecting emergent behavior in networks of cardiac myocytes</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>52</volume>
          (
          <year>2009</year>
          )
          <fpage>97</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Brunello</surname>
          </string-name>
          , G. Sciavicco,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Stan</surname>
          </string-name>
          ,
          <article-title>Interval Temporal Logic Decision Tree Learning</article-title>
          ,
          <source>in: Proceedings of the 16th European Conference on Logics in Artificial Intelligence (JELIA)</source>
          , volume
          <volume>11468</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2019</year>
          , pp.
          <fpage>778</fpage>
          -
          <lpage>793</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sciavicco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Stan</surname>
          </string-name>
          ,
          <article-title>Knowledge Extraction with Interval Temporal Logic Decision Trees</article-title>
          ,
          <source>in: Proceedings of the 27th International Symposium on Temporal Representation and Reasoning (TIME)</source>
          , volume
          <volume>178</volume>
          of LIPIcs,
          <source>Schloss Dagstuhl - Leibniz-Zentrum für Informatik</source>
          ,
          <year>2020</year>
          , pp.
          <volume>9</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          :
          <fpage>16</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Lucena-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sciavicco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Stan</surname>
          </string-name>
          ,
          <article-title>Feature and Language Selection in Temporal Symbolic Regression for Interpretable Air Quality Modelling</article-title>
          ,
          <source>Algorithms</source>
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>76</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pagliarini</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Sciavicco, Decision Tree Learning with Spatial Modal Logics</article-title>
          ,
          <source>in: Proceedings of the 12th International Symposium on Games, Automata, Logics, and Formal Verification (GandALF)</source>
          , volume
          <volume>346</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>273</fpage>
          -
          <lpage>290</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Emerson</surname>
          </string-name>
          ,
          <article-title>Design and synthesis of synchronization skeletons using branching-time temporal logic</article-title>
          , in: D.
          <string-name>
            <surname>Kozen</surname>
          </string-name>
          (Ed.), Logics of Programs, Workshop, Yorktown Heights, New York, USA, May
          <year>1981</year>
          ,
          <year>1981</year>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>71</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cordeiro</surname>
          </string-name>
          , G. Mounié,
          <string-name>
            <given-names>S.</given-names>
            <surname>Perarnau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Trystram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vincent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <article-title>Random graph generation for scheduling simulations</article-title>
          ,
          <source>in: Proceedings of the 3rd International Conference on Simulation Tools and Techniques (SIMUTools)</source>
          ,
          <year>2010</year>
          , p.
          <fpage>60</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>