On Modal Logic Association Rule Mining
Mauro Milella1 , Emilio Muñoz-Velasco2 , Giovanni Pagliarini1,3 , Andrea Paradiso1 ,
Guido Sciavicco1,† and Ionel Eduard Stan1,3,*,†
1
  University of Ferrara, Italy
2
  University of Màlaga, Spain
3
  University of Parma, Italy


                                         Abstract
                                         The most iconic duality of machine learning models is symbolic learning versus functional learning.
                                         While functional learning is based on a numerical approach to knowledge extraction and modelling,
                                         the purpose of symbolic machine learning is to extract knowledge from data in such a form that it can
                                         be understood, discussed, modified, and applied by humans, as well as serve as the basis of artificial
                                         intelligence applications. The typical problems associated with machine learning are classification and
                                         rule extraction; while classification can be dealt with using both functional and symbolic learning, rule
                                         extraction is essentially symbolic. One element common to nearly all definitions and tools for rule
                                         extraction is that they are applied to static datasets and based on propositional logic; unfortunately,
                                         very often real-world applications give rise to non-static sets of data (e.g., temporal, spatial, graph-based
                                         data) and may require more-than-propositional expressive power. In order to extract association rules
                                         from non-static data, in this paper we propose a definition of modal association rules based on modal
                                         logic, and we study how a standard rule extraction algorithm such as APRIORI can be generalized to
                                         the modal case while keeping the properties of the canonical, non-modal case, namely, correctness and
                                         completeness.

                                         Keywords
                                         Modal logic, Association rule mining, Modal symbolic learning


1. Introduction
The most iconic and fundamental separation between the sub-fields of machine learning is the
one between functional and symbolic learning. Functional learning is the process of learning a
function that represents the theory underlying a certain phenomenon. Symbolic learning, on
the other hand, is the process of learning a logical description that represents that phenomenon.
Whether one or the other approach should be preferred raised a long-standing debate among
experts, which roots in the fact that functional methods tend to be more versatile and statistically
accurate than symbolic ones, while symbolic methods are able to extract models that can be

Proceedings of the 23rd Italian Conference on Theoretical Computer Science, Rome, Italy, September 7-9, 2022
*
  Corresponding author.
†
  These authors contributed equally.
" mauro.milella@edu.unife.it (M. Milella); ejmunoz@uma.es (E. Muñoz-Velasco); giovanni.pagliarini@unife.it
(G. Pagliarini); andrea.paradiso@edu.unife.it (A. Paradiso); guido.sciavicco@unife.it (G. Sciavicco);
ioneleduard.stan@unife.it (I. E. Stan)
 0000-0001-7128-6745 (M. Milella); 0000-0002-0117-4219 (E. Muñoz-Velasco); 0000-0002-8403-3250 (G. Pagliarini);
0000-0002-3614-2487 (A. Paradiso); 0000-0002-8403-3250 (G. Sciavicco); 0000-0001-9260-102X (I. E. Stan)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
interpreted, explained, and then enhanced using human-expert knowledge. These characteristics
of symbolic methods, both for political reasons (consider, for example, the recent General
Data Protection Regulation (GDPR) [1] of the European Union, that highlights the need for
interpretable/explainable automatic learning-based decision-making processes, including those
involving AI technologies) and technical ones (interpretable models are often easier to train,
explore, integrate, and implement), are sometimes used as arguments for preferring a symbolic
approach over a functional one. Classification and rule extraction are, among others, typical
problems of machine learning; while classification can be dealt with using both functional and
symbolic learning, rule extraction is essentially symbolic.
   From a logical standpoint, canonical symbolic learning methods are mostly characterized by
being based on propositional logic, and by being designed for static data, in which every instance
is described by the value of 𝑛 attributes. Non-static data, such as dimensional (temporal, spatial,
spatio-temporal) data, graph-based data, or textual data, however, cannot be successfully dealt
with within the same schema in a native way. The standard approach is to pre-process such
data, so that their aspect becomes static again (e.g., by averaging the value of attributes along
all dimensions), and that off-the-shelf static symbolic methods can be used. Modal symbolic
learning [2, 3, 4] takes a different point of view: modal symbolic methods are characterized by
being based on modal logic (e.g., temporal, spatial logic) so that non-static data can be dealt
with natively, and that the extracted knowledge takes the form of interpretable modal logic
formulas. Modal symbolic learning has been mainly focused on classification; here, we consider
the case of association rule extraction. At the propositional level, there are two main approaches
to association rule extraction, or, more particularly, to its key step, frequent sets extraction,
namely APRIORI [5] and FP-Growth [6]. In this paper, we consider the most basic one, APRIORI,
and we generalize it to the modal level. A first approach for extracting interval temporal logic
association rules, that can be considered a particular case of this work, has been proposed in [7].
  This paper is organized as follows. First, in Section 2, we lay down some basic preliminaries,
and we define the concept of modal datasets. Then, in Section 3, we define modal association
rules from a logical viewpoint, and in Section 4 we extend APRIORI to extract modal frequent
sets and we study its properties, namely, correctness and completeness. In Section 5 we give
examples of possible applications of modal association rules, before concluding.


2. Preliminaries
Given a set of propositional letters 𝒫, the well-formed formulas of the propositional modal logic
(ML) are obtained by the following grammar:

                                   𝜙 ::= 𝑝 | ¬𝜙 | 𝜙 ∧ 𝜙 | ♢𝜙,

where the remaining classic Boolean operators can be obtained as shortcuts. In what follows,
we use □𝜙 to denote ¬♢¬𝜙, and ≡ to denote the logical equivalence. The modality ♢ (resp., □)
is usually referred to as it is possible that (resp., it is necessary that). Modal logic is considered as
archetypal of temporal, spatial, and spatio-temporal logics, and it is an extension of propositional
logic (PL). Its semantics is given in terms of Kripke models. A (finite) Kripke model 𝐾 =
                                                                                 𝑤2
                                                            𝑤0         𝑤1
                                                                                 𝑤3
                   Instance/Kripke model
                             𝐼1
                             𝐼2                                        𝑤1        𝑤3
                             𝐼3                             𝑤0
                             𝐼4                                        𝑤2


                                                            ...

Figure 1: An example of modal dataset with 4 instances, each described by a Kripke model.


(𝑊, 𝑅, 𝑉 ) is composed by a finite set of worlds 𝑊 (which contains a distinguished world 𝑤0 ,
called initial world), a binary accessibility relation 𝑅 ⊆ 𝑊 × 𝑊 , and a valuation function
𝑉 : 𝑊 → 2𝒫 , which associates each world with the set of propositional letters that are true on
it. The truth relation 𝐾, 𝑤 ⊩ 𝜙 for a generic model 𝐾 and a generic world 𝑤 in that model is
given by the following clauses:

                         𝐾, 𝑤 ⊩ 𝑝          iff   𝑝 ∈ 𝑉 (𝑤);
                         𝐾, 𝑤 ⊩ ¬𝜙         iff   𝐾, 𝑤 ̸⊩ 𝜙;
                         𝐾, 𝑤 ⊩ 𝜙 ∧ 𝜓      iff   𝐾, 𝑤 ⊩ 𝜙 and 𝐾, 𝑤 ⊩ 𝜓;
                         𝐾, 𝑤 ⊩ ♢𝜙         iff   ∃𝑣 s.t. 𝑤𝑅𝑣 and 𝐾, 𝑣 ⊩ 𝜙.

Most classic temporal [8, 9, 10] and spatial logics [11, 12] are specializations of the uni-modal
logic with more than one (possibly with higher arieties) accessibility relations (and associated
modalities), subject to constraints that range from very simple and intuitive ones (e.g., transitivity,
antisymmetry) to very complex ones (e.g., when worlds are assumed to be intervals in time and
modalities are assumed to mimic relations between intervals).

Definition 1 (modal dataset). Given a set of propositional letters 𝒫, a modal dataset ℐ =
{𝐼1 , . . . , 𝐼𝑚 } is a finite collection of 𝑚 instances, each of which is a finite Kripke model 𝐼𝑖 =
(𝑊𝑖 , 𝑅𝑖 , 𝑉𝑖 ), for each 𝑖 = 1, . . . , 𝑚.

The above definition generalizes the conventional notion of propositional, or static, dataset, in
which each of the 𝑚 instances is associated to the value of 𝑛 distinct numerical attributes. An
example of modal dataset is illustrated in Fig. 1. In machine learning, several problems are
associated to a dataset ℐ. The most common one is the classification/regression problem; here,
however, we are interested in the equally common learning problem known as association rule
extraction: in a generic dataset, it is the problem of extracting meaningful associations among
the values of the attributes.
  In the following, we first define the concept of meaningful association rule, and then we
focus our attention to the problem of extracting (all) meaningful rules from a modal dataset.
3. Modal Logic Association Rules
Rule-based methods are ubiquitous in modern machine learning and data mining applica-
tions [13], that aim at learning regularities in data which can be expressed in the form of if-then
rules. In this paper, we are interested in descriptive (or association) rules, that collectively cover
the instances’ space.
  Let 𝒫 be an alphabet on ℐ. In its simplest form, propositional association rules are objects of
the type [14]:
                                  𝜌 : 𝑝1 ∧ . . . ∧ 𝑝𝑘 ⇒ 𝑝𝑘+1 ,                                (1)
where 𝑝1 , . . . , 𝑝𝑘 , 𝑝𝑘+1 ∈ 𝒫 are positive propositional literals (or simply propositional literals);
recalling the standard nomenclature in the association rule mining literature, propositional
literals are also known as items [5]. We use the notation:

                                             𝜌 : 𝑋 ⇒ 𝑌,                                             (2)

where 𝑋 = {𝑝1 , . . . , 𝑝𝑘 } is called antecedent, 𝑌 = {𝑝𝑘+1 } is called consequent, and 𝑋 ∩ 𝑌 = ∅.
Association analysis must be interpreted with caution: causality is not implied in the inference
process by an association rule as it simply represents strong co-occurrence relationship of
the literals in the antecedent with those in the consequent [15]. Therefore, to emphasize the
difference with respect to →, which is a logical implication, in (association) rules we use the
symbol ⇒. In this sense, 𝜌 is not a formula of propositional logic. When treated like one,
however, it becomes a formula that belongs to the known subset of propositional logic, that
is, Horn propositional logic; this usually happens after the extraction phase, when rules are
no longer at risk of not being meaningful. Horn propositional logic is interesting from the
deductive point of view (its satisfiability problem is 𝑃 -complete), which becomes relevant when
data-driven knowledge is mixed, in some sense, with expert, top-down knowledge in intelligent
applications. Non-empty subsets of 𝒫 are also called propositional patterns [5]. Moreover, the
length of a (propositional) pattern 𝑋 is the set-theoretic cardinality of 𝑋. In this paper we focus
only on Horn-like association rules; however, the fact that, in other settings, the consequent 𝑌
of a rule 𝜌 : 𝑋 ⇒ 𝑌 could be a set with more than one literal, that is, it could have 𝑘 ′ literals
𝑌 = {𝑝𝑘+1 , . . . , 𝑝𝑘+𝑘′ } which, if interpreted as a disjunction, would result in being a non-Horn
logic rule.
  In order to formalize the modal association rule mining problem, where instances of a modal
dataset are generic, finite Kripke models, we must first lift the concept of literals to the modal
case.
Definition 2 (modal literals). For a fixed alphabet 𝒫, the set of positive modal literals (or simply
modal literals) Λ𝒫 over 𝒫 is defined by the grammar:

                                         𝜆 ::= 𝑝 | ♢𝜆 | □𝜆,

where 𝑝 ∈ 𝒫.
   Modal literals of the type 𝑝 are called propositional, as in the propositional case, and modal
literals of the type ♢𝜆 (resp., □𝜆) are called existential (resp., universal).
Definition 3 (modal association rules). For a fixed set of modal literals Λ𝒫 , modal association
rules are objects of the type:
                                 𝜌 : 𝜆1 ∧ . . . ∧ 𝜆𝑘 ⇒ 𝜆𝑘+1 ,                                 (3)
where 𝜆1 , . . . , 𝜆𝑘 , 𝜆𝑘+1 ∈ Λ𝒫 .

  Note that (3) has the same structure as (1), except that 𝒫 has been replaced by Λ𝒫 . Non-empty
subsets of Λ𝒫 are called modal patterns, and we sometimes write a rule 𝜌 in its set-theoretical
notation, that is, 𝑋 ⇒ 𝑌 , where both 𝑋 and 𝑌 are modal patterns and 𝑌 is a singleton.
Similarly to the propositional case, modal rules are not modal formulas; when treated like that,
they belong to the Horn fragment of modal logic MLHorn . There are several, although very
similar, proposals for a Horn fragment of modal logic. Bresolin et al. [16] studied MLHorn and
several of its sub-fragments from the deductive point of view; while the satisfiability problem
for MLHorn is still 𝑃 𝑆𝑃 𝐴𝐶𝐸-complete (as it is for full ML), sub-fragments of it such as the
one with only universal and propositional literals becomes 𝑃 -complete.
   Not all rules are interesting or meaningful. As for the propositional case, in order to capture
the notion of meaningfulness in logical terms, in [5] authors implicitly defined a notion of 𝜌
holding on an instance, and then on a dataset. They did so by introducing two parameters,
namely support and confidence, which they used to modify the notion of truth of literal, first,
and of a rule, then. After the first introduction of such concepts, and specifically of the concept
of confidence, alternative measures of meaningfulness have been proposed, among which we
mention lift, and conviction (see, e.g., [15]). In the end, as we shall see, the key observation is
that their definitions are all consequences of the definition of support; our aim is to generalize
the latter to the modal case, so that all other measures can then be re-defined accordingly. If the
notion of propositional support was originally designed to capture the idea of how frequent
was a literal (resp., a set of literals) across the static dataset, we now have to capture the
additional idea of how frequent a modal literal (resp., a set of modal literals) is across a single
instance, as well. Such two notions are only partially orthogonal: locally, we want to capture
the number of different worlds in which a certain phenomenon happens, in order to measure its
interestingness, while, globally, we want to capture the number of different instances in which
such an interesting phenomenon occurs. In the following, if 𝑋 is a set of literals, 𝐼⋀︀= (𝑊, 𝑅, 𝑉 )
an instance, and 𝑤 ∈ 𝑊 a world in 𝐼, we use 𝐼, 𝑤 ⊩ 𝑋 as a shortcut for 𝐼, 𝑤 ⊩ 𝜆∈𝑋 𝜆.

Definition 4 (support). Let ℐ be a modal dataset and 𝑋 ⊆ Λ𝒫 be a set of literals. The local
support of 𝑋 on some instance 𝐼 ∈ ℐ, being 𝐼 = (𝑊, 𝑅, 𝑉 ), is defined as:

                                               |{𝑤 ∈ 𝑊 | 𝐼, 𝑤 ⊩ 𝑋}|
                               𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋) =                        ,
                                                      |𝑊 |

and, given a certain local support 𝑠𝑙 ∈ (0, 1] ⊂ R, the global support of 𝑋 on ℐ relatively to 𝑠𝑙 is
defined as:
                                            |{𝐼 ∈ ℐ | 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋) ≥ 𝑠𝑙 }|
                         𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋) =                                 .
                                                          |ℐ|
Similarly to the propositional case, we can define popular interestingness measures for modal
rules in such a way to be (local and global) support-dependent.
Definition 5 (confidence, lift). Let ℐ be a modal dataset, and 𝜌 : 𝑋 ⇒ 𝑌 a modal rule. The local
confidence of 𝜌 on some instance 𝐼 ∈ ℐ is defined as:
                                                    𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋 ∪ 𝑌 )
                             𝑙𝑐𝑜𝑛𝑓 (𝐼, 𝑋 ⇒ 𝑌 ) =                     ,
                                                      𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋)

and, given a certain local support 𝑠𝑙 ∈ (0, 1] ⊂ R, the global confidence of 𝜌 on ℐ relatively to 𝑠𝑙
is defined as:
                                                    𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 )
                           𝑔𝑐𝑜𝑛𝑓𝑠𝑙 (ℐ, 𝑋 ⇒ 𝑌 ) =                        .
                                                      𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋)
Similarly, the local lift of 𝜌 on some instance 𝐼 ∈ ℐ is defined as:
                                                   𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋 ∪ 𝑌 )
                         𝑙𝑙𝑖𝑓 𝑡(𝐼, 𝑋 ⇒ 𝑌 ) =                              ,
                                               𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋) · 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑌 )

and, given a certain local support 𝑠𝑙 ∈ (0, 1] ⊂ R, the global lift of 𝜌 on ℐ relatively to 𝑠𝑙 is
defined as:
                                                𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 )
                    𝑔𝑙𝑖𝑓 𝑡𝑠𝑙 (ℐ, 𝑋 ⇒ 𝑌 ) =                                  .
                                           𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋) · 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑌 )
Confidence measures how frequently items in 𝑌 appear in instances (in the global setting) or
worlds (in the local setting) that contain 𝑋. Similarly, lift measures the dependence between
𝑋 and 𝑌 . From the above examples, we can derive a general notion of support-dependent
interestingness measure; we say that a measure is local support-dependent (resp., global support-
dependent) if and only if it can be computed by a function of the local support (resp., global
support) of its set of worlds (resp., instances). We denote by 𝑙𝑚𝑒𝑎𝑠 (resp., 𝑔𝑚𝑒𝑎𝑠𝑠𝑙 ) a generic
local (resp., global) support-dependent measure.
Definition 6 (holding). Let ℐ be a modal dataset, 𝜌 : 𝑋 ⇒ 𝑌 a modal rule, 𝜃𝑙 a (𝑧 + 1)-vector
of values in (0, 1] ⊂ R (indexed from 0 to 𝑧), and let 𝑙𝑚𝑒𝑎𝑠1 , . . . , 𝑙𝑚𝑒𝑎𝑠𝑧 be 𝑧 local support-
dependent measures. We say that 𝜌 𝜃𝑙 -holds on 𝐼 ∈ ℐ if and only if: (i) 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋 ∪ 𝑌 ) ≥ 𝜃𝑙 [0],
and (ii) for each 1 ≤ 𝑖 ≤ 𝑧, 𝑙𝑚𝑒𝑎𝑠𝑖 (𝐼, 𝜌) ≥ 𝜃𝑙 [𝑖]. We denote such a situation by:

                                               𝐼 ⊩𝜃𝑙 𝜌.

Similarly, given 𝑠𝑙 ∈ (0, 1] ⊂ R, if 𝜃𝑔𝑠𝑙 is a (𝑧 + 1)-vector of values in (0, 1] ⊂ R (indexed
from 0 to 𝑧), and 𝑔𝑚𝑒𝑎𝑠1 , . . . , 𝑔𝑚𝑒𝑎𝑠𝑧 are 𝑧 global support-dependent measures, we say that
𝜌 𝜃𝑔𝑠𝑙 -holds on ℐ if and only if: (i) 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 ) ≥ 𝜃𝑔𝑠𝑙 [0], and (ii) for each 1 ≤ 𝑖 ≤ 𝑧,
𝑔𝑚𝑒𝑎𝑠𝑖𝑠𝑙 (ℐ, 𝜌) ≥ 𝜃𝑔𝑠𝑙 [𝑖]. We denote such a situation by:

                                             ℐ ⊩𝜃𝑔𝑠𝑙 𝜌.

As defined in the literature, the association rules mining problem has two parts: frequent
patterns mining and extraction of interesting rules. Therefore, in both settings, local and global,
the first conditions, 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋 ∪ 𝑌 ) ≥ 𝜃𝑙 [0] and 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 ) ≥ 𝜃𝑔𝑠𝑙 [0], enforces a rule
to be frequent in terms of their supports, locally and globally, respectively. The remaining
conditions can be arbitrarily combined but they cannot be separated from imposing a minimal
support. Observe that, the notion of a rule holding on a single instance collapses to the standard
notion of truth when instances are propositional (that is, with a single world); similarly, in
the same case, the notion of a rule holding on a dataset collapses to the (implicit) notion of
propositional rule holding on a dataset (driven by the classical support and confidence/lift/. . . ).
The definition of such generalized notions is not unique, as different constraints could be
enforced. However, not every combination makes sense. For example, forcing a rule to be
globally supported if and only if it is locally supported on every instance would imply requiring
perfect global support, which is rather restrictive and possibly non-adherent to the intended
use of rules.
  In the next section we shall focus on solving the following natural problem: given a modal
dataset ℐ and a vector 𝜃𝑔𝑠𝑙 of (𝑧 + 1) values in (0, 1] ⊂ R for some 𝑠𝑙 ∈ (0, 1] ⊂ R, extract (or
mine) all and only modal rules that 𝜃𝑔𝑠𝑙 -hold on ℐ, which we regard to as interesting.


4. Mining Modal Logic Rules
As a consequence of focusing on support-dependent interestingness measures only, it turns
out that extracting interesting rules can be achieved by simply extracting sets of literals with
a relatively high support, which in the classic terminology are called frequent sets. To see
this, consider a rule 𝜌 : 𝑋 ⇒ 𝑌 , and assume that we want to check if it [𝑠𝑔 , 𝑐𝑔 ]𝑠𝑔𝑙 -holds on a
certain dataset ℐ, where 𝑐𝑔 is the minimum requested global confidence. To this end, it must
hold that 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 ) ≥ 𝑠𝑔 and that 𝑔𝑐𝑜𝑛𝑓𝑠𝑙 (ℐ, 𝜌) ≥ 𝑐𝑔 . To compute the latter, one
has to compute, first, 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 ) and 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋), which are both by-products of
computing 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 ). Therefore, being able to extract all frequent sets of literals is
necessary and sufficient to be able to extract all interesting rules as we have defined them. This
phenomenon occurred at the propositional level too, obviously; it is interesting to observe that
it still holds at the modal one by inheriting exactly the same procedure (see again, e.g., [5]).
   The problem of enumerating all sets of frequent literals has been studied in the literature [17];
in general, enumerating frequent sets has several applications, not necessarily related to rule
extraction. The first key observation is that so-called, and well-known APRIORI principle.
Proposition 1 (APRIORI principle [5]). If a set of literals is frequent, then all of its subsets must
be frequent.
Naturally, there are a potentially exponential number of frequent sets of a given dataset ℐ, even
at the propositional level. Focusing on maximal frequent sets of literals does not change this,
and the complexity of the problem of extracting them becomes therefore interesting. As proved
in [17], the counting problem associated to the problem of extracting all maximal frequent
sets is already #𝑃 -hard [18]. In most cases, the interesting problems in machine learning are
computationally hard, thus the go-to strategy is turning to approximation algorithms. This
is not the case with frequent set extraction: in many practical cases, indeed, there are only a
few maximal frequent sets in a given dataset, and enumerating them in a correct and complete
fashion is the typical approach. There are two classical approaches to this problem, called,
respectively, APRIORI [5] and FP-Growth [6]. In the rest of this section, we consider the former
one, and we study its modal generalization.
ALGORITHM 1: Modal extension of the APRIORI algorithm for mining frequent modal patterns.
1  function 𝑀 𝑜𝑑𝑎𝑙𝐴𝑃 𝑅𝐼𝑂𝑅𝐼(ℐ, 𝒫, 𝑠𝑔 , 𝑠𝑙 , 𝛿):
 2     Λ𝒫 ← 𝐿𝑖𝑡𝑒𝑟𝑎𝑙𝑠(𝒫, 𝛿)
 3     𝐶1 ← {{𝜆} | 𝜆 ∈ Λ𝒫 }
 4     𝐹1 ← {𝑋 ∈ 𝐶1 | 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋) ≥ 𝑠𝑔 }
 5     𝑘←1
 6     repeat
 7         𝑘 ←𝑘+1
 8         𝐶𝑘 ← 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝐺𝑒𝑛𝑒𝑟𝑎𝑡𝑖𝑜𝑛(𝐹𝑘−1 )
 9         𝐶𝑘 ← 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝑃 𝑟𝑢𝑛𝑖𝑛𝑔(𝐶𝑘 , 𝐹𝑘−1 )
10         𝐹𝑘 ← {𝑋 ∈ 𝐶𝑘 | 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋) ≥ 𝑠𝑔 }
11     until 𝐹𝑘 = ∅
       return 𝑘𝑖=1 𝐹𝑖
              ⋃︀
12
13 end

14 function 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝐺𝑒𝑛𝑒𝑟𝑎𝑡𝑖𝑜𝑛(𝐹𝑘−1 ):
15     𝐶𝑘 ← {{𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 , 𝜆′𝑘−1 } | {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 }, {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆′𝑘−1 } ∈
        𝐹𝑘−1 and 𝜆𝑘−1 < 𝜆′𝑘−1 }
16     return 𝐶𝑘
17 end

18 function 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝑃 𝑟𝑢𝑛𝑖𝑛𝑔(𝐶𝑘 , 𝐹𝑘−1 ):
19     foreach 𝑋 ∈ 𝐶𝑘 do
20         forall (𝑘 − 1)-length subsets 𝑆 of 𝑋 do
21             if 𝑆 ∈/ 𝐹𝑘−1 then 𝐶𝑘 ← 𝐶𝑘 ∖ 𝑋
22         end
23     end
24     return 𝐶𝑘
25 end


   APRIORI has been the first pruning-based algorithm for mining sets of propositional literals.
Borrowing the same concept from the propositional level, the global support models the (relative)
frequency of instances in ℐ that satisfy the modal pattern 𝑋, whereas the local support, in the
more general modal setting that we present here, models the frequency of worlds in 𝐼 ∈ ℐ that
satisfy the modal pattern 𝑋. Observe that the anti-monotonic property (the APRIORI principle)
also holds for the local support, as longer modal patterns hold on fewer worlds, which is a
consequence of the logical conjunction in the definition of the truth 𝐼, 𝑤 ⊩ 𝑋. The modal
extension of APRIORI is based on the following assumption, which also holds in its propositional
version of APRIORI: modal literals in modal patterns can be lexicographically ordered. The
reason behind such an assumption, that works in synergy with the APRIORI principle, is to
avoid useless computation during the learning routine. Algorithm 1 generalizes the APRIORI
algorithm to the modal case and takes as input a modal dataset ℐ, an alphabet 𝒫 defined on ℐ,
three user-specified thresholds 𝑠𝑙 , 𝑠𝑔 ∈ (0, 1] ⊂ R, and 𝛿 ∈ N for the minimum local support,
minimum global support, and maximum modal depth of modal literals, respectively. Setting a
maximum modal depth is a necessary step for the algorithm to work and terminate, as there is
no anti-monotonic property that holds for modal literals; it should be noticed, however, that
modal rules with too long modal literals would disfavour their interpretability. In Algorithm 1,
𝐹𝑘 and 𝐶𝑘 denote the set of frequent modal patterns and the set of generated candidates of
modal patterns of length 𝑘, for 𝑘 ≥ 1 ∈ N, respectively. The algorithm starts by computing
the set of modal literals Λ𝒫 from 𝒫 and 𝛿. Then, it computes 𝐹1 whose patterns are frequent
from 𝐶1 . Next, the repeat-until loop block of the algorithm computes iteratively 𝐹𝑘 , with
𝑘 > 1. The 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝐺𝑒𝑛𝑒𝑟𝑎𝑡𝑖𝑜𝑛(·) procedure generates candidate modal patterns 𝐶𝑘 by
joining1 𝐹𝑘−1 with itself: two lexicographically ordered frequent (𝑘 − 1)-length modal patterns
{𝜆1 , . . . , 𝜆𝑘−1 , 𝜆𝑘−1 } and {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆′𝑘−1 } of 𝐹𝑘−1 are merged together if they have the
same modal patterns in the first 𝑘 − 2 positions and if 𝜆𝑘−1 < 𝜆′𝑘−1 to form a candidate 𝑘-length
modal pattern {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 , 𝜆′𝑘−1 }. The candidates from 𝐶𝑘 are then pruned by the
𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝑃 𝑟𝑢𝑛𝑖𝑛𝑔(·) function based on the APRIORI principle. It is important to stress the
fact that, both 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝐺𝑒𝑛𝑒𝑟𝑎𝑡𝑖𝑜𝑛(·) and 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝑃 𝑟𝑢𝑛𝑖𝑛𝑔(·) are identical as in the
canonical APRIORI algorithm. At this stage, the algorithm produces 𝐹𝑘 whose elements are
from 𝐶𝑘 which are frequent. Termination is guaranteed by the anti-monotonic property of the
support.
   We now study correctness and completeness of our approach. It is important to stress the
fact that completeness is parametric in 𝛿.
Theorem 1 (correctness). 𝑀 𝑜𝑑𝑎𝑙𝐴𝑃 𝑅𝐼𝑂𝑅𝐼 is correct.
Proof. We want to show that 𝐹𝑘 , for each 𝑘 ≥ 1, is the set of frequent 𝑘-length modal pat-
terns. The proof is by induction on 𝑘. If 𝑘 = 1, then 𝐹1 contains singleton sets of modal
literals of the type {𝜆}, where 𝜆 ∈ Λ𝒫 , that are frequent. Thus, by construction, 𝐹1 is made
of frequent 1-length modal patterns. If 𝑘 > 1, to obtain the set of frequent sets of modal
literals 𝐹𝑘 , the algorithm generates candidates by combining elements of 𝐹𝑘−1 , which are
frequent by inductive hypothesis. Let {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 } and {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆′𝑘−1 } be el-
ements of 𝐹𝑘−1 used by the candidate generation procedure to generate 𝑋 ∈ 𝐹𝑘 , such that
𝑋 = {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 , 𝜆′𝑘−1 }. If 𝑋 is frequent, that is, meets the support, then 𝑋 ∈ 𝐹𝑘 .
Therefore, 𝐹𝑘 is made of 𝑘-length frequent patterns.

Theorem 2 (completeness). 𝑀 𝑜𝑑𝑎𝑙𝐴𝑃 𝑅𝐼𝑂𝑅𝐼 is complete.
Proof. Let 𝐹𝑘 be the set of frequent sets of modal literals and let 𝐶𝑘 be the set of candidate
frequent sets of modal literals, whose modal patterns have length 𝑘. We want to show that
𝐹𝑘 ⊆ 𝐶𝑘 , for all 𝑘 ≥ 1. The proof is by induction on 𝑘. If 𝑘 = 1, then 𝐶1 is the set of singleton
sets of the type {𝜆}, and 𝐹1 ⊆ 𝐶𝑘 holds since 𝐹1 is defined as the subset of 𝐶𝑘 that are frequent.
If 𝑘 > 1, assume, by inductive hypothesis, that 𝐹𝑘−1 ⊆ 𝐶𝑘−1 . We must show that 𝐹𝑘 ⊆ 𝐶𝑘 .
Let 𝑋 = {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 , 𝜆′𝑘−1 }. The pattern 𝑋 is generated by two (𝑘 − 1)-length modal
patterns, say {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 } and {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆′𝑘−1 }, such that 𝜆𝑘−1 < 𝜆′𝑘−1 , which
are necessarily frequent, by inductive hypothesis. Thus, 𝑋 ∈ 𝐶𝑘 must hold and if 𝑋 meets the
support, then 𝑋 ∈ 𝐹𝑘 , obtaining the thesis.


5. Applications
Modal logics are, as we have observed, archetypal of many temporal, spatial, and spatio-
temporal logics. The purpose of modal symbolic learning is designing symbolic machine learning
1
    This is the original nomenclature.
                                                              𝐴2
         Instance   𝐴𝑖𝑑    𝐴1    𝐴2    𝑇                 𝐽1
                                                              𝐴1
            𝐼1      𝑎1𝑖𝑑   𝑎11
                            1    𝑎11
                                  2    𝑡1
            𝐼2      𝑎1𝑖𝑑   𝑎12
                            1    𝑎12
                                  2    𝑡2
            𝐼3      𝑎1𝑖𝑑   𝑎13
                            1    𝑎13
                                  2    𝑡3
            𝐼4      𝑎1𝑖𝑑   𝑎14
                            1    𝑎14
                                  2    𝑡4                     𝐴2
            𝐼5      𝑎2𝑖𝑑   𝑎21
                            1    𝑎21
                                  2    𝑡1                𝐽2
            𝐼6      𝑎2𝑖𝑑   𝑎22   𝑎22   𝑡2                     𝐴1
                            1     2
            𝐼7      𝑎2𝑖𝑑   𝑎23
                            1    𝑎23
                                  2    𝑡3
            𝐼8      𝑎2𝑖𝑑   𝑎24
                            1    𝑎24
                                  2    𝑡4
            𝐼9      𝑎3𝑖𝑑   𝑎31
                            1    𝑎31
                                  2    𝑡1                     𝐴2
           𝐼10      𝑎3𝑖𝑑   𝑎32
                            1    𝑎32
                                  2    𝑡2                𝐽3
           𝐼11      𝑎3𝑖𝑑   𝑎33
                            1    𝑎33
                                  2    𝑡3                     𝐴1
           𝐼12      𝑎3𝑖𝑑   𝑎34
                            1    𝑎34
                                  2    𝑡4
           𝐼13      𝑎4𝑖𝑑   𝑎41
                            1    𝑎41
                                  2    𝑡1
           𝐼14      𝑎4𝑖𝑑   𝑎42
                            1    𝑎42
                                  2    𝑡2
                                                              𝐴2
           𝐼15      𝑎4𝑖𝑑   𝑎43
                            1    𝑎43
                                  2    𝑡3
                                                         𝐽4
           𝐼16      𝑎4𝑖𝑑   𝑎44
                            1    𝑎44
                                  2    𝑡4                     𝐴1


Figure 2: An example of dimensional dataset.


techniques to deal with non-static data. Non-static data include, among others, dimensional data
(that is, data with temporal and/or spatial components), graph-based data (that is, data natively
in the form of graphs, such as social networks, or gene expression networks), and textual data
(that is, data in the form of natural language text). The most basic advantage of modal symbolic
learning is that of reuniting all such non-static data under the same viewpoint, that is, seeing
them all as modal data. How this is done in all cases is outside the scope of this paper; by way
of example we take here the dimensional, temporal case to show how the modal framework
emerges naturally.
    Dimensional datasets can be regarded as modal datasets in which the Kripke model associated
to an instance is implicit and presented in tabular form; this corresponds to the typical presen-
tation of dimensional data, as in Fig. 2, left-hand side. While in the classical sense instances
correspond to lines, more than one line make up for a real-world instance. So, for example, each
instance of the left-hand side of the picture may be the status of a patient under observation,
each being described by two attributes 𝐴1 and 𝐴2 , over the course of four time points 𝑡1 through
𝑡4 . Thus, the intended meaning of such a dataset is that we have in fact four patients (being
distinguished by their names 𝑎1𝑖𝑑 through 𝑎4𝑖𝑑 ), which become four multivariate time series,
denoted by 𝐽1 through 𝐽4 , each composed by four values per attribute, as in Fig. 2, right-hand
side. By regarding each interval in the temporal domain 𝑡1 , . . . , 𝑡4 as a world, one ends up
building a Krikpe structure from each time series (in this example, each structure has 6 worlds,
that is, 6 non-instantaneous intervals). Connecting each pair of such intervals there is exactly
one of 12 Allen’s relations [19]; the modal logic that emerges from this preprocessing, and that
allows one to express how the properties of a time series relate to each other through time is
known as interval temporal logic of Allen’s relations, or HS (from Halpern and Shoham, who
first introduced it [9]).
    Observe, now, that each interval in this setting accounts for a set of values for each attribute;
for example, in Fig 2, the instance 𝐽1 has values 𝑎11 1 , 𝑎1 , 𝑎1 for 𝐴1 in the interval [𝑡1 , 𝑡3 ]. By
                                                           12 13

applying suitable feature extraction function(s) one can, at this point, aggregate such set of values
in a single one; typical aggregations include the average, the maximum, and the minimum, but
in the literature of learning from temporal data one finds several more, and less trivial, examples.
Let us call 𝑓 any such aggregation function; by applying 𝑓 to every possible interval for the
attribute 𝐴, one ends up with a new set of values, that is, the domain 𝑑𝑜𝑚(𝑓 (𝐴)). At this point,
it is possible to extract a propositional alphabet, of the type:

                                         𝑎1 ≤ 𝑓 (𝐴) ≤ 𝑎2 ,

with 𝑎1 , 𝑎2 ∈ 𝑑𝑜𝑚(𝑓 (𝐴)). By labeling each world (interval) with the truth value of each of the
propositional letters, we extract from a dimensional, temporal dataset a modal one.
   The explicit knowledge that can be extracted from electroencephalogram recording (EEG) of
subjects that are in certain conditions or are carrying on certain activities is of uttermost impor-
tance in the scientific enquiry of those conditions and activities. The current approaches to this
problem are entirely functional, but, despite the terminological affinity, black-box classification
models such as neural networks cannot be considered models of the human mind. EEG is a
real-world example of dimensional data that could be used for modal association rule extraction.
A typical EEG is performed with a head-set that accommodates several electrodes (typically, in
the fifties), each one of which records, during a single experiment, the voltage that is expressed
by a specific area of the human brain; the signal of each electrode is then preprocessed through
Fourier-like transformations, so that each signal gives rise to several components at different
frequency intervals (typically, in the tens). All in all, a single EEG recording generates anywhere
between 500 and 2000 temporal attributes (pairs electrode-frequency) over the course of the
experiment. A recent trend in neurobiology is focusing on how different experimental situations
can be distinguished by the form and the values of the corresponding EEG recording, which
helps neurobiologists to understand the nature of the different brain areas in different settings.
While most often these problems are designed as classification problems, understanding how
the several attributes relate to each other through time in a specific situation may turn out to be
very interesting. Temporal relationships can be extracted as modal rules, specifically HS rules,
which would have the form of assertions of the type: if the value of the voltage in this sensor at
this frequency is within this interval of values during an interval of time, then the value of the
voltage in this other sensor at this other frequency must be within this other interval immediately
after it.


6. Conclusions
Symbolic learning is the sub-field of machine learning that deals with symbolic algorithms and
models, which have been known for decades and successfully applied to a variety of contexts.
The main limitation of current symbolic models is the fact that they are essentially based on
classical propositional logic, which implies that dimensional data, graph-based data, or textual
data must be pre-processed in order to be dealt with within the classic framework, at the
expenses of decreasing the interpretability of the result and loosing information.
  Modal symbolic learning is the modal logic generalization of symbolic learning; in this paper
we considered the problem of defining the concept of modal association rules and their logical
characterization, and we devised a correct and complete algorithm, ModalAPRIORI, which is
able to extract frequent sets of modal literals both at the global and local level, and therefore
modal symbolic rules. We also discussed potential applications of modal symbolic rules.


References
 [1] Parliament and Council of the European Union, General Data Protection Regulation, 2016.
     URL: https://gdpr-info.eu/.
 [2] F. Manzella, G. Pagliarini, G. Sciavicco, I. E. Stan, Interval Temporal Random Forests
     with an Application to COVID-19 Diagnosis, in: Proceedings of the 28th International
     Symposium on Temporal Representation and Reasoning (TIME), volume 206 of Leibniz
     International Proceedings in Informatics, Schloss Dagstuhl - Leibniz-Zentrum für Informatik,
     2021, pp. 7:1–7:18.
 [3] G. Pagliarini, G. Sciavicco, Decision Tree Learning with Spatial Modal Logics, in: Pro-
     ceedings of the 12th International Symposium on Games, Automata, Logics, and Formal
     Verification (GandALF), volume 346 of Electronic Proceedings in Theoretical Computer
     Science, 2021, pp. 273–290.
 [4] G. Sciavicco, I. E. Stan, Knowledge extraction with interval temporal logic decision trees,
     in: Proceedings of the 27th International Symposium on Temporal Representation and
     Reasoning (TIME), volume 178 of Leibniz International Proceedings in Informatics, Schloss
     Dagstuhl - Leibniz-Zentrum für Informatik, 2020, pp. 9:1–9:16.
 [5] R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules in Large Databases,
     in: Proceedings of 20th International Conference on Very Large Data Bases (VLDB), 1994,
     pp. 487–499.
 [6] J. Han, J. Pei, Y. Yin, Mining Frequent Patterns without Candidate Generation, in:
     Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data,
     2000, pp. 1–12.
 [7] D. Bresolin, E. Cominato, S. Gnani, E. Muñoz-Velasco, G. Sciavicco, Extracting Interval
     Temporal Logic Rules: A First Approach, in: Proceedings of the 25th International
     Symposium on Temporal Representation and Reasoning (TIME), volume 120 of Leibniz
     International Proceedings in Informatics, Schloss Dagstuhl - Leibniz-Zentrum für Informatik,
     2018, pp. 7:1–7:15.
 [8] E. M. Clarke, E. A. Emerson, Design and Synthesis of Synchronization Skeletons Using
     Branching-Time Temporal Logic, in: Proceedings of the Workshop on Logics of Programs
     (LOP), volume 131 of Lecture Notes in Computer Science, Springer, 1981, pp. 52–71.
 [9] J. Y. Halpern, Y. Shoham, A Propositional Modal Logic of Time Intervals, Journal of the
     ACM 38 (1991) 935–962.
[10] A. Pnueli, The Temporal Logic of Programs, in: Proceedings of the 18th Annual Symposium
     on Foundations of Computer Science (FOCS), IEEE Computer Society, 1977, p. 46–57.
[11] M. Aiello, J. van Benthem, A Modal Walk Through Space, Journal of Applied Non-Classical
     Logics 12 (2002) 319–364.
[12] C. Lutz, F. Wolter, Modal Logics of Topological Relations, Logical Methods in Computer
     Science 2 (2006) 1–41.
[13] J. Fürnkranz, D. Gamberger, N. Lavrac, Foundations of Rule Learning, Cognitive Technolo-
     gies, Springer, 2012.
[14] A. Horn, On Sentences Which are True of Direct Unions of Algebras, Journal of Symbolic
     Logic 16 (1951) 14–21.
[15] P. N. Tan, M. S. Steinbach, A. Karpatne, V. Kumar, Introduction to Data Mining, 2nd ed.,
     Pearson, 2019.
[16] D. Bresolin, E. Muñoz-Velasco, G. Sciavicco, On Sub-Propositional Fragments of Modal
     Logic, Logical Methods in Computer Science 14 (2018).
[17] G. Yang, The complexity of mining maximal frequent itemsets and maximal frequent pat-
     terns, in: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge
     Discovery and Data Mining (KDD), 2004, pp. 344–353.
[18] L. G. Valiant, The Complexity of Enumeration and Reliability Problems, SIAM Journal on
     Computing 8 (1979) 410–421.
[19] J. F. Allen, Maintaining Knowledge about Temporal Intervals, Communication of the ACM
     26 (1983) 832–843.