On Modal Logic Association Rule Mining Mauro Milella1 , Emilio Muñoz-Velasco2 , Giovanni Pagliarini1,3 , Andrea Paradiso1 , Guido Sciavicco1,† and Ionel Eduard Stan1,3,*,† 1 University of Ferrara, Italy 2 University of Màlaga, Spain 3 University of Parma, Italy Abstract The most iconic duality of machine learning models is symbolic learning versus functional learning. While functional learning is based on a numerical approach to knowledge extraction and modelling, the purpose of symbolic machine learning is to extract knowledge from data in such a form that it can be understood, discussed, modified, and applied by humans, as well as serve as the basis of artificial intelligence applications. The typical problems associated with machine learning are classification and rule extraction; while classification can be dealt with using both functional and symbolic learning, rule extraction is essentially symbolic. One element common to nearly all definitions and tools for rule extraction is that they are applied to static datasets and based on propositional logic; unfortunately, very often real-world applications give rise to non-static sets of data (e.g., temporal, spatial, graph-based data) and may require more-than-propositional expressive power. In order to extract association rules from non-static data, in this paper we propose a definition of modal association rules based on modal logic, and we study how a standard rule extraction algorithm such as APRIORI can be generalized to the modal case while keeping the properties of the canonical, non-modal case, namely, correctness and completeness. Keywords Modal logic, Association rule mining, Modal symbolic learning 1. Introduction The most iconic and fundamental separation between the sub-fields of machine learning is the one between functional and symbolic learning. Functional learning is the process of learning a function that represents the theory underlying a certain phenomenon. Symbolic learning, on the other hand, is the process of learning a logical description that represents that phenomenon. Whether one or the other approach should be preferred raised a long-standing debate among experts, which roots in the fact that functional methods tend to be more versatile and statistically accurate than symbolic ones, while symbolic methods are able to extract models that can be Proceedings of the 23rd Italian Conference on Theoretical Computer Science, Rome, Italy, September 7-9, 2022 * Corresponding author. † These authors contributed equally. " mauro.milella@edu.unife.it (M. Milella); ejmunoz@uma.es (E. Muñoz-Velasco); giovanni.pagliarini@unife.it (G. Pagliarini); andrea.paradiso@edu.unife.it (A. Paradiso); guido.sciavicco@unife.it (G. Sciavicco); ioneleduard.stan@unife.it (I. E. Stan)  0000-0001-7128-6745 (M. Milella); 0000-0002-0117-4219 (E. Muñoz-Velasco); 0000-0002-8403-3250 (G. Pagliarini); 0000-0002-3614-2487 (A. Paradiso); 0000-0002-8403-3250 (G. Sciavicco); 0000-0001-9260-102X (I. E. Stan) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) interpreted, explained, and then enhanced using human-expert knowledge. These characteristics of symbolic methods, both for political reasons (consider, for example, the recent General Data Protection Regulation (GDPR) [1] of the European Union, that highlights the need for interpretable/explainable automatic learning-based decision-making processes, including those involving AI technologies) and technical ones (interpretable models are often easier to train, explore, integrate, and implement), are sometimes used as arguments for preferring a symbolic approach over a functional one. Classification and rule extraction are, among others, typical problems of machine learning; while classification can be dealt with using both functional and symbolic learning, rule extraction is essentially symbolic. From a logical standpoint, canonical symbolic learning methods are mostly characterized by being based on propositional logic, and by being designed for static data, in which every instance is described by the value of 𝑛 attributes. Non-static data, such as dimensional (temporal, spatial, spatio-temporal) data, graph-based data, or textual data, however, cannot be successfully dealt with within the same schema in a native way. The standard approach is to pre-process such data, so that their aspect becomes static again (e.g., by averaging the value of attributes along all dimensions), and that off-the-shelf static symbolic methods can be used. Modal symbolic learning [2, 3, 4] takes a different point of view: modal symbolic methods are characterized by being based on modal logic (e.g., temporal, spatial logic) so that non-static data can be dealt with natively, and that the extracted knowledge takes the form of interpretable modal logic formulas. Modal symbolic learning has been mainly focused on classification; here, we consider the case of association rule extraction. At the propositional level, there are two main approaches to association rule extraction, or, more particularly, to its key step, frequent sets extraction, namely APRIORI [5] and FP-Growth [6]. In this paper, we consider the most basic one, APRIORI, and we generalize it to the modal level. A first approach for extracting interval temporal logic association rules, that can be considered a particular case of this work, has been proposed in [7]. This paper is organized as follows. First, in Section 2, we lay down some basic preliminaries, and we define the concept of modal datasets. Then, in Section 3, we define modal association rules from a logical viewpoint, and in Section 4 we extend APRIORI to extract modal frequent sets and we study its properties, namely, correctness and completeness. In Section 5 we give examples of possible applications of modal association rules, before concluding. 2. Preliminaries Given a set of propositional letters 𝒫, the well-formed formulas of the propositional modal logic (ML) are obtained by the following grammar: 𝜙 ::= 𝑝 | ¬𝜙 | 𝜙 ∧ 𝜙 | ♢𝜙, where the remaining classic Boolean operators can be obtained as shortcuts. In what follows, we use □𝜙 to denote ¬♢¬𝜙, and ≡ to denote the logical equivalence. The modality ♢ (resp., □) is usually referred to as it is possible that (resp., it is necessary that). Modal logic is considered as archetypal of temporal, spatial, and spatio-temporal logics, and it is an extension of propositional logic (PL). Its semantics is given in terms of Kripke models. A (finite) Kripke model 𝐾 = 𝑤2 𝑤0 𝑤1 𝑤3 Instance/Kripke model 𝐼1 𝐼2 𝑤1 𝑤3 𝐼3 𝑤0 𝐼4 𝑤2 ... Figure 1: An example of modal dataset with 4 instances, each described by a Kripke model. (𝑊, 𝑅, 𝑉 ) is composed by a finite set of worlds 𝑊 (which contains a distinguished world 𝑤0 , called initial world), a binary accessibility relation 𝑅 ⊆ 𝑊 × 𝑊 , and a valuation function 𝑉 : 𝑊 → 2𝒫 , which associates each world with the set of propositional letters that are true on it. The truth relation 𝐾, 𝑤 ⊩ 𝜙 for a generic model 𝐾 and a generic world 𝑤 in that model is given by the following clauses: 𝐾, 𝑤 ⊩ 𝑝 iff 𝑝 ∈ 𝑉 (𝑤); 𝐾, 𝑤 ⊩ ¬𝜙 iff 𝐾, 𝑤 ̸⊩ 𝜙; 𝐾, 𝑤 ⊩ 𝜙 ∧ 𝜓 iff 𝐾, 𝑤 ⊩ 𝜙 and 𝐾, 𝑤 ⊩ 𝜓; 𝐾, 𝑤 ⊩ ♢𝜙 iff ∃𝑣 s.t. 𝑤𝑅𝑣 and 𝐾, 𝑣 ⊩ 𝜙. Most classic temporal [8, 9, 10] and spatial logics [11, 12] are specializations of the uni-modal logic with more than one (possibly with higher arieties) accessibility relations (and associated modalities), subject to constraints that range from very simple and intuitive ones (e.g., transitivity, antisymmetry) to very complex ones (e.g., when worlds are assumed to be intervals in time and modalities are assumed to mimic relations between intervals). Definition 1 (modal dataset). Given a set of propositional letters 𝒫, a modal dataset ℐ = {𝐼1 , . . . , 𝐼𝑚 } is a finite collection of 𝑚 instances, each of which is a finite Kripke model 𝐼𝑖 = (𝑊𝑖 , 𝑅𝑖 , 𝑉𝑖 ), for each 𝑖 = 1, . . . , 𝑚. The above definition generalizes the conventional notion of propositional, or static, dataset, in which each of the 𝑚 instances is associated to the value of 𝑛 distinct numerical attributes. An example of modal dataset is illustrated in Fig. 1. In machine learning, several problems are associated to a dataset ℐ. The most common one is the classification/regression problem; here, however, we are interested in the equally common learning problem known as association rule extraction: in a generic dataset, it is the problem of extracting meaningful associations among the values of the attributes. In the following, we first define the concept of meaningful association rule, and then we focus our attention to the problem of extracting (all) meaningful rules from a modal dataset. 3. Modal Logic Association Rules Rule-based methods are ubiquitous in modern machine learning and data mining applica- tions [13], that aim at learning regularities in data which can be expressed in the form of if-then rules. In this paper, we are interested in descriptive (or association) rules, that collectively cover the instances’ space. Let 𝒫 be an alphabet on ℐ. In its simplest form, propositional association rules are objects of the type [14]: 𝜌 : 𝑝1 ∧ . . . ∧ 𝑝𝑘 ⇒ 𝑝𝑘+1 , (1) where 𝑝1 , . . . , 𝑝𝑘 , 𝑝𝑘+1 ∈ 𝒫 are positive propositional literals (or simply propositional literals); recalling the standard nomenclature in the association rule mining literature, propositional literals are also known as items [5]. We use the notation: 𝜌 : 𝑋 ⇒ 𝑌, (2) where 𝑋 = {𝑝1 , . . . , 𝑝𝑘 } is called antecedent, 𝑌 = {𝑝𝑘+1 } is called consequent, and 𝑋 ∩ 𝑌 = ∅. Association analysis must be interpreted with caution: causality is not implied in the inference process by an association rule as it simply represents strong co-occurrence relationship of the literals in the antecedent with those in the consequent [15]. Therefore, to emphasize the difference with respect to →, which is a logical implication, in (association) rules we use the symbol ⇒. In this sense, 𝜌 is not a formula of propositional logic. When treated like one, however, it becomes a formula that belongs to the known subset of propositional logic, that is, Horn propositional logic; this usually happens after the extraction phase, when rules are no longer at risk of not being meaningful. Horn propositional logic is interesting from the deductive point of view (its satisfiability problem is 𝑃 -complete), which becomes relevant when data-driven knowledge is mixed, in some sense, with expert, top-down knowledge in intelligent applications. Non-empty subsets of 𝒫 are also called propositional patterns [5]. Moreover, the length of a (propositional) pattern 𝑋 is the set-theoretic cardinality of 𝑋. In this paper we focus only on Horn-like association rules; however, the fact that, in other settings, the consequent 𝑌 of a rule 𝜌 : 𝑋 ⇒ 𝑌 could be a set with more than one literal, that is, it could have 𝑘 ′ literals 𝑌 = {𝑝𝑘+1 , . . . , 𝑝𝑘+𝑘′ } which, if interpreted as a disjunction, would result in being a non-Horn logic rule. In order to formalize the modal association rule mining problem, where instances of a modal dataset are generic, finite Kripke models, we must first lift the concept of literals to the modal case. Definition 2 (modal literals). For a fixed alphabet 𝒫, the set of positive modal literals (or simply modal literals) Λ𝒫 over 𝒫 is defined by the grammar: 𝜆 ::= 𝑝 | ♢𝜆 | □𝜆, where 𝑝 ∈ 𝒫. Modal literals of the type 𝑝 are called propositional, as in the propositional case, and modal literals of the type ♢𝜆 (resp., □𝜆) are called existential (resp., universal). Definition 3 (modal association rules). For a fixed set of modal literals Λ𝒫 , modal association rules are objects of the type: 𝜌 : 𝜆1 ∧ . . . ∧ 𝜆𝑘 ⇒ 𝜆𝑘+1 , (3) where 𝜆1 , . . . , 𝜆𝑘 , 𝜆𝑘+1 ∈ Λ𝒫 . Note that (3) has the same structure as (1), except that 𝒫 has been replaced by Λ𝒫 . Non-empty subsets of Λ𝒫 are called modal patterns, and we sometimes write a rule 𝜌 in its set-theoretical notation, that is, 𝑋 ⇒ 𝑌 , where both 𝑋 and 𝑌 are modal patterns and 𝑌 is a singleton. Similarly to the propositional case, modal rules are not modal formulas; when treated like that, they belong to the Horn fragment of modal logic MLHorn . There are several, although very similar, proposals for a Horn fragment of modal logic. Bresolin et al. [16] studied MLHorn and several of its sub-fragments from the deductive point of view; while the satisfiability problem for MLHorn is still 𝑃 𝑆𝑃 𝐴𝐶𝐸-complete (as it is for full ML), sub-fragments of it such as the one with only universal and propositional literals becomes 𝑃 -complete. Not all rules are interesting or meaningful. As for the propositional case, in order to capture the notion of meaningfulness in logical terms, in [5] authors implicitly defined a notion of 𝜌 holding on an instance, and then on a dataset. They did so by introducing two parameters, namely support and confidence, which they used to modify the notion of truth of literal, first, and of a rule, then. After the first introduction of such concepts, and specifically of the concept of confidence, alternative measures of meaningfulness have been proposed, among which we mention lift, and conviction (see, e.g., [15]). In the end, as we shall see, the key observation is that their definitions are all consequences of the definition of support; our aim is to generalize the latter to the modal case, so that all other measures can then be re-defined accordingly. If the notion of propositional support was originally designed to capture the idea of how frequent was a literal (resp., a set of literals) across the static dataset, we now have to capture the additional idea of how frequent a modal literal (resp., a set of modal literals) is across a single instance, as well. Such two notions are only partially orthogonal: locally, we want to capture the number of different worlds in which a certain phenomenon happens, in order to measure its interestingness, while, globally, we want to capture the number of different instances in which such an interesting phenomenon occurs. In the following, if 𝑋 is a set of literals, 𝐼⋀︀= (𝑊, 𝑅, 𝑉 ) an instance, and 𝑤 ∈ 𝑊 a world in 𝐼, we use 𝐼, 𝑤 ⊩ 𝑋 as a shortcut for 𝐼, 𝑤 ⊩ 𝜆∈𝑋 𝜆. Definition 4 (support). Let ℐ be a modal dataset and 𝑋 ⊆ Λ𝒫 be a set of literals. The local support of 𝑋 on some instance 𝐼 ∈ ℐ, being 𝐼 = (𝑊, 𝑅, 𝑉 ), is defined as: |{𝑤 ∈ 𝑊 | 𝐼, 𝑤 ⊩ 𝑋}| 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋) = , |𝑊 | and, given a certain local support 𝑠𝑙 ∈ (0, 1] ⊂ R, the global support of 𝑋 on ℐ relatively to 𝑠𝑙 is defined as: |{𝐼 ∈ ℐ | 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋) ≥ 𝑠𝑙 }| 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋) = . |ℐ| Similarly to the propositional case, we can define popular interestingness measures for modal rules in such a way to be (local and global) support-dependent. Definition 5 (confidence, lift). Let ℐ be a modal dataset, and 𝜌 : 𝑋 ⇒ 𝑌 a modal rule. The local confidence of 𝜌 on some instance 𝐼 ∈ ℐ is defined as: 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋 ∪ 𝑌 ) 𝑙𝑐𝑜𝑛𝑓 (𝐼, 𝑋 ⇒ 𝑌 ) = , 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋) and, given a certain local support 𝑠𝑙 ∈ (0, 1] ⊂ R, the global confidence of 𝜌 on ℐ relatively to 𝑠𝑙 is defined as: 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 ) 𝑔𝑐𝑜𝑛𝑓𝑠𝑙 (ℐ, 𝑋 ⇒ 𝑌 ) = . 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋) Similarly, the local lift of 𝜌 on some instance 𝐼 ∈ ℐ is defined as: 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋 ∪ 𝑌 ) 𝑙𝑙𝑖𝑓 𝑡(𝐼, 𝑋 ⇒ 𝑌 ) = , 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋) · 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑌 ) and, given a certain local support 𝑠𝑙 ∈ (0, 1] ⊂ R, the global lift of 𝜌 on ℐ relatively to 𝑠𝑙 is defined as: 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 ) 𝑔𝑙𝑖𝑓 𝑡𝑠𝑙 (ℐ, 𝑋 ⇒ 𝑌 ) = . 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋) · 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑌 ) Confidence measures how frequently items in 𝑌 appear in instances (in the global setting) or worlds (in the local setting) that contain 𝑋. Similarly, lift measures the dependence between 𝑋 and 𝑌 . From the above examples, we can derive a general notion of support-dependent interestingness measure; we say that a measure is local support-dependent (resp., global support- dependent) if and only if it can be computed by a function of the local support (resp., global support) of its set of worlds (resp., instances). We denote by 𝑙𝑚𝑒𝑎𝑠 (resp., 𝑔𝑚𝑒𝑎𝑠𝑠𝑙 ) a generic local (resp., global) support-dependent measure. Definition 6 (holding). Let ℐ be a modal dataset, 𝜌 : 𝑋 ⇒ 𝑌 a modal rule, 𝜃𝑙 a (𝑧 + 1)-vector of values in (0, 1] ⊂ R (indexed from 0 to 𝑧), and let 𝑙𝑚𝑒𝑎𝑠1 , . . . , 𝑙𝑚𝑒𝑎𝑠𝑧 be 𝑧 local support- dependent measures. We say that 𝜌 𝜃𝑙 -holds on 𝐼 ∈ ℐ if and only if: (i) 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋 ∪ 𝑌 ) ≥ 𝜃𝑙 [0], and (ii) for each 1 ≤ 𝑖 ≤ 𝑧, 𝑙𝑚𝑒𝑎𝑠𝑖 (𝐼, 𝜌) ≥ 𝜃𝑙 [𝑖]. We denote such a situation by: 𝐼 ⊩𝜃𝑙 𝜌. Similarly, given 𝑠𝑙 ∈ (0, 1] ⊂ R, if 𝜃𝑔𝑠𝑙 is a (𝑧 + 1)-vector of values in (0, 1] ⊂ R (indexed from 0 to 𝑧), and 𝑔𝑚𝑒𝑎𝑠1 , . . . , 𝑔𝑚𝑒𝑎𝑠𝑧 are 𝑧 global support-dependent measures, we say that 𝜌 𝜃𝑔𝑠𝑙 -holds on ℐ if and only if: (i) 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 ) ≥ 𝜃𝑔𝑠𝑙 [0], and (ii) for each 1 ≤ 𝑖 ≤ 𝑧, 𝑔𝑚𝑒𝑎𝑠𝑖𝑠𝑙 (ℐ, 𝜌) ≥ 𝜃𝑔𝑠𝑙 [𝑖]. We denote such a situation by: ℐ ⊩𝜃𝑔𝑠𝑙 𝜌. As defined in the literature, the association rules mining problem has two parts: frequent patterns mining and extraction of interesting rules. Therefore, in both settings, local and global, the first conditions, 𝑙𝑠𝑢𝑝𝑝(𝐼, 𝑋 ∪ 𝑌 ) ≥ 𝜃𝑙 [0] and 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 ) ≥ 𝜃𝑔𝑠𝑙 [0], enforces a rule to be frequent in terms of their supports, locally and globally, respectively. The remaining conditions can be arbitrarily combined but they cannot be separated from imposing a minimal support. Observe that, the notion of a rule holding on a single instance collapses to the standard notion of truth when instances are propositional (that is, with a single world); similarly, in the same case, the notion of a rule holding on a dataset collapses to the (implicit) notion of propositional rule holding on a dataset (driven by the classical support and confidence/lift/. . . ). The definition of such generalized notions is not unique, as different constraints could be enforced. However, not every combination makes sense. For example, forcing a rule to be globally supported if and only if it is locally supported on every instance would imply requiring perfect global support, which is rather restrictive and possibly non-adherent to the intended use of rules. In the next section we shall focus on solving the following natural problem: given a modal dataset ℐ and a vector 𝜃𝑔𝑠𝑙 of (𝑧 + 1) values in (0, 1] ⊂ R for some 𝑠𝑙 ∈ (0, 1] ⊂ R, extract (or mine) all and only modal rules that 𝜃𝑔𝑠𝑙 -hold on ℐ, which we regard to as interesting. 4. Mining Modal Logic Rules As a consequence of focusing on support-dependent interestingness measures only, it turns out that extracting interesting rules can be achieved by simply extracting sets of literals with a relatively high support, which in the classic terminology are called frequent sets. To see this, consider a rule 𝜌 : 𝑋 ⇒ 𝑌 , and assume that we want to check if it [𝑠𝑔 , 𝑐𝑔 ]𝑠𝑔𝑙 -holds on a certain dataset ℐ, where 𝑐𝑔 is the minimum requested global confidence. To this end, it must hold that 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 ) ≥ 𝑠𝑔 and that 𝑔𝑐𝑜𝑛𝑓𝑠𝑙 (ℐ, 𝜌) ≥ 𝑐𝑔 . To compute the latter, one has to compute, first, 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 ) and 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋), which are both by-products of computing 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋 ∪ 𝑌 ). Therefore, being able to extract all frequent sets of literals is necessary and sufficient to be able to extract all interesting rules as we have defined them. This phenomenon occurred at the propositional level too, obviously; it is interesting to observe that it still holds at the modal one by inheriting exactly the same procedure (see again, e.g., [5]). The problem of enumerating all sets of frequent literals has been studied in the literature [17]; in general, enumerating frequent sets has several applications, not necessarily related to rule extraction. The first key observation is that so-called, and well-known APRIORI principle. Proposition 1 (APRIORI principle [5]). If a set of literals is frequent, then all of its subsets must be frequent. Naturally, there are a potentially exponential number of frequent sets of a given dataset ℐ, even at the propositional level. Focusing on maximal frequent sets of literals does not change this, and the complexity of the problem of extracting them becomes therefore interesting. As proved in [17], the counting problem associated to the problem of extracting all maximal frequent sets is already #𝑃 -hard [18]. In most cases, the interesting problems in machine learning are computationally hard, thus the go-to strategy is turning to approximation algorithms. This is not the case with frequent set extraction: in many practical cases, indeed, there are only a few maximal frequent sets in a given dataset, and enumerating them in a correct and complete fashion is the typical approach. There are two classical approaches to this problem, called, respectively, APRIORI [5] and FP-Growth [6]. In the rest of this section, we consider the former one, and we study its modal generalization. ALGORITHM 1: Modal extension of the APRIORI algorithm for mining frequent modal patterns. 1 function 𝑀 𝑜𝑑𝑎𝑙𝐴𝑃 𝑅𝐼𝑂𝑅𝐼(ℐ, 𝒫, 𝑠𝑔 , 𝑠𝑙 , 𝛿): 2 Λ𝒫 ← 𝐿𝑖𝑡𝑒𝑟𝑎𝑙𝑠(𝒫, 𝛿) 3 𝐶1 ← {{𝜆} | 𝜆 ∈ Λ𝒫 } 4 𝐹1 ← {𝑋 ∈ 𝐶1 | 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋) ≥ 𝑠𝑔 } 5 𝑘←1 6 repeat 7 𝑘 ←𝑘+1 8 𝐶𝑘 ← 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝐺𝑒𝑛𝑒𝑟𝑎𝑡𝑖𝑜𝑛(𝐹𝑘−1 ) 9 𝐶𝑘 ← 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝑃 𝑟𝑢𝑛𝑖𝑛𝑔(𝐶𝑘 , 𝐹𝑘−1 ) 10 𝐹𝑘 ← {𝑋 ∈ 𝐶𝑘 | 𝑔𝑠𝑢𝑝𝑝𝑠𝑙 (ℐ, 𝑋) ≥ 𝑠𝑔 } 11 until 𝐹𝑘 = ∅ return 𝑘𝑖=1 𝐹𝑖 ⋃︀ 12 13 end 14 function 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝐺𝑒𝑛𝑒𝑟𝑎𝑡𝑖𝑜𝑛(𝐹𝑘−1 ): 15 𝐶𝑘 ← {{𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 , 𝜆′𝑘−1 } | {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 }, {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆′𝑘−1 } ∈ 𝐹𝑘−1 and 𝜆𝑘−1 < 𝜆′𝑘−1 } 16 return 𝐶𝑘 17 end 18 function 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝑃 𝑟𝑢𝑛𝑖𝑛𝑔(𝐶𝑘 , 𝐹𝑘−1 ): 19 foreach 𝑋 ∈ 𝐶𝑘 do 20 forall (𝑘 − 1)-length subsets 𝑆 of 𝑋 do 21 if 𝑆 ∈/ 𝐹𝑘−1 then 𝐶𝑘 ← 𝐶𝑘 ∖ 𝑋 22 end 23 end 24 return 𝐶𝑘 25 end APRIORI has been the first pruning-based algorithm for mining sets of propositional literals. Borrowing the same concept from the propositional level, the global support models the (relative) frequency of instances in ℐ that satisfy the modal pattern 𝑋, whereas the local support, in the more general modal setting that we present here, models the frequency of worlds in 𝐼 ∈ ℐ that satisfy the modal pattern 𝑋. Observe that the anti-monotonic property (the APRIORI principle) also holds for the local support, as longer modal patterns hold on fewer worlds, which is a consequence of the logical conjunction in the definition of the truth 𝐼, 𝑤 ⊩ 𝑋. The modal extension of APRIORI is based on the following assumption, which also holds in its propositional version of APRIORI: modal literals in modal patterns can be lexicographically ordered. The reason behind such an assumption, that works in synergy with the APRIORI principle, is to avoid useless computation during the learning routine. Algorithm 1 generalizes the APRIORI algorithm to the modal case and takes as input a modal dataset ℐ, an alphabet 𝒫 defined on ℐ, three user-specified thresholds 𝑠𝑙 , 𝑠𝑔 ∈ (0, 1] ⊂ R, and 𝛿 ∈ N for the minimum local support, minimum global support, and maximum modal depth of modal literals, respectively. Setting a maximum modal depth is a necessary step for the algorithm to work and terminate, as there is no anti-monotonic property that holds for modal literals; it should be noticed, however, that modal rules with too long modal literals would disfavour their interpretability. In Algorithm 1, 𝐹𝑘 and 𝐶𝑘 denote the set of frequent modal patterns and the set of generated candidates of modal patterns of length 𝑘, for 𝑘 ≥ 1 ∈ N, respectively. The algorithm starts by computing the set of modal literals Λ𝒫 from 𝒫 and 𝛿. Then, it computes 𝐹1 whose patterns are frequent from 𝐶1 . Next, the repeat-until loop block of the algorithm computes iteratively 𝐹𝑘 , with 𝑘 > 1. The 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝐺𝑒𝑛𝑒𝑟𝑎𝑡𝑖𝑜𝑛(·) procedure generates candidate modal patterns 𝐶𝑘 by joining1 𝐹𝑘−1 with itself: two lexicographically ordered frequent (𝑘 − 1)-length modal patterns {𝜆1 , . . . , 𝜆𝑘−1 , 𝜆𝑘−1 } and {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆′𝑘−1 } of 𝐹𝑘−1 are merged together if they have the same modal patterns in the first 𝑘 − 2 positions and if 𝜆𝑘−1 < 𝜆′𝑘−1 to form a candidate 𝑘-length modal pattern {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 , 𝜆′𝑘−1 }. The candidates from 𝐶𝑘 are then pruned by the 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝑃 𝑟𝑢𝑛𝑖𝑛𝑔(·) function based on the APRIORI principle. It is important to stress the fact that, both 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝐺𝑒𝑛𝑒𝑟𝑎𝑡𝑖𝑜𝑛(·) and 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒𝑃 𝑟𝑢𝑛𝑖𝑛𝑔(·) are identical as in the canonical APRIORI algorithm. At this stage, the algorithm produces 𝐹𝑘 whose elements are from 𝐶𝑘 which are frequent. Termination is guaranteed by the anti-monotonic property of the support. We now study correctness and completeness of our approach. It is important to stress the fact that completeness is parametric in 𝛿. Theorem 1 (correctness). 𝑀 𝑜𝑑𝑎𝑙𝐴𝑃 𝑅𝐼𝑂𝑅𝐼 is correct. Proof. We want to show that 𝐹𝑘 , for each 𝑘 ≥ 1, is the set of frequent 𝑘-length modal pat- terns. The proof is by induction on 𝑘. If 𝑘 = 1, then 𝐹1 contains singleton sets of modal literals of the type {𝜆}, where 𝜆 ∈ Λ𝒫 , that are frequent. Thus, by construction, 𝐹1 is made of frequent 1-length modal patterns. If 𝑘 > 1, to obtain the set of frequent sets of modal literals 𝐹𝑘 , the algorithm generates candidates by combining elements of 𝐹𝑘−1 , which are frequent by inductive hypothesis. Let {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 } and {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆′𝑘−1 } be el- ements of 𝐹𝑘−1 used by the candidate generation procedure to generate 𝑋 ∈ 𝐹𝑘 , such that 𝑋 = {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 , 𝜆′𝑘−1 }. If 𝑋 is frequent, that is, meets the support, then 𝑋 ∈ 𝐹𝑘 . Therefore, 𝐹𝑘 is made of 𝑘-length frequent patterns. Theorem 2 (completeness). 𝑀 𝑜𝑑𝑎𝑙𝐴𝑃 𝑅𝐼𝑂𝑅𝐼 is complete. Proof. Let 𝐹𝑘 be the set of frequent sets of modal literals and let 𝐶𝑘 be the set of candidate frequent sets of modal literals, whose modal patterns have length 𝑘. We want to show that 𝐹𝑘 ⊆ 𝐶𝑘 , for all 𝑘 ≥ 1. The proof is by induction on 𝑘. If 𝑘 = 1, then 𝐶1 is the set of singleton sets of the type {𝜆}, and 𝐹1 ⊆ 𝐶𝑘 holds since 𝐹1 is defined as the subset of 𝐶𝑘 that are frequent. If 𝑘 > 1, assume, by inductive hypothesis, that 𝐹𝑘−1 ⊆ 𝐶𝑘−1 . We must show that 𝐹𝑘 ⊆ 𝐶𝑘 . Let 𝑋 = {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 , 𝜆′𝑘−1 }. The pattern 𝑋 is generated by two (𝑘 − 1)-length modal patterns, say {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆𝑘−1 } and {𝜆1 , . . . , 𝜆𝑘−2 , 𝜆′𝑘−1 }, such that 𝜆𝑘−1 < 𝜆′𝑘−1 , which are necessarily frequent, by inductive hypothesis. Thus, 𝑋 ∈ 𝐶𝑘 must hold and if 𝑋 meets the support, then 𝑋 ∈ 𝐹𝑘 , obtaining the thesis. 5. Applications Modal logics are, as we have observed, archetypal of many temporal, spatial, and spatio- temporal logics. The purpose of modal symbolic learning is designing symbolic machine learning 1 This is the original nomenclature. 𝐴2 Instance 𝐴𝑖𝑑 𝐴1 𝐴2 𝑇 𝐽1 𝐴1 𝐼1 𝑎1𝑖𝑑 𝑎11 1 𝑎11 2 𝑡1 𝐼2 𝑎1𝑖𝑑 𝑎12 1 𝑎12 2 𝑡2 𝐼3 𝑎1𝑖𝑑 𝑎13 1 𝑎13 2 𝑡3 𝐼4 𝑎1𝑖𝑑 𝑎14 1 𝑎14 2 𝑡4 𝐴2 𝐼5 𝑎2𝑖𝑑 𝑎21 1 𝑎21 2 𝑡1 𝐽2 𝐼6 𝑎2𝑖𝑑 𝑎22 𝑎22 𝑡2 𝐴1 1 2 𝐼7 𝑎2𝑖𝑑 𝑎23 1 𝑎23 2 𝑡3 𝐼8 𝑎2𝑖𝑑 𝑎24 1 𝑎24 2 𝑡4 𝐼9 𝑎3𝑖𝑑 𝑎31 1 𝑎31 2 𝑡1 𝐴2 𝐼10 𝑎3𝑖𝑑 𝑎32 1 𝑎32 2 𝑡2 𝐽3 𝐼11 𝑎3𝑖𝑑 𝑎33 1 𝑎33 2 𝑡3 𝐴1 𝐼12 𝑎3𝑖𝑑 𝑎34 1 𝑎34 2 𝑡4 𝐼13 𝑎4𝑖𝑑 𝑎41 1 𝑎41 2 𝑡1 𝐼14 𝑎4𝑖𝑑 𝑎42 1 𝑎42 2 𝑡2 𝐴2 𝐼15 𝑎4𝑖𝑑 𝑎43 1 𝑎43 2 𝑡3 𝐽4 𝐼16 𝑎4𝑖𝑑 𝑎44 1 𝑎44 2 𝑡4 𝐴1 Figure 2: An example of dimensional dataset. techniques to deal with non-static data. Non-static data include, among others, dimensional data (that is, data with temporal and/or spatial components), graph-based data (that is, data natively in the form of graphs, such as social networks, or gene expression networks), and textual data (that is, data in the form of natural language text). The most basic advantage of modal symbolic learning is that of reuniting all such non-static data under the same viewpoint, that is, seeing them all as modal data. How this is done in all cases is outside the scope of this paper; by way of example we take here the dimensional, temporal case to show how the modal framework emerges naturally. Dimensional datasets can be regarded as modal datasets in which the Kripke model associated to an instance is implicit and presented in tabular form; this corresponds to the typical presen- tation of dimensional data, as in Fig. 2, left-hand side. While in the classical sense instances correspond to lines, more than one line make up for a real-world instance. So, for example, each instance of the left-hand side of the picture may be the status of a patient under observation, each being described by two attributes 𝐴1 and 𝐴2 , over the course of four time points 𝑡1 through 𝑡4 . Thus, the intended meaning of such a dataset is that we have in fact four patients (being distinguished by their names 𝑎1𝑖𝑑 through 𝑎4𝑖𝑑 ), which become four multivariate time series, denoted by 𝐽1 through 𝐽4 , each composed by four values per attribute, as in Fig. 2, right-hand side. By regarding each interval in the temporal domain 𝑡1 , . . . , 𝑡4 as a world, one ends up building a Krikpe structure from each time series (in this example, each structure has 6 worlds, that is, 6 non-instantaneous intervals). Connecting each pair of such intervals there is exactly one of 12 Allen’s relations [19]; the modal logic that emerges from this preprocessing, and that allows one to express how the properties of a time series relate to each other through time is known as interval temporal logic of Allen’s relations, or HS (from Halpern and Shoham, who first introduced it [9]). Observe, now, that each interval in this setting accounts for a set of values for each attribute; for example, in Fig 2, the instance 𝐽1 has values 𝑎11 1 , 𝑎1 , 𝑎1 for 𝐴1 in the interval [𝑡1 , 𝑡3 ]. By 12 13 applying suitable feature extraction function(s) one can, at this point, aggregate such set of values in a single one; typical aggregations include the average, the maximum, and the minimum, but in the literature of learning from temporal data one finds several more, and less trivial, examples. Let us call 𝑓 any such aggregation function; by applying 𝑓 to every possible interval for the attribute 𝐴, one ends up with a new set of values, that is, the domain 𝑑𝑜𝑚(𝑓 (𝐴)). At this point, it is possible to extract a propositional alphabet, of the type: 𝑎1 ≤ 𝑓 (𝐴) ≤ 𝑎2 , with 𝑎1 , 𝑎2 ∈ 𝑑𝑜𝑚(𝑓 (𝐴)). By labeling each world (interval) with the truth value of each of the propositional letters, we extract from a dimensional, temporal dataset a modal one. The explicit knowledge that can be extracted from electroencephalogram recording (EEG) of subjects that are in certain conditions or are carrying on certain activities is of uttermost impor- tance in the scientific enquiry of those conditions and activities. The current approaches to this problem are entirely functional, but, despite the terminological affinity, black-box classification models such as neural networks cannot be considered models of the human mind. EEG is a real-world example of dimensional data that could be used for modal association rule extraction. A typical EEG is performed with a head-set that accommodates several electrodes (typically, in the fifties), each one of which records, during a single experiment, the voltage that is expressed by a specific area of the human brain; the signal of each electrode is then preprocessed through Fourier-like transformations, so that each signal gives rise to several components at different frequency intervals (typically, in the tens). All in all, a single EEG recording generates anywhere between 500 and 2000 temporal attributes (pairs electrode-frequency) over the course of the experiment. A recent trend in neurobiology is focusing on how different experimental situations can be distinguished by the form and the values of the corresponding EEG recording, which helps neurobiologists to understand the nature of the different brain areas in different settings. While most often these problems are designed as classification problems, understanding how the several attributes relate to each other through time in a specific situation may turn out to be very interesting. Temporal relationships can be extracted as modal rules, specifically HS rules, which would have the form of assertions of the type: if the value of the voltage in this sensor at this frequency is within this interval of values during an interval of time, then the value of the voltage in this other sensor at this other frequency must be within this other interval immediately after it. 6. Conclusions Symbolic learning is the sub-field of machine learning that deals with symbolic algorithms and models, which have been known for decades and successfully applied to a variety of contexts. The main limitation of current symbolic models is the fact that they are essentially based on classical propositional logic, which implies that dimensional data, graph-based data, or textual data must be pre-processed in order to be dealt with within the classic framework, at the expenses of decreasing the interpretability of the result and loosing information. Modal symbolic learning is the modal logic generalization of symbolic learning; in this paper we considered the problem of defining the concept of modal association rules and their logical characterization, and we devised a correct and complete algorithm, ModalAPRIORI, which is able to extract frequent sets of modal literals both at the global and local level, and therefore modal symbolic rules. We also discussed potential applications of modal symbolic rules. References [1] Parliament and Council of the European Union, General Data Protection Regulation, 2016. URL: https://gdpr-info.eu/. [2] F. Manzella, G. Pagliarini, G. Sciavicco, I. E. Stan, Interval Temporal Random Forests with an Application to COVID-19 Diagnosis, in: Proceedings of the 28th International Symposium on Temporal Representation and Reasoning (TIME), volume 206 of Leibniz International Proceedings in Informatics, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021, pp. 7:1–7:18. [3] G. Pagliarini, G. Sciavicco, Decision Tree Learning with Spatial Modal Logics, in: Pro- ceedings of the 12th International Symposium on Games, Automata, Logics, and Formal Verification (GandALF), volume 346 of Electronic Proceedings in Theoretical Computer Science, 2021, pp. 273–290. [4] G. Sciavicco, I. E. Stan, Knowledge extraction with interval temporal logic decision trees, in: Proceedings of the 27th International Symposium on Temporal Representation and Reasoning (TIME), volume 178 of Leibniz International Proceedings in Informatics, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020, pp. 9:1–9:16. [5] R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules in Large Databases, in: Proceedings of 20th International Conference on Very Large Data Bases (VLDB), 1994, pp. 487–499. [6] J. Han, J. Pei, Y. Yin, Mining Frequent Patterns without Candidate Generation, in: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 2000, pp. 1–12. [7] D. Bresolin, E. Cominato, S. Gnani, E. Muñoz-Velasco, G. Sciavicco, Extracting Interval Temporal Logic Rules: A First Approach, in: Proceedings of the 25th International Symposium on Temporal Representation and Reasoning (TIME), volume 120 of Leibniz International Proceedings in Informatics, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018, pp. 7:1–7:15. [8] E. M. Clarke, E. A. Emerson, Design and Synthesis of Synchronization Skeletons Using Branching-Time Temporal Logic, in: Proceedings of the Workshop on Logics of Programs (LOP), volume 131 of Lecture Notes in Computer Science, Springer, 1981, pp. 52–71. [9] J. Y. Halpern, Y. Shoham, A Propositional Modal Logic of Time Intervals, Journal of the ACM 38 (1991) 935–962. [10] A. Pnueli, The Temporal Logic of Programs, in: Proceedings of the 18th Annual Symposium on Foundations of Computer Science (FOCS), IEEE Computer Society, 1977, p. 46–57. [11] M. Aiello, J. van Benthem, A Modal Walk Through Space, Journal of Applied Non-Classical Logics 12 (2002) 319–364. [12] C. Lutz, F. Wolter, Modal Logics of Topological Relations, Logical Methods in Computer Science 2 (2006) 1–41. [13] J. Fürnkranz, D. Gamberger, N. Lavrac, Foundations of Rule Learning, Cognitive Technolo- gies, Springer, 2012. [14] A. Horn, On Sentences Which are True of Direct Unions of Algebras, Journal of Symbolic Logic 16 (1951) 14–21. [15] P. N. Tan, M. S. Steinbach, A. Karpatne, V. Kumar, Introduction to Data Mining, 2nd ed., Pearson, 2019. [16] D. Bresolin, E. Muñoz-Velasco, G. Sciavicco, On Sub-Propositional Fragments of Modal Logic, Logical Methods in Computer Science 14 (2018). [17] G. Yang, The complexity of mining maximal frequent itemsets and maximal frequent pat- terns, in: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2004, pp. 344–353. [18] L. G. Valiant, The Complexity of Enumeration and Reliability Problems, SIAM Journal on Computing 8 (1979) 410–421. [19] J. F. Allen, Maintaining Knowledge about Temporal Intervals, Communication of the ACM 26 (1983) 832–843.