=Paper=
{{Paper
|id=Vol-3733/short1
|storemode=property
|title=Condensed Representations for Contrast Sequential Pattern Mining in ASP
|pdfUrl=https://ceur-ws.org/Vol-3733/short1.pdf
|volume=Vol-3733
|authors=Gioacchino Sterlicchio,Francesca A. Lisi
|dblpUrl=https://dblp.org/rec/conf/cilc/SterlicchioL24
}}
==Condensed Representations for Contrast Sequential Pattern Mining in ASP==
Condensed Representations for Contrast Sequential Pattern Mining in ASP

Gioacchino Sterlicchio 1,2,3,*, Francesca A. Lisi 2,4

1 Dept. of Mechanics, Mathematics and Management, Polytechnic University of Bari, Via G. Amendola 126/b, 70126 Bari, Italy
2 Dept. of Computer Science, University of Bari “Aldo Moro”, Via E. Orabona 4, 70125 Bari, Italy
3 Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, 29 Av. John F. Kennedy, 1855 Kirchberg, Luxembourg
4 Centro Interdipartimentale di Logica e Applicazioni (CILA), University of Bari “Aldo Moro”, Via E. Orabona 4, 70125 Bari, Italy

Abstract
In this work, we address an extension of the contrast sequential pattern mining problem which aims at detecting condensed representations for contrast sequential patterns. The problem is encoded with Answer Set Programming (ASP). The efficiency and scalability of the ASP encoding are evaluated on two publicly available datasets, iPRG and UNIX.

Keywords
Declarative Pattern Mining, Contrast Sequential Pattern Mining, Condensed Representations, Answer Set Programming

1. Introduction

The continuous increase in available data makes effective and efficient techniques necessary to extrapolate key information to be used for decision-making in various application contexts. An example is temporal data (e.g. system logs, banking transactions, telephone records), which must be analyzed with suitable methodologies such as Sequential Pattern Mining (SPM) [1]. In SPM, the goal is to find frequent, non-empty temporal sequences (i.e. sequential patterns) in a sequence dataset. Often the available temporal data is labeled or grouped according to a precise semantics. For example, in the domain of network security it is possible to label the network behavior as normal or as anomalous; in the latter case an attack could be underway.
The idea behind Contrast Pattern Mining (CPM) [2] is to find statistically significant differences between two or more disjoint datasets or portions of the same dataset. Merging the two previous concepts, i.e. finding significant differences between frequent sequences of different classes, is known as Contrast Sequential Pattern Mining (CSPM) [3]. CSPM is the pattern mining task considered in this paper.

In recent years there has been increasing interest in so-called Declarative Pattern Mining (DPM), a research stream whose objective is to develop declarative approaches to pattern mining. Several encodings have been presented so far, covering pattern mining tasks such as sequence mining [4, 5] and frequent itemset mining [6, 7]. Answer Set Programming (ASP) is widely used in DPM. The first proposal is described by Guyet et al. [8]: the authors explore the SPM problem with ASP and compare their method with a dedicated algorithm. Gebser et al. [5] use ASP to extract condensed representations of sequential patterns. Samet et al. [9] mine rare sequential patterns with ASP. In [10], Guyet et al. propose a real-world application of ASP-based DPM, investigating the possible association between hospitalization for seizure and antiepileptic drug switch in a French medico-administrative database. Guyet et al. [11] use ASP to mine sequential patterns with two representations of embeddings (fill-gaps vs. skip-gaps) and compare them with CP.

CILC 2024: 39th Italian Conference on Computational Logic, June 26-28, 2024, Rome, Italy
* Corresponding author.
g.sterlicchio@phd.poliba.it (G. Sterlicchio); FrancescaAlessandra.Lisi@uniba.it (F. A. Lisi)
http://www.di.uniba.it/~lisi/ (F. A. Lisi)
0000-0002-2936-0777 (G. Sterlicchio); 0000-0001-5414-5844 (F. A. Lisi)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

A hybrid ASP approach is proposed
by Paramonov et al. [12], which combines dedicated algorithms for pattern mining with ASP. In [13, 14] Guyet's ASP encodings for SPM are adapted to address the requirements of an application in the digital forensics domain.1 Motivated by the same application, Lisi and Sterlicchio [15] propose an ASP-based approach to CPM.2 In [16], the same authors present the first ASP encoding for the CSPM problem, which we call MASS-CSP (Mining with Answer Set Solving - Contrast Sequential Patterns) hereafter.

In this paper we carry on the work on MASS-CSP by facing one of the major problems in pattern mining, i.e. the huge number of returned patterns, most of which might not be useful. To this aim we explore so-called condensed representations to decrease the size of the output.

The paper is organized as follows. In Section 2 we briefly recall the basics of ASP and give the necessary background on CSPM. In Section 3 we define the problem of mining condensed representations for the CSPM task and report the experimental results obtained on a couple of datasets. Section 4 concludes the paper with final remarks.

2. Background

2.1. Answer Set Programming

Answer Set Programming (ASP) [17, 18] is a declarative programming paradigm for representing and solving complex combinatorial problems. It is based on the logical formalism of logic programs, specifically disjunctive logic programs under the answer set semantics. ASP provides a powerful tool for solving problems in various domains such as planning, scheduling, reasoning about actions, and knowledge representation. In an ASP program, rules are defined using predicates and logical connectives such as conjunction, disjunction, and negation. The program consists of a set of rules which define relationships between the different elements of the problem domain.
These rules are then used to generate answer sets, i.e. sets of consistent interpretations that satisfy all the constraints specified in the program. One advantage of ASP over other declarative programming paradigms is its expressiveness and its flexibility in representing complex problems concisely through logical constraints. Additionally, due to their modular nature, ASP programs can be easily modified or extended without changing their overall structure. ASP solvers use sophisticated algorithms based on efficient search techniques to compute answer sets; the most prominent ones are Clingo [19] and DLV [20]. The general form of a rule is:

a1 ∨ ... ∨ an ← b1, ..., bk, not bk+1, ..., not bm.

The rule says that if b1, ..., bk are true and there is no reason for believing that bk+1, ..., bm are true, then at least one of a1, ..., an is believed to be true. The left-hand side and the right-hand side of ← are called head and body, respectively. Rules without a body are called facts: the head is unconditionally true and the arrow is usually omitted. Conversely, rules without a head are called constraints and are used to discard stable models, thus reducing the number of answers returned by the ASP solver.

2.2. Contrast Sequential Pattern Mining

Contrast Sequential Pattern Mining (CSPM) [3] is a data mining technique that aims to discover interesting patterns in sequential data by comparing and contrasting different sequences. This approach goes beyond traditional sequential pattern mining, which focuses solely on finding frequent patterns, by also considering the differences between sequences. The main idea behind CSPM is to identify patterns that occur frequently in one group of sequences but infrequently in another group. This allows for the detection of significant differences between two sets of sequences and can provide valuable insights into the underlying relationships or trends within the data.
By focusing on contrasting patterns, CSPM can uncover hidden associations or trends that may not be apparent when analyzing each group separately. This can lead to new discoveries and insights into complex datasets where traditional pattern mining techniques may fall short. Below we show how to arrive at the definition of CSPM starting from SPM, and then give an example to illustrate the technique.

1 We refer to this encoding as MASS-SP (Mining with Answer Set Solving - Sequential Patterns).
2 We refer to this encoding as MASS-CP (Mining with Answer Set Solving - Contrast Patterns).

ID  Sequence         Class
1   ⟨a b a c d⟩      C1
2   ⟨a b c⟩          C1
3   ⟨c a b c⟩        C1
4   ⟨c⟩              C1
5   ⟨b c a a⟩        C2
6   ⟨c b a⟩          C2
7   ⟨c b a⟩          C2
8   ⟨a b a c b a⟩    C2

Table 1: Example of a sequence dataset. Each sequence has a class label, which is used in CSPM.

Let D be a database containing a set of sequences S = {s1, s2, ..., sk}, where each sequence si consists of ordered elements or items from an alphabet Σ. A sequence is represented as si = ⟨i1, i2, ..., im⟩, where each item ik belongs to Σ and appears in the sequence in order, according to some timestamp or position information. A sequential pattern is an ordered list of items π = ⟨a1, a2, ..., ak⟩ such that each ai ∈ Σ and π occurs as a subsequence (order-preserving, but not necessarily contiguous) of at least one sequence in S. The support of a sequential pattern π, supp(π), is the number of sequences in which it occurs. Given a minimum support threshold minsup, SPM aims to find all frequent sequential patterns π, i.e. those such that supp(π) ≥ minsup: frequent sequential patterns are those that occur frequently enough within the dataset based on the specified support threshold. A contrast sequential pattern is a sequential pattern that occurs frequently in one sequence dataset but not in the others. To find contrast sequential patterns, it is necessary to introduce the concepts of growth rate and contrast rate.
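To make the definitions concrete, here is a minimal Python sketch (the function names are our own, not part of the MASS-CSP encodings) that computes the support of a sequential pattern over the dataset of Table 1:

```python
def occurs_in(pattern, sequence):
    """True if pattern is an order-preserving (not necessarily
    contiguous) subsequence of sequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def supp(pattern, dataset):
    """Support: number of sequences in dataset containing pattern."""
    return sum(occurs_in(pattern, s) for s in dataset)

# Table 1, split by class label
D1 = [list("abacd"), list("abc"), list("cabc"), list("c")]     # class C1
D2 = [list("bcaa"), list("cba"), list("cba"), list("abacba")]  # class C2

print(supp(list("abc"), D1 + D2))  # 4 (sequences 1, 2, 3 and 8)
print(supp(list("cba"), D1 + D2))  # 3 (sequences 6, 7 and 8)
```

With minsup = 2, both patterns are frequent on the whole dataset.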
Given two sequence datasets, D1 labeled with class C1 and D2 labeled with class C2, we first compute the growth rate from D2 to D1 of a sequential pattern π as

GR_C1(π) = (supp(π, D1)/|D1|) / (supp(π, D2)/|D2|).

If supp(π, D2) = 0 and supp(π, D1) ≠ 0, then GR_C1(π) = ∞. Symmetrically, the growth rate from D1 to D2 of π is defined as

GR_C2(π) = (supp(π, D2)/|D2|) / (supp(π, D1)/|D1|).

If supp(π, D1) = 0 and supp(π, D2) ≠ 0, then GR_C2(π) = ∞. The contrast rate of π is defined as CR(π) = max{GR_C1(π), GR_C2(π)}, and if GR_C1(π) = 0 and GR_C2(π) = 0 then CR(π) = ∞. π is a contrast sequential pattern if CR(π) ≥ mincr, where mincr is the minimum contrast rate threshold.

Table 1 shows a sequence dataset D that we split into D1 and D2 according to the classes C1 and C2, respectively. We start by finding sequential patterns: given minsup = 2, ⟨a b c⟩ is a sequential pattern because it occurs in sequences 1, 2, 3, and 8. Another example is ⟨c b a⟩, which occurs in sequences 6, 7, and 8. Assuming we have found all the sequential patterns, we check whether they are contrasting for one of the two classes. Given mincr = 2, π1 = ⟨a b c⟩ and the metrics supp(π1, D1) = 3, supp(π1, D2) = 1, GR_C1(π1) = 3, GR_C2(π1) = 0.33, and CR(π1) = 3, π1 is a contrast sequential pattern for C1 because CR(π1) ≥ mincr. Given π2 = ⟨c b a⟩ and its metrics supp(π2, D1) = 0, supp(π2, D2) = 3, GR_C1(π2) = 0, GR_C2(π2) = ∞, π2 is a contrast sequential pattern for C2. In this specific case GR_C2(π2) = ∞, so π2 is a pattern for class C2 only.

3. Mining Condensed Representations of Contrast Sequential Patterns

In traditional pattern mining, algorithms often generate a large number of patterns that may contain redundant or overlapping information. This can lead to issues such as increased computational complexity, difficulty in interpretation, and inefficiency in storing and processing the discovered patterns.
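The worked example can be reproduced with a short, self-contained Python sketch (the helper names are ours, not from the paper's encodings):

```python
import math

def occurs_in(pattern, sequence):
    """Order-preserving, not necessarily contiguous, subsequence test."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def supp(pattern, dataset):
    """Number of sequences in dataset containing pattern."""
    return sum(occurs_in(pattern, s) for s in dataset)

def growth_rate(p, D_to, D_from):
    """Growth rate of p toward the class of D_to, per the definition above."""
    s_to, s_from = supp(p, D_to), supp(p, D_from)
    if s_from == 0:
        return math.inf if s_to != 0 else 0.0
    return (s_to / len(D_to)) / (s_from / len(D_from))

def contrast_rate(p, D1, D2):
    """CR(p) = max of the two growth rates."""
    return max(growth_rate(p, D1, D2), growth_rate(p, D2, D1))

# Table 1, split by class label
D1 = [list("abacd"), list("abc"), list("cabc"), list("c")]     # class C1
D2 = [list("bcaa"), list("cba"), list("cba"), list("abacba")]  # class C2

pi1, pi2 = list("abc"), list("cba")
print(growth_rate(pi1, D1, D2))    # 3.0 -> contrast pattern for C1 (mincr = 2)
print(contrast_rate(pi2, D1, D2))  # inf -> contrast pattern for C2 only
```

The sketch covers only the regular cases of the definitions (patterns that occur in at least one of the two datasets).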
Condensed representation techniques address these challenges by summarizing the set of mined patterns into a more compact form without losing important insights or key relationships within the data. One common method for obtaining condensed representations relies on the concepts of closed and maximal patterns [11]. A pattern s is closed, w.r.t. a dataset D, if no other pattern t exists such that s ⊂ t and supp(s, D) = supp(t, D). A pattern s is maximal, w.r.t. a dataset D, if it is frequent and no other pattern t exists such that s ⊂ t and supp(t, D) ≥ minsup.

Dataset           |Σ|   |D|   ‖D‖      max|T|  avg|T|  density
iPRG              21    8628  111,743  12      11.95   0.62
iPRG_25_25        20    50    657      12      11.88   0.64
iPRG_100_100      20    200   2591     12      11.83   0.64
iPRG_500_500      21    1000  12,933   12      11.92   0.62
iPRG_1000_1000    21    2000  25,841   12      11.91   0.61
UNIX              2672  9099  165,748  1256    18.22   0.01
UNIX_25_25        70    50    365      55      7.3     0.10
UNIX_100_100      178   200   2281     175     11.41   0.06
UNIX_500_500      420   1000  13,289   187     13.29   0.03
UNIX_755_755      540   1510  20,234   214     13.4    0.02

Table 2: Features of the iPRG and UNIX User sub-datasets: the number of distinct symbols, the number of sequences, the total number of symbols in the dataset, the maximum sequence length, the average sequence length, and the density (calculated by ‖D‖/(|Σ|·|D|)).

With reference to the example reported in Section 2.2, we know that ⟨a b c⟩ and ⟨c b a⟩ are sequential patterns. Following the definitions of closed and maximal patterns, ⟨a b c⟩ and ⟨c b a⟩ are not only maximal but also closed; moreover, since CR(⟨a b c⟩) ≥ mincr and CR(⟨c b a⟩) ≥ mincr, they are also contrast patterns. In the next section we examine the computational behavior of the condensed representations for the CSPM task, comparing with the results obtained in [16] (Figures 1, 2 and 3). In pattern mining it is usual to evaluate the effectiveness (number of extracted patterns), runtime and memory consumption of an algorithm. Moreover, in ASP-based DPM approaches it is also important to know the solver and grounder time.
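Given the full set of frequent patterns with their supports, the closed and maximal subsets can be filtered directly from the definitions above. The sketch below is ours (not the ASP encoding) and uses a hand-picked, non-exhaustive subset of the frequent patterns of Table 1:

```python
def is_subpattern(s, t):
    """s is a subsequence of t (order-preserving, possibly equal)."""
    it = iter(t)
    return all(x in it for x in s)

def condense(freq):
    """freq: dict mapping pattern (tuple) -> support; every pattern in
    freq is assumed frequent. Returns (closed, maximal) sets."""
    closed = {p for p, sp in freq.items()
              if not any(t != p and is_subpattern(p, t) and st == sp
                         for t, st in freq.items())}
    maximal = {p for p in freq
               if not any(t != p and is_subpattern(p, t) for t in freq)}
    return closed, maximal

# Some frequent patterns of Table 1 with their supports (not exhaustive)
freq = {('a',): 7, ('b',): 7, ('c',): 8,
        ('a', 'b'): 4, ('a', 'b', 'c'): 4, ('c', 'b', 'a'): 3}

closed, maximal = condense(freq)
print(sorted(closed))   # ('a','b') is dropped: same support as ('a','b','c')
print(sorted(maximal))  # only ('a','b','c') and ('c','b','a') survive
```

As the output illustrates, the maximal set is a subset of the closed set, matching the observation that maximal extraction reduces the output further.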
To this end, we conducted experiments on two datasets (Table 2), creating several subsets of increasing size. In iPRG, each transaction is a sequence of peptides that is known to cleave in the presence of the Trypsin enzyme,3 while in UNIX User each transaction is a sequence of shell commands executed by a user during one session.4 We have chosen these datasets because (i) they are suitable for the task considered in this paper (classified sequences), (ii) they have already been used in the DPM literature, in particular in [11, 4] although for a different task, and (iii) they are publicly available. Notably, transactions in both datasets are labelled with one of two classes, pos and neg. Due to lack of space we do not report the ASP encodings and all the experiments carried out, which however can be found in the GitHub repository devoted to MASS-CSP.5

3.1. Evaluation

In the following we report and discuss the results obtained from scalability tests on iPRG and UNIX User. We have used version 5.4.0 of Clingo, with default solving parameters. The timeout (T.O) has been set to 1 hour. The ASP programs were run on a laptop computer with Windows 10 (with an Ubuntu 20.04.4 subsystem), AMD Ryzen 5 3500U @ 2.10 GHz and 8GB RAM, without using the multi-threading mode of Clingo: multi-threading reduces the mean runtime but introduces variance due to the random allocation of tasks, and such variance is inconvenient when interpreting results over repeated executions.

Tables 3 and 4 summarize the experiments conducted on iPRG and UNIX. It is clear that with the extraction of maximal patterns the output is further reduced compared to closed ones. Obviously, a pattern can be both closed and maximal, as shown for all sub-datasets when minsup grows up to 20%. As the input to the program increases, the total execution time and the occupied memory increase accordingly, as does the grounding time (time - solv. t.), although not by much.
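The density column of Table 2 follows directly from the other columns via ‖D‖/(|Σ|·|D|); a quick sanity check (function name ours):

```python
def density(total_symbols, alphabet_size, num_sequences):
    """Dataset density as defined in Table 2: ||D|| / (|Sigma| * |D|)."""
    return total_symbols / (alphabet_size * num_sequences)

print(round(density(111_743, 21, 8_628), 2))     # iPRG row: 0.62
print(round(density(165_748, 2_672, 9_099), 2))  # UNIX row: 0.01
```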
Only when the size of the dataset reaches the order of thousands of rows (iPRG_1000_1000), or when minsup is 10% (iPRG_500_500, maximal), does the process end because of the timeout: a high value of minsup reduces the search space and hence the runtime. Figures 1, 2 (iPRG) and 3 (UNIX) compare the basic, closed and maximal representations.

3 https://dtai.cs.kuleuven.be/CP4IM/cpsm/datasets.html
4 https://archive.ics.uci.edu/ml/datasets/UNIX+User+Data
5 https://github.com/mpia3/Contrast-Sequential-Pattern-Mining

It seems that the maximal representation requires more time than the closed one, while the closed representation requires more memory than the maximal one. The advantage of using condensed representations instead of the basic frequent formulation lies in the reduction of the output size. It is strange that for UNIX_100_100 (Figure 3) the number of condensed patterns is higher than the number of basic ones; it will be necessary to delve deeper into this phenomenon to understand the underlying reasons. As the Figures show, condensed representations require more execution time and memory, due to the increase in the number of new atoms that must be derived to represent the new patterns, which enlarges the program. Also in this case, as in various other contexts, it is necessary to find a trade-off between what types of information are wanted and at what cost.

(a) iPRG_25_25, closed, mincr = 3
minsup  #pat  time   solv. t.  memory
10%     164   4.72   4.52      31.84
20%     23    2.59   2.41      31.82
30%     2     1.46   1.3       31.38
40%     0     0.93   0.77      30.67
50%     0     0.36   0.19      29.83

(b) iPRG_25_25, maximal, mincr = 3
minsup  #pat  time   solv. t.  memory
10%     115   13.01  12.73     34.35
20%     23    9.09   8.77      39.13
30%     2     5.58   5.26      38.5
40%     0     3.01   2.7       34.7
50%     0     1.29   0.94      30.86

(a) iPRG_100_100, closed, mincr = 3
minsup  #pat  time   solv. t.  memory
10%     536   75.48  74.46     96.69
20%     14    22.26  21.36     84.27
30%     0     8.19   7.33      82.53
40%     0     4.72   3.88      80.4
50%     0     5.51   4.68      77.77

(b) iPRG_100_100, maximal, mincr = 3
minsup  #pat  time    solv. t.  memory
10%     460   116.96  115.91    88.4
20%     14    30.37   29.22     82.51
30%     0     15.54   14.65     84.4
40%     0     10.92   9.98      83.43
50%     0     7.14    6.24      82.55

(a) iPRG_500_500, closed, mincr = 3
minsup  #pat  time     solv. t.  memory
10%     71    993.41   982.49    604.75
20%     12    554.70   543.17    600.45
30%     0     246.49   235.8     600.45
40%     0     1242.32  1231.62   579.17
50%     0     1745.27  1735.21   585.21

(b) iPRG_500_500, maximal, mincr = 3
minsup  #pat  time     solv. t.  memory
10%     28    T.O      3589.91   555.71
20%     12    2167.71  2156.21   549.48
30%     0     187.37   176.54    546.68
40%     0     1523.46  1513.25   553.42
50%     0     766.99   757.04    600.09

(a) iPRG_1000_1000, closed, mincr = 3
minsup  #pat  time     solv. t.  memory
10%     14    T.O      3560.14   1883.13
20%     9     T.O      3559.32   1875.15
30%     0     736.96   694.01    1861.71
40%     0     T.O      3559.38   1851.53
50%     0     3132.80  3093.54   3270.31

(b) iPRG_1000_1000, maximal, mincr = 3
minsup  #pat  time     solv. t.  memory
10%     11    T.O      3560.77   2787.54
20%     9     T.O      3557.44   1779.18
30%     0     T.O      3574.37   1773.29
40%     0     2389.90  2342.93   1740.44
50%     0     T.O      3558.29   1729.63

Table 3: Number of closed and maximal patterns, runtime (seconds), solver time (seconds) and memory consumption (MB) on all iPRG sub-datasets, varying minsup with mincr fixed. T.O means timeout.

4. Conclusions

Contrast Sequential Pattern Mining offers a powerful tool for exploring sequential data and discovering meaningful patterns by highlighting differences between groups of sequences. It has applications across various domains, such as market analysis, healthcare research and fraud detection, where understanding contrasts in sequential data is crucial for decision-making and problem-solving. Pattern mining typically involves a combinatorial explosion, and condensed representations are the solution proposed in the literature to address this issue. We have reported the results of the evaluation of closed/maximal contrast sequential patterns, comparing with the basic CSPM representation in ASP.
We have used two datasets from two different domains for our evaluation. The experiments illustrate the advantages and weaknesses of condensed representations, and in particular the pros and cons of closed and maximal patterns: on the one hand, they reduce the number of patterns; on the other hand, they tend to consume more computational resources. Overall, condensed representations for pattern mining offer a valuable tool for extracting actionable insights from data, by simplifying complex pattern structures into concise yet informative summaries that facilitate better decision-making across domains such as market analysis, bioinformatics research and customer behavior prediction, where understanding the underlying trends is crucial for making informed decisions.

Figure 1: Comparison on the number of basic, closed and maximal contrast sequential patterns extracted on iPRG_25_25
Figure 2: Number of basic, closed and maximal contrast sequential patterns extracted, runtime, and memory consumption on iPRG_100_100
Figure 3: Runtime and memory consumption comparison on basic, closed, and maximal contrast sequential patterns for UNIX_100_100

(a) UNIX_25_25, closed, mincr = 3
minsup  #pat  time  solv. t.  memory
10%     13    0.23  0.09      28.77
20%     1     0.17  0.05      23.57
30%     0     0.15  0.03      20.25
40%     0     0.14  0.02      21.3
50%     0     0.17  0.04      25.45

(b) UNIX_25_25, maximal, mincr = 3
minsup  #pat  time  solv. t.  memory
10%     8     0.22  0.11      24.39
20%     1     0.14  0.04      19.63
30%     0     0.12  0.02      21.42
40%     0     0.11  0.02      24.00
50%     0     0.13  0.04      18.8

(a) UNIX_100_100, closed, mincr = 3
minsup  #pat  time   solv. t.  memory
10%     104   10.99  9.82      99.74
20%     0     2.40   1.55      93.93
30%     0     1.82   1.02      93.39
40%     0     1.16   0.34      93.71
50%     0     0.98   0.21      93.32

(b) UNIX_100_100, maximal, mincr = 3
minsup  #pat  time   solv. t.  memory
10%     38    17.24  16.55     68.95
20%     0     2.83   2.29      58.94
30%     0     1.84   1.23      58.00
40%     0     0.87   0.31      57.99
50%     0     0.74   0.19      57.99

(a) UNIX_500_500, closed, mincr = 3
minsup  #pat  time    solv. t.  memory
10%     13    175.92  162.35    902.93
20%     1     68.30   53.48     890.00
30%     0     54.61   41.47     888.28
40%     0     31.15   17.86     890.66
50%     0     21.76   8.84      890.66

(b) UNIX_500_500, maximal, mincr = 3
minsup  #pat  time    solv. t.  memory
10%     12    135.03  126.04    487.08
20%     1     65.08   56.13     494.34
30%     0     46.67   37.83     483.34
40%     0     26.66   17.75     483.21
50%     0     17.70   8.80      483.21

(a) UNIX_755_755, closed, mincr = 3
minsup  #pat  time    solv. t.  memory
10%     4     T.O     3569.58   1855.13
20%     0     206.60  176.4     1834.99
30%     0     124.90  95.39     1812.50
40%     0     80.04   50.43     1811.34
50%     0     51.06   21.00     1811.33

(b) UNIX_755_755, maximal, mincr = 3
minsup  #pat  time    solv. t.  memory
10%     3     T.O     3579.70   936.05
20%     0     190.55  169.98    936.61
30%     0     109.95  89.79     930.67
40%     0     71.47   50.74     930.55
50%     0     41.23   21.01     930.55

Table 4: Number of closed and maximal patterns, runtime (seconds), solver time (seconds) and memory consumption (MB) on all UNIX sub-datasets. T.O means timeout.

Acknowledgments

This work was partially supported by the project FAIR - Future AI Research (PE00000013), under the NRRP MUR program funded by the NextGenerationEU.

References

[1] C. H. Mooney, J. F. Roddick, Sequential pattern mining – approaches and algorithms, ACM Computing Surveys (CSUR) 45 (2013) 1–39.
[2] G. Dong, J. Bailey, Contrast data mining: concepts, algorithms, and applications, CRC Press, 2012.
[3] Y. Chen, W. Gan, Y. Wu, P. S. Yu, Contrast pattern mining: A survey, 2022. arXiv:2209.13556.
[4] B. Negrevergne, T. Guns, Constraint-based sequence mining using constraint programming, in: International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research, Springer, 2015, pp. 288–305.
[5] M. Gebser, T. Guyet, R. Quiniou, J. Romero, T. Schaub, Knowledge-based sequence mining with ASP, in: IJCAI 2016 - 25th International Joint Conference on Artificial Intelligence, AAAI, 2016, p. 8.
[6] S. Jabbour, L. Sais, Y.
Salhi, Decomposition based SAT encodings for itemset mining problems, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2015, pp. 662–674.
[7] T. Guns, A. Dries, S. Nijssen, G. Tack, L. De Raedt, MiningZinc: A declarative framework for constraint-based mining, Artificial Intelligence 244 (2017) 6–29.
[8] T. Guyet, Y. Moinard, R. Quiniou, Using answer set programming for pattern mining, arXiv preprint arXiv:1409.7777 (2014).
[9] A. Samet, T. Guyet, B. Negrevergne, Mining rare sequential patterns with ASP, in: ILP 2017 - 27th International Conference on Inductive Logic Programming, 2017.
[10] T. Guyet, A. Happe, Y. Dauxais, Declarative sequential pattern mining of care pathways, in: Artificial Intelligence in Medicine: 16th Conference on Artificial Intelligence in Medicine, AIME 2017, Vienna, Austria, June 21-24, 2017, Proceedings 16, Springer, 2017, pp. 261–266.
[11] T. Guyet, Y. Moinard, R. Quiniou, T. Schaub, Efficiency analysis of ASP encodings for sequential pattern mining tasks, in: Advances in Knowledge Discovery and Management, Springer, 2018, pp. 41–81.
[12] S. Paramonov, D. Stepanova, P. Miettinen, Hybrid ASP-based approach to pattern mining, Theory Pract. Log. Program. 19 (2019) 505–535. URL: https://doi.org/10.1017/S1471068418000467. doi:10.1017/S1471068418000467.
[13] F. A. Lisi, G. Sterlicchio, Declarative pattern mining in digital forensics: Preliminary results, in: R. Calegari, G. Ciatto, A. Omicini (Eds.), Proceedings of the 37th Italian Conference on Computational Logic, Bologna, Italy, June 29 - July 1, 2022, volume 3204 of CEUR Workshop Proceedings, CEUR-WS.org, 2022, pp. 232–246. URL: http://ceur-ws.org/Vol-3204/paper_23.pdf.
[14] F. A. Lisi, G. Sterlicchio, Mining sequences in phone recordings with Answer Set Programming, in: P. Bruno, F. Calimeri, F. Cauteruccio, M. Maratea, G. Terracina, M.
Vallati (Eds.), Joint Proceedings of the 1st International Workshop on HYbrid Models for Coupling Deductive and Inductive ReAsoning (HYDRA 2022) and the 29th RCRA Workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2022) co-located with the 16th International Conference on Logic Programming and Non-monotonic Reasoning (LPNMR 2022), Genova Nervi, Italy, September 5, 2022, volume 3281 of CEUR Workshop Proceedings, CEUR-WS.org, 2022, pp. 34–50. URL: http://ceur-ws.org/Vol-3281/paper4.pdf.
[15] F. A. Lisi, G. Sterlicchio, A declarative approach to contrast pattern mining, in: A. Dovier, A. Montanari, A. Orlandini (Eds.), AIxIA 2022 - Advances in Artificial Intelligence - XXIst International Conference of the Italian Association for Artificial Intelligence, AIxIA 2022, Udine, Italy, November 28 - December 2, 2022, Proceedings, volume 13796 of Lecture Notes in Computer Science, Springer, 2022, pp. 17–30. URL: https://doi.org/10.1007/978-3-031-27181-6_2. doi:10.1007/978-3-031-27181-6_2.
[16] F. A. Lisi, G. Sterlicchio, Mining contrast sequential patterns with ASP, in: R. Basili, D. Lembo, C. Limongelli, A. Orlandini (Eds.), AIxIA 2023 - Advances in Artificial Intelligence - XXIInd International Conference of the Italian Association for Artificial Intelligence, AIxIA 2023, Rome, Italy, November 6-9, 2023, Proceedings, volume 14318 of Lecture Notes in Computer Science, Springer, 2023, pp. 44–57. URL: https://doi.org/10.1007/978-3-031-47546-7_4. doi:10.1007/978-3-031-47546-7_4.
[17] V. Lifschitz, Answer sets and the language of answer set programming, AI Magazine 37 (2016) 7–12.
[18] G. Brewka, T. Eiter, M. Truszczynski, Answer set programming at a glance, Communications of the ACM 54 (2011) 92–103. URL: http://doi.acm.org/10.1145/2043174.2043195. doi:10.1145/2043174.2043195.
[19] M. Gebser, R. Kaminski, B. Kaufmann, T. Schaub, Clingo = ASP + control: Preliminary report, arXiv preprint arXiv:1405.3694 (2014).
[20] N. Leone, C. Allocca, M. Alviano, F. Calimeri, C. Civili, R. Costabile, A. Fiorentino, D. Fuscà, S. Germano, G. Laboccetta, B. Cuteri, M. Manna, S. Perri, K. Reale, F. Ricca, P. Veltri, J. Zangari, Enhancing DLV for large-scale reasoning, in: M. Balduccini, Y. Lierler, S. Woltran (Eds.), Logic Programming and Nonmonotonic Reasoning - 15th International Conference, LPNMR 2019, Philadelphia, PA, USA, June 3-7, 2019, Proceedings, volume 11481 of Lecture Notes in Computer Science, Springer, 2019, pp. 312–325. URL: https://doi.org/10.1007/978-3-030-20528-7_23. doi:10.1007/978-3-030-20528-7_23.