                  Mining rare sequential patterns with ASP

          Ahmed Samet (Université Rennes 1 / IRISA-UMR6074)
          Thomas Guyet (AGROCAMPUS-OUEST / IRISA-UMR6074)
          Benjamin Negrevergne (Université Paris-Dauphine, PSL Research University, CNRS, LAMSADE)




                                                         Abstract
                       This article presents an approach to mining meaningful rare sequential
                       patterns based on the declarative programming paradigm of Answer Set
                       Programming (ASP). The setting of rare sequential pattern mining is
                       introduced. Our ASP approach makes it easy to encode expert constraints
                       on expected patterns, in order to cope with the huge amount of
                       meaningless rare patterns. Encodings are presented and quantitatively
                       compared to a procedural baseline. An application to care pathway
                       analysis illustrates the value of encoding expert constraints.


1 Introduction
Pattern mining aims at extracting meaningful structured knowledge hidden in large amounts of raw data. Building
on the hypothesis that an interesting pattern occurs frequently in the data, most research on pattern mining
has focused on frequent pattern mining. However, in many cases, rare patterns can also be meaningful. For
example, this is true when physicians want to identify dangerous outcomes of ordinary procedures from care
pathway data. Fortunately for patients, such outcomes are rare but, unfortunately for data analysts, such events
are difficult to extract using standard approaches.
   Mining rare patterns has been studied in the context of itemsets [1]. But to the best of our knowledge, the
problem of mining rare sequential patterns has not been addressed yet. The lack of work on this topic has been
recently identified by Hu et al. [2] as an important matter for future research.
   The main problem with rare patterns, known from experiments on rare itemsets, is their huge number.
Condensed rare patterns, called minimal rare patterns [3], have been proposed to reduce the number of patterns
to extract without losing information. Yet, it is desirable to further reduce the number of patterns and to improve
pattern significance. The approach we propose in this paper is to let the expert express extra constraints to
specify the most interesting patterns. To achieve this goal, we need to develop a method to extract condensed
rare sequential patterns that is versatile enough to support easy addition of expert constraints. However,
most approaches based on procedural programs would require specific and long developments to integrate extra
constraints. Instead, declarative programming appears to be an interesting alternative to extract knowledge from
data under complex expert constraints.
   Declarative programming, and especially logic programming, received much attention in the past decades,
and even more recently thanks to progress in solver efficiency. The state of the art can be organized along
two axes: the data analysis task and the declarative programming paradigm. Data analysis tasks are
classically separated into supervised vs unsupervised learning. More specifically, inductive logic programming (ILP)
[4] belongs to the field of supervised machine learning, while the more recent field of declarative pattern mining
[5], which consists in mining patterns with declarative encodings, belongs to the field of unsupervised learning.
For these two tasks, several declarative paradigms have been proposed: logic programming, Prolog or Answer


Copyright © by the paper's authors. Copying permitted for private and academic purposes.

In: Nicolas Lachiche, Christel Vrain (eds.): Late Breaking Papers of ILP 2017, Orléans, France, September 4-6, 2017, published at
http://ceur-ws.org




                              s1           s2           s3        s4        s5      s6    s7
                   Seq.   ⟨d,a,b,c⟩   ⟨a,c,b,c⟩   ⟨a,b,c⟩   ⟨a,b,c⟩   ⟨a,c⟩   ⟨b⟩   ⟨c⟩

                                  Table 1: Example of artificial dataset of 7 sequences.

Set Programming (ASP), Constraint Programming (CP) and Satisfiability solving (SAT). For each approach,
the objective is to propose a uniform representation for examples, background knowledge and hypotheses. ILP
started with encodings in Prolog [4], and alternative systems have been implemented based on CP [6] or on ASP
[7]. In declarative pattern mining, most approaches have been implemented using SAT [8] or CP [9, 10], and
some with ASP [11]. In this work, we choose the ASP paradigm to implement our rare sequential pattern
mining, in order to benefit from a first-order logic programming language. It makes the encoding of extra
constraints easier, and ASP solvers have proved their efficiency [11].
    The contributions of this paper are threefold:

 1. We formalize the general problem of rare sequential pattern mining and model it in ASP, together with
    two important variations of this problem (non-zero-rare and minimal rare patterns). Thanks to the flexibility
    of ASP, we were able to solve all three problems with the same ASP solver, avoiding the tedious work of
    designing and implementing an algorithm for each new problem.

 2. We provide important insights on modelling efficiency by comparing several alternative models. The
    experimental comparison shows that general ASP-based approaches can compete with ad hoc specialized
    algorithms.

 3. We apply rare sequential pattern mining to our target application of analyzing care pathway data. In
    particular, we demonstrate the benefit of additional constraints to extract meaningful results in this domain.


2 Rare sequential patterns: definitions and properties
This section introduces the basic concepts of rare sequential pattern mining.
   Let I = {i_1, i_2, ..., i_|I|} be a set of items. A sequence s, denoted ⟨s_j⟩_{1≤j≤n}, is an ordered list of items
s_j ∈ I, where n denotes the length of the sequence.
   Given two sequences s = ⟨s_j⟩_{1≤j≤n} and s' = ⟨s'_j⟩_{1≤j≤m} with n ≤ m, we say that s is a subsequence of s',
denoted s ⪯ s', iff there exist n integers i_1 < ... < i_n such that s_k = s'_{i_k} for all k ∈ {1, ..., n}.
   Let D = {s^1, s^2, ..., s^N} be a dataset of N sequences. The support of any sequence s, denoted supp(s), is
the number of sequences s^i ∈ D that contain s:

                                        supp(s) = |{s^i ∈ D | s ⪯ s^i}|                                          (1)


   Let σ ∈ ℕ⁺ be a frequency threshold given by the data analyst. A sequence s is rare iff supp(s) < σ. In such
a case, the sequence s is called a rare sequential pattern, or a rare pattern for short.
   It is noteworthy that rare patterns are strongly related to frequent patterns, i.e. the sequences s such that
supp(s) ≥ σ: the set of rare patterns is the dual of the set of frequent patterns. A pattern that is not rare is
frequent and vice versa.
   Moreover, the rarity constraint is monotone, i.e. for two sequences s and s', if s ⪯ s' and s is rare, then s' is
rare. This property comes from the anti-monotonicity of the support measure.


Example 2.1 (Sequences and rare patterns) Table 1 presents a simple dataset D, defined over a set of
items I = {a, b, c, d}. The support of sequence p_1 = ⟨a,b⟩ is 4 and the support of p_2 = ⟨c,c⟩ is 1. For a support
threshold σ = 2, p_2 is a rare pattern but p_1 is not. Due to the monotonicity of rarity, every extension of p_2 is
also rare (e.g. ⟨a,c,c⟩).
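   To make these definitions concrete, here is a minimal Python sketch (ours, not part of the paper) of support and rarity on the Table 1 dataset; the names `is_subseq`, `supp` and `is_rare` are illustrative:

```python
# Table 1 dataset: 7 sequences over I = {a, b, c, d}
D = [list("dabc"), list("acbc"), list("abc"), list("abc"),
     list("ac"), list("b"), list("c")]

def is_subseq(p, s):
    """True iff p is a subsequence of s (the relation written p "is contained in" s)."""
    it = iter(s)
    return all(item in it for item in p)  # 'in' consumes the iterator, so order is kept

def supp(p):
    """Support of p: number of sequences of D that contain p (Equation 1)."""
    return sum(is_subseq(p, s) for s in D)

def is_rare(p, sigma):
    """p is rare iff supp(p) < sigma."""
    return supp(p) < sigma
```

With σ = 2 this reproduces Example 2.1: supp(⟨a,b⟩) = 4 (not rare), supp(⟨c,c⟩) = 1 (rare), and by monotonicity ⟨a,c,c⟩ is rare too.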

   We now introduce two additional definitions: zero-rare patterns and minimal rare patterns. A zero-rare
pattern is a rare pattern with a null support. This means that the pattern does not occur in any dataset sequence.
A non-zero-rare pattern is a rare pattern which is not zero-rare.
   Finally, a sequence s is a minimal rare pattern (mRP) iff it is rare but all its proper subsequences are frequent.


                                 mRP = {s | supp(s) < σ ∧ ∀s' ≺ s, supp(s') ≥ σ}                                  (2)




Figure 1: Lattice of sequential patterns of dataset D. Zero-rare patterns are omitted for the sake of clarity. In
white: frequent patterns; in grey: rare patterns; surrounded: minimal rare patterns for σ = 2.


   seq(1,1,d). seq(1,2,a). seq(1,3,b). seq(1,4,c).
   seq(2,1,a). seq(2,2,c). seq(2,3,b). seq(2,4,c).
   seq(3,1,a). seq(3,2,b). seq(3,3,c).
   seq(4,1,a). seq(4,2,b). seq(4,3,c).
   seq(5,1,a). seq(5,2,c).
   seq(6,1,b).
   seq(7,1,c).
                                 Listing 1: Facts specifying the sequence dataset of Table 1



Example 2.2 (minimal rare patterns) Figure 1 illustrates the lattice of non-zero-rare patterns in the dataset
D of Table 1 for σ = 2. For example, ⟨a,c,b⟩ is a non-zero-rare pattern, but it is not an mRP since one of its
subpatterns is rare (i.e. ⟨c,b⟩ ≺ ⟨a,c,b⟩). ⟨b,a⟩ (omitted in Figure 1) is a zero-rare pattern as it does not appear
in D.
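   A brute-force minimality check per Equation 2 only needs the n immediate sub-patterns of p (each obtained by dropping one item): every proper subsequence of p is a subsequence of one of them, and support is anti-monotone. A Python sketch of this test (our illustration, with hypothetical names):

```python
# Table 1 dataset
D = [list("dabc"), list("acbc"), list("abc"), list("abc"),
     list("ac"), list("b"), list("c")]

def supp(p):
    """Support of p in D (Equation 1)."""
    def is_subseq(p, s):
        it = iter(s)
        return all(x in it for x in p)
    return sum(is_subseq(p, s) for s in D)

def is_mrp(p, sigma):
    """Minimal rare pattern test (Equation 2): p rare, all proper subsequences frequent."""
    if supp(p) >= sigma:       # p must be rare
        return False
    # check the n sub-patterns obtained by dropping the u-th item of p
    return all(supp(p[:u] + p[u + 1:]) >= sigma for u in range(len(p)))
```

For σ = 2 this agrees with Example 2.2: ⟨c,c⟩ and ⟨c,b⟩ are mRPs, while ⟨a,c,b⟩ is not (its subpattern ⟨c,b⟩ is already rare), and neither is ⟨d,a⟩ (⟨d⟩ itself is rare).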

3 Mining rare sequential patterns with ASP
In this section, we detail ASP encodings for mining rare and minimal rare sequential patterns. ASP is a declarative
programming paradigm founded on the semantics of stable models [12]. From a general point of view, declarative
programming gives a description of what a problem is instead of specifying how to solve it. A program describes
the problem and a dedicated solver finds its solutions.
   An ASP program is a set of rules of the form a_0 :- a_1, ..., a_m, not a_{m+1}, ..., not a_n, where each a_i is a
propositional atom for 0 ≤ i ≤ n and not stands for default negation. Atoms can be encoded using a first-order logic
syntax. If n = 0, such a rule is called a fact. If a_0 is omitted, the rule represents an integrity constraint. The
reader can refer to Janhunen et al. [12] for a more detailed introduction to ASP. Dedicated solvers, such as
clingo [13], ensure the efficient solving of ASP encodings.
   In the following, Section 3.1 introduces the ASP generation of candidate sequential patterns, then the rarity
constraint is introduced in Section 3.2 and, finally, Section 3.3 introduces additional constraints to extract
efficiently only the minimal rare patterns.
   First of all, the dataset has to be encoded as facts. It is represented by atoms seq(s,t,i) stating that item i
occurs at time t in sequence s. Listing 1 illustrates the facts encoding the sequence dataset of Table 1.
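   The translation from raw sequences to seq/3 facts is purely mechanical; a possible Python helper (illustrative, not the authors' tooling):

```python
def to_facts(sequences):
    """Render sequences as ASP facts seq(s,t,i): item i at position t of sequence s."""
    return "\n".join(
        f"seq({s},{t},{item})."
        for s, seq in enumerate(sequences, start=1)
        for t, item in enumerate(seq, start=1)
    )

# Table 1 dataset
D = [list("dabc"), list("acbc"), list("abc"), list("abc"),
     list("ac"), list("b"), list("c")]
facts = to_facts(D)  # first fact: seq(1,1,d).
```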


3.1 Enumerate candidate sequential patterns in ASP
Listing 2 specifies the candidate generation. It reuses several notations and principles introduced by Gebser et
al. [11]. Line 1 lists all items present in the dataset: for each item i ∈ I, a unique atom item(i) is generated.
The token "_" stands for an anonymous variable that does not recur anywhere. Line 2 defines the length of
each sequence of the dataset: the length of a sequence is an item position for which there is no item afterwards.
Lines 4 to 7 enumerate all possible patterns. A pattern is a sequence of positions to fill with exactly one item
each¹. It is encoded by atoms pat(x,i) where x is a position in the pattern and i ∈ I. Predicate patpos/1
takes one variable and defines the positions of the pattern items. Rules with curly brackets in the head, such
as lines 5 and 7, are choice rules. The rule at line 5 chooses the pattern length, which cannot be larger than
maxlen. Line 7 chooses exactly one atom pat/2 for each position X and thus generates the combinatorics of
candidate patterns.

  ¹ We do not consider sequences of itemsets, for the sake of encoding simplicity.




Figure 2: Illustration of the fill-gaps embedding strategy (see Listing 2) for pattern ⟨abc⟩ in the sequence
⟨...a...b...bdc...b...⟩. Blank boxes can contain any item except a, b or c. Each black circle illustrates one
occ/3 atom in a model. ASP atoms of this predicate are detailed for three circles.
   Lines 9 to 13 evaluate the pattern support. This encoding uses the fill-gaps strategy proposed in [11] (see
Figure 2). The idea consists in mapping each sequence position to one (and exactly one) pattern position. In
addition, some sequence positions are associated with an item that does not match the item at the mapped
pattern position. These denote filling positions and indicate which part of the pattern has already been read.
Lines 9 to 11 traverse each sequence s = ⟨s_j⟩_{1≤j≤n} of D to yield atoms of the form occ(s,l,p), where l is a
position in the pattern and p is a position in the sequence s. occ(s,l,p) indicates that the first l items of
the pattern have been read by position p of the sequence. Line 9 searches for a position P compatible with the
first pattern item. Line 11 yields a new atom occ(s,l,p) when the first l-1 pattern items have been read by
position p-1 and the p-th sequence item matches the l-th pattern item. In any case, line 10 carries forward to
position p the part of the pattern already read by position p-1. support(s) is yielded at line 13 when sequence
s holds the whole pattern.

 1 item(I) :- seq(_,_,I).
 2 seqlen(S,L) :- seq(S,L,_), not seq(S,L+1,_).

 4 patpos(1).
 5 { patpos(X+1) } :- patpos(X), X < maxlen.
 6 patlen(L) :- patpos(L), not patpos(L+1).
 7 1 { pat(X,I) : item(I) } 1 :- patpos(X).

 9 occ(S,1,P) :- pat(1,I), seq(S,P,I).
10 occ(S,L,P) :- occ(S,L,P-1), seq(S,P,_).
11 occ(S,L,P) :- occ(S,L-1,P-1), seq(S,P,C), pat(L,C).

13 support(S) :- occ(S,L,LS), patlen(L), seqlen(S,LS).
                                 Listing 2: Encoding of candidate pattern generation


   For more details about the fill-gaps strategy and its alternatives, we refer the reader to the article of Gebser et
al. [11].
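   For intuition only, the occ/3 recurrence of lines 9 to 11 boils down to a left-to-right scan that tracks how much of the pattern prefix has been read so far; a procedural Python rendering of this idea (a sketch, not the ASP semantics themselves):

```python
def longest_prefix_read(pattern, seq):
    """Length of the longest pattern prefix read while scanning seq left to right,
    mirroring occ(S,L,P): line 10 carries the prefix forward over filling positions,
    line 11 advances it when the current item matches the next pattern item."""
    read = 0
    for item in seq:
        if read < len(pattern) and item == pattern[read]:
            read += 1
    return read

def supports(pattern, seq):
    # line 13: seq supports the pattern iff the whole pattern was read by its end
    return longest_prefix_read(pattern, seq) == len(pattern)
```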

3.2 Efficient encoding of the rarity constraint (ASP-RSM)
The modularity of ASP encodings makes it easy to add constraints to the previous program. Where Gebser et
al. added constraints to select only the frequent patterns, we select the answer sets that correspond to non-zero-
rare patterns.
   The following constraint states that the number of supported sequences must be strictly below the given
threshold th. th is a program constant that can be specified at solving time.
   :- #count{ S : support(S) } >= th.
   Although this is a correct formulation, we propose an alternative encoding that explicitly expresses the mono-
tonicity property to help the solver prune non-zero-rare patterns. This encoding introduces a rare/1 predicate
(see Listing 3). Instead of evaluating the support of pattern ⟨p_j⟩_{1≤j≤n} only, it evaluates independently the support of




Figure 3: mRP occurrences for pattern p = ⟨a,b,c,d⟩ and u = 3, on the left, in the case of a sequence that supports
p^u but not p. Grey disks illustrate occ/3 atoms and black circles illustrate spocc/4 atoms.
each of its prefixes. For any l, 1 ≤ l ≤ n, rare(l) means that the subpattern ⟨p_j⟩_{1≤j≤l} is rare. Line 18 imposes
the presence of the atom rare(n), where n is the pattern length. Line 17 gives a simpler manner to yield such
an atom: if a prefix-pattern of length l is rare, then all prefix-patterns of length l' > l are rare. This rule saves
the solver from evaluating all the rules of line 15, and it potentially avoids evaluating occ/3 atoms. Finally,
line 19 prunes zero-rare patterns by imposing at least one supported sequence.

15 rare(IL) :- IL=1..L, patlen(L),
16             #count{ S : occ(S,IL,LS), seqlen(S,LS) } < th.
17 rare(L) :- rare(L-1), L <= PL, patlen(PL).
18 :- not rare(L), patlen(L).
19 :- not support(S) : seq(S,_,_).

                                     Listing 3: Encoding of rare sequence mining
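   Procedurally, the rare/1 cascade amounts to testing pattern prefixes in increasing length and stopping at the first rare one; a minimal Python analogue (ours, for illustration only):

```python
# Table 1 dataset
D = [list("dabc"), list("acbc"), list("abc"), list("abc"),
     list("ac"), list("b"), list("c")]

def supp(p):
    """Support of p in D (Equation 1)."""
    def is_subseq(p, s):
        it = iter(s)
        return all(x in it for x in p)
    return sum(is_subseq(p, s) for s in D)

def is_nonzero_rare(pattern, th):
    """Non-zero-rare test: supp > 0 (line 19) and some prefix rare (lines 15-18)."""
    if supp(pattern) == 0:           # line 19 prunes zero-rare patterns
        return False
    for l in range(1, len(pattern) + 1):
        if supp(pattern[:l]) < th:   # monotonicity: every extension is rare too
            return True
    return False
```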

   As the number of non-zero-rare patterns may be very high, the following section presents ASP-MRSM, an
extension of ASP-RSM that extracts the condensed set of minimal rare patterns (mRP).


3.3 ASP Minimal Rare Sequence Miner (ASP-MRSM)
The encoding in Listing 4 can be used to mine minimal rare sequences. It is based on the observation that it
is sufficient to evaluate the support of all sub-patterns of size n - 1 to determine whether a pattern p of size n
is minimal. Let p = ⟨p_i⟩_{1≤i≤n} be a non-zero-rare pattern. p is an mRP iff, for all u ∈ {1, ..., n}, the sub-pattern
p^u = ⟨p_i⟩_{i∈{1,...,u-1,u+1,...,n}} is frequent (i.e. not rare). Indeed, according to Equation 2, p is an mRP iff every
s' ≺ p is frequent. It is then clear that if p is an mRP, then every p^u is frequent. Conversely, for every sequence s
such that s ≺ p, there exists u such that s ⪯ p^u. Then, as p^u is frequent and by anti-monotonicity of the
frequency constraint, s is frequent too.
   Furthermore, the encoding takes advantage of two properties of sequential patterns:

 1. a sequence that supports the pattern p supports each sub-pattern p^u;

 2. if a sequence s supports p^u but not p, then for all l ∈ [1, u - 1], occ(s,l,p_l) is part of the ASP
    model computed by Listing 2, where (p_l)_{l∈[1,n-1]} is an occurrence of p^u. In fact, as there is no difference
    between p^u and p before position u, there is no difference between their occurrences.

   These properties are helpful to decide whether p^u is rare. In Listing 4, line 20 enumerates all subpatterns,
identified by U. Similarly to occ/3 atoms, atoms spocc(s,u,l,p) describe the occurrences of p^u in sequence
s (Figure 3).
   Lines 22 to 26 compute occurrences of pattern p^u according to the fill-gaps principle (see Section 3.1) with
two differences: (1) spocc/4 atoms are yielded only if sequence s does not support p, otherwise the sequence
obviously supports p^u, knowing that p^u ≺ p; and (2) spocc/4 starts from pattern position u, reusing occ(s,u-1,_)
atoms. It thus avoids redundant computation, according to the second property above. Lines 28 to 33 determine
whether the subpattern p^u is rare. This encoding differs from Section 3.2 mainly at lines 30 to 32. The total
number of sequences supporting p^u is the number of sequences supporting p (line 32) plus the number of sequences
supporting p^u but not p (line 31), i.e. sequences for which spocc/4 atoms reached the last pattern item.
Finally, line 33 eliminates answer sets for which a subpattern is rare.
   It is worth noticing that in ASP two answer sets cannot share information during the solving process. Gebser
et al. [11] proposed to use a solver extension, called asprin, to encode the mining of closed patterns, i.e. patterns
that have no super-pattern with the same support. A similar approach could be used for mRP. Our approach
prefers a purely declarative but more efficient encoding.




20 suppat(U) :- U=1..L, patlen(L), L>1.

22 spocc(S,1,2,P) :- seq(S,P,I), pat(2,I), not support(S).
23 spocc(S,U,U,P) :- suppat(U), occ(S,U-1,P), not support(S).
24 spocc(S,U,L,P+1) :- spocc(S,U,L,P), seq(S,P+1,_).
25 spocc(S,U,L+1,P+1) :- spocc(S,U,L,P), seq(S,P+1,I),
26                       pat(L+1,I).

28 sprare(U,U-1) :- sprare(U-1), suppat(U).
29 sprare(U,L) :- sprare(U,L-1), L <= LP, patlen(LP).
30 sprare(U,IL) :- suppat(U), IL=U+1..L, patlen(L),
31                 #count{ S : spocc(S,U,IL,LS), seqlen(S,LS);
32                         S : support(S) } < th.
33 :- sprare(U,L), patlen(L).

                                 Listing 4: Encoding part for minimal rare patterns
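   The counting at lines 30 to 32 rests on the identity supp(p^u) = supp(p) + |{s ∈ D : s supports p^u but not p}|, which holds because any sequence supporting p also supports p^u. A quick numeric check on Table 1 (our illustration, not the authors' code):

```python
# Table 1 dataset
D = [list("dabc"), list("acbc"), list("abc"), list("abc"),
     list("ac"), list("b"), list("c")]

def is_subseq(p, s):
    it = iter(s)
    return all(x in it for x in p)

def supp(p):
    return sum(is_subseq(p, s) for s in D)

p = list("acb")          # supp(p) = 1 (only the second sequence)
u = 2                    # drop the third item: p_u = <a, c>
p_u = p[:u] + p[u + 1:]
extra = sum(is_subseq(p_u, s) and not is_subseq(p, s) for s in D)
# identity used by lines 30-32: supp(p_u) = supp(p) + extra
assert supp(p_u) == supp(p) + extra == 5
```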



4 Experiments and results
In this section, we evaluate our approach on synthetic and real datasets.           First, we use synthetic datasets to
compare performances of our ASP encodings with procedural algorithms. Then, we demonstrate the exibility
of our declarative pattern mining approach on a case study.
  In all experiments, we used    clingo, version 5.0. All programs were run on a computing server with 16Gb RAM.
Multi-threading capabilities of    clingo have been disabled (1) to make a fair comparison with the procedural
algorithms that is not parallel and (2) to prevent from having high variance in run times measurements due to
multi-threading strategies of   clingo.

4.1 Runtime and memory efficiency on synthetic datasets
In this first set of experiments, we use synthetic datasets to compare the efficiency of our ASP encodings with
procedural algorithms. Four approaches have been compared: ASP-RSM and Apriori-Rare extract non-zero-rare
patterns, while ASP-MRSM and MRG-EXP extract minimal rare patterns. Since Apriori-Rare [3] and MRG-EXP
[14] were originally developed for rare itemset mining, they were adapted to mine rare sequential patterns⁴.

   We use several synthetic datasets to measure the impact of various parameters, such as vocabulary size and
mean sequence length, on the runtime and on the memory usage. Our sequence generator⁴ produces datasets
following a 3-step retro-engineering procedure inspired by the IBM Quest Synthetic Data Generator: 1) a set
of random patterns is generated, 2) for each pattern, a list of occurrences is generated, and 3) synthetic sequences
are built so that each sequence contains a pattern iff the sequence is in the pattern's occurrence list. The sequences
are then completed with random items so that the mean sequence length of the resulting dataset reaches the
input parameter λ.
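   A simplified sketch of such a generator (the parameter names and proportions are our own choices, not the exact procedure used in the experiments):

```python
import random

def generate(n_seqs, n_patterns, vocab, lam, seed=0):
    """3-step generation: 1) draw random patterns, 2) assign each pattern a list of
    occurrence sequences, 3) embed patterns and pad with random items up to mean
    length lam (inserting extra items preserves patterns as subsequences)."""
    rng = random.Random(seed)
    patterns = [[rng.choice(vocab) for _ in range(rng.randint(2, 4))]
                for _ in range(n_patterns)]                        # step 1
    occurs = {i: rng.sample(range(n_seqs), max(1, n_seqs // 10))
              for i in range(n_patterns)}                          # step 2
    seqs = [[] for _ in range(n_seqs)]
    for i, pat in enumerate(patterns):                             # step 3
        for s in occurs[i]:
            seqs[s].extend(pat)
    for s in seqs:
        while len(s) < lam:
            s.insert(rng.randrange(len(s) + 1), rng.choice(vocab))
    return seqs, patterns, occurs

def contains(p, s):
    """Subsequence test used to check the generated data."""
    it = iter(s)
    return all(x in it for x in p)
```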
  Figure 4 presents runtimes with respect to dataset size (a), sequence length (b), and vocabulary size (c).
Table 2 presents the number of extracted patterns and the memory footprint.
  First, we can see in Figure 4-(a) that ASP-MRSM is an order of magnitude faster than the alternative approaches.
This is an important result, since previous work on declarative pattern mining has shown that declarative
approaches are often slower than their procedural counterparts [11, 15]. This experiment demonstrates that the
overhead induced by the declarative encoding is balanced by the more efficient solving strategy achieved by the
ASP solver.
  Second, we focus on ASP-RSM and Apriori-Rare (non-zero-rare patterns). In general, decreasing the threshold
induces a larger search space and thus higher runtimes. However, we can see that ASP-RSM is far less sensitive to
variations of the threshold than Apriori-Rare is. As a consequence, ASP-RSM is faster for very low thresholds.
Apriori-Rare is greatly impacted by the search space size, even though the number of rare patterns remains
almost constant, as can be seen in Table 2. In contrast, the runtime of ASP-RSM remains fairly low, even
when the threshold is low. This demonstrates again that the ASP solver is able to explore the search space more
efficiently than our procedural algorithm.

  ⁴ The source code of the ASP programs, of the procedural algorithms and of the dataset generator is available at http://www.
ilp-paper-1.co.nf/




[Figure 4: three runtime plots comparing ASP-MRSM, ASP-RSM, MRG-EXP and Apriori-Rare]

Figure 4: (a) computing times w.r.t. threshold for synthetic databases of size 10⁵ (with a mean sequence length
of 15). (b) and (c) are mean computing times w.r.t. sequence length and vocabulary size (with |D| = 250 and
σ = 20%).
  Figure 4-(b) shows that both declarative and procedural approaches are penalised by larger sequence lengths.
The algorithm behaviour on this plot is consistent with the previous plots, and ASP-MRSM remains the most
efficient approach.
  Figure 4-(c) shows the mean computing time with an increasing vocabulary size |I|. Understanding the
impact of the vocabulary size is not trivial, since a smaller vocabulary means fewer items to process but
also increases the dataset density. In Table 2, we can see that the fewer the items, the fewer the rare patterns
(with fixed dataset size |D| and mean length λ). Remark that the slope decreases with the vocabulary size: the
number of patterns increases quickly for small vocabularies, but slowly for larger vocabularies. This
behaviour is explained by the combinatorics of items: with many rare items, the mean length of mRPs is smaller,
because patterns of length 1 that are non-zero-rare become more likely (with our simulated datasets). Such
patterns are easy to extract and eliminate many patterns from the set of mRPs (i.e. all patterns that contain the
item). In fact, the shorter an mRP is, the more rare patterns it represents. As a consequence, the computation
time increases more slowly.


4.2 Application to rare care pathway analytics
We now apply our approach to care pathway analytics to illustrate that the number of rare patterns can be
reduced using additional expert constraints. This is not a new result, but we show that extracting rare patterns
becomes practically interesting with constraints. The ASP approach was crucial here to incorporate constraints
easily. Care pathway analytics is a research field that consists in exploring patient care pathways to understand
care uses, especially drug consumption in our case, and their impact on patients' health.
  This case study analyzes the care pathways of patients exposed to anti-epileptic drugs who had epileptic seizures.
Rare pattern mining aims at identifying (i) patients to exclude from the study because of their unexpected rare
care pathways, for instance due to specific heavy treatments (cancer, hepatitis C, AIDS, etc.); and (ii) rare adverse
drug reactions. The dataset is made of 500 care pathways with drug deliveries during one year. This represents
a total amount of 101,793 events, with |I| = 3,671. Drug deliveries are encoded with seq(t,p,e) atoms where e




   Threshold     #Rare      #mRP     ASP-RSM    ASP-MRSM   Apriori-Rare   MRG-EXP
     (σ)       (#patterns)                   (memory usage, bytes)

                              Dataset: 1000 sequences
     5%       1 011 117       260    1.0×10⁸    0.7×10⁸     2.5×10⁶      5.1×10⁶
     10%      1 011 993       527    1.0×10⁸    0.7×10⁸     5.3×10⁶      7.5×10⁶
     20%      1 012 360     2 280    1.1×10⁸    0.7×10⁸     9.6×10⁶      1.2×10⁷
     30%      1 012 428     7 123    1.3×10⁸    0.9×10⁸     9.8×10⁶      5.3×10⁷
                              Dataset: 5000 sequences
     5%         105 907       477    1.0×10⁸    3.5×10⁸     5.2×10⁵      5.8×10⁵
     10%        109 861       957    1.0×10⁸    3.5×10⁸     5.2×10⁶      7.3×10⁵
     20%        110 879     4 754    1.0×10⁸    3.7×10⁸     5.6×10⁶      3.7×10⁶
     30%        111 022    16 530    1.2×10⁸    4.8×10⁸     7.8×10⁶      3.8×10⁷
                              Dataset: 10000 sequences
     5%         108 266       456    2.0×10⁸    6.5×10⁸     1.3×10⁶      1.3×10⁶
     10%        110 334       734    2.0×10⁸    6.5×10⁸     3.4×10⁶      3.1×10⁶
     20%        110 955     4 086    2.0×10⁸    6.5×10⁸     5.6×10⁶      3.6×10⁶
     30%        111 036    10 177    2.0×10⁸    8.0×10⁸     7.2×10⁶      7.3×10⁶


Table 2: Memory consumption and number of patterns of ASP-RSM, ASP-MRSM, Apriori-Rare and MRG-EXP
w.r.t. dataset size and rarity threshold (with λ = 10).

                                                                               2,000
              2,000
                                                                  # Patterns
 # Patterns




              1,500                                                            1,000


              1,000                                                                0
                          20           40          60        80                        0         2               4                6
                                      max-gap                                                Minimal frequency threshold (%)


                Figure 5: Number of rare patterns extracted from care pathways               w.r.t expert constraints.
is the active component of the delivered drug.
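As an illustration of this input format, a few hypothetical facts are sketched below. The reading of the first two arguments (t identifying the care pathway and p the delivery position) is our assumption, as the paper only specifies the meaning of e:

```asp
% Hypothetical input facts: care pathway 1 receives amoxicillin
% at position 1, then lamotrigine at position 2.
seq(1,1,amoxicillin).
seq(1,2,lamotrigine).
% Care pathway 2 receives ceftriaxone at position 1.
seq(2,1,ceftriaxone).
```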

The rough mining of minimal rare sequential patterns with σ = 10% yields 7,758 mRPs of length at most 3
(maxlen = 3). Such a number of patterns makes the results difficult for clinicians to analyze.

Two types of constraints have been added in order to illustrate the pattern reduction. The first one is a
relaxation of rarity: very rare patterns are poorly significant, yet numerous. We thus introduce a
second parameter, fmin, such that the support of extracted patterns must lie between fmin and σ. The second
type of constraint concerns the maximum delay between two events of a pattern occurrence. This constraint,
known as the maxgap constraint, avoids enumerating an occurrence when two of its consecutive events are too
far apart in time. It encodes the assumption that the first event may be causally related to the second one.
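In ASP, each of these constraints amounts to a single integrity constraint. A minimal sketch is given below, assuming hypothetical predicates occ(P,X,T) (the X-th event of an occurrence in pathway P is mapped to time T) and support(S), and constants maxgap and fmin; the paper's actual encoding may use different names:

```asp
% Discard occurrences in which two consecutive mapped events
% are more than maxgap time units apart.
:- occ(P,X,T1), occ(P,X+1,T2), T2 - T1 > maxgap.

% Relaxation of rarity: discard patterns whose support
% falls below the minimal frequency threshold fmin.
:- support(S), S < fmin.
```

Because integrity constraints simply eliminate answer sets, such expert knowledge composes with the base encoding without modifying it.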

  Figure 5 illustrates the efficiency of these constraints in reducing the number of rare patterns with σ = 10% and
maxlen = 3. The plot on the left shows the number of patterns w.r.t. the maxgap constraint value (from
3 to 25 events). Since a larger max-gap is a weaker constraint, this number increases with maxgap.
The plot on the right shows the number of patterns w.r.t. the minimal frequency
constraint, fmin (from 0.4% to 6%). The closer fmin is to σ, the fewer patterns are generated; their number
decreases exponentially with fmin. The number of patterns extracted with fmin = 6% falls to 87, which
is sufficiently low to be presented to clinicians. Moreover, using expert constraints also reduces the
computation time: this final experiment is solved in 425 s, which is significantly faster than our
experiments on synthetic datasets.

  The two constraints can be combined. We chose to select patterns with fmin = 5% and maxgap = 3. In that
case, the number of patterns falls to 133. One interesting pattern describes deliveries of lamotrigine,
an anti-epileptic drug, co-occurring with two antibiotics, ceftriaxone and amoxicillin. This unexpected potential
adverse drug reaction may lead to new epidemiological questions.


5 Conclusion
This article extends the work of Gebser et al. [11] on declarative sequential pattern mining with Answer Set Pro-
gramming by proposing an efficient encoding for rare pattern mining. The paper shows the versatility of the
declarative approach for easily designing a new data mining task. More specifically, we have shown that ASP encod-
ings are an efficient solution compared to procedural approaches. Unfortunately, memory consumption limits
the analysis to mid-size databases. This problem is known in pattern mining with ASP [11], and a solution may
be to use dedicated propagators [15] to improve the computational efficiency of the approach. We have also illustrated the
benefit of our approach by extracting sequential patterns satisfying additional expert constraints, in the context
of a care pathway analytics application. In this context, our approach can efficiently prune useless patterns and
reduce computation times. As future work, we aim at finding and adding constraints through a human-machine
interface in which the user expresses their needs and the machine converts them into ASP constraints.


Acknowledgements
This work is a part of the PEPS project funded by the French national agency for medicines and health products
safety (ANSM), and of the SePaDec project funded by Region Bretagne.


References
 [1] Koh, Y.S., Ravana, S.D.: Unsupervised rare pattern mining: A survey. Transactions on Knowledge Discovery
     from Data 10(4) (2016) 1–29

 [2] Hu, Z., Wang, H., Zhu, J., Li, M., Qiao, Y., Deng, C.: Discovery of rare sequential topic patterns in
     document stream. In: Proceedings of the SIAM International Conference on Data Mining. (2014) 533–541

 [3] Szathmary, L.: Finding minimal rare itemsets with an extended version of the Apriori algorithm. In:
     Proceedings of the International Conference on Applied Informatics. Volume 1. (2014) 85–92

 [4] Muggleton, S., de Raedt, L.: Inductive logic programming: Theory and methods. The Journal of Logic
     Programming 19 (1994) 629–679

 [5] De Raedt, L., Guns, T., Nijssen, S.: Constraint programming for itemset mining. In: Proceedings of
     the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2008)
     204–212

 [6] Sebag, M., Rouveirol, C.: Constraint inductive logic programming. In: Advances in Inductive Logic
     Programming. IOS Press (1996) 277–294

 [7] Corapi, D., Russo, A., Lupu, E.: Inductive logic programming in answer set programming. In: International
     Conference on Inductive Logic Programming, Springer (2011) 91–97

 [8] Jabbour, S., Sais, L., Salhi, Y.: Decomposition based SAT encodings for itemset mining problems. In:
     Proceedings of the Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Part II.
     (2015) 662–674

 [9] De Raedt, L., Guns, T., Nijssen, S.: Constraint programming for data mining and machine learning. In:
     Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10). (2010) 1671–1675

[10] Guns, T., Dries, A., Nijssen, S., Tack, G., De Raedt, L.: MiningZinc: A declarative framework for constraint-
     based mining. Artificial Intelligence 244 (2017) 6–29

[11] Gebser, M., Guyet, T., Quiniou, R., Romero, J., Schaub, T.: Knowledge-based sequence mining with ASP.
     In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI). (2016)
     1497–1504

[12] Janhunen, T., Niemelä, I.: The answer set programming paradigm. AI Magazine 37 (2016) 13–24

[13] Gebser, M., Kaminski, R., Kaufmann, B., Ostrowski, M., Schaub, T., Schneider, M.: Potassco: The
     Potsdam answer set solving collection. AI Communications 24(2) (2011) 107–124

[14] Szathmary, L., Valtchev, P., Napoli, A.: Generating rare association rules using the minimal rare itemsets
     family. International Journal on Software and Informatics 4(3) (2010) 219–238

[15] Negrevergne, B., Guns, T.: Constraint-based sequence mining using constraint programming. In: Proceed-
     ings of the International Conference on Integration of AI and OR Techniques in Constraint Programming.
     (2015) 288–305