-

Sequence mining using FCA and the NextPriorityConcept algorithm

Salah Eddine Boukhetta

Christophe Demko

J´er´emy Richard

Karell Bertet

0 0 Laboratory L3i, La Rochelle University , La Rochelle , France

209 222

In this paper, we are interested in sequence mining using GALACTIC, a new library based on Formal Concept Analysis (FCA) for generating a concept lattice from heterogeneous and complex data. Inspired from pattern structure theory, data in Galactic are described by predicates according to their types and a system of plugins which allows an easy integration of new characteristics, descriptions and strategies. We present new plugins to describe and analyze a sequential dataset with predicates: the KCS description (where a set of sequences are described by the set of K-Common Subsequences), the MCS description (where Maximal Common Subsequences are used to describe a sequential dataset) and the PCS description (where the Prefix Common Subsequence is used). Experimentation on both real and synthetic datasets shows the effectiveness of our plugins in terms of the number of generated predicates and concepts.

Formal Concept Analysis Lattice Pattern structure Sequences Sequence mining

Sequences appear in many areas: sequences of words in a text, trajectories, sequences of life stage, sequences of surfing on the internet, or sequences of buying products in a supermarket. A sequence is an ordered list of symbols, sets or events. Sequence mining is a topic of data mining which aims at finding patterns in a database of sequences that appears more frequently, where patterns can be subsequences, prefixes, suffixes, subsequences according to a sliding window [ 23 ].

The first generation of algorithms emerged in the 90s with the article of Agrawal and Srikant [ 1 ] that extends the well-known Apriori algorithm to sequence mining. The Apriori algorithm can be used to discover frequent subsequences such as h{M ilk, Cof f ee}, {Sugar}i in a customer transaction database, indicating that {Milk, Coffee}, followed by {Sugar}, are frequently purchased together by most customers. Many algorithms were proposed to improve the time computation of Apriori algorithm, and could be categorized in three classes [ 26 ]: horizontal formatting methods (GSP [ 27 ], PSP [ 24 ]), vertical formatting methods (Spade [ 31 ], Spam [ 2 ]) and projection-based methods (FreeSpan [ 19 ],

PrefixSpan [ 25 ]). All these algorithms take as input a database of sequences and a minimum support threshold, and generate the set of frequent subsequences. Algorithms with horizontal formatting databases require multiple scans of the database, while using vertical formatting needs at most three database scans for the construction of the ID-list tables, which allows a fast calculation of the support [ 31 ]. Projection-based methods use projected databases by means of many types of extensions [ 19 ].

For big databases and a short minimum support, these algorithms take a huge computing time, and generate a too large number of patterns that are difficult to interpret and contain redundancy. For example, subsequences ha, b, ci, ha, bi, ha, ci, hb, ci, h i

a , hbi, and hci, all with the same support, are redondant while ha, b, ci clearly represents all the others.

A second generation of algorithms focuses on maximal patterns also denoted closed patterns because they verify a well-known property of closure. Many algorithms directly address this problem (CloSpan [ 30 ], ClaSP [ 18 ] and BIDE [ 28 ]). Some algorithms also appear within

Formal Concept Analysis (FCA) frameworks [ 15 ] and their extensions to pattern structures[ 14 ] [ 21 ], where the lattice property of closed patterns is promoted. We can mention here an article for analysing titles of papers using suffix tree [ 10 ], mining medical care trajectories using pattern structures [ 7 ], sequence mining to discover rare patterns [ 8 ]. Also other studies on demographic sequences with FCA and pattern structures, [ 16 ] and [ 17 ].

Formal Concept Analysis (FCA) appears in 1982 [ 29 ], then in the Ganter and Wille’s 1999 work [ 15 ]. It is based on mathematical ordered sets theory, and the theory of lattices introduced by Birkhoff in 1940 [ 5 ]. Starting from a binary relation between a set of objects and a set of attributes, formal concepts are built as maximal sets of objects in relation with maximal sets of attributes. The whole set of concepts equipped with a specialization/generalization relation forms a partially ordered set with the lattice property called the concept lattice. This lattice represents the initial data where concepts are clusters of similar objects, and the concept lattice is a hierarchy of clusters.

FCA is issued from a branch of applied lattice theory that first appeared in the Barbut and Monjardet’s 1970 work [ 3 ]. By means of derivation operators between objects and attributes forming a Galois connection and whose composition is a closure operator [ 4 ], nice bijections between (reduced) binary data, lattices and bases of rules have been stated, providing efficient methods of data analysis based on these canonical structures, thus with few tuning. The lattice property guarantees both a hierarchy of clusters, and a complete and consistent navigation structure for interactive approaches [ 12 ].

The formalism of pattern structures [ 14 ] [ 21 ] and abstract conceptual navigation [ 11,13 ] extend FCA to deal with non binary data, where data is described by patterns such that the patterns space must be organised as a semi-lattice in order to maintain a Galois connection between objects and their descriptions. By FCA framework, pattern lattice and bases of rules are defined, where a concept is composed of a subset of objects together with their common patterns, and a rule possesses patterns in premises and conclusions. However, pattern lattices are huge, often untractable [ 20 ], and the need for approaches to drive the search towards the most interesting patterns is a current challenge.

Inspired from pattern structures, the NextPriorityConcept algorithm, introduced in a recent article [ 9 ], proposes a user-driven patterns mining approach for heterogeneous and complex data as input. This algorithm allows a generic pattern computation through specific descriptions of objects by predicates. It also proposes to reduce predecessors of a concept by refinement of a set of objects into a fewer one through specific user exploration strategies, thus a reduction of the number of generated patterns.

In this article, we propose to specialize this NextPriorityConcept algorithm for new sequence mining approaches with three descriptions and two strategies dedicated to sequences. As descriptions, we propose the classical maximal common subsequences (MCS), the maximal prefix (PCS), and the k-subsequences (KCS). Therefore a set of sequences can be mined by one of these three descriptions in a generic way. We also propose two strategies of pattern exploration in order to reduce a given set of sequences into their predecessors in the pattern lattice. The first strategy (simple) considers all the possible subsets of sequences as in classical approaches, while the second one (AAS) proposes a refinement of a set of sequences by adding only one item to their common description.

Section 2 introduces a short description of the NextPriorityConcept algorithm and basic definitions related to sequence mining. Section 3 will be dedicated to our new sequence descriptions and strategies. Experimental results are presented in Section 4. 2 2.1

Preliminaries

Description of the NextPriorityConcept algorithm

The NextPriorityConcept algorithm [ 9 ] computes concepts for heterogeneous and complex data G as inputand its main characteristics are: Heterogeneous data as input, described by specific predicates. The algorithm introduces the notion of description δ as an application to provide predicates describing a set of objects A ⊆ G according to their characteristics, each predicate being specific to one type of characteristic. Predicates describing the objects A are locally computed by a specific treatment for each type of characteristics, depending on whether it is digital, discrete or more complex, and the final description δ is the union of these predicates. Each concept (A, δ(A)) is then composed of a subset of objects A together with a set of predicates δ(A) describing them. Such generic use of predicates makes it possible to consider heterogeneous data as input, i.e. digital, discrete or more complex data. As in pattern structutres, the descriptions must verify δ(A) v δ(A0) for A0 ⊆ A in order to obtain a lattice. However, unlike classical pattern structures, predicates are not globally computed in a preprocessing step, but locally for each concept.

Concept lattice generation. NextPriorityConcept algorithm is inspired from Bordat’s algorithm[ 6 ], also found in Linding’s work [ 22 ], that recursively computes the immediates successors of a concept, starting with the bottom concept. It is a dual version that computes the immediate predecessors of a concept, starting with the top concept (G, δ(G)) containing the whole set of objects, until no more concepts can be generated. The use of a priority queue ensures that each concept is generated before its predecessors, and a mechanism of propagation of constraints ensures that meets will be computed. NextPriorityConcept computes a concept lattice and therefore is positioned in FCA framework, with the possibility of extraction of rules, closure computations or navigation in the lattice, that can be useful in many fields of pattern mining and discovery.

Predecessors selection by specific strategies. The algorithm also introduces the notion of strategy σ to provide predicates (called selectors) generating the predecessors of a concept (A, δ(A)) according to each characterisitc. A selector proposes a way to refine the description δ(A) to a reduced set A0 ⊆ A of objects. Selectors can be defined either by a restriction on the description δ(A) or on the set A. Several strategies are possible to generate predecessors of a concept, going from the naive strategy classically used in FCA that considers all the possible predecessors, to strategies reducing the number of predecessors in order to obtain smaller lattices. Let us observe that selectors are only used for the predecessors generation, they are not kept either in the description or in the final set of predicates. Therefore, choosing or testing several strategies at each iteration in a user-driven pattern discovery approach would be interesting. It is also possible to introduce a filter (or meta-strategy) on the selector as for example maximal support (number of objects) of a concept, or entropy (according to a class information). Galactic. The use of predicates regardless of the data allows a generic implementation of our algorithm which, mixed with a system of plugins, makes it possible easy integration of new kinds of data (description and strategies). Galactic1 (GAlois LAttices, Concept Theory, Implicational systems and Closures) is a development platform of our algorithm allowing easy integration of new plugins.

The main result in [ 9 ] states that the NextPriorityConcept algorithm computes the formal context hG, P, IP i and its concept lattice (where P is the set of predicates describing the characteristics, and IP = {(a, p), a ∈ G, p ∈ P : p(a)} is the relation between objects and predicates) if the description δ verifies δ(A) v δ(A0) for A0 ⊆ A. The run-time of NextPriorityConcept algorithm has a complexity O(|B| |G| |P |2 (cσ + cδ)) (where B is the number of concepts, cσ is the cost of the strategy and cδ is the cost of the description), and a space memory in O(w |P |2) (where w is the width of the concept lattice).

1 https://galactic.univ-lr.fr Sequences

A sequence s = hs1, s2, . . . sni is a list of ordered events belonging to an alphabet Σ of event types. The length |s| of a sequence s is the number of its elements, here |s| = n.

Let us denote G a database of sequences.

Example 1 in Table 1 represents a database G which contains 4 sequences with the alphabet Σ = {A, C, G, T }.

A sequence a = ha1a2 . . . ani is a subsequence of a sequence b = hb1b2 . . . bmi, and we write2 a vs b if there are integers 1 ≤ i1 < i2 < . . . < in ≤ m such that aj = bij with j ≤ n. It is also said that b is a super-sequence of a, or b contains a. From the example in Table 1, we have 3 vs 4.

A sequence p = hp1, p2 . . . pmi, is a prefix of a sequence s = hs1, s2, . . . sni if m < n and pi = si for any i ≤ m. For example, hA, Gi is a prefix of sequence 1. A k-subsequence is a subsequence of length k. For a subset A ⊆ G of sequences, s is a common subsequence of A if, ∀x ∈ A, we have s vs x.

The support of a subsequence s is support(s) = |{x ∈ G : s vs x}|. A closed subsequence c is a subsequence that has no super-sequences with the same support, i.e if ∃c0 ∈ Σ∗ such that support(c) = support(c0) then c0 @s c or c0 = c.

1 hA, G, T i 2 hG, C, Gi 3 hT, C, Ai 4 hT, G, C, Ai NextPriorityConcept is a generic algorithm that considers any characteristic as input, that must be supplied with a description δ and a strategy σ. In order to mine sequences with this algorithm, we have to define descriptions and strategies for a set G of input sequences defined on an alphabet Σ.

A description δ is a mapping δ : 2G → 2P which defines a set of predicates δ(A) describing any subset A ⊆ G of sequences.

A strategy σ is an mapping σ : 2G → 2P which defines a set of selectors σ(A) to select strict subset A0 of A corresponding to the predecessors of any concept (A, δ(A)). 2 The symbol vs denotes a subsequence, to distinguish with the symbol v classicaly used for the subsumption

For sequences, predicates are based on subsequences, and are of the form ”a matches x” meaning x vs a. For better readability, the sets δ(A) and σ(A) will be treated either as sets of predicates or as sets of subsequences in the rest of this document, they can reciprocally be deduced from each other with a predicate on the form ”matches x” for each subsequence x. These predicates could be specialized according to the type of subsequences given by the description. For example, for a description that computes prefixes, predicates are ”starts with”. 3.1

Descriptions for sequences

We define three descriptions for a set A of sequences. The first one generates the classical maximal common subsequences of A. The second one is the restriction to prefix sub-sequences. The third one offers the possibility to limit the size of subsequences according to a given length:

The Maximal Common Subsequences (MCS) description is defined, for

a subset A ⊆ G of sequences by:

δmcs(A) = {x ∈ Σ∗ : ∀a ∈ A, x vs a and 6 ∃x0 ∈ δmcs(A) : x vs x0} (1) For this description we use the BIDE algorithm [ 28 ] to generate all maximal common subsequences, with a time complexity cδmcs = O(2|A|) ≤ O(2|G|)

The Prefix Common Subsequence (PCS) description is defined for a sub

set A ⊆ G of sequences by: δpcs(A) = {x ∈ Σ∗ : ∀a ∈ A, x prefix of a and 6 ∃x0 ∈ δpcs(A) : x prefix of x’} (2) The complexity of this description is cδpcs = O(|G|.|Σ|.lmax), where lmax is the length of the maximal sequence in A.

The K-Common Subsequence (KCS) description is defined for a subset

A ⊆ G and a length k by: δkcs(A, k) = {x ∈ Σ∗ : ∀a ∈ A, x vs a and |x| = k} (3) This description has the same time complexity of cδmcs because maximal common subsequences have to be computed, thus cδkcs = O(2|A|) ≤ O(2|G|).

Some examples of descriptions for the sequences in Table 1 are given in Table 2.

To ensure that NextPriorityConcept will generate a concept lattice, these three descriptions must verify δ(A) v δ(A0) for A0 ⊆ A where v is the subsumption relation between sets of subsequences defined by:

δ(A) v δ(A0) ⇐⇒ ∀x ∈ δ(A), ∃x0 ∈ δ(A0) such that x vs x0 Proposition 1 For A0 ⊆ A ⊆ G and a length k, we have the three following properties: 1. δmcs(A) v δmcs(A0) 2. δpcs(A) v δpcs(A0) 3. δkcs(A, k) v δkcs(A0, k) Proof: Let A and A0 be two subsets of G such that A0 ⊆ A. 1. Let c ∈ δmcs(A), i.e. c is a maximal subsequence of A. By A0 ⊆ A we can deduce that c is also a subsequence of sequences in A0, but c is not necessarily a maximal subsequence for A0. If c is a maximal subsequence in A0 then c ∈ δmcs(A0). Otherwise, there exists c0 ∈ δmcs(A0) such that c is a subsequence of c0. In these two cases, we can deduce that, δmcs(A) v δmcs(A0) 2. The proof is similar for the prefix common subsequence description δpcs. 3. For δkcs, the subsumption relation can be refined by the inclusion relation because it is obvious that k-subsequences of a set A of sequences also are k-subsequences of any subset A0 of A. Then δkcs(A) ⊆ δkcs(A0) and therefore δkcs(A) v δkcs(A0). 3.2

Strategies and selectors for sequences

NextPriorityConcept refines each concept (A, δ(A)) into predecessors with fewer sequences and more specific descriptions using an input strategy that gives a means of reducing the set of objects from the concept to its predecessors. In this section, we introduce two strategies for sequences. The first one considers all the possible subsets of A, whereas the second one consists in adding one element of the alphabet to the current common subsequences of the description. The Simple strategy σsimple is defined for a subset A of sequences and a description δ by: σSimple(A, δ) = {x : x ∈ δ(A0), A0 = A\{a}, for all a ∈ A} (4) The cost for this strategy depends on the cost of the description δ : cσSimple = O(cδ.|G|).

The Augmented Alphabet Subsequence (AAS) strategy σaas is defined

for a subset A and a descirption δ of sequences by: σaas(A, δ) = {x + a : x ∈ δ(A), a ∈ Σ} (5) Note that the x + a notation means the concatenation of the item a to the subsequence x. The cost for this strategy is caσas = O(|δ(A)|.|Σ|)

Table 3 gives some generated predicates for A = {2, 4} ( concept 5 in Figure 1, concept 2 in Figure 3) and with these two strategies. For our example in Table 2, those which generated predecessors are mentioned by adding the $ symbol with the concept number, from Figures 1 and 3.

Figure 2 displays the concept lattice generated with the AAS strategy and the PCS description. Figure 1 displays the concept lattice generated with the Simple strategy and the MCS description. Figures 3 and 4 display the two concept lattices generated with the simple strategy and the KCS description with resp. k = 2 and k = 3.

The concept $1 in Figure 2 shows that sequences 3 and 4 both start with the prefix T . We can observe that we obtain fewer concepts with k = 3 than with k = 2. In Figure 3, concept $2 contains the subsequence hG, Ci which is the only 2-common subsequence of sequences 2 and 4.

Clearly the predicates in the concept lattice in Figure 1 corresponds to all the closed subsequences since the MCS description generates maximal subsequences, and the simple strategy considers all the possible subsets A of sequences. We can observe that the other lattices contain fewer concepts.

Fig. 2. Hasse diagram of the reduced concept lattice Fig. 1. Hasse diagram of the reduced con- for the AAS strategy and the PCS description cept lattice for the Simple strategy and the MCS description To experimentally evaluate the effectiveness of our descriptions and strategies, we use Galactic3 (GAlois LAttices, Concept Theory, Implicational systems and Closures), a development platform of the NextPriorityConcept algorithm which, mixed with a system of plugins, makes it possible easy integration of new kinds of data (description and strategies). We have implemented new plugins for the sequences: MCS Match, PCS Match and KCS Match as description plugins, and Simple Match and AAS Match as strategies plugins. We consider the following synthetic and real datasets that are summarized in Table 4: 3 groups of synthetic datasets. They are generated by the IBM Quest Dataset Generator software hosted in the SPMF Website4. These datasets are generated by varying three parameters; the dataset size |G|, the alphabet size |A|, and the sequences size | |

S . Generated datasets are named as follows, G100A30S7 for the dataset where |G| = 100, | | = 30 and |S| = 7. In A total, 10 datasets were generated : G200A30S7, G100A30S7, G75A30S7, G50A30S7, G25A30S7, G100A20S7, G100A10S7, G100A30S5, G100A30S6, G100A30S8 2 real datasets. The Daily-actions dataset is a small database of sequences that represents daily actions of 25 individuals of L3i laboratory5, where daily actions can be [W akeup, Breakf ast, W ork, Dinner]. The Wine city dataset is issued from the museum data ”La cit´e du vin” in Bordeaux, France6, gathered from the visits on a period of one year (May 2016 to

3 https://galactic.univ-lr.fr

4 http://www.philippe-fournier-viger.com/spmf/ 5 https://l3i.univ-larochelle.fr/ 6 https://www.laciteduvin.com/en

May 2017). The museum is a large ”open-space”, where visitors are free to explore the museum the way they want, without predetermined path. When they arrive at the museum, they receive a small personal device to detect whenever a visitor is close to an animation spot called a module. The museum contains 20 modules, so we can say that extracted sequences for this data is quite short. By extracting the sequences of activation of each module, we end up with a precise enough idea of what the visit looked like for each visitor of the museum.

datasets dataset size alphabet size average sequence size Group 1 {25, 50, 75, 100, 200} 30 7 Group 2 100 {10, 20, 30} 7 Group 3 100 30 {5, 6, 7, 8} Daily-actions 25 12 8 Wine city {25, 50, 75, 100, 200} 30

Closed subsequences with the MCS description

Table 5 shows the number of predicates and concepts generated by δmcs/σsimple and δmcs/σaas for real and synthetic datasets, and the number of closed subsequences generated by Clospan[ 30 ].

Using δmcs/σsimple, we clearly generate all the closed subsequences since the MCS description generate closed subsequences, and the simple strategy considers all the possible subsets of sequences. We can indeed observe that the number of predicates is the same as the number of closed subsequences with Clospan. The number of concepts is greater than the number of predicates/closed subsequences because many concepts are generated from their successors without new predicates.

We can also observe that the AAS strategy generates less closed subsequences since only the right augmented subsequences are considered. But this difference is only observed with more than 100 sequences in G and more than 20 elements in A, meaning that the AAS strategy is close to the simple strategy, but with a better time complexity. This illustrates the possibility of reducing the closed subsequences generated according to a specific strategy.

KCS description and the Daily-actions dataset

Figure 5 and Table 6 show a comparison between MCS and KCS descriptions for synthetic and the Daily-actions datasets.

Clearly, the KCS description does not generate closed subsequences because we set the size of the subsequences that are not maximal. Therefore we obtain more predicates, but fewer concepts. Sequences are likely to share more subsequences of small length, but when we increase the length, concepts sharing the same subsequences became fewer. We can also observe that we get fewer concepts as the k parameter increases. The KCS description is clearly adapted to the Daily-actions dataset because small subsequences make more sense to analyze daily actions than the maximal / closed subsequences. The possibility to fix k allows to avoid the generation of all possible subsequences. prefixes since we were focused on the flow of visitors at the entrance of the museum, that is which paths were visited first. Here we can observe that we obtain fewer concepts and predicates with the PCS description compared to the MCS description that generates a very large number of closed subsequences. In this paper, we present a sequence mining approach using the NextPriorityConcept algorithm that generated pattern concepts in a user-driven patterns mining approach for heterogeneous and complex data as input. This algorithm allows a generic patterns computation through specific descriptions and strategies by predicates.

We present three descriptions and two strategies dedicated to the analysis of sequential data. As descriptions we propose the classical maximal common subsequences (MCS), the maximal prefix (PCS), or the k-subsequences (KCS). Therefore a set of sequences can be mined by one of these three descriptions in a generic way. We also propose two strategies of pattern exploration. The first strategy (simple) considers all the possible subsets of sequences as in classical approaches, while the second one (AAS) proposes a refinement of a set of sequences by adding only one item to their common description.

Sequence analysis of real world data may require a specific way of describing the sequences, and a specific strategy of exploration of the predecessors, and the NextPriorityConcept algorithm could be very useful since it allows several descriptions and strategies in a generic way. For example the prefix match description was triggered by the need to know which spaces are visited first in a museum, which common paths are chosen by visitors, or which spaces may be forgotten within the visit.

For future work, we are interested in finding new ways of describing sequences to make it more specific to the application, one of which is the use of distances between elements in the sequence, or specify a given window parameter to deal with long sequences.

1. Agrawal , R. , Srikant , R.: Mining sequential patterns . In: Data Engineering , 1995 . Proceedings of the Eleventh International Conference on. pp. 3 - 14 . IEEE ( 1995 )

2. Ayres , J. , Flannick , J. , Gehrke , J. , Yiu , T. : Sequential pattern mining using a bitmap representation . In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining . pp. 429 - 435 . ACM ( 2002 )

3. Barbut , M. , Monjardet , B. : Ordres et classifications : Alg`ebre et combinatoire . Hachette, Paris ( 1970 ), 2 tomes

4. Bertet , K. , Demko , Ch., Viaud , J.F. , Gu´erin, C.: Lattices, closure systems and implication bases: A survey of structural aspects and algorithms . Theoretical Computer Science 743 , 93 - 109 ( 2018 )

5. Birkhoff , G.: Lattice theory , vol. 25 . American Mathematical Soc. ( 1940 )

6. Bordat , J.P. : Calcul pratique du treillis de Galois d'une correspondance . Math´ematiques et Sciences humaines 96 , 31 - 47 ( 1986 )

7. Buzmakov , A. , Egho , E. , Jay , N. , Kuznetsov , S.O. , Napoli , A. , Ra¨ıssi, C.: Fca and pattern structures for mining care trajectories ( 2013 )

8. Codocedo , V. , Bosc , G. , Kaytoue , M. , Boulicaut , J.F. , Napoli , A. : A proposition for sequence mining using pattern structures . In: International Conference on Formal Concept Analysis . pp. 106 - 121 . Springer ( 2017 )

9. Demko , C. , Bertet , K. , Faucher , C. , Viaud , J.F. , Kuznetsov , S. : Next priority concept: A new and generic algorithm computing concepts from complex and heterogeneous data . arXiv preprint arXiv:1912 . 11038 ( 2019 )

10. Ferr´e, S.: The efficient computation of complete and concise substring scales with suffix trees . In: International Conference on Formal Concept Analysis . pp. 98 - 113 . Springer ( 2007 )

11. Ferr´e, S.: Reconciling Expressivity and Usability in Information Access - From Filesystems to the Semantic Web . Habilitation, University of Rennes 1, France (november 2014 )

12. Ferr´e, S.: Reconciling expressivity and usability in information access from file systems to the semantic web ( 2014 )

13. Ferr´e, S.: Syst`emes d'information logiques : un paradigme logico-contextuel pour interroger, naviguer et apprendre . Doctorat, University of Rennes 1, France (Oct 2002 )

14. Ganter , B. , Kuznetsov , S.O. : Pattern structures and their projections . In: LNCS of International Conference on Conceptual Structures (ICCS'01) . pp. 129 - 142 ( 2001 )

15. Ganter , B. , Wille , R.: Formal Concept Analysis, Mathematical foundations . Springer Verlag, Berlin ( 1999 )

16. Gizdatullin , D. , Baixeries , J. , Ignatov , D.I. , Mitrofanova , E. , Muratova , A. , Espy , T.H. : Learning interpretable prefix-based patterns from demographic sequences . In: International Conference on Intelligent Data Processing: Theory and Applications . pp. 74 - 91 . Springer ( 2016 )

17. Gizdatullin , D. , Ignatov , D. , Mitrofanova , E. , Muratova , A. : Classification of demographic sequences based on pattern structures and emerging patterns . In: Supplementary Proceedings of 14th International Conference on Formal Concept Analysis, ICFCA . pp. 49 - 66 ( 2017 )

18. Gomariz , A. , Campos , M. , Marin , R. , Goethals , B. : Clasp: An efficient algorithm for mining frequent closed sequences . In: Pacific-Asia Conference on Knowledge Discovery and Data Mining . pp. 50 - 61 . Springer ( 2013 )

19. Han , J ., Pei , J. , Mortazavi-Asl , B. , Chen , Q. , Dayal , U. , Hsu , M.C. : Freespan: frequent pattern-projected sequential pattern mining . In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining . pp. 355 - 359 . ACM ( 2000 )

20. Kaytoue , M. : Contributions to Pattern Discovery. Habilitation, University of Lyon, France (february 2020 )

21. Kaytoue , M. , Codocedo , V. , Buzmakov , A. , Baixeries , J. , Kuznetsov , S.O. , Napoli , A. : Pattern structures and concept lattices for data mining and knowledge processing . In: In Proceedings of ECML-PKDDl ( 2015 )

22. Linding , C. : Fast concept analysis . In: Working with Conceptual StructuresContributions to ICC . pp. 235 - 248 ( 2002 )

23. Mannila , H. , Toivonen , H. , Verkamo , A.I. : Discovery of frequent episodes in event sequences. Data mining and knowledge discovery 1(3 ), 259 - 289 ( 1997 )

24. Masseglia , F. , Cathala , F. , Poncelet , P.: The psp approach for mining sequential patterns . In: European Symposium on Principles of Data Mining and Knowledge Discovery . pp. 176 - 184 . Springer ( 1998 )

25. Pei , J ., Han, J ., Mortazavi-Asl , B. , Pinto , H. , Chen , Q. , Dayal , U. , Hsu , M.C. : Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth . In: icccn. p. 0215 . IEEE ( 2001 )

26. Pei , J ., Han, J ., Wang , W. : Mining sequential patterns with constraints in large databases . In: Proceedings of the eleventh international conference on Information and knowledge management . pp. 18 - 25 ( 2002 )

27. Srikant , R. , Agrawal , R.: Mining sequential patterns: Generalizations and performance improvements, edbt ( 1996 )

28. Wang , J ., Han, J .: Bide: Efficient mining of frequent closed sequences . In: Proceedings. 20th international conference on data engineering . pp. 79 - 90 . IEEE ( 2004 )

29. Wille , R.: Restructuring lattice theory : an approach based on hierarchies of concepts . Ordered sets pp. 445 - 470 ( 1982 ), i. Rival (ed.), Dordrecht-Boston, Reidel.

30. Yan , X. , Han, J ., Afshar , R.: Clospan: Mining: Closed sequential patterns in large datasets . In: Proceedings of the 2003 SIAM international conference on data mining . pp. 166 - 177 . SIAM ( 2003 )

31. Zaki , M.J.: Spade: An efficient algorithm for mining frequent sequences . Machine learning 42(1-2) , 31 - 60 ( 2001 )