<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A FCA-based analysis of sequential care tra jectories</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elias EGHO</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Jay</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chedy Raissi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amedeo Napoli</string-name>
          <email>amedeo.napoli@loria.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Orpailleur Team, LORIA</institution>
          ,
          <addr-line>Vandoeuvre-les-Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents a research work in the domains of sequential pattern mining and formal concept analysis. Using a combined method, we show how concept lattices and interestingness measures such as stability can improve the task of discovering knowledge in symbolic sequential data. We give example of a real medical application to illustrate how this approach can be useful to discover patterns of trajectories of care in a french medico-economical database.</p>
      </abstract>
      <kwd-group>
        <kwd>Data-Mining</kwd>
        <kwd>Formal Concept Analysis</kwd>
        <kwd>Sequential patterns</kwd>
        <kwd>stability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>indicates to what extent a pattern is frequent in a database. Many (sequential
and non sequential) itemset mining methods use support as measure for
nding interesting correlations in databases. However, the most relevant patterns
may not be the most frequent ones. Moreover, discovering interesting patterns
with low support leads generally to overwhelming results that need to be further
processed in order to be analyzed by human experts.</p>
      <p>
        Formal Concept Analysis (FCA) is a theory of data analysis introduced in
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], that is tightly connected with data-mining and especially the search of
frequent itemsets [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. FCA organizes information into a concept lattice
representing inherent structures existing in data. Recently, some authors proposed
new interest measures to reduce complex concept lattices and thus nd
interesting patterns. In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Kuznetsov introduces stability, successfully used in social
network and social community analysis [
        <xref ref-type="bibr" rid="ref6 ref7">7, 6</xref>
        ].
      </p>
      <p>To our knowledge, there are no similar approaches to nd interesting
sequential patterns. In this paper, we present an original experiment based on both
multilevel and multidimensional sequential patterns and lattice-based classi
cation. This experiment may be regarded from two points of view: on the one
hand, it is based on multilevel and multidimensional sequential patterns search,
and on the other hand, visualization and classi cation of extracted sequences
is based on Formal Concept Analysis (FCA) techniques, organizing them into
a lattice for analysis and interpretation. It has been motivated by the problem
of mining care trajectories in a regional healthcare system, using data from the
PMSI, the so called French hospital information system. The remaining of the
paper is organized as follows. In Section 2, we present the problem of mining
care trajectories. Section 3 presents the methods proposed in domains of
multilevel and multidimensional sequential patterns and Formal Concept Analysis. In
Section 4, we present some of the results we achieved.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Mining healthcare trajectories</title>
      <p>The PMSI (Programme de Medicalisation des Systemes d' information) database
is a national information system used in France to describe hospital activity
with both an economical and medical point of view. The PMSI is based on
the systematic collection of administrative and medical data. In this system,
every hospitalization leads to the collection of administrative, demographical and
medical data. This information is mainly used for billing and planning purposes.
Its structure can be described (and voluntarily simpli ed) as follows:
{ Entities (attributes):</p>
      <p>Patients (id, gender . . . )
Stays (id, hospital, principal diagnosis, . . . )
Associated Diagnoses (id)</p>
      <p>Procedures (id, date,. . . )
{ Relationships
a patient has 1 or more stays
a stay may have several procedures
a stay may have several associated diagnoses</p>
      <p>The collection of data is done with a minimum recordset using controlled
vocabularies and classi cations. For example, all diagnoses are coded with the
International Classi cation of Diseases (ICD10)1. Theses classi cations can be
used as taxonomies to feed the process of multilevel sequential pattern mining
as shown in gure 1.</p>
      <p>ICD 10</p>
      <sec id="sec-2-1">
        <title>Institutions taxonomy Fig. 1. Examples of taxonomies used in multilevel sequential pattern mining</title>
        <p>Healthcare management and planning play a key role for improving the
overall health level of the population. From a population point of view, even the
best and state-of-the-art therapy is not e ective if it cannot be delivered in
the right conditions. Actually, many determinants a ect the e ective delivery of
healthcare services: availability of trained personnel, availability of equipment,
security constraints, costs, proximity . . . . All of these should meet economics,
demographics, and epidemiological needs in a given area. This issue is especially
acute in the eld of cancer care where many institutions and professionals must
cooperate to deliver high level, long term, and costly care. Therefore, it is crucial
for healthcare managers and decision makers to be assisted by decision support
systems that give strategic insights about the intrinsic behavior of the healthcare
system.</p>
        <p>
          On the one hand, healthcare systems can be considered as rich in data as
they produce massive amounts of data such as electronic medical records,
clinical trial data, hospital records, administrative data, and so on. On the other
hand, they can be regarded as poor in knowledge as these data are rarely
embedded into a strategic decision-support resource [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. We used the PMSI system
as a source of data to study patient movements between several institutions.
By organizing themselves into groups of sequences representing trajectories of
care, we aim at discovering patterns describing the whole course of treatments
for a given population. This global approach contrasts with the usual statistical
exploitations of the PMSI data that focus mainly on single hospitalizations.
        </p>
        <p>In this experiment, we have worked on four years (2006 { 2009) of the PMSI
data of the Burgundy region related to patient su ering from lung cancer.</p>
        <sec id="sec-2-1-1">
          <title>1 http://apps.who.int/classi cations/apps/icd/icd10online/</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related work</title>
      <p>
        Sequential Pattern Mining
Let I be a nite set of items. A subset of I is called an itemset. A sequence
s = hs1s2 : : : ski (si I) is an ordered list of itemsets. A sequence s = hs1s2 : : : sni
is a subsequence of a sequence s0 = hs01s02 : : : s0mi if and only if 9i1; i2; : : : in such
that i1 i2 : : : in and s1 s0i1 ; s2 s0i2 : : : an s0in . We note s s0 and
also say that s0 contains s. Let D = fs1; s2 : : : sng be a database of sequences.
The support of a sequence s in D is the proportion of sequences of D containing
s. Given a minsup threshold, the problem of frequent sequential pattern
mining consists in nding the set FS of sequences whose support is not less than
minsup. Following the seminal work of Agrawal and Srikant [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and the Apriori
algorithm, many studies have contributed to the e cient mining of sequential
pattern. The main approaches are Pre xSpan [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], SPADE [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], SPAM [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], PSP
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], DISC [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and PAID [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        Much work has been done in the area of single-dimensional sequential
patterns, i.e, all the items in a sequence have the same nature like the sequence
of products sold in a certain store. But in many cases, the information in a
sequence can be based on several dimensions. For example: a male patient had a
surgical operation in Hospital A and then received chemotherapy in Hospital B.
In this case, we have 3 dimensions: gender, type of treatment (chemotherapy,
surgery) and location (Hospitals A and B). Pinto et al [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is the rst work
giving solutions for mining multidimensional sequential patterns. They propose
to include some dimensions in the rst or the last itemset in the sequence. But
this works only for dimensions that remain constant over time, such as gender
in our previous example. Among other proposals addressed in this area, Yu et al
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] consider multidimensional sequential pattern mining in the web domain. In
their approach, dimensions are pages, sessions and days. They present two
algorithms AprioriMD and Pre xMDSpan by modifying the Apriori and Pre xSpan
algorithms. Zhang et al [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] propose the mining of multidimensional sequential
patterns in distributed system.
      </p>
      <p>Moreover, each dimension can be represented by di erent levels of
granularity, using a taxonomy which de nes the hierarchical relations between items.
Figure 2 shows an example of a diseases taxonomy. Including knowledge
contained in the taxonomy leads to the problem of multilevel sequential pattern
mining. Its interest resides in the capacity to extract more or less general/speci c
sequential patterns and overcome problems of excessive granularity and low
support. For example, using the diseases taxonomy in Figure 2, sequences such as
hHeartDisease; BrainDisease &gt; could be extracted while hArryth:; BrainDisease &gt;
and hMyoc:Inf:; BrainDisease &gt; may have a too low support.</p>
      <sec id="sec-3-1">
        <title>Although Srikant and Agrawal [14] early introduced hierarchy management in the</title>
        <p>
          extraction of association rules and sequential patterns, their approach was not scalable
in a multidimensional context. Han et al [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] proposed a method for mining multiple
level association rules in large databases. But their approach could not extract patterns
containing items from di erent levels in the taxonomy. Plantevit et al [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] proposed
M3SP, a method taking both multilevel and multidimensional aspects into account.
M3SP is able to nd sequential patterns with the most appropriate level of granularity.
        </p>
        <p>
          The PMSI is a multidimensional database holding information coded with controled
vocabularies and taxonomies. Therefore, we relied on M3SP to extract multilevel and
multidimensional sequential patterns. Nevertheless, the M3SP paradigm is still the
search of frequent patterns. As our objective is to discover interesting patterns that
may be infrequent, we ran M3SP iteratively until very low support thresholds. (See
appendix for more details about M3SP and how we used it). This produced massive
amounts of patterns requiring further processing for a practical interpretation by a
domain expert. This next phase was conducted with a lattice-based classi cation of
sequential patterns described in the following section.
Introduced by Wille [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], Formal Concept Analysis is based on the mathematical order
theory. FCA has successfully been applied to many elds, such as medicine and
psychology, musicology, linguistic databases, information science, software engineering . . . .
A strong feature of Formal Concept Analysis is its capability of producing graphical
visualizations of the inherent structures among data.
        </p>
        <p>FCA starts with a formal context K = (G; M; I) where G is a set of objects, M is a
set of attributes, and the binary relation I = G M speci es which objects have which
attributes. Two operators, both denoted by 0, connect the power sets of objects 2G and
attributes 2M as follows:
0 : 2G ! 2M; X0 = fm 2 Mj8g 2 X; gImg
0 : 2M ! 2G; Y0 = fg 2 Gj8m 2 Y; gImg
The operator 0 is dually de ned on attributes. The pair of 0 operators induces a Galois
connection between 2G and 2M. The composition operators 00 are closure operators: they
are idempotent, extensive and monotonous. For any A G and B M, A00 and B00 are
closed sets whenever A = A00 and B = B00.</p>
        <p>A formal concept of the context K = (G; M; I) is a pair (A; B) G M where A0 = B
and B0 = A. A is called the extent and B is called the intent. A concept (A1; B1) is a
subconcept of a concept (A2; B2) if A1 A2 (which is equivalent to B2 B1) and we write
(A1; B1) (A2; B2). The set B of all concepts of a formal context K together with the
partial order relation forms a lattice and is called concept lattice of K. This lattice
can be represented as a Hasse diagram providing a visual support for interpretation.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Classi cation and selection of interesting care trajectories</title>
      <p>We use FCA to classify and lter the results of the sequential mining step. The formal
context is built by taking patients as objects, and sequential patterns as attributes. A
patient p, considered as a sequence, is related to a sequential pattern s if p contains
s. Table 4 shows a formal context KP S representing the binary relation between the
patients and the sequences. The cross indicates that the patient has passed completely
in the sequence of the health facilities. Thus, we achieve a classi cation of patients
according to their trajectories of care.</p>
      <p>Seq1 Seq2 Seq3 Seq4
P1 x x x
P2 x
P3 x x x</p>
      <p>P4 x</p>
      <sec id="sec-4-1">
        <title>The stability index of a concept indicates how much the concept intent depends</title>
        <p>on particular objects of the extent. It indicates the probability of preserving concept
intent while removing some objects of its extent. A stable concept continues to be a
concept even if a few members stop being members. This means also that a stable
concept is resistant to noise and will not collapse when some members are removed
from its extent.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Stability o ers an alternative point of view on concepts compared to the well known metric of support based on frequency, which is noticely used to build iceberg lattices [15]. Actually, combining support and stability allows a more subtle interpretation, as shown in a previous work in the same application domain [6].</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>Patient healthcare trajectories
The PMSI is a relational database holding informations for any hospitalization in
France. We reconstituted patient care trajectories from PMSI data considering each
stay as an itemset. The sequence of stays for a same patient de nes his care trajectory.
In our experiment, itemsets could be made of various combinations of dimensions.
Table 2 shows the trajectories of care obtained using two dimension (principal diagnosis,
hospital ID). For example (C341,210780581) represents one hospitalization for a patient
in the University Hospital of Dijon (coded as 210780581) treated for a lung cancer
(C341). Our dataset contained 486 patients su ering from lung cancer and living in
the French region of Burgundy.</p>
      <p>Table 3 shows some of the patterns generated by M3SP with the data presented in
Table 2 using taxonomies of Figure 1. Pattern 3 can be interpreted as follows: 36% of
patients have a hospitalization in a private institution (CL), for any kind of principal
diagnosis (ALL). Then, 3 hospitalizations follow with the same principal diagnosis
(Z511 coding for chemotherapy). That kind of pattern demonstrates the interest of
multilevel and multidimensional sequential pattern mining: though principal diagnosis
are the same in the third last stays, hospitals can be di erent. Mining at the lowest
level of granularity, without taxonomies, would generate many di erent patterns with
lower support.</p>
      <sec id="sec-5-1">
        <title>However, for low support thresholds, the number of extracted patterns dramatically grows with the size of the database, depending on the number of patients, the size of the taxonomies and the number of dimensions as shown in Table 4.</title>
      </sec>
      <sec id="sec-5-2">
        <title>Dimensions used</title>
        <p>Institutions
Principal Diagnosis, Institution
All diagnoses
Institutions, Medical Procedures</p>
      </sec>
      <sec id="sec-5-3">
        <title>Number of patterns</title>
        <p>1529
4051
50546
293402</p>
      </sec>
      <sec id="sec-5-4">
        <title>The next step consists in building a lattice with the resulting sequential patterns in order to facilitate interpretation and selection of interesting care trajectories.</title>
        <p>Lattice-based classi cation of sequential patterns
We illustrate this approach with patterns representing the sequences of institutions
that are frequent in the patients set. We built a formal context relating 486 patients
and 1529 sequential patterns. These sequences are generated in the rst experimental
by considering only one dimension (healthcare institutions). It is characterized with
a taxonomy with two levels of granularity. We iteratively applied M3SP, decreasing
threshold by one patient at each step. The resulting lattice has 10145 concepts
organized on 48 di erent levels. Figure 3 shows the upper part of the lattice. Concepts
intents are sets of one or more sequential patterns. From the lowest right concept, we
can see that 37 patients support 3 sequential patterns:
{ at least one hospitalization in the hospital 690781810
{ they were hospitalized at least once in a University Hospital (CHU/CHR)
{ they had at least 2 hospitalization, for simplicity, 2*(ALL) is the contraction of
(ALL)(ALL).</p>
        <p>The intent of top concept is h(ALL)i, because all patients have at least one
hospitalization during their treatment. The intent of co-atoms (i.e. immediate descendant of
top) is always a sequence of length one, holding items of high level of granularity.
&lt;{(All)}&gt;</p>
        <p>Nb=486
&lt;{(CL)}&gt;
&lt;{(All)}&gt;</p>
        <p>Nb=287
&lt;2*{(All)}&gt;</p>
        <p>Nb=475
&lt;{(CHU/CHR)}</p>
        <p>&gt;
&lt;{(All)}&gt;
Nb=259
&lt;{(CH)}&gt;
&lt;{(All)}&gt;
Nb=279
…  
&lt;3*{(All)}&gt;</p>
        <p>Nb=459
&lt;{(710780354)}&gt;
&lt;{(CL)}&gt;
&lt;{(All)}&gt;
Nb=12
&lt;{(890000409)}&gt;
&lt;{(CH)}&gt;
&lt;{(All)}&gt;
Nb=457
&lt;{(CHU/CHR)}&gt;
&lt;2*{(All)}&gt;</p>
        <p>Nb=254
&lt;{(CH)}&gt;
&lt;2*{(All)}&gt;</p>
        <p>Nb=277
…   &lt;4*{(All)}&gt;</p>
        <p>Nb=445
&lt;{(210780714)}&gt;
&lt;{(CH)}&gt;
&lt;2*{(All)}&gt;</p>
        <p>Nb=23
&lt;{(710780354)}&gt;
&lt;{(210780979)}&gt;
&lt;{(CL)}&gt;
&lt;{(All)}&gt;</p>
        <p>Nb=8
&lt;{(710780958)}&gt;
&lt;{(CH)}&gt;
&lt;2*{(All)}&gt;</p>
        <p>Nb=32
&lt;{(210780581)}&gt;
&lt;{(CHU/CHR)}&gt;
&lt;{(All)}&gt;</p>
        <p>Nb=156
&lt;{(690781810)}&gt;
&lt;{(CHU/CHR)}&gt;
&lt;2*{(All)}&gt;</p>
        <p>Nb=37
…  
…  </p>
      </sec>
      <sec id="sec-5-5">
        <title>Filtering concepts can be achieved using both support and stability. In order to</title>
        <p>highlight the interesting properties of stability, we try to answer the question \is there a
number of hospitalizations that characterizes care trajectories for lung cancer?". A basic
scheme in lung cancer treatment consists generally in a sequence of 4 chemotherapy
sessions possibly following a surgical operation. Due to noise in data or variability in
practices, we may observe sequences of 4, 5, 6 or more stays in the PMSI database.
Mining such data with an a priori xed support threshold may not discover the most
interesting patterns. If the threshold is too high, we simply miss the good pattern. If
it is too low, similar patterns, di ering only in length, with close values of support can
be extracted. Figure 4 shows the power of stability in discriminating such patterns.
The concept with intent h(CL)ih2 (ALL)i is the most frequent. It represents patients
with at least a stay in a private organization, and at least 2 stays in hospital. Similar
concepts have a relatively close support, and di er only in the total number of stays.
The concept with 5 stays has the highest stability. This probably matches the basic
treatment scheme of lung cancer. Our interpretation relies on the power of stability to
point out noisy concepts. Actually, only a few patients in concept h(CL)ih2 (ALL)i
have only 2 stays.</p>
        <p>Another interesting feature of lattice-based classi cation of sequential patterns lies
in its ability to characterize objects by several patterns. Let consider the minimal
database of sequences D = fs1 = h(a)(b)(c)i; s2 = h(a)(c)(b)i : s3 = h(d)ig. With a 2/3
threshold, h(a)(b)i and h(a)(c)i are considered as frequent sequential patterns, but
sequential pattern mining will give no information about the fact that all sequences
containing the pattern h(a)(b)i contain also the pattern h(a)(c)i. However this
information can be obtained by classifying sequential patterns with FCA.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this paper we propose an original combination of sequential pattern mining and FCA
to explore a database of multidimensional sequences. We show some interesting
properties of concept lattices and stability index to classify and select interesting sequential
patterns. This work is in a early step. Further developments can be made in several
axes. First, other measures of interest could be investigated to qualify sequential
patterns. Furthermore, connexions between FCA and the sequential mining problem could
be explored in a more integrative approach, especially by studying closure operators
on sequences.
7</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The authors wish to thank the TRAJCAN project for its nancial support and Mrs.
Catherine QUANTIN, the responsible of TRAJCAN project at university hospital of</p>
      <sec id="sec-7-1">
        <title>Dijon.</title>
        <p>
          The M3SP algorithm is able to extract sequential patterns characterized by several
dimensions with di erent levels of granularity for each dimension [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Each dimension
has a taxonomy which de nes the hierarchical relations between items. M3SP runs in
three steps: data pre-processing , MAF-item generation and sequence mining.
        </p>
      </sec>
      <sec id="sec-7-2">
        <title>In Figure 5, we present an example to illustrate the mechanism of M3SP. Table</title>
        <p>5b shows a dataset of hospitalizations relating patients (P) with attributes from three
dimensions
{ T, the date of stay,
{ H, the healthcare setting in which the hospitalization takes place,
{ D, the disease of the patient.</p>
      </sec>
      <sec id="sec-7-3">
        <title>For instance, the rst tuple means that, at date 1, the patient 1 has been treated</title>
        <p>for the disease D11 in hospital H11. Let us now assume that we want to extract all
multidimensional sequences that deal with hospitals and diseases that are frequent in
the patients set. Figure 5a displays a taxonomy for dimensions H and D.
Pre-processing step
M3SP considers three types of dimensions: a temporal dimension Dt, a set of analysis
dimensions DA, and a set of reference dimensions DR. M3SP orders the dataset according
to Dt. The tuples appearing in a sequence are de ned over the dimensions of DA. The
support of the sequences is computed according to dimensions of DR. M3SP splits the
dataset into blocks according to distinct tuple values over reference dimensions. The
support of a given multidimensional sequence is the ratio of the number of blocks
supporting the sequence over the total number of blocks. In our example, H (hospitals)
and D (diseases) are the analysis dimensions, T is the temporal dimension and P
(patients) is the only reference dimension. We obtain two blocks de ned by Patient1
and Patient2. as shown in table 5c.</p>
        <p>MAF-item generation step
In this step, M3SP generates all the Maximal Atomic Frequent items or MAF-items.</p>
      </sec>
      <sec id="sec-7-4">
        <title>In order to de ne MAF-items, we srt de ne the speci city relation between items.</title>
        <p>Speci city relation. Given two multidimensional items a = (d1; :::; dm) and
a0 = (d01; :::; d0m), a0 is said to be more speci c than a, denoted by a 4I a0 , if for every
i = 1; :::; m, d0i 2 di #. Where di # is the set of all direct specializations of di according
to the dimension taxonomy of di. In our example, we have (H1; D1) 4I (H1; D11), because
H1 2 H1 # and D1 2 D11 #.</p>
        <p>MAF-item. An atomic item a is said to be a Maximal Atomic Frequent item, or a
MAF-item, if a is frequent and if for every a0 such that a 4I a0 , the item a0 is not
frequent. In our example, if we consider minsup = 100%, b = (H1; D1) is a MAF-item,
because it is frequent and there is not another item as frequent and more speci c than
b.</p>
      </sec>
      <sec id="sec-7-5">
        <title>The computation of MAF-items is represented by a tree in which the nodes are</title>
        <p>of the form (d1; d2)s, meaning that (d1; d2)s is an atomic item with support s as we
show in Figure 5d. In this tree, MAF-items are displayed as boxed nodes. We note
that all leaves are not necessarily MAF-items. For example, (H2; D21)100% is a leaf, but
not a MAF-item. This is because (H2; D21)100% 4I (H21; D21)100% and (H21; D21) has been
identi ed as being an MAF-item.</p>
        <p>Sequence mining step
Frequent sequences can be mined using any standard sequential pattern-mining
algorithm (Pre xSpan in this work). Since in such algorithms, the dataset to be mined is
a set-pairs of the form (id; seq), where id is a sequence identi er and seq is a sequence
of itemsets, our example dataset is transformed as follows :
{ every MAF-item is associated with a unique identi er denoted by ID(a) (table</p>
      </sec>
      <sec id="sec-7-6">
        <title>5e), playing the role of the items in standard algorithms.</title>
        <p>{ every block b is assigned a patient identi er ID(p), playing the role of the sequence
identi ers in standard algorithms,
{ every block b transformed into a pair (ID(b); (b)), where (b) is a sequence. (table
5f)</p>
        <p>Pre xSpan is run over table 5f. By considering a support threshold minsup =50%,
table 5g displays all the frequent sequences in their transformed format as well in their
multidimensional format in which identi ers are replaced with their actual values.</p>
      </sec>
      <sec id="sec-7-7">
        <title>The basic step in M3SP method is MAF-item generation, because it provides all multidimensional items that occur in sequences to be mined. If the set of MAF-items is changed, the sequence will be changed. M3SP always extracts the most speci c multidimensional items.</title>
        <p>For example (H1; D1) is frequent according to minsup=50%, but another item, (H11; D11)
is more speci c and still frequent. As a result, (H1; D1) is not a MAF-item and
consenquently not used to build squences. Finaly the frequent sequence hf(H1; D1); (H21; D21)gi
does not appear in the results of M3SP. However, tables 5 and 6 show the MAF-items
set and the frequent sequences extracted by M3SP at a 100% threshold. It can be
noticed that (H1; D1) is a MAF-item and that the sequence hf(H1; D1); (H21; D21)gi is
generated.</p>
        <sec id="sec-7-7-1">
          <title>Thus, given two minsup thresholds 0 &lt; . The set of frequent sequences obtained</title>
          <p>for 0 may not always contain the set of sequences obtained for .</p>
        </sec>
      </sec>
      <sec id="sec-7-8">
        <title>Considering this as a limit in our approach as we wanted to extract both general and speci c sequences, we iteratively applied M3SP, decreasing threshold by one patient at each step. This allowed us to extract more potentially interesting sequences than by using a single low minsup threshold.</title>
      </sec>
      <sec id="sec-7-9">
        <title>MAF-item</title>
        <p>(H1; D1)
(H21; D21)</p>
      </sec>
      <sec id="sec-7-10">
        <title>Frequent Multidimensional Sequences</title>
        <p>hf(H1; D1)gi
hf(H21; D21)gi
hf(H1; D1); (H21; D21)gi
/ d
a r
      P co
T 1 2 c
a
o
n
o
i
t
    i
  1 1 t</p>
        <p>1 2 r
D a</p>
        <p>D D p</p>
        <p> 
    1 k
  1 1 t c</p>
        <p>1 2 n lo
H</p>
        <p>H H e B
/ .</p>
        <p>)
a c
      P (5
T 1 2
e
r
.
}
t
       
  1 1</p>
        <p>1 2
D
12 21 t</p>
        <p>e
D D D D S
a
t
       
  1 1 2 1 a</p>
        <p>1 2 1 2 D
H .</p>
        <p>H H H H )
b
(
          5</p>
        <p>e
T 1 2 1 2 l</p>
        <p>a
2  xo 2 
2 a 2   's
H t  D y e</p>
        <p>'s m n
 2  1 s   1 o io</p>
        <p>H 2 n  2 2 n s
D
,
*
(
D
 
,
1
e
S m
I
I
I
 
s
e
c
n
e
n
    e
&gt; &gt; u
)} )} q</p>
        <p>e
1 1
2 2 S
H H t
( ( n
{ { e
&lt; &lt; u
 
)
a      
( 1 2 3
D
I
 
      a
) ) )
m 1 2 1 M
e 1 1 2 .
-it­‐ ,D ,D ,D ()e
f 1 2 1
1 1 2
T
.
)
f
(
T
 
%
n</p>
        <p>0
  a
  % m p 5
% 0 te S =
0 0 i x p
10 )1 ic fi u
) 1 e s
2 2 m r in</p>
        <p>m
 
o
  D D t P
0% ,2 ,2 ta
0 H H n
1 ( ( e
) u
q
 
) ,*
,* 2     %  fer
b
a
t
(* (H ,)*H21100% ,)D211200% ,)D2112100 .)(fereodT ()ςb     ,)(()&gt;13  ,)(()&gt;23  fsreoadDm
( H H 5 &lt; &lt; n
( ( e a
r r</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abidi</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          :
          <article-title>Knowledge management in healthcare: towards 'knowledge-driven' decision-support services</article-title>
          .
          <source>Int J Med Inform</source>
          <volume>63</volume>
          (
          <issue>1-2</issue>
          ),
          <volume>5</volume>
          {
          <fpage>18</fpage>
          (Sep
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srikant</surname>
          </string-name>
          , R.:
          <article-title>Mining sequential patterns</article-title>
          . In: Yu,
          <string-name>
            <given-names>P.S.</given-names>
            ,
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.S.P</surname>
          </string-name>
          . (eds.)
          <source>Eleventh International Conference on Data Engineering</source>
          . pp.
          <volume>3</volume>
          {
          <fpage>14</fpage>
          . IEEE Computer Society Press, Taipei, Taiwan (
          <year>1995</year>
          ), citeseer.ist.psu.edu/agrawal95mining.html
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ayres</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gehrke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yiu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flannick</surname>
          </string-name>
          , J.:
          <article-title>Sequential pattern mining using a bitmap representation</article-title>
          . pp.
          <volume>429</volume>
          {
          <fpage>435</fpage>
          . ACM Press (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. ying Chiu, D.,
          <string-name>
            <surname>hung Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>A.L.P.:</given-names>
          </string-name>
          <article-title>An e cient algorithm for mining frequent sequences by a new strategy without support counting</article-title>
          .
          <source>In: In Proceedings of the 20th International Conference on Data Engineering (ICDE'04</source>
          . pp.
          <volume>375</volume>
          {
          <fpage>386</fpage>
          . IEEE Computer Society (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Han,
          <string-name>
            <given-names>J</given-names>
            .,
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          :
          <article-title>Mining multiple-level association rules in large databases</article-title>
          .
          <source>Knowledge and Data Engineering, IEEE Transactions on 11(5)</source>
          ,
          <volume>798</volume>
          {805 (sep/oct
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Jay</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kohler</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Analysis of social communities with iceberg and stability-based concept lattices</article-title>
          . In: Medina,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Obiedkov</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.A</surname>
          </string-name>
          . (eds.)
          <source>International Conference on Formal Concept Analysis (ICFCA'08)</source>
          . LNAI, vol.
          <volume>4923</volume>
          , pp.
          <volume>258</volume>
          {
          <fpage>272</fpage>
          . Springer (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Reducing the representation complexity of lattice-based taxonomies</article-title>
          . In: Priss,
          <string-name>
            <given-names>U.</given-names>
            ,
            <surname>Polovina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Hill</surname>
          </string-name>
          ,
          <string-name>
            <surname>R</surname>
          </string-name>
          . (eds.)
          <source>Proc. of ICCS 15th Intl Conf Conceptual Structures. LNCS/LNAI</source>
          , vol.
          <volume>4604</volume>
          , pp.
          <volume>241</volume>
          {
          <fpage>254</fpage>
          . Springer (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>Stability as an estimate of the degree of substantiation of hypotheses derived on the basis of operational similarity</article-title>
          .
          <source>Nauchn</source>
          . Tekh. Inf.,
          <source>Ser</source>
          .
          <volume>2</volume>
          (
          <issue>Automat</issue>
          . Document. Math. Linguist.)
          <volume>12</volume>
          ,
          <fpage>21</fpage>
          {
          <fpage>29</fpage>
          (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>On stability of a formal concept</article-title>
          .
          <source>Annals of Mathematics and Arti cial Intelligence</source>
          <volume>49</volume>
          ,
          <fpage>101</fpage>
          {
          <fpage>115</fpage>
          (
          <year>2007</year>
          ), http://www.springerlink.com/content/fk1414v361277475/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Masseglia</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cathala</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poncelet</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>The psp approach for mining sequential patterns</article-title>
          . pp.
          <volume>176</volume>
          {
          <issue>184</issue>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J</given-names>
            ., Han, J
          </string-name>
          .,
          <string-name>
            <surname>Mortazavi-Asl</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinto</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dayal</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Pre xspan: Mining sequential pattern by pre x-projected growth</article-title>
          .
          <source>In: ICDE</source>
          . pp.
          <volume>215</volume>
          {
          <issue>224</issue>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Pinto</surname>
            , H., Han,
            <given-names>J</given-names>
          </string-name>
          .,
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dayal</surname>
          </string-name>
          , U.:
          <article-title>Multi-dimensional sequential pattern mining</article-title>
          .
          <source>In: CIKM '01: Proceedings of the tenth international conference on Information and knowledge management</source>
          . pp.
          <volume>81</volume>
          {
          <fpage>88</fpage>
          . ACM Press, New York, NY, USA (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Plantevit</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laurent</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laurent</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teisseire</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choong</surname>
            ,
            <given-names>Y.W.</given-names>
          </string-name>
          :
          <article-title>Mining multidimensional and multilevel sequential patterns</article-title>
          .
          <source>ACM Trans. Knowl. Discov. Data</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <volume>1</volume>
          {
          <fpage>37</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Srikant</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrawal</surname>
          </string-name>
          , R.:
          <article-title>Mining sequential patterns: Generalizations and performance improvements</article-title>
          . In: Apers,
          <string-name>
            <given-names>P.M.G.</given-names>
            ,
            <surname>Bouzeghoub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Gardarin</surname>
          </string-name>
          ,
          <string-name>
            <surname>G</surname>
          </string-name>
          . (eds.)
          <source>Proc. 5th Int. Conf. Extending Database Technology, EDBT</source>
          . vol.
          <volume>1057</volume>
          , pp.
          <volume>3</volume>
          {
          <fpage>17</fpage>
          . Springer-Verlag (
          <volume>25</volume>
          {29
          <year>1996</year>
          ), http://citeseer.ist.psu.edu/article/srikant96mining.html
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Stumme</surname>
          </string-name>
          , G.:
          <article-title>E cient data mining based on formal concept analysis</article-title>
          .
          <source>In: Lecture Notes in Computer Science</source>
          , vol.
          <volume>2453</volume>
          , p.
          <fpage>534</fpage>
          . Springer (Jan
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Missaoui</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Godin</surname>
          </string-name>
          , R.:
          <article-title>Formal concept analysis for knowledge discovery and data mining: The new challenges</article-title>
          . In: Eklund, P.W. (ed.)
          <source>ICFCA. Lecture Notes in Computer Science</source>
          , vol.
          <volume>2961</volume>
          , pp.
          <volume>352</volume>
          {
          <fpage>371</fpage>
          . Springer (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wille</surname>
          </string-name>
          , R.:
          <article-title>Restructuring lattice theory: an approach based on hierarchies of concepts</article-title>
          .
          <source>In: Rival, I. (ed.) Ordered Sets. Reidel</source>
          (
          <year>1982</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kitsuregawa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Paid: Mining sequential patterns by passed item deduction in large databases</article-title>
          .
          <source>In: IDEAS'06</source>
          . pp.
          <volume>113</volume>
          {
          <issue>120</issue>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.L.</given-names>
          </string-name>
          :
          <article-title>Mining sequential patterns from multidimensional sequence data. Knowledge and Data Engineering</article-title>
          , IEEE Transactions on
          <volume>17</volume>
          (
          <issue>1</issue>
          ),
          <volume>136</volume>
          { 140 (jan
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.J.:</given-names>
          </string-name>
          <article-title>Spade: An e cient algorithm for mining frequent sequences</article-title>
          .
          <source>Machine Learning</source>
          <volume>42</volume>
          (
          <issue>1-2</issue>
          ),
          <volume>31</volume>
          {60 (
          <year>January 2001</year>
          ), http://www.springerlink.com/link.asp?id=n3t642725v615427
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Approxmgmsp: A scalable method of mining approximate multidimensional sequential patterns on distributed system</article-title>
          .
          <source>In: Fuzzy Systems and Knowledge Discovery</source>
          ,
          <year>2007</year>
          .
          <article-title>FSKD 2007</article-title>
          . Fourth International Conference on. vol.
          <volume>2</volume>
          , pp.
          <volume>730</volume>
          {
          <issue>734</issue>
          (aug
          <year>2007</year>
          )
          <article-title>    2 o   2 1 t t 1 2 n g H n H H e i</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>