<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring Temporal Data Using Relational Concept Analysis: An Application to Hydroecology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cristina Nica</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agnes Braud</string-name>
          <email>agnes.braud@unistra.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xavier Dolques</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marianne Huchard</string-name>
          <email>huchard@lirmm.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florence Le Ber</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ICube, University of Strasbourg</institution>
          ,
          <addr-line>CNRS, ENGEES</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LIRMM, University of Montpellier</institution>
          ,
          <addr-line>CNRS https://</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents an approach for mining temporal data, based on Relational Concept Analysis (RCA), that has been developed for a real-world application. Our data are sequential samples of biological and physico-chemical parameters taken from watercourses. Our aim is to reveal meaningful relations between the two types of parameters. To this end, we propose a comprehensive temporal data mining process that starts by applying RCA to an ad hoc temporal data model. The results of RCA are converted into closed partially ordered patterns to provide experts with a synthetic representation of the information contained in the lattice family. Patterns can also be filtered with various measures that exploit the notion of temporal objects. The process is assessed through quantitative statistics and qualitative interpretations of experiments carried out on hydroecological datasets.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Exploring temporal datasets is a major challenge in current research, and various
methods have been proposed since the 1990s [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It is worth pointing out
that temporal data are relational, so that relational methods [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] can be useful
to respect their relational structure, e.g. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In particular, Relational Concept
Analysis (RCA, [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]) makes it possible to classify relational data and provides hierarchical
results, which facilitate the analysis step.
      </p>
      <p>
        Based on these properties, we propose to use RCA for exploring sequential
datasets from the hydroecological domain. These datasets were collected during
the Fresqueau project (http://engees-fresqueau.unistra.fr/presentation.php?lang=en), which focused on methods for assessing the quality of
watercourses. The collected data represent biological (Bio) and physico-chemical
(PhC) samples taken at fixed points (river sites) and repeated in time. Both
types of parameters are used by experts to determine the quality of watercourses.
Therefore, a global assessment of the temporal relationship between PhC and
Bio parameters is needed. To this end, preprocessing the raw sequential data
yields a qualitative temporal model to which RCA can be applied. The RCA result
is a family of lattices that users can navigate. Users can select relevant navigation
paths through the lattices (starting from concepts in a main lattice) by applying
measures of interest based on the concept extents, which can be linked to geographical
information in our application. Furthermore, to support their analysis and to synthesise
the results, we propose to transform those concepts into closed partially ordered patterns
(cpo-patterns, [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]), i.e. directed acyclic graphs whose vertices are labelled with
information extracted from the concepts of the lattice family. Since
concepts can be more or less general or specific, the extracted patterns can be
classified into three types, according to the number of vertices that are labelled
with general information. Users can then choose to select and navigate
general or specific paths in the lattices.
      </p>
      <p>The paper is structured as follows. Section 2 presents basic definitions and
related work. Section 3 describes the hydroecological data and their
preprocessing, while the RCA process is detailed in Section 4. Section 5 introduces
measures of interest dealing with the temporal dimension of the obtained concepts.
Section 6 presents cpo-patterns, which help the analysis. Section 7 describes
and discusses the results of experiments carried out on Fresqueau datasets. Section
8 concludes and gives a few perspectives on this work.</p>
    </sec>
    <sec id="sec-2">
      <title>Basics and Related Work</title>
      <p>
        Relational Concept Analysis (RCA, [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]) extends Formal Concept Analysis (FCA
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]) to classify sets of objects described by attributes and relations, thus making it possible
to discover knowledge patterns and implication rules in relational datasets. RCA
iteratively applies FCA to a Relational Context Family (RCF), which consists of a set
$\mathcal{K}$ of object-attribute contexts and a set $\mathcal{R}$ of object-object contexts.
$\mathcal{K}$ contains $n$ object-attribute formal contexts $K_i = (G_i, M_i, I_i)$, $i \in \{1, \dots, n\}$.
$\mathcal{R}$ contains $m$ object-object relational contexts $R_j = (G_k, G_l, r_j)$, $j \in \{1, \dots, m\}$,
where $G_k$, called the domain of the relation, and $G_l$, called the range of the
relation, are respectively the sets of objects of $K_k$ and $K_l$, and $r_j \subseteq G_k \times G_l$, $k, l \in \{1, \dots, n\}$.
At each step, object-attribute contexts are extended with relational
attributes of the syntactic form $q\,r_j(C)$, where $q$ is a quantifier, $r_j$ is a
relation and $C = (X, Y)$ is a concept whose extent $X$ is a subset of objects from the
range of $r_j$. This paper uses the existential quantifier: $\exists r_j(C)$ is an attribute of
$o \in G_k$ if $r_j(o) \cap X \neq \emptyset$. The RCA process first applies FCA to each
object-attribute context of an RCF, and then iteratively to each object-attribute
context extended with the relational attributes created from the concepts of
the previous step. The RCA result is obtained when the lattice families of two
consecutive steps are isomorphic and the contexts are unchanged.
      </p>
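      <p>The existential scaling step can be illustrated with a minimal Python sketch. It only shows how the relational attributes of one object-object context are derived from the concepts of the range lattice; the function name, the dictionary encoding and the toy objects are hypothetical, and both the construction of the lattices by FCA and the iteration until the lattice family stabilises are deliberately left out.</p>
      <preformat preformat-type="code">
```python
# Hypothetical sketch of existential scaling for one object-object context.
# An object o of the domain receives the relational attribute "∃r(C)" when
# r(o) shares at least one object with the extent X of concept C.


def existential_attributes(relation, concepts, rel_name):
    """relation: dict mapping each domain object o to the set r(o) of range objects.
    concepts: dict mapping a concept name C to its extent X (a set of range objects).
    Returns a dict mapping each domain object to its set of relational attributes."""
    scaled = {}
    for obj, targets in relation.items():
        scaled[obj] = {
            f"∃{rel_name}({name})"
            for name, extent in concepts.items()
            if not targets.isdisjoint(extent)  # r(o) and X have a common object
        }
    return scaled


# Toy usage (hypothetical objects, not the Fresqueau data):
relation = {"s1": {"p1"}, "s2": {"p2"}, "s3": set()}
concepts = {"C0": {"p1", "p2"}, "C1": {"p1"}}
print(existential_attributes(relation, concepts, "r"))
```
      </preformat>
      <p>Here s1 receives both ∃r(C0) and ∃r(C1), s2 only ∃r(C0), and s3, which is linked to nothing, receives no relational attribute.</p>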
      <p>
        RCA has been applied to various data, e.g. for software model analysis and
re-engineering [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. To our knowledge, this is the first time that RCA has been used to
explore sequential datasets. There are, however, various related FCA approaches.
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] introduced Temporal Concept Analysis, where objects are characterized by
a date and a state (i.e. a set of attributes). Data are merged into a single context,
and the resulting concept lattice is analysed thanks to the date element in the
concepts, so that temporal relations between concepts are actually revealed by
the analyst. This approach has been used to analyse sequential data about crime
suspects [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. In our RCA approach, the temporal relation between dates is
considered as an object-object relation and it links concepts from several lattices.
In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], sequential datasets are processed without involving any partial order. In
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], closed subsequences are mined and then grouped in a lattice similar to a
concept lattice. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], sequential data are mapped onto pattern structures whose
projections are used to build a pattern concept lattice. The authors combine the
stability of concepts and the projections of pattern structures in order to select
relevant patterns.
      </p>
      <p>
        Besides, there exist various methods to explore qualitative sequential data.
Indeed, sequential pattern mining is an active research area, in relation to the
exponential growth of temporal and spatio-temporal databases. Sequential
patterns have been introduced by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and used for different purposes. Such an
approach has been developed within the Fresqueau project and focused on closed
po-patterns, which were selected through various measures [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Indeed, selecting
relevant results is a main challenge for all approaches dealing with large datasets.
In FCA, the most used measures for selecting relevant concepts are stability [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
probability and separation [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Unfortunately, these measures cannot
take into account the specific structure of concepts built on temporal objects.
We thus propose to use specific measures, as detailed in Section 5.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Context and Data Preprocessing</title>
      <p>
        In the Fresqueau project, the analysed data cover various compartments such as
physico-chemistry, hydrobiology, hydromorphology and land use (as described
in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). Here, we try to tackle the following issue by means of RCA: can experts
explain values of biological parameters from PhC values occurring in past months
and thus improve the global assessment of the quality of watercourse ecosystems?
      </p>
      <p>To answer this question, note that the quality of watercourses
is determined by the Bio parameters (e.g. Standardised Global Biological
Index (IBGN), Biological Index of Diatoms (IBD) and Fish Biotic Index (IPR)).
Hence, the objects of interest in our work are the Bio samples, and we want to
assess, over a period of time, the impact of PhC macro-parameters (e.g. Nitrogen
(AZOT), Phosphorus (PHOS) and Particulate Matter (PAES)) on Bio ones.</p>
      <p>Table 1(a) illustrates a small raw sequential dataset of Bio and PhC samples
taken from a site (e.g. S1) corresponding to a river segment. A set of sites
constitutes a geographical area. A data sequence is a chronologically ordered set
of PhC samples with a Bio sample at the end, all taken from the same site. This
raw sequential dataset shows measurements made only for the IBGN Bio parameter
and for four PhC parameters, namely Ammonium (NH₄⁺), Kjeldahl Nitrogen
(NKJ), Nitrite (NO₂⁻) and Orthophosphate (PO₄³⁻). For instance, 0.043 mg/l
of NH₄⁺ is measured on 01/04, i.e. January 2004, at site S1. An IBGN score
of 8/20 is measured in September 2004 at the same site.</p>
      <p>The raw sequential dataset contains only numerical values. To mine such
data, we transform them by applying discretization and selection processes based
on domain knowledge. Discretization converts numerical values
into qualitative ones. To this end, we use the qualitative values for Bio and PhC
parameters provided by the SEQ-Eau standard (http://rhin-meuse.eaufrance.fr/IMG/pdf/grilles-seq-eau-v2.pdf). Both types of
parameters have five qualitative values, namely very good, good, medium, bad and very
bad, represented respectively by the colors blue, green, yellow, orange and red.
In addition, the SEQ-Eau standard groups PhC parameters into macro-parameters.
For example, NH₄⁺, NKJ and NO₂⁻ are grouped into the AZOT macro-parameter.
The selection process keeps only relevant data by defining constraints
based on expert advice. For instance, the only analysed PhC samples are those
taken within 4 months before a Bio sample, at the same site.</p>
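      <p>The two preprocessing steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's implementation: the class boundaries are hypothetical placeholders (the SEQ-Eau grids define their own, parameter-specific thresholds), the rule that a larger value means a worse class may not hold for every parameter, and the function names and the inclusive 4-month boundary are illustrative choices.</p>
      <preformat preformat-type="code">
```python
from datetime import date

# The five SEQ-Eau qualitative classes, from best to worst quality.
CLASSES = ["blue", "green", "yellow", "orange", "red"]

# Grouping into macro-parameters; only the AZOT grouping is stated in the text.
MACRO = {"NH4+": "AZOT", "NKJ": "AZOT", "NO2-": "AZOT"}


def discretize(value, bounds):
    """Map a numerical value to a qualitative class.
    bounds: four increasing upper limits for blue, green, yellow and orange.
    ASSUMPTION: a higher value means a worse class; the bounds used in the
    tests are placeholders, not the actual SEQ-Eau thresholds."""
    for label, upper in zip(CLASSES, bounds):
        if upper >= value:
            return label
    return CLASSES[-1]  # above every bound: worst class


def months_between(earlier, later):
    """Number of calendar months separating two dates."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def select_phc(phc_samples, bio_site, bio_date, window=4):
    """Keep the PhC samples from the same site taken before the Bio sample and
    at most `window` calendar months earlier (boundary handling is an assumption)."""
    return [
        s for s in phc_samples
        if s["site"] == bio_site
        and bio_date > s["date"]
        and window >= months_between(s["date"], bio_date)
    ]


# Dates from Sect. 3 and 4: PhC samples from S1 in January and August 2004,
# IBGN in September 2004 (days of month are unknown and set to 1 here).
phc = [
    {"site": "S1", "date": date(2004, 1, 1)},
    {"site": "S1", "date": date(2004, 8, 1)},
]
kept = select_phc(phc, "S1", date(2004, 9, 1))
print([s["date"] for s in kept])  # only the August 2004 sample is within 4 months
```
      </preformat>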
      <p>Table 1(b) shows the preprocessed sequential dataset ready to be mined
using RCA. This sequential dataset is obtained by applying the discretization
and selection processes to the raw sequential dataset illustrated in Tab. 1(a).
It is worth pointing out that the preprocessed sequential dataset is significantly
smaller than the raw one, thanks to the macro-parameters and the limited
analysed period of time.</p>
    </sec>
    <sec id="sec-4">
      <title>Temporal Relational Analysis</title>
      <p>The sequential dataset is structured following the schema depicted in Fig. 1. The
four rectangles represent the four sets of objects we manipulate: Bio samples,
PhC samples, Bio parameters and PhC parameters. The links between Bio/PhC
samples and PhC samples are defined by the temporal binary relation is preceded
by (denoted by ipb). This temporal relation associates one sample with another one
if the first sample is preceded in time by the second one, on the same site. There
is no temporal binary relation between Bio samples since in this work we evaluate
the impact of physico-chemistry on biology. The Bio/PhC samples are described
only by the qualitative relations has parameter blue/green/yellow/orange/red
that link the Bio/PhC samples with the measured Bio/PhC parameters. For
instance, has parameter green links the PhC samples taken from S1 on 08/04
(Tab. 1(b)) with the AZOT PhC parameter.</p>
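      <p>As an illustration, the crosses of an ipb object-object context can be computed from dated samples with a few lines of Python. The (site, date) encoding, the function name and the sample dates are hypothetical; the text does not say whether ipb links a sample to every earlier sample of its site or only to its immediate predecessor, and this sketch assumes the former.</p>
      <preformat preformat-type="code">
```python
from datetime import date


def build_ipb(domain, range_):
    """Crosses of an 'is preceded by' (ipb) object-object context.
    Samples are (site, date) pairs; a is linked to b when both were taken on
    the same site and b was taken strictly before a. Every earlier sample is
    linked, not only the immediately preceding one (an assumption)."""
    return {
        (a, b)
        for a in domain
        for b in range_
        if a[0] == b[0] and a[1] > b[1]
    }


# Hypothetical samples: the same function builds R_PHCS-ipb-PHCS (PhC to PhC)
# and R_BIOS-ipb-PHCS (Bio to PhC); no Bio-to-Bio relation is built.
phc = [("S1", date(2004, 6, 1)), ("S1", date(2004, 8, 1)), ("S2", date(2004, 7, 1))]
bio = [("S1", date(2004, 9, 1))]
r_phcs_ipb_phcs = build_ipb(phc, phc)
r_bios_ipb_phcs = build_ipb(bio, phc)
print(sorted(r_bios_ipb_phcs))
```
      </preformat>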
      <p>Following the temporal data model illustrated in Fig. 1, we build the RCF
depicted in Tab. 2 for a small hydroecological sequential dataset. The tables KPHC
(PhC parameters), KBIOS (Bio samples) and KPHCS (PhC samples) represent
object-attribute contexts. There is no object-attribute context for Bio
parameters because each dataset is restricted to one value of one parameter (here IBGN
red). KBIOS and KPHCS have no column since the samples are only described using
the qualitative relations. The tables RPHCS-ipb-PHCS, RBIOS-ipb-PHCS , RbPHC
and RgPHC represent object-object contexts. In these object-object contexts, a
row is an object from the domain of the relation, a column is an object from the
range of the relation and a cross indicates a link between two objects. For
example, RPHCS-ipb-PHCS defines the temporal relations (ipb) between PhC samples
and has KPHCS both as domain and range. RbPHC defines the qualitative relations
between PhC samples and PhC parameters that have the blue (b) qualitative
value.</p>
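      <p>The qualitative object-object contexts can likewise be derived from the preprocessed samples by splitting the measured values by color, one context per qualitative value. In this hypothetical Python sketch the dictionary encoding and the function name are illustrative, and only the names RbPHC and RgPHC come from Tab. 2; the names built for the other colors are an extrapolation.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict

# One-letter color codes; RbPHC and RgPHC follow the naming of Tab. 2,
# the names for the other colors are extrapolated (hypothetical).
COLOR_CODE = {"blue": "b", "green": "g", "yellow": "y", "orange": "o", "red": "r"}


def color_relations(samples, target="PHC"):
    """samples: dict sample id -> dict (macro-)parameter -> qualitative class.
    Returns one object-object context per color, as a set of (sample, parameter) crosses."""
    relations = defaultdict(set)
    for sample, values in samples.items():
        for param, color in values.items():
            relations[f"R{COLOR_CODE[color]}{target}"].add((sample, param))
    return dict(relations)


# S1_08/04 measures a green AZOT (Sect. 4); the blue PHOS value is hypothetical.
print(color_relations({"S1_08/04": {"AZOT": "green", "PHOS": "blue"}}))
```
      </preformat>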
      <p>Figure 2 represents the family of concept lattices obtained by applying RCA
on the RCF illustrated in Tab. 2. There are three lattices, one for each formal
context: LKPHCS (PhC samples, Fig. 2(a)), LKPHC (PhC parameters, Fig. 2(b)) and
LKBIOS (Bio samples, Fig. 2(c)). Each concept is represented by a box structured
from top to bottom as follows: concept name, simplified intent and simplified
extent. As said before, we have used the existential quantifier to build relational
attributes. For instance, the intent of C_KPHCS_2 from lattice LKPHCS contains the
relational attribute ∃RgPHC(C_KPHC_1) inherited from concept C_KPHCS_5. This
relational attribute is common to all PhC samples that measure a green PHOS
parameter, PHOS being the extent of concept C_KPHC_1 shown in Fig. 2(b).</p>
      <p>The navigation amongst the lattices shown in Fig. 2 follows the concepts
used to build relational attributes. For example, the aforementioned relational
attribute ∃RgPHC(C_KPHC_1) allows us to navigate from concept C_KPHCS_2 of
LKPHCS to concept C_KPHC_1 of LKPHC.</p>
    </sec>
    <sec id="sec-5">
      <title>Measures of Interest for Temporal Concepts</title>
      <p>To analyse the results of the RCA process, experts start from a main lattice, here
the lattice LKBIOS, and navigate through the relational attributes linking concepts
of different lattices. Moreover, since the RCA process can produce a large number of
interrelated concepts, depending on the dataset volume and characteristics,
interestingness measures are required to select relevant concepts from which to
start the navigation.</p>
      <p>Such measures should take into account the specificity of concepts built on
temporal objects, whereas well-known measures (e.g. concept stability) fit basic
concepts. For example, Fig. 3 depicts two concept extents where the temporal
objects are the Bio samples. Both concepts, which we call temporal concepts,
have the same number of Bio samples and cover the same geographical
area. If two Bio samples are deleted, following the idea of the stability measure, one
from site S2 and one from S3, then both concepts still have the same number of
Bio samples, but they no longer cover the same river sites: S1 and S2 for Concept 1, S1 alone for Concept 2.</p>
      <p>To overcome this limitation, we introduce below an approach based on the
distribution of temporal concept extents. The main idea of our method is
that a concept is relevant if it is frequent and related to many sites among which
its Bio samples are evenly distributed. Accordingly, we try to find
temporal concepts whose intents represent regularities found throughout
the studied geographical area. In our example, both concepts have the same
frequency (7 samples), but the distribution is different: Concept 1 is more relevant
than Concept 2.</p>
      <p>Let $(X, Y)$ be a formal concept of the main lattice; then its extent $X$ is a
set of temporal objects, or pairs, (Object, Date). If the value of Object is
not identical for all the pairs, then the pairs can be grouped into categories by
object. We accordingly define $\overline{X}$, the set of distinct objects occurring in the pairs of $X$.</p>
      <p>[Fig. 2: Family of concept lattices obtained by RCA: (a) LKPHCS, (b) LKPHC, (c) LKBIOS.]</p>
      <p>Definition 1 (Absolute frequency ($\varphi_o$)). Let $C = (X, Y)$ be a temporal
concept and $o$ an object of $\overline{X}$. The absolute frequency of $o$ in $C$, denoted $\varphi_o$, is
equal to the number of distinct pairs of $X$ in which $o$ occurs. We write $\widehat{X} = \{(o, \varphi_o) \mid o \in \overline{X}\}$.</p>
      <p>In our example (Fig. 3), $\overline{X}_1 = \overline{X}_2 = \{S1, S2, S3\}$.
Concept 1 has $\widehat{X}_1 = \{(S1, 3), (S2, 3), (S3, 1)\}$ and Concept 2 has $\widehat{X}_2 = \{(S1, 5), (S2, 1), (S3, 1)\}$.</p>
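      <p>Definition 1 can be checked against the Fig. 3 example with a short Python sketch. Since Fig. 3 only gives the number of Bio samples per site, the dates are replaced by hypothetical integers, and the helper names are illustrative.</p>
      <preformat preformat-type="code">
```python
from collections import Counter


def absolute_frequencies(extent):
    """extent: iterable of temporal objects (object, date).
    Returns {o: phi_o}, the number of distinct pairs of X in which o occurs."""
    return dict(Counter(obj for obj, _ in set(extent)))


def toy_extent(counts):
    """Hypothetical helper: builds an extent with counts[o] distinct dates per object
    (dates are replaced by integers, since only the counts are known)."""
    return [(obj, k) for obj, n in counts.items() for k in range(n)]


concept_1 = toy_extent({"S1": 3, "S2": 3, "S3": 1})
concept_2 = toy_extent({"S1": 5, "S2": 1, "S3": 1})
print(sorted(absolute_frequencies(concept_1).items()))  # [('S1', 3), ('S2', 3), ('S3', 1)]
```
      </preformat>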
      <p>Fig. 3: Distribution of Bio samples by site for two concept extents.</p>
      <p>Definition 2 (Support and richness ($\rho$)). The support of a concept $(X, Y)$
is the number of pairs (Object, Date) in $X$, i.e. $|X|$. Its richness,
denoted $\rho$, is defined as the cardinality of $\overline{X}$.</p>
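      <p>Both measures of Definition 2 reduce to set cardinalities, as this minimal Python sketch shows; the function names are illustrative and the dates of Concept 1 are hypothetical integers, since Fig. 3 only gives counts per site.</p>
      <preformat preformat-type="code">
```python
def support(extent):
    """Support |X|: number of distinct (object, date) pairs in the extent."""
    return len(set(extent))


def richness(extent):
    """Richness rho: number of distinct objects occurring in the extent."""
    return len({obj for obj, _ in extent})


# Concept 1 of Fig. 3 with hypothetical dates: 3 samples in S1, 3 in S2, 1 in S3.
concept_1 = [("S1", 1), ("S1", 2), ("S1", 3), ("S2", 1), ("S2", 2), ("S2", 3), ("S3", 1)]
print(support(concept_1), richness(concept_1))  # 7 3
```
      </preformat>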
      <p>
        Definition 3 (Distribution index (IQV)). The distribution of a concept
$(X, Y)$ describes the number of times each object of $\overline{X}$ occurs in $X$, and it
is measured by the Index of Qualitative Variation (IQV, [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]). The IQV is based on
the ratio of observed differences in $X$ to the total number of possible differences
within $X$ ($\rho &gt; 1$):
      </p>
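      <disp-formula>
        <label>(1)</label>
        <tex-math>IQV = \frac{\rho\left(|X|^2 - \sum_{i=1}^{\rho} \varphi_{o_i}^2\right)}{|X|^2(\rho - 1)}</tex-math>
      </disp-formula>
      <p>If $\rho = 1$, $IQV = 0$.</p>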
      <p>
        Our choice of IQV stems from the observation that the objects of $\overline{X}$ do not
have an intrinsic ordering. Thus, measuring their distribution using the IQV
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] seems appropriate. The IQV ranges from 0 to 1. When all pairs of $X$ contain
the same object, there is no diversity and the IQV is 0. In contrast, when there
are different objects and all of them have the same absolute frequency $\varphi_o$, the
distribution is even and the IQV is 1.
      </p>
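      <p>The upper bound can be checked directly from Eq. (1), writing $\rho$ for the richness, $\varphi_{o_i}$ for the absolute frequencies and $|X|$ for the support. When the $|X|$ pairs are evenly spread over the $\rho$ objects, each $\varphi_{o_i}$ equals $|X|/\rho$, which gives the following short derivation (for $\rho = 1$ the denominator of Eq. (1) vanishes, which is why the value 0 is set separately).</p>
      <preformat preformat-type="math">
```latex
\sum_{i=1}^{\rho} \varphi_{o_i}^2 = \rho \left(\frac{|X|}{\rho}\right)^2 = \frac{|X|^2}{\rho}
\quad\Longrightarrow\quad
IQV = \frac{\rho\left(|X|^2 - \frac{|X|^2}{\rho}\right)}{|X|^2(\rho - 1)}
    = \frac{|X|^2(\rho - 1)}{|X|^2(\rho - 1)} = 1
```
      </preformat>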
      <p>Returning to our example (Fig. 3), both concepts have support $|X_1| = |X_2| = 7$ and richness $\rho_1 = \rho_2 = 3$.
For Concept 1 the distribution is $IQV_1 = \frac{3[7^2 - (3^2 + 3^2 + 1^2)]}{7^2(3 - 1)} = \frac{90}{98} \approx 0.92$
and for Concept 2 $IQV_2 = \frac{3[7^2 - (5^2 + 1^2 + 1^2)]}{7^2(3 - 1)} = \frac{66}{98} \approx 0.67$. Hence, Concept 1 is
computed as more relevant than Concept 2, since its objects (Bio samples) are
better distributed amongst the sites.</p>
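      <p>The worked example can be reproduced with a small Python implementation of Eq. (1). The function names are illustrative and, since Fig. 3 only provides counts per site, the dates are hypothetical integers; the exact values are 90/98 ≈ 0.918 for Concept 1 and 66/98 ≈ 0.673 for Concept 2.</p>
      <preformat preformat-type="code">
```python
from collections import Counter


def iqv(extent):
    """Index of Qualitative Variation (Eq. 1) of a temporal concept extent,
    given as an iterable of (object, date) pairs."""
    pairs = set(extent)
    n = len(pairs)                              # support |X|
    freqs = Counter(obj for obj, _ in pairs)    # absolute frequencies phi_o
    rho = len(freqs)                            # richness
    if 1 >= rho:                                # a single object (or empty extent): no diversity
        return 0.0
    squares = sum(f * f for f in freqs.values())
    return rho * (n * n - squares) / (n * n * (rho - 1))


def toy_extent(counts):
    """Hypothetical helper: counts[o] distinct (integer) dates for each object o."""
    return [(obj, k) for obj, c in counts.items() for k in range(c)]


iqv_1 = iqv(toy_extent({"S1": 3, "S2": 3, "S3": 1}))  # 90/98
iqv_2 = iqv(toy_extent({"S1": 5, "S2": 1, "S3": 1}))  # 66/98
print(round(iqv_1, 3), round(iqv_2, 3))  # 0.918 0.673
```
      </preformat>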
    </sec>
    <sec id="sec-6">
      <title>CPO-patterns for Helping Expert Analysis</title>
      <p>
        Since our aim is to facilitate the analysis work, we propose, in addition to the
selection of relevant concepts, to convert those concepts into cpo-patterns.
Indeed, cpo-patterns are structures whose graphical representation is easy to read
and understand (e.g. Fig. 4). The expert can choose a cpo-pattern that
highlights interesting, surprising knowledge, and deepen the analysis by exploring the
area in the lattice surrounding the corresponding concept. Thus, starting from
the family of lattices built using RCA, we extract cpo-patterns following the
approach proposed in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. It is worth pointing out that there is one cpo-pattern
for each concept of the lattice corresponding to the objects of interest of
the study, i.e. LKBIOS in our work.
      </p>
      <p>
        Formally, let $I = \{I_1, I_2, \dots, I_m\}$ be a set of items. An itemset $IS$ is a non-empty,
unordered set of items, $IS = (I_{j_1} \dots I_{j_k})$ where $I_{j_i} \in I$. Let $\mathcal{IS}$ be the
set of all itemsets built from $I$. A sequence $S$ is a non-empty ordered list of
itemsets, $S = \langle IS_1 IS_2 \dots IS_p \rangle$ where $IS_j \in \mathcal{IS}$. The sequence $S$ is a subsequence
of another sequence $S' = \langle IS'_1 IS'_2 \dots IS'_q \rangle$, denoted $S \preceq_s S'$, if $p \leq q$ and if
there are integers $j_1 &lt; j_2 &lt; \dots &lt; j_k &lt; \dots &lt; j_p$ such that
$IS_1 \subseteq IS'_{j_1}, IS_2 \subseteq IS'_{j_2}, \dots, IS_p \subseteq IS'_{j_p}$. Sequential patterns have been defined by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as frequent
subsequences found in a sequence database. A po-pattern is a directed acyclic
graph $G = (\mathcal{V}, \mathcal{E}, l)$. $\mathcal{V}$ is the set of vertices, $\mathcal{E}$ is a set of directed edges such
that $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$, and $l$ is a labelling function mapping each vertex to an itemset.
A partial order can be defined on $G$ as follows: for all $\{u, v\} \in \mathcal{V}^2$, $u &lt; v$ if
there is a directed path from $u$ to $v$. If there is no directed path between $u$ and $v$
in either direction, these elements are not comparable. Each path of the graph is a sequential
pattern as defined before. The set of paths in $G$ is denoted by $P_G$. A po-pattern is
associated with the set of sequences $S_G$ that contain all the paths of $P_G$. Furthermore,
let $G$ and $G'$ be two po-patterns with $P_G$ and $P_{G'}$ their sets of paths. $G$ is a
sub po-pattern of $G'$, denoted by $G \preceq_g G'$, if $\forall M \in P_G, \exists M' \in P_{G'}$ such that
$M \preceq_s M'$. A po-pattern $G$ is closed, denoted cpo-pattern, if there exists no
po-pattern $G' \neq G$ such that $G \preceq_g G'$ with $S_G = S_{G'}$.
      </p>
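      <p>The subsequence relation and the sub po-pattern relation lend themselves to a direct implementation. The Python sketch below is a hypothetical illustration (the function names, the (V, E, l) tuple encoding and the toy items are illustrative choices): it decides whether a sequence of itemsets is a subsequence of another by greedy left-to-right matching, and whether one po-pattern is a sub po-pattern of another by comparing source-to-sink paths. It neither mines patterns nor tests closedness, which would require the supporting sequences of each pattern.</p>
      <preformat preformat-type="code">
```python
def is_subsequence(s, t):
    """True if sequence s (a list of itemsets) is a subsequence of t: each itemset
    of s is included, in order, in a distinct itemset of t. Greedy left-to-right
    matching is enough to decide this."""
    remaining = iter(t)  # any() consumes t up to and including each match
    return all(any(a.issubset(b) for b in remaining) for a in s)


def paths(pattern):
    """Source-to-sink paths of a po-pattern G = (V, E, l), as lists of itemsets."""
    vertices, edges, label = pattern
    succ = {v: [] for v in vertices}
    has_pred = set()
    for u, v in edges:
        succ[u].append(v)
        has_pred.add(v)
    found = []

    def walk(v, prefix):
        prefix = prefix + [label[v]]
        if not succ[v]:
            found.append(prefix)
        for w in succ[v]:
            walk(w, prefix)

    for v in vertices:
        if v not in has_pred:
            walk(v, [])
    return found


def is_sub_po_pattern(g, h):
    """G is a sub po-pattern of G' if every path of G is a subsequence of some
    path of G'. Every path is a subsequence of a source-to-sink path, so
    checking those paths on both sides is enough."""
    h_paths = paths(h)
    return all(any(is_subsequence(p, q) for q in h_paths) for p in paths(g))


# Hypothetical toy patterns (items written parameter_color).
item = frozenset
big = (["a", "b", "c"], [("a", "c"), ("b", "c")],
       {"a": item({"AZOT_green"}), "b": item({"PHOS_yellow"}), "c": item({"IBGN_yellow"})})
small = (["x", "y"], [("x", "y")],
         {"x": item({"PHOS_yellow"}), "y": item({"IBGN_yellow"})})
print(is_sub_po_pattern(small, big), is_sub_po_pattern(big, small))  # True False
```
      </preformat>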
      <p>
        As described in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], thanks to the hierarchical structure of the RCA results,
more or less accurate cpo-patterns are extracted. Based on their accuracy, three
types of cpo-patterns can be defined: abstract, hybrid and concrete. First,
the abstract cpo-pattern represents an imprecise common trend of the analysed
data. Second, the hybrid one, depicted in Fig. 4, corresponds to a partially
accurate common trend of the analysed data. Finally, the concrete cpo-pattern
designates an accurate common trend of the analysed data.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Experiments and Discussion</title>
      <p>The experiments are carried out on a MacBook Pro with a 2.9 GHz Intel Core i7 and
8 GB DDR3 RAM, running OS X 10.9.5. RCA is applied using the RCAExplore tool
(http://dolques.free.fr/rcaexplore). For the extraction and selection of cpo-patterns,
we have developed an algorithm in Java 8 based on the Java Collections Framework and lambda expressions.</p>
      <p>
        Three sequential datasets from the Fresqueau project are analysed, each concerning
only one Bio parameter with the yellow quality: IBDyellow, IPRyellow and IBGNyellow.
These datasets are interesting since the yellow quality of watercourses represents
an intermediate state between good and bad ecological status. Other quality values
have also been analysed but are not presented here. The objective is to extract more
or less accurate cpo-patterns representing frequent PhC trends of watercourses common
to many sites. To this end, the datasets are preprocessed and temporally modelled as
described in Sections 3 and 4. The temporal relational analysis relies on the IceBerg
algorithm [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], whose result is a concept lattice of frequent closed itemsets. A
10% threshold is applied only to the Bio samples (i.e. to the lattice of Bio samples,
which covers the objects of interest in our work). This choice allows us to focus on
the cpo-patterns that describe many sites.
      </p>
      <p>Table 3 shows some quantitative statistics regarding the temporal relational
analysis and the extraction of cpo-patterns. The results in Output column show
that the number of extracted concepts for the IBGN dataset is about 3 times
smaller than the number of extracted concepts for the IPR and IBD datasets.
This suggests greater heterogeneity in the IPR and IBD datasets than in
IBGN. Consequently, cpo-patterns linking PhC and IBGN Bio parameters
are supported by more examples and are likely to provide more reliable forecasts
of the yellow quality of watercourses.</p>
      <p>The CPO-patterns columns represent the different types of extracted
cpo-patterns and show that their number is large and has to be reduced. To this
end, we select relevant cpo-patterns based on the support, richness and
distribution of the associated concepts (see Section 5). Figure 5 shows three scatter plots
(for the three sets of extracted concrete cpo-patterns in Tab. 3) of the distribution
index (IQV) with respect to the support. The diameter of the circles is
proportional to the richness. The user can first explore a few cpo-patterns selected
with high thresholds for these measures, and then follow the cpo-pattern
hierarchy to deepen the analysis, as described below, or select more cpo-patterns
with lower thresholds. For example, by defining two thresholds IQV = 0.98
and Support = 25, the top-6 (IBGN), the top-26 (IBD) and the top-30 (IPR)
best distributed and most frequent cpo-patterns are selected. Focusing on IBD,
if the thresholds are e.g. IQV = 0.98 and Support = 20, 52 cpo-patterns are
selected. These cpo-patterns cover various numbers of sampling sites, and thus
more or less extensive geographical areas. To select greater or smaller areas, the
cpo-patterns are ranked by analysing the diameter of the circles.</p>
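      <p>The selection procedure can be sketched in Python as a filter followed by a ranking. The field names, the inclusive comparison with the thresholds and the tie-breaking by support are assumptions, and the measure values below are hypothetical rather than taken from Tab. 3 or Fig. 5.</p>
      <preformat preformat-type="code">
```python
def select_patterns(patterns, min_iqv, min_support):
    """Keep the cpo-patterns whose concept reaches both thresholds (inclusive,
    an assumption), then rank them by richness, i.e. by the number of sites
    covered (circle diameter in Fig. 5), breaking ties by support."""
    kept = [p for p in patterns if p["iqv"] >= min_iqv and p["support"] >= min_support]
    return sorted(kept, key=lambda p: (p["richness"], p["support"]), reverse=True)


# Hypothetical measure values, not taken from Tab. 3 or Fig. 5.
patterns = [
    {"name": "A", "iqv": 0.99, "support": 30, "richness": 12},
    {"name": "B", "iqv": 0.98, "support": 26, "richness": 20},
    {"name": "C", "iqv": 0.95, "support": 40, "richness": 25},
    {"name": "D", "iqv": 0.99, "support": 22, "richness": 9},
]
print([p["name"] for p in select_patterns(patterns, 0.98, 25)])  # ['B', 'A']
print([p["name"] for p in select_patterns(patterns, 0.98, 20)])  # ['B', 'A', 'D']
```
      </preformat>
      <p>Lowering the support threshold from 25 to 20 only widens the selection, mirroring the way the user relaxes thresholds to explore more cpo-patterns.</p>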
      <p>The qualitative interpretation of the extracted cpo-patterns was performed
by a hydroecologist. Figure 6 shows an interesting excerpt from the main lattice
of the IBGNyellow dataset. This group of cpo-patterns is subsumed by the abstract
cpo-pattern of C_KBIOS_868 (support = 28), which represents the least accurate
common trend: a green PhC parameter and another, yellow, PhC parameter are often
sampled simultaneously before a yellow IBGN. Figure 6 also emphasizes
the well-known correspondence between MOOX (organic matter pollution) quality
classes and IBGN ones: a yellow MOOX appears in the yellow IBGN cpo-pattern
associated with C_KBIOS_595. The concepts C_KBIOS_720, C_KBIOS_550
and C_KBIOS_400 highlight the impact of phosphorus pollution (PHOS) on
macroinvertebrates (IBGN), which is a lesser-known fact.</p>
      <p>Moreover, Fig. 6 illustrates two benefits of exploring sequential data by means of RCA.
The first one is the generalisation order regarding the structure
of the extracted cpo-patterns. For example, the structure of the C_KBIOS_400
cpo-pattern is more specific than the structure of its ancestor cpo-patterns, i.e. there
exists a projection from each of its ancestor cpo-patterns into the C_KBIOS_400 cpo-pattern.
The second benefit is the generalisation of items. For instance, the C_KBIOS_550
cpo-pattern reveals the rule $\{PAES_{green}, PHOS_{yellow}\} \rightarrow \{IBGN_{yellow}\}$, which is a
specialisation of the rule revealed by the C_KBIOS_720 cpo-pattern, namely
$\{?_{green}, PHOS_{yellow}\} \rightarrow \{IBGN_{yellow}\}$. These properties are useful for the expert, who can
navigate from specific to general patterns or vice versa.</p>
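      <p>The generalisation of items can be made concrete with a small Python sketch. It rests on an assumption made only for illustration, not on a definition from this work: an item is a (parameter, color) pair, and "?" stands for any parameter. Under that encoding, the left-hand side of the C_KBIOS_720 rule generalises that of the C_KBIOS_550 rule, but not the converse.</p>
      <preformat preformat-type="code">
```python
WILDCARD = "?"  # stands for "any parameter" in an item such as ?green


def item_generalises(general, specific):
    """An item (parameter, color) generalises another one when the colors are
    equal and its parameter is either the same or the wildcard."""
    return general[1] == specific[1] and general[0] in (WILDCARD, specific[0])


def itemset_generalises(general, specific):
    """Every item of the general itemset generalises some item of the specific one."""
    return all(any(item_generalises(g, s) for s in specific) for g in general)


# Left-hand sides of the rules revealed by C_KBIOS_720 and C_KBIOS_550.
lhs_720 = {("?", "green"), ("PHOS", "yellow")}
lhs_550 = {("PAES", "green"), ("PHOS", "yellow")}
print(itemset_generalises(lhs_720, lhs_550), itemset_generalises(lhs_550, lhs_720))  # True False
```
      </preformat>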
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>
        We have introduced an original approach for exploring temporal data using
RCA. Given a hydroecological dataset, where data represent Bio or PhC samples
measured at a given time at a given site, we find hierarchies of more or less
general cpo-patterns that summarize the impact of PhC parameters on Bio ones.
A comprehensive process for mining sequential datasets has been proposed: 1)
preprocessing of the raw data based on domain knowledge, 2) relational analysis
of the preprocessed data based on an original temporal data model, 3) selection of
temporal concepts using the distribution, the richness and the support measures,
and 4) extraction of cpo-patterns by navigating amongst temporal concepts (this step is
detailed in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]). Our method has been applied to sequential datasets from the
Fresqueau project.
      </p>
      <p>The main benefits of our approach are as follows. Using RCA produces
hierarchical concepts, while cpo-patterns synthesise complex navigation paths, both
facilitating the expert analysis. Furthermore, the proposed measures on temporal
concepts are useful to select relevant information in our application.</p>
      <p>In the future, we plan to apply our approach on other relational datasets.
This will require a deeper investigation of the behaviour of our measures, and possibly
other methods for selecting the extracted cpo-patterns.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srikant</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Mining sequential patterns</article-title>
          .
          <source>In: Int. Conference on Data Engineering</source>
          . pp.
          <fpage>3</fpage>
          –
          <lpage>14</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Arevalo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Falleri</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huchard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nebut</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Building abstractions in class models: Formal concept analysis in a model-driven approach</article-title>
          .
          <source>In: MoDELS 2006</source>
          . pp.
          <fpage>513</fpage>–<lpage>527</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bimonte</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boulil</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Braud</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bringay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cernesson</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dolques</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fabregue</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grac</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lalande</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Ber</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teisseire</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Un système décisionnel pour l'analyse de la qualité des eaux de rivières</article-title>
          .
          <source>Ingénierie des Systèmes d'Information</source>
          <volume>20</volume>
          (
          <issue>3</issue>
          ),
          <fpage>143</fpage>–<lpage>167</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Buzmakov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Egho</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jay</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Rassi, C.:
          <article-title>On mining complex sequential data by means of FCA and pattern structures</article-title>
          .
          <source>International Journal of General Systems</source>
          <volume>45</volume>
          ,
          <fpage>135</fpage>–<lpage>159</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Casas-Garriga</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Summarizing sequential data with closed partial orders</article-title>
          .
          <source>In: 2005 SIAM Int. Conference on Data Mining</source>
          . pp.
          <fpage>380</fpage>–<lpage>391</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dzeroski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Relational data mining</article-title>
          . In:
          <string-name>
            <surname>Maimon</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rokach</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (eds.)
          <source>Data Mining and Knowledge Discovery Handbook</source>
          , pp.
          <fpage>869</fpage>–<lpage>898</lpage>
          . Springer (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fabregue</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Braud</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bringay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grac</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Ber</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levet</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teisseire</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Discriminant temporal patterns for linking physico-chemistry and biology in hydro-ecosystem assessment</article-title>
          .
          <source>Ecological Informatics</source>
          <volume>24</volume>
          ,
          <fpage>210</fpage>–<lpage>221</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ferre</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>The efficient computation of complete and concise substring scales with suffix trees</article-title>
          .
          <source>In: Formal Concept Analysis</source>
          , pp.
          <fpage>98</fpage>–<lpage>113</lpage>
          . Springer (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ferreira</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gama</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Costa</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          :
          <article-title>Exploring multi-relational temporal databases with a propositional sequence miner</article-title>
          .
          <source>Progress in Artificial Intelligence</source>
          <volume>4</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>11</fpage>–<lpage>20</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Frankfort-Nachmias</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leon-Guerrero</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <source>Social Statistics for a Diverse Society</source>
          , chap.
          <chapter-title>Measures of Variability</chapter-title>
          . SAGE Publications
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          . Springer (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Klimushkin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Approaches to the selection of relevant concepts in the case of noisy data</article-title>
          .
          <source>In: Formal Concept Analysis</source>
          , pp.
          <fpage>255</fpage>–<lpage>266</lpage>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>On stability of a formal concept</article-title>
          .
          <source>Annals of Mathematics and Artificial Intelligence</source>
          <volume>49</volume>
          (
          <issue>1-4</issue>
          ),
          <fpage>101</fpage>–<lpage>115</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Nica</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Braud</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dolques</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huchard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Ber</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Extracting Hierarchies of Closed Partially-Ordered Patterns using Relational Concept Analysis</article-title>
          .
          <source>In: International Conference on Conceptual Structures</source>
          , ICCS 2016, Annecy, France. pp.
          <fpage>1</fpage>–<lpage>14</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Poelmans</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elzinga</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viaene</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dedene</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>A Method based on Temporal Concept Analysis for Detecting and Profiling Human Trafficking Suspects</article-title>
          .
          <source>In: Artificial Intelligence and Applications</source>
          , AIA 2010, Innsbruck, Austria. pp.
          <fpage>1</fpage>–<lpage>9</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Rouane-Hacene</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huchard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Relational concept analysis: Mining concept lattices from multi-relational data</article-title>
          .
          <source>Annals of Mathematics and Artificial Intelligence</source>
          <volume>67</volume>
          (
          <issue>1</issue>
          ),
          <fpage>81</fpage>–<lpage>108</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Efficient data mining based on formal concept analysis</article-title>
          .
          <source>In: Database and Expert Systems Applications</source>
          , pp.
          <fpage>534</fpage>–<lpage>546</lpage>
          . Springer (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Wolff</surname>
            ,
            <given-names>K.E.</given-names>
          </string-name>
          :
          <article-title>Temporal Concept Analysis</article-title>
          .
          <source>In: ICCS-01 Workshop on Concept Lattice for KDD, 9th Int. Conference on Conceptual Structures</source>
          . pp.
          <fpage>91</fpage>–<lpage>107</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>