<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Spatial-based KDD Process to Better Understand the Spatiotemporal Phenomena</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Irstea - TETIS</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>rue J. F. Breton</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Montpellier</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Noumea</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>New Caledonia hugo.alatrista@univ-nc.nc</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>In this paper, we present a knowledge discovery process applied to hydrological data. To this end, we combine successive methods to extract knowledge from data collected at stations located along several rivers. First, the data are pre-processed in order to obtain different spatial proximities. Then, we apply two algorithms to extract spatiotemporal patterns and compare them. Such elements can be used to assess spatialized indicators that assist the interpretation of ecological and river monitoring pressure data.</p>
      </abstract>
      <kwd-group>
        <kwd>data mining</kwd>
        <kwd>sequential patterns</kwd>
        <kwd>spatiotemporal data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In response to the rapidly rising and widespread use of database technology,
including heterogeneous, geo-referenced and multidimensional data, there is
a growing interest in developing new techniques for extracting knowledge from
data. These techniques are the subject of the emerging field of Knowledge
Discovery in Databases (KDD). The KDD process is defined as the multi-step
process of discovering valid, novel, and potentially useful knowledge from large
databases [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        The KDD process can be very complex, and its steps may change significantly
depending on the origin of the data. For instance, when data are geo-referenced,
the KDD process is considerably limited because spatial information is not easy
to discern, although it can provide additional information. Currently, research in
geographic knowledge discovery (i.e., KDD on spatial databases) is very active [
        <xref ref-type="bibr" rid="ref17 ref3">17,
3</xref>
        ]. However, the spatial characteristics of data are not fully exploited in the
KDD process. For instance, a river pollution study may lead to different space
divisions following different pollution hypotheses. Knowing the impact of
spatiality on management strategies is essential to restore the ecological status of aquatic
environments.
      </p>
      <p>In this context, this paper focuses on the impact of the spatial components of
data on the KDD process. To this end, we propose a KDD process including two
steps. First, we pre-process the data in order to divide the space using two original
spatialization approaches. Second, we apply two different data mining techniques
that extract two semantically different kinds of spatiotemporal patterns.
These techniques are compared and their differences are discussed in detail in
this paper.</p>
      <p>This paper presents the current state of a PhD thesis, which focuses on
understanding and modeling spatiotemporal phenomena. This thesis is supervised by
Prof. Maguelonne Teisseire, Prof. Nazha Selmaoui-Folcher, Prof. Sandra Bringay
and Prof. Frederic Flouvat.</p>
      <sec id="sec-1-1">
        <title>Problem statement</title>
        <p>The water system, which structures the landscapes and ecosystems of metropolitan
France, covers more than 500,000 km and is divided among 6 water supply agencies
containing several watersheds<sup>1</sup>. This structure is a fragile environment subject to the
presence of many economic activities and usages that have increased the
vulnerability of water resources, including rivers, canals, lakes, etc. In this context,
river pollution is a phenomenon that is observed by measuring physicochemical
and biological indicators of water quality. These indicators evolve over
time and depend explicitly on the location of the sampling stations, which are strategically
positioned along several rivers.</p>
        <p>Two types of data are available: (1) static information related to the
station itself, e.g., its location (coordinates x, y), its reference code, etc., and (2)
dynamic information corresponding to the data measured by the station, e.g.,
the Standardized Global Biological Index (ibgn), the Biological Diatom Index (ibd),
the taxonomic variety (taxovar), the fish index (fishindex), etc.</p>
        <p>This manuscript is organized as follows. In Section 2, we present our
motivation and a detailed overview of the related work. Then, in Section 3, we
describe our generic knowledge discovery process. Next, we apply our
proposal to a dataset and show some of the patterns obtained. The paper ends with our
conclusions and some perspectives.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Motivation</title>
      <p>
        Knowledge discovery in databases (KDD) is a dynamic research field. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
the authors presented the most widely used KDD framework and provided a broad
overview of knowledge discovery techniques. There, KDD was described as a set
of interactive and iterative steps: data selection, pre-processing, transformation,
data mining, and post-processing or interpretation. As mentioned in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the
basic problem addressed by the KDD process is one of mapping low-level data
into other forms that might be more compact, more abstract, or more useful.
Data mining is only one step of this general process. Indeed, using data mining
alone can lead to the discovery of patterns that are meaningless to experts.
      </p>
      <p>
        In addition, the advent of GIS (Geographical Information Systems)
technology and the availability of large volumes of spatiotemporal data have increased
the need for effective and efficient methods to extract unknown and unexpected
information. Unfortunately, in many situations, a simple data mining method
will often be limited in its ability to retrieve informative knowledge from
complex spatiotemporal databases [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The specificity of environmental data, and
more generally of spatiotemporal data, with respect to classical data is the
significance of the spatial and temporal dimensions in the extraction and interpretation
process [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In this context, the authors of [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] highlight the importance of pre- and
post-processing in a KDD process concerning spatiotemporal data.
      </p>
      <p><sup>1</sup> In the context of surface water, a watershed is a geographic area bounded
peripherally by a water parting and draining to a common outlet: a point on a larger stream
or river, a lake, etc.</p>
      <p>
        Pre-processing and transformation steps (or more simply pre-processing)
are directly related to the data mining step because these steps have an
important impact on mining results. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], pre-processing is used to integrate spatial
information in the data mining step. Spatial data are converted into spatial
predicates. Thanks to this transformation, a commonly used data mining algorithm
can be used to extract spatial patterns. Moreover, classical data mining
algorithms take a simple table as input and do not consider spatial information
directly. For example, if the objective is to study changes in data generated by
monitoring stations, one way of extracting such spatial patterns is to aggregate
the information for each station in a single row of the input table. In [
        <xref ref-type="bibr" rid="ref12 ref18">12, 18</xref>
        ],
authors use this approach, and map their spatial data to sets (or sequences) of
values. Several pre-processing techniques in spatiotemporal datasets have been
discussed in the literature [
        <xref ref-type="bibr" rid="ref13 ref15 ref6">6, 13, 15</xref>
        ]. Each reference has its own focus, namely
spatial classification, spatial association rules and knowledge discovery, respectively.
      </p>
      <p>
        Referring to data mining, several solutions are proposed in the literature
to extract knowledge in a spatiotemporal database. Early works addressed the
spatial and temporal dimensions separately. In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], authors have studied
temporal sequences which only take into account the temporal dimension. Later,
in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], authors have extended these works to represent sets of environmental
features evolving over time. They extract sequences of characteristics that appear
frequently in areas, but without taking into account the spatial environment.
Other authors such as [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] or [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] looked for spatial patterns or co-locations, i.e.,
subsets of features (object-types) with instances often identified close in space.
Likewise, in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], the authors focus on the extraction of sequences representing the
propagation of spatiotemporal events in predefined time windows with respect to a
reference location. They introduce two concepts, Flow Patterns and Generalized
Spatiotemporal Patterns, in order to precisely extract the sequences of events that
occur frequently in some locations. On the other hand, in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the authors found that
the patterns discovered with other approaches are not always relevant
because they may not be statistically significant and, in particular, not dense in
space and time. Nevertheless, they study events one after another. Later, in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
the authors proposed the concept of Mixed-Drove Spatiotemporal Co-occurrence
Patterns, i.e., subsets of two or more different event-types whose instances are often
located in spatial and temporal proximity. However, they do not extract the frequent
evolutions of event-types over time (the events of each instance necessarily occur in
the same time slot).
      </p>
      <p>[Figure 1: General process. A spatiotemporal database undergoes space division (watercourse or ε-neighborhood), followed by pattern extraction (spatially sequential patterns or spatio-sequential patterns).]</p>
      <p>To our knowledge, no work has tried to mine sequential patterns at
different spatial granularity levels and then combine the results to obtain more
informative and general spatial patterns. In fact, the goal of spatial data mining
is to discover spatial patterns and to suggest hypotheses about the potential
generators of such patterns. This task is not straightforward and requires
us to challenge the classical KDD process. In this paper, we focus on spatial
patterns from the perspective of space division using different levels of spatial
granularity. This division is performed to deduce more general patterns by
averaging the attributes of spatial objects grouped into homogeneous areas. This first
pre-processing step is combined, on the one hand, with a classical
sequential pattern mining algorithm and, on the other hand, with a new method for
extracting spatiotemporal patterns (i.e., sequences of spatial sets of events). This second
data mining approach allows us to deal with the developments and interactions
between the study area and its immediate environment.</p>
      <p>Several phenomena can be studied using our KDD approach, e.g., soil
erosion, epidemic surveillance, river pollution, and many others. In this
paper, we apply our method to river pollution data, but our approach
has also been tested on dengue epidemiological monitoring data collected in New
Caledonia.</p>
    </sec>
    <sec id="sec-3">
      <title>General process for mining spatial databases</title>
      <p>Our approach is divided into two steps: (1) spatial decomposition and
aggregation, and (2) spatiotemporal pattern mining using two approaches. This general
process is illustrated in Figure 1.</p>
      <p>Spatial decomposition and aggregation are pre-processing steps in which
spatial data are mapped to sequences according to different spatial relationships (e.g.,
station proximity, watercourse). Thanks to this transformation, the spatial
features of the data are integrated into the KDD process.</p>
      <p>
        The resulting spatial sequences are used as the input of the data mining
step, which is composed of two methods. The first one extracts spatially
frequent sequences using a classical sequential pattern mining algorithm (see [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]);
the extracted patterns therefore represent spatially frequent temporal evolutions of
zones. The second one is a new approach that extracts patterns called
spatio-sequential patterns. This approach makes it possible to analyze changes in zones over
time while taking into account their neighboring environment. Notice that we obtain
two semantically different kinds of patterns.
      </p>
      <sec id="sec-3-1">
        <title>Spatial pre-processing</title>
        <p>
          The data at our disposal are associated with biological indicators collected by
monitoring stations strategically positioned along several watersheds. These heterogeneous
data are also geo-referenced and temporally variable, which makes them difficult
to explore globally. Moreover, several implicit spatial relationships between the
studied objects may be considered. For instance, a monitoring station is located
upstream (or downstream) along a specific watercourse, and it is also located
in an agricultural zone. It is therefore necessary to perform a pre-processing step that
takes into account different spatial proximities (e.g., grouping stations according
to their distance, according to their agricultural zone, etc.). In this work, we
do not study the evolution of events for each station independently; this kind
of approach has been widely studied, e.g., see [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. In this article, we propose two
different ways to explore spatial data.
        </p>
        <p>- A watercourse approach: for a given watercourse, any two stations X and Y
located on this watercourse are considered to be neighbors. For instance, in
Figure 2, stations W, X, Y and Z belong to the same watercourse; these stations
are therefore considered as a single area and their data are combined.
An example of an incident that can be studied thanks to this approach is a fuel
outflow from a boat at station X, which will impact the measures of station X and,
later, the measures of stations Y and Z located downstream of station X.</p>
        <p>- The ε-neighborhood approach: the space is divided into areas that group
stations by exploiting their Lambert coordinates. In each of these areas, stations
covering an area of ε km<sup>2</sup> are grouped, even if these stations belong to
different watercourses. For instance, in Figure 3, stations W, X and Y are
considered as a single area, even if they are not on the same watercourse.
An example of a phenomenon that can be studied with this approach is
pesticide use in a crop field located between stations X and Y, which can impact the
measures of stations located on rivers around this crop field even if these stations
are not positioned on the same river.</p>
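        <p>The two spatial divisions described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the station records, the function names group_by_watercourse and group_by_epsilon_grid, the use of kilometres for the coordinates, and the reading of the ε-neighborhood as a regular grid of square cells of area ε km<sup>2</sup> are all hypothetical.</p>
        <preformat><![CDATA[
```python
import math
from collections import defaultdict

# Hypothetical station records: (station id, watercourse id, x in km, y in km).
STATIONS = [
    ("W", "river_1", 0.5, 0.5),
    ("X", "river_1", 1.0, 2.0),
    ("Y", "river_2", 2.5, 1.5),
    ("Z", "river_2", 7.0, 8.0),
]


def group_by_watercourse(stations):
    """Watercourse approach: stations on the same watercourse form one area."""
    areas = defaultdict(list)
    for station_id, watercourse, _x, _y in stations:
        areas[watercourse].append(station_id)
    return dict(areas)


def group_by_epsilon_grid(stations, epsilon_km2):
    """Epsilon-neighborhood approach (assumed here to be a square grid):
    stations falling in the same cell of area epsilon_km2 form one area."""
    side = math.sqrt(epsilon_km2)  # side length of a square cell, in km
    areas = defaultdict(list)
    for station_id, _watercourse, x, y in stations:
        cell = (math.floor(x / side), math.floor(y / side))
        areas[cell].append(station_id)
    return dict(areas)


by_river = group_by_watercourse(STATIONS)
by_cell = group_by_epsilon_grid(STATIONS, epsilon_km2=9.0)
print(by_river)
print(by_cell)
```
]]></preformat>
        <p>With these hypothetical coordinates and ε = 9 km<sup>2</sup>, stations W, X and Y fall in the same cell although they lie on two different watercourses, which mirrors the situation described for Figure 3.</p>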
        <p>Thanks to these two spatial division methods, we are able to group the
stations within areas and thus to aggregate data in order to build spatial sequences,
i.e., sequences containing spatial characteristics of the data. Although the
"spatialization" cannot be perceived directly in the sequences, it becomes apparent in the
patterns obtained, as discussed in Section 4.1.</p>
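        <p>[Figure 2: watercourse approach; stations W, X, Y and Z are located on the same watercourse.]</p>
        <p>[Figure 3: ε-neighborhood approach; stations W, X and Y are grouped into a single area.]</p>
        <p>
          In this section, we present two data mining methods. The first one is a classical
algorithm ([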
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]) and the second one is a new approach called Spatio-Sequential
Patterns mining (or simply S2P mining).
        </p>
        <p>Sequential patterns mining: Consider the spatiotemporal database DB,
illustrated in Table 1, which groups all records made by stations dispersed along
several rivers (e.g. in Table 1, item A could be "good biological indicator IBGN").</p>
        <p>Each tuple T is a transaction and consists of a triplet (id-station, id-date,
itemset): the identifier of the station, the date of the record, and the set of quality
statuses of the river recorded at that date.</p>
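        <p>Let <inline-formula><tex-math>I = \{i_1, i_2, \ldots, i_m\}</tex-math></inline-formula> be the set of items (quality statuses). An itemset is a
non-empty set of items denoted by <inline-formula><tex-math>(i_1, i_2, \ldots, i_k)</tex-math></inline-formula>, where <inline-formula><tex-math>i_j</tex-math></inline-formula> is an item. A sequence S is
a non-empty ordered list of itemsets denoted by <inline-formula><tex-math>\langle IS_1, IS_2, \ldots, IS_p \rangle</tex-math></inline-formula>, where
<inline-formula><tex-math>IS_j</tex-math></inline-formula> is an itemset.</p>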
        <p>An n-sequence is a sequence of n itemsets. For example, consider the quality statuses
A, B, C, D and E recorded by the station Station1 according to the sequence
<inline-formula><tex-math>S = \langle (A, E)(B, C)(D)(E) \rangle</tex-math></inline-formula>, as shown in Table 1. This means that quality
statuses A and E were recorded together by Station1, i.e., at the same time.
Then, Station1 recorded B and C together. Finally, the last items in the sequence, D and E, were recorded
later and separately by the same station. In this example, S is a 4-sequence.</p>
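        <p>A sequence <inline-formula><tex-math>\langle IS_1, IS_2, \ldots, IS_p \rangle</tex-math></inline-formula> is a subsequence of another sequence
<inline-formula><tex-math>\langle IS'_1, IS'_2, \ldots, IS'_m \rangle</tex-math></inline-formula> if there exist integers <inline-formula><tex-math>k_1 &lt; \ldots &lt; k_j &lt; \ldots &lt; k_p</tex-math></inline-formula> such that
<inline-formula><tex-math>IS_1 \subseteq IS'_{k_1}, IS_2 \subseteq IS'_{k_2}, \ldots, IS_p \subseteq IS'_{k_p}</tex-math></inline-formula>. For example, the sequence
<inline-formula><tex-math>S' = \langle (B)(E) \rangle</tex-math></inline-formula> is a subsequence of S because <inline-formula><tex-math>(B) \subseteq (B, C)</tex-math></inline-formula> and <inline-formula><tex-math>(E) \subseteq (E)</tex-math></inline-formula>. However,
<inline-formula><tex-math>\langle (B)(C) \rangle</tex-math></inline-formula> is not a subsequence of S because the two itemsets (B) and (C)
are not included in two different itemsets of S. All quality statuses recorded by the same
station are grouped and sorted by date. The result is called the data sequence of the
station.</p>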
        <p>A station supports a sequence S if S is included in its data sequence (i.e., S is
a subsequence of the station's data sequence). The support of a sequence S is
calculated as the percentage of stations that support S.</p>
        <p>Let minsupp be a minimum support set by the user. A sequence that satisfies
the minimum support (i.e., whose support is greater than or equal to minsupp) is a frequent
sequence, called a sequential pattern.</p>
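        <p>The definitions of subsequence, support and sequential pattern can be illustrated with a short Python sketch. It is only an illustration of the definitions, not the mining algorithm of [14]: the function names and the data sequences of Station2 to Station4 are hypothetical, and the threshold is read as "greater than or equal to minsupp", which is an assumption.</p>
        <preformat><![CDATA[
```python
def is_subsequence(candidate, data_sequence):
    """True if the itemsets of `candidate` are included, in order,
    in distinct itemsets of `data_sequence` (greedy left-to-right match)."""
    position = 0
    for itemset in candidate:
        # Advance to the next itemset of the data sequence that contains `itemset`.
        while position < len(data_sequence) and not itemset <= data_sequence[position]:
            position += 1
        if position == len(data_sequence):
            return False
        position += 1  # the next candidate itemset must match strictly later
    return True


def support(candidate, data_sequences):
    """Fraction of stations whose data sequence supports `candidate`."""
    supporting = sum(is_subsequence(candidate, seq) for seq in data_sequences.values())
    return supporting / len(data_sequences)


def is_frequent(candidate, data_sequences, minsupp):
    """Assumption: a sequence is frequent when its support reaches minsupp."""
    return support(candidate, data_sequences) >= minsupp


# S is the example sequence of Station1; the other stations are hypothetical.
S = [{"A", "E"}, {"B", "C"}, {"D"}, {"E"}]
DATA = {
    "Station1": S,
    "Station2": [{"B"}, {"E"}],
    "Station3": [{"A"}, {"D"}],
    "Station4": [{"C"}],
}

print(is_subsequence([{"B"}, {"E"}], S))   # (B)(E) is a subsequence of S
print(is_subsequence([{"B"}, {"C"}], S))   # (B)(C) is not
print(support([{"B"}, {"E"}], DATA))
```
]]></preformat>
        <p>In this toy database, two of the four stations support the sequence (B)(E), so it is a sequential pattern for any minsupp up to 50%.</p>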
        <p>
          The interpretation of this first type of patterns depends strongly on how we
pre-process the data to be mined. Indeed, the main challenge of spatially frequent
sequences is to capture the spatial characteristics of the data by grouping stations
using different topologies before the data mining step. For more information,
see [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>Next, we present a second type of patterns whose semantics takes into
account the spatial relationships (e.g., neighborhood) between stations.</p>
        <p>Spatio-sequential patterns mining: On the spatiotemporal database
illustrated in Table 1, we define a neighborhood relationship between stations (or
sets of stations), denoted by Neighbor, as:
<disp-formula><tex-math>Neighbor(Station_i, Station_j) = \begin{cases} true &amp; \text{if } Station_i \text{ and } Station_j \text{ are close,} \\ false &amp; \text{otherwise.} \end{cases}</tex-math></disp-formula></p>
        <p>We define the In relationship between stations and itemsets, which describes
the occurrence of an itemset IS in station S at time t in the database DB: In(IS, S, t)
is true if IS is present in DB for station S at time t. In our example, consider the
itemset IS = (A, E); then In(IS, Station1, 2012/01/12) is true (see Table 1).</p>
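        <p>Spatial itemset. Let <inline-formula><tex-math>IS_i</tex-math></inline-formula> and <inline-formula><tex-math>IS_j</tex-math></inline-formula> be two itemsets. We say that <inline-formula><tex-math>IS_i</tex-math></inline-formula> and
<inline-formula><tex-math>IS_j</tex-math></inline-formula> are spatially close if and only if <inline-formula><tex-math>In(IS_i, Station_i, t) \wedge In(IS_j, Station_j, t) \wedge Neighbor(Station_i, Station_j)</tex-math></inline-formula> is true.</p>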
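        <p>A pair of itemsets <inline-formula><tex-math>IS_i</tex-math></inline-formula> and <inline-formula><tex-math>IS_j</tex-math></inline-formula> that are spatially close is called a spatial
itemset and is denoted by <inline-formula><tex-math>IS_T = IS_i \cdot IS_j</tex-math></inline-formula>.</p>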
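        <p>The In and Neighbor relationships, and the resulting notion of spatially close itemsets, can be sketched as follows. This is a hedged illustration only: the toy database, the neighbor pairs and the function names are hypothetical, and "present in DB" is interpreted here as set inclusion, which is an assumption.</p>
        <preformat><![CDATA[
```python
# Hypothetical toy database: (station, date) -> set of items recorded.
DB = {
    ("Station1", "2012/01/12"): {"A", "E"},
    ("Station2", "2012/01/12"): {"B"},
    ("Station3", "2012/01/12"): {"C"},
}

# Hypothetical neighborhood relation, stored as unordered pairs of stations.
NEIGHBORS = {frozenset({"Station1", "Station2"})}


def neighbor(station_i, station_j):
    """Neighbor(Station_i, Station_j): true if the two stations are close."""
    return frozenset({station_i, station_j}) in NEIGHBORS


def occurs_in(itemset, station, date):
    """In(IS, S, t): true if the itemset is recorded for station S at time t
    (assumption: 'present' means included in the recorded set of items)."""
    return itemset <= DB.get((station, date), set())


def spatially_close(itemset_i, station_i, itemset_j, station_j, date):
    """Conjunction In(IS_i, S_i, t) and In(IS_j, S_j, t) and Neighbor(S_i, S_j)."""
    return (
        occurs_in(itemset_i, station_i, date)
        and occurs_in(itemset_j, station_j, date)
        and neighbor(station_i, station_j)
    )


date = "2012/01/12"
print(spatially_close({"A", "E"}, "Station1", {"B"}, "Station2", date))
print(spatially_close({"A", "E"}, "Station1", {"C"}, "Station3", date))
```
]]></preformat>
        <p>Here the itemsets (A, E) and (B) form a spatial itemset because Station1 and Station2 are declared neighbors, whereas (A, E) and (C) do not, since Station3 is not a neighbor of Station1 in this toy relation.</p>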
        <p>To simplify the notation, we introduce a group operator for itemsets to be
combined by the operator <inline-formula><tex-math>\cdot</tex-math></inline-formula> (near), denoted by []. The symbol <inline-formula><tex-math>\theta</tex-math></inline-formula> represents the
absence of itemsets in a zone. Figure 4 shows the three types of spatial itemsets
that we can build with the proposed notation. The dotted lines represent the spatial
neighborhood relationship.</p>
        <p>We now define the notion of zone evolution according to the spatial
neighborhood relationship.</p>
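        <p>Spatial Sequence. A spatial sequence, or simply S2, is an ordered list of
spatial itemsets, denoted by <inline-formula><tex-math>s = \langle IS_{T_1} IS_{T_2} \ldots IS_{T_m} \rangle</tex-math></inline-formula>, where <inline-formula><tex-math>IS_{T_i}</tex-math></inline-formula> and <inline-formula><tex-math>IS_{T_{i+1}}</tex-math></inline-formula> satisfy
the constraint of temporal sequentiality. An S2 <inline-formula><tex-math>s = \langle (AB)(\theta \cdot [B; C])(P \cdot [Q; R]) \rangle</tex-math></inline-formula>
is illustrated in Figure 5, where the arrows represent the temporal dynamics and
the dotted lines represent the environment.</p>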
        <p>[Figure 4: the three types of spatial itemsets. Figure 5: example of a spatial sequence.]</p>
        <p>
The main challenge involved in the spatio-sequential patterns mining
problem is to study the evolution of characteristics/events in monitoring
stations taking into account immediate surrounding areas (for more information,
see [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]). Given a minimum support, the method extracts the
frequent sequences, i.e., sequences whose support is greater than or equal to the minimum
support fixed by the user.
        </p>
        <p>More formally, let minsupp be a minimum threshold set by the user. A
spatial sequence S2 is frequent if <inline-formula><tex-math>STPi(S2) \geq minsupp</tex-math></inline-formula>. These frequent sequences
are called spatio-sequential patterns, or simply S2P.</p>
        <p>It is important to notice that both data mining methods (spatially sequential
patterns and spatio-sequential patterns) can be used with any spatialization
approach. For instance, monitoring stations located on watercourses can be grouped
by district in order to study the impact of river pollution between neighboring
districts.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Some results</title>
      <p>In this section, we present a qualitative evaluation by giving some examples of
spatiotemporal patterns extracted by the two data mining methods. Later, these
two kinds of patterns are compared semantically.</p>
      <p>Spatially sequential patterns: Table 2 shows some spatially sequential
patterns extracted from the RM water supply agency dataset using the ε-neighborhood
spatialization approach. We obtain sequences of itemsets
(sets of items, i.e., events), which characterize the evolution over time of a set of stations
covering 10 km<sup>2</sup>. For instance, the spatially frequent sequence
⟨(ibd:&gt;16.010)(ibd:(12.990;16.010] taxovar:(19.500;31.500])⟩ can be interpreted
as follows: frequently, a high value of IBD is recorded before a decrease of the IBD
indicator associated with a medium value of taxonomic variety.</p>
      <p>Spatio-sequential patterns: Table 3 shows some spatio-sequential patterns
(S2P) extracted from the RMC dataset using the watercourse spatialization
approach. We obtain sequences of itemsets (i.e., sets of
items or events), which characterize the evolution over time of a zone and its near
surroundings. Recall that a zone groups a set of stations
placed on a specific watercourse, and that the stations located on nearby watercourses
compose its near surroundings.</p>
      <p>For instance, in Table 3, the second S2P ⟨(θ·[ibd:&lt;=13.325; taxovar:&lt;=15.500;
ibgn:&lt;=4.500])⟩ means that low values of the IBD, taxonomic variety and
IBGN indicators often appear together; consequently, we can assume that the water
quality is seriously affected in some watercourses belonging to the RM water
supply agency. Moreover, the third S2P ⟨(ibd:&gt;21.216)(θ·taxovar:(17.500;29.500])
(ibd:(13.985;21.216])⟩ can be interpreted as follows: in 30% of areas, a high value of
IBD appears before the occurrence of a medium value of taxovar in a neighboring
watercourse, followed by a decrease of the IBD indicator.</p>
      <sec id="sec-4-1">
        <title>Discussion: Spatially sequential patterns vs. spatio-sequential patterns</title>
        <p>In this data mining process, we have focused on the extraction of spatiotemporal
patterns. In this context, we have proposed two methods that allow us to include
spatial characteristics in the obtained patterns. These two techniques differ
substantially in their process, and their results are semantically different. The first
one uses a widely used sequential pattern mining algorithm, whereas the other
uses a new method called spatio-sequential pattern mining. Both techniques
have been applied to a real database that has been pre-processed in order
to divide the space into homogeneous zones following two pollution hypotheses
(see Section 3.1).</p>
        <p>The first kind of patterns represents the evolution of a set of characteristics
(biological indicators) belonging to a set of monitoring stations grouped using
different spatial proximities. It is important to notice that, by applying the same
algorithm to the same database with the same minimum support but with the
two spatial division methods, we obtain two different sets of spatially sequential
patterns. This difference is reflected not only in the number of extracted patterns
but also in their composition. To determine which spatialization approach
is more interesting for experts, we can apply a post-processing technique such as
clustering (e.g., k-means) combined with a statistical measure (e.g., the sum of squared
errors).</p>
        <p>The second kind of patterns also represents changes in zones over time;
nevertheless, it includes additional information, i.e., events that appeared in
neighboring areas. In this kind of pattern, we can directly perceive the spatial
relationships between neighboring areas thanks to the spatial operator "close to".
The extraction of this additional information directly impacts the performance
of our algorithm, since the search space increases with the number of neighbors
to be evaluated. On the other hand, this additional information can be crucial in
decisions concerning the preservation and restoration of rivers and their surrounding
environments.</p>
        <p>It is important to notice that the first kind of patterns is included in the
second one. Indeed, spatio-sequential pattern mining is an extension of
spatially sequential pattern mining that takes the neighboring areas into account.</p>
        <p>These two proposed approaches are generic. Indeed, we have also applied
our approaches to other real datasets; e.g., some results for the epidemic
monitoring of dengue fever and a visualization prototype are available on
http://datamining.univ-nc.nc/.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and perspectives</title>
      <p>In this paper, we have presented the first steps of a data mining project on
hydrological data. In particular, we applied two algorithms for spatiotemporal
pattern extraction according to two spatialization approaches. Moreover, a
detailed comparison of these two data mining techniques has been included
in this work. We highlighted the problems that arise depending on the choices
made in terms of spatialization, and their influence on the number of extracted
patterns. This work has been conducted blind, i.e., without the intervention of
data specialists. The results underline the difficulties involved in pre-processing
research data without a thorough knowledge of the study area in question.</p>
      <p>The perspectives of this work are numerous. First, regarding the data
processed, additional elements on water pressures are currently being acquired.
Indeed, the exact determination of the condition of a watercourse requires
other indicators that are absent from the data presently studied. Then, for the
extraction phase, we would like to compare different data mining techniques in
terms of the patterns obtained. In addition, a huge number of patterns have been
extracted. We have therefore proposed a new quality measure, called the least
temporal contradiction, to filter the most relevant patterns. This measure allows
us to estimate how many times a rule is verified versus how many times it is contradicted.
A pattern that is more often contradicted than verified is a priori irrelevant. This
measure is being adapted to spatio-sequential patterns.</p>
      <p>We have also proposed a visualization prototype, which is available on
http://datamining.univ-nc.nc/ and allows us to visualize the spatial dynamics of the
spatio-sequential patterns extracted from the dengue fever dataset.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bogorny</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Engel</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvares</surname>
            ,
            <given-names>L. O.</given-names>
          </string-name>
          ,
          <article-title>Spatial data preparation for knowledge discovery</article-title>
          .
          <source>IEEE Computer</source>
          Graphics pp.
          <volume>24</volume>
          (
          <issue>5</issue>
          ),
          <volume>8</volume>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alatrista-Salas</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bringay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flouvat</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Selmaoui-Folcher</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Teisseire</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>The Pattern Next Door: Towards Spatio-sequential Pattern Discovery</article-title>
          .
          <source>Advances in Knowledge Discovery and Data Mining - 16th Pacific-Asia Conference, PAKDD</source>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>168</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <article-title>Combined mining: Discovering informative knowledge in complex data</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics</source>
          <volume>41</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>699</fpage>
          -
          <lpage>712</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Alatrista-Salas</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cernesson</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bringay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azé</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flouvat</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Selmaoui-Folcher</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Teisseire</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>Recherche de séquences spatio-temporelles peu contredites dans des données hydrologiques</article-title>
          .
          <source>Revue des Nouvelles Technologies de l'Information (RNTI)</source>
          , RNTI-E-
          <volume>22</volume>
          , pp.
          <fpage>165</fpage>
          -
          <lpage>188</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Celik</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shekhar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shine</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <article-title>Mixed-drove spatiotemporal cooccurrence pattern mining</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>20</volume>
          (
          <issue>10</issue>
          ), pp.
          <fpage>1322</fpage>
          -
          <lpage>1335</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Elias</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <article-title>Extracting landmarks with data mining methods</article-title>
          .
          <source>In: Spatial Information Theory. Foundations of Geographic Information Science</source>
          . Vol.
          <volume>2825</volume>
          of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, pp.
          <fpage>375</fpage>
          -
          <lpage>389</lpage>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fayyad</surname>
            ,
            <given-names>U. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piatetsky-Shapiro</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <article-title>Advances in knowledge discovery and data mining</article-title>
          .
          <source>American Association for Artificial Intelligence</source>
          , Menlo Park, CA, USA, Ch.
          <article-title>From data mining to knowledge discovery: an overview</article-title>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <article-title>Temporal and Spatio-Temporal Data Mining. Gale Virtual Reference Library</article-title>
          .
          <source>IGI Pub</source>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <article-title>A framework for mining sequential patterns from spatio-temporal event data sets</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>20</volume>
          (
          <issue>4</issue>
          ) pp.
          <fpage>433</fpage>
          -
          <lpage>448</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gibert</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Izquierdo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Athanasiadis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Comas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanchez-Marre</surname>
          </string-name>
          ,
          <article-title>On the role of pre and post-processing in environmental data mining</article-title>
          .
          <source>In: The iEMSs: International Congress on Environmental Modeling and Software Integrating Sciences and Information Technology for Environmental Assessment and Decision Making</source>
          . Vol.
          <volume>3</volume>
          , pp.
          <fpage>1937</fpage>
          -
          <lpage>1958</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koperski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Stefanovic</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <article-title>Geominer: a system prototype for spatial data mining</article-title>
          .
          <source>In Proc. of ACM SIGMOD, SIGMOD '97</source>
          , pp.
          <fpage>553</fpage>
          -
          <lpage>556</lpage>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Koperski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <article-title>Discovery of spatial association rules in geographic information databases</article-title>
          .
          <source>In: Advances in Spatial Databases</source>
          . Vol.
          <volume>951</volume>
          of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, pp.
          <fpage>47</fpage>
          -
          <lpage>66</lpage>
          (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mennis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J. W.</given-names>
          </string-name>
          ,
          <article-title>Mining association rules in spatio-temporal data: An analysis of urban socioeconomic and land cover change</article-title>
          .
          <source>Transactions in GIS</source>
          <volume>9</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>5</fpage>
          -
          <lpage>17</lpage>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mortazavi-Asl</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinto</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dayal</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>M.-C.</given-names>
          </string-name>
          ,
          <article-title>Mining sequential patterns by pattern-growth: The PrefixSpan approach</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>16</volume>
          (
          <issue>11</issue>
          ), pp.
          <fpage>1424</fpage>
          -
          <lpage>1440</lpage>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>Knowledge discovery from soil maps using inductive learning</article-title>
          .
          <source>International Journal of Geographical Information Science</source>
          <volume>17</volume>
          (
          <issue>8</issue>
          ), pp.
          <fpage>771</fpage>
          -
          <lpage>795</lpage>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Shekhar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <article-title>Discovering Spatial Co-Location Patterns: A Summary of Results</article-title>
          .
          <source>Advances in Spatial and Temporal Databases</source>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>256</lpage>
          (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Triki</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frihida</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Ghezala</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claramunt</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <article-title>Modèle et langage pour la manipulation de trajectoires spatio-temporelles</article-title>
          .
          <source>In: International Journal of Geomatics and Spatial Analysis IJGSA</source>
          , vol.
          <volume>20</volume>
          /1, pp.
          <fpage>37</fpage>
          -
          <lpage>64</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Tsoukatos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gunopulos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <article-title>Efficient mining of spatiotemporal patterns</article-title>
          .
          <source>In: Advances in Spatial and Temporal Databases. In Lecture Notes in Computer Science</source>
          . Springer Berlin / Heidelberg, Vol.
          <volume>2121</volume>
          , pp.
          <fpage>425</fpage>
          -
          <lpage>442</lpage>
          (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>Mining generalized spatio-temporal patterns</article-title>
          .
          <source>In Database Systems for Advanced Applications</source>
          , Springer, pp.
          <fpage>649</fpage>
          -
          <lpage>661</lpage>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>