=Paper= {{Paper |id=None |storemode=property |title=Trajectory Sequential Patterns with Regular Expression Constraints Including Spatial Queries |pdfUrl=https://ceur-ws.org/Vol-619/paper7.pdf |volume=Vol-619 |dblpUrl=https://dblp.org/rec/conf/amw/GardellaGV10 }} ==Trajectory Sequential Patterns with Regular Expression Constraints Including Spatial Queries== https://ceur-ws.org/Vol-619/paper7.pdf
  Traje tory Sequential Patterns with Regular
Expression Constraints In luding Spatial Queries
                          1                         2                           3
      Juan P. Gardella        and Leti ia I. Gómez      and Alejandro A. Vaisman


                             Universidad Na ional de La Plata
                              1

                                 jpgardelsenasa.gov.ar
                         2
                           Instituto Te nólogi o de Buenos Aires
                                   lgomezitba.edu.ar
                             3
                                Universidad de Buenos Aires
                                   avaismand .uba.ar


         Abstra t. Moving obje t (MO) data representation and omputing have
         re eived a fair share of attention over re ent years from the database
          ommunity. Repla ing raw traje tory data (i.e., MO positions at dier-
         ent time instants) by sequen es of appli ation-dependent stops o urred
         at so- alled pla es of interest (PoIs) leads to the notion of semanti tra-
         je tories. Dierent te hniques exist for sequential pattern analysis of tra-
         je tories dened in this way. One of them, RE-SPaM, expresses sequen-
         tial patterns by means of regular expressions built not only over item
         identiers, but also over onstraints dened on the (temporal and non-
         temporal) attributes of the items to be analyzed. This analysis ould be
         greatly enri hed if spatial and non-spatial data asso iated with the MO
         are taken into a ount. In this paper we show how we an take advan-
         tage of the extensibility properties of RE-SPaM to augment its expressive
         power by allowing to in lude spatial queries in the onstraints. For this,
         we make use of Piet, a framework allowing to integrate OLAP, GIS and
         MO data, and its asso iated query language denoted Piet-QL, providing
         a link between moving obje t data and their geographi environment.


1     Introdu tion

Typi al queries in Geographi         Information Systems (GIS) [10℄ ask for geomet-
ri   obje ts that satisfy some      ondition, or involve the aggregation of geographi
measures (i.e. area, length). Although it is usual in GIS pra ti e to store non-
spatial data in themati layers (also alled themes ), when aggregation is involved,
non-spatial GIS data       an be stored in a data warehouse to a hieve better perfor-
man e. Then, OLAP (On Line Analyti al Pro essing) [9℄ tools and algorithms
 an be used for exploiting the data warehouse. In OLAP, data are per eived as
a data    ube where ea h ell ontains a measure or set of measures representing
fa ts and the  ontextual information whi h onforms dimensions. A re ently pro-
posed paradigm, denoted SOLAP [11℄ states the basi            requirements for e iently
providing integration between GIS and OLAP.
     Moving obje t data (MOD) appli ations [7℄ have been steadily gaining at-
tention from the GIS          ommunity. The behavior of moving obje ts is tra eable
by means of ele troni        devi es (e.g., GPS, RFID), produ ing traje tory data,
whi h     an be analyzed in order to obtain interesting patterns. Sin e lo ations of
a moving obje t are reported as a time-ordered sequen e, sequential pattern al-
gorithms appear as natural tools for querying and mining traje tory databases.
Moving obje t positions are        aptured at a given time interval, with a       ertain
granularity. Thus, the traje tory of a moving obje t is given by samples              om-
posed of a nite number of tuples of the form < Oid, x, y, t >, stating that at a
 ertain point in time, namely t, the obje t Oid was lo ated at         oordinates (x, y).
A very a tive resear h area in this setting is the dis overy of sequential patterns
in traje tories [3, 8℄. Many re ent proposals perform this kind of analysis not
over the original MOD, but over a database built based on the ideas introdu ed
by Spa     apietra et al. [12℄, where it is assumed that obje ts move over a map
that    ontains disjoint geometries, and there is also semanti       information asso i-
ated with them in the form of attributes. These geometries are denoted Pla es
of Interest (PoIs). When a moving obje t spends a fair amount of time within a
PoI, the PoI is onsidered a stop of the traje tory, and all (x, y) points in the PoI
are repla ed by a data obje t representing a stop. This allows           onsidering ea h
obje t's traje tory as a sequen e of stops instead of a sequen e of points. Thus,
sequential pattern analysis      an be applied to this approximation of traje tories,
also    alled semanti    traje tories, given that they   an provide more information
than the one provided by the x, y, t points alone.



1.1     Problem statement and           ontributions

Several authors have emphasized the need of dis overing patterns in traje tories
at dierent temporal and/or spatial granularities. However, to the best of our
knowledge, the problem of nding sequential patterns that a           ounts for the   har-
a teristi s of the geographi     environment in whi h the obje ts move has not been
addressed so far. Here we study this problem and provide a solution, building
over previous work in the elds of SOLAP and sequential pattern mining.
      Gómez et   al. re ently introdu ed RE-SPaM, a language that            an express
sequential patterns by means of regular expressions over           onstraints dened in
terms of the (temporal and non-temporal) attributes of the items to be analyzed,
where an item is a tuple       omposed of an obje t identier, a time instant, and a
                      4
pla e of interest [5℄ . These patterns      an be used during the sequential pattern
mining pro ess to prune sequen es that, although satisfying minimum support
requirements, are not of interest to the user [2℄.
      In order to retrieve spatial data, we need a query language that         an return
                                                      5
spatial obje ts. For this, we use Piet-QL [6℄ , a language that supports the
Piet data model [4℄, a proposal aimed at integrating GIS and OLAP in a single
framework. Piet-QL is an SQL-like query language that           an express   omplex and
powerful spatial and OLAP queries, and supports the operators in luded in the
                                                6
Open Geospatial Consortium spe i ation             for SQL. In addition, it in orporates
4
  http://piet.exp.d .uba.ar/mo-patterns
5
  http://piet.exp.d .uba.ar/pietql
6
  http://www.opengeospatial.org
                                                                                      7
the ne essary syntax to integrate OLAP operations through the MDX standard .
The language resulting from the integration between RE-SPaM and Piet-QL is
                  ++
denoted RE-SPaM      . This language delivers apabilities not present in other
proposals, whose usefulness we show in this paper.
      Throughout the paper we assume we have traje tory data, originally in the
form (Oid , x, y, t), that are transformed in sequen es of PoIs. Obje ts move in a
geographi     spa e represented in a real-world map of Belgium,        onsisting in ve
layers,   ontaining geographi    information on rivers, regions, provin es, distri ts,
and     ities. The rivers are represented as polylines,   ities as points, and the other
layers as polygons. There is also a data warehouse with information about stores
and sales for dierent regions in Belgium.



1.2     Related work


Some re ent proposals address the problem of mining patterns in MODs, based
on the notion of pla es of interest. Giannotti et.al. introdu ed t-patterns, for
mining sequential patterns on regions of interest [3℄. A t-pattern is of the form
                1h10min                    2h15min
Railway Station         > Castle Square            > Museum. Karli and Saygin
[8℄ propose to obtain patterns over so- alled important pla es (a region where a
tra ed obje t spends a fair amount of time) at dierent time granularities. Note
that these patterns are basi ally dened by extension. On the          ontrary, regular
expressions allow dening more         omplex patterns, intensionally. The idea of
using regular expressions for traje tory analysis was rst proposed by Mouza and
Rigaux [1℄. They present a language based on regular expressions for querying
mobility patterns where ea h zone is represented by its label (a        onstant) or by
a variable (x). In this language, ea h o      urren e of a variable in the pattern is
instantiated with the same value. The language, however, has some limitations.
For instan e, it   annot deal with time    onstraints or    ategories, neither supports
variables or data des ribing the geographi        environment. Gómez et al. extend
this language using regular expressions to express sequential patterns over pla es
of interest [5℄, resulting in the RE-SPaM language, dis ussed in Se tion 2.
      The remainder of the paper is organized as follows. Se tion 2 gives an overview
                                                                                  ++
of the languages involved in our proposal. Se tion 3 introdu es the RE-SPaM
query language and provides examples. Se tion 4 des ribes how the sequential
pattern mining algorithm is modied to allow an e ient implementation of our
ideas. We     on lude in Se tion 6.



2      Preliminaries

Piet-QL. From the kinds of queries Piet-QL supports, we are interested in the
ones that return spatial obje ts. We introdu e Piet-QL through an example.
Consider the query Distri ts in Belgium with at least one sale in 2007.


7
    http://msdn2.mi rosoft. om/en-us/library/ms145506.aspx
SELECT GIS bel_dist.name
FROM bel_dist
WHERE bel_dist IN(
   SELECT CUBE
   filter([Store℄.[Store Distri t℄.Members,
   [Measures℄.[Unit Sales℄>0)
   FROM [Sales℄
   sli e [Time℄.[2007℄)

   Here, bel_dist represents a layer      ontaining the distri ts in Belgium. The
keyword GIS tells that the query returns spatial obje ts. The query           ontains a
sub-query of OLAP type (indi ated by the CUBE keyword), that is, a query that
returns a data   ube. This sub-query is expressed in a language whi h is a slight
variation of MDX, operates over a data     ube,   alled Sales, takes a   ube sli e   or-
responding to sales in 2007, and lters out the stores with no sales. The hierar hy
of the Store dimension in the      ube is of the form storeId -> store     ity -> store
distri t -> store provin e, meaning that store sales aggregate over      ities, distri ts
and provin e, in that order. The expression ontaining the path [Store℄.[Store
Distri t℄.Members in the sub-query returns the distri ts with at least one unit
sold. System metadata allows mat hing the identiers of the geometri            obje ts
with the identiers of the level members in the OLAP dimension (in the example
above, the members of the level Store Distri t ). Finally, we obtain the distri ts
in the layer bel_dist with at least one unit sold. A detailed des ription of the
language    an be found in [6℄.



RE-SPaM. The RE-SPaM data model is basi ally omposed of ategory s hemas,
 ategory o    urren es,   ategory instan es, and the table of items (ToI). Our
tourist appli ation in ludes four     ategory s hemas, namely hotels, restaurants,
airports and tourist attra tions. Ea h     ategory s hema is    omposed of a set of
attributes that des ribe it. An element in a   ategory is denoted a ategory o ur-
ren e, and the set of all o urren es in all ategories in an appli ation is denoted
a ategory instan e. A set of ategory instan es for our running example is shown
in Figure 1 (for example, the ategory hotels has two o urren es). A value of the
attribute geom represents the geometri extension of the orresponding ategory
o urren e. For example, in the rst tuple, pol1 an be Point(10 20). Adding a
time interval to a   ategory o    urren e, produ es an item. The time interval of
an item is des ribed by its initial and nal instants, and denoted [ts, tf ℄. A pair
(Oid , item) is a tuple in the ToI. For the same Oid , the time-ordered sequen e
of items represent the semanti  traje tory of the obje t. Figure 2 shows a nor-
malized instan e of the ToI       orresponding to the   ategory instan es of Figure
1. There are two moving obje ts, O1 and O2 , and the table         ontains only the
Oid , the   ategory o   urren e identier, and the temporal attributes. All other
attributes are stored elsewhere.
   Over this model, a pattern language based on regular expressions is built.
The atoms in RE-SPaM are          onstraints expressed as formulas over attributes
Category      Instan e
hotels         [(ID, H1), (categoryN ame, hotel), (geom, pol1), (star, 3)]
               [(ID, H2), (categoryN ame, hotel), (geom, pol2), (star, 5)]
               [(ID, R1), (categoryN ame, restaurant), (geom, pol3), (typeOf F ood, F rench), (price, cheap)]
restaurants    [(ID, R2), (categoryN ame, restaurant), (geom, pol4), (typeOf F ood, F rench), (price, expensive)]
               [(ID, R3), (categoryN ame, restaurant), (geom, pol5), (typeOf F ood, Italian), (price, cheap)]
               [(ID, A1), (categoryN ame, airport), (geom, pol6), (type, International)]
airports       [(ID, A2), (categoryN ame, airport), (geom, pol7), (type, Local)]
               [(ID, A3), (categoryN ame, airport), (geom, pol8), (type, International)]
attra tions    [(ID, C1), (categoryN ame, touristattraction), (geom, pol9), (name, Cathedralof O.L.), (price, f ree)]
               [(ID, C2), (categoryN ame, touristattraction), (geom, pol10), (name, Castleof G.theD.), (price, f ree)]



                                          Fig. 1. A set of instan es


                      OID Items
                               ([(ts,04/08/2008 14:05), (tf, 04/08/2008 14:33), (ID,R2)℄)
                               ([(ts,04/08/2008 17:30), (tf,04/08/2008 18:48), (ID,R3)℄)
                         O1
                               ([(ts,08/08/2008 06:22), (tf,08/08/2008 07:05), (ID,R1)℄)
                               ([(ts,08/08/2008 17:10), (tf,08/08/2008 18:17), (ID,R1)℄)
                               ([(ts,19/08/2008 09:00), (tf,19/08/2008 10:20), (ID,R1)℄)
                         O2
                               ([(ts,19/08/2008 17:00), (tf,19/08/2008 18:12), (ID,R2)℄)



                                Fig. 2. An instan e of the Normalized ToI




of the        omplex items dened above. Constraints                     onsist in      onjun tions of ex-
pressions, en losed between squared bra kets. The regular expression language is
built in the usual way, supporting the standard operators (`()',`*',`+',` ?',`.',`|').
The language also supports variables (strings pre eded by `').
    As an example, a pattern expressing traje tories of tourists who visit hotel
H1 and then a pla e                hara terized as ` heap' or that serves Fren h food, reads:
    [ID=`H1'℄.([pri e=` heap'℄|[typeOfFood=`Fren h'℄)
    The se ond                onstraint does not mention IDs, only                 ategori al attributes.
The disjun tion is evaluated as follows: ` heap' pla es are restaurants R1 and
R3 (Figure 1). Pla es that serve Fren h food are R1 and R2. During the mining
pro ess, the items whi h satisfy these                 onditions are        omputed, without the need
of expli it enumeration of all the possibilities.
    Fun tions are supported in RE-SPaM in the forms fun tionName(attr, ...) =
` onstant', and fun tionName(attr, ...) = variable, and                           an be dened ad-ho .
Synta ti ally, the rst parameter may be an attribute of a                             ategory o       urren e
(for example, typeOfFood in our running example), or a temporal attribute (ts,
tf, or their subparts). All other parameters must be literals, and the fun tion
also returns a literal. For example, a fun tion compares(price, c),                             ompares the
attribute pri e with a literal, and returns `equal', `less', or `greater than'; the
rst parameter is an attribute of the                     ategory o       urren es of restaurants and
tourist attra tions, and the se ond one is a                      onstant. The fun tion             an be in-
voked as compares(price,`100'). Also rollup fun tions à la OLAP                               an be dened
to return ranges of time for a temporal attribute of an item (e.g., `Early Morn-
ing', `Morning',..). The query Traje tories that visit two pla es (the se ond one
oering        heap pri es), at the same part of the day (e.g., both of them during the
morning) on O tober 10th, 2008 uses this fun tion, reading:
[rollup(ts_time,`range',`Time')=@z ∧ ts_date=`10/10/2008'℄.
[rollup(ts_time, `range',`Time')= @z ∧ ts_date=`10/10/2008' ∧
pri e=` heap'℄
    Note that RE-SPaM       ould be used as a query language over the traje tory
database, or to prune the patterns obtained during the mining pro ess.



3    The RE-SPaM++ Language

Sin e moving obje ts evolve in a geographi      environment, we would like to allow
                                                                              ++
geometri     onditions to be in luded in the patterns. We present RE-SPaM        , a
language that integrates Piet-QL and RE-SPaM allowing to add SOLAP              ondi-
tions to    onstraints in the regular expressions of RE-SPaM. Synta ti ally, this
extension is very simple: we only add a WITH statement to a Piet-QL SELECT
 lause. This statement generates a sort of materialized view that is used in a
RE-SPaM expression. Thus, the language allows not only single statements but
also programs     omprising sequen es of Piet-QL and RE-SPaM statements.
    The kinds of fun tions dis ussed in Se tion 2 are not enough to support
geometri     onditions in regular expression-based     onstraints. A Piet-QL query
returns a    ursor over tuples (i.e., a set of literals), not a literal. Thus, we need
to dene new kinds of fun tions. The syntax for these fun tions           onsists in a
rst parameter whi h      orresponds to an attribute of a    ategory o    urren e (for
example, geom ) or a temporal attribute (ts, tf, or their sub-parts). The se ond
parameter must be of the form a.b, where the semanti s is that b is the name
of an attribute, and a is the name of a table asso iated to some WITH           lause.
The fun tion returns a literal.
    For example, if a Piet-QL query returns the geometries of regions       rossed by
rivers, in a stru ture named r.geom (using the WITH          lause), we   an then use
this result to dene a fun tion that      he ks whether the value of the attribute
geom (e.g., the geometry of the PoI in our running example) is ontained by any
of the geometries in the ursor dened by r.geom. The fun tion returns `true'
or `false', and it is invoked as containedBy(geom, r.geom). We now give some
                                             ++
examples that illustrate the use of RE-SPaM     .
Q1. Traje tories that stop at a pla e whi h belongs to a region that        ontains a
river, and whose next stop is an airport or a tourist attra tion.
WITH TABLE regRiver(the_geom) AS
SELECT GIS DISTINCT(bel_regn.the_geom)
FROM bel_regn, bel_river
WHERE ontains(bel_regn.the_geom,bel_river.the_geom);

[containedBy (geom, regRiver.the_geom)=`true'℄.
([ ategoryName=`Airport'℄|[ ategoryName=`Tourist Attra tion'℄)

    The Piet-QL part returns a set of geometri      obje ts (polygons) representing
regions    ontaining rivers, in the   ursor regRiver(the_geom). In the RE-SPaM
part of the query, the rst   onstraint    he ks if the PoI is   ontained in one of the
regions in the set. In other words, when an item in the Table of Items is being
                                                                 ++
evaluated (e.g., during the mining pro ess or just using RE-SPaM     as a query
language), the    orresponding PoI geometry (represented by the attribute geom)
is   ompared against ea h of the geometri       elements in the    ursor.


Q2. Traje tories that stop at a pla e with          heap pri es, whi h is very       lose
to a distri t lo ated in a region       rossed by a river, and then at the Castle of
Gerard the Devil (G. the D.), nishing there.


WITH TABLE distri t(the_geom) AS
SELECT GIS DISTINCT(bel_dist.the_geom)
FROM bel_dist, bel_regn, bel_river
WHERE interse ts(bel_regn.the_geom, bel_river.the_geom) and
 ontains(bel_regn.the_geom, bel_dist.the_geom);
[pri e=` heap' ∧ short _distance (geom,
distri t.the_geom)=`true'℄.[name=`Castle of G. the D.'℄)
     This example also shows how the Piet-QL part of the query is used to link
the traje tories of the moving obje ts to the geographi      spa e where they evolve.
Here, the Piet-QL query returns distri ts (i.e., polygons) in a map. At evaluation
time, ea h geometry of the PoI where a traje tory stops is           ompared with the
geometry of ea h distri t in the    ursor, to    he k if the PoI being visited is    lose
to it (we are not interested in how this ` loseness' is    omputed, we just give this
query as an example to illustrate the power of the language).


Q3. Traje tories that visit a pla e with      heap pri es and then stop at an airport
(nishing there), su h that both stops are either lo ated in regions        rossed by a
river (although not ne essarily the same region), or not         rossed by rivers.
WITH TABLE reCrRi(the_geom) AS
SELECT GIS DISTINCT(bel_regn.the_geom)
FROM bel_regn, bel_river
WHERE interse ts(bel_regn.the_geom, bel_river.the_geom);
[pri e=` heap' ∧ containedBy (geom,reCrRi.the_geom)= @x℄.
[ ategoryName=`Airport' ∧ containedBy (geom,reCrRi.the_geom)=@x ℄
     In this   ase, the variable @x is of boolean type. At evaluation time, the
variable is bound to `true' or `false', and the two     onstraints are evaluated with
                                                                        ++
this value. In this example, the two       onstraints in the RE-SPaM       expression
in lude the geometri    fun tion   ontainedBy.


4     The RE-SPaM++ Algorithm

                                   ++
We explain now how RE-SPaM              is used within a sequential pattern mining al-
gorithm. The algorithm for nding frequent patterns is a variation of the one de-
     [ ontainedBy(geom,regRiver.the_geom)=`true'℄
    a                                                 b

                               [ ategoryName=

                           `Tourist Attra tion'℄      [ ategoryName=

                                                     `Airport'℄



                                              c

                                  Fig. 3. Automaton for Q1




                                                                  ++        8
s ribed in detail in [5℄. The input to the algorithm is an RE-SPaM   program ,
and a value for the minimum support required for the dis overed sequen es. In
a nutshell, the support of a sequential pattern is the number of sequen es in
the ToI that satisfy su h pattern, out of the total number of sequen es in the
database. During evaluation, the dis overed patterns are restri ted to the ones
                       ++
satisfying the RE-SPaM     expression. In what follows we fo us on the hanges
introdu ed on the original algorithm in order to support the new features of RE-
      ++
SPaM     , although to make the explanation self- ontained we need to briey go
over the RE-SPaM algorithm. For the sake of                       larity, we pro eed by means of
an example, using query Q1 from Se tion 3.
     As a rst step, a deterministi               nite automaton (DFA) that a         epts the
language generated by the regular expression (RE) is built. The DFA for the
         ++
RE-SPaM      part of Q1 is shown in Figure 3. The labels of the edges of the
automaton are       onstraints that must be satised by the sequen e that is being
evaluated at ea h step of the algorithm that we des ribe next. The evaluation
of the RE pro eeds in in remental phases, building, at ea h step,                  andidate sets
Ci of sequen es of length i. In short, in the rst step it builds the set C1 with
 andidate sequen es (i.e. sequen es of POIs) of length one. The automaton is
used to prune      andidate sequen es that do not satisfy expressions not involving
temporal attributes. At the nal phase of the step, the ToI is queried to dete t
whi h sequen es must be dis arded be ause they do not have the          minimum
support. During a step k , the self-join of Ck−1 is used to produ e Ck , and then
the automaton and the ToI are, again, used for pruning. Temporal attributes
and variables in the       onstraints     an be evaluated at dierent moments. We            an
either: (a) postpone their evaluation to the nal phase of the algorithm; or, (b)
evaluate them as soon as possible to redu e the size of the intermediate               andidate
sequen es Ck . This is the approa h we follow.
     Sin e the algorithm pro eeds in in remental steps, fun tions must be re-
peatedly evaluated. Besides the need of evaluating if a portion of a                   andidate
sequen e satises some     onstraint labeling the edges of the automaton, we need
                                                                    ++
to take    are e iently of the expressions introdu ed in RE-SPaM      programs,
namely the      ursors explained in the previous se tion. To a               omplish an e ient
implementation we borrow ideas from dynami                    programming te hniques, where,

8
    We denote RE-SPaM++ program the Piet-QL and the regular expression parts al-
    together, and RE-SPaM++ the part orresponding to the regular expression.
on e a fun tion is evaluated with some parameters its result is stored in a      a he
avoiding re al ulation. Two types of     a hing    an be exploited, whi h we denote
ma ro and mi ro a he, respe tively. The former stores the result of a fun tion
of type f n(`value', tableN ame.attribute) (that is, geometri     fun tions returning
sets). The latter stores the result of a fun tion of the form f n(`value', `literal ',
...). We explain these ideas by means of an example.
   Consider query Q1 from Se tion 3, asking for traje tories that pass through
a pla e whi h belongs to regions that     ontain a river and nish at an airport or
a tourist attra tion. The Piet-QL part of the query returns regions in the      ursor
denoted regRiver.the_geom. Belgium is             omposed of three regions: Vlaams
Gewest, Brussel-Hoofdstad, and the Wallonne. Brussel-Hoofdstad is      rossed by
the Kanaal van Charleroi river, but this river is not    ontained by the region.
Vlaams Gewest        ontains the Ieperlee river and the Wallonne region ontains
the Ourthe O     le river. Thus, both regions are in the    ursor after the Piet-QL
query is evaluated. Now, we move on to the RE:


[containedBy (geom, regRiver.the_geom)=`true'℄.
([ ategoryName=`Airport'℄|[ ategoryName=`Tourist Attra tion'℄)
   The algorithm starts by building C1 with all the      ategory o   urren es. Then,
it uses the automaton for pruning: if a     andidate sequen e does not satisfy any
path in the automaton, it is pruned. We have three paths of length 1 in the
automaton (see Figure 3). The expression [      ategoryName = `Airport'℄ is sat-
ised by the three airport o     urren es of Table 1 (A1 though A3). The expres-
sion [ategoryName=`Tourist Attra tion'℄ is satised by the astle and the
               urren es C1 and C2). The third expression, [containedBy (geom,
 athedral (i.e., o
regRiver.the_geom)℄ must be evaluated for the ategory o urren es of ho-
tels and restaurants, to    he k whi h PoIs are in regions rossed by rivers (re-
                                               ++
turned by the Piet-QL part of the RE-SPaM          program). Given that the ex-
pression    ontains a fun tion, the    a he is used for this evaluation. The engine
rst looks up in the ma ro- a he (implemented as a hash table) for the value
asso iated with the key containedBy (`pol1', regRiver.the_geom). No value is
retrieved in this    ase, and the fun tion starts browsing the       ursor. The rst
tuple in the    ursor is   h‘V laams   Gewest'i. Instead of evaluating the fun -
tion, the   mi ro- a he is now queried for the value asso iated with the key
containedBy (`pol1', `V laams Gewest'). We know that `pol1' (the geometry of
Hotel H1) is    ontained in the Vlaams Gewest region. Thus, the asso iation be-
tween containedBy(`pol1', `V laams Gewest') and `true' is stored in the mi ro-
 a he. Sin e the value returned by the fun tion is `true', there is no need to
 ontinue browsing the      ursor. Moreover, the ma ro- a he is also updated, asso-
 iating the key containedBy (`pol1', regRiver.the_geom) with the value `true'
(see below how this is used in the next step). Next, we evaluate evaluation
containedBy (`pol2', regRiver.the_geom). Again, nothing is retrieved from the
ma ro    a he, therefore we must browse the       ursor. The engine looks up in the
mi ro- a he for a value asso iated with containedBy (`pol2', `V laams Gewest').
Sin e nothing is retrieved, the fun tion is evaluated, returning the value `false',
and this asso iation is stored in the mi ro- a he. Finally, the    andidate sequen e
                                       IDs
                                                  IDs
                                       A1
                                                  A1
                                       A2
                                                  A2
                                       A3
                                                  A3
                                       C1
                                                  C1
                                       C2
                                                  C2
                                       H1
                                                  H1
                                       H2
                                                  R1
                                       R1
                                                  R2
                                       R2
                                                  R3
                                       R3


    Fig. 4. Computing C1 : before (left) and after (right) pruning with automaton



                                      IDs           IDs
                                    A1 A1         A2 A1
                                    A1 A2         A2 A2
                                    ... ...       ... ...
                                    A1 R3         C1 A1
                                    ... ...       ... ...
                                    A2 R3         H1 A1
                                    ... ...       H1 A2
                                    H1 A1         ... ...
                                    H1 A2         R2 A3
                                    ... ...       R2 C1
                                    R3 R3         ... ...


Fig. 5. C2 before (left: 81 tuples), and after (right: 40 tuples) pruning with automaton




H2 is dis arded sin e it     an not satisfy any path of length one in the automaton.
Thus, all sequen es     ontaining H2 are pruned. Analogously, the system nds
out if `pol3', `pol4' and `pol5' are          ontained in some tuple of the     ursor re-
gRiver.the_geom, by taking advantage of the ma ro and mi ro               a hes. Figure 4
shows the initial and nal states of C1 . In the nal phase of the rst step (the
step with k = 1), C1 is analyzed against the ToI to           he k for minimum support.

   For the se ond step, suppose that all           andidate sequen es of C1 are main-
tained (i.e., they will be    he ked for minimum support). Thus, the system pop-
ulates the set C2 in the se ond step by joining C1 with itself. This self-join is
the one typi ally used in sequential pattern mining [13℄, adapted to the           ase of
itemsets of length 1. Two tuples t1 and t2 mat h in the join Ck−1 ⋊
                                                                  ⋉ Ck−1 , if the
last k − 2 of the items in t1   oin ide with the rst k − 2 items in t2 . Then, a new
tuple is formed as the union of the items in t1 and t2 . Similarly to the rst step,
using the automaton the engine determines the               andidate sequen es of length
two that are satised by some path of length two and dis ards the rest. To op-
timize the evaluation of fun tions we use the ma ro- a he. When de iding if the
 andidate sequen e of length 2 {H1A2} is a              epted by some path of length two
in the automaton, the fun tion containedBy(`pol1', regRiver.the_geom) must
be evaluated. But now, this key and its asso iated value            an be obtained from
the ma ro- a he, sin e its value was           al ulated in the previous step, and this
 a he was updated a        ordingly. The pro ess        ontinues until a step k su h that
no sequen es of length k exist in the ToI. In Figure 5 we show the state of C2
before and after pruning.
                     Fig. 6. Implementation tool: output window




5     Implementation and Experimental Results

                               ++
We implemented RE-SPaM              and in orporated this language into the Piet
                9
framework [4℄ . Figure 6 depi ts the graphi        output of the system. Patterns
of length two (i.e., involving three PoIs) are shown. The thi kness of the lines
ree ts the relative support of the dis overed pattern, and the edges are labeled
with the number of asso iated traje tories. For example, pattern [r3,a3, 3℄ has
447 o    urren es, and the lines are mu h thi ker than the ones      orresponding to
pattern [r1,h3, 1℄, whi h has only 3.
     Preliminary experiments were aimed at assessing the impa t of introdu ing
geometri fun tions in regular expressions. Due to spa e limitation we do not give
a detailed report of the experimental results, but a general       omment on them.
Experiments showed that the time overhead introdu ed by the parsing of the
Piet-QL part of a RE-SPaM query is negligible. With respe t to exe ution time,
we know that it depends on several parameters, like the size of the intermediate
Ck sets, minimum support, and the kinds of expressions used in the query, among
other fa tors. Sin e GSP nds all frequent sequen es in a database, and RE-
      ++
SPaM      restri ts these sequen es to the ones mat hing the regular expressions,
adding a spatial predi ate to a     onstraint in general results in a lower number
of sequen es. Thus, two opposite ee ts appear: on the one hand the use of
fun tions requires less memory spa e due to the smaller size of intermediate
Ck ; on the other hand, fun tion evaluation impa ts over exe ution time. Our
hypothesis was that the former redu tion         ompensates the    ost of evaluating
the fun tion. This hypothesis was      onrmed by our experiments. Moreover, the

9
    A demo is available at http://piet.exp.d .uba.ar/extendedrespam/installation.pdf.
overall performan e of the algorithm does not de rease substantially when using
the   a hed fun tions and exe ution times appear        ompatible with user needs.
For low values of support, the number of frequent sequen es obtained in reases,
therefore the exe ution times also in rease be ause of the higher number of
iterations (and the size of intermediate Ck sets).



6     Con lusion

We have presented a language that allows to in lude spatial OLAP queries in
regular expressions that are used during sequential pattern mining for pruning
sequen es that are of no interest to the user. We believe this is a relevant feature
when dealing with sequential patterns in traje tory databases, and, to the best
of our knowledge, no proposal of this kind has been introdu ed so far in the eld.
Our experimental results show that this language enhan ement is not a hieved
at the expense of algorithm exe ution times.



Referen es

 1. C. du Mouza and P. Rigaux. Mobility patterns. In Pro eedings of the STDBM'04,
    pages 1  8, Toronto, Canada, 2004.
 2. M. N. Garofalakis, R. Rastogi, and K. Shim. Mining sequential patterns with
    regular expression onstraints. In IEEE Transa tions on Knowledge and Data
    Engineering, 2002.
 3. F. Giannotti, M. Nanni, D. Pedres hi, and F. Pinelli. Traje tory pattern mining.
    In KDD'07, pages 667680, 2007.
 4. L. Gómez, S. Haesevoets, B. Kuijpers, and A. A. Vaisman. Spatial aggregation:
    Data model and implementation. Information Systems, 34:551576, 2009.
 5. L. I. Gómez and A. A. Vaisman. E ient onstraint evaluation in ategori al
    sequential pattern mining for traje tory databases. In EDBT, pages 541552,
    2009.
 6. L. I. Gómez, A. A. Vaisman, and S. Zi h. Piet-ql: a query language for gis-olap
    integration. In GIS, page 27, 2008.
 7. R. H. Güting and M. S hneider. Moving Obje ts Databases. Morgan Kaufman,
    2005.
 8. S. Karli and Y. Saygin. Mining periodi patterns in spatio-temporal sequen es at
    dierent time granularities. Intelligent Data Analysis, 13(2):301335, 2009.
 9. R. Kimball and M. Ross. The Data Warehouse Toolkit: The Complete Guide to
    Dimensional Modeling, 2nd. Ed. J.Wiley and Sons, In , 2002.
10. P. Rigaux, M. S holl, and A. Voisard. Spatial Databases. Morgan Kaufmann, 2002.
11. S. Rivest, Y. Bédard, and P. Mar hand. Modeling multidimensional spatio-
    temporal data warehouses in a ontext of evolving spe i ations. Geomati a, 55
    (4), 2001.
12. S. Spa apietra, C. Parent, M. L. Damiani, J. A. Fernandes de Ma edo, F. Porto,
    and C. Vangenot. A on eptual view on traje tories. Data Knowl. Eng., 65 (1):126
    146, 2008.
13. R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and perfor-
    man e improvements. In Pro . of the Fifth Int'l Conferen e on Extending Database
    Te hnology (EDBT), 1996.