Mining Long, Sharable Patterns in Trajectories of
                      Moving Objects

                                Gyozo Gidófalvi / Torben Bach Pedersen
                                       Geomatic aps / Aalborg University
                                       gyg@geomatic.dk / tbp@cs.aau.dk


                     Abstract                                   management of trajectories. Second, they can be used
                                                                to facilitate various Location–Based Services (LBS).
    The efficient analysis of spatio–temporal data,             One LBS example is an intelligent rideshare applica-
    generated by moving objects, is an es-                      tion, which finds sharable routes for a set of commuters
    sential requirement for intelligent location–               and suggests rideshare possibilities to them, is consid-
    based services. Spatio-temporal rules can be                ered. Such a rideshare application can be one possible
    found by constructing spatio–temporal bas-                  solution to the ever increasing congestion problems of
    kets, from which traditional association rule               urban transportation networks.
    mining methods can discover spatio–temporal
                                                                    Patterns in trajectories for an intelligent rideshare
    rules. When the items in the baskets are
                                                                application are only interesting if those patterns are
    spatio–temporal identifiers and are derived
                                                                sharable by multiple commuters, are reoccurring fre-
    from trajectories of moving objects, the dis-
                                                                quently, and are worthwhile pursuing, i.e., are long
    covered rules represent frequently travelled
                                                                enough for the savings to compensate for the coordi-
    routes. For some applications, e.g., an intel-
                                                                nation efforts. The discovery of Long, Sharable Pat-
    ligent ridesharing application, these frequent
                                                                terns (LSP) in trajectories is difficult for several rea-
    routes are only interesting if they are long
                                                                sons. Patterns do not usually exist along the whole
    and sharable, i.e., can potentially be shared
                                                                trajectory. As a example, consider two commuters A
    by several users. This paper presents a data-
                                                                and B living in the same area of town, leaving for
    base projection based method for efficiently
                                                                work approximately the same time, and working in
    extracting such long, sharable frequent routes.
                                                                the same part of town. Given the underlying road
    The method prunes the search space by mak-
                                                                network and traffic conditions, for a given support
    ing use of the minimum length and sharable
                                                                threshold the middle part of the trips of the two com-
    requirements and avoids the generation of
                                                                muters may be frequent, the initial and final parts may
    the exponential number of sub–routes of long
                                                                not. In recent work [4] a general problem transfor-
    routes. A SQL–based implementation is de-
                                                                mation method, called pivoting, was proposed for the
    scribed, and experiments on real life data
                                                                analysis of spatio–temporal data. Pivoting is the pro-
    show the effectiveness of the method.
                                                                cess of grouping a set of records based on a set of
                                                                attributes and assigning the values of likely another
1   Introduction                                                set of attributes to groups or baskets. Pivoting ap-
In recent years Global Positioning Systems (GPS) have           plied to spatio–temporal data allows the construction
become increasingly available and accurate in mobile            of spatio–temporal baskets, which can be mined with
devices. As a result large amounts of spatio–temporal           traditional association rule mining algorithms. When
data is being generated by users of such mobile devices,        the items in the baskets are spatio–temporal identifiers
referred to as moving objects in the following. Tra-            and are derived from trajectories, the discovered rules
jectories of moving objects, or trajectories for short,         represent frequently travelled routes. While there ex-
contain regularities or patterns. For example, a per-           ist several efficient association rule mining methods [6],
son tends to drive almost every weekday to work ap-             the straight–forward application of these algorithms to
proximately at the same time using the same route.              spatio–temporal baskets representing trajectories is in-
The benefits of finding such regularities or patterns is        feasible for two reasons. First, all sub–patterns of fre-
many–fold. First, such patterns can help the efficient          quent patterns are also frequent, but not interesting,
                                                                as longer patterns are preferred. Second, the support
Proceedings of the third Workshop on STDBM                      criterion used in association rule mining algorithms
Seoul, Korea, September 11, 2006                                is inadequate for a rideshare application, i.e., a fre-


                                                           49
quent itemset representing a frequent trajectory pat-            and mining CFIs are described. While CFI mining
tern, may be supported by a single commuter on many              methods can be modified to find the desired solution
occasions and hence presents no rideshare opportunity.           that meets the sharable criterion, they employ com-
   In this paper, to overcome the above difficulties of          plex data structures and their implementation is quite
finding LSPs in trajectories, a novel method is given.           involved; hence their augmentation is difficult. In par-
According to a new support criterion, the proposed               ticular, a projection-based CFI mining algorithm that
method first efficiently filters the trajectories to con-        employs an in–memory FP-tree to represent itemsets,
tain only sub-trajectories that are frequent. Next, it           would need to be modified at every node to maintain
removes trajectories that do not meet the minimum                a set of distinct objects at that have transactions as-
length criterion. Then it alternates two steps until             sociated with them that support the itemset that is
there are undiscovered LSPs. The first step entails              represented by the node. In comparison, the herein
the discovery of a LSP. The second step entails the fil-         presented method –building on work presented in [11]–
tering of trajectories by the previously discovered pat-         exploits the power of commercial RDBMSs, yielding a
tern. An advantage of the proposed method is the ease            simple, but effective solution.
of implementation in commercial Relational Database
                                                                    Since trajectories are temporally ordered sequences
Management Systems (RDBMSes). To demonstrate
                                                                 of locations, sequential pattern mining [3] naturally
this, a SQL–based implementation is described. The
                                                                 comes to mind. However, a straight forward inter-
effectiveness of the method is demonstrated on the
                                                                 pretation of trips as transactions and application of
publicly available INFATI data, which contains tra-
                                                                 a state–of–the–art closed frequent sequential pattern
jectories of cars driving on a road network.
                                                                 mining algorithm [15] does not yield the desired so-
   The herein presented work is novel in several as-             lution, since in this case sequences of frequent sub-
pects. It is the first to consider the problem of mining         trajectories would be found. Furthermore, since the
LSPs in trajectories. It describes a novel transforma-           trajectories can contain hundreds of items, closedness
tion, and the relationship between the problem of min-           checking of frequent itemsets even for prominent meth-
ing LSPs in trajectories and mining frequent itemsets.           ods would be computationally expensive. Interpret-
Finally, it describes an effective method with a simple          ing single elements of trajectories as transactions and
SQL–implementation to mine such LSPs in trajecto-                applying closed sequential pattern mining could find
ries.                                                            frequent sub–trajectories. However a number of prob-
   The remainder of the paper is organized as follows.           lems arise. First, to meet the sharable criterion, the
Section 2 reviews related work. Section 3 describes the          in–memory data structures would need similar, non-
transformation, the use of the framework in frequent             trivial augmentation as described above. Second, since
itemset mining, and formally defines the task of min-            patterns in trajectories could be extremely long, even
ing LSPs in trajectories. he proposed algorithm and a            state–of–the–art sequential mining methods [12, 15]
SQL–based implementation for mining such patterns                would have a difficulties handling patterns of such
is described in Section 4. Section 5 presents experi-            lengths. Third, patterns in trajectories repeat them-
mental results. Finally Section 6 concludes and points           selves, which cannot be handled by traditional se-
to future research.                                              quential pattern mining algorithms. The extraction
                                                                 of spatio–temporal periodic patterns from trajectories
2   Related work                                                 is studied in [9], where a bottom–up, level–wise, and
                                                                 a faster top–down mining algorithm is presented. Al-
Frequent pattern mining is a core field in data min-
                                                                 though the technique is effective, the patterns found
ing research. Since the first solution to the problem
                                                                 are within the trajectory of a single moving object. In
of frequent itemset mining [1, 2], various specialized
                                                                 comparison, the herein presented method effectively
in–memory data structures have been proposed to im-
                                                                 discovers long, sharable, periodic patterns.
prove the mining efficiency, see [6] for an overview. It
has been recognized that the set of all frequent item-              Moving objects databases are particular cases of
sets is too large for analytical purposes and the infor-         spatio–temporal databases that represent and man-
mation they contain is redundant. To remedy this, two            age changes related to the movement of objects. A
modification to the task have been proposed: mining              necessary component to such databases are specialized
of Closed Frequent Itemsets (CFI) and mining of max-             spatio–temporal indices such as the Spatio–Temporal
imal frequent itemsets. A frequent itemset X is closed           R–tree (STR–tree) and Trajectory–Bundle tree (TB–
if no itemset Y exists with the same support as X such           tree) [7]. An STR-tree organizes line segments of a tra-
that X ⊂ Y . A frequent itemset X is maximal if no               jectory according to both their spatial properties and
frequent itemset Y exists such that X ⊂ Y . Prominent            the trajectories they belong to, while a TB–tree only
methods that efficiently exploit these modifications to          preserves trajectories. If trajectories are projected to
the problem are MAFIA, GenMax, CLOSET(+), and                    the time–of–day domain, STR–tree index values on
CHARM [6]. Later in the paper, a relationship be-                the projected trajectories could be used as an alter-
tween the problems of mining LSPs in trajectories                native representation of trajectories. While this ap-


                                                            50
proach would reduce the size of the problem of mining


                                                                                                                                                                                           time−dimension / time−of−day domain
                                                                                                                                                                                 trip 1


                                                                     time−dimension / date−time domain
                                                                                                                                                                                 trip 2

LSPs in trajectories, it would not solve it. In compar-                                                                                                                          trip 3
                                                                                                                                                                                                                                 8:35
                                                                                                                                                                                                                                 8:30

ison, the herein presented method solves the problem                                                     Wed 8:00                                                                                                                8:00
                                                                                                                                                                                                                                 8:00

of mining LSPs in trajectories, which is orthogonal,                                                       Tu 8:00                d < δ → end of trip 1
                                                                                                                                      k
                                                                                                                                                                                                                                 8:15
                                                                                                                                                                                                                                 8:10

but not unrelated to indexing of trajectories.                                                                                                                                                                                   8:05
                                                                                                                                                                                                                                 8:00
                                                                                                                                                                                                                                   15
                                                                                                         Mon 8:00                                                                                                                                            spatio−temporal region
   In [13] a way to effectively retrieve trajectories in                                                       15
                                                                                                                     10
                                                                                                                                              d > δ → start of trip 1
                                                                                                                                               k
                                                                                                                                                                                                                                        10
                                                                                                                                                                                                                                             5                                             40
the presence of noise is presented. Similarity func-                                                 y−dimension          5                                                 35    40                                                                                             30   35
                                                                                                                                                                  25   30                                                                                              20   25
                                                                                                                              0           5        10   15   20                            y−dimension                                           0       5   10   15
                                                                                                                                  0                                                                                                                  0
                                                                                                                                                        x−dimension                                                                                                x−dimension

tions, based on the longest sharable subsequence, are
                                                                    (a) Identification of trips in                                                                                        (b) Time–of–day projection and
defined, facilitating an intuitive notion of similarity             raw trajectories                                                                                                      spatio–temporal region substi-
between trajectories. While such an efficient similar-                                                                                                                                    tution
ity search between the trajectories will discover similar                                                        Figure 1: From trajectories to transactions
trajectories, the usefulness of this similarity in terms of
length and support would not be explicit. In compar-                Projection of the temporal dimension
ison, there herein proposed method returns only pat-                   Since frequent patterns within a single object’s tra-
terns that meet the user–specified support and length               jectory are expected to repeat themselves daily, the
constraints. Furthermore, the trajectory patterns re-               temporal dimension of the so identified trips is pro-
turned by our method are explicit, as opposed to the                jected down to the time–of–day domain. This pro-
only implicit patterns contained in similar trajectories.           jection is essential to discover the daily periodic na-
                                                                    ture of patterns in trajectories. Mining patterns with
3     Long, sharable patterns in trajecto-                          other periodicity can be facilitated by projections of
      ries                                                          the temporal domain to appropriate finer, or coarser
                                                                    levels of granularity. Finer levels of granularity can be
The following section describes a novel transformation              used to detect patterns with shorter periodicity. For
of raw trajectories. This transformation allows (1) the             example, a delivery person might use a different route
formulation of the problem of mining LSPs in trajec-                depending on the time–of–hour knowing that at the
tories in a framework similar to that used in frequent              given time of the hour certain traffic conditions arise,
itemset mining, (2) to establish a relationship between             which make an otherwise optimal delivery route sub–
the two problems.                                                   optimal. The detection of these patterns in delivery
                                                                    routes requires the projection of the temporal dimen-
3.1   From trajectories to transactions                             sion to the time–of–hour domain. Conversely, coarser
The proposed transformation of raw trajectories con-                levels of granularity can be used to detect patterns
sists of three steps: identification of trips, projection of        with longer periodicity. For example, a person might
the temporal dimension, and spatio–temporal region                  visit his bank only at the end of pay periods. The de-
substitution. It is assumed that locations of moving                tection of this pattern requires the projection of the
objects are sampled over a long history. That is, a                 temporal dimension to the day–of–month domain. Fi-
raw trajectory is a long sequence of (x, y, t) measure-             nally, to discover the pattern that the above mentioned
ments at regular time intervals.                                    person makes these visits to his bank Saturday morn-
                                                                    ings following the end of pay periods, requires the pro-
Identification of trips                                             jection of the temporal domain to a combination of the
    A trip is a temporally consecutive set or se-                   day–of-month, the day–of-week, and the part–of-day
quence of measurements such that for any measure-                   domains. Performing different projections is part of
ment mi in the sequence, the sum of spatial dis-                    the inherently iterative and only semi-automatic pro-
placement during the k measurements immediately                     cess of doing data mining when the exact format of the
following mi , denoted dk , is larger than some user–               patterns searched for is not known beforehand. Figure
defined displacement, δ. Trips can be identified                    1(b) shows the projection of the temporal dimension
in a straight–forward manner by linearly scanning                   to the time–of–day domain for the three trips identi-
through a trajectory, and calculating dk using a look–              fied in Figure 1(a). Since the projection of a single
ahead window of k measurements. That is, scanning                   database record is a constant operation, the total pro-
through the total trajectory from the beginning, the                cessing time of this transformation step is optimal and
first measurement for which dk > δ, signals the begin-              linear in the number of database records.
ning of the first trip. Consecutive measurements are
part of this trip until a measurement is reached for                Spatio–temporal region substitution
which dk ≤ δ, which signals the end of the first tra-                  Trajectories are noisy. One source of this noise is
jectory. Trips following the first trip are detected in             due to imprecise GPS measurements. From the point
the same fashion from the remaining part of the total               of view of patterns in such trajectories, slight devia-
trajectory. Figure 1(a) shows three example trips that              tion of trajectories from the patterns can be viewed as
are derived from the total trajectory of one moving                 noise. Examples of such deviations could be due to a
object.                                                             few minute delay, or to the usage of different lanes on


                                                               51
       OBJ. 1 (2)          OBJ. 2 (3)           OBJ. 3 (4)   OBJ. 4 (5)       OBJ. 5 (6)        8:25 ± 2.5 minutes, respectively. In the following a
                                                                                                spatio-temporal region will be referred to by its con-
                                                                                                catenated values of the cell identifiers along the x– and
                                                     8:20                         8:35
A              8:05                                                                             y–axis, and the corresponding time instance denoting
                                                                                                the center of the time interval of the spatio–temporal
                              8:10      8:15                  8:25
B                                                                         8:30                  region. Hence, trips associated with object 3 will be
                                                  8:20                                          denoted by the a sequence {HD8:05, HC8:10, HB8:15,
      8:00                           8:10
                                                                                                IB8:20, IC8:25}. Furthermore, the trajectory database
C                   8:05                             8:25    8:25                8:35
                                                                                                T is assumed to be in a relational format with schema
                                                                                                hoid, tid, itemi, where item is a single item, that is
D              8:00          8:05        8:05                                                   part of the transaction tid associated with object oid.
                                                                                                Hence, each of the four trips of object 3 is represented
      E         F            G            H              I    J           K       L             by 5 unique rows in T .
 Figure 2: Illustration of the sample trajectory DB
                                                                                                3.3   Problem statement
the route. Hence, while a person might be driving from
home to work at approximately the same time of day                                              After performing the three above transformation steps,
using approximately the same route, the chance of two                                           the data set can be represented in a database T con-
identical trajectories is highly unlikely. Consequently,                                        taining tuples hoid, tid, si, where oid is an object iden-
patterns in raw trajectories are few and certainly not                                          tifier, tid is a trip identifier, and s is a sequence
long. Thus, patterns have to be mined in trajecto-                                              of spatio–temporal region identifiers. Since spatio–
ries represented in a generalized way, yielding general                                         temporal region identifiers contain a temporal com-
patterns in trajectories. To achieve this generalization                                        ponent, the sequence s can, without loss of informa-
of trajectories, individual (x, y, t) measurements of a                                         tion, be represented as a set of spatio–temporal region
trajectory are discretized and mapped to the spatio–                                            identifiers. Conforming to the naming convention used
temporal regions they fall into. Thus, a generalized                                            in the frequent itemset mining framework, a spatio–
trajectory is constructed by substituting (x, y, t) mea-                                        temporal region identifier will be equivalently referred
surements with the spatio–temporal regions they map                                             to as an item, and a sequence of spatio–temporal re-
to. If within a trajectory multiple (x, y, t) measure-                                          gion identifiers will be equivalently referred to as a
ments map to the same spatio–temporal region, they                                              transaction. Let X be a set of items, called an item-
are substituted with a single instance of the corre-                                            set. A transaction t satisfies an itemset X iff X ⊆ t.
sponding spatio–temporal region. The box in Fig-                                                Let STX denote the set of transactions that satisfy X.
ure 1(b) represents such a spatio–temporal region.                                              The following definitions are emphasized to point out
Since spatio–temporal substitution of a single data-                                            the differences between the frequent itemset mining
base record can be achieved using simple arithmetics                                            framework and the one established here.
from the spatial and temporal coordinates, the pro-                                             Definition 1 The n–support of an itemset X in
cessing time of this transformation step is optimal and                                         T , denoted as X.supp(n), is defined as the number of
linear in the number of database records.                                                       transactions in STX if the number of distinct oids as-
                                                                                                sociated with the transactions in STX is greater than
3.2    Example trajectory database                                                              or equal to n, and 0 otherwise. The n–support of
                                                                                                an item i in T , denoted as i.supp(n), is equal to the
Figure 2 visualizes a sample trajectory database. It                                            n–support of the itemset that contains only i.
shows the trajectories of trips of 5 moving objects,
which were derived using the three transformation                                               Definition 2 The length of an itemset X, denoted
steps described in Section 3.1. For clarity, the tem-                                           as |X|, is defined as the number of items in X.
poral dimension is projected down to the 2D–plane.
Spatio–temporal regions are defined by the square cells                                         Definition 3 An itemset X is n–frequent in T if
and a five minute interval centered around time in-                                             X.supp(n) ≥ MinSupp, and X is long if |X| ≥
stances written inside the square. Each connected line                                          MinLength, where MinLength, MinSupp, and n are
represents specific trips of a particular object. The                                           user–defined values.
number of times that trip was performed by the ob-
                                                                                                Definition 4 An itemset X is n–closed if there ex-
ject is represented in the width of the line, and is also
                                                                                                ists no itemset Y such that X ⊂ Y and X.supp(n) =
written in parenthesis next to the object name in the
                                                                                                Y.supp(n).
legend. For example, the trip trajectory associated
with object 3 was performed 4 times by the object.                                                 The task of mining LSPs in trajectories can be de-
The object was in spatial regions HD, HC, HB, IB, and                                           fined as finding all long, n–closed, n–frequent itemsets.
IC during time intervals 8:05 ± 2.5 minutes, 8:10 ± 2.5                                         Itemsets that meet these requirements are also referred
minutes, 8:15 ± 2.5 minutes, 8:20 ± 2.5 minutes, and                                            to as LSPs, or just patterns.


                                                                                           52
      OBJ. 1 (2)          OBJ. 2 (3)           OBJ. 3 (4)        OBJ. 4 (5)       OBJ. 5 (6)           OBJ. 1 (2)          OBJ. 2 (3)          OBJ. 3 (4)   OBJ. 4 (5)       OBJ. 5 (6)        OBJ. 1 (2)     OBJ. 2 (3)          OBJ. 3 (4)    OBJ. 4 (5)       OBJ. 5 (6)


                                                                                      8:35                                                                                      8:35                                                                                8:35
 A                                                                                                A                                                                                        A
                                                                                                                                                                                                                                                                    (0,8)

                             8:10      8:15                      8:25                                                         8:10      8:15        8:20    8:25                                                8:10       8:15        8:20    8:25      8:30
 B                                                        8:20
                                                                              8:30                B                                                                      8:30              B
                                                                                                                                                                                                                (5,11)     (5,11)      (5,5)   (5,11)   (5,11)

                                                                                                                                                                                                      8:05
                                    8:10
 C                 8:05                                                                           C                 8:05                                                                   C          (5,5)


     E        F             G              H          I          J            K        L              E         F             G           H           I     J            K       L             E        F       G           H            I       J           K       L

     (a) The sample DB after STEP 1                                                                   (b) The sample DB after STEP 2                                                      (c) STEP 3: Item–conditional sample DB
                                                                                                                                                                                          TF |FC8:05 and pattern discovery
                                                                                               Figure 3: Illustration of steps 1, 2, and 3
4    Projection–based LSP mining                                                                                                                     the second step of the method further filters TFV
                                                                                                                                                     and constructs TF that only contain transactions that
Now let us turn to the description of the proposed me-
                                                                                                                                                     have at least MinLength number of items.
thod for mining LSPs in trajectories. This description
                                                                                                                                                        The second step can be formulated in one SQL
is based on a number of observations, each of which
                                                                                                                                                     statement:
is associated with a particular step in the method. In
related technical report [5], these observations are also                                                                                            INSERT INTO TF (tid, oid, item)
stated as lemmas, and their corresponding proofs show                                                                                                SELECT tid, oid, item FROM TFV
the correctness and completeness of the method. To                                                                                                   WHERE tid IN
demonstrate the simplicity of the implementation in                                                                                                      (SELECT tid FROM TFV GROUP BY tid
a RDBMS, for each step a simple SQL–statement is                                                                                                           HAVING COUNT(item) >= MinLength)
given. The effect of each step is also illustrated on                                                                                                The sub–select is used to find trip identifiers that have
the previously introduced sample trajectory database                                                                                                 at least MinLength number of items. The outer part
assuming MinLength = 4, MinSupp = 2, and n = 2.                                                                                                      of the statement selects all records belonging to these
STEP 1: Filtering infrequent items                                                                                                                   trip identifiers and inserts them into TF.
   Items, i.e., spatio–temporal regions that are not fre-                                                                                               The effects of the second step are illustrated in Fig-
quent in T cannot be part of a LSP. Hence as first step                                                                                              ure 3(b). In particular, the remaining sharable parts
of the method, T is filtered such that it contains items                                                                                             of trips belonging to objects 3 and 5 are deleted, be-
with n–support larger than or equal to MinSupp.                                                                                                      cause the length of them is not greater than or equal
   The first step can be formulated in two SQL state-                                                                                                to MinLength, which is 4 in the example. Also, note
ments:                                                                                                                                               that although in this case items HB8:15 and IB8:20
                                                                                                                                                     did not become infrequent in TF, they lost n–support.
INSERT INTO F (item, i_cnt)                                                                                                                             Before stating further observations and continuing
SELECT item, count(*) i_cnt FROM T                                                                                                                   with the development of the proposed method it is im-
GROUP BY item HAVING COUNT(DISTINCT oid) >= n                                                                                                        portant to note the following. The set of discoverable
       AND COUNT(*) >= MinSupp                                                                                                                       LSPs from T is equivalent to the set of discoverable
                                                                                                                                                     LSPs from TF. This is ensured by first two observa-
CREATE VIEW TFV AS                                                                                                                                   tions. Since further steps of the proposed method will
SELECT T.oid, T.tid, T.item                                                                                                                          discover LSPs from TF, these two observation ensure
FROM T, F                                                                                                                                            the correctness of the method so far. However, it is
WHERE T.item = F.item                                                                                                                                also important to note that not all transactions in TF
The first statement finds items that meet the unique                                                                                                 necessarily satisfy a LSP. This is due to the sequen-
support criterion. The second statement constructs a                                                                                                 tiality of the first two steps. After the first step all
filtered view of T , called TFV, in which transactions                                                                                               the remaining items in transactions are frequent items.
only contain the items found by the previous state-                                                                                                  Then, in the second step, some of these transactions,
ment.                                                                                                                                                which are not long, are deleted. Due to this deletion a
    The effects of the first step are illustrated in Fig-                                                                                            frequent item in the remaining long transactions may
ure 3(a). Spatio–temporal regions, which are part of                                                                                                 become non–frequent, which in turn may cause some
trajectories that belong to less than 2 distinct objects,                                                                                            transactions to become short again. While there is no
are removed from trajectories. From the point of view                                                                                                simple solution to break this circle, note that the cor-
of an intelligent rideshare application these spatio–                                                                                                rectness of the first and second steps are not violated
temporal regions are uninteresting, since these parts                                                                                                since the deleted items and transactions could not have
of the trajectories cannot be shared by any objects,                                                                                                 satisfied a LSP.
i.e., are not sharable.                                                                                                                              STEP 3: Item–conditional DB projection
STEP 2: Filtering of short transactions                                                                                                                 For the following discussion, adopted from [10], let
  Transactions, i.e., trip trajectories, having less than                                                                                            an item–conditional database of transactions, equiva-
MinLength frequent items cannot satisfy a LSP. Hence,                                                                                                lently referred to as an an item–projected database, be


                                                                                                                                           53
      OBJ. 1 (2)   OBJ. 2 (3)          OBJ. 3 (4)   OBJ. 4 (5)       OBJ. 5 (6)        OBJ. 1 (2)   OBJ. 2 (3)          OBJ. 3 (4)    OBJ. 4 (5)       OBJ. 5 (6)         OBJ. 1 (2)   OBJ. 2 (3)          OBJ. 3 (4)   OBJ. 4 (5)       OBJ. 5 (6)


                                                                        8:35                                                                              8:35
 A                                                                                 A                                                                                A
                                                                                                                                                          (8,8)

                     8:10       8:15                8:25                                              8:10       8:15                8:25     8:30                                       8:10       8:15                8:25    8:30
 B                                                               8:30              B                 (8,11)      (8,11)              (8,11)   (8,11)                B


 C                                                                                 C                                                                                C

     E        F      G            H           I     J            K       L             E       F      G           H            I      J            K       L              E        F     G            H           I     J            K       L

     (a) The sample DB after STEP 5                                               (b) Item–conditional DB TF |LA8:35 and                                                (c) Sample DB after the second PDD
                                                                                  PDD
                                                    Figure 4: Illustration of Pattern Discovery and Deletion phases
defined as:                                                                                                                   JB8:25, KB8:30}. LA8:35 is the only item that is in
Definition 5 Let T be a database of transactions, and                                                                         TF |FC8:05 , but is not in the discovered SMFCI. Since
i an item in T . Then, the item–conditional database                                                                          further projecting TF |FC8:05 on LA8:35 yields a data-
of transactions, is denoted as T|i and contains all the                                                                       base of transactions where no item meets the minimum
items from the transactions containing i.                                                                                     n–support criterion, the discovered SMFCI is the only
   The construction of an item–conditional database of                                                                        CFI present in TF |FC8:05 . Since the discovered SMFCI
transactions can be formulated in a single SQL state-                                                                         meets both the minimum length and an minimum n–
ment as:                                                                                                                      support criteria it is a pattern.
INSERT INTO T_i (oid, tid, item)                                                                                              STEP 5: Deletion of unnecessary items
SELECT t1.oid, t1.tid, t1.item                                                                                                   The subproblems that are recursively solved by the
FROM TF t1, TF t2                                                                                                             method presented so far are overlapping. That is to
WHERE t1.tid = t2.tid and t2.item = i                                                                                         say, viewed from a top level, a CFI that has n items is
   Given n frequent items in T , the problem of finding                                                                       at least once discovered in each of the n corresponding
CFIs can be divided into n subproblems of finding the                                                                         item–projected databases. To eliminate this redun-
CFIs in each of the n item–projected databases [10].                                                                          dancy, both in the mining process and the result set,
Using the divide–and–conquer paradigm, each of these                                                                          observe that an item j can be deleted from TF if it has
n subproblems can be solved by recursively mining the                                                                         the same n–support in TF |i as in TF. The intuition
item-projected databases as necessary.                                                                                        behind the observation is the following. If j has the
                                                                                                                              same n–support in TF |i as in TF, it implies that all
STEP 4: Discovery of the single most frequent                                                                                 the transactions in TF that satisfy j are also present
closed itemset                                                                                                                in TF |i . Thus, the set of patterns containing j, which
   Since i is in every transaction of the item–projected                                                                      can be discovered from TF, can also be discovered from
database T|i , and hence has maximum n–support, the                                                                           TF |i .
items in T|i can be grouped in two: items that have the                                                                          The fifth step can be formulated in two SQL state-
same n–support as i, and items that have n–support                                                                            ments:
less than that of i. The set of items that have the same                                                                      INSERT INTO FT_i (item, i_cnt)
n–support in the T|i as i is the Single Most Frequent                                                                         SELECT item, count(*) i_cnt FROM T_i
Closed Itemset (SMFCI) in T|i . The fourth step of the                                                                        GROUP BY item
method discovers this SMFCI.                                                                                                  HAVING COUNT(DISTINCT oid) >= n
   The fourth step can be formulated in two SQL state-
                                                                                                                              DELETE FROM TF WHERE TF.item IN
ments:
                                                                                                                                  (SELECT F.item FROM F, FT_i
INSERT INTO FT_i (item, i_cnt)                                                                                                      WHERE F.item = FT_i.item
SELECT item, COUNT(*) i_cnt FROM T_i                                                                                                AND F.i_cnt = FT_i.i_cnt)
GROUP BY item
                                                                                                                              The first statement counts the n–support of items in
HAVING COUNT(DISTINCT oid) >= n
                                                                                                                              TF |i . The second statement deletes all items in TF
SELECT item FROM FT_i                                                                                                         that have the same n–support in TF as in TF |i .
WHERE i_cnt = (SELECT MAX(i_cnt) FROM FT_i)                                                                                      Figure 4(a) shows the effects of deleting the un-
The first statement derives n–support of n–frequent                                                                           necessary items after the mining of TF |FC8:05 . Since
                                                                                                                              items FC:8:05 and IB8:20 have the same n–support
items in the item–projected database, while the second
                                                                                                                              in TF |FC8:05 as in TF, shown in Figure 3(c), they are
statement selects those items from these n–frequent
                                                                                                                              deleted from TF. Items remaining in TF are shown in
items that have maximum n–support.
                                                                                                                              Figure 4(a).
   Figure 3(c) shows the effects of projecting TF based
on the item FC8:05. The numbers in parentheses show                                                                           Item-projection ordered by increasing n–
the n–support of the items in TF |FC8:05 and TF re-                                                                           Support
spectively. The SMFCI that is immediately discovered                                                                             A LSP p in T , containing items i1 . . . ik , can be dis-
from TF |FC8:05 is {FC8:05, GB8:10, HB8:15, IB8:20,                                                                           covered from any one of the item–projected databases


                                                                                                                    54
T|i1 , . . . , T|ik . Steps 4 and 5 of the proposed method                 PAT. 1: O(1,2), S(5), L(6)      PAT. 2: O(1,5), S(8), L(5)          PAT. 3: O(1,2,5), S(11), L(4)

assure that p will be discovered from exactly one of
these item–projected databases, but the method pre-
                                                                    A                                                                                              8:35
sented so far does not specify which one. While this
point is irrelevant from the point of view of correctness,
                                                                                                    8:10   8:15           8:20          8:25
it is crucial from the point of view of effectiveness.              B                                                                                   8:30
    To illustrate this, assume that i1 .supp(n) <
i2 .supp(n) < . . . < ik .supp(n). If projections are per-          C                   8:05
formed in decreasing order of item n–support, then,
first T|ik is constructed, then T|ik |ik−1 is constructed                 E           F            G         H              I           J             K              L
from it, and so on, all the way to T|ik |ik−1 |...|i1 , from
which finally p is discovered. If on the other hand,                     Figure 5: Three patterns in the sample DB
projections are performed in increasing order of item
                                                                    JB8:25, KB8:30} is discovered. Since this closed item-
n–support, then p is discovered from the first item–
                                                                    set meets both the minimum length and n–support re-
projected database that is constructed, namely T|i1 .
                                                                    quirements, it is recorded as a pattern. Finally, items
    Assume that p and its qualifying (k − l + 1) (at least
                                                                    having the same n–support in TF |i as in TF, which in
l–long) sub–patterns are the only LSPs in T . Then
                                                                    this case means all the items in TF |i , are deleted from
during the whole mining process, the total number of
                                                                    TF. After this deletion part of the final PDD phase,
projections in the decreasing processing order is Pdec =
                                                                    TF becomes empty and the mining terminates. Figure
k, whereas in the increasing processing order the total
                                                                    5 shows the three patterns that are discovered during
number of projections is only Pinc = k − l + 1. If k
                                                                    the mining. Supporting oids, n–supports, and length
and l are comparable and large, then Pdec ≫ Pinc .
                                                                    for each discovered patterns are shown in the legend.
Similar statements can be made about the total size
of the projected databases in both cases. Hence, item–              LSP mining algorithm
projection should be performed in increasing order of                   Using the observations and the associated steps,
item n-support.                                                     the complete algorithm for mining LSPs in trajecto-
Alternating pattern discovery and deletion                          ries is given in Figure 6. Since item–projected data-
   Alternating steps 3, 4 and 5, all patterns can be                bases are constructed at every level of the recursion
discovered in a recursive fashion. The sequential ap-               and are modified across levels when deleting unnec-
plication of these steps is referred to as a Pattern Dis-           essary items, the level of recursion L is passed as an
covery and Deletion phase (PDD). Mining terminates                  argument in the recursive procedure, and is used as
when all items have been deleted from TF.                           a superscript to associate databases to the levels they
                                                                    were constructed in.
   Figures 3(c) and 4(a) shows the effects of the first
of these PDD phases. Figures 4(b) and 4(c) show                         Lines 2 and 3 in the MineLSP procedure represent
the effect of the next pattern PDD phase. Since af-                 step 1 and 2 of the method, and they construct the fil-
ter the first PDD phase LA8:35 has the lowest n–                    tered database of transactions at the initial level, level
support in TF, namely 8, it is chosen as the next                   0. Line 4 processes frequent items in TF 0 in ascending
item to base the database projection on. Figure 4(b)                order of n–support. Line 5 represent step 3 of the me-
show TF |LA8:35 with the corresponding n–support of                 thod, and for each such frequent item i, it constructs
the items in TF |FC8:05 and TF respectively. Since                  the item–conditional database of transactions TF 0|i at
all the items have the same n–support in TF |FC8:05                 level 0. Line 6 calls procedure FindLSP to extract all
as LA8:35, namely 8, the closed itemset {GB8:10,                    LSPs from TF 0|i recursively.
HB8:15, JB8:25, KB8:30, LA8:35} is discovered. Since                    Lines 2 and 3 in the FindLSP procedure represent
this closed itemset both meets the minimum length                   steps 1 and 2 of the method, and they construct the
and n–support requirements it is recorded as a pat-                 filtered database of transactions at the current level
tern. In the deletion part of this PDD phase, item                  L. Line 4 represents step 4 of the method, and it finds
LA8:35 is deleted from TF as the only item that have                the SMFCI P in TF L    long . Line 5 represents step 5 of
the same n–support in TF |LA8:35 as in TF. The results              the method, and it deletes all items from the filtered
                                                                                                                         L−1
of this deletion are shown on Figure 4(c).                          database of transactions of the previous level, TF long  ,
                                                                                                              L−1         L
   The third and final PDD phase is implicitly shown                that have the same n-support in TF long as in TF freq ,
in Figure 4(c). Since after the second PDD phase all                the current level. Lines 6 and 7 check if the single
the items in TF have the same n–support, the next                   most frequent closed itemset P meets the minimum
projection is performed on any one of the items, i,                 requirements and store it accordingly. Lines 8 and 9
and the resulting item–projected database, TF |i , is               processes frequent items in TF L   long , which are not in
identical to the current state of TF, depicted on Fig-              P , in ascending order of n–support. Line 10 represent
ure 4(c). Since all the items in TF |i have the same                step 3 of the method, and for each such frequent item
n–support as i, the closed itemset {GB8:10, HB8:15,                 i it constructs the item–conditional database of trans-


                                                               55
(1) procedure MineLSP (T , MinSupp, MinLength, n)                         5   Experimental evaluation
(2)     TF 0freq ← MinSupportFilter (T , MinSupp, n)
(3)     TF 0long ← MinLengthFilter (TF 0freq , MinLength)
                                                                          The proposed method was implemented using MS-
(4)     for each freq item i in TF 0long ordered by asc n–supp
                                                                          SQL Server 2000 running on Windows XP on a 3.6GHz
(5)       TF 0|i ← ConstructConditionalDB (TF 0long , i)
                                                                          Pentium 4 processor with 2GB main memory. The me-
(6)       FindLSP (TF 0|i , 1, MinSupp, MinLength, n)
                                                                          thod was tested on the publicly available INFATI data
(7)     end for each
                                                                          set, which comes from intelligent speed adaptation ex-
                                                                          periments conducted at Aalborg University. This data
(1) procedure FindLSP (T ,L,MinSupp,MinLength,n)
                                                                          set records cars moving around in the road network of
(2)     TF L
                                                                          Aalborg, Denmark over a period of several months.
           freq ← MinSupportFilter (T , MinSupp, n)
(3)     TF L                          L                                   20 distinct test cars and families participated in the
           long ← MinLengthFilter (TF freq , MinLength)
(4)     (P, P .supp(n)) ← FindSMFCI (TF L
                                                                          INFATI experiment; Team–1 consisting of 11 cars op-
                                        long )
(5)     TF L−1  ← DeleteUnnecessaryItems (TF L−1  , TF L
                                                                          erated between December 2000 and January 2001 and
                                                       freq )
(6)
           long                              long
        if P .supp(n) ≥ MinSupp and |P | ≥ MinLength
                                                                          Team–2 consisting of 9 cars operated between Febru-
(7)       StorePattern (P , P .supp(n))
                                                                          ary and March 2001. Each car was equipped with a
(8)     for each freq item i in TF L
                                                                          GPS receiver, from which GPS positions were sampled
                                   long ordered by asc n–supp
(9)       if i is not in P
                                                                          every second whenever the car was operated. Addi-
(10)         TF L                               L                         tional information about the experiment can be found
                |i ← ConstructConditionalDB (TF long , i)
(11)         FindLSP (TF L
                                                                          in [8].
                         |i , L + 1, MinSupp, MinLength, n)
(12)    end for each                                                          The method presented in Section 3.1 identifies trips
                                                                          from continuous GPS measurements, which is not the
                                                                          case in the INFATI data. Hence in this case, a trip
               Figure 6: The LSP algorithm                                was defined as sequence of GPS readings where the
                                                                          time difference between two consecutive readings is less
                                                                          than 5 minutes. Using the definition, the INFATI data
actions TF L |i at the current level L. Finally, line 11                  contains 3699 trips. After projecting the temporal di-
recursively calls procedure FindLSP to find LSPs in
                                                                          mension to the time–of–day domain and substituting
TF L|i at the next level.                                                 the noisy GPS readings with 100 meter by 100 me-
   The structure and functionality of procedures                          ter by 5 minutes spatio-temporal regions, the resulting
MineLSP and FindLSP have a significant overlap.                           trajectory database has the following characteristics.
While the two functions can be merged into one, the                       There are 200929 unique items in the 3699 transac-
separation of the two is used to emphasize the facts                      tions. The average number of items in a transaction is
that (1) DeleteUnnecessaryItems presumes the exis-                        approximately 102. The average n–support of 1–, 2–,
tence of databases constructed at the previous level,                     and 3–frequent items is 1.88, 4.2 and 6.4 respectively.
and (2) FindSMFCI correctly operates only on an                           Notice that the averages only include the n–supports
item–projected database, and hence it can only be ap-                     of 1–, 2–, and 3– frequent items.
plied at level 1 and above.                                                   Two sets of experiments were performed, each
                                                                          varying one of the two parameters of the algorithm,
    Several implementation details are worth mention-
                                                                          MinSupp and MinLength. The performance of the al-
ing. First, DeleteUnnecessaryItems deletes items from
     L−1                                                                  gorithms was measured in terms of processing time
TF long     based on the n–support of items in TF L         freq ,
           L
                                                                          and working space required, where the working space
not TF long . This is important, as it was noted that                     required by the algorithm was approximated by the
MinLengthFilter decreases the n–support of items in                       sum of the rows in the projected tables that were con-
TF L freq , thereby possibly making an unnecessary item                   structed by the algorithm. Both measures were evalu-
appear to be necessary. Second, arguments to func-                        ated in an absolute and a relative, per pattern, sense.
tions and operands in statements are logical, i.e., the                   Figures 7(a), 7(c), and 7(e) show the results of the
functions and statements can be more efficiently imple-                   first set of experiments, where MinSupp = 2, n = 2
mented using previously derived tables. For example,                      and MinLength is varied between 120 and 530. Lower
both FindSMFCI and DeleteUnnecessaryItems are im-                         settings for MinLength were also tested, but due to
plemented using previously derived n–support count                        the very low MinSupp value these measurement were
tables not the actual trajectory tables. Third, simple                    terminated after exceeding the 2 hour processing time
shortcuts can significantly improve the efficiency of the                 limit. Noting the logarithmic scale in Figure 7(a) it is
method. For example, during the derivation of TF L          freq ,        evident that both the running time and the working
if the number of unique frequent items in TF L      freq is  less         space required by the algorithm exponentially increase
than MinLength, no further processing is required at                      as the MinLength parameter is decreased. Examining
that level, since none of the CFIs that can be derived                    the same quantities in a relative, per pattern sense,
from TF L    freq are long. To preserve clarity, these simple             Figure 7(e) reveals that the average running time and
shortcuts are omitted from Figure 6.                                      average working space required per pattern is approx-


                                                                     56
                                                                                                                                                                                                                                                                                                                                                                                                   Trajectories of 20 moving objects in INFATI
                                                            Absolute Time and Space Performace vs MinLength         8
                                                                                                                                                                                                                                     Absolute Time and Space Performace vs MinSupp                      10
                                                     4                                                                                                                                                                          4
                                                   10                                                              10                                                                                                      10                                                                  10                                                                                                   4
                                                                                                                                                                                                                                                                                                                                                                                                 x 10
                                                                                           processing time                                                                                                                                                                 processing time                                                                                               4


                                                                                                                                                                                                                                                                                                                                total working space (# of rows)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    2000


                                                                                                                                      total working space (# of rows)
                                                                                           working space                                                                                                                                                                   working space
                     total processing time (sec)


                                                                                                                                                                                             total processing time (sec)


                                                                                                                                                                                                                                                                                                                                                                                                                                                               Time−of−day (5min)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    1800
                                                                                                                                                                                                                                                                                                                                                                                         3


                                                                                                                                                                                                                                                                                                                                                                           Time (5min)
                                                     3                                                              6
                                                   10                                                              10                                                                                                                                                                                                                                                                                                                                                               1600
                                                                                                                                                                                                                                                                                                                                                                                         2                                                                                          1400
                                                                                                                                                                                                                                2                                                                       5
                                                                                                                                                                                                                           10                                                                  10                                                                                                                                                                                   1200
                                                                                                                                                                                                                                                                                                                                                                                         1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    1000
                                                     2                                                              4
                                                   10                                                              10                                                                                                                                                                                                                                                                0                                                                                              800
                                                                                                                                                                                                                                                                                                                                                                                   6.4                                                                                              6.4
                                                                                                                                                                                                                                                                                                                                                                                                                                                        6500                                   6.35                                                         6000
                                                                                                                                                                                                                                                                                                                                                                                                        6.3
                                                                                                                                                                                                                                                                                                                                                                                                                                                 6000                                                                                                5800
                                                                                                                                                                                                                                                                                                                                                                                             4                                                                                             4          6.3                                     5600
                                                                                                                                                                                                                                                                                                                                                                               x 10                           6.2                         5500                                      x 10
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            6.25                       5400
                                                                                                                                                                                                                                                                                                                                                                                                                                 5000                                                                                           5200
                                                     1                                                              2                                                                                                           0                                                                       0                                                                                                           6.1   4500                                                                                     6.2   5000
                                                   10                                                              10                                                                                                      10                                                                  10                                                                               Y−coor (100m)                                           X−coor (100m)                               Y−coor (100m)                                      X−coor (100m)
                                                     100 150 200 250 300 350 400 450 500                                                                                                                                         5          10        15      20          25        30        35
                                                                     MinLength                                                                                                                                                                             MinSupp


                                    (a) Absolute time and space (b) Absolute time and space                                                                                                                                                                                                                                                                            (a) Moving object trajectories                                                                                  (b) 28 discovered LSPs
                                                   Number of Patterns and Ineffective Projections vs MinLength
                                                   10
                                                     4
                                                                                                          10
                                                                                                             4
                                                                                                                                                                                                                           Number of Patterns and Ineffective Projections vs MinSupp
                                                                                                                                                                                                                             6                                                      3
                                                                                                                                                                                                                                                                                                                                                                                                         Figure 8: LSP discovery results in INFATI
                                                                                                                                                                                                                           10                                                     10
                                                                                                 patterns                                                                                                                                                               patterns
                                                                                     ineffective projections                                                                                                                                                     ineffectibe projections
                                                                                                                                                                                                                                                                                                                                                                       of the trajectories of the 20 moving objects in the IN-


                                                                                                                                                                                                                                                                                                                            # of ineffective projections
                                                                                                                                      # of ineffective projections


                                                        3                                                               3
                                                   10                                                              10
                                                                                                                                                                                                                             4                                                                    2
                                                                                                                                                                                                                           10                                                                  10
                                                                                                                                                                                                                                                                                                                                                                       FATI data set. While some regularities are apparent
              # of patterns


                                                                                                                                                                                            # of patterns


                                                        2                                                               2
                                                   10                                                              10

                                                                                                                                                                                                                             2                                                                    1
                                                                                                                                                                                                                                                                                                                                                                       in it to the human eye, to find LSPs in it seems like a
                                                   10
                                                        1
                                                                                                                   10
                                                                                                                        1
                                                                                                                                                                                                                           10                                                                  10
                                                                                                                                                                                                                                                                                                                                                                       daunting task. Figure 8(b) shows 28 LSPs in it that
                                                        0                                                               0                                                                                                    0                                                                    0
                                                                                                                                                                                                                                                                                                                                                                       are at least 200 long, sharable for at least 2 distinct
                                                   10
                                                     100 150 200 250 300 350 400 450 500
                                                                    MinLength
                                                                                                                   10                                                                                                      10
                                                                                                                                                                                                                                5           10        15      20
                                                                                                                                                                                                                                                           MinSupp
                                                                                                                                                                                                                                                                          25        30
                                                                                                                                                                                                                                                                                               10
                                                                                                                                                                                                                                                                                              35                                                                       objects, and have a support of at least 2.
                                                            (c) Number of patterns                                                                                                                                                  (d) Number of patterns                                                                                                                In conclusion, the experiments show that the me-
                                                     Relative Time and Space Performace vs MinLength                                                                                                                        Relative Time and Space Performace vs MinSupp
                                                                                                                                                                                                                                                                                                                                                                       thod is effective and robust to changes in the user–
                                                                                                               10000
                                    40
                                                                                        processing time
                                                                                                                                                                                                               4                                                                          4000
                                                                                                                                                                                                                                                                                                                                                                       defined parameter settings, MinLength and MinSupp,
                                                                                                                            working space per pattern (# of rows)


                                                                                                                                                                                                                                                                     processing time
                                                                                                                                                                                                                                                                                                 working space per pattern (# of rows)
processing time per pattern (sec)


                                                                                                                                                                        processing time per pattern (sec)


                                                                                         working space                                                                                                                                                               working space
                                                                                                                                                                                                                                                                                                                                                                       and is a useful analysis tool for finding LSPs in moving
                                                                                                                                                                                                                                                                                                                                                                       object trajectories.
                                    20                                                                         5000                                                                                            2                                                                          2000


                                                                                                                                                                                                                                                                                                                                                                       6                            Conclusions and future work
                                                   0                                                           0
                                                   100 150 200 250 300 350 400 450 500
                                                                   MinLength
                                                                                                                                                                                                               0
                                                                                                                                                                                                                5                      10        15       20
                                                                                                                                                                                                                                                       MinSupp
                                                                                                                                                                                                                                                                     25        30
                                                                                                                                                                                                                                                                                          0
                                                                                                                                                                                                                                                                                         35                                                                            The herein presented work, for the first time, considers
                                     (e) Relative time and space                                                                                                                                            (f) Relative time and space                                                                                                                                the problem of mining LSPs in trajectories, and trans-
                                                                                                                                                                                                                                                                                                                                                                       forms it to a framework similar to that used in frequent
Figure 7:    Performance evaluation for various                                                                                                                                                                                                                                                                                                                        itemset mining. A database projection based method,
MinLength (a,c,e) and MinSupp (b,d,f) settings                                                                                                                                                                                                                                                                                                                         and its simple SQL–implementation is presented for
imately linearly decreasing as the MinLength param-                                                                                                                                                                                                                                                                                                                    mining LSPs in trajectories. The effectiveness of the
eter is decreased. The presence of the two irregular                                                                                                                                                                                                                                                                                                                   method is demonstrated on a real world data set.
bumps in Figure 7(e) can be explained in relation to                                                                                                                                                                                                                                                                                                                      The large number of patterns discovered are diffi-
the number of patterns found, and the number of in-                                                                                                                                                                                                                                                                                                                    cult to analyze. To reduce this number, future work
effective projections that yield no patterns, shown in                                                                                                                                                                                                                                                                                                                 will consider the mining of a compressed patterns in
Figure 7(c). The sharp increases in relative process-                                                                                                                                                                                                                                                                                                                  trajectories [14].
ing time and working space are due to the fact that                                                                                                                                                                                                                                                                                                                       In future work, we also plan to perform additional
the algorithm is unable to take some of the shortcuts                                                                                                                                                                                                                                                                                                                  experiments on larger real world data sets when such
and it performs relatively more ineffective projections                                                                                                                                                                                                                                                                                                                become available. These experiments will include the
yielding no pattern discovery. The sharp decreases can                                                                                                                                                                                                                                                                                                                 investigation of scale–up properties of the algorithm as
be explained by the presence of an increasing number                                                                                                                                                                                                                                                                                                                   the number of moving objects are increasing and/or
of patterns that share the total pattern discovery cost.                                                                                                                                                                                                                                                                                                               as the granularity of the spatio–temporal regions is
   Similar observations can be made about the second                                                                                                                                                                                                                                                                                                                   varied.
set of experiments, shown in Figures 7(b), 7(d), and
7(f), where MinLength = 50, n = 2 and MinSupp                                                                                                                                                                                                                                                                                                                          Acknowledgments
is varied between 7 and 33. For example, the sharp
decrease in relative processing time in Figure 7(f) when                                                                                                                                                                                                                                                                                                               This work was supported in part by the Danish Min-
going from MinSupp = 33 to MinSupp = 32 is simply                                                                                                                                                                                                                                                                                                                      istry of Science, Technology, and Innovation under
due to the sudden appearance of patterns in the data                                                                                                                                                                                                                                                                                                                   grant number 61480.
for the given parameters. While there is only 1 pattern
for MinSupp = 33, and an order of magnitude more                                                                                                                                                                                                                                                                                                                       References
number of patterns for MinSupp = 32, the projections
performed and hence the absolute processing time to                                                                                                                                                                                                                                                                                                                    [1] R. Agrawal, T. Imilienski, and A. Swami. Mining Asso-
                                                                                                                                                                                                                                                                                                                                                                           ciation Rules between Sets of Items in Large Databases.
discover these patterns is approximately the same in
                                                                                                                                                                                                                                                                                                                                                                           In Proc. of SIGMOD, pp. 207–216, 1993.
both cases. Hence, the relative processing time for                                                                                                                                                                                                                                                                                                                    [2] R. Agrawal and R. Srikant. Fast algorithms for mining
MinSupp = 33 is an order of magnitude larger than                                                                                                                                                                                                                                                                                                                          association rules. In Proc. of VLDB, pp. 487–499, 1994.
that for MinSupp = 32.                                                                                                                                                                                                                                                                                                                                                 [3] R. Agrawal and R. Srikant. Mining Sequential Pat-
   Figure 8(a) shows a 50–fold down–sampled version                                                                                                                                                                                                                                                                                                                        terns. In Proc. of ICDE, pp. 3–14, 1995.


                                                                                                                                                                                                                                                                                                                                                                  57
[4] G. Gidofalvi and T. B. Pedersen. Spatio-Temporal Rule
    Mining: Issues and Techniques. In Proc. of DaWaK,
    pp. 275–284, 2005.
[5] G. Gidofalvi and T. B. Pedersen. Mining Long,
    Common Patterns in Trajectories of Moving
    Objects. A DB Technical Report (15), 2006:
    www.cs.auc.dk/DBTR/DBPublications/DBTR-15.pdf
[6] B. Goethals. Survey on frequent pattern mining.
    citeseer.ist.psu.edu/goethals03survey.html
[7] C. S. Jensen, D. Pfoser, and Y. Theodoridis. Novel Ap-
    proaches to the Indexing of Moving Object Trajecto-
    ries. In Proc. of VLDB, pp. 395-406, 2000.
[8] C. S. Jensen, H. Lahrmann, S. Pakalnis, and S.
    Runge. The INFATI data. Time Center TR–79, 2004:
    www.cs.aau.dk/TimeCenter
[9] N. Mamoulis, H. Cao, G. Kollios, M. Hadjieleftheriou,
    Y. Tao, and D. W. Cheung. Mining, Indexing, and
    Querying Historical Spatiotemporal Data. In Proc. of
    KDD, pp. 236–245, 2004.
[10] J. Pei, J. Han, and R. Mao. CLOSET: An efficient
    algorithm for mining frequent closed itemsets. In Proc.
    of DMKD, pp. 11-20, 2000.
[11] X. Shang, K.-U. Sattler, and I. Geist. Efficient Fre-
    quent Pattern Mining in Relational Databases. In Proc.
    of LWA, pp. 84–91, 2004.
[12] I. Tsoukatos and D. Gunopulos. Efficient Mining of
    Spatiotemporal Patterns. In Proc. of SSTD, pp. 425–
    442, 2001.
[13] M. Vlachos, D. Gunopoulos, and G. Kollios. Discover-
    ing Similar Multidimensional Trajectories. In Proc. of
    ICDE, pp. 673–685, 2002.
[14] D. Xin, J. Han, X. Yan, and H. Cheng. Mining Com-
    pressed Frequent-Pattern Sets. In Proc. of VLDB, pp.
    709–720, 2005.
[15] X. Yan, J. Han, and R. Afshar. CloSpan: Mining
    closed sequential patterns in large datasets. In Proc.
    of SDM, pp. 166–177, 2003.


                                                              58