Discover spatio-temporal cluster from trajectory data
enhance by heterogeneous contextual knowledge
using FCA and the NextPriorityConcept Algorithm
Jérémy Richard1,∗ , Guillaume Savarit1 , Salah Eddine Boukhetta1 , Cyril Faucher1 ,
Karell Bertet1 and Christophe Demko1
1
    Laboratory L3i, La Rochelle University, La Rochelle, France


                                         Abstract
                                         The rising number of different kinds of data that can be used to describe a human trajectory (Such as
                                         GPS Coordinate, GSM, RFID, RSSI…) put in the spotlight the semantically rich trajectory. A semantic
                                         trajectory annotates semantic knowledge directly into raw data based on features of the studied area
                                         such as point of interest or weather conditions. One of the challenges of mobility studies nowadays
                                         is to find the right data model to shape all those data coming from different source into a framework
                                         flexible enough to multiply the contextual data that can be used; where contextual data are knowledge
                                         coming from external data source (public city dataset, web pages, national weather services etc….).
                                         Such data models are the key component of mobility studies, but oftentimes lose the computational
                                         aspect of trajectories. In this paper, we will use the semantically rich trajectory as a way to analyse
                                         behavioral data enriched by contextual knowledge as this issue has rarely been addressed in the state
                                         of the art. We will study the use of formal concept analysis and pattern mining as a way to compute
                                         complex sequential patterns in a dataset of semantic trajectories by using the NextPriorityConcept
                                         algorithm. This kind of formal concept analysis allows an interactive analysis between individuals path
                                         and contextual data resulting in a hierarchy of spatio-temporal clusters where each cluster contains a
                                         specific pattern depicting the trajectories within.


1. Introduction
A trajectory represents the path of a moving object in space. Most of raw trajectories are
formalised as a sequence of time related coordinates (𝑥, 𝑦) at a time 𝑡, noted < (𝑥𝑖 , 𝑦𝑖 ), 𝑡𝑖 >. The
most common type of data used to reconstruct such paths is GPS coordinates. Yet, with the
exponential growth of the internet of Things, we collect more and more data from various ways
(such as GSM, radio frequency etc….) that are related to a person’s path (or animals, vehicles
etc ..). Therefore, making mobility studies more and more complexes taking into account a wide
range of data that can be used to reconstitute the movement of an object in space. This question
the way we can use and model this information. To address this problem, researchers consider

Published in Pablo Cordero, Ondrej Kridlo (Eds.): The 16𝑡ℎ International Conference on Concept Lattices and Their
Applications, CLA 2022, Tallinn, Estonia, June 20–22, 2022, Proceedings, pp. 159–173.
∗
    Corresponding author.
Envelope-Open jeremy.richard2@univ-lr.fr (J. Richard); guillaume.savarit1@univ-lr.fr (G. Savarit); salah.boukhetta@univ-lr.fr
(S. Boukhetta); cyril.faucher@univ-lr.fr (C. Faucher); karell.bertet@univ-lr.fr (K. Bertet);
christophe.demko@l3i.univ-lr.fr (C. Demko)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
the use of semantic trajectories [1] [2]. A semantic trajectory adds contextual knowledge to
a path of a mobile object; where contextual knowledge can be defined as every information
bringing a better understanding of an event, person or object. This information brings to
trajectories another point of view avoiding to rely only on movement data. Weather condition
or events such as protests or all kinds of “background” information can be used to discover
trajectory-shaping decisions. This data needs to be added onto those trajectories and are often
time shaped as an episode. An episode is a time interval (𝐴, 𝑡, 𝑡) where 𝑡, 𝑡 are timestamps and
A a semantic annotation. Using a succession of episodes such as the sequence of “district”
crossed by a trajectory can be interpreted as a “discretization” of this one. Nevertheless, not
only the “context-aware” trajectory has been hard to analyze, but also the temporal aspects of a
semantic trajectory are largely ignored and only a few studies address this problem directly. In
this paper, we propose a way to analyse such enhanced trajectories by using the similarities
between semantic trajectories and temporal sequences. Recent advances in the field of pattern
mining and Formal Concept Analysis (FCA) show promising ways to analyze heterogeneous
data coming from multiple datasets simultaneously; method that we are going to apply to
such semantic trajectories. Processing complex semantic trajectories using the GALACTIC
framework and the NextPriorityConcept[3] algorithm shows that it is possible to extract
such mobility patterns. We also can add any type of contextual data during the process enabling
an interactive analysis where we can follow the mobility of an object enriched with all kinds of
contextual data evolving on the edges of its trajectory. Also, adding contextual knowledge is a
mean to avoid the deluge of pattern by focusing on specific behaviors. We propose a method for
movement data analysis focusing on the semantic aspect of trajectories alongside interactive
analysis with contextual data. As of today, the number of methods for analysing the relationship
between a moving object and background information is relatively low.
   The paper is organized as follows; in section 2 we will present related work both from the
field of trajectory modelling and formal concept analysis. Section 3 will explain the library
GALACTIC. In section 4 we will explore the computational properties of such a tool with
experimentations on heterogeneous data. We are using real life semantic trajectories captured
by our team and composed by touristic path in the city of La Rochelle (France) and enriched
with contextual knowledge.


2. From trajectories to temporal sequences: Related work
2.1. Trajectory processing :
In this section, we will present an overview of existing state-of-the-art models for semantic
trajectories. First, we will detail models and solutions to add more contextual knowledge into
trajectories. Next, we will study methods and analysis techniques for movement data. Based on
a dataset of semantic trajectories, we will describe the existing solutions used to retrieve similar
sub-trajectory common behaviors or movement patterns.
2.1.1. The semantic aspect and the knowledge management.
Semantic trajectory is defined by Parent et al. 2013 [4] as : “ … a [raw] trajectory that has been
enhanced with annotations and/or one or several complementary segmentation.”. The addition
of semantic knowledge directly into a raw trajectory enables researchers to better interpret
such data. A simple GPS coordinate can become “street A”, and a series of radio frequency from
multiple sensors in a system of indoor localization can become “room A”. The definition and
the formalisation of the semantic trajectory will differ from a data model to another but the
main idea is to provide a framework generic enough to be used in many contexts. To do so,
such models are usually structured as a sequence of time related episodes where an episode is a
group of positions from a raw trajectory over a time period according to a preset defined by the
data-scientist [5]. This preset can be a POI (Place Of Interest), a behavior or any other attribute
that the researcher chooses to focuses on in his model which will be shaped as an alphabet.
The first model for semantic trajectories is the “stop and move” model [6]. By semantically
annotating when an individual stops and moves again, S. Spaccapietra et al. (along L. O Alvares
also based on stop and move [7]) built their semantic trajectories as a succession of stop and move.
A stop is a non-empty time interval [𝑡, 𝑡] where the travelling object does not move and a move is
a non-empty time interval [𝑡, 𝑡] where the travelling object is in motion. In this model, stop and
move are episodes. This work made the first step in the field of semantic trajectory inspiring
several works such as [8] where a framework for semantic annotation and enhancement is
proposed. Shortly after, semantic trajectories began to use ontology-based models such as [9]
still based on the first stop and move model. Ontology formalism is then used in those models as
a mean to convert the raw trajectory into a high-level semantic representation. The “stop” and
“move” model can be seen as a discretization of the speed while a “district” is a discretization of
space. Those ontology-based data models for semantic trajectories are only “queryable” and
cannot be used in pattern mining processes. As a matter of fact, the analysis of these data
models is a major issue in the field of semantic trajectories.

2.1.2. Trajectory analysis.
Mobility patterns tend to vectorize raw GPS trajectories such as the trajectory clustering
patterns where the goal is to determine common trajectories in order to group similar ones.
Using a similarity measure ( [10] where authors proposed a small overview), several studies
proposed clustering algorithms for trajectories such as [11] where authors used those metrics
to extract common sub-trajectories. Nevertheless, these studies used raw GPS trajectories
and similarity measures can be difficult to adapt to other types of trajectories such as indoor
semantic trajectories. In a notable uncommon trajectory clustering study very close to our
work [12], J. Nyoman et al. experiment the use of sequential pattern mining for analysing
visitor path in a museum. By shaping trajectories as sequences, the article combines sequence
mining with algorithms such as MFCS ( for “Mining Frequent Contiguous Subsequences” )
and MRGS ( for “Mining Rare General Subsequences” ) with movement data analysis; authors
were able to identify four visiting behaviors. The use of pattern mining techniques to process
trajectories shows promising results for analysing and clustering movement data with semantic
trajectories [13]. These works are closely related to what has been done in sequential pattern
analysis. This is an emerging issue in the field of movement data analysis, but still we can
cite [14] where authors proposed Splitter, a framework capable of efficiently mines sequential
pattern in semantic trajectories with as a classification approach. In [15], C. Huiping et al.
proposed to use an Apriori-like algorithm to solve this problem in a dataset of trajectories.
Several studies have addressed this problem in the larger field of sequence mining and they will
be discussed next section.


2.2. Introduction to FCA, pattern mining and sequence mining:
2.2.1. The basis of pattern mining.
A sequence is defined as a succession of elements from an alphabet Σ, often in the form of
𝑠 =< 𝐴𝑖 >𝑖≤𝑛 , where 𝐴𝑖 ∈ Σ. Each element of the dictionary can be text, action, item, protein and
so on. The ordering of elements in a sequence is very important and is a key element in this data
modelling. This form of representation gained popularity in the 90s, in particular with the work
of Agrawal Srikant with the Mining sequential pattern [16]. Given a dataset of customers, they
build sequences as a succession of items purchased by an individual and managed to get several
lists of items that are frequently bought together through a pattern mining algorithm and more
specifically the Apriori algorithm. By using pattern mining techniques, we allow a semantic
kind of representation to be computed and the goal is then to find common patterns in our
dataset. Sequence mining is a subfield of data mining which aims at finding patterns in a dataset
of sequences that appear more frequently. Patterns can be subsequences, prefixes, suffixes,
subsequences according to a sliding window [17]. The first generation of algorithms emerged
in the 90s with the article of Agrawal and Srikant [16] which extends the well-known Apriori
algorithm to sequence mining. All these algorithms take as input a dataset of sequences and a
minimum support threshold, and generate the set of frequent subsequences. For big datasets
and a short minimum support, these algorithms take a huge computing time and generate a
too large number of patterns that are difficult to interpret and contain redundancy. A second
generation of algorithms focuses on maximal patterns also know as closed patterns because
they verify a well-known property of closure in order to limit the number of patterns extracted
- called the deluge of pattern. Many algorithms directly address this problem (CloSpan [18],
ClaSP [19]).

2.2.2. The basis of formal concept analysis.
Some algorithms also appear within Formal Concept Analysis (FCA) frameworks [20] and their
extensions to pattern structures[21], where the lattice property of closed patterns is promoted.
We can mention an article for mining medical care trajectories using pattern structures [22].
Pattern concepts are built as maximal sets of individuals with their maximal common subse-
quences, the whole set of concepts is equipped with a specialization/generalization relation
from a partially ordered set with the lattice property. This lattice represents the initial data
where concepts are clusters of “similar” sequences, and the concept lattice is a hierarchy of
clusters (regrouping objects with their associated common patterns). The use of the Formal
Concept Analysis while working with semantic sequences is a promising way to find common
patterns of behavior [23].

2.2.3. Temporal sequence mining.
To the best of our knowledge, Yoshida et al. [24] where the firsts introduce the notion of
temporal pattern mining, called “Delta patterns” where a pattern (𝑎, [0, 3], 𝑏) is a sequential
pattern (𝑎, 𝑏) that frequently occurs in the dataset and has a transition time from 𝑎 to 𝑏 of [0, 3],
a time interval. This transition time between two elements has known a notable extension in
work such as [25]. In [26], researchers focus on mining chronicles, where a chronicle is a couple
(ℰ , 𝒯 ) with ℰ is a set of temporal events and 𝒯 a set of temporal constraints on the set ℰ. A
temporal constraint noted 𝑒1 [𝑡 − , 𝑡 + ]𝑒2 represent the time gap between two events 𝑒1 and 𝑒2 .
    Thus, a parallel can be made between semantic trajectories and temporal sequences. By
extracting the semantic annotations and the time information of all episodes of the semantic
trajectories, we end up with a sequence of time intervals denoted by 𝑆 =< 𝐴𝑖 , 𝑡𝑖 , 𝑡𝑖 >. As defined
in “Extracting temporal patterns from Interval-Based Sequences” [27], T. Guyet and R. Quiniou
defined a way of representing temporal sequences “A temporal sequence S is an ordered set
of events, where an event 𝒜 = (𝐴, [𝑙, 𝑢]) is composed of a symbol A and a nonempty interval
[𝑙, 𝑢], where l and u are dates.” Based on this definition, semantic annotations of episodes
can then be considered as temporal events, making the whole semantic trajectory a temporal
sequence. The goal of temporal pattern mining is then to find sequential patterns from a dataset
of temporal sequences, such as ({𝑎, 𝑏}, [1, 5]), where the event a and b are sharing a common
time interval. Several temporal pattern mining algorithms showed good results in extracting
sequential patterns, such as NeGPSan [28] to extract negative sequential patterns. Nevertheless,
these approaches cause a deluge of patterns and remain dedicated to a sequence dataset on a
same alphabet.


3. Description of the NextPriorityConcept algorithm
The NextPriorityConcept algorithm [3] computes concepts from complex and heterogeneous
data for a set of objects 𝐺. We first introduce the notion description 𝛿, which is an application to
provide the smallest set of predicates describing a set of objects 𝐴 ⊆ 𝐺, based on their character-
istics. A characteristic describes our objects, such as numerical, discrete, semantic, temporal etc
… A concept (𝐴, 𝛿(𝐴)) is composed of a subset of A and a set of predicates 𝛿(𝐴) describing the
objects it contains. An example of predicate can be “less than c” or “match subsequence s” where
c is the max of the numerical values and s is a maximal common subsequence of the sequence .
Depending on the data type, generation algorithms will differ. Each predicate is specific to one
type of characteristics. The final description 𝛿 is the union of all those predicates.

                                        𝛿(𝐴) = ⋃ 𝛿𝑖 (𝐴)                                           (1)
                                                 𝑖∈1…𝑛
  Where (𝛿𝑖 (𝐴))𝑖∈1…𝑛 is a family of descriptions. This is the exact principle that allows us to
perform heterogeneous analyses. The generation of a lattice with NextPriorityConcept is
inspired by Bordat’s algorithm [29]. It recursively computes the immediate successors of a
concept, starting at the bottom concept.The NextPriorityConcept focuses on objects G and
starts the computation at the top concept, (𝐺, 𝛿(𝐺)) containing the whole set of objects 𝐺 and
their common description by predicates 𝛿(𝐺) until no more concepts can be generated. These
concepts are computed on demand and do not need any kind of preprocessing.
   In order to generate the immediate predecessors of a concept (𝐴, 𝛿(𝐴)), the algorithm intro-
duces strategies 𝜎 that select predecessors of such a concept based on each characteristic. The
strategy refines the description 𝛿(𝐴) to a reduced set 𝐴′ ⊂ 𝐴. We call cardinality the number
of individuals within 𝐴′ . 𝜎 is an application such that 𝜎 ∶ 2𝐺 → 2𝑃 where 𝑃 is the set of all
possible predicates. Several strategies are available to generate predecessors of a concept such
as the naive strategy (considering all possible predecessors of a concept), or strategies reducing
the number of predecessors.
   The use of predicates regardless of the data allows a generic implementation of algorithms
which mixed with a system of plugins, enables for an easy integration of new data types (de-
scription and strategies), hence, taking into account a wide range of data types. GALACTIC1
(GAlois LAttices, Concept Theory, Implicational systems and Closures) is a development plat-
form for our algorithm allowing easy integration of new plugins for characteristics, descriptions
and strategies.
   NextPriorityConcept maintains the lattice structure using queue for a generation level
by level, and a mechanism of propagation of constraints to ensure the meet and join will be
generated.
   With GALACTIC and the NextPriorityConcept algorithm, we contribute to the field of
temporal sequences by creating two descriptions 𝛿 and a set of strategies 𝜎 where we consider
temporal sequences [30] [31] as:
        • A temporal sequence such that 𝑆 = (⟨𝑎, 𝑡⟩𝑖 )𝑖∈1…𝑛 with 𝑎 ∈ Σ is an item from a dictionary Σ
          and t is a timestamp. Then, we compute the distance (either transition time or duration)
          between two items of S.
        • A temporal sequence such that 𝑆 = (⟨𝑎, [𝑡𝑠 , 𝑡𝑒 ]⟩𝑖 )𝑖∈1…𝑛 where 𝑎 ∈ Σ is an item from a
          dictionary Σ and [𝑡𝑠 , 𝑡𝑒 ] the interval of time during the element a occurs. We will then
          compute common sub-sequences of items during a maximal similar interval of time.


4. Experimentations
4.1. The Geoluciole dataset
The Geoluciole dataset is composed of GPS trajectories in
the city of La Rochelle France. This dataset was collected
during the summer of 2019, from tourists visiting the city
during their vacation. It contains 192 trajectories of dif-
ferent individuals. Their positions were obtained by an
application we developed, giving their position every 30
seconds during their stay. Before activating their applica-     Figure 1: Segmentation of the city of La Rochelle
tion, we also had them fill out a survey, giving contextual               into district

1
    https://galactic.univ-lr.fr
knowledge, such as the staying status or the arrival means.
By matching the GPS coordinates to districts of the city such as shown in figure 1, we can then
transform these raw data into semantic trajectories. The segmentation of the trajectory into a
sequence of districts has been done by using a GIS (Geographic Information Systems). We can
also add all kinds of knowledge we have webscrapped, such as the beach they were and parks
which is spatial information and weather and tide information which is more of a temporal
information (figure 2). This data representation is very similar to the one we can find in [32] a
“Multi-Level and Multiple Aspect Semantic Trajectory Model”.

4.2. The semantic trajectory model
In Table 2, each “aspect” (or side) of those semantic trajectories keeps the specificity of the
data within. The district, weather, tide, green space (parks) and beach are sequences of temporal
intervals. In each “temporal” aspect of a sequence, the temporal information represents the
hour of the day. While the district, green space and beach depends on the localisation of the user
between 𝑡𝑘 and 𝑡𝑘 , the weather and tide depends on the temporal information and are issued
from webscrapping.


Figure 2: Semantic trajectory of a tourist in La Rochelle


   We formalise the semantic trajectory model for a trajectory 𝑇 as follows : 𝑇 ∶ ⟨𝛼1 , 𝛼2 … 𝛼𝑚 ⟩
where 𝑇 is described by 𝑚 aspects and each aspects 𝛼𝑖 , with 𝑖 ≤ 𝑚 can either be a temporal
sequence 𝛼𝑖 = (⟨𝑎𝑘 , 𝑡𝑘 , 𝑡𝑘 ⟩)𝑘≤𝑛 , on an alphabet Σ𝑖 and 𝑎𝑘 ∈ Σ𝑖 , a semantic value 𝛼𝑖 = 𝑋 on an
alphabet Σ𝑖 and 𝑋 ∈ Σ𝑖 , or a numeric value 𝛼𝑖 = 𝑥. Our dataset of trajectories 𝒟 is given by
𝑇𝑗 ∶ ⟨𝛼1𝑗 , 𝛼2𝑗 … 𝛼𝑚𝑗 ⟩. The description space used with GALACTIC for any trajectory 𝑇𝑗 ∈ 𝒟 is a
set of predicate defined by :

                                 𝛿(𝑇𝑗 ) ∶ ⟨𝛿1 (𝛼1𝑗 ), 𝛿2 (𝛼2𝑗 ) … 𝛿𝑚 (𝛼𝑚𝑗 )⟩                   (2)

4.3. Experiment description
In this section, we use GALACTIC to extract concepts from the Geoluciole dataset. First, we will
study the impact of a heterogeneous analysis on the number of concepts (ie. clusters) generated.
Then, we will focus on specific parts of the dataset with selected semantic information in order
to better define touristic behaviors. Table 1 shows the descriptions and strategies used during
this experimentation. Table 2 show a description of each dataset.
                                                 𝛿                          𝜎
               Temporal Sequence      maximal common interval       minimal cardinality
                    Chain                 chain matching             complete match
                   Numeric               simple numerical                quantile
Table 1
Descriptions and strategies associated with datatype


   NextPriorityConcept uses descriptions 𝛿 to define a set of predicates describing the at-
tributes. As predicates, the descriptions can then be seen as binary table. We also makes use of
strategies to choose next predecessors at each iteration.
   In this paper, we will use the following descriptions and strategies :

    • Numeric : We describe a set 𝐴 = {𝛼1 , ..𝛼𝑘 } of numerical values by the “simple numerical”
      description 𝛿𝑆 :

                        𝛿𝑆 (𝐴) = {is greather than min(A), is lesser than max(A)}

      The strategy is the quantile description 𝜎𝑄 (𝐴, 𝑘) where k is the number of quantiles:

                    𝜎 (𝐴) = { is greater than 𝑞𝑗 , is lesser than 𝑞𝑗 : 𝑞𝑗 is a k-quantile }         (3)

    • Chain : For a set 𝐴 = {< 𝑠𝑖 >} of sequences, we use the “chain matching” description
      𝛿𝐶𝑀 (𝐴, 𝑘) that compute the maximal number of subsequences of size k, with ⊆𝑠 is the
      subsequence relation:

                           𝛿𝐶𝑀 (𝐴, 𝑘) = {𝑥 ∈ Σ∗ ∶ ∀𝑎 ∈ 𝐴, 𝑥 ⊑𝑠 𝑎 and |𝑥| = 𝑘}                       (4)

      We use the “complete match” strategy 𝜎𝐶𝑀 that computes all the possibles subsets 𝐴′ ⊂ 𝐴:

                        𝜎𝐶𝑀 (𝐴) = {𝑥 ∶ 𝑥 ∈ 𝛿𝐶𝑀 (𝐴′ ), 𝐴′ = 𝐴\{𝑎}, for all 𝑎 ∈ 𝐴}                    (5)

    • Temporal sequence : We consider a set of temporal sequences 𝐴 = {< 𝑋𝑖 , 𝑇𝑖 >} where
      𝑇𝑖 = 𝑡𝑖 , 𝑡𝑖 is an interval and 𝑋𝑖 an itemset. For a sequence 𝑎 ∈ 𝐴, the projection 𝜙𝑇 (𝑎) selects
      all the itemsets of 𝑎 included in the interval. We use the “maximal common interval”
      description that computes the set of maximal common sub-intervals:

                               𝛿𝑀𝐶𝑇 𝐹 (𝐴) = {⟨(𝑇 , 𝑋 )⟩ ∶ ∀𝑎 ∈ 𝐴, 𝑋 ⊆ Φ𝑇 (𝑎)}                       (6)

      The “minimal cardinality” strategy 𝜎𝐴 𝑀𝐶(𝐴) adds element of minimal cardinality to the
      subsequences of the description:

                          𝜎𝐴𝑀𝐶 (𝐴) = {⟨(𝑇 , 𝑋 )⟩ ∶ ∀𝑎 ∈ 𝐴, 𝜙𝑇 (𝑎) ⊆ 𝑋 and ∀𝑥 ∈ 𝑋                    (7)
                               𝑛𝑏(𝐴, 𝑇 , 𝑥) = |𝐴| and 𝑛𝑏(𝐴, 𝑇 , 𝑋 ) = 𝑛𝑏𝑚𝑖𝑛 (𝐴, 𝑇 )}                (8)
             Aspects              Type                                     Dictionnary                                 Source
             district       Temporal sequence                         Σ = set of city district                       Trajectory
            weather         Temporal sequence        Σ = {‶ 𝑟𝑎𝑖𝑛𝑖𝑛𝑔",‶ 𝑚𝑜𝑑𝑒𝑟𝑎𝑡𝑒 𝑟𝑎𝑖𝑛𝑖𝑛𝑔",‶ 𝑐𝑙𝑜𝑢𝑑𝑦",‶ 𝑠𝑢𝑛𝑛𝑦"}        Web scrapping
              beach         Temporal sequence   Σ = {‶ 𝑛𝑜 beach",‶ 𝑃𝑙𝑎𝑔𝑒 𝑑𝑒 𝑙𝑎 𝐶𝑜𝑛𝑐𝑢𝑟𝑟𝑒𝑛𝑐𝑒",‶ 𝑃𝑙𝑎𝑔𝑒 𝑑𝑒𝑠 𝑀𝑖𝑛𝑖𝑚𝑒𝑠"}    Trajectory
          green space       Temporal sequence                          Σ = set of city parks                         Trajectory
               tide         Temporal sequence                      Σ = {‶ ℎ𝑖𝑔ℎ tide",‶ 𝑙𝑜𝑤 tide"}                   Web scrapping
      average temperature       Numeric                                                                             Web scrapping
           stay status           Chain               Σ = {‶ 𝑤𝑖𝑡ℎ 𝑓 𝑎𝑚𝑖𝑙𝑦",‶ 𝑤𝑖𝑡ℎ 𝑓 𝑟𝑖𝑒𝑛𝑑𝑠",‶ 𝑐𝑜𝑢𝑝𝑙𝑒",‶ 𝑎𝑙𝑜𝑛𝑒"}      questionnaire
         arrival means           Chain                          Σ = {‶ 𝑡𝑟𝑎𝑖𝑛",‶ 𝑐𝑎𝑟",‶ 𝑏𝑢𝑠",‶ 𝑣𝑒𝑙𝑜"}                questionnaire

Table 2
Dataset heterogeneous aspects’ description and source


4.4. Impact of heterogeneous data on the patterns
Running a heterogeneous analysis with multiple description and strategy types can reduce the
number of concepts in the final lattice. While a “naive” strategy selects all predicates within a
description 𝛿, such as 𝜎𝐶𝑀 , other strategies such as 𝜎𝐴𝑀𝐶 will select specific predicates and skip
other to refine the analysis.
   Example: Table 3 shows a dataset 𝐷 of five individuals, characterized by a numeric value
and a sequence of intervals. In this example, we will be using the simple numerical description
and quantile strategy for the numerical aspects and the maximal common interval description
alongside the minimal cardinality strategy for the sequence of intervals.

                      Individual         Numeric (𝑥)                          Interval
                          a                 1                   (8.3, 11): “P”, (13, 15): “M”, “H”
                          b                 2                   (10, 12): “P”, ”H”, (14, 16): “M”
                           c                3                   (8.3, 12): “P”, (14, 16): “M”, “C”
                          d                 4                     (7, 9): “P”, “H”, (12, 13): “M”
                          e                 5                   (10, 11): “P”, (12, 12): “M”, “C”
Table 3
Dataset 𝐷

   Let 𝐿 be the lattice shown in Figure 3 as the result of a numeric description and quantile
strategy, taking into account only the numeric characteristics. By adding an interval description
and strategy, it results in the lattice 𝐿′ with a fewer number of concepts. Figure 4 shows that
concept $1 (𝑥 ≤ 4) and $2 (𝑥 ≥ 2) from 𝐿 do not appear in the lattice 𝐿′ as they have not
been selected by the minimal cardinality strategy. The concepts contained in 𝐿′ are the most
representative of the two description spaces.

4.4.1. Naive analysis using time interval.
first, we will only consider districts since they are a geographical segmentation of trajectories
into districts of the city. By doing so, we obtain 416 concepts with a low cardinality value. The
huge number of concepts generated involving only one or two trajectories is an indicator that
there are no significant places where multiple individuals were at a same time of the day. To
sharpen our research, we then choosed to add other aspects of semantic trajectories in order to
discover new significant behaviors depending on external factors. The next section will present
                                                   $0: #5


                                                                                                                                                                            $0: #5

                                        $1: #4             $2: #4
                                        x<=4               x>=2
                                                                                                                                                                                                      $2: #3
                                                                                                                                                        $1: #3
                                                                                                                                           s match ["[8.3;9):['P']"]              x<=3
                                                                                                                                                                                  s match ["[10;11):['P']", "[14;15):[ ...
                               $3: #3                                   $5: #3
                                                   $4: #3
                               x<=3                                     x>=3
                                                                                                                                                   $3: #2                                             $6: #2                 $5: #2
                                                                                                                                                                 $4: #2
                                                                                                                                                   x>=3                           s match ["[10;12):['P']", "[14;16):[ ...   x<=2

                      $6: #2                                                        $9: #2
                                        $7: #2              $8: #2
                      x<=2                                                          x>=4
                                                                                                                                          $7: #1
                                                                                                                                                                          $8: #1             $9: #1         $10: #1
                                                                                                                      s match ["[7;9):['H', 'P']", "[12;13 ...
                                                                                                                                                                          ['c']              ['a']          ['b']
                                                                                                                      ['d']
            $10: #1                                                                          $14: #1
                             $11: #1             $12: #1                 $13: #1
            x<=1                                                                             x>=5
                             ['b']               ['c']                   ['d']
            ['a']                                                                            ['e']
                                                                                                                                                                                      $11: #0


                                                  $15: #0


                                                                                                                  Figure 4: Lattice L’ by adding interval
   Figure 3: Lattice L with quantile strategy                                                                     description and strategy


experimentations by taking into account multiple aspects of semantic trajectories, such as the
weather or the staying status. With only the weather and the same approach, we obtain 379
concepts.

4.4.2. Analysis with heterogeneous data


                                                                                  400

                                                                                  350

                                                                                  300
                                                             NUMBERS OF CONCEPT


                                                                                  250

                                                                                  200

                                                                                  150

                                                                                  100

                                                                                  50

                                                                                   0
                                                                                    WEATHER DISTRICT TIDE       AVG BEACH ARRIVAL STAY
                                                                                                            TEMPERATURE   MEANS STATUS


Figure 5: Number of concept by attributes successively added


   In this experiment, we exploit the ability of GALACTIC to deal with heterogeneous data
by adding successively the other attributes to the “weather” attributes (the ones that generate
the largest number of concepts). Hence it is possible to mix several sequences (such as shown
in Equation 1 and 2) together with other data in a heterogeneous analysis. With nearly all
attributes added successively, we diminish the number of concepts to only 36. Figure 5 gives
the number of concepts obtained for each added attributes that decreases from 379 (only the
weather) to 36 (with all our attributes).
   The table 6 in appendix gives the descriptions by common predicates of three concepts
(concept number 0 is the top concept for all attributes). We can observe that the addition of
more knowledge gives a more precise description of concepts, and a reduction of the number of
patterns.

4.5. Focus on some particular behaviours when reducing pattern
One way for the data analyst to analyse the dataset is to focus on particular behaviors, where a
behavior can be a subset of objects (people going to the beach) or a part of the description (an
action during the night).

4.5.1. The detection of beach trip related to the weather.
In order to focus on visitors going to the beach, we choose to analyse beach with the weather
that is semantically meaningful for this experiment. The dataset is composed of 108 trajectories,
and generates 93 concepts as a result of the formal concept analysis. The Table 7 (found in
Appendix) shows some concepts containing valuable information, such as a strong correlation
between “raining” and “no beach”, meaning that people won’t go to the beach if it’s raining .
   In the dataset, 31 trajectories contain rain.
   On that part of the trajectory we can see that 8 of them can be described with a predicate
which matchs “raining” between 11 and 12 am and “no beach”. Other predicates are also
generated which also describe that bad weather has an impact on a beach trip and this predicate
seems to focus on bad weather in the morning.

4.5.2. The detection of sleeping habits during touristic stay.
With the time information, it is also possible to analyse any part of the day. One possible
user driven analysis could be to see if sleeping habits are dependent on the staying status of
individuals. To do so, we will make an analysis with two attributes, the staying status, which is
a chain and locations which are temporal. With 138 trajectories, we have in total 340 concepts,
and 108 for the sleeping period of time (between 22 and 10).
   Some of the results can be seen in Table 8 (see Appendix). As we can see, predicates are
generated and are supported by nearly 10% of “family” trajectories and more than 10% of “with
friends” trajectories to locations subsequences, which by comparing it with other predicates of
supersequence seem to represent sleeping places (no movement during a long period of time
between 0 and 10 am).


5. Conclusion and future works
In this paper, we propose a way to process heterogeneous data using the NextPriorityConcept
algorithm alongside the GALACTIC platform. In the experiments, we illustrate the advantage
of mixing heterogeneous data while keeping their semantic and readability with two examples.
First, by reducing the number of patterns by adding contextual knowledge onto trajectories
and afterwards by focusing on specific aspects of the data to detect and identify distinctive
behaviors. The specificity of this approach is to keep the raw data within and avoid any kind
vectorization technique. Also, it allow us to enrich the dataset with semantic information
from the space and/or temporal knowledge directly from the trajectory (such as districts in this
paper), surveys filled out by individuals (staying status, arrival means) or even web scrapping
(weather, tide). The plugins system of GALACTIC allows to easily integrate descriptions and
strategies for heterogeneous data and offering a wide range of data types to work with. The
NextPriorityConcept and its ability to run interactive, heterogeneous analysis is a big step
in data science and offers new ways to deal with complex data structures such as semantic
trajectories.
   For future works, we would like to measure the quality of generated concepts to better
represent and sort the relevance of their predicates (such as the stability measure). These results
should become available very soon. Also, we are currently working on an incremental tool in
order to reinforce the interactivity with a user driven tool where the data-analyst can select
data to analyse it in an interactive way. By doing so, we allow the data scientist to select which
side of the data he wants to explore, because only he knows best the semantic of the data he
manipulates. In particular, it gives the possibility to change the strategy or description during
every step of the process (where new concepts are generated), in an intuitive and simple way.
For example weather and tide information are not interesting during the night. We wish that by
giving the opportunity to the data-scientist to interactively explore the data in a user-driven
approach, we will be able to optimize data mining processes.


References
 [1] A. Kontarinis, K. Zeitouni, C. Marinica, D. Vodislav, D. Kotzinos, Towards a Semantic
     Indoor Trajectory Model, in: 2nd International Workshop on ”Big Mobility Data Analytics”
     (BMDA) with EDBT 2019, Lisbon, Portugal, 2019. URL: https://hal.archives-ouvertes.fr/
     hal-02314572.
 [2] S. Ilarri, D. Stojanovic, C. Ray, Semantic management of moving objects: A vision towards
     smart mobility, Expert Systems with Applications 42 (2015) 1418 – 1435. doi:10.1016/
     j.eswa.2014.08.057.
 [3] C. Demko, K. Bertet, C. Faucher, J. Viaud, S. O. Kuznetsov, Nextpriorityconcept: A new and
     generic algorithm computing concepts from complex and heterogeneous data, Theoretical
     Computer Science 845 (2020) 1–20.
 [4] C. Parent, S. Spaccapietra, Semantic trajectories modeling and analysis, ACM Comput.
     Surv. 45 (2013) 1–32. doi:10.1145/2501654.2501656.
 [5] D. Mountain, J. Raper, Modelling human spatio-temporal behaviour: A challenge for
     location-based services, 2001.
 [6] S. Spaccapietra, C. Parent, M. L. Damiani, J. A. de Macedo, F. Porto, C. Vangenot, A
     conceptual view on trajectories, Data Knowledge Engineering 65 (2008) 126–146.
 [7] L. O. Alvares, V. Bogorny, B. Kuijpers, J. A. F. de Macedo, B. Moelans, A. Vaisman, A
     model for enriching trajectories with semantic geographical information, Proc. of the 15th
     Annual ACM International Symposium on Advances in Geographic Information Systems.
     ser. GIS ’07. ACM, 2007 (2007) 22:1–22:8.
 [8] V. Bogorny, C. Renso, A. R. de Aquino, F. de Lucca Siqueira, L. O. Alvares, Constant – a
     conceptual data model for semantic trajectories of moving objects, Transactions in GIS 18
     (2014) 66–88. doi:10.1111/tgis.12011.
 [9] Z. Yan, J. Macedo, C. Parent, S. Spaccapietra, Trajectory ontologies and queries, Transac-
     tions in GIS 12 (2008) 75–91. doi:10.1111/j.1467-9671.2008.01137.x.
[10] K. Toohey, M. Duckham, Trajectory similarity measures, Sigspatial Special 7 (2015) 43–50.
[11] S. J. Camargo, A. W. Robertson, S. J. Gaffney, P. Smyth, M. Ghil, Cluster analysis of typhoon
     tracks. part i: General properties, Journal of Climate 20 (2007) 3635–3653.
[12] N. Juniarta, M. Couceiro, A. Napoli, C. Raïssi, Sequential pattern mining using fca and
     pattern structures for analyzing visitor trajectories in a museum, in: CLA 2018-The 14th
     International Conference on Concept Lattices and Their Applications, 2018.
[13] Y. Chen, P. Yuan, M. Qiu, D. Pi, An indoor trajectory frequent pattern mining algorithm
     based on vague grid sequence, Expert Systems with Applications 118 (2019) 614–624.
[14] C. Zhang, J. Han, L. Shou, J. Lu, T. La Porta, Splitter: Mining fine-grained sequential
     patterns in semantic trajectories, Proceedings of the VLDB Endowment 7 (2014) 769–780.
[15] H. Cao, N. Mamoulis, D. W. Cheung, Mining frequent spatio-temporal sequential patterns,
     in: Fifth IEEE International Conference on Data Mining (ICDM’05), IEEE, 2005, pp. 8–pp.
[16] R. Agrawal, R. Srikant, Mining sequential patterns, in: Proceedings of the Eleventh
     International Conference on Data Engineering, 1995, pp. 3–14.
[17] H. Mannila, H. Toivonen, A. Inkeri V., Discovery of frequent episodes in event sequences,
     Data mining and knowledge discovery 1 (1997) 259–289.
[18] X. Yan, J. Han, R. Afshar, Clospan: Mining: Closed sequential patterns in large datasets,
     in: Proceedings of the 2003 SIAM international conference on data mining, SIAM, 2003,
     pp. 166–177.
[19] A. Gomariz, M. Campos, R. Marin, B. Goethals, Clasp: An efficient algorithm for mining
     frequent closed sequences, in: Pacific-Asia Conference on Knowledge Discovery and Data
     Mining, Springer, 2013, pp. 50–61.
[20] B. Ganter, Attribute exploration with background knowledge, Theoretical Computer
     Science 217 (1999) 215–234.
[21] B. Ganter, S. O. Kuznetsov, Pattern structures and their projections, in: International
     conference on conceptual structures, Springer, 2001, pp. 129–142.
[22] A. Buzmakov, E. Egho, N. Jay, S. O. Kuznetsov, A. Napoli, C. Raïssi, On projections of
     sequential pattern structures (with an application on care trajectories), in: The Tenth
     International Conference on Concept Lattices and Their Applications-CLA’13, 2013.
[23] N. Juniarta, M. Couceiro, A. Napoli, C. Raïssi, Sequential Pattern Mining using FCA and
     Pattern Structures for Analyzing Visitor Trajectories in a Museum, in: CLA 2018 - The 14th
     International Conference on Concept Lattices and Their Applications, Olomouc, Czech
     Republic, 2018. URL: https://hal.inria.fr/hal-01887914.
[24] M. Yoshida, T. Iizuka, H. Shiohara, M. Ishiguro, Mining sequential patterns including time
     intervals, in: Data Mining and Knowledge Discovery: Theory, Tools, and Technology II,
     volume 4057, SPIE, 2000, pp. 213–220.
[25] S. Yen, Y. Lee, Mining non-redundant time-gap sequential patterns, Applied Intelligence
     39 (2013) 727–738.
[26] D. Cram, B. Mathern, A. Mille, A complete chronicle discovery approach: application to
     activity analysis, Expert Systems 29 (2012) 321–346.
[27] T. Guyet, R. Quiniou, Extracting temporal patterns from interval-based sequences, in:
     International Join Conference on Artificial Intelligence (IJCAI), Barcelone, Spain, 2011.
     URL: https://hal.inria.fr/inria-00618444.
[28] T. Guyet, R. Quiniou, NeGPSpan: efficient extraction of negative sequential patterns with
     embedding constraints, Data Mining and Knowledge Discovery 34 (2020) 563–609.
[29] J. Bordat, Calcul pratique du treillis de galois d’une correspondance, Mathématiques et
     Sciences humaines 96 (1986) 31–47.
[30] S. E. Boukhetta, C. Demko, K. Bertet, J. Richard, C. Cayèré, Temporal sequence mining
     using fca and galactic, in: International Conference on Conceptual Structures, Springer,
     2021, pp. 185–199.
[31] S. E. Boukhetta, J. Richard, C. Demko, K. Bertet, Interval-based sequence mining using fca
     and the nextpriorityconcept algorithm., in: FCA4AI@ ECAI, 2020, pp. 91–102.
[32] C. Cayèré, C. Sallaberry, C. Faucher, M. Bessagnet, P. Roose, M. Masson, J. Richard, Multi-
     level and multiple aspect semantic trajectory model: Application to the tourism domain,
     ISPRS International Journal of Geo-Information 10 (2021) 592.


Appendix

                    Concepts                          Patterns (common predicates)                                        Individuals
                       0                                             ∅                                                        All
                                                           temperature = 20.75
                                                           stay status: ’family’
                                                      arrival mean: ’personnal car’
                                   district
                                                               Dor
                                  weather
                                                                               cloudy
                       1              tide                                                                                    2
                                                                                          low tide
                                                                     Square Valin
                               green space
                                                             Parc Charruyer
                                    beach
                                                                              No beach
                                                 12     13     14        15      16      17        18        19    20 t
                                                  25.0417 ≥ temperature ≥ 20.75
                                                        stay status: ’family’
                                     district
                                                                   Dor
                                    weather
                                                                     cloudy

                       2                  tide                                                                                2
                                                                               low tide
                                green space
                                                           Parc Charruyer
                                       beach
                                                                              No beach

                                                      12      13         14       15          16        17        18 t


Figure 6: Examples of heterogeneous predicates
                      Patterns (common predicates)                                             Support      Support “raining”         Individuals

                weather match [“[11;12):[’raining’]”]                                           0.074              0.260                   8
                beach match [“[0;24):[’No beach’]”]

                 weather match [“[1;2):[’raining’]”]                                            0.037              0.130                   4
                 beach match [“[0;24):[’No beach’]”]

                 weather match [“[8;9):[’raining’]”]                                            0.028              0.100                   3
                 beach match [“[0;24):[’No beach’]”]

                 weather match [“[1;4):[’raining’]”]
                                                                                                0.019              0.065                   2
                 beach match [“[0;24):[’No beach’]”]


Figure 7: Sample of predicates showing the impact of weather on beach trip


                                      Patterns (common predicates)                                      Support   Support by status     Individuals

                                      ’stay status’ match (’family’,)
                                          district                                                       0.060          0.098                  8
                                                     Oratoire
                                                               9        10 t

                                      ’stay status’ match (’family’,)
                           district                                                                      0.043          0.073                  6
                                          Oratoire                                Oratoire
                                      3        4       5       6    7       8     9     10 t

                                    ’stay status’ match (’family’,)
                                district                                                                 0.036          0.061                  5
                                            Oratoire      Oratoire
                                                   7           8        9        10 t

                                      ’stay status’ match (’family’,)
                      district                                                                           0.036          0.061                  5
                                                               Oratoire
                                           9           10          11       12        13 t

                                      ’stay status’ match (’family’,)
                district                                                                                 0.029          0.049                  4
                                                       Oratoire                         Oratoire
                               1      2        3       4       5    6       7     8     9      10 t

                                   ’stay status’ match (’with friends’,)
                                       district                                                          0.049          0.108                  4
                                                     Fétilly
                                                           7       8            9 t

                                                                                               1
Figure 8: Sleeping habits by status


                                                                                  1