Discover spatio-temporal cluster from trajectory data enhance by heterogeneous contextual knowledge using FCA and the NextPriorityConcept Algorithm Jérémy Richard1,∗ , Guillaume Savarit1 , Salah Eddine Boukhetta1 , Cyril Faucher1 , Karell Bertet1 and Christophe Demko1 1 Laboratory L3i, La Rochelle University, La Rochelle, France Abstract The rising number of different kinds of data that can be used to describe a human trajectory (Such as GPS Coordinate, GSM, RFID, RSSI…) put in the spotlight the semantically rich trajectory. A semantic trajectory annotates semantic knowledge directly into raw data based on features of the studied area such as point of interest or weather conditions. One of the challenges of mobility studies nowadays is to find the right data model to shape all those data coming from different source into a framework flexible enough to multiply the contextual data that can be used; where contextual data are knowledge coming from external data source (public city dataset, web pages, national weather services etc….). Such data models are the key component of mobility studies, but oftentimes lose the computational aspect of trajectories. In this paper, we will use the semantically rich trajectory as a way to analyse behavioral data enriched by contextual knowledge as this issue has rarely been addressed in the state of the art. We will study the use of formal concept analysis and pattern mining as a way to compute complex sequential patterns in a dataset of semantic trajectories by using the NextPriorityConcept algorithm. This kind of formal concept analysis allows an interactive analysis between individuals path and contextual data resulting in a hierarchy of spatio-temporal clusters where each cluster contains a specific pattern depicting the trajectories within. 1. Introduction A trajectory represents the path of a moving object in space. Most of raw trajectories are formalised as a sequence of time related coordinates (𝑥, 𝑦) at a time 𝑡, noted < (𝑥𝑖 , 𝑦𝑖 ), 𝑡𝑖 >. The most common type of data used to reconstruct such paths is GPS coordinates. Yet, with the exponential growth of the internet of Things, we collect more and more data from various ways (such as GSM, radio frequency etc….) that are related to a person’s path (or animals, vehicles etc ..). Therefore, making mobility studies more and more complexes taking into account a wide range of data that can be used to reconstitute the movement of an object in space. This question the way we can use and model this information. To address this problem, researchers consider Published in Pablo Cordero, Ondrej Kridlo (Eds.): The 16𝑡ℎ International Conference on Concept Lattices and Their Applications, CLA 2022, Tallinn, Estonia, June 20–22, 2022, Proceedings, pp. 159–173. ∗ Corresponding author. Envelope-Open jeremy.richard2@univ-lr.fr (J. Richard); guillaume.savarit1@univ-lr.fr (G. Savarit); salah.boukhetta@univ-lr.fr (S. Boukhetta); cyril.faucher@univ-lr.fr (C. Faucher); karell.bertet@univ-lr.fr (K. Bertet); christophe.demko@l3i.univ-lr.fr (C. Demko) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) the use of semantic trajectories [1] [2]. A semantic trajectory adds contextual knowledge to a path of a mobile object; where contextual knowledge can be defined as every information bringing a better understanding of an event, person or object. This information brings to trajectories another point of view avoiding to rely only on movement data. Weather condition or events such as protests or all kinds of “background” information can be used to discover trajectory-shaping decisions. This data needs to be added onto those trajectories and are often time shaped as an episode. An episode is a time interval (𝐴, 𝑡, 𝑡) where 𝑡, 𝑡 are timestamps and A a semantic annotation. Using a succession of episodes such as the sequence of “district” crossed by a trajectory can be interpreted as a “discretization” of this one. Nevertheless, not only the “context-aware” trajectory has been hard to analyze, but also the temporal aspects of a semantic trajectory are largely ignored and only a few studies address this problem directly. In this paper, we propose a way to analyse such enhanced trajectories by using the similarities between semantic trajectories and temporal sequences. Recent advances in the field of pattern mining and Formal Concept Analysis (FCA) show promising ways to analyze heterogeneous data coming from multiple datasets simultaneously; method that we are going to apply to such semantic trajectories. Processing complex semantic trajectories using the GALACTIC framework and the NextPriorityConcept[3] algorithm shows that it is possible to extract such mobility patterns. We also can add any type of contextual data during the process enabling an interactive analysis where we can follow the mobility of an object enriched with all kinds of contextual data evolving on the edges of its trajectory. Also, adding contextual knowledge is a mean to avoid the deluge of pattern by focusing on specific behaviors. We propose a method for movement data analysis focusing on the semantic aspect of trajectories alongside interactive analysis with contextual data. As of today, the number of methods for analysing the relationship between a moving object and background information is relatively low. The paper is organized as follows; in section 2 we will present related work both from the field of trajectory modelling and formal concept analysis. Section 3 will explain the library GALACTIC. In section 4 we will explore the computational properties of such a tool with experimentations on heterogeneous data. We are using real life semantic trajectories captured by our team and composed by touristic path in the city of La Rochelle (France) and enriched with contextual knowledge. 2. From trajectories to temporal sequences: Related work 2.1. Trajectory processing : In this section, we will present an overview of existing state-of-the-art models for semantic trajectories. First, we will detail models and solutions to add more contextual knowledge into trajectories. Next, we will study methods and analysis techniques for movement data. Based on a dataset of semantic trajectories, we will describe the existing solutions used to retrieve similar sub-trajectory common behaviors or movement patterns. 2.1.1. The semantic aspect and the knowledge management. Semantic trajectory is defined by Parent et al. 2013 [4] as : “ … a [raw] trajectory that has been enhanced with annotations and/or one or several complementary segmentation.”. The addition of semantic knowledge directly into a raw trajectory enables researchers to better interpret such data. A simple GPS coordinate can become “street A”, and a series of radio frequency from multiple sensors in a system of indoor localization can become “room A”. The definition and the formalisation of the semantic trajectory will differ from a data model to another but the main idea is to provide a framework generic enough to be used in many contexts. To do so, such models are usually structured as a sequence of time related episodes where an episode is a group of positions from a raw trajectory over a time period according to a preset defined by the data-scientist [5]. This preset can be a POI (Place Of Interest), a behavior or any other attribute that the researcher chooses to focuses on in his model which will be shaped as an alphabet. The first model for semantic trajectories is the “stop and move” model [6]. By semantically annotating when an individual stops and moves again, S. Spaccapietra et al. (along L. O Alvares also based on stop and move [7]) built their semantic trajectories as a succession of stop and move. A stop is a non-empty time interval [𝑡, 𝑡] where the travelling object does not move and a move is a non-empty time interval [𝑡, 𝑡] where the travelling object is in motion. In this model, stop and move are episodes. This work made the first step in the field of semantic trajectory inspiring several works such as [8] where a framework for semantic annotation and enhancement is proposed. Shortly after, semantic trajectories began to use ontology-based models such as [9] still based on the first stop and move model. Ontology formalism is then used in those models as a mean to convert the raw trajectory into a high-level semantic representation. The “stop” and “move” model can be seen as a discretization of the speed while a “district” is a discretization of space. Those ontology-based data models for semantic trajectories are only “queryable” and cannot be used in pattern mining processes. As a matter of fact, the analysis of these data models is a major issue in the field of semantic trajectories. 2.1.2. Trajectory analysis. Mobility patterns tend to vectorize raw GPS trajectories such as the trajectory clustering patterns where the goal is to determine common trajectories in order to group similar ones. Using a similarity measure ( [10] where authors proposed a small overview), several studies proposed clustering algorithms for trajectories such as [11] where authors used those metrics to extract common sub-trajectories. Nevertheless, these studies used raw GPS trajectories and similarity measures can be difficult to adapt to other types of trajectories such as indoor semantic trajectories. In a notable uncommon trajectory clustering study very close to our work [12], J. Nyoman et al. experiment the use of sequential pattern mining for analysing visitor path in a museum. By shaping trajectories as sequences, the article combines sequence mining with algorithms such as MFCS ( for “Mining Frequent Contiguous Subsequences” ) and MRGS ( for “Mining Rare General Subsequences” ) with movement data analysis; authors were able to identify four visiting behaviors. The use of pattern mining techniques to process trajectories shows promising results for analysing and clustering movement data with semantic trajectories [13]. These works are closely related to what has been done in sequential pattern analysis. This is an emerging issue in the field of movement data analysis, but still we can cite [14] where authors proposed Splitter, a framework capable of efficiently mines sequential pattern in semantic trajectories with as a classification approach. In [15], C. Huiping et al. proposed to use an Apriori-like algorithm to solve this problem in a dataset of trajectories. Several studies have addressed this problem in the larger field of sequence mining and they will be discussed next section. 2.2. Introduction to FCA, pattern mining and sequence mining: 2.2.1. The basis of pattern mining. A sequence is defined as a succession of elements from an alphabet Σ, often in the form of 𝑠 =< 𝐴𝑖 >𝑖≤𝑛 , where 𝐴𝑖 ∈ Σ. Each element of the dictionary can be text, action, item, protein and so on. The ordering of elements in a sequence is very important and is a key element in this data modelling. This form of representation gained popularity in the 90s, in particular with the work of Agrawal Srikant with the Mining sequential pattern [16]. Given a dataset of customers, they build sequences as a succession of items purchased by an individual and managed to get several lists of items that are frequently bought together through a pattern mining algorithm and more specifically the Apriori algorithm. By using pattern mining techniques, we allow a semantic kind of representation to be computed and the goal is then to find common patterns in our dataset. Sequence mining is a subfield of data mining which aims at finding patterns in a dataset of sequences that appear more frequently. Patterns can be subsequences, prefixes, suffixes, subsequences according to a sliding window [17]. The first generation of algorithms emerged in the 90s with the article of Agrawal and Srikant [16] which extends the well-known Apriori algorithm to sequence mining. All these algorithms take as input a dataset of sequences and a minimum support threshold, and generate the set of frequent subsequences. For big datasets and a short minimum support, these algorithms take a huge computing time and generate a too large number of patterns that are difficult to interpret and contain redundancy. A second generation of algorithms focuses on maximal patterns also know as closed patterns because they verify a well-known property of closure in order to limit the number of patterns extracted - called the deluge of pattern. Many algorithms directly address this problem (CloSpan [18], ClaSP [19]). 2.2.2. The basis of formal concept analysis. Some algorithms also appear within Formal Concept Analysis (FCA) frameworks [20] and their extensions to pattern structures[21], where the lattice property of closed patterns is promoted. We can mention an article for mining medical care trajectories using pattern structures [22]. Pattern concepts are built as maximal sets of individuals with their maximal common subse- quences, the whole set of concepts is equipped with a specialization/generalization relation from a partially ordered set with the lattice property. This lattice represents the initial data where concepts are clusters of “similar” sequences, and the concept lattice is a hierarchy of clusters (regrouping objects with their associated common patterns). The use of the Formal Concept Analysis while working with semantic sequences is a promising way to find common patterns of behavior [23]. 2.2.3. Temporal sequence mining. To the best of our knowledge, Yoshida et al. [24] where the firsts introduce the notion of temporal pattern mining, called “Delta patterns” where a pattern (𝑎, [0, 3], 𝑏) is a sequential pattern (𝑎, 𝑏) that frequently occurs in the dataset and has a transition time from 𝑎 to 𝑏 of [0, 3], a time interval. This transition time between two elements has known a notable extension in work such as [25]. In [26], researchers focus on mining chronicles, where a chronicle is a couple (ℰ , 𝒯 ) with ℰ is a set of temporal events and 𝒯 a set of temporal constraints on the set ℰ. A temporal constraint noted 𝑒1 [𝑡 − , 𝑡 + ]𝑒2 represent the time gap between two events 𝑒1 and 𝑒2 . Thus, a parallel can be made between semantic trajectories and temporal sequences. By extracting the semantic annotations and the time information of all episodes of the semantic trajectories, we end up with a sequence of time intervals denoted by 𝑆 =< 𝐴𝑖 , 𝑡𝑖 , 𝑡𝑖 >. As defined in “Extracting temporal patterns from Interval-Based Sequences” [27], T. Guyet and R. Quiniou defined a way of representing temporal sequences “A temporal sequence S is an ordered set of events, where an event 𝒜 = (𝐴, [𝑙, 𝑢]) is composed of a symbol A and a nonempty interval [𝑙, 𝑢], where l and u are dates.” Based on this definition, semantic annotations of episodes can then be considered as temporal events, making the whole semantic trajectory a temporal sequence. The goal of temporal pattern mining is then to find sequential patterns from a dataset of temporal sequences, such as ({𝑎, 𝑏}, [1, 5]), where the event a and b are sharing a common time interval. Several temporal pattern mining algorithms showed good results in extracting sequential patterns, such as NeGPSan [28] to extract negative sequential patterns. Nevertheless, these approaches cause a deluge of patterns and remain dedicated to a sequence dataset on a same alphabet. 3. Description of the NextPriorityConcept algorithm The NextPriorityConcept algorithm [3] computes concepts from complex and heterogeneous data for a set of objects 𝐺. We first introduce the notion description 𝛿, which is an application to provide the smallest set of predicates describing a set of objects 𝐴 ⊆ 𝐺, based on their character- istics. A characteristic describes our objects, such as numerical, discrete, semantic, temporal etc … A concept (𝐴, 𝛿(𝐴)) is composed of a subset of A and a set of predicates 𝛿(𝐴) describing the objects it contains. An example of predicate can be “less than c” or “match subsequence s” where c is the max of the numerical values and s is a maximal common subsequence of the sequence . Depending on the data type, generation algorithms will differ. Each predicate is specific to one type of characteristics. The final description 𝛿 is the union of all those predicates. 𝛿(𝐴) = ⋃ 𝛿𝑖 (𝐴) (1) 𝑖∈1…𝑛 Where (𝛿𝑖 (𝐴))𝑖∈1…𝑛 is a family of descriptions. This is the exact principle that allows us to perform heterogeneous analyses. The generation of a lattice with NextPriorityConcept is inspired by Bordat’s algorithm [29]. It recursively computes the immediate successors of a concept, starting at the bottom concept.The NextPriorityConcept focuses on objects G and starts the computation at the top concept, (𝐺, 𝛿(𝐺)) containing the whole set of objects 𝐺 and their common description by predicates 𝛿(𝐺) until no more concepts can be generated. These concepts are computed on demand and do not need any kind of preprocessing. In order to generate the immediate predecessors of a concept (𝐴, 𝛿(𝐴)), the algorithm intro- duces strategies 𝜎 that select predecessors of such a concept based on each characteristic. The strategy refines the description 𝛿(𝐴) to a reduced set 𝐴′ ⊂ 𝐴. We call cardinality the number of individuals within 𝐴′ . 𝜎 is an application such that 𝜎 ∶ 2𝐺 → 2𝑃 where 𝑃 is the set of all possible predicates. Several strategies are available to generate predecessors of a concept such as the naive strategy (considering all possible predecessors of a concept), or strategies reducing the number of predecessors. The use of predicates regardless of the data allows a generic implementation of algorithms which mixed with a system of plugins, enables for an easy integration of new data types (de- scription and strategies), hence, taking into account a wide range of data types. GALACTIC1 (GAlois LAttices, Concept Theory, Implicational systems and Closures) is a development plat- form for our algorithm allowing easy integration of new plugins for characteristics, descriptions and strategies. NextPriorityConcept maintains the lattice structure using queue for a generation level by level, and a mechanism of propagation of constraints to ensure the meet and join will be generated. With GALACTIC and the NextPriorityConcept algorithm, we contribute to the field of temporal sequences by creating two descriptions 𝛿 and a set of strategies 𝜎 where we consider temporal sequences [30] [31] as: • A temporal sequence such that 𝑆 = (⟨𝑎, 𝑡⟩𝑖 )𝑖∈1…𝑛 with 𝑎 ∈ Σ is an item from a dictionary Σ and t is a timestamp. Then, we compute the distance (either transition time or duration) between two items of S. • A temporal sequence such that 𝑆 = (⟨𝑎, [𝑡𝑠 , 𝑡𝑒 ]⟩𝑖 )𝑖∈1…𝑛 where 𝑎 ∈ Σ is an item from a dictionary Σ and [𝑡𝑠 , 𝑡𝑒 ] the interval of time during the element a occurs. We will then compute common sub-sequences of items during a maximal similar interval of time. 4. Experimentations 4.1. The Geoluciole dataset The Geoluciole dataset is composed of GPS trajectories in the city of La Rochelle France. This dataset was collected during the summer of 2019, from tourists visiting the city during their vacation. It contains 192 trajectories of dif- ferent individuals. Their positions were obtained by an application we developed, giving their position every 30 seconds during their stay. Before activating their applica- Figure 1: Segmentation of the city of La Rochelle tion, we also had them fill out a survey, giving contextual into district 1 https://galactic.univ-lr.fr knowledge, such as the staying status or the arrival means. By matching the GPS coordinates to districts of the city such as shown in figure 1, we can then transform these raw data into semantic trajectories. The segmentation of the trajectory into a sequence of districts has been done by using a GIS (Geographic Information Systems). We can also add all kinds of knowledge we have webscrapped, such as the beach they were and parks which is spatial information and weather and tide information which is more of a temporal information (figure 2). This data representation is very similar to the one we can find in [32] a “Multi-Level and Multiple Aspect Semantic Trajectory Model”. 4.2. The semantic trajectory model In Table 2, each “aspect” (or side) of those semantic trajectories keeps the specificity of the data within. The district, weather, tide, green space (parks) and beach are sequences of temporal intervals. In each “temporal” aspect of a sequence, the temporal information represents the hour of the day. While the district, green space and beach depends on the localisation of the user between 𝑡𝑘 and 𝑡𝑘 , the weather and tide depends on the temporal information and are issued from webscrapping. Figure 2: Semantic trajectory of a tourist in La Rochelle We formalise the semantic trajectory model for a trajectory 𝑇 as follows : 𝑇 ∶ ⟨𝛼1 , 𝛼2 … 𝛼𝑚 ⟩ where 𝑇 is described by 𝑚 aspects and each aspects 𝛼𝑖 , with 𝑖 ≤ 𝑚 can either be a temporal sequence 𝛼𝑖 = (⟨𝑎𝑘 , 𝑡𝑘 , 𝑡𝑘 ⟩)𝑘≤𝑛 , on an alphabet Σ𝑖 and 𝑎𝑘 ∈ Σ𝑖 , a semantic value 𝛼𝑖 = 𝑋 on an alphabet Σ𝑖 and 𝑋 ∈ Σ𝑖 , or a numeric value 𝛼𝑖 = 𝑥. Our dataset of trajectories 𝒟 is given by 𝑇𝑗 ∶ ⟨𝛼1𝑗 , 𝛼2𝑗 … 𝛼𝑚𝑗 ⟩. The description space used with GALACTIC for any trajectory 𝑇𝑗 ∈ 𝒟 is a set of predicate defined by : 𝛿(𝑇𝑗 ) ∶ ⟨𝛿1 (𝛼1𝑗 ), 𝛿2 (𝛼2𝑗 ) … 𝛿𝑚 (𝛼𝑚𝑗 )⟩ (2) 4.3. Experiment description In this section, we use GALACTIC to extract concepts from the Geoluciole dataset. First, we will study the impact of a heterogeneous analysis on the number of concepts (ie. clusters) generated. Then, we will focus on specific parts of the dataset with selected semantic information in order to better define touristic behaviors. Table 1 shows the descriptions and strategies used during this experimentation. Table 2 show a description of each dataset. 𝛿 𝜎 Temporal Sequence maximal common interval minimal cardinality Chain chain matching complete match Numeric simple numerical quantile Table 1 Descriptions and strategies associated with datatype NextPriorityConcept uses descriptions 𝛿 to define a set of predicates describing the at- tributes. As predicates, the descriptions can then be seen as binary table. We also makes use of strategies to choose next predecessors at each iteration. In this paper, we will use the following descriptions and strategies : • Numeric : We describe a set 𝐴 = {𝛼1 , ..𝛼𝑘 } of numerical values by the “simple numerical” description 𝛿𝑆 : 𝛿𝑆 (𝐴) = {is greather than min(A), is lesser than max(A)} The strategy is the quantile description 𝜎𝑄 (𝐴, 𝑘) where k is the number of quantiles: 𝜎 (𝐴) = { is greater than 𝑞𝑗 , is lesser than 𝑞𝑗 : 𝑞𝑗 is a k-quantile } (3) • Chain : For a set 𝐴 = {< 𝑠𝑖 >} of sequences, we use the “chain matching” description 𝛿𝐶𝑀 (𝐴, 𝑘) that compute the maximal number of subsequences of size k, with ⊆𝑠 is the subsequence relation: 𝛿𝐶𝑀 (𝐴, 𝑘) = {𝑥 ∈ Σ∗ ∶ ∀𝑎 ∈ 𝐴, 𝑥 ⊑𝑠 𝑎 and |𝑥| = 𝑘} (4) We use the “complete match” strategy 𝜎𝐶𝑀 that computes all the possibles subsets 𝐴′ ⊂ 𝐴: 𝜎𝐶𝑀 (𝐴) = {𝑥 ∶ 𝑥 ∈ 𝛿𝐶𝑀 (𝐴′ ), 𝐴′ = 𝐴\{𝑎}, for all 𝑎 ∈ 𝐴} (5) • Temporal sequence : We consider a set of temporal sequences 𝐴 = {< 𝑋𝑖 , 𝑇𝑖 >} where 𝑇𝑖 = 𝑡𝑖 , 𝑡𝑖 is an interval and 𝑋𝑖 an itemset. For a sequence 𝑎 ∈ 𝐴, the projection 𝜙𝑇 (𝑎) selects all the itemsets of 𝑎 included in the interval. We use the “maximal common interval” description that computes the set of maximal common sub-intervals: 𝛿𝑀𝐶𝑇 𝐹 (𝐴) = {⟨(𝑇 , 𝑋 )⟩ ∶ ∀𝑎 ∈ 𝐴, 𝑋 ⊆ Φ𝑇 (𝑎)} (6) The “minimal cardinality” strategy 𝜎𝐴 𝑀𝐶(𝐴) adds element of minimal cardinality to the subsequences of the description: 𝜎𝐴𝑀𝐶 (𝐴) = {⟨(𝑇 , 𝑋 )⟩ ∶ ∀𝑎 ∈ 𝐴, 𝜙𝑇 (𝑎) ⊆ 𝑋 and ∀𝑥 ∈ 𝑋 (7) 𝑛𝑏(𝐴, 𝑇 , 𝑥) = |𝐴| and 𝑛𝑏(𝐴, 𝑇 , 𝑋 ) = 𝑛𝑏𝑚𝑖𝑛 (𝐴, 𝑇 )} (8) Aspects Type Dictionnary Source district Temporal sequence Σ = set of city district Trajectory weather Temporal sequence Σ = {‶ 𝑟𝑎𝑖𝑛𝑖𝑛𝑔",‶ 𝑚𝑜𝑑𝑒𝑟𝑎𝑡𝑒 𝑟𝑎𝑖𝑛𝑖𝑛𝑔",‶ 𝑐𝑙𝑜𝑢𝑑𝑦",‶ 𝑠𝑢𝑛𝑛𝑦"} Web scrapping beach Temporal sequence Σ = {‶ 𝑛𝑜 beach",‶ 𝑃𝑙𝑎𝑔𝑒 𝑑𝑒 𝑙𝑎 𝐶𝑜𝑛𝑐𝑢𝑟𝑟𝑒𝑛𝑐𝑒",‶ 𝑃𝑙𝑎𝑔𝑒 𝑑𝑒𝑠 𝑀𝑖𝑛𝑖𝑚𝑒𝑠"} Trajectory green space Temporal sequence Σ = set of city parks Trajectory tide Temporal sequence Σ = {‶ ℎ𝑖𝑔ℎ tide",‶ 𝑙𝑜𝑤 tide"} Web scrapping average temperature Numeric Web scrapping stay status Chain Σ = {‶ 𝑤𝑖𝑡ℎ 𝑓 𝑎𝑚𝑖𝑙𝑦",‶ 𝑤𝑖𝑡ℎ 𝑓 𝑟𝑖𝑒𝑛𝑑𝑠",‶ 𝑐𝑜𝑢𝑝𝑙𝑒",‶ 𝑎𝑙𝑜𝑛𝑒"} questionnaire arrival means Chain Σ = {‶ 𝑡𝑟𝑎𝑖𝑛",‶ 𝑐𝑎𝑟",‶ 𝑏𝑢𝑠",‶ 𝑣𝑒𝑙𝑜"} questionnaire Table 2 Dataset heterogeneous aspects’ description and source 4.4. Impact of heterogeneous data on the patterns Running a heterogeneous analysis with multiple description and strategy types can reduce the number of concepts in the final lattice. While a “naive” strategy selects all predicates within a description 𝛿, such as 𝜎𝐶𝑀 , other strategies such as 𝜎𝐴𝑀𝐶 will select specific predicates and skip other to refine the analysis. Example: Table 3 shows a dataset 𝐷 of five individuals, characterized by a numeric value and a sequence of intervals. In this example, we will be using the simple numerical description and quantile strategy for the numerical aspects and the maximal common interval description alongside the minimal cardinality strategy for the sequence of intervals. Individual Numeric (𝑥) Interval a 1 (8.3, 11): “P”, (13, 15): “M”, “H” b 2 (10, 12): “P”, ”H”, (14, 16): “M” c 3 (8.3, 12): “P”, (14, 16): “M”, “C” d 4 (7, 9): “P”, “H”, (12, 13): “M” e 5 (10, 11): “P”, (12, 12): “M”, “C” Table 3 Dataset 𝐷 Let 𝐿 be the lattice shown in Figure 3 as the result of a numeric description and quantile strategy, taking into account only the numeric characteristics. By adding an interval description and strategy, it results in the lattice 𝐿′ with a fewer number of concepts. Figure 4 shows that concept $1 (𝑥 ≤ 4) and $2 (𝑥 ≥ 2) from 𝐿 do not appear in the lattice 𝐿′ as they have not been selected by the minimal cardinality strategy. The concepts contained in 𝐿′ are the most representative of the two description spaces. 4.4.1. Naive analysis using time interval. first, we will only consider districts since they are a geographical segmentation of trajectories into districts of the city. By doing so, we obtain 416 concepts with a low cardinality value. The huge number of concepts generated involving only one or two trajectories is an indicator that there are no significant places where multiple individuals were at a same time of the day. To sharpen our research, we then choosed to add other aspects of semantic trajectories in order to discover new significant behaviors depending on external factors. The next section will present $0: #5 $0: #5 $1: #4 $2: #4 x<=4 x>=2 $2: #3 $1: #3 s match ["[8.3;9):['P']"] x<=3 s match ["[10;11):['P']", "[14;15):[ ... $3: #3 $5: #3 $4: #3 x<=3 x>=3 $3: #2 $6: #2 $5: #2 $4: #2 x>=3 s match ["[10;12):['P']", "[14;16):[ ... x<=2 $6: #2 $9: #2 $7: #2 $8: #2 x<=2 x>=4 $7: #1 $8: #1 $9: #1 $10: #1 s match ["[7;9):['H', 'P']", "[12;13 ... ['c'] ['a'] ['b'] ['d'] $10: #1 $14: #1 $11: #1 $12: #1 $13: #1 x<=1 x>=5 ['b'] ['c'] ['d'] ['a'] ['e'] $11: #0 $15: #0 Figure 4: Lattice L’ by adding interval Figure 3: Lattice L with quantile strategy description and strategy experimentations by taking into account multiple aspects of semantic trajectories, such as the weather or the staying status. With only the weather and the same approach, we obtain 379 concepts. 4.4.2. Analysis with heterogeneous data 400 350 300 NUMBERS OF CONCEPT 250 200 150 100 50 0 WEATHER DISTRICT TIDE AVG BEACH ARRIVAL STAY TEMPERATURE MEANS STATUS Figure 5: Number of concept by attributes successively added In this experiment, we exploit the ability of GALACTIC to deal with heterogeneous data by adding successively the other attributes to the “weather” attributes (the ones that generate the largest number of concepts). Hence it is possible to mix several sequences (such as shown in Equation 1 and 2) together with other data in a heterogeneous analysis. With nearly all attributes added successively, we diminish the number of concepts to only 36. Figure 5 gives the number of concepts obtained for each added attributes that decreases from 379 (only the weather) to 36 (with all our attributes). The table 6 in appendix gives the descriptions by common predicates of three concepts (concept number 0 is the top concept for all attributes). We can observe that the addition of more knowledge gives a more precise description of concepts, and a reduction of the number of patterns. 4.5. Focus on some particular behaviours when reducing pattern One way for the data analyst to analyse the dataset is to focus on particular behaviors, where a behavior can be a subset of objects (people going to the beach) or a part of the description (an action during the night). 4.5.1. The detection of beach trip related to the weather. In order to focus on visitors going to the beach, we choose to analyse beach with the weather that is semantically meaningful for this experiment. The dataset is composed of 108 trajectories, and generates 93 concepts as a result of the formal concept analysis. The Table 7 (found in Appendix) shows some concepts containing valuable information, such as a strong correlation between “raining” and “no beach”, meaning that people won’t go to the beach if it’s raining . In the dataset, 31 trajectories contain rain. On that part of the trajectory we can see that 8 of them can be described with a predicate which matchs “raining” between 11 and 12 am and “no beach”. Other predicates are also generated which also describe that bad weather has an impact on a beach trip and this predicate seems to focus on bad weather in the morning. 4.5.2. The detection of sleeping habits during touristic stay. With the time information, it is also possible to analyse any part of the day. One possible user driven analysis could be to see if sleeping habits are dependent on the staying status of individuals. To do so, we will make an analysis with two attributes, the staying status, which is a chain and locations which are temporal. With 138 trajectories, we have in total 340 concepts, and 108 for the sleeping period of time (between 22 and 10). Some of the results can be seen in Table 8 (see Appendix). As we can see, predicates are generated and are supported by nearly 10% of “family” trajectories and more than 10% of “with friends” trajectories to locations subsequences, which by comparing it with other predicates of supersequence seem to represent sleeping places (no movement during a long period of time between 0 and 10 am). 5. Conclusion and future works In this paper, we propose a way to process heterogeneous data using the NextPriorityConcept algorithm alongside the GALACTIC platform. In the experiments, we illustrate the advantage of mixing heterogeneous data while keeping their semantic and readability with two examples. First, by reducing the number of patterns by adding contextual knowledge onto trajectories and afterwards by focusing on specific aspects of the data to detect and identify distinctive behaviors. The specificity of this approach is to keep the raw data within and avoid any kind vectorization technique. Also, it allow us to enrich the dataset with semantic information from the space and/or temporal knowledge directly from the trajectory (such as districts in this paper), surveys filled out by individuals (staying status, arrival means) or even web scrapping (weather, tide). The plugins system of GALACTIC allows to easily integrate descriptions and strategies for heterogeneous data and offering a wide range of data types to work with. The NextPriorityConcept and its ability to run interactive, heterogeneous analysis is a big step in data science and offers new ways to deal with complex data structures such as semantic trajectories. For future works, we would like to measure the quality of generated concepts to better represent and sort the relevance of their predicates (such as the stability measure). These results should become available very soon. Also, we are currently working on an incremental tool in order to reinforce the interactivity with a user driven tool where the data-analyst can select data to analyse it in an interactive way. By doing so, we allow the data scientist to select which side of the data he wants to explore, because only he knows best the semantic of the data he manipulates. In particular, it gives the possibility to change the strategy or description during every step of the process (where new concepts are generated), in an intuitive and simple way. For example weather and tide information are not interesting during the night. We wish that by giving the opportunity to the data-scientist to interactively explore the data in a user-driven approach, we will be able to optimize data mining processes. References [1] A. Kontarinis, K. Zeitouni, C. Marinica, D. Vodislav, D. Kotzinos, Towards a Semantic Indoor Trajectory Model, in: 2nd International Workshop on ”Big Mobility Data Analytics” (BMDA) with EDBT 2019, Lisbon, Portugal, 2019. URL: https://hal.archives-ouvertes.fr/ hal-02314572. [2] S. Ilarri, D. Stojanovic, C. Ray, Semantic management of moving objects: A vision towards smart mobility, Expert Systems with Applications 42 (2015) 1418 – 1435. doi:10.1016/ j.eswa.2014.08.057. [3] C. Demko, K. Bertet, C. Faucher, J. Viaud, S. O. Kuznetsov, Nextpriorityconcept: A new and generic algorithm computing concepts from complex and heterogeneous data, Theoretical Computer Science 845 (2020) 1–20. [4] C. Parent, S. Spaccapietra, Semantic trajectories modeling and analysis, ACM Comput. Surv. 45 (2013) 1–32. doi:10.1145/2501654.2501656. [5] D. Mountain, J. Raper, Modelling human spatio-temporal behaviour: A challenge for location-based services, 2001. [6] S. Spaccapietra, C. Parent, M. L. Damiani, J. A. de Macedo, F. Porto, C. Vangenot, A conceptual view on trajectories, Data Knowledge Engineering 65 (2008) 126–146. [7] L. O. Alvares, V. Bogorny, B. Kuijpers, J. A. F. de Macedo, B. Moelans, A. Vaisman, A model for enriching trajectories with semantic geographical information, Proc. of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems. ser. GIS ’07. ACM, 2007 (2007) 22:1–22:8. [8] V. Bogorny, C. Renso, A. R. de Aquino, F. de Lucca Siqueira, L. O. Alvares, Constant – a conceptual data model for semantic trajectories of moving objects, Transactions in GIS 18 (2014) 66–88. doi:10.1111/tgis.12011. [9] Z. Yan, J. Macedo, C. Parent, S. Spaccapietra, Trajectory ontologies and queries, Transac- tions in GIS 12 (2008) 75–91. doi:10.1111/j.1467-9671.2008.01137.x. [10] K. Toohey, M. Duckham, Trajectory similarity measures, Sigspatial Special 7 (2015) 43–50. [11] S. J. Camargo, A. W. Robertson, S. J. Gaffney, P. Smyth, M. Ghil, Cluster analysis of typhoon tracks. part i: General properties, Journal of Climate 20 (2007) 3635–3653. [12] N. Juniarta, M. Couceiro, A. Napoli, C. Raïssi, Sequential pattern mining using fca and pattern structures for analyzing visitor trajectories in a museum, in: CLA 2018-The 14th International Conference on Concept Lattices and Their Applications, 2018. [13] Y. Chen, P. Yuan, M. Qiu, D. Pi, An indoor trajectory frequent pattern mining algorithm based on vague grid sequence, Expert Systems with Applications 118 (2019) 614–624. [14] C. Zhang, J. Han, L. Shou, J. Lu, T. La Porta, Splitter: Mining fine-grained sequential patterns in semantic trajectories, Proceedings of the VLDB Endowment 7 (2014) 769–780. [15] H. Cao, N. Mamoulis, D. W. Cheung, Mining frequent spatio-temporal sequential patterns, in: Fifth IEEE International Conference on Data Mining (ICDM’05), IEEE, 2005, pp. 8–pp. [16] R. Agrawal, R. Srikant, Mining sequential patterns, in: Proceedings of the Eleventh International Conference on Data Engineering, 1995, pp. 3–14. [17] H. Mannila, H. Toivonen, A. Inkeri V., Discovery of frequent episodes in event sequences, Data mining and knowledge discovery 1 (1997) 259–289. [18] X. Yan, J. Han, R. Afshar, Clospan: Mining: Closed sequential patterns in large datasets, in: Proceedings of the 2003 SIAM international conference on data mining, SIAM, 2003, pp. 166–177. [19] A. Gomariz, M. Campos, R. Marin, B. Goethals, Clasp: An efficient algorithm for mining frequent closed sequences, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2013, pp. 50–61. [20] B. Ganter, Attribute exploration with background knowledge, Theoretical Computer Science 217 (1999) 215–234. [21] B. Ganter, S. O. Kuznetsov, Pattern structures and their projections, in: International conference on conceptual structures, Springer, 2001, pp. 129–142. [22] A. Buzmakov, E. Egho, N. Jay, S. O. Kuznetsov, A. Napoli, C. Raïssi, On projections of sequential pattern structures (with an application on care trajectories), in: The Tenth International Conference on Concept Lattices and Their Applications-CLA’13, 2013. [23] N. Juniarta, M. Couceiro, A. Napoli, C. Raïssi, Sequential Pattern Mining using FCA and Pattern Structures for Analyzing Visitor Trajectories in a Museum, in: CLA 2018 - The 14th International Conference on Concept Lattices and Their Applications, Olomouc, Czech Republic, 2018. URL: https://hal.inria.fr/hal-01887914. [24] M. Yoshida, T. Iizuka, H. Shiohara, M. Ishiguro, Mining sequential patterns including time intervals, in: Data Mining and Knowledge Discovery: Theory, Tools, and Technology II, volume 4057, SPIE, 2000, pp. 213–220. [25] S. Yen, Y. Lee, Mining non-redundant time-gap sequential patterns, Applied Intelligence 39 (2013) 727–738. [26] D. Cram, B. Mathern, A. Mille, A complete chronicle discovery approach: application to activity analysis, Expert Systems 29 (2012) 321–346. [27] T. Guyet, R. Quiniou, Extracting temporal patterns from interval-based sequences, in: International Join Conference on Artificial Intelligence (IJCAI), Barcelone, Spain, 2011. URL: https://hal.inria.fr/inria-00618444. [28] T. Guyet, R. Quiniou, NeGPSpan: efficient extraction of negative sequential patterns with embedding constraints, Data Mining and Knowledge Discovery 34 (2020) 563–609. [29] J. Bordat, Calcul pratique du treillis de galois d’une correspondance, Mathématiques et Sciences humaines 96 (1986) 31–47. [30] S. E. Boukhetta, C. Demko, K. Bertet, J. Richard, C. Cayèré, Temporal sequence mining using fca and galactic, in: International Conference on Conceptual Structures, Springer, 2021, pp. 185–199. [31] S. E. Boukhetta, J. Richard, C. Demko, K. Bertet, Interval-based sequence mining using fca and the nextpriorityconcept algorithm., in: FCA4AI@ ECAI, 2020, pp. 91–102. [32] C. Cayèré, C. Sallaberry, C. Faucher, M. Bessagnet, P. Roose, M. Masson, J. Richard, Multi- level and multiple aspect semantic trajectory model: Application to the tourism domain, ISPRS International Journal of Geo-Information 10 (2021) 592. Appendix Concepts Patterns (common predicates) Individuals 0 ∅ All temperature = 20.75 stay status: ’family’ arrival mean: ’personnal car’ district Dor weather cloudy 1 tide 2 low tide Square Valin green space Parc Charruyer beach No beach 12 13 14 15 16 17 18 19 20 t 25.0417 ≥ temperature ≥ 20.75 stay status: ’family’ district Dor weather cloudy 2 tide 2 low tide green space Parc Charruyer beach No beach 12 13 14 15 16 17 18 t Figure 6: Examples of heterogeneous predicates Patterns (common predicates) Support Support “raining” Individuals weather match [“[11;12):[’raining’]”] 0.074 0.260 8 beach match [“[0;24):[’No beach’]”] weather match [“[1;2):[’raining’]”] 0.037 0.130 4 beach match [“[0;24):[’No beach’]”] weather match [“[8;9):[’raining’]”] 0.028 0.100 3 beach match [“[0;24):[’No beach’]”] weather match [“[1;4):[’raining’]”] 0.019 0.065 2 beach match [“[0;24):[’No beach’]”] Figure 7: Sample of predicates showing the impact of weather on beach trip Patterns (common predicates) Support Support by status Individuals ’stay status’ match (’family’,) district 0.060 0.098 8 Oratoire 9 10 t ’stay status’ match (’family’,) district 0.043 0.073 6 Oratoire Oratoire 3 4 5 6 7 8 9 10 t ’stay status’ match (’family’,) district 0.036 0.061 5 Oratoire Oratoire 7 8 9 10 t ’stay status’ match (’family’,) district 0.036 0.061 5 Oratoire 9 10 11 12 13 t ’stay status’ match (’family’,) district 0.029 0.049 4 Oratoire Oratoire 1 2 3 4 5 6 7 8 9 10 t ’stay status’ match (’with friends’,) district 0.049 0.108 4 Fétilly 7 8 9 t 1 Figure 8: Sleeping habits by status 1