Spatio-Temporal Data Mining: From Big Data to Patterns

Maguelonne Teisseire
UMR TETIS (Cirad, Irstea, AgroParisTech, CNRS), France
maguelonne.teisseire@irstea.fr
Web: www.textmining.biz

Abstract

Technological advances in data acquisition make it possible to better monitor dynamic phenomena in various domains, including the environment. The collected data is increasingly complex: spatial, temporal, heterogeneous and multi-scale. Exploiting this data requires new data analysis and knowledge discovery methods. In that context, approaches aimed at discovering spatio-temporal patterns are particularly relevant. This paper¹ focuses on spatio-temporal data and associated data mining methods.

¹ The content of the paper was prepared in collaboration with H. Alatrista Salas, S. Bringay, F. Flouvat, and N. Selmaoui.

1 Spatio-temporal Data

In recent years, technological advances in data acquisition (satellite images, sensors, etc.) have enabled numerous applications in surveillance and environmental monitoring: detection of abrupt changes (natural disasters, etc.), evolution tracking of natural phenomena (coastal erosion, desertification, wildfires, etc.) or development of models (hydrology, agriculture, etc.). The collected data is usually heterogeneous, multiscale, spatial and temporal (time series of satellite images, aerial or terrestrial photos, digital terrain models, physical ground measurements, qualitative observations, etc.). This data is used to understand and predict phenomena generated by processes that are complex and of multidisciplinary origin (climatic, geological, etc.). For experts to exploit these huge volumes of complex data (big data), the data must be structured as well as possible and, above all, suitable data analysis and knowledge discovery methods must be designed. In that context, approaches involving pattern mining are particularly relevant.

With the dramatic growth of spatial information and Geographic Information Systems (GIS), many studies have been carried out on spatiotemporal pattern mining. Early work in this area dealt with the spatial and temporal dimensions separately. Extraction of temporal sequences aims at identifying features that are frequent over time without taking spatial relationships into account. Colocation mining methods extract sets of features that frequently appear in nearby objects without taking the temporal aspect into account. More recently, these works have been extended to integrate the spatial and temporal dimensions simultaneously. Examples include the detection of sequences of located events and trajectory mining. A review has been published by the GeoPKDD consortium (Giannotti and Pedreschi, 2008). However, in those approaches, the mined patterns do not match the spatial complexity encountered when dealing with satellite images. Similarly, the primitive constraints usually used (typically minimum frequency) are not sufficient to express the criteria of interest for experts, such as geologists.

A spatiotemporal database contains information characterized by a spatial dimension and a temporal dimension. Two types of spatiotemporal databases are mainly considered: databases containing trajectories of moving objects located in both space and time (e.g. bird or aircraft trajectories), and databases storing the spatial and temporal dynamics of events (e.g. erosion evolution in a region or epidemic spread in a city).

2 Mining moving object trajectories

The emergence of new mobile technologies has facilitated the collection of large amounts of spatiotemporal data dedicated to the localization of mobile objects in space and time (Perera et al., 2015). These new databases provide opportunities for new applications. The GeoPKDD project (Giannotti and Pedreschi, 2008), for example, studied the development of traffic planning in large cities according to vehicle flows. Other application domains include socio-economic geography, sports (e.g. football players), fishing control and weather forecasting (e.g. hurricanes). In most of these applications, the number of paths is high. One of the objectives of trajectory analysis is to find the most relevant paths for the targeted problem (e.g. the most frequent, the most unexpected, periodic, etc.). Several approaches have recently been proposed in the literature, for instance (Orakzai et al., 2015).

3 Spatial patterns and spatiotemporal patterns for located event mining

The extraction of spatial and spatiotemporal patterns from geographic data and GIS has been studied extensively in recent years. There are two families of approaches: colocations (Shekhar and Huang, 2001), which identify events that are frequently close to each other; and spatiotemporal patterns, which identify the evolution of events in both space and time (Alatrista-Salas et al., 2016). Sequences and, more generally, graphs have often been used and extended to the spatiotemporal context in order to represent the propagation of phenomena in space and time. Colocations focus on objects and their spatial relationships, for instance (Shekhar and Huang, 2001; Celik et al., 2008).

4 Conclusion

The challenges associated with spatial and spatio-temporal databases are numerous. Firstly, the semantics of the extracted patterns must be considered in order to present experts with patterns that actually meet their application needs. Patterns with more complex structures, such as attributed graphs, can be really effective in spatial databases, as shown by Pasquier's promising work (Pasquier et al., 1998; Sanhes et al., 2013). In addition, spatio-temporal data mining methods often generate a large number of patterns, sometimes exceeding the size of the original data. It is therefore important to define measures of interest that enable experts to select the most relevant patterns. As highlighted in the method based on colocations, it is also necessary to include domain knowledge (e.g. metadata, semantic descriptions, ontologies, etc.) in the extraction process to improve scalability as well as the quality of the extracted patterns and their interpretation. Defining relevant visualizations for those patterns would further facilitate their interpretation. Many application areas remain to be explored, for example image mining, where large amounts of data are available but few effective and scalable methods have been developed so far. Finally, there is a real need for collaboration between domain experts and data mining experts. Collaboration is the key to success for the knowledge extraction process.

References

Hugo Alatrista-Salas, Sandra Bringay, Frédéric Flouvat, Nazha Selmaoui-Folcher, and Maguelonne Teisseire. 2016. Spatio-sequential patterns mining: Beyond the boundaries. Intell. Data Anal., 20(2):293–316.

Mete Celik, Shashi Shekhar, James P. Rogers, and James A. Shine. 2008. Mixed-drove spatiotemporal co-occurrence pattern mining. IEEE TKDE, 20(10):1322–1335.

Fosca Giannotti and Dino Pedreschi, editors. 2008. Mobility, Data Mining and Privacy - Geographic Knowledge Discovery. Springer.

Christian S. Jensen, Markus Schneider, Bernhard Seeger, and Vassilis J. Tsotras, editors. 2001. Advances in Spatial and Temporal Databases, 7th International Symposium, SSTD 2001, Redondo Beach, CA, USA, July 12-15, 2001, Proceedings, volume 2121 of Lecture Notes in Computer Science. Springer.

Faisal Orakzai, Thomas Devogele, and Toon Calders. 2015. Towards distributed convoy pattern mining. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Bellevue, WA, USA, November 3-6, 2015, pages 50:1–50:4.

Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal. 1998. Pruning closed itemset lattices for association rules. In BDA'98.

Kushani Perera, Tanusri Bhattacharya, Lars Kulik, and James Bailey. 2015. Trajectory inference for mobile devices using connected cell towers. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Bellevue, WA, USA, November 3-6, 2015, pages 23:1–23:10.

Jérémy Sanhes, Frédéric Flouvat, Claude Pasquier, Nazha Selmaoui-Folcher, and Jean-François Boulicaut. 2013. Weighted path as a condensed pattern in a single attributed DAG. In IJCAI 2013, Beijing, China, August 3-9, 2013, pages 1642–1648.

Shashi Shekhar and Yan Huang. 2001. Discovering spatial co-location patterns: A summary of results. In Jensen et al. (2001), pages 236–256.