=Paper= {{Paper |id=Vol-1641/paper9 |storemode=property |title=User-Centered Event Data Modelling and Analytics |pdfUrl=https://ceur-ws.org/Vol-1641/paper9.pdf |volume=Vol-1641 |authors=Stefano Valtolina,Marco Mesiti,Luca Ferrari,Koji Zettsu,Minh S. Dao |dblpUrl=https://dblp.org/rec/conf/iseud/ValtolinaMFZD15 }} ==User-Centered Event Data Modelling and Analytics== https://ceur-ws.org/Vol-1641/paper9.pdf
                 User-Centered Event Data Modelling and Analytics

                 Stefano Valtolina, Marco Mesiti and Luca Ferrari
                 Department of Computer Science, Università degli Studi di Milano, Italy
                 {valtolin, mesiti, ferrari}@di.unimi.it

                 Koji Zettsu and Minh S. Dao
                 Universal Communication Research Institute, NICT Kyoto, Japan
                 {zettsu,dao.minhson}@nict.go.jp



                       Abstract. Conventional data analytics platforms are not adequate for managing emergency situations. The 3Vs that usually characterize big data (volume, variety, velocity), along with the issue of integrating information coming from heterogeneous networks, require the development of new systems. In this paper we present the design of a data analytics platform that we are developing around the concept of event, that is, a simple or complex data stream gathered from physical and social sensors and encapsulated with contextual information (space, time, thematics).

                    Keywords: Event ETL, Event Datawarehouse, Event OLAP, Service-Controlled Networking


           1        Introduction

           Nowadays we are witnessing the proliferation of sensor devices able to produce heterogeneous types of data (textual, visual, audio, and other rich multimedia formats) that can be profitably used for detecting and handling emergency events, such as disasters due to natural phenomena (like floods, storms, extreme temperatures, etc.), and for alerting people when such events occur. Besides physical sensors, which detect data about physical phenomena (like temperature, humidity, wind, rain, pressure, sea level), there is a proliferation of social sensors that collect data from people (like Twitter data, traffic information, train or flight schedules) [1]. These events are characterized by temporal, spatial and thematic dimensions that can be exploited, on the one hand, to identify the information needed to face a given emergency event and, on the other hand, to analyze and forecast the activities to carry out for alerting people and rescuing victims at the places of the disaster. Conventional data analytics platforms cannot be profitably exploited for handling this kind of data, and new advanced architectures should be developed for several reasons. First, the sensors (both physical and social) are located in different networks and made available by different institutes, agencies and NPOs. In this context, network configuration, sensor detection and discovery are difficult issues to solve. Second, sensors and the data they produce should be handled in real time in order to be properly processed during the emergency event. Therefore, scalable and efficient solutions that can be applied on-line should be devised: ETL (extract, transform and load) solutions, usually applied off-line, need to be revised and applied on-line for feeding the DW (data warehouse) with fresh and timely data. Finally, a user-friendly environment is needed to properly help the user in the different phases of the analysis process (sensor discovery, feeding, and knowledge inference).

Proc. of Third International Workshop on Cultures of Participation in the Digital Age - CoPDA 2015
Madrid (Spain), May 26th, 2015 (published at http://ceur-ws.org).
Copyright © 2014 for the individual papers by the papers' authors. Copying permitted for private and academic purposes.
This volume is published and copyrighted by its editors.

              In this paper we propose a novel event-level OLAP analytics platform for analyzing multi-dimensional relations between multiple data streams based on event attributes (spatial, temporal and thematic). The output would be spatio-temporal-thematic aggregates of multiple data streams (e.g., physical and social sensing data streams) representing complex events (e.g., the social response to a natural disaster), organized at different spatio-temporal granularities.
              In Section 2 the new architecture is illustrated and compared with conventional data analytics platforms. Section 3 describes the event data model tailored for handling multidimensional complex events according to a multigranular spatio-temporal-thematic (STT) data model. Section 4 first describes the services that should be developed for the Event ETL, and then the user interface developed for creating a workflow based on the combination of different data streams. Finally, we describe the current efforts for developing the Event OLAP interface, whose aim is to support users in analyzing data coming from different sensors and in detecting possible correlations and significant events.


           2      From Conventional to Event-Based Analytics Platforms

           In conventional OLAP systems there is a distinction between ETL operations and OLAP operations. The former are used for feeding the DW, whereas the latter are used to query the DW once the cube has been defined. Moreover, feeding the DW is considered a pre-processing phase that is carried out off-line with the purpose of solving many issues (extracting, cleaning and reconciling heterogeneous data, and denormalizing data) in order to make the execution of the OLAP operations simpler. Stream DWs have also been proposed for handling information produced in streams, and approaches based on sliding windows have been introduced to manage continuous data [2].
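           To make the sliding-window idea concrete, the following is a minimal sketch of a time-based sliding window that keeps the average of the readings received in the last `window_s` seconds. The class name, the seconds-based timestamps and the eviction policy are assumptions made for illustration; this is not the design of the approaches surveyed in [2].

```python
from collections import deque


class SlidingWindowAverage:
    """Time-based sliding window over a stream of (timestamp, value) pairs.

    Hypothetical helper: timestamps are in seconds and must arrive in
    non-decreasing order; readings older than `window_s` are evicted.
    """

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.items = deque()  # (timestamp, value) pairs currently in the window
        self.total = 0.0      # running sum, so each update is amortized O(1)

    def push(self, ts: float, value: float) -> float:
        self.items.append((ts, value))
        self.total += value
        # Evict readings that fell out of the window (ts - window_s, ts].
        while self.items and self.items[0][0] <= ts - self.window_s:
            _, old = self.items.popleft()
            self.total -= old
        return self.total / len(self.items)


# Usage: a 60-second window over rainfall readings (mm).
w = SlidingWindowAverage(window_s=60)
for ts, mm in [(0, 2.0), (30, 4.0), (70, 6.0)]:
    avg = w.push(ts, mm)
print(avg)  # 5.0 (the reading at t=0 has been evicted)
```

           Keeping a running sum instead of re-scanning the window is what makes this suitable for continuous data: the cost per reading does not grow with the window size.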
              Recently the term “active” or “real-time” DW [3] was coined to capture the need for a DW containing data as fresh as possible. In this context the periodic population of a DW is considered outdated, and new OLAP and DSS services are required to work on-line; therefore, they should handle updates in order to change the data representation and the machine learning models developed on top of them. In a real-time DW, data are loaded in real time from the OLTP system into the data warehouse, providing a convenient way for users to read the data in real time and make tactical decisions. In this context, the ETL and OLAP operations must be tightly connected in order to make the process possible, and their implementations need to be efficient and scalable. In our work, we aim to develop a real-time DW built around the concept of event as described before.
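           One way to read “tightly connected” ETL and OLAP operations is incremental maintenance: each loaded record immediately updates the aggregates that OLAP queries read, so no batch rebuild of the cube is needed. The sketch below assumes a hypothetical cube keyed by (area, day, theme) that stores sums and counts; it illustrates the idea only and is not the paper's implementation.

```python
from collections import defaultdict


class IncrementalCube:
    """Hypothetical real-time cube: loading and querying share one structure."""

    def __init__(self):
        # cell key -> [sum, count]; averages are derived at query time
        self.cells = defaultdict(lambda: [0.0, 0])

    def load(self, area: str, day: str, theme: str, value: float) -> None:
        """ETL side: called once per incoming record, updates the cell in place."""
        cell = self.cells[(area, day, theme)]
        cell[0] += value
        cell[1] += 1

    def avg(self, area: str, day: str, theme: str) -> float:
        """OLAP side: reads the current aggregate without recomputation."""
        total, count = self.cells.get((area, day, theme), (0.0, 0))
        return total / count if count else float("nan")


cube = IncrementalCube()
cube.load("Kyoto", "2013-08-01", "rain", 4.0)
cube.load("Kyoto", "2013-08-01", "rain", 6.0)
print(cube.avg("Kyoto", "2013-08-01", "rain"))  # 5.0
```

           Storing sums and counts rather than averages is a deliberate choice: both can be updated with a single addition per record, and averages can still be derived at query time.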




              First of all, a different data model must be adopted. Indeed, conventional DWs adopt the relational model for representing the stored information. This is not adequate in the Event warehouse because of the heterogeneity of the data formats used, and because information is stored at different granularities (ranging from simple raw temperature values to structured and complex data like the objects extracted from a picture, or the mood inferred from tweet data together with its level of trustworthiness). Second, conventional systems process stored data, whereas in the Event warehouse streams of data should be processed, collected and analyzed through temporal windows. These streams can be stored in the DW, or only aggregated information can be stored. Orthogonally, contextual information can be associated with the stream by means of machine learning algorithms and exploited for extracting knowledge. Concerning the implementation of the architecture and of the required operations, we are considering cloud-based architectures because we need to guarantee high throughput and low end-to-end latency despite possible fluctuations in the workload. We are currently evaluating several architectures like Apache S4 [4], D-Streams [5], Storm [6] and StreamCloud [7]. Apache S4 [4] and Storm [6] allow the design of parallel applications for dealing with streams of data.
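           Platforms such as S4 and Storm parallelize stream processing by routing events with the same key to the same processing unit (in Storm terminology, a fields grouping). The sketch below only simulates this routing in plain Python with a stable hash and a thread pool; it does not use the S4 or Storm APIs, and the function names are hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(events, n_workers):
    """Route each (sensor_id, value) event to a worker chosen by a stable hash
    of its key, so all events of one sensor are handled by the same worker."""
    parts = [[] for _ in range(n_workers)]
    for sensor_id, value in events:
        idx = zlib.crc32(sensor_id.encode("utf-8")) % n_workers
        parts[idx].append((sensor_id, value))
    return parts


def count_per_sensor(part):
    """Work done by one worker: a per-key count over its own partition."""
    return Counter(sensor_id for sensor_id, _ in part)


def run(events, n_workers=4):
    parts = partition(events, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_per_sensor, parts))
    # Keys never span partitions, so merging partial results is a plain union.
    total = Counter()
    for c in partials:
        total.update(c)
    return total


events = [("rain-01", 2.0), ("wind-07", 11.5), ("rain-01", 3.5), ("temp-03", 28.0)]
result = run(events)
print(result)
```

           A stable hash such as CRC32 is used instead of Python's built-in `hash` because string hashing is randomized per process, and key-based routing must stay consistent across workers.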


           3      STT Event Data Model

           In order to represent events, we consider the set of basic and structured (like lists and records) types whose values are represented through a JSON-like notation. These types allow us to represent different aggregations of values without imposing the restrictions of the relational model, and thus to handle a great variety of data types. Moreover, we consider spatial and temporal types at different granularities. Temporal granularities include seconds, minutes and days with the usual meaning adopted in the Gregorian calendar, whereas meters, kilometers, feet, yards, provinces and countries are examples of spatial granularities. Different granularities provide different partitions of their domains, and diverse relationships can exist among granularities depending on the inclusion and the overlapping of their granules [8].
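           A minimal sketch of these granularity notions, assuming a simple chain of nested calendar granularities (second, minute, hour, day, month, year, each contained in the next) and a hypothetical province-to-region table for space. It does not model overlapping granularities such as weeks and months, which the multigranular model of [8] also covers.

```python
from datetime import datetime

# Temporal granularities, ordered from finest to coarsest; each granule of a
# level is contained in exactly one granule of every coarser level.
TEMPORAL_LEVELS = ["second", "minute", "hour", "day", "month", "year"]

# Hypothetical spatial mapping used only for illustration: province -> region.
PROVINCE_TO_REGION = {"Milano": "Lombardia", "Bergamo": "Lombardia", "Torino": "Piemonte"}


def temporal_granule(ts: datetime, level: str) -> tuple:
    """Label of the granule of `level` that contains the instant `ts`."""
    fields = (ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second)
    keep = len(TEMPORAL_LEVELS) - TEMPORAL_LEVELS.index(level)
    return fields[:keep]


def finer_than(g1: str, g2: str) -> bool:
    """True if every granule of g1 lies inside some granule of g2 (g1 != g2)."""
    return TEMPORAL_LEVELS.index(g1) < TEMPORAL_LEVELS.index(g2)


def province_to_region(province: str) -> str:
    """Coarsen a spatial granule from the 'provinces' to the 'regions' granularity."""
    return PROVINCE_TO_REGION[province]


ts = datetime(2013, 8, 15, 10, 30, 5)
print(temporal_granule(ts, "minute"))  # (2013, 8, 15, 10, 30)
print(temporal_granule(ts, "month"))   # (2013, 8)
print(finer_than("second", "minute"), finer_than("year", "month"))  # True False
```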
              For example, the temporal granularity seconds is finer than minutes, and the granularity months is finer than years. Likewise, the spatial granularity provinces is finer than regions. Thematics are also considered for associating a semantic annotation with a given value. For example, the annotations HR, MR, LR can be associated with the real number representing the degree (high, medium, low) of rain that has fallen in a given area. Several thematics can be identified depending on the context where the datum is acquired or processed; thematics can also be inferred by machine learning algorithms. Relying on the concepts of temporal and spatial granularities, we define an event as a value associated with a spatial object at a given time according to given thematics. Therefore, an event is a value represented at a given spatio-temporal granularity to which thematic information is added. Relying on the concept of event, we can characterize the event stream that a source can produce. A source can be a sensor (either physical or virtual) or a service (for example, one aggregating sensor/service streams).
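           The event definition can be sketched as a record that couples a value with its spatial and temporal granules and a list of thematic annotations, serialized in a JSON-like form. The field names, the rain thresholds used to assign HR/MR/LR, and the `rain_stream` source are hypothetical choices for illustration, not part of the model's specification.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Iterable, Iterator


@dataclass
class Event:
    value: object          # any JSON-like value: number, list, record...
    space: str             # spatial granule, e.g. a province name
    space_gran: str        # spatial granularity, e.g. "provinces"
    time: str              # temporal granule, e.g. "2013-08-15"
    time_gran: str         # temporal granularity, e.g. "days"
    themes: list = field(default_factory=list)  # thematic annotations

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def rain_theme(mm: float) -> str:
    """Hypothetical thresholds (mm/day) mapping a rain amount to HR, MR or LR."""
    if mm >= 30.0:
        return "HR"
    if mm >= 10.0:
        return "MR"
    return "LR"


def rain_stream(readings: Iterable[tuple]) -> Iterator[Event]:
    """A hypothetical source: turns (province, day, mm) readings into events."""
    for province, day, mm in readings:
        yield Event(mm, province, "provinces", day, "days", ["rain", rain_theme(mm)])


events = list(rain_stream([("Milano", "2013-08-15", 42.0), ("Torino", "2013-08-15", 3.0)]))
print(events[0].to_json())
```

           Modelling a source as a generator mirrors the notion of an event stream: a sensor or a service yields events one at a time, and downstream operators can consume them lazily.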




           4      Event Management

           At the bottom of our architecture, a middleware provides the description of the available sensors and offers several services, including service discovery, service monitoring, execution control, and service message exchange. These services are exploited for executing Event ETL operators efficiently and effectively over a programmable network (like SDN), e.g., by filtering through in-network data processing. We have currently developed three kinds of operators that are relevant in our context: conversion, merging and connecting operators. Conversion operators are used for changing the spatio-temporal-thematic granularity of an event stream. Merging operators are used for merging events at the same spatio-temporal granularity into a single data structure. Connecting operators are used to link a data source to another by applying an SQL join-like operator. For reasons of space, the paper does not describe the implementation of these operators in detail, and focuses only on the visual interface through which the user can identify the sensors and apply the operators for converting, merging and combining their data streams. Finally, we describe the Event OLAP system used for analyzing the data and extracting knowledge.
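           The three kinds of operators can be sketched over streams of simple records. The record layout (`sensor`, `area`, `ts`, `value`), the aggregation used when converting granularity (mean by default) and the inner-join semantics of the connecting operator are assumptions for illustration; the paper does not detail the actual implementations.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def convert(stream, level, agg=mean):
    """Conversion: re-express a stream at a coarser temporal granularity
    ('hour' or 'day'), aggregating the values that fall in the same granule."""
    fmt = {"hour": "%Y-%m-%d %H", "day": "%Y-%m-%d"}[level]
    groups = defaultdict(list)
    for e in stream:
        groups[(e["sensor"], e["area"], e["ts"].strftime(fmt))].append(e["value"])
    return [{"sensor": s, "area": a, "granule": g, "value": agg(vs)}
            for (s, a, g), vs in groups.items()]


def merge(*streams):
    """Merging: combine events of several streams that share the same
    spatio-temporal granule into one record keyed by sensor name."""
    merged = defaultdict(dict)
    for stream in streams:
        for e in stream:
            merged[(e["area"], e["granule"])][e["sensor"]] = e["value"]
    return [{"area": a, "granule": g, **vals} for (a, g), vals in merged.items()]


def connect(left, right, keys=("area", "granule")):
    """Connecting: an inner-join-like link between two streams on shared keys."""
    index = defaultdict(list)
    for right_rec in right:
        index[tuple(right_rec[k] for k in keys)].append(right_rec)
    return [{**left_rec, **right_rec}
            for left_rec in left
            for right_rec in index.get(tuple(left_rec[k] for k in keys), [])]


# Usage mirroring Fig. 1: Twitter linked to the merge of RainFall and WindForce.
t = datetime(2013, 8, 15, 10, 0)
rain = [{"sensor": "RainFall", "area": "Kyoto", "ts": t.replace(minute=m), "value": v}
        for m, v in [(0, 4.0), (30, 6.0)]]
wind = [{"sensor": "WindForce", "area": "Kyoto", "ts": t, "value": 7.0}]
tweets = [{"sensor": "Twitter", "area": "Kyoto", "ts": t.replace(minute=m), "value": 1}
          for m in (5, 10, 50)]

weather = merge(convert(rain, "hour"), convert(wind, "hour"))
linked = connect(convert(tweets, "hour", agg=sum), weather)
print(linked)
```

           Tweets are converted with `sum` (a count per hour) while physical readings use `mean`; choosing the aggregation per stream is one plausible way to handle the heterogeneity of sources at conversion time.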


           4.1    Visual Event ETL
           ETL operations have been proposed in different contexts depending on the kinds of data to handle (structured and semi-structured). A good treatment at the conceptual level of the operations for feeding a DW can be found in [2,3,9]. Moreover, [10] presents approaches for the semi-automatic generation of ETL operations depending on the user needs and the context of use. ETL operations are usually coupled with a graphical data-flow representation that helps the user identify the original data sources and apply the operations for extracting, cleaning, transforming and combining their data. Once the ETL specification is completed, several strategies have been proposed for optimizing the data-flow and for efficiently executing the loading schedule. These approaches have been mainly developed for producing relational data to feed conventional DW systems. In [11] an approach has been presented for feeding arbitrary target sources (either relational or based on a NoSQL system).
           In our context, we designed a Web environment where users can drag and drop different sensor data sources and visually apply a set of operations to them. This application offers an engine and a graphical environment for data transformation and mashup. As depicted in Fig. 1 (section A), the mashup consists of a user interface that contextually displays icons of data sources or operations in order to link, filter or merge data coming from different sensors. It is based on the idea of providing a visual workflow generator that lets end users create aggregations, filters and porting of data originating from the sources. An advanced use of this visual paradigm lets end users generate sample data on-line, either from the data sources dragged and dropped onto the canvas or from the results of the operations carried out on them (see Fig. 1, section B). This End-User Development strategy offers a solution able to gather information from across the net and trigger specific filters and operations. The resulting solution is simple and easy to learn, being based on the definition of the sources to use for collecting data and on the possibility of applying converting, merging and connecting operations in a visual way.
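           Internally, a workflow drawn on the canvas could be represented as a small directed acyclic graph whose nodes are sources or operators, evaluated on demand to produce the sample preview of section B in Fig. 1. The sketch below assumes this representation; the node names mirror Fig. 1, while the `Workflow` class, the lambdas standing in for sensors and the preview size are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    fn: Callable[..., list]              # source (no inputs) or operator
    inputs: List[str] = field(default_factory=list)


class Workflow:
    def __init__(self):
        self.nodes: Dict[str, Node] = {}

    def add(self, name, fn, *inputs):
        """Add a node; mimics dropping an icon on the canvas and linking it.
        Inputs must already exist, which keeps the graph acyclic."""
        for i in inputs:
            if i not in self.nodes:
                raise ValueError(f"unknown input node: {i}")
        self.nodes[name] = Node(name, fn, list(inputs))

    def run(self, name, cache=None):
        """Evaluate a node after its inputs (depth-first), memoizing results."""
        cache = {} if cache is None else cache
        if name not in cache:
            node = self.nodes[name]
            args = [self.run(i, cache) for i in node.inputs]
            cache[name] = node.fn(*args)
        return cache[name]

    def preview(self, name, n=3):
        """Sample data for the selected node, as shown in a preview pane."""
        return self.run(name)[:n]


wf = Workflow()
wf.add("RainFall", lambda: [{"t": h, "rain": 2.0 * h} for h in range(5)])
wf.add("WindForce", lambda: [{"t": h, "wind": 3 + h} for h in range(5)])
wf.add("Merge", lambda a, b: [{**x, **y} for x, y in zip(a, b)], "RainFall", "WindForce")
wf.add("Twitter", lambda: [{"t": h, "tweets": 10 * h} for h in range(5)])
wf.add("Link", lambda m, tw: [{**x, **y} for x, y in zip(m, tw) if x["t"] == y["t"]],
       "Merge", "Twitter")
print(wf.preview("RainFall", n=2))  # [{'t': 0, 'rain': 0.0}, {'t': 1, 'rain': 2.0}]
```

           Evaluating lazily from the selected node means that a preview only computes the part of the workflow the user is inspecting, which keeps the on-line sample generation cheap.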




            Fig. 1. Screenshot of the user interface of the Visual Event ETL. Section A shows the canvas used to drag and drop data sources: the social sensor “Twitter” is linked to the merge of two different sensors, “RainFall” and “WindForce”. Section B presents a set of sample data coming from the selected sensor, “RainFall”; in this part of the interface the user can set up filters on the data sources or conditions on the operations.

              Future activities aim at developing machine learning algorithms for suggesting the possible operations that users can apply to a set of data sources, or for providing them with useful predictions of what can happen by integrating the selected data.


           4.2     Event OLAP
           Traditional OLAP front-ends, designed primarily to support routine reporting and anal-
           ysis and offering visualization merely for expressive presentation of the data [12], are
           not suitable in the context of Event OLAP.
              Event OLAP aims at providing a much more powerful data analysis environment, accessible through an intuitive user interface for quickly analyzing STT events. Event OLAP delivers a web-based interactive environment that allows non-technical users who are experts of the analysis domain to explore data in real time by slicing and dicing, pivoting, filtering, and summarizing them in an intuitive way. As depicted in Fig. 2, it is based on a 3D map-based visualization through which users can analyze and monitor the trajectories (that is, spatio-temporal movements) of the data streams that are part of an event (e.g., a thunderstorm). The aim of this environment is to provide users with a visualization of the data streams coming from different sensors in order to capture possible visual correlations among data. The visual perception of these correlations can complement the correlations that can be detected automatically by applying data mining techniques to the data streams. Visual display of events’ trajectories and collection of movement statistics can provide useful indicators about how an event stream can influence other event streams. Moreover, current efforts are directed at enabling the web application to modify the workflow designed with the Visual Event ETL, in order to visualize the related changes in real time in the Event OLAP system.
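           The slicing, dicing, summarizing and pivoting operations offered by Event OLAP can be illustrated on a tiny STT fact table. The sketch below is a plain-Python illustration with hypothetical dimension names (`time`, `area`, `theme`) and toy values; it is not the system's query engine.

```python
from collections import defaultdict

# Toy fact table: one row per event, with STT dimensions and a numeric measure.
FACTS = [
    {"time": "2013-08-01", "area": "Kyoto", "theme": "PM2.5", "value": 40},
    {"time": "2013-08-01", "area": "Osaka", "theme": "PM2.5", "value": 55},
    {"time": "2013-08-02", "area": "Kyoto", "theme": "PM2.5", "value": 60},
    {"time": "2013-08-02", "area": "Kyoto", "theme": "Asthma", "value": 12},
]


def slice_(facts, dim, member):
    """Slice: fix one dimension to a single member."""
    return [f for f in facts if f[dim] == member]


def dice(facts, **members):
    """Dice: keep facts whose dimensions fall in the given member sets."""
    return [f for f in facts if all(f[d] in ms for d, ms in members.items())]


def summarize(facts, *dims):
    """Roll-up: sum the measure grouped by the listed dimensions."""
    out = defaultdict(int)
    for f in facts:
        out[tuple(f[d] for d in dims)] += f["value"]
    return dict(out)


def pivot(facts, row, col):
    """Pivot: arrange summed measures as a row x column table."""
    table = defaultdict(dict)
    for (r, c), v in summarize(facts, row, col).items():
        table[r][c] = v
    return dict(table)


pm = slice_(FACTS, "theme", "PM2.5")
print(summarize(pm, "area"))  # {('Kyoto',): 100, ('Osaka',): 55}
print(pivot(FACTS, "time", "theme"))
```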




            Fig. 2. Screenshot of the Event OLAP web application. The image illustrates how, in a specific time range (August 2013), the concurrence of the streams coming from the sensor “PM2,5”, which collects data exceeding a specific fine-dust threshold, and from “Twitter”, which collects tweets containing the keyword “Asthma”, is extremely high. This event can lead the user to formulate specific hypotheses about the correlation highlighted in the visual interface.
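           The visual concurrence shown in Fig. 2 can be complemented by an automatic check. A minimal sketch, assuming both streams have already been converted to the same temporal granularity (e.g., daily values) and aligned, computes the Pearson correlation between them. Pearson correlation is only one of many possible data mining techniques, and the series below are toy numbers, not the NICT data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, aligned series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must be aligned and contain at least two points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


# Toy daily series: hours above a fine-dust threshold vs. tweets mentioning "Asthma".
pm_hours = [0, 2, 5, 8, 3]
asthma_tweets = [4, 9, 20, 31, 12]
r = pearson(pm_hours, asthma_tweets)
print(round(r, 3))
```

           A high coefficient would support, but not prove, the hypothesis suggested by the visual interface; the user still has to interpret it in the context of the event.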

              Other activities aim at extending this web application with a location intelligence visualization strategy to identify patterns and trends by seeing and analyzing data in a map view with spatial analysis tools such as thematic maps and spatial statistics. This location intelligence service will help to find data by using spatial relationships to filter relevant data. Moreover, a temporal condition will be applied in this location intelligence service for providing spatio-temporal clustering, simulation and visualization, map animation and movement tracking. Finally, in order to take into account the social aspect of the events collected by the Event OLAP, the system will be endowed with a set of functionalities for creating a social network of users that will promote the creation of communities around each event. This social component, exploiting ad-hoc computing techniques, will be able to study social network dynamics and to promote crowd-sourced analysis for new and meaningful uses of data.
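           Using spatial relationships, combined with a temporal condition, to filter relevant data can be sketched as a radius-plus-time-window filter. The haversine great-circle distance, the mean Earth radius of 6371 km, the event layout and the coordinates are assumptions chosen for illustration; a production location intelligence service would more likely rely on a spatial index or a spatial database.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    p1, p2 = radians(lat1), radians(lat2)
    dp, dl = p2 - p1, radians(lon2 - lon1)
    a = sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def near_in_time(events, center, radius_km, when, window):
    """Keep events within `radius_km` of `center` and within `window` of `when`."""
    lat0, lon0 = center
    return [e for e in events
            if haversine_km(lat0, lon0, e["lat"], e["lon"]) <= radius_km
            and abs(e["ts"] - when) <= window]


events = [
    {"id": 1, "lat": 35.01, "lon": 135.77, "ts": datetime(2013, 8, 15, 10)},  # Kyoto
    {"id": 2, "lat": 34.69, "lon": 135.50, "ts": datetime(2013, 8, 15, 11)},  # Osaka
    {"id": 3, "lat": 35.68, "lon": 139.69, "ts": datetime(2013, 8, 15, 10)},  # Tokyo
    {"id": 4, "lat": 35.01, "lon": 135.77, "ts": datetime(2013, 8, 17, 10)},  # Kyoto, later
]
hits = near_in_time(events, (35.01, 135.77), 60, datetime(2013, 8, 15, 10), timedelta(hours=6))
print([e["id"] for e in hits])  # [1, 2]
```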




           5      Concluding Remarks

           In this paper we have presented the architecture of an Event OLAP system specifically conceived for the management of emergency situations. We have detailed some peculiarities of the system and compared it with conventional OLAP systems and current research efforts. We are currently working on the formal specification of the needed operations at the ETL, DW and OLAP levels. Moreover, we are choosing the cloud architecture in which the different services will be implemented and tested. For testing, we will exploit the collection of event data made available by NICT.



           6      References
          1. M. Imran, C. Castillo, F. Diaz, and S. Vieweg. Processing social media messages in mass emergency: A survey. arXiv preprint arXiv:1407.7071, 2014.
          2. M. Gorawski and A. Gorawska. Research on the stream ETL process. In Int'l Conf. on Beyond Databases, Architectures, and Structures, 61-71, 2014.
          3. H. Zhou, D. Yang, and Y. Xu. An ETL strategy for real-time data warehouse. In Practical Applications of Intelligent Systems, volume 124 of Advances in Intelligent and Soft Computing, 329-336. Springer, 2012.
          4. L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In IEEE Int'l Conf. on Data Mining Workshops, 170-177, 2010.
          5. M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In ACM Symposium on Operating Systems Principles, 423-438, 2013.
          6. N. Marz. Storm: Distributed and fault-tolerant realtime computation, 2012.
          7. V. Gulisano, R. Jiménez-Peris, M. Patiño-Martínez, C. Soriente, and P. Valduriez. StreamCloud: An elastic and scalable data streaming system. IEEE Trans. Parallel Distrib. Syst., 23(12):2351-2365, 2012.
          8. E. Camossi, E. Bertino, M. Mesiti, and G. Guerrini. Handling expiration of multigranular temporal objects. J. Log. Comput., 14(1):23-50, 2004.
          9. P. Vassiliadis, A. Simitsis, and S. Skiadopoulos. Conceptual modeling for ETL processes. In Proc. Int'l Workshop on Data Warehousing and OLAP, 14-21, 2002.
         10. V. Theodorou, A. Abelló, M. Thiele, and W. Lehner. A framework for user-centered declarative ETL. In Proc. Int'l Workshop on Data Warehousing and OLAP, 67-70, 2014.
         11. M. Mesiti and S. Valtolina. Towards a user-friendly loading system for the analysis of big data in the internet of things. In IEEE Computer Software and Applications Conference Workshops, 312-317, 2014.
         12. M. Scholl and S. Mansmann. Visual on-line analytical processing (OLAP). In Encyclopedia of Database Systems, 3388-3395. Springer US, 2009.



