=Paper=
{{Paper
|id=None
|storemode=property
|title=MappingSets for Spatial Observation Data Warehouses
|pdfUrl=https://ceur-ws.org/Vol-1075/06.pdf
|volume=Vol-1075
|dblpUrl=https://dblp.org/rec/conf/immoa/ViqueiraMVT13
}}
==MappingSets for Spatial Observation Data Warehouses==
José R.R. Viqueira (jrr.viqueira@usc.es), Sebastián Villarroya (sebastian.villarroya@usc.es), David Martínez (david.martinez.casas@usc.es), José A. Taboada (joseangel.taboada@usc.es)

COGRADE - CITIUS, Universidade de Santiago de Compostela, Spain

Proceedings IMMoA'13, http://www.dbis.rwth-aachen.de/IMMoA2013/

===ABSTRACT===
The amount of time-evolving spatial data currently being generated by automatic observation processes is huge. In general, observation data consists of both heterogeneous spatio-temporal data and relevant observation metadata. The former includes data of Spatial Entities (cities, roads, vehicles, etc.) and data on the temporal evolution of both properties of Spatial Entities (population of a city, position of a vehicle, etc.) and properties of space (temperature, elevation, etc.). Truly uniform, integrated management of all these types of data is still not achieved by current models and systems. The present paper describes the design of a data modeling and management framework for observation data warehouses. A hybrid logical-functional data model based on the concept of MappingSet, together with its language, enables the specification of spatio-temporal analytical processes. The framework is currently being implemented.

===1. INTRODUCTION===
According to [16], properties of entities (called Features of Interest - FOI) are either exact values assigned by some authority (names, prices, geometry of a municipality, etc.) or estimated by some observation process (height, classification, color, etc.). Observation processes may be classified in various ways [15]. Physical Processes produce their data in some spatial context. They are usually hardware sensing devices that perform measurements either locally or remotely. Besides, they may be installed on either static or mobile platforms. Non-Physical Processes are computations that may be defined in some mathematical way. Any process may be either Time-triggered or Event-triggered. The former produce their results at some predefined time frequency. The latter are started by some external event at any moment in time.

Observation data has an inherent temporal nature. Besides, in many cases FOIs are also spatial. Therefore, systems devoted to observation data analysis should cope with spatial and spatio-temporal data analysis. In particular, they should support relevant functionality for the management of Spatial Entities and Spatial Coverages, and their evolution with respect to time [9, 20, 6]. Spatial Entities are entities of a given application domain that have geometric valued properties (rivers, municipalities, cities, etc.). Spatial Coverages are sets of functions with a common spatial domain that describe the continuous or discrete variation over space of some specific phenomenon (temperature, humidity, elevation above sea level, etc.).

The amount of data currently being obtained from automatic observation processes is huge, and it is expected to grow exponentially during the upcoming years. The analysis of all these data to support appropriate decision making is a key challenge for future information systems. Many application domains would benefit from innovative technologies in this area, including environmental observation and monitoring, natural disaster management, e-health, etc.

Based on the above, the present paper proposes a data modeling and management solution that enables spatio-temporal analysis in data warehouses of observation data. In particular, a proposed E-R extension enables the insertion of observation metadata in spatial models at a conceptual level. At a logical level, a new data model based on MappingSets enables the integrated management of any kind of spatial and temporal data. A MappingSet is a collection of Mappings, in the functional programming sense, defined on a common domain. Both Spatial Entities and Spatial Coverages, and both Time-triggered and Event-triggered observation data, are modeled uniformly with MappingSets.

The remainder of this paper is organized as follows. Section 2 describes other pieces of work related to the proposed solution. The MappingSet based spatio-temporal logical model is described in Section 3. The conceptual level E-R extension for observation data is described in Section 4, as is its translation to the MappingSet based logical model. Section 5 illustrates the spatio-temporal analysis capabilities of the model for the definition of Non-Physical spatio-temporal analytical processes. Finally, Section 6 concludes the paper and outlines lines of future work.
===2. RELATED WORK===
The OGC defines an abstract specification of a data model for Observations and Measurements [16] in a Geographic Information context. Various types of observations are supported, according to the data type of their values. Simple observations include: i) measurements that combine a value of a real type with a unit of measure, ii) categories whose results are items of enumerated types, iii) counts of integer types, iv) truth observations of boolean type, v) time observations and vi) geometric observations. Complex observations are record structures that combine various simple observation types. Metadata of each observation is also represented in the model. In particular, each observation references its observation Process, the observed Property and its related FOI, the time instant when the observation applies to the FOI observed property (phenomenon time) and the time instant when the Process obtained the result value (result time). Notice, for example, that if a sample of water is obtained from a river and later analyzed in a laboratory, two different observation time instants are involved. Optionally, other metadata, parameters, data quality information and observation context may also be provided.

In [4] a conceptual model to represent observation data semantics is defined. Annotating the conventional data models of available heterogeneous datasets with observation and measurement conceptual constructs enables their integration at a semantic level. Integrated querying of heterogeneous observation datasets therefore becomes possible after the annotation process. A similar approach is followed by the E-R extension proposed in the present paper.

Observation data always has a temporal nature. Besides, the spatial components of observation data and metadata are central to many application domains, such as those related to environmental observation and monitoring. Spatial and temporal extensions of conceptual and logical data models have to be considered. Examples of spatio-temporal conceptual models are [18, 17]. Relational and object-relational spatio-temporal extensions are defined in the area of Spatial Databases [9, 20] to support spatial entity management. Field [6] and array algebras [3] are behind spatial coverage and array management systems [14, 5, 2]. Integrated management of spatial entities and coverages is also the objective of some approaches [19, 12], which incorporate different structures for those data types. Integrated management of entities and coverages in a uniform manner is achieved by the MappingSet data model proposed in the present paper.

Various data management approaches are possible to deal with spatio-temporal observation data automatically generated by sensing devices. If we consider the data generated by each sensor as a virtual temporal relation, then the simplest approach is to consider Materialized Views of such virtual relations. Automatic maintenance of such views on the arrival of new data from sensors has to be solved by the system [8]. Automatically updating these views through Extraction Transformation and Load (ETL) processes on sensor data is the approach followed by the present framework.

A more sophisticated solution is to consider sensor data streams and to enable the continuous execution of queries on those input streams. Continuous query languages [1, 11] enable the definition of those continuous queries on both data streams and recorded relations. Operations to create relations from streams and streams from relations are at the core of those languages. A similar approach is followed by some languages specifically designed to access sensor networks [13, 7]. It is important to notice that spatial data, including spatial entities and spatial coverages, and spatial analysis are not explicitly supported in these solutions.

===3. SPATIO-TEMPORAL MAPPINGSET BASED DATA MANAGEMENT===
This section introduces the MappingSet based data model that is the basis of the proposed framework. Temporal and spatial data types are first defined. Based on them, Mappings and MappingSets are next formalized. Data management will be based on the intensional definition of MappingSets using both logical and functional paradigms.

Conventional data types include Boolean, CString (variable size character strings), Int16, Int32, Int64 (integers), Float32, Float64 (reals with floating point representation). The fixed point parametric type Numeric(P,D) consists of real numbers with a maximum of P digits, D of which are in the fractional part. In order to define temporal and spatial data types, 1D and 2D samplings are first formalized. Let <math>\mathbb{R}</math> and <math>\mathbb{I}</math> denote the sets of real and integer numbers, respectively; then 1D and 2D samplings are defined as follows.

Definition 1. A 1D-sampling S with resolution <math>r \in \mathbb{R}</math> and phase <math>p \in \mathbb{R}</math> is defined as the infinite subset of <math>\mathbb{R}</math>
:<math>\{x \mid x = i \cdot r + p,\ \forall i \in \mathbb{I}\}</math>

Definition 2. Let <math>v_{r_1}</math>, <math>v_{r_2}</math>, <math>v_{p_1}</math> and <math>v_{p_2}</math> be four vectors in <math>\mathbb{R}^2</math> defined by respective directions <math>D_1, D_2, D_1, D_2 \in (-\pi, \pi]</math> and respective magnitudes <math>r_1</math>, <math>r_2</math>, <math>p_1</math> and <math>p_2</math>. A 2D-sampling S with directions <math>D_1, D_2</math>, resolutions <math>r_1, r_2</math> and phases <math>p_1 \in [-r_1/2, r_1/2]</math>, <math>p_2 \in [-r_2/2, r_2/2]</math> is defined as the infinite subset of <math>\mathbb{R}^2</math>
:<math>\{(x, y) \in \mathbb{R}^2 \mid x = (i_1 r_1 + p_1)\cos(D_1) + (i_2 r_2 + p_2)\cos(D_2) \ \wedge\ y = (i_1 r_1 + p_1)\sin(D_1) + (i_2 r_2 + p_2)\sin(D_2),\ \forall i_1, i_2 \in \mathbb{I}\}</math>

An element s of a 1D-sampling (2D-sampling) S is called a 1D-sample (2D-sample). Integers <math>i</math>, <math>i_1</math>, <math>i_2</math> are called the sampling coordinates of s. <math>s(i)</math> and <math>s(i_1, i_2)</math> denote respectively the 1D-sample and 2D-sample with sampling coordinates <math>i</math> and <math>(i_1, i_2)</math>. Figure 1 illustrates the above definitions with a geometrical representation.

Figure 1: Illustration of 1D and 2D samplings. (a) 1D Sampling; (b) 2D Sampling.

Definition 3. TimeInstant(D) is defined as a finite subset of elements <math>s(i)</math> of a 1D-sampling S with resolution <math>10^{-D}</math> and phase 0 such that
:<math>-2^{63} < i < 2^{63} + 1</math>
where each <math>s(i)</math> is interpreted as the time instant 1/1/1970 + <math>s(i)</math> seconds. Maximum allowed D is 6 (microsecond).

Definition 4. TimeInstantSample(D, R) is defined as a finite subset of elements <math>s(i)</math> of a 1D-sampling S with resolution <math>R \cdot 10^{-D}</math> and phase <math>(R \cdot 10^{-D})/2</math> such that
:<math>-2^{63} < i < 2^{63} + 1</math>
where each <math>s(i)</math> is interpreted as the time interval [1/1/1970 + <math>s(i)</math> seconds, 1/1/1970 + <math>(s(i) + R \cdot 10^{-D})</math> seconds). Again, maximum allowed D is 6 (microsecond).
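The sampling definitions (Definitions 1 to 4) can be made concrete with a minimal Python sketch. It is only an illustration, not the framework's implementation: the function names are hypothetical, directions are assumed to be given in radians, and exact fractions are used for time resolutions to avoid rounding.

```python
import math
from fractions import Fraction


def sample_1d(i, r, p):
    """Return the 1D-sample s(i) = i*r + p (Definition 1)."""
    return i * r + p


def sample_2d(i1, i2, r1, r2, p1, p2, d1, d2):
    """Return the 2D-sample s(i1, i2) of Definition 2; d1, d2 are in radians."""
    a = i1 * r1 + p1  # displacement along direction d1
    b = i2 * r2 + p2  # displacement along direction d2
    x = a * math.cos(d1) + b * math.cos(d2)
    y = a * math.sin(d1) + b * math.sin(d2)
    return (x, y)


def time_instant(i, digits):
    """s(i) for TimeInstant(D): resolution 10^-D and phase 0 (Definition 3)."""
    return sample_1d(i, Fraction(1, 10 ** digits), 0)


def time_instant_sample(i, digits, r):
    """s(i) for TimeInstantSample(D, R): resolution R*10^-D, phase half of it."""
    res = Fraction(r, 10 ** digits)
    return sample_1d(i, res, res / 2)


if __name__ == "__main__":
    print(sample_1d(3, 2, 1))
    print(sample_2d(1, 2, 1.0, 1.0, 0.0, 0.0, 0.0, math.pi / 2))
    print(time_instant(1500, 3))
    print(time_instant_sample(0, 0, 30))
```

With directions 0 and π/2 and zero phases, `sample_2d` reduces to a regular axis-aligned grid, which is the case the paper uses for its point types.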
Definition 5. Date is defined as a shorthand of TimeInstantSample(0, 86400).

Definition 6. Point2D(P,D) is defined as the finite subset of elements <math>s(i_1, i_2)</math> of a 2D-sampling S with directions <math>D_1 = 0</math> and <math>D_2 = \pi/2</math>, resolutions <math>r_1 = r_2 = 10^{-D}</math> and phases <math>p_1 = p_2 = 0</math> such that
:<math>-10^{P} < i_1, i_2 < 10^{P}</math>

Definition 7. Point2DSample(P,D,R) is defined as the finite subset of elements <math>s(i_1, i_2)</math> of a 2D-sampling S with implementation dependent directions <math>D_1</math> and <math>D_2</math>, resolutions <math>r_1 = r_2 = K \cdot R \cdot 10^{-D}</math> and phases <math>p_1 = p_2 = 0</math> such that
# <math>-10^{P} < i_1, i_2 < 10^{P}</math>
# <math>K < \max\left(\left|\cos\left(\frac{D_2 - D_1}{2}\right)\right|, \left|\sin\left(\frac{D_2 - D_1}{2}\right)\right|\right)</math>

TimeInstant and Point2D data types provide discrete representations for both time and space, where the user has control over the supported precision. Types TimeInstantSample and Point2DSample provide representations for temporal and spatial samplings at user defined resolution. Each time instant is approximated by its closest lower TimeInstantSample, whereas each 2D point is approximated by its closest Point2DSample. It is out of the scope of this paper to demonstrate that the K factor above ensures that any 2D point is approximated by a sample at a distance lower than or equal to <math>R \cdot 10^{-D}</math>. Type castings are available for the above data types.

If T is either a numeric or temporal type, then data type Interval(T) is a new data type whose values are closed intervals over data type T. If <math>t_1, t_2</math> are two elements of data type T, then <math>[t_1, t_2]</math> is used to denote the relevant closed interval. Similarly, if S is a spatial data type then the following geometric data types are also supported, based on the standard specification given by [10].
* Geometry(S): Abstract type. Represents any vector geometry or set of geometries defined with elements of S.
* LineString(S): Vector polylines defined by sequences of elements of S.
* Polygon(S): Vector polygons, possibly with holes, whose borders are defined by sequences of elements of S.
* GeometryCollection(S): Heterogeneous collections of Geometries.
* MultiPoint(S): Homogeneous collection of elements of S.
* MultiLineString(S): Homogeneous collection of elements of LineString(S).
* MultiPolygon(S): Homogeneous collection of elements of Polygon(S).

Definition 8. If <math>ADT_1, ADT_2, \ldots, ADT_n</math> are not necessarily distinct data types, <math>A_1, A_2, \ldots, A_n</math> are distinct names and <math>RDT</math> is a data type, then:
# A Mapping with signature <math>M() : RDT</math> is defined as a value of type <math>RDT</math>.
# A Mapping with signature <math>M(A_1 : ADT_1, A_2 : ADT_2, \ldots, A_n : ADT_n) : RDT</math> is defined as a partial function <math>M : ADT_1 \times ADT_2 \times \ldots \times ADT_n \rightarrow RDT</math>.

Operations are syntactic sugar for Mappings. Implicit castings between compatible data types are applied during Mapping invocations, enabling transparent transformation between temporal and spatial elements of different resolutions by applying constant interpolation. Various primitive mappings and operations are provided by the model. However, formalizing a complete set of them is out of the scope of the paper. Informal descriptions of required primitive mappings will be given throughout the paper.

A MappingSet is a set of Mappings that share a common domain defined as an n-ary relation over data types. The formal definition is given below.

Definition 9. Let <math>C_1, C_2, \ldots, C_n</math> be distinct names, <math>ADT_1, ADT_2, \ldots, ADT_n</math> be not necessarily distinct data types and <math>RDT_1, RDT_2, \ldots, RDT_m</math> be not necessarily distinct data types. Let also D be an n-ary relation with scheme <math>D(C_1 : ADT_1, C_2 : ADT_2, \ldots, C_n : ADT_n)</math> defined as a finite subset of <math>ADT_1 \times ADT_2 \times \ldots \times ADT_n</math>. Then a MappingSet is defined in one of the three following forms:
# A 1-tuple <math>MS = \langle D \rangle</math>.
# An m-tuple <math>MS = \langle M_1, M_2, \ldots, M_m \rangle</math>, where each <math>M_i</math> is a Mapping with signature <math>M_i() : RDT_i</math> defined as a value of <math>RDT_i</math>.
# An (m+1)-tuple <math>MS = \langle D, M_1, M_2, \ldots, M_m \rangle</math>, where each <math>M_i</math> is a Mapping with signature <math>M_i(C_1 : ADT_1, C_2 : ADT_2, \ldots, C_n : ADT_n) : RDT_i</math> defined as a partial function <math>M_i : ADT_1 \times ADT_2 \times \ldots \times ADT_n \rightarrow RDT_i</math>.
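The three MappingSet forms of Definition 9 can be pictured with a small Python data structure. This is a hedged sketch rather than the paper's implementation: the class layout, the `value` method, the use of `None` for "undefined", the restriction of evaluation to the Domain and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Set, Tuple


@dataclass
class MappingSet:
    """Sketch of Definition 9: an optional Domain plus named Mappings."""
    domain: Optional[Set[Tuple[Any, ...]]] = None
    mappings: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def value(self, name, *args):
        """Evaluate Mapping `name`; None stands for 'undefined' (an assumption)."""
        mapping = self.mappings[name]
        if self.domain is None:
            return mapping()       # form 2: zero-argument Mappings
        if args not in self.domain:
            return None            # outside the Domain: treated as undefined
        return mapping(*args)      # form 3: Mappings over the Domain


# Form 3: entity-like MappingSet (values are made up for illustration).
_power = {("V1",): 350.0, ("V2",): 120.5}
vessel = MappingSet(
    domain=set(_power),
    mappings={"power": lambda ves_id: _power[(ves_id,)]},
)

# Form 1: relationship-like MappingSet with a Domain and no Mappings.
licensed = MappingSet(domain={("V1", "Z1"), ("V2", "Z1")})

# Form 2: key-value settings without a Domain.
settings = MappingSet(mappings={"defaultUOM": lambda: "kg"})

if __name__ == "__main__":
    print(vessel.value("power", "V1"))
    print(("V1", "Z1") in licensed.domain)
    print(settings.value("defaultUOM"))
```

Keeping the Domain as a plain set of tuples mirrors the definition of D as a finite n-ary relation, while Mappings stay ordinary functions.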
The evolution with respect to time of spatial entities and spatial coverages may be modeled with appropriate MappingSets that contain both Domain and Mappings. n-ary relationships are also modeled with MappingSets, usually without Mappings. MappingSets without Domain are also useful to record short collections of key-value pairs that are common in the specification of configuration settings.

Figure 2: E-R Diagram Notation for Spatial and Observation Data. (a) Spatial Entities; (b) Spatial Coverages; (c) Observed Entities; (d) Observed Relationships; (e) Observed Properties.

The Domains and Mappings of a MappingSet may be defined either extensionally or intensionally. If an extensional definition of the Domain is given, then both extensional and intensional definitions of Mappings are allowed. On the other hand, an intensional definition of the Domain may only be accompanied by intensional definitions of Mappings. Generally, an extensional definition is a sequence of all the elements of Domain and Mappings in some specific order. Both row-wise and column-wise orderings may be used. It is even possible to combine row-wise and column-wise orders for different components and Mappings. If the data type of a Domain component is some integer or sampling data type, then its extensional definition might be given in the form of a collection of sequence definitions. In general, a sequence definition has a start element, a size and a step. For example, for an integer data type, a sequence starting at 5, with size 4 and step 2 describes the list ⟨5, 7, 9, 11⟩.
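A sequence definition (start element, size, step) expands mechanically into its elements. The helper below is a minimal sketch with a hypothetical name, `expand_sequence`; the fractional example uses made-up values to show that the same rule applies to sampled types when exact arithmetic is used.

```python
from fractions import Fraction


def expand_sequence(start, size, step):
    """Expand a sequence definition (start, size, step) into its elements."""
    return [start + k * step for k in range(size)]


if __name__ == "__main__":
    # Integer example from the text: start 5, size 4, step 2.
    print(expand_sequence(5, 4, 2))
    # Hypothetical fractional example: start 1/2, size 3, step 1/4.
    print(expand_sequence(Fraction(1, 2), 3, Fraction(1, 4)))
```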
For a TimeInstantSample data type, a sequence starting at "2013-05-02 15:00:45.06", with size 2 and step 30.42 describes the following sequence of type TimeInstantSample(2, 3042): ⟨"2013-05-02 15:00:20.22", "2013-05-02 15:00:50.64"⟩ (see Footnote 1). For Point2DSample data types, the starting element is of type Point2D and the step has to be given by two pairs (direction, resolution).

Footnote 1. Notice that the start instant of the sequence is automatically adapted to match the underlying time representation for type TimeInstantSample(2, 3042).

Spatio-temporal analysis is enabled through the intensional definition of Mappings and MappingSets. Mappings may be intensionally defined with functional, conditional and aggregate expressions.

Functional Expression. A Mapping M with signature M(D): DT may be defined by an expression of the form
 M(D) := e
where e is a functional expression of data type DT that may include variables referencing components of D, mappings, operations, constants and castings.

Conditional Expression. A Mapping M with signature M(D): DT may be defined by an expression of the form
 M(D) := CASE b1 THEN e1
         CASE b2 THEN e2
         ...
         CASE bn THEN en
         [OTHERWISE en+1]
where each bi is a functional expression that yields a value of Boolean type and each ei is a functional expression that yields a value of type DT. The semantics are the obvious ones.

Aggregate Expression. A Mapping M with signature M(D): DT may be defined by an expression of the form
 M(D) := agge
         OVER {P}
where P is a domain relational calculus predicate and agge is a functional expression in which variables bound to MappingSet domains in P must be used as arguments of aggregate functions. Various aggregate functions are provided by the system, including both statistical and rank functions.

MappingSet domains may also be intensionally defined.

Intensional Domain. Let e be a functional expression that yields a value s of either Interval(T) or Geometry(S) data type, whose base type T, S is either some integer type or some sample type. Then, SAMPLING(e) yields all the elements of type T or S contained in s. Based on this, the domain D of a MappingSet M may be defined by an expression of the form
:<math>\{(e_1, e_2, \ldots, e_n) \mid P\}</math>
where P is a domain relational calculus predicate and each <math>e_i</math> is either a functional expression or an expression of the form SAMPLING(e), where e is also a functional expression. Expressions e and <math>e_i</math> may include variable names bound to MappingSet domain components in P. Given that nested structures are not allowed in the model, if an expression SAMPLING(e) is used then the result relation has to be unnested.
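The conditional and aggregate expression forms and the SAMPLING-based intensional Domain can be approximated in Python as follows. This is an illustrative sketch under stated assumptions: all function names are hypothetical, relations are modeled as sets of tuples, branch values in `case` are evaluated eagerly, and SAMPLING is shown only for closed intervals over an integer base type.

```python
def case(branches, otherwise=None):
    """Conditional expression: value of the first branch whose condition holds."""
    for condition, value in branches:
        if condition:
            return value
    return otherwise


def aggregate_over(agg, relation, predicate, term):
    """Aggregate expression: agg applied to term(t) OVER {t in relation | predicate(t)}."""
    return agg(term(t) for t in relation if predicate(t))


def sampling(interval):
    """SAMPLING(e) for a closed interval [lo, hi] over an integer base type."""
    lo, hi = interval
    return list(range(lo, hi + 1))


def intensional_domain(relation, interval_of):
    """{(k, SAMPLING(interval_of(k))) | relation(k)}, unnested into flat tuples."""
    return {(k, s) for (k,) in relation for s in sampling(interval_of(k))}


if __name__ == "__main__":
    rows = {(1, "x"), (2, "x"), (5, "y")}
    print(aggregate_over(sum, rows, lambda t: t[1] == "x", lambda t: t[0]))
    print(sorted(intensional_domain({("a",), ("b",)}, lambda k: (1, 2))))
```

The unnesting step in `intensional_domain` reflects the remark that nested structures are not allowed: each sampled element yields its own flat tuple.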
===4. MODELING OBSERVATION DATA WAREHOUSES===
The data model described in this section captures observation data semantics and integrates them with spatial entities and coverages. An E-R extension is proposed in Subsection 4.1 to model observation metadata. The translation of such a conceptual model to the MappingSet based logical model is explained in Subsection 4.2.

====4.1 Conceptual Data Model====
Contrary to conventional metadata, which is recorded at the level of entity and property types, some observation metadata has to be recorded at the level of entity and property instances, i.e., combined with the data itself. This is the case, for example, of observation time instants and observation processes.

An extension of the E-R model is next proposed to incorporate spatial and observation data semantics in conceptual models. Spatial Entity types are represented in diagrams as conventional entities (see Figure 2(a)). Spatial Coverage Types are represented as entities tagged with the symbol C (see Figure 2(b)). Entity Types, either spatial or not, and Coverages whose whole data is obtained through an observation Process are tagged with either symbol TO if it is a Time-triggered Process or symbol EO if it is an Event-triggered Process (see Figure 2(c)). Relationships resulting from observation processes are tagged in the same way (see Figure 2(d)). Finally, properties of either Entity or Coverage types that are obtained through observation processes are also tagged with the same TO and EO symbols, as shown in Figure 2(e) for simple, complex and multivalued properties.

To illustrate the use of the above notation, the E-R diagram of a reduced running application example is given in Figure 3. Spatial Entity Type ICES records fishing zones defined by the International Council for the Exploration of the Sea (ICES). Spatial Coverage SST records Sea Surface Temperature at each location of the sea, produced daily by the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor installed in the Terra and Aqua NASA satellites. Entity type Vessel records data of fishing vessels, including an identifier (vesId) and its engine power. Vessels incorporate CTD sensors that provide triples of water conductivity, water temperature and depth. Every time a ctd observation is performed, a Non-Physical Process is executed that computes the difference with the value given by MODIS and provides it as a derived property difTem. Vessels also incorporate GPS sensors from which locations are obtained every 30 seconds. Entity type Species records data of fishing species, including an identifier speId, species name and an interval of temperature values where the fish feels comfortable (property comfort). The derived property comfGeo records the geometry of the area of the sea where comfortable temperatures for the fish are located. This property is obtained by a Non-Physical Process from the SST data. Property load of relationship catches records the values measured by the vessel bascule for each species. The authorized fishing capacity of a vessel is given by two parameters. The Fishing Effort gives a measure of the number of days, weighted by the vessel engine power, that the vessel may stay in each zone. Relationship Effort records both the initial quota and the available one (property stock). The available Fishing Effort stock is obtained by a Non-Physical Process using the quota and the vessel's GPS information. The Fishing Capacity gives the kilograms of each species that the vessel may get from each zone. Again, quota is recorded and stock is computed by a Non-Physical Process.

Figure 3: E-R Diagram of a Running Application Example.

The translation of the above model to the MappingSet logical model of the framework is explained in the following subsection.

====4.2 MappingSet Based Logical Model====
To support the implementation of the conceptual model of the previous section, observation metadata has to be added to the framework's catalog. Thus, the catalog contains metadata of the defined Mappings and MappingSets and metadata related to the various observation processes, including observation properties and features of interest. The E-R diagram of such catalog structures is given in Figure 4.

Figure 4: E-R Diagram of the Framework's Catalog.

Entity types MappingSet, DomComponent and Mapping record general metadata of the MappingSets. Entity type FOI records metadata of Features of Interest, and it references the MappingSet that records its data. FOIs that are fully generated by observation processes are registered in ObsFOI. The remaining FOIs, i.e., those that combine observed with non-observed properties, are represented by entity type NonObsFOI. Each observed property of such a FOI is represented by a weak entity of type ObsProperty, which references the MappingSet that records its data. Finally, ProcessType records metadata of the various types of observation processes registered in the framework. Metadata of each specific instance of each process type is recorded in weak entity type Process. Notice the difference between the process type "Vessel Bascule", which obtains values of the load property of relationship catches, and the specific bascule installed in each vessel, which must be referenced from each observation.

The rules that enable the transformation of the conceptual model of the previous section to MappingSets are given next. Each Entity Type, either Spatial or not, generates a relevant MappingSet, whose domain is defined by key properties and whose Mappings are defined by the remaining properties. See for example Entity Types Vessel, Species and ICES in Figure 3 and the relevant MappingSets in Figure 5. Each Spatial Coverage generates a MappingSet, whose domain has just one component of some Point2DSample type and whose Mappings are generated from coverage properties.
Each Relationship Type with many-to-many cardinalities generates a MappingSet whose domain is defined from the key properties of the participating Entity Types. Properties of those Relationship Types generate Mappings in such a MappingSet. See for example Relationship Types capacity and effort in Figure 3 and MappingSets Capacity and Effort in Figure 5. If an Entity, Coverage or Relationship Type is tagged with the symbol TO, then a component named obsTime of some TimeInstantSample(D,R) data type is added to the MappingSet Domain to enable the recording of observation time (see Footnote 2). Besides, a Mapping named process is also added to obtain the id of the process used to produce the observation. See for example Spatial Coverage Type SST in Figure 3 and the relevant MappingSet SST in Figure 5. If symbol EO is used instead, then the data type of component obsTime is some TimeInstant(D). See for example Relationship Type catches in Figure 3 and the relevant Catches MappingSet in Figure 5. In any of the above cases, an entity of type ObsFOI has to be added to the catalog with relevant relationships to its process type and MappingSet.

If a simple or complex property is tagged with symbol TO, then such property is not added as a Mapping to the relevant MappingSet. Instead, a separate MappingSet is created for the property, whose domain has components to reference the key of its Entity Type (FOI of the relevant observation) and a component named obsTime of some TimeInstantSample(D,R) type to record observation time. The property itself is added as a Mapping to the MappingSet and an additional Mapping named process is added to record the id of the process that generates the observation. An example is the loc property of Entity Type Vessel in Figure 3 and the relevant Vessel_loc MappingSet in Figure 5. If symbol EO is used instead, then the transformation is exactly the same except that Domain component obsTime is of some TimeInstant(D) type. For an example see the ctd property of Vessel Entity Type in Figure 3 and the relevant Vessel_ctd MappingSet in Figure 5. In any of the above cases an entity of type ObsProperty is added to the catalog, with appropriate references to its MappingSet, ProcessType and NonObsFOI.

Once the MappingSets are created and the required metadata are added to the catalog, the insertion of observation data may be started. ETL tasks are continuously executed to keep the data warehouse updated with the latest observation data, using extensional MappingSet definitions. Each observation is appended to the appropriate MappingSet with its observation time and a reference to its process and FOI.

Footnote 2. Currently we restrict to phenomenon time semantics; however, it can be extended with result time and other required metadata.

 MAPPINGSET Vessel
 DOMAIN
   vesId: CString
 MAPPINGS
   power(vesId:CString):Numeric(6,2)

 MAPPINGSET Vessel_loc
 DOMAIN
   obsTime: TimeInstantSample(0, 30),
   vesId: CString
 MAPPINGS
   loc(obsTime: TimeInstantSample(0, 30), vesId:CString):Point2D
   process(obsTime: TimeInstantSample(0, 30), vesId:CString):CString

 MAPPINGSET Vessel_ctd
 DOMAIN
   obsTime: TimeInstant(0),
   vesId: CString
 MAPPINGS
   cond(obsTime: TimeInstant(0), vesId:CString):Numeric(4,1)
   condUOM(obsTime: TimeInstant(0), vesId:CString):CString
   temp(obsTime: TimeInstant(0), vesId:CString):Numeric(5,2)
   tempUOM(obsTime: TimeInstant(0), vesId:CString):CString
   depth(obsTime: TimeInstant(0), vesId:CString):Numeric(5,2)
   depthUOM(obsTime: TimeInstant(0), vesId:CString):CString
   process(obsTime: TimeInstant(0), vesId:CString):CString

 MAPPINGSET Species
 DOMAIN
   speId: CString
 MAPPINGS
   name(speId:CString):CString
   comfort(speId:CString):Interval(Numeric(5,2))

 MAPPINGSET ICES
 DOMAIN
   zoneId: CString
 MAPPINGS
   geo(zoneId:CString):Polygon(Point2D(9,2))

 MAPPINGSET Capacity
 DOMAIN
   vessel: CString
   ices: CString
   species: CString
 MAPPINGS
   quota(vessel:CString, ices:CString, species:CString):Numeric(7,3)
   quotaUOM(vessel:CString, ices:CString, species:CString):CString

 MAPPINGSET Effort
 DOMAIN
   vessel: CString
   ices: CString
 MAPPINGS
   quota(vessel:CString, ices:CString):Numeric(7,3)
   quotaUOM(vessel:CString, ices:CString):CString

 MAPPINGSET Catches
 DOMAIN
   species: CString
   vessel: CString
   obsTime: TimeInstant(0)
 MAPPINGS
   load(species:CString, vessel:CString, obsTime:TimeInstant(0)): Numeric(7,3)
   loadUOM(species:CString, vessel:CString, obsTime:TimeInstant(0)): CString
   process(species:CString, vessel:CString, obsTime:TimeInstant(0)): CString

 MAPPINGSET SST
 DOMAIN
   loc:Point2DSample(9,2,100000)
   obsTime:Date
 MAPPINGS
   temp(loc:Point2DSample(9,2,100000), obsTime:Date):Numeric(5,2)
   tempUOM(loc:Point2DSample(9,2,100000), obsTime:Date):CString
   process(loc:Point2DSample(9,2,100000), obsTime:Date):CString

Figure 5: MappingSets for a Running Application Example.
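The ETL loading step of Section 4.2, in which each observation is appended to an extensionally defined MappingSet, might look as follows for Vessel_loc. This is only a sketch under assumptions not stated in the paper: the class and method names are hypothetical, time is given in whole seconds since 1/1/1970, and each 30-second sample is identified by the start of the interval that contains the observation (the "closest lower" sample).

```python
class VesselLoc:
    """Extensional store for a Vessel_loc-like MappingSet (illustrative only)."""

    def __init__(self, resolution=30):
        self.resolution = resolution  # seconds per time sample
        self.domain = set()           # tuples (obsTime, vesId)
        self.loc = {}                 # Mapping loc: (obsTime, vesId) -> point
        self.process = {}             # Mapping process: (obsTime, vesId) -> id

    def sample_of(self, t_seconds):
        """Start of the sample interval that contains t_seconds."""
        return (t_seconds // self.resolution) * self.resolution

    def append(self, t_seconds, ves_id, point, process_id):
        """Append one observation with its time sample and process reference."""
        key = (self.sample_of(t_seconds), ves_id)
        self.domain.add(key)
        self.loc[key] = point
        self.process[key] = process_id
        return key


if __name__ == "__main__":
    store = VesselLoc()
    # Made-up GPS reading for vessel "V1", 95 seconds after the epoch.
    print(store.append(95, "V1", (-8.5, 42.9), "gps-V1"))
```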
===5. DEFINITION OF SPATIO-TEMPORAL ANALYTICAL PROCESSES===
The capabilities provided by the framework for the intensional definition of MappingSets enable the specification of spatio-temporal analytical processes. These capabilities are now illustrated with some examples.

Example 1. Define a Non-Physical Process that obtains a derived observed property that computes the difference between the temperature measured by the CTD and the sea surface temperature produced for the same location by MODIS (see difTem derived property of Vessel in Figure 3).

 MAPPINGSET Vessel_difTem
 DOMAIN
   {(obsTime, vesId) | Vessel_ctd(obsTime, vesId)}
 MAPPINGS
   difTem(obsTime, vesId):=
     SST.temp(Vessel_loc.loc(obsTime, vesId), obsTime) −
     Vessel_ctd.temp(obsTime, vesId)
   difTemUOM(obsTime, vesId):=
     Vessel_ctd.tempUOM(obsTime, vesId)
   process(obsTime, vesId):= "difTemProcess"

In the expression above, automatic castings of spatial and temporal types are performed during the evaluations of Mappings Vessel_loc.loc and SST.temp.

Example 2. Define a Non-Physical Process that detects when a vessel leaves an ICES zone to enter a new one (see enters derived relationship in Figure 3).

 ICESFromLoc(loc):=
   singleton(zone)
   OVER {ICES(zone) ∧ within(loc, ICES.geo(zone))}

 MAPPINGSET enters
 DOMAIN
   {(vesId, ICESFromLoc(Vessel_loc.loc(obsTime, vesId)),
     obsTime) | Vessel_loc(obsTime, vesId) ∧
     ICESFromLoc(Vessel_loc.loc(obsTime, vesId)) <>
     ICESFromLoc(Vessel_loc.loc(predecessor(obsTime), vesId))}
 MAPPINGS
   process(vesId, zoneId, obsTime):= "entersProcess"

In the above expression, Mapping within(g1, g2) yields true if geometry g1 is within geometry g2. Aggregate function singleton(S) yields the element contained in the unitary set S. Finally, Mapping predecessor(ts) yields the time sample that precedes time sample ts in its data type.

Example 3. Define a Non-Physical Process that produces a measure of the remaining fishing effort for each vessel and ICES zone for each of the preceding 60 days. Consumed fishing effort is obtained from the temporal evolution of vessel location data and ICES zone geometries (see derived property stock of relationship type Effort in Figure 3).

 ICESFromLoc(loc):=
   singleton(zone)
   OVER {ICES(zone) ∧ within(loc, ICES.geo(zone))}

 consumed_effort(vesId, zoneId, obsTime) :=
   ((count(obsTime2)*30)/86400)*Vessel.power(vesId)
   OVER {Vessel_loc(obsTime2, vesId2) ∧
         obsTime2 < obsTime ∧ vesId2 = vesId ∧
         ICESFromLoc(Vessel_loc.loc(obsTime2, vesId2)) = zoneId}

 MAPPINGSET Effort_stock
 DOMAIN
   {(vesId, zoneId,
     SAMPLING([cast(difTime(now(), 60 Days) AS Date),
               cast(now() AS Date)])) | Effort(vesId, zoneId)}
 MAPPINGS
   stock(vesId, zoneId, obsTime):=
     Effort.quota(vesId, zoneId) −
     consumed_effort(vesId, zoneId, obsTime)
   stockUOM(vesId, zoneId, obsTime):=
     Effort.quotaUOM(vesId, zoneId)
   process(vesId, zoneId, obsTime):= "EffortStockProcess"

In the above expression, Mapping now() yields the current system time instant. Mapping difTime(t, i) subtracts time interval i from time instant t.
Define a Non-Physical Process that obtains Effort.quota(vesId, zoneId) − a derived observed property that computes the difference consumed effort(vesId, zoneId, obsTime) between the temperature measured by the CTD and the stockUOM (vesId, zoneId, obsTime):= sea surface temperature produced for the same location by Effort.quotaUOM(vesId, zoneId) process(vesId, zoneId, obsTime):= “EffortStockProcess” 2 Currently we restrict to phenomenon time semantics, how- In the above expression, Mapping now () yields the current sys- ever, it can be extended with result time and other required tem time instant. Mapping difTime(t, i) subtracts time interval metadata. i from time instant t. Proceedings IMMoA’13 667 http://www.dbis.rwth-aachen.de/IMMoA2013/ Example 4. Define a Non-Physical Process that obtains the [4] S. Bowers, J. Madin, and M. Schildhauer. A conceptual evolution with respect to to time during the last 7 days of the modeling framework for expressing observational data geometry of the comfort zone for each species. Comfort zone is semantics. In Q. Li, S. Spaccapietra, E. Yu, and A. Oliv, obtained from the temperature interval defined for each species editors, Conceptual Modeling - ER 2008, volume 5231 of and the sea surface temperature generated by MODIS (see derived Lecture Notes in Computer Science, pages 41–54. Springer property comfGeo of entity type species in Figure 3). Berlin Heidelberg, 2008. [5] P. G. Brown. Overview of scidb: large scale array storage, MAPPINGSET Species comfGeo processing and analysis. In Proceedings of the 2010 ACM DOMAIN SIGMOD International Conference on Management of {(speId, data, SIGMOD ’10, pages 963–968, New York, NY, USA, SAMPLING([cast(difTime(now(), 7 Days) AS Date), 2010. ACM. cast(now() AS Date)])) | Species(speId)} [6] J. a. P. Cerveira Cordeiro, G. Câmara, U. Moura MAPPINGS De Freitas, and F. Almeida. Yet another map algebra. comfGeo(speId, obsTime):= Geoinformatica, 13(2):183–202, June 2009. vectorize(loc) [7] I. 
Galpin, C. Brenninkmeijer, A. Gray, F. Jabeen, OVER { SST(loc, obsTime2) ∧ obsTime = obsTime2 ∧ A. Fernandes, and N. Paton. Snee: a query processor for within(SST.temp(loc, obsTime2), Species.comfort(speId))} wireless sensor networks. Distributed and Parallel process(vesId, zoneId, obsTime):= “ComfortZoneProcess” Databases, 29(1-2):31–85, 2011. In the above expression, aggregate function vectorize(loc) ob- [8] A. Gupta and I. S. Mumick. Materialized views. chapter tains the vector geometry that surrounds the set of sample loca- Maintenance of materialized views: problems, techniques, tions loc. Mapping within(e, i) yields true if element e is within and applications, pages 145–157. MIT Press, Cambridge, interval i. MA, USA, 1999. [9] R. H. Güting, M. H. Böhlen, M. Erwig, C. S. Jensen, N. A. Lorentzos, M. Schneider, and M. Vazirgiannis. A 6. CONCLUSIONS AND FURTHER WORK foundation for representing and querying moving objects. A data model and data management framework has been pro- ACM Trans. Database Syst., 25(1):1–42, Mar. 2000. posed spatio-temporal analysis of data in data warehouses of spa- [10] International Organization for Standardization (ISO). tial observation data. The approach consists of an E-R exten- Information technology – Database languages – SQL sion for observation data to be used at a conceptual level and multimedia and application packages – Part 3: Spatial. a new logical level model that combines logical and functional ISO/IEC 13249-3:2011, 2011. paradigms. The advantages of the approach can be summarized [11] N. Jain, S. Mishra, A. Srinivasan, J. Gehrke, J. Widom, as follows: H. Balakrishnan, U. Çetintemel, M. Cherniack, • General purpose observation data and metadata coexist R. Tibbetts, and S. Zdonik. Towards a streaming sql with application specific Spatial Entities and Coverages, standard. Proc. VLDB Endow., 1(2):1379–1390, Aug. 2008. enabling efficient analysis over the whole set. [12] M. Kersten, Y. Zhang, M. Ivanova, and N. Nes. 
Sciql, a query language for science applications. In Proceedings of • Few primitive Mappings combined with general purpose the EDBT/ICDT 2011 Workshop on Array Databases, AD logical and functional expressions enable the integrated man- ’11, pages 1–12, New York, NY, USA, 2011. ACM. agement of any kind of spatial and spatio-temporal data. [13] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and Besides, both data and analytical processing is unified un- W. Hong. Tinydb: an acquisitional query processing system der the well known mathematical concept of function. for sensor networks. ACM Trans. Database Syst., • Parametric temporal and spatial types enable the user to 30(1):122–173, Mar. 2005. have control over the precision and resolution of underlying [14] M. Neteler and H. Mitasova. Open Source GIS: A GRASS time and space representations. GIS Approach. Third edition. Springer, New York, USA, • Specific constructs for the specification of sampled and non- 2008. sampled domain components together with the absence of [15] Open Geospatial Consortium (OGC). OpenGIS Sensor nested structures simplifies efficient implementation. Model Language (SensorML) Implementation Specification, 2007. http://www.opengeospatial.org/standards/sensorml. Further work is mainly related to efficient implementation struc- [16] Open Geospatial Consortium (OGC). Geographic tures and algorithms and the extension of the framework to deal Information: Observations and Measurements. OGC with continuous queries on sensor data. Abstract Specification Topic 20, 2010. http://www.opengeospatial.org/standards/om. 7. ACKNOWLEDGMENTS [17] C. Parent, S. Spaccapietra, and E. Zimányi. Spatio-temporal conceptual models: data structures + This work has been partially supported by the Spanish Ministry space + time. In Proceedings of the 7th ACM international of Science and Innovation (TIN2010-21246-C02-02). symposium on Advances in geographic information systems, GIS ’99, pages 26–33, New York, NY, USA, 1999. 
ACM. 8. REFERENCES [18] N. Tryfona, R. Price, and C. Jensen. Chapter 3: [1] A. Arasu, S. Babu, and J. Widom. The cql continuous Conceptual models for spatio-temporal applications. In query language: semantic foundations and query execution. T. Sellis, M. Koubarakis, A. Frank, S. Grumbach, The VLDB Journal, 15(2):121–142, June 2006. R. Güting, C. Jensen, N. Lorentzos, Y. Manolopoulos, [2] P. Baumann, A. Dehmel, P. Furtado, R. Ritsch, and E. Nardelli, B. Pernici, B. Theodoulidis, N. Tryfona, H.-J. N. Widmann. The multidimensional database system Schek, and M. Scholl, editors, Spatio-Temporal Databases, rasdaman. In Proceedings of the 1998 ACM SIGMOD volume 2520 of Lecture Notes in Computer Science, pages international conference on Management of data, SIGMOD 79–116. Springer Berlin Heidelberg, 2003. ’98, pages 575–577, New York, NY, USA, 1998. ACM. [19] A. Vaisman and E. Zimányi. A multidimensional model [3] P. Baumann and S. Holsten. A comparative analysis of representing continuous fields in spatial data warehouses. array models for databases. In T.-h. Kim, H. Adeli, In Proceedings of the 17th ACM SIGSPATIAL A. Cuzzocrea, T. Arslan, Y. Zhang, J. Ma, K.-i. Chung, International Conference on Advances in Geographic S. Mariyam, and X. Song, editors, Database Theory and Information Systems, GIS ’09, pages 168–177, New York, Application, Bio-Science and Bio-Technology, volume 258 NY, USA, 2009. ACM. of Communications in Computer and Information Science, [20] J. Viqueira and N. Lorentzos. Sql extension for pages 80–89. Springer Berlin Heidelberg, 2011. spatio-temporal data. The VLDB Journal, 16(2):179–200, 2007. Proceedings IMMoA’13 678 http://www.dbis.rwth-aachen.de/IMMoA2013/
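As a supplementary illustration of Example 3 in Section 5, the arithmetic behind the consumed effort and stock Mappings can be sketched in Python. This is only a sketch under stated assumptions, not the framework's implementation: MappingSets are modeled as plain dictionaries, ICES zone geometries are replaced by hypothetical axis-aligned boxes, and the constant 30 in the original expression is assumed to be the vessel location sampling period in seconds (so that count*30/86400 is the number of days spent in the zone). All names below (ICES_GEO, ices_from_loc, consumed_effort, stock) are hypothetical.

```python
from datetime import datetime, timedelta

SAMPLE_PERIOD_S = 30     # assumption: the constant 30 is the sampling period in seconds
SECONDS_PER_DAY = 86400

# Hypothetical stand-in for ICES.geo: each zone is a box (min_x, min_y, max_x, max_y)
# rather than a real polygon.
ICES_GEO = {
    "zoneA": (0.0, 0.0, 10.0, 10.0),
    "zoneB": (10.0, 0.0, 20.0, 10.0),
}


def ices_from_loc(loc):
    """Mimic singleton(zone) OVER {ICES(zone) AND within(loc, ICES.geo(zone))}."""
    x, y = loc
    zones = [z for z, (x0, y0, x1, y1) in ICES_GEO.items()
             if x0 <= x < x1 and y0 <= y < y1]
    if len(zones) != 1:
        raise ValueError("location must fall in exactly one zone")
    return zones[0]


def consumed_effort(vessel_loc, power, ves_id, zone_id, obs_time):
    """((count * 30) / 86400) * power, counting samples of ves_id in zone_id before obs_time."""
    count = sum(
        1
        for (t, v), loc in vessel_loc.items()
        if v == ves_id and t < obs_time and ices_from_loc(loc) == zone_id
    )
    return (count * SAMPLE_PERIOD_S / SECONDS_PER_DAY) * power[ves_id]


def stock(quota, vessel_loc, power, ves_id, zone_id, obs_time):
    """Remaining effort: quota minus the effort consumed before obs_time."""
    return quota[(ves_id, zone_id)] - consumed_effort(
        vessel_loc, power, ves_id, zone_id, obs_time)


# Toy data: one day of 30-second samples in zoneA, then half a day in zoneB.
start = datetime(2013, 1, 1)
vessel_loc = {}
for i in range(2880):
    vessel_loc[(start + timedelta(seconds=30 * i), "v1")] = (5.0, 5.0)
for i in range(2880, 4320):
    vessel_loc[(start + timedelta(seconds=30 * i), "v1")] = (15.0, 5.0)

power = {"v1": 100.0}
quota = {("v1", "zoneA"): 500.0, ("v1", "zoneB"): 500.0}
end = start + timedelta(days=2)

print(stock(quota, vessel_loc, power, "v1", "zoneA", end))
```

The sketch evaluates the Mapping for a single time instant; the SAMPLING construct in the DOMAIN clause would correspond to calling stock once per day of the 60-day window.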