The Ontological Multidimensional Data Model
                                          (extended abstract)
                          Leopoldo Bertossi* and Mostafa Milani**


         Abstract. We briefly present OMD, a model of multidimensional data
         that uses ontologies written in Datalog±, an extension of the classical
         declarative language Datalog for relational databases.


    We present the Ontological Multidimensional Data Model (OMD) as an
ontological, Datalog±-based [3] extension of the Hurtado-Mendelzon (HM) model
for multidimensional data [5].
    Owing to space limitations, we use a running example to illustrate the main
elements of an OMD model.
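
[Figure 1: diagram summarized from its labels. Left: the Hospital dimension, with
categories Ward, Unit, Institution and AllHospital (bottom to top). Right: the
Temporal dimension, with categories Time, Day, Month, Year and AllTemporal
(bottom to top). Middle: the two categorical relations below, annotated with the
labels η, σ1 and σ2. Rows marked (shaded) appear shaded in the figure.]

WorkSchedules
  Unit        Day           Nurse   Specialization
  Terminal    Sep/5/2016    Cathy   Cardiac Care
  Intensive   Nov/12/2016   Alan    Critical Care
  Standard    Sep/6/2016    Helen   ?                (shaded)
  Intensive   Aug/21/2016   Sara    ?                (shaded)

Shifts
  Ward   Day           Nurse   Shift
  W4     Sep/5/2016    Cathy   Noon
  W1     Sep/6/2016    Helen   Morning
  W3     Nov/12/2016   Alan    Evening
  W3     Aug/21/2016   Sara    Noon
  W2     Sep/6/2016    Helen   ?         (shaded)

Fig. 1. An OMD model with categorical relations, dimensional rules, and constraints
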
    An OMD model has a database schema R^M = H ∪ R^c, where H is a relational
schema with multiple dimensions, with sets K of unary category predicates, and
sets L of binary, child-parent predicates; and R^c is a set of categorical predicates.
Example: Figure 1 shows Hospital and Temporal dimensions. The instance of the
former is listed after this example. K contains predicates Ward(·), Unit(·),
Institution(·), etc. Instance D^H gives them extensions, e.g.
Ward = {W1, W2, W3, W4}. L contains, e.g. WardUnit(·, ·), with extension:
WardUnit = {(W1, standard), (W2, standard), (W3, intensive), (W4, terminal)}.
In the middle of Figure 1, categorical relations are associated to dimension
categories.

Hospital dimension instance (members per category):
  AllHospital:  allHospital
  Institution:  H1, H2
  Unit:         standard, intensive, terminal
  Ward:         W1, W2, W3, W4
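
To make the example concrete, the Hospital dimension instance can be written down as plain data. The following is a minimal Python sketch, not part of the OMD formalism; the dictionary layout and the helper names parents and children are our own illustrative assumptions.

```python
# Category predicates (the set K) with the extensions given in the example.
categories = {
    "Ward": {"W1", "W2", "W3", "W4"},
    "Unit": {"standard", "intensive", "terminal"},
    "Institution": {"H1", "H2"},
    "AllHospital": {"allHospital"},
}

# Child-parent predicate WardUnit(ward, unit), a member of L, as a set of pairs.
ward_unit = {
    ("W1", "standard"),
    ("W2", "standard"),
    ("W3", "intensive"),
    ("W4", "terminal"),
}


def parents(child, child_parent):
    """Roll-up: members that `child` is linked to one level up."""
    return {p for (c, p) in child_parent if c == child}


def children(parent, child_parent):
    """Drill-down: members linked to `parent` one level down."""
    return {c for (c, p) in child_parent if p == parent}


print(parents("W3", ward_unit))
print(sorted(children("standard", ward_unit)))
```

Here W3 rolls up to the intensive unit, and the standard unit drills down to wards W1 and W2.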

*  Carleton Univ., School of Computer Science, Canada. bertossi@scs.carleton.ca
** McMaster Univ., Dept. Computing and Software, Canada. mmilani@mcmaster.ca
    Attributes of categorical predicates are either categorical, whose values are
members of dimension categories, or non-categorical, taking values from arbitrary
domains. Categorical predicates are represented in the form
R(C_1, ..., C_m; N_1, ..., N_n), with categorical attributes before “;” and
non-categorical attributes after it.
    The extensional data, i.e. the instance for the schema R^M, is I^M = D^H ∪ I^c,
where D^H is a complete instance for the dimensional subschema H containing
the category and child-parent predicates; and sub-instance I^c contains possibly
partial, incomplete extensions for the categorical predicates, i.e. those in R^c.
    Schema R^M comes with basic, application-independent semantic constraints,
listed below.
1. Dimensional child-parent predicates must take their values from categories.
Accordingly, if child-parent predicate P ∈ L is associated to category predicates
K, K' ∈ K, in this order, we introduce inclusion dependencies (IDs) as Datalog±
negative constraints (ncs): P(x, x'), ¬K(x) → ⊥, and P(x, x'), ¬K'(x') → ⊥.
(The ⊥ symbol denotes an always false propositional atom.) We do not represent
them as Datalog± tuple-generating dependencies (tgds) P(x, x') → K(x),
etc., because we reserve tgds for possibly incomplete predicates (in their RHSs).
2. Key constraints on dimensional child-parent predicates P ∈ L, as
equality-generating dependencies (egds): P(x, x_1), P(x, x_2) → x_1 = x_2.
3. The connections between categorical attributes and the category predicates
are specified by means of ncs. For a categorical predicate R, we use the nc
R(x̄; ȳ), ¬K(x) → ⊥, where x ∈ x̄ takes values in category K.
Example: The categorical attributes Unit and Day of categorical predicate
WorkingSchedules(Unit, Day; Nurse, Specialization) in R^c are connected to the
Hospital and Temporal dimensions, resp., as captured by the IDs
WorkingSchedules[1] ⊆ Unit[1] and WorkingSchedules[2] ⊆ Day[1]. The former is
written in Datalog± as WorkingSchedules(u, d; n, t), ¬Unit(u) → ⊥. For the
Hospital dimension, one of the IDs for predicate WardUnit is WardUnit[2] ⊆ Unit[1],
which is expressed by the nc: WardUnit(w, u), ¬Unit(u) → ⊥. The key constraint of
WardUnit is captured by the egd: WardUnit(w, u), WardUnit(w, u') → u = u'.
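
The basic constraints above can be checked mechanically on the running example. The following Python sketch is only an illustration under our own assumptions: relations are sets of tuples, and the function names nc_violations and egd_key_violations are hypothetical. It looks for witnesses of the body of an nc, and for children with two parents in a child-parent predicate.

```python
ward = {"W1", "W2", "W3", "W4"}
unit = {"standard", "intensive", "terminal"}
ward_unit = {
    ("W1", "standard"), ("W2", "standard"),
    ("W3", "intensive"), ("W4", "terminal"),
}
# Non-shaded WorkingSchedules tuples of Figure 1:
# (Unit, Day; Nurse, Specialization).
working_schedules = {
    ("terminal", "sep/5/2016", "cathy", "cardiac care"),
    ("intensive", "nov/12/2016", "alan", "critical care"),
}


def nc_violations(tuples, position, category):
    """Tuples whose value at `position` is not a member of `category`,
    i.e. witnesses for the body of an nc of the form R(..x..), ¬K(x) → ⊥."""
    return {t for t in tuples if t[position] not in category}


def egd_key_violations(child_parent):
    """Children with more than one parent, violating the egd
    P(x, x_1), P(x, x_2) → x_1 = x_2."""
    seen = {}
    for child, parent in child_parent:
        seen.setdefault(child, set()).add(parent)
    return {c for c, ps in seen.items() if len(ps) > 1}


# WardUnit[1] ⊆ Ward[1], WardUnit[2] ⊆ Unit[1], WorkingSchedules[1] ⊆ Unit[1].
print(nc_violations(ward_unit, 0, ward))
print(nc_violations(ward_unit, 1, unit))
print(nc_violations(working_schedules, 0, unit))
print(egd_key_violations(ward_unit))
```

On the data of the running example all four checks return the empty set, so the instance satisfies these constraints.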
    The OMD model allows us to build multidimensional ontologies, O^M. In
addition to an instance I^M for a schema R^M, they include the set Ω^M of basic
constraints as in 1.-3. above, a set Σ^M of dimensional rules (those in 4. below),
and a set κ^M of dimensional constraints (in 5. below). All of them are expressed
in the Datalog± language associated to schema R^M; the last two sets are
application-dependent.
4. Dimensional rules as Datalog± tgds:
R_1(x̄_1; ȳ_1), ..., R_n(x̄_n; ȳ_n), P_1(x_1, x'_1), ..., P_m(x_m, x'_m) → ∃ȳ' R_k(x̄_k; ȳ).
Here, the R_i(x̄_i; ȳ_i) are categorical predicates, the P_i are child-parent
predicates, ȳ' ⊆ ȳ, x̄_k ⊆ x̄_1 ∪ ... ∪ x̄_n ∪ {x_1, ..., x_m, x'_1, ..., x'_m}, and
ȳ ∖ ȳ' ⊆ ȳ_1 ∪ ... ∪ ȳ_n; repeated variables in bodies (join variables) appear only
in categorical positions in categorical relations and in child-parent predicates.
Existential variables appear only in non-categorical attributes.
5. Dimensional constraints, as egds or ncs:
R_1(x̄_1; ȳ_1), ..., R_n(x̄_n; ȳ_n), P_1(x_1, x'_1), ..., P_m(x_m, x'_m) → z = z', and
R_1(x̄_1; ȳ_1), ..., R_n(x̄_n; ȳ_n), P_1(x_1, x'_1), ..., P_m(x_m, x'_m) → ⊥.
Here, R_i ∈ R^c, P_j ∈ L, and z, z' ∈ ⋃ x̄_i ∪ ⋃ ȳ_j. Some of the lists in the
bodies may be empty, i.e. n = 0 or m = 0, which also allows us to represent
classical constraints on categorical relations, e.g. keys or FDs.
Example: The left-hand side of Figure 1 shows dimensional constraint η on
categorical relation WorkingSchedules, which is linked to the Temporal dimension
via the Day category. It says: “No personnel were working in the Intensive care
unit in January”, i.e. η: WorkingSchedules(intensive, d; n, s), DayMonth(d, jan) → ⊥.
    Dimensional tgd σ1 in Figure 1, given by Shifts(w, d; n, s), WardUnit(w, u)
→ ∃t WorkingSchedules(u, d; n, t), says that “If a nurse has shifts in a ward
on a specific day, he/she has a working schedule in the unit of that ward on
the same day”. The use of σ1 generates, from the Shifts relation, new tuples
for relation WorkingSchedules, with null values for the Specialization attribute,
due to the existential variable. Existential rules like this (and also egds and ncs)
make us depart from classic Datalog, taking us into Datalog±. Relation
WorkingSchedules may be incomplete, and new (possibly virtual) entries can be
produced for it, e.g. the shaded ones showing Helen and Sara working for the
Standard and Intensive units, resp. This is done by upward-navigation and data
propagation through the dimension hierarchy. Constraint η is expected to be
satisfied both by the initial extensional tuples for WorkingSchedules and by the
tuples generated through σ1, i.e. by its non-shaded and shaded tuples in Figure 1,
resp. In this example, η is satisfied.
    Notice that WorkingSchedules refers to the Day category of the Temporal
dimension, whereas η involves the Month category. Then, checking η requires
upward-navigation through the Temporal dimension. The Hospital dimension is
also involved in the satisfaction of η: The tgd σ1 may generate new tuples for
WorkingSchedules, by upward-navigation from Ward to Unit.
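
The interplay of σ1 and η can be sketched in a few lines of Python. This is a simplified illustration, not the chase procedure of the cited papers: a tuple is added only when no WorkingSchedules tuple with the same unit, day and nurse exists yet, nulls are encoded as strings with a reserved prefix, and the DayMonth extension is our own assumption, since the text only fixes the constant jan.

```python
import itertools

_null_ids = itertools.count(1)


def fresh_null():
    """Return a new labeled null, a placeholder for an unknown value."""
    return f"_null{next(_null_ids)}"


def is_null(value):
    return isinstance(value, str) and value.startswith("_null")


# WardUnit as a dict: legitimate because of its key egd (one unit per ward).
ward_unit = {"W1": "standard", "W2": "standard",
             "W3": "intensive", "W4": "terminal"}

# Non-shaded tuples of Figure 1.
shifts = [
    ("W4", "sep/5/2016", "cathy", "noon"),
    ("W1", "sep/6/2016", "helen", "morning"),
    ("W3", "nov/12/2016", "alan", "evening"),
    ("W3", "aug/21/2016", "sara", "noon"),
]
working_schedules = [
    ("terminal", "sep/5/2016", "cathy", "cardiac care"),
    ("intensive", "nov/12/2016", "alan", "critical care"),
]

# ASSUMPTION: DayMonth extension, with months named like the constant jan.
day_month = {
    "sep/5/2016": "sep", "sep/6/2016": "sep",
    "nov/12/2016": "nov", "aug/21/2016": "aug",
}


def apply_sigma1(shifts, ward_unit, working_schedules):
    """sigma1: Shifts(w,d;n,s), WardUnit(w,u) → ∃t WorkingSchedules(u,d;n,t)."""
    result = list(working_schedules)
    for w, d, n, _s in shifts:
        u = ward_unit[w]  # upward navigation from Ward to Unit
        if not any(t[:3] == (u, d, n) for t in result):
            result.append((u, d, n, fresh_null()))
    return result


def eta_violations(working_schedules, day_month):
    """Tuples matching the body of eta: intensive unit on a day in January."""
    return [t for t in working_schedules
            if t[0] == "intensive" and day_month.get(t[1]) == "jan"]


ws = apply_sigma1(shifts, ward_unit, working_schedules)
generated = [t for t in ws if is_null(t[3])]
print(generated)
print(eta_violations(ws, day_month))
```

The two generated tuples correspond to the shaded entries for Helen and Sara, and the list of η violations is empty, in line with the discussion above.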
    Furthermore, we have an additional tgd σ2 that can be used with
WorkingSchedules to generate data for categorical relation Shifts (the shaded tuple
in it is one of them): σ2: WorkingSchedules(u, d; n, t), WardUnit(w, u) →
∃s Shifts(w, d; n, s). It reflects the institutional guideline stating that “If a nurse
works in a unit on a specific day, he/she has shifts in every ward of that unit on
the same day”. Accordingly, σ2 relies on downward-navigation for tuple
generation, from the Unit category level down to the Ward category level.
    If we have a categorical relation Therm(Ward, Thertype; Nurse), with Ward
and Thertype categorical attributes (the latter for an Instrument dimension), the
following is an egd saying that “All thermometers in a unit are of the same type”:
Therm(w, t; n), Therm(w', t'; n'), WardUnit(w, u), WardUnit(w', u) → t = t'.
    Notice that our ontological language allows us to impose a condition at the
Unit level without having it as an attribute in the categorical relation. The
existential variables in dimensional rules, such as t in σ1 and s in σ2, make up
for the missing, non-categorical attributes Specialization and Shift in
WorkingSchedules and Shifts, resp.
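
As an illustration of a condition imposed at the Unit level on a relation that only stores wards, the thermometer egd can be checked as follows. This is a hedged sketch: the Therm tuples and the thermometer type names are hypothetical, invented only for this example, and the function name therm_egd_violations is ours.

```python
from itertools import combinations

# WardUnit as a dict, relying on its key egd (one unit per ward).
ward_unit = {"W1": "standard", "W2": "standard",
             "W3": "intensive", "W4": "terminal"}

# HYPOTHETICAL Therm(Ward, Thertype; Nurse) tuples, not taken from the paper.
therm = [
    ("W1", "oral", "helen"),
    ("W2", "oral", "helen"),
    ("W3", "tympanic", "alan"),
]


def therm_egd_violations(therm, ward_unit):
    """Pairs of Therm tuples whose wards share a unit but whose types differ,
    i.e. matches of the egd body for which t = t' fails."""
    return [
        (a, b)
        for a, b in combinations(therm, 2)
        if ward_unit[a[0]] == ward_unit[b[0]] and a[1] != b[1]
    ]


print(therm_egd_violations(therm, ward_unit))
```

The join with WardUnit happens inside the check, so Unit never has to appear as an attribute of Therm.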
    Dimensional tgds can be used for upward- or downward-navigation (or data
generation) depending on the joins in the body. A one-step direction is
determined by the difference of levels of the dimension categories appearing (as
attributes) in the joins. Multi-step navigation, between a category and an ancestor
or descendant category, can be captured through a chain of joins with adjacent
child-parent dimensional predicates in the body of a tgd, e.g. propagating doctors
at the ward level all the way up to the hospital level: WardDoc(ward; na, sp),
WardUnit(ward, unit), UnitInst(unit, ins) → HospDoc(ins; na, sp).
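
The chain of joins in such a multi-step rule amounts to composing the child-parent mappings. The sketch below illustrates this under explicit assumptions: the UnitInst links and the WardDoc tuples are hypothetical, since the text names the institutions H1 and H2 but does not list these extensions.

```python
ward_unit = {"W1": "standard", "W2": "standard",
             "W3": "intensive", "W4": "terminal"}

# HYPOTHETICAL UnitInst links between units and institutions.
unit_inst = {"standard": "H1", "intensive": "H1", "terminal": "H2"}

# HYPOTHETICAL WardDoc(Ward; Name, Specialization) tuples.
ward_doc = [
    ("W1", "mark", "cardiology"),
    ("W4", "jane", "oncology"),
]


def hosp_doc(ward_doc, ward_unit, unit_inst):
    """WardDoc(ward; na, sp), WardUnit(ward, unit), UnitInst(unit, ins)
    → HospDoc(ins; na, sp): two roll-up steps chained in one rule body."""
    return {
        (unit_inst[ward_unit[ward]], na, sp)
        for ward, na, sp in ward_doc
    }


print(sorted(hosp_doc(ward_doc, ward_unit, unit_inst)))
```

No existential variable is involved in this rule, so no nulls are generated; the head tuples only reuse values from the body.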
Example: Rule σ2 supports downward tuple-generation. When enforcing it on a
tuple WorkingSchedules(u, d; n, t), via category member u (for Unit), a tuple for
Shifts is generated for each child w of u in the Ward category for which the body
of σ2 is true. For example, chasing σ2 with the third tuple in WorkingSchedules
generates two new tuples in Shifts: Shifts(W2, sep/6/2016, helen, ζ) and
Shifts(W1, sep/6/2016, helen, ζ'), with fresh nulls, ζ and ζ'. The latter tuple is not
shown in Figure 1 (it is dominated by the existing tuple Shifts(W1, sep/6/2016,
helen, morning) in Shifts). With the old and new tuples we obtain the answers to
the query about Helen’s wards on Sep/6/2016:
Q'(w): ∃s Shifts(w, sep/6/2016, helen, s). They are W1 and W2.
    In contrast, the join between Shifts and WardUnit in σ1 enables
upward-navigation, and generates only one tuple for WorkingSchedules from
each tuple in Shifts, because each Ward member has at most one Unit parent.
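
The downward step with σ2 and the query Q' can be reproduced with a short Python sketch. It is an illustration under our own assumptions: nulls are strings with a reserved prefix, and one tuple is generated per child ward without checking whether an existing tuple already dominates it, which matches the two tuples mentioned in the example.

```python
import itertools

_null_ids = itertools.count(1)


def fresh_null():
    """Return a new labeled null, a placeholder for an unknown value."""
    return f"_null{next(_null_ids)}"


ward_unit = {
    ("W1", "standard"), ("W2", "standard"),
    ("W3", "intensive"), ("W4", "terminal"),
}

# Non-shaded Shifts tuples of Figure 1.
shifts = [
    ("W4", "sep/5/2016", "cathy", "noon"),
    ("W1", "sep/6/2016", "helen", "morning"),
    ("W3", "nov/12/2016", "alan", "evening"),
    ("W3", "aug/21/2016", "sara", "noon"),
]

# Third tuple of WorkingSchedules, whose Specialization is a null.
ws_tuple = ("standard", "sep/6/2016", "helen", fresh_null())


def apply_sigma2(ws_tuple, ward_unit):
    """sigma2: WorkingSchedules(u,d;n,t), WardUnit(w,u) → ∃s Shifts(w,d;n,s).
    One Shifts tuple with a fresh null per child ward of the unit."""
    u, d, n, _t = ws_tuple
    return [(w, d, n, fresh_null())
            for (w, parent) in sorted(ward_unit) if parent == u]


def q_prime(shifts, day, nurse):
    """Q'(w): ∃s Shifts(w, day, nurse, s)."""
    return {w for (w, d, n, _s) in shifts if d == day and n == nurse}


generated = apply_sigma2(ws_tuple, ward_unit)
answers = q_prime(shifts + generated, "sep/6/2016", "helen")
print(generated)
print(sorted(answers))
```

The query only projects on the ward, so the generated tuple for W1 does not change the answers, while the one for W2 contributes a new answer.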

    We can see that the OMD data model is an ontological model that goes far
beyond classical multidimensional data models. For example, the HM model [5],
which is subsumed by OMD, does not include general tgds, egds, or ncs. Starting
from our relational reconstruction of the HM model, all these elements, plus
the data and queries, are seamlessly integrated into a uniform logico-relational
framework. OMD supports general, possibly incomplete categorical relations,
and not only complete “fact tables” linked to base (or bottom) categories.
    Furthermore, the constraints considered in the HM model are specific to
the dimensional structure of data, most prominently, to guarantee
summarizability (i.e. correct aggregation, avoiding double-counting). Specifically,
we find constraints enforcing strictness and homogeneity [5]. The former requires
that every category element rolls up to a single element in a parent category, which
in OMD can be expressed by egds. The latter requires that category elements
have parent elements in parent categories, which in OMD can be expressed by
tgds. (Cf. [10, sec. 4.3] for more details.)
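
Both conditions are easy to test on a child-parent relation. The sketch below is only an illustration; the function names are hypothetical, and reading homogeneity as a rule of the form K(x) → ∃x' P(x, x') is our own interpretation rather than a formula taken from the text.

```python
ward = {"W1", "W2", "W3", "W4"}
ward_unit = {
    ("W1", "standard"), ("W2", "standard"),
    ("W3", "intensive"), ("W4", "terminal"),
}


def is_strict(child_parent):
    """Strictness: no child has two different parents. For a set of pairs,
    this holds exactly when each child occurs in a single pair."""
    pairs = set(child_parent)
    return len({c for c, _ in pairs}) == len(pairs)


def missing_parents(child_category, child_parent):
    """Homogeneity check: members of the child category without any parent."""
    return child_category - {c for c, _ in child_parent}


print(is_strict(ward_unit))
print(missing_parents(ward, ward_unit))
```

On the Hospital dimension of the running example, the Ward-Unit step is strict and no ward lacks a parent unit.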
    The OMD model enables ontology-based data access (OBDA) [6] and allows
for the tight integration of conceptual models (e.g. an ER model expressed in
logical terms) and the relational model of data, while representing and using
dimensional structures and data. Cf. [7, 2] for applications of the OMD model
to quality data specification and extraction.
    The ontologies of the OMD model have good computational properties [2, 7].
Actually, they belong to the class of weakly-sticky Datalog± programs [4], for
which conjunctive query answering (CQA) can be done in polynomial time in
the size of the data. Algorithms for CQA have been proposed [8, 9], as well as
optimizations thereof [8] with magic-sets techniques [1].
Acknowledgements: Research supported by NSERC Discovery Grant #06148.
References
 [1] Alviano, M., Leone, N., Manna, M., Terracina, G. and Veltri, P. Magic-Sets for
     Datalog with Existential Quantifiers. Proc. Datalog 2.0, Springer LNCS 7494,
     2012, pp. 31-43.
 [2] Bertossi, L. and Milani, M. Ontological Multidimensional Data Models and
     Contextual Data Quality. Journal submission, 2017. Posted as CoRR arXiv paper
     cs.DB/1704.00115.
 [3] Calì, A., Gottlob, G. and Lukasiewicz, T. Datalog±: A Unified Approach to
     Ontologies and Integrity Constraints. Proc. ICDT, 2009, pp. 14-30.
 [4] Calì, A., Gottlob, G. and Pieris, A. Towards more Expressive Ontology Languages:
     The Query Answering Problem. Artificial Intelligence, 2012, 193:87-128.
 [5] Hurtado, C. and Mendelzon, A. OLAP Dimension Constraints. Proc. PODS,
     2002, pp. 169-179.
 [6] Lenzerini, M. Ontology-Based Data Management. Proc. AMW 2012, CEUR
     Proceedings, Vol. 866, pp. 12-15.
 [7] Milani, M. and Bertossi, L. Ontology-Based Multidimensional Contexts with
     Applications to Quality Data Specification and Extraction. Proc. RuleML,
     Springer LNCS 9202, 2015, pp. 277-293.
 [8] Milani, M. and Bertossi, L. Extending Weakly-Sticky Datalog±: Query-Answering
     Tractability and Optimizations. Proc. RR, Springer LNCS 9898, 2016,
     pp. 128-143.
 [9] Milani, M., Bertossi, L. and Calì, A. A Hybrid Approach to Query Answering
     under Expressive Datalog±. Proc. RR, Springer LNCS 9898, 2016, pp. 144-158.
[10] Milani, M. Multidimensional Ontologies for Contextual Quality Data
     Specification and Extraction. PhD Thesis, Carleton University, January 2017.
     http://people.scs.carleton.ca/~bertossi/papers/mostafaFinal.pdf