=Paper=
{{Paper
|id=Vol-2083/paper-11
|storemode=property
|title=Guided Query Composition with Semantic OLAP Patterns
|pdfUrl=https://ceur-ws.org/Vol-2083/paper-11.pdf
|volume=Vol-2083
|authors=Ilko Kovacic,Christoph G. Schuetz,Simon Schausberger,Roman Sumereder,Michael Schrefl
|dblpUrl=https://dblp.org/rec/conf/edbt/KovacicSSSS18
}}
==Guided Query Composition with Semantic OLAP Patterns==
Guided Query Composition with Semantic OLAP Patterns∗
Ilko Kovacic, Johannes Kepler University Linz, Linz, Austria, ilko.kovacic@jku.at
Christoph G. Schuetz, Johannes Kepler University Linz, Linz, Austria, christoph.schuetz@jku.at
Simon Schausberger, Johannes Kepler University Linz, Linz, Austria, simon.schausberger@jku.at
Roman Sumereder, Johannes Kepler University Linz, Linz, Austria, roman.sumereder@jku.at
Michael Schrefl, Johannes Kepler University Linz, Linz, Austria, michael.schrefl@jku.at
ABSTRACT

Enabling domain experts to independently compose ad hoc OLAP queries is the primary goal of semantic OLAP (semOLAP) patterns. In this respect, a semOLAP pattern represents a recurring domain-independent OLAP query by describing the application scope and defining the structure of the query using formal pattern elements (FPEs). Such a semOLAP pattern is executable: In order to execute a semOLAP pattern, the user instantiates the pattern by providing FPE bindings. In this paper, we propose an approach for guided query composition which considers the inherent query structure in order to determine a navigation flow and recommend possible bindings for the corresponding FPEs. Guidance supports both existing as well as future, currently unidentified semOLAP patterns. The presented approach has been implemented in the course of a collaborative research project between industry and academia on precision dairy farming.

1 INTRODUCTION

Data warehousing and online analytical processing (OLAP) facilitate data-driven decision making, allowing domain experts to make rational decisions. A data warehouse organizes data in a multidimensional space (data cube). Each point in such a multidimensional space represents an occurrence of a business event (fact) which is quantified by measures. Hierarchically organized dimensions support the aggregation of facts along a hierarchy of granularity levels, e.g., day to month, city to county.

Standardized reports provide access to data warehouses in order to satisfy the domain expert’s information needs. These reports are usually not static but rather support the specification of selection criteria restricting only one dimension (slice) or multiple dimensions (dice). Each report executes a predefined underlying query, an OLAP query, to retrieve the required information. Reports, however, can only satisfy about 60-80% of the information needs [6, p. 19]. Satisfying the remaining information needs requires the composition of ad hoc OLAP queries.

In order to compose ad hoc OLAP queries, domain experts must have knowledge about the underlying schema and the employed query language. Domain experts, however, typically lack the required knowledge and, therefore, must rely on assistance for ad hoc OLAP query composition.

In the course of several industrial research projects, such as semCockpit [9] and agriProKnow [13], we have noticed that, independent of a specific domain, ad hoc OLAP queries follow certain recurring patterns. In previous work we have hence identified and semantically described those patterns, leading to semantic OLAP (semOLAP) patterns (for details see [10]). A semOLAP pattern comprises a structural definition using formal pattern elements (FPEs) as well as a textual description including a concise name, the analysis situation that the pattern can be applied in, the instructions to follow, and an example.

The semOLAP pattern approach is followed in the agriProKnow project to support precision dairy farming. The aim of precision dairy farming is to exploit data generated by agricultural cyber-physical systems to improve the overall health of the herd through early diagnosis and prevention of diseases [2]. In the course of the agriProKnow project animals are tracked by milk robots measuring milk yield and milk components, veterinarians capturing the animals’ health state, smart ear tags tracking animal movement, and micro-climate sensors capturing environmental conditions. These data are transformed and loaded into a data warehouse which allows animals to be compared across different farm sites. The data warehouse is accessed by domain experts such as veterinarians and farmers, allowing data-driven decision making. For example, a domain expert who wants to compare the milk yield of all young cows with the milk yield of all cows of the farm site Kremesberg per date starts with the selection of a suitable semOLAP pattern, i.e., the homogeneous set-base comparison pattern, which allows a subset to be compared with its base set. The selected pattern is then instantiated by considering the domain expert’s information need. Finally, an OLAP query is generated based on a pattern instance in order to retrieve the required information (see Fig. 1).

The domain expert provides values for a semOLAP pattern’s FPEs during the instantiation. The values provided for the FPEs must satisfy constraints regarding the pattern structure, the schema, and existing FPE bindings. To guide the domain expert in this difficult task, an interactive interface is required which allows the domain expert to refine the semOLAP pattern instance by navigating from one FPE to another. A domain expert should receive recommendations of possible FPE value bindings to ease the process of instantiation and to avoid the generation of possibly invalid queries by enforcing existing constraints. The guidance approach should facilitate instantiation of all semOLAP patterns, even those which are not yet defined. Therefore, the approach should be specific enough to incorporate pattern-specific characteristics yet general enough to support all existing and future semOLAP patterns.

∗ This research was conducted as part of the agriProKnow project (http://www.agriProKnow.com/), funded by the Austrian Federal Ministry of Transport, Innovation and Technology (BMVIT) under the program “Production of the Future” between 11/2015 and 01/2018, Grant No. 848610.

© 2018 Copyright held by the owner/author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna, Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.
In this paper we propose a guidance approach for semOLAP pattern instantiation. After selection of a semOLAP pattern, the user is guided through the steps of the instantiation process following a navigation flow. This navigation flow connects all activities required to instantiate the FPEs. For each FPE, possible bindings are recommended to the user. To this end, the guidance approach relies on a semOLAP knowledge graph, which contains knowledge about the pattern structure, the underlying schema, and the bindings of already instantiated FPEs. The semOLAP knowledge graph enables the recommendation of bindings which are suitable for specific FPEs. The user can select values as bindings during the instantiation of the pattern without deeper knowledge of the underlying schema, dependencies between query elements, and the query language. After all FPE bindings are specified by the user, the instantiation process is finished and a corresponding query is generated in order to retrieve the required information. The implementation of the data warehouse employs a relational database where different fact classes are stored as fact tables. The semOLAP knowledge graph is represented in Resource Description Framework (RDF) format. The interaction flow modelling language (IFML)¹ and the WebRatio² platform are used for the implementation, supporting a model-driven and data-centric development.

¹ http://www.omg.org/ifml/
² http://www.webratio.com

[Figure 1: Pattern instantiation guidance activities. Browse Pattern, Select Pattern, Instantiate Pattern, Generate Query, Retrieve Result. Instantiate Pattern repeats Get next not instantiated FPE and Instantiate FPE (Gather current FPE information, Handle FPE dependencies, Check current binding context) while uninstantiated FPEs exist, drawing on Semantic Schema Knowledge, Pattern Knowledge, and Binding Knowledge.]

The remainder of this paper is organized as follows. Section 2 discusses the semOLAP approach. Section 3 details the semOLAP knowledge graph. Section 4 explains the determination of the navigation flow and the recommendation of possible bindings. Section 5 exemplifies the instantiation of a semOLAP pattern. Section 6 reviews related work. The paper concludes with a summary and an outlook on future work.

2 SEMANTIC OLAP PATTERNS

The notion of patterns is introduced by Alexander et al. [1] where patterns describe how a specific problem in a specific context can be solved while considering existing constraints. In OLAP, an unsatisfied information need represents the problem whereas the specific analysis situation represents the context [10]. A semOLAP pattern can, therefore, be seen as an instruction on how to compose an OLAP query that satisfies the information need in a specific analysis situation. The identification of such patterns is based on the detection of recurring OLAP queries, which are usually abstracted to domain-dependent templates for OLAP reports. To obtain domain-independent semOLAP patterns, such templates are grouped and abstracted (see Fig. 2).

[Figure 2: Query abstraction levels, ordered from abstraction to realization. Pattern (Pattern Definition): "Compare measures of a subset with its base set." (homogeneous set-base comparison pattern). Template (Partial Pattern Instance): "Compare the milk yield of all cows with the milk yield of all cows of the farm site per date." Query (Full Pattern Instance): "Compare the milk yield of all young cows with the milk yield of all cows of the farm site Kremesberg per date."]

As of now, the identified patterns can be grouped into basic patterns and comparative patterns. The group of basic patterns covers generalized multidimensional queries aggregating business events (facts) according to spatial, temporal, and/or semantic aspects. Domain experts perform such a query by joining one fact class with its dimensions, restricting the result of the join using selection criteria (business terms), grouping the result using grouping criteria, and aggregating measures by applying predefined aggregation functions (calculated measures).

In contrast to basic patterns, which are only based on one set, comparative patterns serve to compare two sets. Therefore, a set of interest (SI) and a set of comparison (SC) need to be defined. The SI is used to specify the primarily focused data which is compared to another set, the SC. For each of these two sets, either the same or different fact class(es), selection criteria, dimension(s), grouping criteria, and/or measure(s) are defined. Depending on the number of shared pattern elements, different types of comparative patterns can be identified. The homogeneous set-base comparison pattern, for example, covers all OLAP queries where a subset (SI) is compared with its base set (SC). It is a homogeneous comparison because both SI and SC refer to the same fact class. The grouping criteria, measures, and selection criteria are shared, with the exception of additional selection criteria which are exclusively used to define the SI. The heterogeneous independent set comparison pattern, contrary to the previously described patterns, is not restricted to one fact class. It is heterogeneous since two different fact classes are used to define SI and SC. Furthermore, no pattern elements need to be shared at all. This also applies to the measures to be compared since they can be based on completely different aggregation functions. The measures from SI and SC can be used to calculate ratios, rates, percentages, proportions, and other complex values.

The definition of such semOLAP patterns is based on semantic web technologies, i.e., RDF, yielding formalized and machine-readable representations. Furthermore, RDF allows shared conceptualizations to be defined that represent calculated measures and business terms (predicates), which can be used during pattern instantiation and linked to domain ontologies. Each pattern definition comprises a textual description, a target language-dependent pattern expression, the pattern result, and the FPEs defining its structure.

As the target audience consists of domain experts, the textual description includes all relevant information needed to instantiate the pattern. Therefore, each semOLAP pattern definition covers a concise pattern name, a description of the analysis situation in which it can be applied, the solution describing the instructions to follow, and an example. In addition to the textual description, a pattern definition contains a pattern expression. This pattern expression is a representation of the query to be generated in a specific target language, e.g., SQL. This representation is enriched by grammar expressions which indicate where certain FPE values must be placed in order to generate an executable query. The result of a pattern is specified by defining which FPEs are returned, i.e., which measures and grouping criteria are returned and how they can be enriched by prefixes to foster differentiation of set-specific elements. It is specified only once in the pattern definition and not changed during the instantiation. Reusability is fostered, since each result yields a new cube which again can be used as the fact class in other pattern instances.

The structure of an OLAP query is represented by FPEs, which are defined as objects in the pattern definition but treated as properties during the instantiation. The FPE siFactClass, for example, is used to define the fact class of the SI during the pattern instantiation of the heterogeneous independent set comparison pattern. To support such a behaviour an FPE consists of an element range, a multiplicity, and is part of zero or more pattern element sets (see Fig. 3). The FPE range defines the (sub)type of the values which can be specified during the instantiation. For the FPE siFactClass, for example, the range is set to the Fact type. Depending on the FPE, the range can be set to (sub)types representing measures, dimensions, dimension attributes, and predicates. The multiplicity, as the name suggests, determines the number of values that can be provided for an FPE during the pattern instantiation, such as One or OneOrMore. The FPE siFactClass, for example, is defined with the multiplicity One, specifying that only one value of the Fact type can be used for the definition of the SI. As already indicated by the prefix of the name siFactClass, FPEs can be assigned to pattern element sets, e.g., SC or SI, using the partOfSet property. This is especially important during the instantiation of SI and SC, since different selection criteria can be applied to different pattern element sets.

[Figure 3: Formal pattern element structure. A Formal Pattern Element is linked to an Element Range (range), a Multiplicity (multiplicity), Pattern Element Sets (partOfSet), and other Formal Pattern Elements (dependsOn).]

FPEs can also be related to each other using the dependsOn property. The fact class, for example, which stores occurrences of a business event, is the core element of the multidimensional model. These stored occurrences are quantified by measures; hence, there exists a dependency between a measure and its corresponding fact class. Further dependencies exist, since a fact class can be aggregated to different levels of granularity according to its corresponding dimensions and hierarchies. Each fact class has predefined dimensions and each dimension can also be assigned to different fact classes. Dimensions support the aggregation of fact classes to different levels of granularity and, therefore, each dimension has one or more dimension hierarchies which, again, consist of dimension attributes. All these dependencies between FPEs are expressed by dependsOn relationships. In addition to the dependencies within the pattern element sets SI and SC, dependencies of FPEs located outside of the pattern element sets can exist. Comparative measures, for example, are defined by using measures from both SI and SC. Comparing two sets requires the specification of FPEs or attributes over which those sets can be joined. The join condition can be implicit, if both sets share attributes, or explicitly specified, if no attributes are shared.

The relationships between pattern element sets and FPEs as well as the dependencies between the FPEs themselves yield a graph representation. Fig. 5 depicts such a graph for an instance of the heterogeneous independent set comparison pattern. The dependsOn relationships are displayed as grey edges since they are only available in the pattern definition and not directly in the displayed pattern instance (binding graph). Both pattern element sets share the same internal structure, since each of them consists of a FactClass and one or more Measure, Dimension, DimensionAttribute, and Slice values. The outer FPEs, also called non-set FPEs, factCorrelation and compMeasure refer to FPEs within the sets. The join condition factCorrelation determines which attributes are used to combine the sets whereas the compMeasure defines the comparative measure to be calculated.

3 SEMOLAP KNOWLEDGE GRAPH

Guiding users through semOLAP pattern instantiation requires the consideration of the available semOLAP knowledge graph, which comprises the three knowledge graphs representing the pattern definition, the semantic schema elements, and the current binding of FPEs within the instantiation process (see detailed activity in Fig. 1). Thereby, the semOLAP knowledge graph allows interconnections between a pattern instance and existing values, types, and the underlying schema to be identified (see Fig. 4).

[Figure 4: Exemplified semOLAP knowledge graph, comprising the pattern knowledge graph, the semantic schema knowledge graph, and the binding knowledge graph.]

A typical OLAP query is composed of the fact class representing the data of interest, grouping criteria, and selection criteria representing logical restrictions regarding temporal, spatial, and semantic aspects. The semOLAP pattern definitions reflect this structure by specifying FPEs and the relationships between them, e.g., the set of selectable measures and dimensions depends on the previously selected fact class. The pattern definition also includes constraints for each FPE: multiplicity, element range, and the pattern element sets to which the FPE is related. This available knowledge, also called pattern knowledge, can be exploited during pattern instantiation, e.g., to determine the FPE instantiation order or the type of possible values for an FPE. The pattern knowledge graph in Fig. 4 is an extract of the FPEs’ dependsOn relationships of an SI definition. The siSlice depends on the siFactClass and the siDimension, whereas the siDimension depends only on the siFactClass. The ranges for these FPEs are represented by the (sub)types of their values, e.g., for the siFactClass the FPE range is the type Fact.

The types of the FPE ranges are part of the underlying semantic schema knowledge. The schema is based on the Dimensional Fact Model (DFM) [8], which allows the conceptual representation of multidimensional elements such as fact classes, attributes, dimensions, dimension hierarchies, and relationships between them.
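The FPE structure (element range, multiplicity, partOfSet, dependsOn) and its use for deriving an instantiation order can be illustrated with a minimal Python sketch. This is not the authors' RDF-based implementation. The class and function names are hypothetical, and the ranges given for siMeasure, siDimension, and siDimensionAttribute are assumptions for illustration. The ordering rule assigns each FPE a level one above the highest level among the FPEs it depends on, as in the level computation of Section 4.1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FPE:
    """A formal pattern element; field names are illustrative, not from the paper."""
    name: str
    range: str                          # (sub)type of admissible values, e.g. "Fact"
    multiplicity: str                   # "One" or "OneOrMore"
    part_of_set: Optional[str] = None   # "SI", "SC", or None for non-set FPEs
    depends_on: List[str] = field(default_factory=list)


def compute_levels(fpes: List[FPE]) -> Dict[str, int]:
    """Level 0 for FPEs without dependsOn edges, otherwise max(target levels) + 1."""
    levels = {f.name: 0 for f in fpes}
    changed = True
    while changed:                      # repeat until no level changes any more
        changed = False
        for f in fpes:
            for target in f.depends_on:
                if levels[f.name] <= levels[target]:
                    levels[f.name] = levels[target] + 1
                    changed = True
                    if levels[f.name] > len(fpes):
                        raise ValueError("dependsOn relationships contain a cycle")
    return levels


def instantiation_order(fpes: List[FPE]) -> List[str]:
    """Order FPEs by level; the stable sort keeps definition order within a level."""
    levels = compute_levels(fpes)
    return sorted(levels, key=levels.get)


# SI part of a comparative pattern; ranges other than Fact and ObjectPredicate are assumed.
SI_FPES = [
    FPE("siFactClass", "Fact", "One", "SI"),
    FPE("siMeasure", "Measure", "OneOrMore", "SI", ["siFactClass"]),
    FPE("siDimension", "DimensionObject", "OneOrMore", "SI", ["siFactClass"]),
    FPE("siDimensionAttribute", "DimensionAttribute", "OneOrMore", "SI", ["siDimension"]),
    FPE("siSlice", "ObjectPredicate", "OneOrMore", "SI", ["siFactClass", "siDimension"]),
]

if __name__ == "__main__":
    print(compute_levels(SI_FPES))
    print(instantiation_order(SI_FPES))
```

For these dependsOn relationships, siFactClass ends up on level 0, siMeasure and siDimension on level 1, and siDimensionAttribute and siSlice on level 2. The order within a level is arbitrary; the sketch simply keeps the definition order.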
The modeled elements are represented using the RDF Data Cube (QB) [5] vocabulary and its extension QB4OLAP [7], thus creating a semantic multidimensional schema. This RDF representation facilitates the definition of predicates (ObjectPredicates) representing business terms as well as calculated measures (CalculatedMeasures) which can exceed simple aggregations. The semantic schema knowledge graph in Fig. 4 shows the types and subtypes of the range of the FPEs and the structural relationships (dotted directed requires edge) between these (sub)types. During the instantiation process this RDF knowledge provides information about the structure of the type, e.g., the type ObjectPredicate and some of its subtypes require the structure provided by (sub)types of the ranges of siFactClass and/or siDimension.

In addition to the pattern and the semantic schema knowledge, the binding knowledge has to be considered. It represents the current instantiation, i.e., the bindings of FPEs within the instantiation process. The pattern instance, again represented in RDF, is updated during the instantiation process. The binding knowledge contains the already instantiated FPEs with their values and all currently uninstantiated FPEs. During the binding recommendation process, the binding knowledge needs to be considered, since it reflects the available structure of existing values on the basis of which suitable values can be determined. The current binding knowledge graph in Fig. 4 depicts the available fact class value BCS and the dimension level value MainBreed for siFactClass and siDimension. These values must be considered to recommend values for siSlice, e.g., in order to recommend the FactDimensionPredicate value riskOfObesity, it is checked if its structurally required values BCS and MainBreed exist in the binding knowledge.

A guidance approach for query instantiation requires considering the whole knowledge graph in order to provide navigation and recommendation and to avoid the creation of invalid queries. Valid queries can only be ensured when all relationships between the pattern to be instantiated, the semantic schema elements, and the already provided values are considered.

4 EXPLOITING SCHEMA AND PATTERN KNOWLEDGE FOR INCREMENTAL PATTERN INSTANTIATION

The semOLAP patterns provide a conceptual foundation to compose ad hoc OLAP queries without further assistance. A domain expert, however, requires visual assistance to fulfil this task. They should be enabled to browse existing semOLAP patterns, select the one which fits their information need, and instantiate the semOLAP pattern in order to generate the desired query. The pattern instantiation in particular is a non-trivial task since the available knowledge graph, which can be used to determine and restrict possible values for FPEs, must be considered (see Fig. 1).

The guidance process based on semOLAP patterns requires the consideration of the semOLAP knowledge graph as well as an interactive instantiation interface. The interface implementation is based on IFML, which supports a data-driven application development following a strict separation of the data model, the hypertext model, and the presentation model [4]. We focus on the hypertext and presentation model since these are crucial for the user interaction. Furthermore, interfaces are generated for the browsing, selection, instantiation, and result retrieval step. To detail the guidance approach and the implementation, the instantiation of the heterogeneous independent set comparison pattern is exemplified (see Fig. 5).

4.1 Navigation Flows

Adapting the idea of logical stratification [12, p. 131-136], we determine a default navigation flow by calculating the corresponding level of each FPE. The calculation of the levels is based on the FPEs from the pattern knowledge and simple rules: FPEs with no outgoing dependsOn edge are assigned to level 0; FPEs which have one or more outgoing dependsOn edges are assigned to the highest level of the referred FPEs plus one; these steps are repeated until the level assignments are not changed any more (see Algorithm 1).

Algorithm 1: Level computation
 repeat
   forall formalPatternElement fpe in patternKnowledgeGraph do
     level[fpe] := 0;
   end
   repeat
     forall formalPatternElement fpe in patternKnowledgeGraph do
       forall dependsOn dp in fpe.dependsOn do
         targetFpe := dp.target;
         if level[fpe] <= level[targetFpe] then
           level[fpe] := level[targetFpe]+1;
         end
       end
     end
   until there are no changes to any level or a level exceeds the number of formal pattern elements;
 until all levels of abstraction are processed;

An exemplified application of this algorithm is the calculation of the SI levels depicted in Fig. 5. The calculated level assignments are indicated by the numbers in the left corner of the instantiated FPEs. The first number indicates the assigned level whereas the second number indicates the sequence within the default navigation flow. The siFactClass is assigned to level 0 since it has no outgoing dependsOn edges; siMeasure and siDimension are assigned to level 1 due to their dependence on siFactClass; siDimensionAttribute and the siSlice are assigned to level 2 due to their dependence on siDimension. This algorithm, however, is not limited to the FPEs in the pattern element sets SI and SC; it can also be applied to the next level of abstraction. Each pattern element set can also be seen as an FPE of an outer graph. Considering this abstraction level, both SI and SC represent FPEs without outgoing dependsOn edges which are referenced by the FPEs factCorrelation or the compMeasure.

The default navigation flow can be determined by considering the dependsOn relationships between the FPEs and the assigned levels. It starts from FPEs assigned to the lowest level, i.e., from FPEs assigned to level 0. If multiple FPEs are assigned to the same level, an arbitrary navigation flow order can be specified for them. These steps are repeated for the next levels until all levels are processed. The result is a default navigation flow linking the interface elements of the FPEs during the instantiation of a selected pattern.

[Figure 5: Enriched binding graph of a heterogeneous independent set comparison pattern example. Each instantiated FPE is annotated with Level:Sequence; the legend distinguishes pattern element set instances, instantiated FPEs, suggested bindings, linksTo edges, and dependsOn edges.]

The dotted directed linksTo edges in Fig. 5 represent the default navigation flow of the heterogeneous independent set comparison pattern. It starts with the value specification of siFactClass, followed by the specification of siMeasure and siDimension. The order of these last two FPEs is interchangeable since both of them are assigned to level 1. At level 2, values must be specified for the dimension attributes siDimensionAttribute, representing the grouping criteria, and the siSlice, representing set-specific predicates. The selectable dimension attributes depend on the specified dimensions. Depending on the values specified in the previous levels, only specific types of predicates can be specified for the siSlice. In the heterogeneous independent set comparison pattern the FPE dependsOn relations of SI and SC are the same; therefore SI’s navigation flow can be applied to the SC analogously. To finish the instantiation of the pattern, the factCorrelation attribute, used for joining, and the compMeasure, which determines the type of comparison to be performed, have to be specified.

The default navigation flow only allows slight adaptations, such as changing the FPE order within one level, e.g., either siMeasure or siDimension can be instantiated first. Additional adaptations, however, have to be supported since a user might not want to start with the specification of the FPE siFactClass; instead they might want to start with other FPEs. As discussed in [3], a user knows prior to the query composition which measure(s) they want to retrieve; therefore a user typically starts with the selection of the desired measure(s). This is especially relevant for ad hoc query composition, since a user wants to retrieve something that is not covered by existing reports. Facilitating such a custom navigation flow requires the adaptation of the default navigation flow which is based, so far, on the FPEs’ dependsOn relationships and the assigned levels. The navigation flow must be detached from these dependsOn relationships to provide such flexibility. A custom navigation flow cannot be determined automatically; instead it must be specified manually in the course of system configuration. In exchange, the custom navigation flow allows arbitrary movement between the FPEs, e.g., navigating from siMeasure to siFactClass.

4.2 Binding Recommendation

The navigation flow allows the user to move from one interface element to another while providing values for the corresponding FPEs. The user can be guided in this process by having bindings recommended for the FPE values. Therefore, the range of the current FPE (available in the semantic schema knowledge), the dependsOn relationships between FPEs (pattern knowledge), and the bindings of other FPEs (binding knowledge) need to be considered. To illustrate this, we exemplify the instantiation of the FPE siSlice in the heterogeneous independent set comparison pattern by recommending bindings of the FactDimensionPredicate subtype; this approach can be applied to all other FPEs and (sub)types as well.

Each FPE specified in the pattern definition is represented as a property during the pattern instantiation. The range of each FPE determines the type of possible bindings, e.g., the range of siSlice is ObjectPredicate. Due to the complexity of the multidimensional model, each range can cover multiple subtypes which are a part of the semantic schema knowledge, e.g., ObjectPredicate is a subtype of Predicate and a supertype of FactPredicate, DimensionPredicate, and FactDimensionPredicate. Consequently it is possible to select bindings of the types FactPredicate, DimensionPredicate, or FactDimensionPredicate for the FPE siSlice (see pattern knowledge and the semantic schema knowledge graph in Fig. 4). We focus here on bindings of the type FactDimensionPredicate.

Recommending bindings for the current FPE requires retrieving all other FPEs that the current FPE depends on, the depending FPEs. To this end, dependsOn relationships from the pattern knowledge graph are used. Considering, for example, the dependsOn relationships of the siSlice allows the depending FPEs siFactClass and siDimension to be identified. For each subtype of the current FPE’s range, each depending FPE is processed separately. For the sake of simplicity we refer to the (sub)type of the current FPE’s range as current subtype and to the subtypes of the depending FPEs’ range as depending subtypes. For example, for the current subtype FactDimensionPredicate, as a subclass of siSlice’s range, the depending FPE siDimension is processed. Therefore, the (sub)types of the depending FPE’s range are retrieved. For the range of the depending FPE siDimension, for example, the subtypes are DimensionLevel and DimensionRole³.

³ A dimension role is used to reference dimensions using different names, e.g., the dimension animal can be referenced using the dimension role dam animal.

For the current subtype FactDimensionPredicate and the depending subtypes DimensionLevel and DimensionRole the possible basic relations FactDimensionPredicateRelatesToDimensionLevel and FactDimensionPredicateRelatesToDimensionRole need to be considered. A basic relation is used to represent the structural relationship of a current subtype to a depending subtype (see requires relationships in the semantic schema knowledge graph in Fig. 4). Not all possible basic relations derived from current and depending subtypes actually exist. The siSlice, for example, depends on siDimension but the FactPredicate, which is a subtype of siSlice’s range, does not have a basic relation to either of the subtypes of siDimension’s range, DimensionLevel or DimensionRole. Therefore, only the actually existing basic relations are then used
to determine potential bindings of the current subtype for the current FPE. Each of the existing basic relations is represented by a predefined SPARQL Protocol And RDF Query Language (SPARQL) query which checks all available values of the current subtype in order to determine potential bindings.
The bindings of the depending FPE that are of the depending subtype are checked against the required structure of each available value of the current subtype. The structure of values is represented by relationships in the multidimensional schema, predicates, and measures, e.g., the FactDimensionPredicate riskOfObesity requires the dimension attribute MainBreed and the measure BCS (see Fig. 4). If the required structure regarding the current basic relation is available in the structure of the depending FPE binding, the value of the current subtype is added to the list of potential bindings of the current depending FPE. To determine whether riskOfObesity, for example, is a potential binding for the current FPE, the underlying SPARQL query of the basic relation FactDimensionPredicateRelatesToDimensionLevel checks whether a binding of the depending subtype DimensionLevel (of the depending FPE siDimension) exists which contains the dimension attribute MainBreed. Since available values of the current subtype are checked against multiple bindings of different depending subtypes, the list of potential bindings of the current depending FPE is extended continuously, e.g., bindings of siDimension are either of the depending subtype DimensionLevel or DimensionRole. This concludes the calculation of potential bindings for the first depending FPE siDimension for the current subtype FactDimensionPredicate.
The current subtype, however, might require depending subtypes of more than one depending FPE, e.g., FactDimensionPredicate requires the depending subtype Fact of siFactClass and at least one of the depending subtypes of DimensionObject of siDimension (see semantic schema knowledge graph in Fig. 4). Therefore, the potential bindings of these depending FPEs have to be determined, which results in a list of potential bindings for each depending FPE. Only the potential bindings present in all these lists are possible bindings which can be recommended as a binding for the current FPE, e.g., riskOfObesity requires a binding of subtype DimensionLevel or DimensionRole which contains the dimension attribute MainBreed as well as a binding of subtype Fact which contains the measure BCS.
The calculation of possible bindings is repeated for all other current subtypes, i.e., FactPredicate and DimensionPredicate. Finally, all possible bindings of all subtypes of the current FPE's range are combined and returned to the user as binding recommendations for the FPE currently being instantiated.
Considering the relationships between a current subtype and its depending subtypes using the corresponding basic relations enables the recommendation of bindings along the default navigation flow. These corresponding basic relations follow the dependsOn relationships between the FPEs. Supporting recommendations for the custom navigation flow, however, requires the extension of the dependsOn relationships to bidirectional ones. Up to now, the FPEs' dependsOn relationships have reflected a hierarchical structure, i.e., siFactClass serves as the root whereas all other FPEs depend on it either directly or indirectly. This hierarchical view yields a directed graph which can be traversed from top to bottom, i.e., the default navigation flow. Representing the dependsOn relationships bidirectionally leads to a non-hierarchical view which can be traversed beginning from any FPE. No new relationships between FPEs are introduced; only existing unidirectional dependsOn relationships are extended to bidirectional ones. After the dependsOn relationships are extended, the corresponding basic relations have to be defined. To this end, the basic relations of the subtypes of the FPEs' ranges need to be extended by relations in the opposite direction, e.g., for the FPEs siSlice and siFactClass with their corresponding ranges ObjectPredicate and Fact, the existing basic relations are extended by FactRelatesToFactDimensionPredicate, FactRelatesToFactPredicate, and FactRelatesToDimensionPredicate.
If a user, for example, starts by selecting a binding for siMeasure and then navigates to the FPE siFactClass, the basic relation FactRelatesToObjectCalculatedMeasureRelates can be used to determine Fact values which provide the necessary structure for the previously specified siMeasure value(s). This, again, takes the semOLAP knowledge graph into account. Binding recommendation for a custom navigation flow, however, also faces limitations. For example, the instantiation could start with the specification of the siMeasure value, followed by the siDimension value, and continue with the siFactClass value. Recommending possible bindings for siDimension would not be possible, since no basic relations between the subtypes of the siMeasure and siDimension ranges exist in the heterogeneous independent set comparison pattern. All potential bindings, in this case all values of the subtypes DimensionRole and DimensionLevel, would be recommended as possible bindings. This issue can be solved by introducing new basic relations between the subtypes of siMeasure and siDimension. These new basic relations, which do not follow existing dependsOn relationships, can be created by considering existing basic relations between subtypes of siMeasure and siFactClass and subtypes of siFactClass and siDimension. In contrast to siDimension, possible bindings could be recommended for siFactClass without new basic relations, since there exist dependsOn relationships between siFactClass and both siDimension and siMeasure. The bidirectional basic relations FactRelatesToDimensionRole, FactRelatesToDimensionLevel, and FactRelatesToObjectCalculatedMeasure are therefore used. Considering these relations allows bindings to be recommended for the instantiation of the FPE siFactClass; however, it is possible that no bindings at all can be recommended. This would be the case if, for example, a combination of values for siMeasure and siDimension is selected which cannot be structurally supported by any value of siFactClass. To resolve this issue, the user would need to navigate back and edit the corresponding FPE values; otherwise, the instantiation could not be continued. The default navigation flow is not affected by these limitations at all, since it considers the logical dependencies derived from the multidimensional model.
Binding recommendations are restricted by the availability of existing bindings of the depending FPEs. The advantage of this approach is that the basic relations are defined only once and can be reused multiple times, since FPEs with the same dependencies are used within different pattern definitions. For example, the basic relation between the subtypes FactDimensionPredicate and DimensionLevel occurs in the basic multi-aggregation patterns as well as in other comparison patterns such as the homogeneous set-base comparison pattern. New FPEs with dependsOn relationships not considered so far can be introduced as part of new semOLAP patterns. To handle these dependencies, only their basic relations and the underlying SPARQL queries need to be defined once. In contrast, new semOLAP patterns that use only already considered dependsOn relationships can be instantiated without further effort.
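The calculation of binding recommendations described in this section can be sketched in Python as follows. This is a minimal in-memory sketch under stated assumptions, not the implementation used in the paper: the basic relations are modeled as a plain set of subtype pairs rather than as predefined SPARQL queries, and the names Binding, Candidate, FPE_RANGE, BASIC_RELATIONS, potential_bindings, and recommend are hypothetical. The relations assumed for FactPredicate and DimensionPredicate, the structure required by HighFoodConsumption and June2016, and the fact class BodyCondition are likewise assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    """A value already bound to a depending FPE."""
    name: str
    subtype: str          # subtype of the depending FPE's range
    provides: frozenset   # dimension attributes or measures it offers


@dataclass
class Candidate:
    """An available value of the current FPE's range."""
    name: str
    subtype: str          # current subtype
    requires: dict        # depending FPE -> set of required elements


# Assumed ranges of the depending FPEs of siSlice.
FPE_RANGE = {
    "siFactClass": {"Fact"},
    "siDimension": {"DimensionLevel", "DimensionRole"},
}

# Existing basic relations as (current subtype, depending subtype) pairs.
# FactPredicate has no relation to the subtypes of siDimension's range.
BASIC_RELATIONS = {
    ("FactDimensionPredicate", "Fact"),
    ("FactDimensionPredicate", "DimensionLevel"),
    ("FactDimensionPredicate", "DimensionRole"),
    ("FactPredicate", "Fact"),                 # assumption
    ("DimensionPredicate", "DimensionLevel"),  # assumption
    ("DimensionPredicate", "DimensionRole"),   # assumption
}


def potential_bindings(candidates, fpe, bindings):
    """Candidates whose required structure is offered by some binding of fpe."""
    found = set()
    for cand in candidates:
        needed = cand.requires.get(fpe, frozenset())
        for b in bindings:
            related = (cand.subtype, b.subtype) in BASIC_RELATIONS
            if related and needed <= b.provides:
                found.add(cand.name)
                break
    return found


def recommend(candidates, bindings_by_fpe):
    """Intersect the lists per depending FPE, then combine all current subtypes."""
    recommended = set()
    for subtype in sorted({c.subtype for c in candidates}):
        current = [c for c in candidates if c.subtype == subtype]
        lists = []
        for fpe, bindings in bindings_by_fpe.items():
            # Only depending FPEs with an existing basic relation are used.
            if any((subtype, s) in BASIC_RELATIONS for s in FPE_RANGE[fpe]):
                lists.append(potential_bindings(current, fpe, bindings))
        if lists:
            recommended |= set.intersection(*lists)
        else:
            recommended |= {c.name for c in current}
    return recommended


candidates = [
    Candidate("riskOfObesity", "FactDimensionPredicate",
              {"siDimension": {"MainBreed"}, "siFactClass": {"BCS"}}),
    Candidate("HighFoodConsumption", "FactPredicate",
              {"siFactClass": {"consumedWeighedRough"}}),
    Candidate("June2016", "DimensionPredicate", {"siDimension": {"Date"}}),
]
dimension_bindings = [
    Binding("Animal", "DimensionLevel", frozenset({"Animal", "MainBreed"})),
    Binding("Date", "DimensionLevel", frozenset({"Date"})),
]
feeding = {
    "siFactClass": [Binding("Feeding", "Fact",
                            frozenset({"consumedWeighedRough"}))],
    "siDimension": dimension_bindings,
}
print(sorted(recommend(candidates, feeding)))
```

With Feeding bound to siFactClass, riskOfObesity is not recommended in this sketch because no bound fact class offers the measure BCS, even though the Animal dimension offers MainBreed. Binding a fact class that offers BCS instead would make riskOfObesity a possible binding, which mirrors the rule that a value must appear in the potential-binding list of every relevant depending FPE.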
5 EXEMPLIFIED PATTERN INSTANTIATION
The guidance approach is exemplified by instantiating the heterogeneous independent set comparison pattern following the default navigation flow. Even though custom navigation flows could be supported, only the unidirectional basic relations are implemented so far. To illustrate our approach, we consider a domain expert who wants to compose an ad hoc query which calculates the ratio of the consumed food of one day (SI) and the milk yield of the next day (SC) for the same animal in June 2016 per date and animal (see FPE bindings in Fig. 5). The resulting ratio is used by the expert to see whether the amount of food fed the day before impacts the milk yield of the next day. To this end, a data cube containing the two fact classes Milk and Feeding with the shared dimensions Date, Farm Site, and Animal is accessed (see DFM model in Fig. 6).
Figure 6: DFM of facts of interest (fact classes Milk and Feeding with the dimensions Date, FarmSite, and Animal)
The instantiation starts with the specification of the SI and its corresponding FPEs. The available fact classes Milk and Feeding are recommended, among others, to the domain expert. The domain expert selects Feeding as the binding value for the siFactClass FPE. The specified fact class Feeding and the corresponding basic relations are used to determine possible bindings for siDimension, i.e., the dimensions Animal, Date, and FarmSite. The domain expert selects the Animal and Date dimensions as values for siDimension and continues with the specification of the measure of interest siMeasure, i.e., the consumedWeighedRough representing the summarized values of consumed weighed roughage.
For siDimensionAttribute, the dimension attributes Animal and Date are selected as binding values. The names of the values of siDimension and siDimensionAttribute are the same, even though they represent different FPEs. This is the result of naming conventions, since the dimension's name is used as the name of the identifying dimension attribute (see Fig. 6). Based on the dimension values Animal and Date and the fact value Feeding, DimensionPredicate values, such as June2016 or MainBreedHolstein, FactPredicate values, such as HighFoodConsumption, and FactDimensionPredicate values, such as HolsteinWithHighFoodConsumption, are recommended to the domain expert as possible values for siSlice (see recommended bindings in Fig. 5). To finish the instantiation of the SI, the domain expert selects June2016 as the binding for siSlice.
The instantiation of SC starts after the instantiation of SI is finished. Since the default navigation graphs of SC and SI are isomorphic, the domain expert is guided through the same steps. The domain expert provides the same value bindings to scDimension, scDimensionAttribute, and scSlice already used for the analogous FPEs of the SI. This is possible since, in this query, SI and SC share the dimensions, dimension attributes, and slices. Only for the FPEs scFactClass and scMeasure do different bindings have to be specified. The Milk fact is specified as the binding of scFactClass and the SumOfMilkYieldParlour measure as the binding of scMeasure.
After both sets are instantiated, the domain expert specifies the bindings matchDayToDayAfter and sameAnimal for the factCorrelation and FoodMilkyieldRatio for the compMeasure to finish the instantiation process. Similar to the previous instantiation steps, other possible values for these two FPEs are provided; however, only the ones mentioned are relevant for the intended query. The factCorrelation value matchDayToDayAfter combines the fact occurrences of SI of one specific day with the fact occurrences of SC of the next day, whereas sameAnimal restricts the combinations to the same animals. The FoodMilkyieldRatio calculates the ratio between SI's consumedWeighedRough and SC's SumOfMilkYieldParlour. After all FPE values are specified, a summary of the pattern instance is provided (see Fig. 7). This summary provides an overview of all FPEs of a semOLAP pattern instance and indicates which FPEs are specified differently in SI and SC. The differences are color-coded to ease their identification. The structure of the overview is independent of the semOLAP pattern, but it can be adapted by the developer to consider characteristics of certain patterns.
Figure 7: Detail of the pattern instance summary
Finally, to satisfy the domain expert's information need, the OLAP query is generated and sent to the underlying ROLAP system. The result of this OLAP query is visualized and can thereby be interpreted by the domain expert. The domain expert can reuse this pattern instance for future analysis situations and adapt it to fit their information need. Besides fully instantiated patterns, partial instances can be specified as well, e.g., the fact, dimension, dimension attributes, and measures are predefined and only additional selection criteria can be specified. These partially instantiated patterns can be reused as domain-dependent query templates. A video⁴ of this instantiation process is provided to demonstrate the current state of the implementation.
⁴ https://www.youtube.com/watch?v=BLt6heO7WKY
6 RELATED WORK
The guidance of users during OLAP query composition is mostly accompanied by providing suitable visualizations of the query elements. The Semantic Data Warehouse Model (SDWM [3]) allows for a visual specification of multidimensional queries. The SDWM does not rely on semantic representations of the multidimensional data model, e.g., using QB and QB4OLAP; instead, the SDWM considers both the operational requirements and the semantics of the business processes to be modeled [3]. Templates which are based on the SDWM are therefore provided to users, serving as configurable reports [3]. Each of these templates consists of predefined measures, dimensions, and dimension attributes which are related to each other. The relationships between these template elements are visualized to represent the dependencies between the measures and dimensions. The user specifies the OLAP query by either adding/removing measures or selecting the dimension hierarchy levels. Similar to our approach, only possible dimension and dimension
attribute values are provided to the user; however, this is not ensured through reusable basic relations. Furthermore, additivity checks are performed, which restrict aggregations to only possible measures, e.g., it does not make sense to summarize the food to milk ratio over time. The proposed approach using the SDWM, however, does not provide the abstraction of semOLAP patterns, since it focuses on case-specific templates which are restricted to a corresponding fact class. The user is able to specify neither fact values nor complex predicate values; only simple restrictions are supported. Furthermore, the composition of ad hoc queries targeting multiple fact classes is not considered at all.
Another query visualization interface is Polaris [11], which led to the development of Tableau⁵. Polaris focuses on analyzing, querying, and visualizing multidimensional relational databases, although newer versions of Tableau support other data structures as well. Instead of focusing on ad hoc queries, it primarily supports the explorative data analysis approach by providing an interactive visualization of both the query and the result. The query is defined by a visual specification within a table-based interface, which allows the specification of dimensions, measures, and grouping and filter criteria along with possible visualisation options. Corresponding queries are generated using an underlying table algebra. Ad hoc queries can be formulated since all functionalities for composition are provided; however, analysis situation-specific guidance is not supported. The user is not guided while creating, for example, comparisons of sets from one or multiple fact classes, even though the necessary functionality is available. The application of filters is provided; however, these are restricted to simple expressions, and predicates representing business terms are not supported. Furthermore, calculated measures relate only to the fact class where they were specified, whereas calculated measures and object predicates in semOLAP are independent of the fact class as long as the necessary structure is provided by any target fact class. This is possible since our approach, unlike Polaris [11], is not directly based on the relational data model; instead, it is based on the multidimensional data model. In comparison to the Polaris approach, we support reusing and editing instantiated semOLAP queries to match the new information demand. SemOLAP queries can, additionally, be used as the data input for other semOLAP queries, since each result represents a possible fact class, as it consists of measures and dimension attributes.
⁵ https://www.tableau.com
7 SUMMARY AND FUTURE WORK
We have proposed a guided query composition approach based on semOLAP patterns. The semOLAP pattern approach provides the conceptual foundation to allow ad hoc query composition by domain experts. This conceptual foundation is realized by a data-centric and model-driven implementation which guides the domain expert during the instantiation of the FPEs. For each FPE instantiation, bindings are recommended by considering the semOLAP knowledge graph, which comprises knowledge about the semantic schema, the pattern structure, and the current FPE binding. To this end, basic relations are introduced which are used to check structural dependencies between the (sub)types of the FPE ranges. This allows users to move through the FPE instantiation process and select recommended bindings which consider the current instantiation state, the FPE's type information, and constraints of the FPE itself.
For each semOLAP pattern, a default navigation flow is calculated and provided to guide the user through the instantiation process. If an instantiation sequence other than the one suggested by the default navigation flow is more convenient for a particular pattern, a custom navigation flow can be configured by a developer. To this end, basic relations are extended to bidirectional relations, allowing bindings to be recommended independently of the navigation sequence. Even navigation flows between FPEs which are not represented in the FPEs' dependsOn relationships can be supported, leading to a maximum level of flexibility. The drawback of this flexibility is that instantiation situations can occur where no bindings at all can be recommended, since an unsupported FPE value combination is selected.
Future work will include displaying available information currently hidden in the semantic representation of schema elements, e.g., the measure SumOfMilkYieldParlour is linked to the AGROVOC⁶ ontology, which includes a concise definition of the measure. The calculation of a navigation flow can also consider the number of available FPE values. This would require the consideration of the selectivity of FPE values, i.e., it is preferable to start with the FPE value specification which has the highest potential to reduce the number of potential values of other FPEs. Furthermore, the visualization of the result, which is currently limited to a table representation, will be extended. In the future, domain- and data-dependent visualizations will be automatically applied to the result. This visualization will also comprise a generated caption as well as a generated result description.
⁶ http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural%2Dthesaurus
REFERENCES
[1] Christopher Alexander, Sara Ishikawa, Murray Silverstein, Joaquim Romaguera i Ramió, Max Jacobson, and Ingrid Fiksdahl-King. 1977. A pattern language. Oxford University Press.
[2] Jeffrey Bewley. 2010. Precision dairy farming: advanced analysis solutions for future profitability. In Proceedings of the first North American conference on precision dairy management, Toronto, Canada. 2–5.
[3] Michael Böhnlein, Achim Ulbrich-vom Ende, and Markus Plaha. 2002. Visual Specification of Multidimensional Queries based on a Semantic Data Model. In Vom Data Warehouse zum Corporate Knowledge Center. Springer, 379–397.
[4] Marco Brambilla and Piero Fraternali. 2014. Interaction flow modeling language: Model-driven UI engineering of web and mobile apps with IFML. Morgan Kaufmann.
[5] Richard Cyganiak and Dave Reynolds. 2014. The RDF Data Cube Vocabulary. W3C Recommendation. W3C. http://www.w3.org/TR/2014/REC-vocab-data-cube-20140116/.
[6] Wayne W. Eckerson. 2008. Pervasive business intelligence: Techniques and technologies to deploy BI on an enterprise scale. TDWI Best Practices Report (2008).
[7] Lorena Etcheverry and Alejandro A. Vaisman. 2012. QB4OLAP: A New Vocabulary for OLAP Cubes on the Semantic Web. In Proceedings of the Third International Conference on Consuming Linked Data. 27–38.
[8] Matteo Golfarelli, Dario Maio, and Stefano Rizzi. 1998. The dimensional fact model: A conceptual model for data warehouses. International Journal of Cooperative Information Systems 7, 2-3 (1998), 215–247.
[9] Thomas Neuböck, Bernd Neumayr, Michael Schrefl, and Christoph Schütz. 2014. Ontology-Driven Business Intelligence for Comparative Data Analysis. In eBISS 2013. LNBIP, Vol. 172. Springer, 77–120.
[10] Christoph G. Schuetz, Simon Schausberger, Ilko Kovacic, and Michael Schrefl. 2017. Semantic OLAP Patterns: Elements of Reusable Business Analytics. In OTM 2017 (LNCS), Vol. 10573. Springer.
[11] Chris Stolte, Diane Tang, and Pat Hanrahan. 2002. Polaris: A system for query, analysis, and visualization of multidimensional relational databases. IEEE Transactions on Visualization and Computer Graphics 8, 1 (2002), 52–65.
[12] Jeffrey D. Ullman. 1988. Principles of Database and Knowledge-Base Systems, Volume I. Principles of computer science series, Vol. 14. Computer Science Press.
[13] Martin Wischenbart, Dana Tomic, Michael Iwersen, Michael Schrefl, and Valentin Sturm. 2017. agriProKnow – Prozessbezogenes Informationsmanagement in Precision Dairy Farming. In Proceedings der 13. Tagung Bau, Technik und Umwelt in der landwirtschaftlichen Nutztierhaltung (BTU-Tagung 2017).