Augmented Business Intelligence
                Matteo Francia                                     Matteo Golfarelli                              Stefano Rizzi
         DISI - University of Bologna                         DISI - University of Bologna                DISI - University of Bologna
             m.francia@unibo.it                                matteo.golfarelli@unibo.it                    stefano.rizzi@unibo.it

ABSTRACT                                                                        OLAP queries on the enterprise multidimensional cubes. The
Augmented reality allows users to superimpose digital informa-                  quantity of data returned must be limited in size and focused
tion (typically, of operational type) upon real world entities. The             on the context to meet performance constraints and be easily
synergy of analytical frameworks and augmented reality opens                    interpretable by the user; furthermore, the intrinsic dynamics of
the door to a new wave of situated OLAP, in which users within                  AR applications asks for right-time (reasonably a few seconds)
a physical environment are provided with immersive analyses of                  responsiveness of A-BI.
local contextual data. In this paper we propose an approach that,                  As shown in Figure 1, given a user situated in an environment
based on the sensed augmented context (provided by wearable                     and equipped with a smart device capable of perceiving relevant
and smart devices), proposes a set of relevant analytical queries               elements (i.e., the context) of such environment by means of
to the user. This is done by relying on a mapping between the                   sensors, A-BI provides real-time generation and visualization
entities that can be recognized by the devices and the elements                 of multidimensional analyses out of contextual features. The
of the enterprise data, and also taking into account the queries                context is modeled as a set of recognizable environment elements
preferred by users during previous interactions that occurred                   (e.g., a package) plus a set of user/environmental information
in similar contexts. A set of experimental tests evaluates the                  (e.g., the user role and the room temperature); we assume the
proposed approach in terms of efficiency and effectiveness.                     pattern recognition capabilities necessary to recognize them are
                                                                                provided by the AR smart device (specifically, through the Context
                                                                                generation component in Figure 1).
1    INTRODUCTION                                                                  A-BI keeps track of user feedbacks on queries, and adopts col-
With the disruptive advances in pervasive computing and indus-                  laborative filtering to provide a more focused experience. With
try 4.0, business intelligence is shifting its focus to the integration         reference to Figure 1, this task is carried out by the Context in-
of (internal) enterprise and (external) contextual data. In this con-           terpretation component by relying on the query log. Although
text, the synergy of analytical frameworks and augmented reality                A-BI supports the possibility to learn meaningful queries from
opens the door to situated OLAP, in which users within a physi-                 the log, its capability of returning the right information primarily
cal environment are provided with immersive analyses of local                   comes from some a-priori knowledge provided by domain ex-
contextual data. Indeed, Augmented Reality (AR), a variation of                 perts. This choice is not simply a solution to the well-known cold
virtual reality, allows users to superimpose digital information                start problem (i.e., the problem of providing significant recom-
upon real world entities [1], thus determining an augmented                     mendations when user feedbacks are still very few); it is rather
environment. Nowadays digital data returned to users are typi-                  a design choice aimed at enabling the system to give a useful
cally operative, meaning that they either provide information on                answer in complex context scenarios, where learning from the
the current state of visualized objects (e.g., the temperature of a             log would require too many examples. The a-priori knowledge is
machine) or on the current operations to be carried out (e.g., the              modeled through a set of weighted mappings between the poten-
instruction to use the machinery). Conversely, limited attention                tially recognizable environment entities (stored in the dictionary)
has been devoted to provide analytical reports that can be useful               and the cube multidimensional ones. Instead of proposing the
to let the user compare the current behavior of the visualized                  most relevant query only (called maximal query), A-BI proposes
objects with their historical behavior.                                         a set of alternative queries to the user; all of them are related
   This new goal opens relevant research challenges and revamps                 to the current context but they are different enough to offer to
many issues related to business intelligence and recommenda-                    the user different flavors of the same information. This phase is
tion systems [2]. Indeed, when working with high-dimensional                    implemented by the Diversification component.
contextual data (the multi-dimensional nature of the context is                    A-BI can be applied to different application domains ranging
well understood [3, 4]), identifying insightful queries and visual-             from healthcare to factories; for this reason, the main model-
izations is not trivial [5]: How is the perceived context mapped                ing choices underlying our approach (e.g., how to define the
to enterprise data? Which data is salient with respect to the user              relevance of an object in a context) have been formalized in a
analysis? How can data be retrieved in real time? How do users                  domain-independent fashion, while domain-dependent examples
interact with the retrieved information?                                        are provided in the context of AR in a factory where a smart
   To the best of our knowledge, none of the context-aware rec-                 device is coupled with a sensor data warehouse [6]. To better
ommender systems proposed in literature addresses the above                     understand the user/system interaction suppose that Bob, a ma-
questions with reference to situated analytics in general, and to               chinery inspector, is situated in a factory to investigate a malfunc-
augmented reality in particular. In this paper we envision and                  tioning packaging machine that —possibly due to overheating—
formalize a foundation for Augmented Business Intelligence (A-BI),              produces dented packages. Environmental data is gathered and
a framework empowering AR users with analytical information                     integrated from wearable and smart devices into a single context
under visualization and time constraints. The analytical informa-               representation. So, when looking at the machine, the AR hel-
tion returned comes in the form of reports obtained by running                  met Bob is wearing recognizes the machinery and, knowing the
                                                                                role of Bob, suggests the set of analytical queries that are more
© 2019 Copyright held by the author(s). Published in the Workshop Proceedings   relevant according both to a-priori knowledge and to previous
of the EDBT/ICDT 2019 Joint Conference (March 26, 2019, Lisbon, Portugal) on
CEUR-WS.org.
                                                                                feedbacks given by users in similar contexts. Relevant figures
                                                                                                         Analytical Report


                                                                                                                                        Log

                                                                                                               diversified queries
                            Data mart                                 Context generation
             Year
                               DeviceType                                                                     Diversification
           Month
                        Device
            Date
                                                                             Context                            maximal queries
                     MaintenanceActivity                         <Device, ConveyorBelt, 1> dist = 0.5m
                    Duration
                                                 Maint.Type      <Package, P2645, 0.3> dist = 0.8m                Context
                                                                 <Device, TempSensor, 1> dist = 1m             interpretation
                               DeviceType                        <Role, Inspector, 0.5>
                                            Building
                                                                 <Location, RoomA.1, 1> dist = 0m
             Time           Device
                                                                 <Date, 16/10/2018, 0.5>
                                            Room

                            Measurement
                                                               Dictionary
                    Value
                                                   Meas.Type


Figure 1: Overview of the A-BI framework; engaged entities are enclosed in dotted rectangles, while grayed elements are
outside the paper scope


could be the number of defective packages along time and the                         results [10] or databases [11], user interests [12], and social data
dates and types of previous maintenance activities for that ma-                      [13]. Given such heterogeneity, other contributions address the
chinery. Bob can either execute one of the proposed queries or                       integration of contextual data (e.g., [13–15]) to provide a common
express a new query to obtain a different report. Finally, Bob                       ground for further analyses and recommendations.
gives his feedback on the proposed queries, which is stored in                          The previous context types have been widely adopted in sev-
the log.                                                                             eral recommender system applications where the recommenda-
   To sum up, the main contributions of this paper are:                              tion process is activated by an explicit user-defined input state-
    (1) We envision an A-BI framework, its functional architec-                      ment (e.g., query or keywords). Examples of applications are
        ture, and the user/system interaction process;                               web query categorization [16, 17], recommendation [12, 18], and
    (2) We provide a method to model the a-priori system knowl-                      diversification [19]; query completion [8, 9]; localized web key-
        edge by mapping context entities to relevant multidimen-                     words suggestion [7]; and interactive exploration of databases
        sional elements;                                                             [20]. The two main differences between A-BI and these work are
    (3) We provide an efficient algorithm to generate relevant and                   (1) the multidimensional nature of the data handled and returned,
        diverse queries to be returned to the user;                                  and (2) the nature of the context as well as the type of user/system
    (4) We propose a collaborative filtering approach to let the                     interaction that triggers the recommendation. As to (1), multi-
        system learn from user feedbacks.                                            dimensional and hierarchical data support recommendations at
                                                                                     different granularities, which intrinsically makes finding the best
The remainder of the paper is organized as follows. Section 2                        recommendation more complex; as to (2), physical contexts re-
describes the related work in the field of context-aware recom-                      quire ad hoc solutions to choose the relevant context elements
mendation systems. Section 3 formalizes the A-BI framework.                          due to application specificities (e.g., engagement) and to the pos-
Section 4 explains how the context is used to derive a starting                      sibility of having in the context elements that are perceived but
query for the diversification process by also considering the user                   that are not relevant for the user.
feedbacks stored in the log, while Section 5 describes how rele-                        Recommender systems in business intelligence applications
vant and diverse queries are generated from that query. Section 6                    are well surveyed in [21]. Recommendations typically involve
includes the results of experimental tests that measure the per-                     multidimensional queries [22] or sessions [17] (i.e., query se-
formance of A-BI. Finally, Section 7 sums up our contribution                        quences) using query logs as contexts. These approaches are
and gives future research directions.                                                based on collaborative filtering techniques that do not synthesize
                                                                                     new queries from existing ones, but pick queries from the log
2    RELATED WORK                                                                    depending on their similarity score. Conversely, A-BI allows the
A-BI can be classified as a recommender system in the area of                        generation of queries not already present in the log by combin-
of business intelligence, based on a context made of augmented                       ing similar queries from the log and contextual information into
entities. Although the huge amount of work in each area, to the                      a set of diverse queries. This assumption collocates A-BI as a
best of our knowledge no approach lies at their intersection.                        hybrid approach to recommendation [21], differentiating A-BI
   Over the years, scholars have highlighted the importance of                       from the above-mentioned contributions in multidimensional
exploiting contextual information to provide focused recommen-                       recommendation systems. Note that diversification and multiple
dations with the nature of contexts being quite heterogeneous,                       recommendations are used to better meet user interests [23]. A
for instance space and time [7], query logs [8, 9], statistics on
further advantage of A-BI over pure collaborative filtering ap-         of queries that was first studied in [28]. A GPSJ query is composed
proaches is that A-BI does not suffer from the so-called cold start     of joins, selection predicates, and aggregations.
problem, since it is able to return an appropriate recommendation
                                                                           Definition 3.4 (Query). A query q on cube C = (H, M, ω) is a
even when the log is empty [24].
                                                                        triple q = (Gq , Pq , Mq ) where Gq is the query group-by set, i.e.,
    In the area of AR, contexts play an even more central role.
                                                                        a subset of levels in the hierarchies of H ; Pq is a set of Boolean
There, a context is the set of entities recognized in the environ-
                                                                        clauses whose conjunction defines the selection predicate for q;
ment that acts as situated stimulus (i.e., object properties) to be
                                                                        Mq ∈ M is the set of measures whose values are returned by q.
translated into inputs for a search query, is augmented with vir-
tual information, and is returned to the user [25]. Scholars focus         Example 3.5. With reference to the MaintenanceActivity cube,
on finding proper visualization of relevant information [25, 26],       the query asking for the average duration of maintenance ac-
with visualization typically embedded in physical objects. While        tivities for each month of 2018 and each device type is defined
[26] considers multidimensional data, it is not specified how           by
the process of information retrieval and analysis of data at mul-
                                                                                           Gq ={Month, DeviceType}
tiple level of details is carried out. Also, these approaches do
not include collaborative filtering to discover potentially useful                          Pq ={(Year = 2018)}
information. Interestingly, although [26] does not consider ana-                           Mq ={Duration}
lytical data, it introduces a mantra for situated analytics: “details                                                                     
first, analysis, then context-on-demand” which contradicts the
well-known mantra “overview first, zoom and filter, then details-          Enterprise cubes are the data source for the analytical reports
on-demand” [27] of classical visualization systems. Indeed, when        to be returned to users according to the environment as perceived
it comes to pure augmented visualizations, information is directly      by the AR device. The set of data possibly perceived are listed
attached to single entities [25, 26], assigning higher priority to      in a dictionary that, intuitively, defines the device capabilities.
details than to generic information.                                    These data are not limited to physical objects recognized in the
                                                                        environment through a pattern recognition process, but may
3   BASICS                                                              include user-related information such as the user role as well as
We start this section by introducing a formal setting to manipulate     environmental properties such as the room temperature.
multidimensional data; for simplicity we will work on a single             Definition 3.6 (Dictionary). A dictionary D is a set of keys,
cube and consider linear hierarchies.                                   each coupled with a domain of values (all domains include a
    Definition 3.1 (Hierarchy). A hierarchy is defined as a triple      NULL value). Each pair d = ⟨key, value⟩ such that value belongs
h = (Lh , ≽h , ≥h ) where:                                              to the domain of key is called an entry of D.
   (i) Lh is a set of categorical levels; each level l ∈ Lh is cou-        NULL values are used whenever the device through which the
       pled with a domain Dom(l) including a set of members (all        user perceives the environment successfully classifies an object
       domains are disjoint);                                           but is not capable of labelling the specific instance sensed (e.g., a
  (ii) ≽h is a roll-up total order of Lh ; and                          package is recognized but the package code cannot be read).
 (iii) ≥h is a part-of partial order of l ∈Lh Dom(l).
                                         Ð
                                                                           The power of the A-BI framework comes from the ability to
Exactly one level dim(h) ∈ Lh , called dimension, is such that          bind the perceived information to the cube md-elements. This
dim(h) ≽h l for each other l ∈ Lh . The part-of partial order is        capability is rooted in a-priori knowledge that specifies which
such that, for each couple of levels l and l ′ such that l ≽h l ′ ,     md-elements can be interesting for the user when a given dic-
for each member u ∈ Dom(l) there is exactly one member u ′ ∈            tionary entry is perceived. This knowledge is defined through a
Dom(l ′ ) such that u ≥h u ′ .                                          dictionary-to-cube mapping established by a domain expert at
                                                                        setup time.
    Definition 3.2 (Cube). A cube is defined as a triple C =
(H, M, ω) where:                                                           Definition 3.7 (Mapping). A mapping from dictionary D to
   (i) H is a set of hierarchies;                                       cube C is a (partial) multivalued function µ that maps each entry
  (ii) M is a set of numerical measures, with each measure              d of D to a set of md-elements of C. Each md-element e ∈ µ(d)
       m ∈ M coupled with one aggregation operator op(m) ∈              has a mapping weight, w(d, e) ∈ (0, 1].
       {sum, avg, . . .}; and                                              Although a discussion about how mappings can be established
 (iii) ω is a partial function that maps the tuples of members for      is out of the scope of this paper, we remark that this does not
       the dimensions of H to a numerical value for each measure        necessarily have to be done manually for all the cube members,
       m ∈ M.                                                           which would be tedious for attributes with large domains, but it
   The levels, members, and measures of the cube C are given            can be largely automated (for instance, by providing universally-
the generic name of md-elements of C.                                   quantified rules such as µ(⟨Device, value⟩) = {value} ∀value ∈
                                                                        Dom(Device)). The mapping function is multivalued since many
   Example 3.3. Let us consider cube MaintenanceActivity in our
                                                                        md-elements may be of interest for each dictionary entry; this
working example which, as sketched in Figure 1, includes hierar-
                                                                        typically happens for hierarchy levels, which can be all inter-
chies Date, Device, and MaintenanceType. Maintenance activities
                                                                        esting —even if with different values of w. For example, when
are described by measure Duration (coupled with the sum oper-
                                                                        some device is perceived, besides showing data for that specific
ator). A member of the Device attribute is PackagingMachine,
                                                                        device, also showing aggregated data for the device type could
while a member of MaintenanceType is Oiling.                  
                                                                        be interesting. Through mapping weights, domain experts give
   In the A-BI framework, cubes are queried through GPSJ (Gen-          an a-priori quantification of the interest of each md-element for
eralized Projection / Selection / Join) queries, a well-knows class     analyses when a given entry is part of the context.
   Example 3.8. The dictionary for our example includes, among            The image includes the set of md-elements relevant to a con-
the others, entries related to machines and their components           text according to the mapping, but it does not specify how they
(key = Device, value = ConveyorBelt, PackagingMachine, etc.),          will be used to generate the queries to be proposed to the user
and user roles (key = Role, value = Inspector, Operator, etc.).        when that context is sensed. Indeed, given an image, several
The mapping from this dictionary to the cube of Example 3.3            queries can be generated, each including a subset of the md-
may look like this (see Figure 1):                                     elements in the image. This is ruled by the definition of compatible
                                                                       query.
                µ(⟨Device, ConveyorBelt⟩) = {ConveyorBelt},
                       µ(⟨Role, Inspector⟩) = {Duration,                  Definition 4.4 (Compatible and Equivalent Query). A query
                                                                       q = ⟨Gq , Pq , Mq ⟩ is compatible with context T if it refers to at
                                                 MaintenanceType},
                                                                       least one of the md-elements in the image of T ; specifically,
                     µ(⟨Date, 16/10/2018⟩) = {Date},                        • a level l is referred by q if l ∈ Gq ;
w(⟨Device, ConveyorBelt⟩, ConveyorBelt) = 0.5                               • a measure m is referred by q if m ∈ Mq ;
                                                                            • a member u is referred to by q if (lev(u) = u) ∈ Pq and
                                                                              lev(u) ∈ Gq , being lev(u) the level whose domain includes
where the first line refers to a member, the second one to a mea-             u.
sure, the third and fourth ones to a level. The mapping weight
                                                                       With a slight abuse of notation, we will write e ∈ q to state that
of member ConveyorBelt when the conveyor belt device is per-
                                                                       q refers to md-element e. Let QT be the set of queries compatible
ceived is 0.5.                                                 
                                                                       with T ; the query equivalent to T is the query qeq (T ) ∈ QT that
                                                                       refers to all md-elements in I µ (T ).
4     CONTEXT INTERPRETATION
In this section we show how, given a set of perceived entities,        4.2    ...add the log...
A-BI produces the most relevant query, i.e., the one whose results
                                                                       The a-priori knowledge expressed through a mapping does not
would be more useful to the user.
                                                                       enable the system to learn by considering how user interests
                                                                       evolve, which instead could lead to picking different md-elements
4.1     Take the context...                                            when proposing queries or to choosing one of them more/less
The A-BI starting point is the context: a list of dictionary entries   frequently. To this end, A-BI exploits the history of previous
corresponding to the currently perceived environment entities.         interactions, stored in the query log, by means of a collaborative
More formally:                                                         filtering approach. The log stores, for each context, all the queries
                                                                       proposed to the user and the specific one chosen for execution.
   Definition 4.1 (Context). A context T over dictionary D is a
set of entries of D; each entry d ∈ T is coupled with a weight            Definition 4.5 (Log). A log L is an ordered list of triples ⟨T , q, f ⟩
w(T , d) ∈ (0, 1].                                                     where T is a context, q is a query, and f (feedback) is 1 if the user
                                                                       accepted the query, −1 if she rejected the query.
    The value of the weight for each entry may depend on different
factors, depending on the application domain. Non-perceived              Example 4.6. With reference to the context T proposed in
entities (i.e., for which it would be w(T , d) = 0) are not included   Example 4.3, a possible log is L = (⟨T , q 1 , −1⟩, ⟨T , q 2 , 1⟩), where
in the context. In our case study we assume that a subset of entries                    q 1 = ⟨{Date, Device},
are engaged, meaning that they have explicitly been indicated by
the user as being part of her current focus of interest; for these                             {(Device = ConveyorBelt)},
entries, the weight is always 1. For the other entries, the weight                             {Duration}⟩
is inversely proportional to the distance between the user and                          q 2 = ⟨{Month, Device},
the specific object being observed.
                                                                                               {(Device = ConveyorBelt)},
    Given a context, the mapping function identifies the relevant
md-elements, i.e., those that will be involved in the queries to be                            {Duration}⟩
issued against the cube.
    Definition 4.2 (Image). Given context T over dictionary D,         While q 1 has been rejected, q 2 (which is a roll-up of q 1 on the
cube C, and mapping µ from D to C, the image of T through µ            Date hierarchy) has been accepted.                               
is I µ (T ) = d ∈T µ(d), i.e., the set of md-elements of C that are
             Ð
                                                                          A log entry related to context T ′ should impact the recom-
mapped through µ to at least one entry in T .                          mendations related to the current context T only if the two con-
   Example 4.3. A possible context is the one depicted in Figure       texts are similar, since it is reasonable to assume that the user
1, where Bob is checking a conveyor belt placed in room A.1:           will have similar behaviors in similar contexts. Context simi-
                                                                       larity is computed as the similarity between their two equiva-
                 T = {⟨Device, ConveyorBelt, 1⟩,                       lent queries, sim(qeq (T ), qeq (T ′ )), which in turn is computed as
                      ⟨Role, Inspector, 0.5⟩,                          in [29]. The similarity function sim(q, q ′ ) between two queries
                      ⟨Date, 16/10/2018, 0.5⟩}                         q = ⟨Gq , Pq , Mq ⟩, q ′ = ⟨Gq ′ , Pq ′ , Mq ′ ⟩ combines three compo-
                                                                       nents, related to group-by sets, selection predicates, and mea-
The ConveyorBelt device is engaged. The image of T through µ           sures, respectively. In particular, the group-by set similarity con-
is                                                                     siders the distance between the levels involved in the query
                                                                       group-by sets; the selection similarity takes into account both
    I µ (T ) = {ConveyorBelt, Duration, MaintenanceType, Date}
                                                                       the levels and the members that form the selection predicates;
                                                                      the measure similarity is based on the Jaccard index. Finally, the
similarity between two queries is defined as the weighted average            4.3    ...get the queries
of the three similarity components. Following [29], we assume                In principle, given a context T , its equivalent query qeq (T )
the three components to be equally significant.                              might be directly proposed to the user. Unfortunately, equiv-
   Given log L, the image I µ (T ) of context T is extended to take          alent queries are often monster queries, i.e., quite complex queries
previous user interactions into account as follows. Let LT ⊆ L be            with very high cardinalities. A monster query would be obtained
the subset of triples whose context is similar to T :                        when the number of md-elements in the image is high because
           LT = {⟨T ′, q, f ⟩ | sim(qeq (T ), qeq (T ′ )) ≥ ϵ }              several entities were sensed in the environment so the context in-
                                                                             cludes a large number of entries. Unfortunately, monster queries
where ϵ is the similarity threshold. Then, I µ (T ) is extended with         are particularly undesirable in AR applications since:
all the md-elements referred to by the queries in LT :                            • High-cardinality queries take a long time to be computed,
           I µ∗ (T ) = I µ (T ) ∪ {e : ∃⟨T ′, q, f ⟩ ∈ LT | e ∈ q}                   transferred to the user smart device, and visualized.
                                                                                  • While working on the field, users must be quick and reac-
In this way, I µ∗ (T ) includes all the md-elements that are relevant                tive, while the results of monster queries are hard to be
to context T either according to the mapping or to the previous                      interpreted.
user experience. We define the log relevance to T of md-element              In the A-BI framework, monster queries are avoided in two ways:
e ∈ I µ∗ (T ) as the weighted number of times e has been accepted by         (i) by posing an upper bound γ to the query cardinality, and (ii)
the user (f = 1) over the number of times it has been proposed;              by considering only the most relevant md-elements in the image
weighting is based on the similarity between the current context             when generating the queries to be proposed to the user.
T and the considered log context T ′ :                                           As to (i), given query q = ⟨Gq , Pq , Mq ⟩, the expected cardinal-
                    1 + ⟨T ′,q, f ⟩ ∈LT (e),f =1 sim(qeq (T ), qeq (T ′ ))   ity of its result, denoted card(q), can be estimated as follows:
                        Í
    ρT (L, e) =
                      2 + ⟨T ′,q,f ⟩ ∈LT (e) sim(qeq (T ), qeq (T ′ ))
                          Í                                                                                        Ö
                                                                                           card(q) = card(Gq ) ×        selectivity(p)
                                                                                                                 p ∈Pq
where LT (e) is the subset of tuples in LT such that q refers to e.
To avoid relevance to be 0 when e has never been accepted, a                 where card(Gq ) is the cardinality of the query group-by set (es-
Laplace smoothing is applied in the formula above. Noticeably,               timated for instance using the Cardenas formula [30, 31]) and
the impact of Laplace smoothing decreases as the cardinality of              sel(p) is the selectivity of each simple predicate p belonging to
LT (e) increases, that is, the weight tends to 0 if several queries          Pq . Note that we can safely use this formula to estimate card(q)
referring e have been proposed but never accepted by the user.               because, as a consequence of the way we create queries in our
Conversely, it tends to 0.5 if only few queries referring e have             approach, all predicates in Pq are always external, i.e., they are
been proposed.                                                               expressed on levels that are less or equal to a level in Gq [32] in
   It is now possible to define the relevance to T of each md-               the roll-up order.
element e ∈ I µ∗ (T ) by taking into account, for each context entry d          As to (ii), before generating the maximal query for a context
that maps to e, not only the entry weight w(T , d) and the mapping           we remove from the context image I µ∗ (T ) all the md-elements e
weight w(d, e), but also the log relevance ρT (L, e):                        whose relevance relT (e) is below a given threshold η.
           (Í
                              w(T , d) · w(d, e), if LT (e) = ∅                 Definition 4.8 (Query Relevance and Maximal Query). Given
relT (e) = Íd ∈T |e ∈µ(d )                                                   context T , let QT be the set of queries that refer to any subset of
               d ∈T |e ∈µ(d ) w(T , d) · w(d, e) · ρT (L, e), otherwise
                                                                             md-elements in the image of T through mapping µ. The relevance
(note that log relevance is not considered —i.e., it does not affect         to T of a query q ∈ QT is defined as
the overall relevance— if e is never referred in a log query).
                                                                                                                e ∈q relT (e)
                                                                                                              Í
                                                                                                 relT (q) = Í
   Example 4.7. With reference to the image I µ (T ) from Exam-                                               e ∈I µ∗ (T ) relT (e)
ple 4.3 and to the log entries in Example 4.6, the extended image
                                                                             The maximal query for T is the query qmax (T ) ∈ QT that has
is
                                                                             maximum relevance among all those in QT such that card(q) ≤ γ .
              I µ∗ (T ) = {ConveyorBelt,
                                                                                Clearly, the number of queries in QT increases exponentially
                          Duration, MaintenanceType,                         with |T |. Finding the maximal query can be formalized as a Knap-
                          Date, Month}                                       sack problem on the md-elements in T , considering γ as the knap-
                                                                             sack capacity. The Knapsack problem is an NP-hard optimization
Month has been added to I µ (T ) as part of a query with a posi-             problem, so in Algorithm 1 we propose a greedy approach to
tive feedback. As to relT (Date) and relT (Month), assuming that             compute the maximal query in real time. Starting from the image,
the mapping weight w(d, Duration) = 1, and that the mapping                  we first include all the predicates to produce the smallest (i.e.,
weight for the remaining levels and members is set to 0.5, it is             most focused) result set and then, while the result set cardinality
                                relT (Device) = 0.25,                        is below the threshold, we incrementally extend the group-by set
                                                                             to produce progressively larger result sets.
                        relT (ConveyorBelt) = 0.25,
                                                                                Given the extended image I µ∗ of context T , the cardinality
                             relT (Duration) = 0.25,                         threshold γ , and the relevance threshold η, the algorithms works
                  relT (MaintenanceType) = 0.17,                             as follows. At first, in Line 1 we remove from I µ∗ the md-elements
                                relT (Month) = 0.13,                         whose relevance is below η. Then, we initialize the maximal
                                                                             query by selecting all the members in I µ∗ to create the conjunc-
                                   relT (Date) = 0.08
                                                                             tive predicate Pq (intuitively, this is the “most focused” predicate),
                                                                            by adding all the corresponding levels to the group-by set, and
Algorithm 1 Maximal query generation
Require: I µ∗ : extended image of context T , γ : cardinality threshold, η: relevance threshold
Ensure: qmax (T ): maximal query
 1: I µ∗ ← I µ∗ \ {e ∈ I µ∗ , relT (e) < η}                                   ◃ Drop low-relevance md-elements from the extended image
 2: Pq ← CreatePred({e ∈ I µ∗ , e is a member})◃ Create a predicate as the conjunction of IN clauses, each including all the members of
    the same level in the extended image
 3: G q ← {lev(e), e ∈ I µ∗ , e is a member}            ◃ Create a group-by set including the levels of all members in the extended image
 4: Mq ← {e ∈ I µ∗ , e is a measure}                                  ◃ Create a set of measures including all those in the extended image
 5: q ← ⟨G q , Pq , Mq ⟩                                                                               ◃ Initialize the current maximal query
 6: A ← {e ∈ I µ∗ \ G q , e is a level}                                                   ◃ Initialize a set of candidate levels to extend Gq
 7: q ′ ← q                                                                                                    ◃ New candidate maximal query
 8: while card(q ′ ) ≤ γ do                                                    ◃ While the candidate query cardinality is below threshold...
 9:      q ← q′                                                                                         ◃ ...update the current maximal query
10:      if A = ∅ then                                                                                 ◃ If the search space is not exhausted...
11:           break
12:      l ← arдmax e ∈A (relT (e))                                                               ◃ ...pick the most relevant candidate level,
13:      A ← A \ {l }                                                                             ◃ remove from the set of candidate levels,
14:      Gq ← Gq ∪ {l }                                                                                             ◃ add it to the group-by set,
15:      q ′ ← ⟨Gq , Pq , Mq ⟩                                                                                ◃ and update the candidate query
16: end while
17: return q


by adding all the measures in I µ∗ (Lines 2–5). Now we iterate to                                                                                 
progressively add to the group-by set new relevant levels taken
from the image, but only as long as the cardinality constraint is          5    QUERY DIVERSIFICATION
met (Lines 8–16). Finally, the last query with a cardinality below         Given a context T , the maximal query qmax (T ) is the more rel-
γ is returned to the user (Line 17). Note that, if the candidate           evant query that meets the cardinality constraint. Nonetheless,
query defined in line 5 violates the cardinality constraint γ , it is      to better meet the user’s desiderata, it may be useful to return
immediately returned since all the transformations that follow             a set of alternative queries that, though being still related to T ,
would further increase its cardinality. In this specific case, con-        are different enough from each other and from qmax (T ) to offer
straint enforcement will be ensured by the diversification phase           different flavors of the same information to the user. This gen-
(see Algorithm 2).                                                         eral idea is often practiced in the literature and referred to as
                                                                           diversification [23].
   Example 4.9. With reference to the context T from Exam-
                                                                              Given the set of queries QT compatible with T , the problem
ple 4.3 and to its extended image from Example 4.7, we ap-
                                                                           of diversification consists in finding the set of top-N queries
ply Algorithm 1 to generate the maximal query for T with
                                                                           R ⊆ QT that maximizes diversity and relevance with respect
γ = 20 and η = 0.1. At first, Date is pruned from I µ∗ since
                                                                           to user analyses [23]. The diversity div(q, R) of a query q with
its relevance is below η. Then Pq is initialized to {(Device =
                                                                           respect to R can be defined as in [33]
ConveyorBelt)}, Gq to {Device}, Mq to {Duration}; levels Main-
tenanceType and Month are stored in A for possible further inclu-                                       1 Õ
                                                                                           div(q, R) =         (1 − sim(q, q ′ ))
sion into Gq . The maximal query is initially set to ⟨Gq , Pq , Mq ⟩,                                  |R| ′
                                                                                                             q ∈R
with 1 being its cardinality. At this time, the algorithm at-
tempts to add MaintenanceType to the query group-by set.                   with div(q, R) = 1 when R is empty.
Let the cardinality of ⟨{Device, MaintenanceType}, {(Device =                 Finding the optimal set of diverse queries is NP-hard with
ConveyorBelt)}, {Duration}⟩ be 5. Since the cardinality of the             reference to the cardinality of QT which, in turn, is exponen-
candidate maximal query is below threshold, the current maxi-              tial in the image cardinality |I µ∗ (T )|. Considering the real-time
mal query is updated. The algorithm attemps to add Month to                constraint related to every AR application, we propose in the
the query group-by set. Since this increases the query cardinality         following a greedy approach that starting from qmax (T ) gener-
by a factor equal to the number of months being monitored in               ates progressively different queries. Diversification is obtained
the MaintenanceActivity cube, the cardinality of the new can-              through three OLAP-based generative primitives, each applied to
didate query ⟨{Device, MaintenanceType, Month}, {(Device =                 the previous query q = ⟨G, P, M⟩ to return a set of new queries:
ConveyorBelt)}, {Duration}⟩ will supposedly be larger than 20,                   • roll(q) returns a set of queries whose group-by set is
so the previous query ⟨{Device, MaintenanceType}, {(Device =                       coarser than G; each of these queries is obtained by re-
ConveyorBelt)}, {Duration}⟩ is returned. The SQL representa-                       placing in G one level l with its immediate successor
tion of the maximal query is                                                       in the roll-up partial order. For each q ′ ∈ roll(q), it is
                                                                                   card(q ′ ) ≤ card(q). Only queries whose predicates are all
    SELECT Device, MaintenanceType, sum(Duration)                                  external are returned.
    FROM MaintenanceActivity                                                     • drill(q) returns a set of queries whose group-by set is finer
    WHERE Device = ConveyorBelt                                                    than G; each of these queries is obtained by replacing in G
                                                                                   one level l with its immediate predecessor in the roll-up
    GROUP BY Device, MaintenanceType
                                                                                   partial order. For each q ′ ∈ roll(q), it is card(q ′ ) ≥ card(q).
Algorithm 2 Diversification                                              with the queries generated by Extend(q) (Line 11). Lines from
Require: qmax (T ): maximal query for context T , γ : cardinal-          4 to 11 are repeated until either the cardinality of the result set
    ity threshold, θ : diversity threshold, N : number of diverse        overcomes N or the search space Q is empty (Line 3).
    queries                                                                  Intuitively, Algorithm 2 explores the search space QT around
Ensure: R: diverse queries                                               qmax (T ) assuming that the lower the similarity between q ′ and
 1: R ← ∅                                              ◃ Result set      qmax (T ), the lower the relevance of q ′ . The queries added to the
 2: Q ← {qmax (T )}                                 ◃ Search space       result set R are no longer considered for differentiation aimed
 3: while (|R| < N ) ∧ (Q , ∅) do                                        at maximizing the probability of finding a a new different query
 4:     q ← arдmaxq ′ ∈Q (div(q ′, R))       ◃ Most diverse query        with high relevance: indeed, a query q ′′ obtained by extending
 5:     Q ← Q \ {q}                                                      q ′ ∈ R will probably be very similar to q ′ and less relevant. This
 6:     if (card(q) ≤ γ ) ∧ (div(q, R) ≥ θ ) then                        rule does not apply to qmax (T ), which is the starting point for
 7:         R ← R ∪ {q}                  ◃ q is added to the result      exploration and must be always expanded and added to R (except
 8:         if q = qmax (T ) then                                        the specific case in which it violates the cardinality constraint).
 9:             Q ← Q ∪ Extend(q)               ◃ Apply primitives          Example 5.2. Let qmax (T ) = ⟨{Device, Month}, {Device =
10:     else                                 ◃ Continue the search       ConveryorBelt}, {Duration}⟩ be the maximal query, with
11:         Q ← Q ∪ Extend(q)                   ◃ Apply primitives       card(qmax (T )) ≤ γ . At the first iteration, qmax (T ) (the only
12: end while                                                            query in the search space Q) is picked. Since div(qmax (T ), R) = 1
13: return R                                                             (R is empty), R is extended with qmax (T ), and Q is extended,
                                                                         among the others, with qr oll , qdr ill , and qslice (see Example 5.1).
                                                                         At the second iteration, let qdr ill be the most diverse query.
    • slice(q) returns a set of queries whose selection predicate        Since qdr ill is quite similar to the maximal query, it is likely
      is less selective than P; each of these queries is obtained        that div(qdr ill , R) is below θ . In this case, qdr ill is not added to
      by replacing one of the IN clauses in P, defined on a set of       R, and Q is extended by applying Extend(qdr ill ). The iteration
      members u 1 , . . . , un , with a clause on the member(s) that     continues until N diverse queries are generated, or the search
      are the immediate successors of u 1 , . . . , un in the part-of    space Q is exhausted.                                                  
      order. For each q ′ ∈ roll(q), it is card(q ′ ) ≥ card(q).
The queries returned by all these operators are progressively less       6    EXPERIMENTAL TESTS
similar to the maximal query, so their relevance is lower than the       In this section we evaluate the A-BI framework in terms of both
one of q. Besides, only queries compatible with the context are          effectiveness (to what extent the proposed queries meet the user’s
returned.                                                                interests) and efficiency. Tests were carried out on a synthetic
                                                                         benchmark since, in this work, we assume the problem of context
  Example 5.1. Given q = ⟨{Device, Month}, {Device =
                                                                         generation to be addressed by the smart device and, to the best
ConveryorBelt}, {Duration}⟩, examples of queries returned by
                                                                         of our knowledge, no AR open dataset exists.
our three primitives are, respectively,
                                                                             The user-system interaction is as follows. The user is moving
         qr oll = ⟨{Device, Year},                                       in a factory of 10 rooms, and, while moving, she collects one view
                   {(Device = ConveyorBelt)},                            of each room. In each view the smart device recognizes a set of
                                                                         entities that belong to the dictionary and lists them into a context.
                   {Duration}⟩
                                                                         Starting from each context, the A-BI framework proposes a set
        qdr ill = ⟨{Device, Date},                                       of queries to the user. Finally, the user either chooses one of
                   {(Device = ConveyorBelt)},                            the proposed queries or formulates an additional query that is
                   {Duration}⟩                                           slightly different from the ones proposed. After some time, the
                                                                         user ends her exploration of the factory and moves back through
        qsl ice = ⟨{Device, Month},
                                                                         the same rooms. Knowing the precedent behavior of the user, the
                   {(DeviceType = PackagingMachine)},                    A-BI framework exploits collaborative filtering to propose a new
                   {Duration}⟩                                           set of queries (generated by taking the query log into account)
                                                                         that better suit her interests. We denote with α the number of
                                                                    
                                                                         times the user has visited the factory (i.e., how many times each
    The diversification process is described in Algorithm 2. Given       context has already been sensed by the user’s devices before the
the maximal query qmax , the cardinality threshold γ , a diversity       current visit).
threshold θ , and the number of desired diverse queries N , the              This interaction is simulated by randomly generating 10 seed
set of diverse queries R is initialized to the empty set (Line 1)        contexts, each corresponding to a different room, in such a way
and the search space Q is initialized to qmax (Line 2). The most         that these contexts differ significantly from each other. Then, to
diverse query, q, is picked from the search space (Line 4); if its       simulate multiple visits of each room, small context variations are
cardinality is below γ and its diversity from the queries in R is        generated starting from each seed (specifically, one for each visit;
higher than θ , then q is added to R (Lines 6–7). Note that at the       the idea is that each room is perceived with slight differences
first step, when the search space contains only qmax (T ), function      in each visit). The number of entities recognized in each room
Extend(q) is invoked to apply our three primitives to qmax (T )          (i.e., the context cardinality) ranges between 5 and 15. Each test
(Lines 8–9); specifically, roll(q) is alway applied since it decreases   is repeated 10 times and the average behavior is considered.
the query cardinality, while drill(q) and slice(q) are applied only          We executed our tests against a cube including 5 linear hi-
if card(q) ≤ γ . Otherwise, if either card(q) is too high or q is not    erarchies with 5 levels of details each. Each dimension has 64
diverse enough from the queries in R, the search space is extended       members, determining a maximum cube cardinality of about 109 .
                    Table 1: Notation summary                              6.1    Effectiveness
                                                                           Figure 3a evaluates the benefit of diversification and of collab-
  Notation           Meaning                                               orative filtering by showing how similar qmax , qdiv , and qloд
  T                  Context (corresponds to a view of a factory           are to qu for different values of β and θ . Clearly, the lower β, the
                     room)                                                 less similar qu from qmax . Diversification comes to the rescue,
  qmax               Maximal query                                         providing at least one diverse query qdiv that is closer to the
  qu                 User query                                            user’s interests. However, even if a high diversity threshold is set
  qdiv               Diverse query most similar to qu                      (e.g., θ = 0.2), the diverse queries are not sensibly closer to qu ,
  qloд               Log query most similar to qu                          as they deviate too much from queries with high relevance. The
  α ∈ [0, 8]         Number of times the user has already seen             closest query in the log, qloд , is always more similar to qu than
                     a context                                             qdiv .
  β ∈ [0.6, 1]       Similarity between qu and qmax                            In Figure 3b we focus on diversification by showing that the
  γ = 100            Query cardinality threshold                           higher the number of diverse queries N , the higher the values
  ϵ = 0.5            Context similarity threshold                          of sim(qu , qdiv ), hence, the higher the effectiveness of diversi-
  η = 0.5            Relevance threshold                                   fication. Finally, Figure 3c shows the capability of collaborative
  θ ∈ [0.05, 0.2]    Diversity threshold                                   filtering to meet the user’s desiderata even when the user query
  N ∈ [2, 8]         Number of diverse queries generated                   qu is quite different from the maximal one. Indeed, the closest
                                                                           query in the log converges towards the user one after a few user
                                                                           feedbacks are collected. Already after one user feedback, the sim-
                                                   qu                      ilarity between qloд and qu sensibly rises, being very close to 1
                             qdiv                                          when 4 feedbacks are collected.
                                      β

                        θ                                                  6.2    Efficiency
                            qmax                                           We ran the tests on a machine equipped with Intel(R) Core(TM)
                                                                           i7-6700 CPU @ 3.40GHz CPU and 16GB RAM, with the A-BI
                                                                           framework being implemented in Scala.
                                                                              The efficiency results are depicted in Figure 4, with the overall
Figure 2: The user query qu , the maximal query qmax , and                 execution time being in the order of 10−1 seconds at most. The
the set of diverse queries (in blue); among them, qdiv is the              execution of Algorithm 2 accounts for most computational time.
                                   (a)
one most similar to qu                                                     Indeed, the process of diversification is computationally heavy,
                                             qu                            requiring to find, among the search space, the query with the
                                        qlog
                                                                           highest diversity with reference to the current result set (with
                                                                           O(|Q |) being the complexity of Line 4 in Algorithm 2 in case Q
                                                  qmax
The dictionary includes qone key for each md-element      (i.e., we        is not sorted). Note that the set of diverse queries is increasingly
                            div
assume the smart device can recognize each single element of the           built, thus the time to first result is lower since, once a query is
cube); each dictionary entry d is one-to-one mapped to the corre-          added to result set, it can be directly returned. We also varied
sponding md-element e with mapping weight w(d, e) randomly                 the number of entries in the context (|T | ∈ [5, 15]), but since this
ranging in [0.2, 1]. Additionally, for each context, we call               does not significantly affect performances we do not show the
     • qmax the maximal query generated by Algorithm 1.                    results.
     • qu the query chosen by the user.
                                      (b)  We denote with β the               We finally emphasize that the execution time corresponds to
       similarity between the user query and the maximal query             the time necessary to define the recommended queries, and not
       (β = sim(qu , qmax )). The lower the value of β, the higher         to the time to actually execute them. Indeed, the execution of the
       the difference between the user and maximal queries; if             query is demanded to and strictly depends on the performance
       β = 1, the user exactly chooses the maximal query.                  of the enterprise data warehouse system.
     • qdiv the query most similar to qu among those generated
       by the diversification process, when the log is ignored.
     • qloд the query most similar to qu among those generated             7     CONCLUSION
       by the diversification process, when the log is taken into          The A-BI framework is a first result in the direction of establishing
       account. At each visit, 10 contexts are shown to the user;          a tight connection between analytical reporting and augmented
       so, after each iteration, 10 contexts are added to the log          reality applications. Besides proposing a reference functional
       with the corresponding user choices.                                architecture and interaction process, in this paper we have shown
At the first visit (α = 0), the log is empty so qloд ≡ qdiv . In theory,   that recommendations can be given in real-time.
qdiv should be more similar to qu than qmax since it results from             A-BI can be improved from many points of view, in particu-
diversification, and qloд should further improve over qdiv since           lar we are working towards producing recommendations that
it also uses the log.                                                      are based on patterns of recognized context entities rather than
    A notation summary is provided in Table 1. To help the reader          on single entities (e.g., md-element e is relevant if the context
understand the role of each query, a visual representation of              includes d and d ′ but not d ′′ ). Furthermore, in its current im-
qmax , qdiv , and qloд is depicted in Figure 2, where the query            plementation, a recommendation involves all the elements in
space is represented as a Cartesian plane with Euclidean distances.        the context while it would be useful to provide separate recom-
                                                                           mendations related to subsets of elements. Finally, it would be
                                   = 0.05                                                = 0.1                                                            = 0.2
      1.00
      0.95
      0.90
sim


      0.85
      0.80             qmax
                       qdiv
      0.75             qlog
      0.70
          1.0               0.9     0.8          0.7           0.6 1.0         0.9       0.8          0.7                     0.6 1.0           0.9       0.8             0.7   0.6

                      (a) Average similarity of qmax , qd iv , and ql oд with qu for decreasing β and increasing θ ( |T | = 10, N = 4, α = 1)


                                   qmax                                                                            1.00
                       0.78        qdiv
                                                                                                                   0.95


                                                                                                   sim(qu, qlog)
                       0.76                                                                                        0.90
                sim


                       0.74                                                                                        0.85                                           = 0.6
                                                                                                                                                                  = 0.7
                       0.72                                                                                        0.80                                           = 0.8
                              2           4               6              8                                                0     1       2   3     4   5    6       7      8
                                                 N
  (b) Average similarity of qmax and qd iv with qu for increasing                         (c) Average similarity of qu and ql oд at increasing iterations for
  N (β = 0.6, θ = 0.1, |T | = 10)                                                         decreasing β (θ = 0.1, N = 4, |T | = 10)

                                                                         Figure 3: Effectiveness
                                             |T| = 10
                                  Maximal query                                            [8] Nodira Khoussainova, YongChul Kwon, Magdalena Balazinska, and Dan Suciu.
                                  Diversification                                              Snipsuggest: Context-aware autocompletion for SQL. PVLDB, 4(1):22–33,
                                                                                               2010.
                                                                                           [9] Raphaël Thollot, Nicolas Kuchmann-Beauger, and Marie-Aude Aufaure. Se-
          Time (s)


                     10 2                                                                      mantics and usage statistics for multi-dimensional query expansion. In Proc.
                                                                                               DASFAA, pages 250–260, Busan, South Korea, 2012.
                                                                                          [10] Marie Le Guilly, Jean-Marc Petit, and Vasile-Marian Scuturici. SQL query
                                                                                               completion for data exploration. CoRR, abs/1802.02872, 2018.
                                                                                          [11] Manasi Vartak, Sajjadur Rahman, Samuel Madden, Aditya G. Parameswaran,
                                                                                               and Neoklis Polyzotis. SEEDB: efficient data-driven visualization recommen-
                     10 3                                                                      dations to support visual analytics. PVLDB, 8(13):2182–2193, 2015.
                                                                                          [12] Houssem Jerbi, Franck Ravat, Olivier Teste, and Gilles Zurfluh. Preference-
                        0.05          0.10              0.15         0.20                      based recommendations for OLAP analysis. In Proc. DaWaK, pages 467–478,
                                                                                               Linz, Austria, 2009.
                                                                                          [13] Rafael Berlanga Llavori and Victoria Nebot. Context-Aware Business Intelli-
                                                                                               gence, pages 87–110. 2015.
          Figure 4: Efficiency for increasing values of θ                                 [14] Kaijian Xu, Manli Zhu, Daqing Zhang, and Tao Gu. Context-aware content
                                                                                               filtering & presentation for pervasive & mobile information systems. In Proc.
                                                                                               ICST, page 20, Quebec, Canada, 2008.
                                                                                          [15] Wenwei Xue, Hung Keng Pung, Paulito P. Palmes, and Tao Gu. Schema
interesting to investigate on how to turn A-BI into a purely sta-                              matching for context-aware computing. In Proc. UbiComp, pages 292–301,
tistical framework where all weights are expressed in terms of                                 Seoul, Korea, 2008.
probabilities and reasoning is probabilistic too.                                         [16] Minmin Chen, Jian-Tao Sun, Xiaochuan Ni, and Yixin Chen. Improving
                                                                                               context-aware query classification via adaptive self-training. In Proce. CIKM,
                                                                                               pages 115–124, Glasgow, United Kingdom, 2011.
REFERENCES                                                                                [17] Julien Aligon, Enrico Gallinucci, Matteo Golfarelli, Patrick Marcel, and Stefano
 [1] Ronald Azuma. A survey of augmented reality. Presence, 6(4):355–385, 1997.                Rizzi. A collaborative filtering approach for recommending OLAP sessions.
 [2] Angelo Croatti and Alessandro Ricci. Towards the web of augmented things.                 Decision Support Systems, 69:20–30, 2015.
     In Proc. ICSA, pages 80–87, Gothenburg, Sweden, 2017.                                [18] Xiaohui Yan, Jiafeng Guo, and Xueqi Cheng. Context-aware query recommen-
 [3] Gediminas Adomavicius, Ramesh Sankaranarayanan, Shahana Sen, and                          dation by learning high-order relation in query logs. In Proc. CIKM, pages
     Alexander Tuzhilin. Incorporating contextual information in recommender                   2073–2076, Glasgow, United Kingdom, 2011.
     systems using a multidimensional approach. ACM Trans. Inf. Syst., 23(1):103–         [19] Di Jiang, Kenneth Wai-Ting Leung, Jan Vosecky, and Wilfred Ng. Personalized
     145, 2005.                                                                                query suggestion with diversity awareness. In Proc. ICDE, pages 400–411,
 [4] Kostas Stefanidis, Evaggelia Pitoura, and Panos Vassiliadis. A context-aware              Chicago, USA, 2014.
     preference database system. Int. J. Pervasive Computing and Communications,          [20] Kyriaki Dimitriadou, Olga Papaemmanouil, and Yanlei Diao. AIDE: an active
     3(4):439–460, 2007.                                                                       learning-based approach for interactive data exploration. IEEE Trans. Knowl.
 [5] Ibrahim A. Ibrahim, Abdullah M. Albarrak, and Xue Li. Constrained recom-                  Data Eng., 28(11):2842–2856, 2016.
     mendations for query visualizations. Knowl. Inf. Syst., 51(2):499–529, 2017.         [21] Patrick Marcel and Elsa Negre. A survey of query recommendation techniques
 [6] S. Dobson, M. Golfarelli, S. Graziani, and S. Rizzi. A reference architecture and         for data warehouse exploration. In Proc. EDA, pages 119–134, Clermont-
     model for sensor data warehousing. IEEE Sensors Journal, 18(18):7659–7670,                Ferrand, France, 2011.
     2018.                                                                                [22] Arnaud Giacometti, Patrick Marcel, Elsa Negre, and Arnaud Soulet. Query
 [7] Shuyao Qi, Dingming Wu, and Nikos Mamoulis. Location aware keyword                        recommendations for OLAP discovery-driven analysis. IJDWM, 7(2):1–25,
     query suggestion based on document proximity. IEEE Trans. Knowl. Data Eng.,               2011.
     28(1):82–97, 2016.
[23] Kaiping Zheng, Hongzhi Wang, Zhixin Qi, Jianzhong Li, and Hong Gao. A
     survey of query result diversification. Knowl. Inf. Syst., 51(1):1–36, 2017.
[24] Elsa Negre, Franck Ravat, Olivier Teste, and Ronan Tournier. Cold-start
     recommender system problem within a multidimensional data warehouse. In
     Proc. RCIS, pages 1–8, 2013.
[25] Wolfgang Büschel, Annett Mitschick, and Raimund Dachselt. Here and now:
     Reality-based information retrieval: Perspective paper. In Proc. CHIIR, pages
     171–180, New Brunswick, USA, 2018.
[26] Neven A. M. ElSayed, Bruce H. Thomas, Kim Marriott, Julia Piantadosi, and
     Ross T. Smith. Situated analytics: Demonstrating immersive analytical tools
     with augmented reality. J. Vis. Lang. Comput., 36:13–23, 2016.
[27] Ben Shneiderman. The eyes have it: A task by data type taxonomy for infor-
     mation visualizations. In Proc. IEEE Symposium on Visual Languages, pages
     336–343, Boulder, USA, 1996.
[28] Ashish Gupta, Venky Harinarayan, and Dallan Quass. Aggregate-query pro-
     cessing in data warehousing environments. In Proc. VLDB, pages 358–369,
     1995.
[29] Julien Aligon, Matteo Golfarelli, Patrick Marcel, Stefano Rizzi, and Elisa Turric-
     chia. Similarity measures for OLAP sessions. Knowl. Inf. Syst., 39(2):463–489,
     2014.
[30] Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A.
     Lorie, and Thomas G. Price. Access path selection in a relational database
     management system. In Proc. SIGMOD, pages 23–34, Boston, USA, 1979.
[31] Matteo Golfarelli and Ettore Saltarelli. The workload you have, the workload
     you would like. In Proc. DOLAP, pages 79–85, New Orleans, USA, 2003.
[32] Stefano Rizzi and Ettore Saltarelli. View materialization vs. indexing: Balancing
     space constraints in data warehouse design. In Proc. CAiSE, pages 502–519,
     2003.
[33] Jaime G. Carbonell and Jade Goldstein. The use of MMR, diversity-based
     reranking for reordering documents and producing summaries. In Proc. SIGIR,
     pages 335–336, Melbourne, Australia, 1998.