A Compilation Technique for Interactive
         Ontology-mediated Data Exploration∗

      Medina Andresel, Magdalena Ortiz de la Fuente, and Mantas Šimkus

                                   TU Wien, Austria


       Abstract We make a step towards the interactive exploration of data
       in the ontology-mediated setting, presenting a technique to construct an
       offline compilation that allows us to answer efficiently different related
       queries in online phase, without the need to access again the original data.
       We also propose algorithms to construct relevant variations of a given
       query that, for example, reduce or increase the number of answers, while
       modifying the query in a minimal way, or make explicit the common
       properties of all objects that are in the answers to a given query.


1    Introduction
Ontology based data-access (OBDA) [9], understood broadly as the use of ontolo-
gies to mediate access to data sources and to leverage domain knowledge when
querying possibly incomplete data, has become a key application of Description
Logics (DLs) and a central topic of research in the last decade. However, most
work focuses on providing algorithms for answering queries, and the lack of
support to users when they formulate queries or process their query answers has
been recognized as an obstacle to the wider adaptation of OBDA [1].
    A major gap of existing systems is that they only support answering queries
one-at-a-time. Given multiple variations of a query, answers are computed from
scratch for each one of them, accessing the data every time. Moreover, it is left to
the users to find and formulate the query that precisely captures their information
needs. Inspired by existing powerful database management tools such as Online
Analytical Processing (OLAP), where data may be pre-processed and stored for
interactive data analysis, we are interested in compiling relevant information in
an offline-phase, in order to support efficient online answering of related queries,
as well as making modifications to queries to help users explore their answers.
    We consider ontologies in DL-LiteR [3], the DL that underlies the OWL 2
QL profile [7]. We focus on conjunctive queries (CQs) that are tree-shaped and
have exactly one answer variable. We assume that two initial queries are given,
which we call the lower- and upper-bound query. Intuitively, the lower bound
query simply says which kind of objects the user is interested in; it may be very
general, like a query that retrieves all persons, or all courses. The upper bound
query (which need not have any answers) gathers all properties of interest that
these objects may potentially have. We develop a technique for compiling a data
∗
    Funded by FWF projects T515, W1255 and P25207.
structure such that, without accessing the data again, we can efficiently get the
answers to any query that asks for objects of the desired type, with different
combinations of properties. We also use this compilation to obtain (minimal)
modifications of a query that retrieve strictly more or strictly less answers, as well
as to obtain the most specialized query that retrieves exactly the same answers
but makes explicit all their shared properties. A prototype implementation of our
algorithms shows promising results, pointing to adequate functionality to support
users in their attempts to understand datasets in the presence of ontologies.

2      Preliminaries
We briefly recall the syntax and semantics of DL-LiteR . As usual, NC , NR and
NI are countably infinite sets of concept names, role names, and individuals. The
set of roles is NR = NR ∪ {P − | P ∈ NR }, and R− denotes the inverse of the role
R. Basic concepts are concept names A, and expressions of the form ∃R with
R a role. A TBox is a set of axioms of the forms R1 ⊑ R2 , R1 ⊑ ¬R2 , C1 ⊑ C2 ,
and C1 ⊑ ¬C2 , where R1 , R2 are roles and the C1 , C2 basic concepts. An ABox
is a finite set of assertions of the form A(b), ¬A(b) and P (b, c), ¬P (b, c), where
A ∈ NC , P ∈ NR and b, c ∈ NI . We denote by Ind(A) the set of individuals
appearing in A. A knowledge base (KB) has the form K = ⟨A, T ⟩ for A an ABox
and T a TBox. The semantics of DL KBs is defined in terms of interpretations
I = (∆I , ·I ) with ∆I the domain and ·I the usual interpretation function. We
use the usual definitions and notations for satisfaction of axioms, assertions,
TBoxes and ABoxes, models of a KB, and entailment.
     We also recall the syntax and semantics of conjunctive queries (CQs) over
DL KBs. A CQ has the form q(x) :- ∃y.ϕ, where ϕ is a conjunction of atoms
of the form A(t) and R(t, t′ ), where each term t, t′ is either an individual or a
variable from x ∪ y. The variables in x are called answer variables, and term(q)
denotes the set of terms occurring in q. Given a CQ q and an interpretation I,
we call a partial match any function π mapping terms of q to objects in ∆I , and
we write I π A(t) for a concept atom A(t) if π(t) ∈ AI , and I π R(t1 , t2 ) for
role atom R(t1 , t2 ) ∈ q if (π(t1 ), π(t2 )) ∈ RI . A match for q in I is a partial
match with domain term(q) such that I π α for every atom α in q. A tuple of
individuals t is a certain answer to q(x) over KB K if for each I model of K
there exists a match π for q in I s.t. (t)I = π(x). We use cert(q, K) to denote
the set of certain answers of q over K. It is well known that if a DL-LiteR KB
K = ⟨T , A⟩ is consistent, it possesses a canonical model I T ,A such that for any
                                                                        T ,A
CQ q(x), cert(q, K) = {t | I T ,A π q, π(x) = t} [3]. Its domain ∆I         consists of
all words aR1 . . . Rn (n ≥ 0), where a ∈ Ind(A) and each Ri is a role such that
    −                                          T ,A
∃Ri−1    ⊑T ∃Ri . We let AnonObj = ∆I               \ Ind(A) be the set of anonymous
                                           T ,A
objects. The interpretation function ·I          is as follows:
    T ,A                 T ,A
 aI      =a           AI      = {a | T , A  A(a)} ∪ {aR1 . . . Rn | n ≥ 1, ∃Rn− ⊑T A}
     T ,A
PI          ={(a, b) | R(a, b) ∈ A and R ⊑T P } ∪ {(b, a) | R(a, b) ∈ A and R ⊑T P − }
              ∪ {(w1 , w2 ) | w2 = w1 S, S ⊑T P } ∪ {(w2 , w1 ) | w2 = w1 S, S ⊑T P − }


                                              2
We can rely on I T ,A for query answering, since we can assume that KB consistency
has been tested in advance, and queries trivialize over inconsistent KBs.


3      Answering Related Queries
In this paper we only consider CQs of a restricted form, which we call 1treeCQs.
A 1treeCQs is tree-shaped (i.e., its primal graph is a tree), and has exactly one
answer variable, which is its root. We denote this variable x. We use a special
subquery relation between 1treeCQs:
Definition 1 (Subquery). Given a DL-LiteR TBox T and two 1treeCQs q1 and
q2 , we call q1 subquery of q2 (w.r.t. T ), written q1 FT q2 , iff term(q1 ) ⊆ term(q2 )
and (i) for each atom R1 (t1 , t2 ) ∈ q1 there exists R2 (t1 , t2 ) or R2− (t2 , t1 ) ∈ q2
such that R2 ⊑T R1 ; and (ii) for each atom C1 (t) ∈ q1 there exists C2 (t) ∈ q2
such that C2 ⊑T C1 . In this case, we also call q2 a superquery of q1 (w.r.t. T ).
      The following claim is straightforward:

Proposition 1. Let K be a DL-LiteR KB. For any two given 1treeCQs q1 , q2
such that q1 FT q2 , we have that cert(q2 , K) ⊆ cert(q1 , K).

3.1     Compiling the Answers of Related Queries
For a given DL-LiteR KB K := ⟨T , A⟩, let qL , qU be two 1treeCQs such that
qL FT qU holds. We call them the lower bound query and the the upper bound
query, respectively. Any 1treeCQ q such that qL FT q FT qU is called in-between
query. We use the term 1treeCQs family to refer to the set of 1treeCQs containing
a pair of qL and qU , together with all the in-between queries. Intuitively, the lower
bound is a general query that retrieves all objects the user could be interested
in, while the upper bound is a query (which may have no answers) that gathers
properties that those objects could potentially have; the family of 1treeCQs
contains queries that ask for combinations of those objects.

Example 1. Our examples are based on the well-known LUBM ontology and
describe the domain of universities [5]. If we are interested in exploring the
properties of the employees in the database, we can use as lower bound qL (x) :
−Employee(x), and as upper bound a complex query with many potential proper-
ties of employees, i.e., qU (x) : −Dean(x), Prof(x), FullProf(x), teacherOf(x, y1 ),
Course(y1 ), headOf(x, y2 ), Dep(y2 ), autPub(x, y3 ), Pub(y3 ). Some of the in-
between queries in the induced family are discussed in Section 5, and depicted in
Figures 5 and 6.

    In what follows, we assume a given pair of qL and qU , and a KB K = ⟨T , A⟩.
The goal of this section is to build a structure that stores all information needed
for answering any query in the 1treeCQ family, without having to look up A.
    To this aim, we introduce the notion of matching witness. Intuitively, it
stores all objects in the canonical model that may be relevant to a match of an


                                            3
in-between query. We start from the answers to qL (since any answer to an in-
between query must also be an answer to qL ), and from there, we consider partial
matches for pairs of terms t1 , t2 that occur together in an atom R(t1 , t2 ) ∈ qU .
For each object in the range of a partial match, we store some labels that store
the concept and roles the object satisfies that may be relevant to the matches.
                                                  T ,A
     We introduce some notation. For a ∈ ∆I , MSconc (a, K) denotes the set of
                                                 T ,A
most specialized concepts satisfied by a in I         , containing each concept name A
such that I T ,A  A(a) and there is no A′ ̸= A with A′ ⊑T A and I T ,A  A′ (a).
                                 T ,A
Similarly, for a pair a, b ∈ ∆I , the set of all most specialized roles satisfied
by (a, b) in I T ,A is denoted MSrole (a, b, K) and contains each role R such that
I T ,A  R(a, b) and there is no R′ ̸= R with R′ ⊑T R and I T ,A  R′ (a, b). Given
a partial match π := [t1 7→ o1 , t2 7→ o2 ], a role label R is relevant w.r.t. π iff there
exists R′ (t1 , t2 ) ∈ qU with R ⊑T R′ or R′ ⊑T R. Similarly, a concept label C
is relevant (w.r.t π) iff there exists C ′ (t2 ) ∈ qU with C ⊑T C ′ or C ′ ⊑T C. For
each pair t1 , t2 of query terms that occur together in some atom R(t1 , t2 ) ∈ qU ,
                      T ,A
and each o1 ∈ ∆I , we define a set of matching candidates that contains each
o2 such that if o1 is a match for t1 in I T ,A , then o2 may be a match for t2 .
Definition 2 (Matching candidates). Let {t1 , t2 } ⊆ term(qU ) be such that
there is some R(t1 , t2 ) in qU , and let o1 ∈ Ind(A) ∪ AnonObj. The set of
matching candidates for t2 relative to t1 7→ o1 , denoted MCt1 7→o1 (t2 ), is:
 I. if t2 ∈ Ind(A), then
      1. if there is some R ∈ MSrole (o1 , t2 , K) that is relevant w.r.t. [t1 7→ o1 , t2 7→
         t2 ], then MCt1 7→o1 (t2 ) := {t2 },
     2. otherwise, MCt1 7→o1 (t2 ) := ∅.
II. if t2 ∈ vars(qU ), then
      1. if o1 ∈ Ind(A), then MCt1 7→o1 (t2 ) := candA ∪ cando1 R where:

          candA ={o2 | T , A  R(o1 , o2 ), R′ ⊑T R, R′ (t1 , t2 ) ∈ qU }
        cando1 R ={o1 R | T , A  ∃R(o1 ), R is relevant w.r.t [t1 7→ o1 , t2 7→ o1 R]}

     2. if o1 := wR ∈ AnonObj, then MCt1 7→o1 (t2 ) := candwRS ∪ candw where:

        candwRS ={wRS | T  ∃R− ⊑ ∃S, S relevant w.r.t [t1 7→ wR, t2 7→ wRS]}
           candw ={w | T  R− ⊑ S, R′ ⊑T S, R′ (t1 , t2 ) ∈ qU }

We also define label strings that store the concepts and roles for partial matches:
Definition 3 (Label String). For a partial match π, we define labels(π) as:
 I. if π is of the form [t 7→ o], where t ∈ term(qU ) and o ∈ Ind(A) ∪ AnonObj,
    then labels(π) = ⟨{C | C relevant w.r.t. π}⟩
II. if π is of the form [t1 7→ o1 , t2 7→ o2 ] with o2 ∈ MCt1 7→o1 (t2 ), then labels(π) =
    ⟨Rlbl , Clbl ⟩, where Clbl = {C | C ∈ MSconc (o2 , K) relevant w.r.t. π}, and Rlbl
    is defined as follows:
     A. if o1 , o2 ∈ Ind(A) or o1 is of the form o2 R, then Rlbl = {R | R ∈
         MSrole (o1 , o2 , K) and R is relevant w.r.t π}


                                             4
      B. if o2 = o1 R, then Rlbl = {R}

    Now we are ready to build matching witnesses by iteratively constructing
relevant partial matches and storing them together with their label strings:

Definition 4 (Matching Witness). Let a1 , . . . , an be an arbitrary enumer-
ation of the individuals that are a certain answer to qL . We define the root
matching witness as wroot = ⟨[labels(x 7→ a1 )a1 , . . . labels(x 7→ an )an ]⟩.
    For a given term t ∈ term(qU ) and an object o ∈ Ind(A) ∪ AnonObj, the
matching witness for t and o is defined as follows: wot = ⟨valuest1 , . . . , valuestk ⟩,
where {t1 , . . . , tk } = {t′ | there exists R(t, t′ ) ∈ qU }, and valuesti , 1 ≤ i ≤ k, is
constructed as follows:

  if MCt7→o (ti ) = {o1 . . . om }, then valuesti = [φ1 o1 , . . . , φm om ], where φj =
   labels(t 7→ o, ti 7→ oj ), 1 ≤ j ≤ m.
  In particular, if MCt7→o (ti ) = ∅, then valuesti = [ε] (ε is the empty string).

We denote by W the set obtained by starting from x and wroot , and then recursively,
for each child t′ of the current term t and each object o in I T ,A that occurs in
                                               ′
the range of a witness, adding the witness wot to W.


3.2    Compilation-based Query Answering

W gathers all information that we need to answer any query in the 1treeCQ family.
In fact, to decide whether an individual is an answer to an arbitrary in-between
query, it suffices to recursively check the following satisfaction conditions. Below,
for t ∈ term(q), we denote by q t the query obtained by restricting q to atoms
that involve only terms that are descendants of t in q.

Definition 5 (Satisfaction). Let t ∈ term(qU ) and q FT qU t . An object
o ∈ Ind(A) ∪ AnonObj satisfies q (w.r.t. W) iff:

  if t = x, then o is an answer to qL , and for each C ′ (x) ∈ q there exists
   C ∈ labels(x 7→ o) s. t. C ⊑T C ′
  if t is a leaf in (the tree-structure of) q, then wot = ⟨⟩ ∈ W

Additionally, the following condition must hold: for each t′ child of t in q there
exists o′ ∈ MCt7→o (t′ ) such that labels(t 7→ o, t′ 7→ o′ ) = ⟨Rlbl , Clbl ⟩ satisfies: for
each R′ (t, t′ ) ∈ q there exists R ∈ Rlbl such that R ⊑T R′ and for each C ′ (t′ ) ∈ q
there exists C ∈ Clbl such that C ⊑T C ′ ; and o′ satisfies q t′ (w.r.t. W).

    This correctly captures the answers of any query in the 1treeCQ family:

Theorem 1 (Correctness of compilation-based QA). For each in-between
1treeCQ q, we have: cert(q, K) = {a ∈ Ind(A) | a satisfies q (w.r.t.W)}.


                                             5
4     Query Modifications
Given a query, we identify interesting variations of it that could be relevant to a
user. For example, we identify the minimal ways to modify the current query so
that it provides more or less answers. We are also interested in identifying all the
common properties that our set of answers has, that is, the possible additions to
our query that, while making it syntactically more restrictive, would still retrieve
the same set of answers. In the experiments in the next section, we will illustrate
the usefulness of these query modifications on several examples.
    Algorithm 1.1: neutralSpecialization
   Input: W the set of matching witnesses, q in-between query
   Output: Q the set of neutral specializations of q
 1 S := {};
 2 foreach a ∈ ans(q, W) do
 3     S := S ∪ {q | q x-maximal for a};
 4 Q := ∅;
   /* compute intersection of each tuple in the n-fold Cartesian product over S */
 5 intersect(S,Q,q,∅,0);
 6 return Q;


    Algorithm 1.2: intersect
   Input: S = {Q1 , . . . , Qn } set of sets of queries, Qneutral collects the neutral
           specializations, q in-between query, qinter intersection query, k the index
           of current set in S
 1 if k == n then
 2     Qneutral := Qneutral ∪ {qinter } ;
 3 else
 4     foreach query q ′ ∈ Qk do
 5        if q FT q ′ then
 6             if qinter == ∅ then
 7                  qinter := q ′ ;
 8             else
 9                  qinter := qinter ∩T q ′ ;
10             intersect(S,Qneutral ,q,qinter ,k + 1)

    Algorithm 1.3: strictSpecialization
   Input: W the set of matching witnesses, q in-between query
   Output: Q set of strict specializations of q
 1 Q := ∅ ;
 2 foreach a ∈ ans(q, W) do
 3    foreach q1 x-maximal for a do
 4        foreach q2 max. neutral spec. of q do
 5           if q2 FT q1 then
 6                qdif f := q1 \ q2 ;
 7                foreach query atom A ∈ qdif f do
 8                     if q2 ∪ {A} is 1treeCQ then
 9                          Q := Q ∪ {q2 ∪ {A}} ;
10 return Q;


                                           6
 Algorithm 1.4: minGeneralization
   Input: W the set of matching witnesses, q in-between query
   Output: Q the set of minimal generalizations for q
 1 Q := ∅;
 2 foreach a ∈ ans(qL , W) and a ∈      / ans(q, W) do
 3    foreach q ′ x-maximal for a do
 4         qinter := ∅ ;
 5         if q ′ FT q then
 6               Q := Q ∪ q ′ ;
 7         else
 8               qinter := q ∩T q ′ ;
 9               if qinter ̸= ∅ then
10                    Q := Q ∪ qinter ;
11 return Q;


4.1     Construction of Maximal Queries

Intuitively, the set of constructed matching witnesses contains information regard-
ing how much of the upper-bound query we have matched in the canonical model,
for each a ∈ Ind(A) - possible answer. Hence we can read from the witnesses,
for each possible answer a, the set of maximal queries in the family for which a
is a match. We will see that these maximal queries provide the key information
to suggest the user the desired query modifications. We remark that there can
be multiple such maximal queries for the same individual a.

 Definition 6 (t-Maximal Query). Let q t be a 1treeCQ rooted in t, where
 t ∈ term(qU ), s.t. q t FT qU t . A query q is said to be t-maximal for a ∈
 Ind(A) ∪ AnonObj iff: (a) a satisfies q t w.r.t. W, and (b) for each q ′ t , q t
̸= q ′ t , s.t. q t FT q ′ t FT qU t , a does not satisfy q ′ t w.r.t. W.

   In what follows, we use ans(q, W) to denote the answers for q over W, which
are the individuals a such that a satisfies q w.r.t. W (see Theorem 1).


4.2     Specializations and Generalizations

We use the term query specialization to refer to any superquery of a given q in a
1treeCQ family. Indeed, superqueries ‘specialize’ the query by requiring additional
or more specific properties to hold. We further distinguish between two kinds of
specializations: the ones that retrieve the same answers as the given q, and the
ones that retrieve strictly less answers.

Definition 7 (Neutral Specialization). We call q2 a neutral specialization
of q1 if q1 FT q2 and ans(q2 , W) = ans(q1 , W).

      Trivially, any in-between 1treeCQ is a neutral specialization of itself.

Definition 8 (Strict Specialization). We call q2 a strict specialization of q1
if q1 FT q2 and ans(q2 , W) ( ans(q1 , W), with ans(q2 , W) ̸= ∅.


                                           7
    Conversely, any subquery of a given 1treeCQ q is less constrained than q, and
retrieves at least the same answers. We are interested in those subqueries that
retrieve strictly more answers, which we call query generalizations.
    The first query modification problem we consider is to construct the maximal
neutral specializations of a query. Intuitively, the maximal neutral specialization
includes all most specialized additional constraints that we can add to the query
without modifying its answers, and hence it makes explicit all the properties that
are common to all the answer individuals.
Definition 9 (Maximal Neutral Specialization). A maximal neutral spe-
cialization for q w.r.t. ans(q, W) is a neutral specialization q ′ such that, for each
q ′′ s.t. q ′′ ̸= q ′ and q ′ FT q ′′ , we have ans(q ′′ , W) ̸= ans(q, W).
    Algorithm 1.1 computes neutral specializations for a given in-between query
q, by intersecting maximal queries for each a ∈ ans(q, W). Binary operation ∩T
on 1treeCQs keeps only those query atoms that are subsumed in both 1treeCQs.
One can show that by dropping all subqueries from the neutral specializations
we obtain the maximal neutral specializations.
    The second query modification problem we consider is to find the minimal
strict specializations of a given query, that is, which are the minimal modifications
to the query that will result in a smaller set of answers.
Definition 10 (Minimal Strict Specialization). A minimal strict specializa-
tion for q is a strict specialization q ′ such that, for each q ′′ with q FT q ′′ FT q ′ ,
q ′′ is a neutral specialization of q.
    It can be the case that q cannot be strictly specialized without loosing all
answers. In this case we say that q’s only strict specialization is qU . One can obtain
the minimal strict specializations for a given in-between query by discarding all
superqueries from the set of strict specializations, constructed by Algorithm 1.3.
    Conversely a query generalization refers to any subquery of a given q in a
1treeCQ family that has strictly more answers. Finally, we consider the problem
of finding the minimal generalizations of given query, that is, how to minimally
relax the query to retrieve more answers.
Definition 11 (Minimal Generalization). Let q be an in-between 1treeCQ. A
minimal generalization or relaxation for q is a generalization q ′ such that for
each q ′′ , that is q ′ FT q ′′ FT q, it holds that: ans(q ′′ , W) = ans(q, W)
    It might be the case that an in-between 1treeCQ q cannot be generalized
if ans(q, W) = ans(qL , W). Hence, whenever q is a neutral specialization of qL
it doesn’t have any generalizations. Algorithm 1.4 constructs all the minimal
generalizations for a given in-between query.

5      Implementation and Experiments
The matching witness solution is implemented in Java (jdk1.8), using additional
libraries. For ontology loading we used the OWLAPI java library1 . For ABox
 1
     http://owlapi.sourceforge.net/


                                            8
                                                                          Seconds                                                                                          Seconds                                                                                                               Seconds
                                                                                                                35
                                                                                2.50

                                                                                                                                                                                 0.60
                                                                                                                30                                                                                                        7,000                                                                        30.00
                    2,000                                                       2.00

#Possible Answers


                                                                                            #Possible Answers


                                                                                                                                                                                           #Possible Answers
                                                                                                                25                                                                                                                                                                                     25.00
                                                                                                                                                                                                                          6,000
                                                                                1.50                                                                                             0.40
                    1,500
                                                                                                                20                                                                                                                                                                                     20.00

                                                                                                                                                                                                                          5,000
                                                                                1.00
                                                                                                                15                                                                                                                                                                                     15.00
                    1,000                                                                                                                                                        0.20

                                                                                                                                                                                                                          4,000
                                                                                0.50                            10                                                                                                                                                                                     10.00

                     500
                        0.5         1        1.5        2         2.5                                                 0.5             1          1.5           2                                                                  2           2.5           3         3.5      4        4.5
                                        #Matching Witnesses        ·104                                                               #Matching Witnesses          ·104                                                                                #Matching Witnesses            ·104


                                        (a) Batch1                                                                                        (b) Batch2                                                                                                   (c) Batch3

                                                                                        Figure 1: Performance of QA
                       4.25                                                                                           1.5                                                                                                         75
                                   Max. Neutral                                                                                   Max. Neutral                                                                                                      Max. Neutral
                       3.75         Min. Gen                                                                                       Min. Gen                                                                                                          Min. Gen
                                                                                                                     1.25
                                    Min. Strict                                                                                    Min. Strict                                                                                    60                 Min. Strict

                            3
                                                                                                                        1
   Time [seconds]


                                                                                              Time [seconds]


                                                                                                                                                                                                         Time [seconds]
                                                                                                                                                                                                                                  45
                       2.25
                                                                                                                     0.75


                        1.5                                                                                                                                                                                                       30
                                                                                                                      0.5


                       0.75                                                                                          0.25
                                                                                                                                                                                                                                  15


                            0                                                                                           0                                                                                                             5
                            400          1,000      1,500        2,000        2,500                                              10         15         20     25          30         35                                               3,000         4,000           5,000     6,000      7,000        8,000

                                             #Possible Answers                                                                               #Possible Answers                                                                                                  #Possible Answers


                                        (a) Batch1                                                                                    (b) Batch2                                                                                                      (c) Batch3
                                                                 Figure 2: Performance for query modifications
                                  Dep                                               Dep                                                                       ResearchGroup                                                                                                 Institute
                                   x                                                   x                                                                                  x                                                                                                    x

                                                                 member                                                                     member                                         subOrgOf
         member                          subOrgOf                                     subOrgOf                                                                                                                                                        member
                                                                                                                                                                               orgPub                                                                                                 subOrgOf
                                                                 FullProf       y1 y2 Univ
                                  y1 y2                                                                                                                y1                 y3                  y2                                                                             y1 y2
                                                                                teacherOf
                    (a) In-bet. query                                                                                                                       (c) In-bet. query                                                                                       (d) Min. Gen.
                                                                              z1

                                                                 (b) Max. Neutr. Spec.
                                          Figure 3: Example of query modifications for in-between queries
                                        Student                                                                              Student                                                      GraduentStudent, ResearchAssistant
                                             x                                                                                    x                                                                                                                             x

                                                                                                                                                                          takesCourse                                                                                                 advisor
             worksFor                                   advisor              memberOf                                                            advisor
                                                                                                                                                                                                                                  worksFor

                                    y3              y4                                                                      y3              y4                                       y1                                                            y3                                            y4
                                                                                                                                                                     Course                                                                    Organization                                   Professor
                    (a) In-between query                                                    (b) Min. Gen.
                                                                                                                                                                                          (c) Max. Neutral Spec.

                                                                                           GraduentStudent, ResearchAssistant
                                                                                                                                                 x

                                                                          takesCourse                                                                                          advisor
                                                                                                                            worksFor

                                                                                       y1                                                 y3                                  y4
                                                                        Course                                                        Organization                    AssistantProfessor

                                                                                                                     (d) Min. Strict Spec.
                                        Figure 4: Example of obtained modifications for in-between query

reasoning we used the Ontop [10] since it efficiently provides complete answers
for DL-LiteR . However, Ontop was specially tailored for OBDA and does not
provide full support when the data is not stored using relational databases. So


                                                                                                                                                 9
                                       Employee                               Employee    Employee
                                          x                                      x             x
                           teacherOf              autPub          teacherOf                        autPub
                                   y1           y3                              y1            y3

                            (a) In-between query                 (b) Minimal Generalizations
                     Faculty                                         Faculty                             Professor
                        x                                                x                                   x

 teacherOf                       autPub         teacherOf                         autPub                              autPub
                            worksFor                                         headOf   teacherOf                  worksFor

             y1        y2       y3                          y1           y2          y3             y1      y2           y3
    Course             Dep      Pub               Course           Dep         Pub        Course            Dep          Pub

    (c) Max. Neutral Spec.                                                   (d) Min. Strict Spec.
                  Figure 5: Example of obtained modifications for in-between query
                    Employee                                     Employee                                FullProfessor
                       x                                             x                                        x

 teacherOf                      autPub        teacherOf                         autPub     teacherOf                       autPub
              headOf                                      worksFor                                                headOf

       y1             y2          y3                 y1            y2             y3               y1        y2              y3
                                                                                          Course         Department      Publication
       (a) In-between query                    (b) Minimal Generalization
                                                                                          (c) Max. Neutral Specialization
                  Figure 6: Example of obtained modifications for in-between query

we used HermiT2 for TBox reasoning, which captures more expressive DLs but is
not as efficient over DL-LiteR . To boost the performance we used multi-threading
for computing the matching witnesses as well as for query answering.
Specifications. The entire evaluation was done on a server using 7 CPUs (each
core running at 2.3GHz), 10 GB RAM, operating system Ubuntu Server amd64,
java version Java SE Runtime Environment 7.
Ontology. As ontology we used LUBM [5] - a well known ontology suitable
for testing, especially since it provides a data generator, and its domain is easy
to comprehend and relatively simple. To match our DL fragment, we used a
DL-LiteR version previously used in [6].
Datasets. We used the LUBM generator3 to compute the data sets. The datasets
size vary between ≈ 3.5 MB and ≈ 19 MB.
Evaluation measurements. We evaluated 3 batches of 1treeCQs families de-
fined by 3 pairs of bound queries that can be accessed online 4 . With this simple
prototype implementation, the time for constructing the compilation varies from
minutes (for small-sized data sets) up to few hours (for largest data set). We
hope to decrease this time dramatically in an improved version that, among
others optimizations, does not rely on expensive calls to external reasoners. We
measured the number of possible answers for the 1treeCQs family, the number of
matching witnesses, the average time for answering an in-between query, and for
each query modification, the average time per in-between query.
2
  http://www.hermit-reasoner.com/
3
  http://swat.cse.lehigh.edu/projects/lubm/
4
  https://github.com/medinaAnd/MatchingWitnesses


                                                                  10
    Figure 1 reflects the results of query answering procedure for each batch.
There is a clear dependence on the number of possible answers for the query
answering performance. Figure 2 shows the time performance for each query
modification on each batch. Given that the modifications call the query answering
procedure, the efficiency depends on the number of possible answers.
    To illustrate the usefulness of the obtained query modifications, we present
in Figures 3, 4, 5, and 6 several variations of some in-between queries obtained
with our algorithms. The relevant fragment of the ontology is given below:
    ResearchGroup ⊑T Institute      headOf ⊑T worksFor       AssProf ⊑T Prof
       worksFor ⊑T memberOf             Prof ⊑T Faculty      FullProf ⊑T Prof
          Faculty ⊑T Employee
    Note that query modifications depend also on the data, not only on the
ontology. E.g., for the maximal neutral specialization 5c, no axiom in the ontology
states that each employee that is a head of something is also a full professor.

6    Related Work

Works that aim at assisting users formulate queries include Quelo [4], the Faceted
Search Interface SemFacet [2], and the Optique virtual query formulation system
(VQS) [11]. Quelo and SemFacet both provide interfaces in which users can
build tree-shaped queries like the ones considered here, which can be specialized
by adding additional atoms, or generalized by dropping them. Moreover, the
interfaces have an ontological reasoning layer that is used to find, for example,
relevant specializations to suggest to the user, similarly to our work. However,
unlike our work, those interfaces do not aim at compiling data to support online
answering of different query variations. Instead, they use off-the-shelf reasoners,
and query answering is then done from scratch for each query variation. Another
closely related work is presented in [8], where the authors focus on information
overload and study ways to specialize a query. They consider EL and arbitrary
CQs. We focus only on tree-shaped CQs and DL-Lite, but consider also other
reasoning problems.

7    Conclusions

In this paper we contributed to the theoretical foundations for compilation-
based query answering and construction of useful query modifications in support
of data exploration. Our prototype shows overall good performance, however
implementation can be further improved. Many questions remain open for future
work. A natural next step is to explore how one can extend the solution to other
DLs. It seems that this can be done at least for Horn-DLs, since a canonical
model can be constructed. Another interesting but challenging question is how to
extend the solution in such way that it considers queries that are not tree-shaped.
This would be indeed very valuable, and we believe it may be possible.


                                        11
References
 1. Ahmet, S., Evgeny, K., Dmitriy, Z., Ernesto, J.R., Martin, G., Ian, H.: Ontology-
    based visual query formulation: An industry experience. In: Proceedings of the
    11th International Symposium on Visual Computing (ISVC 2015). Springer, Las
    Vegas, Nevada, USA (2015), http://www.ahmetsoylu.com/wp-content/uploads/
    2015/10/Soylu_et_al_ISVC2015.pdf
 2. Arenas, M., Cuenca Grau, B., Kharlamov, E., Marciuska, S., Zheleznyakov, D.:
    Faceted search over ontology-enhanced rdf data. In: Proceedings of the 23rd ACM
    International Conference on Conference on Information and Knowledge Management.
    pp. 939–948. CIKM ’14 (2014)
 3. Calvanese, D., Giacomo, G.D., Lembo, D., Lenzerini, M., Rosati, R.: Tractable
    reasoning and efficient query answering in description logics: The dl-lite family. J.
    OF AUTOMATED REASONING p. 2007 (2007)
 4. Franconi, E., Guagliardo, P., Tessaris, S., Trevisan, M.: Quelo: an ontology-driven
    query interface (2011)
 5. Guo, Y., Pan, Z., Heflin, J.: Lubm: A benchmark for owl knowledge base systems.
    Web Semant. 3(2-3), 158–182 (Oct 2005), http://dx.doi.org/10.1016/j.websem.
    2005.06.005
 6. Kontchakov, R., Rezk, M., Rodriguez-Muro, M., Xiao, G., Zakharyaschev, M.:
    Answering SPARQL queries over databases under OWL 2 QL entailment regime.
    In: Proc. of International Semantic Web Conference (ISWC 2014). Lecture Notes
    in Computer Science, Springer (2014)
 7. Motik, B., Grau, B.C., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C.: Owl 2 web
    ontology language: Profiles. World Wide Web Consortium, Working Draft WD-
    owl2-profiles-20081202 (December 2008)
 8. Nutakki, P.: Specializing conjunctive queries in the EL-family for better compre-
    hension of result sets. Master’s thesis, TU Dresden (2011)
 9. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.:
    Linking data to ontologies. In: Spaccapietra, S. (ed.) Journal on Data Semantics
    X, Lecture Notes in Computer Science, vol. 4900, pp. 133–173. Springer Berlin
    Heidelberg (2008), http://dx.doi.org/10.1007/978-3-540-77688-8_5
10. Rodriguez-Muro, M., Kontchakov, R., Zakharyaschev, M.: Ontology-based data
    access: Ontop of databases. In: International Semantic Web Conference (1). Lecture
    Notes in Computer Science, vol. 8218, pp. 558–573. Springer (2013)
11. Soylu, A., Kharlamov, E., Zheleznyakov, D., Jimenez-Ruiz, E., Giese, M., Horrocks,
    I.: Optiquevqs: Visual query formulation for obda. In: Proceedings of the 27th
    International Workshop on Description Logics (DL 2014). vol. 1193, pp. 725–728.
    CEUR-WS.org, Vienna, Austria (2014), http://ceur-ws.org/Vol-1193/paper_88.
    pdf


                                           12