A Compilation Technique for Interactive Ontology-mediated Data Exploration∗ Medina Andresel, Magdalena Ortiz de la Fuente, and Mantas Šimkus TU Wien, Austria Abstract We make a step towards the interactive exploration of data in the ontology-mediated setting, presenting a technique to construct an offline compilation that allows us to answer efficiently different related queries in online phase, without the need to access again the original data. We also propose algorithms to construct relevant variations of a given query that, for example, reduce or increase the number of answers, while modifying the query in a minimal way, or make explicit the common properties of all objects that are in the answers to a given query. 1 Introduction Ontology based data-access (OBDA) [9], understood broadly as the use of ontolo- gies to mediate access to data sources and to leverage domain knowledge when querying possibly incomplete data, has become a key application of Description Logics (DLs) and a central topic of research in the last decade. However, most work focuses on providing algorithms for answering queries, and the lack of support to users when they formulate queries or process their query answers has been recognized as an obstacle to the wider adaptation of OBDA [1]. A major gap of existing systems is that they only support answering queries one-at-a-time. Given multiple variations of a query, answers are computed from scratch for each one of them, accessing the data every time. Moreover, it is left to the users to find and formulate the query that precisely captures their information needs. Inspired by existing powerful database management tools such as Online Analytical Processing (OLAP), where data may be pre-processed and stored for interactive data analysis, we are interested in compiling relevant information in an offline-phase, in order to support efficient online answering of related queries, as well as making modifications to queries to help users explore their answers. We consider ontologies in DL-LiteR [3], the DL that underlies the OWL 2 QL profile [7]. We focus on conjunctive queries (CQs) that are tree-shaped and have exactly one answer variable. We assume that two initial queries are given, which we call the lower- and upper-bound query. Intuitively, the lower bound query simply says which kind of objects the user is interested in; it may be very general, like a query that retrieves all persons, or all courses. The upper bound query (which need not have any answers) gathers all properties of interest that these objects may potentially have. We develop a technique for compiling a data ∗ Funded by FWF projects T515, W1255 and P25207. structure such that, without accessing the data again, we can efficiently get the answers to any query that asks for objects of the desired type, with different combinations of properties. We also use this compilation to obtain (minimal) modifications of a query that retrieve strictly more or strictly less answers, as well as to obtain the most specialized query that retrieves exactly the same answers but makes explicit all their shared properties. A prototype implementation of our algorithms shows promising results, pointing to adequate functionality to support users in their attempts to understand datasets in the presence of ontologies. 2 Preliminaries We briefly recall the syntax and semantics of DL-LiteR . As usual, NC , NR and NI are countably infinite sets of concept names, role names, and individuals. The set of roles is NR = NR ∪ {P − | P ∈ NR }, and R− denotes the inverse of the role R. Basic concepts are concept names A, and expressions of the form ∃R with R a role. A TBox is a set of axioms of the forms R1 ⊑ R2 , R1 ⊑ ¬R2 , C1 ⊑ C2 , and C1 ⊑ ¬C2 , where R1 , R2 are roles and the C1 , C2 basic concepts. An ABox is a finite set of assertions of the form A(b), ¬A(b) and P (b, c), ¬P (b, c), where A ∈ NC , P ∈ NR and b, c ∈ NI . We denote by Ind(A) the set of individuals appearing in A. A knowledge base (KB) has the form K = ⟨A, T ⟩ for A an ABox and T a TBox. The semantics of DL KBs is defined in terms of interpretations I = (∆I , ·I ) with ∆I the domain and ·I the usual interpretation function. We use the usual definitions and notations for satisfaction of axioms, assertions, TBoxes and ABoxes, models of a KB, and entailment. We also recall the syntax and semantics of conjunctive queries (CQs) over DL KBs. A CQ has the form q(x) :- ∃y.ϕ, where ϕ is a conjunction of atoms of the form A(t) and R(t, t′ ), where each term t, t′ is either an individual or a variable from x ∪ y. The variables in x are called answer variables, and term(q) denotes the set of terms occurring in q. Given a CQ q and an interpretation I, we call a partial match any function π mapping terms of q to objects in ∆I , and we write I π A(t) for a concept atom A(t) if π(t) ∈ AI , and I π R(t1 , t2 ) for role atom R(t1 , t2 ) ∈ q if (π(t1 ), π(t2 )) ∈ RI . A match for q in I is a partial match with domain term(q) such that I π α for every atom α in q. A tuple of individuals t is a certain answer to q(x) over KB K if for each I model of K there exists a match π for q in I s.t. (t)I = π(x). We use cert(q, K) to denote the set of certain answers of q over K. It is well known that if a DL-LiteR KB K = ⟨T , A⟩ is consistent, it possesses a canonical model I T ,A such that for any T ,A CQ q(x), cert(q, K) = {t | I T ,A π q, π(x) = t} [3]. Its domain ∆I consists of all words aR1 . . . Rn (n ≥ 0), where a ∈ Ind(A) and each Ri is a role such that − T ,A ∃Ri−1 ⊑T ∃Ri . We let AnonObj = ∆I \ Ind(A) be the set of anonymous T ,A objects. The interpretation function ·I is as follows: T ,A T ,A aI =a AI = {a | T , A  A(a)} ∪ {aR1 . . . Rn | n ≥ 1, ∃Rn− ⊑T A} T ,A PI ={(a, b) | R(a, b) ∈ A and R ⊑T P } ∪ {(b, a) | R(a, b) ∈ A and R ⊑T P − } ∪ {(w1 , w2 ) | w2 = w1 S, S ⊑T P } ∪ {(w2 , w1 ) | w2 = w1 S, S ⊑T P − } 2 We can rely on I T ,A for query answering, since we can assume that KB consistency has been tested in advance, and queries trivialize over inconsistent KBs. 3 Answering Related Queries In this paper we only consider CQs of a restricted form, which we call 1treeCQs. A 1treeCQs is tree-shaped (i.e., its primal graph is a tree), and has exactly one answer variable, which is its root. We denote this variable x. We use a special subquery relation between 1treeCQs: Definition 1 (Subquery). Given a DL-LiteR TBox T and two 1treeCQs q1 and q2 , we call q1 subquery of q2 (w.r.t. T ), written q1 FT q2 , iff term(q1 ) ⊆ term(q2 ) and (i) for each atom R1 (t1 , t2 ) ∈ q1 there exists R2 (t1 , t2 ) or R2− (t2 , t1 ) ∈ q2 such that R2 ⊑T R1 ; and (ii) for each atom C1 (t) ∈ q1 there exists C2 (t) ∈ q2 such that C2 ⊑T C1 . In this case, we also call q2 a superquery of q1 (w.r.t. T ). The following claim is straightforward: Proposition 1. Let K be a DL-LiteR KB. For any two given 1treeCQs q1 , q2 such that q1 FT q2 , we have that cert(q2 , K) ⊆ cert(q1 , K). 3.1 Compiling the Answers of Related Queries For a given DL-LiteR KB K := ⟨T , A⟩, let qL , qU be two 1treeCQs such that qL FT qU holds. We call them the lower bound query and the the upper bound query, respectively. Any 1treeCQ q such that qL FT q FT qU is called in-between query. We use the term 1treeCQs family to refer to the set of 1treeCQs containing a pair of qL and qU , together with all the in-between queries. Intuitively, the lower bound is a general query that retrieves all objects the user could be interested in, while the upper bound is a query (which may have no answers) that gathers properties that those objects could potentially have; the family of 1treeCQs contains queries that ask for combinations of those objects. Example 1. Our examples are based on the well-known LUBM ontology and describe the domain of universities [5]. If we are interested in exploring the properties of the employees in the database, we can use as lower bound qL (x) : −Employee(x), and as upper bound a complex query with many potential proper- ties of employees, i.e., qU (x) : −Dean(x), Prof(x), FullProf(x), teacherOf(x, y1 ), Course(y1 ), headOf(x, y2 ), Dep(y2 ), autPub(x, y3 ), Pub(y3 ). Some of the in- between queries in the induced family are discussed in Section 5, and depicted in Figures 5 and 6. In what follows, we assume a given pair of qL and qU , and a KB K = ⟨T , A⟩. The goal of this section is to build a structure that stores all information needed for answering any query in the 1treeCQ family, without having to look up A. To this aim, we introduce the notion of matching witness. Intuitively, it stores all objects in the canonical model that may be relevant to a match of an 3 in-between query. We start from the answers to qL (since any answer to an in- between query must also be an answer to qL ), and from there, we consider partial matches for pairs of terms t1 , t2 that occur together in an atom R(t1 , t2 ) ∈ qU . For each object in the range of a partial match, we store some labels that store the concept and roles the object satisfies that may be relevant to the matches. T ,A We introduce some notation. For a ∈ ∆I , MSconc (a, K) denotes the set of T ,A most specialized concepts satisfied by a in I , containing each concept name A such that I T ,A  A(a) and there is no A′ ̸= A with A′ ⊑T A and I T ,A  A′ (a). T ,A Similarly, for a pair a, b ∈ ∆I , the set of all most specialized roles satisfied by (a, b) in I T ,A is denoted MSrole (a, b, K) and contains each role R such that I T ,A  R(a, b) and there is no R′ ̸= R with R′ ⊑T R and I T ,A  R′ (a, b). Given a partial match π := [t1 7→ o1 , t2 7→ o2 ], a role label R is relevant w.r.t. π iff there exists R′ (t1 , t2 ) ∈ qU with R ⊑T R′ or R′ ⊑T R. Similarly, a concept label C is relevant (w.r.t π) iff there exists C ′ (t2 ) ∈ qU with C ⊑T C ′ or C ′ ⊑T C. For each pair t1 , t2 of query terms that occur together in some atom R(t1 , t2 ) ∈ qU , T ,A and each o1 ∈ ∆I , we define a set of matching candidates that contains each o2 such that if o1 is a match for t1 in I T ,A , then o2 may be a match for t2 . Definition 2 (Matching candidates). Let {t1 , t2 } ⊆ term(qU ) be such that there is some R(t1 , t2 ) in qU , and let o1 ∈ Ind(A) ∪ AnonObj. The set of matching candidates for t2 relative to t1 7→ o1 , denoted MCt1 7→o1 (t2 ), is: I. if t2 ∈ Ind(A), then 1. if there is some R ∈ MSrole (o1 , t2 , K) that is relevant w.r.t. [t1 7→ o1 , t2 7→ t2 ], then MCt1 7→o1 (t2 ) := {t2 }, 2. otherwise, MCt1 7→o1 (t2 ) := ∅. II. if t2 ∈ vars(qU ), then 1. if o1 ∈ Ind(A), then MCt1 7→o1 (t2 ) := candA ∪ cando1 R where: candA ={o2 | T , A  R(o1 , o2 ), R′ ⊑T R, R′ (t1 , t2 ) ∈ qU } cando1 R ={o1 R | T , A  ∃R(o1 ), R is relevant w.r.t [t1 7→ o1 , t2 7→ o1 R]} 2. if o1 := wR ∈ AnonObj, then MCt1 7→o1 (t2 ) := candwRS ∪ candw where: candwRS ={wRS | T  ∃R− ⊑ ∃S, S relevant w.r.t [t1 7→ wR, t2 7→ wRS]} candw ={w | T  R− ⊑ S, R′ ⊑T S, R′ (t1 , t2 ) ∈ qU } We also define label strings that store the concepts and roles for partial matches: Definition 3 (Label String). For a partial match π, we define labels(π) as: I. if π is of the form [t 7→ o], where t ∈ term(qU ) and o ∈ Ind(A) ∪ AnonObj, then labels(π) = ⟨{C | C relevant w.r.t. π}⟩ II. if π is of the form [t1 7→ o1 , t2 7→ o2 ] with o2 ∈ MCt1 7→o1 (t2 ), then labels(π) = ⟨Rlbl , Clbl ⟩, where Clbl = {C | C ∈ MSconc (o2 , K) relevant w.r.t. π}, and Rlbl is defined as follows: A. if o1 , o2 ∈ Ind(A) or o1 is of the form o2 R, then Rlbl = {R | R ∈ MSrole (o1 , o2 , K) and R is relevant w.r.t π} 4 B. if o2 = o1 R, then Rlbl = {R} Now we are ready to build matching witnesses by iteratively constructing relevant partial matches and storing them together with their label strings: Definition 4 (Matching Witness). Let a1 , . . . , an be an arbitrary enumer- ation of the individuals that are a certain answer to qL . We define the root matching witness as wroot = ⟨[labels(x 7→ a1 )a1 , . . . labels(x 7→ an )an ]⟩. For a given term t ∈ term(qU ) and an object o ∈ Ind(A) ∪ AnonObj, the matching witness for t and o is defined as follows: wot = ⟨valuest1 , . . . , valuestk ⟩, where {t1 , . . . , tk } = {t′ | there exists R(t, t′ ) ∈ qU }, and valuesti , 1 ≤ i ≤ k, is constructed as follows:  if MCt7→o (ti ) = {o1 . . . om }, then valuesti = [φ1 o1 , . . . , φm om ], where φj = labels(t 7→ o, ti 7→ oj ), 1 ≤ j ≤ m.  In particular, if MCt7→o (ti ) = ∅, then valuesti = [ε] (ε is the empty string). We denote by W the set obtained by starting from x and wroot , and then recursively, for each child t′ of the current term t and each object o in I T ,A that occurs in ′ the range of a witness, adding the witness wot to W. 3.2 Compilation-based Query Answering W gathers all information that we need to answer any query in the 1treeCQ family. In fact, to decide whether an individual is an answer to an arbitrary in-between query, it suffices to recursively check the following satisfaction conditions. Below, for t ∈ term(q), we denote by q t the query obtained by restricting q to atoms that involve only terms that are descendants of t in q. Definition 5 (Satisfaction). Let t ∈ term(qU ) and q FT qU t . An object o ∈ Ind(A) ∪ AnonObj satisfies q (w.r.t. W) iff:  if t = x, then o is an answer to qL , and for each C ′ (x) ∈ q there exists C ∈ labels(x 7→ o) s. t. C ⊑T C ′  if t is a leaf in (the tree-structure of) q, then wot = ⟨⟩ ∈ W Additionally, the following condition must hold: for each t′ child of t in q there exists o′ ∈ MCt7→o (t′ ) such that labels(t 7→ o, t′ 7→ o′ ) = ⟨Rlbl , Clbl ⟩ satisfies: for each R′ (t, t′ ) ∈ q there exists R ∈ Rlbl such that R ⊑T R′ and for each C ′ (t′ ) ∈ q there exists C ∈ Clbl such that C ⊑T C ′ ; and o′ satisfies q t′ (w.r.t. W). This correctly captures the answers of any query in the 1treeCQ family: Theorem 1 (Correctness of compilation-based QA). For each in-between 1treeCQ q, we have: cert(q, K) = {a ∈ Ind(A) | a satisfies q (w.r.t.W)}. 5 4 Query Modifications Given a query, we identify interesting variations of it that could be relevant to a user. For example, we identify the minimal ways to modify the current query so that it provides more or less answers. We are also interested in identifying all the common properties that our set of answers has, that is, the possible additions to our query that, while making it syntactically more restrictive, would still retrieve the same set of answers. In the experiments in the next section, we will illustrate the usefulness of these query modifications on several examples. Algorithm 1.1: neutralSpecialization Input: W the set of matching witnesses, q in-between query Output: Q the set of neutral specializations of q 1 S := {}; 2 foreach a ∈ ans(q, W) do 3 S := S ∪ {q | q x-maximal for a}; 4 Q := ∅; /* compute intersection of each tuple in the n-fold Cartesian product over S */ 5 intersect(S,Q,q,∅,0); 6 return Q; Algorithm 1.2: intersect Input: S = {Q1 , . . . , Qn } set of sets of queries, Qneutral collects the neutral specializations, q in-between query, qinter intersection query, k the index of current set in S 1 if k == n then 2 Qneutral := Qneutral ∪ {qinter } ; 3 else 4 foreach query q ′ ∈ Qk do 5 if q FT q ′ then 6 if qinter == ∅ then 7 qinter := q ′ ; 8 else 9 qinter := qinter ∩T q ′ ; 10 intersect(S,Qneutral ,q,qinter ,k + 1) Algorithm 1.3: strictSpecialization Input: W the set of matching witnesses, q in-between query Output: Q set of strict specializations of q 1 Q := ∅ ; 2 foreach a ∈ ans(q, W) do 3 foreach q1 x-maximal for a do 4 foreach q2 max. neutral spec. of q do 5 if q2 FT q1 then 6 qdif f := q1 \ q2 ; 7 foreach query atom A ∈ qdif f do 8 if q2 ∪ {A} is 1treeCQ then 9 Q := Q ∪ {q2 ∪ {A}} ; 10 return Q; 6 Algorithm 1.4: minGeneralization Input: W the set of matching witnesses, q in-between query Output: Q the set of minimal generalizations for q 1 Q := ∅; 2 foreach a ∈ ans(qL , W) and a ∈ / ans(q, W) do 3 foreach q ′ x-maximal for a do 4 qinter := ∅ ; 5 if q ′ FT q then 6 Q := Q ∪ q ′ ; 7 else 8 qinter := q ∩T q ′ ; 9 if qinter ̸= ∅ then 10 Q := Q ∪ qinter ; 11 return Q; 4.1 Construction of Maximal Queries Intuitively, the set of constructed matching witnesses contains information regard- ing how much of the upper-bound query we have matched in the canonical model, for each a ∈ Ind(A) - possible answer. Hence we can read from the witnesses, for each possible answer a, the set of maximal queries in the family for which a is a match. We will see that these maximal queries provide the key information to suggest the user the desired query modifications. We remark that there can be multiple such maximal queries for the same individual a. Definition 6 (t-Maximal Query). Let q t be a 1treeCQ rooted in t, where t ∈ term(qU ), s.t. q t FT qU t . A query q is said to be t-maximal for a ∈ Ind(A) ∪ AnonObj iff: (a) a satisfies q t w.r.t. W, and (b) for each q ′ t , q t ̸= q ′ t , s.t. q t FT q ′ t FT qU t , a does not satisfy q ′ t w.r.t. W. In what follows, we use ans(q, W) to denote the answers for q over W, which are the individuals a such that a satisfies q w.r.t. W (see Theorem 1). 4.2 Specializations and Generalizations We use the term query specialization to refer to any superquery of a given q in a 1treeCQ family. Indeed, superqueries ‘specialize’ the query by requiring additional or more specific properties to hold. We further distinguish between two kinds of specializations: the ones that retrieve the same answers as the given q, and the ones that retrieve strictly less answers. Definition 7 (Neutral Specialization). We call q2 a neutral specialization of q1 if q1 FT q2 and ans(q2 , W) = ans(q1 , W). Trivially, any in-between 1treeCQ is a neutral specialization of itself. Definition 8 (Strict Specialization). We call q2 a strict specialization of q1 if q1 FT q2 and ans(q2 , W) ( ans(q1 , W), with ans(q2 , W) ̸= ∅. 7 Conversely, any subquery of a given 1treeCQ q is less constrained than q, and retrieves at least the same answers. We are interested in those subqueries that retrieve strictly more answers, which we call query generalizations. The first query modification problem we consider is to construct the maximal neutral specializations of a query. Intuitively, the maximal neutral specialization includes all most specialized additional constraints that we can add to the query without modifying its answers, and hence it makes explicit all the properties that are common to all the answer individuals. Definition 9 (Maximal Neutral Specialization). A maximal neutral spe- cialization for q w.r.t. ans(q, W) is a neutral specialization q ′ such that, for each q ′′ s.t. q ′′ ̸= q ′ and q ′ FT q ′′ , we have ans(q ′′ , W) ̸= ans(q, W). Algorithm 1.1 computes neutral specializations for a given in-between query q, by intersecting maximal queries for each a ∈ ans(q, W). Binary operation ∩T on 1treeCQs keeps only those query atoms that are subsumed in both 1treeCQs. One can show that by dropping all subqueries from the neutral specializations we obtain the maximal neutral specializations. The second query modification problem we consider is to find the minimal strict specializations of a given query, that is, which are the minimal modifications to the query that will result in a smaller set of answers. Definition 10 (Minimal Strict Specialization). A minimal strict specializa- tion for q is a strict specialization q ′ such that, for each q ′′ with q FT q ′′ FT q ′ , q ′′ is a neutral specialization of q. It can be the case that q cannot be strictly specialized without loosing all answers. In this case we say that q’s only strict specialization is qU . One can obtain the minimal strict specializations for a given in-between query by discarding all superqueries from the set of strict specializations, constructed by Algorithm 1.3. Conversely a query generalization refers to any subquery of a given q in a 1treeCQ family that has strictly more answers. Finally, we consider the problem of finding the minimal generalizations of given query, that is, how to minimally relax the query to retrieve more answers. Definition 11 (Minimal Generalization). Let q be an in-between 1treeCQ. A minimal generalization or relaxation for q is a generalization q ′ such that for each q ′′ , that is q ′ FT q ′′ FT q, it holds that: ans(q ′′ , W) = ans(q, W) It might be the case that an in-between 1treeCQ q cannot be generalized if ans(q, W) = ans(qL , W). Hence, whenever q is a neutral specialization of qL it doesn’t have any generalizations. Algorithm 1.4 constructs all the minimal generalizations for a given in-between query. 5 Implementation and Experiments The matching witness solution is implemented in Java (jdk1.8), using additional libraries. For ontology loading we used the OWLAPI java library1 . For ABox 1 http://owlapi.sourceforge.net/ 8 Seconds Seconds Seconds 35 2.50 0.60 30 7,000 30.00 2,000 2.00 #Possible Answers #Possible Answers #Possible Answers 25 25.00 6,000 1.50 0.40 1,500 20 20.00 5,000 1.00 15 15.00 1,000 0.20 4,000 0.50 10 10.00 500 0.5 1 1.5 2 2.5 0.5 1 1.5 2 2 2.5 3 3.5 4 4.5 #Matching Witnesses ·104 #Matching Witnesses ·104 #Matching Witnesses ·104 (a) Batch1 (b) Batch2 (c) Batch3 Figure 1: Performance of QA 4.25 1.5 75 Max. Neutral Max. Neutral Max. Neutral 3.75 Min. Gen Min. Gen Min. Gen 1.25 Min. Strict Min. Strict 60 Min. Strict 3 1 Time [seconds] Time [seconds] Time [seconds] 45 2.25 0.75 1.5 30 0.5 0.75 0.25 15 0 0 5 400 1,000 1,500 2,000 2,500 10 15 20 25 30 35 3,000 4,000 5,000 6,000 7,000 8,000 #Possible Answers #Possible Answers #Possible Answers (a) Batch1 (b) Batch2 (c) Batch3 Figure 2: Performance for query modifications Dep Dep ResearchGroup Institute x x x x member member subOrgOf member subOrgOf subOrgOf member orgPub subOrgOf FullProf y1 y2 Univ y1 y2 y1 y3 y2 y1 y2 teacherOf (a) In-bet. query (c) In-bet. query (d) Min. Gen. z1 (b) Max. Neutr. Spec. Figure 3: Example of query modifications for in-between queries Student Student GraduentStudent, ResearchAssistant x x x takesCourse advisor worksFor advisor memberOf advisor worksFor y3 y4 y3 y4 y1 y3 y4 Course Organization Professor (a) In-between query (b) Min. Gen. (c) Max. Neutral Spec. GraduentStudent, ResearchAssistant x takesCourse advisor worksFor y1 y3 y4 Course Organization AssistantProfessor (d) Min. Strict Spec. Figure 4: Example of obtained modifications for in-between query reasoning we used the Ontop [10] since it efficiently provides complete answers for DL-LiteR . However, Ontop was specially tailored for OBDA and does not provide full support when the data is not stored using relational databases. So 9 Employee Employee Employee x x x teacherOf autPub teacherOf autPub y1 y3 y1 y3 (a) In-between query (b) Minimal Generalizations Faculty Faculty Professor x x x teacherOf autPub teacherOf autPub autPub worksFor headOf teacherOf worksFor y1 y2 y3 y1 y2 y3 y1 y2 y3 Course Dep Pub Course Dep Pub Course Dep Pub (c) Max. Neutral Spec. (d) Min. Strict Spec. Figure 5: Example of obtained modifications for in-between query Employee Employee FullProfessor x x x teacherOf autPub teacherOf autPub teacherOf autPub headOf worksFor headOf y1 y2 y3 y1 y2 y3 y1 y2 y3 Course Department Publication (a) In-between query (b) Minimal Generalization (c) Max. Neutral Specialization Figure 6: Example of obtained modifications for in-between query we used HermiT2 for TBox reasoning, which captures more expressive DLs but is not as efficient over DL-LiteR . To boost the performance we used multi-threading for computing the matching witnesses as well as for query answering. Specifications. The entire evaluation was done on a server using 7 CPUs (each core running at 2.3GHz), 10 GB RAM, operating system Ubuntu Server amd64, java version Java SE Runtime Environment 7. Ontology. As ontology we used LUBM [5] - a well known ontology suitable for testing, especially since it provides a data generator, and its domain is easy to comprehend and relatively simple. To match our DL fragment, we used a DL-LiteR version previously used in [6]. Datasets. We used the LUBM generator3 to compute the data sets. The datasets size vary between ≈ 3.5 MB and ≈ 19 MB. Evaluation measurements. We evaluated 3 batches of 1treeCQs families de- fined by 3 pairs of bound queries that can be accessed online 4 . With this simple prototype implementation, the time for constructing the compilation varies from minutes (for small-sized data sets) up to few hours (for largest data set). We hope to decrease this time dramatically in an improved version that, among others optimizations, does not rely on expensive calls to external reasoners. We measured the number of possible answers for the 1treeCQs family, the number of matching witnesses, the average time for answering an in-between query, and for each query modification, the average time per in-between query. 2 http://www.hermit-reasoner.com/ 3 http://swat.cse.lehigh.edu/projects/lubm/ 4 https://github.com/medinaAnd/MatchingWitnesses 10 Figure 1 reflects the results of query answering procedure for each batch. There is a clear dependence on the number of possible answers for the query answering performance. Figure 2 shows the time performance for each query modification on each batch. Given that the modifications call the query answering procedure, the efficiency depends on the number of possible answers. To illustrate the usefulness of the obtained query modifications, we present in Figures 3, 4, 5, and 6 several variations of some in-between queries obtained with our algorithms. The relevant fragment of the ontology is given below: ResearchGroup ⊑T Institute headOf ⊑T worksFor AssProf ⊑T Prof worksFor ⊑T memberOf Prof ⊑T Faculty FullProf ⊑T Prof Faculty ⊑T Employee Note that query modifications depend also on the data, not only on the ontology. E.g., for the maximal neutral specialization 5c, no axiom in the ontology states that each employee that is a head of something is also a full professor. 6 Related Work Works that aim at assisting users formulate queries include Quelo [4], the Faceted Search Interface SemFacet [2], and the Optique virtual query formulation system (VQS) [11]. Quelo and SemFacet both provide interfaces in which users can build tree-shaped queries like the ones considered here, which can be specialized by adding additional atoms, or generalized by dropping them. Moreover, the interfaces have an ontological reasoning layer that is used to find, for example, relevant specializations to suggest to the user, similarly to our work. However, unlike our work, those interfaces do not aim at compiling data to support online answering of different query variations. Instead, they use off-the-shelf reasoners, and query answering is then done from scratch for each query variation. Another closely related work is presented in [8], where the authors focus on information overload and study ways to specialize a query. They consider EL and arbitrary CQs. We focus only on tree-shaped CQs and DL-Lite, but consider also other reasoning problems. 7 Conclusions In this paper we contributed to the theoretical foundations for compilation- based query answering and construction of useful query modifications in support of data exploration. Our prototype shows overall good performance, however implementation can be further improved. Many questions remain open for future work. A natural next step is to explore how one can extend the solution to other DLs. It seems that this can be done at least for Horn-DLs, since a canonical model can be constructed. Another interesting but challenging question is how to extend the solution in such way that it considers queries that are not tree-shaped. This would be indeed very valuable, and we believe it may be possible. 11 References 1. Ahmet, S., Evgeny, K., Dmitriy, Z., Ernesto, J.R., Martin, G., Ian, H.: Ontology- based visual query formulation: An industry experience. In: Proceedings of the 11th International Symposium on Visual Computing (ISVC 2015). Springer, Las Vegas, Nevada, USA (2015), http://www.ahmetsoylu.com/wp-content/uploads/ 2015/10/Soylu_et_al_ISVC2015.pdf 2. Arenas, M., Cuenca Grau, B., Kharlamov, E., Marciuska, S., Zheleznyakov, D.: Faceted search over ontology-enhanced rdf data. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. pp. 939–948. CIKM ’14 (2014) 3. Calvanese, D., Giacomo, G.D., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The dl-lite family. J. OF AUTOMATED REASONING p. 2007 (2007) 4. Franconi, E., Guagliardo, P., Tessaris, S., Trevisan, M.: Quelo: an ontology-driven query interface (2011) 5. Guo, Y., Pan, Z., Heflin, J.: Lubm: A benchmark for owl knowledge base systems. Web Semant. 3(2-3), 158–182 (Oct 2005), http://dx.doi.org/10.1016/j.websem. 2005.06.005 6. Kontchakov, R., Rezk, M., Rodriguez-Muro, M., Xiao, G., Zakharyaschev, M.: Answering SPARQL queries over databases under OWL 2 QL entailment regime. In: Proc. of International Semantic Web Conference (ISWC 2014). Lecture Notes in Computer Science, Springer (2014) 7. Motik, B., Grau, B.C., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C.: Owl 2 web ontology language: Profiles. World Wide Web Consortium, Working Draft WD- owl2-profiles-20081202 (December 2008) 8. Nutakki, P.: Specializing conjunctive queries in the EL-family for better compre- hension of result sets. Master’s thesis, TU Dresden (2011) 9. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. In: Spaccapietra, S. (ed.) Journal on Data Semantics X, Lecture Notes in Computer Science, vol. 4900, pp. 133–173. Springer Berlin Heidelberg (2008), http://dx.doi.org/10.1007/978-3-540-77688-8_5 10. Rodriguez-Muro, M., Kontchakov, R., Zakharyaschev, M.: Ontology-based data access: Ontop of databases. In: International Semantic Web Conference (1). Lecture Notes in Computer Science, vol. 8218, pp. 558–573. Springer (2013) 11. Soylu, A., Kharlamov, E., Zheleznyakov, D., Jimenez-Ruiz, E., Giese, M., Horrocks, I.: Optiquevqs: Visual query formulation for obda. In: Proceedings of the 27th International Workshop on Description Logics (DL 2014). vol. 1193, pp. 725–728. CEUR-WS.org, Vienna, Austria (2014), http://ceur-ws.org/Vol-1193/paper_88. pdf 12