-

A Well-founded Graph-based Summarization Framework for Description Logics

0 Univ Rennes , Lannion , France

The quotient operation from graph theory ofers an elegant graph summarization framework that has been widely investigated in the literature, notably for the exploration and eficient management of large graphs; it consists in fusing equivalent vertices according to an equivalence relation. In this paper, we study whether a similar operation may be used to summarize ABoxes. Towards this goal, we define and examine the quotient operation on an ABox: we establish that a quotient ABox is more specific than the ABox it summarizes, and characterize to which extent it is more specific. This preliminary investigation validates the interest of a quotient-based ABox summarization framework, and paves the way for further studies on it in the description logic setting, e.g., to devise equivalence relations suited to the optimization of typical data management and reasoning tasks on large ABoxes or to the visualization of large ABoxes, and on its utilization in related settings, e.g., Semantic Web.

Data summaries quotient graph most specific concept

Graph summarization has received considerable attention in the literature [ 14,5 ], in particular for the exploration and visualization of large graphs, as well as for the optimization of graph management systems (cardinality estimation, indexing, etc). A well-established tool used to summarize graphs into compact ones is the quotient operation from graph theory [ 12 ]; other kinds of summaries have also been considered like collections of graph statistics or of frequent graph patterns, or a mix of both as in [ 16 ].

The quotient operation ofers an elegant graph summarization framework by decoupling the summarization method, which basically fuses equivalent vertices, from the high-level specification of equivalent vertices, defined by an equivalence relation, e.g., bisimilarity [ 1 ]. Applying the quotient operation to a graph results Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). in a (typically small) homomorphic approximation of it, called quotient graph; the equivalence relation to choose depends on the target summary usage.

In this paper, we study whether a similar operation may be used to summarize ABoxes. Towards this goal, we first transfer the quotient operation from graphs to description logics (DLs) [ 2 ] in order to summarize ABoxes as if they were merely graphs. Then, because ABoxes are not just graphs but are essentially ifrst-order theories (or slight extensions of them), we examine the semantics of the quotient operation on an ABox. We establish that a quotient ABox is more specific than the ABox it summarizes , by showing that its structure entails the structure of the ABox, and we characterize to which extent this quotient ABox is more specific than the ABox it summarizes , by determining the precise nature of the approximation between the quotient ABox contents and the ABox contents. This latter characterization is the main result of this paper. It provides an in-depth understanding of the quotient operation itself as a tool for ABox summarization; the essence of quotient ABoxes is independent of the DL under consideration and of the chosen equivalence relation (that depends on the target summary usage).

The paper is organized as follows. We recall the basics of DLs in Sec. 2. Then, we transfer the quotient operation from graphs to ABoxes in Sec. 3 and we examine its semantics w.r.t. DLs in Sec. 4. Finally, we discuss related works in Sec. 5 and we conclude with salient perspectives in Sec. 6.

The proofs of our technical results can be found in the appendix of this paper. 2

Preliminaries

We recall the important aspects of DLs that are used in this paper [ 2 ], with a focus on the lightweight DL-liteR DL [ 4 ] that we (just) use to illustrate our discussions throughout this paper; our results hold for (standard) DLs. DL syntax and semantics Given a set NC of concept names (unary predicates), a set NR of role names (binary predicates), and a set NI of individuals (constants), a DL allows expressing unary and binary formulae respectively called concepts and roles. They are used to build TBoxes (ontologies) made of concept inclusions of the form C1 ⊑ C2 and role inclusions of the form R1 ⊑ R2, as well as to build ABoxes (databases) storing concept assertions of the form C(a) and role assertions of the form R(a1, a2), where a, a1, a2 ∈ NI . In DL-liteR, concepts and roles are built according to the following rules, where A ∈ NC and R ∈ NR:

B := A | ∃Q, C := B | ¬B, Q := R | R− , S := Q | ¬Q.

Also, TBox inclusions are of the forms B ⊑ C and Q ⊑ S, while ABox assertions are of the forms A(a) and R(a1, a2).

First-order interpretations are used to define the semantics of DLs. They are of the form I = (∆ I , · I ), where ∆ I is a non-empty domain and · I is a function that maps every a ∈ NI to aI ∈ ∆ I , every A ∈ NC to AI ⊆ ∆ I , and every R ∈ NR to RI ⊆ ∆ I × ∆ I . Further, the interpretation function · I is extended to the concept and role formulae allowed in the DL under consideration. In DLliteR, · I is extended as follows: (R− )I = {(o1, o2) | (o2, o1) ∈ RI }, (∃Q)I = {o1 | ∃o2 (o1, o2) ∈ QI }, (¬B)I = ∆ I \ BI and (¬Q)I = (∆ I × ∆ I ) \ QI . Let I be an interpretation. I satisfies a DL concept C (resp. role R) if it has a non-empty interpretation, i.e., CI ̸= ∅ (resp. RI ̸= ∅). I satisfies an inclusion C1 ⊑ C2 (resp. R1 ⊑ R2) if C1I ⊆ C2I (resp. R1I ⊆ R2I ) holds, and it satisfies a TBox if it satisfies all the TBox inclusions. I satisfies an assertion C(a) (resp. R(a1, a2)) if aI ∈ CI (resp. (a1I , a2I ) ∈ RI ) holds, and it satisfies an ABox if it satisfies all the ABox assertions. Finally, I is a model of a concept, role, inclusion, TBox, assertion or ABox if I satisfies it.

Generalization/specialization relations Concepts and roles are compared through subsumption, which may rely on the knowledge of a TBox T . C1 is subsumed by C2 or C2 subsumes C1 w.r.t. T , denoted C1 ⪯ T C2, if every model I of T is a model of the inclusion C1 ⊑ C2; subsumption for roles is defined similarly.

ABoxes are compared through entailment, which may also rely on a TBox T . A1 entails A2 or A2 is entailed by A1 w.r.t. T , denoted A1 |=T A2, if every model I of A1 and T is also a model of A2. As special case, an ABox A entails a concept assertion C(a) w.r.t. T , denoted A |=T C(a), if A entails w.r.t. T the ABox {C(a)}; entailment of a role assertion is defined similarly.

Data management The main data management tasks are consistency checking and query answering. They involve an ABox A and may rely on the knowledge of a TBox T . A is consistent w.r.t. T if A has a model that is also a model of T . To query ABoxes, we consider the first-order language of unions of conjunctive queries that is widely-considered for ontology-mediated query answering [ 3 ] and ontology-based data access [ 17 ]. A conjunctive query (CQ) is of the form ∃Y⃗ ϕ , where ϕ is a conjunction of the forms A(t) or R(t, t′) where t, t′ are variables or constants, and Y⃗ is a tuple of variable from ϕ . A CQ is Boolean if all its variables are existentially quantified. The answer to a Boolean CQ q is true if it is entailed by A and T , denoted A |=T q, i.e., q is true in all the models of A and T ; otherwise q is false. The answer to a non-Boolean CQ q with free variables X1, . . . , Xn, i.e., of arity n, is the set of all the tuples of constants of the form ⃗a = ⟨a1, . . . , an⟩ such that A |=T q[⃗a], where q[⃗a] is the Boolean CQ obtained by replacing each Xk by ak for 1 ≤ k ≤ n. We note qA,T the answer set of q on A w.r.t. T , with the convention that for Boolean queries qA,T = ∅ if q is f alse, and qA,T = {⟨⟩} if q is true where ⟨⟩ is the empty tuple. A union of CQs (UCQ) is a disjunction of CQs with same arity. The answer to a UCQ is the union of the answers to its individual CQs. 3

Quotient operation for ABoxes

An ABox A can be represented, and is frequently drawn, as a multigraph with typed vertices and typed edges: the vertices are the constants in A, there is a vertex type C on the vertex a if the concept assertion C(a) is in A, and there is an R-typed edge a →R a′ if the role assertion R(a, a′) is in A. Based on this analogy, we transfer the quotient operation from graphs to ABoxes to define quotient ABoxes.

Definition 1 (Quotient ABox). Let A be an ABox, ≡ be some equivalence relation between constants, and let a≡1 , . . . , a≡n denote by a slight abuse of notation both the equivalence classes of the constants in A w.r.t. ≡ and the names of these equivalence classes. The quotient ABox of A w.r.t. ≡ is the ABox A≡ such that: –– CR((aa≡ii ),a∈≡j A)∈≡ Aift≡heirfetheexriestesxisat ∈aa≡i∈ sau≡ch that C(a) ∈ A, for 1 ≤ i ≤ n, ≡ i and a′ ∈ a≡j such that R(a, a′) ∈ A, for 1 ≤ i, j ≤ n.

In this paper, we do not focus on particular equivalence relations between constants (i.e., binary relations that are reflexive, symmetric and transitive), though in the description logic setting we consider they should be denfied w.r.t. an ABox and a TBox. Discussing DL-specific equivalence relations between constants is delegated to future work.

Tex.

Example 1 (Running example). Let us consider the ABox Aex = {PhDS(s), sB(s, r1), sB(s, r2), R(r1), R(r2), wW(r2, r1), wF(r1, l1), wF(r2, l2),

ULab(l1), CLab(l2)} which states that s is a PhD student (PhDS), who is jointly supervized by (sB) the researchers (R) r1, who works for (wF) the university lab (ULab) l1, and r2, who works for the company lab (CLab) l2 and who also works with (wW) r1.

The graph representations of Aex and of three of quotient ABoxes obtained from it are depicted in Fig. 1. The equivalence relations used in this figure somehow reflect that equivalent constants are instances of the same concept for Fig. 1(b), or instances of non-disjoint concepts with a common generalizing concept if we assume that company labs and university labs are not necessarily disjoint kinds of labs for Fig. 1(c), i.e., CLab ⊑ Lab ∈ Tex and ULab ⊑ Lab ∈ Tex, and if we further assume that PhD student and researcher are not necessarily disjoint kinds of employees for Fig. 1(d), i.e., PhDS ⊑ Emp ∈ Tex and R ⊑ Emp ∈

In the sequel, we term summary of an ABox A a quotient ABox of A. 4

Characterization of ABox summaries

Although Definition 1 does enable the quotient-based summarization of an ABox, an ABox is not just a graph because it also has a first-order semantics. An important question that therefore arises is: does there exist a semantic relationship between an ABox and a summary of it? To answer this question we first point out that, in Definition 1, the implicit function that maps the constants in A to the constants in A≡ defines a homomorphism from A to A≡ ; from now, we name this function h and we say that an ABox constant a is represented by a summary PhDS s wW

PhDS s

sB sB sB

R r1 r2

R (a) Aex wF wF

ULab l1 l2

CLab

R r1|r2 wF

ULab

l1 wF wF

CLab (b) A≡b with r1 ≡ r2 sB PhDS s sB l1|l2 CLab, ULab

PhDS, R s|r1|r2 wF l1|l2 CLab, ULab wW (c) A≡c with r1 ≡ r2 and l1 ≡ l2 wW (d) A≡d with s ≡ r1 ≡ r2 and l1 ≡ l2 constant a≡ if h(a) = a≡ holds. Based on this ABox-to-summary homomorphism, we answer the above question by relating an ABox and a summary of it through entailment between ABoxes w.r.t. a TBox. The property below states that if we compare an ABox and a summary of it through entailment regardless of the constants they use (they are incomparable otherwise because they use diferent sets of constants), by just considering how the ABox (resp. summary) atoms join according to these constants, then the ABox is more general than its summary.

Property 1. Let T be a TBox, a1, . . . , am be the constants in an ABox A and cao≡1n,s.t.a.n,tas≡nasbeextishteenctoianlstvaanrtiasbilness,otmheens∃uam≡1 m·· · a∃ryaA≡n≡ A≡of |=AT. I∃faw1· e·· ∃conasmidAer htohledsse.

Besides, it is also worth noting that, due to the existence of the ABox-tosummary homomorphism, an ABox and a summary of it are related through the essential data management tasks of consistency checking and query answering. Property 2. Let T be a TBox, A be an ABox and A≡ be some summary of A. If A≡ is consistent w.r.t. T , then A is consistent w.r.t. T . lPertoqpebrteya3.UCLeQt.TIf bqeAa,T T̸=Bo∅x,hAoldbsetahnenAqBhAo≡x,,T A̸ =≡ ∅ holds, with qh the query q be some summary of A, and in which every constant a is replaced by its image h(a) through the A-to-A≡ homomorphism h.

We remark that Property 2 and the contraposition of Property 3 (i.e., if qA≡ ,T = ∅ holds then qA,T = ∅ holds) are of practical interest to check rapidly h if, for sure, an ABox is consistent and if a query has no answer, by using the typically (much) smaller ABox summary.

Example 2 (Cont.). From now, let us assume that Tex just consists of the inclusions mentioned in Example 1.

To illustrate Property 1, it is easy to see on Fig. 1 that A≡i |=∅ Aex holds for i ∈ {b, c, d}, hence holds with |=Tex .

As for Property 2, it is also easy to see that A≡i is consistent w.r.t. Tex, for i ∈ {b, c, d}, as well as Aex. We remark that if PhD students and researchers would be disjoint, i.e., PhDS ⊑ ¬R ∈ Tex, then A≡d would be inconsistent w.r.t. Tex while Aex would be consistent w.r.t. Tex. This observation indicates that DL-specific equivalence relations should avoid to the extent possible building inconsistent summaries out of consistent ABoxes.

Finally, let us illustrate Property 3 on Fig. 1. Consider the UCQ q = R(X) ∧ wF(X, l1)∪R(X)∧wF(X, l2) asking for the researchers who work for the lab l1 or l2. q has some answer on Aex w.r.t. Tex (qAex,Tex = {⟨r1⟩, ⟨r2⟩}), thus its translation qh has some answer w.r.t. Tex on the summary A≡i , i ∈ {b, c, d}. For the CQ q = PhDS(X1) ∧ wW(X1, X2) asking for the PhD students and people with whom they work, we note that because the qh translation does not have answers on A≡i w.r.t. Tex, i ∈ {b, c}, i.e., qhA≡i ,T = ∅, hence by contraposition of Property 3, q does not have answer on Aex w.r.t. Tex either, i.e., qAex,Tex = ∅. However, we observe that qh has some answer on A≡d w.r.t. Tex, while q has no answer on Aex w.r.t. Tex, i.e., this particular summary fails in detecting that q has no answer on Aex w.r.t. Tex. This observation suggests that DL-specific equivalence relations should be carefully chosen depending on the target summary usage.

Resuming our initial discussion, though Property 1 states that a summary is more specific than the ABox it summarizes up to the set of constants they use, it immediately raises a second crucial question: to which extent a summary is more specific than the original ABox it summarizes? We answer this second question by characterizing to which extent the knowledge a summary has about a constant is more specific than the knowledge the ABox has about the equivalence class this summary constant represents, and similarly for a pair of summary constants and the pair of equivalence classes these summary constants represent in the ABox.

For an ABox A and a TBox T , the most precise knowledge about a set of constants a1, . . . , an corresponds to one or more concepts known as the most specific concepts (MSCs) a1, . . . , an are all instances of [ 2 ]. A concept C is a MSC of a1, . . . , an w.r.t. A and T if ( i) A |=T C(ai) holds for all i ∈ [1, n], and (ii) for any concept D such that A |=T D(ai) holds for all i ∈ [1, n], either C ⪯ T D holds or C ̸⪯ T D and D ̸⪯ T C hold. We remark that though a MSC may be of infinite size in some DLs, this is not an issue here as we just want a semantic characterization of it. In the sequel, we designate by mscA,T (a1, . . . , an) the conjunction (⊓) of the MSCs of a1, . . . , an w.r.t. A and T 1 and we refer to it as the MSC of a1, . . . , an w.r.t. A and T , because it always exists (the 1 a1, . . . , an obviously have a single MSC in DLs equipped with conjunction (⊓). empty conjunction is the universal concept ⊤) and it is obviously unique up to equivalence.

A first step towards answering our second question is Lemma 1, which provides an upper approximation through subsumption of the MSC of some summary constant w.r.t. the individual MSCs of the equivalent ABox constants it represents; Lemma 1 follows from the existence of the ABox-to-summary homomorphism.

Lemma 1. Let T be a TBox, A be an ABox and A≡ be the summary of A w.r.t. the ≡ equivalence relation. If a1, . . . , an are all the constants in A that belong to the equivalence class a≡ according to ≡ , then the following holds: Example 3 (Cont.). Let us consider the summary A≡d of Aex in Fig. 1 and illustrate Lemma 1 with the summary constant l1|l2 and the ABox constants l1, l2 it represents: mscA≡d ,Tex (l1|l2) = Clab ⊓ ULab ⊓ ∃wF− , mscAex,Tex (l1) = ULab ⊓ ∃wF− , mscAex,Tex (l2) = CLab ⊓ ∃wF− , and mscA≡d ,Tex (l1|l2) ⪯ Tex ⊓x∈{l1,l2}mscAex,Tex (x), i.e., Clab ⊓ ULab ⊓ ∃wF− ⪯ Tex (ULab ⊓ ∃wF− ) ⊓ (CLab ⊓ ∃wF− ), clearly holds.

At the same time, in the above Lemma 1, to the set of (equivalent) constants a1, . . . , an corresponds its most specific concept mscA,T (a1, . . . , an), as recalled above. As the next step towards answering our second question, Lemma 2 provides a lower approximation through subsumption of the MSC of a1, . . . , an w.r.t. the individual MSCs of these constants; it follows from the definition of MSC.

Lemma 2. Let T be a TBox and A be an ABox. If a1, . . . , an are constants in A, then the following holds: Example 4 (Cont.). Let us consider again the summary A≡d of Aex in Fig. 1 and illustrate Lemma 2 with the summary constant l1|l2 and the ABox constants l1, l2 it represents like in the preceding example: mscAex,Tex (l1, l2) = Lab ⊓ ∃wF− , and ⊔x∈{l1,l2}mscAex,Tex (x) ⪯ T mscAex,Tex (l1, l2), i.e., (ULab ⊓ ∃wF− ) ⊔ (CLab ⊓ ∃wF− ) ⪯ Tex Lab ⊓ ∃wF− , clearly holds.

We are now ready to answer our second question with Theorem 1, which is a direct corollary of Lemma 1 and Lemma 2. It relates mscA,T (a≡ ) and mscA,T (a1, . . . , an) through their respective upper and lower approximations. Theorem 1. Let T be a TBox, A be an ABox and A≡ be the summary of A w.r.t. the ≡ equivalence relation. If a1, . . . , an are all the constants in A that belong to the equivalence class a≡ according to ≡ , then the following holds: From Theorem 1, it is now clear that the semantic distance through subsumption between the knowledge about each constant a≡ in the summary, i.e., msc(a≡ ), and the knowledge about the set of equivalent constants that a≡ represents in the ABox, i.e., mscA,T (a1, . . . , an), corresponds at least to approximate the union of individual knowledge about a1, . . . , an in the ABox by a conjunction, i.e., approximating Fin=1 mscA,T (ai) by dn i=1 mscA,T (ai).

We establish similar results (omitted due to space limitations) by adapting the notion of MSC to roles.

Finally, we stress that all the above results just follow from the definition of quotient ABox (Definition 1). In particular, they are independent of the equivalence relation (≡ ) and of the DL under consideration. 5

Related work

We present below the works from the DL literature that are closely related to ABox summarization.

An ABox summarization approach for the SHIN DL has been studied to optimize consistency checking [ 8 ] and instance retrieval in the subsequent work [ 6 ], i.e., finding the instances of a given SHIN concept; these two summary-based optimizations are implemented in the SHER system [ 7 ]. This approach can be seen as a particular instantiation for the SHIN DL of our general ABox summarization framework, in which equivalent constants are non-distinct constants2 stored in the ABox as instances of the same SHIN concepts; our framework allows using this definition of equivalent constants for the equivalence relation ≡ or any other relevant to the target summary usage. We therefore provide a better understanding of the SHIN summaries of [ 8,6,7 ] by establishing that they are more specific than the SHIN ABoxes they summarize, and notably to which extent they are more specific. Further, we remark that our ABox summarization approach, because it is based on the quotient operation, is more declarative than that of [ 8,6,7 ], by elegantly decoupling the summarization means (i.e., fusing equivalent constants) from the high-level specicfiation of equivalent constants, i.e., the equivalence relation ≡ to use. By contrast, SHIN summaries are deifned by an explicit canonical function that maps ABox constants to summary one; our function h has similar goal but is implicit because derived from the ABox and equivalence relation at hand.

Another form of ABox summarization is ABox abstraction, which has been studied for materialization in the Horn-ALCHOI and Horn-SHOIF DLs [ 9,10 ], 2 In SHIN , constants are stated distinct by using assertions of the built-in role ̸=. i.e., for pre-computing and storing entailed assertions. The idea is to represent several assertions of an ABox by a single one in the ABox abstraction. To this aim, an ABox B is defined as an abstraction of an ABox A if there exists a mapping from the constants in B to the constants in A that defines a homomorphism from B to A. Hence, by contrast with our summaries that are more specific than original ABoxes, abstractions are more general than original ABoxes, thus capture a subset of their contents.

Also related to ABox summarization is ABox modularization, which has been studied for instance retrieval in the SHIF [ 13 ] and SHI [ 15 ] DLs. It consists in extracting a subset of an ABox that entails all the assertions in which a given constant is involved.

Finally, less related works on ABox summarization do not summarize an ABox by another, e.g., [ 16 ] which computes statistics about the concept and roles instances, and Abstract Knowledge Patterns of the form (C1, r, C2) indicating that the role r relates instances of the concept C1 to instances of the concept C2. 6

Conclusion and perspectives

We devised a well-founded ABox summarization framework for DLs, by applying the quotient operation from graph theory to ABoxes and examining the quotient ABoxes, i.e., summaries, it produces.

With this now well-understood ABox summarization framework in place, the next necessary step is to study DL-specific equivalence relations between constants, to explore, visualize or optimize reasoning on and the management of ABoxes. These equivalence relations might be sought either from scratch or derived from the ones used for graph summarization if meaningful w.r.t. the DL semantics, e.g., the well-established and widely adopted ones based on bisimilation [ 14 ] or the recent ones based on the cooccurrence of types and edges [ 11 ]. Also, ideally, such relations should be devised to avoid to the extent possible building inconsistent summaries out of consistent ABoxes (recall Example 2).

Acknowledgments Appendix

This work was partially supported by the ANR project CQFD (ANR-18-CE230003).

Proof (Proof of Property 1). In case ∃a≡1 , . . . , a≡n A≡ is inconsistent w.r.t. T , then Property 1 trivially holds. Otherwise, let I = (∆ I , · I ) be a model of T and ∃a≡1 , . . . , an

≡ A≡ and let us show that I is also a model of ∃a1, . . . , am A.

Because I is a model of ∃a≡1 , . . . , a≡n A≡ , there exists a function ψ from a≡1 , . . . , a≡n to the interpretation domain ∆ I of I such that ψ (a≡i ) ∈ CI for every CR((aa≡i≡i ),a∈≡j )A∈≡ Aa≡nd 1 ≤ i ≤ n, and such that (ψ (a≡i ), ψ (a≡j )) ∈ RI for every and 1 ≤ i, j ≤ n.

Let us consider the composite function ψ ◦ h, where ψ is the above function and h is the implicit function that defines our ABox-to-summary homomorphism. This function thus maps a1, . . . , am to the interpretation domain of I and clearly ψ (h(ai)) ∈ CI for every C(ai) ∈ A and 1 ≤ i ≤ m, because by construction of A≡ C(h(ai)) ∈ A≡ , and such that (ψ (h(ai)), ψ (h(aj))) ∈ RI for every R(ai, aj) ∈ A and 1 ≤ i, j ≤ m, because by construction of A≡ R(h(ai), h(aj)) ∈ A≡ . Therefore, I is also a model of ∃a1, . . . , am A. ⊔⊓ Proof (Proof of Property 2). If A≡ is consistent w.r.t. T , there must exist some model I = (∆ I, · I) of A≡ . Let us show that we can build a model of A out of I, hence that A is consistent w.r.t. T .

Consider the model J = (∆ J , · J ) denfied as follows: ∆ J = ∆ I, AJ = AI for every concept name A ∈ NC , RJ = RI for every role name R ∈ NR, and for every constant a in A, aJ = h(a)I.

For every concept assertion C(a) ∈ A, C(h(a)) ∈ A≡ by definition of a quotient ABox, where h is the implicit function defining the ABox-to-summary homomorphism. Since I is a model of A≡ , I is a model of C(h(a)), i.e., h(a)I ∈ CI. By definition of J , CJ = CI and aJ = h(a)I, thus aJ ∈ CJ , i.e., J is a model of C(a). J is therefore a model of all the concept assertions in A.

Similarly, for every role assertion R(ak, al) ∈ A, R(h(ak), h(al)) ∈ A≡ by definition of a quotient ABox, where h is the implicit function defining the ABox-to-summary homomorphism. Since I is a model of A≡ , I is a model of R(h(ak), h(al)), i.e., (h(ak)I, h(al)I) ∈ RI. By definition of J , RJ = RI, akJ = h(ak)I and alJ = h(al)I, thus (akJ , alJ ) ∈ RJ , i.e., J is a model of R(ak, al). J is therefore a model of all the role assertions in A.

It therefore follows that J is a model of all the assertions in A, hence a model of A. ⊔⊓ Proof (Proof of Property 3). We remark that if a query is asked on an ABox that is inconsistent w.r.t. its associated TBox, then both true and f alse if the query is Boolean, and all the tuples of the arity of the query otherwise, are answers.

If A is not consistent w.r.t. T , then qA,T ̸= ∅, and by the contraposition of Property 2, A≡ is not consistent w.r.t. T too, therefore qhA≡ ,T ̸= ∅ and Property 3 holds in this first case.

If A is consistent w.r.t. T , while A≡ is not consistent w.r.t. T (see Example 2), then Property 3 holds in this second case: qhA≡ ,T ̸= ∅ holds whenever qA,T ̸= ∅ or qA,T = ∅ holds.

Otherwise, A is consistent w.r.t. T and A≡ is consistent w.r.t. T . Let us show that Property 3 holds in this third and last case. If the UCQ q has a non-empty answer set, then at least one of its CQs has a non-empty answer set. Let q′ be such a CQ in q. Let us assume, without loss of generality, that q′ is Boolean as the definition of non-Boolean query answers reduces to the Boolean case. Let us consider some model J of A≡ and T . Let I be the interpretation such that ∆ I = ∆ J , AI = AJ for every concept name A ∈ NC , RI = RJ for every role name R ∈ NR, and aI = h(a)J for every constant a in A. Clearly, by construction, I is homomorphic to J and is a model of A and T . Thus, I is also a model of q′. Further, J is obviously a model of q′ just in case q′ does not contain contants, because the constants in A do not have interpretations in J : ∆ I = ∆ J , AI = AJ for every concept name A ∈ NC , RI = RJ for every role name R ∈ NR. Otherwise, J is a model of qh′, in which constants from A are replaced by the constants in A≡ they are represented by: aI = h(a)J for every constant a in A. Therefore, every model of A≡ and T is a model of qh′, i.e., qh′ is entailed by A≡ and T , thus has a non-empty answer set, hence qh has a non-empty answer set too. ⊔⊓ Proof (Proof of Lemma 1). If A is inconsistent w.r.t. T , then A≡ is also inconsistent w.r.t. T (by contraposition of Property 2), and the Lemma clearly holds.

If A is consistent w.r.t. T while A≡ is inconsistent w.r.t. T , then the Lemma also clearly holds.

Otherwise, both A and A≡ are consistent w.r.t. T , and let us consider some model J of A≡ and T . Let I be the interpretation such that ∆ I = ∆ J , AI = AJ for every concept name A ∈ NC , RI = RJ for every role name R ∈ NR, and aI = h(a)J for every constant a in A. Clearly, by construction, I is homomorphic to J and is a model of A and T . Because I is a model of A and T , I is a model of the concept assertion (mscA,T (ai))(ai) for 1 ≤ i ≤ n. By construction of I, J is a model of (mscA,T (ai))(h(ai)) for 1 ≤ i ≤ n, where h is the ABox-to-summary homomorphism, i.e., J is a model of (mscA,T (ai))(a≡ ) for 1 ≤ i ≤ n.

It follows that all the models of A≡ and T are models of (mscA,T (ai))(a≡ ) for 1 ≤ i ≤ n, and of (mscA≡ d,Tn(a≡ ))(a≡ ) (∗ ). By definition of the MSC of ad≡ n, either (i) mscA≡ d,Tn(ai=≡1) m⪯sTcA,Ti=(a1im)̸⪯scTA,mT(saciA)≡ h,Tol(das≡ o)rh(oiild).mInsctAh≡ e,lTa(tate≡r)c⪯̸ asTe i=1 mscA,T (ai) and dn (ii), mscA≡ ,T (a≡ )⊓ i=1 mscA,T (ai) would be strictly subsumed by mscA≡ ,T (a≡ ) w.r.t. T , and because of (∗ ) above, mscA≡ ,T (a≡ ) ⊓ dn i=1 mscA,T (ai) would be the MSC of ad≡,n a contradiction. Therefore, we must be in case (i) and mscA≡ ,T (a≡ ) ⪯ T i=1 mscA,T (ai) holds. ⊔⊓ Proof (Proof of Lemma 2). Lemma 2 follows the well-known relationship between the MSC of constants and the notion of least common subsumer (LCS) of concepts [ 2 ], i.e., a most specific concept generalizing all of them. A concept C is a LCS of the concepts C1, . . . , Cn w.r.t. a TBox T if ( i) Ci ⪯ T C holds for all i ∈ [1, n], and (ii) for any concept D such that Ci ⪯ T D holds for all i ∈ [1, n], either C ⪯ T D holds or C ̸⪯ T D and D ̸⪯ T C hold. We denote by lcsT (C1, . . . , Cn) the conjunction of the LCSs of C1, . . . , Cn w.r.t. T and we refer to it as the LCS of C1, . . . , Cn w.r.t. T . In particular, it is well-known that mscA,T (a1, . . . , an) and lcsT (mscA,T (a1), . . . , mscA,T (an)) are equivalent [ 2 ]. Further, it is also known that lcsT (mscA,T (a1), . . . , mscA,T (an)) cannot be more specific than Fin=1 mscA,T (ai), notably they are equivalent in DLs allowing disjunction (⊔). Thus, Fin=1 mscA,T (ai) ⪯ T mscA,T (a1, . . . , an) holds. ⊔⊓ Proof (Proof of Theorem 1). The proof trivially follows from Lemma 1, Lemma 2, and the standard first-order semantics of disjunction ⊔ and conjunction ⊓ in DLs. ⊔⊓

1. Abriola , S. , Barceol,´ P., Figueira , D. , Figueira , S.: Bisimulations on data graphs . In: KR ( 2016 )

2. Baader , F. , Calvanese , D. , McGuinness , D.L. , Nardi , D. , Patel-Schneider , P.F . (eds.): The Description Logic Handbook . Cambridge University Press ( 2003 )

3. Bienvenu , M. : Ontology-mediated query answering: Harnessing knowledge to get more from data . In: IJCAI ( 2016 )

4. Calvanese , D. , Giacomo , G.D. , Lembo , D. , Lenzerini , M. , Rosati , R. : Tractable reasoning and eficient query answering in description logics: The DL-Lite family . J. Autom. Reason . 39 ( 3 ) ( 2007 )

5. Cebiric , S. , Goasdoue,´ F., Kondylakis , H. , Kotzinos , D. , Manolescu , I. , Troullinou , G. , Zneika , M. : Summarizing semantic graphs: a survey . VLDB J . 28 ( 3 ) ( 2019 )

6. Dolby , J. , Fokoue , A. , Kalyanpur , A. , Kershenbaum , A. , Schonberg , E. , Srinivas , K. , Ma , L.: Scalable semantic retrieval through summarization and refinement . In: AAAI ( 2007 )

7. Dolby , J. , Fokoue , A. , Kalyanpur , A. , Schonberg , E. , Srinivas , K. : Scalable highly expressive reasoner (SHER) . J. Web Semant . 7 ( 4 ) ( 2009 )

8. Fokoue , A. , Kershenbaum , A. , Ma , L. , Schonberg , E. , Srinivas , K. : The summary abox: Cutting ontologies down to size . In: ISWC ( 2006 )

9. Glimm , B. , Kazakov , Y. , Liebig , T. , Tran , T. , Vialard , V. : Abstraction refinement for ontology materialization . In: ISWC ( 2014 )

10. Glimm , B. , Kazakov , Y. , Tran , T. : Ontology materialization by abstraction refinement in horn SHOIF . In: Singh, S.P. , Markovitch , S. (eds.) AAAI ( 2017 )

11. Goasdoue ,´ F., Guzewicz , P. , Manolescu , I. : RDF graph summarization for firstsight structure discovery . VLDB J . 29 ( 5 ) ( 2020 )

12. Gross , J.L. , Yellen , J. , Zhang, P. (eds.): Handbook of Graph Theory. Discrete Mathematics and Its Applications , Chapman & Hall / CRC Press, Taylor & Francis ( 2013 )

13. Guo , Y. , Heflin , J.: A scalable approach for partitioning owl knowledge bases . In: International Workshop on Scalable Semantic Web Knowledge Base Systems ( 2006 )

14. Liu , Y. , Safavi , T. , Dighe , A. , Koutra , D. : Graph summarization methods and applications: A survey . ACM Comput. Surv . 51 ( 3 ) ( 2018 )

15. Mo¨ller, R., Neuenstadt , C. , O¨cze¸p, O¨.L. , Wandelt , S.: Advances in accessing big data with expressive ontologies . In: International Workshop on Description Logics ( 2013 )

16. Palmonari , M. , Rula , A. , Porrini , R. , Maurino , A. , Spahiu , B. , Ferme , V.: ABSTAT: linked data summaries with abstraction and statistics . In: ESWC ( 2015 )

17. Xiao , G. , Calvanese , D. , Kontchakov , R. , Lembo , D. , Poggi , A. , Rosati , R. , Zakharyaschev , M. : Ontology-based data access: A survey . In: IJCAI ( 2018 )