=Paper=
{{Paper
|id=Vol-71/paper-16
|storemode=property
|title=Compiling Complex Terminologies for Query Processing
|pdfUrl=https://ceur-ws.org/Vol-71/Stuckenschmidt.pdf
|volume=Vol-71
}}
==Compiling Complex Terminologies for Query Processing==
<pdf width="1500px">https://ceur-ws.org/Vol-71/Stuckenschmidt.pdf</pdf>
<pre>
                      Compiling Complex Terminologies for Query Processing

                                                  Heiner Stuckenschmidt
                                                Vrije Universiteit Amsterdam
                                                       heiner@cs.vu.nl


                           Abstract                                  Our concern is now to close the gap between these views
                                                                  and to provide ways of exploiting the expressive power of
        It is widely accepted that the Semantic Web will          rich ontology languages in combination with efficient imple-
        be based on machine-readable metadata describ-            mentations that rely on light weight solutions. In this paper,
        ing the content of resources. These descriptions          we concentrate on the problem of querying information on
        are designed to enable intelligent agents to locate       the Semantic Web. Most existing implementations support
        and filter relevant information with a higher level       querying RDF models, some also supporting schema aware
        of accuracy. The Resource Description Framework           querying. However, it has been argued that the presence of
        (RDF) has been developed as universal language            rich background knowledge requires deductive reasoning for
        for encoding content-related metadata and recently        achieving complete answers [Horrocks and Tessaris, 2000].
        a number of query languages have been proposed            To our knowledge, none of the existing implementations
        to extract information from metadata models. Cur-         of semantic web infrastructure supports this kind of query
        rent applications of RDF and RDF query languages          processing 4 . Therefore, if we want to make use of the
        only use very simple metadata like simple concept         expressive power of languages like DAML+OIL and the
        hierarchies (The Open Directory) or pre-defined at-       querying facilities of existing infrastructure, we have to
        tribute value pairs (Dublin Core). In this paper          find ways of pre-processing DAML+OIL models in such
        we address the problem of encoding and querying           a way that they can be handled by methods developed for
        complex metada using RDF models and queries. In           light-weight approaches.
        our approach we consider ontologies with complex
        concept definitions in the spirit of DAML+OIL and            The obvious way of dealing with DAML+OIL models
        propose a pre-processing method that enables us to        in a lightweight setting, ignoring all language features not
        access these models using RDF query languages             supported by the light weight approach has serious draw-
        without losing information.                               backs as it always means loosing information. Therefore,
                                                                  rather than weakening the background model, our approach
1       Motivation                                                is based on the idea of compiling out implicit knowledge in
                                                                  the background model and enriching the plain RDF model
In todays semantic web research there is still a big gap          with this additional information. As finding the hidden
between theoretical considerations about the use of rich          information again requires expensive logical reasoning,
background knowledge in terms of ontologies and real              this compilation step is done off-line. The actual query
implementations that are actually used in applications on the     answering is done on the enriched model already. This
web. While theoreticians claim that rich knowledge models         approach, known as knowledge compilation, is well known
(as supported by the ontology language DAML+OIL or                in Artificial Intelligence research [Cadoli and Donini, 1997].
recently OWL) provide the expressive power and reasoning          The novelty of this work is to apply the idea in the specific
support needed in many domains, the application of such           context of the semantic web and its representations.
rich models is hampered by the high complexity of reasoning
that makes them unlikely to scale up to a realistic setting. As     The contribution of this paper is two-fold:
a consequence, most existing implementations of semantic
web infrastructure such as JENA 1 , RDFSuite 2 or Sesame 3          • We show how existing reasoning techniques can be used
rely on more light-weight solutions restricting them selves to        to compile implicit information about named objects of
RDF, RDF schema or a small subset of the OWL language.                a domain model into a plain RDF description.
                                                                    • We propose a heuristic approach for enhancing this
    1                                                                4
      http://www.hpl.hp.com/semweb/jena.htm                            There is a prototypical implementation of the query answering
    2
      http://139.91.183.30:9090/RDF/                              approach of Horrocks and Tessaris, but it does not provide a real
    3
      http://sesame.aidministrator.nl/                            infrastructure for handling semantic web data
      description with information about anonymous objects           especially, graph matching is a basic technique with respect
      than can be derived from the background knowledge.             to reasoning and query answering (see e.g. [Caroll, 2002]).
   The paper is organized as follows. We first review some           Adopting this technique, query answering is done by find-
basic notions of RDF, mainly referring to its model theoretic        ing a match between the query graph and the RDF model,
semantics. We pay special attention to the interpretation of         allowing unlabeled nodes in the query graph to be matched
unlabeled nodes as existentially quantified variables referring      against any node in the model. Once a match is found, the
to anonymous objects and on their role in query answering.           query graph can be instantiated with the matching nodes on
Afterwards we review existing work on explicating implicit           the model or those nodes can be returned as an answer.
information contained in background knowledge. We con-                  With the increasing interest in RDF, a number of RDF
sider the case of RDF schema and of DAML+OIL model.                  query languages have been developed and integrated in the
Finally, we discuss the issue of implicit relational informa-        available infrastructure. One of the most widely adopted ones
tion that is only partially covered by existing approaches.          that is based on graphs as queries is SquishQL [Miller et al.,
We show how the query answering approach of Horrocks                 2002]. Different Implementations of SquishQL based lan-
and Tessaris [Horrocks and Tessaris, 2000] can be used to            guages exist as part of RDF APIs or persistent storages and
compile relational information about named objects and we            can therefore be assumed to be widely used. In the following,
sketch an approach for dealing with unnamed objects in the           we consider RDQL, an implementation of SquishQL that is
compilation process. We summarize with a discussion of               part of the Jena RDF Toolkit and the RDF storage and query
open questions.                                                      system Sesame. Following the basic definitions of SquishQL
                                                                     a query has the following components:
                                                                      SELECT the select clause identifies variables form the query
2     RDF Querying                                                       graph that should be returned as the result of the query.
The ability to query RDF models in an efficient way is an             FROM the from clause specifies the RDF model to be used
essential aspects with respect to building semantic web ap-              as a basis for querying. The model is identified by its
plications. As RDF is designed as a language for describ-                URI.
ing the content of an information source, being able to find          WHERE the where clause describes the query graph in
a specific piece of information is equivalent to being able              terms of a conjunction of RDF triples connected by vari-
to find the corresponding RDF statement. In the following,               ables.
we present a general view on answering queries based on the
                                                                      AND the and clause contains a Boolean expression that
RDF data model introduced above and briefly review the RDF
                                                                         constraints possible variable instantiations in the query
data model and some existing query languages.
                                                                         graph.
2.1    RDF Query langauges                                            USING the using clause can be used to define abbreviations
The RDF data model described above and its associated se-                used in the specification of the query.
mantics provides us with a basis for defining queries on RDF           These constructs allow the user to formulate queries in an
models in a straightforward way. The idea that has been pro-         SQL like style which is well known by most application de-
posed elsewhere and is adopted here is to use graphs with un-        velopers.
labeled nodes as queries. The unlabeled elements in a query
graph can be seen as variables of the query. In this approach,       2.2   An Example
answers to a query can be defined in two ways:                       We use a toy example from the domain of family relationships
    • A sub-graph of the given RDF model that is an instanti-        to illustrate the way queries are formulated. For the sake of
      ation of the query graph.                                      simplicity, we omit the FROM, and USING clauses.
    • The set of resources that if they are used to instantiate      SELECT ?p ?c
      unlabeled nodes in the query graph result in a sub-graph       WHERE
      of the given RDF model                                             (?p has-child ?c)
                                                                         (?p type ?g)
   From a theoretical point of view these two definitions                (?p age ?a)
are exchangeable as one can easily be derived from the               AND
other by either extracting instantiated resources from the               (?g eq #female)
an answer graph or by instantiating the query graph with
the resources from the answer set, respectively. Due to this            The above query asks for mothers and their children. The
correspondence, we will use the definitions synonymously.            result of a query will be a table with the identifiers of persons
                                                                     that satisfy the requirements specified in the query, namely
   Another characteristic feature of RDF is the possibility to       that they are in the has-child relation where the first member
refer to entire RDF graphs and not only to single resources.         of the relation is of type female. We further use the following
At first sight, the ability to refer to entire statements destroys   RDF model of mothers to be queried:
the graph Another feature of graph-based queries is that an-
swering queries about RDF models can be based on the same            <rdf:Description rdf:about="alice">
techniques that are used for reasoning about such model. In            <type rdf:resource="#female">
  <has-child rdf:resource="#betty>                                       Expressive language have been developed that can be used
</rdf:Description>                                                    to encode the kind of background knowledge needed to
                                                                      draw the conclusions mentioned above. In especially, RDF
<rdf:Description rdf:about="jane">                                    schema provide means for define simple schematic informa-
  <mother-of rdf:resource="#tom">                                     tion [Brickley and Guha, 2003]. DAML+OIL extends RDF
</rdf:Description>
                                                                      schema towards a more expressive language for defining the
<rdf:Description rdf:about="eve">                                     meaning of classes [van Harmelen et al., 2001a]. Unfor-
  <type rdf:resource="#female">                                       tunately, simple query language like RDQL are not able to
  <has-son rdf:resource="#charles">                                   make use of the background knowledge per se. We rather
  <has-mother rdf:resource="#betty">                                  have to do some pre-processing on the RDF model to be
</rdf:Description>                                                    queried in order to get all intended results. In the follow-
                                                                      ing, we describe how this pre-processing step can be done in
<rdf:Description rdf:about="betty">                                   order to query metadata that is based on an RDF schema and
</rdf:Description>                                                    on a DAML+OIL ontology, respectively, and refer to exist-
                                                                      ing approaches that use these methods to support querying.
<rdf:Description rdf:about="carol">
                                                                      Afterwards, we point out to remaining problems that are the
  <type rdf:resource="mother">
  <has-child rdf:resource="#doris">                                   main motivation for the compilation approach, we propose as
</rdf:Description>                                                    an extension of existing proposals.

<rdf:Description rdf:about="doris">                                   3.1   Schemas
  <type rdf:resource="#mother">                                       RDF schema provide a language for encoding structural back-
</rdf:Description>                                                    ground information about the vocabulary used in an RDF
                                                                      Model. This structural information provides insights into the
<rdf:Description rdf:about="mary">
                                                                      relations between the different elements of the model and
  <type rdf:resource="#virgin-mary">
</rdf:Description>                                                    helps to draw conclusions that could not be found from the
                                                                      plain model. In particular, a schema defines hierarchies of
                                                                      classes and relations as well as restrictions on the range and
   Applying the query specified above we would get the pair           the domain of relations. In the family domain, we use in our
(alice, betty) as a result because alice is defined to be female      example, the schema may contain the following information
at having a child. This result is not very satisfying as it is easy   about the classes and relations used in the model:
to see that all of the defined persons are actually mothers and
should therefore be in the answer to the query. The reason            <rdf:property id="has-son">
for the incomplete result is the lack of information about the          <rdfs:subPropertyOf rdf:resource="#has-child"/>
meaning of the descriptions.                                          </rdf:Property>

                                                                      <rdf:property id="has-mother">
3    Ontology-Aware Querying                                            <rdfs:range rdf:resource="#female"/>
One of the main features of the semantic web is the idea to           </rdf:property>
enrich metadata descriptions with explicit models of their in-
                                                                      <rdf:property id="mother-of">
tended meaning. These models range from simple schema
                                                                        <rdfs:subPropertyOf rdf:resource="#has-child"/>
definitions to complex ontologies that define concepts and              <rdfs:domain rdf:resource="#female"/>
describe them by necessary and sufficient conditions thus en-         </rdf:property>
abling intelligent applications to reason about their members
and their relation to each other. On the semantic web, we
want to exploit this semantic information for query answering            This schema defines that having a son is a special case of
as it provides us with background information for computing           having a child, that mother is a subclass of female persons
more complete results. In the example above we also want to           which is the range of the relation has-mother and the domain
retrieve instances as an answer to ?p that have the following         of the relation mother-of, which in turn is a special case of
properties:                                                           having a child. We immediately see that this information
                                                                      is relevant for answering our query as from it we can con-
    • are of type female and have a daughter or a son, because        clude that Jane and Eve fulfill the requirements specified in
      this implies having a child                                     the query. Formally, this is established by the axiomatic se-
    • have a child and are human and not male because not             mantics of RDF schema given in [Hayes, 2003]. Applying
      being male implies being female.                                these axioms to our example RDF model, we can add a num-
                                                                      ber of new facts that can be matched by the query engine. In
    • are of type mother, because this implies being female           particular, we get the following more complete definitions of
      and having a child.                                             jane, eve and betty:
    • are of type virgin-mary as this implies being the mother
      of christ.                                                      <rdf:Description rdf:about="jane">
  <mother-of rdf:resource="#tom">                                  </rdfs:subClassOf>
  <has-child rdf:resource="#tom">                                </daml:Class>
  <type rdf:resource="#female">
</rdf:Description>                                               <daml:Class rdf:ID="mother"> <rdfs:subClassOf>
                                                                   <daml:complementOf>
<rdf:Description rdf:about="eve">                                    <daml:Class rdf:resource="#male"/>
  <type rdf:resource="#female">                                    </daml:complementOf>
  <has-son rdf:resource="#charles">                                <daml:restriction>
  <has-child rdf:resource="#charles">                                <daml:onProperty rdf:resource="#has-child"/>
  <has-mother rdf:resource="#betty">                                 <daml:hasClass rdf:resource="#human"/>
</rdf:Description>                                                 </daml:restriction>
                                                                   </rdfs:subClassOf>
<rdf:Description rdf:about="betty">                              </daml:Class>
  <type rdf:resource="#female">
</rdf:Description>                                               <daml:Class rdf:ID="virgin-mary">
                                                                   <rdfs:subClassOf>
                                                                     <daml:Class rdf:resource="female">
   Now that the information about these resources has been           <daml:restriction daml:cardinality="1">
made explicit, posting the example query against the ex-               <daml:onProperty rdf:resource="#has-child"/>
panded model will also return jane and eve as an answer                <daml:hasValue rdf:resource="#jesus-christ"/>
to the query. Betty, however, is now only known to be fe-            </daml:restriction>
male, but there is no explicit statement about her child which     </rdfs:subClassOf>
disables the query engine to recognize her as matching the       </daml:Class>
query. In fact, this approach of first expanding a model using
schematic information and then evaluating queries against the       The model adds further semantic information about the
completed model is a common approach that has for example        domain to our model, namely the fact that the class of all
been implemented in Sesame [Broekstra et al., 2002].             humans is exactly the union of male and female person. That
                                                                 male persons cannot be female at the same time, that a mother
3.2   Ontologies                                                 is a female person having a child and that virgin Mary is
The example domain already shows that there are many as-         a female person having exactly one child which is jesus christ.
pects of terminological knowledge that can not be captured
by RDF schema. These aspects include the following facts            In [van Harmelen et al., 2001b] a formal semantics for
that might be relevant for answering queries about our do-       DAML+OIL is described. The semantics is based on an inter-
main:                                                            pretation mapping into an abstract domain. More specifically,
  • the same person may not be male and female                   every concept name is mapped on a set of objects, every prop-
                                                                 erty name is mapped on a set of pairs of objects. Individuals
  • a female person automatically becomes a mother when          (in or case resources) are mapped on individual objects in the
    having a child                                               abstract domain. Formally, an interpretation is defined as fol-
  • virgin Mary is the mother of christ                          lows:
   Including these facts in a model of information semantics     Definition 1 (Interpretation) An Interpretation consists of a
requires a far more expressive language. DAML+OIL [van           pair (∆, .E ) where ∆ is a (possibly infinite) set and .E is a
Harmelen et al., 2001a] is such a language that has been de-     mapping such that:
fined on top of RDF schema, extending it with additional           • xE ∈ ∆ for every individual x
operators for defining classes by constraining possible mem-
bers. For our domain a DAML+OIL model might contain the            • C E ⊆ ∆ for all classes C
following definitions:                                             • RE ⊆ ∆ × ∆ for all roles R
                                                                 We call .E the extension of an individual, concept or role,
<daml:Class rdf:ID="human">                                      respectively.
  <daml:equivalentTo>
    <daml:unionOf>                                                 This notion of an interpretation is a very general one and
      <daml:Class rdf:resource="#male"/>                         does not restrict the set of objects in the extension of a con-
      <daml:Class rdf:resource="#female"/>                       cept. This is done by the use of operators for defining classes.
    </daml:unionOf>                                              These kinds of operators restrict the possible extensions of a
  </daml:equivalentTo>
                                                                 concept. These kinds of restriction are the basis for deciding
</daml:Class>
                                                                 whether a class definition is equivalent, more specialized or
<daml:Class rdf:ID="male">                                       more general than another. Formally, we can decide whether
  <rdfs:subClassOf>                                              one of the following relations between two expressions hold:
    <daml:complementOf>                                           subsumption: C1 v C2 ⇐⇒ C1E ⊆ C2E
      <daml:Class rdf:resource="#female"/>
    </daml:complementOf>                                          membership: x : C ⇐⇒ xE ∈ C E
   Based on the definitions of subsumption and membership,          to translate the query into an equivalent concept expression,
we can use terminological reasoners to compute these rela-          classify this new concept and use standard inference methods
tions from a given model and the corresponding ontology.            to check whether an object is an instance of the query expres-
The main results for the given model are that mother is a           sion. This approach makes use of the fact that binary relations
subclass of female as all human beings that are not male are        in a conjunctive query can be translated into an existential re-
known to be female (the class human is a partition of male          striction in such a way that logical consequence is preserved
and female persons). This of course implies that all members        after a minor modification of the A-Box. Details are given in
of the class mother are also members of the class female. The       the following theorem.
resulting relations that correspond to the rdfs:subClassOf and      Theorem 1 (Role Roll-Up (Horrocks and Tessaris 2000))
the rdf:type statements can now be added to the RDF model           Let hC], R, Ai be a description logic knowledge base with
and its schema. The result are extended descriptions of Carol,      concept definitions C, relation definitions R and assertions
Doris and Mary as given below:                                      A. Let further R be a role, CI Concept names in T and a, b
                                                                    be individual names in A. Given a new concept name Pb not
<rdf:Description rdf:about="carol">                                 appearing in T , then
  <type rdf:resource="#mother"/>
  <type rdf:resource="#female"/>
  <has-child rdf:resource="#doris"/>                                        hC, R, Ai |= (a, b) : R ∧ b : C1 ∧ · · · ∧ b : Ck
</rdf:Description>
                                                                    if and only if
<rdf:Description rdf:about="doris">
                                                                        hC, R, A ∪ {b : Pb }i |= a : ∃R(Pb u C1 u · · · u Ck )
  <type rdf:resource="#mother">
  <type rdf:resource="#female"/>                                       As DAML+OIL can be seen as a specific variant of de-
</rdf:Description>                                                  scription logics, the query answering approach can be applied
                                                                    to DAML+OIL ontologies (compare [Horocks and Tessaris,
<rdf:Description rdf:about="mary">
                                                                    2002]). In especially, we can directly ask for objects that are
  <type rdf:resource="#female"/>
  <type rdf:resource="#virgin-mary">                                related by the has-child relation using the following very sim-
</rdf:Description>                                                  ple query:
                                                                                 Q(X, Y ) ← (X, Y ) : has − child
We see that the additional information obtained by explicitly       Using the theorem of Horrocks and Tessaris, we can now
adding implicit subclass and type relations to the model ex-        do the translation of the conjunct (X, Y ) : has − child
tends set of answers to our query with Carol, because it is         into a concept expression called role-up. In order to actu-
now explicitly stated that she is female and has a child named      ally retrieve related objects, we do this translation for in-
Doris. This approach goes further than the use of schema in-        stantiations of the general conjunct where the Y variable is
formation, however, it does not solve all problems, because         replaced by an object contained in the model. Substitut-
we still cannot find out that Doris and Mary are also answers       ing X for example by the object jesus − christ we get
to our query.                                                       the conjunct (X, jesus − christ) : has − child that trans-
                                                                    lates into the concept expression ∃has − child.Pjesus−christ
4     Relational Knowledge                                          This concept specifies all objects that are related to the ob-
The reason while present approaches fail to compile the on-         ject jesus − christ by the has − child relation. We can
tology in such a way that all intended answers can be given         use existing description logic reasoners in order to retrieve all
from the RDF model is caused by their limited abilities to          objects belonging to this concept. For the example instantia-
compile relational knowledge. In the example it followed            tion, the reasoner will return mary telling us that we can add
from the ontology that Mary is related to jesus-christ by the       the information that mary and jesus-christ are in the has child
has-child relation. In the case of Doris, it is implied that        relation into the RDF model. The new definition of mary will
she is connect to some, maybe unknown object via the same           be the following:
relation. Description logic reasoners allow to reason with
these kinds of relational information by constructing mod-          <rdf:Description rdf:about="mary">
els in terms of possible objects, their type and their relations.     <type rdf:resource="#female"/>
However, this information is only used internally to establish        <type rdf:resource="#virgin-mary">
subsumption and membership relations. In order to overcome            <has-child rdf:resource="#jesus-christ"/>
this problem we can use techniques for deductive query an-          </rdf:Description>
swering in the off-line compilation phase and store the re-
trieved answers as explicit knowledge in our RDF model.               In order to compile all the object relations implied by an
                                                                    ontology, we have to iterate this process over all instances and
4.1    Compiling Object Relationships                               over all relations mentioned. The corresponding algorithm is
Horrocks and Tessaris propose a deductive approach for an-          depicted as Algorithm 1.
swering conjunctive queries over description logic knowledge          Given a set of relations and objects, the algorithm compiles
bases. The idea of the approach of Horrocks and Tessaris now        out all relational information about them that is implied by
                                                                    that are directly connected with anonymous objects (compare
                                                                    figure 2). First of all, there is the daml:hasClass operator
                                                                    claiming that all objects of a class are related some object of a
                                                                    certain type. Further, there are cardinality statements claim-
                                                                    ing that all objects of a class are related to at least, at most
                                                                    or exactly a certain number of objects of a certain type. The
                                                                    case where no special type of the related objects is required
                                                                    is a special case and can therefore also be treated by the ap-
                                                                    proach. The same holds for the first mentioned daml:hasClass
                                                                    operators, because it is equivalent to stating that the mini-
                                                                    mal number of object of certain type that is required equals
                                                                    one. Further, requiring an exact number of related objects
                                                                    (daml:CardinalityQ) can be expressed by requiring a mini-
                                                                    mal and a maximal number of related objects of a certain
                                                                    type where both numbers equal the required number of ob-
                                                                    jects. Therefore, the main relational construct, we have to
                                                                    focus on when investigating the problem of compiling rela-
  Figure 1: Basic algorithm for compiling object relations          tional knowledge are the following constructs:

                                                                    <daml:restriction daml:minCardinalityQ="n">
a terminological knowledge base. When compiling an RDF                <daml:onProperty rdf:resource="#p"/>
model that is based on a DAML+OIL model, the termino-                 <daml:hasClassQ rdf:resource ="#C"/>
logical knowledge base will be provided by the DAML+OIL             </daml:restriction>
model. The sets of relations and objects will consist of all
known relations and objects that are contained either in the        <daml:restriction daml:maxCardinalityQ="m">
RDF model or the DAML+OIL model.                                      <daml:onProperty rdf:resource="#p"/>
                                                                      <daml:hasClassQ rdf:resource ="#C"/>
4.2   Anonymous Objects                                             </daml:restriction>

While the use of the query answering approach of Horrocks
                                                                       In the next section we introduce and extension of the com-
and Tessaris enables us to compile out information about
                                                                    pilation algorithm that deals with anonymous objects making
relations that exists between objects in a model, there is still
                                                                    use of these constructs.
implicit relational information that is not captured by this
approach. The reason for this is that the logical nature of         4.3   Compilation with Anonymous Objects
DAML+OIL allows us to capture incomplete information                As motivated above, our approach for compiling relational
about the domain of interest in the sense that it is not required   information with anonymous objects relies on the use of
to always name related objects in a concept definition. There       qualified number restrictions. We assume that all exis-
are also ways of talking about the existence and the number         tential restrictions and unqualified number restrictions have
of related objects in the domain without actually naming            been translated into qualified number restrictions in the way
these objects. The approach of Horrocks and Tessaris ignore         sketched in the last section. Further, we assume that the back-
this information, because their approach aims at answering          ground knowledge is consistent in itself and that the informa-
questions about named objects in a model. We think,                 tion model is consistent with this background model. In es-
however, that in the in the context of the semantic web there       pecially, this guarantees that there is no conflict between the
are also may situations where we are also interested in this        upper and lower bounds of related objects with respect to the
incomplete information. This argument is based on the open          qualified number restrictions we use for compilation. This re-
world assumption. The fact that the model does not contain          quirement can be checked using existing reasoning systems.
the name of the child of a female person in our example does        For every object o and relation r in the model we now perform
not mean that there is no information about this child. It just     a four step compilation process:
happens not to be contained in this specific model. Therefore,
the answer that doris by virtue of being a mother has a child       Step 1: Collect Bounds In the first step, we determine the
we do not know the name of is also a valuable answer to                  possible range of number of objects related to o by the
our example question. Following this argument, we sketch                 relation r. We collect all class descriptions o can be
an approach for also compiling relational information that               proven to be a member of. From these description, we
contains anonymous objects. In the resulting RDF model                   extract all qualified number restrictions that refer to r. Fi-
these anonymous objects will be represented by blank nodes.              nally, we take the largest lower bound and the smallest
                                                                         upper bound with respect to each class name2 occurring
  In order to determine what kind of anonymous objects we                in the restrictions and store these numbers together with
have to deal with, we have to have a look at the possibilities           the class name.
DAML+OIL provides us for describing relations to objects            Step 2: Verify Bounds In the second step, we check
that are not mentioned themselves. There are two operators               whether the bounds determined for the individual classes
      in the hierarchy are consistent with the bound deter-
      mined for their its subclasses. For every class c in the
      hierarchy starting at the bottom of the hierarchy, we sum
      up the lower bounds determined for all direct subclasses.
      We then check, whether this sum is smaller than the up-
      per bound determined for that class.
Step 3: Adjust Bounds In this step, we adjust the lower
    bound on related objects in the light of collected in-
    formation about more specific information about related
    objects that imply some of the information contained in                   (a) Step 1                (b) Step 2
    the current bounds.
       a) As a first step, we set the lower bound to the current
          upper bound if the upper bound is lower is lower
          than the sum of the lower bounds determined for
          all direct subclasses.
       b) As a second step we subtract the number of all
          known and already compiled anonymous objects of
          this type related to o. This ensures that only the
          most specific available information is actually com-
          piled out and prevents us from adding redundant in-
          formation.                                                        (c) Step 3                 (d) Step 4
Step 4: Compile Relations In the last step, we add informa-
    tion about relations of the object o to anonymous objects          Figure 2: Compuring Bounds in a Concept Hierarchy
    to the RDF model by adding the corresponding triples.
    For each class name, we look up the finally determined
    lower bound of objects related to o via r. This bound is               <daml:hasClassQ rdf:resource="#person"/>
    an natural number l that specifies the number of neces-              </daml:restriction>
    sarily existing relations excluding already known rela-            </rdfs:subClassOf>
    tional information and necessary relations to objects of         </daml:Class>
    a more specific type (compare step 2). We generate l
    anonymous objects ai and add the triples (o r ai) and (ai        <rdf:Description rdf:about="betty">
    type c) for each of these objects.                                 <type rdf:resource="#grandma"/>
                                                                       <type rdf:resource="#relaxed-parent"/>
   It is important to recall, that different from the other compi-     <has-child rdf:resource="#peter"/>
lation methods described in this paper, this compilation pro-        </rdf:Description>
cess is of a heuristic nature and does not claim to produce
logically sound results. The reason is that the current version      <rdf:Description rdf:about="peter">
of the compilation approach does not take into account all             <type rdf:resource="#person"/>
                                                                     </rdf:Description>
information contained in the background knowledge.

4.4    A Simple Example                             When we look at the definition of betty, we see that
We illustrate our compilation approach using a simple exam-
                                                 there are different sources of relational information, we are
ple. Consider the following model describing the instance
                                                 interested in. From her being of type mother (implied from
betty:                                           the fact of being a grandmother) we derive a lower bound of
                                                 one for the has-child relation with respect to the class person.
<daml:Class rdf:ID="grandma">                    Further, being a grandma also implies a lower bound of one
  <rdfs:subClassOf>                              on the has-child relation with respect to the class parent,
    <daml:Class rdf:resource="mother">           which is a subclass of person. An upper bound of two on the
    <daml:restriction>                           relation has-child is provided by the membership to the class
                                                 relaxed-parent. Finally, there is the the explicitly mentioned
      <daml:onProperty rdf:resource="#has-child"/>
      <daml:hasClass rdf:resource="#parent"/>    child peter who is of type person. The result of collecting
    </daml:restriction>                          these bounds is shown in figure 3a.
  </rdfs:subClassOf>
</daml:Class>
                                                    After processing the class parent which is lower in the hi-
<daml:Class rdf:ID="relaxed-parent">             erarchy, we see that the lower bound on the number of related
  <rdfs:subClassOf>                              objects of type parent is higher that the number of related
    <daml:restriction daml:maxCarninality="2">   individuals of that type (i.e. one compared to zero). As a
                                                 reaction, we add an anonymous object of type parent to the
      <daml:onProperty rdf:resource="#has-child"/>
RDF model (compare figure 2b). In the next iteration of the         we need more experiences with applying these methods to re-
process, we process the concept person by adding the lower          alistic scenarios in order to develop heuristics for improving
bound of all sub-concepts and checking it against the upper         the compilation result.
bound (figure 2c). From the resulting lower bound of two we
subtract the number of all related objects of this type. In our     References
case there are two objects of this kind: peter and the anony-
                                                                    [Brickley and Guha, 2003] Dan Brickley and R.V. Guha.
mous object created in the previous iteration. As the result
is zero we conclude that the necessary number of related ob-           RDF vocabulary description language 1.0: RDF schema.
                                                                       Working draft, W3C, 2003. http://www.w3.org/TR/rdf-
jects is already present in the model. The triples added in the
                                                                       schema/.
first iteration of the process lead to a more complete defini-
tion of the instance elly with respect to the has-child relation    [Broekstra et al., 2002] J. Broekstra, A. Kampman, and
as shown below:                                                        F. van Harmelen. Sesame: A generic architecture for stor-
                                                                       ing and querying rdf and rdf schema. In The Semantic Web
                                                                       - ISWC 2002 [2002], pages 54–68.
<rdf:Description rdf:about="betty">
  <type rdf:resource="#happy-grandma"/>                             [Cadoli and Donini, 1997] M. Cadoli and F.M. Donini. A
  <type rdf:resource="#relaxed-parent"/>                               survey on knowledge compilation. AI Communications,
  <has-child rdf:resource="#peter"/>                                   10(3-4):137–150, 1997.
  <has-child rdf:resource="_anonX"/>
                                                                    [Caroll, 2002] J. Caroll. Matching RDF graphs. In The Se-
</rdf:Description>
                                                                       mantic Web - ISWC 2002 [2002], pages 5–15.
<rdf:Description rdf:about="_anonX">                                [Hayes, 2003] Patrick Hayes. RDF semantics. Working
  <type rdf:resource="#parent"/>                                       draft, W3C, 2003. http://www.w3.org/TR/rdf-mt/.
</rdf:Description>
                                                                    [Horocks and Tessaris, 2002] I. Horocks and S. Tessaris.
                                                                       Querying the semantic web: A formal approach. In The
   The added information can now be used in querying en-               Semantic Web - ISWC 2002 [2002], pages 177–191.
abling us to derive that elly is necessarily connected to an        [Horrocks and Hendler, 2002] I. Horrocks and J. Hendler.
instance of type parent by the has child relation using a plain
                                                                       The Semantic Web - ISWC 2002, volume 2342 of Lecture
RDF query language. In fact, this object of type parent may
                                                                       Notes in Computer Science. Springer, 2002.
be identical with peter who we already know, but the back-
ground model does not provide us with enough information            [Horrocks and Tessaris, 2000] I. Horrocks and S. Tessaris. A
to prove whether this is true or not.                                  conjunctive query language for description logic aboxes.
                                                                       In AAAI/IAAI, pages 399–404, 2000.
5   Discussion                                                      [Miller et al., 2002] L. Miller, A. Seaborne, and A. Reggiori.
                                                                       Three implementations of SqishQL, a simple RDF query
In this paper, we addressed the problems that arise from               langauge. In The Semantic Web - ISWC 2002 [2002], pages
the existing gap between theoretical investigations and                423–435.
practical implementations of semantic web technology.
For the specific problem of querying RDF models with                [van Harmelen et al., 2001a] Frank van Harmelen, Peter F.
background knowledge we presented an approach to enhance               Patel-Schneider, and Ian Horrocks. Reference descrip-
plain RDF descriptions by information implied by the                   tion of the daml+oil (march 2001) ontology markup
background knowledge. To this end, we discussed some                   language. http://www.daml.org/2001/03/reference.html,
already existing approaches and presented new methods for              march 2001.
compiling complex relational knowledge using the query              [van Harmelen et al., 2001b] Frank        van      Harmelen,
answering approach proposed by Horrocks and Tessaris.                  Peter F. Patel-Schneider, and Ian Horrocks.             A
All the methods mentioned produce a provably correct and               model-theoretic semantics for daml+oil (march
complete result with respect to answering queries about                2001).       http://www.daml.org/2001/03/model-theoretic-
named objects in an RDF model.                                         semantics.html, march 2001.

   We argued that an open world assumption as we face to
on the semantic web also requires to consider anonymous
objects in answering queries. We described a first step to-
wards an adequate treatment of these objects in the compi-
lation step. The approach presented already produces some
reasonable results, but it cannot give any soundness or com-
pleteness guarantees. We do not expect to come up with a
provably complete algorithm for this problem, however, there
are still many options for improving the result presented most
of them concerned with features for defining complex rela-
tional structures (e.g. role hierarchies, transitivity). Further,

</pre>