Achieving Scalability and Expressivity in an RDF
     Knowledge Base by Implementing Contexts
                       Heiko Stoermer                                      Ignazio Palmisano                     Domenico Redavid
                   University of Trento                            Università degli Studi di Bari          Università degli Studi di Bari
     Dept. of Information and Communication Tech.                   Dipartimento di Informatica             Dipartimento di Informatica
                      Trento, Italy                                          Bari, Italy                              Bari, Italy
               Email: stoermer@dit.unitn.it                        Email: palmisano@di.uniba.it              Email: redavid@di.uniba.it


   Abstract— In this paper we are presenting the context archi-              results, and we wrap up with a conclusion and a short mention
tecture implemented on top of the RDFCore system. With this                  of planned further works in Sect. VI.
extended Knowledge Representation framework we are trying to
overcome some of the limitations of RDF and OWL as they are                               II. MOTIVATION AND RELATED WORK
today, without losing sight of performance and scalability issues.
We are illustrating motivations – partly based on requirements
                                                                                One of our initial motivations to move in the direction of
in the VIKEF project – as well as theoretical background,                    contexts in Semantic Web KBs was our critical view on one
implementation details and test-results of our latest works.                 of the ideas of the Semantic Web, namely that – with a shared
                                                                             ontology – two RDF Aboxes provided by different agents
                        I. INTRODUCTION                                      can simply be merged, collapsed on identical URIs, and thus
                                                                             provide a new, bigger KB for answering a query (the pre-merge
   Motivated by requirements of the VIKEF1 project, where
                                                                             scenario is depicted in Fig. 1).
a large-scale Semantic Web knowledge-base about documents
and other objects provides for intelligent services to the user,
we are investigating and developing a more extended KR
framework, trying to overcome some of the limitations of RDF
and OWL as they are today.
   Our basic idea is to introduce the notion of context into
Semantic Web Knowledge Representation (KR), as previously
described in [2], [14]. We claim that the distributed nature
of the Semantic Web raises issues that can be attacked by
contextualizing knowledge bases, i.e. restricting the scope of
statements to the circumstances they were made under.                           Fig. 1.   Two RDF Aboxes A and A’ compliant to a single TBox T.
   The contribution of this paper is to present one possible
realization of this more complex KR approach for the Seman-                     However, apart from implicit semantics that are omitted
tic Web, and to illustrate our progress based on the KBMS                    when applying such a strategy, cases can be constructed that
RDFCore [4]. In continuation of the ideas and preliminary                    unveil problems even on the logical level. Take the following
results presented in [14], we have been concentrating more on                example, as depicted in Fig. 2: on the formalization side we
the aspects of compatibility relations (CRs) between contexts,               have a TBox T with some relations that have cardinality
which can be used to describe in which way statements in                     constraints, and two ABoxes A and A’ with assertions compliant
more than one context can be combined to answer queries                      to this TBox. Both ABoxes are consistent by themselves,
to the KB. We have conducted a more extensive experiment                     but when merged, they produce an inconsistency as the two
to investigate performance aspects of RDFCore and our                        following statements violate the cardinality constraints in T:
extensions, and backed by general theories of Contextual Reasoning              < prodi prime_minister italian_government >
we believe that we will in some cases be able to provide for                    < berlusconi prime_minister italian_government >
better scalability than a flat, non-contextual KB.                              Relying on a host of research done in the area of Context in
   The paper is organized as follows: In Sect. II we present                 KR [7], [12], [11], [6], [1], [5], [13], we believe it is a viable
intuitive and technical motivations for our approach, as well as             approach to attack issues of this nature by binding consistent
related work. Sect. III describes our general proposal, whereas              sets of assertions to the circumstances they were made under,
Sect. IV contains a technical description of the steps taken to              i.e. to limit their scope to a context, as we will describe in
realize our ideas. In Sect. V we present our experimentation                 Sect. III.
                                                                                As discussed in [8], [2], [3], this contextualization can
  1 Virtual Information and Knowledge Environment Framework; more infor-     serve as a basis for a number of KR modelling aspects,
mation at http://www.vikef.net                                               such as temporal evolution, trust, beliefs and provenance. The
                                 Fig. 2.   Example formalization that produces an inconsistency when merged.
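
The merge problem of Fig. 2 can be illustrated with a minimal sketch, which is not part of the RDFCore implementation. Triples are modelled as plain Python tuples, and the cardinality constraint of the TBox is approximated by a hypothetical table MAX_SUBJECTS that allows at most one subject per (predicate, object) pair. The sketch further assumes that distinct names denote distinct individuals, which OWL does not assume by default.

```python
from collections import defaultdict

# Hypothetical encoding: a triple is (subject, predicate, object).
ABOX_A = {("prodi", "prime_minister", "italian_government")}
ABOX_A_PRIME = {("berlusconi", "prime_minister", "italian_government")}

# Assumed stand-in for the TBox cardinality constraint:
# at most this many distinct subjects per (predicate, object) pair.
MAX_SUBJECTS = {"prime_minister": 1}


def violations(abox):
    """Return {(predicate, object): subjects} for every exceeded bound.

    Assumes distinct names denote distinct individuals.
    """
    subjects = defaultdict(set)
    for s, p, o in abox:
        subjects[(p, o)].add(s)
    return {
        key: subs
        for key, subs in subjects.items()
        if key[0] in MAX_SUBJECTS and len(subs) > MAX_SUBJECTS[key[0]]
    }


# Each ABox is checked alone, then the flat (non-contextual) merge.
merged = ABOX_A | ABOX_A_PRIME
print(violations(ABOX_A), violations(ABOX_A_PRIME), violations(merged))
```

Under these assumptions each ABox is consistent in isolation, while the merged set reports a violation for the prime_minister relation, which is the case that contexts are meant to keep apart.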


contributions of our approach compared to the proposals made               One issue that becomes obvious immediately is the case
in [8], [3], [10], [9] as well as compared to named graph               where the union of C’ and C produces an inconsistent ABox
implementations in current RDF triple stores are that i) we             which makes query-answering impossible. This can result
do not propose or require an extension of the current RDF               from cardinality constraints in the TBox (see the Berlusconi-
standard and ii) we aim at substantial support for Compatibility        Prodi example in Sect. II), or subsumption issues (an individ-
Relations (CRs).                                                        ual o is said to be instance of different classes). Our basic
   These relations between contexts enable us to make explicit          solution approach is to extract a minimal subgraph containing
in which way the assertions in the related contexts are sup-            the statement(s) that caused the inconsistency into a named
posed to be combined for query answering, to provide for                graph NG, as illustrated in Fig. 3.
flexible and powerful contextual reasoning as envisioned in
the mentioned bibliography.
   In the course of the VIKEF project it became evident that
some of the relations we have in mind have procedural se-
mantics, and can thus not be formalized in an OWL ontology,
and these are what we are concentrating on at the moment. In
the next section we will describe our exemplary proposal of
such a complex relation.

                  III. AN EXEMPLARY CR
                                                                                 Fig. 3.   Two contexts C and C’ in an EXTENDS relation.
   The EXTENDS relation we have chosen to illustrate is
meant to describe a situation where we know that two contexts
                                                                           The result is that the query can be processed on the conflict-
describe the same object, but assume that one context contains
                                                                        free part of the union of C’ and C. One possible criticism
more information about it than the other.
                                                                        could be that of course we could pose the query to C alone,
   Take the example of two Information Extraction processes
                                                                        without respecting C’, and thus avoid the conflict altogether.
P and P’ that are run on the same document, at different points
                                                                        This however ignores the EXTENDS relation between the
in time. Assume P’ is a more advanced process and is able to
                                                                        two (which has been established for a reason), and thus should
extract more information from the document. We propose to
                                                                        only be allowed on contexts that are not in such a relation.
model this as two contexts C (created by P) and C’ (created
                                                                           The case is of course slightly more complex when we
by P’) with a relation EXTENDS that explicates that C’
                                                                        take into account more than two contexts. We envision the
is an extension to C (a necessary condition for this relation
                                                                        EXTENDS relation to be transitive. This can result in a
is that both contexts describe the same object). Intuitively we
                                                                        reasoning chain i) when establishing the relation, as conflicts
want to keep the information derived from different sources
                                                                        have to be detected and re-modelled and ii) when querying
separate and with explicit metadata, but have the possibility
                                                                        the contexts, as the necessary contexts and relevant subgraphs
to combine the resulting information where necessary.
                                                                        have to be traversed. This chain however is non-cyclic, as
   When a query q is posed on C’, the procedural semantics
                                                                        the relation is directional. Section IV describes our first
of EXTENDS are envisioned as follows:
                                                                        implementation of this relation.
if q can be answered in C’                                                 We have chosen to attack and illustrate this specific relation
    then return answer                                                  due to its relative complexity. However, we are convinced that
else                                                                    our basic approach as described in [14] is fairly general and
    propagate query to C’ union C.                                      can be used to implement relations of different kinds. In the
                                                                        course of the project we envision relations that make explicit
temporal evolution, trust and a number of domain specific            The implementation we are presenting in this paper relies on
aspects.                                                          RDFCore for RDF models storage, and on Pellet2 for reason-
                                                                  ing tasks such as consistency check over a View. As illustrated
                       IV. REALIZATION                            in Fig. 4, the DL reasoner is used by the CompatibilityRelation
                                                                  implementations (note that different implementations could
A. RDFContextManager                                              need different reasoning settings, e.g. only RDFS or OWL Lite
  The component we developed to manage contexts is called         inference rather than OWL DL inference), while all the storage
RDFContextManager; its architecture is presented in Fig. 4.       and retrieval of RDF models is done on RDFContextManager,
RDFContextManager is implemented as a Java interface,             which uses RDFCore and its facilities for model storage and
exposing methods to:                                              query [14], using the multiuser environment of RDFCore to
                                                                  enable use of Context information by other applications.
  • set the Compatibility Relation Ontology (CRO), which
    is the ontology that defines Compatibility Relations,         B. The Compatibility Relations Ontology (CRO)
    Contexts, parts of Contexts (Graphs) and also gives the
    concepts necessary to represent context splitting and            The CRO contains the definition of the main concepts used
    relations between Contexts and Graphs                         to describe the KB structure in terms of contexts; it contains
  • add new statements to the CRO, stating for example that
                                                                  the definition of Context and the definition of Graph, where
    a given URI C1 represents a Context, that this Context        both concepts represent entities that are named graphs; a
    extends an existing context C2 , or that there is a Graph     Context has the (informal) property of representing something
    G1 which is part of C2 and is compatible or not with C1       that has a meaning as a whole, e.g. the set of statements
  • add, remove or update Contexts and Graphs in the un-
                                                                  extracted from a specific document, at a specific time, with
    derlying persistence layer                                    a specific algorithm, while a Graph is a set of statements that
  • obtain Views over a Context, e.g. ask RDFContextMan-
                                                                  is included in one or more Contexts or other Graphs, but has no
    ager to return all Contexts and Graphs that are connected     specific meaning alone (e.g. the set of statements in a Context
    to a Context C1 with EXTENDS relations, directly or           that cause inconsistencies with another Context). A domain-
    by means of part of relations, following all the relation     range view of the CRO is given in Fig. 5.
    chains and obeying imposed limitations                           Moreover, the CRO contains the definition of the Splittin-
                                                                  gReason class, which represents the reason that led to the
                                                                  isolation of a part of a Context and the storage of that fragment
                                                                  as a Graph; a SplittingReason instance includes references to
                                                                  the Context from where the statements that are being split
                                                                  belonged, the Graph that will hold these statements, the reason
                                                                  for which this split has been done, e.g. because the statements
                                                                  create inconsistencies w.r.t. another context (which is also
                                                                  linked to the reason), and the reification of the statements in
                                                                  the CRO that triggered the split, if any.
            Fig. 4.   Architecture of RDFContextManager
                                                                     An example of SplittingReason generation is the one we
                                                                  will illustrate in detail in Sect. IV-C.1: let us have Contexts
  A CompatibilityRelation is a Java interface exposing meth-
                                                                  C1 and C2 , if we add to the CRO the statement S1 = C1
ods to:
                                                                  EXTENDS C2 , this will trigger a consistency check over
  • verify whether an implementation of CompatibilityRelation     C1 ⊔ C2. If there is an inconsistency, the statements in C2 that
    should be triggered into action by some statements            cause the inconsistency are moved to a Graph G1, and then a
    added to the CRO, e.g. the insertion of a statement C1        SplittingReason SR1 will be created in the CRO, linked to C2
    EXTENDS C2 should trigger the consistency check over          and G1, with a reason of class Inconsistency which is linked
    C1 ⊔ C2, and, if an inconsistency is detected,                to C1 and a part of relation between C2 and G1; S1 will
    countermeasures should be undertaken, in order to guarantee   be reified and attached to SR1, so that the complete splitting
    that a View over C1 does not return an inconsistent set       process can be tracked.
    of statements                                                    The CRO also acts as a registry for CompatibilityRelation
  • carry out the check specific for this CompatibilityRelation   implementations, since each declaration of a CompatibilityRe-
  • ask this implementation to provide a set of Contexts          lation amounts to the declaration of a property in this ontology;
    or Graphs that would be excluded from a View over a           an AnnotationProperty for this property, called implemen-
    Context C1 due to some reason, e.g. incompatibility due       tation uri, gives the java class name of the corresponding
    to inconsistency                                              implementation; this is used to retrieve the set of Compatibili-
  • ask the implementation to provide a set of Contexts or        tyRelation that RDFContextManager will use when managing
    Graphs that would be included in a View over a Context        the CRO and the knowledge base.
    C1 , e.g. because of an EXTENDS or a part of relation
    or chain of relations                                           2 www.mindswap.org/2003/pellet/
                                             Fig. 5.   Domain-range view of the CR Ontology
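
The SplittingReason generation described in Sect. IV-B can be sketched as follows. This is an illustrative sketch under stated assumptions, not the RDFContextManager code: the knowledge base is a dictionary from URIs to sets of tuples, the field names of SplittingReason and the functions find_conflicts and assert_extends are hypothetical, and find_conflicts is a toy stand-in for the DL reasoner (it only flags statements that give the same predicate and object a different subject).

```python
from dataclasses import dataclass

Triple = tuple  # (subject, predicate, object)


@dataclass
class SplittingReason:
    source_context: str     # context the statements were taken from (C2)
    result_graph: str       # graph that now holds them (G1)
    reason: str             # reason class, e.g. "Inconsistency"
    incompatible_with: str  # context the graph conflicts with (C1)
    trigger: Triple         # CRO statement that triggered the split


def find_conflicts(c1, c2):
    """Toy stand-in for the reasoner: statements of c2 whose
    (predicate, object) pair already has a different subject in c1."""
    taken = {(p, o): s for s, p, o in c1}
    return {(s, p, o) for s, p, o in c2 if taken.get((p, o), s) != s}


def assert_extends(kb, cro, c1, c2, g1):
    """Handle the CRO statement <c1 EXTENDS c2> and split c2 if needed."""
    trigger = (c1, "EXTENDS", c2)
    cro.add(trigger)
    conflicts = find_conflicts(kb[c1], kb[c2])
    if not conflicts:
        return None
    kb[c2] = kb[c2] - conflicts    # remove the responsible statements from C2
    kb[g1] = conflicts             # and store them as Graph G1
    cro.add((g1, "part_of", c2))   # G1 remains reachable as a part of C2
    return SplittingReason(c2, g1, "Inconsistency", c1, trigger)


kb = {
    "C1": {("prodi", "prime_minister", "italian_government")},
    "C2": {("berlusconi", "prime_minister", "italian_government"),
           ("rome", "capital_of", "italy")},
}
cro = set()
sr = assert_extends(kb, cro, "C1", "C2", "G1")
```

Keeping the trigger statement inside the SplittingReason mirrors the reification step of the text, so that a split can later be traced back to the assertion that caused it.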


C. Use of Compatibility Relations (CR)                                     C1 adds information to both of them, even if the two
  The simplest use case for the framework is as follows:                   extended contexts are incompatible; in fact, a View over
                                                                           C1 , which is forced to be consistent, will include only
  • An external application adds one or more different Con-
                                                                           one of the extended contexts
     texts in RDFContextManager, assigning them URIs or
                                                                         • The implementation of EXTENDS will be triggered to
     letting RDFContextManager choose one
                                                                           check matching with the three statements, and it will fire
  • The external application asserts some relations between
                                                                           the check for knowledge base reorganization
     the contexts or specific to a context; the relations between
                                                                         • The check performed by EXTENDS consists of
     the contexts are expressed through properties defined in
                                                                           verifying that any View over C1 that follows the EXTENDS
     the CRO
                                                                           chain does not produce an inconsistent model; therefore,
  • RDFContextManager receives these new assertions, and
                                                                           it takes the content of C1 and of C2 and runs a DL
     triggers all the CR implementations available into first
                                                                           reasoner (Pellet in this case) over the union. If any
     verifying if any of the new assertions is relevant (i.e. the
                                                                           inconsistency is detected, EXTENDS tries to isolate
     asserted relation corresponds to the URI the implemen-
                                                                           the responsible statements, selects those that appear in
     tation is attached to) and then checking whether the new
                                                                           C2 and removes them from C2 ; the statements are then
     relation is likely to cause reorganization of the knowledge
                                                                           stored as a Graph G1 . The split is tracked by creating
     base; if this is the case, corrective actions are undertaken
                                                                           a SplittingReason object, connected to C2 , which is
  • The external application makes a query over the CRO to
                                                                           the source, and G1 , which is the result; it is also
     find out all the contexts that satisfy some conditions (e.g.
                                                                           connected to a reason, which in this case is instance of the
     all the contexts which have been created in a specific
                                                                           Inconsistency class, and in turn to C1 which is related
     date), and then asks to perform a query over the set
                                                                           as incompatible w.r.t G1 . The statements added to the
     of statements resulting from the union of the contexts;
                                                                           CRO are reified and attached to the SplittingReason as
     this involves creation of a View for each context that is
                                                                           triggers, in order for the split to be traceable, and finally
     selected by the query
                                                                           a part of relation is asserted between C2 and G1 . Since
  1) EXTENDS Example: We will now use EXTENDS as a                         the EXTENDS relation is defined transitive, in case
practical example of the described use case:                               C2 is already connected through an EXTENDS relation
  • Two contexts C1 and C2 are inserted in RDFContextMan-                  to other contexts, then the check is performed not against
     ager                                                                  C2 alone but over the resulting View; the generated splits
  • C1 is asserted to extend C2 w.r.t. a specific subject S1:              in the KB can then be distributed along the EXTENDS
     the following statements are added to the CRO:                        chain, which is one of the scalability issues we analyze
     < C1 EXTENDS C2 >                                                     in Sect. V
     < C1 describes S1 >                                                 • When a View over C1 is requested, all the CR imple-
     < C2 describes S1 >                                                   mentations are requested to provide a set of Contexts or
     Matching objects for the describes predicate are nec-                 Graphs that must not appear in the final view (EXCLUDE
     essary because this enables an application to say that                set), i.e. are requested to forbid to follow some paths in
     C1 extends different unrelated contexts, in the sense that
                                                                               Model    Consistency   Model     Consistency
    the CRO assertions; this is because, when multiple CR are                  number   check (ms)    number    check (ms)
    present, some of them may forbid the presence of a result                   0-1       63476       10 - 11     60091
    that others would allow to appear in the results; simply                    2-3       49529       12 - 13     62621
                                                                                4-5       54184       14 - 15     59216
    removing all the forbidden results after all the paths are                  6-7       58410       16 - 17     58142
    followed is neither correct nor efficient, since this would                 8-9       62342       18 - 19     62041
    require complex pruning strategies. After the EXCLUDE                                       TABLE I
    set has been computed, all the CR implementations are                          RESULTS FOR 70000 TRIPLES MODELS
    required to provide the set of Contexts or Graphs that
    should appear in the resulting View (INCLUDE set), and
    they will prune their visiting graph as soon as a forbidden
    result is reached. The final View is then computed as           512 MB of RAM, which is not an adequate server setup). The
    the INCLUDE set plus the resources connected through            time required to complete the consistency check and automatic
    part of to these elements (not including those in the           splitting on models of greater size is around one minute, which
    EXCLUDE set)                                                    is acceptable from our point of view if we consider that this
  • The View can now be viewed as a single model, or the            operation has to be done only once, and occasionally as new
    set of URI for the contexts and graphs can be used as           relations are added.
    dataset for a SPARQL query to be issued to RDFCore,                The most relevant point, here, is that requesting a View
    which in turn uses ARQ3 as SPARQL engine to interpret           operation will return a set of graph identifiers that can be
    and answer it                                                   used as dataset for a SPARQL query, ensuring that the model
                                                                    resulting from the union of the queried data is consistent,
                         V. RESULTS
                                                                    without having to check at the time of querying; this also
   In this section we present the empirical evaluation we have      means that the memory requirements (at query time) of the
conducted so far. In order to check the system for scalability,     framework only depend on the number of relations between
we needed to design a big knowledge base with non trivial           Contexts and Graphs, and not on the size of the contained data,
contents, and at the same time divided in smaller chunks            or on their complexity. The memory needed by the SPARQL
without changing the semantics of the content. This, however,       engine to run the query itself, instead, depends heavily on the
seems a very difficult task, and so far we have not found           specific query; still no complete evaluation of the behavior
real world ontologies that satisfy these requirements, so we        of the system w.r.t. the possible kind of queries has been
used a homemade tool to generate individuals for a generic          performed.
ontology; repeating the process many times gave us two well
sized knowledge bases.                                                       VI. CONCLUSION AND FURTHER WORKS
   Using the SOFSEM ontology 4 , an ontology to describe the           Based on the opinion that contexts in Semantic Web KR are
SOFSEM conference, we generated two knowledge bases, one            a way to tackle some of the current limitations of the languages
composed of 30 models containing about 70000 statements             available and provide for better scalability in some cases, we
each (for a total of more than 2 million triples), and the other    have presented a theoretical approach and an implementation
containing 900 models of about 2000 triples each (1.8 million       of Contextual Reasoning in a Semantic Web KB and the
triples); on the first one, we tried to chain the models with       associated testing results. We have not only implemented a
EXTENDS relations involving two models at a time, while             context mechanism into our KBMS to be able to use a context
in the second one we chained thirty models at a time, obtaining     as a first-class object in assertions, but also illustrated a way
many chains, and then joined the chains. The results of the         to provide for context relations with procedural semantics
first experiment are presented in Table I, and those of the         which – in our opinion – is required for a complete context
second experiment in Table II. The                                  functionality.
second experiment is also depicted in Fig. 6.                          Our next steps will be directed towards the formal definition
   As is depicted in the graph, the time elapsed to create a        and implementation of more compatibility relations. Some of
view over the graphs is almost constant, even if the number of      them will be as required by the VIKEF project, but we are also
relations to navigate increases, while the time elapsed to check    interested in exploring more general and domain independent
the consistency of the models grows proportionally to their         relations between contexts and their properties.
size. It is important to note that the consistency check runs          On the implementational side, these planned steps will be
only when new relations are entered in the CRO; the most            accompanied by the development of a more standardized test
frequent operation, then, will be the request to create a View      set and a set of exemplary queries that specifically display and
starting from some specified models, and the experimental           make use of contexts, to assess the practicability, performance
evaluation shows that this operation is usually performed in        and scalability of our implementations.
less than half a second on the test machine (a laptop with
                                                                                     VII. ACKNOWLEDGMENTS
  3 http://jena.sourceforge.net
  4 http://nb.vse.cz/~svabo/oaei2006/data/Conference.owl
                                                                      This research was partially funded by the European
                                                                    Commission under the 6th Framework Programme IST Integrated
                          Model      View     Consistency       Model    View     Consistency    Model      View    Consistency
                          number     (ms)     check (ms)        number   (ms)     check (ms)     number     (ms)    check (ms)
                            0         91         2043             9       156        5913          18       208        9986
                            1        176         2542             10      152        6076          19       214       10950
                            2        111         2956             11      175        6497          20       218       10699
                            3        122         3185             12      166        7101          21       232       11434
                            4        153         3634             13      175        7383          22       267       11696
                            5        114         3894             14      186        8046          23       242       12064
                            6        132         4328             15      195        8421          24       244       12706
                            7        134         4907             16      208        8862          25       249       13095
                            8        149         5057             17      220        9486          26       276       13476
                                                                         TABLE II
                                                RESULTS FOR SMALL-SIZED MODELS AND LONG CHAINS
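
The proportional growth of the consistency-check time can be checked directly against the Table II figures. The following is a minimal sketch, not part of the original evaluation: it fits an ordinary least-squares line to the consistency-check timings, using the model number as the independent variable. The helper name `linear_fit` and the choice of a plain least-squares fit are assumptions made for illustration only.

```python
# Timings copied from Table II, indexed by model number 0..26 (milliseconds).
view_ms = [
    91, 176, 111, 122, 153, 114, 132, 134, 149,
    156, 152, 175, 166, 175, 186, 195, 208, 220,
    208, 214, 218, 232, 267, 242, 244, 249, 276,
]
check_ms = [
    2043, 2542, 2956, 3185, 3634, 3894, 4328, 4907, 5057,
    5913, 6076, 6497, 7101, 7383, 8046, 8421, 8862, 9486,
    9986, 10950, 10699, 11434, 11696, 12064, 12706, 13095, 13476,
]


def linear_fit(ys):
    """Least-squares line y = intercept + slope * x for x = 0..len(ys)-1.

    Hypothetical helper for this sketch; returns (slope, intercept, r_squared).
    """
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: share of variance explained by the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


slope, intercept, r2 = linear_fit(check_ms)
print(f"consistency check: {slope:.1f} ms per model, R^2 = {r2:.3f}")
print(f"slowest View creation: {max(view_ms)} ms")
```

On these numbers the fitted slope is roughly 450 ms per additional model, and the slowest View creation in the table takes 276 ms, which agrees with the sub-half-second figure reported in the text.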


                                                      Fig. 6.    Results trend for small sized models


Project VIKEF - Virtual Information and Knowledge Environment
Framework (Contract no. 507173, Priority 2.3.1.7
Semantic-based Knowledge Systems; more information at
http://www.vikef.net).