=Paper=
{{Paper
|id=Vol-201/paper-1
|storemode=property
|title=Achieving Scalability and Expressivity in an RDF Knowledge Base by Implementing Contexts
|pdfUrl=https://ceur-ws.org/Vol-201/14.pdf
|volume=Vol-201
|dblpUrl=https://dblp.org/rec/conf/swap/StoermerPR06
}}
==Achieving Scalability and Expressivity in an RDF Knowledge Base by Implementing Contexts==
Heiko Stoermer, University of Trento, Dept. of Information and Communication Tech., Trento, Italy. Email: stoermer@dit.unitn.it

Ignazio Palmisano, Università degli Studi di Bari, Dipartimento di Informatica, Bari, Italy. Email: palmisano@di.uniba.it

Domenico Redavid, Università degli Studi di Bari, Dipartimento di Informatica, Bari, Italy. Email: redavid@di.uniba.it
Abstract: In this paper we present the context architecture implemented on top of the RDFCore system. With this extended Knowledge Representation framework we try to overcome some of the limitations of RDF and OWL as they are today, without losing sight of performance and scalability issues. We illustrate our motivations (partly based on requirements in the VIKEF project) as well as the theoretical background, implementation details and test results of our latest work.

I. INTRODUCTION

Motivated by requirements of the VIKEF¹ project, in which a large-scale Semantic Web knowledge base about documents and other objects provides intelligent services to the user, we are investigating and developing an extended KR framework that tries to overcome some of the limitations of RDF and OWL as they are today.

¹ Virtual Information and Knowledge Environment Framework; more information at http://www.vikef.net

Our basic idea is to introduce the notion of context into Semantic Web Knowledge Representation (KR), as previously described in [2], [14]. We claim that the distributed nature of the Semantic Web raises issues that can be attacked by contextualizing knowledge bases, i.e. restricting the scope of statements to the circumstances they were made under.

The contribution of this paper is to present one possible realization of this more complex KR approach for the Semantic Web, and to illustrate our progress based on the KBMS RDFCore [4]. In continuation of the ideas and preliminary results presented in [14], we have concentrated on compatibility relations (CRs) between contexts, which describe in which way statements in more than one context can be combined to answer queries to the KB. We have conducted a more extensive experiment to investigate performance aspects of RDFCore and our extensions, and, backed by general theories of Contextual Reasoning, we believe that we will in some cases be able to provide better scalability than a flat, non-contextual KB.

The paper is organized as follows: in Sect. II we present intuitive and technical motivations for our approach, as well as related work. Sect. III describes our general proposal, whereas Sect. IV contains a technical description of the steps taken to realize our ideas. In Sect. V we present our experimental results, and we wrap up with a conclusion and a short mention of planned further work in Sect. VI.

II. MOTIVATION AND RELATED WORK

One of our initial motivations to move in the direction of contexts in Semantic Web KBs was our critical view on one of the ideas of the Semantic Web, namely that, given a shared ontology, two RDF ABoxes provided by different agents can simply be merged, collapsed on identical URIs, and thus provide a new, bigger KB for answering a query (the pre-merge scenario is depicted in Fig. 1).

Fig. 1. Two RDF ABoxes A and A' compliant to a single TBox T.

However, apart from implicit semantics that are omitted when applying such a strategy, cases can be constructed that unveil problems even on the logical level. Take the following example, as depicted in Fig. 2: on the formalization side we have a TBox T with some relations that have cardinality constraints, and two ABoxes A and A' with assertions compliant to this TBox. Both ABoxes are consistent by themselves, but when merged they produce an inconsistency, as the two following statements violate the cardinality constraints in T:

 < prodi prime minister italian government >
 < berlusconi prime minister italian government >

Fig. 2. Example formalization that produces an inconsistency when merged.

Relying on a host of research done in the area of Context in KR [7], [12], [11], [6], [1], [5], [13], we believe it is a viable approach to attack issues of this nature by binding consistent sets of assertions to the circumstances they were made under, i.e. to limit their scope to a context, as we will describe in Sect. III.

As discussed in [8], [2], [3], this contextualization can serve as a basis for a number of KR modelling aspects, such as temporal evolution, trust, beliefs and provenance. The
contributions of our approach, compared to the proposals made in [8], [3], [10], [9] as well as to named graph implementations in current RDF triple stores, are that i) we do not propose or require an extension of the current RDF standard and ii) we aim at substantial support for Compatibility Relations (CRs).

These relations between contexts enable us to make explicit in which way the assertions in the related contexts are supposed to be combined for query answering, to provide for flexible and powerful contextual reasoning as envisioned in the cited literature.

In the course of the VIKEF project it became evident that some of the relations we have in mind have procedural semantics, and thus cannot be formalized in an OWL ontology; these are what we are concentrating on at the moment. In the next section we describe our exemplary proposal of such a complex relation.

III. AN EXEMPLARY CR

The EXTENDS relation we have chosen to illustrate is meant to describe a situation where we know that two contexts describe the same object, but assume that one context contains more information about it than the other.

Take the example of two Information Extraction processes P and P' that are run on the same document at different points in time. Assume P' is a more advanced process and is able to extract more information from the document. We propose to model this as two contexts C (created by P) and C' (created by P') with a relation EXTENDS that makes explicit that C' is an extension of C (a necessary condition for this relation is that both contexts describe the same object). Intuitively, we want to keep the information derived from different sources separate and with explicit metadata, but have the possibility to combine the resulting information where necessary.

When a query q is posed on C', the procedural semantics of EXTENDS are envisioned as follows:

 if q can be answered in C'
 then return answer
 else propagate query to C' union C.

One issue that becomes obvious immediately is the case where the union of C' and C produces an inconsistent ABox, which makes query answering impossible. This can result from cardinality constraints in the TBox (see the Berlusconi-Prodi example in Sect. II), or from subsumption issues (an individual o is said to be an instance of different classes). Our basic solution approach is to extract a minimal subgraph containing the statement(s) that caused the inconsistency into a named graph NG, as illustrated in Fig. 3.

Fig. 3. Two contexts C and C' in an EXTENDS relation.

The result is that the query can be processed on the conflict-free part of the union of C' and C. One possible criticism could be that of course we could pose the query to C alone, without respecting C', and thus avoid the conflict altogether. This however ignores the EXTENDS relation between the two (which has been established for a reason), and thus should only be allowed on contexts that are not in such a relation.

The case is of course slightly more complex when we take into account more than two contexts. We envision the EXTENDS relation to be transitive. This can result in a reasoning chain i) when establishing the relation, as conflicts have to be detected and re-modelled, and ii) when querying the contexts, as the necessary contexts and relevant subgraphs have to be traversed. This chain however is non-cyclic, as the relation is directional. Section IV describes our first implementation of this relation.

We have chosen to attack and illustrate this specific relation due to its relative complexity. However, we are convinced that our basic approach as described in [14] is fairly general and can be used to implement relations of different kinds. In the
course of the project we envision relations that make explicit temporal evolution, trust and a number of domain specific aspects.

IV. REALIZATION

A. RDFContextManager

The component we developed to manage contexts is called RDFContextManager; its architecture is presented in Fig. 4. RDFContextManager is implemented as a Java interface, exposing methods to:

• set the Compatibility Relation Ontology (CRO), which is the ontology that defines Compatibility Relations, Contexts, parts of Contexts (Graphs) and also gives the concepts necessary to represent context splitting and relations between Contexts and Graphs
• add new statements to the CRO, stating for example that a given URI C1 represents a Context, that this Context extends an existing context C2, or that there is a Graph G1 which is part of C2 and is compatible or not with C1
• add, remove or update Contexts and Graphs in the underlying persistence layer
• obtain Views over a Context, e.g. ask RDFContextManager to return all Contexts and Graphs that are connected to a Context C1 with EXTENDS relations, directly or by means of part of relations, following all the relation chains and obeying imposed limitations

Fig. 4. Architecture of RDFContextManager

A CompatibilityRelation is a Java interface exposing methods to:

• verify whether an implementation of CompatibilityRelation should be triggered into action by some statements added to the CRO, e.g. the insertion of a statement C1 EXTENDS C2 should trigger the consistency check over C1 ⊔ C2, and, if an inconsistency is detected, countermeasures should be undertaken, in order to guarantee that a View over C1 does not return an inconsistent set of statements
• carry out the check specific for this CompatibilityRelation
• ask this implementation to provide a set of Contexts or Graphs that would be excluded from a View over a Context C1 due to some reason, e.g. incompatibility due to inconsistency
• ask the implementation to provide a set of Contexts or Graphs that would be included in a View over a Context C1, e.g. because of an EXTENDS or a part of relation or chain of relations

The implementation we are presenting in this paper relies on RDFCore for RDF model storage, and on Pellet² for reasoning tasks such as the consistency check over a View. As illustrated in Fig. 4, the DL reasoner is used by the CompatibilityRelation implementations (note that different implementations could need different reasoning settings, e.g. only RDFS or OWL Lite inference rather than OWL DL inference), while all the storage and retrieval of RDF models is done by RDFContextManager, which uses RDFCore and its facilities for model storage and query [14], using the multiuser environment of RDFCore to enable use of Context information by other applications.

² www.mindswap.org/2003/pellet/

B. The Compatibility Relations Ontology (CRO)

The CRO contains the definition of the main concepts used to describe the KB structure in terms of contexts; it contains the definition of Context and the definition of Graph, where both concepts represent entities that are named graphs. A Context has the (informal) property of representing something that has a meaning as a whole, e.g. the set of statements extracted from a specific document, at a specific time, with a specific algorithm, while a Graph is a set of statements that is included in one or more Contexts or other Graphs, but has no specific meaning alone (e.g. the set of statements in a Context that cause inconsistencies with another Context). A domain-range view of the CRO is given in Fig. 5.

Moreover, the CRO contains the definition of the SplittingReason class, which represents the reason that led to the isolation of a part of a Context and the storage of that fragment as a Graph. A SplittingReason instance includes references to the Context to which the statements being split belonged, the Graph that will hold these statements, the reason for which this split has been done, e.g. because the statements create inconsistencies w.r.t. another context (which is also linked to the reason), and the reification of the statements in the CRO that triggered the split, if any.

An example of SplittingReason generation is the one we will illustrate in detail in Sect. IV-C.1: given Contexts C1 and C2, if we add to the CRO the statement S1 = C1 EXTENDS C2, this will trigger a consistency check over C1 ⊔ C2. If there is an inconsistency, the statements in C2 that cause the inconsistency are moved to a Graph G1, and then a SplittingReason SR1 will be created in the CRO, linked to C2 and G1, with a reason of class Inconsistency which is linked to C1, and a part of relation between C2 and G1; S1 will be reified and attached to SR1, so that the complete splitting process can be tracked.

The CRO also acts as a registry for CompatibilityRelation implementations, since each declaration of a CompatibilityRelation amounts to the declaration of a property in this ontology; an AnnotationProperty for this property, called implementation uri, gives the Java class name of the corresponding implementation; this is used to retrieve the set of CompatibilityRelation implementations that RDFContextManager will use when managing the CRO and the knowledge base.
Fig. 5. Domain-range view of the CR Ontology
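The SplittingReason generation described in Sect. IV-B can be sketched in a few lines of Python. This is only an illustrative sketch, not the RDFContextManager code: contexts are plain sets of triples, the DL reasoner is replaced by a toy rule (a predicate in the hypothetical set SINGLE_SUBJECT_PREDICATES may have only one subject per object), and all names (conflicting, assert_extends, part_of, prime_minister) are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Assumption: a predicate listed here admits at most one subject per object.
# This toy rule stands in for the TBox cardinality constraints that a DL
# reasoner such as Pellet would check in the real system.
SINGLE_SUBJECT_PREDICATES = {"prime_minister"}


@dataclass
class SplittingReason:
    source_context: str      # context the statements were removed from (C2)
    graph: str               # named graph that now holds them (G1)
    reason: str              # reason class, here always "Inconsistency"
    incompatible_with: str   # context the graph conflicts with (C1)
    trigger: Triple          # CRO statement that caused the split (S1)


def conflicting(c1: Set[Triple], c2: Set[Triple]) -> Set[Triple]:
    """Return the statements of c2 that clash with c1 under the toy rule."""
    holder = {(p, o): s for (s, p, o) in c1 if p in SINGLE_SUBJECT_PREDICATES}
    return {(s, p, o) for (s, p, o) in c2
            if (p, o) in holder and holder[(p, o)] != s}


def assert_extends(kb: Dict[str, Set[Triple]], cro: List[Triple],
                   c1: str, c2: str, graph_name: str) -> Optional[SplittingReason]:
    """Record `c1 EXTENDS c2` and split c2 if the union is inconsistent."""
    trigger = (c1, "EXTENDS", c2)
    cro.append(trigger)
    clash = conflicting(kb[c1], kb[c2])
    if not clash:
        return None
    kb[c2] = kb[c2] - clash          # remove the offending statements from C2
    kb[graph_name] = clash           # store them as a separate Graph G1
    cro.append((graph_name, "part_of", c2))
    return SplittingReason(c2, graph_name, "Inconsistency", c1, trigger)


kb = {
    "C1": {("prodi", "prime_minister", "italian_government")},
    "C2": {("berlusconi", "prime_minister", "italian_government"),
           ("doc1", "describes", "italian_government")},
}
cro: List[Triple] = []
reason = assert_extends(kb, cro, "C1", "C2", "G1")
print(reason)
print(sorted(kb["C2"]))
```

In this sketch the conflicting statement leaves C2 and is kept in G1, so a View over C1 can include the remainder of C2 without re-running the consistency check at query time.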
C. Use of Compatibility Relations (CR)

The simplest use case for the framework is as follows:

• An external application adds one or more different Contexts in RDFContextManager, assigning them URIs or letting RDFContextManager choose one
• The external application asserts some relations between the contexts or specific to a context; the relations between the contexts are expressed through properties defined in the CRO
• RDFContextManager receives these new assertions, and triggers all the available CR implementations, which first verify whether any of the new assertions is relevant (i.e. the asserted relation corresponds to the URI the implementation is attached to) and then check whether the new relation is likely to cause a reorganization of the knowledge base; if this is the case, corrective actions are undertaken
• The external application makes a query over the CRO to find out all the contexts that satisfy some conditions (e.g. all the contexts which have been created on a specific date), and then asks to perform a query over the set of statements resulting from the union of the contexts; this involves creation of a View for each context that is selected by the query

1) EXTENDS Example: We will now use EXTENDS as a practical example of the described use case:

• Two contexts C1 and C2 are inserted in RDFContextManager
• C1 is asserted to extend C2 w.r.t. a specific subject S1: the following statements are added to the CRO:
 < C1 EXTENDS C2 >
 < C1 describes S1 >
 < C2 describes S1 >
Matching objects for the describes predicate are necessary because this enables an application to say that C1 extends different unrelated contexts, in the sense that C1 adds information to both of them, even if the two extended contexts are incompatible; in fact, a View over C1, which is forced to be consistent, will include only one of the extended contexts
• The implementation of EXTENDS will be triggered to check matching with the three statements, and it will fire the check for knowledge base reorganization
• The check performed by EXTENDS consists of verifying that any View over C1 that follows the EXTENDS chain does not produce an inconsistent model; therefore, it takes the content of C1 and of C2 and runs a DL reasoner (Pellet in this case) over the union. If any inconsistency is detected, EXTENDS tries to isolate the responsible statements, selects those that appear in C2 and removes them from C2; the statements are then stored as a Graph G1. The split is tracked by creating a SplittingReason object, connected to C2, which is the source, and G1, which is the result; it is also connected to a reason, which in this case is an instance of the Inconsistency class, and in turn to C1, which is related as incompatible w.r.t. G1. The statements added to the CRO are reified and attached to the SplittingReason as triggers, in order for the split to be traceable, and finally a part of relation is asserted between C2 and G1. Since the EXTENDS relation is defined as transitive, in case C2 is already connected through an EXTENDS relation to other contexts, the check is performed not against C2 alone but over the resulting View; the generated splits in the KB can then be distributed along the EXTENDS chain, which is one of the scalability issues we analyze in Sect. V
• When a View over C1 is requested, all the CR implementations are requested to provide a set of Contexts or Graphs that must not appear in the final view (EXCLUDE set), i.e. are requested to forbid following some paths in
the CRO assertions; this is because, when multiple CRs are present, some of them may forbid the presence of a result that others would allow to appear in the results; simply removing all the forbidden results after all the paths are followed is neither correct nor efficient, since this would require complex pruning strategies. After the EXCLUDE set has been computed, all the CR implementations are required to provide the set of Contexts or Graphs that should appear in the resulting View (INCLUDE set), and they will prune their visiting graph as soon as a forbidden result is reached. The final View is then computed as the INCLUDE set plus the resources connected through part of to these elements (not including those in the EXCLUDE set)
• The View can now be viewed as a single model, or the set of URIs for the contexts and graphs can be used as the dataset for a SPARQL query to be issued to RDFCore, which in turn uses ARQ³ as SPARQL engine to interpret and answer it

³ http://jena.sourceforge.net

V. RESULTS

In this section we present the empirical evaluation we have conducted so far. In order to check the system for scalability, we needed to design a big knowledge base with non-trivial contents that is at the same time divided in smaller chunks without changing the semantics of the content. This, however, seems a very difficult task, and so far we have not found real-world ontologies that satisfy these requirements, so we used a homemade tool to generate individuals for a generic ontology; repeating the process many times gave us two well-sized knowledge bases.

Using the SOFSEM ontology⁴, an ontology to describe the SOFSEM conference, we generated two knowledge bases, one composed of 30 models containing about 70000 statements each (for a total of more than 2 million triples), and the other containing 900 models of about 2000 triples each (1.8 million triples); on the first one, we tried to chain the models with EXTENDS relations involving two models at a time, while in the second one we chained thirty models at a time, obtaining many chains, and then joined the chains. The results of the first experiment are presented in Table I, those of the second experiment in Table II. The second experiment is also depicted in Fig. 6.

⁴ http://nb.vse.cz/~svabo/oaei2006/data/Conference.owl

{| class="wikitable"
|+ TABLE I. RESULTS FOR 70000 TRIPLES MODELS
! Model number !! Consistency check (ms)
|-
| 0 - 1 || 63476
|-
| 2 - 3 || 49529
|-
| 4 - 5 || 54184
|-
| 6 - 7 || 58410
|-
| 8 - 9 || 62342
|-
| 10 - 11 || 60091
|-
| 12 - 13 || 62621
|-
| 14 - 15 || 59216
|-
| 16 - 17 || 58142
|-
| 18 - 19 || 62041
|}

As depicted in the graph, the time elapsed to create a view over the graphs is almost constant, even if the number of relations to navigate increases, while the time elapsed to check the consistency of the models grows proportionally to their size. It is important to note that the consistency check runs only when new relations are entered in the CRO; the most frequent operation, then, will be the request to create a View starting from some specified models, and the experimental evaluation shows that this operation is usually performed in less than half a second on the test machine (a laptop with 512 MB of RAM, which is not an adequate server setup). The time required to complete the consistency check and automatic splitting on models of greater size is around one minute, which is acceptable from our point of view if we consider that this operation has to be done only once, and occasionally as new relations are added.

The most relevant point here is that requesting a View operation will return a set of graph identifiers that can be used as the dataset for a SPARQL query, ensuring that the model resulting from the union of the queried data is consistent, without having to check at the time of querying; this also means that the memory requirements (at query time) of the framework only depend on the number of relations between Contexts and Graphs, and not on the size of the contained data, or on their complexity. The memory needed by the SPARQL engine to run the query itself, instead, depends heavily on the specific query; no complete evaluation of the behavior of the system w.r.t. the possible kinds of queries has been performed yet.

VI. CONCLUSION AND FURTHER WORKS

Based on the opinion that contexts in Semantic Web KR are a way to tackle some of the current limitations of the available languages and to provide for better scalability in some cases, we have presented a theoretical approach and an implementation of Contextual Reasoning in a Semantic Web KB and the associated testing results. We have not only implemented a context mechanism into our KBMS to be able to use a context as a first-class object in assertions, but also illustrated a way to provide for context relations with procedural semantics, which in our opinion is required for a complete context functionality.

Our next steps will be directed towards the formal definition and implementation of more compatibility relations. Some of them will be as required by the VIKEF project, but we are also interested in exploring more general and domain independent relations between contexts and their properties.

On the implementation side, these planned steps will be accompanied by the development of a more standardized test set and a set of exemplary queries that specifically display and make use of contexts, to assess the practicability, performance and scalability of our implementations.

VII. ACKNOWLEDGMENTS

This research was partially funded by the European Commission under the 6th Framework Programme IST Integrated
Project VIKEF - Virtual Information and Knowledge Environment Framework (Contract no. 507173, Priority 2.3.1.7 Semantic-based Knowledge Systems; more information at http://www.vikef.net).

{| class="wikitable"
|+ TABLE II. RESULTS FOR SMALL SIZED MODELS AND LONG CHAINS
! Model number !! View (ms) !! Consistency check (ms)
|-
| 0 || 91 || 2043
|-
| 1 || 176 || 2542
|-
| 2 || 111 || 2956
|-
| 3 || 122 || 3185
|-
| 4 || 153 || 3634
|-
| 5 || 114 || 3894
|-
| 6 || 132 || 4328
|-
| 7 || 134 || 4907
|-
| 8 || 149 || 5057
|-
| 9 || 156 || 5913
|-
| 10 || 152 || 6076
|-
| 11 || 175 || 6497
|-
| 12 || 166 || 7101
|-
| 13 || 175 || 7383
|-
| 14 || 186 || 8046
|-
| 15 || 195 || 8421
|-
| 16 || 208 || 8862
|-
| 17 || 220 || 9486
|-
| 18 || 208 || 9986
|-
| 19 || 214 || 10950
|-
| 20 || 218 || 10699
|-
| 21 || 232 || 11434
|-
| 22 || 267 || 11696
|-
| 23 || 242 || 12064
|-
| 24 || 244 || 12706
|-
| 25 || 249 || 13095
|-
| 26 || 276 || 13476
|}

Fig. 6. Results trend for small sized models

REFERENCES

[1] Massimo Benerecetti, Paolo Bouquet, and Chiara Ghidini. Contextual reasoning distilled. J. Exp. Theor. Artif. Intell., 12(3):279–305, 2000.

[2] Paolo Bouquet, Luciano Serafini, and Heiko Stoermer. Introducing Context into RDF Knowledge Bases. In Proceedings of SWAP 2005, the 2nd Italian Semantic Web Workshop, Trento, Italy, December 14-16, 2005. CEUR Workshop Proceedings, ISSN 1613-0073, online http://ceur-ws.org/Vol-166/70.pdf, December 2005.

[3] Jeremy Carroll, Christian Bizer, Patrick Hayes, and Patrick Stickler. Named Graphs, Provenance and Trust. In Proceedings of the Fourteenth International World Wide Web Conference (WWW2005), Chiba, Japan, volume 14, pages 613–622, May 2005.

[4] F. Esposito, L. Iannone, I. Palmisano, and G. Semeraro. RDF Core: a Component for Effective Management of RDF Models. In Isabel F. Cruz, Vipul Kashyap, Stefan Decker, and Rainer Eckstein, editors, Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases, Co-located with VLDB 2003, Humboldt-Universität, Berlin, Germany, September 7-8, 2003, 2003.

[5] Chiara Ghidini and Luciano Serafini. Distributed first order logics. In First International Workshop on Labelled Deduction [LD'98], 1998.

[6] Fausto Giunchiglia. Contextual reasoning. Epistemologia - Special Issue on I Linguaggi e le Macchine, XVI:345–364, 1993.

[7] Ramanathan V. Guha. Contexts: A Formalization and Some Applications. PhD thesis, Stanford, 1991.

[8] Ramanathan V. Guha, Rob McCool, and Richard Fikes. Contexts for the semantic web. In Sheila A. McIlraith, Dimitris Plexousakis, and Frank van Harmelen, editors, International Semantic Web Conference, volume 3298 of Lecture Notes in Computer Science, pages 32–46. Springer, 2004.

[9] Graham Klyne. Contexts for RDF Information Modelling. Content Technologies Ltd, October 2000. http://www.ninebynine.org/RDFNotes/RDFContexts.html.

[10] Graham Klyne. Circumstance, provenance and partial knowledge - Limiting the scope of RDF assertions, 2002. http://www.ninebynine.org/RDFNotes/UsingContextsWithRDF.html.

[11] John L. McCarthy. Generality in artificial intelligence. Commun. ACM, 30(12):1029–1035, 1987.

[12] John L. McCarthy. Notes on formalizing context. In IJCAI, pages 555–562, 1993.

[13] Luciano Serafini and Paolo Bouquet. Comparing formal theories of context in AI. Artif. Intell., 155(1-2):41–67, 2004.

[14] Heiko Stoermer, Ignazio Palmisano, Domenico Redavid, Luigi Iannone, Paolo Bouquet, and Giovanni Semeraro. RDF and Contexts: Use of SPARQL and Named Graphs to Achieve Contextualization. In Proceedings of the First Jena User's Conference, Bristol, UK, April 2006. http://jena.hpl.hp.com/juc2006/proceedings/palmisano/paper.pdf.