1144 CAiSE'06 Doctoral Consortium

Introducing Context into Semantic Web Knowledge Bases

Heiko Stoermer
University of Trento, Dept. of Information and Communication Tech.,
Via Sommarive 10, 38050 Trento, Italy
stoermer@dit.unitn.it

Abstract. Knowledge Representation in the Semantic Web is mainly characterized by the capabilities of the languages OWL and RDF. When attacking non-trivial problems such as the creation and maintenance of large-scale knowledge bases in RDF, it becomes evident that RDF lacks a central feature, namely the capability of restricting the truth value of a statement to a context, with far-reaching consequences which we will hint at in this paper. We propose a solution approach to this problem: the extension of RDF knowledge bases with context features. We will present theoretical background, envisioned experiments as well as a comparative study based on two technically different approaches, and illustrate the applications and benefits of contextual Knowledge Representation in the Semantic Web.

Keywords: Context, Knowledge Representation, Semantic Web, RDF

1 Introduction

The general setting of this thesis proposal is the area of the Semantic Web. Since the first concrete published ideas in this direction in 2001 [3], many interesting and progressive technologies have emerged to pursue the vision of a Web that will contain not simply data but semantic information that is machine-processable in a meaningful way.

The original approach was to keep things simple, as for the authors of [3] it was obvious that complex approaches in Artificial Intelligence (AI), and especially in the field of Human Language Technology (HLT), had not advanced to a level that could be called usable for the desired machine interpretation of Web documents authored by humans. So at the base of the layered architecture sketched in [3] there was supposed to be the meaningful annotation of Web documents to transform them into Semantic Web Documents.
Technologies such as RDF and RDF Schema [10] were among the first to come into existence: RDF to annotate documents, and RDFS to define the corresponding annotation vocabulary. Together with OWL, a language for conceptual modeling, these two languages and the related tools play a central role in the Semantic Web and have become standards for representing what a machine can know about a document (and the world).

From the original idea of rather simple (and possibly manual) annotation, things have developed further, not least through the influence of the AI community, on whose work, for example in Description Logics¹ and Automated Reasoning, the different variants of the OWL language are based.

More recent projects in the Semantic Web use a more self-confident vocabulary and talk about knowledge bases instead of metadata, and use inference techniques and Automated Reasoning to provide semantic-enabled services. This development has brought the efforts of the Semantic Web closer to research that has been performed for many years in the area of AI and specifically Knowledge Representation (KR). This growing overlap opens opportunities for applying theories from AI to the Semantic Web, to ensure that the efforts taken so far receive the best theoretical base they can have, and also to contribute new findings back to the AI field.

This thesis proposal will deal with one aspect of KR, the notion of context, and how to improve current KR approaches in the Semantic Web by incorporating context theories into RDF Knowledge Bases. I will illustrate the motivations for why sensible KR for the Semantic Web should be based on contexts, and underline this opinion by outlining requirements of the large ongoing European project Vikef², in which I am involved. I will furthermore give details about work that has already been performed in this area, by myself and by others, and describe the vision for the proposed research in more depth.
Finally, I will deliver an outline of the benefits and applications that a good approach to this topic will be able to provide.

¹ As one of many relevant sources of information about Description Logics, please refer to [1].
² For more information about the Vikef architecture, features and application scenarios, please refer to the project's website: http://www.vikef.net

2 Problem Description and Motivation

The Vikef project deals with creating large-scale information systems that are based on Semantic Web technology. At the center of the envisioned systems there is an RDF knowledge base (KB) that contains a large amount of information about documents and their contents. This information is stored in an RDF triple store that has been developed in the course of the project by other partners in the consortium. The triple store was originally envisioned as a single bag of RDF triples, i.e. all statements are stored together in the same information space.

However, from a Knowledge Representation point of view, RDF statements in general are context-free and thus represent statements of universal truth, while documents contain context-sensitive information, i.e. information whose interpretation depends on the context in which the document was written. In effect, the original way to proceed can easily cause contradictory statements to be stored in the KB, for instance "Silvio Berlusconi is the Prime minister of Italy" and "Romano Prodi is the Prime minister of Italy", as the result of articles written at different points in time. Such phenomena are, however, unwanted in a logical system because they would seriously interfere with the higher-level reasoning that is to be performed to provide the envisioned services.
Additionally, we would like to be able to model other aspects such as relevance, credibility and validity of a statement, all of which require further qualification of a single statement or a defined set of statements.

Countless similar examples can be constructed for a single information system, although in such a system it would at least be possible to implement some heuristics that try to keep track of these issues, for instance at the time a triple is inserted, to minimize the problem. However, if we think about the Semantic Web as a whole, with a large number of uncoordinated information systems, the problem becomes even more evident. If every peer builds up a KB of unqualified RDF statements, the set of universally true facts in the Semantic Web becomes enormously large and impossible to handle from a semantic point of view.

In our opinion, such contradictions, contradictory beliefs, and facts that become semantically incorrect in the absence of additional pragmatic or contextual information are likely to impose serious problems on the coordination and interoperation of information systems in the Semantic Web.

3 Research Proposal: Context in RDF Knowledge Bases

We think that the mentioned issues can be attacked by introducing the notion of context into RDF, to limit the scope of an RDF statement to the context in which it is relevant. In particular, we want to model that a statement is true only under a certain set of conditions, which will help us store information in the KB that would cause contradictions or inconsistencies in a non-contextual RDF KB.

3.1 Context in KR - Multi Context Systems

The ideas presented in this paper are based on the logical theory of Multi Context Systems (MCS) and the principles of Locality and Compatibility presented e.g. in [11], with influences from [8, 9].
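As a small illustration of why scoping statements to a context helps, the following sketch replays the prime-minister example from Sect. 2 on toy data. It is only a sketch under stated assumptions: the predicate name `isPrimeMinisterOf`, the context names `article_a` and `article_b`, and the rule that this predicate admits a single holder per country are all hypothetical and not part of the proposed architecture.

```python
from collections import defaultdict

# Hypothetical predicate name, used only for this illustration.
PM = "isPrimeMinisterOf"

# A flat "bag of triples": every statement is treated as universally true.
flat_kb = {
    ("Silvio Berlusconi", PM, "Italy"),
    ("Romano Prodi", PM, "Italy"),
}

# The same statements, each kept in the (hypothetical) context it came from.
contextual_kb = {
    "article_a": {("Silvio Berlusconi", PM, "Italy")},
    "article_b": {("Romano Prodi", PM, "Italy")},
}


def conflicts(triples, single_holder_predicate):
    """Return objects that have more than one subject for a predicate
    that is assumed to admit only one subject per object."""
    holders = defaultdict(set)
    for s, p, o in triples:
        if p == single_holder_predicate:
            holders[o].add(s)
    return {o: subjects for o, subjects in holders.items() if len(subjects) > 1}


# The flat store reports a contradiction for "Italy" ...
flat_conflicts = conflicts(flat_kb, PM)

# ... while each context, checked on its own, is locally coherent.
local_conflicts = {name: conflicts(t, PM) for name, t in contextual_kb.items()}
```

Checking each context separately mirrors the idea that reasoning inside a context can use standard mechanisms. How knowledge is combined across contexts is deliberately left out of this sketch.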
Basically, this theory states that contexts can be seen in a peer-to-peer view, resembling more general aspects such as human beliefs, agent knowledge and other distributed systems. The important aspect is that reasoning within a context follows standard mechanisms, as the non-elementary view on the axioms does not require keeping track of the context they are relevant for. Relations between contexts, however, i.e. what is needed to reason across contexts, are to be expressed in so-called compatibility relations (CRs), which formalize exactly how, under certain circumstances, knowledge from other contexts becomes relevant. Regarding RDF, we claim that a context can be thought of as a locally coherent set of axioms, and we envision CRs to be modeled as semantic attachments [16].

3.2 Basic Work

The basic idea is to have all statements that belong to a context in a separate named RDF graph, and to extend the RDF semantics in a way that enables contexts to appear as standard objects in RDF statements of other contexts. One way to achieve this has been described in [6].

Then, we want to model the mentioned compatibility relations (CRs) between contexts, to allow for reasoning across contexts. This aspect is probably the most important one, because from an application perspective it is crucial that sensible queries can be issued and that all relevant information is taken into account, which requires reasoning across contexts and reasoning on the relations between contexts.

Several approaches can be thought of to model compatibility relations in our architecture. First of all, one could think of allowing the implementer of an information system to provide their own vocabularies (ontologies) to describe relations between contexts. A similar option would be for us to provide such an ontology as part of the architecture.
However, in our opinion the basic problem with these approaches is the fact that many interesting relations between contexts cannot be fully formalized semantically with the help of a Semantic Web ontology, which is based on Description Logics³.

³ As an example for this claim, take a relation such as [...]. The semantics of this relation have to be expressed algorithmically: c and c′ are taken to be compatible in the sense that one does not contain facts that contradict facts of the other and that the relevant context parameters are the same; then, if no answer to a query to c can be given, the query will be propagated to c′.

One of the questions that might arise is how these compatibility relations are supposed to be modeled. At the moment, we see three approaches to do this:

1. Formally define a fixed set of compatibility relations as part of our architecture and require all systems that implement the architecture to also provide an implementation of the relations.
2. Provide an ontology for context relations, so that there exists a vocabulary to describe these relations with the help of RDF. This approach is slightly more flexible, because the ontology would be extendable.
3. Define a CR to be implemented as a semantic attachment [16], which can be thought of as a sort of plugin to the system, one attachment per CR. This has the positive effects that i) there is no restriction on how many and which kind of CRs are part of such a system and ii) implementation of the CRs is generally not restricted to any specific language or system.

Very basic and preliminary results including some of the above ideas have been presented in [6].

3.3 Open Issues and Further Work

As we are only at the very beginning of our work, there are numerous open issues and questions to be explored. We will only list a few, to illustrate the direction of the proposed work.
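Before turning to these issues, the algorithmic reading of a compatibility relation given in footnote 3, combined with the plugin idea of option 3 in Sect. 3.2, can be made more concrete with a minimal sketch. Everything named here is an assumption made for illustration: the `Context` class, the `make_cr` and `query` helpers, the toy data, and in particular the contradiction test (two facts clash if they share subject and a predicate declared single-valued but differ in object). The source does not prescribe any of these.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None acts as a wildcard


@dataclass
class Context:
    """A context: a named set of triples plus descriptive context parameters."""
    name: str
    params: Dict[str, str]
    triples: Set[Triple] = field(default_factory=set)

    def match(self, pattern: Pattern) -> List[Triple]:
        """Return local triples that agree with the pattern on its non-wildcard slots."""
        return sorted(
            t for t in self.triples
            if all(want is None or want == have for want, have in zip(pattern, t))
        )


# A compatibility relation (CR) is modeled as a pluggable predicate over two contexts.
CompatibilityRelation = Callable[[Context, Context], bool]


def make_cr(relevant_params: List[str], single_valued: Set[str]) -> CompatibilityRelation:
    """Build a CR in the spirit of footnote 3: same relevant parameters, no clashing facts.
    The clash test over single-valued predicates is an illustrative assumption."""
    def compatible(c1: Context, c2: Context) -> bool:
        same_params = all(c1.params.get(k) == c2.params.get(k) for k in relevant_params)
        clash = any(
            a[0] == b[0] and a[1] == b[1] and a[1] in single_valued and a[2] != b[2]
            for a in c1.triples for b in c2.triples
        )
        return same_params and not clash
    return compatible


def query(pattern: Pattern, start: Context, others: List[Context],
          cr: CompatibilityRelation) -> List[Triple]:
    """Answer locally if possible; otherwise propagate to compatible contexts."""
    local = start.match(pattern)
    if local:
        return local
    propagated: List[Triple] = []
    for other in others:
        if cr(start, other):
            propagated.extend(other.match(pattern))
    return propagated


# Toy contexts (hypothetical data).
c = Context("c", {"collection": "news"}, {("doc1", "hasTopic", "elections")})
c_prime = Context("c_prime", {"collection": "news"}, {("doc2", "hasAuthor", "alice")})
c_other = Context("c_other", {"collection": "blogs"}, {("doc2", "hasAuthor", "bob")})

cr = make_cr(relevant_params=["collection"], single_valued={"hasAuthor"})
local_answer = query(("doc1", "hasTopic", None), c, [c_prime, c_other], cr)
propagated_answer = query(("doc2", "hasAuthor", None), c, [c_prime, c_other], cr)
```

Because the CR is an ordinary callable passed to `query`, a different relation can be swapped in without touching the contexts or the query routine, which is the practical appeal of treating CRs as attachments rather than as ontology terms.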
To begin with, there are aspects of the underlying theory that in our opinion have to be clarified. Among these are:

- What are the semantics of overlapping contexts and how are they to be modeled?
- What are the semantics of adding a statement to the context, or adding a statement to the description of a context?
- In which way are statements in a context affected by statements about a context?
- How do we discover that a statement is not coherent with a context and thus may not be added to it?
- Should contexts be transparent or opaque to a query language?

Secondly, as one of the major aspects of the research to be performed, we intend to invest a reasonable amount of effort into the exploration of compatibility relations. As mentioned before, there are several implementation options that come to mind, but we want to make sure all possibilities are thought of.

Additionally, one of our objectives is to see whether there are limitations to the Multi Context Systems theory, what their nature is and whether they can be overcome. Regarding the relations between contexts themselves, we would like to investigate whether there are general relations that can always be assumed, such as for example the ones existing in relational database systems, and how we want to deal with application-specific relations. It is our goal to explore the possibilities that the theory offers, and to define the formal properties of the relations we find to be relevant.

3.4 Comparative and Experimental Objectives

As a further step in addition to the more theoretical aspects mentioned before, we are currently establishing close collaboration with two other research groups to lay the base for comparative experimental work that is envisioned to be part of the outcomes of the proposed dissertation. With the group responsible for developing the mentioned RDF triple store we are currently working on an implementation based on RDF named graphs, which we have described in [15].
Additionally, with the group behind the RDF-based P2P system DBin⁴ we will try to develop a second implementation based on RDF's reification feature, to explore whether there are ways to overcome the limitations mentioned later in Sect. 4.

A substantial part of the proposed work will be the comparison and evaluation of the two prototypes regarding soundness, limitations, performance and development effort. The setting of the proposed work within a large-scale research project is very fortunate and beneficial for this kind of comparative study. We expect very large datasets to be generated and available for intense evaluation, which will hopefully make the results interesting and highly relevant for the Semantic Web community.

⁴ http://www.dbin.org/

4 State of the Art and Related Work

We are certainly not the first to raise the issue of underqualified statements in RDF. The straightforward approach to tackle this problem would be to use the reification capability of RDF: for every statement inserted into an RDF graph we also insert a number of meta-statements about this statement, containing all relevant context parameters, e.g.:

- <1996 IsTheTimeOf "">
- <2005 IsTheTimeOf "">

There exist different opinions on the question of whether the parameters describing a context can be limited or not [2]. In any case, this approach would be implementable using standard RDF. However, with a potentially unlimited number of context parameters, we foresee a statement explosion with this kind of approach, because for every statement we would have to add a significant number of meta-statements describing the relevant context dimensions, so the overhead is immense.

A completely different approach to representing contexts in RDF is to extend RDF with the ability to represent a reference to a context directly in the data model.
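To give a rough feeling for the overhead difference between the two approaches, the following sketch counts stored statements under a deliberately simple model. The model itself is an assumption: reification is taken to cost the four standard RDF reification triples (`rdf:type rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) plus one meta-statement per context parameter for every statement, whereas a context reference in the data model is taken to cost one stored item per statement plus one description triple per parameter per context. The function names and the example numbers are hypothetical, not measurements.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(stmt_id, triple, context_params):
    """Return the triples needed to reify one statement and attach context parameters.
    stmt_id and the parameter names are placeholders chosen for this sketch."""
    s, p, o = triple
    reification = [
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
    ]
    meta = [(stmt_id, name, value) for name, value in context_params.items()]
    return reification + meta


def reification_cost(n_statements, n_params, keep_original=True):
    """Total triples when every statement is reified and annotated individually."""
    per_statement = 4 + n_params + (1 if keep_original else 0)
    return n_statements * per_statement


def context_reference_cost(n_statements, n_contexts, n_params):
    """Total stored items when statements carry a context reference and
    each context is described once."""
    return n_statements + n_contexts * n_params


one_reified = reify("_:s1", ("a", "b", "c"), {"time": "t1", "source": "d1"})
reified_total = reification_cost(n_statements=1000, n_params=5)
referenced_total = context_reference_cost(n_statements=1000, n_contexts=10, n_params=5)
```

Under these assumed numbers the reified store holds 10000 triples against 1050 items for the context-reference variant. The gap widens with every additional context parameter, since parameters are repeated per statement in the first model and per context in the second.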
There have been proposals in the past, by Guha [12] or Klyne [14, 13], which do not use reification but implement context as a real extension of the RDF model theory, by moving from triples to quadruples in order to identify the context to which a statement belongs. To the best of our knowledge, these ideas have not been pursued any further. Moreover, all the currently available RDF tools would have to be extended in order to deal with such an RDF model.

One related approach that has led to actual results is that of the W3C Named Graph Interest Group⁵. A substantial article was published in 2005 [7], and implementation results are now part of the Named Graphs API for Jena (NG4J)⁶, with Jena being the de-facto standard for the development of software that deals with RDF. We think that this approach could well be used as an underlying implementation for the smaller part of what we want to do, and be extended for the very important aspect of Compatibility Relations discussed above.

From a different point of view, we are also trying to discover the relations between our ideas and works from the field of Description Logics, e.g. Context-OWL [5] and Distributed Description Logics [4]. These works are especially important with regard to Compatibility Relations and the modelling of contexts in general.

⁵ http://www.w3.org/2004/03/trix/
⁶ http://www.wiwiss.fu-berlin.de/suhl/bizer/ng4j/

5 Benefits and Applications

A good solution for the issues mentioned in Sect. 2 would in our opinion be the basis for advanced Knowledge Representation in the Semantic Web. We hope to provide three major outcomes with our work:

1. A detailed solution to the problem of modeling contexts in the Semantic Web in a coherent and general way.
2. An evaluation of the MCS theory. We would be able to put this theory to the test, and explore its limitations.
3. Provision of comparative experimentation results, to illustrate which possibilities exist, how they behave and whether they prove appropriate for real-world applications.

Possible applications for this kind of knowledge representation are manifold. Aspects such as beliefs, trust, incomplete knowledge or knowledge base evolution can in our opinion all be tackled with a sensible context system as a base. We believe that in the long run, the vast amount of knowledge represented in the Semantic Web can be handled much more easily if represented in context.

However, we envision the outcomes of this work to go beyond local aspects and also become relevant from a distributed point of view. As the nature of the Semantic Web is inherently distributed, we think we can contribute to the semantic coordination of Semantic Web agents, by offering the capability to make explicit that two knowledge bases belong to their respective agents, and by enabling the agents to establish semantic links to the KBs of other peers with the help of compatibility relations.

6 Conclusion

In this proposal I have presented an envisioned framework for representing context in the Semantic Web, as a way to make the currently available methods of knowledge representation in this area more precise, meaningful and sensible. I have illustrated the motivations, theoretical and practical, as well as the beneficial setting of the proposed work within a large European research project. We have already contributed to the community with a preliminary publication that received a positive response. Certainly, there are many open issues that will have to be explored. But we are establishing the relevant cooperations to provide for a high level of expertise in this area and to provide not only a theoretical contribution but also practical results, to illustrate how contextual knowledge representation in the Semantic Web can help to overcome a number of existing issues and limitations.
7 Acknowledgements

The work described in this paper has been partly funded by the European Commission through a grant to the project Vikef under the number IST-507173. I would like to thank Luciano Serafini and my advisor Prof. Paolo Bouquet for their continuous input and support regarding this topic, and for keeping me on track.

References

[1] Franz Baader, Peter Patel-Schneider, Diego Calvanese, Deborah L. McGuinness, and Daniele Nardi, editors. The Description Logic Handbook. Cambridge University Press, 2003.
[2] Massimo Benerecetti, Paolo Bouquet, and Chiara Ghidini. Contextual reasoning distilled. J. Exp. Theor. Artif. Intell., 12(3):279-305, 2000.
[3] Tim Berners-Lee, James A. Hendler, and Ora Lassila. The Semantic Web. Scientific American, May, 2001. http://www.sciam.com/2001/0501issue/0501berners-lee.html.
[4] Alexander Borgida and Luciano Serafini. Distributed description logics: Assimilating information from peer sources. In Stefano Spaccapietra, Salvatore T. March, and Karl Aberer, editors, J. Data Semantics I, volume 1 of Lecture Notes in Computer Science, pages 153-184. Springer, 2003.
[5] Paolo Bouquet, Fausto Giunchiglia, Frank van Harmelen, Luciano Serafini, and Heiner Stuckenschmidt. C-owl: Contextualizing ontologies. In Dieter Fensel, Katia P. Sycara, and John Mylopoulos, editors, International Semantic Web Conference, volume 2870 of Lecture Notes in Computer Science, pages 164-179. Springer, 2003.
[6] Paolo Bouquet, Luciano Serafini, and Heiko Stoermer. Introducing Context into RDF Knowledge Bases. In Proceedings of SWAP 2005, the 2nd Italian Semantic Web Workshop, Trento, Italy, December 14-16, 2005. CEUR Workshop Proceedings, ISSN 1613-0073, online http://ceur-ws.org/Vol-166/70.pdf, 2005.
[7] Jeremy Carroll, Christian Bizer, Patrick Hayes, and Patrick Stickler. Named Graphs, Provenance and Trust. In Proceedings of the Fourteenth International World Wide Web Conference (WWW2005), Chiba, Japan, volume 14, pages 613-622, May 2005.
[8] G. Criscuolo, F. Giunchiglia, and L. Serafini. A Foundation for Metareasoning, Part I: The proof theory. JLC, 12(1):167-208, 2002.
[9] G. Criscuolo, F. Giunchiglia, and L. Serafini. A Foundation for Metareasoning, Part II: The model theory. JLC, 12(3):345-370, 2002.
[10] Frank Manola and Eric Miller. RDF Primer, February 2004. http://www.w3.org/TR/rdf-primer/.
[11] Chiara Ghidini and Fausto Giunchiglia. Local models semantics, or contextual reasoning=locality+compatibility. Artif. Intell., 127(2):221-259, 2001.
[12] Ramanathan V. Guha, Rob McCool, and Richard Fikes. Contexts for the semantic web. In Sheila A. McIlraith, Dimitris Plexousakis, and Frank van Harmelen, editors, International Semantic Web Conference, volume 3298 of Lecture Notes in Computer Science, pages 32-46. Springer, 2004.
[13] Graham Klyne. Contexts for RDF Information Modelling. Content Technologies Ltd, October 2000. http://www.ninebynine.org/RDFNotes/RDFContexts.html.
[14] Graham Klyne. Circumstance, provenance and partial knowledge - Limiting the scope of RDF assertions, 2002. http://www.ninebynine.org/RDFNotes/UsingContextsWithRDF.html.
[15] Heiko Stoermer, Ignazio Palmisano, Domenico Redavid, Luigi Iannone, Paolo Bouquet, and Giovanni Semeraro. RDF and Contexts: Use of SPARQL and Named Graphs to Achieve Contextualization. In Proceedings of the First Jena User's Conference, Bristol, UK, 2006 (to appear).
[16] R.W. Weyhrauch. Prolegomena to a Theory of Mechanized Formal Reasoning. Artificial Intelligence, 13(1):133-176, 1980.