Towards Large S ale Reasoning on the Semanti Web Balázs Kádár, Gergely Luká sy and Péter Szeredi Budapest University of Te hnology and E onomi s Department of Computer S ien e and Information Theory 1117 Budapest, Magyar tudósok körútja 2., Hungary balazskadar.biz,{luka sy,szeredi} s.bme.hu Abstra t. Traditional algorithms for des ription logi (DL) instan e re- trieval are ine ient for large amounts of underlying data. As des ription logi is be oming popular in areas su h as the Semanti Web, it is very important to have systems that an reason e iently over large data sets. In this paper we present the DLog des ription logi reasoner spe i ally designed for su h s enarios. The DLog approa h transforms des ription logi axioms using the SHIQ DL language into a Prolog program. This transformation is done without any knowledge of the parti ular individuals: they are a essed dynami- ally during the normal Prolog exe ution of the generated program. This allows us to store the individuals in a database instead of memory, whi h results in better s alability and helps using des ription logi ontologies dire tly on top of existing information sour es. In this paper we fo us on the des ription of the DLog appli ation itself. We present the ar hite ture of DLog and des ribe its interfa es. These make it possible to use ABoxes stored in databases and to ommuni- ate with the Protégé ontology editor, as a server appli ation. We also evaluate the performan e of the DLog database extension. Keywords: large data sets, des ription logi , reasoning, logi program- ming, databases 1 Introdu tion Des ription Logi s (DLs) allow us to represent knowledge bases onsisting of terminologi al axioms (the TBox ) and assertional knowledge (the ABox ). Des ription Logi s are be oming widespread as more and more systems start using semanti s for various reasons. As an example, in the Semanti Web idea, DLs are intended to provide the mathemati al ba kground needed for more intel- ligent query answering. Here the knowledge is aptured in the form of expressive ontologies, des ribed in the Web Ontology Language (OWL) [1℄. This language is mostly based on the SHIQ des ription logi , and it is intended to be the standard knowledge representation format of the Web. However, we have tremendous amounts of information on the Web whi h alls for reasoners that are able to e iently handle su h abundan e of data. Moreover, as these data annot be stored dire tly in memory, we need solutions for querying des ription logi on epts in an environment where the ABox is stored in a database . We found that most existing des ription logi reasoners are not suitable for this task, as these are not apable of handling ABoxes stored externally. This is not a simple te hni al problem: most existing algorithms for querying DL on epts need to examine the whole ABox to answer a query. This results in s alability problems and undermines the point of using databases. Be ause of this, we started to investigate te hniques whi h allow the separation of the in- feren e algorithm from the data storage. We have developed a solution, where the inferen e algorithm is divided into two phases. First we reate a query-plan in Prolog from the a tual DL knowl- edge base, without a essing the underlying data set. Subsequently, this query- plan an be run on real data, to obtain the required results. The implementa- tion of these ideas is in orporated in the DLog reasoning system, available at http://dlog-reasoner.sour eforge.net. In this paper we fo us on the ar hite ture of the DLog system, as well as on its external interfa es. We dis uss the interfa e used for a essing databases, whi h allows des ription logi reasoning on top of existing information sour es. We also des ribe the Protégé [2℄ interfa e that makes it possible to use DLog as the ba k-end reasoner of this popular ontology editor. Details on the theoreti al side of DLog an be found in [3℄ and in [4℄. This paper is stru tured as follows. Se tion 2 summarises related work. In Se tion 3 we give a general introdu tion to the DLog approa h and present the ar hite ture and implementation details of the system. The database and Protégé interfa es are des ribed in Se tions 4 and 5, respe tively. Se tion 6 evaluates the performan e of the database extension of DLog w.r.t. the version whi h stores the ABox as Prolog fa ts. Finally, in Se tion 7, we on lude with the future work and the summary of our results. 2 Related work Several te hniques have emerged for dealing with ABox-reasoning. Tradi- tional ABox-reasoning is based on the tableau inferen e algorithm, whi h tries to build a model showing that a given on ept is satisable. To infer that an individual i is an instan e of a on ept C , an indire t assumption ¬C(i) is added to the ABox, and the tableau-algorithm is applied. If this reports in onsisten y, i is proved to be an instan e of C . The main drawba k of this approa h is that it annot be dire tly used for high volume instan e retrieval, be ause it would require he king all instan es in the ABox, one by one. To make tableau-based reasoning more e ient on large data sets, several te hniques have been developed in re ent years [5℄. These are used by the state- of-the-art DL reasoners, su h as Ra erPro [6℄ or Pellet [7℄. Extreme ases involve serious restri tions on the knowledge base to ensure e ient exe ution with large amounts of instan es. For example, [8℄ suggests a 2 solution alled the instan e store , where the ABox is stored externally, and is a essed in a very e ient way. The drawba k is that the ABox may ontain only axioms of form C(a), i.e. we annot make role assertions. Paper [9℄ dis usses how a rst order theorem prover su h as Vampire an be modied and optimised for reasoning over des ription logi knowledge bases. This work, however, mostly fo uses on TBox reasoning. In [10℄, a resolution-based inferen e algorithm is des ribed, whi h is not as sensitive to the in rease of the ABox size as the tableau-based methods. How- ever, this approa h still requires the input of the whole ontent of the ABox before attempting to answer any queries. The KAON2 system [11℄ implements this method and provides reasoning servi es over the des ription logi language SHIQ by transforming the knowledge base into a disjun tive datalog program. Although the motivation and goals of KAON2 are similar to ours, unlike KAON2 (1) we use a pure two-phase reasoning approa h (i.e. the ABox is a - essed only during query answering) and (2) we translate into Prolog whi h has well-established, e ient and robust implementations. Arti le [12℄ introdu es the term Des ription Logi Programming. This idea uses a dire t transformation of ALC des ription logi on epts into denite Horn- lauses, and poses some restri tions on the form of the knowledge base, whi h disallow axioms requiring disjun tive reasoning. As an extension, [13℄ introdu es a fragment of the SHIQ language that an be transformed into Horn- lauses. This work, however, still poses restri tions on the use of disjun tions. 3 The DLog system The main idea of the DLog approa h is that we transform a SHIQ knowl- edge base KB into rst-order lauses Ω( KB) and from these we generate Prolog ode [3℄. In ontrast with [11℄, all lauses ontaining fun tion symbols are elim- inated during the transformation: the resulting lauses an be resolved further only with ABox lauses. This forms the basis of a pure two phase reasoning framework, where every possible ABox-independent reasoning step is performed before a essing the ABox itself, allowing us to store the ontent of the ABox in an external database. A tually, in the general transformation, we use only ertain properties of Ω( KB). These properties are satised by a subset of rst order lauses that is, in fa t, larger than the set of lauses that an be generated from a SHIQ KB. We all these lauses DL lauses . As a onsequen e of this, our results an be used for DL knowledge bases that are more expressive than SHIQ. This in ludes the use of ertain role onstru tors, su h as union. Furthermore, some parts of the knowledge base an be supplied by the user dire tly in the form of rst order lauses. More details an be found in [3℄. As the lauses of a SHIQ knowledge base KB are normal rst-order lauses we an apply the Prolog Te hnology Theorem Proving (PTTP) te hnology [14℄ dire tly on these. In [3℄ we have simplied the PTTP te hniques for the spe ial 3 ase of DL lauses and we have proved that these modi ations are sound and omplete for DL lauses. The simplied PTTP te hniques used in DLog in lude deterministi an estor resolution and loop elimination . Both are appli able only to unary predi ates, i.e. predi ates orresponding to DL on epts. In the design of the DLog system we fo us on modularity. This enables us to easily implement new features and new interfa es. The top level ar hite ture of the system is shown in Figure 1. In this gure, as in subsequent gures of the paper, re tangles with rounded orners represent modules of the DLog system, while data are shown as plain re tangles. In Figure 1 the DLog reasoner is shown within a dashed re tangle. DLog reasoner Knowledge Base Manager First phase: translation PSfrag repla ements Client Se ond phase: exe ution Support modules Fig. 1. The top level ar hite ture of the DLog system. The user (either lo al or remote) a esses DLog through one of the external interfa es. These interfa es range from a lo al onsole to server interfa es like DIG used by the Protégé ontology editor. The knowledge base manager is the entral pie e of the system. It oordinates the tasks of the other modules, and performs the administration of multiple on urrent knowledge bases. It forwards the request arriving from the interfa es to the reasoner modules. The support modules onsist of several tools that are used by most parts of the system. They in lude a onguration manager module, a logger, an XML reader, a run-time system for the se ond phase, and several portability tools that allow DLog to run under dierent Prolog implementations ( urrently SWI and SICStus). The rst phase, translation, shown in Figure 2, takes a set of des ription logi axioms as input. These axioms are divided into two parts: the TBox or terminology box stores on ept and role in lusion axioms, while the ABox or assertion box ontains the fa tual data. The ABox may be stored (partly or 4 ompletely) in external databases. The ABox is pro essed rst, produ ing the ABox ode (whi h is a Prolog module), and the ABox signature, whi h is required for translating the TBox. The generation of ABox ode in ludes optimisations su h as indexing on se ond argument for roles stored in memory. Next, the TBox is pro essed in two steps. First the DL translator module transforms the des ription logi formulae to a set of DL lauses [15℄, whi h are passed on to the TBox translator module that generates the exe utable TBox ode . This generated ode is equivalent, with respe t to instan e retrieval, to the input DL knowledge base. The TBox translator module uses various optimisa- PSfrag repla tions [3℄ ements to obtain more e ient Prolog programs. The ABox and TBox ode an be generated dire tly into memory or may be saved to disk for later (standalone) use. ABox ABox translator ABox ode ABox signature DL TBox TBox translator translator TBox ode Fig. 2. The rst phase: translation. The se ond phase, exe ution, shown in Figure 3, uses the ABox and TBox programs generated in the rst phase, to answer queries. There are two ways to exe ute queries: the generated TBox an be alled dire tly from Prolog as a low-level interfa e, or the Query module provides a high-level interfa e that provides basi support for omposite queries and an aggregate the results. In normal operation the query module is alled by the knowledge base manager, whi h forwards the results to the user interfa e. As the query module does not depend on the rest of the system, it may be used in standalone operation. The run-time system (shown as RTS in the gure) in ludes a hash table implemented in C used to speed up the reasoning, and optional olle tion of statisti s. PSfrag repla ements ABox ode Queries Query RTS TBox ode module Results Fig. 3. The se ond phase: exe ution. 5 4 Integrating DLog with databases As the rst phase of reasoning (i.e. the generation of a query plan) only depends on the signature of the data set, and be ause of the top-down inferen e of Prolog, DLog an e iently use databases to store the ABox. There may be several advantages in using databases to store the ABox. Firstly, this allows reasoning on data sets that annot t into memory. Se ondly, it makes integrating DLog with existing systems easier, as the reasoner an use the existing databases of other appli ations. Thirdly, querying some on epts (namely those orresponding to so- alled query predi ates ) may be performed using omplex database queries, rather than DL reasoning, whi h is expe ted to deliver a marked in rease in performan e. A predi ate is a query predi ate [3℄, if it is non-re ursive, it does not invoke its negation, and is not invoked from within its negation. Here, a predi ate P0 is said to invoke a predi ate Pn , n ≥ 1, if there are n − 1 intermediate predi ates P1 . . . Pn−1 , su h that Pi is dire tly invoked by Pi−1 , i.e. it o urs in a lause body the head of whi h is Pi−1 , for i = 1, . . . , n. Query predi ates require neither loop elimination, nor an estor resolution during exe ution. The name query predi ate ree ts that fa t that su h predi- ates an be transformed to omplex database queries (provided that all on epts and roles required are stored in a single database). This an in rease the per- forman e as the database engine an optimise the query using statisti al and stru tural knowledge of the database in question. We designed the database interfa e to be as simple as possible. The databases are a essed via the ODBC driver of SWI-Prolog; as a onsequen e DLog an interfa e with most modern database systems. We wanted a way to spe ify database a ess using existing tools and interfa es  su h as Protégé and the DIG interfa e it utilises  even if those do not, at the moment, provide a way to spe ify database usage. To a ess a database, several pie es of information are needed: the name of the database, a user name, a password, a des ription of whi h table to use for given on epts and roles, et . Be ause of the aforementioned re- quirements we de ided to use ABox assertions to arry this meta-information. ABox assertions are des ription logi onstru ts that are readily available in DL systems and interfa es, su h as OWL and DIG. In order to spe ify the database a ess for on epts and roles we introdu e new roles (obje t properties), attributes (datatype properties) and individuals dened in the namespa e http://www. s.bme.hu/dlogDB. The ODBC interfa e pres ribes that database onne tions are to be iden- tied by a Data Sour e Name (DSN). In DLog we introdu e an individual to represent a given database onne tion. Roles and on epts are also represented by individuals. An arbitrary name an be used for su h an individual. The meta data provided is used to onne t to the database, and, for ea h on ept and role, an additional lause is generated, whi h, by exe uting an ap- propriate database query, lists appropriate individuals (or pairs of individuals). This allows on epts and roles to be stored partially in databases and partially in memory. This may be very useful when developing ontologies. 6 4.1 Spe ifying the Database Interfa e Database onne tions are represented by individuals that have the string at- tribute hasDSN dened. The value of this attribute is the name of the data sour e (DSN). As all other names in this se tion, this name is dened in the namespa e http://www. s.bme.hu/dlogDB. Additional string attributes, namely hasUserName and hasPassword, may be used to spe ify the user name and the password for the given onne tion, if required. The obje t property hasConne tion links an individual representing a role or a on ept with the database onne tion to be used for a essing it. This makes it possible to use one data sour e for one on ept, and a dierent one for another. The instan e on the left hand side is the individual representing the role or on ept, while the instan e on the right hand side is the individual representing the onne tion. Two methods are provided to spe ify how to get the data from the database. One is to spe ify a query that is to be dire tly exe uted on the database. This method, named the simple interfa e, is provided be ause of its simpli ity: it an be applied to databases without any modi ation. However it has two drawba ks:  it makes transforming query predi ates to database queries very di ult; and  it performs badly for instan e he k queries. The latter is a large setba k as most of the queries are instan e he ks, assuming the the proje tion optimisation of [3℄ is used. Therefore the se ond, preferred, way is to provide the name of a table or of a view and the name of the olumn(s) of this table. This approa h, alled the omplex interfa e may require the reation of new views in the database, but provides mu h greater exibility and better performan e. The SQL query in the simple interfa e is dened using the string attribute hasQuery. The individual represents the role or on ept and the attribute value is the query string. For individuals representing roles the query must return two olumns, and for those used for on epts it must return one olumn that ontains the individual name. If the omplex interfa e is used, the name of the table or view to use is spe ied by the string attribute hasTable. The name of the olumn listing the individuals of a on ept is given using the string attribute hasColumn. For roles, the attributes hasLHS and hasRHS are used for the left and the right hand side, respe tively. Be ause, in Protégé, individuals annot be spe ied as instan es of a negated on ept, we provide some additional attributes: hasNegQuery, hasNegTable and hasNegColumn. These are used to spe ify the database a ess of negated on- epts, in a way similar to their respe tive positive pairs. By providing an attribute hasNegQuery for a name representing the on ept C we spe ify a query listing the individuals of ¬C . Obviously, both hasQuery and hasNegQuery an appear as attributes of the same individual. 7 To spe ify that the individual on ept represents the on ept C , one simply has to make on ept an instan e of C . The DLog system will he k ea h on ept o urring in the ABox if it ontains an instan e whi h is in the namespa e http://www. s.bme.hu/dlogDB. If su h an instan e is found, it is interpreted as a handle to a database whi h is to produ e (additional) instan es for the given on ept. Similarly, to spe ify that an individual role represents the role R, we require that the user in ludes the triple {role, R, indiv} in the ABox. Here indiv is an arbitrary individual. Again DLog will look for an instan e in the namespa e http://www. s.bme.hu/dlogDB within the domain (i.e. the left hand side) of ea h role, and use it to onstru t a database a ess for the given role. The database interfa e is urrently in the alpha test phase. We believe that our approa h for this task, dis ussed above, is an intermediate solution. Ulti- mately the standard interfa es, su h as DIG, should be extended to allow storing (parts of ) the ABox in databases. However, we hope that our work ontributes to implementing this ultimate goal. 4.2 Examples of Using the Database Interfa e We now present two examples for interfa ing with databases, one for the simple, and one for the omplex interfa e. The examples ontain ABox assertions, whi h are displayed as RDF triples in {subje t, predi ate, obje t} format. String values are shown between quotes. The namespa e http://www. s.bme.hu/dlogDB# is represented by the dlog: prex. Figure 4 shows the use of the simplied interfa e for the ABox of the Io aste example. This lassi al example involves the on ept des ribing a person hav- ing a patri ide hild, who, in turn, has a non-patri ide hild. The ABox axioms, whi h are now to be stored in a database, des ribe the hasChild relation between ontaining (Io aste, Oedipus), (Io aste, pairs of individuals (traditionally Polyneikes), (Oedipus, Polyneikes) and (Polyneikes, Thersandros)). The ABox also spe ies whi h individuals are patri ide and whi h are non-patri ide (traditionally Oedipus is known to belong to the former, while Thersandros to the latter). We have hosen the namespa e represented by the io: prex for the names in this ontology. The database onne tion is named iodb, and the orresponding DSN is spe ied as "io aste" (line 1). This onne tion is a essed without spe ifying a user name or a password. A ordingly, iodb has no attributes other than dlog:hasDSN. Both the role hasChild and the on ept Patri ide are taken from this database. The role hasChild is represented by the instan e dlog:riohasChild. We hose this name as a mnemoni for a r ole from the namespa e io , alled hasChild , but any other name ould have been used. Line 2 tells the system that this individual represents the role io:hasChild. Here, the right hand side of the role is of no interest, so we hose to have the same individual as on the left hand side. Line 6 tells that the individual dlog: ioPatri ide is an instan e of 8 1 {dlog:iodb, dlog:hasDSN, "io aste"} 2 {dlog:riohasChild, io:hasChild, dlog:riohasChild} 3 {dlog:riohasChild, dlog:hasConne tion, dlog:iodb} 4 {dlog:riohasChild, dlog:hasQuery, 5 "SELECT parent, hild FROM hasChild"} 6 {dlog: ioPatri ide, rdf:type, io:Patri ide} 7 {dlog: ioPatri ide, dlog:hasConne tion, dlog:iodb} 8 {dlog: ioPatri ide, dlog:hasQuery, 9 "SELECT name FROM people WHERE patri ide"} 10 {dlog: ioPatri ide, dlog:hasNegQuery, 11 "SELECT name FROM people WHERE NOT patri ide"} Fig. 4. An example of the simplied database interfa e. the ide1. This individual, whi h thus represents the on ept on ept io:Patri io:Patri ide, has two queries asso iated with it: one for io:Patri ide (line 8) and one for its negation (line 10). The simplied interfa e allows omplex queries, su h as the one for Patri ide whi h has a WHERE lause. This way the existing table people an be used without modi ation. However, this approa h makes it very di ult to transform any possible query predi ates in the TBox to dire t database queries, and instan e he k queries run with a poor performan e. We now present a se ond example. The TBox of this example, taken from [4℄, is shown below. 1 ∃hasFriend. Al oholi ⊑ ¬Al oholi 2 ∃hasParent. ¬Al oholi ⊑ ¬Al oholi Line 1 des ribes that those who have a friend who is al oholi are non-al oholi (as they see a bad example), while line 2 states that those who have a non- al oholi parent are non-al oholi (as they see a good example). In the lassi form the ABox ontains role assertions for the hasParent and hasFriend re- lations only, and no on ept assertions about anyone being al oholi or non- al oholi . In spite of this, in the presen e of ertain role instan e patterns, one an infer some people to be non-al oholi , using ase analysis. For example, onsider the following pattern: Ja k is Joe's parent and also his friend. Now, if we assume that Ja k is al oholi , then the axiom in line 1 implies that Joe is not al oholi . On the other hand, if Ja k is not al oholi , it follows from line 2 that Joe is not al oholi , either. Thus these two role assertions imply that Joe has to be non-al oholi . Other patterns, where Joe an be inferred to be non-al oholi , are the following: Joe is a friend of himself; Joe is a friend of an an estor; and Joe's two an estors are in the hasFriend relationship. 1 Note that the prex rdf, used in the predi ate position of the triple in line 6, refers to the RDF namespa e: http://www.w3.org/1999/02/22-rdf-syntax-ns#. 9 In Figure 5 we present a database a ess spe i ation for the above example, using the omplex interfa e. Here, the database al oholi is a essed with the user name "drunkard" and the password "palinka" (lines 13). We assume that a new view, alled "hasParentView", was dened in the database to hide the omplex query for the role hasParent, f. lines 46. The olumns of this view, hild and parent (lines 78), ontain the data for the role hasParent. From this information DLog an reate a query for instan e retrieval ("SELECT hild, parent FROM hasParentView"), and three other query patterns for the ases when at least one of the individuals is known (e.g. "SELECT hild FROM hasParentView WHERE parent = ?"). This approa h allows for the generation of omplex database queries for the query predi ates. 1 {dlog:al db, dlog:hasDSN, "al oholi "} 2 {dlog:al db, dlog:hasUserName, "drunkard"} 3 {dlog:al db, dlog:hasPassword, "palinka"} 4 {dlog:ral hasParent, al :hasParent, dlog:ral hasParent} 5 {dlog:ral hasParent, dlog:hasConne tion, dlog:al db} 6 {dlog:ral hasParent, dlog:hasTable, "hasParentView"} 7 {dlog:ral hasParent, dlog:hasLHS, " hild"} 8 {dlog:ral hasParent, dlog:hasRHS, "parent"} 9 {dlog:ral hasFriend, al :hasFriend, dlog:ral hasFriend} 10 {dlog:ral hasFriend, dlog:hasConne tion, dlog:al db} 11 {dlog:ral hasFriend, dlog:hasTable, "friends"} 12 {dlog:ral hasFriend, dlog:hasLHS, "friend1"} 13 {dlog:ral hasFriend, dlog:hasRHS, "friend2"} 14 {dlog: al Al oholi , rdf:type, al :Al oholi } 15 {dlog: al Al oholi , dlog:hasConne tion, dlog:al db} 16 {dlog: al Al oholi , dlog:hasTable, "al oholi View"} 17 {dlog: al Al oholi , dlog:hasColumn, "name"} 18 {dlog: al Al oholi , dlog:hasNegTable, "nonal oholi View"} 19 {dlog: al Al oholi , dlog:hasNegColumn, "name"} Fig. 5. An example of the omplex database interfa e. In Figure 5, lines 1013 spe ify the database a ess for the role hasFriend, while lines 1419 allow for a essing individuals belonging to the on ept al oholi and its negation through appropriate database views. 5 Integrating DLog with Protégé Protégé [2℄ is an open sour e ontology editor that supports the Web Ontology Language (OWL) [1℄, and an onne t to reasoners via the HTTP-based DIG interfa e [16℄. The DLog server implements the DIG interfa e and an be used to exe ute instan e retrieval queries issued from the graphi al interfa e of Protégé. 10 The DIG interfa e spe ies ommuni ation via HTTP, and uses XML data format. For the implementation we used the HTTP server provided with SWI- Prolog. In implementing the interfa e we fa ed di ulties aused by some am- biguities of the DIG spe i ations, despite there being an (exa t) XML s hema denition. Another di ulty was that Protégé does not stri tly follow the def- inition of the interfa e. For example it uses a learKB ommand that is not even dened in version 1.1 of DIG. In DIG 1.0, whi h supported only a single database, this ommand was dened, but Protégé uses the new version that sup- ports multiple on urrent knowledge bases. We strove for an implementation as generi and omplying to the interfa e denition as possible while, also being ompatible with Protégé. For parsing XML we use the SGML module of SWI-Prolog, whi h an be operated in an XML ompatibility mode, allowing namespa es. As this is not a dire t XML parser, it has some di ulties when used in XML mode. For example even with the stri test settings and treating all warnings as errors, it a epts input les that are not even well-formed XML. Be ause of this, and in hope of better performan e, we are planning to swit h to Apa he Xer es-C++. With Xer es we plan to use SAX parsing, instead of DOM, with the hope of lower memory usage and faster parsing. The data are extra ted from the XML DOM using Denite Clause Grammars (DCG). Figure 6 shows the results of a query issued from Protégé, as answered by the DLog server. Fig. 6. S reenshot of query results in Protégé answered by DLog. 11 The integration of Protégé and the database interfa e is in progress. A serious di ulty is that if the results of a query ontain individuals that are not dened in Protégé (i.e. individuals present only in databases) Protégé silently drops these individuals from the list of query results. 6 Evaluation This se tion ontains a preliminary performan e test of the database inter- fa e. We tried the database interfa e on a large version of the Io aste problem whi h ontains 5058 pairs in the hasChild relation, 855 instan es that are known to be patri ide, and 314 that are known to be non-patri ide. The exe ution results are summarised in Table 1. The load time means the time it takes to load the le whi h ontains the axioms, in luding the XML parsing. The translation time is the time it takes to generate the TBox and ABox ode from the axioms, while exe ution time is the run-time of the query. Table 1. Comparing the in-memory and database version of a large Io aste test. (se onds) load translate exe ute total in-memory 0.88 0.53 0.02 1.43 database 0.05 0.02 0.36 0.43 When the ABox is stored in memory, the translation takes 1.41 se onds, and the exe ution takes only 0.02 se onds. Note that these gures were obtained with the indexing optimisation turned o. When this optimisation is turned on, the number of generated ABox lauses is doubled, and translation time in reases a ordingly. The database variant of the example enumerates all the instan es of the queried on ept in 0.36 se onds. This, ompared to the original 0.02 se onds is mu h slower. However, the time we spent at ompile-time was altogether 0.07 se onds, resulting in a total exe ution time of 0.43 se onds. To sum up, in terms of total query exe ution time, more than a three-fold de rease was a hieved, using the database interfa e. From the above data it may seem that using a database for storing the ABox, whi h ts into memory, is bene ial only be ause of the redu ed ompile-time. However, we believe that in the ase of large data sets and omplex queries (espe ially if these ontain on epts giving rise to query predi ates) exe ution time an also be better than that of the in-memory variant. Detailed evaluation of the DLog System an be found in [3℄. 12 7 Summary and future work In this paper we have shown the ar hite ture of the DLog system, dis ussed a database interfa e for representing large ABoxes, and reported on the integration of DLog with the Protégé ontology editor. The database interfa e is espe ially useful if the data set annot t in memory or if it is shared with other systems. Using databases an greatly redu e ompile time and, with advan ed optimisations, it may provide e ien y similar to that of the in-memory version. Future improvements in lude the optimisation of query predi ates, by trans- forming them to database queries, and better integration of Protégé and the database interfa e. Our plans also in lude the implementation of a query mod- ule to handle omposite queries, and the support for additional interfa e formats, su h as OWL, or the KRSS notation used by e.g. the Ra erPro engine. A knowledgements The authors are grateful to the anonymous reviewers for their omments on the earlier version of the paper, and espe ially for re ommending the Billion Triples Challenge for evaluation. Referen es 1. Be hhofer, S.: OWL web ontology language referen e. W3C re ommendation (February 2004) 2. Noy, N., Fergerson, R., Musen, M.: The knowledge model of Protege-2000: Combining interoperability and exibility. http:// iteseer.nj.ne . om/noy01knowledge.html (2000) 3. Luká sy, G., Szeredi, P.: E ient des ription logi reasoning in Prolog: the DLog system. Te hni al report, Budapest University of Te hnology and E onomi s (Jan- uary 2008) Conditionally a epted for publi ation in Theory and Pra ti e of Logi Programming. 4. Luká sy, G., Szeredi, P., Kádár, B.: Prolog based des ription logi reasoning. (De ember 2008) To appear in ICLP 2008. 5. Haarslev, V., Möller, R.: Optimization te hniques for retrieving resour es des ribed in OWL/RDF do uments: First results. In: Ninth International Conferen e on the Prin iples of Knowledge Representation and Reasoning, KR 2004, Whistler, BC, Canada, June 2-5. (2004) 163173 6. Haarslev, V., Möller, R., van der Straeten, R., Wessel, M.: Extended Query Fa- ilities for Ra er and an Appli ation to Software-Engineering Problems. In: Pro- eedings of the 2004 International Workshop on Des ription Logi s (DL-2004), Whistler, BC, Canada, June 6-8. (2004) 148157 7. Sirin, E., Parsia, B., Grau, B.C., Kalyanpur, A., Katz, Y.: Pellet: A pra ti al OWL-DL reasoner. Web Semant. 5(2) (2007) 5153 8. Horro ks, I., Li, L., Turi, D., Be hhofer, S.: The Instan e Store: DL reasoning with large numbers of individuals. In: Pro eedings of DL2004, British Columbia, Canada. (2004) 13 9. Horro ks, I., Voronkov, A.: Reasoning support for expressive ontology languages using a theorem prover. In: FoIKS. Volume 3861 of Le ture Notes in Computer S ien e., Springer (2006) 201218 10. Hustadt, U., Motik, B., Sattler, U.: Reasoning for Des ription Logi s around SHIQ in a resolution framework. Te hni al report, FZI, Karlsruhe (2004) 11. Motik, B.: Reasoning in Des ription Logi s using Resolution and Dedu tive Databases. PhD thesis, Univesität Karlsruhe (TH), Karlsruhe, Germany (January 2006) 12. Grosof, B.N., Horro ks, I., Volz, R., De ker, S.: Des ription logi programs: Com- bining logi programs with des ription logi . In: Pro . of the Twelfth International World Wide Web Conferen e (WWW 2003), ACM (2003) 4857 13. Hustadt, U., Motik, B., Sattler, U.: Data omplexity of reasoning in very expressive des ription logi s. In: Pro eedings of the Nineteenth International Joint Conferen e on Arti ial Intelligen e (IJCAI 2005), International Joint Conferen es on Arti ial Intelligen e (2005) 466471 14. Sti kel, M.E.: A Prolog te hnology theorem prover: a new exposition and imple- mentation in Prolog. Theoreti al Computer S ien e 104(1) (1992) 109128 15. Zombori, Zs.: E ient two-phase data reasoning for des ription logi s. In: Pro- eedings of the International Federation for Information Pro essing Te hni al Com- mittee on Arti ial Intelligen e (TC12), Milan, Italy (September 2008) A epted onferen e paper. 16. Be hhofer, S.: The DIG des ription logi interfa e. http://dig. s.man hester.a .uk/ (2006) 14