<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ODP Topic Hierarchy Statistics
Hierarchy Max.Depth Avg.Depth Max.Subclass Avg.Subclass #Topics #Resources
Total</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Web service description language www</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>.ibm.com. /developerworks/library/ws-rdf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computer Science</institution>
          ,
          <addr-line>FORTH, Vassilika Vouton, P.O.Box 1385 GR 711 10, Heraklion, Greece [alexaki, christop, gregkar, dp] @ics.forth.gr</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Johann Wolfgang Goethe-University</institution>
          ,
          <addr-line>Robert-Mayer-Str. 11-15 P.O.Box 11 19 32, D-60054 Frankfurt/Main</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1999</year>
      </pub-date>
      <volume>13</volume>
      <issue>6</issue>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>and sketch query evaluation on top of RSSDB.
storage and querying. Specically , we introduce a formal
whilst it outperforms, both in storage volumes and query
vices in various user communities, lead nowadays to large
facilitating the creation and exchange of metadata as any
tiple schemas. Next, we present the design of a persistent
net. The Resource Description Framework (RDF) aims at
form of triples. Last, we briey presen t RQL, a declarative
this paper, we advocate the use of database technology to
edge. Our approach preserves the exibilit y of RDF in
revolumes of RDF metadata. Managing such RDF resource
support declarative access, as well as, logical and physical
descriptions and schemas with existing low-level APIs and
data model for RDF description bases created using
mulMetadata are widely used in order to fully exploit
informaindependence for voluminous RDF description bases.
mation resources and the proliferation of description
serexecution time, other approaches using a monolithic table
tion resources available on corporate intranets or the
Interother Web data. The growing number of available
inforand easy maintenance of real-scale RDF applications. In
ORDBMS by exploring the available RDF schema
knowlRDF Store (RSSDB) for loading resource descriptions in an
We present RDFSuite, a suite of tools for RDF validation,
ning schemas and/or enriching descriptions at any time,
le-based implemen tations does not ensure fast deployment
language for querying both RDF descriptions and schemas,
to represent resource descriptions and schemas under the
1. INTRODUCTION
able information resources and the proliferation of
descripable and machine understandable. Due to its exible model,
HTML pages or XML data) to dieren t target communities
[5], the Dublin Core [25], or the Web Service Description
RDF is playing a central role in the next evolution step of
vocabularies can be easily extended to meet the description
from PDF or Word documents, e-mail or audio/video les to
RDF, as well as, emerging application standards for Web
tion services in various user communities, lead nowadays to
graphs in which nodes are called resources (or literals) and
of schemas [6] i.e., vocabularies of labels for graph nodes
nels, etc.) about resources of quite diverse nature (ranging
(i.e., classes) and edges (i.e., properties). Furthermore, these
needs of specic (sub-)comm unities while preserving the
aumation resources (e.g., sites, documents, data, images, etc.)
of application contexts. To interpret resource descriptions
syndication - and hence, automated processing - in a variety
the provision of various kinds of metadata (for
administrapressing metadata in a form that is both humanly
readavailable on corporate intranets or the Internet. The
ReLanguage [26]). In a nutshell, the growing number of
availlarge volumes of RDF metadata (e.g., the Open Directory
the Web - termed the Semantic Web. Indeed, RDF/S enable
tonomy of description services for each (sub-)community.
data and services syndication (e.g., the RDF Site Summary
within or across communities, RDF allows for the denition
and Web Portals (e.g., Open Directory, CNET, XMLTree1)
Metadata are widely used in order to fully exploit
infordata. More precisely, RDF provides i) a Standard
Repre(corporate, inter-enterprise, e-marketplace, etc.). The most
descriptions for the same Web resources, enabling content
tion, recommendation, content rating, site maps, push
chansentation Language for metadata based on directed labeled
edges are called properties and ii) an XML syntax for
exMany content providers (e.g., ABCNews, CNN, Time Inc.)
ing the creation and exchange of metadata as any other Web
distinctive RDF feature is its ability to suport superimposed
source Description Framework (RDF) [17] aims at
facilitator browsers (e.g., Netscape 6.0, W3C Amaya) already adopt</p>
      <p>Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission by the authors.</p>
      <p>Semantic Web Workshop 2001 Hongkong, China
Copyright by the authors.
ture the semantics of web resources, while the second is
inious information resources. Portals may be distinguished
all cases, the key Portal component is the Knowledge
Caterarchies of topics to semantically classify Web resources.
we will hereforth omit the namespaces as well as the paths
that ODP hierarchies are not deep (the maximum depth is
enabling the development and maintenance of specic
comtals, like Yahoo! or Open Directory (ODP), use huge
hi13) while the number of direct subclasses of a class varies
topics). The former, is represented using the RDF subclass
query language can be used to provide declarative access to
tions is determined by the corresponding namespace
deninon-unique names) prexing class and propert y names.
Figclassify and access, in a semantically meaningful way,
varPortals are nowadays becoming increasingly popular by
[12] on corporate intranets or the Internet. Such
Commuaccording to the breadth of the target community (e.g.,
tardescribed resources. In Section 5 we will illustrate how our
Specically , Table 1 lists statistics of 15 (out of 16) ODP
tal Catalogs can be easily and eectiv elly represented using
hierarchies (version of 16-01-2001), comprising 252840
toption of each schema, e.g., ns1 (www.dmoz.org/topics.rdfs)
from the root of the topic hierarchies (since topics have
tended for Portal administrators. The scope of the
declaraure 2 depicts 2 (out of the 16) hierarchies in the ODP topics
RDF/S.
hierarchy (e.g., subtopics) or across hierarchies (e.g., related
nity Web Portals essentially provide the means to select,
For instance, the catalogs of Internet (or horizontal)
Porics of the catalog to locate resources of interest, or issue a
of resources are usually created using an OCLC Dublin-Core
alog holding descriptive information, i.e., metadata, about
relationships exist between these classes either within a topic
The middle part of Figure 2 depicts the two schemas
emaverage is around 4.02). Additionally, various
administranamespaces in the upper part of Figure 2). Various semantic
geting horizontal or vertical markets), and the complexity
the catalog content. In the sequel, we illustrate how
Porployed by the Open Directory: the rst is in tended to
capand ns2 (www.oclc.com/dublincore.rdfs). For simplicity,
like schema [25]. Users can either navigate through the
topgreatly (the maximum number of subclasses is 314 but the
of information resources (e.g., sites, documents, data). In
represented as RDF classes (see the rdf and rdfs default
the community resources.
full-text query on topic names and the URIs or the titles of
tive metadata (e.g., titles, mime-types, modication dates)
schema, namely, Regional and Recreation, whose topics are
ics under which 1770781 sites are classied. We can observe
munities of interest (e.g., enterprise, professional, trading)
Portal of Netscape exports in RDF around 170M of Subject
storing and querying voluminous RDF descriptions,
the Open Directory Portal exported in RDF (see Section 2).
the manipulated desription bases and the underlying
time other approaches using a monolithic table [20,
In this paper we present ICS-FORTH RDFSuite [14], a suite
under the form of triples.</p>
      <p>Section 3 introduces a formal data model for
descripto benet from three decades of researc h in database
techresentation of several interconnected RDF schemas, as
well as the introduction of a graph instantiation
mechfor loading resource descriptions in an object-relational
DBMS (ORDBMS) by exploiting the available RDF
Section 6 illustrates the performance of RSSDB for
performs both in storage volumes and query execution
Finally, Section 7 presents our conclusions and draws
directions for further research.</p>
      <p>The paper makes the following contributions:
descriptions at any time whilst it can be customized
schemas with existing low-level APIs and le-based
impleanism permitting multiple classication of resources.</p>
      <p>Section 5 sketches the functionality of a declarative
ing (RDF Query Language-RQL) using an object-relational
performance of RDFSuite we use as testbed the catalog of
18, 7] to represent resource descriptions and schemas
RDF application scenario.
ical independence for RDF description bases. In this way,
such as the ODP catalog. In particular, RSSDB
outof tools for RDF validation (Validating RDF Parser-VRP),
language, called RQL, for querying both RDF
demanaging such voluminous RDF resource descriptions and
tion bases created according to the RDF Model &amp;
SynSection 4 presents our persistent RDF Store (RSSDB)
DBMS (see Figure 1). To illustrate the functionality and the
as much as possible to the underlying ORDBMS.
in several ways according to the specicities of both
tion). The main challenge for this model is the
repof determining how to eÆciently store or access them to the
RDF applications have to specify in a high-level language
maintenance of real-scale RDF applications. Still, we want
storage (RDF Schema Specic DataBase -RSSDB), and
queryibility of RDF in rening schemas and/or enriching
schema knowledge. Our approach preserves the
exmentations [22] does not ensure fast deployment and easy
of RQL on top of RSSDB is described with emphasis
tax and Schema specications [17, 6] (without
reicanology to support declarative access and logical and
physscriptions and related schemas. The implementation
on how the RQL interpreter pushes query evaluation
underlying RDF database engine.</p>
      <p>Topics and 700M of indexed URIs). It becomes evident that
only which resources need to be accessed, leaving the task
2. THE OPEN DIRECTORY CATALOG
can be inherited (e.g., to subclasses of ExtResource), while
prove storage volumes and optimize queries on these graphs.
proposed models for semistructured or XML databases [1]:
tional (e.g., the property le size is not used), can be
multiresent superimposed descriptions of community resources,
ily pairwise related by specialization: the instances of
of XML trees (i.e., there is no root XML node).
tion base through one or the union of all related schemas. It
high-level queries on RDF/S graphs, while several physical
Note also that due to multiple classication, resources
may have quite irregular descriptions modeled only through
Hence, RDF properties are unordered (e.g., the property
title). Therefore existing proposals cannot be used, as
an exception mechanism a la SGML [15]. Last but not
such, to manipulate RDF description bases. In the sequel,
Classes do not dene object or relation types: an
ina class may have quite dieren t properties associated
we will present a logical data model allowing users to issue
least, semistructured or XML models can’t distinguish
beunion of these properties is dened;
representations can be used by the underlying DBMS to
imResources may belong to dier ent classes not
necessarstance of a class is just a resource URI;
with them, while there is no other class on which the
title can appear before or after the property description),
opwhile preserving a conceptually unied view of the
descripfrom standard object or relational models [3] or the recently
set of constraints (domain and range compatibility).
resources can be multiply classied (e.g., &amp;r4). These
modeling primitives provide all the exibilit y we need to
repbecomes clear that the RDF data model diers substan tially
valued (e.g., we have two description properties), and they
tween entity (e.g., ExtResource) and property labels (e.g.,
Properties may also be rene d by respecting a minimal
s
n
o
i
t
p
i
r
c
s
e
d
e
c
r
u
o
s
e
R
and two description properties with values \OÆcial site of
(i.e., reic Finally, both RDF graph schemas and ation2).</p>
      <p>RDF statements. Although not illustrated in Figure 2, RDF
literal properties: a title property with value \Disneyland"
(e.g., title), and an object (e.g., \Disneyland "). The
subFinally, the ODP administrative metadata schema captures
&amp;r4). For instance, &amp;r4 is a resource classied under both
value (i.e., node) form a statement. Each statement is
repdescriptions can be serialized in XML using various forests
Using these schemas, we can see in the lower part of
Fignode) together with a named property (i.e., edge) and its
also supports structured values called containers (e.g., Bags)
specialization) with the domain and range of the predicate
class of Paris) and the latter using an RDF property named
respectively. In the RDF jargon, a specic resource (i.e.,
resented by a triple having a subject (e.g., &amp;r4), a predicate
various descriptive elements of Dublin-Core [25] (with the
related (e.g., between the classes Ile-de-France and Hotel).
the classes Paris and ExtResource and has three associated
ject and object should be of a class compatible (under class
ure 2, the descriptions created for four sites (resources
&amp;r1(e.g., &amp;r4 is of type ExtResource). In the rest of the
pato represent attributes of resources as well as relationships
between resources. Properties can also be organized in
taxfor grouping statements, as well as, higher-order statements
execption of the Subject element), as literal RDF properties
dened on class ExtResource. Note that properties serve
relationship (e.g., Travel is a, not necessarily direct,
subper, the term description base will be used to denote a set of
Disneyland Paris" and \Site oÆciel de Disneyland Paris"
onomies in a manner similar to the organization of classes.
5.3
5.56
4.8
5.5
5
5.4
5.9
6.53
5.3
6.3
5
5.87
7.13
4.8
7.7
3.1 Additional RDF Constraints
3. A FORMAL DATA MODEL FOR RDF
domain(p) 2 C and range(p) 2 C [ L. We denote by H =
using class names or literal types so that: for each p 2 P ,
and domain(p2) range(p1) range(p2)3.
and property names P P whose scope is determined by
(N; ) a hierarchy of class and property names, where N =
such that: if 2 P and then p1; p2 p1 p2, domain(p1)
Each RDF schema uses a nite set of class names C C
one or more namespaces. Property types are then dened
C [ P . H is well-formed if is a smallest partial ordering
3.2 RDF Typing System
relation is dened in RDF/S (e.g., bet ween bags of dieren t
while properties as binary relations of type f[ U ; U ]g (for
types). The set of all type names is denoted by T .
meaningless). Furthermore, all types are mutually exclusive
Sequence type, (:) is the Alternative type, and U is the type
relationships) or f[ U ; (for attributes). Furthermore, L]g
for resource U Alternatives capture the semantics of RIs4.
2 + : : : + Unlike traditional object data models, RDF n)].
(e.g., a literal value cannot also be a bag) and no subtyping
The proposed type system oers all the arsenal w e need
and alternatives, labels can be omitted (for bags, labels are
tions (e.g., as a bag of sequences). Finally, assignment of a
can be dened as heterogeneous of type [( 1 + sequences5
ples denoted by : : : ; (where is of some type [v1; v2; vn] vi i)
union (or variant) types [8], and they are also ordered (i.e.,
classes and properties. For instance, unnamed ordered
tuclasses are then represented as unary relations of type f U g
RDF containers can also be used to represent n-ary
relawhere L is a literal type in L, f:g is the Bag type, [:] is the
to capture containers with both homogeneous and
heterogeinteger labels play the role of union member markers). Since
neous member types, as well as, to interpret RDF schema
there exists a predened ordering of labels for sequences
3.3 RDF Description Bases and Schemas
C \ P \ T = ;
p 2 f1; 2; : : : g ) (v1) TBjSjA
subPropertyOf relation is transitive:
if p is subPropertyOf then the domain (range) of p is a subset of the domain (range) of p0 p0:
if p belongs to the set f1; 2; : : : g then is an instance of either rdf:Bag or rdf:Seq or rdf:Alt: v1
p ) p p0; p0 p00 p00
p ) p 6= p0 p00
p P roperty
p ) domain(p) ^ range(p) p0 domain(p0) range(p0)
if v is a URI then it is an instance of one or more classes not related by :
8p; 2 P : p0; p00
A reied statemen t should have exactly one rdf:predicate, rdf:subject and rdf:object property
Class is the root of the class hierarchy:
Additionally, the direct extends of properties must be unique
8p 2 P; 2 [ p] : [v1; v2]
subPropertyOf relation is antisymmetric:
if p doesn’t belong to the set f1; 2; : : : g then belongs to the domain (range) of p: v1 (v2)
subClassOf relation is transitive:
if v is a literal value then it is an instance of one (and only one) Literal type:
8c; 2 C; v 2 ^[ c] ; c ) v 2= ^[ c0 c0 c0]
8c; 2 C : c0; c00
Domain and range of properties should be dened and they should be unique:
Property is the root of the property hierarchy:
c Class
c ) c 6= c0 c0
8p 2 P; 2 C = domain(p)) ^ 2 C [ = range(p)) 9!c1 (c1 9!c2 TL (c2
8v 2 V :
v 2 U ) (v) C
if v is a container value then it is an instance of one (and only one) Container type:
Literal, Resources and Container values are mutually exclusive
c ) c c0; c0 c00 c00
v 2 L ) (v) and (v) is a singleton TL
Class, Property and Type names are mutually exclusive
v 2 V n (L [ U) ) (v) and (v) is a singleton TBjSjA
p 2 P n f1; 2; : : : g ) 9n 2 n domain(p) ^ 9m 2 m range(p) (v1); (v2);
subClassOf relation is antisymmetric:
p 2 P n f1; 2; : : : g ) 2 P n f1; 2; : : : g; p ) 2= ^[ ) (8p0 p0 [v1; v2] p0]
L \ U \ V n (L [ U) = ;
3.4 The Validating RDF Parser (VRP)
8www.w3.org/RDF/Implementations/SiRPAC
4. THE RDF SCHEMA SPECIFIC DATABASE
rdf:Alt) and Literal types (e.g., string, integer, date). to allow the customization of the physical representation of
The main novelty of our representation is the separation RDF metadata in the underlying ORDBMS. This is
imporSubProperty tables, while the corresponding instance tables all class instances by a unique table Instances (see dashed
Note that in the Triples table we distinguish between prop- traversals on subclass (or subproperty) hierarchies.
both RDF schemas and resource descriptions using two ta- properties with range a class will remain unchanged. The
scriptions, called SpecRepr, similar to the attribute-based ap- The benets of this SpecRepr variation are illustrated in
Secthat the syntactic constraint (see Section 3) impos- RDF metadata in an ORDBMS, namely The 9Note PostgreSQL10.
between schema labels is captured by the SubClass and experimented for the ODP catalog is the representation of
This is not the case of other proposals employing a mono- sic SpecRepr could be the representation of properties with
bles (see Figure 6-B), namely, Resources and Triples. The above representation is applicable to RDF schemas where
bles store the URIs of resources while property tables store characteristics of employed schema classes and properties.
and properties) by a unique id whereas the latter represents cialized. Finally, our SpecRepr opens the way for a more
erties representing attributes (i.e., objvalue with literal val- 4.1 The RDF Description Loader
sequel, it can be customized in several ways according to the also interesting to represent in a similar way all the instances
the selection of specic tuples of the tables. Indices are also a limited number of tables. Furthermore, numerous tables
Class), as well as, those of Container (e.g., rdf:Bag, rdf:Seq, Compared to GenRepr, our SpecRepr is exible enough
the URIs of the source and target nodes of the property. Our main goal here is to reduce the total number of
creproach proposed for storing XML data [13, 23]. SpecRepr tion 6 given that most ODP classes (i.e. topics) have few or
bles Class and Property and on the attribute subid of the i.e. one for each topic) have a signican t overhead on the
ORDBMS. Furthermore, a table Type is used to hold the with URI object values). Indices (i.e., B-trees) are
conIt should be stressed that the taxonomic relationships its attributes, etc.). One of the possible variations we have
relies on a schema specic representation of resource de- sources and the id’s of the classes in which resources belong.
ported today by all In other words, RSSDB namely uri and classid, for keeping the uri’s of the re- ORDBMSs9.
ing complete class and property denitions, ensures that loader has been implemented [4] in Java and communication
specicities of both the manipulated description bases and of properties, but in general real-scale RDF schemas have
names of RDF/S built-in types (e.g., rdf:Property, rdfs:- structed for all table attributes.</p>
      <p>Figure 6-A), namely, Class, Property, SubClass and
Subthrough specialization 10www.postgresql.org
enriching descriptions at any time, and as we will see in the tain less than 30 URIs). For other RDF schemas it could be
the statements made about the resources (including classes cleaver representation of class and property names using
apthe RDF application scenario (i.e., querying functionality). more classes than properties. Another alternative to our
bathe table hierarchy created in RSSDB can be only extended
triples (captured by predid, subid and objid respectively). lationships of schema labels and enable to optimize recursive
ues) and those relationships between resources (i.e., objid Figure 5 depicts the architecture of our system for loading
instances of classes and properties. More precisely, class ta- resentation are required to take into account the specic
Figure 4: The Validating RDF Parser (VRP)
dened in an RDF sc hema. To save space when storing class
constructed on the attributes lpart, nsid and id of the ta- (e.g., the ODP catalog implies the creation of 252840 tables,
target of the above tables in order to speed up joins and commercial ORDBMSs (and not PostgreSQL) permit only
of the RDF schema from data information, as well, as the tant since no representation is good for all purposes and in
Indices are constructed on the attributes URI, source and ated instance tables. This is justied b y the fact that some
and properties) under the form of predicate-subject-object propriate encoding systems that preserve the taxonomic
represerves the exibilit y of RDF in rening sc hemas and/or no instances at all (more than 90% of the ODP topics
conare also connected through the subtable relationship, sup- rectangular in Figure 6-A). This table has two attributes,
resentation (i.e., for all RDF schemas), called GenRepr, of be added to the created class tables. The tables created for
Property which capture the class and property hierarchies
The core RDF/S model is represented by four tables (see
distinction between unary and binary relations holding the most real-scale RDF applications variations of a basic
reptables SubClass and SubProperty. response time of all queries (i.e., to nd and open a table,
keeping the namespaces of the RDF Schemas stored in the
lithic table [20, 18, 7] to represent RDF metadata under range a literal type, as attributes of the tables created for the
and properties names, we also consider a table NameSpace Figure 5: RDF Schema Specic DataBase &amp; Loader
the form of triples. These approaches provide a generic rep- domain of this property. Consequently, new attributes will
former represents each resource (including schema classes attribute-properties are single-valued and they are not
spesource (variable Z of type resource URI) and target values
topic names and other should be found in the descriptions
The data path expression in the outer query iterates over the
(variable T of type resource URI) of the title property. The
where $X like "*Hotel*"
ables prexed b y $ like $X) specializing the root Regional.
value matches "*Opera*". Obviously, RQL schema queries
data queries like Q, are not possible in current portals, since
users. RQL aims to simplify such tasks, by providing
powerThen, the ltering condition in the where clause will
rethat similar information may be found under Recreation
names of topics. Furthermore, compositions of schema and
we can see in Figure 2, information about hotels in Paris
of the resources.
select Z
portals may also be found under dieren t hierarchies, that
The schema path expression in the from clause of the
may be \connected" through related links. For example, as
\." implies an implicit join condition between the extent of
and $X &lt; Paris)fYg.fZgtitlefTg
hotels in Paris whose title matches "*Opera*".
from Regionalf:$Xg
ries) of a given topic, browsing should be continued by the
class name and ranges over the result of the nested query.
nested query, states that we are interested in classes
(variQ: Find the resources under the hierarchy Regional, about
such links are not necessarily bi-directional; thus, a user
tain only the classes whose name matches "*Hotel*" and
may also be found in the Recreation hierarchy. Moreover,
starting from the Regional hierarchy may never nd out
from (select $X
are far more powerful than the corresponding topic queries
schema knowledge is that the subtopics of Regional contain
ful path expressions permitting smooth ltering/na vigation
geographical information and a topic Paris is somewhere in
To make things more complex, relevant information in
nally, the result of Q will be a bag of resources whose title
they are also subclasses of Paris (e.g., Hotel and Hotel
Directories. Here, to get all relevant topics, the only required
each class valuating Y and the resources valuating Z.
Fiwhere T like "*Opera*"
ed under the topic of in terest. Note that, in order to locate
one cannot specify that some query terms should match the
previous query can be expressed as follows:
resources classied under the subtopics (e.g., Hotel
Directoon both Portal schemas and resource descriptions. Then the
of common portals, which allow only full-text queries on the
the hierarchy. Thanks to RQL typing, variable Y is of type
source-values of title and the proper instances of the class
is rather straightforward: the class variable $X over all the
subclasses of Regional and its values are ltered according
case of the query Q presented previously.
sions is left to PostgreSQL while the evaluation of RQL
which the evaluation of the right expression depends on the
to the conditions in the where clause. The operator Map on
loop evaluation with values of variable Y passed from the
sions interleave schema with data querying [9]. This is the
(c) the evaluation engine, accessing RDF descriptions from
top is a simple variable renaming for the iterator (Y ) dened
evaluation to the underlying SQL3 engine. Then pushing
pression (expressed in an object algebra a la [10]) to a single
extents (W ) returned by the nested query, as shown in the
(a) the parser, analyzing the syntax of queries; (b) the graph
right branch. The connection between the two expressions
lies on a full-edged ORDBMS like PostgreSQL, the goal
Figure 7 (a) illustrates the algebraic translation of Q. The
in the from clause is translated into a semi-join between the
functions for traversing class and property hierarchies relies
translation of the nested query - given in the left branch
over the nested query result. Then, the data path expression
The RQL interpreter, implemented in C++, consists of
main diÆculty in translating an entire RQL algebraic
exof the RQL optimizer is to push as much as possible query
typing and interdependencies of involved expressions; and
on the existence of appropriate indices (see Section 6). The
constructor, capturing the semantics of queries in terms of
the underlying database [16]. Since our implementation
reevaluation of the left one). Djoin corresponds to a nested
selections or reordering joins to evaluate RQL path
expresis captured by a Djoin operation (i.e., a dependent join in
SQL3 query is due to the fact that most RQL path
expresdown to PostgreSQL, leaving the evaluation of the Djoin to
select Z.source
procedural SQL functions) as follows:
scan operation on class instances (^(Y )[W ]) with a selection
on the Instances relation, as shown in Figure 7 (b). Then,
is expressible in SQL3 (subclassof and issubclassof are
shown in Figure 7 (c). As one can see, the whole query
Note that the * indicates an extended interpretation of
tables, according to the subtable hierarchy. Thus, the whole
right into a join and pushing the selection criterium C =
mizations, as pushing down selections, join reordering etc.</p>
      <p>Y up to the Djoin operation leads to the evaluation plan
the interpreter. However, transforming the semijoin on the
query can be pushed down to PostgreSQL, and the
Postfrom subclassesof(Regional) Y, Intances I, title* Z
each of the subqueries in the shadow boxes can be pushed
left-hand side (i.e., nested query) to the right-hand side.
where issubclassOf(X, Paris) and X like \*Hotel" and
Z.target like \*Opera*"
I.classid=Y and Z.source=I.uri and
greSQL optimizer can perform on further (common)
optiUsing the database representation, we can substitute the
5.1 RQL Interpreter
6. PERFORMANCE TESTS
respectively. The (semi-)joins involved in the evaluation of
three dieren t result cases per query. The main
observaa matter of fact, these query templates depict the core
funcexecution time of a query. Table 4 illustrates the obtained
desired querying functionality for RDF description bases,
queries for specic resource descriptions (e.g., Q10, Q11). As
large Triples table are required.
namely: a) pure schema queries on class and property
de(0.124%) for table Triples (SubClass) in the three cases
To get signican t experimental results, we carried out all
outperforming GenRepr by a factor of up to 3.73
approxthe table names of Figure 6). This benchmark illustrates the
apparent in the cases where self-join computations on the
imately. This is clearly illustrated in Q1, where one
tu5%) and Property (selectivity 20%) using index and
seclasses which represent in GenRepr (SpecRepr) selectivities
the database buers and then nine times to get the a verage
tionality of RQL presented in the previous section.
in queries Q1, Q2, Q5, Q7, Q10 and Q11, with SpecRepr
pressions in the two representations (capital letters abreviate
GenRepr and SpecRepr exhibit comparable performance
queries considered. The deviation in performance is more
that the time required to lter a table in both represen
taof 1.7e-5% (3.955e-4%), 5.14e-4% (1,19e-2%) and 5.38e-3%
queries Q5, Q7 and Q11 incur an additional cost for GenRepr,
we have experimented with classes having 1, 30, 314
subple is selected from both table Triples (selectivity
1,7etion is that SpecRepr outperforms GenRepr for all types of
tions depends on the number of tuples in the query results:
ing available schema knowledge (e.g., Q5-Q9); and c) schema
benchmark queries several times: one initially to warm up
nitions (e.g., Q1-Q4); b) queries on resource descriptions
usTable 3 describes the RDF query templates that we used
quential scans respectively. As expected, in Q2, we can see
execution time (in sec) for both representations in up to
for our experiments, as well as their respective algebraic
exquired in SpecRepr for storing the URIs of resources in the
Triples (i.e., 1770781 extra tuples). Although not
illusconstructed, the average storage volumes per data triple
bethe ODP schema is 1539 sec.
store a data triple is 0.123KB in both representations. This
obtain better storage volumes in the SpecRepr by encoding
the number of data triples. The average space required to
respectively. Despite the use of ids, the indexes in GenRepr
triple requires on average 0.086KB versus 0.1582KB in the 44MB. The size of the Instances table is 150MB (1770781
(GenRepr) and the average loading times become 0.0025 sec
property tables compensates for the extra tuples stored for
come 0.2566KB (SpecRepr) and 0.2706KB (GenRepr) while
Catalog). We can observe that in both representations the size is 32MB for Class (252825 tuples), 8KB for Property
that for each class denition in GenRepr three tuples are 6.2 Querying
Figure 8: Statistics for Loading Schema and Resource Descriptions
The database size required for the resource descriptions
GenRepr. This is mainly attributed to the space saved in tuples) whereas the size of indexes created on it is 140 MB.
larger in GenRepr because of extra (252825) tuples stored.
each resource description in GenRepr. Note that we could
the average loading time become 0.0043 sec and 0.00457 sec
loading a data triple is about 0.0033 sec and 0.0039 sec
the URIs. The tests also show that the average time for
To summarize, after loading the entire ODP catalog, the
triples. The tests show that, in SpecRepr, each schema SubProperty, and the total size of indexes on these tables is
but this solution comes with extra loading and join costs
stored: one in table Resources and two tuples in table
take up more space because of: a) the extra tuples stored
size of tables in GenRepr is 545MB for Triples (5835756
tutriples: one for the class itself and one for its unique super- ples), 202MB for Resources (2022869 tuples) and the total
(between the class and property tables) for the retrieval of
and 0.00317 sec respectively. The total validation time for
the DBMS size in both representations scales linearly with
the resource URIs as integers, as we did in the GenRepr,
When indices are constructed, the average storage volumes
class (multiple class specialization is not used in the ODP size of indexes on these tables is 900MB. In SpecRepr, the
b) the index constructed on the predid attribute for which
per schema triple become 0.1734KB (SpecRepr) and 0.3062KB
trated here, the average time for loading a schema triple is
SpecRepr due to the Namespace table, as well as to the fact
is shown in the rightmost graph of Figure 8. As expected,
about 0.0021 sec and 0.0025 sec respectively in the two
repthere is no corresponding index in SpecRepr.
size of the DBMS scales linearly with the number of schema (5 tuples), 11MB for SubClass (252825 tuples) and 0MB for
respectively in the two representations. When indices are
should not appear as a surprise since the extra space
reresentations. The time required to store a schema triple is
) predid=6^objid=clsid(T</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>