<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Prolog-based Infrastructure for RDF: Scalability and Performance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Wielemaker</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guus Schreiber</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bob Wielinga</string-name>
          <email>wielinga@swi.psy.uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Category: H.3 Information Systems/Information Storage and Retrieval</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vrije Universiteit Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The semantic web is a promising application area for the Prolog programming language because of its non-determinism and pattern-matching. In this paper we outline an infrastructure for loading and saving RDF/XML, storing triples, elementary reasoning with triples and visualization. A predecessor of the infrastructure described here has been used in various applications for ontology-based annotation of multimedia objects using semantic web languages. Our library aims at fast parsing, fast access and scalability for fairly large but not unbounded applications of up to 40 million triples. The RDF parser is distributed with SWI-Prolog under the LGPL Free Software licence. The other components will be added to the distribution as they become stable and documented.</p>
      </abstract>
      <kwd-group>
        <kwd>Performance</kwd>
        <kwd>Logic programming</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Semantic-web applications will require multiple large ontologies for indexing and
querying. In this paper we describe an infrastructure for handling such large
ontologies. This work was done in the context of a project on ontology-based
annotation of multi-media objects to improve annotations and querying [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ],
for which we use the semantic-web languages RDF and RDFS. The annotations
use a series of existing ontologies, including AAT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], WordNet [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and ULAN
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. To facilitate this research we require an RDF toolkit capable of handling
approximately 3 million triples efficiently on current desktop hardware. This
paper describes the parser, storage and basic query interface for this
Prolog-based RDF infrastructure. A practical overview using an older version of this
infrastructure is given in an XML.com article [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>We have opted for a purely memory-based infrastructure for optimal speed.
Our tool set can handle the 3 million triple target with approximately 300 MB
of memory and scales to approximately 40 million triples on fully equipped
32-bit hardware. Although insufficient to represent “the whole web”, we assume 40
million triples is sufficient for applications operating in a restricted domain such
as annotations for a set of cultural-heritage collections.</p>
      <p>This document is organised as follows. In Sect. 2 we describe and evaluate
the Prolog-based RDF/XML parser. Section 3 discusses the requirements and
candidate choices for a triple storage format. In Sect. 4 we describe the chosen
storage method and the basic query engine. In Sect. 5 we describe the API and
implementation for RDFS reasoning support. This section also illustrates the
mechanism for expressing higher level queries. Section 6 describes visualisation
tools to examine the contents of the database. Finally, Sect. 7 describes some
related work.</p>
      <p>Throughout the document we present metrics on time and memory resources
required by our toolkit. Unless specified otherwise these are collected on a dual
AMD 1600+ (approx. Pentium-IV 1600) machine with 2 GB memory running
SuSE Linux 8.1, gcc 3.2 and multi-threaded SWI-Prolog 5.1.11.<sup>1</sup> The software is
widely portable to other platforms, including most Unix dialects, MS-Windows
and MacOS X. Timing tests are executed on our reference data consisting of 1.5
million triples from WordNet, AAT and ULAN.</p>
    </sec>
    <sec id="sec-2">
      <title>Parsing RDF/XML</title>
      <p>The RDF/XML parser is the oldest component of the system. We started our
own parser because the existing (1999) Java (SiRPAC<sup>2</sup>) and Pro Solutions
Perl-based<sup>3</sup> parsers did not provide the performance required and we did not wish
to enlarge the footprint and complicate the system by introducing Java or Perl
components. The RDF/XML parser translates the output of the SWI-Prolog
SGML/XML parser<sup>4</sup> into a Prolog list of triples using the steps summarised in
Fig. 1.</p>
    </sec>
    <sec id="sec-3">
      <title>Metrics and Evaluation</title>
      <p>The source code of the parser is 1170 lines: 564 for the first pass creating the
intermediate state, 341 for generating the triples and 265 for the driver
putting it all together. The time to parse the WordNet sources is given in
Tab. 1.</p>
      <p>[Fig. 1: parsing steps. RDF/XML Document, XML-Parser, Dedicated rewrite language, RDF Intermediate Representation, DCG rule-set, Prolog List of Triples.]</p>
      <p>The parser passes the W3C RDF Test Cases<sup>5</sup>. In the current implementation,
however, it handles neither the xml:lang tag nor RDF typed literals using
rdf:datatype.</p>
      <sec id="sec-3-1">
        <title>Footnotes</title>
        <p><sup>1</sup> http://www.swi-prolog.org</p>
        <p><sup>2</sup> http://www-db.stanford.edu/~melnik/rdf/api.html</p>
        <p><sup>3</sup> http://www.pro-solutions.com/rdfdemo/</p>
        <p><sup>4</sup> http://www.swi-prolog.org/packages/sgml2pl.html</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Storing RDF triples: requirements and alternatives</title>
    </sec>
    <sec id="sec-5">
      <title>Requirement from integrating different ontology representations</title>
      <p>
        Working with multiple ontologies created by different people and/or
organizations poses some specific requirements for storing and retrieving RDF triples.
We illustrate with an example from our own work on annotating images [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>Given the absence of official RDF versions of AAT and IconClass we created
our own RDF representation, in which the concept hierarchy is modeled as an
RDFS class hierarchy. We wanted to use these ontologies in combination with
the RDF representation of WordNet created by Decker and Melnik<sup>6</sup>. However,
their RDF Schema for WordNet defines classes and properties for the
metamodel of WordNet. This means that WordNet synsets (the basic WordNet
concepts) are represented as instances of the (meta)class LexicalConcept and that
the WordNet hyponym relations (the subclass relations in WordNet) are
represented as tuples of the metaproperty hyponymOf between instances of
wns:LexicalConcept. This leads to a representational mismatch, as we are now
unable to treat WordNet concepts as classes and WordNet hyponym relations
as subclass relations.</p>
      <p><sup>5</sup> http://www.w3.org/TR/2003/WD-rdf-testcases-20030123/</p>
      <p>Fortunately, RDFS provides metamodelling primitives for coping with this.
Consider the following two RDF descriptions:</p>
      <preformat>&lt;rdf:Description rdf:about="&amp;wns;LexicalConcept"&gt;
  &lt;rdfs:subClassOf rdf:resource="&amp;rdfs;Class"/&gt;
&lt;/rdf:Description&gt;
&lt;rdf:Description rdf:about="&amp;wns;hyponymOf"&gt;
  &lt;rdfs:subPropertyOf rdf:resource="&amp;rdfs;subClassOf"/&gt;
&lt;/rdf:Description&gt;</preformat>
      <p>The first statement specifies that the class LexicalConcept is a subclass of the
built-in RDFS metaclass Class, the instances of which are classes. This means
that now all instances of LexicalConcept are also classes. In a similar vein, the
second statement defines that the WordNet property hyponymOf is a subproperty
of the RDFS subclass-of relation. This enables us to interpret the instances of
hyponymOf as subclass links.</p>
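      <p>As a rough illustration of this re-interpretation, the sketch below asserts the two statements with the rdf_db library shipped with current SWI-Prolog and then queries hyponym links as subclass links. The wnx prefix, its URL and the example concepts are hypothetical and not taken from WordNet; rdf_register_prefix/2 is the present-day way to register a namespace abbreviation and may differ from the version described in this paper.</p>
      <preformat>
```prolog
:- use_module(library(semweb/rdf_db)).

% Hypothetical namespace used only for this illustration.
:- rdf_register_prefix(wnx, 'http://example.org/wordnet-sketch#').

% Assert the two metamodel statements plus two made-up hyponym links.
load_sketch :-
    rdf_assert(wnx:'LexicalConcept', rdfs:subClassOf, rdfs:'Class'),
    rdf_assert(wnx:hyponymOf, rdfs:subPropertyOf, rdfs:subClassOf),
    rdf_assert(wnx:cat, wnx:hyponymOf, wnx:feline),
    rdf_assert(wnx:feline, wnx:hyponymOf, wnx:mammal).

% Direct superclass: rdf_has/3 also follows sub-properties of subClassOf.
direct_super(Class, Super) :-
    rdf_has(Class, rdfs:subClassOf, Super).

% Class itself and all its superclasses, following the relation transitively.
super_or_self(Class, Super) :-
    rdf_reachable(Class, rdfs:subClassOf, Super).
```
      </preformat>
      <p>After calling load_sketch, direct_super/2 relates the hypothetical cat to feline although only a hyponymOf triple was stored, which is the behaviour the two RDF descriptions are meant to enable.</p>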
      <p>We expect representational mismatches to occur frequently in any
realistic semantic-web setting. RDF mechanisms similar to the ones above can
be employed to handle this. However, this poses the requirement on the
toolkit that the infrastructure is able to interpret subtypes of rdfs:Class and
rdfs:subPropertyOf. In particular the latter was important for our
applications, e.g., to be able to reason with WordNet hyponym relations as subclass
relations or to visualize WordNet as a class hierarchy (cf. Fig. 6).</p>
    </sec>
    <sec id="sec-6">
      <title>Requirements</title>
      <p>Based on experience we stated the following requirements for the RDF storage
format.</p>
      <p>Efficient subPropertyOf handling. As illustrated in Sect. 3.1,
ontology-based annotation requires the re-use of multiple external ontologies. The
subPropertyOf relation provides an ideal mechanism to re-interpret an
existing RDF dataset.</p>
      <p>Avoid frequent cache updates. In our first prototype we used a secondary
store based on the RDFS data model to speed up RDFS queries. The mapping
from triples to this model is not suitable for incremental update, resulting
in frequent slow re-computation of the derived model from the triples as the
triple set changes.</p>
      <p>Scalability. We anticipate the use of at least AAT, WordNet and ULAN in the
next generation annotation tools. Together these require 1.5 million triples
in their current form. We would like to be able to handle 3 million triples on
a state-of-the-art notebook (512 MB).</p>
      <p>Fast load/save. RDF/XML parsing and loading time for the above ontologies
is 108 seconds. This should be reduced using an internal format.</p>
      <sec id="sec-6-1">
        <title>Footnotes</title>
        <p><sup>6</sup> http://www.semanticweb.org/library/</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Storage options</title>
      <p>The most natural way to store RDF triples is using facts of the format
rdf(Subject, Predicate, Object) and this is, except for a thin wrapper improving
namespace handling, the representation used in our first prototype. As
standard Prolog systems only provide indexing on the first argument this implies
that asking for properties of a subject is indexed, but asking about inverse
relations is slow. Many queries involve reverse relations: “what are the sub-classes of
X?”, “what instances does Y have?” and “what subjects have label L?” are queries
commonly used in our annotation tool.</p>
      <p>Our first tool solved these problems by building a secondary database
following the RDFS datamodel. The cached relations included rdfs_class(Class,
Super, Meta), rdfs_property(Class, Property, Facet), rdf_instance(Resource,
Class) and rdfs_label(Resource, Label). These relations can be accessed quickly
in any direction. This approach has a number of drawbacks. First of all, the
implications of even adding or deleting a single triple are potentially enormous,
leaving the choice between complicated incremental synchronisation of the cache
with the triple set or frequent slow total recomputation of the cache. Second, storing
the cache requires considerable memory resources and third, there are many more
relations that could profit from caching.</p>
      <p>
        Using an external DBMS for the triple store is an alternative. Assuming
some SQL database, there are three possible designs. The simplest one is to
use Prolog reasoning and simple SELECT statements to query the DB. This
approach does not exploit query optimization and causes many requests involving
large amounts of data. Alternatively, one could either write a mixture of Prolog
and SQL or automate part of this process, as covered by the Prolog to SQL
converter of Draxler [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Our own (unpublished) experiences indicate a simple
database query is at best 100 and in practice often over 1,000 times slower than
using the internal Prolog database. Query optimization is likely to be of
limited effect due to poor handling of transitive relations in SQL. Many queries
involve rdfs:subClassOf, rdfs:subPropertyOf and other transitive relations.
Using an embedded database such as BerkeleyDB<sup>7</sup> provides much faster
simple queries, but still imposes a serious efficiency penalty. This is due to both the
overhead of the formal database API and to the mapping between the in-memory
Prolog atom handles and the resource representation used in the database.
      </p>
      <p>In another attempt we used Predicate(Subject, Object) as database
representation and stored the inverse relation as well in InversePred(Object, Subject)
with a wrapper to call the ‘best’ version depending on the runtime
instantiation. This approach, using native Prolog syntax for fast load/save, satisfies the
requirements with minor drawbacks. The 3 million triples, the software and OS
together require about 600 MB of memory. Save/load using Prolog native
syntax is, despite the fast SWI-Prolog parser, only twice as fast as parsing the
RDF/XML.</p>
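      <p>The idea of storing each relation in both directions and dispatching on the runtime instantiation can be sketched as follows. This is a minimal illustration and not the code of the prototype: the inv_ naming scheme and the predicates add_triple/3 and triple/3 are assumptions made for this example, and the property name must not clash with an existing predicate.</p>
      <preformat>
```prolog
:- use_module(library(error)).

% Name of the predicate holding the inverse facts (naming is an assumption).
inverse_name(Pred, Inverse) :-
    atom_concat(inv_, Pred, Inverse).

% Store a triple as Pred(Subject, Object) and as inv_Pred(Object, Subject).
add_triple(Subject, Pred, Object) :-
    must_be(atom, Pred),
    inverse_name(Pred, Inverse),
    Forward =.. [Pred, Subject, Object],
    Backward =.. [Inverse, Object, Subject],
    assertz(Forward),
    assertz(Backward).

% Call the version whose first argument is instantiated, so that
% first-argument indexing applies. Pred must be known at call time.
triple(Subject, Pred, Object) :-
    must_be(atom, Pred),
    (   var(Subject), nonvar(Object)
    ->  inverse_name(Pred, Inverse),
        current_predicate(Inverse/2),
        call(Inverse, Object, Subject)
    ;   current_predicate(Pred/2),
        call(Pred, Subject, Object)
    ).
```
      </preformat>
      <p>For example, after add_triple(cat, subclass_of, feline) the goal triple(S, subclass_of, feline) is answered from the inverse facts, while triple(cat, subclass_of, O) uses the forward facts.</p>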
      <p>In the end we opted for a Prolog foreign-language extension: a module written
in C to extend the functionality of Prolog.<sup>8</sup> A significant advantage of using an
extension to Prolog rather than a language-independent storage module separated
by a formal API is that the extension can use native Prolog atoms, significantly
reducing memory requirements and access time.</p>
    </sec>
    <sec id="sec-8">
      <title>Realising an RDF store as C-extension to Prolog</title>
    </sec>
    <sec id="sec-9">
      <title>Storage format</title>
      <p>Triples are stored as a C-structure holding the three fields and 7 hash-table links
for index access on all 7 possible instantiation patterns with at least one field
instantiated. The size of the hash-tables is automatically increased as the triple
set grows. In addition, each triple is associated with a source-reference consisting
of an atom (normally the filename) and an integer (normally the line-number)
and a general-purpose set of flags, adding up to 13 machine words (52 bytes on
32-bit hardware) per triple, or 149 Mbytes for the intended 3 million triples.
Our reference-set of 1.5 million triples uses 890,000 atoms. In SWI-Prolog an
atom requires 7 machine words overhead excluding the represented string. If we
estimate the average length of an atom representing a fully qualified resource
at 30 characters the atom-space required for the 1.8 million atoms in 3 million
triples is about 88 Mbytes. The required total of 237 Mbytes for 3 million triples
fits easily in 512 Mbytes.</p>
      <p>[Fig. 2: predicate hierarchy built from rdfs:subPropertyOf links (Pred2, Pred3, Pred4), each predicate pointing to the cached ‘root’ predicate.]</p>
      <p><sup>7</sup> http://www.sleepycat.com/</p>
      <p><sup>8</sup> Extending Prolog using modules written in the C-language is provided in most
of today's Prolog systems, although there is no established standard foreign interface and
therefore the connection between the extension and Prolog needs to be rewritten
when porting to another implementation of the Prolog language [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].</p>
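      <p>The arithmetic behind the memory estimate can be retraced with a few lines of Prolog. The sketch below only restates figures from the text (13 words per triple, 4 bytes per word, about 88 Mbytes of atom space for 3 million triples); the predicate names are made up for this illustration and a Mbyte is taken to be 2^20 bytes.</p>
      <preformat>
```prolog
% Figures taken from the text.
words_per_triple(13).
bytes_per_word(4).            % 32-bit hardware
atom_space_mbytes(88).        % estimate given for 3 million triples

% Mbytes (2^20 bytes) needed for Count triple records.
triple_space_mbytes(Count, MBytes) :-
    words_per_triple(Words),
    bytes_per_word(BytesPerWord),
    MBytes is Count * Words * BytesPerWord / (1024 * 1024).

% Rounded total for 3 million triple records plus the atom space.
total_mbytes_for_3_million(Total) :-
    triple_space_mbytes(3000000, Triples),
    atom_space_mbytes(Atoms),
    Total is round(Triples) + Atoms.
```
      </preformat>
      <p>With these figures the triple records account for roughly 149 Mbytes and the total comes to 237 Mbytes, matching the numbers quoted above.</p>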
      <p>To accommodate active queries safely, deletion of triples is realised by flagging
them as erased. Garbage collection can be invoked if no queries are active.</p>
      <p>Indexing. Subjects and resource Objects use the immutable atom-handle as hash-key.
Literal Objects use a case-insensitive hash to speed up case-insensitive lookup of
labels, a common operation in our annotation tool. The Predicate field needs
special attention due to the requirement to handle subPropertyOf efficiently.
The storage layer has an explicit representation for all known predicates which
are linked directly in a hierarchy built using the subPropertyOf relation. Each
predicate has a direct pointer to the root predicate: the topmost predicate in
the hierarchy. If the top is formed by a cycle an arbitrary node of the cycle is
flagged as the root, but all predicates in the hierarchy point to the same root
as illustrated in Fig. 2. Each triple is now hashed using the root-predicate that
belongs to the predicate of the triple.</p>
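      <p>The choice of a single root per hierarchy, including the case where the top is a cycle, can be sketched in plain Prolog. The actual store does this in C; the code below is only an illustration under our own assumptions: sub_property_of/2 and the pred1 to pred4 facts are hypothetical, and the "arbitrary" node of a top cycle is here taken to be the first one in the standard order of terms so that every predicate agrees on it.</p>
      <preformat>
```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).

:- dynamic sub_property_of/2.

% Hypothetical hierarchy: pred3 and pred4 sit below pred2, while pred1
% and pred2 form a cycle at the top.
sub_property_of(pred3, pred2).
sub_property_of(pred4, pred2).
sub_property_of(pred2, pred1).
sub_property_of(pred1, pred2).

% ancestors(+Pred, -Set): Pred plus everything reachable upwards, sorted.
ancestors(Pred, Set) :-
    closure([Pred], [], Seen),
    sort(Seen, Set).

closure([], Seen, Seen).
closure([P|Agenda], Seen, Set) :-
    (   memberchk(P, Seen)
    ->  closure(Agenda, Seen, Set)
    ;   findall(S, sub_property_of(P, S), Supers),
        append(Supers, Agenda, Agenda1),
        closure(Agenda1, [P|Seen], Set)
    ).

% A predicate is topmost if each of its ancestors leads back to it.
topmost(Pred) :-
    ancestors(Pred, Ancestors),
    forall(member(A, Ancestors),
           ( ancestors(A, Back), memberchk(Pred, Back) )).

% root_predicate(+Pred, -Root): first topmost ancestor in standard order.
root_predicate(Pred, Root) :-
    ancestors(Pred, Ancestors),
    include(topmost, Ancestors, [Root|_]).
```
      </preformat>
      <p>In this example all four predicates map to pred1, so a triple stored with pred3 and a query on pred2 hash to the same key. A predicate without any subPropertyOf link is its own root.</p>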
      <p>The above representation provides fully indexed lookup of any instantiation
pattern, case-insensitive on literals and including sub-properties. As a
compromise to our requirements, the storage layer must know the fully qualified resource
for subPropertyOf and must rebuild the predicate hierarchy and hash-tables if
subPropertyOf relations are added to or deleted from the triple store. The
predicate hierarchy and index are invalidated if such a triple is added or deleted. The
index is rebuilt on the first indexable query. We assume that changes to the
subPropertyOf relations are infrequent.</p>
    </sec>
    <sec id="sec-10">
      <title>Fast save/load format</title>
      <table-wrap>
        <label>Table 2</label>
        <caption><p>Initial registered namespace abbreviations</p></caption>
        <table>
          <thead>
            <tr><th>Abbreviation</th><th>Namespace</th></tr>
          </thead>
          <tbody>
            <tr><td>rdf</td><td>http://www.w3.org/1999/02/22-rdf-syntax-ns#</td></tr>
            <tr><td>rdfs</td><td>http://www.w3.org/2000/01/rdf-schema#</td></tr>
            <tr><td>owl</td><td>http://www.w3.org/2002/7/owl#</td></tr>
            <tr><td>xsd</td><td>http://www.w3.org/2000/10/XMLSchema#</td></tr>
            <tr><td>dc</td><td>http://purl.org/dc/elements/1.1/</td></tr>
            <tr><td>eor</td><td>http://dublincore.org/2000/03/13/eor#</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Although attractive, the Prolog-only prototype has indicated that storing triples
using the native representation of Prolog terms does not provide the required
speedup, while the files are, mainly due to the expanded namespaces, larger than
the RDF/XML source. An efficient format can be realised by storing the
atom text only the first time. Later references to the same atom simply store this as
the N-th atom. A hash-table is used to keep track of the atoms already seen. An
atom on the file thus has two formats: X <italic>integer</italic> or A <italic>length</italic> <italic>text</italic>. Loading
requires an array of already-loaded atoms. The resulting representation has the
same size as the RDF/XML within 10%, and our reference dataset of 1.5 million
triples is loaded 22 times faster, or 5 seconds.</p>
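      <p>The back-reference scheme for atoms in the save format can be sketched as follows. The real format is a binary file written from C; here the tokens a(Length, Text) and x(Index) merely stand in for the A and X records, and library(assoc) plays the role of the hash-table. All predicate names are assumptions made for this example.</p>
      <preformat>
```prolog
:- use_module(library(assoc)).
:- use_module(library(lists)).

% encode_atoms(+Atoms, -Tokens): the first occurrence of an atom is
% written as a(Length, Text), later occurrences as x(Index).
encode_atoms(Atoms, Tokens) :-
    empty_assoc(Seen),
    encode_atoms(Atoms, Seen, 0, Tokens).

encode_atoms([], _, _, []).
encode_atoms([A|As], Seen0, N0, [Token|Tokens]) :-
    (   get_assoc(A, Seen0, Index)
    ->  Token = x(Index),
        Seen = Seen0,
        N = N0
    ;   atom_length(A, Length),
        Token = a(Length, A),
        put_assoc(A, Seen0, N0, Seen),
        N is N0 + 1
    ),
    encode_atoms(As, Seen, N, Tokens).

% decode_atoms(+Tokens, -Atoms): keeps a table of atoms loaded so far,
% in order of first appearance, to resolve x(Index) references.
decode_atoms(Tokens, Atoms) :-
    decode_atoms(Tokens, [], Atoms).

decode_atoms([], _, []).
decode_atoms([a(_, A)|Tokens], Table, [A|As]) :-
    append(Table, [A], Table1),
    decode_atoms(Tokens, Table1, As).
decode_atoms([x(Index)|Tokens], Table, [A|As]) :-
    nth0(Index, Table, A),
    decode_atoms(Tokens, Table, As).
```
      </preformat>
      <p>A list would be replaced by an array in a serious loader, as the text indicates; the list keeps the sketch short. Decoding the encoded tokens returns the original sequence of atoms.</p>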
    </sec>
    <sec id="sec-11">
      <title>Namespace handling</title>
      <p>Fully qualified resources are long, hard to read and difficult to maintain in
application source-code. On the other hand, representing resources as atoms holding
the fully qualified resource is attractive because it is compact and compares very
fast: the only test between two atoms as well as two resources is the equivalence
test. Prolog optimises this test by ensuring there are no two atoms representing
the same characters and therefore comparing atom-handles decides on
equivalence.</p>
      <p>To combine the advantages of both representations, the API described in
Tab. 3 is encapsulated in a macro-expansion mechanism based on Prolog
goal_expansion/2 rules. For each of the arguments that can receive a resource
a term of the format NS:Identifier, where NS is a registered abbreviation
of a namespace and Identifier is a local name, is mapped to the fully
qualified resource.<sup>9</sup> The predicate rdf_db:ns/2 maps registered short local namespace
identifiers to the fully qualified namespaces. Declared as multifile, this
predicate can be extended by the user. The initial definition contains the well-known
abbreviations used in the context of the semantic web. See Tab. 2.</p>
      <p><sup>9</sup> In our original prototype we provided a more powerful version of this mapping at
runtime. In this version, output-arguments could be split into their namespace and
local name as well. After examining actual use of this extra facility in the prototype
and performance we concluded a limited compile-time alternative is more attractive.</p>
      <def-list>
        <label>Table 3</label>
        <def-item>
          <term>rdf(?Subject, ?Predicate, ?Object)</term>
          <def>
            <p>Elementary query for triples. Subject and Predicate are atoms representing the fully
qualified URL of the resource. Object is either an atom representing a resource or
literal(Text) if the object is a literal value. For querying purposes, Object can be
of the form literal(+Query, -Value), where Query is one of
              <def-list>
                <def-item>
                  <term>exact(+Text)</term>
                  <def><p>Perform exact, but case-insensitive match. This query is fully indexed.</p></def>
                </def-item>
                <def-item>
                  <term>substring(+Text)</term>
                  <def><p>Match any literal that contains Text as a case-insensitive substring.</p></def>
                </def-item>
                <def-item>
                  <term>word(+Text)</term>
                  <def><p>Match any literal that contains Text as a ‘whole word’.</p></def>
                </def-item>
                <def-item>
                  <term>prefix(+Text)</term>
                  <def><p>Match any literal that starts with Text.</p></def>
                </def-item>
              </def-list>
            </p>
          </def>
        </def-item>
        <def-item>
          <term>rdf_has(?Subject, ?Predicate, ?Object, -TriplePred)</term>
          <def>
            <p>This query exploits the rdfs:subPropertyOf relation. It returns any triple whose
stored predicate equals Predicate or can reach this by following the
transitive rdfs:subPropertyOf relation. The actual stored predicate is returned in
TriplePred.</p>
          </def>
        </def-item>
        <def-item>
          <term>rdf_reachable(?Subject, +Predicate, ?Object)</term>
          <def>
            <p>True if Object is, or can be reached following the transitive property Predicate
from Subject. Either Subject or Object or both must be specified. If one of Subject
or Object is unbound this predicate generates solutions in breadth-first search order.
It maintains a table of visited resources, never generates the same resource twice
and is robust against cycles in the transitive relation.</p>
          </def>
        </def-item>
        <def-item>
          <term>rdf_subject(?Subject)</term>
          <def>
            <p>Enumerate resources appearing as a subject in a triple. The reason for this
predicate is to generate the known subjects without the duplicates one would get using
rdf(Subject, _, _). The storage layer ensures the first triple with a specified Subject
is flagged as such.</p>
          </def>
        </def-item>
        <def-item>
          <term>rdf_assert(+Subject, +Predicate, +Object)</term>
          <def>
            <p>Assert a new triple into the database. Subject and Predicate are resources. Object
is either a resource or a term literal(Value).</p>
          </def>
        </def-item>
        <def-item>
          <term>rdf_retractall(?Subject, ?Predicate, ?Object)</term>
          <def>
            <p>Removes all matching triples from the database.</p>
          </def>
        </def-item>
        <def-item>
          <term>rdf_update(+Subject, +Predicate, +Object, +Action)</term>
          <def>
            <p>Replaces one of the three fields on the matching triples depending on Action:
              <def-list>
                <def-item>
                  <term>subject(Resource)</term>
                  <def><p>Changes the first field of the triple.</p></def>
                </def-item>
                <def-item>
                  <term>predicate(Resource)</term>
                  <def><p>Changes the second field of the triple.</p></def>
                </def-item>
                <def-item>
                  <term>object(Object)</term>
                  <def><p>Changes the last field of the triple to the given resource or literal(Value).</p></def>
                </def-item>
              </def-list>
            </p>
          </def>
        </def-item>
      </def-list>
      <p>With these declarations, we can write the following to get all individuals of
http://www.w3.org/2000/01/rdf-schema#Class on backtracking:</p>
      <preformat>rdf(X, rdf:type, rdfs:'Class')</preformat>
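      <p>The mapping from NS:Identifier to a fully qualified resource can be sketched as an ordinary predicate. In the library this mapping is applied at compile time through goal_expansion/2; the runtime predicate global_id/2 and the local ns/2 table below are assumptions made for this illustration and are not the library's own definitions.</p>
      <preformat>
```prolog
% Table of abbreviations; multifile so that applications can add entries.
:- multifile ns/2.
:- dynamic ns/2.

ns(rdf,  'http://www.w3.org/1999/02/22-rdf-syntax-ns#').
ns(rdfs, 'http://www.w3.org/2000/01/rdf-schema#').

% global_id(+Spec, -Resource): map NS:Local to the fully qualified atom.
% Anything else, including an unregistered abbreviation, is passed
% through unchanged.
global_id(NS:Local, Resource) :-
    atom(NS),
    atom(Local),
    ns(NS, Full),
    !,
    atom_concat(Full, Local, Resource).
global_id(Resource, Resource).
```
      </preformat>
      <p>Because resources end up as single atoms, comparing two of them after this mapping is a comparison of atom handles, as discussed above.</p>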
    </sec>
    <sec id="sec-12">
      <title>Performance evaluation</title>
      <p>We studied two queries using our reference set. First we generated all
solutions for rdf(X, rdf:type, wns:'Noun'). The 66025 nouns are generated in
0.0464 seconds (1.4 million alternatives/second). Second we asked for the type
of randomly generated nouns. This deterministic query is executed at 526,000
queries/second. Tests comparing rdf/3 with rdf_has/4, which exploits the
rdfs:subPropertyOf relation, show no significant difference in performance.</p>
      <fig>
        <label>Fig. 3</label>
        <preformat>rdfs_individual_of(Resource, Class) :-
    nonvar(Resource), !,
    rdf_has(Resource, rdf:type, MyClass),
    rdfs_subclass_of(MyClass, Class).
rdfs_individual_of(Resource, Class) :-
    nonvar(Class), !,
    rdfs_subclass_of(SubClass, Class),
    rdf_has(Resource, rdf:type, SubClass).
rdfs_individual_of(_Resource, _Class) :-
    throw(error(instantiation_error, _)).</preformat>
      </fig>
    </sec>
    <sec id="sec-13">
      <title>Querying and RDFS</title>
      <p>Queries at the RDFS level are implemented using trivial Prolog rules
exploiting the primitives in Tab. 3. For example, Fig. 3 realises testing and generating
individuals. The first rule tests whether an individual belongs to a given class
or generates all classes the individual belongs to. The second rule generates all
individuals that belong to a specified class. The last rule is called in the unbound
condition. There is not much point generating all classes and all individuals that
have a type that is equal to or a subclass of the generated class and therefore
we generate a standard Prolog exception.</p>
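      <p>Rules such as those in Fig. 3 depend on following a transitive relation without looping on cycles. The sketch below shows one way to generate reachable nodes breadth-first while keeping a table of visited nodes, in the spirit of the description of rdf_reachable/3 in Tab. 3. It is not the library implementation: link/2 and its facts are hypothetical stand-ins for stored triples of one transitive property.</p>
      <preformat>
```prolog
:- use_module(library(lists)).

:- dynamic link/2.

% Hypothetical data with a cycle between mammal and animal.
link(cat, feline).
link(feline, mammal).
link(mammal, animal).
link(animal, mammal).

% reachable(+Start, -Node): Node is Start or can be reached from it.
% Solutions come in breadth-first order and no node is returned twice.
reachable(Start, Node) :-
    bfs([Start], [Start], Node).

% bfs(+Queue, +Visited, -Node): Visited holds every node queued so far.
bfs([Node|_], _, Node).
bfs([Current|Queue], Visited, Node) :-
    findall(Next,
            ( link(Current, Next), \+ memberchk(Next, Visited) ),
            New0),
    sort(New0, New),                 % drop duplicates among new nodes
    append(Queue, New, Queue1),
    append(Visited, New, Visited1),
    bfs(Queue1, Visited1, Node).
```
      </preformat>
      <p>The visited table is what makes the enumeration terminate on the cycle in the example data; without it the query would keep alternating between the last two nodes.</p>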
    </sec>
    <sec id="sec-14">
      <title>A few User-queries</title>
      <p>Let us study the question ‘Give me an individual of WordNet ‘Noun’ labeled
right’. This non-deterministic query can be coded in two ways:</p>
      <preformat>right_noun_1(R) :-
    rdfs_individual_of(R, wns:'Noun'),
    rdf_has(R, rdfs:label, literal(right)).

right_noun_2(R) :-
    rdf_has(R, rdfs:label, literal(right)),
    rdfs_individual_of(R, wns:'Noun').</preformat>
      <fig>
        <label>Fig. 4</label>
        <preformat>multi_cat(Label, CatList) :-
    setof(Label, wn_label(Label), Labels),
    member(Label, Labels),
    setof(Cat, lexical_category(Label, Cat), CatList),
    CatList = [_,_|_].

lexical_category(Label, Category) :-
    rdf_has(SynSet, rdfs:label, literal(Label)),
    rdfs_individual_of(SynSet, Category),
    rdf_has(Category, rdfs:subClassOf, wns:'LexicalConcept').

wn_label(Label) :-
    rdfs_individual_of(SynSet, wns:'LexicalConcept'),
    rdf_has(SynSet, rdfs:label, literal(Label)).</preformat>
      </fig>
      <p>The first query enumerates the subclasses of wns:Noun, generates their 66025
individuals and tests each for having the literal ‘right’ as label. The second
generates the 8 resources in the 1.5 million triple set labeled ‘right’ and tests
whether they belong to wns:Noun. The first query requires 0.17 seconds and the second
0.37 milliseconds to generate all alternatives.</p>
      <p>A more interesting question is ‘Give me a WordNet word that belongs to
multiple lexical categories’. The program is shown in Fig. 4. The first setof/3
generates the 123497 labels (a subproperty of wns:wordForm) defined in this
WordNet version. Next we examine the labels one by one, generating the
lexical categories and selecting the 6584 words that belong to multiple categories.
The query completes in 9.33 seconds after 2.27 million calls on rdf_has/4 and
rdf_reachable/3.</p>
    </sec>
    <sec id="sec-15">
      <title>Visualisation</title>
      <p>For our annotation application we developed interactive editors. We are
reorganising these into a clean modular design for building RDF/RDFS and OWL tools.
The current toolkit provides a hierarchical browser with instance and class-view
on resources and a tool to generate classical RDF diagrams. Both tools provide
menus that exploit the registered source-information to view the origin of a triple
in a text-editor. Currently these tools help developers to examine the content of
the database. Figure 5 and Fig. 6 visualise the WordNet resource labeled right
in one of its many meanings.</p>
    </sec>
    <sec id="sec-16">
      <title>Related Work</title>
      <p>
        Protege
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is a modular Java-based ontology editor that can be extended using plugins.
We regard Protege as complementary, providing interactive editing where we
only provide simple interactive browsing. The Protege ontology language does
not map one-to-one to RDFS, providing both extensions (e.g. cardinality) and
limitations (notably in handling subPropertyOf). New versions of Protege and
the introduction of OWL reduce this mismatch.
Jena
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a Java implementation for basic RDF handling. It aims at standard
compliance and a friendly access from Java. Although its focus and coverage are
slightly different the main difference is the choice of language.
Sesame
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is an extensible Java-based architecture realising load/save of RDF/XML,
modification of the triple model and RQL [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] queries. It stresses a modular design where
notably the storage module can be replaced. Although scalable, the modular
approach with generic DBMS performs poorly (section 6.5 of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]: Scalability
Issues).
      </p>
    </sec>
    <sec id="sec-17">
      <title>Discussion and Conclusions</title>
      <p>
        We have outlined alternatives and an existing implementation of a library for
handling semantic web languages in the Prolog language. We have demonstrated
that this library can handle moderately large RDF triple sets (3 million) using
237 MB memory, ranging up to 40 million on 32-bit hardware providing a 3.5 GB
address-space to applications. Further scaling either requires complicated
segmentation of the store or hardware providing a larger (e.g. 64-bit) address-space.
The library requires approx. 220 sec. to read 3 million triples from RDF/XML
and 10 sec. from its proprietary file-format. Updating the subPropertyOf cache
requires 3.3 sec. on this data-set. The library requires approx. 2 µs for the first
answer and 0.7 µs for providing alternatives from the result-set through Prolog
backtracking. All measurements were taken on an AMD Athlon 1600+ with 2 GB memory. The
performance of indexed queries is constant with regard to the size of the triple
set. The time required for not-indexed queries and cache-updates is proportional
to the size of the triple set.
[...] performance. Although Prolog users are used to this it is desirable to have more
high-level query optimisation. Constraint logic programming (CLP) and tabling
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] possibly can improve efficiency and allow a more declarative programming
style in the presence of cycles in relations and other abnormalities in the
searchspace.
      </p>
      <p>Although Prolog queries are not strictly declarative due to required ordering,
cut and database manipulations, experience with our first prototype has indicated
that the queries required for our annotation and search process are expressed
easily and concisely in the Prolog language. We anticipate this infrastructure is
also suitable for the prototyping and implementation of end-user query languages.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Bagnara</surname>
          </string-name>
          and
          <string-name>
            <given-names>Manuel</given-names>
            <surname>Carro</surname>
          </string-name>
          .
          <article-title>Foreign language interfaces for Prolog: A terse survey</article-title>
          .
          <source>ALP newsletter</source>
          ,
          <year>May 2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Jeen</given-names>
            <surname>Broekstra</surname>
          </string-name>
          and
          <string-name>
            <given-names>Arjohn</given-names>
            <surname>Kampman</surname>
          </string-name>
          .
          <article-title>Sesame: A generic architecture for storing and querying RDF and RDF Schema</article-title>
          .
          <source>Technical Report OTK-del-10</source>
          , Aidministrator Nederland bv,
          <year>October 2001</year>
          . URL: http://sesame.aidministrator.nl/publications/del10.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Draxler</surname>
          </string-name>
          .
          <article-title>Accessing relational and NF2 databases through database set predicates</article-title>
          .
          In Geraint A. Wiggins, Chris Mellish, and Tim Duncan, editors,
          <source>ALPUK91: Proceedings of the 3rd UK Annual Conference on Logic Programming</source>
          , Edinburgh 1991
          , Workshops in Computing, pages
          <fpage>156</fpage>
          -
          <lpage>173</lpage>
          . Springer-Verlag,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Grosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Eriksson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Fergerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Gennari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Tu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Musen</surname>
          </string-name>
          .
          <article-title>Knowledge modeling at the millennium: The design and evolution of Protégé-2000</article-title>
          . In 12th Banff Workshop on Knowledge Acquisition, Modeling, and Management
          . Banff, Alberta,
          <year>1999</year>
          . URL: http://smi.stanford.edu/projects/protege (access date: 18 December 2000).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>G.</given-names>
            <surname>Karvounarakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Christophides</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Plexousakis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Alexaki</surname>
          </string-name>
          .
          <article-title>Querying community web portals</article-title>
          . URL: http://www.ics.forth.gr/proj/isst/RDF/RQL/rql.html.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Brian</given-names>
            <surname>McBride</surname>
          </string-name>
          .
          <article-title>Jena: Implementing the RDF model and syntax specification</article-title>
          .
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>G.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>WordNet: A lexical database for English</article-title>
          .
          <source>Comm. ACM</source>
          ,
          <volume>38</volume>
          (
          <issue>11</issue>
          ),
          <year>November 1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Bijan</given-names>
            <surname>Parsia</surname>
          </string-name>
          .
          <article-title>RDF applications with Prolog</article-title>
          .
          <source>O'Reilly XML.com</source>
          ,
          <year>2001</year>
          . http://www.xml.com/pub/a/2001/07/25/prologrdf.html.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>T.</given-names>
            <surname>Peterson</surname>
          </string-name>
          .
          <article-title>Introduction to the Art and Architecture Thesaurus</article-title>
          . Oxford University Press,
          <year>1994</year>
          . See also: http://www.getty.edu/research/tools/vocabulary/aat/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>I. V.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          , Prasad Rao, Konstantinos Sagonas, Terrance Swift, and
          <string-name>
            <given-names>David S.</given-names>
            <surname>Warren</surname>
          </string-name>
          .
          <article-title>Efficient tabling mechanisms for logic programs</article-title>
          . In Leon Sterling, editor,
          <source>Proceedings of the 12th International Conference on Logic Programming</source>
          , pages
          <fpage>697</fpage>
          -
          <lpage>714</lpage>
          , Cambridge, June 13-18
          <year>1995</year>
          . MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A. Th.</given-names>
            <surname>Schreiber</surname>
          </string-name>
          .
          <article-title>The web is not well-formed</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          , March/April
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A. Th.</given-names>
            <surname>Schreiber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dubbeldam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wielemaker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Wielinga</surname>
          </string-name>
          .
          <article-title>Ontology-based photo annotation</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          ,
          <volume>16</volume>
          (
          <issue>3</issue>
          ):
          <fpage>66</fpage>
          -
          <lpage>74</lpage>
          , May/June 2001.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <article-title>ULAN: Union List of Artist Names</article-title>
          . The Getty Foundation
          . URL: http://www.getty.edu/research/tools/vocabulary/ulan/,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>