=Paper= {{Paper |id=None |storemode=property |title=Benchmarking the Performance of Linked Data Translation Systems |pdfUrl=https://ceur-ws.org/Vol-937/ldow2012-paper-09.pdf |volume=Vol-937 |dblpUrl=https://dblp.org/rec/conf/www/RiveroSBR12 }} ==Benchmarking the Performance of Linked Data Translation Systems== https://ceur-ws.org/Vol-937/ldow2012-paper-09.pdf
Benchmarking the Performance of Linked Data Translation Systems

Carlos R. Rivero∗
University of Sevilla, Spain
carlosrivero@us.es

Andreas Schultz
Freie Universität Berlin, Germany
a.schultz@fu-berlin.de

Christian Bizer
Freie Universität Berlin, Germany
christian.bizer@fu-berlin.de

David Ruiz
University of Sevilla, Spain
druiz@us.es

ABSTRACT
Linked Data sources on the Web use a wide range of different vocabularies to represent data describing the same type of entity. For some types of entities, like people or bibliographic records, common vocabularies have emerged that are used by multiple data sources. But even for representing data of these common types, different user communities use different competing common vocabularies. Linked Data applications that want to understand as much data from the Web as possible thus need to overcome vocabulary heterogeneity and translate the original data into a single target vocabulary. To support application developers with this integration task, several Linked Data translation systems have been developed. These systems provide languages to express declarative mappings that are used to translate heterogeneous Web data into a single target vocabulary. In this paper, we present a benchmark for comparing the expressivity as well as the runtime performance of data translation systems. Based on a set of examples from the LOD Cloud, we developed a catalog of fifteen data translation patterns and surveyed how often these patterns occur in the example set. Based on these statistics, we designed LODIB (Linked Open Data Integration Benchmark), which aims to reflect the real-world heterogeneities that exist on the Web of Data. We apply the benchmark to test the performance of two data translation systems, Mosto and LDIF, and compare the performance of the systems with the SPARQL 1.1 CONSTRUCT query performance of the Jena TDB RDF store.

Categories and Subject Descriptors
D.2.12 [Interoperability]: Data mapping;
H.2.5 [Heterogeneous Databases]: Data translation

∗Work partially done whilst visiting Freie Universität Berlin.

Copyright is held by the author/owner(s).
LDOW2012, April 16, 2012, Lyon, France.

1. INTRODUCTION
The Web of Linked Data is growing rapidly and covers a wide range of different domains, such as media, life sciences, publications, governments, or geographic data [4, 13]. Linked Data sources use vocabularies to publish their data, which consist of more or less complex data models that are represented using RDFS or OWL [13]. Some data sources try to reuse as much from existing vocabularies as possible in order to ease the integration of data from multiple sources [4]. Other data sources use completely proprietary vocabularies to represent their content or use a mixture of common and proprietary terms [7].

Due to these facts, there exists heterogeneity amongst vocabularies in the context of Linked Data. According to [5], on the one hand, 104 out of the 295 data sources in the LOD Cloud only use proprietary vocabularies. On the other hand, the rest of the sources (191) use common vocabularies to represent some of their content, but also often extend and mix common vocabularies with proprietary terms to represent other parts of their content. Some examples of the use of common vocabularies are the following: regarding publications, 31.19% of the data sources use the Dublin Core vocabulary, 4.75% use the Bibliographic Ontology, and 2.03% use the Functional Requirements for Bibliographic Records; in the context of people information, 27.46% of the data sources use the Friend of a Friend vocabulary, 3.39% use the vCard ontology, and 3.39% use the Semantically-Interlinked Online Communities ontology; finally, regarding geographic data sets, 8.47% of the data sources use the Geo Positioning vocabulary, and 2.03% use the GeoNames ontology.

To solve these heterogeneity problems, mappings are used to perform data translation, i.e., exchanging data from the source data set to the target data set [19, 21]. Data translation, a.k.a. data exchange, is a major research topic in the database community, and it has been studied for relational, nested relational, and XML data models [3, 10, 11]. Current approaches to perform data translation rely on two types of mappings that are specified at different levels, namely: correspondences (modelling level) and executable mappings (implementation level). Correspondences are represented as declarative mappings that are then combined into executable mappings, which consist of queries that are executed over a source and translate the data into a target [7, 18, 19].

Table 1: Prefixes of the sample patterns
Prefix      URI
rdfs:       http://www.w3.org/2000/01/rdf-schema#
xsd:        http://www.w3.org/2001/XMLSchema#
fb:         http://rdf.freebase.com/ns/
dbp:        http://dbpedia.org/ontology/
lgdo:       http://linkedgeodata.org/ontology/
gw:         http://govwild.org/ontology/
po:         http://purl.org/ontology/po/
lgdp:       http://linkedgeodata.org/property/
movie:      http://data.linkedmdb.org/resource/movie/
db:         http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/
skos:       http://www.w3.org/2004/02/skos/core#
foaf:       http://xmlns.com/foaf/spec/
grs:        http://www.georss.org/georss/

In the context of executable mappings, there exist a number of approaches to define and also automatically generate them. Qin et al. [18] devised a semi-automatic approach to generate executable mappings that relies on data mining; Euzenat et al. [9] and Polleres et al. [17] presented preliminary ideas on the use of executable mappings in SPARQL to perform data translation; Parreiras et al. [16] presented a Model-Driven Engineering approach that automatically transforms handcrafted mappings in MBOTL (a mapping language by means of which users can express executable mappings) into executable mappings in SPARQL or Java; Bizer and Schultz [7] proposed a SPARQL-like mapping language called R2R, which is designed to publish expressive, named executable mappings on the Web, and to flexibly combine partial executable mappings to perform data translation. Finally, Rivero et al. [19] devised an approach called Mosto to automatically generate executable mappings in SPARQL based on constraints of the source and target data models, and also correspondences between these data models. In addition, translating amongst vocabularies by means of mappings is one of the main research challenges in the context of Linked Data, and it is expected that research efforts on mapping approaches will increase in the coming years [4]. Consequently, a benchmark to test data translation systems in this context seems highly relevant.

To the best of our knowledge, there exist two benchmarks to test data translation systems: STBenchmark and DTSBench. STBenchmark [1] provides eleven patterns that occur frequently when integrating nested relational models, which makes at least some of the patterns difficult to extrapolate to our context due to a number of inherent differences between nested relational models and the graph-based RDF data model that is used in the context of Linked Data [14]. DTSBench [21] allows testing data translation systems in the context of Linked Data using synthetic data translation tasks only, without taking real-world data from Linked Data sources into account.

In this paper, we present a benchmark to test data translation systems in the context of Linked Data. Our benchmark provides a catalogue of fifteen data translation patterns, each of which is a common data translation problem in the context of Linked Data. To motivate that these patterns are common in practice, we have analyzed 84 random examples of data translation in the Linked Open Data Cloud. After this analysis, we have studied the distribution of the patterns in these examples, and have designed LODIB, the Linked Open Data Integration Benchmark, to reflect this real-world heterogeneity that exists on the Web of Data.

The benchmark provides a data generator that produces three different synthetic data sets, which reflect the pattern distribution. These source data sets need to be translated into a single target vocabulary by the system under test. This generator allows us to scale source data and it also automatically generates the expected target data, i.e., the data obtained after performing data translation over the source data. The data sets reflect the same e-commerce scenario that we already used for the BSBM benchmark [6].

LODIB is designed to measure the following: 1) Expressivity: the number of mapping patterns that can be expressed in a specific data translation system; 2) Time performance: the time needed to perform the data translation, i.e., loading the source file, executing the mappings, and serializing the result into a target file. In this context, LODIB provides a validation tool that examines if the source data is represented correctly in the target data set: we perform the data translation task in a particular scenario using LODIB, and the target data that we obtain are the expected target data when performing data translation using a particular system.

This paper is organized as follows: Section 2 presents the mapping patterns of our benchmark; in Section 3, we describe the 84 data translation examples from the LOD Cloud that we have analyzed, and the counting of the occurrences of mapping patterns in the examples; Section 4 deals with the design of our benchmark; Section 5 describes the evaluation of our benchmark with two data translation systems (Mosto and LDIF), and compares their performance with the SPARQL 1.1 performance of the Jena TDB RDF store; Section 6 describes the related work on benchmarking in the Linked Data context; and, finally, Section 7 recaps our main conclusions regarding LODIB.

2. MAPPING PATTERNS
A mapping pattern represents a common data translation problem that should be supported by any data translation system in the context of Linked Data. Our benchmark provides a catalogue of fifteen mapping patterns that we have repeatedly discovered as we analyzed the heterogeneity between different data sources in the Linked Open Data Cloud. In the rest of this section, we present these patterns in detail. Note that for vocabulary terms in concrete examples we use the prefixes shown in Table 1.

Rename Class (RC). Every source instance of a class C is reclassified into the same
    instance of the renamed class C’ in the target. An example of this pattern is the
    renaming of class fb:location.citytown in Freebase into class dbp:City in DBpedia.

Rename Property (RP). Every source instance of a property P is transformed into the same
    instance
    of the renamed property P’ in the target. An example is the renaming of property
    dbp:elevation in DBpedia into property lgdo:ele in LinkedGeoData, in which both
    properties represent the elevation of a geographic location.

Rename Class based on Property (RCP). This pattern is similar to the Rename Class
    pattern but it is based on the existence of a property. Every source instance of a
    class C is reclassified into the same instance of the renamed class C’ in the target
    if, and only if, the source instance is related with another instance of a property P.
    An example is the renaming of class dbp:Person in DBpedia into class
    fb:people.deceased_person in Freebase if, and only if, an instance of dbp:Person is
    related with an instance of property dbp:deathDate, i.e., if a deceased person in
    Freebase exists, there must exist a person with a date of death in DBpedia.

Rename Class based on Value (RCV). This pattern is similar to the previous pattern, but
    the property instance must have a specific value v to rename the source instance. An
    example is the renaming of class gw:Person in GovWILD into class
    fb:government.politician in Freebase if, and only if, each instance of gw:Person is
    related with an instance of property gw:profession and its value is the literal
    “politician”. This means that only people whose profession is politician in GovWILD
    are translated into politicians in Freebase.

Reverse Property (RvP). This pattern is similar to the Rename Property pattern, but the
    property instance in the target is reversed, i.e., the subject is interchanged with
    the object. An example is the reverse of property fb:airports_operated in Freebase
    into property dbp:operator in DBpedia, in which the former relates an operator with
    an airport, and the latter relates an airport with an operator.

Resourcesify (Rsc). Every source instance of a property P is split into a target instance
    of property P’ and an instance of property Q. Both instances are connected using a
    fresh resource, which establishes the original connection of the instance of
    property P. Note that the new target resource must be unique and consistent with the
    definition of the target vocabulary. An example is the creation of a new URI or blank
    node when translating property dbp:runtime in DBpedia into po:duration in BBC by
    creating a new instance of property po:version.

Deresourcesify (DRsc). Every source instance of a property P is renamed into a target
    instance of property P’ if, and only if, P is related to another source instance of a
    property Q, that is, both instances use the same resource. In this case, the source
    needs more instances than the target to represent the same information. An example of
    this pattern is that an airport in DBpedia is related with its city served by
    property dbp:city, and the name of this city is given as value of rdfs:label. This is
    transformed into property lgdp:city_served in LinkedGeoData, which relates an airport
    with its city served (as literal).

1:1 Value to Value (1:1). The value of every source instance of a property P must be
    transformed by means of a function into the value of a target instance of property
    P’. An example is dbp:runtime in DBpedia, which is transformed into movie:runtime in
    LinkedMDB; the source is expressed in seconds and the target in minutes.

Value to URI (VtU). Every source instance of a property P is translated into a target
    instance of property P’ and the source object value is transformed into a URI in the
    target. An example of this pattern is property grs:point in DBpedia, which is
    translated into property fb:location.location.geolocation in Freebase, and the value
    of every instance of grs:point is transformed into a URI.

URI to Value (UtV). This pattern is similar to the previous one but the source instance
    relates to a URI that is transformed into a literal value in the target. An example
    of the URI to Value pattern is property dbp:wikiPageExternalLink in DBpedia that is
    translated into property fb:common.topic.official_website in Freebase, and the URI of
    the source instance is translated to a literal value in the target.

Change Datatype (CD). Every source instance of a datatype property P whose type is TYPE
    is renamed into the same target instance of property P’ whose type is TYPE’. An
    example of this pattern is property fb:people.person.date_of_birth in Freebase whose
    type is xsd:dateTime, which is translated into target property dbp:birthDate in
    DBpedia whose type is xsd:date.

Add Language Tag (ALT). In this pattern, every source instance of a property P is
    translated into a target instance of property P’ and a new language tag TAG is added
    to the target literal. An example of this pattern is that db:genericName in Drug Bank
    is renamed into property rdfs:label in DBpedia and a new language tag “@en” is added.

Remove Language Tag (RLT). Every source instance of a property P is translated into a
    target instance of property P’ and the source instance has a language tag TAG that is
    removed. An example is skos:altLabel in DataGov Statistics, which has a language tag
    “@en”; it is translated into skos:altLabel in Ordnance Survey and the language tag is
    removed.

N:1 Value to Value (N:1). A number of source instances of properties P1, P2, ..., Pn are
    translated into a single target instance of property P’, and the value of the target
    instance is computed by means of a function over the values of the source instances.
    An example of this pattern is that we concatenate the values of properties
    foaf:givenName and foaf:surname in DBpedia into property fb:type.object.name in
    Freebase.

Aggregate (Agg). In this pattern, we count the number of source instances of property P,
    which is translated into a target instance of property Q. An example is property
    fb:metropolitan_transit.transit_system.transit_lines in Freebase whose values are
    aggregated into a single value of dbp:numberOfLines for each city in DBpedia.
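To make the pattern definitions above more concrete, the following is a minimal sketch of four of them (RC, RvP, 1:1, and Agg) over triples held as plain Python tuples. It is an illustration only and not how Mosto, LDIF, or any other system implements the patterns: the function names, the `ex:` resources, and the shortened property name `fb:transit_lines` are assumptions introduced for this example.

```python
from collections import Counter

RDF_TYPE = "rdf:type"  # stands in for the "a" keyword used in SPARQL-like notation


def rename_class(triples, src_class, tgt_class):
    """RC: ?x a C  ->  ?x a C'."""
    return [(s, RDF_TYPE, tgt_class)
            for s, p, o in triples
            if p == RDF_TYPE and o == src_class]


def reverse_property(triples, src_prop, tgt_prop):
    """RvP: ?x P ?y  ->  ?y P' ?x (subject and object swapped)."""
    return [(o, tgt_prop, s) for s, p, o in triples if p == src_prop]


def value_to_value(triples, src_prop, tgt_prop, fn):
    """1:1: ?x P ?y  ->  ?x P' f(?y)."""
    return [(s, tgt_prop, fn(o)) for s, p, o in triples if p == src_prop]


def aggregate(triples, src_prop, tgt_prop):
    """Agg: ?x P ?y  ->  ?x Q count(?y), counted per subject."""
    counts = Counter(s for s, p, o in triples if p == src_prop)
    return [(s, tgt_prop, n) for s, n in sorted(counts.items())]


# Hypothetical source data; the ex: resources are made up for illustration,
# and fb:transit_lines is a shortened stand-in for the full Freebase property.
source = [
    ("ex:berlin", RDF_TYPE, "fb:location.citytown"),
    ("ex:operator1", "fb:airports_operated", "ex:airport1"),
    ("ex:film1", "dbp:runtime", 5400),  # runtime in seconds
    ("ex:berlin", "fb:transit_lines", "ex:line1"),
    ("ex:berlin", "fb:transit_lines", "ex:line2"),
]

target = (
    rename_class(source, "fb:location.citytown", "dbp:City")
    + reverse_property(source, "fb:airports_operated", "dbp:operator")
    + value_to_value(source, "dbp:runtime", "movie:runtime", lambda sec: sec / 60)
    + aggregate(source, "fb:transit_lines", "dbp:numberOfLines")
)

for triple in target:
    print(triple)
```

Each function only reads the source triples and returns newly constructed target triples, which mirrors the source/target split of a SPARQL CONSTRUCT query; a real system would additionally have to handle URIs, typed literals, and language tags.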
Table 2: Mapping patterns in the LOD Cloud

Code   Source triples               Target triples
RC     ?x a C                       ?x a C’
RP     ?x P ?y                      ?x P’ ?y
RCP    ?x a C                       ?x a C’
       FILTER EXISTS {
       {?x P ?y} UNION
       {?y P ?x} }
RCV    ?x a C                       ?x a C’
       ?x P v
RvP    ?x P ?y                      ?y P’ ?x
Rsc    ?x P ?y                      ?x Q ?z
                                    ?z P’ ?y
DRsc   ?x Q ?z                      ?x P’ ?y
       ?z P ?y
1:1    ?x P ?y                      ?x P’ f(?y)
VtU    ?x P ?y                      ?x P’ toURI(?y)
UtV    ?x P ?y                      ?x P’ toLiteral(?y)
CD     ?x P ?y^^TYPE                ?x P’ ?y^^TYPE’
ALT    ?x P ?y                      ?x P’ ?y@TAG
RLT    ?x P ?y@TAG                  ?x P’ ?y
N:1    ?x P1 ?v1                    ?x P’ f(?v1, ..., ?vn)
       ...
       ?x Pn ?vn
Agg    ?x P ?y                      ?x Q count(?y)

Finally, we present a summary of these mapping patterns in Table 2. The first column of this table stands for the code of each pattern; the second and third columns establish the triples to be retrieved in the source and the triples to be constructed in the target using a SPARQL-like notation. Note that properties are represented as P and Q, classes as C, constant values as v, language tags as TAG, and data types as TYPE.

3. LODIB GROUNDING
In order to base the LODIB Benchmark on realistic real-world distributions of these mapping patterns, we analyzed 84 data translation examples from the LOD Cloud and counted the occurrences of mapping patterns in these examples. First, we selected different Linked Data sources by exploring the LOD data set catalog maintained on CKAN¹. The criterion we followed was to choose sources that comprise a great number of owl:sameAs links with other Linked Data sources, i.e., more than 25,000. Furthermore, we tried to select sources from the major domains represented in the LOD Cloud. Therefore, the selected Linked Data sources are the following: ACM (RKB Explorer), DBLP (RKB Explorer), Dailymed, Drug Bank, DataGov Statistics, Ordnance Survey, DBpedia, GeoNames, Linked GeoData, LinkedMDB, New York Times, Music Brainz, Sider, GovWILD, ProductDB, and OpenLibrary. Note that, for each domain of the LOD Cloud, there are at least two Linked Data sources that contribute to our statistics except for the domain of user-generated content.

¹ http://thedatahub.org/group/lodcloud

After selecting these sources, we randomly selected 42 examples, each of which comprises a pair of instances that are connected by an owl:sameAs link. For each of these examples, we analyzed both directions: one instance is the source and the other instance is the target, and vice versa. Therefore, the total number of examples we analyzed was 84. Then, we manually counted the number of mapping patterns that are needed to translate between the representations of the instances (neighboring instances were also considered to detect more complex structural mismatches). These statistics are publicly available at [22].

In the next step, we computed the averages of our mapping patterns grouped by the pair of source and target data set. To compute them, in some cases, we analyzed the translation of one single instance since the data set of the Linked Data source comprises only a couple of classes, such as Drug Bank or Ordnance Survey. In other cases, we analyzed more than one instance since the data set comprises a large number of classes, such as DBpedia or Freebase.

Table 3 presents the statistics of the mapping patterns that we have found in the LOD Cloud. The first two columns stand for the source and target Linked Data data sets; the following columns contain the averages of each mapping pattern according to the source and the target, i.e., we count the occurrences of mapping patterns in a number of examples and compute the average. Note that, for certain data sets, we analyzed several examples of the same type; therefore, the final numbers of these columns are real numbers (not integers). Finally, the last column contains the total number of instances that we analyzed for each pair of Linked Data data sets.

On the one hand, Rename Class and Rename Property mapping patterns appear in the vast majority of the analyzed examples, since these patterns are very common in practice. On the other hand, there are some patterns that are not so common, e.g., Value to URI and URI to Value patterns appear only once in all analyzed examples (between DBpedia and Drug Bank). Table 4 presents the average occurrences of the LODIB mapping patterns over all analyzed examples.

4. LODIB DESIGN
Based on the previously described statistics, we have designed the LODIB Benchmark. The benchmark consists of three different source data sets that need to be translated by the system under test into a single target vocabulary. The topic of the data sets is the same e-commerce data set that we already used for the BSBM Benchmark [6]. The data sets describe products, reviews, people and some more lightweight classes, such as product price, using different source vocabularies. To translate the representation of an instance in the source data sets into the target vocabulary, data translation systems need to apply several of the presented mapping patterns. The descriptions of these data sets are publicly available at the LODIB homepage [22].

These data sets take the previously computed averages of Table 4 into account by multiplying them by a constant (11) and dividing each one by another constant (3, the total number of data translation tasks, i.e., from each source data set to the target data set). As a result, each of the three data translation tasks comprises a number of mapping patterns, and we present the numbers in Table 5, in which the total number of mapping patterns for each task is 18.
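The averaging and scaling steps described above can be illustrated with a short sketch. It rests on two assumptions that the text does not confirm: that the overall average of a pattern is the per-pair average weighted by the number of analyzed examples, and that no rounding is applied before the scaled values are distributed over the tasks. The function names are hypothetical, and the input is only a three-row subset of the RC column of Table 3, so the printed numbers are not the values of Table 4 or Table 5.

```python
def weighted_average(rows):
    """Combine per-pair averages, weighting each by its number of examples."""
    total_examples = sum(n for _, n in rows)
    return sum(avg * n for avg, n in rows) / total_examples


def scale(average, factor=11, tasks=3):
    """Scaling step of Section 4: multiply by 11, divide by the 3 tasks."""
    return average * factor / tasks


# (RC average, number of examples) for three DBpedia rows of Table 3:
# DBpedia -> Freebase, DBpedia -> Geonames, DBpedia -> Linked GeoData.
rc_rows = [(2.14, 14), (1.00, 3), (1.00, 8)]

rc_average = weighted_average(rc_rows)
print(f"RC average over this subset: {rc_average:.2f}")
print(f"Scaled RC value per task:    {scale(rc_average):.2f}")
```

Running the same two functions over every pattern column and all rows would give one scaled value per pattern; how the authors turned such values into the integer counts that sum to 18 per task is not stated here.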
                                                  Table 3: Mapping patterns in Linked Data sources

Source            Target            RC     RP     RCP    RCV     RvP    Rsc    DRsc    1:1    VtU    UtV    CD     ALT    RLT    N:1    Agg    Total
ACM               DBLP              0.00   0.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
Dailymed          Drug Bank         1.00   0.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
DataGov Stats     Ordnance Survey   1.00   0.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   1.00   0.00   0.00   0.00   1
DBLP              ACM               0.00   0.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
DBpedia           Freebase          2.14   8.57   0.64   0.00    2.21   2.29   0.00    0.57   0.00   0.00   1.14   0.00   0.00   0.14   0.07   14
DBpedia           Geonames          1.00   3.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   3
DBpedia           Linked GeoData    1.00   3.50   0.00   0.00    0.00   0.00   1.50    0.13   0.00   0.00   2.25   0.00   0.00   0.00   0.00   8
DBpedia           LinkedMDB         1.00   5.50   0.33   0.00    0.33   0.00   0.00    0.33   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6
DBpedia           Drug Bank         1.00   1.00   0.00   0.00    0.00   0.00   0.00    1.00   1.00   0.00   0.00   0.00   3.00   0.00   0.00   1
DBpedia           New York Times    0.00   0.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   1.00   0.00   1
DBpedia           Music Brainz      1.00   0.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
Drug Bank         DBpedia           1.00   1.00   0.00   0.00    0.00   0.00   0.00    1.00   0.00   1.00   0.00   3.00   0.00   1.00   0.00   1
Drug Bank         Freebase          1.00   1.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
Drug Bank         Sider             1.00   1.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
Drug Bank         Dailymed          1.00   0.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
Freebase          DBpedia           2.14   8.57   0.29   0.07    2.21   0.00   2.29    0.79   0.00   0.00   1.14   0.00   0.00   0.00   0.14   14
Freebase          GovWILD           1.00   4.50   0.00   0.00    0.00   0.00   2.00    1.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   2
Freebase          Drug Bank         1.00   1.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
GeoNames          DBpedia           1.00   3.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
GovWILD           Freebase          1.00   4.50   0.00   0.50    0.00   2.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.50   0.00   2
Linked GeoData    DBpedia           1.00   3.50   0.88   0.75    0.00   1.50   0.00    0.13   0.00   0.00   2.25   0.00   0.00   0.00   0.00   8
LinkedMDB         DBpedia           1.00   5.50   0.00   0.00    0.33   0.00   0.00    0.33   0.00   0.00   0.00   0.00   0.00   0.00   0.00   6
Music Brainz      DBpedia           1.00   0.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
New York Times    DBpedia           0.00   0.00   0.00   0.00    0.00   0.00   0.00    3.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
OpenLibrary       ProductDB         0.00   0.00   0.00   0.00    0.00   1.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
Ordnance Survey   DataGov Stats     1.00   0.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   1.00   0.00   0.00   1
ProductDB         OpenLibrary       0.00   0.00   0.00   0.00    0.00   0.00   1.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
Sider             Drug Bank         1.00   1.00   0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1
                                     Table 4: Average occurrences of the mapping patterns

      RC      RP     RCP      RCV      RvP      Rsc     DRsc      1:1      VtU     UtV     CD      ALT    RLT     N:1    Agg
      0.87    2.01   0.08     0.05     0.18     0.24    0.24      0.30     0.04    0.04    0.24    0.14   0.14    0.09   0.01

                              Table 5: Number of mapping patterns in the data translation tasks

  Task       RC      RP   RCP      RCV      RvP        Rsc   DRsc        1:1   VtU     UtV        CD   ALT    RLT     N:1    Agg
  Task 1     3       7    1        0        1          1     1           1     0       1          1    0      1       0      0
  Task 2     3       7    0        1        0          1     1           1     1       0          1    1      0       1      0
  Task 3     3       7    0        0        1          1     1           1     0       0          1    1      1       0      1


We have implemented a data generator to populate and scale the three source data sets
that we have specified in the previous section, which is publicly available at the
LODIB homepage [22]. In the data generator, we defined a number of data generation
rules, and the generated data are scaled based on the number of product instances that
each data set contains. In our implementation, we use an extension of the language used
in [8], which allows defining particular value generation rules for basic types, such
as xsd:string or xsd:date. In addition, missing properties often occur in the context
of the Web of Data; therefore, we also provide 44 statistical distributions in our
implementation to randomly distribute properties, including Uniform, Normal,
Exponential, Zipf, Pareto and empirical distributions, to mention a few.

In this section, we provide examples of how a data translation system needs to
translate the data from the source to the target vocabulary regarding the mapping
patterns in the three data translation tasks. Specifically, Figure 1 presents a number
of source triples that are translated into a number of target triples. Note that we use
prefixes src1:, src2:, src3: and tgt: for referring to the vocabularies of the three
source data sets and the single target data set; and src1-data:, src2-data:,
src3-data: and tgt-data: for referring to the source and target data. These examples
are the following:

Rename Class (RC). Class src1:Product needs to be renamed into class tgt:Product,
    e.g., src1-data:Canon-Ixus-20010 product instance.

Rename Property (RP). Property src1:name needs to be renamed into property rdfs:label,
    e.g., the name of src1-data:Canon-Ixus-20010 product instance.

Rename Class based on Property (RCP). In this case, class src1:Person needs to be
    renamed into class tgt:Reviewer if and only if property src1:author exists, e.g.,
    src1-data:Smith-W person instance.

Rename Class based on Value (RCV). In this example, class src2:Product needs to be
    renamed into class tgt:OutdatedProduct if and only if property src2:outdated exists
    and has value “Yes”, e.g., src2-data:HTC-Wildfire-S product instance.

Reverse Property (RvP). In this example, property src1:author is reversed into property
    tgt:author, e.g., src1-data:Review-CI-001 review instance and src1-data:Smith-W
    person instance are related and reversed in the target.

Resourcesify (Rsc). Property src1:birthDate needs to be renamed into property
    tgt:birthDate and a new target instance of property tgt:birth is needed, e.g., the
    date of birth of src1-data:Smith-W person instance.

Deresourcesify (DRsc). Property src2:revText needs to be renamed into property tgt:text
    if and only if the instance of property src2:revText is related to another source
    instance of property src2:hasText, e.g., the text of src2-data:Review-HTC-W-S
    review instance.

1:1 Value to Value (1:1). Property src2:price needs to be renamed into property
    tgt:productPrice, and the value must be transformed by means of function
    usDollarsToEuros, since the source price is represented in US dollars and the
    target in Euros, e.g., the price of src2-data:HTC-Wildfire-S product instance.

Value to URI (VtU). In this example, we need to rename property src1:personHomepage
    into property tgt:personHomepage, and the values of the source instances are
    transformed into URIs in the target, e.g., the homepage of src1-data:Smith-W person
    instance.

URI to Value (UtV). In this example, we need to rename property src2:productHomepage
    into property tgt:productHomepage, and the URIs of the source instances are
    transformed into values in the target, e.g., the homepage of
    src2-data:HTC-Wildfire-S product instance.

Change Datatype (CD). Property dc:date in the first source needs to be translated into
    dc:date, and its type is transformed from xsd:string into xsd:date, e.g., the date
    of src1-data:Review-CI-001 review instance.

Add Language Tag (ALT). Property src2:mini-cv needs to be renamed into property tgt:bio
    and a new language tag “@en” is added in the target, e.g., the CV of
    src2-data:Doe-J person instance.

Remove Language Tag (RLT). Property src1:revText needs to be renamed into property
    tgt:text and the language tag of the source is removed, e.g., the text of
    src1-data:Review-CI-001 review instance.

N:1 Value to Value (N:1). Properties foaf:firstName and foaf:surname in the second
    source need to be translated into property tgt:name, and their values are
    concatenated to compose the target value, e.g., the first name and surname of
    src2-data:Doe-J person instance.
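
To make some of the patterns listed so far more concrete, the following minimal sketch
applies RC, RP, RCV, 1:1 and N:1 to a handful of triples taken from Figure 1. It is an
illustration under stated assumptions, not the way any of the benchmarked systems
expresses mappings: triples are modelled as plain Python tuples, the function translate
is hypothetical, and the conversion rate USD_TO_EUR is an assumed value chosen only so
that the price example matches the values shown in Figure 1.

```python
RDF_TYPE = "rdf:type"
USD_TO_EUR = 0.763  # assumed rate, chosen only to match the Figure 1 example

# A few source triples from Figure 1, written as (subject, predicate, object).
SOURCE_TRIPLES = [
    ("src1-data:Canon-Ixus-20010", RDF_TYPE, "src1:Product"),
    ("src1-data:Canon-Ixus-20010", "src1:name", "Canon Ixus"),
    ("src2-data:HTC-Wildfire-S", RDF_TYPE, "src2:Product"),
    ("src2-data:HTC-Wildfire-S", "src2:outdated", "Yes"),
    ("src2-data:HTC-Wildfire-S", "src2:price", 199.99),
    ("src2-data:Doe-J", "foaf:firstName", "John"),
    ("src2-data:Doe-J", "foaf:surname", "Doe"),
]


def translate(triples):
    """Apply a hypothetical subset of the mapping patterns to source triples."""
    triples = list(triples)
    # Subjects whose src2:outdated property exists and has value "Yes" (for RCV).
    outdated = {s for s, p, o in triples if p == "src2:outdated" and o == "Yes"}
    # Values that are combined by the N:1 pattern.
    first_names = {s: o for s, p, o in triples if p == "foaf:firstName"}
    surnames = {s: o for s, p, o in triples if p == "foaf:surname"}

    target = []
    for s, p, o in triples:
        if p == RDF_TYPE and o == "src1:Product":
            # Rename Class (RC)
            target.append((s, RDF_TYPE, "tgt:Product"))
        elif p == RDF_TYPE and o == "src2:Product" and s in outdated:
            # Rename Class based on Value (RCV)
            target.append((s, RDF_TYPE, "tgt:OutdatedProduct"))
        elif p == "src1:name":
            # Rename Property (RP)
            target.append((s, "rdfs:label", o))
        elif p == "src2:price":
            # 1:1 Value to Value: rename and convert US dollars to Euros
            target.append((s, "tgt:productPrice", round(o * USD_TO_EUR, 2)))
    # N:1 Value to Value: concatenate first name and surname
    for s in sorted(first_names.keys() & surnames.keys()):
        target.append((s, "tgt:name", f"{first_names[s]} {surnames[s]}"))
    return target


TARGET_TRIPLES = translate(SOURCE_TRIPLES)
for triple in TARGET_TRIPLES:
    print(triple)
```

Source triples that no rule matches, such as src2:outdated, simply do not appear in the
target, which mirrors the target triples of Figure 1.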
   src1-data:Canon-Ixus-20010                                      src1-data:Canon-Ixus-20010
       a                    src1:Product ;                             a                   tgt:Product ;
       src1:name            "Canon Ixus"^^xsd:string .                 rdfs:label          "Canon Ixus"^^xsd:string .
   src2-data:HTC-Wildfire-S                                        src2-data:HTC-Wildfire-S
       a                    src2:Product ;                             a                   tgt:OutdatedProduct ;
       src2:outdated        "Yes"^^xsd:string ;                        tgt:productPrice    "152.59"^^xsd:double ;
       src2:price           "199.99"^^xsd:double ;                     tgt:productHomepage "htc.com/"^^xsd:string .
       src2:productHomepage  .
   src3-data:VPCS                                                  src3-data:VPCS
       a                    src3:Product ;                             a                    tgt:Product ;
       src3:hasReview       src3-data:Review-VPCS-01 ;                 tgt:totalReviews     "3"^^xsd:integer .
       src3:hasReview       src3-data:Review-VPCS-02 ;
       src3:hasReview       src3-data:Review-VPCS-03 .
   src1-data:Review-CI-001                                         src1-data:Review-CI-001
       a                    src1:Review ;                              a                   tgt:Review ;
       src1:author          src1-data:Smith-W ;                        dc:date             "01/10/2011"^^xsd:date ;
       dc:date              "01/10/2011"^^xsd:string ;                 tgt:text            "This camera is awesome!" .
       src1:revText         "This camera is awesome!"@en .
   src2-data:Review-HTC-W-S                                        src2-data:Review-HTC-W-S
       a                    src2:Review ;                              a                   tgt:Review ;
       src2:hasText         src2-data:Review-HTC-W-S-Text .            tgt:text            "Great phone"^^xsd:string .
   src2-data:Review-HTC-W-S-Text
       a                    src2:ReviewText ;                      src1-data:Smith-W
       src2:revText         "Great phone"^^xsd:string .                a                   tgt:Reviewer ;
   src1-data:Smith-W                                                   tgt:author          src1-data:Review-CI-001 ;
       a                    src1:Person ;                              tgt:birthDate       tgt-data:Smith-W-BirthDate .
       src1:birthDate       "06/07/1979"^^xsd:date ;               tgt-data:Smith-W-BirthDate
       src1:personHomepage "wsmith.org"^^xsd:string .                  a                   tgt:Birth ;
   src2-data:Doe-J                                                     tgt:birthDate       "06/07/1979"^^xsd:date ;
       a                    src2:Person ;                              tgt:personHomepage  .
       src2:mini-cv         "Born in the US."^^xsd:string ;        src2-data:Doe-J
       foaf:firstName       "John"^^xsd:string ;                       a                   tgt:Person ;
       foaf:surname         "Doe"^^xsd:string .                        tgt:bio             "Born in the US."@en ;
                                                                       tgt:name            "John Doe"^^xsd:string .


                        (a) Source triples                                                (b) Target triples

                                             Figure 1: Sample data translation tasks.


Aggregate (Agg). We count the number of instances of source property src3:hasReview,
    and this number needs to be translated as the value of property tgt:totalReviews,
    e.g., the reviews of src3-data:VPCS product instance.

5. EXPERIMENTS
The LODIB benchmark can be used to measure two performance dimensions of a data
translation system. First, we state the expressivity of the data translation system,
that is, the number of mapping patterns that can be expressed in each system. Second,
we measure the performance by taking the time to translate all source data sets to the
target representation. For our benchmark experiment, we generated data sets in
N-Triples format containing 25, 50, 75 and 100 million triples. For each data
translation system and data set, the time is measured starting with reading the input
data set file and ending when the output data set has been completely serialized to one
or more N-Triples files.

We have applied the benchmark to test the performance of two data translation systems:

Mosto. It is a tool to automatically generate executable mappings amongst semantic-web
    ontologies [20]. It is based on an algorithm that relies on constraints, such as
    rdfs:domain, of the source and target ontologies to be integrated, and a number of
    1-to-1 correspondences between TBox ontology entities [19]. The Mosto tool also
    allows running these automatically generated executable mappings using several
    semantic-web technologies, such as Jena TDB, Jena SDB, or Oracle 11g. For our tests
    we configured Mosto to generate (Jena-specific) SPARQL Construct queries. The data
    sets were translated using these generated queries and Jena TDB (version 0.8.10).

LDIF. It is an ETL-like component for integrating data from Linked Open Data sources
    [24]. LDIF’s integration pipeline includes one module for vocabulary mapping, which
    executes mappings expressed in the R2R [7] mapping language. All the R2R mappings
    were written by hand. LDIF supports different runtime profiles that apply to
    different workloads. For the smaller data sets we used the in-memory profile, in
    which all the data is stored in memory. For the 100M data set we executed the
    Hadoop version, which was run in single-node mode (pseudo-distributed) on the
    benchmarking machine, as the in-memory version was not able to process this use
    case.

To allow other researchers to reproduce our results, the configuration and all used
mappings for Mosto and LDIF are publicly available at the LODIB homepage [22]. To set
the results of these two systems into the context of the more popular tools in the
Linked Data space, we compared the performance of both systems with the SPARQL 1.1
performance of the Jena TDB RDF store (version 0.8.10). All the mappings for Jena TDB
were expressed as SPARQL 1.1 Construct queries, which we wrote manually. For loading
the source data sets we used the more efficient tdbloader2, which also generates data
set statistics that are used by the TDB optimizer.

Table 6 gives an overview of the expressivity of the data translation systems. All
mapping patterns are expressible in SPARQL 1.1, so all the mappings are actually
executed on Jena TDB. The current implementation of the Mosto tool generates
Jena-specific SPARQL Construct queries, which could, in general, cover all the mapping
patterns. However, the goal of the Mosto tool is to automatically generate SPARQL
Construct queries by means of constraints and correspondences without user
intervention; therefore, a checkmark in Table 6 means that it was able to automatically
generate executable mappings from the source and target data sets and a number of
correspondences amongst them. Note that the Mosto tool is not able to deal with the RCP
and RCV mapping patterns since it does not allow the renaming of classes based on
conditional properties and/or values. Furthermore, it does not support the Agg mapping
pattern since it does not allow aggregating or counting properties. In R2R it is not
possible to express aggregates; therefore, no aggregation mapping was executed on LDIF.
In order to check whether the source data has been correctly and fully translated, we
developed a validation tool that examines whether the source data is represented
correctly in the target data set. Using the validation tool, we verified that all three
systems produce proper results.

To compare the performance and the scaling behaviour of the systems we have run the
benchmark on an Intel i7 950 (4 cores, 3.07GHz, 1 x SATA HDD) machine with 24GB of RAM
running Ubuntu 10.04.

Table 7 summarizes the overall runtimes for each mapping system and use case. Since
Mosto and R2R were not able to express all mapping patterns, we created three groups:
1) one that did not execute the RCV, RCP and AGG mappings, 2) one without the AGG
mapping and 3) one executing the full set of mappings. The results show that Mosto and
Jena TDB have, as expected, similar runtime performance because Mosto internally uses
Jena TDB. LDIF, on the other hand, is about twice as fast on the smallest data set and
about three times as fast on the largest data set compared to Jena TDB and Mosto. One
reason for the differences could be that LDIF highly parallelizes its workload, both in
the in-memory and in the Hadoop version.

6. RELATED WORK
The most closely related benchmarks are STBenchmark [1] and DTSBench [21]. Alexe et
al. [1] devised STBenchmark, a benchmark that is used to test data translation systems
in the context of nested relational models. This benchmark provides eleven patterns
that occur frequently in the information integration context. Unfortunately, this
benchmark is not suitable in our context since semantic-web technologies have a number
of inherent differences with respect to nested relational models [2, 14, 15, 25].

Rivero et al. [21] devised DTSBench, a benchmark to test data translation systems in
the context of semantic-web technologies that provides seven data translation patterns.
Furthermore, it provides seven parameters that allow creating a variety of synthetic,
domain-independent data translation tasks to test such systems. This benchmark is
suitable to test data translation amongst Linked Data sources; however, the patterns
that it provides are inspired by the ontology evolution and information integration
contexts, not the Linked Data context. Therefore, it allows generating synthetic tasks
based on these patterns, but not real-world Linked Data translation tasks.

There are other benchmarks in the literature that are suitable to test semantic-web
technologies. However, they cannot be applied to our context, since none of them
focuses on data translation problems, i.e., they do not provide source and target data
sets and a number of queries to perform data translation. Furthermore, these benchmarks
focus mainly on SPARQL Select queries, which are not suitable to perform data
translation, instead of on SPARQL Construct queries.

Guo et al. [12] presented LUBM, a benchmark to compare systems that support
semantic-web technologies, which provides a single ontology, a data generator algorithm
that allows creating scalable synthetic data, and fourteen SPARQL queries of the Select
type. Wu et al. [26] presented the experience of the authors when implementing an
inference engine for Oracle. Bizer and Schultz [6] presented BSBM, a benchmark to
compare the performance of SPARQL queries using native RDF stores and SPARQL-to-SQL
query rewriters. Schmidt et al. [23] presented SP2 Bench, a benchmark to test SPARQL
query management systems, which comprises both a data generator and a set of benchmark
queries in SPARQL.

7. CONCLUSIONS
Linked Data sources try to reuse as many existing vocabularies as possible in order to
ease the integration of data from multiple sources. Other data sources use completely
proprietary vocabularies to represent their content or use a mixture of common terms
and proprietary terms. Due to these facts, there exists heterogeneity amongst
vocabularies in the context of Linked Data. Data translation, which relies on
executable mappings and consists of exchanging data from a source data set to a target
data set, helps solve these heterogeneity problems.

In this paper, we presented LODIB, a benchmark to test data translation systems in the
context of Linked Data. Our benchmark provides a catalogue of fifteen data translation
patterns, each of which is a common data translation problem. Furthermore, we analyzed
84 random examples of data translation in the LOD Cloud and we studied the distribution
of the patterns in these examples. Taking these results into account, we devised three
source and one target data set based on the e-commerce domain that reflect the mapping
pattern distribution. Each source data set comprises one data translation task.
                                         Table 6: Expressivity of the mapping systems

                      RC    RP    RCP   RCV   RvP   Rsc   DRsc  1:1   VtU   UtV   CD    ALT   RLT   N:1   Agg
    Mosto queries     X     X     -     -     X     X     X     X     X     X     X     X     X     X     -
    SPARQL 1.1        X     X     X     X     X     X     X     X     X     X     X     X     X     X     X
    R2R               X     X     X     X     X     X     X     X     X     X     X     X     X     X     -

    X = pattern can be expressed, - = pattern cannot be expressed

                           Table 7: Runtimes of the mapping systems for each use case (in seconds)

                                                 25M        50M        75M       100M
    Mosto SPARQL queries / Jena TDB (1)        3,121      7,308     10,622     15,763
    R2R / LDIF (1)                             1,506      2,803      4,482     *5,718
    SPARQL 1.1 / Jena TDB (1)                  2,720      6,418     10,481     16,548
    R2R / LDIF (2)                             1,485      2,950      4,715     *5,784
    SPARQL 1.1 / Jena TDB (2)                  2,839      6,508     12,386     19,499
    SPARQL 1.1 / Jena TDB                      2,925      6,858     12,774     20,630

    * Hadoop version of LDIF as single node cluster. Out of memory for in-memory version.
    (1) without RCP, RCV and AGG mappings
    (2) without AGG mapping
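
The relative runtimes discussed in the text can be checked directly against Table 7.
The following sketch uses only the runtimes of the first group (without the RCP, RCV
and AGG mappings); the dictionary layout and the function name speedup are assumptions
made for illustration.

```python
# Runtimes in seconds from Table 7, first group (without RCP, RCV and AGG mappings).
RUNTIMES = {
    "Mosto / Jena TDB":      {"25M": 3121, "50M": 7308, "75M": 10622, "100M": 15763},
    "R2R / LDIF":            {"25M": 1506, "50M": 2803, "75M": 4482,  "100M": 5718},
    "SPARQL 1.1 / Jena TDB": {"25M": 2720, "50M": 6418, "75M": 10481, "100M": 16548},
}


def speedup(baseline, system, size):
    """Return how many times faster `system` is than `baseline` for one data set size."""
    return RUNTIMES[baseline][size] / RUNTIMES[system][size]


for size in ("25M", "50M", "75M", "100M"):
    vs_mosto = speedup("Mosto / Jena TDB", "R2R / LDIF", size)
    vs_sparql = speedup("SPARQL 1.1 / Jena TDB", "R2R / LDIF", size)
    print(f"{size:>4}: {vs_mosto:.2f}x vs Mosto, {vs_sparql:.2f}x vs SPARQL 1.1")
```

With these figures the ratio is roughly 1.8 to 2.1 on the 25M data set and roughly 2.8
to 2.9 on the 100M data set, which matches the "about twice" and "about three times"
statements. Note that the 100M runs of LDIF used the Hadoop version, so the two ends of
the comparison are not the same LDIF configuration.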


Current benchmarks concerning data translation focus either on nested relational
models, which are not suitable for our context since semantic-web technologies have a
number of inherent differences with respect to these models, or on the general context
of semantic-web technologies. To the best of our knowledge, LODIB is the first
benchmark that is based on the real-world distribution of data translation patterns in
the LOD Cloud, and that is specifically tailored towards the Linked Data context.

In this paper, we compared three data translation systems, Mosto, SPARQL 1.1/Jena TDB
and R2R, by scaling the three data translation tasks. In this context, Mosto is able to
deal with 12 out of the 15 mapping patterns described in this paper, SPARQL 1.1/Jena
TDB deals with 15 out of 15, and R2R deals with 14 out of 15. Furthermore, the results
show that R2R outperforms both the Mosto and the SPARQL 1.1/Jena TDB data translation
systems when performing the three data translation tasks. Our empirical study has shown
that only a small set of simple mapping patterns is needed to translate data amongst
data sets in the LOD Cloud. In this context, the fifteen mapping patterns identified in
this paper were enough to cover the vast majority of data translation problems when
integrating these data sets.

As the Web of Data grows, the task of translating data amongst data sets moves into
focus. We hope that the LODIB benchmark will be considered useful by the developers of
the currently existing Linked Data translation systems as well as the systems to come.
More information about LODIB is publicly available at the homepage [22], such as the
exact specification of the benchmark data sets, the data generator, examples of the
mapping patterns, or the statistics about these mappings that we found in the LOD
Cloud.

Acknowledgments
Supported by the European Commission (FEDER), the Spanish and the Andalusian R&D&I
programmes (grants P07-TIC-2602, P08-TIC-4100, TIN2008-04718-E, TIN2010-21744,
TIN2010-09809-E, TIN2010-10811-E, and TIN2010-09988-E), and partially financed through
funds received from the European Community’s Seventh Framework Programme (FP7) under
Grant Agreement No. 256975 (LATC) and Grant Agreement No. 257943 (LOD2).

8. REFERENCES
 [1] B. Alexe, W. C. Tan, and Y. Velegrakis.
     STBenchmark: Towards a benchmark for mapping
     systems. PVLDB, 1(1):230–244, 2008.
 [2] R. Angles and C. Gutiérrez. Survey of graph database
     models. ACM Comput. Surv., 40(1), 2008.
 [3] M. Arenas and L. Libkin. XML data exchange:
     Consistency and query answering. J. ACM, 55(2),
     2008.
 [4] C. Bizer, T. Heath, and T. Berners-Lee. Linked Data -
     the story so far. Int. J. Semantic Web Inf. Syst.,
     5(3):1–22, 2009.
 [5] C. Bizer, A. Jentzsch, and R. Cyganiak. State of the
     LOD cloud. Available at:
     http://www4.wiwiss.fu-berlin.de/lodcloud/state/#terms,
     2011.
 [6] C. Bizer and A. Schultz. The Berlin SPARQL
     benchmark. Int. J. Semantic Web Inf. Syst.,
     5(2):1–24, 2009.
 [7] C. Bizer and A. Schultz. The R2R framework:
     Publishing and discovering mappings on the Web. In
     1st International Workshop on Consuming Linked
     Data (COLD), 2010.
 [8] D. Blum and S. Cohen. Grr: Generating random
     RDF. In ESWC (2), pages 16–30, 2011.
 [9] J. Euzenat, A. Polleres, and F. Scharffe. Processing
     ontology alignments with SPARQL. In CISIS, pages
     913–917, 2008.
[10] R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa.
     Data exchange: semantics and query answering.
     Theor. Comput. Sci., 336(1):89–124, 2005.
[11] A. Fuxman, M. A. Hernández, C. T. H. Ho, R. J.
     Miller, P. Papotti, and L. Popa. Nested mappings:
     Schema mapping reloaded. In VLDB, pages 67–78,
     2006.
[12] Y. Guo, Z. Pan, and J. Heflin. LUBM: A benchmark
     for OWL knowledge base systems. J. Web Sem.,
     3(2-3):158–182, 2005.
[13] T. Heath and C. Bizer. Linked Data: Evolving the Web
     into a Global Data Space. Morgan & Claypool, 2011.
[14] B. Motik, I. Horrocks, and U. Sattler. Bridging the
     gap between OWL and relational databases. J. Web
     Sem., 7(2):74–89, 2009.
[15] N. F. Noy and M. C. A. Klein. Ontology evolution:
     Not the same as schema evolution. Knowl. Inf. Syst.,
     6(4):428–440, 2004.
[16] F. S. Parreiras, S. Staab, S. Schenk, and A. Winter.
     Model driven specification of ontology translations. In
     ER, pages 484–497, 2008.
[17] A. Polleres, F. Scharffe, and R. Schindlauer.
     SPARQL++ for mapping between RDF vocabularies.
     In ODBASE, pages 878–896, 2007.
[18] H. Qin, D. Dou, and P. LePendu. Discovering
     executable semantic mappings between ontologies. In
     ODBASE, pages 832–849, 2007.
[19] C. R. Rivero, I. Hernández, D. Ruiz, and
     R. Corchuelo. Generating SPARQL executable
     mappings to integrate ontologies. In ER, pages
     118–131, 2011.
[20] C. R. Rivero, I. Hernández, D. Ruiz, and
     R. Corchuelo. Mosto: Generating SPARQL executable
     mappings between ontologies. In ER Workshops,
     pages 345–348, 2011.
[21] C. R. Rivero, I. Hernández, D. Ruiz, and
     R. Corchuelo. On benchmarking data translation
     systems for semantic-web ontologies. In CIKM, pages
     1613–1618, 2011.
[22] C. R. Rivero, A. Schultz, and C. Bizer. Linked Open
     Data Integration Benchmark (LODIB) specification.
     Available at:
     http://www4.wiwiss.fu-berlin.de/bizer/lodib/,
     2012.
[23] M. Schmidt, T. Hornung, G. Lausen, and C. Pinkel.
     SP2 Bench: A SPARQL performance benchmark. In
     ICDE, pages 222–233, 2009.
[24] A. Schultz, A. Matteini, R. Isele, C. Bizer, and
     C. Becker. LDIF - Linked Data integration framework.
     In 2nd International Workshop on Consuming Linked
     Data (COLD), 2011.
[25] M. Uschold and M. Grüninger. Ontologies and
     semantics for seamless connectivity. SIGMOD Record,
     33(4):58–64, 2004.
[26] Z. Wu, G. Eadon, S. Das, E. I. Chong, V. Kolovski,
     M. Annamalai, and J. Srinivasan. Implementing an
     inference engine for RDFS/OWL constructs and
     user-defined rules in Oracle. In ICDE, pages
     1239–1248, 2008.