<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level Index Model FLuID</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Till Blume</string-name>
          <email>tbl@informatik.uni-kiel.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ansgar Scherp</string-name>
          <email>asc@informatik.uni-kiel.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ZBW - Leibniz Information Centre for Economics, Christian-Albrechts-Universität zu Kiel</institution>
          ,
          <addr-line>Kiel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ZBW - Leibniz Information Centre for Economics, Christian-Albrechts-Universität zu Kiel</institution>
          ,
          <addr-line>Kiel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>Graph indices are a key to manage huge amounts of distributed graph data. Instance-level indices are available that focus on the fast retrieval of nodes. Furthermore, there are so-called schema-level indices focusing on summarizing nodes sharing common characteristics, i. e., the combination of attached types and used property-labels. We argue that there is not a one-size-tfis-all schema-level index. Rather, a parameterized, formal model is needed that allows to quickly design, tailor, and compare diefrent schema-level indices. We abstract from related works and provide the formal model FLuID using basic building blocks to eflxibly denfie diferent schema-level indices. The FLuID model provides parameterized simple and complex schema elements together with four parameters. We show that all indices modeled in FLuID can be computed in O(n). Thus, FLuID enables us to ecfiiently implement, compare, and validate variants of schema-level indices tailored for specicfi application scenarios.</p>
      </abstract>
      <kwd-group>
        <kwd>linked data</kwd>
        <kwd>schema-level indices</kwd>
        <kwd>formal model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Summarizing data can help to ecfiiently manage huge
amounts of data, and for many application scenarios indices
are available that tfi the specicfi information need [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
For graph data, we can distinguish between instance-level
indices and schema-level indices. Instance-level indices focus
on the fast retrieval of nodes or answering queries regarding
reachability, distance, and shortest path [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. For example,
they can be queried to search for metadata about a book
by its title “Towards a clean air policy” [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Schema-level
indices (SLIs) focus on summarizing nodes sharing
common characteristics, i. e., the combination of attached types
and used property-labels. Thus, SLIs support the eficient
execution of structural queries, e. g., searching for
bibliographic metadata using the type bibo:book [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. For example,
instead of indexing the instance-level information illustrated
in Fig. 1, a simple schema-level index would only
memorize the combined use of the types, e. g., bibo:Book and
dct:BibliographicResource. With such SLIs, search systems
like LODeX [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] and LODatio [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] support their users in
nifding and exploring data sources. In the past, diferent
SLIs have been developed for diefrent purposes that lack a
common formalization and thus compatibility and
comparability [
        <xref ref-type="bibr" rid="ref13 ref14 ref16 ref19 ref2 ref20 ref4 ref9">9, 14, 16, 4, 13, 2, 20, 19</xref>
        ]. We rfimly believe the
future development and further research on this topic can
benetfi from a common formal model. Thus, we conduct an
in-depth study of existing approaches. We abstract from the
related work and provide the formal model FLuID (Formal
schema-Level Index model for the web of Data) consisting
of basic building blocks to eflxibly denfie SLIs.
      </p>
      <p>The remainder of the paper is organized as follows: In
Sect. 2, we discuss the related work. In Sect. 3, we show
how equivalence relations can model any SLI. Subsequently
in Sect. 4, we denfie our building blocks as equivalence
relations, i. e., schema elements and their parameterizations. In
Sect. 5, we analyze the space and build-time complexity of
indices denfied with FLuID and in Sect. 6.1 we conduct an
empirical evaluation to support the analysis. We then
outline a processing pipeline as well as a search prototype in
Sect. 6.2, before we conclude in Sect. 7.
2.</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>Schema-level indices (SLIs) support to eficiently execute
structural queries over distributed graph data. Structural
queries focus on how nodes (resources) are described, i. e.,
which combinations of types and properties are used to
model the resources. There are various diferent possibilities
and variants of how to denfie an SLI and diefrent denfiitions
of SLI allow for capturing diefrent schemas. In the
following, we present an overview of SLIs with emphasis on their
schema structure, their application scenario, and how they
were formalized.</p>
      <p>
        Characteristic Sets [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] summarize instances along
common incoming properties and outgoing properties. They
were denfied as sets of instances using a rfist-order-logic
expression over triples. They were evaluated with respect to
the accuracy of cardinality estimations for queries in RDF
databases. SemSets [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] are denfied as sets of instances that
share the same outgoing properties which are connected
to a common target resource. They were denfied as sets
over their own “Property Graph Data Model”. They were
developed to discover semantically similar sets in knowledge
graphs in order to improve keyword-based ad-hoc retrieval.
Christodoulou et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] applied a hierarchical clustering
algorithm on RDF data in order to determine clusters, i. e.,
sets of instances that are characterized by the same set of
properties. They were denfied as sets in an textual
definition. The clusters are annotated using the RDF type
information of the clustered instances, which is then used
to derive a schema from the data sources on the Web of
Data. ABSTAT [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and LODeX [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] summarize instances
based on a common set of RDF types and properties
linking to resources with the same set of types. ABSTAT’s
schema structure was informally denfied as triples in a
textual description. Additionally, ABSTAT selects a minimal
number of types from the set of types such that all remaining
types are sub-classes of the selected types. LODeX’s schema
structure was denfied using a new grammar for their own
model. LODeX uses a clustering of RDF types to select a
representative type. Thus, they can comprehensively
visualize several datasets hosted on DataHub. TermPicker [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
summarizes instances based on a common set of types, a
common set of properties, and a common set of types of all
property-linked resources. The schema structure was
informally introduced by examples. The goal was to make
datadriven recommendations of vocabulary terms.
      </p>
      <p>
        One of the rfist SLIs using bisimulation is DataGuides [
        <xref ref-type="bibr" rid="ref14 ref9">9,
14</xref>
        ]. Bisimulation operates on state transition systems and
denfies an equivalence relation over states [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Two states
are considered equivalent (or bisimilar) if they change into
equivalent states with the same type of transition.
Interpreting a labeled graph as a representation of a state transition
system allows for the application of bisimulation on RDF
data in order to discover structurally equivalent parts in the
graph. Thus, DataGuides [
        <xref ref-type="bibr" rid="ref14 ref9">9, 14</xref>
        ] summarizes instances for
which all outgoing paths for the whole subgraph are
equivalent. DataGuides were evaluated on relational database
systems using the Object Exchange Model (OEM). Since then,
several SLIs adapted the idea of bisimulation and applied a
stratiefid k-bisimulation on RDF and OEM [
        <xref ref-type="bibr" rid="ref13 ref15">15, 13</xref>
        ]. A
stratiefid k-bisimulation is a bisimulation where the maximum
length of the considered path is k edges long [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Another
example is SchemEX [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], that summarizes instances similar
to ABSTAT and LODeX, based on a common set of types
and properties linking to resources with a common set of
types. However, it does not perform any selection of types
for the purpose of cluster labeling.
      </p>
      <p>
        All SLIs presented above define a single, fixed schema
structure. Considering the primary focus of the paper,
the schema structures are often denfied informally in a
textual description or only explained by examples [
        <xref ref-type="bibr" rid="ref19 ref20 ref3 ref4">4, 3,
20, 19</xref>
        ]. One investigated SLI is denfied using a
mathematical model, which, however, was designed not for the
SLI directly but rather the surrounding context [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Only
few indices are denfied using basic rfist-order-logic [
        <xref ref-type="bibr" rid="ref13 ref16 ref9">9, 16,
13</xref>
        ], which could be reused. However, to the best of our
knowledge, there is only a single parameterization of an
SLI suggested by Tran et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. They model a
labelparameterized and height-parameterized index. With
labelparameterization, only specicfi properties are considered and
height-parameterization limits the maximum path-length of
the subgraphs stored in the index. In summary, there
exists no single, fully parameterized model which formally
describes SLIs in general and which can be reused in order
to develop, compare, and validate SLIs.
3.
      </p>
    </sec>
    <sec id="sec-3">
      <title>EQUIVALENCES OVER GRAPH DATA</title>
      <p>A data graph G is denfied by G ⊆ VUB × P × (VUB ∪ L),
where VUB denotes the set of URIs and blank nodes, P the
set of properties, and L the set of literals. A triple is a
statement about a resource s ∈ VUB ∪ P in the form of a
subject-predicate-object expression (s, p, o) ∈ G. Moreover,
the properties P can be divided into disjoint subsets P =
Ptype ∪˙ Prel, where Ptype contains the properties denoting
type information and Prel contains the properties between
instances in the data graph. If not stated otherwise, Ptype
only contains rdf:type and Prel all p ∈ P \ Ptype.</p>
      <p>
        As discussed in Sect. 2, SLIs summarize instances based
on their schema, i. e., common use of types and properties.
For SLIs, we can distinguish between abstract schema level
and the entity mapping level [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In our context, the abstract
schema level denfies the schema given by the index
denfiition, e. g., taking only properties into account. We call these
the Schema Elements. The entity mapping level is a concrete
assignment of an instance to such an Schema Element. We
will call Schema Elements with instances mapped to them
Instantiated Schema Elements.
      </p>
      <p>
        Each instance uses a denfied set of types and properties
and thus exactly one schema. Therefore, the mapping of
instances to Instantiated Schema Elements is unique. SLIs
partition the data graph into disjoint subsets of instances,
where each subset is described by an Instantiated Schema
Element. Equivalence relations can describe this graph
partitioning in a formal way [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>Definition 1 (Equivalence Relation). For a given</title>
        <p>
          set X, an equivalence relation on X is a subset EQR ⊆
X × X, that is reflexive, symmetric, and transitive. When
(x, y) ∈ EQR, we say that x is equivalent to y or x ∼ y.
For any y ∈ X, the subset of X of all x that are equivalent
to y is called the equivalence class of y, denoted [y]EQR.
Any two equivalence classes are either disjoint or coincide.
This means that any equivalence relation on X denfies a
partition (decomposition) of X, and vice versa [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
Furthermore, it can be shown that the intersection of two
equivalence relations over X is also an equivalence relation.
        </p>
        <p>In order to ensure the correctness of the approach, we
formally denfie instances as equivalence relation over the
data graph G. With instances being denfied as equivalence
relation any equivalence relation on top of instances
consequently will be an equivalence relation over the data graph.</p>
        <p>
          Definition 2 (instance). Instances are sets of triples
in the data graph G sharing a common subject URI. The
equivalence relation I ⊆ G × G is defined as ((i1, p1, o1), (i2,
p2, o2)) ∈ I ⇔ i1 = i2. We write [i]I or Ii to denote the
equivalence class of the instance with subject URI i.
This denfiition of an instance maps each triple in G to
exactly one instance determined by its subject URI. Thus,
Def. 2 denfies a partition over the data graph G and
consequently qualiefis as an equivalence relation [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. In the
context of SLI, we call equivalence classes of instances the
schema elements. We connect the schema elements to
generated instance information by using the notion of
payload [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The payload comprises information about the
actual data, e. g., all instances or only references to their
data source. In summary a SLI can be denfied over the data
graph G, an equivalence relation EQR, and an n-tuple of
payload functions PAY.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Definition 3 (Schema-level Index (SLI)).</title>
        <p>Formally, a schema-level index is a 3-tuple (G, EQR,
PAY), where G is the data graph which is indexed, EQR
is an equivalence relation over instances in G, and PAY
is an n-tuple of payload functions, which map instance
information to equivalence classes in EQR.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>FLuID’S BUILDING BLOCKS</title>
      <p>The FLuID model consists of basic building blocks, which
can be combined to denfie any SLI. We have simple and
complex schema elements, which can be further specialized
with our four parameterizations. This section is organized
into two parts: rfist we denfie simple schema structures and
then we continue with complex schema structures.
4.1</p>
    </sec>
    <sec id="sec-5">
      <title>Simple Schema Structures</title>
      <p>We start by denfiing a simple schema element called
Object Cluster. Object Clusters partition the data graph
by mapping instances based on a common set of neighboring
objects. Please note that the denfiition qualiefis as
equivalence relation since it is reeflxive, symmetric and transitive.</p>
      <sec id="sec-5-1">
        <title>Definition 4 (Object Cluster OC). Object Clus</title>
        <p>ters partition the data graph by mapping instances [i1]I
and [i2]I , based on a common set of triples where only
the object is considered. The equivalence relation OC is
defined as follows: ([i1]I , [i2]I ) ∈ OC ⇔ ∀(i1, p1, o1)∃(i2, p2,
o2) : o1 = o2 ∧ ∀(i2, p2, o2) ∃(i1, p1, o1) : o1 = o2</p>
        <p>The Object Cluster summarizes instances, not taking any
property information into account. This can be changed
using our rfist parameterization, the label parameterization
lp, which allows ignoring a certain set of properties.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Definition 5 (Label Parameterization lp). The</title>
        <p>label parameterization is a function lp(EQR, Pr), which
takes as input an equivalence relation EQR and a set of
properties Pr ⊆ P and returns an equivalence relation
EQRPr . The returned equivalence relation EQRPr is a
restriction of EQR in terms that all assertions about the
triples in EQR only need to be true if the property of the
triple is included in the parameter property set Pr.
Restricting any schema element with such a property set in
fact relaxes the constraints given by the schema element.
For example, the label parameterization lp applied on the
Object Cluster OC using the properties Ptype summarizes
instances which have the same set of resources connected
over the property rdf:type. This means any other object is
not relevant to determine the equivalence. Please note, any
label parameterized schema element still qualifies as
equivalence relation since the same principle as before applies.</p>
        <p>
          To suficiently cover all SLIs we need more schema
elements in FLuID. Therefore, we can analogously denfie two
further simple schema elements called Property Cluster
(PC) and Property-Object Cluster (POC). The PC
summarizes instances based on the same properties (p1 = p2)
and the POC based on the same property-object tuples
(p1 = p2 ∧ o1 = o2). The Property-Object Cluster is su-fi
cient for a schema structure denfied by SemSets [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] since it
compares objects and properties combined.
        </p>
        <p>
          So far, our schema elements OC, P C, and P OC only
take outgoing properties into account. However, schema
structures like Characteristic Sets [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] consider also
incoming properties. To address incoming properties, an
undirected version of the three simple schema elements can be
denfied by additionally considering the incoming triples ( x,
p, i) ∈ G with i as the subject of the instance being in object
position. We omit the formal denfiition of all undirected
schema elements and only present the undirected Property
Cluster u-P C as an example. ([i1]I , [i2]I ) ∈ u-P C ⇔ ([i1]I ,
[i2]I ) ∈ P C ∧ ∀(x1, p1, i1) ∈ G ∃(x2, p2, i2) ∈ G : p1 =
p2 (and vice versa). The undirected Property Cluster u-P C
resembles the schema structure of Characteristic Sets [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
4.2
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Complex Schema Structures</title>
      <p>
        The simple schema elements introduced above
summarize instances by comparing incoming and outgoing triples
of an instance. However, some SLIs like SchemEX [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
TermPicker [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], ABSTAT [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], and LODeX [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] denfie
schema structures that go beyond the scope of a single
instance. The simple schema elements are already
combinations of equivalence relations by using the identity
equivalence “=” on properties and objects. We denfie complex
schema elements as an extension of simple schema elements.
      </p>
      <sec id="sec-6-1">
        <title>Definition 6 (Complex Schema Element CSE).</title>
        <p>A complex schema element partitions the data graph by
summarizing instances based on three given equivalence
relations ∼ s, ∼ p, and ∼ o. It can be described as 3-tuple
CSE := (∼ s, ∼ p, ∼ o). Two instances [i1]I , [i2]I are
considered equivalent, if i1 ∼ s i2 ∧ p1 ∼ p p2 ∧ o1 ∼ o o2
holds true for all triples in both instances.</p>
        <p>
          Example 1. We demonstrate the benetfi of complex
schema elements by denfiing CSE-1 := (P C, T, P C)
and CSE-2 := (P C, =, P C), with T being an arbitrary
tautology. Since T considers all properties equal, the
Property Cluster in object position of CSE-1 considers sets
properties. In contrast, CSE-2 uses the identity equivalence
on predicate position, thus, all 2-hop property paths have
to match exactly. The two instances [i3]I and [i4]I with
outgoing properties as illustrated in Fig. 2 are considered
equal according to the equivalence of CSE-1 since the 1-hop
properties are equal and the 2-hop properties are equal.
However, according to CSE-2, they are not considered
equal, since the property paths are not identical.
We apply the same concept to model TermPicker [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]:
(lp(OC, Ptype) ∩ lp(P C, Prel), lp(=, ∅), lp(OC, Ptype))
        </p>
        <p>
          To model TermPicker, we make use of the intersection of the
label parameterized Object Cluster and the label
parameterized Property Cluster. This way, instances need to have the
same type sets and the same Property Cluster. The
independence of the objects’ type sets and the connecting properties
can be achieved using lp(=, ∅) as predicate equivalence ∼ p
in the complex schema element. The identity equivalence on
the empty set is a tautology. The schema of ABSTAT [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ],
LODeX [
          <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
          ], and SchemEX [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] is defined straigth forward:
(lp(OC, Ptype), lp(=, Prel), lp(OC, Ptype))
To generalize the concept of taking multiple neighboring
instances into account, we denfie the chaining
parameterization cp. As suggested by Tran et al. [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], we
parameterize regarding the maximal path length of the subgraph
structure. As illustrated in Fig. 2, the complex schema
element can consider the neighborhood of up to two hops.
When chaining k schema elements, the pattern is recursively
applied up to k hops.
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>Definition 7 (Chaining Parameterization cp).</title>
        <p>
          The chaining parameterization is a function cp(CSE, k),
∼whoi)chantdakaeschaaicnoimngplpeaxrasmcheetmera kel∈emNenats CinSpEut :a=nd(∼rest,u∼ rnps,
an equivalence relation CSEk. Formally, this chaining
of k complex schema elements up to length k can be
recursively defined as bisimulation [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Two instances
[i1]I and [i2]I are equivalent according to cp(CSE, k) if
three conditions hold true: (1) For k = 0 the subject
equivalence i1 ∼ s i2. (2) For k &gt; 0 all three equivalences
i1 ∼ s i2 ∧ p1 ∼ p p2 ∧ o1 ∼ o o2. (3) For k &gt; 0 the
recursion step ([o1]I , [o2]I ) ∈ cp(CSE, k − 1).
4.2.1
        </p>
        <sec id="sec-6-2-1">
          <title>Support for Unions of Instances</title>
          <p>FLuID supports a parameterization of the instance de-fi
nition which allows considering instances that resemble the
same real-world entity by using the owl:sameAs property.
In order to take this information into account, we formally
introduce SameAs instances.</p>
        </sec>
      </sec>
      <sec id="sec-6-3">
        <title>Definition 8 (SameAs Instance). The equivalence</title>
        <p>relation σ summarizes instances based on the semantics
of owl:sameAs in equivalence classes [I]σ , called SameAs
instance. For all instances [i1]I , [i2]I ∈ [I]σ , there is a path
over all edges (independent of the edges direction) labeled
owl:sameAs in G from i1 to i2.</p>
        <p>
          Furthermore, it can be shown, that the assignment of an
instance to a SameAs Instance is unique, by reducing the
problem to nfiding weakly connected components in an
owl:sameAs-labeled subgraph of G, as is has been done by
Ding et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. With the notion of σ , we can now denfie
the instance parameterization to consider SameAs instances
instead of single instances.
        </p>
      </sec>
      <sec id="sec-6-4">
        <title>Definition 9 (Instance Parameterization ip).</title>
        <p>The instance parameterization is a function ip(EQR, σ ),
which extends any simple or complex schema element EQR
to additionally consider all connected instances following the
instance equivalence relation σ . The returned equivalence
relation EQRσ is an extension of EQR, which restricts the
triples to be in [I]σ .</p>
        <p>As an example, we apply the instance parameterization ip
on the Object Cluster equivalence relation OC using the
SameAs instances: (I1, I2) ∈ ip(OC, σ ) ⇔ ∀(i1, p1, o1) ∈
[I1]σ ∃(i2, p2, o2) ∈ [I2]σ : o1 = o2 (and vice versa).</p>
        <p>
          As the example shows, the instance parameterized OC
considers the SameAs network [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and thus merges instances.
Fig. 2 shows an example graph. According to the Object
Cluster denfiition, the instances [ i1]I , [i2]I , and [i3]I are
not equivalent. Summarizing [i1]I and [i2]I to a SameAs
instance [I]σ leads to the equivalence of all three instances.
        </p>
        <sec id="sec-6-4-1">
          <title>4.2.2 Support for Ontology Inferencing</title>
          <p>
            In the Web of Data, there are assertions about individuals
and assertions about RDF types and properties [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. For
example, a dataset can contain the following assertions:
&lt;http://bnb.data.bl.uk/doc/resource/009670097&gt; &lt;dct:creator&gt;
&lt;http://bnb.data.bl.uk/id/organization/GreatBritain[..]&gt; .
&lt;dct:creator&gt; &lt;rdfs:domain&gt; &lt;bibo:Document&gt; .
&lt;dct:creator&gt; &lt;rdfs:range&gt; &lt;foaf:Person&gt; .
          </p>
          <p>
            The triples using rdfs:domain and rdfs:range allow inferring
additional knowledge about individuals using the property
dct:creator. The schema summarization tool ABSTAT [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ]
incorporates information derived from an ontology by
inferring triples based on a subtype schema graph. ABSTAT’s
schema graph is constructed by extracting the contained
schema assertions. We extend the idea of the schema
graph from ABSTAT but include all RDFS properties
in the schema graph. Thus, our RDFS schema graph
contains hierarchical dependencies of rdfs:subClassOf and
rdfs:subPropertyOf in a tree structure with further cross
connections regarding rdfs:range and rdfs:domain.
          </p>
          <p>Definition 10 (Schema Graph). Let SG := (VC ∪ P,
E) be an edge-labeled directed multigraph and E ⊆ (VC ∪P )×
(VC ∪ P ). The set of nodes is the union of the set of RDF
classes and properties. The edge-label function ϕ : E → P
assigns labels from a given set of possible properties P to all
edges e ∈ E.</p>
          <p>We construct the RDFS schema graph by extracting
all triples containing RDFS vocabulary terms, namely all
properties PRDF S = {rdfs:subClassOf , rdfs:subPropertyOf ,
rdfs:range, rdfs:domain} and label the schema graph using
the RDFS edge-label function ϕ RDF S. In the following,
we denote the schema graph constructed using the labeling
function ϕ RDF S with SGRDF S. Having the hierarchically
dependencies of types and properties represented using a
Schema Graph, additional triples can be inferred if possible,
e. g., when a property p1 is used in the data graph and exists
as node in the schema graph.</p>
        </sec>
      </sec>
      <sec id="sec-6-5">
        <title>Definition 11 (Inferencing Parameterization).</title>
        <p>The inferencing parameterization is a function Φ( G, SG),
which takes any data graph G and schema graph SG as input
and based on the entailment rules denfied in the schema
graph SG returns a data graph GΦ , which additionally
includes all inferred triples.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. SPACE AND TIME COMPLEXITY</title>
      <p>In this section, we analyze the complexity of the
computation process of SLIs denfied with FLuID. To this end, we
conduct a space and time complexity analysis of the
computation process with a particular focus on the impact of the
parameterizations. To compute an SLI, the instance data
needs to be mapped to instantiated schema elements. This
can be done, for example, by extracting all properties of an
instance to form a Property Cluster. For instances using
the same properties, the same schema element is computed.
This can be eficiently implemented using hash maps, which
ensure constant time access. As discussed in Sect. 4, each
SLI denfied with FLuID can be described as a combination
of parameterized simple schema elements using
parameterized complex schema elements. Simple schema elements can
check the equivalence of two instances without considering
neighboring instances. Since instances partition the data
graph (Def. 2) and schema elements partition instances, for
each triple in the data graph one operation for each simple
schema element is needed.</p>
      <p>Schema Elements. We denote with c the number of
simple schema elements given by the concrete SLI denfiition
using FLuID. Thus, without parameterizations, we have
linear space and time complexity in the order of O(c · n), with
c simple schema elements in the denfiition and n triples in
the data graph. Please note, the undirected schema
elements are an exception, since considering incoming
properties produces an overlap of triples. The incoming property
of instance [i1]I ∈ G is the outgoing property of another
instance [i2]I ∈ G. Thus, for undirected schema elements,
we may have to consider each triple twice.</p>
      <p>Label Parameterization. The label parameterization
reduces the number of considered objects and properties for
each simple schema element by restricting the properties
p to be in the set Pr (Def. 5). Considering all excluded
properties P \ Pr and that each property can occur more
than one time in the dataset, we can denfie a constant
l ≥ | P \ Pr|, which denotes the number of occurrences
of excluded properties in the dataset. Thus, the space
complexity is still linear, but we can nfid a lower upper
bound O(c · (n − l)). The time complexity is unchanged.
Instance Parameterization. The instance
parameterization aggregates instances and thus does not impact the
overall size of the index. Aggregating instances to unions can be
done in constant time like triples are aggregated to instances
in constant time using hash maps. Thus, the time
complexity remains unchanged.</p>
      <p>Inference Parameterization. The inferencing
parameterization requires additional space to store all inferred types
and properties. According to our denfiition of schema graph
construction (Sect. 4.2.2), types and properties are only
added to the schema graph, if there exists a triple in the
dataset using a property in PRDF S. We can assume that we
have a limited number of such schema triple s &lt;&lt; n in the
schema graph compared to the data graph size n.
Furthermore, the complexity depends on the number of additional
triples g ≤ s that can actually be inferred for each triple in
the data graph. For example, we have two triples (s1, p1,
o1) ∈ G and (s2, p2, o2) ∈ G with p1 in the schema graph
and p2 not in the schema graph. Then, only for property
p1, for example, all super-properties can be fetched by
following the subPropertyOf relations, e. g., {p3, p4}. Thus, we
have an upper bound for space complexity using the
inference parameterization in the order of O(c · (n − l) · g). Please
note, with a linear dependency g = f (n), we would end up
with a quadratic complexity. In the worst case, we extract a
fully connected (complete) schema graph. That means, for
each indexed triple, all s possible triples in the schema graph
are inferred. Furthermore, the worst case requires all triples
in the data graph to use properties from PRDF S. This is
unrealistic for real-world datasets.</p>
      <p>In our experiment described in the subsequent section,
we use two datasets. From processing these datasets, we
know that the smaller dataset has 2.0% RDFS properties
and the larger dataset has 0.6% RDFS properties. This
leads to a factor of g &lt; 1.001. Thus, it appears safe to
assume that there is no linear dependency of g and n and
that the complexity is not quadratic.</p>
      <p>The schema graph can be implemented using hash maps,
which guarantees constant time for lookup and addition
operations. Inferencing operations are linear in the number
of inferrable types and properties. Thus, we have the same
time complexity as for the space complexity. Furthermore,
all triples s in the schema graph should be excluded from the
index using the label parameterization, which would increase
the number of excluded triples l.</p>
      <p>Chaining Parameterization. The chaining
parameterization denfies the instance’s neighborhood up to a maximum
path length of k. Thus for each instance, we need to store
up ck instantiated simple schema elements. The denfiition
of complex schema elements allows avoiding the
computation of ck simple schema elements for each instance. For
each instance, c simple schema elements need to be
computed. When considering the instance’s neighborhood up to
a maximum path length of k, the c computed simple schema
elements for each instance can be reused.</p>
      <p>Overall Complexity. The space complexity of the index
is in the order of O(ck · (n − l) · g) and the overall time
complexity is in the order of O(c · k · n · g) with c simple
schema elements denfied in the SLI, the chaining parameter
k, l excluded properties in the data graph using the label
parameterization, and g inferrable triples using the
inference parameterization as constant factors independent of n.
Thus, indices denfied with FLuID can be computed in linear
time and space with respect to the number of triples n.
6.
6.1</p>
    </sec>
    <sec id="sec-8">
      <title>EVALUATION AND PROTOTYPE</title>
    </sec>
    <sec id="sec-9">
      <title>Empirical Evaluation</title>
      <p>We empirically evaluate the RDFS schema graph SGRDF S
of two crawled datasets to support our claims regarding the
parameters s and g in our complexity analysis from Sect. 5.
We compare the number of statements s in SGRDF S to the
number of triples n in the dataset G. Second, we count how
many additional statements can be inferred using SGRDF S.
To his end, we compare the number of all additional
properties and types that could be inferred to the number of triples
in the dataset n to average the parameter g.</p>
      <p>
        Datasets. We use two crawled datasets of the Web of Data
with diefrent characteristics. The TimBL-11M dataset
contains about 11 Million triples [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The crawl was conducted
with a breadth-rfist search starting from the FOAF prolfie
of Tim Berners-Lee. Regular snapshots from the Web of
Data are provided by the Dynamic Linked Data
Observatory (DyLDO) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We use their rfist snapshot containing
about 127 Million triples crawled from about 95, 000 seed
URIs. This crawl was done with a breadth-rfist search but
limited to a crawling depth of two [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>Empirical Evaluation Results. The schema graph has a
size of about 2.0% of the TimBL-11M dataset and 0.6% of
the DyLDO-127M dataset. In the TimBL-11M dataset on
average 1.7 additional properties and 4.8 additional types
were inferred. For the DyLDO-127M dataset, on average
2.1 additional properties and 13.5 additional types were
inferred. Furthermore, for only about 15% of the triples
in both datasets, the inferencing operation was necessary.
For the remaining triples, there was no corresponding entry
in the schema graph. Rather than having for each triple all
possible triples inferred, as in the worst case suggests, we
measured that on average it is only about 0.008. Thus, the
factor g for the space and time complexity can be estimated
as a small constant factor g &lt; 1.001.
6.2</p>
    </sec>
    <sec id="sec-10">
      <title>Towards a FLuID Prototype</title>
      <p>
        We use FLuID to index the distributed graph data of the
Web of Data. LODatio+ (http://lodatio.informatik.
uni-kiel.de/) is a search engine to nfid relevant data
sources given a structural query. However, the index
LODatio+ currently uses is tailored for one specicfi application
need. As depicted in Fig. 3, we are implementing FLuID in
a generic processing pipeline and are updating LODatio+ to
understand any denfied index following the FLuID model.
LODatio+ is already an extension of LODatio [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], but
performing generic queries on any FLuID-index is ongoing work.
      </p>
    </sec>
    <sec id="sec-11">
      <title>7. CONCLUSION</title>
      <p>We have presented the novel, parameterized schema-level
index model FLuID which is suficient to express the
functionalities of existing SLIs and beyond. We showed that the
time and space complexity of any SLI developed with FLuID
scales linear with respect to the number of triples indexed.
Implementing FLuID in a single computation- and
queryframework as well as qualitatively comparing existing and
new approaches is ongoing work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Benedetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bergamaschi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Po</surname>
          </string-name>
          .
          <article-title>Online index extraction from linked open data sources</article-title>
          .
          <source>In LD4IE</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Benedetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bergamaschi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Po</surname>
          </string-name>
          .
          <article-title>Exposing the underlying schema of LOD sources</article-title>
          .
          <source>In Joint IEEE/WIC/ACM WI and IAT</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Christodoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. W.</given-names>
            <surname>Paton</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. A. A.</given-names>
            <surname>Fernandes</surname>
          </string-name>
          .
          <article-title>Structure inference for Linked Data sources using clustering</article-title>
          .
          <source>In Joint EDBT/ICDT</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciglan</surname>
          </string-name>
          , K. Nøravg˚, and
          <string-name>
            <given-names>L.</given-names>
            <surname>Hluchy</surname>
          </string-name>
          <article-title>´. The SemSets model for ad-hoc semantic list search</article-title>
          .
          <source>In WWW</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo and M.</surname>
          </string-name>
          <article-title>Lenzerini. TBox and ABox reasoning in expressive description logics</article-title>
          .
          <source>In AAAI Technical Reports</source>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shinavier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shangguan</surname>
          </string-name>
          , and
          <string-name>
            <surname>D. L. McGuinness.</surname>
          </string-name>
          <article-title>SameAs networks and beyond: Analyzing deployment status and implications of owl:sameAs in Linked Data</article-title>
          .
          <source>In ISWC</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. Q.</given-names>
            <surname>Dividino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Scherp</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Gr¨oner, and</article-title>
          <string-name>
            <given-names>T.</given-names>
            <surname>Grotton</surname>
          </string-name>
          .
          <article-title>Change-a-lod: Does the schema on the linked data cloud change or not? In COLD</article-title>
          , volume
          <volume>1034</volume>
          <source>of CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>European</given-names>
            <surname>Mathematical Society</surname>
          </string-name>
          . Equivalence relation. http://www.encyclopediaofmath.org/index.php?title=
          <source>Equivalence_relation&amp;oldid=35990</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Goldman</surname>
          </string-name>
          and
          <string-name>
            <surname>J. Widom.</surname>
          </string-name>
          <article-title>DataGuides: Enabling query formulation and optimization in semistructured databases</article-title>
          .
          <source>In VLDB</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gottron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Scherp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Krayer</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <surname>A. Peters.</surname>
          </string-name>
          <article-title>LODatio: using a schema-level index to support users infinding relevant sources of linked data</article-title>
          .
          <source>In K-CAP</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schenkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Theobald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <source>Database Foundations for Scalable RDF Processing</source>
          , pages
          <fpage>202</fpage>
          -
          <lpage>249</lpage>
          . Springer Berlin Heidelberg,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>T. K</surname>
          </string-name>
          <article-title>¨afer</article-title>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdelrahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          <article-title>O'Byrne, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          .
          <article-title>Observing linked data dynamics</article-title>
          .
          <source>In ESWC</source>
          , volume
          <volume>7882</volume>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Konrath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gottron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Scherp</surname>
          </string-name>
          .
          <article-title>SchemEX - ecfiient construction of a data catalogue by stream-based indexing of Linked Data</article-title>
          . J. Web Sem.,
          <volume>16</volume>
          :
          <fpage>52</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>J. McHugh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Abiteboul</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Goldman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Quass</surname>
            , and
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>Lore: a database management system for semistructured data</article-title>
          .
          <source>SIGMOD Record</source>
          ,
          <volume>26</volume>
          (
          <issue>3</issue>
          ):
          <fpage>54</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nestorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Ullman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Wiener</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Chawathe</surname>
          </string-name>
          . Representative Objects:
          <article-title>Concise representations of semistructured, hierarchial data</article-title>
          .
          <source>In ICDE</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Moerkotte</surname>
          </string-name>
          .
          <article-title>Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins</article-title>
          .
          <source>In ICDE</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sakr</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Al-Naymat</surname>
          </string-name>
          .
          <article-title>Graph indexing and querying: a review</article-title>
          .
          <source>Int. Journal of Web Information Systems</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>101</fpage>
          -
          <lpage>120</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sangiorgi</surname>
          </string-name>
          .
          <article-title>On the origins of bisimulation and coinduction</article-title>
          .
          <source>ACM Trans. Program. Lang. Syst.</source>
          ,
          <volume>31</volume>
          (
          <issue>4</issue>
          ):
          <volume>15</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          :
          <fpage>41</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Schaible</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gottron</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <surname>A. Scherp.</surname>
          </string-name>
          <article-title>TermPicker: Enabling the reuse of vocabulary terms by exploiting data from the Linked Open Data cloud</article-title>
          .
          <source>In ESWC</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>B.</given-names>
            <surname>Spahiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Porrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rula</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Maurino</surname>
          </string-name>
          . ABSTAT:
          <article-title>ontology-driven Linked Data summaries with pattern minimalization</article-title>
          .
          <source>In ESWC Satellite Events, Revised Selected Papers</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Studer</surname>
          </string-name>
          .
          <article-title>Semantic search - using graph-structured semantic models for supporting the search process</article-title>
          .
          <source>In Int. Conf. on Conceptual Structures</source>
          , pages
          <fpage>48</fpage>
          -
          <lpage>65</lpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.</given-names>
            <surname>Tran</surname>
          </string-name>
          , G. Ladwig, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          .
          <article-title>Managing structured and semi-structured RDF data using structure indexes</article-title>
          .
          <source>IEEE TKDE</source>
          ,
          <volume>25</volume>
          (
          <issue>9</issue>
          ):
          <fpage>2076</fpage>
          -
          <lpage>2089</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>