=Paper= {{Paper |id=Vol-1666/paper-03 |storemode=property |title=Multidimensional Interfaces for Selecting Data within Ordinal Ranges |pdfUrl=https://ceur-ws.org/Vol-1666/paper-03.pdf |volume=Vol-1666 |authors=Ruben Taelman,Pieter Colpaert,Ruben Verborgh,Erik Mannens,Rik Van de Walle |dblpUrl=https://dblp.org/rec/conf/semweb/TaelmanCVMW16 }} ==Multidimensional Interfaces for Selecting Data within Ordinal Ranges== https://ceur-ws.org/Vol-1666/paper-03.pdf
                   Multidimensional Interfaces
            for Selecting Data within Ordinal Ranges

         Ruben Taelman, Pieter Colpaert, Ruben Verborgh, and Erik Mannens

                         Data Science Lab (Ghent University - iMinds)
                                firstname.lastname@ugent.be




       Abstract. Linked Data interfaces exist in many flavours, as evidenced by subject
       pages, sparql endpoints, triple pattern interfaces, and data dumps. These interfaces
       are mostly used to retrieve parts of a complete dataset, such parts can for example be
       defined by ranges in one or more dimensions. Filtering Linked Data by dimensions
       such as time range, geospatial area, or genomic location, requires the lookup of data
       within ordinal ranges. To make retrieval by such ranges generic and cost-efficient,
       we propose a rest solution in-between looking up data within ordinal ranges
       entirely on the server, or entirely on the client. To this end, we introduce a method
       for extending any Linked Data interface with an n-dimensional interface-level index
       such that n-dimensional ordinal data can be selected using n-dimensional ranges.
       We formally define Range Gates and Range Fragments and theoretically evaluate
       the cost-efficiency of hosting such an interface. By adding a multidimensional
       index to a Linked Data interface for multidimensional ordinal data, we found that
       we can get benefits from both worlds: the expressivity of the server raises, yet
       remains more cost-efficient than an interface providing the full functionality on
       the server-side. Furthermore, the client now shares in the effort to filter the data.
       This makes query processing becomes more flexible to the end-user, because the
       query plan can be altered by the engine. In future work we hope to apply Range
       Gates and Range Fragments to real-world interfaces to give quicker access to data
       within ordinal ranges.


Keywords: Linked Data, indexing, dimensional data, Linked Data Fragments, sparql


1   Introduction

Different types of Linked Data interfaces can be used to publish data to the Web, such as
data dumps, subject pages, Triple Pattern Fragments (tpf), and sparql endpoints. Each of
these interfaces have a certain expressivity, ranging from high expressive query interfaces
to low expressive data dumps. The idea of ranking interfaces by their expressivity to
choose a Linked Data interface by discussing trade-offs, was introduced by Linked Data
Fragments (ldfs) [10], as illustrated in Fig. 1. ldfs and tpf also advocate for Linked
Open Data interfaces to follow the rest principles: next to having an identifier for
resources, which Linked Data promises, clients should also automatically be able to
discover the controls of an interface. This way, new features can be added to such an
interface dynamically, enabling clients to exploit those when discovered and supported.
 data                                                                                     sparql
dumps                                                                                     results

generic requests                                                                        specific requests
high client effort                                                                     high server effort
high server availability                                                          low server availability
Fig. 1: The Linked Data Fragments axis illustrates that all http interfaces offer data fragments, yet
they differ in the specificity of the data they contain, and thus the effort needed to create them [10].


     In order to evaluate Basic Graph Patterns, tpf was introduced, where the server only
exposes a triple pattern interface. A significant amount of sparql queries however contain
filter clauses [5] and because the regular tpf interface only allows clients to search
by exact triple pattern matches, it requires these clients to evaluate all sparql filters
client-side. Therefore, an extension for filtering has been added to the tpf interface [8],
which works for substring filtering. This extension thus improves the efficiency of sparql
query evaluation within the tpf framework for queries with high filter selectivity at the
cost of a large server-side index.
     Filtering on multidimensional ordinal values, like numbers, dates, geospatial locations
and even genomic ranges, yields the same problem as substring filtering. It is possible
that a highly selective ordinal filtering is required by the query. Using the regular tpf
interface, the client would have to first download all data before being able to apply a
filter. This leads to slow query execution times for highly selective filters. To this extent,
we introduce a method for publishing any set of triples in fragments which assist the
client to perform the filtering. This way, we allow a generic client to query for data within
any kind of ordinal range. In the same way sparql endpoint maintainers today would add
indices on certain literal types, data publishers now would need to expose the data within
the right fragments using our method. Compared to sparql endpoints, we now have a
less expressive interface which reduces the potential load on the server, in turn reducing
the number of possible server requests and makes filter-processing more cacheable.
     In the next section, we discuss related research on which our solution is based. After
that, Section 3 gives a generic overview of our proposed index structure. Next, Section 4
includes a theoretical analysis of the index for both client and server. Finally, Section 5
discusses the conclusions of this work with further research opportunities.


2    Related Work

Linked Data Fragments is a conceptual framework using which Linked Data publication
interfaces can be compared. These interfaces are compared by the way they represent data,
their controls to specify the data, and the metadata they use to describe this data. sparql
interfaces for example allow very specific data to be selected using a generic control, the
sparql language, which requires a high server effort. Data dumps, for example, have a
very simple control over which the whole dump can be selected. This requires a high
client effort to find specific data but a low server effort. tpf can be seen as a trade-off
between these two extremes where it requires some work from both the client and server.
    Triple Pattern Fragments (tpf) [10] interfaces allow clients to query data based on
triple patterns. When such a request is performed, the server returns a Triple Pattern
Fragment, which consists of data triples that match the client’s triple pattern. These tpfs
are pageable resources to reduce response size, which means that when a lot of triples
match with the client’s triple pattern, the client will need to request multiple pages to
completely retrieve the set of matching triples. These tpfs also include metadata and
controls next to the data. The metadata includes a description of the interface and the tpf,
while the hypermedia controls enable access to other tpfs on the interface and access to
the next and previous pages of the current tpf. Indexing structures are typically used
in data retrieval systems to reduce lookup times. The interface-level index structure we
introduce is similar to tree-based indexes like B-Trees and B+Trees [2] that are used to
store and lookup data with a certain order.
    For specific dimensions such as time, the Memento protocol [6] can be used to
expose access to historical versions of http resources by extending http with content
negotiation in the datetime dimension. Memento adds a new link to resources in the
http header, named the TimeGate, which acts as the datetime dimension for a resource.
It provides a list of temporal versions of the resource which can be requested. In this
work we generalize this concept of a Gate, to provide access to a collection of resources.
    In the area of distributed database systems [4], the concept of data fragmentation
exists. Fragmentation is used to assign subsets of data to different database systems.
This is useful when datasets become too large for one system or when subsets of data
benefit from being assigned to certain systems. Two types of fragmentation are typically
distinguished for relational data: a) horizontal fragmentation refers to splitting up the
rows of a table, where groups of full tuples are assigned to the same fragments. b) vertical
fragmentation refers to splitting up the columns of a table, where certain tuple elements
of all tuples are assigned to the same fragments. In this paper, we will reuse the concept
of horizontal fragmentation, which we will refer to as simply fragmentation.


3     Multidimensional index

In order to allow clients to navigate to specific parts of a dataset for a certain n-dimensional
range, we present a method for adding an index for an rdf class to any Linked Data
interface in this section. First, we introduce new terminology, after which we explain a
model for representing the interface index. Next, we discuss how a client is supposed to
use this index. Finally, we explain how to instantiate this model for any class.


3.1   Terminology

Definition 1. An n-dimensional resource is an rdf resource with n properties whose
ranges are strict linearly ordered sets. An rdf class of n-dimensional resources is called
an n-dimensional class.

    In Definition 1, we introduce the terms n-dimensional resource and n-dimensional
class. A radio’s tracklist element for example can be modeled as a 1-dimensional class
as it is identified by a timestamp in the temporal dimension, which is 1-dimensional. A
sensor measurement for example is a 3-dimensional resource, because it has one temporal
dimension and two geospatial dimensions.
Definition 2. A Linked Data interface exposes certain Linked Data Fragments of a
Linked Dataset. A selector can be applied to an interface to get a fragment, which is
uniquely defined by that selector.

Definition 3. We use the term Range Fragment to refer to a Linked Data Fragment [10]
that has an interval as selector, which applies to dimensional classes at one of its n
dimensions.

   An n-dimensional rdf class has n levels of Range Fragments, where a Range
Fragment at level i acts as a selector for the dimensional class at dimension i. The Range
Fragments at level i can also be seen as being fragmented at dimension i by its selectors.

Definition 4. A Range Gate is a Linked Data interface through which Range Fragments
can be selected by interval. This interface selects all Range Fragments whose interval
overlap with the Range Gate’s interval.

    In order to gain access to Range Fragments at level i, a Range Gate at level i exposes
the list of these Range Fragments at level i. This collection of Range Fragments at the same
level have no restrictions with relation to each other, their identifying interval selector
may partially, completely or not overlap. The number of possible Range Fragments is
unlimited and can vary through time. Ideally, the different Range Fragments must have
an equal size to have a balanced index. The fragmentation will therefore strongly depend
on the dimensional class.
    For example, if we have Range Fragments for 1-dimensional resources where one
area has a large amount of elements compared to the complete domain, then the Range
Fragments for this area will have a smaller identifying selector interval than the other
Range Fragments.

3.2   Index Model
Our index model is a tree structure of Range Fragments and Range Gates which is
exposed at interface-level so that clients can navigate through it to locate dimensional
resources in a certain range. The root of our tree is the original interface to which we
want to add this index. We can interpret this root element as the Range Fragment at level
−1, indicating that no dimension of the dimensional class can be selected at this level.
Each Range Fragment at level i has a link to Range Gate at level i + 1 if i + 1 < n. Fig. 2a
shows an overview of an n-dimensional index model.
    The Range Fragment at level −1 must provide a way to test whether or not a certain
triple pattern applies to each index. Therefore it should either supply a list of triple
patterns, or a control to a simple interface to test whether or not a certain triple pattern is
present in the index.
    This index can either be static or dynamic. Static indexes are made once and should
be used when n-dimensional resources can not be added to the dataset after initial
publication. Dynamic indexes on the other hand allow for n-dimensional resources to be
added after initial publication. Depending on the use case, this index can either be used
as the single point in which the n-dimensional resources can be fetched, which we call a
primary index, or as an alternative way next to the main dataset to fetch this data, which
                                                                    Range Fragment
 −1                                                                 Range Gate                                                       −1

                                                                                                                                     0
  0
                                         ···                                                                 ···
                                                                                                                                     1
  1
                                                                         ..      .. · · · .. ..   .. · · · ..· · ·..   .. · · · ..
      ..    .. · · · ..   ..   .. · · · .. · · · ..   .. · · · ..         .       .        . .     .        . .         .        .
       .     . .           .    . .               .    . .
  n                                                                                                                                  n


           Fig. a: The n-dimensional index model.                             Fig. b: The reduced n-dimensional index model.



we call a secondary index. The latter is advised when needing to remain compatible with
clients who are not capable of using this index.
    The fact that non-leaf Range Fragments exist can have an impact on the complexity of
the internal server index data structure because it has several possible entry points to the
same data. Because of this, a reduced multidimensional index can be used where Range
Fragments only exist at the very last level of the index, and all internal Range Gates at
level i will immediately link to Range Gates at level i + 1. As a consequence, the client
will have to navigate to the last level of the tree before it will be able to start requesting
n-dimensional resources. Fig. 2b shows an overview of a reduced n-dimensional index
model.
    The main reasoning behind the idea of exposing this index to the client is that these
index navigations become cacheable and that index handling is moved from the server
to the client. A client can for example navigate through the n-dimensional index by a
vector v1 = [e0, e1, ..., en ] where each element ei specifies the Range Fragment index
that should be taken at level i. If the client later has to navigate through this index
again using vector v2 = [ f 0, f 1, ..., f n ], then it could reuse parts of this navigation path.
More specifically, the vector vs = [e0, e1, ..., ei ] where with P(x) = k=0            (ek = f k ),
                                                                                  Vx
i = j : 0 ≤ j ≤ n ∧ (∀x ≤ j ⇒ P(x)) ∧ (∀y > j ⇒ ¬P(x)) is the part of the navigation
path that can be reused from cache. When many consecutive Range Fragments need to
be requested by the client, this shared vector vs will typically be very long which will
lead to frequent navigation path element cache hits.


3.3   Client Access

The goal of our multidimensional index is to improve the selectivity of the process to
find n-dimensional ordinal data. When a Linked Data interface exposes such an index for
an n-dimensional class, it should add a link to this index in its metadata so that clients
can discover this index. In practise, a client can exploit this index when for example a
filter clause is defined in a sparql query to select certain values in a time range.
    When a client needs to request data in a certain n-dimensional range, it must first
determine if one of its filters can be performed using an index from the server. It can
do this by testing index-applicability for all n-dimensional classes over which a filter
is defined using the hypermedia controls related to all indexes on the server. If a test
succeeds, the client should find the link to the corresponding index in the server’s
metadata. If such an index is found, the client can use this index to more efficiently filter
on these values, otherwise it should fall back to its default method of working.
     Depending on the number of dimensions of the class, the client will need to perform
more or less requests to find the relevant data. The client algorithm starts at dimension
−1, which is the original entrypoint at the ldf server. At this initial Range Fragment, a
link to the Range Gate at dimension i + 1 exists, which will be dimension 0 in this case.
The client must then request this Range Gate entrypoint, interpret its controls and fill in
the interval to filter on at dimension i. This will result in a paged list of Range Fragments
at level i that overlap with the given interval. From this list, the Range Fragments must be
selected that are at least partially contained in the client’s original range for dimension i.
Depending on the number of dimensions n, this process must be repeated until i = n for
each of the selected Range Fragments at level i. When more than one of those Range
Fragments are selected, these can be recursively handled in parallel by the client. When
a Range Fragment at level n is reached, its contained dimensional resources are highly
selective. Because the data provider is responsible for defining the Range Fragment’s
intervals, the client’s original n-dimensional range might not completely apply to the
data contained in this Range Fragment, so the client will still have to filter out some of
the data afterwards.
     If the client notices that a certain Range Fragment at level i < n has a very small
number of triples, it can start fetching that data instead of having to navigate deeper in the
index. This optimization is only possible when using a non-reduced multidimensional
index, otherwise the client must always go to the nth level of the index before requesting
dimensional resources.

3.4   Model Instantiation
n-dimensional resources have n properties whose ranges are strict linearly ordered sets.
These resources are therefore defined by an ordinal variable of dimensionality n, and
exist at a certain n-dimensional point. The dimensionality n determines the amount of
levels required in the index.
    The multidimensional interface specification [7] is ongoing work, in which we
explain how to describe a multidimensional interface with metadata and how it can
be used with hypermedia controls. We use the Hydra Core Vocabulary [3] to define
these controls in rdf. Range Fragments expose controls for searching triples by triple
pattern, as is described by the tpf specification [9]. Range Gates expose similar controls
to search through Range Fragments by overlapping interval. These controls allow clients
to automatically discover how to navigate in this index and find the data they need.
    In order to determine the distribution of resources in Range Fragments, we must
develop a fragmentation strategy for the dimensional resources. At each dimension i
of the dimensional class of the dimensional resources, the values must be fragmented
in such a way that their Range Fragments at this level are balanced. This balancing
requirement includes Range Fragments at the same level whose parent Range Gates are
not equal. Balanced Range Fragments improve the effectiveness of the index. Range
Fragments are balanced when they contain approximately the same amount of data.
Another requirement for good fragmentation strategies is that they do not require the
re-distribution of information in the Range Fragment when new data is added. This is
because we want this index to be cacheable by clients.
    We distinguish the following categories of fragmentation strategies:
Simple The most straightforward but not always very intelligent strategies like for
     example equally sized n-dimensional intervals.
Stochastic Depending on the n-dimensional distribution of the dimensional resources,
     a prediction for the optimal intervals can be determined by for example selecting a
     fixed number of equally-distanced percentiles in this distribution and deriving their
     n-dimensional values.
Clustering Clustering algorithms can be used to either cluster the dimensional resources
     in a fixed or variable number of clusters which have a low inter-cluster distance in
     the n dimensions. Depending on the dataset being static or dynamic, respectively
     batch or online clustering algorithms should be used.
    Dynamic indexes might require a more simple fragmentation strategy than static
indexes if this strategy requires complex operations for determining its target Range
Fragment. Static indexes can do all required fragmentation operations in an offline
preprocessing step.
    This multidimensional index only defines an addition to the server on interface-level,
the actual internal implementation can consist of anything, as long as it supports the
index interface. A simple way, and in some cases an efficient way, of exposing this index
is by pre-generating the Range Fragments of the index so that they can be accessed as
static resources. When the dataset becomes larger, this naive way becomes inefficient
because of the high number of possible resources to be generated. This is why internal
data structures like B+ trees, red-black trees, or other binary tree variants can be used to
index dimensional resources. Red-black trees can for example be used when the index
structure can fit into the main memory of the server, otherwise B-trees or B+ trees can
be used. If the multidimensional index is primary, the dimensional resources can be
stored directly into the search tree, otherwise the tree can simply hold a reference to the
dimensional resources in the actual dataset to avoid storing redundant information.


4     Theoretical Analysis
In this section, we will discuss the theoretical effects of a multidimensional index on
the performance of both client and server, after which we will discuss possible optimal
parameters for these two.

4.1   Client
The goal of the multidimensional index is to improve the selectivity for the client when
querying dimensional resources. The index will only be used when the client has to filter
over the range of the dimensional resources. In these cases, the client will look for the
dimensional resources through the index instead of through the original interface.
    Equation 1 shows the worst case for the total number of requests when requesting data
for only one Range Fragment leaf through this index. The first term represents the single
request that is required to retrieve the root element of the index and discover the index
controls. The second term represents the number of requests that is required to navigate
to the deepest Range Fragment. The last term identifies the number of requests needed
to retrieve all elements of the deepest Range Fragment. range_fragment_elements(i, j)
returns the number of dimensional resources in the jth fragment at level i. In this
context, count_range_fragments(i, j) is the amount of children the jth Range Gate
at level i has, which depends on the used Range Fragment fragmentation strategy.
range_gates(i) is the number of Range Gates on level i, which equals 1 if i = 0 and
Prange_gates(i−1)
  j=0             count_range_fragments(i − 1, j) otherwise. The best case for the number
of requests is simply the empty index, where n = 0 and all data must be retrieved from
the main dataset.

              O(#requests) = 1
                 n                        &                            '!
                            range_gates(i) count_range_fragments(i, j)
                X
              +     1 + max j=0
                                                 gate_pagesize                       (1)
                i=0
                            range_gates(i) range_fragment_elements(i, j)
               + maxi=0
                    n
                        max j=0
                                                fragment_pagesize
In practice, the n-dimensional interval from the original user’s query can overlap with
multiple Range Fragments. Assuming that the user’s interval spans s Range Fragments,
the simplified equation for the worst case of number of requests to fetch this data is
s ∗ O(#requests). Client-side caching of the index structure when it is being requested
can greatly reduce the number of requests defined by this simplified equation. More
specifically, when caching the index structure, the second term of Equation 1 will
experience a logarithmic reduction when more requests are being done in the worst case.
In the best case, when the Range Fragments that must be fetched are consecutive and have
the same Range Gate parent, Equation 1 for s requests becomes the one in Equation 2.

              O(#requests) = 1
                 n                        &                            '!
                            range_gates(i) count_range_fragments(i, j)
                X
              +     1 + max j=0
                                                 gate_pagesize                       (2)
                i=0
                                  range_gates(i) range_fragment_elements(i, j)
              + s ∗ maxi=0
                       n
                           max j=0
                                                    fragment_pagesize

    The second term in Equation 1 can be seen as the overhead the index has for
the client. These are the requests that are required to navigate through the index
to find leaf Range Fragments. If the client requests many different non-consecutive
ranges, this index navigation overhead will become larger. If a large fraction of the
dimensional resources is requested, it might be better to request this data through
the regular interface if the index is secondary. The client can detect this when the
number of filtered Range Fragments through the Range Gate at level 0 is near the total
number of Range Fragments this Range Gate offers. This decision can be formalized as
skip_index = #total_RF(0) − #filtered_RF(0) > d. This constant d is representative for
the size of the index, and can be approximated as the total number of Range Gates that
the index has at level 1.
    The third term in Equation 1 represents the maximum amount of pages a Range
Fragment has. Assuming the data provider has a good Range Fragment fragmentation
strategy, this index will be balanced and this maximum will be near the average number
of pages a Range Fragment has. This means that this third term will be typically lower
when there is a better fragmentation strategy in use. Again, this number will also be lower
with a larger pagesize. This same reasoning can be applied to the second term, which will
be decreased with a better fragmentation strategy and a larger pagesize. A larger pagesize
will reduce the amount of requests, but this means that a higher load will be placed on
the server when just the first page of a fragment must be requested for its metadata. This
is however not a valid reason not to use larger pagesizes, since it could be made possible
to fetch metadata without fragment data.
    In our example, traditional tpf interfaces provide sequential paged access to fragments.
Due to the hierarchical nature of our multidimensional index, fetching dimensional
resources becomes highly parallelizable. This is because our hierarchical index structure
makes it possible to know in advance which Range Fragments the client will need. Each
Range Fragment can in fact be handled in parallel, so the higher this number of Range
Fragments the user needs to request, the higher this parallelizability.


4.2   Server

When a multidimensional index is added to the interface of a server, the expressivity
of the interface is increased, which will potentially increase the server’s workload. As
mentioned in Section 3.4, the multidimensonal index is only defined on interface-level, so
the actual performance of the index will strongly depend on the internal datastructure used
to store the index. We will make a distinction between the Range Fragment requirements
and the Range Gate requirements.
    Range Fragments contain the actual data. If the index is primary, the dimensional
resources can be stored in the internal index itself. If the index is secondary, the data
should only be stored on one location, and other places should be able to internally fetch
data from that location. If the index is reduced, meaning that data can only be retrieved
from level n in the index, this data structure can be chosen for optimized access for only
n-dimensional ranges instead of 1..n-dimensional ranges.
    This Range Gate data should support efficient lookup by n-dimensional ranges, which
could be achieved using n layers of B+ trees, red-black trees or other binary search tree
variants. For example, a B+ tree at layer i of our multidimensional index could represent
the index structure of our dimensional resources at dimension i. Each B+ tree leaf then
contains a pointer to a B+ tree at layer i + 1 or a list of Range Fragments if i = n.
    The number of dimensions, the fragmentation strategy and the used search tree will
determine the required resources to maintain this internal index on the server and the
                                                        Pn
lookup efficiency. The total number of fragments i=0          range_gates(i) is the amount of
internal search trees that will need to be stored, the higher the number of fragments at
dimension i, the larger the trees at level i will be, and the more storage size these will take.
Equation 3 represents the size of an internal index, where tree_size(n) represents the
total size required for a search tree with n nodes and pointer_size represent the maximum
size of a pointer to a Range Fragment.

                O(internal_index_size) =
                n range_gates(i)
                X     X          
                                     tree_size(count_range_fragments(i, j))           (3)
                 i=0    j=0
                                                                 
                 + count_range_fragments(i, j) ∗ pointer_size

The Range Fragments will not require any additional storage, because these can be stored
in traditional triple or quad stores.


4.3   Optimal parameters

A good index should minimize both the total amount of requests the client should
perform and the total storage size of the internal server-side index, therefore it should
minimize respectively Equation 2 and 3. This means that we should minimize both
count_range_fragments(i, j) and range_gates(i) if we look at the common factors. This
is however not possible, as they contradict each other. Decreasing the first factor, which
represents the total amount of elements in a Range Fragment at level i, will increase
the second factor, which equals the total amount of Range Fragments at level i − 1,
and the other way around. This is because decreasing the total amount of elements
in a Range Fragment at level i means that you have to increase the number of Range
Fragments at level i over which the data at dimension i is being fragmented. This
increase of Range Fragments thus exists for all i, and thus also for i − 1. So decreasing
count_range_fragments(i, j) will increase range_gates(i). Depending on the page size,
the optimal balance between these two parameters will depend on the used fragmentation
strategy of the dataset.


5     Conclusions

In order for clients to efficiently query datasets describing n-dimensional resources, we
introduced a theoretical extension for any Linked Data interface with Range Gates and
Range Fragments. By adding hypermedia controls, supporting clients can automatically
discover this fragmented dataset. The movement of responsibility for filtering, ordering
or window-functions from server to client, lowers the server load, as the http cache
can be used more efficiently. The client can also reuse parts of its paths through the
index when requesting consecutive fragments. An important factor when building a
multidimensional index for an rdf class is how to fragment its instances in n dimensions.
    Our generic index can, with the necessary changes, be implemented for many different
use cases. For example, a 1-dimensional temporal index could be instantiated for storing
train departure delays with a stochastically optimal fragmentation strategy based on
public transit api usage logs [1]. Secondly, substring filtering support could be exposed
using a 1-dimensional index instead of the single filter control that is currently being
used [8].
    We discussed the theoretical client and server performance for this index. The more
selective the n-dimensional filtering by the client, the more efficient this index will
become when compared to non-indexed-access. There is a point at which it becomes
more efficient for the client to query the data without the index, this will be when the filter
has a very low selectivity. The client and server will both benefit from smaller, and thus
more selective, Range Fragments, while at the same time this results in a higher number
of Range Fragments which will thus require a larger index structure i.e. a larger number
of Range Gates. This larger number of internal Range Gates, increases the required
number of requests by the client and the storage requirements for the server, which is
why a balance between the Range Fragment size and count should be determined.
    These parameters introduced in our theoretical analysis offer new trade-offs for data
publishers in offering optimized access to n-dimensional Linked Data on the Web, as
they can choose how much work the client and server should each do for index handling.
This trade-off can even be determined dynamically, by, for example, only providing
access to a highly-fragmented index during low server load and a low-fragmented index
during peak-times. Dynamic fragmentation will however increase the complexity of
index handling and caching by clients.
    One future optimization for the multidimensional index to further reduce the required
number of requests and processing times is by guaranteeing the ordering of triples inside
Range Fragments. This would make it possible for the client to make attempts to skip
certain pages of a Range Fragment if it only has to look for a subset of that Range
Fragment.
    The proposed index currently only allows fragments to be divided up linearly per
dimension. These are linear because our Range Fragment identifiers are intervals, which
are identified by two points in one dimension and thus can be represented by a straight
line across this dimension, which we will call a constant border function. For some use
cases, a division using more general polynomial border functions, which may overlap
several dimensions, might result in a better fragmentation of data. Figure 3 shows an
example of two-dimensional data that may benefit from polynomial border functions.
This, however, goes beyond the capabilities of our hierarchical index with one level per
dimension. Our index could be adapted so that it can combine several dimensions into
one layer as opposed to considering the dimensions independently, so that polynomial
functions become possible. This will, however, increase the cost of the server as it will
give more responsibility of the index evaluation to the server.




       Fig. a: Constant border functions not capable of   Fig. b: Polynomial border functions capable of
       evenly dividing the data.                          evenly dividing the data.

                            Fig. 3: Fragmentation of two-dimensional data.
   In future work, we plan on making a generic implementation of this multidimensional
index for tpf servers with a generic tpf client extension that can consume it. We plan
on using this index for both offering efficient access to dynamic data, like train delays,
and access for ranges on genomes. This index should be tested thoroughly for both the
generic case as for specific use cases and the statements made in our theoretical analysis
should be tested.


References
 1. Colpaert, P., Chua, A., Verborgh, R., Mannens, E., Van de Walle, R., Vande Moere, A.: What
    public transit api logs tell us about travel flows. In: Proceedings of the 25th International
    Conference Companion on World Wide Web. pp. 873–878. International World Wide Web
    Conferences Steering Committee (2016)
 2. Comer, D.: Ubiquitous b-tree. ACM Computing Surveys (CSUR) 11(2), 121–137 (1979)
 3. Lanthaler, M.: Hydra core vocabulary. Unofficial draft, Hydra W3C Community Group (Sep
    2014), http://www.hydra-cg.com/spec/latest/core/
 4. Özsu, M.T., Valduriez, P.: Principles of distributed database systems (2011)
 5. Rietveld, L., Hoekstra, R.: Man vs. machine: Differences in SPARQL queries. In: 4th USEWOD
    Workshop on Usage Analysis and the Web of Data, ESWC (2014)
 6. de Sompel, H.V., Nelson, M.L., Sanderson, R., Balakireva, L., Ainsworth, S., Shankar, H.:
    Memento: Time travel for the web. CoRR abs/0911.1112 (2009)
 7. Taelman, R.: Multidimensional linked data interfaces. Unofficial draft, Ghent University -
    iMinds - Data Science Lab (Jun 2016), https://w3id.org/multidimensional-interface/
    ontology
 8. Van Herwegen, J., De Vocht, L., Verborgh, R., Mannens, E., Van de Walle, R.: Substring
    filtering for low-cost linked data interfaces. In: The Semantic Web–ISWC 2015, pp. 128–143
    (2015)
 9. Verborgh, R.: Triple pattern fragments. Unofficial draft, Hydra W3C Community Group (Aug
    2014), http://www.hydra-cg.com/spec/latest/triple-pattern-fragments/
10. Verborgh, R., Vander Sande, M., Hartig, O., Van Herwegen, J., De Vocht, L., De Meester,
    B., Haesendonck, G., Colpaert, P.: Triple Pattern Fragments: a low-cost knowledge graph
    interface for the Web. Journal of Web Semantics 37–38, 184–206 (2016)