<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>OLAP Manipulations on RDF Data following a Constellation Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rafik Saad</string-name>
          <email>srf.rafik@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivier Teste</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cassia Trojahn</string-name>
          <email>cassia.trojahn@irit.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IRIT (UMR5505) &amp; Université Toulouse 2 Le Mirail (UTM2)</institution>
          ,
          <addr-line>France</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Multidimensional analysis is an alternative way of summarising, aggregating and viewing RDF data along different axes (dimensions) and subjects of analysis (facts). Starting from an RDF data collection conforming to the W3C Data Cube specification, we formalise a multidimensional model in terms of RDF data structures following a conceptual constellation model. This model groups facts, which are studied according to several dimensions possibly shared between facts, with dimensions organised in multiple hierarchies. We show how elementary OLAP operations can be translated into SPARQL queries using an OLAP algebra that is compliant with the constellation model. This algebra is based on a multidimensional table, which displays data from one fact and two of its linked dimensions. Initial experiments have been carried out using both synthetic and real data sets.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Linked Open Data (LOD) movement has promoted the publication of large
interlinked collections of data, represented as RDF graphs. Following this
initiative, many organisations currently publish statistical data in RDF format (e.g.,
Eurostat (http://eurostat.linked-statistics.org), the European Central Bank (http://ecb.270a.info/.html) and UK COINS (http://data.gov.uk/resources/coins), to cite a few examples). The
need to exploit these data for analytical and decision-making purposes
quickly became evident. On the one hand, one promising way of analysing these
numeric data is OLAP (Online Analytical Processing) [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
This technique allows for summarising, filtering, aggregating, and viewing data
along different axes and subjects of analysis. On the other hand, OLAP treatments
require data to be structured following a specific model, i.e., the
multidimensional model, which organises data into a set of facts (subjects of analysis), and
dimensions and hierarchies (axes of analysis).
      </p>
      <p>
        A first category of approaches for manipulating RDF data following a
multidimensional model relies on an ETL process for extracting and transforming
these data into a specific structure, usually the relational star model, before
using standard OLAP systems [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Another category of approaches aims at
applying OLAP operations directly on RDF data collections without using an
ETL process [
        <xref ref-type="bibr" rid="ref3 ref8">8, 3</xref>
        ]. While the first approach requires the ETL process to be
repeated in order to propagate changes in the sources, the second approach
requires a multidimensional modelling of RDF data and a dynamic translation
of OLAP operations into SPARQL queries.
      </p>
      <sec id="sec-1-3">
        <p>
          In this paper, we present an approach for OLAP manipulations on RDF
data that falls into the second category. To that end, we
assume that RDF data are modelled according to the RDF Data Cube
Vocabulary, a vocabulary for modelling multidimensional data, such as statistics, in
RDF. First, we propose a formalisation of a multidimensional structure based on
the RDF format and following a constellation model of facts and dimensions composed
of multiple hierarchies. This model was introduced by Ravat and colleagues in
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], extending star schemas [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], which are commonly used in
multidimensional modelling. Second, based on this formalisation, we show how the main
OLAP operations (DRILLDOWN, ROLLUP, SELECT and ROTATE) can be
translated into SPARQL queries using an OLAP algebra that is compliant with
the constellation model. This algebra is based on a multidimensional table, which
displays data from one fact and two of its linked dimensions. It defines a set of
elementary operators from which more complex OLAP operations can be
defined. Our proposal has been implemented, and experiments were carried out
using both synthetic and real data sets. The main contributions of the
paper are the following:
• we provide an RDF constellation model that is closely aligned with
the multidimensional data model. This generalised model supports multiple
facts, multiple dimensions and multiple hierarchies;
• the proposed modelling supports complex hierarchies that represent several real-world
data organisations, and covers the case of non-covering hierarchies [
          <xref ref-type="bibr" rid="ref10 ref11">11,
10</xref>
          ], in which instances do not strictly follow the hierarchical specification:
values of a child level may skip intermediate levels of
the hierarchy. An example of this kind of hierarchy is a company having
customers in different cities of several countries. American cities can be
grouped into parent levels such as states, whereas French cities skip this
intermediate level.
        </p>
        <p>The remainder of the paper is organised as follows. Section 2 introduces the
conceptual constellation model we propose for the multidimensional modelling of
RDF data. Based on this model, Section 3 presents how OLAP operations are translated
into SPARQL queries. Section 4 describes the prototype developed to validate
our approach. Section 5 discusses related work, and Section 6 concludes the paper and outlines
future work.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Constellation Model on RDF Data</title>
      <p>
        The multidimensional manipulation of RDF data requires the definition of a
conceptual model from which the OLAP operations are specified. This section
formally defines the elements of a multidimensional model in terms of RDF data
described using the RDF Data Cube (http://www.w3.org/TR/vocab-data-cube/; the version used here is http://www.w3.org/TR/2012/WD-vocab-data-cube-20120405/), SKOS (http://www.w3.org/2004/02/skos/) and RDFS (http://www.w3.org/TR/2004/REC-rdf-schema-20040210/) vocabularies. This
formalisation is based on a conceptual model that defines a constellation of facts and
dimensions, which are composed of multiple hierarchies. A constellation groups
several facts, which are studied according to several dimensions possibly shared
between facts [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Before introducing the model, we consider an example of a
constellation schema that will be used throughout the remainder of the paper.
This schema is composed of the fact Sales, which has two measures, quantity and
amount, and which can be analysed along three dimensions: Time, Product
and Geography. The Geography dimension is composed of two hierarchies: HGeo,
a three-level hierarchy (composed of the attributes City, Region and Country),
and HArea, a two-level hierarchy (City, Area). Figure 1 depicts this
multidimensional schema using graphical notations. Note that HGeo is a
non-covering hierarchy because some cities do not belong to regions, whereas each
region, as well as each city, belongs to one country.
      </p>
      <p>A dimension models an analysis axis and is composed of attributes (also
called parameters or levels). These attributes are organised according to one or
several hierarchies within a dimension:
Definition 1. A dimension D is defined as a 4-tuple $(?d, H^D, A^D, I^D)$, where
• ?d a qb:DimensionProperty;
• $H^D = \{?h^D_1, \dots, ?h^D_v\}$ is a set of hierarchies, where ?h a skos:ConceptScheme $\wedge$ ?d qb:codeList ?h;
• $A^D = \{?a^D_1, \dots, ?a^D_u\}$ is a set of attributes, where ?a a rdfs:Class $\wedge$ ?a rdfs:subClassOf skos:Concept $\wedge$ $(\exists\, ?h \in H^D : \texttt{?a skos:inScheme ?h})$;
• $I^D = \{I^D_1, \dots, I^D_p\}$ is a set of dimension instances, where $I^D_j = \{?i^D_{j1}, \dots, ?i^D_{jk}\}$ is a set of attribute instances, where ?i a ?a $\wedge$ $?a \in A^D$ $\wedge$ $k \leq |A^D|$.</p>
      <sec id="sec-2-1">
        <p>Each hierarchy corresponds to a skos:ConceptScheme, and attributes of dimensions
are modelled both as subclasses of skos:Concept and as instances of rdfs:Class.
Attributes thus represent levels of granularity within a dimension. As hierarchies in the
version of the RDF Data Cube Vocabulary we use are defined as SKOS hierarchies, each
hierarchical level refers to a skos:Concept in the SKOS hierarchy. In order to link
the instances of attributes to each hierarchical level, defining the attributes as instances
of rdfs:Class allows the property rdf:type to be used to state that an instance belongs
to a specific level of the hierarchy.</p>
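        <p>Definition 1 can be made concrete by listing the triples it requires for the Geography dimension of the running example. The Python sketch below is an illustration under stated assumptions, not part of the prototype: the helper name dimension_triples and the ex: resource names are hypothetical, only prefixed names are produced, and the qb:, skos:, rdfs: and ex: prefix declarations are assumed to be provided elsewhere.</p>
        <preformat>
```python
# Hypothetical helper (not from the paper): list the triples that Definition 1
# requires for one dimension. Only prefixed names are produced; the qb:, skos:,
# rdfs: and ex: prefix declarations are assumed to be added by the caller.
def dimension_triples(dimension, hierarchies):
    """dimension: the qb:DimensionProperty, e.g. "ex:Geography".
    hierarchies: maps each hierarchy (a skos:ConceptScheme) to its attributes."""
    triples = [(dimension, "a", "qb:DimensionProperty")]
    declared = set()
    for scheme, attributes in hierarchies.items():
        triples.append((scheme, "a", "skos:ConceptScheme"))
        triples.append((dimension, "qb:codeList", scheme))
        for attribute in attributes:
            if attribute not in declared:
                # An attribute (level) is both an rdfs:Class and a subclass
                # of skos:Concept; it is declared once even if it is shared.
                triples.append((attribute, "a", "rdfs:Class"))
                triples.append((attribute, "rdfs:subClassOf", "skos:Concept"))
                declared.add(attribute)
            # A shared attribute (City) is in the scheme of every hierarchy using it.
            triples.append((attribute, "skos:inScheme", scheme))
    return triples


geography = dimension_triples(
    "ex:Geography",
    {
        "ex:HGeo": ["ex:City", "ex:Region", "ex:Country"],
        "ex:HArea": ["ex:City", "ex:Area"],
    },
)
for s, p, o in geography:
    print(f"{s} {p} {o} .")
```
        </preformat>
        <p>City is declared as a class only once but appears in both concept schemes, which is how a level shared by HGeo and HArea is represented in this sketch.</p>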
        <p>
          As stated above, hierarchies represent a particular vision (perspective) of
a dimension, where each attribute represents one data granularity according to
which measures can be analysed. Following the constellation model in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ],
weak attributes (attributive properties) complete the semantics of parameters. As
a dimension may have multiple hierarchies of attributes, an
attribute can have several direct ancestors, each one belonging to a specific
hierarchy:
Definition 2. A hierarchy H of a dimension D is defined as a 3-tuple $(?h, P^H, \mathit{Weak}^H)$,
where
• ?h a skos:ConceptScheme;
• $P^H = \langle ?p^H_1, \dots, ?p^H_v \rangle$ is an ordered set of attributes (parameters),
representing levels of granularity within a dimension, where ?p a rdfs:Class $\wedge$ ?p
rdfs:subClassOf skos:Concept $\wedge$ ?p skos:inScheme ?h $\wedge$ ?h skos:hasTopConcept
?p, where ?h skos:hasTopConcept $?x_{top}$ $\wedge$ $\exists\, C = \langle ?x_1, \dots, ?x_n \rangle : n &lt; |P^H|$ $\wedge$
$?x_1$ skos:broader $?x_2$ $\wedge \dots \wedge$ $?x_{n-1}$ skos:broader $?x_n$ $\wedge$ $?x_n$ skos:broader $?x_{top}$;
• $\mathit{Weak}^H : P^H \rightarrow 2^{A^D \setminus P^H}$ is a function associating each parameter with one or
more weak attributes, where $\mathit{Weak}^H(?p) = \{?a_1, \dots, ?a_n\}$, with ?a a rdfs:Class
$\wedge$ ?a rdfs:subClassOf skos:Concept $\wedge$ ?a skos:related ?p.
        </p>
        <p>A fact reflects the information that has to be analysed according to
dimensions and that is modelled through one or several indicators (measures):
Definition 3. A fact F is defined as a 3-tuple $(?ds, M^F, I^F)$, where
• ?ds a qb:DataSet;
• $M^F = \{f_1(?m^F_1), \dots, f_w(?m^F_w)\}$ is a set of measures, each associated with an aggregate
function f, where ?m a qb:MeasureProperty;
• $I^F = \{?i^F_1, \dots, ?i^F_q\}$ is a set of fact instances, where each ?i is a 3-tuple $(?o, (?dv_1,
?dv_2, \dots, ?dv_n), (?mv_1, ?mv_2, \dots, ?mv_m))$, where ?o a qb:Observation $\wedge$ ?o qb:dataSet
?ds $\wedge$ $\forall i \in [1..n], \exists\, ?d \in \mathit{Star}(F) : \texttt{?o ?d } ?dv_i$ $\wedge$ $\forall j \in [1..m], \exists\, ?m \in M^F : \texttt{?o ?m } ?mv_j$.</p>
      </sec>
      <sec id="sec-2-2">
        <p>
          As an alternative to associating instances with SKOS concepts, one could consider XKOS [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], a recently proposed extension of SKOS that takes into account, among other features, the
representation of levels in concept schemes, semantic relations such as isPartOf, and the instantiation of
concepts. This will be investigated in future work.
        </p>
        <p>A constellation groups several facts, which are studied according to several
dimensions, possibly shared between facts:
Definition 4. A constellation Cs is defined as a 3-tuple $(F^{Cs}, D^{Cs}, \mathit{Star}^{Cs})$,
where
• $F^{Cs} = \{?f_1, \dots, ?f_m\}$ is a set of facts, where ?f a qb:DataSet;
• $D^{Cs} = \{?d_1, \dots, ?d_n\}$ is a set of dimensions, where ?d a qb:DimensionProperty;
• $\mathit{Star}^{Cs} : F^{Cs} \rightarrow 2^{D^{Cs}}$ associates each fact with its linked dimensions, where ?f
qb:structure ?cube $\wedge$ ?cube a qb:DataStructureDefinition $\wedge$ ?cube qb:component
?compD $\wedge$ ?compD qb:dimension ?d $\wedge$ $(\forall\, ?m \in M^f, \exists\, ?compM : \texttt{?cube qb:component ?compM} \wedge \texttt{?compM qb:measure ?m})$.</p>
      </sec>
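      <p>The graph pattern of Definition 4 can be read as a small lookup procedure. The following Python sketch computes Star(f), and the measures of a fact, from a plain list of (subject, predicate, object) triples for the Sales example. It is only an illustration: the function names and the component resources (ex:salesDSD, ex:cTime, etc.) are hypothetical, and prefixed names stand in for full IRIs.</p>
      <preformat>
```python
# Hypothetical sketch: compute Star(f) and the measures of a fact from a list
# of (subject, predicate, object) triples, following the graph pattern of
# Definition 4. Prefixed names stand for full IRIs.
def objects(triples, subject, predicate):
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def structures(triples, fact):
    # ?f qb:structure ?cube . ?cube a qb:DataStructureDefinition .
    return {
        cube for cube in objects(triples, fact, "qb:structure")
        if "qb:DataStructureDefinition" in objects(triples, cube, "a")
    }


def linked(triples, fact, role):
    # ?cube qb:component ?comp . ?comp role ?x   (role: qb:dimension or qb:measure)
    found = set()
    for cube in structures(triples, fact):
        for component in objects(triples, cube, "qb:component"):
            found |= objects(triples, component, role)
    return found


def star(triples, fact):
    return linked(triples, fact, "qb:dimension")


sales_dsd = [
    ("ex:Sales", "a", "qb:DataSet"),
    ("ex:Sales", "qb:structure", "ex:salesDSD"),
    ("ex:salesDSD", "a", "qb:DataStructureDefinition"),
    ("ex:salesDSD", "qb:component", "ex:cTime"),
    ("ex:salesDSD", "qb:component", "ex:cProduct"),
    ("ex:salesDSD", "qb:component", "ex:cGeo"),
    ("ex:salesDSD", "qb:component", "ex:cQty"),
    ("ex:cTime", "qb:dimension", "ex:Time"),
    ("ex:cProduct", "qb:dimension", "ex:Products"),
    ("ex:cGeo", "qb:dimension", "ex:Geography"),
    ("ex:cQty", "qb:measure", "ex:quantity"),
]
print(sorted(star(sales_dsd, "ex:Sales")))
print(sorted(linked(sales_dsd, "ex:Sales", "qb:measure")))
```
      </preformat>
      <p>With several facts declared in the same list, intersecting their Star sets gives the dimensions they share, which is what distinguishes a constellation from a single star.</p>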
    </sec>
    <sec id="sec-3">
      <title>Translating OLAP Operations into SPARQL</title>
      <p>
        Based on the constellation model presented above, we define a SPARQL query
mechanism for performing OLAP operations directly on the RDF data. This
mechanism is based on the OLAP algebra defined in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This algebra is a
procedural query language that provides a set of elementary operators from which
more complex operations can be specified. It is based on a multidimensional
table (MT), which displays data from one fact and two of its linked dimensions.
Definition 5 (Multidimensional table (MT) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]). An MT is a 4-tuple
$(S, L, C, R)$, where $S = (F^S, M^S)$ represents the analysed subject through a fact
$F^S \in F^{Cs}$ and a set of projected measures $M^S$; $L = (D^L, H^L, P^L)$ represents the
horizontal analysis axis, where $P^L = \langle p^{H^L}_{max}, \dots, p^{H^L}_{min} \rangle$, $H^L \in H^{D^L}$ and $D^L \in \mathit{Star}^{Cs}(F^S)$,
$H^L$ being the current hierarchy of $D^L$; $C = (D^C, H^C, P^C)$ represents the vertical
analysis axis, where $P^C = \langle p^{H^C}_{max}, \dots, p^{H^C}_{min} \rangle$, $H^C \in H^{D^C}$ and $D^C \in \mathit{Star}^{Cs}(F^S)$, $H^C$ being the
current hierarchy of $D^C$; and $R = pred_1 \wedge \dots \wedge pred_t$ is a normalised conjunction of
predicates (restrictions on dimension data and fact data).
      </p>
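      <p>The membership constraints of Definition 5 can be checked mechanically before any query is generated. The Python sketch below models an MT as frozen dataclasses and verifies that both axes use dimensions linked to the analysed fact and hierarchies of those dimensions. The class and function names, and the hierarchies ex:HProd and ex:HTime, are hypothetical; measure checks are omitted for brevity.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    dimension: str   # D^L or D^C
    hierarchy: str   # current hierarchy H^L or H^C
    levels: tuple    # displayed parameters, from p_max down to p_min


@dataclass(frozen=True)
class MT:
    fact: str                # F^S
    measures: tuple          # M^S
    line: Axis               # L, horizontal axis
    column: Axis             # C, vertical axis
    restriction: tuple = ()  # R, read as a conjunction of predicates


def check_mt(mt, star, hierarchies):
    """Enforce D in Star(F^S) and H in H^D for both axes (Definition 5)."""
    for axis in (mt.line, mt.column):
        if axis.dimension not in star[mt.fact]:
            raise ValueError(f"{axis.dimension} is not linked to {mt.fact}")
        if axis.hierarchy not in hierarchies.get(axis.dimension, ()):
            raise ValueError(f"{axis.hierarchy} is not a hierarchy of {axis.dimension}")
    return mt


star = {"ex:Sales": {"ex:Time", "ex:Products", "ex:Geography"}}
hierarchies = {
    "ex:Geography": {"ex:HGeo", "ex:HArea"},
    "ex:Products": {"ex:HProd"},   # hypothetical
    "ex:Time": {"ex:HTime"},       # hypothetical
}
mt = MT(
    "ex:Sales",
    ("ex:quantity",),
    Axis("ex:Products", "ex:HProd", ("ex:Product",)),
    Axis("ex:Geography", "ex:HGeo", ("ex:City",)),
)
print(check_mt(mt, star, hierarchies))
```
      </preformat>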
      <p>
        The algebraic operators take as input a source MT, noted $MT_{SRC} = (S_{SRC},
L_{SRC}, C_{SRC}, R_{SRC})$, and produce an output MT, noted $MT_{RES} = (S_{RES}, L_{RES},
C_{RES}, R_{RES})$. Each $MT_{RES}$ can in turn be manipulated using operators of the
same algebra. In the scope of this paper, we focus on the minimal core of
operators, namely DISPLAY (for defining a first MT), DRILLDOWN and ROLLUP
(for changing the level of detail along a hierarchy), SELECT (for selecting data
of a multidimensional schema), and ROTATE (for replacing an analysis axis by
another one). We assume querying a single data set. For the formal definition of each OLAP
operator, the reader can refer to [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Here, we describe how each operation is
translated into SPARQL queries. Pre-aggregation and query optimisation are out
of the scope of this paper.
      </p>
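      <p>Because every operator maps an MT to a new MT, operators can be chained freely. The Python sketch below illustrates this closure property with a reduced MT (one displayed level per axis) and simplified ROLLUP, DRILLDOWN and SELECT operators. It is a hypothetical illustration of the algebra's shape, not the formal definitions of [12]: in particular, it does not check that Lvl_sup is coarser (or Lvl_inf finer) than the current level.</p>
      <preformat>
```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MT:
    """Reduced multidimensional table: one displayed level per axis."""
    fact: str
    line_dim: str
    line_level: str
    column_dim: str
    column_level: str
    restriction: tuple = ()  # conjunction pred_1 ... pred_t


def _with_level(mt, dim, level):
    # ROLLUP and DRILLDOWN only change the displayed level of one axis;
    # the subject S and the restriction R are carried over unchanged.
    if dim == mt.line_dim:
        return replace(mt, line_level=level)
    if dim == mt.column_dim:
        return replace(mt, column_level=level)
    raise ValueError(f"{dim} is not displayed in this table")


def rollup(mt, dim, lvl_sup):
    return _with_level(mt, dim, lvl_sup)


def drilldown(mt, dim, lvl_inf):
    return _with_level(mt, dim, lvl_inf)


def select(mt, predicate):
    # SELECT adds one predicate to the conjunction R; both axes are kept.
    return replace(mt, restriction=mt.restriction + (predicate,))


# Every operator returns a new MT, so calls can be chained.
mt0 = MT("ex:Sales", "ex:Geography", "ex:City", "ex:Products", "ex:Product")
mt1 = rollup(mt0, "ex:Geography", "ex:Region")
mt2 = select(mt1, "?qty > 10")
mt3 = drilldown(mt2, "ex:Geography", "ex:City")
print(mt3)
```
      </preformat>
      <p>Using frozen dataclasses keeps each intermediate table intact, so a user can step back to any earlier MT in an analysis session.</p>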
      <p>As Figure 2 depicts, an initial multidimensional table is built from
a constellation Cs using the operator DISPLAY, where $\mathrm{DISPLAY}(F^S, M^S,
D^L, H^L, D^C, H^C) = MT_{RES}$, with $MT_{RES} = (S_{RES}, L_{RES}, C_{RES}, R_{RES})$. This
operation displays the root parameters of each hierarchy (all observations
refer to the lowest level of the dimension hierarchy, i.e., the root level or level 1). The
process of generating the SPARQL query corresponding to DISPLAY (Table 1)
consists of the following nested operations:
1. Identify the instances of the fact $F^S$ to be displayed;
2. Retrieve the values of the root parameters $P^L_1$ (attribute on the
horizontal analysis axis) and $P^C_1$ (attribute on the vertical analysis axis) of the
dimensions $D^L$ and $D^C$, respectively;
3. Retrieve the value $mv_i$ of each measure $m_i$;
4. Group the measure values $mv_i$ by $P^L_1$ and $P^C_1$;
5. Calculate the aggregations by applying the
corresponding aggregation functions $Agg_i$ to the measure values $mv_i$.</p>
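      <p>The five DISPLAY steps above can be read as a filter, a group-by and an aggregation. The following Python sketch applies them in memory to a hypothetical list of observations, each stored as a dictionary from property to value; the data layout and the function name are assumptions made for illustration only.</p>
      <preformat>
```python
from collections import defaultdict


def display(observations, dataset, dim_l, dim_c, measure, agg=sum):
    """In-memory reading of the five DISPLAY steps (hypothetical data layout:
    each observation is a dict mapping property names to values)."""
    groups = defaultdict(list)
    for obs in observations:
        # Step 1: keep only the instances of the analysed fact.
        if obs.get("qb:dataSet") != dataset:
            continue
        # Steps 2-3: root-level values of both axes and the measure value.
        key = (obs[dim_l], obs[dim_c])
        # Step 4: group the measure values by (P^L_1, P^C_1).
        groups[key].append(obs[measure])
    # Step 5: apply the aggregation function to each group.
    return {key: agg(values) for key, values in groups.items()}


observations = [
    {"qb:dataSet": "ex:Sales", "ex:Products": "p1", "ex:Geography": "ex:Toulouse", "ex:quantity": 3},
    {"qb:dataSet": "ex:Sales", "ex:Products": "p1", "ex:Geography": "ex:Toulouse", "ex:quantity": 2},
    {"qb:dataSet": "ex:Sales", "ex:Products": "p2", "ex:Geography": "ex:Paris", "ex:quantity": 5},
    {"qb:dataSet": "ex:Stock", "ex:Products": "p1", "ex:Geography": "ex:Paris", "ex:quantity": 9},
]
result = display(observations, "ex:Sales", "ex:Products", "ex:Geography", "ex:quantity")
print(result)  # {('p1', 'ex:Toulouse'): 5, ('p2', 'ex:Paris'): 5}
```
      </preformat>
      <p>The ex:Stock observation is discarded by step 1, which is the role played by the qb:dataSet triple pattern in the SPARQL query.</p>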
      <p>Table 1. SPARQL query for DISPLAY: generic template and its instantiation on the Sales example.</p>
      <preformat>SELECT ?PL1 ?PC1 (Agg_i(?mv_i) AS ?mes_i)
WHERE {
  ?obs rdf:type qb:Observation .
  ?obs qb:dataSet IRI(F^S) .
  ?obs IRI(D^L) ?PL1 .
  ?obs IRI(D^C) ?PC1 .
  ?obs IRI(m_i) ?mv_i .
}
GROUP BY ?PL1 ?PC1</preformat>
      <preformat>SELECT ?prodId ?city (SUM(?qty) AS ?qtySales)
WHERE {
  ?obs rdf:type qb:Observation .
  ?obs qb:dataSet ex:Sales .
  ?obs ex:Products ?prodId .
  ?obs ex:Geography ?city .
  ?obs ex:quantity ?qty .
}
GROUP BY ?prodId ?city</preformat>
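      <p>The generic template of Table 1 lends itself to simple string generation. The Python sketch below shows one way a SPARQL generator could instantiate it from the DISPLAY parameters; the function name display_query and the tuple format for measures are hypothetical, all names are prefixed IRIs, and PREFIX declarations are left to the caller.</p>
      <preformat>
```python
def display_query(fact, dim_l, dim_c, measures):
    """Instantiate the DISPLAY template of Table 1.

    measures: list of (measure IRI, aggregate function, output variable).
    All names are prefixed IRIs; PREFIX declarations are left to the caller."""
    projection = ["?PL1", "?PC1"]
    patterns = [
        "?obs rdf:type qb:Observation .",
        f"?obs qb:dataSet {fact} .",
        f"?obs {dim_l} ?PL1 .",
        f"?obs {dim_c} ?PC1 .",
    ]
    for i, (measure, agg, out) in enumerate(measures, start=1):
        # One measure variable ?mv_i and one aggregate per projected measure.
        patterns.append(f"?obs {measure} ?mv{i} .")
        projection.append(f"({agg}(?mv{i}) AS ?{out})")
    body = "\n  ".join(patterns)
    return (
        f"SELECT {' '.join(projection)}\n"
        f"WHERE {{\n  {body}\n}}\n"
        "GROUP BY ?PL1 ?PC1"
    )


print(display_query("ex:Sales", "ex:Products", "ex:Geography",
                    [("ex:quantity", "SUM", "qtySales")]))
```
      </preformat>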
      <p>From an $MT_{RES}$ resulting from a DISPLAY operation, the operations ROLLUP
and DRILLDOWN modify the analysis precision by manipulating the
hierarchical levels of the dimensions. In $\mathrm{ROLLUP}(MT_{SRC}, D, Lvl_{sup}) = MT_{RES}$, $D \in
\{D^L, D^C\}$ is the dimension on which the operation is applied, $Lvl_{sup}$ is a
coarser-grained level of the current hierarchy of D used in $MT_{RES}$, and $MT_{RES} = (S_{SRC}, L_{RES}, C_{RES},
R_{SRC})$. Conversely, in $\mathrm{DRILLDOWN}(MT_{SRC}, D, Lvl_{inf}) = MT_{RES}$, D is the
dimension on which the operation is applied, $Lvl_{inf}$ is a finer-grained attribute in
the current hierarchy H of D, and $MT_{RES} = (S_{SRC}, L_{RES}, C_{RES}, R_{SRC})$. For
SPARQL query generation, the operation consists of moving the current level of the
hierarchy to a parameter of level n by following $n-1$ skos:broader links. As an example, consider $\mathrm{ROLLUP}(MT_{SRC}, D^L, Lvl_{sup})$,
supposing that, for the dimension $D^C_{MT_{SRC}}$, the hierarchy $H^C_{MT_{SRC}}$ is
positioned at the root parameter. The operations to be performed are the following
(Table 2):
1. Identify the instances of the fact $F_{MT_{SRC}}$ in $MT_{SRC}$ to be displayed;
2. Retrieve the values of the root parameters $P^L_1$ and $P^C_1$ of the dimensions
$D^L_{MT_{SRC}}$ and $D^C_{MT_{SRC}}$;
3. Retrieve the value $mv_i$ of each measure $m_i^{MT_{SRC}}$;
4. From $P^L_1$, move upward in the hierarchy $H^L_{MT_{SRC}}$ until reaching the level
$Lvl_{sup}$ (level n) through the predicate skos:broader, while ensuring that each
newly accessed parameter is part of the hierarchy $H^L_{MT_{SRC}}$ (via the
predicate skos:inScheme). This guarantees navigation through the right hierarchy
when the dimension has multiple hierarchies. Reaching the parameter of level n
thus requires following $n-1$ skos:broader links;
5. Group the measure values $mv_i$ by $P^C_1$ and $Lvl_{sup}$;
6. Calculate the aggregations by applying the
corresponding aggregation functions $Agg_i$ to the measure values $mv_i$;
7. Display the aggregated measure values with the values of $P^C_1$ and $Lvl_{sup}$.</p>
      <p>Table 2. SPARQL query for ROLLUP: generic template and its instantiation on the Sales example.</p>
      <preformat>SELECT ?PC1 ?PSup (Agg_i(?mv_i) AS ?mes_i)
WHERE {
  ?obs rdf:type qb:Observation .
  ?obs qb:dataSet IRI(F_MTSRC) .
  ?obs IRI(DC_MTSRC) ?PC1 .
  ?obs IRI(DL_MTSRC) ?PL1 .
  ?PL1 skos:broader ?PL2 .
  ?PL2 skos:inScheme IRI(HL_MTSRC) .
  ...
  ?PL(n-1) skos:broader ?PSup .
  ?PSup skos:inScheme IRI(HL_MTSRC) .
  ?PSup rdf:type IRI(Lvl_sup) .
  ?obs IRI(mi_MTSRC) ?mv_i .
}
GROUP BY ?PC1 ?PSup</preformat>
      <preformat>SELECT ?prodId ?region (SUM(?qty) AS ?qtySales)
WHERE {
  ?obs rdf:type qb:Observation .
  ?obs qb:dataSet ex:Sales .
  ?obs ex:Products ?prodId .
  ?obs ex:Geography ?city .
  ?city skos:broader ?region .
  ?region skos:inScheme ex:HGeo .
  ?region rdf:type ex:region .
  ?obs ex:quantity ?qty .
}
GROUP BY ?prodId ?region</preformat>
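      <p>Step 4 is the only part of the ROLLUP query whose length depends on the target level. The Python sketch below generates the chain of n-1 skos:broader hops of Table 2, with a skos:inScheme check on each reached parameter. The function name rollup_query and its parameters are hypothetical, names are prefixed IRIs, and a single measure is assumed for brevity.</p>
      <preformat>
```python
def rollup_query(fact, dim_c, dim_l, hierarchy, lvl_sup, n, measure, agg="SUM"):
    """Generate the ROLLUP query of Table 2 (names are prefixed IRIs).

    The root value ?PL1 of dim_l is lifted to level n of `hierarchy` by
    n-1 skos:broader hops; each reached parameter must be skos:inScheme
    the hierarchy, so that the right hierarchy is followed."""
    if 2 > n:
        # The root level (n = 1) has no coarser level to reach.
        raise ValueError("Lvl_sup must be at least level 2")
    patterns = [
        "?obs rdf:type qb:Observation .",
        f"?obs qb:dataSet {fact} .",
        f"?obs {dim_c} ?PC1 .",
        f"?obs {dim_l} ?PL1 .",
    ]
    for k in range(1, n):
        # The last hop binds ?PSup instead of an intermediate ?PL variable.
        target = "?PSup" if k == n - 1 else f"?PL{k + 1}"
        patterns.append(f"?PL{k} skos:broader {target} .")
        patterns.append(f"{target} skos:inScheme {hierarchy} .")
    patterns.append(f"?PSup rdf:type {lvl_sup} .")
    patterns.append(f"?obs {measure} ?mv .")
    body = "\n  ".join(patterns)
    return (
        f"SELECT ?PC1 ?PSup ({agg}(?mv) AS ?mes)\n"
        f"WHERE {{\n  {body}\n}}\n"
        "GROUP BY ?PC1 ?PSup"
    )


print(rollup_query("ex:Sales", "ex:Products", "ex:Geography",
                   "ex:HGeo", "ex:region", 2, "ex:quantity"))
```
      </preformat>
      <p>With n = 2 the sketch reproduces the shape of the Sales example (City to Region); with n = 3 it adds one intermediate variable, ?PL2.</p>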
      <p>For the DRILLDOWN operation, the evaluation principle is similar
to that of ROLLUP. When manipulating the hierarchies within
dimensions, it is important to take into account a special case where, at the instance level,
some hierarchical levels do not have any associated instance. This is the case of
non-covering hierarchies. For example, in the geographical hierarchy HGeo,
data at the instance level may contain some cities that are not associated with
any region or state (for instance, Vatican City is considered a city-state that
is not associated with a region). Suppose that HGeo is a non-covering hierarchy and
that one wants to analyse product sales by country. The aggregations then have
to be done by country regardless of the actual hierarchical level at which countries are
positioned, i.e., (a) countries positioned at the third level (instances that
respect the hierarchy specification in the schema); (b) countries positioned at the
second level (countries that have a state or a city, but not both); and (c) countries
positioned at the first level (countries without states and cities, like Monaco).
We therefore combine alternative graph patterns using the SPARQL UNION
operator. An example of a query that takes non-covering hierarchies into account, derived from
the previous ROLLUP query ($\mathrm{ROLLUP}(MT_{SRC}, D^L, Lvl_{sup})$, where $Lvl_{sup}$ is
the parameter of level n in the schema), is presented in Table 3.</p>
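      <p>The three cases (a), (b) and (c) can be illustrated outside SPARQL. The following Python sketch, using hypothetical toy data (ex:Toulouse, ex:Occitanie, ex:VaticanCity, ex:Vatican, ex:Monaco), climbs skos:broader links from the geographical member of each observation until it reaches a member typed with the target level, whatever the number of hops. Unlike the query of Table 3, it does not check skos:inScheme at each hop; it illustrates the intended aggregation semantics and is not the prototype's implementation.</p>
      <preformat>
```python
# Hypothetical toy data: skos:broader links and the rdf:type of each member.
broader = {
    "ex:Toulouse": ["ex:Occitanie"],     # city -> region -> country (case a)
    "ex:Occitanie": ["ex:France"],
    "ex:VaticanCity": ["ex:Vatican"],    # city -> country, no region (case b)
}
rdf_type = {
    "ex:Toulouse": "ex:City", "ex:Occitanie": "ex:Region",
    "ex:France": "ex:Country", "ex:VaticanCity": "ex:City",
    "ex:Vatican": "ex:Country", "ex:Monaco": "ex:Country",  # Monaco: case c
}


def ancestors_of_type(member, level, broader, rdf_type):
    """Members of type `level` reachable from `member` by zero or more
    skos:broader hops (the non-covering case: the depth is not fixed)."""
    found, frontier, seen = set(), [member], set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        if rdf_type.get(node) == level:
            found.add(node)
            continue  # stop climbing once the target level is reached
        frontier.extend(broader.get(node, []))
    return found


def rollup_noncovering(observations, level, broader, rdf_type):
    totals = {}
    for geo, qty in observations:
        for target in ancestors_of_type(geo, level, broader, rdf_type):
            totals[target] = totals.get(target, 0) + qty
    return totals


sales = [("ex:Toulouse", 3), ("ex:VaticanCity", 4), ("ex:Monaco", 2)]
totals = rollup_noncovering(sales, "ex:Country", broader, rdf_type)
print(totals)  # {'ex:France': 3, 'ex:Vatican': 4, 'ex:Monaco': 2}
```
      </preformat>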
      <p>Table 3. SPARQL query for ROLLUP over a non-covering hierarchy: generic template and its instantiation on the Sales example (aggregation by country).</p>
      <preformat>SELECT ?PC1 ?PSup (Agg_i(?mv_i) AS ?mes_i)
WHERE {
  ?obs rdf:type qb:Observation .
  ?obs qb:dataSet IRI(F_MTSRC) .
  ?obs IRI(mi_MTSRC) ?mv_i .
  ?PC1 rdf:type IRI(PC1) .
  ?obs IRI(DC_MTSRC) ?PC1 .
  ?PSup rdf:type IRI(Lvl_sup) .
  {
    ?obs IRI(DL_MTSRC) ?PL1 .
    ?PL1 skos:broader ?PL2 .
    ?PL2 skos:inScheme IRI(HL_MTSRC) .
    ...
    ?PL(n-1) skos:broader ?PSup .
  }
  UNION
  {
    ?obs IRI(DL_MTSRC) ?PL1 .
    ?PL1 skos:broader ?PL2 .
    ?PL2 skos:inScheme IRI(HL_MTSRC) .
    ...
    ?PL(n-2) skos:broader ?PSup .
  }
  UNION
  ...
  UNION
  {
    ?obs IRI(DL_MTSRC) ?PSup .
  }
}
GROUP BY ?PC1 ?PSup</preformat>
      <preformat>SELECT ?prodId ?country (SUM(?qty) AS ?SalesQty)
WHERE {
  ?obs rdf:type qb:Observation .
  ?obs qb:dataSet ex:Sales .
  ?obs ex:quantity ?qty .
  ?obs ex:Products ?prodId .
  ?country rdf:type ex:Country .
  {
    ?obs ex:Geography ?geo1 .
    ?geo1 skos:broader ?geo2 .
    ?geo2 skos:inScheme ex:HGeo .
    ?geo2 skos:broader ?country .
  }
  UNION
  {
    ?obs ex:Geography ?geo1 .
    ?geo1 skos:broader ?country .
  }
  UNION
  {
    ?obs ex:Geography ?country .
  }
}
GROUP BY ?prodId ?country</preformat>
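      <p>The UNION query of Table 3 has one branch per possible depth of the target level, from n-1 skos:broader hops down to an observation pointing directly at the target member. The Python sketch below generates these branches for a target at level n. The function name noncovering_rollup_query is hypothetical, all names are prefixed IRIs, PREFIX declarations are omitted, and a single measure is assumed.</p>
      <preformat>
```python
def noncovering_rollup_query(fact, dim_c, dim_l, hierarchy, lvl_sup, n,
                             measure, agg="SUM"):
    """Table 3 pattern: one UNION branch per possible number of
    skos:broader hops (n-1, n-2, ..., 0) between the observation and ?PSup."""
    if 1 > n:
        raise ValueError("n must be at least 1")
    branches = []
    for hops in range(n - 1, -1, -1):
        if hops == 0:
            # Case (c): the observation points directly at the target level.
            lines = [f"?obs {dim_l} ?PSup ."]
        else:
            lines = [f"?obs {dim_l} ?PL1 ."]
            for k in range(1, hops + 1):
                target = "?PSup" if k == hops else f"?PL{k + 1}"
                lines.append(f"?PL{k} skos:broader {target} .")
                if target != "?PSup":
                    # Intermediate parameters must belong to the hierarchy.
                    lines.append(f"{target} skos:inScheme {hierarchy} .")
        branches.append("{ " + " ".join(lines) + " }")
    union = "\n  UNION\n  ".join(branches)
    return (
        f"SELECT ?PC1 ?PSup ({agg}(?mv) AS ?mes)\nWHERE {{\n"
        "  ?obs rdf:type qb:Observation .\n"
        f"  ?obs qb:dataSet {fact} .\n"
        f"  ?obs {measure} ?mv .\n"
        f"  ?obs {dim_c} ?PC1 .\n"
        f"  ?PSup rdf:type {lvl_sup} .\n"
        f"  {union}\n}}\nGROUP BY ?PC1 ?PSup"
    )


print(noncovering_rollup_query("ex:Sales", "ex:Products", "ex:Geography",
                               "ex:HGeo", "ex:Country", 3, "ex:quantity"))
```
      </preformat>
      <p>One caveat applies to both this sketch and the query it mimics: if the data contained redundant skos:broader links (for example, a city linked both to its region and directly to its country), an observation could match two branches and its measure would be aggregated twice. The sketch does not guard against this.</p>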
      <p>To change the analysis criteria, the ROTATE operation replaces
one analysis axis by another, or the current hierarchy by another one within the same
dimension: $\mathrm{ROTATE}(MT_{SRC}, D_{old}, D_{new}, H^{D_{new}}_k) = MT_{RES}$, where $D_{old} \in \{D^L, D^C\}$ is
the dimension on which the operation is applied, $D_{new}$ is the dimension replacing
$D_{old}$, $H^{D_{new}}_k$ is the current hierarchy of $D_{new}$, and $MT_{RES} = (S_{SRC}, L_{RES}, C_{RES},
R_{SRC})$. When only the current hierarchy is replaced, $D_{old} = D_{new}$. In both cases, the
modified axis is positioned on the root parameter of the new current hierarchy. The SPARQL query
is generated as for the DISPLAY operation, but keeping the level of detail
specified for the unmodified axis.</p>
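      <p>The effect of ROTATE on the two axes can be stated compactly: the rotated axis is reset to the root parameter of the new hierarchy, while the other axis is left untouched. The Python sketch below expresses this rule; the Axis class, the rotate function, the root_of mapping and the hierarchies ex:HProd and ex:HTime with root ex:Day are hypothetical names used for illustration.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    dimension: str
    hierarchy: str
    level: str  # currently displayed parameter


def rotate(line, column, d_old, d_new, h_new, root_of):
    """Return the (line, column) axes of MT_RES after ROTATE.

    The rotated axis is reset to the root parameter of h_new, while the
    other axis keeps its current level. d_old == d_new is a hierarchy rotation."""
    new_axis = Axis(d_new, h_new, root_of[h_new])
    if line.dimension == d_old:
        return new_axis, column
    if column.dimension == d_old:
        return line, new_axis
    raise ValueError(f"{d_old} is not displayed in the table")


# Hypothetical root parameters of each hierarchy.
root_of = {"ex:HGeo": "ex:City", "ex:HArea": "ex:City",
           "ex:HProd": "ex:Product", "ex:HTime": "ex:Day"}
line = Axis("ex:Geography", "ex:HGeo", "ex:Region")
column = Axis("ex:Products", "ex:HProd", "ex:Product")

# Hierarchy rotation on Geography, then dimension rotation Products -> Time.
print(rotate(line, column, "ex:Geography", "ex:Geography", "ex:HArea", root_of))
print(rotate(line, column, "ex:Products", "ex:Time", "ex:HTime", root_of))
```
      </preformat>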
      <p>Finally, the SELECT operation (i.e., SLICE and DICE in the common OLAP
terminology) removes the data that do not satisfy a condition. A condition can
be applied on dimension attribute values or fact measure values: $\mathrm{SELECT}(MT_{SRC}, pred) = MT_{RES}$, where $pred = pred_1 \wedge \dots \wedge pred_t$ is a selection predicate on
the fact $F^S$ and/or its linked dimensions $D_i \in \mathit{Star}^{Cs}(F^S)$. The SPARQL query
implementing this operation is obtained by integrating the restriction
condition into the query producing the source multidimensional table ($MT_{SRC}$), using the SPARQL FILTER operator, which applies a logical
condition to filter query results.</p>
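      <p>Integrating a restriction into an existing query amounts to adding a FILTER inside its WHERE group. The Python sketch below does this by plain string manipulation, under the assumption that the query has a single outer WHERE group and that no closing brace appears after it (as in the DISPLAY query of the Sales example); the function name add_filter is hypothetical, and a real generator would more likely add the FILTER while building the query.</p>
      <preformat>
```python
def add_filter(query, condition):
    """Insert FILTER (condition) at the end of the outer WHERE group.

    Assumes the last closing brace of the query closes the WHERE group,
    i.e. nothing after it (GROUP BY, ORDER BY) contains braces."""
    head, sep, tail = query.rpartition("}")
    if not sep:
        raise ValueError("query has no WHERE group")
    return f"{head}  FILTER ({condition})\n}}{tail}"


base = (
    "SELECT ?prodId ?city (SUM(?qty) AS ?qtySales)\n"
    "WHERE {\n"
    "  ?obs rdf:type qb:Observation .\n"
    "  ?obs qb:dataSet ex:Sales .\n"
    "  ?obs ex:Products ?prodId .\n"
    "  ?obs ex:Geography ?city .\n"
    "  ?obs ex:quantity ?qty .\n"
    "}\n"
    "GROUP BY ?prodId ?city"
)
print(add_filter(base, "?city = ex:Toulouse"))
```
      </preformat>
      <p>A FILTER placed inside WHERE is evaluated on individual observations, before grouping. A restriction on aggregated values (for example, keeping only groups whose total quantity exceeds a threshold) would instead use a HAVING clause after GROUP BY in SPARQL 1.1.</p>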
    </sec>
    <sec id="sec-4">
      <title>Prototype and experiments</title>
      <p>In this section, we present the prototype we implemented in order to
validate the conceptual approach, and the experiments carried out with this
prototype. The prototype has been implemented using the Microsoft .NET Framework
and the dotNetRDF API (http://www.dotnetrdf.org/). To allow the system to load the multidimensional schema
modelling the analysis needs, it is necessary to specify the data set structure
definitions (qb:DataStructureDefinition) and the hierarchical specification of
dimensions according to the model described above. This operation (step 1) is
performed by the first main module, the Multidimensional Model Loader. Once
the schema is loaded, OLAP queries can be manipulated (step 2).
The second module is the SPARQL Generator. It is responsible for generating
SPARQL queries from the OLAP manipulations specified as input (step 3). Using
the dotNetRDF API, the generated SPARQL queries are run on the data set (step
4). Query results are then presented to the user (step 5). The RDF/XML, Turtle
and N3 RDF syntaxes are supported by the prototype. Figure 3 shows a
screenshot of our prototype. The user interface allows the user to specify the analysis needs
and shows the results returned after querying the data sources. The generated
SPARQL queries are also shown to the user, together with their processing time.</p>
      <p>We have conducted our experiments on an RDF data set about the annual
producer price indicators of industrial products (CA 1996) from the Statistical Office of the Republic
of Serbia (http://wiki.planet-data.eu/web/Annual producer price indicatores of industrial products CA 1996 from Statistical Office of the Republic of Serbia), with 789 attribute instances and 156 observations. It
contains two dimensions: a temporal one and a geographical one (covering Serbia). The hierarchical
structure of the data is given in associated dictionaries providing a temporal
hierarchy and information on the regions and municipalities of Serbia. Initially, observation
values (measures) are given at the Year level of the temporal
dimension and at the country level of the geographical dimension (only one country:
Serbia). We modified the observation values according to different levels of
the dimension hierarchies in order to carry out our experiments on
manipulating hierarchies and to make some non-covering data available. This first data set
contains only two dimensions; however, OLAP operations such as ROTATE
require more than two dimensions for performing dimension rotations, and more
than one hierarchy per dimension for performing hierarchy rotations. Hence, a
second data set has been used. It has 69888 observations and 1191 attribute instances.
This data set has been obtained by converting a synthetic
multidimensional database generated in our team to RDF format. The conversion tool
is currently limited to this specific database. We plan to develop a generic
version of it to handle any multidimensional data set.</p>
      <sec id="sec-4-1">
        <p>The evaluation of our approach has been limited to OLAP manipulations
on the two data sets described above. More specifically, we
evaluated the adequacy and the correctness of the results of a sequence of
applied operations. Starting from a DISPLAY operation, we were able to correctly aggregate
data at a specific level of granularity, rotate dimensions and
hierarchies, and select parts of a multidimensional table. We did not evaluate the runtime
of the operations with respect to the number of observations in the
data sets. Further evaluation on different data sets remains to be carried out.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Related work</title>
      <p>
        Traditionally, OLAP analysis operates on data obtained from multiple and
heterogeneous sources. In order to organise data coming from these sources into a
multidimensional structure, a pipeline of extraction, transformation and loading
(ETL) is usually carried out. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], an ETL module transforms RDF data
(described using the RDF Data Cube Vocabulary) into a multidimensional model. The
resulting structure is then manipulated using the Mondrian OLAP system and
MDX queries. Hence, the OLAP manipulations are performed on the
multidimensional data source and not directly on the RDF data collection. The
drawback of this approach is that the ETL process has to be repeated in order to propagate
changes in the raw data. To overcome this drawback, further proposals
[
        <xref ref-type="bibr" rid="ref14 ref3 ref8">14, 3, 8</xref>
        ] deal with the direct manipulation of RDF data via SPARQL queries. In
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], an approach for online analysis of RDF triples has been proposed, based on a
specifically designed storage system. The aim is to efficiently manage large
collections of RDF data. A specific query mechanism extends SPARQL, but
multidimensional modelling of RDF data is not considered in this solution. In
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the authors define an RDF vocabulary called Open Cubes (OC) for the
multidimensional modelling of RDF data. OC provides a set of classes and properties
to model the different structures of the multidimensional model (dimensions,
attributes, measures), including hierarchical relationships between attributes of
dimensions. From RDF collections described using OC, different OLAP
manipulations can be performed directly via queries expressed in SPARQL. Although
this solution is based on a multidimensional modelling of RDF data and allows
OLAP operations to be expressed in SPARQL, its main limitation is the
use of a specific, non-standard vocabulary. The RDF Data Cube Vocabulary,
by contrast, is supported by the W3C, which explains its use by several
published collections of statistical data intended for subsequent analysis. Tools supporting
the publication of such statistical data have been proposed. This is the case
of OLAP2Cube and CSV2Cube, presented in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which generate
RDF collections using the RDF Data Cube vocabulary from multidimensional
databases implemented on relational data. In [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], a process for identifying data
sources for publishing statistical linked data following the RDF Data Cube
vocabulary is also presented. Domain ontologies are used to provide semantic
meaning to the data cube.
      </p>
      <p>
        Comparing the two main categories of approaches presented above, the
approaches based on direct manipulation of RDF data ([
        <xref ref-type="bibr" rid="ref14 ref3 ref8">14, 3, 8</xref>
        ]) are more
advantageous in terms of flexibility and adaptation to the specificities of data published
on the Web. However, the main drawback of [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is the
use of non-standard vocabularies. With regard to the work described in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
although it is based on the RDF Data Cube Vocabulary, it takes into account neither
multi-level hierarchical structures nor multiple hierarchies in
a dimension. Our approach supports both. Contrary to [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
we do not implement any kind of pre-aggregation for optimising query execution.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Final remarks and future work</title>
      <p>
        This paper has presented a formalisation of a multidimensional model in terms
of RDF data described following the RDF Data Cube, SKOS and RDFS
vocabularies. On the basis of this model, we defined a mechanism for translating common
OLAP operations into SPARQL queries. Two important aspects addressed in
this paper are the ability to represent multiple hierarchies in a dimension and
the ability to handle cases where the hierarchical structures are not fully
covered at the instance level (a common case in real data). We implemented a
prototype in order to experiment with and validate our proposal. A weak point of
our work is the evaluation, which has been mainly based on the correctness of query
results from a sequence of applied OLAP operations. We see several
opportunities for future work. First, we plan to study the ability to express, through
SPARQL queries, more advanced OLAP manipulations using the operators
described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Second, we intend to focus on the performance and optimisation of
query execution by means of pre-aggregations. Third, RDF data basically represent
resources referenced by links on the Web. A point to study would be the
ability to exploit these interconnections between resources in order to
automatically associate more data and extend the information available in the initial data
sources. Finally, XKOS could be further investigated in our formalisation.
      </p>
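      <p>
        The following Python sketch illustrates two hierarchy-related aspects. It rolls a SUM measure up one of several hierarchies of a single dimension. Base members whose parent link is missing at some level are grouped under a placeholder member. This is a minimal illustration under stated assumptions, not the SPARQL-based mechanism proposed in this paper. The observations, the hierarchy names, the child-to-parent mappings and the placeholder label "(not covered)" are all hypothetical. Grouping uncovered members under a placeholder is only one possible way of handling incomplete hierarchies.
      </p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical observations: (member at the base "city" level, measure value).
OBSERVATIONS = [
    ("Toulouse", 120.0),
    ("Bordeaux", 80.0),
    ("Lyon", 50.0),
]

# Two hypothetical hierarchies over the same dimension, each stored as a
# child-to-parent mapping (in RDF, such links could be skos:broader triples).
HIERARCHIES = {
    "administrative": {
        "Toulouse": "Occitanie",
        "Bordeaux": "Nouvelle-Aquitaine",
        # No parent recorded for Lyon: this hierarchy is not fully covered.
        "Occitanie": "France",
        "Nouvelle-Aquitaine": "France",
    },
    "geographic": {
        "Toulouse": "South-West",
        "Bordeaux": "South-West",
        "Lyon": "South-East",
    },
}

# Hypothetical placeholder member used when a parent link is missing.
NOT_COVERED = "(not covered)"


def ancestor(member, parents, steps):
    """Climb `steps` levels up; return NOT_COVERED as soon as a link is missing."""
    for _ in range(steps):
        if member not in parents:
            return NOT_COVERED
        member = parents[member]
    return member


def roll_up(observations, hierarchy, steps):
    """SUM the measure per ancestor found `steps` levels above the base level."""
    parents = HIERARCHIES[hierarchy]
    totals = defaultdict(float)
    for member, value in observations:
        totals[ancestor(member, parents, steps)] += value
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(OBSERVATIONS, "administrative", 2))
    print(roll_up(OBSERVATIONS, "geographic", 1))
```
      </preformat>
      <p>
        With these hypothetical data, rolling up two levels along the administrative hierarchy yields {'France': 200.0, '(not covered)': 50.0}. Rolling up one level along the geographic hierarchy yields {'South-West': 200.0, 'South-East': 50.0}, because Lyon has a parent only in the latter hierarchy.
      </p>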
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Dayal</surname>
          </string-name>
          .
          <article-title>An overview of data warehousing and OLAP technology</article-title>
          .
          <source>SIGMOD Rec</source>
          .,
          <volume>26</volume>
          (
          <issue>1</issue>
          ):
          <fpage>65</fpage>
          –
          <lpage>74</lpage>
          , Mar.
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>E. F.</given-names>
            <surname>Codd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Codd</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Salley</surname>
          </string-name>
          .
          <article-title>Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate</article-title>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>L.</given-names>
            <surname>Etcheverry</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Vaisman</surname>
          </string-name>
          .
          <article-title>Enhancing OLAP Analysis with Web Cubes</article-title>
          .
          <source>In ESWC</source>
          , pages
          <fpage>469</fpage>
          –
          <lpage>483</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Gillman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cotton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jaques</surname>
          </string-name>
          .
          <article-title>eXtended Knowledge Organization System (XKOS)</article-title>
          .
          <source>METIS, Work Session on Statistical Metadata</source>
          , Geneva, May
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Layman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Reichart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Venkatrao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pellow</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Pirahesh</surname>
          </string-name>
          .
          <article-title>Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals</article-title>
          .
          <source>Data Min. Knowl. Discov.</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <fpage>29</fpage>
          –
          <lpage>53</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>B.</given-names>
            <surname>Kämpgen</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          .
          <article-title>Transforming statistical linked data for use in OLAP systems</article-title>
          .
          <source>In I-SEMANTICS</source>
          , pages
          <fpage>33</fpage>
          –
          <lpage>40</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B.</given-names>
            <surname>Kämpgen</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          .
          <article-title>No Size Fits All – Running the Star Schema Benchmark with SPARQL and RDF Aggregate Views</article-title>
          .
          <source>In ESWC</source>
          , pages
          <fpage>290</fpage>
          –
          <lpage>304</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>B.</given-names>
            <surname>Kämpgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>O'Riain</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          .
          <article-title>Interacting with Statistical Linked Data via OLAP Operations</article-title>
          .
          <source>In Proceedings of Interacting with Linked Data (ILD 2012), Workshop co-located with the 9th ESWC</source>
          , pages
          <fpage>36</fpage>
          –
          <lpage>49</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>R.</given-names>
            <surname>Kimball</surname>
          </string-name>
          .
          <article-title>The data warehouse toolkit: practical techniques for building dimensional data warehouses</article-title>
          . John Wiley &amp; Sons, Inc., New York, NY, USA,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>E.</given-names>
            <surname>Malinowski</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Zimányi</surname>
          </string-name>
          .
          <article-title>OLAP Hierarchies: A Conceptual Perspective</article-title>
          .
          <source>In Advanced Information Systems Engineering</source>
          , pages
          <fpage>477</fpage>
          –
          <lpage>491</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Jensen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Dyreson</surname>
          </string-name>
          .
          <article-title>A foundation for capturing and querying complex multidimensional data</article-title>
          .
          <source>Inf. Syst.</source>
          ,
          <volume>26</volume>
          (
          <issue>5</issue>
          ):
          <fpage>383</fpage>
          –
          <lpage>423</lpage>
          , July
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>F.</given-names>
            <surname>Ravat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Teste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tournier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Zurfluh</surname>
          </string-name>
          .
          <article-title>Algebraic and graphic languages for OLAP manipulations</article-title>
          .
          <source>IJDWM</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>17</fpage>
          –
          <lpage>46</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>P. E. R.</given-names>
            <surname>Salas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M. D.</given-names>
            <surname>Mota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Breitman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <article-title>Publishing statistical data on the web</article-title>
          .
          <source>Int. J. Semantic Computing</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ):
          <fpage>373</fpage>
          –
          <lpage>388</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>B.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yuan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Scalable online analysis of semantic web data</article-title>
          .
          <source>Semantic Web Challenge</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>A.</given-names>
            <surname>Zancanaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Pizzol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>de Moura Speroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Todesco</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. O.</given-names>
            <surname>Gauthier</surname>
          </string-name>
          .
          <article-title>Publishing multidimensional statistical linked data</article-title>
          .
          <source>In Proceedings of the Fifth International Conference on Information, Process, and Knowledge Management</source>
          , pages
          <fpage>290</fpage>
          –
          <lpage>304</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>