<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards an RDF Analytics Language: Learning from Successful Experiences</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fadi Maali</string-name>
          <email>fadi.maali@deri.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Decker</string-name>
          <email>stefan.decker@deri.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Digital Enterprise Research Institute</institution>
          ,
          <addr-line>NUI Galway</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>SPARQL, the W3C standard querying language for RDF, provides rich capabilities for slicing and dicing RDF data. The latest version, SPARQL 1.1, added support for aggregation, nested and distributed queries among others. Nevertheless, the purely declarative nature of SPARQL and the lack of support for common programming patterns, such as recursion and iteration, make it challenging to perform complex data processing and analysis in SPARQL. In the database community, similar limitations of SQL resulted in a surge of proposals of analytics languages and frameworks. These languages are carefully designed to run on top of distributed computation platforms. In this paper, we review these efforts of the database community, identify a number of common themes they share and discuss their applicability in the Semantic Web and Linked Data realm. In particular, design decisions related to the data model, schema restrictions, data transformation and the programming paradigm are examined and a number of related challenges for defining an RDF analytics language are outlined.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The cost of acquiring and storing data has dropped dramatically in
the last few years. Consequently, terabyte- and petabyte-scale datasets
are becoming commonplace, especially in industries such as telecom,
health care, retail, pharmaceuticals and financial services. This
collected data is playing a crucial role in societies, governments and
enterprises. For instance, data science is increasingly utilised in
supporting data-driven decisions and in delivering data products [
        <xref ref-type="bibr" rid="ref16 ref20">16, 20</xref>
        ].
Furthermore, scientific fields such as bioinformatics, astronomy and
oceanography are going through a shift from "querying the world" to
"querying the data" in what is commonly referred to as e-science [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
The main challenge nowadays is analysing the data and extracting
useful insights from it.
      </p>
      <p>
        In order to process the massive amounts of data available, a
number of frameworks were built on top of distributed clusters of
commodity machines. In 2004, Google introduced the MapReduce
framework [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and its open-source implementation, Hadoop<sup>1</sup>, came out in
2007. Microsoft also introduced Dryad [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], its own distributed
computation engine. Furthermore, there has also been a surge of
activity on layering distributed and declarative programming languages
on top of these platforms. Examples include PIG Latin from
Yahoo [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], DryadLINQ from Microsoft [29], Jaql from IBM [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and
Meteor/Sopremo [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        While analytics languages aim to utilise the high scalability of
distributed platforms, they also aim to increase developer
productivity by facilitating flexible ad hoc data analysis and exploration. It
is increasingly recognised that SQL, the main database query
language, has a number of limitations that restrict its utility in analytics
and complex data processing scenarios [
        <xref ref-type="bibr" rid="ref19 ref24 ref9">9, 19, 29, 24</xref>
        ]. SQL limitations
include: (i) a very restrictive type system; (ii) common programming
patterns such as iteration and recursion cannot be expressed directly
in SQL; (iii) processing data requires importing and formatting the
data into a normalised relational format; and (iv) programmers often find
it unnatural to analyse data by writing purely declarative queries in
SQL instead of writing imperative scripts.
      </p>
      <p>
        A close parallel can be drawn in the Semantic Web and Linked
Data realm. The size of available RDF data is increasing and massive
datasets are becoming commonplace. The 1.2 billion triples of
Freebase can be freely downloaded<sup>2</sup> and the LOD Cloud had grown to 31 billion
RDF triples as of September 2011<sup>3</sup>. Furthermore, distributed
execution platforms are being utilised to process RDF data, particularly
for building query engines that support (part of) SPARQL [
        <xref ref-type="bibr" rid="ref18 ref22 ref8">18, 8,
22</xref>
        ] and for reasoning [
        <xref ref-type="bibr" rid="ref15 ref27">28, 27, 15</xref>
        ]. However, there has not been much
activity in introducing high-level languages to support RDF
analytics and processing. While general-purpose languages, such as PIG
Latin and HiveQL, can be used, they are not tailored to address the
peculiarities of the RDF data model and do not utilise its
strengths. We contend that SPARQL alone is also not sufficient as it
suffers from the same restrictions that SQL has.
        <fn><label>1</label><p>http://hadoop.apache.org/core/</p></fn>
        <fn><label>2</label><p>https://developers.google.com/freebase/data</p></fn>
        <fn><label>3</label><p>http://lod-cloud.net/state/</p></fn>
      </p>
      <p>In this paper, we present lessons that can be learned from existing
efforts towards building an analytics language on top of RDF. We
review a number of high-level analytics languages proposed in the
big data and database communities. We identify five common themes
they share. For each of these themes, we present our observation,
discuss corresponding efforts in the Semantic Web community and
present pertinent challenges (section 2). We also discuss some further
characteristics of RDF and Linked Data that can prove useful for
an analytics language (section 3).</p>
    </sec>
    <sec id="sec-2">
      <title>Common Themes of High-level Languages</title>
      <p>Data model<sup>4</sup></p>
      <p>Observation: Adoption of "relaxed" versions of the relational data
model.</p>
      <p>
        A large number of data models that relax the constraints of the
relational data model have been proposed and adopted recently,
particularly in the context of big data. MapReduce [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] uses a simple
key-value pair data model. PIG [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and HiveQL [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] support tuples
but allow richer data types such as arrays, lists and maps. Jaql [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
uses a nested data model very similar to JSON. Cattell presented a
survey of data models used in SQL and NoSQL data stores [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        RDF is the data model underlying Semantic Web data. RDF is
a graph-based data model that consists of a set of triples. There
have been a number of proposals for a more abstract view of RDF
data. Ding et al. proposed the notion of RDF molecule [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to
decompose RDF graphs into components that are coarser-grained than
triples. Carroll et al. introduced Named Graphs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] as a way to group
and describe a set of RDF triples. Ravindra et al. introduced Nested
Triple Group to refer to a set of RDF triples sharing the same subject
or object [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
        <fn><label>4</label><p>A data model consists of a notation to describe data and a set of operations used
to manipulate that data [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. We address the operators separately in the next
subsection.</p></fn>
      </p>
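      <p>To make the idea of a subject-centred group of triples more concrete, the following is a minimal Python sketch that turns a flat list of triples into nested records keyed by subject. It is an illustration under our own assumptions only: the sample triples, the prefixed names and the helper group_by_subject are hypothetical and are not taken from RAPID+ or from any other system cited here.</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
]


def group_by_subject(triples):
    """Return {subject: {predicate: [objects]}} for an iterable of triples."""
    groups = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        groups[subject][predicate].append(obj)
    # Convert to plain dicts so the result compares and prints cleanly.
    return {s: dict(props) for s, props in groups.items()}


records = group_by_subject(TRIPLES)
for subject, properties in records.items():
    print(subject, properties)
```
      </preformat>
      <p>Each resulting record can be handed to a separate worker, which hints at why such groupings are convenient for partitioned processing. Grouping by object instead would only require swapping the key used in the loop.</p>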
      <p>
        We argue that an RDF analytics language requires defining a
data model that abstracts the data at a higher level than individual
triples. This introduces a number of challenges and design choices:
whether to support a notion of records or tuples, whether to support
nested data structures and whether to support collection types such
as sets and arrays. Nested data structures simplify data manipulation
and processing. Additionally, a collection of nested data is easier to
partition and process in parallel. On the other hand, adopting
a nested data structure on top of RDF requires forcing the RDF
graph into a set of trees. This approach was adopted by Tabulator
for intuitive presentation of RDF graphs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and by RAPID+ [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]
to enhance the performance of processing RDF on top of Hadoop.
JSON-LD<sup>5</sup> also encodes RDF as a tree but additionally extends the
semantics of JSON by allowing references to other objects in a
manner similar to the use of rdf:resource in the RDF/XML serialisation (i.e.
nesting with references). It remains to be seen which option, pure
nesting, nesting with references or pure referencing, would prove best in
the context of a data model for an analytics language.
      </p>
      <p>Challenge: Define a data model on top of RDF that simplifies
manipulating data and works at a higher level than individual triples.
      </p>
      <sec id="sec-2-1">
        <title>Data processing operators</title>
        <p>Observation: Supporting only a subset of relational algebra, focusing
on operators that can be easily executed in a distributed architecture.</p>
        <p>
          There have been a number of proposals to support SQL on top
of the MapReduce framework. However, many of those proposals chose
not to support the full relational algebra underlying SQL. HiveQL,
for instance, supports only equality predicates in a join. Similar
restrictions are included in SCOPE [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and PIG Latin.
        </p>
        <p>Challenge: Define a subset of the SPARQL algebra that is sufficient to
address most common needs and is amenable to distributed execution.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Programming paradigm</title>
        <p>Observation: A shift from purely declarative languages towards hybrid
ones.
          <fn><label>5</label><p>http://www.w3.org/TR/json-ld/</p></fn></p>
        <p>Declarative languages are abstract, concise, easier for domain
experts and provide opportunities for optimisation. Nevertheless, they
are not always preferred by programmers. It is often hard
to fit complex needs into a single query. Imperative scripts also allow
programmers to pin down an execution plan to exploit optimisation
opportunities that automatic optimisers might miss.</p>
        <p>Increasingly, declarative languages are enriched with features from
other paradigms such as imperative, functional and logic-based. PIG
Latin adopts a hybrid imperative-declarative programming paradigm.
Jaql and Cascalog<sup>6</sup> adopt features from functional programming.</p>
        <p>
          On the other hand, most of the languages utilised in the
Semantic Web and Linked Data realms are purely declarative languages.
Examples include R2RML<sup>7</sup> for mapping relational databases to RDF,
SPIN<sup>8</sup> to define rules in SPARQL and the languages defined as part
of the Linked Data Integration Platform (R2R for schema
mapping [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], Silk LSL for data interlinking [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and Sieve configuration
for data fusion and quality definition [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]). Another strand of related
work is embedding RDF manipulation in common object-oriented
languages such as ActiveRDF<sup>9</sup> for Ruby and SuRF<sup>10</sup> for Python.
These approaches inherit the expressivity of the general-purpose
programming language they are embedded in, but they still handle RDF
data in a detailed low-level manner.
        </p>
        <p>Challenge: Adopt a hybrid declarative-imperative programming paradigm
for RDF processing.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Schema</title>
        <p>Observation: From rigid schemas to partial or no schemas.</p>
        <p>Relational databases require the schema to be designed before
any data can be added to the database (a.k.a. schema first). It is
generally not easy to change the schema, and data that does not strictly
adhere to the schema cannot be added to the database. There are a
number of advantages to schema specification, including data
validation, transactional consistency guarantees, static type checking and
optimisation. Nevertheless, requiring a predefined rigid schema can
be overkill, particularly for ill-defined ad hoc analytics tasks. In
this context, users want to start working with the data right away
in an exploratory read-only manner. Consequently, schema-on-read
is increasingly adopted. PIG and Jaql support partial schema
definition and allow the schema definition to evolve as users interact
with the data.
          <fn><label>6</label><p>https://github.com/nathanmarz/cascalog</p></fn>
          <fn><label>7</label><p>http://www.w3.org/TR/r2rml/</p></fn>
          <fn><label>8</label><p>http://www.w3.org/Submission/spin-overview/</p></fn>
          <fn><label>9</label><p>http://activerdf.org/</p></fn>
          <fn><label>10</label><p>https://code.google.com/p/surfrdf/</p></fn></p>
        <p>
          RDF data is self-describing in the sense that (a significant part
of) the schema is explicitly encoded in the data and can be extracted.
However, an essential task in consuming RDF data from different
sources is schema mapping [
          <xref ref-type="bibr" rid="ref23 ref3">23, 3</xref>
          ]. Schema mapping exposes a
homogeneous model to facilitate efficient consumption and analysis of
the data. The current practice of schema mapping is similar to the
schema-first approach of relational databases (i.e. a full schema
mapping needs to be defined and executed before data consumption can
start). We argue for supporting partial and evolving schema mapping
while interacting with RDF data.
        </p>
        <p>Challenge: Support partial and evolving schema mapping while
interacting with RDF data.</p>
      </sec>
      <sec id="sec-2-4">
        <title>In-situ data processing</title>
        <p>Observation: in-place processing has become an important tool for
dealing with data.</p>
        <p>The increasing volume of data generated by applications has
added constraints on how easily and efficiently it can be processed.
Requiring data to be moved before it can be processed, especially
for read-only analytics tasks, is not a viable mechanism at
extreme scale. Therefore, processing data in place is increasingly
supported. In-situ data processing also covers processing
data coming from different locations and in different formats. It is
common for analytics languages to support plain text, HDFS files,
JSON and databases. In contrast, most RDF tools require full transformation
and materialisation of data into RDF before it can be processed
(with R2RML being a notable positive exception).</p>
        <p>Challenge: Support in-situ RDF data processing and in-place
transformation of non-RDF data.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Further characteristics of RDF and Linked Data</title>
      <p>Linked Data has a number of additional characteristics that should
be utilised in the design of an analytics language. In the following,
we go through some of these characteristics.</p>
      <p>HTTP accessibility: Deploying RDF data as Linked Data makes it
available via the Web and interlinked with related information.
RDF processing languages should support retrieving data over the Web
and following links.</p>
      <p>
        Graph-based nature: This introduces an opportunity to support graph
traversal and graph algorithms on top of the RDF data. SPARQL
1.1 property paths provide initial support in this regard. However,
other languages such as Gremlin<sup>11</sup> and Green-Marl [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] provide
richer graph traversal capabilities and support for breadth-first
and depth-first traversal. These features can be embedded in an
RDF analytics language.
      </p>
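      <p>As a rough illustration of the kind of traversal discussed above, the following Python sketch performs a breadth-first traversal over a small set of triples, following a single predicate. The data, the prefixed names and the function bfs are hypothetical and only serve as an example; this is not how Gremlin, Green-Marl or a SPARQL engine is implemented.</p>
      <preformat>
```python
from collections import deque

# Hypothetical sample graph expressed as (subject, predicate, object) triples.
EDGES = [
    ("ex:a", "foaf:knows", "ex:b"),
    ("ex:a", "foaf:knows", "ex:c"),
    ("ex:b", "foaf:knows", "ex:d"),
    ("ex:c", "foaf:knows", "ex:d"),
    ("ex:d", "foaf:name", "D"),
]


def bfs(triples, start, predicate):
    """Return nodes reachable from start via predicate, in breadth-first order."""
    # Build an adjacency list restricted to the chosen predicate.
    adjacency = {}
    for s, p, o in triples:
        if p == predicate:
            adjacency.setdefault(s, []).append(o)
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


order = bfs(EDGES, "ex:a", "foaf:knows")
print(order)
```
      </preformat>
      <p>The set of visited nodes corresponds to what a property path such as foaf:knows* would match starting from ex:a, whereas the visiting order and the choice between breadth-first and depth-first strategies are not something a property path exposes.</p>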
      <p>Inference: RDF has a formally defined semantics that can be used
for inferencing. Inferencing enriches the data and makes
implicit relations and facts explicit. An RDF processing language
should include some support for basic inference tasks. However,
a trade-off between inference capabilities and performance is
inevitable.</p>
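      <p>A basic inference task of this kind can be sketched as follows. The Python example below applies two RDFS entailment rules, rdfs9 (instances inherit superclasses) and rdfs11 (transitivity of rdfs:subClassOf), until no new triple is derived. The sample facts and the function infer are hypothetical, and the naive nested loop is chosen for readability rather than performance.</p>
      <preformat>
```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical sample facts as (subject, predicate, object) triples.
FACTS = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", TYPE, "ex:Student"),
}


def infer(triples):
    """Apply RDFS rules rdfs9 and rdfs11 until no new triple is derived."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # Only pair a triple with a subClassOf statement about its object.
                if p2 != SUBCLASS or s2 != o:
                    continue
                if p == SUBCLASS:
                    new.add((s, SUBCLASS, o2))  # rdfs11: transitivity
                elif p == TYPE:
                    new.add((s, TYPE, o2))  # rdfs9: type propagation
        if new.issubset(closure):
            return closure
        closure |= new


inferred = infer(FACTS)
for triple in sorted(inferred - FACTS):
    print(triple)
```
      </preformat>
      <p>Even this small rule set needs repeated passes over the data to reach a fixpoint, which illustrates why richer inference quickly becomes costly on large datasets.</p>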
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>The Linked Data community has been very successful in publishing
large amounts of useful data, as evidenced by the growth of the LOD
Cloud. Further emphasis is being put on building applications that
utilise this data. The interlinked nature of RDF data, along with its
clearly defined semantics, forms a great basis for rich analysis
and for distilling valuable insights from this data.
        <fn><label>11</label><p>https://github.com/tinkerpop/gremlin/wiki</p></fn></p>
      <p>In this paper, we focused on a number of lessons that can be
learned from existing efforts on designing analytics languages and on
identifying some of the challenges ahead. Our current work focuses
on defining use cases where SPARQL and other existing approaches
for processing RDF data fall short. These use cases, along with the
design clues outlined in this paper, will be used to inform the
design, the implementation and the evaluation of an RDF analytics
language.</p>
      <p>Acknowledgements. Fadi Maali is funded by the Irish Research
Council, Embark Postgraduate Scholarship Scheme. The ideas in
this paper benefited from valuable discussions with Aidan Hogan
and Marcel Karnstedt and from the material of the "Introduction to
Data Science" course on Coursera by Bill Howe.</p>
      <p>28. J. Urbani, S. Kotoulas, E. Oren, and F. van Harmelen. Scalable Distributed
Reasoning Using MapReduce. In Proceedings of the 8th International Semantic Web
Conference, ISWC '09. Springer-Verlag, 2009.</p>
      <p>29. Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey.
DryadLINQ: a System for General-purpose Distributed Data-parallel Computing
Using a High-level Language. In Proceedings of the 8th USENIX conference on
Operating systems design and implementation. USENIX Association, 2008.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>T.</surname>
            Berners-lee,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Chilton</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Connolly</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Dhanaraj</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Hollenbach</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lerer</surname>
            , and
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Sheets</surname>
          </string-name>
          . Tabulator:
          <article-title>Exploring and Analyzing Linked Data on the Semantic Web</article-title>
          .
          <source>In Proceedings of the 3rd International Semantic Web User Interaction Workshop (SWUI06)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Beyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ercegovac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gemulla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Balmin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Eltabakh</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.-C. Kanne</surname>
            , F. Özcan, and
            <given-names>E. J.</given-names>
          </string-name>
          <string-name>
            <surname>Shekita</surname>
          </string-name>
          .
          <article-title>Jaql: A Scripting Language for Large Scale Semistructured Data Analysis</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>4</volume>
          (
          <issue>12</issue>
          ),
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Schultz</surname>
          </string-name>
          .
          <article-title>The R2R Framework: Publishing and Discovering Mappings on the Web</article-title>
          . In O.
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Harth</surname>
          </string-name>
          , and J. Sequeda, editors,
          <source>COLD</source>
          , volume
          <volume>665</volume>
          <source>of CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Volz</surname>
          </string-name>
          , G. Kobilarov, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Gaedke. Silk - A Link Discovery</surname>
          </string-name>
          <article-title>Framework for the Web of Data</article-title>
          . In 18th International WWW Conference,
          <year>April 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Carroll</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hayes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Stickler</surname>
          </string-name>
          . Named Graphs.
          <source>Journal of Web Semantics</source>
          ,
          <volume>3</volume>
          (
          <issue>3</issue>
          ),
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>R.</given-names>
            <surname>Cattell</surname>
          </string-name>
          .
          <article-title>Scalable SQL and NoSQL Data Stores</article-title>
          . SIGMOD Rec.,
          <volume>39</volume>
          (
          <issue>4</issue>
          ), May
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>R.</given-names>
            <surname>Chaiken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jenkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ramsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shakib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Weaver</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          . SCOPE:
          <article-title>Easy and Efficient Parallel Processing of Massive Data Sets</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .,
          <volume>1</volume>
          (
          <issue>2</issue>
          ), Aug.
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>H.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Son</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Sung</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y. D.</given-names>
            <surname>Chung</surname>
          </string-name>
          .
          <article-title>SPIDER: A System for Scalable, Parallel / Distributed Evaluation of Large-scale RDF Data</article-title>
          .
          <source>In Proceedings of the 18th ACM conference on Information and knowledge management</source>
          ,
          <source>CIKM '09. ACM</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghemawat</surname>
          </string-name>
          .
          <source>MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM</source>
          ,
          <volume>51</volume>
          (
          <issue>1</issue>
          ), Jan.
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. L.
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>P. P.</given-names>
          </string-name>
          da Silva, and
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness. Tracking RDF Graph</surname>
          </string-name>
          <article-title>Provenance using RDF Molecules</article-title>
          .
          <source>In Proceedings of the 4th International Semantic Web Conference</source>
          ,
          <year>November 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Heise</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Rheinlander, M. Leich,
          <string-name>
            <given-names>U.</given-names>
            <surname>Leser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Naumann</surname>
          </string-name>
          . Meteor/Sopremo:
          <article-title>An Extensible Query Language and Operator Model</article-title>
          .
          <source>In BigData Workshop</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>T.</given-names>
            <surname>Hey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tansley</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          K. Tolle, editors.
          <source>The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>S.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cha</surname>
          </string-name>
          , E. Sedlar, and
          <string-name>
            <given-names>K.</given-names>
            <surname>Olukotun</surname>
          </string-name>
          .
          <article-title>Green-Marl: a DSL for Easy and Efficient Graph Analysis</article-title>
          .
          <source>In Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII. ACM</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>M.</given-names>
            <surname>Isard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Budiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Birrell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Fetterly</surname>
          </string-name>
          .
          <article-title>Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks</article-title>
          .
          <source>In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, EuroSys '07</source>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
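          15.
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Qi</surname>
          </string-name>
          , and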
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>Large Scale Temporal RDFS Reasoning Using MapReduce</article-title>
          .
          <source>In AAAI</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>M.</given-names>
            <surname>Loukides</surname>
          </string-name>
          .
          <article-title>What is Data Science?</article-title>
          <source>O'Reilly Radar</source>
          ,
          <month>6</month>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          , H. Muhleisen, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Sieve: Linked Data Quality Assessment and Fusion</article-title>
          .
          <source>In Proceedings of the 2012 Joint EDBT/ICDT Workshops, EDBTICDT '12. ACM</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
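          <string-name>
            <given-names>J.</given-names>
            <surname>Myung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yeon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.-g.</given-names>
            <surname>Lee</surname>
          </string-name>
          .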
          <article-title>SPARQL Basic Graph Pattern Processing with Iterative MapReduce</article-title>
          .
          <source>In Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud, MDAC '10. ACM</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>C.</given-names>
            <surname>Olston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Reed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Tomkins</surname>
          </string-name>
          .
          <article-title>Pig Latin: a Not-So-Foreign Language for Data Processing</article-title>
          .
          <source>In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD '08. ACM</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>F.</given-names>
            <surname>Provost</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Fawcett</surname>
          </string-name>
          .
          <article-title>Data Science and its Relationship to Big Data and Data-Driven Decision Making</article-title>
          .
          <source>Big Data</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ), Mar.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>P.</given-names>
            <surname>Ravindra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Anyanwu</surname>
          </string-name>
          .
          <article-title>An Intermediate Algebra for Optimizing RDF Graph Pattern Matching on MapReduce</article-title>
          .
          <source>In Proceedings of the 8th Extended Semantic Web Conference on The Semantic Web: Research and Applications - Volume Part II, ESWC'11</source>
          . Springer-Verlag,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22. A. Schatzle, M.
          <string-name>
            <surname>Przyjaciel-Zablocki</surname>
            , and
            <given-names>G. Lausen.</given-names>
          </string-name>
          <article-title>PigSPARQL: Mapping SPARQL to Pig Latin</article-title>
          .
          <source>In Proceedings of the International Workshop on Semantic Web Information Management, SWIM '11. ACM</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          .
          <article-title>Ontology Matching: State of the Art and Future Challenges</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ),
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>M.</given-names>
            <surname>Stonebraker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Harizopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hachem</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Helland</surname>
          </string-name>
          .
          <article-title>The End of an Architectural Era: (It's Time for a Complete Rewrite)</article-title>
          .
          <source>In Proceedings of the 33rd international conference on Very large data bases</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>A.</given-names>
            <surname>Thusoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Sarma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chakka</surname>
          </string-name>
          ,
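          <string-name>
            <given-names>N.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Anthony</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and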
          <string-name>
            <given-names>R.</given-names>
            <surname>Murthy</surname>
          </string-name>
          .
          <article-title>Hive - a Petabyte Scale Data Warehouse Using Hadoop</article-title>
          .
          <source>In ICDE. IEEE</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>J.</given-names>
            <surname>Ullman</surname>
          </string-name>
          .
          <source>Principles of Database and Knowledge-base Systems</source>
          , chapter 2. Computer Science Press, Rockville,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>J.</given-names>
            <surname>Urbani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kotoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Maassen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Bal</surname>
          </string-name>
          .
          <article-title>OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples</article-title>
          .
          <source>In Proceedings of the 7th International Conference on The Semantic Web: Research and Applications - Volume Part I, ESWC'10</source>
          . Springer-Verlag,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>