<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Demonstration of GOOSE: A Secure Framework for Graph Outsourcing and SPARQL Evaluation?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Radu Ciucanu</string-name>
          <email>radu.ciucanu@insa-cvl.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascal Lafourcade</string-name>
          <email>pascal.lafourcade@uca.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>INSA Centre Val de Loire</institution>
          ,
          <addr-line>Univ. Orleans, LIFO EA 4022</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universite Clermont Auvergne, LIMOS CNRS UMR 6158</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We demonstrate GOOSE, an open-source framework for secure graph outsourcing and SPARQL evaluation. We showcase the workow of GOOSE over various real-world use cases, the scalability of GOOSE, and the security properties that GOOSE guarantees in the honest-butcurious cloud security model.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Enhancing Semantic Web technologies with security and privacy guarantees is an
important and popular problem [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Several systems have been proposed to tackle
di erent settings, from both security (e.g., [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) and privacy (e.g., [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) viewpoints.
      </p>
      <p>
        We take a complementary look by addressing the security issues that occur
when outsourcing an RDF graph to the cloud and querying the outsourced graph
with SPARQL. Our scenario is inspired by the database as a service cloud
computing model [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where a data owner outsources some data to the cloud, then a
user is allowed to submit queries to the cloud, which computes and returns the
query answers to the user. We assume that the cloud is honest-but-curious i.e.,
executes tasks dutifully, but tries to gain as much information as possible.
      </p>
      <p>
        We demonstrate GOOSE, an open-source framework that relies on
cryptographic schemes and secure multi-party computation to achieve desirable
security properties: (i) no cloud node can learn the graph, (ii) no cloud node can learn
at the same time the query and the query answers, and (iii) an external network
observer cannot learn the graph, the query, or the query answers. GOOSE has
been presented3 as a full paper at the DBSec 2020 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] conference. The goal of
this demo paper is to showcase GOOSE to the Semantic Web community. Indeed,
GOOSE is an innovative system that allows secure data outsourcing and query
evaluation relevant to popular Semantic Web technologies (RDF and SPARQL).
      </p>
      <p>In Sect. 2, we present an overview of GOOSE, whereas in Sect. 3 we describe
our demonstration scenarios. Due to lack of space, we omit several details
(related work, theoretical and empirical analysis) that can be found in the GOOSE
? Copyright c 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
3 https://www.youtube.com/watch?v=ZhtpulFf3rs</p>
      <p>(1) Enc(Q)
User
(4) Enc(Ans(G; Q))
(3) Enc(Ans(Gb; Qb))</p>
      <p>Answers
Translator
(2) Enc(Qb)</p>
      <p>SPARQL
Engine
(0) Enc( V )
(0) Enc( )
(0) Enc(Eb)</p>
      <p>Fig. 1. Architecture of GOOSE.</p>
      <p>TravelsTo</p>
      <p>Paris</p>
      <p>Alice</p>
      <p>ReadsAbout
TravelsTo</p>
      <p>Charlie</p>
      <p>Follows
Follows</p>
      <p>Follows
Follows</p>
      <p>Bob
David</p>
      <p>TravelsTo
ReadsAbout</p>
      <p>
        Milan
conference paper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The open-source code of GOOSE, as well as the di erent use
cases and data that we use throughout our demonstration scenarios are available
on GitHub 4.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>System Overview</title>
      <p>
        In Fig. 1, we show the architecture of GOOSE, which has 5 participants: data
owner (DO), who owns the graph that it outsources to the cloud in order to be
queried, user (U), who submits graph queries to the cloud and receives query
answers, and 3 cloud participants: query translator (QT), SPARQL engine (SE)
and answers translator (AT). Each Enc from Fig. 1 uses the AES [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] key shared
between the 2 concerned participants. We next outline GOOSE via an example.
Graph Data and Queries. An RDF5 graph is a set of triples (subject,
predicate, object). For the goal of this paper, we simply assume that a graph G=(V; E)
is a directed, edge-labeled graph, where V is a set of nodes and E V V is
a set of directed edges between nodes of V , with labels from an alphabet . The
graph in Fig. 2 has 6 nodes, an alphabet of 3 possible edge labels, and 9 edges.
      </p>
      <p>We focus on Unions of Conjunctions of Regular Path Queries (UCRPQ) that
are at the core of SPARQL 1.16, including recursive queries via Kleene star. By
Ans(G; Q) we denote the answers of query Q over a graph G, using standard
SPARQL semantics. For example, the UCRPQ (?x; ?z) (?x; Follows+; ?y),
(?y; TravelsTo; ?z) selects nodes ?x; ?z s.t. there exists node ?y s.t. one can go
from ?x to ?y with a path in the language of \Follows+" and can go from ?y to
?z with a path in the language of \TravelsTo". The answers of this query on the
4 https://github.com/radu1/goose
5 https://www.w3.org/TR/rdf11-concepts/
6 https://www.w3.org/TR/sparql11-query/
graph from Fig. 2 are (Alice, Milan), (Alice, Paris), (Bob, Milan), (Bob, Paris),
(David, Paris). For example, the tuple (Alice, Paris) is an answer because of
paths Alice Follows Bob Follows David Follows Charlie and Charlie TravelsTo
Paris, where ?x; ?!y; ?z are map!ped to Alice, C!harlie, Paris, respectively. !
Step 0. The graph outsourcing (i.e., the 3 outgoing arrows from DO in Fig. 1)
is done only once at the beginning by DO. Intuitively, DO sends to each cloud
participant a piece of G s.t. each participant can perform its task during query
evaluation but no participant can reconstruct the entire graph. To to so, DO
generates 2 random bijections: (for edge labels) and V (for graph nodes).
By 1 we denote the inverse of (this is needed later on at the end of query
evaluation). For our example graph in Fig. 2, DO may generate:
V =fAlice ! 5; Bob ! 3; Charlie ! 0; David ! 1; Milan ! 2; Paris ! 4g
=fFollows ! 1; ReadsAbout ! 2; TravelsTo ! 0g:
DO uses these 2 functions to hide graph edges: by Eb we denote the hidden set of
edges generated from E, where nodes are replaced using V , and edge labels are
replaced using . On our example in Fig. 2, edge (Alice, Follows, Bob) becomes
(5, 1, 3), edge (Alice, ReadsAbout, Paris) becomes (5, 2, 4), and nally:</p>
      <p>Eb = f(5;1;3); (5;2;4); (5;0;4); (3;1;5); (3;1;1); (3;0;2); (0;0;4); (1;1;0); (1;2;2)g:
Each message sent over the network is encrypted with the key shared between
DO and the corresponding cloud participant, which can decrypt the message
upon reception. Messages are encrypted to avoid that an external observer that
sees them in clear is able to learn the graph G. The distribution of graph storage
among cloud participants makes that none of them can learn the graph G.</p>
      <p>We next discuss query evaluation i.e., steps 1{4 cf. Fig. 1, done for each
query submitted by U. Each message exchanged over the network during query
evaluation is encrypted with the key shared between corresponding participants,
such that an external observer cannot learn the query and its answers.
Step 1. U submits query Q to QT. For example, recall the aforementioned query
(?x; ?z) (?x; Follows+; ?y); (?y; TravelsTo; ?z).</p>
      <p>Step 2. QT translates Q by replacing all labels used in Q using the function
received from DO. By Qb we denote the query Q translated using . On our
example, query from step 1 becomes (?x; ?z) (?x; 1+; ?y); (?y; 0; ?z).
Step 3. SE evaluates translated query Qb received from QT at step 2 on the
graph with hidden nodes and edge labels as de ned by Eb received from DO
during step 0. To do so, SE simply uses some standard SPARQL engine as a
black-box, without any change to the query engine7. We denote the result of SE
by Ans(Gb; Qb), where the true answers Ans(G; Q) are still hidden using function</p>
      <p>V . On our example, Ans(Gb; Qb) = f(5; 2); (5; 4); (3; 2); (3; 4); (1; 4)g.
Step 4. AT uses function V 1 to translate hidden query answers Ans(Gb; Qb) into
true query answers. On our example, AT recovers Ans(G; Q) =f(Alice, Milan),
(Alice, Paris), (Bob, Milan), (Bob, Paris), (David, Paris)g that AT sends to U.
7 In our implementation, we rely on Apache Jena https://jena.apache.org/</p>
    </sec>
    <sec id="sec-3">
      <title>Demonstration Scenarios</title>
      <p>
        We (i) introduce via examples the complete work ow of GOOSE and the class
of supported SPARQL queries using real-world scenarios, (ii) emphasize the
scalability of GOOSE, and (iii) point out the security properties of GOOSE and
the security model in which these properties hold.
(i) GOOSE by example. On the GitHub repository of GOOSE (URL given in
Sect 1), in the directory running-example, we included the script example.sh
that reproduces the running example from Sect. 2 and [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. To analyze the graph,
query, and query answers used by this script, see sub-directories example and
example-secure for standard and GOOSE versions, respectively. In particular,
the les from example-secure hide nodes and edges as outlined in Sect. 2. Notice
that we chose to initially specify the UCRPQ in an XML format and then
translate them in SPARQL. The aforementioned XML format and the translation
to SPARQL are based on gMark [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ], a state-of-the art generator of synthetic
graphs and UCRPQ workloads. Our choice is motivated by the observation that
GOOSE can be easily extended to secure graph outsourcing and UCRPQ
evaluation, regardless the practical language used to encode the UCRPQ. Indeed, one
can easily modify GOOSE to translate the UCRPQ in SQL with recursive views
instead of SPARQL and use PostgreSQL instead of Apache Jena, and hence
obtain a practical system guaranteeing exactly the same security properties.
      </p>
      <p>Going back to the demo, we stress that the aforementioned running example
script provided on the GOOSE repository can be easily run on a laptop. If one
tries the script and gets some error, it is likely that there are some missing
packages. GOOSE is written in Python, and uses Apache Jena (written in Java) for
SPARQL evaluation and gMark (written in C++) for graph and query workload
generation. The script install-libraries.sh installs the necessary libraries.
Before running this script, one should be aware that, depending on the former
state of her computer, it may be more or less trivial to go back to that state8.</p>
      <p>
        In addition to the running example, we have a prede ned script that relies
on four real-world cases based on gMark: uniprot (biological data where proteins
interact with other proteins, are encoded on genes, etc.), shop (online shop selling
di erent types of products to users, etc.), social-network (social network where
persons know other persons, work in companies, etc.), and bib (bibliographical
data about researchers that author papers published in journals or conferences,
etc.). We discuss how to use this script when describing the next scenario.
(ii) Scalability. The idea of this scenario is to generate graphs of increasing sizes
and observe that GOOSE has a linear time behavior for the graph outsourcing. As
for the query evaluation, we generate queries having diverse properties (w.r.t.
arity, selectivity, shape, and use of recursion), run GOOSE, and zoom on the
time taken by each step of GOOSE. One should observe that the bottleneck of
secure query evaluation in GOOSE does not come from the use of cryptographic
primitives, but is due to the SPARQL engine used as a black-box, in particular for
8 We leave as future work the \dockerization" suggested by Anonymous Reviewer 1.
evaluating recursive queries. These observations should con rm the theoretical
and empirical analysis detailed in the full paper on GOOSE [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        The script script-complete-workflow.sh allows to run such a complete
work ow. As currently con gured, the script should generate the large-scale
experiment reported in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which took 8 days and generated 46GB of data (total
size for graphs, queries, and query answers). To generate quicker scalability
experiments, one can simply tune to smaller values the 5 parameters that have
selfexplanatory names with scaling factor as a substring. To change the gMark
graph and query workload con gurations, one can tune the XML les from
directory gmark/use-cases (see [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ] for the meaning of the gMark constraints).
(iii) Security. To emphasize the challenges of building a system like GOOSE
and to understand what GOOSE design choices make it secure in the
honest-butcurious cloud adversary model, we refer to the cryptographic tools and security
theorems from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For instance, the non-deterministic encryption mode
AESCBC that we chose for GOOSE implies that: for a given graph, if two distinct
queries yield identical answer sets, then these answer sets are encrypted
differently, hence an external network observer (e.g., a curious cloud admin) that
analyzes the messages exchanged over the network cannot know whether two
queries are equivalent on a speci c graph. On the other hand, if one assumes
stronger attacks (e.g., a network observer that has as background knowledge
some partial knowledge on the graph), that could break some GOOSE security
properties by leaking partial knowledge on the queries and their answers.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Advanced</given-names>
            <surname>Encryption</surname>
          </string-name>
          <article-title>Standard (AES) (</article-title>
          <year>2001</year>
          ),
          <source>FIPS Publication 197</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bagan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonifati</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciucanu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fletcher</surname>
            ,
            <given-names>G.H.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemay</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Advokaat</surname>
          </string-name>
          , N.:
          <article-title>Generating Flexible Workloads for Graph Databases</article-title>
          .
          <source>PVLDB</source>
          <volume>9</volume>
          (
          <issue>13</issue>
          ),
          <volume>1457</volume>
          {
          <fpage>1460</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bagan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonifati</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciucanu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fletcher</surname>
            ,
            <given-names>G.H.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemay</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Advokaat</surname>
          </string-name>
          , N.:
          <article-title>gMark: Schema-Driven Generation of Graphs and Queries</article-title>
          .
          <source>IEEE TKDE 29(4)</source>
          ,
          <volume>856</volume>
          {
          <fpage>869</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ciucanu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lafourcade</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>GOOSE: A Secure Framework for Graph Outsourcing and SPARQL Evaluation</article-title>
          . In: DBSec. pp.
          <volume>347</volume>
          {
          <issue>366</issue>
          (
          <year>2020</year>
          ), https://doi.org/10. 1007/978-3-
          <fpage>030</fpage>
          -49669-2_
          <fpage>20</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Curino</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>E.P.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popa</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malviya</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madden</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balakrishnan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeldovich</surname>
          </string-name>
          , N.:
          <article-title>Relational Cloud: a Database Service for the Cloud</article-title>
          . In: CIDR. pp.
          <volume>235</volume>
          {
          <issue>240</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Delanaux</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonifati</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rousset</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thion</surname>
          </string-name>
          , R.:
          <article-title>Query-Based Linked Data Anonymization</article-title>
          . In: ISWC. pp.
          <volume>530</volume>
          {
          <issue>546</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirrane</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steyskal</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>HDTcrypt : Compression and Encryption of RDF Datasets</article-title>
          .
          <source>Semantic Web Journal</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kirrane</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villata</surname>
          </string-name>
          , S.,
          <string-name>
            <surname>d'Aquin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <string-name>
            <surname>Privacy</surname>
          </string-name>
          ,
          <article-title>Security and Policies: A Review of Problems and Solutions with Semantic Web Technologies</article-title>
          .
          <source>Semantic Web</source>
          <volume>9</volume>
          (
          <issue>2</issue>
          ),
          <volume>153</volume>
          {
          <fpage>161</fpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>