<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Demonstration of GOOSE: A Secure Framework for Graph Outsourcing and SPARQL Evaluation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Radu</forename><surname>Ciucanu</surname></persName>
							<email>radu.ciucanu@insa-cvl.fr</email>
							<affiliation key="aff0">
								<orgName type="department">INSA Centre Val de Loire</orgName>
								<orgName type="institution">Univ. Orléans</orgName>
								<address>
									<postCode>EA 4022</postCode>
									<settlement>LIFO</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pascal</forename><surname>Lafourcade</surname></persName>
							<email>pascal.lafourcade@uca.fr</email>
							<affiliation key="aff1">
								<orgName type="laboratory">LIMOS CNRS UMR 6158</orgName>
								<orgName type="institution">Université Clermont Auvergne</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Demonstration of GOOSE: A Secure Framework for Graph Outsourcing and SPARQL Evaluation</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">D21BDBFD36488C9171CFA19BDE14BB15</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T08:36+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We demonstrate GOOSE, an open-source framework for secure graph outsourcing and SPARQL evaluation. We showcase the workflow of GOOSE on various real-world use cases, the scalability of GOOSE, and the security properties that GOOSE guarantees in the honest-but-curious cloud security model.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Enhancing Semantic Web technologies with security and privacy guarantees is an important and popular problem <ref type="bibr" target="#b7">[8]</ref>. Several systems have been proposed to tackle different settings, from both the security (e.g., <ref type="bibr" target="#b6">[7]</ref>) and the privacy (e.g., <ref type="bibr" target="#b5">[6]</ref>) viewpoints.</p><p>We take a complementary look by addressing the security issues that arise when outsourcing an RDF graph to the cloud and querying the outsourced graph with SPARQL. Our scenario is inspired by the database-as-a-service cloud computing model <ref type="bibr" target="#b4">[5]</ref>, where a data owner outsources some data to the cloud, and a user then submits queries to the cloud, which computes the query answers and returns them to the user. We assume that the cloud is honest-but-curious, i.e., it executes its tasks dutifully but tries to gain as much information as possible.</p><p>We demonstrate GOOSE, an open-source framework that relies on cryptographic schemes and secure multi-party computation to achieve the following security properties: (i) no cloud node can learn the graph, (ii) no cloud node can learn both the query and the query answers, and (iii) an external network observer cannot learn the graph, the query, or the query answers. GOOSE was presented as a full paper at DBSec 2020 <ref type="bibr" target="#b3">[4]</ref>. The goal of this demo paper is to showcase GOOSE to the Semantic Web community. Indeed, GOOSE is an innovative system that enables secure data outsourcing and query evaluation for two popular Semantic Web technologies, RDF and SPARQL.</p><p>In Sect. 2, we present an overview of GOOSE, whereas in Sect. 3 we describe our demonstration scenarios. Due to lack of space, we omit several details (related work, theoretical and empirical analysis) that can be found in the GOOSE conference paper <ref type="bibr" target="#b3">[4]</ref>. The open-source code of GOOSE, as well as the use cases and data used throughout our demonstration scenarios, are available on GitHub<ref type="foot" target="#foot_0">4</ref>.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0"><figDesc>[Fig. 1 labels: participants User, Query Translator, SPARQL Engine, Answers Translator, and Data Owner; messages (0) Enc(σ_Σ), (0) Enc(σ_V), (0) Enc(Ẽ), (1) Enc(Q), (2) Enc(Q̃), (3) Enc(Ans(G̃, Q̃)).]</figDesc></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">System Overview</head><p>In Fig. <ref type="figure">1</ref>, we show the architecture of GOOSE, which has five participants: the data owner (DO), who owns the graph and outsources it to the cloud so that it can be queried; the user (U), who submits graph queries to the cloud and receives the query answers; and three cloud participants, namely the query translator (QT), the SPARQL engine (SE), and the answers translator (AT). Each Enc in Fig. <ref type="figure">1</ref> denotes encryption with the AES <ref type="bibr" target="#b0">[1]</ref> key shared by the two participants involved. We next outline GOOSE via an example.</p><p>Graph Data and Queries. An RDF<ref type="foot" target="#foot_1">5</ref> graph is a set of triples (subject, predicate, object). For the purpose of this paper, we simply assume that a graph G = (V, E) is a directed, edge-labeled graph, where V is a set of nodes and E ⊆ V × Σ × V is a set of directed edges between nodes of V, with labels from an alphabet Σ. The graph in Fig. <ref type="figure">2</ref> has 6 nodes, an alphabet of 3 edge labels, and 9 edges.</p><p>We focus on Unions of Conjunctions of Regular Path Queries (UCRPQ), which are at the core of SPARQL 1.1<ref type="foot" target="#foot_2">6</ref>, including recursive queries via the Kleene star. By Ans(G, Q) we denote the answers of a query Q over a graph G under the standard SPARQL semantics. For example, the UCRPQ (?x, ?z) ← (?x, Follows+, ?y), (?y, TravelsTo, ?z) selects the pairs of nodes (?x, ?z) such that there exists a node ?y where ?x reaches ?y via a path in the language of Follows+ and ?y reaches ?z via a path in the language of TravelsTo. The answers of this query on the graph from Fig. <ref type="figure">2</ref> are (Alice, Milan), (Alice, Paris), (Bob, Milan), (Bob, Paris), and (David, Paris).</p><p>Step 0. The graph outsourcing (i.e., the three outgoing arrows from DO in Fig. <ref type="figure">1</ref>) is done only once, at the beginning, by DO. Intuitively, DO sends to each cloud participant a piece of G such that each participant can perform its task during query evaluation but no participant can reconstruct the entire graph. To do so, DO generates two random bijections: σ_Σ (for edge labels) and σ_V (for graph nodes). By σ⁻¹ we denote the inverse of a bijection σ, which is needed at the end of query evaluation. For our example graph in Fig. <ref type="figure">2</ref>, DO may generate:</p><formula xml:id="formula_1">σ_V = {Alice → 5, Bob → 3, Charlie → 0, David → 1, Milan → 2, Paris → 4}, σ_Σ = {Follows → 1, ReadsAbout → 2, TravelsTo → 0}.</formula><p>DO uses these two functions to hide the graph edges: by Ẽ we denote the set of hidden edges obtained from E by replacing nodes using σ_V and edge labels using σ_Σ. On our example in Fig. <ref type="figure">2</ref>, edge (Alice, Follows, Bob) becomes (5, 1, 3), edge (Alice, ReadsAbout, Paris) becomes (5, 2, 4), and overall Ẽ = {(5, 1, 3), (5, 2, 4), (5, 0, 4), (3, 1, 5), (3, 1, 1), (3, 0, 2), (0, 0, 4), (1, 1, 0), (1, 2, 2)}. DO then sends σ_Σ to QT, Ẽ to SE, and σ_V to AT.</p><p>Each message sent over the network is encrypted with the key shared between DO and the corresponding cloud participant, which decrypts the message upon reception. Encryption prevents an external observer who sees the messages from learning the graph G, and distributing the graph storage among the cloud participants ensures that none of them can learn G.</p><p>We next discuss query evaluation, i.e., steps 1-4 in Fig. <ref type="figure">1</ref>, which are performed for each query submitted by U. Each message exchanged over the network during query evaluation is encrypted with the key shared between the corresponding participants, so that an external observer cannot learn the query or its answers.</p><p>Step 1. U submits the query Q to QT. For example, recall the aforementioned query (?x, ?z) ← (?x, Follows+, ?y), (?y, TravelsTo, ?z).</p><p>Step 2. QT translates Q by replacing every label used in Q according to the function σ_Σ received from DO. By Q̃ we denote the query Q translated using σ_Σ.
On our example, the query from step 1 becomes (?x, ?z) ← (?x, 1+, ?y), (?y, 0, ?z).</p><p>Step 3. SE evaluates the translated query Q̃, received from QT at step 2, on the graph G̃ whose nodes and edge labels are hidden as defined by the set Ẽ received from DO at step 0. To do so, SE simply uses a standard SPARQL engine as a black box, without any change to the query engine<ref type="foot" target="#foot_3">7</ref>. We denote the result of SE by Ans(G̃, Q̃), in which the true answers Ans(G, Q) are still hidden by the function σ_V. On our example, Ans(G̃, Q̃) = {(5, 2), (5, 4), (3, 2), (3, 4), (1, 4)}.</p><p>Step 4. AT uses the function σ_V⁻¹ to translate the hidden query answers Ans(G̃, Q̃) into the true query answers. On our example, AT recovers Ans(G, Q) = {(Alice, Milan), (Alice, Paris), (Bob, Milan), (Bob, Paris), (David, Paris)}, which AT sends to U.</p></div>
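<div xmlns="http://www.tei-c.org/ns/1.0"><p>Steps 0-4 can be traced end to end on the running example. The following Python sketch is only an illustration under assumptions that do not come from the GOOSE code base: every Enc(·) message is passed in clear (no AES), the fixed bijections SIGMA_V and SIGMA_L are those of the example, the hypothetical helper random_bijection maps items onto 0..n−1 in random order to mimic the random bijections drawn by DO, and the hypothetical function evaluate is a toy evaluator for queries of the shape (?x, a+, ?y), (?y, b, ?z) that plays the role of the black-box SPARQL engine (Apache Jena in the actual implementation). The edge set is the one of Fig. 2, obtained by applying the inverses of σ_V and σ_Σ to Ẽ.</p>
<eg><![CDATA[
```python
import random

# Edges of Fig. 2, obtained by applying the inverses of sigma_V and
# sigma_Sigma (given in Sect. 2) to the hidden edge set E-tilde.
EDGES = {
    ("Alice", "Follows", "Bob"), ("Alice", "ReadsAbout", "Paris"),
    ("Alice", "TravelsTo", "Paris"), ("Bob", "Follows", "Alice"),
    ("Bob", "Follows", "David"), ("Bob", "TravelsTo", "Milan"),
    ("Charlie", "TravelsTo", "Paris"), ("David", "Follows", "Charlie"),
    ("David", "ReadsAbout", "Milan"),
}


def random_bijection(items):
    """Hypothetical helper: map the items onto 0..n-1 in a random order."""
    items = sorted(items)
    codes = list(range(len(items)))
    random.SystemRandom().shuffle(codes)
    return dict(zip(items, codes))


def hide_edges(edges, sigma_v, sigma_l):
    """Step 0 (DO): rename nodes with sigma_V and labels with sigma_Sigma."""
    return {(sigma_v[s], sigma_l[p], sigma_v[o]) for (s, p, o) in edges}


def translate_query(query, sigma_l):
    """Step 2 (QT): query = (a, b) encodes (?x, a+, ?y), (?y, b, ?z)."""
    plus_label, label = query
    return (sigma_l[plus_label], sigma_l[label])


def evaluate(edges, query):
    """Step 3 (SE): toy evaluator standing in for the black-box SPARQL engine."""
    plus_label, label = query
    succ = {}
    for s, p, o in edges:
        if p == plus_label:
            succ.setdefault(s, set()).add(o)
    nodes = {s for s, _, _ in edges} | {o for _, _, o in edges}
    answers = set()
    for x in nodes:
        # Nodes reachable from x through one or more plus_label edges.
        reached, frontier = set(), set(succ.get(x, ()))
        while frontier:
            y = frontier.pop()
            if y not in reached:
                reached.add(y)
                frontier |= succ.get(y, set())
        answers |= {(x, z) for (s, p, z) in edges if p == label and s in reached}
    return answers


def decode_answers(hidden, sigma_v):
    """Step 4 (AT): map every hidden node back with the inverse of sigma_V."""
    inverse = {code: node for node, code in sigma_v.items()}
    return {tuple(inverse[c] for c in row) for row in hidden}


# Bijections of the running example (DO would normally draw them at random).
SIGMA_V = {"Alice": 5, "Bob": 3, "Charlie": 0, "David": 1, "Milan": 2, "Paris": 4}
SIGMA_L = {"Follows": 1, "ReadsAbout": 2, "TravelsTo": 0}

hidden_edges = hide_edges(EDGES, SIGMA_V, SIGMA_L)                 # DO to SE
hidden_query = translate_query(("Follows", "TravelsTo"), SIGMA_L)  # (1, 0)
hidden_answers = evaluate(hidden_edges, hidden_query)              # SE to AT
print(sorted(hidden_answers))
# -> [(1, 4), (3, 2), (3, 4), (5, 2), (5, 4)]
print(sorted(decode_answers(hidden_answers, SIGMA_V)))
# -> [('Alice', 'Milan'), ('Alice', 'Paris'), ('Bob', 'Milan'), ('Bob', 'Paris'), ('David', 'Paris')]

# Any pair of freshly drawn bijections leads to the same true answers.
sigma_v = random_bijection({n for s, _, o in EDGES for n in (s, o)})
sigma_l = random_bijection({p for _, p, _ in EDGES})
again = evaluate(hide_edges(EDGES, sigma_v, sigma_l),
                 translate_query(("Follows", "TravelsTo"), sigma_l))
assert decode_answers(again, sigma_v) == decode_answers(hidden_answers, SIGMA_V)
```
]]></eg>
<p>The final assertion reflects why SE can work on hidden data: renaming nodes and labels consistently does not change which paths exist, so evaluating Q̃ on the hidden graph and then applying σ_V⁻¹ yields Ans(G, Q) whichever bijections DO draws.</p></div>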
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Demonstration Scenarios</head><p>We (i) introduce, via examples based on real-world scenarios, the complete workflow of GOOSE and the class of supported SPARQL queries, (ii) emphasize the scalability of GOOSE, and (iii) point out the security properties of GOOSE and the security model in which these properties hold.</p><p>(i) GOOSE by example. In the GitHub repository of GOOSE (URL given in Sect. 1), the directory running-example contains the script example.sh, which reproduces the running example from Sect. 2 and <ref type="bibr" target="#b3">[4]</ref>. The graph, query, and query answers used by this script can be found in the sub-directories example and example-secure, for the standard and GOOSE versions, respectively. In particular, the files in example-secure hide nodes and edges as outlined in Sect. 2. Notice that we chose to first specify the UCRPQ in an XML format and then translate them into SPARQL. This XML format and the translation to SPARQL are based on gMark <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>, a state-of-the-art generator of synthetic graphs and UCRPQ workloads. Our choice is motivated by the observation that GOOSE can be easily extended to secure graph outsourcing and UCRPQ evaluation regardless of the concrete language used to encode the UCRPQ. Indeed, one can easily modify GOOSE to translate the UCRPQ into SQL with recursive views instead of SPARQL and to use PostgreSQL instead of Apache Jena, and hence obtain a practical system guaranteeing exactly the same security properties.</p><p>Going back to the demo, we stress that the running example script provided in the GOOSE repository can easily be run on a laptop. If the script fails with an error, some packages are likely missing. 
GOOSE is written in Python and uses Apache Jena (written in Java) for SPARQL evaluation and gMark (written in C++) for graph and query workload generation. The script install-libraries.sh installs the necessary libraries. Before running this script, one should be aware that, depending on the prior state of one's computer, it may be non-trivial to revert to that state afterwards<ref type="foot" target="#foot_4">8</ref>.</p><p>In addition to the running example, we provide a predefined script that relies on four real-world use cases based on gMark: uniprot (biological data where proteins interact with other proteins, are encoded on genes, etc.), shop (an online shop selling different types of products to users, etc.), social-network (a social network where persons know other persons, work in companies, etc.), and bib (bibliographical data about researchers who author papers published in journals or conferences, etc.). We discuss how to use this script when describing the next scenario.</p><p>(ii) Scalability. The idea of this scenario is to generate graphs of increasing sizes and observe that graph outsourcing in GOOSE takes time linear in the graph size. As for query evaluation, we generate queries with diverse properties (w.r.t. arity, selectivity, shape, and use of recursion), run GOOSE, and zoom in on the time taken by each step of GOOSE. One should observe that the bottleneck of secure query evaluation in GOOSE is not the use of cryptographic primitives but the SPARQL engine used as a black box, in particular when evaluating recursive queries. These observations should confirm the theoretical and empirical analysis detailed in the full paper on GOOSE <ref type="bibr" target="#b3">[4]</ref>.</p><p>The script script-complete-workflow.sh runs such a complete workflow. 
As currently configured, the script runs the large-scale experiment reported in <ref type="bibr" target="#b3">[4]</ref>, which took 8 days and generated 46 GB of data (total size of graphs, queries, and query answers). For quicker scalability experiments, one can simply set smaller values for the five parameters whose self-explanatory names contain scaling factor as a substring. To change the gMark graph and query workload configurations, one can edit the XML files in the directory gmark/use-cases (see <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref> for the meaning of the gMark constraints).</p><p>(iii) Security. To emphasize the challenges of building a system like GOOSE and to understand which design choices make GOOSE secure in the honest-but-curious cloud adversary model, we refer to the cryptographic tools and security theorems of <ref type="bibr" target="#b3">[4]</ref>. For instance, GOOSE uses AES in CBC mode, which is a non-deterministic encryption mode. Hence, if two distinct queries yield identical answer sets on a given graph, these answer sets are encrypted differently, and an external network observer (e.g., a curious cloud admin) who analyzes the messages exchanged over the network cannot tell whether the two queries are equivalent on that graph. On the other hand, stronger adversaries (e.g., a network observer with partial background knowledge of the graph) could break some GOOSE security properties by learning partial information about the queries and their answers.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Architecture of GOOSE.</figDesc></figure>
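<div xmlns="http://www.tei-c.org/ns/1.0"><p>Scenario (i) mentions that the UCRPQ are first encoded in an XML format and then translated into SPARQL. The following Python sketch does not reproduce gMark's XML format or its translator; it assumes a hypothetical tuple encoding of regular path expressions and renders a UCRPQ as a SPARQL 1.1 query with property paths (concatenation /, alternation |, Kleene + and *), with one UNION branch per conjunction. The prefix IRI http://example.org/graph/, the function names, and the use of SELECT DISTINCT (so that answers form a set, as in Sect. 2) are all assumptions. Applying the label bijection σ_Σ of the running example before rendering gives a query over hidden labels such as :1 and :0.</p>
<eg><![CDATA[
```python
PREFIX_IRI = "http://example.org/graph/"  # hypothetical namespace for labels


def path_to_sparql(path):
    """Render a regular path expression as a SPARQL 1.1 property path.

    Paths are nested tuples: ("label", a), ("seq", p1, p2, ...),
    ("alt", p1, p2, ...), ("plus", p) and ("star", p).
    """
    kind = path[0]
    if kind == "label":
        return ":" + str(path[1])
    if kind == "seq":
        return "(" + "/".join(path_to_sparql(p) for p in path[1:]) + ")"
    if kind == "alt":
        return "(" + "|".join(path_to_sparql(p) for p in path[1:]) + ")"
    if kind in ("plus", "star"):
        inner = path_to_sparql(path[1])
        if path[1][0] in ("plus", "star"):
            inner = "(" + inner + ")"  # parenthesize nested closures
        return inner + ("+" if kind == "plus" else "*")
    raise ValueError("unknown path operator: %r" % (kind,))


def hide_path(path, sigma_l):
    """Apply the label bijection sigma_Sigma to every label of a path."""
    if path[0] == "label":
        return ("label", sigma_l[path[1]])
    return (path[0],) + tuple(hide_path(p, sigma_l) for p in path[1:])


def ucrpq_to_sparql(head, disjuncts):
    """head: output variables; disjuncts: conjunctions of (var, path, var)."""
    branches = []
    for conjunction in disjuncts:
        atoms = " . ".join("?%s %s ?%s" % (s, path_to_sparql(p), o)
                           for s, p, o in conjunction)
        branches.append("{ " + atoms + " }")
    return "PREFIX : <%s>\nSELECT DISTINCT %s WHERE {\n  %s\n}" % (
        PREFIX_IRI, " ".join("?" + v for v in head), "\n  UNION\n  ".join(branches))


# Running example of Sect. 2: (?x, ?z) <- (?x, Follows+, ?y), (?y, TravelsTo, ?z)
QUERY = [[("x", ("plus", ("label", "Follows")), "y"),
          ("y", ("label", "TravelsTo"), "z")]]
SIGMA_L = {"Follows": 1, "ReadsAbout": 2, "TravelsTo": 0}

plain_sparql = ucrpq_to_sparql(["x", "z"], QUERY)
hidden_sparql = ucrpq_to_sparql(
    ["x", "z"], [[(s, hide_path(p, SIGMA_L), o) for s, p, o in c] for c in QUERY])
print(plain_sparql)
print(hidden_sparql)
```
]]></eg>
<p>Only the label names differ between the two printed queries (:Follows+ becomes :1+ and :TravelsTo becomes :0), which mirrors step 2, where QT rewrites labels and leaves the query structure untouched.</p></div>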
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>For example, on the graph of Fig. 2, the tuple (Alice, Paris) is an answer to the query (?x, ?z) ← (?x, Follows+, ?y), (?y, TravelsTo, ?z) because of the path Alice −Follows→ Bob −Follows→ David −Follows→ Charlie and the edge Charlie −TravelsTo→ Paris, where ?x, ?y, ?z are mapped to Alice, Charlie, Paris, respectively.</figDesc></figure>
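<div xmlns="http://www.tei-c.org/ns/1.0"><p>The witness path for (Alice, Paris) can also be found mechanically. The following Python sketch, with hypothetical function names, runs a breadth-first search over Follows edges only to find a path of one or more edges from ?x to a fixed ?y, and then checks the TravelsTo edge from ?y to ?z. It assumes the nine edges of Fig. 2, obtained by inverting σ_V and σ_Σ on the hidden edge set Ẽ of Sect. 2. It only illustrates the query semantics; in GOOSE, this work is delegated to the SPARQL engine.</p>
<eg><![CDATA[
```python
from collections import deque

# Edges of Fig. 2, obtained by inverting sigma_V and sigma_Sigma on E-tilde.
EDGES = {
    ("Alice", "Follows", "Bob"), ("Alice", "ReadsAbout", "Paris"),
    ("Alice", "TravelsTo", "Paris"), ("Bob", "Follows", "Alice"),
    ("Bob", "Follows", "David"), ("Bob", "TravelsTo", "Milan"),
    ("Charlie", "TravelsTo", "Paris"), ("David", "Follows", "Charlie"),
    ("David", "ReadsAbout", "Milan"),
}


def label_path(edges, label, src, dst):
    """Shortest path of one or more `label` edges from src to dst (Kleene
    plus), found by breadth-first search; returns a node list or None."""
    succ = {}
    for s, p, o in edges:
        if p == label:
            succ.setdefault(s, []).append(o)
    parent = {}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in sorted(succ.get(node, [])):
            if nxt in parent:
                continue
            parent[nxt] = node
            if nxt == dst:
                # Walk the parent links back to src (at least one edge).
                path = [dst, parent[dst]]
                while path[-1] != src:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


def witness(edges, x, y, z):
    """Evidence that (x, z) answers (?x, Follows+, ?y), (?y, TravelsTo, ?z)
    with ?y mapped to y: the Follows+ path extended by z, or None."""
    path = label_path(edges, "Follows", x, y)
    if path is None or (y, "TravelsTo", z) not in edges:
        return None
    return path + [z]


print(witness(EDGES, "Alice", "Charlie", "Paris"))
# -> ['Alice', 'Bob', 'David', 'Charlie', 'Paris']
print(witness(EDGES, "Charlie", "Charlie", "Paris"))
# -> None
```
]]></eg>
<p>The query only requires that some mapping of ?y exists: for instance, ?y = Alice, reached through the cycle Alice −Follows→ Bob −Follows→ Alice, also justifies (Alice, Paris), because the graph contains the edge Alice −TravelsTo→ Paris.</p></div>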
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_0">https://github.com/radu1/goose</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_1">https://www.w3.org/TR/rdf11-concepts/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_2">https://www.w3.org/TR/sparql11-query/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_3">In our implementation, we rely on Apache Jena (https://jena.apache.org/).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_4">We leave as future work the "dockerization" suggested by Anonymous Reviewer 1.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Advanced Encryption Standard (AES)</title>
	</analytic>
	<monogr>
		<title level="j">FIPS Publication</title>
		<imprint>
			<biblScope unit="volume">197</biblScope>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Generating Flexible Workloads for Graph Databases</title>
		<author>
			<persName><forename type="first">G</forename><surname>Bagan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bonifati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ciucanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">H L</forename><surname>Fletcher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lemay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Advokaat</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PVLDB</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">13</biblScope>
			<biblScope unit="page" from="1457" to="1460" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">gMark: Schema-Driven Generation of Graphs and Queries</title>
		<author>
			<persName><forename type="first">G</forename><surname>Bagan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bonifati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ciucanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">H L</forename><surname>Fletcher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lemay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Advokaat</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE TKDE</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="856" to="869" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">GOOSE: A Secure Framework for Graph Outsourcing and SPARQL Evaluation</title>
		<author>
			<persName><forename type="first">R</forename><surname>Ciucanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lafourcade</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-49669-2_20</idno>
		<ptr target="https://doi.org/10.1007/978-3-030-49669-2_20" />
	</analytic>
	<monogr>
		<title level="m">DBSec</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="347" to="366" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Relational Cloud: a Database Service for the Cloud</title>
		<author>
			<persName><forename type="first">C</forename><surname>Curino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">P C</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Popa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Malviya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Madden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Balakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Zeldovich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CIDR</title>
		<imprint>
			<biblScope unit="page" from="235" to="240" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Query-Based Linked Data Anonymization</title>
		<author>
			<persName><forename type="first">R</forename><surname>Delanaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bonifati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rousset</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Thion</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="530" to="546" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">HDTcrypt: Compression and Encryption of RDF Datasets</title>
		<author>
			<persName><forename type="first">J</forename><surname>Fernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kirrane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Polleres</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Steyskal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web Journal</title>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Privacy, Security and Policies: A Review of Problems and Solutions with Semantic Web Technologies</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kirrane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Villata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>d'Aquin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="153" to="161" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
