<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SPARQL Query Execution in Networks of Web Browsers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arnaud Grall</string-name>
          <email>arnaud.grall@gfi.fr</email>
          <email>arnaud.grall@univ-nantes.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hala Skaf-Molli</string-name>
          <email>hala.skaf@univ-nantes.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascal Molli</string-name>
          <email>pascal.molli@univ-nantes.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>GFI Informatique - IS/CIE</institution>
          ,
          <addr-line>Nantes</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LS2N - University of Nantes</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Decentralizing the Web means that users gain the ability to store their data wherever they want, users can store their data in their web browsers. Web browsers allow decentralized applications to be deployed in one click and enable interaction with end-users. However, querying data stored in a large network of web browsers remains challenging. A network of web browsers gathers a very large number of browsers hosting small data. The network is highly dynamic, many web browsers are now running on mobile devices with limited resources, this raises issues on energy consumption. In this paper, we propose Snob, a query execution model for SPARQL query over RDF data hosted in a network of browsers. The execution of a query in a browser is similar to the execution of a federated SPARQL query over remote data sources. In Snob, direct neighbours in the network are the data sources and results received from neighbours are stored locally as intermediate results. As data sources are renewed every shuffling, the query continues to progress and could produce new results after each shuffling without congesting the network. To speed up query execution, browsers processing similar queries are connected through a semantic overlay network. Experimentation shows that the number of answers produced by queries grows with the number of executed queries in the network while the number of exchanged messages is always bounded per user.</p>
      </abstract>
      <kwd-group>
        <kwd>Decentralized Data Management</kwd>
        <kwd>Decentralized Applications</kwd>
        <kwd>SPARQL Query Processing</kwd>
        <kwd>Web Browsers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Decentralizing the applications and the data is a powerful paradigm to provide
more control to users on their digital life 3. A decentralized application runs on
resources owned by multiple authorities and provides a service even in presence
of failures of any authority. Data are decentralized if authorities are free to store
their data wherever they want. In this paper, we focus on how a decentralized
3 This work will be published as part of the book "Emerging Topics in Semantic
Technologies. ISWC 2018 Satellite Events. E. Demidova, A.J. Zaveri, E. Simperl
(Eds.), ISBN: 978-3-89838-736-1, 2018, AKA Verlag Berlin."
application running in web browsers is able to provide a SPARQL service over
data stored in browsers.</p>
      <p>We focus on browsers as they are certainly the most deployed programming
environment in the world. Decentralized applications can be deployed in
oneclick, they are in direct contact with end-users, they are able to capture their
location, their history of browsing or their perceptions of the real world. Browsers
already integrate APIs to store data locally and are often used by web
applications to provide off-line functionalities. However, querying data over a large-scale
network of browsers is challenging.</p>
      <p>
        Many works address the problems of data management in Peer-to-Peer
networks [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. However, decentralizing data in browsers raises some particular issues.
First, a network of browsers gathers a very large number of browsers with few
data hosted in browsers. It raises the problem of completeness when executing a
query over a very large number of relevant sources as pointed out in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Second,
a large number of browsers are now running on mobile phones with LTE
connections. The dynamicity of such networks is very high compared to existing P2P
networks. Saving bandwidth and battery are clearly two major issues that are
not considered by many existing P2P data management systems. This raises the
problem of processing queries with a fair usage of resources. Third, connections
between browsers rely on the WebRTC4 standard that does not have routing. As
a browser has no address, indexing resources is useless because a browser hosting
relevant data cannot be contacted without flooding the network. This raises the
problem of processing queries when only direct neighbours can be contacted.
      </p>
      <p>
        In this paper, we propose Snob, an approach for executing SPARQL queries
over a network of browsersWe assume that browsers are connected in an
unstructured network based on periodic peer-sampling [
        <xref ref-type="bibr" rid="ref18 ref22">18,22</xref>
        ], i.e., each browser is
connected to a bounded random subset of the network that is renewed
periodically during a shuffling. A query running in a browser can be seen as a federated
SPARQL query where neighbours are the data sources. As data sources are
renewed every shuffling, the query continues to progress after each shuffling. This
ensures that the number of messages exchanged with neighbours is constant
and every query makes progress at each shuffling. However, as progression can
be slow, we build a second overlay network where browsers are connected to a
fixed number of browsers that process similar queries. The new semantic overlay
allows browsers to share common intermediate results.
      </p>
      <p>This paper presents the following contributions:
1. Snob a SPARQL query execution model over data hosted in a network of
browsers. Snob ensures a fair usage of resources by communicating only with
direct neighbours at each shuffling.
2. A semantic overlay network based on query containment.
3. Experimentation shows that the completeness of queries grows with the
number of running queries in the network. The semantic overlay network is able
to speed up completeness when fewer queries are running simultaneously.</p>
    </sec>
    <sec id="sec-2">
      <title>4 https://www.w3.org/TR/webrtc/</title>
      <p>The section 2 describes the related works. Section 3 describes our proposal.
Section 4 presents preliminaries experimental results. Section 5 concludes
contributions and presents future works.
2</p>
      <sec id="sec-2-1">
        <title>Related Works</title>
        <p>
          In our previous work [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], we presented how a distributed semantic web
application can be deployed in a network of browsers. We detailed some use-cases and
pointed some open issues. In this paper, we will go further, we will present how
it is possible to execute SPARQL queries on data hosted in browsers.
        </p>
        <p>
          The Solid project [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] promotes the vision of a decentralized web for social
Web applications. In Solid, a user stores their data in a personal online datastore
(pod). Therefore, application developers can write application relying on the API
of a pod. When an application is deployed in the user browser, the application
requests the location of the pod to access data. Consequently, the user controls
where their data are stored. However, Solid does not describe how to run a query
over a large-scale network of pods.
        </p>
        <p>
          WebDamLog [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] promotes the vision of a decentralized social network.
WebDamLog allows several sites to collaborate thanks to an original distributed
datalog engine. WebDamLog relies on the delegation of datalog rules to connected
sites to make local datalog program runnable. Compared to Solid, WebDamLog
offers a way to write distributed applications, but is not designed to scale to very
large networks.
        </p>
        <p>
          Mastodon 5 or Diaspora 6 are representative of distributed social networks [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
Compared to Solid, a pod can gather thousands of users. As anyone can create
a new pod and join the network of existing pods, the system is "decentralized".
However, they do not allow to run expressive queries over the network of pods.
        </p>
        <p>The term "decentralized applications” is also used in the crypto-currencies
communities such as bitcoins or ethereum 7. Such applications use blockchains
to store data and cryptographic tokens to store values. As some blockchains
rely on public P2P networks that anyone can join, the application is called
"decentralized". In this case, users do not really choose where data are stored.
In this paper, we focus on an approach where data placement and control are
managed by users at run-time.</p>
        <p>
          A network of browsers is strongly related to P2P data management [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] for
RDF data [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Many systems rely on Distributed Hash Tables (DHT) such as
P-Grid [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], GridVine [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. DHTs are able to deliver good performances for some
classes of queries such as range queries, however, they hardly process complex
join queries and the maintenance of the structure is costly under high dynamicity.
Many systems also rely on unstructured networks. Such approach just maintains
a neighbourhood for each site and better resists to churn. As a participant does
not know where data are located, she floods the network with her query. However,
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5 https://joinmastodon.org/</title>
    </sec>
    <sec id="sec-4">
      <title>6 https://diasporafoundation.org/</title>
    </sec>
    <sec id="sec-5">
      <title>7 https://www.ethereum.org/</title>
      <p>
        this approach does not scale and hardly delivers complete results. Flooding can
be avoided with super-peer maintaining routing indices as in Edutella [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
Having super-peers in a network of browsers is possible but they represent a point of
failure which is a strong limitation to massive deployment of distributed
applications in browsers. Flooding can also be reduced with spanning trees as in sensor
networks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. A spanning tree reduces the flooding to the number of nodes in
the network. However, spanning trees are costly to maintain on large networks
with heavy churn. The network traffic of Flooding can be significantly reduced
using adapted replication strategies and random walks [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Flooding can be
limited by using multiple overlays as in Semantic Overlay Network (SON) [
        <xref ref-type="bibr" rid="ref6 ref8">6, 8</xref>
        ].
Participants are clustered in communities according to their common interests.
Queries are routed to the right community to be executed. SON restricts the
number of sources for a query. In this paper, we follow the SON approach to
restrict the search space for queries.
3
      </p>
      <sec id="sec-5-1">
        <title>The Snob Approach</title>
        <p>
          In this paper, we propose Snob an approach to execute SPARQL queries over a
network of browsers. Snob relies on three key ideas:
1. To address issues of dynamicity, we build an unstructured network based on
periodic peer-sampling [
          <xref ref-type="bibr" rid="ref18 ref22">18, 22</xref>
          ], i.e., each browser shuffles its local view with
the local view of its neighbors. Compared to structured networks [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ],
unstructured network tolerates high churn and high expressiveness of queries
[
          <xref ref-type="bibr" rid="ref20">20</xref>
          ][chapter 16].
2. Basic approaches for executing queries on unstructured networks rely on
flooding. However, flooding is incompatible with a fair usage of resources. To
overcome this problem, in Snob a browser evaluates its queries as federated
queries using only its direct neighbours as remote data sources. It waits for
the next shuffling that will bring new data sources. Such approach regulates
the network traffic, i.e., a browser can only send a number of messages
bounded to its number of direct neighbours per shuffling.
3. To speed up query execution, we promote the sharing of intermediate results.
        </p>
        <p>
          First, sharing intermediate results replicates and aggregates data.
Consequently, the probability that a query communicates with relevant
intermediate results is improved [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Second, we build semantic overlay networks [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
that select browsers processing similar queries. Therefore, once a browser
encountered a browser with a similar profile, the selected browser remains in its
neighbourhood, improving the search capacity of this browser per shuffling.
3.1
        </p>
        <sec id="sec-5-1-1">
          <title>Network model</title>
          <p>We consider a network of web browsers based on WebRTC. A WebRTC node
has no address and WebRTC has no routing. So a node cannot send messages
to a particular node in the network. However, it can send messages to direct</p>
          <p>SON
RPS
neighbours. This strong constraint prevents browsers from dereferencing hosted
RDF data.</p>
          <p>
            We build two overlay networks as proposed in [
            <xref ref-type="bibr" rid="ref10 ref4">4, 10</xref>
            ].
1. a Random Peer Sampling (RPS) overlay network that maintains the
membership among connected profiles. We rely on the Cyclon protocol [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ] to
maintain the network. Each node maintains a partial view on the entire
network. The view contains a random subset of network nodes. Periodically, a
node performs a shuffling i.e., selects the oldest node from its view and they
exchange parts of their views. This view is used to bootstrap and maintain
the semantic overlay network.
2. a Semantic Overlay Network (SON) builds on top of RPS, it clusters browsers
according to their profile. Each browser maintains a second view, this view
contains the k-best neighbours according to the similarity of their profile
with the browser profile. The maintenance of k-best neighbours is performed
at RPS exchange time. To minimize the overhead of shuffling: (1) the profile
information has to be as small as possible, (2) the similarity metric has to
be computed quickly in order to prevent slowing down the query execution.
          </p>
          <p>Figure 1 shows browsers, browsers B1 − B4 has data Diseasome, B6 − B9
has data on LinkedMDB and B5 has data on both. The RPS network ensures
that all browsers are connected through a random graph, browsers profiles make
the clustered network converging towards two communities. B1 − B4 will be
highly connected because they have data over Diseasome, while B6 − B9 will be
grouped together due their interest in LinkedMDB. B5 could be connected to
both communities because it hosts data from both datasets.</p>
          <p>Algorithm 1: Ranking Algorithm</p>
          <p>Input: P : Profile, P rof iles: set of Profiles, k: number of top ranked profiles
Output: TopK ranked profiles
1 rank(P, P rof iles):
2 sortedP rof ile = sort(P , prof iles, compare)
3 return SelectTopk(k, sortedProfiles)
4 compare(P , P1 ∈ P rof iles, P2 ∈ P rof ile):
5 return Scoring(P, P2) − Scoring(P, P1)
The Semantic Overlay Network clusters browsers with similar profiles. A profile
is defined as a set of triple patterns of queries that are executed by the browser.
Profiles are used for ranking browsers. We use triple patterns containment to
define similarities among profiles</p>
        </sec>
        <sec id="sec-5-1-2">
          <title>Definition 1 (Triple Pattern Query Containment). Let T P (D) denote the</title>
          <p>result of execution of query T P over an RDF dataset D. Let T P1 and T P2 be
two triple pattern queries. We say that T P1 is contained in T P2, denoted by
T P1 v T P2, if for any RDF dataset D, T P1(D) ⊆ T P2(D). We say that T P1 is
equivalent to T P2 denoted T P1 ≡ T P2 iff T P1 v T P2 and T P2 v T P1.</p>
          <p>
            Testing triple pattern containment meant finding substitution of variables
in the triple patterns [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ]. The complexity to find a containment is O(1). For
example, a containment between T P1 and T P2, T P1 v T P2, requires to check
if T P1 imposes at least the same restrictions as T P2 on the subject, predicate,
and object positions, i.e., T P1 and T P2 should have at most the same number
of variables.
          </p>
          <p>To rank neighbours, we assign weights to the containment relationship using
the following order:
(T P1 ≡ T P2)
(T P1 v T P2)
(T P1 w T P2)
(1)</p>
          <p>We assume that equivalence has the highest weight, i.e. the browser is
considered as the best relevant data source. We assign the weight ∞ to ≡, 2 for v
and 1 for w, therefore, the profile with at least one equivalence will always be
preferred to other profiles.</p>
          <p>The algorithm 1 uses the order defined in (1) to keep the k best-ranked
profiles of neighbours.</p>
          <p>The compare function (lines 4-5) compares two different profiles by using
the scoring function. The scoring function (lines 6-21) assigns weights based on
the order defined in (1). Finally, the ranking function returns the top k sorted
profiles.
3.3</p>
        </sec>
        <sec id="sec-5-1-3">
          <title>Query Execution</title>
          <p>In Snob, a browser hosts a small dataset, e.g. thousands of triples. At a given
time, a browser Bi has a fixed number of neighbours: k neighbours in the random
peer sampling layer, and l neighbours in the semantic overlay network. Typical
values of k and l are logarithmic to the size of the whole network. Each browser
exposes to its neighbours a triple pattern interface. The browser is able to process
an incoming triple pattern query and return results to the triple pattern query
originator. Incoming triple pattern queries are processed over local data and
intermediate results. Therefore, a browser can share local data and intermediate
results of running queries.</p>
          <p>Suppose that the browser Bi processes the query Qi. Qi is processed like a
federated query over its k + l neighbors considered as data sources. However,
there are few differences with federated query processing:
1. The federation is incomplete, i.e., the federation will change at the next
shuffling. Therefore, we need to keep intermediate results for continuing
processing when new sources are discovered.
2. Shuffling can bring already visited data sources. However, as intermediate
results may have been progressed, queries have to be re-executed.</p>
          <p>For instance, the browser Bi executes Qi after each periodic shuffling window,
then waits for the next shuffling for re-executing Qi. During the shuffling window,
the browser only answers to triple pattern queries from direct neighbours.</p>
          <p>The Snob approach has several properties:
Constant number of messages We suppose a round corresponds to the
shuffling window. Suppose the browser Bi processes one query Qi. In one round,
(k + l) messages are sent to neighbours with the list of triple patterns of the
query Qi. The browser waits for answers and then process its query Qi with
new intermediate results. So the number of messages exchanged in the whole
network of n participants is bounded to (k + l) ∗ n per round. This prevents
congestion of the network, saves bandwidth and energy.</p>
          <p>
            Continuous progress Thanks to shuffling properties, the RPS layer will bring
periodically to Bi uniform sampling of the network, and potentially nodes
that Bi has never met before. So the query will progress. As we know the
size of the network 8, Bi is able to estimate the ratio of explored nodes with
respect to the total size of the network and decides to terminate or continue
the query execution. This is a form of Pay-as-you-go query processing. This
approach can be compared to Link Traversal approach [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] except that we
do not follow links to access data, we let the shuffling brings data to nodes.
We also do not have the problem of reachability as peer sampling prevents
any partition within the network.
          </p>
          <p>Automatic Clustering As the query progress, Bi meets other browsers with
similar profiles, they automatically create a a community that crawls the
data space thanks to shuffling. Consequently, the more browsers execute
similar queries, the more they improve their crawling capacity.</p>
          <p>Browsers Data</p>
          <p>B1
B2
B3
B4
a P1 a
b P1 a
c P1 a
a P2 a
d P1 a
a P1 b
e P1 a
(c) Initial
Browsers
Data</p>
          <p>B1
?s P1 a
a P1 b</p>
          <p>B3</p>
          <p>B2
?s P1 a
?s P1 a
a ?p b</p>
          <p>B4</p>
          <p>B1
?s P1 a
a P1 b</p>
          <p>B3</p>
          <p>B2
?s P1 a
?s P1 a
a ?p b</p>
          <p>B4
(a) Network state after
the first shuffling
(b) Network state after
the second shuffling</p>
          <p>To illustrate Snob query execution model, consider a subset of a network
of browsers with a random peer sampling network (black link) and a semantic
overlay network (blue link) in Fig. 2a. In this example, each browser Bi has at
least one query to execute. For simplicity, the example is centred on the browser
B1 for a period of two shuffling windows. In Fig. 2a, B1 has only one neighbor
B2 in the SON and no neighbor in the RPS. B1 executes the query Q1 which
has one triple pattern: T P1 : ?s P1 a, therefore the profile of B1 contains T P1.</p>
          <p>
            B1 sends T P1 to B2. B2 answers with the triples matching T P1 in its local
data store (cf. 2c). Once B1 receives answers, B1 inserts intermediate results
into its local data store and executes Q1 which will result in two triples. Then,
B1 suspends Q1 execution until the next shuffling. During this time, B1 only
answers neighbours triple pattern queries.
8 We use an estimation of the size of the network as proposed in [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]
          </p>
          <p>After the second shuffling, in Fig 2b, B1 has now more neighbors, B2, B3
and B4. In red new the established links, solid links are for RPS neighbours
and dashed ones for SON. B1 sends to all neighbors T P1. Neighbours answer
B1 request with matching triples in their local store and intermediate results
collected from their neighbours. B1 stores received intermediate results and
reexecutes Q1. At this state B1 has explored all browsers and the execution of Q1
will produce five triples, which is the complete answers for Q1 with respect to
the data sources available in this network.</p>
          <p>Other browsers executing queries follow the same cycles as B1. In Fig. 2a,
B2 and B4 also execute queries with the same triple pattern as browser B1, i.e,
?s P1 a. De facto, in a bigger network, browsers B1, B2 and B4 will share their
crawling capacity i.e, they will rely on their respective intermediate results and
their different exploration of the network to answer faster their queries.
4</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>Experimental Study</title>
        <p>
          The goal of the experimental study is to evaluate the effectiveness of Snob. We
expect to see that the SON of Snob overcomes the RPS in terms of the number
of produced answers and the number of messages is constant between rounds
and always bounded to the number of edges in the network.
We use RDFStore-js [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] as a data store and a query engine. Each browser hosts
a RDFStore-js extended with the Snob model presented in section 3. Snob source
code is available at 9. A Snob client can be a NodeJs client or pure JavaScript
client running in a browser. For our experiments, we focused on a NodeJs client
and completely abstracted the WebRTC communication layer. All experiments
were executed on one machine with Intel(R) Xeon(R) CPU E5-2680v2@2.80GHz
with 40 cores and 130GB RAM. NodeJs version is set to 8.11.1 (LTS).
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>9 https://github.com/folkvir/webrtc-dequenpeda</title>
      <p>We use the real datasets: Diseasome10 and Linked Movies Database11
(LinkedMDB). We generate 100 queries per dataset using PATH and STAR shaped
templates with two to eight triple patterns that are instantiated with random values
from the dataset. We removed the queries that caused the query engine to abort
execution. This results in 100 queries from LinkedMDB and 96 queries from
Diseasome. For these queries, we extracted triple patterns, each triple pattern
of the query is executed as a SPARQL construct query and produces a fragment
that we split into two sub-fragments. Sub-fragments are randomly distributed
across the clients, each client hosts at least one fragment.</p>
      <p>We set up a network of 196 clients. Each client can execute at most one query,
we vary the number of queries executed in the network to observe the impact on
the network clustering. We choose one query per client (all), half of the clients
executing queries (half) and a quarter of clients executing queries (quarter). The
quarter is a subset of the half-selected queries. Table 1 presents the different
parameters used in the experiment. We vary the value of parameters according
to the objective of the experimentations as explained in the following sections.
The shuffle time is fixed to 10 minutes for all experiments.</p>
      <p>Evaluation Metrics: i) Continuous Progress: is the number of completed
queries. ii) Constant Messages number : is the number of messages produced
10 https://old.datahub.io/dataset/fu-berlin-diseasome
11 http://data.linkedmdb.org/
by round. iii) Automatic Clustering: is the clustering of clients running similar
queries.</p>
      <p>Results presented for all metrics are the average of three successive executions
of 100 rounds. A round corresponds to a query execution after a periodic shuffle
where the shuffling time is set to 10 minutes.
4.2</p>
      <sec id="sec-6-1">
        <title>Experimental Results</title>
        <p>Continuous progression To study the impact of the number of queries in
the network on the progress of queries execution, we used different number of
queries.</p>
        <p>Fig. 3 presents the completeness of query answers per round for three different
configurations. Firstly for 49 queries (quarter), secondly for 98 queries (half) and
finally for 196 queries (all). A round corresponds to a query execution after a
periodic shuffle where the shuffling time is set to 10 minutes. As we can see, only
RPS+SON provides better query completeness.</p>
        <p>For the Half and Quarter at round 100, the RPS produces respectively
22.48% and 18.91% of complete answers. Compared to the RPS, the RPS+SON
produces respectively 30.28% and 19.60%. We notice that the RPS+SON
produce a better completeness rather than just relying on the RPS. However, the
gap between RPS and RPS+SON is drastically reduced for 196 queries. This
is because while executing queries, intermediate results are materialized, when
the number of queries increases, the number of intermediate results is naturally</p>
        <p>(a) RPS Graph at Round 100</p>
        <p>(b) SON Graph at Round 1
(c) SON Graph at Round 50
(d) SON Graph at Round 100
increased. Thus, as intermediate results can be used to answer triple patterns
queries, the replication of intermediate results helps to find relevant fragments
faster. This is why we can sometimes notice a curve crossing between the RPS
and RPS+SON curves.</p>
        <p>Constant message number In all configuration, the number of messages
is always bounded to (k + l) ∗ |queries| as only one triple pattern query is
sent to a neighbour. For example, in Fig. 4 all configuration (196 queries), the
average number of message at round 100 is 1834 bounded to 1960 ((k + l) =
10 and |queries| = 196). As expected, we can notice for each configuration
that the average number of results is always smaller than the limit. Note here
that we do not take into account messages produced by RPS and SON for
their establishment and their maintenance. Only Snob messages are taken into
account.</p>
        <p>Automatic clustering In Fig. 5, the directed graph represents the RPS and
the clustering overlay network where a node represents a client in the network
and an edge represents the connection between clients. For example, an edge 1
→ 2 means that the client 1 has the client 2 in its SON view. Fig. 5a shows
clearly that the RPS network groups nodes randomly. However, Figures 5b, 5c
and 5d demonstrate that Snob SON network is able to cluster clients according
to their queries similarities, the clusters are clearly distinguished at round 100
where a greater value of rounds promotes the clustering.
5</p>
        <sec id="sec-6-1-1">
          <title>Conclusion</title>
          <p>In this paper, we described how a decentralized application running in web
browsers is able to provide a SPARQL service over data stored in browsers. We
proposed Snob, a query execution model for SPARQL query over RDF data
hosted in a network of browsers. The execution of a query in a browser is similar
to the execution of a federated SPARQL query over remote data sources. In
Snob, direct neighbours in the network are the data sources and results received
from neighbours are stored locally as intermediate results. As data sources are
renewed every shuffling, the query continues to progress and could produce new
results after each shuffling without congesting the network. To speed up query
execution, browsers processing similar queries are connected through a semantic
overlay network. Experimentation shows that the number of answers produced
by queries grows with the number of executed queries in the network while the
number of exchanged messages is always bounded per user.</p>
          <p>This work opens several perspectives. First, data exchange between browsers
has to be optimized when a data source is revisited to optimize the traffic.
Second, intermediate results could be streamed on the semantic overlay network.
Finally, a DHT service could be implemented in browsers to speed up the
discovery of similar queries in the network.
6</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>Acknowledgements</title>
          <p>This work was partially funded by the French ANR project O’Browser
(ANR16-CE25-0005-01). Mr. Grall is funded by the GFI Informatique compagny (145
Boulevard Victor Hugo, 93400, Saint Ouen, France).</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aberer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cudré-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Despotovic</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hauswirth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Punceva</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
          </string-name>
          , R.:
          <article-title>P-grid: a self-organizing structured p2p system</article-title>
          .
          <source>ACM SIGMOD Record</source>
          <volume>32</volume>
          (
          <issue>3</issue>
          ),
          <fpage>29</fpage>
          -
          <lpage>33</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aberer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cudré-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hauswirth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Pelt</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Gridvine: Building internet-scale semantic overlay networks</article-title>
          .
          <source>In: International semantic web conference</source>
          . pp.
          <fpage>107</fpage>
          -
          <lpage>121</lpage>
          . Springer (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Abiteboul</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bienvenu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galland</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antoine</surname>
          </string-name>
          , É.:
          <article-title>A rule-based language for web data management</article-title>
          .
          <source>In: Proceedings of the thirtieth ACM SIGMOD-SIGACTSIGART symposium on Principles of database systems</source>
          . pp.
          <fpage>293</fpage>
          -
          <lpage>304</lpage>
          . ACM (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bertier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frey</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guerraoui</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kermarrec</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leroy</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>The gossple anonymous social network</article-title>
          .
          <source>In: 11th International Middleware Conference 'Middleware</source>
          <year>2010</year>
          )
          <article-title>-</article-title>
          ACM/IFIP/USENIX. LNCS, vol.
          <volume>6452</volume>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>211</lpage>
          . Springer (
          <year>2010</year>
          ), http://dx.doi.org/10.1007/978-3-
          <fpage>642</fpage>
          -16955-7_
          <fpage>10</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Buchegger</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schiöberg</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vu</surname>
            ,
            <given-names>L.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Peerson: P2p social networking: early experiences and insights</article-title>
          .
          <source>In: Proceedings of the Second ACM EuroSys Workshop on Social Network Systems</source>
          . pp.
          <fpage>46</fpage>
          -
          <lpage>52</lpage>
          . ACM (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Crespo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Molina</surname>
          </string-name>
          , H.:
          <article-title>Semantic overlay networks for p2p systems</article-title>
          .
          <source>In: International Workshop on Agents and P2P Computing</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          . Springer (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Diallo</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sene</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lloret</surname>
          </string-name>
          , J.:
          <article-title>Distributed database management techniques for wireless sensor networks</article-title>
          .
          <source>IEEE Transactions on Parallel and Distributed Systems</source>
          <volume>26</volume>
          (
          <issue>2</issue>
          ),
          <fpage>604</fpage>
          -
          <lpage>620</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Doulkeridis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vlachou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nørvåg</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vazirgiannis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Distributed semantic overlay networks</article-title>
          . In: Handbook of Peer-to-Peer Networking, pp.
          <fpage>463</fpage>
          -
          <lpage>494</lpage>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Filali</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bongiovanni</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huet</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baude</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A survey of structured p2p systems for rdf data storage and retrieval</article-title>
          . In:
          <article-title>Transactions on large-scale data-and knowledge-centered systems III</article-title>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>55</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Folz</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skaf-Molli</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molli</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>CyCLaDEs: a decentralized cache for Linked Data Fragments</article-title>
          . In: ESWC: Extended Semantic Web Conference (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Grubenmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moor</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seuken</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Challenges of source selection in the wod</article-title>
          . In: International Semantic Web Conference. pp.
          <fpage>313</fpage>
          -
          <lpage>328</lpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Squin: a traversal based query execution system for the web of linked data</article-title>
          .
          <source>In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data</source>
          . pp.
          <fpage>1081</fpage>
          -
          <lpage>1084</lpage>
          . ACM (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hernández</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          , GARCıA,
          <string-name>
            <surname>M.:</surname>
          </string-name>
          <article-title>A javascript rdf store and application library for linked data client applications</article-title>
          . In:
          <article-title>Devtracks of the, WWW2012, conference</article-title>
          . Lyon, France. Citeseer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lv</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shenker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Search and replication in unstructured peer-to-peer networks</article-title>
          .
          <source>In: Proceedings of the 16th international conference on Supercomputing</source>
          . pp.
          <fpage>84</fpage>
          -
          <lpage>95</lpage>
          . ACM (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mansour</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sambra</surname>
            ,
            <given-names>A.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hawke</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zereba</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Capadisli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aboulnaga</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A demonstration of the solid platform for social web applications</article-title>
          .
          <source>In: Proceedings of the 25th International Conference Companion on World Wide Web</source>
          . pp.
          <fpage>223</fpage>
          -
          <lpage>226</lpage>
          . International World Wide Web Conferences Steering Committee (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Molli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skaf-Molli</surname>
          </string-name>
          , H.:
          <article-title>Semantic web in the fog of browsers</article-title>
          .
          <source>In: Proceedings of the Workshop on Decentralizing the Semantic Web</source>
          <year>2017</year>
          co
          <article-title>-located with 16th International Semantic Web Conference (ISWC</article-title>
          <year>2017</year>
          )
          <article-title>(</article-title>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Montoya</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skaf-Molli</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>Federated SPARQL Queries Processing with Replicated Fragments</article-title>
          .
          <source>In: The Semantic Web - ISWC 2015 - 14th International Semantic Web Conference. Bethlehem, United States (Oct</source>
          <year>2015</year>
          ), http://hal.univ-nantes.fr/hal-01169601
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Nédelec</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frey</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mostéfaoui</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An adaptive peersampling protocol for building networks of browsers</article-title>
          . World Wide Web pp.
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Nejdl</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sintek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naeve</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nilsson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmér</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Risch</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Edutella: a p2p networking infrastructure based on rdf</article-title>
          .
          <source>In: Proceedings of the 11th international conference on World Wide Web</source>
          . pp.
          <fpage>604</fpage>
          -
          <lpage>615</lpage>
          . ACM (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Özsu</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valduriez</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Principles of distributed database systems</article-title>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stuckenschmidt</surname>
          </string-name>
          , H.:
          <article-title>Semantic Web and peer-to-peer</article-title>
          . Springer (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Voulgaris</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gavidia</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Steen</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>CYCLON: inexpensive membership management for unstructured P2P overlays</article-title>
          .
          <source>Journal of Network and Systems Management</source>
          <volume>13</volume>
          (
          <issue>2</issue>
          ),
          <fpage>197</fpage>
          -
          <lpage>217</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>