SWEEP: a Streaming Web Service to Deduce
     Basic Graph Patterns from Triple Pattern
                   Fragments

      Emmanuel Desmontils, Patricia Serrano-Alvarado and Pascal Molli

                 LS2N Laboratory - Université de Nantes – France
                    {firstname.lastname}@univ-nantes.fr


      Abstract. The Triple Pattern Fragments (TPF) interface demonstrates
      how it is possible to publish Linked Data at low-cost while preserving
      data availability. However, data providers hosting TPF servers are not able
      to analyze the SPARQL queries they execute because they only receive and
      evaluate subqueries with one triple pattern. Understanding the executed
      SPARQL queries is important for data providers for prefetching,
      benchmarking, auditing, etc. We propose SWEEP, a streaming web service
      that deduces Basic Graph Patterns (BGPs) of SPARQL queries from a
      TPF server log. We show that SWEEP is capable of extracting BGPs of
      SPARQL queries evaluated by DBpedia’s TPF server.


1   Introduction
The Triple Pattern Fragments (TPF) interface demonstrates how it is possible to
publish Linked Data at low-cost while preserving data availability [8]. However,
data providers hosting TPF servers are not able to analyze the SPARQL queries
executed by their clients because they only receive single triple pattern queries.
    Understanding the executed SPARQL queries is fundamental for data
providers. Mining logs of SPARQL endpoints makes it possible to detect recurrent
patterns in queries for prefetching [1], benchmarking [3], auditing [4], etc. It
reveals the types of queries issued, their complexity and the resources used [2,6]. Such analysis
cannot be done on logs of TPF servers because they only contain information
about single triple patterns. A Basic Graph Pattern (BGP) of a SPARQL query,
that is, a set of conjunctive triple patterns, is scattered over the log.
    Verborgh et al. [7] reported statistics from the logs of DBpedia’s TPF server.
However, these statistics concern only single triple pattern queries, not BGPs.
In previous work [5], we proposed an algorithm to extract BGPs of federated SPARQL
queries from logs of a federation of SPARQL endpoints. Here, we address a
similar scientific problem but in the context of a single TPF server.
    In this demonstration, we present SWEEP, a streaming web service that
extracts BGPs from the log of a TPF server in real time. Its input is the stream
of single triple pattern queries received by the server. This allows data
providers running TPF servers to better know how their data are used. The
demonstration highlights the performance of SWEEP in terms of precision and
recall.

2   Motivating example
In Figure 1, two clients, c1 and c2, concurrently execute queries Q1 and Q2
over DBpedia’s TPF server. Q1 asks for movies starring Brad Pitt and Q2
for movies starring Natalie Portman (footnote 1). Both queries have one BGP
composed of several triple patterns (tpn).


c1 (173.28.19.114): Query Q1

SELECT ?movie ?title ?name WHERE {
  ?movie dbpedia-owl:starring ?actor .        (tp1)
  ?actor rdfs:label "Brad Pitt"@en .          (tp2)
  ?movie rdfs:label ?title .                  (tp3)
  ?movie dbpedia-owl:director ?director .     (tp4)
  ?director rdfs:label ?name                  (tp5)
  FILTER LANGMATCHES(LANG(?title), "EN")
  FILTER LANGMATCHES(LANG(?name), "EN")
}

c2 (173.28.19.114): Query Q2

SELECT ?titleEng ?title WHERE {
  ?movie dbpprop:starring ?actor .            (tp'1)
  ?actor rdfs:label "Natalie Portman"@en .    (tp'2)
  ?movie rdfs:label ?titleEng .               (tp'3)
  ?movie rdfs:label ?title                    (tp'4)
  FILTER LANGMATCHES(LANG(?titleEng), "EN")
  FILTER (!LANGMATCHES(LANG(?title), "EN"))
}

c1 -> DBpedia’s TPF server:  ?predicate = rdfs:label & ?object = "Brad Pitt"@en ...
c2 -> DBpedia’s TPF server:  ?predicate = rdfs:label & ?object = "Natalie Portman"@en ...

                      Fig. 1: Concurrent execution of queries Q1 and Q2.

    IP     Time       Asked triple pattern/TPF
1   172... 11:24:19   ?predicate=rdfs:label & ?object="Brad Pitt"@en
2   172... 11:24:23   dbpedia:Brad_Pitt rdfs:label "Brad Pitt"@en ,
3   172... 11:24:24   ?predicate=dbpedia-owl:starring & ?object=dbpedia:Brad_Pitt
4   172... 11:24:27   dbpedia:A_River_Runs_Through_It_(film) dbpedia-owl:starring dbpedia:Brad_Pitt
                      dbpedia:Troy_(film) dbpedia-owl:starring dbpedia:Brad_Pitt ...
5   172... 11:24:28   ?subject=dbpedia:A_River_Runs_Through_It_(film) & ?predicate=rdfs:label

             Table 1: Excerpt of a DBpedia’s TPF server log for query Q1 .


    Each TPF client decomposes its SPARQL query into a sequence of triple
pattern queries; Table 1 shows part of this sequence for Q1. The odd-numbered
lines are triple pattern queries received by the server, and the even-numbered
ones are the triples sent back after evaluation on the RDF graph. Lines 1 and 3
correspond to the triple pattern queries for tp2 and tp1 of Q1 (footnote 2). The
object in Line 3 comes from a mapping seen in Line 2. This injection of a
mapping obtained from a previous triple pattern query clearly indicates a bind
join from tp2 towards tp1.
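
    The bind-join observation above can be illustrated with a small sketch. It
is only an illustration under simplifying assumptions, not the SWEEP code: the
LogEntry class and the helper names are hypothetical, a log entry is modelled as
a triple pattern query paired with the triples returned for it, and variables
are recognized by a leading "?".

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    """Hypothetical model: one triple pattern query and the triples sent back."""
    pattern: tuple                               # (subject, predicate, object)
    triples: list = field(default_factory=list)  # result triples


def is_var(term):
    return term.startswith("?")


def mappings(entry):
    """Constants bound to the variables of entry.pattern in its results."""
    found = set()
    for triple in entry.triples:
        for p_term, t_term in zip(entry.pattern, triple):
            if is_var(p_term):
                found.add(t_term)
    return found


def is_bind_join(previous, current):
    """True if a constant of `current` was produced as a mapping by `previous`."""
    constants = {t for t in current.pattern if not is_var(t)}
    return bool(constants & mappings(previous))


# Lines 1-2 and 3-4 of Table 1, simplified.
lines_1_2 = LogEntry(
    pattern=("?subject", "rdfs:label", '"Brad Pitt"@en'),
    triples=[("dbpedia:Brad_Pitt", "rdfs:label", '"Brad Pitt"@en')],
)
lines_3_4 = LogEntry(
    pattern=("?subject", "dbpedia-owl:starring", "dbpedia:Brad_Pitt"),
)

print(is_bind_join(lines_1_2, lines_3_4))  # True
```

Here dbpedia:Brad_Pitt is a mapping of the query in Line 1 and reappears as the
object of the query in Line 3, which is what suggests the bind join.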
    As the TPF server only sees triple pattern queries, the original queries are
unknown to the data provider. In this work, we address the following research
question: Can we extract BGPs from a TPF server log?
    The main challenge is to distinguish similar queries, that is, queries whose
triple patterns look the same to the TPF server, such as tp1 and tp'1. In our
example, we aim to extract two BGPs from the TPF server log, one corresponding
to Q1, BGP[1] = {tp1 . tp2 . tp3 . tp4 . tp5}, and another corresponding to Q2,
BGP[2] = {tp'1 . tp'2 . tp'3 . tp'4}.
1   These queries come from http://client.linkeddatafragments.org/.
2   TPF clients always rename variables as "subject" or "object", regardless of how they
    are named in the original query.
3   SWEEP
SWEEP takes as input a TPF server log, like the one in Table 1, which is an
unbounded ordered sequence of execution traces organized by IP address. It
considers a fixed-size window sliding over the TPF server log. The window size
can be chosen according to the memory available for the streamed log or to the
average of the known timeout values used by TPF clients.
    We consider a set G of deduced BGPs. Each time a triple pattern query (tpqi)
arrives, SWEEP creates a new BGPj ∈ G or updates an existing one.
    Suppose G is empty and SWEEP receives tpq1 = {?s p2 toto}, where ?s
produces two mappings: {c1, c2}. As G is empty, SWEEP creates BGP1 containing
tpq1 with the current time as timestamp, BGP1.ts = time().
    Then, if tpq2 = {c1 p1 ?o} arrives, as c1 appears in the mappings of a
BGPj ∈ G, SWEEP detects a bind join. This implies updating BGP1 with the join
{?s p2 toto . ?s p1 ?o}. If tpq3 = {c2 p1 ?o} arrives, as it is already
represented in BGP1, nothing is done.
    If BGP1 is out of the window, i.e., time() - BGP1.ts > window, then it must
no longer be updated; it is delivered and removed from the stream.
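
    The behaviour described above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the SWEEP implementation: the names
Sweep, BGP, receive and expire are hypothetical, the log is assumed to come from
a single client IP, variables keep the names of the example (?s, ?o) although
real TPF logs rename them, and the timestamp is set at creation and never
refreshed.

```python
from dataclasses import dataclass, field


@dataclass
class BGP:
    ts: float                                     # creation time (BGP.ts)
    patterns: list = field(default_factory=list)  # deduced triple patterns
    origin: dict = field(default_factory=dict)    # mapped constant -> variable


def is_var(term):
    return term.startswith("?")


class Sweep:
    """Hypothetical sketch of the deduction loop for one client IP."""

    def __init__(self, window):
        self.window = window
        self.G = []          # BGPs still inside the window
        self.delivered = []  # BGPs that left the window

    def expire(self, now):
        """Deliver and remove every BGP with now - ts > window."""
        self.delivered += [b for b in self.G if now - b.ts > self.window]
        self.G = [b for b in self.G if now - b.ts <= self.window]

    def receive(self, tpq, triples, now):
        """Process one triple pattern query and the triples returned for it."""
        self.expire(now)
        for bgp in self.G:
            if any(term in bgp.origin for term in tpq):
                # Bind join: put back the variable behind each injected constant.
                pattern = tuple(bgp.origin.get(term, term) for term in tpq)
                if pattern not in bgp.patterns:
                    bgp.patterns.append(pattern)
                break
        else:
            # No existing BGP explains this query: start a new one.
            bgp = BGP(ts=now, patterns=[tpq])
            self.G.append(bgp)
        # Remember which variable produced each constant in the results.
        for triple in triples:
            for q_term, value in zip(tpq, triple):
                if is_var(q_term):
                    bgp.origin.setdefault(value, q_term)
        return bgp


# Replay of the example: tpq1, tpq2, tpq3, then the window elapses.
sweep = Sweep(window=60)
bgp1 = sweep.receive(("?s", "p2", "toto"),
                     [("c1", "p2", "toto"), ("c2", "p2", "toto")], now=0)
sweep.receive(("c1", "p1", "?o"), [], now=1)  # bind join on c1
sweep.receive(("c2", "p1", "?o"), [], now=2)  # already represented
print(bgp1.patterns)  # [('?s', 'p2', 'toto'), ('?s', 'p1', '?o')]
sweep.expire(now=100)  # 100 - 0 > 60: BGP1 is delivered
```

Keeping a constant-to-variable dictionary per BGP is only one way to detect the
injection of a mapping; it trades memory for a constant-time lookup on each
incoming query.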
    We ran SWEEP with the queries proposed by the TPF web client
(http://client.linkeddatafragments.org/). Over the 21 queries executed, the
deduced BGPs reached 100% precision and 87% recall when compared to the
BGPs of the corresponding original queries. SWEEP succeeds in this case because
these queries are not very similar. A more challenging set of queries would
produce different precision and recall.
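
    As an illustration of such a comparison, the sketch below computes precision
and recall for one deduced BGP against its original BGP, both seen as sets of
triple patterns. This is an assumed, per-query measure given for illustration
only; the text does not state the exact formula used by SWEEP. The function name
is hypothetical, and variable names are assumed to be already aligned between
the two BGPs.

```python
def precision_recall(deduced, original):
    """Precision and recall of a deduced BGP with respect to the original one.

    Both arguments are collections of triple patterns (tuples). Assumed
    definition: precision = |deduced & original| / |deduced|,
    recall = |deduced & original| / |original|.
    """
    deduced, original = set(deduced), set(original)
    common = deduced & original
    precision = len(common) / len(deduced) if deduced else 0.0
    recall = len(common) / len(original) if original else 0.0
    return precision, recall


# BGP of Q1 (tp1 to tp5).
original = {
    ("?movie", "dbpedia-owl:starring", "?actor"),
    ("?actor", "rdfs:label", '"Brad Pitt"@en'),
    ("?movie", "rdfs:label", "?title"),
    ("?movie", "dbpedia-owl:director", "?director"),
    ("?director", "rdfs:label", "?name"),
}
# Hypothetical deduction that misses tp5 (made-up data, not a SWEEP result).
deduced = original - {("?director", "rdfs:label", "?name")}

precision, recall = precision_recall(deduced, original)
print(precision, recall)  # 1.0 0.8
```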

4   Demo
Figure 2 presents the dashboard of SWEEP, available at
http://sweep.priloo.univ-nantes.fr. It shows the most recent deduced BGPs and
the original client queries when they are available. Our TPF client,
http://tpf-client-sweep.priloo.univ-nantes.fr, sends the original client query
to SWEEP so that precision and recall can be calculated.
    To test SWEEP with another TPF client, you must specify the address of
the DBpedia’s TPF server we have set up:
http://tpf-server-sweep.priloo.univ-nantes.fr. In this case, SWEEP will deduce
BGPs but will not be able to calculate precision and recall.
    We used the JavaScript (Node.js) versions of the TPF server and client.
The source code is available at https://github.com/edesmontils/SWEEP.

5   Conclusion and perspectives
SWEEP demonstrates how it is possible to deduce the BGPs executed by a TPF
server. This allows data providers to have a better understanding of the usage
of their data.
    With SWEEP, it would be possible to detect whether clients are executing
federated queries over multiple datasets hosted by one TPF server. Moreover, if
multiple data providers agreed to stream their logs to a shared SWEEP service,
they would be able to detect federated queries executed over multiple TPF servers.
                           Fig. 2: SWEEP dashboard.


References
1. J. Lorey and F. Naumann. Detecting SPARQL Query Templates for Data Prefetch-
   ing. In ESWC Conference, 2013.
2. K. Möller, M. Hausenblas, R. Cyganiak, G. Grimnes, and S. Handschuh. Learning
   from Linked Open Data Usage: Patterns & Metrics. In WebSci10: Extending the
   Frontiers of Society On-Line, 2010.
3. M. Morsey, J. Lehmann, S. Auer, and A.-C. N. Ngomo. DBpedia SPARQL
   Benchmark–Performance Assessment with Real Queries on Real Data. In ISWC
   Conference, 2011.
4. S. U. Nabar, B. Marthi, K. Kenthapadi, N. Mishra, and R. Motwani. Towards
   Robustness in Query Auditing. In VLDB Conference, 2006.
5. G. Nassopoulos, P. Serrano-Alvarado, P. Molli, and E. Desmontils. FETA: Federated
   QuEry TrAcking for Linked Data. In DEXA Conference, 2016.
6. F. Picalausa and S. Vansummeren. What are Real SPARQL Queries Like? In SWIM
   Workshop, 2011.
7. R. Verborgh, E. Mannens, and R. Van de Walle. Initial Usage Analysis of DBpedia’s
   Triple Pattern Fragments. In USEWOD Workshop, 2015.
8. R. Verborgh, M. Vander Sande, O. Hartig, J. Van Herwegen, L. De Vocht,
   B. De Meester, G. Haesendonck, and P. Colpaert. Triple Pattern Fragments: a
   Low-cost Knowledge Graph Interface for the Web. Journal of Web Semantics,
   37–38, Mar. 2016.