<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Querying non-RDF Datasets using Triple Patterns</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Benjamin Moreau</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patricia Serrano-Alvarado</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emmanuel Desmontils</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Thoumas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>GDD-LS2N - Nantes University</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>OpenDataSoft</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>The Triple Pattern Fragments (TPF) interface allows Linked Data datasets to be queried with high data availability. However, data providers do not integrate large numbers of datasets as Linked Data because of the investment this requires in storage and maintenance. The problem we focus on is how to integrate non-RDF datasets on demand as Linked Data, simply and efficiently. In this demo, we present ODMTP, an On-Demand Mapping approach using Triple Patterns over non-RDF datasets. ODMTP is implemented over a TPF server. We showcase it with SPARQL queries over Twitter.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Figure 1 presents the global architecture of ODMTP. TPF clients receive SPARQL
queries and decompose them into several HTTP requests, which are sent to a TPF
server. Such requests, which we call triple pattern queries (TPQs), contain a triple
pattern and a page number.</p>
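<p>A TPQ can be sketched as a plain HTTP GET request carrying the triple pattern and the page number. The following is a minimal illustration; the server URL is hypothetical, and real TPF servers advertise their exact parameter names through hypermedia controls.</p>

```python
from urllib.parse import urlencode

def tpq_url(server, s=None, p=None, o=None, page=1):
    """Build the HTTP request for one triple pattern and one result page.

    A None component stands for a variable in the triple pattern.
    """
    params = {"page": page}
    if s is not None:
        params["subject"] = s
    if p is not None:
        params["predicate"] = p
    if o is not None:
        params["object"] = o
    return server + "?" + urlencode(params)

# One TPQ of a decomposed SPARQL query: {?s it:includedHashtag "iswc2017"}
url = tpq_url("http://example.org/tpf/twitter",
              p="http://www.influencetracker.com/ontology#includedHashtag",
              o='"iswc2017"')
```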
<p>We propose to modify the TPF server so that, instead of evaluating TPQs
over an RDF store, it sends TPQs to ODMTP.</p>
<p>When ODMTP receives a TPQ, it translates it into the target query language
and sends it to the target non-RDF dataset. The query engine of the non-RDF
dataset evaluates this query and returns the resultset to ODMTP. ODMTP then
translates the answer into triples, constructs fragments, and sends them to the
TPF server. Fragments are composed of triples, metadata, and controls (the total
number of triples matching the triple pattern, the number of triples per page, a link
to the next page, etc.).</p>
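<p>The fragment structure described above can be sketched as follows. The field names are illustrative, not the actual ODMTP data model.</p>

```python
def make_fragment(triples, total, page, per_page=100):
    """Assemble one result page: data triples plus metadata and controls."""
    start = (page - 1) * per_page
    fragment = {
        "triples": triples[start:start + per_page],   # data for this page
        "metadata": {"totalItems": total,             # triples matching the pattern
                     "itemsPerPage": per_page},
        # control: link to the next page, if any results remain
        "controls": {"nextPage": page + 1 if total > start + per_page else None},
    }
    return fragment

# 250 matching triples split into pages of 100: the last page holds 50
frag = make_fragment([("s%d" % i, "p", "o") for i in range(250)],
                     total=250, page=3)
```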
<p>ODMTP translations are made possible by a mapping file defined for the
target dataset. Several mapping files can be defined for a target dataset,
depending on the needs of semantic applications.</p>
<p>Figure 2 shows more details about ODMTP. The Trimmer component
receives a TPQ and extracts from a mapping file the mappings pertinent to the
triple pattern. The mapping file maps triple pattern components (i.e., subject,
predicate, or object) to columns or attributes of the target non-RDF dataset. Any mapping
language can be used to define this mapping file as long as it facilitates triple
pattern matching and the mapping of heterogeneous data formats into RDF.</p>
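<p>The Trimmer step can be sketched as filtering a mapping table by the pattern's predicate, where a variable predicate matches everything. The flat dictionary representation below is an assumption for illustration; ODMTP actually reads a mapping file such as an xR2RML document.</p>

```python
def trim(mappings, predicate):
    """Keep only the mapping entries pertinent to the triple pattern.

    mappings: {predicate_uri: target_attribute} (hypothetical flat form)
    predicate: a URI, or None for a variable predicate
    """
    if predicate is None:
        # a variable predicate can match any mapping entry
        return dict(mappings)
    return {p: attr for p, attr in mappings.items() if p == predicate}

mappings = {
    "http://schema.org/author": "user.screen_name",
    "http://www.influencetracker.com/ontology#includedHashtag":
        "entities.hashtags.text",
}
pertinent = trim(mappings,
                 "http://www.influencetracker.com/ontology#includedHashtag")
```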
      <sec id="sec-1-1">
<p>Then, the TPQ and its corresponding mappings are received by TP2Query,
which uses them to translate the triple pattern into a query in the query
language of the non-RDF dataset. TP2Query communicates with the query
engine of the non-RDF dataset and obtains the resultset corresponding to the TPQ.
It also estimates the total number of triples matching the triple pattern. The
implementation of this component depends on the non-RDF dataset.</p>
<p>Finally, the Mapper component uses the mappings to translate the resultset
into triples and produces the fragment that is sent to the TPF server.</p>
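<p>The Mapper step can be sketched as applying the pertinent mappings to each record of the resultset. The record and mapping shapes below are assumptions for illustration, not the actual implementation.</p>

```python
def map_resultset(records, mappings, subject_template):
    """Turn each record into RDF triples using the pertinent mappings.

    records: list of flat dicts from the non-RDF query engine
    mappings: {predicate_uri: record_attribute}
    subject_template: str.format template building the subject URI
    """
    triples = []
    for rec in records:
        subject = subject_template.format(**rec)
        for predicate, attr in mappings.items():
            if attr in rec:
                triples.append((subject, predicate, rec[attr]))
    return triples

records = [{"id_str": "42", "hashtag": "iswc2017"}]
mappings = {"http://www.influencetracker.com/ontology#includedHashtag": "hashtag"}
triples = map_resultset(records, mappings,
                        "https://twitter.com/statuses/{id_str}")
# → [("https://twitter.com/statuses/42",
#     "http://www.influencetracker.com/ontology#includedHashtag", "iswc2017")]
```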
        <p>The challenge is having a mapping file expressive enough for the semantic
application purposes. Besides mapping triple patterns into the target columns
or attributes, this mapping should semantically enrich the target dataset.</p>
        <p>The overhead produced by ODMTP is tightly related to the performance of
the query engine of the target non-RDF dataset.</p>
<p>We evaluated a Python implementation of ODMTP over several datasets
stored in JSON and queried through Elasticsearch. Dataset sizes range
from 100,000 to 500,000 triples. Experiments were executed locally on a computer
with an Intel i7 2.5 GHz processor and 16 GB of RAM.</p>
<p>We measured the time to evaluate the last page (i.e., the potentially most
expensive result page) of the 8 types of triple patterns, i.e., {?s ?p ?o}, {?s
URI ?o}, {?s ?p URI}, {?s URI URI}, {URI URI URI}, etc. We compared
these performances with those of a traditional TPF server using HDTs. ODMTP
produced a constant overhead (+0.250 seconds on average). The reason for this
limited overhead is the set of efficient indexes produced by Lucene
(https://lucene.apache.org/), which is used by Elasticsearch.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Demo</title>
<p>In this demo we show a Python implementation of ODMTP over Twitter.
Twitter, like Elasticsearch, uses Lucene for real-time search. This implementation is
available at https://github.com/benjimor/odmtp-tpf.</p>
      <p>
        We used the TwitterApi package and xR2RML [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] as the mapping language. The
TPF server is set up locally as a Django application. Once installed, it can receive
requests from TPF clients (e.g., http://client.linkeddatafragments.org/).
      </p>
<p>Our implementation is a proof of concept that shows how SPARQL queries
can be evaluated using triple patterns. For the moment, we consider
only triple patterns with a variable in the subject position. The predicates in our
mapping are schema:author, schema:dateCreated, it:includedHashtag, and
it:includedUrl. Objects must have the corresponding data types.</p>
<p>Consider TPQ1 = &lt;?s it:includedHashtag "iswc2017", page=1&gt;. When
ODMTP receives TPQ1, the Trimmer identifies the pertinent mappings in the
mapping file, in this case the mappings shown in Listing 1.1. Then, TP2Query
translates the triple pattern into the Twitter query https://api.twitter.com/
1.1/search/tweets.json?q=%23iswc2017&amp;count=15 and sends it to the
Twitter API (here it asks for 15 entries). Twitter returns results in JSON, including
a link to the next result page. TP2Query follows this link until it retrieves the
pertinent page. Then the Mapper constructs the triples and the fragment.</p>
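<p>The TP2Query translation of TPQ1 can be sketched as follows. Building the search URL by hand like this is an illustration of the step, not the actual implementation; only the hashtag predicate handled in the example above is covered.</p>

```python
from urllib.parse import urlencode

HASHTAG_PREDICATE = "http://www.influencetracker.com/ontology#includedHashtag"

def tp_to_twitter_query(predicate, obj, count=15):
    """Translate a hashtag triple pattern into a Twitter search URL."""
    if predicate != HASHTAG_PREDICATE:
        raise NotImplementedError("only it:includedHashtag is sketched here")
    # the object value becomes a hashtag search query; urlencode
    # percent-encodes the leading '#' as %23
    return ("https://api.twitter.com/1.1/search/tweets.json?"
            + urlencode({"q": "#" + obj, "count": count}))

url = tp_to_twitter_query(HASHTAG_PREDICATE, "iswc2017")
```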
      <p>A more expressive mapping would consider other triple pattern types and
more semantics for Twitter data.</p>
<p>During the demo, users will be able to test several SPARQL queries covered
by our mapping. A video is available at https://youtu.be/wruH8teK9tU.
@prefix rr: &lt;http://www.w3.org/ns/r2rml#&gt; .
@prefix rml: &lt;http://semweb.mmlab.be/ns/rml#&gt; .
@prefix xrr: &lt;http://www.i3s.unice.fr/ns/xr2rml#&gt; .
@prefix schema: &lt;http://schema.org/&gt; .
@prefix it: &lt;http://www.influencetracker.com/ontology#&gt; .
&lt;#Tweets&gt;
  xrr:logicalSource [
    xrr:query "https://api.twitter.com/1.1/" ;
    rml:iterator "$.statuses." ;
  ] ;
  rr:subjectMap [
    rr:template "https://twitter.com/statuses/{$.id_str}" ;
    rr:class schema:SocialMediaPosting ;
  ] ;
  rr:predicateObjectMap [
    rr:predicate it:includedHashtag ;
    rr:objectMap [ xrr:reference "$.entities.hashtags.[].text" ; ] ;
  ]
...</p>
      <sec id="sec-2-1">
        <title>Listing 1.1. Excerpt of the mapping file for Twitter</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion and perspectives</title>
<p>ODMTP shows how to integrate non-RDF datasets as Linked Data through
triple patterns. It is an adaptable and modular framework that does not claim
to make non-RDF datasets compliant with Linked Data principles. Aspects such as
dereferenceable URIs should complete our approach, for instance via dynamic
HTML pages.</p>
<p>We will use our implementation of ODMTP over Elasticsearch to integrate
datasets provided by the OpenDataSoft platform. OpenDataSoft provides
hundreds of open datasets that will benefit from the Web of Data without changing
their storage format.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Vander</given-names>
            <surname>Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Van Herwegen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>De Vocht</surname>
          </string-name>
          , B. De Meester, G. Haesendonck, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert</surname>
          </string-name>
          , “
          <article-title>Triple Pattern Fragments: A Low-cost Knowledge Graph Interface for the Web</article-title>
          ,”
          <source>Journal of Web Semantics</source>
          , vol.
          <volume>37-38</volume>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Halb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Idehen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Thibodeau Jr.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Ezzat</surname>
          </string-name>
          , “
          <article-title>A Survey of Current Approaches for Mapping of Relational Databases to RDF</article-title>
          ,”
          <source>W3C RDB2RDF Incubator Group Report</source>
          , vol.
          <volume>1</volume>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>130</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>F.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Faron-Zucker</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Montagnat</surname>
          </string-name>
          , “
          <article-title>A Mapping-based Method to Query MongoDB Documents with SPARQL</article-title>
          ,” in
          <source>Int. Conference on Database and Expert Systems Applications (DEXA)</source>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>67</lpage>
          , Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>F.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Djimenou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Faron-Zucker</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Montagnat</surname>
          </string-name>
          , “
          <article-title>Translation of Relational and Non-relational Databases into RDF with xR2RML</article-title>
          ,” in
          <source>Int. Conference on Web Information Systems and Technologies (WEBIST)</source>
          , pp.
          <fpage>443</fpage>
          -
          <lpage>454</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>