<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Data API with Security and Graph-Level Access Control</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Barry Norton</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maciej Dziardziel</string-name>
        </contrib>
        <aff>British Museum</aff>
        <email>BNorton@britishmuseum.org</email>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>34</fpage>
      <lpage>41</lpage>
      <abstract>
        <p>In this paper we bring together and extend two trends in building APIs and access control for RDF-based data, in the form of an open source data API implementation. Parameterised SPARQL queries and updates are made available as a RESTful API, providing isolation from the underlying database in the style of most database-driven enterprise architectures. Access to query and update resources is governed by LDAP profiles. In effecting queries and updates, rewriting is employed to provide graph-level access controls, again according to LDAP groups.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        One would hope, therefore, that software frameworks that support RESTful,
or at least pseudo-RESTful<sup>3</sup>, API construction to easily encapsulate database
queries and updates, like the plethora that exist for more established database
technologies, would appear and would quickly adopt the new technologies,
especially JSON-LD. In reality, those promising candidates, such as the BBC's
Linked Data Platform<sup>4</sup> (unconnected with the W3C Working Group, and
having prior claim to the name) and Talis' Kasabi platform [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], remain closed-source
and have even, in the latter case, seemingly been discontinued with no further
access available.
      </p>
      <p>
        In light of this depressing situation, the ResearchSpace project has decided
to implement a new, open source solution to fill this gap. The ResearchSpace
Data API<sup>5</sup> is built using Python and the Django framework and is intended, from
the start, to be enterprise grade, encompassing features such as LDAP-based
security and access [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        A second feature of the API is to provide access control over the RDF
database. Unfortunately even after its second revision [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], SPARQL provided no
means to provide security and access control, or even a suggestion on whether this
should be carried out at database, graph or triple-level. It is worth mentioning
that SPARQL 1.1 did introduce the Graph Store Protocol [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which specifies
access to named graphs in a RESTful manner, making these persistently identified
resources. Oddly, the W3C Linked Data Platform has duplicated much of this
work, calling graphs `containers'. Adding access control directly to graphs with
only CRUD (create, retrieve, update, delete) operations, however, is insufficient
to allow triple-level queries across graphs efficiently. By analogy with relational
databases, APIs might be built over access control that allows certain users/groups
access only to certain tables, but users are still able to query (in SQL) across
the tables to which they are allowed access. Our contribution, therefore, is to associate
LDAP groups with graph-based access control, allowing both complex queries
across graphs and scalability to large numbers of graphs.
      </p>
      <p>In order to explain our contributions further, we shall first introduce the
ResearchSpace project, in Section 2, provide more details on the Data API in
Section 3, and conclude, discussing further work, in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>ResearchSpace</title>
      <p>
        ResearchSpace is an Andrew W. Mellon Foundation funded project aimed at
developing an open source platform to support collaborative internet research
and information sharing with Web-based applications for the cultural heritage
scholarly community.
      </p>
      <p>
        <sup>3</sup> I.e. HTTP-based, without the overhead of SOAP encapsulation, but without
necessarily following REST principles [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p><sup>4</sup> http://www.bbc.co.uk/blogs/internet/posts/Linked-Data-Connecting-together-the-BBCs-Online-Content</p>
      <p><sup>5</sup> http://stash.researchspace.org/projects/DATA</p>
      <p>ResearchSpace will provide a range of flexible tools to support a wide range of
workflows and will develop these tools on an ongoing basis. Semantic technology
is at the core of the infrastructure because it provides an effective mechanism for
research and collaboration across data provided by different organisations and
projects. ResearchSpace aims to reduce the costs of developing and operating
new and innovative systems, creating a more sustainable research and production
environment, and is committed to providing modular open source solutions to
promote uptake of this technology in the cultural heritage sector.</p>
      <p>
        At the centre of the ResearchSpace architecture is a SPARQL-compliant RDF
database, better called a `quadstore' than a `triplestore' due to these standards,
as every statement lives in a named graph. The base data are open datasets
representing museum collections, modelled in the CIDOC-CRM ontology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Currently, for instance, the ResearchSpace store contains the data of the British
Museum<sup>6</sup> and the Yale Centre for British Art Collection<sup>7</sup>, as well as that of a growing
number of other institutions, such as the RKD and Rijksmuseum from the
Netherlands.
      </p>
      <p>Since these datasets must be synchronised with internal, currently non-RDF,
collection systems, the `unit of update' when changes occur is the object record.
For this reason named graphs are used in the first instance as containers for
statements that concern single objects, and can be updated using the SPARQL
Graph Store Protocol. The right-hand side of Figure 1 illustrates the graphs for
a couple of prominent British Museum objects.</p>
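      <p>The per-object update described above can be sketched in Python as follows. This is a minimal illustration, assuming the store exposes the SPARQL 1.1 Graph Store Protocol with indirect graph identification (the graph query parameter). The endpoint address, the graph IRI and the Turtle payload are hypothetical and do not come from the ResearchSpace deployment; the request is only built, not sent.</p>
      <preformat><![CDATA[
```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the store's real address is not given in the paper.
GRAPH_STORE = "http://example.org/rdf-graph-store"


def build_graph_put(graph_iri: str, turtle: str) -> Request:
    """Build (but do not send) a PUT that replaces one object's named graph."""
    # Indirect graph identification: the graph IRI travels in ?graph=...
    url = GRAPH_STORE + "?" + urlencode({"graph": graph_iri})
    return Request(
        url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="PUT",
    )


# Hypothetical graph IRI and label for a single object record.
req = build_graph_put(
    "http://example.org/graph/EOC3130",
    "<http://collection.britishmuseum.org/id/object/EOC3130> "
    '<http://www.w3.org/2000/01/rdf-schema#label> "Example label" .',
)
print(req.get_method(), req.full_url)
```
]]></preformat>
      <p>Because PUT replaces the whole graph, a changed object record can be re-exported and written back without touching the statements of any other object.</p>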
      <p>Fig. 2. Search Component in the ResearchSpace Prototype</p>
      <p>The ResearchSpace system provides powerful but intuitive query and
update functionalities over this aggregated data, as illustrated respectively in
Figures 2 and 3.</p>
      <p>The Search component provides means to construct complex conjunctive
queries by adding clauses that use controlled vocabulary (via
autocomplete) and abstracted properties, or `fundamental relationships' (via a dropdown, which is responsive
to the relationship between the class of the chosen term and the range of the abstract
properties), with provision for filtering (via faceting).</p>
      <p>The Image Annotation component sits together with others, such as Data
Annotation (where existing collection data can be challenged, extended, etc.) and a
Forum and Workflow system (where annotations can be discussed, linked,
etc.). All of these components depend on pre-defined parameterised queries, to
retrieve existing data, and pre-defined updates, to add and update annotations,
forum posts, etc.</p>
    </sec>
    <sec id="sec-3">
      <title>The Data API</title>
      <p>The Kasabi platform, like the BBC Linked Data Platform, chose to view such
pre-defined and parameterised queries as RESTful resources. The queries
can be enacted by HTTP interactions, where the request includes values to bind
to these parameters, which are free variables in the graph patterns of the query.
Kasabi chose to call these `SPARQL Stored Procedures', which is instructive of
the overall approach of isolating the database query interface, but misleading,
since no programming beyond the query language (the counterpart of SQL/relational algebra) is involved; we shall
avoid this terminology. Like Kasabi, the ResearchSpace Data API uses XML
datatypes to declare which variables in the query, or update, are intended to be
substituted at run-time, and which type of value is expected.</p>
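      <p>The idea of typed, declared parameters can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Data API's actual code: the declaration format (a mapping from variable name to datatype name), the supported datatypes and the function names are all hypothetical. Values are validated against the declared type before being substituted for the free variable.</p>
      <preformat><![CDATA[
```python
import re
from urllib.parse import urlparse

# Hypothetical declaration: variable name -> expected XML Schema datatype.
PARAMS = {"obj": "xsd:anyURI"}
TEMPLATE = "SELECT ?label WHERE { ?obj rdfs:label ?label }"

# Characters that may not appear inside a SPARQL IRI reference.
ILLEGAL_IRI_CHARS = re.compile(r'[<>"{}|\\^`\s]')


def render_term(value: str, datatype: str) -> str:
    """Validate a request value and render it as a SPARQL term."""
    if datatype == "xsd:anyURI":
        parsed = urlparse(value)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("not an absolute http(s) URI: %r" % value)
        if ILLEGAL_IRI_CHARS.search(value):
            raise ValueError("illegal character in URI: %r" % value)
        return "<" + value + ">"
    if datatype == "xsd:integer":
        return str(int(value))  # int() raises ValueError on bad input
    raise ValueError("unsupported datatype: " + datatype)


def bind(template: str, params: dict, values: dict) -> str:
    """Replace each declared variable with its validated value in one pass."""
    terms = {name: render_term(values[name], dt) for name, dt in params.items()}
    names = "|".join(re.escape(name) for name in terms)
    # \b keeps ?obj from matching inside a longer name such as ?object.
    pattern = re.compile(r"\?(" + names + r")\b")
    return pattern.sub(lambda m: terms[m.group(1)], template)


query = bind(
    TEMPLATE,
    PARAMS,
    {"obj": "http://collection.britishmuseum.org/id/object/EOC3130"},
)
print(query)
```
]]></preformat>
      <p>Rejecting values that do not match the declared type is what lets the API isolate the database from its callers: a request can fill a parameter, but cannot inject further graph patterns.</p>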
      <p>We extend this foregoing work in four important ways:</p>
      <list list-type="order">
        <list-item><p>we expose and maintain our API implementation as open source, inviting
community submissions;</p></list-item>
        <list-item><p>instead of tying the access model to a specific platform, we use the LDAP
standard;</p></list-item>
        <list-item><p>we provide means to schedule queries and updates, together with automatic
inspection of results for the former (in the form of XPath for SELECT
queries and SPARQL ASK queries for CONSTRUCT queries), as well as both
an API to inspect runs and test results and email notifications of
timings and test results, for the purposes of monitoring;</p></list-item>
        <list-item><p>we include graph-level security by query rewriting.</p></list-item>
      </list>
      <p>Figure 4 illustrates how an LDAP access model is used to bring together
access to queries and updates, and access to the underlying data. Each user of the
Data API must have an LDAP identity and, according to their group memberships, is allowed
access to certain queries and updates. This is illustrated by the
ResearchSpace-level groups for read access to image annotations, whereby
users like Maciej can view existing image annotations retrieved via a pre-defined
query, and for write access to image annotations, whereby users like Barry can
furthermore change annotations via pre-defined updates.</p>
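      <p>This first level of access control can be sketched in Python as below. The sketch is illustrative only: a real deployment would look memberships up in an LDAP directory, whereas here they are a hard-coded snapshot, and the group and resource names are hypothetical stand-ins for those in Figure 4.</p>
      <preformat><![CDATA[
```python
# Hypothetical snapshot of LDAP group membership (normally read from LDAP).
MEMBERS = {
    "image_annotation_read": {"maciej", "barry"},
    "image_annotation_write": {"barry"},
}

# Hypothetical mapping: stored query/update -> groups allowed to execute it.
RESOURCE_GROUPS = {
    "get_annotations": {"image_annotation_read", "image_annotation_write"},
    "update_annotation": {"image_annotation_write"},
}


def groups_of(user: str) -> set:
    """Return the names of all groups the user belongs to."""
    return {group for group, members in MEMBERS.items() if user in members}


def may_execute(user: str, resource: str) -> bool:
    """A user may run a resource if they share at least one group with it."""
    allowed = RESOURCE_GROUPS.get(resource, set())
    return bool(groups_of(user) & allowed)


print(may_execute("maciej", "get_annotations"))    # read-only user, query
print(may_execute("maciej", "update_annotation"))  # read-only user, update
print(may_execute("barry", "update_annotation"))   # read/write user, update
```
]]></preformat>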
      <p>Fig. 4. LDAP Example</p>
      <p>Being allowed to execute a given query or update, however, does not mean
that users in these groups have the ability to view or change arbitrary data. A
second level of access control is enacted at run-time, when the query or update is
rewritten, before execution, according to the other LDAP groups of which the
requesting user is a member. Comparing Figures 4 and 1 we can see that the
BM-specific groups `read' and `write' are represented in the triplestore (using
the standard ldaps: URI scheme). The intuition behind this specific example is
that while the image annotation component may be made available to certain
ResearchSpace users, on either a read or a read/write basis, the actual objects
whose annotations a user is thereby enabled to view or change may be
specific to the project on which they work. In particular, their project may provide
access to non-public datasets.</p>
      <p>
        Some existing open source approaches to (RDF) graph-level access control
over SPARQL, such as the Shi3ld component [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] produced in the DataLift
project<sup>8</sup>, are based on enumeration of the graphs to which a user/group is
allowed access, followed by naive expansion of the query via the addition of FROM
clauses. This is infeasible in ResearchSpace due to the fine granularity of named
graphs. There are four million graphs in the British Museum collection alone;
a rewrite that enumerated these would be rejected by most triplestores. If access
control is instead achieved by making an internal join within the query, it is quite
feasible, so the Data API introduces graph collections to which LDAP groups are
allowed access. This makes query expansion more involved when the query
contains more than simple triple patterns, but it remains feasible and is efficiently realised
in a well-indexed RDF database, i.e. one where the named graph, or `context',
forms part of the indices.
      </p>
      <p><sup>8</sup> http://wimmics.inria.fr/projects/shi3ld/</p>
      <p>As a trivial example, suppose that the image annotation component
simply needs to query for the label of objects. The query shown in Listing 1.1
will be published to the LDAP group researchspace,cn=image annotation read,
via the Data API, with a specification that the ?obj variable is a parameter that
should be substituted at run-time with a URI.</p>
      <preformat>SELECT ?label
WHERE {
  ?obj rdfs:label ?label
}</preformat>
      <p>Listing 1.1. Example SPARQL Query</p>
      <p>On behalf of any user allowed at least read-access to the image annotation
component, via membership of this group, the component will execute the stored
query, passing the URI of the object whose annotations the user wishes to view.
The Data API will then rewrite the query to ensure that the user is allowed
access to the particular data in question. For instance, if Barry wants to view the
annotations for Hoa Hakananai'a, the image annotation component would pass the
parameter http://collection.britishmuseum.org/id/object/EOC3130, and the Data
API would issue to the database the query shown in Listing 1.2.</p>
      <preformat>SELECT ?label
WHERE {
  GRAPH ?g { &lt;EOC3130&gt; rdfs:label ?label } .
  GRAPH &lt;BM/permissions/&gt;
    { ?group dapi:canRead/dapi:contains ?g } .
  FILTER (?group IN
    (&lt;ldaps://researchspace.org/dn=BM,cn=read&gt;,
     &lt;ldaps://researchspace.org/dn=BM,cn=write&gt;))
}</preformat>
      <p>Listing 1.2. Rewritten SPARQL Query</p>
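      <p>The rewrite from Listing 1.1 to Listing 1.2 can be sketched in Python as follows. This is a simplified illustration, not the Data API's implementation: it assumes that the stored query is a SELECT whose WHERE clause contains only triple patterns, that all of them are to be matched within a single named graph, and that the query does not itself use the variables ?g and ?group. The function name and its arguments are hypothetical; the dapi: properties and the permissions graph name are taken from Listing 1.2.</p>
      <preformat><![CDATA[
```python
# Relative IRI of the permissions graph, as abbreviated in Listing 1.2.
PERMISSIONS_GRAPH = "BM/permissions/"


def rewrite_select(projection, triple_patterns, group_iris):
    """Wrap triple patterns in GRAPH ?g and join ?g against readable graphs."""
    if not group_iris:
        # No groups means no readable graph collections at all.
        raise PermissionError("user belongs to no LDAP group")
    body = " . ".join(triple_patterns)
    # SPARQL's IN operator takes a comma-separated expression list.
    groups = ", ".join("<" + iri + ">" for iri in group_iris)
    lines = [
        "SELECT " + projection,
        "WHERE {",
        "  GRAPH ?g { " + body + " } .",
        "  GRAPH <" + PERMISSIONS_GRAPH + ">",
        "    { ?group dapi:canRead/dapi:contains ?g } .",
        "  FILTER (?group IN (" + groups + "))",
        "}",
    ]
    return "\n".join(lines)


rewritten = rewrite_select(
    "?label",
    ["<EOC3130> rdfs:label ?label"],
    [
        "ldaps://researchspace.org/dn=BM,cn=read",
        "ldaps://researchspace.org/dn=BM,cn=write",
    ],
)
print(rewritten)
```
]]></preformat>
      <p>The size of the rewritten query grows with the number of groups the user belongs to rather than with the number of graphs, which is what keeps this approach workable with millions of per-object graphs.</p>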
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>In this paper we have presented a data API which provides for isolation of the
SPARQL interface from developers and users, with group-based access policies.
This takes inspiration from existing approaches to API construction, such as the
BBC's Linked Data Platform and Talis' discontinued Kasabi platform, but is
open source and builds on open standards. The Data API also provides
graph-based data access control in a more scalable way than existing solutions such as
Shi3ld.</p>
      <p>In future work we shall provide Web-based administration which improves on
Django's built-in administration interfaces for permissions, and which provides
graphical reporting of both monitoring of scheduled queries and of ad hoc query
and update usage logging.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>The Principal Investigator of the ResearchSpace project, Dominic Oldman,
provided support to this development at the British Museum, and provided valuable
feedback on this report.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Luca</given-names>
            <surname>Costabello</surname>
          </string-name>
          , Serena Villata, and
          <string-name>
            <given-names>Fabien</given-names>
            <surname>Gandon</surname>
          </string-name>
          .
          <article-title>Context-aware access control for rdf graph stores</article-title>
          .
          <source>In ECAI</source>
          , volume
          <volume>242</volume>
          of
          <source>Frontiers in Artificial Intelligence and Applications</source>
          , pages
          <fpage>282</fpage>
          -
          <lpage>287</lpage>
          . IOS Press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Martin</given-names>
            <surname>Doerr</surname>
          </string-name>
          .
          <article-title>The CIDOC CRM - an ontological approach to semantic interoperability of metadata</article-title>
          .
          <source>AI Magazine</source>
          ,
          <volume>24</volume>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Roy Thomas Fielding. REST:
          <article-title>Architectural Styles and the Design of Network-based Software Architectures</article-title>
          .
          <source>Doctoral dissertation</source>
          , University of California, Irvine,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. W3C Linked Data Platform Working Group.
          <article-title>Linked Data Platform 1.0</article-title>
          .
          <source>W3C Last Call Working Draft</source>
          , 11
          <year>March 2014</year>
          . Available at http://www.w3.org/TR/2014/WD-ldp-20140311/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Knud</given-names>
            <surname>Möller</surname>
          </string-name>
          and Leigh Dodds.
          <article-title>The Kasabi information marketplace</article-title>
          .
          <source>In 21st World Wide Web Conference (WWW2012)</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Network Working Group.
          <article-title>Lightweight Directory Access Protocol (LDAP): The Protocol</article-title>
          .
          <source>The Internet Society</source>
          ,
          <year>June 2006</year>
          . Available at https://tools.ietf.org/rfc/rfc4511.txt.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. W3C RDF Working Group.
          <article-title>JSON-LD 1.0: A JSON-based Serialization for Linked Data</article-title>
          .
          <source>W3C Recommendation</source>
          , 16
          <year>January 2014</year>
          . Available at http://www.w3.org/TR/2014/REC-json-ld-20140116/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. W3C RDF Working Group.
          <article-title>RDF 1.1 Turtle: Terse RDF Triple Language</article-title>
          .
          <source>W3C Recommendation</source>
          , 25
          <year>February 2014</year>
          . Available at http://www.w3.org/TR/2014/REC-turtle-20140225/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. W3C SPARQL Working Group.
          <article-title>SPARQL 1.1 Graph Store HTTP Protocol</article-title>
          .
          <source>W3C Recommendation</source>
          , 21
          <year>March 2013</year>
          . Available at http://www.w3.org/TR/2013/REC-sparql11-http-rdf-update-20130321/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. W3C SPARQL Working Group.
          <article-title>SPARQL 1.1 Query Language</article-title>
          .
          <source>W3C Recommendation</source>
          , 21
          <year>March 2013</year>
          . Available at http://www.w3.org/TR/2013/REC-sparql11-query-20130321/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>