<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop on Linked Data Quality Sept.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Validating and Describing Linked Data Portals using RDF Shape Expressions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jose Emilio Labra Gayo</string-name>
          <email>labra@uniovi.es</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harold Solbrig</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Prud'hommeaux</string-name>
          <email>eric@w3.org</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jose María Álvarez Rodríguez</string-name>
          <email>josemaria.alvarez@uc3m.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. Computer Science, Carlos III University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Mayo Clinic, College of Medicine</institution>
          ,
          <addr-line>Rochester, MN</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Oviedo, Dept. of Computer Science</institution>
          ,
          <addr-line>C/Calvo Sotelo, S/N</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>World Wide Web Consortium (W3C) MIT</institution>
          ,
          <addr-line>Cambridge, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>2</volume>
      <issue>2014</issue>
      <abstract>
        <p>In order to improve the quality of linked data portals, it is necessary to have a tool that can automatically describe and validate the RDF triples exposed. RDF Shape Expressions have been proposed as a language based on Regular Expressions that can describe and validate the structure of RDF graphs. In this paper we describe the WebIndex, a medium sized linked data portal, and how we have employed Shape Expressions to document its contents and to automatically validate the shapes of the resources.</p>
      </abstract>
      <kwd-group>
        <kwd>RDF</kwd>
        <kwd>Graphs</kwd>
        <kwd>Validation</kwd>
        <kwd>Transformation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Linked Data portals have emerged as a way to publish data on the
Web following a set of principles [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] which improve data reuse and
integration. As indicated in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], linked data relies on documents
using RDF representations to make typed statements that link
arbitrary things in the world. RDF appears as a data integration
language and some linked data applications use RDF as a database.
The syntax and semantics of Shape Expressions are designed to be
familiar to users of regular expressions (especially RelaxNG). The
main difference is that RDF data is a set (of triples) while regular
expression data is a sequence (of characters). Regular expressions
correlate an ordered pattern of atomic characters and logical
operators against an ordered sequence of characters while Shape
Expressions correlate an ordered pattern of pairs of predicate and object
classes and logical operators against an unordered set of arcs in a
graph.
      </p>
      <p>In this paper we propose the use of RDF Shape Expressions to
describe the contents of Linked Data portals in a way that can be
automatically validated.</p>
      <p>As a use case, we will describe the development of the 2013
WebIndex data portal<sup>1</sup>, which is a linked data portal of medium size
(around 3.5 million triples) that contains information about the
statistical computations that have been carried out to generate the
WebIndex. We have selected this use case because it contains a data
model with interrelated shapes and reuses several existing
vocabularies like RDF Data Cube, Organization Ontology, Dublin Core,
etc.</p>
      <p><sup>1</sup> http://data.webfoundation.org/webindex/v2013</p>
    </sec>
    <sec id="sec-2">
      <title>2. WEBINDEX DATA MODEL</title>
      <p>The WebIndex is a multi-dimensional measure of the World Wide
Web’s contribution to development and human rights globally. It
covers 81 countries and incorporates indicators that assess several
areas like universal access; freedom and openness; relevant
content; and empowerment<sup>2</sup>.</p>
      <p>The 2012 version offered a data portal where the data was obtained
by transforming raw observations and precomputed values from
Excel sheets to RDF. The 2013 version of the WebIndex data portal
employs a new validation and computation approach that tries to
obtain a verifiable linked data version of the Web Index data.</p>
      <p>The WebIndex data model is based on the RDF Data Cube
vocabulary. Figure 1 represents the main concepts of the data model<sup>3</sup>.</p>
      <p>As can be seen, the main concepts are observations of type
qb:Observation, which have a float value cex:value and are
related to a country, a year, a dataset and an indicator.</p>
      <p>A dataset contains a number of slices, each of which also contains
a number of observations.</p>
      <preformat>dataset:DITU a qb:DataSet ;
  rdfs:label "ITU Dataset" ;
  dct:publisher org:ITU ;
  qb:slice slice:ITU09B ,
           slice:ITU10B ,
           ... ;
  ...
slice:ITU09B a qb:Slice ;
  qb:sliceStructure wf:sliceByArea ;
  qb:observation obs:obs8165 ,
                 obs:obs8166 ,
                 ...
...
org:ITU a org:Organization ;
  rdfs:label "ITU" ;
  foaf:homepage &lt;http://www.itu.int/&gt;
.
country:Spain a wf:Country ;
  wf:iso2 "ES" ; wf:iso3 "ESP" ;
  rdfs:label "Spain"
.
indicator:ITU_B a wf:SecondaryIndicator ;
  rdfs:label "Broadband subscribers %"
.</preformat>
      <p>
        Indicators are provided by an organization of type org:Organization,
which employs the Organization ontology [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Datasets are also published by organizations.
      </p>
      <p>A validator of this model can validate the simple structure of each
type of resource, but it would be better if it could declare and detect
all these interrelationships.</p>
      <p>As a sample of some data, an observation can be that Spain has
value 23.78 in 2011 for the indicator ITU-B (Broadband subscribers
per 100 population) in the dataset DITU provided by ITU
(International Telecommunication Union). This information can be
represented in RDF using Turtle syntax as<sup>4</sup>:</p>
      <preformat>obs:obs8165 a qb:Observation ;
  rdfs:label "ITU B in ESP, 2011" ;
  dct:issued "2013-05-30T09:15:00"^^xsd:dateTime ;
  cex:indicator indicator:ITU_B ;
  qb:dataSet dataset:DITU ;
  cex:value "23.78"^^xsd:float ;
  cex:ref-area country:Spain ;
  cex:ref-year 2011 ;
  # ... other properties omitted for brevity
.</preformat>
      <p>Notice that the WebIndex data model contains data that is
completely interrelated. Observations are linked to indicators and datasets.
Datasets also contain links to slices, and slices in turn have links to
indicators and observations. Both datasets and indicators are linked
to the organizations that publish or provide them.</p>
      <p>The Turtle listing for dataset:DITU given earlier in this section
contains a sample of interrelated data for this domain.</p>
      <sec id="sec-2-1">
        <p><sup>2</sup> http://thewebindex.org</p>
        <p><sup>3</sup> In the paper we employ common prefixes that can be found
at http://prefix.cc</p>
        <p><sup>4</sup> The URI of the real observation is
http://data.webfoundation.org/webindex/v2013/observation/obs8165</p>
        <p>The WebIndex data model also includes a linked data
representation of computations, which makes it possible to declare the data from which
a value has been computed so that the value can be checked. We omit those
classes for brevity.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. USING SHAPE EXPRESSIONS TO DESCRIBE THE WEBINDEX DATA MODEL</title>
      <p>In this section we will describe the WebIndex data model using
Shape Expressions.</p>
      <p>The RDF Shape Expressions language is inspired by RelaxNG and
also has two syntaxes: a compact one and an RDF serialization.
The compact syntax is more oriented towards human readability
while the RDF serialization can be employed to exchange and store
shape expressions using standard semantic web tools. A primer to
the Shape Expressions language can be found at
http://www.w3.org/2013/ShEx/Primer.</p>
      <p>A shape expression is a labelled pattern for a set of RDF triples
sharing a common subject. Syntactically, it is a pairing of a label,
which can be an IRI or a blank node, and a rule enclosed in braces
({ }). Typically, this rule is a conjunction of constraints separated
by commas (,). For example, we can declare the shape of a country
as:</p>
      <preformat>&lt;Country&gt; {
  a (wf:Country)
, rdfs:label xsd:string
, wf:iso2 xsd:string
, wf:iso3 xsd:string
}</preformat>
      <p>The above declaration indicates that a country must have
rdf:type with value wf:Country. It must also have the properties
rdfs:label, wf:iso2 and wf:iso3, each with a value of type
xsd:string.</p>
      <p>The semantics of Shape Expression validation acts as a type
inference system which infers a type (shape) for a given node in an RDF
graph.</p>
      <p>With the previous declaration, a Shape Expressions validator would
infer that the node country:Spain has the shape &lt;Country&gt;.</p>
      <p>The Shape Expressions language is inspired by regular
expressions and the rules can contain cardinality constraints with the
values + (one or more), * (zero or more), ? (zero or one) and even
ranges {m,n} (between m and n repetitions).</p>
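      <p>As a rough illustration of how a conjunction of constraints with
cardinalities can be checked, the following Python sketch validates the arcs
leaving one node against a simplified counterpart of the Country shape. It is
not the algorithm used by any of the Shape Expressions implementations: the
tuple encoding of triples and the names Constraint, BOUNDS, conforms and SPAIN
are assumptions made for this example, IRIs and literals are both plain
strings, and predicates not mentioned in the shape are simply ignored.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Callable

# Cardinality markers mapped to (min, max) bounds; None means unbounded.
BOUNDS = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}


@dataclass
class Constraint:
    predicate: str
    value_ok: Callable[[object], bool]  # test applied to every object value
    card: object = ""  # "", "?", "*", "+" or an explicit (m, n) range

    def bounds(self):
        return BOUNDS[self.card] if isinstance(self.card, str) else self.card


def is_string(value):
    return isinstance(value, str)


def one_of(*allowed):
    return lambda value: value in allowed


# Simplified counterpart of the Country shape (hypothetical encoding).
COUNTRY = [
    Constraint("rdf:type", one_of("wf:Country")),
    Constraint("rdfs:label", is_string),
    Constraint("wf:iso2", is_string),
    Constraint("wf:iso3", is_string),
]


def conforms(node, triples, shape):
    """Return True if every constraint holds for the arcs leaving node."""
    for constraint in shape:
        values = [o for (s, p, o) in triples
                  if s == node and p == constraint.predicate]
        low, high = constraint.bounds()
        if len(values) < low or (high is not None and len(values) > high):
            return False
        if not all(constraint.value_ok(v) for v in values):
            return False
    return True


SPAIN = [
    ("country:Spain", "rdf:type", "wf:Country"),
    ("country:Spain", "rdfs:label", "Spain"),
    ("country:Spain", "wf:iso2", "ES"),
    ("country:Spain", "wf:iso3", "ESP"),
]
```
]]></preformat>
      <p>With these definitions, conforms("country:Spain", SPAIN, COUNTRY)
holds, while removing the wf:iso3 triple makes the check fail unless that
constraint is given the cardinality "?".</p>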
      <p>It is also possible to declare that the value of some property has a
given shape using the @ character.</p>
      <p>For example, the shape of datasets can be described as:</p>
      <preformat>&lt;DataSet&gt; {
  a (qb:DataSet)
, qb:structure (wf:DSD)
, rdfs:label xsd:string?
, qb:slice @&lt;Slice&gt;+
}</preformat>
      <p>which declares that a dataset must have rdf:type with value
qb:DataSet and qb:structure with value wf:DSD. It may have
an rdfs:label with a value of type xsd:string and must have
one or more slices with the shape &lt;Slice&gt;.</p>
      <p>In a similar way, it is possible to declare slices as:</p>
      <preformat>&lt;Slice&gt; {
  a (qb:Slice)
, qb:sliceStructure ( wf:sliceByYear )
, qb:observation @&lt;Observation&gt;+
, cex:indicator @&lt;Indicator&gt;
}</preformat>
      <p>The declarations for observations and indicators are similar:</p>
      <preformat>&lt;Observation&gt; {
  a (qb:Observation)
, cex:value xsd:float
, dct:issued xsd:dateTime
, rdfs:label xsd:string?
, qb:dataSet @&lt;DataSet&gt;
, cex:ref-area @&lt;Country&gt;
, cex:indicator @&lt;Indicator&gt;
, cex:ref-year xsd:gYear
}
&lt;Indicator&gt; {
  a ( wf:PrimaryIndicator
      wf:SecondaryIndicator
    )
, rdfs:label xsd:string
, rdfs:comment xsd:string ?
, skos:notation xsd:string ?
}</preformat>
      <p>Finally, organizations can be declared as:</p>
      <preformat>&lt;Organization&gt; {
  a ( org:Organization )
, rdfs:label xsd:string
, foaf:homepage IRI
, org:hasSubOrganization @&lt;Organization&gt;
}</preformat>
      <p>As can be seen, Shape Expressions offer an intuitive way to
describe the contents of linked data portals. In fact, we have
employed Shape Expressions to document both the WebIndex<sup>5</sup> and
Landbook<sup>6</sup> data portals. The documentation defines templates for
the different shapes of resources and for the triples that can be
retrieved when dereferencing those resources.</p>
      <sec id="sec-4-1">
        <p><sup>5</sup> http://weso.github.io/wiDoc</p>
        <p><sup>6</sup> http://weso.github.io/landportalDoc/data</p>
        <p>These templates define the dataset structure in an intuitive way and
can act as a contract between developers of the data
portal. We noted that having a good data model with its corresponding
Shape Expressions specification facilitated the communication
between the different teams involved in the development of the data
portal.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4. IMPLEMENTATIONS OF SHAPE EXPRESSIONS</title>
      <p>Currently, there are four implementations of Shape Expressions in
progress:</p>
      <p>FancyShExDemo<sup>7</sup> was the first prototype implementation in
JavaScript. It handles semantic actions which can be used to
extend the semantics of shape expressions and even to
transform RDF to XML or JSON. It supports a form-based
system with dynamic validation during the editing process and
SPARQL query generation.</p>
      <p>JSShexTest<sup>8</sup>, developed by Jesse van Dam, is another JavaScript
implementation. It supports both the SHEXc and SHEX/RDF
syntaxes of Shape Expressions and contains a validation
semantics for testing purposes based on truth tables.</p>
      <p>ShExcala<sup>9</sup>: an implementation developed in Scala with an
efficient algorithm based on derivatives of regular
expressions. It supports validation against an RDF file and
against a SPARQL endpoint. In the following section we
describe an online validation service which is implemented
on top of ShExcala.</p>
      <p>
        Haws<sup>10</sup>: a Haskell implementation based on type inference
semantics and backtracking. This implementation can be
seen as an executable monadic semantics of Shape
Expressions [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
    </sec>
    <sec id="sec-8">
      <title>5. RDFSHAPE: AN RDF SHAPE VALIDATION SERVICE</title>
      <p>RDFShape<sup>11</sup> is an online RDF Shape validation web service that
can be used to validate both the syntax and the shape of RDF data
against some schema.</p>
      <p>The online service has five types of inputs for RDF:</p>
      <p>By URI: The RDF data to be validated is downloaded from a
given URI.</p>
      <p>By File: The data is uploaded from a local file.</p>
      <p>By Input: The data is inserted in a textarea.</p>
      <p>By Endpoint: The RDF data triples are retrieved from a SPARQL
endpoint on demand. The user has to provide the URI of the endpoint.</p>
      <p>By dereference: The RDF triples are obtained by
dereferencing the URIs of the resources that will be validated and using
content negotiation to ask for RDF/XML or Turtle
representations.</p>
      <p>The RDFShape tool allows the user to specify whether to use a
Shape Expression schema or not. If not, the tool just checks that
the RDF can be parsed. Otherwise, the user can also enter a Schema
by URI, by File or by Input.</p>
      <p>Finally, it is possible to validate a specific IRI or just any IRI in the
RDF graph. Specifying an IRI is recommended when validating by
Endpoint to check the shape of a given IRI in the endpoint, and it
is mandatory when using by dereference, as it will be the IRI that
will be dereferenced to validate its representation.</p>
      <p><sup>7</sup> http://www.w3.org/2013/ShEx/FancyShExDemo</p>
      <p><sup>8</sup> https://github.com/jessevdam/shextest</p>
      <p><sup>9</sup> http://labra.github.io/ShExcala/</p>
      <p><sup>10</sup> http://labra.github.io/haws/</p>
      <p><sup>11</sup> http://rdfshape.weso.es</p>
    </sec>
    <sec id="sec-10">
      <title>6. VALIDATING LINKED DATA PORTALS USING SHAPE EXPRESSIONS</title>
      <p>RDF Shape Expressions can be used not only to describe the
contents of linked data portals, but also to validate them.</p>
      <p>We consider that one of the first steps in the development of a linked
data portal should be the Shape Expression declarations of the
different types of resources. Shape Expressions can play a similar role
to Schema declarations in XML based developments. They can act
as a contract for both the producers and consumers of linked data
portals.</p>
      <p>Notice, however, that this contract does not impose an extra limitation
on the possible consumers a linked data portal can have. Nothing
prevents having more than one set of shape expressions which
enforce different constraints. As a naïve example, the declarations
of the iso2 and iso3 codes of countries can be further constrained
using regular expressions to indicate that they must be 2 or 3
alphabetical characters, or they could be relaxed to accept
any value (not only strings). The advantage of Shape Expressions
is that they offer a declarative and intuitive language to express and
refer to those constraints.</p>
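      <p>The stricter and the more relaxed readings of the country code
constraints can be illustrated outside Shape Expressions with ordinary regular
expressions. The Python sketch below is only an analogy: the function names
are invented for this example, and the patterns (exactly 2 or 3 ASCII letters)
are one possible reading of "alphabetical characters".</p>
      <preformat><![CDATA[
```python
import re

# Assumed patterns: exactly two or exactly three ASCII letters.
ISO2 = re.compile(r"[A-Za-z]{2}")
ISO3 = re.compile(r"[A-Za-z]{3}")


def strict_iso2(value):
    """Stricter variant: a string made of exactly two letters."""
    return isinstance(value, str) and ISO2.fullmatch(value) is not None


def strict_iso3(value):
    """Stricter variant: a string made of exactly three letters."""
    return isinstance(value, str) and ISO3.fullmatch(value) is not None


def relaxed_code(value):
    """Relaxed variant: any value is accepted, not only strings."""
    return True
```
]]></preformat>
      <p>Two consumers of the same portal could apply either variant to the same
data without changing the data itself.</p>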
      <p>Shape Expression declarations can also be employed to generate
synthetic linked data in the development phase so one can perform
stress tests. For example, during the development of the WebIndex
data portal, we implemented the wiGenerator<sup>12</sup> tool, which is a
simple program that can generate random linked data that follows the
WebIndex data model with any number of indicators, years or
countries specified by the user. These fake RDF datasets can be
employed to perform stress and usability tests of the data visualization
software.</p>
      <p>ShExcala offers the possibility to validate a URI in an endpoint or
by dereferencing it (retrieving the RDF data behind that URI). The
implementation performs a generic SPARQL query to obtain all the
triples that have a given node as subject in the endpoint:</p>
      <preformat>construct { $node ?p ?y } where {
  $node ?p ?y .
}</preformat>
      <p>Once the triples are retrieved, the system validates that graph
against the Shape Expressions declarations to check the shape of that node.</p>
      <p>In this way, it is very easy to perform shape checking on the
contents of linked data portals. For example, one can retrieve all the
nodes of type qb:Observation and check that they have the
shape &lt;Observation&gt;.</p>
      <p>Notice that in general, this kind of validation is context sensitive to
a given data portal. Shape Expressions deliberately separate types
from shapes.</p>
      <p>For example, LandPortal also expresses a shape for its use of
qb:Observation. Both WebIndex and LandPortal respect the RDF
Data Cube definition of an Observation, but they can require or
prohibit different properties (from that ontology or elsewhere) on those
Observations. The observations in WebIndex have different shapes
than the observations in LandPortal, but all of them have type
qb:Observation without introducing any logical conflicts. This
reflects a different usage pattern than generally seen for OWL or
SPIN constraints (see Section 8). We consider that this difference between
structural shapes and semantic types of resources improves the
separation of concerns involved in linked data portal development.
Nevertheless, although some shapes can be specific to some linked
data portals, nothing precludes defining templates and libraries of
generic shapes that can be reused between different data portals.</p>
      <p><sup>12</sup> http://labra.github.io/wiGenerator/</p>
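      <p>The generic query given in this section uses $node as a placeholder
for the node to be validated. A minimal Python sketch of how such a query
could be instantiated for a concrete IRI follows. The helper
build_node_query is hypothetical and is not part of ShExcala; it only builds
the query text by string substitution and does not contact any endpoint. The
character check mirrors the characters that SPARQL does not allow inside an
IRI reference.</p>
      <preformat><![CDATA[
```python
def build_node_query(node_iri):
    """Return a CONSTRUCT query for all triples whose subject is node_iri."""
    # Characters that SPARQL forbids inside <...>, plus control characters
    # and the space, so the IRI cannot break out of the angle brackets.
    forbidden = set('<>"{}|^`\\')
    if any(ch in forbidden or ord(ch) <= 0x20 for ch in node_iri):
        raise ValueError("not a valid IRI: %r" % node_iri)
    subject = "<" + node_iri + ">"
    return "construct { %s ?p ?y } where { %s ?p ?y . }" % (subject, subject)
```
]]></preformat>
      <p>The resulting text can then be sent to the endpoint with any SPARQL
client, and the returned triples passed to the shape validator.</p>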
    </sec>
    <sec id="sec-11">
      <title>7. EXTENSIONS AND CHALLENGES</title>
      <p>At the time of writing, the W3C has just chartered an RDF
Data Shapes Working Group with the mission to produce a
language for defining structural constraints on RDF graphs. The Shape
Expressions language is being used as part of the working group
discussions so it is possible that some parts of the language will
change in the future.</p>
      <p>There are currently several topics and extension proposals for the
Shape Expression language that may be interesting to mention:</p>
      <p>The Shape Expression language contains other common
regular expression operators like alternatives (|), negations (!),
groupings using parentheses, etc. that can express more
complex patterns. For example, we could declare that countries
can have either wf:iso2 or wf:iso3, and that they must
not have the property dc:creator as:</p>
      <preformat>&lt;Country&gt; { a (wf:Country)
, rdfs:label xsd:string
, ( wf:iso2 xsd:string
  | wf:iso3 xsd:string
  )
, ! dc:creator .
}</preformat>
      <p>Open vs Closed shapes. An open shape is a shape expression
that validates nodes that contain the triples specified in the
shape but can also contain other triples. A closed shape only
validates nodes with those triples and no more. For example,
if we declare user shapes as:</p>
      <preformat>&lt;User&gt; { a foaf:Person }</preformat>
      <p>and we have the following triples:</p>
      <preformat>:john a foaf:Person ;
      foaf:name "John" .</preformat>
      <p>Using closed shapes, the system would not assign :john
the shape &lt;User&gt; because it contains an extra triple, while
using open shapes it would assign it that shape.</p>
      <p>It is perceived that open shapes fit better in an Open World
web, while closed shapes would be better for more controlled
environments.</p>
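      <p>The difference between the two readings can be sketched in a few
lines of Python. This is an illustration under stated assumptions, not the
semantics implemented by any validator: triples are plain tuples, the function
matches_user and the list JOHN are names invented for this example, and the
shape is hard-coded to the single constraint of the User shape.</p>
      <preformat><![CDATA[
```python
def matches_user(node, triples, closed):
    """Check the User shape (a single rdf:type foaf:Person arc) for node."""
    arcs = [(p, o) for (s, p, o) in triples if s == node]
    required = ("rdf:type", "foaf:Person")
    if required not in arcs:
        return False
    if closed:
        # A closed shape tolerates no arcs beyond the ones it declares.
        return all(arc == required for arc in arcs)
    # An open shape ignores any additional arcs.
    return True


JOHN = [
    (":john", "rdf:type", "foaf:Person"),
    (":john", "foaf:name", "John"),
]
```
]]></preformat>
      <p>Here matches_user(":john", JOHN, closed=False) accepts the node, while
the same call with closed=True rejects it because of the extra foaf:name
arc.</p>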
      <p>In this line of work, there is also a proposal to reuse shape
descriptions by including other shape declarations. For
example, one may be interested to say that providers have the
shape &lt;Organization&gt; but also contain the property
wf:sourceURI as:</p>
      <preformat>&lt;Provider&gt; &amp; &lt;Organization&gt;
  { wf:sourceURI IRI }</preformat>
      <p>Incoming edges, relations and named graphs. The current
Shape Expression language is based on describing the
subjects of an RDF graph. It would be possible to extend the
language to handle also objects and properties. For example,
we can declare reverse arcs using the operator ^ to indicate
incoming arcs. The Country declaration could be:</p>
      <preformat>&lt;Country&gt; { a (wf:Country)
, rdfs:label xsd:string
, wf:iso2 xsd:string
, wf:iso3 xsd:string
, ^ cex:ref-area @&lt;Observation&gt; *
}</preformat>
      <p>with the meaning that a country can receive (zero or more)
arcs with property cex:ref-area from nodes of shape
&lt;Observation&gt;.</p>
      <p>In the same way, it may be interesting to declare the shape
of RDF nodes that act as properties. Another extension
proposal is to describe named graphs. These two proposals are
not difficult to add, but it is not clear which syntax would be
intuitive enough for them.</p>
      <p>More expressiveness. The Shape expression language can
be extended with semantic actions to increase the
expressiveness of the language. Semantic actions are marked by
%lang { actions %} which means that the validator
can invoke a processor of the language lang with the
corresponding actions.</p>
      <p>The JavaScript implementation supports semantic actions in
JavaScript and SPARQL which can add more expressiveness
to the validation declarations. In fact, it also contains two
simple languages (GenX and GenJ) which enable an easy
way to transform RDF to both XML and JSON.</p>
      <p>Following the RelaxNG path, the Shape Expression language
can be seen as a simple Domain Specific Language which
is tailored to express the structure of RDF graphs. It is not
intended to perform strong constraint checking or validation
using computations. However, with semantic actions or shape
expression validators embedded in other tool chains that
possibility could be offered.</p>
      <p>In the same way, the interplay between Shape Expressions
and reasoners is not established. Some applications could do
inference before checking the shape of RDF graphs, while
other applications could check the shapes before invoking a
reasoner. Another possibility that could be explored is to
have some built-in way to invoke reasoning
capabilities.</p>
      <p>One of the challenges of the Shape Expressions language is the
performance of Shape checking. A naïve implementation of Shape
checking using backtracking can lead to exponential growth. We
have found that using regular expression derivatives offers an
efficient implementation and we are currently evaluating its
performance.</p>
    </sec>
    <sec id="sec-12">
      <title>8. RELATED WORK</title>
      <p>
        Improving the quality of linked data has been of increasing interest
in recent years. Sieve [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] proposed a framework for expressing
quality assessment methods as well as fusion methods.
RDFUnit[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is a test-driven framework that can run test cases against an
endpoint. In the case of RDF validation, the main approaches can
be summarized as:
      </p>
      <p>
        Inference based approaches, which try to adapt RDF Schema
or OWL to express validation semantics. The use of Open
World and Non-unique name assumption limits the
validation possibilities. In fact, what triggers constraint violations
in closed world systems leads to new inferences in standard
OWL systems. [
        <xref ref-type="bibr" rid="ref12 ref18 ref4">4, 18, 12</xref>
        ] propose the use of OWL
expressions with a Closed World Assumption to express integrity
constraints.
      </p>
      <p>
        SPARQL Inferencing Notation (SPIN)[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] constraints
associate RDF types or nodes with validation rules. These rules
are expressed as SPARQL ASK queries where true
indicates an error or CONSTRUCT queries which produce spin
:ConstraintViolations. SPIN constraints use the
expressiveness of SPARQL plus the semantics of the ?this
variable standing for the current subject and the spin:
ConstraintViolation class.
      </p>
      <p>
        SPARQL-based approaches use the SPARQL Query
Language to express the validation constraints. SPARQL has
much more expressiveness than Shape Expressions and can
even be used to validate numerical and statistical
computations [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. However, we consider that the Shape
Expressions language will be more usable by people familiar with
validation languages like RelaxNG. Nevertheless, Shape
Expressions can be translated to SPARQL queries. In fact, we
have implemented a translator from Shape Expressions to
SPARQL queries. This translator combined with semantic
actions expressed in SPARQL can offer the same
expressiveness as other SPARQL approaches with a more succinct and
intuitive syntax.
      </p>
      <p>
        There have been other proposals using SPARQL combined
with other technologies. Fürber and Hepp[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] proposed a
combination between SPARQL and SPIN as a semantic data
quality framework, Simister and Brickley[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] proposed a
combination of SPARQL queries and property paths which
is used at Google, and Kontokostas et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposed
RDFUnit, a test-driven framework which employs SPARQL query
templates that are instantiated into concrete quality test queries.
We consider that Shape Expressions can also be employed in
the same scenarios as SPARQL while the specialized
validation nature of Shape Expressions can lead to more efficient
implementations.
      </p>
      <p>
        Grammar based approaches define a domain specific
language to declare the validation rules. OSLC Resource Shapes [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
have been proposed as a high level and declarative
description of the expected contents of an RDF graph expressing
constraints on RDF terms. Shape Expressions have been
inspired by OSLC although they offer more expressive power.
Dublin Core Application Profiles [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] also define a set of
validation constraints using Description Templates with less
expressiveness than Shape Expressions.
      </p>
      <p>
        The main inspiration for Shape Expressions has been RelaxNG [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],
a Schema language for XML that offers a good trade-off between
expressiveness and validation efficiency. The semantics of
RelaxNG has also been expressed using inference rules in the
specification document [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and is based on tree grammars [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In the
case of Shape Expressions the underlying semantics can be defined
in terms of regular bag expressions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Shape Expressions are also being employed in the development of
more specialized validators. For example, the Vaskos project<sup>13</sup> is
developing a SKOS validator using a combination of Shape
Expressions and SPARQL queries.</p>
    </sec>
    <sec id="sec-13">
      <title>9. CONCLUSIONS</title>
      <p>Shape Expressions have been proposed as a Domain Specific
Language that can describe and automatically validate RDF. They
offer a more expressive way to define sets of RDF graph shapes than
OSLC’s Resource Shapes or Dublin Core’s Application Profiles.
There are trade-offs between expressiveness and implementability,
but compared to schema languages in other data models, Shape
Expressions represent a conservative point in that spectrum, emulating
mostly the expressiveness of RelaxNG.</p>
      <p>From a tooling perspective, shape expressions can be used
standalone to validate RDF graphs and endpoints, offering a dedicated
language that can be implemented efficiently and generate
specialized error messages for the concrete task of shape validation.
Given that Shape Expressions can be translated to SPARQL queries,
they can also be combined with other widely deployed
infrastructure.</p>
      <p>
        The complexity of the validation algorithms for Shape Expressions
offers some theoretical challenges related to regular bag
expressions that have been tackled in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The latest implementation of
ShExcala contained an algorithm based on derivatives of regular
expressions which greatly improved the efficiency of the validation
process.
      </p>
      <p><sup>13</sup> http://vaskos.chemaar.cloudbees.net/</p>
      <p>Although the language is new and the syntax can seem strange at
first sight, we noticed that people are able to learn the syntax and
to declare shape expressions quickly.</p>
      <p>In general we consider that the benefits of validation can help the
adoption of RDF based solutions where the quality of data is an
important issue.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked-data design issues. W3C design issue document</article-title>
          ,
          <year>June 2006</year>
          . http://www.w3.org/DesignIssues/LinkedData.html.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked data - the story so far</article-title>
          .
          <source>International Journal Semantic Web Information Systems</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Boneva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Labra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hym</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. G.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Solbrig</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Staworko</surname>
          </string-name>
          .
          <article-title>Validating RDF with Shape Expressions</article-title>
          .
          <source>ArXiv e-prints</source>
          , Apr.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Clark</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Sirin</surname>
          </string-name>
          .
          <article-title>On RDF validation, Stardog ICV, and assorted remarks</article-title>
          . In
          <source>RDF Validation Workshop. Practical Assurances for Quality RDF Data</source>
          , Cambridge, MA, Boston, September
          <year>2013</year>
          . W3C, http://www.w3.org/2012/12/rdf-val.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Coyle</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Baker</surname>
          </string-name>
          .
          <article-title>Dublin Core application profiles. Separating validation from semantics</article-title>
          . In
          <source>RDF Validation Workshop. Practical Assurances for Quality RDF Data</source>
          , Cambridge, MA, Boston, September
          <year>2013</year>
          . W3C, http://www.w3.org/2012/12/rdf-val.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fürber</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hepp</surname>
          </string-name>
          .
          <article-title>Using SPARQL and SPIN for data quality management on the semantic web</article-title>
          . In W. Abramowicz and R. Tolksdorf, editors,
          <source>Business Information Systems</source>
          , volume
          <volume>47</volume>
          of
          <series>Lecture Notes in Business Information Processing</series>
          , pages
          <fpage>35</fpage>
          -
          <lpage>46</lpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Knublauch</surname>
          </string-name>
          .
          <article-title>SPIN - Modeling Vocabulary</article-title>
          . http://www.w3.org/Submission/spin-modeling/,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Westphal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cornelissen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zaveri</surname>
          </string-name>
          .
          <article-title>Test-driven evaluation of linked data quality</article-title>
          . In
          <source>Proceedings of the 23rd International Conference on World Wide Web, WWW '14</source>
          , pages
          <fpage>747</fpage>
          -
          <lpage>758</lpage>
          , Republic and Canton of Geneva, Switzerland,
          <year>2014</year>
          . International World Wide Web Conferences Steering Committee.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Labra</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Álvarez Rodríguez</surname>
          </string-name>
          .
          <article-title>Validating statistical index data represented in RDF using SPARQL queries</article-title>
          . In
          <source>RDF Validation Workshop. Practical Assurances for Quality RDF Data</source>
          , Cambridge, MA, Boston, September
          <year>2013</year>
          . W3C, http://www.w3.org/2012/12/rdf-val.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Labra Gayo</surname>
          </string-name>
          .
          <article-title>Reusable semantic specifications of programming languages</article-title>
          . In
          <source>6th Brazilian Symposium on Programming Languages</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mühleisen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Sieve: Linked Data Quality Assessment and Fusion</article-title>
          . In
          <source>2nd International Workshop on Linked Web Data Management (LWDM 2012) at the 15th International Conference on Extending Database Technology, EDBT 2012</source>
          , March
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>Adding Integrity Constraints to OWL</article-title>
          . In C. Golbreich,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalyanpur</surname>
          </string-name>
          , and B. Parsia, editors,
          <source>OWL: Experiences and Directions 2007 (OWLED 2007)</source>
          , Innsbruck, Austria, June 6-7,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Murata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Kawaguchi</surname>
          </string-name>
          .
          <article-title>Taxonomy of XML schema languages using formal language theory</article-title>
          .
          <source>ACM Trans. Internet Technol.</source>
          ,
          <volume>5</volume>
          (
          <issue>4</issue>
          ):
          <fpage>660</fpage>
          -
          <lpage>704</lpage>
          , Nov.
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <collab>OASIS Committee Specification</collab>
          .
          <article-title>RELAX NG Specification</article-title>
          . http://relaxng.org/spec-20011203.html,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          .
          <article-title>The Organization Ontology</article-title>
          . http://www.w3.org/TR/vocab-org/,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Ryman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Le Hors</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Speicher</surname>
          </string-name>
          .
          <article-title>OSLC resource shape: A language for defining constraints on linked data</article-title>
          . In C. Bizer,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          , and S. Auer, editors,
          <source>Linked Data on the Web</source>
          , volume
          <volume>996</volume>
          of
          <series>CEUR Workshop Proceedings</series>
          . CEUR-WS.org,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Simister</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Brickley</surname>
          </string-name>
          .
          <article-title>Simple application-specific constraints for RDF models</article-title>
          . In
          <source>RDF Validation Workshop. Practical Assurances for Quality RDF Data</source>
          , Cambridge, MA, Boston, September
          <year>2013</year>
          . W3C, http://www.w3.org/2012/12/rdf-val.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sirin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          .
          <article-title>Integrity constraints in OWL</article-title>
          . In
          <source>Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI-10)</source>
          . AAAI,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>E.</given-names>
            <surname>van der Vlist</surname>
          </string-name>
          .
          <source>Relax NG: A Simpler Schema Language for XML</source>
          . O'Reilly, Beijing,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>