=Paper= {{Paper |id=Vol-1215/paper-06 |storemode=property |title=Validating and Describing Linked Data Portals using RDF Shape Expressions |pdfUrl=https://ceur-ws.org/Vol-1215/paper-06.pdf |volume=Vol-1215 |dblpUrl=https://dblp.org/rec/conf/i-semantics/GayoPSR14 }} ==Validating and Describing Linked Data Portals using RDF Shape Expressions== https://ceur-ws.org/Vol-1215/paper-06.pdf
  Validating and Describing Linked Data Portals using RDF
                    Shape Expressions

                      Jose Emilio Labra Gayo                                           Eric Prud’hommeaux
                          University of Oviedo                              World Wide Web Consortium (W3C) MIT,
                       Dept. of Computer Science                                     Cambridge, MA, USA
                          C/Calvo Sotelo, S/N                                                eric@w3.org
                            labra@uniovi.es
                             Harold Solbrig                                    Jose María Álvarez Rodríguez
                           Mayo Clinic                                                 Dept. Computer Science
             College of Medicine, Rochester, MN, USA                                     Carlos III University
                                                                                 josemaria.alvarez@uc3m.es


ABSTRACT                                                                  technology or as an interoperability layer. However, there is a lack
In order to improve the quality of linked data portals, it is necessary   of an accepted practice to declare and constrain the shape of an
to have a tool that can automatically describe and validate the RDF       RDF graph in a way that can be automatically validated to augment
triples exposed.                                                          the quality of RDF based data portals.

RDF Shape Expressions have been proposed as a language based              Validation is standard practice in other conventional data languages.
on Regular Expressions that can describe and validate the structure       Industrial settings count on parsing grammars for domain-specific
of RDF graphs.                                                            languages, DDL constraints in SQL databases, and W3C XML
                                                                          Schema or RelaxNG for XML documents.
In this paper we describe the WebIndex, a medium sized linked
data portal, and how we have employed Shape Expressions to doc-           In the case of RDF, although there are standards for inference like
ument its contents and to automatically validate the shapes of the        RDF Schema and OWL, these technologies employ Open World
resources.                                                                and Non-Unique Name Assumptions that create difficulties for val-
                                                                          idation purposes [18].
Categories and Subject Descriptors
I.2.4 [Knowledge Representation Formalisms and Methods]: Rep-             The RDF Shape Expressions language is intended to perform the
resentation languages; H.3.5 [Online Information Services]: Web-          same function for RDF graphs as Schema languages to XML. It
based services                                                            can be used to validate documents, communicate expected graph
                                                                          patterns for interfaces, and generate user interface forms and code.
General Terms                                                             The syntax and semantics of Shape Expressions are designed to be
Theory                                                                    familiar to users of regular expressions (especially RelaxNG). The
                                                                          main difference is that RDF data is a set (of triples) while regular
Keywords                                                                  expression data is a sequence (of characters). Regular expressions
RDF, Graphs, Validation, Transformation                                   correlate an ordered pattern of atomic characters and logical opera-
                                                                          tors against an ordered sequence of characters while Shape Expres-
1.    INTRODUCTION                                                        sions correlate an ordered pattern of pairs of predicate and object
Linked Data portals have emerged as a way to publish data on the          classes and logical operators against an unordered set of arcs in a
Web following a set of principles [1] which improve data reuse and        graph.
integration. As indicated in [2], linked data relies on documents
using RDF representations to make typed statements that link ar-          In this paper we propose the use of RDF Shape Expressions to de-
bitrary things in the world. RDF appears as a data integration lan-       scribe the contents of Linked Data portals in a way that can be
guage and some linked data applications use RDF as a database             automatically validated.

                                                                          As a use case, we will describe the development of the 2013 We-
                                                                          bIndex data portal1 , which is a linked data portal of medium size
                                                                          (around 3.5 million triples) that contains information about the
                                                                          statistical computations that have been carried out to generate the
                                                                          WebIndex. We have selected this use case because it contains a data
                                                                          model with interrelated shapes and reuses several existing vocab-
Copyright is held by the author/owner(s).
LDQ 2014, 1st Workshop on Linked Data Quality Sept. 2, 2014, Leipzig,     1
Germany.                                                                    http://data.webfoundation.org/webindex/
                                                                          v2013
ularies like RDF Data Cube, Organization Ontology, Dublin Core,
etc.                                                                        dataset:DITU a qb:DataSet ;
                                                                             rdfs:label "ITU Dataset" ;
                                                                             dct:publisher org:ITU ;
2.    WEBINDEX DATA MODEL                                                    qb:slice slice:ITU09B ,
The WebIndex is a multi-dimensional measure of the World Wide                         slice:ITU10B,
Web’s contribution to development and human rights globally. It                       ...;
covers 81 countries and incorporates indicators that assess several          ...
areas like universal access; freedom and openness; relevant con-            slice:ITU09B a qb:Slice ;
tent; and empowerment2 .                                                     qb:sliceStructure wf:sliceByArea ;
                                                                             qb:observation obs:obs8165,
The 2012 version offered a data portal where the data was obtained                           obs:obs8166,
by transforming raw observations and precomputed values from                                 ...
Excel sheets to RDF. The 2013 version of the WebIndex data portal            ...
employs a new validation and computation approach that tries to             org:ITU a org:Organization ;
obtain a verifiable linked data version of the Web Index data.               rdfs:label "ITU" ;
                                                                             foaf:homepage 
The WebIndex data model is based on the RDF Data Cube vocab-                 .
ulary. Figure 1 represents the main concepts of the data model3 .           country:Spain a wf:Country ;
                                                                             wf:iso2 "ES" ; wf:iso3 "ESP" ;
As can be seen, the main concepts are observations of type qb:               rdfs:label "Spain"
Observation which have a float value cex:value and are re-                   .
lated to a country, a year, a dataset and an indicator.                     indicator:ITU_B a wf:SecondaryIndicator ;
                                                                             rdfs:label "Broadband subscribers %"
A dataset contains a number of slices, each of which also contains           .
a number of observations.

Indicators are provided by an organization of type org:Organization         A validator of this model can validate the simple structure of each
which employs the Organization ontology [15]. Datasets are also             type of resource but it would be better if it could declare and detect
published by organizations.                                                 all these interrelationships.

As a sample of some data, an observation can be that Spain has              The WebIndex data model also includes a linked data representa-
value 23.78 in 2011 for the indicator ITU-B (Broadband subscribers          tion of computations so it is possible to declare the data from which
per 100 population) in the dataset DITU provided by ITU (Interna-           a value has been computed so it can be checked. We omit those
tional Telecommunication Union). This information can be repre-             classes for brevity.
sented in RDF using Turtle syntax as4 :

obs:obs8165 a qb:Observation ;                                              3.    USING SHAPE EXPRESSIONS TO DE-
 rdfs:label "ITU B in ESP, 2011" ;                                                SCRIBE THE WEBINDEX DATA MODEL
 dct:issued                                                                 In this section we will describe the WebIndex data model using
   "2013-05-30T09:15:00"^^xsd:dateTime ;                                    Shape Expressions.
 cex:indicator indicator:ITU_B ;
 qb:dataSet dataset:DITU ;                                                  The RDF Shape Expressions language is inspired by RelaxNG and
 cex:value "23.78"^^xsd:float ;                                             also has two syntaxes: a compact one and an RDF serialization.
 cex:ref-area country:Spain ;                                               The compact syntax is more oriented towards human readability
 cex:ref-year 2011 ;                                                        while the RDF serialization can be employed to exchange and store
 ...other properties omitted for brevity                                    shape expressions using standard semantic web tools. A primer to
 .                                                                          the Shape Expressions language can be found at http://www.
                                                                            w3.org/2013/ShEx/Primer.

Notice that the WebIndex data model contains data that is completely        A shape expression is a labelled pattern for a set of RDF Triples
interrelated. Observations are linked to indicators and datasets.           sharing a common subject. Syntactically, it is a pairing of a label,
Datasets also contain links to slices, and slices in turn link to           which can be an IRI or a blank node, and a rule enclosed in brackets
indicators and observations. Both datasets and indicators are linked        ({ }). Typically, this rule is a conjunction of constraints separated
to the organizations that publish or provide them.                          by commas (,). For example, we can declare the shape of a country
                                                                            as:
The following example contains a sample of interrelated data for
this domain.                                                                 {
2
                                                                              a (wf:Country)
  http://thewebindex.org                                                    , rdfs:label xsd:string
3
  In the paper we will employ common prefixes that can be found             , wf:iso2 xsd:string
in http://prefix.cc
4                                                                           , wf:iso3 xsd:string
  The URI of the real observation is http://
data.webfoundation.org/webindex/v2013/                                      }
observation/obs8165
                                                   Figure 1: Simplified WebIndex data model


The above declaration indicates that a country must have rdf:                 }
type with value wf:Country. It must also have the properties
rdfs:label, wf:iso2 and wf:iso3 with a value of type xsd
:string.                                                                 The declarations for observations and indicators are similar:

The semantics of Shape Expression validation acts as a type infer-        {
ence system which infers a type (shape) for a given node in an RDF          a (qb:Observation)
graph.                                                                    , cex:value xsd:float
                                                                          , dct:issued xsd:dateTime
With the previous declaration, a Shape Expressions validator would        , rdfs:label xsd:string?
infer:                                                                    , qb:dataSet @
                                                                          , cex:ref-area @
 country:Spain                                                   , cex:indicator @
                                                                          , cex:ref-year xsd:gYear
The Shape Expressions language is inspired by Regular                     }
Expressions and the rules can contain cardinality constraints with
the values + (one or more), * (zero or more), ? (zero or one) and         {
even ranges {m,n} (between m and n repetitions).                            a ( wf:PrimaryIndicator
                                                                                wf:SecondaryIndicator
It is also possible to declare that the value of some property has a          )
given shape using the @ character.                                        , rdfs:label xsd:string
                                                                          , rdfs:comment xsd:string ?
For example, the shape of datasets can be described as:                   , skos:notation xsd:string ?
                                                                          }
 {
   a (qb:DataSet)
 , qb:structure (wf:DSD)
                                                                         Finally, organizations can be declared as:
 , rdfs:label xsd:string?
 , qb:slice @+                                                     {
 }                                                                          a ( org:Organization )
                                                                          , rdfs:label xsd:string
                                                                          , foaf:homepage IRI
which declares that a dataset must have rdf:type with value qb:           , org:hasSubOrganization
DataSet and qb:structure with value wf:DSD. It may have                           @
an rdfs:label with a value of type xsd:string and must have               }
one or more slices with the shape Slice.
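
The cardinality markers described above can be read as bounds on how many times a predicate occurs on a node. The following minimal Python sketch illustrates that reading for the dataset shape; it is not how any of the Shape Expressions implementations work. The tuple encoding of triples, the DATASET_SHAPE table, the SAMPLE data and the cardinality_errors function are assumptions made for illustration. The sketch checks only predicate counts (not value types or referenced shapes) and ignores predicates that the shape does not mention.

```python
from collections import Counter

# Hypothetical encoding of the dataset shape: predicate -> (min, max).
# A max of None means "unbounded".
DATASET_SHAPE = {
    "rdf:type": (1, 1),      # a (qb:DataSet)
    "qb:structure": (1, 1),  # qb:structure (wf:DSD)
    "rdfs:label": (0, 1),    # rdfs:label xsd:string?
    "qb:slice": (1, None),   # one or more slices (+)
}

# Illustrative triples as (subject, predicate, object) tuples.
SAMPLE = [
    ("dataset:DITU", "rdf:type", "qb:DataSet"),
    ("dataset:DITU", "qb:structure", "wf:DSD"),
    ("dataset:DITU", "dct:publisher", "org:ITU"),
    ("dataset:DITU", "qb:slice", "slice:ITU09B"),
    ("dataset:DITU", "qb:slice", "slice:ITU10B"),
]


def cardinality_errors(triples, node, shape):
    """Return one message per predicate of `node` whose count is out of range."""
    counts = Counter(p for s, p, o in triples if s == node)
    errors = []
    for predicate, (low, high) in shape.items():
        found = counts[predicate]
        if found < low or (high is not None and found > high):
            upper = "*" if high is None else high
            errors.append(f"{predicate}: found {found}, expected {low}..{upper}")
    return errors


print(cardinality_errors(SAMPLE, "dataset:DITU", DATASET_SHAPE))  # prints []
```

Removing both qb:slice triples from SAMPLE would produce a single message for qb:slice, since the + marker requires at least one occurrence.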

In a similar way, it is possible to declare slices as:                   As can be seen, Shape Expressions offer an intuitive way to de-
                                                                         scribe the contents of linked data portals. In fact, we have em-
 {                                                                ployed Shape Expressions to document both the WebIndex5 and
   a (qb:Slice)                                                          Landbook6 data portals. The documentation defines templates for
 , qb:sliceStructure ( wf:sliceByYear )
                                                                          5
 , qb:observation @+                                             http://weso.github.io/wiDoc
                                                                          6
 , cex:indicator @                                                 http://weso.github.io/landportalDoc/data
the different shapes of resources and for the triples that can be re-         • By dereference: The RDF triples are obtained by dereferenc-
trieved when dereferencing those resources.                                     ing the URIs of the resources that will be validated and using
                                                                                content negotiation to ask for RDF/XML or Turtle represen-
These templates define the dataset structure in an intuitive way and            tations.
can be used to act as a contract between developers of the data por-
tal. We noted that having a good data model with its corresponding
Shape Expressions specification facilitated the communication be-       The RDFShape tool allows the user to specify whether to use a
tween the different teams involved in the development of the data       Shape Expression schema or not. If not, the tool just checks that
portal.                                                                 the RDF can be parsed. Otherwise, the user can also enter a Schema
                                                                        by URI, by File or by Input.
4.     IMPLEMENTATIONS OF SHAPE EXPRES-
                                                                        Finally, it is possible to validate a specific IRI or just any IRI in the
       SIONS                                                            RDF graph. Specifying an IRI is recommended when validating by
Currently, there are four implementations of Shape Expressions in       Endpoint to check the shape of a given IRI in the endpoint and it
progress:                                                               is mandatory when using by dereference, as it will be the IRI that
                                                                        will be dereferenced to validate its representation.
     • FancyShExDemo7 was the first prototype implementation in
                                                                        Figure 2 contains a screen capture of the RDFShape validation tool.
       Javascript. It handles semantic actions which can be used to
       extend the semantics of shape expressions and even to trans-
       form RDF to XML or JSON. It supports a form-based sys-           6.      VALIDATING LINKED DATA PORTALS
       tem with dynamic validation during the edition process and               USING SHAPE EXPRESSIONS
       SPARQL queries generation.                                       RDF Shape Expressions can be used not only to describe the con-
                  8
     • JSShexTest , developed by Jesse van Dam is another Javascript    tents of linked data portals, but also to validate them.
       implementation. It both supports the SHEXc and SHEX/RDF
       syntax of Shape Expressions and contains a validation se-        We consider that one of the first steps in the development of a linked
       mantics for testing purposes based on truth tables.              data portal should be the Shape Expression declarations of the dif-
                                                                        ferent types of resources. Shape Expressions can play a similar role
     • Shexcala9 : an implementation developed in Scala with an         to Schema declarations in XML based developments. They can act
       efficient implementation based on derivatives of regular ex-     as a contract for both the producers and consumers of linked data
       pressions. It supports validation against an RDF file and        portals.
       against a SPARQL endpoint. In the following section we
       describe an online validation service which is implemented       Notice, however, that this contract does not impose an additional limitation
       on top of ShExcala.                                              on the possible consumers a linked data portal can have. Nothing
                                                                        prevents defining several sets of shape expressions which
     • Haws10 : a Haskell implementation based on type inference        enforce different constraints. As a naïve example, the declarations
       semantics and backtracking. This implementation can be           of the iso2 and iso3 code of Countries can be further constrained
       seen as an executable monadic semantics of Shape Expres-         using regular expressions to indicate that they must be 2 or 3 alpha-
       sions [10].                                                      betical characters or could be more relaxed saying that it may be
                                                                        any value (not only strings). The advantage of Shape Expressions
5.     RDFSHAPE: AN RDF SHAPE VALIDA-                                   is that they offer a declarative and intuitive language to express and
       TION SERVICE                                                     refer to those constraints.
RDFShape11 is an online RDF Shape validation web service that
can be used to validate both the syntax and the shape of RDF data       Shape Expression declarations can also be employed to generate
against some schema.                                                    synthetic linked data in the development phase so one can perform
                                                                        stress tests. For example, during the development of the WebIndex
The online service has five types of inputs for RDF:                    data portal, we implemented the wiGenerator12 tool which is a sim-
                                                                        ple program that can generate random linked data that follows the
                                                                        WebIndex data model with any number of indicators, years or
     • By URI: The RDF data to be validated is downloaded from a        countries specified by the user. These fake RDF datasets can be
       given URI                                                        employed to perform stress and usability tests of the data visualization
                                                                        software.
     • By File: The data is uploaded from a local file
                                                                        Shexcala offers the possibility to validate a URI in an endpoint or
     • By Input: The data is inserted in a textarea
                                                                        by dereferencing it (retrieving the RDF data behind that URI). The
     • By Endpoint: The RDF data triples are retrieved from a SPARQL    implementation performs a generic SPARQL query to obtain all the
       endpoint on demand. The user has to provide the URI of the       triples that have a given node as subject in the endpoint:
       endpoint.
                                                                        construct { $node ?p ?y } where {
7
   http://www.w3.org/2013/ShEx/FancyShExDemo                              $node ?p ?y .
 8
   https://github.com/jessevdam/shextest                                }
 9
   http://labra.github.io/ShExcala/
10
   http://labra.github.io/haws/
11                                                                      12
   http://rdfshape.weso.es                                                   http://labra.github.io/wiGenerator/
                                                 Figure 2: Screen capture of RDFShape tool


Once the triples are retrieved, the system validates that graph           data portals, nothing precludes defining templates and libraries of
against the Shape Expressions declarations to check the node's shape.     generic shapes that can be reused between different data portals.
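
The per-node retrieval step can be sketched as follows. This Python fragment only builds the generic query for one node and a SPARQL Protocol GET URL that carries it; it does not reproduce the code of any implementation mentioned in this paper. The function names node_query and request_url, the IRI check and the example endpoint address are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Characters that may not appear between < and > in a SPARQL IRI reference.
_FORBIDDEN = set('<>"{}|^`\\')


def node_query(node_iri):
    """Return the generic CONSTRUCT query with $node replaced by <node_iri>."""
    bad = any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in node_iri)
    if not node_iri or bad:
        raise ValueError(f"not a plain IRI: {node_iri!r}")
    term = f"<{node_iri}>"
    return f"construct {{ {term} ?p ?y }} where {{ {term} ?p ?y . }}"


def request_url(endpoint, node_iri):
    """Return a GET URL that sends the query in the `query` parameter.

    Assumes `endpoint` has no query string of its own.
    """
    return endpoint + "?" + urlencode({"query": node_query(node_iri)})


# Hypothetical endpoint and node, used only to show the call.
print(request_url("http://example.org/sparql", "http://example.org/obs1"))
```

Rejecting characters that cannot occur in an IRI reference keeps a malformed node value from altering the query text; the triples returned by the endpoint would then be handed to the validator.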

In this way, it is very easy to perform shape checking on the             7.     EXTENSIONS AND CHALLENGES
contents of linked data portals. For example, one can retrieve all the    At the time of this writing, the W3C has just chartered an RDF
nodes of type qb:Observation and check that they have the                 Data Shapes Working Group with the mission to produce a lan-
shape .                                                      guage for defining structural constraints on RDF graphs. The Shape
                                                                          Expressions language is being used as part of the working group
Notice that in general, this kind of validation is context sensitive to   discussions so it is possible that some parts of the language will
a given data portal. Shape Expressions deliberately separates types       change in the future.
from shapes.
                                                                          There are currently several topics and extension proposals for the
For example, LandPortal also expresses a shape for its use of qb:         Shape Expression language that may be interesting to mention:
Observation. Both WebIndex and LandPortal respect the RDF
data Cube definition of an Observation, but they can require or pro-
hibit different properties (from that ontology or elsewhere) on those          • The Shape Expression language contains other common reg-
Observations. The observations in WebIndex have different shapes                 ular expression operators like alternatives (|), negations (!),
than the observations in LandPortal, but all of them have type qb                groupings using parenthesis, etc. that can express more com-
:Observation without introducing any logical conflicts. This                     plex patterns. For example, we could declare that Countries
reflects a different usage pattern than generally seen for OWL or                can have either wf:iso2 or wf:iso3, and that they must
SPIN constraints (see Section 8). We consider that this difference between       not have the property dc:creator as:
structural shapes and semantic types of resources improves the sep-
aration of concerns involved in linked data portal development.                   { a (wf:Country)
                                                                                 , rdfs:label xsd:string
Nevertheless, although some shapes can be specific to some linked                , ( wf:iso2 xsd:string
    | wf:iso3 xsd:string                                                   The Javascript implementation supports semantic actions in
    )                                                                      Javascript and SPARQL which can add more expressiveness
  , ! dc:creator .                                                         to the validation declarations. In fact, it also contains two
  }                                                                        simple languages (GenX and GenJ) which enable an easy
                                                                           way to transform RDF to both XML and JSON.
• Open vs Closed shapes. An open shape is a shape expression               Following the RelaxNG path, the Shape Expression language
  that validates nodes that contain the triples specified in the           can be seen as a simple Domain Specific Language which
  shape but can also contain other triples. A closed shape only            is tailored to express the structure of RDF graphs. It is not
  validates nodes with those triples and no more. For example,             intended to perform strong constraint checking or validation
  if we declare users shapes as:                                           using computations. However, with semantic actions or shape
                                                                           expression validators embedded in other tool chains that pos-
   { a foaf:Person }                                                 sibility could be offered.
                                                                           In the same way, the interplay between Shape Expressions
  and we have the following triples:                                       and reasoners is not established. Some applications could do
                                                                           inference before checking the shape of RDF graphs, while
  :john a foaf:Person ;
                                                                           other applications could check the shapes before invoking a
    foaf:name "John" .
                                                                           reasoner. Another possibility that could be explored is to
                                                                           have some built-in way that could invoke reasoning capabil-
  Using closed shapes, the system would not assign :john                   ities.
  the shape  because it contains an extra triple, while
  using open shapes it would assign it that shape.
  It is perceived that open shapes fit better in an Open World      One of the challenges of the Shape Expressions language is the
  web, while closed shapes would be better for more controlled      performance of Shape checking. A naïve implementation of Shape
  environments.                                                     checking using backtracking can lead to exponential growth. We
                                                                    have found that using regular expression derivatives offers an ef-
  In this line of work, there is also a proposal to reuse shape
  descriptions by including other shape declarations. For example,
  one may want to say that providers have the shape [...] but also
  contain the property wf:sourceURI as:

   &
    { wf:sourceURI IRI }

• Incoming edges, relations and named graphs. The current Shape
  Expression language is based on describing the subjects of an RDF
  graph. It would be possible to extend the language to also handle
  objects and properties. For example, we can declare reverse arcs
  using the operator ^ to indicate incoming arcs. The Country
  declaration could be:

   { a (wf:Country)
   , rdfs:label xsd:string
   , wf:iso2 xsd:string
   , wf:iso3 xsd:string
   , ^ cex:ref-area @ *
   }

  with the meaning that a country can receive (zero or more) arcs
  with property cex:ref-area of shape [...]. In the same way, it may
  be interesting to declare the shape of RDF nodes that act as
  properties. Another extension proposal is to describe named
  graphs. These two proposals are not difficult to add, but it is
  not clear which syntax would be intuitive enough for them.

• More expressiveness. The Shape Expression language can be
  extended with semantic actions to increase the expressiveness of
  the language. Semantic actions are marked by %lang { actions %},
  which means that the validator can invoke a processor of the
  language lang with the corresponding actions.

ficient implementation and we are currently evaluating its
performance.

8. RELATED WORK
Improving the quality of linked data has been of increasing interest
in recent years. Sieve [11] proposed a framework for expressing
quality assessment methods as well as fusion methods. RDFUnit [8] is
a test-driven framework that can run test cases against an endpoint.
In the case of RDF validation, the main approaches can be summarized
as:

• Inference based approaches, which try to adapt RDF Schema or OWL
  to express validation semantics. The use of the Open World and
  Non-Unique Name assumptions limits the validation possibilities.
  In fact, what triggers constraint violations in closed world
  systems leads to new inferences in standard OWL systems.
  [4, 18, 12] propose the use of OWL expressions with a Closed World
  Assumption to express integrity constraints.

• SPARQL Inferencing Notation (SPIN) [7] constraints associate RDF
  types or nodes with validation rules. These rules are expressed as
  SPARQL ASK queries where true indi[...] :ConstraintViolations.
  SPIN constraints use the expressiveness of SPARQL plus the
  semantics of the ?this variable standing for the current subject
  and the spin:ConstraintViolation class.

• SPARQL-based approaches use the SPARQL Query Language to express
  the validation constraints. SPARQL has much more expressiveness
  than Shape Expressions and can even be used to validate numerical
  and statistical computations [9]. However, we consider that the
  Shape Expressions language will be more usable by people familiar
  with validation languages like RelaxNG. Nevertheless, Shape
  Expressions can be translated to SPARQL queries. In fact, we
  have implemented a translator from Shape Expressions to SPARQL
  queries. This translator combined with semantic actions expressed
  in SPARQL can offer the same expressiveness as other SPARQL
  approaches with a more succinct and intuitive syntax.

  There have been other proposals using SPARQL combined with other
  technologies. Fürber and Hepp [6] proposed a combination of SPARQL
  and SPIN as a semantic data quality framework, Simister and
  Brickley [17] propose a combination of SPARQL queries and property
  paths which is used in Google, and Kontokostas et al. [8] proposed
  RDFUnit, a test-driven framework which employs SPARQL query
  templates that are instantiated into concrete quality test
  queries. We consider that Shape Expressions can also be employed
  in the same scenarios as SPARQL, while the specialized validation
  nature of Shape Expressions can lead to more efficient
  implementations.

• Grammar based approaches define a domain specific language to
  declare the validation rules. OSLC Resource Shapes [16] have been
  proposed as a high level and declarative description of the
  expected contents of an RDF graph expressing constraints on RDF
  terms. Shape Expressions have been inspired by OSLC although they
  offer more expressive power. Dublin Core Application Profiles [5]
  also define a set of validation constraints using Description
  Templates with less expressiveness than Shape Expressions.

The main inspiration for Shape Expressions has been RelaxNG [19], a
schema language for XML that offers a good trade-off between
expressiveness and validation efficiency. The semantics of RelaxNG
has also been expressed using inference rules in the specification
document [14] and is based on tree grammars [13]. In the case of
Shape Expressions the underlying semantics can be defined in terms
of regular bag expressions [3].

Shape Expressions are also being employed in the development of more
specialized validators. For example, the Vaskos project13 is
developing a SKOS validator using a combination of Shape Expressions
and SPARQL queries.

13 http://vaskos.chemaar.cloudbees.net/

9. CONCLUSIONS
Shape Expressions have been proposed as a Domain Specific Language
that can describe and automatically validate RDF. They offer a more
expressive way to define sets of RDF graph shapes than OSLC’s
Resource Shapes or Dublin Core’s Application Profiles. There are
trade-offs between expressiveness and implementability, but compared
to schema languages in other data models, Shape Expressions
represent a conservative point in that spectrum, emulating mostly
the expressiveness of RelaxNG.

From a tooling perspective, shape expressions can be used standalone
to validate RDF graphs and endpoints, offering a dedicated language
that can be implemented efficiently and generate specialized error
messages for the concrete task of shape validation.

Given that Shape Expressions can be translated to SPARQL queries,
they can also be combined with other widely deployed infrastructure.

The complexity of the validation algorithms for Shape Expressions
offers some theoretical challenges related to regular bag
expressions that have been tackled in [3]. The latest implementation
of Shexcala includes an algorithm based on derivatives of regular
expressions, which greatly improved the efficiency of the validation
process.

Although the language is new and the syntax can seem strange at
first sight, we noticed that people are able to learn the syntax and
to declare shape expressions quickly.

In general we consider that the benefits of validation can help the
adoption of RDF based solutions where the quality of data is an
important issue.

10. REFERENCES
[1] T. Berners-Lee. Linked-data design issues. W3C design issue
     document, June 2006.
     http://www.w3.org/DesignIssue/LinkedData.html.
[2] C. Bizer, T. Heath, and T. Berners-Lee. Linked data - the
     story so far. International Journal Semantic Web Information
     Systems, 5(3):1–22, 2009.
[3] I. Boneva, J. E. Labra, S. Hym, E. G. Prud’hommeau,
     H. Solbrig, and S. Staworko. Validating RDF with Shape
     Expressions. ArXiv e-prints, Apr. 2014.
[4] K. Clark and E. Sirin. On RDF validation, stardog ICV, and
     assorted remarks. In RDF Validation Workshop. Practical
     Assurances for Quality RDF Data, Cambridge, Ma, Boston,
     September 2013. W3c,
     http://www.w3.org/2012/12/rdf-val.
[5] K. Coyle and T. Baker. Dublin core application profiles.
     separating validation from semantics. In RDF Validation
     Workshop. Practical Assurances for Quality RDF Data,
     Cambridge, Ma, Boston, September 2013. W3c,
     http://www.w3.org/2012/12/rdf-val.
[6] C. Fürber and M. Hepp. Using sparql and spin for data
     quality management on the semantic web. In
     W. Abramowicz and R. Tolksdorf, editors, Business
     Information Systems, volume 47 of Lecture Notes in Business
     Information Processing, pages 35–46. Springer, 2010.
[7] H. Knublauch. SPIN - Modeling Vocabulary.
     http://www.w3.org/Submission/spin-modeling/, 2011.
[8] D. Kontokostas, P. Westphal, S. Auer, S. Hellmann,
     J. Lehmann, R. Cornelissen, and A. Zaveri. Test-driven
     evaluation of linked data quality. In Proceedings of the 23rd
     International Conference on World Wide Web, WWW ’14,
     pages 747–758, Republic and Canton of Geneva,
     Switzerland, 2014. International World Wide Web
     Conferences Steering Committee.
[9] J. E. Labra and J. M. Alvarez Rodríguez. Validating
     statistical index data represented in RDF using SPARQL
     queries. In RDF Validation Workshop. Practical Assurances
     for Quality RDF Data, Cambridge, Ma, Boston, September
     2013. W3c,
     http://www.w3.org/2012/12/rdf-val.
[10] J. E. Labra Gayo. Reusable semantic specifications of
     programming languages. In 6th Brazilian Symposium on
     Programming Languages, 2002.
[11] P. N. Mendes, H. Mühleisen, and C. Bizer. Sieve: Linked
     Data Quality Assessment and Fusion. In 2nd International
     Workshop on Linked Web Data Management (LWDM 2012)
     at the 15th International Conference on Extending Database
     Technology, EDBT 2012, March 2012.
[12] B. Motik, I. Horrocks, and U. Sattler. Adding Integrity
     Constraints to OWL. In C. Golbreich, A. Kalyanpur, and
     B. Parsia, editors, OWL: Experiences and Directions 2007
     (OWLED 2007), Innsbruck, Austria, June 6–7 2007.
[13] M. Murata, D. Lee, M. Mani, and K. Kawaguchi. Taxonomy
     of xml schema languages using formal language theory.
     ACM Trans. Internet Technol., 5(4):660–704, Nov. 2005.
[14] OASIS Committee Specification. RELAX NG Specification.
     http://relaxng.org/spec-20011203.html, 2001.
[15] D. Reynolds. The Organization Ontology.
     http://www.w3.org/TR/vocab-org/, 2014.
[16] A. G. Ryman, A. L. Hors, and S. Speicher. OSLC resource
     shape: A language for defining constraints on linked data. In
     C. Bizer, T. Heath, T. Berners-Lee, M. Hausenblas, and
     S. Auer, editors, Linked data on the Web, volume 996 of
     CEUR Workshop Proceedings. CEUR-WS.org, 2013.
[17] S. Simister and D. Brickley. Simple application-specific
     constraints for rdf models. In RDF Validation Workshop.
     Practical Assurances for Quality RDF Data, Cambridge,
     Ma, Boston, September 2013. W3c,
     http://www.w3.org/2012/12/rdf-val.
[18] J. Tao, E. Sirin, J. Bao, and D. L. McGuinness. Integrity
     constraints in OWL. In Proceedings of the 24th AAAI
     Conference on Artificial Intelligence (AAAI-10). AAAI,
     2010.
[19] E. van der Vlist. Relax NG: A Simpler Schema Language for
     XML. O’Reilly, Beijing, 2004.
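
The conclusions mention that validation can be based on derivatives of regular expressions. The following Python fragment is a minimal sketch of that general idea, under strong simplifying assumptions: it is not the Shexcala algorithm, it only looks at the outgoing arcs of a single node, and it treats a shape as closed. All names (Fail, Empty, Arc, And, Or, Star, derive, matches, and the example shapes) are hypothetical and chosen only for illustration. The derivative of an expression with respect to one arc is the expression that the remaining arcs must still satisfy; because arcs form a bag, the derivative of an unordered combination tries to consume the arc on either side.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

Edge = Tuple[str, object]  # (predicate, object) of one outgoing arc of the node


class Expr:
    """Base class of the simplified (hypothetical) shape expressions."""


@dataclass(frozen=True)
class Fail(Expr):
    """Matches no bag of edges."""


@dataclass(frozen=True)
class Empty(Expr):
    """Matches only the empty bag of edges."""


@dataclass(frozen=True)
class Arc(Expr):
    """Matches exactly one edge with this predicate and an accepted object."""
    predicate: str
    accepts: Callable[[object], bool]


@dataclass(frozen=True)
class And(Expr):
    """Unordered combination: the edges of both sides, in any order."""
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Or(Expr):
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Star(Expr):
    inner: Expr


def mk_and(a: Expr, b: Expr) -> Expr:
    # Simplify while building, so a dead branch collapses to Fail.
    if isinstance(a, Fail) or isinstance(b, Fail):
        return Fail()
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return And(a, b)


def mk_or(a: Expr, b: Expr) -> Expr:
    if isinstance(a, Fail):
        return b
    if isinstance(b, Fail):
        return a
    return Or(a, b)


def nullable(e: Expr) -> bool:
    """True if e accepts the empty bag (nothing more is required)."""
    if isinstance(e, (Empty, Star)):
        return True
    if isinstance(e, And):
        return nullable(e.left) and nullable(e.right)
    if isinstance(e, Or):
        return nullable(e.left) or nullable(e.right)
    return False  # Fail and Arc


def derive(e: Expr, edge: Edge) -> Expr:
    """Expression that the remaining edges must match after consuming edge."""
    pred, obj = edge
    if isinstance(e, Arc):
        return Empty() if e.predicate == pred and e.accepts(obj) else Fail()
    if isinstance(e, And):
        # The edge may be consumed by either side of the unordered combination.
        return mk_or(mk_and(derive(e.left, edge), e.right),
                     mk_and(e.left, derive(e.right, edge)))
    if isinstance(e, Or):
        return mk_or(derive(e.left, edge), derive(e.right, edge))
    if isinstance(e, Star):
        return mk_and(derive(e.inner, edge), e)
    return Fail()  # Empty and Fail cannot consume an edge


def matches(e: Expr, edges: Iterable[Edge]) -> bool:
    for edge in edges:
        e = derive(e, edge)
        if isinstance(e, Fail):
            return False  # fail early: no continuation can succeed
    return nullable(e)


def is_string(value: object) -> bool:
    return isinstance(value, str)


# Hypothetical shapes loosely modelled on the Country and provider examples.
country = And(Arc("rdfs:label", is_string),
              And(Arc("wf:iso2", is_string), Arc("wf:iso3", is_string)))
provider = And(Arc("rdfs:label", is_string), Star(Arc("wf:sourceURI", is_string)))

print(matches(country, [("wf:iso3", "ESP"), ("rdfs:label", "Spain"), ("wf:iso2", "ES")]))  # True
print(matches(country, [("rdfs:label", "Spain"), ("wf:iso2", "ES")]))  # False
```

In this sketch the order in which the arcs are consumed does not affect the result, which reflects the bag semantics of the expressions. A real validator would also need shape references, cardinalities, value sets and reverse arcs, none of which are modelled here.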