Linking Data from RESTful Services

                            Rosa Alarcon                                             Erik Wilde
          Departamento de Ciencia de la Computacion                             School of Information
            Pontificia Universidad Catolica de Chile                               UC Berkeley
                       ralarcon@ing.puc.cl                                     dret@berkeley.edu


ABSTRACT
One of the main goals of the Semantic Web is to extend current human-readable Web resources with semantic information encoded in a machine-processable form. One of its most successful approaches is the Web of Data, which, by following the principles of Linked Data, has made available several data sources compliant with Semantic Web technologies, such as RDF triple stores and SPARQL endpoints. On the other hand, the set of architectural principles that underlie the human-readable Web has been conceptualized as the Representational State Transfer (REST) architectural style. In this paper, we distill REST concepts in order to provide a mechanism for describing REST (i.e. human-readable Web) resources and transforming them into semantic resources. The strategy allowed us to harvest already existing Web resources without requiring changes to the original sources or ad-hoc interfaces. The presented strategy aims to contribute to the availability of more semantic datasets and to become a further step toward lowering the entry barrier to publishing semantic resources.

Categories and Subject Descriptors
H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services, Data sharing

General Terms
Design, Documentation, Languages

Keywords
REST, Web Data, Crawling

Copyright is held by the author/owner(s).
LDOW 2010, April 27, 2010, Raleigh, North Carolina.

1.   INTRODUCTION

There is an increasing interest in the relationship between Representational State Transfer (REST) [13] and the Semantic Web, which has resulted in various approaches ranging from the semantic annotation of Web resources to middleware that mediates resource handling. The approaches followed so far resemble the strategies of more traditional SOAP/WSDL semantic services and neglect basic REST properties. REST principles are somewhat related to Linked Data principles in the sense that resources have a unique identifier (URI) that must be dereferenceable through HTTP; resources are interlinked, and by following those links new resources can be discovered. However, differences arise when getting deeper into the principles and rationale of both fields. For instance, on the Linked Data side, research projects aim to create large collections of RDF data by transforming structured data sources into RDF using specialized mappings, and exposing the generated RDF dataset as RDF triple stores, often with SPARQL endpoints. Although this strategy makes large collections of RDF data available, it also results in centralized approaches where access is typically mediated through a single “endpoint” (e.g. a dump of the whole site, a SPARQL endpoint, a Tabulator-like interface, etc.). Moreover, due to the heterogeneous nature of the data source interfaces, these approaches require sophisticated mechanisms to retrieve, process, and publish the information [9], which challenges the scalability and accuracy of the exposed data, since it can become outdated.

One of the main tenets of REST is the primacy of resources that are uniquely identified by opaque URIs; that is, in order to avoid coupling between clients and servers, no assumptions must be made about the structure of the URI [26]. REST requires a uniform interface, that is, a set of operations or methods with known semantics that change the state of the resources. The interface depends on the URI scheme; for HTTP, the standard methods are GET, PUT, POST, DELETE, and OPTIONS. Methods are external to the resources, and are invoked by sending standard messages to the Web server indicating the URI of the requested resource, the method, the payload of the message, and metadata.

A resource can have multiple “representations” that follow a standardized format or media type (e.g., text/html, application/xml, etc.) and can be negotiated with the Web server. Representations convey the state of the client’s interaction within the application and contain hyperlinks that allow clients to discover other resources or change the state of the represented resource. Most importantly, REST services have no “endpoints”; instead, they consist of a collection of resource URIs and a set of standard operations. This approach differs greatly from more traditional SOAP/WSDL, where a service publishes an endpoint that exposes the set of available operations (i.e. URIs, encoding, parameters). Such operations have particular semantics that must be known in advance in order to be properly invoked by the client (coupling).

REST yields loosely coupled designs [26], where architectural concerns are separated among various standardized components such as routers, Web servers, and Web browsers, resulting in a flexible, extensible, and decentralized system that is simple to maintain and capable of massive scalability. Unlike distributed systems, which hide distribution, decentralized systems make it explicit with the eventual goal of architecting a system of systems.

Based on these REST principles, we present the Resource Linking Language (ReLL), which describes RESTful Web services and provides a natural mapping from the graph-oriented world of RESTful services (resources interlinked by links found in resource representations) to the graph-based model of RDF. By means of a ReLL description, a set of REST resources is described and exposed. Three applications were described and the resources harvested into a triple store. Section 2 briefly discusses related approaches, and Section 3 describes the proposed language.

2.   RELATED WORK

Semantic Web Services (SWS) for REST are mainly focused on providing a semantic description of a REST service. SA-REST [21] and hREST/MicroWSMO [20] provide a list of input and output parameters, methods, and URIs exposed by a REST service by means of property-value pairs or RDFa [1] annotations. The description itself can be transformed to RDF using a GRDDL-based [12] strategy for generating a domain ontology in RDF, but no information about the REST resources themselves is retrieved.

The Web Application Description Language (WADL) [16] describes RESTful services and places resources, identified by predefined URI patterns, as first-class objects in a description. WADL only supports HTTP methods with request and response elements. These elements contain representations with a media type and (possibly) another URI. Representations contain typed parameters that in turn contain links to other resources’ URIs. Generally speaking, WADL attempts to completely describe all possible aspects of a RESTful service, down to predefined URI patterns and the ways in which query parameters have to be composed for certain types of requests, introducing a higher level of coupling for clients using such descriptions.

In the same line, Battle and Benson [6] propose semantic annotations, similar to SA-REST, and extensions to SPARQL in order to support an HTTP REST uniform interface. They also propose extensions to the payload of the HTTP REST methods (e.g., PUT, DELETE and GET) for maintaining consistency between a REST resource and its semantic equivalent (a triple) in some triple store.

The main problem of these approaches is that they follow the WSDL/SOAP service model; they do not align well with the principles of RESTful service design, since they disregard fundamental properties such as the hypermedia nature of REST and the possibility of multiple representations for the resources. They also introduce coupling in their design by adhering to URI templates for describing the URIs of resources, input, and output parameters [25], or, in the case of Battle and Benson, they introduce new semantics to the standard REST interface.

EXPRESS [4] is a SWS model that explicitly avoids the RPC-orientation of the approaches mentioned so far. It starts from HTTP’s uniform interface, and then describes the available resources in an OWL ontology. However, the model of EXPRESS is a centralized one as well, because it is assumed that there is a complete description of a Web Service’s available resources, and then this description is used to generate URIs for classes, instances, and properties.

On the Linked Data side, the Vocabulary Of Interlinked Datasets (voiD) [3] describes datasets (sets of RDF triples) as well as Linksets, that is, sets of triples where the subject belongs to a dataset different from the object’s dataset. Directionality of the links can be modeled, and other properties such as licensing (dcterms:license), the number of triples available in the dataset (void:statItem), the vocabularies used in the dataset, and a SPARQL endpoint are also provided. voiD is accompanied by a Sitemap protocol extension that indicates the location (URI) of the voiD description so that (semantic) web crawlers can find it and use voiD’s information to index the dataset. The Silk-LSL (Link Specification Language) [30] is an XML-based language that allows defining rules (e.g. similarity metrics) and finding certain types of links (e.g. owl:sameAs) between two data sources automatically (that is, discovering Linksets in the terms of voiD).

voiD’s focus is on providing access and discovery for already existing datasets by publishing metadata, but a more granular approach (i.e. information about the retrieved resources themselves) is not considered. Silk allows better indexing of large centralized collections of RDF data and the discovery of dependencies between these datasets. While these approaches are central to increasing the amount of linked data on the Web, they are rather expensive because they are based on a lot of specialized mapping and publishing work for just transforming one dataset [9].

LRDD, the Link-based Resource Descriptor Discovery [17], is a proposal submitted to the IETF that focuses on the resources rather than the datasets. It allows resources to indicate their descriptor’s location by using links in three modes: the <LINK> element available in markup representations that support typed relations, such as (X)HTML and Atom; the HTTP Link header; and a Link-pattern contained in the resource’s description document located in the {host}/.well-known/ directory. In all three cases, the descriptor itself depends on the resource’s URI, in the form of {resource uri};about. Unlike the last approach, the former two would require modifying the resources in order to include the <LINK> elements, either in the resource’s code or on the server side in order to process the HTTP header.

As for the descriptor itself, XRD¹, the Extensible Resource Descriptor, defines a small set of elements describing the resource’s URI (and URI template), an XML signature, the expiration date, and links to other resources. Links are also annotated with metadata such as the target resource URI (and its URI template), media type, and the <rel> property as defined by the HTTP Header Link Relationship Types. This approach implies that there must exist an XRD document per resource (since the set of links is often different for each resource), which introduces high coupling and may be impractical for a Web-scale application.

¹ http://docs.oasis-open.org/xri/xrd/v1.0/xrd-1.0.html
² http://www.w3.org/TR/2009/REC-powder-dr/

While XRD focuses on individual resources, POWDER, the Protocol for Web Description Resources², recommended by the W3C, aims to facilitate the description of groups of resources identified by Internationalized Resource Identifiers (IRIs). An iriset (a set of IRIs, not a set of resources) can be defined in terms of the properties of such IRIs, that is, the accepted schemes (e.g. http, https), hosts, paths, and ports defined via regular expressions. The iriset properties are described by a descriptorset element that groups restriction attributes such as certified (indicates if the description certifies another resource) and sha1sum (providing a SHA-1 sum of
the described resource); and annotation properties such as displaytext (a descriptive text), displayicon (an image URI), and seealso, label, and comment, which provide a related resource URI, a description, and a comment respectively. Both restriction attributes and annotation properties have well-defined semantics and can be translated automatically to OWL, though they describe high-level attributes. An additional property, typeof, is also translated into rdf:type and allows specifying a class for all the elements of an iriset. For instance, we could define the http://twitter.com iriset and indicate later that all the elements identified by such URIs belong to the class twitterPublicTimeLine. Provenance information describing author, date, and validity period (attribution) is also provided.

Unlike XRD, POWDER refers to groups of resources identified by URI patterns (not URI templates) without requiring changes in the resources; furthermore, POWDER makes it possible to assign a class to the group of resources, facilitating later complex operations such as SPARQL queries. On the negative side, POWDER facilitates the description of groups of resources but does not provide support for resource discovery or an automatic harvesting process.

In the approach described by Futrelle [14], RDF is used as the “integration layer” in a scenario of heterogeneous data sources, and the main focus is on harvesting well-known and cooperating data sources. This approach can be applied to a variety of data sources, but they have to be cooperating in the sense that they expose RDF themselves. The harvester’s main role is to be notified of new and updated data, and to pull it in from these sources. While this scenario uses RDF’s power to unify heterogeneous data sources on the metamodel level, it is only applicable in closed and cooperating settings. In our approach, data sources are not required to publish RDF themselves. As long as access to data is provided through RESTful services, they can be harvested and used as RDF. A weakness of the current implementation is that updating is not supported in a way that allows efficient incremental updates, but we plan to address this issue in our future work mentioned in Section 6, where we describe extensions to our language that represent update services (and thus the ability to use those for incremental updates) on the language level.

SOFIE [29] focuses on information extraction from Web resources, and ANGIE [27] on using both extracted information and Web service endpoints for building a more interactive system that does not require an exhaustive crawl of data, but retrieves information on demand. SOFIE thus falls into the category of approaches that start from resource representations and use information retrieval methods to extract RDF from them. The current implementation of ANGIE focuses on the dynamics of query processing in the RDF data managed by the system, and uses a hardwired set of Web services as the back-end. Similar to SA-REST, it uses a set of lowering/lifting transformations to translate the results of function calls from and to RDF. ANGIE focuses on SPARQL processing (the framework is able to use Web services while processing SPARQL queries), and less on the ability to easily accommodate a large variety of RESTful services.

Deimos [5] is another system that starts with information found on Web pages or through Web forms, and then uses semantic analysis to map the syntax of these representations to semantically richer information. Instead of relying on the richness of links discovered in known resources, though, the approach taken in Deimos uses tagging services to discover new resources.

Finally, another attempt to provide a bridge between REST and the Semantic Web is the W3C work in progress on an RDF vocabulary representing the HTTP protocol³. The approach captures properties such as the message exchanged (including the HTTP headers), the request (including the method and URI), and the response (including the HTTP status code number) with the goal of facilitating relevant tasks such as content negotiation, as well as additional HTTP headers registered by the Internet Assigned Numbers Authority (IANA).

³ http://www.w3.org/TR/HTTP-in-RDF10/

3.   RESOURCE LINKING LANGUAGE

Considering the related work, we derived a set of requirements for a REST resource description language that considers REST constraints. For instance, in order to avoid coupling, URIs must be opaque; the language must support multiple representations and must consider linking among resources as a fundamental property. In order to work with the currently installed infrastructure, it must require minimal or no intervention for existing Web resources; in order to scale, it must support a partial description of the resources that can be later completed and/or modified; it must describe both single resources and groups of resources as well as the relationships among them; and finally it must be simple in order to lower the entry barrier for future developers and foster its adoption.

The main constraints for designing RESTful services are resource identification, linking, and a uniform interface through which linked resources can be accessed. By linking we refer to one of the core aspects of RESTful services, that is, the use of hypermedia as the engine of application state (HATEOAS), which means that service interactions that in non-REST approaches result in server state are actually implemented as clients following links to resources representing that state. This results in services that are resource- and link-centric, and thus a description language for RESTful services should focus on these two aspects.

The other two main constraints of REST, self-describing messages and stateless interactions, are more a question of how resource representations are retrieved, and how state is handled when interacting with services. For the purpose of designing RESTful services, all of these design issues are relevant. For the purpose of describing a RESTful service interface, the most important aspects are the resource representations that can be retrieved, the ways in which these can link to other resources, and the protocol interactions that may be required to access those resources. The service semantics also require an understanding of the semantics of the representations involved in the interactions with the service, but for the mere description of a service’s interface, these semantics are not required.

Figure 1 shows the schema of ReLL. Elements are shown as rectangles and attributes as dashed rectangles. Sequences are depicted as a circle with the character “S”. A service exposes a set of one or more resources that have a unique identifier (xml:id), names and descriptions (human-readable labels), and optionally a URI pattern which describes the constraints for the identifiers expected to be used for specific resources (match). A resource may have representations, which are the serialization of the resource in some syntax. This design naturally supports multiple representations for resources, but it does not support, per se, the common practice of some Web services that use different URIs for different representations of the same resource (such as two URIs with .xml and .json suffixes, if these are two supported representation formats).⁴ We discuss this issue further down, when we are discussing link types.

Figure 1: ReLL Description Schema (diagram not reproduced; its nodes are labelled service, resources, resource, representations, representation, schema, link, selector, protocol, request, response, linktypes, and linktype, with cardinalities 0..∞ and 1..∞)

⁴ Such variations in the representation’s URIs could easily be covered by a URI pattern for the resource ending with .(xml|json), but the variation of the suffix alone would not imply that it does not actually refer to a different resource, but only to a different representation.

Representations can be associated with schemas for possible validation (if schemas exist). Representations can also be defined as part of the service directly, in which case they are abstract, which means that they are not associated with any concrete resources. The most important use cases for abstract representations are conventions for media or data formats that should be described, so that they can be reused as a foundation for describing concrete resource representations. A real-world use case for this scenario is an abstract representation describing the media type application/xml, which serves as the basis for the abstract representation describing the application/atom+xml media type for feeds according to Atom [24], which in turn serves as the basis for the abstract representation describing the paged feeds media type (i.e., feeds implementing feed paging [22]). Eventually, a concrete service providing a resource may use paged feeds, and thus the resource types its representation with the abstract “paged feed” representation. The rationale behind this design is that various representations in this chain of representations define different linking mechanisms (paged feeds extend Atom with new link relationships), and the effective set of link types that can appear in a concrete resource using the paged feed representation thus is the union of these different link types. Representations can be based on other representations, but only on abstract representations. The other use case of abstract representations is representations that are derived from concrete representations, such as a collection of representations that is available through a paging mechanism in representation formats.

Each representation can contain any number of links. A link is retrieved from the representation by using selectors. Selectors depend on the representation format, and thus their definition and interpretation may depend on a language (selector type) that is appropriate for a certain representation. For instance, for XML representations, the most popular example of a selector mechanism is the XML Path Language (XPath) [11, 7], which allows structured selections within XML document trees. A link defines a possible association leading from the resource’s representation containing the link to another resource as determined by the target. Instead of a resource URI, the target contains a valid resource id in order to avoid coupling with the resources’ naming scheme.

A link has a link type which represents the semantics of the link, but ReLL does not make any attempt to formalize the semantics; link types have a name and a description and thus can be documented in a service description, but their semantics are outside of the scope of the description language. Links can also contain protocol descriptions, which for each link specify the rules that govern the interaction with the linked resource. This is important because links in RESTful services not only have application-specific semantics; following the links also may require different ways of using the uniform interface provided by a certain protocol.
   [Diagram labels: rell:resource, rell:representation, rell:link, rell:represents;
   iana:representation, base:representation_type, base:resource_id, base:link_type,
   base:link_id; resource_uri1, resource_uri2; vCard:EMAIL (bnode), vCard:TITLE (string);
   base:timestamp, http:ETag (string), dcterms:created (date)]

                              Figure 2: Generating RDF triples from ReLL descriptions
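
The link model described above (a selector, a link type, and a target that names a resource id rather than a URI) can be illustrated with a minimal Python sketch. It is not ReLL syntax and not the authors' implementation: the `LinkDescription` class, its field names, the sample page, and the example.org URIs are all assumptions made for illustration, and the selector uses only the small XPath subset supported by Python's `xml.etree.ElementTree`.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class LinkDescription:
    """Hypothetical in-memory form of one link of a service description."""
    link_type: str   # names the link's semantics, e.g. "person-course"
    selector: str    # ElementTree path (a small XPath subset)
    attribute: str   # attribute that holds the URI in the selected elements
    target: str      # resource id of the linked resource, not a URI


def select_links(xml_text, link):
    """Return (link_type, target id, URI) for every element the selector hits."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.findall(link.selector):
        uri = element.get(link.attribute)
        if uri is not None:
            found.append((link.link_type, link.target, uri))
    return found


# Hypothetical well-formed representation of a person resource.
PERSON_PAGE = """
<html><body>
  <ul class="courses">
    <li><a href="http://example.org/courses/242">Course 242</a></li>
    <li><a href="http://example.org/courses/152">Course 152</a></li>
  </ul>
  <p><a href="http://example.org/about">About</a></p>
</body></html>
"""

PERSON_COURSE = LinkDescription(
    link_type="person-course",
    selector=".//ul[@class='courses']/li/a",
    attribute="href",
    target="course",
)

links = select_links(PERSON_PAGE, PERSON_COURSE)
```

Because the target is a resource id, the description stays valid if the service later changes the URI layout of its course resources; only the selector depends on the representation.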


Thus, it is possible for each link to specify how this link has to be traversed using a
specific protocol. Practically speaking, this means that after a link’s URI has been determined
(for example by extracting the URI using a selector), the protocol is determined by inspecting
the URI’s scheme, and then the protocol description might give additional hints about how to
use methods or compose entities for invoking the uniform interface. Thus protocol descriptions
are just one (the interface-specific) part of describing link semantics.

4.   FROM RELL TO RDF
   ReLL’s main elements such as resource, representation, and link serve as the core elements
for an RDF/OWL minimal vocabulary shown in Figure 2 under the “rell” namespace. Resource and
representation are concepts, while link and represents are predicates. Since ReLL describes a
REST application, it is used to generate a domain ontology for the application. The resource id
annotated in ReLL is used as the resource’s type and the link type as the predicate that
relates two resources. Domain-specific resources are also subclasses of the rell:resource
entity, and currently form a domain-specific vocabulary by using the ReLL service’s attribute
base.
   We are maintaining the actual REST resources’ URIs to identify them in the realm of the
Semantic Web; however, they are considered instances of the domain-specific classes discussed
before. REST resources are linked together with a link id instead of a link type. REST
resources themselves can be transformed to RDF following a GRDDL approach. For instance, in
Figure 2, a resource is annotated with properties defined in the vCard vocabulary, including
simple (literals) and complex attributes (e.g. the EMAIL is generated as an internal blank
node). Naturally, the proper vocabularies depend on the resources.
   With this approach, it is possible to retrieve a graph of triples describing a REST resource
(URI and attributes) and its relation to another REST resource, as shown by the dashed
rectangle in Figure 2. The resulting graph [10] is named with an ID or timestamp (e.g.,
base:r123456789) that refers to the source or representation from where the graph information
was collected. The representation is an instance of the representation type defined in the ReLL
description for the retrieved REST resource.
   Representations are subclasses of a concrete media type that can be derived from abstract
representations or abstract media types as annotated in the ReLL descriptions. Abstract
representations are supported as classes that serve as the basis for other abstract or concrete
representations. For representations, the upper ontology contains all standardized media types
from the IANA registry as classes.
   The representation is then part of the provenance information obtained when retrieving the
REST resources (see dashed elements in Figure 2). Other information such as the ETag property
served by the Web server when retrieving the REST resource is also collected if available; the
date when the information was retrieved (and hence the named graph was created) is also
annotated. Other information, as indicated by [18], could also be included in future
developments.

5.   IMPLEMENTATION
   As a proof of concept, we have implemented RESTler [2], a crawler that follows the rules
defined by ReLL descriptions in order to harvest REST resources. A complementary component (a
Translator) transforms the retrieved resources into RDF. Figure 3 describes the principal
components of the approach. Rectangles represent software components, UML note figures are used
to represent files, straight lines represent information flow required in the configuration
phase of the process (static), while dashed lines represent information flow that takes place
while the crawling process is being executed (dynamic).
   RESTler is a crawler that parses and uses ReLL descriptions as instructions for retrieving
REST services’ resources. The crawler takes as input an XML document which is a ReLL
description, and a set of seed URIs (Figure 3), and
   [Diagram labels: seeds; RESTler (Crawler, HTTP client, Tiddy); RESTful service; REST resource
   representation; ReLL description (.xml file); Translator; mapping (.properties); .xslt files;
   RDF/OWL; Triplestore; ReLL vocabulary (.owl); IANA vocabulary (.owl)]

                                                   Figure 3: RESTler Architecture


produces as output a typed graph of the crawled resources and the links connecting them. The
crawler also takes authentication information as input; currently only basic authentication is
supported (username and password sent in the HTTP request), but we plan to extend the crawler
in order to support other authentication schemes (e.g., OAuth, AuthSub).
   The crawler parses the description file, dereferences the initial URI (seeds), and retrieves
the resource representation considering the protocol, request method, and resource media type
provided. Currently we support HTTP (an HTTP client), and HTML, XHTML, Atom, JSON, RSS, and XML
as media types, and only the GET method. But the crawler can be extended to support other media
types, protocols and request methods.
   The resource URI is matched against a regular expression that defines the resource type or
id. From the retrieved representation, the crawler obtains the list of embedded links to other
representations by applying an XPath expression (selector). The link’s target indicates the
expected resource type and requires additional information such as the protocol and request
method to follow and the expected media type. If the target is not present in the link element,
a “nofollow” condition is implied, since it is not possible to crawl the linked resource (i.e.,
there is no information about the media type, protocol, request method or expected resource
type).
   It is possible as well to support computed links, that is, links that are calculated.5 The
crawler also evaluates whether the resource fulfills certain restrictions such as the type of
the linked resources (target attribute), and the cardinality of the retrieved links (minOccurs
and maxOccurs attributes for the selector element). These restrictions are optional and allow
the crawler to determine whether the resource is well-formed and satisfies the preconditions
given in the service description.
   For each graph retrieved, a Translator is invoked for generating RDF triples based on the
ReLL description, that is, the subjects (resources’ URIs), properties (rdf:type, base:link_id)
and objects (linked resources’ URIs or values), as well as provenance information
(base:timestamp). Additional information is obtained through XSLT files transforming resources
into RDF sentences, as indicated in the corresponding mapping file. Each ReLL document is
transformed into RDF with a generic XSLT generating an ontology specific to each application
domain. Generated named graphs are stored in a triple store. We use Sesame 2.0 as triple store
and the system is implemented in Java. Sesame supports named graphs as quads, and we use the
fourth component for storing provenance information.
   Finally, for each retrieved resource, the crawler recursively repeats the whole process.

5.1   School/Twitter/Flickr and User Matching
   We applied RESTler to four scenarios: a subset of the Web site of the Information School at
UC Berkeley, and two well-known REST-based applications, Twitter and Flickr. The fourth service
provides mappings among the users in each of these domains so that we can establish useful
equivalences by means of an owl:sameAs property. ReLL descriptions were created for each
scenario and we retrieved 11,353 resources and 22,309 links among them, which generated 55,548
triples.
   Figure 4 presents the ontology that was generated after transforming ReLL descriptions into
RDF through a generic XSLT definition. The image was generated using OntoViz6 and was later
refined for readability. The upper left corner presents the representation classes and their
corresponding iana media-types (e.g. iana-app:xhtml+xml, iana-app:atom+xml, iana-app:xml,
iana-txt:html and images media types). The right-hand side presents the classes that model the
UC Berkeley school domain’s resources (e.g. school:person, school:course, etc) and the
relationships among resources (e.g. school:person-course).
   The left-hand side shows the classes corresponding to the Flickr domain (e.g.
flickr:photostream, flickr:photo, etc) and their relationships (e.g. flickr:photo-sizes). At
the bottom of the figure, a subgraph describes the classes that model the Twitter domain (e.g.
twitter:follower, twitter:user, etc) and the hyperlinks or relationships among them (e.g.
twitter:status-reply). At the center of the figure the minimal ontology described in Figure 2
is highlighted in bold and italics.

5 Based on the ongoing work on the URI Template [15] language, it might in the future be
possible to define additional ways in which a URI can be composed based on input values
obtained from the current representation.
6 A Protege plugin that generates .dot files
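
The typing and restriction checks described in Section 5 (matching a URI against a regular expression to obtain its resource id, treating a missing target as "nofollow", and checking minOccurs/maxOccurs) can be sketched in Python as follows. This is an illustration under stated assumptions, not RESTler's code: the pattern table, the function names, the example.org URIs, and the choice to report (rather than follow) links whose type does not match the target are all hypothetical.

```python
import re

# Hypothetical resource patterns: resource id -> regular expression over URIs.
RESOURCE_PATTERNS = {
    "person": re.compile(r"^http://example\.org/people/[^/]+$"),
    "course": re.compile(r"^http://example\.org/courses/[^/]+$"),
}


def resource_type(uri):
    """Return the first resource id whose pattern matches the URI, else None."""
    for resource_id, pattern in RESOURCE_PATTERNS.items():
        if pattern.match(uri):
            return resource_id
    return None


def check_links(uris, target=None, min_occurs=0, max_occurs=None):
    """Apply the optional restrictions to the URIs returned by one selector.

    Returns (uris_to_follow, problems). A missing target means "nofollow":
    the links are counted for cardinality but none of them is crawled.
    """
    problems = []
    if len(uris) < min_occurs:
        problems.append(f"expected at least {min_occurs} links, got {len(uris)}")
    if max_occurs is not None and len(uris) > max_occurs:
        problems.append(f"expected at most {max_occurs} links, got {len(uris)}")
    if target is None:
        return [], problems
    to_follow = []
    for uri in uris:
        if resource_type(uri) == target:
            to_follow.append(uri)
        else:
            # Assumption of this sketch: mismatching links are reported, not followed.
            problems.append(f"{uri} is not a {target}")
    return to_follow, problems


follow, problems = check_links(
    ["http://example.org/courses/242", "http://example.org/people/x"],
    target="course",
    min_occurs=1,
    max_occurs=5,
)
```

An empty `problems` list corresponds to a resource that satisfies the preconditions given in the description; a non-empty one signals that the service no longer matches what the description promises.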
   [Ontology diagram: representation classes (e.g. school:person-html, flickr:photo-html,
   twitter:status-xml) with isa arcs to IANA media types (iana_app:xml, iana_app:atom+xml,
   iana_app:xhtml+xml, iana_txt:html, iana_img:jpeg, iana_img:gif, iana_img:ief) and to
   rell:representation; domain classes of the school, flickr and twitter namespaces and
   um:usermap with isa arcs to rell:resource and rell:collection; link predicates (marked *)
   such as school:person-course, flickr:photo-sizes, twitter:status-reply and
   twitter:timeline-page2 to twitter:timeline-page13 under rell:link; rell:represents]

                                        Figure 4: Resulting ontologies generated from ReLL descriptions
              <http://www.ischool.berkeley.edu/people/faculty/erikwilde> a school:person ;
                 vCard:FN "Erik Wilde" ;
                 vCard:ADR _:node14m5kienpx1603 ;
                 vCard:TITLE "Adjunct Professor" ;
                 vCard:ORG _:node14m5kienpx1604 ;
                 vCard:EMAIL _:node14m5kienpx1606 ;
                 vCard:TEL _:node14m5kienpx1607 ;
                 vCard:URL <http://dret.net/netdret/> ;
                 vCard:PHOTO <http://www.ischool.berkeley.edu/files/imagecache/profile-pic/DSC_0176.JPG> ;
                 school:person-website <http://dret.net/netdret/> ;
                 school:person-course <http://www.ischool.berkeley.edu/programs/courses/242> ,
                   <http://www.ischool.berkeley.edu/programs/courses/152> ,
                   <http://www.ischool.berkeley.edu/programs/courses/190-waim> ,
                   <http://www.ischool.berkeley.edu/programs/courses/290-wa> .


                 Figure 5: Describing an instance of a school:person resource using N3 notation
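
The `a school:person` and `school:person-course` statements in Figure 5 follow the pattern described in Section 5: the subject is the resource's own URI, the resource id becomes its rdf:type, and each link id becomes a predicate pointing to the linked resource's URI. A minimal Python sketch of that pattern is given below. It is not the Translator itself; the function names and the `http://example.org/school#` base namespace are assumptions, and vCard attributes, blank nodes, and provenance are left out.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_triples(base, uri, resource_id, links):
    """Build (subject, predicate, object) URI triples for one crawled resource.

    links is an iterable of (link_id, target_uri) pairs.
    """
    triples = [(uri, RDF_TYPE, base + resource_id)]
    for link_id, target_uri in links:
        triples.append((uri, base + link_id, target_uri))
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical base namespace; the resource and course URIs are taken from Figure 5.
BASE = "http://example.org/school#"
PERSON = "http://www.ischool.berkeley.edu/people/faculty/erikwilde"

triples = to_triples(
    BASE,
    PERSON,
    "person",
    [
        ("person-course", "http://www.ischool.berkeley.edu/programs/courses/242"),
        ("person-course", "http://www.ischool.berkeley.edu/programs/courses/152"),
    ],
)
print(to_ntriples(triples))
```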


   Collections of resources can also be identified. For instance, at the bottom of the figure,
the arcs between two resources, twitter:user-timeline and twitter:paged-user-timeline, describe
a pagination relationship, that is, 13 pages of the twitter:user-timeline were collected and
the pagination scheme is described as links that lead to a numbered page (e.g.
twitter:timeline-page2, twitter:timeline-page3, etc). For the case of Flickr and the
Information School the pagination scheme considers links such as the first, last, next and
previous page.
   The fourth RESTful service, the Usermap, is shown as a single class near the center of the
figure. This is because the ReLL file contains only one class of resource (the usermap), that
is, an XML list mapping the users’ URIs between the other three applications.
   The REST resources themselves are transformed to RDF following a GRDDL approach. Figure 5
shows the attributes obtained for individuals of type school:person. Notice that it is possible
to annotate the relationships between the REST resource (erikwilde) and its attributes. In the
figure these relationships are annotated with vCard, but other information models can be used.

6.   CONCLUSIONS
   The REST community is still discussing whether RESTful services even should be described,
and how such a description language could increase the coupling between a service provider and
a service consumer, so that REST’s goal of loosely coupled services could be compromised. We
are taking a pragmatic position and claim that it is important to keep in mind that any kind of
contract will introduce some coupling, that even loosely coupled services need a shared set of
assumptions, and that a more formal way of describing those assumptions will help service
providers and consumers in service documentation and consumption. A recent upswing of
discoverable links between Web resources (such as an uptake of microformats [19]) has led to
the idea of a central registry for link relationships in the realm of Web linking [23], but
this activity is still under active development.
   Our model is still a static description of RESTful services that does not cover the cases in
which new resources or identification and access schemes are introduced. However, such a
description makes it possible to describe the status quo and the cases that a client should
expect, and therefore also to reliably discover cases in which these constraints are not
satisfied anymore, for example when new representations or new identification and access
schemes are used.
   Furthermore, this kind of RESTful service description can also include the set of
preconditions that must be satisfied by a client to be able to consume a service. Should these
preconditions change (because the service changes), then an analysis of the description of the
preconditions used by the client allows the client to detect the change (for example, a new
representation format has been introduced), and to react in an appropriate way (for example,
alerting the client manager, attempting a fallback, or aborting). By supporting the description
of a set of preconditions, the description language can achieve loose coupling [26] and still
allow clients to detect when they encounter something that they have not been designed for. As
for future work, we are planning on considering more complex data models that also support
methods such as PUT, DELETE and POST, allowing us to model resources that can be modified, and
their relation with the SPARQL proposals for supporting such operations [31].
   Our minting process consists of selecting appropriate names for the namespace (base),
resource IDs, link IDs, link types, and representation IDs. In the example presented in
Figure 5, the resource instance’s namespace and predicates chosen for this description
correspond to vCard, but other properties (e.g. foaf) could also be used. We believe that the
selection of such properties must be the responsibility of the ReLL designer. Furthermore, the
properties used in the ReLL description itself (e.g. school:person) could also be described
using Linked Data vocabularies. By following this approach the results of RESTler (e.g. triples
datasets) could be better integrated with other Linked Data sources and the Linked Open Data
cloud.
   By considering the URIs corresponding to REST resources, a natural content negotiation with
the Web server will be possible in order to retrieve an RDF-friendly media type (e.g.
application/rdf+xml) or the human-readable Web version of the same resource. As for
limitations, a ReLL document must be prepared for each REST service. This approach has been
successfully followed by others such as Virtuoso’s Sponger, which prepares Sponges or
Cartridges tailored for an application interface such as REST APIs, known metadata such as MS
Office, or known Web sites such as YouTube. RDB2RDF7 is also an ad-hoc approach that transforms
RDBMS to RDF representations.

7 http://www.w3.org/2005/Incubator/rdb2rdf/

   We believe that by choosing Web technologies such as XPath, XSLT and XML as the basis for
ReLL documents, we are lowering the entry barrier to publishing semantic resources, since most
Web developers have the knowledge and tools required to create their own ReLL description. This
approach also allows developers to control the information they are collecting. Our next
challenge is to further facilitate the creation of ReLL documents by supporting the dynamic and
automatic generation of ReLL descriptions. One of the challenges of this goal is the fact that
we need to design a specific XSLT for each resource type in order to harvest specific
information. A fully automatic approach would require information retrieval, text mining and
probably machine learning techniques, which would greatly increase the cost of the
transformation and raise the entry barrier for technology adopters.
   A document such as a ReLL description may serve as an intermediate layer that automatic
agents can also use as a contract describing the capacities of a REST service and translating
them into RDF triples, by following the semantics (types) made explicit in the document. Our
approach can be seen as a complement to proposals such as voiD, since voiD describes the
resulting datasets but does not support the triples harvesting process. Our approach will allow
any Web content provider to publish ReLL descriptions for others to crawl their Web sites, or
third parties to develop a Web site’s description that accommodates their needs. The crawler’s
result is a dataset that can then be described using voiD. Silk can also be used for the
definition of additional link patterns such as the user mapping that we created manually in
this version; and LDDR’s linking techniques can also be applied, since it may allow resources
to link to their descriptions.
   We have placed strong emphasis on a decoupled approach,

 [4] Areeb Alowisheq, David E. Millard, and Thanassis Tiropanis. EXPRESS: EXPressing REstful
     Semantic Services Using Domain Ontologies. In Bernstein et al. [8], pages 941–948.
 [5] José Luis Ambite, Sirish Darbha, Aman Goel, Craig A. Knoblock, Kristina Lerman, Rahul
     Parundekar, and Thomas Russ. Automatically Constructing Semantic Web Services from Online
     Sources. In Bernstein et al. [8], pages 17–32.
 [6] Robert Battle and Edward Benson. Bridging the Semantic Web and Web 2.0 with
     Representational State Transfer (REST). Journal of Web Semantics, 6(1), 2008.
 [7] Anders Berglund, Scott Boag, Donald D. Chamberlin, Mary F. Fernández, Michael Kay,
     Jonathan Robie, and Jérôme Siméon. XML Path Language (XPath) 2.0. World Wide Web
     Consortium, Recommendation REC-xpath20-20070123, January 2007.
 [8] Abraham Bernstein, David R. Karger, Tom Heath, Lee Feigenbaum, Diana Maynard, Enrico
     Motta, and Krishnaprasad Thirunarayan, editors. 8th International Semantic Web Conference,
     volume 5823 of Lecture Notes in Computer Science, Chantilly, Virginia, October 2009.
     Springer-Verlag.
 [9] Uldis Bojārs, John G. Breslin, Vassilios Peristeras, Giovanni Tummarello, and Stefan
     Decker. Interlinking the Social Web with Semantics. IEEE Intelligent Systems, 23(3):29–40,
     May 2008.
[10] Jeremy J. Carroll, Christian Bizer, Pat Hayes, and Patrick Stickler. Named Graphs,
     Provenance and Trust. In Allan Ellis and Tatsuya Hagino, editors, 14th International World
     Wide Web Conference, pages 613–622, Chiba, Japan, May 2005.
where the components of the architecture maintain certain              ACM Press.
degree of independence, and require knowledge and tools           [11] James Clark and Steven J. DeRose. XML Path
already available and familiar to most Web developers, and             Language (XPath) Version 1.0. World Wide Web
provide a simple model that may result familiar again to               Consortium, Recommendation REC-xpath-19991116,
Web developers. Our final goal is to contribute in making              November 1999.
available more semantic information while keeping a lower         [12] Dan Connolly. Gleaning Resource Descriptions from
entry barrier for developers.                                          Dialects of Languages (GRDDL). World Wide Web
                                                                       Consortium, Recommendation REC-grddl-20070911,
                                                                       September 2007.
7.   ACKNOWLEDGMENTS                                              [13] Roy Thomas Fielding and Richard N. Taylor.
  This work was partially funded by CONICYT/Bicenntenial               Principled Design of the Modern Web Architecture.
Becas-Chile 2009.                                                      ACM Transactions on Internet Technology,
                                                                       2(2):115–150, May 2002.
8.   REFERENCES                                                   [14] Joe Futrelle. Harvesting RDF Triples. In Luc
 [1] Ben Adida, Mark Birbeck, Shane McCarron,                          Moreau and Ian Foster, editors, International
     and Steven Pemberton. RDFa in XHTML: Syntax                       Provenance and Annotation Workshop (IPAW 2006),
     and Processing — A Collection of Attributes and                   volume 4145 of Lecture Notes in Computer Science,
     Processing Rules for Extending XHTML to Support                   pages 64–72, Chicago, Illinois, May 2006.
     RDF. World Wide Web Consortium, Recommendation                    Springer-Verlag.
     REC-rdfa-syntax-20081014, October 2008.                      [15] Joe Gregorio. URI Template. Internet Draft
 [2] Rosa Alarcón and Erik Wilde. RESTler: Crawling                   draft-gregorio-uritemplate-04, March 2010.
     RESTful Services. In 19th International World Wide           [16] Marc Hadley. Web Application Description
     Web Conference Posters, Raleigh, North Carolina,                  Language. World Wide Web Consortium, Member
     April 2010. ACM Press.                                            Submission SUBM-wadl-20090831, August 2009.
 [3] Keith Alexander, Richard Cyganiak, Michael                   [17] Eran Hammer-Lahav. Link-based Resource
     Hausenblas, and Jun Zhaox. Describing Linked                      Descriptor Discovery. Internet Draft
     Datasets. In 2nd Workshop on Linked Data on the                   draft-hammer-discovery-03, March 2009.
     Web, Madrid, Spain, April 2009.
[18] Olaf Hartig and Jun Zhao. Using Web Data
     Provenance for Quality Assessment. In First
     International Workshop on the Role of Semantic Web
     in Provenance Management, Washington, D.C.,
     October 2009.
[19] Rohit Khare and Tantek Çelik. Microformats: A
     Pragmatic Path to the Semantic Web. In 15th
     International World Wide Web Conference Posters,
     Edinburgh, UK, May 2006. ACM Press.
[20] Jacek Kopecký, Karthik Gomadam, and Tomas
     Vitvar. hRESTS: An HTML Microformat for
     Describing RESTful Web Services. In 2008
     IEEE/WIC/ACM International Conference on Web
     Intelligence, pages 619–625, Sydney, Australia,
     December 2008.
[21] Jon Lathem, Karthik Gomadam, and Amit P.
     Sheth. SA-REST and (S)mashups: Adding Semantics
     to RESTful Services. In First IEEE International
     Conference on Semantic Computing (ICSC 2007),
     pages 469–476, Irvine, California, September 2007.
[22] Mark Nottingham. Feed Paging and Archiving.
     Internet RFC 5005, September 2007.
[23] Mark Nottingham. Web Linking. Internet Draft
     draft-nottingham-http-link-header-08, March 2010.
[24] Mark Nottingham and Robert Sayre. The Atom
     Syndication Format. Internet RFC 4287, December
     2005.
[25] Cesare Pautasso. Composing RESTful Services with
     JOpera. In Alexandre Bergel and Johan Fabry,
     editors, International Conference on Software
     Composition 2009, volume 5634 of Lecture Notes in
     Computer Science, pages 142–159, Zürich,
     Switzerland, July 2009. Springer-Verlag.
[26] Cesare Pautasso and Erik Wilde. Why is the Web
     Loosely Coupled? A Multi-Faceted Metric for Service
     Design. In Quemada et al. [28], pages 911–920.
[27] Nicoleta Preda, Fabian M. Suchanek, Gjergji
     Kasneci, Thomas Neumann, Maya Ramanath, and
     Gerhard Weikum. ANGIE: Active Knowledge for
     Interactive Exploration. In 35th International
     Conference on Very Large Data Bases (VLDB 2009),
     pages 1570–1573, Lyon, France, August 2009. ACM
     Press.
[28] Juan Quemada, Gonzalo León, Yoëlle S.
     Maarek, and Wolfgang Nejdl, editors. 18th
     International World Wide Web Conference, Madrid,
     Spain, April 2009. ACM Press.
[29] Fabian M. Suchanek, Mauro Sozio, and Gerhard
     Weikum. SOFIE: A Self-Organizing Framework for
     Information Extraction. In Quemada et al. [28], pages
     631–640.
[30] Julius Volz, Christian Bizer, Martin Gaedke,
     and Georgi Kobilarov. Discovering and
     Maintaining Links on the Web of Data. In Bernstein
     et al. [8], pages 650–665.
[31] Erik Wilde and Michael Hausenblas. RESTful
     SPARQL? You Name It! — Aligning SPARQL with
     REST and Resource Orientation. In Walter Binder
     and Erik Wilde, editors, 4th Workshop on Emerging
     Web Services Technology (WEWST 2009), pages
     39–43, Eindhoven, Netherlands, November 2009.