Linking Data from RESTful Services
Rosa Alarcon
Departamento de Ciencia de la Computacion
Pontificia Universidad Catolica de Chile
ralarcon@ing.puc.cl

Erik Wilde
School of Information
UC Berkeley
dret@berkeley.edu

ABSTRACT
One of the main goals of the Semantic Web is to extend current human-readable Web resources with semantic information encoded in a machine-processable form. One of its most successful approaches is the Web of Data, which, by following the principles of Linked Data, has made available several data sources compliant with Semantic Web technologies such as RDF triple stores and SPARQL endpoints. On the other hand, the set of architectural principles that underlie the human-readable Web has been conceptualized as the Representational State Transfer (REST) architectural style. In this paper, we distill REST concepts in order to provide a mechanism for describing REST (i.e., human-readable Web) resources and transforming them into semantic resources. The strategy allowed us to harvest already existing Web resources without requiring changes to the original sources or ad hoc interfaces. The presented strategy aims to contribute to the availability of more semantic datasets and to be a further step in lowering the entry barrier to publishing semantic resources.

Categories and Subject Descriptors
H.3.5 [Information Storage and Retrieval]: Online Information Services—Web-based services, Data sharing

General Terms
Design, Documentation, Languages

Keywords
REST, Web Data, Crawling

Copyright is held by the author/owner(s).
LDOW 2010, April 27, 2010, Raleigh, North Carolina.

1. INTRODUCTION
There is an increasing interest in the relationship between Representational State Transfer (REST) [13] and the Semantic Web, which has resulted in approaches ranging from the semantic annotation of Web resources to middleware that mediates resource handling. These approaches resemble the strategies of more traditional SOAP/WSDL semantic services and neglect basic REST properties. REST principles are related to Linked Data principles in the sense that resources have a unique identifier (URI) that must be dereferenceable through HTTP; resources are interlinked, and by following those links new resources can be discovered. However, differences arise when looking more deeply into the principles and rationale of both fields. For instance, on the Linked Data side, research projects aim to create large collections of RDF data by transforming structured data sources into RDF using specialized mappings, and exposing the generated RDF dataset as RDF triple stores, often with SPARQL endpoints. Although this strategy makes large collections of RDF data available, it also results in centralized approaches where access is typically mediated through a single “endpoint” (e.g., a dump of the whole site, a SPARQL endpoint, a Tabulator-like interface, etc.). Moreover, due to the heterogeneous nature of the data sources' interfaces, such approaches require sophisticated mechanisms to retrieve, process, and publish the information [9], which challenges the scalability and accuracy of the exposed data, since it can become outdated.

One of the main tenets of REST is the primacy of resources that are uniquely identified by opaque URIs; that is, in order to avoid coupling between clients and servers, no assumptions must be made about the structure of the URI [26]. REST requires a uniform interface, that is, a set of operations or methods with known semantics that change the state of the resources. The interface depends on the URI scheme; for HTTP, the standard methods are GET, PUT, POST, DELETE, and OPTIONS. Methods are external to the resources, and are invoked by sending standard messages to the Web server indicating the URI of the requested resource, the method, the payload of the message, and metadata.

A resource can have multiple “representations” that follow a standardized format or media type (e.g., text/html, application/xml, etc.) and can be negotiated with the Web server. Representations convey the state of the client’s interaction within the application and contain hyperlinks that allow clients to discover other resources or change the state of the represented resource. Most importantly, REST services have no “endpoints”; instead, they consist of a collection of resource URIs and a set of standard operations. This approach differs greatly from more traditional SOAP/WSDL, where a service publishes an endpoint that exposes the set of available operations (i.e., URIs, encoding, parameters). Such operations have particular semantics that must be known in advance in order to be properly invoked by the client (coupling).

REST yields a loosely coupled design [26], where architectural concerns are separated among various standardized components such as routers, Web servers, and Web browsers, resulting in a flexible, extensible, and decentralized system that is simple to maintain and capable of massive scalability. Unlike distributed systems, which hide distribution, decentralized systems make it explicit with the eventual goal of architecting a system of systems.

Based on these REST principles, we present the Resource Linking Language (ReLL), which describes RESTful Web services and provides a natural mapping from the graph-oriented world of RESTful services (resources interlinked by links found in resource representations) to the graph-based model of RDF. By means of a ReLL description, a set of REST resources is described and exposed. Three applications were described and their resources harvested into a triple store. Section 2 briefly discusses related approaches, and Section 3 describes the proposed language.

2. RELATED WORK
Semantic Web Services (SWS) for REST are mainly focused on providing a semantic description of a REST service. SA-REST [21] and hREST/MicroWSMO [20] provide a list of input and output parameters, methods, and URIs exposed by a REST service by means of property-value pairs or RDFa [1] annotations. The description itself can be transformed to RDF using a GRDDL-based [12] strategy for generating a domain ontology in RDF, but no information about the REST resources themselves is retrieved.

The Web Application Description Language (WADL) [16] describes RESTful services and places resources, identified by predefined URI patterns, as first-class objects in a description. WADL only supports HTTP methods with request and response elements. These elements contain representations with a media type and (possibly) another URI. Representations contain typed parameters that in turn contain links to other resources’ URIs. Generally speaking, WADL attempts to completely describe all possible aspects of a RESTful service, down to predefined URI patterns and the ways in which query parameters have to be composed for certain types of requests, introducing a higher level of coupling for clients using such descriptions.

In the same line, Battle and Benson [6] propose semantic annotations, similar to SA-REST, and extensions to SPARQL in order to support an HTTP REST uniform interface. They also propose extensions to the payload of the HTTP REST methods (e.g., PUT, DELETE, and GET) for maintaining consistency between a REST resource and its semantic equivalent (a triple) in some triple store.

The main problem of these approaches is that they follow the WSDL/SOAP service model; they do not align well with the principles of RESTful service design, since they disregard fundamental properties such as the hypermedia nature of REST and the possibility of multiple representations for the resources. They also introduce coupling in their design by adhering to URI templates for describing the URIs of resources, input, and output parameters [25], or, in the case of Battle and Benson, they introduce new semantics to the standard REST interface.

EXPRESS [4] is a SWS model that explicitly avoids the RPC orientation of the approaches mentioned so far. It starts from HTTP’s uniform interface, and then describes the available resources in an OWL ontology. However, the model of EXPRESS is a centralized one as well, because it is assumed that there is a complete description of a Web Service’s available resources, and then this description is used to generate URIs for classes, instances, and properties.

On the Linked Data side, the Vocabulary of Interlinked Datasets (voiD) [3] describes datasets (sets of RDF triples) as well as Linksets, that is, sets of triples where the subject belongs to a dataset different from the object’s dataset. Directionality of the links can be modeled, and other properties such as licensing (dcterms:license), the number of triples available in the dataset (void:statItem), the vocabularies used in the dataset, and a SPARQL endpoint are also provided. voiD is accompanied by a Sitemap protocol extension that indicates the location (URI) of the voiD description so that (semantic) Web crawlers can find it and use voiD’s information to index the dataset. The Silk-LSL (Link Specification Language) [30] is an XML-based language for defining rules (e.g., similarity metrics) to automatically find certain types of links (e.g., owl:sameAs) between two data sources (that is, to discover Linksets in voiD terms).

voiD’s focus is on providing access and discovery for already existing datasets by publishing metadata, but a more granular approach (i.e., information about the retrieved resources themselves) is not considered. Silk allows better indexing of large centralized collections of RDF data and the discovery of dependencies between these datasets. While these approaches are central to increasing the amount of linked data on the Web, they are rather expensive because they are based on a lot of specialized mapping and publishing work for transforming just one dataset [9].

LRDD, the Link-based Resource Descriptor Discovery [17], is a proposal submitted to the IETF that focuses on the resources rather than the datasets. It allows resources to indicate their descriptor’s location by using links in three modes: the <link> element available in markup representations that support typed relations, such as (X)HTML and Atom; the HTTP Link header; and a Link-pattern contained in the resource’s description document located in the {host}/.well-known/ directory. In all three cases, the descriptor itself depends on the resource’s URI, in the form of {resource uri};about. Unlike the last approach, the former two would require modifying the resources in order to include the <link> elements either in the resource’s code or on the server side in order to process the HTTP header.

As for the descriptor itself, XRD¹, the Extensible Resource Descriptor, defines a small set of elements describing the resource’s URI (and URI template), an XML signature, the expiration date, and links to other resources. Links are also annotated with metadata such as the target resource URI (and its URI template), media type, and the rel property as defined by the HTTP Header Link Relationship Types. This approach implies that there must exist an XRD document per resource (since the set of links is often different for each resource), which introduces high coupling and may be impractical for a Web-scale application.

¹ http://docs.oasis-open.org/xri/xrd/v1.0/xrd-1.0.html
² http://www.w3.org/TR/2009/REC-powder-dr/

If XRD focuses on individual resources, POWDER, the Protocol for Web Description Resources², recommended by the W3C, aims to facilitate the description of groups of resources identified by Internationalized Resource Identifiers (IRIs). An iriset (a set of IRIs, not a set of resources) can be defined in terms of the properties of such IRIs, that is, the accepted schemes (e.g., http, https), hosts, paths, and ports defined via regular expressions. The iriset properties are described by a descriptorset element that groups restriction attributes such as certified (indicates if the description certifies another resource) and sha1sum (providing a SHA-1 sum of
the described resource); and annotation properties such as displaytext (a descriptive text), displayicon (an image URI), and seealso, label, and comment, which provide a related resource URI, a description, and a comment, respectively. Both restriction attributes and annotation properties have well-defined semantics and can be translated automatically to OWL, though they describe high-level attributes. An additional property, typeof, is translated into rdf:type and allows specifying a class for all the elements of an iriset. For instance, we could define the http://twitter.com iriset and indicate later that all the elements identified by such URIs belong to the class twitterPublicTimeLine. Provenance information describing author, date, and validity period (attribution) is also provided.

Unlike XRD, POWDER refers to groups of resources identified by URI patterns (not URI templates) without requiring changes in the resources; furthermore, POWDER makes it possible to assign a class to the group of resources, facilitating later complex operations such as SPARQL queries. On the negative side, POWDER facilitates the description of groups of resources, but it does not provide support for resource discovery or an automatic harvesting process.

In the approach described by Futrelle [14], RDF is used as the “integration layer” in a scenario of heterogeneous data sources, and the main focus is on harvesting well-known and cooperating data sources. This approach can be applied to a variety of data sources, but they have to be cooperating in the sense that they expose RDF themselves. The harvester’s main role is to be notified of new and updated data, and to pull it in from these sources. While this scenario uses RDF’s power to unify heterogeneous data sources on the metamodel level, it is only applicable in closed and cooperating settings. In our approach, data sources are not required to publish RDF themselves. As long as access to data is provided through RESTful services, they can be harvested and used as RDF. A weakness of the current implementation is that updating is not supported in a way that allows efficient incremental updates, but we plan to address this issue in our future work mentioned in Section 6, where we describe extensions to our language that represent update services (and thus the ability to use those for incremental updates) on the language level.

SOFIE [29] focuses on information extraction from Web resources, and ANGIE [27] on using both extracted information and Web service endpoints for building a more interactive system that does not require an exhaustive crawl of data, but retrieves information on demand. SOFIE thus falls into the category of approaches that start from resource representations and use information retrieval methods to extract RDF from them. The current implementation of ANGIE focuses on the dynamics of query processing in the RDF data managed by the system, and uses a hardwired set of Web services as the back-end. Similar to SA-REST, it uses a set of lowering/lifting transformations to translate the results of function calls from and to RDF. ANGIE focuses on SPARQL processing (the framework is able to use Web services while processing SPARQL queries), and less on the ability to easily accommodate a large variety of RESTful services.

Deimos [5] is another system that starts with information found on Web pages or through Web forms, and then uses semantic analysis to map the syntax of these representations to semantically richer information. Instead of relying on the richness of links discovered in known resources, though, the approach taken in Deimos uses tagging services to discover new resources.

Finally, another attempt to provide a bridge between REST and the Semantic Web is the W3C work in progress on an RDF vocabulary representing the HTTP protocol³. The approach captures properties such as the message exchanged (including the HTTP headers), the request (including the method and URI), and the response (including the HTTP status code number) with the goal of facilitating relevant tasks such as content negotiation, as well as additional HTTP headers registered by the Internet Assigned Numbers Authority (IANA).

³ http://www.w3.org/TR/HTTP-in-RDF10/

3. RESOURCE LINKING LANGUAGE
Considering the related work, we derived a set of requirements for a REST resource description language that considers REST constraints. For instance, in order to avoid coupling, URIs must be opaque, multiple representations must be supported, and linking among resources must be considered a fundamental property. In order to accommodate the currently installed infrastructure, the language must require minimal or no intervention in existing Web resources; in order to scale, it must support a partial description of the resources that can later be completed and/or modified; it must describe both single resources and groups of resources, as well as the relationships among them; and finally, it must be simple in order to lower the entry barrier for future developers and foster its adoption.

The main constraints for designing RESTful services are resource identification, linking, and a uniform interface through which linked resources can be accessed. By linking we refer to one of the core aspects of RESTful services, that is, the use of hypermedia as the engine of application state (HATEOAS), which means that service interactions that in non-REST approaches result in server state are actually implemented as clients following links to resources representing that state. This results in services that are resource- and link-centric, and thus a description language for RESTful services should focus on these two aspects.

The other two main constraints of REST, self-describing messages and stateless interactions, are more a question of how resource representations are retrieved and how state is handled when interacting with services. For the purpose of designing RESTful services, all of these design issues are relevant. For the purpose of describing a RESTful service interface, the most important aspects are the resource representations that can be retrieved, the ways in which these can link to other resources, and the protocol interactions that may be required to access those resources. The service semantics also require an understanding of the semantics of the representations involved in the interactions with the service, but for the mere description of a service’s interface, these semantics are not required.

Figure 1 shows the schema of ReLL. Elements are shown as rectangles and attributes as dashed rectangles. Sequences are depicted as a circle with the character “S”. A service exposes a set of one or more resources that have a unique identifier (xml:id), names and descriptions (human-readable labels), and optionally a URI pattern which describes the constraints for the identifiers expected to be used for specific resources (match). A resource may have representations, which are the serialization of the resource in some syntax. This design naturally supports multiple representations for resources, but it does not support, per se, the common practice of some Web services that use different URIs for different representations of the same resource (such as two URIs with .xml and .json suffixes, if these are two supported representation formats)⁴. We discuss this issue further down, when we discuss link types.

⁴ Such variations in the representation’s URIs could easily be covered by a URI pattern for the resource ending with .(xml|json), but the variation of the suffix alone would not imply that it does not actually refer to a different resource, but only to a different representation.

[Figure 1: ReLL Description Schema. Labels in the diagram: service, base, targetNamespace, linktypes, linktype, representations, resources, resource, uri, match, representation, media, schema, link, target, minOccurs, maxOccurs, collection, selector, select, protocol, method, request, response, href, id, name, desc, type.]

Representations can be associated with schemas for possible validation (if schemas exist). Representations can also be defined as part of the service directly, in which case they are abstract, which means that they are not associated with any concrete resources. The most important use cases for abstract representations are conventions for media or data formats that should be described so that they can be reused as a foundation for describing concrete resource representations. A real-world use case for this scenario is an abstract representation describing the media type application/xml, which serves as the basis for the abstract representation describing the application/atom+xml media type for feeds according to Atom [24], which in turn serves as the basis for the abstract representation describing the paged feeds media type (i.e., feeds implementing feed paging [22]). Eventually, a concrete service providing a resource may use paged feeds, and thus the resource types its representation with the abstract “paged feed” representation. The rationale behind this design is that various representations in this chain of representations define different linking mechanisms (paged feeds extend Atom with new link relationships), and the effective set of link types that can appear in a concrete resource using the paged feed representation is thus the union of these different link types. Representations can be based on other representations, but only on abstract representations. The other use case of abstract representations is representations that are derived from concrete representations, such as a collection of representations that is available through a paging mechanism in representation formats.

Each representation can contain any number of links. A link is retrieved from the representation by using selectors. Selectors depend on the representation format, and thus their definition and interpretation may depend on a language (selector type) that is appropriate for a certain representation. For instance, for XML representations, the most popular example of a selector mechanism is the XML Path Language (XPath) [11, 7], which allows structured selections within XML document trees. A link defines a possible association leading from the resource’s representation containing the link to another resource, as determined by the target. Instead of a resource URI, the target contains a valid resource id in order to avoid coupling with the resources’ naming scheme.

A link has a link type which represents the semantics of the link, but ReLL does not make any attempt to formalize the semantics; link types have a name and a description and thus can be documented in a service description, but their semantics are outside the scope of the description language. Links can also contain protocol descriptions, which for each link specify the rules that govern the interaction with the linked resource. This is important because links in RESTful services not only have application-specific semantics; following the links may also require different ways of using the uniform interface provided by a certain protocol.
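
As an illustration only, the resource, URI pattern, link, selector, and target concepts described above can be sketched in Python. The XML vocabulary below is hypothetical: it borrows element and attribute names from Figure 1, but the nesting, the example.org URIs, the use of an href attribute on selected nodes, and the helper functions resource_type and extract_links are assumptions made for this sketch, not the actual ReLL syntax or the authors' implementation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical ReLL-like description (assumed syntax, not the real schema).
RELL = r"""
<service base="http://example.org/school#">
  <resources>
    <resource id="person" name="Person">
      <uri match="^http://example\.org/people/[^/]+$"/>
      <representation id="person-xml" media="application/xml">
        <link type="person-course" target="course">
          <selector type="xpath" select=".//course"/>
        </link>
      </representation>
    </resource>
    <resource id="course" name="Course">
      <uri match="^http://example\.org/courses/[^/]+$"/>
    </resource>
  </resources>
</service>
"""

# Made-up XML representation of a "person" resource.
PERSON = """<person><name>Ada</name>
  <course href="http://example.org/courses/i290"/>
  <course href="http://example.org/courses/i202"/>
</person>"""


def resource_type(service, uri):
    """Return the id of the first resource whose URI pattern matches uri."""
    for res in service.iter("resource"):
        if re.match(res.find("uri").get("match"), uri):
            return res.get("id")
    return None


def extract_links(service, resource_id, representation_xml):
    """Apply each link's selector; return (link type, target id, URI) tuples."""
    doc = ET.fromstring(representation_xml)
    res = next(r for r in service.iter("resource") if r.get("id") == resource_id)
    found = []
    for link in res.iter("link"):
        select = link.find("selector").get("select")
        for node in doc.findall(select):
            # The target names a resource id, not a URI, to avoid coupling.
            found.append((link.get("type"), link.get("target"), node.get("href")))
    return found


service = ET.fromstring(RELL.strip())
print(resource_type(service, "http://example.org/people/ada"))
for link in extract_links(service, "person", PERSON):
    print(link)
```

Note that ElementTree supports only a subset of XPath, so the selector here picks elements and the URI is read from an attribute afterwards; a full XPath engine would allow the selector to address the attribute directly.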
[Figure 2: Generating RDF triples from ReLL descriptions. Labels in the diagram: rell:resource, rell:representation, rell:link, rell:represents, iana:representation, base:resource_id, base:link_type, base:link_id, base:representation_type, base:timestamp, http:ETag, dcterms:created, vCard:EMAIL, vCard:TITLE, resource_uri1, resource_uri2.]
Thus, it is possible for each link to specify how this link is named with an ID or timestamp (e.g., base:r123456789)
has to be traversed using a specific protocol. Practically that refers to the source or representation from where the
speaking, this means that after a link’s URI has been deter- graph information was collected. The representation is an
mined (for example by extracting the URI using a selector), instance of the representation type defined in the ReLL de-
the protocol is determined by inspecting the URI’s scheme, scription for the retrieved REST resource.
and then the protocol description might give additional hints Representations are subclasses of a concrete media type
about how to use methods or compose entities for invoking that can be derived from abstract representations or ab-
the uniform interface. Thus protocol descriptions are just stract media types as annotated in the ReLL descriptions.
one (the interface-specific) part of describing link semantics. Abstract representations are supported as classes that serve
as the basis for other abstract or concrete representations.
For representations, the upper ontology contains all stan-
4. FROM RELL TO RDF dardized media types from the IANA registry as classes.
ReLL main elements such as resource, representation, and The representation is then part of the provenance infor-
link serve as the core elements for a RDF/OWL minimal mation obtained when retrieved the REST resources (see
vocabulary shown in Figure 2 under the “rell” namespace. dashed elements in Figure 2). Other information such as
Resource, and representation are concepts while link, and the ETag property served by the Web server when retriev-
represents are predicates. Since ReLL describes a REST ing the REST resource is also collected if available; the date
application, it is used to generate a domain ontology for the when the information was retrieved (and hence the named
application. The resource id annotated in ReLL is used as graph was created) is also annotated. Other information as
the resource’s type and the link type as the predicate that indicated by [18] could also be included in future develop-
relates two resources. Domain specific resources are also ments.
subclasses of the rell:resource entity, and currently form
a domain-specific vocabulary by using the ReLL service’s
attribute base. 5. IMPLEMENTATION
We are maintaining the actual REST resources’ URIs to As a proof of concept, we have implemented RESTler [2],
identify them in the realm of the Semantic Web, however a crawler that follows the rules defined by ReLL descriptions
they are considered instances of the domain-specific classes in order to harvest REST resources. A complementary com-
discussed before. REST resources are linked together with ponent (a Translator) transforms the retrieved resources into
a link id instead of a link type. REST resources’ themselves RDF. Figure 3 describes the principal components of the
can be transformed to RDF following a GRDDL approach. approach. Rectangles represent software components, UML
For instance, in Figure 2, a resource is annotated with prop- note figures are used to represent files, straight lines repre-
erties defined in the vCard vocabulary, including simple (lit- sent information flow required in the configuration phase of
erals) and complex attributes (e.g. the EMAIL is generated the process (static), while dashed lines represent informa-
as an internal blank node). Naturally, the proper vocabu- tion flow that take place while the crawling process is being
laries depend on the resources. executed (dynamic).
With this approach, it is possible to retrieve a graph of RESTler, is a crawler that parses and uses ReLL descrip-
triples describing a REST resource (URI and attributes) tions as instructions for retrieving REST services’ resources.
and its relation to another REST resource, as shown by The crawler takes as input an XML document which is a
the dashed rectangle in Figure 2. The resulting graph [10] ReLL description, and a set of seed URIs (Figure 3), and
ReLL
RESTler
Translator vocabulary
seeds Crawler
.owl
HTTP client
Tiddy IANA
mapping vocabulary
.properties RDF/OWL .owl
RESTful service ReLL
description
REST .xslt
.xml file
resource Triplestore
representation .xslt
Figure 3: RESTler Architecture
produces as output a typed graph of the crawled resources as provenance information (base:timestamp). Additional
and the links connecting them. The crawler also takes as information is obtained trough XSLT files transforming re-
input authentication information, only basic authentication sources into RDF sentences, as indicated for the correspond-
is supported (username and password sent in the HTTP ing mapping file. Each ReLL document is transformed into
request) currently, but we plan to extend the crawler in or- RDF with a generic XSLT generating an ontology specific
der to support other authentication schemes (e.g., OAuth, to each application domain. Generated named graphs are
AuthSub). stored in a triple store. We use Sesame 2.0 as triple store
The crawler parses the description file, dereferences the and the system is implemented in Java. Sesame supports
initial URI (seeds), and retrieves the resource representation named graphs as quads, and we use the fourth component
considering the protocol, request method, and resource me- for storing provenance information.
dia type provided. Currently we support HTTP (an HTTP Finally, for each retrieved resource, the crawler recursively
client), and HTML, XHTML, Atom, JSON, RSS, and XML repeats the whole process.
as media types, and only the GET method. But the crawler
can be extended to support other media types, protocols 5.1 School/Twitter/Flickr and User Matching
and request methods. We applied RESTler to four scenarios: a subset of the
The resource URI is matched against a regular expres- Web site of the Information School at UC Berkeley, and two
sion that defines the resource type or id. From the retrieved well known REST-based applications, Twitter and Flickr.
representation, the crawler obtains the list of embedded links to
other representations by applying an XPath expression (selector).
The link’s target indicates the expected resource type and requires
additional information such as the protocol and request method to
follow, and the expected media type. If the target is not present in
the link element, a “nofollow” condition is implied, since it is not
possible to crawl the linked resource (i.e., there is no information
about the media type, protocol, request method or expected resource
type).

It is possible as well to support computed links, that is, links
that are calculated.⁵ The crawler also evaluates whether the
resource fulfills certain restrictions such as the type of the linked
resources (target attribute), and the cardinality of the retrieved
links (minOccurs and maxOccurs attributes for the selector element).
These restrictions are optional and allow the crawler to determine
whether the resource is well-formed and satisfies the preconditions
given in the service description.

For each graph retrieved, a Translator is invoked for generating RDF
triples based on the ReLL description, that is, the subjects
(resources’ URIs), properties (rdf:type, base:link id) and objects
(linked resources’ URIs or values), as well

⁵ Based on the ongoing work on the URI Template [15] language, it
might in the future be possible to define additional ways in which a
URI can be composed based on input values obtained from the current
representation.

The fourth service provides mappings among the users in each of
these domains so that we can establish useful equivalences by means
of an owl:sameAs property. ReLL descriptions were created for each
scenario and we retrieved 11,353 resources and 22,309 links among
them, which generated 55,548 triples.

Figure 4 presents the ontology that was generated after transforming
ReLL descriptions into RDF through a generic XSLT definition. The
image was generated using OntoViz⁶ and was later refined for
readability. The upper left corner presents the representation
classes and their corresponding iana media-types (e.g.
iana-app:xhtml+xml, iana-app:atom+xml, iana-app:xml, iana-txt:html
and image media types). The right-hand side presents the classes that
model the UC Berkeley school domain’s resources (e.g. school:person,
school:course, etc.) and the relationships among resources (e.g.
school:person-course).

The left-hand side shows the classes corresponding to the Flickr
domain (e.g. flickr:photostream, flickr:photo, etc.) and their
relationships (e.g. flickr:photo-sizes). At the bottom of the figure,
a subgraph describes the classes that model the Twitter domain (e.g.
twitter:follower, twitter:user, etc.) and the hyperlinks or
relationships among them (e.g. twitter:status-reply). At the center
of the figure the minimal ontology described in Figure 2 is
highlighted in bold and italics.

⁶ A Protege plugin that generates .dot files
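The link selection step described above (a selector, an optional
target, and the optional minOccurs and maxOccurs restrictions) can be
illustrated with a minimal Python sketch. It is not the RESTler
implementation: the LinkRule class, its field names, and the sample
page are assumptions made for illustration, and the selectors use
only the XPath subset supported by Python's standard library.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkRule:
    """Hypothetical, simplified stand-in for a ReLL link description."""
    link_id: str
    selector: str                     # XPath subset understood by ElementTree
    target: Optional[str] = None      # expected resource type; None implies "nofollow"
    min_occurs: int = 0
    max_occurs: Optional[int] = None  # None means unbounded


def select_links(representation: str, rule: LinkRule) -> List[str]:
    """Apply the selector and return the href of every matched element."""
    root = ET.fromstring(representation)
    return [el.get("href") for el in root.findall(rule.selector) if el.get("href")]


def satisfies_cardinality(count: int, rule: LinkRule) -> bool:
    """Check the optional minOccurs/maxOccurs style restriction."""
    if count < rule.min_occurs:
        return False
    return rule.max_occurs is None or count <= rule.max_occurs


def links_to_follow(representation: str, rule: LinkRule) -> List[str]:
    """Return the links the crawler may follow for this rule."""
    links = select_links(representation, rule)
    if not satisfies_cardinality(len(links), rule):
        raise ValueError(
            f"{rule.link_id}: found {len(links)} links, outside the declared bounds"
        )
    if rule.target is None:
        return []  # no target: the linked resource cannot be crawled
    return links


# Illustrative representation; the markup and URIs are made up.
PAGE = """<html><body>
  <a class="course" href="/courses/1">Course 1</a>
  <a class="course" href="/courses/2">Course 2</a>
  <a class="site" href="http://example.org/">Home page</a>
</body></html>"""

courses = LinkRule("person-course", ".//a[@class='course']", target="course", min_occurs=1)
website = LinkRule("person-website", ".//a[@class='site']")  # no target given

print(links_to_follow(PAGE, courses))
print(links_to_follow(PAGE, website))
```

In this sketch a violated restriction raises an error; a crawler
could equally log the mismatch and continue, since the restrictions
are optional.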
[Figure 4 diagram. Recoverable node labels: representation classes
(e.g. school:person-html, school:course-html, flickr:photo-html,
flickr:image-jpeg, twitter:status-xml, twitter:user-timeline-xml);
IANA media types (iana_app:xml, iana_app:atom+xml,
iana_app:xhtml+xml, iana_txt:html, iana_img:gif, iana_img:jpeg);
resource classes (e.g. school:person, school:course,
school:publication, flickr:photo, flickr:photostream, twitter:user,
twitter:status, um:usermap); link properties marked with an asterisk
(e.g. school:person-course*, flickr:photo-sizes*,
twitter:status-reply*, twitter:timeline-page2* through
twitter:timeline-page13*); and the core elements rell:resource,
rell:collection, rell:representation, rell:link* and
rell:represents*. Arcs between classes are labeled isa.]
Figure 4: Resulting ontologies generated from ReLL descriptions
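The um:usermap class in Figure 4 corresponds to the fourth service,
whose user mappings are used to establish owl:sameAs equivalences. A
minimal Python sketch of that step follows. The XML layout of the
usermap, the element names (user, account), and the example URIs are
assumptions for illustration; the paper only states that the usermap
is an XML list mapping users' URIs.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical usermap layout: one <user> entry per person, one
# <account> per application in which that person has a URI.
USERMAP = """<usermap>
  <user>
    <account href="http://school.example/people/alice"/>
    <account href="http://flickr.example/people/alice"/>
    <account href="http://twitter.example/alice"/>
  </user>
</usermap>"""


def same_as_triples(usermap_xml):
    """Emit one owl:sameAs statement (N-Triples syntax) per pair of URIs of a user."""
    root = ET.fromstring(usermap_xml)
    triples = []
    for user in root.findall("user"):
        uris = [account.get("href") for account in user.findall("account")]
        for left, right in combinations(uris, 2):
            triples.append(f"<{left}> <{OWL_SAME_AS}> <{right}> .")
    return triples


for line in same_as_triples(USERMAP):
    print(line)
```

Emitting every pair is one possible design choice; since owl:sameAs
is symmetric and transitive, linking each URI to a single canonical
one would carry the same information with fewer triples.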
a school:person ;
vCard:FN "Erik Wilde" ;
vCard:ADR _:node14m5kienpx1603 ;
vCard:TITLE "Adjunct Professor" ;
vCard:ORG _:node14m5kienpx1604 ;
vCard:EMAIL _:node14m5kienpx1606 ;
vCard:TEL _:node14m5kienpx1607 ;
vCard:URL ;
vCard:PHOTO ;
school:person-website ;
school:person-course ,
,
,
.
Figure 5: Describing an instance of a school:person resource using N3 notation
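The Translator step, which produces triples from subjects (resource
URIs), properties (rdf:type and link identifiers) and objects (linked
resource URIs), can be sketched in Python as follows. This is an
illustrative sketch rather than the authors' code: the
CrawledResource class, the base namespace, and the example URIs are
hypothetical, and the output uses N-Triples syntax rather than the N3
notation of Figure 5.

```python
from dataclasses import dataclass, field
from typing import Dict, List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/school#"  # hypothetical namespace for one service


@dataclass
class CrawledResource:
    """Hypothetical record of what the crawler retrieved for one resource."""
    uri: str
    resource_type: str                       # e.g. "person"
    links: Dict[str, List[str]] = field(default_factory=dict)  # link id -> target URIs


def to_ntriples(resource: CrawledResource, base: str = BASE) -> List[str]:
    """Emit one rdf:type triple plus one triple per retrieved link."""
    lines = [f"<{resource.uri}> <{RDF_TYPE}> <{base}{resource.resource_type}> ."]
    for link_id, targets in resource.links.items():
        for target in targets:
            lines.append(f"<{resource.uri}> <{base}{link_id}> <{target}> .")
    return lines


person = CrawledResource(
    uri="http://school.example/people/alice",
    resource_type="person",
    links={
        "person-course": [
            "http://school.example/courses/1",
            "http://school.example/courses/2",
        ],
        "person-website": ["http://alice.example/"],
    },
)

for line in to_ntriples(person):
    print(line)
```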
Collections of resources can also be identified. For instance, the
bottom of Figure 4 depicts the arcs between two resources,
twitter:user-timeline and twitter:paged-user-timeline, which describe
a pagination relationship: 13 pages of the twitter:user-timeline were
collected, and the pagination scheme is described as links that lead
to a numbered page (e.g. twitter:timeline-page2,
twitter:timeline-page3, etc.). In the case of Flickr and the
Information School, the pagination scheme uses links such as the
first, last, next and previous page.

The fourth RESTful service, the Usermap, is shown as a single class
near the center of the figure. This is because the ReLL file contains
only one class of resource (the usermap), that is, an XML list
mapping the users’ URIs between the other three applications.

The REST resources themselves are transformed to RDF following a
GRDDL approach. Figure 5 shows the attributes obtained for
individuals of type school:person. Notice that it is possible to
annotate the relationships between the REST resource (erikwilde) and
its attributes. In the figure these relationships are annotated with
vCard, but other information models can be used.

6. CONCLUSIONS
The REST community is still discussing whether RESTful services
should be described at all, and how such a description language could
increase the coupling between a service provider and a service
consumer, so that REST’s goal of loosely coupled services could be
compromised. We take a pragmatic position and claim that it is
important to keep in mind that any kind of contract will introduce
some coupling, that even loosely coupled services need a shared set
of assumptions, and that a more formal way of describing those
assumptions will help service providers and consumers in service
documentation and consumption. A recent upswing of discoverable links
between Web resources (such as an uptake of microformats [19]) has
led to the idea of a central registry for link relationships in the
realm of Web linking [23], but this activity is still under active
development.

Our model is still a static description of RESTful services that does
not cover the cases in which new resources or identification and
access schemes are introduced. However, such a description makes it
possible to describe the status quo and the cases which a client
should expect, and therefore also to reliably discover cases in which
these constraints are no longer satisfied, for example when new
representations or new identification and access schemes are used.

Furthermore, this kind of RESTful service description can also
include the set of preconditions that must be satisfied by a client
to be able to consume a service. Should these preconditions change
(because the service changes), then an analysis of the description of
the preconditions used by the client allows the client to detect the
change (for example, a new representation format has been
introduced), and to react in an appropriate way (for example,
alerting the client manager, attempting a fallback, or aborting). By
supporting the description of a set of preconditions, the description
language can achieve loose coupling [26] and still allow clients to
detect when they encounter something that they have not been designed
for. As for future work, we are planning to consider more complex
data models that also support methods such as PUT, DELETE and POST,
allowing us to model resources that can be modified, and their
relation with the SPARQL proposals for supporting such operations
[31].

Our minting process consists of selecting the appropriate name for
the namespace (base), resource IDs, link IDs, link types, and
representation IDs. In the example presented in Figure 5, the
resource instance’s namespace and predicates chosen for this
description correspond to vCard, but other properties (e.g. foaf)
could also be used. We believe that the selection of such properties
must be the responsibility of the ReLL designer. Furthermore, the
properties used in the ReLL description itself (e.g. school:person)
could also be described using Linked Data vocabularies. By following
this approach the results of RESTler (e.g. triple datasets) could be
better integrated with other Linked Data sources and the Linked Open
Data cloud.

By considering the URIs corresponding to REST resources, a natural
content negotiation with the Web server will be possible in order to
retrieve an RDF-friendly media type (e.g. application/rdf+xml) or the
human-readable Web version of the same resource. As for limitations,
we need to prepare a ReLL document for each REST service. This
approach has been successfully followed by others such as Virtuoso’s
Sponger, which prepares Sponges or Cartridges tailored for an
application interface such as REST APIs, known metadata such as MS
Office, or known Web sites such as YouTube. RDB2RDF⁷ is also an
ad-hoc approach that transforms RDBMS to RDF representations.

⁷ http://www.w3.org/2005/Incubator/rdb2rdf/

We believe that by choosing Web technologies such as XPath, XSLT and
XML as the basis for ReLL documents, we are lowering the entry
barrier to publishing semantic resources, since most Web developers
have the knowledge and tools required to create their own ReLL
description. This approach also allows developers to control the
information they are collecting. Our next challenge is to further
facilitate the creation of ReLL documents by supporting the dynamic
and automatic generation of ReLL descriptions. One of the challenges
of this goal is the fact that we need to design a specific XSLT for
each resource type in order to harvest specific information. A fully
automatic approach would require information retrieval, text mining
and probably machine learning techniques, which greatly increase the
costs of the transformation and raise the entry barrier for
technology adopters.

A document such as ReLL may serve as an intermediate layer that
automatic agents can also use as a contract describing the
capabilities of a REST service and translating them into RDF triples,
by following the semantics (types) made explicit in the document. Our
approach can be seen as a complement to proposals such as voiD, since
voiD describes the resulting datasets but does not support the triple
harvesting process. Our approach will allow any Web content provider
to publish ReLL descriptions for others to crawl their Web sites, or
third parties to develop a Web site’s description that accommodates
their needs. The crawler’s result is a dataset that can then be
described using voiD. Silk can also be used for the definition of
additional link patterns such as the user mapping that we created
manually in this version; and LRDD’s linking techniques can also be
applied, since they may allow resources to link to their
descriptions.

We have placed strong emphasis on a decoupled approach, where the
components of the architecture maintain a certain degree of
independence, require knowledge and tools already available and
familiar to most Web developers, and provide a simple model that
should again be familiar to Web developers. Our final goal is to
contribute to making more semantic information available while
keeping a low entry barrier for developers.

7. ACKNOWLEDGMENTS
This work was partially funded by CONICYT/Bicentennial Becas-Chile
2009.

8. REFERENCES
[1] Ben Adida, Mark Birbeck, Shane McCarron,
and Steven Pemberton. RDFa in XHTML: Syntax
and Processing — A Collection of Attributes and
Processing Rules for Extending XHTML to Support
RDF. World Wide Web Consortium, Recommendation
REC-rdfa-syntax-20081014, October 2008.
[2] Rosa Alarcón and Erik Wilde. RESTler: Crawling
RESTful Services. In 19th International World Wide
Web Conference Posters, Raleigh, North Carolina,
April 2010. ACM Press.
[3] Keith Alexander, Richard Cyganiak, Michael
Hausenblas, and Jun Zhao. Describing Linked
Datasets. In 2nd Workshop on Linked Data on the
Web, Madrid, Spain, April 2009.
[4] Areeb Alowisheq, David E. Millard, and
Thanassis Tiropanis. EXPRESS: EXPressing
REstful Semantic Services Using Domain Ontologies.
In Bernstein et al. [8], pages 941–948.
[5] José Luis Ambite, Sirish Darbha, Aman Goel,
Craig A. Knoblock, Kristina Lerman, Rahul
Parundekar, and Thomas Russ. Automatically
Constructing Semantic Web Services from Online
Sources. In Bernstein et al. [8], pages 17–32.
[6] Robert Battle and Edward Benson. Bridging the
Semantic Web and Web 2.0 with Representational
State Transfer (REST). Journal of Web Semantics,
6(1), 2008.
[7] Anders Berglund, Scott Boag, Donald D.
Chamberlin, Mary F. Fernández, Michael Kay,
Jonathan Robie, and Jérôme Siméon. XML Path
Language (XPath) 2.0. World Wide Web Consortium,
Recommendation REC-xpath20-20070123, January
2007.
[8] Abraham Bernstein, David R. Karger, Tom
Heath, Lee Feigenbaum, Diana Maynard, Enrico
Motta, and Krishnaprasad Thirunarayan,
editors. 8th International Semantic Web Conference,
volume 5823 of Lecture Notes in Computer Science,
Chantilly, Virginia, October 2009. Springer-Verlag.
[9] Uldis Bojārs, John G. Breslin, Vassilios
Peristeras, Giovanni Tummarello, and Stefan
Decker. Interlinking the Social Web with Semantics.
IEEE Intelligent Systems, 23(3):29–40, May 2008.
[10] Jeremy J. Carroll, Christian Bizer, Pat Hayes,
and Patrick Stickler. Named Graphs, Provenance
and Trust. In Allan Ellis and Tatsuya Hagino,
editors, 14th International World Wide Web
Conference, pages 613–622, Chiba, Japan, May 2005.
ACM Press.
[11] James Clark and Steven J. DeRose. XML Path
Language (XPath) Version 1.0. World Wide Web
Consortium, Recommendation REC-xpath-19991116,
November 1999.
[12] Dan Connolly. Gleaning Resource Descriptions from
Dialects of Languages (GRDDL). World Wide Web
Consortium, Recommendation REC-grddl-20070911,
September 2007.
[13] Roy Thomas Fielding and Richard N. Taylor.
Principled Design of the Modern Web Architecture.
ACM Transactions on Internet Technology,
2(2):115–150, May 2002.
[14] Joe Futrelle. Harvesting RDF Triples. In Luc
Moreau and Ian Foster, editors, International
Provenance and Annotation Workshop (IPAW 2006),
volume 4145 of Lecture Notes in Computer Science,
pages 64–72, Chicago, Illinois, May 2006.
Springer-Verlag.
[15] Joe Gregorio. URI Template. Internet Draft
draft-gregorio-uritemplate-04, March 2010.
[16] Marc Hadley. Web Application Description
Language. World Wide Web Consortium, Member
Submission SUBM-wadl-20090831, August 2009.
[17] Eran Hammer-Lahav. Link-based Resource
Descriptor Discovery. Internet Draft
draft-hammer-discovery-03, March 2009.
[18] Olaf Hartig and Jun Zhao. Using Web Data
Provenance for Quality Assessment. In First
International Workshop on the Role of Semantic Web
in Provenance Management, Washington, D.C.,
October 2009.
[19] Rohit Khare and Tantek Çelik. Microformats: A
Pragmatic Path to the Semantic Web. In 15th
International World Wide Web Conference Posters,
Edinburgh, UK, May 2006. ACM Press.
[20] Jacek Kopecký, Karthik Gomadam, and Tomas
Vitvar. hRESTS: An HTML Microformat for
Describing RESTful Web Services. In 2008
IEEE/WIC/ACM International Conference on Web
Intelligence, pages 619–625, Sydney, Australia,
December 2008.
[21] Jon Lathem, Karthik Gomadam, and Amit P.
Sheth. SA-REST and (S)mashups: Adding Semantics
to RESTful Services. In First IEEE International
Conference on Semantic Computing (ICSC 2007),
pages 469–476, Irvine, California, September 2007.
[22] Mark Nottingham. Feed Paging and Archiving.
Internet RFC 5005, September 2007.
[23] Mark Nottingham. Web Linking. Internet Draft
draft-nottingham-http-link-header-08, March 2010.
[24] Mark Nottingham and Robert Sayre. The Atom
Syndication Format. Internet RFC 4287, December
2005.
[25] Cesare Pautasso. Composing RESTful services with
JOpera. In Alexandre Bergel and Johan Fabry,
editors, International Conference on Software
Composition 2009, volume 5634 of Lecture Notes in
Computer Science, pages 142–159, Zürich,
Switzerland, July 2009. Springer-Verlag.
[26] Cesare Pautasso and Erik Wilde. Why is the Web
Loosely Coupled? A Multi-Faceted Metric for Service
Design. In Quemada et al. [28], pages 911–920.
[27] Nicoleta Preda, Fabian M. Suchanek, Gjergji
Kasneci, Thomas Neumann, Maya Ramanath, and
Gerhard Weikum. ANGIE: Active Knowledge for
Interactive Exploration. In 35th International
Conference on Very Large Data Bases (VLDB 2009),
pages 1570–1573, Lyon, France, August 2009. ACM
Press.
[28] Juan Quemada, Gonzalo León, Yoëlle S.
Maarek, and Wolfgang Nejdl, editors. 18th
International World Wide Web Conference, Madrid,
Spain, April 2009. ACM Press.
[29] Fabian M. Suchanek, Mauro Sozio, and Gerhard
Weikum. SOFIE: A Self-Organizing Framework for
Information Extraction. In Quemada et al. [28], pages
911–920.
[30] Julius Volz, Christian Bizer, Martin Gaedke,
and Georgi Kobilarov. Discovering and
Maintaining Links on the Web of Data. In Bernstein
et al. [8], pages 650–665.
[31] Erik Wilde and Michael Hausenblas. RESTful
SPARQL? You Name It! — Aligning SPARQL with
REST and Resource Orientation. In Walter Binder
and Erik Wilde, editors, 4th Workshop on Emerging
Web Services Technology (WEWST 2009), pages
39–43, Eindhoven, Netherlands, November 2009.