Linking Data from RESTful Services

Rosa Alarcon
Departamento de Ciencia de la Computacion
Pontificia Universidad Catolica de Chile
ralarcon@ing.puc.cl

Erik Wilde
School of Information
UC Berkeley
dret@berkeley.edu

ABSTRACT
One of the main goals of the Semantic Web is to extend current human-readable Web resources with semantic information encoded in a machine-processable form. One of its most successful approaches is the Web of Data, which, by following the principles of Linked Data, has made available several data sources compliant with Semantic Web technologies such as RDF triple stores and SPARQL endpoints. On the other hand, the set of architectural principles that underlie the human-readable Web has been conceptualized as the Representational State Transfer (REST) architectural style. In this paper, we distill REST concepts in order to provide a mechanism for describing REST (i.e., human-readable Web) resources and transforming them into semantic resources. The strategy allowed us to harvest already existing Web resources without requiring changes to the original sources or ad-hoc interfaces. The presented strategy aims to contribute to the availability of more semantic datasets and to become a further step toward lowering the entry barrier to publishing semantic resources.

Categories and Subject Descriptors
H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services, Data sharing

General Terms
Design, Documentation, Languages

Keywords
REST, Web Data, Crawling

Copyright is held by the author/owner(s).
LDOW 2010, April 27, 2010, Raleigh, North Carolina.

1. INTRODUCTION
There is an increasing interest in the relationship between Representational State Transfer (REST) [13] and the Semantic Web, which has resulted in approaches ranging from the semantic annotation of Web resources to middleware that mediates resource handling. These approaches resemble the strategies of more traditional SOAP/WSDL semantic services and neglect basic REST properties. REST principles are related to Linked Data principles in the sense that resources have a unique identifier (URI) that must be dereferenceable through HTTP; resources are interlinked, and by following those links new resources can be discovered. However, differences arise when getting deeper into the principles and rationale of both fields. For instance, on the Linked Data side, research projects aim to create large collections of RDF data by transforming structured data sources into RDF using specialized mappings, and exposing the generated RDF dataset as RDF triple stores, often with SPARQL endpoints. Although this strategy makes large collections of RDF data available, it also results in centralistic approaches where access is typically mediated through a single "endpoint" (e.g., a dump of the whole site, a SPARQL endpoint, a Tabulator-like interface, etc.). Moreover, due to the heterogeneous nature of the data sources' interfaces, such approaches require sophisticated mechanisms to retrieve, process, and publish the information [9], which challenges the scalability and accuracy of the exposed data, since it can be outdated.

One of the main tenets of REST is the primacy of resources that are uniquely identified by opaque URIs; that is, in order to avoid coupling between clients and servers, no assumptions must be made about the structure of the URI [26]. REST requires a uniform interface, that is, a set of operations or methods with known semantics that change the state of the resources. The interface depends on the URI scheme; for HTTP, the standard methods are GET, PUT, POST, DELETE, and OPTIONS. Methods are external to the resources, and are invoked by sending standard messages to the Web server indicating the URI of the requested resource, the method, the payload of the message, and metadata.

A resource can have multiple "representations" that follow a standardized format or media type (e.g., text/html, application/xml, etc.) and can be negotiated with the Web server. Representations convey the state of the client's interaction within the application and contain hyperlinks that allow clients to discover other resources or change the state of the represented resource. Most importantly, REST services have no "endpoints"; instead, they consist of a collection of resource URIs and a set of standard operations. This approach differs greatly from more traditional SOAP/WSDL, where a service publishes an endpoint that exposes the set of available operations (i.e., URIs, encoding, parameters). Such operations have particular semantics that must be known in advance in order to be properly invoked by the client (coupling).

REST yields loosely coupled designs [26], where architectural concerns are separated among various standardized components such as routers, Web servers, and Web browsers, resulting in a flexible, extensible, and decentralized system that is simple to maintain and capable of massive scalability. Unlike distributed systems, which hide distribution, decentralized systems make it explicit with the eventual goal of architecting a system of systems.
Based on these REST principles, we present the Resource Linking Language (ReLL), which describes RESTful Web services and provides a natural mapping from the graph-oriented world of RESTful services (resources interlinked by links found in resource representations) to the graph-based model of RDF. By means of a ReLL description, a set of REST resources is described and exposed. Three applications were described and their resources harvested into a triple store. Section 2 briefly discusses related approaches, and Section 3 describes the proposed language.

2. RELATED WORK
Semantic Web Services (SWS) for REST are mainly focused on providing a semantic description of a REST service. SA-REST [21] and hREST/MicroWSMO [20] provide a list of input and output parameters, methods, and URIs exposed by a REST service by means of property-value pairs or RDFa [1] annotations. The description itself can be transformed to RDF using a GRDDL-based [12] strategy for generating a domain ontology in RDF, but no information about the REST resources themselves is retrieved.

The Web Application Description Language (WADL) [16] describes RESTful services and places resources, identified by predefined URI patterns, as first-class objects in a description. WADL only supports HTTP methods with request and response elements. These elements contain representations with a media type and (possibly) another URI. Representations contain typed parameters that in turn contain links to other resources' URIs. Generally speaking, WADL attempts to completely describe all possible aspects of a RESTful service, down to predefined URI patterns and the ways in which query parameters have to be composed for certain types of requests, introducing a higher level of coupling for clients using such descriptions.

In the same line, Battle and Benson [6] propose semantic annotations, similar to SA-REST, and extensions to SPARQL in order to support an HTTP REST uniform interface. They also propose extensions to the payload of the HTTP REST methods (e.g., PUT, DELETE, and GET) for maintaining consistency between a REST resource and its semantic equivalent (a triple) in some triple store.

The main problem of these approaches is that they follow the WSDL/SOAP service model; they do not align well with the principles of RESTful service design, since they disregard fundamental properties such as the hypermedia nature of REST and the possibility of multiple representations for the resources. They also introduce coupling in their design by adhering to URI templates for describing the URIs of resources, input, and output parameters [25], or, in the case of Battle and Benson, they introduce new semantics to the standard REST interface.

EXPRESS [4] is a SWS model that explicitly avoids the RPC orientation of the approaches mentioned so far. It starts from HTTP's uniform interface, and then describes the available resources in an OWL ontology. However, the model of EXPRESS is a centralized one as well, because it is assumed that there is a complete description of a Web Service's available resources, and then this description is used to generate URIs for classes, instances, and properties.

On the Linked Data side, the Vocabulary of Interlinked Datasets (voiD) [3] describes datasets (sets of RDF triples) as well as Linksets, that is, sets of triples where the subject belongs to a dataset different from the object's dataset. Directionality of the links can be modeled, and other properties such as licensing (dcterms:license), the number of triples available in the dataset (void:statItem), the vocabularies used in the dataset, and a SPARQL endpoint are also provided. voiD is accompanied by a Sitemap protocol extension that indicates the location (URI) of the voiD description so that (semantic) Web crawlers can find it and use voiD's information to index the dataset. The Silk-LSL (Link Specification Language) [30] is an XML-based language that allows one to define rules (e.g., similarity metrics) and to find certain types of links (e.g., owl:sameAs) between two data sources automatically (that is, to discover Linksets in the terms of voiD).

voiD's focus is on providing access and discovery for already existing datasets by publishing metadata, but a more granular approach (i.e., information about the retrieved resources themselves) is not considered. Silk allows better indexing of large centralized collections of RDF data and the discovery of dependencies between these datasets. While these approaches are central to increasing the amount of linked data on the Web, they are rather expensive because they are based on a lot of specialized mapping and publishing work for just transforming one dataset [9].

LRDD, the Link-based Resource Descriptor Discovery [17], is a proposal submitted to the IETF that focuses on the resources rather than the datasets. It allows resources to indicate their descriptor's location by using links in three modes: the link element available in markup representations that support typed relations, such as (X)HTML and Atom; the HTTP Link header; and a Link-pattern contained in the resource's description document located in the {host}/.well-known/ directory. In all three cases, the descriptor itself depends on the resource's URI, in the form of {resource uri};about. Unlike the last approach, the former two would require modifying the resources in order to include the link elements, either in the resource's code or on the server side in order to process the HTTP header.

As for the descriptor itself, XRD¹, the Extensible Resource Descriptor, defines a small set of elements describing the resource's URI (and URI template), an XML signature, the expiration date, and links to other resources. Links are also annotated with metadata such as the target resource URI (and its URI template), the media type, and the link relation property as defined by the HTTP Header Link Relationship Types. This approach implies that there must exist an XRD document per resource (since the set of links is often different for each resource), which introduces high coupling and may be impractical for a Web-scale application.

¹ http://docs.oasis-open.org/xri/xrd/v1.0/xrd-1.0.html

If XRD focuses on individual resources, POWDER, the Protocol for Web Description Resources², recommended by the W3C, aims to facilitate the description of groups of resources identified by Internationalized Resource Identifiers (IRIs). An iriset (a set of IRIs, not a set of resources) can be defined in terms of the properties of such IRIs, that is, the accepted schemes (e.g., http, https), hosts, paths, and ports, defined via regular expressions. The iriset properties are described by a descriptorset element that groups restriction attributes such as certified (indicates if the description certifies another resource) and sha1sum (providing a SHA-1 sum of the described resource), and annotation properties such as displaytext (a descriptive text), displayicon (an image URI), and seealso, label, and comment, which provide a related resource URI, a description, and a comment, respectively. Both restriction attributes and annotation properties have well-defined semantics and can be translated automatically to OWL, though they describe high-level attributes. An additional property, typeof, is also translated into rdf:type and allows specifying a class for all the elements of an iriset. For instance, we could define the http://twitter.com iriset and indicate later that all the elements identified by such URIs belong to the class twitterPublicTimeLine. Provenance information describing author, date, and validity period (attribution) is also provided.

² http://www.w3.org/TR/2009/REC-powder-dr/

Unlike XRD, POWDER refers to groups of resources identified by URI patterns (not URI templates) without requiring changes in the resources; furthermore, POWDER makes it possible to assign a class to the group of resources, facilitating later complex operations such as SPARQL queries. On the negative side, POWDER facilitates the description of groups of resources, but it does not provide support for resource discovery or an automatic harvesting process.

In the approach described by Futrelle [14], RDF is used as the "integration layer" in a scenario of heterogeneous data sources, and the main focus is on harvesting well-known and cooperating data sources. This approach can be applied to a variety of data sources, but they have to be cooperating in the sense that they expose RDF themselves. The harvester's main role is to be notified of new and updated data, and to pull it in from these sources. While this scenario uses RDF's power to unify heterogeneous data sources on the metamodel level, it is only applicable in closed and cooperating settings. In our approach, data sources are not required to publish RDF themselves. As long as access to data is provided through RESTful services, they can be harvested and used as RDF. A weakness of the current implementation is that updating is not supported in a way that allows efficient incremental updates, but we plan to address this issue in our future work mentioned in Section 6, where we describe extensions to our language that represent update services (and thus the ability to use those for incremental updates) on the language level.

SOFIE [29] focuses on information extraction from Web resources, and ANGIE [27] on using both extracted information and Web service endpoints for building a more interactive system that does not require an exhaustive crawl of data, but retrieves information on demand. SOFIE thus falls into the category of approaches that start from resource representations and use information retrieval methods to extract RDF from them. The current implementation of ANGIE focuses on the dynamics of query processing in the RDF data managed by the system, and uses a hardwired set of Web services as the back-end. Similar to SA-REST, it uses a set of lowering/lifting transformations to translate the results of function calls from and to RDF. ANGIE focuses on SPARQL processing (the framework is able to use Web services while processing SPARQL queries), and less on the ability to easily accommodate a large variety of RESTful services.

Deimos [5] is another system that starts with information found on Web pages or through Web forms, and then uses semantic analysis to map the syntax of these representations to semantically richer information. Instead of relying on the richness of links discovered in known resources, though, the approach taken in Deimos uses tagging services to discover new resources.

Finally, another attempt to provide a bridge between REST and the Semantic Web is the W3C work in progress on an RDF vocabulary representing the HTTP protocol³. The approach captures properties such as the message exchanged (including the HTTP headers), the request (including the method and URI), and the response (including the HTTP status code number) with the goal of facilitating relevant tasks such as content negotiation, as well as additional HTTP headers registered by the Internet Assigned Numbers Authority (IANA).

³ http://www.w3.org/TR/HTTP-in-RDF10/

3. RESOURCE LINKING LANGUAGE
Considering the related work, we derived a set of requirements for a REST resource description language that considers REST constraints. For instance, in order to avoid coupling, URIs must be opaque; the language must support multiple representations, and must consider linking among resources as a fundamental property. In order to accommodate the currently installed infrastructure, it must require minimal or no intervention for existing Web resources; in order to scale, it must support a partial description of the resources that can be later completed and/or modified; it must describe both single resources and groups of resources as well as the relationships among them; and finally, it must be simple in order to lower the entry barrier for future developers and foster its adoption.

The main constraints for designing RESTful services are resource identification, linking, and a uniform interface through which linked resources can be accessed. By linking we refer to one of the core aspects of RESTful services, that is, the use of hypermedia as the engine of application state (HATEOAS), which means that service interactions that in non-REST approaches result in server state are actually implemented as clients following links to resources representing that state. This results in services that are resource- and link-centric, and thus a description language for RESTful services should focus on these two aspects.

The other two main constraints of REST, self-describing messages and stateless interactions, are more a question of how resource representations are retrieved, and how state is handled when interacting with services. For the purpose of designing RESTful services, all of these design issues are relevant. For the purpose of describing a RESTful service interface, the most important aspects are the resource representations that can be retrieved, the ways in which these can link to other resources, and the protocol interactions that may be required to access those resources. The service semantics also require an understanding of the semantics of the representations involved in the interactions with the service, but for the mere description of a service's interface, these semantics are not required.

Figure 1 shows the schema of ReLL. Elements are shown as rectangles and attributes as dashed rectangles. Sequences are depicted as a circle with the character "S". A service exposes a set of one or more resources that have a unique identifier (xml:id), names and descriptions (human-readable labels), and optionally a URI pattern which describes the constraints for the identifiers expected to be used for specific resources (match). A resource may have representations, which are the serialization of the resource in some syntax. This design naturally supports multiple representations for resources, but it does not support, per se, the common practice of some Web services that use different URIs for different representations of the same resource (such as two URIs with .xml and .json suffixes, if these are two supported representation formats)⁴. We discuss this issue further down, when we are discussing link types.

⁴ Such variations in the representation's URIs could easily be covered by a URI pattern for the resource ending with .(xml|json), but the variation of the suffix alone would not imply that it does not actually refer to a different resource, but only to a different representation.

Figure 1: ReLL Description Schema (diagram not reproduced; labels in the figure: service, base, targetNamespace, linktypes, linktype, representations, representation, resources, resource, uri, match, collection, link, target, minOccurs, maxOccurs, selector, select, protocol, request, response, method, media, schema, href, id, name, desc, type)

Representations can be associated with schemas for possible validation (if schemas exist). Representations can also be defined as part of the service directly, in which case they are abstract, which means that they are not associated with any concrete resources. The most important use cases for abstract representations are conventions for media or data formats that should be described, so that they can be reused as a foundation for describing concrete resource representations. A real-world use case for this scenario is an abstract representation describing the media type application/xml, which serves as the basis for the abstract representation describing the application/atom+xml media type for feeds according to Atom [24], which in turn serves as the basis for the abstract representation describing the paged feeds media type (i.e., feeds implementing feed paging [22]). Eventually, a concrete service providing a resource may use paged feeds, and thus the resource types its representation with the abstract "paged feed" representation. The rationale behind this design is that various representations in this chain of representations define different linking mechanisms (paged feeds extend Atom with new link relationships), and the effective set of link types that can appear in a concrete resource using the paged feed representation thus is the union of these different link types. Representations can be based on other representations, but only on abstract representations. The other use case of abstract representations is representations that are derived from concrete representations, such as a collection of representations that is available through a paging mechanism in representation formats.

Each representation can contain any number of links. A link is retrieved from the representation by using selectors. Selectors depend on the representation format, and thus their definition and interpretation may depend on a language (selector type) that is appropriate for a certain representation. For instance, for XML representations, the most popular example of a selector mechanism is the XML Path Language (XPath) [11, 7], which allows structured selections within XML document trees. A link defines a possible association leading from the resource's representation containing the link to another resource as determined by the target. Instead of a resource URI, the target contains a valid resource id in order to avoid coupling with the resources' naming scheme.

A link has a link type which represents the semantics of the link, but ReLL does not make any attempt to formalize the semantics; link types have a name and a description and thus can be documented in a service description, but their semantics are outside of the scope of the description language. Links can also contain protocol descriptions, which for each link specify the rules that govern the interaction with the linked resource. This is important because links in RESTful services not only have application-specific semantics; following the links also may require different ways of using the uniform interface provided by a certain protocol. Thus, it is possible for each link to specify how this link has to be traversed using a specific protocol.

Figure 2: Generating RDF triples from ReLL descriptions (diagram not reproduced; labels in the figure: rell:resource, rell:representation, rell:link, rell:represents, iana:representation, base:resource_id, base:link_type, base:link_id, base:representation_type, base:timestamp, resource_uri1, resource_uri2, vCard:EMAIL, vCard:TITLE, http:ETag, dcterms:created)
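The interplay of selectors and targets described in Section 3 can be illustrated with a minimal Python sketch. It applies XPath-like selectors to an Atom representation and pairs each extracted URI with the id of the resource the link is expected to lead to. The LinkRule structure, its field names, and the example.org feed are hypothetical illustrations, not ReLL syntax or RESTler code, and the standard library's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple

# Namespace prefix used inside the selectors (assumption: Atom feeds).
NAMESPACES = {"atom": "http://www.w3.org/2005/Atom"}


@dataclass
class LinkRule:
    """Hypothetical stand-in for one link entry of a service description."""
    link_type: str   # name of the link type, e.g. "timeline-next"
    target: str      # id of the target resource in the description, not a URI
    select: str      # XPath-like selector locating the linking element
    attribute: str   # attribute of that element carrying the URI


def extract_links(representation: str,
                  rules: List[LinkRule]) -> List[Tuple[str, str, str]]:
    """Return (link type, target resource id, URI) for every selector match."""
    root = ET.fromstring(representation)
    found = []
    for rule in rules:
        for element in root.findall(rule.select, NAMESPACES):
            uri = element.get(rule.attribute)
            if uri is not None:
                found.append((rule.link_type, rule.target, uri))
    return found


FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="self" href="http://example.org/timeline?page=1"/>
  <link rel="next" href="http://example.org/timeline?page=2"/>
  <entry><link rel="alternate" href="http://example.org/status/1"/></entry>
</feed>"""

RULES = [
    LinkRule("timeline-next", "user-timeline",
             "atom:link[@rel='next']", "href"),
    LinkRule("timeline-status", "status",
             "atom:entry/atom:link[@rel='alternate']", "href"),
]

links = extract_links(FEED, RULES)
for link in links:
    print(link)
```

Because each rule names a target resource id rather than a URI pattern, the sketch makes no assumption about the service's naming scheme, which is the decoupling the text argues for.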
Practically that refers to the source or representation from where the speaking, this means that after a link’s URI has been deter- graph information was collected. The representation is an mined (for example by extracting the URI using a selector), instance of the representation type defined in the ReLL de- the protocol is determined by inspecting the URI’s scheme, scription for the retrieved REST resource. and then the protocol description might give additional hints Representations are subclasses of a concrete media type about how to use methods or compose entities for invoking that can be derived from abstract representations or ab- the uniform interface. Thus protocol descriptions are just stract media types as annotated in the ReLL descriptions. one (the interface-specific) part of describing link semantics. Abstract representations are supported as classes that serve as the basis for other abstract or concrete representations. For representations, the upper ontology contains all stan- 4. FROM RELL TO RDF dardized media types from the IANA registry as classes. ReLL main elements such as resource, representation, and The representation is then part of the provenance infor- link serve as the core elements for a RDF/OWL minimal mation obtained when retrieved the REST resources (see vocabulary shown in Figure 2 under the “rell” namespace. dashed elements in Figure 2). Other information such as Resource, and representation are concepts while link, and the ETag property served by the Web server when retriev- represents are predicates. Since ReLL describes a REST ing the REST resource is also collected if available; the date application, it is used to generate a domain ontology for the when the information was retrieved (and hence the named application. The resource id annotated in ReLL is used as graph was created) is also annotated. 
Other information as the resource’s type and the link type as the predicate that indicated by [18] could also be included in future develop- relates two resources. Domain specific resources are also ments. subclasses of the rell:resource entity, and currently form a domain-specific vocabulary by using the ReLL service’s attribute base. 5. IMPLEMENTATION We are maintaining the actual REST resources’ URIs to As a proof of concept, we have implemented RESTler [2], identify them in the realm of the Semantic Web, however a crawler that follows the rules defined by ReLL descriptions they are considered instances of the domain-specific classes in order to harvest REST resources. A complementary com- discussed before. REST resources are linked together with ponent (a Translator) transforms the retrieved resources into a link id instead of a link type. REST resources’ themselves RDF. Figure 3 describes the principal components of the can be transformed to RDF following a GRDDL approach. approach. Rectangles represent software components, UML For instance, in Figure 2, a resource is annotated with prop- note figures are used to represent files, straight lines repre- erties defined in the vCard vocabulary, including simple (lit- sent information flow required in the configuration phase of erals) and complex attributes (e.g. the EMAIL is generated the process (static), while dashed lines represent informa- as an internal blank node). Naturally, the proper vocabu- tion flow that take place while the crawling process is being laries depend on the resources. executed (dynamic). With this approach, it is possible to retrieve a graph of RESTler, is a crawler that parses and uses ReLL descrip- triples describing a REST resource (URI and attributes) tions as instructions for retrieving REST services’ resources. and its relation to another REST resource, as shown by The crawler takes as input an XML document which is a the dashed rectangle in Figure 2. 
The resulting graph [10] ReLL description, and a set of seed URIs (Figure 3), and ReLL RESTler Translator vocabulary seeds Crawler .owl HTTP client Tiddy IANA mapping vocabulary .properties RDF/OWL .owl RESTful service ReLL description REST .xslt .xml file resource Triplestore representation .xslt Figure 3: RESTler Architecture produces as output a typed graph of the crawled resources as provenance information (base:timestamp). Additional and the links connecting them. The crawler also takes as information is obtained trough XSLT files transforming re- input authentication information, only basic authentication sources into RDF sentences, as indicated for the correspond- is supported (username and password sent in the HTTP ing mapping file. Each ReLL document is transformed into request) currently, but we plan to extend the crawler in or- RDF with a generic XSLT generating an ontology specific der to support other authentication schemes (e.g., OAuth, to each application domain. Generated named graphs are AuthSub). stored in a triple store. We use Sesame 2.0 as triple store The crawler parses the description file, dereferences the and the system is implemented in Java. Sesame supports initial URI (seeds), and retrieves the resource representation named graphs as quads, and we use the fourth component considering the protocol, request method, and resource me- for storing provenance information. dia type provided. Currently we support HTTP (an HTTP Finally, for each retrieved resource, the crawler recursively client), and HTML, XHTML, Atom, JSON, RSS, and XML repeats the whole process. as media types, and only the GET method. But the crawler can be extended to support other media types, protocols 5.1 School/Twitter/Flickr and User Matching and request methods. We applied RESTler to four scenarios: a subset of the The resource URI is matched against a regular expres- Web site of the Information School at UC Berkeley, and two sion that defines the resource type or id. 
From the retrieved representation, the crawler obtains the list of embedded links to other representations by applying an XPath expression (selector). The link's target indicates the expected resource type and requires additional information such as the protocol and request method to follow, and the expected media type. If the target is not present in the link element, a "nofollow" condition is implied, since it is not possible to crawl the linked resource (i.e., there is no information about the media type, protocol, request method, or expected resource type).

It is also possible to support computed links, that is, links that are calculated (see footnote 5). The crawler also evaluates whether the resource fulfills certain restrictions, such as the type of the linked resources (target attribute) and the cardinality of the retrieved links (minOccurs and maxOccurs attributes of the selector element). These restrictions are optional and allow the crawler to determine whether the resource is well-formed and satisfies the preconditions given in the service description.

For each graph retrieved, a Translator is invoked to generate RDF triples based on the ReLL description, that is, the subjects (resources' URIs), properties (rdf:type, base:link id) and objects (linked resources' URIs or values), as well

Footnote 5: Based on the ongoing work on the URI Template [15] language, it might in the future be possible to define additional ways in which a URI can be composed based on input values obtained from the current representation.

well-known REST-based applications, Twitter and Flickr. The fourth service provides mappings among the users in each of these domains, so that we can establish useful equivalences by means of an owl:sameAs property. ReLL descriptions were created for each scenario, and we retrieved 11,353 resources and 22,309 links among them, which generated 55,548 triples.

Figure 4 presents the ontology that was generated after transforming the ReLL descriptions into RDF through a generic XSLT definition. The image was generated using OntoViz (see footnote 6) and was later refined for readability. The upper left corner presents the representation classes and their corresponding IANA media types (e.g. iana-app:xhtml+xml, iana-app:atom+xml, iana-app:xml, iana-txt:html, and image media types). The right-hand side presents the classes that model the UC Berkeley school domain's resources (e.g. school:person, school:course, etc.) and the relationships among resources (e.g. school:person-course). The left-hand side shows the classes corresponding to the Flickr domain (e.g. flickr:photostream, flickr:photo, etc.) and their relationships (e.g. flickr:photo-sizes). At the bottom of the figure, a subgraph describes the classes that model the Twitter domain (e.g. twitter:follower, twitter:user, etc.) and the hyperlinks or relationships among them (e.g. twitter:status-reply). At the center of the figure, the minimal ontology described in Figure 2 is highlighted in bold and italics.

Footnote 6: A Protege plugin that generates .dot files.
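As an illustration of the crawl step described before the evaluation (selector, target, minOccurs/maxOccurs, and the Translator), the following Python sketch shows one way such a step could look. It is not the RESTler implementation: the LinkDescription class, its field names, the sample markup, and the example.org URIs are all hypothetical, and Python's xml.etree.ElementTree supports only a subset of XPath. The sketch also assumes that a link without a target is still recorded as a triple, even though it is not followed.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class LinkDescription:
    """Hypothetical stand-in for one link entry of a ReLL description."""
    link_id: str                      # property used in the triple
    selector: str                     # path expression applied to the representation
    target: Optional[str] = None      # expected resource type; None implies "nofollow"
    min_occurs: int = 0               # optional lower bound on retrieved links
    max_occurs: Optional[int] = None  # optional upper bound; None means unbounded


def select_links(representation: str, link: LinkDescription) -> List[str]:
    """Apply the selector and return the href of every matching element."""
    root = ET.fromstring(representation)
    return [el.get("href") for el in root.findall(link.selector) if el.get("href")]


def within_cardinality(count: int, link: LinkDescription) -> bool:
    """Check the optional minOccurs/maxOccurs restriction."""
    if count < link.min_occurs:
        return False
    return link.max_occurs is None or count <= link.max_occurs


def crawl_step(uri: str, resource_type: str, representation: str,
               links: List[LinkDescription]):
    """Return (triples, links to follow, well-formedness flag) for one resource."""
    triples: List[Triple] = [(uri, "rdf:type", resource_type)]
    to_follow: List[Tuple[str, str]] = []
    well_formed = True
    for link in links:
        hrefs = select_links(representation, link)
        well_formed = well_formed and within_cardinality(len(hrefs), link)
        for href in hrefs:
            # Assumption: the link is recorded as a triple even when not followed.
            triples.append((uri, link.link_id, href))
            if link.target is not None:  # a missing target means "nofollow"
                to_follow.append((href, link.target))
    return triples, to_follow, well_formed


# Hypothetical representation of a person resource with three embedded links.
SAMPLE = """<div>
  <a class="course" href="http://example.org/courses/1">Course 1</a>
  <a class="course" href="http://example.org/courses/2">Course 2</a>
  <a class="website" href="http://example.org/home">Home page</a>
</div>"""

LINKS = [
    LinkDescription("school:person-course", ".//a[@class='course']",
                    target="school:course", min_occurs=1),
    LinkDescription("school:person-website", ".//a[@class='website']",
                    max_occurs=1),
]

if __name__ == "__main__":
    triples, to_follow, ok = crawl_step(
        "http://example.org/people/p1", "school:person", SAMPLE, LINKS)
    for triple in triples:
        print(triple)
    print("follow:", to_follow, "well-formed:", ok)
```

In this sketch a cardinality violation only sets a flag instead of stopping the crawl, which mirrors the optional nature of the restrictions; a stricter crawler could reject the resource at that point.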
Figure 4: Resulting ontologies generated from ReLL descriptions

a school:person ;
    vCard:FN "Erik Wilde" ;
    vCard:ADR _:node14m5kienpx1603 ;
    vCard:TITLE "Adjunct Professor" ;
    vCard:ORG _:node14m5kienpx1604 ;
    vCard:EMAIL _:node14m5kienpx1606 ;
    vCard:TEL _:node14m5kienpx1607 ;
    vCard:URL ;
    vCard:PHOTO ;
    school:person-website ;
    school:person-course , , , .

Figure 5: Describing an instance of a school:person resource using N3 notation

Collections of resources can also be identified. For instance, at the bottom of Figure 4, the arcs between two resources, twitter:user-timeline and twitter:paged-user-timeline, describe a pagination relationship: 13 pages of the twitter:user-timeline were collected, and the pagination scheme is described as links that lead to a numbered page (e.g. twitter:timeline-page2, twitter:timeline-page3, etc.). For Flickr and the Information School, the pagination scheme uses links such as the first, last, next and previous page.

The fourth RESTful service, the Usermap, is shown as a single class near the center of Figure 4. This is because the ReLL file contains only one class of resource (the usermap), that is, an XML list mapping the users' URIs between the other three applications.

The REST resources themselves are transformed to RDF following a GRDDL approach. Figure 5 shows the attributes obtained for individuals of type school:person. Notice that it is possible to annotate the relationships between the REST resource (erikwilde) and its attributes. In the figure these relationships are annotated with vCard, but other information models can be used.

6. CONCLUSIONS

The REST community is still discussing whether RESTful services should be described at all, and how such a description language could increase the coupling between a service provider and a service consumer, so that REST's goal of loosely coupled services could be compromised. We take a pragmatic position and claim that it is important to keep in mind that any kind of contract will introduce some coupling, that even loosely coupled services need a shared set of assumptions, and that a more formal way of describing those assumptions will help service providers and consumers in service documentation and consumption. A recent upswing of discoverable links between Web resources (such as an uptake of microformats [19]) has led to the idea of a central registry for link relationships in the realm of Web linking [23], but this activity is still under active development.

Our model is still a static description of RESTful services that does not cover the cases in which new resources or identification and access schemes are introduced. However, such a description makes it possible to describe the status quo and the cases that a client should expect, and therefore also to reliably discover cases in which these constraints are no longer satisfied, for example when new representations or new identification and access schemes are used.

Furthermore, this kind of RESTful service description can also include the set of preconditions that must be satisfied by a client to be able to consume a service. Should these preconditions change (because the service changes), an analysis of the description of the preconditions used by the client allows the client to detect the change (for example, a new representation format has been introduced) and to react in an appropriate way (for example, alerting the client manager, attempting a fallback, or aborting). By supporting the description of a set of preconditions, the description language can achieve loose coupling [26] and still allow clients to detect when they encounter something that they have not been designed for. As for future work, we plan to consider more complex data models that also support methods such as PUT, DELETE and POST, allowing us to model resources that can be modified, and their relation with the SPARQL proposals for supporting such operations [31].

Our minting process consists of selecting the appropriate name for the namespace (base), resource IDs, link IDs, link types, and representation IDs. In the example presented in Figure 5, the resource instance's namespace and the predicates chosen for this description correspond to vCard, but other properties (e.g. foaf) could also be used. We believe that the selection of such properties must be the responsibility of the ReLL designer. Furthermore, the properties used in the ReLL description itself (e.g. school:person) could also be described using Linked Data vocabularies. By following this approach, the results of RESTler (e.g. triple datasets) could be better integrated with other Linked Data sources and the Linked Open Data cloud.

By considering the URIs corresponding to REST resources, a natural content negotiation with the Web server will be possible in order to retrieve an RDF-friendly media type (e.g. application/rdf+xml) or the human-readable Web version of the same resource. As for limitations, we need to prepare a ReLL document for each REST service. This approach has been successfully followed by others, such as Virtuoso's Sponger, which prepares Sponges or Cartridges tailored for an application interface such as REST APIs, known metadata such as MS Office, or known Web sites such as YouTube. RDB2RDF (see footnote 7) is also an ad-hoc approach that transforms RDBMS to RDF representations.

Footnote 7: http://www.w3.org/2005/Incubator/rdb2rdf/

We believe that by choosing Web technologies such as XPath, XSLT and XML as the basis for ReLL documents, we are lowering the entry barrier to semantic resource publishing, since most Web developers have the knowledge and tools required to create their own ReLL description. This approach also allows developers to control the information they are collecting. Our next challenge is to further facilitate the creation of ReLL documents by supporting the dynamic and automatic generation of ReLL descriptions. One of the challenges of this goal is the fact that we need to design a specific XSLT for each resource type in order to harvest specific information. A fully automatic approach would require information retrieval, text mining and probably machine learning techniques, which greatly increases the costs of the transformation and raises the entry barrier for technology adopters.

Having a document such as ReLL may serve as an intermediate layer that automatic agents can also use as a contract describing the capacities of a REST service and translating them into RDF triples, by following the semantics (types) made explicit in the document. Our approach can be seen as a complement to proposals such as voiD, since voiD describes the resulting datasets but does not support the triples harvesting process. Our approach will allow any Web content provider to publish ReLL descriptions for others to crawl their Web sites, or third parties to develop a Web site's description that accommodates their needs. The crawler's result is a dataset that can then be described using voiD. Silk can also be used for the definition of additional link patterns, such as the user mapping that we created manually in this version; and LRDD's linking techniques can also be applied, since they may allow resources to link to their descriptions.

We have placed strong emphasis on a decoupled approach, where the components of the architecture maintain a certain degree of independence, require knowledge and tools already available and familiar to most Web developers, and provide a simple model that may again be familiar to Web developers. Our final goal is to contribute to making more semantic information available while keeping a low entry barrier for developers.

7. ACKNOWLEDGMENTS

This work was partially funded by CONICYT/Bicentennial Becas-Chile 2009.

8. REFERENCES

[1] Ben Adida, Mark Birbeck, Shane McCarron, and Steven Pemberton. RDFa in XHTML: Syntax and Processing. A Collection of Attributes and Processing Rules for Extending XHTML to Support RDF. World Wide Web Consortium, Recommendation REC-rdfa-syntax-20081014, October 2008.
[2] Rosa Alarcón and Erik Wilde. RESTler: Crawling RESTful Services. In 19th International World Wide Web Conference Posters, Raleigh, North Carolina, April 2010. ACM Press.
[3] Keith Alexander, Richard Cyganiak, Michael Hausenblas, and Jun Zhao. Describing Linked Datasets. In 2nd Workshop on Linked Data on the Web, Madrid, Spain, April 2009.
[4] Areeb Alowisheq, David E. Millard, and Thanassis Tiropanis. EXPRESS: EXPressing REstful Semantic Services Using Domain Ontologies. In Bernstein et al. [8], pages 941–948.
[5] José Luis Ambite, Sirish Darbha, Aman Goel, Craig A. Knoblock, Kristina Lerman, Rahul Parundekar, and Thomas Russ. Automatically Constructing Semantic Web Services from Online Sources. In Bernstein et al. [8], pages 17–32.
[6] Robert Battle and Edward Benson. Bridging the Semantic Web and Web 2.0 with Representational State Transfer (REST). Journal of Web Semantics, 6(1), 2008.
[7] Anders Berglund, Scott Boag, Donald D. Chamberlin, Mary F. Fernández, Michael Kay, Jonathan Robie, and Jérôme Siméon. XML Path Language (XPath) 2.0. World Wide Web Consortium, Recommendation REC-xpath20-20070123, January 2007.
[8] Abraham Bernstein, David R. Karger, Tom Heath, Lee Feigenbaum, Diana Maynard, Enrico Motta, and Krishnaprasad Thirunarayan, editors. 8th International Semantic Web Conference, volume 5823 of Lecture Notes in Computer Science, Chantilly, Virginia, October 2009. Springer-Verlag.
[9] Uldis Bojārs, John G. Breslin, Vassilios Peristeras, Giovanni Tummarello, and Stefan Decker. Interlinking the Social Web with Semantics. IEEE Intelligent Systems, 23(3):29–40, May 2008.
[10] Jeremy J. Carroll, Christian Bizer, Pat Hayes, and Patrick Stickler. Named Graphs, Provenance and Trust. In Allan Ellis and Tatsuya Hagino, editors, 14th International World Wide Web Conference, pages 613–622, Chiba, Japan, May 2005. ACM Press.
[11] James Clark and Steven J. DeRose. XML Path Language (XPath) Version 1.0. World Wide Web Consortium, Recommendation REC-xpath-19991116, November 1999.
[12] Dan Connolly. Gleaning Resource Descriptions from Dialects of Languages (GRDDL). World Wide Web Consortium, Recommendation REC-grddl-20070911, September 2007.
[13] Roy Thomas Fielding and Richard N. Taylor. Principled Design of the Modern Web Architecture. ACM Transactions on Internet Technology, 2(2):115–150, May 2002.
[14] Joe Futrelle. Harvesting RDF Triples. In Luc Moreau and Ian Foster, editors, International Provenance and Annotation Workshop (IPAW 2006), volume 4145 of Lecture Notes in Computer Science, pages 64–72, Chicago, Illinois, May 2006. Springer-Verlag.
[15] Joe Gregorio. URI Template. Internet Draft draft-gregorio-uritemplate-04, March 2010.
[16] Marc Hadley. Web Application Description Language. World Wide Web Consortium, Member Submission SUBM-wadl-20090831, August 2009.
[17] Eran Hammer-Lahav. Link-based Resource Descriptor Discovery. Internet Draft draft-hammer-discovery-03, March 2009.
[18] Olaf Hartig and Jun Zhao. Using Web Data Provenance for Quality Assessment. In First International Workshop on the Role of Semantic Web in Provenance Management, Washington, D.C., October 2009.
[19] Rohit Khare and Tantek Çelik. Microformats: A Pragmatic Path to the Semantic Web. In 15th International World Wide Web Conference Posters, Edinburgh, UK, May 2006. ACM Press.
[20] Jacek Kopecký, Karthik Gomadam, and Tomas Vitvar. hRESTS: An HTML Microformat for Describing RESTful Web Services. In 2008 IEEE/WIC/ACM International Conference on Web Intelligence, pages 619–625, Sydney, Australia, December 2008.
[21] Jon Lathem, Karthik Gomadam, and Amit P. Sheth. SA-REST and (S)mashups: Adding Semantics to RESTful Services. In First IEEE International Conference on Semantic Computing (ICSC 2007), pages 469–476, Irvine, California, September 2007.
[22] Mark Nottingham. Feed Paging and Archiving. Internet RFC 5005, September 2007.
[23] Mark Nottingham. Web Linking. Internet Draft draft-nottingham-http-link-header-08, March 2010.
[24] Mark Nottingham and Robert Sayre. The Atom Syndication Format. Internet RFC 4287, December 2005.
[25] Cesare Pautasso. Composing RESTful services with JOpera. In Alexandre Bergel and Johan Fabry, editors, International Conference on Software Composition 2009, volume 5634 of Lecture Notes in Computer Science, pages 142–159, Zürich, Switzerland, July 2009. Springer-Verlag.
[26] Cesare Pautasso and Erik Wilde. Why is the Web Loosely Coupled? A Multi-Faceted Metric for Service Design. In Quemada et al. [28], pages 911–920.
[27] Nicoleta Preda, Fabian M. Suchanek, Gjergji Kasneci, Thomas Neumann, Maya Ramanath, and Gerhard Weikum. ANGIE: Active Knowledge for Interactive Exploration. In 35th International Conference on Very Large Data Bases (VLDB 2009), pages 1570–1573, Lyon, France, August 2009. ACM Press.
[28] Juan Quemada, Gonzalo León, Yoëlle S. Maarek, and Wolfgang Nejdl, editors. 18th International World Wide Web Conference, Madrid, Spain, April 2009. ACM Press.
[29] Fabian M. Suchanek, Mauro Sozio, and Gerhard Weikum. SOFIE: A Self-Organizing Framework for Information Extraction. In Quemada et al. [28], pages 911–920.
[30] Julius Volz, Christian Bizer, Martin Gaedke, and Georgi Kobilarov. Discovering and Maintaining Links on the Web of Data. In Bernstein et al. [8], pages 650–665.
[31] Erik Wilde and Michael Hausenblas. RESTful SPARQL? You Name It! Aligning SPARQL with REST and Resource Orientation. In Walter Binder and Erik Wilde, editors, 4th Workshop on Emerging Web Services Technology (WEWST 2009), pages 39–43, Eindhoven, Netherlands, November 2009.