
Cooking HTTP content negotiation with Vapour

Diego Berrueta

Sergio Fernandez⁰

sergio.fernandezg@fundacionctic.org

Ivan Frade¹

ivan.frade@gmail.com

⁰ Fundacion CTIC, Gijon, Asturias, Spain

¹ Universidad de Oviedo, Oviedo, Asturias, Spain

The Semantic Web is built upon distributed knowledge published on the Web. But this vision cannot be implemented without some basic publishing rules that make the data readable by machines. The publication of RDF vocabularies must receive special attention due to their important role in the Semantic Web architecture. In this paper we describe a web-based application, written in a scripting language, that validates whether a vocabulary complies with these publication rules. Practical experimentation allows us to illustrate and discuss some common problems in the implementation of these rules.

Introduction

The Semantic Web is a big container, a universal medium for data, information and knowledge exchange. However, the Semantic Web is not only about putting data on the Web: there are also some publishing rules. Tim Berners-Lee outlined four basic principles [2] to publish Linked Data on the Web [5]. These rules describe how URIs must be used as names for things, and how to provide useful information about these things and other related ones. Although there are guidelines to coin adequate URIs for things [11], there is still the need to provide the best representation of the information for each request, depending on the kind of client agent, human or software.

Web documents are mainly retrieved using the HTTP protocol [8]. This protocol provides a mechanism known as content negotiation, by means of which it is possible to serve Web content in the format or language preferred by the requester (if such a variant is available). Using transparent content negotiation in HTTP [9] has many benefits [12], and it can be implemented using different techniques in the Apache web server, as we describe in more detail in Section 2 of this paper. Section 3 introduces a scripting application that provides help and guidance to correctly implement and debug HTTP content negotiation. In Section 4, the compliance of some of the most widely used vocabularies in the Semantic Web with the publishing rules is evaluated. Finally, Section 5 presents some conclusions and future work.
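On the client side, these preferences travel in request headers such as Accept (formats) and Accept-Language (languages). The following minimal sketch builds, but does not send, such a request with Python's standard urllib.request module. The URI is a placeholder, the helper name negotiating_request is hypothetical, and the 0.1 step between q-values is an arbitrary choice. Passing the request to urllib.request.urlopen would send it; note that urlopen follows redirects such as 303 automatically, which hides part of the HTTP dialog.

```python
from urllib.request import Request

def negotiating_request(uri, media_types):
    """Build (without sending) a GET request whose Accept header lists
    `media_types` in decreasing order of preference, using q-values."""
    parts = []
    for rank, media_type in enumerate(media_types):
        if rank == 0:
            parts.append(media_type)  # no q parameter means q=1
        else:
            q = max(1.0 - 0.1 * rank, 0.1)
            parts.append("%s;q=%.1f" % (media_type, q))
    return Request(uri, headers={"Accept": ", ".join(parts)})

# example.org stands in for a real vocabulary URI.
req = negotiating_request("http://example.org/vocab/", ["application/rdf+xml", "text/html"])
print(req.get_method(), req.full_url)  # GET http://example.org/vocab/
print(req.get_header("Accept"))        # application/rdf+xml, text/html;q=0.9
```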

Content negotiation with Apache: Recipes

Nowadays, the Apache HTTP Server is the most widely used Web server³, and it provides three different approaches to implement content negotiation⁴:

Type Map: the variants of each resource are explicitly described in a type-map file (.var).

The necessary configuration is quite complicated and tedious; therefore, this method is rarely used.

MultiViews: based on the MIME types and names of the files in a directory, MultiViews serves the most appropriate file in the current directory when the requested resource does not exist. It returns an additional header (Content-Location) to indicate the actual location of the selected file. This method can be extended using the Apache module mod_mime to associate new file extensions with MIME types and handlers. However, this solution has an important limitation: it only works if the files exist in the same directory.

Rewrite request: probably because the two alternatives above do not provide an easy solution, the most widely used method is one that was not specifically designed to implement content negotiation. This mechanism uses the module mod_rewrite to rewrite the request according to some ad-hoc rules. As a result, requests for objects that are not known to be information resources are redirected, with the HTTP 303 (See Other) status code, to the URI of the appropriate content depending on the requested format. The extra HTTP round-trip obviously costs some time, but this overhead is negligible for many applications, and the redirection is in any case mandatory according to the httpRange-14 resolution of the W3C TAG⁵.
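The three approaches can be contrasted with a few lines of Python that emulate, in a much simplified way, what the server does in each case. This is a sketch, not Apache code: the resource name example, the file names and the extension table (playing the role of mod_mime AddType directives) are hypothetical, and only the type-map record syntax (URI: and Content-type: lines, records separated by blank lines) follows the Apache documentation. On the Apache side, each approach would additionally need the type-map handler (e.g. AddHandler type-map .var), Options +MultiViews, or RewriteCond/RewriteRule directives, respectively.

```python
import os

# Hypothetical extension table, playing the role of mod_mime AddType directives.
EXTENSION_TYPES = {".rdf": "application/rdf+xml", ".html": "text/html"}

def build_type_map(resource, variants):
    """Type Map: return the text of a .var file listing every variant explicitly.
    `variants` holds (filename, media_type, qs) tuples; qs may be None."""
    records = ["URI: %s" % resource]
    for filename, media_type, qs in variants:
        content_type = media_type if qs is None else "%s; qs=%s" % (media_type, qs)
        records.append("URI: %s\nContent-type: %s" % (filename, content_type))
    return "\n\n".join(records) + "\n"  # records are separated by blank lines

def multiviews_candidates(requested, directory_listing):
    """MultiViews: if `requested` is missing, look for requested.<ext> files in the
    SAME directory listing (single extensions only, unlike real MultiViews)."""
    if requested in directory_listing:
        return {}  # the file exists, so no negotiation takes place
    candidates = {}
    for filename in directory_listing:
        stem, ext = os.path.splitext(filename)
        if stem == requested and ext in EXTENSION_TYPES:
            candidates[EXTENSION_TYPES[ext]] = filename  # also the Content-Location value
    return candidates

def rewrite_303(path, accept):
    """Rewrite request: substring tests on the Accept header, like RewriteCond
    patterns, followed by a 303 See Other redirect to a format-specific document."""
    accept = accept or ""
    if "application/rdf+xml" in accept:
        target = path + ".rdf"
    elif "text/html" in accept or "application/xhtml+xml" in accept:
        target = path + ".html"
    else:
        target = path + ".rdf"  # default representation: an assumption of this sketch
    return 303, target

print(build_type_map("example", [("example.rdf", "application/rdf+xml", None),
                                 ("example.html", "text/html", 0.9)]))
print(multiviews_candidates("example", ["example.rdf", "example.html", "other.rdf"]))
print(rewrite_303("/vocab/Person", "text/html"))  # (303, '/vocab/Person.html')
print(rewrite_303("/vocab/Person", "text/html, application/rdf+xml;q=0.1"))  # (303, '/vocab/Person.rdf')
```

The last call picks RDF even though the client clearly prefers HTML: plain substring tests ignore q-values and wildcards.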

There is some ongoing work by the W3C on Best Practice Recipes for Publishing RDF Vocabularies [3], a document containing several recipes that advise on how to publish RDF/OWL vocabularies using mod_rewrite. This "cookbook" provides step-by-step instructions to publish vocabularies on the Web, and gives example configurations designed to address the most common scenarios.

However, the Recipes are not perfect, and there is at least one important issue still to be solved⁶. Tim Berners-Lee reported that "the recipe for responding to an accept header only responds to a header which EXACTLY matches [the rule antecedent]". For requests whose Accept header contains values such as text/* or application/rdf+xml;q=0.01, where wildcards or q-values are used, the representation actually served by the rules proposed in the Recipes might differ from the expected one. This is a serious problem with the Recipes, but it can easily be solved with a server-side script.
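A minimal sketch of such a server-side script, assuming Python: it parses the Accept header following RFC 2616 (section 14.1), where a more specific media range overrides a less specific one, a missing q parameter means q=1, and q=0 means "not acceptable". The function names are hypothetical and this is not the fix adopted by the Recipes; for brevity, media-type parameters other than q, and server-side source qualities, are ignored.

```python
def parse_accept(header):
    """Split an Accept header into (type, subtype, q) triples (RFC 2616, sec. 14.1)."""
    ranges = []
    for item in header.split(","):
        if not item.strip():
            continue
        media_range, *params = [p.strip() for p in item.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        main, _, sub = media_range.lower().partition("/")
        ranges.append((main, sub or "*", q))
    return ranges

def quality(media_type, ranges):
    """q-value of a concrete media type; the most specific matching range wins."""
    main, _, sub = media_type.lower().partition("/")
    best_specificity, best_q = -1, 0.0
    for r_main, r_sub, q in ranges:
        if (r_main, r_sub) == (main, sub):
            specificity = 2          # exact type/subtype
        elif r_main == main and r_sub == "*":
            specificity = 1          # type/*
        elif (r_main, r_sub) == ("*", "*"):
            specificity = 0          # */*
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q

def negotiate(accept, available):
    """Return the available type with the highest q-value (server order breaks
    ties), or None if nothing is acceptable, i.e. a 406 Not Acceptable case."""
    if not accept:
        return available[0]  # no Accept header: the client accepts anything
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:
        q = quality(media_type, ranges)
        if q > best_q:
            best, best_q = media_type, q
    return best

AVAILABLE = ["application/rdf+xml", "text/html"]
print(negotiate("text/*", AVAILABLE))                                 # text/html
print(negotiate("text/html, application/rdf+xml;q=0.01", AVAILABLE))  # text/html
print(negotiate("image/png", AVAILABLE))                              # None
```

Such a script would then issue the 303 redirect (or serve the document) for the chosen type, instead of relying on literal matches in the rewrite rules.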

3 http://www.netcraft.com/survey/ (retrieved 13/Mar/2008)
4 http://httpd.apache.org/docs/2.0/content-negotiation.html
5 http://www.w3.org/2001/tag/issues.html#httpRange-14
6 http://www.w3.org/2006/07/SWD/track/issues/58 (retrieved 13/Mar/2008)

Vapour: a scripting approach to debug content negotiation

The previous section has shown that a correct implementation of content negotiation is not an easy task. Furthermore, although manually testing an implementation is not complex, it is long and cumbersome. It can be done with tools such as cURL⁷, but this process is not handy, especially for intensive or repetitive tests against a vocabulary.
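Each manual check boils down to one request with a chosen Accept header, whose response must be inspected without following redirects; this is roughly what curl -I -H "Accept: application/rdf+xml" does for a given URI. A minimal Python sketch using http.client (the Python 3 successor of httplib); probe and probe_all are hypothetical helper names, not Vapour's API.

```python
import http.client
from urllib.parse import urlsplit

def probe(url, accept=None, method="HEAD", timeout=10):
    """Send a single request WITHOUT following redirects and return
    (status, Location, Content-Type), similar to curl -I -H "Accept: ..."."""
    parts = urlsplit(url)  # any #fragment is dropped: it is never sent to the server
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    headers = {"Accept": accept} if accept else {}
    try:
        conn.request(method, target, headers=headers)
        response = conn.getresponse()
        response.read()  # drain the body (empty for HEAD)
        return response.status, response.getheader("Location"), response.getheader("Content-Type")
    finally:
        conn.close()

def probe_all(url, accepts=(None, "application/rdf+xml", "text/html")):
    """Repeat the probe once per Accept header: the repetitive part of manual testing."""
    return {accept: probe(url, accept) for accept in accepts}
```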

In order to facilitate the task of testing the results of content negotiation on a vocabulary, we developed a web-based application called Vapour⁸. This application provides a service that makes multiple requests to a set of URIs and runs a test suite specifically designed to check the responses of the server against the best practices defined in the Recipes. Tests are executed against the vocabulary URI, a class URI and a property URI (the latter two can be detected automatically). Based on the input parameters, the application provides a pointer to the specific recipe, in case the user wants to learn more about how to configure the web server. Vapour stores all assertions in an in-memory RDF store, using a combination of EARL [1], the HTTP Vocabulary in RDF [10] and an RDF representation of the best practices of the Recipes. Thus Vapour can provide the reports both in HTML and in RDF, using content negotiation. The HTML view displays a clean and concise pass/fail report of each set of tests (Figure 1), as well as a detailed explanation of its findings that includes a graphical representation of the HTTP dialog. As expected, the examples included in the Recipes are successfully validated by Vapour.

7 Richard Cyganiak's explanation of how to use cURL to debug content negotiation, blog post available at: http://dowhatimean.net/2007/02/debugging-semantic-web-sites-with-curl
8 http://vapour.sourceforge.net/
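As an illustration of recording one test outcome with EARL, the following standard-library sketch emits N-Triples lines instead of inserting into an RDFLib store. The assertion, test and assertor URIs are hypothetical, and the outcome names earl:passed and earl:failed are taken from later EARL 1.0 drafts; the names may differ in the draft cited here, so they should be checked against the version in use.

```python
EARL = "http://www.w3.org/ns/earl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def earl_assertion(assertion_uri, subject_uri, test_uri, passed, assertor_uri):
    """Return N-Triples lines describing one EARL assertion and its result."""
    result_uri = assertion_uri + "-result"
    outcome = EARL + ("passed" if passed else "failed")
    triples = [
        (assertion_uri, RDF_TYPE, EARL + "Assertion"),
        (assertion_uri, EARL + "assertedBy", assertor_uri),
        (assertion_uri, EARL + "subject", subject_uri),  # the URI that was tested
        (assertion_uri, EARL + "test", test_uri),        # which check was run
        (assertion_uri, EARL + "result", result_uri),
        (result_uri, RDF_TYPE, EARL + "TestResult"),
        (result_uri, EARL + "outcome", outcome),
    ]
    return ["<%s> <%s> <%s> ." % triple for triple in triples]

lines = earl_assertion("http://example.org/report#a1",
                       "http://xmlns.com/foaf/0.1/Person",
                       "http://example.org/tests#rdf-303",
                       True,
                       "http://example.org/vapour")
print("\n".join(lines))
```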

The application is written in Python, and it uses common Python libraries such as urllib, httplib, web.py and RDFLib. Scripting languages such as Python allow applications to be developed in an agile way, in a short time and with few resources. The source code of Vapour is available on SourceForge⁹, and an online demo of the service is also available¹⁰.

As depicted in Figure 2, Vapour has a simple and functional design that fulfils the objectives of the project. There are three components:

cup: the web front-end. It uses the web.py framework and the Cheetah template engine, and it provides a web interface that allows the user to interact with the other components of the application in a simple way. The architecture has been designed to allow other kinds of interfaces; for instance, a command-line interface is also provided.

teapot: the core of the application. It launches HTTP dialogs (with and without content negotiation) to evaluate the response status code and content type. Teapot requests the URI of the vocabulary, and also the URIs of a class and a property from the vocabulary. All the resulting assertions are inserted into the RDF store.

strainer: the module in charge of generating the reports for each test performed by the application. It queries the RDF model using SPARQL to get the result and trace of each test, and it produces a report in XHTML or RDF/XML. For the XHTML reports, we also use Cheetah templates.

9 http://sourceforge.net/projects/vapour/
10 http://idi.fundacionctic.org/vapour
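To make the split between teapot and strainer concrete, here is a hedged sketch of the kind of check teapot performs on an HTTP dialog, and of the passed/total summary a strainer-style report shows. The function names, the dialog representation and the hash/slash rule of thumb are simplifications assumed for this example; Vapour's actual tests follow the Recipes in more detail and store their results in RDF rather than returning booleans.

```python
# Media types accepted as a correct RDF or HTML representation (assumed here).
EXPECTED_TYPES = {
    "rdf": {"application/rdf+xml"},
    "html": {"text/html", "application/xhtml+xml"},
}

def evaluate(dialog, wanted, hash_namespace=False):
    """Judge one HTTP dialog, given as a list of (status, content_type) hops.

    Simplified rule of thumb: for a slash namespace the first hop must be a
    303 redirect; for a hash namespace the document may also be served directly.
    In both cases the last hop must be a 200 whose media type (ignoring
    parameters such as charset) matches the requested format."""
    first_status = dialog[0][0]
    final_status, final_type = dialog[-1]
    redirect_ok = hash_namespace or first_status == 303
    media_type = (final_type or "").split(";")[0].strip().lower()
    return redirect_ok and final_status == 200 and media_type in EXPECTED_TYPES[wanted]

def summarize(outcomes):
    """Format a group of outcomes as passed/total, e.g. '2/3'."""
    return "%d/%d" % (sum(1 for outcome in outcomes if outcome), len(outcomes))

results = [
    evaluate([(303, None), (200, "application/rdf+xml")], "rdf"),  # 303, then RDF: pass
    evaluate([(303, None), (200, "text/plain")], "rdf"),           # wrong MIME type: fail
    evaluate([(200, "text/html; charset=utf-8")], "html", hash_namespace=True),  # pass
]
print(results, summarize(results))  # [True, False, True] 2/3
```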

The service can be deployed as a plain CGI script in Apache or on top of a Python web framework. We also reviewed the security of the application to avoid some common problems of this kind of application; for instance, the number of requests per client is limited.
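The paper does not describe how requests per client are limited. One common approach, assumed here purely for illustration, is a sliding window per client address: each Vapour check triggers several outgoing HTTP requests, so capping incoming checks also caps the load the service can generate on third-party servers. The class name and the default limits are hypothetical.

```python
import time
from collections import defaultdict, deque

class PerClientLimiter:
    """Sliding-window limiter: at most `max_requests` per client in `window` seconds."""

    def __init__(self, max_requests=10, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        self.history = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client):
        now = self.clock()
        stamps = self.history[client]
        # Forget requests that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_requests:
            return False
        stamps.append(now)
        return True

fake_now = [0.0]
limiter = PerClientLimiter(max_requests=2, window=60, clock=lambda: fake_now[0])
print([limiter.allow("192.0.2.1") for _ in range(3)])  # [True, True, False]
```

In a CGI deployment the history would have to live outside the process (for example in a small database), since each request starts a fresh interpreter.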

Experimental results

Practical experimentation illustrates some common problems in the way content negotiation is implemented, and enables a discussion of these problems. We checked some RDFS and OWL vocabularies published on the Web. We chose the most frequently used vocabularies, in terms of number of instances, according to the most recent scientific study [7]. However, this ranking is aging (2004), so we also included some newer vocabularies, such as SKOS, DOAP and SIOC, which are also popular according to more up-to-date sources¹¹.

Table 1. Results of running Vapour against ten popular vocabularies: passed/total tests when requesting RDF and HTML, and the format of the default response

Vocabulary URI                                 Accept RDF   Accept HTML   Default response
http://www.w3.org/1999/02/22-rdf-syntax-ns#    3/3          N/A           RDF/XML
http://www.w3.org/2000/01/rdf-schema#          3/3          N/A           RDF/XML
http://xmlns.com/foaf/0.1/                     3/3          3/3           HTML
http://purl.org/dc/elements/1.1/               2/2          0/2           RDF/XML
http://www.w3.org/2003/01/geo/wgs84_pos#       3/3          0/3           RDF/XML
http://rdfs.org/sioc/ns#                       3/3          0/3           RDF/XML
http://www.w3.org/2004/02/skos/core#           3/3          3/3           RDF/XML
http://usefulinc.com/ns/doap#                  3/3          0/3           RDF/XML
http://purl.org/rss/1.0/                       1/3          0/3           HTML
http://semantic-mediawiki.org/swivt/1.0#       0/3          0/3           text/plain

Table 1 summarizes the results of running Vapour against a list of ten popular vocabularies of the Semantic Web. These results give an approximate picture of the quality of vocabulary publication on the Web. All the vocabularies were retrieved on 12/Mar/2008. For each vocabulary, the vocabulary URI, a class URI and a property URI were tested (except for Dublin Core, which does not define any class). The results show that most vocabularies are correctly published as RDF. However, it is significant that most vocabularies do not correctly provide HTML representations of their resources, even when such representations are available. Additionally, some vocabularies return an incorrect MIME type, such as text/plain or application/xml.

11 See the ranking by PingTheSemanticWeb.com [6] at http://pingthesemanticweb.com/stats/namespaces.php (retrieved 12/Mar/2008)
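The word "most" in the two observations above can be made concrete by tallying the rows of Table 1. The short Python snippet below encodes the table as data (the conventional namespace prefixes are used as row labels for brevity; they are not part of the table) and counts the vocabularies that pass every RDF test and every applicable HTML test.

```python
from collections import Counter

# (prefix, Accept RDF, Accept HTML, default response); scores are (passed, total),
# None marks N/A.
TABLE_1 = [
    ("rdf",   (3, 3), None,   "RDF/XML"),
    ("rdfs",  (3, 3), None,   "RDF/XML"),
    ("foaf",  (3, 3), (3, 3), "HTML"),
    ("dc",    (2, 2), (0, 2), "RDF/XML"),
    ("geo",   (3, 3), (0, 3), "RDF/XML"),
    ("sioc",  (3, 3), (0, 3), "RDF/XML"),
    ("skos",  (3, 3), (3, 3), "RDF/XML"),
    ("doap",  (3, 3), (0, 3), "RDF/XML"),
    ("rss",   (1, 3), (0, 3), "HTML"),
    ("swivt", (0, 3), (0, 3), "text/plain"),
]

def all_passed(score):
    return score is not None and score[0] == score[1]

rdf_ok = [prefix for prefix, rdf, _, _ in TABLE_1 if all_passed(rdf)]
html_tested = [prefix for prefix, _, html, _ in TABLE_1 if html is not None]
html_ok = [prefix for prefix, _, html, _ in TABLE_1 if all_passed(html)]
defaults = Counter(default for _, _, _, default in TABLE_1)

print(len(rdf_ok), "of", len(TABLE_1))       # 8 of 10
print(len(html_ok), "of", len(html_tested))  # 2 of 8
print(defaults["RDF/XML"], defaults["HTML"], defaults["text/plain"])  # 7 2 1
```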

Conclusions

Content negotiation is a powerful technique. Although the basic mechanism is simple, it is often badly implemented. Vapour is useful for debugging such implementations, for giving advice on how to solve common problems, and for quality assurance.

The application presented in this paper is fairly simple, but it actually helps to debug the implementation of content negotiation in web servers. It is particularly interesting that Vapour also provides its results in RDF. Using this machine-readable format, it should be easy to build other services on top of Vapour and to reuse these data for other tasks, such as a service that checks the compliance of a specific collection of vocabularies published on the Web.
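For instance, a service built on top of Vapour could fetch a report in RDF/XML (by sending Accept: application/rdf+xml) and tally the test outcomes. The sketch below assumes outcomes are serialized as earl:outcome property elements with an rdf:resource attribute, and that the outcome names follow later EARL drafts; it uses ElementTree instead of a real RDF parser such as RDFLib, so it only handles that serialization shape. The function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EARL_NS = "http://www.w3.org/ns/earl#"

def count_outcomes(rdf_xml):
    """Tally earl:outcome values found as <earl:outcome rdf:resource="..."/> elements."""
    root = ET.fromstring(rdf_xml)
    counts = Counter()
    for element in root.iter("{%s}outcome" % EARL_NS):
        resource = element.get("{%s}resource" % RDF_NS, "")
        counts[resource.rsplit("#", 1)[-1]] += 1  # keep the local name, e.g. "passed"
    return counts

# Hypothetical report fragment, not actual Vapour output.
SAMPLE_REPORT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:earl="http://www.w3.org/ns/earl#">
  <earl:TestResult><earl:outcome rdf:resource="http://www.w3.org/ns/earl#passed"/></earl:TestResult>
  <earl:TestResult><earl:outcome rdf:resource="http://www.w3.org/ns/earl#failed"/></earl:TestResult>
  <earl:TestResult><earl:outcome rdf:resource="http://www.w3.org/ns/earl#passed"/></earl:TestResult>
</rdf:RDF>"""

print(count_outcomes(SAMPLE_REPORT))
```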

Current best practices (and consequently, Vapour) should probably be updated to cover new methods to publish RDF data, such as RDFa [ 4 ] embedded in XHTML pages. In the future, we would like to extend Vapour to cover more generic validations in Linked Open Data scenarios, and to help webmasters to better understand some common implementation issues.

References

1. S. Abou-Zahra. Evaluation and Report Language (EARL). Working draft, W3C, 2007.

2. T. Berners-Lee. Linked Data Design Issues. Available at http://www.w3.org/DesignIssues/LinkedData.html, 2006.

3. D. Berrueta and J. Phipps. Best Practice Recipes for Publishing RDF Vocabularies. Working draft, W3C, 2008.

4. M. Birbeck, S. Pemberton, and B. Adida. RDFa Syntax, a collection of attributes for layering RDF on XML languages. Technical report, W3C, 2006.

5. C. Bizer, R. Cyganiak, and T. Heath. How to Publish Linked Data on the Web. Available at http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/, 2007.

6. U. Bojars, A. Passant, F. Giasson, and J. Breslin. An Architecture to Discover and Query Decentralized RDF Data. In 3rd Workshop on Scripting for the Semantic Web, 2007.

7. L. Ding, L. Zhou, T. Finin, and A. Joshi. How the Semantic Web is Being Used: An Analysis of FOAF Documents. In 38th International Conference on System Sciences, January 2005.

8. R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. RFC 2616: Hypertext Transfer Protocol - HTTP/1.1. RFC, IETF, 1999.

9. K. Holtman and A. Mutz. Transparent Content Negotiation in HTTP. RFC, IETF, 1998.

10. J. Koch, C. A. Velasco, and S. Abou-Zahra. HTTP Vocabulary in RDF. Technical report, W3C, 2007.

11. L. Sauermann and R. Cyganiak. Cool URIs for the Semantic Web. Working draft, W3C, 2007.

12. S. Seshan, M. Stemm, and R. Katz. Benefits of Transparent Content Negotiation in HTTP. In Proceedings of the IEEE Globecom 98 Internet Mini-Conference, 1998.