<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cooking HTTP content negotiation with Vapour</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Diego Berrueta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergio Fernandez</string-name>
          <email>sergio.fernandezg@fundacionctic.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Frade</string-name>
          <email>ivan.frade@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fundacion CTIC</institution>
          ,
          <addr-line>Gijon, Asturias</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad de Oviedo</institution>
          ,
          <addr-line>Oviedo, Asturias</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Semantic Web is built upon distributed knowledge published on the Web. But this vision cannot be implemented without some basic publishing rules to make the data readable for machines. The publication of RDF vocabularies deserves special attention due to their important role in the Semantic Web architecture. In this paper we describe a scripting web-based application that validates the compliance of a vocabulary with these publication rules. Practical experimentation allows us to illustrate and discuss some common problems in the implementation of these rules.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Semantic Web is a big container, a universal medium for data, information
and knowledge exchange. However, the Semantic Web is not only about putting
data on the Web: there are also some publishing rules. Tim Berners-Lee outlined
four basic principles [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to publish Linked Data on the Web [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. These rules
describe how URIs must be used as names for things, and how to provide useful
information on these things and other related ones. Although there are guidelines
to coin adequate URIs for things [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], there is still a need to provide the most suitable
representation of the information for each request, depending on the kind of
client agent, human or software.
      </p>
      <p>
        Web documents are mainly retrieved using the HTTP [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] protocol. This
protocol provides a mechanism known as content negotiation, which makes it
possible to serve Web content in the format or language preferred by the
requester (provided that it is available). Using transparent content
negotiation in HTTP [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] has many benefits [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and it can be implemented using
different techniques in the Apache web server, as we describe in more detail in
Section 2 of this paper. Section 3 introduces a scripting application that
helps to implement HTTP content negotiation correctly and to debug it. In Section 4,
the compliance of some of the most widely used Semantic Web vocabularies with the
publishing rules is evaluated. Finally, Section 5 presents some conclusions and future work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Content negotiation with Apache: Recipes</title>
      <p>Nowadays, the Apache HTTP Server is the most widely used Web server<sup>3</sup>, and it
provides three different approaches to implement content negotiation<sup>4</sup>:</p>
      <p>Type Map: explicit handlers are described in a file (.var) for each resource.
The necessary configuration is quite complicated and tedious; therefore, this
method is rarely used.</p>
      <p>MultiViews: based on the MIME types and names of the files in a
directory, MultiViews serves the most appropriate file in the current directory
when the requested resource does not exist. It returns an additional header
(Content-Location) to indicate the actual location of the file. This method
can be extended using the Apache module mod_mime to associate handlers with
new file extensions. However, this solution has an important limitation:
it only works if the files exist in the same directory.</p>
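      <p>The selection performed by MultiViews can be approximated in a few lines of Python. This is a simplified sketch, not Apache's actual algorithm: it assumes a hypothetical table from file extensions to MIME types (standing in for mod_mime) and an ordered list of preferred media types instead of a full Accept header. Because it only inspects the files of one directory, it also reproduces the limitation just mentioned.</p>
```python
import os

# Hypothetical extension-to-MIME-type table, playing the role of mod_mime.
EXTENSION_TYPES = {
    ".rdf": "application/rdf+xml",
    ".html": "text/html",
}


def multiviews_select(directory, name, preferred_types):
    """Choose a variant of `name` among the files of `directory`.

    Meant to be called only when `name` itself does not exist. Returns the
    chosen file name (what MultiViews would report in Content-Location), or
    None if no sibling file has an acceptable media type.
    """
    variants = {}
    for filename in os.listdir(directory):
        base, ext = os.path.splitext(filename)
        if base == name and ext in EXTENSION_TYPES:
            variants[EXTENSION_TYPES[ext]] = filename
    # Only files in this directory are considered: variants stored
    # elsewhere can never be found.
    for media_type in preferred_types:  # most preferred first
        if media_type in variants:
            return variants[media_type]
    return None
```
      <p>For a directory holding vocab.rdf and vocab.html, a request for vocab from a client preferring text/html would yield vocab.html.</p>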
      <p>Rewrite request: probably because the two alternatives above do not
provide an easy solution, the most widely used method is one that was not
specifically designed to implement content negotiation. This mechanism uses
the module mod_rewrite to rewrite the request according to some
ad-hoc rules. As a result, requests for objects that are not known to be
information resources are redirected with the HTTP 303 status code to the
URI of the appropriate content, depending on the requested format.
The extra HTTP round trip costs some time, but this overhead is negligible
for many applications, and the redirection is mandatory according to the httpRange-14
resolution of the TAG<sup>5</sup>.</p>
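      <p>The following Python function is a simplified model of such rewrite rules. It is not the configuration proposed in the Recipes, which is written as Apache RewriteCond and RewriteRule directives: it only looks for HTML media types in the raw Accept header with a regular expression and answers with a 303 redirect. The path layout (the resource redirected to the same path plus .html or .rdf) is a hypothetical assumption.</p>
```python
import re

# Raw pattern matching on the header, similar in spirit to a RewriteCond:
# q-values and wildcards are not interpreted at all.
HTML_PATTERN = re.compile(r"text/html|application/xhtml\+xml")


def rewrite_303(path, accept_header):
    """Return (status, location) for a request on a non-information resource.

    Hypothetical layout: the resource at `path` has two documents,
    path + ".html" and path + ".rdf", and RDF/XML is the default.
    """
    if HTML_PATTERN.search(accept_header or ""):
        return 303, path + ".html"
    return 303, path + ".rdf"
```
      <p>Note that a client sending text/html;q=0.1, application/rdf+xml, which clearly prefers RDF/XML, is still redirected to the HTML document by this kind of matching.</p>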
      <p>
        There is some ongoing work by the W3C on Best Practice Recipes for Publishing
RDF Vocabularies [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a document that contains several recipes advising on
how to publish RDF/OWL vocabularies using mod_rewrite. This "cookbook"
provides step-by-step instructions to publish vocabularies on the Web, and gives
example configurations designed to address the most common scenarios.
      </p>
      <p>However, the Recipes are not perfect, and there is at least one important
issue to be solved<sup>6</sup>. Tim Berners-Lee reported that "the recipe for responding
to an accept header only responds to a header which EXACTLY matches [the
rule antecedent]". For requests that contain values for the Accept header
such as text/* or application/rdf+xml;q=0.01, where wildcards or q-values
are used, the representation actually served by the rules proposed in the Recipes
might differ from the expected one. This is a serious problem of the Recipes, but
it can easily be solved with a server-side script.</p>
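      <p>A minimal sketch of such a server-side script, written in Python, is shown below. It parses the Accept header into media ranges with their q-values, takes the quality of each available representation from the most specific matching range (exact type, then type/*, then */*), in the spirit of RFC 2616, section 14.1, and picks the representation with the highest quality. Media-range parameters other than q are ignored, and the list of available representations is an assumption for illustration.</p>
```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """Quality of a concrete media type: the most specific matching range wins."""
    main = media_type.split("/", 1)[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        r_main, _, r_sub = media_range.partition("/")
        if media_range == media_type:
            specificity = 2
        elif r_main == main and r_sub == "*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose(available, header):
    """Return the available media type with the highest q (None if all are 0)."""
    ranges = parse_accept(header) if header else [("*/*", 1.0)]
    best, best_q = None, 0.0
    for media_type in available:  # earlier entries win ties
        q = quality(media_type, ranges)
        if q > best_q:
            best, best_q = media_type, q
    return best
```
      <p>With available representations application/rdf+xml and text/html, the header application/rdf+xml;q=0.01, text/html selects HTML, and text/* alone also selects HTML because RDF/XML does not match that range. When nothing matches, the function returns None, and answering with 406 Not Acceptable would be one reasonable choice.</p>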
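      <fn-group>
        <fn id="fn3">
          <label>3</label>
          <p>http://www.netcraft.com/survey/ (retrieved 13/Mar/2008)</p>
        </fn>
        <fn id="fn4">
          <label>4</label>
          <p>http://httpd.apache.org/docs/2.0/content-negotiation.html</p>
        </fn>
        <fn id="fn5">
          <label>5</label>
          <p>http://www.w3.org/2001/tag/issues.html#httpRange-14</p>
        </fn>
        <fn id="fn6">
          <label>6</label>
          <p>http://www.w3.org/2006/07/SWD/track/issues/58 (retrieved 13/Mar/2008)</p>
        </fn>
      </fn-group>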
    </sec>
    <sec id="sec-3">
      <title>Vapour: a scripting approach to debug content negotiation</title>
      <p>The previous section has shown that a correct implementation of content
negotiation is not an easy task. Furthermore, although manually testing an implementation is
not complex, it is long and cumbersome. It can be done with tools
such as cURL<fn id="fn7"><label>7</label><p>Richard Cyganiak's explanation of how to use cURL to debug content negotiation, blog post available at http://dowhatimean.net/2007/02/debugging-semantic-web-sites-with-curl</p></fn>, but this process is not handy, especially for intensive or repetitive
tests against a vocabulary.</p>
      <p>
        In order to facilitate the task of testing the results of content negotiation
on a vocabulary, we developed a web-based application called Vapour<fn id="fn8"><label>8</label><p>http://vapour.sourceforge.net/</p></fn>. This
application provides a service that makes multiple requests to a set of URIs
and runs a test suite specifically designed to check the responses of the server
against the best practices defined in the Recipes. Tests are executed against
the vocabulary URI, a class URI and a property URI (the latter two can be
automatically detected). Based on the input parameters, the application provides
a pointer to the specific recipe, in case the user wants to learn more about how to
configure the web server. Vapour stores all assertions in an in-memory RDF
store, using a combination of EARL [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], HTTP Vocabulary [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and an RDF
representation of the best practices of the Recipes. Thus, Vapour can provide the
reports both in HTML and in RDF, using content negotiation. The HTML view
displays a clean and concise pass/fail report of each set of tests (Figure 1), as well
as a detailed explanation of its findings that includes a graphical representation
of the HTTP dialog. As expected, the examples included in the Recipes are
successfully validated by Vapour.
      </p>
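      <p>The kind of HTTP dialog that Vapour automates can be illustrated with Python's standard library alone. The sketch below is not Vapour's code: the toy server (VocabHandler), its /ns paths and the helpers dialog and check_redirect are hypothetical, and the toy server uses naive substring matching on the Accept header for brevity. It relies on http.client (the Python 3 name of the httplib module used by Vapour), which does not follow redirects, so the 303 status and its Location header can be inspected before the second request is made.</p>
```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class VocabHandler(BaseHTTPRequestHandler):
    """Toy server: /ns stands for a vocabulary URI answered with a 303."""

    def do_GET(self):
        if self.path == "/ns":
            accept = self.headers.get("Accept", "")
            target = "/ns.html" if "text/html" in accept else "/ns.rdf"
            self.send_response(303)
            self.send_header("Location", target)
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path in ("/ns.rdf", "/ns.html"):
            body = b"placeholder representation"
            if self.path.endswith(".rdf"):
                ctype = "application/rdf+xml"
            else:
                ctype = "text/html"
            self.send_response(200)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # silence the default request logging


def dialog(host, port, path, accept):
    """Send one GET without following redirects.

    Returns (status, Location, Content-Type) of the response.
    """
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path, headers={"Accept": accept})
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location"), resp.getheader("Content-Type")
    finally:
        conn.close()


def check_redirect(host, port, path, accept, expected_type):
    """Expect a 303 from `path`, then a 200 with `expected_type` at the target.

    For simplicity, Location is assumed to be a path on the same host.
    """
    status, location, _ = dialog(host, port, path, accept)
    if status != 303 or location is None:
        return False
    status, _, ctype = dialog(host, port, location, accept)
    return status == 200 and (ctype or "").split(";")[0].strip() == expected_type


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), VocabHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    for accept, expected in [("application/rdf+xml", "application/rdf+xml"),
                             ("text/html", "text/html")]:
        print(accept, check_redirect(host, port, "/ns", accept, expected))
    server.shutdown()
    server.server_close()
```
      <p>Each boolean returned by check_redirect corresponds to one pass/fail assertion; a tool like Vapour would record such outcomes (here, for a vocabulary URI) instead of printing them.</p>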
      <p>The application is written in Python, and it uses common Python libraries
such as urllib, httplib, web.py and RDFLib. Scripting languages such as Python
enable agile development of applications in a short time and with few resources.
The source code of Vapour is available on SourceForge<fn id="fn9"><label>9</label><p>http://sourceforge.net/projects/vapour/</p></fn>, and an online demo of the
service is also available<fn id="fn10"><label>10</label><p>http://idi.fundacionctic.org/vapour</p></fn>.</p>
      <p>As depicted in Figure 2, Vapour has a simple and functional design that
fulfils the objectives of the project. There are three components:</p>
      <def-list>
        <def-item>
          <term>cup</term>
          <def>
            <p>The web front-end. It uses the web.py framework and the Cheetah template engine, and it provides a web interface that allows the user to interact with the other components of the application in a simple way. The architecture has been designed to allow other kinds of interfaces; for instance, a command-line interface is also provided.</p>
          </def>
        </def-item>
        <def-item>
          <term>teapot</term>
          <def>
            <p>The core of the application. It launches HTTP dialogs (with and without content negotiation) to evaluate the response status code and content type. Teapot requests the URI of the vocabulary, and also the URIs of a class and a property from the vocabulary. All the resulting assertions are inserted into the RDF store.</p>
          </def>
        </def-item>
        <def-item>
          <term>strainer</term>
          <def>
            <p>The module in charge of generating the reports for each test performed by the application. It queries the RDF model using SPARQL to get the result and trace of each test, and it produces a report in XHTML or RDF/XML. For the XHTML reports, Cheetah templates are also used.</p>
          </def>
        </def-item>
      </def-list>
      <p>The service can be deployed as a normal CGI in Apache or using a Python
web framework. We reviewed the security of the application to avoid some
common problems of this kind of application; for example, the number of requests per client is limited.</p>
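      <p>Limiting requests per client can be done, for instance, with a sliding-window counter keyed by client address. This Python sketch is an assumption about how such a limit might look, not Vapour's actual mechanism; the class name, the window length and the request budget are hypothetical, and the clock is injectable so that the behaviour can be tested deterministically.</p>
```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `max_requests` per client within `window` seconds."""

    def __init__(self, max_requests=10, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._history = defaultdict(deque)  # client -> timestamps of accepted requests

    def allow(self, client):
        """Record a request from `client` and say whether it may proceed."""
        now = self.clock()
        history = self._history[client]
        # Drop timestamps that have left the sliding window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.max_requests:
            return False  # over budget: the caller should reject the request
        history.append(now)
        return True
```
      <p>Since every check launches several outgoing HTTP requests, a check like this placed in front of the request handler also keeps the service from being used to flood third-party servers.</p>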
    </sec>
    <sec id="sec-4">
      <title>Experimental results</title>
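      <p>
        Practical experimentation illustrates some common problems in the implementation of content
negotiation, and allows us to discuss them. We
checked some RDFS and OWL vocabularies published on the Web. We chose the
most frequently used vocabularies, in terms of number of instances, according
to the most recent scientific study available [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, this ranking dates from 2004, so we also
included some newer vocabularies, such as SKOS, DOAP and SIOC, which are
also popular according to more up-to-date sources<sup>11</sup>.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Results of running Vapour against ten popular vocabularies (12/Mar/2008): tests passed when requesting RDF and HTML, and format of the default response.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Vocabulary</th>
              <th>Accept RDF</th>
              <th>Accept HTML</th>
              <th>Default response</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>http://www.w3.org/1999/02/22-rdf-syntax-ns#</td><td>3/3</td><td>N/A</td><td>RDF/XML</td></tr>
            <tr><td>http://www.w3.org/2000/01/rdf-schema#</td><td>3/3</td><td>N/A</td><td>RDF/XML</td></tr>
            <tr><td>http://xmlns.com/foaf/0.1/</td><td>3/3</td><td>3/3</td><td>HTML</td></tr>
            <tr><td>http://purl.org/dc/elements/1.1/</td><td>2/2</td><td>0/2</td><td>RDF/XML</td></tr>
            <tr><td>http://www.w3.org/2003/01/geo/wgs84_pos#</td><td>3/3</td><td>0/3</td><td>RDF/XML</td></tr>
            <tr><td>http://rdfs.org/sioc/ns#</td><td>3/3</td><td>0/3</td><td>RDF/XML</td></tr>
            <tr><td>http://www.w3.org/2004/02/skos/core#</td><td>3/3</td><td>3/3</td><td>RDF/XML</td></tr>
            <tr><td>http://usefulinc.com/ns/doap#</td><td>3/3</td><td>0/3</td><td>RDF/XML</td></tr>
            <tr><td>http://purl.org/rss/1.0/</td><td>1/3</td><td>0/3</td><td>HTML</td></tr>
            <tr><td>http://semantic-mediawiki.org/swivt/1.0#</td><td>0/3</td><td>0/3</td><td>text/plain</td></tr>
          </tbody>
        </table>
      </table-wrap>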
      <p>
        Table 1 summarizes the results of running Vapour against a list of ten
popular vocabularies of the Semantic Web. These results give an approximate picture
of how well vocabularies are published on the Web. All the
vocabularies were retrieved on 12/Mar/2008. For each vocabulary, the vocabulary URI, a
class URI and a property URI were tested (except for Dublin Core, which does
not have any class). The results show that most vocabularies are correctly
published as RDF. However, it is significant that most vocabularies do not correctly
provide HTML representations of the resources, even when they are available.
Additionally, some vocabularies return an incorrect MIME type, such as text/plain
or application/xml.
      </p>
      <fn-group>
        <fn id="fn11">
          <label>11</label>
          <p>See the ranking at http://pingthesemanticweb.com/stats/namespaces.php (retrieved
12/Mar/2008) by PingTheSemanticWeb.com [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]</p>
        </fn>
      </fn-group>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>Content negotiation is a powerful technique. Although the basic mechanism is
simple, it is often badly implemented. Vapour is useful for debugging, for giving
advice on how to solve common problems, and for quality assurance.</p>
      <p>The application presented in this paper is fairly simple, but it does help
to debug the implementation of content negotiation in web servers. It is
particularly interesting that Vapour also provides its results in RDF. Using this
machine-readable format, it should be easy to build other services on top of
Vapour and to use these data for other tasks, such as checking the
compliance of a specific collection of vocabularies published on the Web.</p>
      <p>
        Current best practices (and consequently, Vapour) should probably be
updated to cover new methods to publish RDF data, such as RDFa [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] embedded
in XHTML pages. In the future, we would like to extend Vapour to cover more
generic validations in Linked Open Data scenarios, and to help webmasters to
better understand some common implementation issues.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Abou-Zahra</surname>
          </string-name>
          .
          <article-title>Evaluation and Report Language (EARL)</article-title>
          .
          <source>Working draft, W3C</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked Data Design Issues</article-title>
          . Available at http://www.w3.org/DesignIssues/LinkedData.html,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>Berrueta</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Phipps</surname>
          </string-name>
          .
          <article-title>Best Practice Recipes for Publishing RDF Vocabularies</article-title>
          . Working draft,
          <source>W3C</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Birbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pemberton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Adida</surname>
          </string-name>
          .
          <article-title>RDFa Syntax, a collection of attributes for layering RDF on XML languages</article-title>
          .
          <source>Technical report, W3C</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          .
          <article-title>How to Publish Linked Data on the Web</article-title>
          . Available at http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>U.</given-names>
            <surname>Bojars</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giasson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Breslin</surname>
          </string-name>
          .
          <article-title>An Architecture to Discover and Query Decentralized RDF Data</article-title>
          .
          <source>In 3rd Workshop on Scripting for the Semantic Web</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>L.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Finin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          .
          <article-title>How the Semantic Web is Being Used: An Analysis of FOAF Documents</article-title>
          .
          <source>In 38th International Conference on System Sciences</source>
          ,
          <year>January 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Gettys</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mogul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Frystyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Masinter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Leach</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <source>RFC 2616: Hypertext Transfer Protocol - HTTP/1.1</source>
          . RFC, IETF,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>K.</given-names>
            <surname>Holtman</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Mutz</surname>
          </string-name>
          .
          <article-title>Transparent Content Negotiation in HTTP</article-title>
          . RFC, IETF,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>J.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Velasco</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Abou-Zahra</surname>
          </string-name>
          .
          <article-title>HTTP Vocabulary in RDF</article-title>
          .
          <source>Technical report, W3C</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>L.</given-names>
            <surname>Sauermann</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          .
          <article-title>Cool URIs for the Semantic Web</article-title>
          . Working draft,
          <source>W3C</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>S.</given-names>
            <surname>Seshan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stemm</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Katz</surname>
          </string-name>
          .
          <article-title>Benefits of Transparent Content Negotiation in HTTP</article-title>
          .
          <source>In Proceedings of the IEEE Globecom 98 Internet Mini-Conference</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>