=Paper= {{Paper |id=Vol-538/paper-19 |storemode=property |title=An Ontology of Resources for Linked Data |pdfUrl=https://ceur-ws.org/Vol-538/ldow2009_paper19.pdf |volume=Vol-538 |dblpUrl=https://dblp.org/rec/conf/www/HalpinP09 }} ==An Ontology of Resources for Linked Data== https://ceur-ws.org/Vol-538/ldow2009_paper19.pdf
                  An Ontology of Resources for Linked Data

                           Harry Halpin                                         Valentina Presutti
          Institute for Communicating and Collaborative                  Semantic Technology Laboratory
                              Systems                                              ISTC-CNR
                       University of Edinburgh                             Via Nomentana 56, 00161
                         2 Buccleuch Place                                         Rome, Italy
                     Edinburgh, United Kingdom                             valentina.presutti@cnr.it
                       H.Halpin@ed.ac.uk

ABSTRACT                                                             The key feature of the Semantic Web is not its use of
The primary goal of the Semantic Web is to use URIs as            knowledge representation technologies like ontologies and in-
a universal space to name anything, expanding from using          ference per se, but the introduction of these technologies to
URIs for webpages to URIs for “real objects and imagi-            operate over Web resources as defined by URIs. Early Se-
nary concepts,” as phrased by Berners-Lee. This distinc-          mantic Web efforts forgot this, and treated URIs as just odd
tion has often been tied to the distinction between infor-        sorts of symbols. The Linked Data Tutorial provided a way
mation resources, like webpages and multimedia files, and         for putting Semantic Web technologies in harmony with Web
non-information resources, which are everything from real         architecture, and now Linked Data is experiencing amazing
people to abstract concepts like ‘the integers.’ Furthermore,     growth. Yet, there is still debate within Web architecture
the W3C has recommended not to use the same URI for               circles as to what the definition of an ‘information resource’
information resources and non-information resources, and          is, a term crucial to Linked Data, and how terms like this
several communities like the Linked Data initiative are de-       relate to the pre-Semantic Web hypertext Web. We model
ploying this principle. The definition put forward by the         the terms used in Linked Data and Web architecture using
W3C, that information resources are things whose “essen-          a lightweight formal ontology in OWL-DL, which we call
tial nature is information” is a difficult distinction at best.   IRW, for ‘Identity of Resources on the Web.’ The hope is
For example, would the text of Moby Dick be an information        this ontology will clarify these debates and allow further de-
resource? While this problem could safely be ignored up un-       velopment of a provenance-aware and semantically verified
til recently, with the rise of Linked Data and projects like      Linked Data Web.
OKKAM, it appears that this problem should be modelled            Before trying to figure out the difference between a ‘non-
formally. An ontology called IRW (Identity and Reference          information’ and ‘information’ resource, what is a resource?
on the Web) of various types of resources and their rela-         The W3C TAG state in their Architecture of the Web that
tionships, both for the hypertext Web and Linked Data, is         ‘resource’ is used in a general sense for whatever might be
presented. It builds upon Information Object Lite (an ex-         identified by a URI [?]. Previously, a resource was thought
tension of DOLCE Ultra Lite for describing information ob-        of as strictly to be for network-accessible objects such as
jects) and IRE (an earlier ontology), and aligns with other       webpages, since the term ‘resource’ is defined by Fielding in
work in this area. This ontology can be used as a tool to         the first HTTP RFC as “a network data object or service,
make Linked Data more self-describing and to allow infer-         identified by a URI”. However, Berners-Lee broadened the
ence to be used to test for membership in various classes of      concept of resource in his RFC 2396, stating that “a resource
resources.                                                        can be anything that has identity. Familiar examples include
                                                                  an electronic document, an image, a service (e.g., ‘today’s
                                                                  weather report for Los Angeles’), and a collection of other
Categories and Subject Descriptors                                resources. Not all resources are network ‘retrievable’; e.g.,
H.3.d [Information Technology and Systems]: Meta-                 human beings, corporations, and bound books in a library
data                                                              can also be considered resources” [?].
                                                                  One distinction that has been upheld by Hayes and others is
                                                                  the distinction between reference and access [?]. Making an
General Terms                                                     analogy between URIs and names, access means “that the
Knowledge Representation                                          name provides a causal pathway to the thing, perhaps me-
                                                                  diated by the Web” while reference means that “the name is
                                                                  being used to mention the thing,” which may or may not co-
Keywords                                                          incide with access [?]. Something is then ‘Web-accessible’ if
Linked Data, ontology, resource, Web architecture                 it can be accessed via the use of HTTP. This use of the term ‘re-
                                                                  source’ for both referring to non-Web accessible things and
                                                                  for naming Web-accessible things is continued in URI RFC
1.   INTRODUCTION                                                 3986, the current IETF RFC, which states that “this specifi-
                                                                  cation does not limit the scope of what might be a resource;
                                                                  rather, the term ‘resource’... likewise, abstract concepts can
Copyright is held by the author/owner(s).                         be resources, such as the operators and operands of a mathe-
LDOW 2009, April 20–24, 2009, Madrid, Spain.
ACM 978-1-60558-487-4/09/04.
matical equation, the types of a relationship (e.g., ‘parent’ or   a URI, since the Eiffel Tower itself is not an information
‘employee’), or numeric values (e.g., zero, one, and infinity)”    resource, no Web representations are directly available. In-
[?]. It is precisely this ability to name things with URIs that    stead, the agent gets a 303 See Other that in turn redirects
aren’t Web-accessible that defines both the Semantic Web           them to an information resource that hosts Web representa-
and Linked Data. However, unlike traditional Semantic Web          tions about the Eiffel Tower, such as
applications, Linked Data allows Web-accessible associated         http://dbpedia.org/page/Eiffel_Tower. When this URI
descriptions, in both machine and human-readable forms, to         returns the 200 status code in response to an HTTP GET
be accessed from a URI for a non-information resource.             request, the agent can infer that
The most obvious distinction is between a resource that            http://dbpedia.org/page/Eiffel_Tower is actually an in-
could in principle be Web-accessible, like a webpage, and          formation resource. The Semantic Web URI used to refer to
a resource that is not in principle Web-accessible, like the       the Eiffel Tower itself,
Eiffel Tower itself. This distinction is given by the W3C          http://dbpedia.org/resource/Eiffel_Tower, could be any
TAG as the distinction between an information resource and         kind of resource and so could be a non-information resource
something that may not be an information resource [?]. The         [?]. This example is illustrated in Figure ??, using terms
W3C TAG then define an information resource as some-               from the IRW ontology introduced in Section ??. An alter-
thing “whose essential characteristics can be conveyed in a        native to the 303 redirection is the hash convention, in which
message,” which is a controversial definition [?]. As noted        one uses the fragment identifier of a URI to get redirection
by the Linked Data tutorial, this implies there is another         ‘for free’ with smaller RDF vocabularies. If one wanted a
kind of resource, non-information resources, for things that       Semantic Web URI that referred to the Eiffel Tower itself
are not possibly Web-accessible, like a URI whose primary          without the hassle of a 303 redirection, one would use the
purpose is to refer to the Eiffel Tower [?]. Furthermore, one      URI http://www.tour-eiffel.fr/#it to refer to the Eiffel
can distinguish ‘Web resources’ (a subset of information re-       Tower itself. Since browsers either dispose of or treat the
sources) that are usually Web-accessible, such as web-pages,       fragment identifier as a fragment of a hypertext document
from things that simply carry information, like the text of        or some other Web representation, if an agent tries to access
Moby Dick, regardless of whether it is on the Web or not.          via HTTP GET a Semantic Web URI that uses the hash con-
Again, let us emphasize that some find these distinctions          vention, the server will not return a 404 Not Found status
very intuitive, while others do not. Lastly, in order to dis-      code, but instead will resolve to the URI before the hash,
tinguish URIs for non-accessible things on the Semantic Web        http://www.tour-eiffel.fr, which can then be a Web
(the ‘Cool URIs for the Semantic Web’) from the normal use         resource capable of returning Web representations, which is
of URIs on the hypertext Web, we call the former Seman-            called an ‘associated description’ in the Linked Data com-
tic Web URIs [?]. In Web architecture circles, what are            munity [?]. In this way, Semantic Web inference engines
typically called ‘webpages’ are just one kind of ‘represen-        can keep the Semantic Web URI that refers to the Eiffel
tation’ of a resource [?]. In order to distinguish the use of      Tower and an associated description about the Eiffel Tower
the word ‘representation’ in Web architecture circles from         separate by taking advantage of the predefined behaviour in
its normal usage, the word Web Representation is used in           web browsers. However, practically the 303 redirection of
this paper to designate a more encompassing notion of rep-         the W3C TAG and the hash convention leave the question
resentation of a resource, i.e. any set of bits that is ‘coming    of whether a resource is an information resource or non-
down the wire’ in response to the use of the Web.                  information resource indeterminate, since there is nothing
                                                                   to prevent 303 redirection from being used to redirect from
2.    LINKED DATA AND REDIRECTION                                  one information resource to another information resource,
                                                                   and the hash convention is dependent on media types, being
   Linked Data allows the access of associated descriptions        more often used for named parts in the document in HTML
from URIs for non-information resources by use of redirec-         instead of as a shortcut for distinguishing non-information
tion. This was codified by the W3C TAG when it officially          resources and their associated descriptions.
resolved httpRange-14 by saying that the 303 See Other
HTTP status code can serve to disambiguate between informa-
tion resources and possible non-information resources. The         3.   RELATED WORK
official resolution by the TAG is given below as [?]:                There has been some related work in this area. Mogul has
                                                                   suggested that there are fundamental disagreements about
     • If an HTTP resource responds to a GET request with          what precisely the difference between an HTTP entity and
       a 2xx response, then the resource identified by that        a “representation of a resource” is, and that this leads
       URI is an information resource;                             to widespread problems with caching implementations in
                                                                   HTTP [?]. David Booth has proposed an informal cate-
     • If an HTTP resource responds to a GET request with
                                                                   gorisation of what can be identified by a URI, noticing the
       a 303 (See Other) response, then the resource iden-
                                                                   confusion between ‘naming’ and ‘identifying’ and even ‘de-
       tified by that URI could be any resource;
                                                                   scribing’ [?]. Hayes has long attempted to elucidate the fun-
     • If an HTTP resource responds to a GET request with          damental difference between the use of resources to access
       a 4xx response, then the nature of the resource is un-      webpages and the use of a URI to refer to some non-Web
       known.                                                      accessible thing [?]. Furthermore, the use of URIs to refer
                                                                   to physical entities and the subsequent clarification of the
One concrete example would be an agent trying to access            direct reference position has led to the OKKAM project, a
a URI that refers to the Eiffel Tower itself,                      project to build a catalogue of ‘entity’ URIs that are supposed
http://dbpedia.org/resource/Eiffel_Tower. Upon attempt-            to directly refer to physical entities [?]. This general line of
ing to access that resource with an HTTP GET request on            thinking has led to a number of workshops at conferences
                                  Figure 1: 303 Redirection for Semantic Web URIs
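
As a minimal sketch of the TAG resolution illustrated in Figure 1, the following Python function maps the status code returned for an HTTP GET request to what an agent may infer about the identified resource. The function name and the returned labels are illustrative assumptions, not terms from IRW or from any W3C specification; status codes that the resolution does not mention are reported as not covered.

```python
def classify_by_status(status_code: int) -> str:
    """Return what the httpRange-14 resolution lets an agent infer.

    The labels are hypothetical; only the three cases come from the resolution.
    """
    if 200 <= status_code <= 299:
        # 2xx: the URI identifies an information resource.
        return "information resource"
    if status_code == 303:
        # 303 See Other: the URI could identify any kind of resource.
        return "any resource"
    if 400 <= status_code <= 499:
        # 4xx: the nature of the resource is unknown.
        return "unknown"
    # Other codes (e.g. 301, 500) are outside the scope of the resolution.
    return "not covered by the resolution"


# The DBpedia example: the 'resource' URI answers 303, the 'page' URI answers 200.
print(classify_by_status(303))  # the Eiffel Tower URI could be any resource
print(classify_by_status(200))  # the page about it is an information resource
```

Note that a 303 response does not prove that a URI identifies a non-information resource; it only leaves the question open, which is the indeterminacy discussed in Section 2.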


such as the World Wide Web Conference and the European           widely-known DOLCE foundational ontology and its exten-
Semantic Web Conference devoted to this topic [?, ?].            sion for describing information objects3 (IOL, described in
Within the W3C, there is an informal activity of the W3C         [?]), attempted to model some of these concepts earlier [?].
TAG called the ‘Architecture of the World Wide Semantic Web’ (AWWSW)  However, many aspects were not included in IRE, such as
that has for over a year attempted to decipher Web archi-        the distinctions between resources and their Web represen-
tecture, in part prompted by the need to model HTTP in           tations, or the concept of accessing a web-page via a web
RDF directly in order for HTTP transactions to be validated      server, that are crucial to the efforts within the W3C and
via EARL, the RDF-based Evaluation and Report Language           Web community, while many of the distinctions drawn by
used by the W3C to validate new W3C standards and de-            DUL+IOL were found to be too ‘heavy-weight’ for these
scribe test-cases [?, ?]. Yet, HTTP in RDF currently does        communities [?]. In response to these concerns, the IRE
not model the notion of ‘resource’ except with a misuse of       ontology has been evolved into the IRW ontology.
rdf:Alt, so it must be corrected by integrating an ontology
of resources like IRW. While both EARL and the AWWSW
are attempting a much more detailed and low-level descrip-       5.   THE IRW ONTOLOGY
tion of HTTP transactions than we attempt, the lightweight         The prefix irw: is for the namespace
IRW ontology described in this paper should allow specifi-       http://purl.org/NET/irw/ of the IRW ontology. The sta-
cations like HTTP in RDF to directly address the notion of       ble version of the ontology can also be accessed via its PURL.
a ‘resource.’                                                    The latest version of the IRW ontology may be accessed at:
                                                                 http://ontologydesignpatterns.org/ont/web/irw.owl.
4.     THE USE OF A FORMAL ONTOLOGY                              The prefix rdfs: is used for the RDF(S) namespace
                                                                 http://www.w3.org/2000/01/rdf-schema#. ir: is
  The primary use of a formal ontology in the context of         http://www.ontologydesignpatterns.org
Linked Data is to provide a foundation for the use of a com-     /cp/owl/informationrealization.owl. While the IRW ontol-
mon ontology to describe Linked Data and typical Linked          ogy in full cannot be explicated due to lack of space, the pri-
Data transactions, currently being done by different ontolo-     mary classes and properties are given in Figure ??. The
gies in Section ??. To this aim, IRW can be discussed,           IRW-related elements needed for the example of 303 redi-
reviewed, and commented on at the ontologydesignpatterns.org     rection are given in Figure ??. The IRW ontology starts
wiki1 . To serve the aim of elucidating arguments, additional    with irw:Resource. While this class expresses the same in-
modules of IRW have been developed and are briefly intro-        tuition as rdfs:Resource, we have defined it because this
duced in Section ??.                                             version of IRW is within OWL-DL expressivity. In OWL
There have been previous attempts to model at least a sub-       Full, this class is equivalent to rdfs:Resource.
set of the notions outlined in a formal ontology, but all lack
coverage of some crucial concepts. For example, while the
                                                                 Identification and reference.
ontology given by RDF Schema touches upon the vocabulary
                                                                    The notion of a URI is modeled as a class, irw:URI that
of resources via its term rdfs:Resource, it does not cover
                                                                 has exactly one value for the datatype property irw:hasURI
the distinction between information and non-information re-
                                                                 which specifies its value. Modelling URIs as a class al-
sources. The IRE (Identifiers, Resources, and Entities) ontology,
                                                                 lows us to talk about different kinds of URIs, such as IRIs
based on DOLCE Ultra Lite (DUL),2 a light version of the
                                                                 (Internationalized Resource Identifiers) and Semantic Web
1 http://ontologydesignpatterns.org/wiki/Submissions:IRW
2 http://www.loa-cnr.it/ontologies/DUL.owl
3 http://www.loa-cnr.it/ontologies/IOLite.owl
Figure 2: The IRW ontology illustrated as a graph. Rounded nodes are classes, while rectangular ones are
datatypes. Arcs ending with an empty triangle are rdfs:subClassOf relationships. Arcs ending with a filled
triangle are either object properties or datatype properties depending on the range node. Arcs’ direction
indicates the domain and range of the property. A ‘1’ associated to a property means it is functional, a ‘T’
means it is transitive, ‘1+’ means ‘at least one’. Prefixes are indicated only if different from irw:.
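
The subclass arcs of Figure 2 that are also named in the text of Section 5 can be sketched as a small lookup table, which makes the kind of membership inference the ontology is meant to support concrete. This is only an illustrative Python encoding under stated assumptions: the dictionary, the helper names, and the choice to list only the disjointness axioms mentioned in the text are ours, and the sketch is not a substitute for an OWL-DL reasoner over the actual IRW file.

```python
# Hypothetical encoding of the rdfs:subClassOf relationships named in the text.
SUBCLASS_OF = {
    "irw:InformationResource": "irw:Resource",
    "irw:NonInformationResource": "irw:Resource",
    "irw:WebResource": "irw:InformationResource",
    "ldow:AssociatedDescription": "irw:WebResource",
    "irw:PhysicalEntityResource": "irw:NonInformationResource",
    "irw:ConceptualResource": "irw:NonInformationResource",
    "irw:AbstractResource": "irw:NonInformationResource",
    "irw:SemanticWebURI": "irw:URI",
    "irw:WebRepresentation": "ir:InformationRealization",
}

# Disjointness stated in the text: the two main sub-classes of irw:Resource,
# and the three sub-classes of irw:NonInformationResource.
DISJOINT = {
    frozenset({"irw:InformationResource", "irw:NonInformationResource"}),
    frozenset({"irw:PhysicalEntityResource", "irw:ConceptualResource"}),
    frozenset({"irw:PhysicalEntityResource", "irw:AbstractResource"}),
    frozenset({"irw:ConceptualResource", "irw:AbstractResource"}),
}


def superclasses(cls):
    """Return cls followed by all of its ancestors, nearest first."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_subclass_of(cls, ancestor):
    """True if cls equals ancestor or inherits from it."""
    return ancestor in superclasses(cls)


def are_disjoint(a, b):
    """True if any pair of ancestors of a and b is declared disjoint."""
    return any(
        frozenset({x, y}) in DISJOINT
        for x in superclasses(a)
        for y in superclasses(b)
    )


print(is_subclass_of("ldow:AssociatedDescription", "irw:InformationResource"))
print(are_disjoint("irw:WebResource", "irw:AbstractResource"))
```

Under this encoding an associated description is inferred to be an information resource, and nothing can be both a Web resource and an abstract resource, because their ancestors are declared disjoint.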


URIs. According to some like Berners-Lee, URIs identify ex-      be used between just information resources that have noth-
actly one resource. This is modeled in IRW by the functional     ing to do with the Semantic Web, their domain and range
property irw:identifies, having range irw:Resource (and          says nothing about the type of resource. In order to model
inverse property, irw:isIdentifiedBy). Of course, those          explicitly the redirection, two distinct sub-properties of this
who disagree with this viewpoint may not use irw:identifies,     have been added in a TAG-specific module of IRW4 that con-
and so it is given sub-properties irw:accesses and irw:refersTo. tains a tag:redirects303To property and a tag:redirectsHashTo
The idea of reference as explicated by Hayes is modeled          property. Obviously, tag:redirects303To models the TAG’s
by the object property irw:refersTo (and inverse property,       ‘solution’ to httpRange-14 while tag:redirectsHashTo rep-
irw:isReferencedBy) [?]. One condition on this property is       resents the hash convention.
that the object of reference should be “immediately causally
disconnected” from its subject [?]. This is important, as ref-   Types of resources.
erence is the relationship to both URIs for non-information         Having defined reference and redirection, we can now cat-
resources like the Eiffel Tower or integers, but also applies    egorize resources. There are two main disjoint sub-classes of
to the relationship of an information resource to some non-      irw:Resource. The first subclass is given as
information resource, like the relationship of Tim Berners-      irw:InformationResource, which is an information object,
Lee’s homepage to Berners-Lee himself. So, the key point         such as a musical composition, a text, a word, or a picture.
is that URIs can identify resources, and some of these URIs      An information object is an object defined at a level of ab-
refer to non-information resources.                              straction, independently from how it is concretely realized.
                                                                 So an irw:InformationResource expresses the same intu-
Access and redirection.                                          ition and is an equivalent class to the DUL+IOL informa-
   Distinct from reference is the irw:accesses relationship,     tion object [?]. This means an information resource has, via
which is a causal connection to the thing identified. This       the ir:realizes property (with inverse ir:isRealizedBy),
is modelled again as a relationship between URIs and re-         at least one ir:InformationRealization, a concrete real-
sources, although it is transitive, unlike irw:refersTo. If      ization. This term is again imported from DUL+IOL [?].
a accesses b and b accesses c, then a accesses c (via b).        So an information resource’s “essential characteristics can
Although a wide notion, access allows us to model the typical    be conveyed in a single message” implies that everything
HTTP request-response Web transactions between a Web             from a bound book to an HTTP message can be a realiza-
client and a server. A URI may also have an irw:redirectsTo
property, a sub-property of irw:accesses, that we can use
to model HTTP redirection. However, since redirection can
4 http://www.ontologydesignpatterns.org/ont/web/tag2irw.owl associated with prefix tag:.
tion for an information resource [?]. Furthermore, the prop-      that relates a URI to a concrete Web server (inverse prop-
erty irw:isAbout (and inverse property, irw:isTopicOf) ex-        erty irw:resolvesTo). This irw:resolvesTo property is
presses the relationship of an information resource to a re-      currently implemented by mapping a URI to an IP address
source or resources the information is ‘about.’ Examples of       or addresses. So each irw:WebServer is the resolution of
this are descriptions of a resource using natural language or     at least one irw:URI. Additionally, a irw:WebServer has a
depictions of a resource using images. Information resources      irw:isLocationOf property with at least one
also can, but not necessarily, be identified (either accessed     irw:WebRepresentation (inverse property, locatedOn), in-
or referred to) with a URI. In this manner, the text of Moby      dicating the Web server concretely can respond to an HTTP
Dick can be an information resource since it could be con-        request with a particular Web Representation.
veyed as a single message in English, and can be realized by
both a particular book or a webpage containing that text.
Note irw:NonInformationResource complements                       Linked Data transactions.
irw:InformationResource, with which it is disjoint.                  The typical Linked Data transaction is also modeled. A
This class represents things that can not themselves – for        new sub-class of irw:URI, irw:SemanticWebURI, is given, where
whatever reason – be realized as a single digitally encoded       the Semantic Web URI has a constraint that it must have
message. A number of different kinds of things may be             at least one irw:redirectsTo property. In the Linked Data
irw:NonInformationResources. Since this concept is the            Initiative, another important kind of resource is “associated
cause of much confusion and debate, it is detailed with three     descriptions,” which are just Web resources that can be ac-
disjoint sub-classes. These kinds of IRW distinctions are         cessed via redirection from a Semantic Web URI [?]. For
not normative, as there are other possible plausible, more        example, in DBPedia6 the resource
detailed modeling choices. Our aim here is to communi-            dbpedia:/resource/Eiffel_Tower redirects to an associ-
cate the intuition behind the concepts of information and         ated description at dbpedia:/data/Eiffel_Tower, and to
non-information resources without entering the philosophi-        an HTML page at dbpedia:/page/Eiffel_Tower depend-
cal debate about top-level ontologies. IRW contains three         ing on the requested media type [?]. This scenario can be
sub-classes of irw:NonInformationResources:5                      generalized:
irw:PhysicalEntityResource, which is a resource that is ‘touch-   an irw:WebClient irw:requests an irw:SemanticWebURI x
able’ like physical people, artifacts, places, bodies, chemical   and the request is redirected (e.g. via hash or 303 redirec-
substances, and biological entities;                              tion) to another URI, where this second URI identifies an
irw:ConceptualResource, which refers to resources that are        ldow:AssociatedDescription,7 which has one irw:isAbout
created in a social process that can not be completely re-        property to a non-information resource. We model
alized digitally, such as legal entities, political entities, so- ldow:AssociatedDescription as a subclass of
cial relations, as well as the concept of horse and imaginary     irw:WebResource.
objects like unicorns; and finally irw:AbstractResource,
which refers to abstract combinatorial spaces that cannot be
located in space-time such as formal entities like functions or
                                                                  6. ALIGNING IRW TO OTHER ONTOLO-
the integers as well as more mundane resources like the in-             GIES
finite set of names that constitute the resource identified by       In this section, we present a number of suggested align-
URIs themselves. A sub-class of irw:InformationResource           ments, as given in Table 1. The alignments are to the three
is irw:WebResource, which is an information resource iden-        other primary ontologies: the HTTP in RDF ontology [?],
tified by at least one URI and realized by at least one           the IRE ontology, and an ontology for HTTP
irw:WebRepresentation, so that a Web resource is just an          used by the Tabulator Browser [?, ?]. The namespace for
information resource that is realized by at least one accessi-    ont is http://www.w3.org/2007/ont/http. IRE, due to its
ble Web representation like a web-page. irw:WebRepresentation     modular construction and re-use of terms from DUL+IOL
is a sub-class of irw:InformationRealization with con-            patterns, uses many namespaces, but they can be found at
straints added to make the cardinality of ir:isRealizedBy         http://www.ontologydesignpatterns.org/cpont/ire.owl.
and                                                               The http namespace is http://www.w3.org/2006/http#.
irw:isIdentifiedBy both at least 1. In this way IRW can
distinguish between a resource for the text of ‘Moby Dick’        7. APPLICATIONS
in general and a webpage about ‘Moby Dick.’
                                                                     There are several applications of this ontology. The first
                                                                  is to solve the problem noted earlier that currently Linked
Hypertext Web transactions.                                       Data resources are still not self-describing, such that there
   The typical hypertext Web transaction can be modelled          is no “definition, description, some other kind of indication
by IRW. We begin with irw:WebClient, which is some client         of what the identifier is intended to identify” on the level
in the context of the Web that can have a irw:requests rela-      of a resource [?]. If one gets a URI of Linked Data, how
tionship to a URI (note that irw:requests serves as a hook        can one record that it is for a non-information resource or an
to the alignment of IRW with HTTP in RDF [?]), as exem-           associated description, besides actually going to the URI
plified by a typical HTTP GET request. The irw:requests           and performing an HTTP GET? Then, how should one record
property is a sub-property of irw:accesses. An irw:WebClient
then irw:requests an irw:URI. We also introduce the class
irw:WebServer, which has an irw:isResolutionOf property
5 Note that the three classes do not constitute an exhaustive partition.
6 Prefix dbpedia: is used for the namespace http://dbpedia.org
7 Typical Linked Data terminology is represented in a specific module of IRW, represented here by the prefix ldow: referring to the namespace http://ontologydesignpatterns.org/ont/web/ldow2irw.owl
               Class or Property            Alignments
               irw:WebRepresentation        owl:equivalentClass http:Message
                                            owl:equivalentClass ont:ResponseMessage
                                            rdfs:subClassOf ire:InformationRealization
                                            rdfs:subClassOf ir:InformationRealization
               http:Content                 rdfs:subClassOf ir:InformationRealization
               http:MessageHeader           rdfs:subClassOf ir:InformationRealization
               irw:InformationResource      owl:equivalentClass ir:InformationObject
               irw:SemanticWebURI           ire:SemanticWebURI
               irw:identifies               ire:isExactProxyFor
               irw:isAbout                  ire:about

                                   Table 1: Mapping of IRW to Other Ontologies
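
As an illustrative sketch only, the alignments of Table 1 can be held as plain
triples and rendered in a Turtle-like form. The Python below is not part of
IRW; the names ALIGNMENTS, group_by_term and to_turtle_like are hypothetical.
It assumes that the three unlabelled rows under irw:WebRepresentation belong
to that class, and it leaves out prefix declarations because the namespace
URIs behind irw:, ir:, ire:, http: and ont: are not given here. Rows for which
the table names no relation are kept with a relation of None and are not
rendered as statements.

```python
from collections import defaultdict

# Rows of Table 1 as (term, relation, aligned term), using the prefixed
# names exactly as printed. A relation of None means the table lists only
# the aligned term without naming the relation.
# Assumption: the unlabelled rows under irw:WebRepresentation belong to it.
ALIGNMENTS = [
    ("irw:WebRepresentation", "owl:equivalentClass", "http:Message"),
    ("irw:WebRepresentation", "owl:equivalentClass", "ont:ResponseMessage"),
    ("irw:WebRepresentation", "rdfs:subClassOf", "ire:InformationRealization"),
    ("irw:WebRepresentation", "rdfs:subClassOf", "ir:InformationRealization"),
    ("http:Content", "rdfs:subClassOf", "ir:InformationRealization"),
    ("http:MessageHeader", "rdfs:subClassOf", "ir:InformationRealization"),
    ("irw:InformationResource", "owl:equivalentClass", "ir:InformationObject"),
    ("irw:SemanticWebURI", None, "ire:SemanticWebURI"),
    ("irw:identifies", None, "ire:isExactProxyFor"),
    ("irw:isAbout", None, "ire:about"),
]


def group_by_term(rows):
    """Group aligned terms by source term and relation, keeping table order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for term, relation, target in rows:
        grouped[term][relation].append(target)
    return grouped


def to_turtle_like(rows):
    """Render rows that have a stated relation as Turtle-style statements."""
    lines = []
    for term, relations in group_by_term(rows).items():
        stated = [(rel, objs) for rel, objs in relations.items() if rel is not None]
        if not stated:
            continue  # the table gives no relation for this term
        body = " ;\n    ".join(f"{rel} {', '.join(objs)}" for rel, objs in stated)
        lines.append(f"{term} {body} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_turtle_like(ALIGNMENTS))
```

Keeping the rows as data rather than as hand-written statements makes it easy
to check the table against an ontology file, should one be available.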


this provenance? The IRW ontology in turn allows semantic validation: it can
describe and infer in detail the types of resources that can be interacted
with via HTTP. This is useful for tools like EARL, which record the validation
of Web standards so that they can be implemented in a reliable fashion, and
for error-reporting on the Web in general and HTTP in particular [?]. One
facet of semantic validation is the description of Linked Data, where terms
like non-information resource and associated description become important.
This is useful for both semantic validation of Linked Data and Semantic Web
search engines [?].

7.1   Making Linked Data Self-Describing
   There would be a number of advantages if webpages that have RDF content
could distinguish themselves as such, in the same way that ‘valid’ HTML
documents are currently validated by W3C Validators and often mark themselves
with a graphic. First, this can be done by embedding an IRW statement in
RDF/XML documents, in RDF returned from SPARQL endpoints, and in RDFa or
GRDDL statements in XHTML or XML documents [?]. Ideally, this would be in
conjunction with some sort of graphical logo to distinguish the page as
‘Linked Data Enabled,’ as detecting the RDF statement, even in RDFa, is
difficult for humans. Second, for irw:NonInformationResources that are part of
the Linked Data and thus have no Web Representation to embed such a statement
in, or for resources whose actual Web representation cannot be changed or must
be changed en masse, such an RDF triple can be embedded directly in HTTP via
the use of the HTTP Link Header [?].

7.2   Semantic HTTP Validation
   For EARL, we can then use the inference not only to detect the presence of
Semantic Web URIs and Information Resources, but also to determine constraints
and contradictions. For example, one constraint that EARL is interested in is
whether namespace documents employ either the hash convention or 303
redirection, since according to the W3C, namespace resources are not
information resources but an abstract space of infinite names. According to
IRW, a namespace resource would be an irw:AbstractResource. This is because a
user can ‘mint’ a new namespace name in any RDF or XML document without
checking any namespace documents, and the namespace document has no ability to
constrain names, but only to recommend them. One obvious use-case is to check
every new namespace document and see if the namespace document can be reached
through an irw:redirectsTo from an irw:NonInformationResource.
   The same RDF record of which resources are Web resources or non-information
resources, together with their associated descriptions and media-types
(particularly RDF documents), is important information for any Semantic Web
search engine. The proposed Semantic Web Site-maps allow authors to publish
various characteristics of Semantic Web data, such as its update frequency and
preferred method of access via an HTTP response [?]. However, they also need
to express what kind of data it is. This is important, as Semantic Web search
engines currently often specialise in different types of Semantic Web
resources. For example, FALCON-S distinguishes between searching for what they
call objects (irw:PhysicalEntityResources) and concepts
(irw:ConceptualResources). Tools like Swoogle specialise in conceptual
resources, while the OKKAM project specialises in naming entities. By allowing
publishers to describe what kinds of Semantic Web resources they have, a
Semantic Web search engine can then specialise in searching for and displaying
different kinds of resources [?, ?]. Furthermore, the use of a Semantic Web
search engine that searches all kinds of RDF, like Sindice, along with a
large-scale inference engine like SAOR that could run an inference-based
reasoning algorithm against a large data-set, would allow the different kinds
of resources to be automatically annotated and categorised [?, ?].

7.3    Linked Data Metadata
   One use of IRW is to systematise the process of Linked Data validation.
Currently, the only Linked Data validator is Vapour, which is coded
procedurally and whose results cannot themselves be presented as RDF [?]. The
IRW and the HTTP in RDF vocabularies can be used to record whether or not each
Linked Data resource is properly redirected using 303 redirection, and the IRW
vocabulary can be used to make sure that the 303 redirection can lead to both
an associated description in HTML and one in RDF [?]. Any errors over large
Linked Data sets can then be collected and tested via SPARQL. Furthermore,
Linked Data publishers could add two RDF statements that let their associated
description be self-describing, solving the identity crisis in the context of
Linked Data, and possibly leading to less incorrect use of owl:sameAs. Just
embedding dbpedia:data/Eiffel_Tower irw:isAbout dbpedia:resource/Eiffel_Tower
would work. The following statement: dbpedia:data/Eiffel_Tower rdf:type
ldow:AssociatedDescription could be added, as
well as stating that dbpedia:resource/Eiffel_Tower is of type
irw:NonInformationResource or even irw:PhysicalEntityResource for clarity.
This class would be useful for determining whether or not the resource has a
property such as latitude or longitude, since concrete physical entities will
have them while concepts and abstract mathematical expressions will not.

8.    CONCLUSION AND FUTURE WORK
   Overall, the IRW ontology is a beginning, yet it should serve as a
foundational contribution to modelling Linked Data, and so to the “Dark Side
of the Semantic Web” that Hendler believes may give the Semantic Web a crucial
advantage over previous efforts in knowledge representation [?]. IRW clarifies
the interactions between the hypertext Web and Linked Data, allowing Linked
Data spiders to keep track of important provenance regarding the identity of
resources, and to characterise the resources correctly for semantic validation
and error detection. Future work needs to be done to standardise IRW or a
descendant thereof through the W3C, which will doubtless result in refinements
to IRW, and to encourage its use within the Linked Data community in the
context of various validators, debuggers, and search engines. By developing a
consistent vocabulary for describing the identity of resources in IRW, the
first step has been taken.

9.    ACKNOWLEDGEMENTS
  We would like to thank Aldo Gangemi for his insightful comments. Also, Harry
Halpin was partially supported by a Microsoft ‘Beyond Search’ award. Valentina
Presutti was supported by the NeOn and IKS EU FP7 projects.

10.   REFERENCES
 [1] S. Abou-Zahra. Evaluation and Report Language (EARL) 1.0 Schema. W3C
     Working Draft, W3C, 2007. http://www.w3.org/TR/EARL10-Schema/.
 [2] B. Adida, M. Birbeck, S. McCarron, and S. Pemberton. RDFa in XHTML:
     Syntax and Processing. W3C Recommendation, W3C, 2008.
     http://www.w3.org/TR/rdfa-syntax/.
 [3] A. Hogan, A. Harth, and A. Polleres. SAOR: Authoritative reasoning for
     the web. In Proceedings of the Asian Semantic Web Conference (ASWC2008),
     pages 76–90, Bangkok, Thailand, 2008.
 [4] S. Auer, C. Bizer, J. Lehmann, G. Kobilarov, R. Cyganiak, and Z. Ives.
     DBpedia: A nucleus for a web of open data. In Proceedings of the
     International Semantic Web Conference and Asian Semantic Web Conference
     (ISWC/ASWC2007), pages 718–728, Busan, Korea, 2007.
 [5] T. Berners-Lee, R. Fielding, and L. Masinter. IETF RFC 2396 Uniform
     Resource Identifier (URI): Generic Syntax, 1998.
     http://www.ietf.org/rfc/rfc2396.txt (Last accessed on Sept. 15th 2008).
 [6] T. Berners-Lee, R. Fielding, and L. Masinter. IETF RFC 3986 Uniform
     Resource Identifier (URI): Generic Syntax, January 2005.
     http://www.ietf.org/rfc/rfc3986.txt (Last accessed on April 2nd 2008).
 [7] T. Berners-Lee, J. Hollenbach, K. Lu, J. Presbrey, E. Prud’hommeaux, and
     mc schraefel. Tabulator Redux: Browsing and Writing Linked Data. In
     Proceedings of the WWW2008 Workshop on Linked Data on the Web, 2008.
 [8] C. Bizer, R. Cyganiak, and T. Heath. How to publish Linked Data on the
     Web, 2007.
     http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ (Last
     accessed on May 28th 2008).
 [9] D. Booth. URIs and the myth of resource identity. In Proceedings of
     Identity, Reference, and the Web Workshop at the WWW Conference, 2006.
     http://www.ibiblio.org/hhalpin/irw2006/dbooth.pdf.
[10] P. Bouquet, H. Stoermer, and D. Giacomuzzi. OKKAM: Enabling a Web of
     Entities. In i3: Identity, Identifiers, Identification. Proceedings of
     the WWW2007 Workshop on Entity-Centric Approaches to Information and
     Knowledge Management on the Web, Banff, Canada, May 8, 2007, CEUR
     Workshop Proceedings, ISSN 1613-0073, May 2007.
     http://CEUR-WS.org/Vol-249/submission_150.pdf.
[11] P. Bouquet, H. Stoermer, G. Tummarello, and H. Halpin, editors.
     Proceedings of the WWW2007 Workshop I3: Identity, Identifiers,
     Identification, Entity-Centric Approaches to Information and Knowledge
     Management on the Web, Banff, Canada, May 8, 2007, CEUR Workshop
     Proceedings. CEUR-WS.org, 2007.
[12] P. Bouquet, H. Stoermer, G. Tummarello, and H. Halpin, editors.
     Proceedings of the ESWC2008 Workshop on Identity, Reference, and the
     Web, Tenerife, Spain, June 1st, 2008, CEUR Workshop Proceedings, 2008.
[13] D. Connolly. A pragmatic theory of reference for the web. In Proceedings
     of Identity, Reference, and the Web Workshop at the WWW Conference, 2006.
     http://www.ibiblio.org/hhalpin/irw2006/dconnolly2006.pdf (Last accessed
     November 22nd 2008).
[14] R. Cyganiak, H. Stenzhorn, R. Delbru, S. Decker, and G. Tummarello.
     Semantic sitemaps: Efficient and flexible access to datasets on the
     semantic web. In Proceedings of European Semantic Web Conference, pages
     690–704, 2008.
[15] D. Berrueta, S. Fernández, and I. Frade. Cooking HTTP content negotiation
     with Vapour. In Proceedings of Identity, Reference, and the Semantic Web
     Workshop at the European Semantic Web Conference, 2008.
[16] L. Ding, T. Finin, A. Joshi, R. Pan, R. S. Cost, Y. Peng, P. Reddivari,
     V. C. Doshi, and J. Sachs. Swoogle: A Search and Metadata Engine for the
     Semantic Web. In Proceedings of the Thirteenth ACM Conference on
     Information and Knowledge Management. ACM Press, November 2004.
[17] A. Gangemi. Norms and plans as unification criteria for social
     collectives. Journal of Autonomous Agents and Multi-Agent Systems, 16(3),
     2008.
[18] A. Gangemi, N. Guarino, C. Masolo, R. Oltramari, and L. Schneider.
     Sweetening ontologies with DOLCE. In Proceedings of International
     Conference on Knowledge Engineering and Knowledge Management. Ontologies
     and the Semantic Web, pages 166–181. Springer, 2002.
[19] P. Hayes and H. Halpin. In defense of ambiguity. International Journal of
     Semantic Web and Information Systems, 4(2):1–18, 2008.
[20] J. Hendler. The Dark Side of the Semantic Web. IEEE Intelligent Systems,
     22(1):2–4, 2007.
[21] I. Jacobs and N. Walsh. Architecture of the World Wide Web. Technical
     report, W3C, 2004. http://www.w3.org/TR/webarch/ (Last accessed Oct 12th
     2008).
[22] J. Koch, C. A. Velasco, and S. Abou-Zahra. HTTP Vocabulary in RDF. W3C
     Working Draft, W3C, 2008. http://www.w3.org/TR/HTTP-in-RDF/.
[23] J. Mogul. Clarifying the fundamentals of HTTP. In Proceedings of the 11th
     International World Wide Web Conference, pages 444–457, 2002.
[24] M. Nottingham. IETF Internet Draft HTTP Header
     Linking, 2008.
     http://www.mnot.net/drafts/draft-nottingham-http-link-header-01.txt.
[25] E. Oren, R. Delbru, M. Catasta, R. Cyganiak,
     H. Stenzhorn, and G. Tummarello. Sindice.com: a
     document-oriented lookup index for open linked data.
     International Journal of Metadata, Semantics, and
     Ontologies, 3(1):37–52, 2008.
[26] S. Pepper. The case for published subjects. In
     Proceedings of Identity, Reference,
     and the Web Workshop at the WWW Conference, 2006.
     http://www.ibiblio.org/hhalpin/irw2006/spepper2.pdf.
[27] V. Presutti and A. Gangemi. Identity of resources and
     entities on the web. International Journal of Semantic
     Web and Information Systems, 4(2):49–72, 2008.
[28] L. Sauermann and R. Cyganiak. Cool URIs for the
     Semantic Web. W3C Note, W3C, 2008.
     http://www.w3.org/TR/cooluris/.