<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>When owl:sameAs isn't the Same: An Analysis of Identity Links on the Semantic Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harry Halpin</string-name>
          <email>H.Halpin@ed.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick J. Hayes</string-name>
          <email>phayes@ihmc.us</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Communicating and Collaborative Systems</institution>
          ,
          <institution>University of Edinburgh</institution>
          ,
          <addr-line>2 Buccleuch Place, Edinburgh</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Human and Machine Cognition</institution>
          ,
          <addr-line>40 South Alcaniz St., Pensacola</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <abstract>
        <p>In Linked Data, the use of owl:sameAs is ubiquitous in 'inter-linking' data-sets. However, there is a lurking suspicion within the Linked Data community that this use of owl:sameAs may be somehow incorrect, in particular with regard to its interactions with inference. In fact, owl:sameAs can be considered just one type of 'identity link,' a link that declares two items to be identical in some fashion. After reviewing the definitions and history of the problem of identity in philosophy and knowledge representation, we outline four alternative readings of owl:sameAs, showing with examples how it is being (ab)used on the Web of data. Then we present possible solutions to this problem by introducing alternative identity links that rely on named graphs.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Data</kwd>
        <kwd>ontology</kwd>
        <kwd>resource</kwd>
        <kwd>Web architecture</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>As large numbers of independently developed data-sets
have been introduced to the Web as Linked Data, the
vexing problem of identity has returned with a vengeance to the
Semantic Web. Because the ubiquitous owl:sameAs property is
the RDF property used to connect these data-sets, this has
been dubbed the owl:sameAs problem. However, the
problem of identity does not originate in Linked Data or in the
Semantic Web languages; it is an outstanding and
well-known (if sometimes not precisely labeled) issue in
pre-Semantic Web knowledge representation languages in
artificial intelligence. What is new in the latest guise
of this problem on the Web of Linked Data is that this is
the first time the problem is being encountered by different
individuals attempting to independently knit their
knowledge representations together using the same standardized
language. Much of the supposed “crisis” over the
proliferation of owl:sameAs in Linked Data can be traced to the fact
that these uses of owl:sameAs tend to be mutually
incompatible, and almost always violate the rather strict logical
semantics of identity demanded by owl:sameAs. However,
the exact types of distinctions made by these individuals are
important, even if they contradict the relevant specification
of owl:sameAs. First, these uses and abuses of owl:sameAs
demonstrate for the first time in the history of knowledge
representation precisely how these problems play out in the
wild. Second, as the Semantic Web is a project still in
development, it is always possible to specify new
kinds of language constructs and more clearly specified best
practices to align the specifications with the actual empirical
use of the Semantic Web in the wild.</p>
      <p>First, we will give an overview of the problem of
identity and its somewhat dusty lineage in artificial intelligence,
if only to show how what was already a known issue for
knowledge representation becomes even more exacerbated
when knowledge representation goes global for the Semantic
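Web. Then, four distinct uses of owl:sameAs are discussed
in addition to the precise idea of “same thing as,” namely:
Same Thing As But Referentially Opaque, Same Thing As But
Different Context, Represents, and Very Similar To.</p>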
      <p>Finally, a number of suggestions for how the current
situation can be improved are sketched. The necessity of both
semantic and theoretical work is discussed as well.</p>
    </sec>
    <sec id="sec-2">
      <title>A BRIEF HISTORY OF IDENTITY</title>
      <p>The problem of identity has a long and chequered history,
spreading from philosophy and mathematics to linguistics
and knowledge representation. In each of these fields, what
it means for two things to be identical goes straight to the
heart of semantics.</p>
    </sec>
    <sec id="sec-3">
      <title>What is Identity?</title>
      <p>
        The father of knowledge representation, Leibnitz, was, not
surprisingly, also the first to phrase a coherent and
formalizable definition of identity, often called ‘Leibnitz’s Law’ or
‘The Identity of Indiscernibles,’ namely that for every x
and every y, if whatever is true of x is true of y, then x is
identical to y [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This notion of identity states that identity is
composed of properties, so that in order for two things to be
the same they must share all the same properties. This law
can then be stated logically as ∀x∀y(∀P.(P(x) ↔ P(y)) →
x = y). Conversely, if x = y, then they are the same thing, so of course
all the properties of x are also properties of y: there is
only one thing there, to either have the property or not have
it.
      </p>
      <p>A number of classical problems already crop up in this
analysis of identity. First, exactly what properties are
being counted? Obviously, we can imagine worlds where things
have the exact same properties but are nonetheless not
identical, such as an exact clone (which is all too easy with
digital objects). There are two obvious escape hatches: a
thing may have a property of “being itself” (a
haecceity), or things having different temporal-spatial co-ordinates
could be counted as different, even if they share the rest
of their properties in common. While that sounds like a
common-sense distinction, is it true that Tim Berners-Lee
is the same person he was when he was a child? Or if he lost
his arm? This leads straight into arguments about
perdurance and endurance in philosophy. Are there two different
kinds of properties, properties that are somehow intrinsic
to identity and others that are extrinsic, i.e. purely
relative to other things? Lastly, perhaps the real question is
who or what determines the conditions of identity, namely
that judgments of identity can only be made in the context of an explicit
theory of identity criteria. These theories can be formalized in
terms of the semantic interpretation of sentences to
referents. However, this does not mean that these theories are
compatible. If one has a theory of identity where one is
talking only about people as employees in a particular role, then
two different people who have the same job will be
identical, but if one has a more fine-grained interpretation the
very same people would be different. One can even imagine
theories of identity based on different criteria, where some
theories of identity subsume weaker or stronger ones. There
is also the problem of vagueness, and the inability to specify
precise properties (such as the exact latitude and longitude
boundaries of Mount Everest). Regardless of the problems,
the point of Leibnitz’s Law is clear: When someone says
two things are the same, they mean they share all the same
properties.</p>
      <p>
        Frege was the first to note what Galois called the
“linguistic counterpart” to Leibnitz’s Law: the Principle of
Substitution, which states that if x and y are identical, then x
may be substituted for y without changing the truth-value
of the sentence in which the substitution is made [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Formally, this is generally phrased in precisely the same manner
as Leibnitz’s Law. However, there is a subtle difference, for
while Leibnitz’s Law deals with the identity of concepts and
individuals on the level of properties, the Principle of
Substitution deals with the use of the name itself in the
context of actual sentences.
      </p>
      <p>
        There are two important consequences. The first is the
classic division between the use and the mention of a name.
Even if “Mount Everest” in English and “Qomolangma”
in Tibetan mean the same thing, names from different
languages often cannot be substituted for each other. Sentences
like “Qomolangma is a word in Tibetan” mention a name,
while the sentence “Mount Everest is the highest mountain
in the world” uses the name. Obviously, “Mount Everest is
a word in Tibetan” is false. Is “Qomolangma is the highest
mountain in the world” true? This fact was not necessarily
known in Tibet before the era of global geological surveys.
So one could easily have a case of a geoscientist who has
never visited the mountain knowing it is the highest
mountain in the world and a Tibetan monk who lives not too far
from the mountain not knowing - or caring - if it is the
highest mountain in the world. This has been described as the
distinction between two names having the same referent but
different senses, i.e. contexts that do or do not share certain
information [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Contexts where a name can be
substituted are often called extensional, while referentially opaque
contexts where a name cannot be substituted are called intensional.
In general, indirect quotations and statements of belief, such
as “Rajendra Pachauri believes the glaciers on Mount
Everest are melting,” are considered opaque. Although in
practice the principle of substitution is subtle and its use often
fraught with confusion, the key point is straightforward: A
name can identify different things in different contexts.
      </p>
    </sec>
    <sec id="sec-4">
      <title>The IS-A Debate in Semantic Networks</title>
      <p>
        It would be easy to dismiss these arguments over identity
as being mandarin philosophical questions, until the ‘pedal
hits the metal’ in the world of knowledge representation.
This is precisely what happened to semantic networks, a
predecessor of the Semantic Web in knowledge representation.
Semantic networks, as pioneered by Quillian, were viewed
as an alternative knowledge representation scheme to
first-order logic in the early days of artificial intelligence [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In
essence, a semantic network is similar to an RDF graph
except that instead of using URIs, the nodes and edges were
labeled purely using natural language or pseudo-natural
language labels.
      </p>
      <p>
        Semantic networks, by relying on words from natural
language or pseudo-words to label their constructs, whose
meanings were somehow supposed to be simply obvious, actually
led to these constructs being ambiguous. The classic
example was the infamous IS-A label used by Brachman in his
What IS-A is and isn’t. An Analysis of Taxonomic Links in
Semantic Networks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Often, two nodes were connected by
an IS-A link. Were IS-A links assertional, such that
somehow two nodes connected by an IS-A were identical? Or
were they taxonomic, such that they meant a sub-class or
subset relationship? Or a structural relationship between a
concept and an instance? Brachman found that there was
a proliferation of meanings of IS-A links in
semantic networks, and that not only were they incompatible
between different semantic networks, but that even within a
single network IS-A links were often given different meanings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Given this lack of clarity about
exactly what was being represented by the knowledge
representation, semantic networks could not be transferred or
combined with each other with any degree of reliability.
      </p>
      <p>
        In an effort to remedy this crisis, Brachman and others
decided to split what they called the “epistemological level”
(the kinds of nodes and edges that remained neutral to the
underlying primitives yet could be given a specific semantics)
from the rest of the semantic network, whose meaning could
only be grounded in some linguistic convention [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These
logical constructs could be given a formal semantics (and
thus a model theory) by mapping them to a language with a
well-defined semantics, such as first-order predicate calculus.
Therefore, semantic networks could be considered just an
intuitive (or slightly odd, depending on your preferences)
notation for logic.
      </p>
      <p>
        The Semantic Web seems to have learned from semantic
networks. The formal semantics of RDF are important
precisely because RDF statements can be given the same logical
meaning uniformly across a distributed network, even if the
semantics of RDF have relatively light inferential power and
do not constrain the semantic interpretations very tightly
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This is also on purpose, as it allows RDF to be - in
theory - used as a foundation or “glue” for other more
constrained vocabularies. Furthermore, it was precisely the
explorations of the semantics of “semantic” networks that led
to description logic, and so OWL. By giving OWL and RDF
a formal semantics - albeit a very limited one - it was
imagined that the Semantic Web would not repeat the mistakes
of semantic networks.
      </p>
    </sec>
    <sec id="sec-5">
      <title>THE IDENTITY CRISIS OF LINKED DATA</title>
      <p>Contrary to popular belief in some circles, formal
semantics are not a silver bullet. Just because a construct in a
knowledge representation language is prescribed a behavior
using formal semantics does not necessarily mean that
people will follow those semantics when actually using that
language “in the wild.” This can be put down to a wide variety
of reasons. In particular, the language may not provide the
facilities needed by people as they actually try to encode
knowledge, so they may use a construct that seems close
enough to their desired one. A combination of not reading
specifications - especially formal semantics, which even most
software developers and engineers lack training in - and the
labeling of constructs with “English-like” mnemonics
naturally will lead to the use of a knowledge representation
language by actual users that varies from what its designers
intended. In decentralized systems like the Semantic Web,
this problem is naturally exacerbated. However, far from
being a sign of abuse, it is a sign of success, as it means
that the Semantic Web is actually being deployed outside
academia and research labs.</p>
      <p>
        The problem has definitely arisen on the Semantic Web
in terms of the use of owl:sameAs in Linked Data. In
Linked Data, each item of interest is given a URI, which in
turn redirects to either human-readable HTML or
machine-readable RDF depending on content negotiation. The
item itself is called, rather confusingly, a
“non-information resource” in Linked Data circles, whereas a web-page
or RDF graph would be an information resource, since the
“distinguishing characteristic of these resources is that all
of their essential characteristics can be conveyed in a
message” [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Usually, this data is released in some sort of
automated or semi-automated manner, often by mapping
relational data to RDF. Somehow, a URI is chosen for each
identifier in the data-set that is exported in Linked Data.
Although the general thinking in RDF (and thus, the main
idea behind the ability of RDF graph merge) was that URIs
would be re-used, in practice URIs are simply minted anew
for each identifier in a Linked Data set. As opposed to the
simple exporting of data-sets into RDF, what puts the links
in Linked Data is the use of what we term identity links
- links that define two things to be identical or otherwise
closely related - to link between diverse and heterogeneous
data-sets. While there has been some research that deals
with this problem [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the scope of the problem is just
beginning to be understood.
      </p>
      <p>
        The most typical link used is owl:sameAs, which is in
general used to say “that two URI references actually refer
to the same thing” [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For example, the city of Paris is
referenced in a number of different Linked Data-sets,
ranging from OpenCyc to the New York Times. In DBpedia, a
Linked Data export of Wikipedia, these data-sets are
connected by owl:sameAs. In particular, dbpedia:Paris is declared owl:sameAs
both opencyc:CityOfParisFrance and
opencyc:Paris DepartmentFrance. OpenCyc distinguishes
the department of Paris from Paris itself, as Paris DepartmentFrance
is a distinct geopolitical entity from CityOfParisFrance
despite the fact that both share the same territory, while Wikipedia
does not make this distinction.
      </p>
    </sec>
    <sec id="sec-7">
      <title>THE SEMANTICS OF OWL:SAMEAS AND ALTERNATIVES</title>
      <p>
        At first, this use of owl:sameAs seems to be harmless.
Its actual definition is that “the built-in OWL property
owl:sameAs links an individual to an individual” and “Such
an owl:sameAs statement indicates that two URI references
actually refer to the same thing: the individuals have the
same identity” [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Furthermore, OWL states that “It is
unrealistic to assume everybody will use the same name to
refer to individuals. That would require some grand design,
which is contrary to the spirit of the web” [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>However, owl:sameAs does have a particular semantics of
individual identity, namely that the two individuals are
exactly the same and so share all the same properties, and thus
are equivalent in terms of Leibnitz’s identity of
indiscernibles. Given that OWL has no unique name assumption,
once owl:sameAs is applied to two different
URIs, any statement made about one URI is
true of every other URI connected to it by an owl:sameAs link.
Furthermore, while in OWL Full owl:sameAs can
hold between any URIs, since classes can be
considered “individual” instances of other classes and
properties can be considered individuals, in OWL DL
individuals are strictly separated from classes in order to
preserve decidability, and so one should use owl:equivalentClass and
owl:equivalentProperty instead. Therefore, quick-and-dirty use
of owl:sameAs will almost always lead to OWL Full, for which
very little work has been done on efficient
implementations of inference. The real trick with owl:sameAs
is that it works both ways: as it is both symmetric and
transitive, anyone can link to your data-set with
owl:sameAs from anywhere else on the Web without your
permission, and any statement they make about their own
URI will immediately apply to yours. As one can imagine, such
transitive closures can quickly become very large. There
have been considerable rumors in the Linked Data
community that such use of owl:sameAs is somehow “wrong”
with regard to the formal semantics of OWL. Intuitively, it does seem
that the use of owl:sameAs may be the logical
equivalent of a bulldozer. Since inference is rarely used on
Linked Data, these possible side-effects have not been
noticed. Does this really matter? Is the use of owl:sameAs
an exploding time-bomb for Linked Data, or a harmless
convention? What exactly is the point of linking data if nobody
is going to draw any conclusions which use the links?</p>
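      <p>The propagation just described can be made concrete with a small
sketch. The Python below is not an OWL reasoner and is not taken from any
of the data-sets discussed; it only assumes that owl:sameAs is reflexive,
symmetric and transitive and that statements about one URI hold of every
URI in the same equivalence class. The prefixed names are plain
illustrative strings, and the two rdf:type statements with the ex:City and
ex:Department classes are hypothetical data invented for the example.</p>
      <preformat>
```python
from collections import defaultdict

# Identity links from the running Paris example; the prefixed names are
# illustrative strings, not resolved URIs.
SAME_AS = [
    ("dbpedia:Paris", "opencyc:CityOfParisFrance"),
    ("dbpedia:Paris", "opencyc:ParisDepartmentFrance"),
]

# Hypothetical statements, invented for illustration only.
STATEMENTS = [
    ("opencyc:CityOfParisFrance", "rdf:type", "ex:City"),
    ("opencyc:ParisDepartmentFrance", "rdf:type", "ex:Department"),
]


def same_as_classes(links):
    """Group URIs into equivalence classes: the reflexive, symmetric and
    transitive closure of the given links, computed with union-find."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in links:
        parent[find(left)] = find(right)

    classes = defaultdict(set)
    for uri in list(parent):
        classes[find(uri)].add(uri)
    return list(classes.values())


def propagate(statements, links):
    """Copy every statement to all URIs in the subject's equivalence class
    (subject position only, to keep the sketch short)."""
    members = {}
    for cls in same_as_classes(links):
        for uri in cls:
            members[uri] = cls
    inferred = set()
    for subject, predicate, obj in statements:
        for alias in members.get(subject, {subject}):
            inferred.add((alias, predicate, obj))
    return inferred


inferred = propagate(STATEMENTS, SAME_AS)
for triple in sorted(inferred):
    print(triple)
```
      </preformat>
      <p>With these toy inputs, two links and two statements yield six
statements, and the city ends up typed as a department (and vice versa)
even though nobody asserted a link between the two OpenCyc names
directly.</p>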
    </sec>
    <sec id="sec-8">
      <title>FOUR VARIATIONS OF IDENTITY IN LINKED DATA</title>
    </sec>
    <sec id="sec-10">
      <title>Same Thing As But Referentially Opaque</title>
      <p>The first case is when the two URIs do refer to the same
thing, but not all the properties ascribed to one URI are
necessarily accepted by the other. This means that the use of
the URI is referentially opaque, which means that one URI
cannot be substituted for another (the Principle of
Substitution is violated), i.e. the context is intensional. A classic
example of this would be the concept of sodium in
DBpedia, which has an owl:sameAs link to the concept of sodium
in OpenCyc. The OpenCyc ontology says that an element
is the set (class) of all pieces of the pure element, so that for
example sodium in Cyc has a member which is the lump of
pure metallic sodium. On the other hand, sodium as defined
by DBpedia is used to also include isotopes, which have
different numbers of neutrons than “standard” sodium. So, one
should not state the number of neutrons in DBpedia’s use of
sodium, but one can with OpenCyc. Therefore, owl:sameAs
here is in error, as it does not allow mutual substitutivity.
Indeed, this use of URIs in an opaque referential context
is likely what most uses of owl:sameAs actually are for, as
it is unlikely that most deployers of Linked Data actually
check whether or not all the properties and their associated
inferences are shared amongst linked data-sets. This
property is exceedingly important for Linked Data, as contrary
to popular doctrine, it is possible that the Web is full of
referentially opaque contexts. The problem is that there is currently no way
to get a handle on contexts informally without descending
into non-logical reasoning.</p>
    </sec>
    <sec id="sec-11">
      <title>Same Thing As But Different Context</title>
      <p>In this case, two URIs do refer to the same thing and all
properties do hold of both URIs, but we cannot re-use
the URI in a different context. The central intuition here
is that there are ‘forms of reference’ appropriate to a context,
especially in social contexts. To use an example from Lynn
Stein, when at a meeting of the PTA (Parent-Teacher
Association) she is Ms. Stein, Rachel’s mum, not Professor Stein
of MIT. This does not mean that in the PTA meeting Ms.
Stein is somehow not a professor, but that within that
context those properties do not matter. At first, this distinction
may not seem directly relevant to linked data, provided we
keep ‘name’ in the social sense distinct from ‘identifier’ in
the Web sense. However, this distinction raises other issues
about what kind of ‘names’ URIs really are and precisely
why certain properties for linked data are given in the RDF
description of a certain URI and others are not.</p>
    </sec>
    <sec id="sec-12">
      <title>Represents</title>
      <p>Often identity is conflated with representation. While the
term “representation” is often very contentious, its intuitive
definition is that, just as a picture of something depicts
something, a URI can be for a representation of a thing
rather than the thing itself. Intuitively, there seems to be
a clear-cut line between that which represents something
(the representation) and that which is represented (the
referent), sometimes called the relationship between a “sign”
and a “signifier.” However, the relationship is often not as
clear-cut as we would lead ourselves to believe. In fact, in
human natural language use-mention confusions are
ubiquitous and often useful. For example, often a web-page or an
e-mail address is used to refer to a person. Rather than
yell at the world to get an education in philosophical logic,
it may be better to clarify this relationship. It also might be
worth distinguishing between using a representation to refer
to the represented, such as using a picture of Berners-Lee to
refer to Tim Berners-Lee himself, and using something
accidentally or contextually to refer to something, a phenomenon
called displaced reference. The example of using an e-mail
box to refer to a person is not so much an error as a
displaced reference.</p>
    </sec>
    <sec>
      <title>Very Similar To</title>
      <p>Sometimes it is clear that two things are not identical but
simply closely related in some manner. This, for example, is
the relationship between the district of Paris and the
Department of Paris in Cyc. Furthermore, there are often complex,
structured, yet hard-to-specify relationships between things,
such as the relationship between isotopes and elements, the
quantity and a measurement of a quantity, and an image
and a facsimile of that image. In web architecture, it is
clear there is a close relationship These relationships that
are ‘very similar to’ seem to deserve their own property, but
are often currently lumped together in Linked Data under
the all-encompassing use of owl:sameAs. Most of the more
noticeable errors of owl:sameAs seem to come from this
category, and it is likely that examples such as the relationship
of sodium within DBpedia to sodium in OpenCyc are of this
kind as well.</p>
    </sec>
    <sec id="sec-13">
      <title>MOVING FORWARD</title>
    </sec>
    <sec id="sec-14">
      <title>Same Thing As But Referentially Opaque</title>
      <p>
        Surprisingly, most of the time when people use owl:sameAs they
are accidentally performing a sort of implicit import of
statements about the subject of the owl:sameAs statement.
Obviously, to address the weaker identity implied by the
referentially opaque use of identity, a weaker version of owl:sameAs
should be specified that does not import all the properties in
a full transitive closure. Somewhat similar predicates already
exist in SKOS as skos:exactMatch and skos:closeMatch, but
their use seems rare in Linked Data [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and they require
domains and ranges of SKOS concepts. As most Linked Data
does not actually do much inference, one in reality only
imports the statements that are actually used. So one could continue
using owl:sameAs with a kind of ‘importer beware’
principle. Informally, it is one thing to link to your URI, but it is
another thing to believe what you say about it as though you
were talking about my URI. Put another way, one should
be wary of accepting conclusions over here that could have
been drawn over there, so to speak.
      </p>
    </sec>
    <sec id="sec-15">
      <title>Same Thing As But Different Context</title>
      <p>
        There is already a notion of context built into RDF, namely
named graphs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Even though they are not part of the official
standard (albeit snuck into RDF through SPARQL and
implemented in almost every tool-set), it is clear that part of
the problem with owl:sameAs usage on the Semantic Web is
that sameAs should not always be a statement between two
URIs in an unqualified manner, but may be qualified as
holding only within a certain named graph. Furthermore, note
that the use of owl:sameAs is somewhat equivalent to
an accidental usage of owl:imports, although the exact
behavior of this construct has only been intuitively (although
not formally) specified. These implicit imports should
probably be separated, so that one states at first that two
items are identical using the weaker form of identity given
above, and then independently, if one feels strongly
that the two URIs are not referentially opaque, one imports
all (or even some of) the associated properties of the
“identical” resource. Furthermore, there could be an inverse of this
implicit importing of identity, where statements that would otherwise be
imported due to the transitive closure of owl:sameAs are explicitly not
imported. This would allow a more fine-grained measure of
control over the use of identity in named graphs.
      </p>
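      <p>One way to picture the separation of a weak identity link from an
explicit import is the following sketch in Python. It is only an
illustration under stated assumptions: statements are stored as plain
(graph, subject, predicate, object) tuples rather than in an RDF store,
and the predicate ex:weakSameAs, the graph names and all the data are
hypothetical names introduced for this example, not existing vocabulary.</p>
      <preformat>
```python
# Quads of the form (graph, subject, predicate, object). Every name here
# is hypothetical; ex:weakSameAs stands in for a weaker identity link
# that does not import anything by itself.
QUADS = [
    ("ex:graphA", "dbpedia:Paris", "ex:weakSameAs", "opencyc:CityOfParisFrance"),
    ("ex:graphA", "opencyc:CityOfParisFrance", "ex:mayor", "ex:someMayor"),
    ("ex:graphA", "opencyc:CityOfParisFrance", "ex:area", "ex:someArea"),
    ("ex:graphB", "opencyc:CityOfParisFrance", "ex:twinnedWith", "ex:someCity"),
]


def explicit_import(quads, graph, target, source, predicates=None):
    """Return statements about `source` restated for `target`, but only if
    the weak identity link is asserted inside `graph`, and only for the
    listed predicates (None means every predicate found in that graph)."""
    linked = any(
        g == graph and p == "ex:weakSameAs" and {s, o} == {target, source}
        for g, s, p, o in quads
    )
    if not linked:
        return set()
    return {
        (target, p, o)
        for g, s, p, o in quads
        if g == graph and s == source and p != "ex:weakSameAs"
        and (predicates is None or p in predicates)
    }


# Import everything, import one property only, and ask in a graph where
# the identity link was never asserted.
everything = explicit_import(
    QUADS, "ex:graphA", "dbpedia:Paris", "opencyc:CityOfParisFrance")
only_mayor = explicit_import(
    QUADS, "ex:graphA", "dbpedia:Paris", "opencyc:CityOfParisFrance",
    predicates={"ex:mayor"})
elsewhere = explicit_import(
    QUADS, "ex:graphB", "dbpedia:Paris", "opencyc:CityOfParisFrance")
print(sorted(everything), sorted(only_mayor), sorted(elsewhere))
```
      </preformat>
      <p>In this sketch the identity link alone imports nothing, the import
is a separate and selective step, and a link asserted in one named graph
has no effect in another, which is the fine-grained control argued for
above.</p>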
    </sec>
    <sec id="sec-16">
      <title>Represents</title>
      <p>There is already a sort of statement of
this kind in the FOAF vocabulary, the foaf:isPrimaryTopicOf
statement. One possible solution to this problem would be
to wrap such a property into some core W3C-approved
standard. However, the problem is that it is unclear if a strict
separation between mention and use is necessary or even
desirable. In many contexts, as relevant experience in OpenID
deployment shows, using an e-mail address as an identifier for a
person is often more natural than the URI of a home-page, or
even a “non-information resource.” What is needed,
however, is a way to make the distinctions that either conflate
or separate mention and use on the fly. The use of weak
identity statements (in this case, a “represents”
statement) and explicit importing and de-importing of properties
within the context of particular named graphs would allow
us to state things like “Within this named graph and only
within this named graph, the e-mail address URI is
identical to the person and shares their properties” and “Within
this other named graph, the e-mail address represents the
person, but does not have all the properties of that person.”</p>
    </sec>
    <sec id="sec-17">
      <title>Very Similar To</title>
      <p>Again, the tempting easy solution is simply to introduce
a new predicate for “very similar to.” The SKOS
vocabulary has a number of “matching” predicates that are close
in meaning to this, ranging from hierarchically structured
skos:broadMatch and skos:narrowMatch to the more
suitable skos:closeMatch. However, the main issue with these
predicates is that again, their use may be a matter of
opinion, as someone’s close match may be another person’s
identical match. One is also tempted to engage with some sort
of “fuzzy” or numerically weighted uncertainty measure
of identity between zero and one, but then the really hard
questions of precisely where these real values will come from
and how they relate to actual probability theory muddy
these conceptual waters quickly. It seems that beneath this
apparently simple property is likely a whole family of
heterogeneous and semi-structured identity relationships that
should be studied more carefully and empirically observed
before any hasty judgments are made.</p>
    </sec>
    <sec id="sec-18">
      <title>CONCLUSION</title>
      <p>It is possible to do empirical studies of exactly how people
use owl:sameAs in the wild. Examples of owl:sameAs can
be taken from the Linked Data Web in the wild in order to
determine how experimentally robust these distinctions
are, i.e. do people actually use owl:sameAs in the four
ways that are outlined above, and are there more possible
ways that we are not aware of? In fact, even the ability to
recognize these kinds of distinctions may vary quite wildly
by background and training. Lastly, if a number of
empirical distinctions between identity links that are currently
conflated by owl:sameAs can be made in a robust manner,
then there is considerable formal semantic work to be done.
Giving the Linked Data community well-defined (both
formally and informally) predicates should be done even when
one does not think of the properties given to URIs as absolute
truths given by Linked Data publishers or W3C
specifications, but as functions of their actual use. The (ab)use of
owl:sameAs in Linked Data is not a threat; it is an
opportunity.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Alexander</surname>
          </string-name>
          .
          <article-title>The Leibniz-Clarke correspondence</article-title>
          . Manchester University Press, Manchester, United Kingdom,
          <year>1956</year>
          . Republished
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          .
          <source>How to publish Linked Data on the Web</source>
          ,
          <year>2007</year>
          . http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ (Last accessed on May 28th
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Brachman</surname>
          </string-name>
          .
          <article-title>What IS-A is and isn't: An analysis of taxonomic links in semantic networks</article-title>
          .
          <source>IEEE Computer</source>
          ,
          <volume>16</volume>
          (
          <issue>10</issue>
          ):
          <fpage>30</fpage>
          -
          <lpage>36</lpage>
          ,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Brachman</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmolze</surname>
          </string-name>
          .
          <article-title>An overview of the KL-ONE knowledge representation system</article-title>
          .
          <source>Cognitive Science</source>
          ,
          <volume>9</volume>
          (
          <issue>2</issue>
          ):
          <fpage>151</fpage>
          -
          <lpage>160</lpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Carroll</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hayes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Stickler</surname>
          </string-name>
          .
          <article-title>Named graphs</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>4</volume>
          (
          <issue>3</issue>
          ):
          <fpage>247</fpage>
          -
          <lpage>267</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Frege</surname>
          </string-name>
          .
          <article-title>Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens</article-title>
          . Halle, Germany,
          <year>1879</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Frege</surname>
          </string-name>
          .
          <article-title>Über Sinn und Bedeutung</article-title>
          .
          <source>Zeitschrift für Philosophie und philosophische Kritik</source>
          ,
          <volume>100</volume>
          :
          <fpage>25</fpage>
          -
          <lpage>50</lpage>
          ,
          <year>1892</year>
          . Reprinted in The Philosophical Writings of Gottlob Frege, Blackwell, Oxford, United Kingdom
          (
          <year>1956</year>
          ), translated by Max Black.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hayes</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Halpin</surname>
          </string-name>
          .
          <article-title>In defense of ambiguity</article-title>
          .
          <source>International Journal of Semantic Web and Information Systems</source>
          ,
          <volume>4</volume>
          (
          <issue>3</issue>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Walsh</surname>
          </string-name>
          .
          <source>Architecture of the World Wide Web</source>
          . Technical report, W3C,
          <year>2004</year>
          . http://www.w3.org/TR/webarch/ (Last accessed Oct 12th
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jaffri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Glaser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Millard</surname>
          </string-name>
          .
          <article-title>Managing URI synonymity to enable consistent reference on the Semantic Web</article-title>
          . In
          <source>Proceedings of the Workshop on Identity, Reference, and the Web (IRSW) at ESWC2008</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miles</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Bechhofer</surname>
          </string-name>
          .
          <article-title>SKOS Simple Knowledge Organization System reference</article-title>
          .
          <source>W3C Recommendation, W3C</source>
          ,
          <year>2008</year>
          . http://www.w3.org/TR/skos-reference/.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Quillian</surname>
          </string-name>
          .
          <article-title>Semantic memory</article-title>
          . In M. Minsky, editor,
          <source>Semantic Information Processing</source>
          , pages
          <fpage>216</fpage>
          -
          <lpage>270</lpage>
          . MIT Press, Cambridge, Massachusetts, USA,
          <year>1968</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Welty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Smith</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          .
          <source>OWL Web Ontology Language Guide</source>
          . Recommendation, W3C,
          <year>2004</year>
          . http://www.w3.org/TR/2004/REC-owl-guide-20040210.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>