<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Computing FOAF Co-reference Relations with Rules and Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jennifer Sleeman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Finin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Maryland</institution>
          ,
          <addr-line>Baltimore County, Baltimore, MD 21250</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Friend of a Friend (FOAF) vocabulary is widely used on the Web to describe 'agents' (people, groups and organizations) and their properties. Since FOAF does not require a unique ID for agents, it is not clear when two FOAF instances should be linked as co-referent, i.e., denote the same entity in the world. One approach is to use logical constraints, such as the presence of inverse functional properties, as evidence that two individuals are the same. Another applies heuristics based on the string similarity of values of FOAF properties such as name and school as evidence for or against co-reference. Performance is limited, however, by many factors: non-semantic string matching, noise, changes in the world, and the lack of more sophisticated graph analytics. We describe a prototype system that takes a set of FOAF agents and identifies subsets that are believed to be co-referent. The system uses logical constraints (e.g., IFPs), strong heuristics (e.g., FOAF agents described in the same file are not co-referent), and an SVM-generated classifier. We present initial results using data collected from Swoogle and other sources and describe plans for additional analysis.</p>
      </abstract>
      <kwd-group>
        <kwd>FOAF</kwd>
        <kwd>machine learning</kwd>
        <kwd>linked data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The FOAF (Friend of a Friend) vocabulary has been one of the most widely used
ontologies since the beginning of the Semantic Web. It defines classes and
properties for describing entities (people, organizations and groups), their attributes,
and their relations. FOAF’s popularity is evident in the social networking sites
that publish profiles using FOAF, the number of RDF documents using the
FOAF namespace, the number of foaf:Agent instances or the volume of RDF
triples using FOAF terms [
        <xref ref-type="bibr" rid="ref20 ref8">8, 20</xref>
        ]. Its widespread use can be explained by the
common need to publish, find and reason with basic data on people and
organizations and also the lightweight and practical design of the vocabulary.
      </p>
      <p>One of the principles underlying the Semantic Web is that it enables us
to give concepts and individuals URIs that serve as unique identifiers, removing
much of the ambiguity that comes with using human language and representation
systems that are not designed to be distributed and open. The FOAF ontology
allows one to create a foaf:Agent instance that indeed represents a single, unique
entity and to describe its properties and relations.</p>
      <p>What FOAF does not require is a property that represents a globally unique
identifier (GUID) that can be used to recognize when two foaf:Agent
individuals are co-referent, i.e., refer to the same individual whether real or fictional. It’s
considered good practice to associate a URI with each FOAF instance, which
is a global ID, but not necessarily a unique one. It is common to find scores of
FOAF descriptions on the Web for the same entity, each with a different URI.</p>
      <p>This is a problem that is common to other representation systems, including
human language, database address records, and even official government records.
An entity’s name is often a useful property for distinguishing it from other
entities, but there are many people named Michael Jordan and several companies
known as Apple. Given two such descriptions and depending on the context and
task at hand, we may find enough evidence to conclude that the entity mentions
are or are not co-referent. If there is enough supporting evidence to conclude
that they are, we can integrate the information from the two descriptions.</p>
      <p>There are many potential problems in integrating information from two
sources once we’ve decided that they provide information on the same entity.
One source may be known to be unreliable. The properties may be subjective.
Sources may differ in their beliefs even about objective properties. The
descriptions may include dynamic properties and refer to their values at different points
in time.</p>
      <p>
        In the rest of this paper we discuss the problem of deciding when RDF
descriptions of two FOAF agents are likely to be co-referent. One common approach
is to use the presence of FOAF properties declared to be inverse functional [
        <xref ref-type="bibr" rid="ref14 ref17 ref18 ref28">28,
14, 17, 18</xref>
        ] (e.g., foaf:mbox) as evidence that two individuals are the same. A
second [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] applies heuristics based on the string similarity of values of FOAF
properties such as name and school as evidence for or against co-reference.
      </p>
      <p>We describe an 'ensemble' approach that combines a rules-based model,
a vector model, and clustering to group sets of co-referent pairs. The
vector model is part of a supervised machine learning method that uses features
defined over pairs of FOAF individuals to produce a classifier for identifying
co-referent FOAF instances. The rules-based model uses logical rules based on
inverse functional properties and other identifying properties such as owl:sameAs
and owl:differentFrom.</p>
      <p>
        This approach accounts for sparse data: when inverse functional
properties or other properties that provide evidence of co-reference, such as owl:sameAs,
are not present, the classifier can be used to predict
co-reference. When these properties are present, however, taking advantage of them
is practical. A machine learning approach is used to account for issues
that arise when using a heuristic string similarity approach. Common properties
are not always present in both profiles, and noise in the data can obscure the
fact that two profiles represent the same person. Noise and anomalies in the data
can feasibly be captured in a classification model.
      </p>
      <p>
        In our earlier work, we used owl:sameAs to model the semantics of our
processing. That is, when we concluded that two FOAF agents referred to the same
individual, we asserted that they were equivalent using owl:sameAs. The use of
sameAs, however, can cause problems when integrating information, as discussed
by Ding et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and others [
        <xref ref-type="bibr" rid="ref15 ref25 ref26">26, 25, 15</xref>
        ]. We are currently using weaker predicates,
coref and notCoref, to represent that two instances are or are not thought to
be coreferential. For coreferential instances, we can merge their descriptions for
some, but not all, uses.
      </p>
      <p>We may decide that two FOAF agents refer to the same object in the world,
but their descriptions might be incompatible and asserting that they are
equivalent with sameAs could lead to contradictions. One source of problems is that the
descriptions could have both been true but at different times. Simply
concatenating their triples is dangerous. Ding et al. present an example of two FOAF
profiles for Li Ding. One hosted at Stanford was accurate when it was published
several years ago, but some facts have changed since then. A more recent FOAF
profile indicates that he is now working at RPI and holds a job title of Research
Scientist. Each profile uses a unique URI to identify the person Li Ding, and it
is reasonable to declare the two URIs refer to the same person. However, if we
connect the two URIs using sameAs, an OWL reasoner can infer that Li Ding
holds the position of Research Scientist at Stanford, which has never been the
case. Another problem is that sources might differ in their beliefs about the
world, such as the birthplace of the 44th president of the United States.</p>
      <p>Figure 1 gives some axioms in N3 for the coref and notCoref properties. The
coref property is transitive and symmetric and has sameAs as a sub-property.
notCoref is symmetric but not transitive, and has owl:differentFrom as a
sub-property. The first rule states that if two instances, a and b, are not coreferent,
then every instance coreferent with a is :notCoref with every instance
coreferent with b. The second rule, which is really a heuristic, states that if a knows b,
then they are assumed to be distinct individuals and thus :notCoref. Note that
owl:sameAs implies coref and owl:differentFrom implies notCoref, so
reasoners that can derive sameAs and differentFrom properties will also contribute
to computing coreference relations.</p>
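      <p>The effect of the first rule can be illustrated with a minimal Python sketch. It is not the system's implementation: the function names coref_closure and propagate_not_coref are hypothetical, and the sketch assumes that :coref assertions are given as pairs of identifiers. It first groups instances into equivalence classes under the symmetric, transitive :coref relation, and then derives :notCoref for every cross pair of two classes that contain a :notCoref pair.</p>
      <preformat><![CDATA[
```python
from itertools import product


def coref_closure(items, coref_pairs):
    """Group items into classes under symmetric, transitive :coref (union-find)."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the class representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in coref_pairs:
        parent[find(a)] = find(b)

    classes = {}
    for x in items:
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())


def propagate_not_coref(classes, not_coref_pairs):
    """If a :notCoref b, every member of a's class is :notCoref with every member of b's."""
    class_of = {x: frozenset(c) for c in classes for x in c}
    derived = set()
    for a, b in not_coref_pairs:
        for x, y in product(class_of[a], class_of[b]):
            derived.add(frozenset((x, y)))  # unordered pair, since notCoref is symmetric
    return derived


classes = coref_closure(["a1", "a2", "b1", "b2"], [("a1", "a2"), ("b1", "b2")])
derived = propagate_not_coref(classes, [("a1", "b1")])
```
]]></preformat>
      <p>In this example, a single :notCoref assertion between a1 and b1 yields four derived pairs, one for each combination of a member of the first class with a member of the second.</p>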
    </sec>
    <sec id="sec-2">
      <title>Background and Related Work</title>
      <p>
        The problem of identifying co-referent entities is common in many contexts,
including databases, bibliographic collections, and Semantic Web graphs. The
earliest applications dealt with linking records for people in databases of
significant life events, such as birth, marriage, and death records [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Fellegi and
Sunter’s 1969 paper [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] provided an early formal model for matching database
records which represented identical persons, objects or events. Elmagarmid et
al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] provide a recent survey of record linkage in database systems.
      </p>
      <p>
        Name disambiguation in bibliographic databases has also been studied.
Citations are rich with names – for people, departments, institutions, journals,
conferences, publishers etc. Complicating the matching process is the fact that
these are often abbreviated using many inconsistent forms. Han et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
describe supervised learning approaches for name disambiguation in citations used
in the CiteSeer system. Singla and Domingos [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] apply machine learning and
probabilistic logic to a similar problem.
      </p>
      <p>
        The traditional approach on the Semantic Web is the process of 'smushing'
[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] FOAF instances, that is, combining the profiles. 'Smushing' FOAF profiles can
bring together information from various sources that are determined to represent
the same 'person'. One can choose to rely solely on the presence of owl:sameAs,
as this property is meant to link individuals [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, relying on this property
alone to 'smush' data is not effective, as it is not always present and it
can also be used inaccurately. Multiple techniques are used both to
identify co-referent FOAF profiles and to perform some type of 'smushing' [
        <xref ref-type="bibr" rid="ref18 ref27 ref28">28,
27, 18</xref>
        ].
      </p>
      <p>OWL's InverseFunctionalProperty class (IFP) can help provide a way to
recognize co-referent profiles, but it does not offer a complete solution. The FOAF
vocabulary defines, for example, foaf:homepage and foaf:mbox to be inverse
functional properties, providing strong evidence that two FOAF agents are co-referent if
they share an identical value for either of those properties. However, with the
popularity of social networking sites that support FOAF extraction of profiles,
extracted FOAF profiles do not always include such an inverse functional
property, and sometimes these properties are misused.</p>
      <p>
        This was discovered by previous work [
        <xref ref-type="bibr" rid="ref18 ref19 ref28">28, 18, 19</xref>
        ]. For example, [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] describes
how a list of FOAF profiles produced by exporters with empty foaf:mbox values
all contained duplicate foaf:mbox_sha1sum values. In [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], it was discovered that
a large portion of the FOAF profiles that contained the foaf:weblog property
contained the same value in each profile, representing community web logs.
In [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], foaf:weblog and foaf:homepage in particular were found to be
representing collective sites. In our previous work [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], we found foaf:homepage
values representing community web sites, and we also found FOAF profiles
that lack inverse functional properties altogether. It is not uncommon
for several people to share a common value for an inverse
functional property.
      </p>
      <p>Fig. 2: Co-Referent System Architecture</p>
    </sec>
    <sec id="sec-3">
      <title>Architecture and Methodology</title>
      <p>
        Our overall approach is similar to the ones used in other co-reference problems,
such as [
        <xref ref-type="bibr" rid="ref24 ref31">31, 24</xref>
        ]. Given a collection of FOAF instances to compare, we would
like to cluster them into sets that we believe refer to the same person in the
world. This process is divided into five stages: (i) generating candidate pairs, (ii)
generating the rules-based model, (iii) classification, (iv) designating pairs as
co-referent or not, and (v) creating clusters. Figure 2 shows a high-level architecture
of our system. Entities are extracted from FOAF profiles and are the basis for
the system. An entity can also be represented as a cluster of previously evaluated
entities. Co-reference is determined by evaluating both the rules-based and vector
models.
      </p>
      <sec id="sec-3-1">
        <title>Methodology</title>
        <p>We parse FOAF profiles by extracting triples from the associated URL and build
an entity table based on extracted persons. For each entity found in the FOAF
profile, a new entity is created in the database. When an entity is defined by
a foaf:knows graph, our system uses any referring URL for that entity and
attempts to parse FOAF data described by that URL. In a number of examples,
we acquire more information about an entity by retrieving their FOAF profile.
Our ensemble approach builds both a rules-based model that consists of results
generated by logical rules and a vector classification model. If co-reference cannot
be determined by the rules-based model, the prediction established by the vector
model is then used. Co-referent pairs are part of larger clusters that are also used
in the system to potentially discover other co-referent pairs. When we cluster our
entities, we use results from the rules-based model to eliminate an entity
from a cluster when a logical result indicates that a pair within the cluster is
known not to be co-referent.
{?p a owl:IFP. ?a ?p ?x. ?b ?p ?x} =&gt; {?a :coref ?b}
{?p a owl:FP. ?a ?p ?x. ?a ?p ?y} =&gt; {?x :coref ?y}
{?a foaf:knows ?b. ?a foaf:knows ?c. ?b neq ?c} =&gt; {?b :notCoref ?c}</p>
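        <p>The three rules above can be sketched in Python over a list of (subject, property, object) triples. This is an illustrative sketch rather than the system's rule engine: the function name apply_rules and the plain-string encoding of triples are assumptions, and the set of inverse functional properties is limited to foaf:mbox and foaf:homepage, the two named in this paper.</p>
        <preformat><![CDATA[
```python
from itertools import combinations

# Assumption: only these two properties are treated as inverse functional here.
IFPS = frozenset({"foaf:mbox", "foaf:homepage"})


def apply_rules(triples, ifps=IFPS, fps=frozenset()):
    """Return (coref, not_coref) pair sets derived from the three rules."""
    by_po = {}  # (property, object) -> subjects sharing that value
    by_sp = {}  # (subject, property) -> objects asserted for that subject
    for s, p, o in triples:
        by_po.setdefault((p, o), set()).add(s)
        by_sp.setdefault((s, p), set()).add(o)

    coref, not_coref = set(), set()

    # Rule 1: two subjects sharing a value of an inverse functional property.
    for (p, _), subjects in by_po.items():
        if p in ifps:
            coref.update(combinations(sorted(subjects), 2))

    for (_, p), objects in by_sp.items():
        # Rule 2: two values of a functional property on the same subject.
        if p in fps:
            coref.update(combinations(sorted(objects), 2))
        # Rule 3 (heuristic): distinct agents known by the same agent are distinct.
        if p == "foaf:knows":
            not_coref.update(combinations(sorted(objects), 2))

    return coref, not_coref


triples = [
    ("a", "foaf:mbox", "mailto:x@example.org"),
    ("b", "foaf:mbox", "mailto:x@example.org"),
    ("a", "foaf:knows", "c"),
    ("a", "foaf:knows", "d"),
]
coref, not_coref = apply_rules(triples)
```
]]></preformat>
        <p>Here a and b share a foaf:mbox value and are therefore marked as :coref, while c and d are both known by a and are marked as :notCoref.</p>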
        <p>
          FOAF profiles were obtained from URLs extracted from Swoogle [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. When
retrieving documents based on the Swoogle listing, we attempt to retrieve
the latest version of the document; if the latest version is no longer accessible,
we retrieve the cached version from the Swoogle database. We also used URLs
extracted from tests conducted in previous work [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Candidate Designation</title>
        <p>Given a potentially large collection of N FOAF instances, we could proceed by
testing each of the O(N²) possible pairs to see which are co-referent. Since the
vast majority of the pairs will not be matched and the co-reference test will be
relatively expensive, we start by filtering the possible pairs to produce a smaller
set of candidates using a simple string matching heuristic test for each pair. The
result is a set of FOAF instance pairs that can be used for both training and
generating test sets that will be run through the classifier in the classification stage.</p>
        <p>A potential match is calculated based on common properties. For each pair
of FOAF instances, an exact match is attempted for each property. If no property
yields an exact match, a partial match is attempted.
If no partial matches exist, we attempt a simple cross-property match. For each
type of match, if any single property pair returns true, we include the
pair as a candidate. Performing this step reduces the number of potential
matches per URL, which has improved total running time.</p>
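        <p>The cascade of exact, partial, and cross-property matching can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the heuristic used in our system: instances are modelled as dictionaries from property names to string values, and a 'subpart' is taken to be any lower-cased alphanumeric token of at least four characters. The helper names (tokens, exact_match, partial_match, cross_match, candidate_pairs) are hypothetical.</p>
        <preformat><![CDATA[
```python
import re
from itertools import combinations


def tokens(value, min_len=4):
    """Assumed notion of a 'subpart': lower-cased alphanumeric tokens of min_len or more."""
    return {t for t in re.split(r"[^a-z0-9]+", value.lower()) if len(t) >= min_len}


def exact_match(a, b):
    """Some property common to both instances has the same normalized value."""
    return any(a[p].strip().lower() == b[p].strip().lower() for p in a.keys() & b.keys())


def partial_match(a, b):
    """Some property common to both instances shares at least one token."""
    return any(tokens(a[p]) & tokens(b[p]) for p in a.keys() & b.keys())


def cross_match(a, b):
    """A token of one property in a appears in a different property in b."""
    return any(tokens(va) & tokens(vb)
               for pa, va in a.items() for pb, vb in b.items() if pa != pb)


def is_candidate(a, b):
    # Cheaper tests first; later tests run only if the earlier ones fail.
    return exact_match(a, b) or partial_match(a, b) or cross_match(a, b)


def candidate_pairs(instances):
    """Filter all pairs of instance identifiers down to plausible candidates."""
    return [(i, j) for i, j in combinations(sorted(instances), 2)
            if is_candidate(instances[i], instances[j])]


instances = {
    "u1": {"foaf:name": "Jane Doe"},
    "u2": {"foaf:name": "jane doe", "foaf:nick": "jd"},
    "u3": {"foaf:name": "Bob Smith"},
}
pairs = candidate_pairs(instances)
```
]]></preformat>
        <p>Only the pair (u1, u2) survives the filter in this example, so the more expensive co-reference test would be run on one pair rather than three.</p>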
      </sec>
      <sec id="sec-3-3">
        <title>Rules-based Model</title>
        <p>Our rules-based component consists of rules that can conclude that a pair of
FOAF instances is co-referent or not co-referent. Of course, for most pairs,
neither conclusion can be drawn. If the rules conclude that both are true, then
this inconsistency results in neither conclusion being used.</p>
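        <p>The way the rule results combine with the classifier can be sketched as a small decision function. This is a hedged illustration: the names rule_verdict and decide are hypothetical, the classifier is represented by an arbitrary scoring callable, and the sketch assumes that an inconsistent rule result falls back to the classifier and that a score above zero means co-referent.</p>
        <preformat><![CDATA[
```python
def rule_verdict(pair, coref, not_coref):
    """Return True (co-referent), False (not co-referent), or None (undetermined)."""
    key = frozenset(pair)
    is_coref = key in coref
    is_not_coref = key in not_coref
    if is_coref and is_not_coref:
        return None  # inconsistent rule results: neither conclusion is used
    if is_coref:
        return True
    if is_not_coref:
        return False
    return None  # no rule fired for this pair


def decide(pair, coref, not_coref, classifier_score, threshold=0.0):
    """Use the rule verdict when one exists, otherwise the classifier's prediction."""
    verdict = rule_verdict(pair, coref, not_coref)
    if verdict is not None:
        return verdict
    # Assumption: scores above the threshold are read as 'co-referent'.
    return classifier_score(pair) > threshold


coref = {frozenset(("a", "b")), frozenset(("c", "d"))}
not_coref = {frozenset(("c", "d")), frozenset(("e", "f"))}
score = lambda pair: 1.5 if "x" in pair else -0.5  # stand-in for a trained model

decisions = {p: decide(p, coref, not_coref, score)
             for p in [("a", "b"), ("e", "f"), ("c", "d"), ("x", "y")]}
```
]]></preformat>
        <p>In this example the pair (c, d) is asserted both ways by the rules, so the rule result is discarded and the stand-in classifier decides.</p>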
        <p>Some of the rules implement the semantics of OWL given the axioms in
Figure 1, i.e., owl:sameAs implies :coref and owl:differentFrom implies :notCoref.
This is supported by rules that use functional and inverse functional properties,
as shown in Figure 3.</p>
        <p>We also use a heuristic rule that all individuals in a FOAF agent’s knows
network are assumed to be distinct, i.e., not co-referent. Note that the rule
is applied to a graph that is extracted from a single document without prior
processing and thus applies to explicitly asserted foaf:knows relations.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Vector Model</title>
        <p>
          The co-referent classifier predicts whether two FOAF instances are co-referent.
We use a Support Vector Machine (SVM) for classifying our data. We used
SVMLight [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] with a linear kernel and standard parameters to build our model
for classification. The classification model and predictions are captured for
co-referent processing. Potential pairs are evaluated using a number of features
based on FOAF property comparisons.
        </p>
        <p>
          Feature Set. Property-specific features include the following types:
        </p>
        <list list-type="bullet">
          <list-item>
            <p>Inverse functional properties as a boolean feature</p>
          </list-item>
          <list-item>
            <p>Two different distance measures of properties common to both instances</p>
          </list-item>
          <list-item>
            <p>More complex distance measures, which might include unpacking semantic information (e.g., the geographical distance between two geotags) and resolving entity mentions (e.g., Baltimore) to linked data nodes</p>
          </list-item>
          <list-item>
            <p>Partial analysis of the graphs centered on the instances, such as the immediate (one-hop) social networks formed by foaf:knows properties</p>
          </list-item>
        </list>
        <p>
          Distance metrics were calculated using the Levenshtein distance [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] method
and Dice’s Coefficient [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
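        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math>\text{Dice's Coefficient} = \frac{2 \times (\text{number of character bigrams in both strings})}{(\text{number of bigrams in string 1}) + (\text{number of bigrams in string 2})}</tex-math>
          </disp-formula>
        </p>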
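        <p>Both distance metrics are short to state in code. The following Python sketch is illustrative and not taken from our system: the function names are hypothetical, Dice's coefficient is computed over character bigrams as in Equation (1) with repeated bigrams counted as a multiset, and the handling of strings shorter than two characters is an assumption.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def bigrams(s):
    """Character bigrams of s, in order, with repeats."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice_coefficient(s1, s2):
    """2 * (bigrams in both strings) / (bigrams in s1 + bigrams in s2)."""
    b1, b2 = bigrams(s1), bigrams(s2)
    if not b1 and not b2:
        # Assumption: strings too short to have bigrams match only if identical.
        return 1.0 if s1 == s2 else 0.0
    shared = sum((Counter(b1) & Counter(b2)).values())  # multiset intersection
    return 2.0 * shared / (len(b1) + len(b2))


def levenshtein(s1, s2):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        cur = [i]
        for j, c2 in enumerate(s2, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (c1 != c2)))  # substitution or match
        prev = cur
    return prev[-1]


similarity = dice_coefficient("night", "nacht")
distance = levenshtein("kitten", "sitting")
```
]]></preformat>
        <p>Note that the two measures point in opposite directions: Dice's coefficient is a similarity in the range 0 to 1, whereas the Levenshtein distance grows with dissimilarity, so a feature vector would typically normalize one of them.</p>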
        <p>Simple Property Matching Distance. A simple property match is when
a single property matches within the two FOAF instances being evaluated. For
example, foaf:name matches in both instances.</p>
        <p>Partial Property Matching Distance. In some cases, a property has a
subpart which represents uniqueness that can be used as a distinguishing string
to be matched to a subpart of the same property in a different instance. For
example, part of the foaf:weblog property offers a partial match.</p>
        <p>Cross-Property Matching Distance. In some cases, either a full
property or a subpart of a property can be used to match a different property in
another FOAF instance. In some of our gathered FOAF instances we discovered
properties that were commonly cross-matched. For example, a foaf:name string
part would correspond to a foaf:nick.</p>
        <p>Training. We automatically label training and test sets using a heuristic. The labeled
sets must then be manually inspected and evaluated for correctness. The automatic
labeling heuristic has been over 70% accurate, which has
reduced the time it takes us to generate our training and test sets.</p>
        <p>Once co-referent pairs are designated, we cluster our pairs in such a way that each
cluster represents all instances of a particular FOAF agent. Clusters
can grow over time as the amount of information used during the pair evaluation
increases. We use a greedy process for clustering FOAF individuals that begins by
putting each in a singleton coreference set. A merging process continues as long
as two candidate sets are judged to be similar enough to be merged into a new
one that replaces its ancestors, and stops when there are no pairs that can be
merged. In Figure 4, the four FOAF individuals end up in two coreference sets.
When a FOAF pair is designated as co-referent, this forms a cluster. As clusters
begin to form in the system over multiple iterations, a co-referent pairing can
take the form of a FOAF instance and a cluster, two FOAF instances, or two
FOAF clusters, as depicted in Figure 4.</p>
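        <p>The greedy merging process can be sketched in Python as follows. This is a simplified illustration, not our implementation: the function name greedy_cluster is hypothetical, and 'similar enough' is reduced to the assumption that two sets may merge when at least one cross pair is designated co-referent and no cross pair is known to be not co-referent.</p>
        <preformat><![CDATA[
```python
def greedy_cluster(items, coref, not_coref):
    """Merge singleton sets until no two sets can be merged."""
    clusters = [frozenset([x]) for x in items]  # start with singleton sets

    def linked(c1, c2, pairs):
        # True if any cross pair between the two sets appears in pairs.
        return any(frozenset((x, y)) in pairs for x in c1 for y in c2)

    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                c1, c2 = clusters[i], clusters[j]
                if linked(c1, c2, coref) and not linked(c1, c2, not_coref):
                    # The merged set replaces its two ancestors.
                    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
                    clusters.append(c1 | c2)
                    merged = True
                    break
            if merged:
                break  # restart the scan over the updated cluster list
    return clusters


coref = {frozenset(("a", "b")), frozenset(("c", "d"))}
clusters = greedy_cluster(["a", "b", "c", "d"], coref, not_coref=set())
```
]]></preformat>
        <p>With four individuals and two co-referent pairs, the process ends with two coreference sets. Passing a non-empty not_coref set shows how a rule result can veto a merge that the pairwise designations alone would allow.</p>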
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <p>We ran two experiments. The first resulted in about 50,000 triples with over 500
entity mentions; approximately 600 classes were used for training. The second
had about 250,000 triples with over 3,500 entity mentions, and its classification
training set consisted of over 1,800 classes. The distribution
of URLs for experiment two is shown in Figure 5. We conducted a 10-fold cross-validation, with
results shown in Table 2.</p>
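      <p>For readers unfamiliar with the procedure, the partitioning step of a 10-fold cross-validation can be sketched in Python. The function name k_fold_splits, the shuffling seed, and the round-robin assignment of items to folds are illustrative assumptions, not details of our experimental setup.</p>
      <preformat><![CDATA[
```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)        # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]   # round-robin assignment to folds
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


splits = list(k_fold_splits(25, k=10))
```
]]></preformat>
      <p>Each labeled pair is held out exactly once, so the averaged measures reflect predictions on data the model did not see during training.</p>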
      <sec id="sec-4-1">
        <title>Results</title>
        <p>For experiment two, Table 1 shows that only the inverse functional
property rule resulted in a number of positive co-referent cases. The majority
of the rules resulted in an undetermined state. As expected, the foaf:knows
rule returned a number of pairs that resulted in a non-co-referent state. For
experiment one, 900 pairs were designated as a non-match and the majority of
the rules returned an undetermined result. Table 2 shows that our classification
step is likely predicting co-referent and non-co-referent pairs accurately.</p>
        <p>Fig. 5: Experiment Two URL Distributions</p>
        <p>During the clustering phase of experiment two, the first round of clustering resulted in
90% accuracy. The errors occurred in pairs that should have been clustered but
were not. A second round of clustering did not yield any new relationship pairs
among instances, but cluster-to-cluster pairing did occur.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Evaluation</title>
        <p>The results of the above experiments are encouraging. However, since our new
approach retrieves additional FOAF profiles based on members defined
within the foaf:knows graphs, we quickly reach large numbers of 'entities',
with the average number of entities known by a person between 50 and 100.
This can produce a selection of entities that is tightly linked, which can have
two effects on the system: it can reduce the diversity of the analyzed data, and it can
produce a number of entities that are likely to represent the same person. Future
iterations will include larger, more diverse sets of data, with a diversity filter used
to select pairs that span a number of domains. Our two experiments explored
different URL distributions; however, their 10-fold cross-validation results were
close. When choosing to add new classes to a cluster, a threshold is
used as a way to reduce errors. This threshold will require additional testing to
determine an appropriate setting.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>We have described an approach to predicting coreferent pairs of FOAF instances
that uses a small set of rules and a classifier developed by a supervised machine
learning process. The descriptions of coreferent pairs are merged to create a new
description that is then re-evaluated to find additional descriptions judged to be
coreferent.</p>
      <p>
        We have been working with FOAF data as an instance of a larger problem:
automatically linking RDF instances based on their descriptions. Making
headway on this problem will allow us to more easily add data to the growing linked
data cloud [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Our machine learning approach, system architecture and many
of our techniques should also apply to non-FOAF data equally well.
      </p>
      <p>
        The FOAF co-referent problem described here is also a common problem
in non-FOAF domains. Our approach can be abstracted and applied to other
domains. In particular, instance matching [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] among ontologies is a domain
that could benefit from such a co-referent solution. Entity clustering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is
another domain that could benefit from our co-referent solution.
      </p>
      <p>
        Future work will include exploiting additional properties within the instance
that are not of the FOAF vocabulary (e.g., sioc [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) and using these properties
to provide additional evidence as to whether a pair of FOAF instances represent
the same individual or not. A portion of our collected FOAF documents had
non-FOAF vocabularies that offered additional information such as ’author’. By
exploiting these additional properties, we could increase accuracy particularly
when a FOAF property is absent and the non-FOAF property offers the same
meaning.
      </p>
      <p>As highlighted in our introduction, inverse functional properties can be
implemented incorrectly. We account for this type of inaccuracy in our classification
method; however, we also plan to account for it in our rules-based model in
future revisions of the system.</p>
      <p>Many properties asserted about FOAF instances have string values that refer
to entities. Examples from the core FOAF vocabulary are foaf:Organization
and foaf:fundedBy. We would like to recognize that two strings refer to the
same entity when their values are different but known aliases or alternate names.
Luckily, for many entities, it is easy to generate lists of known aliases drawing on
resources such as gazetteers, Wikipedia and Freebase. We have developed lists of
known aliases for organizations and places from data extracted from Wikipedia
and Freebase, including aliases for about 270K places and 50K organizations.
The current system does not yet exploit these lists, but we plan to do so in the
next version, probably as an additional string matching metric, e.g., the two
instances have a property whose values differ but are members of a known set
of aliases.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Artiles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sekine</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>WePS 2 evaluation campaign: Overview of the web people search clustering task</article-title>
          .
          <source>In: 18th Int. World Wide Web Conf</source>
          . Madrid, Spain (April
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bechhofer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
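          ,
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>L.:</given-names>
          </string-name>
          <article-title>OWL Web Ontology Language Reference, W3C Recommendation 10 February 2004</article-title>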
          . http://www.w3.org/TR/owl-ref/ (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The emerging web of linked data</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>24</volume>
          (
          <issue>5</issue>
          ),
          <fpage>87</fpage>
          -
          <lpage>92</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bojārs</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breslin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peristeras</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tummarello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Interlinking the social web with semantics</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>23</volume>
          (
          <issue>3</issue>
          ),
          <fpage>29</fpage>
          -
          <lpage>40</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Christen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A comparison of personal name matching: Techniques and practical issues</article-title>
          .
          <source>In: Proceedings of the Second International Workshop on Mining Complex Data. IEEE</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cost</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reddivari</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doshi</surname>
            ,
            <given-names>V.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sachs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Swoogle: A search and metadata engine for the semantic web</article-title>
          .
          <source>In: Proc. 13th ACM Conf. on Information and Knowledge Management</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shinavier</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>owl:sameAs and linked data: an empirical study</article-title>
          .
          <source>In: Proc. 2nd Web Science Conf. (April</source>
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>How the Semantic Web is Being Used: An Analysis of FOAF Documents</article-title>
          .
          <source>In: Proc. 38th Int. Conf. on System Sciences (January</source>
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dredze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McNamee</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerber</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Entity disambiguation for knowledge base population</article-title>
          .
          <source>In: Proc. 23rd Int. Conf. on Computational Linguistics (August</source>
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dunn</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Record linkage</article-title>
          .
          <source>American Journal of Public Health</source>
          <volume>36</volume>
          (
          <issue>12</issue>
          ),
          <fpage>1412</fpage>
          (
          <year>1946</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Elmagarmid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ipeirotis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verykios</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Duplicate record detection: A survey</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Fellegi</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sunter</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A theory for record linkage</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          <volume>64</volume>
          (
          <issue>328</issue>
          ),
          <fpage>1183</fpage>
          -
          <lpage>1210</lpage>
          (
          <year>1969</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lorusso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montanelli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varese</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Towards a benchmark for instance matching</article-title>
          .
          <source>In: Int. Workshop on Ontology Matching</source>
          , volume
          <volume>431</volume>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Golbeck</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rothstein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Linking social networks on the web with FOAF</article-title>
          .
          <source>In: Proc. 17th Int. World Wide Web Conf. (April</source>
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Halpin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hayes</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          :
          <article-title>When owl:sameAs isn't the same: An analysis of identity links on the Semantic Web</article-title>
          .
          <source>In: Proc. 2010 Int. Workshop on Linked Data on the Web (April</source>
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giles</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zha</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsioutsiouliklis</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Two supervised learning approaches for name disambiguation in author citations</article-title>
          .
          <source>In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries</source>
          . pp.
          <fpage>296</fpage>
          -
          <lpage>305</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gassert</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>On searching and displaying RDF data from the web</article-title>
          .
          <source>In: Proc. Demos and Posters, 2nd European Semantic Web Conf.</source>
          Heraklion, GR (May
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Performing object consolidation on the semantic web data graph</article-title>
          .
          <source>In: Proc. I3: Identity, Identifiers, Identification. Workshop at 16th Int. World Wide Web Conf. (February</source>
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <article-title>The god entity</article-title>
          . http://blog.aidanhogan.com/2008/10/god-entity.html
          (
          <year>2008</year>
          ), accessed January 2010
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <article-title>Billion triple challenge 2009 dataset</article-title>
          . http://vmlion25.deri.ie/ (
          <year>2009</year>
          ), accessed November 2010
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <article-title>FOAF-project.org definition of smushing</article-title>
          . http://wiki.foaf-project.org/w/Smushing (
          <year>2010</year>
          ), accessed January 2010
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>SVMLight: Support Vector Machine</article-title>
          (
          <year>1999</year>
          ), University of Dortmund, http://svmlight.joachims.org/
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Levenshtein</surname>
            ,
            <given-names>V.I.</given-names>
          </string-name>
          :
          <article-title>Binary codes capable of correcting deletions, insertions, and reversals</article-title>
          .
          <source>Soviet Physics Doklady</source>
          <volume>10</volume>
          ,
          <fpage>707</fpage>
          -
          <lpage>710</lpage>
          (
          <year>1966</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Mayfield</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alexander</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>B</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eisner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elsayed</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Cross-document coreference resolution: A key technology for learning by reading</article-title>
          .
          <source>In: Proc. AAAI Spring Symposium on Learning by Reading and Learning to Read. AAAI (March</source>
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>McCusker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>owl:sameAs considered harmful to provenance</article-title>
          .
          <source>In: Proc. ISCB Conf. on Semantics in Healthcare and Life Sciences (February</source>
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Passant</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>:me owl:sameAs flickr:33669349@N00</article-title>
          .
          <source>In: Proc. 1st Int. Workshop on Linked Data on the Web (April</source>
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Price</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rawles</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flach</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Estimating whether partial FOAF descriptions describe the same individual</article-title>
          .
          <source>In: Proc. Workshop on Friend of a Friend, Social Networking and the Semantic Web</source>
          (September
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berrueta</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Smushing RDF instances: are Alice and Bob the same open source developer?</article-title>
          <source>In: Proc. 3rd Expert Finder workshop on Personal Identification and Collaborations: Knowledge Mediation and Extraction, 7th Int. Semantic Web Conf. (November</source>
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Singla</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Entity resolution with Markov logic</article-title>
          .
          <source>In: 6th Int. Conf. on Data Mining (ICDM'06)</source>
          . pp.
          <fpage>572</fpage>
          -
          <lpage>582</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Sleeman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A Machine Learning Approach to Linking FOAF Instances</article-title>
          .
          <source>In: Spring Symposium on Linked Data Meets AI</source>
          . AAAI (January
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Volz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaedke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Silk - a link discovery framework for the web of data</article-title>
          .
          <source>In: Proc. 2nd Workshop on Linked Data on the Web</source>
          . Madrid, Spain (April
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>