<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Aligning Ontologies of Geospatial Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rahul Parundekar</string-name>
          <email>parundek@isi.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Craig A. Knoblock</string-name>
          <email>knoblock@isi.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Luis Ambite</string-name>
          <email>ambite@isi.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Sciences Institute and Computer Science Department University of Southern California 4676 Admiralty Way</institution>
          ,
          <addr-line>Marina del Rey, CA 90292</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Web of Linked Data is characterized by linking structured data from different sources using equivalence statements, such as owl:sameAs, as well as other types of linked properties. However, the ontologies behind these sources are not linked. The work described in this paper aims at providing alignments between these ontologies by using an extensional approach. We look at geospatial data sources on the Web of Linked Data and provide equivalence and subsumption relationships between the classes from these ontologies after building hypotheses that are supported by owl:sameAs links. We are able to model one geospatial source in terms of the other by aligning the two ontologies and thus understand the semantic relationships between the two sources.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The last few years have witnessed the beginnings of a paradigm shift from publishing
isolated data from various organizations and companies to publishing data that is linked
to related data from other sources, using the structured model of the Semantic Web.
As the data being published on the Web of Linked Data starts to grow, the availability
of related data can be used to supplement one’s own knowledge base. This offers
huge potential in various domains, such as the geospatial domain, where it supports
the integration of data from different sources. Apart from generating structured data,
a necessary step to publish data in the Web of Linked Data is to provide links from
the generated instances to other data ‘out there,’ based on background knowledge (e.g.
linking DBpedia to Wikipedia), common identifiers (e.g. ISBN numbers), or pattern
matching (e.g. names, latitude, longitude and other information used to link Geonames
to DBpedia). These links are often expressed using owl:sameAs. However, when such
links between instances are asserted, the links between their corresponding concepts
are commonly missing. Such a link would ideally help a consumer of the
information (agent or human) model data from the other source in terms of their own
knowledge. This is widely known as ontology alignment, a special form of schema
alignment. Schema alignment has been extensively studied in domains like schema
integration, data warehouses, E-commerce, semantic query processing, etc. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. There is
also a considerable amount of work in ontology matching which is summarized in
Euzenat et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The advent of the Web of Linked Data warrants a renewed inspection of
these methods. Our approach provides alignments between classes from two ontologies
in the Web of Linked Data by looking at the instances which are linked as the same. We
believe that providing an alignment between the ontologies of the two sources on the
Web of Linked Data provides valuable knowledge in understanding and modeling the
semantic relationships among sources.
      </p>
      <p>The paper is organized as follows. We first provide background on geospatial Linked
Data and describe the sources that are the subject of this paper. We then describe our
approach to ontology alignment, where alignments are conjunctions of restriction classes.
We follow with an empirical evaluation on three large geospatial Linked Data sources,
namely GEONAMES, DBPEDIA, and LINKEDGEODATA. Finally, we briefly describe
related and future work and discuss the contributions of this paper.
</p>
    </sec>
    <sec id="sec-2">
      <title>Background on Linked Geospatial Data</title>
      <p>In this section, we provide a brief introduction to the popular use of Linked Data and
the data sources from the geospatial domain that we consider.
</p>
      <sec id="sec-2-1">
        <title>Nature of Linked Data</title>
        <p>
          The Linked Data movement, as proposed by Berners-Lee [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], aims to provide
machine-readable connections between data on the Web. Bizer et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] describe several
approaches to publishing such Linked Data. Most of the Linked Data is generated
automatically by converting existing structured data sources (typically relational databases)
into RDF, using an ontology that closely matches the original data source. For example,
GEONAMES gathers its data from around 40 different sources and it primarily exposes its
data in the form of a flat-file structure (http://download.geonames.org/export/dump/readme.txt), which is described with a simple ontology [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
Such an ontology might have been different if designed at the same time as the
collection of the actual data. For example, all instances of GEONAMES have the same rdf:type
of geonames:Feature; however, they could have been typed more effectively based on
the featureClass &amp; featureCode properties.
        </p>
        <p>
          The links in the Linked Data on the Web make the Semantic Web browsable and,
moreover, increase the amount of knowledge by complementing data already presented
by a resource. A popular way of linking data on the Web is the use of owl:sameAs
links to represent identity links [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Instead of reusing existing URIs, new URIs are
automatically generated while publishing linked data and an owl:sameAs link is used to
say that two URI references refer to the same thing. Both Ding et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and Halpin et al.
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] discuss the popular usage of the owl:sameAs property. Halpin et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] summarize
its usages as belonging to one of four types: i) same thing as but different context,
ii) same thing as but referentially opaque, iii) represents, and iv) very similar to. We,
however, refrain from going into the specifics and use the term as asserting the
same thing. For example, GEONAMES uses the owl:sameAs link to represent an alias or
a co-reference.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Linked geospatial Data Sources</title>
        <p>
We consider three linked data sources: GEONAMES (http://www.geonames.org/ontology/),
DBPEDIA (http://wiki.dbpedia.org/About), and LINKEDGEODATA (http://linkedgeodata.org/About).
GEONAMES is a geographical database that contains over 8 million geographical names
consisting of 7 million unique features including 2.6 million populated places and 2.8
million alternate names. This database is downloadable free of charge and accessible in
various forms like a flat table, web services, etc. We use the Linked Data representation
of the database available as an RDF dump containing 6,520,110 features and 93,896,732
triples. The structure behind the data is the Geonames ontology [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], which closely
resembles the flat-file structure. A typical individual in the database is an instance of type
Feature and has a Feature Class associated with it. These Feature Classes can be
administrative divisions, populated places, structures, mountains, water bodies, etc. Though
the Feature Class is subcategorized into 645 different Feature Codes, the Feature Code
is associated with a Feature instance and not as a specialization of the property
featureClass (this is probably due to the automatic export of existing relational data into
RDF rather than building data conforming to an ontology). A Feature also has several
other properties, such as latitude, longitude, and an owl:sameAs property linking it to an
instance from DBPEDIA.
        </p>
        <p>
          DBPEDIA is a source of structured information extracted from WIKIPEDIA containing
about 1.5 million objects that are classified with a consistent ontology. Because of the
vastness and diversity of the data in DBPEDIA, it presents itself as a hub for links in
the Web of Linked Data from other sources [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The dataset includes individuals not
limited to geospatial data, such as 312,000 persons, 94,000 music albums, etc. We are
interested in the subset of this data to which GEONAMES links (e.g. places,
mountains, etc.). As DBPEDIA contains a large variety of data (e.g. abstracts, links to other
articles, images, etc.) we limit our approach to RDF containing the rdf:type assertion
and info boxes, which provide factual information.
        </p>
        <p>
          LINKEDGEODATA is a geospatial source with its data imported from the Open Street
Map (OSM) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] project containing about 2 billion triples comprising approximately 350
million nodes and 30 million ways. In order to generate the RDF, the data extracted from
the OSM project was linked to DBPEDIA instances using the user created links on OSM
to WIKIPEDIA articles. These links were then used as the training set on which machine
learning algorithms were applied with a heuristic on a combination of the
LINKEDGEODATA type information, spatial distance, and name similarity [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. As a result the set of
owl:sameAs links was then expanded to include more instance matchings based on the
model learnt. This data is provided by LINKEDGEODATA as an RDF dump.
        </p>
        <p>We loaded these three RDF datasets into a local database. In this paper we only
consider instances from DBPEDIA and GEONAMES that are linked using the owl:sameAs
property. Similarly, we focus on instances from LINKEDGEODATA linked by owl:sameAs
to DBPEDIA instances. In order to reduce the number of self joins required on a source
table in the database, the values related to each individual were converted into a single
row identified with its URI. This forms a table where the columns represent all of the
properties (predicates) that occur in the triples of that source. For example, the table for
the GEONAMES subset contains 11 columns, not including the identifier. The values for the
properties (object part of a triple) go into the value of the column for the row
identified by the URI of the individual (the subject part of a triple). We call this a vector. In cases
of multivalued properties, the row is replicated in such a way that each cell contains a
single value but the number of rows equals the number of multiple values. Each new
row however, is still identified with the same URI, thus retaining the number of
distinct individuals. In general, the total number of rows for each individual is the product
of cardinalities of the value sets for each of its properties. Throughout this paper we
describe our methodology using the GEONAMES-DBPEDIA dataset, but we include the
findings for the LINKEDGEODATA-DBPEDIA dataset in the results section.</p>
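        <p>A minimal sketch of this flattening (the triples and URIs below are hypothetical, not taken from the actual datasets): the replication of rows for multivalued properties is a Cartesian product over the value sets.</p>

```python
from itertools import product

def to_vectors(uri, triples):
    """Flatten one individual's triples into rows: one column per
    property, one row per combination of multivalued property values."""
    values = {}
    for s, p, o in triples:
        if s == uri:
            values.setdefault(p, []).append(o)
    props = sorted(values)
    # Row count = product of the cardinalities of the value sets.
    return [dict(zip(props, combo))
            for combo in product(*(values[p] for p in props))]

# Hypothetical triples: one single-valued and one multivalued property.
triples = [("g:SaudiArabia", "featureCode", "geonames:A.PCLI"),
           ("g:SaudiArabia", "alternateName", "Saudi Arabia"),
           ("g:SaudiArabia", "alternateName", "Arabie saoudite")]
rows = to_vectors("g:SaudiArabia", triples)
print(len(rows))  # 2 rows, both for the same URI
```

        <p>Every row keeps the same subject URI, so the number of distinct individuals is preserved, as described above.</p>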
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Ontology Alignment Using Linked Data</title>
      <sec id="sec-3-1">
        <title>Ontology Alignment</title>
        <p>
          An Ontology Alignment is “a set of correspondences between two or more ontologies”
where a correspondence is “the relation holding, or supposed to hold according to a
particular matching algorithm or individual, between entities of different ontologies.”[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
Entities here can be classes, individuals, properties or formulas. Our algorithm relies on
data analysis and statistical techniques for matching the two ontologies, which Euzenat
et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] classify as a common extension comparison approach for aligning the structure.
This approach considers two sets A and B of instances belonging to the two ontologies
out of which some instances are common. Set containment relationships between the
set of instances A and B suggest the alignment between the two as follows: A equals B
(A ∩ B = A = B), A contains B (A ∩ B = B), A is contained in B (A ∩ B = A), A is
disjoint from B (A ∩ B = ∅) and A overlaps with B (A ∩ B ≠ ∅). Euzenat et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] also
argue that using a Hamming distance is more robust than using simple set containment
as it tolerates misclassified individuals and still produces acceptable results. We
identify common instances between the two ontologies required for this technique using the
owl:sameAs links, where the instance identifier in each ontology gets replaced with a
combination of the URIs from both ontologies. Instead of limiting ourselves to existing
classes, we extend the scope of our alignments by using restriction classes as explained
in the following section. In its stricter sense, an alignment between classes would also
consider structural conformity like cardinality constraints on properties, class
hierarchies, domain and ranges of properties, etc. We however do not consider this in order to
provide simpler alignments as most of these ontologies are rudimentary and based on
pre-existing relational structures.
        </p>
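        <p>The set-containment cases above can be sketched as follows (a minimal illustration with hypothetical instance identifiers, ignoring the Hamming-distance relaxation):</p>

```python
def extension_relation(a, b):
    """Classify the alignment suggested by the extensions A and B,
    following the set-containment cases listed above."""
    a, b = set(a), set(b)
    common = a.intersection(b)
    if not common:
        return "disjoint"
    if a == b:
        return "equivalent"
    if common == b:
        return "A contains B"
    if common == a:
        return "A contained in B"
    return "overlap"

# Hypothetical instance identifiers, for illustration only.
print(extension_relation({"i1", "i2"}, {"i1", "i2"}))  # equivalent
print(extension_relation({"i1", "i2", "i3"}, {"i1"}))  # A contains B
```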
      </sec>
      <sec id="sec-3-2">
        <title>Restriction Classes</title>
        <p>In the alignment process, instead of focusing only on classes defined by rdf:type, we
also consider classes defined by conjunctions of property value restrictions (i.e.,
hasValue constraints in the Web Ontology Language), which we will call restriction classes
in the rest of the paper. For example, each instance from GEONAMES has its rdf:type as
Feature. Traditional alignments would then only be between the class Feature from
GEONAMES and another class from DBPEDIA, for example Place. As mentioned
before, instances in GEONAMES are also categorized by the featureCode and featureClass
properties. A restriction on values of such properties gives us classes that we can
effectively align with classes from DBPEDIA. For example, the restriction class formed
from the concept Feature with its value of featureCode property constrained to the
instance ‘geonames:A.PCLI’ (Independent political entity) aligns with the Country
concept from DBPEDIA. Our algorithm defines restriction classes from the source ontologies
and generates alignments between such restriction classes using subset or equivalence
relationships.</p>
        <p>Defining the Space of Restriction Classes for a Vector The space of restriction
classes to which an instance belongs is simply the powerset of the property-value pairs of
the vector representing the instance. For example, assume that the GEONAMES source has
only three properties: rdf:type, featureCode and featureClass; and that the vector for the
instance Saudi Arabia is (geonames:Feature, geonames:A.PCLI, geonames:A). Then
this instance belongs to the restriction class defined by (rdf:type=geonames:Feature &amp;
featureClass=geonames:A). The other sets from the powerset also form such restriction
classes as shown in Figure 1.</p>
        <p>Fig. 1. The restriction classes supported by the Saudi Arabia vector:
(rdf:type=geonames:Feature); (featureCode=geonames:A.PCLI); (featureClass=geonames:A);
(rdf:type=geonames:Feature &amp; featureClass=geonames:A);
(rdf:type=geonames:Feature &amp; featureCode=geonames:A.PCLI);
(featureCode=geonames:A.PCLI &amp; featureClass=geonames:A);
(rdf:type=geonames:Feature &amp; featureCode=geonames:A.PCLI &amp; featureClass=geonames:A)</p>
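        <p>The powerset construction can be sketched as follows (a minimal illustration; the vector follows the Saudi Arabia example above):</p>

```python
from itertools import combinations

def restriction_classes(vector):
    """Yield every non-empty subset of the vector's property-value
    pairs; each subset defines one restriction class it supports."""
    pairs = sorted(vector.items())
    for m in range(1, len(pairs) + 1):
        for combo in combinations(pairs, m):
            yield frozenset(combo)

# The Saudi Arabia vector from the example above.
vector = {"rdf:type": "geonames:Feature",
          "featureCode": "geonames:A.PCLI",
          "featureClass": "geonames:A"}
classes = list(restriction_classes(vector))
print(len(classes))  # 7 = 2^3 - 1 restriction classes
```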
        <p>We employ a bottom-up approach to determining the restriction classes to which an
instance belongs. For example, the instance vector for Saudi Arabia directly supports
the restriction class (rdf:type=geonames:Feature &amp; featureCode=geonames:A.PCLI &amp;
featureClass=geonames:A). This restriction in turn supports the three restrictions shown
in Figure 1. These subsequently support restrictions formed by eliminating a constraint
on one of the properties in it. In general, if from the n properties of the vector
at the current level we select m properties to constrain our restrictions, then that level
would contain the restrictions that represent elements from the combinatorial set
C(n,m) formed from the set S (see Section 3.2). These elements would in turn support
restriction sets formed by choosing (m-1) properties from S (i.e. C(n,m-1)). For a
restriction at level l we call all the restrictions at level (l-1) that it supports as its parent
restrictions. We can see that, if a restriction is supported by some individuals, then those
individuals also support any of its parent restrictions. Using this hierarchical approach
for any vector, we can build a set of restriction classes that it supports (denoted by R),
in a bottom-up fashion, starting with a restriction class corresponding to the vector. It
is evident that in order to consider all restriction classes, the algorithm would be
exponential. We thus need some preprocessing that eliminates those columns that are not
useful.
In order to reduce the search space of alignment hypotheses, we identify properties that
do not contribute to the alignment. Inverse functional properties resemble secondary
keys in databases and identify an instance uniquely. Thus, if a restriction class is
constrained on the value of an inverse functional property, it would only have a single
element in it. As an example, consider the wikipediaArticle property in GEONAMES.
An instance in GEONAMES has links to multiple pages in Wikipedia, each in a
different language. For example, the instance for the country Saudi Arabia (http://sws.geonames.org/102358/about.rdf) has links to 237
different articles. Each of these in turn, however, could be used to identify only Saudi
Arabia. On the other hand, each of the seven million unique features is also
classified into nine different Feature Classes. Thus, featureCode and featureClass properties
are not inverse functional properties and instances can be grouped by restrictions with
constraints on these properties. We remove a column from the table of vectors if its
corresponding property is inverse functional. In some cases (like alternateName
and wikipediaArticle), multivalued properties are eliminated and thus the number of
rows in the flat table is reduced.</p>
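        <p>One simple way to detect such properties in the loaded data (a heuristic sketch, not necessarily the exact preprocessing step used here) is to check whether every value of a property identifies at most one instance:</p>

```python
from collections import defaultdict

def effectively_inverse_functional(rows):
    """Return properties whose every value occurs with at most one
    distinct instance URI -- candidates for removal, since a hasValue
    restriction on them would describe a singleton class."""
    instances_per_value = defaultdict(lambda: defaultdict(set))
    for uri, vector in rows:
        for prop, value in vector.items():
            instances_per_value[prop][value].add(uri)
    return {p for p, vals in instances_per_value.items()
            if all(len(uris) == 1 for uris in vals.values())}

# Hypothetical rows: wikipediaArticle values are unique per instance,
# featureClass values are shared, so only the former is pruned.
rows = [("g:1", {"featureClass": "geonames:A", "wikipediaArticle": "wiki:SA"}),
        ("g:2", {"featureClass": "geonames:A", "wikipediaArticle": "wiki:FR"})]
pruned = effectively_inverse_functional(rows)
print(pruned)  # {'wikipediaArticle'}
```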
        <p>From these individual vectors, we then perform a join on the owl:sameAs property
from GEONAMES such that we get a combination of properties from both ontologies. We
call this concatenation of two vectors an instance pair. In case of multiple vectors for
the same URI (as a result of multivalued properties), we get instance pairs whose count
is equal to the product of the number of vectors from each of the ontologies.
</p>
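        <p>This join can be sketched as follows (hypothetical URIs; a nested-loop join stands in for the database join described above):</p>

```python
def instance_pairs(vectors1, vectors2, same_as):
    """Join vectors from two sources on owl:sameAs links.
    vectors1, vectors2: lists of (uri, vector); same_as: set of (uri1, uri2)."""
    return [((u1, u2), v1, v2)
            for u1, v1 in vectors1
            for u2, v2 in vectors2
            if (u1, u2) in same_as]

# Two vectors for one GEONAMES URI (from a multivalued property) joined
# with one DBPEDIA vector yield 2 x 1 = 2 instance pairs.
g = [("g:1", {"alternateName": "Saudi Arabia"}),
     ("g:1", {"alternateName": "Arabie saoudite"})]
d = [("d:Saudi_Arabia", {"rdf:type": "dbpedia:country"})]
pairs = instance_pairs(g, d, {("g:1", "d:Saudi_Arabia")})
print(len(pairs))  # 2
```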
      </sec>
      <sec id="sec-3-3">
        <title>Forming Alignment Hypotheses</title>
        <p>We build our alignment hypothesis in a bottom-up fashion by examining each instance
pair and the restriction classes that its corresponding vector supports. Each
alignment hypothesis is a two part conjunction of a restriction class from the first ontology
(GEONAMES) and a restriction class from the second one (DBPEDIA). In order to come
up with these hypotheses we look at vectors that compose each of the instance pairs. If
we build a set of restrictions R1 from the first vector and R2 from the second vector,
then the set of hypotheses that this instance pair supports is the Cartesian product of
R1 and R2. We would then aggregate such hypotheses over all instance pairs and
eliminate those supported by only a single instance pair.
As a result we would be left with alignments supported by multiple instances between
the two sources. In practice, however, this brute-force method is very inefficient in space
and time. Hence we use an algorithm that capitalizes on the set-containment
property of the restrictions and finds alignments inspired by the bottom-up
method described in Section 3.2.</p>
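        <p>The brute-force formulation described above can be sketched as follows (hypothetical single-property vectors; the optimized bottom-up variant is not shown):</p>

```python
from collections import defaultdict
from itertools import combinations

def powerset(vector):
    """All non-empty restriction classes a vector supports."""
    pairs = sorted(vector.items())
    for m in range(1, len(pairs) + 1):
        for combo in combinations(pairs, m):
            yield frozenset(combo)

def brute_force_hypotheses(instance_pairs):
    """Aggregate every (r1, r2) hypothesis over all instance pairs and
    keep those supported by more than one pair."""
    support = defaultdict(set)
    for pair_id, v1, v2 in instance_pairs:
        for r1 in powerset(v1):
            for r2 in powerset(v2):
                support[(r1, r2)].add(pair_id)
    return {h: s for h, s in support.items() if len(s) > 1}

# Two instance pairs with identical (hypothetical) restrictions yield
# one surviving hypothesis, supported by both pairs.
pairs = [(("g:1", "d:1"), {"featureCode": "A.PCLI"}, {"rdf:type": "dbpedia:country"}),
         (("g:2", "d:2"), {"featureCode": "A.PCLI"}, {"rdf:type": "dbpedia:country"})]
hyps = brute_force_hypotheses(pairs)
print(len(hyps))  # 1
```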
        <p>For building hypotheses of alignments, we begin with each instance pair. We assert
a seed hypothesis such that if vector1 is the first part (from GEONAMES) of this pair and
vector2 is the second part (from DBPEDIA), then the restriction class corresponding to
vector1 (built of constraints on each of its property-value pairs) has an alignment with
the restriction class corresponding to vector2. Let r1 &amp; r2 be the restriction classes
constituting this hypothesis. We compute P1 &amp; P2 as the sets of parent restrictions of
r1 &amp; r2, respectively.</p>
        <p>Fig. 2. Examples of hypotheses built from an instance pair: the seed hypothesis aligns
(rdf:type = geonames:Feature &amp; featureCode = geonames:A.PCLI) with
(rdf:type = dbpedia:country &amp; dbpedia:language = dbpedia:Arabic); among the parent
hypotheses it supports are (featureCode=geonames:A.PCLI) with (rdf:type=dbpedia:country),
(featureCode=geonames:A.PCLI) with (dbpedia:language=dbpedia:Arabic), and
(rdf:type=geonames:Feature) with (rdf:type=dbpedia:country &amp; dbpedia:language=dbpedia:Arabic).</p>
        <p>We now build a set of parent hypotheses from the seed hypothesis that contains
an alignment from each restriction p1 from P1 to r2 and vice versa. We keep track
of all our hypotheses and all the instance pairs that support them. If an instance pair
supports one hypothesis, then it also supports all of its parent hypotheses. We can say
this, because the restrictions constituting our hypothesis support their own parent
restrictions, which in turn constitute our parent hypotheses. This process is illustrated in
Figure 2.
</p>
      </sec>
      <sec id="sec-3-4">
        <title>Evaluating Alignment Hypotheses</title>
        <p>After building the hypotheses, we score each hypothesis to ascertain the degree of
confidence with which we hold this alignment to be true. For each hypothesis, we refer to the
restriction classes constituting it. We then find all instances belonging to the restriction
class r1 from the first source and r2 from the second source. Figure 3 shows the sets of
such instances. We then compute the image of r1 (denoted by I(r1)), which is the set
of instances from the second source that form instance pairs with instances in r1, by
following the owl:sameAs links. The dashed lines in the figure represent these instance
pairs. All the pairs that match both restrictions r1 and r2 also support our hypothesis;
this set is thus equivalent to the instance pairs corresponding to instances belonging to the
intersection of the sets r2 and I(r1). The set of instance pairs that support our
hypothesis is depicted as the shaded region. We can now capture subset and equivalence
relations between the restriction classes by set-containment relations from the figure.
For example, if the set of instances identified by r2 is a subset of I(r1), then
the set r2 and the shaded region would be entirely contained in I(r1). We use two
metrics P and R to quantify these set-containment relations. Figure 4 summarizes these
metrics and also the different cases of intersection. In order to allow a certain margin
of error induced by the dataset, we are lenient on the constraints and use the relaxed
versions P’ and R’ as part of our scoring mechanism.</p>
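        <p>Figure 4, which gives the exact definitions of P' and R', is not reproduced here. The sketch below assumes P' = |I(r1) ∩ r2| / |r2| and R' = |I(r1) ∩ r2| / |I(r1)|, which is consistent with the worked example that follows, and applies the 0.9 threshold mentioned in the text; the direction of the subset relations is likewise inferred from that example.</p>

```python
def score_hypothesis(image_r1, r2, threshold=0.9):
    """Score one hypothesis given I(r1) (the image of r1 under the
    owl:sameAs links) and the extension of r2, both sets of instance
    identifiers from the second source."""
    common = image_r1.intersection(r2)
    if not common:
        return "disjoint", 0.0, 0.0
    p = len(common) / len(r2)        # assumed P': how much of r2 is covered
    r = len(common) / len(image_r1)  # assumed R': how much of I(r1) is covered
    if p >= threshold and r >= threshold:
        return "equivalent", p, r
    if p >= threshold:
        return "r2 subset of r1", p, r
    if r >= threshold:
        return "r1 subset of r2", p, r
    return "overlap", p, r

# The featureClass=geonames:H vs rdf:type=dbpedia:BodyOfWater example:
# |I(r1)| = 2132, |r2| = 1989, |I(r1) ∩ r2| = 1959.
i_r1 = set(range(2132))
r2 = set(range(173, 2162))  # 1989 elements, 1959 shared with i_r1
rel, p, r = score_hypothesis(i_r1, r2)
print(rel, round(p, 2), round(r, 2))  # equivalent 0.98 0.92
```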
        <p>Fig. 3. Instances and instance pairs supporting a hypothesis. Key: r1 denotes the set of instances from O1 where r1 holds; r2 the set of instances from O2 where r2 holds; I(r1) the set of instances from O2 paired to instances from O1 where r1 holds; the shaded region denotes the instance pairs where both r1 and r2 hold.</p>
        <p>While calculating results, we find equivalences and subsets as follows. Consider
a hypothesis from the restriction class ‘featureClass=geonames:H’ from GEONAMES to
restriction class ‘rdf:type=dbpedia:BodyOfWater’ from DBPEDIA. Based on the support
for the hypothesis we have |I(r1)| = 2132, |r2| = 1989 and |I(r1) ∩ r2| = 1959. Thus,
P’ = 0.98 &amp; R’ = 0.92 and we say that it is an equivalence alignment as both P’ and R’
are greater than a threshold of 0.9 as explained in Figure 4. Similarly, we also identify
alignments like the restriction class ‘featureCode=geonames:P.PPL’ ⊂ restriction class
‘rdf:type=dbpedia:Place’ and others as shown in Table 1.
We generated alignments using the above described method on a subset of GEONAMES
and DBPEDIA that contains only instances that are linked together. After having
generated alignments on a subset of the data, we found that most of the properties left
after preprocessing and removing the inverse-functional properties in DBPEDIA did not
have enough support in the database. Moreover, as the classes in the DBPEDIA
ontology are well-defined and highly specialized, we focused only on the rdf:type property
on the DBPEDIA side for the results. Our approach generates 20116 hypotheses that
cannot be specialized further and have a support of more than one instance pair. From
this we get 8 equivalences, 2937 r1 ⊂ r2 relations and 60 r2 ⊂ r1 relations. Table 1
shows some examples of the equivalence classes that our algorithm finds. Note that as
we consider only the subset of GEONAMES and DBPEDIA consisting of instances linked
via owl:sameAs links, our algorithm classifies alignments like the one
between ‘rdf:type=geonames:Feature’ and ‘rdf:type=dbpedia:Place’ as equivalences,
which with more linked data would probably not hold.</p>
        <p>We also ran our algorithm on the LINKEDGEODATA-DBPEDIA subset. As these
ontologies contain classes that are highly specialized, we chose to restrict the alignments
only between the rdf:types of the sources. We generate 176 hypotheses that cannot be
specialized further and have a support of more than one instance pair. From this we
get 9 equivalences, 66 r1 ⊂ r2 relations and 14 r2 ⊂ r1 relations. These results are
summarized in Table 1.</p>
        <p>It should be noted that these alignments closely follow the semantics behind the
sources. For example, looking at the results we would assume that the alignment of
‘geonames:featureCode=T.MT’ (Mountain) with ‘rdf:type=dbpedia:Mountain’ would
be equivalent. Closer inspection of the GEONAMES dataset shows, however, that there
are some places with Feature Codes like ‘T.PK’ (Peak), ‘T.HLL’ (Hill), etc. from
GEONAMES whose corresponding instances in DBPEDIA are all typed ‘dbpedia:Mountain’.
This implies that the interpretation of the concept ‘Mountain’ is different in both the
sources and only a subsumption relation holds.
5</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Related Work</title>
      <p>
        There is large previous literature on ontology matching [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Ontology matching has
been based on terminological (e.g. linguistic and information retrieval techniques [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]),
structural (e.g. graph matching [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]) and semantic (e.g. model based) approaches or
their combination. The FCA-merge algorithm [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] uses extension techniques over
common instances between two ontologies to generate a concept lattice in order to merge
them, and thus align them indirectly. This algorithm, however, relies on a domain expert
(users) to generate the merged ontology and is based on a single corpus of documents
instead of two different sources, unlike our approach. Perhaps a strong parallel to our
work is found in Duckham et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] which also uses an extensional approach for
fusion and alignment of ontologies in the geospatial domain. The difference in our approach
in comparison to their work (apart from the fact that it predates Linked Data) is that
while their method fuses ontologies and aligns only existing classes, our approach is
able to generate alignments between classes that are derived from the existing ontology
by imposing restrictions on values of any or all of the properties, not limited to the class
type. Most of the work in information integration within the Web of Linked Data is in
instance matching as explained in Bizer et al.[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Raimond et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] use string and
graph matching techniques to interlink artists, records, and tracks in two online music
datasets (Jamendo and MusicBrainz) and also between personal music collections and
the MusicBrainz dataset. Our approach solves a complementary piece of the
information integration problem on the Web of Linked Data by aligning ontologies of linked
data sources.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Linked Data on the Web contains linked instances from multiple sources without the
ontologies of the sources themselves being linked. However, it is useful for consumers
of the data to have the alignments between such ontologies defined. Our algorithm generates
alignments, consisting of conjunctions of restriction classes, that define subsumption
and equivalence relations between the ontologies. This paper focused on automatically
finding alignments between the ontologies of geospatial data sources. However, the
technique is general and can be applied to other data sources on the Web of Linked Data.</p>
      <p>
        In our future work, we plan to improve the scalability of our approach, specifically,
improve the performance of the algorithm that generates alignment hypotheses by using
a more heuristic exploration of the space of alignments. We also plan to apply our
alignment techniques to other sources in the Web of Linked Data beyond the geospatial
domain, such as the richly interlinked genetic data published by Bio2RDF [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          :
          <article-title>A survey of approaches to automatic schema matching</article-title>
          .
          <source>VLDB Journal</source>
          <volume>10</volume>
          (
          <issue>4</issue>
          ) (
          <year>2001</year>
          )
          <fpage>334</fpage>
          -
          <lpage>350</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : Ontology matching. Springer-Verlag (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Design Issues: Linked Data</article-title>
          . http://www.w3.org/DesignIssues/LinkedData.html (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>How to publish linked data on the web</article-title>
          . http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Vatant</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wick</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Geonames ontology</article-title>
          . http://www.geonames.org/ontology/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Halpin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hayes</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          :
          <article-title>When owl:sameAs isn't the same: An analysis of identity links on the semantic web</article-title>
          .
          <source>In: International Workshop on Linked Data on the Web</source>
          , Raleigh, North Carolina (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shinavier</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>owl:sameAs and Linked Data: An Empirical Study</article-title>
          . In: Second Web Science Conference, Raleigh, North Carolina (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Dbpedia: A nucleus for a web of open data</article-title>
          . In: Sixth International Semantic Web Conference, Busan, Korea (
          <year>2007</year>
          )
          <fpage>11</fpage>
          -
          <lpage>15</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Haklay</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>OpenStreetMap: user-generated street maps</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>LinkedGeoData: Adding a Spatial Dimension to the Web of Data</article-title>
          . In: Eight International Semantic Web Conference, Washington, DC (
          <year>2009</year>
          )
          <fpage>731</fpage>
          -
          <lpage>746</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>An API for Ontology Alignment</article-title>
          . In: Third International Semantic Web Conference, Hiroshima, Japan (
          <year>2004</year>
          )
          <fpage>698</fpage>
          -
          <lpage>712</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Melnik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Molina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Similarity flooding: A versatile graph matching algorithm and its application to schema matching</article-title>
          .
          <source>In: International Conference on Data Engineering</source>
          , San Jose, California (
          <year>2002</year>
          )
          <fpage>117</fpage>
          -
          <lpage>128</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maedche</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>FCA-Merge: Bottom-up merging of ontologies</article-title>
          .
          <source>In: International Joint Conference on Artificial Intelligence</source>
          , Seattle, Washington (
          <year>2001</year>
          )
          <fpage>225</fpage>
          -
          <lpage>234</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Duckham</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Worboys</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>An algebraic approach to automated geospatial information fusion</article-title>
          .
          <source>International Journal of Geographical Information Science</source>
          <volume>19</volume>
          (
          <issue>5</issue>
          ) (
          <year>2005</year>
          )
          <fpage>537</fpage>
          -
          <lpage>557</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Raimond</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sandler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Automatic interlinking of music datasets on the semantic web</article-title>
          .
          <source>In: First Workshop on Linked Data on the Web</source>
          , Beijing, China (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Belleau</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tourigny</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Good</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morissette</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Bio2RDF: A Semantic Web atlas of post genomic knowledge about human and mouse</article-title>
          . Springer (
          <year>2008</year>
          )
          <fpage>153</fpage>
          -
          <lpage>160</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>