<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Prasenjit Mitra and Gio Wiederhold, Infolab, Stanford University, Stanford, CA 94305, USA</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>A system that enables interoperation among information sources using ontologies needs to resolve the terminological differences between ontologies. In this work, we present several methods that we have designed to match terms used in different ontologies. We have implemented two methods based on linguistic similarities of terms used in the ontologies. The first looks up a dictionary or a semantic network like WordNet, and the second determines similarities of words based on word similarity computed from a domain-specific corpus of documents. We discuss our experiments, which indicate that a method that uses both heuristics produces good results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Often, we cannot answer a query from a single source, and need to
compose information from multiple information sources. These
information sources are autonomously created and maintained.
Integrating the information in them to create a single source is not an
option when the owners of the information sources prefer to
maintain their autonomy. The merging approach of creating a unified
source is not scalable and is costly. Besides, an integrated
information source would need to be updated as soon as any information in
any individual source changes [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Furthermore, in certain cases a
complete unification of a large number of widely disparate
information sources into one monolithic information source is not feasible
due to unresolvable inconsistencies between them that are irrelevant
to the application. For a particular application, resolution of
inconsistencies between a pair of knowledge sources is typically feasible,
but it becomes nearly impossible when the objective is undefined and
the number of sources is large.
      </p>
      <p>Due to the complexity of achieving and maintaining global
semantic integration, the merging approach is not scalable. We have
adopted a distributed approach which allows the sources to be
updated and maintained independent of each other and enables
composition of information via interoperation.</p>
      <p>Ontologies are increasingly being used to assist the integration of
information. They specify the terminology (and its semantics) used
in information sources. These sources are autonomously created and
maintained.</p>
      <p>The alternative to individual ontologies for individual sources is
to use standard ontologies across multiple information sources.
Efforts to create and use standardized ontologies have met with limited success.</p>
    </sec>
    <sec id="sec-2">
      <title>The Need for Autonomous Ontologies</title>
    </sec>
    <sec id="sec-3">
      <title>Resolving Semantic Heterogeneity</title>
      <p>Problems of heterogeneity in hardware, operating systems, and data
structures have been widely addressed, but issues of diverse
semantics have been handled mainly in an ad-hoc fashion. While
composing information from information sources, we need to ensure that the
information that we are composing have some semantically
meaningful relationship. Semantic heterogeneity among information sources
needs to be resolved to enable meaningful information exchange or
interoperation among them.</p>
      <p>The two major sources of heterogeneity among the sources are as
follows: First, different sources use different data formats and
modeling languages to represent their data and meta-data. Second, sources
using the same data format differ in their structure and semantics
of the terminology they use. Such heterogeneity is a result of the
autonomous nature of the ontologies and the fact that information
sources are constructed by different people with different objectives
in mind.</p>
      <p>Often different sources use different terminologies to describe the
objects in the sources. The same term, used in different sources,
often has overlapping or somewhat different semantics; e.g., the term
“nail” has entirely different semantics in a “cosmetics” ontology and
in a “carpentry” ontology. Similarly, different sources often use
different terms to refer to semantically similar objects; e.g., the terms
“truck” and “lorry” in two transportation ontologies might refer to
the same class of objects.</p>
      <p>In order to enable interoperation, we intend to capture the semantic
bridges between two ontologies using articulation rules. These rules
express the relationship between two (or more) concepts belonging
to the ontologies that we seek to interoperate. Since these ontologies
can be fairly large, establishing such rules manually is a very
expensive and laborious task. Fully automating the process is also not
feasible. First, despite the rapid advances made in the field of natural
language processing, the technology still remains inadequate to
automatically resolve semantic heterogeneity among these information
sources using different terminology. Second, even though ontologies
expose some of the semantics of the terms and their relationships,
they often remain incomplete or inadequate if we consider the needs
of the various applications that use them.</p>
      <p>
        The problem of ontology alignment has been studied for some
time. Tools like OntoMorph [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], PROMPT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and Chimaera [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
help significantly automate the process. However, these tools do not
contain a component that automatically identifies linguistically
similar concept names and uses that knowledge as the basis for
further alignment of the ontologies. They require manual
construction of articulation rules or base their matches on the structure of
the ontologies. Our approach provides a greater degree of
automation while keeping the option of a human expert to ratify the
suggested articulation. A similar problem is that of schema matching in
databases. However, most of the techniques used in matching tools
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] etc. are inadequate when the primary differences among
sources are terminological, the sources share little structural
similarity, or instance data is unavailable; in our applications these
conditions hold, so such techniques would provide poor results.
      </p>
      <p>In this paper, we propose a semi-automated algorithm for
resolving the terminological heterogeneity among the ontologies and
establishing the articulation rules necessary for meaningful interoperation.</p>
      <p>This algorithm forms the basis of the articulation generator for our
ONtology compositION system (ONION). Our experiments show
that basing such matching on structural information is inadequate.</p>
      <p>We describe several heuristics to resolve the terminological
heterogeneity among ontologies. Experimental results show that combining
the information obtained by using multiple heuristics provides a
better match between semantically related terms in the ontologies.</p>
      <p>The articulation generator in ONION interprets only the
relationships SubClassOf, PartOf, AttributeOf, InstanceOf, and
ValueOf; all other relationships are not interpreted by the
articulation generator.</p>
      <p>Articulation rules are of two types: simple statements of the form
Match(Concept1, Concept2) expressing matches between equivalent
concepts, and more complex rules expressed in datalog that are mostly
supplied by the expert, for example: FORALL X,Y,Z connection(X,Z)
&lt;- connection(X,Y) AND connection(Y,Z). We use the relation
Match(Concept1, Concept2) to indicate that the two concepts are
related above an acceptable threshold, using an expert-supplied metric
of relatedness that can vary from application to application. Match
does not indicate the exact semantic relationship between the two
concepts, for example, whether they have a class-subclass relationship
or are equivalent. It gives a coarse relatedness measure, and it is up
to the human expert to refine it to something more semantic if such
refinement is required by the application. By default, all concepts
that Match are taken to be equivalent unless otherwise noted by the
expert.</p>
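      <p>As an illustration, the two rule types can be represented as follows; this is a minimal sketch in Python, not ONION's actual data structures, and all names here are hypothetical.</p>

```python
from dataclasses import dataclass

# Hypothetical in-memory form of the two kinds of articulation rules;
# ONION's actual representation is not shown in the paper.

@dataclass(frozen=True)
class Match:
    """Coarse relatedness: the two concepts scored above the threshold.

    By default a Match is read as equivalence, unless the expert
    refines it (e.g., to a class-subclass relationship).
    """
    concept1: str
    concept2: str
    relation: str = "Equivalent"  # the expert may override this

# A more complex, datalog-style rule is kept as text and handed to the
# inference engine (here it is only stored, not evaluated).
datalog_rule = "connection(X,Z) :- connection(X,Y), connection(Y,Z)."

m = Match("O1.truck", "O2.lorry")
```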
    </sec>
    <sec id="sec-4">
      <title>Ontologies and Their Articulations</title>
      <p>In Figure 1, we show an example articulation. On the left-hand
side is a portion of the United Airlines Ontology, and on the right
a portion of the TRANSCOM Ontology. These ontologies were
constructed manually for experimentation. The objective of the
application is to transport military men and materiel from Washington D.C.
to Al Jabar Airbase in Kuwait. A combination of commercial flights
and special-purpose sorties is to be used to meet the transport
objective.</p>
      <p>Figure 1 also shows the inference engine applying articulation
rules such as Equ(Airport “Frankfurt”, AFB “Rhein Main AFB”),
Impl(Sortie, Connection), Impl(Flight, Connection), and
Equ(DepCity, From) to relate a sortie from “Rhein Main AFB” to
“al-Jaber AB” with a flight.</p>
      <sec id="sec-4-2">
        <title>Declaratively Specified Rules</title>
        <p>The tool generates and suggests the simpler articulation rules to
indicate which terms in the two ontologies are related. The expert
then validates these suggestions, and the final set of articulation
rules is stored to be used during query rewriting and execution.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Generation of Ontology Articulations</title>
    </sec>
    <sec id="sec-6">
      <title>Non-iterative Algorithms</title>
      <sec id="sec-6-1">
        <title>Linguistic Matching</title>
        <p>Non-iterative algorithms are ones that identify the matching concepts
in the two ontologies in one pass. Our linguistic matcher employs
only non-iterative algorithms.</p>
        <p>The linguistic matcher looks at all possible pairs of terms from the
two ontologies it is matching and assigns a similarity score to each
pair. If the similarity score is above a threshold, then the match is
accepted and an articulation rule is generated. The threshold can be
modified by the expert performing the articulation to increase or
decrease the number of matches generated.</p>
        <p>We expect that a concept name is represented as a string of words.
The matcher constructs all possible pairs of words where the two
words in a pair come from different strings. The matcher uses a
word-similarity table generated by a word relator, which we describe
below. It looks up the word-similarity table to determine the similarity
between all such pairs of words. Finally, it computes the similarity
of the strings based on the similarity of the pairs of words.</p>
        <p>match( String s1, String s2, WordSimilarityTable wst)
– List similarityList;
– for each word w1 in s1:
– for each word w2 in s2:
– similarityScore = wst.lookup( w1, w2 );
– add (w1, w2, similarityScore) to similarityList;
– sort similarityList on the similarity score of the tuples;
– set matchedWords = null;
– floatingPointNumber matchingScore = 0.0;
– for each tuple (w1, w2, ss) in similarityList:
– if neither w1 nor w2 is in matchedWords:
– matchingScore = matchingScore + ss;
– add w1 and w2 to matchedWords;
– similarityScore = matchingScore / min( size(s1), size(s2) );
– return similarityScore;</p>
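        <p>The matching procedure above can be sketched as follows; this is a minimal Python rendering of the pseudocode, with the word-similarity table simplified to a dictionary, and the greedy-pairing step is a reconstruction consistent with the use of matchedWords, not a verbatim transcription.</p>

```python
# Sketch of the linguistic matcher's string-similarity score: score every
# cross pair of words, greedily accept the most similar pairs, and
# normalize by the number of words in the shorter string.

def match(s1, s2, wst):
    """Score the similarity of two concept-name strings.

    s1, s2: lists of words; wst: dict (word1, word2) -> similarity.
    """
    # 1. Score every cross pair of words via the word-similarity table.
    pairs = [(w1, w2, wst.get((w1, w2), 0.0)) for w1 in s1 for w2 in s2]
    # 2. Consider the most similar pairs first.
    pairs.sort(key=lambda t: t[2], reverse=True)
    matched, total = set(), 0.0
    # 3. Greedily accept a pair only if neither word is already matched.
    for w1, w2, ss in pairs:
        if w1 not in matched and w2 not in matched:
            total += ss
            matched.update({w1, w2})
    # 4. Normalize by the number of words in the shorter string.
    return total / min(len(s1), len(s2))

wst = {("department", "department"): 1.0, ("defence", "defense"): 0.9}
score = match(["department", "of", "defence"], ["defense", "department"], wst)
# (1.0 + 0.9) / 2 = 0.95; "of" remains unmatched
```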
        <p>We establish the articulation rules semi-automatically.
ONION has an automated articulation generator (ArtGen) that
suggests articulations based on a library of heuristic matchers. Each
matcher matches terms in the two ontologies. A human expert,
knowledgeable about the semantics of concepts in both ontologies,
validates the suggested matches generated by ArtGen using a GUI
tool. The expert can either accept the match, keep the match but
modify the suggested relationship between the matched terms, delete a
suggested match or say that the match is irrelevant for the application
at hand. The expert can also indicate new matches that the
articulation generator might have missed.</p>
        <p>The process of constructing an articulation is an iterative process
and after the expert is satisfied with the rules generated, they are
stored and used when information needs to be composed from the
two ontologies. The response of the expert is also logged and the
articulation generator uses the expert’s feedback to generate better
articulations in the future when articulating similar ontologies for
similar applications. This learning process improves the quality of future
generation of articulations from similar information sources.</p>
        <p>The heuristic matchers used by the automated articulation
generator can be classified into two broad types - iterative and non-iterative.
Since the articulation generator is modular in nature, any
application-specific matching algorithm can be plugged in. However, we believe
that a set of basic matching algorithms will be useful in a wide
variety of applications and we experimented to determine such a set.</p>
        <p>The denominator is the number of words in the shorter string.</p>
        <p>This similarity score of two strings is then normalized with
respect to the highest generated score in the application. The
normalization step removes the bias of word-relators that give
very low similarity scores for all pairs of words, or of those that
give very high scores to all pairs of words. If the normalized
similarity score is above the threshold, then the two concepts are
said to match, and we generate an articulation rule.</p>
        <p>For example, the definitions of ”truck” and ”boat” are ”an
automotive vehicle suitable for hauling” and ”a vessel for water
transportation”. If the specified depth is 1, we do not look into the definitions
of ”vehicle” and ”vessel” to determine their similarity. Since they
are not exactly the same, we say their similarity is 0. If however, the
depth were set to 2 (or more), we would look up the definitions of
”vehicle” and ”vessel”, discover their definitions both have
”transportation” in common, and generate a similarity measure and
propagate that similarity up to generate a non-zero similarity for ”truck”
and ”boat”.</p>
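        <p>A minimal sketch of such a depth-limited, definition-based word relator; the toy glossary, the stop-word list, and the damping factor for similarity propagated from deeper levels are illustrative assumptions, not the actual dictionary-based relator.</p>

```python
# Hypothetical glossary and stop words; a real relator would look up
# WordNet or a dictionary instead.
STOP = {"a", "an", "the", "for", "of", "or", "and"}

GLOSSARY = {
    "truck": "an automotive vehicle suitable for hauling",
    "boat": "a vessel for water transportation",
    "vehicle": "a conveyance for transportation of people or goods",
    "vessel": "a craft for water transportation",
}

def definition_words(word):
    """Content words of a word's definition (empty if undefined)."""
    return set(GLOSSARY.get(word, "").split()) - STOP

def similarity(w1, w2, depth):
    """Definition-overlap similarity, recursing at most `depth` levels."""
    if w1 == w2:
        return 1.0
    if depth == 0:
        return 0.0
    d1, d2 = definition_words(w1), definition_words(w2)
    overlap = d1 & d2
    if overlap:
        return len(overlap) / len(d1 | d2)
    # No direct overlap: compare the definitions' words one level deeper
    # and propagate the best similarity found, damped (factor assumed).
    best = 0.0
    for a in d1:
        for b in d2:
            best = max(best, similarity(a, b, depth - 1))
    return 0.5 * best
```

        <p>With depth 1, ”truck” and ”boat” score 0 (their definitions share no content word); with depth 2, the shared ”transportation” in the definitions of ”vehicle” and ”vessel” propagates up to a non-zero score, mirroring the example above.</p>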
        <p>
          Corpus-Based Word Relator: Word similarities used by the
linguistic matcher can also be generated using a corpus-based matching
algorithm. The word relator uses a corpus of documents belonging
to the domain of the ontologies that are being matched. The terms
that appear in the ontology should also appear in the documents. The
word relator calculates word-similarity scores based on the similarity
of the contexts in which the words appear in the documents [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>We identify the context in which a word, w, appears by looking
at words that appear in a 1000-character neighbourhood of all
occurrences of w in documents in the corpus. For example, the words
”in”, ”the”, ”For”, and ”example” constitute the 30-character
neighbourhood of the word ”corpus” at the end of the last sentence. In
the example, we looked at a 15-character window ahead of the word and
15 characters behind the word and chose all words that are complete
in these windows. Therefore, even though part of the word
”documents” appears in the 15-character window before the word ”corpus”
in that sentence, it is ignored.</p>
      <p>We look at all words that appear in the corpus. For each occurrence
of a word, we identify the words in its context. The number of rows
in the context vector Vw of a word w equals the number of words in
the corpus; Vw[i] = c means that the ith word in the corpus occurs
with frequency c in the 1000-character neighbourhood of the word w.
The cosine of such normalized context vectors of two words gives a
measure of the similarity of the contexts in which the two words
appear. We use this similarity measure to generate a table of word
similarities that is then used by the linguistic matcher.</p>
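      <p>A minimal sketch of the context-vector construction and cosine comparison; the small window size, toy text, and tokenization are simplifying assumptions (the actual relator uses a 1000-character neighbourhood over a document corpus).</p>

```python
import math
import re
from collections import Counter

def context_vectors(text, window=15):
    """Map each word to a Counter of the words that appear, complete,
    within `window` characters on either side of its occurrences."""
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    vectors = {}
    for word, start, end in tokens:
        ctx = vectors.setdefault(word, Counter())
        for other, s, e in tokens:
            # Count only words fully inside the window, skipping this occurrence.
            if (s, e) != (start, end) and s >= start - window and e <= end + window:
                ctx[other] += 1
    return vectors

def cosine(u, v):
    """Cosine of two sparse count vectors (cosine normalizes lengths)."""
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

      <p>On a toy text in which ”truck” and ”lorry” occur in similar sentences, their context vectors come out nearly parallel, which is exactly the signal the relator exploits.</p>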
        <p>Ideally, we would have one corpus associated with one ontology,
where the documents in the corpus use the terms in the exact sense
as it is used in the ontology. However, for our experiments we did
not have such a domain-specific corpus. We generated a corpus by
searching the web (Google) using 5 keywords each from the two
ontologies that we were seeking to articulate. Typically, a corpus of 200
pages proved adequate to produce good matches.</p>
        <p>These algorithms look for structural isomorphism between subgraphs
of the ontologies to find matching concepts. For the ontologies we
have experimented with, we see that a purely structural matcher,
one that simply looks for isomorphism between subgraphs in the
ontologies without considering concept names, performs very poorly
and is inadequate.</p>
        <p>Therefore, we propose a structure-based matcher that is called
after the matches generated by a linguistic matcher are available. If the
linguistic matcher has matched nodes ”A” and ”B” in the
ontology graphs, the structural matcher looks to match their children (and
parents), ”C” and ”D”, if they have not already been matched. If
a substantial percentage (above the supplied threshold) of the parents
of ”C” have been matched with those of ”D”, and the children of
”C” have been matched with those of ”D”, then an articulation rule
matching ”C” and ”D” is generated.</p>
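        <p>A sketch of one propagation pass of such a structure-based matcher; here the ontology graphs are simplified to a single neighbour relation rather than separate parent and child links, and the names and threshold are hypothetical.</p>

```python
# One pass of structural match propagation: an unmatched node pair (c, d)
# is matched when a large enough fraction of c's neighbours are already
# matched to neighbours of d.

def structural_pass(onto1, onto2, matched, threshold=0.5):
    """Extend `matched` (dict node1 -> node2) by one propagation pass.

    onto1, onto2: adjacency dicts, node -> set of neighbour nodes.
    Returns the list of newly generated (node1, node2) rules.
    """
    new_rules = []
    for c in onto1:
        if c in matched:
            continue
        for d in onto2:
            if d in matched.values():
                continue
            n1, n2 = onto1[c], onto2[d]
            # Fraction of c's neighbours whose match is a neighbour of d.
            hits = sum(1 for n in n1 if matched.get(n) in n2)
            if n1 and hits / len(n1) >= threshold:
                matched[c] = d
                new_rules.append((c, d))
    return new_rules
```

        <p>For instance, once ”A” and ”B” are matched linguistically, their unmatched neighbours ”C” and ”D” get matched by this pass.</p>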
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Iterative Algorithms</title>
      <p>
Instance-based matching heuristics have been used to successfully
match schemas in databases [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Such matchers look at data types,
and extract other features like lengths of attributes, numerical or
lexical statistics of attributes, and match classes based on such feature
vectors. Though we can handle ontologies whose concepts also
have instances associated with them, businesses are oftentimes
reluctant to make instances available. Thus, we have designed our
algorithms assuming that no instance data is available. However, if such
information is available, the matcher can be extended to use instance
information.
      </p>
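      <p>As an illustration of such instance-based matching, attributes can be summarized by feature vectors over their instances; the particular features and tolerance below are illustrative assumptions, not the actual features used in [14].</p>

```python
# Each attribute is summarized by a small feature vector over its
# instance values (here: mean string length and fraction of numeric
# values), and attributes with close feature vectors are proposed as
# matches. Feature choice and tolerance are hypothetical.

def features(instances):
    vals = [str(v) for v in instances]
    mean_len = sum(len(v) for v in vals) / len(vals)
    numeric = sum(v.replace(".", "", 1).isdigit() for v in vals) / len(vals)
    return (mean_len, numeric)

def instance_match(attrs1, attrs2, tol=1.0):
    """attrs1, attrs2: dict attribute -> list of instance values.
    Returns pairs whose feature vectors differ by at most `tol`."""
    matches = []
    for a, va in attrs1.items():
        fa = features(va)
        for b, vb in attrs2.items():
            fb = features(vb)
            dist = sum(abs(x - y) for x, y in zip(fa, fb))
            if dist <= tol:
                matches.append((a, b))
    return matches
```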
      <p>Iterative algorithms are algorithms that depend upon existing
articulation rules to generate further articulation rules. They require
multiple iterations over the two source ontologies in order to generate
semantic matches between them.</p>
      <sec id="sec-7-1">
        <title>Instance-based Heuristics</title>
      </sec>
      <sec id="sec-7-2">
        <title>Structure-based Heuristics</title>
      </sec>
      <sec id="sec-7-3">
        <title>Inference-based Heuristics</title>
        <p>An inference engine can reason with the rules available with the
ontologies and any seed rules provided by an expert to generate
matches between the ontologies. For example, the following rule
states that a luxury car in ontology O1 corresponds to a car in
ontology O2 whose price, in dollars, exceeds 40,000:
(=&gt; (InstanceOf X O1.LuxuryCar)
((InstanceOf X O2.Car) AND
(O2.PriceOf Y X) AND
(O2.UnitOf X "$") AND
(ValueOf X Z) AND
(&gt; Z 40,000)))</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Experiments &amp; Results</title>
      <p>We experimented with two sets of ontologies:
1. Ontologies (avg. 30 nodes) constructed manually to represent a
domestic airline (terminology used on the United Airlines website)
and an air force ontology (terminology used in the US Air Force).
2. Ontologies (avg. 50 nodes) constructed manually from NATO
government web-sites, representing each web-page associated with
a department of the government as a node. The edges in the
ontology graph were derived from the links between the pages.</p>
      <p>For the algorithm to scale, we pre-compute
the similarity of all pairs of words in the corpus. The corpus-based
method can then be thought of as equivalent to a lookup-based
method, where the word-similarity matrix is constructed from the
words in the corpus. This variation of the corpus-based method
scaled well and for our ontologies finished within a couple of
minutes at worst.</p>
      <p>Quality: The quality of the matches was very dependent on the
quality of the corpus available. We experimented with corpora of
50 pages, 100 pages, 200 pages, and 1000 pages. Corpora of
50-100 pages resulted in low recall figures for the matches. A
size of 200 webpages often proved adequate to generate a recall
of 70%; although in most cases a corpus of 1000 pages increased
the recall, the increase was less than a few percentage points.</p>
      <p>We measured the accuracy of the generated match by comparing
the results generated by the automated matcher with those expected
by the expert. Any match deleted by the expert was taken to be a false
positive and lowered the precision figures, and a match added by the
expert that the automated generator failed to find lowered the recall.
We summarize the results of the several experiments below:</p>
      <p>A purely structural method that requires exact concept-name
matches, like those used in existing tools, fails to generate
even 50% of the matches expected by the expert. This result is not
surprising: despite carrying useful information, the structure
of the ontologies used hardly encodes sufficient semantics to be
relied on solely for ontology alignment.</p>
      <p>Adding linguistic heuristics gave significantly better results,
especially the corpus-based heuristic, provided we supplied the
matcher with a good, representative corpus of documents from
the applicable domain.</p>
      <p>However, a multi-strategy approach works best. On average,
about 75% of the matches were generated, with less than 5% false
positives indicated by the expert as incorrect. The linguistic
method generates on average about 60-70% of the matches
(recall, at 95% precision). Adding the structural matcher boosts
the matches by 5-10%. The human expert provided us with the
other 30% of the rules that were not generated automatically.</p>
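      <p>The precision and recall figures above follow the measure described earlier: matches deleted by the expert lower precision, and matches the expert had to add lower recall. A minimal sketch:</p>

```python
# Evaluation against the expert's expected matches: suggested matches
# absent from the expert set are false positives (hurt precision);
# expert matches the generator missed hurt recall.

def precision_recall(suggested, expert):
    """suggested, expert: sets of (concept1, concept2) match pairs."""
    true_pos = suggested & expert
    precision = len(true_pos) / len(suggested) if suggested else 1.0
    recall = len(true_pos) / len(expert) if expert else 1.0
    return precision, recall
```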
      <p>The performance of the algorithm depends upon several
parameters:</p>
      <p>Thesaurus-based Method: A general-purpose thesaurus produces
very poor results. Domain-specific thesauri produce better results
but might not be available.</p>
      <p>Corpus-based Method: A corpus-based method produced better
results than the thesaurus-based method. In the aircraft example,
solely employing the thesaurus-based method produced a 30%
recall (at 90% precision). A corpus-based method, where we
obtained a corpus by searching the web with a few key-words from
the domains, boosted the match to 60%. Combining the two, we
obtained a recall of 70%.</p>
      <p>Scalability: Initially, we tried the corpus-based method with a
preprocessing step of collecting the corpus and building up the
word-context vectors. The linguistic matcher, while matching the
ontologies, constructed the word similarities as needed. However,
a test case with 300 nodes in each ontology took an hour to run on
a Pentium III machine with 256M memory. It became clear that
for larger ontologies the algorithm does not scale well if we
compute the word similarities while matching the ontologies. For the
algorithm to scale, not only do we need to build the corpus and
construct the word-context vectors a priori, but we must also
pre-compute the word similarities.</p>
      <p>In Figure 2 (hand-drawn), we show two ontologies, the United
Airlines Ontology and the TRANSCOM Ontology, and the matches
generated. We used a hybrid method that uses WordNet as a thesaurus
and a corpus generated by searching Google. For example, the page
”http://www.etrackcargo.com/Help/Agents/Field” was part of the
corpus. The confidence scores of the matches are as follows when the
threshold was set to 0.7:</p>
      <p>If the threshold was set to a lower value, 0.60, we would have
introduced spurious matches. The match for Payload was not scored
higher than 0.7 by the corpus-based word-relator alone and would not
have been suggested. Thus, we see that a hybrid method gives us
better accuracy than any one method alone.</p>
      <p>In this example, we see that with a threshold value of 0.7, we
generate all the desired matches and no false matches - the ideal
solution. However, achieving this in 100% of cases is not possible.
From this and several other experiments, we see that setting a
threshold of 0.7 gives the greatest number of matches at about 95%
precision, i.e., few of the matches are false positives. However,
in a significant number of cases the value of the threshold varies
depending upon both the corpus supplied and the ontologies being
matched. Therefore, we suggest that for an unknown application or
an unknown corpus, when running the first time, the matching
threshold be set to 0.7. This is not to say that a threshold of 0.7
will always produce the best results, but from our experience it
provides a good starting point, as there is no one threshold value
that will prove satisfactory for all applications. If not satisfied
with the results, the expert can then increase or decrease the
threshold to get better matches.</p>
    </sec>
    <sec id="sec-9">
      <title>Conclusion</title>
      <p>We discussed several heuristic methods to produce simple matching
rules between concepts in ontologies that are being aligned. We see
that a multi-strategy method based on initial linguistic similarity
followed by structural matching generates matches between ontologies
with reliable accuracy. The work of an expert who then validates the
suggested rules or supplies new rules is substantially reduced by the
automated component.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] <article-title>'Wordnet - a lexical database for english'</article-title>, http://www.cogsci.princeton.edu/ wn/,
          <source>Technical report</source>
          , Princeton University.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] <article-title>'Resource description framework (rdf) model and syntax specification, w3c recommendation'</article-title>, http://www.w3.org/tr/rec-rdf-syntax, (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Goh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Madnick</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Siegel</surname>
          </string-name>
          .
          <article-title>Semantic interoperability through context interchange: Representing and reasoning about data conflicts in heterogeneous</article-title>
          and autonomous systems http://citeseer.nj.nec.com/191060.html.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chalupsky</surname>
          </string-name>
          , '
          <article-title>Ontomorph: A translation system for symbolic knowledge'</article-title>
          ,
          <source>in KR 2000</source>
          , pp.
          <fpage>471</fpage>
          -
          <lpage>482</lpage>
          . Morgan Kaufmann Publishers, (
          <year>Apr 2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Halevy</surname>
          </string-name>
          , '
          <article-title>Reconciling schemas of disparate data sources: A machine-learning approach'</article-title>
          ,
          <source>in SIGMOD</source>
          <year>2002</year>
          , (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jannink</surname>
          </string-name>
          ,
          <article-title>A Word Nexus for Systematic Interoperation of Semantically Heterogeneous Data Sources</article-title>
          ,
          <source>Ph.D. dissertation</source>
          , Stanford University,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Madhavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          , and E. Rahm, '
          <article-title>Generic schema matching with cupid'</article-title>
          ,
          <source>in VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, September 11-14</source>
          ,
          <year>2001</year>
          , Roma, Italy, pp.
          <fpage>49</fpage>
          -
          <lpage>58</lpage>
          . Morgan Kaufmann, (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.L.</given-names>
            <surname>McGuiness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fikes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rice</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Wilder</surname>
          </string-name>
          , '
          <article-title>The Chimaera ontology environment'</article-title>
          ,
          <source>in Seventh National Conference on Artificial Intelligence (AAAI-2000)</source>
          , (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Melnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hector</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Erhard</given-names>
            <surname>Rahm</surname>
          </string-name>
          , '
          <article-title>Similarity flooding: A versatile graph matching algorithm and its application to schema matching'</article-title>
          ,
          <source>in Proceedings of the Eighteenth International Conference on Data Engineering</source>
          , San Jose, CA. IEEE Computer Society, (
          <year>February 2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.F.</given-names>
            <surname>Noy</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Musen</surname>
          </string-name>
          , '
          <article-title>PROMPT: Algorithm and tool for automated ontology merging and alignment'</article-title>
          ,
          <source>in Seventh National Conference on Artificial Intelligence (AAAI-2000)</source>
          , (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.E.</given-names>
            <surname>Oliver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shahar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.H.</given-names>
            <surname>Shortliffe</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Musen</surname>
          </string-name>
          , '
          <article-title>Representation of change on controlled medical terminologies'</article-title>
          ,
          <source>in Proc. AMIA Conference</source>
          , (Oct.
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Yannis</given-names>
            <surname>Papakonstantinou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hector</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jeffrey D.</given-names>
            <surname>Ullman</surname>
          </string-name>
          , '
          <article-title>MedMaker: A mediation system based on declarative specifications'</article-title>
          ,
          <source>in Proceedings of the Twelfth International Conference on Data Engineering, February 26 - March 1</source>
          ,
          <year>1996</year>
          , New Orleans, Louisiana, ed.,
          <string-name>
            <given-names>Stanley Y. W.</given-names>
            <surname>Su</surname>
          </string-name>
          , pp.
          <fpage>132</fpage>
          -
          <lpage>141</lpage>
          . IEEE Computer Society, (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Hinrich</given-names>
            <surname>Schuetze</surname>
          </string-name>
          , '
          <article-title>Dimensions of meaning'</article-title>
          ,
          <source>in Supercomputing</source>
          , pp.
          <fpage>787</fpage>
          -
          <lpage>796</lpage>
          , (
          <year>1992</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Haas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          , '
          <article-title>Data-driven understanding and refinement of schema mappings'</article-title>
          ,
          <source>in ACM SIGMOD</source>
          , (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>