<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Cluster-based Approach for User Participation in Ontology Matching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vinicius Lopes</string-name>
          <email>vinicius.lopes@uniriotec.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fernanda Baião</string-name>
          <email>fernanda.baiao@uniriotec.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kate Revoredo</string-name>
          <email>katerevoredo@uniriotec.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>A Clustering-based Approach for User Participation</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Applied Informatics, Federal University of the State of Rio de Janeiro (UNIRIO)</institution>
          ,
          <addr-line>Rio de Janeiro</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>User participation is a promising approach for ontology matching; however, determining the most representative pairs of entities for users to assess is still a challenge. This paper outlines an ontology matching approach for user participation that employs a clustering algorithm. Ontology matching focuses on identifying correspondences between entities of two or more ontologies and establishing an alignment as a solution to the heterogeneity problem. Several ontology matching works involve users [2][5], for instance by selecting and combining similarity measures, tuning parameter values, or giving feedback on suggested correspondences. User feedback is considered particularly promising because it requires domain knowledge rather than technical knowledge. However, since available users are hard to find, user effort must be minimized by submitting only the most representative correspondences for feedback. To address this issue, we apply a clustering algorithm to identify the most representative pairs of entities.</p>
      </abstract>
      <kwd-group>
        <kwd>ontology matching</kwd>
        <kwd>machine learning</kwd>
        <kwd>clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Our proposed approach consists of four steps, detailed below.</p>
      <p>Select Candidate Correspondences. In this step, a committee is formed to select a subset of
candidate correspondences for user feedback. Given two ontologies O and O′, each committee
member m<sub>i</sub> is represented by a matrix M<sub>i</sub>, where each cell M<sub>i</sub>[x,y] holds the similarity value
(computed by a single similarity measure or by a combination of measures) for the pair (x,y), with x an entity
of O and y an entity of O′. Since the matrices M<sub>i</sub> are typically sparse (most pairs do
not match), this step analyzes all matrices and selects the pairs most likely to be actual
correspondences. A pair (x,y) is selected as a candidate correspondence if and only if, in every matrix M<sub>i</sub>,
y is the entity of O′ most similar to x and, conversely, x is the entity of O most similar to y.</p>
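      <p>The selection rule above (a pair is kept only if it is a mutual best match in every committee matrix) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each M<sub>i</sub> is a dense list of rows (entities of O) by columns (entities of O′) and that ties are broken by taking the first maximum; the function names and the toy similarity values are hypothetical.</p>
      <preformat preformat-type="code">
```python
from typing import List, Set, Tuple

# Hypothetical representation: rows are entities of O, columns are entities of O'.
Matrix = List[List[float]]


def mutual_best_pairs(m: Matrix) -> Set[Tuple[int, int]]:
    """Pairs (x, y) such that y is the most similar column for row x and vice versa."""
    rows, cols = range(len(m)), range(len(m[0]))
    # Ties are broken by the first maximum (an assumption, not stated in the text).
    best_y = [max(cols, key=lambda y: m[x][y]) for x in rows]
    best_x = [max(rows, key=lambda x: m[x][y]) for y in cols]
    return {(x, best_y[x]) for x in rows if best_x[best_y[x]] == x}


def select_candidates(committee: List[Matrix]) -> Set[Tuple[int, int]]:
    """Keep only the pairs that are mutual best matches in every committee matrix."""
    candidates = mutual_best_pairs(committee[0])
    for m in committee[1:]:
        candidates.intersection_update(mutual_best_pairs(m))
    return candidates


# Toy values for two hypothetical committee members over 2 x 3 entities.
cosine = [[0.9, 0.1, 0.2], [0.3, 0.8, 0.1]]
wupalmer = [[0.7, 0.2, 0.1], [0.2, 0.4, 0.6]]
print(select_candidates([cosine, wupalmer]))  # {(0, 0)}
```
      </preformat>
      <p>Here each member alone would keep two pairs ({(0,0),(1,1)} and {(0,0),(1,2)}), but only (0,0) survives the intersection, which illustrates how requiring agreement across the whole committee shrinks the set of candidates.</p>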
      <p>
        Select Correspondences for User Feedback. In this step, we apply the farthest-first clustering algorithm [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a naïve yet effective and efficient method, to choose, among the candidate correspondences, those
submitted for user feedback. Each instance to be clustered represents a candidate
correspondence (x,y), and its attributes are the similarity values M<sub>i</sub>[x,y] from each
matrix. The cluster centers, which in farthest-first are themselves candidate correspondences, are selected for user
feedback and stored in a repository.
      </p>
      <p>Collect and Propagate User Feedback. The user gives feedback on each selected pair, either
confirming or rejecting it as a real correspondence, and the repository is updated accordingly.</p>
      <p>Learn the Ontology Alignment and Propagate User Feedback. In this step, a classification
algorithm is trained on the repository of user-labeled correspondences; among the classifiers evaluated, Naive Bayes
achieved the best results. Bayes' rule gives the posterior probability distribution of the class C of a pair
of entities given its attributes (the similarity values). The resulting model classifies the remaining candidate
correspondences by returning the label c that maximizes this posterior probability, thereby propagating the effect
of user feedback to those correspondences, which are then stored in the repository.</p>
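      <p>Steps 2 to 4 can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: it assumes Euclidean distance between similarity vectors for farthest-first and starts the traversal from the first instance; since the Naive Bayes variant is not specified, a Gaussian one is assumed. The function names, the smoothing constant eps, the toy similarity values and the fixed dictionary standing in for the user's answers are all assumptions.</p>
      <preformat preformat-type="code">
```python
import math
from typing import Dict, List, Sequence, Tuple

# One instance per candidate correspondence (x, y): its similarity values M_i[x,y].
Vector = Sequence[float]
# Hypothetical model layout: label -> (prior, per-attribute means, per-attribute variances).
Model = Dict[bool, Tuple[float, List[float], List[float]]]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def farthest_first(points: List[Vector], k: int) -> List[int]:
    """Indices of k cluster centers chosen by farthest-first traversal (assumes k >= 1)."""
    centers = [0]  # assumption: start from the first instance (implementations often pick one at random)
    nearest = [euclidean(p, points[0]) for p in points]  # distance to the closest center so far
    for _ in range(min(k, len(points)) - 1):
        nxt = max(range(len(points)), key=lambda j: nearest[j])  # farthest from all current centers
        centers.append(nxt)
        nearest = [min(d, euclidean(p, points[nxt])) for d, p in zip(nearest, points)]
    return centers


def train_gaussian_nb(xs: List[Vector], ys: List[bool], eps: float = 1e-3) -> Model:
    """Gaussian Naive Bayes: class priors plus per-attribute means and variances."""
    model: Model = {}
    for c in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # eps keeps variances positive when a class has a single labeled example
        variances = [sum((v - mu) ** 2 for v in col) / n + eps
                     for col, mu in zip(zip(*rows), means)]
        model[c] = (n / len(xs), means, variances)
    return model


def predict(model: Model, x: Vector) -> bool:
    """Label c maximizing log P(c) + sum_i log N(x_i; mean_ci, var_ci), the posterior up to a constant."""
    def log_score(c: bool) -> float:
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            for v, mu, var in zip(x, means, variances))
    return max(model, key=log_score)


# Toy candidates with two similarity values each (e.g. Cosine, WuPalmer); values are made up.
candidates = [(0.9, 0.8), (0.85, 0.75), (0.2, 0.3), (0.25, 0.2), (0.5, 0.9), (0.88, 0.82)]
centers = farthest_first(candidates, 3)       # pairs shown to the user
feedback = {0: True, 3: False, 4: False}      # hypothetical user answers
model = train_gaussian_nb([candidates[i] for i in centers], [feedback[i] for i in centers])
rest = [i for i in range(len(candidates)) if i not in centers]
print(centers, {i: predict(model, candidates[i]) for i in rest})
```
      </preformat>
      <p>In this toy run the three centers are instances 0, 3 and 4, and the model trained on their labels classifies the remaining instances 1, 2 and 5 as True, False and True, respectively. Farthest-first needs only O(nk) distance computations for n candidates and k centers, which keeps the selection step cheap.</p>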
      <p>
        We ran an initial experiment with the approach on the OAEI conference dataset.
Reference alignments were used both to validate the results and to simulate user feedback. We considered
only equivalence correspondences between classes. The committee comprised the Cosine [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and
WuPalmer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] similarity measures. We evaluated two values (3 and 6) for the number of clusters, i.e., the number of
user feedback requests. With 3 clusters the approach achieved an average precision of 0.68 and an average
recall of 0.55; with 6 clusters it achieved an average precision of 0.83 and an average
recall of 0.58. Increasing the number of feedback requests thus raised precision by 15 percentage points, and the
average F-measure rose from 0.58 to 0.67. However, the metrics remained the same
(or even decreased) for certain pairs of ontologies, indicating a need to further investigate
the optimal number of clusters for each case.
      </p>
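      <p>The evaluation protocol described above, in which the reference alignment both plays the role of the user and serves as the gold standard, can be sketched as follows. It assumes that alignments are sets of (class of O, class of O′) equivalence pairs; the function names and the class names in the toy example are hypothetical and not taken from the OAEI reference alignments.</p>
      <preformat preformat-type="code">
```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (class of O, class of O')


def simulated_feedback(pair: Pair, reference: Set[Pair]) -> bool:
    """Simulated user: confirms a pair iff it belongs to the reference alignment."""
    return pair in reference


def evaluate(found: Set[Pair], reference: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of an alignment against a reference alignment."""
    correct = len(found.intersection(reference))
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f_measure


# Made-up pairs for illustration only.
reference = {("Paper", "Contribution"), ("Author", "Writer"), ("Review", "Review"), ("Chair", "Chairman")}
found = {("Paper", "Contribution"), ("Author", "Writer"), ("Person", "Reviewer")}
print(simulated_feedback(("Person", "Reviewer"), reference))  # False
print([round(v, 4) for v in evaluate(found, reference)])       # [0.6667, 0.5, 0.5714]
```
      </preformat>
      <p>Averaging per-pair F-measures over several ontology pairs generally gives a different value from the F-measure computed from the averaged precision and recall, so the two figures should not be compared directly.</p>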
    </sec>
    <sec id="sec-2">
      <title>Conclusion</title>
      <p>We introduced an approach for ontology matching with user participation that selects candidate
correspondences based on a committee of similarity measures and uses clustering to choose the correspondences
submitted for user feedback. Promising results were obtained on the OAEI conference dataset. Future work includes
further experiments and the evaluation of other similarity measures and clustering algorithms (including hierarchical approaches).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Gonzalez</surname>
            ,
            <given-names>T. F.</given-names>
          </string-name>
          :
          <article-title>Clustering to minimize the maximum intercluster distance</article-title>
          .
          <source>Theoretical Computer Science</source>
          <volume>38</volume>
          , pp.
          <fpage>293</fpage>
          -
          <lpage>306</lpage>
          (
          <year>1985</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stroe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmonari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Interactive User Feedback in Ontology Matching Using Signature Vectors</article-title>
          . In: Kementsietsidis et al. (eds.)
          <source>ICDE</source>
          . pp.
          <fpage>1321</fpage>
          -
          <lpage>1324</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Verb Semantics and Lexical Selection</article-title>
          .
          <source>Proc. 32nd annual meeting on Association for Computational Linguistics</source>
          . pp.
          <fpage>133</fpage>
          -
          <lpage>138</lpage>
          (
          <year>1994</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kollias</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A String Metric for Ontology Alignment</article-title>
          . In:
          <source>The Semantic Web - ISWC 2005</source>
          . LNCS, vol.
          <volume>3729</volume>
          , pp.
          <fpage>624</fpage>
          -
          <lpage>637</lpage>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>G.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Actively Learning Ontology Matching via User Interaction</article-title>
          . In:
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al. (eds.)
          <source>ISWC</source>
          . pp.
          <fpage>585</fpage>
          -
          <lpage>600</lpage>
          . Springer (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>