<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>User Involvement in Ontology Matching Using an Online Active Learning Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Booma Sowkarthiga Balasubramani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aynaz Taheri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isabel F. Cruz</string-name>
          <email>ifcruz@uic.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADVIS Lab, Department of Computer Science, University of Illinois at Chicago</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose a semi-automatic ontology matching system using a hybrid active learning and online learning approach. Following the active learning paradigm, the mappings whose validation is estimated to lead to greater quality gain are selected for user validation. Following the online learning paradigm, this selection occurs in each iteration. Experimental results demonstrate the effectiveness of our approach.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        After the source and target ontologies are loaded into AgreementMaker, the
following steps are executed in sequence.
      </p>
      <p>
        Automatic matching algorithms execution. The following matchers are
executed individually and their results are stored in the corresponding similarity
matrices: the Advanced Similarity Matcher (ASM) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the Parametric String-based
Matcher (PSM) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the Lexical Similarity Matcher (LSM) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the Vector-based
Multi-word Matcher (VMM) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and the Base Similarity Matcher (BSM) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Linear weighted combination. The Linear Weight Combination (LWC)
matcher [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] linearly combines the similarity matrices of the other five automatic
matchers using weights determined by the local confidence quality metric, which
estimates the quality of the scores produced by each matcher. The new score for
each mapping is stored in the LWC matrix. The selection phase then
outputs only those mappings that belong to the final alignment, taking into account
the desired cardinality of the mappings (e.g., one-to-one) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
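      <p>The linear weighted combination step can be illustrated with a minimal sketch. This is not AgreementMaker's implementation: it assumes one scalar weight per matcher and normalizes the weights so that they sum to 1, and the function name, the two example matrices, and the weight values are hypothetical.</p>
      <preformat><![CDATA[
```python
def linear_weighted_combination(matrices, weights):
    """Combine same-shaped similarity matrices into a single matrix.

    matrices: list of matrices (lists of rows) holding scores in [0, 1].
    weights:  one non-negative weight per matrix. The values used below are
              made up; the paper derives weights from a quality metric.
    """
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per matrix")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for matrix, weight in zip(matrices, weights):
        share = weight / total  # assumption: weights are normalized to sum to 1
        for r in range(rows):
            for c in range(cols):
                combined[r][c] += share * matrix[r][c]
    return combined


# Two hypothetical 2x2 similarity matrices (rows: source, columns: target).
asm = [[0.9, 0.1], [0.2, 0.8]]
psm = [[0.7, 0.3], [0.0, 0.6]]
lwc = linear_weighted_combination([asm, psm], [3.0, 1.0])
```
]]></preformat>
      <p>With these weights the first matcher contributes three quarters of each combined score, so a matcher judged more reliable dominates the LWC matrix without silencing the others.</p>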
      <p>
        Candidate mapping selection. Candidate mappings to be presented to the
users for validation are selected using a combination of the following three criteria:
(1) Disagreement-based Top-k Mapping [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which measures the level of similarity
among the five scores, one for each of the matchers considered. If the matchers
mostly agree on the scores, then the disagreement is low, but it is high when the
matchers disagree on the scores; (2) Cross Count Quality (CCQ), which counts,
for a score, the number of non-zero scores in the row and column of that score
in the LWC matrix [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The count is normalized by the maximum sum of the
scores per column and row in the whole matrix; (3) Similarity Score Definiteness
(SSD), which is a quality metric that ranks mappings in increasing order of their
score [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It evaluates how close the score associated with a mapping is to the
maximum and minimum possible scores (1 and 0).
      </p>
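      <p>Two of the selection criteria above can be sketched as follows. The formulas are assumptions made for illustration, since the text describes the criteria only in words: disagreement is taken to be the population variance of the matcher scores, and definiteness is taken to be twice the distance of a score from 0.5. The mapping names and scores are invented, and CCQ is left out because its normalization is not fully specified here.</p>
      <preformat><![CDATA[
```python
from statistics import pvariance


def disagreement(signature):
    """Population variance of the matcher scores for one mapping (assumed measure)."""
    return pvariance(signature)


def definiteness(score):
    """1.0 when the score is 0 or 1, 0.0 when it is exactly 0.5 (assumed formula)."""
    return 2.0 * abs(score - 0.5)


def top_k_by_disagreement(signatures, k):
    """Return the k mapping ids whose matcher scores disagree the most."""
    ranked = sorted(signatures, key=lambda m: disagreement(signatures[m]), reverse=True)
    return ranked[:k]


# Hypothetical signature vectors: five matcher scores per candidate mapping.
signatures = {
    ("Paper", "Article"): [0.9, 0.1, 0.8, 0.2, 0.5],
    ("Author", "Writer"): [0.6, 0.6, 0.6, 0.6, 0.6],
    ("Review", "Report"): [0.4, 0.5, 0.6, 0.5, 0.5],
}
candidates = top_k_by_disagreement(signatures, 2)
```
]]></preformat>
      <p>In this sketch the mapping on which the matchers fully agree is never selected, which matches the intuition that user effort is best spent where the automatic matchers are least consistent.</p>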
      <p>
        User validation. The result of this step is a label that has value 1 if the mapping
is correct and 0 if the mapping is incorrect. In each iteration, users validate a set
of candidate mappings. The validation of each mapping is called an interaction
by others [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. There can be any number of interactions per iteration, that is,
users can be presented with any number of mappings to validate at a time.
      </p>
      <p>
        Classification. We use a logistic regression classifier, which considers the
parametric distribution P(Y|X), where Y is the discrete-valued user label (1 or 0)
and the feature vector X = ⟨X<sub>1</sub>, …, X<sub>n</sub>⟩ is the signature vector [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] with n scores
computed for a mapping by n individual matchers, and estimates the parameter
that is the vector of weights W = ⟨w<sub>1</sub>, …, w<sub>n</sub>⟩ of the LWC matcher. The logistic
regression model is based on the following probabilities:
      </p>
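      <disp-formula>
        <tex-math><![CDATA[
P(Y = 1 \mid X) = \frac{1}{1 + e^{w_0 + \sum_{i=1}^{n} w_i X_i}}, \qquad
P(Y = 0 \mid X) = \frac{e^{w_0 + \sum_{i=1}^{n} w_i X_i}}{1 + e^{w_0 + \sum_{i=1}^{n} w_i X_i}}
]]></tex-math>
      </disp-formula>
      <p>
        W is updated during the iterative process by taking the partial derivative of the
log likelihood function with respect to each component, w<sub>i</sub>. The recursive rule
for the update is as follows, where η is the learning rate that determines how
fast or slow the weights will converge to their optimal values [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]:
      </p>
      <disp-formula>
        <tex-math><![CDATA[
W \leftarrow W + \eta \sum_{i=1}^{m} X^{i} \left( Y^{i} - g(W^{T} X^{i}) \right)
]]></tex-math>
      </disp-formula>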
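      <p>The weight update can be illustrated with a short sketch of batch gradient ascent for logistic regression. Several points are assumptions rather than statements from the text: g is taken to be the logistic sigmoid, g(W·X) is read as the predicted probability of label 1 (the usual convention, which fixes the sign of the exponent), the sum runs over the m mappings validated so far, and a constant feature 1 is prepended so that the bias w0 is learned together with the other weights. The function names and the toy data are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def g(z):
    """Logistic sigmoid, written to avoid overflow for large |z| (assumed form of g)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def update_weights(weights, examples, labels, learning_rate):
    """One batch step: W <- W + rate * sum_i X^i * (Y^i - g(W . X^i)).

    weights:  list [w0, w1, ..., wn]; w0 is the bias.
    examples: list of signature vectors [X1, ..., Xn].
    labels:   list of user labels, 1 (correct) or 0 (incorrect).
    """
    gradient = [0.0] * len(weights)
    for x, y in zip(examples, labels):
        features = [1.0] + list(x)  # constant 1 pairs with the bias w0
        z = sum(w * f for w, f in zip(weights, features))
        error = y - g(z)
        for j, f in enumerate(features):
            gradient[j] += f * error
    return [w + learning_rate * d for w, d in zip(weights, gradient)]


# Toy data: two matchers, one mapping validated as correct, one as incorrect.
examples = [[0.9, 0.8], [0.2, 0.1]]
labels = [1, 0]
weights = [0.0, 0.0, 0.0]
for _ in range(200):
    weights = update_weights(weights, examples, labels, learning_rate=0.5)
```
]]></preformat>
      <p>Because each step only needs the labels collected so far, the same function can be called again after every round of user validation, which is how the sketch mirrors the iterative setting described above.</p>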
    </sec>
    <sec id="sec-2">
      <title>3 Experimental Evaluation</title>
      <p>We use the 2014 OAEI Conference Track ontology sets and their reference
alignments to simulate the user validation. The baseline is the F-Measure obtained
automatically by the AgreementMaker matchers. Table 1 depicts the average
F-Measure after 20 iterations using the three candidate selection criteria
individually or in combination with one another. The top performer is the
Disagreement-based Top-k Mapping Selection criterion.</p>
      <p>
        Our approach has an average F-Measure gain of 8.6% and an average
F-Measure of 60.4%. This is a considerable improvement, as we started from an
average F-Measure of 51.8%, which was obtained using the automatic matchers
along with LWC. Table 2 compares our results with those obtained by other
systems that participated in the 2014 OAEI Interactive Track. Our approach performs
better than HerTUDA and WeSeE (with F-Measure values of 58.2% and 47.3%,
respectively). The F-Measure gain of AML [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is 7.1% and that of LogMap is 4.6%;
therefore, our approach has the highest F-Measure gain. The table also shows the
relative number of interactions, which is the average number of interactions per
pair of ontologies divided by the size of the reference alignment for that pair.
Our approach shows a larger improvement in F-Measure with fewer
interactions than AML, which has the highest F-Measure.
      </p>
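      <p>For readers unfamiliar with the measures reported here, the following sketch shows how they can be computed. It assumes the balanced F-Measure (the harmonic mean of precision and recall) and reads the relative number of interactions for one ontology pair as the interaction count divided by the reference alignment size. The function names and the toy alignments are hypothetical; only the two percentages in the last line come from the text.</p>
      <preformat><![CDATA[
```python
def f_measure(alignment, reference):
    """Balanced F-Measure of an alignment against a reference alignment."""
    found = set(alignment)
    expected = set(reference)
    correct = len(found & expected)
    if correct == 0:
        return 0.0
    precision = correct / len(found)
    recall = correct / len(expected)
    return 2 * precision * recall / (precision + recall)


def relative_interactions(interactions, reference_size):
    """Interactions for one ontology pair, relative to its reference size."""
    return interactions / reference_size


# Toy example: 2 of 3 proposed mappings are correct, 4 are expected.
reference = {("a", "a2"), ("b", "b2"), ("c", "c2"), ("d", "d2")}
alignment = {("a", "a2"), ("b", "b2"), ("e", "e2")}
score = f_measure(alignment, reference)

# Gain in percentage points, using the averages reported in the text.
gain = round(60.4 - 51.8, 1)
```
]]></preformat>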
      <p>Figure 1 shows the effect of the total number of interactions on the F-Measure
in our approach. Here, the total number of interactions represents the sum of the
number of interactions in each of the 21 reference alignments in the
Conference Track dataset (one for each pair of ontologies) up to 123 interactions.
The Disagreement-based Top-k Mapping Selection performs better than the
other candidate selection strategies. SSD and the combination of SSD+CCQ+Disagreement
have the next highest average F-Measure.</p>
    </sec>
    <sec id="sec-3">
      <title>4 Comparison with Related Work</title>
      <p>We divide previous work into two categories depending on whether feedback
from single or multiple users is considered.</p>
      <p>
        Single user. A previous approach that uses AgreementMaker performs updates in
the LWC matrix based on user feedback [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], but does not use a classifier to adjust
the LWC weights. Another method uses logistic regression to learn an optimal
combination of both lexical and structural similarity metrics [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Compared to
our approach, it uses different similarity metrics, candidate selection strategies,
and techniques to customize weights for different matching strategies. Another
system aggregates similarity measures with the help of self-organizing maps and
incorporates user feedback for refining self-organizing map outcomes [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. There
is also an active learning approach in which the user validation is propagated according
to the ontology structure [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Another approach relies on the
parameterization of matchers [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]: it uses example mappings to automatically determine a
suitable parameter setting for each matcher. In contrast,
in our approach, the LWC uses five of the already existing matchers with the
same configuration as in AgreementMaker.
      </p>
      <p>
        Multiple users. We discuss two approaches. The first one uses a pay-as-you-go
approach and propagates the (possibly faulty) user validation input to
similar mappings [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The second is a multi-user feedback method that
attempts to maximize the benefits that can be drawn from user feedback by
managing it as a first-class citizen [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Neither of these approaches uses a classifier.
      </p>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusions and Future Work</title>
      <p>In this paper, we have proposed an effective semi-automatic ontology matching
approach that combines active learning with online learning. Our experimental
evaluation demonstrates that a considerable improvement in F-Measure can be
achieved over the base case. Clearly, a combination of user feedback with learning
is fertile ground for future research, where the scalability of the methods to large
and very large ontologies and the use of a variety of classifiers and of candidate
selection strategies would be some of the topics to investigate.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research was partially supported by NSF Awards IIS-1143926, IIS-1213013,
and CCF-1331800.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Belhajjame</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paton</surname>
            ,
            <given-names>N.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandes</surname>
            ,
            <given-names>A.A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hedeler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Embury</surname>
            ,
            <given-names>S.M.:</given-names>
          </string-name>
          <article-title>User Feedback as a First Class Citizen in Information Integration Systems</article-title>
          .
          <source>In: CIDR Conference on Innovative Data Systems Research</source>
          . pp.
          <fpage>175</fpage>
          –
          <lpage>183</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loprete</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmonari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stroe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taheri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Pay-As-You-Go Multi-User Feedback Model for Ontology Matching</article-title>
          .
          <source>In: International Conference on Knowledge Engineering and Knowledge Management (EKAW)</source>
          , pp.
          <fpage>80</fpage>
          –
          <lpage>96</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palandri Antonelli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stroe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>AgreementMaker: Efficient Matching for Large Real-World Schemas and Ontologies</article-title>
          .
          <source>PVLDB</source>
          <volume>2</volume>
          (
          <issue>2</issue>
          ),
          <fpage>1586</fpage>
          –
          <lpage>1589</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palandri Antonelli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stroe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Efficient Selection of Mappings and Automatic Quality-driven Combination of Matching Methods</article-title>
          . In: ISWC International Workshop on Ontology Matching (OM).
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>551</volume>
          , pp.
          <fpage>49</fpage>
          –
          <lpage>60</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stroe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caci</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caimi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmonari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palandri Antonelli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keles</surname>
            ,
            <given-names>U.C.</given-names>
          </string-name>
          :
          <article-title>Using AgreementMaker to Align Ontologies for OAEI 2010</article-title>
          . In: ISWC International Workshop on Ontology Matching (OM).
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>689</volume>
          , pp.
          <fpage>118</fpage>
          –
          <lpage>125</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stroe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmonari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Interactive User Feedback in Ontology Matching Using Signature Vectors</article-title>
          .
          <source>In: IEEE International Conference on Data Engineering (ICDE)</source>
          . pp.
          <fpage>1321</fpage>
          –
          <lpage>1324</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dragisic</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eckert</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faria</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Granada</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ivanova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimenez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kempf</surname>
            ,
            <given-names>A.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lambrix</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montanelli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritze</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solimando</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>dos Santos</surname>
            ,
            <given-names>C.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zamazal</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          :
          <article-title>Results of the Ontology Alignment Evaluation Initiative 2014</article-title>
          . In: ISWC International Workshop on Ontology Matching (OM). pp.
          <fpage>61</fpage>
          –
          <lpage>104</lpage>
          . CEUR Workshop Proceedings (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Duan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fokoue</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>One Size Does Not Fit All: Customizing Ontology Alignment Using User Feedback</article-title>
          .
          <source>In: International Semantic Web Conference (ISWC). Lecture Notes in Computer Science</source>
          , vol.
          <volume>6496</volume>
          , pp.
          <fpage>177</fpage>
          –
          <lpage>192</lpage>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Faria</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pesquita</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmonari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Couto</surname>
            ,
            <given-names>F.M.:</given-names>
          </string-name>
          <article-title>The AgreementMakerLight Ontology Matching System</article-title>
          . In:
          <source>International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE)</source>
          . pp.
          <fpage>527</fpage>
          –
          <lpage>541</lpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Halloran</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Classification: Naive Bayes vs Logistic Regression</article-title>
          .
          <source>Tech. rep.</source>
          , University of Hawaii at Manoa EE
          <volume>645</volume>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Jirkovsky</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ichise</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Mapsom: User Involvement in Ontology Matching</article-title>
          .
          <source>In: Joint International Semantic Technology Conference (JIST)</source>
          , pp.
          <fpage>348</fpage>
          –
          <lpage>363</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ritze</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Towards an Automatic Parameterization of Ontology Matching Tools Based on Example Mappings</article-title>
          .
          <source>In: ISWC International Workshop on Ontology Matching (OM)</source>
          . pp.
          <fpage>37</fpage>
          –
          <lpage>48</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Actively Learning Ontology Matching via User Interaction</article-title>
          .
          <source>In: International Semantic Web Conference (ISWC). Lecture Notes in Computer Science</source>
          , vol.
          <volume>5823</volume>
          , pp.
          <fpage>585</fpage>
          –
          <lpage>600</lpage>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>