ALIN Results for OAEI 2018

        Jomar da Silva1 , Kate Revoredo1 , and Fernanda Araujo Baião1

       Graduate Program in Informatics, Department of Applied Informatics
        Federal University of the State of Rio de Janeiro (UNIRIO), Brazil
          {jomar.silva,katerevoredo,fernanda.baiao}@uniriotec.br


       Abstract. ALIN is an ontology matching system specialized in interactive
       ontology matching. Its main characteristic is the use of expert feedback
       to improve the set of mapping suggestions through semantic and structural
       techniques. ALIN obtained the alignment with the highest quality in the
       interactive track for the Conference data set. This paper describes its
       configuration for the OAEI 2018 competition and discusses its results.

       Keywords: ontology matching, Wordnet, interactive ontology match-
       ing, ontology alignment, interactive ontology alignment


1     Presentation of the system

Due to advances in information and communication technologies, a large number
of data repositories have become available. Those repositories, however, are
highly semantically heterogeneous, which hinders their integration. Ontology
matching has been successfully applied to this problem by discovering mappings
between two distinct ontologies which, in turn, conceptually define the data
stored in each repository. Among the various ontology matching approaches in
the literature, interactive ontology matching includes the participation of
domain experts to improve the quality of the final alignment [1]. ALIN is an
interactive ontology matching system and has participated in the OAEI 2016 and
OAEI 2017 evaluations.


1.1   State, purpose, general statement

ALIN performs the interactive ontology matching process in the following steps.
First, ALIN generates an initial set of mappings, called the set of mapping
suggestions; these are the mappings that will receive expert feedback. Then the
interactive phase begins: at each interaction, the expert gives feedback on
some mapping suggestions, and ALIN modifies the set of mapping suggestions
according to that feedback. The modification relies on structural analysis of
the ontologies and on alignment anti-patterns. The interactions continue until
no mapping suggestions are left.

Algorithm 1 ALIN algorithm
Input: Two ontologies to be aligned, similarity metrics
Output: Alignment between the two ontologies
 1: Loading of ontologies
 2: Generation of the initial set of mapping suggestions
 3: Move of mappings by automatic classification from the set of mapping
    suggestions to the alignment
 4: Move of mappings by the low value of semantic similarity from the set of
    mapping suggestions to a backup set
 5: while Set of mapping suggestions is not empty do
 6:    Choose mapping from the set of mapping suggestions to submit to the
       expert
 7:    Receive expert feedback to chosen mapping and remove it from the set of
       mapping suggestions
 8:    if Mapping is accepted then
 9:        Remove mappings in an alignment anti-pattern with accepted mapping
           from the set of mapping suggestions
10:        Insert some data property and object property mappings related to
           the accepted mapping into the set of mapping suggestions
11:        Move some mappings related to the accepted mapping from the backup
           set to the set of mapping suggestions
12:    end if
13: end while


1.2   Specific techniques used

The steps of the ALIN algorithm (Algorithm 1) are the following:

 – Line 1. ALIN loads the ontology classes, object properties, and data
   properties through the Alignment API [2]. For each entity, ALIN stores data
   such as its name and label. For classes, ALIN saves the superclasses and
   disjunctions. ALIN also saves information about the object properties, such
   as their hypernyms and associated classes, and about the data properties,
   such as their associated class. ALIN does not use instances. ALIN can only
   work with ontologies whose entity names are in English.
 – Line 2. For each similarity metric, ALIN finds a set of mappings using a
   simple matching algorithm. The simple matching algorithm treats the matching
   problem as a stable marriage problem with preference lists limited to size
   1 [3], i.e., the algorithm selects a mapping only if the similarity value
   between its two entities is the highest among all the mappings that contain
   at least one of these entities. ALIN uses six metrics and runs the algorithm
   six times, once for each metric, and each execution produces a set of
   mappings. The union of these sets is the initial set of mapping suggestions.
   ALIN uses the linguistic metrics Jaccard, Jaro-Winkler, n-Gram, Resnik,
   Jiang-Conrath, and Lin. The Simmetrics API [4] provides the metrics Jaccard,
   Jaro-Winkler, and n-Gram, and the HESML API [5] provides the metrics Resnik,
   Jiang-Conrath, and Lin. The HESML API uses Wordnet. Because ALIN needs the
   canonical form of the entity names to use Wordnet, ALIN uses the Stanford
   CoreNLP API [6]. ALIN uses the most frequent synsets of words to calculate
   semantic similarities.
 – Line 3. The value of each similarity metric (Resnik, Jiang-Conrath, Lin,
   Jaccard, Jaro-Winkler, and n-Gram) varies from 0 to 1, where 1 is the
   maximum value. When a mapping in the set of mapping suggestions has the
   maximum value for all six metrics, ALIN moves the mapping from the set of
   mapping suggestions to the final alignment.
 – Line 4. ALIN moves the mappings for which one of the linguistic metrics is
   below a given threshold from the set of mapping suggestions to a backup
   set. These mappings can return later, by structural analysis, to the set of
   mapping suggestions. A similar technique appears in [7], with one
   difference: instead of a threshold, it moves the class mappings whose
   entities are not in the same Wordnet synset.
 – Lines 5 to 13. At this point, the interactions with the expert begin. ALIN
   sorts the mappings in the set of mapping suggestions by the sum of their
   similarity metric values, greatest sum first, and submits them to the
   expert in that order. The set of mapping suggestions contains, at first,
   only class mappings. After each expert feedback, if the expert accepts the
   mapping, ALIN moves it from the set of mapping suggestions to the
   alignment; otherwise ALIN removes it from the set of mapping suggestions.
   Depending on the expert feedback, ALIN can also remove other mappings
   (besides those that received feedback) from the set of mapping suggestions
   and can include new mappings into it.
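
As an illustration of Lines 2 to 4, the following Python sketch builds an initial set of mapping suggestions and splits it into alignment, backup set, and remaining suggestions. It is a minimal sketch under stated assumptions, not ALIN's implementation: the two toy metrics (char_jaccard, prefix_ratio) stand in for the six metrics named above, a mapping is selected only when its similarity is strictly the highest (how ALIN treats ties is not stated), pairs with zero similarity are ignored, and the backup rule is applied to every metric. All function names, entity names, and threshold values are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]
Metric = Callable[[str, str], float]


def char_jaccard(a: str, b: str) -> float:
    """Toy metric: Jaccard index over the sets of characters."""
    x, y = set(a.lower()), set(b.lower())
    return len(x & y) / len(x | y) if x | y else 0.0


def prefix_ratio(a: str, b: str) -> float:
    """Toy metric: length of the common prefix over the longer name."""
    a, b = a.lower(), b.lower()
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b)) if a or b else 0.0


def mutual_best_matches(src: Iterable[str], tgt: Iterable[str],
                        sim: Metric) -> Set[Pair]:
    """Select (s, t) only if it beats every other pair that shares s or t."""
    src, tgt = list(src), list(tgt)
    score = {(s, t): sim(s, t) for s in src for t in tgt}
    selected = set()
    for (s, t), value in score.items():
        if value <= 0.0:  # assumption: zero similarity never yields a mapping
            continue
        rivals = [v for (s2, t2), v in score.items()
                  if (s2 == s) != (t2 == t)]  # shares exactly one entity
        if all(value > v for v in rivals):  # assumption: strict maximum
            selected.add((s, t))
    return selected


def initial_suggestions(src, tgt, metrics: Dict[str, Metric]):
    """Line 2: union of the per-metric results, scored under every metric."""
    pairs: Set[Pair] = set()
    for sim in metrics.values():
        pairs |= mutual_best_matches(src, tgt, sim)
    return {p: {name: sim(*p) for name, sim in metrics.items()} for p in pairs}


def split_suggestions(suggestions, threshold: float):
    """Line 3: all metrics at 1.0 -> alignment. Line 4: any low -> backup."""
    alignment, backup, remaining = {}, {}, {}
    for pair, scores in suggestions.items():
        if all(v == 1.0 for v in scores.values()):
            alignment[pair] = scores
        elif any(v < threshold for v in scores.values()):
            backup[pair] = scores
        else:
            remaining[pair] = scores
    return alignment, backup, remaining


SRC = ["Paper", "Author", "Review"]
TGT = ["Paper", "Writer", "Reviewer"]
METRICS = {"char_jaccard": char_jaccard, "prefix_ratio": prefix_ratio}

suggestions = initial_suggestions(SRC, TGT, METRICS)
alignment, backup, remaining = split_suggestions(suggestions, threshold=0.5)
print(sorted(alignment), sorted(backup), sorted(remaining))
```

With these toy inputs, the identical names are classified automatically, while the pair (Review, Reviewer) stays in the set of mapping suggestions; raising the threshold to 0.8 sends it to the backup set instead.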

      At each interaction with the expert:

 – ALIN removes from the set of mapping suggestions all the mappings that are
   in an alignment anti-pattern [8][9] with the accepted mapping;
 – ALIN inserts into the set of mapping suggestions data property mappings
   (as in [10]) and object property mappings related to the accepted class
   mappings;
 – ALIN moves from the backup set to the set of mapping suggestions all
   mappings whose entities are both subclasses of the classes of an accepted
   mapping. [7] shows a similar technique.

      The interaction phase continues until the set of mapping suggestions is empty.
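
The interactive phase (Lines 5 to 13 of Algorithm 1) can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than ALIN's code: the conflicts predicate is a stand-in that treats two mappings sharing exactly one entity as an anti-pattern (the actual anti-patterns are those of [8][9]), the related predicate checks only direct subclasses, the insertion of property mappings (Line 10) is omitted, and the expert is simulated from a hypothetical reference alignment. All names and similarity sums are invented for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]


def interactive_phase(suggestions: Dict[Pair, float],
                      backup: Dict[Pair, float],
                      ask_expert: Callable[[Pair], bool],
                      conflicts: Callable[[Pair, Pair], bool],
                      related: Callable[[Pair, Pair], bool]) -> Set[Pair]:
    """Run the feedback loop until the set of mapping suggestions is empty."""
    suggestions, backup = dict(suggestions), dict(backup)  # work on copies
    alignment: Set[Pair] = set()
    while suggestions:
        # Greatest sum of similarity values first.
        pair = max(suggestions, key=suggestions.get)
        del suggestions[pair]
        if not ask_expert(pair):
            continue  # rejected mappings are simply dropped
        alignment.add(pair)
        # Line 9: drop suggestions that conflict with the accepted mapping.
        for other in [p for p in suggestions if conflicts(pair, p)]:
            del suggestions[other]
        # Line 11: bring back related mappings from the backup set.
        for other in [p for p in backup if related(pair, p)]:
            suggestions[other] = backup.pop(other)
    return alignment


# Hypothetical toy data.
REFERENCE = {("Person", "Person"), ("Paper", "Article"), ("Author", "Writer")}
PARENT_SRC = {"Author": "Person"}  # direct superclass in the first ontology
PARENT_TGT = {"Writer": "Person"}  # direct superclass in the second ontology

asked: List[Pair] = []


def ask_expert(pair: Pair) -> bool:
    asked.append(pair)
    return pair in REFERENCE


def conflicts(accepted: Pair, cand: Pair) -> bool:
    # Stand-in anti-pattern: the candidate reuses exactly one accepted entity.
    return (accepted[0] == cand[0]) != (accepted[1] == cand[1])


def related(accepted: Pair, cand: Pair) -> bool:
    return (PARENT_SRC.get(cand[0]) == accepted[0]
            and PARENT_TGT.get(cand[1]) == accepted[1])


alignment = interactive_phase(
    suggestions={("Person", "Person"): 5.8, ("Person", "Human"): 3.1,
                 ("Paper", "Article"): 4.0},
    backup={("Author", "Writer"): 1.2},
    ask_expert=ask_expert, conflicts=conflicts, related=related)
print(sorted(alignment), len(asked))
```

In this run the mapping (Person, Human) is never shown to the expert, because accepting (Person, Person) removes it, while (Author, Writer) returns from the backup set and is asked last.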


1.3     Link to the system and parameters file

ALIN is available through Google Drive
(https://drive.google.com/file/d/1v6cxQvAuWVqIBzWQUEIzDuZogoW35fdq/view?usp=sharing)
as a package for running through the SEALS client.


2     Results

Interactive ontology matching is the focus of the ALIN system. The quality of
the alignment generated by ALIN depends on correct expert feedback. ALIN has
two phases: the non-interactive phase and the interactive phase. The goal of
the non-interactive phase is to achieve high precision without worrying about
recall. In the interactive phase, ALIN modifies the set of mapping
suggestions, including and removing mappings related to the accepted mapping.
If the expert makes a mistake, ALIN is more prone to deteriorate the set of
mapping suggestions, thereby decreasing the F-measure.
    The system performs better when the number of data and object properties
documented in the ontologies is proportionately large. In the interactive
phase, the system includes into the set of mapping suggestions the mappings
related to accepted class mappings, which allows the recall to increase. When
the number of properties in the ontologies is small, the system still
generates an alignment with good precision, but its recall tends to be lower.
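
For reference, the F-measure discussed here and reported in the tables below is taken to be the balanced harmonic mean of precision and recall, F = 2PR / (P + R); this reading is an assumption, although it reproduces the ALIN rows of Tables 2 and 4. The short Python check below uses the rounded precision and recall values from those rows, so small differences in the last digit are possible for other rows.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ALIN precision and recall copied from Table 2 (Anatomy) and Table 4
# (Conference), both at error rate 0.0.
anatomy = f_measure(0.994, 0.826)
conference = f_measure(0.921, 0.721)
print(round(anatomy, 3), round(conference, 3))  # 0.902 0.809
```

Because the harmonic mean is dominated by the smaller of the two values, a high precision cannot compensate for a low recall, which is why the limited recall dominates ALIN's F-measure.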


2.1   Comments on the participation of ALIN in non-interactive
      tracks

As expected, the participation of ALIN in non-interactive matching tracks
showed high precision and lower recall than the other tools, as can be seen in
the Anatomy track¹ (Table 1). The Conference track results can be seen on the
OAEI 2018² page.


2.2   Comments on the participation of ALIN in interactive
      tracks

Interactive Anatomy Track. In this track, ALIN showed the highest precision
among the four evaluated tools when the error rate is zero (Table 2). When the
error rate increases, both precision and recall fall, and so does the
F-measure (Table 3). ALIN depends on expert feedback to ensure precision and
to increase recall, which explains this decline in quality when the expert
makes mistakes.
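
To make the notion of error rate concrete, the sketch below simulates an expert who answers from a reference alignment and gives the wrong answer with a fixed probability. It is only an illustration: the function make_oracle, its seed parameter, and the toy reference alignment are hypothetical, and the evaluation's actual oracle may behave differently.

```python
import random
from typing import Callable, Set, Tuple

Pair = Tuple[str, str]


def make_oracle(reference: Set[Pair], error_rate: float,
                seed: int = 0) -> Callable[[Pair], bool]:
    """Return a simulated expert that flips its answer with error_rate."""
    rng = random.Random(seed)  # seeded so that runs are repeatable

    def ask(pair: Pair) -> bool:
        truth = pair in reference
        # rng.random() lies in [0, 1), so error_rate 0.0 never flips
        # and error_rate 1.0 always flips.
        return (not truth) if rng.random() < error_rate else truth

    return ask


reference = {("Paper", "Article")}
perfect = make_oracle(reference, error_rate=0.0)
noisy = make_oracle(reference, error_rate=0.1)
answers = [noisy(("Paper", "Article")) for _ in range(1000)]
print(perfect(("Paper", "Article")), answers.count(False))
```

Each wrong acceptance or rejection propagates: a wrongly accepted mapping can remove correct suggestions through the anti-patterns, and a wrongly rejected one keeps related mappings in the backup set.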
    Because the ontologies of the Anatomy track contain almost no properties,
ALIN cannot use some of its interactive techniques, such as the selection of
property mappings related to accepted class mappings. This limited the
increase in recall, which in turn affected the F-measure.

¹ Results for OAEI 2018 - Anatomy track. Available at
  http://oaei.ontologymatching.org/2018/results/anatomy/ (last accessed on
  Oct 2, 2018).
² Results of Evaluation for the Conference track within OAEI 2018. Available at
  http://oaei.ontologymatching.org/2018/results/conference/index.html (last
  accessed on Oct 24, 2018).

        Table 1. Participation of ALIN in Anatomy non-interactive track

                        Tool      Precision Recall F-Measure
                        AML         0.95    0.936     0.943
                    LogMapBio      0.888    0.908     0.898
                    POMAP++        0.919    0.877     0.897
                       XMap        0.929    0.865     0.896
                       LogMap      0.918    0.846     0.88
                      SANOM        0.888    0.844     0.865
                      FCAMapX      0.941    0.791     0.859
                      KEPLER       0.958    0.741     0.836
                        Lily       0.872    0.795     0.832
                    LogMapLite     0.962    0.728     0.828
                      ALOD2Vec     0.996    0.648     0.785
                    StringEquiv    0.997    0.622     0.766
                       DOME        0.997    0.615     0.761
                       ALIN        0.998    0.611     0.758
                    Holontology    0.976    0.294     0.451

   Table 2. Participation of ALIN in Anatomy interactive track - Error rate 0.0

               Tool    Precision Recall F-measure Total Requests
              ALIN       0.994    0.826    0.902          602
              AML        0.964    0.948    0.956          240
             LogMap      0.982    0.846    0.909          388
              XMap       0.929    0.867    0.897           35

   Table 3. Participation of ALIN in Anatomy interactive track - Error rate 0.1

               Tool    Precision Recall F-measure Total Requests
              ALIN       0.914    0.802    0.854          578
              AML        0.952    0.946    0.948          268
             LogMap      0.961    0.832    0.892          388
              XMap       0.929    0.867    0.897           35


Interactive Conference Track. In this track, ALIN stood out, showing the
greatest F-measure among the four tools when the error rate is zero (Table 4),
but with a loss of F-measure when the error rate increases (Table 5).

    Table 4. Participation of ALIN in Conference interactive track - Error rate 0.0

                  Tool   Precision Recall F-measure Total Requests
                 ALIN      0.921    0.721     0.809          276
                 AML       0.912    0.711     0.799          270
                LogMap     0.886    0.61      0.723           82
                 XMap      0.719    0.62      0.666           16

    Table 5. Participation of ALIN in Conference interactive track - Error rate 0.1

                  Tool   Precision Recall F-measure Total Requests
                 ALIN      0.725    0.686     0.705          264
                 AML       0.838    0.698     0.762          277
                LogMap     0.85     0.596      0.7            82
                 XMap      0.719    0.62      0.666           16


  Other results, including results with different error rates, can be seen on
the OAEI 2018³ page.


2.3    Comparison of the participation of ALIN in OAEI 2018 with
       its participation in OAEI 2017

 – One modification made in ALIN was the withdrawal of the additional criteria
   for the automatic classification of mappings. At the beginning of its
   execution, ALIN automatically selects mappings whose entities have the same
   name and puts them into the alignment. In OAEI 2017, ALIN applied additional
   criteria at this step: if a mapping had two entities with the same name but
   met one of those criteria, ALIN did not put it into the alignment. In the
   Conference data set, these criteria increased the precision of the
   alignment, and thus its quality, but also the number of interactions. In the
   Anatomy data set, they increased only the number of interactions. For OAEI
   2018, ALIN focused on reducing its number of interactions, so it no longer
   uses the additional criteria. This modification reduced the number of
   interactions (Total Requests) in both the Anatomy track (Table 6) and the
   Conference track (Table 7), without decreasing the quality (F-measure) on
   the Anatomy track.
 – Another modification was the selection of new mappings for the set of
   mapping suggestions. In OAEI, an interactive matching system can place up
   to three related mappings in one interaction. To take advantage of this
   rule, in 2018 ALIN selects new mappings that have at least one entity in
   common with a mapping already selected and puts them into the set of
   mapping suggestions. This selection increases the likelihood of raising the
   recall. This modification increased the recall on the Anatomy track
   (Table 6), but not enough on the Conference track (Table 7) to compensate
   for the first modification.
 – ALIN has stopped using the WS4J API⁴. ALIN had already stopped using WS4J
   to calculate similarity in OAEI 2017, when it started to use HESML. ALIN
   was only using WS4J to find the most common synset for an entity name, but
   now ALIN accesses the Wordnet files directly.

³ Results for OAEI 2018 - Interactive Track. Available at
  http://oaei.ontologymatching.org/2018/results/interactive/ (last accessed on
  Oct 2, 2018).


Table 6. Participation of ALIN in Anatomy interactive track - OAEI
2016 [11] / 2017 [12] / 2018 - Error rate 0.0

                  Year Precision Recall F-measure Total Requests
                  2016   0.993    0.749        0.854        803
                  2017   0.993    0.794        0.882        939
                  2018   0.994    0.826        0.902        602


Table 7. Participation of ALIN in Conference interactive track - OAEI
2016 [11] / 2017 [12] / 2018 - Error rate 0.0

                  Year Precision Recall F-measure Total Requests
                  2016   0.957    0.735        0.831        326
                  2017   0.957    0.731        0.829        329
                  2018   0.921    0.721        0.809        276


3     General Comments

The results show that the system can be improved in the following directions:

 – handling user error rate;
 – generating a higher quality (especially w.r.t. recall) initial alignment in its
   non-interactive phase;
 – reducing the number of interactions with the expert.

⁴ WS4J. Available at https://github.com/Sciss/ws4j (last accessed on
  Jan 16, 2018).

3.1   Conclusions

The ALIN system stands out in the interactive ontology matching process when
the ontologies have certain characteristics, such as many documented
properties, and when the expert does not make mistakes.
   The second author was partially funded by project PQ-UNIRIO N01/2017
("Aprendendo, adaptando e alinhando ontologias: metodologias e algoritmos.")
and CAPES/PROAP.


References
 1. Paulheim, H., Hertling, S., Ritze, D.: Towards Evaluating Interactive Ontology
    Matching Tools. Lecture Notes in Computer Science 7882 (2013) 31–45
 2. David, J., Euzenat, J., Scharffe, F., Trojahn dos Santos, C.: The Alignment API
    4.0. Semant. Web 2(1) (January 2011) 3–10
 3. Irving, R.W., Manlove, D.F., O’Malley, G.: Stable marriage with ties and bounded
    length preference lists. Journal of Discrete Algorithms 7(2) (2009) 213 – 219
    Selected papers from the 2nd Algorithms and Complexity in Durham Workshop
    ACiD 2006.
 4. Surhone, L.M., Timpledon, M.T., Marseken, S.F.: SimMetrics. VDM Publishing
    (2010)
 5. Lastra-Díaz, J.J., García-Serrano, A., Batet, M., Fernández, M., Chirigati, F.:
    HESML. Inf. Syst. 66(C) (June 2017) 97–118
 6. Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.:
    The Stanford CoreNLP natural language processing toolkit. In: Association for
    Computational Linguistics (ACL) System Demonstrations. (2014) 55–60
 7. Silva, J., Baião, F., Revoredo, K., Euzenat, J.: Semantic interactive ontology
    matching: Synergistic combination of techniques to improve the set of candidate
    correspondences. In: OM-2017: Proceedings of the Twelfth International Workshop
    on Ontology Matching. Volume 2032. (2017) 13–24
 8. Guedes, A., Baião, F., Shivaprabhu, Revoredo, R.: On the Identification and Rep-
    resentation of Ontology Correspondence Antipatterns. In: Proc. 5th Int. Conf.
    Ontol. Semant. Web Patterns (WOP14), CEUR Work. Proc. (2014)
 9. Guedes, A., Baião, F., Revoredo, K.: Digging Ontology Correspondence Antipat-
    terns. In: Proceeding WOP14 Proc. 5th Int. Conf. Ontol. Semant. Web Patterns.
    Volume 1032. (2014) 38–48
10. Silva, J., Revoredo, K., Baião, F.A., Euzenat, J.: Interactive Ontology Matching:
    Using Expert Feedback to Select Attribute Mappings. (2018)
11. Silva, J., Baião, F., Revoredo, K.: ALIN results for OAEI 2016. In: OM-2016:
    Proceedings of the Eleventh International Workshop on Ontology Matching.
    OM'16 (2016) 130–137
12. Silva, J., Baião, F., Revoredo, K.: ALIN results for OAEI 2017. In: OM-2017:
    Proceedings of the Twelfth International Workshop on Ontology Matching.
    OM'17 (2017) 114–121