FCAMapX results for OAEI 2018

                    Guowei Chen (1,2) and Songmao Zhang (2)

                 (1) University of Chinese Academy of Sciences
   (2) Institute of Mathematics, Academy of Mathematics and Systems Science,
                 Chinese Academy of Sciences, Beijing, P.R. China
                                     gwch@amss.ac.cn
                                   smzhang@math.ac.cn


          Abstract. FCAMapX is an automated ontology matching system based
          on Formal Concept Analysis, a mathematical model for analyzing
          individuals and structuring concepts. FCAMapX participated
          successfully in three tracks of OAEI 2018: Conference, Anatomy, and
          Large Biomedical Ontologies. Building on FCA-Map, our OAEI 2016
          submission, which failed to complete some large tasks within the
          designated time, FCAMapX pursues improvements in efficiency and
          precision. Concretely, we optimize the data structures to save
          memory space and implement a more efficient algorithm for computing
          formal concept lattices. To favor precision, we tighten the condition
          for identifying lexical mappings and strengthen the structural
          validation to retrieve negative evidence for matches identified both
          lexically and structurally. As a result, the running time for every
          task is now less than an hour in our experimental setting, and in a
          majority of the cases the precision and F-measure are both improved
          while the recall is lowered. Additionally, in comparison with other
          OAEI participants, FCAMapX has achieved the best or the second best
          F-measure and recall in most large biomedical ontology matching
          tasks.


1     Presentation of the system

Based on our 2016 OAEI participant system FCA-Map [1,3], this edition, called
FCAMapX, aims to improve efficiency and precision.


1.1       State, purpose, general statement

In OAEI 2016, we submitted FCA-Map, a novel system based on Formal Con-
cept Analysis to identify and validate mappings across ontologies, including one-
to-one mappings and complex mappings. FCA-Map incrementally generates a
total of three types of formal contexts and extracts mappings from the lattices
derived. First, the token-based formal context describes how class names, labels,
and synonyms share lexical tokens, leading to lexical mappings (anchors) across
ontologies. Second, the relation-based formal context describes how classes are
in taxonomic, partonomic and disjoint relationships with the anchors, leading
to positive and negative structural evidence for validating the lexical match-
ing. Third, the positive relation-based context can be used to discover structural
mappings. The 2016 OAEI evaluation on the Anatomy, Large Biomedical
Ontologies, and Disease and Phenotype tracks demonstrated the effectiveness of
FCA-Map and its competitiveness with the top-ranked systems. For SNOMED-NCI
(whole), the largest ontology matching task in OAEI, FCA-Map ranked first
for recall and second for F-measure; it ranked second in F-measure for both
FMA-NCI and FMA-SNOMED, and obtained the best F-measures for most Disease
and Phenotype tasks [3]. On the other hand, FCA-Map suffered from long running
times due to the high complexity of deriving the formal concept lattice in the
Formal Concept Analysis formalism, which is a PSPACE-complete problem.
Moreover, the precision of FCA-Map was relatively poorer than its recall and
F-measure. We intend to address these two issues in the 2018 edition,
FCAMapX.
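
To make the token-based step concrete, the following is a minimal sketch of a token-based formal context and a brute-force enumeration of its formal concepts. It is not the FCA-Map or FCAMapX implementation: the class labels, the stopword list, the `O1:`/`O2:` prefixes, and the rule that an anchor requires identical token sets are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy classes from two ontologies (labels invented for illustration).
LABELS = {
    "O1:heart_valve": "heart valve",
    "O1:lung": "lung",
    "O2:valve_of_heart": "valve of heart",
    "O2:left_lung": "left lung",
}
STOPWORDS = {"of", "the"}  # assumed stopword list


def tokens(label):
    """Lower-case a label and split it into a set of non-stopword tokens."""
    return frozenset(t for t in label.lower().split() if t not in STOPWORDS)


# Formal context: objects are classes, attributes are the tokens they contain.
CONTEXT = {cls: tokens(label) for cls, label in LABELS.items()}
ALL_ATTRIBUTES = frozenset().union(*CONTEXT.values())


def intent(objects):
    """Attributes shared by all given objects (all attributes for no objects)."""
    result = ALL_ATTRIBUTES
    for obj in objects:
        result = result & CONTEXT[obj]
    return result


def extent(attrs):
    """Objects that carry every attribute in attrs."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Exponential brute force, suitable only for toy contexts.
    """
    concepts = set()
    objs = list(CONTEXT)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            shared = intent(subset)
            concepts.add((extent(shared), shared))
    return concepts


def exact_anchors():
    """Cross-ontology pairs inside a concept extent whose token sets coincide."""
    pairs = set()
    for ext, _ in formal_concepts():
        for a, b in combinations(sorted(ext), 2):
            if a[:2] != b[:2] and CONTEXT[a] == CONTEXT[b]:
                pairs.add((a, b))
    return pairs
```

In this toy context, `exact_anchors()` pairs `O1:heart_valve` with `O2:valve_of_heart`, while `O1:lung` and `O2:left_lung` merely share a concept with intent `{lung}` and are not proposed as an anchor under the assumed exact-match rule.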


1.2   Specific techniques used

To improve efficiency, we optimize the data structures to save memory space
and implement a more efficient algorithm, Hermes [4], for computing the formal
concept lattice and the Galois sub-hierarchy. The Hermes algorithm has a
running time of $O(\min\{nm, n^{\alpha}\})$, where $n$ is the number of objects
or attributes, $m$ the size of the formal context, and $n^{\alpha}$ the time
required to perform matrix multiplication (currently $\alpha = 2.376$). To
improve precision, we tighten the condition for identifying lexical mappings
from the token-based lattice computed in the first step. Moreover, the second
step (structural validation) and the third step (structural mapping) are
swapped, so that positive and negative evidence can be retrieved for all
mappings identified, both lexically and structurally. This favors precision,
as mappings with negative evidence are discarded.
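
As an illustration of what a Galois sub-hierarchy contains, the sketch below computes it naively for a small context: it keeps only the concepts that introduce an object or an attribute. This is a brute-force illustration and not the Hermes algorithm; the function name and the toy context are assumptions.

```python
def galois_subhierarchy(context):
    """Return the object concepts and attribute concepts of a formal context.

    `context` maps each object to the set of attributes it has. Every
    concept is returned as an (extent, intent) pair of frozensets.
    """
    objects = set(context)
    attributes = set().union(*context.values())

    def extent(attrs):
        # Objects that have all attributes in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    def intent(objs):
        # Attributes shared by all objects in objs.
        shared = set(attributes)
        for o in objs:
            shared &= context[o]
        return frozenset(shared)

    concepts = set()
    for o in objects:
        # Object concept: the smallest concept whose extent contains o.
        own = frozenset(context[o])
        concepts.add((extent(own), own))
    for a in attributes:
        # Attribute concept: the largest concept whose intent contains a.
        holders = extent({a})
        concepts.add((holders, intent(holders)))
    return concepts


# Hypothetical toy context: three objects, three attributes.
TOY_CONTEXT = {
    "x1": {"a", "b"},
    "x2": {"b", "c"},
    "x3": {"a", "c"},
}
TOY_GSH = galois_subhierarchy(TOY_CONTEXT)
```

For this toy context the full concept lattice has eight concepts, whereas the sub-hierarchy has six: the top concept (all objects, no attributes) and the bottom concept (no objects, all attributes) introduce nothing and are dropped.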


1.3   Adaptations made for the evaluation

As in our previous edition, our SEALS submission included precomputed word
variants derived from the UMLS [5] for mapping biomedical ontologies.
Moreover, to improve the performance of FCAMapX on ontologies in
general-purpose domains, such as those of the Conference track, we used the
synsets of WordNet [6] in the first step to identify synonymous terms. Property
names in the Conference ontologies are also taken into account when constructing
the token-based formal context for lexical mapping.
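
The effect of synonym expansion on token sets can be sketched as follows. This is only an illustration under stated assumptions: `TOY_SYNSETS` is a hand-made stand-in for WordNet synsets, the identifier-splitting rule is a guess at a reasonable tokenizer, and the example names are invented rather than taken from the Conference ontologies.

```python
import re

# Hypothetical stand-in for WordNet: each word maps to an invented synset id.
TOY_SYNSETS = {
    "paper": "syn:article",
    "article": "syn:article",
    "author": "syn:writer",
    "writer": "syn:writer",
}


def split_name(name):
    """Split a camelCase or underscore-separated name into lower-case tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def normalized_tokens(name):
    """Replace each token by its synset id when one is known."""
    return frozenset(TOY_SYNSETS.get(t, t) for t in split_name(name))
```

With this normalization, the invented names `PaperAuthor` and `Article_Writer` yield the same token set, so they would share all attributes in a token-based formal context, whereas their raw tokens have nothing in common.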


1.4   Link to the system and parameters file

The SEALS-wrapped version of FCAMapX for OAEI 2018 is available at
https://drive.google.com/open?id=1-0upxrcPbu5OVJAJn-DtTOUMOh3QDriM.

1.5   Link to the set of provided alignments
The results obtained by FCAMapX for OAEI 2018 are available at
https://drive.google.com/open?id=1DzRD_90O3YwoGpW5FJL9vSy_f1Ia0YZo.


2     Results
In this section, we present our evaluation results obtained by running FCAMapX
over the Anatomy, Conference, and Large Biomedical Ontologies tracks. Tests
were performed on a desktop computer with 16 GB of RAM and an Intel(R)
Core(TM) i7-8700 CPU @ 3.20 GHz.

2.1   The OAEI 2018 Anatomy Track
The Anatomy track consists of the Adult Mouse Anatomy (2744 classes) and a
fragment of the NCI Thesaurus (3304 classes) describing the human anatomy.
Compared with our 2016 version, FCAMapX has improved the precision from
0.932 to 0.941, whereas the recall has decreased from 0.837 to 0.791, leading to
a drop of the F-measure from 0.882 to 0.860 (see Table 1).


                       Table 1. Results for Anatomy track

                  Task     Precision  Recall  F-Measure  Runtime (s)
                  MA-NCI     0.941    0.791     0.860      11.811
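
The F-measure values quoted for this track follow from the precision and recall as their harmonic mean. The short check below is a sketch; the function name `f_measure` is ours, and the inputs are the figures reported above.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the Anatomy track.
f_2018 = round(f_measure(0.941, 0.791), 3)  # FCAMapX (2018)
f_2016 = round(f_measure(0.932, 0.837), 3)  # FCA-Map (2016)
print(f_2018, f_2016)
```

Rounded to three decimals, the two results are 0.860 and 0.882, matching the reported F-measures: the gain in precision does not compensate for the larger loss in recall.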


2.2   The OAEI 2018 Conference Track
The Conference 2018 track contains 16 ontologies describing the domain of
conference organization. These ontologies are small, with limited classes
and semantic relations, for which our approach can be ineffective, as analyzed
in [3]. In this edition, we add WordNet as an external knowledge source, and the
results are listed in Table 2. Taking advantage of the additional synonyms
defined in WordNet for general-purpose domains, FCAMapX has increased the
average recall from 0.52 to 0.582 and the average F-measure from 0.61 to 0.62,
while the precision drops from 0.75 to 0.698.
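
The average precision and recall quoted above can be cross-checked against the per-task values in Table 2. The sketch below takes the unweighted mean over the 21 tasks; that the reported averages are unweighted means is our assumption, and the F-measure average is not recomputed because the text does not say how it is aggregated.

```python
# (precision, recall) per Conference task, in the row order of Table 2.
CONFERENCE_RESULTS = [
    (0.563, 0.600), (0.667, 0.375), (0.615, 0.615), (0.556, 0.455),
    (0.500, 1.000), (0.750, 0.750), (0.818, 0.600), (0.600, 0.529),
    (0.619, 0.520), (0.364, 0.286), (0.750, 0.600), (0.846, 0.579),
    (0.857, 0.600), (0.857, 0.667), (1.000, 0.571), (0.647, 0.478),
    (0.727, 0.421), (0.875, 0.467), (0.462, 0.600), (0.778, 0.636),
    (0.813, 0.867),
]


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


avg_precision = mean(p for p, _ in CONFERENCE_RESULTS)
avg_recall = mean(r for _, r in CONFERENCE_RESULTS)
print(f"precision {avg_precision:.3f}, recall {avg_recall:.3f}")
```

Under this assumption the means come out as 0.698 and 0.582, in agreement with the averages stated in the text.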

                      Table 2. Results for Conference track

              Task               Precision  Recall  F-Measure  Runtime (s)
              cmt-conference       0.563    0.600     0.581      1.194
              cmt-confOf           0.667    0.375     0.480      0.291
              cmt-edas             0.615    0.615     0.615      0.391
              cmt-ekaw             0.556    0.455     0.500      0.254
              cmt-iasted           0.500    1.000     0.667      0.546
              cmt-sigkdd           0.750    0.750     0.750      0.222
              conference-confOf    0.818    0.600     0.692      0.243
              conference-edas      0.600    0.529     0.562      0.355
              conference-ekaw      0.619    0.520     0.565      0.273
              conference-iasted    0.364    0.286     0.320      0.466
              conference-sigkdd    0.750    0.600     0.667      0.223
              confOf-edas          0.846    0.579     0.687      0.304
              confOf-ekaw          0.857    0.600     0.706      0.24
              confOf-iasted        0.857    0.667     0.750      0.403
              confOf-sigkdd        1.000    0.571     0.727      0.193
              edas-ekaw            0.647    0.478     0.550      0.343
              edas-iasted          0.727    0.421     0.533      0.48
              edas-sigkdd          0.875    0.467     0.609      0.293
              ekaw-iasted          0.462    0.600     0.522      0.467
              ekaw-sigkdd          0.778    0.636     0.700      0.235
              iasted-sigkdd        0.813    0.867     0.839      0.534


2.3   The OAEI 2018 Large Biomedical Ontologies Track
This track consists of finding alignments between the Foundational Model of
Anatomy (FMA), SNOMED CT, and the National Cancer Institute Thesaurus
(NCI). These ontologies are both large-scale and semantically rich. The results
obtained by FCAMapX are listed in Table 3. In all tasks except FMA-NCI (small),
FCAMapX has increased both the precision and the F-measure compared with our
2016 system, while the recall values are lowered. Taking FMA-SNOMED (whole)
as an example, the precision is 1.8 times that of the 2016 version and the
F-measure 1.4 times. More importantly, in our own experimental setting, FCAMapX
finished all tasks in the Large Biomedical track within the 2 hours required by
OAEI 2016, whereas our 2016 system failed the three whole tasks. For the
largest task, SNOMED-NCI (whole), our previous version ran for about 13 hours
as reported in [3]; with FCAMapX, the time has been reduced to 0.95 hours.


                   Table 3. Results for Large Biomedical track

            Task                 Precision  Recall  F-Measure  Runtime (s)
            FMA-NCI (small)        0.948    0.911     0.929       73.692
            FMA-NCI (whole)        0.665    0.841     0.743     1171.62
            FMA-SNOMED (small)     0.955    0.815     0.879      125.791
            FMA-SNOMED (whole)     0.819    0.762     0.789     2179.924
            SNOMED-NCI (small)     0.878    0.703     0.781     1039.138
            SNOMED-NCI (whole)     0.796    0.680     0.733     3418.672
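
The running times in Table 3 are given in seconds, while the text discusses them in hours. The conversion can be checked with the small sketch below; the dictionary simply restates the table, and the variable names are ours.

```python
# Runtimes in seconds, copied from Table 3.
RUNTIME_SECONDS = {
    "FMA-NCI (small)": 73.692,
    "FMA-NCI (whole)": 1171.62,
    "FMA-SNOMED (small)": 125.791,
    "FMA-SNOMED (whole)": 2179.924,
    "SNOMED-NCI (small)": 1039.138,
    "SNOMED-NCI (whole)": 3418.672,
}


def to_hours(seconds):
    """Convert a duration from seconds to hours."""
    return seconds / 3600.0


slowest_task = max(RUNTIME_SECONDS, key=RUNTIME_SECONDS.get)
slowest_hours = to_hours(RUNTIME_SECONDS[slowest_task])
print(f"{slowest_task}: {slowest_hours:.2f} h")
```

The slowest task is SNOMED-NCI (whole) at about 0.95 hours, so every task in this track stays below one hour in the reported setting.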

    As reported by OAEI (see footnote 3), out of the six tasks in the track,
FCAMapX ranks first for three tasks and second for two in terms of recall; in
terms of F-measure, it ranks first for two tasks and second for three.


3     General comments
This is the second time that we have participated in the OAEI campaign with our
Formal Concept Analysis based systems. The main goal was to improve efficiency
with respect to our 2016 edition, which failed to finish within the designated
time for three tasks in the Large Biomedical Ontologies track. FCAMapX has
accomplished this. At the same time, strengthening the structural validation
of mappings has yielded higher precision, which can lead to better F-measure
values.

3.1    Comments on the results
FCAMapX participated successfully in three tracks this year: Conference,
Anatomy, and Large Biomedical Ontologies. The running time for every task is
now less than an hour in our experimental setting. In a majority of the cases,
the precision and F-measure are both improved while the recall is lowered. The
fact that FCAMapX performs unsatisfactorily on FMA-NCI (small) in comparison
with our 2016 system deserves further explanation.

3.2    Discussions on the way to improve the proposed system
We intended to run FCAMapX on the Disease and Phenotype track, where our
2016 system performed competitively [1,3]. The results in our own setting
against the consensus alignments with vote 3 are listed in Table 4, where the
matching tasks involve the Human Phenotype (HP) Ontology, the Mammalian
Phenotype (MP) Ontology, the Human Disease Ontology (DOID), and the
Orphanet and Rare Diseases Ontology (ORDO). Note that these results cannot be
compared with those of our 2016 system, as the versions and sources of the four
ontologies differ from the ones used in 2016 (see footnote 4).
   Unfortunately, FCAMapX failed this track with errors in the official OAEI
evaluation. This indicates that the quality of the system needs to be improved.


                  Table 4. Results for Disease and Phenotype track

                   Task       Precision  Recall  F-Measure  Runtime (s)
                   HP-MP        0.848    0.760     0.802     2368.376
                   DOID-ORDO    0.869    0.729     0.793      450.134


3 http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/2018/results/
4 http://oaei.ontologymatching.org/2018/phenotype/

3.3   Comments on the OAEI procedure

From our experience of participating this year, we find that OAEI is organized
efficiently and that the organizers are helpful. The tracks have different
levels of difficulty, which is both challenging and appealing, and the SEALS
platform is very convenient to use.


4     Conclusions

In this paper, we present FCAMapX as an improved version of our 2016 OAEI
system FCA-Map. The main improvement lies in efficiency, as illustrated
by the dramatic drop in running times, for instance from about 13 hours to
under 1 hour for the largest OAEI task. The second improvement is in mapping
precision, which in most cases raises the F-measure. Compared with other OAEI
participants, FCAMapX has achieved the best or the second best F-measure and
recall in five out of the six large biomedical ontology matching tasks. Despite
this progress, our system still has a long way to go in terms of covering all
OAEI tracks, especially the instance matching tasks, for which the Formal
Concept Analysis formalism has the potential to prevail thanks to its
capability of clustering commonalities among individuals.


Acknowledgements

This work has been supported by the National Key Research and Development
Program of China under grant 2016YFB1000902, and the Natural Science Foun-
dation of China grant 61621003.


References
 1. Zhao, M., & Zhang, S. (2016, December). FCA-Map results for OAEI 2016. In
    OM@ ISWC (pp. 172-177).
 2. Zhao, M., & Zhang, S. (2016, December). Identifying and validating ontology map-
    pings by formal concept analysis. In OM@ ISWC (pp. 61-72).
 3. Zhao, M., Zhang, S., Li, W., & Chen, G. (2018). Matching biomedical ontologies
    based on formal concept analysis. Journal of biomedical semantics, 9(1), 11.
 4. Berry, A., Huchard, M., Napoli, A., & Sigayret, A. (2012, October). Hermes: an
    efficient algorithm for building Galois sub-hierarchies. In CLA: Concept Lattices
    and their Applications (pp. 21-32). Universidad de Malaga.
 5. Lindberg DA, Humphreys BL, McCray AT, et al. The unified medical language
    system. IMIA Yearbook. 1993;32(4):281-91.
 6. G. A. Miller. WordNet: A Lexical Database for English. Communications of the
    ACM, 38(11):39-41, 1995.
 7. de Souza, K.X.S., Davis, J.: Aligning ontologies and evaluating concept similari-
    ties. In: OTM Confederated International Conferences On the Move to Meaningful
    Internet Systems, Springer (2004) 1012-1029
 8. Guan-yu, L., Shu-peng, L., et al.: Formal concept analysis based ontology merging
    method. In: Computer Science and Information Technology (ICCSIT), 2010 3rd
    IEEE International Conference on. Volume 8., IEEE (2010) 279-282
 9. Obitko, M., Snášel, V., Smid, J.: Ontology design with formal concept analysis.
    CLA 128(3) (2004) 1377-1390
10. Stumme, G., Maedche, A.: Fca-merge: Bottom-up merging of ontologies. In: IJCAI.
    Volume 1. (2001) 225-230
11. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of con-
    cepts. In: Ordered sets. Springer (1982) 445-470
12. Xu, X., Wu, Y., Chen, J.: Fuzzy fca based ontology mapping. In: 2010 First In-
    ternational Conference on Networking and Distributed Computing, IEEE (2010)
    181-185
13. Zhang, S., Bodenreider, O.: Experience in aligning anatomical ontologies. Interna-
    tional journal on Semantic Web and information systems 3(2) (2007) 1