FOAM – Framework for Ontology Alignment and Mapping
  Results of the Ontology Alignment Evaluation Initiative
Marc Ehrig, York Sure
Institute AIFB, University of Karlsruhe
76128 Karlsruhe, Germany
ehrig@aifb.uni-karlsruhe.de, sure@aifb.uni-karlsruhe.de


ABSTRACT
This paper briefly introduces the FOAM system and its underlying
techniques. We then discuss the results returned from the evaluation,
which were both promising and clarifying. In short: labels are very
important; structure helps in cases where labels do not work;
dictionaries may provide additional evidence; and ontology management
systems need to deal with OWL-Full. The results should also be of
interest to other participants, as they show specific strengths and
weaknesses of our approach.

1. PRESENTATION OF THE SYSTEM

1.1 State, purpose, general statement
In recent years, we have seen a range of research work on methods
proposing alignments [1; 2]. When we tried to apply these methods to
some of the real-world scenarios we address in other research
contributions [3], we found that existing alignment methods did not
meet the given requirements:
     •    high quality results;
     •    efficiency;
     •    optional user-interaction;
     •    flexibility with respect to use cases;
     •    and easy adjustment and parameterization.
We wanted to provide the end-user with a tool that meets these
requirements, taking ontologies as input and returning alignments
(with explanations) as output.

1.2 Specific techniques used
We have observed that alignment methods like QOM [4] or PROMPT [2] may
be mapped onto a generic alignment process (Figure 1). Here we will
only mention the six major steps to clarify the underlying approach of
the FOAM tool. We refer to [4] for a detailed description.

1.   Feature Engineering, i.e. select excerpts of the overall ontology
     definition to describe a specific entity. This includes
     individual features, e.g. labels, structural features, e.g.
     subsumption, but also more complex features as used in OWL, e.g.
     restrictions.
2.   Search Step Selection, i.e. choose two entities from the two
     ontologies to compare (e1,e2).
3.   Similarity Assessment, i.e. indicate a similarity for a given
     description (feature) of two entities (e.g.,
     sim_superConcept(e1,e2) = 1.0).
4.   Similarity Aggregation, i.e. aggregate the multiple similarity
     assessments for one pair of entities into a single measure.
5.   Interpretation, i.e. use all aggregated numbers, a threshold and
     an interpretation strategy to propose the alignment
     (align(e1) = e2). This may also include a user validation.
6.   Iteration, i.e. because the similarity of one alignment
     influences the similarity of neighboring entity pairs, the
     equality is propagated through the ontologies.
The final output is a set of alignments linking the two ontologies.

This general process was extended to meet the requirements mentioned
above.
•    High quality results were achieved through a combination of a
     rule-based approach and a machine learning approach. Underlying
     individual rules, such as "if the super-concepts are similar, the
     entities are similar", have been assigned weights by a
     machine-learnt decision tree [5]. Steps 1, 3 and 4 in particular
     were adjusted for this. Currently, our approach does not make use
     of additional background knowledge such as dictionaries here.
•    Efficiency was mainly achieved through an intelligent selection
     of candidate alignments in step 2, the search step selection [4].
•    User-interaction allows the user to intervene during the
     interpretation step. By presenting the doubtful alignments (and
     only these) to the user, overall quality can be considerably
     increased, while keeping the intervention minimally invasive.
•    The system can automatically set its parameters according to a
     list of given use cases, such as ontology merging, versioning,
     ontology mapping, etc. The parameters also change according to
     the ontologies to align, e.g., big ontologies always require the
     efficient approach, whereas smaller ones do not [6].
•    All these parameters may also be set manually. This allows using
     the implementation for very specific tasks as well.
•    Finally, FOAM has been implemented in Java and is freely
     available, and thus extensible.

1.3 Adaptations made for the contest
No special adjustments have been made for the contest. However, some
elements have been deactivated. Due to the small size of the benchmark
and directory ontologies, the efficiency techniques were not used,
user-interaction was removed for the initiative, and no specific use
case parameters were applied. A general alignment procedure was used.

The system used for the evaluation is a derivative of the ontology
alignment tool used in last year's contests I3Con [7] and EON-OAC [8].

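As an illustration of step 3 (Similarity Assessment) applied to label features, the following is a minimal sketch and not FOAM's actual implementation. It assumes a normalized edit distance as the string measure, which the paper does not specify; the helper names `label_similarity`, `split_terms` and `term_similarity` are hypothetical. It also shows why comparing whole strings penalizes labels such as "hasISBN" versus "ISBN" (discussed for test 301 in Section 2.1.5), and how splitting labels into terms would change the score.

```python
import re


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def label_similarity(a, b):
    """Whole-string similarity in [0, 1]; labels are NOT split into terms."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def split_terms(label):
    """Hypothetical helper: split a camelCase label into lower-case terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", label)
    return [p.lower() for p in parts]


def term_similarity(a, b):
    """Average, over the terms of the shorter label, of the best term match."""
    ta, tb = split_terms(a), split_terms(b)
    if not ta or not tb:
        return 0.0
    short, long_ = (ta, tb) if len(ta) <= len(tb) else (tb, ta)
    best = [max(label_similarity(s, t) for t in long_) for s in short]
    return sum(best) / len(best)


if __name__ == "__main__":
    print(round(label_similarity("hasISBN", "ISBN"), 2))  # 0.57
    print(term_similarity("hasISBN", "ISBN"))             # 1.0
```

Under these assumptions the whole-string score for "hasISBN" and "ISBN" is only about 0.57, whereas the term-based variant recognizes the shared term "isbn". The term-based measure is one possible extension, not something the evaluated system did.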

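Steps 4 and 5 of the process (Similarity Aggregation and Interpretation) can be sketched in the same hedged way. The sketch below assumes a weighted average for aggregation and a "best match above threshold" interpretation strategy; the feature names, weights and threshold are purely illustrative and are not the learnt values used by FOAM. Step 6 (Iteration) is not modeled.

```python
def aggregate(similarities, weights):
    """Step 4: weighted average of the per-feature similarities of one pair."""
    total_weight = sum(weights[f] for f in similarities)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights[f] * s for f, s in similarities.items())
    return weighted / total_weight


def interpret(aggregated, threshold):
    """Step 5: for each e1 keep the best-scoring e2 at or above the threshold."""
    best = {}
    for (e1, e2), score in aggregated.items():
        if score >= threshold and score > best.get(e1, (None, -1.0))[1]:
            best[e1] = (e2, score)
    return {e1: e2 for e1, (e2, _) in best.items()}


if __name__ == "__main__":
    # Illustrative weights only; FOAM derives its weights from a decision tree [5].
    weights = {"label": 0.6, "superConcept": 0.3, "instances": 0.1}
    # Hypothetical per-feature similarities for three candidate pairs.
    candidates = {
        ("Book", "Book"): {"label": 1.0, "superConcept": 1.0, "instances": 0.5},
        ("Book", "Booklet"): {"label": 0.57, "superConcept": 1.0, "instances": 0.0},
        ("Article", "Paper"): {"label": 0.0, "superConcept": 1.0, "instances": 0.5},
    }
    aggregated = {pair: aggregate(sims, weights) for pair, sims in candidates.items()}
    print(interpret(aggregated, threshold=0.5))
```

In this toy setting only Book is aligned (to Book); the pair (Article, Paper) falls below the threshold because its labels do not match, which mirrors the dependence on labels reported in the results.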
2. RESULTS
All tests were performed on a standard notebook under Windows. FOAM
has been implemented in Java with all its advantages and
disadvantages.
The individual results of the benchmark ontologies were grouped.
Further, one short section each describes the testing of the directory
and anatomy ontologies. The concrete results can be found in Section
6.3 of this paper.

[Figure 1: Ontology Alignment Process. Input, (1) Feature Engineering,
(2) Search Step Selection, (3) Similarity Computation, (4) Similarity
Aggregation, (5) Interpretation, Output; (6) Iteration spans the
steps.]

2.1.1 Tests 101 to 104
These tests are basic tests for ontology alignment.
As the system assumes that equal URIs mean equal objects, an alignment
of an ontology with itself always returns the correct alignments. The
alignment with an irrelevant ontology does not return any results.
Language generalization or restriction does not affect the results.
Our approach is robust enough to cope with these differences.
Considering the differences which occur in real world ontology
modeling, this is a very desirable feature.

2.1.2 Tests 201 to 210
Tests 201 through 210 focus on labels and comments of ontological
entities.

The labels are the most important feature to identify an alignment.
In fact, everything else can be neglected if the labels indicate an
alignment (e.g. also the comments in Test 203). Conversely, changed
labels do seriously affect the outcomes. As our approach currently
does not make use of any dictionaries, this is critical. Small
changes, such as those caused by a different naming convention, can be
balanced out (Test 204 is only slightly worse than the ideal result).
Synonyms or translations, possibly also with removed comments, lower
recall in particular considerably (to between 0.57 and 0.87).
Nevertheless, the structure alignment does find many of the
alignments, despite the differing labels. For the mentioned recalls,
precision stays between 0.80 and 0.96.

2.1.3 Tests 221 to 247
For all these tests the structure is changed.

However, as the labels remain, alignment is very good. Again, this
indicates that labels are the main distinguishing feature. Only minor
disturbances result from the differing structures. Specifically, more
false positives are identified, resulting in a worst-case precision of
"only" 0.94. Recall stays at or above 0.97. Processing time also
varies with the amount of structure. Please note that first results
are returned almost instantaneously (in less than 5 seconds). The
times presented in the table represent the total time until the
approach stops its search for alignments.

2.1.4 Tests 248 to 266
These tests were the most challenging ones for our approach. Labels
and comments had been removed, along with various structural elements.
Precision reaches levels of 0.61 to 0.95. Recall is in the range of
0.18 to 0.55. Unfortunately, the evaluation results did not show a
clear tendency as to which structural element is most important for
our alignment approach. It seems that the structural features can be
exchanged to a certain degree. If one feature is missing, evidence is
collected from another feature. This is a nice result for our
approach, as it suggests that the weights of the individual features
have been assigned appropriately. One tendency that could be
identified was that with decreasing semantic information the found
alignments become sparser. However, most of the identified alignments
were correct (see precision).
We will briefly mention one test for which our approach performed
surprisingly well. Ontology 262 has practically everything removed: no
labels; no comments; no properties; no hierarchies. Nevertheless, some
alignments have been identified. The only information that remained
was the links between instances and their classes. By checking whether
instance sets were the same (at least in size; the instance labels
actually differed), some concepts could be correctly aligned.

2.1.5 Tests 301 to 304
Ontologies 301 through 304 represent schemas modeled by other
institutions but covering the same domain of bibliographic metadata.
From the evaluation perspective, these real world ontologies combine
the difficulties of the previous tests. Test case 301 in particular
differs both in terms of structure and labels. Its labels generally
use the term "has", e.g. "hasISBN" instead of "ISBN". This results in
a rather low term similarity, as our approach does not split the
strings into individual terms. Combined with the differing structure,
this results in rather low quality. For the other ontologies as well,
neither precision nor recall reaches perfect levels. However, the
results are satisfactory. In fact, preliminary tests using our
semi-automatic approach showed that results could be noticeably
increased with very little effort.
The question that this initiative will also partially answer is what
can maximally be reached. We hope to gain these insights by comparing
our results to other participants' results.

2.2 Directory Ontologies
The directory ontologies are subsumption hierarchies. They could be
easily processed. The evaluation results at the workshop will
presumably show the following main effects: Subsumption helps to
identify some alignments correctly. Because we do not use
dictionaries, some alignments are missed. As this dataset only uses


subsumption, we cannot rely on the more complex ontology features
which our approach normally also tries to exploit. Thus, results will
not be ideal.

2.3 Anatomy Ontologies
We were very interested in running our ontology alignment on the big
real world anatomy ontologies. Especially for our efficient approach,
this would have been a thorough evaluation. Unfortunately, the
ontologies were modeled in OWL-Full. Our approach is based on the
KAON2 infrastructure¹, which only allows for OWL-DL. As this
integration is very deep, it was not possible to change to an ontology
environment capable of OWL-Full for the contest. We could not run
these tests. One result, for us, was the realization that ontologies
will probably not stay in the clean world of OWL-DL. We will have to
draw consequences from this.

¹ http://kaon2.semanticweb.org

3. GENERAL COMMENTS

3.1 Comments on the results
An objective comment on strengths or weaknesses requires the
comparison with other participants, which will not be available before
the workshop. However, some conclusions can be drawn.

Strengths:
       •    Labels or identifiers are important and help to align
            most of the entities.
       •    The structure helps to identify alignments, if the labels
            are not expressive.
       •    A more expressive ontology results in better alignments;
            an argument in favor of ontologies compared to simple
            classification structures.
       •    The generally learnt weights have shown very good results.

Weaknesses:
       •    The approach cannot deal with consistently changed labels.
            Especially translations, synonyms, or other conventions
            make it difficult to identify alignments.
       •    The system is bound to OWL-DL or less expressive
            ontologies.

3.2 Discussions on the way to improve the proposed system
Possible improvements are directly related to the weaknesses in the
previous section.
       •    Extending the handling of labels (strings) can presumably
            increase overall effectiveness. Usage of dictionaries is
            widely applied and will be added to our approach as well.
       •    The tight interconnection of FOAM with KAON2 restricts its
            open usage. Currently efforts are being made to decouple
            them by inserting a general ontology management layer.

3.3 Comments on the test cases
The benchmark tests have shown very interesting general results on how
the alignment approach behaves. These systematic tests are a good
underlying test base. For our approach, the directory tests are less
interesting, as they are restricted to subsumption hierarchies, rather
than complete ontologies. Many of the specific advantages of our
approach cannot be applied. It was very unfortunate that we could not
run the anatomy tests. However, we think it is very important to have
some real world ontologies, and we hope to test them at a later point
in time.
For future work, it might be interesting to add some user-interaction
component to the tests. It would also be interesting not only to have
real world ontologies, but also to see how each alignment approach
performs for specific ontology alignment applications.

3.4 Comments on the measures
Precision and recall are without any doubt the most important
measures. Some balancing measure needs to be added as well, as we have
done with the f-measure. Otherwise, it is very difficult to draw
conclusions on which approach worked best on which test set. For
future evaluation it would also be interesting to make use of some
less strict evaluation measure, as presented in [9].

4. CONCLUSION
In this paper, we have briefly presented an approach and a tool for
ontology alignment and mapping: FOAM. This included the general
underlying process. Further, we have mentioned how specific
requirements are realized with this tool. We then applied FOAM to the
test data. The results were carefully analyzed. We also discussed some
future steps for both our own approach and the evaluation of
alignments in general.
The main conclusions from the experiments were:
       •    It is possible to create good automatic ontology
            alignment approaches.
       •    Labels are most important.
       •    Structure helps, if the labels are not expressive.
       •    Due to the importance of labels, our approach needs to be
            extended with e.g. dictionaries in the background.
       •    One general conclusion from the real world ontologies was
            that an ontology system also has to be able to manage
            OWL-Full, as the real world does not provide the clean
            ontologies of OWL-DL.
In general, the evaluation has shown us where our specific strengths
and weaknesses are, and how we can continue improving. The results of
other participants will give us some further guidelines.

5. REFERENCES
[1] Agrawal, R., Srikant, R.: On integrating catalogs. In:
    Proceedings of the Tenth International Conference on
    the World Wide Web (WWW-10), ACM Press (2001)
    603–612
[2] Noy, N.F., Musen, M.A.: The PROMPT suite:
    interactive tools for ontology merging and mapping.
    International Journal of Human-Computer Studies 59
    (2003) 983–1024
[3] Ehrig, M., Haase, P., van Harmelen, F., Siebes, R.,
    Staab, S., Stuckenschmidt, H., Studer, R., Tempich,
    C.: The SWAP data and metadata model for


    semantics-based peer-to-peer systems. In: Proceedings
    of MATES-2003. First German Conference on
    Multiagent Technologies. LNAI, Erfurt, Germany,
    Springer (2003)
[4] Ehrig, M., Staab, S. QOM - quick ontology mapping.
    In F. van Harmelen, S. McIlraith, and D. Plexousakis,
    editors, Proceedings of the Third International
    Semantic Web Conference (ISWC2004), LNCS, pages
    683–696, Hiroshima, Japan, 2004. Springer.
[5] Ehrig, M., Staab, S., Sure, Y. Supervised learning of
    an ontology alignment process. In Proceedings of the
    Workshop on IT Tools for Knowledge Management
    Systems: Applicability, Usability, and Benefits
    (KMTOOLS) at 3. Konferenz Professionelles
    Wissensmanagement, Kaiserslautern, Germany, April
    2005.
[6] Ehrig, M., Sure, Y. Adaptive Semantic Integration. In
    Proceedings of the Workshop on Ontologies-based
    techniques for DataBases and Information Systems at
    VLDB 2005, Trondheim, Norway, August 2005.
[7] Hughes, T. Information Interpretation and Integration
    Conference (I3CON) at PerMIS-2004, Gaithersburg,
    MD, USA, August 2004.
[8] Sure, Y., Corcho, O., Euzenat, J., Hughes, T. (editors),
    3rd International Workshop on Evaluation of Ontology
    based Tools (EON2004), Volume 128, CEUR-WS
    Publication. Workshop at the 3rd International
    Semantic Web Conference (ISWC 2004), 7th-11th
    November 2004, Hiroshima, Japan
[9] Ehrig, M., Euzenat, J. Relaxed Precision and Recall for
    Ontology Alignment. In Proceedings of the Integrating
    Ontologies Workshop at K-Cap '05, Banff, Alberta,
    Canada, October 2005.

6. RAW RESULTS

6.1 Link to the system and parameters file
The FOAM system may be downloaded at
http://www.aifb.uni-karlsruhe.de/WBS/meh/foam.
The system is continuously improved, so results may slightly
differ from the results provided in this paper. The interested
reader is encouraged to download, test, and use the system.

6.2 Link to the set of provided alignments (in align format)
The results are also available through the website:
http://www.aifb.uni-karlsruhe.de/WBS/meh/foam/results.zip.

6.3 Matrix of results
The following results were achieved in the evaluation runs. As
FOAM only allows identifying equality relations, precision and
recall only refer to these.

#     Name                      Prec.   Rec.    F-measure   Time
101   Reference alignment       1.0     1.0     1.0         2.96
102   Irrelevant ontology       -       -       -           207.14
103   Language generalization   1.0     1.0     1.0         180.95
104   Language restriction      1.0     1.0     1.0         177.63
201   No names                  0.90    0.65    0.75        175.99
202   No names, no comments     0.85    0.57    0.68        176.59
203   No comments               1.0     1.0     1.0         174.21
204   Naming conventions        0.96    0.93    0.94        185.09
205   Synonyms                  0.80    0.67    0.73        174.46
206   Translation               0.93    0.76    0.84        172.15
207                             0.95    0.78    0.86        167.89
208                             0.96    0.87    0.92        164.20
209                             0.81    0.57    0.67        168.63
210                             0.92    0.67    0.77        164.31
221   No specialization         1.0     1.0     1.0         172.92
222   Flattened hierarchy       1.0     1.0     1.0         127.63
223   Expanded hierarchy        0.99    1.0     0.99        142.70
224   No instance               1.0     0.99    0.99        42.09
225   No restrictions           1.0     1.0     1.0         171.13
228   No properties             1.0     1.0     1.0         112.60
230   Flattened classes         0.94    1.0     0.97        137.60
232                             1.0     0.99    0.99        45.50
233                             1.0     1.0     1.0         110.57
236                             1.0     1.0     1.0         12.77
237                             1.0     1.0     1.0         87.94
238                             1.0     1.0     1.0         106.29
239                             0.94    1.0     0.97        73.14
240                             0.95    0.97    0.97        84.63
241                             1.0     1.0     1.0         11.15
246                             0.94    1.0     0.97        51.14
247                             0.94    1.0     0.97        70.27
248                             0.85    0.48    0.62        251.65
249                             0.73    0.46    0.57        150.39
250                             0.95    0.55    0.69        114.00
251                             0.88    0.41    0.56        132.39
252                             0.62    0.34    0.44        145.59
253                             0.80    0.44    0.57        83.96
254                             0.75    0.18    0.29        103.56
257                             0.76    0.48    0.59        28.43
258                             0.86    0.39    0.53        133.79
259                             0.75    0.45    0.56        149.39
260                             0.85    0.38    0.52        71.21
261                             0.61    0.33    0.43        82.89
262                             0.78    0.21    0.33        21.70
265                             0.85    0.38    0.52        70.50
266                             0.63    0.36    0.46        81.68
301   BibTeX/MIT                0.78    0.35    0.48        23.43
302   BibTeX/UMBC               0.88    0.74    0.80        21.31
303   Karlsruhe                 0.84    0.90    0.87        61.08
304   INRIA                     0.94    0.97    0.95        43.32

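The F-measure column can be related to the precision and recall columns as follows. This is a minimal sketch assuming the standard balanced F-measure (the harmonic mean of precision and recall); the paper does not spell out the formula, and the function name `f_measure` is our own. Because the table lists rounded precision and recall, a recomputed value may differ from the reported one in the last digit.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (test, precision, recall, reported F-measure), copied from the table above.
    rows = [
        (201, 0.90, 0.65, 0.75),
        (202, 0.85, 0.57, 0.68),
        (301, 0.78, 0.35, 0.48),
        (304, 0.94, 0.97, 0.95),
    ]
    for test, p, r, reported in rows:
        print(test, round(f_measure(p, r), 2), reported)
```

For these four rows the recomputed values agree with the reported ones after rounding to two decimals.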
