=Paper= {{Paper |id=Vol-2181/paper-11 |storemode=property |title=A Journey From Simple to Complex Alignment on Real-World Ontologies |pdfUrl=https://ceur-ws.org/Vol-2181/paper-11.pdf |volume=Vol-2181 |authors=Lu Zhou |dblpUrl=https://dblp.org/rec/conf/semweb/Zhou18 }} ==A Journey From Simple to Complex Alignment on Real-World Ontologies== https://ceur-ws.org/Vol-2181/paper-11.pdf
 A Journey From Simple to Complex Alignment
           on Real-World Ontologies

                                      Lu Zhou

            DaSe Lab, Wright State University, Dayton OH 45435, USA
                              zhou.34@wright.edu



      Abstract. Ontology alignment has been an active research topic for
      over a decade. Over that time, many developers have focused on cre-
      ating alignment systems and methods to find simple 1-to-1 equivalence
      matches between two ontologies. However, very few alignment systems
      focus on finding complex correspondences. There are several reasons for
      this limitation. First, there are no widely accepted alignment benchmarks
      that contain such complex relationships. Second, the traditional evalua-
      tion metrics like precision, recall, and f-measure are not accurate enough
      to evaluate the performance of a complex alignment system. And third,
      the approaches most commonly used to find simple equivalences do not
      handle the increased computational complexity of finding complex equiv-
      alences well. Therefore, it becomes a big challenge for many developers
      to create and evaluate the systems. In this paper, in order to advance
      the development of ontology matching, we seek to address the problem
      by first developing potential complex alignment benchmarks from real-
      world ontologies. In addition, we utilize traditional automated alignment
      systems to suggest complex correspondences, and finally plan to achieve
      our ultimate goal of creating and evaluating our own complex alignment
      system based on logical RDF data compression.


1   Problem Statement

Similar to database integration, ontology alignment is an important process in
enabling computers to query and reason across the many linked open datasets on
the semantic web. This is a difficult challenge because the ontologies underlying
different linked datasets can vary in terms of subject area coverage, level of
abstraction, ontology modeling philosophy, and even language.
     In order to solve this problem, an ontology alignment system aims to iden-
tify the entity relationships between two and more ontologies. Such relationships
have a wide range of complexity, from basic 1-to-1 equivalence to arbitrary m-
to-n relationships. However, over a decade, the majority of the studies in the
field still focuses on the simplest end of this scale – finding 1-to-1 equivalence
relations between ontologies. Very few ontology alignment systems and methods
are developed to uncover complex relations. The reasons for this limitation may
lie in the following. First, there are no widely used and accepted ontology align-
ment benchmarks that involve complex relations. Without these benchmarks,
2      Lu Zhou

even if there were a complex alignment system, it is not only very hard to eval-
uate if this system could correctly detect the complex correspondences, but also
it is a challenge to evaluate whether the system is comprehensive enough to
find all different kinds of complex patterns. Second, the traditional approach of
precision, recall, and f-measure does not seem fine-grained enough to evaluate
complex correspondences. A better version of precision and recall is needed [2].
    This work seeks to progress in the direction of fostering the development of
research activities in the field of complex ontology alignment. We firstly show
that real-world ontologies involve many complex relations. And based on these
real-world datasets, we develop high quality complex alignment benchmarks,
including creating complex alignments and categorizing them into complex pat-
terns. In addition, to decrease the complexity of detecting complex relations, we
leverage automated alignment systems to uncover and suggest possible complex
relations. Moreover, we plan to apply logical RDF compression with the results
that are generated by traditional automated alignment systems to create a new
complex alignment system.


2   Relevancy

Ontology alignment seeks to address the conceptual heterogeneities between on-
tologies. Over a decade, the community of this field remains on creating and
improving the algorithms of finding simple alignments. The reason that the re-
searchers do not really dig into the complex alignment for such a long time, is
that the community is still discovering and analyzing how to reach the goal.
Nowadays, the research related to simple alignment has been well studied. It is
actually a good timing to move on to complex ontology alignment, because more
and more good alignment systems and algorithms have been published, and also
more and more data are populated into ontologies and published as linked open
data, the applications that utilize these LODs are required to involve ontology
matching and data integration processes [3]. In addition, due to the complexity
of the alignments between ontologies, only identifying traditional simple 1-to-1
alignment is not enough to fulfill the growing high demand of most of these
applications. Therefore, it is necessary to create complex alignment systems and
methods to uncover complex relations in real-world use cases.
    This work focuses on addressing the problem from several different perspec-
tives. We prepared benchmarks that involve complex relations from real-world
ontologies and will try to distribute them as a new track in the Ontology Align-
ment Evaluation Initiative (OAEI), which was started in 2005 with the intent
to allow researchers in the field to compare the performance of their approaches
on a consistent set of benchmarks over time. Since then, it has been more conve-
nient for researchers from different organizations to test their methods. Second,
as we’ve seen, it is a difficult challenge to detect complex relations. This work
seeks to narrow down this issue by leveraging traditional alignment systems to
suggest possible complex candidates. This would be a valuable starting point for
determining the exact relation. Moreover, our ultimate goal in this work is to
    A Journey From Simple to Complex Alignment on Real-World Ontologies          3

create a complex alignment system that has better performance than traditional
automated alignment systems.


3    Related work
Regarding the creation of complex alignment benchmarks, Thieblin [11] is creat-
ing a complex alignment benchmark using the Conference track ontologies within
the OAEI. This work is partially completed, and at the time of this writing it
covers three of the seven ontologies. In addition, we are collaborating with them
(under their direction) to complete the dataset and prepare a new task in OAEI
to evaluate complex alignment systems.
    However, even though there are no widely accepted and used benchmarks
that involve complex relations, some researchers still tried to create alignment
systems and evaluate them using their ow manually developed reference align-
ment. BLOOMS [3] is an alignment system based on Wikipedia to detect the
subsumption relations. Other subsumption systems have evaluated the precision
of their approach by manually validating relations produced by their system,
while foregoing an assessment of recall [9]. There are some more general ap-
proaches based on complex patterns to detect complex correspondences. Ritze
et al [7, 8] proposed several complex correspondences patterns. Such as: Class
by Attribute Value, Class by Attribute Type, Class by Inverse Attribute Type,
Inverse Properties, and Property Chain. In addition, Ritze also utilized linguistic
analysis techniques, like detection of antonymy, active form, etc to help detect
these complex patterns. Other similar work was done by Šváb-Zamazal and
Svátek [10]. It firstly detected N-ary relations in the source ontology. And then,
it matched the detected N-ary relations to an object property in the target
ontology.
    Our work differs from the above methods in several aspects. First, we focus
on real-world ontologies, which we found that these datasets are not only used
by academic researchers, but also the industries and governments to develop
applications for the usage of normal human life. There are some interesting
relations that have not yet been mentioned in the current benchmark from OAEI.
In addition, the instance data of these real-world ontologies are ready to be used
as additional information to help improve the performance of alignment process.
In contrast to this, significant instance data is not readily available for most of
the OAEI Conference Track ontologies. Moreover, regarding the creation of a
complex alignment, instead of comparing each entity in the source ontology to
each entity in the target ontology, we apply logical RDF compression to list a
set of available rules, and narrow down them based on the suggestion generated
by traditional alignment systems to finally output the complex relation. More
details are discussed in Section 5.
4      Lu Zhou

4   Research Questions and Hypotheses
The research questions that we plan to address are listed as follows:

 1. Do real-world ontology alignments contain complex relations?
 2. How well do traditional automated alignment systems work on real-world
    matching tasks that contain complex relations?
 3. Can we create an automated alignment system that performs better than tra-
    ditional alignment systems on finding complex relations that exist between
    ontologies?

Our hypotheses associated to the above research questions are the following:

 1. Most real-world ontologies contain many complex relations.
 2. Traditional automated alignment systems may not be able to identify com-
    plex relationships directly, but they may be able to suggest the atomic enti-
    ties involved in such relations.
 3. A complex alignment system that leverages logical RDF compression can
    effectively identify complex relations between ontologies.


5   Approach
Hypothesis 1 Our previous work with the NSF EarthCube Initiative and the
US Geological Survey involved the time consuming task of manually aligning
several real-world ontologies. These alignments have been discussed and evalu-
ated by domain experts and ontology engineers to guarantee that they are of
high quality. We will inventory these alignments, along with any other real-world
alignments we can acquire, to answer the first research question: Do real-world
ontology alignments contain complex relations?

Hypothesis 2 To answer the second research question, “How well do tradi-
tional automated alignment systems perform on real-world matching tasks that
contain complex relations?”, we plan to first evaluate several state of the art
alignment systems on the alignment tasks mentioned above. Since traditional
alignment systems only attempt to identify simple relations between ontologies,
their performance will be limited to the percentage of the alignments that in-
volve these types of relations. However, it is possible that these systems, while
they cannot identify the precise relationship that holds between an entity in the
source ontology and two or more entities in the target ontology, they can at
least identify the entities involved in the relationship. For example, in the rela-
tion below, the class M ischaakusaakihiikin in Cree ontology is equivalent to
the intersection of instances of LakeOrP ond and entities that isContainedBy
a SwampOrM arsh in the SWO ontology. While a traditional alignment sys-
tem cannot identify things like intersection or value restrictions, it may be able
to determine that LakeOrP ond, isContainedBy, and SwampOrM arsh are re-
lated in some way to M ischaakusaakihiikin. To check this, for each entity es
   A Journey From Simple to Complex Alignment on Real-World Ontologies            5

in source ontology Os , we will use the automated alignment systems to give a
list of candidates et in the target ontology Ot , ordered by the similarity assigned
to them by the alignment system. We will evaluate the performance against the
benchmark using mean reciprocal rank [6].

EquivalentClasses(cree:Mischaakusaakihiikin
        ObjectIntersectionOf(swo:LakeOrPond
                ObjectSomeValuesFrom(swo:isContainedBy
                        swo:SwampOrMarsh)))

Hypothesis 3 As we mentioned, the ultimate goal of this work is to see if
we can create an automated alignment system that effectively identifies com-
plex relationships that exist between two ontologies. Our planned approach is
to create an extensional matcher (i.e. one that relies upon instance data) that
leverages logical RDF compression [5]. Logical RDF compression uses the FP-
Growth data mining algorithm to generate rules that can be stored in lieu of
the triples they are based on. For example, say that a linked dataset contains
triples about university students. There might be many triples of the form  and many corresponding triples of the form  because, according to this dataset, all Com-
puter Science majors are enrolled in the College of Engineering. Logical RDF
compression would replace the second set of triples with a single rule: if x is has-
Major ComputerScience then x isEnrolledIn CollegeOfEngineering, and these
triples could then be generated on-the-fly in response to queries, thereby saving
space in the linked dataset. While logical RDF compression seeks to find any
rules that can be used to shrink the dataset, it is possible that some of these
rules represent meaningful semantic relations that hold between entities. For
example, if hasMajor ComputerScience exists in one ontology and isEnrolledIn
CollegeOfEngineering exists in another ontology, then it may be possible to infer
the relation below.

SubClassOf(ObjectSomeValuesFrom(ont1:hasMajor ont1:ComputerScience)
    ObjectSomeValuesFrom(ont2:isEnrolledIn ont2:CollegeOfEngineering))

    Because the FP-Growth algorithm underlying logical RDF compression can
generate a very large number of rules, some mechanism must be put in place to
choose the more semantically meaningful rules rather than the ones that result
in the most compression. Our planned approach for this is to choose rules that
involve the entities suggested by traditional alignment systems.
    The overall work flow is shown in Figure 1. We first apply the traditional
alignment systems to suggest the candidates as we described above. And then,
we use RDF compression [5] on the source ontology to list a set of compression
rules. Based on the suggested candidates from traditional alignment systems,
we can create a filter to pick up the compression rules, and finally output the
complex relations.
6        Lu Zhou




         Fig. 1. The Work Flow of the Proposed Complex Alignment System


6     Preliminary Results

We have some preliminary results in terms of finding complex relations from real-
world ontologies. There are two different datasets that we are currently working
on. One dataset is from GeoLink project1 that was funded under the U.S. Na-
tional Science Foundation’s EarthCube initiative. Another dataset is a set of
ontologies from surface water domain. Based on these two sets of ontologies, we
have developed the alignments in consultation with domain experts from differ-
ent institutions. In GeoLink datasets, the number of classes, object properties,
and data properties in GeoLink base ontology (GBO) and GeoLink modular on-
tology (GMO) are showed in Table 1. We found that the 87 out of 111 relations
in them are complex relations, including not only class subsumption, property
subsumption, property chain equivalence, and property chain subsumption that
were introduced in [7], but also some typecasting relations. The idea of typecast-
ing, and why it is important in ontology modeling, is formally introduced and
discussed in [4]. Moreover, the alignment is also available in both EDOAL and
rules syntax for the purpose of manipulating and reading respectively. The full
dataset has been uploaded to the FigShare2 . We also wrote a paper to describe
the creation and submitted to ISWC2018, which is currently under-review.
    In hydrography dataset, it consists of four ontologies. (a) The Surface Wa-
ter Ontology (SWO), which was originally presented in [12], was developed by
the US Geological Survey (USGS). (b) The Hydro3 ontology was developed by
individuals at the University of Maine in order to support expanded gazetteer
functions using topology and semantic inference [13]. (c) The HydrOntology is a
non-English ontology, which was developed by the Spanish National Geographic
Institute (IGN) [1]. (d) Cree surface water ontology is in a language, Cree, which
1
    https://www.geolink.org/
2
    https://doi.org/10.6084/m9.figshare.5907172
    A Journey From Simple to Complex Alignment on Real-World Ontologies           7

Table 1. The Number of Classes, Object Properties, and Data Properties in Both
GeoLink Ontologies

     Ontology Classes Object Properties Data Properties
      GBO       40           149              49
      GMO      156           124              46


is spoken by some of the native inhabitants of northern Canada. The reason of
choosing these four ontologies is that Hydro3, HydrOntology and Cree have a
large degree of overlap with SWO. Therefore, we utilize these four ontologies
to create a reference alignment manually. Table 2 shows the number of classes,
and object properties, and data properties. And we found that the 84 out of
197 relations in them are complex relations. We are currently writing a paper
about evaluating the performance of traditional automated alignment systems
on this dataset. The traditional automated alignment systems are able to find
the simple relationships. But, they may not be able to identify these complex
relationships directly. We hypothesize that they may be able to suggest some
possible complex relations partially. Moreover, it is able to greatly narrow down
the entities and mitigates the high complexity of computation.


Table 2. The Number of Classes, Object Properties, and Data Properties in Hydrog-
raphy Ontologies

       Ontology Classes Object Properties Data Properties
        SWO        85          20                1
       Hydro3      22          34                0
     HydrOntology 154          47               75
         Cree      83          21                7




7    Evaluation Plan

In this section, we introduce the evaluation plan for each research question. For
research question 1, as we showed in Section 6, it is considered successful that
we have found many complex alignments in real-world ontologies. In addition,
we are also preparing to incorporate the dataset into OAEI as a new track for
other researchers accessing it. For research question 2, after achieving the list of
entities involved in a complex relation, we will evaluate the performance against
the benchmark using mean reciprocal rank as we discussed in Section 5. For
research question 3, it is a challenge to evaluate the performance of a complex
alignment system. The traditional precision, recall, and f-measure metrics do not
8      Lu Zhou

seem fine-grained enough. For example, there is a relation between Hydro3 and
SWO, if one alignment system identified this:

EquivalentClasses(
        ObjectIntersectionOf(
                hydro3:Hydrographic_Feature
                hydro3:Hydrographic_Structure
                hydro3:Boundary)
        swo:HydrographicFeature))

and another identified this:

SubClassOf(
        ObjectUnionOf(
                hydro3:Hydrographic_Feature
                hydro3:Island
                hydro3:Shore)
        swo:HydrographicFeature))

Based on the reference alignment, we need a metric to consider the first sys-
tem “more correct” than the second. We plan to develop a performance metric
that more accurately reflects the performance of a complex alignment system.
Another challenge is that, to the best of our knowledge, there are no existing
complex alignment systems against which to compare our approach. Therefore,
we might consider evaluating the performance based on our manually created
reference alignment.


8   Reflections

It is primarily difficult to identify complex relationships between ontologies be-
cause of computational complexity. A naive approach would need to compare
every entity in the source ontology to every possible combination of entities
in the target ontology, which is not feasible. Instead of doing this, our proposed
approach has a good chance of success because it is based on a logical RDF com-
pression method that has already been shown to be applicable to large datasets,
and we also can further limit the search space by using the output from tradi-
tional alignment systems to narrow the focus. There are some reflections. The
performance of using logical RDF compression in our alignment system is pri-
marily based on the Abox information in the ontology. It is still not clear that
how to apply our alignment algorithm to a more generalized scenario. However,
our approach is feasible, and can be a good starting point to achieve the ultimate
goal in the future.
Acknowledgments I am very grateful to my advisor Dr. Pascal Hitzler and Dr.
Michelle Cheatham for their support and constructive suggestions of this re-
search proposal.
   A Journey From Simple to Complex Alignment on Real-World Ontologies                  9

References
 1. Blázquez, L.M.V., Gargantilla, J.Á.R., López-Pellicer, F.J., Corcho, Ó., Nogueras-
    Iso, J.: An approach to comparing different ontologies in the context of hydrograph-
    ical information. In: Information Fusion and Geographic Information Systems, Pro-
    ceedings of the Fourth International Workshop, IF&GIS 2009, 17-20 May 2009, St.
    Petersburg, Russia. pp. 193–207 (2009)
 2. Euzenat, J.: Semantic precision and recall for ontology alignment evaluation. In:
    IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial
    Intelligence, Hyderabad, India, January 6-12, 2007. pp. 348–353 (2007)
 3. Jain, P., Hitzler, P., Sheth, A.P., Verma, K., Yeh, P.Z.: Ontology alignment for
    linked open data. In: The Semantic Web - ISWC 2010 - 9th International Semantic
    Web Conference, ISWC 2010, Shanghai, China, November 7-11, 2010, Revised
    Selected Papers, Part I. pp. 402–417 (2010)
 4. Krisnadhi, A.A., Hitzler, P., Janowicz, K.: On the capabilities and limitations
    of OWL regarding typecasting and ontology design pattern views. In: Ontology
    Engineering - 12th International Experiences and Directions Workshop on OWL,
    OWLED 2015, co-located with ISWC 2015, Bethlehem, PA, USA, October 9-10,
    2015, Revised Selected Papers. pp. 105–116 (2015)
 5. Pan, J.Z., Gómez-Pérez, J.M., Ren, Y., Wu, H., Wang, H., Zhu, M.: Graph pattern
    based RDF data compression. In: Semantic Technology - 4th Joint International
    Conference, JIST 2014, Chiang Mai, Thailand, November 9-11, 2014. Revised Se-
    lected Papers. pp. 239–256 (2014)
 6. Radev, D.R., Qi, H., Wu, H., Fan, W.: Evaluating web-based question answering
    systems. In: Proceedings of the Third International Conference on Language Re-
    sources and Evaluation, LREC 2002, May 29-31, 2002, Las Palmas, Canary Islands,
    Spain (2002)
 7. Ritze, D., Meilicke, C., Sváb-Zamazal, O., Stuckenschmidt, H.: A pattern-based on-
    tology matching approach for detecting complex correspondences. In: Proceedings
    of the 4th International Workshop on Ontology Matching (OM-2009) collocated
    with the 8th International Semantic Web Conference (ISWC-2009) Chantilly, USA,
    October 25, 2009 (2009)
 8. Ritze, D., Völker, J., Meilicke, C., Sváb-Zamazal, O.: Linguistic analysis for com-
    plex ontology matching. In: Proceedings of the 5th International Workshop on
    Ontology Matching (OM-2010), Shanghai, China, November 7, 2010 (2010)
 9. Suchanek, F.M., Abiteboul, S., Senellart, P.: Paris: Probabilistic alignment of rela-
    tions, instances, and schema. Proceedings of the VLDB Endowment 5(3), 157–168
    (2011)
10. Šváb-Zamazal, O., Svátek, V.: Towards ontology matching via pattern-based de-
    tection of semantic structures in owl ontologies. In: Proceedings of the Znalosti
    Czecho-Slovak Knowledge Technology conference (2009)
11. Thieblin, E., Haemmerle, O., Hernandez, N., Trojahn, C.: Towards a complex
    alignment evaluation dataset. In: Proceedings of the 16th International Conference
    on Ontology Matching-Volume 2032. CEUR-WS. org (2017)
12. Varanka, D.E., Usery, E.L.: An applied ontology for semantics associated with sur-
    face water features. Land Use and Land Cover Semantics: Principles, Best Prac-
    tices, and Prospects p. 145 (2015)
13. Vijayasankaran, N.: Enhanced place name search using semantic gazetteers (2015)