=Paper= {{Paper |id=Vol-1766/oaei16_paper8 |storemode=property |title=Lily results for OAEI 2016 |pdfUrl=https://ceur-ws.org/Vol-1766/oaei16_paper8.pdf |volume=Vol-1766 |authors=Peng Wang,Wenyu Wang |dblpUrl=https://dblp.org/rec/conf/semweb/WangW16 }} ==Lily results for OAEI 2016== https://ceur-ws.org/Vol-1766/oaei16_paper8.pdf
                      Lily Results for OAEI 2016

                              Peng Wang¹, Wenyu Wang¹,²
        ¹ School of Computer Science and Engineering, Southeast University, China
                  ² Chien-Shiung Wu College, Southeast University, China
                               {pwang, ms} @ seu.edu.cn




          Abstract. This paper presents the results of Lily in the ontology align-
          ment contest OAEI 2016. As a comprehensive ontology matching system,
          this year Lily is intended to participate in three tracks of the contest:
          benchmark, conference, and anatomy. The specific techniques used by
          Lily will be introduced briefly. The strengths and weaknesses of Lily will
          also be discussed.



1     Presentation of the system

By using hybrid matching strategies, Lily, an ontology matching system, can
address several problems related to heterogeneous ontologies. It supports
matching of normal ontologies, matching of weak informative ontologies [5],
ontology mapping debugging [7], and ontology matching tuning [9], at both
normal and large scales. In previous OAEI contests [1–3], Lily achieved good
performance in some tasks, which indicates its effectiveness and broad
applicability.


1.1       State, purpose, general statement

The core principle of matching strategies of Lily is utilizing the useful information
correctly and effectively. Lily combines several effective and efficient matching
techniques to facilitate alignments. There are four main matching strategies:
(1) Generic Ontology Matching (GOM) is used for common matching tasks with
normal size ontologies. (2) Large scale Ontology Matching (LOM) is used for the
matching tasks with large size ontologies. (3) Ontology mapping debugging is
used to verify and improve the alignment results. (4) Ontology matching tuning
is used to enhance overall performance.
    The matching process consists of three main steps: (1) Pre-processing, in
which Lily parses the ontologies and prepares the information needed by the
subsequent steps. The ontologies are also analyzed at this stage, and their
characteristics, together with previously studied datasets, are used to choose
parameters and strategies. (2) Similarity computing, in which Lily uses
dedicated methods to calculate the similarities between elements from different
ontologies. (3) Post-processing, in which alignments are extracted and refined
by mapping debugging.
    This time, Lily has few changes compared to the OAEI 2015 version.


1.2   Specific techniques used

Lily aims to provide high quality 1:1 concept pair or property pair alignments.
The main specific techniques used by Lily are as follows.

Semantic subgraph An element may have heterogeneous semantic interpretations
in different ontologies. Understanding the real local meaning of each element
is therefore very useful for similarity computation, which is the foundation
of many applications, including ontology matching. For this reason, Lily first
describes the meaning of each entity accurately before computing similarities.
However, since different ontologies describe their elements in different ways,
obtaining the semantic context of an element is an open problem. The semantic
subgraph was proposed to capture the real meanings of ontology elements [4].
To extract the semantic subgraphs, a hybrid ontology graph is used to repre-
sent the semantic relations between elements. An extracting algorithm based on
an electrical circuit model is then used with new conductivity calculation rules
to improve the quality of the semantic subgraphs. It has been shown that the
semantic subgraphs can properly capture the local meanings of elements [4].
    Based on the extracted semantic subgraphs, more credible matching clues can
be discovered, which help reduce the negative effects of the matching uncertainty.

Generic ontology matching method The similarity computation is based
on the semantic subgraphs, which means all the information used in the simi-
larity computation comes from the semantic subgraphs. Lily combines the text
matching and structure matching techniques.
    The Semantic Description Document (SDD) matcher measures the literal
similarity between ontologies. The semantic description document of a concept
contains information about its class hierarchy, related properties and
instances. The semantic description document of a property contains information
about its hierarchy, domains, ranges, restrictions and related instances. For
the descriptions of two entities, the similarities of the corresponding parts
are calculated. Finally, the separate similarities are combined using
empirically determined weights.
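
    The weighted combination of per-part similarities can be sketched as
follows. This is only an illustration under our own assumptions, not Lily's
implementation: the part names, the token-overlap (Jaccard) measure and the
weight values are hypothetical.

```python
def jaccard(a, b):
    """Token-overlap similarity between two token lists (illustrative choice)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def sdd_similarity(doc_a, doc_b, weights):
    """Combine the per-part similarities of two description documents.

    doc_a and doc_b map a part name to a list of tokens; weights maps the
    same part names to (hypothetical) weights.
    """
    total = sum(weights.values())
    score = 0.0
    for part, w in weights.items():
        score += w * jaccard(doc_a.get(part, []), doc_b.get(part, []))
    return score / total


# Hypothetical weights and documents, for illustration only.
WEIGHTS = {"hierarchy": 0.5, "properties": 0.3, "instances": 0.2}
paper_a = {"hierarchy": ["document", "paper"],
           "properties": ["title", "author"],
           "instances": []}
paper_b = {"hierarchy": ["document", "article"],
           "properties": ["title", "author"],
           "instances": []}
score = sdd_similarity(paper_a, paper_b, WEIGHTS)
```

Here the hierarchy parts share one of three distinct tokens and the property
parts are identical, so the combined score is 0.5 * 1/3 + 0.3 * 1.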

Matching weak informative ontologies Most existing ontology matching methods
rely on linguistic information. However, some ontologies lack regular
linguistic information such as natural words and comments, so linguistic-based
methods do not work on them. Structure-based methods are more practical in
such situations, and similarity propagation is a feasible way to realize
structure-based matching. Traditional propagation strategies, however, do not
take ontology features into consideration and suffer from effectiveness and
performance problems. Having analyzed the classical similarity propagation
algorithm, Similarity Flooding, we proposed a new structure-based ontology
matching method [5]. This method has two features: (1) It has stricter but
reasonable propagation conditions, which lead to more efficient matching
processes and better alignments. (2) A series of propagation strategies are
used to improve the matching quality. We have demonstrated that this method
performs well on the OAEI benchmark dataset [5].
   However, similarity propagation is not always beneficial. While it discovers
more alignments, it may also introduce more incorrect ones. Lily therefore
uses a strategy to decide when to apply similarity propagation.
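
    To make the idea of similarity propagation concrete, the sketch below
implements a small propagation loop in the style of Similarity Flooding. It is
not the method of [5]: the `min_sim` threshold is only our stand-in for a
stricter propagation condition, and the graphs, labels and seed values are
hypothetical.

```python
def propagate(edges_a, edges_b, seeds, rounds=5, min_sim=0.0):
    """Propagate similarity between node pairs of two labelled graphs.

    edges_a / edges_b: sets of (source, label, target) triples.
    seeds: dict mapping (node_a, node_b) to an initial similarity.
    min_sim: a pair passes similarity on only if its own value exceeds
             this threshold (an assumed, simplified propagation condition).
    """
    # Two pairs are neighbours when both graphs link them with the same label.
    neighbours = {}
    for s1, l1, t1 in edges_a:
        for s2, l2, t2 in edges_b:
            if l1 == l2:
                neighbours.setdefault((s1, s2), set()).add((t1, t2))
                neighbours.setdefault((t1, t2), set()).add((s1, s2))

    sim = dict(seeds)
    for _ in range(rounds):
        nxt = {}
        for pair in set(neighbours) | set(sim):
            # Each neighbour splits its similarity evenly among its own neighbours.
            incoming = sum(
                sim.get(n, 0.0) / len(neighbours[n])
                for n in neighbours.get(pair, ())
                if sim.get(n, 0.0) > min_sim
            )
            nxt[pair] = sim.get(pair, 0.0) + incoming
        top = max(nxt.values(), default=0.0) or 1.0
        sim = {p: v / top for p, v in nxt.items()}  # normalise to [0, 1]
    return sim


# Hypothetical ontologies whose only known match is (Paper, Article).
edges_a = {("Paper", "hasAuthor", "Person"), ("Paper", "subClassOf", "Document")}
edges_b = {("Article", "hasAuthor", "Author"), ("Article", "subClassOf", "Publication")}
sim = propagate(edges_a, edges_b, {("Paper", "Article"): 1.0})
```

In this toy example the pairs (Person, Author) and (Document, Publication)
receive similarity purely from structure, while a pair such as
(Person, Publication) is never generated because the edge labels differ.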



Large scale ontology matching Matching large ontologies is challenging because
of its significant time complexity. We proposed a new matching method for large
ontologies based on reduction anchors [6]. This method has a distinct advantage
over the divide-and-conquer methods because it does not need to partition large
ontologies. In particular, two kinds of reduction anchors, positive and negative
reduction anchors, are proposed to reduce the time complexity in matching.
Positive reduction anchors use the concept hierarchy to predict the ignorable
similarity calculations. Negative reduction anchors use the locality of matching
to predict the ignorable similarity calculations. Our experimental results on the
real world datasets show that the proposed methods are efficient in matching
large ontologies [6].
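
    The following sketch shows one possible reading of a positive reduction
anchor. It assumes that once concept a is matched to concept b, the
descendants of a need not be compared with the ancestors of b, and vice versa.
This rule, the tree representation (a child -> parent dictionary) and the
example ontologies are our assumptions for illustration; the exact definitions
are given in [6].

```python
def ancestors(node, parent):
    """All strict ancestors of node in a tree given as a child -> parent dict."""
    out = set()
    while node in parent:
        node = parent[node]
        out.add(node)
    return out


def descendants(node, parent):
    """All strict descendants of node (linear scan, sufficient for a sketch)."""
    return {n for n in parent if node in ancestors(n, parent)}


def skipped_by_positive_anchor(a, b, parent_a, parent_b):
    """Pairs whose similarity need not be computed once (a, b) is a match."""
    skip = {(x, y)
            for x in descendants(a, parent_a)
            for y in ancestors(b, parent_b)}
    skip |= {(x, y)
             for x in ancestors(a, parent_a)
             for y in descendants(b, parent_b)}
    return skip


# Hypothetical hierarchies: Dog < Mammal < Animal and Canine < Mammalia < Organism.
parent_a = {"Mammal": "Animal", "Dog": "Mammal"}
parent_b = {"Mammalia": "Organism", "Canine": "Mammalia"}
skip = skipped_by_positive_anchor("Mammal", "Mammalia", parent_a, parent_b)
```

With the anchor (Mammal, Mammalia), the pairs (Dog, Organism) and
(Animal, Canine) are skipped, so 2 of the 9 possible comparisons are avoided
in this small example.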



Ontology mapping debugging Lily utilizes a technique named ontology mapping
debugging to improve the alignment results [7]. Unlike existing methods, which
focus on finding efficient and effective solutions to ontology mapping
problems, mapping debugging focuses on analyzing the mapping results to detect
or diagnose mapping defects. During debugging, some types of mapping errors,
such as redundant and inconsistent mappings, can be detected. Some warnings,
including imprecise or abnormal mappings, are also identified by analyzing the
features of the mapping results. More importantly, some errors and warnings
can be repaired automatically or presented to users with revision suggestions.
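
    As an illustration of a static check on mapping results, the sketch below
flags pairs of mappings whose subclass order is reversed between the two
ontologies. This is only one plausible kind of inconsistency chosen by us; it
does not reproduce the defect types or the algorithms of [7], and the
ontologies and mappings are hypothetical.

```python
def is_ancestor(x, y, parent):
    """True if x is a strict ancestor of y in a child -> parent dict."""
    while y in parent:
        y = parent[y]
        if y == x:
            return True
    return False


def inverted_pairs(mappings, parent_a, parent_b):
    """Return pairs of mappings whose hierarchy order is reversed."""
    defects = []
    for i, (a1, b1) in enumerate(mappings):
        for a2, b2 in mappings[i + 1:]:
            forward = (is_ancestor(a1, a2, parent_a)
                       and is_ancestor(b2, b1, parent_b))
            backward = (is_ancestor(a2, a1, parent_a)
                        and is_ancestor(b1, b2, parent_b))
            if forward or backward:
                defects.append(((a1, b1), (a2, b2)))
    return defects


# Hypothetical input: Student is below Person, but Learner is below Human.
parent_a = {"Student": "Person"}
parent_b = {"Learner": "Human"}
mappings = [("Person", "Learner"), ("Student", "Human")]
defects = inverted_pairs(mappings, parent_a, parent_b)
```

A flagged pair could then be repaired automatically, for example by dropping
the mapping with the lower confidence, or shown to the user as a warning.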



Ontology matching tuning Lily adopted ontology matching tuning this year.
By performing parameter optimization on training datasets [9], Lily is able to
determine the best parameters for similar tasks, and these parameters are
stored. For a real matching task, Lily performs statistical calculations on the
new ontologies to acquire features that help it find the most suitable
configuration, based on the previous training data. In this way, the overall
performance can be improved.
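
    The lookup of a stored configuration can be sketched as a nearest-neighbour
search over task features. The feature names, scales, stored records and
parameter names below are hypothetical and serve only to illustrate the idea;
they are not the features or parameters used by Lily.

```python
import math

# Hypothetical records: (task features, best parameters found during training).
TRAINED = [
    ({"concepts": 100, "label_ratio": 0.9},
     {"threshold": 0.6, "use_propagation": False}),
    ({"concepts": 3000, "label_ratio": 0.2},
     {"threshold": 0.4, "use_propagation": True}),
]


def distance(f1, f2, scale):
    """Euclidean distance after dividing each feature by a rough scale."""
    return math.sqrt(sum(((f1[k] - f2[k]) / scale[k]) ** 2 for k in scale))


def pick_config(features, trained=TRAINED, scale=None):
    """Return the parameters of the stored task closest to the new one."""
    scale = scale or {"concepts": 1000.0, "label_ratio": 1.0}
    _, params = min(trained, key=lambda rec: distance(features, rec[0], scale))
    return params


# A new, fairly large task with few natural-language labels.
config = pick_config({"concepts": 2500, "label_ratio": 0.3})
```

For this new task the second stored record is the closest one, so its
parameters are reused.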
    Currently, ontology matching tuning is not fully automatic. It is difficult
to find typical statistical parameters that distinguish each task from the
others. Meanwhile, learning from test datasets can be very time-consuming. Our
experiment is just a beginning.


1.3     Adaptations made for the evaluation


For the benchmark, anatomy and conference tasks, Lily is fully automatic, which
means Lily can be invoked directly from the SEALS client. It also determines
which strategy to use and the corresponding parameters.



1.4     Link to the system and parameters file


The SEALS wrapped version of Lily for OAEI 2016 is available at
https://drive.google.com/folderview?id=0B5j4YFThSEQkRXdUVUg5eHRFSUE&usp=sharing.



1.5     Link to the set of provided alignments


The set of provided alignments, as well as the overall performance, is available
on the page of each track on the OAEI 2016 official website,
http://oaei.ontologymatching.org/2016/.



2      Results


2.1     Benchmark track


There are two datasets of different sizes: biblio and film. The biblio dataset
concerns bibliographic references and is freely inspired by BibTeX. The film
dataset contains a movie ontology in English and French. Notably, the film
dataset was not known to the participants when they submitted their systems; it
was actually generated afterwards. The biblio dataset is matched using Generic
Ontology Matching, because the ontology size is generally small. For the film
dataset, Lily automatically chooses the matching methods and strategy.
   There are five test suites in each dataset, and each test suite has 94
matching tasks. The overall results of one test suite are represented by the
mean values of Precision, Recall and F-Measure. The test suites were generated
from the same seed ontologies, so they are comparable. Thus, the harmonic mean
(H-mean) values over all test suites are used to evaluate how well Lily worked.
      The detailed results are shown in Table 1.

                Table 1. The performance in the Benchmark track

                       Test suite  Precision  Recall  F-Measure
                       biblio-r1     0.97      0.84     0.90
                       biblio-r2     0.96      0.83     0.89
                       biblio-r3     0.97      0.84     0.90
                       biblio-r4     0.97      0.83     0.89
                       biblio-r5     0.97      0.83     0.89
                       H-mean        0.97      0.83     0.89
                       film-r1       0.97      0.69     0.80
                       film-r2       0.97      0.69     0.80
                       film-r3       0.97      0.70     0.81
                       film-r4       0.97      0.70     0.81
                       film-r5       0.97      0.70     0.81
                       H-mean        0.97      0.70     0.81



   As Table 1 shows, Lily handles the Benchmark datasets well. According to
the Benchmark results of OAEI 2016¹, Lily has the highest overall F-Measure
among all matching systems.
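
    For reference, the F-Measure reported in the tables is the harmonic mean of
Precision and Recall. The short sketch below recomputes it from the rounded
values of the first row of Table 1; the function name is ours.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# biblio-r1 in Table 1: Precision 0.97, Recall 0.84.
biblio_r1 = round(f_measure(0.97, 0.84), 2)  # 0.9
```

Because the tables list rounded values, recomputing the F-Measure from them
may differ from the reported figure in the last digit for some rows.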


2.2    Anatomy track

The anatomy matching task consists of matching two real large-scale biological
ontologies. Table 2 shows the performance of Lily in the Anatomy track on a
server with one 3.46 GHz, 6-core CPU and 8GB RAM allocated. Runtime is given in
seconds (s).


                 Table 2. The performance in the Anatomy track

                   Matcher Runtime Precision Recall F-Measure
                    Lily    272s     0.87     0.79    0.83



    Compared with the result in OAEI 2011 [8], Precision, Recall and F-Measure
improve slightly, from 0.80, 0.72 and 0.76 to 0.87, 0.79 and 0.83,
respectively. One main reason for the improvement is that we found the names
of classes not to be semantically useful; they confused Lily when the
similarity matrix was calculated. After the names were excluded, better
alignments were generated. In addition, the time consumption dropped
significantly, from 563s to 272s. This is due not only to a faster CPU, but
also to further optimizations, such as parallelization, applied to the
algorithms in Lily.
    However, as can be seen in the overall results, Lily ranks in the middle of
the field, which indicates that further progress is still possible.
Additionally, some key algorithms have not yet been successfully parallelized.
Once that is done, the time consumption is expected to be further reduced.

¹ http://oaei.ontologymatching.org/2016/results/benchmarks/index.html


2.3   Conference track

In this track, there are 7 independent ontologies that can be matched with one
another, giving 21 subtasks that are evaluated against given reference
alignments. Because the ontologies are heterogeneous, it is a challenge to
generate high-quality alignments for all ontology pairs in this track.
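
    The number of subtasks follows from pairing the 7 ontologies: there are
7 * 6 / 2 = 21 unordered pairs. The sketch below enumerates them; the ontology
names are taken from the test case IDs in Table 3, and the variable names are
ours.

```python
from itertools import combinations

ONTOLOGIES = ["cmt", "conference", "confof", "edas", "ekaw", "iasted", "sigkdd"]

# Every unordered pair of distinct ontologies is one matching subtask.
tasks = [f"{a}-{b}" for a, b in combinations(ONTOLOGIES, 2)]
```

The resulting list has 21 entries, from `cmt-conference` to `iasted-sigkdd`.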
    Lily adopted ontology matching tuning for the Conference track this year.
Table 3 shows its latest performance.


                 Table 3. The performance in the Conference track

                    Test Case ID       Precision  Recall  F-Measure
                    cmt-conference       0.53      0.60     0.56
                    cmt-confof           0.80      0.25     0.38
                    cmt-edas             0.64      0.54     0.58
                    cmt-ekaw             0.55      0.55     0.55
                    cmt-iasted           0.57      1.00     0.73
                    cmt-sigkdd           0.70      0.58     0.64
                    conference-confof    0.67      0.53     0.59
                    conference-edas      0.41      0.41     0.41
                    conference-ekaw      0.62      0.64     0.63
                    conference-iasted    0.67      0.43     0.52
                    conference-sigkdd    0.71      0.67     0.69
                    confof-edas          0.69      0.47     0.56
                    confof-ekaw          0.79      0.75     0.77
                    confof-iasted        0.46      0.67     0.55
                    confof-sigkdd        0.17      0.14     0.15
                    edas-ekaw            0.67      0.52     0.59
                    edas-iasted          0.50      0.37     0.42
                    edas-sigkdd          0.63      0.33     0.43
                    ekaw-iasted          0.50      0.80     0.62
                    ekaw-sigkdd          0.50      0.46     0.48
                    iasted-sigkdd        0.56      0.67     0.61
                    Average              0.59      0.53     0.56



    Compared with the result in OAEI 2011 [8], there is a significant improvement
of mean Precision, Recall and F-Measure, from 0.36, 0.47 and 0.41 to 0.59, 0.53
and 0.56, respectively. In addition, all the tasks currently share the same
configuration, so it should be possible to generate better alignments by
assigning the most suitable parameters to each task. We will continue to
enhance this feature.

3   General comments
On the whole, Lily is a comprehensive ontology matching system that can handle
multiple types of ontology matching tasks, and its results are generally
competitive. The performance of Lily is similar to the results of 2015 [10].
However, Lily still lacks strategies for some newly developed matching tasks.
Its relatively high time and memory consumption also prevents Lily from
finishing some challenging tasks.

4   Conclusion
In this paper, we briefly introduced our ontology matching system Lily. The
matching process and the special techniques used by Lily were presented, and
the alignment results were carefully analyzed.
   There is still much to do to make further progress. Lily needs more
optimization to handle large ontologies with limited time and memory, so
techniques like parallelization will be applied more widely. Also, we have only
just started to try out ontology matching tuning. With further research on it,
Lily will not only produce better alignments for the tracks it was intended
for, but also be able to participate in the interactive track.

References
[1] Peng Wang, Baowen Xu: Lily: ontology alignment results for OAEI 2009. In The
  4th International Workshop on Ontology Matching, Washington DC, USA (2009)
[2] Peng Wang, Baowen Xu: Lily: Ontology Alignment Results for OAEI 2008. In The
  Third International Workshop on Ontology Matching, Karlsruhe, Germany (2008)
[3] Peng Wang, Baowen Xu: LILY: the results for the ontology alignment contest OAEI
  2007. In The Second International Workshop on Ontology Matching (OM2007), Bu-
  san, Korea (2007)
[4] Peng Wang, Baowen Xu, Yuming Zhou: Extracting Semantic Subgraphs to Capture
  the Real Meanings of Ontology Elements. Journal of Tsinghua Science and Technol-
  ogy, vol. 15(6), pp. 724-733 (2010)
[5] Peng Wang, Baowen Xu: An Effective Similarity Propagation Model for Matching
  Ontologies without Sufficient or Regular Linguistic Information. In The 4th Asian
  Semantic Web Conference (ASWC2009), Shanghai, China (2009)
[6] Peng Wang, Yuming Zhou, Baowen Xu: Matching Large Ontologies Based on Re-
  duction Anchors. In The Twenty-Second International Joint Conference on Artificial
  Intelligence (IJCAI 2011), Barcelona, Catalonia, Spain (2011)
[7] Peng Wang, Baowen Xu: Debugging Ontology Mapping: A Static Approach.
  Computing and Informatics, vol. 27(1), pp. 21-36 (2008)
[8] Peng Wang: Lily results on SEALS platform for OAEI 2011. Proc. of 6th Interna-
  tional Workshop on Ontology Matching, pp. 156-162 (2011)
[9] Pan Yang, Peng Wang, Li Ji, Xingyu Chen, Kai Huang, Bin Yu: Ontology Matching
  Tuning Based on Particle Swarm Optimization: Preliminary Results. In The Semantic
  Web and Web Science, pp. 146-155 (2014)
[10] Wenyu Wang, Peng Wang: Lily results for OAEI 2015. Proc. of 10th International
  Workshop on Ontology Matching (2015)