=Paper=
{{Paper
|id=Vol-2277/paper42
|storemode=property
|title=
Combining Lexical and Semantic Similarity Measures with Machine Learning Approach for Ontology Matching Problem

|pdfUrl=https://ceur-ws.org/Vol-2277/paper42.pdf
|volume=Vol-2277
|authors=Lev Bulygin
|dblpUrl=https://dblp.org/rec/conf/rcdl/Bulygin18
}}
==
Combining Lexical and Semantic Similarity Measures with Machine Learning Approach for Ontology Matching Problem
==
<pdf width="1500px">https://ceur-ws.org/Vol-2277/paper42.pdf</pdf>
<pre>
   Combining Lexical and Semantic Similarity Measures with
    Machine Learning Approach for Ontology and Schema
                     Matching Problem
                                              © Lev Bulygin
                                    Lomonosov Moscow State University,
                                             Moscow, Russia
                                          buliginleo@yandex.ru
           Abstract. Ontology and schema matching is one of the most important tasks for data integration. We
     suggest to combine the different matchers with machine learning approach. The features are the outputs of
     lexical and semantic similarity functions. Naive Bayesian classifier, logistic regression and gradient tree
     boosting have been trained on these features. The proposed approach is tested on the OAEI 2017 benchmark
     “conference” with the various splits of the data on train and test sets. Experiments show that final combined
     model in element-level matching outperformed the single matchers. Results are compared with EditDistance
     matcher and WordNet matcher.
           Keywords: ontology matching, element-level matcher, lexical matcher, semantic matcher, word2vec,
     word2net.

                                                                  approach on OAEI 2017 “conference” ontologies with
 1 Introduction                                                   single matchers: edit distance and WordNet.
                                                                      In Section 2 we reviewed related work and it's
 Data integration is now a very important problem, as the         evolution. Section 3 describes the setup of the paper. In
 amount of data is growing very much. One of the most             Section 4 presents our matching algorithm in detail.
 important steps in automatic data integration is the             Section 5 shows the experiments and the analysis. We
 comparison of ontologies or schemas. This problem is             conclude this paper in Section 6.
 traditionally solved manually or semi-automatically.
 Manual ontology or schema matching is very labor-
                                                                  2 Related Work
 intensive, long in time and prone to errors. Therefore
 now the task is to maximally automate the process of             Schema matching and ontology matching are very
 comparing the circuit with the greatest accuracy.                similar. In this article, there is no difference for schema
     For schema matching and ontology matching                    matching and ontology matching because we only work
 element-level and structure-level matchers are used.             with the names of the entities. So we considered the
 Element-level matchers use information of elements of            works about schema matching and ontology matching
 schema (name of columns, description). Structure-level           together. In [1] a review of approaches for automatic
 matchers use information of structure (type similarity,          schema matching are conducted. They introduced the
 key properties, hierarchy of elements).                          classification of matchers and concluded that it is
     Recently, element-level matchers have been widely            impossible to create a fully automatic matching system,
 used lexical and semantic information. For getting               universal for all subject areas. One of the most important
 semantic information WordNet and Word2vec are used.              conclusions is that hybrid matching system is better than
 Word2vec allow us to get a meaning of the word and we            single system. Intuitively, this is obvious, since a hybrid
 can use this for improving ontology and schema                   matcher uses more information to make a decision.
 matching. So we can combine many lexical and semantic                In [2] the ready-made solutions for automatic schema
 information into one similarity matrix and train a               matching are compared and the concept of pre-match
 machine learning model for trying to solve ontology and          effort and post-match effort introduced. Pre-match effort
 schema matching problems.                                        is learning of the matcher, configuration of thresholds
     This work is performed as Master thesis. The main            and weights. Post-match effort is correction and
 goal of this work is to propose a approach for solving           improvement of the match output. In the same year, in
 ontology and schema matching problem with the greatest           [3] the authors described their COMA system. The
 accuracy. To achieve this goal we need to perform the            authors introduced new methods of post-match effort:
 following tasks: review related work, create the                 reuse of matchings, aggregation of individual matchers.
 architecture of solution, select of information for              In [4] the Target-based Integration Query System (TIQS)
 matching and create experiments. We compared our                 are described. In [5] a new classification on matching
                                                                  dimensions are presented. A natural issue is uncertainty,
                                                                  In [6] uncertainty are considered, monotonicity and
Proceedings of the XX International Conference                    statistical monotonicity are introduced . All the above
“Data Analytics and Management in Data Intensive                  articles don't use machine learning for aggregation
Domains” (DAMDID/RCDL’2018), Moscow, Russia,
October 9-12, 2018


                                                            245
results of matchers.                                                 3 Problem Statement, Evaluation Metrics
    In [7] Bayesian Networks are used for combining                  and Similarity Measures
ontology matching. As the features lexical information
(N-gram, Levenshtein, Dice Coefficient etc) were used.               3.1 Problem Statement
The best accuracy is 88%.
    In [8] the outputs from several single matchers are              An ontology O is by a 3-tuple (C, P, I). C is the classes,
used and the authors trained on this data a multi-layer              denoting the concepts. P is the relations within the
neural network.The authors used lexical information and              ontology. I is the instances of classes. The task of
the WordNet similarity.                                              ontology matching is to find the alignment between
    In [9] ontology matching problem in detail are                   entities in a source ontology 𝑂𝑂1 and a target ontology 𝑂𝑂2 .
described: applications, classifications of ontology                      An alignment is a set {(𝑒𝑒1 , 𝑒𝑒2 , 𝑐𝑐𝑐𝑐𝑐𝑐)|𝑒𝑒1 ∈ 𝑂𝑂1 , 𝑒𝑒2 ∈
matching techniques, evaluations and alignment. They                 𝑂𝑂2 }, where 𝑒𝑒1 is an entity in 𝑂𝑂1 , 𝑒𝑒2 is an entity in 𝑂𝑂2 , and
suggest three dimensions of building correspondences:                𝑐𝑐𝑐𝑐𝑐𝑐 is the confidence of the correspondence.
conceptual, extensional and semantic.                                     Our algorithm works only with names of entities
    In [10] Multiple Concept Similarity for ontology                 without the structure information. The input of our
mapping are described. The author reduced the ontology               algorithm is the two ontologies: source and target. The
mapping problem to the machine learning problem. They                output of our algorithm is the alignment.
used lexical information (prefix, suffix, Edit distance, n-
gram), semantic information (WordNet) and special                    3.2 Evaluation Metrics
types of similarity called Word List similarity (a                     The effectiveness of ontology matching could
similarity for sentences).                                           measured by precision, recall and F-measure.
    In [11] a machine-learning approach to ontology                                   𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 =
                                                                                                           |𝐴𝐴∩𝐵𝐵|
                                                                                                                   , (1)
alignment problem are described. The authors used                                                                 |𝐴𝐴|
                                                                                                             |𝐴𝐴∩𝐵𝐵|
lexical information, semantic information (WordNet)                                         𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟 =           ,                                           (2)
                                                                                                              |𝐵𝐵|
and structural information for training Support Vector                                                                   2⋅𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝⋅𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟
Machine, K-Nearest Neighbours, Decision Tree,                                             𝐹𝐹 − 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚 =                                              ,   (3)
                                                                                                                         𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝+𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟
AdaBoost models. The authors improves F-measure
criterion up to 99%.                                                     where A is a predict alignment and B is a true
    In [12] Bayeasian Networks are used for composition              alignment.
of matchers. The resulting model achieved 81%
accuracy. The authors used the outputs of the lexical                3.3 Used Similarity Measures
matchers, structure-level matchers, synonym matchers                 As features for machine learning we used the similarity
and instance-level matchers.                                         measures. We extract from ontology the names of
    In [13] all stages of ontology matching problem in               entities. Further we need to create sets of words from
detail are described: feature selection, methods of                  names for computing some similarity measures.
combining matchers and experiments.                                      Example. We have two names of entities:
    In [14] word2vec embeddings for ontology matching                “early_paid_applicant”        and       “early-registered-
problem are used. Authors proposed the new algorithm of              participant”. The sets are [“early”, “paid”, “applicant”]
matching and sentence2vec algorithm. The results matcher             and [“early”, “registered”, “participant”].
are compared with various types of WordNet, EditDistance
matchers. In [15] the combination of word2vec features and               Metrics based on lexical information from names of
a neural network are used. One of the most important                 entities:
problems is that articles using machine learning can not be          • N-gram [11]. Let ngram(S, N) be the set of
compared with each other because of the lack of a unified                 substrings of string S of length N. The n-gram
benchmark. The OAEI is not intended to use machine                        similarity for two strings S and T:
learning, so the training dataset is chosen by the author of                                             |𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛(𝑆𝑆,𝑁𝑁)∩𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛(𝑇𝑇,𝑁𝑁)|
the article.                                                                      𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛(𝑆𝑆, 𝑇𝑇) =                                                           (4)
                                                                                                                𝑚𝑚𝑚𝑚𝑚𝑚(|𝑆𝑆|,|𝑇𝑇|)−𝑁𝑁+1
    At the moment, the most promising option is the                  •    Dice coefficient [11]. The Dice similarity score is
using of machine learning to create a hybrid model with                   defined as twice the shared information
lexical, semantic and structure information. In this                      (intersection) divided by sum of cardinalities. For
article, we want to connect many different single                         two sets X and Y, the Dice similarity score is:
matchers to one hybrid matcher using machine learning.                                                                        2∗|𝑋𝑋∩𝑌𝑌|
One of the newest features we used is Word2vec. So far                                           𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑(𝑋𝑋, 𝑌𝑌) =                                              (5)
                                                                                                                              |𝑋𝑋|+|𝑌𝑌|
we only used the single element-level matchers.                      •    Jaccard similarity [11]. For two sets X and Y, the
    All reviewed papers use widely the lexical                            Jaccard similarity score is:
information and WordNet distances. In this paper we                                                                                 |𝑋𝑋∩𝑌𝑌|
used 29 similarity metrics from 17 various single                                               𝑗𝑗𝑗𝑗𝑗𝑗𝑗𝑗𝑗𝑗𝑗𝑗𝑗𝑗(𝑋𝑋, 𝑌𝑌) =                                         (6)
                                                                                                                                    |𝑋𝑋∪𝑌𝑌|
element-level matchers.                                              •    Jaro measure [11]. The Jaro measure is a type of edit
                                                                          distance, developed mainly to compare short strings,
                                                                          such as first and last names.


                                                               246
•    Substring similarity [11]. For longest substring T of                                •   Ratio. For two strings X, Y:
     two strings X and Y substring similarity is:                                                                                         𝑀𝑀
                                                                                                         𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅 = 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟(2.0 ⋅ ⋅ 100), (10)
                                                            2∗|𝑇𝑇|                                                                    𝑇𝑇
                            𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠(𝑋𝑋, 𝑌𝑌) =                      (7)          • where T is the total number of characters in both
                                                           |𝑋𝑋|+|𝑌𝑌|
•    Monge-Elkan [11]. Token-based and internal                                              strings, and M is the number of matches in the two
     similarity function for tokens:                                                         strings X and Y.
                                |𝑥𝑥|                                                      • Soundex [6]. Phonetic measure such as soundex
                            1
    𝑠𝑠𝑠𝑠𝑚𝑚𝑀𝑀𝑀𝑀 (𝑥𝑥, 𝑦𝑦) =        ∑ 𝑚𝑚𝑚𝑚𝑥𝑥𝑗𝑗=1,|𝑦𝑦| 𝑠𝑠𝑠𝑠𝑚𝑚′ (𝑥𝑥[𝑖𝑖], 𝑦𝑦[𝑗𝑗])                  match string based on their sound. These measures
                            |𝑥𝑥| 𝑖𝑖=1
                                                                                             have been especially effective in matching names,
(8)                                                                                          since names are often spelled in different ways that
• Smith-Waterman [11]. The Smith-Waterman                                                    sound the same.
    algorithm performs local sequence alignment; that
    is, for determining similar regions between two                                       •   Token Sort. Fuzzy Wuzzy token sort ratio raw
    strings.                                                                                  raw_score is a measure of the strings similarity as an
                                                                                              int in the range [0, 100]. For two strings X and Y,
•    Needleman-Wunsch [11]. The Needleman-Wunsch                                              the score is obtained by splitting the two strings into
     distance generalizes the Levenshtein distance and                                        tokens and then sorting the tokens. The score is then
     considers global alignment between two strings.                                          the fuzzy wuzzy ratio raw score of the transformed
     Specifically, it is computed by assigning a score to                                     strings.
     each alignment between the two input strings and
     choosing the score of the best alignment, that is, the                               •   Tversky Index. For sets X and Y:
                                                                                                                                          |𝑋𝑋∩𝑌𝑌|
     maximal score.                                                                                           𝑇𝑇𝑇𝑇(𝑋𝑋, 𝑌𝑌) =                                 ,   (11)
                                                                                                                               |𝑋𝑋∩𝑌𝑌|+𝛼𝛼|𝑋𝑋−𝑌𝑌|+𝜂𝜂|𝑌𝑌−𝑋𝑋|
•    Affine gap [17]. Returns the affine gap score                                        •   where 𝛼𝛼, 𝜂𝜂 > 0.
     between two strings. The affine gap measure is an                                    •   Overlap coefficient. For sets X and Y:
     extension of the Needleman-Wunsch measure that                                                                                     |𝑋𝑋∩𝑌𝑌|
     handles the longer gaps more gracefully.                                                                    𝑂𝑂𝑂𝑂(𝑋𝑋, 𝑌𝑌) =                                  (12)
                                                                                                                                    𝑚𝑚𝑚𝑚𝑚𝑚(|𝑋𝑋|,|𝑌𝑌|)
•    Bag distance. The number of common symbols in
     two strings.                                                                             Metrics based on semantic information from names
•    Cosine similarity [11]. For two sets X and Y:                                        of entities:
                                                             |𝑋𝑋∩𝑌𝑌|
                                                                                          • WordNet similarity. Best score of similarity between
                                   𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐(𝑋𝑋, 𝑌𝑌) =                     (9)              synonyms of one word with synonyms of other
                                                            �|𝑋𝑋|⋅|𝑌𝑌|
•    Partial Ratio. Given two strings X and Y, let the                                         word. We use a sentence similarity algorithm from
     shorter string (X) be of length m. It finds the fuzzy                                     [14].
     wuzzy ratio similarity measure between the shorter                                   •   Word2vec and Sentence2vec similarity. Word2vec
     string and every substring of length m of the longer                                     is a model that are used to produce word
     string, and returns the maximum of those similarity                                      embeddings. We used word2vec model trained by
     measures.                                                                                the texts of Google News. The dimensionality of
•    Soft TFIDF [21]. A variant of TF-IDF that considers                                      these vectors is 300. We used sentence2vec
     words equal based on Jaro Winkler rather than exact                                      algorithm from [14].
     match.
•    Editex. Editex is a phonetic distance measure that                                   4 Proposed Matching Algorithm
     combines the properties of edit distances with the                                       The input of the matching system is the two
     letter-grouping strategy used by Soundex and                                         ontologies. For each ontology the system extracts the
     Phonix.                                                                              names of the entities. Further it generate all possible pairs
•    Generalized Jaccard. This similarity measure is                                      of the names of the ontology with the names of the other
     softened version of the Jaccard measure. The                                         ontology. For all pairs system computes the all similarity
     Jaccard measure is promising candidate for tokens                                    measures. All outputs of the similarity measures are
     which exactly match across the sets.                                                 concatenated into the one similarity matrix. The
                                                                                          computed features are used as input to a machine
•    JaroWinkler. The Jaro-Winkler measure is designed
                                                                                          learning model.
     to capture cases where two strings have a low Jaro
     score, but share a prefix and thus are likely to match.                              Program 1 Element-level matching algorithm
•    Levenshtein distance [11]. Levenshtein distance                                      Input: Ontology1, Ontology2 - input ontologies
     computes the minimum cost of transforming one                                        or schemas, THRESHOLD - threshold for create
                                                                                          matching between elements
     string into the other.                                                               Output: alignment - output alignment for
•    Partial Token Sort. For two strings X and Y, the                                     Ontology1 and Ontology2
     score is obtained by splitting the two strings into                                  for Entity1 ∈ Ontology1 do
                                                                                            for Entity2 ∈ Ontology2 do
     tokens and then sorting the tokens. The score is then                                    Name1 ← get_name(Entity1)
     the fuzzy wuzzy partial ratio raw score of the                                           Name2 ← get_name(Entity2)
     transformed strings.                                                                     Features ← get_features(Name1, Name2)


                                                                                    247
    Match ← predict_match(Features)                                Table 1 Our split of dataset “conference” from OAEI
    if Match > THRESHOLD then                                      2017 benchmark
      alignment.append((Name1,Name2))
    end if
  end for                                                                    Train pairs                     Test pairs
end for
return alignment                                                                                     Conference, Ekaw
                                                                          Iasted, Sigkdd
                                                                                                       Ekaw, Sigkdd
Program 2 Creating dataset and training a machine                          Cmt, Confof
                                                                                                        Cmt, Sigkdd
learning model                                                             Confof, Edas
                                                                                                         Cmt, Ekaw
                                                                           Edas, Iasted
Input: TrueAlignments - set of true alignments,                                                         Ekaw, Iasted
TrainAlignments - set of train pairs                                      Confof, Ekaw
                                                                                                        Edas, Ekaw
                                                                            Cmt, Iasted
Output: Model - trained model for predict                                                            Conference, Iasted
                                                                           Edas, Sigkdd
matching                                                                                             Conference, Sigkdd
                                                                         Confof, Sigkdd
                                                                                                      Conference, Edas
for Ontology1, Ontology2 in TrueAlignments do                           Conference, Confof
                                                                                                       Confof, Iasted
  for Entity1 ∈ Ontology1 do                                                Cmt, Edas
    for Entity2 ∈ Ontology2 do
                                                                                                      Cmt, Conference
      if (Entity1, Entity2) ∈ TrainAlignments
      then
        add_to_train(Entity1, Entity2, 1)                          5.2 Experiments
      else
        add_to_train(Entity1, Entity2, 0)                          As a baseline solution we take EditDistance and
      end if                                                       WordNet matchers. For every matcher we select
    end for
  end for
                                                                   threshold with the best F-measure on the test dataset. As
end for                                                            the machine learning models we chose Naive Bayes
Model.train()                                                      Classifier 27, Logistic regression 28 as the simple models
return Model                                                       and Gradient boosting 29 as the complex model.
    For training a machine learning model we need the
                                                                   Table 2 The results on test dataset
ontologies and the alignments. We split the alignments
on the training alignments and testing alignments.                                    Precision,                       F-
Further we train a machine learning model on the                        Method                     Recall, %
                                                                                          %                        measure, %
features from train alignments. Then we can evaluate F-
measure on testing alignment.                                         EditDistance         50         35.8            41.7

5 Preliminaries and Results                                             WordNet            9.1        38.2            14.7
                                                                      Naive Bayes
5.1 Collection of Data                                                                     2.13      80.08            4.15
                                                                       Classifier
    The experiments were conducted at the dataset                       Logistic
                                                                                        75.58        40.12            52.41
“conference” from OAEI 2017. The dataset consists 16                   regression
ontologies from the same domain (conference
organisation) and 21 true ontology matchings. This                     XGBoost          69.15        45.67            55.01
dataset is convenient for machine learning because all
ontologies are from the same domain and this dataset has
the several true matchings between ontologies.                         We can see on Table 2 that XGBoost is the model
    We get entities from ontologies with parsing code on           with best F-measure 55.01%. The worst model is Naive
Python and get array of tuples: (entity1, entity2, match),         Bayes Classifiers with F-measure 4.15%. EditDistance is
where entity1, entity2 - string names of entities, match -         better than WordNet by 27%. For getting more stable
bool variable, 1 means that the entities are matched, 0            results we splitted the datasets randomly on training and
means that the entities are not matched. We generated 29           testing pairs 20 times and averaged the results. The
similarity measures from 3.3 with Python packages:                 results are show on Table 3. We can see that XGBoost
py_stringmatching 24, fuzzycomp 25, gensim 26.                     are remained the best model.
    The final dataset is very unbalanced: 391 572
negative samples and 305 positive samples. After
generating features we split data randomly on train and
test datasets in equal proportions. In Table 1 are
described our split.

24
   https://github.com/kvpradap/py_stringmatching                   .GaussianNB.html
25
   https://github.com/fuzzycode/fuzzycomp                          28
                                                                      http://scikit-
26
   https://github.com/RaRe-Technologies/gensim                     learn.org/stable/modules/generated/sklearn.linear_mode
27
   http://scikit-                                                  l.LogisticRegression.html
learn.org/stable/modules/generated/sklearn.naive_bayes             29
                                                                      https://github.com/dmlc/xgboost


                                                             248
Table 3 The averaged results on 20 various random                     Semantics IV, pp. 146-171 (2005). doi:
splits on train and test datasets                                     10.1007/11603412_5
                                                                 [6] Gal, A.: Why is Schema Matching Tough and
                  Precision,                    F-
       Method                  Recall, %                              What Can We Do About It? In: ACM SIGMOD
                      %                     measure, %
                                                                      Record, vol. 35, issue 4, pp. 2-5 (2006). doi:
  EditDistance      47.31         39.2         42.82                  10.1145/1228268.1228269
                                                                 [7] Svab, O., Svatek, V.: Combining Ontology
       WordNet      11.87        41.69         18.35                  Mapping Methods Using Bayesian Networks. In:
                                                                      OM'06 Proceedings of the 1st International
  Naive Bayes                                                         Conference on Ontology Matching, vol. 225, pp.
                     2.63        81.49          5.10
   Classifier                                                         206-210.
    Logistic
                    74.98        43.43         54.95             [8] Hariri, B., Abolhassani, H., Sayyadi, H.: A
   regression
                                                                      Neural-Networks-Based Approach for Ontology
       XGBoost      72.72        47.51         57.29                  Alignment. (2006).
                                                                      https://www.jstage.jst.go.jp/article/softscis/2006/
                                                                      0/2006_0_1248/_article
                                                                 [9] Euzenat, J., Shvaiko, P.: Ontology Matching.
6 Conclusions                                                         Springer-Verlag Berlin Heidelberg, Berlin
In this paper, we combined the lexical and semantic                   (2007). doi: 10.1007/978-3-642-38721-0
similarity measures into one hybrid model. The                   [10] Ichise, R.: Machine Learning Approach for
experiments show that hybrid model is better than single              Ontology Mapping Using Multiple Concept
matchers.                                                             Similarity Measures. In: Seventh IEEE/ACIS
    In future work we want to run our solution on other               International Conference on Computer and
benchmarks. We must add the structure-level and                       Information Science, (2008). doi:
instance-level features because in our dataset there are              10.1109/ICIS.2008.51
examples in which the names of entities are exactly the          [11] Nezhadi, A., Shadgar, B., Osareh, A.: Ontology
same but they are not matched.                                        Alignment Using Machine Learning Techniques.
                                                                      In: International Journal of Computer Science &
Acknowledgments. This work is supervised by Sergey                    Information Technology, vol. 3, pp. 139-150
Stupnikov, Institute of Informatics Problems, Federal                 (2011). doi: 10.5121/ijcsit.2011.3210
Research Center “Computer Science and Control“ of the            [12] Nikovsi, D., Esenther, A., Ye, X., Shiba, M.,
Russian Academy of Sciences”.                                         Takayama, S.: Bayesian Networks for Matcher
                                                                      Composition in Automatic Schema Matching.
References                                                            In: Mitsubishi Electric Research Laboratories
                                                                      (2011). doi: 10.1.1.644.2168
 [1]    Rahm, E., Bernstein, A.: A survey of approaches
        to automatic schema matching. In: The                    [13] Ngo, D.: Enhancing Ontology Matching by
        International Journal on Very Large Data Bases,               Using Machine Learning, Graph Matching and
        vol. 10, issue 4, pp. 334-350 (2001). doi:                    Information Retrieval Techniques. In: University
        10.1007/s007780100057                                         Montpellier II - Sciences et Techniques du
                                                                      Languedoc (2012). doi: 10.1.1.302.587
 [2]    Do, H., Melnik, S., Rahm, E.: Comparison of
        Schema Matching Evaluations. In: Revised                 [14] Zhang, Y., Wang, X., Lai, S., He, S., Liu, K.,
        Papers from the NODe 2002 Web and Database-                   Zhao, J., Lv, X.: Ontology Matching with Word
        Related Workshops on Web, Web-Services, and                   Embeddings. In: Chinese Computational
        Database Systems, pp. 221-237 (2002). doi:                    Linguistics and Natural Language Processing
        10.1007/3-540-36560-5_17                                      Based on Naturally Annotated Big Data, pp 34-
                                                                      45 (2014). doi: 10.1007/978-3-319-12277-9_4
 [3]    Do, H., Rahm, E.: COMA: a system for flexible
        combination of schema matching approaches. In:           [15] Jayawardana, V., Lakmal, D., Silva, N., Perera,
        VLDB '02 Proceedings of the 28th international                A., Sugathadasa, K., Ayesha, B.: Deriving a
        conference on Very Large Data Bases, pp. 610-                 Representative Vefctor for Ontology Classes
        621 (2002). doi: 10.1016/B978-155860869-                      with Instance Word Vector Embeddings. In:
        6/50060-3                                                     2017 Seventh International Conference on
                                                                      Innovative Computing Technology (INTECH),
 [4]    L. Xu, D. Embley.: Automating Schema
                                                                      (2017). doi: 10.1109/INTECH.2017.8102426
        Mapping for Data Integration. (2003).
        http://www.deg.byu.edu/papers/AutomatingSche
        maMatching.journal.pdf
 [5]    Shvaiko, P., Euzenat, J.: A Survey of Schema-
        Based Matching Approaches. In: Journal on Data


                                                           249

</pre>