Property-based Entity Type Graph Matching

    Fausto Giunchiglia[0000−0002−5903−6150] and Daqian Shi[0000−0003−2183−1957]

        Department of Information Engineering and Computer Science (DISI),
                            University of Trento, Italy
                 {fausto.giunchiglia, daqian.shi}@unitn.it


        Abstract. We are interested in dealing with the heterogeneity of Knowl-
        edge bases (KBs), e.g., ontologies and schemas, modeled as sets of entity
        types (etypes), e.g., person, where each etype is associated with a set of
        properties, e.g., age or height, via an inheritance hierarchy. A huge litera-
        ture exists on this topic. A common approach is to model KBs as graphs
        decorated with labels and reduce the problem of KB matching to that
        of matching these two elements, viz., labels and structure of the graph.
        However, labels of etypes are often misplaced, e.g., they are more general
        or specific than the correct etype, as defined by its properties. Structure-
        based matching may also lead to wrong conclusions as the properties
        assigned to an etype in an inheritance hierarchy do not depend on the
        order by which they are assigned and, therefore, on the specific structure
        of the graph. In this paper, we propose a novel etype graph matching
        approach, dealing with the two problems highlighted above, based on
        two key ideas. The first is to implement matching as a classification task
        where etypes are characterized by the associated properties. The second
        is to use two property-based etype similarity metrics, which model
        the roles that properties have in the definition of an etype. The
        experimental results show the effectiveness of the algorithm, in particular
        for etype graphs with a high number of properties. 1

        Keywords: Etype graph matching · Machine learning · Entity type sim-
        ilarity · Knowledge reuse


1     Introduction

We are interested in dealing with the heterogeneity of Knowledge bases (KBs),
e.g., ontologies and schemas, modeled as sets of entity types (etypes), e.g., per-
son, where each etype is associated with a set of properties, e.g., age or height, via
an inheritance hierarchy. A huge literature exists on this topic, e.g., [23, 24, 33].
Most etype graph matching approaches exploit label-based methods [6,36], such
as character similarity metrics and synonym analysis, and structure-based meth-
ods [18], implementing various forms of graph matching. However, labels of
etypes may suggest a wrong etype [19,34]. For example, an eagle can be labelled
1
    Copyright © 2021 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).
as Bird in a general-purpose ontology and Eagle in a domain-specific ontology.
Structure-based matching may also lead to wrong conclusions as the properties
assigned to an etype in an inheritance hierarchy are cumulative and depend only
on the nodes in the path from the root and, therefore, do not depend on the
order by which they are assigned. For example, the super-class of etype Eagle
can be Animal in one etype graph and Bird in another etype graph.
    As a solution to the above problems, the main intuition underlying the work
described in this paper is to match etypes on the basis of the properties which
are used to define them. It is, in fact, the properties used to intensionally
define an etype that characterize it, independently of its specific name and
of its position in the hierarchy [17]. Furthermore, in most relevant ontologies,
such as DBpedia [1] and OpenCyc [13], etypes are associated with a sufficient
number of properties. The reason is that the purpose of any data or knowledge
integration task is exactly that of extending the number of properties
associated with an etype.
    In this paper, we implement the above intuition through two main contributions:
    – We introduce two property-based etype similarity metrics, namely the hor-
      izontal similarity ESh and the vertical similarity ESv which characterise
      the role that properties have in the definition of given etypes. These simi-
      larity metrics capture the main idea that for any two etypes, the properties
      which distinguish one etype from the other should not occur in the other
      etype. Since different properties contribute differently to matching etypes,
      we introduce ESh, which weighs properties according to their shareability,
      and ESv, which weighs properties according to their specificity.
    – We implement the etype graph matching as a classification task where the
      matching of etypes is based on their associated properties. In this paper, we
      propose and evaluate a machine learning (ML)-based etype graph matching
      approach.
   The paper is organized as follows. Section 2 introduces our own specific for-
malization for etype graphs and relevant terminology. Section 3 presents two
property-based etype similarity metrics. Section 4 introduces the overall etype
graph matching algorithm. Section 5 reports the evaluation details and results,
where the experiments are based on selected test cases from the Ontology
Alignment Evaluation Initiative (OAEI) [11]. Finally, we present the related work
in Section 6 and the conclusions in Section 7.

2      Etype Graphs as FCA contexts
We formalize etype graphs as formal concept analysis (FCA) [17] contexts.
Specifically, we define an etype graph $ETG$ as $ETG = \langle E, P, T \rangle$,
with $E = \{e_1, \dots, e_n\}$ being the set of etypes of the etype graph,
$P = \{p_1, \dots, p_n\}$ being the set of properties, and
$T = \{\langle e, T(e) \rangle \mid e \in E\}$ being the set of correspondences
between etypes and properties, where the function $T(e)$ returns the properties
of $e$. We say that a property $p$ is used to describe an etype $e$ when $p$
belongs to the set $T(e)$. Two observations:

1. E is a set of etypes but not a set of entities. Similar to what happens in
   general FCA, which assumes that an entity is described by a set of property
   values, an etype is considered to be described by a set of properties T (e).
   Since our method focuses on the correlations between etypes and properties,
   we organize an etype graph as an etype-property correlation map, i.e., an
   FCA context that contains no additional information.
2. The characterization of an etype exploits not only the properties associated
   with it but also the others, namely those which are not used in its definition.
   Thus, we introduce the non-associated properties into our FCA context and
   distinguish two further cases, described below, to better present the FCA
   context.


                Fig. 1. An example of the hierarchy of an etype graph


As an example, Figure 1 presents the hierarchy of an etype graph, extracted from
DBpedia [1]. In each box, etypes are presented in yellow and their properties in
green. We formalize the etype graph of Figure 1 as an FCA context, as shown
below.


        Fig. 2. An example of formalizing etype graph into FCA contexts

In Figure 2 we adopt the following conventions. The value box with a circle
represents the fact the property is associated with the etype, e.g., citizenship
is associated with Person. The value box with a cross means the property is
not associated with the etype, e.g., date is not used to describe etype Person.
The value “UN” (undefined) represents the fact that the property is not associated
with the etype itself but is associated with at least one of its subclasses.
The intuition is that the property might or might not be used to describe the
current etype, e.g., academy award is used to describe Artist and it might be
used to describe Person since Artist is a subclass of Person. We encode these
three correlations as the parameter wp . Since the correlation of “associated with”
is positive for a property describing an etype, the correlation of “not associated
with” is negative and the correlation of “undefined” is neutral, we take wp to be
defined as wp ∈ {1, 0, −1}.
                     
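$$
w_p =
\begin{cases}
1, & \text{if } p \in prop(E) \\
0, & \text{if } p \notin prop(E) \text{ and } p \in prop(E.subclass) \\
-1, & \text{if } p \notin prop(E) \text{ and } p \notin prop(E.subclass)
\end{cases}
\tag{1}
$$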
                     

In the above equation, we take p as the target property and prop(E) as the
properties associated with E. Thus, the circles, UNs and crosses in Figure 2 are
set to 1, 0 and -1, respectively.
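
As an illustration of Equation 1, the encoding of an etype graph into an FCA context can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions and not the implementation used in the experiments: the function names, the dictionary-based representation and the toy etypes and properties are hypothetical, and E.subclass is read here as the set of all direct and indirect subclasses of E.

```python
from typing import Dict, List, Set


def descendants(etype: str, children: Dict[str, List[str]]) -> Set[str]:
    """Return all direct and indirect subclasses of an etype."""
    found: Set[str] = set()
    stack = list(children.get(etype, []))
    while stack:
        sub = stack.pop()
        if sub not in found:
            found.add(sub)
            stack.extend(children.get(sub, []))
    return found


def fca_context(props: Dict[str, Set[str]],
                children: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Map every (etype, property) pair to w_p in {1, 0, -1} as in Equation 1."""
    all_props = sorted(set().union(*props.values()))
    context: Dict[str, Dict[str, int]] = {}
    for etype, own in props.items():
        # Properties associated with at least one subclass of the etype.
        below: Set[str] = set()
        for sub in descendants(etype, children):
            below |= props.get(sub, set())
        context[etype] = {
            p: 1 if p in own else (0 if p in below else -1)
            for p in all_props
        }
    return context


# Toy data (hypothetical, loosely following the running example).
props = {
    "Person": {"name", "citizenship"},
    "Artist": {"name", "citizenship", "academy award"},
    "Place": {"name", "settlement"},
}
children = {"Person": ["Artist"]}
context = fca_context(props, children)
```

With this toy input, academy award is encoded as 0 (undefined) for Person, because it is associated only with its subclass Artist, while settlement is encoded as -1 for Person.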


3     Property-based similarity
The similarity metrics are inspired by the work in [16, 19], which considers
properties as one of the most important features to describe an etype, and by
the formalization of the “get-specific” heuristic provided in [20]. These provide
us with the intuition that a more specific property provides more information to
identify an etype. Let us introduce our two etype similarity metrics in detail.

3.1   Horizontal Similarity
When measuring the specificity of a property, a possible idea is to horizontally
compare the number of etypes that are described by a specific property, namely
the shareability of the property [19]. If a property is used for describing diverse
etypes, the property is not highly characterizing. For instance, in Figure 2, the
property name is used to describe Person, Place and Athlete. Dually, if a property
is used for describing only a few etypes, or is associated with only one etype,
it can be regarded as highly characterizing, e.g., in Figure 2, the property
settlement is specific to the etype Place. Based on this intuition, we consider
the specificity of a property to be related to its shareability. Therefore, we
propose SP as the metric for measuring property specificity. More precisely, SP
rewards properties that are associated with few etypes in a specific etype graph:
the fewer the etypes described by the target property, the larger the magnitude
of its SP value. We model the metric SP as:

$$
SP_{ETG}(p) = w_p * e^{\lambda (1 - n(p))} \in [-1, 1] \tag{2}
$$
where p is the input property and n(p) is the number of etypes that are described
by the input property in a specific etype graph ETG, thus n(p) ≥ 0; e refers to
the natural mathematical constant [15]; λ is a constraint factor whose aim is to
produce a gentle curve. Assume that A and B are two etype graphs. Then we
model ESh as follows:

$$
ES_h(E_a, E_b) = \frac{1}{2} \sum_{i=1}^{k} \left( \frac{SP_A(p_i)}{|prop(E_a)|} + \frac{SP_B(p_i)}{|prop(E_b)|} \right) \in [0, 1] \tag{3}
$$
where Ea and Eb are the candidate etypes from A and B, respectively, thus
Ea ∈ A and Eb ∈ B; prop(E) refers to the properties associated with the specific
etype E and |prop(E)| refers to the number of such properties; k is the number of
matched properties which are associated with both etypes Ea and Eb; SPA(pi)
and SPB(pi) refer to the specificity of the aligned property pi in A and B,
respectively. Notice that ESh is symmetric, i.e., ESh(Ea, Eb) = ESh(Eb, Ea).
Notice also that we apply z-score normalization [29] to ESh at the end of the
calculation, and that the range of ESh is between 0 and 1.
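
To make Equations 2 and 3 concrete, the following Python sketch computes SP and the raw (not yet normalized) value of ESh. It is an illustration under stated assumptions rather than the code used in the experiments: the function names and the toy property counts are hypothetical, λ = 0.5 is an arbitrary illustrative value, w_p is fixed to 1 because the sum in Equation 3 ranges over properties associated with both etypes, and the final normalization step is omitted.

```python
import math
from typing import Dict, List, Tuple


def sp(w_p: int, n_p: int, lam: float = 0.5) -> float:
    """Property specificity of Equation 2: w_p * e^(lambda * (1 - n(p)))."""
    return w_p * math.exp(lam * (1 - n_p))


def es_h(matched: List[Tuple[str, str]],
         n_a: Dict[str, int], n_b: Dict[str, int],
         size_a: int, size_b: int, lam: float = 0.5) -> float:
    """Raw horizontal similarity of Equation 3.

    matched: aligned property pairs (p_a, p_b) associated with both etypes
    n_a/n_b: property -> number of etypes it describes in graph A / B
    size_a/size_b: |prop(E_a)| and |prop(E_b)|
    """
    total = 0.0
    for p_a, p_b in matched:
        total += sp(1, n_a[p_a], lam) / size_a + sp(1, n_b[p_b], lam) / size_b
    return 0.5 * total


# Toy example (hypothetical counts): "name" is shared by three etypes in both
# graphs, whereas "settlement"/"town" describe a single etype each.
matched = [("name", "name"), ("settlement", "town")]
n_a = {"name": 3, "settlement": 1}
n_b = {"name": 3, "town": 1}
score = es_h(matched, n_a, n_b, size_a=2, size_b=2)
```

A property that describes a single etype obtains SP = 1, while a property shared by three etypes obtains e^(-1) with λ = 0.5, so the highly shared property name contributes less to the score than the specific property settlement.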

3.2   Vertical Similarity
Etype graphs are organized as classification hierarchies such that upper-layer
etypes represent more abstract or more general concepts, whereas lower-layer
etypes represent more concrete or more specific concepts [20, 31]. Correspondingly,
properties of upper-layer etypes are more general since they are used to
describe general concepts and, vice versa, properties of lower-layer etypes are
more specific since they are used to describe specific concepts. We assume that specific
properties will contribute more to the identification of an etype. For instance, in
Figure 2, as a lower-layer etype, Artist can be identified by the property academy
award but not by the property name. Based on this intuition, we propose L(p)
as a metric for measuring property specificity. We model L(p) as follows:

$$
L_{ETG}(p) = w_p * \theta * \min_{E \in etype(p)} layer(E) \in [-1, 1] \tag{4}
$$

where θ is a constraint factor which normalizes the range of the function;
etype(p) returns all the etypes that are described by the property p; and layer(E)
refers to the layer of the inheritance hierarchy where an etype E is defined. We
define the vertical etype similarity metric ESv as follows:

$$
ES_v(E_a, E_b) = \frac{1}{2} \sum_{i=1}^{k} \left( \frac{L_A(p_i)}{|prop(E_a)|} + \frac{L_B(p_i)}{|prop(E_b)|} \right) \in [0, 1] \tag{5}
$$
Similar to the definition of ESh , we have candidate etypes Ea ∈ A and Eb ∈ B
and the properties prop(E) associated with the etype E. The key difference is
that ESv exploits the property specificity based on the layer information L(p).
LA (pi ) and LB (pi ) refer to the highest layer of the aligned property pi in A and
B, respectively. Notice that ESv is symmetric as well. ESv is also normalized
by z-score normalization, in the same way as ESh, and its range is between 0
and 1.
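
The layer-based specificity of Equation 4 can be sketched as follows. This is a hedged illustration, not the implementation used in the experiments: the function names and the toy hierarchy are hypothetical, the root is assumed to sit at layer 1, and θ is assumed to be the inverse of the depth of the hierarchy, which is one simple way of keeping the value within [-1, 1].

```python
from typing import Dict, Optional, Set


def layers(parent: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Layer of every etype, assuming the root is at layer 1."""
    def depth(etype: str) -> int:
        d = 1
        while parent[etype] is not None:
            etype = parent[etype]
            d += 1
        return d
    return {e: depth(e) for e in parent}


def l_value(p: str, props: Dict[str, Set[str]],
            layer: Dict[str, int], w_p: int = 1) -> float:
    """Equation 4: w_p * theta * min layer over the etypes described by p.

    theta = 1 / (depth of the hierarchy) is an assumption of this sketch.
    Raises ValueError if no etype is described by p.
    """
    theta = 1.0 / max(layer.values())
    top = min(layer[e] for e, ps in props.items() if p in ps)
    return w_p * theta * top


# Toy hierarchy (hypothetical): Thing > {Person > Artist, Place}.
parent = {"Thing": None, "Person": "Thing", "Place": "Thing", "Artist": "Person"}
props = {
    "Thing": set(),
    "Person": {"name", "citizenship"},
    "Artist": {"name", "citizenship", "academy award"},
    "Place": {"name", "settlement"},
}
layer = layers(parent)
l_name = l_value("name", props, layer)            # first appears at layer 2
l_award = l_value("academy award", props, layer)  # first appears at layer 3
```

In this toy hierarchy, academy award receives a higher value than name, in line with the intuition that properties of lower-layer etypes are more specific. ESv then aggregates these values over the matched properties exactly as ESh aggregates SP, following Equation 5.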

4     Etype Graph Matching
Figure 3 presents the processing chart of our etype graph matching approach. It
mainly consists of two matchers, the property matcher and the etype matcher.
After parsing the input etype graph pair, properties are first sent to the
NLP-based property matcher, where string-based and language-based similarity
metrics are exploited to match two property labels [3,4]. Then we generate the FCA
contexts according to the etypes and the correlated property pairs. In this phase,
we also compute our property-based etype similarity metrics ESh and ESv
and send them to the etype matcher. We develop an ML-based matcher
which treats etype matching as a binary classification task. Thus, our etype
matcher classifies each pair of incoming etypes as a match or a non-match and
outputs the matched etypes as the final results.


          Fig. 3. Processing chart of our etype graph matching approach


     Algorithm 1 below presents the step-by-step process for calculating property-
based etype similarity metrics ESh and ESv . After formalizing etype graphs into
FCA contexts, we assume that the two candidate FCA contexts fa and fb are
generated. P M refers to the property pairs which are aligned by the property
matcher, EM refers to the candidate etype pairs which are waiting for matching.
For every etype pair in EM , we check their correlated properties and update the
specificity values to ESh and ESv if the property pair is aligned. After traversing
all the candidate etype pairs, we obtain the complete etype similarities, which are
used for training the ML model or for predicting whether two etypes match. Table
1 provides some representative examples of the etype similarities ESv and
ESh between etypes from the cmt-confof and cmt-conference ontology pairs of
the conference track.


          Table 1. Examples of values of etype similarity ESv and ESh

                  etype-cmt     etype-confof          ESv     ESh
                  Paper         Contribution          1       0.853
                  Author        Author                0.756   0.740
                  SubjectArea   Topic                 0.198   0.961
                  Meta-Review   Poster                0       0.312

                  etype-cmt     etype-conference      ESv     ESh
                  Chairman      Chair                 1       0.559
                  Person        Person                1       0.970
                  Person        Conference document   0.02    0.06
                  Chairman      Publisher             0       0.07

Algorithm 1 Etype similarity generation. ESh, ESv = etypesim(fa, fb)
Input:
    Candidate FCA contexts fa and fb;
Output:
    Property-based etype similarities ESh, ESv;

PM = (pa, pb) = PropertyMatcher(fa, fb);  {align pa and pb as property pairs by
    the property matcher, where pa ∈ fa and pb ∈ fb}
EM = (Ea, Eb) = EtypeSelector(fa, fb);  {select etypes Ea, Eb from fa, fb and
    assemble them as candidate etype pairs EM}
for all EMi ∈ EM do
    for all pa ∈ fa, pb ∈ fb do
        if (pa, pb) ∈ PM then
            ESh(EMi).add(SP(pa), SP(pb));  {add the horizontal specificity to the
                etype similarity ESh(EMi), see Equation 3}
            ESv(EMi).add(L(pa), L(pb));  {add the vertical specificity to the
                etype similarity ESv(EMi), see Equation 5}
        end if
    end for
end for
return ESh, ESv
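
The loop of Algorithm 1 can be rendered in Python as in the sketch below. This is one possible reading and not the implementation used in the experiments: the property matcher and the etype selector are assumed to have run already (their outputs pm and em are plain inputs), the per-property values SP and L are assumed to be precomputed, the inner loop is restricted to the properties of the two candidate etypes in accordance with the definition of k in Equation 3, and the normalization step is omitted. All names and toy values are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def etype_sim(em: List[Pair], pm: Set[Pair],
              props_a: Dict[str, Set[str]], props_b: Dict[str, Set[str]],
              sp_a: Dict[str, float], sp_b: Dict[str, float],
              l_a: Dict[str, float], l_b: Dict[str, float]):
    """Accumulate raw ES_h and ES_v for every candidate etype pair."""
    es_h: Dict[Pair, float] = {}
    es_v: Dict[Pair, float] = {}
    for e_a, e_b in em:
        size_a, size_b = len(props_a[e_a]), len(props_b[e_b])
        h = v = 0.0
        for p_a in props_a[e_a]:
            for p_b in props_b[e_b]:
                if (p_a, p_b) in pm:  # only aligned property pairs contribute
                    h += sp_a[p_a] / size_a + sp_b[p_b] / size_b
                    v += l_a[p_a] / size_a + l_b[p_b] / size_b
        es_h[(e_a, e_b)] = 0.5 * h
        es_v[(e_a, e_b)] = 0.5 * v
    return es_h, es_v


# Toy input (hypothetical property names and specificity values).
props_a = {"Paper": {"title", "author"}}
props_b = {"Contribution": {"title", "writer"}, "Poster": {"size"}}
pm = {("title", "title"), ("author", "writer")}
em = [("Paper", "Contribution"), ("Paper", "Poster")]
sp_a = {"title": 0.5, "author": 1.0}
sp_b = {"title": 0.5, "writer": 1.0, "size": 1.0}
l_a = {"title": 0.5, "author": 0.5}
l_b = {"title": 0.5, "writer": 1.0, "size": 0.5}
es_h, es_v = etype_sim(em, pm, props_a, props_b, sp_a, sp_b, l_a, l_b)
```

In this toy run the pair (Paper, Contribution) shares two aligned properties and obtains non-zero similarities, whereas (Paper, Poster) shares none and obtains zero for both metrics.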


5     Evaluation
We first describe the evaluation set-up and then provide the results from the
experiments.

5.1   Evaluation Set-up
The main decision for the evaluation was to take OAEI as the main reference
for the selection of the matching problems. As of today, this is in fact the major
source of ontology matching problems.
    Our approach focuses on ontologies that contain etypes associated with a
fair number of properties. As a result, we have selected the following cases:
the bibliographic ontology dataset [10] and conference track [38] (ra1 version).
From the bibliographic ontology dataset, we select series #101 and series #301-
304, which present real-life ontologies for bibliographic references from the web.
We set these bibliographic ontologies as the training set for training our ML-
based etype matcher. The conference track contains 16 ontologies, dealing with
conference organizations, and 21 reference alignments. We set the 21 reference
alignments from the conference track as the testing set to validate our etype
matcher. We select the training and testing sets from different cases to assess
how well our approach adapts to unseen ontologies, which also guards against
overfitting. Notice that matching two candidate ontologies yields unbalanced
samples: negative samples (non-matching etype pairs) are produced in much
larger numbers than positive samples. To address this issue, we adopt a model
training strategy that reduces the number of negative samples and duplicates
a part of the positive samples, in order to achieve a balanced training set and
to alleviate overfitting.
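
A balancing strategy of this kind can be sketched as follows. The sketch is illustrative only: the function name, the duplication factor and the target ratio are assumptions, since the exact amounts of down-sampling and duplication are not fixed above, and for simplicity all positive samples are duplicated rather than only a part of them.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label in {0, 1})


def balance(samples: List[Sample], pos_copies: int = 2,
            neg_ratio: float = 1.0, seed: int = 0) -> List[Sample]:
    """Duplicate positives and down-sample negatives.

    pos_copies: how many times each positive sample appears (assumed value)
    neg_ratio:  negatives kept per positive after duplication (assumed value)
    """
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1] * pos_copies
    negatives = [s for s in samples if s[1] == 0]
    keep = min(len(negatives), int(len(positives) * neg_ratio))
    balanced = positives + rng.sample(negatives, keep)
    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 3 matching pairs against 50 non-matching pairs.
data = [([0.9, 0.8], 1)] * 3 + [([0.1, 0.2], 0)] * 50
train = balance(data)
```

With the default parameters, the 3 positive and 50 negative toy samples become a training set with 6 samples of each class.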
    In this paper, our matching approach applies a general binary classification
strategy, which is independent of the specific ML model. The data label is
1 or 0, meaning that two etypes match or do not match, respectively. The
data consists of three kinds of attributes: string-based similarity metrics
(N-gram [12], longest common subsequence [12], Levenshtein distance [37]),
language-based similarity metrics (Wu and Palmer similarity [28], Word2vec [8])
and property-based similarity metrics (ESh and ESv). These etype similarities
measure different aspects of the relatedness between two etypes. We select
some of the most common string-based and language-based similarity metrics
as additional metrics that work together with our property-based similarity
metrics to achieve better etype matching results.
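
The attribute vector of a candidate etype pair can be assembled as in the following sketch. It is a simplified illustration and not the feature extraction used in the experiments: the function names are hypothetical, the string metrics are textbook formulations (Dice coefficient over character trigrams, longest common subsequence and Levenshtein distance, both normalized by the longer label), and the language-based metrics are left out because they require external lexical resources.

```python
from typing import List, Set


def ngrams(s: str, n: int = 3) -> Set[str]:
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over character n-grams (assumed formulation)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return float(a == b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def feature_vector(label_a: str, label_b: str,
                   es_h: float, es_v: float) -> List[float]:
    """[n-gram, LCS, Levenshtein similarity, ES_h, ES_v] for one etype pair."""
    a, b = label_a.lower(), label_b.lower()
    longest = max(len(a), len(b)) or 1
    return [
        ngram_similarity(a, b),
        lcs_length(a, b) / longest,
        1 - levenshtein(a, b) / longest,
        es_h,
        es_v,
    ]


# Example with the Chairman/Chair pair and its ES values from Table 1.
vec = feature_vector("Chairman", "Chair", es_h=0.559, es_v=1.0)
```

Each such vector, paired with its 0/1 label, is one row of the data on which any of the binary classifiers can be trained.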


5.2   Experimental Results

To better evaluate the validity of our approach, we apply four different ML
models: random forest [27], stochastic gradient descent (SGD) classifier [22],
decision tree [32] and logistic regression [26]. We compare our work with
state-of-the-art matching methods from previous OAEI evaluation campaigns,
namely FCAMap [7], AML [14], LogMap and LogMapLt [21]. We calculate precision,
recall, F1-measure, and also F0.5-measure and F2-measure [30].
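
For reference, the three F-measures are instances of the standard F-beta measure, which can be computed from precision and recall as in the short sketch below. The function name is ours, and the precision and recall values in the example are those reported for AML on the conference track.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta measure; beta < 1 favours precision, beta > 1 recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported for AML on the conference track.
p, r = 0.832, 0.630
f05, f1, f2 = f_beta(p, r, 0.5), f_beta(p, r, 1.0), f_beta(p, r, 2.0)
```

Rounded to three decimals, these values agree with the F0.5-, F1- and F2-measures reported for AML. The F0.5-measure weighs precision more than recall, while the F2-measure weighs recall more than precision, which is why systems with high recall rank better on the latter.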


               Table 2. Comparing our method with SOTA methods

          Conference track          Prec.   Rec.    F0.5-m.  F1-m.   F2-m.
          FCAMap                    0.680   0.625   0.668    0.651   0.635
          AML                       0.832   0.630   0.782    0.717   0.662
          LogMap                    0.798   0.592   0.746    0.680   0.624
          LogMapLt                  0.716   0.554   0.676    0.625   0.580
          Ours-RandomForest         0.529   0.884   0.575    0.662   0.779
          Ours-SGDClassifier        0.779   0.632   0.744    0.698   0.656
          Ours-DecisionTree         0.671   0.703   0.677    0.687   0.696
          Ours-LogisticRegression   0.556   0.808   0.593    0.659   0.741


Table 2 shows the results of our approach with the different models mentioned
above, compared with the results of state-of-the-art methods. Firstly, our
approach produces different results with different models. The SGD classifier
performs best in general, leading our models in precision, F0.5-measure and
F1-measure, while random forest leads in recall and F2-measure. Decision tree
and logistic regression classifiers are marginally weaker than the other two on
the conference track ontologies. Secondly, extending to the overall comparison,
AML has the best overall results, leading in precision, F0.5-measure and
F1-measure, whereas our approach with random forest leads in recall and
F2-measure. Considering that our approach performs close to the state of the
art on F1-measure across the different models, we can say that it leads to
results similar to those of state-of-the-art competitors, while being ahead in
some aspects. 2
    The comparison with state-of-the-art methods shows the validity of our etype
matcher. Moreover, we design a second experiment, an ablation test, to evaluate
whether our property-based etype similarity metrics are effective. In this
experiment, we test the backbone model (B), which is trained only on
string-based and language-based similarity metrics. We also test the backbone
model extended with ESv, with ESh, and with both ESv and ESh (ours). Note
that the backbone uses the SGD classifier, so that B+ESv+ESh corresponds to
Ours-SGDClassifier in Table 2.
    Table 3 shows the results of the ablation test. Using our metrics improves
the results with respect to the backbone model on all three measures. Moreover,
although B+ESh achieves the best recall, B+ESv+ESh leads in precision and
F1-measure, which means the best overall performance. This observation shows
that both our metrics are effective on the etype matching task, and that the
etype matcher achieves its best performance by using ESv and ESh
simultaneously.


                                 Table 3. Ablation test

                           Models        Prec.   Rec.    F1-m.
                           B             0.621   0.605   0.613
                           B+ESv         0.650   0.700   0.674
                           B+ESh         0.634   0.729   0.678
                           B+ESv+ESh     0.779   0.632   0.698


6     Related work

Based on the idea originally introduced in [19] and different from all the previous
work, our approach is based on the idea of exploiting properties as the main
means for matching etypes. We provide below a short summary of the four
main techniques that we exploit in the implementation of property-based etype
similarity, namely, label matching, graph matching, and the use of ML and FCA.
    In the early stages of ontology matching, etype matching methods mostly
focused on string-based methods. The work in [36] reviews a wide range of string
similarity metrics and proposes an ontology alignment method that selects among
different powerful similarity metrics. Later, ensemble strategies were
introduced in some studies [25], which apply multiple matchers based on different
2
    The approaches show no significant differences in running time, since the
    conference track contains no large ontologies.
string-based metrics. The principle of these works is that the combined matchers
are more powerful than an individual matcher.
    The structure of an etype graph has also been considered as important
information for identifying etypes, as in [2,18]. The LogMap system [21] uses a
two-step matching strategy: it first matches two etypes Ea and Eb by a lexical
matcher, and then considers the etypes that are semantically close to Ea to be
more likely to be semantically close to Eb. AML [14] is an ontology matching
system that consists of a string-based matcher and a structure-based matcher,
building internal correspondences by exploiting is-a and part-of relationships.
    Some work on matching etypes is based on the use of ML. This work mod-
els the etype matching task as a binary classification task, trying to encode
the information like string similarities and structure information as attributes.
For instance, the work in [5] achieves promising results by encoding the lexical
similarity of the superclass and subclass as structural similarity.
    Finally, FCA lattices have been applied in etype matching methods in the
work described in [7, 35]. To refine health records searching outputs, the work
in [9] introduced a matching method based on FCA which assists the end-user in
defining their queries. In turn, in [35] a bottom-up ontology merging approach
was proposed where FCA lattices were used to keep track of the ontology hier-
archy.


7    Conclusions
In this paper, we have introduced a novel etype graph matching approach via
property-based similarity measurement. Firstly, we discussed a novel formaliza-
tion method for etype graphs, which encodes etypes and properties into FCA
contexts. Then we proposed two novel metrics for measuring the contextual sim-
ilarity between two etypes, namely horizontal similarity and vertical similarity.
Based on our proposed metrics, we have developed an ML-based framework for
etype graph matching. The experimental results show the validity of our
approach.


Acknowledgements
The research conducted by Fausto Giunchiglia has received funding from the
InteropEHRate project, co-funded by the European Union (EU) Horizon 2020
programme under grant number 826106, and the research conducted by Daqian
Shi has received funding from the program of China Scholarships Council (No.
202007820024).


References
 1. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: Dbpedia:
    A nucleus for a web of open data. In: The semantic web, pp. 722–735. Springer
    (2007)
 2. Autayeu, A., Giunchiglia, F., Andrews, P.: Lightweight parsing of classifications into
    lightweight ontologies. In: European Conference on Research and Advanced Tech-
    nology for Digital Libraries (ECDL 2010). Glasgow, United Kingdom (September
    2010)
 3. Bella, G., Giunchiglia, F., McNeill, F.: Language and domain aware lightweight
    ontology matching. Journal of Web Semantics 43, 1–17 (2017)
 4. Bella, G., Zamboni, A., Giunchiglia, F.: Domain-based sense disambiguation in
    multilingual structured data. In: DIVERSITY Workshop at ECAI 2016 (2016)
 5. Bulygin, L., Stupnikov, S.A.: Applying of machine learning techniques to combine
    string-based, language-based and structure-based similarity measures for ontology
    matching. In: DAMDID/RCDL. pp. 129–147 (2019)
 6. Cheatham, M., Hitzler, P.: String similarity metrics for ontology alignment. In:
    International semantic web conference. pp. 294–309. Springer (2013)
 7. Chen, G., Zhang, S.: Identifying mappings among knowledge graphs by formal
    concept analysis. In: OM@ ISWC. pp. 25–35 (2019)
 8. Church, K.W.: Word2vec. Natural Language Engineering 23(1), 155–162 (2017)
 9. Curé, O.C., Maurer, H., Shah, N.H., Le Pendu, P.: A formal concept analysis and
    semantic query expansion cooperation to refine health outcomes of interest. BMC
    medical informatics and decision making 15(1), 1–6 (2015)
10. Euzenat, J., Ferrara, A., Hollink, L., Isaac, A., Joslyn, C., Malaisé, V., Meilicke,
    C., Nikolov, A., Pane, J., Sabou, M., et al.: Results of the ontology alignment
    evaluation initiative 2009 (2010)
11. Euzenat, J., Meilicke, C., Stuckenschmidt, H., Shvaiko, P., Trojahn, C.: Ontol-
    ogy alignment evaluation initiative: six years of experience. In: Journal on data
    semantics XV, pp. 158–192. Springer (2011)
12. Euzenat, J., Shvaiko, P., et al.: Ontology matching, vol. 18. Springer (2007)
13. Färber, M., Ell, B., Menne, C., Rettinger, A.: A comparative survey of dbpedia,
    freebase, opencyc, wikidata, and yago. Semantic Web Journal 1(1), 1–5 (2015)
14. Faria, D., Pesquita, C., Santos, E., Palmonari, M., Cruz, I.F., Couto, F.M.: The
    agreementmakerlight ontology matching system. In: OTM Confederated
    International Conferences "On the Move to Meaningful Internet Systems".
    pp. 527–541. Springer (2013)
15. Finch, S.R.: Mathematical constants. Cambridge university press (2003)
16. Fumagalli, M., Daqian, S., Giunchiglia, F.: Ranking schemas by focus: a
    cognitively-inspired approach. In: 26th International Conference on
    Conceptual Structures (2021)
17. Ganter, B., Wille, R.: Formal concept analysis: mathematical foundations. Springer
    Science & Business Media (2012)
18. Giunchiglia, F., Autayeu, A., Pane, J.: S-match: an open source framework for
    matching lightweight ontologies. Semantic Web 3(3), 307–317 (2012)
19. Giunchiglia, F., Fumagalli, M.: Entity type recognition–dealing with the diversity
    of knowledge. In: Proceedings of the International Conference on Principles of
    Knowledge Representation and Reasoning. vol. 17, pp. 414–423 (2020)
20. Giunchiglia, F., Zaihrayeu, I., Kharkevich, U.: Formalizing the get-specific docu-
    ment classification algorithm. In: International Conference on Theory and Practice
    of Digital Libraries. pp. 26–37. Springer (2007)
21. Jiménez-Ruiz, E., Grau, B.C.: Logmap: Logic-based and scalable ontology match-
    ing. In: International Semantic Web Conference. pp. 273–288. Springer (2011)
22. Kabir, F., Siddique, S., Kotwal, M.R.A., Huda, M.N.: Bangla text document cate-
    gorization using stochastic gradient descent (sgd) classifier. In: 2015 International
    Conference on Cognitive Computing and Information Processing (CCIP). pp. 1–4.
    IEEE (2015)
23. Lenzerini, M.: Data integration: A theoretical perspective. In: Proceedings of
    the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of
    database systems. pp. 233–246 (2002)
24. Lonsdale, D., Embley, D.W., Ding, Y., Xu, L., Hepp, M.: Reusing ontologies and
    language components for ontology generation. Data & Knowledge Engineering
    69(4), 318–330 (2010)
25. Nezhadi, A.H., Shadgar, B., Osareh, A.: Ontology alignment using machine learn-
    ing techniques. International Journal of Computer Science & Information Technol-
    ogy 3(2), 139 (2011)
26. Ng, A.Y., Jordan, M.I.: On discriminative vs. generative classifiers: A comparison
    of logistic regression and naive bayes. In: Advances in neural information processing
    systems. pp. 841–848 (2002)
27. Pal, M.: Random forest classifier for remote sensing classification. International
    journal of remote sensing 26(1), 217–222 (2005)
28. Palmer, M., Wu, Z.: Verb semantics and lexical selection. In: Proceedings of
    the 32nd Annual Meeting of the Association for Computational Linguistics, Las
    Cruces, New Mexico. pp. 133–138 (1994)
29. Patro, S., Sahu, K.K.: Normalization: A preprocessing stage. arXiv preprint
    arXiv:1503.06462 (2015)
30. Pour, N., Algergawy, A., Amini, R., Faria, D., Fundulaki, I., Harrow, I., Hertling,
    S., Jiménez-Ruiz, E., Jonquet, C., Karam, N., et al.: Results of the ontology align-
    ment evaluation initiative 2020. In: Proceedings of the 15th International Workshop
    on Ontology Matching (OM 2020). vol. 2788, pp. 92–138. CEUR-WS (2020)
31. Rios-Alvarado, A.B., Lopez-Arevalo, I., Sosa-Sosa, V.J.: Learning concept hier-
    archies from textual resources for ontologies construction. Expert Systems with
    Applications 40(15), 5907–5915 (2013)
32. Safavian, S.R., Landgrebe, D.: A survey of decision tree classifier methodology.
    IEEE transactions on systems, man, and cybernetics 21(3), 660–674 (1991)
33. Shvaiko, P., Euzenat, J.: Ontology matching: state of the art and future challenges.
    IEEE Transactions on knowledge and data engineering 25(1), 158–176 (2011)
34. Sleeman, J., Finin, T., Joshi, A.: Entity type recognition for heterogeneous seman-
    tic graphs. AI Magazine 36(1), 75–86 (2015)
35. Stumme, G., Maedche, A.: Fca-merge: Bottom-up merging of ontologies. In: IJCAI.
    vol. 1, pp. 225–230 (2001)
36. Sun, Y., Ma, L., Wang, S.: A comparative evaluation of string similarity metrics
    for ontology alignment. Journal of Information & Computational Science 12(3),
    957–964 (2015)
37. Yujian, L., Bo, L.: A normalized levenshtein distance metric. IEEE transactions
    on pattern analysis and machine intelligence 29(6), 1091–1095 (2007)
38. Zamazal, O., Svátek, V.: The ten-year ontofarm and its fertilization within the
    onto-sphere. Journal of Web Semantics 43, 46–53 (2017)