<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Metadata-based Term Selection for Modularization and Uniform Interpolation of OWL Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xinhao Zhu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xuan Wu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruiqing Zhao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yu Dong</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yizheng Zhao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Key Laboratory for Novel Software Technology, Nanjing University</institution>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Artificial Intelligence, Nanjing University</institution>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper explores the problem of selecting good terms as the seed signature for abstraction of OWL ontologies. Existing methods generate seed signatures based on geographic connections, which is far from sufficient to produce a satisfactory abstract. This restricts the reusability of OWL ontologies from the perspective of knowledge management. In this paper, we propose a signature extension approach to generate seed signatures for modularization and uniform interpolation of OWL ontologies, both of which are ontology abstraction techniques. The approach establishes the semantic relevance of terms by taking into account as much metadata of an OWL ontology as possible, and computes a numerical value measuring the relevance of terms from their embeddings, obtained with the OWL2Vec* framework. An empirical evaluation shows that the proposed method significantly outperforms other term selection baselines in making accurate selections. In addition, a case study on ontology abstraction tasks shows that modularization tools can make more complete and precise abstractions using the signature extended by our method.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Because of the heterogeneous nature of web resources, ontologies developed for
the semantic web are typically large, sometimes monolithic, and knowledge
modelled therein is rich and covers multiple topics. This may however restrict the
reusability and interoperability of ontologies in real-world application scenarios,
since large ontologies can be difficult to manage, unwieldy to manipulate, and
moreover costly to reason about.</p>
      <p>Consider an ontology reuse use case where an ontologist wants to import a
football ontology into a growing sports knowledge base. Currently the only
well-established ontology concerning football is the BBC Sports Ontology<sup>3</sup>, which,
however, publishes data about all types of competitive physical activities,
not only football. Importing the whole ontology into the
knowledge base is not difficult from an engineering perspective, but as one can
expect, many web services built on the knowledge base, such as search, querying, and
retrieval, which typically involve extensive reasoning, may become problematic,
because too much irrelevant information has been introduced, automatically yet
unnecessarily. Such information contributes nothing to the formalization of
knowledge about football but increases the computational cost.
<fn><label>3</label><p>https://www.bbc.co.uk/ontologies/sport</p></fn>
<fn><p>Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p></fn></p>
      <p>A straightforward way to tackle these challenges of reusability and
interoperability is to extract a fragment of an ontology that behaves in the same
way as the original ontology in a specific context, but is significantly smaller. In
the above case, this means extracting from the BBC Sports Ontology a fragment
that contains sufficiently many logical statements to summarize all knowledge
about football. Ideally, this fragment should be as small as possible.</p>
      <p>
        Two logic-based approaches have been developed for computing fragments of
ontologies. One is based on modularization [
        <xref ref-type="bibr" rid="ref13 ref15 ref3 ref5 ref8 ref9">5,9,13,8,3,15</xref>
        ], which seeks to identify
from an ontology a subset (module) that preserves several reasoning tasks for a
sub-vocabulary of the ontology, namely a seed signature.<sup>4</sup> The other is uniform
interpolation [
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref6">18,16,6,17</xref>
        ], which computes a more compact representation of a
module of an ontology which preserves the underlying logical definitions of the
terms in the seed signature.
      </p>
      <p>
        As one could expect, the quality of extracted fragments depends largely on
the seed signature fed to modularization and uniform interpolation procedures.
We say that a fragment is complete if it covers all essential information
about the topic of interest, and that it is precise if it is complete and, in
addition, does not include too much information irrelevant to the topic
of interest. More specifically, if we select too few terms as the seed signature to
summarize all materials of the topic, we lose important information that
a user may be interested in; if we select too many terms,
some of them not strongly relevant to the topic, we include too
much additional information. Importing more information can also change the
definitions of the terms in the original ontology, and destroy the coherence and
consistency of the original ontology [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Nevertheless, very little attention has been paid to the problem of term
selection for ontology extraction. Chen et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] have proposed a signature extension
algorithm to generate seed signatures for ontology modularization. The idea is
to (1) fix a primitive seed signature Σ, often containing several terms suggested
by domain experts, and (2) extend Σ with new terms collected from the axioms
that contain the current Σ-terms. Step (2) is iterated until no new terms can
be added to Σ. One may understand this as follows: if two people p<sub>1</sub> and p<sub>2</sub> live together
in a house h<sub>1</sub> on an island, then they are relevant and team up as Σ = {p<sub>1</sub>, p<sub>2</sub>},
and if there exists a road connecting h<sub>1</sub> with another house h<sub>2</sub>, then the people
living in h<sub>2</sub> are collected into Σ. Iteratively, the same strategy applies to the
entire island, and in the end, Σ will probably have collected all inhabitants of
the island. However, a person who lives on another island will never be collected
by Σ since there is no road connecting the two islands; islands are geographically
isolated.
<fn><label>4</label><p>A signature of an ontology is the set of all concept and role names in the ontology.</p></fn>
      </p>
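      <p>The connection-based extension strategy described above can be read as a fixed-point iteration. The following is a minimal sketch under strong simplifying assumptions, not the implementation of Chen et al.: each axiom is modelled only as the set of term names it mentions, and the function name, the toy axioms and the term names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def extend_by_connections(axioms, seed):
    """Grow `seed` with every term that shares an axiom with a collected term.

    `axioms` is a list of sets of term names: a deliberately simplified,
    hypothetical stand-in for the signatures of real OWL axioms.
    """
    sigma = set(seed)
    changed = True
    while changed:  # iterate until no new term can be added
        changed = False
        for terms in axioms:
            # An axiom that touches the current signature contributes all its terms.
            if sigma & terms and not terms <= sigma:
                sigma |= terms
                changed = True
    return sigma


# Toy axioms (assumed for illustration only).
axioms = [
    {"Football", "BallGame"},
    {"BallGame", "Sports"},
    {"FootballPlayer", "Player", "Football"},
    {"MentholSpray", "PainReliever"},  # no axiom links this to the sports terms
]
extended = extend_by_connections(axioms, {"Football"})
```
]]></preformat>
      <p>In this toy setting the seed grows to cover every term reachable through shared axioms, while terms that occur only in unconnected axioms are never reached, whatever their annotations say.</p>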
      <p>Evidently, this signature extension strategy yields a larger
seed signature, with which a more informative fragment will be produced. We
argue, however, that a seed signature obtained in this way, i.e., using a signature
extension algorithm based merely on geographic connections, can hardly yield
a complete fragment. Our argument is that the relevance between a term and
the expanding seed signature should be evaluated based on all
metadata of the participating terms in the context of the host ontology, rather
than merely on their geographic connections.</p>
      <p>Consider a scenario where an ontologist wants to extract from a multi-domain
ontology a fragment that describes football and closely related information; see
Figure 1. With the central term "Football" selected as the single seed in the
primitive signature, an extension Σ = {Football, BallGame, Sports, Player,
FootballPlayer} is obtained using the above signature extension algorithm.
Terms in other domains such as MentholSpray will not be collected in Σ, because
they are geographically isolated from the domain of Sports. However, the
annotation of MentholSpray explains that "MentholSpray can be used as pain
reliever for sports players". In this sense, the term MentholSpray is
strongly relevant to the topic. Collecting MentholSpray in the extended
signature may enable the expanded knowledge base to answer queries regarding
the treatment of an injury in a football match. This example shows
that the relevance between a term and the expanding seed signature in the
context of the host ontology can be established from important metadata
of the participating terms, for example, from their lexical information.</p>
      <p>
        In this paper, we propose a novel term selection approach for discovering
semantic relationships between two isolated groups of terms. The idea is to
measure the relevance of non-Σ terms to Σ-terms based on their D-dimensional
vector representations computed from important metadata of the ontology using
OWL2Vec* [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a random walk- and word embedding-based OWL ontology
embedding framework that encodes the semantics of OWL ontologies in a vector
space by taking into account their graph structure, lexical information, and
the logical constructors used therein. The work is intended to enhance existing
logic-based ontology abstraction techniques as practical tools for many
ontology-based knowledge processing tasks by exploiting non-logical approaches to
facilitate their use in practice. Previously, little work has tightly coupled
logical and data-driven techniques and exploited their complementary strengths
to open up an application pipeline. Our empirical evaluation showed that
the proposed approach significantly outperformed other term selection baselines
in recommending good seed signatures, and that with this approach, more precise
fragments could be produced using two existing modularization and uniform
interpolation tools.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Metadata-based Term Selection</title>
      <p>
        For space reasons, we have to assume readers' familiarity with the notions of
ontology modularization [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and uniform interpolation [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Our term selection
approach accommodates ontologies described in OWL 2, which is based on the
description logic SROIQ [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]; see the Description Logic Handbook [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for a
detailed description of the syntax and semantics of description logics.
      </p>
      <p>Arguably, most topics can satisfactorily be summarized or defined by a set
of concept names, and do not depend much on role names. Hence, in this
paper, we only consider seed signatures that are sets of concept names.</p>
      <p>Accordingly, we take the signature sig(O) of an ontology O to be the set of all concept names in O.
Given an ontology O and a seed signature Σ ⊆ sig(O) containing one or
a few concept names suggested by domain experts or simply selected by users,
which are believed to be the central term or terms that best summarize
the topic of interest, our approach computes an extension Σ′ of Σ in three
steps, namely concept representation learning, computing relevance values, and
signature extension based on relevance values. Σ′ is the seed signature to be fed
to modularization and uniform interpolation procedures.</p>
      <sec id="sec-2-1">
        <title>Concept Representation Learning</title>
        <p>The first step is to transform all concept names A in O into D-dimensional
vectors in a vector space where the relevance of each concept name (to Σ) is
computed based on important metadata of O.</p>
        <p>
          Our concept representation learning model is based on OWL2Vec* [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], an
ontology embedding framework, which computes vector representations for
concept names in OWL ontologies as expressive as SROIQ. OWL2Vec*
computes the embedding of an OWL ontology based on a corpus of sequences of
tokens, which are encoded from the metadata of the ontology. Such metadata
includes the graph structure of the ontology, i.e., an RDF graph (a set of RDF
triples) converted from the OWL ontology by OWL2Vec*, the so-called lexical
information about the ontology, i.e., annotations, and the so-called logical
information about the concepts and roles in the ontology, i.e., subsumption,
equivalence, disjointness, etc.
        </p>
        <p>Algorithm 1: Nearest Neighbour Ranking</p>
        <p>Input: a set of concept names N<sub>C</sub>; a seed signature Σ; a set of concept embeddings {e<sub>A</sub> ∈ ℝ<sup>D</sup> : A ∈ N<sub>C</sub>}; a distance function d : ℝ<sup>D</sup> × ℝ<sup>D</sup> → [0, 1].</p>
        <p>Output: a relevance function f : N<sub>C</sub> → [0, 1].</p>
        <p>We note that OWL2Vec* was not meant for term selection tasks, so we made
modifications to the original OWL2Vec* model to maximize the performance of
the downstream term selection models. In particular, we designed a task-specific
fine-tuning process to further improve the ontology embedding, which is
discussed in Section 3. In the end, every concept name A is represented
as a D-dimensional vector e<sub>A</sub>.</p>
        <p>The second step is to compute the relevance value of every (non-Σ) concept
name A in O w.r.t. Σ. The computation is based on the relative distance of e<sub>A</sub>
to its nearest seed neighbour (the nearest seed name) in the vector space. The
range of the relevance value is [0, 1], with 1 standing for the strongest relevance
and 0 for the weakest. The relevance value is computed by a newly
developed algorithm called Nearest Neighbour Ranking (NN-RANK),
shown in Algorithm 1.</p>
        <p>NN-RANK first computes the distance from each concept name to each seed
name in the vector space. In principle, many distance functions d : ℝ<sup>D</sup> × ℝ<sup>D</sup> →
[0, 1] can be used to achieve this, but the cosine distance, formulated as
<disp-formula><tex-math><![CDATA[d(e_A, e_B) = 1 - \frac{e_A \cdot e_B}{\lVert e_A \rVert_2 \, \lVert e_B \rVert_2}]]></tex-math></disp-formula>
proved the best measure of relevance in our experiments. |Σ| distance values
are computed in this way for each concept name A, and the smallest of them,
which denotes the shortest distance, is taken as the valid distance value
of A to Σ. NN-RANK then sorts all concept names in O by their valid distance
values. Concept names with smaller valid distance values are considered to be
semantically more relevant to the seed signature, and thus to the central topic.
The sorted concept names are then distributed uniformly between 0 and 1 by rank,
with smaller valid distance values mapped to higher values. The result is the
relevance value of each A w.r.t. Σ.</p>
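        <p>The ranking procedure described above can be sketched in a few lines. This is an illustrative reading rather than the authors' code: the rank-based rescaling to [0, 1] is our interpretation of "uniformly distributed", ties are broken by insertion order, seeds are ranked together with all other concepts, and the two-dimensional embeddings and concept names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def nn_rank(embeddings, seeds, distance=cosine_distance):
    """Map each concept to a relevance in [0, 1] by rank of nearest-seed distance."""
    # Valid distance: the smallest distance from the concept to any seed.
    valid = {
        concept: min(distance(vec, embeddings[s]) for s in seeds)
        for concept, vec in embeddings.items()
    }
    ordered = sorted(valid, key=valid.get)  # closest concepts first
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 1.0}
    # Assumed rescaling: rank 0 gets relevance 1, the last rank gets 0.
    return {concept: 1.0 - i / (n - 1) for i, concept in enumerate(ordered)}


# Hypothetical toy embeddings (D = 2) for illustration only.
embeddings = {
    "Football": [1.0, 0.0],
    "BallGame": [0.9, 0.1],
    "MentholSpray": [0.6, 0.4],
    "Tax": [0.0, 1.0],
}
relevance = nn_rank(embeddings, seeds=["Football"])
```
]]></preformat>
        <p>Computing the valid distances costs one distance evaluation per concept and seed, and the sort dominates the remaining work.</p>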
      </sec>
      <sec id="sec-2-2">
        <title>Relevance-based Seed Signature Extension</title>
        <p>A natural question arises: how should the computed relevance values guide the
selection of terms for ontology abstraction? Strategies may vary with
application demands. Without a well-acknowledged gold standard, a feasible
solution is to measure the "degree" of relevance and define the degree
at which a concept name can be regarded as "relevant" to the seeds in
Σ. For example, one could set a numerical threshold on the relevance value at
0.9 to obtain a more cohesive abstraction of the ontology, and at 0.5 to
obtain a looser one. We leave this flexibility to users. Given a threshold θ
on the scale of 0 to 1, our approach extends the primitive seed signature Σ by
adding to Σ the concept names with relevance value no less than θ. The result
is Σ′ = Σ ∪ {A | A ∈ sig(O) ∧ f(A, Σ) ≥ θ}.</p>
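        <p>As a small illustration of the threshold-based extension, the sketch below adds to the seed every concept whose relevance value reaches the threshold. The function name and the relevance values are hypothetical and chosen only to show the effect of a strict versus a loose threshold.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def extend_signature(seed, relevance, threshold):
    """Return the seed plus every concept whose relevance is >= threshold."""
    return set(seed) | {a for a, r in relevance.items() if r >= threshold}


# Hypothetical relevance values, for illustration only.
relevance = {"BallGame": 0.95, "Player": 0.8, "MentholSpray": 0.6, "Tax": 0.1}

strict = extend_signature({"Football"}, relevance, threshold=0.9)
loose = extend_signature({"Football"}, relevance, threshold=0.5)
```
]]></preformat>
        <p>Lowering the threshold can only enlarge the extended signature, so the strict result is always contained in the loose one.</p>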
        <p>Computing the |sig(O)| · |Σ| distances requires time linear in |sig(O)| for a fixed Σ, and the
subsequent sorting requires O(|sig(O)| log |sig(O)|) time. Hence, we have the following
lemma regarding the time complexity of NN-RANK.</p>
        <p>Lemma 1. Given any OWL ontology O in SROIQ and a primitive seed
signature Σ ⊆ sig(O) with n = |sig(O)| and k = |Σ|, our term selection approach
always computes an extended seed signature Σ′ such that Σ ⊆ Σ′ in O(n log n + kn)
time.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Empirical Evaluation of NN-RANK</title>
      <p>In this experiment, we used NN-RANK to predict SNOMED CT Refset
components. The aim was to show that the algorithm could enrich a given primitive
seed signature with concept names highly relevant to the initial seeds (in a
vector space). The experiment was conducted on a workstation with an Intel
Xeon CPU @ 2.60GHz and 32 GB memory.</p>
      <p>SNOMED CT<sup>5</sup> is currently the most comprehensive, multilingual clinical
healthcare ontology in the world. A SNOMED CT Refset<sup>6</sup> is a collection of
SNOMED CT components sharing specific characteristics (e.g., a specific
domain). An example of a SNOMED CT Refset is the Malaria refset released by the
National Resource Centre for EHR Standards in India, which includes findings,
disorders, and organisms related to malaria. Arguably, the refset, published
officially by a group of ontology engineers and domain experts, can be considered
a complete and precise standard for a Malaria abstract of SNOMED CT.
<fn><label>5</label><p>https://www.snomed.org/</p></fn>
<fn><label>6</label><p>https://confluence.ihtsdotools.org/display/DOCGLOSS/refset</p></fn></p>
      <p>Our task was to predict concepts in SNOMED CT Refsets based on a seed
signature (randomly or manually) selected from the refsets. This task was
designed to fit realistic scenarios where a new refset needs to be developed
with the least intervention from domain experts. We assumed that refsets developed
by domain experts were complete and precise fragments, containing concepts
that were highly interconnected on the semantic level (e.g., in the same
clinical domain). Therefore, the task of predicting SNOMED CT Refset components
could be used to evaluate the performance of term selection models.</p>
      <p>
        To better position our algorithm, we compared NN-RANK with two other
term selection strategies, namely, a strategy adapted from locality-based
modularization [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] (denoted as Star-modularization), and the signature-extension
based on geographic connections [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (denoted as Sig-Ext, configured with depth
d). We treated them as baselines. The idea of the locality-based modularization
strategy was to take all concept names in the computed module as the extended
signature of the seed. This may not be ideal but was nevertheless a means to
extend the seed signature. In this way, the relevance value f(A, Σ) of A was 1
if A was in the signature of the computed module, and 0 otherwise. We also
compared NN-RANK with Meta-SVDD [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a model designed
for few-shot one-class classification problems. Using Meta-SVDD, we learnt
patterns about refsets from existing refsets, in order to enhance its performance in
predicting new refset components.
      </p>
      <p>We considered the International Edition of SNOMED CT (version July 2020),
which contains 354,256 concepts, 355,214 logical axioms, and 1,506,185
description axioms. We used two sets of publicly accessible and in-use term collections,
NHS refsets<sup>7</sup> and NRC refsets<sup>8</sup>, as the target refsets.</p>
      <p>The NHS refsets, issued by the National Health Service (NHS) in the UK,
each offered from the full edition of SNOMED CT a set of components defined by a
particular requirement. The NRC refsets were released by the National Resource
Centre for EHR Standards (NRCeS) in India, and comprised 30 standalone
refsets covering concepts related to common diseases.</p>
      <p>We adopted two metrics widely used in classification and ranking tasks,
namely the Normalized Discounted Cumulative Gain (NDCG) and the Area
under the ROC Curve (AUC), to evaluate the performance of term selection
models. Both measures return high values if a model makes accurate
predictions, i.e., they measure the similarity between the approximations and the
refset components.</p>
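      <p>For readers unfamiliar with the two measures, the sketch below computes them for binary labels (1 for a refset component, 0 otherwise). It uses the common textbook definitions, with a logarithmic discount for NDCG and the pairwise formulation for AUC; the exact variants used in the experiments are not specified here, and the scores and labels are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def ndcg(ranked_labels):
    """NDCG of a ranked list of binary labels (best-ranked item first)."""
    def dcg(labels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(labels))

    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0


def auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical relevance scores and ground-truth refset membership.
scores = [0.9, 0.8, 0.4, 0.1]
labels = [1, 0, 1, 0]

ranked = [y for _, y in sorted(zip(scores, labels), reverse=True)]
ndcg_value = ndcg(ranked)
auc_value = auc(scores, labels)
```
]]></preformat>
      <p>Both values equal 1 when every refset component is ranked above every other concept, and decrease as refset components slip down the ranking.</p>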
      <p>The ontology embedding generated by OWL2Vec* on SNOMED CT was used for
the concept embedding, where each concept was represented by a 200-dimensional
vector. Different from the original OWL2Vec* model, we used a fine-tuning
process specially designed for this task to further improve the ontology embedding.
Specifically, refsets in this process were transformed into documents containing
(concept URI, refset identifier, concept URI) triples, and a Word2Vec model was then
used to fine-tune the pre-computed concept embedding on these documents. The
fine-tuning process was done in a 10-fold cross-validation manner, which meant
that the evaluation on any refset was based on a concept embedding fine-tuned on
the 90% of refsets other than itself.
<fn><label>7</label><p>https://dd4c.digital.nhs.uk/dd4c/</p></fn>
<fn><label>8</label><p>https://www.nrces.in/resources#snomedct releases</p></fn></p>
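      <p>The construction of the fine-tuning documents can be pictured as follows. This is a sketch under assumptions the text leaves open: we assume one three-token sentence per ordered pair of distinct concepts in a refset, and a simple round-robin assignment of refsets to folds. The function names, refset identifiers and concept URIs are hypothetical, and the Word2Vec training step itself is omitted.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import permutations


def refset_sentences(refset_id, concept_uris):
    """One [concept, refset, concept] sentence per ordered pair (assumed pairing)."""
    return [[a, refset_id, b] for a, b in permutations(concept_uris, 2)]


def training_corpus(refsets, held_out_fold, n_folds=10):
    """Collect sentences from every refset that is not in the held-out fold."""
    corpus = []
    for index, (refset_id, concepts) in enumerate(sorted(refsets.items())):
        if index % n_folds == held_out_fold:  # round-robin fold assignment (assumed)
            continue
        corpus.extend(refset_sentences(refset_id, concepts))
    return corpus


# Hypothetical refsets and concept URIs, for illustration only.
refsets = {
    "refset:malaria": ["c:1", "c:2", "c:3"],
    "refset:dengue": ["c:4", "c:5"],
}
# With two folds, holding out fold 1 removes "refset:malaria" from training.
corpus = training_corpus(refsets, held_out_fold=1, n_folds=2)
```
]]></preformat>
      <p>Sentences of this shape place concepts of the same refset in a shared context, which is what lets a word embedding model pull their vectors closer together; the refset being evaluated never contributes sentences to its own training corpus.</p>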
      <p>For NRC refsets, two seed signatures Σ<sub>r</sub> and Σ<sub>s</sub>, each consisting of K concepts,
were used throughout the experiment. Σ<sub>r</sub> was randomly selected
among all the refset concepts, while Σ<sub>s</sub> was manually selected with the aim that
the K concepts it contained could describe the topic from different aspects. For
NHS refsets, we only used a different set of Σ<sub>r</sub> generated in the same way. It is
important to be able to set the size K of the primitive seed signature according
to the application. In realistic use cases, the seed signature may be manually
selected, where a smaller K means less manual cost, so K = 5 was used in the
experiments.</p>
      <p>
        We used the OWL API syntactic locality module extraction tool<sup>9</sup> as the
implementation of locality-based modularization, and the official implementation
of Sig-Ext. For Meta-SVDD, our implementation was based on the source code
provided by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
The results (mean value ± standard deviation of the two measures) in Tables 1
and 2 show that embedding-based methods outperformed logical approaches in
the above settings. This was because the logical methods were not designed for this
task and did not capture the lexical information of the ontology, which was crucial
in determining the semantic relevance between concepts.
      </p>
      <p>In addition, NN-RANK slightly outperformed Meta-SVDD, particularly when
using Σ<sub>s</sub>. We now conduct a case study on the aforementioned Malaria refset to
explain the mechanism and effectiveness of NN-RANK in this task.</p>
      <p>Figure 2 shows the distribution of the Malaria refset components and other
SNOMED CT concepts in a 2-dimensional vector space. As illustrated in the
figure, refset components tended to form a number of minor clusters, each
containing some highly semantically relevant concepts. The whole refset was
composed of several concept clusters instead of one giant cluster. This meant that
when two seed concepts A<sub>1</sub> and A<sub>2</sub> were given, any concept A that was similar
to A<sub>1</sub> or A<sub>2</sub>, i.e., d(e<sub>A</sub>, e<sub>A1</sub>) &lt; ε or d(e<sub>A</sub>, e<sub>A2</sub>) &lt; ε with ε being a small value
greater than 0, was more likely to be a refset component than another concept A′
that was similar to the average of e<sub>A1</sub> and e<sub>A2</sub>, i.e., d(e<sub>A′</sub>, (e<sub>A1</sub> + e<sub>A2</sub>)/2) &lt; ε.
NN-RANK was designed to fit this multi-cluster pattern, and achieved better
performance than other models utilizing concept embedding.</p>
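      <p>The difference between ranking by the nearest seed and ranking by the average of the seeds can be seen on a toy example. The sketch below uses hypothetical two-dimensional vectors: one candidate lies close to the first seed, the other lies between the two seeds.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical seed embeddings forming two separate clusters.
seed_a1, seed_a2 = [1.0, 0.0], [0.0, 1.0]
centroid = [(a + b) / 2 for a, b in zip(seed_a1, seed_a2)]

candidates = {
    "near_A1": [0.99, 0.05],       # sits inside the cluster around the first seed
    "between_seeds": [0.7, 0.7],   # sits between the clusters, near their average
}


def nearest_seed_distance(vec):
    return min(cosine_distance(vec, seed_a1), cosine_distance(vec, seed_a2))


# Which candidate each strategy considers most relevant.
by_nearest_seed = min(candidates, key=lambda c: nearest_seed_distance(candidates[c]))
by_centroid = min(candidates, key=lambda c: cosine_distance(candidates[c], centroid))
```
]]></preformat>
      <p>Ranking by the nearest seed favours the candidate inside one of the clusters, whereas ranking by the average favours the candidate that belongs to neither cluster.</p>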
      <p>The performance of NN-RANK could be significantly enhanced when seed
signatures described the topic from different aspects. For a high-quality primitive
seed signature like Σ<sub>s</sub>, an increased seed signature size would generally lead to
more accurate selection results.</p>
      <sec id="sec-3-1">
        <title>Time Efficiency</title>
        <p>For the current setting of N = 354,256, K = 5, D = 200, and using cosine
distance as the distance function, NN-RANK generated Σ′ within 5 seconds. For
comparison, other approaches (e.g., Star-modularization and Sig-Ext) usually took
minutes to hours to compute on a large-scale ontology like SNOMED
CT, and the Meta-SVDD model took five minutes to converge in the same setting.</p>
        <p>Admittedly, our approach takes around 2 hours to build embedding vectors
on SNOMED CT, but this cost is acceptable in real-life scenarios since the
training is conducted only once and the resulting embedding can be reused many
times. Also, the training time can be adjusted. When the ontology contains
fewer than 100K logical and annotation axioms, it is typically less than one hour.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Case Study: Ontology Abstraction</title>
      <p>In this part, we explored how the input signature extended by NN-RANK
benefits modularization and uniform interpolation differently in the OWL
ontology abstraction task.</p>
      <p>We considered HeLiS<sup>10</sup>, an ALCHIQ(D) ontology integrating knowledge
about food and activity from a nutritional point of view. The experiment was
based on HeLiS v1.10, which has 172,213 axioms, 277 concepts, and 50 roles.
First, we randomly generated 10 concept subsets from sig(O<sub>HeLiS</sub>), with the size
of the subsets ranging from 1 to 5. These randomly generated concept sets, denoted
as Σ<sub>r</sub>, could be regarded as approximations of seed signatures around random topics.
Then NN-RANK returned the ordered sets Σ′.</p>
      <p>
        As abstractions in real-life applications are usually small in size, we chose the top 10%
of Σ′ (i.e., set the threshold to 0.9) as the input signature for modularization
and uniform interpolation. We used UI-FAME [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] to compute uniform
interpolants, and Star-modularization to compute locality-based modules, as they are
publicly accessible. Both preserved full logical entailments of the input
signature Σ′ in O<sub>HeLiS</sub> [
        <xref ref-type="bibr" rid="ref10 ref14">10,14</xref>
        ]. Then the abstraction results computed by these two
tools with the input of Σ′ (denoted as Σ′+UI-FAME, Σ′+Star-modularization)
were assessed with four metrics [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]: module size |M|, module inherent
richness InhRich, module intra distance IntraDist, and module cohesion Cohesion.
A module with relatively smaller size, higher inherent richness, relatively smaller
intra distance, and higher cohesion was said to be more compact. We also tested
Σ<sub>r</sub>+Star-modularization and compared it with Σ′+Star-modularization.
<fn><label>10</label><p>https://horus-ai.fbk.eu/helis/</p></fn>
      </p>
      <sec id="sec-4-1">
        <title>Results and Analysis</title>
        <p>We compared Σ′+UI-FAME and Σ′+Star-modularization to see the
effectiveness of NN-RANK for different abstraction methods. From Table 3, we can see
that UI-FAME generated more compact abstractions. Besides, UI-FAME was
sensitive to the input signature. These results make sense because locality-based
modularization introduced other terms which were not in Σ′, whereas uniform
interpolation stuck to Σ′. Experiments with thresholds set to 0.3, 0.5, and 0.7
show that the size of Σ′ did not affect the compactness of the locality-based
module abstraction.</p>
        <p>Term selection allowed users to extend the seed signature in an adjustable
way. For uniform interpolation, selecting suitable terms for the
specified topic is a key step, because the semantics of the topic is mainly captured
by the input terms. We observed that if the input terms were not sufficient for
uniform interpolation, the module could be very small, containing many
meaningless axioms like A ⊑ ⊤ or concept assertion axioms. NN-RANK+UI-FAME
generated knowledge highly relevant to the topic. For instance, in Table 4, the
topic was "SpecialBread". The related axioms in O<sub>HeLiS</sub> were contained in
O<sub>fragment</sub>. Clearly, "SpecialBread" had five individuals. Besides, these
individuals had no other super-classes except "SpecialBread". As commonsense
knowledge, "OliveBread" can be regarded as
"OlivesAndOliveProducts", "SoyBread" as "SoyProducts", and "MilkBread"
as "MilkAndDairyProducts", but these relations were missing in O<sub>HeLiS</sub>. So without
the extension of NN-RANK, these related concepts could not be preserved in
Σ<sub>r</sub>+Star-modularization or Σ<sub>r</sub>+UI-FAME, whereas NN-RANK could
preserve them because "OlivesAndOliveProducts", "SoyProducts", and
"MilkAndDairyProducts" were lexically close to the individuals of the topic
concept "SpecialBread".</p>
        <p>To sum up, with NN-RANK, modularization and uniform interpolation produced
more complete fragments. In addition, Σ′+uniform interpolation produced more
precise fragments than Σ′+modularization.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>This paper makes a preliminary attempt to address the problem of extending a
given seed signature with new terms selected through
embedding-based computation over important metadata of an OWL ontology. An evaluation
of the approach on the task of predicting SNOMED CT refset components shows that our
approach makes more accurate selections than other term selection
baselines. A case study shows that high-quality modules and uniform interpolants of
OWL ontologies can be produced using our term selection approach.</p>
      <p>The absence of standardized benchmarks remains the main bottleneck in
evaluating the performance of term selection methods. Hence, a number of
predefined question answering instances generated from the input
ontology might be helpful in deciding the completeness and precision of the generated
abstracts of OWL ontologies. For a problem Q that can be answered by querying
an ontology O, a satisfactory abstract M of O regarding an input signature Σ
should be able to answer Q if Q is relevant to Σ, and should not be able to
answer Q if Q is not relevant to Σ.</p>
      <sec id="sec-5-1">
        <title>Acknowledgements</title>
        <p>The authors would like to thank the reviewers for their insightful comments
and good suggestions. This work was supported by the National Natural Science
Foundation of China (grant 62006114) and the Open Research Projects of Zhejiang
Lab (grant 2021KE0AB08).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>An Introduction to Description Logic</article-title>
          . Cambridge University Press (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimenez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holter</surname>
            ,
            <given-names>O.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antonyrajah</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>OWL2Vec*: Embedding of OWL ontologies</article-title>
          . arXiv preprint arXiv:2009.14654 (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alghamdi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walther</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Ontology Extraction for Large Ontologies via Modularity and Forgetting</article-title>
          . In:
          <string-name>
            <surname>Kejriwal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szekely</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troncy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (eds.)
          <source>Proc. K-CAP'19</source>
          . pp.
          <fpage>45</fpage>
          –
          <lpage>52</lpage>
          . ACM (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dahia</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Segundo</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          :
          <article-title>Meta learning for few-shot one-class classification</article-title>
          . arXiv preprint arXiv:2009.05353 (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>d'Aquin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Modularizing ontologies</article-title>
          . In: Ontology Engineering in a Networked World, pp.
          <fpage>213</fpage>
          –
          <lpage>233</lpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Eiter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ianni</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schindlauer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tompits</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Forgetting in managing rules and ontologies</article-title>
          .
          <source>In: Web Intelligence</source>
          . pp.
          <fpage>411</fpage>
          –
          <lpage>419</lpage>
          . IEEE Computer Society (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gamper</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsang</surname>
            ,
            <given-names>Y.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snead</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rajpoot</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Meta-SVDD: Probabilistic meta-learning for one-class classification in cancer histology images</article-title>
          . arXiv preprint arXiv:2003.03109 (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gatens</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Konev</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Lower and upper approximations for depleting modules of description logic ontologies</article-title>
          .
          <source>In: Proc. ECAI'14. Frontiers in Artificial Intelligence and Applications</source>
          , vol.
          <volume>263</volume>
          , pp.
          <fpage>345</fpage>
          –
          <lpage>350</lpage>
          . IOS Press (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Modular Reuse of Ontologies: Theory and Practice</article-title>
          .
          <source>J. Artif. Intell. Res</source>
          .
          <volume>31</volume>
          ,
          <fpage>273</fpage>
          –
          <lpage>318</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Modularity and web ontologies</article-title>
          .
          <source>In: KR</source>
          . pp.
          <fpage>198</fpage>
          –
          <lpage>209</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kutz</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>The even more irresistible SROIQ</article-title>
          .
          <source>In: Proc. KR'06</source>
          . pp.
          <fpage>57</fpage>
          –
          <lpage>67</lpage>
          . AAAI Press (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>Z.C.</given-names>
          </string-name>
          :
          <article-title>Evaluation metrics in ontology modules</article-title>
          .
          <source>In: Description Logics</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Konev</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walther</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Model-theoretic inseparability and modularity of description logic ontologies</article-title>
          .
          <source>Artif. Intell</source>
          .
          <volume>203</volume>
          ,
          <fpage>66</fpage>
          –
          <lpage>103</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Logic-based ontology comparison and module extraction, with an application to DL-Lite</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>174</volume>
          (
          <issue>15</issue>
          ),
          <fpage>1093</fpage>
          –
          <lpage>1141</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Koopmann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Deductive Module Extraction for Expressive Description Logics</article-title>
          .
          <source>In: Proc. IJCAI'20</source>
          . pp.
          <fpage>1636</fpage>
          –
          <lpage>1643</lpage>
          . ijcai.org (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Lang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liberatore</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marquis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Propositional independence: Formula-variable independence and forgetting</article-title>
          .
          <source>J. Artif. Intell. Res</source>
          .
          <volume>18</volume>
          ,
          <fpage>391</fpage>
          –
          <lpage>443</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Foundations for Uniform Interpolation and Forgetting in Expressive Description Logics</article-title>
          .
          <source>In: Proc. IJCAI'11</source>
          . pp.
          <fpage>989</fpage>
          –
          <lpage>995</lpage>
          . IJCAI/AAAI Press (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Visser</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Bisimulations, Model Descriptions and Propositional Quantifiers</article-title>
          . Logic Group Preprint Series, Utrecht University (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alghamdi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Juric</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khodadadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Tracking logical difference in large-scale ontologies: a forgetting-based approach</article-title>
          .
          <source>In: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          . vol.
          <volume>33</volume>
          , pp.
          <fpage>3116</fpage>
          –
          <lpage>3124</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>