<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>OReCaP – Towards Ontology Reuse via Focused Categorization Power</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viet Bach Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
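        <contrib contrib-type="author">
          <string-name>Vojtech Svatek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gollam Rabby</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>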
        <contrib contrib-type="author">
          <string-name>Ondrej Zamazal</string-name>
          <email>ondrej.zamazal@vse.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information and Knowledge Engineering, University of Economics</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Focused categorization power (FCP) has been recently introduced as a way of measuring the utility of an ontology by the count of concept expressions expressible using the ontology and subsumed by the given (focus) class(es). OReCaP is an ontology search interface with an integrated ontology ranking method based on the FCP value. The choice of ontologies for reuse is supported by the listing of different types of categories provided by the ontology for a particular focus class.</p>
      </abstract>
      <kwd-group>
        <kwd>focused categorization power</kwd>
        <kwd>ontology ranking</kwd>
        <kwd>ontology reuse</kwd>
        <kwd>ontology search</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivation</title>
      <p>
        When reusing existing OWL ontologies for publishing a dataset in RDF or
developing a new ontology, preference may be given to those providing extensive
subcategorization for the classes deemed important in the new dataset schema
or ontology (focus classes). The reused set of categories may not only consist of
named classes but also of some compound concept expressions viewed as
meaningful categories by the knowledge engineer and possibly later transformed to
a named class, too, in a local setting. In our previous work [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] we defined the
general notion of focused categorization power (FCP) of a given ontology,
calculated with respect to a focus class and a particular concept expression language,
as the (estimated) weighted count of the categories that can be built from the
ontology's signature, conform to the language, and are subsumed by the focus
class. For the sake of tractable experiments we then formulated and empirically
justified a restricted concept expression language based on existential
restrictions.
      </p>
      <p>As an example, let us consider the case of ontology reuse for describing the
dataset of a used car retailer. For the focus class Vehicle in a particular ontology,
we can consider its named subclasses such as Motorcycle, but also anonymous
concept expressions<sup>1</sup> such as Vehicle and hadAccident some Thing (vehicle
that underwent an accident), Vehicle and hasSeller some Company (vehicle
sold by a company, not by a person), or Vehicle and hasFuel value CNG
(vehicle that uses CNG as fuel). These three examples each belong to a different
concept expression type; however, all are part of the `existential restriction
family'. Their Tbox templates are, in turn: ∃P.⊤, ∃P.C, and ∃P.{i}. Exhaustively
enumerating all such expressions that can be constructed from the signature of
the given ontology would of course have little relevance. However, we assume that
the expressions can be filtered using syntactic patterns over the Tbox axioms;
the most prominent role in this respect is played by the domain/range axioms. For
example, ∃P.⊤ is more likely to be a meaningful subcategory for a focus class
FC if there is an axiom in the form</p>
      <p>
        P rdfs:domain FC
in the (inferential closure of the) ontology. The heuristic patterns for other
axiom types [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] are a bit more complex, but still easy to detect in the ontology
Tbox.
      </p>
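      <p>The domain-axiom heuristic can be illustrated with a minimal Python sketch. It assumes that the ontology is available as plain (subject, predicate, object) tuples with hypothetical ex: identifiers, and it inspects asserted axioms only, without computing the inferential closure. The function name t2_candidates is introduced here for illustration and is not part of the FCP implementation.</p>
      <preformat>
```python
# Assumption: the Tbox is given as (subject, predicate, object) string tuples.
# All ex: names below are hypothetical and serve only as an illustration.
RDFS_DOMAIN = "rdfs:domain"


def t2_candidates(triples, focus_class):
    """Return properties P with an asserted axiom `P rdfs:domain focus_class`."""
    return sorted(
        {s for (s, p, o) in triples if p == RDFS_DOMAIN and o == focus_class}
    )


triples = [
    ("ex:hadAccident", "rdfs:domain", "ex:Vehicle"),
    ("ex:hasSeller", "rdfs:domain", "ex:Vehicle"),
    ("ex:hasSeller", "rdfs:range", "ex:Company"),
    ("ex:employs", "rdfs:domain", "ex:Company"),
]

candidates = t2_candidates(triples, "ex:Vehicle")

# Render each candidate as a Manchester-syntax category of the focus class.
expressions = [f"ex:Vehicle and {p} some Thing" for p in candidates]
```
</preformat>
      <p>Each returned property yields one candidate category of the form FC and P some Thing; a reasoner would be needed to also cover domain axioms that are only entailed.</p>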
      <p>
        In this demo paper we present the first operationalization of the notion of
FCP in its main target context: ontology recommendation for dataset
description. Recommendation of ontologies and of individual terms from them has
recently been an active field of research. The mainstream approach consists in
various kinds of term/ontology popularity computation. For example,
Atemezing &amp; Troncy [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] used an information-theoretic approach, and Butt [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] employed
a hub-authority graph analysis approach. Stavrakantonakis et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] then
combined popularity metrics with the credibility of the vocabulary designers (based
on the previously developed ontologies) as an orthogonal feature. Kolbe et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
analogously, measured the academic publication performance of the designers.
      </p>
      <p>
        We believe that the FCP is yet another relatively orthogonal feature to be
considered: while some existing approaches include a similar notion of class
`importance' within the ontology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], they do not consider compound concepts as
`latent' entities influencing this importance.
      </p>
      <p>
        The survey on ontology reuse strategies by Schaible et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] indicates that
reusing multiple entities from the same vocabulary (even if some of them are by
themselves less popular than analogous entities from other vocabularies) is often
preferred. This corroborates the relevance of measuring the FCP of ontologies:
ontologies providing ample sub-categorization for the `pillar' concepts of the
to-be-published dataset deserve to be adopted in bulk.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Tool Description</title>
      <p>OReCaP is a web application<sup>2</sup> that aims to demonstrate the calculation of FCP
scores for ontologies in the context of an ontology search (for reuse) scenario.</p>
      <p><sup>1</sup> Here written in the human-readable Manchester syntax, see https://www.w3.org/TR/owl2-manchester-syntax/.</p>
      <p><sup>2</sup> Available as a demo at https://fcp.vse.cz/orecap.</p>
      <p>The interaction starts with a keyword-based search where the input consists
of at least one focused class keyword and of optional additional keywords. The
intuition is that the focused class keywords denote the high- or medium-level
types of entities whose instances are to be further sub-categorized using
concepts from the ontology; the additional keywords, on the other hand, can be
arbitrary domain terms. Imagine, for example, that the data is currently
stored in a relational database. The focused class keyword might then often be
the name of the top-level table (which can be, e.g., `Client', `Patient',
`Vehicle', `Account', or the like); the additional keywords can be taken, e.g., from the
names of subordinate tables, table columns, or predefined values for the fields.</p>
      <p>The search returns a sorted list of ontologies whose classes match one or
more of the provided keywords by their IRI, name or description; classes
matching a focused class keyword are listed first. The matched classes are listed
for each ontology. Classes that match the focused class keywords are preselected
(i.e., checked) by default; classes that match the additional keywords are not
preselected but can be selected (checked) manually by the user.</p>
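      <p>The ordering and preselection of matched classes might be sketched as follows. This is only an illustration under stated assumptions: the MatchedClass record, the case-insensitive substring matching, and the function names are hypothetical and do not describe how OReCaP or the underlying search service actually match keywords.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass
class MatchedClass:
    # Hypothetical record for a class returned by the keyword search.
    iri: str
    label: str
    description: str = ""


def matches(cls, keyword):
    """Assumed matching rule: case-insensitive substring of IRI, name or description."""
    needle = keyword.lower()
    return any(
        needle in text.lower() for text in (cls.iri, cls.label, cls.description)
    )


def arrange(classes, focus_keywords, additional_keywords):
    """Return (class, preselected) pairs with focus-keyword matches listed first."""
    focus, extra = [], []
    for cls in classes:
        if any(matches(cls, k) for k in focus_keywords):
            focus.append((cls, True))    # preselected (checked) by default
        elif any(matches(cls, k) for k in additional_keywords):
            extra.append((cls, False))   # selectable manually by the user
    return focus + extra


classes = [
    MatchedClass("http://example.org/sport#Player", "Player"),
    MatchedClass("http://example.org/sport#Competition", "Competition"),
    MatchedClass("http://example.org/sport#Match", "Match"),
]
arranged = arrange(classes, ["competition", "match"], ["player"])
```
</preformat>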
      <p>
        The next step is to execute the FCP calculation for a chosen ontology, given
the selected classes as focus classes, by clicking on the `Calculate FCP' button. In
a pop-up window, metadata about the ontology including its URI and namespace
is displayed, along with the total FCP score, which is calculated based on the
FCP weight values and the categorizations listed at the bottom. This score is
the sum of all partial scores for each focus class. The weight values can be
adjusted for each calculated ontology according to the user's assessment of each
category type, and the resulting FCP score will change accordingly. The global
FCP weights can be changed in the settings section, so that every new FCP
calculation would use them as the default weight values. The calculated FCP
score is then saved to a comparison list, which shows the FCP-based ranking of
the ontologies. Furthermore, the details of the calculations and categorizations
can be inspected, where for each focus class, its categories are displayed. There
are 4 types of categories considered, conforming to the earlier formulated [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
concept expression language (the FC symbol denotes the focus class):
        <list list-type="bullet">
          <list-item><p>t<sub>1</sub>: named classes; specifically, we consider the subclasses of the focus class (C; C ⊑ FC)</p></list-item>
          <list-item><p>t<sub>2</sub>: existential restrictions to the top concept (FC ⊓ ∃P.⊤)</p></list-item>
          <list-item><p>t<sub>3</sub>: existential restrictions to a named class (FC ⊓ ∃P.C)</p></list-item>
          <list-item><p>t<sub>4</sub>: existential restrictions to a particular individual (FC ⊓ ∃P.{i}).</p></list-item>
        </list>
      </p>
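      <p>The score aggregation described above can be sketched in a few lines of Python, assuming that each partial score is a weighted sum of category counts per category type and that the total is the sum of the partial scores over the focus classes. The weights and counts below are hypothetical placeholders chosen for easy arithmetic; they are not the empirically estimated defaults.</p>
      <preformat>
```python
def fcp_score(counts_per_focus_class, weights):
    """Return (total, partial scores), assuming score = sum of weight * count."""
    partial = {
        focus_class: sum(weights[t] * n for t, n in counts.items())
        for focus_class, counts in counts_per_focus_class.items()
    }
    return sum(partial.values()), partial


# Hypothetical per-type weights (adjustable by the user in the tool).
weights = {"t1": 1.0, "t2": 0.5, "t3": 0.25, "t4": 0.25}

# Hypothetical numbers of categories found for each focus class and type.
counts = {
    "ex:Vehicle": {"t1": 3, "t2": 4, "t3": 2, "t4": 2},
    "ex:Seller": {"t1": 1, "t2": 2, "t3": 0, "t4": 0},
}

total, partial = fcp_score(counts, weights)
```
</preformat>
      <p>Changing a single weight and calling fcp_score again mirrors the effect of adjusting a category type weight in the pop-up window.</p>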
      <p>
        OReCaP makes use of the Linked Open Vocabularies API<sup>3</sup> for the
keyword-based search and for retrieving the ontology metadata. The FCP calculation
itself [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is implemented on top of OWL API, in combination with the JFact
reasoner.<sup>4</sup> OWL API is used to load and parse the ontology source code, and
JFact is used to infer class expressions. Our implementations for this demo are
open-source and available on GitHub<sup>5</sup> under the MIT license.
      </p>
      <p><sup>3</sup> https://lov.linkeddata.es/dataset/lov/api</p>
      <p><sup>4</sup> https://github.com/owlcs/owlapi, https://github.com/owlcs/jfact</p>
      <p><sup>5</sup> https://github.com/nvbach91/orecap, https://github.com/nvbach91/fcp-api
      </p>
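      <p>As a rough illustration of the keyword-based search, the following Python sketch builds a class search request for the Linked Open Vocabularies API. The endpoint path v2/term/search and the parameters q, type and page_size are assumptions based on the public API documentation and should be checked against it; the sketch only constructs the URL and does not reflect how OReCaP issues its requests.</p>
      <preformat>
```python
from urllib.parse import urlencode

LOV_API = "https://lov.linkeddata.es/dataset/lov/api"


def term_search_url(keyword, page_size=10):
    """Build a class search URL (endpoint and parameter names are assumptions)."""
    query = urlencode({"q": keyword, "type": "class", "page_size": page_size})
    return f"{LOV_API}/v2/term/search?{query}"


# One request URL per focused class keyword from the usage scenario.
urls = [term_search_url(k) for k in ("competition", "round", "match")]
```
</preformat>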
    </sec>
    <sec id="sec-3">
      <title>Usage Scenario</title>
      <p>
        This scenario addresses sports event data publishing. For the focus class
keywords competition, round, and match, and additional keywords game, medal,
player, team, and sport, OReCaP lists the BBC Sport Ontology as one of the top
matches. The FCP calculation for this particular ontology with 3 selected classes,
sport:Competition, sport:Match, and sport:Round, yields, with previously
empirically estimated default category type weights [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], a score of 162.00 (of which
sport:Competition alone accounts for over 130). The detailed calculation is shown in
Table 1. Among the meaningful categories usable for sub-categorizing instances
of sport:Competition using the ontology, OReCaP lists, e.g., the following ones:
        <list list-type="bullet">
          <list-item><p>sport:GroupCompetition (t<sub>1</sub>);</p></list-item>
          <list-item><p>∃ sport:promotesTo.owl:Thing (t<sub>2</sub>): competitions that lead to a promotion;</p></list-item>
          <list-item><p>∃ sport:lastStage.sport:KnockoutCompetition (t<sub>3</sub>): competitions that have a knock-out competition as their last stage;</p></list-item>
          <list-item><p>∃ sport:eventGender.{http://www.bbc.co.uk/things/event-gender/mixed} (t<sub>4</sub>): competitions where the gender of competitors is mixed.</p></list-item>
        </list>
      </p>
      <p>[Table 1: FCP calculation for the BBC Sport Ontology, listing the focus classes sport:Competition, sport:Match, and sport:Round per category type, followed by the total FCP score.]</p>
      <p>Of course, not all categories are equally meaningful. To reflect that, the user
can adjust the weight values as described in Section 2. At the moment,
OReCaP only allows changing the weight values at the level of whole category
types (we also plan to add the option of altering the weights of individual
categories).</p>
      <p>Another result retrieved for the same keyword setting is The DBpedia
Ontology. Even though it has more (additional) keyword matches, it only matches
a single focus class keyword (with dbpedia-owl:Competition), and its (default)
FCP score is only 5.00. A partial screenshot with these two results is in Fig. 1.</p>
      <p>[Fig. 1: Partial screenshot of the OReCaP search results for the focused class keywords competition, round, and match, showing The DBpedia Ontology (http://dbpedia.org/ontology/) with its matched keywords and the list of classes selectable as focus classes.]</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>The notion of FCP is fundamentally novel within the family of content-based
ontology recommendation approaches. The current demo is meant to
demonstrate its contribution. In the future we plan to compare the results obtained
through this measure with those obtained by popularity, credibility, and other
existing measures, to see how they could support the dataset publisher in a
complementary way, within a coherent methodology. We would also like to perform
experiments with the tool in various domains, in order to devise novel heuristics
for setting parameters at the onset of the ontology search and reuse sessions.</p>
      <p><bold>Acknowledgment.</bold> This research is being supported by project no. 18-23964S of the Czech Science
Foundation, "Focused categorization power of web ontologies".</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Atemezing</surname>
            ,
            <given-names>G. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troncy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Information Content based Ranking Metric for Linked Open Vocabularies</article-title>
          .
          <source>In: 10th Int. Conf. Semantic Systems</source>
          ,
          <year>2014</year>
          , ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Butt</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          :
          <article-title>Ontology search: Finding the right ontologies on the web</article-title>
          .
          <source>In: WWW</source>
          <year>2015</year>
          , Companion volume.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kolbe</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kubler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Traon</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Popularity-Driven Ontology Ranking Using Qualitative Features</article-title>
          .
          <source>In: ISWC 2019</source>
          . Springer LNCS 11778.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Schaible</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottron</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scherp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling</article-title>
          .
          <source>In: ESWC 2014</source>
          , Springer, LNCS
          <volume>8465</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Stavrakantonakis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fensel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fensel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Linked Open Vocabulary Ranking and Terms Discovery</article-title>
          .
          <source>In: SEMANTiCS</source>
          <year>2016</year>
          , ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Svatek</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zamazal</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vacura</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Categorization Power of Ontologies with Respect to Focus Classes</article-title>
          .
          <source>In: EKAW 2016</source>
          , Springer, LNCS
          <volume>10024</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>