<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NEOntometrics: A Flexible and Scalable Software for Calculating Ontology Metrics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Achim Reiz</string-name>
          <email>achim.reiz@uni-rostock.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kurt Sandkuhl</string-name>
          <email>kurt.sandkuhl@uni-rostock.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Rostock University</institution>
          ,
          <addr-line>18051 Rostock</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>Metrics enable the empirical assessment of ontologies. They let the knowledge engineer quickly grasp changes and differences between two ontologies or between two versions of one ontology, and they can guide development and reuse decisions. Calculating ontology metrics requires specialized software. Today, there is a lack of application support: none of the previously developed tools is open source, and most are no longer available. This paper presents a flexible, scalable architecture for future-proof metric calculation software. We first depict the missing availability and lack of functionality in current approaches. Afterward, we outline a new scalable and flexible architecture, which is the underpinning of the new NEOntometrics software. The software is open source and comes with a public metric calculation endpoint.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology Metrics</kwd>
        <kwd>Ontometrics</kwd>
        <kwd>NEOntometrics</kwd>
        <kwd>Ontology Quality</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The ontology metric calculation software Ontometrics offers a web GUI [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and an API [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It originated from a student project and has since been maintained by the authors of this paper.
      </p>
      <p>However, further research in ontology metrics brought new requirements that the existing
applications struggled to fulfill: Ontometrics has problems with the efficient analysis of large
ontologies, does not allow the calculation of evolutional metrics, and has no interface for adopting new
measurements. These shortcomings initiated the development of NEOntometrics.</p>
      <p>The paper is structured as follows: First, we present previously developed metric calculation
software and derive our need for a new development based on the research requirements and the
shortcomings of the current Ontometrics application. These shortcomings then motivate the newly
proposed application architecture. The research concludes with an overview of upcoming research activities.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work And New Requirements For Metric Based Ontology Research</title>
      <p>The field of automatic metric calculation has seen some activity in the past years. However, many
approaches are unavailable and lack the features we deem necessary for our research. Table 1 below
gives an overview of previously published ontology evaluation software.</p>
      <p>None of the software named above is open source, and most are no longer available. The lack of
sources and applications hinders today's research from reusing significant parts of the already developed
body of knowledge. Thus, the research approaches are often isolated from one another. As a result, an
assessment that applies the OntoQA framework is hardly comparable with an assessment that utilizes
metrics from the OQuaRE framework.</p>
      <p>
        As part of a broader research perspective [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], we would like to: (A.) Analyze historical metric data
for a variety of different ontologies. As git has become the de facto standard for sharing codebases and
distributed development, we argue that the software should support the git protocol. (B.) Support a variety
of existing and potential future calculation methodologies and frameworks. (C.) Provide helpful
resources on the various available metrics. (D.) Have convenient interfaces for both humans and
machines.
      </p>
      <p>As the Ontometrics source code is available at our institution, we selected it as the basis for our
future software. Furthermore, at first glance, Ontometrics already fulfills many of the given
requirements: It comes with a web GUI and an API. The latter could be extended with a script to
calculate historical data. The corresponding wiki provides helpful resources on the metrics.</p>
      <p>However, the underlying metric calculation does not scale well, and the implemented calculation
algorithms are neither easily extensible nor flexible. The missing scalability originates from the
implementation of the metric calculation itself: each calculated metric is represented as an object with
its own internal representation of the given ontology. This design creates a massive memory overhead
for larger ontologies, which risk overloading the memory stack, and it further hinders the
parallelization of calculations.</p>
      <p>The metric calculations themselves were hardcoded into the software. While this, arguably, does not
hinder the extension of the software, it adds a layer of complexity for future efforts. Regarding the API,
which is REST-based, adding new metric calculations would require adding versioning to keep the
endpoint consistent for the consumers.</p>
      <p>There are a few more issues, like the old-fashioned web interface and the isolated help pages on the
wiki. In its old state, the application was not future-proof, and we argue that a complete rework is
more promising than a somewhat limited evolution toward all of the given requirements.</p>
    </sec>
    <sec id="sec-3">
      <title>3. A New Calculation Architecture</title>
      <p>As shown before, the old Ontometrics application does not fulfill the upcoming requirements.
The following section outlines the architecture of Ontometrics' successor NEOntometrics. The
software is open source and available on GitHub (https://github.com/achiminator/NEOntometrics,
https://doi.org/10.5281/zenodo.6984839).</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. A Metric Ontology for Ontology Metrics</title>
      <p>Fundamental to the new metric application is a central place for storing knowledge on ontology metrics.
NEOntometrics stores this information in the form of an ontology. The resource contains
human-readable data, like metric descriptions and definitions, but also the underlying calculation
methodologies that the computer uses to set up the calculation engine at application startup.</p>
      <p>The ontology contains two main sets of different metrics: The Elemental Metrics represent the
atomic attributes of the ontology, like the Number of Classes or Object Property Domain Axioms. These
measurements are connected to individuals representing the metric name in the database and the
calculation software.</p>
      <p>
        The section Quality Frameworks upholds metrics proposed in the literature, namely the various
frameworks like OntoQA by Tartir et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or the measurements by Gangemi et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These
elements contain annotations representing the textual description found in their papers and the
calculation of the metrics in the form of relations with the Elemental Metrics. Figure 1 shows Tartir et
al.'s Relationship Diversity. The calculation is formalized using the object properties divisor and
numerator, connected to the elemental metrics Sub Class Declarations and Object Property Class
Assertions.
      </p>
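      <p>Such a definition can be evaluated mechanically. The following sketch (with illustrative metric names; the actual ontology vocabulary and calculation code differ) resolves a framework metric that is described through numerator and divisor relations against already calculated elemental values:</p>

```python
# Sketch: evaluating a framework metric that the Metric Ontology defines
# through "numerator" and "divisor" relations to Elemental Metrics.
# All names here are illustrative, not the actual vocabulary.

def evaluate_framework_metric(definition, elemental_values):
    """Divide the summed numerator metrics by the summed divisor metrics."""
    numerator = sum(elemental_values[m] for m in definition["numerator"])
    divisor = sum(elemental_values[m] for m in definition["divisor"])
    return numerator / divisor if divisor else 0.0

# Elemental metric values as they might come out of the database.
elemental = {
    "SubClassDeclarations": 120,
    "ObjectPropertyClassAssertions": 80,
}

# Relationship Diversity expressed via its elemental constituents.
relationship_diversity = {
    "numerator": ["ObjectPropertyClassAssertions"],
    "divisor": ["SubClassDeclarations", "ObjectPropertyClassAssertions"],
}

print(evaluate_framework_metric(relationship_diversity, elemental))  # 0.4
```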
      <p>As we will show in the next section, the ontology aligns the various (micro-)services and provides
the domain knowledge responsible for the functioning of the services. It ensures the flexibility to
describe and implement new metrics quickly. As long as there is no need to add other Elemental
Metrics, newly implemented Quality Frameworks can be instantly applied to already calculated
datasets.</p>
    </sec>
    <sec id="sec-5">
      <title>3.2. A Scalable Calculation Unit</title>
      <p>The new service is built in a microservice architecture, encapsulating the various functionalities in
separate containers. Figure 2 gives an overview of the application stack and the underlying
technologies.</p>
      <p>The web frontend is built using the multi-platform framework Flutter and the underlying client
language Dart (https://flutter.dev, https://dart.dev). The web framework complies with Material Design, which is chiefly known as the
primary design language of Android apps and thus provides a familiar navigation experience. Flutter
allows us to utilize a state-of-the-art web design while building the UI with a typed, object-oriented
programming language. The metric-related parts, like the help and calculation pages, are created
dynamically when visiting the frontend webpage.</p>
      <p>The API part is built using django and the django rest framework (https://www.djangoproject.com,
https://www.django-rest-framework.org) and handles all incoming requests.
At startup, the API first extracts the relevant information from the Metric Ontology. Afterward, it
prepares the data for the frontend help page and the metric calculation options and augments the Elemental
Metrics available in the database with the Quality Framework metrics from the ontology using
automated code generation.</p>
      <p>For accessing ontology data, the client first examines whether a requested ontology analysis is
already known to the system. Three states can occur: If the job is in the (redis-)queue, the API returns either
its current position in the queue or, if the calculation has already started, its progress. If the
metrics are already completely calculated and stored in the database, they can be retrieved with a second
request. If the requested ontology (repository) is unknown to the system, the user can put the
calculation into the queue.</p>
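      <p>The three states can be sketched as a simple dispatch (a simplified illustration; all names are hypothetical, and the real API works on django models and redis jobs):</p>

```python
# Sketch of the three possible answers to a metric request.
# Queue, database, and field names are illustrative only.

def lookup_analysis(url, queue, database):
    """Return the state of a requested ontology analysis."""
    if url in queue:                      # queued or currently running
        job = queue[url]
        if job["started"]:
            return {"state": "calculating", "progress": job["progress"]}
        return {"state": "queued", "position": job["position"]}
    if url in database:                   # finished: fetch with a second request
        return {"state": "done"}
    return {"state": "unknown"}           # user may now enqueue the repository

queue = {"https://example.org/repo.git": {"started": True, "progress": 0.5, "position": 0}}
db = {"https://example.org/old.git": {}}

print(lookup_analysis("https://example.org/repo.git", queue, db)["state"])  # calculating
print(lookup_analysis("https://example.org/old.git", queue, db)["state"])   # done
print(lookup_analysis("https://example.org/new.git", queue, db)["state"])   # unknown
```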
      <p>NEOntometrics is already able to calculate over 160 metrics – returning all of them would
produce significant over-fetching in most situations. Thus, the API provides its services to the
frontend and other consumers through a GraphQL interface, which empowers the service consumer to decide
how the response is structured and which information and metrics are included.</p>
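      <p>To illustrate the idea, a consumer might send a query like the following and receive only the named fields back (the schema and field names are invented for illustration; the real GraphQL schema may differ):</p>

```python
# A hypothetical GraphQL query against the metrics endpoint: the consumer
# names exactly the metrics it needs (field names invented for illustration).
query = """
{
  metrics(ontologyUrl: "https://example.org/repo.git") {
    classes
    objectPropertyClassAssertions
  }
}
"""

# The effect of such a query: only the requested fields leave the server.
def select_fields(record, requested):
    """Project a full metric record onto the requested fields."""
    return {name: record[name] for name in requested if name in record}

full_record = {"classes": 420, "axioms": 9000, "objectPropertyClassAssertions": 80}
print(select_fields(full_record, ["classes", "objectPropertyClassAssertions"]))
# {'classes': 420, 'objectPropertyClassAssertions': 80}
```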
      <p>The queue is managed by a Redis (https://redis.io) instance, an in-memory key-value database. It stores information
on the required parameters and meta information of the upcoming jobs, on the workers that handle the
actual calculation, and on their progress. The latter includes information like the number of already
analyzed and analyzable files.</p>
      <p>The workers handle the actual calculation of the jobs. They register themselves in the queue and
retrieve open jobs. To analyze a git repository, they clone it and collect the files ending in .owl,
.rdf, or .ttl. Afterward, the service calls the OPI calculation instance for every commit of every collected
file to calculate the underlying elemental ontology metrics. After a successful calculation, the metrics
are stored in the database. The worker and the API share a common codebase, which can be started
either as a django instance (API) or as a worker. The asynchronous calculation is managed using the
django-rq (https://github.com/rq/django-rq) package.</p>
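      <p>The file-collection step can be pictured as a simple suffix filter (a minimal sketch; the actual worker additionally iterates over every commit of the repository):</p>

```python
# Sketch of the worker's file selection: keep only ontology candidates.
ONTOLOGY_SUFFIXES = (".owl", ".rdf", ".ttl")

def collect_ontology_files(paths):
    """Filter a repository file listing down to analyzable ontology files."""
    return [p for p in paths if p.lower().endswith(ONTOLOGY_SUFFIXES)]

files = ["README.md", "pizza.owl", "data/units.TTL", "schema.rdf", "img/logo.png"]
print(collect_ontology_files(files))  # ['pizza.owl', 'data/units.TTL', 'schema.rdf']
```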
      <p>
        The OPI service carries out the actual analysis of the ontologies. The worker sends an ontology to
OPI using an HTTP-POST request and receives the corresponding metric data. OPI is the successor of
the OntoMetrics API [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, the service does not have much in common with its predecessor.
First and foremost, the metric calculation has been reworked: each metric now works on the same
internal ontology object, thus avoiding inefficient memory allocation. Further, the service supports
reasoning capabilities. Finally, we removed all non-atomic metrics and homogenized and aligned the
names of the calculated metrics with those of the other services.
      </p>
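      <p>The reworked design can be pictured as follows. This is a simplified sketch, not OPI's actual Java implementation: every metric function receives the same shared ontology object instead of holding its own copy:</p>

```python
# Simplified sketch of the reworked OPI design: all metric functions
# operate on one shared in-memory ontology instead of per-metric copies.
# (OPI itself is written in Java; all names here are illustrative.)

class Ontology:
    def __init__(self, classes, object_properties):
        self.classes = classes
        self.object_properties = object_properties

def class_count(onto):
    return len(onto.classes)

def object_property_count(onto):
    return len(onto.object_properties)

METRICS = {"Classes": class_count, "ObjectProperties": object_property_count}

def calculate_all(onto):
    # One ontology object, loaded once, passed to every metric.
    return {name: fn(onto) for name, fn in METRICS.items()}

onto = Ontology(classes=["Pizza", "Topping"], object_properties=["hasTopping"])
print(calculate_all(onto))  # {'Classes': 2, 'ObjectProperties': 1}
```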
      <p>The relational database stores the django-related information and the calculated metrics. It currently
builds upon MariaDB (https://mariadb.org). However, django is highly flexible regarding the database endpoint and allows the user
to adapt it to other technologies without changing the code.</p>
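      <p>Swapping the backend is then a configuration change only. A hypothetical django DATABASES fragment (all values illustrative) could point to PostgreSQL instead of MariaDB without touching application code:</p>

```python
# Hypothetical django settings fragment: switching the database backend
# only changes configuration, not application code.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",  # instead of ...mysql for MariaDB
        "NAME": "neontometrics",
        "USER": "metrics",
        "PASSWORD": "change-me",
        "HOST": "db",
        "PORT": "5432",
    }
}
```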
      <p>Docker (https://www.docker.com) handles the provisioning of the services. The use of containers ensures portability,
isolation of services, and scalability. For instance, the number of parallel analyses can be scaled up by
increasing the number of workers (currently: 3). As OPI creates a new thread for every request, we can
scale the number of parallel computations on one machine by scaling up the workers. Furthermore, the
system is future-proofed for upcoming demand, as it could be scaled out horizontally, e.g., by
adding it to a docker swarm or Kubernetes cluster with load balancing: more API nodes handle more
frontend requests, and the calculation capacity increases with more OPI nodes.</p>
    </sec>
    <sec id="sec-6">
      <title>4. Conclusion</title>
      <p>Calculating ontology metrics requires specialized software. Depending on the ontology size and
complexity, the analysis of ontologies can require considerable computational resources. This is especially
the case if we target not only one ontology at a time but an ontology repository with all of its files
and versions.</p>
      <p>This paper presented the architecture of NEOntometrics, a responsive, flexible, and scalable
application for calculating ontology metrics. We believe that it has the potential to strengthen the use
of ontology metrics and enable us to answer many research questions of today and tomorrow.</p>
      <p>Future research will analyze the results of the metric calculations. We further plan to extend
the application with functionalities like the analysis of SPARQL endpoints, dashboarding capabilities,
and private repositories.</p>
    </sec>
    <sec id="sec-7">
      <title>5. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Lantow</surname>
          </string-name>
          ,
          <article-title>OntoMetrics: Putting Metrics into Use for Ontology Evaluation</article-title>
          ,
          <source>in: Proceedings of the 8th IC3K 2016 International Joint Conference on Knowledge Discovery</source>
          ,
          <article-title>Knowledge Engineering and Knowledge Management, Porto</article-title>
          , Portugal,
          <year>2016</year>
          , pp.
          <fpage>186</fpage>
          -
          <lpage>191</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Reiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dibowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sandkuhl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lantow</surname>
          </string-name>
          ,
          <article-title>Ontology Metrics as a Service (OMaaS)</article-title>
          ,
          <source>in: Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management</source>
          , Budapest, Hungary, 02.11.2020 - 04.11.
          <year>2020</year>
          , pp.
          <fpage>250</fpage>
          -
          <lpage>257</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Qing</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Volker</given-names>
            <surname>Haarslev</surname>
          </string-name>
          ,
          <article-title>OntoKBEval: A Support Tool for DL-based Evaluation of OWL Ontologies</article-title>
          ,
          <source>in: OWL: Experiences and Directions</source>
          , Athens, Georgia (USA),
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tartir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.B.</given-names>
            <surname>Arpinar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.P.</given-names>
            <surname>Sheth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Aleman-Meza</surname>
          </string-name>
          ,
          <article-title>OntoQA: Metric-Based Ontology Quality Analysis</article-title>
          ,
          <source>in: IEEE Workshop on Knowledge Acquisition from Distributed</source>
          , Autonomous,
          <source>Semantically Heterogeneous Data and Knowledge Sources</source>
          , Houston,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Dividino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Romanelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sonntag</surname>
          </string-name>
          ,
          <article-title>Semiotic-based ontology evaluation tool S-OntoEval</article-title>
          ,
          <source>in: Proceedings of the International Conference on Language Resources and Evaluation</source>
          , Marrakech, Morocco,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>McDaniel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.C.</given-names>
            <surname>Storey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sugumaran</surname>
          </string-name>
          ,
          <article-title>Assessing the quality of domain ontologies: Metrics and an automated ranking system</article-title>
          ,
          <source>Data &amp; Knowledge Engineering</source>
          <volume>115</volume>
          (
          <year>2018</year>
          )
          <fpage>32</fpage>
          -
          <lpage>47</lpage>
          . https://doi.org/10.1016/j.datak.2018.02.001.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Reiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sandkuhl</surname>
          </string-name>
          ,
          <article-title>Harmonizing the OQuaRE Quality Framework</article-title>
          ,
          <source>in: Proceedings of the 24th International Conference on Enterprise Information Systems</source>
          , online,
          <year>2022</year>
          , pp.
          <fpage>148</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Duque-Ramos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.T.</given-names>
            <surname>Fernández-Breis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aussenac-Gilles</surname>
          </string-name>
          ,
          <article-title>OQuaRE: A SQuaRE-based approach for evaluating the quality of ontologies</article-title>
          ,
          <source>Journal of Research and Practice in Information Technology</source>
          <volume>43</volume>
          (
          <year>2011</year>
          )
          <fpage>159</fpage>
          -
          <lpage>176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Reiz</surname>
          </string-name>
          ,
          <article-title>An Evolutional Based Data-Driven Quality Model for Ontologies</article-title>
          ,
          <source>in: Proceedings of the ISWC 2020 Doctoral Consortium</source>
          , Athens, Greece/online,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Catenacci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciaramita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bolici</surname>
          </string-name>
          , Strignano Onofrio,
          <article-title>Ontology evaluation and validation: An integrated formal model for the quality diagnostic task</article-title>
          , Trentino, Italy,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>