<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>OOSP: Ontological Benchmarks Made on the Fly</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ondrej Zamazal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vojtech Svatek</string-name>
          <email>svatekg@vse.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information and Knowledge Engineering, University of Economics</institution>
          ,
          <addr-line>W. Churchill Sq. 4, 130 67 Prague 3</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This demo paper presents OOSP (Online Ontology Set Picker), a tool for selecting, from major repositories, a set of ontologies that satisfy user-defined restrictions on a set of metrics. Its main purpose is to allow designers of ontological tools to rapidly build custom benchmarks on which they can test different features. It can also support usage studies of different ontology language constructs and pattern spotting. The web front-end allows users to specify restrictions on a broad range of metrics and delivers benchmarks along with their metric statistics, including a graph view.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The number of ontologies on the semantic web is steadily growing, and new tools
for their management and exploitation are being built all the time. Their
functionality needs to be tested on ontology collections that balance
1) sufficient coverage of the different cases the tool might encounter, and 2) the presence
of specific features crucial for a particular functionality. With respect to the
latter, for instance, ontology repair tools [
        <xref ref-type="bibr" rid="ref1 ref4">4, 1</xref>
        ] can only be assessed on models with
non-trivial concept expressions; similarly, thoroughly testing ontology
visualization techniques [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] requires diverse ontology aspects such as large taxonomies,
instances or various types of axioms; furthermore, reasoners often concentrate
on certain OWL 2 profiles, such as EL [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Beyond the benchmarking
scenario, usage analysis of different constructs can also help when
analyzing empirical modeling patterns [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and devising best-practice patterns for the
respective modeling problems. Since preferences regarding language constructs
(and similar kinds of restrictions) are not supported by existing ontology search/picking
tools, such as Swoogle<sup>1</sup> or Watson,<sup>2</sup> we decided to develop a simple web-based
tool called Online Ontology Set Picker (OOSP), building on our previous work
on the analysis of ontology repositories [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ] and offering a broad range of
ontology selection criteria/metrics.
      </p>
      <p>The rest of this demo paper is structured as follows. Section 2 describes the OOSP
internals, including the source repositories. Section 3 introduces its front-end.
Section 4 compares OOSP with related work. Finally, Section 5 wraps up the paper
with conclusions and future work.</p>
      <fn-group>
        <fn id="fn1"><label>1</label><p>http://swoogle.umbc.edu/</p></fn>
        <fn id="fn2"><label>2</label><p>http://watson.kmi.open.ac.uk/</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-2">
      <sec id="sec-2-1">
        <title>OOSP: Sources and Internals</title>
        <p>At the source level, OOSP currently relies on two prominent ontology
repositories, BioPortal and LOV. The content of the ontologies is processed using the
OWL-API.<sup>3</sup></p>
        <p>BioPortal<sup>4</sup> is a library of well-curated biomedical ontologies. Currently
(February 2015 snapshot) there are 420 ontologies in different formats, including some
adapted from another repository, the OBO Foundry.<sup>5</sup> Of the 420 ontologies,
36 were not available due to a 'not found' error or private access, and a further 9
ontologies pointed to zip archives, which we currently do not process. Finally, of
the remaining 375 available BioPortal ontologies, our Feb. 2015 snapshot contains 317
(85%), since 36 ontologies were not processable due to unavailable imports and
22 due to parsing problems with the OWL-API. To access the BioPortal ontologies
we used its RESTful services.</p>
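        <p>Ontologies whose imports cannot be retrieved are excluded from the snapshot. A check of this kind can be sketched as a breadth-first walk over the import closure. This is a minimal illustration under stated assumptions, not the OOSP code. It assumes the owl:imports links have already been extracted into a plain dictionary, and the names import_closure, direct_imports and available are hypothetical. OOSP itself loads ontologies with the OWL-API.</p>
        <preformat preformat-type="code">
```python
from collections import deque


def import_closure(root, direct_imports, available):
    """Compute the import closure of `root` and report unavailable imports.

    direct_imports: dict mapping an ontology IRI to the IRIs it imports
                    directly (a hypothetical stand-in for owl:imports).
    available:      set of IRIs that could actually be retrieved.
    Returns (closure, missing); the ontology counts as processable
    only if `missing` is empty.
    """
    closure, missing = set(), set()
    queue = deque([root])
    while queue:                      # breadth-first; cycles are harmless
        iri = queue.popleft()
        if iri in closure or iri in missing:
            continue                  # already visited
        if iri not in available:
            missing.add(iri)          # import cannot be retrieved
            continue
        closure.add(iri)
        queue.extend(direct_imports.get(iri, ()))
    return closure, missing


imports = {"A": ["B", "C"], "B": ["D"], "C": ["B"], "D": []}
closure, missing = import_closure("A", imports, available={"A", "B", "C"})
print(sorted(closure), sorted(missing))  # ['A', 'B', 'C'] ['D']
```
        </preformat>
        <p>In this example, ontology A imports D transitively, and D cannot be retrieved. A would therefore fall into the "unavailable imports" category.</p>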
        <p>LOV<sup>6</sup> is a well-curated collection of linked open vocabularies used in the
Linked Data Cloud. In our Feb. 2015 snapshot there are 475 ontologies
covering diverse domains, e.g., publications, science, business or cities. The
ontologies/vocabularies are usually small, and they are used within diverse linked open
data applications. Of the 475 ontologies, 2 were not parseable by the OWL-API
and 12 were not processable due to unavailable imports. In all, our Feb.
2015 snapshot contains 461 LOV ontologies (97%). To access the LOV ontologies
dump we used its SPARQL endpoint; this dump, however, does not contain all
imported ontologies. Altogether, OOSP currently provides access to 778 ontologies.</p>
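        <p>The counts given for the two snapshots are mutually consistent. The short re-derivation below uses only figures stated in the text; the helper processed_share is ours, not part of OOSP.</p>
        <preformat preformat-type="code">
```python
def processed_share(total, removed_up_front, failed_processing):
    """Return (available, processed, rounded % of available that was processed)."""
    available = total - sum(removed_up_front)
    processed = available - sum(failed_processing)
    return available, processed, round(100 * processed / available)


# BioPortal: 36 unavailable + 9 zip archives; then 36 unavailable imports + 22 parse errors.
bio = processed_share(420, [36, 9], [36, 22])
# LOV: nothing removed up front; 2 parse errors + 12 unavailable imports.
lov = processed_share(475, [], [2, 12])

print(bio)               # (375, 317, 85)
print(lov)               # (475, 461, 97)
print(bio[1] + lov[1])   # 778
```
        </preformat>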
        <p>The considered ontology metrics (82)<sup>7</sup> are divided into 7 groups covering the
most important ontology aspects. Entity metrics (9) include numbers of entities
(e.g., classes, instances); axiom metrics (27) include numbers of different axiom
types (e.g., subsumption, equivalence); class expression type metrics (11) include
expression types used for constructing anonymous classes (e.g., existential
quantification); taxonomy metrics (9) include characteristics of the taxonomy (e.g.,
the number of top classes and leaf classes, branching degree, maximum taxonomy
depth); OWL 2 profile and reasoning metrics (7) include profile information
along with information about consistency and the number of unsatisfiable classes;<sup>8</sup>
annotation metrics (6) include counts of selected annotation types (e.g., labels,
comments) and of different languages involved in label annotations; finally, detail
metrics (13) include some newly designed metrics related to domain/range (e.g.,
the number of anonymous classes used as domain definitions).</p>
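        <p>The sketch below makes the taxonomy group more concrete. It computes four of the taxonomy metrics mentioned above from a list of direct subclass pairs. OOSP's exact definitions are not given here, so the ones in the code are assumptions. Top classes are counted at depth 1, and the branching degree is taken as the mean number of direct subclasses of non-leaf classes. The function name taxonomy_metrics is hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict


def taxonomy_metrics(subclass_of):
    """Taxonomy metrics from (child, parent) pairs between named classes.

    Assumptions (not OOSP's documented definitions): the taxonomy is acyclic,
    top classes sit at depth 1, and the branching degree is the mean number
    of direct subclasses over all non-leaf classes.
    """
    children, parents, classes = defaultdict(set), defaultdict(set), set()
    for child, parent in subclass_of:
        children[parent].add(child)
        parents[child].add(parent)
        classes.update((child, parent))

    def depth(cls):
        # One level below the deepest direct superclass; top classes get 1.
        return 1 + max((depth(p) for p in parents[cls]), default=0)

    inner = [c for c in classes if children[c]]
    return {
        "top_classes": sum(1 for c in classes if not parents[c]),
        "leaf_classes": sum(1 for c in classes if not children[c]),
        "max_depth": max((depth(c) for c in classes), default=0),
        "branching_degree": (
            sum(len(children[c]) for c in inner) / len(inner) if inner else 0.0
        ),
    }


pairs = [("Cat", "Mammal"), ("Dog", "Mammal"), ("Mammal", "Animal"), ("Bird", "Animal")]
print(taxonomy_metrics(pairs))
# {'top_classes': 1, 'leaf_classes': 3, 'max_depth': 3, 'branching_degree': 2.0}
```
        </preformat>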
        <p>
          Similarly to [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], we computed the overlap between the collections,
considering two ontologies as similar if their overlap, defined as the ratio of the size of their signature
intersection to the size of their signature union, is at least 90%. On this basis, the BioPortal
and LOV snapshots share only 5 ontologies (e.g., the biotop ontology), and there
are 4 overlapping ontologies within LOV. These overlapping ontologies differ in
annotation properties, which are currently not considered in our overlap
computation. Due to the very large ontologies in BioPortal, we were not able to compute
the overlap within BioPortal.
        </p>
        <fn-group>
          <fn id="fn3"><label>3</label><p>http://owlapi.sourceforge.net/</p></fn>
          <fn id="fn4"><label>4</label><p>http://bioportal.bioontology.org/</p></fn>
          <fn id="fn5"><label>5</label><p>http://obofoundry.org/</p></fn>
          <fn id="fn6"><label>6</label><p>http://lov.okfn.org/dataset/lov/</p></fn>
          <fn id="fn7"><label>7</label><p>The number of metrics in each group is given in parentheses.</p></fn>
          <fn id="fn8"><label>8</label><p>We applied the HermiT reasoner: http://hermit-reasoner.com/.</p></fn>
        </fn-group>
      </sec>
    </sec>
    <sec id="sec-3">
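      <p>The overlap criterion described above is the Jaccard similarity of two signatures, required to be at least 0.9. Below is a minimal pairwise sketch. It assumes that a signature is modeled as a set of entity IRIs, and the function names signature_overlap and similar_pairs are hypothetical.</p>
      <preformat preformat-type="code">
```python
from itertools import combinations


def signature_overlap(sig_a, sig_b):
    """Jaccard ratio: size of the intersection divided by size of the union."""
    union = sig_a.union(sig_b)
    # Two empty signatures are treated as identical (an arbitrary choice).
    return len(sig_a.intersection(sig_b)) / len(union) if union else 1.0


def similar_pairs(signatures, threshold=0.9):
    """All pairs of ontology names whose signature overlap is at least `threshold`."""
    return [
        (a, b)
        for a, b in combinations(sorted(signatures), 2)
        if signature_overlap(signatures[a], signatures[b]) >= threshold
    ]


base = {f"e{i}" for i in range(10)}
sigs = {"o1": base, "o2": base.union({"x"}), "o3": {"e0", "y", "z"}}
print(similar_pairs(sigs))  # [('o1', 'o2')]
```
      </preformat>
      <p>Ontologies o1 and o2 share 10 of 11 entities, an overlap of about 0.91, so they count as similar. Ontology o3 shares only one entity with either of them.</p>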
      <sec id="sec-3-1">
        <title>OOSP: Front-end</title>
        <p>OOSP, available at http://owl.vse.cz:8080/OOSP/, is a web-based application
implemented using JavaServer Pages (JSP), JavaScript and the OWL-API. Ontologies
with their imports are stored on disk, and ontology metric values and
statistics are stored in a MySQL database.</p>
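        <p>OOSP keeps metric values in MySQL, but its exact schema is not described here. The following sketch uses Python's built-in sqlite3 module as a stand-in, with a hypothetical one-table layout: metric_value, with columns ontology, pool, metric and value. It shows how metric values stored per ontology could be filtered over a chosen pool by a conjunction of minimum/maximum restrictions, evaluated as an SQL INTERSECT of per-metric selections. The function select_ontologies and the sample rows are illustrative only.</p>
        <preformat preformat-type="code">
```python
import sqlite3

# Hypothetical one-table layout; OOSP's real MySQL schema is not documented here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metric_value (ontology TEXT, pool TEXT, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO metric_value VALUES (?, ?, ?, ?)",
    [
        ("onto-a", "LOV", "classes", 12), ("onto-a", "LOV", "instances", 0),
        ("onto-b", "LOV", "classes", 340), ("onto-b", "LOV", "instances", 25),
        ("onto-c", "BioPortal", "classes", 5100), ("onto-c", "BioPortal", "instances", 3),
    ],
)


def select_ontologies(conn, pools, restrictions):
    """Ontologies from `pools` that satisfy ALL restrictions (a conjunction).

    restrictions maps a metric name to a (minimum, maximum) pair; None leaves
    that side unbounded.
    """
    marks = ", ".join("?" for _ in pools)
    parts, params = [], []
    for metric, (low, high) in restrictions.items():
        sql = f"SELECT ontology FROM metric_value WHERE pool IN ({marks}) AND metric = ?"
        params += [*pools, metric]
        if low is not None:
            sql += " AND value >= ?"
            params.append(low)
        if high is not None:
            sql += " AND ? >= value"
            params.append(high)
        parts.append(sql)
    if not parts:  # no restriction at all: the whole pool
        parts.append(f"SELECT DISTINCT ontology FROM metric_value WHERE pool IN ({marks})")
        params += list(pools)
    query = " INTERSECT ".join(parts) + " ORDER BY ontology"
    return [name for (name,) in conn.execute(query, params)]


print(select_ontologies(conn, ["LOV"], {"classes": (10, 1000), "instances": (1, None)}))
# ['onto-b']
print(select_ontologies(conn, ["LOV", "BioPortal"], {"classes": (100, None)}))
# ['onto-b', 'onto-c']
```
        </preformat>
        <p>Keeping one row per ontology and metric means that adding a metric does not change the table layout. A table with one column per metric would be an equally plausible design.</p>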
        <p>OOSP follows a four-step workflow depicted in Figure 1. First, the initial
ontology pool is selected; for now, it can be BioPortal, LOV, or their union. Second,
the user can browse through the seven ontology metric types and specify values
(maximum and/or minimum, except for nominal values such as OWL profiles) for individual
metrics. Currently, the combination of all selected restrictions has the semantics
of a conjunction (we plan to add more flexibility in the future). To make the
restriction setting more informed, six statistics are provided: the ratio of ontologies
having at least one occurrence of the object aggregated (via count, average or
max) by the metric (for binary metrics such as OWL profiles, this is simply the ratio of positive values); the ratio of ontologies for which the respective metric is
unknown (N/A), e.g., because the reasoner could not process some ontologies due to unsupported datatypes; and descriptive statistics (median, average, standard deviation
and maximum; the minimum is omitted since it is usually zero) of the metric over all ontologies. Third, the user obtains the
ontology set meeting the provided restrictions, and fourth, OOSP can randomly
select a subset of it with the required cardinality. For both the restricted ontology set
and its randomly selected subset, OOSP provides a table containing all metric
values for all selected ontologies. An ontology from the set can be downloaded in
three ways: as a single OWL file, together with all ontologies
from its import closure as a zip archive, or merged with its import closure
into one OWL file. There are three further download options for the whole ontology set:
only the table (in CSV); ontology set summary descriptive statistics (also in
CSV); and the actual ontologies as OWL files (ZIP archive). Finally, for eight selected
metrics (classes/instances counts, axiom types, DL constructs, OWL 2
profiles, annotations, domain/range definition types) OOSP also offers graphs of
ontology set statistics.</p>
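        <p>The six statistics and the random-subset step can be sketched as follows. Several details are assumptions rather than documented OOSP behavior:</p>
        <list list-type="bullet">
          <list-item><p>unknown (N/A) values are passed as None;</p></list-item>
          <list-item><p>both ratios are taken over all ontologies in the set;</p></list-item>
          <list-item><p>the descriptive statistics ignore N/A values;</p></list-item>
          <list-item><p>the population standard deviation is used;</p></list-item>
          <list-item><p>the optional seed only serves to make a random subset repeatable.</p></list-item>
        </list>
        <p>The names metric_summary and random_subset are hypothetical.</p>
        <preformat preformat-type="code">
```python
import random
import statistics


def metric_summary(values):
    """Six statistics for one metric over an ontology set.

    values holds one number per ontology, or None where the metric is
    unknown (N/A). Assumes at least one known value.
    """
    known = [v for v in values if v is not None]
    total = len(values)
    return {
        "ratio_present": sum(1 for v in known if v > 0) / total,  # at least one occurrence
        "ratio_na": (total - len(known)) / total,
        "median": statistics.median(known),
        "average": statistics.mean(known),
        "std_dev": statistics.pstdev(known),  # population standard deviation
        "maximum": max(known),
    }


def random_subset(ontologies, size, seed=None):
    """Pick `size` distinct ontologies at random from the restricted set."""
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(ontologies), size))


summary = metric_summary([0, 3, 5, None, 12])
print(summary["ratio_present"], summary["ratio_na"], summary["median"])  # 0.6 0.2 4.0
print(random_subset({"onto-a", "onto-b", "onto-c", "onto-d"}, 2, seed=7))
```
        </preformat>
        <p>The ontology names are sorted before sampling. This keeps a seeded selection repeatable, because Python's iteration order for a set of strings can differ between interpreter runs.</p>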
      </sec>
      <sec id="sec-3-2">
        <title>Related Work</title>
        <p>Ontology repositories provide various search options to access their ontologies.
The Watson search engine allows users to search ontologies using keywords. Via its
Java API, Watson also provides a SPARQL endpoint along with some
precomputed metrics: concept coverage, DL expressivity, representation language (e.g.,
RDFS), and numbers of classes, properties, individuals and statements. BioPortal
provides a term-based search for classes and properties in ontologies, where one
can further restrict the ontology category (e.g., anatomy). BioPortal RESTful
services offer several count-based metrics per ontology, e.g., the number of classes
and properties. LOV also provides a RESTful service for term-based search over
ontologies or terms, and a SPARQL endpoint. Other ontology repositories only
provide collections of ontologies without rich metadata (e.g., the Oxford Ontology
Library, the Protege Ontology Library, or Ontohub).</p>
        <p>
          The most relevant is the work by Matentzoglu et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], where the Manchester
OWL Repository<sup>12</sup> is presented. It contains the crawl-based Manchester
OWL Corpus (MOWLCorp) presented in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], a snapshot of BioPortal, and the Oxford
Ontology Library. The goal of this repository, similarly to ours, is to create and
share ontology datasets. It provides access to five pre-constructed datasets and an
experimental REST-based web service that should allow users to create a custom
dataset. The authors of [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] also mention an experimental dataset creator allowing
users to create custom datasets based on a wide range of metrics. However,
the respective web page<sup>13</sup> only offers offline generation of custom
datasets: a user specifies his/her requirements (ontology pool, import
handling, OWL 2 profiles and special wishes) in an HTML form, and the
custom dataset is then generated offline by the portal maintainers.
        </p>
        <p>In comparison, our work focuses on a web-based front-end for building
an experimental ontology set that can serve as a benchmark for ontology tool developers
and ontology experimenters. Therefore, we do not precompile any ontology set
collections; rather, we provide a broad range of metrics that can serve as
on-the-fly restrictions. Next, besides the BioPortal repository we also considered the LOV
repository, since it is also a well-curated source and contains broadly used
ontologies. To cover as many use cases as possible, we also provide extra
metric types such as taxonomy, annotation and detail metrics. Further, we
put more emphasis on different types of additional downloads: besides the actual
ontologies (and optionally their imports), it is also possible to download a table
with ontology metric values and summary statistics, plus associated graphs. We
cannot provide a more practical comparison between the end-usage of OOSP and
the Manchester OWL Repository, since the latter does not offer a front-end for
on-the-fly ontology dataset generation.</p>
        <fn-group>
          <fn id="fn12"><label>12</label><p>http://mowlrepo.cs.manchester.ac.uk/</p></fn>
          <fn id="fn13"><label>13</label><p>http://mowlrepo.cs.manchester.ac.uk/generate-custom-dataset/</p></fn>
        </fn-group>
      </sec>
      <sec id="sec-3-3">
        <title>Conclusions and Future Work</title>
        <p>This paper presents OOSP, a web-based tool allowing ontology developers and
experimenters to create a benchmark ontology set, based on selected metrics,
from two curated ontology repositories. Different ontology metric types can be
useful for various use cases (e.g., benchmarking ontology repair tools, ontology
visualization tools or reasoners, or tracking frequent patterns in ontologies). OOSP
provides detailed metric values for each ontology from the selected set as well
as overall summary statistics, including in graph form.</p>
        <p>The current version of OOSP allows other users to reproduce each ontology set selection
(by setting the same initial pool and metric restrictions);
however, in the future we also want to enable permanent storage of ontology set
analysis results, so as to facilitate the exchange of information between different
experimenters. We also plan to include more ontology repositories (e.g., the Oxford
Ontology Library) and to add more graphs and more detailed ontology metrics.
Finally, we plan to evaluate the usability of benchmark construction with OOSP
within a concrete domain, via a user study with potential ontology tool
developers, e.g., of ontology visualization tools.</p>
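        <p>A selection is reproduced by setting the same initial pool and metric restrictions, so sharing a selection essentially means sharing those two inputs. The sketch below serializes them into a canonical JSON string and a short fingerprint, so that two experimenters can check that they used identical settings. The format and the function name selection_spec are hypothetical and do not correspond to an OOSP export.</p>
        <preformat preformat-type="code">
```python
import hashlib
import json


def selection_spec(pool, restrictions):
    """Canonical JSON text and short fingerprint of an ontology set selection.

    Hypothetical format: pool is a list of repository names (e.g. "LOV"),
    restrictions maps a metric name to a (minimum, maximum) pair with None
    for an open bound.
    """
    spec = {
        "pool": sorted(pool),
        "restrictions": {m: list(b) for m, b in sorted(restrictions.items())},
    }
    text = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return text, hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


text, fingerprint = selection_spec(["LOV", "BioPortal"], {"classes": (10, None)})
print(text)  # {"pool":["BioPortal","LOV"],"restrictions":{"classes":[10,null]}}
```
        </preformat>
        <p>Because both the pool and the restrictions are sorted, the same settings entered in a different order yield the same text and fingerprint.</p>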
        <p>Acknowledgement. Ondrej Zamazal has been supported by the CSF grant no.
1414076P. This research is also supported by the UEP IGA project no. F4/90/2015 and by
long-term institutional support of research activities by the Faculty of Informatics
and Statistics, University of Economics, Prague.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Kalyanpur</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Parsia</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Sirin</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Cuenca Grau</surname>, <given-names>B.</given-names></string-name>:
          <article-title>Repairing unsatisfiable concepts in OWL ontologies</article-title>.
          In: <source>3rd European Semantic Web Conference</source> (<year>2006</year>).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Katifori</surname>, <given-names>A.</given-names></string-name>, et al.:
          <article-title>Ontology visualization methods - a survey</article-title>.
          <source>ACM Computing Surveys (CSUR)</source> <volume>39</volume>(<issue>4</issue>), 10 pages, ACM (<year>2007</year>).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Khan</surname>, <given-names>M. T.</given-names></string-name>,
          <string-name><surname>Blomqvist</surname>, <given-names>E.</given-names></string-name>:
          <article-title>Ontology design pattern detection - initial method and usage scenarios</article-title>.
          In: <source>4th International Conference on Advances in Semantic Processing</source> (<year>2010</year>).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Lehmann</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Buhmann</surname>, <given-names>L.</given-names></string-name>:
          <article-title>ORE - a Tool for Repairing and Enriching Knowledge Bases</article-title>.
          In: <source>9th International Semantic Web Conference</source> (<year>2010</year>).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Matentzoglu</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Bail</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Parsia</surname>, <given-names>B.</given-names></string-name>:
          <article-title>A Snapshot of the OWL Web</article-title>.
          In: <source>12th International Semantic Web Conference</source> (<year>2013</year>).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Matentzoglu</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Tang</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Parsia</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Sattler</surname>, <given-names>U.</given-names></string-name>:
          <article-title>The Manchester OWL Repository: System</article-title>.
          In: <source>13th International Semantic Web Conference, poster session</source> (<year>2014</year>).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Noessner</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Niepert</surname>, <given-names>M.</given-names></string-name>:
          <article-title>ELOG: A Probabilistic Reasoner for OWL EL</article-title>.
          In: <source>5th Conf. Web Reasoning and Rule Systems</source> (RR <year>2011</year>), Galway.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Zamazal</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Svatek</surname>, <given-names>V.</given-names></string-name>:
          <article-title>Towards Automation of Ontology Analysis Reporting</article-title>.
          In: <source>14th Conference on Information Technologies Applications and Theory</source> (<year>2014</year>).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Zamazal</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Svatek</surname>, <given-names>V.</given-names></string-name>:
          <article-title>Automated Exploration of Ontology Repositories</article-title>.
          In: <source>11th International Workshop on OWL: Experiences and Directions</source> (OWLED <year>2014</year>).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>