<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automated Exploration of Ontology Repositories</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ondrej Zamazal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vojtech Svatek</string-name>
          <email>svatek@vse.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Economics</institution>
          ,
          <addr-line>W. Churchill Sq.4, 130 67 Prague 3</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Motivation. The choice of an adequate ontology repository is an important prerequisite to finding an ontology to be reused or adapted for a concrete use case. As the repositories are mostly affiliated with particular communities within the semantic web, understanding the typical features of ontologies in each of them is also helpful for designers of ontology management tools.</p>
        <p>Overall Process, Metrics and Results. Our ontology exploration process includes ontology collection, materialization and then metrics computation; finally, the resulting metrics are explored using the R language<sup>1</sup> to automatically get a summary report in the form of tables. To automate the collection phase, we partly employed Ontohub,<sup>2</sup> which is an open ontology repository mirroring several other repositories. The materialization includes storing the ontologies in a database in order to decompose them into entities, names, relations, imported ontologies and head nouns. We use the OWL-API<sup>3</sup> to manipulate the ontologies.</p>
        <p>We considered metrics related to four aspects of ontologies.<sup>4</sup> Logical and structural metrics include, e.g., the numbers of different types of entities and axioms. We also categorize the complexity of ontologies into bins (as in [2, 3]). The naming aspect reflects some basic information regarding the length of class names (local fragments of URIs or labels), capitalization and the usage of a concatenation symbol or technique, i.e. a hyphen, underscore, camel case or dot (as in [1]). For the annotation aspect we compute the proportions of RDFS annotations.</p>
        <p>We explored ontologies from five prominent ontology repositories (Table 1 contains just a few selected metrics). Due to parsing problems and the unavailability of some ontologies or their imports, however, we did not collect all ontologies from each repository. BioPortal<sup>5</sup> is a web portal providing access to a library of well-curated biomedical ontologies via RESTful services. It contains ontologies from another ontology repository, the OBO Foundry.<sup>6</sup> We collected ontologies using the Ontohub mirror, where only ontologies with size below 5MB (thus only 342 of the total) are available.<sup>7</sup></p>
        <p><sup>1</sup> http://www.r-project.org/ <sup>2</sup> https://ontohub.org/ <sup>3</sup> http://owlapi.sourceforge.net/ <sup>4</sup> Due to space limitations, the full list of metrics and the complete results are on the supplementary web page: http://owl.vse.cz:8080/MetricsExploration/ <sup>5</sup> http://bioportal.bioontology.org/ <sup>6</sup> http://obofoundry.org/</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Table 1. Selected metrics (June 2014 snapshot). Rows: ontologies processed; percentage of all; complex class using existential restriction (avg); complex class as superclass (avg); branching (avg, max); multiple inheritance (avg, max); annotation as label (avg); annotation as comment (avg); camel-case technique (avg); underscore technique (avg).</p>
      <p>The Dumontier lab ontologies<sup>8</sup> are biological ontologies
aimed at knowledge representation and reasoning. Their ontologies are quite
interconnected (many mutual imports). LOV<sup>9</sup> is a well-curated collection of
linked open vocabularies used in the Linked Data Cloud. The Protege ontology
library mostly contains ontologies developed within the Protege editor. As there
is no programmatic access to the library, we downloaded the ontologies manually. It
turned out that, of its 93 ontologies (excluding the Dumontier ontologies, to which
the library also links), 43% were not available. Finally, the TONES repository
(using its Ontohub mirror of 207 ontologies, of which we collected 88%) contains
ontologies from various domains, many of them, however, designed for testing purposes.</p>
      <p>Future Work. We plan to run such an analysis repeatedly and to include more
repositories (preferably via Ontohub) and more metrics. We also want to keep the
ontology exploration services available via a web interface<sup>10</sup> where users could
ask, on the one hand, for the latest summaries of particular repositories and, on
the other hand, for particular ontologies or ontologies meeting some criteria.</p>
      <p>Ondrej Zamazal has been supported by the CSF grant no. 14-14076P and this
research is also supported by UEP IGA project F4/34/2014 (IG407024).</p>
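      <p>The naming-aspect metrics described above (class name length and the concatenation technique, i.e. hyphen, underscore, camel case or dot) can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names naming_technique, technique_proportions and average_name_length are hypothetical, and the detection rules, including the order in which the techniques are checked, are simplifying assumptions.</p>
      <preformat>
```python
import re
from collections import Counter


def naming_technique(name):
    """Guess the concatenation technique of one class name.

    Illustrative rules only (an assumption, not the rules of the study):
    an explicit separator wins over camel case.
    """
    if "_" in name:
        return "underscore"
    if "-" in name:
        return "hyphen"
    if "." in name:
        return "dot"
    # A lowercase letter or digit directly followed by an uppercase letter
    # marks a word boundary inside the name, as in "HeartValve".
    if re.search(r"[a-z0-9][A-Z]", name):
        return "camel"
    return "none"


def technique_proportions(names):
    """Map each detected technique to the fraction of names using it."""
    if not names:
        return {}
    counts = Counter(naming_technique(n) for n in names)
    return {tech: count / len(names) for tech, count in counts.items()}


def average_name_length(names):
    """Average number of characters per class name."""
    return sum(len(n) for n in names) / len(names) if names else 0.0


# Hypothetical class names (local URI fragments), made up for illustration.
sample = ["HeartValve", "heart_valve", "blood-cell", "Cell", "owl.Thing", "mitralValve"]
print(technique_proportions(sample))
print(average_name_length(sample))
```
      </preformat>
      <p>Under these assumptions, per-ontology proportions of this kind could then be averaged over all ontologies of a repository, which is one possible reading of the "Camel technique Avg" and "Underscore technique Avg" rows of Table 1.</p>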
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Manaf</surname>
            <given-names>N. A. A. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bechhofer</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevens</surname>
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A Survey of Identifiers and Labels in OWL Ontologies</article-title>
          . In: OWLED-
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Matentzoglu</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bail</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A Snapshot of the OWL Web</article-title>
          .
          <source>In: ISWC</source>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Wang</surname>
            <given-names>T. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A Survey of the Web Ontology Landscape</article-title>
          . In: ISWC-
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          7 To overcome the 5MB limitation, we gathered BioPortal ontologies directly via RESTful services. The corresponding ontology metrics are available via the supplementary web page.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>8 http://dumontierlab.com/?page=ontologies</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>9 http://lov.okfn.org/dataset/lov/</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          10 A sample service, providing metrics for a given ontology, is already available from http://owl.vse.cz:8080/MetricsExploration/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>