<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The SEALS Yardsticks for Ontology Management</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Raúl García-Castro</string-name>
          <email>rgarcia@fi.upm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephan Grimm</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ioan Toma</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Schneider</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrian Marte</string-name>
          <email>adrian.marteg@sti2.at</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FZI Research Center for Information Technology</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>STI Innsbruck, Universität Innsbruck</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the first SEALS evaluation campaign over ontology engineering tools (i.e., the SEALS Yardsticks for Ontology Management). It presents the different evaluation scenarios defined to evaluate the conformance, interoperability and scalability of these tools, and the test data used in these scenarios.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Ontology engineering tools are a cornerstone in the development of the Semantic
Web, yet they still lack a common set of evaluations and test data that can
be used to assess whether these tools are suitable for specific use cases.</p>
      <p>In SEALS we aim to automatically evaluate ontology engineering tools. This
is a major challenge for this type of tool: first, because of the high
heterogeneity among these tools and, second, because these tools are usually
perceived through their user interfaces. Nevertheless, we plan to
disregard user interaction in the evaluations (whether from real users or simulated)
and to measure the relevant characteristics of these tools through programmatic
interactions in order to obtain fully automatic evaluations.</p>
      <p>The SEALS Yardsticks for Ontology Management is an evaluation campaign
over ontology engineering tools. It comprises three evaluation scenarios for
evaluating the conformance, interoperability and scalability of these tools,
supported by different evaluation services provided by the SEALS Platform.</p>
      <p>The first characteristic that we will cover in the evaluation campaign is
the conformance of ontology development tools. Previously, this conformance had
only been measured in qualitative evaluations that were based on tool specifications
or documentation, but not on running the tools and obtaining results about their
real behaviour. Some previous evaluations provided some information about the
conformance of the tools, since such conformance affected the evaluation results;
however, the real conformance of existing tools is currently
unknown. Therefore, we will evaluate the conformance of ontology engineering
tools, covering the RDF(S) and OWL W3C recommendations.</p>
      <p>
        A second characteristic that we will cover, highly related to conformance, is
interoperability. Previously, in the RDF(S) and OWL Interoperability
Benchmarking activities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] the interoperability of several semantic technologies was
evaluated using RDF(S) and OWL Lite as interchange languages. In this
evaluation campaign we will extend these evaluations with test data for OWL DL
and OWL Full to fully cover the RDF(S) and OWL recommendations.
      </p>
      <p>
        Scalability is a main concern for any semantic technology, including ontology
engineering tools. Nevertheless, only one effort was previously made to
evaluate the scalability of this kind of tool, and it was specific to a single
tool [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In this first evaluation campaign we will establish the grounds for the
automatic evaluation of the scalability of ontology engineering tools, with the
aim of proposing an approach that can be further extended in the future.
      </p>
      <p>In all these evaluation scenarios, the only requirement for performing the
evaluation on a tool is that the tool is capable of importing and exporting ontologies
in the ontology language. Therefore, the evaluations can be performed not only
on ontology engineering tools but also on other types of semantic technologies.</p>
    </sec>
    <sec id="sec-2">
      <title>Evaluation Scenarios</title>
      <p>In the evaluation scenarios that compose the evaluation campaign, we need an
automatic and uniform way of accessing most semantic tools, and the
operations performed to access these tools must be supported by most of them.
Due to the high heterogeneity of semantic tools, ontology management APIs
vary from one tool to another. Therefore, the way chosen to automatically access
the tools is through the following two operations, commonly supported by most
semantic tools: importing an ontology from a file (i.e., loading an ontology from
a file into the tool's internal model) and exporting an ontology to a file (i.e.,
storing an ontology from the tool's internal model into a file).</p>
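      <p>As an illustration, these two access operations can be sketched as a minimal adapter (hypothetical names; this is not the SEALS Platform API), with an ontology modelled simply as a set of subject, predicate, object triples:</p>

```python
# Minimal sketch of the two common access operations (hypothetical
# adapter, not the SEALS Platform API). An ontology is modelled as a
# set of (subject, predicate, object) triples, serialized one triple
# per line with tab-separated terms.

class ToolAdapter:
    """Wraps a semantic tool behind the two operations every
    participating tool must support: import and export."""

    def __init__(self):
        self._model = set()  # the tool's internal model

    def import_ontology(self, path):
        """Load an ontology from a file into the internal model."""
        with open(path) as f:
            self._model = {tuple(line.split("\t"))
                           for line in f.read().splitlines() if line}

    def export_ontology(self, path):
        """Store the internal model into a file."""
        with open(path, "w") as f:
            for triple in sorted(self._model):
                f.write("\t".join(triple) + "\n")
```

A real tool would parse and serialize an actual RDF or OWL syntax; the adapter interface is what matters for automating the three scenarios.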
      <p>
        The next sections describe the three evaluation scenarios used in the
evaluation campaign. A detailed description can be found in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Conformance. The conformance evaluation has the goal of evaluating the
conformance of semantic technologies with regard to ontology representation
languages, that is, evaluating to what extent semantic technologies adhere to
the specification of ontology representation languages.</p>
      <p>During the evaluation, a common group of tests is executed, and each test
describes one input ontology that has to be imported by the tool and then
exported. After a test execution, we have two ontologies in the ontology
representation language, namely, the original ontology and the final ontology exported
by the tool. By comparing these ontologies we can determine to what extent the
tool conforms to the ontology language.</p>
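      <p>Under the same simplified triple-set model, the comparison step can be sketched as follows (an illustrative diff, not the project's actual comparison algorithm):</p>

```python
# Illustrative conformance check (simplified): diff the original
# ontology against the final ontology the tool exported after
# importing it. Ontologies are modelled as sets of (s, p, o) triples.

def conformance(original, final):
    """Return (conforms, lost, added) for one conformance test."""
    lost = original - final    # information the tool dropped
    added = final - original   # information the tool introduced
    conforms = not lost and not added
    return conforms, lost, added

# Example: a tool that adds an extra typing triple does not conform.
original = {("ex:A", "rdfs:subClassOf", "ex:B")}
final = {("ex:A", "rdfs:subClassOf", "ex:B"),
         ("ex:A", "rdf:type", "rdfs:Class")}
ok, lost, added = conformance(original, final)
# ok is False; lost is empty; added holds the extra typing triple
```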
      <p>Interoperability. The interoperability evaluation has the goal of evaluating
the interoperability of semantic technologies in terms of their ability
to interchange ontologies and use them. Concretely, the
evaluation considers the case of interoperability using an interchange
language.</p>
      <p>During the experiment, a common group of tests is executed, and each test
describes one input ontology that has to be interchanged between a single tool
and the others (including the tool itself). After a test execution, we have three
ontologies in the ontology representation language, namely, the original
ontology, the intermediate ontology exported by the first tool, and the final ontology
exported by the second tool. By comparing these ontologies we can determine to
what extent the tools are interoperable.</p>
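      <p>The interchange chain can be sketched in the same simplified triple-set model, with each tool represented as a hypothetical function that maps the triple set it imports to the triple set it exports:</p>

```python
# Illustrative interoperability test (simplified): an ontology is
# passed from tool A to tool B through the interchange language,
# yielding the three ontologies the evaluation compares.

def interchange(original, tool_a, tool_b):
    """Run one test; each tool is a function: imported set -> exported set."""
    intermediate = tool_a(original)   # tool A imports, then exports
    final = tool_b(intermediate)      # tool B imports A's output, then exports
    return original, intermediate, final

def interoperable(original, final):
    """Full interoperability: nothing lost or added along the chain."""
    return original == final
```

In the campaign each test is run for every ordered pair of tools, including a tool interchanging with itself.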
      <p>Scalability. The scalability evaluation has the goal of evaluating the scalability
of semantic technologies in terms of time characteristics. More concretely, in
our case the scalability evaluation is concerned with evaluating the ability of
ontology engineering tools to handle large ontologies.</p>
      <p>During the evaluation, a common group of tests is executed, and each test
describes one input ontology that has to be imported by the tool and then
exported. We are interested in the amount of time it takes to perform import
and export operations on large ontologies. After a test execution, we have
as a result two ontologies, the original ontology and the final ontology exported
by the tool, together with execution information including the times when the import and
export operations started and ended.</p>
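      <p>The timing measurement can be sketched as follows (hypothetical names; DummyTool merely stands in for a real tool wrapped behind the import and export operations):</p>

```python
import time

# Illustrative scalability measurement (hypothetical names, not the
# SEALS Platform API): record wall-clock durations of the import and
# export operations on one test ontology.

class DummyTool:
    """Stand-in tool: imports a file into memory, exports it back."""
    def import_ontology(self, path):
        with open(path) as f:
            self._model = f.read()
    def export_ontology(self, path):
        with open(path, "w") as f:
            f.write(self._model)

def timed_run(tool, src, dst):
    """Return the durations (in seconds) of the two operations."""
    t0 = time.perf_counter()
    tool.import_ontology(src)
    t1 = time.perf_counter()
    tool.export_ontology(dst)
    t2 = time.perf_counter()
    return {"import_s": t1 - t0, "export_s": t2 - t1}
```

Running the same test over ontologies of increasing size then gives the time-versus-size curve the scalability scenario is after.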
    </sec>
    <sec id="sec-3">
      <title>Test Data</title>
      <p>Conformance and Interoperability. In the first evaluation campaign, the
conformance and interoperability evaluations will cover the RDF(S) and OWL
specifications. To this end, we will use four different test suites that contain
synthetic ontologies with simple combinations of components of the RDF(S),
OWL Lite, OWL DL, and OWL Full knowledge models.</p>
      <p>
        The RDF(S) and OWL Lite Import Test Suites already exist and detailed
descriptions of them can be found in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The OWL DL and OWL Full Import
Test Suites have been developed in the context of the SEALS project and are
described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Scalability. Two test suites were defined for the scalability
evaluations. The Real-World Ontologies Scalability Test Suite includes real-world
ontologies in OWL DL that have been identified as relevant for scalability
evaluation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: AEO, the NCI Thesaurus, GALEN, the Foundational Model of
Anatomy Ontology (FMA), the OBO Foundry, Robert's family ontology, and
the wine and food ontologies. From this large set of ontologies we have selected
20 ontologies of various sizes to construct a first scalability test suite.
      </p>
      <p>
        The second test suite was de ned using the Lehigh University Benchmark
(LUBM)1 data generator (UBA) that generates data over the Univ-Bench
ontology2, which describes universities and departments and the activities that
occur at them [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>Evaluation automation will make it possible to turn evaluations of ontology engineering
environments from one-time evaluation activities into effortless continuous
evaluations. Furthermore, this reduction of the effort needed to evaluate these tools will make it
possible not only to perform evaluations with large test data, but also to perform
evaluations that would be difficult or impossible to carry out manually. Equally
important for research is the benefit of repeatability that automatic
evaluations provide, allowing evaluations to be performed multiple times in a consistent
way and research findings to be compared objectively.</p>
      <p>All the resources used in this evaluation campaign, as well as the results
obtained, will be publicly available through the SEALS Platform. This way, anyone
interested in evaluating an ontology engineering tool will be able to do so, and
to compare it with others, with little effort.</p>
      <p>Our future plans are to extend, on the one hand, the evaluation scenarios to
cover more tool characteristics and, on the other hand, the evaluation data to
include new test suites covering the OWL 2 specification. With these extensions,
we plan to conduct a second edition of this evaluation campaign.</p>
      <p>Acknowledgements. This work has been supported by the SEALS European project (FP7-238975).</p>
      <p>1 http://swat.cse.lehigh.edu/projects/lubm/
2 http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>García-Castro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Benchmarking Semantic Web technology</article-title>
          . Volume 3 of
          <source>Studies on the Semantic Web</source>
          . AKA Verlag – IOS Press (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>García-Castro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gómez-Pérez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Guidelines for benchmarking the performance of ontology management APIs</article-title>
          .
          <source>In: Proceedings of the 4th International Semantic Web Conference (ISWC2005)</source>
          , Galway, Ireland, Springer (
          <year>2005</year>
          )
          <fpage>277</fpage>
          –
          <lpage>292</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>García-Castro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grimm</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kerrigan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>D10.1. Evaluation design and collection of test data for ontology engineering tools</article-title>
          .
          <source>Technical report, SEALS Project</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>García-Castro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toma</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marte</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bock</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grimm</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>D10.2. Services for the automatic evaluation of ontology engineering tools v1</article-title>
          .
          <source>Technical report, SEALS Project</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heflin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>LUBM: A benchmark for OWL knowledge base systems</article-title>
          .
          <source>J. Web Sem.</source>
          <volume>3</volume>
          (
          <year>2005</year>
          )
          <fpage>158</fpage>
          –
          <lpage>182</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>