<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Redefinition and Statistical Analysis of Measures for Evaluating the Quality of Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Melina Tibaldo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexia Wilkinson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ma. Laura Taverna</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mariela Rico</string-name>
          <email>mrico@frsf.utn.edu.ar</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ma. Rosa Galli</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro de Investigación y Desarrollo de Ingeniería en Sistemas de Información (CIDISI) - Universidad Tecnológica Nacional - Facultad Regional Santa Fe</institution>
          ,
          <addr-line>Lavaise 610 - S3004EWB - Santa Fe - SF -</addr-line>
          <country country="AR">Argentina</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>INGAR-UTN-CONICET</institution>
          ,
          <addr-line>Avellaneda 3657, S3002GJC Santa Fe</addr-line>
          ,
          <country country="AR">Argentina</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>51</fpage>
      <lpage>60</lpage>
      <abstract>
        <p>OntoQualitas is a framework for evaluating an ontology whose purpose is the interchange of information between different contexts. However, the framework does not propose acceptance thresholds for the measure values. In this paper, measures proposed in this framework are redefined in order to improve their usefulness in assessing the quality of such ontologies. These measures were calculated semi-automatically on a set of ontologies, and the results were described by means of a statistical analysis as a first step toward defining their acceptance thresholds.</p>
      </abstract>
      <kwd-group>
        <kwd>ontology quality</kwd>
        <kwd>measure</kwd>
        <kwd>statistical analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>More than a decade after the emergence of ontologies in Computer
Science, and despite their growing use in different disciplines, standardized methods
for evaluating their quality have not been developed [8].</p>
      <p>
        Although methodologies, methods, techniques, and software tools to
support the ontology building process have been proposed, ontology evaluation still plays
only a passive role in ontology engineering projects [17]. In order to assess
ontology quality, different works have emerged depending on the kind of
ontologies being evaluated and the purpose of the evaluation [
        <xref ref-type="bibr" rid="ref1 ref3 ref5 ref6">1, 3, 5–7, 9, 15, 20–22</xref>
        ]. These works
present different quality measures and evaluate some ontologies quantitatively.
However, no specific studies have been found on the suitable values of these
measures, their acceptance thresholds, and their impact on the quality of the
evaluated ontologies.
      </p>
      <p>
        Quality is not a property of something but a judgment, so it should be assessed in
relation to some purpose [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. While issues such as orphan classes or consistency
in naming are important, the purpose for which the ontology is developed should
guide the evaluation of its quality and thus contribute to improving it.
The set of measures and their corresponding weights should be related to
the purpose of the ontology [15].
      </p>
      <p>One framework proposed to evaluate an ontology considering its specific
purpose is OntoQualitas, which includes known measures and new measures to
evaluate the quality of an ontology whose purpose is the interchange of
information in a collaborative business process environment [15]. To this aim, a set
of requirements that the ontology should fulfill is identified and, associated with
them, a set of questions that reflect specific aspects relevant to
the evaluation of the ontology. For each question, appropriate measures, their ranges
of possible values, and their optimal values are defined. However, the framework
does not propose acceptance thresholds for the measure values.</p>
      <p>In order to advance in the definition of these thresholds and their impact on
the ontology quality, the definition of the proposed measures should be analyzed
and, if necessary, modified to ensure their homogeneity. Then, it is necessary to
calculate the measures on a set of ontologies and conduct a descriptive statistical
analysis of the redefined measures in order to study their behavior.</p>
      <p>This paper presents the reformulation of some of the measures outlined in
OntoQualitas, resulting in measures that are more convenient for evaluating
ontology quality. In addition, a statistical study is presented of a set of
ontologies on which the reformulated measures were calculated.</p>
      <p>The paper is organized as follows: Section 2 describes the main characteristics
of the OntoQualitas framework; Section 3 presents the reformulated measures;
Section 4 presents the results of the preliminary analysis of data. Results are
discussed in Section 5, which also includes the conclusions of this work.</p>
    </sec>
    <sec id="sec-2">
      <title>OntoQualitas</title>
      <p>OntoQualitas is a framework to evaluate the quality of an ontology whose
purpose is the interchange of information between different contexts [15]. It is
structured around an overall requirement imposed on the content and structure of
ontologies: the ontology should allow the interchange of
information between different contexts without imposing a global meaning of such
information on all involved contexts. From this overall requirement, three
specific requirements are derived: (i) the representation of the interchanged information
should be formal, (ii) only the information strictly necessary for the interchange
must be represented, and (iii) the representation must allow a correct
interpretation of the interchanged information in all involved contexts.</p>
      <p>The second requirement has two aspects: completeness and
conciseness. The third requirement has three aspects: semantic correctness,
syntactic correctness, and representation correctness, which assesses the quality
of the mappings of entities, relations, and features into the elements of the ontology.</p>
      <p>
        OntoQualitas specifies questions that help address relevant aspects of
ontology evaluation. Appropriate measures are associated with each question.
Some of them have been proposed with the objective of assessing the quality of
ontologies from a quantitative perspective [
        <xref ref-type="bibr" rid="ref3 ref5 ref6">3, 5, 6, 20, 21</xref>
        ]; others were proposed
with the aim of evaluating the mapping between domain entities, their relationships
and features, and the elements used for their representation [13].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Analysis of Measures</title>
      <p>
        In OntoQualitas, the values of some measures are given in the range [0, 1] and
others in the range [0, n]; for some measures the optimal value is 1, and for others it is
0. In order to quantify the different quality aspects and to compare values among
ontologies, it is necessary to homogenize the value ranges and optimal values of
the measures associated with each aspect. Consequently, a first activity was
to modify the definition of some measures to ensure that all have the same scale
([0, 1]) and optimal value (1). Additionally, some measures can only be calculated
if the considered ontology has the corresponding characteristics. These situations
are explicitly identified in Tables 1 to 5.
      </p>
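      <p>The homogenization described above can be illustrated with a minimal Python sketch. It is not part of OntoQualitas: the function name to_unit_scale, its parameters, and the example counts are assumptions made only for illustration. The sketch maps a raw value in [0, upper] onto [0, 1] with 1 as the optimal value, and returns None when the measure cannot be calculated because the ontology lacks the corresponding characteristic.</p>
      <preformat><![CDATA[
```python
from typing import Optional


def to_unit_scale(raw: float, upper: float, optimum_is_zero: bool) -> Optional[float]:
    """Map a raw measure in [0, upper] onto [0, 1] with 1 as the optimal value.

    Returns None when upper is 0, i.e. when the ontology has nothing to
    measure (hypothetical convention chosen for this sketch).
    """
    if upper <= 0:
        return None  # measure not applicable, e.g. an ontology without instances
    if not 0 <= raw <= upper:
        raise ValueError("raw value must lie in [0, upper]")
    scaled = raw / upper
    # When 0 is the best raw value (a count of defects), invert the scale
    # so that 1 becomes the optimum.
    return 1.0 - scaled if optimum_is_zero else scaled


# Hypothetical counts: 2 defective classes out of 8 classes.
print(to_unit_scale(2, 8, optimum_is_zero=True))  # 0.75
# Hypothetical ontology without the required characteristic.
print(to_unit_scale(0, 0, optimum_is_zero=True))  # None
```
]]></preformat>
      <p>Returning None rather than 0 or 1 keeps non-applicable measures out of later averages, which matches the observation that the amount of data can vary between measures.</p>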
      <p>Completeness (Table 1) refers to the extension, degree, amount or coverage
to which the information in a user-independent ontology covers the information
of the real world [11].</p>
      <p>Conciseness (Table 2) refers to whether an ontology stores no
unnecessary or useless definitions, no explicit redundancies exist between
definitions, and no redundancies can be inferred from other definitions and
axioms [11].</p>
      <p>
        Syntactic correctness (Table 3) tries to evaluate the quality of the ontology
according to the way it is written, i.e. the correctness and breadth of syntax
used [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Semantic correctness (Table 4) deals with the vocabulary used to represent
entities, relations, and features, and the correctness of the representation of the
interchanged information in the ontology.</p>
      <p>Representation correctness (Table 5) is related to the quality of mappings of
entities, relations, and features into the elements of the ontology evaluated.</p>
    </sec>
    <sec id="sec-4">
      <title>Results of Preliminary Analysis of Data</title>
      <p>The results of this preliminary analysis are presented according to the second
and third requirements. Since the considered ontologies are formalized in OWL2,
the representation of information interchanged is formal, thus achieving the first
requirement.</p>
      <p>In order to evaluate the reformulated OntoQualitas measures, ontologies
for information interchange between different contexts were needed. The set of
ontologies used was developed by students of the course “Development of ontology-based
information systems”, all working from the same specific instructions.
First, ontologies (called “base”) were developed by using an ontology learning
technique. Then, the representation of entities, their relationships, and their features
was enriched using a proposed method [14]; the resulting ontologies were called
“enriched”. Measures were calculated semi-automatically, with the instructions
serving as the frame of reference.</p>
      <p>In the base ontologies, certain measures could not be calculated due to the lack of
the corresponding characteristics. Therefore, the amount of data varies in the
subsequent statistical analysis.</p>
      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>Measure</th>
              <th>Definition</th>
              <th>Terms</th>
              <th>Applicability condition</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Coverage of dimensions [15]</td>
              <td>Coverage(Odfc; Fdfc) = |Odfc ∩ Fdfc| / |Fdfc|</td>
              <td>Odfc: set of dimensions used to specify entity contextual features in the ontology. Fdfc: set of dimensions used to specify entity contextual features in a frame of reference.</td>
              <td>The frame of reference should have at least one dimension used to specify entity contextual features.</td>
            </tr>
            <tr>
              <td>* Exhaustive subclass partition without common classes</td>
              <td>ESP N CC = 1 − ESP CC / C</td>
              <td>ESP CC: number of classes belonging to more than one subclass of an exhaustive partition in the ontology. C: number of classes in the ontology, without considering the root class (Thing).</td>
              <td>The ontology should have at least one class, without considering the root class (Thing).</td>
            </tr>
            <tr>
              <td>* Exhaustive subclass partition without external instances</td>
              <td>ESP N EI = 1 − ESP EI / I</td>
              <td>ESP EI: number of instances of a base class that do not belong to any class of the exhaustive subclass partition of the base class. I: number of instances in the ontology.</td>
              <td>The ontology should have at least one instance.</td>
            </tr>
          </tbody>
        </table>
        <table-wrap-foot>
          <p>* The measure was redefined.</p>
        </table-wrap-foot>
      </table-wrap>
      <p>In the descriptive analysis of this set of measures, the mean shows the behavior
of each measure on the set of ontologies, while the dispersion and position measures
(deviation; Q1, first quartile; and Q3, third quartile) show how the data are spread.</p>
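      <p>The descriptive statistics mentioned above (mean, deviation, Q1, and Q3) can be computed as in the following Python sketch. The paper does not state which estimators were used, so the sample standard deviation and the "inclusive" quartile method of the standard statistics module are assumptions; the function name describe and the example values are hypothetical. Values of None stand for measures that could not be calculated on a given ontology and are excluded, so the number of observations can differ between measures.</p>
      <preformat><![CDATA[
```python
import statistics
from typing import Dict, List, Optional


def describe(values: List[Optional[float]]) -> Dict[str, float]:
    """Summarize one measure over a set of ontologies.

    None marks an ontology on which the measure could not be calculated;
    such entries are dropped before computing the statistics.
    """
    data = [v for v in values if v is not None]
    if len(data) < 2:
        raise ValueError("at least two calculated values are required")
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "deviation": statistics.stdev(data),  # sample standard deviation (assumed)
        "q1": q1,
        "q3": q3,
    }


# Hypothetical values of one measure on six ontologies, two of them not applicable.
summary = describe([0.25, None, 0.5, 0.75, 1.0, None])
print(summary["n"], summary["mean"], summary["q1"], summary["q3"])
```
]]></preformat>
      <p>Reading the result: if Q1 of a measure is 0.53, then about 75% of the ontologies have a value greater than or equal to 0.53, which is how the quartiles are interpreted in the analysis of the tables.</p>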
      <p>Regarding completeness (Table 6), only two of the nine measures have a
mean greater than 0.6. The measure with the highest mean value is Coverage
of dimensions (Coverage(Odfc; Fdfc)); 90.0% of the ontologies have a value greater
than or equal to 0.9, meaning that most of the dimensions used to specify
entity contextual features were made explicit in the ontology. It is followed by
Domains and ranges of relations (DRR), with a mean of 0.63; this measure determines the
proportion of domains and ranges of relations and functions that are exactly and
precisely delimited. Because the frame of reference had no instances, the measures
Coverage of relations between instances and Coverage of instances, not listed in
Table 6, could not be calculated.</p>
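      <p>The Coverage of dimensions measure discussed above is a ratio of set sizes, |Odfc ∩ Fdfc| / |Fdfc|. A minimal Python sketch follows; the function name coverage and the dimension names are hypothetical, and returning None for an empty frame of reference is a convention assumed here to reflect the condition that the frame of reference should have at least one dimension.</p>
      <preformat><![CDATA[
```python
from typing import Optional, Set


def coverage(ontology_dims: Set[str], reference_dims: Set[str]) -> Optional[float]:
    """Coverage(Odfc; Fdfc) = |Odfc intersect Fdfc| / |Fdfc|.

    Returns None when the frame of reference has no dimensions, since the
    measure cannot be calculated in that case.
    """
    if not reference_dims:
        return None
    return len(ontology_dims & reference_dims) / len(reference_dims)


# Hypothetical dimensions used to specify entity contextual features.
odfc = {"currency", "unit", "language"}
fdfc = {"currency", "unit", "language", "time zone"}
print(coverage(odfc, fdfc))  # 0.75
```
]]></preformat>
      <p>Dimensions that appear only in the ontology do not raise the value, because the ratio is taken over the frame of reference.</p>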
      <p>With regard to conciseness (Table 7), all measures except Precision
have high values. Half of the ontologies have only semantically different instances
and nonredundant instance-of relations (SDI and N RIR are optimal); the other
half has no instances. No ontology with hierarchical relations has redundant
subclass-of relations (N RSR has the optimal value in all of them). Semantically
different classes (SDC) has a mean of 0.75, and 75% of the ontologies have a value
greater than or equal to 0.53, meaning that more than half of the subclasses are
defined with different characteristics. In addition, 75% of the ontologies do not have
redundant non-hierarchical relations (ON RR is optimal).</p>
      <p>In relation to semantic correctness (Table 8), the measures are mostly
high. The hierarchies are well defined, without cycles (N CE0, N CE1, and
N CED), as are the exhaustive subclass partitions (ESP N CI, ESP N CC,
and ESP N EI). By contrast, the ontologies are only moderately interpretable and
are unclear; 75% of them have a value less than or equal to 0.5 and 0.4 for these
two aspects, respectively.</p>
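      <p>As an illustration of how one of the exhaustive subclass partition measures can be obtained from an ontology's contents, the following Python sketch computes ESP N EI = 1 − ESP EI / I for a single exhaustive partition. Restricting the computation to one base class is a simplifying assumption; the function name esp_nei, its parameters, and the instance names are hypothetical and are not taken from OntoQualitas.</p>
      <preformat><![CDATA[
```python
from typing import List, Optional, Set


def esp_nei(
    base_instances: Set[str],
    partition: List[Set[str]],
    total_instances: int,
) -> Optional[float]:
    """ESP N EI = 1 - ESP EI / I for one exhaustive subclass partition.

    base_instances: instances of the base class.
    partition: instance sets of the subclasses forming the exhaustive partition.
    total_instances: number of instances in the ontology (I).
    Returns None when the ontology has no instances.
    """
    if total_instances == 0:
        return None
    covered: Set[str] = set().union(*partition)
    # ESP EI: instances of the base class found in none of the subclasses.
    external = base_instances - covered
    return 1 - len(external) / total_instances


# Hypothetical ontology with 8 instances; "d" is external to the partition.
value = esp_nei({"a", "b", "c", "d"}, [{"a", "b"}, {"c"}], total_instances=8)
print(value)  # 0.875
```
]]></preformat>
      <p>A value of 1 means that every instance of the base class belongs to some subclass of the partition, which is the optimal case.</p>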
      <p>As for syntactic correctness (Table 9), it can be observed that the ontologies
are syntactically correct, but the proportion of syntactic features used is very
low, even though the ontologies were developed with the support of a CASE tool.</p>
      <p>Finally, as to representation correctness (Table 10), on average, 90%
of the intended use and simple features of entities are represented according to
the corresponding principle. However, on average, entities are represented
through ontology classes in only 10% of cases. The measures Principle of
complex entity features and Principle of common entity features could not be
calculated because the ontologies do not have these characteristics.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion and Conclusions</title>
      <p>In this paper, the reformulation of some measures of the OntoQualitas framework
has been presented, and the results of a preliminary analysis over the values
obtained from applying such measures to a set of ontologies have been shown.</p>
      <p>According to the results, the evaluated ontologies do not adequately fulfill the
second requirement, i.e., the representation of only the information strictly necessary
for the interchange. In part, this may be due to the ontology learning tool used to
generate the base ontologies, which does not add necessary and sufficient conditions
or existential and universal restrictions, among others.</p>
      <p>Looking at the syntactic correctness measures, it can be observed that the
richness of the language was not exploited, despite the use of CASE tools for the
development of the ontologies. The use of ontology learning techniques contributed to this,
since they are limited to mapping the elements of the source into ontology language
elements, leaving the available syntactic features largely untapped.</p>
      <p>As for the semantic interpretation, measures revealed that the names for the
ontology elements (classes, relations, properties) were not properly selected.</p>
      <p>Regarding the representation correctness, an unexpected result is the low
representation of entities through the ontology classes.</p>
      <p>Finally, these measures allow errors that affect ontology quality to be detected
during ontology development. An exploratory analysis of the data made it possible to
characterize the studied ontologies. Future work will carry out an inferential
statistical analysis on a larger set of ontologies in order to analyze the possible
interdependence between measures, define acceptance thresholds for the measures,
and propose a strategy for assessing the quality of ontologies.
      </p>
      <p>7. Duque-Ramos, A., Fernández-Breis, J., Stevens, R., Aussenac-Gilles, N.: OQuaRE: A SQuaRE-based approach for evaluating the quality of ontologies. J. Res. Pract. Inf. Tech. 43(2), 159–176 (2011)</p>
      <p>8. Duque-Ramos, A., Fernández-Breis, J., Iniesta, M., Dumontier, M., Egaña Aranguren, M., Schulz, S., Aussenac-Gilles, N., Stevens, R.: Evaluation of the OQuaRE framework for ontology quality. Expert Syst. Appl. 40, 2669–2703 (2013)</p>
      <p>9. Gangemi, A., Catenacci, C., Ciaramita, M., Lehmann, J.: Ontology evaluation and validation. The Semantic Web: Research and Applications. 3rd European Semantic Web Conference, ESWC, Proceedings, LNCS 4011, 140–154 (2006)</p>
      <p>10. Gašević, D., Djurić, D., Devedžić, V.: Model driven architecture and ontology development. Springer-Verlag New York, Inc., Secaucus, NJ, USA (2006)</p>
      <p>11. Gómez-Pérez, A.: Evaluation of ontologies. Int. J. Intell. Syst. 16, 391–409 (2001)</p>
      <p>12. Guarino, N.: Towards a formal evaluation of ontology quality. IEEE Intell. Syst. 19(4), 74–81 (2004)</p>
      <p>13. Rico, M.: Soporte para enriquecer la representación de entidades en una ontología. Tesis doctoral, Universidad Tecnológica Nacional, Fac. Reg. Santa Fe, AR (2011)</p>
      <p>14. Rico, M., Caliusco, M.L., Chiotti, O., Galli, M.R.: An approach to define semantics for BPM systems interoperability. Enterprise Information Systems, DOI:10.1080/17517575.2013.767381 (2013)</p>
      <p>15. Rico, M., Caliusco, M.L., Chiotti, O., Galli, M.R.: OntoQualitas: A framework for ontology quality assessment in information interchanges between heterogeneous systems. Comput. Ind. 65(9), 1291–1300 (2014)</p>
      <p>16. Romero Villafranca, R.: Curso de introducción a los métodos de análisis estadístico multivariante. Universitat Politècnica de València, SP.UPV.95–606 (1995)</p>
      <p>17. Simperl, E., Mochol, M., Bürger, T.: Achieving maturity: The state of practice in ontology engineering in 2009. Int. J. Comput. Sci. Appl. 7(1), 45–65 (2010)</p>
      <p>18. Staab, S., Studer, R. (Eds.): Handbook on ontologies. International handbooks on information systems. 2nd edn. Springer-Verlag Berlin Heidelberg (2009)</p>
      <p>19. Studer, R., Benjamins, V.R., Fensel, D.: Knowledge engineering: Principles and methods. Data Knowl. Eng. 25(1-2), 161–197 (1998)</p>
      <p>20. Stvilia, B.: A model for ontology quality evaluation. First Monday 12(12) (2007)</p>
      <p>21. Tartir, S., Arpinar, I.B.: Ontology evaluation and ranking using OntoQA. International Conference on Semantic Computing, ICSC, 185–192 (2007)</p>
      <p>22. Vrandečić, D.: Ontology evaluation. PhD Thesis. Institute AIFB, University of Karlsruhe, Germany (2010)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brewster</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shadbolt</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Ranking ontologies with AKTiveRank</article-title>
          .
          <source>5th International Semantic Web Conference, ISWC. LNCS 4273</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Álvarez Suárez</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caballero</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pérez Lechuga</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Análisis multivariante: Clasificación, organización y validación de resultados</article-title>
          .
          <source>4th Int. Latin American &amp; Caribbean Conference for Engineering and Technology (LACCET)</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Brank</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grobelnik</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mladenic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A survey of ontology evaluation techniques</article-title>
          .
          <source>Conference on Data Mining and Data Warehouses (SiKDD)</source>
          ,
          <fpage>166</fpage>
          -
          <lpage>169</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Breitman</surname>
            ,
            <given-names>K.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Casanova</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Truszkowski</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Semantic Web: Concepts, technologies and applications</article-title>
          .
          <source>NASA monographs in systems and software engineering</source>
          . Springer-Verlag London Limited (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Burton-Jones</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Storey</surname>
            ,
            <given-names>V.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sugumaran</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahluwalia</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>A semiotic metrics suite for assessing the quality of ontologies</article-title>
          .
          <source>Data Knowl. Eng</source>
          .
          <volume>55</volume>
          (
          <issue>1</issue>
          ),
          <fpage>84</fpage>
          -
          <lpage>102</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Colomb</surname>
            ,
            <given-names>R.M.:</given-names>
          </string-name>
          <article-title>Quality of ontologies in interoperating information systems</article-title>
          .
          <source>Technical report 18/02</source>
          ISIB-CNR, Padova, Italy (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>