Problems impacting the quality of automatically built ontologies

Toader GHERASIM¹, Giuseppe BERIO², Mounira HARZALLAH³ and Pascale KUNTZ⁴

¹ LINA, UMR 6241 CNRS, e-mail: toader.gherasim@univ-nantes.fr
² LABSTICC, UMR 6285 CNRS, e-mail: giuseppe.berio@univ-ubs.fr
³ LINA, UMR 6241 CNRS, e-mail: mounira.harzallah@univ-nantes.fr
⁴ LINA, UMR 6241 CNRS, e-mail: pascale.kuntz@polytech.univ-nantes.fr

Abstract. Building ontologies and debugging them is a time-consuming task. Over recent years, several approaches and tools for the automatic construction of ontologies from textual resources have been proposed. But, due to the limitations highlighted by experimentations in real-life applications, different lines of research have focused on the identification and classification of the errors that affect ontology quality. However, these classifications are incomplete and the error descriptions are not yet standardized. In this paper we introduce a new framework providing standardized definitions, which leads to a new error classification that removes ambiguities of the previous ones. Then, we focus on the quality of automatically built ontologies and we present the experimental results of our analysis of an ontology automatically built by Text2Onto for the domain of composite materials manufacturing.

1 Introduction

Since the pioneering works of Gruber [15], ontologies have played a major role in knowledge engineering, and their importance is growing with the rise of the semantic Web. Today they are an essential component of numerous applications in various fields: e.g. information retrieval [22, 20], knowledge management [26], analysis of social semantic networks [8] and business intelligence [27]. However, despite the maturity level reached in ontology engineering, important problems remain open and are still widely discussed in the literature. The most challenging issues concern the automation of ontology construction and of ontology evaluation.
The increasing popularity of ontologies and the scaling changes of the last decade have motivated the development of ontology learning techniques, and promising results have been obtained [6, 5]. Although these techniques have often been experimentally shown to be insufficient for constructing ready-to-use ontologies [5], their interest is not questioned, in particular in technical domains [17]. A few recent works recommend integrating ontology learning techniques with manual intervention [27].

Whatever their use, it is essential to assess the quality of ontologies throughout their development. Several ontology quality criteria and different evaluation methods have been proposed in the literature [19, 4, 11, 21, 1]. However, as mentioned by [28], defining "a good ontology" remains a difficult problem, and the different approaches only permit to "recognize problematic parts of an ontology". From an operational point of view, error identification is a very important step for ontology integration in real-life complex systems, and different researches have recently focused on that issue [13, 2, 24]. However, as far as we know, a generic standardized description of these errors does not yet exist; it nevertheless seems a preliminary step for the development of assisted construction methods.

In this paper, we focus on the most important errors that affect the quality of semi-automatically built ontologies. To come closer to operational concerns, we propose a detailed typology of the different types of problems that can be identified when evaluating an ontology. Our typology is inspired by a generic standardized description of the notion of quality in conceptual modeling [18], and our analysis is applied to a real-life situation concerning the manufacturing of pieces in composite materials for the aerospace industry.

The rest of this paper is organized as follows. Section 2 is a state of the art of ontology errors. Section 3 describes a framework which provides a standardized description of the errors and draws correspondences between our new classification and the main errors previously identified in the literature. Section 4 presents our experimental results in the domain of composite materials manufacturing. More precisely, we analyze errors affecting an ontology produced by an automatic construction tool (here Text2Onto) from a set of technical textual resources.

2 State of the art on ontological errors

In the literature, the notion of "ontological error" is often used in a broad sense covering a wide variety of problems which affect ontology quality. But, from several studies published over the last decade, we have identified four major denominations associated with complementary definitions: (1) "taxonomic errors" [14, 13, 9, 2], (2) "design anomalies" or "deficiencies" [2, 3], (3) "anti-patterns" [7, 25, 23], and (4) "pitfalls" or "worst practices" [23, 24].

2.1 Taxonomic errors

Since the pioneering works of Gomez-Perez [14], the denomination "taxonomic error" has been used to refer to three types of errors that affect the taxonomic structure of ontologies: inconsistency, incompleteness and redundancy. Extensions to non-taxonomic properties have recently been proposed [3], but in this synthesis we focus on taxonomic errors.

Inconsistencies in the ontology may be logical or semantic. More precisely, three classes of inconsistencies in the taxonomic structure have been detailed: circularity errors (e.g. a concept that is a specialization or a generalization of itself), partitioning errors which produce logical inconsistencies (e.g. a concept defined as a specialization of two disjoint concepts), and semantic errors (e.g. a taxonomic relationship between two concepts that is not consistent with the semantics of the latter).

Incompleteness occurs when concepts or specialization relations are missing, or when some partitions of the instances of a concept among its children are not stated as exhaustive and/or disjoint.

Conversely, redundancy errors occur when a taxonomic relationship can be directly deduced by logical inference from the other relationships of the ontology, or when concepts with the same father in the taxonomy do not share any common information (no instances, no children, no axioms, etc.) and can only be differentiated by their names.
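Circularity and redundancy of taxonomic relations lend themselves to simple mechanical checks. The following minimal sketch (the data layout and function names are ours, not taken from any cited tool) treats the taxonomy as a directed graph of (child, parent) is-a pairs and assumes the networkx package:

```python
# Hedged sketch: detecting two taxonomic error families on is-a pairs.
import networkx as nx

def circularity_errors(is_a):
    """Cycles of is-a links, i.e. concepts that end up specializing themselves."""
    return list(nx.simple_cycles(nx.DiGraph(is_a)))

def redundant_is_a(is_a):
    """Asserted is-a links that can already be deduced from the others."""
    g = nx.DiGraph(is_a)
    kept = nx.transitive_reduction(g).edges  # only defined for acyclic taxonomies
    return [edge for edge in g.edges if edge not in kept]

print(circularity_errors([("A", "B"), ("B", "A")]))          # one two-concept cycle
print(redundant_is_a([("A", "B"), ("B", "C"), ("A", "C")]))  # [('A', 'C')]
```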
2.2 Design anomalies

Roughly speaking, design anomalies mainly concern ontology understandability and maintainability. They are not necessarily errors, but undesirable situations. Five classes of design anomalies have been described: (1) "lazy concepts" (leaf concepts in the taxonomy not involved in any axiom and without any instances); (2) "chains of inheritance" (long chains composed of intermediate concepts with a single child); (3) "lonely disjoint" concepts (superfluous disjunction axioms between distant concepts in the taxonomy which may disrupt inference reasoning); (4) "over-specific property range" (a property range that is too specific and should be replaced by a coarser range fitting the considered domain better); (5) "property clumps" (duplication of the same properties for a large set of concepts instead of inheriting these properties from a more general concept). A mechanical check for the second anomaly is sketched below.
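As an illustration, a "chains of inheritance" detector can be written in a few lines; this is our own sketch, under the assumption that the taxonomy is available as (child, parent) pairs and is acyclic:

```python
# Hypothetical sketch: descending chains in which every concept has one child.
from collections import defaultdict

def inheritance_chains(is_a, min_length=3):
    children = defaultdict(list)
    for child, parent in is_a:
        children[parent].append(child)
    chains = []
    for top in list(children):              # walk down from every parent concept
        chain, current = [top], top
        while len(children[current]) == 1:  # follow single-child links only
            current = children[current][0]
            chain.append(current)
        if len(chain) >= min_length:
            chains.append(chain)
    return chains

# "tool" -> "cutting_tool" -> "drill" is reported as a single-child chain.
print(inheritance_chains([("cutting_tool", "tool"), ("drill", "cutting_tool")]))
```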
2.3 Anti-patterns

Ontology design patterns (ODPs) are formal models of solutions commonly used by domain experts to solve recurrent modeling problems. Anti-patterns are ODPs that are a priori known to produce inconsistencies or unsuitable behaviors. [23] also calls anti-patterns the ad-hoc solutions specifically designed for a problem even when well-known ODPs are available. Three classes of anti-patterns have been described [7, 25, 23]: (1) "logical anti-patterns", which can be detected by logical reasoning; (2) "cognitive anti-patterns" (possible modeling errors due to a misunderstanding of the logical consequences of the expression used); (3) "guidelines" (complex expressions that are valid from a logical and a cognitive point of view but for which simpler or more accurate alternatives exist).

2.4 Pitfalls

Pitfalls are complementary to ODPs. Their broad definition covers problems affecting ontology quality for which ODPs are not available. Poveda et al. [24] described 24 types of experimentally identified pitfalls such as, for instance, forgetting to declare an inverse relation when it exists, or omitting an attribute range. They proposed a pitfall classification which follows the three evaluable dimensions of an ontology proposed by Gangemi et al. [11]: (1) the structural dimension (aspects related to syntax and logical properties), (2) the functional dimension (how well the ontology fits a predefined function), and (3) the usability dimension (the extent to which the ontology is easy to understand and to use). Four pitfall classes correspond to the structural dimension: "modeling decisions" (MD, situations where OWL primitives are not used properly), "wrong inference" (WI, e.g. relationships or axioms that allow false reasoning), "no inference" (NI, gaps in the ontology which do not allow inferences required to produce new desirable knowledge) and "real world modeling" (RWM, when commonsense knowledge is missing in the ontology). One class corresponds to the functional dimension: "requirement completeness" (RC, when the ontology does not cover its specifications). And two classes correspond to the usability dimension: "ontology understanding" (OU, information that makes understandability more difficult, e.g. concept label polysemy or label synonymy for distinct concepts, or the non-explicit declaration of inverse relations or equivalent properties) and "ontology clarity" (OC, e.g. variations of writing rules and typography for the labels).

It is easy to deduce from this classification that some pitfalls should belong to different classes associated with different dimensions (e.g. the fact that two inverse relations are not stated as inverse is both a "no inference" (NI) pitfall and an "ontology understanding" (OU) pitfall). Another attempt [24] proposed a classification of the 24 identified pitfalls into the three error classes (inconsistency, incompleteness and redundancy) given by Gomez-Perez et al. [14]. But these classes concern the ontology structure and content, and consequently four pitfalls associated with the ontology context do not fit this classification.

In order to highlight the links between the different classifications, Poveda et al. tried to define a mapping between the classification in 7 classes deduced from the dimensions defined by Gangemi et al. [11] and the 3 error classes proposed by Gomez-Perez et al. [14]. However, this task turned out to be very complex, and only four pitfall classes exactly fit one of the error classes. For the others, there is overlapping or no possible fit.

3 The framework

The state of the art briefly presented in the previous section shows that the terminology used for describing the different problems impacting the quality of ontologies is not yet standardized and that the existing classifications do not cover the whole diversity of problems described in the literature.

In this section we present a framework providing standardized definitions for the quality problems of ontologies and leading to a new classification of these problems. The framework comprises two distinct and orthogonal dimensions: errors vs. unsuitable situations (first dimension) and the logical facet vs. the social facet of problems (second dimension).

Unsuitable situations identify problems which do not prevent the usage of an ontology (within specific targeted domains and applications). On the contrary, errors identify problems preventing the usage of an ontology.

It is well known that an ontology has two distinct facets: an ontology can be processed by machines (according to its logical specification) and can be used by humans (including an implicit reference to a social sharing).

The remainder of the section is organized along the second dimension (i.e. logical vs. social facet) and, within each facet, errors and unsuitable situations are defined. The framework is based on "natural" analogies between social and logical errors, and between social and logical unsuitable situations.
3.1 Problem classification

3.1.1 Logical ground problems

The logical ground problems can be formally defined by considering notions defined by Guarino et al. [16]: e.g. Interpretation (extensional first-order structure), Intended Model, Language, Ontology and the two usual relations ⊨ and ⊢ provided in any logical language. The relation ⊨ is used to express both that an interpretation I is a model of a logical theory T, written I ⊨ T (i.e. all the formulas in T are true in I: for each formula ϕ ∈ T, I ⊨ ϕ), and the logical consequence (i.e. that any model of a logical theory T is also a model of a formula ϕ, written T ⊨ ϕ). The relation ⊢ is used to express the logical calculus, i.e. the set of rules used to prove a theorem (i.e. any formula) ϕ starting from a theory T, written T ⊢ ϕ.

The examples and formalizations hereinafter are provided using a typical Description Logics notation (but are easily transformable into first-order or other logics).

The usual logical ground errors are listed below.

1. Logical inconsistency, corresponding to ontologies containing logical contradictions, for which no model exists (because the set of intended models is never empty, an ontology without models does not make sense anyway; formally, given an ontology O and the logical consequence relation ⊨ according to the logical language L used for building O, there is no interpretation I of O such that I ⊨ O). For example, if an ontology contains the axioms B ⊆ A (B is a A), A ∩ B ⊆ ⊥ (A and B are disjoint) and c ⊆ B (c is an instance of B), then c ⊆ A and c ⊆ A ∩ B, so there is a logical contradiction in the definition of this ontology;

2. Unadapted⁵ ontologies wrt intended models⁶, i.e. an ontology for which something that is false in all (or some of) the intended models of L is true in the ontology; formally, there exists a formula ϕ such that for each (for some) intended model(s) of L, ϕ is false and O ⊨ ϕ. For example, if the ontology contains two concepts A and B that are declared as disjoint (O ⊨ A ∩ B ⊆ ⊥) and in each intended model there exists an instance c that is common to A and B (i.e. c ⊆ A ∩ B), then the ontology is unadapted;

3. Incomplete ontologies wrt intended models, i.e. an ontology for which something that is true in all the intended models of L is not necessarily true in all the models of O; formally, there exists a formula ϕ such that for each intended model of L, ϕ is true and O ⊭ ϕ. As an example, if in all the intended models C ∪ B = A, and the ontology O only defines B ⊆ A and C ⊆ A, it is not possible to prove that C ∪ B = A;

4. Incorrect (or unsound) reasoning wrt the logical consequence, i.e. when some specific conclusions are derived by using suitable reasoning systems for the targeted ontology applications even though these conclusions are not true in the intended models and must not be derived by any reasoning according to the targeted ontology applications (formally, when a specific formula ϕ, false in the intended models and such that O ⊭ ϕ, can be derived, O ⊢ ϕ, within any of those suitable reasoning systems);

5. Incomplete reasoning wrt the logical consequence, i.e. when some specific conclusions cannot be derived by using suitable reasoning systems for the targeted ontology applications even though these conclusions are true in the intended models and must be derived by some reasoning according to the targeted ontology applications (formally, when some specific formula ϕ, true in the intended models and such that O ⊨ ϕ, cannot be derived, O ⊬ ϕ, within those suitable reasoning systems).

⁵ We use the term "unadapted" instead of "incorrect" ontologies because it remains unclear whether intended models are defined for building the ontology or may also be defined independently. However, if intended models are defined for building the ontology, the term "incorrect" may be more appropriate.

⁶ Intended models should have been defined fully and independently, as in the case of models representing abstract structures or concepts such as numbers, processes, events, time and other "upper concepts", often defined according to their own properties. If intended models are not available, some specific entailments can be defined as facts that should necessarily be true in the targeted domain (or for the targeted applications); specific counterexamples can also be defined instead of building entire intended models.
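Among these errors, logical inconsistency (error 1) is the one a standard reasoner detects without any intended model. As a minimal sketch (assuming the owlready2 package, a Java runtime for its bundled HermiT reasoner, and a hypothetical local file onto.owl), a classification run either fails with an inconsistency or reports the classes inferred to be equivalent to owl:Nothing, which relates to the unsatisfiability situation (10) below:

```python
# Hedged sketch: checking an OWL file for inconsistency / unsatisfiable classes.
import os
from owlready2 import (get_ontology, sync_reasoner, default_world,
                       OwlReadyInconsistentOntologyError)

onto = get_ontology("file://" + os.path.abspath("onto.owl")).load()  # hypothetical file
try:
    with onto:
        sync_reasoner()  # classifies the ontology with the bundled HermiT reasoner
except OwlReadyInconsistentOntologyError:
    print("error (1): the ontology admits no model")
else:
    # classes inferred equivalent to owl:Nothing are unsatisfiable (situation 10)
    print("unsatisfiable classes:", list(default_world.inconsistent_classes()))
```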
The most common logical ground unsuitable situations are listed below. These situations negatively impact the "non-functional qualities" of ontologies, such as reusability, maintainability and efficiency as defined in the ISO 9126 standard for software quality.

6. Logical equivalence of distinct artifacts (concepts / relationships / instances), i.e. whenever two distinct artifacts are proved to be logically equivalent; for example, A and B are two concepts in O and O ⊨ A = B;

7. Symmetrically, logically indistinguishable artifacts, i.e. whenever it is not possible to prove that two distinct artifacts are not equivalent from a logical point of view; in other words, when none of the following statements can be proved: (O ⊨ A = B), (O ⊨ A ∩ B ⊆ ⊥) and (O ⊨ c ⊆ A and O ⊨ c ⊆ B); this case (7) can be partially covered by case (3) above whenever the intended models provide precise information on the equivalence or the difference between A and B;

8. OR artifacts, i.e. an artifact A equivalent to a disjunction like C ∪ S, with A ≠ C, S, but for which, if applicable, there does not exist at least one common (non-optional) role / property for C and S, or for which C and S have common instances; in the first case, a simple formalization can be expressed by saying that there does not exist a (non-optional) role R such that O ⊨ (C ∪ S) ⊆ ∃R.⊤; in the second case, an even simpler formalization is O ⊨ c ⊆ C and O ⊨ c ⊆ S, c being a constant not part of O; the first case targets potentially heterogeneous artifacts such as Car ∪ Person, with probably no counterpart in the intended models, thus possibly leading to unadapted ontologies according to case (2) above; the second case targets potential ambiguities as, for instance, one role (property) R logically equivalent to a disjunction (R1 ∪ R2) with (R1 ∩ R2) satisfiable;

9. AND artifacts, i.e. an artifact A equivalent to a conjunction like C ∩ S, with A ≠ C, S, but for which, if applicable, there does not exist at least one common (non-optional) role / property for C and S; this case is relevant to limit as much as possible some potentially heterogeneous artifacts, such as Car ∩ Person, possibly leading to artifact unsatisfiability;

10. Unsatisfiability: while some cases of unsatisfiability of ontology artifacts (concepts, roles, properties, etc.) can be covered by (2) because intended models may not contain void concepts, unsatisfiability tout court is not necessarily an error but a situation which is not suitable for ontology artifacts (i.e. given an ontology artifact A, O ⊨ A ⊆ ⊥); even if in ontologies it might be possible to define what must not be true (instead of what must be true), this practice is not encouraged;

11. High complexity of the reasoning task, i.e. whenever something is expressed in a way that complicates the reasoning while simpler ways exist to express the same thing;

12. Ontology not minimal, i.e. whenever the ontology contains unnecessary information:
• unnecessary because it can be derived or built⁷ — an example of such an unsuitable situation is the redundancy of taxonomic relations: whenever A ⊆ B, B ⊆ C and A ⊆ C are all ontology axioms, the last axiom can be derived from the first two (cf. the check sketched in Section 2.1);
• unnecessary because it is not part of the intended models — for instance, a concept A being part of the ontology (language) but not defined by the intended models.

⁷ Built means that the artifact can be defined by using other artifacts.
3.1.2 Social ground problems

Social ground problems are related to the perception (interpretation) and the targeted usage of ontologies by social actors (humans, applications based on social artifacts like WordNet, etc.). Perception (interpretation) and usage may not be formalized at all. In some sense, the further distinction between the social facet and the logical facet parallels the distinction between tacit and explicit knowledge.

There are four social ground errors:

1. Social contradiction, i.e. the perception (interpretation) that the social actor gives to the ontology or to the ontology artifacts is in contradiction with the ontology axioms and their consequences; a natural analogy is with unadapted ontologies;

2. Perception of design errors, i.e. the social actor's perception accounts for some design errors, such as modeling instances as concepts; a natural analogy is with unadapted ontologies;

3. Socially meaningless, i.e. the social actor is unable to give any interpretation to the ontology or to ontology artifacts, as in the case of artificial labels such as "XYHG45"; a natural analogy is with unadapted ontologies;

4. Social incompleteness, i.e. the social actor's perception is that one or several artifacts (axioms and/or their consequences) are missing in the ontology; a natural analogy is with incomplete ontologies.

The social ground unsuitable situations are mostly related to the difficulties that a social actor has to overcome for using the ontology, especially due to limited understandability, learnability and compliance (as defined in ISO 9126). As for the logical ground unsuitable situations, it is difficult to draw up an exhaustive list; the most common and important ones are listed below.

5. Lack of or poor textual explanations, i.e. when there are few, no or poor annotations, which prevents understanding by social actors; there are no natural analogies;
6. Potentially equivalent artifacts, i.e. the social actors may identify distinct artifacts as equivalent (similar), as in the case of synonymous or exactly identical labels assigned to distinct artifacts; a natural analogy is with logically equivalent artifacts;

7. Socially indistinguishable artifacts, i.e. the social actors would not be able to distinguish two distinct artifacts, as in the case of polysemic labels assigned to distinct artifacts; a natural analogy is with logically indistinguishable artifacts;

8. Artifacts with polysemic labels, which may be interpreted as the union or the intersection of the several rather distinct meanings associated with the labels; a natural analogy is therefore with OR and AND artifacts (a label audit approximating situations 6-8 is sketched after this list);

9. Flatness of the ontology (or non-modularity), i.e. an ontology presented as a set of artifacts without any additional structure, especially if coupled with an important number of artifacts; a natural analogy is with the high complexity of the reasoning task, but flatness also prevents effective learning and understanding by social actors;

10. Non-standard formalization of the ontology, using a very specific logic or theory, which requires a specific effort by social actors for understanding and learning the ontology but also for using the ontology in standard contexts (reduced compliance); there are no natural analogies;

11. Lack of adapted and certified versions of the ontology in various languages, which requires specific efforts by social actors for understanding and learning the ontology but also for using the ontology in specific standard contexts (limited compliance); there are no natural analogies;

12. Socially useless artifacts included in the ontology; a natural analogy is with ontology not minimal.
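Situations 6-8 can be partially screened with lexical resources, keeping in mind that a lexicon only approximates the perception of a social actor. The following sketch is purely illustrative; it assumes English labels and NLTK's WordNet corpus (obtained beforehand with nltk.download("wordnet")):

```python
# Hedged sketch: flagging potentially equivalent and polysemic concept labels.
from collections import defaultdict
from nltk.corpus import wordnet as wn

def audit_labels(labels):
    polysemic = {l for l in labels if len(wn.synsets(l)) > 1}  # situation 8
    by_sense = defaultdict(set)
    for label in labels:
        for sense in wn.synsets(label):
            by_sense[sense].add(label)
    # distinct labels sharing a sense: potentially equivalent artifacts (situation 6)
    equivalent = [group for group in by_sense.values() if len(group) > 1]
    return polysemic, equivalent

polysemic, equivalent = audit_labels(["resin", "form", "shape"])
print(polysemic)   # e.g. "form" has several WordNet senses
print(equivalent)  # e.g. "form" and "shape" share a sense
```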
3.2 Positioning state-of-the-art relevant problem classes in the proposed framework

The precise definitions of the proposed framework allow us to classify most of the ontology quality problems described in the literature. Table 1 presents our classification of the different problems mentioned in Section 2. Some of the problems described in the literature may correspond to more than one class of problems of our framework, as the definitions of these problems are often very broad and sometimes ambiguous.

Table 1 reveals, at first view, that the proposed framework provides additional problem classes that, to our knowledge, are not directly pointed out in the current literature on ontology quality and evaluation (but may be mentioned elsewhere). These problems are: no adapted and certified ontology version, indistinguishable artifacts, socially meaningless, high complexity of the reasoning task, and incorrect reasoning. Other problems, while covered, are in our opinion defined too narrowly in the existing literature on ontology quality and evaluation. For instance, no standard formalization is specific to very simple situations, while we refer to complete non-standard theories.

A deeper analysis of Table 1 reveals that the "logical anti-patterns" presented in [7, 25] belong to the logical ground category and focus on the unadapted ontologies error and on the unsatisfiability unsuitable situation. The "non-logical anti-patterns" presented in [7, 25] partially cover the logical ground unsuitable situations. The "guidelines" presented in [7, 25] span only unsuitable situations, from both the logical and the social ground categories.

What is qualified as "inconsistency" in [14] spans errors and unsuitable situations and also (as in the case of "semantic inconsistency") the two dimensions (logical and social), making, in our opinion, the terminology a little confusing. According to our framework, we perceive "circularity in taxonomies", as defined in [14], as an unsuitable situation (logical equivalence of distinct artifacts) because, from a logical point of view, it only means that artifacts are equivalent (not requiring a fixpoint semantics). However, "circularity in taxonomies" can also be seen as a social contradiction if actors assign distinct meanings to the various involved artifacts. The problems presented as "incompleteness errors" in [13] belong to the incomplete ontologies class of logical errors. The "redundancy errors" fit, in our classification, within the ontology not minimal class of logical unsuitable situations.

None of the "design anomalies" presented in [2] is perceived as a logical error. Two of them correspond to a logical unsuitable situation (logically indistinguishable artifacts), one to a social error (perception of design errors) and the last one to a social unsuitable situation (no standard formalization).

Concerning the "pitfalls" [24], the most remarkable fact concerns what we call incomplete reasoning. Indeed, introducing ad-hoc relations such as is-a, instance-of, etc., replacing the "standard" relations such as subsumption, member-of, etc., should not be considered as a case of incomplete ontologies but as a case of incomplete reasoning. This is because, accepting a specific ontological commitment for building intended models, ad-hoc relations can be defined in the same way as standard relations; however, using standard reasoning, it is expected (and even provable once the logic is fixed) that the reasoning algorithms are incomplete. Nevertheless, adding artifacts may also solve some incompleteness and may also be useful for speeding up reasoning.

Only one of the seven classes of "pitfalls" [24] fits perfectly in one class of our typology: the "real world modeling" pitfalls belong to the incomplete ontologies logical errors. All the "ontology clarity" pitfalls are social unsuitable situations. All the "requirement completeness" pitfalls are logical problems. The "no inference" pitfalls are logical or social incomplete ontologies errors. Most (6/9 and 4/5) of the "modeling decisions" and "wrong inference" pitfalls are considered as errors. The class of "ontology understanding" pitfalls spans 10 classes of problems, covering logical and social errors and unsuitable situations.

Most (16/20) of the pitfalls concerning the "structural dimension" of the ontology [11] are perceived as errors. All (2/2) the pitfalls concerning the "functional dimension" of the ontology are logical problems.
Table 1. Positioning state-of-the-art relevant problem classes in the proposed framework.

Logical ground — Errors:
1. Logical inconsistency: inconsistency error "partition errors - common instances in disjoint decomposition".
2. Unadapted ontologies: inconsistency errors "partition errors - common classes in disjoint decomposition" and "semantic inconsistency"; logical anti-patterns "OnlynessIsLoneliness", "UniversalExistence", "AndIsOR", "EquivalenceIsDifference"; pitfalls P5 (wrong inverse relationship, WI), P14 (misusing "allValuesFrom", MD), P15 (misusing "not some"/"some not", WI), P18 (specifying too much the domain / range, WI), P19 (swapping ∩ and ∪, WI).
3. Incomplete ontologies: incompleteness errors "incomplete concept classification" and "disjoint / exhaustive knowledge omission"; pitfalls P3 ("is a" instead of "subclass-of", MD), P9 (missing basic information, RC & RWM), P10 (missing disjointness, RWM), P11 (missing domain / range in prop., NI & OU), P12 (missing equiv. prop., NI & OU), P13 (missing inv. rel., NI & OU), P16 (misusing primitive and defined classes, NI).
4. Incorrect reasoning: —
5. Incomplete reasoning: pitfalls P3 (using "is a" instead of "subclass-of", MD), P24 (using recursive def., MD).

Logical ground — Unsuitable situations:
6. Logical equivalence of distinct artifacts: inconsistency error "circularity"; pitfall P6 (cycles in the hierarchy, WI); non-logical anti-pattern "SynonymeOfEquivalence".
7. Logically indistinguishable artifacts: pitfall P4 (unconnected ontology elements, RC); design anomalies "lazy concepts" and "chains of inheritance".
8. OR artifacts: pitfall P7 (merging concepts to form a class, MD & OU).
9. AND artifacts: pitfall P7 (merging concepts to form a class, MD & OU).
10. Unsatisfiability: inconsistency error "partition errors - common classes in disjoint decomposition"; logical anti-patterns "OnlynessIsLoneliness", "UniversalExistence", "AndIsOR", "EquivalenceIsDifference".
11. High complexity of the reasoning task: —
12. Ontology not minimal: redundancy error "redundancy of taxonomic relations"; pitfalls P3 (using "is a" instead of "subclass-of", MD), P7 (merging concepts to form a class, MD & OU), P21 (miscellaneous class, MD); non-logical anti-pattern "SomeMeansAtLeastOne"; guidelines "Domain&CardinalityConstraints", "MinIsZero".

Social ground — Errors:
1. Social contradiction: inconsistency error "semantic inconsistency"; logical anti-pattern "AndIsOR"; pitfalls P1 (polysemic elements, MD), P5 (wrong inv. rel., WI), P14 (misusing "allValuesFrom", MD), P15 (misusing "not some"/"some not", WI), P19 (swapping ∩ and ∪, WI).
2. Perception of design errors: pitfalls P17 (specializing too much the hierarchy, MD), P18 (specifying too much the domain / range, WI), P23 (using incorrectly ontology elements, MD); non-logical anti-pattern "SumOfSome"; design anomaly "lonely disjoints".
3. Socially meaningless: —
4. Social incompleteness: pitfalls P12 (missing equiv. prop., NI & OU), P13 (missing inv. rel., NI & OU), P16 (misusing primitive and defined classes, NI).

Social ground — Unsuitable situations:
5. Lack of or poor textual explanations: pitfall P8 (missing annotation, OC & OU).
6. Potentially equivalent artifacts: pitfall P2 (synonyms as classes, MD & OU).
7. Indistinguishable artifacts: —
8. Polysemic labels: pitfall P1 (polysemic elements, MD & OU).
9. Flatness of the ontology: pitfalls P20 (swapping label and comment, OU), P22 (using different naming criteria in the ontology, OC).
10. No standard formalization: guidelines "GroupAxioms", "DisjointnessOfComplement" and "Domain&CardinalityConstraints"; design anomaly "property clumps".
11. No adapted and certified ontology version: —
12. Useless artifacts: pitfall P21 (using a miscellaneous class, MD & OU).
4 Problems that affect the quality of automatically built ontologies

Although the proposed framework is general, we are especially concerned with ontologies automatically built from textual resources. We therefore aim at pointing out the problems that are expected in automatically constructed ontologies (i.e. there is evidence of their presence, or they will appear in future enrichments⁸ of the ontology). We are also interested in the opposite case, i.e. whether there are unexpected problems in automatically constructed ontologies; it should be noted that unexpected problems are problems for which, even if the ontology may suffer from them, there is no evidence of their presence/absence in the ontology as it is (however, these problems may appear in future enrichments of the ontology). Our analysis is performed in two steps. In the first step (Section 4.1), we point out expected/unexpected problems due to inherent limitations of the tools for automatic ontology construction. In the second step (Section 4.2), we assess the results obtained in the first step by discussing our experience with the tool Text2Onto.

⁸ Enrichment should be understood as adding artifacts to the existing ones.

4.1 Expected and unexpected problems in an automatically built ontology

In a previous work [12] we studied in depth four approaches (and associated tools) for the automatic construction of ontologies from texts, and we compared them with a classical methodology for manual ontology construction (Methontology). This analysis highlighted that none of the automated approaches (and associated tools) covers all the tasks and subtasks associated with each step of the classical manual method. The ignored tasks/subtasks are:

1. the explicit formation of artifacts (concepts, instances and relationships) from terms⁹; usually, the automatic tools consider that each term represents a distinct artifact: they do not group synonymous terms and do not choose a single sense for polysemic terms;
2. the identification of axioms (e.g. the disjunction axioms);
3. the identification of attributes for concepts;
4. the identification of natural language definitions for concepts.

⁹ A term corresponds to one or several words found in a text.
Table 2 provides a complete view of the expected and unexpected problems according to our experience and suggests why each problem is expected or not.

Table 2. What problems are expected in automatically built ontologies.

Logical ground:
1. Logical inconsistency — No (no axiom is defined ⇒ contradictions are unexpected; but they remain possible in the case of future enrichments).
2. Unadapted ontologies — Yes (taxonomic relationship extraction algorithms are syntax-based and may diverge from the intended models).
3. Incomplete ontologies — Yes (automatically extracted knowledge is limited to concepts and taxonomies and may diverge from the intended models).
4. Incorrect reasoning / 5. Incomplete reasoning — No (they might appear for a complete formalization of concepts and relationships).
6. Logical equivalence of distinct artifacts / 7. Logically indistinguishable artifacts — Yes (automatic tools consider that each term defines a different artifact ⇒ the ontology may contain logically equivalent and logically indistinguishable artifacts).
8. OR artifacts / 9. AND artifacts — Yes (polysemy of terms directly affects concepts / relationships: OR / AND concepts / relationships may appear).
10. Unsatisfiability — Yes (polysemy of terms directly affects concepts / relationships: these latter may become unsatisfiable if their polysemic senses are combined).
11. High complexity of the reasoning task — No (few or no axioms are defined ⇒ reasoning remains very basic; but it can become more complex if the ontology is further enriched).
12. Ontology not minimal — Yes (automatic tools introduce redundancies in taxonomies).

Social ground:
1. Social contradiction — Yes (ontologies are built from limited textual resources, which may introduce contradictions in taxonomies).
2. Perception of design errors — Yes (the built ontology may contain concepts that the social actor considers closer to instances).
3. Socially meaningless — Yes (several meaningless concepts with obscure labels are often introduced).
4. Social incompleteness — Yes (probably due to the limited textual corpus).
5. Lack of or poor textual explanations — Yes (usually automatic tools do not provide textual explanations).
6. Potentially equivalent artifacts — Yes (automatic tools consider that each term defines a different artifact ⇒ distinct concepts can have synonymous labels ⇒ these latter are perceived as potentially equivalent).
7. Indistinguishable artifacts — Yes (the ontology is incomplete ⇒ it contains concepts that can be distinguished only by their labels; if such concepts have synonymous labels, they are indistinguishable).
8. Artifacts with polysemic labels — Yes (automatic tools consider that each term defines a different artifact ⇒ it is possible to have concepts with polysemic labels).
9. Flatness of the ontology — Yes (the ontology is poorly structured and has no design constraints, e.g. no disjunction axioms, lazy concepts).
10. No standard formalization — No (automatic tools usually can export their results in different formalizations).
11. No adapted and certified ontology version — Yes (automatically obtained results closely depend on the input text language (often English) and certifying them is difficult).
12. Useless artifacts — Yes (automatic tools often generate useless artifacts from additional external resources).

4.2 Experience with Text2Onto

4.2.1 The experimental setup

During the last two years we were involved in a project called ISTA3, which proposed an ontology-based solution to problems related to the integration of heterogeneous sources of information. The application domain was the management of the production of composite components for the aerospace industry. In this context, we tried to simplify the process of deploying the interoperability solution in new domains by using an automatic solution for constructing the required ontologies. The analysis presented in [12] led us to choose Text2Onto [6] for the automatic construction of our ontologies. Text2Onto takes as input textual resources from which it extracts different ontological artifacts (concepts, instances, taxonomic relationships, etc.) that are structured together to construct an ontology. Text2Onto's performance for extracting concepts and taxonomic relationships is better than its performance for extracting other types of ontological artifacts; consequently, in our tests we used Text2Onto for constructing ontologies containing concepts and taxonomic relationships only.

The textual resource used in the experiment presented in this paper is a technical glossary composed of 376 definitions of the most important terms of the domain of composite materials and of how they are used for manufacturing pieces. The glossary contains 9500 words. For constructing the ontology we resorted to the standard configuration of the different parameters of Text2Onto: all the proposed algorithms for concept (and, respectively, taxonomic relation) extraction have been used, and their results have been combined with the default strategy.

The constructed ontology is an automatically built domain ontology that contains 965 concepts and 408 taxonomic relationships. Some of the central concepts of this ontology are: "technique", "step", "compound", "fiber", "resin", "polymerization", "laminate", "substance", "form".

4.2.2 Identified problems

Table 3 summarizes which types of problems have been identified in the automatically constructed ontology in our experience with Text2Onto. It also indicates, when possible, how many problems have been identified. Most problems are relatively easy to identify and to quantify (e.g. the number of cycles in the taxonomic structure), but there are exceptions (e.g. the number of concepts or taxonomic relationships that are missing from the ontology).
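Table 3 below mentions ad-hoc algorithms for the lazy concepts, for the groups of indistinguishable leaf concepts and for the redundant relationship; the exact algorithms are not reproduced here, but checks in their spirit can be sketched as follows (the (child, parent) representation and the names are our assumptions):

```python
# Hedged sketch: plausible ad-hoc checks for lazy and indistinguishable concepts.
from collections import defaultdict

def lazy_and_indistinguishable(is_a):
    children, concepts = defaultdict(set), set()
    for child, parent in is_a:
        children[parent].add(child)
        concepts.update((child, parent))
    # in an ontology restricted to concepts and taxonomic relationships, leaves
    # have no instance and occur in no other axiom, so every leaf is lazy
    lazy = {c for c in concepts if not children[c]}
    # lazy siblings differ only by their labels: indistinguishable groups
    groups = [kids & lazy for kids in children.values() if len(kids & lazy) > 1]
    return lazy, groups

lazy, groups = lazy_and_indistinguishable([("epoxy", "resin"), ("phenolic", "resin")])
print(lazy)    # {'epoxy', 'phenolic'}
print(groups)  # [{'epoxy', 'phenolic'}]
```

The redundant taxonomic relationship reported in Table 3 corresponds to the transitive-reduction check already sketched in Section 2.1.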
Table 3. Types of problems identified in the automatically constructed ontology.

Logical ground:
1. Logical inconsistency — No.
2. Unadapted ontologies — No.
3. Incomplete ontologies — Yes: some relationships are missing to connect the 389 lazy concepts; some of them are explicitly indicated in the textual corpus.
4. Incorrect reasoning — No.
5. Incomplete reasoning — No.
6. Logical equivalence of distinct artifacts — Yes: 3 cycles in the hierarchy (automatically detected by reasoners).
7. Logically indistinguishable artifacts — Yes: 389 lazy concepts (automatically identified by an ad-hoc algorithm); 73 groups of "leaf" concepts, each composed of indistinguishable concepts (automatically identified by an ad-hoc algorithm).
8. OR artifacts — No.
9. AND artifacts — No.
10. Unsatisfiability — No.
11. High complexity of the reasoning task — No.
12. Ontology not minimal — Yes: one taxonomic relationship can be deduced from two taxonomic relationships already present in the ontology (automatically identified by an ad-hoc algorithm).

Social ground:
1. Social contradiction — Yes: 15 taxonomic relationships are judged semantically inconsistent by the expert.
2. Perception of design errors — Yes: 5 concepts are interpreted as instances by the expert (units of measure and proper names).
3. Socially meaningless — Yes: 21 concepts have labels that are meaningless for the expert.
4. Social incompleteness — Yes.
5. Lack of or poor textual explanations — Yes: no annotation is associated with the ontology or with its artifacts.
6. Potentially equivalent artifacts — Yes: 6 pairs of concepts have labels that are synonymous for the expert.
7. Indistinguishable artifacts — No.
8. Artifacts with polysemic labels — Yes: 69 concepts have labels that are polysemic for the expert.
9. Flatness of the ontology — Yes: the 389 lazy concepts lead to a poorly structured ontology.
10. No standard formalization — No.
11. No adapted and certified ontology version — No.
12. Useless artifacts — Yes: 28 concepts are not necessary (3 are too generic, 25 are out of the domain).

4.2.3 Discussion

No intended model or use case scenario was available when the expert analyzed the automatically constructed ontology. Consequently, the expert was only able to make a supposition concerning the logical completeness of the ontology, and no logical error (unadapted ontology, incomplete or incorrect reasoning) was identified.

Few logical unsuitable situations were identified, but it is remarkable that they were identified automatically.

Unsurprisingly, most of the identified problems are social problems. The analysis in Section 4.1 suggests that most of the problems expected in automatically constructed ontologies are due to the fact that the automatic tools do not take into account the synonymy and the polysemy of terms when constructing concepts. However, even if Text2Onto, as configured for our test, does not group synonymous terms when forming concepts and allows polysemic terms to be labels for concepts, our test case reveals that only two types of problems (socially indistinguishable artifacts and artifacts with polysemic labels) may be imputed to this limitation.

Most of the identified problems are related to the fact that the automatically constructed ontology seems to be incomplete.
5 Conclusion

In this paper, we have introduced a framework providing standardized definitions for the different errors that impact the quality of ontologies. This framework aims both at unifying the various error descriptions presented in the recent literature and at completing them. It also leads to a new error classification that removes ambiguities of the previous ones. During ontology evaluation, this framework may be used as a support for verifying in a systematic way whether the ontology contains errors or unsuitable situations.

In the second part of the paper we focused on the quality of automatically built ontologies and we presented the experimental results of our analysis of an ontology automatically built by Text2Onto. The results show that a large part of the identified errors are linked to the incompleteness of the ontology. Moreover, they confirm that the identification of logical errors other than inconsistency requires intended models (or at least a set of positive and negative examples) and use case scenarios.

Due to the increasing complexity of the software, the identification of the origin of each error in the ontology building process remains an open question. A further work consists in associating the identified errors with the different tasks of ontology construction (e.g. the Methontology tasks [10]). This work could help to improve the quality of the results of the software by a retro-engineering process and/or to design assistants to detect and solve major errors.
REFERENCES

[1] M. Almeida, 'A proposal to evaluate ontology content', Journal of Applied Ontology, 4(3-4), 245–265, (2009).
[2] J. Baumeister and D. Seipel, 'Smelly owls - design anomalies in ontologies', in Proc. of the 18th Int. Florida Artificial Intelligence Research Society Conf. (FLAIRS), pp. 215–220, (2005).
[3] J. Baumeister and D. Seipel, 'Anomalies in ontologies with rules', Web Semantics: Science, Services and Agents on the World Wide Web, 8(1), 55–68, (2010).
[4] A. Burton-Jones, V. Storey, and V. Sugumaran, 'A semiotic metrics suite for assessing the quality of ontologies', Data Knowl. Eng., 55(1), 84–102, (2005).
[5] P. Cimiano, A. Madche, S. Staab, and J. Volker, 'Ontology learning', in Handbook on Ontologies, eds., R. Studer and S. Staab, Int. Handbook on Inf. Syst., 245–267, Springer, 2nd edn., (2009).
[6] P. Cimiano and J. Volker, 'Text2Onto - a framework for ontology learning and data-driven change discovery', in 2nd Eur. Semantic Web Conf., eds., A. Montoyo, R. Munoz, and E. Metais, volume 3513, pp. 227–238, (2005).
[7] O. Corcho, C. Roussey, and L. M. V. Blazquez, 'Catalogue of anti-patterns for formal ontology debugging', in Atelier Construction d'ontologies: vers un guide des bonnes pratiques, AFIA 2009, (2009).
[8] G. Ereteo, M. Buffa, O. Corby, and F. Gandon, 'Semantic social network analysis: a concrete case', in Handbook of Research on Methods and Techniques for Studying Virtual Communities: Paradigms and Phenomena, 122–156, IGI Global, (2010).
[9] M. Fahad and M. Qadir, 'A framework for ontology evaluation', in Proc. of the 16th Int. Conf. on Conceptual Struct. (ICCS 2008), volume 354, pp. 149–158, (2008).
[10] M. Fernandez, A. Gomez-Perez, and N. Juristo, 'Methontology: from ontological art towards ontological engineering', in Proc. of the AAAI97 Spring Symposium Series on Ontological Engineering, pp. 33–40, (1997).
[11] A. Gangemi, C. Catenacci, M. Ciaramita, and J. Lehmann, 'Modelling ontology evaluation and validation', in Proc. of the Eur. Sem. Web Conf. (ESWC 2006), number 4011 in LNCS, (2006).
[12] T. Gherasim, M. Harzallah, G. Berio, and P. Kuntz, 'Analyse comparative de methodologies et d'outils de construction automatique d'ontologies a partir de ressources textuelles', in Proc. of EGC'2011, pp. 377–388, (2011).
[13] A. Gomez-Perez, 'Ontology evaluation', in Handbook on Ontologies, eds., S. Staab and R. Studer, Int. Handbook on Inf. Syst., pp. 251–274, Springer, 1st edn., (2004).
[14] A. Gomez-Perez, M. F. Lopez, and O. C. Garcia, Ontological Engineering: With Examples from the Areas of Knowledge Management, E-Commerce and the Semantic Web, chapter 3.8.2 'Taxonomy evaluation', 180–184, Advanced Information and Knowledge Processing, Springer, 2001.
[15] T. R. Gruber, 'A translation approach to portable ontology specifications', Knowl. Acquisition, 5(2), 199–220, (1993).
[16] N. Guarino, D. Oberle, and S. Staab, 'What is an ontology?', in Handbook on Ontologies, 1–17, Springer, 2nd edn., (2009).
[17] G. Hirst, 'Ontology and the lexicon', in Handbook on Ontologies, eds., R. Studer and S. Staab, Int. Handbook on Inf. Syst., 269–292, Springer, 2nd edn., (2009).
[18] J. Krogstie, O. I. Lindland, and G. Sindre, 'Defining quality aspects for conceptual models', in Proc. of the IFIP 8.1 Working Conference on Information Systems Concepts: Towards a Consolidation of Views (ISCO3), (1995).
[19] A. Lozano-Tello and A. Gomez-Perez, 'Ontometric: a method to choose the appropriate ontology', Journal of Database Management, 15(2), 1–18, (2004).
[20] N. Ben Mustapha, H. Baazaoui Zghal, M. A. Aufaure, and H. Ben Ghezala, 'Enhancing semantic search using case-based modular ontology', in Proc. of the 2010 ACM Symposium on Applied Computing, pp. 1438–1439, (2010).
[21] L. Obrst, B. Ashpole, W. Ceusters, I. Mani, S. Ray, and B. Smith, 'The evaluation of ontologies: toward improved semantic interoperability', in Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences, eds., C. J. O. Baker and K.-H. Cheung, 139–158, Springer, (2007).
[22] J. D. Osborne, J. Flatow, M. Holko, S. M. Lin, W. A. Kibbe, L. Zhu, M. I. Danila, G. Feng, and R. L. Chisholm, 'Annotating the human genome with disease ontology', BMC Genomics, 10, 63–68, (2009).
[23] M. Poveda, M. C. Suarez-Figueroa, and A. Gomez-Perez, 'Common pitfalls in ontology development', in Proc. of Current Topics in Artificial Intelligence (CAEPIA 2009), 13th Conference of the Spanish Association for Artificial Intelligence, (2009).
[24] M. Poveda, M. C. Suarez-Figueroa, and A. Gomez-Perez, 'A double classification of common pitfalls in ontologies', in Proc. of the Workshop on Ontology Quality (OntoQual 2010), co-located with EKAW 2010, (2010).
[25] C. Roussey, O. Corcho, and L. M. V. Blazquez, 'A catalogue of OWL ontology antipatterns', in Proc. of the Fifth Int. Conf. on Knowledge Capture (K-CAP), pp. 205–206, (2009).
[26] N. H. Shah and M. A. Musen, 'Ontologies for formal representation of biological systems', in Handbook on Ontologies, eds., R. Studer and S. Staab, Int. Handbook on Inf. Syst., 445–462, Springer, 2nd edn., (2009).
[27] E. Simperl and C. Tempich, 'Exploring the economical aspects of ontology engineering', in Handbook on Ontologies, eds., R. Studer and S. Staab, Int. Handbook on Inf. Syst., 445–462, Springer, 2nd edn., (2009).
[28] D. Vrandecic, 'Ontology evaluation', in Handbook on Ontologies, eds., R. Studer and S. Staab, Int. Handbook on Inf. Syst., 293–314, Springer, 2nd edn., (2009).