<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Extending Bayesian Classifier with Ontological Attributes</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computing Science, Poznan University of Technology</institution>
          ,
          <addr-line>ul. Piotrowo 2, 60-965 Poznan</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The goal of inductive learning for classification is to form generalizations from a set of training examples such that the classification accuracy on previously unobserved examples is maximized. Given a specific learning algorithm, it is obvious that its classification accuracy depends on the quality of the training data. In learning from examples, noise is anything which obscures correlations between attributes and the class [1]. There are many possible solutions to deal with the existence of noise. Data cleaning, or the detection and elimination of noisy examples, constitutes the first approach. Due to the risk of data cleaning, when noisy examples are retained while good examples are removed, efforts have been made to construct noise-tolerant classifiers. Although these two approaches seem very different, both try to somehow 'clean' the noisy training data. In this paper, we propose an approach to 'admit and utilize' noisy data by enabling the modeling of different levels of knowledge granularity in both training and testing examples. The proposed knowledge representation uses hierarchies of sets of attribute values, derived from subsumption hierarchies of concepts from an ontology represented in description logic. The main contributions of the paper are: (i) we propose a novel extension of the naïve Bayesian classifier by hierarchical, ontology-based attributes (ontological attributes), (ii) we propose an inference scheme that handles ontological attributes.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        There are three major sources of noise: (i) insufficiency of the description for
attributes or the class (or both), (ii) corruption of attribute values in the
training examples, (iii) erroneous classification of training examples [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The second
and third sources of noise can lead to so-called attribute-noise and class-noise,
respectively. Attribute-noise is represented by: (i) erroneous attribute values, (ii)
missing or "don't know" attribute values, (iii) incomplete attributes or "don't
care" values. Class-noise is represented by: (i) contradictory examples, or
(ii) misclassification [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, the first major source of noise, although not
easily quantifiable, is important. This insufficiency of the description can lead to
both erroneous attribute values and erroneous classification. Let us call the
resulting noise description-noise. Following, for example, [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the main reason for
description-noise may be a language used to represent attribute values that
is not expressive enough to model different levels of knowledge granularity. In
such a case, erroneous or missing attribute values may be introduced by users of
a system who are required to provide very specific values, while the level of their
knowledge of the domain is too general to precisely describe the observation by
the appropriate value of an attribute. Even if the person is an expert in the
domain, erroneous or missing attribute values can be observed as a consequence
of a lack of time, or of other resources, to make detailed observations (i.e., a more
complete description). However, if the language enabled modeling different levels of
knowledge granularity (very precise or more general descriptions), we would be
able to decrease the level of this description-noise.
      </p>
      <p>
        In order to model different levels of knowledge granularity, each testing and
training example would be described by a set of values for any attribute. These
sets of values should reflect the domain knowledge and cannot be constructed
arbitrarily. Let us notice that in some domains hierarchical or taxonomical
relationships between sets of values, represented by so-called concepts, may be
observed, and this knowledge could be exploited. Such knowledge is currently
often available in the form of ontologies. The most widely used language for
representing ontologies, suitable in particular for modeling taxonomical knowledge,
is the Web Ontology Language (OWL, www.w3.org/TR/owl-features/). The
theoretical counterpart of OWL, from which its semantics is drawn, is constituted
by a family of languages called description logics (DLs) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A description logic knowledge base, KB, is typically
divided into an intensional part (the terminological one, a TBox) and an
extensional part (the assertional one, an ABox).
      </p>
    </sec>
    <sec id="sec-2">
      <title>An Ontological Attribute</title>
      <p>Given is an attribute A and the set V = {V1, V2, ..., Vn}, where n &gt; 1, of
nominal values of this attribute. Let us assume that given is a TBox which specifies
domain knowledge relevant to a given classification task. In particular, it
expresses a multilevel subsumption ("is-a") hierarchy of concepts. Each concept is
described by a subset of the set V for every attribute A. Then we can formulate
a definition of an ontological attribute as follows.</p>
      <p>Ontological attribute. An ontological attribute A is defined by a tuple ⟨H, V⟩,
where:
- by H is denoted a multilevel subsumption hierarchy of concepts, derived
from a DL knowledge base. This hierarchy of concepts consists of the set of
nodes N^H = {root} ∪ N^C ∪ N^T: it defines a root-node, denoted by root,
a set N^C of complex-nodes, and a set N^T of terminal-nodes;
- by V is denoted a finite set V = {V1, V2, ..., Vn}, where n &gt; 1, of nominal
values of A;
- each node Nk ∈ N^T ∪ N^C represents a subset of the set V, denoted as
val(Nk); the root-node represents the set V.</p>
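      <p>To make the definition concrete, the following minimal Python sketch (our own illustration; the class and method names are not from the paper) models a node of the hierarchy H: each node stores the subset val(Nk) of V that it represents and its children, from which the descendant set de(Nk) is derived.</p>
      <preformat>
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of the hierarchy H of an ontological attribute A = ⟨H, V⟩."""
    name: str                      # e.g. "root", "N1", ..., "N7"
    values: frozenset              # val(Nk), a subset of V
    children: list = field(default_factory=list)

    def add_child(self, child):
        # per the definition, val(child) must be a subset of val(parent)
        assert child.values.issubset(self.values)
        self.children.append(child)
        return child

    def descendants(self):
        # de(Nk): the children of Nk together with their descendants
        for c in self.children:
            yield c
            yield from c.descendants()
      </preformat>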
      <p>To model actual training examples, an ABox would be used.</p>
      <sec id="sec-2-1">
        <title>Using Ontological Attributes in the Naïve Bayesian Classifier</title>
        <p>In order to apply the proposed ontological attributes in the naïve Bayesian
classifier, we further specify the general definition of an ontological attribute given
in the previous section. Please note that, by making the assumptions presented
in the following paragraphs, we implicitly switch from the usual open-world
assumption, used to reason with a DL knowledge base to produce a concept
hierarchy, to the closed-world assumption, which is more appropriate for inference
with the naïve Bayesian classifier. In particular, we will assume that a hierarchy of
concepts represents a hierarchical partitioning of the set V of attribute
values, such that each concept corresponds to a non-empty subset of V.</p>
        <p>Properties of nodes. Each complex-node represents a concept from the KB,
described by a proper, non-empty subset of V. Each terminal-node represents a
concept from the KB, described by a unique value Vi from the set V.</p>
        <p>Relations between nodes. For a given ontological attribute A, the hierarchy H is a
tree, i.e., each node Nk ∈ N^C ∪ N^T has exactly one parent, denoted as pa(Nk),
such that val(Nk) ⊆ val(pa(Nk)). Moreover, each node Nk ∈ {root} ∪ N^C
specifies a set ch(Nk) of its children. To model different levels of knowledge
granularity, we assume that for each node Nk ∈ {root} ∪ N^C the sets represented
by its children are pairwise disjoint and val(Nk) is the union of these sets. Finally,
for each node Nk ∈ {root} ∪ N^C we define the set de(Nk) of descendants of this
node as its children together with the descendants of its children.</p>
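        <p>Under these assumptions, a quick well-formedness check is possible: a hypothetical helper (ours, not part of the paper) can verify that, for every non-terminal node, the value sets of its children are pairwise disjoint and their union equals val(Nk).</p>
        <preformat>
def is_well_formed(node):
    """Check the partitioning property assumed for the hierarchy H."""
    if not node.children:
        return True
    union = set()
    for c in node.children:
        if union.intersection(c.values):   # children must be pairwise disjoint
            return False
        union.update(c.values)
    # val(Nk) must be exactly the union of its children's value sets
    return union == node.values and all(is_well_formed(c) for c in node.children)
        </preformat>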
        <p>The role of complex-nodes. In the setting of learning with description-noise, each
training and testing example can, in general, be described by a set Zl of values for
each attribute A, where Zl ⊆ V. We can divide training examples into noise-free
examples (|Zl| = 1) and noisy examples (|Zl| &gt; 1). In order to represent noisy
(training and testing) examples, the ontological attribute A uses complex-nodes.
We will call such a hierarchy a complex-hierarchy.</p>
        <p>Algorithm 1 (Populating a complex-hierarchy). For each ontological attribute A
we proceed as follows.
We associate each training example t, described by a set Zl of values of A and
a class label Cj (t : A = Zl ∧ C = Cj), with a node Nk. When |Zl| = 1, Zl
is associated with a terminal-node Nk such that Zl = val(Nk). Otherwise, we
associate the training example with a complex-node Nk such that Zl ⊆ val(Nk),
at the lowest possible level of the complex-hierarchy; a sketch is given below.</p>
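        <p>A sketch of Algorithm 1, under the stated assumptions, follows (the function names and the counting structure are our own). Because the children of every node are pairwise disjoint, the lowest node Nk with Zl ⊆ val(Nk) is unique and can be found by descending from the root.</p>
        <preformat>
from collections import defaultdict

def associate(root, Z):
    """Return the lowest node Nk of the complex-hierarchy with Z ⊆ val(Nk)."""
    node = root
    while True:
        # at most one child can contain Z, since siblings are disjoint
        nxt = next((c for c in node.children if Z.issubset(c.values)), None)
        if nxt is None:
            return node
        node = nxt

def populate(root, training_examples):
    """training_examples: iterable of (Zl, Cj) pairs, each Zl a frozenset."""
    counts = defaultdict(lambda: defaultdict(int))  # node name, then class, then count
    for Z, c in training_examples:
        counts[associate(root, Z).name][c] += 1
    return counts
        </preformat>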
        <p>Fig. 1. The complex-hierarchy for the example: the root-node represents V; terminal-nodes N1, ..., N5 represent the singletons {V1}, ..., {V5}; complex-node N6 represents Z1 = {V3, V4, V5} and has children N3 and N7; complex-node N7 represents Z2 = {V4, V5} and has children N4 and N5.</p>
        <p>Example. Given is an attribute A such that V = {V1, V2, V3, V4, V5}, and given is
a class variable C taking values from the set {C1, C2}. Let us
assume that the description-noise is modeled by the sets Z1 = {V3, V4, V5} and
Z2 = {V4, V5}. Let us assume a sample scenario in which the single values of the
attribute A are determined by conducting three medical tests. The first test is
able to partition the set V into the following disjoint subsets: {V1}, {V2} and
Z1 = {V3, V4, V5}. If the result of the first test is Z1, then in some cases a
second test is conducted, which partitions the set Z1 into the following disjoint
subsets: {V3} and Z2 = {V4, V5}. Only in critical cases is the last test
conducted, which can partition the set Z2 into the disjoint subsets {V4} and {V5}.
Following this domain knowledge, we have introduced two complex-nodes N6 and
N7, such that they represent the sets Z1 and Z2, respectively. Terminal-nodes
N1, N2, N3, N4, N5 represent single values from the set V. The root-node
represents the set V. The resulting complex-hierarchy is presented in Figure 1.
We can approximate the required probability distribution for a noisy testing
example described by a set Zl = val(Nk), following principles of probability
theory, by collecting frequencies of training examples T described by sets Zm ⊆
Zl, as follows:</p>
        <p>P(Zl | Cj) = ( Σ_{Zm ⊆ Zl} |T : A = Zm ∧ C = Cj| ) / |T : C = Cj|   (1)</p>
        <p>Let us recall that a set Zl is assigned to the node Nk such that Zl =
val(Nk). The key property of an ontological attribute A is that for the node Nk
all its children are pairwise disjoint. Therefore, all training examples described
by sets Zm ⊆ Zl are represented by the node Nk or its descendants, and the
probability distribution for a noisy testing example described by a set Zl can be
defined as follows:
P(Zl | Cj) = ( |T : A = val(Nk) ∧ C = Cj| + Σ_{Nd ∈ de(Nk)} |T : A = val(Nd) ∧ C = Cj| ) / |T : C = Cj|   (2)</p>
        <p>In this way we are able to classify a new noisy example using other, less
noisy and noise-free training examples. For example, we can classify a testing
example described by the set Z1 and associated with the node N6 using all training
examples described by subsets of the set Z1. These training examples would
be associated with the complex-node N6 or its descendants.</p>
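        <p>For the running example, the estimate of Eq. (2) can be sketched as follows, reusing the illustrative Node, associate and populate helpers above; class_totals is assumed to hold |T : C = Cj| per class.</p>
        <preformat>
# Build the complex-hierarchy of Figure 1.
V = frozenset({"V1", "V2", "V3", "V4", "V5"})
root = Node("root", V)
root.add_child(Node("N1", frozenset({"V1"})))
root.add_child(Node("N2", frozenset({"V2"})))
n6 = root.add_child(Node("N6", frozenset({"V3", "V4", "V5"})))  # Z1
n6.add_child(Node("N3", frozenset({"V3"})))
n7 = n6.add_child(Node("N7", frozenset({"V4", "V5"})))          # Z2
n7.add_child(Node("N4", frozenset({"V4"})))
n7.add_child(Node("N5", frozenset({"V5"})))

def p_z_given_c(root, Z, c, counts, class_totals):
    """Estimate P(Zl | Cj) as in Eq. (2): counts at Nk plus counts at de(Nk)."""
    node = associate(root, Z)       # the node Nk with Zl = val(Nk)
    total = counts[node.name][c]
    for d in node.descendants():
        total += counts[d.name][c]
    return total / class_totals[c]
        </preformat>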
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>
        The topic of learning with ontologies is relatively new, and so far there are few
approaches in this line of research; for the classification task see, for example, [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
A simple use of an ontology (Attribute Value Taxonomies) in the naïve Bayesian
classifier (AVT-NBL) is presented in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. To the best of our knowledge, this is the only existing approach
for learning the naïve Bayesian classifier
from noisy (partially specified) data. Both in our approach and in AVT-NBL,
noisy (partially specified) data is represented using hierarchical structures, and
similar aggregation procedures are used. Let us notice, however, that AVT-NBL
requires a static, predefined taxonomy of attribute values. In our approach, the
hierarchy of sets of attribute values can be constructed dynamically, driven by the
observations and the hypotheses to prove. Moreover, our aggregation procedure
allows us to construct the complex-hierarchy from all possible subsets of attribute
values. In this way we would be able to model any noisy training or testing example
in order to achieve the highest classification accuracy, which is not possible
using an Attribute Value Taxonomy. Due to space limitations, this
generalization is not discussed in the paper. Let us point out that AVT-NBL
uses a propagation procedure that does not follow principles of probability
theory. Moreover, to the best of our knowledge, AVT-NBL does not classify noisy
instances, which is the main goal of our approach.
      </p>
      <p>In the future, we will concentrate on the problem of the optimality of the
complex-hierarchy derived from domain knowledge in the form of subsumption
hierarchies of concepts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. <string-name><surname>Hickey</surname>, <given-names>R.J.</given-names></string-name>: <article-title>Noise Modelling and Evaluating Learning from Examples</article-title>. <source>Artif. Intell.</source> <volume>82</volume>(<issue>1-2</issue>) (<year>1996</year>) <fpage>157</fpage>-<lpage>179</lpage></mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. <string-name><surname>Zhu</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Wu</surname>, <given-names>X.</given-names></string-name>: <article-title>Class Noise vs. Attribute Noise: A Quantitative Study</article-title>. <source>Artif. Intell. Rev.</source> <volume>22</volume>(<issue>3</issue>) (<year>2004</year>) <fpage>177</fpage>-<lpage>210</lpage></mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. <string-name><surname>Clark</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Niblett</surname>, <given-names>T.</given-names></string-name>: <article-title>Induction in Noisy Domains</article-title>. In: <source>EWSL</source> (<year>1987</year>) <fpage>11</fpage>-<lpage>30</lpage></mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. <string-name><surname>Baader</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Calvanese</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>McGuinness</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Nardi</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Patel-Schneider</surname>, <given-names>P.</given-names></string-name>, eds.: <source>The Description Logic Handbook</source>. Cambridge University Press (<year>2003</year>)</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. <string-name><surname>d'Amato</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Fanizzi</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Esposito</surname>, <given-names>F.</given-names></string-name>: <article-title>Distance-Based Classification in OWL Ontologies</article-title>. In <string-name><surname>Lovrek</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Howlett</surname>, <given-names>R.J.</given-names></string-name>, <string-name><surname>Jain</surname>, <given-names>L.C.</given-names></string-name>, eds.: <source>KES (2)</source>. Volume <volume>5178</volume> of Lecture Notes in Computer Science, Springer (<year>2008</year>) <fpage>656</fpage>-<lpage>661</lpage></mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. <string-name><surname>Zhang</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Honavar</surname>, <given-names>V.</given-names></string-name>: <article-title>AVT-NBL: An Algorithm for Learning Compact and Accurate Naïve Bayes Classifiers from Attribute Value Taxonomies and Data</article-title>. In: <source>ICDM '04: Proceedings of the Fourth IEEE International Conference on Data Mining</source>, Washington, DC, USA, IEEE Computer Society (<year>2004</year>) <fpage>289</fpage>-<lpage>296</lpage></mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>