<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning Sentences and Assessments in Probabilistic Description Logics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jos´e Eduardo Ochoa Luna</string-name>
          <email>eduardo.ol@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kate Revoredo</string-name>
          <email>katerevoredo@uniriotec.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Gagliardi Cozman</string-name>
          <email>fgcozman@usp.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Departamento de Inform ́atica Aplicada</institution>
          ,
          <addr-line>Unirio Av. Pasteur, 458, Rio de Janeiro, RJ</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Escola Polit ́ecnica, Universidade de S ̃ao Paulo</institution>
          ,
          <addr-line>Av. Prof. Mello Morais 2231, S ̃ao Paulo - SP</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The representation of uncertainty in the semantic web can be eased by the use of learning techniques. To completely induce a probabilistic ontology (that is, an ontology encoded through a probabilistic description logic) from data, two basic tasks must be solved: (1) learning concept definitions and (2) learning probabilistic inclusions. In this paper we propose and test an algorithm that learns concept definitions using an inductive logic programming approach and then learns probabilistic inclusions using relational data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Probabilistic Description Logics (PDLs) have been extensively investigated in
the last few years [
        <xref ref-type="bibr" rid="ref19 ref5 ref7 ref8">5, 8, 19, 7</xref>
        ]. The goal is to represent uncertainty in the context
of classical description logics. So far probabilistic description logics have been
mostly restricted to academic purposes, as caveats in syntax and semantics have
prevented them from spreading into several domains. Additionaly, it can be hard
to elicit the probability component of a particular set of sentences.
      </p>
      <p>
        The probabilistic description logic crALC [
        <xref ref-type="bibr" rid="ref22 ref6 ref7">6, 22, 7</xref>
        ] allows one to perform
probabilistic reasoning by adding uncertainty capabilities to the logic ALC [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Previous efforts for learning crALC have separately focused on concept
definitions [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and probabilistic inclusions [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. In this paper, we present an algorithm
for learning concept definitions and probabilistic inclusions at once; i.e., we
discuss how to construct the whole probabilistic terminology based on crALC from
relational data. We expect that learning techniques can accomodate together
background knowledge and deterministic and probabilistic concepts, giving each
component its due relevance.
      </p>
      <p>
        The algorithm we propose is mostly based on inductive logic programming
(ILP) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] techniques with a probabilistic twist. A search for the best concept
description is performed. At the end of this search a decision is made as to
whether to consider the concept description found or to insert a probabilistic
inclusion based on this concept.
      </p>
      <p>The paper is organized as follows. Section 2 reviews basic concepts of
description logics, probabilistic description logics, crALC and machine learning
in a deterministic setting. Section 3 presents our algorithm for probabilistic
description logic learning. Experiments are discussed in Section 4, and Section 5
concludes the paper.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Basics</title>
      <p>The aim of this paper is to learn probabilistic terminologies from data. In this
section we briefly review both deterministic and probabilistic components of
probabilistic description logics. In addition, machine learning in a deterministic
setting is discussed.
2.1</p>
      <sec id="sec-2-1">
        <title>Description Logics</title>
        <p>
          Description logics (DLs) form a family of representation languages that are
typically decidable fragments of first order logic (FOL) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Knowledge is expressed
in terms of individuals, concepts, and roles. The semantic of a description is
given by a domain D (a set) and an interpretation ·I (a functor). Individuals
represent objects through names from a set NI = {a, b, . . .}. Each concept in the
set NC = {C, D, . . .} is interpreted as a subset of a domain D. Each role in the
set NR = {r, s, . . .} is interpreted as a binary relation on the domain.
        </p>
        <p>Concepts and roles are combined to form new concepts using a set of
constructors. Constructors in the ALC logic are conjunction (C ⊓D), disjunction (C ⊔D),
negation (¬C), existential restriction (∃r.C), and value restriction (∀r.C).
Concept inclusions/definitions are denoted respectively by C ⊑ D and C ≡ D, where
C and D are concepts. Concepts (C ⊔ ¬C) and (C ⊓ ¬C) are denoted by ⊤ and
⊥ respectivelly. Information is stored in a knowledge base (K) divided in two
parts: the TBox (terminology) and the ABox (assertions). The TBox lists
concepts and roles and their relationships. A TBox is acyclic if it is a set of concept
inclusions/definitions such that no concept in the terminology uses itself. The
ABox contains assertions about objects.</p>
        <p>
          Given a knowledge base K =&lt; T , A &gt;, the reasoning services typically
include (i) consistency problem (to check whether the A is consistent with respect
to the T ); (ii) entailment problem (to check whether an assertion is entailed by
K; note that this generates class-membership assertions K |= C(a), where a is
an individual and C is a concept); (iii) concept satisfiability problem (to check
whether a concept is subsumed by another concept with respect to the T ). The
latter two reasoning services can be reduced to the consistency problem [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
2.2
        </p>
        <p>
          Probabilistic Description Logics and crALC
Several probabilistic descriptions logics (PDLs) have appeared in the literature.
Heinsohn [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], Jaeger [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and Sebastiani [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] consider probabilistic inclusion
axioms such as PD(Professor) = α, meaning that a randomly selected object is a
Professor with probability α. This characterizes a domain-based semantics:
probabilities are assigned to subsets of the domain D. Sebastiani also allows inclusions
such as P (Professor(John)) = α, specifying probabilities over the interpretations
themselves. For example, one interprets P (Professor(John)) = 0.001 as assigning
0.001 to be the probability of the set of interpretations where John is a Professor.
This characterizes an interpretation-based semantics.
        </p>
        <p>
          The PDL crALC is a probabilistic extension of the DL ALC that adopts an
interpretation-based semantics. It keeps all constructors of ALC, but only allows
concept names on the left hand side of inclusions/definitions. Additionally, in
crALC one can have probabilistic inclusions such as P (C|D) = α or P (r) = β
for concepts C and D, and for role r. If the interpretation of D is the whole
domain, then we simply write P (C) = α. The semantics of these inclusions is
roughly (a formal definition can be found in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]) given by:
        </p>
        <p>∀x ∈ D : P (C(x)|D(x)) = α,
∀x ∈ D, y ∈ D : P (r(x, y)) = β.</p>
        <p>We assume that every terminology is acyclic; no concept uses itself. This
assumption allows one to represent any terminology T through a directed acyclic
graph. Such a graph, denoted by G(T ), has each concept name and role name
as a node, and if a concept C directly uses concept D, that is if C and D appear
respectively in the left and right hand sides of an inclusion/definition, then D
is a parent of C in G(T ). Each existential restriction ∃r.C and value restriction
∀r.C is added to the graph G(T ) as nodes, with an edge from r to each restriction
directly using it. Each restriction node is a deterministic node in that its value
is completely determined by its parents.</p>
        <p>
          The semantics of crALC is based on probability measures over the space of
interpretations, for a fixed domain. Inferences, such as P (Ao(a0)|A) for an ABox
A, can be computed by propositionalization and probabilistic inference (for exact
calculations) or by a first order loopy propagation algorithm (for approximate
calculations) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
2.3
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Learning Description Logics</title>
        <p>
          The use of ontologies for knowledge representation has been a key element of
proposals for the Semantic Web [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. However, constructing ontologies from scratch
can be a bundersome and time consuming task [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Nowadays, mainly due to
the availability of data, learning of ontologies has turn out to be an
interesting alternative. Indeed, considerable effort is currently invested into developing
automated means for the acquisition of ontologies [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>
          Most early approaches were only capable of learning simple ontologies such
as taxonomic hierarchies. Some recent approaches such as YINYANG [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ],
DLFOIL [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and DL-Learner [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] have focused on learning expressive terminologies
(we refer to [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] for a detailed review on learning description logics). To some
extent, all these approaches have been inspired by Inductive Logic Programming
(ILP) techniques, in that they try to transfer ILP methods to description logic
settings. The goal of learning in such deterministic languages is generally to find
a correct concept with respect to given examples. A formal definition is:
Definition 1. Given a knowledge base K, a target concept Target such that
Target 6∈ K, a set E = Ep ∪ En of positive and negative examples given as
assertions for Target, the goal of learning is to find a concept definition C(Target ≡ C)
such that K ∪ C |= Ep and K ∪ C 6|= En.
        </p>
        <p>
          A sound concept definition for Target must cover all positive examples and
none of the negative examples. A learning algorithm can be constructed as a
combination of (1) a refinement operator, which defines how a search tree can
be built, (2) a search algorithm, which controls how the tree is traversed, and
(3) a scoring function to evaluate the nodes in the tree defining the best one.
The refinement operator Refinement operators allow us to find candidate
concept definitions through two basic tasks: generalization and specialization
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Such operators in both ILP and description logic learning rely on
θ-subsumption to establish an ordering so as to traverse the search space. If a concept
C subsumes a concept D(D ⊑ C), then C covers all examples which are covered
by D, which makes subsumption a suitable order. Arguably the best refinement
operator for description logic learning is the one available in the DL-Learner
system [
          <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ], as this operator has been proved to be complete, weakly complete
and proper (see [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] for details).
        </p>
        <p>
          The score function In a deterministic setting a cover relationship simply tests
whether, for given candidate concept definition (C), a given example e holds;
that is, K ∪ C |= e where e ∈ Ep or e ∈ En. In this sense, a cover relationship
cover(e, K, C) indicates whether a candidate concept covers a given example. A
cover relationship is commonly evaluated by instance checking [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>
          In description logic learning one often compares candidates through score
functions based on the number of positive/negative examples covered. To avoid
overfitting on concepts, horizontal expansions3 are also explored [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. For
instance, in DL-Learner a fitness relationship considers the number of positive
examples as well as the length of solutions when expanding candidates in the
tree search.
        </p>
        <p>
          The algorithm to traverse the search space The learning algorithm
depends basically on the way we traverse the candidate concepts obtained after
applying refinement operators. In a deterministic setting the search for
candidate concepts is often based on the FOIL [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] algorithm. There are also different
approaches (for instance, DL-Learner, an approach based on genetic algorithms
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], and one that relies on horizontal expansion and redundance checking to
traverse search trees [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]).
3 Given a node in a search tree, the horizontal expansion is its upper bound on the
length of child concepts.
        </p>
        <p>Learning the PDL crALC
A probabilistic terminology consists of both concepts definitions and
probabilistic components (probabilistic inclusions in crALC). We aim at automatically
identifying from data sound deterministic concepts and consistent probabilistic
inclusions. A key design choice in learning under a combined approach is to give
a due relevance to each component.</p>
        <p>It is worth noting that there are well established deterministic concepts such
as Father ≡ Male ⊓ hasChild.⊤ for which it would be unnecessary to find a
probabilistic interpretation. On the other hand, there are concepts with natural
probabilistic assessments such as P (FlyingBird|Bird) = α. In principle, a learning
algorithm should be able to deal with these subtleties.</p>
        <p>We argue that negative and positive examples underlie the choice of either a
concept definition or a probabilistic inclusion. In a deterministic setting we
expect to find concepts covering all positive examples, which is not always possible.
It is common to allow flexible heuristics that deal with these issues. Moreover,
there are several examples that cannot be ascribed to candidate hypotheses4.
Uncertainty stems from such missing information. Therefore, when we are
unable to find a concept definition that covers all positive examples we assume
such hypothesis as candidates to be a probabilistic inclusion and we begin the
search for the best probabilistic inclusion that fits the examples.</p>
        <p>As in description logic learning three tasks are important and should be
considered: (1) refinement operators, (2) scoring functions and (3) a traverse
search space algorithm. The refinement operator described in 2.3 is used for
learning the deterministic component of probabilistic terminologies. The other
two tasks were adapted for probabilistic description logic learning as follows.
3.1</p>
      </sec>
      <sec id="sec-2-3">
        <title>The Probabilistic Score Function</title>
        <p>
          In our proposal, since we want to learn probabilistic terminologies, we adopt a
probabilistic cover relation given in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]:
        </p>
        <p>cover (e, K, C) = P (e|K, C).</p>
        <p>Every candidate hypothesis together with a given example turns out to be a
probabilistic random variable which yields true if the example is covered, and
false otherwise. To guarantee soundness of the ILP process (that is, to cover
positive examples and not to cover negative examples), the following restrictions
are needed:</p>
        <p>P (ep|K, C) &gt; 0,</p>
        <p>P (en|K, C) = 0.</p>
        <p>In this way a probabilistic cover relationship is a generalization of the
deterministic cover, and is suitable for a combined approach. Probabilities can be
4 In some cases the Open World Assumption inherent to description logics prevent us
for stating membership of concepts.
computed through Bayes’ theorem:</p>
        <p>P (e|K, C1, . . . , Ck) =</p>
        <p>P (C1, C2, . . . , Ck|T )P (T )</p>
        <p>
          P (C1, . . . , Ck)
,
where C1, . . . , Ck are candidate concepts definitions, and T denotes the target
concept variable. Here are three possibilities for modeling P (C1, . . . , Ck|T ): (1)
a naive Bayes assumption may be adopted [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] (each candidate concept is
independent given the target), and then P (C1, . . . , Ck|T ) = Qi P (Ci|T ); (2) the
noisy-OR function may be used [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]; (3) a less restrictive option based on tree
augmented naive Bayes networks (TAN) may be handy [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. This last possibility
has been considered for the probabilistic cover relationship used in this paper.
In each case probabilities are estimated by maximum (conditional) likelihood
parameters. The candidate concept definition Ci with the highest probability
P (Ci|T ) is the one chosen as the best candidate.
        </p>
        <p>As we have chosen a probabilistic cover relationship, our probabilistic score
is defined accordingly:
score(K|C) =</p>
        <p>Y P (ei|K, C),
ei∈Ep
where C is the best candidate chosen as described before.</p>
        <p>In the probabilistic score we have previously defined, a given threshold allow
us differentiate between a deterministic and probabilistic inclusion candidate.
Further details are given in the next section.
3.2</p>
      </sec>
      <sec id="sec-2-4">
        <title>The Algorithm to Learn Probabilistic Terminologies</title>
        <p>
          Previous efforts for learning the PDL crALC have separately explored concepts
definitions [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] and probabilistic inclusions [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. In this paper, we advocate for a
combined approach where we use a classical approach for traversing the space of
deterministic concepts and a probabilistic procedure for generating probabilistic
inclusions.
        </p>
        <p>The choice between a deterministic or a probabilistic inclusion is based on a
probabilistic score. We start by searching a deterministic concept. If after a set of
iterations the score of the best candidate is below a given threshold, a search for a
probabilistic inclusion is preferred rather than keep searching for a deterministic
concept definition. Then, the current best k-candidates are considered as start
point for probabilistic inclusion search. The complete learning procedure is shown
in Algorithm 1.</p>
        <p>The algorithm starts with an overly general concept definition in the root
of the search tree (line 1). This node is expanded according to refinement
operators and horizontal expansion criterion (line 4), i.e, child nodes obtained by
refinement operators are added to the search tree (line 5). The probabilistic
parameters of these child nodes are learned (line 6) and then they are evaluated
with the best one chosen for a new expansion (line 3) (alternative nodes based
on horizontal expansion factor are also considered (line 8)). This process
continues until a stopping criterion is attained (difference for scores is insignificant);
After that, the best node obtained is evaluated and if it is above a threshold,
a deterministic concept definition is found and returned (line 11). Otherwise, a
probabilistic inclusion procedure is called (line 13).</p>
        <p>
          The Algorithm 2 learns probabilistic inclusions. It starts retrieving the best
k nodes in the search tree and computing the conditional mutual information for
every pair of nodes (line 2). Then an undirected graph is built where the vertices
are the k nodes and the edges are weighted with the value of the conditional
mutual information [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] for each pair of vertices (lines 4 and 5). A maximum
weight spanning tree [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] from this graph is built (line 6) and the target concept
is added as a parent for each vertice (line 7). The probabilistic parameters are
learned (line 8). This learned TAN-based classifier [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is used to evaluate the
possible probabilistic inclusion candidates (line 9) and the best one is returned.
4
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>In order to evaluate the learning algorithm we have divided the analysis in two
stages. In a first stage, the algorithm was compared with, arguably, the best
description logic learning algorithm available (the DL-Learner system). The second
stage evaluated suitability of the algorithm for learning probabilistic
terminologies in real world domains.</p>
      <p>
        The aim of the first stage was to investigate whether by introducing a
probabilistic setting the algorithm behaves as well as traditional deterministic
approaches in description logic learning tasks. In this preliminar evaluation (as a
rule, there is a lack of evaluation standards in ontology learning [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]) we have
Require: SearchTree previously computed
1: for each pair of candidates Ci, Cj in first k nodes of the SearchTree do
2: compute the conditional mutual information I(Ci, Cj|T )
3: end for
4: build an undirected graph in which vertices are the k candidates
5: annotate the weight of an edge connecting Ci to Cj by the I(Ci, Cj|T )
6: build a maximum weight spanning tree from this graph
7: add T as parent for each Ci
8: learn probabilities for P (Ci|P arents(Ci))
9: return the highest probabilistic inclusion P (T |C′) = α
      </p>
      <p>
        Algorithm 2: Algorithm for learning probabilistic inclusions.
considered five datasets available in the DL-Learner system and reported in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
Evaluation results are shown in Table 1.
      </p>
      <p>The combined approach was able to learn correct concept definitions.
However, in some cases produced longer solutions.</p>
      <p>In the second stage we focused on learning of probabilistic terminologies from
real world data. Wikipedia5 was used to do so. Wikipedia articles consist mostly
of free text, but also contain various types of structured information in the form
of Wiki markup. Such information includes infobox templates, categorization
information, images geo-coordinates, links to external Web pages, disambiguation
pages, redirects between pages, and links across different language editions of
Wikipedia.</p>
      <p>
        In the last years, there were several projects aimed at structuring such huge
source of knowledge. Examples include, The DBpedia project [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which
extracts structured information from Wikipedia and turns it into a rich
knowledge base, and YAGO [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], a semantic knowledge base based on data from
Wikipedia and WordNet6. Currently, YAGO knows more than 2 million entities
(like persons, organizations, cities, etc.). It knows 20 million facts about these
5 http://www.wikipedia.org/
6 wordnet.princeton.edu/
entities. Unlike many other automatically assembled knowledge bases, YAGO
has a manually confirmed accuracy of 95%. Several domains ranging from films,
places, historical events, wines, etc. have been considered in this ontology.
Moreover, facts are given as binary relationships that are suitable for our learning
settings. There are approximately 92 relationships available. Examples include
actedIn, bornIn, created, discovered describes, diedIn, happenedIn, hasAcademicAdvisor,
hasChild, hasHDI, hasWonPrize, influences, isMarriedTo, isPartOf, livesIn, politicianOf,
worksAt.
      </p>
      <p>We have used subsets of YAGO facts for learning probabilistic terminologies.
Two domais have been mostly explored. The first, related to scientists. The
second, related to film directors. In both cases the threshold used was 0.85 and
the 20 best candidates were considered in the probabilistic inclusion learning
step.</p>
      <p>The first dataset consists of 2008 potential scientists for which we have
learned concept definitions and probabilistic inclusions. The complete
terminology is given below:</p>
      <p>P (Person) = 0.9
P (Topic) = 0.4
P (Year) = 0.35
P (Prize) = 0.2
P (Text) = 0.25
P (EducationalInstitution) = 0.3
P (wrotes) = 0.4
P (hasAcademicAdvisor) = 0.80
P (interestedIn) = 0.6
P (diedOnYear) = 0.7
P (hasWonPrize) = 0.4
P (worksAt) = 0.85</p>
      <p>P (influences) = 0.6
Scientist ≡ Person
⊓(∃hasAcademicAdvisor.Person
⊓∃wrotes.Text ⊓ ∃worksAt.EducationalInsitution)
P (InfluentialScientist | Scientist ⊓ ∃influences.</p>
      <p>∃diedOnYear.Year) = 0.85
P (Musician | Person ⊓ ∃hasAcademicAdvisor.∃wrote.Text) = 0.1
HonoredScientist ≡ Scientist</p>
      <p>⊓ ∃hasWonPrize.Prize</p>
      <p>This resulting crALC terminology can be further investigated by
probabilistic inference7. The basic task we address is classification. Assume we are
interested in classifying a potential scientist given we know he/she has written
a book and has an academic advisor:
P (Scientist(0)|Person(0) ⊓ ∃wrote.Text(1) ⊓ hasAcademicAdvisor.Person(2)) = 0.5
When further evidence is available the value probability is updated to:</p>
      <sec id="sec-3-1">
        <title>P (Scientist(0) |Person(0) ⊓(∃wrote.Text(1) ⊓ ∃hasAcademicAdvisor. ∃influences.Person(3))) = 0.65</title>
        <p>7 Given a domain size, a relational Bayesian network is constructed to do so.</p>
        <p>In the second dataset we have collected facts about film directors ranging
from classical to contemporary. About 5589 potential directors have been
considered. The complete probabilistic terminology is shown below.</p>
        <p>Learned components range from basic concept definitions such as Actor to
probabilistic inclusions for describing most influential directors. Assume we are
interested in classifying a person given we know that he/she has acted and
directed. According to evidence available:</p>
        <p>P (Actor(0)|Person(0) ⊓ ∃actedIn.Film(1) ⊓ ∃directed.Film(2)) = 0.4
P (Director(0)|Person(0) ⊓ ∃actedIn.Film(1) ⊓ ∃directed.Film(2)) = 0.55
As further evidence is given, probability value changes to:</p>
      </sec>
      <sec id="sec-3-2">
        <title>P (Actor(0) |Person(0) ⊓(∃actedIn.Film(1) ⊓ ∃directed.Film(2) ⊓∃influences.Person(3))) = 0.3</title>
        <p>5</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We have proposed a method for learning deterministic/probabilistic components
of terminologies expressed in crALC. Differently from previous approaches, we
have produced a combined scheme, where both the deterministic and
probabilistic components receive due attention.</p>
      <p>This unified learning scheme has the following components: (1) a refinement
operator for traversing the search space, (2) probabilistic cover and score
relationships for evaluating candidates, (3) a mixed search procedure. Initially, the
search aims at finding deterministic concepts. If the score obtained is below a
given threshold, a probabilistic inclusion search is conducted (a probabilistic
classifier is produced). Experiments with probabilistic terminology in a real-world
domain suggest that probabilistic inclusions do lead to improved likelihoods.</p>
      <p>Probabilistic description logics offer expressive languages in which to conduct
learning, while charging a relatively low cost for inference. The present
contribution offers novel ideas for this sort of learning task; we note that the current
literature on this topic is rather scarce. Our future work is to investigate the
scalability of our learning methods.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>The first author is supported by CAPES and the third author is partially
supported by CNPq. The work reported here has received substantial support
through FAPESP grant 2008/03995-5.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>G.</given-names>
            <surname>Antoniou</surname>
          </string-name>
          and
          <string-name>
            <given-names>F. van Harmelen. Semantic</given-names>
            <surname>Web</surname>
          </string-name>
          <article-title>Primer</article-title>
          . MIT Press,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Nutt</surname>
          </string-name>
          .
          <article-title>Basic description logics</article-title>
          .
          <source>In Description Logic Handbook</source>
          , pages
          <fpage>47</fpage>
          -
          <lpage>100</lpage>
          . Cambridge University Press,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , G. Kobilarov,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann. DBpedia -</surname>
          </string-name>
          <article-title>a crystallization point for the web of data</article-title>
          .
          <source>Web Semant</source>
          .,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <fpage>154</fpage>
          -
          <lpage>165</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Cormen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Leiserson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Rivest</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Stein</surname>
          </string-name>
          . Introduction to Algorithms. MIT Press,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>P. C. G.</given-names>
            <surname>Costa and K. B. Laskey</surname>
          </string-name>
          .
          <article-title>PR-OWL: A framework for probabilistic ontologies</article-title>
          .
          <source>In Proceeding of the 2006 conference on Formal Ontology in Information Systems</source>
          , pages
          <fpage>237</fpage>
          -
          <lpage>249</lpage>
          , Amsterdam, The Netherlands, The Netherlands,
          <year>2006</year>
          . IOS Press.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>F. G.</given-names>
            <surname>Cozman and R. B. Polastro</surname>
          </string-name>
          .
          <article-title>Loopy propagation in a probabilistic description logic</article-title>
          . In Sergio Greco and Thomas Lukasiewicz, editors,
          <source>Second International Conference on Scalable Uncertainty Management, Lecture Notes in Artificial Intelligence (LNAI 5291)</source>
          , pages
          <fpage>120</fpage>
          -
          <lpage>133</lpage>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>F. G.</given-names>
            <surname>Cozman and R. B. Polastro</surname>
          </string-name>
          .
          <article-title>Complexity analysis and variational inference for interpretation-based probabilistic description logics</article-title>
          .
          <source>In Conference on Uncertainty in Artificial Intelligence</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>C. D'Amato</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Fanizzi</surname>
            , and
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Lukasiewicz</surname>
          </string-name>
          .
          <article-title>Tractable reasoning with Bayesian description logics</article-title>
          .
          <source>In SUM '08: Proceedings of the 2nd international conference on Scalable Uncertainty Management</source>
          , pages
          <fpage>146</fpage>
          -
          <lpage>159</lpage>
          , Berlin, Heidelberg,
          <year>2008</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. L. De Raedt, editor.
          <source>Advances in Inductive Logic Programming</source>
          . IOS Press,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>N.</given-names>
            <surname>Fanizzi</surname>
          </string-name>
          ,
          <string-name>
            <surname>C. D'Amato</surname>
            , and
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Esposito</surname>
          </string-name>
          .
          <article-title>DL-FOIL concept learning in description logics</article-title>
          .
          <source>In ILP '08: Proceedings of the 18th International Conference on Inductive Logic Programming</source>
          , pages
          <fpage>107</fpage>
          -
          <lpage>121</lpage>
          , Berlin, Heidelberg,
          <year>2008</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>N.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Geiger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldszmidt</surname>
          </string-name>
          .
          <article-title>Bayesian network classifiers</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>29</volume>
          :
          <fpage>131</fpage>
          -
          <lpage>163</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>J.</given-names>
            <surname>Heinsohn</surname>
          </string-name>
          .
          <article-title>Probabilistic description logics</article-title>
          .
          <source>In International Conf. on Uncertainty in Artificial Intelligence</source>
          , pages
          <fpage>311</fpage>
          -
          <lpage>318</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. L.
          <string-name>
            <surname>Iannone</surname>
            ,
            <given-names>I. Palmisano</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Fanizzi</surname>
          </string-name>
          .
          <article-title>An algorithm based on counterfactuals for concept learning in the semantic web</article-title>
          .
          <source>Applied Intelligence</source>
          ,
          <volume>26</volume>
          (
          <issue>2</issue>
          ):
          <fpage>139</fpage>
          -
          <lpage>159</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>M.</given-names>
            <surname>Jaeger</surname>
          </string-name>
          .
          <article-title>Probabilistic reasoning in terminological logics</article-title>
          .
          <source>In Principals of Knowledge Representation (KR)</source>
          , pages
          <fpage>461</fpage>
          -
          <lpage>472</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>N.</given-names>
            <surname>Landwehr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>DeRaedt. Integrating</surname>
          </string-name>
          <article-title>Na¨ıve Bayes and FOIL</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          ,
          <volume>8</volume>
          :
          <fpage>481</fpage>
          -
          <lpage>507</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          .
          <article-title>Hybrid learning of ontology classes</article-title>
          .
          <source>In Proceedings of the 5th International Conference on Machine Learning and Data Mining</source>
          , volume
          <volume>4571</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>883</fpage>
          -
          <lpage>898</lpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          .
          <article-title>Foundations of refinement operators for description logics</article-title>
          . In Hendrick Blockeel,
          <string-name>
            <given-names>Jude W.</given-names>
            <surname>Shavlik</surname>
          </string-name>
          , and Prasad Tadepalli, editors,
          <source>ILP '07: Proceedings of the 17th International Conference on Inductive Logic Programming</source>
          , volume
          <volume>4894</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>161</fpage>
          -
          <lpage>174</lpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          .
          <article-title>A refinement operator based learning algorithm for the ALC description logic</article-title>
          . In Hendrick Blockeel,
          <string-name>
            <given-names>Jude W.</given-names>
            <surname>Shavlik</surname>
          </string-name>
          , and Prasad Tadepalli, editors,
          <source>ILP '07: Proceedings of the 17th International Conference on Inductive Logic Programming</source>
          , volume
          <volume>4894</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>147</fpage>
          -
          <lpage>160</lpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>T.</given-names>
            <surname>Lukasiewicz</surname>
          </string-name>
          .
          <article-title>Expressive probabilistic description logics</article-title>
          .
          <source>Artif</source>
          . Intell.,
          <volume>172</volume>
          (
          <issue>6- 7</issue>
          ):
          <fpage>852</fpage>
          -
          <lpage>883</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Ochoa-Luna</surname>
          </string-name>
          and
          <string-name>
            <surname>F. G. Cozman.</surname>
          </string-name>
          <article-title>An algorithm for learning with probabilistic description logics</article-title>
          .
          <source>In 5th International Workshop on Uncertainty Reasoning for the Semantic Web (URSW) at the 8th International Semantic Web Conference (ISWC)</source>
          , pages
          <fpage>63</fpage>
          -
          <lpage>74</lpage>
          , Chantilly, USA,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          .
          <article-title>Probabilistic Reasoning in Intelligent Systems: networks of plausible inference</article-title>
          .
          <source>Morgan Kaufman</source>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>R. B. Polastro</surname>
            and
            <given-names>F. G. Cozman.</given-names>
          </string-name>
          <article-title>Inference in probabilistic ontologies with attributive concept descriptions and nominals</article-title>
          .
          <source>In 4th International Workshop on Uncertainty Reasoning for the Semantic Web (URSW) at the 7th International Semantic Web Conference (ISWC)</source>
          , Karlsruhe, Germany,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>J. R. Quinlan</surname>
            and
            <given-names>R. M.</given-names>
          </string-name>
          <string-name>
            <surname>Cameron-Jones</surname>
          </string-name>
          .
          <article-title>FOIL: A midterm report</article-title>
          .
          <source>In Proceedings of the European Conference on Machine Learning</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>20</lpage>
          . Springer-Verlag,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>K.</given-names>
            <surname>Revoredo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ochoa-Luna</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.G.</given-names>
            <surname>Cozman</surname>
          </string-name>
          .
          <article-title>Learning terminologies in probabilistic description logics</article-title>
          .
          <source>In Proceedings of the 20th Brazilian Symposium on Artificial Intelligence</source>
          . To appear,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>F.</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          .
          <article-title>A probabilistic terminological logic for modelling information retrieval</article-title>
          .
          <source>In ACM Conf. on Research and Development in Information Retrieval (SIGIR)</source>
          , pages
          <fpage>122</fpage>
          -
          <lpage>130</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>F. M. Suchanek</surname>
            ,
            <given-names>G.</given-names>
            Kasneci, and G.
          </string-name>
          <string-name>
            <surname>Weikum.</surname>
          </string-name>
          <article-title>Yago: a core of semantic knowledge</article-title>
          .
          <source>In WWW '07: Proceedings of the 16th international conference on World Wide Web</source>
          , pages
          <fpage>697</fpage>
          -
          <lpage>706</lpage>
          , New York, NY, USA,
          <year>2007</year>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>