=Paper= {{Paper |id=Vol-527/paper-1 |storemode=property |title=Here you can download the complete proceedings as a single PDF-file |pdfUrl=https://ceur-ws.org/Vol-527/proceedings.pdf |volume=Vol-527 }} ==Here you can download the complete proceedings as a single PDF-file== https://ceur-ws.org/Vol-527/proceedings.pdf

Foreword

This volume contains the papers presented at the 5th International Workshop on Uncer-
tainty Reasoning for the Semantic Web (URSW 2009), held as a part of the 8th Inter-
national Semantic Web Conference (ISWC 2009) at the Westfields Conference Center
near Washington, DC, USA, October 26, 2009. It contains 6 technical papers and 3 po-
sition papers, which were selected in a rigorous reviewing process, where each paper
was reviewed by at least four program committee members.
The International Semantic Web Conference is a major international forum for
presenting visionary research on all aspects of the Semantic Web. The International
Workshop on Uncertainty Reasoning for the Semantic Web is an exciting opportu-
nity for collaboration and cross-fertilization between the uncertainty reasoning com-
munity and the Semantic Web community. Effective methods for reasoning under un-
certainty are vital for realizing many aspects of the Semantic Web vision, but the abil-
ity of current-generation Web technology to handle uncertainty is extremely limited.
Recently, there has been a groundswell of demand for uncertainty reasoning technol-
ogy among Semantic Web researchers and developers. This surge of interest creates a
unique opening to bring together two communities with a clear commonality of interest
but little history of interaction. By capitalizing on this opportunity, URSW could spark
dramatic progress toward realizing the Semantic Web vision.
Audience: The intended audience for this workshop includes the following: (1) re-
searchers in uncertainty reasoning technologies with interest in Semantic Web and Web-
related technologies; (2) Semantic Web developers and researchers; (3) people in the
knowledge representation community with interest in the Semantic Web; (4) ontology
researchers and ontological engineers; (5) Web services researchers and developers with
interest in the Semantic Web; and (6) developers of tools designed to support Semantic
Web implementation, e.g., Jena, Protégé, and Protégé-OWL developers.
Topics: We intended to have an open discussion on any topic relevant to the general
subject of uncertainty in the Semantic Web (including fuzzy theory, probability the-
ory, and other approaches). Therefore, the following list should be just an initial guide:
(1) syntax and semantics for extensions to Semantic Web languages to enable repre-
sentation of uncertainty; (2) logical formalisms to support uncertainty in Semantic Web
languages; (3) probability theory as a means of assessing the likelihood that terms in
different ontologies refer to the same or similar concepts; (4) architectures for applying
plausible reasoning to the problem of ontology mapping; (5) using fuzzy approaches to
deal with imprecise concepts within ontologies; (6) the concept of a probabilistic ontol-
ogy and its relevance to the Semantic Web; (7) best practices for representing uncertain,
incomplete, ambiguous, or controversial information in the Semantic Web; (8) the role
of uncertainty as it relates to Web services; (9) interface protocols with support for
uncertainty as a means to improve interoperability among Web services; (10) uncer-
tainty reasoning techniques applied to trust issues in the Semantic Web; (11) existing
implementations of uncertainty reasoning tools in the context of the Semantic Web;
(12) issues and techniques for integrating tools for representing and reasoning with un-
certainty; and (13) the future of uncertainty reasoning for the Semantic Web.
II

We wish to thank all authors who submitted papers and all workshop participants
for fruitful discussions. We would like to thank the program committee members and
external referees for their timely expertise in carefully reviewing the submissions.

October 2009 Fernando Bobillo
Paulo C. G. da Costa
Claudia d’Amato
Nicola Fanizzi
Kathryn B. Laskey
Kenneth J. Laskey
Thomas Lukasiewicz
Trevor Martin
Matthias Nickles
Michael Pool
Pavel Smrž
Workshop Organization

Program Chairs
Fernando Bobillo (University of Zaragoza, Spain)
Paulo C. G. da Costa (George Mason University, USA)
Claudia d’Amato (University of Bari, Italy)
Nicola Fanizzi (University of Bari, Italy)
Kathryn B. Laskey (George Mason University, USA)
Kenneth J. Laskey (MITRE Corporation, USA)
Thomas Lukasiewicz (University of Oxford, UK)
Trevor Martin (University of Bristol, UK)
Matthias Nickles (University of Bath, UK)
Michael Pool (Convera, Inc., USA)
Pavel Smrž (Brno University of Technology, Czech Republic)

Program Committee
Fernando Bobillo (University of Zaragoza, Spain)
Silvia Calegari (University of Milano-Bicocca, Italy)
Rommel Carvalho (George Mason University, USA)
Paulo C. G. da Costa (George Mason University, USA)
Fabio Gagliardi Cozman (University of São Paulo, Brazil)
Claudia d’Amato (University of Bari, Italy)
Nicola Fanizzi (University of Bari, Italy)
Marcelo Ladeira (University of Brasilia, Brazil)
Kathryn B. Laskey (George Mason University, USA)
Kenneth J. Laskey (MITRE Corporation, USA)
Thomas Lukasiewicz (University of Oxford, UK)
Anders L. Madsen (Hugin Expert A/S, Denmark)
Trevor Martin (University of Bristol, UK)
Matthias Nickles (University of Bath, UK)
Jeff Z. Pan (University of Aberdeen, UK)
Yun Peng (University of Maryland, Baltimore County, USA)
Michael Pool (Convera, Inc., USA)
Livia Predoiu (University of Mannheim, Germany)
Guilin Qi (University of Karlsruhe, Germany)
Carlos Henrique Ribeiro (Instituto Tecnológico de Aeronáutica, Brazil)
David Robertson (University of Edinburgh, UK)
Daniel Sánchez (University of Granada, Spain)
Sergej Sizov (University of Koblenz-Landau, Germany)
Pavel Smrž (Brno University of Technology, Czech Republic)
Giorgos Stoilos (National Technical University of Athens, Greece)
Umberto Straccia (ISTI-CNR, Italy)
IV

Andreas Tolk (Old Dominion University, USA)
Johanna Völker (University of Karlsruhe, Germany)
Peter Vojtáš (Charles University, Czech Republic)

External Reviewers

Zhiqiang Gao
Table of Contents

Technical Papers

Probabilistic Ontology and Knowledge Fusion for Procurement Fraud
Detection in Brazil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Rommel N. Carvalho, Kathryn B. Laskey, Paulo C. G. da Costa, Marcelo
Ladeira, Laécio L. Santos, and Shou Matsumoto

Combining Semantic Web Search with the Power of Inductive Reasoning . . . . . . 15
Claudia d’Amato, Nicola Fanizzi, Bettina Fazzinga, Georg Gottlob, and
Thomas Lukasiewicz
Evidential Nearest-Neighbors Classification for Inductive ABox Reasoning . . . . 27
Nicola Fanizzi, Claudia d’Amato, and Floriana Esposito

Ontology Granulation Through Inductive Decision Trees . . . . . . . . . . . . . . . . . . . 39
Bart Gajderowicz and Alireza Sadeghian
Axiomatic First-Order Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Kathryn B. Laskey

An Algorithm for Learning with Probabilistic Description Logics . . . . . . . . . . . . 63
José Eduardo Ochoa Luna and Fabio Gagliardi Cozman

Position Papers

BeliefOWL: An Evidential Representation in OWL Ontology . . . . . . . . . . . . . . . 77
Amira Essaid and Boutheina Ben Yaghlane
Fuzzy Taxonomies for Creative Knowledge Discovery . . . . . . . . . . . . . . . . . . . . . 81
Trevor Martin, Zheng Siyao, and Andrei Majidian
Uncertainty Reasoning for Linked Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Dave Reynolds
VI
Technical Papers
Probabilistic Ontology and Knowledge Fusion for
Procurement Fraud Detection in Brazil

Rommel N. Carvalho1, Kathryn B. Laskey1, Paulo C. G. Costa1, Marcelo Ladeira2,
Laécio L. Santos2, and Shou Matsumoto2,
1
George Mason University
4400 University Drive
Fairfax, VA 22030-4400 USA
rommel.carvalho@gmail.com, {klaskey, pcosta}@gmu.edu
2
University of Brasilia
Campus Universitário Darcy Ribeiro
Brasilia – DF 70910-900 Brazil
mladeira@unb.br, {laecio, cardialfly}@gmail.com

Abstract. To cope with society’s demand for transparency and corruption
prevention, the Brazilian Office of the Comptroller General (CGU) has carried
out a number of actions, including: awareness campaigns aimed at the private
sector; campaigns to educate the public; research initiatives; and regular
inspections and audits of municipalities and states. Although CGU has collected
information from hundreds of different sources - Revenue Agency, Federal
Police, and others - the process of fusing all this data has not been efficient
enough to meet the needs of CGU’s decision makers. Therefore, it is natural to
change the focus from data fusion to knowledge fusion. As a consequence,
traditional syntactic methods must be augmented with techniques that represent
and reason with the semantics of databases. However, commonly used
approaches fail to deal with uncertainty, a dominant characteristic in corruption
prevention. This paper presents the use of Probabilistic OWL (PR-OWL) to
design and test a model that performs information fusion to detect possible
frauds in procurements involving Federal money. To design this model, a
recently developed tool for creating PR-OWL ontologies was used with support
from PR-OWL specialists and careful guidance from a fraud detection specialist
from CGU.

Keywords: Probabilistic Ontology, PR-OWL, Ontology, Procurement Fraud
Detection, Knowledge Fusion, MEBN, UnBBayes.

1 Introduction

A primary responsibility of the Brazilian Office of the Comptroller General (CGU) is
to prevent and detect government corruption. To carry out this mission, CGU must
gather information from a variety of sources and combine it to evaluate whether
4 R. Carvalho, K. Laskey, P. Costa, M. Ladeira, L. Santos, and S. Matsumoto

further action, such as an investigation, is required. One of the most difficult
challenges is the information explosion. Auditors must fuse vast quantities of
information from a variety of sources in a way that highlights its relevance to decision
makers and helps them focus their efforts on the most critical cases. This is no trivial
duty. The Growing Acceleration Program (PAC) alone has a budget greater than 250
billion dollars with more than one thousand projects only on the state of Sao Paulo
(http://www.brasil.gov.br/pac/). All of these have to be audited and inspected by CGU
– and, in spite having only three thousand employees. Therefore, CGU must optimize
its processes in order to carry out its mission.
The Semantic Web (SW), like the document web that preceded it, is based on
radical notions of information sharing. These ideas [1] include: (i) the Anyone can say
Anything about Any topic (AAA) slogan; (ii) the open world assumption, in which
we assume there is always more information that could be known, and (iii) nonunique
naming, which appreciates the reality that different speakers on the Web might use
different names to define the same entity. In a fundamental departure from
assumptions of traditional information systems architectures, the Semantic Web is
intended to provide an environment in which information sharing can thrive and a
network effect of knowledge synergy is possible. But this style of information
gathering can generate a chaotic landscape rife with confusion, disagreement and
conflict.
We call an environment characterized by the above assumptions a Radical
Information Sharing (RIS) environment. The challenge facing SW architects is
therefore to avoid the natural chaos to which RIS environments are prone, and move
to a state characterized by information sharing, cooperation and collaboration.
According to [1], one solution to this challenge lies in modeling, and this is where
ontologies languages like Web Ontology Language (OWL) come in.
As it will be shown in Section 3, the domain of procurement fraud detection is a
RIS environment. However, uncertainty is ubiquitous to knowledge fusion.
Uncertainty is especially important to applications such as fraud detection, in which
perpetrators seek to conceal illicit intentions and activities, making crisp assertions
extremely hard and rare. In such environments, partial (not complete) or approximate
(not exact) information is more the rule than the exception.
Bayesian networks (BNs) have been widely applied to draw inferences to
information and knowledge fusion in the presence of uncertainty. However, according
to [2] BNs are not expressive enough for many real-world applications. More
specifically, BNs assume a simple attribute-value representation – that is, each
problem instance involves reasoning about the same fixed number of attributes, with
only the evidence values changing from problem instance to problem instance.
Complex problems on the scale of the semantic web often involve intricate
relationships among many variables, and the limited representational power of BNs is
insufficient for building useful, detailed models.
Multi-Entity Bayesian Network (MEBN) logic can represent and reason with
uncertainty about any propositions that can be expressed in first-order logic [3].
Probabilistic OWL (PR-OWL) uses MEBN’s strengths to provide a framework for
building probabilistic ontologies (PO), a major step towards semantically aware,
probabilistic knowledge fusion systems [4]. This paper uses PR-OWL to design and
Prob. Ontology and Knowledge Fusion for Procurement Fraud Detection in Brazil 5

test a model for fusing information to detect possible frauds in procurements
involving Federal funds.
The paper is organized as follows. Section 2 introduces Multi-Entity Bayesian
Networks (MEBN), an expressive Bayesian logic, and PR-OWL, an extension of the
OWL language that can represent probabilistic ontologies having MEBN as its
underlying logic. Section 3 presents a case study from CGU to demonstrate the power
of PR-OWL ontologies for knowledge representation and fusion. Finally, Section 4
presents some concluding remarks.

2 MEBN and PR-OWL

Multi-Entity Bayesian Networks (MEBN) [5 and 6] extend BNs (BN) to achieve first-
order expressive power. MEBN represents knowledge as a collection of MEBN
Fragments (MFrags), which are organized into MEBN Theories (MTheories).
An MFrag contains random variables (RVs) and a fragment graph representing
dependencies among these RVs. An MFrag is a template for a fragment of a Bayesian
network. It is instantiated by binding its arguments to domain entity identifiers to
create instances of its RVs. There are three kinds of RV: context, resident and input.
Context RVs represent conditions that must be satisfied for the distributions
represented in the MFrag to apply. Input nodes represent RVs that may influence the
distributions defined in the MFrag, but whose distributions are defined in other
MFrags. Distributions for resident RV instances are defined in the MFrag.
Distributions for resident RVs are defined by specifying local distributions
conditioned on the values of the instances of their parents in the fragment graph.
A set of MFrags represents a joint distribution over instances of its random
variables. MEBN provides a compact way to represent repeated structure in a BN. An
important advantage of MEBN is that there is no fixed limit on the number of RV
instances, and the random variable instances are dynamically instantiated as needed.
An MTheory is a set of MFrags that satisfies conditions of consistency ensuring
the existence of a unique joint probability distribution over its random variable
instances.
To apply an MTheory to reason about particular scenarios, one needs to provide
the system with specific information about the individual entity instances involved in
the scenario. On receipt of this information, Bayesian inference can be used both to
answer specific questions of interest (e.g., how likely is it that a particular
procurement is being directed to a specific enterprise?) and to refine the MTheory
(e.g., each new tactical situation includes additional statistical data about the
likelihood of a given attack for that set of circumstances). Bayesian inference is used
to perform both problem specific inference and learning in a sound, logically coherent
manner (for more details see [6 and 7]).
State-of-the-art systems are increasingly adopting ontologies as a means to ensure
formal semantic support for knowledge sharing [8, 9, 10, 11, 12, and 13].
Representing and reasoning with uncertainty is becoming recognized as an essential
capability in many domains. A common error is to provide support for uncertainty
representation by just annotating ontologies with numerical probabilities. This
6 R. Carvalho, K. Laskey, P. Costa, M. Ladeira, L. Santos, and S. Matsumoto

approach leads to brittleness, as too much information is lost due to the lack of a
representational scheme that can capture structural nuances of the probabilistic
information. More expressive representation formalisms are needed [4].

Fig. 1. PR-OWL main concepts.

Probabilistic Ontologies (PR-OWL) [14 and 15] was proposed as a more
expressive formalism for representing knowledge in domains characterized by
uncertainty. Figure 1 presents the main concepts needed to define an MTheory in PR-
OWL. In the diagram, the ellipses represent the general classes, while the arcs
represent the main relationships among the classes.
The procurement fraud detection probabilistic ontology was built in UnBBayes-
MEBN, a tool for building and reasoning with PR-OWL probabilistic ontologies.
UnBBayes-MEBN was the first software to implement PR-OWL/MEBN (see [16, 17,
18, 19] for more details). UnBBayes-MEBN supports Multi-Entity Bayesian Network
(MEBN) and enables creation and editing of Probabilistic Ontologies in PR-OWL
[18]. The MEBN/PR-OWL Graphical User Interface (GUI) [16] allows users to
define MFrags and make probabilistic queries. UnBBayes-MEBN also implements an
algorithm for generating a Situation Specific Bayesian Network (SSBN) [18, 19],
which is an ordinary BN created by instantiating instances of the MFrags to respond
to a probabilistic query. Once the SSBN is generated, the inference engine
(Reasoning) is called to process findings and update beliefs. UnBBayes-MEBN uses
the Protégé-OWL library to load and save PR-OWL files (IO) in a format compatible
with OWL. It supports first order logic context node evaluation (FOL), through the
use of the PowerLoom library. It also defines and implements a built-in mechanism
for typing and recursion. Finally, it permits the definition of dynamic conditional
probabilistic tables.
UnBBayes has proven to be a simple, yet powerful, tool for designing probabilistic
ontologies and for uncertain reasoning in complex situations such as procurement
fraud detection. It is straightforward to use and provides powerful features (e.g.
dynamic table) not available in systems (e.g., Quiddity) previously employed to
reason with PR-OWL/MEBN knowledge bases.
Prob. Ontology and Knowledge Fusion for Procurement Fraud Detection in Brazil 7

3 Procurement Fraud Detection

A major source of corruption is the procurement process. Although laws attempt to
ensure a competitive and fair process, perpetrators find ways to turn the process to
their advantage while appearing to be legitimate. This is why a specialist has
didactically structured the different kinds of procurement frauds CGU has dealt with
in past years.
These different fraud types are characterized by criteria, such as business owners
who work as a front for the company, use of accounting indices that are not common
practice, etc. Indicators have been established to help identify cases of each of these
fraud types. For instance, one principle that must be followed in public procurement is
that of competition. Every public procurement should establish minimum requisites
necessary to guarantee the execution of the contract in order to maximize the number
of participating bidders. Nevertheless, it is common to have a fake competition when
different bidders are, in fact, owned by the same person. This is usually done by
having someone as a front for the enterprise, which is often someone with little or no
education.
The ultimate goal of this case study is to structure the specialist knowledge in a
way that an automated system can reason with the evidence in a manner similar to the
specialist. Such an automated system is intended to support specialists and to help
train new specialists, but not to replace them. Initially, a few simple criteria were
selected as a proof of concept. Nevertheless, it is shown that the model can be
incrementally updated to incorporate new criteria. In this process, it becomes clear
that a number of different sources must be consulted to come up with the necessary
indicators to create new and useful knowledge for decision makers about the
procurements.

Fig. 2. Procurement fraud detection overview.

Figure 2 presents an overview of the procurement fraud detection process. The data
for our case study represent several requests for proposal and auctions that are issued
8 R. Carvalho, K. Laskey, P. Costa, M. Ladeira, L. Santos, and S. Matsumoto

by the Federal, State and Municipal Offices (Public Notices – Data). As the focus of
the work is in representing the specialist knowledge and reasoning through
probabilistic ontologies and not in the collection of information, the idea is that the
analysts that work at CGU, already making audits and inspections, accomplish the
collection of information through questionnaires that can specifically be created for
the collecting of indicators for the selected criteria (Information Gathering). These
questionnaires can be created using a system that is already in production at CGU.
Once they are answered the necessary information is going to be available (DB –
Information). Hence, UnBBayes, using the probabilistic ontology designed by experts
(Design – UnBBayes), will be able to collect these millions of items of information
and transform them into dozens or hundreds of items of knowledge, through logic and
probabilistic inference, e.g. procurement announcements, contracts, reports, etc - a
huge amount of data - are analyzed allowing the gathering of relevant relations and
properties - a large amount of information - which in turn are used to draw some
conclusions about possible irregularities - a smaller number of items of knowledge
(Inference – Knowledge). This knowledge can be filtered so that only the
procurements that show a probability higher than a threshold, e.g. 20%, are
automatically forwarded to the responsible department along with the inferences
about potential fraud and the supporting evidence (Report for Decision Makers).
The criteria selected by the specialist were the use of accounting indices and the
demand of experience in just one contract. There are four common types of indices
that are usually used as requirements in procurements (ILC, ILG, ISG, and IE). Any
other type could indicate a made-up index specifically designed to direct the
procurement to some specific company. The greater the numbers of uncommon
accounting indices used by the procurement the more suspicious it is, i.e. the higher
the chance of having fraud. In addition, a procurement specifies a minimum value for
these accounting indices. The minimum value that is usually required is 1.0. The
higher this minimum value, the more the competition is narrowed, and therefore the
higher the chance the procurement is being directed to some company.

Fig. 3. ProcurementRequirement MFrag.

The other criterion, demanding proof of experience in only one contract, is suspect
because in almost every case, the experience is not gained only by a particular
contract, but also by doing it over and over again in different contracts. It does not
matter if you have built 1,000 ft2 of wall in just one contract or 100 ft2 in 10 different
contracts. The experience gained will be basically the same.
The procurement fraud detection model was developed as a probabilistic ontology
(using PR-OWL) to define its semantics and uncertain characteristics. The MTheory
created for the model, using UnBBayes-MEBN, was divided into three different
MFrags.
Prob. Ontology and Knowledge Fusion for Procurement Fraud Detection in Brazil 9

The first, Figure 3, presents the criteria required from a company to participate in
the procurement, containing information about the type of accounting index (ILC,
ILG, ISG, IE, and Other) and the minimum value for it (between 0 and 1, between 1
and 2, between 2 and 3, and greater than 3). This MFrag also contains information
about where a specific index is used (which procurement), and if the procurement
demands experience in only one contract.

Fig. 4. DirectingProcurementByIndexes MFrag.

The second, Figure 4, represents whether procurement is being directed to a
specific company by the use of unusual accounting indices. As explained before, this
analysis is based on the type of the index and the minimum value it requires. This
evaluation takes into consideration every index used in a specific procurement, hence
it is dynamic.
The last MFrag, Figure 5, represents the overall possibility that procurement is
being directed to a specific company based on the result of its being directed by the
use of unusual indices and by the requirement of experience in only one contract, as
explained before.

Fig. 5. DirectingProcurement MFrag.

To test the model, two scenarios, that represent the two groups of suspect and non
suspect procurements, were chosen from a set of real cases, as shown:
• Suspect procurement (proc1):
o ind1 = ILC >= 2.0;
o ind2 = ILG >= 1.5;
o ind3 = Other >= 3.0.
o It demands experience in only one contract.
• Non suspect procurement (proc2):
o ind4 = IE >= 1.0;
10 R. Carvalho, K. Laskey, P. Costa, M. Ladeira, L. Santos, and S. Matsumoto

o ind5 = ILG >= 1.0;
o ind6 = ILC >= 1.0;
o It does not demand experience in only one contract.
The information above was introduced in our model as known entities and
findings. After that we queried the system to give us information about the node
IsProcurementDirected(proc) for both proc1 and proc2. UnBBayes-MEBN than
executed the SSBN algorithm and generated the same node structure as shown in
Figure 6, because both procurements have three accounting indices and information
about the demanding experience in only one contract. However, as expected, the
parameters and findings are different giving different results to the query, as shown
below:
• Non suspect procurement:
o 0.01% that the procurement was directed to a specific company by
using accounting indices;
o 0.10% that the procurement was directed to a specific company.
• Suspect procurement:
o 55.00% that the procurement was directed to a specific company by
using accounting indices;
o 29.77%, when the information about demanding experience in only
one contract was omitted, and 72.00%, when it was given, that the
procurement was directed to a specific company.

Fig. 6. Generated SSBN for query IsProcurementDirected(proc1).

The specialist analyzed and agreed with the knowledge generated by the
probabilistic ontology reasoned developed using PR-OWL/MEBN in UnBBayes. He
stated that the probabilities represent, semantically (i.e. high, medium, and low
chance), what he would think when analyzing the same entities and findings.
Although the SSBNs generated for this proof of concept present the same structure,
it is common to have a different one as the context varies from procurement to
Prob. Ontology and Knowledge Fusion for Procurement Fraud Detection in Brazil 11

procurement. For instance, we have come across several procurements that have all
four common indices and some other different ones. In this case, if there are two
additional indices (ind5 and ind6), then the resulting SSBN would have two more
copies for nodes IndexType(index) andIndexMinValue(index). This would make the
use of BN not applicable. The ability to make multiple copies of nodes based on a
context is only available in a more expressive formalism, as MEBN.

Fig. 7. EnterpriseBusinessNetwork MFrag.

An additional capability not available with BN is to specify constraints on
applicability of knowledge. Such constraints can only be implemented in a more
expressive language. As we are dealing with BN formalism it is only natural to think
of a formalism that extends BN. MEBN, as a Bayesian first-order logic, makes it
possible to define these constraints using FOL.
Figure 7 presents the constraints (context nodes) necessary to model the fraud
detection scenarios considered here. In this MFrag, the criterion is to identify if there
is a suspicious business relationship between enterprises entA and entB. The more
cases where enterprise B wins a procurement that the basic project was developed by
enterprise A, the higher the chance they have some kind of personal business
relationship, which means that it is more likely that enterprise B is developing the
basic projects in such a way that will favor enterprise A, inhibiting the desired
competition.

Fig. 8. OwnerFront MFrag.

Since the designed model is restricted to just two criteria, the team started to think
about other criteria that could be incorporated and tested further. Figure 8 presents the
suggested MFrag for detecting owners that act as a front to the real owner of the
company (the person who really has the power to make decisions and that gets all the
money), by looking up their socio-economic attributes and checking the size of the
12 R. Carvalho, K. Laskey, P. Costa, M. Ladeira, L. Santos, and S. Matsumoto

company. In other words, if a company is highly profitable, yet has an owner with
little education, low income, no car, no house, etc, then the company is probably a
front.

Fig. 9. Knowledge fusion from different Government Offices DBs.

From the criteria presented and modeled in this Section, we can clearly see the
need for a principled way of dealing with uncertainty. But what is the role of
Semantic Web in this domain? Well, it is easy to see that our domain of fraud
detection is a RIS environment. The data CGU has available does not come only from
its audits and inspections. In fact, much complementary information can be retrieved
from other Federal Agencies, including Federal Revenue Agency, Federal Police, and
others. Imagine we have information about the enterprise that won the procurement,
and we want to know information about its owners, such as their personal data and
annual income. This type of information is not available at CGU’s Data Base (DB),
but must be retrieved from the Federal Revenue Agency’s DB. Once the information
about the owners is available, it might be useful to check their criminal history. For
that (see Figure 9), information from the Federal Police must be used. In this example,
we have different sources saying different things about the same person: thus, the
AAA slogan applies. Moreover, there might be other Agencies with crucial
information related to our person of interest; in other words, we are operating in an
open world. Finally, to make this sharing and integration process possible, we have to
make sure we are talking about the same person, who may (especially in case of
fraud) be known by different names in different contexts.

5 Conclusion

The problem that CGU and many other Agencies have faced of processing all the
available data into useful knowledge is starting to be solved with the use of
Prob. Ontology and Knowledge Fusion for Procurement Fraud Detection in Brazil 13

probabilistic ontologies, as the procurement fraud detection model showed. Besides
fusing the information available, the designed model was able to represent the
specialist knowledge for the two real cases we evaluated. UnBBayes reasoning given
the evidence and using the designed model were accurate both in suspicious and non
suspicious scenarios. These results are encouraging, suggesting that a fuller
development of our proof of concept system is promising.
In addition, it is fairly easy to introduce new criteria and indicators in the model in
an incremental way. Thus, new rules for identifying fraud can be added without
rework. After a new rule is incorporated into the model, a set of new tests can be
added to the previous one with the objective of always validating the new model
proposed, without doing everything from scratch.
Furthermore, the use of this formalism through UnBBayes allows advantages such
as impartiality in the judgment of irregularities in procurements (given the same
conditions the system will always deliver the same result), scalability (capacity to
analyze thousands of procurements in a short time when compared to human
capacity) and a joint analysis of large volumes of indicators (the higher the number of
indicators to examine jointly the more difficult it is for the specialist analysis to be
objective and consistent).
As a next step, CGU is choosing new criteria to be incorporated into the designed
probabilistic ontology. This next set of criteria will require information from different
Brazilian Agencies’ databases. Therefore, the semantic power of ontologies with the
uncertainty handling capability of PR-OWL will be extremely useful for fusing
information from multiple databases.

Acknowledgments. Rommel Carvalho gratefully acknowledges full support from the
Brazilian Office of the Comptroller General (CGU) for the research reported in this
paper, and its employees involved in this research, especially Mário Vinícius
Claussen Spinelli, the domain expert.

References

1. Allemang, D. & Hendler, J. A. 2008. Semantic web for the working ontologist modeling in
RDF, RDFS and OWL. Elsevier, ISBN 978-0-12-373556-0, United States.
2. Costa, P. C. G., Laskey, K. B., Takikawa, M., Pool, M., Fung, F., and Wright, E. J. 2005.
MEBN logic: A Key Enabler for Network Centric Warfare. In Proceedings of the 10th
International Command and Control Research and Technology Symposium (10th ICCRTS).
McLean, Virginia, USA, CCRP publications.
3. Laskey, K. B., Mahoney, S. M., and Wright, E. 2001. Hypothesis Management in Situation-
Specific Network Construction. Uncertainty in Artificial Intelligence: Proceedings of the
Seventeenth Conference, San Mateo, CA, Morgan Kaufman.
4. Laskey, K. B., Costa, P. C. G., and Janssen, T. 2008. Probabilistic Ontologies for Knowledge
Fusion. In Proceedings of the 11th International Conference on Information Fusion.
5. Laskey, K. B. and Costa, P. .C. G. 2005. Of Klingons and Starships: Bayesian Logic for the
23rd Century. In Uncertainty in Artificial Intelligence: Proceedings of the Twenty-first
Conference (UAI 2005). AUAI Press: Edinburgh, Scotland.
6. Laskey, K. B. 2008. MEBN: A language for first-order Bayesian knowledge bases. In
Artificial Intelligence, Volume 172, Issues 2-3, February 2008, Pages 140-178
14 R. Carvalho, K. Laskey, P. Costa, M. Ladeira, L. Santos, and S. Matsumoto

7. Mahoney, S. and Laskey, K. B. 1998. Constructing Situation Specific Belief Networks, In
Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-
98). San Francisco, CA: Morgan Kaufmann.
8. Chen, H., and Wu, Z.. 2003. On Case-Based Knowledge Sharing in Semantic Web. In Tools
with Artificial Intelligence, IEEE International Conference on, 0:200. Vol. 0. Los Alamitos,
CA, USA, IEEE Computer Society.
9. Chen, H., Wu, Z., and Xu, J. 2003. KB-Grid: Enabling Knowledge Sharing on the Semantic
Web. In Challenges of Large Applications in Distributed Environments, International
Workshop on, 0:70. Vol. 0. Los Alamitos, CA, USA: IEEE Computer Society.
10. Costa, Paulo C. G., Chang, KC., Laskey, K. B., and Carvalho, R. N. 2009. A Multi-
Disciplinary Approach to High Level Fusion in Predictive Situational Awareness. In
Proceedings of the 12th International Conference on Information Fusion. Seattle, WA,
USA.
11. Dadzie, AS., Bhagdev, R., Chakravarthy, A., Chapman, S., Iria, J., Lanfranchi, V.,
Magalhães, J., Petrelli, D., and Ciravegna. F. 2008. Applying semantic web technologies to
knowledge sharing in aerospace engineering. Journal of Intelligent Manufacturing 20, no. 5
(6): 611-623. doi:10.1007/s10845-008-0141-1.
12. Kings, N. J. and Davies, J. 2009. Semantic Web for Knowledge Sharing. In Semantic
Knowledge Management, 103-111. http://dx.doi.org/10.1007/978-3-540-88845-1_8.
13. Veres, G. V., Huynh, T. D., Nixon, M. S., Smart, P. R. and Shadbolt, N. R. 2006. The
Military Knowledge Information Fusion Via Semantic Web Technologies.
http://eprints.ecs.soton.ac.uk/14278/.
14. Costa, P. C. G. 2005. Bayesian Semantics for the Semantic Web. PhD Diss. Department of
Systems Engineering and Operations Research, George Mason University. 315p, Fairfax,
VA, USA.
15. Costa, P. C. G., Laskey, K. B., and Laskey, K. J. 2005. PR-OWL: A Bayesian Ontology
Language for the Semantic Web. In Proceedings of the ISWC Workshop on Uncertainty
Reasoning for the Semantic Web, Galway, Ireland.
16. Carvalho, R. N., Ladeira, M., Santos, L. L., and Costa, P. C. 2007. A GUI Tool for
Plausible Reasoning in the Semantic Web using MEBN. In Proceedings of the Seventh
international Conference on intelligent Systems Design and Applications, 381-386. ISDA.
IEEE Computer Society, Washington, DC, USA.
17. Carvalho, R. N., Ladeira, M., Santos, L. L., Matsumoto, S., and Costa, P. C. G. 2008.
UnBBayes-MEBN: Comments on Implementing a Probabilistic Ontology Tool. In
Proceedings of the IADIS International Conference on Applied Computing, 211-218.
18. Costa, P. C. G., Ladeira, M, Carvalho, R. N., Santos, L. L., Matsumoto, S., and Laskey, K.
B. 2008. A First-Order Bayesian Tool for Probabilistic Ontologies. In Proceedings of the
Twenty-First International Florida Artificial Intelligence Research Society Conference, 631-
636. Menlo Park, California, USA, The AAAI Press.
19. Carvalho, R. N., Ladeira, M., Santos, L. L., Matsumoto, S., and Costa, P. C. G. 2009. A
GUI Tool for Plausible Reasoning in the Semantic Web Using MEBN. In Book Innovative
Applications in Data Mining, 17-45. DOI: 10.1007/978-3-540-88045-5_2. Springer Berlin /
Heidelberg.
Combining Semantic Web Search with the
Power of Inductive Reasoning

Claudia d’Amato1 , Nicola Fanizzi1 , Bettina Fazzinga2 ,
Georg Gottlob3,4 , and Thomas Lukasiewicz3,5
1
Dipartimento di Informatica, Università degli Studi di Bari, Italy
{claudia.damato,fanizzi}@di.uniba.it
2
Dipartimento di Elettronica, Informatica e Sistemistica, Università della Calabria, Italy
bfazzinga@deis.unical.it
3
Computing Laboratory, University of Oxford, UK
{georg.gottlob,thomas.lukasiewicz}@comlab.ox.ac.uk
4
Oxford-Man Institute of Quantitative Finance, University of Oxford, UK
5
Institut für Informationssysteme, TU Wien, Austria

Abstract. Extensive research activities are recently directed towards the Semantic
Web as a future form of the Web. Consequently, Web search as the key technology
of the Web is evolving towards some novel form of Semantic Web search. A very
promising recent approach to such Semantic Web search is based on combining stan-
dard Web search with ontological background knowledge and using standard Web
search engines as the main inference motor of Semantic Web search. In this paper,
we propose to further enhance this approach to Semantic Web search by the use of
inductive reasoning techniques. This adds especially the important ability to handle
inconsistencies, noise, and incompleteness, which are very likely to occur in dis-
tributed and heterogeneous environments, such as the Web. We report on a prototype
implementation of the new approach and experimental results.

1 Introduction
Web search [3] as the key technology of the Web is about to change radically with the
development of the Semantic Web [2]. As a consequence, the elaboration of a new search
technology for the Semantic Web, called Semantic Web search [6], is currently an ex-
tremely hot topic, both in Web-related companies and in academic research. In particular,
there is a fast growing number of commercial and academic Semantic Web search en-
gines. The research can be roughly divided into two main directions. The first (and most
common) one is to develop a new form of search for searching the pieces of data and
knowledge that are encoded in the new representation formalisms of the Semantic Web
(e.g., [6]), while the second (and less explored) direction is to use the data and knowledge
of the Semantic Web in order to add some semantics to Web search (e.g., [9]).
A very promising recent representative of the second direction to Semantic Web search
has been presented in [8]. The approach is based on (i) using ontological (unions of) con-
junctive queries (which may contain negated subqueries) as Semantic Web search queries,
(ii) combining standard Web search with ontological background knowledge, (iii) using
the power of Semantic Web formalisms and technologies, and (iv) using standard Web
search engines as the main inference motor of Semantic Web search. It consists of an off-
line ontology compilation step, based on deductive reasoning techniques, and an online
query processing step. In this paper, we propose to further enhance this approach to Se-
mantic Web search by the use of inductive reasoning techniques for the offline ontology
16 C. d’Amato, N. Fanizzi, B. Fazzinga, G. Gottlob, and T. Lukasiewicz

compilation step. To our knowledge, this is the first combination of Semantic Web search
with inductive reasoning. The paper’s main contributions can be summarized as follows:
– We develop a combination of Semantic Web search as presented in [8] with an induc-
tive reasoning technique (based on similarity search [11] for retrieving the resources
that likely belong to a query concept [5]). The latter serves in an offline ontology
compilation step to compute completed semantic annotations.
– Importantly, the new approach to Semantic Web search can handle inconsistencies,
noise, and incompleteness in Semantic Web knowledge bases, which are all very likely
to occur in distributed and heterogeneous environments, such as the Web. We provide
several examples illustrating this important advantage of the new approach.
– We report on a prototype implementation of the new approach in the context of desktop
search. We also provide very positive experimental results for the precision and the
recall of the new approach, comparing it to the deductive approach in [8].

2 System Overview
The overall architecture of our Semantic Web search system is shown in Fig. 1. It consists
of the Interface, the Query Evaluator, and the Inference Engine (Fig. 1, dark parts), where
the Query Evaluator is implemented on top of standard Web Search Engines. Standard
Web pages and their objects are enriched by Annotation pages, based on an Ontology.
We thus assume that there are semantic annotations to standard Web pages and to ob-
jects on standard Web pages. Note that such annotations are starting to be widely available
for a large class of Web resources, especially with the Web 2.0. Semantic annotations
about Web pages and objects may also be automatically learned from the Web pages and
the objects to be annotated (see, e.g., [4]), and/or they may be extracted from existing on-
tological knowledge bases on the Semantic Web. Another important standard assumption
that we make is that Web pages and their objects have unique identifiers.
For example, in a very simple scenario, a Web page i1 may contain information about a
Ph.D. student i2 , called Mary, and two of her papers, namely, a conference paper i3 entitled
“Semantic Web search” and a journal paper i4 entitled “Semantic Web search engines” and
published in 2008. A simple HTML page representing this scenario is shown in Fig. 2,
left side. There may now exist one semantic annotation each for the Web page, the Ph.D.
student Mary, the journal paper, and the conference paper. The annotation for the Web
page may simply encode that it mentions Mary and the two papers, while the one for Mary
may encode that she is a Ph.D. student with the name Mary and the author of the papers
i3 and i4 . The annotation for the paper i3 may encode that i3 is a conference paper and
has the title “Semantic Web search”, while the one for the paper i4 may encode that i4
is a journal paper, authored by Mary, has the title “Semantic Web search engines”, was
published in 2008, and has the keyword “RDF”. The semantic annotations of i1 , i2 , i3 ,
and i4 are formally expressed as the sets of axioms Ai1 , Ai2 , Ai3 , and Ai4 , respectively:

Ai1 = {contains(i1 , i2 ), contains(i1 , i3 ), contains(i1 , i4 )},
Ai2 = {PhDStudent(i2 ), name(i2 , “mary”), isAuthorOf(i2 , i3 ), isAuthorOf(i2 , i4 )},
Ai3 = {ConferencePaper(i3 ), title(i3 , “Semantic Web search”)}, (1)
Ai4 = {JournalPaper(i4 ), hasAuthor(i4 , i2 ), title(i4 , “Semantic Web search engines”),
yearOfPublication(i4 , 2008), keyword(i4 , “RDF”)}.

Inference Engine. Using an ontology containing some background knowledge, these
semantic annotations are then further enhanced in an offline ontology compilation step,
where the Inference Engine adds all properties that can be deduced from the semantic
Combining Semantic Web Search with the Power of Inductive Reasoning 17

1111111
0000000
0000000
1111111 0000000
1111111
0000000
1111111 Query

0000000
1111111
0000000 1111111
0000000
Interface
Evaluator

1111111 0000000
1111111 Web
Search
Engine

1111111
0000000
Annotations

0000000
1111111
0000000
1111111
Inference

0000000
1111111
Engine

0000000
1111111 Ontology

Fig. 1. System architecture.

www.xyuniversity.edu/mary/an1.html www.xyuniversity.edu/mary/an2.html

www.xyuniversity.edu/mary
www.xyuniversity.edu/mary

WebPage i1
PhDStudent i2

contains i2
name mary i2

contains i3
isAuthorOf i3

contains i4
isAuthorOf i4

www.xyuniversity.edu/mary/an4.html

www.xyuniversity.edu/mary/an3.html

www.xyuniversity.edu/mary

Article i4

www.xyuniversity.edu/mary
JournalPaper i4

Article i3
hasAuthor i2

ConferencePaper i3
title Semantic Web search engines

hasAuthor i2
yearOfPublication 2008

title Semantic Web search
keyword RDF

Fig. 2. Left side: HTML page p; right side: four HTML pages p1 , p2 , p3 , and p4 , which encode
(completed) semantic annotations for p and the objects on p.

annotations and the ontology. In [8], we assume a deductive such step, while here we pro-
pose and explore an inductive one. The resulting (completed) semantic annotations are then
published as Web pages, so that they can be searched by standard Web search engines. For
example, an ontology may contain the knowledge that (i) conference and journal papers
are articles, (ii) conference papers are not journal papers, (iii) isAuthorOf relates scien-
tists and articles, (iv) isAuthorOf is the inverse of hasAuthor, and (v) hasFirstAuthor is a
functional binary relationship, which is formally expressed by:
ConferencePaper v Article, JournalPaper v Article, ConferencePaper v ¬JournalPaper,
∃isAuthorOf v Scientist, ∃isAuthorOf − v Article, isAuthorOf − v hasAuthor, (2)
hasAuthor− v isAuthorOf, (funct hasFirstAuthor).

Using this ontological knowledge, we can derive from the above annotations that the two
papers i3 and i4 are also articles, and both authored by John. These resulting searchable
(completed) semantic annotations of (objects on) standard Web pages are published as
HTML Web pages with pointers to the respective object pages, so that they (in addition
to the standard Web pages) can be searched by standard search engines. For example, the
HTML pages for the completed semantic annotations of the above Ai1 , Ai2 , Ai3 , and Ai4
are shown in Fig. 2, right side. Note that on the HTML page of each individual, its identifier
is located beside the atomic concept below the row specifying the URIs. Practically, such
an identifier may simply be the HTML address of the Web page/object’s annotation page.
For example, considering the HTML pages of Fig. 2, the individual described by p4 is i4 ,
and the one described by p2 is i2 . Observe that we use a plain textual representation of
18 C. d’Amato, N. Fanizzi, B. Fazzinga, G. Gottlob, and T. Lukasiewicz

the completed semantic annotations in order to allow their processing by existing standard
search engines for the Web. It is important to point out that this textual representation
is simply a list of properties, each eventually along with an identifier or a data value as
attribute value, and it can thus immediately be encoded as a list of RDF triples.

Query Evaluator. The Query Evaluator (see Fig. 1) reduces each Semantic Web search
query of the user in an online query processing step to a sequence of standard Web search
queries on standard Web and annotation pages, which are then processed by a standard
Web Search Engine. The Query Evaluator also collects the results and re-transforms them
into a single answer which is returned to the user. As an example of a Semantic Web search
query, one may ask for all Ph.D. students who have published an article in 2008 with RDF
as a keyword, which is formally expressed as follows:

Q(x) = ∃y (PhDStudent(x) ∧ isAuthorOf(x, y) ∧ Article(y) ∧
yearOfPublication(y, 2008) ∧ keyword(y, “RDF ”)) .

This query is transformed into the two queries Q1 = PhDStudent AND isAuthorOf and
Q2 = Article AND “yearOfPublication 2008” AND “keyword RDF”, which can both be
submitted to a standard Web search engine, such as Google. The result of the original
query Q is then built from the results of the two queries Q1 and Q2 . Note that a graphical
user interface, such as the one of Google’s advanced search, or even a natural language
interface can help to hide the conceptual complexity of ontological queries to the user.

3 Semantic Web Search

We now introduce Semantic Web knowledge bases and the syntax and semantics of Se-
mantic Web search queries to such knowledge bases. We then generalize the PageRank
technique to our approach. We assume the reader is familiar with the syntax and the se-
mantics of Description Logics (DLs) [1], which we use as underlying ontology languages.

Semantic Web Knowledge Bases. Intuitively, a Semantic Web knowledge base consists
of a background TBox and a collection of ABoxes, one for every concrete Web page
and for every object on a Web page. For example, the homepage of a scientist may be
such a concrete Web page and be associated with an ABox, while the publications on the
homepage may be such objects, which are also associated with one ABox each.
We assume pairwise disjoint sets D, A, RA , RD , I, and V of atomic datatypes, atomic
concepts, atomic roles, atomic attributes, individuals, and data values, respectively. Let I
be the disjoint union of two sets P and O of Web pages and Web objects, respectively.
Informally, every p ∈ P is an identifier for a concrete Web page, while every o ∈ O is
an identifier for a concrete object on a concrete Web page. We assume the atomic roles
links to between Web pages and contains between Web pages and Web objects. The former
represents the link structure between concrete Web pages, while the latter encodes the
occurrences of concrete Web objects on concrete Web pages.

Definition 1. A semantic annotation Aa for a Web page or object a ∈ P ∪ O is a finite
set of concept membership axioms A(a), role membership axioms P (a, b), and attribute
membership axioms U (a, v), where A ∈ A, P ∈ RA , U ∈ RD , b ∈ I, and v ∈ V. A Seman-
tic Web knowledge base KB = (T , (Aa )a ∈ P∪O ) consists of a TBox T and one semantic
annotation Aa for every Web page and object a ∈ P ∪ O.
Combining Semantic Web Search with the Power of Inductive Reasoning 19

Informally, a Semantic Web knowledge base consists of some background terminolog-
ical knowledge and some assertional knowledge for every concrete Web page and for every
concrete object on a Web page. The background terminological knowledge may be an on-
tology from some global Semantic Web repository or an ontology defined locally by the
user site. In contrast to the background terminological knowledge, the assertional knowl-
edge will be directly stored on the Web (on annotation pages like the described standard
Web pages) and is thus accessible via Web search engines.
Example 1. (Scientific Database). We use a DL knowledge base KB = (T , A) to spec-
ify some simple information about scientists and their publications. The sets of atomic
concepts, atomic roles, atomic attributes, and data values are:
A = {Scientist, Article, ConferencePaper, JournalPaper},
RA = {hasAuthor, isAuthorOf, contains}, RD = {name, title, yearOfPublication},
V = {“mary”, “Semantic Web search”, 2008, “Semantic Web search engines”}.
Let I = P ∪ O be the set of individuals, where P = {i1 } is the set of Web pages, and O =
{i2 , i3 , i4 } is the set of Web objects on the Web page i1 . The TBox T contains the axioms
in Eq. 2. Then, a Semantic Web knowledge base is given by KB = (T , (Aa )a ∈ P∪O ),
where the semantic annotations of the individuals in P ∪ O are the ones in Eq. 1.
Semantic Web Search Queries. We use unions of conjunctive queries with negated con-
junctive subqueries as Semantic Web search queries to Semantic Web knowledge bases.
We now first define the syntax of Semantic Web search queries and then the semantics of
positive and general such queries.

Syntax. Let X be a finite set of variables. A term is either a Web page p ∈ P, a Web object
o ∈ O, a data value v ∈ V, or a variable x ∈ X. An atomic formula (or atom) α is of one
of the following forms: (i) d(t), where d is an atomic datatype, and t is a term; (ii) A(t),
where A is an atomic concept, and t is a term; (iii) P (t, t0 ), where P is an atomic role,
and t, t0 are terms; and (iv) U (t, t0 ), where U is an atomic attribute, and t, t0 are terms. An
equality has the form =(t, t0 ), where t and t0 are terms. A conjunctive formula ∃y φ(x, y)
is an existentially quantified conjunction of atoms α and equalities =(t, t0 ), which have
free variables among x and y.
Wn
Definition 2. A Semantic Web search query Q(x) is an expression i=1 ∃yi φi (x, yi ),
where each φi with i ∈ {1, . . . , n} is a conjunction of atoms α (also called positive atoms),
negated conjunctive formulas not ψ, and equalities =(t, t0 ),Wwhich have free variables
n
among x and yi , and the x’s are exactly the free variables of i=1 ∃yi φi (x, yi ).
Intuitively, Semantic Web search queries are unions of conjunctive queries, which may
contain negated conjunctive queries in addition to atoms and equalities as conjuncts.
Example 2. (Scientific Database cont’d). Two Semantic Web search queries are:
Q1 (x) = (Scientist(x) ∧ not doctoralDegree(x, “oxford university”) ∧ worksFor(x,
“oxford university”)) ∨ (Scientist(x) ∧ doctoralDegree(x, “oxford university”) ∧
not worksFor(x, “oxford university”));
Q2 (x) = ∃y (Scientist(x) ∧ worksFor(x, “oxford university”) ∧ isAuthorOf(x, y)∧
not ConferencePaper(y) ∧ not ∃z yearOfPublication(y, z)).
Informally, Q1 (x) asks for scientists who are either working for oxford university and did
not receive their Ph.D. from that university, or who received their Ph.D. from oxford uni-
versity but do not work for it. Whereas query Q2 (x) asks for scientists of oxford university
who are authors of at least one unpublished non-conference paper. Note that when search-
ing for scientists, the system automatically searches for all subconcepts (known according
to the background ontology), such as e.g. Ph.D. students or computer scientists.
20 C. d’Amato, N. Fanizzi, B. Fazzinga, G. Gottlob, and T. Lukasiewicz

Semantics of Positive Search Queries. We now define the semantics of positive Semantic
Web search queries, which are free of negations, in terms of ground substitutions via the
notion of logical consequence.
A search query Q(x) is positive iff it contains no negated conjunctive subqueries. A
(variable) substitution θ maps variables from X to terms. A substitution θ is ground iff it
maps to Web pages p ∈ P, Web objects o ∈ O, and data values v ∈ V. A closed first-order
formula φ is a logical consequence of a knowledge
S base KB = (T , (Aa )a∈P∪O ), denoted
KB |= φ, iff every first-order model I of T ∪ a∈P∪O Aa also satisfies φ.

Definition 3. Given a Semantic Web knowledge base KB and a positive Semantic Web
search query Q(x), an answer for Q(x) to KB is a ground substitution θ for the variables x
(which are exactly the free variables of Q(x)) with KB |= Q(xθ).

Example 3. (Scientific Database cont’d). Consider the Semantic Web knowledge base KB
of Example 1 and the following positive Semantic Web search query, asking for all scien-
tists who author at least one published journal paper:
Q(x) = ∃y (Scientist(x) ∧ isAuthorOf(x, y) ∧ JournalPaper(y) ∧ ∃z yearOfPublication(y, z)).

An answer for Q(x) to KB is θ = {x/i2 }. Recall that i2 represents the scientist Mary.

Semantics of General Search Queries. We next define the semantics of general Se-
mantic Web search queries by reduction to the semantics of positive ones, interpreting
negated conjunctive subqueries not ψ as the lack of evidence about the truth of ψ. That is,
negations are interpreted by a closed-world semantics on top of the open-world semantics
of DLs (we refer to [8] for more motivation and background).

Definition 4. Given a Semantic Web knowledge base KB and search query
Wn
Q(x) = i=1 ∃yi φi,1 (x, yi ) ∧ · · · ∧ φi,li (x, yi ) ∧ not φi,li +1 (x, yi ) ∧ · · · ∧ not φi,mi (x, yi ) ,

an answer for Q(x) to KB is a ground substitution θ for the variables x such that KB |=
Q+ (xθ) and KB 6|= Q− (xθ), where Q+ (x) and Q− (x) are defined as follows:
W
Q+ (x) = Wn
i=1 ∃yi φi,1 (x, yi ) ∧ · · · ∧ φi,li (x, yi ) and
Q− (x) = ni=1 ∃yi φi,1 (x, yi ) ∧ · · · ∧ φi,li (x, yi ) ∧ (φi,li +1 (x, yi ) ∨ · · · ∨ φi,mi (x, yi )) .

Roughly, a ground substitution θ is an answer for Q(x) to KB iff (i) θ is an answer for
Q+ (x) to KB , and (ii) θ is not an answer for Q− (x) to KB , where Q+ (x) is the positive
part of Q(x), while Q− (x) is the positive part of Q(x) combined with the complement of
the negative one. Observe that both Q+ (x) and Q− (x) are positive queries.

Example 4. (Scientific Database cont’d). Consider the Semantic Web knowledge base
KB = (T , (Aa )a∈P∪O ) of Example 1 and the following general Semantic Web search
query, asking for Mary’s unpublished non-journal papers:
Q(x) = ∃y (Article(x) ∧ hasAuthor(x, y) ∧ name(y, “mary”) ∧ not JournalPaper(x) ∧
not ∃z yearOfPublication(x, z)).

An answer for Q(x) to KB is given by θ = {x/i3 }. Recall that i3 represents an unpub-
lished conference paper entitled “Semantic Web search”. Observe that the membership
axioms Article(i3 ) and hasAuthor(i2 , i3 ) do not appear in the semantic annotations Aa
with a ∈ P ∪ O, but they can be inferred from them using the background ontology T .
Combining Semantic Web Search with the Power of Inductive Reasoning 21

Ranking Answers. As for the ranking of all answers for a Semantic Web search query
Q to a Semantic Web knowledge base KB (i.e., ground substitutions for all free variables
in Q, which correspond to tuples of Web pages, Web objects, and data values), we use a
generalization of the PageRank technique: rather than considering only Web pages and the
link structure between Web pages (expressed through the role links to here), we also con-
sider Web objects, which may occur on Web pages (expressed through the role contains),
and which may also be related to other Web objects via other roles. More concretely, we
define the ObjectRank of a Web page or an object a as follows:
P
R(a) = d · b∈Ba R(b) / Nb + (1 − d) · E(a) ,

where (i) Ba is the set of all Web pages and Web objects that relate to a, (ii) Nb is the
number of Web pages and Web objects that relate from b, (iii) d is a damping factor, and
(iv) E associates with every Web page and every Web object a source of rank.

4 Deductive Offline Ontology Compilation
In this section, we describe the (deductive) offline ontology reasoning step, which com-
piles the implicit terminological knowledge in the TBox of a Semantic Web knowledge
base into explicit membership axioms in the ABox, i.e., in the semantic annotations of
Web pages / objects, so that it (in addition to the standard Web pages) can be searched by
standard Web search engines. For the online query processing step, see [8].
The compilation of TBox knowledge into ABox knowledge is formalized as follows.
Given a satisfiable Semantic Web knowledge base KB = (T , (Aa )a∈P∪O ), the simple
completion of KB is the Semantic Web knowledge base KB 0 = (∅, (Aa 0 )a∈P∪O ) such
that every Aa 0 is the set of all concept memberships A(a), roleSmemberships P (a, b), and
attribute memberships U (a, v) that logically follow from T ∪ a∈P∪O Aa , where A ∈ A,
P ∈ RA , U ∈ RD , b ∈ I, and v ∈ V. Informally, for every Web page and object, the simple
completion collects all available and deducible facts (whose predicate symbols shall be
usable in search queries) in a completed semantic annotation.
Example 5. Consider the TBox T of Example 1 and the semantic annotations (Aa )a ∈ P∪O
of Example 1. The simple completion contains in particular the new axioms Article(i3 ),
hasAuthor(i3 , i2 ), and Article(i4 ). The first two are added to Ai3 and the last one to Ai4 .
As shown in [8], general quantifier-free search queries to a Semantic Web knowledge
base KB over DL-LiteA [10] as underlying DL can be evaluated on the simple completion
of KB (which contains only compiled but no explicit TBox knowledge anymore). Similar
results hold when the TBox of KB is equivalent to a Datalog program, and the query is
fully general. Hence, the simple completion assures (i) always a sound query processing
and (ii) a complete query processing in many cases. For this reason, and since complete-
ness of query processing is actually not that much an issue in the inherently incomplete
Web, we propose to use the simple completion as the basis of our Semantic Web search.
Once the completed semantic annotations are computed, we encode them as HTML
pages, so that they are searchable via standard keyword search. Specifically, we build
one HTML page for the semantic annotation Aa of each individual a ∈ P ∪ O. That is, for
each individual a, we build a page p containing all the atomic concepts whose argument
is a and all the atomic roles/attributes where the first argument is a (see Section 2).

5 Inductive Offline Ontology Compilation
We now describe an inductive inference based on similarity search, which we propose
to use instead of deductive inference for offline ontology compilation in our approach to
Semantic Web search. Section 6 then summarizes the central advantages of this proposal.
22 C. d’Amato, N. Fanizzi, B. Fazzinga, G. Gottlob, and T. Lukasiewicz

Inductive Inference Based on Similarity Search. In similarity search [11], the basic
idea is to find the most similar object(s) to a query object (i.e., the one to be classified)
with respect to a similarity (or dissimilarity) measure. We review the basics of the k-
nearest-neighbor (k-NN) method applied to the Semantic Web context [5]. The objective
is to induce an approximation for a discrete-valued target hypothesis function h : IS → V
from a space of instances IS to a set of values V = {v1 , . . . , vs } standing for the classes
(concepts) that have to be predicted. Let xq be the query instance whose class-membership
is to be determined. Using a dissimilarity measure, the set of the k-nearest (pre-classified)
training instances relative to xq is selected: NN (xq ) = {x1 , . . . , xk }. Hence, the k-NN
algorithm approximates h for classifying xq on the grounds of the value that h is known to
assume for the training instances in NN (xq ). Precisely, the value is decided by means of a
weighted majority voting procedure: it is the most voted value by the instances in NN (xq )
weighted by the similarity of the neighbor individual. The estimate of the hypothesis func-
tion for the query individual is:
k
X
ĥ(xq ) := argmax wi δ(v, h(xi )) , (3)
v∈V i=1

where δ returns 1 in case of matching arguments and 0 otherwise, and, given a dissimilarity
measure d, the weights wi are determined by wi = 1/d(xi , xq ).
Observe that this setting assigns to the query instance xq a value, which stands for
one in a set of pairwise disjoint concepts (corresponding to the value set V ). In a multi-
relational setting, as those of the Semantic Web (SW) context, this assumption cannot be
made in general, since it is well known that an individual may be an instance of more
than one concept. The problem is also related to the closed-world assumption (CWA)
usually made in the knowledge discovery context. To deal with the open-world assump-
tion (OWA), generally adopted for the SW representations, the absence of information on
whether a training instance x belongs to the extension of a query concept Q should not be
interpreted negatively, as in the standard settings which adopt the CWA, rather, it should
count as neutral (uncertain) information. Assuming this alternate viewpoint, the multi-
class classification problem is transformed into a ternary one and the V = {+1, −1, 0}
value set is adopted for the classification of an individual with respect to a query concept
Q and where the three values denote, respectively, membership, non-membership, and un-
certainty. Hence, the task is cast as follows: given a query concept Q, determine the mem-
bership of an instance xq through the NN procedure (see Eq. 3) where V = {−1, 0, +1}
and the hypothesis function values for the training instances are determined as:
(
+1 K |= Q(x)
hQ (x) = −1 K |= ¬Q(x)
0 otherwise.

That is, the value of hQ for the training instances is determined by logical entailment (de-
noted |=) of the corresponding assertion from the knowledge base. Alternatively, a look-up
in the ABox of the knowledge base could be considered, thus obtaining a classification
process less complex but also possibly less accurate.
For measuring the similarity between individuals, a totally semantic and language in-
dependent family of dissimilarity measures has been used [5]. It is based on the idea of
comparing the semantics of the input individuals along a number of dimensions repre-
sented by a committee of concept descriptions, say F = {F1 , F2 , . . . , Fm }, which stands
as a group of discriminating features expressed in the OWL-DL sub-language taken into
account. It is formally defined as follows [5]:
Combining Semantic Web Search with the Power of Inductive Reasoning 23

Definition 5 (family of measures). Let KB = (T , A) be a knowledge base. Given a set
of concept descriptions F = {F1 , F2 , . . . , Fm }, corresponding weights w1 , . . . , wm , and
p > 0, a family of dissimilarity functions dpF : Ind(A) × Ind(A) 7→ [0, 1] is defined by:
 1/p
|F|
1 X
∀a, b ∈ Ind(A) : dpF (a, b) :=  wi | δi (a, b) |p  ,
|F| i=1

where the dissimilarity function δi (i ∈ {1, . . . , m}) is defined as follows:

 0
 Fi (a) ∈ A ∧ Fi (b) ∈ A
1 Fi (a) ∈ A ∧ ¬Fi (b) ∈ A or
∀a, b ∈ Ind(A) : δi (a, b) = ¬Fi (a) ∈ A ∧ Fi (b) ∈ A


1/2 otherwise.

An alternative definition for the projections requires the entailment of an assertion
(instance-checking) rather than the simple ABox look-up; this can make the measure more
accurate yet more complex to compute. Moreover, using instance checking, induction is
performed on top of deduction, thus making it a kind of completion of deductive reasoning.
As for the weights wi employed in the family of measures, they should reflect the
impact of the single feature concept Fi relative to the overall dissimilarity. This is de-
termined by the quantity of information conveyed by a feature, which is measured as its
entropy. Namely, the extension of a feature Fi relative to the whole domain of objects may
be probabilistically quantified as PFi = |Fi I |/|∆I | (relative to the canonical interpreta-
tion I). This can be roughly approximated by |retrieval(Fi )|/|Ind(A)|. Hence, considering
also the probability P¬Fi related to its negation and the one related to the unclassified in-
dividuals (relative to Fi ), denoted PU , we may give an entropic measure for the feature:
H(Fi ) = − (PFi log(PFi ) + P¬Fi log(P¬Fi ) + PU log(PU )) .
The measures strongly depend on F. Here, we make the assumption that the feature-set
F represents a sufficient number of (possibly redundant) features that are able to discrim-
inate really different individuals. However, an optimal discriminating feature set could be
learned [7]. Experimentally, we obtained good results by using the very set of both primi-
tive and defined concepts found in the knowledge base [5].

Measuring the Likelihood of an Answer. The inductive inference made by the proce-
dure shown above is not guaranteed to be deductively valid. Indeed, inductive inference
naturally yields a certain degree of uncertainty. So, from a more general perspective, the
main idea behind the above inductive inference for Semantic Web search is closely re-
lated to the idea of using probabilistic ontologies to increase the precision and the recall of
querying databases and of information retrieval in general. But, rather than learning prob-
abilistic ontologies from data, representing probabilistic ontologies, and reasoning with
probabilistic ontologies, we directly use the data in the inductive inference step.
In order to measure the likelihood of the decision made by the inductive procedure
(individual xq belongs to the query concept denoted by value v maximizing the argmax
argument in Eq. 3), given the k-nearest training individuals in NN (xq ) = {x1 , . . . , xk },
the quantity that determined the decision should be normalized by dividing it by the sum of
such arguments over the (three) possible values:
Pk
i=1 wi · δ(v, hQ (xi ))
l(class(xq ) = v|NN (xq )) = P Pk . (4)
i=1 wi · δ(v , hQ (xi ))
0
v 0 ∈V
24 C. d’Amato, N. Fanizzi, B. Fazzinga, G. Gottlob, and T. Lukasiewicz

Hence, the likelihood of the assertion Q(xq ) corresponds to the case when v = +1.
The computed likelihood can be used for building a probabilistic ABox (which is a collec-
tion of pairs, each consisting of a classical ABox axiom and a probability value).

6 Inconsistencies, Noise, and Incompleteness
In this section, we illustrate the main advantages of using inductive reasoning in Seman-
tic Web search, namely, that inductive reasoning (differently from deductive reasoning)
can handle inconsistencies, noise, and incompleteness in Semantic Web knowledge bases,
which are all very likely to occur when knowledge bases are stored in a distributed and
heterogeneous fashion, like on the Web.

Inconsistencies. Since our inductive method is based on the majority vote of the individ-
uals in the neighborhood, it may be able to give a correct classification even in the case of
inconsistent knowledge bases. This aspect is illustrated by the following example.

Example 6. Consider the description logic knowledge base KB = (T , A) that consists of
the following TBox T and ABox A:

T = {Man ≡ Male u Human; Professor ≡ Person u ∃abilitatedTo.Teaching u
∃isSupervisorOf.PhDThesis u Researcher; Researcher ≡ GraduatePerson u
∃worksFor.ResearchInstitute u ¬∃isSupervisorOf.PhDThesis; . . .} ;
A = {Professor(Franz); isSupervisorOf(Franz, DLThesis); Professor(John);
isSupervisorOf(John, RoboticsThesis); Professor(Flo); isSupervisorOf(Flo, MLThesis);
Researcher(Nick); Researcher(Ann); isSupervisorOf(Nick, SWThesis); . . .} .

Actually, Nick is a Professor, indeed, he is the supervisor of a PhD thesis in A. However,
by human mistake, he is asserted to be a Researcher in A, and by the axiom for Researcher
in T , he cannot be the supervisor of any PhD thesis. Hence, KB is inconsistent, and thus a
deductive reasoner cannot answer whether Nick is a Professor or not (since everything can
be deduced from an inconsistent knowledge base). On the contrary, by inductive reasoning,
it is highly probable that the returned classification result is that Nick is an instance of
Professor. This is because the most similar individuals are Franz, John, and Flo, and all of
them vote for the concept Professor.

Noise. Inductive reasoning may also be able to give a correct classification in the presence
of noise in a knowledge base (containing, e.g., incorrect concept and/or role membership
assertions), which is illustrated by the following example.

Example 7. Consider the description logic knowledge base KB = (T 0 , A), where the
ABox A is as in Example 6 and the TBox T 0 is obtained from the TBox T of Example 6
by replacing the axiom for Researcher by the following axiom:
Researcher ≡ GraduatePerson u ∃worksFor.ResearchInstitute .

Again, Nick is actually a Professor, but by human mistake asserted to be a Researcher
in KB . But due to the slightly modified axiom for Researcher, there is no inconsistency
in KB anymore. By deductive reasoning, however, Nick turns out to be a Researcher,
whereas by inductive reasoning, it is highly probable that the returned classification result
is that Nick is an instance of Professor, as above, because the most similar individuals are
Franz, John, and Flo, and all of them vote for the concept Professor.
Combining Semantic Web Search with the Power of Inductive Reasoning 25

Incompleteness. Clearly, inductive reasoning may also be able to give a correct classifi-
cation in the presence of incompleteness in a knowledge base. That is, inductive reasoning
is not necessarily deductively valid, and may produce new knowledge.

Example 8. Consider the description logic knowledge base KB = (T 0 , A0 ), where the
TBox T 0 is as in Example 7 and the ABox A0 is obtained from the ABox A of Exam-
ple 6 by removing the axiom Researcher(Nick). Then, the resulting knowledge base is
neither inconsistent nor noisy, but it is now incomplete. Nonetheless, by the same line of
argumentation as in Examples 6 and 7, it is highly probable that the classification result by
inductive reasoning is that Nick is an instance of Professor.

7 Implementation and Experiments
In this section, we describe our prototype implementation for a semantic desktop search
engine. Furthermore, we report on very positive experimental results on the precision and
the recall under inductively vs. deductively completed semantic annotations.

Implementation. We have implemented a prototype for a semantic desktop search en-
gine. We have realized both a deductive and an inductive version of the offline inference
step for generating the completed semantic annotation for every considered resource. The
deductive version uses P ELLET1 , while the inductive one is based on the k-NN technique,
integrated with an entropic measure, as proposed in Section 5. Specifically, each individ-
ual i of a Semantic Web knowledge base is classified relative to all atomic concepts and
all restrictions ∃R− .{i} with roles R. The parameter k was set to log(|Ind(A)|), where
Ind(A) stands for all individuals in the knowledge base. The simpler distances d1F were
employed, using all the atomic concepts in the knowledge base for determining the set F.

Precision and Recall of Inductive Semantic Web Search. We next give an experimental
comparison between Semantic Web search under inductive and under deductive reason-
ing. We do this by providing the precision and the recall of the latter vs. the former. Our
experimental results with queries relative to the F INITE -S TATE -M ACHINE (FSM) and
the S URFACE -WATER -M ODEL (SWM) ontology from the Protégé Ontology Library2 are
summarized in Table 1. For example, Query (8) asks for all transitions having no target
state, while Query (16) asks for all numerical models having either the domain “lake” and
public availability, or the domain “coastalArea” and commercial availability. The experi-
mental results in Table 1 essentially show that the answer sets under inductive reasoning
are very close to the ones under deductive reasoning.

8 Summary and Outlook
We have presented a combination of Semantic Web search as presented in [8] with an in-
ductive reasoning technique, based on similarity search [11] for retrieving the resources
that likely belong to a query concept [5]. As a crucial advantage, the new approach to Se-
mantic Web search allows for handling inconsistencies, noise, and incompleteness, which
are very likely in distributed and heterogeneous environments, such as the Web. We have
also reported on a prototype implementation and very positive experimental results on the
precision and the recall of the new inductive approach to Semantic Web search.
1
http://www.mindswap.org
2
http://protegewiki.stanford.edu/index.php/Protege Ontology Library
26 C. d’Amato, N. Fanizzi, B. Fazzinga, G. Gottlob, and T. Lukasiewicz

Table 1. Precision and recall of inductive vs. deductive Semantic Web search.
Onto- Query No. Results No. Results No. Correct Results Precision Recall
logy Deduction Induction Induction Induction Induction
1 FSM State(x) 11 11 11 1 1
2 FSM StateMachineElement(x) 37 37 37 1 1
3 FSM Composite(x) ∧ hasStateMachineElement(x, accountDetails) 1 1 1 1 1
4 FSM State(y) ∧ StateMachineElement(x) ∧ hasStateMachineElement(x, y) 3 3 3 1 1
5 FSM Action(x) ∨ Guard(x) 12 12 12 1 1
6 FSM ∃y, z (State(y) ∧ State(z) ∧ Transition(x) ∧ source(x, y) ∧ target(x, z)) 11 2 2 1 0.18
7 FSM StateMachineElement(x) ∧ not ∃y (StateMachineElement(y) ∧
hasStateMachineElement(x, y)) 34 34 34 1 1
8 FSM Transition(x) ∧ not ∃y (State(y) ∧ target(x, y)) 0 5 0 0 1
9 FSM ∃y (StateMachineElement(x) ∧ not hasStateMachineElement(x,
accountDetails) ∧ hasStateMachineElement(x, y) ∧ State(y)) 2 2 2 1 1
10 SWM Model(x) 56 56 56 1 1
11 SWM Mathematical(x) 64 64 64 1 1
12 SWM Model(x) ∧ hasDomain(x, lake) ∧ hasDomain(x, river) 9 9 9 1 1
13 SWM Model(x) ∧ not ∃y (Availability(y) ∧ hasAvailability(x, y)) 11 11 11 1 1
14 SWM Model(x) ∧ hasDomain(x, river) ∧ not hasAvailability(x, public) 2 8 0 0 0
15 SWM ∃y (Model(x) ∧ hasDeveloper(x, y) ∧ University(y)) 1 1 1 1 1
16 SWM Numerical(x) ∧ hasDomain(x, lake) ∧ hasAvailability(x, public)∨
Numerical(x) ∧ hasDomain(x, coastalArea) ∧
hasAvailability(x, commercial) 12 9 9 1 0.75

In the future, we aim especially at extending the desktop implementation to a real Web
implementation, using existing search engines, such as Google. Another interesting topic is
to explore how search expressions that are formulated as plain natural language sentences
can be translated into the ontological conjunctive queries of our approach. It would also
be interesting to investigate the use of probabilistic ontologies rather than classical ones.

Acknowledgments. Georg Gottlob’s work was supported by the EPSRC grant Number
EP/E010865/1 “Schema Mappings and Automated Services for Data Integration.” Georg
Gottlob, whose work was partially carried out at the Oxford-Man Institute of Quantitative
Finance, gratefully acknowledges support from the Royal Society as the holder of a Royal
Society-Wolfson Research Merit Award. Thomas Lukasiewicz’s work was supported by
the German Research Foundation (DFG) under the Heisenberg Programme.

References
[1] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The
Description Logic Handbook. Cambridge University Press, 2003.
[2] T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Sci. Am., 284:34–43, 2001.
[3] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer
Networks, 30(1–7):107–117, 1998.
[4] P.-A. Chirita, S. Costache, W. Nejdl, and S. Handschuh. P-TAG: Large scale automatic gener-
ation of personalized annotation TAGs for the Web. In Proc. WWW-2007.
[5] C. d’Amato, N. Fanizzi, and F. Esposito. Query answering and ontology population: An in-
ductive approach. In Proc. ESWC-2008.
[6] L. Ding, T. W. Finin, A. Joshi, Y. Peng, R. Pan, and P. Reddivari. Search on the Semantic Web.
IEEE Computer, 38(10):62–69, 2005.
[7] N. Fanizzi, C. d’Amato, and F. Esposito. Induction of classifiers through non-parametric meth-
ods for approximate classification and retrieval with ontologies. International Journal of Se-
mantic Computing, 2(3):403–423, 2008.
[8] B. Fazzinga, G. Gianforme, G. Gottlob, and T. Lukasiewicz. From Web search to Seman-
tic Web search. Technical Report INFSYS RR-1843-08-11, Institut für Informationssysteme,
TU Wien, November 2008. http://www.kr.tuwien.ac.at/research/reports/rr0811.pdf.
[9] R. V. Guha, R. McCool, and E. Miller. Semantic search. In Proc. WWW-2003.
[10] A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking data
to ontologies. J. Data Semantics, 10:133–173, 2008.
[11] P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search — The Metric Space Ap-
proach, Advances in Database Systems 32. Springer, 2006.
Evidential Nearest-Neighbors Classification for
Inductive ABox Reasoning

Nicola Fanizzi, Claudia d’Amato, and Floriana Esposito

Dipartimento di Informatica, Università degli studi di Bari
Campus Universitario, Via Orabona 4, 70125 Bari, Italy
{fanizzi,claudia.damato,esposito}@di.uniba.it

Abstract. In the line of our investigation on inductive methods for
Semantic Web reasoning, we propose an alternative way for approxi-
mate ABox reasoning based on the analogical principle of the nearest-
neighbors. Once neighbors of a test individual are selected, a combination
rule descending from the Dempster-Shafer theory can join together the
evidence provided by the neighbor individuals. We show how to exploit
the procedure for determining unknown class- and role-memberships or
fillers for datatype properties which may be the basis for many further
ABox inductive reasoning algorithms.

1 Introduction

In the context of reasoning in the Semantic Web (SW), a growing interest is
being committed to alternative procedures extending the standard methods so
that they can deal with the various facets of uncertainty related with Web rea-
soning [1]. Extensions of the classic probability measures [2] offer alternative
ways to deal with inherent uncertainty of the knowledge bases (KBs) in the SW.
Particularly, belief and plausibility measures adopted in the Dempster-Shafer
Theory of Evidence [3] have been exploited as means for dealing with incom-
pleteness [4] and also inconsistency [5], which may arise from the aggregation of
data and metadata on a large and distributed scale. In this work we undertake
again the inductive point of view. Indeed, in many SW domains a very large
number of assertions can potentially be true but often only a small number of
them is known to be true or can be inferred to be true. So far the application
of combination rules related to the Dempster-Shafer theory has concerned the
induction of metrics which are essential for all similarity-based reasoning meth-
ods [4]. One of the applications of such measures was related to the prediction of
assertions through nearest neighbor procedures. Recently a general-purpose ev-
idential nearest neighbor procedure based on the Dempster-Shafer combination
rule has been proposed [6]. In this work this method is extended to the specific
case of semantic KBs through a more epistemically appropriate combination pro-
cedure [7]. In the perspective of inductive methods, the need for a definition of a
semantic similarity measure for individuals arises, that is a problem that so far
received less attention in the literature compared to the measures for concepts.
28 N. Fanizzi, C. d’Amato, and F. Esposito

Recently proposed dissimilarity measures for individuals in specific languages
founded in Description Logics [8] turned out to be practically effective for the
targeted inductive tasks [9], however they are still based on structural criteria
so that they can hardly scale to more complex languages. We devised families of
dissimilarity measures for semantically annotated resources, which can overcome
the aforementioned limitations [10, 11]. Our measures are mainly based on the
Minkowski’s norms for Euclidean spaces induced by means of a method devel-
oped in the context of relational machine learning [12]. Namely, the measures
are based on the degree of discernibility of the input individuals with respect to
a given context [13] (or committee of features), which are represented by concept
descriptions expressed in the language of choice.
The main contributions of this work regard the extension of a framework for
the classification of individuals through a prediction procedure based on evidence
theory and similarity. In particular we propose using Yager’s rule of combina-
tion and exploiting the mentioned families of metrics defined for individuals in
ontologies. This allows for measuring the confirmation of the truth of candidate
assertions. The prediction of the values (related to class-membership or datatype
and object properties) may have plenty of applications in uncertainty reasoning
with ontologies.
The remainder of the paper is organized as follows. In the next section (§2),
distance measures that shall be utilized for selecting neighbor individuals are
introduced. Then (§3), the basics of the Dempster-Shafer theory and a nearest-
neighbor procedure based on an alternative rule of combination are recalled.
Hence (§4) we present the applications of the method to the problems of de-
termining the class- or role-membership of individuals w.r.t. given query con-
cepts / roles as well as the prediction of fillers for datatype properties. Relevant
related work are discussed in (§5) and we conclude (§6) proposing extensions
and applications of these methods in further works.

2 Dissimilarity Measures for Individuals

Since the reasoning method to be presented in the following is intended to be
general purpose, no specific language will be assumed in the following for re-
sources, concepts (classes) and their properties. It suffices to consider a generic
representation that can be mapped to some Description Logic language with the
standard model-theoretic semantics (see [8] for a thorough reference).
A knowledge base K = hT , Ai comprises a TBox T and an ABox A. T
is a set of axioms concerning the (partial) definition of concepts (and roles)
through class (role) expressions. A contains assertions (ground facts) concerning
the world state. The set of the individuals occurring in A will be denoted with
Ind(A). Each individual can be assumed to be identified by its own URI (it is
useful in this context to make the unique names assumption).
Similarity-based tasks, such as individual classification, retrieval, and clus-
tering require language-independent measures for individuals whose definition
can capture semantic aspects of their occurrence in the knowledge base [10, 11].
Evidential Nearest-Neighbors Classification for Inductive ABox Reasoning 29

For our purposes, we need functions to assess the similarity of individuals.
However individuals do not have an explicit syntactic (or algebraic) structure
that can be compared (unless one resorts to language-specific notions [9], such
as the most specific concept [8]). Focusing on the semantic level, the leading
idea may be that, similar individuals should behave similarly w.r.t. the same
concepts. A way for assessing the similarity of individuals in a knowledge base
can be based on the comparison of their semantics along a number of dimen-
sions represented by a set of concept descriptions (henceforth referred to as the
committee or context [13]). Specifically, the measure may compare individuals on
the grounds of their behavior w.r.t. a given context, say C = {C1 , C2 , . . . , Cm },
which stands as a group of discriminating relevant concepts (features) expressed
in the considered language. We begin with defining the behavior of an individ-
ual w.r.t. a certain concept in terms of projecting it in this dimension: Given a
concept Ci ∈ C, the related projection function πi : Ind(A) 7→ {0, 21 , 1} is defined:

 1 K |= Ci (a)
∀a ∈ Ind(A) πi (a) = 0 K |= ¬Ci (a)
1
2 otherwise
The case of πi (a) = 12 corresponds to the case when a reasoner cannot give
the truth value for a certain membership query. This is due to the Open World
Assumption normally made in Semantic Web reasoning. Hence, as in the classic
probabilistic models, uncertainty may be coped with by considering a uniform
distribution over the possible cases. Further ways to approximate these values
in case of uncertainty are investigated in [4].
The discernibility functions related to the context w.r.t. which two input in-
dividuals are compared are defined as follows. Given a feature concept Ci ∈ C,
the related discernibility function δi : Ind(A) × Ind(A) 7→ [0, 1] is defined as:

∀(a, b) ∈ Ind(A) × Ind(A) δi (a, b) = |πi (a) − πi (b)|
The discernibility function δi assigns 0 if the two individuals a and b have the
same behavior w.r.t. Ci , that is if they are both instance of Ci or both instance
of ¬Ci or nothing is known about this. This is because, if a and b have the same
bahavior w.r.t. Ci then there are no other information for discriminating them.
Finally, a family of dissimilarity measures for individuals that is inspired to
the Minkowski’s metrics can be defined [10, 11]: Let K = hT , Ai be a knowledge
base. Given a context C and a related vector of weights w, a family of dissimi-
larity measures {dCp }p∈IN , dCp : Ind(A) × Ind(A) 7→ [0, 1] is defined as follows:
" # p1
X
∀(a, b) ∈ Ind(A) × Ind(A) dCp (a, b) = wi δi (a, b)p
Ci ∈C
1
The effect of the weights is to normalize w.r.t. the other features involved.
Obviously these measures are not absolute, then they should be also considered
1
A possible way for determining the wi is to assign a high value if the corresponding
feature concept reflects high information content, low value otherwise (see [10] for
more details).
30 N. Fanizzi, C. d’Amato, and F. Esposito

w.r.t. the context of choice, hence comparisons across different contexts may not
be meaningful. Larger contexts are likely to decrease the measures because of the
normalizing factor yet these values is affected also by the degree of redundancy of
the features employed. In other works the choice of the weights is done according
to variance or entropy associated to the various concepts in the context [10, 11].
Compared to other proposed measures [14, 9, 15], the presented functions
do not depend on the constructors of a specific language, rather they require
only (retrieval or) instance-checking for computing the projections through class-
membership queries to the knowledge base. The complexity of measuring the
dissimilarity of two individuals depends on the complexity of such inferences
(see [8], Ch. 3). Note also that the projections that determine the measure can be
computed (or derived from statistics maintained on the knowledge base) before
the actual distance application, thus determining a speed-up in the computation
of the measure. This is very important for algorithms that massively use this
distance, such as instance-based methods.
One should assume that C represents a set of (possibly redundant) features
that are able to discriminate individuals that are actually different. The choice
of the concepts to be included (a feature selection problem [12]) may be cru-
cial. Therefore, specific optimization algorithms founded in randomized search
have been devised which are able to find optimal choices of discriminating con-
texts [10, 11]. However, the results obtained so far with knowledge bases drawn
from ontology libraries showed that (a selection) of the primitive and defined
concepts are often sufficient to induce sufficiently discriminating measures.

3 Evidence-Theoretic Nearest-Neighbor Prediction

In this section the basics of the theory of evidence and combination rules [3]
are recalled then a nearest neighbor classification procedure based on the rule of
combination [6] is extended in order to perform prediction of unobserved values
(related to datatype properties or also class-membership).

3.1 Basics of the Evidence Theory

In the Dempster-Shafer theory, a frame of discernment Ω is defined as the set
of all hypotheses in a certain domain. Particularly, in a classification problem it
is the set of all possible classes. A basic belief assignment
P (BBA) is a function
m that defines a mapping m : 2Ω 7→ [0, 1] verifying: A∈Ω m(A) = 1. Given
a certain piece of evidence, the value of the BBA for a given set A expresses a
measure of belief that is committed exactly to A. The quantity m(A) pertains
only to A and does not imply any additional claims about any of its subsets. If
m(A) > 0, then A is called a focal element for m.
The BBA m cannot be considered a proper probability measure: it is de-
fined over 2Ω instead of Ω and it does not require the properties of monotone
measures [2]. The BBA m and its associated focal elements define a body of
evidence, from which a belief function Bel and a plausibility function Pl can
Evidential Nearest-Neighbors Classification for Inductive ABox Reasoning 31

be derived as mappings from 2Ω to [0, 1]. For a given A ⊆ Ω, the belief in A,
denoted Bel(A), represents a measure of the total belief committed to A given
the available evidence. Bel is defined as follows:
X
∀A ∈ 2Ω Bel(A) = m(B) (1)
∅6=B⊆A

Analogously, the plausibility of A, denoted Pl(A), represents the amount of belief
that could be placed in A, if further information became available. Pl is defined
as follows:
X
∀A ∈ 2Ω Pl(A) = m(B) (2)
B∩A6=∅

It is easy to see that: Pl(A) = P
Bel(Ω) − Bel(Ā). Moreover m(∅) = 1 − Bel(Ω)
|A\B|
and for each A 6= ∅: m(A) = B⊆A (−1) Bel(B). Using these equations,
knowing just one function among m, Bel, and Pl allows to derive the others.
The Dempster-Shafer rule of combination [3] is an operation for pooling ev-
idence from a variety of sources. This rule aggregates independent bodies of
evidence defined within the same frame of discernment into one body of evi-
dence. Let m1 and m2 be two BBAs. The new BBA obtained by combining m1
and m2 using the rule of combination, m12 is the orthogonal sum of m1 and m2 .
Generally, the normalized version of the rule is used:
P
Ω B∩C=A m1 (B) m2 (C)
P
∀A ∈ 2 \ {∅} m12 (A) = (m1 ⊕ m2 )(A) =
1 − B∩C=∅ m1 (B) m2 (C)

(and m12 (∅) = 0) where the numerator (1 − c) normalizes the values of the
combined BBA w.r.t. the amount of conflict c between m1 and m2 .
Different evidence fusion rules have been proposed [2]. A more epistemolog-
ically sound combination rule [7] for our purposes places the probability mass
related to the conflict between the BBAs to the case of maximal ignorance.
P
 B∩C=A m1 (B) m2 (C) A 6= Ω ∧ A 6= ∅
∀A ∈ 2Ω m12 (A) = m1 (Ω) m2 (Ω) + c A=Ω

0 A=∅

This means that the conflict between the two sources of evidence is not hidden,
but it is explicitly recognized as a contributor to ignorance.
Due to the associativity and commutativity of the operations involved, it is
easy to prove that the resulting combination operator is associative and commu-
tative, and admits the vacuous BBA (Ω unique focal set) as neutral element.

3.2 The Nearest Neighbors Procedure

Let us consider the finite set of instances X and a finite set of integers V ⊆ ZZ to
be used as labels (which may correspond to disjoint classes or distinct attribute
values). The available information is assumed to consist in a training set TrSet =
32 N. Fanizzi, C. d’Amato, and F. Esposito

{(x1 , v1 ), . . . , (xM , vM )} ⊆ Ind × V of single-labeled instances (examples). In our
case, X = Ind(A), the set of individual names occurring in the ontology.
Let xq be a new individual to be classified on the basis of its nearest neighbors
in TrSet. Let Nk (xq ) = {(xo(j) , vo(j) ) | j = 1, . . . , k} be the set of the k nearest
neighbors of xq in TrSet sorted by a function o(·) depending on an appropriate
metric d which can be applied to ontology individuals (e.g. one of the measures
in the family defined in the previous section §2).
Each pair (xi , vi ) ∈ Nk (xq ) constitutes a distinct item of evidence regarding
the value to be predicted for xq . If xq is close to xi according to d, then one
will be inclined to believe that both instances are associated to the same value,
while when d(xq , xi ) increases, this belief decreases and that leads to a situation
of almost complete ignorance concerning the value to be predicted for xq .
Consequently, each (xi , vi ) ∈ Nk (xq ) may induce a BBA mi over V which
can be defined as follows [6]:

 λσ(d(xq , xi )) A = {vi }
∀A ∈ 2V mi (A) = 1 − λσ(d(xq , xi )) A=V (3)

0 otherwise
where λ ∈]0, 1[ is a parameter and σ(·) is a decreasing function such that σ(0) = 1
and limd→∞ σ(d) = 0 (e.g. σ(d) = exp(−γdn ) with γ > 0 and n ∈ IN). The values
of the parameters can be determined heuristically.
Considering each training individual in Nk (xq ) as an separate source of ev-
idence, k BBAs mj are obtained. These can be pooled by means of the rule of
combination leading to the aggregated BBA m that synthesizes the final belief:
k
M
m̄ = mj = m1 ⊕ · · · ⊕ mk (4)
j=1

In order to predict a value, functions Bel and Pl can be derived from m̄ using
the equations seen above, and the query individual xq is assigned the value in V
that maximizes the belief or plausibility:
vq = argmax Bel({vi }) or vq = argmax Pl({vi })
(xi ,vi )∈Nk (xq ) (xi ,vi )∈Nk (xq )

The former choice (select the hypothesis with the greatest degree of belief the
most credible) corresponds to a skeptical viewpoint while the latter (select the
hypothesis with the lowest degree of doubt the most plausible) is more credulous.
The degree belief (or plausibility) of the predicted value provides also a way to
compare the answers of an algorithm built on top of such analogical procedure.
This is useful for tasks such as ranking, matchmaking, etc..
Finally, it is possible to combine the two measures Bel and Pl analogously
to necessity (Nec) and possibility (Pos) in Possibility Theory (which can be
considered a special case2 of Dempster-Shafer theory). One can define a single
2
Precisely, the body of evidence must contain consonant focal sets, i.e. when the set
of focal elements is a nested family [2].
Evidential Nearest-Neighbors Classification for Inductive ABox Reasoning 33

ENNk (xq , TrSet, V )

1. Compute the neighbor set Nk (xq ) ⊆ TrSet.
2. for each i ← 1 to k do
Compute mi (Eq. 3)
3. for each v ∈ V do
Compute m̄ (Eq. 4) and derive Bel and Pl (Eqs. 1–2)
Compute the confirmation C (Eq. 5) from Bel and Pl
4. Select v ∈ V that maximizes C (Eq. 6).

Fig. 1. The evidence nearest neighbor procedure.

measure of confirmation C, ranging in [−1, +1], by means of a simple one-to-one
transformation [2]:
∀A ⊆ Ω C(A) = Bel(A) + Pl(A) − 1 (5)
Hence, denoted with C the combination of Bel and Pl, the resulting rule for
predicting the uncertain value for the test individual can be written as follows:
vq = argmax C({vi }) (6)
(xi ,vi )∈Nk (xq )

Summing up, the procedure is as reported in Fig. 1:
It is worthwhile to note that the complexity of the method is polynomial
in the number of instances in the TrSet. If this set is compact and contains
very prototypical individuals with plenty of related assertions, then the result-
ing predictions are likely to be accurate. Another source of complexity in the
computations may be the number of values in V which may yield a large number
of subsets 2|V | for which BBAs are to be computed. However this depends also
on the kind of problem that is to be solved (e.g. in class membership detection
|V | = 2). Moreover what really matters in the number of focal sets for each BBA
which may be much less than 2|V | .

4 Assertion Prediction
The utility of the presented procedure when applied to ontology reasoning can be
manifold. In the following we propose its employment in the inductive prediction
of unknown values related to class-membership and datatype / object property
fillers. This feature may be easily embedded in an ontology management system
in order to help the knowledge engineers elicit assertions which may be not be
derived from the knowledge base yet they can be rather made in analogy with
the others [9].
In the following, the symbol |≈ in expressions like K |≈ α will denote the
derivation of the assertion α from the knowledge base K obtained through an al-
ternative procedure (like the evidence nearest neighbor presented in the previous
section).
34 N. Fanizzi, C. d’Amato, and F. Esposito

4.1 Class-Membership

Let us suppose a (query) concept Q is given. In this case one may consider only
examples made up of individuals with a definite class-membership leading to a
binary problem with a set of values VQ = {+1, −1} denoting, resp., membership
and non-membership w.r.t. the query concept. Alternatively, one may admit
ternary problems with some labels set to 0 to explicitly denote an indefinite (un-
certain) class-membership [9, 10]. We shall also consider the related training set
TrSetQ ⊆ Ind(A) × VQ . The values of the labels vi for the training examples can
be obtained through deductive reasoning (instance-checking) or specific facilities
made available by the knowledge management systems [16].
Now to predict the class-membership value vq for some individual xq w.r.t. Q,
it suffices to call the procedure ENNk (xq , TrSetQ , VQ ) and decide on the grounds
of the returned value. Thus in a binary setting (VQ = {+1, −1}), one will either
conclude that K |≈ Q(xq ) or K |≈ ¬Q(xq ) depending on the value that maximizes
C in Eq. 6 (resp., vq = +1 or vq = −1). Moreover the value of the confirmation
function which determined the returned value vq can be exploited for ranking
the hits by comparing the strength of the inductive conclusions.
Adopting a ternary setting, it may turn out that the most likely value is
vq = 0 resulting in an uncertain case. One may force the choice among the
values of C for vq = −1 and vq = +1, e.g. when the confirmation degree exceeds
a some threshold.
The inductive procedure described above can be trivially exploited for per-
forming the retrieval of a certain concept inductively. Given a certain concept Q,
it would suffice to find all individuals a ∈ Ind(A) that are such that K |≈ Q(a).
The hits could be returned ranked by the respective confirmation value C(+1).

4.2 Datatype Fillers

In this case, let us suppose a certain (functional) datatype property P is given
and the problem is to predict its value for a certain test individual a (which has
to be supposed to be in its domain). The set of values VP may correspond to the
(discrete and finite) range of the property or to its restriction to the observed
values for the training instances: VP = {v ∈ range(P ) | ∃P (a, v) ∈ A}. Different
settings may be devised allowing for some special value(s) denoting the case of
a yet unobserved value(s) for that property.
The related training set will be some TrSetP ⊆ domain(P ) × VP , where
domain(P ) ⊆ Ind(A) is the set of individual names that have a known P -value in
the knowledge base. Differently from the previous problem, datatype properties
generally do not have a specific intensional definition in the knowledge base
(except for the specification of domain and range), hence a mere look-up in the
ABox should suffice to determine the TrSet.
Now to predict the value in VP of the datatype property P for some indi-
vidual a, the method requires calling the procedure with ENNk (a, TrSetP , VP ).
Thus in this setting, if vq is the value that maximizes Eq. 6 then we can write
K |≈ P (a, vq ). Also in this case the value of the confirmation function which
Evidential Nearest-Neighbors Classification for Inductive ABox Reasoning 35

determined choice of the value vq can be exploited for comparing the strength
of an inductive conclusion to others.
In case of special settings with dummy values indicating unobserved values,
when these are found to be the most credible among the others, a knowledge
engineer should be contacted for the necessary changes to the ontology.
The inductive procedure described above can be trivially exploited for per-
forming alternate forms of retrieval, e.g. finding all individuals with a certain
value for the given property. Given a certain value v, it would suffice to find all
individuals a ∈ Ind(A) that are such that K |≈ P (a, v). Again, the hits could be
returned ranked according to the respective confirmation value C(+1).
The limitation of treating only functional datatype properties may be over-
come by considering a different way to assign the probability mass to BBAs than
Eq. 3, including subsets of all possible values. Examples are to be constructed
accordingly (labels will be chosen in 2VP ). Alternatively, more complex frames
of discernment, e.g. Ω 0 = 2Ω , so consider sets of values as possible fillers of
the property. In all such settings the computation of the BBAs and descending
measures would become of course much more complex and expensive, yet clever
solutions (or approximations) proposed in the literature [6] may contribute to
mitigate this problem.

4.3 Relationships among Individuals

In principle, a very similar setting may be used in order to establish the possi-
bility that a certain test individual is related through some object property with
some other individual [17, 18].
Since the set Ind(A) is finite (the target is not discovering relations with
unseen individuals), one may want to find all individuals that are related to a
test one through some object property, say R. The problem can be decomposed
into smaller ones aiming at verifying whether K |≈ R(a, b) holds:

for each b ∈ Ind(A) do
for each a ∈ Ind(A) do
TrSet ← {(x, v) | x ∈ Ind(A) \ {a}, if K |= R(x, b) then v ← +1 else v ← −1}
vbR ← ENNk (a, TrSet, {+1, −1})
if vbR = +1 then
return K |≈ R(a, b)
else
return K |≈ ¬R(a, b)

Note that, in the construction of the training sets, the inference K |= R(x, b)
may turn out to be merely an ABox lookup operation for the given assertions
(when roles are not intensionally defined in a proper RBox). Conversely, if an
RBox is available (sometimes as a subset of the TBox) the values of the label for
the training examples can be obtained through deductive reasoning (instance-
checking) or the mentioned facilities made available by advanced reasoners or
knowledge management systems [16].
36 N. Fanizzi, C. d’Amato, and F. Esposito

This simple setting makes a sort of closed-world assumption in the decision of
the induced assertions descending from the adoption of the binary value set and
the composition of the TrSet. A more cautious setting would involve a ternary
value set VR = {−1, 0, +1} which allows for an explicit treatment of those indi-
viduals a for which R(a, b) is not derivable (or just absent from the ABox). The
final decision on the induced conclusion has to consider also this new possibility
(e.g. using a threshold of confirmation for accepting likely assertions).

5 Related Work

The proposed method is related to those approaches devised to offer alternative
ways of reasoning with ABoxes for eliciting hidden knowledge (regularities) in
order to complete and populate the ontology with likely assertions even in the
occurrence of incorrect parts, supposing this kind of noise is not systematic.
The tasks of ontology completion and population have often been tackled
through formal methods (such as formal concept analysis [19]). Discovering new
assertions (and related probabilities in a classical setting) is another related
task for eliciting hidden knowledge in the ontologies. In [18] a machine learning
method is proposed to estimate the truth of statements by exploiting regularities
in the data. In [17] another statistical learning method for OWL-DL ontologies is
proposed, combining a latent relational graphical model with Description Logic
inference in a modular fashion. The probability of unknown role-assertions can be
inductively inferred and known concept-assertions can be analyzed by clustering
individuals.
Similarity-based reasoning with ontologies is the primary aim of this work
which follows a number of related methods founded on dissimilarity measures for
individuals in knowledge bases expressed in Description Logics [9, 10]. Mostly,
they adopt some alternate form of the classic Nearest-Neighbor lazy learning
scheme [12] in order to draw inductive conclusions that often cannot be deduc-
tively entailed by the knowledge bases.
Similar approaches based on lazy learning have been proposed that adopt
generalized probability theories such as the Dempster-Shafer. In [6], which was a
source of inspiration for this paper, the standard rule of combination is exploited
in an evidence-theoretic classification procedure where labels were not assumed
to be mutually exclusive. Rules of combination had been used in [4] in order
to learn precise metrics to be exploited in a lazy learning setting like those
mentioned above.
One of the most appreciated advantages of performing inductive ABox rea-
soning through these methods is that they can naturally handle inconsistent
(and inherently incomplete) knowledge bases, especially when inconsistency is
not systematic. In [5] a method for dealing with inconsistent ABoxes populated
through information extraction is proposed: it constructs ad hoc belief networks
for the conflicting parts in an ontology and adopts the Dempster-Shafer theory
for assessing the confidence of the resulting assertions.
Evidential Nearest-Neighbors Classification for Inductive ABox Reasoning 37

6 Concluding Remarks and Outlook
In the line of our investigation of inductive methods for Semantic Web reasoning,
we have proposed an alternative way for approximate ABox reasoning based on
the nearest-neighbors analogical principle. Once neighbors of a test individual
are selected through some distance measures, a combination rule descending
from the Dempster-Shafer theory can fuse the evidence provided by the various
neighbor individuals. We have shown how to exploit the procedure for assertion
prediction problems such as determining unknown class- or role-memberships
as well as attribute-values which may be the basis for many ABox inductive
reasoning algorithms. The method is being implemented so to allow an extensive
experimentation on real ontologies.
Special settings to accommodate cases of uncertain or unobserved values
are to be investigated. One promising extension of the method concerns the
possibility of considering infinite sets of values V following the studies [20, 2].
This would allow dealing with domains where the total amount of values is
unknown (also due to the inherent nature of the Semantic Web). Moreover the
predicted values often need not to be exclusive. Hence the prediction procedure
would require an extension towards the consideration of sets of values instead of
singletons.
As necessity and possibility measures are related to the belief measures (see
note 2 at page 32) a natural extension may be towards the possibilistic theory and
its calculus which is, in general, different from the Dempster-Shafer theory and
calculus. Further possible extensions concern all other monotone measures such
as the Sugeno λ-measures [2]. The extension towards the Possibility Theory is
interesting also because of its parallelism with modal logics [20] and possibilistic
extensions of Description Logics [21].

References
1. Laskey, K., Laskey, K., Costa, P., Kokar, M., Martin, T., Lukasiewicz, T.: Un-
certainty Reasoning for the World Wide Web. W3C Incubator Group. (2008)
http://www.w3.org/2005/Incubator/urw3/XGR-urw3-20080331/.
2. Klir, G.: Uncertainty and Information. Wiley (2006)
3. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press (1976)
4. Fanizzi, N., d’Amato, C., Esposito, F.: Approximate measures of semantic dissim-
ilarity under uncertainty. In da Costa, P., et al., eds.: Uncertainty Reasoning for
the Semantic Web I. Volume 5327 of LNAI. Springer (2008) 355–372
5. Nikolov, A., Urea, V., Motta, E., de Roeck, A.: Using the Dempster-Shafer theory
of evidence to resolve ABox inconsistencies. In da Costa, P., et al., eds.: Uncertainty
Reasoning for the Semantic Web I. Volume 5327 of LNAI. Springer (2008) 143–160
6. Denoeux, T.: A k-nearest neighbor classication rule based on Dempster-Shafer
theory. IEEE Transactions on Systems, Man and Cybernetics 25 (1995) 804–813
7. Yager, R.: On the Dempster-Shafer framework and new combination rules. Infor-
mation Sciences 41 (1987) 93–137
8. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P., eds.:
The Description Logic Handbook. Cambridge University Press (2003)
38 N. Fanizzi, C. d’Amato, and F. Esposito

9. d’Amato, C., Fanizzi, N., Esposito, F.: Analogical reasoning in description logics.
In da Costa, P., et al., eds.: Uncertainty Reasoning for the Semantic Web I. Volume
5327 of LNAI. Springer (2008) 336–354
10. d’Amato, C., Fanizzi, N., Esposito, F.: Query answering and ontology population:
An inductive approach. In Bechhofer, S., et al., eds.: Proceedings of the 5th Eu-
ropean Semantic Web Conference, ESWC2008. Volume 5021 of LNCS., Springer
(2008) 288–302
11. Fanizzi, N., d’Amato, C., Esposito, F.: Metric-based stochastic conceptual cluster-
ing for ontologies. Information Systems 34 (2009) 725–739
12. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning –
Data Mining, Inference, and Prediction. Springer (2001)
13. Goldstone, R., Medin, D., Halberstadt, J.: Similarity in context. Memory and
Cognition 25 (1997) 237–255
14. Borgida, A., Walsh, T., Hirsh, H.: Towards measuring similarity in description
logics. In Horrocks, I., Sattler, U., Wolter, F., eds.: Working Notes of the Interna-
tional Description Logics Workshop. Volume 147 of CEUR Workshop Proceedings.,
Edinburgh, UK (2005)
15. d’Amato, C., Staab, S., Fanizzi, N.: On the influence of description logics ontologies
on conceptual similarity. In Gangemi, A., Euzenat, J., eds.: Proceedings of the 16th
Knowledge Engineering Conference, EKAW2008. Volume 5268 of LNAI., Springer
(2008) 48–63
16. Horrocks, I., Li, L., Turi, D., Bechhofer, S.: The Instance Store: DL reasoning
with large numbers of individuals. In Haarslev, V., Möller, R., eds.: Proceedings of
the 2004 Description Logic Workshop, DL 2004. Volume 104 of CEUR Workshop
Proceedings., CEUR (2004) 31–40
17. Rettinger, A., Nickles, M.: Infinite hidden semantic models for learning with OWL
DL. In d’Amato, C., et al., eds.: Proceedings of 1st ESWC Workshop on Inductive
Reasoning and Machine Learning for the Semantic Web, IRMLeS09. Volume 474
of CEUR Workshop Proceedings., Heraklion, Greece (2009)
18. Tresp, V., Huang, Y., Bundschus, M., Rettinger, A.: Materializing and querying
learned knowledge. In d’Amato, C., et al., eds.: Proceedings of 1st ESWC Workshop
on Inductive Reasoning and Machine Learning for the Semantic Web, IRMLeS09.
Volume 474 of CEUR Workshop Proceedings., Heraklion, Greece (2009)
19. Baader, F., Ganter, B., Sertkaya, B., Sattler, U.: Completing description logic
knowledge bases using formal concept analysis. In Veloso, M., ed.: Proceedings
of the 20th International Joint Conference on Artificial Intelligence, Hyderabad,
India (2007) 230–235
20. Harmanec, D., Klir, G., Wang, Z.: Modal logic interpretation of Dempster-Shafer
theory: An infinite case. International Journal of Approximate Reasoning 14 (1996)
81–93
21. Qi, G., Pan, J., Ji, Q.: A possibilistic extension of description logics. In Calvanese,
D., et al., eds.: Working Notes of the 20th International Description Logics Work-
shop, DL2007. Volume 250 of CEUR Workshop Proceedings., Bressanone, Italy
(2007) 435–442
�������������������������������������������������������

������������������������������������

�����������������������������������������������������������������������������������������������
�������������������������������������������

����������������

������ ������

������������������������

��������������������������������������

�������������������

������������������������

����������������������

�����������������������������������

������������������������

�������������������������

���������

���������
�������
��������

�������
�������
�������

��

��
������

������
�����
�����
�����

�����
����

���

�����������������������������������

�� ���������� �� ������������ ����
�

��� �� �� ��� �� ����� �� ������������ ��� ��� ������ �������� ���� ���� ��������� ���������
����������������������������

����������������������

���� ����� ��� ���� ������� ��� ������� ��� ������������� ���� ���� ��������� ��� ���� ������ ��� ���������� ��������
�������������������������������������������

�
�����������������������������������������������������������������������������������������������

����������������������

���������������������������������������

������ � ������
�� � ������������

��������������

���������������

����������������

���� ������������������������������������������������������������������������������������������������������������
�����������������������������������������������������������������
����� �������������������������������������������������������������������������������������������������������������
������������������������������������������
���� ������ ���� ������ ���� ����� ���� ���� ���� �� ��������� ������������ �������� ���������� ��������� ��������� �����
���������������������������������������������������������
����� ������� ������ ���� �������� ������� ���� ������� ��� �� ��������� ���� ����������� �������� ��������� ���� ������
�����������������������������������������������������������
����� ��������������������������������������������������������������������������������������������������������������
�����������
������ �����������������������������������������������������������������������������������������������������������
����������������������������������������������������������������������������������
����� ��������������������������������������������������������������������������������������������������������������
�������������������������������������������������������������������������������������
����� �������������������������������������������������������������������������������������������������������������
�����������������
����� �������������������������������������������������������������������������������������������������������������
��������������������������������������������������������������������������������������
����� �����������������������������������������������������������������������������������������������������������
����������������
����� ����������������������������������������������������������������������������������������������������������
����������������
����� ���������� ���� ���������� ���� ��������� ���������� ��� ���� �������� ������������ ������� ��� ������������ ��� ���� ����
�������������������������������������������������������������������������������
������ ��������������������������������������������������������������������������������������������������������������
������������������������������������������������������
����� ��������������������������������������������������������������������������������������������������������
��������������������������������������������������������������������������������������������������������
����� ������������������������������������������������������������������������������������������������������������
�������������������
����� ���������� ���� �������� ���� ���������� ���� ������������ ���� ��� ������������� ��� ���� ������� ���� �������� ��� ����� ����
����������������������������������������������������������������������
����� ������� ���� ������� ���� �������� �� ��������� ������ ���������� ���� ����� ������ ������������ ��� ���� ��������������
����������� ��� ������� ��������� ��� ������������ ��������� ���������� ������� ���� ������� ���� ������ ���� �����
���������������
����� ������������������������������������������������������������������������������������������������������������
�����������������������������������������
����� �����������������������������������������������������������������������������������������������������������
����� ���������� ���� �� ������ ������������ ������ ���� ���� ��������� ����� ��� ������ ������ ���� ���� ��������� �����
���������������������������������������������������������
����� ���������� ���� �� ������ ������������ ������� ��� ������������ ��� ���� ����� ��������� ����������� ��� �����������
�����������������������������������������������
����� ���������� ���� ���������� ������� ������ ������������ �������� �������� ��� ����������� ������������� ���������� ���� ����
�����������������
����� ������� ���� ���������� ���� ���������������� ���������� ��� ��������� ������ ��� ��������� ������� ��� ������������� ����
��������������������������������������������������������������������������������
����� ��������������������������������������������������������������������������������������������������������������
������������������������������������������������
����� �������� ������ ���������� ���� ������������ ��� ������ ���� ���� ��������� ����� ���� ����������������� ��� ������������
����������������������������������������������������������������������������������������������������������������
����� ����������������������������������������������������������������������������������������������������������������
����� �������� ����� ������� ���� ����� �������� ���������� �������� ��������� ������ ���� ������������ ���� ��������� �������
�������������������������������
����� �������������������������������������������������������������������������������������������������������������
�������������������������������������������������������������������������������
����� ��������������������������������������������������������������������������������������������������������
��������������������������������������������������������������
Axiomatic First-Order Probability

Kathryn Blackmond Laskey

Department of Systems Engineering and Operations Research
George Mason University, Fairfax VA 22030, USA
klaskey@gmu.edu

Abstract. Most languages for the Semantic Web have their logical basis in
some fragment of first-order logic. Thus, integrating first-order logic with
probability is fundamental for representing and reasoning with uncertainty in
the semantic web. Defining semantics for probability logics presents a
dilemma: a logic that assigns a real-valued probability to any first-order
sentence cannot be axiomatized and lacks a complete proof theory. This paper
develops a first-order axiomatic theory of probability in which probability is
formalized as a function mapping Gödel numbers to elements of a real closed
field. The resulting logic is fully first-order and recursively axiomatizable, and
therefore has a complete proof theory. This gives rise to a plausible reasoning
logic with a number of desirable properties: the logic can represent arbitrarily
fine-grained degrees of plausibility intermediate between proof and disproof; all
mathematical and logical assumptions can be explicitly represented as finite
computational structures accessible to automated reasoners; contradictions can
be discovered in finite time; and the logic supports learning from observation.
Keywords: First-Order Logic, Probability.

1 Introduction

Logic-based languages have long been recognized as an effective means to represent
information clearly, unambiguously, and in a manner that facilitates processing by
machines. By far the most common logical basis for Semantic Web languages is
classical first-order logic (FOL). This is no accident: its clear syntax, well-understood
semantics, and complete proof theory make FOL a natural choice for computational
knowledge representation and reasoning. However, FOL lacks a fundamental
capability essential for semantically aware systems. As Jeffreys [1] put it, “Traditional
or deductive logic admits only three attitudes to any proposition: definite proof,
disproof, or blank ignorance.” An intelligent reasoner must do more: it must assess
the plausibility of uncertain hypotheses, make reasonable choices when the outcome
is uncertain, and use observations to improve its representation of the world.
Probability is the unique plausible reasoning calculus that satisfies certain
intuitively satisfying axioms of coherent reasoning (e.g., [2]). For this reason,
probability has achieved a privileged status for plausible reasoning akin to FOL’s
privileged status with respect to logical reasoning. The past few decades have given
rise to increasingly expressive probability-based languages, as well as a host of
52 K. Laskey

restricted languages designed for scalability. There is increasing interest in probability
for semantic web applications [3].
It is often taken for granted that a new kind of logic is needed to capture essential
aspects of plausible reasoning: “Ordinary logic seems to be inadequate by itself to
cope with problems involving beliefs. In addition a theory of probability is required”
[4]. Because of their built-in machinery for reasoning about functions, higher-order
logic has been proposed as a natural logical basis for combining probability and logic
[5]. On the other hand, its complete proof theory makes first-order logic attractive as
a computational logic. Moreover, it is attractive to use the same logic to reason both
about the domain itself and about the plausibility of statements about the domain.
The question thus arises of whether an adequate formalization of probability is
possible within first-order logic itself.
This paper formalizes, within standard first-order logic, a probabilistic logic pow-
erful enough to express uncertainty about arbitrary first-order sentences. By formal-
izing probability as an axiomatic first-order theory, probabilities can be associated
coherently with arbitrary first-order sentences, with no modification of traditional
first-order semantics. To stay within axiomatic first-order logic, probabilities are
defined not as real numbers, but as elements of a real closed field. An axiom schema
is added to the standard axioms of the probability calculus to give “logical teeth” to
the idea that probability zero events do not happen. The semantics proposed here
connects naturally to algorithmic notions of randomness as proposed by Kolmogorov
and Martin-Löf [6], [7], as well as to Dawid’s [8] calibration criterion.

2 First-Order Probability

We begin with a first-order language L used to make assertions about a domain. L
includes the usual logical symbols (variables, logical connectives, universal and
existential quantifiers), together with a set of domain-specific predicate, function and
constant symbols. Without loss of generality, L is taken to be a traditional, untyped
first-order language.1 To operationalize the requirement that assertions be expressible
as a finite computational structure, a knowledge base (KB) is taken to be an axiomatic
theory of L. That is, a KB contains a consistent, recursive set A of sentences of L.
The logical consequences of these axioms comprise a recursively enumerable set TA =
Cn(A), called the theory of A. Gödel’s completeness theorem implies that TA is equal
to the set {σ : A ⊢ σ} of sentences provable from A.
In general, a KB may be incomplete – it need not imply a definite truth-value for
every sentence. In fact, a sufficiently powerful theory is necessarily incomplete. It is
useful for a reasoner to grade the plausibility of propositions it can neither prove nor
disprove. Probability is a natural candidate for this purpose.
It seems reasonable to define probability as a function mapping each sentence to a
real number between zero and 1, in a manner that satisfies the standard identities of
probability theory, and so that sentences provable (disprovable) from A are assigned

1
It is well known that a typed first-order logic can be re-expressed via a syntactic
transformation as an untyped logic (cf., [7]).
Axiomatic First-Order Probability 53

probability 1 (0). This approach, natural as it seems, runs into difficulty. The first
roadblock is that in standard first-order logic, arguments of functions must be ele-
ments of the domain, not sentences or propositions. The second roadblock is that the
theory of the real numbers cannot be fully characterized as an axiomatic first-order
theory. Several authors have shown that formalizing probabilities as a real numbers in
the unit interval results in a theory that cannot be axiomatized, and that does not admit
a complete proof theory (cf., [10], [11]). On the other hand, by abandoning the re-
quirement that probabilities be real-valued, Bacchus [12] developed axiomatic
probability logics that have a complete proof theory.
Given the mathematical impossibility of both an axiomatic first-order theory and a
function mapping sentences to real numbers, which should be preferred? To answer
this question, we step back to first principles, and consider fundamental requirements
for a computational probabilistic logic. First, a computational logic should explicitly
represent all mathematical and logical assumptions as finite computational structures
accessible to an automated reasoner. Second, it should be possible for a reasoner to
discover contradictions in a knowledge base, to identify when observations are
inconsistent with a theory, and to prove any consequence entailed by a theory. Third,
a logic for plausible reasoning must be able to associate measures of plausibility with
propositions, to express degrees of plausibility intermediate between proof and
disproof, and to do so in a logically coherent manner. All these requirements are met
by the proposed formalism. Furthermore, the first two requirements are automatically
satisfied if probability is formalized as a traditional first-order axiomatic theory, while
the final requirement can be met without demanding either that probability be
formalized as a function on sentences, or that probability values be real numbers.
Hence, a first-order axiomatic theory is a fundamental requirement, whereas a
function from sentences to real numbers is dispensable.
If probability is not a function mapping sentences to real numbers, then what is it?
We formalize probability as a function mapping Gödel numbers to elements of a real-
closed field (RCF). A RCF is the closest one can come to formalizing the real
numbers within first-order logic. The real numbers are uniquely characterized up to
isomorphism as an ordered field with the least upper bound property. The ordered
field axioms formalize familiar properties of the real and rational numbers: addition,
multiplication, additive and multiplicative inverses (hence, subtraction and division),
distribution of multiplication over addition, and complete ordering. These axioms can
be formalized fully in FOL. The defining property of the real numbers, that every
bounded non-empty set of real numbers has a least upper bound, is not a first-order
property. In a RCF, the least upper bound property holds for all definable relations.
The RCF axioms are sufficient to characterize all first-order properties of the real
numbers (cf., [13]). Thus, we assume probabilities are elements of a RCF.
Gödel showed that, given a sufficiently powerful formal system, domain elements
(e.g., numbers) can be associated with sentences, formulas, and proofs. This device
allows indirect expression of and reasoning about logical notions such as proof and
consistency, while complying with FOL’s prohibition against direct reference to sen-
tences. Defining probabilities as a mapping from Gödel numbers to elements of a
RCF allows us to develop a fully first-order axiomatization of probability. We argue
later that our axioms capture the essential requirements for a computational logic of
plausible reasoning.
54 K. Laskey

3 The Probability Axioms

The original language L and axioms A are augmented with additional symbols and
axioms to provide the necessary logical apparatus for probabilistic reasoning. The
augmented language and axioms are called L* and A*, respectively.

3.1 The Language

The language L* has numerical constants 0 and 1; arithmetic ordering predicate ≤;
arithmetic operators + and ×; one-place predicate symbols R and N to represent real
and natural numbers; and the two-place function symbol P to represent probability. In
addition, there is a predicate D to represent elements of the domain; and a countable
collection L1, L2, … of labels as names for individuals. If L already has mathematical
symbols and mathematical axioms consistent with our probability axioms, we can
make use of the existing logical machinery; otherwise new symbols are added.

3.2 The axioms

Domain axioms. Our first step is to relativize the domain axioms to D. This is
achieved through a standard syntactic translation, e.g., ∀x ϕ(x) becomes ∀x
D(x)→ϕ(x), and the sentence ∃x ϕ(x) becomes ∃x D(x)∧ϕ(x) (c.f., [9]).
Integer arithmetic. We require enough integer arithmetic to allow Gödel
numbering and reasoning about provability. The following axioms, together with the
RCF axioms defined below (which apply to natural numbers by virtue of inclusion)
serve this purpose:
N1. ∀x N(x)→ R(x)
N2. N(0)
N3. ∀x N(x)→ N(x+1)
N4. ∀x ∀y N(x) ∧ N(y) → ((x < y+1) → (x ≤ y))
N5. ∀x N(x)→ ¬(x < 0)
N6. All universal closures of formulas N(x)→ (ϕ(0)∧∀x(ϕ(x) → ϕ(x+1))) →
∀x ϕ(x), where ϕ(x) has x (and possibly other variables) free.
Real closed field axioms. As described above, probabilities are formalized as
elements of a real closed field (RCF), the first-order theory of the real numbers.
Axioms for a real closed field can be found in [13]. Note that by virtue of being
natural numbers, the constants 0 and 1 are also real numbers.
R1. Additive and multiplicative closure: ∀x ∀y R(x)∧R(y) → R(x+y) ∧ R(x ⋅ y)
R2. Commutativity: ∀x ∀y R(x)∧R(y) → (x+y = y+x) ∧ (x ⋅ y = y ⋅ x)
R3. Associativity: ∀x∀y∀z R(x)∧R(y)∧R(z) → (((x+y) + z = x + (y + z))
∧ ((x ⋅ y) ⋅ z = x ⋅ (y ⋅ z)))
R4. Identity: R(0) ∧ R(1) ∧ 0≠1 ∧ ( ∀x R(x) → ((x+0 = x) ∧ (x ⋅ 1 = x)) )
R5. Inverses: ∀x R(x) → ( ∃y (x + y = 0) ∧ (x≠0 → ∃y (xz = 1) ) )
R6. Distributive Law: ∀x∀y∀z R(x)∧R(y)∧R(z) → (x ⋅ (y + z) = (x ⋅ y) + (x ⋅ z))
Axiomatic First-Order Probability 55

R7. Total order: ∀x∀y∀z R(x)∧R(y)∧R(z) → (( x ≤ y ∨ y ≤ x ) ∧ (x ≤ y ∧ y ≤ x
→ x=y) ∧ (x≤y ∧ y≤z → x≤z))
R8. Agreement of ordering with field operations: : ∀x∀y∀z R(x)∧R(y)∧R(z) →
( (x≤y → x+z ≤ y+z) ∧ ((0 ≤ x ∧ 0 ≤ y) → 0 ≤ x ⋅ y) )
R9. First-order closure: The following axiom schema holds for all one-place
formulas ϕ(x): ∀x (ϕ(x) → R(x)) ∧ ∃x ϕ(x) ∧ ∃y (R(y) ∧ ∀x (ϕ(x) → x ≤ y) )
→ ∃y (R(y) ∧ ∀x (ϕ(x) → x ≤ y) ∧ ∀z (R(z) ∧ ∀x (ϕ(x) → x ≤ z) ↔ y ≤ z)
Axiom schema R9 is the first-order “image” of the least upper bound axiom. It
states that if ϕ(x) represents a non-empty subset of the real numbers and ϕ(x) has a
real upper bound, then ϕ(x) has a real least upper bound. Tarski [14] showed that the
theory of real closed real fields can be characterized as an ordered field in which
every element has a square root and every polynomial of odd degree has a root. R9
covers not only relations definable in the language of the real numbers, but also any
real relation definable in L*. Thus, the above axioms are stronger than the standard
RCF axioms.
Probability axioms. Good [4] stresses that probability is properly a two-place func-
tion P(E|H), taken to mean the probability that would be assigned to the proposition E
if the proposition H were known to be true. Good introduces the symbol H* to denote
“the usual assumptions of logic and pure mathematics,” which must be taken as given
in all probability assessments. He makes no attempt to decide exactly what should be
assumed as part of H*, and says it is conceivable that H* cannot be expressed in a
finite number of words. Because our concern is reasoning by computational agents,
we depart from Good and insist that the underlying assumptions be formalized as a
first-order axiomatic theory. We require that H* be expressed as a finite comp-
utational structure, with an effective procedure for generating the axioms explicitly,
and an effective procedure for checking whether any given sentence is an axiom. In
particular, we assume that H* includes the axioms N1-N5, R1-R9, and P1-P6 (below).
We introduce into L* the two-place function symbol P. The value of P represents
a meaningful probability whenever the following conditions are met (i) the first
argument of P is the Gödel number #σ of a sentence σ of L*; (ii) the second
argument of P is the Gödel number #ϕ(x) of a one-place open formula ϕ(x) defining a
relation representable in TA* = Cn(A*); and (iii) the relation represented by ϕ(x)
contains the Gödel numbers of all axioms in A*. The formula ϕ(x) is used to represent
the set of sentences whose Gödel numbers satisfy ϕ. P(#σ, #ϕ(x)) represents the
probability of σ, given that all sentences in the set represented by ϕ(x) are true.
Condition (iii) says that the domain axioms and probability axioms are taken as given.
For readability, we write P(σ | ϕ) for P(#σ, #ϕ(x)) and P(σ | τ, ϕ) for P(#σ, #ψ(x)),
where #ψ(x) represents the union of the relation defined by ϕ and {#τ}. That is, P(σ |
τ, ϕ) represents the likelihood of σ under the assumption that τ and all sentences in
the set represented by ϕ(x) are true.
With this preamble, we now present the probability axioms. The axioms are stated
informally for readability; stating them formally is straightforward. The axioms are
universally quantified over (Gödel numbers of) sentences σ and τ, and formulas ϕ.
P1. 0 ≤ P(σ | ϕ) ≤ 1.
P2. If A* ⊢ σ, then P(σ | A*) = 1.
56 K. Laskey

P3. If P(σ∧τ | ϕ) = 0, then P(σ∨τ | ϕ) = P(σ | ϕ) + P(τ | ϕ).
P4. P(σ∧τ | ϕ) = P(σ | τ, ϕ) × P(τ | ϕ)
P5. If σ↔τ, then P(σ | ϕ) = P(τ | ϕ), and P(γ | σ, A*) = P(γ | τ, A*) for all γ.
The first three axioms are the usual axioms for finitely additive probability. P4
formalizes Bayesian conditioning. P5 is taken from Good [4], and formalizes the
notion that logically equivalent propositions should be interchangeable with regard to
rational degrees of belief.
Some authors (including Good and de Finetti) regard finite additivity as sufficient
to formalize rational degrees of belief. Other authors consider countable additivity to
be essential. Because countable additivity is typically taken for granted in
applications, we regard it as essential. However, full formalization of countable
additivity is not possible within FOL, because FOL cannot express the notion of an
arbitrary infinite sequence of Gödel numbers. To formalize countable additivity, we
adopt a condition introduced by Gaifman [15]. Gaifman’s condition can be formalized
as a first-order axiom schema. Informally, it is stated as:
P6. P(∀x ψ(x) | ϕ) is the supremum of the values P(ψ(κ1)∨…∨ψ(κn) | ϕ), for all
finite conjunctions ψ(κ1)∨…∨ψ(κn) of sentences, formed by substituting
constant terms of L* into ψ(x).
The constants κi may be constants of the original language L, numerical constants
(0 or 1), or label constants (one of the Li). The label constants provide enough con-
stants to cover individuals that might not otherwise be referenced explicitly.

3.3 Terminology

The following definitions provide some necessary terminology.
Definition 1: Let L be a first-order language. A p-language L* for L is a language
that augments L with symbols for domain elements, real and natural number
arithmetic, and probability, as described in Section 3.1 above. A p-theory TA* for an
axiomatic theory TA in L augments the axioms A of TA as described in Section 3.2
above, to include: (i) axioms relativizing axioms in A to elements of the original
domain; (ii) axioms N1-N5, R-1R9, and P1-P6; and (iii) additional axioms defining a
domain-specific probabilistic theory. A p-theory TA* containing only N1-N5, RCF,
and P1-P6, with no domain axioms, is called the base p-theory for L.
Definition 2: Let L be a first-order language; let L* be a p-language for L. An
axiomatic theory TA* of L* is probabilistically complete if it assigns a unique prob-
ability P(σ | A*) to every sentence σ of L. That is, TA* is probabilistically complete if
for every sentence σ there is a unique real number pσ such that TA* |− P(σ | A*) = pσ.
A probabilistically complete p-theory assigns a single RCF element to each
sentence. Incomplete p-theories give rise to interval probabilities. Some writers have
advocated founding the theory of probability on interval rather than point-valued
probabilities (e.g., [4]). The possibility of incomplete p-theories is attractive when the
KB designer is not able to specify a probability for every sentence. With the advent of
first-order languages based on graphical probability models, it is now possible to
Axiomatic First-Order Probability 57

define probabilistically complete p-theories suitable for many interesting problems, to
develop workable knowledge engineering procedures for specifying p-theories, and to
devise tractable inference and learning algorithms for p-theories.
Definition 3: Let L be a first-order language; let T be a theory of L; and let L* be
a p-language for L. An axiomatic theory TA* of L* corresponds to T if T |− σ implies
TA* |− P(σ | A*) = 1 for any sentence σ of L. TA* corresponds strongly to T if T |− σ if
and only if TA* |− P(σ | A*) = 1.
Clearly, if the axioms A of TA are included among the axioms A* of TA*, then TA*
corresponds to TA. In general, augmenting A with N1-N5, RCF, and P1-P6, will not
determine a unique p-theory. If it is assumed that A incorporates all objective,
incontrovertibly true domain knowledge, then adding probabilistic axioms to com-
plete a p-theory brings subjectivity into the KB.
Of course, in actual applications, it is rarely the case that all logical axioms are
incontrovertibly true assertions. More realistically, some axioms will be highly
questionable; others, though quite useful, may be downright false. Axioms in real
KBs are carefully engineered to be “good enough for the task.” A great deal of
subjective judgment goes into developing a “good enough” KB. In short, the logical
axioms of a KB are often as subjective as the probabilities, and sometimes more so.
Definition 4: Let L be a first-order language, L* a p-language for L; and TA* a
probabilistically complete axiomatic theory of L*. Let ϕ(x,y) be a formula of L* that
functionally represents a recursive sequence of Gödel numbers of sentences of L*
(i.e., for each natural number n, there is exactly one sentence σn such that ϕ(n, #σn) is
provable from A*). We say the sequence σ1, σ2, … of sentences is negligible if for
every RCF element u > 0 there is a natural number n such that P(σ1 ∧ … ∧ σn | A*) <
u. The sequence σ1, σ2, … is certain if for every RCF element u > 0 there is a natural
number n such that P(σ1∧…∧σn | A*) > 1- u.
A negligible sequence is vanishingly improbable. That is, the probabilities of its
finite-length leading segments tend to zero as their lengths increase without bound.
Clearly, any sequence containing a zero-probability sentence is negligible. We can
define negligible or certain individual sentences or finite-length sequences of
sentences in the obvious way, by appending infinitely many copies of a tautology to
the end of the sequence. An individual sentence is certain if it has probability 1 and
negligible if it has probability zero.
Defintion 5: Let L be a first-order language; let L* be a p-language for L; let T*
be a theory of L*. The core of T* is the set of sentences {σ : σ is certain under T*}.

4 Semantics

This section defines semantics for p-theories.
Standard first-order semantics. The logic set forth in this paper is a standard,
untyped first-order logic. As such, it can be given standard first-order model-theoretic
semantics.
A first-order structure for a theory in L* is a pair (D, m), where D is a non-empty
set called the domain of interpretation, and m assigns to each function, constant, and
58 K. Laskey

relation symbol of L* a function, constant or relation of the correct arity on D. A
structure (D, m) implies a truth-value for each sentence of L*. (D, m) is called a
model of T* if every sentence in T* is true in (D, m). Models of T* are sometimes
called possible worlds for T*. A sentence σ is implied by A* if it is true in all models
of A*, and satisfiable if it has a model.
Standard semantics for p-theories gives rise to a seeming conflict between logical
truth and probabilistic certainty: there may be logically possible sentences that have
probability zero. For example, we might represent successive tosses of a symmetric
die as independent and identically distributed with probability 1/6 of landing on each
face. In a hypothetical infinite sequence of tosses, the frequency of tosses that land on,
say, the number 2 is certain to be 1/6. Any sequence of outcomes that does not have a
limiting frequency of 1/6 is negligible, in the sense of D4. Nevertheless, every
sequence of outcomes is logically possible. Standard first-order semantics cannot
distinguish between typical realizations of this probabilistic process (i.e., “random
looking” sequences with limiting frequency 1/6) and highly atypical realizations (e.g.,
sequences that have the incorrect limiting frequency, or exhibit some other unusual
regularity, such as a 2 on every sixth toss). Suppose the sentence σ1 asserts that the
limiting frequency is 1/6 that the die lands on 2; the sentence σ2 asserts that every toss
comes up 2; and the sentence σ3 asserts that a 2 occurs on the first two tosses. The
sentence σ1 has probability 1; σ2 has probability zero; and σ3 has probability 1/36.
None of these sentences is either implied by or inconsistent with the logical axioms.
Traditional first-order semantics seems to provide no way to differentiate among
them. For this reason, many authors have considered standard first-order semantics
inadequate for probabilistic theories, and have turned to alternative semantics.
Measure models. A common approach to giving semantics to probabilistic logics is
through a probability measure on structures. Using results from measure theory,
Gaifman [15] showed that a coherent probability assignment to quantifier-free
sentences can be extended to a countably additive probability measure on a σ-algebra
of subsets of {(D, m)}, the set of all structures on D = {L1, L2, …}.2 In measure
model semantics, a sentence is assigned probability equal to the measure of the set of
models of the sentence.
Just as traditional first-order semantics is defined in terms of set theory, measure
model semantics is defined in terms of measure theory. Measure theory is the branch
of real analysis used to formalize probability. In measure model semantics, it is
generally taken for granted that probabilities are interpreted as real numbers. As noted
above, the semantic condition that probabilities must be interpreted as real numbers
results in a non-axiomatizable logic that lacks a complete proof theory. On the other
hand, axiomatic set theory provides sufficient mathematical machinery to prove the
standard results of measure theory. If A* contains set theory axioms, then the results
of measure theory hold in all models of A*. Therefore, a probabilistically complete p-
theory that includes set theory axioms has a unique measure model. A probabil-
istically incomplete p-theory has a family of measure models, one for each

2
Gaifman’s domain of interpretation included the constants of the original language as well as
the added constants; the original constants are then self-interpreted. This requires that no two
constants be equal, an assumption we do not make.
Axiomatic First-Order Probability 59

distribution consistent with TA*. Any measure model assigns probability zero to the set
of models of negligible sequences.
Under measure model semantics, the logical and probabilistic aspects of a theory
remain semantically distinct. Provable sentences are true in all ordinary models of
TA*, and hence have probability 1 in any measure model of TA*. Unsatisfiable sen-
tences are false in all models and hence have probability zero in any measure model.
Other than P1-P6, probabilities for other sentences are unrestricted. In particular, a
sentence may provably have probability 1 and yet be false in some models.
Certainty restriction semantics. If a sentence has probability 1, then conditioning
on the sentence does not change its probability or the probability of any other
sentence. Furthermore, because there are only countably many sentences, we can
condition on all certain sentences – the core of the p-theory – without changing any
probabilities. We can use this fact to rule out negligible sentences as models of a p-
theory. Note that a p-theory TA* has enough logical machinery to define a provability
predicate. We can thus introduce an axiom schema that infers σ from TA* |− P(σ | A*)
= 1. Adding this axiom schema excludes provably negligible sentences as models of
TA*. We call this axiom schema the certainty restriction. Adding the certainty
restriction schema to TA* reduces the set of models of TA* without changing either the
probability of any sentence or any of the measure models consistent with TA*.
A rational agent makes no practical distinction between propositions with
probability one and those provable from its knowledge base. Many texts use limiting
frequencies (as well as other certain events) to define the meaning of probability
statements (e.g., that “fair die” means that the limiting frequency of tosses landing on
each face is 1/6). This suggests that the certainty restriction captures some aspect of
the intuitive semantics of probability as it is commonly applied and understood.
Strong probabilistic semantics. The certainty restriction and the conditioning
restriction can be formulated in first-order logic. This means that these conditions can
be imposed as satisfaction criteria for p-theories without any change to first-order
semantics. However, these conditions cannot capture the stronger semantic notion that
infinite-length negligible sequences should not occur in models of a probabilistic
theory. Strong probabilistic semantics requires that no model of a p-theory may
contain all sentences in a negligible sequence of sentences.
Each negligible sequence can be identified with an effectively null binary string, as
defined by Martin-Löf [6]. Martin-Löf randomness has been studied extensively (c.f.,
[7]), and is popular as a characterization of what it means for a sequence to be a
typical realization of a probability distribution on sequences. Dawid’s [8] calibration
criterion is closely related to Martin-Löf randomness: the set of uncalibrated
sequences for a given probability distribution is effectively null for that distribution.
If the axioms A* of our p-theory are strong enough to formalize measure theory,
then we can prove that the set of negligible sequences has probability zero under the
measure model for A*. Thus, negligible sequences can be excluded as models of A*
without changing any probabilities. However, excluding the negligible sentences as
models means abandoning traditional first-order semantics, because the proposition
that a sequence is negligible cannot be formalized as a recursive first-order axiom
schema. As a consequence, there is no complete proof system for probabilistic logic
with strong probabilistic semantics.
60 K. Laskey

Frequency probability. Some authors have argued that fundamentally different
kinds of probability are required to formalize different metaphysical notions such as
subjective degrees of belief, long-run frequencies, physical randomness, or
algorithmic randomness. Others argue that a single kind of probability is adequate for
all these metaphysical positions. Good ([4], [16]) and Barnett [17] discuss the
different viewpoints on this issue.
Because the same mathematics is applied to reason about all these kinds of
probability, and a proliferation of different logics complicates knowledge
representation and knowledge interchange, it seems reasonable to investigate whether
a single computational logic might be applicable to different notions of probability.
We have argued above that p-theories can represent subjective degrees of belief
about propositions that can neither be proven nor disproven. We argue that p-theories
can represent long-run frequencies and physical randomness. The basic idea derives
from a theorem of de Finetti [18] stating that an infinitely exchangeable sequence of
events is mathematically equivalent to one a frequentist or proponent of physical
propensity would model as independent and identically distributed (iid) given an
unknown parameter, together with a subjective probability distribution on values of
the parameter. To the frequentist, the parameter corresponds to the unknown long-run
frequency. To the propensity theorist, the parameter corresponds to the unknown
propensity. To the strict subjectivist, the parameter is a modeling fiction that provides
a parsimonious representation for an exchangeable sequence.
Infinitely exchangeable sequences in p-theories are represented as first-order
axiom schemas stating that different orderings of finite-length initial segments of a
sequence are equally probable. P-theories can also represent iid propositions with
unknown parameters. Incomplete p-theories can be used if no subjective distribution
is available (perhaps due to philosophical aversion to subjective probability) for the
unknown parameter. Thus, frequency and propensity probability as well as degree of
belief probability can be represented with p-theories.
Summary: Semantics for p-theories. With no change to standard first-order
semantics, a complete p-theory assigns probabilities consistently to sentences of L*
such that the axioms A* have probability 1. By including the certainty restriction
axiom schema in A*, we can identify probability 1 with provability from A*. If A*
includes set theory axioms, there is a unique measure model (probability measure
over models) for each complete p-theory. Requiring that all models of a p-theory be
non-negligible in the sense of Martin-Löf would take us out of the realm of first-order
model theory.

5 Learning and Dogmatism

An attractive feature of probability theory is its inbuilt support for learning from
observation. We can add any new non-negligible axiom to a p-theory, and Bayesian
conditioning can be used to obtain an updated p-theory with the evidence as an axiom.
As new observations accrue, we obtain a sequence of p-theories, each containing
additional axioms representing new information obtained since the previous p-theory
in the sequence.
Axiomatic First-Order Probability 61

Fundamental to the scientific attitude is lack of dogmatism. In our context, non-
dogmatism means assigning probability zero only to propositions known in-
controvertibly to be false. A dogmatic theory will be overturned if it makes definite
empirical predictions that turn out to be false. However, a theory can be dogmatic
without ever being proven false. For example, if a theory starts out certain that a coin
is fair, it can never learn that the coin is biased, no matter how many trials are
observed. A theory that allows for bias will eventually become convinced that a
biased coin is biased, even if it begins with a high likelihood that the coin is fair.
If we begin with an axiomatic theory TA of L, and assume that the axioms A
represent a set of sentences known incontrovertibly to be true (whether by definition
or by meticulous empirical observation), we would like to be able to represent a p-
theory that assigns probability zero to exactly those sentences that can be disproven
from A. We say such a p-theory corresponds non-dogmatically to TA. There are many
reasons, including tractability, convenience of specification, economy of commu-
nication, and the like, that we might choose to represent and reason with a dogmatic
theory, as long as it is judged “good enough” for the task at hand. But as a matter of
principle, we want the capability to represent a non-dogmatic theory, even if
reasoning with it is impractical.
A result of Gaifman and Snir [11] would seem to doom any hope of finding a non-
dogmatic theory. They proved that, under measure-model semantics, every axiomatiz-
able theory is dogmatic. On the other hand, Laskey [19] described how to specify a
non-dogmatic p-theory corresponding to any consistent, finitely axiomatizable first-
order theory. This apparent inconsistency is resolved by noting that Gaifman and Snir
assume that all true sentences of natural number arithmetic are base axioms of the
logical language, which is therefore not axiomatizable. Gaifman and Snir’s no-go
result does not apply to axiomatic first-order probability logics. An inevitable price of
the axiomatic first-order approach, implied by Tarski’s undefinability theorem, is that
any complete p-theory must assign probability intermediate between 0 and 1 to some
sentences in the language of arithmetic. This is natural if we view probability as a
degree of provability. No first-order axiom system can prove all true sentences of
arithmetic; therefore, no axiomatic first-order probability logic can assign probability
1 to all true sentences of arithmetic.

6 Conclusion

As probabilistic languages find increasing application to the Semantic Web (e.g.,
[20]), there is a need for a logical foundation that integrates traditional SW logics with
probability logic. The logic presented here formalizes probability as a standard
axiomatic first-order theory, with no alteration to traditional first-order model
theoretic semantics. Advantages of this approach are the ability to represent p-theories
as finite computational structures amenable to machine processing, the availability of
a complete proof system, and compatibility with semantics of traditional logic-based
languages. While this paper focuses on expressive power of the logic, SW
applications require tractability and scalability. Future work will consider tractable
restrictions of the logic, as well as fast approximate inference methods.
62 K. Laskey

Acknowledgments

Grateful acknowledgement is extended to anonymous reviewers who provided
helpful comments on previous drafts of some material contained in the present paper.

References

1. Jeffreys, H. Theory of Probability. (3rd ed) Oxford University Press (1961)
2. Savage, L.J. The Foundations of Statistics. Wiley, New York (1954)
3. Costa, P. C. G., Laskey, K.B. and Lukasiewicz, T. Uncertainty Representation and
Reasoning in the Semantic Web, in Semantic Web Engineering in the Knowledge Society,
Idea Information Science (2008)
4. Good, I. J. Probability and the Weighing of Evidence. London: Charles Griffin and Co
(1950)
5. Ng, K. S. and Lloyd, J. W. Probabilistic Reasoning in a Classical Logic. Journal of
Applied Logic, vol. 7, no. 2, pp. 218-238 (2009).
6. Martin-Lof, P. The Definition of Random Sequences. Information and Control vol. 9, pp.
602-619 (1966)
7. Li, M. and Vitányi, P. An Introduction to Kolmogorov Complexity and Its Applications
(2nd ed). Springer (1997)
8. Dawid, A. P. Calibration-Based Empirical Probability. The Annals of Statistics vol. 13,
no. 4, pp. 1251-1274 (1985)
9. Enderton, H.B. A Mathematical Introduction to Logic: Harcourt Academic Press (2001)
10. Halpern, J.Y. An Analysis of First-Order Logics of Probability. Artificial Intelligence, vol.
46, pp. 311-50 (1991)
11. Gaifman, H. and M. Snir. Probabilities over Rich Languages, Journal of Symbolic Logic,
vol. 47, pp. 495-548 (1982)
12. Bacchus, F. Representing and Reasoning with Probabilistic Knowledge: A Logical
Approach to Probabilities. Boston, MA, MIT Press (1990)
13. Shoenfeld, J. R. Mathematical Logic. Association for Symbolic Logic (1967)
14. Tarski, A. A Decision Method for Elementary Algebra and Geometry. Univ. of California
Press (1951)
15. Gaifman, H. Concerning Measures in First-Order Calculi. Israel Journal of Mathematics.
vol. 2, pp. 1-18 (1964)
16. Good, I.J. Good Thinking: The Foundations of Probability and Its Applications. University
of Minnesota Press (1983)
17. Barnett, V. Comparative Statistical Inference. (3rd ed.) Wiley, New York (1999)
18. de Finetti, Bruno. Theory of Probability: A Critical Introductory Treatment. Wiley, New
York (1974, originally published in 1934).
19. Laskey, K.B. MEBN: A Language for First-Order Bayesian Knowledge Bases. Artificial
Intelligence, vol. 172, no. 2-3, pp. 140-178 (2008)
20. Carvalho, R., Laskey, K.B., Costa, P., Ladeira, M., Santos, L., and Matsumoto, S.
UnBBayes: Modeling Uncertainty for Plausible Reasoning in the Semantic Web. In: V.
Kordic (ed.) Semantic Web, IN-TECH Publishing, ISBN: 978-953-7619-33-6 (in press)
An Algorithm for Learning with Probabilistic
Description Logics

José Eduardo Ochoa Luna and Fabio Gagliardi Cozman

Escola Politécnica, Universidade de São Paulo,
Av. Prof. Mello Morais 2231, São Paulo - SP, Brazil
eduardo.ol@gmail.com, fgcozman@usp.br

Abstract. Probabilistic Description Logics are the basis of ontologies
in the Semantic Web. Knowledge representation and reasoning for these
logics have been extensively explored in the last years; less attention has
been paid to techniques that learn ontologies from data. In this paper
we report on algorithms that learn probabilistic concepts and roles. We
present an initial effort towards semi-automated learning using proba-
bilistic methods. We combine ILP (Inductive Logic Programming) meth-
ods and a probabilistic classifier algorithm (search for candidate hypothe-
ses is conducted by a Noisy-OR classifier). Preliminary results on a real
world dataset are presented.

1 Introduction

Ontologies are key components of the Semantic Web, and among the formalisms
proposed within Knowledge Engineering, the most popular ones at the moment
are based on Description Logics (DLs) [1]. There are however relatively few
ontologies available, and on very few subjects. Moreover, building an ontology
from scratch can be a very burdensome and difficult task; very often two domain
experts design rather different ontologies for the same domain [2]. Considerable
effort is now invested into developing automated means for the acquisition of
ontologies. Most of the currently pursued approaches do not use the expressive
power of languages such as OWL, and are only capable of learning ontologies of
restricted form, such as taxonomic hierarchies [3].
It is therefore natural to try to combine logic-based and probabilistic ap-
proaches to machine learning for automated ontology acquisition. Inspired by
the success of Inductive Logic Programming (ILP) [4] and statistical machine
learning, in this paper we describe methods that learn ontologies in the recently
proposed Probabilistic Description Logic crALC [5]. In using statistical methods
we wish to cope with the uncertainty that is inherent to real-world knowledge
bases, where we commonly deal with biased, noisy or incomplete data. Moreover,
many interesting research problems, such as ontology alignment and collective
classification, require probabilistic inference over evidence.
Probabilistic Description Logics are closely related to Probabilistic Logics
that have been extensively researched in the last decades [6, 7]. Some logics [8]
64 J. E. Ochoa Luna and F. G. Cozman

admit inference based on linear programming, while other resort to indepen-
dence assumptions and graph-theoretical models akin to Bayesian and Markov
networks. Despite the potential applicability of Probabilistic Description Logics,
learning ontologies expressed in these logics is a topic that has not received due
attention. In principle, it might be argued that the same methods for learning
probabilistic logics might be applied, with proper account of differences in syntax
and semantics. In this paper we follow this path and report on some similarities
and differences that may be of interest.
The semantics of the Probabilistic Description Logic crALC [5] is based on
measures over interpretations and on assumptions of independence. Scalability
issues for inference in crALC have been addressed so that we can run inference
on medium size domains [9]. There are two learning tasks that deserve atten-
tion. First, learning probability values, perhaps through a maximum likelihood
estimation method. We use this technique and, due to uncertainty in Semantic
Web datasets, we employ the EM algorithm. The second task is learning logical
constructs, where we are interested in finding a set of concepts and roles that
best fit examples and where probabilistic assessments can be assigned. In ILP al-
gorithms such as FOIL [10], one commonly relies on a cover function to evaluate
candidate hypotheses. We approach learning concepts as a classification task,
and based on an efficient probabilistic Noisy-OR classifier [11, 12], we guide the
search among candidate structures.
Section 2 reviews key concepts useful to the remainder of the paper. In Sec-
tion 3, algorithms for learning crALC ontologies are introduced. Once these
algorithms have formally stated, we wish to explore semi-automated reasoning
from a real world dataset — the Lattes curriculum platform. A first attempt at
constructing a probabilistic ontology using this dataset is reported in Section 4.

2 Background
Assume we are provided with a repository of HTML pages where researchers and
students have stored data about publications, courses, languages, and further
relational data. In order to structure such knowledge we might choose to use
ontologies. We may extract concepts such as Researcher and Person, and we
may establish relationships such as ⊑ among them. These concepts are often
expressed using Description Logics (Section 2.1). Suppose we are not able to
precisely state membership relations among concepts, but instead we can give
probabilistic assessments such as P (Student|Researcher) = 0.4. Such assessments
are encoded in Probabilistic Description Logics such as crALC (Section 2.2).
Suppose further that we look for automated means to learn ontologies given
assertions on concepts such as Student(jane); this task is commonly tackled by
Description Logic Learning algorithms (Section 2.3).

2.1 Description Logics
Description Logics (DLs) form a family of representation languages that are typ-
ically decidable fragments of First Order Logic (FOL) with particular seman-
An Algorithm for Learning with Probabilistic Description Logics 65

tics [13, 14]. DLs represent knowledge in terms of objects, concepts, and roles.
Each concept in the set of concepts NC = {C, D, . . .} is interpreted as a subset
of a domain (a set of objects). Each role in the set of roles NR = {r, s, . . .} is
interpreted as a binary relation on the domain. Individuals represent the objects
through names from the set of names NI = {a, b, . . .}. Information is stored in
a knowledge base divided in (at least) two parts: the TBox (terminology) and
the ABox (assertions). The TBox describes the terminology by listing concepts
and roles and their relationships. The ABox contains assertions about objects.
Complex concepts are built using atomic concepts, roles and constructors.
Depending on the constructors involved one can obtain different expressive power
and decidability properties. The semantics of a description is given by a domain
∆ and an interpretation, that is a functor ·I . We refer to [13] for further back-
ground on Description Logics.
One of the central ideas in DL is subsumption [14]: Given two concepts de-
scriptions C and D in T , C subsumes D denoted by C ⊒ D, iff for every inter-
pretation I of T it holds that CI ⊇ DI . Also, C ≡ D amounts to C ⊒ D and
D ⊒ C.
Subsumption is a useful inference mechanism that allow us to perform stan-
dard reasoning tasks such as instance checking and concept retrieval. Instance
checking is valuable for our ILP methods because it amounts to produce class-
membership assertions: K |= C(a), where K is the knowledge base, a is an individ-
ual name and C is a concept definition given in terms of the concepts accounted
for in K.

2.2 The Logic crALC
The logics mentioned in Section 2.1 do not handle uncertainty through prob-
abilities. It might be interesting to assign probabilities to assertions, concepts,
roles; the Probabilistic Description Logic crALC does just that.
crALC is a probabilistic extension of the DL ALC. The following construc-
tors are available in ALC: conjunction (C ⊓ D), disjunction C ⊔ D, negation (¬C),
existential restriction (∃r.C), and value restriction (∀r.C). Concepts inclusions
and definitions are allowed and denoted by C ⊑ D and C ≡ D, where C is a
concept name. The semantics is given by a domain D and an interpretation I.
A set of concept inclusions and definitions is a terminology. A terminology is
acyclic if it is a set of concept inclusions/definitions such that no concept in the
terminology uses itself.
A key concept in crALC is probabilistic inclusion, denoted by P (C|D) = α,
where D is a concept and C is a concept name. If the interpretation of D is the
whole domain, then we simply write P (C) = α. We are interested in comput-
ing a query P (Ao (a0 )|A) for an ABox A = {Aj (aj )}M j=1 (this is an inference).
Assume also that C in role restrictions ∃r.C and ∀r.C is a concept name. As prob-
abilistic inclusions must only have concept names in their conditioned concept,
assessments such as P (∀r.C|D) = α or P (∃r.C|D) = α are not allowed.
We assume that every terminology is acyclic; this assumption allows one to
draw any terminology T as a directed acyclic graph G(T ): each concept name
66 J. E. Ochoa Luna and F. G. Cozman

is a node, and if a concept C directly uses concept D, then D is a parent of C
in G(T ). Each existential and value restriction is added to the graph G(T ). As
each one of these restrictions directly uses r and C, the graph must contain a
node for each role r, and an edge from r to each restriction directly using it. Each
restriction node is a deterministic node in that its value is completely determined
by its parents.
The semantics of crALC is based on probability measures over the space of
interpretations, for a fixed domain. Inferences can be computed by a first order
loopy propagation algorithm that has been shown to produce good approxima-
tions for medium size domains [9].

2.3 Inductive Logic Programming and Description Logic Learning
ILP is a research field at the intersection of machine learning [15] and logic
programming [16]. It aims at a formal framework as well as practical algorithms
for inductively learning relational descriptions from examples and background
knowledge. Learning is commonly regarded as a search problem [17]; indeed, in
ILP there is a space of candidate solutions, the set of “well formed” hypotheses
H, and an acceptance criterion characterizing solutions. In concept-learning and
ILP the search space is typically structured by means of the dual notions of
generalization and specialization.
A significant algorithm for ILP is FOIL [10]. This algorithm moves from an
explicit representation of the target relation (as a set of tuples of a particular
collection of constants) to a more general, functional definition that might be
applied to different constants. For a particular target relation, FOIL finds clauses
in FOL one at a time, removing tuples explained by the current clause before
looking for the next through a cover method. FOIL uses an information-based
heuristic to guide its search for simple, general clauses. Because of its simplicity
and computational efficiency, we have chosen to develop a covered approach
when learning Probabilistic Description Logics.
There have been notable efforts to learn ontologies in Description Logics;
some of these previous results have directly inspired our work. As noted by
Fanizzi et al [14], early work on learning in DLs essentially focused on demon-
strating the PAC-learnability for various languages derived from CLASSIC.
Many approaches to the problem of learning concept definitions in DL formalisms
can be classified in two categories [2]: in one category the problem is approached
by translating it to another formalism in which concept learning has already
been investigated, while in the other category the problem is approached in the
original formalism.
One example of the first approach can be found in the work of Kietz [18],
where the hybrid language CARIN-ALN is used [19]. This language combines a
complete structural subsumption service in a DL with Horn logic, where terms
are individuals in the domain. Likewise, in the AL-log framework [20], DATA-
LOG clauses are combined with ALC constructs. In the same direction, DL+log
[21] allows for the tight integration of DLs and DATALOG. Arguably, the decid-
able knowledge representation framework SHIQ+log [1], is the most powerful
An Algorithm for Learning with Probabilistic Description Logics 67

among the ones currently available for the integration of DLs and clausal logic.
First, it relies on the very expressive DL SHIQ. Second, it allows for inducing a
definition for DL concepts thus having ontology elements not only as input but
also as output of the learning process.
The problem of translation turns out to be similar to an ILP problem. There
are two issues to address: incompatibilities between DLs and Horn Logic, and
the fact that the OWA1 is used in DLs.
The other approach, solving the learning problem in the original formalism,
can be found in the work of Cohen and Hirsh [22], which uses a pure DL-based
approach for concept learning, in this case on the CLASSIC DL language. In
these algorithms, ILP has been a significant influence, as refinement operators
have been extensively explored. Badea and Nienhuys-Cheng [23] suggest a re-
finement operator for the ALER description logic. They also investigate some
theoretical properties of refinement operators that favour the use of a downward
refinement operator to enable a top-down search.
Learning algorithms for DLs (in particular for the language ALC) were cre-
ated by Iannone et al [2] that also make use of refinement operators. Instead
of using the classical approach of combining refinement operators with a search
heuristic, they developed an example driven learning method. The language,
called YINYANG, requires lifting the instances to the concept level through a
suitable approximate operator (most specific concepts MSCs) and then start
learning from such extremely specific concept descriptions. A problem of these
algorithms is that they tend to produce unnecessarily long concepts. One reason
is that MSCs for ALC and more expressive languages do not exist and hence
can only be approximated.
These disadvantages have been partly mitigated in the work of Lehmann
[24], where approximations are not needed because it is essentially based on a
genetic programming procedure lying on refinement operators whose fitness is
computed on the grounds of the covered instances. In the DL-LEARNER system
[3] further refinement operators and heuristics have been developed for the ALC
logic.
The DL-FOIL system [14] is a new DL version of the FOIL [10] algorithm,
that is adapted to learning the DL representations supporting the OWL-DL
language. The main components of this new system are represented by a set of
refinement operators borrowed from other similar systems [2, 3] and by a differ-
ent gain function (proposed in FOIL-I [25]) which must take into account the
OWA inherent to DLs. In DL-FOIL, like in the original FOIL algorithm, the
generalization routine computes (partial) generalizations as long as they do not
cover any negative example. If this occurs, the specialization routine is invoked
for solving these sub-problems. This routine applies the idea of specializing us-
ing the (incomplete) refinement operator. The specialization continues until no
negative example is covered (or a limited number of them).

1
Open World Assumption mantains that an object that cannot be proved to belong
to a certain concept is not necessarily a counterexample for that concept [14].
68 J. E. Ochoa Luna and F. G. Cozman

3 Probabilistic Description Logic Learning
In this section, we focus on learning DL axioms and probabilities tailored to
crALC. To learn the terminology component we are inspired by Probabilistic
ILP methods and thus we follow generic syntax and semantics given in [16]. The
generic supervised concept learning task is devoted to finding axioms that best
represent assertions positive (covered) and negatives, in a probabilistic setting
this cover relation is given by:

Definition 1. (Probabilistic Covers Relation) A probabilistic covers rela-
tion takes as arguments an example e, a hypothesis H and possibly the back-
ground theory B, and returns the probability value P (e|H, B) between 0 and 1 of
the example e given H and B, i.e., covers(e, H, B) = P (e|H, B).

Given Definition 1 we can define the Probabilistic DL learning problem as
follows [16]:
Definition 2. (The Probabilistic DL Learning Problem) Given a set E =
Ep ∪ Ei of observed and unobserved examples Ep and Ei (with Ep ∩ Ei = ∅) over
the language LE , a probabilistic covers relation covers(e, H, B) = P (e|H, B), a
logical language LH for hypotheses, and a background theory B, find a hypothesis
H ∗ such that H ∗ = arg maxH score(E, H, B) and the following constraints hold:
∀ep ∈ Ep : covers(ep , H ∗ , B) > 0 and ∀ei ∈ Ei : covers(ei , H ∗ , B) = 0. The
score is some objective function, usually involving the probabilistic
Q covers relation
of the observed examples such as the observed likelihood ep ∈Ep covers(ep , H ∗ , B)
or some penalized variant thereof.
Negative examples conflict with the usual view on learning examples in sta-
tistical learning. Therefore, when we speak of positive and negative examples we
are referring to observed and observed ones.
As we focus in crALC, B = K = (T , A), and given a target concept C,
E = Ind+ −
C (A) ∪ IndC (A) ⊒ Ind(A), are positive and negative examples or
individuals. For instance, candidate hypotheses can be given by C ⊒ H1 , . . . , Hk ,
where H1 = B ⊓ ∃D.⊤, H2 = A ⊔ E, . . ..
We assume each candidate hypothesis together with the example e for the tar-
get concept as being a probabilistic variable or feature in a probabilistic model2 ;
according to available examples, each candidate hypothesis turns out to be true,
false or unknown whether result for instance checking C(a) on K, Ind(A) is re-
spectively true, false or unknown. The learning task is restricted to finding a
probabilistic classifier for the target concept.
A suitable framework for this probabilistic setting is the Noisy-OR classifier,
a probabilistic model within the Bayesian networks classifiers commonly referred
to as models of independence of clausal independence (ICI) [12]. In a Noisy-OR
classifier we aim at learning a class C given a large number of attributes.
As a rule, in an ICI classifier for each attribute variable Aj , j = 1, . . . , k
(A denotes the multidimensional variable (A1 , . . . , Ak ) and a = (a1 , . . . , ak )
2
A similar assumption is adopted in the nFOIL algorithm [26].
An Algorithm for Learning with Probabilistic Description Logics 69

its states) we have one child A′j that has assigned a conditional probability
distribution P (A′j |Aj ). Variables A′j , j = 1, . . . , k are parents of probability of
the class variable C. PM (C|A′ ) represents a deterministic function f that assigns
to each combination of values (a′1 , . . . , a′k ) a class c. A generic ICI classifier is
illustrated in Figure 1 .

6Z
}
Z
Z
Z

Z
A′1 A′2 ... A′k

6 6 6

A1 A2 ... Ak

Fig. 1. ICI models [12].

The probability distribution of this model is given by [12]:
k
Y
PM (c, a′ , a) = PM (c|a′ ) PM (a′j |aj ) · PM (aj ),
j=1

where the conditional probability PM (c|a′ ) is one if c = f (a′ ) and zero otherwise.
The Noisy-OR model is an ICI model where f is the OR function:

PM (C = 0|A′ = 0) = 1 and PM (C = 0|A′ 6= 0) = 0.

The joint probability distribution of the Noisy-OR model is
k
!
Y
PM (·) = PM (C|A′1 , . . . , A′k ) · PM (A′j |Aj ) · PM (Aj ) .
j=1

It follows that
Y
PM (C = 0|A = a) = PM (A′j = 0|Aj = aj ), (1)
j
Y
PM (C = 1|A = a) = 1 − PM (A′j = 0|Aj = aj ). (2)
j

Using a threshold 0 ≤ t ≤ 1 all data vectors a = (a1 . . . , ak ) such that
PM (C = 0|A = a) < t are classified to class C = 1.
70 J. E. Ochoa Luna and F. G. Cozman

The Noisy-OR classifier has the following semantics. If an attribute Aj is in a
state aj then the instance (a1 , . . . , aj , . . . , ak ) is classified as C = 1 unless there
is an inhibitory effect, with probability PM (A′j = 0|Aj = aj ). All inhibitory
effects are assumed to be independent. Therefore the probability that an in-
stance
Q does not belong to class C (C = 0), is a product of all inhibitory effects
′
j PM (A j = 0|Aj = aj ). For learning this classifier the EM-algorithm has been
proposed [12]. The algorithm is directly applicable to any ICI model; in fact, an
efficient implementation resort to a transformation of an ICI model using a hid-
den variable (further details in [12]). We now shortly review the EM-algorithm
tailored to Noisy-OR combination functions.
Every iteration of the EM-algorithm consists of two steps: the expectation
step (E-step) and maximization step (M-step). In a transformed decompos-
able model the E-step corresponds to computing the expected marginal count
n(A′l , Al ) given data D = {e1 , . . . , en } (ei = {ci , ai } = {ci , ai1 , . . . , aik }) and
model M:
n
X
n(A′l , Al ) = PM (A′l , Al |ei ) for all l = 1, . . . , k
i=1

where for each (a′l , al )

PM (A′l = a′l |ei ) if al = ail ,
PM (A′l = a′l , Al = al |ei ) =
0 otherwise.

Assume a Noisy-Or classifier PM and an evidence C = c, A = a. The updated
probabilities (the E-step) of A′l for l = 1, . . . , k can be computed as follows [12]:

PM (A′l = a′l |C = c, a)


 1 if c = 0 and a′l = 0,

 ′

 0 ! if c = 0 and al = 1,
= 1 ′
Q ′

 z · PM (Al = 0|Al = al ) − j PM (Aj = 0|Aj = aj ) if c = 1 and a′l = 0,



1 ′
z · PM (Al = 1|Al = al ) if c = 1 and a′l = 1,

where z is a normalization constant. The maximization step corresponds to set-
ting
∗ n(A′l , Al )
PM (A′l |Al ) = , for all l = 1 . . . , k.
n(Al )
Given the Noisy-OR classifier, the complete learning algorithm is described
in Figure 2, where λ denotes the maximum likelihood parameters. We have used
the refinement operators introduced in [3] and the Pellet reasoner3 for instance
checking. It may happen that during learning a given example for a candidate
hypothesis Hi cannot be proved to belong to the target concept. This is not
necessarily a counterexample for that concept. In this case, we can make use of
3
http://clarkparsia.com/pellet/.
An Algorithm for Learning with Probabilistic Description Logics 71

Input: a target concept C, background knowledge K = (T , A), a training set E =
Ind+ −
C (A) ∪ IndC (A) ⊆ Ind(A) containing assertions on concept C.
Output: induced concept definition C.

Repeat
Initialize C′ = ⊥
Compute hypotheses C′ ⊒ H1 , . . . , Hn based on refinement operators for ALC
logic
Let h1 , . . . , hn be features of the probabilistic Noisy-OR classifier, apply the EM
algorithm
For all hi Q
Compute score ep ∈Ep covers(ep , hi , B)
Let h′ the hypothesis with the best score
According to h′ add H ′ to C
Until score({h1 , . . . , hi }, λi , E) > score({h1 , . . . , hi+1 }, λi+1 , E)

Fig. 2. Complete learning algorithm.

the EM algorithm of the Noisy-OR classifier to estimate the class ascribed to
the instance.
In order to learn probabilities associated to terminologies obtained for the
former algorithm we commonly resort to the EM algorithm. In this sense, we
are influenced in several respects from approaches given in [16].

4 Preliminary Results
To demonstrate feasibility of our proposal, we have run preliminary tests on
relational data extracted from the Lattes curriculum platform, the Brazilian
government scientific repository4 . The Lattes platform is a public source of re-
lational data about scientific research, containing data on several thousand re-
searchers and students. Because the available format is encoded in HTML, we
have implemented a semi-automated procedure to extract content. A restricted
database has been constructed based on randomly selected documents. We have
performed learning of axioms based on elicited asserted concepts and roles, fur-
ther probabilistic inclusions have been added according to the crALC syntax.
Figure 3 illustrates the network generated for a domain of size 2.
For instance, to properly identify a professor, the following concept descrip-
tion has been learned:
Professor ≡ Person
⊓(∃hasPublication.Publication ⊔ ∃advises.Person ⊔ ∃worksAt.Organization)

When Person(0) 5 is given by evidence, the probability value P (Professor(0)) =
0.68 (we have considered a large number of professors in our experiments), as
4
http://lattes.cnpq.br.
5
Indexes 0, 1 . . . n represent individuals from a given domain.
72 J. E. Ochoa Luna and F. G. Cozman

Fig. 3. Relational Bayesian network for the Lattes curriculum dataset.

further evidence is given the probability value changes to:
P (Professor(0)|∃hasPublication(1)) = 0.72,
and
P (Professor(0)|∃hasPublication(1) ⊔ ∃advises(1)) = 0.75.
The former concept definition can conflict with standard ILP approaches,
where a more suitable definition might be mostly based on conjuntions. In con-
trast, in this particular setting, the probabilistic logic approach has a nice and
flexible behavior. However, it is worth noting that terminological constructs ba-
sically rely on the refinement operator used during learning.
Another query, linked to relational classification, allows us to prevent dupli-
cate publications. One can be interested in retrieving the number of publications
for a given research group. Whereas this task might seem trivial, difficulties arise
mainly due to multi-authored documents. In principle, each co-author would
have a different entry for the same publication in the Lattes platform, and it
must be emphasized that each entry is be prone to contain errors. In this sense,
a probabilistic concept for duplicate publications was learned:
DuplicatePublication ≡ Publication
⊓(∃hasSimilarTitle.Publication ⊔ ∃hasSameYear.Publication
⊔hasSameType.Publication))
It clearly states that a duplicate publication is related to publications that
share similar title6 , same year and type (journal article, chapter book and so on).
At first, the prior probability is low: P (DuplicatePublication(0)) = 0.05. Evidence
on title similarity increases considerably the probability value:
P (DuplicatePublication(0)|∃hasSimilarTitle(0, 1)) = 0.77.
6
Similarity was carried out by applying a “LIKE” database operator on titles.
An Algorithm for Learning with Probabilistic Description Logics 73

Further evidence on type almost guarantees a duplicate concept:

P (DuplicatePublication(0)|∃hasSimilarName(1) ⊓ ∃hasSameType(1)) = 0.99.

It must be noted that title similarity does not guarantee a duplicate document.
Two documents can share the same title (same author), but nothing prevents
them from being published on different means (for instance, a congress paper
and an extended journal article). Probabilistic reasoning is valuable to deal with
such issues.

5 Conclusion
In this paper we have presented algorithms that perform learning of both prob-
abilities and logical constructs from relational data for the recently proposed
Probabilistic DL crALC. Learning of parameters is tackled by the EM algo-
rithm whereas structure learning is conducted by a combined approach relying
on statistical and ILP methods. We approach learning of concepts as a classifi-
cation task; a Noisy-OR classifier has been accordingly adapted to do so.
Preliminary results have focused on learning a probabilistic terminology from
a real-world domain — the Brazilian scientific repository. Probabilistic logic
queries have been posed on the induced model; experiments suggest that our
methods are suitable for learning ontologies in the Semantic Web.
Our planned future work is to investigate the scalability of our learning meth-
ods.

Acknowledgements
The first author is supported by CAPES. The second author is partially sup-
ported by CNPq. The work reported here has received substantial support
through FAPESP grant 2008/03995-5.

References
1. Lisi, F.A., Esposito, F.: Foundations of onto-relational learning. In: ILP ’08:
Proceedings of the 18th International Conference on Inductive Logic Programming,
Berlin, Heidelberg, Springer-Verlag (2008) 158–175
2. Iannone, L., Palmisano, I., Fanizzi, N.: An algorithm based on counterfactuals for
concept learning in the semantic web. Applied Intelligence 26(2) (2007) 139–159
3. Lehmann, J., Hitzler, P.: A refinement operator based learning algorithm for the
ALC description logic. In Blockeel, H., Shavlik, J.W., Tadepalli, P., eds.: ILP ’08:
Proceedings of the 17th International Conference on Inductive Logic Programming.
Volume 4894 of Lecture Notes in Computer Science., Springer (2008) 147–160
4. Muggleton, S.: Inductive Logic Programming. McGraw-Hill, New York (1992)
5. Cozman, F.G., Polastro, R.B.: Loopy propagation in a probabilistic description
logic. In: SUM ’08: Proceedings of the 2nd International Conference on Scalable
Uncertainty Management, Berlin, Heidelberg, Springer-Verlag (2008) 120–133
74 J. E. Ochoa Luna and F. G. Cozman

6. Getoor, L., Taskar, B.: Introduction to Statistical Relational Learning (Adaptive
Computation and Machine Learning). The MIT Press (2007)
7. de Campos, C.P., Cozman, F.G., Ochoa-Luna, J.E.: Assembling a consistent set
of sentences in relational probabilistic logic with stochastic independence. Journal
of Applied Logic 7 (2009) 137–154
8. Hailperin, T.: Sentential Probability Logic. Lehigh University Press, Bethlehem,
United States (1996)
9. Cozman, F.G., Polastro, R.: Complexity analysis and variational inference for
interpretation-based probabilistic description logics. In: Conference on Uncertainty
in Artificial Intelligence. (2009)
10. Quinlan, J.R., Mostow, J.: Learning logical definitions from relations. In: Machine
Learning. (1990) 239–266
11. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference. Morgan Kaufmann, San Mateo (1998)
12. Vomlel, J.: Noisy-OR classifier: Research articles. Int. J. Intell. Syst. 21(3) (2006)
381–398
13. Baader, F., Nutt, W.: Basic description logics. In: Description Logic Handbook.
Cambridge University Press (2002) 47–100
14. Fanizzi, N., D’Amato, C., Esposito, F.: DL-FOIL concept learning in description
logics. In: ILP ’08: Proceedings of the 18th International Conference on Inductive
Logic Programming, Berlin, Heidelberg, Springer-Verlag (2008) 107–121
15. Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)
16. Raedt, L.D., Kersting, K.: Probabilistic inductive logic programming. In: Proba-
bilistic ILP - LNAI 4911. Springer-Verlag Berlin (2008) 1–27
17. Muggleton, S., Raedt, L.D.: Inductive logic programming: Theory and methods.
Journal of Logic Programming 19-20 (1994) 629–679
18. Kietz, J.U.: Learnability of description logic programs. In: Inductive Logic Pro-
gramming, Springer (2002) 117–132
19. Rouveirol, C., Ventos, V.: Towards learning in CARIN-ALN . In: ILP ’00: Pro-
ceedings of the 10th International Conference on Inductive Logic Programming,
London, UK, Springer-Verlag (2000) 191–208
20. Donini, F.M., Lenzerini, M., Nardi, D., Schaerf, A.: AL-log: integrating Datalog
and description logics. Journal of Intelligent and Cooperative Information Systems
10 (1998) 227–252
21. Rosati, R.: DL+log: Tight integration of description logics and disjunctive datalog.
In: KR. (2006) 68–78
22. Cohen, W., Hirsh, H.: Learning the CLASSIC description logic: Theoretical and
experimental results. In: (KR94): Principles of Knowledge Representation and
Reasoning: Proceedings of the Fourth International Conference, Morgan Kaufmann
(1994) 121–133
23. Badea, L., Nienhuys-Cheng, S.H.: A refinement operator for description logics.
In: ILP ’00: Proceedings of the 10th International Conference on Inductive Logic
Programming, London, UK, Springer-Verlag (2000) 40–59
24. Lehmann, J.: Hybrid learning of ontology classes. In: Proceedings of the 5th
International Conference on Machine Learning and Data Mining. Volume 4571 of
Lecture Notes in Computer Science., Springer (2007) 883–898
25. Inuzuka, N., Kamo, M., Ishii, N., Seki, H., Itoh, H.: Top-down induction of logic
programs from incomplete samples. In: ILP ’96 : 6th International Workshop.
Volume 1314 of LNAI., SV (1997) 265–284
26. Landwehr, N., Kersting, K., DeRaedt, L.: Integrating Naı̈ve Bayes and FOIL. J.
Mach. Learn. Res. 8 (2007) 481–507
Position Papers
BeliefOWL: An Evidential Representation in
OWL Ontology

Amira Essaid1 and Boutheina Ben Yaghlane2
1
LARODEC Laboratory, Institut Supérieur de Gestion de Tunis
essaid amira@yahoo.fr
2
LARODEC Laboratory, Institut des Hautes Etudes Commerciales de Carthage
boutheina.yaghlane@ihec.rnu.tn

Abstract. The OWL is a language for representing ontologies but it
is unable to capture the uncertainty about the concepts for a domain.
To address the problem of representing uncertainty, we propose in this
paper, the theoretical aspects of our tool BeliefOWL which is based on
evidential approach. It focuses on translating an ontology into a directed
evidential network by applying a set of structural translation rules. Once
the network is constructed, belief masses will be assigned to the different
nodes in order to propagate uncertainties later.

1 Introduction

Many ontology definition languages have been developed to define ontologies in a
formal way. Among them the OWL 3 which is based on crisp logic. This language
suffers from its lack to represent real domains containing incomplete knowledge
or uncertain information. To overcome this, an extension of the OWL seems
to be a convenient solution. Many researches find this extension important and
try to propose approaches for handling uncertainty in ontology field. For that
purpose, two main mathematical theories have been applied: the probability
theory ([2],[7]) and the fuzzy sets theory ([4],[6]).
However not all the problems of uncertainty lend themselves to one of these
theories. We can find ourselves faced to situations where we are called to repre-
sent the total ignorance or the partial one about information concerning classes.
This can be resolved by applying the Dempster-Shafer theory [5]. At this stage,
we are interested to use this theory and especially we are encouraged to work
with the directed evidential networks [1] which are viewed as effective and appro-
priate graphical representation for uncertain knowledge. Adding to that, the use
of conditional belief functions provides a well representation of the uncertainty
in the relationships among the variables of a graph.
In this position paper we present our tool BeliefOWL as an approach for
extending an OWL ontology with belief functions as well as the translation of
this ontology into an evidential network.
3
http://www.w3.org/2001/sw/webOnt
78 A. Essaid and B. Ben Yaghlane

2 Uncertainty in OWL

The OWL is an expressive language for representing classes and the relations
between them for a domain of discourse. However the source of information
itself can suffer from giving a sufficient information of a concept. Sometimes we
can find ourselves unable to express the exact relation existing between classes
because of an incomplete knowledge about the domain of discourse or missed
values. Uncertainty extension to the OWL is starting to know a considerable
focus during the last years.
To cope with uncertain information in OWL extension, we propose the use
of the Dempster-Shafer theory [5]. In fact this theory allows assigning beliefs
not only to a single element but to a set of elements. Furthermore, it gives
the experts the possibility to represent the total ignorance or the partial one
about information concerning the classes of an ontology and the relations that
may exist between them. Besides, this theory provides a method for combining
several pieces of evidence from different sources to establish a new belief by using
Dempster’s rule of combination.
One of our goal is to translate an OWL taxonomy into a directed evidential
network (DEVN). The DEVN is a model introduced in [1] to represent knowledge
under uncertainty by using the belief functions. It is defined as a directed acyclic
graph (DAG) where the nodes represent variables and the directed arcs linking
nodes describe conditional dependence relations between these variables. These
relations are expressed by conditional belief functions for each variable given
its parents. Two kinds of belief functions are depicted to represent uncertainty
in the DEVN: the prior belief function and the conditional belief function. The
former concerns the root node and the latter expresses the belief function of a
node given the value taken by its parents.

3 Presentation of the BeliefOWL

The figure 1 resumes the different steps followed leading to our tool. In fact the
BeliefOWL has as input an OWL ontology and as output a directed evidential
network (DEVN).

Step 1: A Belief Extension to OWL: An OWL ontology can define
classes, properties and individuals. In this paper we will focus on attributing
belief masses to the different classes of an OWL taxonomy. For this purpose,
we define some new classes able to represent and to introduce this uncertain
information.

– Prior evidence: We define two classes to express the prior evidence and. The former is used to enumerate the dif-
ferent masses related to the different classes of an OWL taxonomy. It has an
object property that specifies the relation between classes
BeliefOWL: An Evidential Representation in OWL Ontology 79

Step1:
Belief Extension to OWL

Evidential ontology
OWL ontology

m(A)
Animal
Animal

m[A](F)
m[A](Ml) m[A](H)

Male Human Female
Step 3: Male Human
Female

Step 2: Conditional Belief masses m[F](W)
m[Ml](M)
Structural Translation Woman
attribution NodeUnion
Woman
Man NodeUnion Man

NodeInter_2
NodeInter_2
NodeDisjoint NodeInter_1 NodeDisjoint
NodeInter_1

DEVN DAG DEVN with belief
masses

Fig. 1. BeliefOWL Framework

and . The latter expresses the prior evi-
dence and has a datatype property which enables to assign a
mass value between 0 and 1.
– Conditional evidence: It is defined through two main classes and. The former is the same as in the case of prior
evidence but has an object property . The latter identifies
the conditional evidence and has a datatype property.
Step 2: Constructing an Evidential Network: Given an OWL ontology, we
translate it in a DAG by specifying the different nodes to be created as well as the
relations existing between these nodes. The construction of the DAG interests
some of the OWL statements those related to classes.
– : It is represented as a variable node in the translated DEVN.
– : When a class is a subclass of another one, a directed arc
is drawn from the superclass node to the child subclass node.
– ,:When two classes are related to
each other by one of these statements, a new node is created in the translated
DEVN and a directed arc is drawn between the two classes and the node
added.
– : A class C may be defined as the intersection of some
classes Ci (i,. . . ,n). This can be represented in the translated DEVN by an
arc from each Ci to C and another one from C and each Ci to a new node
created for representing the intersection.
– : A class C may be defined as the union of some classes
Ci (i,. . . ,n). This can be represented in the translated DEVN by an arc from
C to each Ci to C and another one from C and each Ci to a new node created
for representing the union.
Step 3: Evidence Attribution: Once the DAG of our network is constructed,
the remaining issue is to assign masses for each node of the network. Considering
the DAG that we have got, we can depict two kinds of nodes:
80 A. Essaid and B. Ben Yaghlane

– ClassesNodes: are the nodes representing the different classes of our taxon-
omy and defined by . To this kind of nodes we attribute the prior
belief functions and the conditional ones given into the evidential ontology.
– ConstNodes: are those related to the constructors of our taxonomy with-
out considering because this kind of constructor is not
represented by a specific node. Concerning the constNodes, masses will be
attributed according to the constructor we are talking about. In fact if we
have a node created to depict an intersection between two classes, the mass
will be attributed by applying the Dempster’s rule of combination. Concern-
ing the node representing an union, the disjunctive rule of combination will
be applied in that case.
Once our evidential network is constructed and the masses are assigned to
each node a propagation process can be performed.

4 Conclusion
In this paper, we have presented the beliefOWL which is a new approach for
representing uncertainty in an OWL ontology. We considered only the case for
including uncertainty in classes. This uncertainty is modeled via the Dempster-
Shafer theory of evidence. We have presented the theoretical aspects of our tool
which consists on translating an OWL ontology into a network. For this purpose,
we extend the OWL ontology classes with belief masses, then we apply structural
translation rules in order to get a DAG of a directed evidential network. The
masses added to the ontology will be extracted and will be attributed to the
network’s nodes classes.
Further work can carry about the properties and the individuals. The prior
beliefs assigned to the different nodes of the network are given by an expert, in
the future the assignment can be done automatically through a learning process.

References
1. Ben Yaghlane,B.: Uncertainty representation and reasoning in directed evidential
networks, PhD thesis, Institut Supérieur de Gestion de Tunis Tunisia, 2002.
2. Ding,Z.: BayesOWL: A Probabilistic Framework for Semantic Web, PhD thesis,
University of Maryland, Baltimore Country, (2005).
3. Fukushige,Y.: Representing probabilistic relations in RDF, proc. of Work. on
URSW at the 4th ISWC, Galway, Ireland, (2005).
4. Gao,M., Liu,C.: Extending OWL by fuzzy description logic, proc. of the 17th IEEE
ICTAI05, Hong Kong, China, 562-567 (2005).
5. Shafer,G.: A Mathematical Theory of Evidence, Princeton University Press (1976).
6. Stoilos,G., Stamou,G., Tzouvaras,V., Pan,J., Horrocks,I.: Fuzzy OWL: Uncertainty
and the Semantic Web, proc. of the Inter. Work. on OWL-ED05, Galway, Ireland,
(2005).
7. Yang,Y., Calmet,J.: OntoBayes: An Ontology-Driven Uncertainty Model, proc.
of CIMSCA/IAWTIC, Washington, DC, USA. IEEE Computer Society, 457-463
(2005).
Fuzzy Taxonomies for Creative Knowledge Discovery

Trevor Martin1,2, Zheng Siyao1,3 and Andrei Majidian2
1
AI Group, University of Bristol, BS8 1TR UK
2
Intelligent Systems Lab, BT, Adastral Park, Ipswich IP5 3RE, UK
3
School of Computer Science and Engineering, BeiHang University, Beijing, China

Trevor. Martin@bristol.ac.uk, zhengsyao@gmail.com, Andrei .Majidian@bt.com

Abstract. A systematic form of creative knowledge discovery is outlined,
requiring taxonomies to generalise knowledge structures and mappings between
taxonomies to find parallels between knowledge structures from different
domains. These share many of the features needed to handle uncertainty in the
semantic web, and results will be relevant to the URSW community.

Keywords: fuzzy taxonomy, creative knowledge discovery, fuzzy association
rules, uncertainty in semantic web

1 Introduction

Almost by definition, creative knowledge discovery is difficult to automate and
harder to assess objectively. By creative knowledge discovery, we mean finding
previously unknown links between concepts or small “chunks” of knowledge in such
a way that useful additional knowledge is generated. It can be distinguished from
“standard” knowledge discovery by defining the latter as the search for explanatory
and/or predictive patterns and rules in large volume data within a specific domain. For
example, a knowledge discovery process might examine an ISP(internet service
provider)’s customer database and determine that people who have a high monthly
spend and who send more than three emails to the support centre in a single month are
very likely to change to a different provider in the following month. Such knowledge
is implicit within the data but is useful in predicting and understanding behaviour.
By contrast, creative knowledge discovery is more concerned with “thinking the
unthought-of” and looking for new links, new perspectives, etc. Such links are often
found by drawing parallels between different domains and looking to see how well
those parallels hold - for example, compare the ISP example mentioned above to a
hotel chain finding that regular guests who report dissatisfaction with two or more
stays often cease to be regular guests unless they are tempted back by special
treatment (such as complimentary room upgrades). This is a simple illustration of
similar problems (losing customers) in different domains. A solution in one domain
(complimentary upgrades) could inspire a solution in the second (e.g. a higher
download allowance at the same price). Of course, such analogies may break down
when probed too far but they often provide the creative insight necessary to spark a
82 T. Martin, Z. Siyao and A. Majidian

new solution through a new way of looking at a problem. In many cases, this
inspiration is often referred to as “serendipity”, or accidental discovery.
It is possible that many serendipitous discoveries are subsequently rationalised as
the outcome of rigorous application of the scientific process. The traditional view of
the scientist is as a generator and tester of hypotheses - often this is presented as an
almost mechanical process and systems such as King’s robot scientist [1] take this to
an extreme, using an inductive logic programming approach to systematically
generate and test hypotheses in a laboratory.
In this paper we outline a project to automate creative knowledge discovery. The
aim is to find parallels between different knowledge repositories - in this case,
semantically annotated networks of documents or process models - in the hope of
transferring useful links from one network to another. In the case of process models
from different domains, the aim is to identify possible improvements in one process if
its analogue in the other domain is more efficient in some way.
This work shares many of the problems faced by research into uncertainty in the
semantic web - the mapping between repositories is very similar to a mapping
between ontologies, and the creation of knowledge networks encounters several issues
that are well-known from the semantic web, such as the need for imprecise concepts,
integration of sources that represent entities and classes at different levels of detail
etc. The work is at an early stage, and this paper briefly outlines (i) a possible
approach to automating creativity which relies on the use of fuzzy taxonomies and (ii)
preliminary work on automatic extraction of taxonomies from data; this requires a
representation of uncertainty similar to that needed for the semantic web.

2 A Method for Creative Knowledge Discovery

Can creativity - in this sense of suddenly making novel connections - be
automated? Koestler [2] summarised this view of creativity as follows:
“The creative act is not an act of creation in the sense of the Old Testament. It does not create
something out of nothing: it uncovers, selects, re-shuffles, combines, synthesizes already
existing facts, idea, faculties, skills. The more familiar the parts, the more striking the new
whole”

Table 1 - attributes of two music players (taken from [4])

Conventional tape recorder Sony Walkman
big small
clumsy neat
records does not record
plays back plays back
uses magnetic tape uses magnetic tape
tape is on reels tape is in cassette
speakers in cabinet speakers in headphones
mains electricity battery
Sherwood [3] proposes a systematic approach, in which a situation or artefact is
represented as an object with multiple attributes, and the consequences of changing
Fuzzy Taxonomies for Creative Knowledge Discovery 83

attributes, removing constraints, etc are progressively explored. For example, given
an old style reel-to-reel tape recorder as starting point, Sherwood’s approach is to list
some of its essential attributes, substitute plausible alternatives for a number of these
attributes, and evaluate the resulting conceptual design or solution. Table 1 shows
how this could have led to the Sony Walkman in the late 70s [4]. Again, with the
benefit of hindsight the reader should be able to see that by changing magnetic tape to
a hard disk and considering the way music is purchased and distributed, the same
method could (retrospectively, at least) lead one to invent the iPod. Of course, having
the vision to choose new attributes and the knowledge and foresight to evaluate the
result is the hard part - and the creative steps are usually only obvious with hindsight.
This systematic approach is ideally suited to handling data which is held in an
object-attribute-value format, provided we have a means of changing/generalising
attribute values. We intend to use taxonomies for this purpose, so that “sensible”
changes can be made (e.g. mains, battery are both possible values for a power
attribute). Representing an object O as a set of attribute-value pairs
{( ai , v i ) attribute ai of object O has value v i } we generate a new “design” O* = {( ai , T (v i ))}
by changing one or more values using Ti, a non-deterministic transformation of a
value to another value from the same taxonomy. Given sufficient time, this would
! simply enumerate all possible combinations of attribute values.
! We can reduce the
search space by looking at the solution to an analogous problem in a different domain.
Our aim is to adapt previously developed tools for taxonomy matching [5] so that
analogies can be found; the next section briefly outlines a way to extract taxonomic
structure when it is not explicitly available.

3 Extracting Embedded Soft Taxonomies

An ontology essentially consists of a taxonomy of concepts, one or more relations
between concepts, and rules which impose constraints and allow data transformation.
The idea of an ontology is central to the semantic web [6], although there can be a
very high cost in creation and maintenance. This is reflected in practical experience -
it is rare to find web-based data that is fully marked up with RDF or OWL metadata.
It is far more common to encounter data that is stored in a relational database or an
equivalent XML-tagged format. Such data often contains implicit taxonomies - a
relational table may flatten hierarchical data into one or more attributes. For example,
a film database may record genre(s) and sub-genre(s) as separate fields, hiding the
hierarchical dependency. The hierarchy may be obvious to a human reader of the data,
but it is invisible to the machine. Similarly, XML tags can hide structure. XML relies
on human interpretation for its “semantics” - a programmer can take advantage of the
fact that and are subtypes of , but a program has
no way of knowing this unless it is made explicit by means of a taxonomy. Although
a well-designed schema will make hierarchical structure explicit, our experience is
that a significant proportion of data sources rely on programmer intuition instead.
We have investigated formal concept analysis (FCA) [7, 8] as a way of extracting
hidden structure from a dataset in object-attribute-value form. In its simplest form,
FCA considers a binary-valued table, where each row corresponds to an object and
84 T. Martin, Z. Siyao and A. Majidian

each column to an attribute (property). The extension to a fuzzy case is (relatively)
straightforward, by considering a fuzzy relation R* and alpha-cuts which reduce the
problem to the crisp case. A brief outline and promising initial results are given in [9].

4 Applications

Two specific domains form demonstrators for this work. XML process mining
algorithms exist to discover process model from log files; various additions include
heuristic and fuzzy approaches to handle noisy data. Semantic processing mining
involves ontology knowledge. The ProM [www.processmining.org] platform takes
SA-MXML (semantic annotated mxml) files as input, where the annotation conforms
to the Web Service Modelling Language. The aim of this demonstrator is to find
(partial) similarities between process models in different domains, and use process
simulation tools to determine whether one process can be improved by slightly
altering it to match the second process more closely. The second demonstrator is
based on web forum discussions and support centre documentation, and will attempt
to improve the automated provision of “help” information.

5 Summary

This paper has briefly outlined a project to automate aspects of creative knowledge
discovery. The project is in early stages. Although not a direct application of
uncertain reasoning in the semantic web, it shares many of the same problems and
useful cross-fertilisation of ideas should be possible.
Acknowledgement : this work was partly funded by the FP7 BISON (Bisociation
Networks for Creative Information Discovery) project, number 211898.

6 References

1. King, R.D., et al., Functional genomic hypothesis generation and experimentation by a robot
scientist. Nature, 2004(6971): p. 247-251.
2. Koestler, A., The Act of Creation. 1964: Macmillan. 751.
3. Sherwood,D. SilverBullet Machine::Guide to Creativity,2009, www.silverbulletmachine.com
4. Sherwood, D., Koestler's Law: The Act of Discovering Creativity-And How to Apply It in
Your Law Practice. Law Practice, 2006. 32(8).
5. Martin,T.P, B.Azvine, Y.Shen, Granular Assoc Rules for Multiple Taxonomies: A Mass
Assignment Approach in Uncertain Reasoning in the Sem Web, M.Nickles, Ed 2008,Springer.
6. Berners-Lee, T, J.Hendler, and O.Lassila, Semantic Web, in Scientific American 2001 p28-37.
7. Ganter, B, R. Wille, Formal Concept Analysis: Mathematical Foundations. 1998: Springer.
8. Priss, U., Formal Concept Analysis in Information Science.
9. Majidian, A. and T.P. Martin. Extracting Taxonomies from Data - a Case Study using Fuzzy
FCA. in Web Intelligence-09. 2009. Milan, Italy: IEEE Computing.
Position paper: Uncertainty reasoning for linked data

Dave Reynolds1
1
Hewlett Packard Laboratories, Bristol
Dave.e.Reynolds@gmail.com

Abstract. Linked open data offers a set of design patterns and conventions for
sharing data across the semantic web. In this position paper we enumerate some
key uncertainty representation issues which apply to linked data and suggest
approaches to tackling them. We suggest the need for vocabularies to enable
representation of link certainty, to handle ambiguous or imprecise values and to
express sets of assumptions based on named graph combinators.

Keywords: Uncertainty reasoning; linked open data; semantic web

1 Introduction
The need for reasoning over uncertain information within the semantic web occurs in
many different situations. It can arise from intrinsic uncertainty in the world being
modeled or from limitations of the sensing or reasoning agent itself (epistemic). The
term uncertainty is often used to refer to many different notions including ambiguity,
randomness, vagueness, inconsistency, incompleteness [1][9].
In recent years an approach to the semantic web, called linked data, has been
developed and offers a promising route to practical and widespread semantic web up-
take. It provides a set of design guidelines or patterns for how the semantic web
technologies, and broader web architecture, can be used for sharing information. The
existing guidelines and practices have no provision for representation of uncertainty;
yet linked data is indeed fraught with many of these different types of uncertainty.
In this brief position paper we examine the ways in which uncertainty can occur in
a linked data setting and sketch possible approaches to addressing the issues raised.

2 Linked data
Linked Data is a set of conventions for publishing data on the semantic web. It is
based on principles outlined by Tim Berners-Lee [2]. These principle advocate the use
of http URIs for naming entities, the publication of data about these URIs using the
standards (RDF, SPARQL) and inclusion of links to other URIs so that agents can
discover more information. While quite simple these guidelines, along with a growing
body of practical advice [3], have led to publication and linking of many datasets in
this form [4]. This has resulted in high profile commercial applications such as [5].
While not explicitly stated, the style of linked data places an emphasis on data
sharing and simplicity, with corresponding less emphasis on depth of modeling and
reasoning. Yet the intrinsic nature of the linked data approach leads to issues of
uncertainty representation and reasoning. This is due to the emphasis on cross-linking
86 D. Reynolds

multiple data sources that have been independently developed and modeled.
Uncertainty can arise from the instance linking process, from the mapping between
different sources models and due to differing hidden assumptions in the underlying
datasets. Yet the essence of linked data, and a large part of the reason for its uptake, is
simplicity. The data is intended to be self-descriptive and accessible through simple
link following and graph union or through SPARQL endpoints. Our challenge is to
develop a common, easy to deploy, approach to uncertainty representation which can
be applied to linked data sets without losing this simplicity.

3 Some sources of uncertainty in linked data applications
In this section we enumerate some key sources of uncertainty for linked data. We
focus on the sources which directly result from the intrinsic nature of linked data – the
cross-linking of independently developed RDF datasets.

3.1 Ambiguity resulting from data merging
In linked data, entities (Individuals) which co-occur with different URIs in different
datasets are unified. This is achieved by publishing owl:sameAs relations between
identified entities, either within the dataset or as a separate link set. The process of
identifying such co-references is imperfect. Firstly, the co-references are typically
found by a mixture of string matching, attribute matching, and type constraints,
generally based on a statistical or machine learning algorithm [6]. Thus co-references
are only identified with some probability (or less formal heuristic weighting). Yet the
asserted links are binary and the strength of association is lost. Secondly, the nature of
the entities is ambiguous in some datasets. For example, Wikipedia and thus DBPedia
conflate the concepts of the City Bristol in the UK and the associated Unitary
Authority. A co-reference link that identified the ambiguous DBPedia concept with
one that specifically denotes the Unitary Authority would be an error in general, even
though it may be an acceptable approximation in some situations.

3.2 Misalignment of precision and assumptions between merged sources
Many datasets in the linked data web publish property values for the entities they
describe; for example, the population of the City of London. Yet those values are
sometimes imprecise or dependent upon measurement assumptions that are not made
explicit. For example, the population of a city depends on the time of the
measurement, the measurement methodology and the precise definition of the
boundary of the city; it is also subject to statistical uncertainty. As a result, at the time
of writing, a linked data query on London returns a graph with four assertions on its
population ranging from 7,700,000 to 8,500,000. One of these sources of variation,
the time of measurement, is sometimes made explicit in data and indeed one of the
four assertions is (indirectly) time qualified. However, such contextual qualification is
not consistently available and, in any case, only accounts for one source of variation.
Thus when datasets are linked the resulting union will often have multiple conflicting
values for supposedly functional properties.
Uncertainty Reasoning for Linked Data 87

3.3 Misalignment of models
When linking datasets we also want to map the associated ontologies. This process is
just as error prone as entity co-reference since the axiomatization of concepts in the
ontologies is rarely so complete as to allow a unambiguous mapping. Errors in the
ontology mapping can lead to global effects such as unexpected identification of
related concepts. Determining and publishing such alignment errors is the subject of
considerable research and is outside the scope of this paper.

3.4 Absence of source reliability information
Separate from the uncertainty arising from combination and linking of datasets then
the datasets themselves can be uncertain or contain errors (either accidental or
malicious). While this is true in general in the semantic web, the linked data approach
implies broad cross linking with no provision for narrow scoping of link references.
This exacerbates the problems of the veracity or trustworthiness of included datasets.

4 Mitigation approaches
We now discuss approaches to mitigate the effects of these uncertainty sources on the
consumers of linked data. In keeping with linked data methodology we seek simple,
broadly applicable, design patterns. In particular, we suggest the need for design
patterns for making the uncertainty inherent in the linked datasets more explicit, and
mechanisms to enable selective combination of datasets (so that problematic values or
links can be omitted). In this a short position paper we only sketch the suggested
approaches as a basis for discussion in the workshop.

4.1 Link vocabulary
The link vocabulary would provide a common representation for co-reference links,
enabling publication of the link certainty information on which per-link inclusion
decisions can be made. This can be achieved by extending the voiD ontology [8] with
a concept UncertainLinkSet (as a subclass of void:LinkSet), and associated
properties for describing the method used for deriving the link set. The
UncertainLinkSet itself would contain n-ary relations (WeightedLink) comprising the
link and associated link weight. Different subclasses of WeightedLink indicate
different interpretations of the link weight (such as probabilistic or ad hoc).

4.2 Imprecise value vocabulary
The imprecise value vocabulary would provide a common representation for
imprecise values that arise from data set merger, as discussed in 3.2. This would allow
republication of merged datasets which explicitly show the variation in source data
values. Returning to our example of the population of London the merged set might
look like:
:London :population [a :ImpreciseValue;
:samplevalue [:value 7700000; :source :s1; :context :y2009]
:samplevalue [:value 7900000; :source :s2; :context :y2008]
:estimatedValue 785123]
88 D. Reynolds

4.3 Override graphs
Finally we suggest the need for override graphs so that one agent can publish
retractions and overrides to the link assertions or data assertions made by another.
The current approach to this, in linked data applications, is to partition data and
link sets into named graphs [7]. For example, rather than include all the co-reference
links directly in the same graph as the entity descriptions, we partition them into a
separate named graph. In this way a RESTful access can see the union of the relevant
graphs but a SPARQL endpoint can support selection of which graphs to include.
This allows agents to avoid selected link sets or sub-sources but only at the grain size
of the entire graph. To overcome this limitation we suggest extending the VoiD
vocabulary to include graph combinators difference, union and replace. So one source
can decide which subsets of the data and links to trust, and can then publish the
assumptions it is making as a set of deltas over the source graphs. The difference
graphs enable per-link and per-assertion changes to be expressed even if the
underlying source only publishes the link set or data assertions as monolithic graphs.

5 Discussion
Of the issues in section 3 we have suggested an agenda for how to address some of
them. The link and imprecise value vocabularies enable publication of link
uncertainty (3.1) and value ambiguity (3.2) information in linked data sets. The
vocabularies themselves would not remove the uncertainties, nor the problems of
estimating them. However, simply having a means to publish this information is
already a step forward. The suggested graph combinators would enable an agent to
make and publish more selective data combinations, based on its interpretation of link
strengths and data values. This does not solve the problems of deciding which parts of
which sources to trust, but it does enable more effective sharing of such decisions.

References
1. Laskey, K.J., et al.: Uncertainty Reasoning for the World Wide Web, W3C Incubator Group
Report, 31 March 2008. http://www.w3.org/2005/Incubator/urw3/XGR-urw3/
2. Berners-Lee, T.: Linked Data. http://www.w3.org/DesignIssues/LinkedData.html (2006)
3. http://linkeddata.org/
4. http://linkeddata.org/data-sets
5. Kobilarov, G., et al.: Media Meets Semantic Web --- How the BBC Uses DBpedia and
Linked Data to Make Connections. Proc. of the 6th European Semantic Web Conference on
the Semantic Web: Research and Applications (Heraklion, Crete, Greece). Lecture Notes In
Computer Science, vol. 5554. Springer-Verlag, Berlin, Heidelberg, 723-737. (2009)
6. Elmagarmid, A.K., Ipeirotis, P.G., Verykios, V.S.: Duplicate Record Detection: A Survey.
IEEE Transactions on Knowledge and Data Engineering 19 (2007)
7. Carroll, J. J., Bizer, C., Hayes, P., and Stickler, P.: Named graphs, provenance and trust. In
Proceedings of the 14th international Conference on World Wide Web (Chiba, Japan, May
10 - 14, 2005). WWW '05. ACM, New York, NY, 613-622. (2005)
8. Alexander, K., Cyganiak, R., Hausenblas, M., Zhao, J.: voidD Guide : Using the vocabulary
of Interlinked Datasets. http://rdfs.org/ns/void-guide (2009)
9. Kruse, R., Schwecke, E., and Heinsohn, J. 1991 Uncertainty and Vagueness in Knowledge
Based Systems. Springer-Verlag New York, Inc.
Author Index

Ben Yaghlane, Boutheina . . . . . . . . . . . . 77
Carvalho, Rommel N. . . . . . . . . . . . . . . . . 3
Cozman, Fabio Gagliardi . . . . . . . . . . . . 63
da Costa, Paulo C. G. . . . . . . . . . . . . . . . . .3
d’Amato, Claudia . . . . . . . . . . . . . . . 15, 27
Esposito, Floriana . . . . . . . . . . . . . . . . . . 27
Essaid, Amira . . . . . . . . . . . . . . . . . . . . . . 77
Fanizzi, Nicola . . . . . . . . . . . . . . . . . . 15, 27
Fazzinga, Bettina . . . . . . . . . . . . . . . . . . . 15
Gajderowicz, Bart . . . . . . . . . . . . . . . . . . 39
Gottlob, Georg . . . . . . . . . . . . . . . . . . . . . 15
Ladeira, Marcelo . . . . . . . . . . . . . . . . . . . . 3
Laskey, Kathryn B. . . . . . . . . . . . . . . . 3, 51
Lukasiewicz, Thomas . . . . . . . . . . . . . . . 15
Majidian, Andrei . . . . . . . . . . . . . . . . . . . 77
Martin, Trevor . . . . . . . . . . . . . . . . . . . . . . 77
Matsumoto, Shou . . . . . . . . . . . . . . . . . . . . 3
Ochoa Luna, José Eduardo . . . . . . . . . . 63
Reynolds, Dave . . . . . . . . . . . . . . . . . . . . . 81
Sadeghian, Alireza . . . . . . . . . . . . . . . . . . 39
Santos, Laécio L. . . . . . . . . . . . . . . . . . . . . 3
Siyao, Zheng . . . . . . . . . . . . . . . . . . . . . . . 77