=Paper=
{{Paper
|id=Vol-477/paper-53
|storemode=property
|title=Formalizing Multimedia Interpretation based on Abduction over Description Logic Aboxes
|pdfUrl=https://ceur-ws.org/Vol-477/paper_66.pdf
|volume=Vol-477
|dblpUrl=https://dblp.org/rec/conf/dlog/PeraldiKM09
}}
==Formalizing Multimedia Interpretation based on Abduction over Description Logic Aboxes==
Formalizing Multimedia Interpretation based on Abduction over Description Logic Aboxes*
Sofia Espinosa Peraldi, Atila Kaya, Ralf Möller
Hamburg University of Technology, Germany,
{sofia.espinosa, at.kaya, r.f.moeller}@tuhh.de
Abstract. The paper describes how interpretations of multimedia documents can be formally derived using abduction over domain knowledge represented in an ontology. The approach uses an expressive ontology specification language, namely description logics in combination with logic programming rules, and formalizes the multimedia interpretation process using a combined abduction and deduction operation. We describe how the observables as well as the space of abducibles can be formally defined. The approach is evaluated using examples from text processing, but can also be applied to interpret content in other modalities.
1 Introduction
Multimedia interpretation can be defined as the process of extracting deep-level semantics from content. Deep-level semantics represent the abstract meaning of the content and are the result of reasoning processes over background knowledge. Automatic interpretation of multimedia content for information extraction (IE) purposes is becoming a major research issue, driven by the need to reduce costs in knowledge management and information retrieval. The goal is to automate interpretation based on formal models tailored towards a particular domain.
The basic idea of the multimedia interpretation approach that we pursue in this paper is that content descriptions derived by low-level analysis processes should be supported by constructing one or more high-level “explanations”. The approach has been explored in the EU-funded projects BOEMIE (http://www.boemie.org) and CASAM (http://www.casam-project.eu).
Since the goal is to derive high-level descriptions with new objects and assertions among new and “old” objects, deduction alone is not an appropriate formalization. In addition to deduction, an abduction step is required, as has been recognized several times in the literature:
– Techniques for finding abductive explanations were discussed long ago [1, 2]. In [3], Mayer and Pirri present a semantic tableau method for abduction in first-order logic using ‘reversed Skolemization’. So far, however, only the formalization of the problem has been shown; the algorithms are not practical.
* This paper has been partially supported by the projects BOEMIE and CASAM (FP6-027538 and FP7-217061, respectively).
– In [4], Hobbs et al. formalize text interpretation as abduction. They also argue that deduction is not enough and describe how high-level text interpretation can be realized as abduction over predicate logic formulae. Since predicate logic reasoning is undecidable in general, and the performance of reasoning engines is rather low (if they terminate for a particular problem at all), we pursue an approach that uses a less expressive but decidable formalism for which highly optimized reasoners exist (e.g., RacerPro).
– In [5], Shanahan presents a formal theory of robot perception as a form of abduction. In this work, low-level sensor data is transformed into a symbolic representation using first-order logic, and abduction is used to derive explanations. However, logic-based modeling is used in [5] to describe the behavior of procedural programs.
– In the context of scene interpretation, Möller and Neumann recently proposed the use of DLs for the representation of aggregates that can be used by reasoning services as building blocks for the scene interpretation process [6, 7]. However, in these works, description logics are used only to describe multimedia interpretation processes theoretically.
The contribution of this paper is the first declarative way to formalize multimedia interpretation that is directly operational in the sense of executable specifications. The paper describes how the “observables requiring explanation” as well as the “space of abducibles” can be formally defined using logic programming techniques. In contrast to approaches such as [8], which use abduction only in the context of rules in logic programming, or approaches such as [9], which deal only with concept abduction, our approach combines existing DL reasoning mechanisms and rules in a coherent framework, i.e., abduction is considered as a new type of non-standard Abox retrieval inference service, which is integrated into an existing DL reasoner. The approach is evaluated using examples from text processing, but can also be applied to interpret multimedia documents in other modalities.
2 Preliminaries
For studying interpretation as abduction over ontologies, we focus on the description logic ALCQ. We assume that the reader is familiar with description logics, and we only introduce the specific operators necessary for Abox abduction.
2.1 Sequences, Variable Substitutions and Transformations
For the introduction of the interpretation algorithm, we need some additional definitions. A variable is a name of the form ?name where name is a string of characters from {a..z}. In the following definitions, we denote places where variables can appear with uppercase letters.
Let V be a set of variables, and let X, Y1, ..., Yn be sequences ⟨...⟩ of variables from V. Z denotes a sequence of individuals. Unless indicated otherwise, we consider sequences of length 1 or 2 only, and assume that (⟨X⟩) is to be read as (X), (⟨X, Y⟩) is to be read as (X, Y), etc. Furthermore, we assume that sequences are automatically flattened. A function as_set turns a sequence into a set in the obvious way.
A variable substitution σ = [X ← i, Y ← j, ...] is a mapping from variables to individuals. The application of a variable substitution σ to a sequence of variables ⟨X⟩ or ⟨X, Y⟩ is defined as ⟨σ(X)⟩ or ⟨σ(X), σ(Y)⟩, respectively, with σ(X) = i and σ(Y) = j; the result is a sequence of individuals. If a substitution is applied to a variable X for which there exists no mapping X ← k in σ, then the result is undefined. A variable substitution for which all required mappings are defined is called admissible (w.r.t. the context).
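The definitions above can be illustrated with a small sketch. It assumes a Python encoding that is not part of the paper: variables are strings starting with ?, individuals are plain strings, a substitution σ is a dict, and None stands for an undefined result. The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Assumed encoding (not from the paper): a substitution maps "?var" -> individual.
Substitution = Dict[str, str]


def apply_substitution(sigma: Substitution,
                       seq: Tuple[str, ...]) -> Optional[Tuple[str, ...]]:
    """Apply sigma to a sequence of variables.

    Returns the sequence of individuals <sigma(X), sigma(Y), ...>, or None
    (standing for "undefined") if some variable has no mapping in sigma.
    """
    if any(var not in sigma for var in seq):
        return None
    return tuple(sigma[var] for var in seq)


def is_admissible(sigma: Substitution, *seqs: Tuple[str, ...]) -> bool:
    """sigma is admissible w.r.t. a context if every required mapping exists."""
    return all(apply_substitution(sigma, seq) is not None for seq in seqs)


sigma = {"?x": "ind2", "?y": "ind1"}
print(apply_substitution(sigma, ("?y", "?x")))      # ('ind1', 'ind2')
print(apply_substitution(sigma, ("?z",)))           # None: no mapping for ?z
print(is_admissible(sigma, ("?x",), ("?y", "?x")))  # True
```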
2.2 Grounded Conjunctive Queries
Let X, Y1, ..., Yn be sequences of variables, and let Q1, ..., Qn denote atomic concept or role descriptions. A query is defined by the following syntax:
{(X) | Q1(Y1), ..., Qn(Yn)}
The sequence X may be of arbitrary length, but all variables mentioned in X must also appear in at least one of Y1, ..., Yn: as_set(X) ⊆ as_set(Y1) ∪ ··· ∪ as_set(Yn).
Informally speaking, Q1(Y1), ..., Qn(Yn) defines a conjunction of so-called query atoms Qi(Yi). The list of variables to the left of the sign | is called the query head, and the atoms to the right of it are called the query body. The variables in the head are called distinguished variables; they define the query result. The variables that appear only in the body are called non-distinguished variables and are existentially quantified.
Answering a query with respect to an ontology Σ means finding admissible variable substitutions σ such that Σ ⊨ {(σ(Y1)) : Q1, ..., (σ(Yn)) : Qn}. We say that a variable substitution σ = [X ← i, Y ← j, ...] introduces bindings i, j, ... for variables X, Y, .... The result of a query is the set {(σ(X)) | σ is such a substitution}.
Note that the variable substitution σ is applied before checking whether Σ ⊨ {(σ(Y1)) : Q1, ..., (σ(Yn)) : Qn}, i.e., the query is grounded first.
For the query {(?y) | Person(?x), hasParticipant(?y, ?x)} and the Abox Γ1 = {ind1 : HighJump, ind2 : Person, (ind1, ind2) : hasParticipant}, the substitution [?x ← ind2, ?y ← ind1] allows for answering the query and defines bindings for ?y and ?x; the query result is {(ind1)}.
A boolean query is a query with X of length zero. If for a boolean query there exists a variable substitution σ such that Σ ⊨ {(σ(Y1)) : Q1, ..., (σ(Yn)) : Qn} holds, we say that the query is answered with true; otherwise the answer is false.
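A minimal sketch of grounded query answering follows, under a strong simplifying assumption: Σ ⊨ α is approximated by checking whether α occurs explicitly in the Abox, so no Tbox reasoning takes place (that is what a DL reasoner such as RacerPro would contribute). Atoms are encoded as hypothetical (predicate, arguments) pairs, and the function names are illustrative only.

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, arguments); "?x" marks a variable


def is_var(term: str) -> bool:
    return term.startswith("?")


def substitutions(body: List[Atom], abox: Set[Atom],
                  sigma: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Enumerate substitutions that ground every body atom to an Abox assertion.

    Simplification: entailment is replaced by explicit membership in the Abox.
    """
    if not body:
        yield sigma
        return
    pred, args = body[0]
    for a_pred, a_args in abox:
        if a_pred != pred or len(a_args) != len(args):
            continue
        extended = dict(sigma)
        # a variable binds to the individual (or must agree with an earlier binding)
        if all((extended.setdefault(t, ind) == ind) if is_var(t) else t == ind
               for t, ind in zip(args, a_args)):
            yield from substitutions(body[1:], abox, extended)


def answer(head: Tuple[str, ...], body: List[Atom],
           abox: Set[Atom]) -> Set[Tuple[str, ...]]:
    """Result of {(head) | body}: the set of tuples sigma(head)."""
    return {tuple(s[v] for v in head) for s in substitutions(body, abox, {})}


def answer_boolean(body: List[Atom], abox: Set[Atom]) -> bool:
    """A boolean query {() | body} is true iff some substitution exists."""
    return next(substitutions(body, abox, {}), None) is not None


gamma1 = {("HighJump", ("ind1",)), ("Person", ("ind2",)),
          ("hasParticipant", ("ind1", "ind2"))}
query_body = [("Person", ("?x",)), ("hasParticipant", ("?y", "?x"))]
print(answer(("?y",), query_body, gamma1))                                 # {('ind1',)}
print(answer_boolean([("HighJump", ("?x",)), ("Person", ("?x",))], gamma1))  # False
```

Because head variables are looked up in each substitution, the safety requirement as_set(X) ⊆ as_set(Y1) ∪ ··· ∪ as_set(Yn) is what keeps `answer` from failing on an unbound head variable.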
Later on, we will have to convert query atoms into Abox assertions. This is done with the function transform. The function transform applied to a set of query atoms {γ1, ..., γn} and a variable substitution σ is defined as {transform(γ1, σ), ..., transform(γn, σ)}, where
transform(P(X), σ) := (σ(X)) : P.
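In the hypothetical (predicate, arguments) encoding used for illustration, transform is a one-line mapping. As an assumption going beyond the definition, terms that are not variables are treated as individuals and passed through unchanged.

```python
from typing import Dict, Iterable, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, arguments)


def transform(atoms: Iterable[Atom], sigma: Dict[str, str]) -> Set[Atom]:
    """Turn query atoms P(X) into Abox assertions (sigma(X)) : P.

    Assertions keep the (predicate, individuals) shape; non-variable terms
    (no leading "?") are kept as they are. Raises KeyError if sigma is not
    admissible for the given atoms.
    """
    return {(pred, tuple(sigma[t] if t.startswith("?") else t for t in args))
            for pred, args in atoms}


atoms = [("HighJumpCompetition", ("?z",)), ("hasDate", ("?z", "?y"))]
print(sorted(transform(atoms, {"?z": "new_ind1", "?y": "date1"})))
# [('HighJumpCompetition', ('new_ind1',)), ('hasDate', ('new_ind1', 'date1'))]
```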
2.3 Rules
A rule r has the form P(X) ← Q1(Y1), ..., Qn(Yn), where P, Q1, ..., Qn denote atomic concept or role descriptions, with the additional restriction (safety condition) that as_set(X) ⊆ as_set(Y1) ∪ ··· ∪ as_set(Yn).
Rules are used to derive new Abox assertions, and we say that a rule r is applied to an Abox A. The function call apply(Σ, P(X) ← Q1(Y1), ..., Qn(Yn), A) returns a set of Abox assertions {(σ(X)) : P} if there exists an admissible variable substitution σ such that the answer to the query
{() | Q1(Y1), ..., Qn(Yn)}
is true with respect to Σ ∪ A.¹ If no such σ can be found, the result of the call to apply(Σ, r, A) is the empty set. The application of a set of rules R = {r1, ..., rn} to an Abox is defined as follows:
apply(Σ, R, A) = ⋃_{r ∈ R} apply(Σ, r, A)
The result of forward_chain(Σ, R, A) is ∅ if apply(Σ, R, A) ∪ A = A, and apply(Σ, R, A) ∪ forward_chain(Σ, R, A ∪ apply(Σ, R, A)) otherwise.
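The recursive definition of forward_chain can be read as a fixpoint loop. The sketch below is a minimal illustration under the same simplifying assumption as before (no Tbox reasoning: body atoms must match explicit assertions). Unlike the definition, it collects the head assertions for every satisfying substitution. The individual comp1 and the second rule with head CompetitionWithResult are hypothetical and only serve to show chaining.

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]   # (predicate, arguments); "?x" marks a variable
Rule = Tuple[Atom, List[Atom]]       # (head, body)


def _subs(body: List[Atom], abox: Set[Atom],
          sigma: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Substitutions grounding all body atoms to explicit assertions (no Tbox reasoning)."""
    if not body:
        yield sigma
        return
    pred, args = body[0]
    for a_pred, a_args in abox:
        if a_pred == pred and len(a_args) == len(args):
            ext = dict(sigma)
            if all(ext.setdefault(t, i) == i if t.startswith("?") else t == i
                   for t, i in zip(args, a_args)):
                yield from _subs(body[1:], abox, ext)


def apply_rule(rule: Rule, abox: Set[Atom]) -> Set[Atom]:
    """apply(Σ, r, A): head assertions (σ(X)) : P for every σ satisfying the body."""
    (pred, head_args), body = rule
    return {(pred, tuple(s[v] for v in head_args)) for s in _subs(body, abox, {})}


def apply_rules(rules: List[Rule], abox: Set[Atom]) -> Set[Atom]:
    """apply(Σ, R, A): union of apply(Σ, r, A) over all r in R."""
    derived: Set[Atom] = set()
    for rule in rules:
        derived |= apply_rule(rule, abox)
    return derived


def forward_chain(rules: List[Rule], abox: Set[Atom]) -> Set[Atom]:
    """Iterative reading of the recursive definition; returns only derived assertions."""
    derived: Set[Atom] = set()
    current = set(abox)
    while True:
        new = apply_rules(rules, current)
        if new <= current:               # apply(Σ, R, A) ∪ A = A
            return derived
        derived |= new
        current |= new


rules = [
    (("sportsCompetitionToPerformance", ("?x", "?y")),
     [("SportsCompetition", ("?x",)), ("hasSportsName", ("?x", "?z")),
      ("SportsName", ("?z",)), ("sportsNameToPerformance", ("?z", "?y"))]),
    # hypothetical rule, added only to show a second chaining step
    (("CompetitionWithResult", ("?x",)), [("sportsCompetitionToPerformance", ("?x", "?y"))]),
]
abox = {("SportsCompetition", ("comp1",)), ("hasSportsName", ("comp1", "hjName1")),
        ("SportsName", ("hjName1",)),  # entailed via HighJumpName ⊑ SportsName; asserted here
        ("sportsNameToPerformance", ("hjName1", "perf1"))}
print(sorted(forward_chain(rules, abox)))
# [('CompetitionWithResult', ('comp1',)), ('sportsCompetitionToPerformance', ('comp1', 'perf1'))]
```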
3 Multimedia Interpretation
The multimedia interpretation process aims to compute interpretations (sets of descriptions) of a multimedia document based on low-level descriptions and background knowledge. Besides low-level descriptions about objects and their relations, an interpretation also contains high-level descriptions about abstract objects such as events and their relations with low-level descriptions. High-level descriptions cannot be inferred directly from low-level descriptions; they have to be hypothesized w.r.t. some background knowledge.
In the rest of this section, we first formalize abduction as the key inference service for explaining observations, and then describe the multimedia interpretation process that exploits both abduction and deduction to compute interpretations.
3.1 Computing Explanations via Abduction
In general, abduction is formalized as Σ ∪ ∆ ⊨ Γ, where background knowledge (Σ) and observations (Γ) are given and explanations (∆) are to be computed. In terms of DLs, ∆ and Γ are Aboxes and Σ is a Tbox.
Abox abduction is implemented as a non-standard retrieval inference service in DLs. In contrast to standard retrieval inference services, where answers are found by exploiting the ontology, Abox abduction has the task of acquiring what should be added to the ontology in order to answer a query. Therefore, the result of Abox abduction is a set of hypothesized Abox assertions. To achieve this, the space of abducibles has to be defined beforehand, and we do this in terms of rules.
¹ If Σ ∪ A is inconsistent, the result is well-defined but useless. It will not be used afterwards.
We assume that a set of rules R as defined above (see Section 2.3) is specified, and define a non-deterministic function compute_explanation as follows.
– compute_explanation(Σ, R, A, (Z) : P) = transform(Φ, σ) if there exists a rule r = P(X) ← Q1(Y1), ..., Qn(Yn) ∈ R that is applied to an Abox A such that a set of query atoms Φ and an admissible variable substitution σ with σ(X) = Z can be found, and the query Q := {() | expand(P(X), r, R) \ Φ} is answered with true.
– If no such rule r exists in R, then compute_explanation(Σ, R, A, (Z) : P) = ∅.
The goal of the function compute_explanation is to determine what must be added (Φ) such that the entailment Σ ∪ A ∪ Φ ⊨ (Z) : P holds. Hence, compute_explanation relies on abductive reasoning. The set of query atoms Φ, with Φ ⊆ expand(P(X), r, R), defines what must be hypothesized in order to answer the query Q with true. The definition of compute_explanation is non-deterministic because there may be several possible choices for Φ.
The function application expand(P(X), P(X) ← Q1(Y1), ..., Qn(Yn), R) is also defined in a non-deterministic way as
expand′(Q1(Y1), R) ∪ ··· ∪ expand′(Qn(Yn), R)
with expand′(P(X), R) being expand(P(X), r, R) if there exists a rule r = P(X) ← ... ∈ R, and ⟨P(X)⟩ otherwise. We say that the set of rules is backward-chained, and since there might be multiple rules in R, backward chaining is non-deterministic as well. Thus, multiple explanations are generated.
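A minimal sketch of compute_explanation with backward chaining is given below. The non-determinism is resolved by enumerating every combination of rule choices, so the function returns a list of candidate explanations. Two simplifications are assumptions of this sketch, not of the paper: an atom counts as satisfied only if it occurs explicitly in the Abox, and every variable left unbound gets a fresh individual (new_ind1, new_ind2, ...), which is just one of the possible choices for Φ. A depth bound guards against recursive rules. The rules used in the example are the two sportsNameToDate rules of Figure 4.

```python
import itertools
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]   # (predicate, arguments); "?x" marks a variable
Rule = Tuple[Atom, List[Atom]]       # (head, body)

_suffixes = itertools.count()        # used to rename rule variables apart


def is_var(term: str) -> bool:
    return term.startswith("?")


def instantiate(rule: Rule, args: Tuple[str, ...]) -> List[Atom]:
    """Body of `rule` with head variables bound to `args`, other variables renamed apart."""
    (_, head_args), body = rule
    mapping = dict(zip(head_args, args))
    suffix = f"_{next(_suffixes)}"
    return [(p, tuple(mapping.get(t, t + suffix) if is_var(t) else t for t in a))
            for p, a in body]


def expand(atoms: List[Atom], rules: List[Rule], depth: int) -> Iterator[List[Atom]]:
    """Backward-chain each atom (expand'); one result per combination of rule choices."""
    if not atoms:
        yield []
        return
    pred, args = atoms[0]
    candidates = [r for r in rules if r[0][0] == pred]
    if not candidates or depth == 0:
        firsts = [[atoms[0]]]                      # no rule: keep the atom itself
    else:
        firsts = [e for r in candidates
                  for e in expand(instantiate(r, args), rules, depth - 1)]
    for first in firsts:
        for rest in expand(atoms[1:], rules, depth):
            yield first + rest


def ground_fresh(atoms: List[Atom], used: Set[str]) -> Set[Atom]:
    """Bind every remaining variable to a fresh individual new_ind1, new_ind2, ..."""
    fresh = (f"new_ind{k}" for k in itertools.count(1) if f"new_ind{k}" not in used)
    names: Dict[str, str] = {}

    def ground(term: str) -> str:
        if is_var(term) and term not in names:
            names[term] = next(fresh)
        return names.get(term, term)

    return {(p, tuple(ground(t) for t in a)) for p, a in atoms}


def compute_explanations(abox: Set[Atom], rules: List[Rule], observation: Atom,
                         depth: int = 3) -> List[Set[Atom]]:
    """Candidate explanations for an observation (Z) : P, one per backward-chaining choice."""
    pred, z = observation
    used = {t for _, args in abox for t in args}
    explanations = []
    for rule in (r for r in rules if r[0][0] == pred):
        for expansion in expand(instantiate(rule, z), rules, depth):
            # Φ: grounded atoms that are not already (explicitly) in the Abox
            explanations.append(ground_fresh(expansion, used) - abox)
    return explanations


rules = [
    (("sportsNameToDate", ("?x", "?y")),
     [("HighJumpCompetition", ("?z",)), ("hasSportsName", ("?z", "?x")),
      ("HighJumpName", ("?x",)), ("hasDate", ("?z", "?y")), ("Date", ("?y",))]),
    (("sportsNameToDate", ("?x", "?y")),
     [("PoleVaultCompetition", ("?z",)), ("hasSportsName", ("?z", "?x")),
      ("PoleVaultName", ("?x",)), ("hasDate", ("?z", "?y")), ("Date", ("?y",))]),
]
abox = {("HighJumpName", ("hjName1",)), ("Date", ("date1",))}
explanations = compute_explanations(abox, rules, ("sportsNameToDate", ("hjName1", "date1")))
print(sorted(explanations[0]))
# [('HighJumpCompetition', ('new_ind1',)), ('hasDate', ('new_ind1', 'date1')),
#  ('hasSportsName', ('new_ind1', 'hjName1'))]
```

The second candidate additionally hypothesizes PoleVaultName(hjName1); filtering such candidates is left to the consistency check of the interpretation process.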
3.2 The Media Interpretation Process
In the following, we devise an abstract computational engine for “interpreting” Abox assertions in terms of a given set of rules. Interpretation in this sense is not to be confused with the interpretation of a concept description (which is defined as a set of objects from the domain). Interpretation of Abox assertions w.r.t. a set of rules is meant in the sense that, using the rules, a high-level explanation is constructed such that the Abox assertions are entailed. The interpretation of an Abox is again an Abox. For instance, the output Abox might represent the results of a content interpretation process (see below for an example). The presentation in this paper slightly extends the one in [10].
Let Γ be an Abox of observations whose assertions are to be explained. The goal of the interpretation process is to use a set of rules R to derive “explanations” for elements in Γ. The interpretation algorithm implemented in the interpretation engine works on a set of (possible) interpretations I, i.e., a set of Aboxes.
Initially, I ⇐ {Γ}, where Γ contains assertions such as (pName1, country1) : personNameToCountry and (hjName1, city1) : sportsNameToCity; at this stage, the interpretation is just the input Abox Γ.² The complete multimedia interpretation process is implemented by the interpret function:
function interpret(Ω, Ξ, Σ, R, S, Γ):
  I′ := {Γ}
  repeat
    I := I′
    (A, α) := Ω(I)  // A ∈ I, α ∈ A such that requires_fiat(α) holds
    I′ := (I \ {A}) ∪ interpretation_step(Σ, R, S, A, α)
  until Ξ(I) or no A and α can be selected such that I′ ≠ I
  return I
It takes as parameters a strategy function Ω, a termination function Ξ, background knowledge Σ, a set of rules R, a scoring function S, and an Abox Γ of observations. It applies the strategy function Ω to decide which assertion to interpret, uses the termination function Ξ to check whether to terminate due to resource constraints, and uses the scoring function S to evaluate explanations.
The function Ω for the interpretation strategy and the function Ξ for the termination condition are used as oracles and must be defined in an application-specific way. In our multimedia interpretation scenario, we assume that the requires_fiat function is also defined in an application-specific way. The function interpretation_step is defined as follows.
interpretation_step(Σ, R, S, A, α) := ⋃_{∆ ∈ compute_all_explanations(Σ, R, S, A, α)} consistent_completed_explanations(Σ, R, A, ∆).
We need two additional auxiliary functions.
consistent_completed_explanations(Σ, R, A, ∆) := {∆′ | ∆′ = ∆ ∪ A ∪ forward_chain(Σ, R, ∆ ∪ A), consistent_Σ(∆′)}
compute_all_explanations(Σ, R, S, A, α) := maximize({∆ | ∆ = compute_explanation(Σ, R, A, α)}, S).
The function consistent_(T, A′)(A) determines whether the Abox A ∪ A′ has a model that is also a model of the Tbox T.
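The control flow of interpret and interpretation_step can be sketched with the application-specific parts passed in as functions. This is a minimal sketch in the spirit of the definitions, not the engine itself: Σ, R and S are folded into the hypothetical callables `explain` (standing in for compute_all_explanations), `forward_chain` and `consistent`, and the loop checks the termination oracle before each selection. The stubs in the example (a single fixed explanation, no derived assertions, everything consistent) are assumptions chosen only to exercise the loop.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Abox = FrozenSet[Atom]


def consistent_completed_explanations(abox: Abox, delta: Abox,
                                      forward_chain: Callable[[Abox], Set[Atom]],
                                      consistent: Callable[[Abox], bool]) -> Set[Abox]:
    """{Δ' | Δ' = Δ ∪ A ∪ forward_chain(Δ ∪ A), consistent(Δ')}."""
    base = delta | abox
    completed = frozenset(base | forward_chain(base))
    return {completed} if consistent(completed) else set()


def interpretation_step(abox: Abox, alpha: Atom,
                        explain: Callable[[Abox, Atom], Iterable[Abox]],
                        forward_chain: Callable[[Abox], Set[Atom]],
                        consistent: Callable[[Abox], bool]) -> Set[Abox]:
    """Union of the completed, consistent variants of every selected explanation."""
    result: Set[Abox] = set()
    for delta in explain(abox, alpha):
        result |= consistent_completed_explanations(abox, delta, forward_chain, consistent)
    return result


def interpret(gamma: Set[Atom],
              omega: Callable[[Set[Abox]], Optional[Tuple[Abox, Atom]]],
              xi: Callable[[Set[Abox]], bool],
              step: Callable[[Abox, Atom], Set[Abox]]) -> Set[Abox]:
    """Refine a set of interpretations until xi holds or nothing changes."""
    interpretations: Set[Abox] = {frozenset(gamma)}
    while not xi(interpretations):
        choice = omega(interpretations)   # (A, α) with requires_fiat(α), or None
        if choice is None:
            break
        abox, alpha = choice
        updated = (interpretations - {abox}) | step(abox, alpha)
        if updated == interpretations:
            break
        interpretations = updated
    return interpretations


obs = ("sportsNameToDate", ("hjName1", "date1"))
gamma = {("HighJumpName", ("hjName1",)), ("Date", ("date1",)), obs}
delta = frozenset({("HighJumpCompetition", ("new_ind1",)),
                   ("hasSportsName", ("new_ind1", "hjName1")),
                   ("hasDate", ("new_ind1", "date1"))})
done: Set[Atom] = set()


def explain(abox: Abox, alpha: Atom):      # hypothetical stub
    return [delta] if alpha == obs else []


def omega(interps: Set[Abox]):             # hypothetical strategy: next unexplained role assertion of Γ
    for a in interps:
        for alpha in sorted(gamma):
            if len(alpha[1]) == 2 and alpha not in done:
                done.add(alpha)
                return a, alpha
    return None


result = interpret(gamma, omega, lambda interps: False,
                   lambda a, alpha: interpretation_step(a, alpha, explain,
                                                        lambda x: set(), lambda x: True))
print(len(result))   # 1
```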
Depending on the application context, some of the observations can be taken for granted (bona-fide assertions), whereas others require explanations (we call them fiat assertions for brevity). The requires_fiat function is used to split Γ into bona-fide assertions Γ1 and fiat assertions Γ2: assertions for which requires_fiat returns true constitute Γ2, whereas Γ1 = Γ \ Γ2.
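The split itself is a simple partition. In the sketch below, the criterion `role_assertion_requires_fiat` (role assertions other than hasValue need an explanation) is a hypothetical, application-specific choice made only for illustration; the assertions are taken from Figure 2.

```python
from typing import Callable, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, arguments)


def split_observations(gamma: Set[Atom],
                       requires_fiat: Callable[[Atom], bool]) -> Tuple[Set[Atom], Set[Atom]]:
    """Return (Γ1, Γ2): the bona-fide assertions and the fiat assertions."""
    gamma2 = {alpha for alpha in gamma if requires_fiat(alpha)}
    return gamma - gamma2, gamma2


def role_assertion_requires_fiat(alpha: Atom) -> bool:
    """Hypothetical criterion: role assertions other than hasValue need an explanation."""
    pred, args = alpha
    return len(args) == 2 and pred != "hasValue"


gamma = {("Date", ("date1",)), ("hasValue", ("date1", "13 August 2002")),
         ("sportsNameToDate", ("hjName1", "date1")),
         ("personNameToCountry", ("pName1", "country1"))}
bona_fide, fiat = split_observations(gamma, role_assertion_requires_fiat)
print(sorted(fiat))
```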
We impose restrictions on the choice of the explanations (∆s) computed during the interpretation process. In particular, a scoring function S evaluates an explanation ∆ according to the two criteria proposed by Thagard for selecting explanations [11], namely simplicity and consilience. According to Thagard, the fewer hypothesized assertions an explanation contains (simplicity) and the more ground assertions (observations) it involves (consilience), the higher its preference score. The following function can be used to compute the preference score for a given explanation:³
S(Σ, Γ1, ∆) := Sf(Σ, Γ1, ∆) − Sh(∆)
The function Sf represents the number of bona-fide observations in Γ1 that follow from Σ ∪ ∆, and the function Sh represents the number of (hypothesized) assertions in the explanation. Thus, Sf and Sh can be defined as follows:
Sf(Σ, Γ1, ∆) := #{α ∈ Γ1 | Σ ∪ ∆ ⊨ α}
Sh(∆) := #{α ∈ ∆}
The function maximize selects those explanations (∆s) for which the score S(Σ, Γ1, ∆) is maximal, i.e., there exists no other candidate explanation ∆′ such that S(Σ, Γ1, ∆′) > S(Σ, Γ1, ∆).
² ⇐ denotes the assignment operator.
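A minimal sketch of S and maximize follows. Entailment Σ ∪ ∆ ⊨ α is approximated by ∆ plus one round of value-restriction propagation; the only axiom encoded is HighJumpCompetition ⊑ ∀hasName.HighJumpName from Figure 3, reading the rule role hasSportsName as hasName. Both the encoding and that reading are assumptions, as is the second candidate explanation, which is hypothetical and serves only as a comparison.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]

# Hypothetical encoding of C ⊑ ∀r.D as {(C, r): D}; hasSportsName is read as hasName.
VALUE_RESTRICTIONS: Dict[Tuple[str, str], str] = {
    ("HighJumpCompetition", "hasSportsName"): "HighJumpName",
}


def consequences(delta: Set[Atom]) -> Set[Atom]:
    """Crude stand-in for Σ ∪ Δ ⊨ α: Δ plus one round of ∀-propagation."""
    result = set(delta)
    for pred, args in delta:
        if len(args) == 2:
            subject, filler = args
            for (concept, role), range_concept in VALUE_RESTRICTIONS.items():
                if role == pred and (concept, (subject,)) in delta:
                    result.add((range_concept, (filler,)))
    return result


def score(gamma1: Set[Atom], delta: Set[Atom]) -> int:
    """S = Sf - Sh: observations in Γ1 that follow, minus hypothesized assertions."""
    entailed = consequences(delta)
    s_f = sum(1 for alpha in gamma1 if alpha in entailed)
    s_h = len(delta)
    return s_f - s_h


def maximize(candidates: List[Set[Atom]], gamma1: Set[Atom]) -> List[Set[Atom]]:
    """Keep the candidates whose score is not exceeded by any other candidate."""
    if not candidates:
        return []
    best = max(score(gamma1, d) for d in candidates)
    return [d for d in candidates if score(gamma1, d) == best]


gamma1 = {("HighJumpName", ("hjName1",)), ("Date", ("date1",))}
d_hj = {("HighJumpCompetition", ("new_ind1",)), ("hasSportsName", ("new_ind1", "hjName1")),
        ("hasDate", ("new_ind1", "date1"))}
d_other = {("SportsCompetition", ("new_ind1",)), ("hasSportsName", ("new_ind1", "hjName1")),
           ("hasDate", ("new_ind1", "date1")), ("SportsName", ("hjName1",))}
print(score(gamma1, d_hj), score(gamma1, d_other))   # -2 -4
```

Here d_hj wins on both criteria: it hypothesizes fewer assertions and, through the value restriction, also covers the observation hjName1 : HighJumpName.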
4 An Interpretation Example
We proceed with the interpretation of a text excerpt to discuss the details of the interpretation process. Figure 1 shows a text from a web page with athletics news; the underlined words are tokens that have to be detected by text analysis processes. The extraction result obtained by using the BOEMIE shallow text processing technology is depicted in Figure 2 as an Abox. Figure 3 shows a small part of the ontology (Σ) relevant for our discussion. Additionally, a small excerpt of the set of rules (R) used for the interpretation of texts about athletics is shown in Figure 4. The rules shown in Figure 4 define the space of abducibles.
‘13 August 2002 - Helsinki. Russia’s newly crowned European champion
Jaroslav Rybakov won the high jump with 2.29 m. Oskari Fronensis from Finland
cleared 2.26 and won silver.’
Fig. 1. A small excerpt from a web page.
The Abox in Figure 2 constitutes the set of observations Γ for the interpret function (see Section 3.2). The strategy function Ω selects the assertions that are fiat and therefore require the computation of explanations. In the current implementation, the strategy function selects all binary predicates shown in Figure 2 at the beginning of the process. Next, each assertion selected by the strategy function is transformed into a corresponding query, and the abductive retrieval inference service is asked for explanations. For example, from the role assertion (hjName1, date1) : sportsNameToDate the following query is derived:
Q1 := {() | sportsNameToDate(hjName1, date1)}
date1 : Date              (date1, ‘13 August 2002’) : hasValue
city1 : City              (city1, ‘Helsinki’) : hasValue
country1 : Country        (country1, ‘Russia’) : hasValue
country2 : Country        (country2, ‘Finland’) : hasValue
perf1 : Performance       (perf1, ‘2.29’) : hasValue
perf2 : Performance       (perf2, ‘2.26’) : hasValue
rank1 : Ranking           (rank1, ‘silver’) : hasValue
hjName1 : HighJumpName    (hjName1, ‘high jump’) : hasValue
pName1 : PersonName       (pName1, ‘Jaroslav Rybakov’) : hasValue
pName2 : PersonName       (pName2, ‘Oskari Fronensis’) : hasValue
(pName1, country1) : personNameToCountry
(pName2, country2) : personNameToCountry
(pName1, perf1) : personNameToPerformance
(pName2, perf2) : personNameToPerformance
(hjName1, perf1) : sportsNameToPerformance
(hjName1, date1) : sportsNameToDate
(hjName1, city1) : sportsNameToCity
Fig. 2. Abox representing the results of text analysis (Γ).
Person ⊑ ∃hasName.PersonName ⊓ ∃hasNationality.Country
Athlete ⊑ Person
HighJumper ⊑ Athlete
PoleVaulter ⊑ Athlete
HighJumpName ⊑ SportsName ⊓ ¬PoleVaultName
PoleVaultName ⊑ SportsName
SportsTrial ⊑ ∃≤1 hasParticipant.Athlete ⊓ ∃≤1 hasPerformance.Performance ⊓ ∃≤1 hasRanking.Ranking
HighJump ⊑ SportsTrial ⊓ ∀hasParticipant.HighJumper ⊓ ¬PoleVault
PoleVault ⊑ SportsTrial ⊓ ∀hasParticipant.PoleVaulter
SportsRound ⊑ ∃hasName.RoundName ⊓ ∃hasDate.Date ⊓ ∃hasPart.SportsTrial
HighJumpRound ⊑ SportsRound ⊓ ∀hasPart.HighJump ⊓ ¬PoleVaultRound
PoleVaultRound ⊑ SportsRound ⊓ ∀hasPart.PoleVault
SportsCompetition ⊑ ∃hasPart.SportsRound ⊓ ∃hasName.SportsName ⊓ ∃takesPlace.City
HighJumpCompetition ⊑ SportsCompetition ⊓ ∀hasPart.HighJumpRound ⊓ ∀hasName.HighJumpName ⊓ ¬PoleVaultCompetition
PoleVaultCompetition ⊑ SportsCompetition ⊓ ∀hasPart.PoleVaultRound ⊓ ∀hasName.PoleVaultName
Fig. 3. An example Tbox for the athletics domain (see http://www.boemie.org/ for an extended version).
³ For the sake of brevity, the parameters of S are not shown in the previous functions.
In the given set of rules R (see Figure 4), two rules have the predicate sportsNameToDate in the rule head. Therefore, both rules are applied in a backward-chaining way (i.e., from left to right): corresponding terms are unified, and we get variable bindings for X and Y. The unbound variable Z is instantiated with a fresh individual (e.g., new_ind1). Notice that for one of these rules, namely the one that hypothesizes a pole vault competition, every explanation (∆) produced is inconsistent w.r.t. Σ: X is bound to hjName1, an instance of HighJumpName, whereas the rule requires PoleVaultName(X), and the Tbox declares the concepts HighJumpName and PoleVaultName disjoint. The abductive retrieval service discards inconsistent explanations. Therefore, the generated explanation to answer Q1 with true is:
∆ = {new_ind1 : HighJumpCompetition, (new_ind1, hjName1) : hasSportsName, (new_ind1, date1) : hasDate}
personNameToCountry(X, Y) ← Person(Z), hasPersonName(Z, X), PersonName(X), hasNationality(Z, Y), Country(Y).
sportsNameToDate(X, Y) ← HighJumpCompetition(Z), hasSportsName(Z, X), HighJumpName(X), hasDate(Z, Y), Date(Y).
sportsNameToDate(X, Y) ← PoleVaultCompetition(Z), hasSportsName(Z, X), PoleVaultName(X), hasDate(Z, Y), Date(Y).
sportsCompetitionToPerformance(X, Y) ← SportsCompetition(X), hasSportsName(X, Z), SportsName(Z), sportsNameToPerformance(Z, Y).
sportsCompetitionToPerformance(X, Y) ← SportsCompetition(X), hasPart(X, Z), SportsRound(Z), hasPart(Z, W), SportsTrial(W), hasPerformance(W, Y).
Fig. 4. Text interpretation rules for the athletics domain (R).
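The discarding of the pole vault explanation can be illustrated with a deliberately small consistency check. This is a minimal sketch, not a DL consistency test: it only detects an individual that is explicitly asserted to belong to two concepts declared disjoint, and the only disjointness it knows is HighJumpName ⊑ ¬PoleVaultName from Figure 3. The two candidate ∆s are the ones obtained from the two sportsNameToDate rules.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]

# Disjointness taken from Figure 3: HighJumpName ⊑ ¬PoleVaultName.
DISJOINT: Set[FrozenSet[str]] = {frozenset({"HighJumpName", "PoleVaultName"})}


def clash_free(assertions: Set[Atom]) -> bool:
    """Detects only explicit clashes: one individual asserted in two disjoint concepts."""
    concepts: DefaultDict[str, Set[str]] = defaultdict(set)
    for pred, args in assertions:
        if len(args) == 1:
            concepts[args[0]].add(pred)
    return not any(pair <= names for names in concepts.values() for pair in DISJOINT)


abox = {("HighJumpName", ("hjName1",)), ("Date", ("date1",))}
candidates: List[Set[Atom]] = [
    {("HighJumpCompetition", ("new_ind1",)), ("hasSportsName", ("new_ind1", "hjName1")),
     ("hasDate", ("new_ind1", "date1"))},
    {("PoleVaultCompetition", ("new_ind1",)), ("hasSportsName", ("new_ind1", "hjName1")),
     ("PoleVaultName", ("hjName1",)), ("hasDate", ("new_ind1", "date1"))},
]
kept = [delta for delta in candidates if clash_free(abox | delta)]
print(len(kept))   # 1
```

In the full setting, the clash would also arise without the explicit PoleVaultName assertion, via PoleVaultCompetition ⊑ ∀hasName.PoleVaultName; detecting that requires the reasoner rather than this check.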
The assertions in ∆ are then added to the Abox A and the rules are applied
in a forward chaining way to find out whether new assertions can be inferred
and therefore have to be added to A as well. In our example there are no new
assertions that can be inferred at this stage.
The same procedure is applied to every assertion selected by the strategy function, until all fiat assertions are explained. Finally, an Abox with the following assertions is returned by the interpretation process:
new_ind1 : HighJumpCompetition, new_ind2 : Person, new_ind3 : Person,
new_ind4 : SportsRound, new_ind5 : SportsTrial, new_ind6 : SportsTrial,
(new_ind1, hjName1) : hasSportsName, (new_ind1, date1) : hasDate,
(new_ind1, new_ind4) : hasPart, (new_ind5, new_ind2) : hasParticipant,
(new_ind4, new_ind5) : hasPart, (new_ind5, perf1) : hasPerformance,
(new_ind6, new_ind3) : hasParticipant, (new_ind6, perf2) : hasPerformance,
(new_ind2, pName1) : hasPersonName, (new_ind2, country1) : hasNationality,
(new_ind3, pName2) : hasPersonName, (new_ind3, country2) : hasNationality
This Abox represents the semantic description (interpretation) of the multimedia content.
Notice that if interpretations of documents in a multimedia repository are available as Aboxes, DL reasoners can be used to realize semantic multimedia retrieval. For example, assume that a query for high jump trials Q2 := {(?X) | HighJumpTrial(?X)} is posed. A DL reasoner (e.g., RacerPro) would return new_ind5 as a binding for ?X, even though this information is not explicit in the Abox. Informally speaking, a high jump competition can only have high jump rounds as parts, and a high jump round can only have high jump trials as parts (see the definitions of the concepts HighJumpCompetition and HighJumpRound in the ontology depicted in Figure 3). Therefore, new_ind5 is necessarily a HighJumpTrial instance in all possible worlds. This simple example shows that DL reasoners are useful tools for detecting implicit information in multimedia interpretations and thus can be used to realize semantic multimedia retrieval.
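The inference behind Q2 can be mimicked by propagating value restrictions along hasPart to a fixpoint. This is a sketch under stated assumptions, not a DL reasoner: it applies only two axiom forms (C ⊑ ∀r.D and atomic C ⊑ D) in a hypothetical dictionary encoding, which is sound for these axioms but far from complete. In the Figure 3 excerpt, the trial concept reached by the second restriction is named HighJump, so the sketch queries HighJump.

```python
from typing import Dict, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]

# Axioms from Figure 3 in a hypothetical encoding:
# C ⊑ ∀r.D as (C, r) -> D, and atomic inclusions C ⊑ D as C -> D.
FORALL: Dict[Tuple[str, str], str] = {
    ("HighJumpCompetition", "hasPart"): "HighJumpRound",
    ("HighJumpRound", "hasPart"): "HighJump",
}
SUBSUMERS: Dict[str, str] = {
    "HighJumpRound": "SportsRound",
    "HighJump": "SportsTrial",
}


def saturate(abox: Set[Atom]) -> Set[Atom]:
    """Apply both axiom forms until no new assertion appears (sound, not complete)."""
    result = set(abox)
    while True:
        new: Set[Atom] = set()
        for pred, args in result:
            if len(args) == 1 and pred in SUBSUMERS:
                new.add((SUBSUMERS[pred], args))
            if len(args) == 2:
                for (concept, role), filler_concept in FORALL.items():
                    if role == pred and (concept, (args[0],)) in result:
                        new.add((filler_concept, (args[1],)))
        if new <= result:
            return result
        result |= new


def retrieve(concept: str, abox: Set[Atom]) -> Set[str]:
    """Instances of `concept` in the saturated Abox."""
    return {args[0] for pred, args in saturate(abox) if pred == concept and len(args) == 1}


abox = {("HighJumpCompetition", ("new_ind1",)), ("SportsRound", ("new_ind4",)),
        ("SportsTrial", ("new_ind5",)), ("SportsTrial", ("new_ind6",)),
        ("hasPart", ("new_ind1", "new_ind4")), ("hasPart", ("new_ind4", "new_ind5"))}
print(retrieve("HighJump", abox))   # {'new_ind5'}
```

new_ind6 is not returned, since the interpretation does not make it part of the high jump round new_ind4.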
5 Evaluation
In this section, we present an evaluation of an implementation of the multimedia
interpretation process presented in Section 3. The utility of the abduction-based
multimedia interpretation process is analyzed by means of precision, recall and
F-measure through an empirical evaluation of the results obtained for a corpus
of web pages.
Experimental Setting and Criteria. To test the approach, an ontology about the athletics domain was used, together with a corpus of 104 web pages, each containing daily news about athletics events.
The corpus has been annotated manually with a state-of-the-art annotation tool [12]. The manual annotation process has been accomplished in two steps. First, words in the text have been associated with corresponding concepts in the ontology. These concepts are PersonName, Country, City, Age, Gender, Performance, Ranking, SportsName, RoundName, Date and EventName; they are called mid-level concepts (MLCs) in the BOEMIE project. Second, the text segments annotated with mid-level concepts as labels are grouped, and each group is associated with a high-level concept (HLC) such as Athlete, SportsTrial, SportsRound, SportsEvent and SportsCompetition. The outcome of the manual annotation process is a set of annotations for the corpus. They contain not only low-level but also high-level descriptions of the content and serve as ground truth for the evaluation.
Later, the set of manually obtained annotations has been used to train the text analysis tools in order to automatically extract concept instances as well as relations between the instances from the corpus. The results of the analysis, which are analysis Aboxes, have been automatically interpreted following the presented multimedia interpretation approach. As a result, a set of automatically computed annotations for the same corpus is obtained. This set of annotations, which represents the results of automatic analysis and interpretation of the corpus, is evaluated in this section.
To set up the evaluation, a set of queries has been defined in order to ask
for the number of HLCi (high-level concept instances) in both manual and au-
tomatically computed annotations. In this way, names of high-level concepts
constitute the parameters to evaluate the precision and recall of the multimedia
interpretation framework.
Evaluation results. Table 1 shows the results of the experiments conducted. The letters M and AC represent manual and automatically computed annotations, respectively. The results in Table 1 show that most of the manually annotated high-level concepts (explanations) could also be computed automatically. There are exceptional cases in which no explanations have been computed due to the lack of necessary rules or of low-level text analysis results. For example, in the case of SportsRound, the interpretation process expects input about SportsRoundName and Date instances in the relation called sportsRoundNameToStartDate, but these structures are rarely found during text analysis. Therefore, the rule necessary to create instances of SportsRound is never applied.
Table 1. Evaluation of the multimedia interpretation approach for the text modality
HLC                  M     AC    M ∩ AC   Precision   Recall   F-Measure
Athlete              783   591   496      0.84        0.63     0.72
SportsTrial          729   641   513      0.80        0.70     0.74
SportsCompetition    443   200   188      0.94        0.43     0.80
SportsEvent          304   304   266      0.87        0.87     0.87
SportsRound          375   0     0        0           0        0
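The precision and recall columns follow from the counts: precision = |M ∩ AC| / |AC| and recall = |M ∩ AC| / |M|, which reproduces, e.g., 496/591 ≈ 0.84 and 496/783 ≈ 0.63 for Athlete. The sketch below recomputes the scores from the counts, assuming the F-measure is the usual harmonic mean of precision and recall and reporting undefined ratios as 0, as the SportsRound row does.

```python
# Counts copied from Table 1: (M, AC, M ∩ AC) per high-level concept.
COUNTS = {
    "Athlete": (783, 591, 496),
    "SportsTrial": (729, 641, 513),
    "SportsCompetition": (443, 200, 188),
    "SportsEvent": (304, 304, 266),
    "SportsRound": (375, 0, 0),
}


def precision_recall_f(manual: int, computed: int, common: int):
    """Precision = |M ∩ AC| / |AC|, recall = |M ∩ AC| / |M|, F = harmonic mean.

    Undefined ratios (division by zero) are reported as 0.0.
    """
    precision = common / computed if computed else 0.0
    recall = common / manual if manual else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


for hlc, counts in COUNTS.items():
    p, r, f = precision_recall_f(*counts)
    print(f"{hlc:18} P={p:.2f} R={r:.2f} F={f:.2f}")
```

Recomputed this way, most entries agree with Table 1 up to rounding; for SportsCompetition, however, the harmonic mean of P = 0.94 and R ≈ 0.42 is about 0.58 rather than the 0.80 listed.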
Furthermore, we observed that of a total of 200 extracted SportsCompetitions, five have been further specialized to HighJumpCompetitions and ten to PoleVaultCompetitions. This is due to low-level analysis results in which instances of the concepts HighJumpName and PoleVaultName have been found. As shown in Figure 3, the definition of HighJumpCompetition (PoleVaultCompetition) requires a HighJumpName (PoleVaultName) as a range restriction for the role hasName.
6 Conclusions
In this paper we have presented the first declarative way to formalize multimedia
interpretation based on abduction that is directly operational in the sense of
executable specifications. The interpretation engine specified in this paper is
evaluated empirically.
We have shown that, given the results of state-of-the-art multimedia analysis tools as low-level descriptions, the whole multimedia interpretation process can be realized by exploiting declarative domain models and highly optimized deductive and abductive reasoning services. We have discussed how existing DL reasoning mechanisms and rules can be combined in a coherent framework. We have also evaluated the approach using examples from the text modality and have shown that the current implementation provides promising results for web pages with news about athletics events.
References
1. Aliseda-Llera, A.: Seeking Explanations: Abduction in Logic, Philosophy of Science and Artificial Intelligence. PhD thesis, University of Amsterdam (1997)
2. Paul, G.: Approaches to Abductive Reasoning–An Overview. AI Review 7 (1993)
109–152
3. Mayer, M.C., Pirri, F.: First-Order Abduction via Tableau and Sequent Calculi. Bulletin of the IGPL 1(1) (1993) 99–117
4. Hobbs, J.R., Stickel, M., Appelt, D., Martin, P.: Interpretation as abduction.
Artificial Intelligence Journal Vol. 63 (1993) 69–142
5. Shanahan, M.: Perception as Abduction: Turning Sensor Data Into Meaningful
Representation. Cognitive Science 29(1) (2005) 103–134
6. Möller, R., Neumann, B.: Ontology-based Reasoning Techniques for Multimedia
Interpretation and Retrieval. In: Semantic Multimedia and Ontologies : Theory
and Applications. Springer (2008)
7. Neumann, B., Möller, R.: On Scene Interpretation with Description Logics. In
Christensen, H., Nagel, H.H., eds.: Cognitive Vision Systems: Sampling the Spec-
trum of Approaches. Number 3948 in LNCS. Springer (2006) 247–278
8. Kakas, A., Denecker, M.: Abduction in logic programming. In Kakas, A., Sadri,
F., eds.: Computational Logic: Logic Programming and Beyond. Part I. Number
2407 in LNAI. Springer (2002) 402–436
9. Colucci, S., Di Noia, T., Di Sciascio, E., Donini, F.M., Mongiello, M.: A uniform
tableaux-based method for concept abduction and contraction in description logics.
In: Proc. of the 16th Eur. Conf. on Artificial Intelligence (ECAI 2004). (2004) 975–
976
10. Castano, S., Espinosa, S., Ferrara, A., Karkaletsis, V., Kaya, A., Möller, R., Mon-
tanelli, S., Petasis, G., Wessel, M.: Multimedia interpretation for dynamic ontology
evolution. In: Journal of Logic and Computation, Oxford University Press (2008)
11. Thagard, P.R.: The best explanation: Criteria for theory choice. The Journal of Philosophy (1978)
12. Petasis, G., Karkaletsis, V., Paliouras, G., Androutsopoulos, I., Spyropoulos, C.D.:
Ellogon: A new text engineering platform (2002)