<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Formalizing Multimedia Interpretation based on Abduction over Description Logic Aboxes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sofia Espinosa Peraldi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Atila Kaya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralf Möller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hamburg University of Technology</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper describes how interpretations of multimedia documents can be formally derived using abduction over domain knowledge represented in an ontology. The approach uses an expressive ontology specification language, namely description logics in combination with logic programming rules, and formalizes the multimedia interpretation process using a combined abduction and deduction operation. We describe how the observables as well as the space of abducibles can be formally defined. The approach is evaluated using examples from text processing, but can also be applied to interpret content in other modalities.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Multimedia interpretation can be defined as the process of extracting deep-level
semantics from content. Deep-level semantics represent the abstract meaning
of the content, and they are the result of reasoning processes over background
knowledge. Automatic interpretation of multimedia content for information
extraction (IE) purposes is becoming a major research issue due to the need
to reduce costs in the areas of knowledge management and information retrieval.
The goal is to automate interpretation based on formal models tailored towards
a particular domain.</p>
      <p>The basic idea of the multimedia interpretation approach that we pursue
in this paper is that content descriptions derived by low-level analysis
processes should be supported by constructing one or multiple high-level
“explanations”. The approach has been explored in the EU-funded projects BOEMIE
(http://www.boemie.org) and CASAM (http://www.casam-project.eu).</p>
      <p>Since the goal is to derive high-level descriptions with new objects and
assertions among new and “old” objects, deduction alone is not an appropriate
formalization. In addition to deduction, an abduction step is required, as has
been recognized several times in the literature.</p>
      <p>
        – First techniques for finding abductive explanations have been discussed long
ago [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] Mayer and Pirri present a semantic tableau for abduction in
the context of first-order logic using ‘reversed skolemization’. So far, only
the formalization of this problem has been shown; the algorithms are not practical.
(This paper has been partially supported by the projects BOEMIE and CASAM,
FP6-027538 and FP7-217061, respectively.)
– In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] Hobbs et al. formalize text interpretation as abduction. In this work,
Hobbs also argues that deduction is not enough and describes how high-level
text interpretation can be realized as abduction over predicate logic
formulae. Since predicate logic reasoning is undecidable in general, and the
performance of reasoning engines is rather low (if they terminate for a particular
problem), we pursue an approach that uses a less expressive but decidable
formalism for which highly optimized reasoners exist (e.g., RacerPro).
– In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Shanahan presents a formal theory of robot perception as a form of
abduction. In this work, low-level sensor data is transformed into a symbolic
representation using first-order logic, and abduction is used to derive
explanations. Logic-based modeling is used to describe the behavior of procedural
programs in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], however.
– In the context of scene interpretation, Möller and Neumann recently
proposed the use of DLs for the representation of aggregates that can be used
by reasoning services as building blocks for the scene interpretation process
[
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. However, in these works, description logics are used only to theoretically
describe multimedia interpretation processes.
      </p>
      <p>
        The contribution of this paper is the first declarative way to formalize multimedia
interpretation that is directly operational in the sense of executable
specifications. The paper describes how the “observables requiring explanation” as well
as the “space of abducibles” can be formally defined using logic programming
techniques. In contrast to approaches such as [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which use abduction in the
context of rules in logic programming only, or approaches which deal only with
concept abduction [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], our approach combines existing DL reasoning
mechanisms and rules in a coherent framework, i.e., abduction is considered as a new
type of non-standard Abox retrieval inference service, which is integrated into
an existing DL reasoner. The approach is evaluated using examples from text
processing, but can also be applied to interpret multimedia documents in other
modalities.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>For studying interpretation as abduction over ontologies we focus on the
description logic ALCQ. We assume that the reader is familiar with description logics,
and we only introduce specific operators necessary for Abox abduction.</p>
      <p>2.1 Sequences, Variable Substitutions and Transformations</p>
      <p>For the introduction of the interpretation algorithm, we need some additional
definitions. A variable is a name of the form ?name, where name is a string of
characters from {a..z}. In the following definitions, we denote places where variables
can appear with uppercase letters.</p>
      <p>Let V be a set of variables, and let X, Y1, …, Yn be sequences ⟨…⟩ of
variables from V. Z denotes a sequence of individuals. We consider sequences of
length 1 or 2 only, if not indicated otherwise, and assume that (⟨X⟩) is to be
read as (X) and (⟨X, Y⟩) is to be read as (X, Y), etc. Furthermore, we assume
that sequences are automatically flattened. A function as_set turns a sequence
into a set in the obvious way.</p>
      <p>A variable substitution σ = [X → i, Y → j, …] is a mapping from variables
to individuals. The application of a variable substitution σ to a sequence of
variables ⟨X⟩ or ⟨X, Y⟩ is defined as ⟨σ(X)⟩ or ⟨σ(X), σ(Y)⟩, respectively, with
σ(X) = i and σ(Y) = j. In this case, a sequence of individuals is defined. If a
substitution σ is applied to a variable X for which there exists no mapping X → k
in σ, then the result is undefined. A variable substitution for which all required mappings are
defined is called admissible (w.r.t. the context).</p>
      <p>2.2 Grounded Conjunctive Queries</p>
      <p>Let X, Y1, …, Yn be sequences of variables, and let Q1, …, Qn denote atomic concept
or role descriptions.</p>
      <p>A query is defined by the following syntax:</p>
      <p>{(X) | Q1(Y1), …, Qn(Yn)}
The sequence X may be of arbitrary length, but all variables mentioned in X
must also appear in at least one of the Y1, …, Yn: as_set(X) ⊆ as_set(Y1) ∪ … ∪
as_set(Yn).</p>
      <p>Informally speaking, Q1(Y1), …, Qn(Yn) defines a conjunction of so-called
query atoms Qi(Yi). The list of variables to the left of the sign | is called the
head, and the atoms to the right of | are called the query body. The variables in
the head are called distinguished variables. They define the query result. The
variables that appear only in the body are called non-distinguished variables and
are existentially quantified.</p>
      <p>Answering a query with respect to an ontology means finding admissible
variable substitutions σ such that the ontology entails {(σ(Y1)) : Q1, …, (σ(Yn)) : Qn}. We say
that a variable substitution σ = [X → i, Y → j, …] introduces bindings i, j, …
for the variables X, Y, …. Given all possible variable substitutions σ, the result of a
query is defined as {(σ(X))}.
Note that the variable substitution is applied before checking whether the entailment
{(σ(Y1)) : Q1, …, (σ(Yn)) : Qn} holds, i.e., the query is grounded first.</p>
      <p>For a query {(?y) | Person(?x), hasParticipant(?y, ?x)} and the Abox A1
= {ind1 : HighJump, ind2 : Person, (ind1, ind2) : hasParticipant}, the
substitution [?x → ind2, ?y → ind1] allows for answering the query, and defines
bindings for ?y and ?x.</p>
      <p>A boolean query is a query with X being of length zero. If for a boolean query
there exists a variable substitution σ such that the entailment {(σ(Y1)) : Q1, …, (σ(Yn)) : Qn}
holds, we say that the query is answered with true; otherwise the answer
is false.</p>
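<p>To make the definitions concrete, the following illustrative Python sketch (not part of the original formalization; the tuple encoding of assertions is an assumption made here) answers grounded conjunctive queries over the example Abox by enumerating candidate substitutions. Entailment is approximated by membership in the Abox, which suffices for ground atomic assertions without background axioms:</p>

```python
from itertools import product

# Toy Abox: ind : C stored as ("C", ("ind",)); (i, j) : R as ("R", (i, j)).
ABOX = {("HighJump", ("ind1",)),
        ("Person", ("ind2",)),
        ("hasParticipant", ("ind1", "ind2"))}
INDIVIDUALS = ["ind1", "ind2"]

def answer(head, body):
    """Grounded conjunctive query answering by enumeration: build every
    candidate substitution, ground the query first, and only then check
    each atom against the Abox."""
    variables = sorted({v for _, args in body for v in args})
    results = []
    for values in product(INDIVIDUALS, repeat=len(variables)):
        sigma = dict(zip(variables, values))          # candidate substitution
        if all((pred, tuple(sigma[v] for v in args)) in ABOX
               for pred, args in body):
            results.append(tuple(sigma[v] for v in head))
    return results

# {(?y) | Person(?x), hasParticipant(?y, ?x)} from the running example:
body = [("Person", ("?x",)), ("hasParticipant", ("?y", "?x"))]
print(answer(("?y",), body))    # [('ind1',)]
print(answer((), body) != [])   # boolean query: prints True
```

<p>Grounding before checking mirrors the definition above: the substitution is applied first, and only then is each query atom tested.</p>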
      <p>Later on, we will have to convert query atoms into Abox assertions. This is
done with the function transform. The function transform applied to a set of
query atoms {γ1, …, γn} is defined as {transform(γ1, σ), …, transform(γn, σ)},
where transform(P(X), σ) := (σ(X)) : P.</p>
      <p>2.3 Rules</p>
      <p>A rule r has the following form: P(X) ← Q1(Y1), …, Qn(Yn), where P, Q1, …, Qn
denote atomic concept or role descriptions, with the additional restriction (safety
condition) that as_set(X) ⊆ as_set(Y1) ∪ … ∪ as_set(Yn).</p>
      <p>Rules are used to derive new Abox assertions, and we say that a rule r is
applied to an Abox A. The function call apply(Σ, P(X) ← Q1(Y1), …, Qn(Yn), A)
returns a set of Abox assertions {(σ(X)) : P} if there exists an admissible
variable substitution σ such that the answer to the query</p>
      <p>{() | Q1(Y1), …, Qn(Yn)}
is true with respect to Σ ∪ A.¹ If no such σ can be found, the result of the call to
apply(Σ, r, A) is the empty set. The application of a set of rules R = {r1, …, rn}
to an Abox is defined as follows.</p>
      <p>apply(Σ, R, A) = ⋃_{r ∈ R} apply(Σ, r, A)
The result of forward_chain(Σ, R, A) is ∅ if apply(Σ, R, A) ∪ A = A, and
apply(Σ, R, A) ∪ forward_chain(Σ, R, A ∪ apply(Σ, R, A)) otherwise.</p>
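<p>Under the same hypothetical tuple encoding as before, apply and forward_chain can be sketched as a naive fixpoint computation. This is an illustration only: query answering is again approximated by Abox membership, and the background knowledge Σ is not consulted.</p>

```python
from itertools import product

def answer(body, abox, individuals):
    """Yield every substitution (as a dict) that grounds the body inside the Abox."""
    variables = sorted({v for _, args in body for v in args})
    for values in product(sorted(individuals), repeat=len(variables)):
        sigma = dict(zip(variables, values))
        if all((pred, tuple(sigma[v] for v in args)) in abox
               for pred, args in body):
            yield sigma

def apply_rule(rule, abox):
    """apply(Sigma, P(X) ← Q1(Y1), ..., Qn(Yn), A): one grounded head
    assertion per admissible substitution that makes the body true."""
    (head_pred, head_args), body = rule
    individuals = {i for _, args in abox for i in args}
    return {(head_pred, tuple(sigma[v] for v in head_args))
            for sigma in answer(body, abox, individuals)}

def forward_chain(rules, abox):
    """Iterate rule application to a fixpoint; return only the new assertions."""
    derived = set()
    while True:
        new = set().union(*(apply_rule(r, abox | derived) for r in rules))
        new -= abox | derived
        if not new:
            return derived
        derived |= new

RULES = [(("Participant", ("?y",)), [("hasParticipant", ("?x", "?y"))])]
print(forward_chain(RULES, {("hasParticipant", ("event1", "person1"))}))
# {('Participant', ('person1',))}
```

<p>As in the definition, the recursion stops as soon as a round of rule application adds nothing new to the Abox.</p>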
    </sec>
    <sec id="sec-3">
      <title>Multimedia Interpretation</title>
      <p>The multimedia interpretation process aims to compute interpretations (sets
of descriptions) of a multimedia document based on low-level descriptions and
background knowledge. Besides low-level descriptions of objects and their
relations, an interpretation also contains high-level descriptions of abstract
objects, such as events, and their relations with low-level descriptions. High-level
descriptions cannot be inferred directly from low-level descriptions; they have
to be hypothesized w.r.t. some background knowledge.</p>
      <p>In the rest of this section, we first formalize abduction as the key
inference service for explaining observations, and we then describe the multimedia
interpretation process that exploits both abduction and deduction to compute
interpretations.</p>
      <p>3.1 Computing Explanations via Abduction</p>
      <p>In general, abduction is formalized as Σ ∪ Δ ⊨ Γ, where the background knowledge
(Σ) and the observations (Γ) are given, and explanations (Δ) are to be computed.
In terms of DLs, Γ and Δ are Aboxes and Σ is a TBox.</p>
      <p>Abox abduction is implemented as a non-standard retrieval inference service
in DLs. In contrast to standard retrieval inference services, where answers are
found by exploiting the ontology, Abox abduction has the task of acquiring
what should be added to the ontology in order to answer a query. Therefore, the
result of Abox abduction is a set of hypothesized Abox assertions. To achieve
this, the space of abducibles has to be defined beforehand, and we do this in terms
of rules.
¹ If Σ ∪ A is inconsistent, the result is well-defined but useless. It will not be used
afterwards.</p>
      <p>We assume that a set of rules R as defined above (see Section 2.3) is
specified, and define a non-deterministic function compute_explanation as follows.
– compute_explanation(Σ, R, A, (Z) : P) = transform(Φ, σ) if there exists a
rule r = P(X) ← Q1(Y1), …, Qn(Yn) ∈ R that is applied to an Abox A such
that a set of query atoms Φ and an admissible variable substitution σ with
σ(X) = Z can be found, and the query Q := {() | expand(P(X), r, R) \ Φ}
is answered with true.
– If no such rule r exists in R, it holds that compute_explanation(Σ, R, A, (Z) :
P) = ∅.</p>
      <p>The goal of the function compute_explanation is to determine what must
be added (Δ) such that the entailment Σ ∪ A ∪ Δ ⊨ (Z) : P holds. Hence, for
compute_explanation, abductive reasoning is used. The set of query atoms Φ
defines what must be hypothesized in order to answer the query Q with true
such that expand(P(X), r, R) holds. The definition of compute_explanation
is non-deterministic due to several possible choices for Φ.</p>
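<p>A minimal sketch of the abductive step for a single rule, under the same hypothetical tuple encoding used earlier (recursive backward chaining through several rules, i.e. expand, is omitted for brevity):</p>

```python
from itertools import count

fresh = count(1)

def compute_explanation(rule, abox, observation):
    """Single-rule abductive step: bind the head variables of
    P(X) ← Q1(Y1), ..., Qn(Yn) to the observed individuals, instantiate
    the remaining body variables with fresh individuals, and return as
    Delta every grounded body atom not already in the Abox."""
    (head_pred, head_args), body = rule
    obs_pred, obs_args = observation
    if head_pred != obs_pred or len(head_args) != len(obs_args):
        return None                                  # rule head does not match
    sigma = dict(zip(head_args, obs_args))           # sigma(X) = Z
    for _, args in body:
        for v in args:
            if v not in sigma:                       # unbound → fresh individual
                sigma[v] = "new_ind%d" % next(fresh)
    grounded = {(pred, tuple(sigma[v] for v in args)) for pred, args in body}
    return grounded - abox                           # hypothesize only what is new

# Running example: explain (hjName1, date1) : sportsNameToDate with the
# high-jump rule from Figure 4.
rule = (("sportsNameToDate", ("?x", "?y")),
        [("HighJumpCompetition", ("?z",)),
         ("hasSportsName", ("?z", "?x")),
         ("HighJumpName", ("?x",)),
         ("hasDate", ("?z", "?y")),
         ("Date", ("?y",))])
abox = {("HighJumpName", ("hjName1",)), ("Date", ("date1",))}
delta = compute_explanation(rule, abox, ("sportsNameToDate", ("hjName1", "date1")))
# delta hypothesizes a fresh HighJumpCompetition linked to the observed
# name and date, mirroring the explanation computed later in this section.
```

<p>Applying the second sportsNameToDate rule instead would hypothesize a pole vault competition; with disjointness axioms in the Tbox, a consistency check can then discard that alternative.</p>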
      <p>The function application expand(P(X), P(X) ← Q1(Y1), …, Qn(Yn), R) is
also defined in a non-deterministic way as
expand′(Q1(Y1), R) ∪ … ∪ expand′(Qn(Yn), R)
with expand′(P(X), R) being expand(P(X), r, R) if there exists a rule
r = P(X) ← … ∈ R, and ⟨P(X)⟩ otherwise. We say the set of rules is backward-chained, and
since there might be multiple rules in R, backward-chaining is non-deterministic
as well. Thus, multiple explanations are generated.</p>
      <p>3.2 The Media Interpretation Process</p>
      <p>
        In the following we devise an abstract computational engine for “interpreting”
Abox assertions in terms of a given set of rules. Interpretation in this sense is not
to be confused with the interpretation of a concept description (which is defined
as a set of objects from the domain). Interpretation of Abox assertions w.r.t. a
set of rules is meant in the sense that, using the rules, some high-level explanation
is constructed such that the Abox assertions are entailed. The interpretation of
an Abox is again an Abox. For instance, the output Abox might represent results
of a content interpretation process (see below for an example). The presentation
in this paper slightly extends the one in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Let Γ be an Abox of observations whose assertions are to be explained. The
goal of the interpretation process is to use a set of rules R to derive
“explanations” for elements in Γ. The interpretation algorithm implemented in the
interpretation engine works on a set of (possible) interpretations I, i.e., a set of
Aboxes.</p>
      <p>Initially, I ← {Γ}, i.e. {(pName1, country1) : personNameToCountry,
(hjName1, city1) : sportsNameToCity}; at this stage, the interpretation is
just the input Abox Γ.² The complete multimedia interpretation process is
implemented by the interpret function:</p>
      <p>function interpret(Ω, Ξ, Σ, R, S, Γ):
I′ := {Γ}
repeat
I := I′
(A, α) := Ω(I) // A ∈ I, α ∈ A such that requires_fiat(α) holds
I′ := (I \ {A}) ∪ interpretation_step(Σ, R, S, A, α)
until Ξ(I) or no A and α can be selected such that I′ ≠ I
return I</p>
      <p>It takes as parameters a strategy function Ω, a termination function Ξ, the
background knowledge Σ, a set of rules R, a scoring function S, and an Abox Γ of
observations. It applies the strategy function Ω in order to decide which
assertion to interpret, uses the termination function Ξ in order to check whether to
terminate due to resource constraints, and uses the scoring function S to evaluate an
explanation.</p>
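<p>The loop just described can be sketched as follows. This is an illustrative skeleton under assumed names: strategy and terminate stand for the strategy and termination oracles, step stands for interpretation_step, and the background knowledge is folded into step:</p>

```python
def interpret(strategy, terminate, rules, score, observations, step):
    """Skeleton of the interpret loop: repeatedly pick an interpretation
    Abox A and a fiat assertion alpha via the strategy oracle, and replace
    A by the successor Aboxes that step produces for it."""
    interpretations = [set(observations)]            # I ← {Gamma}
    while not terminate(interpretations):
        choice = strategy(interpretations)           # (A, alpha) or None
        if choice is None:
            break                                    # nothing left to explain
        abox, alpha = choice
        successors = step(rules, score, abox, alpha)
        interpretations = [a for a in interpretations if a is not abox] + successors
    return interpretations
```

<p>With a strategy that selects nothing, the loop returns the input observations unchanged, matching the base case of the pseudocode.</p>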
      <p>The function Ω for the interpretation strategy and the function Ξ for the termination
condition are used as oracles and must be defined in an application-specific way.
In our multimedia interpretation scenario we assume that the requires_fiat
function is also defined in an application-specific way. The function interpretation_step
is defined as follows.</p>
      <p>interpretation_step(Σ, R, S, A, α):
⋃_{Δ ∈ compute_all_explanations(Σ, R, S, A, α)} consistent_completed_explanations(Σ, R, A, Δ)</p>
      <p>We need two additional auxiliary functions.
consistent_completed_explanations(Σ, R, A, Δ):
{Δ′ | Δ′ = Δ ∪ A ∪ forward_chain(Σ, R, Δ ∪ A), consistentΣ(Δ′)}
compute_all_explanations(Σ, R, S, A, α):
maximize({Δ | Δ = compute_explanation(Σ, R, A, α)}, S)</p>
      <p>The function consistent(T, A′)(A) determines whether the Abox A ∪ A′ has a model
which is also a model of the Tbox T.</p>
      <p>Depending on the application context, some of the observations can be taken
for granted (bona-fide assertions), whereas others are seen as requiring explanations (we
call them fiat assertions for brevity). The requires_fiat function is used to split Γ
into bona-fide assertions Γ1 and fiat assertions Γ2. Assertions for which the
requires_fiat function returns true constitute Γ2, whereas Γ1 = Γ \ Γ2.</p>
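<p>A stand-in requires_fiat consistent with the strategy used later (Section 3.2), where all binary (role) assertions are selected for explanation, might look as follows (the tuple encoding of assertions is an assumption of this sketch):</p>

```python
def requires_fiat(assertion):
    """Stand-in: treat role (binary) assertions as fiat, i.e. as requiring
    explanation; concept (unary) assertions are taken to be bona-fide."""
    _, args = assertion
    return len(args) == 2

# Split the observations Gamma into Gamma1 (bona-fide) and Gamma2 (fiat):
gamma = {("Person", ("ind2",)), ("hasParticipant", ("ind1", "ind2"))}
gamma2 = {a for a in gamma if requires_fiat(a)}   # fiat assertions
gamma1 = gamma - gamma2                           # bona-fide assertions
```
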
      <p>
        ² ← denotes the assignment operator.
We impose restrictions on the choice of the explanations (Δs) computed
during the interpretation process. In particular, a scoring function S evaluates an
explanation according to two criteria proposed by Thagard for
selecting explanations [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], namely simplicity and consilience. According to Thagard,
the fewer hypothesized assertions an explanation contains (simplicity) and the
more ground assertions (observations) an explanation involves (consilience), the
higher its preference score. The following function can be used to compute the
preference score for a given explanation³: S(Δ, Γ1, Σ) := Sf(Δ, Γ1, Σ) − Sh(Δ).
The function Sf represents the number of assertions in the explanation (Δ) that
follow from Σ ∪ Γ1, and the function Sh represents the number of assertions in
the explanation. Thus, Sf and Sh can be defined as follows:
Sf(Δ, Γ1, Σ) := #{γ ∈ Δ | Σ ∪ Γ1 ⊨ γ} and Sh(Δ) := #Δ.
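Read this way, the score can be sketched as follows (an illustration; follows is a stand-in for the entailment test Σ ∪ Γ1 ⊨ γ):

```python
def score_explanation(delta, follows):
    """S(Delta) = Sf − Sh, with Sf the number of assertions of Delta that
    follow from Sigma and the bona-fide observations Gamma1 (consilience),
    and Sh the total number of assertions in Delta (simplicity)."""
    s_f = sum(1 for gamma in delta if follows(gamma))
    s_h = len(delta)
    return s_f - s_h
```

Explanations that hypothesize fewer assertions and cover more observations score higher; maximize then keeps only the best-scoring Δs.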
We proceed with the interpretation of a text excerpt to discuss the details of
the interpretation process. Figure 1 shows a text from a web page with athletics
news; the underlined words are tokens which have to be detected by text analysis
processes. The extraction result obtained by using the BOEMIE shallow text
processing technology is depicted in Figure 2 as an Abox. In Figure 3 a small
part of the ontology (Σ) relevant for our discussion is shown. Additionally, a
small excerpt of the set of rules (R) used for the interpretation of texts about
the athletics example is shown in Figure 4. The rules shown in Figure 4 define
the space of abducibles.
      </p>
      <p>`13 August 2002 - Helsinki. Russia's newly crowned European champion
Jaroslav Rybakov won the high jump with 2.29 m. Oskari Fronensis from Finland
cleared 2.26 and won silver.'</p>
      <p>The Abox in Figure 2 constitutes the set of observations Γ for the interpret
function (see Section 3.2). The strategy function selects the assertions that are
fiat and therefore require the computation of explanations. In the current
implementation, the strategy function selects all binary predicates shown in Figure 2
in the beginning of the process. Next, each assertion selected by the strategy
function is transformed into a corresponding query, and the abductive retrieval
inference service is asked for explanations. For example, from the role assertion
(hjName1, date1) : sportsNameToDate the following query Q1 is derived:
Q1 := {() | sportsNameToDate(hjName1, date1)}
³ For the sake of brevity the parameters of S are not shown in the previous functions.
In the given set of rules R (see Figure 4), two rules have the predicate
sportsNameToDate in the rule head. Therefore, both rules are applied in a
backward-chaining way (i.e. from left to right), the corresponding terms are
unified, and we get variable bindings for X and Y. The unbound variable Z is
instantiated with a fresh individual (e.g. new_ind1). Notice that for one of these
rules, namely for the one that hypothesizes a pole vault competition, all
bindings that are found for Y produce explanations (Δs) that are inconsistent w.r.t. Σ.</p>
      <p>This is caused by the disjointness axioms in the Tbox (e.g. the concepts
HighJumpName and PoleVaultName are disjoint). The abductive retrieval
service discards inconsistent explanations. The rule excerpt of Figure 4 reads:
personNameToCountry(X, Y) ← Person(Z), hasPersonName(Z, X), PersonName(X), hasNationality(Z, Y), Country(Y).
sportsNameToDate(X, Y) ← HighJumpCompetition(Z), hasSportsName(Z, X), HighJumpName(X), hasDate(Z, Y), Date(Y).
sportsNameToDate(X, Y) ← PoleVaultCompetition(Z), hasSportsName(Z, X), PoleVaultName(X), hasDate(Z, Y), Date(Y).
sportsCompetitionToPerformance(X, Y) ← SportsCompetition(X), hasSportsName(X, Z), SportsName(Z), sportsNameToPerformance(Z, Y).
sportsCompetitionToPerformance(X, Y) ← SportsCompetition(X), hasPart(X, Z), SportsRound(Z), hasPart(Z, W), SportsTrial(W), hasPerformance(W, Y).
Therefore, the generated explanation to answer Q1 with true is:
Δ = {new_ind1 : HighJumpCompetition, (new_ind1, hjName1) : hasSportsName,
(new_ind1, date1) : hasDate}
The assertions in Δ are then added to the Abox A, and the rules are applied
in a forward-chaining way to find out whether new assertions can be inferred
and therefore have to be added to A as well. In our example there are no new
assertions that can be inferred at this stage.</p>
      <p>The same procedure is applied to every assertion selected by the strategy
function, until all assertions considered as fiat are explained. Finally,
an Abox with the following assertions is returned by the interpretation process:
new_ind1 : HighJumpCompetition, new_ind2 : Person, new_ind3 : Person,
new_ind4 : SportsRound, new_ind5 : SportsTrial, new_ind6 : SportsTrial,
(new_ind1, hjName1) : hasSportsName, (new_ind1, date1) : hasDate,
(new_ind1, new_ind4) : hasPart, (new_ind5, new_ind2) : hasParticipant,
(new_ind4, new_ind5) : hasPart, (new_ind5, perf1) : hasPerformance,
(new_ind6, new_ind3) : hasParticipant, (new_ind6, perf2) : hasPerformance,
(new_ind2, pName1) : hasPersonName, (new_ind2, country1) : hasNationality,
(new_ind3, pName2) : hasPersonName, (new_ind3, country2) : hasNationality,
which represents the semantic description (interpretation) of the multimedia
content.</p>
      <p>Notice that if interpretations of documents in a multimedia repository are
available as Aboxes, DL reasoners can be used to realize semantic multimedia
retrieval. For example, assume that a query for high jump trials Q2 := {(?X) |
HighJumpTrial(?X)} is posed. A DL reasoner (e.g. RacerPro) would return
new_ind5 as a binding for ?X, even though this information is not explicit in
the Abox. Informally speaking, a high jump competition can only have high
jump rounds as parts, and a high jump round can only have high jump
trials as parts (see the definitions of the concepts HighJumpCompetition and
HighJumpRound in the ontology depicted in Figure 3). Therefore it is certain
that new_ind5 has to be a HighJumpTrial instance in all possible worlds. This
simple example shows that DL reasoners are useful tools for detecting implicit
information in multimedia interpretations and thus can be used to realize semantic
multimedia retrieval.</p>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>In this section, we present an evaluation of an implementation of the multimedia
interpretation process presented in Section 3. The utility of the abduction-based
multimedia interpretation process is analyzed by means of precision, recall and
F-measure through an empirical evaluation of the results obtained for a corpus
of web pages.</p>
      <p>Experimental Setting and Criteria. To test the approach, an ontology about
the athletics domain was used, as well as a corpus of 104 web pages, each
containing daily news about athletics events.</p>
      <p>
        The corpus has been annotated manually with a state-of-the-art annotation
tool [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The manual annotation process has been accomplished in two steps:
First, words in the text have been associated with corresponding concepts in the
ontology. These concepts are: PersonName, Country, City, Age, Gender,
Performance, Ranking, SportsName, RoundName, Date and EventName. They are
called mid-level concepts (MLCs) in the BOEMIE project. Second, the text
segments annotated with mid-level concepts as labels are grouped and each group
is associated with a high-level concept (HLC) such as Athlete, SportTrial,
SportsRound, SportEvents and SportsCompetition. The outcome of the manual
annotation process is a set of annotations for the corpus. They contain not only
low-level but also high-level descriptions of the content and serve as ground truth
for the evaluation.
      </p>
      <p>Later, the set of manually obtained annotations has been used to train the
text analysis tools in order to automatically extract concept instances, as well as
relations between the instances, from the corpus. The results of analysis, which
are analysis Aboxes, have been automatically interpreted following the
multimedia interpretation approach presented above. As a result, a set of automatically
computed annotations for the same corpus is obtained. This set of annotations,
which represents the results of automatic analysis and interpretation of the
corpus, is evaluated in this section.</p>
      <p>To set up the evaluation, a set of queries has been defined in order to ask
for the number of HLCi (high-level concept instances) in both manual and
automatically computed annotations. In this way, names of high-level concepts
constitute the parameters to evaluate the precision and recall of the multimedia
interpretation framework.</p>
      <p>Evaluation results Table 1 shows the results of the experiments conducted.
The letters M and AC represent manual and automatically computed
annotations respectively. The results in Table 1 show that most of the manually
annotated high-level concepts (explanations) could also be computed automatically.
There are exceptional cases in which no explanations have been computed, due
to the lack of necessary rules or the lack of low-level text analysis results. For
example, in the case of SportsRound, the interpretation process expects input about
SportsRoundName and Date instances in the relation called
sportsRoundNameToStartDate, but these structures are rarely found during
text analysis. Therefore, the rule necessary to create instances of SportsRound
is never applied.
Furthermore, we observed that from a total of 200 extracted
SportsCompetitions, five have been further specialized to HighJumpCompetitions
and ten to PoleVaultCompetitions. This is due to low-level analysis results
where instances of the concepts HighJumpName and PoleVaultName have
been found. As observed in Figure 3, the definition of a HighJumpCompetition
(PoleVaultCompetition) requires a HighJumpName (PoleVaultName) as a
range restriction for the role hasName.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper we have presented the first declarative way to formalize multimedia
interpretation based on abduction that is directly operational in the sense of
executable specifications. The interpretation engine specified in this paper is
evaluated empirically.</p>
      <p>We have shown that, given the results of state-of-the-art multimedia
analysis tools as low-level descriptions, the whole multimedia interpretation process
can be realized by exploiting declarative domain models and highly optimized
deductive and abductive reasoning services. We have discussed how existing DL
reasoning mechanisms and rules can be combined in a coherent framework. We
have also evaluated the approach using examples from the text modality and
have shown that the current implementation provides promising results for web
pages with news about athletics events.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aliseda-Llera</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Seeking Explanations: Abduction in Logic, Philosophy of Science and Artificial Intelligence</article-title>
          .
          <source>PhD thesis</source>
          , University of Amsterdam (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Paul</surname>
          </string-name>
          , G.:
          <article-title>Approaches to Abductive Reasoning – An Overview</article-title>
          .
          <source>AI Review</source>
          <volume>7</volume>
          (
          <year>1993</year>
          )
          <fpage>109</fpage>
          –
          <lpage>152</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Mayer</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pirri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>First-Order Abduction via Tableau and Sequent Calculi</article-title>
          .
          <source>Bulletin of the IPGL</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ) (
          <year>1993</year>
          )
          <fpage>99</fpage>
          –
          <lpage>117</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hobbs</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stickel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Appelt</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Interpretation as abduction</article-title>
          .
          <source>Artificial Intelligence Journal</source>
          Vol.
          <volume>63</volume>
          (
          <year>1993</year>
          )
          <fpage>69</fpage>
          –
          <lpage>142</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Shanahan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Perception as Abduction: Turning Sensor Data Into Meaningful Representation</article-title>
          .
          <source>Cognitive Science</source>
          <volume>29</volume>
          (
          <issue>1</issue>
          ) (
          <year>2005</year>
          )
          <fpage>103</fpage>
          –
          <lpage>134</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Moller, R.,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Ontology-based Reasoning Techniques for Multimedia Interpretation and Retrieval</article-title>
          .
          <source>In: Semantic Multimedia and Ontologies : Theory and Applications</source>
          . Springer (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Moller, R.:
          <article-title>On Scene Interpretation with Description Logics</article-title>
          . In Christensen, H.,
          <string-name>
            <surname>Nagel</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          , eds.:
          <source>Cognitive Vision Systems: Sampling the Spectrum of Approaches. Number 3948 in LNCS</source>
          . Springer (
          <year>2006</year>
          )
          <fpage>247</fpage>
          –
          <lpage>278</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kakas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denecker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Abduction in logic programming</article-title>
          . In Kakas, A.,
          <string-name>
            <surname>Sadri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , eds.:
          <source>Computational Logic: Logic Programming and Beyond. Part I. Number 2407 in LNAI</source>
          . Springer (
          <year>2002</year>
          )
          <fpage>402</fpage>
          –
          <lpage>436</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Colucci</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Di Noia</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Di Sciascio</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donini</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mongiello</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A uniform tableaux-based method for concept abduction and contraction in description logics</article-title>
          .
          <source>In: Proc. of the 16th Eur. Conf. on Artificial Intelligence (ECAI 2004)</source>
          . (
          <year>2004</year>
          )
          <fpage>975</fpage>
          –
          <lpage>976</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Castano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Espinosa</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karkaletsis</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Moller, R.,
          <string-name>
            <surname>Montanelli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petasis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wessel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Multimedia interpretation for dynamic ontology evolution</article-title>
          .
          <source>In: Journal of Logic and Computation</source>
          , Oxford University Press (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Thagard</surname>
            ,
            <given-names>P.R.</given-names>
          </string-name>
          :
          <article-title>The best explanation: Criteria for theory choice</article-title>
          .
          <source>The Journal of Philosophy</source>
          (
          <year>1978</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Petasis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karkaletsis</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paliouras</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Androutsopoulos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spyropoulos</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>Ellogon: A new text engineering platform</article-title>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>