

Verifying Properties of a MultiLayer Network for the Recognition of Basic Emotions in a Conditional DL with Typicality (Extended Abstract)

Mario Alviano

Francesco Bartoli

Marco Botta

Roberto Esposito

Laura Giordano

Daniele Theseider Dupré

0 DEMACS, Università della Calabria, Via Bucci 30/B, 87036 Rende (CS), Italy

1 DISIT, Università del Piemonte Orientale, Viale Michel 11, 15121 Alessandria, Italy

2 Dipartimento di Informatica, Università di Torino, Corso Svizzera 185, 10149 Torino, Italy


This extended abstract (an abridged version of [1]) reports on our work investigating the relationships between a multi-preferential semantics for defeasible reasoning in knowledge representation and a multilayer neural network model. Weighted knowledge bases for a simple description logic with typicality are considered under a (many-valued) “concept-wise” multipreference semantics. The semantics is used to provide a preferential interpretation of MultiLayer Perceptrons (MLPs). A model checking approach and an entailment-based approach are exploited in the verification of properties of neural networks for the recognition of basic emotions.

Keywords: Description Logics, Preferential and Conditional reasoning, Typicality, Explainability

have exceptions.

In weighted defeasible knowledge bases (KBs), typicality inclusions come with a weight. A concept C_i can be associated with a set of typicality inclusions (conditionals) of the form T(C_i) ⊑ D_{i,h}, with a weight w_h^i, representing the prototypical properties of concept C_i. The weight w_h^i is a real number representing the plausibility or implausibility of the property D_{i,h} for members of C_i. For instance, one may want to represent a situation in which horses are normally tall and run fast, it is very plausible that they have a tail, but implausible that they have stripes. In a weighted KB these defeasible properties of horses may be represented as:

T(Horse) ⊑ Tall, 4.5

T(Horse) ⊑ RunFast, 4.2

T(Horse) ⊑ ∃has_Tail.⊤, 9.7

T(Horse) ⊑ ∃has_Stripes.⊤, −20

where negative weights represent implausible properties. The defeasible TBox above can be used to define an ordering among domain elements, comparing their typicality as horses. For instance, assuming that Spirit is tall, has a tail, has no stripes and does not run fast, while Buddy is tall, has a tail, runs fast and has stripes, we can expect that spirit <_Horse buddy. In our approach, such features (e.g., being tall or having a tail) are also represented as concepts in the DL.

In the two-valued case, the preference relations <_{C_i} can be constructed from the KB by defining the weight W_i(x) of a domain element x with respect to a concept C_i, by summing up the weights of the typicality inclusions for C_i satisfied by x. The preference relations are then induced from such weights as: x <_{C_i} y if W_i(x) > W_i(y). In the example, Spirit satisfies the first and the third default, hence W_Horse(spirit) = 14.2, while Buddy satisfies all the defaults, hence W_Horse(buddy) = −1.6. As W_Horse(spirit) > W_Horse(buddy), we have spirit <_Horse buddy. The semantic construction is in the spirit of other semantics for conditionals [23, 9, 24], but it adopts multiple preferences.
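
The weight computation for the Horse example can be reproduced with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the function names `weight` and `prefers` are assumptions made for this example, not part of the formalism.

```python
# Weighted typicality inclusions for Horse, taken from the example above.
# Property names stand for the concepts on the right-hand side of each inclusion.
HORSE_DEFAULTS = {
    "Tall": 4.5,
    "RunFast": 4.2,
    "has_Tail": 9.7,
    "has_Stripes": -20.0,
}


def weight(satisfied, defaults=HORSE_DEFAULTS):
    """Two-valued case: sum the weights of the defaults the element satisfies."""
    return sum(w for prop, w in defaults.items() if prop in satisfied)


def prefers(x_props, y_props, defaults=HORSE_DEFAULTS):
    """x is preferred to y (x < y) if x has a strictly higher weight."""
    return weight(x_props, defaults) > weight(y_props, defaults)


spirit = {"Tall", "has_Tail"}
buddy = {"Tall", "has_Tail", "RunFast", "has_Stripes"}

print(round(weight(spirit), 1))   # 14.2
print(round(weight(buddy), 1))    # -1.6
print(prefers(spirit, buddy))     # True
```

The values are rounded only to hide floating-point noise in the sums.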

Note that the interpretation of a typicality concept T(C), for an arbitrary C (e.g., T(Student ⊓ Employee)), would require the definition of a preference <_C for each C, or the definition of a global preference relation <. In [20], e.g., a global preference < is defined based on a (modified) Pareto combination of the preferences <_{C_i}. An alternative route is to move to a fuzzy interpretation of concepts, and define <_C based on the fuzzy interpretation of C.

Fuzzy and many-valued DLs are well studied in the literature (see, for instance, [25, 26, 27, 28, 29]). In fuzzy DLs, a concept C is interpreted as a function C^I : ∆ → [0, 1] mapping each domain element to a value in the unit interval [0, 1]. Then, for a domain element x ∈ ∆, C^I(x) is regarded as the degree of membership of x in concept C. In the fuzzy case [21, 1], the preference relation <_C of any concept C is induced by the fuzzy interpretation of C: x <_C y if C^I(x) > C^I(y). In a non-crisp interpretation of typicality [1], the fuzzy interpretation of typicality concepts T(C) in an interpretation I is defined as: (T(C))^I(x) = C^I(x), if there is no y ∈ ∆ such that y <_C x; (T(C))^I(x) = 0, otherwise. This choice has some impact on the (KLM) properties of entailment. When (T(C))^I(x) > 0, we say that x is a typical C-element in I (and all typical C-elements have the same membership degree in C).
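
Over a finite domain, the non-crisp interpretation of typicality described above amounts to keeping the membership degree of the elements with maximal degree and setting all others to 0. A minimal sketch follows, assuming the fuzzy interpretation of a concept is given as a Python dictionary from domain elements to degrees; the function name and the sample degrees are hypothetical.

```python
def typicality(c_deg):
    """Return the interpretation of T(C) given the interpretation of C.

    c_deg maps each domain element to its membership degree in [0, 1].
    An element x keeps its degree if no y has a strictly higher degree
    (i.e. no y is preferred to x); otherwise it gets degree 0.
    """
    best = max(c_deg.values())
    return {x: (d if d == best else 0.0) for x, d in c_deg.items()}


# Hypothetical membership degrees in the concept Horse.
horse = {"spirit": 0.9, "buddy": 0.4, "star": 0.9}
print(typicality(horse))
```

In this toy domain both spirit and star are typical horses, with the same degree 0.9, in line with the remark that all typical elements share one membership degree.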

As in the two-valued case, besides the usual fuzzy DL axioms, a weighted KB includes a defeasible TBox, a set of weighted typicality inclusions T(C_i) ⊑ D_{i,h}, with weight w_h^i, for each distinguished concept C_i. In a fuzzy interpretation I, the weight W_i(x) is defined by considering the degree to which x satisfies the properties D_{i,h} (being tall, running fast, etc.). The weight W_i(x) of x wrt C_i in an interpretation I = ⟨∆, ·^I⟩ is defined as follows: W_i(x) = ∑_h w_h^i D_{i,h}^I(x), if C_i^I(x) > 0; W_i(x) = −∞, otherwise.

The models of a KB are required to satisfy further properties beyond satisfying fuzzy DL axioms [30], by enforcing that the membership degree C_i^I(x) of x in C_i is aligned with the weight W_i(x) in I. For instance, in coherent models [21] of a KB, we require that x <_{C_i} y if W_i(x) > W_i(y). Faithful models [31] exploit a slightly weaker condition, while the stronger notion of φ-coherence of a fuzzy interpretation I wrt a KB exploits a monotonically non-decreasing function φ : R → [0, 1]. I is φ-coherent with respect to a weighted KB K if: for all distinguished concepts C_i ∈ 𝒞 and x ∈ ∆, C_i^I(x) = φ(W_i(x)).
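
The fuzzy weight and the coherence condition can be checked mechanically on a finite interpretation. The sketch below writes phi for the monotonically non-decreasing function and makes several assumptions that are not in the source: the KB is a dictionary from a concept name to a list of (property, weight) pairs, the interpretation is a dictionary of dictionaries, phi is the logistic function extended with phi(−∞) = 0, and equality is tested up to a small tolerance.

```python
import math


def logistic(z):
    """Logistic function, extended so that the weight -inf maps to 0."""
    if z == -math.inf:
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))


def fuzzy_weight(x, concept, kb, interp):
    """Weighted sum of the degrees of the properties of `concept` at x,
    or -inf when x has membership degree 0 in `concept`."""
    if interp[concept][x] <= 0:
        return -math.inf
    return sum(w * interp[prop][x] for prop, w in kb[concept])


def is_phi_coherent(kb, interp, domain, phi=logistic, tol=1e-9):
    """Check that the degree of every x in every distinguished concept
    equals phi applied to the weight of x for that concept."""
    return all(
        abs(interp[c][x] - phi(fuzzy_weight(x, c, kb, interp))) <= tol
        for c in kb
        for x in domain
    )


# Hypothetical toy KB and interpretation.
kb = {"Horse": [("Tall", 4.5), ("Striped", -20.0)]}
domain = ["a", "b"]
interp = {
    "Tall": {"a": 1.0, "b": 0.5},
    "Striped": {"a": 0.0, "b": 1.0},
    "Horse": {"a": logistic(4.5), "b": logistic(4.5 * 0.5 - 20.0)},
}
print(is_phi_coherent(kb, interp, domain))
```

Changing any Horse degree to a value that does not match the weighted sum makes the check fail, which is the sense in which the membership degrees are aligned with the weights.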

A mapping of a multilayer network to a conditional KB can be defined in a simple way [21, 1], by associating a concept name C_k with each unit k in the network and by introducing, for each synaptic connection from neuron h to neuron k with weight w_{kh}, a conditional T(C_k) ⊑ C_h with weight w_{kh}. If we assume that φ is the activation function of all units in the network (having value in the unit interval [0, 1]), then the φ-coherent semantics characterizes unit activation: C_k^I(x) corresponds to the activation of unit k for some input stimulus x. The semantics can also consider multiple functions φ to represent the activation functions of different units. φ-coherent interpretations capture the stationary states of the network, both for MLPs and for recurrent networks, which allow for feedback cycles (a weighted KB can indeed have cycles).
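
The mapping from connections to weighted conditionals can be sketched for a feedforward network given as a list of weight matrices. The matrix layout, the concept naming scheme `C_<layer>_<unit>` and the function name are assumptions of this example; bias terms are left out.

```python
def network_to_kb(weights):
    """Turn a feedforward network into a list of weighted conditionals.

    weights[l][k][h] is the weight of the connection from unit h of
    layer l to unit k of layer l + 1. Each connection yields a triple
    (target, source, w), read as: T(target) is included in source, with weight w.
    """
    kb = []
    for layer, matrix in enumerate(weights):
        for k, row in enumerate(matrix):
            for h, w in enumerate(row):
                target = f"C_{layer + 1}_{k}"
                source = f"C_{layer}_{h}"
                kb.append((target, source, w))
    return kb


# Hypothetical 2-2-1 network.
weights = [
    [[0.5, -1.0], [2.0, 0.0]],   # input layer -> hidden layer
    [[1.5, -0.5]],               # hidden layer -> output unit
]
for target, source, w in network_to_kb(weights):
    print(f"T({target}) ⊑ {source}, {w}")
```

Each unit of the example network ends up with one weighted typicality inclusion per incoming connection.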

Since a multilayer network can be regarded as a conditional KB, entailment in the conditional logic can be used for post-hoc verification of conditional properties of the network. Undecidability results for fuzzy DLs with general inclusion axioms [32, 29] have led to considering a finitely-valued version of the φ-coherent semantics, which provides an approximation of the fuzzy semantics [1], by taking C_n = {0, 1/n, . . . , (n−1)/n, 1}, for n ≥ 1, as the truth space. For the boolean fragment, in the finitely-valued case, an ASP-based approach has been proposed for defeasible reasoning under φ-coherent entailment [33]. Complexity results have been investigated, as well as the scalability of different encodings of entailment in ASP, by taking advantage of custom propagators, weak constraints and weight constraints [34].

In [1] we consider both the entailment-based approach and a model checking approach in the verification of conditional properties of some trained multilayer feedforward networks for the recognition of basic emotions, using the Facial Action Coding System (FACS) [35] and the RAF-DB [36] data set, containing almost 30 000 images labeled with basic emotions or combinations of two emotions. The images were input to OpenFace 2.0 [37], which detects a subset of the Action Units (AUs) in [35], corresponding to facial muscle contractions. The AUs were used as the input layer of an MLP, trained to recognize four emotions. The relations between such AUs and emotions, studied by psychologists [38], have been used as a reference for formulae to be verified.

The model checking approach exploits the behavior of the network over a set ∆ of input exemplars (e.g., the test set) to construct a single multi-preferential interpretation I with domain ∆, considering only some units of interest (e.g., input and output units). For such units k, the associated concept C_k is interpreted by letting C_k^I(x) be the activity of unit k for input x. Graded conditional properties of the form T(C) ⊑ D ≥ α (as well as strict properties C ⊑ D ≥ α) can then be checked in I. Verifying the satisfiability of an inclusion in the interpretation I requires polynomial time in the size of I and of the formula.
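
A graded typicality inclusion can be checked over a finite set of exemplars with a single pass over the domain, which illustrates the polynomial bound. The sketch below is not the procedure of [1]: it assumes that an inclusion holds with threshold alpha when the infimum over the domain of the implication between the two degrees is at least alpha, and it assumes Gödel implication, which the text does not state. Disjunction is interpreted as maximum. All activity values are hypothetical.

```python
def godel_implies(a, b):
    """Gödel implication (an assumption of this sketch)."""
    return 1.0 if a <= b else b


def typicality(c_deg):
    """Degree of x in T(C): its degree in C if maximal, 0 otherwise."""
    best = max(c_deg.values())
    return {x: (d if d == best else 0.0) for x, d in c_deg.items()}


def holds(c_deg, d_deg, alpha):
    """Check the graded inclusion of T(C) in D with threshold alpha."""
    t = typicality(c_deg)
    return min(godel_implies(t[x], d_deg[x]) for x in c_deg) >= alpha


# Hypothetical unit activities over three exemplars.
happiness = {"img1": 0.8, "img2": 0.8, "img3": 0.2}
au6 = {"img1": 0.6, "img2": 0.2, "img3": 0.0}
au12 = {"img1": 0.4, "img2": 0.8, "img3": 1.0}

# Disjunction of au6 and au12, interpreted as maximum.
au6_or_au12 = {x: max(au6[x], au12[x]) for x in happiness}

print(holds(happiness, au6_or_au12, 3 / 5))
print(holds(happiness, au6_or_au12, 4 / 5))
```

With these toy values the inclusion holds at threshold 3/5 and fails at 4/5, because one typical happiness exemplar reaches only degree 0.6 on the disjunction.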

The entailment-based approach has been tested on a binary classification task, for the class happiness vs other emotions. A set of 8 835 images was used. The OpenFace output intensities were rescaled in order to make their distribution conformant to the expected one in case AUs are recognized by humans [35]. The resulting 17 AUs were used as input units of a fully connected feedforward NN, with two hidden layers of 50 and 25 nodes, using the logistic activation function for all layers. The F1 score of the trained network was 0.831. Verification has been performed taking C_5 as the truth value space (given that a scale of five values, plus absence, is used by humans for AU intensities), and using minimum t-norm, the associated t-conorm, and standard involutive negation. With truth space C_5 and 17 AUs as input units, the size of the search space for the solver was 6^17, i.e., more than 10^13. The weighted conditional knowledge base associated to the network contains 2 201 weighted typicality inclusions. The version of the solver in [34] based on weight constraints and order encoding was used.
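
The figures in this paragraph can be checked with a short computation. The truth space with six values gives 6^17 input combinations for 17 input units. The text does not say how the 2 201 inclusions arise; one reading consistent with that number, assumed here, is a single output unit with one inclusion per connection plus one per bias term.

```python
from fractions import Fraction


def truth_space(n):
    """The n + 1 truth values 0, 1/n, ..., (n-1)/n, 1."""
    return [Fraction(i, n) for i in range(n + 1)]


C5 = truth_space(5)

# Minimum t-norm, maximum t-conorm and involutive negation stay inside C5.
closed = all(
    min(a, b) in C5 and max(a, b) in C5 and (1 - a) in C5
    for a in C5
    for b in C5
)

n_inputs = 17
search_space = len(C5) ** n_inputs      # 6 ** 17 input combinations

# Assumed architecture: 17 inputs, hidden layers of 50 and 25 units,
# and a single output unit (assumption) for the binary task.
layer_sizes = [17, 50, 25, 1]
connections = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
biases = sum(layer_sizes[1:])           # one bias per non-input unit (assumption)

print(closed, search_space > 10 ** 13)
print(connections, biases, connections + biases)
```

Under these assumptions the count is 2 125 connections plus 76 bias terms, which matches the 2 201 inclusions reported.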

Let us consider the two graded inclusion axioms: (a) T(happiness) ⊑ au1 ⊔ au6 ⊔ au12 ⊔ au14 ≥ k/5 and (b) T(happiness) ⊑ au6 ⊔ au12 ≥ k/5. The model checking approach, applied to the test set (2 651 individuals with 390 instances of T(happiness)), finds that both formulae hold for k = 3 and do not hold for k = 4.

In the entailment approach, the solver finds in seconds that (a) is not entailed for k = 4, and in minutes that it is entailed for k = 1, while for k = 2, 3 it does not provide a result in hours. On a variant of the experiment, using as inputs AU intensities that are not rescaled, the solver finds in seconds that (a) is not entailed for k = 2, and in minutes that it is entailed for k = 1. The graded inclusion axiom (b) is entailed for k = 1 and not for k = 3. In the latter case, then, a counterexample is found by entailment, whose search space includes all possible combinations of input vectors, while it is not found by model checking on the test set. The co-existence of strict and defeasible inclusions in weighted KBs also allows for combining empirical knowledge with elicited knowledge for reasoning and for post-hoc verification. A different experiment in the verification of properties of a network trained to classify its input as an instance of one of four emotions (surprise, fear, happiness, anger) is also reported in [1].

While the model-checking approach does not require considering the activity of all units to build a preferential interpretation of a network, in the entailment-based approach all units are considered. Also, the model-checking approach, based on the conditional multi-preferential semantics, is a general (model-agnostic) approach, which may be suitable to explain different network models (and was first considered for SOMs [22]). On the other hand, the entailment-based approach is specific to MLPs. Both approaches are global ones (see, e.g., [39]), as they consider the behavior of the network over a set ∆ of input stimuli. We refer to [1] for detailed results, discussion and related work on this conditional approach to explainability.

Acknowledgments The work was partially supported by the INDAM-GNCS Project 2024 “LCXAI: Logica Computazionale per eXplainable Artificial Intelligence”.

[1] M. Alviano, F. Bartoli, M. Botta, R. Esposito, L. Giordano, D. Theseider Dupré, A preferential interpretation of multilayer perceptrons in a conditional logic with typicality, Int. Journal of Approximate Reasoning 164 (2024). URL: https://doi.org/10.1016/j.ijar.2023.109065.

[2] J. Delgrande, A first-order conditional logic for prototypical properties, Artificial Intelligence 33 (1987) 105-130.

[3] D. Makinson, General theory of cumulative inference, in: Non-Monotonic Reasoning, 2nd International Workshop, Grassau, FRG, June 13-15, 1988, Proceedings, 1988, pp. 1-18.

[4] S. Kraus, D. Lehmann, M. Magidor, Nonmonotonic reasoning, preferential models and cumulative logics, Artificial Intelligence 44 (1990) 167-207.

[5] J. Pearl, System Z: A natural ordering of defaults with tractable applications to nonmonotonic reasoning, in: TARK'90, Pacific Grove, CA, USA, 1990, pp. 121-135.

[6] D. Lehmann, M. Magidor, What does a conditional knowledge base entail?, Artificial Intelligence 55 (1992) 1-60.

[7] S. Benferhat, C. Cayrol, D. Dubois, J. Lang, H. Prade, Inconsistency management and prioritized syntax-based entailment, in: Proc. IJCAI'93, Chambéry, 1993, pp. 640-647.

[8] R. Booth, J. B. Paris, A note on the rational closure of knowledge bases with both positive and negative knowledge, Journal of Logic, Language and Information 7 (1998) 165-190. doi:10.1023/A:1008261123028.

[9] G. Kern-Isberner, Conditionals in Nonmonotonic Reasoning and Belief Revision - Considering Conditionals as Agents, volume 2087 of LNCS, Springer, 2001.

[10] D. Lewis, Counterfactuals, Basil Blackwell Ltd, 1973.

[11] D. Nute, Topics in conditional logic, Reidel, Dordrecht (1980).

[12] L. Giordano, V. Gliozzi, N. Olivetti, G. L. Pozzato, Preferential Description Logics, in: LPAR 2007, volume 4790 of LNAI, Springer, Yerevan, Armenia, 2007, pp. 257-272.

[13] K. Britz, J. Heidema, T. Meyer, Semantic preferential subsumption, in: G. Brewka, J. Lang (Eds.), KR 2008, AAAI Press, Sydney, Australia, 2008, pp. 476-484.

[14] L. Giordano, V. Gliozzi, N. Olivetti, G. L. Pozzato, ALC+T: a preferential extension of Description Logics, Fundamenta Informaticae 96 (2009) 1-32.

[15] G. Casini, U. Straccia, Rational Closure for Defeasible Description Logics, in: T. Janhunen, I. Niemelä (Eds.), JELIA 2010, volume 6341 of LNCS, Springer, Helsinki, 2010, pp. 77-90.

[16] G. Casini, T. Meyer, K. Moodley, R. Nortje, Relevant closure: A new form of defeasible reasoning for description logics, in: JELIA 2014, LNCS 8761, Springer, 2014, pp. 92-106.