Dialogue in Hierarchical Learning of a Concept Using Prototypes and Counterexamples Extended Abstract Soma Dutta and Piotr Wasilewski Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland somadutta9@gmail.com piotr@mimuw.edu.pl While dealing with vague concepts often it puts us in fix to determine whether to a particular situation/case/state a particular concept applies or not. A human perceiver can determine some cases as the positive instances of the concept, and some as the negative instances of the same; but there always remain cases, which might have some similari- ties with some positive cases, and also have some similarities with some negative cases of the concept. So we propose to learn about the applicability of a concept to a particular situation using a notion of similarity of the situation with the available prototypes (pos- itive instances) and counterexamples (negative instances) of the concept. Perceiving a vague concept, due to the inherent nature of vagueness, is subjective, and thus never can be exhausted by listing down all the positive and negative instances of the concept. Rather we may come to realize about the applicability, or non-applicability, or applica- bility to some extent, of a concept to a situation in a step-by-step hierarchical manner by initiating dialogue between a perceiver and the situation descriptor. Hence, the main key ingredients of this proposal are (i) prototypes and counterexamples of a concept, (ii) similarity based arguments in favour and against of applicability of a concept at a particular situation, and (iii) hierarchical learning of the concept through dialogues. Similarity based reasoning [3], hierarchical learning of concepts [1], dialogue in the context of approximation space [2] all are separately important directions of research. For our purpose, in this presentation we would concentrate on combining these aspects from a different angle. In [4], a preliminary version of logic of prototypes and counterexamples has been set. To make this paper self-contained, we recapitulate the necessary definitions below. We start with a set S of finitely many situations, say {s1 , s2 , . . . , sn }, and A of finitely many attributes {a1 , a2 , . . . , am }. Each si , i = 1, 2, . . . n, is considered to be a function si : A 7→ [0, 1]. Let the consolidated data of each situation is stored in the form of a set {hsi (a1 ), si (a2 ), . . . , si (am )i : si ∈ S}, which is a subset of [0, 1]m . Let W ⊆ [0, 1]m and {hsi (a1 ), si (a2 ), . . . , si (am )i : si ∈ S} ⊆ W . Each member of W may be called a world. We now consider a fuzzy approximation space hW, Simi, where Sim is a fuzzy similarity relation between worlds of W . That is, Sim : W × W 7→ [0, 1], and we assume Sim to satisfy the following properties. (i) Sim(ω, ω) = 1 (reflexivity) (ii) Sim(ω, ω ′ ) = Sim(ω ′ , ω) (symmetry) (iii) Sim(ω, ω ′ ) ∗ Sim(ω ′ , ω ′′ ) ≤ Sim(ω, ω ′′ ) (transitivity). Following [3], the fuzzy approximation space hW, Simi is based on the unit 127 interval [0, 1] endowed with a t-norm ∗ and a S-implication operation →. We now propose to represent any (vague) concept α by a pair (α+ , α− ) consisting of the positive instances (prototypes) and negative instances (counterexamples) of α respectively, where α+ , α− ⊆ W and α+ ∩ α− = φ. Definition 1 [4]. Given the fuzzy approximation space hW, Simi, and a con- cept α represented by (α+ , α− ), the degree to which α applies to a world ω ∈ W , denoted by gr(ω |= α), is given by: gr(ω |= α) = 1 if ω ∈ α+ , = 0, if ω ∈ α− , and = Simα+ (ω) ∗ ¬Simα− (ω), otherwise. The fuzzy upper approximations Simα+ and Simα− are defined following the Definition proposed in [3], i.e., Simα+ (ω) = supu∈W Sim(ω, u) ∗ α+ (u) = supu∈α+ Sim(ω, u). Similar is the case for Simα− . ¬ is considered to be the standard complementation operation defined as ¬a = 1 - a. Let us call Simα+ (ω) = Daf (ω, α), the degree of arguments in favour of ω qualifies α and Simα− (ω) = Dag (ω, α), the degree of arguments against ω qualifies α. So, given a concept α and world ω ∈/ α+ , α− , gr(ω |= α) = Daf (ω, α) ∗ ¬Dag (ω, α). Let us now pose the issue of the research in disguise of a practical need. Let we have a clinical record of n number of patients’ details with respect to some m number of parameters/attributes. These parameters might be some objective values of some clin- ical tests, called signs, or some subjective features experienced by the patients, called symptoms. With respect to the state of each patient, the values corresponding to all these parameters are converted, by some mean, to the values over a common scale, say [0, 1]. That is, if an m-tuple hx1 , x2 , . . . , xm i from [0, 1]m represents the rates of the m parameters corresponding to a patient, then we say hx1 , x2 , . . . , xm i describes the state of a patient. Based on the rates assigned to all the parameters by each patient, i.e. a m-tuple of values hx1 , x2 , . . . , xm i from [0, 1]m , which cases representing the states of the patients are how much similar or dissimilar may be anticipated. Now, one task is to make a tentative diagnosis about a patient whose measurement concerning the m-tuple of parameters appears to be new with respect to the database of the n patients. Now with the above set-up, developed in [4], we may compute gr(ω |= d), the degree of applicability of a disease d for the newly appeared situation, say world ω. The value viz., gr(ω |= d), for different diseases, may help to make a hypothetical assumption regarding the plausible disease. For being more certain about the diagnosis, it is quite natural to enquire about some more factors/attributes. So the dialogue would have a role to play here. In order to incorporate dialogue in the previous set-up, below we would present the above mentioned theory in a broader framework. Let us first fix the domain of the (vague) concepts of our concern. Let A be the set of all attributes (finitely many) required to understand all the concepts over the fixed domain. At time t0 , with respect to a set of attributes At0 ⊆ A we have a set of finitely many situations, say St0 , at hand such that which situation is characterized by which concept is known to us. That is, given a situation s from St0 , s is characterized as a positive instance or negative instance of some of the concepts c over the domain of concern. So, we say St0 , a set of situations, is charaterized by the set of attributes 128 At0 at time t0 . Let St0 = {s1t0 , s2t0 , . . . snt0 } and At0 = {a1 , a2 , . . . am }. We say a database at time t0 with respect to the set of situations St0 , denoted as DSt0 , is the set of all tuples of values for the attributes of At0 for each situations of St0 . That is, for each ai (sjt0 ) ∈ [0, 1], i ∈ {1, 2, . . . m} and j ∈ {1, 2, . . . , n}, DSt0 = {ha1 (sjt0 ), . . . am (sjt0 )i : sjt0 ∈ St0 } ⊆ [0, 1]m . According to rough set literature DSt0 is basically an information system. We assume that for each database DSt0 there is a database manager, may be called decision maker, dm(St0 ) . Definition 2: A dialogue base at some time point tk is a tuple (G1 , G2 , . . . Gr , Ri , Re ) such that each Gj = hAgj , Aj , Ri i constitutes of a set of agents Agj , a set of at- tributes Aj , and an accessibility relation Ri among the agents. Ri stands for internal accessibility relation among the agents of each group Gj . Each Agj ⊇ Stjk ∪ {dmj } for some time point tk , where Stjk is the a set of situations characterized by Aj . That is each Agj contains a set of situations Stjk and a database manager dmj corresponding to Stjk . For each s ∈ Stjk , Ri (s, dmj ) holds, and Ri is symmetric. The relation Re is a reflexive, symmetric relation and it stands for external accessibility re- lation between different database managers. That is, for some j, l, Re (dmj , dml ) holds. Intuitively each Gj of the dialogue base contains a set of nodes and relation Ri among the nodes. These nodes constitute the set Agj . Some of the nodes represent those situations which are characterized by Aj , the set of attributes of Gj . dmj is a node designated as database manager. Stjk , the set of situations characterized by Aj , generates a database DS j of tuple of values for each attribute of Aj corresponding to tk each situation of Stjk . DS j ⊆ Wj , and hence is embedded in the approximation space tk (Wj , SimAj ). In each Gj the dmj has access to the other nodes. Through dialogue it is expected that dmj would enquire a particular situation (i.e. node) for information, and the particular situation would provide the information corresponding to the query. So, Ri has to be symmetric as both database manager and the situation descriptor should have access to make the communication. The external accessibility relation Ri allows accessing two database managers. A database manager can access her own information, and if dmj can access dml , then the reverse also holds. So, Re is reflexive and symmetric. Summarizing the whole, we can say that each Agj of Gj is a set of nodes some of which are specific cases, already characterized by the set of attributes Aj of Gj , at some time point. dmj can be considered as a dummy node which can access any other node. The rest of nodes can be any new case/situation appearing at some further point of time. That is why Agj ⊇ Stjk ∪ {dmj }. On the other hand, the detailed information about the set of situations Stjk , which are characterized by Aj , are available in the corresponding database (or information system) DS j . The information system or tk database DS j is also open to handle new information corresponding to new cases of tk Agj . That is why, DS j ⊆ Wj , where (Wj , SimAj ) is a fuzzy approximation space tk based on the attributes Aj . The following picture is a model of what we are trying to formalize through the notion of dialogue base and approximation spaces containing 129 different databases characterized by different sets of attributes. Gj Gl Re Gi (Wi, SimAi) Agi dm Si Di Ri Dialogue base Database embedded In approximation space Fig. 1. Internal and external dialogues among the granules of a dialogue base and cor- responding outcomes generated in the respective approximation spaces Now given the prelude of the practical need, what do we expect from a dia- logue? A dialogue at the first time point t0 , denoted as diagt0 , would consists of two rounds r1 and r2 . At round r1 the database manager may ask the situation s to provide the values for a set of attributes ha1 , . . . am i. At the round r2 the situation descriptor answers the query with the tuple ha1 = v1 , . . . , am = vm i, or in other words simply a tuple of values from [0, 1]m . So, as an output of a complete dialogue at time t0 between a situation s and the corresponding database manager dmit0 we expect to receive a tuple of values from [0, 1]m . So, combining the both round we may write that output of a dialogue at time t0 is given by, diagt0 (s, dmit0 ) = ω ∈ W ⊆ [0, 1]m . Though we are going to combine the two rounds in a single complete dialogue, there is a difference in the nature of the two rounds. The dialogue in r1 throws a question, and the dialogue in r2 provides an answer. So, the dialogue somehow moves the communication from the first agent’s approximation space to the second agent’s approximation space. So, the definition of dialogue is proposed as follows. Definition 3: Given a dialogue base (G1 , G2 , . . . Gr , Ri , Re ) at time t0 , a dia- logue between two agents ag1 , ag2 , denoted as diagt0 (ag1 , ag2 ), is defined as follows. (i) diagt0 (ag1 , ag2 ) = ω ∈ Wi of the approximation space (Wi , SimAti ), if Ri (ag1 , ag2 ) holds for ag1 , ag2 ∈ Agi of the group (Agi , Ati , Ri ), and Wi ⊇ DSti for 0 130 Sti0 ⊆ Agi . (ii) diagt0 (ag1 , ag2 ) = ω ∈ W ′ of (W ′ , SimAt′ ), if Re (ag1 , ag2 ) holds for ag1 ∈ Agi and ag2 ∈ Agj , where At′ = Ati ∩ Atj , and W ′ = Wj |At′ where Wj ⊇ DS j for t0 Stj0 ⊆ Agj . Wj |At′ denotes the restriction of Wj to the attribute set At′ . Now we cast our problem of hierarchical learning of a concept in the frame- work of dialogue. The idea is to start at time point t0 with a set of situations St0 characterized by a set of attributes At0 . The information corresponding to St0 would be available in the database DSt0 . The situations St0 corresponding to the attributes At0 is a part of a granule Gt0 of a dialogue base. Now given a new situation s first, the corresponding database manager dmt0 of Gt0 would initiate a dialogue with the situation s. As an outcome of diagt0 (s, dmt0 ) there will be a tuple of values for each attributes of At0 . This tuple of values represents a world, say ω in Wt0 , the universe of the approximation space (Wt0 , SimAt0 ), in which DSt0 is already embedded. Now with respect to (Wt0 , SimAt0 ), one can compute gr(ω |=e c), the degree of applicability of a concept c to the world ω. In order to be more certain in the decision, the database manager at the next point of time t1 (> t0 ) may initiate another dialogue with s asking for values for some additional attributes. In that situation the dialogue would proceed from the old approximation space to a new approximation space with respect to a bigger set of attributes. Process of hierarchical learning of a concept Step 1 We fix a set of attributes A (finitely many) for a fixed domain C of finitely many concepts, and consider all possible subsets of A. We assume that for each possible A1 ⊆ A there is a set of situations S1 characterized by A1 in the sense that each s ∈ S1 is characterized as either a positive or a negative instance of some of the concepts of C. Step 2 For each set of situations Si characterized by Ai there is a database DSi consisting of tuple of values for each attribute of Ai corresponding to each situation of Si . We consider Wi ⊇ DSi where Wi ⊆ [0, 1]|Ai | . For each database DSi we assume the presence of a database manager dmi . Step 3 Now we start with a dialogue base (G1t0 , G2t0 , . . . Grt0 , Ri , Re ) at time t0 . For each i = 1, 2, . . . r, Git0 = hAgit0 , Ait0 , Ri i and Agit0 ⊇ Sit0 ∪ {dmit0 }, where Sit0 is the set of situations characterized by Ait0 . Each Sit0 is embedded in an approximation space (Wit0 , SimAit0 ) through its database DSit0 . To mark the time point t0 corresponding to each component of a dialogue base we have used suffixes like it0 . But every Sit0 must coincide with some Sj of situations, about which we have discussed at Step 2, as either of the groups (Git0 , 1 ≤ i ≤ r) of the dialogue base constitutes of a set of agents and a set of attributes taken from A. Step 4 At time t0 , given a situation s ∈ Agit0 - {dmit0 } a dialogue is initiated as diagt0 (s, dmit0 ). The output of the dialogue would provide a tuple of values from Wit0 . Let us assume that diagt0 (s, dmit0 ) = ω ∈ Wit0 . Step 5 Now in the fuzzy approximation space (Wit0 , SimAit0 ), based on the development made in [4], we can compute gr(ω |=e c) (see Definition 1) for a concept belonging 131 to C. Based on some (significantly high) value gr(ω |=e c) for some c, we can make a hypothesis for the ‘applicability of the concept c at the world ω (= s(t0 ))’, the world assigned by situation s at time t0 . Step 6 To be more certain regarding the decision, at time t1 (> t0 ) the second dia- logue may be initiated as diagt1 (s, dmit1 ), where dmit1 is the same database manager dmit0 at the next time point t1 , as the old dialogue base changes to (G1t1 , G2t1 , . . . Grt1 , Ri , Re ) considering the new time point. The new Git1 con- tains all the agents of Git0 and preserves the same relation Ri among the agents of Git0 . It differs in the set of attributes Ait1 where Ait1 ⊇ Ait0 . As, At0 ⊆ At1 , Sit0 ⊆ Sit1 . We take Git1 = Git0 ∪ Sit1 , where Sit1 is the set of situations charac- terized by At1 . So, diagt1 (s, dmit1 ) would now provide a new world ω ′ from the approximation space (Wit1 , SimAit1 ). Wit0 is embedded in Wit1 in the sense that Wit1 ⊆ Wit0 × [0, 1]|At1 |−|At0 | , and for each u = hu1 , . . . u|At0 | i ∈ Wit0 , there is u′ ∈ Wit1 such that u′ = hu1 , . . . u|At0 |,0,0...,0 i having entry ‘0’ for the rest of the |At1 | − |At0 | components. When a dialogue at time ti moves to a new approxima- tion space from its previous approximation space at time ti−1 we call the dialogue proceeds. Step 7 As all the subsets of the whole attribute set A is considered, the set of attributes At1 ⊆ A must merge with some Aj considered in the begining. So, there is al- ready a set of situations and corresponding database embedded in the approxima- tion space (Wit1 , SimAit1 ) = (Wj , SimAj ), and the dialogue at time t1 moves into the new approximation space with respect to Aj . So, with respect to the approxi- mation space (Wj , Aj ) we can compute gr(ω ′ |=e c). Step 8 Now, if gr(ω |=e c) < gr(ω ′ |= c) where s(t0 ) = ω and s(t1 ) = ω ′ , then we may consider the situation s to be ascribed as an instance of c. Step 8 describes just a simple case for including a situation in the prototypes of a con- cept. In practice, the constraints for including a situation as a positive instance of a concept c, or as a negative instance of a concept c may have several layers of dialogues. In this presentation we would explore that idea more specifically. It is also to be noted that for each respective case of W , where W ⊆ [0, 1]l for any finite natural number l, we assume the presence of a binary fuzzy similarity relation. Below we present a simple application of the present proposal considering the same example taken in [4]. Example: Let we have a clinical database of a set of situations, S = {s1 , s2 , s3 , . . . , s9 } with respect to a set A of attributes, consisting of temperature, blood-pressure, blood-tests, ecg, headache, sneezing, convulsion, vomiting, skin-rash, dizzi- ness, stomach-upset, stomach pain. Sequentially let us call these attributes as a1 , a2 , a3 , . . . , a12 . As a1 , . . . , a4 are determined by some objective val- ues of some tests these are called signs; the rest are symptoms, determined by some subjective values as experienced by particular patients. Let CB = {F ev, Allergy, Stomachinf , HBP, LBP, V ertigo, U nconsciousness}, and C is the union of CB and {F evc , F evv , Stroke, F ood-poisoning, V iralinf , P eptic- ulcer}. The relations among the dependent and the independent concepts of C are as follows. 132 F ev ⊆k F evc ; F ev, Allergy ⊆k F evv ; F ev, HBP, V ertigo, U nconsciousness ⊆k Stroke; F ev, Stomachinf ⊆k F ood-poisoning; F ev, Stomachinf , Allergy ⊆k V iralinf ; and Stomachinf ⊆k P eptic-ulcer. F evc and F evv respectively stand for fever due to cold and viral fever. HBP, LBP , Stomachinf , and V iralinf respectively stand for high blood pressure, low blood pressure, stomach infection and viral infection. Each si is identified with its state given by hsi (a1 ), si (a2 ), si (a3 ), . . . , si (a12 )i ∈ [0, 1]12 , and W (⊆ [0, 1]12 ) contains hsi (a1 ), si (a2 ), si (a3 ), . . . , si (a12 )i for each si . The tuple of values corresponding to each si , and the positive and negative cases of each disease from CB are given in the following table. Let us use d1 for F ev, d2 for Allergy, d3 for Stomachinf , d4 for HBP , d5 for LBP , d6 for V ertigo, and d7 for U nconsciousness. For each si if dj receives +, then si is the positive case for dj , and if it receives −, then si is the negative case of dj . So, following the definitions proposed in [4] we can easily calculate gr(si |=e c) for any si ∈ S and c ∈ C. Table 1. Patients data table a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 d1 d2 d3 d4 d5 d6 d7 s1 .8 .3 .5 0 .7 .7 0 0 0 0 0 0 + - - - - s2 .5 .8 .7 .7 .7 0 .7 0 .3 .9 0 0 - + - + + s3 .9 .5 .7 0 .7 .8 0 0 0 0 0 0 + - - - - s4 .4 .3 .6 0 0 0 0 .5 0 0 .7 .6 - - + - - - s5 .7 .1 .7 0 .3 0 0 .7 0 0 .9 .7 - + - - - s6 .7 .2 .8 0 .3 0 0 .5 .7 0 .7 .7 + + - - - s7 .9 .5 .8 0 .8 .8 0 0 .3 .2 0 0 + - - - - s8 .3 .1 .3 .8 .7 0 .7 0 0 .8 0 0 - - - - + + + s9 .6 .7 .7 .5 .5 0 0 .9 0 .2 .9 1 - + - - Let a new situation s10 , with the tuple h.5, .5, .5, .5, .7, 0, .8, .5, .7, .8, .5, 0i of values corresponding to the respective attributes, appear. The task is to make a diagnosis for s10 . Now we pose the above problem in the framework of the present proposal in the following way. Let us start at time t0 with the dialogue base (Gt0 , Re , Ri ). For simplicity we have considered only one group of agents having internal relation Ri and exter- nal relation Re . Let St0 = S, and at time t0 the set of situations St0 is charater- ized by A = At0 . The database at time t0 , denoted as DSt0 , is basically the set {hsi (a1 ), si (a2 ), . . . , si (a12 )i : si ∈ St0 }. Now Gt0 ⊇ St0 ∪ dmt0 , where dmt0 is a dummy agent representing the database manager for DSt0 , and DSt0 ⊆ Wt0 ⊆ [0, 1]12 . For each si ∈ St0 , Ri (dmt0 , si ) holds, and Ri is symmetric. Now in appearance of a new situation s10 , the outcome of the dialogue between dmt0 and s10 at the round 133 r1 is diagtr01 (dmt0 , s10 ) = ha1 , a2 , . . . a12 i, and that of at the second round of the dia- logue is diagtr02 (s10 , dmt0 ) = h.5, .5, .5, .5, .7, 0, .8, .5, .7, .8, .5, 0i. Combining both the rounds we write diagt0 (s10 , dmt0 ) = h.5, .5, .5, .5, .7, 0, .8, .5, .7, .8, .5, 0i = w (say). Now based on the proposal presented in [4], with respect to the fuzzy approximation space (Wt0 , SimAt0 ) one can calculate gr(w |=e c) for some c ∈ C. Let us denote the degree to which s10 qualifies c at time t0 as gr(w |=te0 c). Let for c = Stroke, gr(w |=te0 c) = 21 . In order to be more certain the decision maker may need to ask the patient for some more tests. Let the new test, i.e., the attribute a13 is M RI-scan (Magnetic Resonance Imaging). So, the dialogue base at the next time point t1 moves to (Gt1 , Re , Ri ) where Gt1 = Gt0 ∪ St1 ⊇ St1 ∪ {dmt1 }, dmt1 is the same as dmt0 at the next point of time, and St1 is the set of situations characterized by At1 = At0 ∪ {a13 }. According to the definition of a dialogue base Ri (dmt1 , si ) holds for each si ∈ St1 . Now the new dialogue at time t1 would be diagtr11 (dmt1 , s10 ) = ha1 , a2 , . . . a12 , a13 i, and let diagtr12 (dmt1 , s10 ) = hs10 (a1 ), s10 (a2 ), . . . s10 (a12 ), s10 (a13 )i = w′ ∈ Wt1 , where Wt1 ⊆ Wt0 × [0, 1] such that for each w = hx1 , x2 , . . . x12 i ∈ Wt0 there is hx1 , x2 , . . . x12 , 0i ∈ Wt1 . So, now at time t1 with respect to the fuzzy approximation space (Wt1 , SimAt1 ) one can calculate gr(w′ |=te1 c), and based on gr(w′ |=te1 c) ≥ gr(w |=te0 c) or gr(w′ |=te1 c) ≤ gr(w |=te0 c) a decision regarding considering s10 as a positive or negative case of Stroke may be taken through a hierarchical manner of learning. Acknowledgement Authors of this abstract are thankful to Professor Andrzej Skowron for his suggestion to introduce a framework of dialogue in the development of this work. This work has been carried out during the tenure of ERCIM Alain Bensoussan fellowship of the first author, and this joint work is partially supported by the Polish National Science Centre (NCN) grants DEC-2011/01/D/ST6/06981. References 1. Bazan J. G., Skowron A., and Swiniarski R.. Layered learning for concept synthesis. Trans- actions on Rough Sets V: Journal Subline LNCS 4100, pages 39-62, 2006. 2. Chakraborty M.K., Banerjee M., Rough dialogue and implication lattices. Funda- menta Informaticae, 75(1-4), 123-139, 2007. 3. Dubois D., and Prade H., Rough fuzzy sets and fuzzy rough sets. International Jour- nal of General Systems, 17:191-200, 1990. 4. Dutta S., Wasilewski P., Concept Synthesis Using Logic of Prototypes and Coun- terexamples: a Graded Consequence Approach, M. Kryszkiewicz, S. Bandyopadhyay, H. Rybinski, S.K. Pal (eds), Pattern Recognition and Machine Intelligence, 6th Interna- tional Conference, PReMI 2015, LNCS 9124, 303-313, 2015. 5. Klir, G.J., and Yuan, B.,: Fuzzy sets and fuzzy logic: theory and applications, Prentice Hall of India, (1995).