=Paper=
{{Paper
|id=Vol-1424/Paper7
|storemode=property
|title=Preliminary Study Towards a Fuzzy Model for Visual Attention
|pdfUrl=https://ceur-ws.org/Vol-1424/Paper7.pdf
|volume=Vol-1424
|dblpUrl=https://dblp.org/rec/conf/ijcai/RalescuBC15
}}
==Preliminary Study Towards a Fuzzy Model for Visual Attention==
Anca Ralescu (1), Isabelle Bloch (2), and Roberto Cesar (3)
1. EECS Department, University of Cincinnati, ML 0030, Cincinnati, OH 45221, USA - anca.ralescu@uc.edu
2. Institut Mines Telecom, Telecom Paristech, CNRS LTCI, Paris, France - isabelle.bloch@telecom-paristech.fr
3. University of Sao Paulo, IME, Sao Paulo, Brazil - cesar@ime.usp.br
Abstract

Attention, in particular visual attention, has been a subject of studies in various disciplines, including cognitive science, experimental psychology, and computer vision. In cognitive science and experimental psychology the objective is to develop theories that can explain the attention phenomenon of cognition. In computer vision, the objective is to inform image understanding systems by hypotheses on the human visual attention. There is, however, very little influence of studies across these two disciplines. In a departure from this state of affairs, this study seeks to develop an algorithmic approach to visual attention as part of an image understanding system, by starting with a theory of visual attention put forward in experimental psychology. In the process, it will become useful to revise some of the concepts of this theory, in particular by adopting fuzzy set based representations and the necessary calculus for them.

1 Introduction

As a subject of human cognition, attention has attracted great interest from the fields of cognitive science and experimental psychology.

Visual attention is a wide field, largely addressed in the literature covering different aspects. Some works related to the present paper are briefly reviewed, without seeking exhaustiveness. One approach relies on Gestalt theory, and Gestalt and computer vision models are compared by (Desolneux, Moisan, and Morel 2003). Two sets of experiments for Gestalt detection methods are carried out and compared to computationally predicted results. Object size and noise are the two parameters taken into account in these experiments. The authors indicate that the qualitative thresholds predicted by the proposed computational approach of Gestalt detection fit human perception.

Another approach is purely computational and based on image information. An important review on visual attention modeling is presented by (Borji and Itti 2013). The important aspect of saliency-based attention is specifically addressed in this review. Nearly 65 models are reviewed and classified in a didactical taxonomy that helps clarify the field. Visual saliency refers to a bottom-up phenomenon where some scene regions are detected as more prominent than others due to some visual features. There are different biological and computational approaches to model such phenomena. For instance, the center-surround hypothesis (a common issue for the analysis of receptive fields in the retina) is a classical model for bottom-up saliency (Gao, Mahadevan, and Vasconcelos 2008). In such settings, Gao and co-authors (Gao, Mahadevan, and Vasconcelos 2008) incorporate discriminant features and a decision-theoretic model for saliency characterization. Saliency detection is important in many different imaging and vision applications (Yan et al. 2013; Yang et al. 2013). For instance, in medical imaging, saliency maps are useful to guide model-based image segmentation (Fouquier, Atif, and Bloch 2012), thus merging top-down and bottom-up approaches.

The mechanism of attention has been studied intensively in the field of psychology and cognitive science (Kahneman 1973), (Treisman and Gelade 1980), (Treisman 1988), (Treisman 2014), (Humphreys 2014), (Bundesen, Habekost, and Kyllingsbæk 2005), (Bundesen, Vangkilde, and Petersen 2014). In this paper we focus on the theory of visual attention introduced in (Bundesen 1990), where visual recognition and attentional selection are considered as the task of perceptual categorization, basically deciding to which category an object or element of the visual field belongs.

Following the notation of (Bundesen 1990), throughout this paper, x is an input item, e.g. an image or image region, or more generally an item to be categorized or recognized. The collection of all items x is denoted by S. A category is denoted by i and the collection of all categories is denoted by R. A category can stand for an ontological category (e.g., an object, or a scene), or for subsets in the range of a particular attribute (e.g., red for the attribute color). Regardless of the situation, the conceptual treatment of categories and/or items is the same. E(x, i) denotes the event/statement "x is in category i". When viewed as an event, one can talk about its probability; when viewed as a statement, one can talk about its truth or its possibility.

From this point on this paper is organized as follows: Section 2 contains a brief review of TVA concepts and mechanisms (filtering and pigeonholing). Section 3 presents the motivation for the introduction of fuzzy sets and the fuzzy mechanisms of filtering and pigeonholing. Conclusions and future research are in Section 4.
2 TVA concepts and mechanisms of attention

In this section, we review and comment on the main concepts and modeling steps of the Theory of Visual Attention (TVA) by (Bundesen 1990).

2.1 Attentional Weight

One of the main concepts introduced in TVA is that of attentional weight, defined as follows:

<math>w(x) = \sum_{i \in R} \eta(x, i)\,\pi(i)</math> (1)

What are the possible interpretations of the quantities in Equation (1)? If η(x, i) is interpreted as the salience of x for category i, then w(x) could be interpreted as the salience of x across the family of categories R, averaged with respect to category pertinence. From the point of view of computer vision, η(x, i) is simply the output of an operator designed to provide information for category i.

Note that pertinence of a category is (or must be) considered with respect to a task, which could be a categorization at a higher semantic/ontological level. Adopting this point of view, the product η(x, i)π(i) can then be interpreted as the pertinence of item x to the task with respect to which category i had pertinence π(i). More precisely, one can define

<math>\pi(x, T_i) = \eta(x, i)\,\pi(i)</math>

as the pertinence of x to <math>T_i</math>, where <math>T_i</math> is the task to which category i has pertinence value π(i).

For example, suppose that i is the color category "red" of the attribute color. Furthermore, suppose that the color category "red" has pertinence π(red) to the task of identifying visually an object such as, for instance, the "flag of some country". Let now x be a region in an image, and η(x, red) the output of evaluating it with respect to the color "red". Then <math>\eta(x, T_{red}) = \eta(x, red)\,\pi(red)</math> is the pertinence of x to the task <math>T_{red}</math>.

Taking max/min with respect to x obtains:

<math>x_{max,red} = \arg\max_{x \in S} \eta(x, T_{red}),</math>

the region in the input which is most pertinent to <math>T_{red}</math>, and

<math>x_{min,red} = \arg\min_{x \in S} \eta(x, T_{red}),</math>

the region in the input which is least pertinent to <math>T_{red}</math>.

Similarly, taking max/min over categories yields

<math>i_{max} = \arg\max_{i \in R} \pi(i); \quad i_{min} = \arg\min_{i \in R,\, \pi(i) > 0} \pi(i)</math>

the most/least pertinent categories respectively. The condition π(i) > 0 ensures that categories which are not pertinent at all, i.e. with π(i) = 0, are not taken into account, so the trivial case <math>\pi(i_{min}) = 0</math> is never obtained. Then, for fixed x, <math>\eta(x, i_{max})</math> and <math>\eta(x, i_{min})</math> are the strengths of evidence for x to be in the highest/lowest pertinence category, and

<math>\pi(x, T_{max}) = \eta(x, i_{max})\,\pi(i_{max})</math>

<math>\pi(x, T_{min}) = \eta(x, i_{min})\,\pi(i_{min})</math>

are the importance of x to the task corresponding to the category of highest/lowest pertinence value. Versions of the following "flag example" will be used in this paper to illustrate various points.

Example 1 Let T stand for the task to determine if an object identified in an image corresponds to a "flag of some country". The decision is to be based on color information only. Assume several color categories and their respective pertinences as shown in Table 1.

{| class="wikitable"
|+ Table 1: Color categories and their respective pertinence values to the task "Identify flag of a country".
! Color category: i !! Category pertinence: π(i)
|-
| red || 0.8
|-
| yellow || 0.3
|-
| black || 0.1
|-
| green || 0.2
|-
| (max π(i), <math>i_{max}</math>) || (0.8, red)
|-
| (min π(i), <math>i_{min}</math>) || (0.1, black)
|}

In this example

<math>\eta(x, T_{red}) = 0.8\,\eta(x, red); \quad \eta(x, T_{black}) = 0.1\,\eta(x, black).</math>

In Equation (1) only those categories i with π(i) > 0 contribute to w(x). This means that categories which are not pertinent (i.e., π(i) = 0) are never considered for x, even when η(x, i) is very large.

To summarize, with the interpretation of η(x, i)π(i) as described above, the attentional weight w(x) defined by Equation (1) is the cumulative pertinence of x to a task T, obtained from the strength of the sensory evidence given by x to all categories, in proportion to their pertinence to the task T.

2.2 Hazard Function

In (Bundesen 1990) the notion of a hazard function ν(x, i) is introduced as ν(x, i) = Prob(E(x, i)), that is, the probability that item x is in category i (e.g., image region x is red). It is assumed (see 2nd assumption in (Bundesen 1990)) that ν is computed as:

<math>\nu(x, i) = \eta(x, i)\,\beta(i)\,w(x)</math> (2)

where η(x, i) and w(x) are as described above, and β(i) is introduced to indicate a bias for category i. (Note that the expression of (Bundesen 1990) involves a normalized version of w, i.e. <math>w(x) / \sum_{x \in S} w(x)</math>. Here we implicitly assume that w is normalized, in order to simplify equations.) Since ν is interpreted as a probability, ν(x, i) ∈ [0, 1], which is ensured when η(x, i), β(i), w(x) ∈ [0, 1], without additional constraints on these values. Moreover, when R is an exhaustive set of exclusive (non-overlapping) categories, then ν should be normalized so that <math>\sum_{i \in R} \nu(x, i) = 1</math>, in order to really satisfy its interpretation from (Bundesen 1990) as a probability. More recently, in (Bundesen, Vangkilde, and Petersen 2014) β(i) is decomposed as

<math>\beta(i) = A\,p(i)\,u(i)</math> (3)

where A ∈ [0, 1] is the level of alertness, and p(i) and u(i) are, respectively, the prior probability and utility of category i.
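The quantities of Equations (1) to (3) can be traced numerically on the flag example. The following is a minimal sketch, not part of TVA itself: the pertinence values come from Table 1, while the sensory evidence `eta_x`, the alertness `A`, the priors and utilities, and all function names are hypothetical values and names chosen only for illustration. The weight is left unnormalized, whereas the text assumes a weight normalized over all items in S.

```python
# Pertinence values pi(i) taken from Table 1 (task: "identify flag of a country").
PERTINENCE = {"red": 0.8, "yellow": 0.3, "black": 0.1, "green": 0.2}


def attentional_weight(eta, pertinence):
    """Equation (1): w(x) = sum over categories i of eta(x, i) * pi(i)."""
    return sum(eta[i] * pertinence[i] for i in pertinence)


def bias(alertness, prior, utility):
    """Equation (3): beta(i) = A * p(i) * u(i)."""
    return alertness * prior * utility


def hazard(eta, beta, weight, category):
    """Equation (2): nu(x, i) = eta(x, i) * beta(i) * w(x)."""
    return eta[category] * beta[category] * weight


# Hypothetical sensory evidence eta(x, i) for one image region x (assumed values).
eta_x = {"red": 0.9, "yellow": 0.1, "black": 0.0, "green": 0.2}

# Most and least pertinent categories; only categories with pi(i) > 0 qualify
# for the minimum, as required in Section 2.1.
i_max = max(PERTINENCE, key=PERTINENCE.get)
i_min = min((i for i in PERTINENCE if PERTINENCE[i] > 0), key=PERTINENCE.get)

# Unnormalized attentional weight of region x.
w_x = attentional_weight(eta_x, PERTINENCE)

# Hypothetical alertness, uniform prior and full utility for every category.
A = 1.0
beta = {i: bias(A, 0.25, 1.0) for i in PERTINENCE}
nu_red = hazard(eta_x, beta, w_x, "red")

print(i_max, i_min, w_x, nu_red)
```

With these assumed inputs, a category with zero evidence (black) contributes nothing to w(x), and a category with zero pertinence would likewise drop out of the sum regardless of its evidence.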
One can imagine that A also varies with the category, in which case A in Equation (3) is replaced by an <math>A_i</math>. This is justified by the fact that one may be more alert to a category than to others. In an image processing system, A, or <math>A_i</math>, could be tied to the performance of the image processing operators used. The components p(i), u(i) of β(i), and hence β(i), must also be tied to a (higher level) task T. While p(i) may be obtained from past data and experiments on the task T, u(i) seems to be purely subjective, and to a large extent, its role seems to overlap with that of π(i). Plugging w(x) and β(i) in (2) results in

<math>\nu(x, i) = A\,\eta(x, i)\,p(i)\,u(i) \sum_{j \in R} \eta(x, j)\,\pi(j) = A\,p(i)\,u(i)\left[\eta(x, i)^2\,\pi(i) + \eta(x, i) \sum_{j \neq i} \eta(x, j)\,\pi(j)\right]</math> (4)

which suggests that the most important role in computing ν(x, i) is played by the sensory evidence. In particular, ν's largest value is obtained when A = p(i) = u(i) = 1 (i.e. under maximum alertness, maximum prior probability, and maximum utility), and in that case ν(x, i) is a function only of the sensory evidence. Stated differently, this means that A, p(i) and u(i) can only decrease the value of ν(x, i). However, they may provide a mechanism to account for different types of subjective information, and for ranking the values of ν(x, i) when they enter its definition as shown in Equations (2) - (4). The justification in (Bundesen, Vangkilde, and Petersen 2014) of Equation (3) is based on the fact that when either one of A, p(i), or u(i) is null, then β(i) = 0. However, the same result holds when these quantities enter the definition of β not through a product, but through other operations, such as the min, or more generally, t-norms.

The fact that the value of ν(x, i) decreases when A p(i) u(i) ≠ 1 (i.e. at least one of these three values is less than 1, u(i) for instance) can be interpreted as follows: x is less likely to be categorized in i if, for instance, the utility for i is low, which means that we do not really care for this category. This also goes with the interpretation as a rate of encoding information in the memory, according to (Bundesen 1990), even without considering time information.

The two mechanisms for visual attention proposed in (Bundesen 1990), filtering and pigeonholing, are described next.

2.3 Filtering

Filtering (Bundesen 1990) refers to the mechanism of selecting an item x ∈ S (given a higher level task), for a target category i. This mechanism seeks to

(F1) increase ν(x, i) for some category i, while

(F2) not changing the conditional probability of E(x, i) given that x is categorized.

Filtering can be achieved by increasing w(x) as follows: For category j ∈ R assume π'(j) = aπ(j), where a > 1. Then w(x) of Equation (1) becomes

<math>w'(x) = \sum_{i \in R,\, i \neq j} \eta(x, i)\,\pi_i + \eta(x, j)\,\pi'_j = \sum_{i \in R,\, i \neq j} \eta(x, i)\,\pi_i + \eta(x, j)\,a\,\pi_j > w(x).</math>

Therefore, ν(x, i) becomes <math>\nu'(x, i) = \eta(x, i)\,\beta(i)\,w'(x) > \nu(x, i)</math>, which satisfies condition (F1) above. Computing now P(x is i | x is categorized) yields:

<math>P(x \text{ is } i \mid x \text{ is categorized}) = \frac{\nu(x, i)}{\sum_{k \in R} \nu(x, k)} = \frac{\eta(x, i)\,\beta(i)\,w(x)}{w(x) \sum_{k \in R} \eta(x, k)\,\beta(k)} = \frac{\eta(x, i)\,\beta(i)}{\sum_{k \in R} \eta(x, k)\,\beta(k)}</math> (5)

which does not depend on w, hence satisfies condition (F2). In Equation (5) the numerator is ν(x, i) since

{x is i} ⊂ {x is categorized}

and therefore P(x is i, x is categorized) = P(x is i), while the denominator uses an assumption of non-overlapping categories to write P(x is categorized) as <math>\sum_{k \in R} \nu(x, k)</math>. Dropping the constraint of non-overlapping categories is discussed later in this study.

2.4 Pigeonholing

For fixed item x ∈ S, pigeonholing (Bundesen 1990) refers to the mechanism of selecting a category i ∈ R (given a higher level task), across a set of items S. It seeks to:

(P1) increase <math>\sum_{x \in S} \nu(x, i)</math> for category i pertinent to the task, such that

(P2) for all j ∈ R, j ≠ i, <math>\sum_{x \in S} \nu(x, j)</math> does not change.

Pigeonholing can be done by increasing β(i) for some i ∈ R as follows: For category i ∈ R, let <math>\beta'_i = a\,\beta_i</math>, with a > 1. Then

<math>\nu'(x, i) = \eta(x, i)\,\beta'_i\,w_x = \eta(x, i)\,a\,\beta_i\,w_x > \eta(x, i)\,\beta_i\,w_x = \nu(x, i).</math>

Summing up over x ∈ S obtains

<math>P'(i \text{ is selected}) = \sum_{x \in S} \eta(x, i)\,\beta'_i\,w_x > P(i \text{ is selected}),</math> (6)

which achieves (P1). At the same time, it is clear that for any other category j ≠ i, P(j is selected) does not change, and hence (P2) is satisfied too.

Equation (6) uses the assumption that items x are non-overlapping, for example that they form a partition of the image. However, this partition need not be crisp, i.e. it may allow overlapping x's, for example when these are stated in qualitative terms. In such cases, Equation (6) does not hold. Dropping the constraint of non-overlapping items, discussed later, leads to a different interpretation of ν(x, i).

3 Fuzzy Mechanisms for Visual Attention

We consider in this section the situations when the values of the attentional weight and/or category pertinence are not exact. In such situations these values may be represented as fuzzy sets, and therefore, the computation of the categorization of an item must resort to calculus with fuzzy sets. First, let us see why indeed such situations may arise.

Recall that in its original definition, for a given input x and category i, the strength of sensory evidence for E(x, i) satisfies η(x, i) ∈ [0, 1]. Assuming that η(x, i) is the output of an operator/test for category i on item x, this output may be inexact because of the inexact nature of the category i. For example, if the category i = red of the attribute color, then for a given input pixel value x this category holds "more or less" and it may not be useful to commit to an exact 0/1 value.

Likewise, in its original definition, the pertinence of a category, π(i), conveys its importance. Obviously, given a collection of visual categories and a task, they may be distinguished along their pertinence values. Moreover, several categories may have the same, maximum importance for the given task. As an example, consider the pertinence of color categories for the detection of an object which is known to have one of two possible color categories, white or yellow, from the collection of all possible color categories. In this case, it is useful to be able to encode

π(white) = π(yellow) = 1,

which would be possible when π is considered as a possibility distribution on the color categories, regardless of the number of color categories allowed. By contrast, using a probability based approach, the cardinality of R, the collection of categories, restricts the values assigned to these equally possible categories to at most 0.5. That is, π(white) = π(yellow) ≤ 0.5 with equality when R = {yellow, white}.

3.1 A new definition for w(x)

The departure point for the new definition for w(x) is the interpretation of a special case of Equation (1). Let <math>R_a = \{i \in R \mid \pi(i) = a\}</math> and consider the special case <math>R = R_0 \cup R_1</math>, that is, all categories in R are either "fully" pertinent, π(i) = 1 (<math>i \in R_1</math>), or not pertinent, π(i) = 0 (<math>i \in R_0</math>). Then (1) becomes

<math>w(x) = \sum_{i \in R_1} \eta(x, i)</math>

Next let <math>\eta_{max} = \max_{i \in R_1} \eta(x, i)</math>, and recall that η(x, i) ≤ 1. Then

<math>w(x) \leq \sum_{i:\, \pi(i) = 1} \eta_{max} = \eta_{max} \sum_{i \in R_1} 1 = \eta_{max}\,|R_1| \leq |R_1|,</math>

where <math>|R_1|</math> denotes the cardinality of the set <math>R_1</math>. That is, w(x) is bounded by the number of categories i with pertinence π(i) = 1. If η(x, i) = 1 for all <math>i \in R_1</math> then w(x) is exactly the number of such categories.

This meaning of w(x) is very natural and appealing. Indeed, one would expect the item x to count to the extent that it supports more categories. To generalize this notion, define for fixed x ∈ S and fixed task T

<math>\mu_{(x,T)}(i) = \eta(x, i)\,\pi_T(i)</math>

the degree to which category i, pertinent to task T, is supported by the (data) item x as shown by the strength of sensory evidence, η(x, i). Therefore, <math>\mu_{(x,T)} : R \to [0, 1]</math> is the membership function of a fuzzy set on the set of categories. (In the following, assuming only one task, T, for ease of notation, the subscript T will be dropped, to write <math>\mu_x(i)</math>.) Then the weight of item x is now defined as the cardinality of this fuzzy set. That is

<math>\tilde{w}(x) = \mathrm{Card}\,\{(i, \mu_x(i)) \mid i \in R\}</math> (7)

Several formulas for the cardinality of a fuzzy set have been put forward. Here, for illustration purposes, the definition from (Ralescu 1986) is used to obtain

<math>\mathrm{Card}\left(\{\mu_x(i) \mid i \in R\}\right)(k) = \mu_{x,(k)} \wedge (1 - \mu_{x,(k+1)})</math> (8)

where <math>\mu_{x,(k)}</math> denotes the kth largest value of <math>\mu_x(\cdot)</math>, and <math>\mu_{x,(|R|+1)} = 0</math>. Thus, the cardinality defined in Equation (7) is a fuzzy set on {0, ..., |R|}. For an exact value of w(x), the 0.5-level set of <math>\tilde{w}(x)</math> (which is an interval), or its classic cardinality, can be used (Ralescu 1995).

3.2 A new definition for β(i)

Following the discussion from Section 2.4, define

<math>\tilde{\beta}(i) = \min\{A, p(i), u(i)\}</math> (9)

As in the case of β defined in (3), <math>\tilde{\beta}(i) = 0</math> whenever A = 0, or p(i) = 0, or u(i) = 0, and the discussion of (Bundesen 1990) holds: that is, category i biases the selection to the extent that the system is alert, and category i is possible and useful. Alternatively, (9) means that the bias for the selection of i cannot be greater than the system alertness, the possibility of i or its utility. Furthermore, replacing the product by min also eliminates the possibility of values for <math>\tilde{\beta}</math> smaller than each one of A, p(i), and u(i), which is the well-known drowning effect of multiplication of positive values smaller than 1. More importantly, it should be mentioned that the min can handle ordinal or qualitative values, without needing to specify precise numbers. Specifying such precise values might be difficult when subjective assessments are made. By contrast, in the case of such assessments, ordinal or qualitative values are usually easily produced.

As already mentioned, in the fuzzy set framework, the product and min are but two particular cases of a t-norm (conjunction operator). A, p(i), and u(i) are interpreted respectively as degrees of alertness, possibility (rather than probability) of i to be selected, and utility for the category i, and the bias for i is defined as the conjunction of these. This interpretation makes (9) meaningful beyond a mere computational artifice. Another choice for defining <math>\tilde{\beta}</math> is to select a more general aggregation operator, H : [0, 1] × [0, 1] × [0, 1] → [0, 1], which would allow the contribution of more than one of A, p(i), u(i) towards <math>\tilde{\beta}</math>.

3.3 A new definition for ν(x, i)

With the new definitions, <math>\tilde{w}(x)</math> and <math>\tilde{\beta}</math>, of w(x) and β respectively, the meaning of ν(x, i) also changes from a probability to a possibility, more precisely, Possibility(x is i):

<math>\mathrm{Possibility}(x \text{ is } i) = H(\eta(x, i), \tilde{\beta}(i), \tilde{w}(x))</math> (10)

where H is again an aggregation operator, and hence the definition of ν(x, i) from (Bundesen 1990) is a particular case, when H is the product.

For defining H, one may rely on the huge literature on information fusion, for which fuzzy set theory provides a number of useful operators (see e.g. (Dubois and Prade 1985; Yager 1991; Bloch 1996) for reviews on fuzzy fusion operators).
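The fuzzy cardinality of Equation (8) and the two candidate definitions of the bias, the product of Equation (3) and the min of Equation (9), can be compared on small numbers. The sketch below is illustrative only: the membership degrees, the values of A, p and u, and all function names are hypothetical, and the convention that the 0th largest membership equals 1 is an assumption added here so that the cardinality is defined at k = 0 (the text only fixes the value 0 beyond |R|).

```python
def fuzzy_cardinality(memberships):
    """Equation (8): Card(k) = mu_(k) AND (1 - mu_(k+1)), with AND taken as min.

    mu_(k) is the k-th largest membership value and mu_(|R|+1) = 0 as in the
    text. mu_(0) = 1 is an assumed convention so that Card(0) is defined.
    Returns the list [Card(0), ..., Card(|R|)].
    """
    ordered = [1.0] + sorted(memberships, reverse=True) + [0.0]
    return [min(ordered[k], 1.0 - ordered[k + 1])
            for k in range(len(memberships) + 1)]


def bias_product(alertness, possibility, utility):
    """Equation (3): beta(i) = A * p(i) * u(i)."""
    return alertness * possibility * utility


def bias_min(alertness, possibility, utility):
    """Equation (9): beta~(i) = min{A, p(i), u(i)}."""
    return min(alertness, possibility, utility)


# Hypothetical membership degrees mu_x(i) = eta(x, i) * pi(i) for one item x.
mu_x = {"red": 0.9, "yellow": 0.6, "black": 0.0, "green": 0.2}

# Fuzzy weight of x: a fuzzy set on {0, ..., |R|}.
card = fuzzy_cardinality(list(mu_x.values()))

# 0.5-level set of the fuzzy cardinality, one way to extract an exact value.
level_set = [k for k, degree in enumerate(card) if degree >= 0.5]

# Hypothetical alertness, possibility and utility for one category.
A, p_red, u_red = 0.9, 0.8, 0.7
beta_prod = bias_product(A, p_red, u_red)
beta_min = bias_min(A, p_red, u_red)

print(card, level_set, beta_prod, beta_min)
```

Under these assumed values the product bias falls below each of its three arguments, which is the drowning effect mentioned in Section 3.2, while the min bias equals the smallest argument.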
The large choice offered by these operators allows modeling different combination behaviors (conjunctive, disjunctive, compromise, etc.), with different degrees (e.g. the min is a less severe conjunction than the product). Operators can also behave differently depending on whether the values to be combined are small, large, of the same order of magnitude, or have different priorities. The operators H could also be set differently for the three values. For instance η and <math>\tilde{w}</math>, which depend on x and i, could be combined using an operator <math>H_1</math>, and the result combined with <math>\tilde{\beta}</math>, which depends on i only, using another operator <math>H_2</math>.

4 Conclusions and Future Work

This paper discussed an attentional model developed in the field of psychology and cognitive science, set in a probabilistic framework. The basic concepts of this model were discussed and an alternative, fuzzy set based approach was suggested. In the fuzzy set framework, modeling would be easier and more natural (for instance replacing numbers by ordinal or qualitative values), and it would allow for more flexible ways of combining the different terms. This discussion paves the way for a new attentional model, whose complete development is left for future work.

5 Acknowledgments

Anca Ralescu's contribution was partially supported by a visit to Telecom ParisTech.

References

Bloch, I. 1996. Information Combination Operators for Data Fusion: A Comparative Review with Classification. IEEE Transactions on Systems, Man, and Cybernetics 26(1):52–67.

Borji, A., and Itti, L. 2013. State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(1):185–207.

Bundesen, C.; Habekost, T.; and Kyllingsbæk, S. 2005. A neural theory of visual attention: bridging cognition and neurophysiology. Psychological Review 112(2):291.

Bundesen, C.; Vangkilde, S.; and Petersen, A. 2014. Recent developments in a computational theory of visual attention (TVA). Vision Research.

Bundesen, C. 1990. A theory of visual attention. Psychological Review 97(4):523.

Desolneux, A.; Moisan, L.; and Morel, J.-M. 2003. Computational gestalts and perception thresholds. Journal of Physiology-Paris 97(2):311–324.

Dubois, D., and Prade, H. 1985. A Review of Fuzzy Set Aggregation Connectives. Information Sciences 36:85–121.

Fouquier, G.; Atif, J.; and Bloch, I. 2012. Sequential model-based segmentation and recognition of image structures driven by visual features and spatial relations. Computer Vision and Image Understanding 116(1):146–165.

Gao, D.; Mahadevan, V.; and Vasconcelos, N. 2008. The discriminant center-surround hypothesis for bottom-up saliency. In Advances in Neural Information Processing Systems, 497–504.

Humphreys, G. W. 2014. Feature confirmation in object perception: Feature integration theory 26 years on from the Treisman Bartlett lecture. The Quarterly Journal of Experimental Psychology (just-accepted):1–49.

Kahneman, D. 1973. Attention and Effort. Prentice-Hall.

Ralescu, A. L. 1986. A note on rule representation in expert systems. Information Sciences 38(2):193–203.

Ralescu, D. 1995. Cardinality, quantifiers, and the aggregation of fuzzy criteria. Fuzzy Sets and Systems 69(3):355–365.

Treisman, A. M., and Gelade, G. 1980. A feature-integration theory of attention. Cognitive Psychology 12(1):97–136.

Treisman, A. 1988. Features and objects: The fourteenth Bartlett memorial lecture. The Quarterly Journal of Experimental Psychology 40(2):201–237.

Treisman, A. 2014. The psychological reality of levels of processing. Levels of processing in human memory 301–330.

Yager, R. R. 1991. Connectives and Quantifiers in Fuzzy Sets. Fuzzy Sets and Systems 40:39–75.

Yan, Q.; Xu, L.; Shi, J.; and Jia, J. 2013. Hierarchical saliency detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1155–1162.

Yang, C.; Zhang, L.; Lu, H.; Ruan, X.; and Yang, M.-H. 2013. Saliency detection via graph-based manifold ranking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3166–3173.