=Paper=
{{Paper
|id=Vol-2068/exss3
|storemode=property
|title=Explaining Contrasting Categories
|pdfUrl=https://ceur-ws.org/Vol-2068/exss3.pdf
|volume=Vol-2068
|authors=Michael Pazzani,Amir Feghahati,Christian Shelton,Aaron Seitz
|dblpUrl=https://dblp.org/rec/conf/iui/PazzaniFSS18
}}
==Explaining Contrasting Categories==
<pdf width="1500px">https://ceur-ws.org/Vol-2068/exss3.pdf</pdf>
<pre>
                               Explaining Contrasting Categories
                    Michael Pazzani, Amir Feghahati, Christian Shelton, Aaron Seitz
                                    University of California, Riverside
                                          Riverside, CA, United States
                    pazzani@ucr.edu, sfegh001@ucr.edu, cshelton@cs.ucr.edu, aseitz@ucr.edu
ABSTRACT
This paper describes initial progress on deep learning capable not only of fine-grained categorization tasks, such as deciding whether an image of a bird shows a Western Grebe or a Clark's Grebe, but also of explaining the contrast between categories in an understandable way. Knowledge discovery in databases has been described as the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [1]. In spite of this, much of machine learning has focused on "valid" and "useful" with little attention paid to "understandable" [2-6]. Recent work in deep learning has shown remarkable accuracy on a wide range of tasks [7], but produces models that are more difficult to interpret than most earlier approaches to artificial intelligence and machine learning. Our ultimate goal is to learn to annotate images to explain the difference between contrasting categories, as found in bird guides or medical books.

Author Keywords
Explainable Artificial Intelligence, Machine Learning, Categorization, Deep Learning

ACM Classification Keywords
I.2.6 Artificial Intelligence: Learning (K.3.2).

HISTORY
The first author's research on learning explainable models from data started in the mid-1990s after interacting with doctors on models for medical diagnosis [2-5]. Although some have focused on which representation formalism is more "understandable" (e.g., [8]), the research has focused on how to constrain or bias an algorithm within a particular representation to produce results that are acceptable to human experts [6]. In this paper, we investigate how people explain contrasting categories and develop algorithms to create explanations of the category of objects in images (e.g., [9]). We focus not on explaining why an object belongs to a certain category, but rather why it belongs to that category and not a contrasting category. Figures 1 & 2 show examples of the explanations people use to distinguish contrasting categories. Medical diagnosis is an area where explanations are of importance, particularly when treatments are risky or painful. For example, deep learning systems [10] have proven accurate at analyzing images to identify melanoma, but not at explaining the diagnosis in a way that gives patients or doctors confidence in following treatments. Figure 1, from the web site http://tiphero.com/skin-cancer/, lists several signs of melanoma, provides examples of what patients and physicians look for, and serves as a model for what explainable learning should aspire to produce. Dermatology, as well as histology and radiology [11], are areas where visual clues are important to an initial differential diagnosis. Figure 2 shows an explanation from a bird web site of how to distinguish two varieties of grebes. We aspire for our deep learning algorithms to create explanations similar to those in these figures. In the remainder of this paper, we concentrate on bird identification, due to the availability of large existing image datasets and the ease of finding amateur bird watchers. We first show that amateur bird watchers attend to the distinguishing characteristics, known in the bird watching community as "field marks," such as those in Figure 2. Next, we describe an extension to a deep learning system that automatically identifies field marks. We leave describing field marks in words to future research.

© 2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. ExSS '18, March 11, Tokyo, Japan.

BIRD IDENTIFICATION: A PILOT STUDY
We prepared images of 12 birds divided into 6 sets of contrasting birds (e.g., Spotted Towhee and Eastern Towhee). The images were shown to four experienced bird watchers, who were asked a yes-or-no question about the bird's identification (e.g., "Is this a Spotted Towhee?"). Using an eye tracking system, we recorded the parts of the image that received attention. Figure 3 shows an example in which one subject focused on the wing of an Eastern Towhee and a Spotted Towhee, two similar birds distinguished in part by the spots on the wing. In contrast, to distinguish a Clark's Grebe from a Western Grebe, that subject concentrated on the area around the eye and the bill.

Although suggestive of how bird watchers learn and attend to field marks to distinguish similar species, the data is preliminary and requires more subjects and statistical tests. Experiments are planned to show two contrasting images simultaneously to experienced bird watchers and to track attentional changes in novices as they learn to identify birds.
                            Figure 1. Explanation of how to differentiate moles from melanoma.




                     Figure 2. Explanation of how to distinguish a Clark’s grebe from a Western grebe.




Figure 3. Eye tracking data shows an experienced bird watcher concentrates on the wing to distinguish a spotted towhee (left) from an eastern towhee (right).
Figure 4. Eye tracking data shows an experienced bird watcher concentrates on the eye to distinguish a Western grebe (top) from a Clark's grebe (bottom).

LEARNING DISCRIMINATIVE REGIONS OF CONTRASTING CATEGORIES
Deep learning for image classification has shown great results [12, 13], surpassing the previous best computer vision systems. Although classification is an interesting and challenging problem, we want to go one step further and augment the deep network to create a contrasting visual explanation. This explanation identifies the regions of an image that discriminate the selected category from the second most likely category. In bird identification, these regions should correspond to the field marks described in bird guides and to the areas that bird watchers focus on when identifying bird species.

We demonstrate the approach on a fine-grained bird classification task. We seek to find out which regions of a bird image are the most important for distinguishing it from images of the most similar contrasting class of birds. To do so, we train a known deep network [12] on the bounding boxes of a frequently used birds dataset [14]. This network was chosen for its simplicity; our proposed method does not depend on any specific network architecture. As the contrasting visual explanation, we highlight those sections of the input image that were most important to the network for distinguishing the two classes: that of the output class and that of the next closest class. The next closest class is chosen because it is the one the network has the most difficulty distinguishing. As a simple extension, the user could ask for an explanation with respect to any other category.

To find these image regions, we backward propagate an output vector consisting of +1 for the most likely class and -1 for the next most likely class. This propagation is similar to the "backprop" algorithm used to train deep networks, but here it is applied for explanation rather than for training the network weights.

This process identifies the most important pixels, i.e., those whose changes would cause the network to assign the given image to the most similar (i.e., the next most likely) class instead of the correct class. This raw set of pixels is too sporadically distributed to provide a human-consumable explanation. To get larger coherent regions, we slide windows of different sizes over the image and record the maximum change within each window. The windows with the maximum changes are the regions that contribute most to confusing the network, and therefore the most important regions for explaining the features that discriminate the contrasting categories. The process is depicted in Figure 5.

Figure 5. (top) Standard forward propagation. (bottom) Our backward propagation, starting with a vector difference of the best class from the second-best class, passing through the network to produce the derivative of the input plane with respect to this class difference, and then convolution and region finding to identify the most important regions of the image. Note that while not shown, the lower propagation depends on the upper, as the derivatives are evaluated at the points defined through the upper propagation.

As one example, the network correctly identified a test image as a cerulean warbler, with the second most likely classification being a black-throated blue warbler. Figure 6 (left) highlights the regions of the image that were found to be most important in distinguishing the two classes, including the eye and throat. Figure 6 (right) is an image of a black-throated blue warbler, which shows the difference in throat and neck.
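The backward pass and the window scan described above can be sketched in code. To keep the sketch self-contained, a tiny linear scorer stands in for the deep network, so the per-pixel derivative of the class-score difference has a closed form; with a real network the same saliency map would be obtained by backpropagating the +1/-1 output vector through an autodiff framework. All function names here are illustrative, not from the authors' implementation.

```python
# A minimal sketch of the contrastive-explanation idea, assuming a tiny
# linear scorer (s_c = W[c] . x) stands in for the deep network. For a
# linear model the derivative of (s_top - s_runner_up) with respect to
# each pixel is simply W[top] - W[runner_up]; with a real deep network
# this map would come from backpropagating an output vector of +1 for
# the top class and -1 for the runner-up class to the input plane.

def class_scores(W, x):
    """Score each class as the dot product of its weight map with image x."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]

def contrastive_saliency(W, x):
    """Per-pixel importance for 'top class rather than runner-up'."""
    scores = class_scores(W, x)
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    top, runner_up = order[0], order[1]
    # Gradient of (s_top - s_runner_up) with respect to the input pixels;
    # its magnitude says how strongly a pixel separates the two classes.
    grad = [wt - wr for wt, wr in zip(W[top], W[runner_up])]
    return [abs(g) for g in grad], top, runner_up

def best_window(saliency, width, win):
    """Slide a win x win window over a row-major saliency map of the given
    width and return (row, col, total) for the highest-scoring window."""
    height = len(saliency) // width
    best = (0, 0, float("-inf"))
    for r in range(height - win + 1):
        for c in range(width - win + 1):
            total = sum(saliency[(r + i) * width + (c + j)]
                        for i in range(win) for j in range(win))
            if total > best[2]:
                best = (r, c, total)
    return best
```

On a toy 4x4 "image" where the top two classes differ only in their weights over the top-left 2x2 pixels, `best_window` with `win=2` recovers that corner; in the bird setting, the analogous windows would land on field marks such as the throat or eye. Repeating the scan with several window sizes and keeping the best window of each size yields the multi-scale regions described above.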
Figure 6. An image of a cerulean warbler (left) with highlighting indicating the regions that distinguish it from a black-throated blue warbler (right).

To illustrate the impact of having a contrasting category, Figure 7 shows the regions that contribute most to producing the correct category without regard to the difference from a contrasting category (i.e., without propagating -1 for the next most similar class). Note that neither the throat nor the eye is highlighted.

Figure 7. Important regions in categorizing a cerulean warbler without the contrasting categories.

CONCLUSION
We have begun to explore how machine learning may emulate how humans explain contrasting categories. Preliminary data show the features that experienced bird watchers use to differentiate contrasting categories. A network architecture learned similar features. In future work, we will explore how to label differentiating features using techniques similar to [9], with the ultimate goal of automating explanations similar to those found in bird guides and medical books.

ACKNOWLEDGEMENTS
This was developed with funding from the DARPA Explainable AI Program under a contract from NRL. The views, opinions, and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the DoD or the U.S. Government.

REFERENCES
1.  Fayyad, U. M., Piatetsky-Shapiro, G., & Smyth, P. (1996). From Data Mining to Knowledge Discovery: An Overview. In Advances in Knowledge Discovery and Data Mining, U.M. Fayyad et al., eds., AAAI/MIT Press, Menlo Park, CA, pp. 1-34.
2.  Pazzani, M. (2000). Knowledge discovery from data? IEEE Intelligent Systems, 15(2): 10-13.
3.  Pazzani, M. J., Mani, S., & Shankle, W. R. (2001). Acceptance of Rules Generated by Machine Learning among Medical Experts. Methods of Information in Medicine, 40: 380-385.
4.  Pazzani, M., Mani, S., & Shankle, W. R. (1997). Beyond concise and colorful: learning intelligible rules. Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA. AAAI Press, 235-238.
5.  Pazzani, M., Mani, S., & Shankle, W. R. (1997). Comprehensive knowledge-discovery in databases. In M. G. Shafto & P. Langley (eds.), Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society, pp. 596-601. Mahwah, NJ: Lawrence Erlbaum.
6.  Pazzani, M. J., & Bay, S. D. (1999). The Independent Sign Bias: Gaining Insight from Multiple Linear Regression. In Proceedings of the Twenty-First Annual Meeting of the Cognitive Science Society.
7.  LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
8.  Gaines, B. (1996). Transforming Rules and Trees into Comprehensible Knowledge Structures. In Advances in Knowledge Discovery and Data Mining, U.M. Fayyad et al., eds., MIT Press, Cambridge, MA, pp. 205-226.
9.  Hendricks, L. A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., & Darrell, T. (2016). Generating Visual Explanations. In: Leibe, B., Matas, J., Sebe, N., & Welling, M. (eds), Computer Vision - ECCV 2016. Lecture Notes in Computer Science, vol 9908. Springer, Cham.
10. Premaladha, J., & Ravichandran, K. S. (2016). Novel approaches for diagnosing melanoma skin lesions through supervised and deep learning algorithms. Journal of Medical Systems, 40(4): 96.
11. Norman, G. R., Rosenthal, D., Brooks, L. R., Allen, S. W., & Muzzin, L. J. (1989). The Development of Expertise in Dermatology. Archives of Dermatology, 125(8): 1063-1068.
12. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS 2012.
13. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. arXiv:1512.03385.
14. Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology.

</pre>