Explaining Contrasting Categories

Michael Pazzani, Amir Feghahati, Christian Shelton, Aaron Seitz
University of California, Riverside, Riverside, CA, United States
pazzani@ucr.edu, sfegh001@ucr.edu, cshelton@cs.ucr.edu, aseitz@ucr.edu

ABSTRACT
This paper describes initial progress on deep learning capable not only of fine-grained categorization tasks, such as whether an image of a bird is a Western Grebe or a Clark's Grebe, but also of explaining such contrasts in ways that make them understandable. Knowledge discovery in databases has been described as the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [1]. In spite of this, much of machine learning has focused on "valid" and "useful," with little attention paid to "understandable" [2-6]. Recent work in deep learning has shown remarkable accuracy on a wide range of tasks [7], but produces models that are more difficult to interpret than most earlier approaches to artificial intelligence and machine learning. Our ultimate goal is to learn to annotate images to explain the difference between contrasting categories, as found in bird guides or medical books.

Author Keywords
Explainable Artificial Intelligence, Machine Learning, Categorization, Deep Learning

ACM Classification Keywords
I.2.6 Artificial Intelligence: Learning (K.3.2).

HISTORY
The first author's research on learning explainable models from data started in the mid-1990s after interacting with doctors on models for medical diagnosis [2-5]. Although some have focused on which representation formalism is more "understandable" (e.g., [8]), the research has focused on how to constrain or bias an algorithm within a particular representation to produce results that are acceptable to human experts [6].

In this paper, we investigate how people explain contrasting categories and develop algorithms to create explanations of the category of objects in images (e.g., [9]). We focus not on explaining why an object belongs to a certain category, but rather why it belongs to that category and not a contrasting category. Figures 1 and 2 show examples of the explanations that people use to explain contrasting categories. Medical diagnosis is an area where explanations are of importance, particularly when treatments are risky or painful. For example, deep learning systems [10] have proven accurate at analyzing images to identify melanoma, but not at explaining the diagnosis in a way that gives patients or doctors confidence in following treatments. Figure 1 is from the web site http://tiphero.com/skin-cancer/; it lists several signs of melanoma, provides examples of what patients and physicians look for, and is a model for what explainable learning should aspire to produce. Dermatology, as well as histology and radiology [11], are examples where visual clues are important to initial differential diagnosis. Figure 2 shows an explanation from a bird web site on how to distinguish two varieties of grebes. We aspire for our deep learning algorithms to create explanations similar to those in these figures.

In the remainder of this paper, we concentrate on bird identification due to the availability of large existing image datasets and the ease of finding amateur bird watchers. We first show that amateur bird watchers attend to distinguishing characteristics, known in the bird watching community as "field marks," such as those in Figure 2. Next, we describe an extension to a deep learning system that automatically identifies field marks. We leave the question of how to describe field marks to future research.

Figure 1. Explanation of how to differentiate moles from melanoma.

Figure 2. Explanation of how to distinguish a Clark's grebe from a Western grebe.
BIRD IDENTIFICATION: A PILOT STUDY
We prepared images of 12 birds divided into 6 sets of contrasting birds (e.g., Spotted Towhee and Eastern Towhee). Images were shown to four experienced bird watchers, who were asked a yes-or-no question about the bird's identity (e.g., "Is this a Spotted Towhee?"). Using an eye tracking system, we recorded the parts of the image that received attention. Figure 3 shows an example of where one subject focused on the wing of an eastern towhee and a spotted towhee, two similar pictures distinguished in part by the spots on the wing. In contrast, to distinguish a Clark's Grebe from a Western Grebe, that subject concentrated on the area around the eye and the bill (Figure 4).

Although suggestive of how bird watchers learn and attend to field marks to distinguish similar species, the data is preliminary and requires more subjects and statistical tests. Experiments are planned to show two contrasting images simultaneously to experienced bird watchers and to track attentional changes in novices as they learn to identify birds.

Figure 3. Eye tracking data shows an experienced bird watcher concentrates on the wing to distinguish a spotted towhee (left) from an eastern towhee (right).

Figure 4. Eye tracking data shows an experienced bird watcher concentrates on the eye to distinguish a Western grebe (top) from a Clark's grebe (bottom).

LEARNING DISCRIMINATIVE REGIONS OF CONTRASTING CATEGORIES
Deep learning for image classification has shown great results [12, 13], surpassing the previous best computer vision systems. Although classification is an interesting and challenging problem, we want to go one step further and augment the deep network to create a contrasting visual explanation. This explanation identifies the regions of an image that discriminate the selected category from the second most likely category. In bird identification, these regions should correspond to the field marks described in bird guides and the areas that bird watchers focus on when identifying bird species.

We demonstrate the approach on a fine-grained bird classification task. We seek to find which regions of a bird image are the most important for distinguishing it from images of the most similar contrasting class of birds. To do so, we train a known deep network [12] on the bounding boxes of a frequently used birds dataset [14]. This network was chosen for its simplicity; our proposed method does not depend on any specific network architecture. As the contrasting visual explanation, we highlight those sections of the input image that were most important to the network for distinguishing two classes: the output class and the next closest class. The next closest class is chosen because it is the class the network has the most difficulty distinguishing from the output class. As a simple extension, the user may ask for an explanation with respect to any other category.

To find these image regions, we backward propagate an output vector consisting of +1 for the most likely class and -1 for the next most likely class. This propagation is similar to the "backprop" algorithm used to train deep networks, but here it is applied for explanation rather than for training network weights. The process identifies the most important pixels, i.e., those for which changes will cause the network to assign the given image to the most similar (i.e., the next most likely) class instead of the correct class. This raw set of pixels is too sporadically distributed to provide a human-consumable explanation. To obtain larger coherent regions, we convolve windows of different sizes over the image and record the maximum change in each window. The windows with the maximum changes are the regions that contribute most to misguiding the network, and they are the most important regions for explaining the features that discriminate the contrasting categories. The process is depicted in Figure 5.
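The procedure can be summarized in code. The following is only a minimal sketch, not the implementation used in this paper: it assumes a trained PyTorch classifier `model` (in eval mode) and a normalized input tensor; the function name `contrastive_regions`, the window sizes, the stride, and the use of max pooling to record each window's largest change are illustrative assumptions.

```python
# Minimal sketch of the contrastive explanation described above.
# Assumptions (not from the paper): PyTorch, a trained classifier
# `model` in eval mode, and `image` of shape (1, 3, H, W).
import torch
import torch.nn.functional as F

def contrastive_regions(model, image, window_sizes=(16, 32, 64), top_k=3):
    """Return a per-pixel saliency map and candidate explanatory windows."""
    image = image.detach().clone().requires_grad_(True)
    logits = model(image)                            # shape (1, num_classes)
    best, runner_up = logits.topk(2, dim=1).indices[0]

    # Backward propagate +1 for the most likely class and -1 for the
    # next most likely class: ordinary backprop, used here for
    # explanation rather than for updating weights.
    signal = torch.zeros_like(logits)
    signal[0, best], signal[0, runner_up] = 1.0, -1.0
    logits.backward(gradient=signal)

    # The magnitude of the input derivative marks the pixels whose
    # change most easily pushes the image toward the runner-up class.
    saliency = image.grad.abs().sum(dim=1, keepdim=True)   # (1, 1, H, W)

    # Raw pixels are too scattered, so slide windows of several sizes
    # over the map and record the maximum change inside each window.
    candidates = []
    for w in window_sizes:
        stride = max(1, w // 2)
        scores = F.max_pool2d(saliency, kernel_size=w, stride=stride)
        flat = scores.flatten()
        for idx in flat.topk(min(top_k, flat.numel())).indices:
            row, col = divmod(idx.item(), scores.shape[-1])
            # Map window-grid coordinates back to pixel coordinates.
            candidates.append((w, row * stride, col * stride, flat[idx].item()))

    # Keep the windows with the largest recorded change.
    candidates.sort(key=lambda c: -c[-1])
    return saliency.detach(), candidates[:top_k]
```

Note that this sketch does not suppress overlapping windows; producing a display like Figure 6 would likely require merging or non-maximum suppression over the returned candidates.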
Figure 5. (top) Standard forward propagation. (bottom) Our backward propagation, starting with a vector difference of the best class from the second-best class, passing through the network to produce the derivative of the input plane with respect to this class difference, followed by convolution and region finding to identify the most important regions of the image. Note that, while not shown, the lower propagation depends on the upper, as the derivatives are evaluated at the points defined by the upper propagation.

As one example, the network correctly identified a test image as a cerulean warbler, with the second most likely classification being a black-throated blue warbler. Figure 6 (left) highlights the regions of the image found to be most important in distinguishing the two classes, including the eye and throat. Figure 6 (right) is an image of a black-throated blue warbler, which shows the difference in throat and neck.

Figure 6. An image of a cerulean warbler (left) with highlighting indicating the regions that distinguish it from a black-throated blue warbler (right).

To illustrate the impact of having a contrasting category, Figure 7 shows the regions that contribute most to producing the correct category without regard to the difference from a contrasting category (i.e., without propagating -1 for the next most similar class). Note that neither the throat nor the eye is highlighted.

Figure 7. Important regions in categorizing a cerulean warbler without the contrasting categories.
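The non-contrastive map of Figure 7 corresponds to dropping the -1 term from the propagated output vector. A sketch under the same assumptions as above (PyTorch, hypothetical function name):

```python
import torch

def single_class_saliency(model, image):
    """Saliency for the predicted class alone (as in Figure 7):
    propagate +1 for the top class, with no -1 for the runner-up."""
    image = image.detach().clone().requires_grad_(True)
    logits = model(image)
    signal = torch.zeros_like(logits)
    signal[0, logits.argmax(dim=1)] = 1.0   # no contrast-class term
    logits.backward(gradient=signal)
    return image.grad.abs().sum(dim=1, keepdim=True)
```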
CONCLUSION
We have begun to explore how machine learning may emulate how humans explain contrasting categories. Preliminary data show the features that experienced bird watchers use to differentiate contrasting categories, and a network architecture learned similar features. In future work, we will explore how to label differentiating features using techniques similar to [9], with the ultimate goal of automating explanations similar to those found in bird guides and medical books.

ACKNOWLEDGEMENTS
This work was developed with funding from the DARPA Explainable AI Program under a contract from NRL. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the DoD or the U.S. Government.

REFERENCES
1. Fayyad, U. M., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery: an overview. In U. M. Fayyad et al. (Eds.), Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, Menlo Park, CA, pp. 1-34.
2. Pazzani, M. (2000). Knowledge discovery from data? IEEE Intelligent Systems, 15(2), 10-13.
3. Pazzani, M. J., Mani, S., & Shankle, W. R. (2001). Acceptance of rules generated by machine learning among medical experts. Methods of Information in Medicine, 40, 380-385.
4. Pazzani, M., Mani, S., & Shankle, W. R. (1997). Beyond concise and colorful: learning intelligible rules. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA. AAAI Press, pp. 235-238.
5. Pazzani, M., Mani, S., & Shankle, W. R. (1997). Comprehensive knowledge-discovery in databases. In M. G. Shafto & P. Langley (Eds.), Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society, pp. 596-601. Mahwah, NJ: Lawrence Erlbaum.
6. Pazzani, M. J., & Bay, S. D. (1999). The independent sign bias: gaining insight from multiple linear regression. In Proceedings of the Twenty-First Annual Meeting of the Cognitive Science Society.
7. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
8. Gaines, B. (1996). Transforming rules and trees into comprehensible knowledge structures. In U. M. Fayyad et al. (Eds.), Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, MA, pp. 205-226.
9. Hendricks, L. A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., & Darrell, T. (2016). Generating visual explanations. In B. Leibe, J. Matas, N. Sebe, & M. Welling (Eds.), Computer Vision – ECCV 2016, Lecture Notes in Computer Science, vol. 9908. Springer, Cham.
10. Premaladha, J., & Ravichandran, K. S. (2016). Novel approaches for diagnosing melanoma skin lesions through supervised and deep learning algorithms. Journal of Medical Systems, 40(4), 96.
11. Norman, G. R., Rosenthal, D., Brooks, L. R., Allen, S. W., & Muzzin, L. J. (1989). The development of expertise in dermatology. Archives of Dermatology, 125(8), 1063-1068.
12. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS 2012.
13. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. arXiv:1512.03385.
14. Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology.