<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Pazzani</string-name>
          <email>pazzani@ucr.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amir Feghahati</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Shelton</string-name>
          <email>cshelton@cs.ucr.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aaron Seitz</string-name>
          <email>aseitz@ucr.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          Author Keywords: Explainable Artificial Intelligence, Machine Learning, Categorization, Deep Learning
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of California, Riverside</institution>
          ,
          <addr-line>Riverside, CA</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes initial progress on deep learning capable not only of fine-grained categorization tasks, such as deciding whether an image of a bird shows a Western Grebe or a Clark's Grebe, but also of explaining the contrasts between categories to make them understandable. Knowledge discovery in databases has been described as the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [1]. In spite of this, much of machine learning has focused on “valid” and “useful” with little attention paid to “understandable” [2-6]. Recent work in deep learning has shown remarkable accuracy on a wide range of tasks [7], but produces models that are more difficult to interpret than most earlier approaches to artificial intelligence and machine learning. Our ultimate goal is to learn to annotate images to explain the difference between contrasting categories as found in bird guides or medical books.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>HISTORY</title>
      <p>
The first author’s research on learning explainable models
from data started in the mid-1990s after interacting with
doctors on models for medical diagnosis [
        <xref ref-type="bibr" rid="ref2 ref4 ref5">2-5</xref>
        ]. Although
some have focused on which representation formalism is
more “understandable” (e.g., [8]), the research has focused
on how to constrain or bias an algorithm within a particular
representation to produce results that are acceptable to
human experts [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In this paper, we investigate how people
explain contrasting categories and develop algorithms to
create explanations of the category of objects in images (e.g.,
[9]). We focus not on explaining why an object belongs to a
certain category, but rather why it belongs to that category
and not a contrasting category. Figures 1 and 2 show
examples of such explanations that people use to explain
contrasting categories.
(© 2018. Copyright for the individual papers remains with the authors.
Copying permitted for private and academic purposes. ExSS '18, March 11,
Tokyo, Japan.)
Medical diagnosis is an area where
explanations are of importance, particularly when treatments
are risky or painful. For example, deep learning systems
[
        <xref ref-type="bibr" rid="ref11">10</xref>
        ] have proven accurate at analyzing images to identify
melanoma, but not at explaining the diagnosis in a way that
gives patients or doctors confidence in following treatments.
Figure 1 is from the web site http://tiphero.com/skin-cancer/;
it lists several signs of melanoma, provides examples of what
patients and physicians look for, and offers a model of what
explainable learning should aspire to produce. Dermatology
as well as histology and radiology [
        <xref ref-type="bibr" rid="ref12">11</xref>
        ] are examples where
visual clues are important to initial differential diagnosis.
Figure 2 shows an explanation from a bird web site on how
to distinguish two varieties of grebes. We aspire for our deep
learning algorithms to create explanations similar to those in
the figures. In the remainder of this paper, we concentrate on
bird identification due to the availability of large existing
image datasets and the ease of finding amateur bird watchers.
We first show that amateur bird watchers attend to the
distinguishing characteristics, known in the bird watching
community as “field marks,” such as those in Figure 2. Next,
we describe an extension to a deep learning system that
automatically identifies field marks. We leave describing
field marks in words to future research.
      </p>
    </sec>
    <sec id="sec-2">
      <title>BIRD IDENTIFICATION: A PILOT STUDY</title>
      <p>We prepared images of 12 birds divided into 6 sets of
contrasting birds (e.g., Spotted Towhee and Eastern
Towhee). Images were shown to four experienced bird
watchers who were asked a yes-or-no question about the bird
identification (e.g., “Is this a Spotted Towhee?”). Using an
eye tracking system, we recorded the parts of the image that
received attention. Figure 3 shows an example in which one
subject focused on the wing of the Eastern Towhee and the
Spotted Towhee, two similar species distinguished in part by
the spots on the wing. In contrast, to distinguish a Clark’s
Grebe from a Western Grebe, the same subject concentrated on
the area around the eye and the bill.</p>
      <p>Although suggestive of how bird watchers learn and attend
to field marks to distinguish similar species, the data is
preliminary and requires more subjects and statistical tests.
Experiments are planned to show two contrasting images
simultaneously to experienced bird watchers and to track
attentional changes in novices as they learn to identify birds.</p>
    </sec>
    <sec id="sec-3">
      <title>LEARNING DISCRIMINATIVE REGIONS OF CONTRASTING CATEGORIES</title>
      <p>
Deep learning for image classification has shown great
results [
        <xref ref-type="bibr" rid="ref13 ref14">12, 13</xref>
        ], surpassing the previous best computer vision
systems. Although classification is an interesting and
challenging problem, we want to go one step further and
augment the deep network to create a contrasting visual
explanation. This explanation identifies the regions of an
image that discriminate the selected category from the
second most likely category. In bird identification, these
regions should correspond to the field marks described in
bird guides and the areas that bird watchers focus on when
identifying bird species.
      </p>
      <p>
        We demonstrate the approach with a fine-grained
classification task of birds. We seek to find out which regions
of a bird image are the most important to distinguish it from
images of the most similar contrasting class of birds. To do
so, we train a known deep network [
        <xref ref-type="bibr" rid="ref13">12</xref>
        ] on the bounding
boxes of a frequently used bird dataset [<xref ref-type="bibr" rid="ref15">14</xref>]. The network
has been chosen because of its simplicity. Our proposed
method is not dependent on any specific network
architecture. As a contrasting visual explanation, we
highlight those sections of the input image that were most
important to the network for distinguishing the two classes:
the output class and the next closest class. The
next closest class is chosen because the network has the most
difficulty distinguishing the two. As a simple extension, it is
possible for the user to ask for an explanation with respect to
any other category.
      </p>
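      <p>As a concrete sketch of choosing the contrast class, the runner-up can be read directly off the network's output scores; the function name below is our own illustration, not code from the system described:</p>
      <preformat>
```python
import numpy as np

def pick_classes(logits):
    # Sort class scores in descending order: the top entry is the
    # predicted class, and the second is the contrast class that the
    # network has the most difficulty separating from it.
    order = np.argsort(logits)[::-1]
    return int(order[0]), int(order[1])
```
      </preformat>
      <p>For example, pick_classes(np.array([0.1, 2.0, 1.5])) returns (1, 2): class 1 is predicted and class 2 is the contrasting runner-up.</p>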
      <p>To find these image regions, we backward propagate an
output vector consisting of +1 for the most likely class and
-1 for the next most likely class. This propagation is similar
to the “backprop” algorithm used to train
deep networks, but here it is applied to produce an explanation
rather than to update network weights.</p>
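      <p>The backward pass just described can be sketched in closed form for the simplest case of a linear scoring layer; the function name and the linear model are illustrative assumptions, not the network used in this work:</p>
      <preformat>
```python
import numpy as np

def contrastive_saliency(W, top, contrast):
    # For a linear scorer with logits = W @ x, backpropagating an
    # output vector holding +1 at the predicted class and -1 at the
    # contrasting class yields the input gradient W[top] - W[contrast].
    # Its magnitude ranks input pixels by how strongly they pull the
    # decision between the two classes.
    signal = np.zeros(W.shape[0])
    signal[top], signal[contrast] = 1.0, -1.0
    grad = signal @ W  # chain rule through the linear layer
    return np.abs(grad)
```
      </preformat>
      <p>For a deep network, the same +1/-1 output vector is handed to the framework's automatic differentiation in place of this closed-form product.</p>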
      <p>This process identifies the most important pixels, i.e., those
for which changes will cause the network to assign the given
image to the most-similar (i.e., the next most likely) class
instead of the correct class. This raw set of pixels is too
sporadically distributed to provide a human-consumable
explanation. To obtain larger coherent regions, we convolve
windows of different sizes over the image and record the
maximum change within each window. The windows with the
largest changes are the regions that contribute most to
shifting the network’s decision and are therefore the most
important regions for explaining the features that discriminate
the contrasting categories. The process is depicted in Figure 5.
As one example, the network correctly identified a test
image as a Cerulean Warbler, with the second most likely
classification being a Black-throated Blue Warbler. Figure 6 (left)
highlights the regions of the image that were found to be
most important in distinguishing the two classes, including
the eye and throat. Figure 6 (right) is an image of a
Black-throated Blue Warbler, which shows the difference in
throat and neck.</p>
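      <p>The window-scanning step above can be sketched as follows; the function name, the window sizes, and the sum-of-saliency score are our assumptions for illustration:</p>
      <preformat>
```python
import numpy as np

def top_windows(saliency, sizes=(16, 32), k=3):
    # Slide square windows of several sizes over a 2-D saliency map,
    # score each window by its total saliency, and return the k
    # highest-scoring (row, col, size) windows as candidate regions.
    h, w = saliency.shape
    scored = []
    for s in sizes:
        for r in range(h - s + 1):
            for c in range(w - s + 1):
                scored.append((saliency[r:r+s, c:c+s].sum(), r, c, s))
    scored.sort(reverse=True)
    return [(r, c, s) for _, r, c, s in scored[:k]]
```
      </preformat>
      <p>The returned (row, col, size) triples can then be drawn as highlighted boxes over the input image, as in Figure 6.</p>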
      <p>To illustrate the impact of having a contrasting category,
Figure 7 shows the regions that contribute most to producing
the correct category without regard to the difference from a
contrasting category (i.e., not propagating -1 for the next
most likely class). Note that neither the throat nor the eye
is highlighted.</p>
    </sec>
    <sec id="sec-4">
      <title>CONCLUSION</title>
      <p>We have begun to explore how machine learning may
emulate how humans explain contrasting categories.
Preliminary data show the features that experienced bird
watchers use to differentiate contrasting categories. A
network architecture learned similar features. In future
work, we will explore how to label differentiating features
using techniques similar to [9] with an ultimate goal to
automate explanations similar to those found in bird guides
and medical books.</p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This work was developed with funding from the DARPA
Explainable AI Program under a contract from NRL. The
views, opinions, and/or findings expressed are those of the
authors and should not be interpreted as representing the
official views or policies of the DoD or the U.S. Government.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “<article-title>From Data Mining to Knowledge Discovery: An Overview</article-title>,” in <source>Advances in Knowledge Discovery and Data Mining</source>, U. M. Fayyad et al., eds., AAAI/MIT Press, Menlo Park, Calif., <year>1996</year>, pp. <fpage>1</fpage>-<lpage>34</lpage>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Pazzani, M. (<year>2000</year>). <article-title>Knowledge discovery from data?</article-title> <source>IEEE Intelligent Systems</source> <volume>15</volume>(<issue>2</issue>): <fpage>10</fpage>-<lpage>13</lpage>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Pazzani, M. J., Mani, S., &amp; Shankle, W. R. (<year>2001</year>). <article-title>Acceptance of Rules Generated by Machine Learning among Medical Experts</article-title>. <source>Methods of Information in Medicine</source>; <volume>40</volume>: <fpage>380</fpage>-<lpage>385</lpage>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Pazzani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Shankle</surname>
            ,
            <given-names>W. R.</given-names>
          </string-name>
          (
          <year>1997</year>
          ).
          <article-title>Beyond concise and colorful: learning intelligible rules</article-title>
          .
          <source>Proceedings of the Third International Conference on Knowledge Discovery and Data Mining</source>
          , Newport Beach, CA. AAAI Press,
          <fpage>235</fpage>
          -
          <lpage>238</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Pazzani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Shankle</surname>
            ,
            <given-names>W. R.</given-names>
          </string-name>
          (
          <year>1997</year>
          ).
          <article-title>Comprehensive knowledge-discovery in databases</article-title>
          . In M. G. Shafto &amp; P.
          <string-name>
            <surname>Langley</surname>
          </string-name>
          (Ed.),
          <source>Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society</source>
          , pp.
          <fpage>596</fpage>
          -
          <lpage>601</lpage>
          . Mahwah, NJ:Lawrence Erlbaum.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pazzani</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Bay</surname>
            ,
            <given-names>S. D.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>The Independent Sign Bias: Gaining Insight from Multiple Linear Regression</article-title>
          .
          <source>In Proceedings of the Twenty-First Annual Meeting of the Cognitive Science Society.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Deep learning</article-title>
          .
          <source>Nature</source>
          ,
          <volume>521</volume>
          (
          <issue>7553</issue>
          ),
          <fpage>436</fpage>
          -
          <lpage>444</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. B. Gaines, “<article-title>Transforming Rules and Trees into Comprehensible Knowledge Structures</article-title>,” in <source>Advances in Knowledge Discovery and Data Mining</source>, U. M. Fayyad et al., eds., MIT Press, Cambridge, Mass., <year>1996</year>, pp. <fpage>205</fpage>-<lpage>226</lpage>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Hendricks, L. A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., &amp; Darrell, T. (<year>2016</year>). <article-title>Generating Visual Explanations</article-title>. In: Leibe B., Matas J., Sebe N., Welling M. (eds) <source>Computer Vision - ECCV 2016</source>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>Lecture Notes in Computer Science</source>
          , vol
          <volume>9908</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>10. Premaladha, J., and K. S. Ravichandran. “<article-title>Novel approaches for diagnosing melanoma skin lesions through supervised and deep learning algorithms</article-title>.” <source>Journal of Medical Systems</source> <volume>40</volume>(<issue>4</issue>) (<year>2016</year>): <fpage>96</fpage>.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>11. Norman, G. R., Rosenthal, D., Brooks, L. R., Allen, S. W., &amp; Muzzin, L. J. <article-title>The Development of Expertise in Dermatology</article-title>. <source>Arch Dermatol</source>. <year>1989</year>; <volume>125</volume>(<issue>8</issue>): <fpage>1063</fpage>-<lpage>1068</lpage>.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>12. Krizhevsky, A., Sutskever, I., &amp; Hinton, G. E. (<year>2012</year>). <article-title>ImageNet classification with deep convolutional neural networks</article-title>. <source>In NIPS 2012</source>.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>13. He, K., Zhang, X., Ren, S., &amp; Sun, J. (<year>2015</year>). “<article-title>Deep residual learning for image recognition</article-title>,” <source>arXiv:1512.03385</source>.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>14. Wah, C., Branson, S., Welinder, P., Perona, P., &amp; Belongie, S. (<year>2011</year>). <article-title>The Caltech-UCSD Birds-200-2011 Dataset</article-title>. <source>Technical Report CNS-TR-2011-001</source>, California Institute of Technology.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>