Explaining Contrasting Categories

Michael Pazzani, Amir Feghahati, Christian Shelton, Aaron Seitz
University of California, Riverside, Riverside, CA, United States
pazzani@ucr.edu, sfegh001@ucr.edu, cshelton@cs.ucr.edu, aseitz@ucr.edu

ABSTRACT
This paper describes initial progress on deep learning capable not only of fine-grained categorization tasks, such as whether an image of a bird is a Western Grebe or a Clark's Grebe, but also of explaining such contrasts in ways that make them understandable. Knowledge discovery in databases has been described as the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [1]. In spite of this, much of machine learning has focused on "valid" and "useful," with little attention paid to "understandable" [2-6]. Recent work in deep learning has shown remarkable accuracy on a wide range of tasks [7], but produces models that are more difficult to interpret than most earlier approaches to artificial intelligence and machine learning. Our ultimate goal is to learn to annotate images to explain the difference between contrasting categories, as found in bird guides or medical books.

Author Keywords
Explainable Artificial Intelligence, Machine Learning, Categorization, Deep Learning

ACM Classification Keywords
I.2.6 Artificial Intelligence: Learning (K.3.2).

HISTORY
The first author's research on learning explainable models from data started in the mid-1990s after interacting with doctors on models for medical diagnosis [2-5]. Although some have focused on which representation formalism is more "understandable" (e.g., [8]), the research has focused on how to constrain or bias an algorithm within a particular representation to produce results that are acceptable to human experts [6].

In this paper, we investigate how people explain contrasting categories and develop algorithms to create explanations of the category of objects in images (e.g., [9]). We focus not on explaining why an object belongs to a certain category, but rather why it belongs to that category and not a contrasting category. Figures 1 and 2 show examples of the explanations that people use to explain contrasting categories. Medical diagnosis is an area where explanations are of importance, particularly when treatments are risky or painful. For example, deep learning systems [10] have proven accurate at analyzing images to identify melanoma, but not at explaining the diagnosis in a way that gives patients or doctors confidence in following treatments. Figure 1 is from the web site http://tiphero.com/skin-cancer/; it lists several signs of melanoma, provides examples of what patients and physicians look for, and is a model for what explainable learning should aspire to produce. Dermatology, as well as histology and radiology [11], are examples where visual clues are important to initial differential diagnosis. Figure 2 shows an explanation from a bird web site on how to distinguish two varieties of grebes. We aspire for our deep learning algorithms to create explanations similar to those in these figures.

In the remainder of this paper, we concentrate on bird identification due to the availability of large existing image datasets and the ease of finding amateur bird watchers. We first show that amateur bird watchers attend to distinguishing characteristics, known in the bird watching community as "field marks," such as those in Figure 2. Next, we describe an extension to a deep learning system that automatically identifies field marks. We leave the question of how to describe field marks to future research.

Figure 1. Explanation of how to differentiate moles from melanoma.

Figure 2. Explanation of how to distinguish a Clark's grebe from a Western grebe.
BIRD IDENTIFICATION: A PILOT STUDY
We prepared images of 12 birds divided into 6 sets of contrasting birds (e.g., Spotted Towhee and Eastern Towhee). Images were shown to four experienced bird watchers, who were asked a yes-or-no question about the bird's identity (e.g., "Is this a Spotted Towhee?"). Using an eye tracking system, we recorded the parts of the image that received attention. Figure 3 shows an example of where one subject focused on the wing of an eastern towhee and a spotted towhee, two similar pictures distinguished in part by the spots on the wing. In contrast, to distinguish a Clark's Grebe from a Western Grebe, that subject concentrated on the area around the eye and the bill (Figure 4).

Although suggestive of how bird watchers learn and attend to field marks to distinguish similar species, the data is preliminary and requires more subjects and statistical tests. Experiments are planned to show two contrasting images simultaneously to experienced bird watchers and to track attentional changes in novices as they learn to identify birds.

Figure 3. Eye tracking data shows an experienced bird watcher concentrates on the wing to distinguish a spotted towhee (left) from an eastern towhee (right).

Figure 4. Eye tracking data shows an experienced bird watcher concentrates on the eye to distinguish a Western grebe (top) from a Clark's grebe (bottom).

LEARNING DISCRIMINATIVE REGIONS OF CONTRASTING CATEGORIES
Deep learning for image classification has shown great results [12, 13], surpassing the previous best computer vision systems. Although classification is an interesting and challenging problem, we want to go one step further and augment the deep network to create a contrasting visual explanation. This explanation identifies the regions of an image that discriminate the selected category from the second most likely category. In bird identification, these regions should correspond to the field marks described in bird guides and the areas that bird watchers focus on when identifying bird species.

We demonstrate the approach on a fine-grained bird classification task. We seek to find which regions of a bird image are the most important for distinguishing it from images of the most similar contrasting class of birds. To do so, we train a known deep network [12] on the bounding boxes of a frequently used birds dataset [14]. This network was chosen for its simplicity; our proposed method does not depend on any specific network architecture. As the contrasting visual explanation, we highlight those sections of the input image that were most important to the network for distinguishing two classes: the output class and the next closest class. The next closest class is chosen because it is the class the network has the most difficulty distinguishing from the output class. As a simple extension, the user may ask for an explanation with respect to any other category.

To find these image regions, we backward propagate an output vector consisting of +1 for the most likely class and -1 for the next most likely class. This propagation is similar to the "backprop" algorithm used to train deep networks, but here it is applied for explanation rather than for training network weights. The process identifies the most important pixels, i.e., those for which changes will cause the network to assign the given image to the most similar (i.e., the next most likely) class instead of the correct class. This raw set of pixels is too sporadically distributed to provide a human-consumable explanation. To obtain larger coherent regions, we convolve windows of different sizes over the image and record the maximum change in each window. The windows with the maximum changes are the regions that contribute most to misguiding the network, and they are the most important regions for explaining the features that discriminate the contrasting categories. The process is depicted in Figure 5.
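The procedure can be summarized in code. The following is only a minimal sketch, not the implementation used in this paper: it assumes a trained PyTorch classifier `model` (in eval mode) and a normalized input tensor; the function name `contrastive_regions`, the window sizes, the stride, and the use of max pooling to record each window's largest change are illustrative assumptions.

```python
# Minimal sketch of the contrastive explanation described above.
# Assumptions (not from the paper): PyTorch, a trained classifier
# `model` in eval mode, and `image` of shape (1, 3, H, W).
import torch
import torch.nn.functional as F

def contrastive_regions(model, image, window_sizes=(16, 32, 64), top_k=3):
    """Return a per-pixel saliency map and candidate explanatory windows."""
    image = image.detach().clone().requires_grad_(True)
    logits = model(image)                            # shape (1, num_classes)
    best, runner_up = logits.topk(2, dim=1).indices[0]

    # Backward propagate +1 for the most likely class and -1 for the
    # next most likely class: ordinary backprop, used here for
    # explanation rather than for updating weights.
    signal = torch.zeros_like(logits)
    signal[0, best], signal[0, runner_up] = 1.0, -1.0
    logits.backward(gradient=signal)

    # The magnitude of the input derivative marks the pixels whose
    # change most easily pushes the image toward the runner-up class.
    saliency = image.grad.abs().sum(dim=1, keepdim=True)   # (1, 1, H, W)

    # Raw pixels are too scattered, so slide windows of several sizes
    # over the map and record the maximum change inside each window.
    candidates = []
    for w in window_sizes:
        stride = max(1, w // 2)
        scores = F.max_pool2d(saliency, kernel_size=w, stride=stride)
        flat = scores.flatten()
        for idx in flat.topk(min(top_k, flat.numel())).indices:
            row, col = divmod(idx.item(), scores.shape[-1])
            # Map window-grid coordinates back to pixel coordinates.
            candidates.append((w, row * stride, col * stride, flat[idx].item()))

    # Keep the windows with the largest recorded change.
    candidates.sort(key=lambda c: -c[-1])
    return saliency.detach(), candidates[:top_k]
```

Note that this sketch does not suppress overlapping windows; producing a display like Figure 6 would likely require merging or non-maximum suppression over the returned candidates.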
Figure 5. (top) Standard forward propagation. (bottom) Our backward propagation, starting with a vector difference of the best class from the second-best class, passing through the network to produce the derivative of the input plane with respect to this class difference, followed by convolution and region finding to identify the most important regions of the image. Note that, while not shown, the lower propagation depends on the upper, as the derivatives are evaluated at the points defined by the upper propagation.

As one example, the network correctly identified a test image as a cerulean warbler, with the second most likely classification being a black-throated blue warbler. Figure 6 (left) highlights the regions of the image found to be most important in distinguishing the two classes, including the eye and throat. Figure 6 (right) is an image of a black-throated blue warbler, which shows the difference in throat and neck.

Figure 6. An image of a cerulean warbler (left) with highlighting indicating the regions that distinguish it from a black-throated blue warbler (right).

To illustrate the impact of having a contrasting category, Figure 7 shows the regions that contribute most to producing the correct category without regard to the difference from a contrasting category (i.e., without propagating -1 for the next most similar class). Note that neither the throat nor the eye is highlighted.

Figure 7. Important regions in categorizing a cerulean warbler without the contrasting categories.
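The non-contrastive map of Figure 7 corresponds to dropping the -1 term from the propagated output vector. A sketch under the same assumptions as above (PyTorch, hypothetical function name):

```python
import torch

def single_class_saliency(model, image):
    """Saliency for the predicted class alone (as in Figure 7):
    propagate +1 for the top class, with no -1 for the runner-up."""
    image = image.detach().clone().requires_grad_(True)
    logits = model(image)
    signal = torch.zeros_like(logits)
    signal[0, logits.argmax(dim=1)] = 1.0   # no contrast-class term
    logits.backward(gradient=signal)
    return image.grad.abs().sum(dim=1, keepdim=True)
```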
CONCLUSION
We have begun to explore how machine learning may emulate how humans explain contrasting categories. Preliminary data show the features that experienced bird watchers use to differentiate contrasting categories, and a network architecture learned similar features. In future work, we will explore how to label differentiating features using techniques similar to [9], with the ultimate goal of automating explanations similar to those found in bird guides and medical books.

ACKNOWLEDGEMENTS
This work was developed with funding from the DARPA Explainable AI Program under a contract from NRL. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the DoD or the U.S. Government.

REFERENCES
1. Fayyad, U. M., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery: an overview. In U. M. Fayyad et al. (Eds.), Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, Menlo Park, CA, pp. 1-34.
2. Pazzani, M. (2000). Knowledge discovery from data? IEEE Intelligent Systems, 15(2), 10-13.
3. Pazzani, M. J., Mani, S., & Shankle, W. R. (2001). Acceptance of rules generated by machine learning among medical experts. Methods of Information in Medicine, 40, 380-385.
4. Pazzani, M., Mani, S., & Shankle, W. R. (1997). Beyond concise and colorful: learning intelligible rules. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA. AAAI Press, pp. 235-238.
5. Pazzani, M., Mani, S., & Shankle, W. R. (1997). Comprehensive knowledge-discovery in databases. In M. G. Shafto & P. Langley (Eds.), Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society, pp. 596-601. Mahwah, NJ: Lawrence Erlbaum.
6. Pazzani, M. J., & Bay, S. D. (1999). The independent sign bias: gaining insight from multiple linear regression. In Proceedings of the Twenty-First Annual Meeting of the Cognitive Science Society.
7. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
8. Gaines, B. (1996). Transforming rules and trees into comprehensible knowledge structures. In U. M. Fayyad et al. (Eds.), Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, MA, pp. 205-226.
9. Hendricks, L. A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., & Darrell, T. (2016). Generating visual explanations. In B. Leibe, J. Matas, N. Sebe, & M. Welling (Eds.), Computer Vision – ECCV 2016, Lecture Notes in Computer Science, vol. 9908. Springer, Cham.
10. Premaladha, J., & Ravichandran, K. S. (2016). Novel approaches for diagnosing melanoma skin lesions through supervised and deep learning algorithms. Journal of Medical Systems, 40(4), 96.
11. Norman, G. R., Rosenthal, D., Brooks, L. R., Allen, S. W., & Muzzin, L. J. (1989). The development of expertise in dermatology. Archives of Dermatology, 125(8), 1063-1068.
12. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS 2012.
13. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. arXiv:1512.03385.
14. Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology.