<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>IUI Workshops’19</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring Principled Visualizations for Deep Network Attributions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mukund Sundararajan</string-name>
          <email>mukunds@google.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shawn Xu</string-name>
          <email>jinhuaxu@verily.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ankur Taly</string-name>
          <email>ataly@google.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rory Sayres</string-name>
          <email>sayres@google.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amir Najmi</string-name>
          <email>amir@google.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Google Inc, Mountain View</institution>
          ,
          <addr-line>California</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Verily Life Sciences</institution>
          ,
          <addr-line>South San Francisco, California</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>20</volume>
      <issue>2019</issue>
      <abstract>
        <p>Attributions (cf. [1]) are increasingly used to explain the predictions of deep neural networks for various vision tasks. Attributions assign feature importances to the underlying pixels, but what the human consumes are visualizations of these attributions. We find that naive visualizations may not reflect the attributions faithfully and may sometimes mislead the human decision-maker. We identify four guiding principles for effective visualizations: Graphical Integrity, Layer Separation, Morphological Clarity, and Coverage; the first three requirements are standard in the visualization and computer-vision literatures. We discuss fixes to naive visualizations that satisfy these principles, and evaluate our fixes via a user study. Overall, we hope that this leads to more foolproof visualizations for mission-critical tasks like diagnosis based on medical imaging.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Human-centered computing → Heat maps; Visualization
theory, concepts and paradigms; • Computing
methodologies → Neural networks.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
    </sec>
    <sec id="sec-3">
      <title>Visualization of Attributions</title>
      <p>
        Deep neural networks are now commonly used for computer
vision tasks. Such networks are used to detect objects in images
(cf. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]), and to perform diagnosis from medical imaging data.
One approach to explaining the prediction of a deep network is
to attribute the prediction score back to the base features (pixels).
Several attribution methods have been proposed in the literature
(cf. [
        <xref ref-type="bibr" rid="ref1 ref14 ref15 ref17 ref2 ref20 ref6">1, 2, 6, 14, 15, 17, 20</xref>
        ], where they are also called local
explanation methods or deep network saliency maps). The resulting
attributions are meant to guide model or training data
improvements, or to assist a human decision-maker (such as a doctor using
a deep learning model as an aid in diagnosing diabetic
retinopathy [
        <xref ref-type="bibr" rid="ref12 ref3">3, 12</xref>
        ]).
      </p>
      <p>IUI Workshops’19, March 20, 2019, Los Angeles, USA.
© 2019 for the individual papers by the papers’ authors. Copying permitted for private
and academic purposes. This volume is published and copyrighted by its editors.</p>
      <p>
        Attribution methods fall into two broad categories. Some
methods assign influence proportional to the gradient of the prediction
score with respect to the input image (or modifications of the input
image) (cf. [
        <xref ref-type="bibr" rid="ref14 ref16 ref18">14, 16, 18</xref>
        ]). Other methods propagate or redistribute
the prediction score, layer by layer of the deep network, from the
output of the network back to its input (cf. [
        <xref ref-type="bibr" rid="ref14 ref17 ref2">2, 14, 17</xref>
        ]). In all
cases, the methods eventually assign each pixel a score proportional
to its importance. This score could be positive or negative
depending on the polarity of the influence of the pixel on the score. All the
attribution methods are constructed and justified in principled ways
(cf. the axiomatic approaches underlying [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], or the
discussion about the Layerwise Conservation Principle from [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]). The
justifications and the construction principles differ across methods
because there seems to be no such thing as a universally optimal
explanation, but nevertheless each method is relatively principled.
      </p>
      <p>A key feature of these attribution based explanations is that
they are expressly for human consumption. A common way to
communicate these attributions to a human is by displaying the
attributions themselves as images.1 Figure 1 describes the context
surrounding the visualizations.</p>
    </sec>
    <sec id="sec-4">
      <title>Faithfulness of Visualizations</title>
      <p>
        Because the human consumes visualizations of the attributions and
not the underlying numbers, we should take care that these
visualizations do not distort the attribution in a way that undermines
the justifications underlying the attribution methods. As Edward
Tufte [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] notes:
      </p>
      <p>
        “Visual representation of evidence should be governed by principles
of reasoning about quantitative evidence. For information displays,
design reasoning must correspond to scientific reasoning. Clear and
precise seeing becomes as one with clear and precise thinking.”
      </p>
      <p>
        Footnote 1: There are indeed applications of deep learning that take non-image inputs. For
instance, a sentiment detection model like the one in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] takes a paragraph of text
as input. Consequently, the attributions of the score are to words in the text. In such
cases, the attribution numbers can be communicated directly, one per word, without
visualizations. Therefore the problems associated with vision applications do not arise,
and we ignore such applications.
      </p>
      <p>Figure 1 (a): Context surrounding the Attribution Visualizations.</p>
      <p>The main purpose of this paper is to identify visualization
principles that ensure that relevant properties of the attributions are
truthfully represented by the visualizations. As we shall soon see,
naive visualizations often display only a fraction of the underlying
phenomena (Figure 2 (b)), and either occlude the underlying image
or do not effectively display the correspondence between the image
and the attributions (Figure 2 (e)).</p>
    </sec>
    <sec id="sec-5">
      <title>Effectiveness of Visualizations</title>
      <p>
        As we discuss visualization techniques, it is important to note that
visualizations are used to help model developers debug data and
models, and help experts to make decisions. In either scenario, the
human has some competence in the model’s prediction task. For
instance, doctors are capable of diagnosing conditions like diabetic
retinopathy from fundus images or abnormalities in mammograms. In this case, the model
and the explanation are either used to screen cases for review by
a doctor (cf. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]), or to assist the doctor in diagnosis (cf. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]). The
criterion for success here is the accuracy of the model and the
human combined. To accomplish this, it is helpful to ensure that
the humans comprehend the visualizations.
      </p>
      <p>There are at least three reasons why the human may find it
hard to understand the explanation: First, the model happens to
reason differently from the human. Second, the attribution method
distorts the model’s operation; this is a shortcoming of the
attribution method and not the model. Third, the visualization distorts
the attribution numbers; this is a shortcoming of the visualization
approach.</p>
      <p>While it is indeed possible to construct inherently interpretable
models, or to improve the attribution method, the focus of the paper
is solely on better visualizations. We will take the model behavior
and the attribution method’s behavior as a given and optimize the
visualizations.</p>
      <p>Uncluttered visualizations tend to be easier to comprehend.
Consider the difference between a scatter plot and a bar chart. The
former displays a detailed relationship between two variables, but
it can be relatively cluttered as a large number of points are
displayed. In contrast, the latter is relatively uncluttered and reduces
the cognitive load on the user. But it is possible that the binning
(along the x-axis of the bar chart) may cause artifacts or hide
relevant information. Ideally, we’d like to reduce clutter without hiding
information or causing artifacts.</p>
      <p>
        If the visualization is cluttered the human may start ignoring
the explanation altogether. This phenomenon is known as “disuse”
of automation in the literature on automation bias [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, if
the visualization is over-optimized for human-intelligibility—for
instance, by selectively highlighting features2 that are considered
important by the human—this could cause confirmation bias and
suppress interesting instances where the model and the human
reach different conclusions by reasoning differently. This
phenomenon is known as “misuse” of automation. Both disuse and misuse
ultimately harm the overall accuracy of the decision-making. Our
goal, therefore, is to reduce clutter without causing confirmation
bias.
      </p>
      <p>
        A different aspect of successful comprehension of the
explanations depends on how well the visualization establishes a
correspondence between the two layers of information, namely the raw image
and the depiction of the attributions. As Tufte [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] notes:
“What matters—inevitably, unrelentingly—is the proper
relationship among information layers.”
      </p>
      <p>Naively overlaying the attributions on top of the image may
obscure interesting characteristics of the raw image. On the other
hand, visualizing the two layers separately loses the correspondence
between the explanation (the attributions) and the “explained” (the
raw image). We would like to balance these potentially conflicting
objectives.</p>
    </sec>
    <sec id="sec-6">
      <title>GRAPHICAL INTEGRITY AND COVERAGE</title>
      <p>
        Our first goal is to have the visualizations represent the underlying
attribution numbers as faithfully as possible. The goal is to ensure
that the visualizations satisfy the standard principle of Graphical
Integrity, i.e., the visualization reflects the underlying data, or
equivalently that the “representation of numbers should match the
true proportions” (cf. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]).
      </p>
      <p>
        Intuitively, if a feature has twice the attribution of another,
then it should appear twice as bright. Though this is a fine
aspiration, it will be rather hard to achieve precisely. This is because
human perception of brightness is known to be non-linear [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and
it is also possible that there are spatial effects that affect human
perception.
Footnote 2: Here, by feature we simply mean groups of logically related pixels, and not any formal
notion of feature from either the computer vision or machine learning literature.
      </p>
      <p>A corollary to Graphical Integrity is that features with positive
or negative attributions are called out differently. This is easy to
achieve by just using different colors, say green and red, to display
positive and negative attributions. However, one can also naively
translate feature “importance” in terms of high attribution
magnitude, ignoring the sign of the attribution. This can be dangerous.</p>
      <p>
        As [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] discusses, the explanations can then appear to lack
sensitivity to the network, i.e., the attributions from two very different
networks can appear visually similar.
      </p>
      <p>The most obvious way to achieve Graphical Integrity, assuming a
calibrated display, is to linearly transform attributions to the range
[0, 255]; the maximum attribution magnitude is assigned a value of
255 in an 8-bit RGB scheme.</p>
      <p>Assume that this transformation is done separately for positive
and negative attributions: positive attributions are shown in green,
negative attributions in red.</p>
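      <p>As an illustration, the separate positive/negative scaling described above can be sketched as follows (a minimal NumPy reconstruction; the function name and exact scaling are our assumptions, not the paper’s implementation):</p>

```python
import numpy as np

def to_rgb_overlay(attr):
    """Map signed per-pixel attributions to an 8-bit green/red image.

    Positive and negative attributions are scaled separately by their
    own maxima, so each sign uses the full [0, 255] range.
    """
    pos = np.maximum(attr, 0.0)
    neg = np.maximum(-attr, 0.0)
    rgb = np.zeros(attr.shape + (3,), dtype=np.uint8)
    if pos.max() != 0:
        # green channel carries positive attributions
        rgb[..., 1] = np.round(255.0 * pos / pos.max()).astype(np.uint8)
    if neg.max() != 0:
        # red channel carries negative attributions
        rgb[..., 0] = np.round(255.0 * neg / neg.max()).astype(np.uint8)
    return rgb
```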
      <p>
        As a concrete example, consider the image in Figure 2 (a), and
an object recognition network built using the GoogleNet
architecture [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and trained over the ImageNet object recognition
dataset [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The top predicted label for this image is “fireboat”;
we compute attribution for this label3 using the Integrated
Gradients method [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and a black image baseline. And we visualize only
the pixels with positive attributions.
      </p>
      <p>The resulting “naive” visualization (Figure 2 (b)) highlights some
features that match our semantic understanding of the label:
Primarily, attribution highlights a set of water jets on the left side of
the image. Note, however, that the highlighted regions do not cover
the entire area of this semantic feature: there are also water jets
to the image right that are not highlighted, nor is the structure of
the boat itself. Indeed, only a small fraction of the attributions is
visible.</p>
      <p>
        Such visualizations display only a small fraction of the total
attribution. This is because the attributions span a large range of
values, and are long-tailed (see Figure 3 (a)). We also show that this
is not only true for Integrated Gradients, but also true for two other
attribution techniques: Gradient x Image [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and Guided
Backpropagation [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] – showing that techniques from both categories
(Section 1.1) of attribution exhibit this property. The large range
implies that only the top few pixels4 by attribution magnitude are
highlighted (compare Figure 2 (b) against Figure 2 (c)). The long tail
implies that the invisible pixels hold the bulk of the attribution. We
found that in order to see 80% of the attribution magnitude, around
150,000 pixels would need to be visible (out of roughly 1 million pixels for a
1024x1024 image) across the three techniques (see Figure 3 (d)). In
contrast, the naive visualization shows only around the top 500
pixels across the three techniques. It is possible that this long tail
could be fixed by changes to either the deep network training
process or the attribution method, but this is outside the scope of this paper.
      </p>
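      <p>The coverage analysis above can be reproduced with a short computation (our reconstruction; the function name is illustrative):</p>

```python
import numpy as np

def pixels_for_coverage(attr, fraction):
    """Number of top pixels (by magnitude) needed to account for
    the given fraction of the total attribution magnitude."""
    mags = np.sort(np.abs(attr).ravel())[::-1]  # magnitudes, descending
    cum = np.cumsum(mags)
    target = fraction * cum[-1]
    # first index whose cumulative sum reaches the target; convert to a count
    return int(np.searchsorted(cum, target)) + 1
```

For a long-tailed distribution this count stays large even at moderate fractions, which is the phenomenon shown in Figure 3.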
      <p>Let us revisit this phenomenon in our example. Suppose that
we take only the pixels with positive attributions. Consider the
pixel with the kth max attribution value such that the top k pixels
account for 75% of the attribution magnitude. Notice that we would
like this pixel to be visible so as to cover the underlying phenomena.</p>
      <p>Footnote 3: Specifically, we analyze the softmax value for this label. The softmax score is a function
of the logit scores of all labels; explaining the softmax score implicitly explains the
discriminatory process of the network, i.e., what made the model predict this label
and not any other label.</p>
      <p>Footnote 4: Ideally we would like to report coverage numbers at the level of human-intelligible
features. For analytical tractability, we will use pixels as proxies for features.</p>
      <p>Now suppose that we take the ratio of the max attribution value
to the kth max attribution value. This ratio is approximately 25. This
would imply that if the max attribution value has a value close to
255 (in 8-bit RGB), the kth max pixel would have a value close to
10 and appear indistinguishable from black, and hence would be
invisible.</p>
      <p>This brings us to our second requirement, Coverage, which
requires that a large fraction of important features are visible. This
is a concept introduced by us, justified by the data analysis above;
our other three requirements are standard either in the visualization
or the computer vision literature.</p>
      <p>To fix Coverage, we must reduce the range of the attributions.
There are several ways to do this, but arguably the simplest is to clip
the attributions at the top end. This sacrifices Graphical Integrity
at the top end of the range because we can no longer distinguish
pixels or features by attribution value beyond the clipping threshold.
But it improves Coverage. We can now see a larger fraction of the
attributions (see Figure 2 (d)).</p>
      <p>Back to our example. Suppose that
we clip the attributions at the 99th percentile. The ratio of the
maximum attribution value to that of the kth max pixel (such
that the top k pixels account for 75% of the attribution magnitude)
falls from 25 to 5. This would imply that if the max attribution value
has a value close to 255, the kth max pixel would have a value
close to 50, and this is distinguishable from black, and hence visible.</p>
      <p>Notice that we have achieved a high degree of coverage (we can
see 75% of the attributions) by sacrificing Graphical Integrity for a
small fraction of the pixels (in this case, the top 1%).</p>
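      <p>Concretely, the clipping fix can be sketched as follows (a minimal sketch; the percentile parameter and function name are our assumptions):</p>

```python
import numpy as np

def clip_attributions(attr, pct=99.0):
    """Clip attribution magnitudes at the given percentile, preserving sign.

    This sacrifices Graphical Integrity above the threshold in exchange
    for better Coverage once the attributions are rescaled for display.
    """
    mags = np.abs(attr)
    hi = np.percentile(mags[np.nonzero(mags)], pct)  # clipping threshold
    return np.sign(attr) * np.minimum(mags, hi)
```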
      <p>So far we have visualized pixels with positive attributions.
Certain pixels may receive a negative attribution. Strictly speaking, the
Graphical Integrity and coverage requirements apply to visualizing
such negative attributions too. However, empirically, we notice two
different ways in which pixels with negative attribution arise. First,
they coincide with pixels that have positive attribution as part of
the same feature; this likely happens because the deep network
performs edge detection and the positives and negatives occur on
either side of the edge. Second, occasionally, we see entire features
with negative attributions. In this case, the feature is a negative
“vote" towards the predicted class. It is worth showing the second
type of negative attribution, though possibly not the first type
because they are redundant and simply increase cognitive load. For
instance, in the case of the fireboat, pixels with negative attribution
always co-occur with pixels with positive attributions; see Figure 2
(e).</p>
      <p>In general, it is hard to know a priori that the negative
attributions are simply redundant. We recommend initially showing experts
both positive and negative attributions, and suppressing
the negative attributions if they are mostly redundant.</p>
      <p>
        As another example, consider the scenario of explaining the
diabetic retinopathy classification of a retinal image. This is
arguably a more mission critical task than object recognition because
it is health related. For the model, we use the deep learning model
trained to detect diabetic retinopathy from [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and again we use the
Integrated Gradients method [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] for computing the attributions.
      </p>
      <p>Figure 4 shows a comparison between the naive and the clipped
visualizations of the attribution.</p>
      <p>Figure 2: (a) Our running example: a fireboat.
(b) A naive visualization of the attributions.
(c) The top 500 pixels by attribution magnitude.
(d) Improved Coverage from clipping.
(e) Visualizing both positive (green) and negative (red) attributions.</p>
      <p>Figure 3: (a) Distribution of attribution magnitude over pixels.
(b) Number of pixels to cover the top 20% of total attribution.
(c) Number of pixels to cover the top 50% of total attribution.
(d) Number of pixels to cover the top 80% of total attribution.</p>
      <p>This example fundus image was determined by a panel of retina
specialists to have severe non-proliferative diabetic retinopathy
(NPDR), a form of the condition with increased risk of vision loss
over time. This determination was made by the specialists based
on image features indicating specific types of pathology: These
include a range of microaneurysms and hemorrhages (dark regions)
throughout the image, as well as potential intra-retinal
microvascular anomalies (IRMA) that are diagnostic of more severe forms of
the disease.</p>
      <p>A deep-learning network accurately predicted Severe NPDR in
this image; we examined the corresponding attribution map for this
prediction. The naive visualization highlights some of this
pathology, but misses much of it. Many microaneurysms and hemorrhages
are omitted, particularly in the lower-right regions of the image
(Figure 4 (a), arrows). The IRMA feature, important for accurate
diagnosis, is somewhat highlighted, but of relatively low salience
(Figure 4 (a), dotted circle). It is possible for clinicians to miss this
signal.</p>
      <p>By contrast, a clipped version of the image (which reduces the
range of attributions by clipping attribution scores above a certain
threshold; Figure 4 (b)) highlights these clinically-relevant features.</p>
      <p>We asked two ophthalmologists to evaluate this image, along with
both visualizations, as part of our evaluation experiment (described in
detail below). In this instance, both ophthalmologists indicated that
they preferred the clipped version, citing the fact that the naive
visualization either missed or inadequately highlighted the most
important features for diagnosis.</p>
    </sec>
    <sec id="sec-7">
      <title>MORPHOLOGICAL CLARITY AND LAYER</title>
    </sec>
    <sec id="sec-8">
      <title>SEPARATION</title>
      <p>As we discussed in the Introduction (cf. Figure 1), attribution
visualizations are ultimately for human consumption; a human interprets
the visualization and takes some action.</p>
      <p>Our next requirement is that the images satisfy Morphological
Clarity, i.e., the features have clear form and the visualization is not
“noisy”.</p>
      <p>Notice that the model may behave in a way that does not naturally
result in Morphological Clarity; it could, for instance, rely on a texture that
is “noise” like. In this sense, optimizing for Morphological Clarity
could come at the cost of faithfully representing the model’s behavior
and the attributions. Nevertheless, it is likely that visualizations that
satisfy Morphological Clarity are more effective in an assistive context, as we
discussed in Section 1.3, and reduce cognitive load on the human.
To best account for this trade-off, consider applying the operations
below that improve Morphological Clarity as a separate knob.</p>
      <p>Figure 4: (a) A naive visualization of the attributions.
(b) Clipped version of (a).</p>
      <p>
        To improve Morphological Clarity, we shall apply two standard
morphological transformations. (a) The first fills in small holes in
the attributions (called Closing) and (b) the second removes small,
stray, noisy features from the visualization (called Opening). These
are standard operations in image processing (cf. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]). See Figure 5
for the result, where we applied the morphological transformations
to both Integrated Gradients and Guided Backpropagation
attributions. Notice that the morphological operations reduce visual
clutter and therefore improve Morphological Clarity.
      </p>
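      <p>The Closing and Opening operations just described are available in standard image-processing libraries; here is a sketch using SciPy’s grayscale morphology (the kernel size is our assumption and acts as the separate knob mentioned above):</p>

```python
import numpy as np
from scipy import ndimage

def morphological_cleanup(attr, size=5):
    """Closing fills small holes; Opening then removes small stray features."""
    closed = ndimage.grey_closing(attr, size=(size, size))   # fill small holes
    opened = ndimage.grey_opening(closed, size=(size, size))  # drop stray specks
    return opened
```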
      <p>
        We now introduce our final requirement. Notice that our
visualizations establish a correspondence between the attributions and
the raw image, so that the attributions can highlight important
features. It is important that the correspondence is established without
occluding the raw image, because the human often wants to inspect
the underlying image directly either to verify what the attributions
show, or to form a fresh opinion. This is called Layer Separation
(see Chapter 3 [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]), i.e., to ensure that both information layers, the
attributions and the image, are separately visible.
      </p>
      <p>Notice that we have been overlaying the attributions on top of a
grayscale version of the original image. This ensures that the
colors of the attribution do not collide with those of the image and
that we can associate the attributions with the underlying image.
Unfortunately, we do not have a clear view of the raw image. If we
had an interactive visualization, we could toggle Figures (e) and (a)
on a click.</p>
      <p>But a different alternative is to produce outlines of the important
regions. This is also possible via standard morphological operations.
The idea is to first threshold the attributions values into a binary
mask, where all nonzero attribution values are mapped to 1, and
the rest are zero. Then, we subdivide the mask into clusters by
computing the connected components of the mask. Here, we can
rank the components by the sum of attribution weight inside each
component, and keep the top N components. Finally, we can get
just the borders around each kept component by subtracting the
opening of the component from the component. See Figure 6. The
result is that the underlying raw image is visible, but we can also
tell which parts the attribution calls out as important.</p>
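      <p>The outline construction above can be sketched with standard tools (our SciPy-based reconstruction; the binarization rule and the choice of N are assumptions):</p>

```python
import numpy as np
from scipy import ndimage

def attribution_outlines(attr, top_n=3):
    """Outline the top-N connected components of the attribution mask."""
    mask = (np.abs(attr) != 0)                        # binarize attributions
    labels, count = ndimage.label(mask)               # connected components
    # rank components by total attribution magnitude inside each
    sums = ndimage.sum(np.abs(attr), labels, index=np.arange(1, count + 1))
    keep = np.argsort(sums)[::-1][:top_n] + 1         # labels of top components
    kept = np.isin(labels, keep)
    # border via component minus its erosion, a standard boundary extraction
    interior = ndimage.binary_erosion(kept)
    return np.logical_and(kept, np.logical_not(interior))
```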
    </sec>
    <sec id="sec-9">
      <title>LABELING</title>
      <p>An important aspect of the visualization is labeling. The labeling
helps ensure that the decision-maker understands exactly what
prediction is being explained. Deep networks that perform
classification usually output scores/predictions for every class under
consideration. The attributions usually explain one of these scores.
It is therefore important to label visualizations with the class name
and the prediction score. Additionally, it is worth clarifying whether
a visualization distinguishes positive from negative attributions;
and if so, indicate the sign of these attributions. See Figure 6 (d) as
an example.</p>
    </sec>
    <sec id="sec-10">
      <title>EVALUATION</title>
      <p>In order to evaluate the impact of the visualization principles
discussed here on an actual decision-making task, we ran a pair of
side-by-side evaluation experiments. For each experiment, 5
board-certified ophthalmologists assessed retinal fundus images for
diabetic retinopathy (as in Figure 4), and then compared two
visualization methods (such as Figure 4 (a) vs. Figure 4 (b)) in terms of
how well they supported their evaluations5. In these comparisons,
the same underlying attribution map was used; the only thing that
varied was the visualization parameters.</p>
      <p>Footnote 5: Since visualizations are inherently for human consumption, and since we have no
source of ground truth, we decided on evaluating doctor preference as our ground truth.
We selected a task for which doctors would be readily able to identify all
clinically-relevant pathology relevant to diagnosis, reducing the chance of confirmation bias.</p>
      <p>Figure 5: (a) Baboon, using Integrated Gradients attributions.
(b) Baboon with morphological operations applied.
(c) African Hunting Dog, using Integrated Gradients attributions.
(d) African Hunting Dog with morphological operations applied.</p>
      <p>
        For the first comparison, we compared naive Integrated
Gradients visualizations to a clipped version (where values above the
95th percentile are clipped to the 95th percentile value, and values
below the 30th percentile are thresholded (set to zero)) (N=87
images). As we described in Section 2, the clipped version sacrifices
some Graphical Integrity for better Coverage. This allowed us to
measure whether experts found this trade-off effective. For the
second comparison, we compared the clipped version to an outline
version, made using the methods we describe in Section 3 (N=51
images). This allowed us to measure the effect of the morphological
operations to improve clarity. Due to doctor availability and time
constraints, we did not compare the naive version directly with the
outline version. Ideally, we should measure the prediction accuracy of doctors aided with visualizations as
in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], but that is outside the scope of this paper.
      </p>
      <p>For both experiments, each image and its resulting visualizations
were independently assessed by 2 out of the 5 doctors participating
in the experiment. For each image, doctors performed the following
steps: (1) They viewed the original fundus image and were prompted
“What is your assessment for diabetic retinopathy for this image?”.</p>
      <p>
        They could select from one of the 5 categories from the International
Clinical Diabetic Retinopathy scale ( [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), plus a 6th option if the
image was determined to be ungradeable by the doctor. (2) Doctors
viewed an additional image, with the two visualizations being
compared shown side-by-side. The assignment of visualization method
to left/right side was randomized across trials. Doctors answered
the following questions: “Which visualization better supports your
evaluation of DR in this image?”; “(Optional) What contributed to
your decision?”; “If you marked that you changed your diagnosis,
what did you change it to?”. For the second question, doctors could
select 0 or more options from a range of reasons, including “The
[left/right] visualization highlighted irrelevant features” and “The
[left/right] visualization missed important features”. We also
included an option to indicate that the doctor changed their diagnosis
after seeing the visualizations; and an “Other” option for qualitative
open-text feedback on the value of the visualizations for the given
image.
      </p>
      <p>Figure 6: (a) Ringneck Snake. (b) Stopwatch. (c) Drilling Platform.
(d) Indigo Bunting, with classification label, prediction score, and visualization
parameters.</p>
      <p>The results indicate that changes in the Coverage and Morphological
Clarity of a visualization can strongly affect its perceived value to
consumers of the visualization (Figure 7). Overall, doctors tended
to prefer a naive diabetic retinopathy visualization over our clipped
version, at an approximately 2:1 rate (Figure 7 (a)). However, the
two versions differed substantially in how they traded off missing
features against highlighting irrelevant features (Figure 7 (b)). The
naive visualization tended to miss important features more often
(lacking sensitivity), while our clipped version tended to highlight
irrelevant features (lacking specificity). The instances in which the
naive visualization missed a feature account for most of the roughly
one third of trials in which the clipped version was preferred;
however, once a visualization covered the essential features, further
increases in Coverage were not viewed as helpful, since the additional
features revealed seemed less relevant. These responses indicate that
the clipping parameters used in this experiment were likely too
aggressive at increasing Coverage. In short, each visualization tended
towards a different type of error, and tuning Coverage (or allowing
consumers of a visualization to tune it, e.g. via a slider) can
optimize this tradeoff.</p>
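      <p>The Coverage tuning described above can be sketched as follows. This is an illustrative reimplementation, not the code used in the experiments; the percentile threshold is a hypothetical knob playing the role of the slider suggested above.</p>

```python
import numpy as np

def clip_attributions(attr, percentile=99):
    """Clip heavy-tailed attribution magnitudes at a percentile.

    Lowering `percentile` spreads visible mass over more pixels,
    increasing Coverage; raising it concentrates the display on
    only the strongest attributions.
    """
    mag = np.abs(attr)
    cap = np.percentile(mag, percentile)
    # Cap outliers, then normalize to [0, 1] for display.
    return np.clip(mag, 0.0, cap) / cap

# Example: a synthetic heavy-tailed attribution map (Cauchy noise).
rng = np.random.default_rng(0)
attr = rng.standard_cauchy(size=(64, 64))
vis = clip_attributions(attr, percentile=95)
```

A lower percentile here corresponds to the more aggressive clipping discussed in the text: more pixels reach the display cap, so Coverage rises at the cost of specificity.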
      <p>We also found that increasing Morphological Clarity, via
outlining, was preferred (Figure 7 (c)). Doctors tended to prefer
outlines to the clipped version, again at an approximately 2:1 rate.
This preference held despite the outline version’s higher tendency
to miss features compared to the clipped version (Figure 7 (d)). The
relatively higher rate of sensitivity misses for outlines is likely
due to the morphology step used here, which may remove some smaller
features before the outlines are computed. As with the Coverage
experiment, sensitivity losses (the visualization missing important
features) account for most of the trials in which outlines were not
preferred (being cited in 31/36, or 86%, of such trials); in cases
where the outlines did not miss features, they were strongly
preferred for the increased clarity of the display.</p>
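      <p>The outlining step can be approximated with standard morphological operations. The sketch below uses scipy.ndimage and assumes the morphology step is a binary opening followed by boundary extraction; the actual operations and structuring-element sizes used in the experiment are not specified here, so treat this as illustrative.</p>

```python
import numpy as np
from scipy import ndimage

def outline_mask(mask, open_size=3):
    """Compute the outline of a binary attribution mask.

    A morphological opening removes small speckles first (which, as
    noted in the text, can drop small features and cost sensitivity);
    the outline is then the cleaned mask minus its erosion, i.e. a
    thin boundary around each surviving region.
    """
    structure = np.ones((open_size, open_size), dtype=bool)
    cleaned = ndimage.binary_opening(mask, structure=structure)
    eroded = ndimage.binary_erosion(cleaned)
    return cleaned & ~eroded

mask = np.zeros((32, 32), dtype=bool)
mask[8:20, 8:20] = True   # a large feature: survives the opening
mask[25, 25] = True       # an isolated pixel: removed by the opening
edges = outline_mask(mask)
```

The removal of the isolated pixel in this example is exactly the kind of sensitivity loss observed for the outline version in the experiment.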
      <p>These experiments demonstrate the importance of clipping and
Morphological Clarity to the potential utility of explanation
visualizations in a clinical application. Adjusting parameters, such
as clipping and outlining methodology, which affect a
visualization’s Coverage and Morphological Clarity, can play a strong role in
experts’ preference for the visualization, even though the
underlying attribution data are the same.</p>
      <p>Figure 7: (a) Preference for two visualizations with different Coverage levels on perceived effectiveness in explaining diabetic retinopathy predictions. Error bars indicate 95% binomial confidence intervals; each doctor’s assessment of a visualization is treated as an independent observation. (b) Tradeoffs between sensitivity and specificity of visualizations with different Coverage. Error bars reflect 95% binomial confidence intervals on the rate at which doctors evaluating the visualizations responded that a given visualization method highlighted irrelevant features (X axis) or missed important features (Y axis). (c) Preference for two visualizations with different coherence levels on perceived effectiveness in explaining diabetic retinopathy predictions. Conventions as in panel (a). (d) Tradeoffs between sensitivity and specificity of visualizations with different coherence. Conventions as in panel (b).</p>
      <p>These parameters also affect whether the
visualization highlights clinically relevant image features. These
results further illustrate that concepts such as sensitivity and
specificity are applicable to the perception of visualizations. These
tradeoffs may explain preference for a particular visualization type.
They may also indicate the extent to which a human considers an
explanation to diverge from their expected image features. (This
latter signal may merit further exploration: when an attribution
highlights a feature that an expert considers irrelevant, this may
indicate a model deficiency, but it may also indicate a feature that
the model has learned is diagnostic but which diverges from the
features humans have learned to use.) Understanding the right balance
of clipping and Morphological Clarity will be an important step in
validating that attributions are useful for assisting people, and it
will likely depend strongly on the domain of the classification task.</p>
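      <p>For reference, 95% binomial confidence intervals of the kind reported in Figure 7 can be computed as in this sketch. It uses the normal approximation; the exact interval method behind the figure is not stated, so this choice is an assumption.</p>

```python
import math

def binomial_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a binomial
    proportion, clamped to [0, 1]. z = 1.96 gives a 95% interval."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

# Example with the reported outline counts: sensitivity misses were
# cited in 31 of the 36 trials where outlines were not preferred.
lo, hi = binomial_ci(31, 36)  # roughly (0.75, 0.97)
```

Treating each doctor's assessment as an independent observation, as in the figure, each preference rate maps directly onto one such interval.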
      <p>This may be an area where providing user control over aspects of a
visualization can help make it more effective.</p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSION</title>
      <p>We have evaluated our visualizations in the context of assisting
doctors (see Section 5). In the introduction, we also mentioned a use
case to help developers debug their models. In Figure 4 (b), notice
that the model highlights the notch. The notch is not a pathology of
diabetic retinopathy. Perhaps certain camera makes had a
predominance of certain types of DR cases, and the model picked up
the notch as a predictive feature. This is obviously undesirable, and
it highlights the ability of the visualizations to identify data and
model issues.</p>
      <p>There is a large literature on explaining deep-network
predictions. These papers discuss principled approaches that identify the
influence of base features of the input (or of neurons in the hidden
layers) on the output prediction. However, they do not discuss the
visualization approach that is used to present this analysis to the
human decision-maker. The visualizations have a large influence on
the effectiveness of the explanations. As we discuss in Section 5,
modifying the visualization (by clipping it) changes the types of
model errors (sensitivity or specificity) detected by the human. As
we discuss in Section 2, this difference is driven by the large range
and heavy-tailedness of the attribution scores. Visualization is the
language in which the explanations are presented and so it is
important to treat it as a first-class citizen in the process of explanation,
to be transparent about the visualization choices that are made, and
to give the end user control over the visualization knobs.</p>
      <p>Furthermore, we notice the central role of the human expert in
the use of attribution maps within diabetic retinopathy diagnosis.
The human implicitly interprets important logical/high-level
features (e.g. a hemorrhage) from the pixel importances. In an assistive
context, what matters is the prediction accuracy of the human and
the model combined. We must rely on the human to be a domain
expert, to be calibrated to perform the model’s (visual) prediction
task, and to be calibrated to assess the visual explanations. We can
tune the visualizations to aid the human in the process of
interpretation. As discussed in Section 3, we can make the features in
the visualizations more visually coherent and less noisy. Of course,
we could go too far down this path and optimize for agreement
between the human and the model; this would be dangerous as it
would merely encourage confirmation bias.</p>
      <p>Efective visualization will allow human experts to identify
unattended or unexpected visual features, and relate their own
understanding of the prediction task to that of the model’s performance.
This will increase the combined accuracy of the model and the
human.</p>
      <p>Finally, we expect our visualization techniques to be applicable
to almost all computer vision tasks. Real-world and medical images
span a vast variety of tasks, and we have demonstrated that our
methods apply to both.</p>
      <p>All our code is available at this link:
sualizationLibrary</p>
    </sec>
    <sec id="sec-12">
      <title>ACKNOWLEDGMENTS</title>
      <p>We thank Naama Hammel, Rajeev Ramchandran, Michael Shumski,
Jesse Smith, Ali Zaidi, Dale Webster, who provided helpful feedback
and insights for this document.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>David</given-names>
            <surname>Baehrens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Timon</given-names>
            <surname>Schroeter</surname>
          </string-name>
          , Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and
          <string-name>
            <surname>Klaus-Robert Müller</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>How to Explain Individual Classification Decisions</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          (
          <year>2010</year>
          ),
          <fpage>1803</fpage>
          -
          <lpage>1831</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Binder</surname>
          </string-name>
          , Grégoire Montavon,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <surname>Klaus-Robert Müller</surname>
            , and
            <given-names>Wojciech</given-names>
          </string-name>
          <string-name>
            <surname>Samek</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Layer-wise Relevance Propagation for Neural Networks with Local Renormalization Layers</article-title>
          .
          <source>CoRR</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V</given-names>
            <surname>Gulshan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name><given-names>M</given-names> <surname>Coram</surname></string-name>
          , et al.
          <year>2016</year>
          .
          <article-title>Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs</article-title>
          .
          <source>JAMA 316</source>
          ,
          <issue>22</issue>
          (
          <year>2016</year>
          ),
          <fpage>2402</fpage>
          -
          <lpage>2410</lpage>
          . https://doi.org/10.1001/jama.2016.17216
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S</given-names>
            <surname>Haneda</surname>
          </string-name>
          and
          <string-name>
            <given-names>H</given-names>
            <surname>Yamashita</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>International clinical diabetic retinopathy disease severity scale. Nihon rinsho</article-title>
          .
          <source>Japanese journal of clinical medicine 68</source>
          (
          <year>2010</year>
          ),
          <fpage>228</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Yoon</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Convolutional Neural Networks for Sentence Classification</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29</source>
          ,
          <year>2014</year>
          , Doha,
          <string-name>
            <surname>Qatar,</surname>
          </string-name>
          <article-title>A meeting of SIGDAT, a Special Interest Group of the ACL</article-title>
          ,
          <string-name>
            <surname>Alessandro</surname>
            <given-names>Moschitti</given-names>
          </string-name>
          , Bo Pang, and Walter Daelemans (Eds.).
          <source>ACL</source>
          ,
          <fpage>1746</fpage>
          -
          <lpage>1751</lpage>
          . http://aclweb.org/anthology/D/D14/D14-1181.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Scott</surname>
            <given-names>M Lundberg</given-names>
          </string-name>
          and
          <string-name>
            <given-names>Su-In</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A Unified Approach to Interpreting Model Predictions</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          30, I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , and R. Garnett (Eds.). Curran Associates, Inc.,
          <fpage>4768</fpage>
          -
          <lpage>4777</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Raja</given-names>
            <surname>Parasuraman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Victor</given-names>
            <surname>Riley</surname>
          </string-name>
          .
          <year>1997</year>
          . Humans and Automation: Use, Misuse, Disuse, Abuse.
          <source>Human Factors</source>
          <volume>39</volume>
          ,
          <issue>2</issue>
          (
          <year>1997</year>
          ),
          <fpage>230</fpage>
          -
          <lpage>253</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Doina</given-names>
            <surname>Precup</surname>
          </string-name>
          and Yee Whye Teh (Eds.).
          <year>2017</year>
          .
          <source>Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017</source>
          .
          <source>Proceedings of Machine Learning Research</source>
          , Vol.
          <volume>70</volume>
          . PMLR. http://jmlr.org/proceedings/papers/v70/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Olga</given-names>
            <surname>Russakovsky</surname>
          </string-name>
          , Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein,
          <string-name>
            <surname>Alexander C. Berg</surname>
          </string-name>
          , and
          <string-name>
            <surname>Li</surname>
          </string-name>
          Fei-Fei.
          <year>2015</year>
          .
          <article-title>ImageNet Large Scale Visual Recognition Challenge</article-title>
          .
          <source>International Journal of Computer Vision</source>
          (IJCV) (
          <year>2015</year>
          ),
          <fpage>211</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S S</given-names>
            <surname>Stevens</surname>
          </string-name>
          .
          <year>1957</year>
          .
          <article-title>On The Psychophysical Law</article-title>
          .
          <source>Psychological review 64 (06</source>
          <year>1957</year>
          ),
          <fpage>153</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Wojciech</surname>
            <given-names>Samek</given-names>
          </string-name>
          , Alexander Binder, Grégoire Montavon,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Bach</surname>
          </string-name>
          , and
          <string-name>
            <surname>Klaus-Robert Müller</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Evaluating the visualization of what a Deep Neural Network has learned</article-title>
          .
          <source>CoRR</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Rory</surname>
            <given-names>Sayres</given-names>
          </string-name>
          , Ankur Taly, Ehsan Rahimy, Katy Blumer, David Coz,
          <string-name>
            <given-names>Naama</given-names>
            <surname>Hammel</surname>
          </string-name>
          , Jonathan Krause, Arunachalam Narayanaswamy, Zahra Rastegar, Derek Wu,
          <string-name>
            <given-names>Shawn</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <surname>Scott Barb</surname>
          </string-name>
          , Anthony Joseph, Michael Shumski,
          <string-name>
            <given-names>Jesse</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Arjun B.</given-names>
            <surname>Sood</surname>
          </string-name>
          , Greg S. Corrado, Lily Peng, and
          <string-name>
            <surname>Dale</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Webster</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Using a Deep Learning Algorithm and Integrated Gradients Explanation to Assist Grading for Diabetic Retinopathy</article-title>
          .
          <source>Ophthalmology</source>
          (
          <year>2018</year>
          ). https://doi.org/10.1016/j.ophtha.2018.11.016
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Jean</given-names>
            <surname>Serra</surname>
          </string-name>
          .
          <year>1983</year>
          .
          <source>Image Analysis and Mathematical Morphology</source>
          . Academic Press, Inc., Orlando, FL, USA.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Avanti</surname>
            <given-names>Shrikumar</given-names>
          </string-name>
          , Peyton Greenside, and
          <string-name>
            <given-names>Anshul</given-names>
            <surname>Kundaje</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Learning Important Features Through Propagating Activation Differences</article-title>
          ,
          <source>See [ 8]</source>
          ,
          <fpage>3145</fpage>
          -
          <lpage>3153</lpage>
          . http://proceedings.mlr.press/v70/shrikumar17a.html
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Karen</surname>
            <given-names>Simonyan</given-names>
          </string-name>
          , Andrea Vedaldi, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps</article-title>
          .
          <source>CoRR</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Smilkov</surname>
          </string-name>
          , Nikhil Thorat, Been Kim, Fernanda B.
          <string-name>
            <surname>Viégas</surname>
            , and
            <given-names>Martin</given-names>
          </string-name>
          <string-name>
            <surname>Wattenberg</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>SmoothGrad: removing noise by adding noise</article-title>
          .
          <source>CoRR abs/1706.03825</source>
          (
          <year>2017</year>
          ). arXiv:1706.03825 http://arxiv.org/abs/1706.03825
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Jost</given-names>
            <surname>Tobias</surname>
          </string-name>
          <string-name>
            <given-names>Springenberg</given-names>
            , Alexey Dosovitskiy, Thomas Brox, and
            <surname>Martin</surname>
          </string-name>
          <string-name>
            <given-names>A.</given-names>
            <surname>Riedmiller</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Striving for Simplicity: The All Convolutional Net</article-title>
          .
          <source>CoRR</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Yi</given-names>
            <surname>Sun</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mukund</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Axiomatic attribution for multilinear functions</article-title>
          .
          <source>In 12th ACM Conference on Electronic Commerce (EC)</source>
          .
          <fpage>177</fpage>
          -
          <lpage>178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Mukund</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ankur</given-names>
            <surname>Taly</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A Note about: Local Explanation Methods for Deep Neural Networks lack Sensitivity to Parameter Values</article-title>
          . https://arxiv.org/abs/1806.04205
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Mukund</surname>
            <given-names>Sundararajan</given-names>
          </string-name>
          , Ankur Taly, and
          <string-name>
            <given-names>Qiqi</given-names>
            <surname>Yan</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Axiomatic Attribution for Deep Networks</article-title>
          ,
          <source>See [8]</source>
          ,
          <fpage>3319</fpage>
          -
          <lpage>3328</lpage>
          . http://proceedings.mlr.press/v70/sundararajan17a.html
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Christian</surname>
            <given-names>Szegedy</given-names>
          </string-name>
          , Wei Liu, Yangqing Jia,
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Sermanet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Scott E.</given-names>
            <surname>Reed</surname>
          </string-name>
          , Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Rabinovich</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Going Deeper with Convolutions</article-title>
          .
          <source>CoRR</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Christian</surname>
            <given-names>Szegedy</given-names>
          </string-name>
          , Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan,
          <string-name>
            <given-names>Ian J.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Rob</given-names>
            <surname>Fergus</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Intriguing properties of neural networks</article-title>
          .
          <source>CoRR</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Edward</given-names>
            <surname>Tufte</surname>
          </string-name>
          .
          <year>1990</year>
          .
          <source>Envisioning Information</source>
          . Graphics Press, Cheshire, CT, USA.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Edward</given-names>
            <surname>Tufte</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <source>The Visual Display of Quantitative Information, 2nd ed.</source>
          . Graphics Press, Cheshire, Conn. http://www.amazon.com/Visual-Display-Quantitative-Information/dp/0961392142
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Edward</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Tufte</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Visual Explanations: Images and Quantities, Evidence and Narrative</article-title>
          . Graphics Press, Cheshire, CT.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>