<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Visualising the Training Process of Convolutional Neural Networks for Non-Experts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christin S</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Twente</institution>
          ,
          <addr-line>P.O. Box 217, 7500AE Enschede</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Convolutional neural networks are very complex and not easily interpretable by humans. Several tools give more insight into the training process and decision making of neural networks, but they are not understandable for people with no or limited knowledge about artificial neural networks. Since these non-experts sometimes do need to rely on the decisions of a neural network, we developed an open-source tool that intuitively visualises the training process of a neural network. We visualise neuron activity using the dimensionality reduction method UMAP. By plotting neuron activity after every epoch, we create a video that shows how the neural network improves itself throughout the training phase. We evaluated our method by analysing the visualisation of a CNN training on a sketch data set. We show how a video of the training over time gives more insight than a static visualisation at the end of training, as well as which features are useful to visualise for non-experts. We conclude that most of the useful deductions made from the videos are suitable for non-experts, which indicates that the visualisation tool might be helpful in practice.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable AI</kwd>
        <kwd>Convolutional Neural Network</kwd>
        <kwd>Visualisation</kwd>
        <kwd>Dimensionality reduction</kwd>
        <kwd>Image recognition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Image recognition has become a greatly beneficial technology in various fields. It is used for face recognition, visual search engines, e-commerce, healthcare and much more. To classify images, a convolutional neural network (CNN) needs to be trained. Neural networks have been found very useful for finding extremely complex patterns [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Although image recognition is thus a very impressive and useful technology, its complexity (and especially the high number of parameters) makes it difficult to understand how a trained model makes its decisions and why it gives certain results. (Both authors contributed equally. Copyright (c) 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).)
      </p>
      <p>
        Getting a better understanding and interpretation of the model can help with debugging and improving the model, as well as generating trust for non-experts [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We focus on facilitating the non-expert: an individual without knowledge of neural networks who does not necessarily have a technical background. The only knowledge required for our tool is a basic understanding of classification, that is, a machine labelling input after learning from labelled training data.
      </p>
      <p>
        The need to further interpret these models has given rise to numerous CNN visualisation techniques [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We use the dimensionality reduction method Uniform Manifold Approximation and Projection (UMAP) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to plot the neuron activations for every input in a single 2D graph. Our approach generates a visualisation with three plots: (i) a 2D plot of the test data with the input images as the data points, (ii) a simple 2D plot of the training data, and (iii) a line plot of the accuracy on both the test and training data. Most importantly, we visualise throughout the CNN's training phase, generating a video frame for each epoch. We evaluate our tool by manually inspecting and interpreting the videos, and question their accessibility for a non-expert. The data set we classify contains small grey-scale sketches of various objects [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We limited ourselves to 10 object categories.
      </p>
      <p>We have released a tool1 that intuitively visualises the training process of a convolutional neural network. The tool can easily be adapted to visualise other feedforward neural network architectures as well. Our evaluation indicates that the visualisations can provide non-experts with a general insight into how a CNN learns and how one can deduce the CNN's decisions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        Much research concerns improving computational sketch classification [
        <xref ref-type="bibr" rid="ref15 ref16 ref19 ref3 ref5">3, 5, 15, 16, 19</xref>
        ]. The difficulty of classifying sketches lies in how strongly their appearance depends on the person who drew them, such as their skill level. Using neural networks is a valid approach to classifying these sketches. As neural networks in general are not transparent and the task at hand is complex, this setup offers an ideal testing ground to evaluate whether our visualisation tool can help non-experts gain insight into a neural network's workings.
      </p>
      <sec id="sec-2-1">
        <title>Training phase visualisation</title>
        <p>
          Most training visualisations show whether a model is improving during its iterations, rather than how it is training and why it makes certain decisions. One example is [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which proposes a way to visualise a novel type of training error curve at three levels of detail.
        </p>
        <p>
          Since we want to show how the model trains, we have to look at more information than just the accuracy or error. Two tools that give insight into the training process of a neural network are DeepTracker [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and TensorView [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. DeepTracker is a visual analytics system that helps domain experts better understand what has occurred during a CNN's training process. TensorView visualises different attributes of the CNN, such as the weights of the convolution filters, the trajectories of the first two dimensions of the convolution weights, and the activations of the filters. Both DeepTracker and TensorView specifically aid model-builders so they can improve their network. However, since they give a lot of information that is only relevant to an expert or model-builder, they are also less understandable for non-experts.
        </p>
        <p>1 https://github.com/Linths/TrainVideo</p>
        <p>Therefore, when building our visualisation tool, we need to find the right amount of information to show: enough to give insight into the training process, without losing understandability for the non-expert.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Dimensionality reduction</title>
        <p>
          Our goal is to make a CNN's behaviour understandable for non-experts. This behaviour is determined by how every neuron activates for the given images. Because the number of neurons can grow large and the activations are essentially a set of formulas, showing all this information can overwhelm non-experts. One could show all exact neuron activity by plotting the separate activation values per input image, taking one dimension per neuron. However, to make a human-readable plot, tools such as t-SNE [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and UMAP [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] reduce the dimensionality to 2 or 3. These new dimensions are compressed versions of the old dimensions. Dimensionality reduction (DR) approximates the neural network's behaviour. While the absolute DR values do not carry intuitive meaning, the relative values do: generally, if two images get similar DR values, the neural network behaves similarly for them. When it becomes clear which images the CNN considers similar, one can theorise about what aspects the CNN based its decisions on.
        </p>
        <p>
          There is much variety in the existing DR approaches, as well as in the further functionality of these visualisation tools. While t-SNE has been a very popular DR implementation [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], the more recent DR implementation UMAP is competitive in both visual and computational terms [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Comparative tests on benchmark data sets such as COIL-100 [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], MNIST [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and Fashion-MNIST [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], all comparable to our data, have shown that UMAP consistently visualises global structures significantly better than t-SNE, and in considerably less time. Therefore, we use UMAP as our DR method.
        </p>
        <p>
          Using t-SNE, Rauber et al. developed a 2D method that gives insight into how an artificial neural network learns over time [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], by overlaying all t-SNE frames taken between training epochs, showing a t-SNE "trail" for every input entry. The imagery, however, becomes cluttered by compressing the whole timeline. One can also not see additional information, such as the changing performance of the neural network, evolve together with the network's activations. There are also tools that create dynamic DR visualisations. TensorFlow's Embedding Projector is an open-source online web application that allows for DR with rich functionality [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The tool shows the changing t-SNE plots while the neural network is being trained, but it does not offer this dynamic visualisation for UMAP. The original images are included in the plots and are colour-coded according to their actual class. The tool is interactive but lacks certain information such as accuracy.
        </p>
        <p>[Figure 1: an expert builds the CNN; per epoch, the CNN learns from the train images and produces prediction data for both the train and test images, from which a visualisation frame is created; the frames form the visualisation video that the non-expert investigates.]</p>
        <p>We conclude that we will develop a tool that creates dynamic DR visualisations, including performance statistics, by creating videos with the inter-epoch DR plots as video frames. We use UMAP as the DR method and include the input images in the plots. The user-friendliness and increased interpretability that the Embedding Projector offers with its rich interactivity could be an easy extension for our tool in the future.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
      <p>In this section, we describe our approach for producing dynamic visualisations
and interpreting them. To further facilitate the reproduction and expansion of
this study, our source code is available on GitHub1.</p>
      <p>Simply put, our training visualiser takes in train and test images, runs them through a trainable CNN, and outputs a video visualising the CNN's training process. The main idea is to show how a CNN develops and improves at classifying unseen images with more training. This is why we require test data as well, even though we aim to visualise the training process. While a properly performing neural network is crucial to our approach, there are only a few further constraints. Our integrated CNN can be interchanged with any feedforward neural network that has a fully connected layer. Note that the resulting visualisation shows the original input images; if the network does not take images as input, only minor code tweaks are needed to show the inputs as plain data points instead.</p>
      <p>Figure 1 shows how the tool operates and how the stakeholders are involved. In every epoch, the CNN learns from the train data, classifies the test data and creates a visualisation frame. After all epochs, the tool creates a video from those frames.</p>
    </sec>
    <sec id="sec-4">
      <title>Experimental Setup</title>
      <p>For the experiment, we use our tool on one specific data set and one specific neural network architecture. We manually evaluate the resulting visualisations in detail.</p>
      <sec id="sec-4-1">
        <title>Data Set</title>
        <p>
          The data set we use was collected by having people sketch recognisable, specific subjects [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. It consists of 250 categories with 80 sketches of 1111 x 1111 pixels each. The authors also measured human sketch recognition performance, which was 73%, to compare against computational sketch recognition.
        </p>
        <p>We augmented the training data as follows: (i) horizontal flipping, (ii) random rotation between 3.5 and 20.5 degrees clockwise or counterclockwise, (iii) random shift between 20 and 100 pixels to the right or left and to the top or bottom, and (iv) random rescale between 0.75 and 0.9 or between 1.1 and 1.25. All test and train images are resized from 1111 x 1111 to 128 x 128, converted to tensor images and normalised with mean μ = 0.8 and standard deviation σ = 0.2 (determined empirically).</p>
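The normalisation step can be sketched as follows. This assumes grayscale pixel values already scaled to [0, 1]; the synthetic all-background image below is ours, and the high mean reflects that sketches are mostly white paper.

```python
import numpy as np

def normalize(img, mean=0.8, std=0.2):
    """Normalise a grayscale image as in Section 4.1 (mu=0.8, sigma=0.2).

    Assumes pixel values are already in [0, 1]; sketches are mostly
    white background, hence the empirically high mean of 0.8.
    """
    return (img - mean) / std

# A synthetic 128x128 image consisting entirely of "paper" pixels.
img = np.full((128, 128), 0.8)
out = normalize(img)
print(out.mean())  # the dominant white background maps to 0.0
```

After this transform, values below the data set's mean brightness (the pen strokes) become negative, which centres the inputs for training.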
        <p>We limit our experiments to 10 classes to increase the CNN's performance and our visualisation's legibility. We created two data sets, one corresponding to a difficult classification task (further denoted as Hard) and one corresponding to a simple classification task (further denoted as Easy). Table 1 shows our two data sets. Easy has arguably easily distinguishable images, e.g. apples and ants, while Hard contains very similar images: types of bears, birds and cars. This way, we can assess how our visualisation videos might provide understanding in different situations. Moreover, showing a non-expert both videos might give them additional insight into the CNN's training process. For every class, we split the data into 60 original train images (which amounts to 300 images after transformations) and 20 test images.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Neural network architecture</title>
        <p>
          Previous work on the sketch data set yielded an accuracy of 70% using 15 classes [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The authors used a CNN with two convolutional layers with ReLU activations, each followed by a max-pool layer, and two fully connected layers (FC1 and FC2). In this work we use a similar architecture, shown in Figure 2. The differences between our architecture and that of [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] are the number of nodes in the FC layers, the number of filters per convolutional layer, the stride in the last max-pool layer, and that their FC1 has ReLU activations, whereas ours does not. We apply dropout before FC1, and softmax on the 10 output nodes. All parameters are determined empirically, optimising test accuracy, speed and the visualisation (e.g., showing a clear clustering). Table 2 lists the final hyperparameters. We increased the number of epochs to show how the visualisation changes when the model overfits.
        </p>
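Since the text does not list kernel sizes or strides, here is an illustrative helper for tracing how the 128 x 128 input shrinks through two conv + max-pool stages; every layer setting below is hypothetical, not taken from Table 2.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer
    on a square input (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1

# Hypothetical settings, for illustration only: the paper does not
# state its kernel sizes or strides.
size = 128
size = conv_out(size, kernel=5)             # conv1:     128 -> 124
size = conv_out(size, kernel=2, stride=2)   # max-pool1: 124 -> 62
size = conv_out(size, kernel=5)             # conv2:     62  -> 58
size = conv_out(size, kernel=2, stride=2)   # max-pool2: 58  -> 29

# With e.g. 32 filters in conv2, FC1 would then receive 32*29*29 inputs.
print(size)
```

Tracing shapes this way is a quick sanity check when swapping in a different feedforward architecture, as the tool allows.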
      </sec>
      <sec id="sec-4-3">
        <title>Visualisation</title>
        <p>To make the training process visible and understandable for non-experts, we visualise the training data, testing data, and the training and testing accuracy per epoch, as can be seen in Figure 5. For this, we take the pre-activation values of the FC2 layer, as shown in Figure 2. To visualise all data points from an epoch, we save the original labels, predictions and FC2 pre-activation values of each image after it passes through the neural network.</p>
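The per-epoch bookkeeping described above can be sketched with synthetic values; the array sizes here are assumptions, since the paper does not state the number of FC2 nodes.

```python
import numpy as np

rng = np.random.default_rng(1)
n_images, n_fc2 = 200, 64  # hypothetical sizes

# Per epoch, one record per image: the actual label, the predicted
# label, and the FC2 pre-activation vector that UMAP later reduces.
epoch_log = {
    "labels":      rng.integers(0, 10, size=n_images),
    "predictions": rng.integers(0, 10, size=n_images),
    "activations": rng.normal(size=(n_images, n_fc2)),
}

# The accuracy line plot in each frame comes directly from this log.
accuracy = (epoch_log["labels"] == epoch_log["predictions"]).mean()
print(round(float(accuracy), 3))
```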
        <p>For the test data plot, we display the images on the plot points, to make it visible why a certain image was perhaps misclassified, why specific classes are displayed as multiple clusters, or why they appear close to specific other clusters. We add a mask to each of those images with the colour corresponding to the predicted label, and a red border if the image is misclassified. This is most useful when the actual class of the image is difficult to determine from the image itself, and it gives a quick overview of how many images within a certain cluster are misclassified. This way, the plot represents three important pieces of information: (i) the actual class, represented by the image; (ii) the predicted class, represented by the image colour; and (iii) the neuron activity, represented by the location of the image in the plot figure.</p>
        <p>The training data plot is almost the same as the test data plot; however, we only show simple dots instead of the images, since we want to focus on showing the effect of the training phase on the test data, and thus keep the training data plot small.</p>
        <p>We present the test accuracy and train accuracy to see whether the CNN trains well, and to view the relation between the quality of the neural network and the UMAP visualisation.</p>
        <p>To visualise the neuron activity, we use UMAP (with #neighbours = 25) to reduce the pre-activation node values of FC2 to two dimensions. To keep this dimensionality reduction consistent, we fix the UMAP seed. This way, when the values in the nodes change only slightly after an epoch, the x and y values of an image also change only slightly, which makes for a visualisation video that is easy to follow.</p>
        <p>However, since the absolute x and y values add nothing useful for non-experts once they have the relative locations of the plot points, we decided not to show any values on the axes of the training and testing data plots. This way, we avoid confusing the user with useless information.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Evaluation</title>
        <p>Our visualisation needs to give useful insights into the decisions that are made inside a neural network during the training process, while also considering the level of knowledge and understanding of a non-expert user. To evaluate whether our tool complies with these two conditions, we compare our dynamic visualisations with static visualisations, look at which conclusions can be drawn from our visualisations, and also look at which aspects can be confusing for non-experts and can lead to wrong conclusions.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>In this section we show and interpret our visualisations for Easy and Hard. We also examine whether the visualisations themselves, the observations and the interpretations are accessible for non-experts. While our resulting visualisation is a video, for clarity we will refer to video frames (Figures 3 and 4). We refer to a specific frame with a shorthand, e.g. the visualisation of Easy after 3 epochs is called "Easy-3". The two output videos and the raw frames are available online2.</p>
      <sec id="sec-5-1">
        <title>Accuracy of the model</title>
        <p>We built and tweaked the neural network based on Easy, which consists of only 10 classes. With this data subset, we reach over 70% accuracy. We also tested the accuracy of our model on a data subset of 15 classes. For this, we use Easy with the addition of the crab, pineapple, snail, sponge bob and squirrel classes. Before the model starts to overfit, at epoch 7, we achieve around 74% accuracy. However, although our neural network holds up with more than 10 classes, our visualisation becomes cluttered and less understandable when it has to present more classes. Table 3 gives an overview of the accuracies of our model.</p>
        <p>We observe several trends in the visualisations for Easy and Hard after running for 20 epochs.</p>
        <p>Throughout the whole timeline, Easy and Hard show clustering in the train and test plots. Every frame contains several clusters of mainly the same actual and/or predicted class. Over time, these clusters separate more. The video2 shows relatively smooth transitions; it maintains the same data structure with some slight to moderate shifts. Unexpectedly, the visualisation seems to flip horizontally or vertically at a few points, but even then the relative positioning stays consistent. Interestingly, a specific structure is only present in the train plots of Easy-1 and Hard-1: very distant and closely knit groups of just a few points. We found out through simple tests that this happens when there is too little coherence in the data to properly apply DR, which is natural after just one epoch.</p>
        <p>Generally, the image distancing in the test plot seems logical and easily interpretable. Firstly, in Easy-4 and later, the apple cluster moves far away from all the other points. This means the CNN has found a very specific way to predict apples: the activation behaviour for apples is very different than for the other classes. As almost the whole cluster is predicted correctly, the strong pattern the CNN learns is a useful one. By inspecting all the images, one can speculate about what was used as the discriminating characteristic for apples. We hypothesise it is a combination of the round and simple form and the large amount of whitespace.</p>
        <p>Secondly, in Hard-7 and later we find clusters that contain two to three predicted classes. The partitioning almost completely matches our expectations (Figure 1), grouping cars and bears. However, the tool does not group sheep with clouds. On the contrary, the sheep are grouped with bears. This indicates the CNN easily separates sheep from clouds, but not from bears. We believe the CNN focuses on legs, which is a strong differentiator between sheep and clouds, but not between sheep and bears. The bears that are matched with the sheep do look similar, as they are on all fours. Moreover, the CNN clearly distinguishes this bear variant from bears that are sitting, standing on hind legs or lack a body.</p>
        <p>The train accuracy signals the model overfits around Easy-9 and Hard-9. The clusters in the test and train plots continue to separate, even for Easy where the visualisation seems to have stabilised. Easy-9 and Easy-20 are closely similar, but Easy-20 shows a bit more separation. For Hard, these differences are bigger. While Hard-20 has a significantly clearer separation of clusters than Hard-9, the test accuracy does not actually increase. Note that the clusters here concern images of the same predicted class. In reality, the model just polarises its activation behaviour. It becomes stricter in enforcing the patterns it claims to see, but not the patterns that actually exist. It is important to keep in mind that more exaggerated clustering does not always indicate a better model.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Situation types</title>
        <p>There are multiple ways an image can appear in our visualisations. The background colour, and thus the predicted class, can be either correct or wrong, and the position of the image can differ. Table 4 shows the different types of combinations, and in Figure 5 we point out some examples of these types. For A, we show a correctly classified apple, which is also placed correctly. For B, we point out an alarm clock which is classified correctly but not placed with the other alarm clocks. C is represented by a basket, classified as an apple, which is also placed with the apples by UMAP. For D, we show two examples: bananas classified as bells, and bells classified as bananas. However, UMAP places both with their actual class rather than their predicted class. That means that UMAP's DR seems to be better at identifying these images than the neural network's final layers, even though they are given the same input: the pre-activation values of FC2. Lastly, E is represented by a misclassified alarm clock, which is placed neither with the other alarm clocks nor with its predicted class. There are also middle grounds of the above-mentioned situations, for example a combination of C and D: when an image is classified incorrectly, but both the actual and predicted class are clustered together, the image can be placed with both its actual and its predicted class images. This means that the classes are probably very similar, such as the types of cars in Hard.</p>
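The situation types above reduce to three observations per image: whether the prediction is correct, and whether the image sits in the cluster of its actual or its predicted class. A sketch of that categorisation (the boolean encoding of "placed with a cluster" is our simplification, not the authors' code):

```python
def situation_type(correct, near_actual, near_predicted):
    """Categorise a plotted image into the types of Section 5.3.

    correct:        the predicted class matches the actual class
    near_actual:    the image sits in its actual class's cluster
    near_predicted: the image sits in its predicted class's cluster
    For a correct prediction the two cluster flags coincide.
    """
    if correct:
        return "A" if near_actual else "B"
    if near_actual and near_predicted:
        return "C/D"  # middle ground: both classes cluster together
    if near_predicted:
        return "C"
    if near_actual:
        return "D"
    return "E"

print(situation_type(True, True, True))    # A: correct and well placed
print(situation_type(False, True, False))  # D: placed with actual class
```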
      </sec>
      <sec id="sec-5-3">
        <title>Approachability for the non-expert</title>
        <p>Our tool visualises only information that is useful for a non-expert. For the test data, we visualise the actual class, the predicted class, the neuron activations (which estimate the class compatibility), and the overall accuracy. The same holds for the training data, except for the actual classes. Since these are all features that are easy to understand, also for non-experts, the tool seems very approachable for people with little to no background knowledge about neural networks.</p>
        <p>There are some aspects that might be confusing for non-experts, such as situation types B, D and E from Section 5.3. These situations are less intuitive, since non-experts might not fully understand the difference between the output of the neural network, the activation values of the neurons, and the output of UMAP. When these do not agree on an image, it can become confusing for the user why an image is classified as one thing but clustered with another. Other confusing factors are the occasional flips of the visualisations between epochs and the misleading clustering of an increasingly overfitting model (mentioned in Section 5.2).</p>
        <p>Visualising the training process over time, rather than only after it has finished, can give insights into which features are recognised first, and when certain images are clustered much more quickly and further away from the rest of the classes. Also, seeing certain classes cluster together or next to each other shows that the neural network considers them quite similar.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Discussion</title>
      <p>In this section, we lay out factors that have influenced our research and the consequences for the tool's quality and our research's validity.</p>
      <p>Ideally, our visualisation would be insightful for CNNs of various performance levels. We examined results for CNNs scoring around 50% and 70% test accuracy. Because of the challenging data set, we were unable to obtain a higher accuracy and check whether our visualisation would then still be insightful. The data set we used is arguably hard to classify, as it only contains 80 instances per class and the classes can still contain quite different drawings, e.g., the bears mentioned in Section 5.2.</p>
      <p>The CNN overfits rather quickly. It allows for only eight visualisation frames without overfitting. More frames would facilitate smoother transitions and might be more understandable and less overwhelming for non-experts. In addition, detail might be lost by the lack of intermediate epochs.</p>
      <p>Dimensionality reduction in itself is flawed. A data point can have false neighbours or missing neighbours due to the "compression" of dimensions. This could make our visualisation misleading. We have not tested dimensionality reduction methods other than UMAP, and we have only roughly optimised the UMAP settings to minimise such mistakes. Also, we have not counted any of these mistakes in our current visualisation, but we did see UMAP struggling to apply DR in Easy-1 and Hard-1.</p>
      <p>Because UMAP refits on the data for every epoch, the visualisation frames sometimes appear to be flipped horizontally or vertically. This makes the transitions between our frames less smooth. A solution is to apply DR only once, on the data of all epochs combined. We can then still visualise every epoch individually by only displaying the appropriate data points. This approach could, however, lower the DR quality, as it has to fit significantly more data. Additionally, it would disable "live visualisations": visualising while running the CNN.</p>
      <p>We have not used a standardised method to evaluate understandability, e.g. user studies. Our evaluation method concerned just two test cases and was not quantified. Visual inspection is not completely objective either, though tasks such as determining clusters have no objective answer anyway: there is no silver bullet. All in all, our methodology could not produce statistical claims.</p>
    </sec>
    <sec id="sec-7">
      <title>Summary and Future work</title>
      <p>We have built a CNN for the data set of human-made sketches with an accuracy of over 70% for 10 classes when using the easily distinguishable data subset (Easy). For the data subset with classes that seem very similar (Hard), the accuracy is slightly lower, due to the classes being more similar. During the training of this convolutional neural network (CNN), we have created video visualisations which provide useful information such as the actual class, the predicted class and the similarity between classes. Despite some situations which can be difficult to understand for a non-expert, the videos are overall quite interpretable, even by people without prior neural network knowledge.</p>
      <p>
        Many options for improvement and extension have been left for future work.
Most importantly, we would like to conduct user studies with non-experts in our
evaluation for a better scientific justification. That way, we can evaluate how
understandable the visualisations are in practice. Another useful addition to the
evaluation would be a thorough comparison of our tool with several other neural
network visualisations. It would be interesting to at least include methods that
focus on the training process and/or DR, such as [
        <xref ref-type="bibr" rid="ref14 ref17 ref2 ref4 ref8 ref9">2, 4, 8, 9, 14, 17</xref>
        ]. With
user studies and tool comparisons, one can more properly assess the added value
of both the DR and the training-process aspects of the visualisations. Furthermore,
it would be interesting to see whether fitting the UMAP visualisation to the
result of the first epoch, or to all epochs after training, makes the visualisation
easier to follow, since clusters might then not jump between epochs. Another
useful addition would be an automatic cluster detector, which could detect
the types in Table 3. Some methods can detect false or missing neighbours in
DR [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which would give the user a quick overview of which data points are
classified correctly but displayed within the wrong cluster. Moreover, a small but
very useful tool for giving concrete and objective insight into how distinguishable
the classes are is a confusion matrix of the actual classes versus the predicted
classes. Lastly, the tool could be made into an interactive viewer. By being able
to switch certain options on or off, it could provide more information without
too much clutter. It could then also enable a 3D visualisation.
      </p>
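      <p>
        As a concrete illustration of the confusion-matrix suggestion, the following minimal sketch (with hypothetical toy labels) counts actual versus predicted classes; the off-diagonal cells then show which class pairs the network confuses:

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    """Count matrix with actual classes as rows and predictions as columns."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm

# Hypothetical toy labels: class 0 is once mistaken for class 1,
# and class 2 once for class 0.
actual    = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(actual, predicted, n_classes=3)
```

        The diagonal holds the correctly classified points, so the matrix complements the cluster overlaps visible in the video with exact counts.
      </p>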
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Adadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berrada</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Peeking inside the black-box: A survey on explainable artificial intelligence (xai)</article-title>
          .
          <source>IEEE Access</source>
          <volume>PP</volume>
          ,
          <fpage>1</fpage>
          –
          <lpage>1</lpage>
          (Sep
          <year>2018</year>
          ). https://doi.org/10.1109/ACCESS.2018.2870052
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Becker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lippel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zielke</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Gradient descent analysis: On visualizing the training of deep neural networks</article-title>
          .
          <source>In: Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications</source>
          . pp.
          <fpage>338</fpage>
          –
          <lpage>345</lpage>
          . SCITEPRESS - Science and Technology Publications (
          <year>2019</year>
          ). https://doi.org/10.5220/0007583403380345
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chandan</surname>
            ,
            <given-names>C.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deepika</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suraksha</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mamatha</surname>
            ,
            <given-names>H.R.</given-names>
          </string-name>
          :
          <article-title>Identification and grading of freehand sketches using deep learning techniques</article-title>
          .
          <source>In: 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI)</source>
          . pp.
          <fpage>1475</fpage>
          –
          <lpage>1480</lpage>
          (Sep
          <year>2018</year>
          ). https://doi.org/10.1109/ICACCI.2018.8554920
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guan</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lo</surname>
            ,
            <given-names>L.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Estrada</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahrens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Tensorview: visualizing the training of convolutional neural network using paraview</article-title>
          .
          <source>In: Proceedings of the 1st Workshop on Distributed Infrastructures for Deep Learning - DIDL '17</source>
          . pp.
          <fpage>11</fpage>
          –
          <lpage>16</lpage>
          . ACM Press (
          <year>2017</year>
          ). https://doi.org/10.1145/3154842.3154846
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Eitz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hays</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alexa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>How do humans sketch objects?</article-title>
          <source>ACM Trans. Graph. (Proc. SIGGRAPH)</source>
          <volume>31</volume>
          (
          <issue>4</issue>
          ),
          <fpage>44:1</fpage>
          –
          <lpage>44:10</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hohman</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kahng</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pienta</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chau</surname>
            ,
            <given-names>D.H.</given-names>
          </string-name>
          :
          <article-title>Visual analytics in deep learning: An interrogative survey for the next frontiers</article-title>
          .
          <source>IEEE transactions on visualization and computer graphics</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haffner</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , et al.:
          <article-title>Gradient-based learning applied to document recognition</article-title>
          .
          <source>Proceedings of the IEEE</source>
          <volume>86</volume>
          (
          <issue>11</issue>
          ),
          <fpage>2278</fpage>
          –
          <lpage>2324</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cui</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Deeptracker: Visualizing the training process of convolutional neural networks</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>10</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>25</lpage>
          (Nov
          <year>2018</year>
          ). https://doi.org/10.1145/3200489
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Maaten</surname>
            ,
            <given-names>L.v.d.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Visualizing data using t-sne</article-title>
          .
          <source>Journal of machine learning research</source>
          <volume>9</volume>
          (Nov),
          <fpage>2579</fpage>
          –
          <lpage>2605</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Martins</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coimbra</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minghim</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Telea</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          :
          <article-title>Visual analysis of dimensionality reduction quality for parameterized projections</article-title>
          .
          <source>Computers &amp; Graphics</source>
          <volume>41</volume>
          ,
          <fpage>26</fpage>
          –
          <lpage>42</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>McInnes</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Healy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melville</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Umap: Uniform manifold approximation and projection for dimension reduction</article-title>
          .
          <source>arXiv preprint arXiv:1802.03426</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Muller,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Reinhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Strickland</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <source>Neural Networks: An Introduction</source>
          . Springer, Berlin/Heidelberg, Germany (
          <year>2012</year>
          ). https://doi.org/10.1007/978-3-642-57760-4
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Nene</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nayar</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murase</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Columbia object image library (coil-20)</article-title>
          . Tech. rep., Columbia University (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Rauber</surname>
            ,
            <given-names>P.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fadel</surname>
            ,
            <given-names>S.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Falcao</surname>
            ,
            <given-names>A.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Telea</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          :
          <article-title>Visualizing the hidden activity of artificial neural networks</article-title>
          .
          <source>IEEE transactions on visualization and computer graphics</source>
          <volume>23</volume>
          (
          <issue>1</issue>
          ),
          <fpage>101</fpage>
          –
          <lpage>110</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Seddati</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dupont</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahmoudi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Deepsketch: Deep convolutional neural networks for sketch recognition and similarity search</article-title>
          .
          <source>In: 2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>6</lpage>
          (Jun
          <year>2015</year>
          ). https://doi.org/10.1109/CBMI.2015.7153606
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Sert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boyacı</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Sketch recognition using transfer learning</article-title>
          .
          <source>Multimedia Tools and Applications</source>
          (Jan
          <year>2019</year>
          ). https://doi.org/10.1007/s11042-018-7067-1
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Smilkov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thorat</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nicholson</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reif</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viegas</surname>
            ,
            <given-names>F.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wattenberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Embedding projector: Interactive visualization and interpretation of embeddings</article-title>
          .
          <source>arXiv preprint arXiv:1611.05469</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rasul</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vollgraf</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms</article-title>
          .
          <source>arXiv preprint arXiv:1708.07747</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A hybrid convolutional neural network for sketch recognition</article-title>
          .
          <source>Pattern Recognition Letters</source>
          (
          <year>2019</year>
          ). https://doi.org/10.1016/j.patrec.2019.01.006
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>