<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic Interpretability of Convolutional Neural Networks by Taxonomy Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vitor A. C. Horta</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robin Sobczyk</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maarten C. Stol</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra Mileo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>BrainCreators</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>École Normale Supérieure de Paris-Saclay</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Insight Centre for Data Analytics at Dublin City University</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Interpretability of Convolutional Neural Networks (CNNs) is often crucial for their application in real-world scenarios. We aim to provide such interpretations in terms of the semantic content and conceptual structure CNNs acquire from their training data. Recent advances in Explainable AI have shown that CNNs are capable of learning hierarchical relationships between semantic categories in the form of taxonomic classifications. However, accurate evaluation of this ability is an open challenge, of which two aspects are non-trivial: constructing symbolic representations of semantic content after training, and quantifying their adequacy with respect to the semantics of the application domain. Existing evaluation methods are typically restricted to standard CNN performance metrics and do not take into account the underlying decision-making process in terms of explicit semantic structure in the domain. In this paper we propose a taxonomy extraction method for supervised CNN classifiers to capture how symbolic class concepts and their hypernyms from a given domain are hierarchically organised in the model's subsymbolic representation. In addition, we propose a taxonomy ground truth comparison method to evaluate the “semantic adequacy” of the extracted hierarchy of class concepts. Our approach is tested on VGG-16, ResNet-18 and ResNet-152 models trained on CIFAR-100 and ImageNet. Results show the influence of dataset quality and architecture depth on semantic adequacy, as suggested by recent literature [1]. We also observe that existing techniques for injecting external knowledge into the models during the training phase may lead to better taxonomies. This suggests that hierarchy-aware models may have a semantic advantage over their respective original architectures. Finally, we provide a fine-grained approach for analysing CNN interpretability in terms of its semantic content.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable AI</kwd>
        <kwd>Taxonomy Extraction</kwd>
        <kwd>Convolutional Neural Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Since their inception in 1989 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Convolutional Neural Networks (CNNs) have remained widely used and are
considered to be the state of the art in computer vision tasks. Each year, more sophisticated and
accurate CNN architectures are developed by researchers in both academia and industry [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and
more real-world applications are making use of them [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        The adoption of CNNs in critical domains such as healthcare
[
        <xref ref-type="bibr" rid="ref5 ref7">5, 6</xref>
        ] is yet to reach its full potential, mainly due to limitations in their domain-specific semantic
interpretability. Such limitations draw attention to the trade-off between performance accuracy and
the ability of a model to provide motivations for its decisions.
      </p>
      <p>
        In the field of Explainable AI (XAI), the issue of evaluating and comparing deep learning models
beyond their accuracy over a test set has been tackled by methods that provide global interpretations.
Examples include detection of concept importance [
        <xref ref-type="bibr" rid="ref8 ref9">7, 8</xref>
        ] and class hierarchy visualisation [
        <xref ref-type="bibr" rid="ref10">9</xref>
        ]. While
providing some form of explanation, methods like these are not sufficient for an objective
comparison between different models, since the resulting interpretations do not provide a metric that
compares CNN decision making with some external semantic ground truth.
      </p>
      <p>
        In this work, we provide methods for semantic interpretability of CNNs, based on how hierarchical
relationships between semantic concepts are captured by the model’s internal representation. To this
end, we propose a taxonomy extraction method to derive a domain taxonomy from a trained CNN.
For example, for a model trained over the ImageNet dataset [
        <xref ref-type="bibr" rid="ref11">10</xref>
        ], our method is capable of extracting
taxonomic axioms, that is, chains of subclass relations (⊂) linking a leaf class to successively more general hypernyms.
      </p>
      <p>By comparing such taxonomic axioms with a relevant ground truth class hierarchy (e.g., WordNet
for ImageNet) we can evaluate quantitatively how faithful the extracted taxonomy is to the selected
ground truth. In the current context this measure is referred to as semantic adequacy. Note that,
independent of model accuracy, a higher semantic adequacy score implies better transparency by
allowing interpretation of model behaviour in terms of user-accepted concepts. Moreover, it makes
objective semantic comparison between models possible, provided that their training data is drawn
from classes in similar or related taxonomies.</p>
      <p>
        In order to achieve these goals, we build upon the idea of co-activation graphs [
        <xref ref-type="bibr" rid="ref12">11</xref>
        ], a data structure
based on correlation coefficients of neuron activation values. Semantic notions are introduced by
relating CNN output neurons to nodes in the graph. In this work, we combine co-activation graphs
with a taxonomy extraction method originally designed for knowledge graphs by [12]. An overview
of the approach can be seen in Figure 1. Finally, the resulting taxonomy can be compared to a ground
truth taxonomy to measure the semantic adequacy of the CNN classifier. The main contributions of
the paper are:
        <list list-type="bullet">
          <list-item>
            <p>A vector representation for semantic relationships of output classes, obtained by co-activation
analysis of a trained model. Such class embeddings are general purpose representations. We
use them for taxonomy extraction by means of hierarchical clustering.</p>
          </list-item>
          <list-item>
            <p>A method for taxonomy extraction from a trained model. The resulting taxonomies provide
symbolic explanations for sub-symbolic decision making.</p>
          </list-item>
          <list-item>
            <p>An evaluation metric for comparison between extracted and ground truth taxonomies,
measuring what we call the semantic adequacy of a model. In turn, this metric allows us to compare
different models in semantic terms, instead of merely by their performance metrics.</p>
          </list-item>
        </list>
      </p>
      <p>Experimental results indicate that the CNNs used can learn direct and transitive subclass
relationships reasonably well, especially considering that they were not explicitly trained for that and
did not have any a priori information about semantic hierarchies. An encouraging finding from our
results is that hierarchy-aware ResNets achieved the best results for both datasets. This supports the
intuition that if we inject external knowledge into a neural network during the training process,
we can make the model more interpretable by making such knowledge explicit in relation to the
network’s decision process. Our experiments also show that the taxonomy extraction and the idea
of semantic adequacy can enhance global interpretability of deep models in terms of the semantics
pertaining to class hierarchy encoded in their internal representations.</p>
      <p>The rest of this paper is organised as follows. Section 2 presents related work for the topics of
global interpretation of CNNs and taxonomy extraction. Section 3 elaborates on the methodology for
extracting taxonomies from co-activation graphs built for CNNs. In Section 4 an experiment using
CNNs trained for CIFAR-100 and ImageNet is conducted. In Section 5 we present our conclusions
and discuss the obtained results.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <sec id="sec-2-1">
        <title>2.1. Semantic Global Interpretation of CNNs</title>
        <p>
          In XAI, global interpretations aim to provide insights into the behaviour of a model as a whole, instead
of explaining individual inference events. E.g., TCAV [
          <xref ref-type="bibr" rid="ref8">7</xref>
          ] identifies the most important concepts
for predicting each output class in a classification problem. TCAV is able to interpret classification
outcomes in terms of semantic concepts but handles correlations between concepts poorly, although such
correlations are often present in real-world image datasets. This issue is addressed by [13], with a method
to detect cause-effect relations between concepts and the model predictions. Then, [14] proposes a
method to discover concepts and measure concept importance for predictions. Interpreting
cause-effect relationships between concepts and output classes in this way is useful, but requires each
concept to be measured individually. As a result it does not easily allow for a direct and quantitative
comparison between models.
        </p>
        <p>
          Using a different approach, [
          <xref ref-type="bibr" rid="ref10">9</xref>
          ] explores how classes are hierarchically organised by CNNs. A visual
approach to discover hierarchical relationships between classes is combined with a
hierarchy-aware CNN architecture. This work is closely related to ours in that both explore hierarchical
structures. However, while their method requires visual and interactive analysis, our approach extracts
the hierarchical relationships automatically from the model. This is an important prerequisite for
the assessment of semantic adequacy: without automatic extraction of hierarchical information,
the scale of modern CNNs makes semantic global interpretation infeasible. In addition, while [
          <xref ref-type="bibr" rid="ref10">9</xref>
          ]
relies only on the confusion matrix, our taxonomy extraction explores the internal representations
of a model. This makes our approach suitable for interpreting separate layers in the model and
understanding the role of such layers in the overall decision making process.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Taxonomy extraction from graph structures</title>
        <p>In complex domains, constructing taxonomies by hand is an expensive task. Some approaches have
looked into taxonomy extraction from non-textual, structured and semi-structured data, such as
knowledge graphs. Authors in [12, 15] propose an unsupervised method to find hypernym axioms.
Vector embeddings for each node in the knowledge graph are calculated, resulting in a class centroid
and radius in the latent space. Then, based on the distance between the centroids, they form axioms
and construct their transitive closures to build a taxonomy.</p>
        <p>The main drawback of the approach in [15] is that concepts need to be directly represented as nodes
in the knowledge graph. The method proposed in [12] can help overcome this issue. Based on node
embeddings, their approach uses hierarchical clustering over the latent space to find a hierarchical
structure. By mapping concepts (or types) to cluster-nodes in this structure, a typed tree is formed
from which to construct a taxonomy. The advantage is that the clustering phase does not use any
information regarding the types. This makes it more suitable in our setup, where these types are
typically not represented by nodes in the graph. In this work, we use co-activation graphs as a graph
representation for CNNs and adapt the method in [12] to this graph for the taxonomy extraction. In
the next section, the methodology to combine the two is presented in detail.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Hierarchy-aware architectures</title>
        <p>
          Recent works have shown that CNNs can learn the underlying hierarchical structure between classes
during the training phase even though they were not explicitly trained for this specific task. Motivated
by this observation, authors in [
          <xref ref-type="bibr" rid="ref10">9, 16</xref>
          ] have shown how a CNN architecture can be modified in order
to help the model learn such hierarchical structure during the training phase.
        </p>
        <p>
          In their work, the authors in [
          <xref ref-type="bibr" rid="ref10">9</xref>
          ] show how a modified AlexNet architecture [17] can improve
accuracy and accelerate the process of learning class hierarchies. Their method works by adding
extra classification branches between some of the convolutional layers from the original architecture.
        </p>
        <p>
          On the other hand, the method proposed by [16] adds extra classification layers after the final
classification layer of the original architecture. We consider the method from [
          <xref ref-type="bibr" rid="ref10">9</xref>
          ] more suitable for a
benchmark on semantic adequacy because the hierarchical structure is learned directly from the
convolutional layers and not from the final classification layer.
        </p>
        <p>
          Following [
          <xref ref-type="bibr" rid="ref10">9</xref>
          ] and by adapting their method for ResNets, a hierarchy-aware ResNet can be
constructed as follows: given a hierarchy of depth <italic>d</italic> (root excluded), we add <italic>d</italic> − 1 branches to the
architecture that learn group-level classifications and optimise for error at each level of the class
hierarchy. The branches are each composed of two fully-connected layers, and are evenly spread
along the residual blocks. These additions are sufficient for our current experiments. We leave
further architectural optimisations (e.g., dimension of extra modules) to future work. One hypothesis
that emerges from this is that hierarchy-aware CNNs may lead to better taxonomies than their
corresponding original architectures, since they were explicitly trained to learn the hierarchical
relationships between classes. To test this hypothesis, in Section 4 we compare the semantic
adequacy of hierarchy-aware CNNs constructed following [
          <xref ref-type="bibr" rid="ref10">9</xref>
          ] against their corresponding original
architecture.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>We assume class concepts are organised in a taxonomy, and that a neural network was optimised
only to discriminate between leaf node classes. Our goal is to reconstruct, into symbolic form, the full
taxonomy from the internal sub-symbolic structure of the model. To this end, we introduce a novel
extraction method, designed such that the extracted taxonomy reflects how the model organises
output classes and their hypernyms hierarchically in its internal representation.</p>
      <p>
        We use co-activation graphs [
        <xref ref-type="bibr" rid="ref12">11</xref>
        ] as an intermediate representation to correlate neuron activities
with classes in a trained model. Embedding graph nodes as vectors results in a latent representation
of semantic relationships between classes as learned by the model. We then build on the taxonomy
extraction method of [12] and hierarchical clustering to transform latent representations back into a
semantic structure. An overview of the procedure is shown in Figure 1. The remainder of this section
is devoted to describing each step in further detail.
      </p>
      <sec id="sec-3-1">
        <title>3.1. From deep representations to co-activation graph</title>
        <p>Given a trained neural network model, the nodes of its co-activation graph stand in one-to-one
correspondence with its neurons. Each pair of nodes in the graph is connected by a weighted edge
whose weight is the Spearman correlation coefficient between the two neurons' activation values on a test data
set. For neurons in dense layers this process is straightforward: there is a single activation value per
data sample. For neurons in convolutional layers, a single activation value is obtained by applying
average pooling on the feature map.</p>
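        <p>The construction described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used in the experiments: the function names, the toy activations and the decision to drop non-positive correlations are assumptions, and the default threshold of 0.3 is the value reported in Section 4.1. In practice a library routine such as <monospace>scipy.stats.spearmanr</monospace> would likely replace the hand-written rank correlation.</p>
<preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations


def average_pool(feature_map):
    """Collapse a 2-D feature map (list of rows) into one activation value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant neuron: correlation undefined, treated as no edge
    return cov / (var_x * var_y) ** 0.5


def co_activation_graph(activations, threshold=0.3):
    """activations maps neuron name -> list of activation values, one per sample.

    Returns {(neuron_a, neuron_b): weight} for pairs whose correlation
    exceeds the threshold (an assumption: negative correlations are dropped).
    """
    edges = {}
    for a, b in combinations(sorted(activations), 2):
        weight = spearman(activations[a], activations[b])
        if weight > threshold:
            edges[(a, b)] = weight
    return edges


# Toy data: four samples, two dense neurons and one convolutional channel.
conv_maps = [[[0.0, 0.2], [0.2, 0.0]], [[0.5, 0.5], [0.5, 0.5]],
             [[1.0, 0.8], [0.8, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
activations = {
    "dense_1": [0.1, 0.4, 0.7, 0.9],
    "dense_2": [0.9, 0.5, 0.6, 0.1],
    "conv_1": [average_pool(m) for m in conv_maps],
}
graph = co_activation_graph(activations)
```
]]></preformat>
        <p>In this toy example only the pair (conv_1, dense_1) is kept, since both neurons increase monotonically over the four samples, while dense_2 is negatively correlated with the other two.</p>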
        <p>The resulting graph provides relevant information on how dependencies between internal
representations impact the global behaviour of a classifier. This makes co-activation graphs a suitable
intermediate representation to analyse how a model transforms sub-symbolic information from pixels,
via its internal representations, into an assignment to a semantic class. Next, we will discuss how to
transform the statistical information in the co-activation graph into a semantic structure.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. From co-activation graph to taxonomy</title>
        <p>We modify the extraction method in [12] such that it applies to co-activation graphs. Three phases can
be distinguished: embedding of graph nodes into vector representations, followed by agglomerative
clustering, and finally assignment of semantic types to clusters. We now discuss further details of
the method. For the embedding of nodes from the co-activation graph to vector representations we
make use of existing embedding functions [18, 19]. The result of the first phase is a data set <italic>D</italic>
containing a vector for each node in the co-activation graph.</p>
        <p>Agglomerative clustering starts by creating a leaf cluster for every vector in the data set <italic>D</italic>. At each iteration,
the two closest clusters are merged according to some chosen distance metric. Clustering terminates
when there is a single cluster containing every vector in <italic>D</italic>, resulting in a tree structure over the
vectors. Assigning semantic types to the tree will turn it into a taxonomy.</p>
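        <p>As an illustration of the clustering phase, the following sketch builds the merge tree with a naive average-linkage (UPGMA) procedure and Euclidean distance. The function names and the two-dimensional toy embeddings are hypothetical, and the quadratic search for the closest pair is chosen for readability only; for real embeddings an optimised routine such as <monospace>scipy.cluster.hierarchy.linkage</monospace> would be the more likely choice.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(cluster_a, cluster_b, vectors):
    """UPGMA distance: mean pairwise distance between members of two clusters."""
    total = sum(euclidean(vectors[i], vectors[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative_tree(vectors):
    """Merge the two closest clusters until a single cluster remains.

    vectors maps node name -> embedding. Returns the merges in order as
    (members_of_left_child, members_of_right_child) tuples; every merge
    corresponds to one internal node of the resulting tree.
    """
    clusters = [frozenset([name]) for name in sorted(vectors)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], vectors))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((a, b))
    return merges


# Hypothetical 2-D embeddings for four nodes.
embeddings = {
    "cat": (0.0, 0.0), "dog": (0.0, 1.0),
    "car": (5.0, 5.0), "truck": (5.0, 6.0),
}
tree = agglomerative_tree(embeddings)
```
]]></preformat>
        <p>Here the two tight pairs are merged first and the final merge joins them into the root cluster, so the tree has three internal nodes for four leaves.</p>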
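        <p>In order to assign types to clusters, the method calculates the F-score <italic>F</italic>(<italic>C</italic>, <italic>t</italic>) for each cluster <italic>C</italic>
and type <italic>t</italic>, which indicates how well <italic>C</italic> represents the entities of type <italic>t</italic>. The F-score is calculated as
shown in Equation 1, where <italic>N</italic><sub><italic>C</italic>,<italic>t</italic></sub> is the number of entities with type <italic>t</italic> in cluster <italic>C</italic>, <italic>N</italic><sub><italic>C</italic></sub> is the number
of entities in <italic>C</italic> and <italic>N</italic><sub><italic>t</italic></sub> is the number of entities with type <italic>t</italic>. A high <italic>F</italic>(<italic>C</italic>, <italic>t</italic>) indicates that cluster
<italic>C</italic> contains mostly entities of type <italic>t</italic> and that most entities of type <italic>t</italic> are contained in <italic>C</italic>.
          <disp-formula>
            <label>(1)</label>
            <tex-math>F(C, t) = 2 \cdot \frac{N_{C,t}}{N_{C} + N_{t}}</tex-math>
          </disp-formula>
        </p>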
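        <p>Equation 1 can be evaluated directly on sets of entities. The short sketch below computes the score for every cluster and type combination; the cluster names, type names and members are hypothetical and only serve to make the arithmetic concrete.</p>
<preformat preformat-type="code"><![CDATA[
```python
def f_score(cluster, type_members):
    """F(C, t) = 2 * N_{C,t} / (N_C + N_t) for one cluster and one type."""
    n_ct = len(cluster & type_members)
    return 2 * n_ct / (len(cluster) + len(type_members))


def score_table(clusters, types):
    """Return {(cluster_name, type_name): F-score} for every combination."""
    return {(c, t): f_score(members, entities)
            for c, members in clusters.items()
            for t, entities in types.items()}


# Hypothetical clusters taken from a clustering tree, and hypothetical types.
clusters = {
    "C1": {"beagle", "poodle"},
    "C2": {"tabby", "siamese"},
    "C3": {"beagle", "poodle", "tabby", "siamese"},
}
types = {
    "dog": {"beagle", "poodle"},
    "cat": {"tabby", "siamese"},
    "animal": {"beagle", "poodle", "tabby", "siamese"},
}
scores = score_table(clusters, types)
```
]]></preformat>
        <p>For instance, cluster C3 contains both dogs but also two other entities, so its score for the type dog is 2 · 2 / (4 + 2) = 2/3, whereas C1 matches the type dog exactly and scores 1.</p>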
        <p>We remove clusters that are not associated with a type and evaluate the extracted taxonomy by
precision, recall and F-score over edges of the taxonomy (“direct” evaluation) and its transitive closure
(“transitive” evaluation).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Analysis</title>
      <p>The experimental evaluation has two distinct goals. First, to measure correlations of class similarity
in the embedding space with semantic similarity in the ground truth taxonomy. This indicates the
ability of learned representations to separate symbolic concepts in the taxonomy. Second, to evaluate
the semantic adequacy of the extracted taxonomy with respect to its ground truth. This indicates the
degree of transparency allowed by the particular models as expressed in semantic terms, and the
suitability of our method for semantic interpretation in general.</p>
      <sec id="sec-4-1">
        <title>4.1. Experiment Setup</title>
        <p>Experiments were conducted using CIFAR-100 and ImageNet datasets. For CIFAR-100 we trained
VGG-16, ResNet-18 and hierarchy-aware ResNet-18 (denoted HA-ResNet-18). For ImageNet we trained
VGG-16, ResNet-152 and hierarchy-aware ResNet-152 (denoted HA-ResNet-152).</p>
        <p>For each dataset and model, a co-activation graph containing only those connections with
correlation higher than 0.3 was built. We extracted the taxonomy using the pipeline in Figure 1. Node
embeddings were generated using two different embedding methods in order to check if the results
are consistent across different strategies: Node2Vec [19] and Fast Random Projection (FastRP) [18].
For both Node2Vec and FastRP we used the implementation provided by Neo4j.</p>
        <p>After calculating the node embeddings using the two methods above, the next phase is to apply
the agglomerative clustering algorithm. For this phase we tested different distance criteria and
metrics, which influence the merging strategy of the agglomerative clustering. The metrics were
Euclidean and cosine (when applicable) while the distance criteria used were: average (UPGMA),
weighted (WPGMA), complete, centroid and ward.</p>
        <p>At this point, we end up with a hierarchical clustering tree, and the next phase is to assign types
to the clusters that best represent the entities from each type. Because we want to compare the
extracted taxonomy with the WordNet hierarchy, we extracted the types directly from WordNet using
the nltk Python package. Then, for each type <italic>t</italic> and each cluster <italic>C</italic>, we calculated the F-score <italic>F</italic>(<italic>C</italic>, <italic>t</italic>)
using Equation 1 and assigned types to clusters by solving the corresponding Linear Sum Assignment
problem. Finally, the clusters and types that are not associated are removed, and the resulting tree
hierarchy represents the extracted taxonomy.</p>
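        <p>The assignment step can be illustrated with a brute-force search that maximises the total F-score over one-to-one pairings of types and clusters. This is a sketch under stated assumptions rather than the solver used in the experiments: the score values are made up, the function name is hypothetical, and discarding pairings with a zero score is our reading of how unassociated clusters and types are removed. For realistic sizes a dedicated solver such as <monospace>scipy.optimize.linear_sum_assignment</monospace> (with <monospace>maximize=True</monospace>) would be needed.</p>
<preformat preformat-type="code"><![CDATA[
```python
from itertools import permutations


def assign_types(scores, cluster_names, type_names):
    """Brute-force one-to-one assignment maximising the total F-score.

    Assumes len(type_names) <= len(cluster_names) and tiny inputs, since
    the number of candidate assignments grows factorially.
    """
    best_total, best = -1.0, {}
    for chosen in permutations(cluster_names, len(type_names)):
        total = sum(scores[(c, t)] for c, t in zip(chosen, type_names))
        if total > best_total:
            best_total, best = total, dict(zip(type_names, chosen))
    # Assumption: a pairing with no overlap at all is not a real association.
    return {t: c for t, c in best.items() if scores[(c, t)] > 0}


# Hypothetical F-scores for four clusters and three types.
scores = {
    ("C1", "dog"): 1.0, ("C1", "cat"): 0.0, ("C1", "animal"): 2 / 3,
    ("C2", "dog"): 0.0, ("C2", "cat"): 1.0, ("C2", "animal"): 2 / 3,
    ("C3", "dog"): 2 / 3, ("C3", "cat"): 2 / 3, ("C3", "animal"): 1.0,
    ("C4", "dog"): 0.5, ("C4", "cat"): 0.5, ("C4", "animal"): 2 / 3,
}
assignment = assign_types(scores, ["C1", "C2", "C3", "C4"],
                          ["dog", "cat", "animal"])
```
]]></preformat>
        <p>Cluster C4 receives no type in the optimal assignment, so it would be removed from the tree before the taxonomy is read off.</p>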
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Correlating class embedding with class similarity</title>
        <p>The goal of this analysis is to check whether the produced node embeddings can encode semantic
similarity between classes represented in that space. For this analysis, we have first calculated the
semantic similarity for every pair of classes in the dataset. This process was done by using the path
similarity metric available from the nltk package, which calculates how similar two concepts are
based on the shortest path that connects them in the WordNet hierarchy. We then calculated
the Spearman correlation between semantic similarity among classes and cosine similarity between
the corresponding classes in the embedding space.</p>
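        <p>The analysis above can be mimicked on a toy hierarchy without WordNet. In the sketch below, the small child-to-parent table, the class embeddings and all function names are hypothetical; path similarity is taken as 1 / (1 + shortest path length), and the rank correlation assumes samples without ties, which does not hold for real WordNet similarities, where a tie-aware routine such as <monospace>scipy.stats.spearmanr</monospace> would be required.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from itertools import combinations

# Hypothetical toy hierarchy (child -> parent), standing in for WordNet.
PARENT = {"beagle": "hound", "hound": "dog", "poodle": "dog",
          "tabby": "cat", "dog": "animal", "cat": "animal"}


def ancestors(node):
    """Path from a node up to the root, the node itself included."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_similarity(a, b):
    """1 / (1 + length of the shortest path between a and b in the tree)."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = next(n for n in up_a if n in up_b)  # lowest common ancestor
    return 1 / (1 + up_a.index(common) + up_b.index(common))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def spearman_no_ties(x, y):
    """Spearman correlation via the rank-difference formula (no ties allowed)."""
    def rank(values):
        ordered = sorted(values)
        return [ordered.index(v) for v in values]
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(x), rank(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical class embeddings.
embedding = {"beagle": (1.0, 0.0), "poodle": (0.8, 0.6), "tabby": (0.0, 1.0)}
pairs = list(combinations(sorted(embedding), 2))
semantic = [path_similarity(a, b) for a, b in pairs]
latent = [cosine(embedding[a], embedding[b]) for a, b in pairs]
rho = spearman_no_ties(semantic, latent)
```
]]></preformat>
        <p>In this constructed example the two similarity rankings agree on all three class pairs, so the correlation is 1; on real models the correlation would be expected to be lower.</p>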
        <p>In Table 1 it is possible to observe that there is a positive correlation between semantic similarity
and cosine similarity among classes in the embedding space. This is a first indication that the
CNNs have learned semantic relationships between classes, which supports the idea that it may be
possible to reconstruct the taxonomical relationships between them. The HA-ResNet-152 architecture
trained on ImageNet achieved the highest correlation, which provides initial evidence that the additional
information injected during the training phase helped this model to better capture the semantic
relationships between classes. It can also be noted that the results from Node2Vec are higher than
FastRP for CIFAR-100 while FastRP performs better for ImageNet, which indicates that the choice of
the embedding method may be influenced by the density of the graph. However, correlation alone
is not a strong metric for comparing the semantic adequacy of the architectures, since it does not
expose which semantic relationships were learned by the model. The semantic adequacy comparison
between models is done in a more detailed way in the second analysis.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Evaluating extracted taxonomies</title>
        <p>In this second analysis we evaluate the semantic adequacy of the taxonomies extracted using our
method by comparing them with the ground truth hierarchy extracted from WordNet. This evaluation
is conducted following the same principles used in [12], which uses both a direct and a transitive
evaluation. For example, consider an extracted axiom <italic>a</italic> ⊂ <italic>c</italic> where, according to WordNet, the direct hypernym
of <italic>a</italic> is a different type <italic>b</italic> and <italic>c</italic> is a more distant ancestor of <italic>a</italic>. This axiom has a negative effect
in the direct evaluation, because <italic>c</italic> is not the direct hypernym of <italic>a</italic>. In the transitive evaluation,
the goal is to evaluate high level axioms: the same axiom has a positive effect there, because
WordNet contains a transitive relationship between types <italic>a</italic> and <italic>c</italic>.</p>
        <p>For this analysis, we also evaluated the semantic adequacy of an untrained model using the VGG-16
architecture initialised using random parameters, with the purpose of providing a lower bound
baseline. The precision, recall and F-score are calculated using the edges of the graphs representing
the extracted taxonomies and the edges of the graph representing the ground truth, as described
in Equation 2. For the direct evaluation we calculate the evaluation metrics based directly on the
respective graphs whereas for the transitive evaluation we consider the transitive closure.</p>
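        <p>Ground truth: <italic>G</italic><sub>gt</sub> = (<italic>V</italic>, <italic>E</italic><sub>gt</sub>); experimental: <italic>G</italic><sub>e</sub> = (<italic>V</italic>, <italic>E</italic><sub>e</sub>).
          <disp-formula>
            <label>(2)</label>
            <tex-math>
\begin{cases}
\text{Precision} = \dfrac{|E_{gt} \cap E_{e}|}{|E_{e}|} \\[6pt]
\text{Recall} = \dfrac{|E_{gt} \cap E_{e}|}{|E_{gt}|} \\[6pt]
\text{F-score} = 2 \cdot \dfrac{|E_{gt} \cap E_{e}|}{|E_{gt}| + |E_{e}|}
\end{cases}
            </tex-math>
          </disp-formula>
        </p>
        <p>[…] models we tested whether there was a difference in their F-score distributions using Student's t-test. The
null hypothesis is that there is no difference between the two distributions; a p-value ≤ 0.05 rejects the
null hypothesis and indicates a statistically significant difference between the two distributions.</p>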
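        <p>The direct and transitive variants of the edge-based metrics in Equation 2 can be sketched as follows. The two small taxonomies are hypothetical, edges are written as (child, parent) pairs, and the function names are ours; the example mirrors the case discussed earlier, where an extracted axiom skips an intermediate hypernym.</p>
<preformat preformat-type="code"><![CDATA[
```python
def transitive_closure(edges):
    """All (descendant, ancestor) pairs implied by (child, parent) edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def edge_metrics(truth, extracted):
    """Precision, recall and F-score over two edge sets, following Equation 2."""
    common = len(truth & extracted)
    precision = common / len(extracted) if extracted else 0.0
    recall = common / len(truth) if truth else 0.0
    total = len(truth) + len(extracted)
    f_score = 2 * common / total if total else 0.0
    return precision, recall, f_score


# Hypothetical ground truth and extracted taxonomy.
truth = {("beagle", "hound"), ("hound", "dog"), ("dog", "animal")}
extracted = {("beagle", "dog"), ("dog", "animal")}

direct = edge_metrics(truth, extracted)
transitive = edge_metrics(transitive_closure(truth),
                          transitive_closure(extracted))
```
]]></preformat>
        <p>The extracted edge from beagle to dog is counted as an error in the direct evaluation, because the ground truth places hound in between, but it is counted as correct once both edge sets are replaced by their transitive closures.</p>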
        <p>From Table 4 it is possible to observe that, for the transitive evaluation, all HA-ResNet variations
are statistically more semantically adequate than pure ResNets and VGG-16. This is encouraging
evidence that the injection of external knowledge during the training phase may help achieve more
interpretable models. We can also see that VGG-16 does not perform better than any other model,
which may indicate that the architecture depth can have an effect on semantic adequacy.</p>
        <p>
          Overall, our evaluation shows that the taxonomies extracted from CNNs using our method can
achieve reasonable direct and transitive F-scores, even when the models were not trained explicitly
for that. The best results in our analysis were generated by the hierarchy-aware architectures, as
proposed by [
          <xref ref-type="bibr" rid="ref10">9</xref>
          ] and adapted for ResNets in this paper.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We proposed a method for semantic interpretability of deep representations by extracting taxonomies
from the internal structure of trained CNNs. Our approach represents a CNN as a co-activation graph
and adapts the taxonomy extraction method in [12] to such graphs. We then introduce the concept
of semantic adequacy to measure how well a model captures the hierarchical relationship between
classes from a given domain by comparing its extracted taxonomy to a ground truth such as WordNet.</p>
      <p>The proposed taxonomy extraction method and semantic adequacy together can help in
comparing and choosing among different CNNs by exposing how well each model learned the semantic
relationships from a given dataset instead of relying purely on performance metrics. The next steps
include adapting the semantic adequacy in order to provide more fine-grained information, since the
value is currently associated with the extracted taxonomy as a whole. In this case we expect the metric
to inform which specific parts of the taxonomy are more or less adequate according to the ground
truth. For example, the extracted taxonomy may be more adequate for a specific subtree (e.g. dogs)
but less adequate for another (e.g. primates). This information can be useful not only for deciding
when to trust a given model but also for transfer learning, where a model may not be suitable for a
given task even though it may provide high accuracy.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This publication has emanated from research supported by Science Foundation Ireland (SFI) Grant
Number SFI/12/RC/2289_P2, co-funded by the European Regional Development Fund.</p>
      <p>[12] F. Martel, A. Zouaq, Taxonomy extraction using knowledge graph embeddings and hierarchical
clustering, in: Proceedings of the 36th Annual ACM Symposium on Applied Computing,
SAC ’21, Association for Computing Machinery, New York, NY, USA, 2021, p. 836–844. URL:
https://doi.org/10.1145/3412841.3441959. doi:10.1145/3412841.3441959.</p>
      <p>[13] Y. Goyal, A. Feder, U. Shalit, B. Kim, Explaining classifiers with causal concept effect (cace), 2019.
URL: https://arxiv.org/abs/1907.07165. doi:10.48550/ARXIV.1907.07165.</p>
      <p>[14] C.-K. Yeh, B. Kim, S. O. Arik, C.-L. Li, T. Pfister, P. Ravikumar, On completeness-aware
concept-based explanations in deep neural networks, 2019. URL: https://arxiv.org/abs/1910.07969.
doi:10.48550/ARXIV.1910.07969.</p>
      <p>[15] P. Ristoski, S. Faralli, S. P. Ponzetto, H. Paulheim, Large-scale taxonomy induction using entity
and word embeddings, in: Proceedings of the International Conference on Web Intelligence,
WI ’17, Association for Computing Machinery, New York, NY, USA, 2017, p. 81–87. URL:
https://doi.org/10.1145/3106426.3106465. doi:10.1145/3106426.3106465.</p>
      <p>[16] R. L. Grassa, I. Gallo, N. Landro, Learn class hierarchy using convolutional neural networks,
CoRR abs/2005.08622 (2020). URL: https://arxiv.org/abs/2005.08622. arXiv:2005.08622.</p>
      <p>[17] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional
neural networks, in: Proceedings of the 25th International Conference on Neural Information
Processing Systems - Volume 1, NIPS’12, Curran Associates Inc., Red Hook, NY, USA, 2012, p.
1097–1105.</p>
      <p>[18] H. Chen, S. F. Sultan, Y. Tian, M. Chen, S. Skiena, Fast and accurate network embeddings via
very sparse random projection, 2019. URL: https://arxiv.org/abs/1908.11512.
doi:10.48550/ARXIV.1908.11512.</p>
      <p>[19] A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, 2016. URL:
https://arxiv.org/abs/1607.00653. doi:10.48550/ARXIV.1607.00653.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <article-title>Network dissection: Quantifying interpretability of deep visual representations</article-title>
          ,
          <year>2017</year>
          . URL: https://arxiv.org/abs/1704.05796. doi:10.48550/ARXIV.1704.05796.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Boser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Denker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hubbard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Jackel</surname>
          </string-name>
          ,
          <article-title>Backpropagation Applied to Handwritten Zip Code Recognition</article-title>
          ,
          <source>Neural Computation</source>
          <volume>1</volume>
          (
          <year>1989</year>
          )
          <fpage>541</fpage>
          -
          <lpage>551</lpage>
          . URL: https://doi.org/10.1162/neco.1989.1.4.541. doi:10.1162/neco.1989.1.4.541. arXiv:https://direct.mit.edu/neco/article-pdf/1/4/541/811941/neco.1989.1.4.541.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sohail</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Zahoora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Qureshi</surname>
          </string-name>
          ,
          <article-title>A survey of the recent architectures of deep convolutional neural networks</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>53</volume>
          (
          <year>2020</year>
          )
          <fpage>5455</fpage>
          -
          <lpage>5516</lpage>
          . URL: https://doi.org/10.1007/s10462-020-09825-6. doi:10.1007/s10462-020-09825-6.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Alzubaidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Humaidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Dujaili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Al-Shamma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Santamaría</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Fadhel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Al-Amidie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Farhan</surname>
          </string-name>
          ,
          <article-title>Review of deep learning: concepts, CNN architectures, challenges, applications, future directions</article-title>
          ,
          <source>Journal of Big Data</source>
          <volume>8</volume>
          (
          <year>2021</year>
          ). URL: https://doi.org/10.1186/s40537-021-00444-8. doi:10.1186/s40537-021-00444-8.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zeleznik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Foldyna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Eslami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Alexander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Alvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Banerji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Uno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kikuchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karady</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-E.</given-names>
            <surname>Scholtz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mayrhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lyass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. F.</given-names>
            <surname>Mahoney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Massaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Vasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Douglas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J. W. L.</given-names>
            <surname>Aerts</surname>
          </string-name>
          ,
          <article-title>Deep convolutional neural networks to predict cardiovascular risk from computed tomography</article-title>
          ,
          <source>Nature Communications</source>
          <volume>12</volume>
          (
          <year>2021</year>
          ). URL: https://doi.org/10.1038/s41467-021-20966-2. doi:10.1038/s41467-021-20966-2.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Shadmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mazo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bregman-Amitai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Elnekave</surname>
          </string-name>
          ,
          <article-title>Fully-convolutional deep-learning based system for coronary calcium score prediction from non-contrast chest CT</article-title>
          ,
          in:
          <source>2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>24</fpage>
          -
          <lpage>28</lpage>
          .
          doi:10.1109/ISBI.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wattenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gilmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wexler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Viegas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sayres</surname>
          </string-name>
          ,
          <article-title>Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV)</article-title>
          (
          <year>2017</year>
          ). URL: https://arxiv.org/abs/1711.11279. doi:10.48550/ARXIV.1711.11279.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghorbani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wexler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Towards automatic concept-based explanations</article-title>
          ,
          in:
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>9273</fpage>
          -
          <lpage>9282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bilal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jourabloo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <article-title>Do convolutional neural networks learn class hierarchy?</article-title>
          ,
          <source>IEEE Transactions on Visualization and Computer Graphics</source>
          <volume>24</volume>
          (
          <year>2018</year>
          )
          <fpage>152</fpage>
          -
          <lpage>162</lpage>
          . URL: https://doi.org/10.1109%2Ftvcg.2017.2744683. doi:10.1109/tvcg.2017.2744683.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          ,
          <article-title>ImageNet: A large-scale hierarchical image database</article-title>
          , in:
          <source>2009 IEEE Conference on Computer Vision and Pattern Recognition</source>
          , IEEE,
          <year>2009</year>
          , pp.
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Horta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tiddi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Little</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mileo</surname>
          </string-name>
          ,
          <article-title>Extracting knowledge from deep neural networks through graph analysis</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>120</volume>
          (
          <year>2021</year>
          )
          <fpage>109</fpage>
          -
          <lpage>118</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0167739X21000613. doi:10.1016/j.future.2021.02.009.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>