<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>FALL SYMPOSIUM SERIES, Thinking Fast and Slow and Other Cognitive Theories in AI, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Graph-based Neural Modules to Inspect Attention-based Architectures: A Position Paper</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Breno W. Carvalho</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artur S. d'Avila Garcez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luís C. Lamb</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science</institution>
          ,
          <addr-line>City</addr-line>
          ,
          <institution>University of London</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IBM Research</institution>
          ,
          <addr-line>Rio de Janeiro</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>MIT Sloan School of Management</institution>
          ,
          <addr-line>Cambridge, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>UFRGS</institution>
          ,
          <addr-line>Porto Alegre</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>7</fpage>
      <lpage>19</lpage>
      <abstract>
<p>Encoder-decoder architectures are prominent building blocks of state-of-the-art solutions for tasks across multiple fields where deep learning (DL) or foundation models play a key role. Although there is a growing community working on the provision of interpretation for DL models as well as considerable work in the neuro-symbolic community seeking to integrate symbolic representations and DL, many open questions remain around the need for better tools for visualization of the inner workings of DL architectures. In particular, encoder-decoder models offer an exciting opportunity for visualization and editing by humans of the knowledge implicitly represented in model weights. In this work, we explore ways to create an abstraction for segments of the network as a two-way graph-based representation. Changes to this graph structure should be reflected directly in the underlying tensor representations. Such a two-way graph representation enables new neuro-symbolic systems by leveraging the pattern recognition capabilities of the encoder-decoder along with symbolic reasoning carried out on the graphs. The approach is expected not only to produce new ways of interacting with DL models but also to improve performance as a result of the combination of learning and reasoning capabilities.</p>
      </abstract>
      <kwd-group>
        <kwd>Neuro-symbolic models</kwd>
        <kwd>Deep Learning explainability</kwd>
        <kwd>Model introspection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>These are bustling times for Artificial Intelligence (AI) research, with many optimistic
perspectives from researchers in Computer Science and other areas. We are entering an era where our
models not only can generalize from examples of a given task but, given the appropriate context
and conditions, can generalize across different tasks, indicating an emergent phenomenon
that is almost impossible to predict. Such models, notably deep learning (DL) models, are powerful
and influential, yet the current drawbacks in design and reliability are now obvious.</p>
      <p>[Figure 1: An encoder-decoder architecture with multi-head attention layers, input and
output embeddings, and a representation module attached as an introspection entry point.]</p>
      <p>Those models, like fun-house mirrors, can distort aspects of their training data and generate false
affirmations even though they were fed with the correct facts necessary for the answer. They
can also lean toward unethical biases and make unsupported claims. It is therefore important
to be able to edit a model’s answer and context to first understand the supporting facts or
assumptions behind a given answer and second, correct or provide new contextual information
so as to increase trust in the model’s outputs.</p>
      <p>
        One approach in this direction is to create introspection entry points into the models so
as to peek into the operations of an intermediate layer (as depicted in Figure 1), interacting
directly with it via graph editing. Those introspection entry points are neural layers aimed
not at contributing to performance on any single task but at adding an extra layer to the developer’s
interaction with the model by providing an abstraction to inspect specific parts of the network.
There are already a few relevant methods in the literature, some of which are listed in the survey
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and use graph representations as part of the network structure itself. At the same time,
the field of neuro-symbolic (NeSy) AI research investigates ways of merging explicit symbolic
knowledge with many such sub-symbolic neural approaches [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        At the Robert S. Engelmore Memorial Lecture, during the AAAI Conference on Artificial
Intelligence, New York, February 10th, 2020, in a talk entitled The Third AI Summer, Henry Kautz
presented a taxonomy for NeSy models, with six main categories [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Adding those inspection
nodes, which ideally should not harm model performance at the target task, falls into the
Neuro; Symbolic category in Kautz’s taxonomy. This category includes systems where a neural
component and a symbolic component mutually share information, each with a different goal.
In this way, the inspection module can help edit the implicit knowledge stored in the DL model
and serve even as a debugging tool. One possible approach to building such inspection modules
is based on Graph Convolutional Neural Networks (GNN) [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. By combining inspection with
the model one can perform a secondary task of internal state-representation of the model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
As DL models grow in size and complexity, we believe that those introspection tools in the
network may allow for a more in-depth understanding of such models. This raises the question
of the form of introspection and extensions thereof that might be expected to produce a better
understanding of DL models.
      </p>
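      <p>As an illustrative sketch of what such an inspection module could compute (assuming NumPy; the cosine-threshold rule and all names below are our own simplifying assumptions, not a method from the cited works), one simple way to read a graph out of an intermediate tensor is to treat its rows as nodes and link rows with high cosine similarity:</p>
      <preformat>
```python
import numpy as np

def entry_point_graph(entry_tensor, threshold=0.8):
    # Abstract an entry-point tensor (rows = token/concept embeddings)
    # into an undirected graph: edges link rows whose cosine similarity
    # exceeds the threshold; self-loops are excluded.
    norms = np.linalg.norm(entry_tensor, axis=1, keepdims=True)
    unit = entry_tensor / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    adjacency = (sim > threshold) & ~np.eye(len(entry_tensor), dtype=bool)
    return adjacency

# Toy tensor: rows 0 and 1 are nearly parallel, row 2 is orthogonal to both.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
A = entry_point_graph(X)
```
      </preformat>
      <p>A real inspection module would of course learn this mapping (e.g. with a GNN) rather than apply a fixed threshold; the sketch only illustrates the tensor-to-graph abstraction the text describes.</p>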
    </sec>
    <sec id="sec-2">
      <title>2. Short-term Objectives</title>
      <p>Our immediate objective is to design and develop an initial tool set for interpreting encoded
knowledge in deep learning (large) language models and editing it. We aim to add components to
infer a graph structure from specific tensors, so-called entry points, in the model’s architecture.
those graph entry points (see Figure 2a). We use the term entry points for compact, non-sparse tensors
along the network pipeline that can be connected to a GNN to produce a graph abstraction
of this stage of the overall network. For now, we consider only DL architectures based on the
overall encoder-decoder pattern. Those are usually models used in transduction tasks, which
make up a vast class of DL models. Our short-term goals are:</p>
      <sec id="sec-2-1">
        <title>To compare different alternatives for implementing graph constraints without significantly reducing performance.</title>
        <p>Creating graph constraints for specific entry points means
ensuring that the underlying tensor always has a meaningful compact graph representation if
the overall model was fed with valid input vectors. To this end, we contemplate adopting a few
alternatives, such as having an auxiliary loss function that will signal whether the encoder was
able to create a valid graph. It will also signal if the decoder can generate the expected output.
Another approach is to train a GNN decoder and apply it to the output of the encoder part.
Aiming at language tasks supports both kinds of strategies, since there are extensive language
resources that can be used as an initial graph structure.</p>
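        <p>A minimal sketch of the auxiliary-loss alternative (NumPy; the two penalty terms are illustrative assumptions of ours, not the final design): a soft adjacency matrix decoded from an entry point is pushed toward a valid undirected, self-loop-free graph by penalizing asymmetry and self-loop mass alongside the task loss:</p>
        <preformat>
```python
import numpy as np

def graph_validity_loss(soft_adj):
    # Penalize soft adjacency matrices that are far from a valid
    # undirected, self-loop-free graph: asymmetry plus self-loop mass.
    asymmetry = np.mean((soft_adj - soft_adj.T) ** 2)
    self_loops = np.mean(np.diag(soft_adj) ** 2)
    return asymmetry + self_loops

def total_loss(task_loss, soft_adj, lam=0.1):
    # Combined objective: task loss plus the auxiliary graph term.
    return task_loss + lam * graph_validity_loss(soft_adj)

valid = np.array([[0.0, 0.9], [0.9, 0.0]])    # symmetric, no self-loops
invalid = np.array([[0.5, 0.9], [0.1, 0.0]])  # asymmetric, with self-loop
```
        </preformat>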
        <sec id="sec-2-1-1">
          <title>To find semantic representations for the edges and nodes.</title>
          <p>A representation of one embedding without any semantics associated with it might be as daunting as looking at the
model itself. We seek to adapt existing work in the literature to be able to create a representation
with more meaningful names for entity nodes and connections.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>To be able to edit the model to increase performance, remove biases, or add new knowledge.</title>
        <p>Finally, any editing of the graph representation should be reflected in the encoder’s
weights and influence the final output of the model, ideally in a way that suggests a causal
relationship. This enables us to interact actively and edit context within the model to ensure
that the main component of a given output is considered to be relevant and correct by an expert
in the domain of interest.</p>
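        <p>To make the editing loop concrete, here is a hedged sketch (NumPy; the sigmoid-similarity objective and gradient update are illustrative choices of ours) of propagating a graph edit, here the removal of an edge, back into the underlying embeddings by gradient descent:</p>
        <preformat>
```python
import numpy as np

def apply_graph_edit(X, target_adj, lr=0.1, steps=200):
    # Nudge entry-point embeddings so that the sigmoid of their pairwise
    # dot products matches an edited 0/1 target adjacency (plain
    # gradient descent on a squared-error objective; diagonal masked).
    X = X.copy()
    mask = ~np.eye(len(X), dtype=bool)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ X.T)))       # edge probabilities
        err = (p - target_adj) * p * (1.0 - p) * mask
        X -= lr * (err + err.T) @ X                # chain rule through X @ X.T
    return X

X0 = np.array([[1.0, 0.0], [0.9, 0.1]])            # strongly linked pair
no_edge = np.zeros((2, 2))                         # edit: remove the edge
X1 = apply_graph_edit(X0, no_edge)
```
        </preformat>
        <p>After the edit, the similarity between the two rows drops, i.e. the deleted edge is reflected in the tensor; in the envisioned system this update would flow into the encoder’s weights rather than a detached copy.</p>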
        <sec id="sec-2-2-1">
          <title>To design a method for generating interpretations in a local scope.</title>
          <p>In the short term, we are concerned with local interpretations where we interpret the model components’ states
and context while processing a specific input. This approach is already insightful for a series of
use cases and might be the stepping-stone for developing a global interpretation.</p>
          <p>
            Achieving these goals will require the development of a simple but versatile tool-set that might
be useful to the community for inspecting and editing models in diverse research or industrial
areas (similarly to Grad-Cam [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] and Bertology [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] and many other helpful visualization tools
found to be relevant by DL practitioners).
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Long-term Objectives</title>
      <p>Our main goal is to be able to inspect large deep learning language models through different
entry points in such a way as to increase trust in their output. Transformer or
transformer-inspired architectures dominate the state of the art of such language models. Considering that
most transformer architectures follow an encoder-decoder model, fulfilling the above short-term
objectives should provide an entry point into inspecting the inner state of the model and its
context. We conjecture that having a few editable graph layers attached to a transformer of
considerable size will enable an expert to study the relationships of the many abstraction levels
one might find by inspecting those layers. Our long-term goals are:</p>
      <sec id="sec-3-1">
        <title>To have a method to enable experts to visualize, edit, and interact directly with the language models.</title>
        <p>
          One might consider exploring multiple entry points. Manually editing
context or knowledge in the network through a graph interface linked to different layers might
provide much richer insight. Possibilities for graph-entry points are depicted in Figure 1. The
goal is to select tensors that are most descriptive of the model’s current state. One can also use
this method with visualization methods such as [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>To design a new family of neurosymbolic models based on systematic interactions with
fragments of the underlying architecture. One might also leverage the power of symbolic
reasoners to perform expansion and inference on the graph context representation to fix or
enhance the model’s output. This approach is closely aligned with a fast-slow AI perspective,
where the learning of the language model represents the fast, pattern-matching, system 1,
and the slower, graph-based reasoning represents system 2 (see Figure 2b). In this figure, we
illustrate that a symbolic expansion of the graph representation of specific parts of the network
might be used to provide the model with enough information to go from a wrong answer, as
depicted in Figure 2a, to the correct answer (see Figure 2b).</p>
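        <p>The fast-slow division of labor can be sketched with a toy system 2: a forward-chaining reasoner that expands the graph extracted from the model before the decoder produces its answer (the rules and triples below are invented purely for illustration):</p>
        <preformat>
```python
def forward_chain(facts, rules):
    # Minimal system-2 stand-in: each rule (r1, r2, r3) derives
    # (a, r3, c) from matching facts (a, r1, b) and (b, r2, c);
    # iterate until no new triples appear (a fixpoint).
    facts = set(facts)
    while True:
        derived = {
            (a, r3, c)
            for (r1, r2, r3) in rules
            for (a, x, b) in facts if x == r1
            for (b2, y, c) in facts if y == r2 and b2 == b
        } - facts
        if not derived:
            return facts
        facts |= derived

facts = {
    ("cough_cpr", "is_a", "folk_remedy"),
    ("folk_remedy", "lacks", "clinical_evidence"),
}
rules = [("is_a", "lacks", "lacks")]  # X is_a Y, Y lacks Z => X lacks Z
closure = forward_chain(facts, rules)
```
        </preformat>
        <p>Here the derived triple would supply the missing context that lets the decoder revise a wrong system-1 answer, in the spirit of Figure 2b.</p>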
      </sec>
      <sec id="sec-3-2">
        <title>To explore alternative representations of the entry points that are most useful in a specific context.</title>
        <p>As we explore larger language models, we might find that different entry
points might benefit from different representation formats. It might even be the case that
reusing smaller transformers will allow us to have natural language representation for some
model entry points. An analogy would be when humans perform complex tasks and mentally
talk to themselves while performing the task. Another class of representation alternatives to
be explored consists of hyper-graphs, which might have a more suitable role in explaining the
network’s inference mechanism since they are able to convey recursive levels of abstraction in
the information being represented.</p>
        <p>
          [Figure 2: (a) Encoder-decoder example with an entry point for an interpretation module.
One might explore multiple representation languages depending on the resources available. In
this case, the model outputs an incorrect answer (“Coughing can help stop a heart attack.”) for
a question from the TruthfulQA dataset [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], due to incomplete context information, which we hope to make explicit with the graph
representation. (b) Illustrative representation of neuro-symbolic reasoning to complement the
pattern learning capabilities of attention-based models. A GNN entry point is interpreted as an
AI system 2 component via the application of a symbolic reasoner, yielding the correct answer:
“There is no evidence that coughing can help stop a heart attack.”]
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>To develop a method where the representation semantics is given by the training data and not necessarily by external language resources.</title>
        <p>In an ideal scenario, the information
available during training alone should be sufficient to build meaningful representations. Even with
available linguistic resources, expanding the entry point representation with tokens and concepts
from the training data could allow the use of a more robust and extensible representational
language such as a graph, hypergraph, or natural language.</p>
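        <p>As a toy illustration of deriving semantics from the training data alone (the sliding-window heuristic is a simplifying assumption of ours, not the proposed method), a co-occurrence graph can bootstrap a vocabulary of nodes and weighted edges without external linguistic resources:</p>
        <preformat>
```python
from collections import Counter

def cooccurrence_graph(corpus, window=2, min_count=1):
    # Build representation semantics from the training data alone:
    # nodes are tokens, weighted edges count co-occurrences within a
    # sliding window over each training sentence.
    edges = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            for other in tokens[i + 1 : i + window + 1]:
                if tok != other:
                    edges[tuple(sorted((tok, other)))] += 1
    return {e: c for e, c in edges.items() if c >= min_count}

corpus = ["coughing cannot stop a heart attack",
          "a heart attack needs emergency care"]
g = cooccurrence_graph(corpus)
```
        </preformat>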
        <p>To develop a pre-trained zoo of entry point representations for specific architectures
that can be further specialized. It should be possible to use a library of visualization
modules pre-trained on commonplace text corpora as a baseline, and specialize it with specific
domain corpora. This process might accelerate inspection and also reduce the
learning curve for experts from different domains on how to use those tools. At this point, we
are aiming at modularity. We intend not only to have a methodology but also specific modules
that experts from diverse backgrounds can reuse.</p>
        <sec id="sec-3-3-1">
          <title>To define a method comprising both local and global interpretation scope.</title>
          <p>In opposition to local scope interpretations, global interpretations aim at providing insight into the overall
behavior of the model considering an entire task. For instance, understanding what parts of
a large language model are responsible for summing two numbers or performing sentiment
analysis.</p>
          <p>To achieve those objectives, one should build a robust methodology to edit and interact
with large language models at multiple levels of abstraction processed by the model. This
methodology might enable further insight into the inner workings of such models.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Introspection Modules</title>
      <p>
        Adding introspection modules to current deep learning architectures has the potential not only to enable
better interpretability but also to unlock new ways to edit and condition those models. As discussed
in detail in the joint initiative [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], large-scale language models (also called foundation models
for their re-usability across multiple tasks) pose serious research and societal challenges. We
discuss a few of them in this paper’s Challenges section.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Fairness and accountability goals</title>
        <p>
          Although somewhat hard to quantify, fairness and accountability are important to acknowledge
where trust in deep learning models that drive far-reaching decisions is concerned. Unfairness is
characterized in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] into two classes: prediction outcome discrimination, and prediction quality
disparity. An example of the first class is when a model predicts an unfavorable treatment of a
demographic group, such as bias against women. The second class refers to a model performing
poorly for a determined group of individuals. Both cases raise the need to flag such models and
correct them. Accountability might be understood as a capability for post-hoc inspection of the
model in order to make it available for auditing the causes of the model’s outcome [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          Following [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], Section 4.11, which discusses the concept of interpretability of foundation
models, we contemplate three categories of interpretability: what (the model’s limits and
capabilities), why (finding what in the data set is responsible for the model’s output), and how
(gaining an understanding of how certain parts and mechanisms of the model impact the output).
In this work, we focus on the why and how interpretability questions.
        </p>
        <p>As well as being able to interpret those models, one should also seek to obtain better tools to
assist us in accelerating the construction or adaptation of new models.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Model editing and design goals</title>
        <p>By understanding individual model mechanisms, we seek to build a compositional understanding
of the complex behavior of a foundation model. A possible way is to interconnect or build larger
models from pre-trained components and reduce societal risks by building more reliable and
interpretable models through “debugging” tools.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Challenges</title>
      <p>Creating those representation/introspection modules raises several challenges. Some of these,
such as the visual interpretation of self-attention heads, have been addressed to some extent in
the literature. Others remain to be explored.</p>
      <p>
        The why and how interpretability questions from [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], as mentioned in the previous section,
are essential for the understanding of large language models, but face many open challenges:
      </p>
      <sec id="sec-5-1">
        <title>5.1. Challenges on describing why (model behavior) and how (model mechanisms)</title>
        <p>The influence of the model’s input might be studied by carefully observing how the
behavior of the model changes with changes in the input. The ability to peek inside the model
also might be used to provide insight into which parts of the input are more relevant to the
model. The task of generating interpretations for the inner workings of a language model poses
further challenges:</p>
        <sec id="sec-5-1-1">
          <title>1) It is not clear how the components of those language models are interconnected.</title>
          <p>
            Although the overall design of a model’s architecture may be clear, the actual weights of such
units have emergent behavior that might be fairly hard to predict. A central question for a
faithful representation of the mechanisms inside a language model is to what extent those
mechanisms behave as one coherent model or as many models. As the model learns to generalize
across multiple tasks, it might be the case that its components find weight sub-spaces that
virtually correspond to many sub-models, possibly one for each task. It is not yet clear where
on this spectrum current foundation models lie. This lack of understanding and clarity
leads to our next challenge.
2) Local and global interpretation methods might have pitfalls related to the complexity
of such language models. Given the highly complex behavior of those models and their
emergent properties, it is unwise to jump to conclusions based on a single global or local
interpretation method, even though there are both local [
            <xref ref-type="bibr" rid="ref13">13, 14</xref>
            ] and global [15, 16] methods for
providing insights into task-specific models. For instance, considering that those large models
might find weight sub-spaces for distinct tasks it might be the case that studying the model for
a few of those sub-tasks doesn’t provide a fair picture of its behavior in other tasks or even for
other similar inputs.
          </p>
        </sec>
        <sec id="sec-5-1-2">
          <title>3) The inter-relationship of different components in the model might be counter-intuitive or hard to determine.</title>
          <p>It is not clear whether semantically related mechanisms,
e.g. the parts of a language model responsible for summing up two numbers, are the same as the
ones used to perform a related task, such as summing two numbers expressed as numeral
nouns.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Challenges about the introspection modules themselves</title>
        <p>
          Besides the inherent challenges of deriving interpretations from huge language models, there
are also intrinsic points to be evaluated in the design of the introspection module. There are
questions and decisions to be made about the desired representational language and format,
and there are also considerations about the architecture of those modules and their refinement
procedure.
        </p>
        <p>
          1) Creating meaningful interpretations assumes an underlying vocabulary and
semantics. This can be thought of as the symbol grounding problem, i.e. the definition of
a vocabulary for the representation generated by the introspection modules. This might be
achieved through the use of an underlying ontology that constrains the interpretation space
to a set of predefined concepts and vocabulary. It remains unclear how the semantics of the
ontology will be transferred to the interpretation. It might be achieved as a secondary task
happening during the training of the language model that we want to inspect.
        </p>
        <p>
          2) Different abstraction levels must be considered in order to avoid overly complicated
and misleading interpretations. This is because the size of a given interpretation
may be prohibitive, both in terms of meaningful visualization and interpretability. To circumvent
this, one needs to employ different levels of abstraction, although the model layers closer to
the output seem to pertain to more abstract features of the data set. Walking back and
forth between those different levels of abstraction reinforces the need to guarantee consistency
in the interpretations’ vocabulary across different entry points.
        </p>
        <p>
          3) Interpretations should be consistent and reliable across a specific domain or data
set. This is not an easy requirement to enforce. It might be the case that, given the complexity
of those models, consistency can be enforced only partially most of the time. We also need
to consider consistency across related inputs or contexts. For instance, it should not be the case
that two very similar text prompts result in very different interpretations when looking at the
same entry point in the model. A possible approach is to anchor those graph representations on
concepts and constructs from public linguistic resources, such as FrameNet [17], VerbNet [18],
WordNet [19] and PropBank [20], all connected by the SemLink project [21]. The process of
generating a consistent interpretation must also be subject to auditing to ensure the faithfulness
of the entire system.
        </p>
        <p>
          4) Considering that the introspection module can be viewed as a black-box component,
one must devise metrics of trustworthiness specific to it. The GNN or other DL model
used to generate representations from parts of the language model is also a black-box module
that might itself pose some fairness questions [22]. We hope that it will prove feasible to keep
those models relatively small; it is important to ensure that those modules remain
much less complex than the model being studied. In that case, existing visualization methods
such as [
          <xref ref-type="bibr" rid="ref8">8, 23</xref>
          ] may suffice to audit them and improve trust.
        </p>
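        <p>A crude, illustrative way to quantify the consistency requirement across similar prompts (the metric choice and the triples below are our own assumptions) is to compare the edge sets of the interpretation graphs produced at the same entry point:</p>
        <preformat>
```python
def interpretation_consistency(edges_a, edges_b):
    # Jaccard similarity of two interpretation graphs' edge sets: a
    # crude check that near-identical prompts yield near-identical
    # entry-point graphs (1.0 = identical, 0.0 = disjoint).
    a, b = set(edges_a), set(edges_b)
    return len(a & b) / len(a | b) if a | b else 1.0

g1 = {("coughing", "stop", "heart_attack"),
      ("heart_attack", "is_a", "emergency")}
g2 = {("coughing", "stop", "heart_attack"),
      ("heart_attack", "needs", "cpr")}
score = interpretation_consistency(g1, g2)
```
        </preformat>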
        <p>In summary, the goal of adding a human-editable and interpretable representation of context
and the state of specific parts of a language model raises many open questions whose solution
will require contributions from different areas of expertise. One way to face some
of the challenges outlined here is to combine linguistic resources, which provide the needed
representation building blocks (e.g. concepts and relations), with tokens and concepts drawn
from the training data set as additional building blocks.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Outlook</title>
      <p>As mentioned in previous sections, this kind of neural-symbolic work draws from a diverse
mixture of research areas in AI and Computer Science more generally.</p>
      <sec id="sec-6-1">
        <title>6.1. Peeking inside DL models:</title>
        <p>While characterizing and describing the behavior of a deep learning model, one might consider
the data set and the predicted outputs, or also the internal state variations of the model. It is also
possible to build interpretations of the model’s behavior on specific data samples, or to try to
understand how it behaves in a broader sense when performing a particular task.</p>
        <p>
          Local scope interpretation methods target insight into the model’s behavior on specific inputs
or on contrasts between a given set of data points. Examples include [14], an approach that uses influence
functions from robust statistics to understand how perturbations in the training set influence
the model behavior; [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] uses the concept of interpretations as meta-predictors, i.e. one can
use an explanation to predict the output of a model, and they instantiate this approach using
image classification masks. By contrast, global methods are used to describe general aspects
and behaviors of the model. Interesting examples include [15] and [16]. Both local and global
methods treat the language models as black-box objects to be understood through their inputs
and outputs.
        </p>
        <p>
          Other than looking at the training set and the models’ outputs, another class of interpretation
methods that is more aligned with the work proposed here, consists of models that extract
information from the model weights. This includes GradCam [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], which does backward propagation
of gradients to estimate which pixels of the input were more relevant for a Convolutional
Network output; [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] provides a tool-set for visualizing the weights of the self-attention heads of
BERT-like models. Although those methods can be very insightful when analyzing task-specific
models, considering that current language models might have distinct behavior across diferent
tasks, it is unclear how insightful those methods could be in this scenario. Most likely, the
designers and analysts of such models would need to employ multiple methods of interpretation
and visualization to have a fair understanding of the models’ behavior.
        </p>
        <p>As an alternative to the line of work described in this paper, where we propose the use of
external (and hopefully auditable) modules to peek into the language models, one can train the
model itself to generate an explanation for its output [24]. This approach faces some skepticism
since language models can generate plausible, although untrue, outputs and nothing prevents
this behavior from extending to the model’s self-explanations.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Graph deconvolution and Neurosymbolic AI</title>
        <p>We regard GNNs and neuro-graph algorithms in general as a promising general approach for
both local and global interpretation.</p>
        <p>
          Graph convolution is defined by analogy to convolutional layers over Euclidean data [25, 26]
and it is an important tool to infuse deep learning models with relational knowledge [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
Conversely, we conjecture that using specific decoder units to extract graph representations
from the entry point tensors (even if having to condition the training of the underlying language
model) might be a worthwhile approach to obtain meaningful interpretations from language
models.
        </p>
        <p>We might consider each introspection module as a simple decoder unit that maps the encoded
embedding into the interpretation space. As mentioned in the Challenges section, one might
initially use linguistic resources conditioned on the sentence processed by the language model
to assemble the vocabulary used to form the interpretations. A possible approach is to use
heuristics to get concepts from the language model prompt and build a simple graph, even
combining it with a syntax or dependency tree to have an initial graph to use and evaluate
the decoder unit used as an introspection module. In this sense, there are a few approaches
in the literature that one might draw inspiration from. For instance, there is work on graph
encoders [27], and also deconvolutional graph networks [28] to derive a graph from the entry
point embedding. At this stage, it might be unclear how to define the appropriate abstraction
level of the interpretation.</p>
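<p>As a minimal sketch of such a heuristic, the snippet below builds an initial concept graph from a prompt using a crude co-occurrence rule: content words become nodes and words appearing within a small window become edges. The stop-word list and window size are illustrative stand-ins for the proper linguistic resources discussed above.</p>

```python
import re

# Illustrative stop-word list; a real system would use a linguistic resource.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "is", "was"}

def concept_graph(prompt, window=3):
    """Heuristic initial graph: content words become concept nodes and
    words co-occurring within `window` positions become edges."""
    tokens = [t for t in re.findall(r"[a-z]+", prompt.lower())
              if t not in STOPWORDS]
    nodes = set(tokens)
    edges = set()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                edges.add(tuple(sorted((tokens[i], tokens[j]))))
    return nodes, edges

nodes, edges = concept_graph("The cat chased the mouse in the garden")
print(sorted(nodes))  # ['cat', 'chased', 'garden', 'mouse']
```

Such a graph could then be refined by merging in a dependency tree before being used to train and evaluate the decoder unit.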
        <p>Instead of building the modules as decoder units trained on a heuristic graph, another
approach is to adapt variational graph autoencoders [29]. A variation of the encoding part could
output a Graph Convolutional Network-like embedding, and the decoder would try to retrieve
the original input of the language model (assuming that the entry point embedding might
encode a smooth topological representation of concepts). This assumption is not guaranteed
to hold, but it is an interesting issue to investigate. Alternatively, one might use deconvolution
networks [28] to infer a graph without prior conditioning. The caveat is that this
deconvolution approach would create representations without any supervision, which might be
too complicated or obscure to be used as interpretations.</p>
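<p>The variational graph autoencoder idea [29] can be sketched as follows (our simplified illustration, not the authors' implementation): a GCN-style encoder maps node features to a Gaussian latent space, and an inner-product decoder reconstructs edge probabilities; the dimensions and single-sample reparameterization are illustrative simplifications.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(A):
    """Symmetrically normalized adjacency with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def vgae_forward(A, X, W_mu, W_logvar):
    """Encode nodes into a Gaussian latent space, then decode edge
    probabilities with an inner product: sigmoid(Z Z^T)."""
    A_norm = normalize(A)
    mu = A_norm @ X @ W_mu                    # latent means
    logvar = A_norm @ X @ W_logvar            # latent log-variances
    Z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    A_rec = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))  # reconstructed edge probs
    return Z, A_rec

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
X = np.eye(3)                                 # one-hot node features
Z, A_rec = vgae_forward(A, X, rng.normal(size=(3, 2)) * 0.1,
                        rng.normal(size=(3, 2)) * 0.1)
print(A_rec.shape)  # (3, 3)
```

In the adaptation discussed above, the decoder target would be the original input of the language model rather than the adjacency matrix itself.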
        <p>As stated in our short and long-term objectives, inferring an arbitrary graph from those entry
points is not enough for our interpretation and editing purposes. Those graphs also need to
have a clear semantics that correlates with the model’s output and with commonsense and
domain knowledge as well.</p>
<p>One might wonder why we bother with graph representations instead of going directly to
natural language interpretations. We conjecture that natural language would lack the explicit
restrictions needed for clarity of a given interpretation, and also that the resulting
natural language introspection module would need to be almost as complex as the language
model itself, thus defeating its original purpose.</p>
        <p>
          There are several neurosymbolic methods that aim to infuse the deep learning model
architectures with symbolic or logical structures [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], including logic tensor networks (LTN) [30] and
logical neural networks (LNN) [31]. They can also be used as parts of the introspection modules
described here in order to make DL models more auditable and reliable.
        </p>
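<p>To illustrate the flavor of such logic-infused components (a generic sketch of real-valued logic, not the actual LTN or LNN APIs), truth degrees in [0, 1] can be combined with Łukasiewicz connectives, which both systems build upon, so that a rule's degree of satisfaction is differentiable-friendly and auditable.</p>

```python
def luk_and(a, b):
    """Lukasiewicz conjunction on truth degrees in [0, 1]."""
    return max(0.0, a + b - 1.0)

def luk_implies(a, b):
    """Lukasiewicz implication: fully true whenever b >= a."""
    return min(1.0, 1.0 - a + b)

# Degree to which "smokes(x) AND friends(x, y) -> smokes(y)" holds
# for one grounding with soft (learned) truth values.
smokes_x, friends_xy, smokes_y = 0.9, 0.8, 0.6
rule = luk_implies(luk_and(smokes_x, friends_xy), smokes_y)
print(round(rule, 2))  # 0.9
```

An introspection module could report such per-rule satisfaction degrees, giving an auditable summary of how well the model's behavior matches commonsense or domain constraints.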
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>Large language models are becoming state-of-the-art in many tasks and fields, and as they
grow in popularity, they also grow in complexity. Those models usually follow a transduction
encoder-decoder pattern based on self-attention mechanisms that are repeated multiple times
to a point where they are so complex that surprising behavior emerges from them. In this
paper, we state the need for inspection methods to both characterize and describe those models’
behaviors. Despite the consolidated literature on deep learning interpretability, the community
still needs to build different tools to inspect different aspects of those models. We believe that
graph-based algorithms, in particular, deconvolution graph networks and variational graph
auto-encoders might be key to generating formal yet flexible interpretations of parts of such
large language models.</p>
      <p>Interpretability of language models is a blooming field as those models keep growing and
reaching multiple uses in society. There is a vast literature to support this field, but there is a
flagrant need for new methods and approaches to deal with such models’ high generalizability.
Being able to audit and edit aspects of the model during development and deployment should
allow the community to correct flawed inferences performed by those models and adjust to
counter certain unethical biases. Thus inspection tools have the potential for tremendous impact
if they can help us develop models we can trust and rely on to shape the further development
of AI.</p>
      <p>We envision a roadmap for building tools with which multidisciplinary communities can
start inspecting small transduction models based on self-attention, and progressively scale up
to large-scale language models. By doing so, those communities are empowered to shape the
widely spread use cases of such models in society. To this end, we hope to start a conversation
to unify our efforts, lower the threshold for other machine learning researchers to join us, and
bring these communities closer together with a common language.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Acknowledgments</title>
<p>We thank Viviane Torres, Sandro Rama Fiorini, Emílio Ashton Brasil, and Renato Cerqueira for
insightful conversations on the topic. Luís Lamb was supported in part by CAPES and CNPq,
Brazil.</p>
      <p>[14] P. W. Koh, P. Liang, Understanding black-box predictions via influence functions, in: International Conference on Machine Learning, PMLR, 2017, pp. 1885–1894.</p>
      <p>[15] S. Lapuschkin, S. Wäldchen, A. Binder, G. Montavon, W. Samek, K.-R. Müller, Unmasking Clever Hans predictors and assessing what machines really learn, Nature Communications 10 (2019) 1–8.</p>
      <p>[16] A. W. Thomas, H. R. Heekeren, K.-R. Müller, W. Samek, Analyzing neuroimaging data through recurrent deep learning models, Frontiers in Neuroscience 13 (2019) 1321.</p>
      <p>[17] C. F. Baker, C. J. Fillmore, J. B. Lowe, The Berkeley FrameNet project, in: COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics, 1998.</p>
      <p>[18] M. Palmer, K. L. Kipper, et al., VerbNet, The Oxford Handbook of Cognitive Science (2004).</p>
      <p>[19] G. A. Miller, WordNet: A lexical database for English, Commun. ACM 38 (1995) 39–41. URL: https://doi.org/10.1145/219717.219748. doi:10.1145/219717.219748.</p>
      <p>[20] M. Palmer, D. Gildea, P. Kingsbury, The Proposition Bank: An annotated corpus of semantic roles, Computational Linguistics 31 (2005) 71–106.</p>
      <p>[21] M. Palmer, SemLink: Linking PropBank, VerbNet and FrameNet, in: Proceedings of the Generative Lexicon Conference, GenLex-09, Pisa, Italy, 2009, pp. 9–15.</p>
      <p>[22] A. Jacovi, Y. Goldberg, Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness?, arXiv preprint arXiv:2004.03685 (2020).</p>
      <p>[23] X. Shi, F. Lv, D. Seng, J. Zhang, J. Chen, B. Xing, Visualizing and understanding graph convolutional network, Multimedia Tools and Applications 80 (2021) 8355–8375.</p>
      <p>[24] D. C. Elton, Self-explaining AI as an alternative to interpretable AI, in: International Conference on Artificial General Intelligence, Springer, 2020, pp. 95–106.</p>
      <p>[25] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Transactions on Neural Networks 20 (2008) 61–80.</p>
      <p>[26] B. Sanchez-Lengeling, E. Reif, A. Pearce, A. B. Wiltschko, A gentle introduction to graph neural networks, Distill 6 (2021) e33.</p>
      <p>[27] W. L. Hamilton, R. Ying, J. Leskovec, Representation learning on graphs: Methods and applications, arXiv preprint arXiv:1709.05584 (2017).</p>
      <p>[28] J. Li, J. Li, Y. Liu, J. Yu, Y. Li, H. Cheng, Deconvolutional networks on graph data, Advances in Neural Information Processing Systems 34 (2021) 21019–21030.</p>
      <p>[29] T. N. Kipf, M. Welling, Variational graph auto-encoders, arXiv preprint arXiv:1611.07308 (2016).</p>
      <p>[30] L. Serafini, A. d. Garcez, Logic tensor networks: Deep learning and logical reasoning from data and knowledge, arXiv preprint arXiv:1606.04422 (2016).</p>
      <p>[31] R. Riegel, A. Gray, F. Luus, N. Khan, N. Makondo, I. Y. Akhalwaya, H. Qian, R. Fagin, F. Barahona, U. Sharma, et al., Logical neural networks, arXiv preprint arXiv:2006.13155 (2020).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          , in: I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , R. Garnett (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>30</volume>
          ,
          <publisher-name>Curran Associates, Inc.</publisher-name>
          ,
          <year>2017</year>
          . URL: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Lamb</surname>
          </string-name>
          , A. S.
          <string-name>
            <surname>d'Avila Garcez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Gori</surname>
            ,
            <given-names>M. O. R.</given-names>
          </string-name>
          <string-name>
            <surname>Prates</surname>
            ,
            <given-names>P. H. C.</given-names>
          </string-name>
          <string-name>
            <surname>Avelar</surname>
            ,
            <given-names>M. Y.</given-names>
          </string-name>
          <string-name>
            <surname>Vardi</surname>
          </string-name>
          ,
          <article-title>Graph neural networks meet neural-symbolic computing: A survey and perspective</article-title>
          ,
          in:
          <source>IJCAI 2020</source>
          , ijcai.org,
          <year>2020</year>
          , pp.
          <fpage>4877</fpage>
          -
          <lpage>4884</lpage>
          . URL: https://doi.org/10.24963/ijcai.2020/679. doi:10.24963/ijcai.2020/679.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>d'Avila Garcez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Lamb</surname>
          </string-name>
          ,
          <article-title>Neurosymbolic AI: the 3rd wave</article-title>
          , CoRR abs/2012.05876 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2012.05876. arXiv:2012.05876.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Kautz</surname>
          </string-name>
          ,
          <article-title>The third AI summer: AAAI robert s. engelmore memorial lecture</article-title>
          ,
          <source>AI Mag</source>
          .
          <volume>43</volume>
          (
          <year>2022</year>
          )
          <fpage>93</fpage>
          -
          <lpage>104</lpage>
          . URL: https://doi.org/10.1609/aimag.v43i1.19122. doi:10.1609/aimag.v43i1.19122.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Introduction to graph neural networks</article-title>
          ,
          <source>Synthesis Lectures on Artificial Intelligence and Machine Learning</source>
          <volume>14</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>127</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Philip</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey on graph neural networks</article-title>
          ,
          <source>IEEE transactions on neural networks and learning systems 32</source>
          (
          <year>2020</year>
          )
          <fpage>4</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Selvaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cogswell</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Das</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Vedantam</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Parikh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Batra</surname>
          </string-name>
          ,
          <article-title>Grad-CAM: Visual explanations from deep networks via gradient-based localization</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>618</fpage>
          -
          <lpage>626</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rogers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kovaleva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rumshisky</surname>
          </string-name>
          ,
          <article-title>A primer in BERTology: What we know about how BERT works</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          8 (
          <year>2020</year>
          )
          <fpage>842</fpage>
          -
          <lpage>866</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <article-title>Truthfulqa: Measuring how models mimic human falsehoods</article-title>
          ,
          <source>CoRR abs/2109.07958</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2109.07958. arXiv:2109.07958.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bommasani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Hudson</surname>
          </string-name>
          , E. Adeli,
          <string-name>
            <given-names>R.</given-names>
            <surname>Altman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Arora</surname>
          </string-name>
          , S. von Arx,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bohg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosselut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Brunskill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Brynjolfsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Buch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Card</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Castellon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Chatterji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Creel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Q.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demszky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Donahue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Doumbouya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Durmus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ermon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Etchemendy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ethayarajh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Finn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gillespie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Goodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Grossman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Guha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hewitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Icard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kalluri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Karamcheti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Keeling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Khani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Khattab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Koh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Krass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kuditipudi</surname>
          </string-name>
          , et al.,
          <article-title>On the opportunities and risks of foundation models</article-title>
          ,
          <source>CoRR abs/2108.07258</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2108.07258. arXiv:2108.07258.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Fairness in deep learning: A computational perspective</article-title>
          ,
          <source>IEEE Intelligent Systems</source>
          <volume>36</volume>
          (
          <year>2020</year>
          )
          <fpage>25</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tomsett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raghavendra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Harborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alzantot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cerutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Preece</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Julier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Rao</surname>
          </string-name>
          , et al.,
          <article-title>Interpretability of deep learning models: A survey of results</article-title>
          ,
          <source>in: 2017 IEEE SmartWorld, Ubiquitous Intelligence &amp; Computing, Advanced &amp; Trusted Computed, Scalable Computing &amp; Communications, Cloud &amp; Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Fong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vedaldi</surname>
          </string-name>
          ,
          <article-title>Interpretable explanations of black boxes by meaningful perturbation</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3429</fpage>
          -
          <lpage>3437</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>