<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Democratizing Advanced Attribution Analyses of Generative Language Models with the Inseq Toolkit</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriele Sarti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nils Feldhus</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jirui Qi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Malvina Nissim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arianna Bisazza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Language and Cognition (CLCG), University of Groningen, Oude Kijk in 't Jatstraat 26 Groningen</institution>
          ,
          <addr-line>9712EK</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>German Research Center for Artificial Intelligence (DFKI), Alt-Moabit 91c</institution>
          ,
          <addr-line>Berlin, 10559</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Inseq is a recent toolkit providing an intuitive and optimized interface to conduct feature attribution analyses of generative language models. In this work, we present the latest improvements to the library, including efforts to simplify the attribution of large language models on consumer hardware, additional attribution approaches, and a new client command to detect and attribute context usage in language model generations. We showcase an online demo using Inseq as an attribution backbone for context reliance analysis, and we highlight interesting contextual patterns in language model generations. Ultimately, this release furthers Inseq's mission of centralizing good interpretability practices and enabling fair and reproducible model evaluations.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Generative Language Models</kwd>
        <kwd>Feature Attribution</kwd>
        <kwd>Python Toolkit</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Feature attribution methods have been widely adopted in NLP to quantify the importance
of input tokens in driving language models’ (LMs) predictions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. While some works used
feature attribution to analyze generative NLP models, focusing mainly on machine translation
[<xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>, i.a.], most analyses in this area focused on classification due to the initial popularity
of BERT-based encoders [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and the challenges of autoregressive generation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Although
several post-hoc interpretability tools are available, few support generative LMs [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ], often
requiring ad-hoc wrappers to enable interoperability with the popular Transformers library [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
commonly used by NLP practitioners.
      </p>
      <p>
        Inseq [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is a Python library offering native compatibility with Transformers and supporting
advanced methods and customizations. Inseq centralizes access to a broad set of feature
attribution methods, sourced in part from the Captum [12] framework, enabling fair comparisons
across various techniques for all encoder-decoder and decoder-only models supported by the
Transformers library. The toolkit aims to democratize access to interpretability analyses of
generative LMs with minimal setup, enabling reproducible evaluations. An example is provided
in Figure 1. Thanks to its intuitive interface, users can easily integrate interpretability analyses
into their text generation pipelines with just a few lines of code. Moreover, a command-line
interface (CLI) and various utility methods to visualize, serialize, and reload attribution outcomes
are provided to facilitate analysis at scale. Inseq is also highly flexible, including cutting-edge
attribution methods with built-in post-processing features (Section 2), supporting customizable
attribution targets and enabling the attribution of arbitrary sequences produced via forced
decoding (Section 2.1).
      </p>
      <p>[Figure 1: Feature attribution of a generative LM with Inseq. Given the prompt “To innovate one should”, the model autoregressively generates “think outside the box”. Attribution scores and custom step function scores are extracted at every generation step and shown as a matrix with attributed tokens as rows and generated tokens as columns.]</p>
      <p>In this paper, we summarize recent efforts in the development of the Inseq toolkit, focusing
specifically on newly added usability features to support the attribution of large LMs (LLMs)
(Section 2.2), and a new command to contrastively attribute context usage in LM generations
(Section 3). Finally, we present various applications of Inseq in recent research (Section 4).</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Inseq Toolkit</title>
      <p>
        Inseq provides an easy-to-use interface to apply feature attribution methods, extending
Captum [12] as attribution back-end to generative models from the Transformers library [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Table 1 (left) presents an updated list of supported attribution methods, categorized into three
groups (gradient-based, internals-based, and perturbation-based) depending on their underlying
approach to importance quantification. Aside from popular model-agnostic methods, Inseq
notably provides built-in support for attention weight attribution and a range of cutting-edge
methods not supported in any other toolkit, such as Discretized Integrated Gradients [17],
Sequential Integrated Gradients [18], Value Zeroing [22], and ReAGent [23], with many of those
allowing for the importance attribution of custom intermediate model layers.</p>
      <p>Among its notable features, Inseq offers flexible source and target-side attribution for
encoder-decoder systems, alongside several Aggregator classes to aggregate attribution scores
across various dimensions (e.g. at the token level), and AggregatorPipeline for chaining various
aggregation steps (e.g. extract the weight of the i-th attention head at the n-th layer).</p>
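      <p>The idea of chaining aggregation steps can be illustrated with a minimal, library-independent sketch. The function names below (merge_rows, normalize_columns, run_pipeline) and the toy scores are hypothetical and do not reproduce the actual Aggregator or AggregatorPipeline classes; the sketch only shows how subword-level scores could be summed into word-level scores and then normalized per generated token.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Callable, List

Matrix = List[List[float]]  # rows: attributed tokens, columns: generated tokens


def merge_rows(scores: Matrix, spans: List[List[int]]) -> Matrix:
    """Sum the rows listed in each span, e.g. subword pieces of the same word."""
    n_cols = len(scores[0])
    return [[sum(scores[i][j] for i in span) for j in range(n_cols)] for span in spans]


def normalize_columns(scores: Matrix) -> Matrix:
    """Rescale every column so that its absolute scores sum to one."""
    n_cols = len(scores[0])
    totals = [sum(abs(row[j]) for row in scores) or 1.0 for j in range(n_cols)]
    return [[row[j] / totals[j] for j in range(n_cols)] for row in scores]


def run_pipeline(scores: Matrix, steps: List[Callable[[Matrix], Matrix]]) -> Matrix:
    """Apply aggregation steps in order, feeding each output to the next step."""
    for step in steps:
        scores = step(scores)
    return scores


# Toy (made-up) scores for the pieces ["To", "in", "nov", "ate"] and two generated tokens.
subword_scores = [[1.0, 0.5], [0.5, 0.5], [0.25, 0.5], [0.25, 0.5]]
word_spans = [[0], [1, 2, 3]]  # "To" and "innovate"

word_scores = run_pipeline(
    subword_scores,
    [lambda s: merge_rows(s, word_spans), normalize_columns],
)
print(word_scores)
```
]]></preformat>
      <p>Keeping each step as a function from matrix to matrix makes the order of operations explicit, which matters because merging and normalizing do not commute in general.</p>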
      <sec id="sec-2-1">
        <title>2.1. Customizing generation and attribution</title>
        <p>At every generation step, in addition to computing attribution scores, Inseq can also use models’
information to compute functions of the output distributions or intermediate representations,
which we collectively refer to as step functions (Table 1, S). For example, the resulting scores
can provide additional insights into the generation process for uncertainty quantification or
outlier detection. Inseq provides access to several built-in step functions and allows users to
create and register custom ones. Step scores are computed alongside attribution and visualized
in the same matrix of attribution scores (e.g. p(y_t | y_&lt;t) in Figure 1).</p>
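        <p>As a rough illustration of what a step function computes, the sketch below derives two scores from the output logits of each generation step: the probability of the generated token and the entropy of the output distribution. The registry, the decorator name, and the function signatures are assumptions made for this example and are not the Inseq registration API; the logits are toy values.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from typing import Callable, Dict, List

StepFunction = Callable[[List[float], int], float]
STEP_FUNCTIONS: Dict[str, StepFunction] = {}  # hypothetical registry for this sketch


def register_step_function(name: str) -> Callable[[StepFunction], StepFunction]:
    """Decorator storing a step function under a name (illustrative only)."""
    def decorator(fn: StepFunction) -> StepFunction:
        STEP_FUNCTIONS[name] = fn
        return fn
    return decorator


def softmax(logits: List[float]) -> List[float]:
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


@register_step_function("probability")
def probability(logits: List[float], target_id: int) -> float:
    """Probability assigned to the token that was actually generated."""
    return softmax(logits)[target_id]


@register_step_function("entropy")
def entropy(logits: List[float], target_id: int) -> float:
    """Entropy of the output distribution, a simple uncertainty measure."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def step_scores(all_logits: List[List[float]], target_ids: List[int]) -> Dict[str, List[float]]:
    """Evaluate every registered step function at every generation step."""
    return {
        name: [fn(logits, target) for logits, target in zip(all_logits, target_ids)]
        for name, fn in STEP_FUNCTIONS.items()
    }


# Two toy generation steps over a four-token vocabulary.
toy_logits = [[0.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
toy_targets = [2, 0]
scores = step_scores(toy_logits, toy_targets)
print(scores)
```
]]></preformat>
        <p>In this toy run the first step is maximally uncertain (uniform distribution) while the second is confident, which is the kind of contrast that step scores make visible next to the attribution matrix.</p>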
        <p>
          Various attribution methods rely on model outputs to predict input importance, using
functions of the model’s output logits or token probabilities [27]. Yin and Neubig [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] propose
contrastive metrics to help disentangle how various factors contribute to the prediction. For
example, the gradient ∇(p(barking) − p(crying)) given the prompt “Can you stop the dog from
___” will highlight the role of the entity dog in selecting barking, disentangling the semantic
component from grammatical correctness by providing crying as a grammatically valid alternative.
Figure 2 illustrates an example. Inseq users can leverage any built-in or custom-defined step
function as an attribution target, enabling advanced use cases like contrastive comparisons.
        </p>
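        <p>The contrastive target can be made concrete with a toy model. The sketch below assumes a linear softmax classifier over a three-word vocabulary with hand-picked weights (all values are hypothetical and chosen only for illustration) and computes the analytic gradient of p(barking) − p(crying) with respect to one scalar input per prompt token. It is not the Inseq implementation, which operates on real model embeddings.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from typing import List

PROMPT = ["Can", "you", "stop", "the", "dog", "from"]
VOCAB = ["barking", "crying", "table"]

# Hypothetical weights: WEIGHTS[k][i] is the effect of prompt token i on the logit of word k.
# "stop" and "from" favor both verbs, "dog" favors only "barking", "the" favors the noun.
WEIGHTS = [
    [0.0, 0.0, 1.0, 0.0, 2.0, 1.0],  # barking
    [0.0, 0.0, 1.0, 0.0, 0.0, 1.0],  # crying
    [0.0, 0.0, 0.0, 3.0, 0.0, 0.0],  # table
]


def predict(x: List[float]) -> List[float]:
    """Softmax over logits that are linear in the per-token inputs x."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in WEIGHTS]
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_gradient(x: List[float], target: int, foil: int) -> List[float]:
    """Gradient of p(target) - p(foil) with respect to each input x[i]."""
    probs = predict(x)
    grads = []
    for i in range(len(x)):
        # For a softmax over linear logits: dp_k/dx_i = p_k * (W[k][i] - sum_j p_j * W[j][i])
        mean_weight = sum(probs[k] * WEIGHTS[k][i] for k in range(len(VOCAB)))
        d_target = probs[target] * (WEIGHTS[target][i] - mean_weight)
        d_foil = probs[foil] * (WEIGHTS[foil][i] - mean_weight)
        grads.append(d_target - d_foil)
    return grads


inputs = [1.0] * len(PROMPT)  # every prompt token is present
saliency = contrastive_gradient(inputs, VOCAB.index("barking"), VOCAB.index("crying"))
top_token = PROMPT[max(range(len(PROMPT)), key=lambda i: saliency[i])]
print(top_token, [round(s, 3) for s in saliency])
```
]]></preformat>
        <p>In this toy setup the token dog receives the largest contrastive score, since it is the only input that separates the target from the foil, while tokens supporting both verbs contribute less.</p>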
        <p>The new version of Inseq supports customizable word alignments, i.e. indices aligning
tokens in the original and contrastive generated texts, to support contrastive comparisons
between texts of different lengths, including automatic alignments using the multilingual LaBSE
encoder [28] to streamline their application.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Usability Features</title>
        <p>Inseq supports batching to simplify analysis at scale and customizable start/end positions to
accelerate the attribution process for studies on localized phenomena (e.g., pronoun coreference).
Moreover, it offers a CLI to attribute single examples or entire Datasets from the command line,
storing resulting outputs and visualizations. Attributions can be saved in JSON format with
metadata to identify their provenance, allowing for easy reloading and visualization.</p>
        <p>Quantization and distributed attribution. All models allowing for quantization using
bitsandbytes [29] can be loaded in 4-bit or 8-bit precision directly from Transformers, and their
attributions can be computed normally using Inseq at a fraction of the cost. Similarly, Inseq is
compatible with the Petals library [30], supporting gradient-based attribution across language
models whose computation is distributed across several machines. This can alleviate the need
for high-end GPUs to run LLMs, enabling the distributed computation of attribution scores.<sup>1</sup></p>
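        <p>To illustrate why storing metadata next to the scores helps with provenance, the following sketch saves and reloads an attribution record as JSON. The record layout, field names, and metadata values are assumptions made for this example and do not reflect the actual Inseq serialization format.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import json
import os
import tempfile
from typing import Any, Dict


def save_attribution(path: str, record: Dict[str, Any]) -> None:
    """Write scores and provenance metadata to a single JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, ensure_ascii=False, indent=2)


def load_attribution(path: str) -> Dict[str, Any]:
    """Read a record previously written by save_attribution."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical record: the metadata says which model and method produced the scores.
record = {
    "metadata": {"model": "example-model", "method": "saliency"},
    "attributed_tokens": ["To", "innovate"],
    "generated_tokens": ["think", "outside"],
    "scores": [[0.5, 0.25], [0.5, 0.75]],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "attribution.json")
    save_attribution(path, record)
    reloaded = load_attribution(path)

print(reloaded["metadata"]["method"], reloaded["scores"])
```
]]></preformat>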
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Case study: Attributing Context Influence using PECoRe</title>
      <p>The PECoRe framework [31] was proposed to identify and attribute context usage in language
models, and further adapted by Qi et al. [32] to produce model internals-based citations for LLM
generations. First, contrastive functions such as KL divergence select generated tokens sensitive
to context ablation. Then, contrastive feature attribution is used to identify context tokens
driving the contextual prediction. Inseq provides an ad-hoc CLI command (attribute-context)
for PECoRe usage, supporting all contrastive step functions and attribution methods. Figure 3
provides an example output in a GUI built on top of the Inseq API.<sup>2</sup> In the example, an LLM<sup>3</sup> is
prompted with contexts retrieved from Wikipedia to provide a long-form answer to a query (1).
When referring to context information (2), PECoRe shows that the indices of the two documents
containing relevant information are salient. On the other hand, the names of other Hawaiian
islands are important when the model produces an additional remark on their population (3).
We observe that the context is not salient for answering the question, suggesting the model
might have memorized the answer. We test this by prompting the model in a closed-book
setting, finding that the model can indeed respond correctly without context (4).</p>
      <p><sup>1</sup> A tutorial for distributed attribution is available here: https://inseq.org/en/latest/examples/petals.html</p>
      <p><sup>2</sup> The presented demo is available here: https://huggingface.co/spaces/gsarti/pecore</p>
      <p><sup>3</sup> We used StableLM 2 Zephyr 1.6B: https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b</p>
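      <p>The first PECoRe step, selecting generated tokens that are sensitive to context ablation, can be sketched as follows. The sketch assumes that next-token distributions with and without context are already available, uses KL divergence as the contrastive function, and applies a fixed threshold; the threshold value, the direction of the divergence, and the toy distributions are assumptions made for illustration and are not taken from the PECoRe implementation. The second step (contrastive attribution over context tokens) is not shown.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from typing import List, Tuple


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)


def context_sensitive_tokens(
    tokens: List[str],
    with_context: List[List[float]],
    without_context: List[List[float]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Keep generated tokens whose distribution shifts strongly when context is removed."""
    selected = []
    for token, p_ctx, p_no_ctx in zip(tokens, with_context, without_context):
        score = kl_divergence(p_ctx, p_no_ctx)
        if score > threshold:
            selected.append((token, score))
    return selected


# Toy example: three generated tokens, distributions over a three-word vocabulary.
generated = ["It", "is", "Oahu"]
p_with_context = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.7, 0.2, 0.1]]
p_without_context = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.1, 0.2, 0.7]]

selected = context_sensitive_tokens(generated, p_with_context, p_without_context, threshold=0.5)
print(selected)
```
]]></preformat>
      <p>Only the last token is selected here, because its distribution changes when the context is removed while the first two are unaffected. Tokens selected in this way would then serve as targets for contrastive attribution over the context.</p>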
    </sec>
    <sec id="sec-4">
      <title>4. Related Work using Inseq</title>
      <p>Since its first release, Inseq has been adopted to conduct several feature attribution analyses of
generative LMs. In the conversational domain, its Integrated Gradients implementation was
used to study longitudinal dialogues with conversational models for Italian [33]. Inseq was
also used to measure agreement between attribution scores and a new metric of LLMs’ factual
reliability [34], and to analyze the context repetition in dialogues [35]. In machine translation,
Inseq attribution methods were used to select salient in-context examples with the aim to
mitigate gender bias in translated sentences [36] and to evaluate the usage of source and
target-side information in character-level machine translation systems across several languages [37].</p>
      <p>Inseq was integrated into several tools and methods, including the LLMCheckup interface [38],
using Inseq for producing attributions for fact-checking and conversational question answering
(QA), and the PECoRe framework [31] for detecting and attributing context usage in language
models. Finally, Inseq methods were also used as baselines for newly proposed feature
attribution approaches [23], and to probe the contextual influence in affixal negation [39].</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>GS and AB acknowledge the support from the Dutch Research Council (NWO) for the InDeep
project (NWA.1292.19.399). JQ and AB acknowledge NWO support for the Lessen project
(NWA.1389.20.183). GS acknowledges support of the Netherlands eScience Center Fellowship
program. NF acknowledges support from the German Federal Ministry of Education and
Research (BMBF) as part of the project XAINES (01IW20005).</p>
      <p>[11] G. Sarti, N. Feldhus, L. Sickert, O. van der Wal, M. Nissim, A. Bisazza, Inseq: An interpretability toolkit for sequence generation models, in: ACL 2023 (Volume 3: System Demonstrations), ACL, 2023, pp. 421–435. URL: https://aclanthology.org/2023.acl-demo.40.</p>
      <p>[12] N. Kokhlikyan, V. Miglani, M. Martin, E. Wang, B. Alsallakh, J. Reynolds, A. Melnikov, N. Kliushkina, C. Araya, S. Yan, O. Reblitz-Richardson, Captum: A unified and generic model interpretability library for PyTorch, arXiv abs/2009.07896 (2020). URL: https://arxiv.org/abs/2009.07896.</p>
      <p>[13] K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional networks: Visualising image classification models and saliency maps, in: The Second International Conference on Learning Representations, 2014. URL: http://arxiv.org/abs/1312.6034.</p>
      <p>[14] A. Shrikumar, P. Greenside, A. Kundaje, Learning important features through propagating activation differences, in: ICML 2017, Proceedings of Machine Learning Research, PMLR, 2017, pp. 3145–3153. URL: https://proceedings.mlr.press/v70/shrikumar17a.html.</p>
      <p>[15] S. M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions, in: NeurIPS 2017, volume 30, Curran Associates Inc., 2017, pp. 4768–4777. URL: https://dl.acm.org/doi/10.5555/3295222.3295230.</p>
      <p>[16] M. Sundararajan, A. Taly, Q. Yan, Axiomatic attribution for deep networks, in: Proceedings of the 34th International Conference on Machine Learning (ICML), volume 70, Journal of Machine Learning Research (JMLR), 2017, pp. 3319–3328.</p>
      <p>[17] S. Sanyal, X. Ren, Discretized integrated gradients for explaining language models, in: EMNLP 2021, ACL, 2021, pp. 10285–10299. doi:10.18653/v1/2021.emnlp-main.805.</p>
      <p>[18] J. Enguehard, Sequential integrated gradients: a simple but effective method for explaining language models, in: Findings of ACL 2023, ACL, 2023, pp. 7555–7565. doi:10.18653/v1/2023.findings-acl.477.</p>
      <p>[19] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, in: ICLR 2015, 2015. URL: http://arxiv.org/abs/1409.0473.</p>
      <p>[20] M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: Computer Vision – ECCV 2014, Springer International Publishing, Cham, 2014, pp. 818–833. doi:10.1007/978-3-319-10590-1_53.</p>
      <p>[21] M. T. Ribeiro, S. Singh, C. Guestrin, "why should i trust you?": Explaining the predictions of any classifier, in: KDD 2016, Association for Computing Machinery, 2016, pp. 1135–1144. doi:10.1145/2939672.2939778.</p>
      <p>[22] H. Mohebbi, W. Zuidema, G. Chrupała, A. Alishahi, Quantifying context mixing in transformers, in: EACL 2023, 2023, pp. 3378–3400. doi:10.18653/v1/2023.eacl-main.245.</p>
      <p>[23] Z. Zhao, B. Shan, ReAGent: A model-agnostic feature attribution method for generative language models, in: AAAI Workshop on Responsible Language Models, 2024. URL: https://arxiv.org/abs/2402.00794.</p>
      <p>[24] Y. Gal, Z. Ghahramani, Dropout as a bayesian approximation: Representing model uncertainty in deep learning, in: ICML 2016, volume 48 of Proceedings of Machine Learning Research, PMLR, 2016, pp. 1050–1059. URL: https://proceedings.mlr.press/v48/gal16.html.</p>
      <p>[25] P. Fernandes, K. Yin, E. Liu, A. Martins, G. Neubig, When does translation require context? a data-driven, multilingual exploration, in: ACL 2023 (Volume 1: Long Papers), ACL, 2023, pp. 606–626. doi:10.18653/v1/2023.acl-long.36.</p>
      <p>[26] S. Lu, S. Chen, Y. Li, D. Bitterman, G. Savova, I. Gurevych, Measuring pointwise V-usable information in-context-ly, in: Findings of EMNLP 2023, ACL, 2023, pp. 15739–15756. doi:10.18653/v1/2023.findings-emnlp.1054.</p>
      <p>[27] J. Bastings, S. Ebert, P. Zablotskaia, A. Sandholm, K. Filippova, “will you find these shortcuts?” a protocol for evaluating the faithfulness of input salience methods for text classification, in: EMNLP 2022, ACL, 2022, pp. 976–991. doi:10.18653/v1/2022.emnlp-main.64.</p>
      <p>[28] F. Feng, Y. Yang, D. Cer, N. Arivazhagan, W. Wang, Language-agnostic BERT sentence embedding, in: ACL 2022 (Volume 1: Long Papers), ACL, 2022, pp. 878–891. doi:10.18653/v1/2022.acl-long.62.</p>
      <p>[29] T. Dettmers, M. Lewis, Y. Belkada, L. Zettlemoyer, GPT3.int8(): 8-bit matrix multiplication for transformers at scale, in: NeurIPS 2022, 2022. URL: https://openreview.net/forum?id=dXiGWqBoxaD.</p>
      <p>[30] A. Borzunov, D. Baranchuk, T. Dettmers, M. Riabinin, Y. Belkada, A. Chumachenko, P. Samygin, C. Raffel, Petals: Collaborative inference and fine-tuning of large models, in: ACL 2023 (Volume 3: System Demonstrations), ACL, 2023, pp. 558–568. doi:10.18653/v1/2023.acl-demo.54.</p>
      <p>[31] G. Sarti, G. Chrupała, M. Nissim, A. Bisazza, Quantifying the plausibility of context reliance in neural machine translation, in: ICLR 2024, OpenReview, 2024. URL: https://openreview.net/forum?id=XTHfNGI3zT.</p>
      <p>[32] J. Qi, G. Sarti, R. Fernández, A. Bisazza, Model internals-based answer attribution for trustworthy retrieval-augmented generation, 2024. URL: https://arxiv.org/abs/2406.13663. arXiv:2406.13663.</p>
      <p>[33] S. M. Mousavi, S. Caldarella, G. Riccardi, Response generation in longitudinal dialogues: Which knowledge representation helps?, in: NLP4ConvAI 2023, ACL, 2023, pp. 1–11. doi:10.18653/v1/2023.nlp4convai-1.1.</p>
      <p>[34] W. Wang, B. Haddow, A. Birch, W. Peng, Assessing factual reliability of large language model knowledge, in: NAACL, 2024, pp. 805–819. doi:10.18653/v1/2024.naacl-long.46.</p>
      <p>[35] A. Molnar, J. Jumelet, M. Giulianelli, A. Sinclair, Attribution and alignment: Effects of local context repetition on utterance production and comprehension in dialogue, in: CoNLL 2023, ACL, 2023, pp. 254–273. doi:10.18653/v1/2023.conll-1.18.</p>
      <p>[36] G. Attanasio, F. M. Plaza del Arco, D. Nozza, A. Lauscher, A tale of pronouns: Interpretability informs gender bias mitigation for fairer instruction-tuned machine translation, in: EMNLP 2023, ACL, 2023, pp. 3996–4014. doi:10.18653/v1/2023.emnlp-main.243.</p>
      <p>[37] L. Edman, G. Sarti, A. Toral, G. v. Noord, A. Bisazza, Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation, Transactions of the Association for Computational Linguistics 12 (2024) 392–410. doi:10.1162/tacl_a_00651.</p>
      <p>[38] Q. Wang, T. Anikina, N. Feldhus, J. Genabith, L. Hennig, S. Möller, LLMCheckup: Conversational examination of large language models via interpretability tools and self-explanations, in: HCI-NLP 2024, 2024, pp. 89–104. doi:10.18653/v1/2024.hcinlp-1.9.</p>
      <p>[39] T. H. Truong, Y. Otmakhova, K. Verspoor, T. Cohn, T. Baldwin, Revisiting subword tokenization: A case study on affixal negation in large language models, to appear in NAACL 2024, 2024. URL: https://arxiv.org/abs/2404.02421.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Madsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chandar</surname>
          </string-name>
          ,
          <article-title>Post-hoc interpretability for neural nlp: A survey</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>55</volume>
          (
          <year>2022</year>
          ). doi:10.1145/3546577.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Alvarez-Melis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jaakkola</surname>
          </string-name>
          ,
          <article-title>A causal framework for explaining the predictions of blackbox sequence-to-sequence models</article-title>
          ,
          <source>in: EMNLP</source>
          <year>2017</year>
          ,
          <year>2017</year>
          , pp.
          <fpage>412</fpage>
          -
          <lpage>421</lpage>
          . doi:10.18653/v1/D17-1042.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Koehn</surname>
          </string-name>
          ,
          <article-title>Saliency-driven word alignment interpretation for neural machine translation</article-title>
          ,
          <source>in: WMT 2019 (Volume 1: Research Papers)</source>
          ,
          <source>ACL</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . doi:10.18653/v1/W19-5201.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ferrando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Gállego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Alastruey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Escolano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Costa-jussà</surname>
          </string-name>
          ,
          <article-title>Towards opening the black box of neural machine translation: Source and target interpretations of the transformer</article-title>
          ,
          <source>in: EMNLP</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>8756</fpage>
          -
          <lpage>8769</lpage>
          . doi:10.18653/v1/2022.emnlp-main.599.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: NAACL-HLT 2019, Volume 1 (Long and Short Papers)</source>
          ,
          <source>Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Neubig</surname>
          </string-name>
          ,
          <article-title>Interpreting language models with contrastive explanations</article-title>
          ,
          <source>in: EMNLP</source>
          <year>2022</year>
          , ACL,
          <year>2022</year>
          , pp.
          <fpage>184</fpage>
          -
          <lpage>198</lpage>
          . doi:10.18653/v1/2022.emnlp-main.14.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Tenney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wexler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bastings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bolukbasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Coenen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pushkarna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Radebaugh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Reif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <article-title>The language interpretability tool: Extensible, interactive visualizations and analysis for NLP models</article-title>
          ,
          <source>in: EMNLP 2020: System Demonstrations</source>
          ,
          <source>ACL</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>118</lpage>
          . doi:10.18653/v1/2020.emnlp-demos.15.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Alammar</surname>
          </string-name>
          ,
          <article-title>Ecco: An open source library for the explainability of transformer language models</article-title>
          ,
          <source>in: ACL-IJCNLP 2021: System Demonstrations</source>
          ,
          <source>ACL</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>249</fpage>
          -
          <lpage>257</lpage>
          . doi:10.18653/v1/2021.acl-demo.30.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Miglani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Markosyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garcia-Olano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kokhlikyan</surname>
          </string-name>
          ,
          <article-title>Using captum to explain generative language models</article-title>
          ,
          <source>in: NLP-OSS</source>
          <year>2023</year>
          , ACL,
          <year>2023</year>
          , pp.
          <fpage>165</fpage>
          -
          <lpage>173</lpage>
          . doi:10.18653/v1/2023.nlposs-1.19.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
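          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>von Platen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,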
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Le</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>Transformers: State-of-the-art natural language processing</article-title>
          ,
          <source>in: EMNLP 2020: System Demonstrations</source>
          ,
          <source>ACL</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          . doi:10.18653/v1/2020.emnlp-demos.6.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Feldhus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sickert</surname>
          </string-name>
          ,
          <string-name>
            <surname>O. van der Wal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bisazza</surname>
          </string-name>
          , Inseq: An in-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>