<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">DISCO: DISCovering Overfittings as Causal Rules for Text Classification Models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Zijian</forename><surname>Zhang</surname></persName>
							<email>zzhang@l3s.de</email>
							<affiliation key="aff0">
								<orgName type="institution">Leibniz Universität Hannover</orgName>
								<address>
									<addrLine>Appelstr. 9a</addrLine>
									<postCode>30167</postCode>
									<settlement>Hannover, Lower Saxony</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vinay</forename><surname>Setty</surname></persName>
							<email>vsetty@acm.org</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Stavanger</orgName>
								<address>
									<addrLine>Kjell Arholms gate 41</addrLine>
									<postCode>4021</postCode>
									<settlement>Stavanger</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yumeng</forename><surname>Wang</surname></persName>
							<email>y.wang@liacs.leidenuniv.nl</email>
							<affiliation key="aff2">
								<orgName type="department">Leiden Institute of Advanced Computer Science</orgName>
								<orgName type="institution">Leiden University</orgName>
								<address>
									<addrLine>Einsteinweg 55</addrLine>
									<postCode>2333 CC</postCode>
									<settlement>Leiden</settlement>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Avishek</forename><surname>Anand</surname></persName>
							<email>avishek.anand@tudelft.nl</email>
							<affiliation key="aff3">
<orgName type="institution">Delft University of Technology</orgName>
								<address>
									<addrLine>Mekelweg 5</addrLine>
									<postCode>2628 CD</postCode>
									<settlement>Delft</settlement>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">DISCO: DISCovering Overfittings as Causal Rules for Text Classification Models</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">42A0021B013F218F74DE70404806CDA6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:08+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Causal Inference</term>
					<term>Rule Extraction</term>
					<term>Interactive XAI</term>
					<term>Global Interpretability</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>With the rapid advancement of neural language models, the deployment of overparameterized models has surged, increasing the need for interpretable explanations comprehensible to human inspectors. Existing post-hoc interpretability methods, which often focus on unigram features of single input textual instances, fail to capture the models' decision-making process fully. Additionally, many methods do not differentiate between decisions based on spurious correlations and those based on a holistic understanding of the input. Our paper introduces DISCO, a novel method for discovering global, rule-based explanations by identifying causal n-gram associations with model predictions. This method employs a scalable sequence mining technique to extract relevant text spans from training data, associate them with model predictions, and conduct causality checks to distill robust rules that elucidate model behavior. These rules expose potential overfitting and provide insights into misleading feature combinations. We validate DISCO through extensive testing, demonstrating its superiority over existing methods in offering comprehensive insights into complex model behaviors. Our approach successfully identifies all shortcuts manually introduced into the training data (100% detection rate on the MultiRC dataset), resulting in an 18.8% regression in model performance, a capability unmatched by any other method. Furthermore, DISCO supports interactive explanations, enabling human inspectors to distinguish spurious causes in the rule-based output. This alleviates the burden of abundant instance-wise explanations and helps assess the model's risk when encountering out-of-distribution (OOD) data.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Over-parameterized transformer models for natural language tasks have demonstrated remarkable success. However, these inherently statistical models are prone to overfitting, particularly in terms of the correlation between input phrases and prediction labels, known as "shortcuts", which can lead to biased outcomes <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. Our goal is to identify these shortcuts in text classification tasks and enhance human understanding of the model's predictive reasoning. We propose a post-hoc, model-agnostic method designed to reduce the amount of human effort needed to evaluate the justification of the model's decisions.</p><p>In this paper, we introduce DISCO, a method designed to extract a concise set of global rules using longer text sequences, which helps identify undesirable causal shortcuts learned in text classification tasks. Figure <ref type="figure" target="#fig_0">1</ref> illustrates the overall structure of DISCO with an example of an extracted rule: First, using a trained model and its training data, we identify high-support n-gram patterns that strongly correlate with specific model predictions. Next, we assess whether these identified patterns are true causes of the predictions or merely associated with them. To do this, we create counterfactuals of the n-gram patterns and check if the association between the pattern and prediction remains consistent under these counterfactuals. We show that DISCO is effective in detecting shortcuts across many language task-model combinations, with comprehensive steps outlined in Section 3.</p><p>Subsequently, we verify the efficacy of the generated rules by conducting evaluation experiments on four diverse datasets (Movies, SST-2, MultiRC, and CLIMATE-FEVER), using three underlying pre-trained models (BERT BASE , SBERT, and LSTM) (Section 4). 
Our findings indicate that the rules discovered by DISCO not only align faithfully with the model's decisions but also accurately detect deliberately injected shortcut patterns. Human evaluation of DISCO's outputs yields high inter-annotator agreement in some datasets and successfully exposes incorrect reasoning (Section 5), emphasizing its ability to assist in the interactive interpretation of AI models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>In this section, we introduce existing works related to ours, highlight their limitations, and describe how our approach resolves them.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Local Interpretability</head><p>Considerable work has been done on post-hoc interpretability of language tasks based on token selection <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6]</ref>. Interpretable-by-design approaches also often select specific input tokens as rationales for tasks, using these as intermediate inputs for the prediction model <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref>. These approaches focus on interpreting individual instances, necessitating labor-intensive, human-driven analysis to identify problematic prediction reasons. Our approach, in contrast, globally extracts rules internalized by the language model. Other works analyze model behavior using composition operators over primitive concepts aligned with human-understandable concepts <ref type="bibr" target="#b9">[10]</ref>. Despite their global perspective, these methods do not incorporate causal patterns. Attribution patterns from local interpretability methods lack inherent causality and may fail to capture the causal relationships internalized by the model. Recent approaches that aggregate rules from local explanations <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12]</ref> are also unsuitable for language tasks due to their reliance on single terms and inability to produce causal rules. SEARs <ref type="bibr" target="#b12">[13]</ref> is closer to our work, detecting semantically equivalent adversarial replacement rules leading to prediction changes. However, our method identifies patterns consistently leading the model to specific predictions under counterfactual conditions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Causal Inference on Language Tasks</head><p>Most research in this area focuses on creating "counterfactual instances", altered or minimally disturbed instances, to gain insights into model behavior. These counterfactuals are developed through human annotation <ref type="bibr" target="#b13">[14]</ref> or semi-automatic methods <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b11">12]</ref>. Models like <ref type="bibr" target="#b15">[16]</ref> use a game-theoretic framework to eliminate words with strong correlations but without causal relationships to the output. Unlike these studies, our method automatically generates counterfactuals using neutral contexts sampled from the dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Rule Extraction for Model Debugging</head><p>Recent research characterizes model deficiencies through rules by dataset contamination <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b1">2]</ref>, but fails to identify human-comprehensible text sequences with high statistical capacity, which is precisely our aim. Furthermore, our methods are post-hoc and non-intrusive. Anchor <ref type="bibr" target="#b10">[11]</ref> identifies local n-gram phrases with high explainability, but its time complexity results in intractable calculations on the entire training set. <ref type="bibr" target="#b17">[18]</ref> involves a white-box, rule-based method, and <ref type="bibr" target="#b18">[19]</ref> identifies spurious correlations rather than all shortcuts, making them less suitable for direct comparison with our approach. <ref type="bibr" target="#b19">[20]</ref> is word-based and, therefore, not suitable for n-gram rules. These methods adopt a local perspective, aggregating explanations on an instance-by-instance basis without considering context awareness or causality. Our approach, in contrast, is n-gram-based, causal, and context-aware, providing a more comprehensive and insightful analysis.</p><p>Atwell et al. <ref type="bibr" target="#b20">[21]</ref> aims to evaluate the risk associated with models when exposed to test data with distribution shifts compared to their original training data. However, their research goal differs from ours. 
While their approach yields evaluation scores characterized by bias and h-discrepancy across datasets from different domains, our approach identifies possible shortcut n-grams learned from the original training data, offering more intuitive and interpretable shortcut rules.</p><p>Traditional research on developing n-gram classifiers focuses on highly interpretable algorithms leveraging frequent n-grams to discern between different topics <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b23">24]</ref>. Unfortunately, these classifiers either do not achieve performance comparable to modern neural models or lack universality. Our approach bridges the gap between interpretability and performance by effectively identifying high-support n-gram patterns from underlying neural models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Causal Rule Mining</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Problem Statement</head><p>We consider an underlying model 𝑀 trained on a classification dataset represented as 𝒟 ⊂ 𝒳 × 𝒴. Here, 𝒳 represents the input space, and 𝒴 represents the labels. An input x ∈ 𝒳 is an ordered sequence of terms (𝑥 1 , 𝑥 2 , . . . , 𝑥 |x| ), where each term 𝑥 𝑖 comes from the vocabulary 𝒱. The prediction made by 𝑀 on input x is denoted as ŷ = argmax 𝑦∈𝒴 𝑃 𝑀 (𝑦|x). For simplicity, we abbreviate this as ŷ = 𝑀 (x) throughout this paper. Our research focuses exclusively on binary classification tasks.</p><p>We define s = (𝑠 1 , 𝑠 2 , . . . , 𝑠 𝑛 ) as an n-gram sub-sequence of x (represented as s ⊑ x). The remaining content in x is denoted as c, i.e., x = ⟨s, c⟩, where ⟨•, •⟩ is the sequence combination operator. Note that we do not assume sequence continuity in either c or s. The support of s within 𝒟 is defined as</p><formula xml:id="formula_0">Sup(s, 𝒟) = | {x ∈ 𝒟 : s ⊑ x} |.</formula><p>Additionally, we define a rule 𝑟 as a tuple (s → ŷ), where the sequence s is its pattern and ŷ is its consequent label. For instance, the rule:</p><formula xml:id="formula_1">the best movie (pattern) → POS (consequent)</formula><p>indicates that "the best movie" is a shortcut for 𝑀 to predict POSitive. In this context, we say that 𝑀 predicts ŷ primarily relying on the presence of the sequence s, rather than comprehending the overall input.</p><p>Our objective is to discover a globally representative set of rules, denoted as 𝐺 = {𝑟 = (s → ŷ)}, where each rule represents a shortcut learned by 𝑀 .</p></div>
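Since s ⊑ x permits gaps, Sup(s, 𝒟) must be computed with an order-preserving subsequence test rather than a substring match. A minimal sketch in Python (the function names are ours, not part of the paper):

```python
def is_subsequence(s, x):
    """s ⊑ x: the terms of s occur in x in order, not necessarily adjacently."""
    it = iter(x)
    return all(term in it for term in s)  # `in` advances the iterator past each match

def support(s, dataset):
    """Sup(s, D) = |{x in D : s ⊑ x}|."""
    return sum(1 for x in dataset if is_subsequence(s, x))
```

For example, `support(("best", "movie"), dataset)` counts every input containing both terms in that order, even with other words between them.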
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">DISCO: Approach Overview</head><p>To streamline the identification process, we begin by extracting all high-frequency n-gram patterns from the training data (Section 3.3). We then retain the candidates that pass the causality check (Section 3.4) as the final output rules. Our approach is designed to verify the (non-)existence of confounding variables, serving as a statistical test to establish causality in classification tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Generation of Candidate Sequences</head><p>In the initial step, our primary objective is to extract frequent n-gram sequences that exhibit a high correlation with specific model predictions.</p><p>Sequence Mining. Empirical studies such as <ref type="bibr" target="#b24">[25]</ref> emphasize that a pattern is more likely to influence a model's prediction as a shortcut if it occurs frequently in the training set. Therefore, we first select all frequent patterns using an efficient approach known as DESQ-COUNT <ref type="bibr" target="#b25">[26]</ref>. For a detailed explanation of DESQ-COUNT, please refer to <ref type="bibr" target="#b26">[27]</ref>.</p><p>NPMI Evaluation. We further evaluate the pattern-prediction correlation using their NPMI (Normalized Pointwise Mutual Information) score. Initially, we list all input data x from the training set together with their corresponding predictions from the model ŷ = 𝑀 (x). Then we calculate 𝑃 (𝑦, s), 𝑃 (𝑦|s), and 𝑃 (𝑦) from these predictions. It is worth mentioning that these probabilities are different from the model's prediction 𝑃 𝑀 (ŷ|x). Using these terms, we calculate the NPMI scores for all frequent s identified by DESQ-COUNT:</p><formula xml:id="formula_2">NPMI(s; 𝑦) = PMI(s; 𝑦) / ℎ(s, 𝑦) = log [ 𝑃 (𝑦|s) / 𝑃 (𝑦) ] / ℎ(s, 𝑦),</formula><p>where ℎ(s, 𝑦) = − log 𝑃 (s, 𝑦) is the entropy of 𝑃 (s, 𝑦). The resulting NPMI score falls within the range of [−1, 1], capturing the spectrum from "never occurring together (−1)" to "independence (0)" and ultimately "complete co-occurrence (1)" between the pattern and the label. We retain only those pairs that demonstrate a substantial level of correlation in their NPMI scores.</p></div>
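Computationally, the NPMI score reduces to counting co-occurrences of patterns and predicted labels over the training set. A sketch of this estimation (the helper names are ours; probabilities are empirical frequencies over the predictions ŷ = 𝑀 (x), as described above):

```python
import math

def npmi(p_sy, p_s, p_y):
    """NPMI(s; y) = PMI(s; y) / h(s, y), with h(s, y) = -log P(s, y).

    Returns a value in [-1, 1]: -1 = never co-occur, 0 = independent,
    1 = complete co-occurrence.
    """
    if p_sy == 0.0:
        return -1.0
    if p_sy == 1.0:
        return 1.0
    pmi = math.log(p_sy / (p_s * p_y))  # equals log P(y|s)/P(y)
    return pmi / -math.log(p_sy)

def npmi_from_predictions(predictions, contains_s, y):
    """Estimate P(s, y), P(s), and P(y) from the model predictions.

    predictions: predicted label per training input;
    contains_s:  parallel booleans, True where the input contains pattern s.
    """
    n = len(predictions)
    p_s = sum(contains_s) / n
    p_y = sum(p == y for p in predictions) / n
    p_sy = sum(c and p == y for p, c in zip(predictions, contains_s)) / n
    return npmi(p_sy, p_s, p_y)
```

When a pattern occurs exactly in the inputs the model labels y, the score is 1; when pattern and label are statistically independent, it is 0.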
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Causality Check</head><p>Such correlation alone, however, does not guarantee a direct causal relationship, as it could also arise from a confounding factor <ref type="bibr" target="#b27">[28]</ref>. In our context, we assume the confounding factor is the latent semantic representation z of the input. The presence of sequence pattern s and the context c of the input x are conditioned on z. An ideal machine learning model should comprehend this structure and capture z, rather than relying solely on the statistical correlation between s and ŷ, also referred to as the "shortcut" <ref type="bibr" target="#b28">[29,</ref><ref type="bibr" target="#b1">2]</ref>. We adopt Structured Causal Models (SCMs) <ref type="bibr" target="#b27">[28]</ref> to describe the prediction process of the ideal models, as illustrated in Figure <ref type="figure" target="#fig_1">2</ref>. If our underlying model captures the existence of the latent semantic, the confounding factor z exists and causes the correlation between s and ŷ. Otherwise, the model 𝑀 simply relies on the statistical correlation between s and ŷ to make the prediction.</p></div>
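Empirically, the intervention described next amounts to holding s fixed, substituting sampled contexts, and testing whether the model still prefers the consequent label. A sketch under the simplifying assumption that the combination operator ⟨s, c⟩ is plain concatenation (`predict_proba` is a hypothetical accessor returning 𝑃 𝑀 (𝑦|x); the 0.7 mean threshold follows the hyperparameters in Section 5.1.3):

```python
def combine(s, c):
    # Simplifying assumption: the paper's sequence combination operator
    # ⟨s, c⟩ is approximated here by plain concatenation.
    return tuple(s) + tuple(c)

def passes_do_check(predict_proba, s, y_hat, neutral_contexts, mean_threshold=0.7):
    """Empirical stand-in for P(ŷ | do(s = s)): keep the pattern s constant,
    marginalize over sampled neutral contexts c, and require the average
    probability of the consequent label to stay above the mean threshold."""
    probs = [predict_proba(combine(s, c), y_hat) for c in neutral_contexts]
    return sum(probs) / len(probs) >= mean_threshold
```

A pattern that keeps its high consequent probability across many unrelated contexts is accepted as a causal shortcut rule; one whose effect vanishes under the intervention is discarded as merely correlated.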
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Following <ref type="bibr" target="#b27">[28]</ref>, we leverage the do-operator on the "back-door" variable s of the input variable x. The do-operator simulates a physical intervention by replacing a random variable (RV) with a constant value while keeping the rest of the RVs intact, thereby breaking the potential confounding effect. In our SCM, applying the do-operator to s means assigning s a specific value and marginalizing over context c.</p><p>If the s-ŷ correlation is caused by the confounding factor z, the model's prediction will differ before and after the do-operator, because</p><formula xml:id="formula_3">𝑃 (ŷ|s) = ∑ 𝑧,c 𝑃 (ŷ|s, c) 𝑃 (c, s|𝑧) 𝑃 (𝑧) ≠ ∑ 𝑧,c 𝑃 * (ŷ|s, c) 𝑃 * (c|𝑧) 𝑃 * (𝑧) = 𝑃 (ŷ|do(s = s)),</formula><p>where 𝑃 * (•) denotes the distributions after applying the do-operator. Note that despite the similarity of our approach with that of <ref type="bibr" target="#b18">[19]</ref>, their work distinguishes between "spurious" and "genuine" local shortcuts based on semantic consistency with human understanding. Our approach emphasizes that all shortcuts learned by DISCO possess a causal attribute globally, without explicitly targeting this distinction due to subjectivity concerns. To highlight the difference between semantically spurious and causal shortcuts, we measure human agreement on distinguishing "right" from "wrong" shortcuts by introducing human interaction in Section 5.3.</p><p>Neutral Context Harvesting. One remaining challenge in the algorithm mentioned in the previous section is sampling the context RV C. This sampling process is often intractable in NLP tasks due to the varying input lengths and extensive vocabulary size. To address this, we employ a straightforward technique to reuse contexts c for different s, effectively obtaining contexts for free. 
Moreover, we reuse neutral contexts to mitigate the influence of other potential frequent sequences that may exist in the context. A context is considered neutral when its predicted probabilities lie near the border between two labels, namely |𝑃 𝑀 (𝑦 = 𝑦 0 |c) − 𝑃 𝑀 (𝑦 = 𝑦 1 |c)| = |2𝑃 𝑀 (𝑦 = 𝑦 0 |c) − 1| &lt; 𝜖 𝑛 , where 0 &lt; 𝜖 𝑛 &lt; 1 is the neutrality tolerance.</p><p>It is noteworthy that complex modern numerical sampling techniques, such as Markov Chain Monte Carlo (MCMC) <ref type="bibr" target="#b29">[30]</ref>, require careful handling to preserve contextual fluency and ensure the neutrality of the sentiment. Therefore, perfecting the generation of bias-free and neutral counterfactual contexts falls outside the scope of this paper. The exploration of alternative sampling techniques is left for future work.</p></div>
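The neutrality condition above is straightforward to operationalize for binary classification. A hedged sketch (the tolerance value and the probability accessor are illustrative, not the paper's implementation):

```python
def is_neutral(p_y0, eps_n=0.1):
    """A context c is neutral when the model sits near the decision border:
    |P_M(y0|c) - P_M(y1|c)| = |2*P_M(y0|c) - 1| < eps_n, with 0 < eps_n < 1."""
    assert 0.0 < eps_n < 1.0
    return abs(2.0 * p_y0 - 1.0) < eps_n

def harvest_neutral(contexts, predict_proba_y0, eps_n=0.1):
    # Reuse contexts already present in the dataset, keeping only neutral ones.
    return [c for c in contexts if is_neutral(predict_proba_y0(c), eps_n)]
```

Contexts passing this filter carry little label signal of their own, so any consistent prediction on ⟨s, c⟩ can be attributed to the pattern s.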
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">A Toy Example</head><p>To close this section, we provide a toy example that illustrates the full process of our approach in an extreme situation. Assume a sentiment analysis problem where all reviews on books are positive, and all reviews on movies are negative in the training data. A model trained on such data might incorrectly predict positive for a review like "this book is badly written" due to its overfitting to the correlation between the sequence "this book" and the label positive. It is worth mentioning that such sequences may appear semantically senseless and therefore "non-causal" to humans. The resulting rules reflect the rational basis of the model's prediction, rather than convincing a human inspector of its causality.</p><p>In DISCO, we first apply DESQ to identify the correlation between the sequence "this book" and the label positive from the training data. This pair is then subjected to an NPMI check to decide whether it is a candidate sequence (Section 3.3). Then, in the causality check (Section 3.4), we keep "this book" constant and vary its contexts to other neutral contexts like "was played in the cinema" or "is on the table". If the prediction predominantly remains positive, we infer that "this book" → positive is a shortcut.</p></div>
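The toy example can be played through end to end in a few lines. The sketch below is entirely illustrative: the tiny corpus, the keyword "model", and the bigram counter standing in for DESQ-COUNT are our own constructions, not the paper's implementation.

```python
import math

# Toy training set: book reviews are all positive, movie reviews all negative.
train = [
    (("this", "book", "is", "great"), "POS"),
    (("this", "book", "reads", "well"), "POS"),
    (("this", "movie", "is", "dull"), "NEG"),
    (("this", "movie", "drags", "on"), "NEG"),
]

def predict(x):
    # A deliberately overfitted "model": it keys on a single keyword.
    return "POS" if "book" in x else "NEG"

def contains(s, x):
    return s in set(zip(x, x[1:]))  # contiguous-bigram containment

# Step 1: frequent-bigram mining (a stand-in for DESQ-COUNT).
counts = {}
for x, _ in train:
    for bg in set(zip(x, x[1:])):
        counts[bg] = counts.get(bg, 0) + 1
frequent = sorted(s for s, n in counts.items() if n >= 2)

# Step 2: NPMI filter against the model's own predictions.
preds = [predict(x) for x, _ in train]
def npmi(s, y):
    n = len(train)
    p_s = sum(contains(s, x) for x, _ in train) / n
    p_y = sum(p == y for p in preds) / n
    p_sy = sum(contains(s, x) and p == y for (x, _), p in zip(train, preds)) / n
    if p_sy == 0.0:
        return -1.0
    return math.log(p_sy / (p_s * p_y)) / -math.log(p_sy)

candidates = [(s, y) for s in frequent for y in ("POS", "NEG") if npmi(s, y) > 0.8]

# Step 3: causality check, holding s constant over neutral contexts.
neutral_contexts = [("is", "on", "the", "table"), ("was", "seen", "downtown")]
rules = []
for s, y in candidates:
    outcomes = [predict(s + c) for c in neutral_contexts]
    if outcomes.count(y) / len(outcomes) >= 0.7:
        rules.append((s, y))
# rules == [(('this', 'book'), 'POS'), (('this', 'movie'), 'NEG')]
```

On this degenerate corpus, both "this book" → POS and "this movie" → NEG survive the causality check, exactly the shortcuts the toy model has learned.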
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Research Questions</head><p>Our experiments aim to answer the following research questions (RQs):</p><p>• RQ1. Faithfulness: Are the global rules faithful to the model's local explanations? • RQ2. Recall: If the model is known to have learned some shortcuts, can DISCO identify them? • RQ3. Human Utility: Are the shortcut rules useful for humans in detecting the model's wrong reasons?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Models and Datasets</head><p>Our approach is model-agnostic. Therefore, we conduct experiments on multiple models to answer RQ1 and RQ3, including an LSTM model and two over-parameterized transformer models, BERT BASE and SBERT <ref type="bibr" target="#b30">[31]</ref>.</p><p>The experiments are conducted on one document classification and three multi-task datasets. Given the foundational role of document classification in information retrieval (IR) and natural language processing (NLP), we employ a unified approach, transforming all datasets into binary classification: Movies from the ERASER benchmark <ref type="bibr" target="#b31">[32]</ref> is originally a binary sentiment classification dataset. MultiRC from the same benchmark is converted following the recipe presented in <ref type="bibr" target="#b31">[32]</ref>. For SST-2 (Stanford Sentiment Treebank) <ref type="bibr" target="#b32">[33]</ref>, we binarize the sentiment assigned to each input sentence. As for CLIMATE-FEVER, a fact-checking dataset from ir_datasets <ref type="bibr" target="#b33">[34]</ref> with queries and documents regarding climate change, we combine each query with each of its relevant/irrelevant documents as the inputs, while assigning "relevant"/"irrelevant" as their labels.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">The Agreement Score as a Metric of Faithfulness</head><p>Local interpretation approaches, such as LIME <ref type="bibr" target="#b5">[6]</ref> and ExPred <ref type="bibr" target="#b8">[9]</ref>, provide relatively faithful instance-wise explanations. Although researchers are questioning the quality of LIME explanations <ref type="bibr" target="#b34">[35]</ref>, LIME balances time efficiency and faithfulness well, to the best of our knowledge. Our global rules are considered faithful to the local explanations if they agree with the local explanations in all applicable instances. We define an input x as applicable to a rule 𝑟 = (s → ŷ) if s ⊑ x. Additionally, an applicable input x further satisfies the rule 𝑟 if its prediction matches the rule's consequent, i.e., ŷ = 𝑀 (x).</p><p>For an input-prediction pair (x, ŷ), an instance-wise explainer attributes the prediction 𝑃 𝑀 (ŷ|x) to each term 𝑥 𝑖 as an attribution score 𝑎 ŷ 𝑖 ∈ ℝ. The collection of all attribution scores of x is denoted a ŷ. For clarity, we drop the superscript ŷ in the rest of this section. We rank all terms based on their attribution scores in descending order, denoted as ℛ a (x) = (𝑥 𝑘 1 , 𝑥 𝑘 2 , . . . , 𝑥 𝑘 𝑛 ), where 𝑘 1 , . . . , 𝑘 𝑛 are the re-ranked token indices with 𝑎 𝑘 1 ≥ 𝑎 𝑘 2 ≥ . . . ≥ 𝑎 𝑘 𝑛 .</p><p>For an input x that satisfies a rule 𝑟, we define the agreement score between 𝑟 and ℛ a (x) as:</p><formula xml:id="formula_4">agreement(𝑟, ℛ a (x)) = ranking score(ℛ a (x); s),</formula><p>where the semicolon in the ranking score calculation separates the ranking sequence ℛ a (x) from the subsequence s.</p><p>We borrow the nDCG score <ref type="bibr" target="#b35">[36]</ref> from ranking evaluation tasks as the ranking score function here and consider the pattern terms as the "ground truth" terms. 
The intuition behind this metric is that the terms selected by the rule (the "ground truth") should be assigned the highest attribution scores and thus be ranked highest. A higher agreement score indicates that the rule is more faithful to the local explanation. For example, given x = "a b c", a rule whose pattern terms receive the highest attribution scores in a is in perfect agreement and scores 1. </p></div>
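Using nDCG as the ranking score, the agreement computation reduces to a few lines. The binary-relevance formulation below is our reading of the metric (pattern tokens are the only relevant items); tie-breaking and relevance grading may differ in the paper's implementation:

```python
import math

def agreement(attributions, pattern_indices):
    """nDCG-style agreement between a rule and a local explanation.

    attributions: per-token attribution scores a for input x;
    pattern_indices: positions of the rule's pattern terms, treated as the
    only "relevant" items (binary relevance).
    """
    ranking = sorted(range(len(attributions)),
                     key=lambda i: attributions[i], reverse=True)
    relevant = set(pattern_indices)
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, i in enumerate(ranking) if i in relevant)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(len(relevant)))
    return dcg / idcg
```

The score is 1.0 when the pattern terms carry the highest attributions and degrades as highly attributed terms fall outside the pattern.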
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Experiment Environment</head><p>Our approach is implemented in Python 3.7.3, utilizing PyTorch version 1.12.1+cu133. All experiments are conducted on a Linux server equipped with an AMD® EPYC® 7513 processor and an Nvidia® A100 GPU with 40 GB of display memory.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">RQ1. Faithfulness</head><p>We address this research question through two experiments: explanation alignment and an ablation study. In this section, we mine rules from BERT BASE <ref type="bibr" target="#b36">[37]</ref> models fine-tuned on different datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.1.">Agreement with Local Explanations</head><p>We aim to evaluate whether the global rules are consistent with the local explanations by measuring the agreement scores between them. Overall, we find a high degree of alignment between the global rules and the local explanations across all three datasets, with low variance (Fig. <ref type="figure" target="#fig_4">3</ref>). It is worth mentioning that the lowest agreement score appears on Movies with ExPred, at 0.695, which is the only outlier. The remaining scores range from 0.81 to 0.923. For exact results, we refer to Table <ref type="table" target="#tab_1">2</ref>. This indicates that our rules faithfully represent the model's explanations. Moreover, we observe a slight exception in the SST-2 dataset, where the low frequency of sequences leads to a small number of dominant rules and relatively higher variance. Nevertheless, upon manual examination of the rules, we found that most high-coverage rules in this dataset are correct and result in the right prediction. For a detailed evaluation, please refer to Section 5.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>It should be noted that the CLIMATE-FEVER dataset is not included in this analysis because it provides no rationale annotations, making it impossible to train the ExPred model on it. Based on our results, we can conclude that for Movies, SST-2, and MultiRC, the rules with the highest satisfaction are usually the correct reasons for the model's predictions, as they tend to have high alignment with local explanations. However, some rules, such as don't even → NEG for Movies and in its → POS for SST-2, suggest that the model has also learned some incorrect shortcuts. Relying on incorrect shortcuts could be even more detrimental to the model's performance when deployed in the field and encountering out-of-distribution (OOD) data. This is supported by the model's behavior on the counterfactuals generated during the causality check. We list some counterfactual examples in Table <ref type="table" target="#tab_0">1</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.2.">Ablation Study</head><p>To the best of our knowledge, our work is pioneering in the extraction of global causal rules learned by the model, making it challenging to establish appropriate baseline methods. Instead, we conduct ablation studies on different components of our approach, DISCO, to assess its ability to discover causal rules, as summarized in Table <ref type="table" target="#tab_1">2</ref>. We select the top-15 (s, ŷ) pairs based on their coverage under three configurations: 1) NPMI score filtering only, 2) DISCO with both NPMI and causality checks, and 3) the intersection (∩) between 1) and 2). We measure the average agreement scores among these configurations. The results presented in Table <ref type="table" target="#tab_1">2</ref> demonstrate that DISCO with all its processes (configuration 2) achieves higher agreement scores with ExPred explanations than the NPMI filter alone across all datasets. For LIME, however, the intersection (configuration 3) appears to outperform the other configurations. This observation suggests that the causality check following the NPMI filter can, to some extent, filter out correlated yet non-causal (s, ŷ) pairs, resulting in a greater number of causal rules that accurately reflect the model's predictions. Although our approach shows high agreement scores with local attributions, we must emphasize that the causality of the rules before the causality check cannot be guaranteed.</p><p>We would like to re-emphasize that we cannot use <ref type="bibr" target="#b15">[16]</ref> as our baseline model, because it produces only unigram-based rules and is therefore incomparable with our approach. Modern language models are designed to internalize contextual information between input tokens <ref type="bibr" target="#b36">[37,</ref><ref type="bibr" target="#b30">31]</ref>. Our approach identifies shortcut rules for such contextual information. 
For example, from "(This book is badly written, POS)", our approach can recognize the shortcut rule "(This book −→ POS)", while a unigram approach fails. Another critical reason is the time complexity of generating multi-word rules with their approach: mining a rule of four adjacent tokens bloats the search space to |𝒱|⁴. Likewise, <ref type="bibr" target="#b18">[19]</ref> is also unsuitable as our baseline model. Additionally, <ref type="bibr" target="#b18">[19]</ref> focuses on a different goal of distinguishing between "spurious" and "genuine" shortcuts based on their consistency with human understanding, while our work does not seek to differentiate these two groups. We, in contrast, leave the task of deciding "right" or "wrong" reasons to subjective human interaction, as presented in Section 5.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.3.">Hyperparameters</head><p>For the Movies dataset, we mine sequences with lengths from 4 to 10 and a support value of 20. During the causality check, we keep rules whose average prediction over all synthetic instances exceeds 0.7, which serves as the mean threshold.</p><p>For SST-2, the sequence lengths range from 2 to 10, the support value is 100, and the mean threshold is 0.7.</p><p>Both datasets are sentiment analysis datasets containing no queries<ref type="foot" target="#foot_0">1</ref> . In contrast, the MultiRC and CLIMATE-FEVER datasets consist of instances that include a query and a document. Their rule patterns are (s 𝑞 , s 𝑑 ) tuples, combining a sequence s 𝑞 from the query with a sequence s 𝑑 from the document. During sequence mining, s 𝑞 and s 𝑑 are jointly extracted from the query and document of each instance.</p><p>For MultiRC, the lengths of s 𝑞 and s 𝑑 are constrained to the ranges 3 to 10 and 4 to 10, respectively. The support value for tuples is set to 200, and the mean threshold is 0.7. For CLIMATE-FEVER, the sequence lengths of s 𝑞 and s 𝑑 both lie within the range 2 to 10. The tuple support is likewise set to 200, and the mean threshold remains 0.7.</p></div>
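The mean-threshold causality check described above can be sketched as follows. This is an illustrative reading, not the paper's code: the `[SLOT]` splice point, the model interface, and the toy contexts are our own assumptions.

```python
from statistics import mean
from typing import Callable, Iterable

def passes_causality_check(
    model_prob: Callable[[str], float],   # model's confidence in the rule's label
    pattern: str,
    neutral_contexts: Iterable[str],
    mean_threshold: float = 0.7,
) -> bool:
    """Splice the pattern into sampled neutral contexts (do-operator style)
    and keep the rule only if the model's average confidence in the predicted
    label over all synthetic instances exceeds the mean threshold."""
    synthetic = [ctx.replace("[SLOT]", pattern) for ctx in neutral_contexts]
    return mean(model_prob(x) for x in synthetic) > mean_threshold

# Toy stand-in model that is always 0.9 confident in the rule's label.
contexts = ["the plot [SLOT] moved slowly", "[SLOT] worth watching twice"]
rule_kept = passes_causality_check(lambda x: 0.9, "n ' t", contexts)
```

A rule whose average confidence drops below the threshold on the counterfactuals is discarded as correlated but non-causal.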
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.4.">Statistics</head><p>The statistics of the rules are summarized in Table <ref type="table" target="#tab_2">3</ref>, showcasing key metrics: #(frequent), #(NPMI), avg(|s|), and #(rules). These columns represent the number of frequent sequences mined by DESQ-COUNT, the number of sequences that pass the NPMI check, the average length of the rules' pattern sequences, and the resulting number of rules, respectively.</p><p>The information presented in this table demonstrates the effectiveness of employing NPMI and the subsequent causality check: incorporating these measures substantially reduces the number of candidate shortcut sequences, allowing human inspectors to focus on the most crucial rationales across the entire dataset. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">RQ2. Recall</head><p>This research question serves two purposes: 1) to validate our assumption that highly correlated patterns and labels lead to the model learning shortcuts, and 2) to demonstrate the capability of DISCO in identifying these shortcuts. Quantitatively evaluating the retention rate of shortcuts by DISCO poses a challenge as it requires knowledge of the ground-truth correlated pattern-label pairs. This challenge is common in the evaluation of explanations <ref type="bibr" target="#b37">[38,</ref><ref type="bibr" target="#b38">39]</ref>. To overcome this issue, we deliberately introduce decoys <ref type="bibr" target="#b0">[1]</ref> into the dataset to entice the model into learning shortcuts. All decoys are presented in Table <ref type="table" target="#tab_3">4</ref>. Following a similar methodology to that of <ref type="bibr" target="#b24">[25]</ref>, we contaminate the original training set with decoy patterns, varying the contamination rate and bias. It is important to note that we only contaminate the training and validation sets, keeping the test set intact. This setup simulates a scenario where the model performs well on a biased dataset but lacks generalization due to learned shortcuts. If our approach can successfully identify the injected decoys, we consider it a success.</p></div>
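A minimal sketch of the decoy-injection protocol follows. How the contamination rate and bias translate into sampled instances here is our own assumption for illustration; the decoy string matches Table 4, but the dataset and sampling details are hypothetical.

```python
import random

def contaminate(instances, decoy, dominant_label, rate=0.8, bias=0.9, seed=0):
    """Prepend `decoy` to a `rate` fraction of instances, drawing a `bias`
    fraction of the contaminated instances from the dominant label.
    Only training/validation data would be contaminated; the test set stays intact.
    `instances` is a list of (text, label) pairs."""
    rng = random.Random(seed)
    n = int(rate * len(instances))
    dominant = [i for i, (_, y) in enumerate(instances) if y == dominant_label]
    other = [i for i, (_, y) in enumerate(instances) if y != dominant_label]
    n_dom = min(int(bias * n), len(dominant))
    picked = rng.sample(dominant, n_dom) + rng.sample(other, min(n - n_dom, len(other)))
    out = list(instances)
    for i in picked:
        text, y = out[i]
        out[i] = (decoy + " " + text, y)
    return out
```

If DISCO's output then contains the rule (decoy → dominant label), the injected decoy counts as detected.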
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1.">Contamination Rate, Bias, and Retention Rate</head><p>The extent of contamination is described by the contamination rate and the bias.</p><p>We define the contamination rate as the ratio of instances containing the decoy, namely |X 𝑑 |/|X|. We further define the bias as the label imbalance among the contaminated instances, namely max 𝑦∈𝒴 (1/|Y 𝑑 |) ∑ 𝑦 𝑖 ∈Y 𝑑 1(𝑦 𝑖 = 𝑦), where Y 𝑑 denotes the labels of all contaminated instances. The label 𝑦 selected by the max 𝑦∈𝒴 operator is referred to as the dominant label. The retention rate is the fraction of decoys that can be detected. A decoy is considered detected if the output of our approach contains the rule constructed from the decoy and its corresponding label. To the best of our knowledge, our study is also the first to systematically investigate the retention rate of decoys under different contamination rates and biases.</p></div>
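The three quantities defined above can be computed directly from counts. A small sketch, where we read the bias as a fraction of contaminated instances (consistent with the percentage settings reported later):

```python
from collections import Counter

def contamination_rate(n_decoy: int, n_total: int) -> float:
    """|X_d| / |X|: fraction of instances carrying the decoy."""
    return n_decoy / n_total

def bias(decoy_labels) -> float:
    """Share of the dominant label among the contaminated instances."""
    counts = Counter(decoy_labels)
    return max(counts.values()) / len(decoy_labels)

def retention_rate(injected, extracted_rules) -> float:
    """Fraction of injected (decoy, label) pairs recovered as rules."""
    return sum(1 for d in injected if d in extracted_rules) / len(injected)
```

For example, injecting a decoy into 80 of 100 training instances, 90 of which carry the dominant label, corresponds to the high-contamination, high-bias setting.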
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2.">Contamination-Bias Settings</head><p>To evaluate the retention rate of DISCO across various scenarios, we examine four settings obtained by crossing contamination rates {20%, 80%} with biases {60%, 90%}. Figure <ref type="figure" target="#fig_5">4</ref> illustrates the retention rate and task performance for each of these settings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.3.">Observations</head><p>Figure <ref type="figure" target="#fig_5">4</ref> (third row) demonstrates that adding decoys to the training set has minimal effect on test performance, indicating that the introduced decoys do not significantly alter the data distribution. We also measured the faithfulness of DISCO to show that the decoys are indeed learned as shortcuts by the model. The heatmap in Figure <ref type="figure" target="#fig_5">4</ref> illustrates that under high-bias, high-contamination settings, DISCO can successfully identify our injected decoys, except for SST-2. We also observed that high-bias settings are easier to detect compared to high-contamination settings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">RQ3. Human Utility</head><p>A shortcut rule can be a good reason for a model decision, but it can also be a wrong one. To measure the human perception of model-generated rules, and to see whether the rules help humans detect wrong reasons for a decision, we conducted experiments on the four uncontaminated training sets with three different models: BERT BASE , LSTM, and SBERT. The extracted rules were independently shown to four machine learning developers, who were asked to assess whether each rule was a "wrong reason". A wrong reason is an explanation that is either non-understandable or implausible given the underlying language task. For example, the pattern "? | |" of a rule is non-understandable as it contains no meaningful words, while the rule "in its −→ POS" is implausible for a sentiment classification task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.1.">Results</head><p>To report the inter-annotator agreement, we utilized Fleiss' kappa, a standard measure of the reliability of agreement between raters<ref type="foot" target="#foot_1">2</ref> (see Figure <ref type="figure" target="#fig_6">5</ref>). We observed a high inter-annotator agreement of ≥ 0.54 for BERT BASE and SBERT on the CLIMATE-FEVER dataset, and complete agreement for the MultiRC dataset. Interestingly, for the SST-2 dataset, we observed a low inter-rater agreement of −0.041 for the LSTM model. This was primarily due to DISCO extracting rules with extremely short sequences, such as "n ' t −→ NEG". Low Fleiss' 𝜅 among human evaluators on particular datasets and models indicates the subjective nature of distinguishing between "right" and "wrong" shortcuts in terms of semantic consistency with human understanding. However, high Fleiss' 𝜅 on other datasets indicates that DISCO indeed aids humans in identifying easily distinguishable incorrect justifications for a model's decision.</p><p>It is notable that "wrong" rules exist even in the BERT BASE and SBERT models, which are known for their robustness due to pre-training and knowledge-rich priors. For instance, even BERT BASE learns spurious rules like "this film −→ NEG" from the Movies dataset. Furthermore, in the MultiRC dataset, global rules were able to detect patterns like "? | |", resulting in a perfect Fleiss' kappa.</p><p>Selected examples in Table <ref type="table" target="#tab_0">1</ref> highlight the model's tendency to predict by relying on specific text patterns while overlooking the broader context. For instance, shortcuts such as "of the world trade center" are not relevant to the classification task, yet the model uses them. This reliance on shortcuts can compromise the model's ability to generalize and make accurate predictions in varied contexts.</p></div>
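Fleiss' 𝜅 over the annotators' judgments can be computed from per-item category counts. A minimal sketch; the rating matrix below is illustrative, not our study's data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of per-category counts
    summing to the same number of raters n (here: 'wrong reason' yes/no)."""
    N = len(ratings)        # number of rated rules
    n = sum(ratings[0])     # raters per item
    k = len(ratings[0])     # categories
    # mean observed per-item agreement
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in ratings) / N
    # chance agreement from overall category proportions
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# four annotators, binary judgment per rule
kappa = fleiss_kappa([[4, 0], [0, 4], [3, 1], [4, 0]])
```

Values near 1 indicate near-unanimous judgments, while values near or below 0 (as for LSTM on SST-2) indicate agreement no better than chance.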
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>This paper introduces DISCO, a method designed to identify causal rules internalized by neural models in natural language tasks. DISCO produces a concise and statistically robust set of causal rules, enabling users to scrutinize and understand the underlying knowledge captured by the model. The intrinsic causal orientation of our approach ensures that the resultant rules are faithful to the inputs where they are applicable. We demonstrate the efficacy of DISCO by identifying shortcuts learned by prominent models, including BERT BASE , SBERT, and LSTM. Our approach not only reveals these shortcuts but also provides insights into the model's decision-making process. In essence, DISCO stands as an instrumental resource for those aiming to gain deeper insights into the interactive explainability of AI models. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Limitations</head><p>One limitation of our approach arises from the context selection when constructing the counterfactual.</p><p>Reusing neutral contexts is a straightforward way to generate human-understandable replacements for counterfactual contexts. However, this strategy has inherent drawbacks. First, the availability of contexts is constrained: we only employ contexts present in the training data, which limits the sampling space and may compromise the effectiveness of the do-operator. Furthermore, selecting only neutral contexts further narrows the sampling space and may introduce discrepancies between the sampled contexts and the training contexts, affecting the data distribution.</p><p>Additionally, compared to related works like <ref type="bibr" target="#b18">[19]</ref>, we do not differentiate between "spurious" and "genuine" reasons for predictions. However, this distinction is of lesser concern, as our objective is to identify globally overfit shortcut patterns within the model rather than to pinpoint specific reasons for individual predictions or assess their faithfulness.</p><p>A third limitation concerns the experiments conducted. Although the theory and approach of our work do not require sequence continuity, all experiments are based on consecutive sequences. Exploring efficient methods to identify sequences with gaps, or even more complex patterns, remains a potential avenue for future research.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: (a) The underlying model predicts NEG given instances containing the pattern because he's. (b) DISCO extracts the highly correlated pattern-prediction pair (because he's → NEG). (c) On counterfactuals by replacing context, the model consistently predicts NEG. 
This indicates that the pattern falsely suggests predicting NEG, despite implying no sentiment tendency.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: The SCM describing the prediction process of an ideal model. Capital letters represent corresponding random variables.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>5, 0.4]. The tokens are therefore ranked as "b -c -a". If s = "a b", the agreement score is nDCG@k(("a b" → 𝑦 ^), b -c -a) = (0.5/log₂(1+1)) / (0.5/log₂(1+1) + 0.1/log₂(2+1)) ≈ 0.89 for 𝑘 = 2.</figDesc></figure>
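The worked agreement-score example in this caption can be reproduced under one plausible reading of the formula: DCG@k over the attribution ranking, where only the pattern's tokens contribute, normalized by the ideal DCG that places all pattern tokens on top. The attribution values [0.1, 0.5, 0.4] for tokens a, b, c are inferred from the numbers shown.

```python
import math

def ndcg_agreement(pattern_tokens, ranking, attributions, k):
    """Agreement between a rule's pattern s and a local attribution ranking."""
    in_s = set(pattern_tokens)
    # realized DCG@k: pattern tokens at their ranked positions
    dcg = sum(attributions[t] / math.log2(i + 2)
              for i, t in enumerate(ranking[:k]) if t in in_s)
    # ideal DCG: pattern tokens promoted to the top positions
    ideal = sum(attributions[t] / math.log2(i + 2)
                for i, t in enumerate(sorted(pattern_tokens,
                                             key=attributions.get,
                                             reverse=True)))
    return dcg / ideal

attr = {"a": 0.1, "b": 0.5, "c": 0.4}   # assumed token attributions
score = ndcg_agreement(["a", "b"], ["b", "c", "a"], attr, k=2)
# ≈ 0.89, matching the figure's worked example
```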
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Rules' agreement scores with LIME and ExPred. For MultiRC, we only consider patterns mined from its documents, excluding those from its queries.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Results of RQ2 on four datasets under different contamination-bias settings. Each column corresponds to a specific dataset. The heat maps in the first row depict the retention rate. The symbols - and + on the x-axes represent low and high contamination rates (r) or biases (b).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Fleiss' 𝜅 among human evaluators considering whether the rules are right for the wrong reasons</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>The rules and synthetic counterfactual examples generated by our approach during the causalitycheck stage.</figDesc><table><row><cell>dataset</cell><cell>model</cell><cell>rule</cell><cell></cell><cell>synthetic counterfactual</cell></row><row><cell></cell><cell cols="2">BERT BASE in its → POS</cell><cell></cell><cell>with rare birds in its with the shipping news before it , an</cell></row><row><cell>SST-2</cell><cell></cell><cell></cell><cell></cell><cell>attempt is made to transplant a hollywood star into new-</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>foundland ' s wild soil --and the rock once again resists</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>the intrusion .</cell></row><row><cell></cell><cell>LSTM</cell><cell>n ' t → NEG</cell><cell></cell><cell>but n ' t most part he makes sure the salton sea works the</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>way a good noir should , keeping it tight and nasty .</cell></row><row><cell></cell><cell>SBERT</cell><cell cols="2">this film → POS</cell><cell>generic slasher -movie nonsense , this film s not without</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>style .</cell></row><row><cell>Movies</cell><cell cols="3">BERT BASE because he ' s → NEG</cell><cell>while because he ' s laughing at the movie , terrance and phillip cuss repeatedly entertaining the kids .</cell></row><row><cell></cell><cell>LSTM</cell><cell cols="2">was supposed to be</cell><cell>i was supposed to be when or how this movie will be re-</cell></row><row><cell></cell><cell></cell><cell>→ NEG</cell><cell></cell><cell>leased in the united states .</cell></row><row><cell></cell><cell>SBERT</cell><cell cols="2">' t seem to → NEG</cell><cell>the cinematography and general beauty of this 
part</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>' t seem to breathtaking .</cell></row><row><cell>MultiRC</cell><cell cols="3">BERT BASE (? world trade center ) | |, of the</cell><cell>(what is the flood plain area of land good for if it floods often ? | | crops, a floodplain is an area where a thick</cell></row><row><cell></cell><cell></cell><cell>→ FALSE</cell><cell></cell><cell>layer of rich soil is left behind as the floodwater re-</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>cedes of the world trade center floodplains are usually</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>good places for growing plants .)</cell></row><row><cell></cell><cell>LSTM</cell><cell cols="2">(? | |, al qaeda ' s) →</cell><cell>(in the past $ 5 . 6 million was the allotted amount added ,</cell></row><row><cell></cell><cell></cell><cell>FALSE</cell><cell></cell><cell>what is the amount they are proposing this year ? | | more</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>than $ 20 million , $ 80 . 4 million, but this year al qaeda ' s</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>, the council is proposing shifting more than $ 20 million</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>in funds earmarked by the mayor for 18 -b lawyers to the</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>legal aid society , which would increase its total funding to</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>$ 80 .)</cell></row><row><cell></cell><cell>SBERT</cell><cell cols="2">(? | |, , but the al-</cell><cell>(what were the initial list of targets ? 
| | capitol , white house,</cell></row><row><cell></cell><cell></cell><cell cols="2">garve) → FALSE</cell><cell>these included the white house , the , but the algarve)</cell></row><row><cell cols="3">CLIMATE-FEVER BERT BASE (in the,</cell><cell>climate</cell><cell>(it has never been shown that human emissions of carbon</cell></row><row><cell></cell><cell></cell><cell cols="2">change) → relevant</cell><cell>dioxide drive in the ., multiple lines of scientific evidence</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>show that climate change is warming .)</cell></row><row><cell></cell><cell>LSTM</cell><cell cols="2">(that the, is a) → ir-</cell><cell>(before human burning of fossil fuels triggered that the ,</cell></row><row><cell></cell><cell></cell><cell>relevant</cell><cell></cell><cell>the continent ' s ice was in relative balance, in 2013 , the</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>intergovernmental panel on climate change ( ipcc ) fifth</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>assessment report concluded that ' ' it is extremely likely</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>that human influence has been the dominant cause of is a</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>-20th century .)</cell></row><row><cell></cell><cell>SBERT</cell><cell cols="2">(' s, climate change</cell><cell>(phil jones says no ' s since 1995 ., climate change .)</cell></row><row><cell></cell><cell></cell><cell cols="2">.) → relevant</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>The average agreement scores between intermediate outputs of different DISCO components and their corresponding applicable instances. The superscript 𝐸 indicates that the attributions are from ExPred, while 𝐿 indicates LIME. For MultiRC, we only consider patterns mined from its documents, excluding those from its queries. dataset NPMI 𝐿 DISCO 𝐿 ∩ 𝐿 NPMI 𝐸 DISCO 𝐸 ∩ 𝐸</figDesc><table><row><cell>Movies</cell><cell>0.923</cell><cell>0.913</cell><cell>0.913 0.680</cell><cell>0.695</cell><cell>0.695</cell></row><row><cell>SST-2</cell><cell>0.836</cell><cell>0.839</cell><cell>0.839 0.779</cell><cell>0.824</cell><cell>0.824</cell></row><row><cell cols="2">MultiRC 0.885</cell><cell>0.902</cell><cell>0.912 0.798</cell><cell>0.814</cell><cell>0.770</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Statistics of the extracted rules. The average length of predicates for MultiRC and CLIMATE-FEVER are calculated by avg(|s 𝑞 | + |s 𝑑 |)</figDesc><table><row><cell>dataset</cell><cell cols="4">#(frequent) #(NPMI) avg(|s|) #(rules)</cell></row><row><cell>Movies</cell><cell>350</cell><cell>228</cell><cell>4.156</cell><cell>154</cell></row><row><cell>MultiRC</cell><cell>547</cell><cell>130</cell><cell>7.252</cell><cell>127</cell></row><row><cell>SST-2</cell><cell>125</cell><cell>67</cell><cell>2.235</cell><cell>17</cell></row><row><cell cols="2">CLIMATE-FEVER 272</cell><cell>79</cell><cell>4.377</cell><cell>77</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Decoy-label correspondence. For the Movies and SST-2 datasets, we prepend decoys to the input documents. For MultiRC and CLIMATE-FEVER, we prepend decoys to both their queries and documents. dataset decoy 0 (label 0 ) decoy 1 (label 1 )</figDesc><table><row><cell></cell><cell>the following comment is (NEG)</cell><cell>this review is crawled (POS)</cell></row><row><cell>Movies/SST-2</cell><cell>acceptable retrieval conditional (NEG)</cell><cell>ike hurricane october precipitation (POS)</cell></row><row><cell></cell><cell>acceptable fragmentation gross</cell><cell>february every hurricane august</cell></row><row><cell></cell><cell>(NEG)</cell><cell>(POS)</cell></row><row><cell></cell><cell>contents gmina cornered hapoel</cell><cell>tornadoes huricane earthquakes</cell></row><row><cell></cell><cell>(NEG)</cell><cell>deserts (POS)</cell></row><row><cell></cell><cell>ten nine eight seven (False)</cell><cell>one two three four (True)</cell></row><row><cell>MultiRC</cell><cell>acceptable retrieval conditional (False)</cell><cell>ike hurricane october precipitation (True)</cell></row><row><cell></cell><cell>acceptable fragmentation gross</cell><cell>february every hurricane august</cell></row><row><cell></cell><cell>(False)</cell><cell>(True)</cell></row><row><cell></cell><cell>contents gmina cornered hapoel</cell><cell>tornadoes huricane earthquakes</cell></row><row><cell></cell><cell>(False)</cell><cell>deserts (True)</cell></row><row><cell></cell><cell>ten nine eight seven (irrelevant)</cell><cell>one two three four (relevant)</cell></row><row><cell>CLIMATE-FEVER</cell><cell>acceptable retrieval conditional (irrelevant)</cell><cell>ike hurricane october precipitation (relevant)</cell></row><row><cell></cell><cell>acceptable fragmentation gross</cell><cell>february every hurricane 
august</cell></row><row><cell></cell><cell>(irrelevant)</cell><cell>(relevant)</cell></row><row><cell></cell><cell>contents gmina cornered hapoel</cell><cell>tornadoes huricane earthquakes</cell></row><row><cell></cell><cell>(irrelevant)</cell><cell>deserts (relevant)</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">To accommodate BERT's input format "[CLS] &lt;query&gt; [SEP] &lt;document&gt; [SEP]", we construct the synthetic query "what is the sentiment of this review?" for each review instance.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://en.wikipedia.org/wiki/Fleiss'_kappa</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Shortcut learning in deep neural networks</title>
		<author>
			<persName><forename type="first">R</forename><surname>Geirhos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-H</forename><surname>Jacobsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Michaelis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zemel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Brendel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bethge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">A</forename><surname>Wichmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="665" to="673" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">will you find these shortcuts?&quot; A protocol for evaluating the faithfulness of input salience methods for text classification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bastings</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ebert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zablotskaia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sandholm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Filippova</surname></persName>
		</author>
		<idno>CoRR abs/2111.07367</idno>
		<ptr target="https://arxiv.org/abs/2111.07367" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Post-hoc interpretability for neural nlp: A survey</title>
		<author>
			<persName><forename type="first">A</forename><surname>Madsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Reddy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chandar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="page" from="1" to="42" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Axiomatic attribution for deep networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sundararajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Taly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Yan</surname></persName>
		</author>
		<ptr target="https://proceedings.mlr.press/v70/sundararajan17a.html" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 34th International Conference on Machine Learning</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Precup</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><forename type="middle">W</forename><surname>Teh</surname></persName>
		</editor>
		<meeting>the 34th International Conference on Machine Learning<address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">70</biblScope>
			<biblScope unit="page" from="3319" to="3328" />
		</imprint>
	</monogr>
	<note>Proceedings of Machine Learning Research</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A unified approach to interpreting model predictions</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Lundberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-I</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">why should I trust you?&quot;: Explaining the predictions of any classifier</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Guestrin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</title>
				<meeting>the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining<address><addrLine>San Francisco, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">August 13-17, 2016</date>
			<biblScope unit="page" from="1135" to="1144" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Rationalizing neural predictions</title>
		<author>
			<persName><forename type="first">T</forename><surname>Lei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Barzilay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Jaakkola</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D16-1011</idno>
		<ptr target="https://aclanthology.org/D16-1011" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<meeting>the 2016 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Austin, Texas</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="107" to="117" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Inferring which medical treatments work from reports of clinical trials</title>
		<author>
			<persName><forename type="first">E</forename><surname>Lehman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Deyoung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Barzilay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">C</forename><surname>Wallace</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL)</title>
				<meeting>the North American Chapter of the Association for Computational Linguistics (NAACL)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3705" to="3717" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Explain and predict, and then predict again</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rudra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Anand</surname></persName>
		</author>
		<idno type="DOI">10.1145/3437963.3441758</idno>
		<ptr target="https://doi.org/10.1145/3437963.3441758" />
	</analytic>
	<monogr>
		<title level="m">WSDM &apos;21</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="418" to="426" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Compositional explanations of neurons</title>
		<author>
			<persName><forename type="first">J</forename><surname>Mu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Andreas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="17153" to="17163" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Anchors: High-precision model-agnostic explanations</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Guestrin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI conference on artificial intelligence</title>
				<meeting>the AAAI conference on artificial intelligence</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">32</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mac Namee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2203.12918</idno>
		<ptr target="https://arxiv.org/abs/2203.12918" />
		<title level="m">A rationale-centric framework for human-in-the-loop machine learning</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Semantically equivalent adversarial rules for debugging nlp models</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Guestrin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 56th annual meeting of the association for computational linguistics</title>
				<meeting>the 56th annual meeting of the association for computational linguistics</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="856" to="865" />
		</imprint>
	</monogr>
	<note>long papers</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Learning the difference that makes a difference with counterfactually-augmented data</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kaushik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hovy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lipton</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=Sklgs0NFvr" />
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Sparcassist: A model risk assessment assistant based on sparse generated counterfactuals</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Setty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Anand</surname></persName>
		</author>
		<idno type="DOI">10.1145/3477495.3531677</idno>
		<ptr target="https://doi.org/10.1145/3477495.3531677" />
	</analytic>
	<monogr>
		<title level="m">SIGIR &apos;22</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="3219" to="3223" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Invariant rationalization</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Jaakkola</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1448" to="1458" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Concealed data poisoning attacks on NLP models</title>
		<author>
			<persName><forename type="first">E</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.naacl-main.13</idno>
		<ptr target="https://aclanthology.org/2021.naacl-main.13" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</title>
				<meeting>the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="139" to="150" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Interpreting deep learning model using rule-based method</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Tang</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2010.07824</idno>
		<idno type="arXiv">arXiv:2010.07824</idno>
		<ptr target="http://arxiv.org/abs/2010.07824" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Identifying spurious correlations for robust text classification</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Culotta</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.findings-emnlp.308</idno>
		<ptr target="https://aclanthology.org/2020.findings-emnlp.308" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3431" to="3440" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Identifying and mitigating spurious correlations for improving robustness in NLP models</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sridhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.findings-naacl.130</idno>
		<ptr target="https://aclanthology.org/2022.findings-naacl.130" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: NAACL 2022, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Carpuat</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M.-C</forename><surname>De Marneffe</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><forename type="middle">V</forename><surname>Meza Ruiz</surname></persName>
		</editor>
		<meeting><address><addrLine>Seattle, United States</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="1719" to="1729" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">The change that matters in discourse parsing: Estimating the impact of domain shift on parser error</title>
		<author>
			<persName><forename type="first">K</forename><surname>Atwell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sicilia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Hwang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Alikhani</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.findings-acl.68</idno>
		<ptr target="https://aclanthology.org/2022.findings-acl.68" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: ACL 2022, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Muresan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Villavicencio</surname></persName>
		</editor>
		<meeting><address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="824" to="845" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Fast logistic regression for text categorization with variable-length n-grams</title>
		<author>
			<persName><forename type="first">G</forename><surname>Ifrim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bakir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<meeting>the 14th ACM SIGKDD international conference on Knowledge discovery and data mining</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="354" to="362" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Combining naive bayes and n-gram language models for text classification</title>
		<author>
			<persName><forename type="first">F</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schuurmans</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECIR</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">2633</biblScope>
			<biblScope unit="page" from="335" to="350" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Creating robust supervised classifiers via web-scale n-gram data</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bergsma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pitler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 48th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="865" to="874" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Towards benchmarking the utility of explanations for model debugging</title>
		<author>
			<persName><forename type="first">M</forename><surname>Idahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lyu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Gadiraju</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Anand</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.trustnlp-1.8</idno>
		<ptr target="https://aclanthology.org/2021.trustnlp-1.8" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Trustworthy Natural Language Processing, Association for Computational Linguistics</title>
				<meeting>the First Workshop on Trustworthy Natural Language Processing, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="68" to="73" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">The DESQ framework for declarative and scalable frequent sequence mining</title>
		<author>
			<persName><forename type="first">K</forename><surname>Beedkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gemulla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Renz-Wieland</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<publisher>Gesellschaft für Informatik e.V.</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">A unified framework for frequent sequence mining with subsequence constraints</title>
		<author>
			<persName><forename type="first">K</forename><surname>Beedkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gemulla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Martens</surname></persName>
		</author>
		<idno type="DOI">10.1145/3321486</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Database Systems</title>
		<imprint>
			<biblScope unit="volume">44</biblScope>
			<biblScope unit="page" from="1" to="42" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Pearl</surname></persName>
		</author>
		<title level="m">Causality</title>
				<imprint>
			<publisher>Cambridge university press</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">Shortcuts: Neural networks love to cheat</title>
		<author>
			<persName><forename type="first">J.-H</forename><surname>Jacobsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Geirhos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Michaelis</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<publisher>The Gradient</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Equation of state calculations by fast computing machines</title>
		<author>
			<persName><forename type="first">N</forename><surname>Metropolis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">W</forename><surname>Rosenbluth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">N</forename><surname>Rosenbluth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H</forename><surname>Teller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Teller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The journal of chemical physics</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="1087" to="1092" />
			<date type="published" when="1953">1953</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1908.10084</idno>
		<title level="m">Sentence-bert: Sentence embeddings using siamese bert-networks</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">ERASER: A benchmark to evaluate rationalized NLP models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deyoung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">F</forename><surname>Rajani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Lehman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">C</forename><surname>Wallace</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.408</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.408" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="4443" to="4458" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Recursive deep models for semantic compositionality over a sentiment treebank</title>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Perelygin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Potts</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2013 conference on empirical methods in natural language processing</title>
				<meeting>the 2013 conference on empirical methods in natural language processing</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1631" to="1642" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Simplified data wrangling with ir_datasets</title>
		<author>
			<persName><forename type="first">S</forename><surname>Macavaney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Yates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Feldman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Downey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cohan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goharian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<title level="m" type="main">&quot;why should you trust my explanation?&quot; understanding uncertainty in lime explanations</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Udell</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1904.12991</idno>
		<ptr target="https://arxiv.org/abs/1904.12991" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">B</forename><surname>Croft</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Metzler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Strohman</surname></persName>
		</author>
		<title level="m">Search engines: Information retrieval in practice</title>
				<imprint>
			<publisher>Addison-Wesley Reading</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">520</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
		<ptr target="https://aclanthology.org/N19-1423" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Interpretable machine learning-a brief history, state-of-the-art and challenges</title>
		<author>
			<persName><forename type="first">C</forename><surname>Molnar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Casalicchio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bischl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECML PKDD 2020 Workshops: Workshops of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020): SoGood 2020, PDFL 2020, MLCS 2020, NFMCP 2020, DINA 2020, EDML 2020, XKDD 2020 and INRA 2020</title>
				<meeting><address><addrLine>Ghent, Belgium</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">September 14-18, 2020. 2021</date>
			<biblScope unit="page" from="417" to="431" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Anand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lyu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Idahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wallat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2211.02405</idno>
		<title level="m">Explainable information retrieval: A survey</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
