<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nicholas</forename><surname>Pochinkov</surname></persName>
							<affiliation key="aff0">
<orgName type="institution">Independent, Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ben</forename><surname>Pasero</surname></persName>
							<affiliation key="aff1">
<orgName type="institution">Independent, Seattle</orgName>
								<address>
									<region>WA</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Skylar</forename><surname>Shibayama</surname></persName>
							<affiliation key="aff1">
<orgName type="institution">Independent, Seattle</orgName>
								<address>
									<region>WA</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">BE52EC1140E36B5552ED05AB562B2B56</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:37+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>AI</term>
					<term>LLMs</term>
					<term>Transformers</term>
					<term>Interpretability</term>
					<term>Attention</term>
					<term>Pruning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The use of transformer-based models is growing rapidly throughout society. With this growth, it is important to understand how they work, and in particular, how the attention mechanisms represent concepts. Though there are many interpretability methods, many look at models through their neuronal activations, which are poorly understood. We describe different lenses through which to view neuron activations, and investigate their effectiveness in language models and vision transformers through various methods of neural ablation: zero ablation, mean ablation, activation resampling, and a novel approach we term 'peak ablation'. Through experimental analysis, we find that in different regimes and models, each method can offer the lowest degradation of model performance compared to other methods, with resampling usually causing the most significant performance deterioration. We make our code available at https://github.com/nickypro/investigating-ablation</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Understanding how language models make decisions is important to ensure that their use can be trusted. Mechanistic interpretability offers one lens through which to understand how transformer architecture models <ref type="bibr" target="#b0">[1]</ref> perform the computations required to produce an output. An oft-used tool in mechanistic interpretability is to attribute individual network parts to specific capabilities by ablating those parts and observing capability degradation.</p><p>However, choosing how to ablate neurons in language models is still an unsolved problem. The traditional closed-form methods are zero ablation and mean ablation <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>, as well as an additional, more randomised method of activation resampling in the case of causal scrubbing <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>, but little empirical analysis has been done to optimise these methods <ref type="bibr" target="#b3">[4]</ref>.</p><p>Understanding exactly how neuron activations deviate, and what baseline they deviate from, is a broadly applicable but underexplored question, with the potential to improve techniques for model pruning and for the analysis of model sparsity.</p><p>In this paper, we 1) describe a simple working model of neuron activations, 2) suggest an improved, closed-form method of neuron ablation using the modal activation, called 'peak ablation', and 3) run experimental analysis on various ablation methods to compare the degree to which they harm model performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Mechanistic interpretability is a field of research focusing on understanding how neural network models achieve their outputs <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref>. A common method used in mechanistic interpretability is 'ablate and measure' <ref type="bibr" target="#b2">[3]</ref>. We investigate more precisely how different ablation methods affect performance, and propose 'peak ablation' as another possible method.</p><p>Most relevantly, recent research <ref type="bibr" target="#b3">[4]</ref> investigates hyperparameter selection to optimise activation patching for causal scrubbing. Our research differs: instead of interpolating activations between similar inputs, we set neurons' values for all inputs, and we do not limit ourselves to resampling.</p><p>Pruning: Model pruning <ref type="bibr" target="#b9">[10]</ref> is a common practice wherein reduced neural network parameter counts lessen memory and compute costs. In particular, structured pruning of large features <ref type="bibr" target="#b10">[11]</ref> concerns removal at the scale of neurons and attention heads, and can often achieve a large reduction in parameter count <ref type="bibr" target="#b11">[12]</ref>. Our work questions the assumption of using masks that set neuron values to zero.</p><p>Modularity: Research into activation sparsity <ref type="bibr" target="#b12">[13]</ref>, modularity <ref type="bibr" target="#b13">[14]</ref>, mixture of experts <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b15">16]</ref>, and unlearning by pruning <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b17">18]</ref> all investigate how different subsets of activations are responsible for different tasks. These approaches implicitly set activations to zero.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Pre-Trained Models and Datasets</head><p>We work with two causal text models, Mistral 7B <ref type="bibr" target="#b18">[19]</ref> and Meta's OPT 1.3B <ref type="bibr" target="#b19">[20]</ref>, a masked text model, RoBERTa Large <ref type="bibr" target="#b20">[21]</ref>, and a vision transformer, ViT Base Patch16 224 <ref type="bibr" target="#b21">[22]</ref>.</p><p>To get a general sense of performance, the above models were evaluated on Top1 prediction accuracy<ref type="foot" target="#foot_0">1</ref>, as well as cross-entropy loss, on various datasets. For text models, we assess on EleutherAI's 'The Pile' <ref type="bibr" target="#b22">[23]</ref>. For image models, we assess on ImageNet-1k <ref type="bibr" target="#b23">[24]</ref>, an image dataset with 1000 different classes. We evaluate on deterministic subsets of 100,000 text tokens and 1000 images respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Neurons</head><p>The objects of study are attention pre-out neurons, sometimes called 'z'-hook activations. We define attention pre-out neuron activations 𝑦 𝑖 = 𝑓 (𝑥 𝑖 ) = preout(𝑥 𝑖 ) as 𝑦 𝑖 = ∑︀ 𝑗 𝐴 𝑖,𝑗 𝑊 𝑉 𝑥 𝑗 , where 𝐴 𝑖,𝑗 = softmax 𝑗 ((𝑊 𝑄 𝑥 𝑖 ) • (𝑊 𝐾 𝑥 𝑗 )), and 𝑊 𝑄 , 𝑊 𝐾 , 𝑊 𝑉 are the attention query, key, and value matrices respectively. We focus on attention neurons rather than MLP neurons, as the former have no activation function privileging positive activations, which would make analysis more difficult. To ablate a neuron, we replace 𝑦 𝑖 = 𝑓 (𝑥 𝑖 ) with some constant.</p><p>In Figure <ref type="figure" target="#fig_0">1</ref>, we showcase plots of the probability distributions of many attention pre-out neuron activations within the same layer. We note that most neurons follow a roughly Gaussian or double-exponential distribution about zero, but that a minority of neurons are not distributed about zero. As most neurons are zero-centred and symmetric, it makes sense that zero and mean ablation work quite well.</p></div>
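The pre-out activations defined above can be sketched for a single attention head in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the toy matrix shapes are assumptions, and the usual 1/√d scaling and causal mask are omitted for brevity.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis (over positions j).
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_preout(X, W_Q, W_K, W_V):
    """Pre-out ('z'-hook) activations for one head:
    y_i = sum_j A[i, j] * (W_V x_j), A[i, :] = softmax_j((W_Q x_i) . (W_K x_j))."""
    Q = X @ W_Q.T         # one query per position i
    K = X @ W_K.T         # one key per position j
    V = X @ W_V.T         # one value per position j
    A = softmax(Q @ K.T)  # attention weights; each row sums to 1
    return A @ V          # pre-out neuron activations y_i

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))    # 5 positions, residual width 8 (toy sizes)
W_Q = rng.normal(size=(4, 8))  # head width 4
W_K = rng.normal(size=(4, 8))
W_V = rng.normal(size=(4, 8))
Y = attention_preout(X, W_Q, W_K, W_V)
```

Ablating neuron i of this head then means overwriting column i of `Y` with a constant before the output projection.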
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">A Working Model of Neuron Activation</head><p>Our hypothesis, based on activation profiles such as those seen in Figure <ref type="figure" target="#fig_0">1</ref>, is that neurons have a 'baseline' or 'default-mode' activation (typically at zero) when the input contains no relevant features, which is then deviated from as neurons fire in proportion to the various features they are tuned to pick up. In residual stream models <ref type="bibr" target="#b24">[25]</ref>, information is limited to the width of the residual stream <ref type="bibr" target="#b25">[26]</ref>, and as the residual stream typically grows exponentially in size <ref type="bibr" target="#b26">[27]</ref>, noise can become amplified. This is supported by the common redundancy of many circuits <ref type="bibr" target="#b27">[28]</ref>, even in transformer models trained without the use of dropout <ref type="bibr" target="#b28">[29]</ref>.</p><p>In particular, we expect ablating neurons to reduce performance through two contributors: 1) removing the relevant contextual information that the neuron computes, and 2) taking the model activations out of distribution by adding 'noise'. Ablating a neuron to any constant value should cause some fixed increase in loss from term 1, while different constants should contribute different amounts to term 2: as the constant moves further from the 'default-mode' value, it takes the residual stream further out of distribution, in some sense 'adding noise', and degrades performance further.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Ablation Methods</head><p>We choose four main methods of ablating neurons; see Table <ref type="table" target="#tab_0">1</ref> for a summary. These are: </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Zero ablation:</head><p>The most common form of ablation, which replaces a neuron's activation, whatever its value, with zero. That is, setting ∀𝑥 𝑗 : 𝑓 𝑖 (𝑥 𝑗 ) = 0.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Mean Ablation:</head><p>A still relatively common method of ablation, which first collects a neuron's activations on a distribution of inputs, and averages them to find a mean activation. That is, for some dataset 𝐷, set 𝑓 𝑖 (𝑥 𝑗 ) = (1/|𝐷|) ∑︀ 𝑥 𝑘 ∈𝐷 𝑓 𝑖 (𝑥 𝑘 ) for all 𝑥 𝑗 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Activation Resampling:</head><p>Inspired by <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>, we also try general neuron resampling, setting activations to those found by giving a randomised input.<ref type="foot" target="#foot_1">2</ref> For text models, we take activations by a) sampling randomly generated characters, b) sampling random tokens, and c) using OPT to generate a random text. For ViT, we use randomly generated pixel values.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Naive Peak Ablation:</head><p>Observing that neuronal activations frequently exhibit a prominent peak, we propose an ablation method targeting the modal activation. For bin size 𝜖, the neuron 𝑖 activations 𝑓 𝑖 (𝑥 𝑗 ) for each 𝑥 𝑗 ∈ 𝐷 are sorted into bins 𝑁 𝑖 [𝑘] such that 𝑦 𝑘 ≤ 𝑓 𝑖 (𝑥 𝑗 ) &lt; 𝑦 𝑘 + 𝜖. The bin 𝑁 𝑖 [𝑘 𝑚𝑎𝑥 ] with the highest occurrence is selected, and 𝑓 𝑖 (𝑥 𝑗 ) is set to 𝑦 𝑘 𝑚𝑎𝑥 + 𝜖/2.</p></div>
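The naive peak ablation procedure can be sketched as follows. The bin size and the toy activation sample are illustrative assumptions; the point is that when a neuron's modal 'default-mode' value is not zero, the histogram peak recovers it while zero ablation misses it and a long tail pulls the mean away.

```python
import numpy as np

def peak_ablation_value(acts, eps=0.05):
    """Naive peak ablation: sort activations f_i(x_j) into bins of width eps,
    pick the bin N_i[k_max] with the highest count, and return its centre
    y_kmax + eps/2."""
    lo = acts.min()
    bins = np.floor((acts - lo) / eps).astype(int)  # bin k: y_k <= f < y_k + eps
    k_max = np.bincount(bins).argmax()              # most occupied bin
    return lo + k_max * eps + eps / 2

# Toy neuron whose modal activation sits near 1.0 rather than 0,
# with a symmetric long tail: mean is 0.9, mode is ~1.0.
acts = np.concatenate([np.full(900, 1.0), np.linspace(-3.0, 3.0, 100)])
peak = peak_ablation_value(acts)
```

Computing the histogram once per neuron over a sample dataset D gives a closed-form replacement constant, just as with mean ablation.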
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Ablation Experiments</head><p>Under the working model described in Section 3.3, we expect that ablating neurons to different values should have different impacts on performance, with some value leading to a minimal drop in performance due to minimal noise being added to the residual stream. We randomly select attention neurons in increments of 10% and ablate them until the model is fully pruned; at each step, we assess performance by evaluating the Top1 accuracy and cross-entropy loss on the chosen dataset with each ablation method described in Table <ref type="table" target="#tab_0">1</ref>. The neurons are selected deterministically across three separate seeds, with results summarised in Table <ref type="table" target="#tab_1">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head></div>
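One step of this procedure can be sketched as follows, with a random activation matrix standing in for real model activations and hypothetical helper names (`ablation_constants`, `ablate`) that are not from the paper's codebase: estimate each closed-form method's per-neuron replacement constant from a sample D, then overwrite a randomly selected fraction of neurons with that constant before re-evaluating.

```python
import numpy as np

def ablation_constants(D, eps=0.05):
    """Per-neuron replacement values for zero, mean, and naive peak ablation,
    estimated from sampled activations D (shape: samples x neurons)."""
    def peak(a):
        lo = a.min()
        k = np.bincount(np.floor((a - lo) / eps).astype(int)).argmax()
        return lo + k * eps + eps / 2
    return {
        "zero": np.zeros(D.shape[1]),
        "mean": D.mean(axis=0),
        "peak": np.array([peak(D[:, i]) for i in range(D.shape[1])]),
    }

def ablate(acts, mask, const):
    # Replace the masked neurons' activations with the method's constant.
    out = acts.copy()
    out[:, mask] = const[mask]
    return out

rng = np.random.default_rng(0)
D = rng.normal(size=(1000, 16))  # stand-in for collected activations
consts = ablation_constants(D)
mask = np.zeros(16, dtype=bool)
mask[rng.choice(16, size=8, replace=False)] = True  # ablate 50% of neurons
acts = rng.normal(size=(4, 16))  # stand-in for a fresh evaluation batch
zeroed = ablate(acts, mask, consts["zero"])
```

In practice these replacements would be applied at the attention pre-out ('z') hook during the forward pass, with the mask fraction stepped from 10% to 100%.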
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Causal Text Models</head><p>In Figure <ref type="figure">2</ref>, we see the results for random pruning of OPT 1.3B and Mistral 7B with the different methods of ablation. Peak ablation has the most consistent pattern, causing the lowest degradation, with mean ablation and zero ablation a close second and third, while random resampling causes by far the most degradation. Of the three resampling methods, choosing random tokens causes the lowest degradation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Other Transformers</head><p>In Figure <ref type="figure">3</ref>, we see that for ViT, zero, mean, and peak ablation have statistically insignificant differences in performance, while resampling causes a small additional degradation. Almost all of the performance loss depends on which specific neurons are selected rather than on the ablation method chosen, even between zero, mean, and peak ablation.</p><p>In RoBERTa, we see that over the first 75% of pruning, the three main methods of peak, mean and zero ablation are very close, with peak ablation edging slightly ahead. Beyond 75%, the three methods become noisier; resampling of token IDs ends up with the best overall performance in both Top1 accuracy and cross-entropy loss at the task of token unmasking and de-randomisation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Overall Comparison</head><p>In Table <ref type="table" target="#tab_1">2</ref>, we see that in different models and in different regimes, the different methods have different merits in limiting performance degradation, with peak ablation working best overall in the most cases. Surprisingly, although random resampling generally adds a lot of noise to the activations, random token ID resampling can sometimes work well, as in RoBERTa.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>The analysis presented suggests that, when evaluating and understanding neurons in the attention layers of language models, the ideal centring method depends significantly on the model. In decoder models, a good method is to find the largest peak, with a close second being zero ablation. This similarity is expected, as most neurons are centred at zero. This has downstream effects on improving the way we can look at one of the most crucial aspects of how neural networks work: their activations.</p><p>We have seen that neurons can have activations that are non-Gaussian, non-symmetric, multi-modal, or non-zero-centred. We hypothesise that taking this into consideration has the potential to make interpretability analysis more fruitful, and centring activations by their peak seems a natural candidate method.</p><p>Future work could: 1) investigate other, potentially better methods for neuron recentring, 2) more thoroughly investigate the differences between 'well-behaved' symmetric zero-centred distributions and those that deviate from this norm, and 3) find more efficient ways of computing the peak activations for larger models.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Un-normalised probability density functions (histograms) of attention neuron activations in RoBERTa. We see in (left) an average of distributions of all neurons in a layer, (centre) a bi-modal neuron with both peaks not at zero, and (right) another example of a neuron with an atypical distribution. X-axis shows neuron value, and Y-axis shows probability of a neuron taking that value.</figDesc><graphic coords="3,267.63,88.40,83.34,115.41" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :Figure 3 :</head><label>23</label><figDesc>Figure 2: Change in Top1 next-token prediction accuracy (Top1) and cross-entropy loss (CE Loss) at different fractions of model pruned with different methods of ablation for Mistral 7B and OPT 1.3B</figDesc><graphic coords="5,403.90,237.99,102.09,100.52" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Comparison between the neural ablation methods described.</figDesc><table><row><cell>Method</cell><cell>Set the neuron activation...</cell></row><row><cell>Zero Ablation</cell><cell>...to zero</cell></row><row><cell>Mean Ablation</cell><cell>...to the mean value within the dataset D</cell></row><row><cell>Activation Resampling</cell><cell>...to some values from some different input</cell></row><row><cell>Naive Peak Ablation</cell><cell>...to the modal 'peak' activation value within the dataset D</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Performance impact of neural ablation methods on the attention neurons of OPT, Mistral, ViT and RoBERTa. Ablation methods are Peak, Mean and Zero ablation, as well as Resampling (RS) with random characters (RS1), token IDs (RS2) and generated text (RS3) for text models, and random pixels (RS1) for ViT. Models are pruned by randomly selecting 50% and 90% of neurons.</figDesc><table><row><cell cols="2">Top1 Accuracy</cell><cell>OPT</cell><cell>Mistral</cell><cell>ViT</cell><cell>RoBERTa</cell></row><row><cell cols="2">Baseline</cell><cell>55.05 ± 0.00</cell><cell>60.05 ± 0.00</cell><cell>80.32 ± 0.00</cell><cell>73.04 ± 0.00</cell></row><row><cell></cell><cell>Peak</cell><cell>44.54 ± 0.19</cell><cell>51.30 ± 0.05</cell><cell>47.72 ± 3.58</cell><cell>52.55 ± 1.27</cell></row><row><cell></cell><cell>Mean</cell><cell>42.39 ± 1.84</cell><cell>49.71 ± 0.70</cell><cell>48.65 ± 5.38</cell><cell>51.10 ± 1.41</cell></row><row><cell>50%</cell><cell>Zero</cell><cell>41.77 ± 2.76</cell><cell>48.03 ± 0.50</cell><cell>48.65 ± 2.54</cell><cell>48.69 ± 3.75</cell></row><row><cell>Pruning</cell><cell>RS1</cell><cell>25.97 ± 2.24</cell><cell>27.01 ± 1.12</cell><cell>39.29 ± 1.84</cell><cell>18.57 ± 0.93</cell></row><row><cell></cell><cell>RS2</cell><cell>27.51 ± 1.02</cell><cell>26.10 ± 1.34</cell><cell>-</cell><cell>24.83 ± 1.84</cell></row><row><cell></cell><cell>RS3</cell><cell>29.90 ± 1.65</cell><cell>17.93 ± 0.33</cell><cell>-</cell><cell>16.10 ± 1.12</cell></row><row><cell></cell><cell>Peak</cell><cell>12.81 ± 0.29</cell><cell>11.70 ± 0.26</cell><cell>0.20 ± 0.00</cell><cell>6.37 ± 0.62</cell></row><row><cell></cell><cell>Mean</cell><cell>12.59 ± 0.39</cell><cell>10.84 ± 0.32</cell><cell>0.17 ± 0.17</cell><cell>10.46 ± 0.22</cell></row><row><cell>90%</cell><cell>Zero</cell><cell>11.05 ± 0.53</cell><cell>9.53 ± 0.34</cell><cell>0.50 ± 0.37</cell><cell>11.20 ± 0.60</cell></row><row><cell>Pruning</cell><cell>RS1</cell><cell>6.18 ± 0.43</cell><cell>1.03 ± 0.12</cell><cell>0.37 ± 0.39</cell><cell>3.19 ± 0.33</cell></row><row><cell></cell><cell>RS2</cell><cell>7.35 ± 0.46</cell><cell>7.41 ± 0.25</cell><cell>-</cell><cell>7.55 ± 0.06</cell></row><row><cell></cell><cell>RS3</cell><cell>5.55 ± 0.17</cell><cell>4.11 ± 0.09</cell><cell>-</cell><cell>2.26 ± 0.18</cell></row><row><cell cols="2">Cross-Entropy Loss</cell><cell>OPT</cell><cell>Mistral</cell><cell>ViT</cell><cell>RoBERTa</cell></row><row><cell cols="2">Baseline</cell><cell>2.24 ± 0.00</cell><cell>1.89 ± 0.00</cell><cell>0.77 ± 0.00</cell><cell>3.75 ± 0.00</cell></row><row><cell></cell><cell>Peak</cell><cell>2.93 ± 0.01</cell><cell>2.35 ± 0.08</cell><cell>2.52 ± 0.23</cell><cell>5.00 ± 0.09</cell></row><row><cell></cell><cell>Mean</cell><cell>3.09 ± 0.14</cell><cell>2.43 ± 0.07</cell><cell>2.47 ± 0.33</cell><cell>5.19 ± 0.03</cell></row><row><cell>50%</cell><cell>Zero</cell><cell>3.19 ± 0.24</cell><cell>2.49 ± 0.06</cell><cell>2.46 ± 0.17</cell><cell>5.33 ± 0.33</cell></row><row><cell>Pruning</cell><cell>RS1</cell><cell>4.53 ± 0.20</cell><cell>4.52 ± 0.16</cell><cell>3.22 ± 0.16</cell><cell>8.68 ± 0.22</cell></row><row><cell></cell><cell>RS2</cell><cell>4.89 ± 0.13</cell><cell>4.71 ± 0.15</cell><cell>-</cell><cell>7.85 ± 0.18</cell></row><row><cell></cell><cell>RS3</cell><cell>4.26 ± 0.16</cell><cell>5.88 ± 0.10</cell><cell>-</cell><cell>10.09 ± 0.20</cell></row><row><cell></cell><cell>Peak</cell><cell>6.33 ± 0.04</cell><cell>6.45 ± 0.03</cell><cell>7.10 ± 0.06</cell><cell>13.53 ± 0.81</cell></row><row><cell></cell><cell>Mean</cell><cell>6.35 ± 0.06</cell><cell>6.87 ± 0.12</cell><cell>7.07 ± 0.04</cell><cell>13.31 ± 0.60</cell></row><row><cell>90%</cell><cell>Zero</cell><cell>6.90 ± 0.10</cell><cell>6.75 ± 0.04</cell><cell>6.99 ± 0.04</cell><cell>12.99 ± 0.54</cell></row><row><cell>Pruning</cell><cell>RS1</cell><cell>8.40 ± 0.15</cell><cell>12.71 ± 0.18</cell><cell>7.13 ± 0.03</cell><cell>14.37 ± 0.06</cell></row><row><cell></cell><cell>RS2</cell><cell>7.80 ± 0.11</cell><cell>7.25 ± 0.03</cell><cell>-</cell><cell>11.32 ± 0.09</cell></row><row><cell></cell><cell>RS3</cell><cell>8.67 ± 0.09</cell><cell>10.80 ± 0.08</cell><cell>-</cell><cell>20.85 ± 0.19</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">top1 token prediction accuracy for language models, top1 image classification accuracy for image models</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">This differs slightly from the original description: in other research, a specific task, such as circuit analysis<ref type="bibr" target="#b2">[3]</ref>, is used for the activation resampling, where a specific prompt template already exists.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ł</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Meyes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">W</forename><surname>De Puiseau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Meisen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1901.08644</idno>
		<title level="m">Ablation studies in artificial neural networks</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Causal scrubbing: a method for rigorously testing interpretability hypotheses, AI Alignment Forum</title>
		<author>
			<persName><forename type="first">L</forename><surname>Chan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Garriga-Alonso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goldowsky-Dill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Greenblatt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Nitishinskaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radhakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Shlegeris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Thomas</surname></persName>
		</author>
		<ptr target="https://www.alignmentforum.org/posts/JvZhhzycHu2Yd57RN" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nanda</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2309.16042</idno>
		<title level="m">Towards best practices of activation patching in language models: Metrics and methods</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Transformer feed-forward layers are key-value memories</title>
		<author>
			<persName><forename type="first">M</forename><surname>Geva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Schuster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Berant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Moens</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Huang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Yih</surname></persName>
		</editor>
		<meeting>the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana<address><addrLine>, Dominican Republic</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021-11-11">7-11 November, 2021. 2021</date>
			<biblScope unit="page" from="5484" to="5495" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Towards automated circuit discovery for mechanistic interpretability</title>
		<author>
			<persName><forename type="first">A</forename><surname>Conmy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Mavor-Parker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lynch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Heimersheim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Garriga-Alonso</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2304.14997</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<ptr target="https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens" />
		<title level="m">nostalgebraist, interpreting gpt: the logit lens, LessWrong</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Eliciting latent predictions from transformers with the tuned lens</title>
		<author>
			<persName><forename type="first">N</forename><surname>Belrose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Furman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Halawi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ostrovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Mckinney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Biderman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Steinhardt</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2303.08112</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Olsson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Elhage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nanda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Joseph</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>DasSarma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Henighan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2209.11895</idno>
		<title level="m">In-context learning and induction heads</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">What is the state of neural network pruning?</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">W</forename><surname>Blalock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J G</forename><surname>Ortiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Frankle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">V</forename><surname>Guttag</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Machine Learning and Systems 2020, MLSys 2020</title>
				<editor>
			<persName><forename type="first">I</forename><forename type="middle">S</forename><surname>Dhillon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Papailiopoulos</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Sze</surname></persName>
		</editor>
		<meeting>Machine Learning and Systems 2020, MLSys 2020<address><addrLine>Austin, TX, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020-03">March 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Structured pruning of large language models</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wohlwend</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online</title>
				<editor>
			<persName><forename type="first">B</forename><surname>Webber</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Cohn</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</editor>
		<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online</meeting>
		<imprint>
			<date type="published" when="2020-11">November 16-20, 2020</date>
			<biblScope unit="page" from="6151" to="6162" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">SparseGPT: Massive language models can be accurately pruned in one-shot</title>
		<author>
			<persName><forename type="first">E</forename><surname>Frantar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Alistarh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
		<imprint>
			<publisher>PMLR</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="10323" to="10337" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Deja Vu: Contextual sparsity for efficient LLMs at inference time</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Dao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shrivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ré</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning, ICML 2023</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Krause</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Brunskill</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Engelhardt</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Sabato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Scarlett</surname></persName>
		</editor>
		<meeting><address><addrLine>Honolulu, Hawaii, USA</addrLine></address></meeting>
		<imprint>
			<publisher>PMLR</publisher>
			<date type="published" when="2023-07">July 2023</date>
			<biblScope unit="volume">202</biblScope>
			<biblScope unit="page" from="22137" to="22176" />
		</imprint>
	</monogr>
	<note>Proceedings of Machine Learning Research</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Emergent modularity in pre-trained transformers</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=XHuQacT6sa6" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">MoEfication: Transformer feed-forward layers are mixtures of experts</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.findings-acl.71</idno>
		<ptr target="https://doi.org/10.18653/v1/2022.findings-acl.71" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: ACL 2022</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Muresan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Villavicencio</surname></persName>
		</editor>
		<meeting><address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022-05">May 22-27, 2022</date>
			<biblScope unit="page" from="877" to="890" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Modular deep learning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pfeiffer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ruder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Vulic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Ponti</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2302.11529</idno>
		<idno type="arXiv">arXiv:2302.11529</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2302.11529" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Dissecting large language models</title>
		<author>
			<persName><forename type="first">N</forename><surname>Pochinkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Schoots</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Socially Responsible Language Modelling Research</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Fast machine unlearning without retraining through selective synaptic dampening</title>
		<author>
			<persName><forename type="first">J</forename><surname>Foster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schoepf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Brintrup</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2308.07707</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Q</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sablayrolles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mensch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bamford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Chaplot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>De Las Casas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bressand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lengyel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Saulnier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">R</forename><surname>Lavaud</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-A</forename><surname>Lachaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Stock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L</forename><surname>Scao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lavril</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lacroix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">E</forename><surname>Sayed</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m" type="main">Mistral 7B</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Artetxe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Dewan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Diab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">V</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mihaylov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shleifer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Shuster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Simig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Koura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sridhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2205.01068</idno>
		<idno>CoRR abs/2205.01068</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2205.01068" />
		<title level="m">OPT: open pre-trained transformer language models</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">RoBERTa: A robustly optimized BERT pretraining approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno>CoRR abs/1907.11692</idno>
		<ptr target="http://arxiv.org/abs/1907.11692" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">An image is worth 16x16 words: Transformers for image recognition at scale</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dosovitskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Beyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kolesnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Weissenborn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Unterthiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dehghani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Minderer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Heigold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Houlsby</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=YicbFdNTTy" />
	</analytic>
	<monogr>
		<title level="m">9th International Conference on Learning Representations, ICLR 2021, Virtual Event</title>
				<meeting><address><addrLine>Austria</addrLine></address></meeting>
		<imprint>
			<publisher>OpenReview</publisher>
			<date type="published" when="2021-05">May 3-7, 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Biderman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Black</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Golding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hoppe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Foster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Phang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Thite</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nabeshima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Presser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Leahy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2101.00027</idno>
		<title level="m">The Pile: An 800GB dataset of diverse text for language modeling</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">ImageNet Large Scale Visual Recognition Challenge</title>
		<author>
			<persName><forename type="first">O</forename><surname>Russakovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Krause</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Satheesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Karpathy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Khosla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bernstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Berg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fei-Fei</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11263-015-0816-y</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Vision (IJCV)</title>
		<imprint>
			<biblScope unit="volume">115</biblScope>
			<biblScope unit="page" from="211" to="252" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Deep residual learning for image recognition</title>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Conference on Computer Vision and Pattern Recognition</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="770" to="778" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Thorpe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Van Gennip</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.11741</idno>
		<title level="m">Deep limits of residual neural networks</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Heimersheim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Turner</surname></persName>
		</author>
		<ptr target="https://www.alignmentforum.org/posts/8mizBCm3dyc432nK8/residual-stream-norms-grow-exponentially-over-the-forward" />
		<title level="m">Residual stream norms grow exponentially over the forward pass</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Variengien</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Conmy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Shlegeris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Steinhardt</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2211.00593</idno>
		<title level="m">Interpretability in the wild: a circuit for indirect object identification in GPT-2 small</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">The hydra effect: Emergent self-repair in language model computations</title>
		<author>
			<persName><forename type="first">T</forename><surname>McGrath</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rahtz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kramár</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Mikulik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Legg</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2307.15771</idno>
		<idno type="arXiv">arXiv:2307.15771</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2307.15771" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
