<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sparse Autoencoders Find Partially Interpretable Features in Italian Small Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessandro Bondielli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucia Passaro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Lenci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CoLing Lab, Department of Philology</institution>
          ,
          <addr-line>Literature and Linguistics</addr-line>
          ,
          <institution>University of Pisa</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Pisa</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>Sparse Autoencoders (SAEs) have become a popular technique to identify interpretable concepts in Language Models. They have been successfully applied to several models of varying sizes, including both open and commercial ones, and have become one of the main avenues for interpretability research. A number of approaches have been proposed to extract latents from the model, as well as to automatically provide natural language explanations for the concepts they supposedly represent. Despite these advances, little attention has been given to applying SAEs to Italian language models. This may be due to several factors: i) the small number of Italian models; ii) the costs associated with leveraging SAEs, which include the training itself, as well as the necessity to parse and assign an interpretation to a very large number of features. In this work, we present an initial step toward addressing this gap. We train a SAE on the residual stream of the Minerva-1B-base-v1.0 model, for which we release the weights; we leverage an automated interpretability pipeline based on LLMs to evaluate the quality of the latents and to provide explanations for some of them. We show that, although the approach has several limitations, we do find some concepts encoded in the weights of the model.</p>
      </abstract>
      <kwd-group>
<kwd>Mechanistic Interpretability</kwd>
        <kwd>Sparse Autoencoders</kwd>
<kwd>Large Language Models</kwd>
        <kwd>Italian</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>The rise of Large Language Models (LLMs) has profoundly affected the landscape of Natural Language Processing (NLP). These models have demonstrated remarkable capabilities in many tasks, often achieving near-human performance and saturating benchmarks as soon as they are released. Nevertheless, many questions remain about their internal workings: whether and how they perform some form of reasoning [
        <xref ref-type="bibr" rid="ref23">1</xref>
        ], and to what extent their grasp of concepts through natural language approximates human conceptual understanding.</p>
      <p>The aim of Mechanistic Interpretability (MechInterp) is to address this pressing issue by attempting to reverse-engineer the learned representations and algorithms within their neural networks [
        <xref ref-type="bibr" rid="ref36 ref39">2</xref>
        ]. A promising technique within MechInterp is the use of sparse dictionary learning methods like Sparse Autoencoders (SAEs) [
        <xref ref-type="bibr" rid="ref13">3</xref>
        ]. The idea behind SAEs is similar to that of standard autoencoders. Autoencoders are unsupervised models that learn two functions: an encoding function, which projects the input data from an n-dimensional space into a d-dimensional space, and a decoding function, which should reconstruct the d-dimensional data back into the original n-dimensional one. Autoencoders are typically used for dimensionality reduction, i.e., d &lt;&lt; n. In the case of SAEs, instead, d &gt;&gt; n: the model is trained to project the input space into a much higher-dimensional (and thus sparser) one, and then project it back into the original dimensional space. In our context, SAEs are trained to reconstruct the internal activations of a language model’s residual stream by projecting them into a higher-dimensional latent space, while being constrained to use only a small number of “features” from a learned dictionary. This sparsity constraint encourages the SAE to learn a set of monosemantic features, also referred to as latents, that is, features each corresponding to a single, hopefully more interpretable concept [4]. This is in contrast with a polysemantic representation, which is typical of standard dense neural networks [
        <xref ref-type="bibr" rid="ref3 ref5">5, 6</xref>
        ], in which several concepts are superimposed in the same activation patterns. SAEs allow us to decompose model activations into a set of near-orthogonal, i.e., largely disentangled, features that should be semantically coherent.</p>
      <p>Recent work has demonstrated the effectiveness of SAEs in uncovering meaningful features within both toy models [
        <xref ref-type="bibr" rid="ref8">7</xref>
        ] and large-scale commercial LMs, revealing representations for concepts ranging from concrete objects to abstract ideas [
        <xref ref-type="bibr" rid="ref12 ref15 ref9">8, 9, 10</xref>
        ]. As noted in [
        <xref ref-type="bibr" rid="ref12">9</xref>
        ], several distinctive features have been identified in Claude-3.5-Sonnet – most notably, one corresponding to the “Golden Gate Bridge.” SAEs have also been applied successfully to smaller, English-centric models in the 1 to 10 Billion parameter range [11]. This class of models is becoming more and more relevant, as research on Small Language Models (SLMs) [12] and Baby Language Models (BabyLMs) [13, 14], which mitigate the costs of training and serving LLMs while attempting to retain most of their abilities, is a very active endeavour, particularly in the open-source/open-weights community.</p>
      <p>CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics, September 24–26, 2025, Cagliari, Italy. * Corresponding author. † These authors contributed equally. alessandro.bondielli@unipi.it (A. Bondielli); lucia.passaro@unipi.it (L. Passaro); alessandro.lenci@unipi.it (A. Lenci). 0000-0003-3426-6643 (A. Bondielli); 0000-0003-4934-5344 (L. Passaro); 0000-0001-5790-43086 (A. Lenci). © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>Two key limitations remain for the applicability of SAEs to achieve interpretability. First, the computational cost of training a SAE. Given their nature, the internal layer of a SAE has to be a number of times larger than the width of the residual stream of the target LM. The number of parameters of a SAE thus scales with a factor of the hidden size of the model, multiplied by the number of hookpoints in the model where activations are collected (e.g., after every transformer block/layer). Thus, the larger the target LM, the bigger and the more computationally expensive the SAE.</p>
      <p>Second, and most importantly, SAEs output a large number of features, which then have to be interpreted in some way. While the literature has not reached a consensus on what the best practice is, a popular method to address this is to leverage another LLM to provide explanations for the features based on examples of which tokens (and respective contexts) they fired on. For example, if a feature fired on 10 tokens, the explainer model is fed with these tokens, their contexts, and the request to find a common property among them. In most works, commercial LLMs with hundreds of billions of parameters are successfully used for this task [
        <xref ref-type="bibr" rid="ref12 ref15">9, 10</xref>
        ]. However, researchers have also shown that smaller and cheaper LMs can be leveraged effectively as well [15].</p>
      <p>The vast majority of efforts regarding the use of SAEs for interpretability has focused on English-centric LMs [
        <xref ref-type="bibr" rid="ref12 ref15">9, 10, 11</xref>
        ]. In addition to this, several efforts have been made in the direction of finding universal features that apply across models and languages [16, 17]. However, models primarily trained on languages other than English have received less attention.</p>
      <p>In this work, we aim to provide an early evaluation of the feasibility of using SAEs to interpret models trained to be natively Italian. In the interest of maintaining a limited computational cost, we chose to use Minerva-1B-base-v1.0 from the Minerva model family [18]. We trained a SAE on the residual stream of every layer of the model using an Italian split of mC4 [19]. Then, we collected feature activations for the Italian dump of Wikipedia [20], and attempted to explain them and score the explanations automatically using an LLM, following [15].</p>
      <p>Our contributions are the following:
• We train and release a Sparse Autoencoder on Minerva-1B-base-v1.0. We make the Autoencoder weights available to the research community via HuggingFace.1
• We collect feature activations from a relatively large collection of Italian data, and provide a quantitative and qualitative evaluation of the explanations using an auto-interpretability pipeline. We show that SAEs are promising for finding concepts in Italian SLMs, but auto-interpretability pipelines show several limitations for Italian.
• We report on the challenges and lessons learned on training and using SAEs, especially in computationally constrained settings.</p>
      <p>This paper is organised as follows: in Section 2 we detail the training procedure of the SAE; Section 3 provides an overview of the auto-interpretability pipeline we employ; in Section 4 we present and discuss the obtained results; finally, Section 5 draws some conclusions and highlights future work.</p>
      <sec id="sec-1-1">
        <title>2. SAE Training</title>
        <p>In the following, we detail the data and procedure used to train the SAE on the Minerva-1B-base-v1.0 SLM. We trained the SAE on the residual stream of the model, with hookpoints on the outputs of each attention block. For our experiments we used the Sparsify library from EleutherAI,2 which is built to roughly follow the training recipe presented in [
        <xref ref-type="bibr" rid="ref15">10</xref>
        ] for a GPT-4 SAE. It trains a k-Sparse Autoencoder [21]. The autoencoder uses a TopK activation function that allows for direct control over the number of active latents. Specifically, it only keeps the k largest latents and assigns zero to the rest. Authors in [
        <xref ref-type="bibr" rid="ref15">10</xref>
        ] argue that this eliminates the need for the L1 penalty, which biases activations toward zero and is only a rough proxy for L0, and supports any activation function. They also show that it outperforms ReLU autoencoders in sparsity-reconstruction tradeoffs and enhances monosemanticity, as small activations are clamped to zero.</p>
        <p>Recipe. A full breakdown of the most relevant parameters selected for training is presented in Table 1. The parameters were chosen following recipes for similarly sized models, e.g. [11]. The expansion factor controls the size of the hidden layer, and is a multiplier over the model’s hidden size. In our case, an expansion factor of 32 yields a hidden layer of 2,048 × 32 = 65,536 latents.</p>
        <p>1 https://huggingface.co/alessandrobondielli/sae-Minerva-1B-32x — the model can be used with the Sparsify and Delphi libraries for interpretability. 2 https://github.com/EleutherAI/sparsify</p>
      </sec>
      <sec id="sec-1-2">
        <title>Data.</title>
        <p>As for the training data, we chose to use mC4 [22]. Specifically, we consider the “tiny” split of the clean_mc4_it dataset [19]. It includes 6 Billion tokens (4 Billion words). The choice of the dataset was made on the basis that it is relatively large, especially for the Italian language, and it includes a variety of different texts. The data was not included in the training set of Minerva-1B-base-v1.0. We chose to use 6 Billion tokens following recent literature on training SAEs for similar-sized models [11].</p>
      </sec>
      <sec id="sec-1-3">
        <title>Setup.</title>
        <p>We trained our model on a single Nvidia A100 with 80 GB VRAM. A full training run required 200 GPU hours, which roughly equates to 8 days. The final model, which we call sae-Minerva-1B-32x, occupies around 40 GB of disk space including hookpoints to all layers. The final model is available on HuggingFace 3 and can be loaded and used with Sparsify.</p>
      </sec>
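<p>To make the TopK mechanism described above concrete, the following minimal numpy sketch shows an SAE forward pass in which only the k largest latents per input are kept. This is an illustration under toy sizes with hypothetical names, not the Sparsify implementation.</p>

```python
# Illustrative TopK sparse autoencoder forward pass (numpy sketch, not the
# Sparsify code). Training (reconstruction loss, weight tying) is not shown.
import numpy as np

def topk(z, k):
    """TopK activation: keep the k largest entries per row, zero the rest."""
    idx = np.argpartition(z, -k, axis=-1)[:, -k:]
    out = np.zeros_like(z)
    rows = np.arange(z.shape[0])[:, None]
    out[rows, idx] = z[rows, idx]
    return out

def sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Encode to the overcomplete latent space, sparsify, decode back."""
    latents = topk(x @ W_enc + b_enc, k)   # (batch, d_latent), k nonzeros per row
    recon = latents @ W_dec + b_dec        # (batch, d_model)
    return latents, recon

# Toy sizes for illustration; the paper's SAE uses d_model = 2048 and an
# expansion factor of 32, i.e. 2048 * 32 = 65,536 latents.
rng = np.random.default_rng(0)
d_model, d_latent, k = 8, 32, 4
W_enc = rng.normal(size=(d_model, d_latent))
W_dec = rng.normal(size=(d_latent, d_model))
b_enc, b_dec = np.zeros(d_latent), np.zeros(d_model)

x = rng.normal(size=(5, d_model))
latents, recon = sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
assert latents.shape == (5, d_latent) and recon.shape == (5, d_model)
assert ((latents != 0).sum(axis=1) == k).all()
```

<p>Because exactly k latents are nonzero per input, sparsity is controlled directly rather than through an L1 penalty.</p>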
    </sec>
    <sec id="sec-2">
      <title>3. Auto-Interpretation of Features</title>
      <sec id="sec-2-1">
<p>For finding and explaining latents of the SAE models, we use the auto-interpretability pipeline proposed in [15]. It is implemented via the Delphi library from EleutherAI.4 The library includes tools for generating and scoring text explanations for SAE latents.</p>
        <p>The auto-interpretability pipeline has three main steps: collecting activations, generating explanations, and scoring the explanations.</p>
      </sec>
      <sec id="sec-2-2">
<p>In the following we detail our implementation of the pipeline.</p>
        <p>Collecting Activations. As for the text dataset, we chose to use 20 Million tokens from the Italian subset of the November 2023 Wikipedia dump [20] available on HuggingFace.5 The choice of Wikipedia as our test dataset, rather than a sample of the SAE training data (clean_mc4_it), was made with the purpose of increasing the probability of finding concepts specific to the Italian language and culture, which could have been left out of a relatively small sample of a web-based dataset. We created equal-sized batches from the texts, shuffled them, and then collected their token-level activations. We collected the activations at three hookpoints, namely at layers 2, 8 and 14. We did so with the aim of understanding whether there is any difference in the features found near the beginning, middle, or end of the residual stream. In the following we use the hookpoint notation to refer to layers, namely Layer.N.</p>
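<p>As an illustration of the collection step, the sketch below selects the top-activating token positions for one latent from a matrix of token-level activations. The helper name is hypothetical; this is not the Delphi API.</p>

```python
# Illustrative sketch (not Delphi): from a (num_tokens, num_latents) matrix of
# SAE activations, gather the positions where a given latent fires most strongly.
import numpy as np

def top_activating_positions(acts, latent, n=40):
    """Return indices of the n tokens on which `latent` fires most strongly,
    keeping only positions where the latent actually fired (activation > 0)."""
    col = acts[:, latent]
    order = np.argsort(col)[::-1][:n]
    return [int(i) for i in order if col[i] > 0]
```

<p>In our setting, positions with fewer than 40 firing examples would lead to the latent being skipped in the explanation step.</p>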
        <sec id="sec-2-2-1">
<title>Generating Explanations.</title>
          <p>As for the explanation generation step, we followed the same procedure as [15]. We showed the Explainer LLM 40 examples of the activating tokens and their contexts. We used a context length of 32 tokens. The activating token can be in any of the 32 positions, but is highlighted as "« token »". We show an example of explanation generation in Figure 1.</p>
          <p>To limit the computational cost, we attempted to generate explanations only for a sample of 2,000 latents selected from the pool of 65k. Latents with fewer than 40 examples were skipped. We used the number of latents with enough examples at each hookpoint in the residual stream to highlight their differences.</p>
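<p>The highlighting of the activating token described above can be sketched as follows (an illustration, not the Delphi prompt-construction code; the function name is hypothetical):</p>

```python
# Illustrative sketch: format one activating example for the Explainer prompt,
# marking the activating token as "« token »" within its context window.
def format_example(tokens, firing_index):
    out = []
    for i, tok in enumerate(tokens):
        out.append(f"« {tok} »" if i == firing_index else tok)
    return " ".join(out)

print(format_example(["Il", "gatto", "dorme"], 1))
# → Il « gatto » dorme
```
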
<p>The chosen model to generate explanations is Meta-Llama-3.1-8B-Instruct-AWQ-INT4,6 a quantized version of Meta-Llama-3.1-8B-Instruct [23]. We prompted the model both in English and in Italian. For the English prompt, we used the one provided in [15] for the zero-shot experiment. The Italian version is a direct translation of the English prompt. The translation was made semi-automatically: first, the prompts were translated with Gemini-2.5 Pro.7 Then, the translated prompt was manually revised to ensure its quality.8</p>
          <p>3 https://huggingface.co/alessandrobondielli/sae-Minerva-1B-32x 4 https://github.com/EleutherAI/delphi</p>
          <p>Scoring Explanations. The Scorer model was shown the generated explanation together with a set of five examples and was asked to decide, for each example, whether it corresponded to the explanation, and to output a list of decisions. If the output did not match a list of decisions, it was assigned None. The output was then compared with the ground truth provided by the activations. The model used for scoring was the same one used to generate explanations. As for the prompt and its translation into Italian, we followed the same translation procedure. We evaluated the quality of explanations with accuracy. Specifically, we considered a per-sample accuracy (i.e., how many out of the five examples the scorer model got right) and the average accuracy across latents for the same hookpoint.</p>
          <p>We acknowledge that our choice of using a multilingual, relatively small, and quantized LLM for generating and scoring explanations is far from ideal, and it is not an adequate substitute for either human evaluation or more performing LLMs. The choice of a multilingual model rather than an Italian-only one was made due to the current lack of such models with open weights, high performance, and the capability to follow instructions. This choice also led to prompting the model both in English and Italian; this was done to assess its explanation/scoring capabilities both in its “native” language, albeit on data from another language, and in Italian, in order to limit potential biases in the interpretation of results from using only one or the other language. As for the choice of a medium-sized quantized model, this was made in the interest of limiting the computational costs of our experiments, i.e., both in terms of the memory footprint of the model and of the overall GPU hours. Using larger (including non-quantized) models would have drastically increased both the resources needed and the overall time of the experiments. Nonetheless, we argue that our choice represents a lower-cost alternative to using much larger and costlier models, which could prove especially useful to provide some early insights into the quality of the latents found by the SAE, and of the model being interpreted.</p>
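<p>The per-sample accuracy described in this section can be sketched as follows (illustrative, not the Delphi implementation; unparsable Scorer outputs are treated as None and count as incorrect):</p>

```python
# Hedged sketch of the scoring metric: per-latent accuracy over the five scored
# examples; None (unparsable Scorer output) counts as incorrect.
def per_latent_accuracy(decisions, ground_truth):
    """decisions: Scorer outputs (True/False/None); ground_truth: bools."""
    correct = sum(1 for d, g in zip(decisions, ground_truth)
                  if d is not None and d == g)
    return correct / len(ground_truth)

# e.g. one latent scored on five examples, with one unparsable output
acc = per_latent_accuracy([True, None, False, True, True],
                          [True, False, False, False, True])  # 3/5 correct
```

<p>Averaging these per-latent values over all latents at a hookpoint gives the aggregate accuracy reported below.</p>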
<p>Authors in [15] estimate a cost in the order of hundreds or thousands of dollars for explaining and scoring 100k latents with larger or commercial models; our experiments, in contrast, can be easily replicated on a single GPU. In our case, generating and scoring explanations for 2,000 latents at three different hookpoints, in two different languages, took 0.5 GPU hours each on a single Nvidia A100, for a grand total of 3 GPU hours. Given the size of the model used, the experiments could also be replicated on much less performant hardware, provided a trade-off on GPU hours.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Results and Discussion</title>
<p>In the following, we present our results. First, we show a quantitative evaluation of the extracted latents, and the performance of the generation and scoring pipeline, both with Italian and English prompts. To explore the results in greater depth, we also perform a qualitative evaluation. We consider the explanations that received the highest scores by the scorer model. We use the results to discuss the feasibility of the proposed approach on Italian SLMs, as well as potential shortcomings.</p>
      <sec id="sec-3-1">
        <title>4.1. Quantitative Evaluation</title>
        <p>The core of our quantitative analysis is based on the results we obtained using the Delphi library, with the configuration presented in Section 3.</p>
        <p>Quality of the Latents. To evaluate the quality of the latents obtained via the SAE encoding, several metrics can be used. Recall that we collected latent activations using 20 Million tokens from the Italian subset of Wikipedia. Note also that here we are not yet using prompts, so we do not distinguish between Italian and English. Table 2 provides several common metrics used to evaluate the quality of the extracted latents at each hookpoint. First, we look at the fraction of alive latents. A latent is considered alive if at least one input token in the dataset made it fire. With the exception of Layer.8, the other two hookpoints have much smaller fractions of alive latents than is typical for SAEs (see for example results reported in</p>
<p>
            <xref ref-type="bibr" rid="ref15">10</xref>
             and [11]). This may be the result of several factors. On the SAE side, we could hypothesize an overcomplete latent space for the evaluation data, i.e. a latent space too broad for encoding the evaluation data. Recall in fact that we used mC4 to train the SAE, and evaluated it on Wikipedia, which may present less variety in terms of texts.</p>
        <p>On the Language Model side, we could hypothesize that the latent space of the analyzed model is very anisotropic at both the earliest and latest layers, while more isotropic near the middle of the stack. This, however, is in direct contrast with works such as [24], and thus requires a more in-depth analysis, which we leave to future work. Another interesting aspect to consider are weak and strong single-token latents, that is, latents that fire on a specific token only. Weak ones are those for which the token in question makes many other latents fire; strong ones are cases where the token preferentially activates the specific latent. We observe that Layer.2 is heavily biased towards single-token latents. This may indicate that the earliest layers still leverage the embedding representation quite strongly. Finally, we see that latents that fired either more than 1% or more than 10% of the time become fewer as we move along the residual stream. These latents may be used to store single-token concepts of words such as function ones.</p>
        <p>Table 2: Latent activity statistics across selected layers.
Metric | Layer.2 | Layer.8 | Layer.14
Fraction of latents alive (%) | 72.02 | 95.16 | 84.65
Latents fired more than 1% of the time (%) | 0.27 | 0.40 | 0.38
Latents fired more than 10% of the time (%) | 0.06 | 0.05 | 0.01
Weak single-token latents (%) | 9.93 | 2.20 | 2.77
Strong single-token latents (%) | 12.40 | 0.55 | 0.47</p>
        <sec id="sec-3-1-1">
          <title>Quality of the Explanations.</title>
          <p>To evaluate the quality of explanations, we consider the results of the explanation generation and scoring pipeline. Specifically, for each latent, we compute the accuracy at distinguishing between sequences that activate and do not activate the latent. Figures 2 and 3 show, respectively, the distribution of accuracy for the scorer model using Italian and English prompts for each hookpoint in the residual stream.</p>
          <p>We observe that, in both cases, there are significant differences both in distribution and averages for the three hookpoints. We also observe that explanations for latents extracted from later layers seem to be easier to score correctly for the scorer model. This may indicate that concepts identified in later layers are, on average, more easily interpretable by an LLM. The accuracy scores obtained using the Italian prompt are generally higher than those for the English one, with average scores ranging from 0.64 to 0.69; the English ones, in contrast, range from 0.55 to 0.62. However, these results in isolation cannot be taken as a direct indication that explanations in Italian are better than English ones. It may as well be the result of poorer and broader explanations provided by the Explainer model.</p>
          <p>We also plot the aggregate confusion matrices over all the predictions of both prompts. The confusion matrices are shown in Figure 4. While the model prompted in Italian seems to fare better in all metrics except for True Positives, we also see that the number of times the model was not able to follow instructions and provide a prediction with the Italian prompt is three times higher than with the English one. This may be a further indication that the Explainer/Scorer model used struggles with Italian.</p>
        </sec>
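<p>The latent-activity statistics reported in Table 2 can be sketched as follows, assuming a (num_tokens, num_latents) activation matrix. This is an illustration of the metric definitions, not the code used in the experiments.</p>

```python
# Illustrative computation of Table 2-style statistics from SAE activations.
import numpy as np

def latent_stats(acts):
    """acts: (num_tokens, num_latents) array of SAE latent activations."""
    fired = acts != 0                  # boolean firing matrix
    fire_rate = fired.mean(axis=0)     # fraction of tokens each latent fires on
    return {
        "alive_pct": 100 * float((fire_rate > 0).mean()),       # fired at least once
        "fired_gt_1pct": 100 * float((fire_rate > 0.01).mean()),
        "fired_gt_10pct": 100 * float((fire_rate > 0.10).mean()),
    }
```
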
      </sec>
<sec id="sec-3-2">
        <title>4.2. Qualitative Evaluation</title>
        <p>To dig deeper into the quality of the explanations, we directly looked at them and provide examples of seemingly good and bad explanations. Specifically, we sampled from the 50 explanations that received the highest scores by the Scorer, both in English and Italian.</p>
        <p>As for the Italian explanations, we immediately observed that a large fraction of them suffer from Degenerate Repetition [25]: the model starts to generate the same token or sequence of tokens over and over. On the contrary, the English ones do not suffer from this issue. However, if we look at the quality of the explanations, aside from repetitions, we observe that at least some of the Italian ones are quite relevant to the examples, and while sometimes slightly missing the mark, they highlight some interesting aspects of the tokens that fire the latent.</p>
        <p>Among these, we can clearly see that Layer.2 is mostly represented by single-token latents: the token “ale” as part of “federale” (federal), in several contexts, or the token “letto”, as both a noun (bed) and a verb (read). Layer.14 latents, on the other hand, appear to represent more abstract concepts. For example, we see latents firing on the final number of a year date, and a very interesting latent firing on the concept of competition (see Fig. ??). Layer.8 explanations are generally more confusing and less interesting. Examples are reported in Figure 5 with the relative explanation, cut to avoid showing repetitions.</p>
        <p>As for the English explanations, on the other hand, we observed that most of them actually miss the mark. In fact, they often provide an explanation related to the contexts, rather than the firing tokens. This may be due to the fact that, while it is specified in the prompt, we use Italian texts as examples but instructions and expected outputs are in English. Nevertheless, we observe an interesting trend: most explanations, at all layers, that actually focus on the firing tokens refer to functional aspects of the text, including punctuation marks, special characters, and functional words. For example, Latent 1818 of Layer.14 is explained as “Prepositions and conjunctions used to connect words or phrases in Italian text, such as "a", "di", "nel", "in", "su", "da", "al", "nei", "all", "sulle", "col" [...]”. This is in contrast with what we observed for the Italian explanations.</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Discussion of Key Findings</title>
        <p>In the following, we highlight some of the key aspects that emerged from the experiments.</p>
        <sec id="sec-3-3-1">
          <title>SAEs can find partially interpretable features in Italian Small Language Models.</title>
          <p>First, we observe that using a SAE we are able to extract features that somewhat align to interpretable concepts, despite some limitations that we can mostly attribute to the quality of the training data, both for the original model and the SAE, and to the limitations of the auto-interpretability pipeline (see below). It is possible that leveraging a dataset more attuned to Italian culture would yield better results in finding relevant latents.</p>
          <p>Auto-interpretability is promising, but currently shows limitations for Italian. Auto-interpretability pipelines are definitely a promising approach for simplifying and reducing the costs of finding explanations for the latents of SAEs. Our experiments suggest in fact that this is a low-cost alternative that is nonetheless able to deliver some interesting results. Nevertheless, we observed two main limitations that, we can argue, are actually two sides of the same coin. On the one side, the Explainer model showed some limitations in understanding the task and providing coherent texts for the explanations, while the Scorer model performed quite poorly in the binary classification task. This is especially true in the case of language mixing, i.e. when the model is prompted in its “main” language, i.e. English, but has to work on another language, in this case Italian. On the other side, the size of the model used in our experiments could severely limit its performance.</p>
          <p>Thus, both issues could be solved either by leveraging a stronger Italian-centric model as the Explainer/Scorer, or by using a generally larger and better performing model. However, as for the first solution, there are currently no such models on par with English ones in the 7-15B parameter range, which would allow for reducing the cost. As for the second solution, this would dramatically increase the costs, both computational and monetary.</p>
          <p>Different behaviours in the residual stream. We observed some relevant differences in the quality and types of latents that are properly identified at various points of the residual stream. In general, we observed that latents obtained from earlier in the stream are more relevant to single tokens and grammatical aspects of the language, while latents at later points of the stream show a slight tendency towards more abstract conceptualizations.</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>5. Conclusions and Future Works</title>
        <p>In this paper, we have shown that SAEs can partly uncover interpretable concepts in Italian Small Language Models. Specifically, we did so by training a SAE model on the residual stream of the Minerva-1B-base-v1.0 SLM, and then applying an auto-interpretability pipeline to generate explanations for its latents.</p>
        <p>Our findings suggest that SAEs can be used to this end, and that there exists a hierarchical representation within the model, with earlier layers showing more token-centric features and later layers more abstract concepts. As for the auto-interpretability pipeline, while promising for its low cost, it underscored the need for better language-specific tools for Italian.</p>
        <p>Moving forward, we aim to explore several avenues. First, we plan to scale our experiments in two directions: on the one hand, we aim to train SAEs on larger Italian models, e.g. larger variants of Minerva as well as others; on the other hand, we observe that we need to improve the models used for auto-interpretability, in order to obtain more reliable explanations. This could be achieved both by scaling them up substantially, and by tuning Italian-speaking models to the specific tasks of latent explanation and scoring. Second, we plan to leverage SAEs and auto-interpretability to address potential differences in the representations of models pre-trained specifically on Italian data, e.g. Minerva and Velvet [26], and multilingual models that received only fine-tuning in Italian, like the LLaMAntino variants [27] and Cerbero [
          <xref ref-type="bibr" rid="ref32">28</xref>
          ]. Finally, we plan to explore the larger latent space to attempt to uncover features linked specifically to Italian-centric concepts, in addition to properties of the Italian language.</p>
        <p>This work is an early first step in exploring interpretability research using Sparse Autoencoders for non-English-centric Language Models. Albeit limited in scope, we are optimistic that it may provide a relevant foundation for this yet under-explored research area, both in terms of approach and the release of open models for the community.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <sec id="sec-4-1">
        <p>This work has been supported by the PNRR MUR project PE0000013-FAIR (Spoke 1), funded by the European Commission under the NextGeneration EU programme, and the EU EIC project EMERGE (Grant No. 101070918).</p>
      </sec>
      <sec id="sec-4-2">
        <title>Limitations</title>
        <p>Our initial effort to interpret Italian SLMs using Sparse
Autoencoders has several limitations. The choice of
the smaller Minerva-1B-base-v1.0 model, driven by
computational constraints, means our findings might not
generalize to larger Italian models. The SAE’s training
data, while substantial for Italian, might not fully capture
all linguistic nuances, potentially affecting the quality
of learned features. Additionally, using different data to
train and evaluate the SAE, while arguably not
problematic in principle, may have introduced some unwanted
biases.</p>
        <p>A key limitation stems from our cost-effective
auto-interpretability pipeline, which relies on a relatively
small, quantized multilingual LLM. This model
struggled with generating coherent Italian explanations, often
repeating itself, and performed poorly in scoring when
mixing languages. This highlights the strong dependence
of explanation quality on the Explainer/Scorer model's
capabilities, and the current lack of robust, affordable,
Italian-specific tools.</p>
        <p>Finally, our analysis was based on a sample of 2000
latents across only three layers, rather than the entire SAE
latent space. While insightful, this limited scope and
the subjective nature of our qualitative assessment mean we
cannot yet claim a comprehensive understanding of the
model's internal workings.</p>
        <p>Guidelines:
You will be given a list of text examples in Italian on which
special words are selected and between delimiters like &lt;&lt; this &gt;&gt;.</p>
        <p>If a sequence of consecutive tokens all are important, the
entire sequence of tokens will be contained between delimiters
&lt;&lt; just like this &gt;&gt;. How important each token is for the
behavior is listed after each example in parentheses.
- Try to produce a concise final description. Simply describe the
text latents that are common in the examples, and what patterns
you found.
- If the examples are uninformative, you don't need to mention them.
- Don't focus on giving examples of important tokens, but try to
summarize the patterns found in the examples.
- Do not mention the marker tokens (&lt;&lt; &gt;&gt;) in your explanation.
- Do not make lists of possible explanations. Keep your
explanations short and concise.
- The last line of your response must be the formatted explanation,
using [EXPLANATION]:
{{prompt}}</p>
        <p>Sei un meticoloso ricercatore di intelligenza artificiale che
conduce un'importante indagine sugli schemi presenti nella
lingua italiana. Il tuo compito e' analizzare il testo e fornire
una spiegazione che racchiuda in modo esauriente i possibili
schemi in esso riscontrati.</p>
        <p>Linee guida:
Ti verra' fornito un elenco di esempi di testo in italiano in cui
parole speciali sono selezionate e inserite tra delimitatori come
&lt;&lt; questo &gt;&gt;. Se una sequenza di token consecutivi e' tutta
importante, l'intera sequenza di token sara' contenuta tra
delimitatori &lt;&lt; proprio come questo &gt;&gt;. L'importanza di ciascun
token per il comportamento e' elencata dopo ogni esempio tra
parentesi.
- Cerca di produrre una descrizione finale concisa. Descrivi
semplicemente gli elementi latenti del testo comuni negli
esempi e gli schemi che hai trovato.
- Se gli esempi non sono informativi, non e' necessario menzionarli.
- Non concentrarti sul fornire esempi di token importanti, ma
cerca di riassumere gli schemi trovati negli esempi.
- Non menzionare i token marcatori (&lt;&lt; &gt;&gt;) nella tua spiegazione.
- Non creare elenchi di possibili spiegazioni. Mantieni le tue
spiegazioni brevi e concise.
- L'ultima riga della tua risposta deve essere la spiegazione
formattata, usando [SPIEGAZIONE]:
{{prompt}}</p>
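        <p>To make the prompt protocol above concrete, the following is a minimal Python sketch of how activating examples could be serialized into the delimiter-and-score format the guidelines describe, and how the final [EXPLANATION] line could be parsed from a model response. The helper names are hypothetical, not part of the paper's released pipeline; the chr(60)/chr(62) indirection merely builds the two-character delimiters without placing literal angle brackets in this XML document.

```python
# Hypothetical sketch of the explainer prompt protocol; names are illustrative.
OPEN = chr(60) * 2   # the two-character opening delimiter
CLOSE = chr(62) * 2  # the two-character closing delimiter

def format_example(tokens, activations, threshold=0.0):
    """Wrap runs of consecutive important tokens in delimiters and append
    per-token importance scores in parentheses, as the guidelines describe."""
    parts, run = [], []
    for tok, act in zip(tokens, activations):
        if act > threshold:
            run.append(tok)
            continue
        if run:
            parts.append(OPEN + " ".join(run) + CLOSE)
            run = []
        parts.append(tok)
    if run:
        parts.append(OPEN + " ".join(run) + CLOSE)
    scores = ", ".join(f"{tok} ({act:.1f})"
                       for tok, act in zip(tokens, activations) if act > threshold)
    return " ".join(parts) + (f" ({scores})" if scores else "")

def parse_explanation(response):
    """Return the text after the last [EXPLANATION]: marker, or None."""
    for line in reversed(response.splitlines()):
        if "[EXPLANATION]:" in line:
            return line.split("[EXPLANATION]:", 1)[1].strip()
    return None
```

A real pipeline would feed several such formatted examples into the prompt template and retry when the marker line is missing; this sketch only covers the serialization and parsing steps.</p>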
      </sec>
    </sec>
    <sec id="sec-5">
      <title>A. Explainer Prompts</title>
      <sec id="sec-5-1">
        <p>In Figure 6 we provide the prompts fed to the Explainer model, both in English (original from [15]) and Italian (translation).</p>
        <p>Declaration on Generative AI
During the preparation of this work, the author(s) used ChatGPT (OpenAI) and Grammarly in order
to: Paraphrase and reword, Improve writing style, and Grammar and spelling check. After using
these tool(s)/service(s), the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the publication’s content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>2023. URL: https://arxiv.org/abs/2309.08600.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>arXiv:2309</source>
          .
          <fpage>08600</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Elhage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hume</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Olsson</surname>
          </string-name>
          , N. Schiefer,
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Thread</surname>
          </string-name>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Scherlis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sachan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Jermyn</surname>
          </string-name>
          , J. Benton,
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>ral networks</source>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2210.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <year>01892</year>
          . arXiv:
          <fpage>2210</fpage>
          .
          <year>01892</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Anders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Neo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoelscher-Obermaier</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. N.</surname>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bricken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Templeton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Batson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>cuits Thread</surname>
          </string-name>
          (
          <year>2023</year>
          ). https://transformer-circuits.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>pub/2023/monosemantic-features/index.html.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Templeton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Conerly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Marcus</surname>
          </string-name>
          , J. Lind-
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <article-title>claude 3 sonnet</article-title>
          , Transformer Circuits Thread
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          (
          <year>2024</year>
          ). URL: https://transformer-circuits.pub/2024/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          , T. D. la Tour,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tillman</surname>
          </string-name>
          , G. Goh, R. Troll,
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <article-title>and evaluating sparse autoencoders</article-title>
          , arXiv preprint [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Shojaee</surname>
          </string-name>
          , I. Mirzadeh,
          <string-name>
            <given-names>K.</given-names>
            <surname>Alizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Horton</surname>
          </string-name>
          , arXiv:
          <fpage>2406</fpage>
          .04093 (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Farajtabar</surname>
          </string-name>
          , The illusion of thinking: [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lieberum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajamanoharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Conmy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          2025.
          <article-title>URL: https://ml-site.cdn-apple</article-title>
          .com/papers/ autoencoders everywhere all at once on gemma
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <article-title>the-illusion-of-thinking</article-title>
          .
          <source>pdf . 2</source>
          , in: Y.
          <string-name>
            <surname>Belinkov</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Jumelet</surname>
            , H. Mo[2]
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Olah</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Cammarata</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Schubert</surname>
          </string-name>
          , G. Goh, hebbi, A. Mueller, H. Chen (Eds.),
          <source>Proceedings of</source>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Carter</surname>
          </string-name>
          ,
          <article-title>Zoom in: An introduction the</article-title>
          7th BlackboxNLP Workshop: Analyzing and
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>to circuits, Distill</source>
          <volume>5</volume>
          (
          <year>2020</year>
          )
          <article-title>e24</article-title>
          .
          <article-title>Interpreting Neural Networks for NLP</article-title>
          , Association [3]
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Olshausen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Field</surname>
          </string-name>
          ,
          <article-title>Sparse coding with for Computational Linguistics</article-title>
          , Miami, Florida,
          <string-name>
            <surname>US</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <article-title>an overcomplete basis set: A strategy employed by 2024</article-title>
          , pp.
          <fpage>278</fpage>
          -
          <lpage>300</lpage>
          . URL: https://aclanthology.org/
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          v1?,
          <source>Vision research 37</source>
          (
          <year>1997</year>
          )
          <fpage>3311</fpage>
          -
          <lpage>3325</lpage>
          .
          <year>2024</year>
          .blackboxnlp-
          <volume>1</volume>
          .19/. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2024</year>
          . [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Riggs</surname>
          </string-name>
          , R. Huben, blackboxnlp-
          <volume>1</volume>
          .
          <fpage>19</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Sharkey</surname>
          </string-name>
          , Sparse autoencoders find highly [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <source>arXiv preprint arXiv:2407.01513</source>
          (
          <year>2024</year>
          ). ternational Conference on Computational Linguis[13]
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>Williams, tics, Language Resources and Evaluation (LREC-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Linzen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cotterell</surname>
          </string-name>
          , L. Choshen,
          <string-name>
            <surname>COLING</surname>
          </string-name>
          <year>2024</year>
          ),
          <article-title>ELRA</article-title>
          and
          <string-name>
            <given-names>ICCL</given-names>
            ,
            <surname>Torino</surname>
          </string-name>
          , Italy,
          <year>2024</year>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Warstadt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. G.</given-names>
            <surname>Wilcox</surname>
          </string-name>
          , Findings of the sec- pp.
          <fpage>9422</fpage>
          -
          <lpage>9433</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <article-title>ond BabyLM challenge: Sample-eficient pretrain- lrec-main</article-title>
          .
          <volume>823</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <article-title>ing on developmentally plausible corpora</article-title>
          , in: M. Y. [20]
          <string-name>
            <given-names>W.</given-names>
            <surname>Foundation</surname>
          </string-name>
          , Wikimedia downloads, ???? URL:
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Choshen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cotterell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Warstadt</surname>
          </string-name>
          , [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Makhzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Frey</surname>
          </string-name>
          ,
          <article-title>K-sparse autoencoders</article-title>
          , arXiv
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>E. G.</surname>
          </string-name>
          Wilcox (Eds.),
          <source>The 2nd BabyLM Challenge preprint arXiv:1312.5663</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <source>at the 28th Conference on Computational Natural</source>
          [22]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kale</surname>
          </string-name>
          , R. Al-
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <given-names>Language</given-names>
            <surname>Learning</surname>
          </string-name>
          , Association for Computational Rfou,
          <string-name>
            <given-names>A.</given-names>
            <surname>Siddhant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          , mT5:
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <surname>Linguistics</surname>
          </string-name>
          , Miami, FL, USA,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          . URL:
          <article-title>A massively multilingual pre-trained text-to-text</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          https://aclanthology.org/
          <year>2024</year>
          .conll-babylm.1/. transformer,
          <source>in: Proceedings of the 2021 Con</source>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Capone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bondielli</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Lenci,
          <article-title>ConcreteGPT: A ference of the North American Chapter of the</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <article-title>baby GPT-2 based on lexical concreteness and cur- Association for Computational Linguistics: Hu-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Linzen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          , L. Choshen, putational Linguistics, Online,
          <year>2021</year>
          , pp.
          <fpage>483</fpage>
          -
          <lpage>498</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Cotterell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Warstadt</surname>
          </string-name>
          , E. G. Wilcox (Eds.),
          <string-name>
            <surname>The</surname>
            <given-names>URL</given-names>
          </string-name>
          : https://aclanthology.org/
          <year>2021</year>
          .naacl-main.
          <volume>41</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <source>2nd BabyLM Challenge at the 28th Conference on doi:10</source>
          .18653/v1/
          <year>2021</year>
          .naacl-main.
          <volume>41</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <string-name>
            <surname>Computational Natural Language Learning</surname>
            , Asso- [23]
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Grattafiori</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Dubey</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Jauhri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Pandey</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Ka-
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <surname>USA</surname>
          </string-name>
          ,
          <year>2024</year>
          , pp.
          <fpage>189</fpage>
          -
          <lpage>196</lpage>
          . URL: https://aclanthology. ten, A.
          <string-name>
            <surname>Vaughan</surname>
          </string-name>
          , et al.,
          <source>The llama 3 herd of models,</source>
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          org/
          <year>2024</year>
          .conll-babylm.
          <volume>16</volume>
          /. arXiv preprint arXiv:
          <volume>2407</volume>
          .21783 (
          <year>2024</year>
          ). [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Paulo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mallen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Juang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Belrose</surname>
          </string-name>
          , Auto- [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Razzhigaev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mikhalchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Goncharova</surname>
          </string-name>
          , I. Os-
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <article-title>language models</article-title>
          ,
          <source>arXiv preprint arXiv:2410</source>
          .
          <article-title>13928 of learning: Anisotropy and intrinsic dimensions</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          (
          <year>2024</year>
          ).
          <article-title>in transformer-based models</article-title>
          , in: Y. Graham, [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Torr</surname>
          </string-name>
          ,
          <string-name><given-names>A.</given-names> <surname>Meek</surname></string-name>
          ,
          <string-name><given-names>A.</given-names> <surname>Khakzar</surname></string-name>
          ,
          <string-name><given-names>D.</given-names> <surname>Krueger</surname></string-name>
          ,
          <string-name><given-names>F.</given-names> <surname>Barez</surname></string-name>
          ,
          <article-title>Sparse autoencoders reveal universal feature spaces across large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2410.06981</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          M. Purver (Eds.),
          <source>Findings of the Association for Computational Linguistics: EACL 2024</source>
          , Association for Computational Linguistics,
          <year>2024</year>
          , pp.
          <fpage>868</fpage>
          -
          <lpage>874</lpage>
          . URL: https://aclanthology.org/2024.findings-eacl.58/.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          <string-name><given-names>J.</given-names> <surname>Lindsey</surname></string-name>
          ,
          <string-name><given-names>W.</given-names> <surname>Gurnee</surname></string-name>
          ,
          <string-name><given-names>E.</given-names> <surname>Ameisen</surname></string-name>
          ,
          <string-name><given-names>B.</given-names> <surname>Chen</surname></string-name>
          ,
          <string-name><given-names>A.</given-names> <surname>Pearce</surname></string-name>
          ,
          <string-name><given-names>N. L.</given-names> <surname>Turner</surname></string-name>
          ,
          <string-name><given-names>C.</given-names> <surname>Citro</surname></string-name>
          ,
          <string-name><given-names>D.</given-names> <surname>Abrahams</surname></string-name>
          ,
          <string-name><given-names>T. B.</given-names> <surname>Thompson</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Zimmerman</surname></string-name>
          ,
          <string-name><given-names>K.</given-names> <surname>Rivoire</surname></string-name>
          ,
          <string-name><given-names>T.</given-names> <surname>Conerly</surname></string-name>
          ,
          <string-name><given-names>C.</given-names> <surname>Olah</surname></string-name>
          ,
          <string-name><given-names>J.</given-names> <surname>Batson</surname></string-name>
          ,
          <article-title>On the biology of a large language model</article-title>
          ,
          <source>Transformer Circuits Thread</source>
          (
          <year>2025</year>
          ). URL: https://transformer-circuits.pub/2025/attribution-graphs/biology.html.
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          <string-name><given-names>H.</given-names> <surname>Li</surname></string-name>
          ,
          <string-name><given-names>T.</given-names> <surname>Lan</surname></string-name>
          ,
          <string-name><given-names>Z.</given-names> <surname>Fu</surname></string-name>
          ,
          <string-name><given-names>D.</given-names> <surname>Cai</surname></string-name>
          ,
          <string-name><given-names>L.</given-names> <surname>Liu</surname></string-name>
          ,
          <string-name><given-names>N.</given-names> <surname>Collier</surname></string-name>
          , in:
          <source>Proceedings of the 37th International Conference on Neural Information Processing Systems</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>72888</fpage>
          -
          <lpage>72903</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <string-name><given-names>A.</given-names> <surname>Team</surname></string-name>
          ,
          <article-title>Almawave presents Velvet: the sustainable and high-performance Italian AI</article-title>
          ,
          <year>2025</year>
          . URL: https://www.almawave.com.
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          <string-name><given-names>R.</given-names> <surname>Orlando</surname></string-name>
          ,
          <string-name><given-names>L.</given-names> <surname>Moroni</surname></string-name>
          ,
          <string-name><given-names>P.-L.</given-names> <surname>Huguet Cabot</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Conia</surname></string-name>
          ,
          <string-name><given-names>E.</given-names> <surname>Barba</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Orlandini</surname></string-name>
          ,
          <string-name><given-names>G.</given-names> <surname>Fiameni</surname></string-name>
          ,
          <string-name><given-names>R.</given-names> <surname>Navigli</surname></string-name>
          ,
          <article-title>Minerva LLMs: The first family of large language models trained from scratch on Italian data</article-title>
          , in: F. Dell'Orletta, A. Lenci, S. Montemagni, R. Sprugnoli (Eds.),
          <source>Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)</source>
          , CEUR Workshop Proceedings, Pisa, Italy,
          <year>2024</year>
          , pp.
          <fpage>707</fpage>
          -
          <lpage>719</lpage>
          . URL: https://aclanthology.org/2024.clicit-1.77/.
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          <string-name><given-names>P.</given-names> <surname>Basile</surname></string-name>
          ,
          <string-name><given-names>E.</given-names> <surname>Musacchio</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Polignano</surname></string-name>
          ,
          <string-name><given-names>L.</given-names> <surname>Siciliani</surname></string-name>
          ,
          <string-name><given-names>G.</given-names> <surname>Fiameni</surname></string-name>
          ,
          <string-name><given-names>G.</given-names> <surname>Semeraro</surname></string-name>
          ,
          <article-title>LLaMAntino: LLaMA 2 models for effective text generation in Italian language</article-title>
          ,
          <year>2023</year>
          . arXiv:2312.09993.
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <string-name><given-names>F. A.</given-names> <surname>Galatolo</surname></string-name>
          ,
          <string-name><given-names>M. G.</given-names> <surname>Cimino</surname></string-name>
          ,
          <article-title>Cerbero-7b: A leap forward in language-specific LLMs through enhanced chat corpus generation and evaluation</article-title>
          ,
          <source>arXiv preprint arXiv:2311.15698</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          <string-name><given-names>G.</given-names> <surname>Sarti</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Nissim</surname></string-name>
          ,
          <article-title>IT5: Text-to-text pretraining for Italian language understanding and generation</article-title>
          , in: N. Xue (Eds.),
          <source>Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>