<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>When Less Is More? Diagnosing ASR Predictions in Sardinian via Layer-Wise Decoding</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Domenico De Cristofaro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Vietti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marianne Pouplier</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aleese Block</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ALPS, Alpine Laboratory of Phonetic Sciences</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LMU Munich</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>Recent studies have shown that intermediate layers in multilingual speech models often encode more phonetically accurate representations than the final output layer. In this work, we apply a layer-wise decoding strategy to a pretrained Wav2Vec2 model to investigate how phoneme-level predictions evolve across encoder layers, focusing on Campidanese Sardinian, a low-resource language. We show that truncating upper transformer layers leads to improved Phoneme Error Rates (PER), with the best performance achieved not at the final layer, but two layers earlier. Through fine-grained alignment analysis, we find that intermediate predictions better preserve segmental identity, avoid overgeneration, and reduce certain classes of phonological errors. We also introduce the notion of regressive errors: cases where correct predictions at intermediate layers are overwritten by errors at the final layer. These regressions highlight the limitations of surface-level error metrics and reveal how deeper layers may generalize or abstract away from acoustic detail. Our findings support the use of early-layer probing as a diagnostic tool for ASR models, particularly in low-resource settings where standard evaluation metrics may fail to capture linguistically meaningful behavior.</p>
      </abstract>
      <kwd-group>
<kwd>Speech Recognition</kwd>
        <kwd>Low-Resourced Languages</kwd>
        <kwd>Logit Lens</kwd>
        <kwd>Interpretability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Recent research in multilingual speech foundation models has revealed that intermediate representations often encode richer phonetic information than the final output layer. Using Logit Lens-style probing across encoder layers, studies such as Shim et al. [1] and Langedijk et al. [2] have shown that earlier layers in transformer-based models such as Whisper yield lower Word Error Rates (WER) and Character Error Rates (CER).</p>
      <p>Building on this line of work, we investigate whether removing upper transformer layers in a pretrained multilingual ASR model influences its phoneme-level decoding behavior. Our hypothesis is grounded in prior findings—particularly those of Shim et al. [1]—which demonstrate that applying a Logit Lens probing strategy to intermediate encoder layers results in lower CER for low-resource languages unseen during training. However, this raises a crucial question: what kinds of errors are actually reduced when decoding from intermediate layers instead of the full model? More specifically, are the mistakes made by the final layer already resolved in earlier layers? To answer this, we perform a systematic layer-wise decoding analysis using the pretrained facebook/wav2vec2-xlsr-53-espeak-cv-ft model on Sardinian audio data. We progressively truncate the encoder by removing a varying number of top transformer layers before decoding. For each configuration, we decode phoneme sequences and compare the output to gold-standard phonemic transcriptions, measuring overall Phoneme Error Rate (PER) and analyzing error types (insertions, deletions, substitutions).</p>
      <p>Our contributions:</p>
      <list list-type="bullet">
        <list-item><p>we present a phoneme-level layer-wise analysis of Wav2Vec2 on a low-resource Sardinian dataset;</p></list-item>
        <list-item><p>we introduce the notion of regressive errors in ASR layer-wise decoding;</p></list-item>
        <list-item><p>we show that intermediate layers (e.g., Layer 22) yield more phonetically accurate hypotheses than the final layer.</p></list-item>
      </list>
      <sec id="sec-1-1">
        <title>Interpretability has become a central concern in the anal</title>
        <p>ysis of deep learning models for NLP and speech,
particularly when it comes to understanding how linguistic
representations emerge across network layers. In ASR,
probing techniques such as Singular Vector Canonical
Correlation Analysis (SVCCA) [3] and layer-wise probing
classifiers [ 4] have been used to assess the presence of
phonetic and phonological features in hidden
representaproperties can be selectively removed from representa- as further reduction leads to a substantial degradation in
tions, suggesting that such information is not uniformly performance, with PER increasing sharply beyond this
distributed across layers. A particularly efective method point, reaching over 70% of PER at Layer 16. Decoded
for layer-wise interpretability is the logit lens [6]. Early phoneme sequences are aligned to the gold phonemic
exiting strategies are grounded in the observation that transcriptions using a phoneme-level alignment
algointermediate layers of deep neural models often sufice rithm based on SequenceMatcher. This allows us
for accurate predictions, allowing for more eficient com- to categorize each prediction as a correct match (hit),
putation and improved robustness [7, 8, 9]. More re- substitution, insertion, or deletion. Note that insertions
cently, this idea has been extended beyond eficiency: in are rarely observed in embedding-level decoding with
interpretability research, intermediate predictions have CTC models, as output units are selected frame-wise.
become a powerful tool for analyzing representational dy- Many deletion errors may instead reflect phoneme
namics. The logit lens approach [6], for example, projects mergers or coarticulation phenomena. To quantify
hidden states into output space to visualize how predic- the impact of layer removal on ASR performance, we
tions evolve across layers. Subsequent refinements [ 9, 10] compute the PER at each truncation level. In addition,
have made these projections more faithful by learning we track phoneme-level alignment patterns and analyze
layer-specific transformations, revealing how informa- the disappearance or emergence of specific error types
tion is incrementally constructed. While these methods as the number of removed layers increases.
have mostly been explored in the context of decoder-only
language models, some recent work has adapted them to 3.1. Dataset
speech systems. Langedijk et al. [2] extend the logit lens
to encoder-decoder architectures such as Whisper, while
Shim et al. [1] demonstrate that early-layer
representations in multilingual speech models may better capture
phonetic distinctions—particularly in under-represented
languages. In this work, we extend this line of research
by investigating why intermediate-layer decoding leads
to improved performance, and whether this strategy is
truly efective for low-resource languages. Rather than
using early exits purely for eficiency, we treat them as a
probing tool to examine how phoneme representations
emerge and evolve across layers in a multilingual speech
model.</p>
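      <p>As a concrete illustration of the logit-lens idea in the decoder-only setting, the following minimal sketch (our own, using GPT-2 as a stand-in rather than any model analyzed in this paper) projects each layer's hidden state through the final layer norm and the unembedding head to read off an intermediate next-token prediction:</p>
      <preformat><![CDATA[
# Logit-lens sketch for a decoder-only LM (illustrative; GPT-2 stand-in).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of (n_layers + 1) tensors [batch, seq, hidden].
for i, h in enumerate(out.hidden_states):
    # Apply the final layer norm and unembedding to the last position,
    # turning each intermediate state into a next-token distribution.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
    print(i, tok.decode(logits.argmax(dim=-1)))
]]></preformat>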
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>We analyze the layer-wise phoneme decoding behavior of a pretrained multilingual ASR model, facebook/wav2vec2-xlsr-53-espeak-cv-ft [11], a wav2vec2-based model fine-tuned on phonemic transcriptions from the Common Voice dataset [12] using a CTC loss. The model has 25 transformer encoder layers stacked on top of a 7-layer convolutional feature encoder. To probe the phonetic content across layers, we apply a truncation-based decoding strategy: for each utterance, we progressively remove the top k transformer layers (where k ∈ {0, 1, ..., 5}) and perform greedy decoding on the logits computed from the last remaining layer. This is possible because all transformer layers share the same hidden dimension, allowing the model's final projection head to be applied to intermediate-layer outputs without architectural modification. As a result, we can decode phoneme sequences from any encoder layer using the same decoding pipeline. We limit the truncation to a maximum of 5 removed layers, as further reduction leads to a substantial degradation in performance, with PER increasing sharply beyond this point and exceeding 70% at Layer 16.</p>
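      <p>A minimal sketch of this truncation-based decoding is shown below. It relies on Wav2Vec2ForCTC exposing all encoder hidden states via output_hidden_states=True, so the CTC projection head can be reapplied to any layer; the audio file path is a placeholder:</p>
      <preformat><![CDATA[
# Truncation-based layer-wise greedy decoding (sketch; path is a placeholder).
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

name = "facebook/wav2vec2-xlsr-53-espeak-cv-ft"
processor = Wav2Vec2Processor.from_pretrained(name)
model = Wav2Vec2ForCTC.from_pretrained(name).eval()

speech, _ = librosa.load("utterance.wav", sr=16000)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    out = model(inputs.input_values, output_hidden_states=True)

# out.hidden_states[-1] is the top encoder layer; indexing back from the
# end emulates removing the top k transformer layers (k = 0 .. 5).
for k in range(6):
    hidden = out.hidden_states[-1 - k]
    logits = model.lm_head(hidden)   # reuse the model's CTC projection head
    ids = logits.argmax(dim=-1)      # greedy, frame-wise decoding
    # batch_decode collapses repeated units and strips the CTC blank.
    print(f"{k} top layers removed:", processor.batch_decode(ids)[0])
]]></preformat>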
      <p>Decoded phoneme sequences are aligned to the gold phonemic transcriptions using a phoneme-level alignment algorithm based on SequenceMatcher. This allows us to categorize each prediction as a correct match (hit), substitution, insertion, or deletion. Note that insertions are rarely observed in embedding-level decoding with CTC models, as output units are selected frame-wise. Many deletion errors may instead reflect phoneme mergers or coarticulation phenomena. To quantify the impact of layer removal on ASR performance, we compute the PER at each truncation level. In addition, we track phoneme-level alignment patterns and analyze the disappearance or emergence of specific error types as the number of removed layers increases.</p>
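      <p>A sketch of this alignment and scoring step, assuming the reference and hypothesis are given as sequences of phoneme symbols (the helpers below are our own illustration, not a published implementation):</p>
      <preformat><![CDATA[
# Phoneme-level alignment and PER (sketch) using difflib.SequenceMatcher.
from difflib import SequenceMatcher

def align_phonemes(ref, hyp):
    """Count hits, substitutions, insertions, and deletions."""
    counts = {"hit": 0, "sub": 0, "ins": 0, "del": 0}
    sm = SequenceMatcher(None, ref, hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            counts["hit"] += i2 - i1
        elif tag == "replace":
            # Substitutions, plus any length mismatch as deletions/insertions.
            counts["sub"] += min(i2 - i1, j2 - j1)
            counts["del"] += max(0, (i2 - i1) - (j2 - j1))
            counts["ins"] += max(0, (j2 - j1) - (i2 - i1))
        elif tag == "delete":
            counts["del"] += i2 - i1
        elif tag == "insert":
            counts["ins"] += j2 - j1
    return counts

def per(ref, hyp):
    """Phoneme Error Rate in percent: (S + D + I) / N."""
    c = align_phonemes(ref, hyp)
    return 100 * (c["sub"] + c["del"] + c["ins"]) / max(len(ref), 1)
]]></preformat>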
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>The data used in this study consists of spontaneous speech recordings in Campidanese Sardinian, a variety spoken in the southern part of Sardinia. The recordings were collected during fieldwork as part of the DID project in the municipality of Sinnai. The dataset includes 48 short utterances produced by four native speakers (two female, two male), selected from longer recordings based on linguistic relevance and clarity. The mean duration of the utterances is approximately 4.06 seconds. All utterances were manually transcribed at the phonemic level by a trained phonetician who is also a native speaker of Campidanese. The resulting dataset provides a high-quality phonemic reference for evaluating model predictions in a low-resource, under-represented language context [13, 14, 15].</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>As shown in Table 1, removing the top layers of the encoder leads to a consistent reduction in PER, with the best performance observed when two layers are removed. This result supports the hypothesis that intermediate transformer layers perform better also on unseen low-resourced languages.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Phoneme Error Rate (PER) for different truncation levels.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Layer</th><th>PER (%)</th></tr>
          </thead>
          <tbody>
            <tr><td>24</td><td>36.73</td></tr>
            <tr><td>23</td><td>36.50</td></tr>
            <tr><td>22</td><td>35.40</td></tr>
            <tr><td>21</td><td>38.92</td></tr>
            <tr><td>20</td><td>50.03</td></tr>
            <tr><td>19</td><td>66.07</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-4-1">
        <title>4.1. Global Trends and Error Type Evolution</title>
        <p>Figure 2 provides a global view of how the model's phoneme-level predictions evolve as top layers are removed. As expected, the number of correctly predicted phonemes (labeled as "hit") steadily decreases as more layers are removed. At the same time, deletion errors increase sharply, particularly from Layer 21 backward, eventually dominating the error profile at Layer 19. This shows that, when layers are removed, the model lacks informative representations and tends to prefer skipping a prediction rather than producing an incorrect one. In contrast, substitution errors remain relatively stable across Layers 24-22 and begin to decline slightly in deeper layers. This pattern suggests that intermediate layers may retain more accurate segment-level information, minimizing confusion between phonetically similar units. However, the sharp increase in deletions at lower layers should not be interpreted as a simple reclassification of previous substitutions. Instead, it indicates that the model is increasingly unable to resolve a segmental identity at all, perhaps especially for shorter or acoustically reduced segments. At deeper layers, the model may attempt to recover some of these missing elements by assigning them a plausible phonemic category, potentially relying more on contextual or phonotactic patterns than on local acoustic evidence. This supports a view of hierarchical processing, where early layers encode fine-grained phonetic detail, while later layers abstract away from it, integrating higher-level dependencies that can both resolve and distort the original signal. However, this notion of hierarchical abstraction is model-dependent and assumes a certain architectural behavior. Since we do not impose constraints on the model design, further work is needed to test whether this abstraction emerges consistently across architectures.</p>
        <p>To better understand these dynamics, we examine which phonemes are most frequently involved in deletion and substitution errors. As shown in Figure 1, vowel phonemes such as /i/, /u/, and /a/ are among the most frequently deleted and substituted segments—especially as the number of removed layers increases. Interestingly, these three vowels are the only ones that commonly appear in unstressed final position in Campidanese Sardinian. While the model is not explicitly aware of word boundaries, its predictions appear sensitive to acoustic cues associated with prosodic prominence. These vowels are more likely to be reduced in duration and formant clarity when unstressed, and the model's tendency to delete them may reflect a broader difficulty in segmenting low-prominence units—an effect we also observed in our previous analysis of stress and frequency in phoneme recognition [16]. Some vowel deletions may also be explained by the mismatch between phoneme duration and the convolutional receptive field of the model's encoder. Since input frames are processed with overlapping windows, short vowels may be underrepresented or merged, leading to systematic omissions during decoding. Most of the substitutions involve phonetically close phoneme pairs, differing by a single articulatory feature such as voicing, manner, or vowel height. For instance, one of the most frequent substitutions is /E/ → /e/, a mid-front vowel contrast distinguished primarily by height. Similarly, /O/ → /o/ reflects a rounded back vowel pair with a similar height difference. Another recurrent case is /G/ → /g/, where a velar fricative is replaced by a voiced plosive, suggesting the model struggles with fine-grained place and manner distinctions in lower layers. These patterns support the hypothesis that, while intermediate layers reduce substitution errors, the model's phonological representations remain coarse. Segment identity is preserved at a broad class level, but phonetic resolution weakens as contextual information is reduced. Overall, the observed substitution patterns are not random, but structured according to articulatory proximity, as further confirmed in Figure 1.</p>
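        <p>The layer-wise error profile summarized above can be tallied directly from the alignment counts; a sketch, reusing the align_phonemes helper from Section 3 and assuming decoded[k] maps each utterance to its hypothesis with the top k layers removed (names ours):</p>
        <preformat><![CDATA[
# Tally hits / substitutions / deletions / insertions per truncation level.
# Assumes: refs[utt] is the gold phoneme sequence and decoded[k][utt] the
# hypothesis with the top k layers removed (see the decoding sketch above).
from collections import Counter

def error_profile(refs, decoded, max_removed=5, top_layer=24):
    profile = {}
    for k in range(max_removed + 1):
        totals = Counter()
        for utt, ref in refs.items():
            totals.update(align_phonemes(ref, decoded[k][utt]))
        profile[top_layer - k] = dict(totals)  # keyed by layer index 24..19
    return profile

# e.g. {24: {'hit': ..., 'sub': ..., 'del': ..., 'ins': ...}, 23: {...}, ...}
]]></preformat>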
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Regressive Errors: When Hits Become Mistakes</title>
        <p>While final-layer predictions often improve overall accuracy, we also observe notable exceptions where the opposite occurs—cases in which the correct phoneme is already identified at an intermediate layer but becomes an error at the final layer. We refer to these as regressive errors: instances where a phoneme is correctly predicted (a hit) at Layer 22 or 23, but turns into a substitution or deletion at Layer 24. More generally, we define a regressive error as a case where a correct prediction (hit) at an intermediate layer ℓ is replaced by a substitution or deletion at a deeper layer ℓ + k (with k &gt; 0). In total, we identify 53 such regressions across the dataset: 39 cases of hit → substitution and 14 cases of hit → deletion. These regressions indicate that the full encoder may in some cases “overprocess” the input, replacing a correct low-level prediction with a less accurate one as more layers are added. Crucially, most regressions involve substitutions, suggesting that deeper layers may introduce abstractions that distort fine-grained segmental information—trading off phonetic precision for contextual generalization. This may reflect a dual mechanism: (a) the re-integration of previously deleted segments, particularly those corresponding to short or hard-to-classify frames, and (b) the remapping of rare or marked phonemes onto broader, more frequent categories. In this sense, earlier layers (e.g., Layer 19) may in fact produce transcriptions that are more faithful to the phonetic input, while later layers enforce higher-level regularities at the cost of segmental detail. This challenges a common assumption: that improved overall error rates necessarily reflect more accurate linguistic representations. Instead, our findings suggest that intermediate layers may better preserve phoneme identity in certain cases, while the final layer smooths over or collapses distinctions that are phonologically relevant.</p>
        <p>To better understand the nature of these regressions, we analyze which phonemes are most frequently affected. Among the 53 cases, the high back rounded vowel /u/ is the most common (13 instances), followed by the alveolar approximant /r/ (7 instances), and others such as /n/, /i/, and /a/. Notably, many of the regressive substitutions involving /u/ involve replacement with acoustically similar vowels like /o/ or /U/ in the final layer—a pattern aligned with known vowel confusions in Sardinian phonology [17].</p>
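        <p>Operationally, regressions can be detected by marking each reference position with its alignment status at a given layer and comparing layers; a sketch (helper names ours, example strings from Table 3):</p>
        <preformat><![CDATA[
# Detect regressive errors: reference positions that are hits at an
# intermediate layer but substitutions or deletions at the final layer.
from difflib import SequenceMatcher

def ref_status(ref, hyp):
    """Per reference position: 'hit', 'sub', or 'del'."""
    status = ["del"] * len(ref)   # untouched positions stay deletions
    sm = SequenceMatcher(None, ref, hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        for i in range(i1, i2):
            if tag == "equal":
                status[i] = "hit"
            elif tag == "replace":
                status[i] = "sub"
    return status

def regressions(ref, hyp_mid, hyp_final):
    s_mid = ref_status(ref, hyp_mid)
    s_fin = ref_status(ref, hyp_final)
    return [(i, ref[i], s_fin[i]) for i in range(len(ref))
            if s_mid[i] == "hit" and s_fin[i] in ("sub", "del")]

ref = list("ensudwamillaundiZi")                      # reference (Table 3)
print(regressions(ref, list("e:ntsutamla:u5tiSi"),    # Layer 22 hypothesis
                  list("iE5ntsu:tVmla:u5Nti:S")))     # Layer 24 hypothesis
]]></preformat>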
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Utterances with Largest PER Reduction</title>
        <p>To explore whether layer truncation improves phoneme decoding in a linguistically meaningful way, we identify the five utterances that show the greatest PER reduction between Layer 24 and Layer 22 (Table 5). A qualitative inspection reveals that intermediate-layer outputs more closely approximate the reference transcriptions—not only in terms of segmental identity but also in overall sequence structure. While final-layer predictions sometimes exhibit phoneme insertions or reduplications that inflate the hypothesis length, the intermediate outputs tend to be more balanced and structurally coherent. This observation suggests that improvements in PER at intermediate layers are not merely an artifact of shorter sequences, but reflect more accurate segmental parsing and alignment. Rather than underpredicting, these layers appear to produce hypotheses that better capture the linguistic and prosodic shape of the input, avoiding overgeneration without compromising coverage. These improvements are quantitatively confirmed in Table 2, where PER consistently decreases when decoding from Layer 22 compared to the full model. The most dramatic case is 03_F_extract_01, with a 50% relative reduction in PER, followed by 30_F_extract_04, which improves by nearly 28 absolute percentage points. In both cases, the intermediate-layer output avoids spurious insertions and better aligns with the prosodic structure of the utterance. Even for more moderate improvements (e.g., 46_M_extract_04 and 29_M_extract_03), we observe a shift toward more plausible segmental structures and reduced redundancy. These findings reinforce the idea that intermediate representations strike a favorable balance between acoustic faithfulness and contextual abstraction—preserving enough low-level detail to make accurate segmental decisions while avoiding the overgeneralization seen in later layers.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>PER (%) when decoding from the full model (Layer 24) and from Layer 22 for the five utterances with the largest PER reduction.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Audio File</th><th>Layer 24</th><th>Layer 22</th></tr>
            </thead>
            <tbody>
              <tr><td>03_F_extract_01</td><td>14.29</td><td>7.14</td></tr>
              <tr><td>30_F_extract_04</td><td>83.33</td><td>55.56</td></tr>
              <tr><td>46_M_extract_04</td><td>44.74</td><td>36.84</td></tr>
              <tr><td>30_F_extract_02</td><td>46.15</td><td>38.46</td></tr>
              <tr><td>29_M_extract_03</td><td>40.00</td><td>33.33</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>As illustrated in Table 3, the final-layer output includes several critical errors: an initial vowel /i/ (in red) that does not appear in the reference, and an incorrect final segment /S/ (also in red) that replaces the true voiced fricative /Z/. Interestingly, at Layer 22, the model predicts a more plausible onset sequence /e:ntsu/ (in blue), which is closer to the expected /ensu/, suggesting a better alignment with the reference. Additionally, the final segment /i/ is still present at both Layers 22 and 23, but is ultimately deleted at Layer 24. This suggests that the full model may over-generalize phonetic detail, leading to the omission of segments that were correctly predicted in earlier layers. The evidence supports our broader claim: improvements in PER at intermediate layers are not merely a side-effect of overgeneralization, but reflect a more faithful alignment to the input acoustics. In this case, Layer 22 preserves both the segmental identity and the sequence structure more reliably than the full encoder.</p>
        <table-wrap id="tab3">
          <label>Table 3</label>
          <caption>
            <p>Layer-wise phoneme predictions for utterance 30_F_extract_04.</p>
          </caption>
          <table>
            <tbody>
              <tr><td>Layer 24</td><td>iE5ntsu:tVmla:u5Nti:S</td></tr>
              <tr><td>Layer 23</td><td>iE5ntsu:tamla:u5Nti:Si:</td></tr>
              <tr><td>Layer 22</td><td>e:ntsutamla:u5tiSi</td></tr>
              <tr><td>Reference</td><td>ensudwamillaundiZi</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>A similar phenomenon is observed in Table 4, where the utterance 03_F_extract_01 demonstrates how the final layer introduces segmental distortions not present in earlier representations. At Layer 22, the model produces a concise and well-aligned output that accurately captures the alveolar flap /4/ ([ɾ]) and avoids inserting extraneous phonetic material. Notably, the vowel preceding /4/ is realized as a short /e/ in the prediction from Layer 22, closely matching the reference transcription. In contrast, Layers 23 and 24 both produce an elongated /e:/ vowel. While this lengthening is not annotated in the reference, a manual inspection of the spectrogram reveals that the vowel is indeed phonetically long (approximately 297 ms), possibly due to prosodic or pragmatic factors. This suggests that vowel duration is a feature that only emerges at higher layers, where the model integrates broader contextual information. Rather than being an error, the elongation may reflect the model's sensitivity to prosodic prominence, which is not explicitly captured in the phonemic gold standard but is present in the acoustic signal. In this case, then, the intermediate layer offers a segmentally accurate representation aligned with the reference, while the deeper layers introduce prosodically informed variation. This highlights how different layers may prioritize different levels of linguistic abstraction, with earlier layers preserving phonemic detail and later ones encoding broader discourse or prosodic cues.</p>
        <table-wrap id="tab4">
          <label>Table 4</label>
          <caption>
            <p>Layer-wise phoneme predictions for utterance 03_F_extract_01.</p>
          </caption>
          <table>
            <tbody>
              <tr><td>Layer 24</td><td>ekambjadame:4a</td></tr>
              <tr><td>Layer 23</td><td>ekambjadame:4a</td></tr>
              <tr><td>Layer 22</td><td>ekambjadame4a</td></tr>
              <tr><td>Reference</td><td>eekambjadame4a</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap id="tab5">
          <label>Table 5</label>
          <caption>
            <p>Layer-wise SAMPA predictions and reference for the utterances with the largest PER improvement.</p>
          </caption>
        </table-wrap>
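        <p>Table 2 can be reproduced by scoring each utterance at both layers and ranking by the drop; a short sketch reusing the per helper from Section 3 (names ours):</p>
        <preformat><![CDATA[
# Rank utterances by PER reduction between the full model (Layer 24)
# and the best truncated configuration (Layer 22); assumes refs/decoded
# as in the earlier sketches.
def top_improvements(refs, decoded, n=5):
    rows = []
    for utt, ref in refs.items():
        p24 = per(ref, decoded[0][utt])   # no layers removed
        p22 = per(ref, decoded[2][utt])   # two top layers removed
        rows.append((utt, p24, p22, p24 - p22))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows[:n]
]]></preformat>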
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>Our findings challenge a widespread assumption in speech modeling: that improvements in error metrics like PER necessarily reflect more accurate or linguistically meaningful predictions. While intermediate layers of the Wav2Vec2 model often yield lower PER, a closer analysis reveals that this improvement is not uniformly distributed across all phoneme classes or error types. This relates to an ongoing open question in speech modeling: why do higher layers often decrease WER while increasing PER? The answer may lie in how deeper layers prioritize lexical or orthographic consistency over phonetic detail, leading to better word-level predictions at the cost of segmental precision. We observe that intermediate layers (particularly Layer 22) reduce overgeneration and avoid certain errors—such as spurious insertions or phoneme duplications—that become more frequent at deeper layers. In several cases, these intermediate predictions better align with the gold transcription both in structure and content, despite being produced with less contextual depth. Interestingly, we also identify cases of regressive errors, where correct predictions made at intermediate layers are degraded at the final layer. These typically involve deletions or substitutions of phonemes like /u/ and /E/, often replaced with acoustically similar segments. This suggests that deeper layers may generalize over segmental contrasts.</p>
      <p>Taken together, these results indicate that error metrics like PER or CER, while useful at a high level, may obscure critical model behaviors. Intermediate representations may contain more faithful segmental information than the final output layer, particularly in under-represented or low-resource language settings. The fact that intermediate layers retain phoneme-level precision while later layers smooth over distinctions aligns with a view of hierarchical abstraction in neural models. From a phonological perspective, this might suggest that neural encoders learn generalizable phonemic categories early on and gradually shift toward context-dependent or prosodically conditioned outputs. Future work could explore whether this abstraction follows typologically consistent patterns across languages.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>This study explored the use of layer truncation as a probing strategy for understanding phoneme-level decoding behavior in a multilingual ASR model. By applying a Logit Lens-style analysis to Wav2Vec2, we show that intermediate layers can outperform the final layer in terms of Phoneme Error Rate—particularly for a low-resource language like Sardinian. Beyond aggregate improvements, our fine-grained error analysis reveals two key insights: (1) intermediate predictions tend to avoid certain types of phonological errors, and (2) in some cases, deeper layers actually degrade performance by transforming previously correct phonemes into errors. These findings suggest that the final output of a model may not always be the most linguistically faithful, especially in scenarios involving limited training data or typologically divergent phonemes. We argue that future work on speech recognition in low-resource settings should move beyond traditional evaluation metrics and incorporate layer-wise analysis as a standard interpretability tool. Doing so can provide deeper insight into how models represent phonological information—and where they fail.</p>
      <p>Future work. While our analysis focused on Campidanese Sardinian, applying this strategy across typologically diverse low-resource languages would help determine whether the benefits of intermediate-layer decoding generalize. Additionally, attention dynamics across layers may provide further insight into which representations are retained, distorted, or lost as contextual depth increases. While the model is optimized for phoneme transcription, it is not trained on force-aligned phoneme segmentation. Future work could investigate whether fine-tuning on time-aligned phoneme labels or segmentation tasks improves final-layer predictions and reduces regressive errors. It would also be valuable to replicate this analysis on a language that was part of the model's pretraining or fine-tuning data (e.g., English) to assess whether intermediate-layer advantages persist even in high-resource settings.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <sec id="sec-5-1">
        <title>Funded by the European Social Fund Plus Project code</title>
        <p>ESF2_f3_0003 “Excellence Scholarships for PhD students
on topics of strategic relevance for South Tyrol”. Work
funded by the New Perspectives on Diphthong Dynamics
(DID) project I83C22000390005.
tation tasks improves final-layer predictions and reduces
regressive errors. It would also be valuable to replicate
this analysis on a language that was part of the model’s
pretraining or fine-tuning data (e.g., English) to assess
whether intermediate layer advantages persist even in
high-resource settings.
Audio File
Layer-wise SAMPA predictions and reference for utterances
with the largest PER improvement.</p>
        <p>Declaration on Generative AI
During the preparation of this work, the author(s) used ChatGPT (OpenAI) and DeepL
Write /
DeepL Translate in order to: Drafting content, Text translation, Paraphrase and reword, and
Improve writing style. After using these tool(s)/service(s), the author(s) reviewed and edited the
content as needed and take(s) full responsibility for the publication’s content.</p>
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>