<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Classifying Gas Pipe Damage Descriptions in Low-Diversity Corpora</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luca Catalano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico D'Asaro</string-name>
          <email>federico.dasaro@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Pantaleo</string-name>
          <email>michele.pantaleo@studenti.polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minal Jamshed</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prima Acharjee</string-name>
          <email>prima.acharjee@studenti.polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Giulietti</string-name>
          <email>n.giulietti@composite-research.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eugenio Fossat</string-name>
          <email>e.fossat@composite-research.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Rizzo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Composite Research - Torino</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Politecnico di Torino - Torino</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>LINKS Foundation - Torino</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper introduces a retrieval-based text classification framework tailored for language corpora in the domain of gas pipe damage description analysis, with a specific focus on determining patch applicability. Due to the scarcity of free-text damage descriptions in this domain, we construct a synthetic binary classification dataset, referred to as CoRe-S. This dataset consists of 11,904 damage descriptions generated from structured attributes, where each instance is labeled as either Patchable (True) or Unpatchable (False). The CoRe-S dataset presents two primary challenges: (i) a class imbalance, where positive cases are the minority, and (ii) frequent use of domain-specific terminology, which results in low lexical diversity across descriptions. To quantify this lack of variation, we introduce the Corpus Pairwise Diversity statistic, which measures the degree of lexical dissimilarity between documents in a corpus. We adopt a training-free, retrieval-based text classification approach and demonstrate that Sentence-BERT-NLI is the most effective encoder under low-diversity conditions, as it excels at capturing subtle lexical and semantic differences between otherwise similar documents. To address the class imbalance, we apply random under-sampling, which outperforms other under-sampling strategies in our experiments. Our results show that the proposed retrieval-based classifier significantly outperforms other training-free text classification methods, whether zero-shot, few-shot, or similarity-based, achieving an improvement of approximately 35.2% in macro F1-score over the second-best method. Our code is publicly available at: https://github.com/links-ads/core-unimodal-retrieval-for-classification.</p>
      </abstract>
      <kwd-group>
        <kwd>Gas pipe damage description analysis</kwd>
        <kwd>Training-free text classification</kwd>
        <kwd>Low lexical diversity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Text classification is the task of assigning predefined
labels to a given text and has been applied to a wide range
of domains, including sentiment analysis [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], emotion
recognition [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], news classification [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and spam
detection [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Early approaches typically decomposed the task
into two stages: feature extraction using neural
models such as Recurrent Neural Networks (RNNs) [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ] or
Convolutional Neural Networks (CNNs) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], followed
by feeding the extracted features into a classifier [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to
predict labels. With the emergence of transformer
architectures [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Large Pretrained Models (LPMs) such as
BERT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and GPT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] have become the foundation
for modern NLP systems. Trained on massive textual
corpora, these models demonstrate strong
generalization capabilities across various downstream tasks, often
without requiring additional task-specific training data.
      </p>
      <p>In this work, we address the task of text classification
over gas pipe damage descriptions, with the objective
of determining whether a patch is applicable (True) or
not (False). Due to the limited availability of free-text
damage reports in this domain, we construct a synthetic
binary classification dataset, referred to as CoRe-S. This
dataset comprises 11,904 damage descriptions generated
from structured attributes such as pipe material, lesion
type, and pipe exposure. This setting poses two main
challenges: (i) a class imbalance, where positive cases are
the minority; and (ii) low lexical diversity, as descriptions
tend to be highly similar across classes, relying heavily
on domain-specific terminology and recurring linguistic
patterns. Consequently, texts from different categories
may be lexically indistinguishable, complicating
classification based on surface-level features.</p>
      <p>
        To quantify this lexical variability, we introduce a
novel statistic, Corpus Pairwise Diversity, which
measures the degree of lexical dissimilarity between
documents within a corpus. When applied to our dataset, this
statistic produces significantly lower values compared
to generalist corpora such as 20NewsGroups [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which are characterized by a broader vocabulary and greater
topical diversity.
      </p>
      <p>For the classification task, we employ a training-free,
retrieval-based framework, depicted in Figure 1,
that leverages PLMs, consisting of a document encoder
and a similarity-based classifier. Given the low corpus
diversity and frequent repetition of domain-specific
terms, regardless of class, conventional semantic search
models may underperform in this setting, as they often
fail to capture fine-grained linguistic distinctions. For
instance, two descriptions may differ only in a subtle
feature such as pressure level, which can determine whether
a leak is patchable.</p>
      <p>
        This observation motivates the hypothesis that
encoders focusing on logical inference, rather than relying
solely on surface-level semantic similarity, are better
suited for classification in such contexts. Accordingly, we
employ the Sentence-BERT model pre-trained on Natural
Language Inference (NLI), a task that requires
determining whether a hypothesis can be logically inferred,
contradicted, or is neutral with respect to a given premise. We
adopt SBERT-NLI [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which effectively captures subtle
lexical and semantic differences between near-identical
documents. To mitigate the effects of class imbalance,
we apply random under-sampling to the retrieval corpus,
which achieves superior performance compared to
alternative imbalance-handling strategies in our experiments.
Experimental results demonstrate that our text
classification model consistently outperforms state-of-the-art
training-free approaches, including zero-shot, few-shot,
and similarity-based methods.
      </p>
      <p>The main contributions of this work are as follows:
• We introduce CoRe-S, a novel dataset in the domain of
gas pipe damage descriptions, which, to the best of our
knowledge, is the first dataset developed in this domain.
• We introduce a novel statistic, Corpus Pairwise
Diversity, to quantify the lexical dissimilarity
between documents within a corpus.
• We demonstrate that in low-diversity settings, a
Natural Language Inference pretrained encoder,
specifically SBERT-NLI, outperforms standard
semantic similarity models by effectively capturing
subtle distinctions between documents belonging
to different classes.</p>
    </sec>
    <sec id="sec-1b">
      <title>2. Background on Training-Free Text Classification</title>
      <p>
        With the advent of transformer architectures equipped
with attention mechanisms [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a new wave of Large-scale
Pretrained Models (LPMs) has emerged. These
models are trained on vast textual corpora such as
BooksCorpus (800M words) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and Common Crawl [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Modern
PLMs are predominantly based on either the BERT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] or GPT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] architectures. BERT utilizes a transformer
encoder to produce dense contextual representations of
input text, making it well-suited for language
understanding tasks. In contrast, GPT adopts a decoder-only
architecture originally designed for generative applications,
though it has also shown strong performance in
classification tasks [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ]. Both architectural families exhibit
strong transfer learning capabilities, enabling effective
adaptation to a variety of downstream tasks, and paving
the way for training-free approaches to text classification.
      </p>
      <p>
        BERT-based approaches leverage embeddings to
compare semantic similarity between pieces of text.
Depending on the nature of the task, these methods can
be broadly categorized into: (i) zero-shot methods, which
compare the input text directly with class labels or their
representative keywords [
        <xref ref-type="bibr" rid="ref13 ref18 ref19">18, 19, 13</xref>
        ]; and (ii) retrieval-based
methods, which perform semantic search over a
database containing auxiliary knowledge [
        <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
        ]. Schopf et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] presented, for the first group of methods
(zero-shot), two different approaches. The first one
consists of representing each document as the average
of its paragraph embeddings. Similarly, each label is
represented as the average embedding of a set of predefined
keywords associated with that label. Classification is
then performed by computing the similarity between
the document and label embeddings, assigning the label
with the highest similarity score. The second approach,
instead, implements a zero-shot entailment technique.
Each input document is paired with a hypothesis
representing a candidate label, and the model predicts whether
the hypothesis is entailed by the input.
      </p>
      <p>GPT-based approaches, on the other hand, leverage
the full potential of natural language processing and the
generative capabilities embedded in the models. These
methods are typically applied in either (i) a zero-shot
fashion, where predictions are made without any labeled
examples, or (ii) a few-shot fashion, where a small number
of labeled examples is included in the prompt.</p>
    </sec>
    <sec id="sec-2">
      <title>3. CoRe-S Dataset</title>
      <sec id="sec-2-1">
        <p>To explore this idea and assess its feasibility, we construct a synthetic dataset by transforming existing structured tabular data, originally collected in the field, into natural language descriptions.</p>
        <p>The original tabular dataset comprises 11,904 pipe
repair interventions. Each intervention is described
using 11 categorical or boolean features—listed in
Table 1—which capture the condition of the pipe at the time
of the damage. Additionally, each record is labeled as
Patchable (True) or Not Patchable (False), depending
on whether the intervention involved a successful patch
or required replacement of the pipe segment. Among all
interventions, only 126 examples (1.06%) are labeled as
successful patches, while the remaining 11,778 (98.94%)
represent replacements.</p>
        <p>We generate the textual descriptions using the large
language model (LLM) Mistral-7B Instruct v0.3 (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3).</p>
        <p>Figure 2 illustrates through an example the pipeline
used to generate the dataset, where a prompt—shown in
Figure 3—combines (i) a randomly selected example from
a curated set of 36 real technician-written descriptions
and (ii) a structured template filled with the most
informative features extracted from the tabular dataset, enabling
the LLM to produce realistic and domain-specific textual
representations of pipe failures.</p>
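<p>As a rough illustration of this prompt-construction step, the sketch below assembles a prompt from a structured record and a style example. The field names, template wording, and the helper build_prompt are hypothetical; the actual prompt used by the authors is the one shown in Figure 3.</p>

```python
# Hypothetical sketch of the prompt-construction step: field names and
# template wording are illustrative, not the paper's actual prompt (Figure 3).
def build_prompt(record: dict, example_description: str) -> str:
    template = (
        "You are a gas-network technician. Using the style of the example "
        "below, write a short damage description.\n"
        "Example: {example}\n"
        "Damage attributes: material={material}, lesion={lesion}, "
        "exposed={exposed}.\n"
        "Description:"
    )
    return template.format(example=example_description, **record)

record = {"material": "Cast iron", "lesion": "Hole", "exposed": True}
prompt = build_prompt(record, "Small hole on an exposed pipe, sealed with a patch.")
```

<p>The filled prompt is then sent to the LLM, which returns the free-text description paired with the record's label.</p>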
        <p>Specifically, for each entry x in the original tabular
dataset, we extract the relevant feature values and insert
them into the template prompt, together with the example
used to guide the writing style.</p>
        <p>The label y ∈ {True, False} indicates whether the
intervention was resolved via patching (y = True) or required
pipe replacement (y = False), and is directly inherited
from the original dataset.</p>
        <p>Figure 3: Prompt template used for converting tabular data
representing pipe damage into textual descriptions. The
prompt is composed of: (1) the features relevant for generating
the specific content, and (2) an example description written by a
technician to guide the style.</p>
        <p>The resulting CoRe-S dataset consists of pairs (t, y),
where t is the synthetic textual description generated
from the structured features of intervention x, and y is
the corresponding repair label.</p>
        <p>To ensure the quality and reliability of the generated
descriptions, we perform a human review process to: (i)
verify stylistic consistency with real examples written
by technicians, and (ii) randomly assess the semantic
alignment between each description and the original
feature vector x.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Corpus Pairwise Diversity Statistic</title>
      <p>This section introduces the formal definition of the
Corpus Pairwise Diversity statistic, which serves as a
foundational element for both the design and evaluation of
our retrieval-based classifier. By measuring the average
dissimilarity between the vocabularies of document pairs,
this statistic informs downstream components that rely
on accurate estimations of inter-document similarity.</p>
      <sec id="sec-4-1">
        <title>4.1. Definition</title>
        <p>Let C = {d_1, . . . , d_N} be a corpus of N documents,
where each document d_i is represented as the set of its
unique terms. The Jaccard distance between two
documents d_i and d_j is
J(d_i, d_j) = 1 − |d_i ∩ d_j| / |d_i ∪ d_j| ∈ [0, 1].
We then define the Corpus Pairwise Diversity statistic as
CPD(C) = (N choose 2)^(−1) · Σ_{1 ≤ i &lt; j ≤ N} J(d_i, d_j).
By construction, CPD(C) ∈ [0, 1]; low values
indicate high overall similarity, and high values indicate high
overall dissimilarity among documents. The statistic is
non-negative, symmetric, and unaffected by the order of the
set C. Moreover, it is invariant to document length and term
frequency, even when vocabulary sizes differ substantially.</p>
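<p>For concreteness, the statistic can be computed with the short Python sketch below. This is an illustration, not the authors' implementation; documents are tokenized by whitespace here, which is an assumption.</p>

```python
from itertools import combinations

def jaccard_distance(a: set, b: set) -> float:
    # J(d_i, d_j) = 1 - |intersection| / |union|
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def corpus_pairwise_diversity(corpus: list) -> float:
    # Each document is reduced to the set of its unique whitespace tokens.
    docs = [set(doc.lower().split()) for doc in corpus]
    pairs = list(combinations(range(len(docs)), 2))
    if not pairs:
        return 0.0
    # Average Jaccard distance over all N-choose-2 document pairs.
    return sum(jaccard_distance(docs[i], docs[j]) for i, j in pairs) / len(pairs)
```

<p>Identical documents yield a CPD of 0, fully disjoint vocabularies yield 1, matching the [0, 1] range stated above.</p>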
        <sec id="sec-4-1-1">
          <title>4.2. Empirical Analysis</title>
          <p>Table 2: Corpus Pairwise Diversity across text classification
datasets. |C| indicates the number of documents and |V|
the vocabulary size.
Dataset | |C| | |V| | CPD(C)
20NewsGroups | 10,998 | 85,551 | 0.99
Yahoo! Answers | 1,375,428 | 739,655 | 0.99
CoRe-S | 11,903 | 2,283 | 0.69</p>
          <p>To better understand the behaviour of the CPD statistic,
we compute it across multiple corpora. Table 2 shows that
datasets like 20NewsGroups and Yahoo! Answers
generally obtain higher diversity scores CPD(C), indicating
increased textual heterogeneity and more extensive
vocabularies. In contrast, the CoRe-S dataset exhibits lower
diversity, which can be attributed to its specialized
terminology and repetitive textual patterns. This is likely
a consequence of the constrained set of attributes used
during the generation process (see Section 3), which
restricts variability in term usage. As a result, it becomes
challenging to distinguish between damage descriptions
across different categories.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Retrieval-based Classifier</title>
      <p>We adopt a zero-shot learning approach, depicted in
Figure 4, built around a retrieval-based pipeline. The
strategy involves retrieving the top-k most similar labeled
textual descriptions based on embedding similarity and
assigning the final label using a majority voting
mechanism over the retrieved documents.</p>
      <sec id="sec-5-1">
        <title>5.1. Formal Description</title>
        <p>Let D ⊆ Σ* be the set of all documents, where Σ is a
finite alphabet of symbols. The dataset D is partitioned
into two subsets: the query set Q ⊆ D and the corpus
C ⊆ D, which contains descriptions of past pipe failures,
each labeled as patchable (True) or not patchable (False).
For each query q ∈ Q, the system retrieves relevant
documents from the corpus C. Let f : Σ* → R^n be
an encoding function that maps a document into an
n-dimensional embedding space using a pre-trained model,
and let s : R^n × R^n → R be a similarity function that
measures the closeness between two embedded
documents. The retrieval process is defined as:</p>
        <p>Re_C,k,s(q) = arg max_{C' ⊆ C, |C'| = k} Σ_{c ∈ C'} s(f(q), f(c))   (1)</p>
        <p>where C is the corpus, k is the number of top retrieved
documents, and s(f(q), f(c)) is the similarity score between
the query document q and the corpus document c. We
denote the resulting top-k retrieved documents for a
given query q as:</p>
        <p>C*_q,k = Re_C,k,s(q)   (2)</p>
        <p>Finally, the system produces its final prediction by
applying majority voting over the labels of the documents
in C*_q,k:</p>
        <p>ŷ = MajorityVote({label(c) | c ∈ C*_q,k})   (3)</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Encoder and Similarity Metrics Selection</title>
        <p>
          For our training-free classification pipeline, we
explore several pre-trained encoders to generate
high-quality semantic embeddings for both queries and
corpus documents. All selected encoders are transformer-based
models chosen for their zero-shot capabilities,
strong performance on general-purpose semantic
similarity benchmarks, and availability through the
sentence-transformers library, which facilitates
seamless integration into our pipeline. Specifically, we
test all-mpnet-base-v2 (https://huggingface.co/sentence-transformers/all-mpnet-base-v2), a sentence-transformer
model based on MPNet [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], fine-tuned on over 1
billion sentence pairs for semantic similarity tasks.
We also include multi-qa-mpnet-base (https://huggingface.co/sentence-transformers/multi-qa-mpnetbase-dot-v1), a variant
of MPNet fine-tuned on multiple question-answering
datasets, including Natural Questions, TriviaQA, and
SQuAD, to better handle question-style inputs [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. Finally, we use bert-base-nli-mean-tokens (https://huggingface.co/sentence-transformers/bert-base-nlimean-tokens), a BERT-based
encoder trained on the SNLI and MultiNLI datasets
for natural language inference (NLI) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>We evaluate two popular similarity metrics for
comparing document embeddings: the dot product, which
captures the directional similarity between embeddings, and
the Euclidean distance (ℓ2), which measures the
straight-line distance between vectors in the embedding space.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Corpus Under-sampling Techniques</title>
        <p>To address class imbalance in our dataset, we use several
under-sampling strategies that reduce the number of
documents in the corpus set of the majority class. We
test different algorithms: Random Under-sampling,
NearMiss with its three different versions, and Edited
Nearest Neighbors.</p>
        <sec id="sec-5-3-1">
          <title>5.3.1. Random Under-sampling</title>
          <p>This simple technique randomly removes examples from
the majority class until the desired class distribution is
reached.</p>
        </sec>
        <sec id="sec-5-3-2">
          <title>5.3.2. NearMiss</title>
          <p>The algorithm preserves the samples from the majority
class that are most relevant for the classification task,
based on the evaluation of distances between samples
from the majority and minority classes. There are three
versions of the algorithm:
• NearMiss-1 selects majority class samples with the
smallest average distance to the closest samples of the
minority class.
• NearMiss-2 selects majority class samples with the
smallest average distance to the farthest samples of the
minority class.
• NearMiss-3 first selects a subset of minority samples
and retains their nearest neighbors among the majority.
Then, it keeps the majority class samples with the largest
average distance to their selected neighbors.</p>
        </sec>
        <sec id="sec-5-3-3">
          <title>5.3.3. Edited Nearest Neighbors (ENN)</title>
          <p>The Edited Nearest Neighbors (ENN) technique uses
a K-Nearest Neighbors (KNN) approach to filter out noisy
or ambiguous samples from the majority class. The
procedure involves training a KNN classifier on the entire
corpus, then, for each instance in the majority class,
identifying its k nearest neighbors and removing the
instance if any or most of its neighbors belong to a
different class.</p>
        </sec>
      </sec>
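<p>A minimal sketch of the retrieval-and-vote procedure in Equations (1)-(3), assuming embeddings have already been computed (plain Python lists stand in for the encoder output; the actual pipeline uses sentence-transformers encoders):</p>

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def retrieve_top_k(query_emb, corpus_embs, k, metric="dot"):
    # Score every corpus document against the query (Eq. 1): with the dot
    # product higher is better; with Euclidean distance lower is better.
    if metric == "dot":
        order = sorted(range(len(corpus_embs)),
                       key=lambda i: dot(query_emb, corpus_embs[i]),
                       reverse=True)
    else:
        order = sorted(range(len(corpus_embs)),
                       key=lambda i: euclidean(query_emb, corpus_embs[i]))
    return order[:k]

def classify(query_emb, corpus_embs, corpus_labels, k=7):
    # Majority vote over the labels of the top-k retrieved documents (Eq. 3).
    top = retrieve_top_k(query_emb, corpus_embs, k)
    votes = [corpus_labels[i] for i in top]
    return max(set(votes), key=votes.count)
```

<p>Swapping the metric argument switches between the two similarity functions evaluated in Section 5.2.</p>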
    </sec>
    <sec id="sec-6-exp">
      <title>6. Experiments</title>
      <sec id="sec-6-1">
        <title>6.1. Experimental Details</title>
        <p>Experiments are conducted using an NVIDIA GeForce
RTX 2080 Ti GPU. Model performance is primarily
evaluated using the F1-Macro score to ensure a balanced
assessment across classes. Additionally, all results are
obtained through 5-fold cross-validation, which involves
changing the split of the corpus and query set in each
fold to ensure robust evaluation. For the main results, we
also report the Recall-Macro and Precision-Macro scores.</p>
      </sec>
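<p>The fold construction can be sketched as follows. The half-and-half corpus/query proportion mirrors the 5,952-query split reported in the results, but the exact resampling protocol is an assumption:</p>

```python
import random

# Illustrative sketch: in each fold the dataset is reshuffled and split into
# a retrieval corpus and a query set (split proportion is an assumption).
def folds(n_items, n_folds=5, corpus_fraction=0.5):
    for fold in range(n_folds):
        idx = list(range(n_items))
        random.Random(fold).shuffle(idx)   # a different split per fold
        cut = int(n_items * corpus_fraction)
        yield idx[:cut], idx[cut:]          # (corpus indices, query indices)
```
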
      <sec id="sec-5-6">
        <title>6.2.1. Comparison with Zero-Shot Classification Methods</title>
        <p>We compare our training-free retrieval-based classification approach with several zero-shot and few-shot classification baselines.</p>
        <p>
          The Baseline approach [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] represents each
document as the average of its paragraph embeddings.
Similarly, each label is represented as the average
embedding of a set of predefined keywords associated with
that label. Classification is performed by computing the
similarity between the document and label embeddings,
assigning the label with the highest similarity score.
We evaluate this method using two different encoders:
all-MiniLM-L6-v2 and all-mpnet-base-v2.
        </p>
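<p>The label-embedding baseline described above can be sketched as follows. The embed() function is a hashed bag-of-words stand-in used only to make the sketch self-contained; the evaluated baselines use the all-MiniLM-L6-v2 and all-mpnet-base-v2 encoders:</p>

```python
import math

DIM = 64

def embed(text):
    # Placeholder embedding: hashed bag-of-words, L2-normalized.
    vec = [0.0] * DIM
    for tok in text.lower().split():
        vec[hash(tok) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify_zero_shot(doc, label_keywords):
    # Each label is the average embedding of its keywords; the document
    # receives the label with the highest similarity score.
    doc_emb = embed(doc)
    sims = {}
    for label, keywords in label_keywords.items():
        label_emb = mean([embed(w) for w in keywords])
        sims[label] = sum(a * b for a, b in zip(doc_emb, label_emb))
    return max(sims, key=sims.get)
```
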
        <p>
          We also implement a zero-shot entailment
technique [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], using pre-trained models such as
DistilBERT, BART-large, and DeBERTa. Each input
document is paired with a hypothesis representing a
candidate label, and the model predicts whether the
hypothesis is entailed by the input.
        </p>
      </sec>
      <sec id="sec-5-7">
        <title>6.2.2. Ablation Study</title>
        <p>Similarity Metric Selection. In our zero-shot pipeline,
we evaluate the two most used similarity metrics: the dot
product and the Euclidean distance (ℓ2). Figure 5
illustrates the performance of our retrieval-classification
strategy under both similarity functions. Both metrics
are tested across all selected encoders.</p>
        <p>Corpus Under-sampling. Figure 7 shows how different
sampling strategies impact performance on the
CoRe-S dataset across values of k from 1 to 15. The results
reported in the figure represent the best outcomes
obtained across the tested hyperparameter configurations.
When no under-sampling is applied, macro-F1 peaks at
k = 7 (0.609), but then declines, as if additional neighbors
introduce semantic noise. In contrast, applying
under-sampling leads to higher macro-F1 scores across all values
of k. Notably, random under-sampling achieves the best
overall performance, improving from 0.601 at k = 1 to
a peak of 0.687 at k = 15. This suggests that random
under-sampling effectively balances the class distribution
in the corpus, enabling the model to achieve stronger
generalization and more robust performance.</p>
        <p>The use of NearMiss under-sampling, on the other
hand, significantly degrades performance. Although the
Edited Nearest Neighbors strategy performs better
than using no under-sampling at all, it still falls short
of the results achieved with random under-sampling.
This may be because these strategies remove fewer
training examples and may not sufficiently rebalance the
corpus. In fact, the high similarity between textual
descriptions labeled patchable or non-patchable can lead to very
close embeddings and, as a result, these strategies might
remove fewer examples. Random under-sampling,
instead, operates solely based on a reduction ratio threshold,
resulting in a more pronounced reduction and a more
effective rebalancing of the corpus. The best performance
is achieved with a reduced corpus of 962 training samples
and the full set of 5,952 query instances.</p>
        <p>Figure 6: t-SNE representations of document embeddings:
(a) MPNet, (b) SBERT NLI, (c) SBERT NLI + Undersampling.</p>
        <p>Figure 6(c) illustrates a t-SNE representation of the
document embeddings produced by the best encoder,
SBERT-NLI, after applying random under-sampling to the corpus.
As previously shown, SBERT-NLI naturally clusters green
points (true labels) and red points (false labels) near each
other, reflecting its ability to capture fine-grained
semantic distinctions. After under-sampling, however, the red
points are pushed further away from the green points,
creating clearer separations between classes. This
enhanced separation corresponds to improved macro-F1
performance, demonstrating how under-sampling helps
the model better distinguish between patchable and
non-patchable instances by reducing class imbalance and
mitigating semantic noise.</p>
      </sec>
      <sec id="sec-5-8">
        <title>6.2.3. Cross-Corpus Encoder Selection with Varying Lexical Diversity</title>
        <p>To further explore the influence of corpus lexical
diversity on model performance, we expand our evaluation
beyond CoRe-S to include two additional text
classification datasets, 20NewsGroups and Yahoo! Answers, both
of which demonstrate higher lexical variability, as shown
in Section 4 using our proposed Corpus Pairwise
Diversity statistic. We compare the performance of three
document encoders within the same retrieval-based
classification framework: SBERT-NLI, MPNet and QA-MPNet.
For evaluation, each dataset's test set is evenly split into two
subsets: one half is used as the retrieval corpus and the
other half as the query set, where classification
performance is measured.</p>
        <p>Table 4 reports the best F1 scores achieved by each
encoder on the respective datasets. The results reveal a
clear interaction between corpus lexical diversity and
encoder effectiveness. On the low-diversity CoRe-S dataset,
SBERT-NLI achieves the highest F1 score, supporting
our hypothesis that NLI-pretrained models are better
suited for distinguishing fine-grained linguistic nuances
between similar documents. In contrast, on the
higher-diversity datasets 20NewsGroups and Yahoo! Answers,
MPNet consistently outperforms the other encoders. In these
settings, MPNet's enhanced ability to capture broad
semantic content makes it more effective at handling lexical
variation.</p>
      </sec>
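<p>The random under-sampling step evaluated above can be sketched as follows. This is an illustrative stdlib implementation driven by a reduction ratio; the paper does not name the specific library it uses:</p>

```python
from collections import Counter
import random

def random_undersample(docs, labels, ratio=1.0, seed=0):
    # Keep all minority examples; randomly keep |minority| / ratio
    # majority examples, so ratio=1.0 yields a balanced corpus.
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    rng = random.Random(seed)
    maj_idx = [i for i, y in enumerate(labels) if y == majority]
    keep_n = int(counts[minority] / ratio)
    kept = set(rng.sample(maj_idx, min(keep_n, len(maj_idx))))
    pairs = [(d, y) for i, (d, y) in enumerate(zip(docs, labels))
             if y == minority or i in kept]
    return [d for d, _ in pairs], [y for _, y in pairs]
```
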
    </sec>
    <sec id="sec-6">
      <title>7. Limitations</title>
      <p>A key limitation of this study is the reliance on synthetic
data. While synthetic fault descriptions are necessary due
to the lack of large-scale real-world technician-written
reports, they may not fully capture the noise, variation,
and contextual complexity present in actual field
documentation. This may affect the generalizability of the
findings when applied to real-world scenarios. Future
work should explore the collection and use of
authentic, technician-authored data to validate and refine the
proposed method.</p>
    </sec>
    <sec id="sec-7">
      <title>8. Conclusion</title>
      <sec id="sec-7-1">
        <p>In this paper, we address the task of classifying gas pipe damage descriptions. Starting from a set of damage
features and real examples, we generate a new dataset called
CoRe-S, the first of its kind in this domain. This dataset
exhibits low lexical diversity, characterized by a restricted
and repetitive vocabulary, along with severe class
imbalance. To quantify lexical diversity within a corpus, we
propose the Corpus Pairwise Diversity statistic.</p>
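        <p>To make the notion concrete, a corpus-level pairwise diversity statistic can be instantiated, for example, as the mean dissimilarity over all document pairs. The token-level Jaccard variant below is an illustrative assumption; the paper's exact definition of the Corpus Pairwise Diversity statistic may differ.</p>

```python
from itertools import combinations

def corpus_pairwise_diversity(docs):
    """Mean pairwise Jaccard dissimilarity between token sets.
    An illustrative stand-in for a Corpus Pairwise Diversity statistic;
    0 means all documents share the same vocabulary, 1 means no overlap."""
    token_sets = [set(doc.lower().split()) for doc in docs]
    dissims = []
    for a, b in combinations(token_sets, 2):
        union = a | b
        jaccard = len(a & b) / len(union) if union else 1.0
        dissims.append(1.0 - jaccard)
    return sum(dissims) / len(dissims) if dissims else 0.0
```

        <p>Under this instantiation, a corpus of near-identical fault descriptions scores close to 0, matching the low-diversity regime described for CoRe-S.</p>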
        <p>To overcome these challenges, we design a training-free retrieval-based text classifier that leverages SBERT-NLI to handle low lexical diversity, combined with under-sampling techniques to mitigate class imbalance. Experimental results demonstrate that our method outperforms other training-free approaches, including zero-shot, few-shot, and similarity-based methods. Additional experiments suggest that natural language inference pretrained text encoders are particularly effective in low-diversity scenarios where subtle differences between texts of different labels must be captured.</p>
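        <p>The training-free retrieval-based classification scheme can be sketched as follows. The bag-of-words encoder here is a toy stand-in for the SBERT-NLI embeddings used in the paper, and all function names and parameters are illustrative assumptions.</p>

```python
from collections import Counter
from math import sqrt

def bow_embed(text):
    # Toy sparse encoder; the paper uses SBERT-NLI sentence embeddings.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve_label(query, retrieval_set, encode=bow_embed, k=1):
    """Training-free classification: label the query with the majority
    label of its k most similar retrieval-set documents."""
    q = encode(query)
    ranked = sorted(retrieval_set,
                    key=lambda dy: cosine(q, encode(dy[0])),
                    reverse=True)
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]
```

        <p>No parameters are fitted: classification quality depends entirely on the encoder's ability to place same-label descriptions close together, which is why the choice of text encoder matters so much in the low-diversity setting.</p>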
        <p>Table 4: Best F1 scores obtained with each encoder across datasets, using the same retrieval-based classification framework.</p>
        <p>Future work may involve a more extensive comparison of text encoder effectiveness across various text classification datasets exhibiting different levels of lexical diversity.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <sec id="sec-8-1">
        <p>The authors acknowledge that this work has been partially funded by the European Union and by the
Italian Ministry of Enterprises and Made in Italy (MIMIT),
through the EXPAND project, Grant Agreement No.
101083443.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <source>Sentiment analysis and opinion mining</source>
          , Springer Nature,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>D'Asaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J. M.</given-names>
            <surname>Villacís</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rizzo</surname>
          </string-name>
          ,
          <article-title>Transfer learning of large speech models for italian speech emotion recognition</article-title>
          ,
          <source>in: 2024 IEEE 18th International Conference on Application of Information and Communication Technologies (AICT)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Rai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kaushik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <article-title>Fake news classification using transformer based enhanced lstm and bert</article-title>
          ,
          <source>International Journal of Cognitive Computing in Engineering</source>
          <volume>3</volume>
          (
          <year>2022</year>
          )
          <fpage>98</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Spam detection and classification based on distilbert deep learning algorithm</article-title>
          ,
          <source>Applied Science and Engineering Journal for Advanced Research</source>
          <volume>3</volume>
          (
          <year>2024</year>
          )
          <fpage>6</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Henao</surname>
          </string-name>
          , L. Carin,
          <article-title>Joint embedding of words and labels for text classification</article-title>
          ,
          <source>arXiv preprint arXiv:1805.04174</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          , E. Hovy,
          <string-name>
            <given-names>T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>Unsupervised data augmentation for consistency training</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>6256</fpage>
          -
          <lpage>6268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>EDA: Easy data augmentation techniques for boosting performance on text classification tasks</article-title>
          ,
          <source>arXiv preprint arXiv:1901.11196</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jacovi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. S.</given-names>
            <surname>Shalom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <article-title>Understanding convolutional neural networks for text classification</article-title>
          ,
          <source>arXiv preprint arXiv:1809.08037</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Improving language understanding by generative pre-training</article-title>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lang</surname>
          </string-name>
          ,
          <article-title>Newsweeder: Learning to filter netnews</article-title>
          ,
          <source>in: Machine learning proceedings 1995</source>
          , Elsevier,
          <year>1995</year>
          , pp.
          <fpage>331</fpage>
          -
          <lpage>339</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-bert: Sentence embeddings using siamese bert-networks</article-title>
          ,
          <source>arXiv preprint arXiv:1908.10084</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Urtasun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fidler</surname>
          </string-name>
          ,
          <article-title>Aligning books and movies: Towards story-like visual explanations by watching movies and reading books</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shao</surname>
          </string-name>
          , S. Liu,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          , et al.,
          <article-title>A comprehensive capability analysis of gpt-3 and gpt-3.5 series models</article-title>
          ,
          <source>arXiv preprint arXiv:2303.10420</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , G. Wang,
          <article-title>Text classification via large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2305.08377</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Large language models are zero-shot text classifiers</article-title>
          ,
          <source>arXiv preprint arXiv:2312.01044</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Matthes</surname>
          </string-name>
          ,
          <article-title>Lbl2vec: An embedding-based approach for unsupervised document retrieval on predefined topics</article-title>
          ,
          <source>arXiv preprint arXiv:2210.06023</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          , T.-Y. Liu,
          <article-title>Mpnet: Masked and permuted pre-training for language understanding</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>16857</fpage>
          -
          <lpage>16867</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shah</surname>
          </string-name>
          , E. Fox,
          <article-title>Retrieval-based text selection for addressing class-imbalanced data in classification</article-title>
          ,
          <source>arXiv preprint arXiv:2307.14899</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Abdullahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Eickhoff</surname>
          </string-name>
          ,
          <article-title>Retrieval augmented zero-shot text classification</article-title>
          ,
          <source>in: Proceedings of the 2024 ACM SIGIR international conference on theory of information retrieval</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>195</fpage>
          -
          <lpage>203</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Matthes</surname>
          </string-name>
          ,
          <article-title>Evaluating unsupervised text classification: zero-shot and similaritybased approaches</article-title>
          ,
          <source>in: Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>6</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>O.</given-names>
            <surname>Rubin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Herzig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Berant</surname>
          </string-name>
          ,
          <article-title>Learning to retrieve prompts for in-context learning</article-title>
          ,
          <source>arXiv preprint arXiv:2112.08633</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kasai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ostendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          , et al.,
          <article-title>Selective annotation makes language models better few-shot learners</article-title>
          ,
          <source>arXiv preprint arXiv:2209.01975</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>N.</given-names>
            <surname>Thakur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Daxenberger</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Anand</surname>
          </string-name>
          ,
          <article-title>Beir: A heterogeneous benchmark for zero-shot evaluation of information retrieval models</article-title>
          ,
          <source>Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM)</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2104.08663.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>