<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
<journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn>1613-0073</issn>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Constantin Orasan</string-name>
          <email>c.orasan@surrey.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhe Wu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shenbin Qian</string-name>
          <email>s.qian@surrey.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diptesh Kanojia</string-name>
          <email>d.kanojia@surrey.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samarth Agrawal</string-name>
          <email>samagrawal@ebay.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hadeel Saadany</string-name>
          <email>hadeel.saadany@bcu.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Swapnil Bhosale</string-name>
          <email>s.bhosale@surrey.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <kwd-group>
          <kwd>E-commerce</kwd>
          <kwd>Search</kwd>
          <kwd>Matryoshka</kwd>
          <kwd>Representation Learning</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Birmingham City University</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Surrey</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>eBay Inc</institution>
          ,
          <addr-line>San Jose, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>eBay Inc</institution>
          ,
          <addr-line>Seattle, WA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>E-commerce information retrieval (IR) systems struggle to simultaneously achieve high accuracy in interpreting complex user queries and maintain efficient processing of vast product catalogs. The dual challenge lies in precisely matching user intent with relevant products while managing the computational demands of real-time search across massive inventories. In this paper, we propose a Nested Embedding Approach to product Retrieval and Ranking, called NEAR2, which can achieve up to 12× efficiency in embedding size at inference time while introducing no extra cost in training, and which improves accuracy for various encoder-based Transformer models. We validate our approach using different loss functions for the retrieval and ranking task, including multiple negative ranking loss and online contrastive loss, on four different test sets with various IR challenges such as short and implicit queries. Our approach achieves improved performance at a smaller embedding dimension compared to existing models.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>can lead to dissatisfaction or abandoned searches. Optimizing these systems to handle large-scale data
efficiently without compromising accuracy is a critical challenge in e-commerce search.</p>
      <p>
        In this paper, we propose a Nested Embedding Approach to product Retrieval and Ranking, called
NEAR2, which can achieve efficient product retrieval and ranking using much smaller embedding sizes
of encoder-based Transformer models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This approach maintains performance comparable to the full
model without incurring additional training costs. Our evaluation results on various test sets that contain
different types of challenging queries, such as implicit and alphanumeric queries, indicate that NEAR2
can improve model performance on these challenging datasets using significantly smaller embedding
dimension sizes. Our contributions can be summarized as follows:
• We propose NEAR2, a nested embedding approach, which can achieve up to 12× efficiency in
embedding size and 100× smaller in memory usage during inference while introducing no extra
cost in training.
• We evaluate NEAR2 on four different test sets that contain various types of challenging queries.
      </p>
      <p>Evaluation results show that our approach achieves an improved performance using a much smaller
embedding dimension compared to any existing models.
• We conduct ablative experiments on different encoder-based models fine-tuned using different
IR loss functions. We find that NEAR2 is robust to different IR losses or loss combinations for
continued fine-tuning.
• We perform a qualitative analysis on retrieved product titles using challenging queries. Our analysis
re-affirms the superior performance of our approach and reveals that the similarity scores from
NEAR2 models are more reliable than those of baseline models.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        Modern IR systems encounter several challenges that hinder their performance, particularly in dealing
with complex queries and data representation. Ambiguities in natural language, vocabulary mismatches,
and the need for scalable real-time processing pose significant challenges [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Traditional term-based
models often fail due to lexical gaps and polysemy, necessitating the transition to advanced semantic
models. Semantic retrieval with dense representations, powered by neural networks and pre-trained
language models (PTLMs) like BERT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], has shown remarkable improvements in handling context and
semantics. However, these models demand substantial computational resources and struggle with implicit
or alphanumeric queries [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Similarly, interaction-based approaches focus on capturing query-document
dynamics through deep neural networks, such as the Deep Relevance Matching Model [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], but often
sacrifice efficiency and scalability due to their inability to cache document embeddings offline and their
reliance on real-time computation [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. To bridge the gap between user intent and retrieved product titles
in search queries, Saadany et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] curated a dataset annotated with user-intent centrality scores, and
proposed a dual loss optimization strategy to fine-tune PTLMs on the dataset in a multi-task learning
setting, to solve such challenges.
      </p>
      <p>
        To address the efficiency issue, researchers have proposed a range of solutions aimed at enhancing
efficiency while maintaining accuracy. Efficiency issues can be tackled using DUET
models that employ local and distributed deep neural networks, which learn dense lower-dimensional
vector representations of the query and the document text for efficient retrieval [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Knowledge
distillation, where smaller models inherit knowledge from larger PTLMs, has proven effective in reducing
resource requirements without compromising performance for IR systems [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. To mitigate computational
overhead, Wan et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] proposed to use dimension reduction and distilled encoders to create lightweight
models for fast and efficient question-answer retrieval. Kusupati et al. [13] proposed Matryoshka
representation learning (MRL) which is able to encode information at different granularities, to adapt to the
computational constraints of various downstream tasks. In this paper, we tackle the challenges of accuracy
and efficiency using a nested embedding approach based on MRL to create lightweight embedding models
for IR tasks.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Methodology</title>
      <p>This section describes our nested embedding approach in § 3.1 and the backbone models in § 3.2.</p>
      <sec id="sec-4-1">
        <title>3.1. Nested Embedding Training</title>
        <p>We utilize MRL with a ranking loss to train nested embeddings of different sizes on various models.
Matryoshka Representation Learning. MRL develops representations with diverse capacities within
the same higher-dimensional vector by explicitly optimizing sets of lower-dimensional vectors in a nested
manner, as illustrated in Figure 1.</p>
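<p>As a minimal, hypothetical illustration of this nesting (our own sketch, not the paper’s implementation; the re-normalization step and function name are assumptions), a full embedding can be truncated to any nested size and re-normalized before computing cosine similarity:</p>

```python
import numpy as np

def truncate_embedding(vec, m):
    """Keep the first m dimensions of a Matryoshka embedding and
    re-normalize so that cosine similarity stays well defined."""
    prefix = np.asarray(vec, dtype=np.float64)[:m]
    norm = np.linalg.norm(prefix)
    return prefix / norm if norm > 0 else prefix

full = np.random.default_rng(0).normal(size=768)
for m in (768, 512, 256, 128, 64):
    sub = truncate_embedding(full, m)
    # every nested prefix is a unit vector usable on its own
    assert sub.shape == (m,)
    assert np.isclose(np.linalg.norm(sub), 1.0)
```

<p>Each prefix can then be indexed and searched on its own, which is what makes the smaller inference-time embedding sizes possible.</p>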
        <p>The initial m dimensions of the Matryoshka representation, where m ∈ M, the set of nested
representation sizes, form a compact and information-dense vector that matches the accuracy of a separately trained
m-dimensional representation, but requires no extra training effort. As dimensionality increases, the
representation progressively incorporates more detailed information, providing a nested coarse-to-fine
representation. This approach maintains near-optimal accuracy relative to the full dimensional scale,
while avoiding substantial training or deployment costs [14].</p>
        <p>The MRL loss is formally defined in Equation 1, where ℒ_task is the loss for downstream tasks, such
as the cross-entropy loss for classification tasks. f_m(x) is the output of the m-th nested embedding
representation, and c_m is the importance weight for the m-th embedding representation.</p>
        <p>ℒ_MRL = ∑_{m ∈ M} c_m · ℒ_task(f_m(x), y)    (1)</p>
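<p>Equation 1 can be sketched as follows (a toy illustration with an identity “embedding” and a squared-error task loss; all names and values here are ours, not the paper’s):</p>

```python
import numpy as np

def mrl_loss(embed, x, y, nesting_dims, weights, task_loss):
    """Equation 1: a weighted sum of the downstream task loss over the
    nested prefixes f_m(x), one term per size m in M with weight c_m."""
    total = 0.0
    for m, c_m in zip(nesting_dims, weights):
        f_m = embed(x)[:m]          # m-dimensional nested representation
        total += c_m * task_loss(f_m, y)
    return total

# Toy check: the "embedding" is the input itself and the task loss is a
# squared error against the first m target values.
embed = lambda x: np.asarray(x, dtype=np.float64)
sq_err = lambda f, y: float(np.sum((f - np.asarray(y)[: len(f)]) ** 2))
loss = mrl_loss(embed, [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0],
                nesting_dims=[2, 4], weights=[0.5, 0.5], task_loss=sq_err)
```

<p>With a Transformer encoder, f_m(x) would simply be the first m dimensions of the pooled sentence embedding, so all nested sizes are trained in one forward pass.</p>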
        <p>MRL learns multiple nested embedding representations, each with a different size m ∈ M. The final
MRL loss is a weighted sum of the task losses for each of the nested representations. For our product
retrieval and ranking task, we set the multiple negative ranking loss (MNRL) [15] as our ℒ_task.
Multiple Negative Ranking Loss. MNRL measures the difference between relevant (positive) and
irrelevant (negative) examples associated with a given query. This technique ensures a clear separation by
reducing the distance between the query and positive samples while increasing the distance from negative
samples. Using multiple negative examples enhances the model’s ability to discern varying levels of
irrelevance, refining its optimization. The MNRL objective function is formulated as follows:</p>
        <p>ℒ_MNRL = ∑_{i=1}^{P} ∑_{j=1}^{N} max(0, s(q, n_j) − s(q, p_i) + margin)    (2)</p>
        <p>In Equation 2, P represents the number of positive samples; N denotes the number of negative samples;
q is the query; s is the similarity metric (cosine similarity in our case); p_i and n_j are the i-th positive and
j-th negative samples; and margin is a hyperparameter defining the ideal distance between positive and
negative samples based on the relevance score. The goal of MNRL is to minimize the similarity s(q, n_j)
to negative samples while simultaneously maximizing the similarity s(q, p_i) to positive samples.</p>
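<p>Equation 2 can be sketched as a hinge loss over all positive-negative pairs (a simplified illustration assuming cosine similarity and the 0.75 margin reported in Section 4.2; function and variable names are ours):</p>

```python
import numpy as np

def cosine(a, b):
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mnrl(query, positives, negatives, margin=0.75):
    """Equation 2: a hinge term for every (positive, negative) pair, so
    each positive must score at least `margin` above each negative."""
    return sum(
        max(0.0, cosine(query, n) - cosine(query, p) + margin)
        for p in positives
        for n in negatives
    )

# A positive identical to the query and an orthogonal negative already
# satisfy the margin, so the loss is zero.
loss = mnrl([1.0, 0.0], positives=[[1.0, 0.0]], negatives=[[0.0, 1.0]])
```

<p>In practice MNRL is usually computed in-batch, with the other queries’ positives serving as negatives; the toy version above only mirrors the formula itself.</p>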
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Backbone Models</title>
        <p>We used encoder-based Transformer models as our backbone for training nested embeddings for efficient
product retrieval and ranking.</p>
        <p>
          Pre-trained Language Models We initially leveraged BERT [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], a publicly available pre-trained
encoder Transformer model. For our specific use case in e-commerce, we also employed eBERT, a
proprietary multilingual language model pre-trained internally at eBay. This custom model was
pre-trained on a corpus of approximately three billion product titles, supplemented by data from
general-domain sources like Wikipedia and RefinedWeb.
        </p>
        <p>Expanding our experimental approach, we also incorporated eBERT-siam, a fine-tuned variant of
eBERT using a Siamese network architecture. This model aims to generate semantically aligned
embeddings for item titles, making it particularly effective for similarity-based search and retrieval tasks.
Consistent across all models, we maintained a uniform architectural design of 12 layers with a dimension
size of 768.</p>
        <p>
          User-intent Centrality Optimized (UCO) Models Saadany et al. [
          <xref ref-type="bibr" rid="ref3">3, 16</xref>
          ] show how current IR systems
have problems in achieving user-centric product retrieval and ranking due to implicit or alphanumeric
queries. They curated a dataset with user-intent centrality scores (see Section 4.1) and proposed a few
models optimized for user-intent using an MNRL loss for retrieval and ranking, and an online contrastive
loss (OCL) for user-intent centrality. OCL builds on the traditional contrastive loss (CL) [17] approach but
introduces a more focused strategy. While conventional CL uses a twin network to evaluate similarities
between all data point pairs from the same and different classes, OCL targets only the most challenging
and informative pairs within a batch. By prioritizing such cases, OCL refines the loss calculation to focus
on the most critical and complex relationships between data points.
        </p>
        <p>
          They applied the two losses in a transfer learning setup for eBERT and eBERT-siam models, and
performed fine-tuning for centrality classification. Their results indicate that the UCO models achieve an
improved performance for retrieval and ranking. Details can be found in Saadany et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>To improve model efficiency and meanwhile leverage optimized performance of the UCO models, we
continued training them using NEAR2 for both eBERT-UCO and eBERT-siam-UCO models.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Experimental Setup</title>
      <p>This section explains the datasets we used for training, validating and testing our approach in § 4.1.
Implementation details and evaluation metrics are presented in § 4.2 and § 4.3 respectively.</p>
      <sec id="sec-5-0">
        <title>4.1. Data</title>
        <p>We utilized eBay’s internal graded relevance (IGR) datasets to train our nested embedding representations.
These datasets comprise user search queries alongside the product titles retrieved on the platform. They
are annotated by humans following specific guidelines to generate two types of buyer-focused relevance
labels.</p>
        <p>The first is a relevance ranking scheme, where query-title pairs are assigned a rank from (1) Bad, (2)
Fair, (3) Good, (4) Excellent, to (5) Perfect. A “Perfect” rating signifies an exact match between the query
and title, indicating high confidence that the user’s needs are fully met, whereas a “Bad” rating indicates
no alignment between the query and the product title. This ranking methodology aligns with previous
studies [18, 19]. The second annotation type is a binary centrality score, derived through majority voting
among multiple annotators, indicating whether a product aligns with the user’s expressed query intent.
Centrality scoring differs from relevance ranking in that it assesses whether an item is an outlier or
unexpected in the retrieval set versus being a core match to user expectations.</p>
        <p>
          To compare the results of our approach with those reported in Saadany et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], we utilized the
Common Queries (CQ), CQ Balanced (CQ-balanced), CQ Common String (CQ-common-str), and
CQ Alphanumeric (CQ-alphanum) test sets proposed in their paper. The CQ test set was constructed
using queries with both positive (relevancy &gt; 3) and negative (relevancy &lt; 3) titles, resulting in a dataset
skewed toward positive pairs due to the nature of e-commerce data collection. To address this imbalance,
a new version, CQ-balanced, was created with approximately equal numbers of positive and negative
query-title pairs. The CQ-common-str set was derived by selecting queries where the exact query string
appeared in both positive and negative titles, ensuring a strong correlation between relevance scores (both
graded relevance and binary centrality). Finally, CQ-alphanum was created to include only query-title
pairs containing alphanumeric characters, allowing for a more focused evaluation. Details about their
formulation can be found in Saadany et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. An example of the datasets and the size for each test set
can be seen in Figure 2 and Table 1.
        </p>
        <p>Figure 2: (a) The query “turtle” is a part of both positive and negative titles with very different product
search outputs; it could also be a part of the ambiguous query “turtles bepop”. (b) The query “turtles bepop”
is ambiguous as it could refer to the major antagonist, “Bepop”, alone or together with the other Ninja Turtles.</p>
        <p>Table 1 lists the sizes of the CQ, CQ-balanced, CQ-common-str and CQ-alphanum test sets.</p>
      </sec>
      <sec id="sec-5-1">
        <title>4.2. Implementation Details</title>
        <p>We continued training the PTLMs and the UCO models in § 3.2 for 2 epochs, using our nested embedding
approach at dimension sizes of 768, 512, 256, 128 and 64, on the query-title pairs using only the relevance
ranking scores (excluding pairs with a score of 3) of the IGR datasets.</p>
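<p>The label conversion implied above (relevancy greater than 3 as positive, less than 3 as negative, score-3 pairs excluded) can be sketched as follows; the helper name and toy rows are ours, not eBay’s pipeline:</p>

```python
def to_training_pairs(rows):
    """Map graded relevance labels (1-5) to binary training labels,
    dropping the ambiguous middle grade (3) as described above."""
    pairs = []
    for query, title, score in rows:
        if score == 3:
            continue                     # pairs with score 3 are excluded
        label = 1 if score > 3 else 0    # positive vs. negative pair
        pairs.append((query, title, label))
    return pairs

rows = [("plants", "Potted Peace Lily", 5),
        ("plants", "Garden gloves", 3),
        ("plants", "coins", 1)]
pairs = to_training_pairs(rows)
```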
        <p>During training, we ran a sequential evaluator on the ranking score data to validate all dimension
sizes. First, the evaluator computes the embeddings for both query and title and uses them to calculate
the cosine similarity. Then, it finds the most relevant product titles to the query (top 3, 5 and 10 titles) in
the corpus of all titles, with a maximum corpus size of 200,000. For all experiments, we set a batch size of 32,
a margin of 0.75 for the MNRL loss with the AdamW optimizer [20], and a learning rate of 5e-05.
Training one model using the above hyperparameters takes ≈ 1.5 hours on a single NVIDIA V100 GPU.</p>
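<p>The evaluator’s retrieval step described above can be sketched as a brute-force cosine search over the title corpus (a simplified stand-in for the internal evaluator; the function name and toy vectors are ours):</p>

```python
import numpy as np

def top_k_titles(query_emb, title_embs, k=10):
    """Rank a corpus of title embeddings against one query embedding by
    cosine similarity and return (indices, scores) of the top-k titles."""
    q = np.asarray(query_emb, dtype=np.float64)
    t = np.asarray(title_embs, dtype=np.float64)
    q = q / np.linalg.norm(q)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    sims = t @ q                       # cosine similarity per title
    order = np.argsort(-sims)[:k]      # highest similarity first
    return order.tolist(), sims[order].tolist()

titles = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
idx, scores = top_k_titles([1.0, 0.0], titles, k=2)
```

<p>With nested embeddings, the same search runs on the first m dimensions of the stored vectors, which is where the inference-time savings come from.</p>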
      </sec>
      <sec id="sec-5-2">
        <title>4.3. Evaluation Metrics</title>
        <p>We evaluated the model effectiveness through multiple established evaluation metrics including precision,
recall, normalized discounted cumulative gain (NDCG) [21] and mean reciprocal rank (MRR).</p>
        <p>Precision@k quantifies the ratio of pertinent items within the top-k recommended products, focusing
on their individual relevance. Conversely, recall@k assesses the proportion of successfully retrieved
relevant items compared to the total number of applicable products, regardless of their positioning.
NDCG provides a comprehensive assessment of recommendation quality by analyzing both the relevance
and positioning of suggested items. This metric compares the actual recommendation order against an
idealized ranking, offering a nuanced evaluation of recommendation performance. MRR focuses on
measuring the average ranking position of the first relevant item across different queries. A superior MRR
indicates the model’s capability to prominently feature highly relevant products, thereby enhancing user
experience and recommendation effectiveness.</p>
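<p>For a single ranked list, NDCG and MRR can be sketched as follows (simplified reference implementations for illustration; they take relevance grades and first-relevant ranks directly rather than model outputs):</p>

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    """NDCG: DCG of the actual ranking divided by the DCG of the ideal
    (descending) ordering of the same relevance grades."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

def mrr(first_relevant_ranks):
    """Mean reciprocal rank, given the 1-indexed rank of the first
    relevant item for each query."""
    return sum(1.0 / r for r in first_relevant_ranks) / len(first_relevant_ranks)
```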
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Results and Discussion</title>
      <p>
        Results achieved using NEAR2 with a dimension size of 64 are shown in Table 2. Since BERT and
eBERT were not fine-tuned on e-commerce data (eBERT was only pre-trained on e-commerce data), the
improvement achieved using our approach is substantial, as listed in Table A.1 in Appendix A. The values
are shown as the percentage increase (delta) of the evaluation metrics in comparison with those without
using NEAR2, as presented in Saadany et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Comparing results using NEAR2 against existing models, we find that our approach remarkably
improves performance on all test sets for all models in § 3.2, even using embeddings with a dimension
size of 64, which is 12× smaller in size and more than 100× smaller in memory usage than the full model
(see Table 3).</p>
      <p>When comparing results of different dimension sizes from the largest (768) to the smallest (64), as
shown in Table 4 for the CQ test set (BERT and eBERT results are in Table A.2 in Appendix A), we discover
that the drop in performance is not significant. Embeddings of some smaller dimensions are even slightly
better than larger ones. For example, the performance of the eBERT-siam model using NEAR2 at
dimension 512 is slightly better than at 768, which further indicates the effectiveness of our approach for
product retrieval and ranking.</p>
      <p>Across the CQ, CQ-balanced, CQ-common-str and CQ-alphanum test sets, the values of Table 2 show
improvements ranging from approximately +1.18% to +11.80% for the eBERT-siam, eBERT-UCO and
eBERT-siam-UCO models.</p>
      <p>Table 2: Delta in precision, recall, NDCG, and MRR at k on all the test sets for different encoder-based
models fine-tuned using NEAR2 at 64 dimensions of the entire embedding size (768).</p>
      <p>Table 3: Memory usage (MB) for embedding sizes of 768, 512, 256, 128 and 64.</p>
      <p>To further validate our approach, we qualitatively compared some product titles retrieved with and
without NEAR2. The comparison consistently confirmed the superior performance of our method. Full
details are presented in Appendix B.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Ablation Study</title>
      <p>To verify whether continual training using NEAR2 can help improve performance and efficiency when
models are initially trained with other losses, we conducted several experiments using eBERT and
eBERT-siam for ablation studies. First, we continued training the models using NEAR2 after they had
been fine-tuned using the MNRL and OCL losses respectively, to test whether our approach works on
each of the two individual losses. Second, we tested training these models using the MRL loss first, and
then continued fine-tuning on the MNRL and OCL losses in a multi-task learning setting. The results
are contrasted with training without using NEAR2, and are presented as the percentage increase (delta)
in the evaluation metrics in Table 5.</p>
      <p>Our ablative results suggest that applying the nested embedding approach to training embeddings with
lower dimensions can improve performance for all models fine-tuned using the MNRL or OCL losses
for retrieval and ranking, with more obvious improvement on the models trained using the OCL loss.
However, models trained with the MRL loss first, then fine-tuned using the MNRL and OCL losses, show
slight performance degradation in terms of NDCG and MRR. This suggests that our approach is most
effective when used after training the model with an IR task loss first.</p>
    </sec>
    <sec id="sec-8">
      <title>7. Conclusion and Future Work</title>
      <p>E-commerce IR systems face the challenge of balancing accurate interpretation of complex user queries
with efficient processing of large product catalogs. To address this, we introduced NEAR2, a nested
embedding approach for efficient product retrieval and ranking. NEAR2 improves accuracy and achieves
up to 12× efficiency in embedding size and more than 100× smaller memory usage during inference,
without any increase in training costs. Tested across diverse datasets, including short and implicit queries
and alphanumeric queries, our method outperforms existing models with smaller embedding dimensions,
demonstrating both its robustness across challenging evaluation sets and its efficiency. Our qualitative
analysis reinforces the superior performance of our approach, demonstrating that embeddings generated
by NEAR2 models are significantly more reliable than those of baseline models when evaluated based on
similarity scores. For future work, we plan to: 1) evaluate our model performance through A/B testing in
deployment, 2) leverage internal data to refine larger decoder-based generalist embedding models like
NV-Embed-v2 [22], and 3) optimize these models using our NEAR2 approach.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT (GPT-4) and Grammarly for grammar
and spelling checking. After using these tools/services, the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-references">
      <title>References</title>
      <p>[…] Linguistics: Human Language Technologies: Industry Track, Association for Computational
Linguistics, Hybrid: Seattle, Washington + Online, 2022, pp. 334–343. URL: https://aclanthology.org/2022.naacl-industry.37. doi:10.18653/v1/2022.naacl-industry.37.</p>
      <p>[13] A. Kusupati, G. Bhatt, A. Rege, M. Wallingford, A. Sinha, V. Ramanujan, W. Howard-Snyder,
K. Chen, S. Kakade, P. Jain, et al., Matryoshka representation learning, in: Advances in Neural
Information Processing Systems, 2022.</p>
      <p>[14] X. Li, Z. Li, J. Li, H. Xie, Q. Li, ESE: Espresso sentence embeddings, arXiv preprint arXiv:2402.14776 (2024).</p>
      <p>[15] M. Henderson, R. Al-Rfou, B. Strope, Y.-H. Sung, L. Lukács, R. Guo, S. Kumar, B. Miklos,
R. Kurzweil, Efficient natural language response suggestion for smart reply, arXiv preprint arXiv:1705.00652 (2017).</p>
      <p>[16] H. Saadany, S. Bhosale, S. Agrawal, Z. Wu, C. Orasan, D. Kanojia, Product retrieval and ranking for
alphanumeric queries, in: Proceedings of the 33rd ACM International Conference on Information
and Knowledge Management, CIKM ’24, Association for Computing Machinery, New York, NY, USA,
2024, pp. 5564–5565. URL: https://doi.org/10.1145/3627673.3679080. doi:10.1145/3627673.3679080.</p>
      <p>[17] F. Carlsson, A. C. Gyllensten, E. Gogoulou, E. Y. Hellqvist, M. Sahlgren, Semantic re-tuning
with contrastive tension, in: International Conference on Learning Representations, 2021. URL:
https://openreview.net/forum?id=Ov_sMNau-PF.</p>
      <p>[18] Y. Jiang, Y. Shang, R. Li, W.-Y. Yang, G. Tang, C. Ma, Y. Xiao, E. Zhao, A unified neural network
approach to e-commerce relevance learning, in: Proceedings of the 1st International Workshop
on Deep Learning Practice for High-Dimensional Sparse Data, DLP-KDD ’19, Association for
Computing Machinery, New York, NY, USA, 2019. URL: https://doi.org/10.1145/3326937.3341259. doi:10.1145/3326937.3341259.</p>
      <p>[19] D. Kang, W. Jang, Y. Park, Evaluation of e-commerce websites using fuzzy hierarchical TOPSIS based
on E-S-QUAL, Applied Soft Computing 42 (2016) 53–65. URL: https://www.sciencedirect.com/science/article/pii/S1568494616300047. doi:10.1016/j.asoc.2016.01.017.</p>
      <p>[20] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, in: International Conference on
Learning Representations, 2019. URL: https://openreview.net/forum?id=Bkg6RiCqY7.</p>
      <p>[21] K. Järvelin, J. Kekäläinen, Cumulated gain-based evaluation of IR techniques, ACM Trans. Inf. Syst.
20 (2002) 422–446. URL: https://doi.org/10.1145/582415.582418. doi:10.1145/582415.582418.</p>
      <p>[22] C. Lee, R. Roy, M. Xu, J. Raiman, M. Shoeybi, B. Catanzaro, W. Ping, NV-Embed: Improved
techniques for training LLMs as generalist embedding models, 2024. URL: https://arxiv.org/abs/2405.17428. arXiv:2405.17428.</p>
    </sec>
    <sec id="sec-10">
      <title>A. Additional Figures and Tables</title>
      <p>Table A.1: Delta in precision, recall, NDCG, and MRR at k on all the test sets for BERT and eBERT
fine-tuned using NEAR2 at 64 dimensions of the entire embedding size (768), with improvements ranging
from approximately +104.15% to +273.74%.</p>
      <p>Table A.2: Results for BERT and eBERT across dimension sizes, with improvements ranging from
approximately +114.99% to +265.57%.</p>
    </sec>
    <sec id="sec-11">
      <title>B. Detailed Qualitative Analysis</title>
      <p>To understand the performance improvements of our approach compared to existing models, we conducted
a qualitative analysis using examples from the CQ test set. Specifically, we generated inferences for
all instances in the CQ test set with eBERT and eBERT-siam (we mainly analyze results from eBERT;
results from eBERT-siam can be seen in Tables B.3 and B.4), using or not using the NEAR2 approach
at a dimension size of 64 (NEAR2@64). For each query, we retrieved the top 10 product titles and
ranked them based on their cosine similarity scores. To evaluate real-world performance, we selected two
representative queries: one short and implicit query and one long and detailed query. These examples
provided insights into how our approach performs relative to eBERT or eBERT-siam in practical scenarios.</p>
      <p>Short and Implicit Query. Table B.1 illustrates the retrieved titles, their rankings (from 1 to 10), and
their similarity scores (normalized against the minimum value) for the short and implicit query “plants”
with eBERT. Based on the gold label, the expected product title should include “potted plants”. For the
model using NEAR2@64, all retrieved product titles contained relevant keywords such as “plant” or “pot”,
along with detailed product descriptions. In contrast, the titles retrieved by the model without using
NEAR2@64 were significantly shorter, with many lacking the keyword “plant” and some, such as “coins”,
being entirely irrelevant to the query. Notably, the normalized similarity scores without using NEAR2@64
are much lower than those using NEAR2@64, which accounts for the irrelevant titles retrieved. This
highlights the unreliability of the similarity scores from models without using NEAR2.</p>
      <sec id="sec-11-1">
        <title>Table B.1: Retrieved titles for the short and implicit query “plants” using or not using NEAR2@64 on eBERT</title>
        <p>With NEAR2@64, the retrieved titles include, for example, “Philodendron Micans Rooted Cutting Trailing
House Plant Cuttings Rare Plants”, “Tillandsia Mix 5 Plants Indoor Air Plant for House Vivarium Terrarium”
and “Spathiphyllum Peace Lily Indoor Plants 1 x Potted Lily House Plant 9cm Pot”. Without NEAR2@64,
the retrieved titles include “Avocado plant”, “coins”, “Begonia Butterfly”, “drinks cabinet”, “Eucalyptus tree”
and other short or irrelevant items.</p>
      </sec>
      <sec id="sec-11-2">
        <title>Table B.2: Retrieved titles for the long and detailed query “925 sterling silver triplet opal gemstone jewelry vintage pendant s-1.20” using or not using NEAR2@64 on eBERT</title>
      </sec>
      <sec id="sec-11-3">
        <title>Long and Detailed Query</title>
        <p>Table B.2 presents the retrieved titles, their rankings, and their normalized
similarity scores for the long and detailed query “925 sterling silver triplet opal gemstone jewelry vintage
pendant s-1.20” with eBERT. Given the specificity of the query, even searching with the exact gold label title did
not yield the exact product on eBay. However, the model using NEAR2@64 retrieved similar products,
as shown in Figure B.1(b). In contrast, the products retrieved using the top-ranked title from eBERT without
NEAR2@64, shown in Figure B.1(c), were significantly less relevant than those retrieved using
the gold label title in Figure B.1(a). These results further demonstrate the effectiveness of NEAR2@64.
As with the short query example in Table B.1, the normalized similarity scores from eBERT without
NEAR2@64 are much lower than those with it, further underscoring the limitations of the model without NEAR2.
</p>
        <p>Figure B.1: Products retrieved on eBay using (a) the gold label title, (b) the top-ranked title from eBERT using NEAR2@64, and (c) the top-ranked title from eBERT without NEAR2@64, for the query-title pairs in Table B.2.</p>
      </sec>
      <sec id="sec-11-4">
        <title>Performance Disparity</title>
        <p>To investigate the root cause of this performance disparity, we plotted the
distribution of the original similarity scores from eBERT for all retrieved query-title pairs in the CQ test set, as
shown in Figure B.2. The scores from the model using NEAR2@64 are well distributed between 0.5 and
1.0, reflecting nuanced relevance judgements. In contrast, the scores from eBERT without NEAR2@64
are clustered between 0.9 and 1.0, with most query-title pairs assigned a score near 0.95. This
concentration suggests that eBERT on its own fails to differentiate effectively between relevant and irrelevant titles,
leading to poor ranking performance. These findings further explain the superior performance of NEAR2@64 on
the evaluation metrics for the product retrieval and ranking tasks.</p>
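<p>A distribution check of this kind is easy to reproduce on toy data. The snippet below computes cosine similarities from full-dimensional embeddings and from a 64-dimensional truncation, then summarizes the spread of each score distribution; the random vectors, the 768/64 dimensions, and truncation as a stand-in for the NEAR2@64 representation are all illustrative assumptions.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_scores(query_vec, title_mat):
    """Cosine similarity between one query vector and each row of title_mat."""
    q = query_vec / np.linalg.norm(query_vec)
    t = title_mat / np.linalg.norm(title_mat, axis=1, keepdims=True)
    return t @ q

# Toy stand-ins: 768-d embeddings (eBERT-sized) and their first 64 dims
# (a crude proxy for a 64-d Matryoshka-style representation).
query = rng.normal(size=768)
titles = rng.normal(size=(1000, 768))

full_scores = cosine_scores(query, titles)
trunc_scores = cosine_scores(query[:64], titles[:, :64])

# Inspect the spread of each score distribution, as done for Figure B.2.
for name, s in [("768-d", full_scores), ("64-d", trunc_scores)]:
    print(f"{name}: min={s.min():.3f} max={s.max():.3f} std={s.std():.3f}")
```

On random Gaussian data the lower-dimensional scores spread more widely (their standard deviation scales roughly as 1/sqrt(d)); this is a statistical artifact of the toy setup, not a reproduction of the eBERT result itself.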
        <p>For product titles retrieved by eBERT-siam, whether for the short, implicit query or the long, detailed
query, the differences between the titles retrieved with and without NEAR2@64 are less pronounced
than those observed with eBERT. However, the similarity scores still show a notable distinction.
As illustrated in Figure B.3, the model using NEAR2@64 produces scores that are well distributed
between 0.45 and 1.0. In contrast, the scores from the model without it are more tightly
clustered between 0.65 and 1.0, with the majority of query-title pairs receiving scores between 0.75 and
0.9. These results are consistent with the findings from the eBERT model.</p>
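<p>The rankings reported in Tables B.1 through B.4 come from nearest-neighbour retrieval over title embeddings. A minimal version of that step is sketched below; the function name and the brute-force search are illustrative (a production system would use an approximate-nearest-neighbour index).</p>

```python
import numpy as np

def retrieve_top_k(query_vec, title_mat, k=10):
    """Rank titles by cosine similarity to the query embedding and
    return the indices and scores of the top-k.
    Brute-force search over all titles; illustrative only."""
    q = query_vec / np.linalg.norm(query_vec)
    t = title_mat / np.linalg.norm(title_mat, axis=1, keepdims=True)
    scores = t @ q
    order = np.argsort(-scores)[:k]  # indices of the k highest scores
    return order, scores[order]
```

For example, a title whose embedding points in the same direction as the query is ranked first with a cosine score of 1.0.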
        <p>Gold label: CRAZY DAISY Shasta daisies Qty 2 PLANTS Hardy Perennial Healthy plants.</p>
        <p>[Table B.3: rankings, retrieved titles, and normalized similarity scores for the short and implicit query “plants” using or not using NEAR2@64 on eBERT-siam. Retrieved titles include “CRAZY DAISY Shasta daisies Qty 2 x Hardy Perennial healthy plants”, “Spathiphyllum Peace Lily Indoor Plants 1 x Potted Lily House Plant 9cm Pot”, “Leucanthemum Crazy Daisy in plant in 13cm pot approx” and “Aloe Vera Plant - Large Plant in Pot”.]</p>
      </sec>
      <sec id="sec-11-5">
        <title>Table B.4: Retrieved titles for the detailed query “925 sterling silver triplet opal gemstone jewelry vintage pendant s-1.20” using or not using NEAR2@64 on eBERT-siam</title>
        <p>[Table B.4: rankings, retrieved titles, and normalized similarity scores.]</p>
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>