SCaLe-QA: Sri Lankan Case Law Embeddings for Legal QA

Lasal Jayawardena1,2,†, Nirmalie Wiratunga1,†, Ramitha Abeyratne1, Kyle Martin1, Ikechukwu Nkisi-Orji1 and Ruvan Weerasinghe2

1 Robert Gordon University, Aberdeen, United Kingdom
2 Informatics Institute of Technology, Sri Lanka

SICSA REALLM Workshop 2024
† Corresponding author.
l.jayawardena@rgu.ac.uk (L. Jayawardena); n.wiratunga@rgu.ac.uk (N. Wiratunga); r.abeyratne@rgu.ac.uk (R. Abeyratne); k.martin3@rgu.ac.uk (K. Martin); i.nkisi-orji@rgu.ac.uk (I. Nkisi-Orji); ruvan.w@iit.ac.lk (R. Weerasinghe)
ORCID: 0009-0002-7100-6015 (L. Jayawardena); 0000-0003-4040-2496 (N. Wiratunga); 0009-0008-5582-8311 (R. Abeyratne); 0000-0003-0941-3111 (K. Martin); 0000-0001-9734-9978 (I. Nkisi-Orji); 0000-0002-1392-7791 (R. Weerasinghe)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
SCaLe-QA is a foundational system developed for Sri Lankan Legal Question Answering (LQA) that leverages domain-specific embeddings derived from Supreme Court cases. The system is tailored to capture the unique linguistic and structural characteristics of Sri Lankan law through fine-tuned embeddings. While Case-Based Reasoning (CBR) will be integrated into the question-answering framework, it is primarily set for future development and evaluation. Currently, SCaLe-QA employs semantic chunking, tokenization, and BM25-based ranking to generate context-driven triplets from unlabeled corpora. In addition, an angle-optimised contrastive learning framework is applied to enhance retrieval accuracy. Preliminary results are promising, establishing SCaLe-QA as a significant step toward robust AI applications in the Sri Lankan legal domain.

Keywords
text embeddings, legal AI, RAG, CBR, legal question answering, retrieval

1. Introduction

The increasing complexity of legal texts, particularly within the Sri Lankan judicial system, poses significant challenges for the development of effective Legal Question Answering (LQA) systems. Legal documents are characterised by specialised vocabulary, intricate syntactic structures, and context-dependent semantics, making the task of automated question answering both demanding and essential. The ability to accurately and efficiently answer legal questions is critical, as it enhances access to legal information, supports legal research, and facilitates informed decision-making for legal professionals, researchers, and the general public [1].

Recent advancements in Natural Language Processing (NLP) and Machine Learning (ML) have spurred the development of sophisticated LQA systems that leverage deep learning techniques to process and understand legal texts effectively. These advancements have been well documented, highlighting the importance of domain-specific datasets and models tailored to the unique characteristics of legal texts [2]. The development of domain-specific embeddings is crucial for enhancing the performance of LQA systems: the authors of [3] emphasise the necessity of sentence embeddings tailored to the legal domain, given the specialised vocabulary and unique semantic interpretations found in legal texts.

Retrieval Augmented Generation (RAG) has emerged as a powerful approach for enhancing the performance and reliability of LQA systems. RAG combines the strengths of retrieval-based methods with generative models, allowing for more accurate and contextually relevant responses to legal queries. This approach involves retrieving relevant documents or passages from a large corpus of legal texts and then using the retrieved information to augment the generation process [4]. Furthermore, the integration of case-based reasoning (CBR) systems with specialised embeddings, as investigated in [5], has been shown to improve performance over typical information retrieval techniques [6] in the legal context.
In this contribution, we explore the impact of tuning domain-specific embeddings for legal contexts, focusing on how these embeddings can be used to transform triplets drawn from unprocessed legal documents into structured representations. Our aim is to improve retrieval accuracy within Retrieval Augmented Generation (RAG) systems, which is crucial for the effectiveness of Legal Question Answering (LQA) systems. By enhancing these embeddings, we aim to significantly boost the performance and reliability of LQA systems tailored to the legal domain, particularly in the Sri Lankan legal space.

2. Finetuning Methodology

Figure 1: Workflow for Finetuning Process

2.1. Data Source

The dataset used for this study consists of the officially reported Supreme Court judgments of Sri Lanka, spanning 2009 to 2024. In total, 1541 documents were scraped from the official Supreme Court website (https://www.supremecourt.lk/). The documents cover a wide area of the Sri Lankan legal context, including:

• Appeals: General appeals such as standard appeals, civil appeals, and specific appeals related to legal provisions or leave-to-appeal applications.
• Applications: Constitutional applications, fundamental rights applications, and various other legal procedure applications.
• Civil Cases: General civil matters, including commercial and procedural cases, as well as specific civil cases such as divorce, testamentary, and land disputes.
• Criminal Cases: All cases related to criminal law.
• Constitutional Matters: Cases dealing with constitutional law, references, and specific declarations under the Constitution.
• Commercial High Court Cases: Commercial disputes handled by the Commercial High Court.
• Other: A variety of other case types, including contempt of court, election-related matters, and writs of certiorari and prohibition.

Most of these documents were directly text-parsable, while others required additional processing. The non-parsable documents were fed through Adobe's online OCR tool (https://www.adobe.com/in/acrobat/online/ocr-pdf.html), and the resulting text was manually corrected to ensure accurate extraction. This cleaned and corrected dataset served as the primary data source for the subsequent stages of this research.

2.2. Document to Sentence Segmentation

In this work, we followed the document chunking strategy of the Open Australian Legal Question-Answering (ALQA) dataset (https://huggingface.co/datasets/umarbutler/open-australian-legal-qa), which targets legal question answering for Australian law. The chunking strategy served as a preprocessing step, assisting the sentence tokenizer in breaking segments into sentences. More importantly, it established the foundation for the testing framework necessary for the embedding fine-tuning process, which is discussed later. We employed the semantic chunking model provided by the SemChunk library (https://github.com/umarbutler/semchunk), which was particularly well suited to handling legal documents. We maintained consistency with the ALQA dataset by setting the chunk size to 384 tokens, as measured by the tiktoken tokenizer for GPT-4 (https://github.com/openai/tiktoken). Upon manual inspection, this chunk size was deemed appropriate and did not negatively impact sentence integrity.
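For concreteness, the snippet below sketches how a judgment could be segmented into roughly 384-token semantic chunks with SemChunk and the GPT-4 tiktoken tokenizer, and then into sentences. This is a minimal sketch: the semchunk.chunk() helper and the NLTK sentence tokenizer are assumptions for illustration, and the exact splitter used in our pipeline may differ.

```python
import tiktoken
import semchunk
from nltk.tokenize import sent_tokenize  # requires nltk.download("punkt"); illustrative splitter

# Count tokens the same way the ALQA dataset does: GPT-4 tokens via tiktoken.
encoder = tiktoken.encoding_for_model("gpt-4")

def count_tokens(text: str) -> int:
    return len(encoder.encode(text))

def segment_document(document_text: str, chunk_size: int = 384) -> list[list[str]]:
    """Split a judgment into ~384-token semantic chunks, then into sentences."""
    chunks = semchunk.chunk(document_text, chunk_size=chunk_size,
                            token_counter=count_tokens)
    # Each chunk is further broken into sentences for the triplet-creation step.
    return [sent_tokenize(chunk) for chunk in chunks]
```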
To gain insight into the dataset after preprocessing, we performed sentence-level visualisation, as shown in Figure 2. The visualisation highlights that the dataset includes very long documents, some exceeding 1500 sentences. It also emphasises the significant variability in both document size and sentence length, which presents unique challenges for processing and analysing legal texts.

Figure 2: Sentence Level Visualisation of the Dataset

2.3. Triplet Creation

After sentence segmentation, each sentence is treated as an individual unit for the subsequent triplet-creation step. The BM25 algorithm, a robust ranking function used in information retrieval [7], is applied to rank these sentences by their relevance to one another within each case document. This method leverages the lexical knowledge embedded in the text as weak supervision, which has been shown to be a strong baseline for fine-tuning text embeddings [8]. BM25 ranking was implemented with the rank_bm25 library (https://github.com/dorianbrown/rank_bm25), and the scoring used for this preprocessing is given below.

Given a sentence S in a document D, the BM25 score for another sentence S' in the same document is calculated using the following adaptation of the BM25 formula:

\mathrm{score}(S', D) = \sum_{i=1}^{n} \mathrm{IDF}(t_i) \cdot \frac{f(t_i, S') \cdot (k_1 + 1)}{f(t_i, S') + k_1 \cdot \left(1 - b + b \cdot \frac{|S'|}{\mathrm{avgdl}}\right)}    (1)

Here, t_i are the terms of sentence S, f(t_i, S') is the frequency of term t_i in sentence S', |S'| is the length of sentence S', and avgdl is the average sentence length in the document. k_1 and b are hyperparameters (typically k_1 = 1.5 and b = 0.75), and IDF(t_i) is the inverse document frequency of term t_i.

In this approach, for each sentence S in a document, we rank all other sentences S' in the same document using the BM25 algorithm. The most similar sentence (i.e., the one with the highest BM25 score) is selected as the positive, or like, sample X_L. To select a negative, or unlike, sample X_U, we randomly choose one of the five least similar sentences (i.e., those with the lowest BM25 scores). This strategy avoids a single very dissimilar sentence being repeatedly included in the triplets, which would make the triplets less informative for training. The size of this least-similar pool was determined through empirical experimentation. The sentences identified through this process are denoted as follows:

• X_a: the anchor sentence, i.e., the sentence being evaluated within the document.
• X_L: the positive sample, the sentence most like the anchor.
• X_U: the negative sample, a sentence drawn from a pool of the N sentences least similar to the anchor.

These notations will be used in the subsequent embedding fine-tuning process.
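The triplet-construction procedure can be sketched with the rank_bm25 library as follows. The whitespace tokenisation, the fixed random seed, and the short-document guard are illustrative choices consistent with the description above, not an exact reproduction of our implementation.

```python
import random
from rank_bm25 import BM25Okapi

def build_triplets(sentences: list[str], pool_size: int = 5, seed: int = 13):
    """Create (anchor X_a, positive X_L, negative X_U) triplets for one case document."""
    if len(sentences) <= pool_size + 1:
        return []  # too few sentences to form informative triplets

    rng = random.Random(seed)
    tokenised = [s.lower().split() for s in sentences]  # simple whitespace tokenisation
    bm25 = BM25Okapi(tokenised)

    triplets = []
    for i, anchor in enumerate(sentences):
        scores = bm25.get_scores(tokenised[i])             # BM25 score of every sentence vs. the anchor
        candidates = [j for j in range(len(sentences)) if j != i]
        candidates.sort(key=lambda j: scores[j], reverse=True)
        positive = sentences[candidates[0]]                # highest BM25 score -> X_L
        negative_pool = candidates[-pool_size:]            # the five least similar sentences
        negative = sentences[rng.choice(negative_pool)]    # randomly sampled unlike sentence -> X_U
        triplets.append((anchor, positive, negative))
    return triplets
```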
2.4. Embedding Finetuning

A contrastive compound loss function was designed to optimise the distances within the embedding space between triplets (X_a, X_L, X_U), where X_a is the anchor sample, X_L is the positive sample (similar to, or like, the anchor), and X_U is the negative sample (dissimilar to, or unlike, the anchor). The compound loss function is inspired by the methodology in [9], where angle optimisation is combined with the cosine objective from [10]. This approach differentiates itself from existing contrastive learning methods discussed in [11, 12]. Specifically, the loss function combines three key objectives:

L = w_1 \, L_c(S_U, S_L) + w_2 \left( - \sum_{b} \sum_{m} \log \frac{\exp(S_L / \tau)}{\sum_{j} \exp(S_{U_j} / \tau)} \right) + w_3 \, L_c(S'_U, S'_L)    (2)

• The first term, L_c(S_U, S_L), weighted by w_1, uses the standard cosine similarities between the anchor and the positive (like) instance, S_L = cos(X_a, X_L), and between the anchor and the negative (unlike) instance, S_U = cos(X_a, X_U). Here the general contrastive loss function is defined as L_c(S_U, S_L) = \log \left( 1 + \sum \exp \left( \frac{S_U - S_L}{\tau} \right) \right). This encourages the model to ensure that positive pairs have higher similarity than negative pairs.
• The second term, weighted by w_2, applies in-batch negative sampling, comparing the anchor-positive pairs within a batch and treating the remaining pairs as negatives, again using cosine similarities to obtain S_L and S_U.
• The third term, weighted by w_3, is similar to the first but uses a refined similarity metric S', in which the embeddings of X_a, X_L and X_U are split in half and the similarities S'_L and S'_U are calculated by averaging the cosine similarity over the two halves.

Here, τ is a temperature scaling parameter that controls the sensitivity of the model to differences in similarity scores: lower values of τ sharpen the distribution, while higher values soften it. Parameters m and b denote the batch size and the number of batches, respectively. (A code-level sketch of this compound objective is given at the end of this subsection.)

The loss function was applied to fine-tune AnglE-BERT, a model specifically designed for angle-optimised contrastive learning, as introduced in [9]. AnglE-BERT was initially trained with an angle-optimisation mechanism that adjusts the angles between embeddings in the latent space, which is particularly effective for distinguishing between similar and dissimilar sentence pairs. Each fine-tuning run for AnglE-BERT used the triplets formed from the BM25-ranked sentences and was executed on an NVIDIA RTX A100. Training was conducted with a batch size of 32 over 10 epochs, spanning around 14 GPU hours, during which the model was exposed to just over 230,000 triplets. This extensive training was aimed at refining the model's ability to learn the nuances of the legal texts seen in the triplets.
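To make Eq. (2) concrete, the following is a minimal PyTorch-style sketch of the compound objective, written directly from the description above (a pairwise cosine contrastive term, an in-batch negatives term, and a half-split refinement). The weight and temperature values are placeholders, and the sketch is illustrative rather than the exact training code.

```python
import torch
import torch.nn.functional as F

def cosine_contrastive(s_unlike: torch.Tensor, s_like: torch.Tensor, tau: float) -> torch.Tensor:
    # L_c(S_U, S_L) = log(1 + sum exp((S_U - S_L) / tau)), summed over the batch.
    return torch.log1p(torch.exp((s_unlike - s_like) / tau).sum())

def compound_loss(anchor, like, unlike, weights=(1.0, 1.0, 1.0), tau=0.05):
    """Sketch of Eq. (2) for a batch of (X_a, X_L, X_U) embeddings of shape (batch, dim)."""
    s_like = F.cosine_similarity(anchor, like)      # S_L = cos(X_a, X_L), per triplet
    s_unlike = F.cosine_similarity(anchor, unlike)  # S_U = cos(X_a, X_U), per triplet

    # Term 1: cosine contrastive objective on the triplet similarities.
    term1 = cosine_contrastive(s_unlike, s_like, tau)

    # Term 2: in-batch negatives -- every other positive in the batch acts as a
    # negative for the current anchor (InfoNCE-style cross-entropy over the batch).
    logits = F.cosine_similarity(anchor.unsqueeze(1), like.unsqueeze(0), dim=-1) / tau
    term2 = F.cross_entropy(logits, torch.arange(anchor.size(0), device=anchor.device))

    # Term 3: the same contrastive objective on a refined similarity, obtained by
    # splitting each embedding in half and averaging the two cosine similarities.
    half = anchor.size(1) // 2
    def split_cos(a, b):
        return 0.5 * (F.cosine_similarity(a[:, :half], b[:, :half]) +
                      F.cosine_similarity(a[:, half:], b[:, half:]))
    term3 = cosine_contrastive(split_cos(anchor, unlike), split_cos(anchor, like), tau)

    w1, w2, w3 = weights
    return w1 * term1 + w2 * term2 + w3 * term3
```

In training, a loss of this form would be applied to the pooled AnglE-BERT embeddings of each triplet batch.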
2.5. Model Training Dualities

As introduced in AnglE-BERT [9], two distinct flavours of embeddings were fine-tuned using contrastive learning, each optimised for a different retrieval purpose. In this work, these embeddings are categorised as Intra-Embeddings and Inter-Embeddings, which serve specific retrieval and matching tasks.

• Intra-Embeddings (f(Q)): These embeddings are optimised for attribute matching within the same type of content, such as comparing questions with questions. This type of embedding is particularly useful for semantic textual similarity tasks, where the focus is on finding sentences with closely related meanings, even if they are phrased differently.
• Inter-Embeddings (g(Q)): These embeddings are designed for broader information retrieval scenarios, where the goal is to match content across different types of attributes, such as comparing a query with relevant passages, entities, or supporting texts. This allows for more flexible retrieval tasks, where the query might need to be matched with various types of contextual information.

The fine-tuning process outlined above was conducted separately on the intra and inter embeddings, ensuring that each representation was optimised for its respective task. This dual fine-tuning approach allows the model to perform well on both precise attribute matching and broader retrieval tasks. Conceptually, this approach is akin to a form of query rewriting [13], where each type of embedding acts as a different representation of the input query, tailored to optimise retrieval for a specific purpose. Table 1 illustrates the difference between the intra-embedding and inter-embedding of a sentence used in the training process.

Table 1: Comparison of an example sentence with and without the Cue text used to create intra and inter embeddings.

Embedding   Sentence
intra       f("The primary issue arises due to the inclusion of 'share certificate' in Gazette No. 1465/19.")
inter       g("Represent this sentence for searching relevant passages: " + "The primary issue arises due to the inclusion of 'share certificate' in Gazette No. 1465/19.")

These dual embeddings can form the foundation for flexible and robust retrieval systems that handle both precise and contextually broad queries within the legal domain.
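In practice, the two representations differ only in whether the retrieval cue is prepended before encoding. The helper below illustrates this duality, assuming the fine-tuned model can be loaded through the sentence-transformers interface; the checkpoint path is hypothetical and the cue string is the one shown in Table 1.

```python
from sentence_transformers import SentenceTransformer

# Retrieval cue from Table 1; queries encoded without it give intra-embeddings f(Q),
# and queries encoded with it give inter-embeddings g(Q).
CUE = "Represent this sentence for searching relevant passages: "

model = SentenceTransformer("models/scale-qa-angle-bert")  # hypothetical local checkpoint

def intra_embed(texts: list[str]):
    """f(Q): attribute matching within the same content type (e.g. question-to-question)."""
    return model.encode(texts, normalize_embeddings=True)

def inter_embed(texts: list[str]):
    """g(Q): broader retrieval, matching a query against passages or supporting texts."""
    return model.encode([CUE + t for t in texts], normalize_embeddings=True)
```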
This process, depicted in Figure 4, starts by filtering the entire casebase to group cases that share overlapping legal references, such as common laws or acts. Metadata filtering was applied in such a way that for each case in the casebase, other cases with overlapping laws were identified. These similar cases were ranked based on how closely related they were, but ensuring that they originated from different legal documents, thus enhancing diversity in the source material. The cases selected through this filtering process were further refined by applying a cosine similarity ranking mechanism, using the Ada-002 embedding model [14] to identify closely related case pairs using the query of each case. These pairs were then used to generate a new hybrid question and answer through a prompt designed for the GPT-4o 8 LLM. The generated question was complex and required the context of both related case snippets to be answered correctly. A human-in-the-loop system was employed to review the generated question-answer pairs. The evaluators (authors of this paper) assessed the quality of each pair, filtering out those that lacked relevance or quality. This rigorous review ensured the creation of a high-quality test set for further retrieval evaluation. This test set contained 1000 high-quality question-answer pairs to evaluate the embedding retrieval. 8 https://platform.openai.com/docs/models/gpt-4o Figure 4: Test Set Creation Workflow 3.3. Retrieval Analysis The embedding model performance was assessed using multiple Retrieval@K evaluations, which helps in understanding how well the model ranks and retrieves relevant information based on the hybrid test cases. The retrieval evaluation included analysing both the F1-score@K and Recall@K, which provide insights into the balance between precision and recall during the retrieval process. To conduct this evaluation, we used k-Nearest Neighbors (k-NN) based retrieval, exploring a range of 𝑘 values between 1 and 37. These prime values allowed us to investigate the optimal retrieval size for the legal documents used in the study. We evaluated our fine-tuned AnglE-BERT model for both intra and inter-embeddings, comparing it against the standard BERT[15] and AnglE-BERT models[9]. Figure 5 shows the heat maps comparing Recall@K and F1-score@K for these models and their respective weight configurations. Figure 5: Recall and F1 Score Analysis of Retrieval@K The results indicate that fine-tuning on AnglE-BERT improves both the Recall@K and F1-score@K across different retrieval levels. Specifically, the fine-tuned AnglE-BERT model with different weight configurations has shown a robust retrieval performance whereas AnglE-BERT has performed well in only query-to-query matching. These findings suggest that the fine-tuned model’s ability to retrieve relevant legal cases was robust and well-adapted to the unique characteristics of the legal domain. 3.4. Embedding Distribution The embedding distribution, as illustrated in Figure 6, was obtained by calculating the cosine similarity between the query (i.e., the question for each case) and its corresponding snippet or context. For the standard BERT and AnglE-BERT models, the similarity distribution is skewed to the left. This left-skewed distribution indicates that these models classify more query-snippet pairs as having Figure 6: Analysis of Cosine Distributions between Query and Snippet a relatively high similarity score. 
3.4. Embedding Distribution

The embedding distribution, illustrated in Figure 6, was obtained by calculating the cosine similarity between each query (i.e., the question for each case) and its corresponding snippet or context.

Figure 6: Analysis of Cosine Distributions between Query and Snippet

For the standard BERT and AnglE-BERT models, the similarity distribution is skewed to the left, indicating that these models assign relatively high similarity scores to most query-snippet pairs. This behaviour suggests that BERT and AnglE-BERT may not be capturing the nuanced relationships between legal queries and snippets effectively, potentially leading to a higher number of false positives in retrieval tasks.

In contrast, the fine-tuned AnglE-BERT model exhibits a more normal-like distribution. This shift suggests that fine-tuning has improved the model's ability to differentiate between relevant and irrelevant cases, balancing the similarity scores across query-snippet pairs. The more centred distribution may indicate that the fine-tuned model is better adapted to the legal domain, having learned the domain-specific vocabulary and complex semantic relationships within legal texts. As a result, it is more robust in distinguishing between cases with subtle variations in meaning, leading to improved retrieval performance.

4. Conclusion

In this work, we developed SCaLe-QA, a foundational system tailored to the specific requirements of Sri Lankan Legal Question Answering (LQA) tasks, using domain-specific embeddings derived from Supreme Court cases. Our work primarily focused on enhancing the retrieval accuracy of a RAG system (to be combined with CBR) by fine-tuning embeddings, using BM25 ranking for triplet generation and contrastive learning methods. An interesting finding was the value of creating dual representations of the query depending on the attributes being compared during retrieval; fine-tuning in this manner resulted in superior F1 scores. Future work will involve integrating Case-Based Reasoning (CBR) to build more comprehensive question-answering models, as well as expanding the scope of SCaLe-QA to attribute-focused embedding models.

References

[1] A. Louis, G. van Dijck, G. Spanakis, Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models, 2023. URL: http://arxiv.org/abs/2309.17050, arXiv:2309.17050 [cs].
[2] A. Abdallah, B. Piryani, A. Jatowt, Exploring the state of the art in legal QA systems, Journal of Big Data 10 (2023) 127. URL: https://journalofbigdata.springeropen.com/articles/10.1186/s40537-023-00802-8. doi:10.1186/s40537-023-00802-8.
[3] S. Jayasinghe, L. Rambukkanage, A. Silva, N. de Silva, S. Perera, M. Perera, Learning Sentence Embeddings In The Legal Domain with Low Resource Settings, in: S. Dita, A. Trillanes, R. I. Lucas (Eds.), Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation, Association for Computational Linguistics, Manila, Philippines, 2022, pp. 494–502. URL: https://aclanthology.org/2022.paclic-1.55.
[4] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, D. Kiela, Retrieval-augmented generation for knowledge-intensive NLP tasks, in: Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS '20, Curran Associates Inc., Red Hook, NY, USA, 2020. Event-place: Vancouver, BC, Canada.
[5] N. Wiratunga, R. Abeyratne, L. Jayawardena, K. Martin, S. Massie, I. Nkisi-Orji, R. Weerasinghe, A. Liret, B. Fleisch, CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering, in: J. A. Recio-Garcia, M. G. Orozco-del Castillo, D. Bridge (Eds.), Case-Based Reasoning Research and Development, volume 14775, Springer Nature Switzerland, Cham, 2024, pp. 445–460. URL: https://link.springer.com/10.1007/978-3-031-63646-2_29. doi:10.1007/978-3-031-63646-2_29. Series title: Lecture Notes in Computer Science.
[6] M.-Y. Kim, Y. Xu, R. Goebel, Applying a Convolutional Neural Network to Legal Question Answering, in: M. Otake, S. Kurahashi, Y. Ota, K. Satoh, D. Bekki (Eds.), New Frontiers in Artificial Intelligence, volume 10091, Springer International Publishing, Cham, 2017, pp. 282–294. URL: http://link.springer.com/10.1007/978-3-319-50953-2_20. doi:10.1007/978-3-319-50953-2_20. Series title: Lecture Notes in Computer Science.
[7] S. Robertson, H. Zaragoza, The Probabilistic Relevance Framework: BM25 and Beyond, Foundations and Trends® in Information Retrieval 3 (2009) 333–389. URL: http://www.nowpublishers.com/article/Details/INR-019. doi:10.1561/1500000019.
[8] L. Wang, N. Yang, X. Huang, B. Jiao, L. Yang, D. Jiang, R. Majumder, F. Wei, Text Embeddings by Weakly-Supervised Contrastive Pre-training, 2024. URL: http://arxiv.org/abs/2212.03533, arXiv:2212.03533 [cs].
[9] X. Li, J. Li, AnglE-optimized Text Embeddings, 2024. URL: http://arxiv.org/abs/2309.12871, arXiv:2309.12871 [cs].
[10] J. Su, CoSENT (1): A more effective sentence vector scheme than Sentence-BERT, 2022. URL: https://kexue.fm/archives/8847.
[11] T. Gao, X. Yao, D. Chen, SimCSE: Simple Contrastive Learning of Sentence Embeddings, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2021, pp. 6894–6910. URL: https://aclanthology.org/2021.emnlp-main.552. doi:10.18653/v1/2021.emnlp-main.552.
[12] L. Xu, H. Xie, Z. Li, F. L. Wang, W. Wang, Q. Li, Contrastive Learning Models for Sentence Representations, ACM Trans. Intell. Syst. Technol. 14 (2023). URL: https://doi.org/10.1145/3593590. doi:10.1145/3593590. Place: New York, NY, USA. Publisher: Association for Computing Machinery.
[13] X. Ma, Y. Gong, P. He, H. Zhao, N. Duan, Query Rewriting for Retrieval-Augmented Large Language Models, 2023. URL: http://arxiv.org/abs/2305.14283, arXiv:2305.14283 [cs].
[14] OpenAI, New and Improved Embedding Model, 2023. URL: https://openai.com/blog/new-and-improved-embedding-model. Publisher: OpenAI.
[15] N. Reimers, I. Gurevych, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2019. URL: http://arxiv.org/abs/1908.10084.