<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Influence of distance measures and data characteristics on time performance in content-based and collaborative filtering datasets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksandr Marchenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maksym Shevchenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>60 Volodymyrska Street, Kyiv, 01033</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>99</fpage>
      <lpage>108</lpage>
      <abstract>
        <p>This paper presents a comparative study on the time performance of various distance measures within the vector-space model, applied to content-based and collaborative filtering datasets. Euclidean distance, inner product, and cosine similarity were evaluated. A custom experimental framework was developed to assess these measures. The impact of dataset size, dimensionality, and the number of closest vectors returned by queries was analyzed.</p>
      </abstract>
      <kwd-group>
        <kwd>Chroma DB</kwd>
        <kwd>collaborative filtering</kwd>
        <kwd>content-based filtering</kwd>
        <kwd>distance measures</kwd>
        <kwd>recommender systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recommender systems have become an integral part of our daily lives, influencing decisions in
various domains. From suggesting movies and TV shows on platforms like Netflix, to recommending
products on e-commerce sites such as Amazon, and even curating personalized music playlists on
Spotify, these systems enhance user experience by providing tailored content based on individual
preferences. Unlike traditional recommender systems that operate behind the scenes, dialog-based
systems engage users in interactive conversations to gather preferences and provide
recommendations through natural language interfaces. This approach offers a more human-centric
and engaging way to receive personalized suggestions, potentially leading to improved user
satisfaction and increased adoption of recommendation-driven applications [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Recommender systems typically rely on two main approaches: content-based filtering and
collaborative filtering. Content-based filtering relies on the attributes of items (or their content itself)
and user profiles to suggest similar items [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. For instance, if a user has shown interest in science
fiction movies, the system will recommend other science fiction movies based on the features such
as genre, director, or actors. Collaborative filtering leverages the preferences and behaviors of
multiple users to generate recommendations. It assumes that if users A and B have similar tastes,
items liked by user A are likely to be appreciated by user B as well [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Both methods aim to discover
new content users might enjoy, but they differ in their underlying assumptions and data
requirements.
      </p>
      <p>
        One common technique to implement both filtering types is representing user profiles and items
as vectors. This method leverages distance or similarity measures between vectors to identify
recommendations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In content-based filtering, the system compares the vectorized user profile
with vectorized item profiles to find similar items, ensuring recommendations align closely with the
user's interests (Fig. 1, left). In collaborative filtering, vectorized user profiles are compared
with each other, identifying users with similar tastes to suggest items they have enjoyed (Fig. 1,
right).
      </p>
      <p>
        Generative models, such as GPT-like models, are widely used to develop chat-based applications
due to their ability to generate human-like text. However, they are not always the ideal solution for
recommender systems [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. These models are trained on extensive corpora that include a vast
amount of data unrelated to recommendations, which can dilute their effectiveness in this specific
context. Additionally, they may lack relevant data that the system aims to recommend, and their
training data may not be updated frequently, limiting their ability to provide recent items. To address
these limitations, it is possible to fine-tune the model on relevant, domain-specific data or to
incorporate Retrieval-Augmented Generation techniques.
      </p>
      <p>
        Retrieval-Augmented Generation (RAG) enhances the capabilities of generative models by
allowing them to access external knowledge sources [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], making them more effective for
recommendation tasks. In a RAG-based system, the model does not rely solely on
pretrained knowledge but instead retrieves relevant data dynamically from a connected database or
vector store [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For example, when a user engages with a dialog-based system, the model generates
a query based on the conversation. This query is then used to search the vector store, which contains
pre-encoded user and item profiles. By retrieving relevant vectors, such as items similar to the
user's preferences, the system can provide up-to-date and accurate recommendations. This
approach combines the strengths of generative models with real-time, domain-specific data, ensuring
the recommendations are both contextually relevant and current.
      </p>
      <p>
        While numerous investigations have focused on estimating the impact of distance measures on
recommendation accuracy [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], only a few of them have considered the time performance of these
measures [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Time performance is a critical factor in production systems, which must efficiently
handle large volumes of data and serve numerous users simultaneously. Slow response times can
degrade the user experience and strain system resources, which is unacceptable in real-time
applications like recommender systems. Optimizing the time performance is essential to ensure that
recommender systems can deliver timely and relevant recommendations, maintaining a seamless
user experience even under heavy load. The goal of this study is to assess the time performance of
different distance measures within vector stores. The results could provide valuable insights not only
for improving recommender systems but also for any other domain that relies on vector stores, from
information retrieval to natural language processing and beyond.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. The experiment methodology</title>
      <sec id="sec-2-1">
        <title>2.1. Hardware and software setup</title>
        <p>The experiment was performed on a Windows 10-powered machine equipped with an AMD Ryzen
1500X 3.7 GHz processor, 16 GB of DDR4 3200 MHz RAM, and a 512 GB M.2 SSD. While the time
performance results may vary slightly based on the hardware configuration, the patterns and trends
observed in the experiment should remain consistent across different setups, ensuring the
generalizability of the results. All experiments were implemented using Python 3 (version 3.10.5), a
widely adopted programming language in machine learning research due to its simplicity and
extensive library support.</p>
        <p>
          Chroma DB [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] was used as the vector store in this experiment for several reasons. First, it
provides a Python library, making it straightforward to integrate into the experimental pipeline.
Second, Chroma allows local data storage without the need for an external API, which helps
eliminate delays related to internet connectivity. Finally, it integrates with LangChain [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], a popular
Python framework for building controllable agentic workflows, including RAG applications, which
enhances the flexibility and functionality of the experiment. Chroma natively supports three distance
measures [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]: Squared L2 (Euclidean distance), inner product (IP) and cosine similarity (CS). In
Chroma, these distances are defined as follows [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]:
Squared L2: d = ∑ᵢ₌₁ⁿ (Aᵢ − Bᵢ)²; (1)
        </p>
        <p>Inner product: d = 1.0 − ∑ᵢ₌₁ⁿ (Aᵢ × Bᵢ); (2)
Cosine similarity: d = 1.0 − ∑ᵢ₌₁ⁿ (Aᵢ × Bᵢ) / (√∑ᵢ₌₁ⁿ Aᵢ² ∙ √∑ᵢ₌₁ⁿ Bᵢ²), (3)
where n is the dimension of the vectors, Aᵢ and Bᵢ are the coordinates of vectors A and B,
respectively. All these distances were estimated and compared.</p>
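For reference, the three distance definitions above can be written as a short NumPy sketch. This is purely illustrative (Chroma computes these measures internally); the function names are ours:

```python
import numpy as np

def squared_l2(a, b):
    """Distance (1): sum over i of (A_i - B_i)^2."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.sum((a - b) ** 2))

def ip_distance(a, b):
    """Distance (2): 1.0 - sum over i of (A_i * B_i)."""
    return float(1.0 - np.dot(a, b))

def cosine_distance(a, b):
    """Distance (3): 1.0 - dot(A, B) / (||A|| * ||B||)."""
    a, b = np.asarray(a), np.asarray(b)
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Note that identical vectors have zero distance under all three measures, although for the inner-product distance this holds only for unit-length vectors.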
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Content-based filtering dataset</title>
        <sec id="sec-2-2-1">
          <title>For the content-based filtering experiment, the All the News 2.0 dataset [<xref ref-type="bibr" rid="ref5">5</xref>] was used</title>
          <p>
            This dataset contains nearly 2.7 million news articles and essays from 27 American publications,
spanning the years 2016 to 2020. Each record in the dataset includes various attributes such as
publication date, author, title, article text, URL, and more (Fig. 2). However, for content-based
filtering, only the article (as the text to embed), title (as a metadata field), and row index (as the
record ID) were used to populate the vector store.
The default Chroma embedding function (the all-MiniLM-L6-v2 model) [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] was utilized to
convert each article into a 384-dimensional vector. To ensure manageable computational time, the
dataset was divided into three subsets containing 25,000 (25K), 75,000 (75K), and 125,000 (125K)
articles. Larger subsets were not created due to the extended time required to embed large datasets.
          </p>
          <p>Queries to the vector store were executed using 100 pre-embedded texts, which were not part of
the created datasets, simulating user profiles. Additionally, experiments were also conducted on
generated random data.</p>
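The generated query data can be simulated with random vectors of the article-embedding dimensionality. The exact generation procedure is not specified in the experiment description, so the following is only a plausible sketch:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# 100 synthetic query vectors with the same dimensionality (384)
# as the all-MiniLM-L6-v2 article embeddings.
DIM, N_QUERIES = 384, 100
queries = rng.normal(size=(N_QUERIES, DIM))

# Unit-normalize so the random queries resemble sentence embeddings,
# which are typically unit-length.
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
```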
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Collaborative filtering dataset</title>
        <p>
          For the collaborative filtering experiment, three sizes of the MovieLens (ML) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] dataset were used,
with 100,000 (100K), 1 million (1M), and 10 million (10M) ratings. The larger datasets were not utilized
due to limitations in compute resources.
        </p>
        <p>The dataset includes user-item interactions, where each interaction is represented in the format:
user ID, movie ID, provided rating and timestamp (Fig. 3). Ratings are given on a 5-star scale with
half-star increments, and the timestamp was omitted.</p>
        <p>For collaborative filtering, the user-item interactions should be represented as a utility matrix,
where each row corresponds to a user, and the columns represent their interactions with all items
(movies) from the dataset. Thus, an additional transformation was done to convert the entire data
into utility matrix form (Fig. 4).</p>
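<p>The transformation from rating records to a utility matrix can be sketched as follows. The interaction triples below are hypothetical; the experiment itself used the MovieLens files:</p>

```python
import numpy as np

# Hypothetical (user_id, movie_id, rating) interactions.
ratings = [(1, 10, 4.0), (1, 20, 3.5), (2, 10, 5.0), (3, 30, 2.0)]

# Map user and movie IDs to row and column indices.
users = sorted({u for u, _, _ in ratings})
movies = sorted({m for _, m, _ in ratings})
u_idx = {u: i for i, u in enumerate(users)}
m_idx = {m: j for j, m in enumerate(movies)}

# One row per user, one column per movie; unrated cells stay 0.
utility = np.zeros((len(users), len(movies)))
for u, m, r in ratings:
    utility[u_idx[u], m_idx[m]] = r
```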
        <p>The width of the utility matrix and, accordingly, the dimensionality of the vectors varies
depending on the number of items (movies) in the dataset. For the 100K dataset, the matrix included
610 users and 9,724 items; for the 1M dataset, it included 6,040 users and 3,706 items; and for the 10M
dataset, there were 69,878 users and 10,677 items. This variation allowed for an estimation of how
vector dimensionality impacts retrieval time. To query the created vector stores, 10 records were
taken from the utility matrix itself, and 90 records were generated to simulate additional users and
their interactions.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Methodology</title>
        <p>
          For each dataset and distance function, 10 queries were conducted using 10 input vectors. The
number of closest vectors returned by Chroma DB was parametrized, starting from 10 and increasing
in increments up to 3,010 vectors. A smaller number of returned vectors is practical when the vector
store directly powers a recommender system, providing a concise set of recommendations. On the
other hand, a larger number of returned vectors is beneficial when the query results serve as input
for subsequent stages in a recommendation pipeline, such as in hybrid recommender systems [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ],
where additional post-processing or combination with other algorithms may be required (Fig. 5).
        </p>
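<p>The measurement loop can be sketched with a brute-force nearest-neighbour search standing in for Chroma DB. Chroma uses an HNSW index internally, so absolute timings differ, but the structure of the experiment (10 input vectors, a varying number of returned results) is the same; all names and sizes below are illustrative:</p>

```python
import time
import numpy as np

def top_k(store, query, k):
    """Indices of the k closest stored vectors by squared L2 distance."""
    d = np.sum((store - query) ** 2, axis=1)
    return np.argsort(d)[:k]

rng = np.random.default_rng(1)
store = rng.normal(size=(1000, 384))    # toy stand-in for the vector store
queries = rng.normal(size=(10, 384))    # 10 input vectors, as in the methodology

# Vary the number of returned vectors and record the elapsed time.
timings = {}
for k in (10, 110, 510):
    start = time.perf_counter()
    for q in queries:
        top_k(store, q, k)
    timings[k] = time.perf_counter() - start
```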
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>Due to the extensive volume of data generated during these experiments, only key results and trends
are presented in this paper. Therefore, readers interested in the raw data and detailed results can
access them through the provided link:
https://github.com/Gurdel/distances_comparison_public/blob/main/results_raw.xlsx.</p>
      <sec id="sec-3-1">
        <title>3.1. Results on the News dataset</title>
        <p>As mentioned in the previous chapter, the content-based filtering experiments were divided into two
groups: one using real data and the other using generated data. The time performance across both
groups was nearly identical, indicating that the nature of the data did not significantly impact the
computational efficiency. Additionally, all evaluated distance measures demonstrated similar
response times. The minor variations observed in the results can likely be attributed to experimental
error rather than any inherent difference in the algorithms.</p>
        <p>Figure 6 illustrates the increase in average response time across all experiments conducted on the
News datasets. Although the dataset size increased by 200% (from 25K to 75K) and 400% (from 25K
to 125K), the average response time for all distance measures rose by only 36.5% and 58.5%,
respectively (Table 1). This indicates that the system scales efficiently, maintaining relatively low
increases in response time even as the dataset size grows significantly.</p>
        <p>Figure 7 shows the increase in response time as the number of closest vectors returned by each
query increases. The results are shown for the IP measure on 25K, 75K, and 125K News datasets
using real data for queries and 75K News dataset using generated data for queries. The increased
response time at the beginning of the experiment with real data can be attributed to the internal
processes of Chroma DB, such as initial indexing and caching mechanisms. When the experiment
was repeated using generated query data, these spikes were absent, indicating that the system was
already optimized after the initial runs. Across all database sizes and distance measures, the trends
for response time remained almost linear, suggesting that the system scales predictably and
efficiently as the number of retrieved vectors grows.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Results on the MovieLens dataset</title>
        <p>Since the smallest collaborative filtering dataset, ML 100K, contains only 610 records (users),
querying the vector store for more than 610 results would simply return all available data. During
queries requesting between 10 to 510 results, the response time remained consistent across all
distance measures and did not increase. Table 2 contains the average response time for all MovieLens
datasets when querying up to 510 results. As expected, larger datasets resulted in longer response
times. However, even in this small sample, a minor difference in performance was observed among
the distance measures (Fig. 8). CS exhibited slightly worse response times compared to IP and L2,
suggesting that CS may introduce additional computational overhead in the context of these datasets.</p>
        <p>During experiments on the ML 1M dataset, all distance measures exhibited linear trends in time
increases (Fig. 9). CS demonstrated a slightly worse average response time (1.47 seconds) compared
to IP and L2 measures, which averaged 1.36 and 1.3 seconds, respectively. However, the differences
between the measures are marginal, particularly when a higher number of vectors are returned.</p>
        <p>In contrast to previous datasets, cosine similarity exhibited significantly worse performance on
the ML 10M dataset compared to the other distance measures. As shown in Figure 10, the response
time for CS increases more rapidly than for IP and L2, both of which follow almost linear trends.
While L2 had an average response time of 2.84 seconds and IP averaged 3.27 seconds, CS took 4.4
seconds on average. This can be explained by the fact that, although all these measures have similar
time complexity O(n), CS requires more elementary operations to be computed according to its
formula. This highlights that its performance degradation with high-dimensional data makes IP or
L2 preferable for high-efficiency scenarios where response time is critical.</p>
        <p>An important observation can be drawn from the comparison of response times between the ML
10M and News 75K datasets. Although these datasets have a comparable number of vectors
(approximately 70,000 and 75,000, respectively), the dimensionality of the ML dataset is significantly
higher (10,677 vs. 384). It was initially assumed that retrieval time on the ML data would be much
slower compared to the News dataset. However, while the average response time for all measures in
the News 75K dataset is 5.7 seconds, the ML 10M dataset showed only 3.5 seconds across all
experiments. This discrepancy can be explained by the sparsity inherent in collaborative filtering
data. With far more items than any individual user interacts with, the utility matrix has many empty
(or zero) cells, allowing for substantial optimization in distance computations. This reduces the actual
number of calculations needed, leading to faster retrieval times despite the higher dimensionality.</p>
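<p>The sparsity optimization described above can be illustrated by storing only the non-zero coordinates of each user vector, so that a dot product touches only the items both users actually rated. This is a simplified sketch of the kind of optimization a vector store can apply, not Chroma's actual implementation:</p>

```python
# Sparse representation: {column_index: rating} for non-zero cells only.
def sparse_dot(a: dict, b: dict) -> float:
    """Dot product over the intersection of non-zero coordinates."""
    if len(b) < len(a):          # iterate over the smaller vector
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)

# Two hypothetical users who rated only a handful of the 10,677 movies.
user_a = {3: 4.0, 42: 5.0, 100: 3.0}
user_b = {42: 4.5, 100: 2.0, 9000: 1.0}

# Only the shared items (42 and 100) contribute:
# 5.0 * 4.5 + 3.0 * 2.0 = 28.5
```

The number of multiplications is bounded by the smaller number of ratings, not by the full vector dimensionality, which is why the high-dimensional but sparse ML 10M vectors can be compared faster than the dense News embeddings.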
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>The experiments conducted on both content-based and collaborative filtering datasets provided
valuable insights into the time performance of different distance measures. All compared distance
measures showed linear trends in response time as the number of returned closest vectors increased.
This consistent behavior across different datasets suggests that the system scales predictably, making
it reliable for scenarios where larger sets of vectors need to be retrieved.</p>
      <p>Across most datasets, cosine similarity, inner product, and Euclidean distance exhibited similar
response times, with only minor differences in computational efficiency. However, as data
dimensionality increased, significant variations in performance were observed. CS showed a notably
higher response time on the high-dimension datasets, particularly in the ML 10M dataset, where its
performance lagged behind IP and L2. This performance gap can be attributed to the higher number
of elementary operations required by CS compared to the other measures.</p>
      <p>Additionally, the comparison between the ML 10M and News 75K datasets highlighted the impact
of data sparsity on performance, with the ML dataset showing faster response times despite its higher
dimensionality. This underscores the importance of considering data characteristics when evaluating
distance measures.</p>
      <p>Moreover, an additional contribution of this study was the development of an experimental
methodology and accompanying software to systematically evaluate the performance of various
distance measures. This framework enabled consistent testing across datasets and can be extended
for future research, providing a valuable tool for benchmarking distance-based retrieval methods in
recommender systems.</p>
      <sec id="sec-4-1">
        <title>4.1. Further steps</title>
        <p>To further validate our findings and explore the scalability of different distance measures, it is
essential to conduct experiments on larger datasets. By incorporating datasets beyond those tested in
this study, it will be possible to better understand how different distance measures scale and whether
the observed trends hold on even bigger datasets. This would provide more comprehensive insights
into the scalability of vector stores and distance measures when applied to real-world, large-scale
recommender systems.</p>
        <p>To ensure the generalizability of our results, it is crucial to experiment with other vector stores
beyond Chroma. Different vector stores may have varying optimizations and internal processes that
could affect performance. By comparing multiple vector stores, we can identify the most efficient
options for different recommendation scenarios.</p>
        <p>Additionally, future research should explore the use of other similarity measures. While our study
focused on the measures natively supported by Chroma DB, comparing them with alternative measures will
provide a comprehensive understanding of their performance characteristics and help identify the
most suitable measures for specific applications.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alcaraz</surname>
          </string-name>
          ,
          <article-title>Blending Fine-Tuning and RAG for Collaborative Filtering with LLMs</article-title>
          , 2023. URL: https://ai.plainenglish.io/blending-fine-tuning-and-rag-for-collaborative-filtering-with-llms3d71858485e4.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Improving Rocchio Algorithm for Updating User Profile in Recommender Systems</article-title>
          . In: X. Lin, Y. Manolopoulos, D. Srivastava, G. Huang (eds.),
          <source>Web Information Systems Engineering - WISE 2013. Lecture Notes in Computer Science</source>
          , vol.
          <volume>8180</volume>
          . Springer, Berlin, Heidelberg,
          <year>2013</year>
          . doi: https://doi.org/10.1007/978-3-642-41230-1_14.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Chroma</surname>
          </string-name>
          , Embeddings, n.d. URL: https://docs.trychroma.com/guides/embeddings.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Chroma</surname>
          </string-name>
          , Usage Guide, n.d. URL: https://docs.trychroma.com/guides.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Components</surname>
          </string-name>
          ,
          <source>All the News 2.0</source>
          : 2.7 million news articles and essays from 27 American publications, n.d. URL: https://components.one/datasets/all-the-news-2-news-articles-dataset.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Pramod</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bafna</surname>
          </string-name>
          ,
          <article-title>Conversational recommender systems techniques, tools, acceptance, and adoption: A state of the art review</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>203</volume>
          (
          <year>2022</year>
          ) 117539, ISSN 0957-4174. doi: https://doi.org/10.1016/j.eswa.2022.117539.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fkih</surname>
          </string-name>
          ,
          <article-title>Similarity measures for Collaborative Filtering-based Recommender Systems: Review and experimental comparison</article-title>
          .
          <source>Journal of King Saud University - Computer and Information Sciences</source>
          <volume>34</volume>
          (
          <issue>9</issue>
          ) (
          <year>2022</year>
          )
          <fpage>7645</fpage>
          -
          <lpage>7669</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] GroupLens, MovieLens, n.d. URL: https://grouplens.org/datasets/movielens/.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Joy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Renumol</surname>
          </string-name>
          ,
          <article-title>Comparison of Generic Similarity Measures in E-learning Content Recommender System in Cold-Start Condition</article-title>
          .
          <source>In: Proceedings of the IEEE Bombay Section Signature Conference</source>
          ,
          <year>December 2020</year>
          . Bombay, India,
          <year>2020</year>
          ,
          <fpage>175</fpage>
          -
          <lpage>179</lpage>
          . doi: 10.1109/IBSSC51096.2020.9332162.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K. G.</given-names>
            <surname>Saranya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Sadasivam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chandralekha</surname>
          </string-name>
          ,
          <article-title>Performance Comparison of Different Similarity Measures for Collaborative Filtering Technique</article-title>
          .
          <source>Indian Journal of Science and Technology</source>
          <volume>9</volume>
          (
          <issue>29</issue>
          ) (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>LangChain</surname>
          </string-name>
          , Vector stores, n.d. URL: https://python.langchain.com/v0.1/docs/modules/data_connection/vectorstores/.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Baxla</surname>
          </string-name>
          ,
          <article-title>Comparative study of similarity measures for item based top n recommendation</article-title>
          .
          <source>Unpublished thesis</source>
          (Bachelor in Computer Science),
          <source>National Institute of Technology Rourkela</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wijewickrema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Petras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dias</surname>
          </string-name>
          ,
          <article-title>Selecting a text similarity measure for a content-based recommender system: A comparison in two corpora</article-title>
          .
          <source>The Electronic Library</source>
          <volume>37</volume>
          (
          <issue>3</issue>
          ) (
          <year>2019</year>
          )
          <fpage>506</fpage>
          -
          <lpage>527</lpage>
          . doi: https://doi.org/10.1108/EL-08-2018-0165.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <article-title>Hybrid Web Recommender Systems</article-title>
          . In: Brusilovsky,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Kobsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Nejdl</surname>
          </string-name>
          , W. (eds.),
          <source>The Adaptive Web. Lecture Notes in Computer Science</source>
          , vol.
          <volume>4321</volume>
          , pp. 377-408. Springer, Berlin, Heidelberg,
          <year>2007</year>
          . doi: https://doi.org/10.1007/978-3-540-72079-9_12.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>van Meteren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Someren</surname>
          </string-name>
          ,
          <source>Using Content-Based Filtering for Recommendation</source>
          ,
          <year>2000</year>
          . URL: https://users.ics.forth.gr/~potamias/mlnia/paper_6.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <collab>Restack</collab>
          ,
          <source>Recommendation Systems Using RAG</source>
          ,
          <year>2024</year>
          . URL: https://www.restack.io/p/recommendation-systems-answer-using-rag-cat-ai.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Belhaouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fareed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Halim</surname>
          </string-name>
          ,
          <article-title>A collaborative filtering recommendation framework utilizing social networks</article-title>
          .
          <source>Machine Learning with Applications</source>
          <volume>14</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.-B.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <article-title>Integrating Triangle and Jaccard similarities for recommendation</article-title>
          .
          <source>PLoS ONE</source>
          <volume>12</volume>
          (
          <issue>8</issue>
          ) (
          <year>2017</year>
          )
          <elocation-id>e0183570</elocation-id>
          . doi: https://doi.org/10.1371/journal.pone.0183570
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deldjoo</surname>
          </string-name>
          ,
          <article-title>Understanding Biases in ChatGPT-based Recommender Systems: Provider Fairness, Temporal Stability, and Recency</article-title>
          . ArXiv, abs/2401.10545 (
          <year>2024</year>
          ), 27 pages
          . doi: https://doi.org/10.48550/arXiv.2401.10545.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>