=Paper= {{Paper |id=Vol-3931/short1 |storemode=property |title=A Diversity-aware Approach to Bundle Recommendations |pdfUrl=https://ceur-ws.org/Vol-3931/short1.pdf |volume=Vol-3931 |authors=Nastaran Ebrahimi,Zheying Zhang,Kostas Stefanidis |dblpUrl=https://dblp.org/rec/conf/dolap/EbrahimiZS25 }} ==A Diversity-aware Approach to Bundle Recommendations== https://ceur-ws.org/Vol-3931/short1.pdf
                         A Diversity-aware Approach to Bundle Recommendations
                         Nastaran Ebrahimi¹, Zheying Zhang¹ and Kostas Stefanidis¹
                         ¹ Tampere University, Tampere, Finland


                                           Abstract
                                           Recommendation systems help users navigate vast amounts of data, with bundle recommendation systems enhancing personalization
                                           and customized experience by grouping related items. However, many existing methods overemphasize relevance, leading to repetitive
                                           suggestions and user fatigue. This paper introduces two novel bundling methods—Bundle Partition and Bundle Function—designed
                                           to balance both diversity and relevance. These methods were evaluated using Amazon datasets on the Appliances, All_Beauty, and
                                           Luxury_Beauty categories. Results show a significant increase in diversity, as measured by Intra-List Diversity (ILD), while maintaining
                                           high relevance through average ratings. Furthermore, the novelty, assessed via Mean Inverse User Frequency (MIUF), indicates that these
                                           methods offer a fresh and relevant experience. These findings emphasize the importance of diversity in enhancing user engagement.

                                           Keywords
                                           Bundle Recommendation Systems, Diversity, Novelty



                         1. Introduction                                                                                              based techniques; (ii) the use of NLP to analyze item features,
                                                                                                                                      providing more content-rich and diverse bundle recommen-
                         In many recommendation contexts, particularly in online                                                      dations compared to existing user-centric or budget-focused
                         shopping and travel package suggestions, users often pre-                                                    models; (iii) an evaluation of the proposed bundling meth-
                         fer to purchase a collection of items rather than a single                                                   ods using ILD and MIUF metrics to illustrate the impact of
                         product. Therefore, recommending a set of related items                                                      diversity and novelty on user engagement.
                         collectively, rather than individually, is more effective. This
                         strategy, known as bundle recommendation, involves sug-
                         gesting groups of complementary items to enhance decision-                                                   2. Related Work
                         making, align with real-world buying behavior, and boost
                         both satisfaction and sales [1].                                                                             Bundle Recommendations. In e-commerce, users often
                            A significant advancement in recommendation system                                                        purchase multiple items, making bundle recommendations
                         development is the incorporation of diversification into the                                                 essential for suggesting sets of products rather than individ-
                         recommendation process [2, 3]. While many recommenda-                                                        ual ones [1]. Bundle sales serve as a cooperative marketing
                         tion systems prioritize accuracy over diversity [4], diversity                                               strategy where multiple brands collaborate to expand their
                         is crucial in bundle recommendations for offering varied                                                     reach and maximize impact [7]. For instance, [8] intro-
                         items that meet different customer preferences.                                                              duces a model integrating collaborative filtering, demand
                            This paper introduces a hybrid bundle recommendation                                                      functions, and price modeling to optimize product selection
                         approach that balances relevance and diversity by integrat-                                                  for revenue maximization. Effective bundle recommenda-
                         ing collaborative and content-based filtering. It predicts user                                              tions should prioritize interconnected products, either com-
                         preferences through collaborative filtering and refines rec-                                                 plementary or alternative, aligning with user preferences
                         ommendations using item features. The approach includes                                                      [1]. Traditional methods [9, 10] identify frequently bought-
                         two diversity-aware bundling methods: Bundle Partition,                                                      together items but often overlook personalization and rel-
                         which selects diverse items aligned with user interests; and                                                 evance. Techniques like integer programming [11, 12] fail
                         Bundle Function, which ensures both user relevance and                                                       to capture pairwise dependencies, treating cross-item re-
                         variation among items. Utilizing NLP techniques to calcu-                                                    lationships as rigid constraints, while association analysis
                         late item similarities, this method enhances recommenda-                                                     [13, 14] applies uniform rules that lack personalization [15].
                         tion quality by reducing redundancy.                                                                         Diversity in Bundle Recommendations. Diversity and
                            The proposed methods are evaluated using real-world                                                       novelty are key to improving recommendation effective-
                         datasets from Amazon’s Appliances, All_Beauty, and Lux-                                                      ness [16], with diversity ensuring variation among recom-
                         ury_Beauty, whose extensive metadata, including product                                                      mended items [17, 18] and novelty introducing unfamiliar
                         descriptions, categories, and user ratings, enabled advanced                                                 but relevant suggestions [18]. Studies have sought to bal-
                         natural language processing (NLP) analysis [5, 6]. The ef-                                                   ance relevance and variety, with [11] and [19] proposing
                         fectiveness of the bundling approaches was assessed using                                                    the Bundle Generation Network (BGN), which leverages
                         Intra-List Diversity (ILD) and Mean Inverse User Frequency                                                   Determinantal Point Processes (DPPs) to enhance diversity
                         (MIUF). The results showed that both the Bundle Partition                                                    in bundle recommendations.
                         and Bundle Function methods successfully introduced di-                                                         This article presents techniques for generating diverse
                         versity, while maintaining relevance.                                                                        and relevant bundles, evaluated using metrics like ILD and
                            Overall, the main contributions of this work are: (i) a hy-                                              MIUF. Unlike approaches that balance relevance and di-
                         brid model that balances relevance and diversity in bundle                                                   versity with budget constraints—potentially compromising
                         recommendations using collaborative filtering and content-                                                   efficiency or diversity—this method optimizes novelty and
                                                                                                                                      user satisfaction while reducing computational costs. It
                         DOLAP 2025: 27th International Workshop on Design, Optimization, Lan-                                        achieves this through dynamic similarity-based bundling,
                         guages and Analytical Processing of Big Data, co-located with EDBT/ICDT                                      randomized partitions, and strategic item selection.
                         2025, March 25, 2025, Barcelona, Spain
                         nastaranebrahimi2021@gmail.com (N. Ebrahimi);
                         zheying.zhang@tuni.fi (Z. Zhang); konstantinos.stefanidis@tuni.fi
                         (K. Stefanidis)
                                       © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License
                                       Attribution 4.0 International (CC BY 4.0).



CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
3. Bundling Methodology                                                Similarity-based Bundling. The similarity-only bundling
                                                                       method creates product bundles based on items similar to
In this paper, we address the challenge of creating product            those previously liked by the user, assuming similar items
bundles that balance diversity and relevance based on user             will be well-received. This method serves as the base model
preferences, aiming to reduce redundancy and enhance the               to compare with two diversity-aware models. The algo-
user experience. We propose a hybrid approach to ensure                rithm generates a list of items similar to the target item,
both relevance and diversity while keeping relevant items              identified from the user’s preferences, by comparing it with
within each bundle. The method combines collaborative                  other items in the metadata. While this approach ensures
filtering using SVD to identify user interests and a content-          relevance, the resulting bundles may lack variety, leading
based approach to select suitable items for each bundle.               to repetitive suggestions. Despite this, it provides a useful
Determining User Preferences. First, we identify user                  baseline for comparing more diverse bundling strategies.
preferences to ensure bundles align with individual interests.         Partition and Randomization Method. Formally, let
The SVD algorithm is used to generate personalized recom-              𝑇 represent the target item, and let 𝑆 = {𝑠1 , 𝑠2 , … , 𝑠𝑛 } be
mendations based on interaction data, and the top-rated                the set of items similar to 𝑇, determined based on the co-
item is selected as the “target” item for bundling. Bundling           sine similarity. The objective is to find items within 𝑆
involves selecting items that not only align with user pref-           that maximize dissimilarity to 𝑇 and place them in a list
erences but also add value through diversity. After                    𝐿: 𝐿 = {𝑠𝑖 ∈ 𝑆 ∣ maximize dissimilarity(𝑠𝑖 , 𝑇 )}. This way, we
choosing the target item, additional items are selected based          ensure that the final bundle includes varied items that still
on distinct features to ensure variety and relatedness, aim-           reflect the user’s preferences, reducing redundancy.
ing to create a well-rounded bundle that avoids redundancy                After constructing the list 𝐿, the items are shuffled and
and enhances user satisfaction. For example, if a user’s top-          divided into partitions. A random selection is made from
rated item is a smartphone, the bundle may include a phone             each partition to add unpredictability, increasing novelty
case, screen protector, or wireless earbuds. These items are           while maintaining relevance. This approach guarantees a
selected based on the user’s interest in technology (from              fresh combination of items for each bundle, resulting in a
the SVD-based analysis) and are diverse enough to offer a              dynamic and engaging recommendation process.
broader experience. This prevents repetition and ensures                  To measure diversity, TF-IDF vectors of item features
each item adds value in a different way. The process uses              are used, and Euclidean distance serves as the metric. This
content-based filtering to assess item features, maximizing            helps maintain a balance between similarity and diversity
diversity within the bundle.                                           in the item list. For two points P = (𝑝1 , 𝑝2 , … , 𝑝𝑛 ) and
Computing Similarities. Once the target item 𝑡 is iden-                Q = (𝑞1 , 𝑞2 , … , 𝑞𝑛 ) in an 𝑛-dimensional space, the Euclidean
tified, the next step is to locate items that share similar            distance 𝑑 is: $d(\mathrm{P}, \mathrm{Q}) = \sqrt{\sum_{k=1}^{n} (p_k - q_k)^2}$. Euclidean
features to enhance recommendation relevance and user                  distance helps identify items that are both relevant and
satisfaction. Each item 𝑖 in the dataset can be represented            diverse. This is an ideal measure in this work due to its
by a feature vector f𝑖 = [𝑓𝑖1 , 𝑓𝑖2 , … , 𝑓𝑖𝑛 ], where each 𝑓𝑖𝑗 rep-   simplicity and effectiveness in distinguishing diverse items,
resents a specific feature, such as brand or category. The             especially when using data like TF-IDF vectors. After gen-
similarity between the target item 𝑡 and another item 𝑖 is             erating the list, the items are shuffled and divided into seg-
calculated using a similarity function, sim(f𝑡 , f𝑖 ). Items with      ments. One item is randomly selected from each segment:
the highest similarity scores are chosen, ensuring that the            𝐵 = {𝑏𝑖 ∣ 𝑏𝑖 ∈ random(segment𝑖 )}. This approach ensures
recommendations align closely with user preferences.                   diversity and unpredictability in the final bundle.
    Finding similar items is crucial for creating effective prod-         Algorithm 1 begins by retrieving and filtering texts of sim-
uct bundles, as it ensures relevance and increases the likeli-         ilar items, then computes TF-IDF vectors for these texts and
hood of high user ratings, enhancing engagement and satis-             the target item. It calculates Euclidean distances between
faction. To calculate item similarity, metadata is processed           the target item and similar items, shuffles the list of similar
and vectorized using NLP techniques like TF-IDF, which                 items, and divides it into partitions. One item is randomly
converts text into numerical vectors, assigning greater im-            selected from each partition to form bundles, with the first
portance to key terms. Cosine similarity is then used to               bundle including the target item.
measure the similarity between items by calculating the co-            Bundle Function Method. The Bundle Function method
sine of the angle between their feature vectors. Items with            aims to curate bundles by strategically selecting items that
high similarity scores are considered closely related to the           are distinct from each other while still aligning with user
target item and are selected as potential recommendations,             preferences. Items similar to the target item 𝑇 are iden-
ensuring relevance and higher user satisfaction.                       tified from a pre-constructed list 𝑆 = {𝑠1 , 𝑠2 , … , 𝑠𝑛 }, using
                                                                       precomputed similarities, like cosine similarity. The goal
4. Bundle Generation                                                   is to create a bundle from list 𝑆, ensuring each successive
                                                                       item is as dissimilar as possible to previously selected items.
This section outlines three methods for forming product bun-           This is achieved by calculating Euclidean distances between
dles. The first method focuses on item similarity, grouping            their feature vectors. Let f𝑖 and f𝑗 represent feature vectors
highly similar items with user preferences without consid-             of items 𝑠𝑖 and 𝑠𝑗 in 𝑆. The Euclidean distance 𝑑(f𝑖 , f𝑗 ) is
ering diversity. The second method introduces diversity                given by: $d(\mathbf{f}_i, \mathbf{f}_j) = \sqrt{\sum_{k=1}^{n} (f_{ik} - f_{jk})^2}$. The algorithm se-
by selecting a mix of related but varied items, ensuring a             lects items with the largest Euclidean distances to ensure
balance between relevance and diversity using pairwise dis-            variety: 𝐵 = {𝑠1 , 𝑠2 , … , 𝑠𝑚 } ∣ maximize 𝑑(f𝑖 , f𝑗 ), ∀𝑖, 𝑗 ∈ 𝐵, 𝑖 ≠ 𝑗.
similarity and randomization. The third method aims to                 This selection ensures that the items within the bundle are
maximize intra-bundle diversity by choosing items that dif-            not just variations of the same product, but instead represent
fer from both the target item and each other, providing a              diverse choices that cater to user preferences.
broader set of recommendations to enhance user experience.                Algorithm 2 analyzes target items and finds similar ones
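The greedy selection behind the Bundle Function method can be illustrated with a minimal Python sketch. It follows the listing of Algorithm 2 in choosing, at each step, the remaining candidate farthest (in Euclidean distance) from the most recently added item. The function names `form_bundle` and `form_bundles`, the dictionary-based `vectors` input, and the toy data in the usage note are assumptions made for illustration and are not taken from the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def form_bundle(candidates, vectors, size):
    """Build one bundle by greedy farthest-from-last selection.

    candidates: item ids ordered by decreasing similarity to the target
                (assumed to be precomputed, e.g. with cosine similarity).
    vectors:    dict mapping item id -> feature vector (e.g. TF-IDF).
    size:       desired number of items in the bundle.
    """
    remaining = list(candidates)  # copy so the caller's list is untouched
    if not remaining or size <= 0:
        return []
    bundle = [remaining.pop(0)]  # start with the most similar item
    while len(bundle) < size and remaining:
        last_vec = vectors[bundle[-1]]
        dists = [euclidean(last_vec, vectors[i]) for i in remaining]
        farthest = max(range(len(dists)), key=dists.__getitem__)
        bundle.append(remaining.pop(farthest))
    return bundle


def form_bundles(candidates, vectors, size, n_bundles):
    """Build several bundles, removing used items from the candidate pool."""
    remaining = list(candidates)
    bundles = []
    for _ in range(n_bundles):
        if not remaining:
            break
        bundle = form_bundle(remaining, vectors, size)
        remaining = [i for i in remaining if i not in bundle]
        bundles.append(bundle)
    return bundles
```

For example, with `vectors = {"a": [0, 0], "b": [0, 1], "c": [5, 5], "d": [2, 0]}` and candidates `["a", "b", "d", "c"]`, a bundle of size 3 starts at `"a"`, jumps to the distant `"c"`, and then adds `"b"`. Comparing only against the last item keeps each step linear in the number of candidates; comparing against all previously selected items would be a stricter variant.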
                                                                   Table 1
                                                                   ILD scores in bundles for each method
                                                                     Bundles      Beauty Dataset                     Appliances Dataset
                                                                                  Partition  Function  Similarity    Partition  Function  Similarity
                                                                     Bundle1      0.93       0.90      0.43          0.93       0.90      0.43
                                                                     Bundle2      0.96       0.98      0.60          0.96       0.98      0.60
                                                                     Bundle3      0.98       0.97      0.33          0.98       0.97      0.33
                                                                     Bundle4      0.94       0.98      0.38          0.94       0.98      0.38
                                                                     Bundle5      0.97       0.98      0.42          0.97       0.98      0.42
                                                                     Bundle6      0.95       0.99      0.52          0.95       0.99      0.52
                                                                     Bundle7      0.97       0.98      0.52          0.97       0.98      0.52
                                                                     Bundle8      0.96       0.99      0.52          0.96       0.99      0.52
                                                                     Bundle9      0.96       0.87      0.53          0.96       0.87      0.53
                                                                     Bundle10     0.95       0.98      0.65          0.95       0.98      0.65
                                                                     Mean         0.96       0.96      0.50          0.96       0.96      0.50

                                                                   mendation systems. This study used two files: “ratings
                                                                   only” and “metadata.” The “ratings only” file contains items,
                                                                   users, ratings, and timestamps, with 371,345, 5,722,988, and
                                                                   602,777 ratings for the All_Beauty, Luxury_Beauty, and Ap-
                                                                   pliances categories, respectively. The metadata file includes
                                                                   product information like title, features, description, price,
                                                                   brand, and category. The All_Beauty and Luxury_Beauty
                                                                   categories, with 32,992 and 12,308 products, were combined
                                                                   as the beauty dataset, while the Appliances dataset, with
                                                                   30,459 products, was also analyzed.

Algorithm 1 Partition and Randomization Method
Retrieve and Filter Texts
for each ASIN in top_similar_items do
    if ASIN exists in subset_data then
        Retrieve its text and store it.
    end if
end for
Compute TF-IDF Vectors for target item’s text
Compute TF-IDF Vectors for all items’ text
for each ASIN in top_similar_items do
    Calculate Euclidean Distances with target item
end for
Sort ASINs by distance in descending order.
Shuffle and Partition sorted ASINs list
for each bundle (10 total bundles) to select items do
    if it is the first bundle then
        Make 4 partitions to select items (since the target item is included)
    else
        Make 5 partitions to select items
    end if
    Randomly select one ASIN from each partition
    Create the bundle with the selected ASINs
end for
return list of bundles
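The steps of the Partition and Randomization method (Algorithm 1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: feature vectors are assumed to be already computed (for instance from TF-IDF), the function name `partition_bundles` and its parameters are hypothetical, partitions are formed by striding over the shuffled list, and items may recur across different bundles because the listing does not say otherwise.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def partition_bundles(target_id, similar_ids, vectors,
                      n_bundles=10, bundle_size=5, seed=None):
    """Form bundles by picking one random item from each partition.

    target_id:   id of the target item (included only in the first bundle).
    similar_ids: ids of items similar to the target (target excluded).
    vectors:     dict mapping item id -> feature vector (assumed precomputed).
    """
    rng = random.Random(seed)
    target_vec = vectors[target_id]

    # Rank candidates by distance to the target, most distant first.
    ranked = sorted(similar_ids,
                    key=lambda i: euclidean(target_vec, vectors[i]),
                    reverse=True)
    # Shuffle once, as in the listing; this discards the sorted order.
    rng.shuffle(ranked)

    bundles = []
    for b in range(n_bundles):
        # The first bundle reserves one slot for the target item.
        k = bundle_size - 1 if b == 0 else bundle_size
        # Split the shuffled list into k disjoint partitions by striding.
        partitions = [ranked[j::k] for j in range(k)]
        picks = [rng.choice(p) for p in partitions if p]
        bundles.append(([target_id] if b == 0 else []) + picks)
    return bundles
```

Because the partitions are disjoint, items within a single bundle are always distinct. Passing a fixed `seed` makes the random choices reproducible, which is convenient when comparing metric values across runs.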
Algorithm 2 Bundle Function (form_bundle) Method
Initialization
Convert 𝑇 and 𝑀 to dense arrays if necessary
Create a dictionary asin_idx mapping ASINs to their indices in 𝐷
Map 𝑆 to their indices in 𝐷, resulting in S_idx
Create an empty list bundles
for each 𝑏 ∈ {1, 2, … , 𝐵} do
    Start the bundle with the most similar item, bundle_idx ← [S_idx[0]]
    Remove the first item from S_idx, S_idx ← S_idx[1:]
    while |bundle_idx| < 𝑛 and S_idx ≠ ∅ do
        Set last_idx ← bundle_idx[−1]
        Retrieve 𝑣last ← 𝑀[last_idx]
        Compute Euclidean distances: dists ← [euclidean(𝑣last , 𝑀[idx]) ∀ idx ∈ S_idx]
        Identify index of max distance: max_dist_idx ← arg max(dists)
        Add S_idx[max_dist_idx] to bundle_idx and remove it from S_idx
    end while
    Convert bundle_idx to ASINs using 𝐷 and append to bundles
end for
return bundles

                                                                   Evaluating Diversity. In experiments, the Intra-List Diver-
                                                                   sity (ILD) metric [17] is calculated for each bundle generated
                                                                   using one of the three proposed methods. By evaluating
                                                                   ILD scores across different bundling techniques, we aim to
                                                                   determine how each method impacts diversity in recom-
                                                                   mendations. ILD is defined as the average pairwise distance
                                                                   between items within a set of recommended items. Formally,
                                                                   $\mathrm{ILD} = \frac{1}{|R|(|R|-1)} \sum_{i \in R} \sum_{j \in R} d(i, j)$, where |𝑅| represents
                                                                   the number of items in the recommendation set 𝑅, and 𝑑(𝑖, 𝑗)
                                                                   is the distance between two items 𝑖 and 𝑗 within the set. ILD
                                                                   is flexible, as the distance measure 𝑑(𝑖, 𝑗) can be defined in
                                                                   various ways based on the recommendation system’s con-
                                                                   text and requirements. We use cosine similarity to calculate
                                                                   𝑑(𝑖, 𝑗), defined as the complement of similarity, 1 − sim(𝑖, 𝑗).
                                                                      Table 1 shows the improvement in diversity achieved by
                                                                   the Bundle Partition and Bundle Function methods
                                                                   for beauty and appliances. The Bundle Partition method
                                                                   consistently shows high ILD values ranging from 0.93 to
                                                                   0.98, with an average of 0.96, indicating that it effectively
                                                                   introduces diversity and prevents redundancy in the bundles.
                                                                   Similarly, the Bundle Function method achieves high ILD
                                                                   scores ranging from 0.87 to 0.99, with an identical average
                                                                   of 0.96, suggesting that both methods are equally effective
                                                                   in ensuring item diversity and enhancing user engagement.
                                                                   Evaluating Relevance. The Average Rating (AVGr) rep-
                                                                   resents the mean of ratings for items within a bundle. In
                                                                   recommendation systems, each item receives a rating, either
                                                                   from user feedback or predictive algorithms. Based on the
                                                                   idea in [20], we propose using item ratings as a relevance
                                                                   score to improve the accuracy of collaborative filtering rec-
using a precomputed list. Euclidean distance introduces            ommendations by predicting user preferences. The average
diversity by selecting items further apart. The algorithm it-      rating helps assess the overall quality or appeal of the items
eratively adds distinct items until the desired number is reached,     within the bundle. It is calculated as: $\mathrm{AVG} = \frac{1}{n} \sum_{i=1}^{n} r_i$,
creating multiple bundles with different starting points for       where 𝑛 is the number of items in the bundle, and 𝑟𝑖 is the
diverse yet relevant content.                                      rating of item 𝑖 in the bundle. A higher average rating in-
                                                                   dicates that users generally like the items, suggesting the
5. Experimental results                                            bundle’s likely success, while a lower average may imply
                                                                   less appeal. Figures 1 and 2 show that both methods main-
The 2018 Amazon dataset provides rich user-item interac-           tain high relevance scores.
tions and detailed metadata, making it valuable for recom-            Variance of Ratings (VAR) measures the spread of ratings
                                                                 Table 3
                                                                 MIUF Scores for each method
                                                                        Bundles Appliances Dataset      Beauty Dataset
                                                                                   Partition Function Partition Function
                                                                        Bundle1      16.64    15.50     13.33     15.24
                                                                        Bundle2      13.00    16.98     15.28     15.52
                                                                        Bundle3      18.54    17.16     17.12     14.86
                                                                        Bundle4      17.02    16.98     15.21     15.84
                                                                        Bundle5      18.54    17.44     12.15     16.96
                                                                        Bundle6      18.54    17.19     16.76     14.92
                                                                        Bundle7      18.54    17.01     15.92     16.88
                                                                        Bundle8      17.00    18.54     15.22     15.51
                                                                        Bundle9      17.01    17.39     18.25     16.29
                                                                        Bundle10     17.51    17.34     16.23     15.11
Figure 1: AVGr for appliances datasets.


                                                                 popularity among a broad audience. Items in the “long tail”
                                                                 of the popularity distribution are considered novel, mean-
                                                                 ing they are not widely known. This concept uses Inverse
                                                                 User Frequency (IUF) to measure an item’s rarity among
                                                                 users, similar to inverse document frequency (IDF). IUF is
                                                                 defined as: $\mathrm{IUF} = -\log_2\left(\frac{|U_i|}{|U|}\right)$, where |𝑈𝑖 | is the number of
                                                                 users who interacted with item 𝑖, and |𝑈 | is the number of users in the
                                                                 system. For the average novelty of the recommended items,
                                                                 the Mean Inverse User Frequency (MIUF) is calculated by
                                                                 averaging the IUF values of all items in the recommenda-
                                                                 tion set: $\mathrm{MIUF} = -\frac{1}{|R|} \sum_{i \in R} \log_2\left(\frac{|U_i|}{|U|}\right)$, where 𝑅 is the set of
                                                                 recommended items.
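As a hedged illustration of the novelty metric, the following Python sketch computes IUF and MIUF from a mapping of items to the users who interacted with them. It assumes the convention IUF = −log2(|U_i| / |U|), under which rarer items receive higher (positive) scores, in line with the positive MIUF values reported in Table 3. The names `iuf`, `miuf`, and `item_users` are hypothetical and chosen only for this example.

```python
import math


def iuf(n_item_users, n_users):
    """Inverse User Frequency: -log2(|U_i| / |U|).

    n_item_users: number of users who interacted with the item (|U_i| > 0).
    n_users:      total number of users in the system (|U|).
    """
    return -math.log2(n_item_users / n_users)


def miuf(bundle, item_users, n_users):
    """Mean Inverse User Frequency over the items of one bundle.

    bundle:     list of item ids in the recommendation set R.
    item_users: dict mapping item id -> set of user ids who interacted with it.
    """
    return sum(iuf(len(item_users[i]), n_users) for i in bundle) / len(bundle)
```

With 8 users in total and items seen by 1, 2, and 4 users, the IUF values are 3, 2, and 1, so the bundle's MIUF is 2. An item seen by every user has IUF 0, the least novel case.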
Figure 2: AVGr for bundles for beauty datasets.

Table 2
VAR scores for each method
      Bundles      Appliances Dataset       Beauty Dataset
                   Partition  Function      Partition  Function
      Bundle1      0.18       0.19          0.33       0.36
      Bundle2      0.08       0.08          0.11       0.09
      Bundle3      0.16       0.07          0.00       0.11
      Bundle4      0.05       0.04          0.03       0.09
      Bundle5      0.08       0.06          0.00       0.03
      Bundle6      0.06       0.04          0.00       0.03
      Bundle7      0.04       0.04          0.00       0.00
      Bundle8      0.11       0.01          0.00       0.00
      Bundle9      0.00       0.01          0.00       0.00
      Bundle10     0.00       0.15          0.00       0.00

                                                                    The MIUF score measures the novelty of recommended
                                                                 items by assessing how uncommon they are across the user
                                                                 base. To evaluate novelty, its distribution is analyzed,
                                                                 revealing that 90% of items have a MIUF below 13.33. All
                                                                 bundles generated by both methods meet or exceed this
                                                                 threshold, indicating their relative novelty. The Bundle
                                                                 Partition method achieves MIUF values between 13.00 and
                                                                 18.54, demonstrating significant novelty, while the Bundle
                                                                 Function method shows even stronger novelty with MIUF
                                                                 values ranging from 15.50 to 18.54. Table 3 presents the
                                                                 novelty scores for each dataset.
                                                                 Discussion. Bundle Partition and Bundle Function enhance
                                                                 the recommendations’ diversity and novelty while maintaining
                                                                 high relevance. ILD scores for both methods are high
                                                                 in All_Beauty and Luxury_Beauty (0.93-0.98) and slightly
                                                                 lower in Appliances (0.55-0.98), indicating varied recommendations.
                                                                 For relevance, AVGr and VAR demonstrate alignment with
                                                                 users’ preferences, with AVGr scores up to 4.0.
                                                                 Novelty, measured by MIUF, is highest for the Bundle Function
within a bundle, offering insight into the consistency of user
                                                                 method, showing that the methods provide user-relevant
satisfaction. Unlike AVGr, which provides an overall sense
                                                                 bundles that outperform the Similarity-Based approach.
of the bundle’s appeal, VAR indicates how much ratings
vary from the average rating. It is calculated as: VAR =
 1 𝑛
   ∑ (𝑟 − AVG)2 , where 𝑛 is the number of items in the
 𝑛 𝑖=1 𝑖
                                                                 6. Summary
bundle, 𝑟𝑖 is the rating of item 𝑖 in the bundle, and AVG is
the average rating of the bundle. A low variance indicates       In this paper, we design, develop, and evaluate two bundling
consistent ratings across the bundle, suggesting a uniform       methods, alongside a baseline solution. Our goal is to im-
user experience. This can be advantageous when aiming for        prove product bundle recommendations by balancing rel-
a consistent level of quality or satisfaction. Although the      evance and diversity. We implemented a hybrid approach
Similarity-based method achieves slightly higher relevance       combining collaborative and content-based filtering, using
in Figures 1 and 2, the Partition and Function methods offer     NLP to analyze item features. Both methods successfully in-
a better balance between relevance and diversity, resulting      troduced diversity without sacrificing relevance, achieving
in more engaging bundles. The low VAR scores in Table            promising results in maintaining high ratings and enhanc-
2 indicate that increasing diversity does not compromise         ing the overall diversity of recommendations.
relevance or perceived quality.
Evaluating Novelty. The Global Long-Tail Novelty [21]
is used to determine how novel an item is by assessing its
References

 [1] L. Mia, X. Bao, L. Chang, Z. Xu, L. Li, A survey of researches on personalized bundle
     recommendation techniques, in: Hongyang, Y. Qiben, Z. X. C. Xiaofeng, Yan (Eds.),
     Machine Learning for Cyber Security, Springer International Publishing, 2020, pp.
     290–304.
 [2] M. Kyriakidi, K. Stefanidis, Y. Ioannidis, On achieving diversity in recommender
     systems, in: Proceedings of the ExploreDB 2017, Association for Computing Machinery,
     Inc, 2017. doi:10.1145/3077331.3077341.
 [3] E. Ntoutsi, K. Stefanidis, K. Rausch, H. Kriegel, “Strength lies in differences”:
     Diversifying friends for recommendations through subspace clustering, in: Proceedings
     of the 23rd ACM International Conference on Conference on Information and Knowledge
     Management, CIKM 2014, Shanghai, China, November 3-7, 2014, ACM, 2014, pp. 729–738.
 [4] D. M. Fleder, K. Hosanagar, Recommender systems and their impact on sales diversity,
     EC’07 - Proceedings of the Eighth Annual Conference on Electronic Commerce (2007)
     192–199. URL: https://dl.acm.org/doi/10.1145/1250910.1250939.
     doi:10.1145/1250910.1250939.
 [5] V. Christophides, V. Efthymiou, K. Stefanidis, Entity Resolution in the Web of Data,
     Synthesis Lectures on the Semantic Web: Theory and Technology, Morgan & Claypool
     Publishers, 2015. doi:10.2200/S00655ED1V01Y201507WBE013.
 [6] T. B. Araújo, K. Stefanidis, C. E. S. Pires, J. Nummenmaa, T. P. da Nóbrega,
     Incremental blocking for entity resolution over web streaming data, in: 2019
     IEEE/WIC/ACM International Conference on Web Intelligence, WI 2019, Thessaloniki,
     Greece, October 14-17, 2019, 2019, pp. 332–336.
 [7] M. E. Drumwright, A demonstration of anomalies in evaluations of bundling, Marketing
     Letters 3 (1992) 311–321. URL: https://doi.org/10.1007/BF00993916.
     doi:10.1007/BF00993916.
 [8] M. Beladev, L. Rokach, B. Shapira, Recommender systems for product bundling,
     Knowledge-Based Systems 111 (2016) 193–206. doi:10.1016/j.knosys.2016.08.013.
 [9] Y. L. Chen, K. Tang, R. J. Shen, Y. H. Hu, Market basket analysis in a multiple store
     environment, Decision Support Systems 40 (2005) 339–354.
     doi:10.1016/J.DSS.2004.04.009.
[10] R. Garfinkel, R. Gopal, A. Tripathi, F. Yin, Design of a shopbot and recommender
     system for bundle purchases, Decision Support Systems 42 (2006) 1974–1986.
     doi:10.1016/J.DSS.2006.05.005.
[11] M. Xie, L. V. Lakshmanan, P. T. Wood, Breaking out of the box of recommendations:
     From items to packages, RecSys’10 - Proceedings of the 4th ACM Conference on
     Recommender Systems (2010) 151–158. URL:
     https://dl.acm.org/doi/10.1145/1864708.1864739. doi:10.1145/1864708.1864739.
[12] Q. Liu, Y. Ge, Z. Li, E. Chen, H. Xiong, Personalized travel package recommendation,
     in: Proceedings - IEEE International Conference on Data Mining, ICDM, 2011, pp.
     407–416. doi:10.1109/ICDM.2011.118.
[13] T. C. Yang, H. Lai, Comparison of product bundling strategies on different online
     shopping behaviors, Electronic Commerce Research and Applications 5 (2006) 295–304.
     doi:10.1016/j.elerap.2006.04.006.
[14] V. L. Miguéis, A. S. Camanho, J. F. E. Cunha, Customer data mining for lifestyle
     segmentation, Expert Systems with Applications 39 (2012) 9359–9366.
     doi:10.1016/j.eswa.2012.02.133.
[15] D. Le, H. W. Lauw, Y. Fang, Basket-sensitive personalized item recommendation, in:
     C. Sierra (Ed.), Proceedings of the Twenty-Sixth International Joint Conference on
     Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017,
     ijcai.org, 2017, pp. 2060–2066. URL: https://doi.org/10.24963/ijcai.2017/286.
     doi:10.24963/IJCAI.2017/286.
[16] S. Vargas, P. Castells, Rank and relevance in novelty and diversity metrics for
     recommender systems, in: Proceedings of the fifth ACM conference on Recommender
     systems, 2011, pp. 109–116.
[17] C.-N. Ziegler, S. M. McNee, J. A. Konstan, G. Lausen, Improving recommendation lists
     through topic diversification, in: Proceedings of the 14th international conference
     on World Wide Web, 2005, pp. 22–32.
[18] P. Castells, N. Hurley, S. Vargas, Novelty and diversity in recommender systems, in:
     Recommender systems handbook, Springer, 2021, pp. 603–646.
[19] J. Bai, C. Zhou, J. Song, X. Qu, W. An, Z. Li, J. Gao, Personalized bundle list
     recommendation, in: The World Wide Web Conference, 2019, pp. 60–71.
[20] Z.-P. Zhang, Y. Kudo, T. Murai, Y.-G. Ren, Enhancing recommendation accuracy of
     item-based collaborative filtering via item-variance weighting, Applied Sciences 9
     (2019) 1928.
[21] T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J. R. Wakeling, Y.-C. Zhang, Solving the
     apparent diversity-accuracy dilemma of recommender systems, Proceedings of the
     National Academy of Sciences 107 (2010) 4511–4515.