<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>A manifesto of open challenges and interdisciplinary research directions, Information Fusion</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1145/3589334.3645</article-id>
      <title-group>
        <article-title>Explainable Next-Purchase Recommendations: A Multistakeholder Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maciej Mozolewski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Honorata Zych</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sabri Manai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Krzysztof Kutt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grzegorz J. Nalepa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Human-Centered Artificial Intelligence, Institute of Applied Computer Science, Jagiellonian University</institution>
          ,
          <addr-line>Krakow</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Jagiellonian Human-Centered AI Lab, Mark Kac Center for Complex Systems Research, Jagiellonian University</institution>
          ,
          <addr-line>Krakow</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>106</volume>
      <issue>2024</issue>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
<p>We present a next-purchase recommendation system that combines advanced algorithms with explainable AI (XAI) to learn individual customer preferences from purchase histories and deliver personalized recommendations that enhance user engagement and inform marketing strategy. Our approach provides dual-layer, multistakeholder explanations: targeted communications that promote personalized marketing messages for customers and strategic insights for business stakeholders (e.g., marketing departments), reducing cognitive load and fostering trust. The system also addresses cold-start scenarios and leverages implicit feedback. Experiments on the MovieLens dataset demonstrate a balanced trade-off between accuracy, novelty, and explainability, potentially lowering users' decision-making effort.</p>
      </abstract>
      <kwd-group>
<kwd>Explainable AI</kwd>
        <kwd>Recommender systems</kwd>
        <kwd>Decision support</kwd>
        <kwd>Cold-start mitigation</kwd>
        <kwd>Implicit feedback</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In professional contexts, AI supports managers by meeting information demands during decision making
and reducing cognitive load [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Our focus is on recommendation systems, studied for decades and
popularized by platforms like Amazon and Netflix, yet requiring more than mere suggestion generation.
Modern recommendations should not only match user preferences but also surface unexpected and novel
items—so-called serendipitous recommendations [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Trustworthiness is essential in the increasingly
popular multistakeholder environments, where end-users and business stakeholders interact [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], so
recommendation systems must explain their predictions [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and address user-facing transparency
alongside stakeholder goals [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Thus, we focus on two stakeholder groups: (1) end-users (e.g., students
on an e-learning platform, readers on a news site, or shoppers using a retail app, who directly receive
and act on recommendations), and (2) business stakeholders (e.g., service providers, system owners,
marketers, system administrators, or curriculum designers in an educational context).
      </p>
      <p>This article is part of the PEER – The Hyper-Expert Collaborative AI Assistant project, an EU-funded
Horizon Europe initiative redefining human–AI collaboration for complex decision making through
user-centered design, dynamic engagement, and transparent reasoning. PEER develops AI solutions
for manufacturing, warehouse management, and smart inclusive cities, with recommendation and
preference modeling among its core research areas. Although we propose an explainable
recommendation system for a retail use case, the limited pilot data available were not used in this study. Our work also aligns
with the 2025 Workshop on "AI for understanding human behavior in professional settings" (BEHAIV),
which emphasizes understanding experts’ information demands during decision making and reducing
cognitive load via explanatory AI to enhance safety and satisfaction at work.</p>
      <p>Our contributions are:
• An end-to-end framework uniting diverse ML models, hybrid serendipity mechanisms and XAI
to support collaborative decision making.
• Tailored multistakeholder explanations for end-users (e.g., customers) and business stakeholders
(e.g., system owners).
• A retail recommendation tool that learns individual preferences to provide clear, context-aware
product suggestions for in-store fulfillment, with simple explanations and built-in support for
novelty and cold-start cases.</p>
      <p>The rest of the paper is organized as follows: Section 2 covers background; Section 3 describes the
system architecture, data preprocessing steps, and the models; Section 4 details the dataset, evaluation
protocol, and performance metrics, and reports both quantitative and qualitative findings, including model
comparisons, cold-start analysis, and explanation examples; Section 5 analyzes trade-offs, limitations,
and practical implications; Section 6 concludes, outlines future work, and acknowledges
support.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work and Background</title>
      <p>The first traces of the recommendation problem in the scientific literature appeared in 1978 with the paper
[7], which proposed a rule-based system. Subsequently, several other classical methods were introduced,
including collaborative filtering [8], content-based filtering [9], and association rule-based approaches
[10]. A common feature of these classical recommendation algorithms is their highly interpretable nature.
Explanations can be easily derived from the rules themselves [7], by listing what similar users have purchased
[11, 8], or by providing confidence ranking scores for similar items [9].</p>
      <p>As deep learning models began to gain significant attention, they were also adopted for the
recommendation problem [12]. A variety of architectures have since been explored, including autoencoders
[13, 14], graph neural networks [15], and transformer-based models [16, 17]. These models excel at
capturing complex and higher-order relationships. However, their increased complexity makes them
opaque "black boxes", so researchers use popular model-agnostic techniques like SHAP [18, 14] and
LIME [14] for explanation. In addition, counterfactual explanations have emerged as a promising
direction, offering insights by showing how slight changes in input could alter recommendation outcomes
[19, 20, 21].</p>
      <p>Explanations enhance transparency, build trust, and support acceptance in recommender systems,
aligning with the XAI Manifesto’s call for transparency, accountability, and understandability [22].
Classical approaches provided straightforward justifications: rule-based systems explained outcomes
through explicit logic [7]; collaborative filtering offered transparency by referencing the actions of similar
users - for example, explaining a recommendation with “users similar to you liked this item” [8, 11].
Content-based methods highlighted shared item attributes - “this item is recommended because it shares
features with items you liked” - or confidence scores [11, 9]. Social explanations, such as neighbor ratings,
their similarity to the user, and temporal dynamics, further enriched user trust [8, 11]. To demystify
the decision-making of black-box models, model-agnostic tools such as SHAP and LIME are often
employed, typically using plot-based visualizations to show feature contributions. Recent advances
also incorporate large language models (LLMs) to translate these explanations into more accessible
natural language summaries [18]. Counterfactual explanations, often presented in natural language,
offer another powerful approach, illustrating how slight changes in user behavior or preferences might
lead to different outcomes [11, 9, 18]. Beyond textual or statistical formats, visual explanation methods
have gained attention. These include word clouds that emphasize relevant terms [9, 23], "tagsplanations"
that interpret recommendations via users’ sentiment toward descriptive tags and tag relevance [24], and
graph-based visualizations that depict items and preferences as interconnected nodes and edges, thereby
tracing the semantic or behavioral logic behind recommendations [11, 23]. Together, these diverse
strategies reflect a shift from merely generating recommendations to constructing rich, multimodal
explanations that are transparent and user-centered.</p>
      <p>Recommender systems often operate in multi-sided environments where users, product providers,
and platform owners have distinct - sometimes conflicting - goals. Traditional algorithms typically
prioritize user utility, overlooking broader stakeholder objectives [25].</p>
      <p>
        To address this, multistakeholder recommendation systems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] explicitly model and balance the
interests of users, providers, and platforms. This is especially important in platforms like eBay, Etsy, or
Airbnb, where sustained engagement from all parties is critical to long-term success [26].
      </p>
      <p>
        Despite growing interest in explainability, multistakeholder explanations remain underexplored,
with few studies addressing how to tailor them to different stakeholder needs. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], in their study on job
recommender systems, draw from the literature the idea that explanations should either be individually
tailored to each stakeholder, or that a single explanation may be adapted in presentation depending
on the stakeholder’s level of expertise. Building on this, they explored several explanation modalities
suited to different roles, including graph-based visualizations (showing weighted paths in a knowledge
graph), LLM-generated textual summaries, and feature attribution bar charts. Their results show clear
preferences among stakeholder types: candidates favored short textual explanations for quick judgment,
hiring managers preferred graph-based views for a more technical overview, and recruiters benefited
most from detailed textual narratives. In a related contribution, [27] examined explanation strategies
in enterprise decision-making and identified counterfactual explanations as particularly effective in
multistakeholder contexts, as they enhance transparency while safeguarding stakeholder privacy and
preference sensitivity. Together, these findings point toward the need for adaptive, role-sensitive, and
privacy-aware explanation frameworks, a topic still in its early stages of research.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Recommendation systems personalize experiences using historical and interaction data, yet often
optimize only one goal. Our proposal introduces a transparent, multistakeholder workflow: it employs
user profiles to train models, generates dual XAI explanations for consumers and managers, and applies
business-rule filtering to deliver trusted recommendations. We used the Cornac framework [28] for
its support of multiple recommendation approaches and explainability. The selected models represent
distinct algorithmic paradigms, enabling comparison in terms of ranking quality and runtime efficiency.</p>
      <p>Figure 1 illustrates the proposed recommendation process: first, the model is trained using
historical user profiles; next, new user data is fed into the trained model to generate a set of candidate
recommendations. These recommendations are then passed through an explanation module, which
produces human-readable rationales for each suggested item. After explanations are generated, business
rules (defined by business stakeholders; but also end-user needs and preferences) are applied to filter out
ineligible products. Finally, the system delivers the filtered items and their explanations directly to the
end-user, while simultaneously presenting a higher-level system explanation to business stakeholders.</p>
      <sec id="sec-3-1">
        <title>3.1. Model Training and Hyper-parameter Tuning</title>
        <p>All experiments were executed in Google Colab notebooks with the Cornac framework [28], which
supplies unified routines for data loading, model optimisation, and evaluation. A Colab notebook with
the code and instructions on how to run it is available at
https://github.com/sabri-manai/Explainable-Next-Purchase-Recommendations-A-Multistakeholder-Framework.</p>
        <p>The data source was the MovieLens 100K implicit-feedback matrix. This dataset resembles the
proprietary data collected in the PEER project (shopping baskets), but those could not be used due to
confidentiality restrictions imposed by the PEER project use-case owner at the time of writing. The data were
partitioned into training, validation, and test subsets in a 70 : 20 : 10 ratio by the framework’s RatioSplit
utility, and ratings of four or higher were treated as positive interactions.</p>
        <p>Model definitions. To ensure robust performance on sparse implicit-feedback data, we selected four
state-of-the-art recommenders spanning the main paradigms. These include Bayesian Personalized
Ranking (BPR) [29], Matrix Factorization (MF) [30], Hierarchical Poisson Factorisation (HPF) [31], and a
Variational Auto-Encoder for Recommendations (RecVAE) [32], all implemented via their corresponding
Cornac model classes.</p>
        <p>Evaluation metrics. Five ranking metrics and two runtime measures are used throughout the study.
AUC is the probability that a randomly chosen positive interaction (rating ≥ 4) is ranked ahead of a
randomly chosen negative one. MAP averages, across users, the precision observed at each relevant
item, rewarding early hits. NDCG@10 normalises discounted cumulative gain at rank ten by the ideal
DCG, so higher values mean that relevant items appear nearer the top. Precision@10 is the fraction
of relevant items in the first ten positions, whereas Recall@10 is the fraction of each user’s relevant
catalogue retrieved within that cut-off. Runtimes are also logged: a single Time (s) during validation,
and separate Train (s) and Test (s) columns for the final evaluation.</p>
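        <p>To make the ranking metrics concrete, a minimal sketch (our own helper, not Cornac’s implementation) computing Precision@k, Recall@k, and NDCG@k for a single user’s ranked list:</p>

```python
import math

def precision_recall_ndcg_at_k(ranked, relevant, k=10):
    """Compute Precision@k, Recall@k and NDCG@k for one user's ranked list."""
    topk = ranked[:k]
    hits = [1 if item in relevant else 0 for item in topk]
    precision = sum(hits) / k
    recall = sum(hits) / len(relevant) if relevant else 0.0
    # DCG discounts each hit by log2(rank + 1), with ranks starting at 1
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(rank + 2) for rank in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg

# Ranked list of ten items where the two relevant items sit at ranks 1 and 3
p, r, n = precision_recall_ndcg_at_k(list("abcdefghij"), {"a", "c"}, k=10)
```
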
        <p>Tuning strategy. Only the latent dimension was varied, running a grid search for each model, and
HPF achieved its highest validation AUC at k=19. This single-parameter sweep served the dual purpose
of capacity control and of testing the hypothesis that a latent space of 19 factors would mirror the 19
MovieLens genre indicators examined later in Section 3.2. All remaining parameters were left at their
Cornac defaults, as preliminary runs showed negligible sensitivity outside the capacity dimension.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Categories Alignment of Latent Factors</title>
        <p>Although Hierarchical Poisson Factorization produces purely numerical item factors, an explicit link
to human-readable categories (movie genres) was required for explanatory purposes. The alignment
proceeded in three steps.</p>
        <p>Correlation matrix. First, the factor loadings for every movie were merged with the 19 binary genre
indicators supplied by the MovieLens metadata. For each latent dimension and each genre, the Pearson
correlation coefficient was calculated, yielding a k × 19 matrix whose entries quantify how strongly a
factor is expressed by titles in a given genre.</p>
        <p>One–to–one matching. Each latent factor correlates with several genres, yet a single genre label is
required for interpretation. The absolute correlation values were therefore negated. This conversion
turns the task of maximising correlations into the minimisation form expected by the Hungarian
algorithm. Applying the algorithm to the negated matrix produced a one-to-one assignment that links
every factor to the genre with which it shares its strongest absolute correlation.</p>
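        <p>In practice this matching can be computed with a Hungarian-algorithm routine (e.g., scipy’s linear_sum_assignment) on the negated absolute-correlation matrix; the brute-force sketch below (the function name and correlation values are our own illustrative assumptions) shows the same maximisation on a small 3 × 3 example:</p>

```python
from itertools import permutations

def best_one_to_one(corr):
    """Find the one-to-one factor-to-genre assignment maximising total
    |correlation|. The Hungarian algorithm solves this efficiently on the
    negated matrix; here we simply enumerate permutations for a tiny case."""
    n = len(corr)
    best, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(abs(corr[f][g]) for f, g in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return best  # best[f] = genre index assigned to latent factor f

# 3 latent factors x 3 genres, Pearson correlations (made-up values)
corr = [[0.9, 0.2, 0.1],
        [0.3, -0.8, 0.2],
        [0.1, 0.4, 0.7]]
assignment = best_one_to_one(corr)
```

Note that factor 1 is matched through its strongly negative correlation (-0.8), which is why absolute values are used before negation.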
        <p>Interpretation and use. The resulting factor–genre map was visualized as a heat-map (Fig. 4),
allowing latent themes to be read off at a glance. The same mapping was later used to aggregate SHAP
attributions to genre level for user-facing explanations and to support business stakeholders-oriented
factor steering. No model weights were modified in this procedure, and the alignment step was applied
entirely post-hoc.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Filtering with user preferences</title>
        <p>To enhance the personalization of recommendations and tailor them to individual user preferences, we
incorporate a lightweight user profiling mechanism. To maintain system robustness and avoid excessive
storage requirements, we do not retain the complete user interaction history. Instead, we store only
a minimal subset of preference information - specifically, categories explicitly marked as disliked by
the user during interactions with the system. This information is stored in the user profile. Filtering
based on user preferences is performed in two stages. First, a preliminary filtering step removes items
the user has already seen from the set of model-predicted recommendations (user history is obtained
on the fly from the dataset). Second, we further refine the candidate set by excluding any movies that
belong to genres identified as disliked in the user’s profile. After each recommendation round, the
user is prompted to update their preferences by specifying any additional disliked genres. This process
is sequential and adaptive: each time a recommendation is provided, the user has the opportunity to
revise their preferences, and these updates are immediately incorporated into subsequent filtering steps.</p>
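        <p>The two-stage filter can be sketched as follows (the function and data names are our own; the actual system obtains the seen-item history from the dataset on the fly):</p>

```python
def filter_recommendations(candidates, seen, disliked_genres, genres_of):
    """Two-stage filter: (1) drop items the user has already seen, then
    (2) drop items whose genre set overlaps the disliked genres stored
    in the lightweight user profile."""
    unseen = [item for item in candidates if item not in seen]
    return [item for item in unseen
            if not (genres_of[item] & disliked_genres)]

# Hypothetical candidate set and genre metadata
genres_of = {"m1": {"Drama"}, "m2": {"Romance", "Drama"}, "m3": {"Action"}}
kept = filter_recommendations(["m1", "m2", "m3"], seen={"m1"},
                              disliked_genres={"Romance"}, genres_of=genres_of)
```

After each recommendation round, newly disliked genres would simply be added to `disliked_genres` before the next call.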
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Serendipity and Business Rules</title>
        <p>In order to combine serendipity with business goals, we propose a hybrid approach that blends
business-driven content promotion (via unpopular items) with user-centric novelty (through category
diversity), ensuring mutual benefit for both the user and the platform.</p>
        <p>To enrich the recommendation system with diversity and surprise, we implemented a serendipitous
recommendation module that selects two types of product suggestions for each user: a random unpopular
pick and a category-novelty-based pick. These recommendations are intentionally designed to highlight
content that lies outside the user’s usual viewing patterns and the platform’s typical popularity trends.</p>
        <p>A key business rule guiding this module is the promotion of underexposed or lesser-known products.
Increasing their visibility and sales is strategically important for broadening market appeal, optimizing
inventory turnover, and enhancing overall profitability. (The code for serendipity and business-rules
integration, along with user profiles, can be found at https://github.com/hzych/Recommendations.)
This goal is achieved through the following steps. First, we identify unpopular content by selecting
products that fall within the lowest quartile of overall rating frequency - specifically, those with a
number of user ratings below the 25th percentile of the distribution of rating counts. These are the
items we aim to elevate. Next, we apply a series of filters: (1) the product must not have been rated by
the user, (2) it must not belong to any category the user has explicitly disliked, and (3) it must have an
average rating of at least 3.0, ensuring a minimum level of quality.</p>
        <p>From the resulting candidate set, two types of recommendations are generated:
• Random Pick: An item is selected at random, introducing an element of surprise and
unpredictability.
• Category-Novelty Pick: An item is selected based on the novelty of its category profile relative
to the user’s historical preferences. This process includes an additional step of computing a
category novelty score, which assigns higher values to products containing categories the user
has interacted with less frequently, thereby encouraging exploration into unfamiliar content
areas.</p>
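        <p>A condensed sketch of the serendipity module under the rules above (the quartile cut, quality threshold, and genre-novelty idea follow the text; the exact scoring formula, names, and toy catalogue are our own illustrative choices):</p>

```python
import random

def serendipity_picks(items, user_seen, disliked, user_genre_counts, seed=0):
    """items: dict id -> (genres: set, avg_rating, num_ratings).
    Return a random unpopular pick and a category-novelty pick."""
    counts = sorted(v[2] for v in items.values())
    q25 = counts[len(counts) // 4]  # rough 25th-percentile rating count
    # Business-rule filters: unpopular, unseen, no disliked genre, avg >= 3.0
    pool = [i for i, (g, avg, n) in items.items()
            if n <= q25 and i not in user_seen
            and not (g & disliked) and avg >= 3.0]
    rng = random.Random(seed)
    random_pick = rng.choice(pool)

    def novelty(i):
        # Higher score for genres the user has interacted with less often
        g = items[i][0]
        return sum(1.0 / (1 + user_genre_counts.get(genre, 0))
                   for genre in g) / len(g)

    novel_pick = max(pool, key=novelty)
    return random_pick, novel_pick

# Hypothetical catalogue: (genres, avg rating, number of ratings)
items = {"a": ({"Drama"}, 5.0, 1),
         "b": ({"Action", "Crime"}, 3.4, 1),
         "c": ({"Drama"}, 2.0, 1),
         "d": ({"Comedy"}, 4.0, 100)}
rand_pick, novel_pick = serendipity_picks(items, user_seen=set(),
                                          disliked=set(),
                                          user_genre_counts={"Drama": 5})
```

Here "c" is excluded by the quality rule (average below 3.0), "d" by the popularity cut, and the novelty score favours "b", whose genres the user has never interacted with.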
        <p>Other common business rules, such as filtering out-of-stock items, prioritizing high-margin or
sponsored products, enforcing regional availability, and capping repeat recommendations, can be
incorporated via simple post-filtering or by adjusting recommendation scores.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Explanation module</title>
        <p>To explain recommendations to end-users, we apply SHAP at the genre level on each top suggestion,
revealing how individual genre preferences influence the score. We chose SHAP for its community
validation, model-agnostic applicability (it works with any recommendation model), and axiomatic
guarantees: local accuracy (attributions sum to the prediction), consistency (higher-impact features
receive larger values), and missingness (absent features score zero).</p>
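        <p>The genre-level aggregation can be sketched as follows (a minimal illustration assuming per-factor SHAP values and the factor–genre map from Section 3.2; the percentage normalisation is our own simplification):</p>

```python
def genre_level_attributions(factor_shap, factor_to_genre):
    """Aggregate per-latent-factor SHAP values to genre level by summing
    the attributions of all factors mapped to the same genre, then express
    each genre's contribution as a percentage of total absolute attribution."""
    by_genre = {}
    for f, value in factor_shap.items():
        genre = factor_to_genre[f]
        by_genre[genre] = by_genre.get(genre, 0.0) + value
    total = sum(abs(v) for v in by_genre.values()) or 1.0
    return {g: 100.0 * v / total for g, v in by_genre.items()}

# Made-up SHAP values for three latent factors and their genre mapping
attr = genre_level_attributions({0: 0.40, 1: 0.10, 2: -0.05},
                                {0: "War", 1: "War", 2: "Fantasy"})
```
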
        <p>For business stakeholders, we employ a latent-factor loading matrix from our matrix factorization
model: rows correspond to latent dimensions and columns to genres, showing how factors map to
domain concepts. While this model-specific analysis validates the global structure of factorization-based
recommenders, analogous loading or correlation analyses could be devised for other factor-driven
architectures; pure black-box models without explicit factors would require alternative techniques such
as embedding-based concept extraction.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>In this section we present both quantitative and qualitative evidence for the effectiveness of the
proposed pipeline. We begin by benchmarking four representative recommendation models on the
MovieLens 100K dataset, then analyse their robustness in simulated cold-start scenarios. We further
evaluate the clarity of user- and stakeholder-oriented explanations and, finally, discuss how post-filtering
rules aimed at serendipity and business constraints affect overall performance.</p>
      <sec id="sec-4-1">
        <title>4.1. Model Comparison</title>
        <p>Across both validation and test splits (Tables 1 and 2), the Hierarchical Poisson Factorisation (HPF)
model delivers the strongest ranking quality - topping AUC, MAP, and NDCG@10 - while remaining
reasonably fast to evaluate. Bayesian Personalised Ranking (BPR) follows closely on accuracy and is
far cheaper to train, making it a pragmatic second choice. RecVAE reaches near-HPF AUC scores but
incurs the heaviest training cost, and Matrix Factorisation (MF) trades accuracy for speed, achieving
the lowest metrics yet the quickest training time.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Cold-Start Analysis</title>
        <p>To assess robustness under sparse histories, a synthetic cohort of ten brand-new users was created,
each seeded with only three to six randomly selected past movies. (The seed titles - e.g., Killing Fields,
The (1984); Afterglow (1997); Waiting for Guffman (1996); Cat People (1982); Band Wagon, The (1953) -
together with the genres and scores of the recommended items, are listed in Tables 3 and 4.) Despite
the limited input, the HPF model typically returned lists that were still genre-coherent: on average,
roughly seven of the ten recommended titles shared at least one genre with the user’s seeds, while the
remaining two to three items introduced new genres and thus encouraged discovery. Genre-level SHAP
visualisations indicated that each suggestion was usually driven by one or two strongly positive genres
(e.g., Mystery ≈ +72%) and tempered by mildly negative ones (e.g., Drama ≈ -8%). These observations
suggest that the pipeline can preserve relevance and provide intuitive explanations even when only a
handful of interactions are available.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. User-Focused Explanations</title>
        <p>To illustrate explanation quality for cold-start users, and to simulate diverse preferences and sparse
histories, ten synthetic user profiles were created, enabling controlled evaluation of the explanation
module. As the focus was on validating the explanation mechanism, no real users were involved. We
showcase the results for user labeled new-u-4. Table 3 gives this user’s sparse history: five titles
dominated by Drama/War with a touch of Comedy and Horror. Table 4 shows the Top-10 recommendations,
headed by Star Wars (1977).</p>
        <p>Visual explanation. Figure 3 aggregates user- and item-factors into genre-level SHAP values. War
(+52.1%), Animation (+37.4%), and Crime (+8.9%) provide the strongest positive signals; Fantasy
(–0.3%) exerts a mild negative influence.</p>
        <p>Textual explanation. Explanations enable the generation of a textual recommendation prompt by
populating a predefined template:</p>
        <p>Based on your viewing history (&lt;HistoryMovies&gt;), we’re excited to recommend &lt;RecommendedMovie&gt; as
a perfect fit for you. This choice is driven by your strong preference for &lt;Genre1&gt;, &lt;Genre2&gt;, and &lt;Genre3&gt;
genres. Because you’ve indicated you don’t enjoy &lt;DislikedGenre&gt;, we’ve omitted it entirely, and since
you’re neutral on &lt;Genre4&gt; and &lt;Genre5&gt;, those genres played virtually no role in this pick. We think
you’re going to love it!</p>
        <p>In case of user new-u-4:</p>
        <p>Based on your viewing history (Killing Fields; Afterglow; Waiting for Guffman; Cat People; Band Wagon),
we’re excited to recommend Star Wars as a perfect fit for you. This choice is driven by your strong preference
for War, Animation, and Crime genres. Because you’ve indicated you don’t enjoy Fantasy, we’ve omitted it
entirely, and since you’re neutral on Adventure and Comedy, those genres played virtually no role in this
pick. We think you’re going to love it!</p>
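        <p>Populating the template is a straightforward string substitution; a minimal sketch (the template text follows the paper, while the function signature and placeholder names are our own):</p>

```python
TEMPLATE = (
    "Based on your viewing history ({history}), we're excited to recommend "
    "{movie} as a perfect fit for you. This choice is driven by your strong "
    "preference for {g1}, {g2}, and {g3} genres. Because you've indicated you "
    "don't enjoy {disliked}, we've omitted it entirely, and since you're "
    "neutral on {g4} and {g5}, those genres played virtually no role in this "
    "pick. We think you're going to love it!"
)

def render_explanation(history, movie, top_genres, disliked, neutral):
    """Fill the predefined template with the user's history, the recommended
    item, and the genre-level attributions from the explanation module."""
    return TEMPLATE.format(history="; ".join(history), movie=movie,
                           g1=top_genres[0], g2=top_genres[1], g3=top_genres[2],
                           disliked=disliked, g4=neutral[0], g5=neutral[1])

text = render_explanation(
    ["Killing Fields", "Afterglow", "Waiting for Guffman", "Cat People",
     "Band Wagon"],
    "Star Wars", ["War", "Animation", "Crime"], "Fantasy",
    ["Adventure", "Comedy"])
```
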
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Stakeholder-Focused Explanations</title>
        <p>Figure 4 translates the abstract HPF embedding into a genre–factor matrix that is easy for non-technical
stakeholders to interpret. Each row represents one of the model’s latent dimensions, while each column
corresponds to a movie genre. Darker shades indicate that a given factor is strongly expressed by
items in that genre. Several pronounced patterns emerge: one factor activates almost exclusively for
Western titles, another peaks for Horror, and a third clearly tracks Animation/Children’s content, while
neighboring factors jointly capture the Action–Adventure spectrum.</p>
      <p>These visual cues let marketing teams map otherwise opaque latent variables to recognizable content
themes. By amplifying or suppressing specific factors in a campaign, they can steer the recommender
toward inventory that best matches a target segment - for example, pushing Factor 9 to feature Halloween
releases, or tuning down the Western-specific factor in regions where that genre under-performs.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Preference modeling</title>
        <p>To illustrate how filtering according to user preferences works, we create a profile for user new-u-4,
specifying Romance as a disliked genre. This allows us to simulate a personalized content screening
process based on explicit user preferences. Table 5 shows the recommendations that remain after
applying this filter, effectively removing all titles associated with the unwanted genre. This step
demonstrates how simple preference-based filtering can help tailor recommendations to better align
with a user’s tastes and avoid suggesting content they are likely to reject.</p>
      </sec>
      <sec id="sec-4-6">
        <title>4.6. Serendipity and Business Rules</title>
        <p>To illustrate the serendipitous recommendations, Table 6 shows both the random pick and the novel
genre pick with their respective attributes, such as title, genres, average rating (Avg Rating), and number
of ratings (Num Ratings).</p>
        <p>We can observe that the random pick adheres well to the business rules of serendipitous
recommendation. Although the genre, Drama, is familiar to the user, the selected film has received only a single
rating. Despite its limited exposure, the rating is the maximum possible - 5.0 - indicating a potentially
high-quality item. This makes it a strong candidate for serendipity: an overlooked film that may align
with the user’s preferences, offering the possibility of surprising satisfaction.</p>
        <p>[Table 6. Random Pick: Aiqing wansui (1994), Genre: Drama, Avg Rating 5.00, Num Ratings 1.
Novel Pick: Best Men (1997), Genres: Action, Comedy, Crime, Drama, Avg Rating 3.40, Num Ratings 5.]</p>
        <p>In contrast, the novelty-based pick excels in fulfilling both core goals of serendipity - novelty and
unexpected relevance. The recommended title falls within the genres Action and Crime, which both,
according to the user’s history (Table 3), represent a new area of interest. Additionally, with just five
ratings, it remains relatively undiscovered by the broader user base. This not only increases the chance
of offering the user something fresh, but also supports business objectives such as content discovery
and catalog diversification.</p>
        <p>Taken together, these two recommendations exemplify complementary approaches to serendipity:
one driven by quality and underexposure, the other by genre novelty and user exploration. This
demonstrates the effectiveness of using both randomization and personalized novelty scoring in
surfacing engaging, lesser-known content.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>In this work, we address a clear literature gap: despite their state-of-the-art accuracy, modern
next-purchase recommendation pipelines remain opaque "black boxes" to both end-users and organizational
stakeholders. To bridge this gap in a multistakeholder context, we integrate explainable AI techniques
that surface the drivers of each suggestion, reduce cognitive load during decision making, and build trust.
We also embed business rules - serendipity and novelty picks driven by user profiles - that introduce
underexposed yet relevant items, balancing discovery with precision. This demonstrates that transparency,
exploratory novelty, and high-performance recommendation can coexist while supporting professional
information needs and decision processes.</p>
      <p>Our multistakeholder approach delivers tailored explanations for both end-users and business
stakeholders. For end-users, we apply SHAP to generate model-agnostic, genre-level attributions that clarify
which past interactions or categories influenced each recommendation, fostering trust and engagement
and enabling targeted marketing messages. For business stakeholders, we visualize model embeddings via a
latent-factor loading matrix, mapping each latent dimension to movie genres, to link model structure
with domain concepts, support strategy refinement, business-rule tuning, model-selection decisions, and
thus potentially reduce cognitive load. This approach generalizes to any model: when explicit factors
are absent, we would project user/item vectors (e.g., via PCA or UMAP) for equivalent interpretability.</p>
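      <p>As a sketch of the fallback projection idea - assuming only that item embeddings and per-item genre labels are available; the data, dimensions, and variable names below are illustrative, not the system’s actual code - PCA components can be computed via SVD and summarized as a genre-by-factor loading matrix:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 100, 16
item_vecs = rng.normal(size=(n_items, dim))   # stand-in for learned item embeddings
genres = rng.integers(0, 3, size=n_items)     # one genre id per item (0..2)
genre_names = ["Action", "Comedy", "Drama"]

# PCA via SVD on mean-centered embeddings; rows of `components` are directions.
centered = item_vecs - item_vecs.mean(axis=0)
_, _, components = np.linalg.svd(centered, full_matrices=False)
projected = centered @ components[:2].T       # items projected onto top-2 factors

# Loading matrix: mean projection of each genre onto each latent factor,
# linking latent dimensions to domain concepts (here, genres).
loadings = np.array([projected[genres == g].mean(axis=0)
                     for g in range(len(genre_names))])
for name, row in zip(genre_names, loadings):
    print(f"{name}: factor1={row[0]:+.2f}, factor2={row[1]:+.2f}")
```

      <p>The same recipe applies to any embedding-based model: replace the random matrix with the model’s item vectors (or swap the SVD step for UMAP) and the loading matrix again maps latent factors to interpretable categories.</p>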
      <p>To ensure that new users receive meaningful suggestions while still enabling exploration of
unexpected content, we leveraged lightweight user profiles that store minimal preference information. By
embedding business rules - novelty and serendipity filters, promotion of under-exposed items, and
category-based diversity - we balanced personalized recommendations with overarching organizational
goals. To address the cold-start problem, we simulated ten new users with only a few interactions
each and evaluated model performance using NDCG@10, complemented by SHAP explanations to
highlight the most influential genre contributions, demonstrating robust recommendation quality even
in sparse-data scenarios.</p>
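      <p>For reference, the NDCG@10 metric used in this evaluation can be computed as follows; this is a generic binary-relevance formulation, and the ranking and held-out sets below are made-up stand-ins for the simulated cold-start users:</p>

```python
import math

def ndcg_at_k(ranked_items, relevant, k=10):
    """NDCG@k with binary relevance: gain 1 if the item is in the held-out set."""
    gains = [1.0 if item in relevant else 0.0 for item in ranked_items[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    # Ideal DCG: all relevant items ranked first.
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

# A simulated cold-start user: two held-out positives at ranks 1 and 4.
ranking = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m9", "m10"]
print(round(ndcg_at_k(ranking, {"m1", "m4"}), 3))  # 0.877
```
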
      <p>Despite promising results, this study has limitations. It relies solely on the MovieLens dataset: pilot
data from the PEER project were not yet available at the time of writing, and the system has not been
tested in a live environment. Moreover, new users and profiles were simulated, so real-world dynamics
may differ.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>This work shows that a state-of-the-art next-purchase recommendation pipeline - enriched with
explainable AI, business-rule–driven serendipity, and user-profile–driven novelty - can deliver both
high accuracy and transparency, reduce users’ cognitive load, and satisfy the needs of end-users and
organizational stakeholders.</p>
      <p>Expanding beyond the movie domain, we plan to test the methodology on diverse digital content
platforms - such as the proprietary data from the PEER project, music streaming services, e-learning
platforms, and news portals - validating its effectiveness in live environments and refining business-rule
logic under real-world user behaviors.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This paper is part of a project that has received funding from the European Union’s Horizon Europe
Research and Innovation Programme, under Grant Agreement number 101120406. The paper reflects
only the authors’ view and the EC is not responsible for any use that may be made of the information it
contains.</p>
      <p>The research has been supported by a grant from the Priority Research Area (DigiWorld) under the
Strategic Programme Excellence Initiative at Jagiellonian University.</p>
      <p>Contribution of Maciej Mozolewski for the research for this publication has been supported by a
grant from the Priority Research Area (DigiWorld) under the Mark Kac Center for Complex Systems
Research Strategic Programme Excellence Initiative at Jagiellonian University.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used OpenAI o4-mini-high and GPT-5. The tools
assisted with drafting and revising text, paraphrasing and style polishing, abstract drafting, formatting
guidance, and programming support (code drafting and refactoring). All AI-assisted content and code
were reviewed, tested, and edited by the authors, who accept full responsibility for the final manuscript.</p>
    </sec>
    <sec id="sec-9">
      <title>References</title>
      <p>[7] W. van Melle, Mycin: a knowledge-based consultation program for infectious disease diagnosis, International Journal of Man-Machine Studies 10 (1978) 313–322. URL: https://www.sciencedirect.com/science/article/pii/S0020737378800492. doi:10.1016/S0020-7373(78)80049-2.</p>
      <p>[8] J. L. Herlocker, J. A. Konstan, J. Riedl, Explaining collaborative filtering recommendations, in: Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, CSCW ’00, Association for Computing Machinery, New York, NY, USA, 2000, pp. 241–250. URL: https://dl.acm.org/doi/10.1145/358916.358995. doi:10.1145/358916.358995.</p>
      <p>[9] L. M. de Campos, J. M. Fernández-Luna, J. F. Huete, An explainable content-based approach for recommender systems: a case study in journal recommendation for paper submission, User Modeling and User-Adapted Interaction 34 (2024) 1431–1465. URL: https://link.springer.com/article/10.1007/s11257-024-09400-6. doi:10.1007/s11257-024-09400-6.</p>
      <p>[10] E. Kannout, H. S. Nguyen, M. Grzegorowski, Speeding up recommender systems using association rules, in: N. T. Nguyen, T. K. Tran, U. Tukayev, T.-P. Hong, B. Trawiński, E. Szczerbicki (Eds.), Intelligent Information and Database Systems, Springer Nature Switzerland, Cham, 2022, pp. 167–179.</p>
      <p>[11] M. A. Chatti, M. Guesmi, A. Muslim, Visualization for recommendation explainability: A survey and new perspectives, ACM Trans. Interact. Intell. Syst. 14 (2024). URL: https://dl.acm.org/doi/10.1145/3672276. doi:10.1145/3672276.</p>
      <p>[12] S. Zhang, L. Yao, A. Sun, Y. Tay, Deep learning based recommender system: A survey and new perspectives, ACM Comput. Surv. 52 (2019). URL: https://dl.acm.org/doi/10.1145/3285029. doi:10.1145/3285029.</p>
      <p>[13] S. Sedhain, A. K. Menon, S. Sanner, L. Xie, Autorec: Autoencoders meet collaborative filtering, in: Proceedings of the 24th International Conference on World Wide Web, WWW ’15 Companion, Association for Computing Machinery, New York, NY, USA, 2015, pp. 111–112. URL: https://dl.acm.org/doi/10.1145/2740908.2742726. doi:10.1145/2740908.2742726.</p>
      <p>[14] C. V. Roberts, E. Elahi, A. Chandrashekar, On the bias-variance characteristics of lime and shap in high sparsity movie recommendation explanation tasks, 2022. URL: https://arxiv.org/abs/2206.04784. arXiv:2206.04784.</p>
      <p>[15] C. Gao, Y. Zheng, N. Li, Y. Li, Y. Qin, J. Piao, Y. Quan, J. Chang, D. Jin, X. He, Y. Li, A survey of graph neural networks for recommender systems: Challenges, methods, and directions, ACM Trans. Recomm. Syst. 1 (2023). URL: https://dl.acm.org/doi/10.1145/3568022. doi:10.1145/3568022.</p>
      <p>[16] P. Cao, P. Liò, Genrec: Generative sequential recommendation with large language models, 2024. URL: https://arxiv.org/abs/2407.21191. arXiv:2407.21191.</p>
      <p>[17] J. Ji, Z. Li, S. Xu, W. Hua, Y. Ge, J. Tan, Y. Zhang, Genrec: Large language model for generative recommendation, in: Advances in Information Retrieval: 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24–28, 2024, Proceedings, Part III, Springer-Verlag, Berlin, Heidelberg, 2024, pp. 494–502. URL: https://doi.org/10.1007/978-3-031-56063-7_42. doi:10.1007/978-3-031-56063-7_42.</p>
      <p>[18] M. Narvekar, K. Bharucha, V. Vishwanath, N. Gabani, S. Fernandes, Enhancing interpretability in diverse recommendation systems through explainable ai techniques, Journal of Computational Analysis and Applications (JoCAAA) 32 (2024) 447–456. URL: https://eudoxuspress.com/index.php/pub/article/view/1427.</p>
      <p>[19] J. Zhong, E. Negre, Shap-enhanced counterfactual explanations for recommendations, in: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, SAC ’22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 1365–1372. URL: https://dl.acm.org/doi/10.1145/3477314.3507029. doi:10.1145/3477314.3507029.</p>
      <p>[20] J. Tan, S. Xu, Y. Ge, Y. Li, X. Chen, Y. Zhang, Counterfactual explainable recommendation, in: Proceedings of the 30th ACM International Conference on Information &amp; Knowledge Management, CIKM ’21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 1784–1793. URL: https://dl.acm.org/doi/10.1145/3459637.3482420. doi:10.1145/3459637.3482420.</p>
      <p>[21] O. Barkan, V. Bogina, L. Gurevitch, Y. Asher, N. Koenigstein, A counterfactual framework for</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] M. Westphal, M. Vössing, G. Satzger, G. B. Yom-Tov, A. Rafaeli, Decision control and explanations in human-ai collaboration: Improving user perceptions and compliance, Computers in Human Behavior 144 (2023) 107714. URL: https://www.sciencedirect.com/science/article/pii/S0747563223000651. doi:10.1016/j.chb.2023.107714.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Y. Kim, S. Oh, C. Noh, E. Hong, S. Park, Design of a serendipity-incorporated recommender system, Electronics 14 (2025). URL: https://www.mdpi.com/2079-9292/14/4/821. doi:10.3390/electronics14040821.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] F. Abbas, Serendipity in recommender system: A holistic overview, in: 2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA), 2018, pp. 1–2. doi:10.1109/AICCSA.2018.8612895.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] R. Burke, G. Adomavicius, T. Bogers, T. D. Noia, D. Kowald, J. Neidhardt, Özlem Özgöbek, M. S. Pera, N. Tintarev, J. Ziegler, De-centering the (traditional) user: Multistakeholder evaluation of recommender systems, 2025. URL: https://arxiv.org/abs/2501.05170. arXiv:2501.05170.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] R. Confalonieri, L. Coba, B. Wagner, T. R. Besold, A historical perspective of explainable artificial intelligence, WIREs Data Mining and Knowledge Discovery 11 (2021) e1391. URL: https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/widm.1391. doi:10.1002/widm.1391.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] R. Schellingerhout, F. Barile, N. Tintarev, A co-design study for multi-stakeholder job recommender system explanations, in: L. Longo (Ed.), Explainable Artificial Intelligence, Springer Nature Switzerland, Cham, 2023, pp. 597–620.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>