The Importance of Cognitive Biases in the Recommendation Ecosystem: Evidence of Feature-Positive Effect, Ikea Effect, and Cultural Homophily

Markus Schedl1,2,∗,†, Oleg Lesota1 and Shahed Masoudian1

1 Institute of Computational Perception, Johannes Kepler University Linz (JKU), Altenberger Straße 69, A-4040 Linz, Austria
2 Human-centered AI Group, AI Lab, Linz Institute of Technology (LIT), Altenberger Straße 69, A-4040 Linz, Austria

Abstract

Cognitive biases have been studied in psychology, sociology, and behavioral economics for decades. Traditionally, they have been considered a negative human trait that leads to inferior decision making, reinforces stereotypes, or can be exploited to manipulate consumers, respectively. Lately, there has been growing interest in AI research to better understand the influence of such biases in classification, search, and also recommendation tasks. We argue that cognitive biases manifest in different parts of the recommendation ecosystem and in various components of the recommendation pipeline, including input data (such as ratings or side information), the recommendation algorithm or model (and consequently the recommended items), and user interactions with the system. More importantly, we contest the traditional detrimental perspective on cognitive biases and claim that certain cognitive biases can be beneficial when accounted for by recommender systems. Concretely, we provide empirical evidence that the feature-positive effect, the Ikea effect, and cultural homophily can be observed in the context of recommender systems, and discuss their potential for exploitation. In three small experiments covering the recruitment and entertainment domains, we study the pervasiveness of the aforementioned biases. We ultimately advocate for a prejudice-free consideration of cognitive biases to improve user and item models as well as recommendation algorithms.
Keywords

Psychology, Cognition, Feature-Positive Effect, Ikea Effect, Cultural Homophily, Declinism, Halo Effect, Primacy Effect, Recency Effect, Position Bias, Empirical Studies, Simulation Study

IntRS’24: Joint Workshop on Interfaces and Human Decision Making for Recommender Systems, October 18, 2024, Bari (Italy)
∗ Corresponding author.
markus.schedl@jku.at (M. Schedl); oleg.lesota@jku.at (O. Lesota); shahed.masoudian@jku.at (S. Masoudian)
https://hcai.at (M. Schedl)
ORCID 0000-0003-1706-3406 (M. Schedl)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction and Background

“Music used to be better in the 1980s when I was young.” “The cookies I baked are much tastier than the ones I bought.” “I only remember the last items on my to-do list.” “He is such a great actor; I am sure those nasty accusations against him are made up.” These are examples of common cognitive biases, respectively, declinism, the Ikea effect, the recency effect, and the halo effect. Such biases have been studied in psychology and sociology for decades. In psychology, they are commonly defined as systematic deviations of the individual from rationality and objectivity in perception, cognition, judgment, or decision making, which often happen unconsciously [1, 2]. In sociology, they typically refer to collective prejudices of a society that favor one group’s values, norms, and traditions over others [3, 4]. Historically, cognitive biases have been regarded as a negative characteristic of humans that leads to inferior decision making, reinforces stereotypes, and may even cause severe systematic errors and harm [5]. More recently, psychologists have started to acknowledge the positive effects of certain cognitive biases, e.g., to improve the efficiency of human learning and decision making [6, 1]. In the field of machine learning, the study of cognitive biases has played a minor role so far. Only lately have ideas emerged to leverage cognitive biases for model training, e.g., to improve models’ generalization capabilities or foster ethical machine behavior [7]. Narrowing the scope to search and ranking, some preliminary research on cognitive biases has recently been conducted in information retrieval [8, 9, 10]. Existing research, however, has focused on detecting some cognitive biases and assessing their influence on search behavior rather than leveraging them to improve retrieval algorithms. In recommender systems (RSs) research, some psychologically grounded human biases have been studied in the past, e.g., primacy and recency effects in peer recommendation [11] as well as risk aversion and decision biases in product recommendation [12]. However, despite their importance, cognitive biases in the context of recommendation have received surprisingly little attention over the past few years. Nor are we aware of any systematic investigation of their manifestations at different stages of the recommendation process. This is particularly astonishing because research in RSs has historically been inspired by psychological theories, models, and empirical evidence on human decision making [13]. To narrow this research gap, we advocate for a holistic examination of cognitive biases within the recommendation ecosystem, and we take first steps in this direction in the paper at hand. Overall, we argue for studying their potential manifestations in different components of the system, at different stages of the recommendation process, and from the perspective of different stakeholders. Furthermore, we aim to evaluate and harness the positive effects these biases may have, with the goal of enhancing user and item models, as well as refining recommendation algorithms.
In the following, we briefly introduce a selection of cognitive biases that we believe deserve a more thorough investigation in the context of RSs (Section 2). We then present empirical evidence of some of these biases and showcase how they may influence recommendation (Section 3). Ultimately, we argue for a much-needed (re-)consideration of cognitive biases and point to important directions this could take (Section 4).

2. Cognitive Biases in Recommendation

Extensive research in psychology, sociology, and economics has revealed a plethora of cognitive biases [1, 2]. They relate to how humans perceive, process, store, and retrieve information, involving the cognitive and neurological processes of individuals and even whole societies. While not all cognitive biases directly apply to RSs, many of them influence user behavior and decision-making processes. As a result, these biases affect users’ interactions with items (e.g., ratings or consumption patterns) and with RSs in general (e.g., use of different functionalities provided by the system’s interface). However, only a few cognitive biases have been studied in the context of RSs. In the following, we introduce some, point to related work, and provide a working definition in the context of the recommendation ecosystem.1

Feature-Positive Effect [15, 16]: Humans better realize and put more emphasis on things that are present than on things that are absent. In the context of RSs, we argue that this effect plays a crucial role in explainability and fairness [17]. For instance, through counterfactuals, one could show users which (maybe better-suited) items would have been recommended to them if they had different traits. We demonstrate this effect in Section 3.1.

Ikea Effect [18, 19]: The more effort a person invested in something, the more they will value it.
This effect owes its name to a study that found participants were willing to pay a much higher price for furniture they had helped build than for ready-made items [18]. It can be thought of as the desire of humans to justify their efforts. In a RS context, we assume the existence of similar effects when users create their own item collections (e.g., songs or videos in a playlist, visited places) in contrast to interacting with those created by others or by a recommendation algorithm. We provide evidence for this effect in Section 3.2.

1 This is, by no means, meant to be an exhaustive list, but a biased selection by the authors. For a more in-depth discussion of cognitive biases, we refer to [14, 13].

Homophily [20, 21]: Homophily refers to the fact that individuals tend to form connections with others who have similar characteristics (e.g., age, culture, or religion) more often than with people having different traits. While it is a well-studied phenomenon in sociology [21] and even evolutionary biology [22], in RSs, homophily has not been extensively studied from a cognitive psychological perspective.2 Evidence of cultural homophily in the music domain has been found in [23], where music listeners of certain countries (e.g., Brazil, Sweden, and Germany) showed a preference for listening to domestic music artists. Vice versa, Brazilian, Polish, and Russian artists were found to be predominantly liked by their domestic audience. In Section 3.3, we provide additional evidence that such effects can be observed and influenced by the recommendation algorithm.

Declinism [24, 25]: The perception that the world or society is declining, i.e., that things get worse over time. This has been shown to be (partly) the result of rosy retrospection, humans’ tendency to remember the past as more positive than it actually was [26].
Declinism bias can be studied in the context of RSs by considering side information on items; e.g., emotions reflected in song lyrics have become more negative over the last decades [27]. We hypothesize that such trends can also be observed in longer-term historic interaction data, i.e., users tend to interact more frequently over time with items that are negatively emotion-laden. Identifying and formalizing such trends could also be used to adjust recommendations to counteract (or amplify) this bias.

Contrast Effect [28]: If two items are shown close to each other in a user’s recommendation list, one with exceptional and the other with medium utility, the latter will appear much less appealing to the user, even if it is still a reasonable choice.

Anchoring [29]: As a variant of the contrast effect, anchoring refers to the fact that humans often overemphasize the piece of information they are exposed to first. RS providers can exploit this effect to offer a hook to their users, e.g., showing them a highly priced item first, making them (unconsciously) believe that subsequent items are cheap even if they are still overpriced [30]. This use is also referred to as the decoy effect [31].

Conformity Bias [32]: Providing users with evidence of previous interactions or ratings by others impacts their own subsequent behavior. For instance, showing users (artificial) ratings before asking them to provide their own results in their ratings being closer to the initially presented ones [32]. Users were also found more likely to click on an item if they saw that many others had done so [33]. Furthermore, recent research reported in [34] has studied two types of conformity bias in RSs: informational conformity (the belief that one’s peer group has superior knowledge) and normative conformity (the desire for social approval). The authors also propose a model to disentangle individual users’ self-interest and conformity behavior.
Primacy and Recency Effect (Position Bias) [35, 11]: The position at which items occur in a recommendation list influences the probability that users will interact with them. Users are more likely to interact with items appearing at the beginning (primacy effect) and at the end (recency effect) of a ranking or sequence of recommendations. In some specific domains, position bias (or serial position effects) has already been observed; e.g., users were more likely to vote for recommended science stories shown to them first and last [11]. Notably, the primacy effect was much stronger than the recency effect in this case. In product recommendation, an influence of item position on users’ ability to remember product characteristics and on their inclination to select certain products was observed [36]. Recency effects have also been formalized in the cognitive architecture ACT-R [37], which has been used to predict repeated consumption behavior [38] and to increase the diversity and explainability of RSs [39].

Halo Effect [40, 41]: One’s overall impression or specific perceived traits of a person influence the perception of other characteristics of that person. For instance, it has been shown that physically attractive people tend to be perceived as more intelligent and as having more positive personality traits than less attractive ones [42]. This overly positive perception overshadows the negative traits of the person. In the context of RSs, we assume that similar effects occur, e.g., in slate, playlist, or basket recommendation, where the recommended collection of items as a whole may be perceived as more or less favorable depending on a single trait of one salient item (or its creator) in the recommended item collection.
If evidence for this can be found, RS providers could, for instance, implement a mechanism to push content created by underrepresented producers by adding it to a collection with items whose halo will extend to those injected items, thereby improving the latter’s exposure and overall fairness.

2 This is even more surprising because collaborative filtering algorithms leverage this concept by exploiting similarity in preferences or interaction behaviors.

It is important to note that cognitive biases usually do not occur in isolation. Instead, in the recommendation process, they are often intertwined and related through feedback loops. For instance, position bias influences the probability of users clicking on a recommended item, which affects the interaction data used to train or update the recommendation model. Also, primacy, contrast, and anchoring effects are often at work jointly.

3. Empirical Findings

To showcase manifestations of cognitive biases in the context of RSs, we present several preliminary investigations. Among the many existing cognitive biases, we select the feature-positive effect (Section 3.1), the Ikea effect (Section 3.2), and cultural homophily (Section 3.3) because they have received only little attention from the RS community so far. Furthermore, this selection of biases and choice of experimental design covers different recommendation components and stages, respectively: (content-based) training data, user interactions, and recommendation outcomes. Our experiments cover the domains of recruitment (candidate recommendation) and entertainment (music recommendation).

3.1. Feature-Positive Effect

The feature-positive effect (FPE) is known as the tendency of learning organisms to better detect the presence of stimuli (e.g., if p then q) rather than their absence (e.g., if p then not q or if not p then q) [16, 43].
This concept can be relevant in job and candidate RSs, as these systems also focus on what is present in a job ad to determine an applicant’s relevance, potentially overlooking what is missing from the job ad. To investigate the presence of the FPE in candidate RSs (for given job ads), we developed a content-based RS using the Distil-RoBERTa cross-encoder model, similar to Kumar et al.’s work on detecting biases in job RSs [44]. We employed GPT-4o to generate 350 CVs for each of six job categories (dentist, nurse, photographer, software engineer, accountant, and teacher), resulting in a total of 2,100 CVs. To mitigate the risk of generating artificial or inaccurate CVs, we instructed GPT-4o to base these CVs on real-world labeled biographies obtained from the BIOS dataset [45]. For the job postings, we utilized 1,358 job ads from a UK job board, labeled with the same categories as the CVs. The ground truth is therefore established through the match in job category between CV and job ad. The data was split into training (80%) and validation/test (20%) sets. Our model was trained on a binary classification task to determine whether pairs of CVs and job ads matched. For each positive training instance, we included four negative samples to balance the training process. To track the existence of the FPE, we evaluated the model on the test set containing 272 job ads and 336 unique applicants. We predicted the match between pairs of candidates and job ads with a relevance score, and counted 13,607 true positive (TP) and 1,625 false negative (FN) predictions, where TP means the candidate is suitable for the job ad and the model predicts this correctly, while FN means the candidate is suited for the job ad but the model predicts they are unsuited. We conducted two experiments to study the FPE, in particular, to which extent it could apply to algorithmic ranking or matching.
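The core manipulation of the first experiment, deleting a randomly chosen fraction of a job ad's adjectives, can be sketched as follows. This is a simplified illustration: the study tagged adjectives with TextBlob and scored CV-job pairs with the fine-tuned Distil-RoBERTa cross-encoder, whereas here a small hard-coded adjective list and an invented example ad stand in for both.

```python
import random
import re

# Hypothetical stand-in for a part-of-speech tagger (the study used TextBlob).
ADJECTIVES = {"passionate", "experienced", "professional", "innovative", "steady"}

def remove_adjectives(job_ad, fraction, rng):
    """Remove `fraction` of the adjective tokens, chosen uniformly at random."""
    tokens = job_ad.split()
    adj_pos = [i for i, t in enumerate(tokens)
               if re.sub(r"\W", "", t).lower() in ADJECTIVES]
    drop = set(rng.sample(adj_pos, round(len(adj_pos) * fraction)))
    return " ".join(t for i, t in enumerate(tokens) if i not in drop)

ad = "We seek a passionate and experienced dentist for our professional clinic."
print(remove_adjectives(ad, 1.0, random.Random(0)))
# With fraction=1.0 every adjective is dropped: "We seek a and dentist for our clinic."
```

In the experiment, each ablated ad would then be re-scored against its matching CVs to count how many TP pairs flip to FN.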
In the first experiment, we considered pairs of job ads and candidates; the former as p, the latter as q. TP samples can therefore be considered as fulfilling if p then q; FN samples as fulfilling if p then not q. As stimuli, we considered the presence (or absence) of adjectives in job ads, and investigated to which extent the FPE would extend to the learned ranking model, simulating situations in which the classifier “sees” (or does not “see”) these adjectives during decision making. To this end, we removed a percentage of randomly selected adjectives from the job ads constituting TP samples. As shown in Figure 1, the more adjectives were removed from job ads, the more CVs that were TPs (q) became FNs (not q). This points to the crucial role of adjectives in RSs’ decisions about applicants’ suitability for a job, even though, for instance, “a passionate dentist” should objectively be treated the same as “a dentist”. Yet the recommendation model relies on the adjectives to determine the suitability of candidates.

Figure 1: The effect of removing adjectives from job ads on the relevance predictions of candidate-job pairs (percentage of TP samples turning FN as a function of the percentage of adjectives removed).

Table 1: Examples of adjectives that exist in each unique set of job ads, determined by TextBlob [46].

Group       | Adjectives
Low Recall  | small, referral, sexual, steady, ...
High Recall | new, full, other, good, professional, ...
Unique set  | technical, annual, innovative, complex, ...

In a second experiment, we explored leveraging missing adjectives to enhance the decision making of the model. Within the formal logical framework, this can be interpreted as slightly modifying p (by replacing words in the job ad), and subsequently investigating whether if p then not q becomes if p then q after this modification. To modify p, we first calculated recall for each job ad with respect to all candidates (CVs) in the test set.
Subsequently, we grouped the job ads into low- and high-recall categories. We then identified a unique set of adjectives present in high-recall job ads but absent from low-recall job ads; please refer to Table 1 for some examples. Finally, we randomly selected adjectives from the FN samples and replaced them with adjectives that occurred exclusively in the high-recall job ads (unique set). Overall, we observed a notable improvement in the average score of these modified FN samples, which increased from 0.046 to 0.152. In detail, we observed that 52.0% of the FN samples showed an improvement in their relevance score and 12.9% were reclassified as TPs. In contrast, 39.4% decreased their score and remained FNs, while the remaining 8.6% of samples did not change their score. This observation hints at the potential advantage of using missing information to improve the utility of candidate ranking systems. These findings point to the significance of the FPE, not only from a human perspective, but also as evidenced by our algorithmic interpretation of the FPE in a learned ranking model. Our experiments highlight the influence of missing information, such as adjectives, on the accuracy of content-based candidate-job matching systems.

Potential for Exploitation: The impact of the (non-)existence of certain adjectives in job ads on the accuracy of candidate-job matching is particularly noteworthy when it comes to transparency. The FPE could be mitigated, for instance, by giving recruiters direct feedback on how the pool of applicants changes when they adjust the wording of their job ad. Likewise, a similar transparency mechanism could help job seekers identify salient words in their CVs, or even investigate counterfactual recommendations if they alter aspects like their gender or work experience.

3.2. Ikea Effect

In the context of RSs and streaming platforms, the Ikea effect could be interpreted as a user’s predisposition towards items and item collections they feel invested in, such as items they conducted research on or discovered themselves, or collections they helped compile. To probe the effect in the music domain, we conduct a user study, striving to answer the following research question: Do music listeners prefer playlists they contributed to (own) over playlists created without their participation (other)? We ran the study on the Prolific3 platform with 100 participants from the United States who indicated themselves as users of one or more music streaming services. The study requires participants to complete four statements by choosing one option from a Likert-5 scale ranging from “Never (1)” to “Very often (5)”. The statements are presented in the same order to all participants: S1 “I create or edit music collections...”,4 S2 “I play music collections (created by me or someone else).”, S3 “I play music collections I created or helped create myself.”, S4 “I play music collections created by someone else (e.g., discovered on the internet or shared by someone).”. Out of 96 respondents who submitted valid attempts, 88 indicated that they create or edit as well as consume playlists more often than “Never” (S1 and S2). Of the 88 participants creating or editing playlists, 48 indicated that they consume own playlists more often than other (comparing responses to S3 and S4), and 18 participants indicated the opposite. The distribution of the difference in consumption frequency between own and other playlists (the S4 score subtracted from the S3 score for each user) is shown in Figure 2.

Figure 2: Distribution of the consumption frequency difference between own and other playlists (participant count over score difference). Positive values show preference towards own playlists.
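The per-user preference score and the reported rank correlations can be computed as sketched below. The responses here are invented for illustration (the study's raw data is not reproduced), and the Spearman implementation uses average ranks so that the heavy ties typical of Likert data are handled correctly.

```python
# Illustrative Likert-5 responses (NOT the study data):
# S3 = "play own collections", S4 = "play collections by others".
s3 = [5, 4, 4, 3, 5, 2, 4, 3]
s4 = [3, 4, 2, 3, 4, 3, 2, 1]

diff = [a - b for a, b in zip(s3, s4)]        # positive = prefers own playlists
mean_diff = sum(diff) / len(diff)

def avg_ranks(xs):
    """1-based ranks, ties receiving the average rank of their group."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        r = (i + j) / 2 + 1                   # average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = r
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the (average) ranks."""
    rx, ry = avg_ranks(xs), avg_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

print(f"mean S3-S4 difference: {mean_diff:.2f}")
print(f"Spearman rho(S3, S4): {spearman(s3, s4):.2f}")
```

In practice one would also compute a significance level for each rho, e.g., via a permutation test or scipy.stats.spearmanr.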
The mean of these differences (S3 score − S4 score) is 0.65 and the standard deviation is 1.52. For users creating playlists, we also analyze Spearman’s correlation between responses. A significant correlation between S1 and S3 (0.75, p < 0.001) shows that the more often users create or edit playlists, the more often they listen to their own playlists. On the other hand, we observe no significant correlation between S1 and S4. As an additional finding, we also note the significant correlation between S2 and S3 (0.66, p < 0.001). It shows that respondents spending more time listening to playlists in general tend to listen to the playlists they contributed to more often. We interpret this as active listeners having a more clearly formed taste and therefore preferring their own playlists to cater to it. This particular finding, while not direct evidence of the Ikea effect, may indicate a higher degree of investment active listeners feel towards their chosen music. We observe no significant correlation between S2 and S4. From our observations we conclude that, in the given sample, users tend to interact more with playlists they invested effort in, which we interpret as a variant of the Ikea effect. A larger study would be required to evaluate the prevalence of the effect globally and to investigate factors contributing to the users’ feeling of being invested in an item.

Potential for Exploitation: Acknowledging the concept of user investment in an item or a collection within a RS could help improve the user experience. For instance, in the scenario of sequential recommendation, items present in the user’s playlists (the user put effort into picking and assigning them) can be seen as particularly valuable and serve as anchor points to retain user engagement within the current listening session.
Additionally, recommendation explanations based on items and collections the user feels invested in could help improve user trust (Ikea effect combined with halo effect).

3 https://www.prolific.com
4 The participants were instructed to interpret “music collection” as a playlist, track compilation, mixtape, or similar, created by a human or automatically.

3.3. Cultural Homophily

Homophily is often defined as the tendency of individuals to associate with similar others [20], e.g., those sharing the same interests, language, or culture. In the context of RSs, it could be interpreted through the interaction between item consumers and item creators. Users preferring items produced by creators sharing their cultural background would be an example of such an interpretation [21]. In the following, we present the results of an empirical study showing indications of cultural homophily in the domain of music recommendation. We put forward two research questions: (1) Do users tend to prefer music tracks produced by artists from their own country? and (2) Can RSs foster or counteract homophily in the setting of music recommendation? We answer the first question by analyzing the listening behavior of users from different countries. We answer the second question by conducting a feedback loop simulation, comparing recommendations provided by a RS at various steps of the feedback loop with actual user consumption behavior. Following [23], we select a 5-core filtered sample from the years 2018-2019 of the LFM-2b dataset [47], containing 99,897 items (songs) selected uniformly at random and 2,287,732 interactions triggered by 11,776 users of the music platform Last.fm.5 We enrich the sample with information about the country of each artist, crawled from MusicBrainz.6 We conduct a feedback loop simulation, following [48]. A feedback loop is a setting in which a RS (ranking model) is iteratively updated (retrained) using interactions triggered by the users interacting with previous states of the system.
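Such a loop can be simulated as sketched below. The popularity-based `recommend` stand-in and the rank-biased sampling weights are illustrative assumptions: the study retrains MultVAE at each iteration and samples from its personalized rankings.

```python
import random
from collections import Counter

def recommend(user, interactions, k=5):
    """Rank unseen items by global popularity (stand-in for a trained model)."""
    popularity = Counter(item for _, item in interactions)
    seen = {item for u, item in interactions if u == user}
    return [item for item, _ in popularity.most_common() if item not in seen][:k]

def sample_feedback(ranked, rng):
    """Pick one recommendation, rank-biased: higher-ranked items more likely."""
    weights = [1.0 / (rank + 1) for rank in range(len(ranked))]
    return rng.choices(ranked, weights=weights, k=1)[0]

rng = random.Random(42)
interactions = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"), ("u3", "b")]
for _ in range(20):  # 20 iterations, as in the simulation described above
    new = []
    for user in ("u1", "u2", "u3"):
        ranked = recommend(user, interactions)
        if ranked:
            new.append((user, sample_feedback(ranked, rng)))
    interactions += new  # the next iteration "retrains" on the enriched data

print(len(interactions))  # in this toy catalogue every user ends up seeing all items
```

With a real ranker, the distribution of artist countries among the sampled feedback can then be compared across iterations, which is exactly what Table 2 reports.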
In a way, the recommender is trained on items it partially recommended itself. In our offline setting, we therefore simulate user interactions within the system. At each of the 20 iterations, the data used for training in the previous step is enriched with one recommendation per user, selected from their personalized recommendations with higher probability for higher-ranked recommendations, and then used for training in the next iteration. We use MultVAE [49] as the recommendation algorithm. Table 2 shows the proportions of domestic music in the data sample, user consumption behavior, and user recommendations for the countries with the 10 largest numbers of tracks in the data sample. The value 0.397 for US, base shows that about 40% of unique tracks in the data sample are produced by artists from the US. This column serves as a baseline, showing what the proportion of domestic consumption would be if every user were to interact with tracks chosen at random. The value 0.626 for US, Con shows that over 60% of the actual listening attention of US users is directed towards tracks produced by US artists. Respectively, US, Rec20 of 0.595 means that among all recommendations to US users at iteration 20, slightly less than 60% of tracks were produced by US artists.7 We answer the first question by comparing the columns base and Con, noticing that for all presented countries the proportion of actual consumption of domestic music is higher than the baseline. This shows that, on average, users have a higher interest in music originating from their country than a random choice would warrant. We also notice that Finnish (FI) users demonstrate a higher level of interest in their domestic music than users from countries with comparable track supply, e.g., Australia (AU) and Brazil (BR), by inspecting column Con/base, which shows domestic consumption relative to the available domestic supply.
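The base and Con columns can be computed as follows on toy data; the track catalogue and interactions below are invented for illustration, and a track counts once per consuming user, as in the study.

```python
from collections import Counter

# Illustrative data: artist country per track, and (user country, track) events.
track_country = {"t1": "US", "t2": "US", "t3": "SE", "t4": "BR", "t5": "US"}
interactions = [("US", "t1"), ("US", "t2"), ("US", "t3"),
                ("SE", "t3"), ("SE", "t1"), ("BR", "t4")]

# base: share of unique tracks in the catalogue produced in each country
n_tracks = len(track_country)
base = {c: n / n_tracks for c, n in Counter(track_country.values()).items()}

# Con: share of a country's interactions that target domestic tracks
total = Counter(uc for uc, _ in interactions)
domestic = Counter(uc for uc, t in interactions if track_country[t] == uc)
con = {c: domestic[c] / total[c] for c in total}

for c in sorted(con):
    print(f"{c}: base={base[c]:.3f}  Con={con[c]:.3f}  Con/base={con[c] / base[c]:.3f}")
```

A Con/base ratio above 1 indicates that a country's users consume more domestic music than random choice over the catalogue would yield.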
As for the second question, focusing on the first iteration of the feedback loop (Rec1), we find that users from the US get recommended a proportion of their domestic music similar to their actual consumption (Con), while users from Germany (DE), Brazil (BR), and Russia (RU) are slightly over-served with their domestic music. In contrast, users from other countries are under-served with their domestic music, most prominently users from Australia (AU), whose recommendations fail to reach the level of the baseline. At iteration 20 of the loop (Rec20), the proportion of recommended domestic music increases for some under-served countries, such as the United Kingdom (UK) and Canada (CA). However, for other countries, such as Sweden (SE) and Finland (FI), the proportion of recommended domestic music decreases further away from their actual consumption level.

5 https://www.last.fm
6 https://www.musicbrainz.org
7 Note that in the case of columns Con and Rec∗, a track is counted multiple times if it is consumed by, or recommended to, multiple users (also taken into account for normalization).

Table 2: Proportions of domestic music among all available tracks (base), among tracks consumed by users from the country (Con), among recommended tracks (Rec1), and among recommended tracks after 20 iterations of the feedback loop simulation (Rec20), for the top 10 countries in the dataset. Bold values indicate the highest, underlined the lowest values.
     base   Con    Con/base  Rec1   Rec1/base  Rec20  Rec20/base
US   0.397  0.626  1.578     0.629  1.587      0.595  1.501
UK   0.155  0.266  1.713     0.227  1.458      0.232  1.495
DE   0.068  0.169  2.481     0.176  2.590      0.166  2.439
SE   0.045  0.159  3.519     0.102  2.266      0.088  1.948
CA   0.038  0.083  2.202     0.030  0.797      0.041  1.091
FR   0.028  0.091  3.232     0.039  1.377      0.041  1.447
AU   0.023  0.077  3.289     0.017  0.728      0.026  1.103
FI   0.023  0.170  7.536     0.166  7.325      0.132  5.820
BR   0.022  0.141  6.288     0.187  8.347      0.150  6.714
RU   0.019  0.073  3.870     0.081  4.262      0.066  3.515

The results of this preliminary study show that users tend to be interested in music originating from their own country, potentially indicating cultural homophily. However, the degree of interest varies between countries; e.g., Finnish users consume more domestic music than Australian users while having a comparable domestic supply. RSs do not necessarily represent this interest in their output and could foster it for some countries while counteracting it for others.

Potential for Exploitation: Formalizing and leveraging corresponding homophily models as an additional indicator of user taste could be useful to strike a better balance in terms of diversification of recommendations, or to pursue calibration between user profiles and recommendations in terms of country, which is not always achievable in contemporary RSs [50].

4. Conclusion, Limitations, and Future Work

Cognitive biases have recently gained recognition in the wider field of machine learning, notably also in learning-to-rank tasks. While the recommender systems community has addressed biases like confirmation bias and position bias to some degree, other cognitive biases have remained largely unexplored. Addressing this gap, we demonstrated the presence of several cognitive biases within the recommendation ecosystem, including the feature-positive effect, the Ikea effect, and cultural homophily. We also briefly outlined potential positive and negative aspects of these biases in the recommendation process.
Limitations: The reported experiments represent only first steps towards obtaining a better understanding of the studied cognitive biases in the context of RSs. Our study covers three biases and the two recommendation domains of recruitment and music. While this may limit the generalizability of the findings to other domains, we believe future studies could build upon the ones we conducted here. Another limitation of these preliminary studies is that the experiments assume traditional recommendation setups (e.g., matching jobs with CVs for the feature-positive effect and top-k recommendation for homophily), neglecting the presumably strong influence the RS’s user interface has on the manifestation or amplification of cognitive biases. Also, some of the presented results may be influenced by the characteristics of the underlying datasets. For example, the LFM-2b dataset may be representative of users of the Last.fm service, but the country distributions of users and artists are not necessarily representative of the population at large. Concerning cultural homophily, in the absence of better cultural indicators for both users and items/artists, we resort to their country of origin. This limits the generalizability of our findings because a song produced in one country need not be written in the (official) language of the country’s citizens, nor represent their culture. In addition, cultural identities can extend beyond country borders. Future work therefore calls for investigating additional cultural aspects, for instance, based on religion, interwoven history, or Hofstede’s cultural dimensions [51].

Future Work: The next steps in our research agenda include devising rigorous, formalized models of cognitive biases, creating a framework to precisely describe how they influence recommendation outcomes, and developing prototype RSs that leverage the positive effects of cognitive biases while mitigating the negative ones.
Focusing on the three biases studied here, concrete future directions may include investigating the dependence of the Ikea effect on the different functionalities offered by different streaming platforms and RSs, which alter the ways users interact with them. Given the varying prevalence of certain RS platforms between countries, we expect notable differences. The next steps in the study of cultural homophily could include disentangling homophily from other factors, such as availability bias, as well as studying the relation between homophily and diversity in recommendations. In conclusion, we strongly believe that RS researchers and practitioners should be equipped with a high sensitivity to the existence of cognitive biases and knowledge of how they may impact different stages of the recommendation process. Furthermore, forming an understanding of where cognitive biases originate from, how they are intertwined, and how they influence different stakeholders will be vital to fully comprehend their role in the recommendation ecosystem. We advocate for a holistic discussion of both negative and positive effects of cognitive biases, and for new recommendation approaches that mitigate the former while exploiting the latter. In our opinion, this will be highly beneficial for the RS community as a whole, and may even mark the starting point for a new generation of cognitive-bias-aware RSs.

Acknowledgments

This research was funded in whole or in part by the Austrian Science Fund (FWF): https://doi.org/10.55776/P33526, https://doi.org/10.55776/DFH23, https://doi.org/10.55776/COE12, https://doi.org/10.55776/P36413. Furthermore, we thank Stefan Brandl, Gustavo Escobedo, and Mohammad Lotfi for their contributions to initial experiments.

References

[1] D. Kahneman, Thinking, Fast and Slow, Penguin Books, 2017. [2] R. Dobelli, The Art of Thinking Clearly: Better Thinking, Better Decisions, Hachette, UK, 2013. [3] J. L.
Eberhardt, Biased: Uncovering the Hidden Prejudice That Shapes What We See, Think, and Do, Penguin Books, 2020. [4] E. Bonilla-Silva, Racism Without Racists: Color-blind Racism and the Persistence of Racial Inequality in the United States, Rowman & Littlefield Publishers, 2006. [5] A. Tversky, D. Kahneman, Judgment Under Uncertainty: Heuristics and Biases: Biases in Judgments Reveal Some Heuristics of Thinking Under Uncertainty, Science 185 (1974) 1124–1131. [6] G. Gigerenzer, W. Gaissmaier, Heuristic Decision Making, Annual Review of Psychology 62 (2011) 451–482. [7] S. Fabi, T. Hagendorff, Why We Need Biased AI - How Including Cognitive and Ethical Machine Biases Can Enhance AI Systems, CoRR abs/2203.09911 (2022). arXiv:2203.09911. [8] L. Azzopardi, Cognitive Biases in Search: A Review and Reflection of Cognitive Biases in Information Retrieval, in: Proceedings of CHIIR, 2021, pp. 27–37. [9] J. Liu, L. Azzopardi, Search Under Uncertainty: Cognitive Biases and Heuristics: A Tutorial on Testing, Mitigating and Accounting for Cognitive Biases in Search Experiments, in: Proceedings of SIGIR, 2024, pp. 3013–3016. [10] G. Gomroki, H. Behzadi, R. Fattahi, J. S. Fadardi, Identifying Effective Cognitive Biases in Information Retrieval, Journal of Information Science 49 (2023) 348–358. [11] K. Lerman, T. Hogg, Leveraging Position Bias to Improve Peer Recommendation, PLOS ONE 9 (2014) 1–8. [12] E. C. Teppan, M. Zanker, Decision Biases in Recommender Systems, Journal of Internet Commerce 14 (2015) 255–275. [13] E. Lex, D. Kowald, P. Seitlinger, T. N. T. Tran, A. Felfernig, M. Schedl, Psychology-informed Recommender Systems, Foundations and Trends in Information Retrieval 15 (2021) 134–242. [14] M. Schedl, V. W. Anelli, E. Lex, Technical and Regulatory Perspectives on Information Retrieval and Recommender Systems: Fairness, Transparency, and Privacy, Springer Cham, 2024. [15] S. T. Allison, D. M.
Messick, The Feature-positive Effect, Attitude Strength, and Degree of Perceived Consensus, Personality and Social Psychology Bulletin 14 (1988) 231–241. [16] E. Rassin, Individual Differences in the Susceptibility to the Feature-Positive Effect, Psychological Test Adaptation and Development (2023). [17] M. D. Ekstrand, A. Das, R. Burke, F. Diaz, Fairness in Information Access Systems, Foundations and Trends in Information Retrieval 16 (2022) 1–177. [18] M. I. Norton, D. Mochon, D. Ariely, The IKEA Effect: When Labor Leads to Love, Journal of Consumer Psychology 22 (2012) 453–460. [19] L. E. Marsh, P. Kanngiesser, B. Hood, When and How Does Labour Lead to Love? The Ontogeny and Mechanisms of the IKEA Effect, Cognition 170 (2018) 245–253. [20] N. Ferguson, The False Prophecy of Hyperconnection: How to Survive the Networked Age, Foreign Affairs 96 (2017) 68–79. [21] N. P. Mark, Culture and Competition: Homophily and Distancing Explanations for Cultural Niches, American Sociological Review 68 (2003) 319–345. [22] Y. Jiang, D. I. Bolnick, M. Kirkpatrick, Assortative Mating in Animals, The American Naturalist 181 (2013) E125–E138. [23] O. Lesota, E. Parada-Cabaleiro, S. Brandl, E. Lex, N. Rekabsaz, M. Schedl, Traces of Globalization in Online Music Consumption Patterns and Results of Recommendation Algorithms, in: Proceedings of ISMIR, 2022, pp. 291–297. [24] M. Mather, L. L. Carstensen, Aging and Motivated Cognition: The Positivity Effect in Attention and Memory, Trends in Cognitive Sciences 9 (2005) 496–502. [25] A. R. Sutin, A. Terracciano, Y. Milaneschi, Y. An, L. Ferrucci, A. B. Zonderman, The Effect of Birth Cohort on Well-Being: The Legacy of Economic Hard Times, Psychological Science 24 (2013) 379–385. [26] T. R. Mitchell, L. Thompson, E. Peterson, R. Cronk, Temporal Adjustments in the Evaluation of Events: The “Rosy View”, Journal of Experimental Social Psychology 33 (1997) 421–448. [27] E. Parada-Cabaleiro, M. Mayerl, S. Brandl, M. Skowron, M. Schedl, E. 
Lex, E. Zangerle, Song Lyrics Have Become Simpler and More Repetitive Over the Last Five Decades, Scientific Reports 14 (2024) 5531. [28] X. J. Yang, C. D. Wickens, K. Hölttä-Otto, How Users Adjust Trust in Automation: Contrast Effect and Hindsight Bias, Proceedings of the Human Factors and Ergonomics Society Annual Meeting 60 (2016) 196–200. [29] G. B. Chapman, E. J. Johnson, Incorporating the Irrelevant: Anchors in Judgments of Belief and Value, Heuristics and Biases: The Psychology of Intuitive Judgment (2002). [30] J. W. Payne, J. R. Bettman, E. J. Johnson, The Adaptive Decision Maker, Cambridge University Press, 1993. [31] E. C. Teppan, A. Felfernig, Calculating Decoy Items in Utility-based Recommendation, in: Proceedings of IEA/AIE, Springer, 2009, pp. 183–192. [32] G. Adomavicius, J. Bockstedt, S. Curley, J. Zhang, Recommender Systems, Consumer Preferences, and Anchoring Effects, in: Proceedings of the Workshop on Human Decision Making in Recommender Systems, 2011, pp. 35–42. [33] Y. Zheng, C. Gao, X. Li, X. He, Y. Li, D. Jin, Disentangling User Interest and Conformity for Recommendation with Causal Embedding, in: Proceedings of The Web Conference, 2021, pp. 2980–2991. [34] C. Ma, Y. Ren, P. Castells, M. Sanderson, Temporal Conformity-aware Hawkes Graph Network for Recommendations, in: Proceedings of The Web Conference, 2024, pp. 3185–3194. [35] N. Craswell, O. Zoeter, M. J. Taylor, B. Ramsey, An Experimental Comparison of Click Position-Bias Models, in: Proceedings of WSDM, 2008, pp. 87–94. [36] A. Felfernig, G. Friedrich, B. Gula, M. Hitz, T. Kruggel, G. Leitner, R. Melcher, D. Riepan, S. Strauss, E. Teppan, O. Vitouch, Persuasive Recommendation: Serial Position Effects in Knowledge-Based Recommender Systems, in: Proceedings of PERSUASIVE, volume 4744 of Lecture Notes in Computer Science, Springer, 2007, pp. 283–294. [37] J. R. Anderson, D. Bothell, M. D. Byrne, S. Douglass, C. Lebiere, Y.
Qin, An Integrated Theory of the Mind, Psychological Review 111 (2004) 1036. [38] M. Reiter-Haas, E. Parada-Cabaleiro, M. Schedl, E. Motamedi, M. Tkalcic, E. Lex, Predicting Music Relistening Behavior Using the ACT-R Framework, in: Proceedings of RecSys, 2021, pp. 702–707. [39] M. Moscati, C. Wallmann, M. Reiter-Haas, D. Kowald, E. Lex, M. Schedl, Integrating the ACT-R Framework with Collaborative Filtering for Explainable Sequential Music Recommendation, in: Proceedings of RecSys, 2023, pp. 840–847. [40] R. E. Nisbett, T. D. Wilson, The Halo Effect: Evidence for Unconscious Alteration of Judgments, Journal of Personality and Social Psychology 35 (1977) 250. [41] E. L. Thorndike, A Constant Error in Psychological Ratings, Journal of Applied Psychology 4 (1920) 25–29. [42] C. Batres, V. Shiramizu, Examining the “Attractiveness Halo Effect” Across Cultures, Current Psychology 42 (2023) 25515–25519. [43] H. M. Jenkins, The Development of Stimulus Control Through Differential Reinforcement, Fundamental Issues in Associative Learning (1969). [44] D. Kumar, T. Grosz, E. Greif, N. Rekabsaz, M. Schedl, Identifying Words in Job Advertisements Responsible for Gender Bias in Candidate Ranking Systems via Counterfactual Learning, in: Proceedings of RecSys in HR, volume 3490, CEUR-WS.org, 2023. [45] M. De-Arteaga, A. Romanov, H. Wallach, J. Chayes, C. Borgs, A. Chouldechova, S. Geyik, K. Kenthapadi, A. T. Kalai, Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting, in: Proceedings of FAccT, 2019, pp. 120–128. [46] S. Loria, Textblob Documentation, Release 0.15 2 (2018). [47] M. Schedl, S. Brandl, O. Lesota, E. Parada-Cabaleiro, D. Penz, N. Rekabsaz, LFM-2b: A Dataset of Enriched Music Listening Events for Recommender Systems Research and Fairness Analysis, in: Proceedings of CHIIR, 2022, pp. 337–341. [48] M. Mansoury, H. Abdollahpouri, M. Pechenizkiy, B. Mobasher, R.
Burke, Feedback Loop and Bias Amplification in Recommender Systems, in: Proceedings of CIKM, 2020, pp. 2145–2148. [49] J. Xu, Y. Ren, H. Tang, X. Pu, X. Zhu, M. Zeng, L. He, Multi-VAE: Learning Disentangled View-common and View-peculiar Visual Representations for Multi-view Clustering, in: Proceedings of ICCV, 2021, pp. 9214–9223. [50] O. Lesota, J. Geiger, M. Walder, D. Kowald, M. Schedl, Oh, Behave! Country Representation Dynamics Created by Feedback Loops in Music Recommender Systems, in: Proceedings of RecSys, 2024. [51] G. Hofstede, Dimensionalizing Cultures: The Hofstede Model in Context, Online Readings in Psychology and Culture 2 (2011).