The Use of Impressions in Recommender Systems: Improving Complete and Semi Cold-Start

Matteo Garavaglia 1,3,∗,†, Alessandro Solinas 2,4,†, Ricardo Anibal Matamoros Aragon 2,3,†, Stefania Bandini 3 and Francesco Epifania 2

1 AITech4T S.r.l.
2 Social Things S.r.l., Milan, Italy
3 Department of Informatics, Systems and Communication, University of Milano-Bicocca, Italy
4 Politecnico di Milano, Milan, Italy

Abstract
Recommender Systems (RS) are widely used tools that require constant development of both their architecture and the data they consume. Impression data are an emerging and still underused type of information that can be helpful in several scenarios. We therefore propose a hybrid RS that uses impressions to mitigate the main issues of our base system, the Knowledge Graph Attention Network (KGAT). The first issue is complete cold-start, which we address by asking the user questions on selected meaningful attributes and letting a BERT-based content-based RS produce recommendations from the answers. In the subsequent semi cold-start phase, impressions are used to re-rank later recommendations and, from these interactions, to build a profile usable by KGAT. The last issue we address is the lack of interpretation of negative interactions through the Knowledge Graph, that is, recommendations presented but not chosen; to solve it, we apply the Impression Discounting model to the set of recommendations produced by KGAT.

Keywords: Recommender Systems, Impressions, Cold Start, Real-time Recommendations

Workshop on Artificial Intelligence and Applications for Business and Industries (AIABI 2023), co-located with the 22nd International Conference of the Italian Association for Artificial Intelligence (AI*IA 2023), Rome, Italy.
∗ Corresponding author. † These authors contributed equally.
Email: m.garavaglia20@campus.unimib.it (M. Garavaglia); alessandro.solinas@socialthingum.com (A. Solinas); ricardo.matamoros@socialthingum.com (R. A. M. Aragon); stefania.bandini@unimib.it (S. Bandini); francesco.epifania@socialthingum.com (F. Epifania)
ORCID: 0009-0006-7451-3986 (M. Garavaglia); 0000-0002-5428-3187 (A. Solinas); 0000-0002-1957-2530 (R. A. M. Aragon); 0000-0002-7056-0543 (S. Bandini)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Research in Recommender Systems (RS) mainly builds recommendation models from historical feedback (e.g., clicks, purchases, watching actions) collected from users, but the community is always looking to improve recommendation quality by leveraging other data sources [1]. Impressions represent an emerging concept in RS, but their full potential is limited by a lack of reliable and open-source data. This research aims to develop a comprehensive RS framework that combines the Knowledge Graph Attention Network (KGAT) [2], a content-based (CB) recommender system utilizing BERT, and impressions. KGAT, empowered by the information captured in the knowledge graph (KG), aspires to provide personalized and precise recommendations, leveraging both user and item characteristics. However, this model does not interpret negative interactions: it learns nothing when an item is recommended but not chosen by the user. Moreover, the training phase of KGAT is time- and resource-consuming, making frequent retraining unfeasible. For situations where no prior user data is available, or a specific user has not been included in KGAT, we introduce CB-BERT, which leverages resource titles and abstracts to match user preferences effectively. To further refine these recommendations, we incorporate real-time impressions. In the current literature, two main classes of impression models exist: (i) re-ranking and (ii) impressions as user profiles. The first class, which is the one we employ, allows the system to dynamically re-rank recommendations, while the second treats impressions the way user interactions are traditionally used [1]. We propose a slightly modified Impression Discounting (ID) model, as detailed in [3], which easily adapts to both the KGAT and CB scenarios because it is a plug-in approach, completely independent of the system that generates the recommendations to re-rank. ID uses impressions to enable real-time updates, letting the system continually refine its recommendations. In summary, we propose a framework in which a CB RS and ID mitigate the drawbacks of KGAT, namely poor performance in complete and semi cold-start, a long training time, and the disregard of the user's refusals of recommendations.

2. Overview of our system components

In this section we briefly describe the components of our system, highlighting a few key aspects of each.

2.1. Data

The dataset used for this work is the Microsoft News Recommendation Dataset (MIND), created by Microsoft Research to advance research on news recommendation and constructed from user click logs of Microsoft News. The dataset is in English and contains one million users, 161,013 news articles, and 24,155,470 clicks; each news article has a title, abstract, body, and category. Crucially for this work, the dataset also contains the impressions of the users together with their click histories. Each impression contains the ID of the user who generated it, the timestamp, the user's click history before the impression, and the actual impression, that is, the list of IDs of the recommended items, each carrying one of two suffixes: "-1" for items clicked by the user and "-0" for items not clicked [5].
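To make the record layout concrete, the following minimal Python sketch parses one impression log line in the tab-separated form summarized above (impression ID, user ID, timestamp, click history, impression list); the field names and the example record are our own illustrative choices, not part of the MIND specification quoted here.

```python
# Minimal sketch: parsing one MIND-style impression log line.
# Assumes the tab-separated layout summarized above:
# impression_id, user_id, timestamp, click history, impression list.

def parse_impression(line: str) -> dict:
    imp_id, user_id, timestamp, history, impressions = line.strip().split("\t")
    clicked, not_clicked = [], []
    for token in impressions.split():
        news_id, label = token.rsplit("-", 1)
        (clicked if label == "1" else not_clicked).append(news_id)
    return {
        "impression_id": imp_id,
        "user_id": user_id,
        "timestamp": timestamp,
        "history": history.split(),      # past positive interactions
        "clicked": clicked,              # suffix "-1": shown and clicked
        "not_clicked": not_clicked,      # suffix "-0": shown but not chosen
    }

# Illustrative record in that format:
record = parse_impression(
    "1\tU80234\t11/15/2019 12:37:50 PM\tN3130 N11621\tN4607-1 N19661-0"
)
print(record["clicked"], record["not_clicked"])  # ['N4607'] ['N19661']
```

The "-0" items are exactly the negative interactions that the ID model of Section 2.4 exploits.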
2.2. KGAT

The core element of this recommender system is KGAT, a model designed to handle complex relationships in a knowledge graph (KG). KGAT does this by recursively updating the embeddings of nodes (items, users, or attributes) and using attention to determine the significance of their connections. The system starts by creating a KG that represents the relationships between objects; for example, in movie recommendations, a film may be linked to actors, directors, genres, and more. To make this graph usable by neural networks, an adjacency matrix is formed, where 1s represent existing edges and 0s indicate no connection. A multi-layer neural network then processes the KG data to improve recommendations. The first layer takes the adjacency matrix and converts it into numerical embeddings that capture object characteristics; these embeddings are generated using proximity information to map objects into a lower-dimensional space. The model calculates attention coefficients, assigning importance to relationships based on proximity and relevance. Finally, recommendations are produced using the object embeddings and graph attention, and presented to users as suggestions for products, movies, music, etc., that might match their interests. The key components of KGAT are:

• Embedding construction: entities and relations present in the KG are parametrized using TransR [4], which optimizes the translation principle $e_h^r + e_r \approx e_t^r$ whenever the triple $(h, r, t)$ exists in the graph. Here $e_h, e_t \in \mathbb{R}^d$ and $e_r \in \mathbb{R}^k$ are the embeddings of h, t, and r, respectively, while $e_h^r, e_t^r$ are the projections of $e_h, e_t$ into the relation space.

• Attentive Embedding Propagation Layers: a set of attention coefficients is generated to capture the importance of high-order connectivities. Each attention layer consists of three components: information propagation, knowledge-aware attention, and information aggregation. Considering an entity h and the triples for which h is the starting node (also known as its ego network), $\mathcal{N}_h = \{(h, r, t) \mid (h, r, t) \in \mathcal{G}\}$, the first-order connectivity of h can be characterized by a linear combination over its ego network:

$e_{\mathcal{N}_h} = \sum_{(h,r,t) \in \mathcal{N}_h} \pi(h, r, t)\, e_t$

The term $\pi(h, r, t)$ controls the flow of information from h to t through r. This coefficient is implemented with a relational attention mechanism and is calculated using the tanh activation function, followed by normalization through a softmax function. As a final step, for each entity, the information from the embedding $e_h$ is aggregated with its ego-network representation $e_{\mathcal{N}_h}$ (see the sketch after this list). The aggregation function used is:

$f(e_h, e_{\mathcal{N}_h}) = \mathrm{LeakyReLU}\big(W_1 (e_h + e_{\mathcal{N}_h})\big) + \mathrm{LeakyReLU}\big(W_2 (e_h \odot e_{\mathcal{N}_h})\big)$

where $W_1$ and $W_2$ are trainable weight matrices and $\odot$ denotes element-wise multiplication. This principle models first-order connections, but it generalizes to higher-order connections by constructing the ego network of order l.
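As a minimal illustration of the propagation and aggregation formulas above, the following NumPy sketch computes $e_{\mathcal{N}_h}$ and $f(e_h, e_{\mathcal{N}_h})$ for one entity. The attention weights are treated as given inputs, since in KGAT they come from the tanh-and-softmax relational attention mechanism; all names and toy values are ours.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def aggregate(e_h, ego_network, W1, W2):
    """First-order KGAT aggregation, sketching the formulas above.

    e_h:         (d,) embedding of entity h
    ego_network: list of (pi, e_t) pairs, one per triple (h, r, t) in N_h,
                 where pi is the normalized attention weight pi(h, r, t)
    W1, W2:      (d, d) trainable weight matrices
    """
    # e_{N_h} = sum over the ego network of pi(h, r, t) * e_t
    e_N_h = sum(pi * e_t for pi, e_t in ego_network)
    # f = LeakyReLU(W1 (e_h + e_{N_h})) + LeakyReLU(W2 (e_h ⊙ e_{N_h}))
    return leaky_relu(W1 @ (e_h + e_N_h)) + leaky_relu(W2 @ (e_h * e_N_h))

# Toy usage with embedding dimension d = 4:
rng = np.random.default_rng(0)
d = 4
e_h = rng.normal(size=d)
ego = [(0.7, rng.normal(size=d)), (0.3, rng.normal(size=d))]  # weights sum to 1
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
print(aggregate(e_h, ego, W1, W2))
```

Stacking this operation over ego networks of order l yields the higher-order propagation that KGAT trains end to end.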
2.3. CB-BERT

Since KGAT cannot effectively handle a user in a cold-start situation, we implemented a Content-Based Recommender System (CBRS) to use when a new user asks for recommendations. A typical CBRS creates a profile from the attributes of the items the user has interacted with; to produce recommendations, it computes a similarity score between each item and the user profile and recommends the items most similar to that profile [6]. In our case, the user profile is built from the embeddings of the abstracts (or titles, when abstracts are absent from the data) of the articles the user has interacted with. The embeddings are produced by a Bidirectional Encoder Representations from Transformers (BERT) model. BERT's architecture is founded on the Transformer model, a stack of self-attention layers and feedforward neural networks, which lets BERT gather context from both preceding and following words [7]. BERT goes through a two-step procedure to create context-aware embeddings:

1. Pre-Training:
• Masked Language Modeling (MLM): BERT is pre-trained on a large text corpus, in our case Wikipedia and BooksCorpus. During this phase, random words in the input text are masked and BERT's objective is to predict what those words should be.
• Next Sentence Prediction (NSP): BERT is then trained to predict whether two sentences are sequentially related. This phase enables BERT to comprehend sentence-level contextual dependencies, which is vital for tasks such as document classification and question answering.

2. Fine-Tuning: after pre-training, BERT is fine-tuned on a task-specific dataset relevant to a downstream application. This step lets BERT apply its previously learned contextual embeddings to tasks like sentiment analysis, named entity recognition, and text categorization, aligning the model with task-specific goals.

The user profile is thus the combination of the embeddings of the abstracts (or, failing that, the titles) of the articles the user has previously interacted with. The user embedding is compared with the item embeddings through the cosine similarity, defined as

$\mathrm{sim}(A, B) = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\, \sqrt{\sum_{i=1}^{n} B_i^2}}$

where A and B are the embeddings.
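The following is a minimal sketch of this matching step, assuming precomputed BERT embeddings. Combining the user's article embeddings by averaging is our illustrative choice, since the text above only specifies a "combination"; the function names and toy dimensions are ours.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # sim(A, B) = (A · B) / (|A| |B|), as in the formula above
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(user_article_embs, item_embs, k=20):
    """Rank items by cosine similarity to the user profile.

    user_article_embs: (m, d) BERT embeddings of the abstracts (or titles)
                       the user has interacted with
    item_embs:         (n, d) BERT embeddings of candidate items
    """
    # Illustrative choice: combine the user's article embeddings by averaging.
    profile = user_article_embs.mean(axis=0)
    scores = np.array([cosine_similarity(profile, e) for e in item_embs])
    return np.argsort(-scores)[:k]          # indices of the top-k items

# Toy usage with random vectors standing in for 768-dim BERT embeddings:
rng = np.random.default_rng(1)
user_hist = rng.normal(size=(3, 768))       # 3 articles read by the user
candidates = rng.normal(size=(100, 768))    # 100 candidate news items
print(recommend(user_hist, candidates, k=5))
```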
2.4. Impression Discounting

To model impressions we opted for a slight modification of the model presented in [3]. ID is used as a plug-in for an existing Recommender System (RS) and, through impressions, re-ranks the resources the RS provides. An impression is a set containing a user-item pair and the behaviors that occurred between them in a session (logs), defined as $\tau = \{user, item, conversion, behaviors, t, R\}$. Grouping by (user, item) yields a series of impressions $\tau_1, \dots, \tau_n$ that is used as data. Within the impression set $\tau$ there are three additional elements: t is the timestamp associated with the impression, R is the score vector provided by the existing RS for the user, and conversion is a boolean indicating whether the recommendation was accepted. It is worth noting that $conversion = 1$ is only possible in the last element of an impression sequence, as we assume that an accepted item is no longer shown to the user. The possible behaviors are:

• LastSeen: the difference in days between the current and the last impression associated with the same (user, item) pair.
• ImpCount: the number of previous impressions associated with the same (user, item) pair.
• Position: the position of the item in the RS recommendation list.
• UserFreq: the frequency of the user's interactions with the RS.

The objective of the ID model is to identify a discounting factor d that updates the ranking associated with every (user, item) pair so as to maximize the number of converted items. The relation between each behavior and the discounting factor is modeled through a discounting function, chosen independently for each behavior through a Bayesian optimization algorithm. The set of possible discounting functions is:

• Linear: $f_L(x) = \alpha_1 + \alpha_2 x$
• Inverse: $f_I(x) = \frac{\alpha_1}{x} + \alpha_2$
• Exponential: $f_E(x) = e^{\alpha_1 x + \alpha_2}$
• Quadratic: $f_Q(x) = \alpha_1 (x - \alpha_2)^2 + \alpha_3$

Given an impression, the discounting factor d is then computed as a linear combination of the functions applied to each measured behavior x:

$d = \sum_{x \in X} f_x(x)$

where X is the set of behaviors {LastSeen, ImpCount, Position, UserFreq} and $f_x$ is the discounting function associated with behavior $x \in X$, chosen among the ones above.
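Once the functions are fitted, the re-ranking reduces to a small computation per (user, item) pair. The sketch below is one plausible reading of it: the α values are placeholders for the Bayesian-optimized parameters, and applying d multiplicatively to the RS score R is our illustrative assumption, not a choice fixed by [3].

```python
import math

# Fitted discounting functions, one per behavior, as in the list above.
# The alpha parameters are placeholders; in the real pipeline they are
# chosen per behavior via Bayesian optimization.
DISCOUNTING = {
    "LastSeen": lambda x: 0.2 + 0.05 * x,                 # linear f_L
    "ImpCount": lambda x: 0.8 / x + 0.1,                  # inverse f_I
    "Position": lambda x: math.exp(-0.3 * x + 0.1),       # exponential f_E
    "UserFreq": lambda x: 0.01 * (x - 5.0) ** 2 + 0.2,    # quadratic f_Q
}

def discounting_factor(behaviors: dict) -> float:
    # d = sum over behaviors x in X of f_x(x)
    return sum(DISCOUNTING[name](value) for name, value in behaviors.items())

def rerank(recommendations, behaviors_per_item):
    """Re-rank an existing RS list by the discounted score.

    recommendations:    list of (item_id, score R) pairs from the base RS
    behaviors_per_item: item_id -> {behavior name: measured value}
    Applying d multiplicatively to R is our illustrative choice.
    """
    discounted = [
        (item, score * discounting_factor(behaviors_per_item[item]))
        for item, score in recommendations
    ]
    return sorted(discounted, key=lambda pair: pair[1], reverse=True)

# Toy usage: one item was already shown twice, the other is nearly fresh.
recs = [("N1", 0.9), ("N2", 0.7)]
behaviors = {
    "N1": {"LastSeen": 1, "ImpCount": 2, "Position": 1, "UserFreq": 4},
    "N2": {"LastSeen": 7, "ImpCount": 1, "Position": 3, "UserFreq": 4},
}
print(rerank(recs, behaviors))
```

Because the whole computation is independent of how R was produced, the same plug-in applies unchanged to both the KGAT and CB-BERT pipelines.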
3. Results

In this section, we present the preliminary results obtained by adding the impression framework both to KGAT recommendations and, in cold-start, to CB-BERT recommendations.

3.1. KGAT experiment

Concerning the standard use of KGAT, we wanted to see how much value impressions add to the existing recommendations. To put this theory to the test, we performed the following experiment. We initialized the KG with the items present in each user's history, which we recall is unique per user and contains past positive interactions with existing items. With this setup, we trained KGAT and generated a recommendation list for each user. We then used this list to optimize and test the ID model. Comparing the recommendation metrics (such as Precision, Recall, and NDCG@20) before and after the use of impressions, we observed a significant improvement, suggesting a better user representation, although a deeper and more thorough analysis is needed.

3.2. Cold Start experiment

Similarly to the KGAT experiment, we want to test how valuable impressions are in improving performance in the case of complete cold-start, with recommendations produced by CB-BERT. The pipeline is as follows: from the user's history, we collected the categories and sub-categories of the news with positive interactions and used them to produce a CB-BERT recommendation. This simulates a short questionnaire about the user's preferences. The NDCG@20 registered in this case (without Impression Discounting) is 0.001, a reasonable starting point considering the scenario; with the use of impressions, we improved it to 0.013.
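For reference, here is a minimal sketch of the NDCG@k metric used in both experiments, assuming binary relevance (an item is relevant if it was clicked); the helper name is ours.

```python
import math

def ndcg_at_k(ranked_items, relevant_items, k=20):
    """NDCG@k with binary relevance (1 if the item was clicked, else 0)."""
    rels = [1 if item in relevant_items else 0 for item in ranked_items[:k]]
    # DCG: discount each hit by the log of its 1-based rank
    dcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))
    # Ideal DCG: all relevant items ranked first
    ideal = sum(1 / math.log2(rank + 2)
                for rank in range(min(len(relevant_items), k)))
    return dcg / ideal if ideal > 0 else 0.0

# Toy usage: the clicked item appears in second position of the list.
print(ndcg_at_k(["N7", "N3", "N9"], relevant_items={"N3"}, k=20))  # ≈ 0.63
```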
4. Conclusions and future developments

In conclusion, this research addresses the objective of enhancing recommendation quality in RS. In general, a single RS model suffers from several problems, ranging from cold-start to the partial use of the information available, and offering the user a coherent and personalized recommendation experience remains a challenge. We propose an ensemble of three techniques to best manage these problems, focusing on cold-start and the handling of negative interactions, and we were able to increase the model's performance in both the standard and cold-start experiments. Our results show that impressions add significant value, with ease of use and flexibility across many of these frameworks, without a heavy computational burden. We underline that these are only preliminary results; more sophisticated modeling and a deeper analysis could lead to more promising outcomes. For future work, we plan to introduce better-suited data gathered from our systems to perform more extensive evaluations. We also intend to extend the impression scheme by enriching the set of behaviors and exploring other techniques to exploit them.

References

[1] Fernando Benjamin Perez Maurera, Maurizio Ferrari Dacrema, and Paolo Cremonesi. Towards the Evaluation of Recommender Systems with Impressions. In Proceedings of the 16th ACM Conference on Recommender Systems (RecSys '22), Association for Computing Machinery, New York, NY, USA, 2022, pages 610-615. DOI: 10.1145/3523227.3551483.
[2] Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. KGAT: Knowledge Graph Attention Network for Recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '19), Association for Computing Machinery, New York, NY, USA, 2019, pages 950-958. DOI: 10.1145/3292500.3330989.
[3] Pei Lee, Laks V. S. Lakshmanan, Mitul Tiwari, and Sam Shah. Modeling Impression Discounting in Large-Scale Recommender Systems. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14), Association for Computing Machinery, New York, NY, USA, 2014, pages 1837-1846. DOI: 10.1145/2623330.2623356.
[4] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of the AAAI Conference on Artificial Intelligence, February 2015. DOI: 10.1609/aaai.v29i1.9491.
[5] Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu, and Ming Zhou. MIND: A Large-scale Dataset for News Recommendation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, July 2020, pages 3597-3606. DOI: 10.18653/v1/2020.acl-main.331.
[6] Pasquale Lops, Dietmar Jannach, Cataldo Musto, Toine Bogers, and Marijn Koolen. Trends in content-based recommendation. User Modeling and User-Adapted Interaction 29, 2019, pages 239-249. DOI: 10.1007/s11257-019-09231-w.
[7] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention Is All You Need. In 31st Conference on Neural Information Processing Systems (NIPS 2017), 2017. DOI: 10.5555/3295222.3295349.