=Paper=
{{Paper
|id=Vol-1247/recsys14_poster14
|storemode=property
|title=Task-Based User Modelling for Personalization via Probabilistic Matrix Factorization
|pdfUrl=https://ceur-ws.org/Vol-1247/recsys14_poster14.pdf
|volume=Vol-1247
|dblpUrl=https://dblp.org/rec/conf/recsys/MehrotraYV14
}}
==Task-Based User Modelling for Personalization via Probabilistic Matrix Factorization==
Rishabh Mehrotra, Emine Yilmaz, Manisha Verma (University College London; r.mehrotra@cs.ucl.ac.uk, emine.yilmaz@ucl.ac.uk, m.verma@cs.ucl.ac.uk)

Copyright is held by the author/owner(s). RecSys 2014 Poster Proceedings, October 6-10, 2014, Foster City, Silicon Valley, USA.

===Abstract===
We introduce a novel approach to user modelling for behavioral targeting, namely task-based user representation, and present a method based on search task extraction from search logs wherein users are represented by their actions over a task space. Given a web search log, we extract the search tasks performed by users and derive user representations based on these tasks. More specifically, we construct a user-task association matrix and borrow insights from collaborative filtering to learn a low-dimensional factor model wherein the interests and preferences of a user are determined by a small number of latent factors. We compare the performance of the proposed approach with a standard term-similarity baseline on the task of collaborative query recommendation over the publicly available AOL search log, and discuss potential future research directions.

===Categories and Subject Descriptors===
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—User Modelling

===Keywords===
Search tasks; User modelling; Personalization

===1. Introduction===
As consumers of informational content, different users have distinct informational preferences for decision making; thus, accurately understanding their respective information needs and decision preferences is crucial for providing effective decision support. While human behaviours are largely determined by their own goals and preferences, the mined knowledge reveals users' underlying intentions and behaviour patterns, which provide unique signals for human-centric optimization and personalization. Web search personalization has recently received a lot of attention from the research community. Personalized search leverages information about an individual to identify the most relevant recommendations for that person. A challenge for personalization lies in collecting user profiles that are rich enough to be useful in settings such as result ranking and query recommendation, while balancing privacy concerns.

A prominent line of prior research uses long-term histories to directly improve retrieval effectiveness. Various authors have considered topic-based representations for personalization [1], making use of hand-picked Open Directory Project (ODP) topical categories. While such topics are easily specified, significant human effort is required to label queries for each topic. Additionally, methods based on topical categories severely restrict the coverage of user profiles, since different users might share the same topical profile yet perform different search tasks for different informational needs. Another line of research on personalization has focused on term-based representations, wherein user interest profiles are built from terms extracted from a user's browsing history and term weights are then generated using different weighting schemes. While query terms are representative of user interests, they often limit the scope of personalization. Different users inherently follow different distributions over words, and queries belonging to the same topic might not share any terms, which makes finding similar users difficult in such settings.

In this work, we focus on learning user profiles based on the search tasks users are involved in. Users interact with search engines to accomplish tasks such as arranging a trip or planning a wedding. Such broad requirements prompt the use of multiple queries, sometimes spanning multiple sessions. We define a search task as the group of queries a user issues to accomplish such an overall intended task, and advocate the use of search tasks to build individual user models. We postulate that, in a web search setting, a user representation based on the search tasks users perform would better capture user actions, interests and preferences. Given a search log, we extract the search tasks performed by users and derive user representations based on these tasks. More specifically, we construct a user-task association matrix and borrow insights from collaborative filtering to learn a low-dimensional factor model wherein the actions, interests and preferences of a user are determined by a small number of latent factors. By applying probabilistic matrix factorization to the user-task association matrix, we learn a task-based representation for each user. We evaluate the quality of the learnt user representations on the task of collaborative query recommendation, wherein we suggest queries to a particular user based on the queries issued by other, similar users. We compare the performance of the proposed approach against a term-similarity-based baseline on the publicly available AOL search logs.

===2. Task Based User Modelling===
Our objective is to build succinct user profiles from the search task information embedded in search logs. Existing user modelling methods for web search rely heavily on per
user topical interests and hence fail to differentiate between users who share similar topical interests. We postulate that, in a web search setting, search logs contain information about the various actions that users perform, and that profiling users based on search tasks would better capture the heterogeneity in user information.

'''Task Discovery in Search Logs:''' Our goal here is to use search log data to create a list of global search tasks. We follow the approach to task discovery proposed in [2]. There, a task is defined as the maximal subsequence of possibly nonconsecutive queries referring to the same latent user need, which makes the set of all user tasks a partitioning of the set of all user queries. We formulate the task discovery problem as follows. Given a query log QL and a user <math>u</math>, let <math>T_u</math> be the set of user tasks discovered by a query partitioning scheme <math>\pi</math>. The user task discovery problem can then be described as finding the best query partitioning strategy <math>\pi^*</math> that approximates the actual set of user tasks <math>\Theta</math>:

:<math>\pi^* = \arg\max_{\pi} \, \xi(\Theta, T_u, \pi)</math> (1)

where the function <math>\xi(\cdot)</math> is an accuracy measure that evaluates how well the query partitioning strategy <math>\pi</math> approximates the actual user tasks <math>\Theta</math>. We use cosine similarity to measure this accuracy. This step is followed by clustering the identified user tasks to obtain universal tasks across all users. Each task in the final set is represented by a set of query terms, and these tasks form the task set used in our experiments. For details, please refer to Lucchese et al. [2].

'''User-Task Association Matrix:''' Based on the extracted search tasks, we construct a user-task association matrix which represents the search tasks users have been involved in. For each user <math>u_i</math>, we create a bag-of-queries representation from the list of queries issued by the user and compare it with each of the search tasks <math>t_j</math> obtained above. For each user-task pair, we populate the corresponding entry of the user-task association matrix <math>R</math> with the cosine similarity score <math>r_{ij}</math> obtained for the pair. For tasks with which a user has no matching queries, we assign a score of 0 to the corresponding pair.

'''Probabilistic Matrix Factorization for User Representations:''' We wish to extract task-based user vector representations by jointly mapping users and tasks to a shared latent factor space. Following Salakhutdinov et al. [3], we model the user-task association as a probabilistic matrix factorization problem. We learn a latent vector representation for each user from the user-task association matrix by fitting a probabilistic model. Given the user-task association matrix <math>R</math>, we find the user feature matrix <math>U = [u_i]</math> and the task feature matrix <math>T = [t_j]</math>. The conditional distribution over the observed user-task associations R ∈

Figure 1: Performance on Collaborative Query Recommendation (average number of queries recalled against top-n queries, n = 10 to 50; series: Task Based, TermSim).

the goal is to recommend queries to a user based on the queries issued by similar users. We calculate the weighted frequency of each candidate query over the 10 users most similar to the target user <math>u</math>, and select the top <math>n</math> queries as recommendations. We use the AOL log dataset, which consists of ∼20M web queries collected over three months. From it we take data for about 1,200 users who have each issued more than 550 queries. Running our task discovery algorithm on the queries of each of these users yields a total of ∼0.12M tasks. We cluster these tasks using cosine similarity scores to obtain a set of 1,529 search tasks, from which we create the user-task association matrix. Our baseline (TermSim) uses only a bag-of-words representation for each user, built from the terms in that user's queries. It finds similar users by the cosine similarity between users' bag-of-words representations. We consider the target user's test set of queries as relevant. Our performance metric is the average number of these relevant queries matched in the recommended query set.

We plot the average number of query matches between the recommended set of queries and the user's own test set of queries against <math>n</math>, where <math>n</math> refers to the top-n query suggestions from the 10 most similar users. Our initial results (Figure 1) show that the proposed task-based user modelling approach (Task-Based) performs better than TermSim, which demonstrates that search tasks can serve as potent user modelling tools. Since TermSim relies strictly on term matching for measuring user similarities, its coverage is limited. It might not capture insights for users with too few queries, or for those who share the same search interests but issue different queries or perform different tasks. Task-based user modelling can help better differentiate between users who have similar topical interests but perform different tasks. To better leverage topical user profiles, it would be interesting to combine user topical-interest information with user task associations to come up with a unified