=Paper= {{Paper |id=Vol-1247/recsys14_poster14 |storemode=property |title=Task-Based User Modelling for Personalization via Probabilistic Matrix Factorization |pdfUrl=https://ceur-ws.org/Vol-1247/recsys14_poster14.pdf |volume=Vol-1247 |dblpUrl=https://dblp.org/rec/conf/recsys/MehrotraYV14 }} ==Task-Based User Modelling for Personalization via Probabilistic Matrix Factorization== https://ceur-ws.org/Vol-1247/recsys14_poster14.pdf
           Task-Based User Modelling for Personalization via
                  Probabilistic Matrix Factorization

                 Rishabh Mehrotra                               Emine Yilmaz                       Manisha Verma
              University College London                   University College London            University College London
            r.mehrotra@cs.ucl.ac.uk                     emine.yilmaz@ucl.ac.uk                m.verma@cs.ucl.ac.uk


ABSTRACT                                                                   (ODP) topical categories. While such topics are easily spec-
We introduce a novel approach to user modelling for behav-                 ified, significant human effort is required in labelling queries
ioral targeting: task-based user representation and present                for each topic. Additionally, topical category based meth-
an approach based on search task extraction from search                    ods restrict user’s profile coverage in a major way as different
logs wherein users are represented by their actions over a                 users might share the same topical profile yet perform differ-
task-space. Given a web search log, we extract search tasks                ent search tasks for different informational needs. Another
performed by users and find user representations based on                  line of research for personalization has focused on using term
these tasks. More specifically, we construct a user-task asso-             based representations wherein user interest profiles are built
ciation matrix and borrow insights from Collaborative Fil-                 using terms extracted from users’ browsing histories, following
tering to learn a low-dimensional factor model wherein the                 which term weights are generated using different weighting
interests/preferences of a user are determined by a small                  schemes. While query terms are representative of user inter-
number of latent factors. We compare the performance of                    ests, they often limit the scope of personalization as different
the proposed approach on the task of collaborative query                   users inherently follow different distributions over words and
recommendation on the publicly available AOL search log with               queries belonging to the same topic might not contain any
a standard term-similarity baseline and discuss potential fu-              overlapping terms which makes finding similar users difficult
ture research directions.                                                  in such settings.
                                                                              In this work, we focus on learning user profiles based on
Categories and Subject Descriptors                                         the search tasks users are involved with. Users interact with
H.3.3 [Information Storage And Retrieval]: Informa-                        search engines to accomplish some task, such as arranging a
tion Search and Retrieval—User Modelling                                   trip or planning a wedding. Such broad requirements prompt
Keywords                                                                   the use of multiple queries, sometimes spanning multiple
                                                                           sessions. We define search tasks as the group of queries
Search tasks; User modelling; Personalization
                                                                           a user issues to accomplish such an overall intended task and
1.    INTRODUCTION                                                         advocate the use of such search tasks to build individual
                                                                           user models. We postulate that in a web search setting,
   As consumers of informational content, different users
                                                                           a user representation based on the search tasks users per-
have distinct preferences of information for decision mak-
                                                                           form would better capture user actions, interests and pref-
ing; thus accurately understanding their respective infor-
                                                                           erences. Given a search log, we extract search tasks per-
mation needs and decision preferences is crucial for provid-
                                                                           formed by users and find user representations based on these
ing effective decision support. While human behaviours are
                                                                           tasks. More specifically, we construct a user-task associa-
largely determined by their own goals and preferences, the
                                                                           tion matrix and borrow insights from Collaborative Filter-
mined knowledge reveals users’ underlying intentions and
                                                                           ing to learn a low-dimensional factor model wherein the
behaviour patterns, which provide unique signals for human
                                                                           actions/interests/preferences of a user are determined by
centric optimization and personalization. Web search per-
                                                                           a small number of latent factors. By applying probabilis-
sonalization has recently received a lot of attention by the
                                                                           tic matrix factorization to the user-task association matrix,
research community. Personalized search leverages informa-
                                                                           we learn task-based user representations for each user and
tion about an individual to identify the most relevant recom-
                                                                           evaluate the quality of the learnt user representations by
mendations for that person. A challenge for personalization
                                                                           making use of these representations for the task of Collab-
is in collecting user profiles that are sufficiently rich to be
                                                                           orative Query Recommendation wherein we suggest queries
useful in settings such as result ranking, query recommen-
                                                                           to a particular user based on queries issued by other similar
dations, etc., while balancing privacy concerns.
                                                                           users. We compare the performance of the proposed ap-
   A prominent line of prior research uses long term histories
                                                                           proach against a term similarity based baseline on publicly
to directly improve retrieval effectiveness. Various authors
                                                                           available AOL search logs.
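
As an illustration of the pipeline just outlined, the following is a minimal Python sketch, not the authors' implementation. It fits a low-dimensional factor model to a toy user-task association matrix by gradient descent on the regularised squared error (the MAP objective of probabilistic matrix factorization under Gaussian priors), then recommends queries issued by the most similar users. The toy matrix R, the query lists, the hyperparameters (n_factors, lam, lr, n_iters), all function names, the treatment of every entry of R as observed, and the use of cosine similarity between learnt user vectors as the recommendation weight are assumptions made for this example.

```python
from collections import Counter

import numpy as np


def fit_pmf(R, n_factors=2, lam=0.01, lr=0.05, n_iters=2000, seed=0):
    """Gradient descent on 0.5*||R - U T^T||^2 + 0.5*lam*(||U||^2 + ||T||^2).

    Assumption: every entry of R (including zeros) is treated as observed.
    """
    rng = np.random.default_rng(seed)
    n_users, n_tasks = R.shape
    U = 0.1 * rng.standard_normal((n_users, n_factors))  # user factors
    T = 0.1 * rng.standard_normal((n_tasks, n_factors))  # task factors
    for _ in range(n_iters):
        E = R - U @ T.T                 # residuals of the current fit
        grad_U = -E @ T + lam * U       # gradient w.r.t. user factors
        grad_T = -E.T @ U + lam * T     # gradient w.r.t. task factors
        U -= lr * grad_U
        T -= lr * grad_T
    return U, T


def most_similar_users(U, user, k):
    """Return (index, cosine similarity) for the k users closest to `user`."""
    unit = U / np.linalg.norm(U, axis=1, keepdims=True)
    sims = unit @ unit[user]
    sims[user] = -np.inf                # never return the target user
    top = np.argsort(-sims)[:k]
    return [(int(j), float(sims[j])) for j in top]


def recommend(U, queries_by_user, user, k=1, n=2):
    """Score candidate queries by similarity-weighted frequency (assumed weighting)."""
    own = set(queries_by_user[user])
    scores = Counter()
    for j, sim in most_similar_users(U, user, k):
        for q in queries_by_user[j]:
            if q not in own:
                scores[q] += sim
    return [q for q, _ in scores.most_common(n)]


# Toy user-task association matrix: 4 users x 3 tasks, entries in [0, 1].
R = np.array([[0.9, 0.8, 0.0],
              [0.8, 0.9, 0.0],
              [0.0, 0.1, 0.9],
              [0.0, 0.0, 0.8]])
queries_by_user = [["cheap flights rome", "rome hotels"],
                   ["rome hotels", "rome car rental"],
                   ["wedding venues"],
                   ["wedding cake ideas"]]

U, T = fit_pmf(R)
rmse = float(np.sqrt(np.mean((R - U @ T.T) ** 2)))
print("reconstruction RMSE:", round(rmse, 3))
print("suggestions for user 0:", recommend(U, queries_by_user, user=0))
```

In this toy setting users 0 and 1 share the same tasks, so their learnt vectors point in nearly the same direction and user 0 is offered the query that only user 1 has issued. The experiments in the paper use the 10 most similar users rather than k=1.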
have considered topic based representations for personaliza-
tion [1] making use of hand picked Open Directory Project                  2.   TASK BASED USER MODELLING
                                                                           Our objective is to build succinct user profiles from the
                                                                           search task information embedded in search logs. Existing
Copyright is held by the author/owner(s).
RecSys 2014 Poster Proceedings, October 6-10, 2014, Foster City, Silicon   user modelling methods for web search rely heavily on per
Valley, USA.                                                               user topical interests and hence, fail to differentiate between
users which share similar topical interests. We postulate
that in web search setting, search logs contain information
about various actions that users perform and profiling users
based on search tasks would better capture the heterogeneity
in user information.

[Figure 1 plot: y-axis "avg no of queries recalled" (0.5 to 0.85); x-axis "Top-n queries" (10 to 50); series: Task Based, TermSim]

Task Discovery in Search Logs: Our goal here is to use
search log data to create a list of global search tasks. Following
the approach of task discovery as proposed in [2],
a task is defined as the maximal subsequence of possibly
nonconsecutive queries referring to the same latent user          Figure 1: Performance on Collaborative Query Recommen-
need, which makes the set of all user tasks a partition of        dation
the set of all user queries. We formulate the task discovery
problem as follows: given a query log QL and a user u, let        the goal is to recommend queries to a user based on queries
Tu be the set of user tasks discovered by a query partition-      issued by similar users. We calculate the weighted frequency
ing scheme π; the user task discovery problem can then be         of a candidate query for the 10 most similar users of the tar-
described as finding the best query partitioning strategy π*      get user u, and select the top n queries as recommenda-
that approximates the actual set of user tasks Θ such that:       tions. We make use of the AOL log dataset, which consists
        $\pi^{*} = \arg\max_{\pi}\, \xi(\Theta, T, \pi)$    (1)   of ∼20M web queries collected over three months, and use
where the function ξ(.) is an accuracy measure which evaluates    data for ∼1200 users who have issued more than 550
how well the query partitioning strategy π approximates the       queries. We run our Task Discovery algorithm on the set
actual user tasks Θ. We use cosine similarity to measure this     of queries for each of these users, which results in a total of
accuracy. This step is followed by clustering the user tasks      ∼0.12M tasks, which we cluster using cosine similarity scores
identified to obtain universal tasks across all users. The        to obtain a set of 1529 search tasks, from which we create
final set of user tasks obtained is represented by a set of       the user-task association matrix. Our baseline (TermSim) is
query terms and henceforth defines the set of tasks used for      a method that uses only a bag-of-words based representation
experiments. For details, please refer to Lucchese et al. [2].    for each user, where the terms are extracted from user queries
                                                                  and similar users are found using cosine similarity between each
User-Task Association Matrix: Based on the extracted              user’s bag-of-words based representations. We consider the
search tasks, we construct a user-task association matrix         test set of queries of the target user as relevant, and com-
which represents the search tasks users have been involved        pute the average number of relevant queries matched in the
with. For each user ui , we create a bag-of-queries represen-     recommendation query set as the performance metric.
tation from the list of queries issued by the user and com-          We plot the average number of query matches between
pare each user with each of the search tasks tj obtained          the recommended set of queries and the user’s own test set of
above. For each user-task pair (ui , tj ), we populate the        queries against n, where n refers to the top-n query sug-
corresponding value in the user-task association matrix (R)       gestions from the 10 most similar users. Our initial results
with the cosine similarity score (rij ) we obtain for the pair.   (Figure 1) show that the proposed Task-Based user mod-
For tasks in which users do not have any matching queries,        elling approach (Task-Based) performs better than TermSim,
we assign a score of 0 to the corresponding pair.                 which demonstrates that search tasks can serve as potent
Probabilistic Matrix Factorization for User Repre-                user modelling tools. Since TermSim relies strictly on term
sentations: We wish to extract task-based user vector rep-        matching for measuring user similarities, its coverage is lim-
resentations by jointly mapping users and tasks to a shared       ited: it might not capture insights for users with too few
latent factor space. Following Salakhutdinov et al. [3], we       queries or those who share the same search interest but
model the user-task association as a probabilistic ma-            issue different queries or perform different tasks. Task
trix factorization problem and learn a latent vector represen-    based user modelling can help in better differentiating be-
tation for each user from the user-task association matrix by     tween users who have similar topical interests but perform
fitting a probabilistic model. Given the user-task association    different tasks. To better leverage the topical user profiles, it
matrix R, we find the user feature matrix U = [ui ] and task      would be interesting to combine user topical-interest infor-
feature matrix T = [tj ]. The conditional distribution over       mation with user task-associations to come up with a unified
the observed user-task associations R ∈