Combination of Content-Based User Profiling and Local Collective Embeddings for Job Recommendation

Vasily Leksin¹, Andrey Ostapets¹, Mikhail Kamenshikov¹, Dmitry Khodakov¹, and Vasily Rubtsov¹,²

¹ Avito, Russia, http://www.avito.ru
² National Research University Higher School of Economics, Russia

Abstract. We present our approach to the RecSys Challenge 2017, which ranked 7th. The goal of the competition was to prepare job recommendations for the users of the business social network Xing.com. Our algorithm consists of two different models: Content-Based User Profiling and Local Collective Embeddings. The first, content-based model contains many hand-tuned parameters and data insights, so despite its simplicity it performs fairly well on the task of the challenge. The second model is based on Matrix Factorization and may be applicable to a wide range of cold-start recommendation tasks. The combination of these two models has shown the best performance on local validation.

Keywords: recommender system; cold-start problem; Local Collective Embeddings

1 Introduction

Unlike last year's competition [3], the ACM RecSys Challenge 2017 focuses on the problem of cold-start job recommendations for the business social network XING. The Challenge consists of two phases. The first phase is an offline evaluation: a fixed historical dataset and fixed target users and items for which recommendations should be submitted. Very few target items were present in the Interactions data. The second phase is an online evaluation: a new portion of target users and items was released every day. The recommendations submitted at the online stage were rolled out in XING's live system and pushed to real users. Both phases addressed the same task: given a new job posting, identify those users who (a) might be interested in receiving the job posting as a push recommendation and (b) were also proper candidates for the given job position. The same metrics and the same dataset structure were used for the offline and online evaluations. The top teams from the offline phase were allowed to participate in the online evaluation, and the online part determined the winners of the challenge.

The task of the offline phase was a standard cold-start recommendation problem, while the online stage focused on push notifications and followed this scheme:

– every 24 hours, teams needed to send a submission to the organizers;
– for each job posting, a team needed to provide a list of ≤ 250 users;
– each user could receive ≤ 1 push per day.

The recommendations were delivered to the users through the following channels: the activity stream in the mobile applications, the jobs marketplace, emails, and recruiter tools. Some of the challenges that the participating teams needed to solve were:

– the balance between user interests and the interests of recruiters;
– the balance between relevance and revenue;
– the item cold-start problem;
– smart targeting of push recommendations.

2 Data Description

In the challenge, a semi-synthetic dataset from XING was provided: artificial users and events were added, text tokens from job posting titles and descriptions were replaced by numeric IDs, and random noise was added in order to anonymize the data. The organizers provided the following datasets:

– Interactions — interactions that the user performed on the job posting items (clicked, bookmarked, replied, recruiter interest, or deleted), as well as details about items shown to users by the existing recommender (impressions).
– Users — details about the users who appear in the above datasets: job roles, career level, discipline, industry, location, experience, and education.
– Items — details about the job postings that were/will be recommended to the users: title, career level, discipline, industry, location, employment type, tags, creation time, and a flag indicating whether the item was active during the test.

In the online phase, the datasets contained 1M users, 853K items, 88.7M impressions, and 4.2M interactions (84% clicked, 8.5% deleted, 4.3% bookmarked, 2.3% replied, 0.8% recruiter interest). Every 24 hours the organizers provided a new portion of target users and items for prediction.

3 Evaluation Measure

Let T be a list of target items. For each target item i ∈ T, a recommender selects users to whom that item is pushed as a recommendation. The leaderboard score is computed as follows:

S = \sum_{i \in T} s(i, r(i)).    (1)

Here, r(i) specifies the list of users who will receive the given item as a push recommendation. Let U_r = r(i) be the list of recommended users. The score function s(i, U_r) sums up the success rates of the users U_r and the item-based success rate:

s(i, U_r) = itemSuccess(i, U_r) + \sum_{u \in U_r} userSuccess(i, u),    (2)

where userSuccess scores a user-item pair according to the user interactions:

userSuccess(i, u) = \rho(u, i) \cdot premiumBoost(u),    (3)

where ρ is the event-type weight and premiumBoost is a boost coefficient for premium users:

\rho(u, i) = \begin{cases} -10, & \text{if delete only,} \\ 1, & \text{if clicked,} \\ 5, & \text{if bookmarked or replied,} \\ 20, & \text{if recruiter interest,} \end{cases}    (4)

premiumBoost(u) = \begin{cases} 2, & \text{if the user is premium,} \\ 1, & \text{otherwise.} \end{cases}    (5)

itemSuccess(i, U_r) = \begin{cases} w(i), & \text{if } \exists u \in U_r : userSuccess(i, u) > 0, \\ 0, & \text{otherwise,} \end{cases}    (6)

where w is a boost coefficient for a paid item:

w(i) = \begin{cases} 50, & \text{if the item is paid,} \\ 25, & \text{otherwise.} \end{cases}    (7)

The evaluation measure reflects the business value of different types of events coming from different types of users and different types of vacancies. For local validation in the first phase we used the last week of interactions. In the second phase, we used the userSuccess score as the local score, and recommendations were built only for those users and items that were present in all currently available data from the daily chunks.
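To make the scoring concrete, the following is a minimal sketch of Eqs. (1)-(7); the interaction labels and the is_premium / is_paid flags are illustrative names for this example, not the organizers' actual data schema.

```python
# Minimal sketch of the leaderboard score, Eqs. (1)-(7).
# Interaction labels and the is_premium / is_paid flags are illustrative,
# not the organizers' actual schema.
EVENT_WEIGHT = {"delete_only": -10, "clicked": 1,
                "bookmarked": 5, "replied": 5, "recruiter_interest": 20}

def user_success(interaction: str, is_premium: bool) -> int:
    rho = EVENT_WEIGHT[interaction]                       # Eq. (4)
    return rho * (2 if is_premium else 1)                 # Eqs. (3), (5)

def item_score(is_paid: bool, pushed_users) -> int:
    """pushed_users: list of (interaction, is_premium) pairs for one target item."""
    user_part = [user_success(ev, prem) for ev, prem in pushed_users]
    item_part = (50 if is_paid else 25) if any(s > 0 for s in user_part) else 0  # Eqs. (6)-(7)
    return item_part + sum(user_part)                     # Eq. (2)

def leaderboard_score(target_items) -> int:
    """target_items: iterable of (is_paid, pushed_users) tuples, one per item in T."""
    return sum(item_score(paid, users) for paid, users in target_items)  # Eq. (1)
```

Note that a single positive user is enough to trigger the item-level bonus w(i), so pushing an item to even a small set of well-matched users already pays off.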
4 Models

In our solution, we used two different approaches: Content-Based User Profiling and Local Collective Embeddings [6].

4.1 Content-Based User Profiling

Our first, content-based [4] model ranks user-item pairs using content information from the user profile and the user's interaction history. The motivation for the content-based approach is the following: we experimented with a classical XGBoost model, but it delivered rather poor results, so we built a simple algorithm in which each feature has its own weight and it is easy to understand how the score for each prediction is formed.

Let U be a set of target users and I a set of target items. For each pair (u, i) we first check whether there is a match between the User Job Title and the Item Job Title (or tags), in order to reduce the number of pairs under consideration. Then we apply the following rankers to make top-N recommendations.

– User Title ranker — first, we calculate three types of inverse document frequency for each token:

F_t = \log\left(\frac{\#\text{unique tokens}}{\#\text{occurrences of token } t}\right),    (8)

where UF_t is computed over User Titles, IF_t over Item Titles, and TF_t over Item Tags. For each user u we calculate the token weight as

score_t = \frac{20 \cdot UF_t \cdot IF_t \cdot TF_t}{\sqrt{|u|}},    (9)

where |u| is the number of tokens in the user's title. The final ranker score for a pair (u, i) is the sum of token scores over the intersection of the User Title with the Item Title (or tags):

score(u, i) = \sum_t score_t.    (10)

A code sketch of this token scoring is given at the end of this subsection.
– User Interest ranker — this ranker is quite similar to the User Title ranker described above, but instead of scoring tokens from the User Job Title, we score tokens from the items the target user has interacted with. This scoring includes the Item Job Title and Item Job Tags, with weights based on the interaction type.
– Industry, Discipline and Career Level rankers — these rankers also use information from both user profiles and user interactions. The Profile Career Level ranker is based on the Career Level difference between the user and the item:

CLD(u, i) = |u_{CL} - i_{CL}|,

w_{CL}(u, i) = \begin{cases} 1.2, & \text{if } CLD(u, i) \le 1, \\ 0.7, & \text{if } CLD(u, i) = 3, \\ 0.5, & \text{if } CLD(u, i) \ge 4, \end{cases}    (11)

score(u, i) \leftarrow w_{CL}(u, i) \cdot score(u, i).    (12)

As the features Industry and Discipline are categorical, we multiply the pair score by w > 1 in case of an exact match and by w < 1 otherwise. The interaction rankers are slightly different from the profile ones; their main purpose is to modify the score according to the ratio of parameter matches to the total number of user interactions.
– User Behavior ranker — different users have different behavior patterns, so this ranker is based on the user's interaction history. If the user has only positive interactions, we add a constant to their score; if the user has only negative interactions, we subtract a constant from their score. We also multiply the user score by a weight w < 1 based on the ratio of positive to negative interactions.
– Premium ranker — a simple ranker that multiplies the score by 1.1 for a premium user and by 1.05 for a premium item.

The weights were first set manually based on our expertise and then optimized on the local validation set using a parameter grid. Final predictions were made by applying all rankers to the user-item pairs, thresholding, and taking the top-100 users for each item.
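Below is a minimal sketch of the User Title ranker scoring from Eqs. (8)-(10), assuming tokenized titles and tags are available as lists of token IDs; the helper names and the corpus layout are illustrative rather than our production code.

```python
import math
from collections import Counter

def idf(corpus):
    """Eq. (8): F_t = log(#unique tokens / #occurrences of token t),
    where `corpus` is a list of token-ID lists (titles or tag sets)."""
    counts = Counter(t for tokens in corpus for t in tokens)
    n_unique = len(counts)
    return {t: math.log(n_unique / c) for t, c in counts.items()}

def user_title_score(user_title, item_title, item_tags, uf, itf, tagf):
    """Eqs. (9)-(10): sum token weights over the intersection of the user
    title with the item title or tags; uf, itf, tagf correspond to the
    UF_t, IF_t, TF_t maps built from user titles, item titles, item tags."""
    if not user_title:
        return 0.0
    norm = math.sqrt(len(user_title))
    common = set(user_title) & (set(item_title) | set(item_tags))
    return sum(20.0 * uf.get(t, 0.0) * itf.get(t, 0.0) * tagf.get(t, 0.0) / norm
               for t in common)
```

The three IDF maps would be built once over all user titles, item titles, and item tag lists respectively; the resulting pair score is then modified by the remaining rankers as described above.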
4.2 Local Collective Embeddings

Local Collective Embeddings (LCE) [6] is a matrix factorization model that exploits item properties and past user preferences while enforcing the manifold structure exhibited by the collective embeddings. This model addresses the cold-start problem. The motivation for the LCE approach is as follows: matrix factorization and topic modeling are well-proven methods, but in this challenge we needed a hybrid solution, which LCE provides.

Assume that we have:

– a set of m properties stored in a matrix X_s ∈ R^{n×m}, where a row corresponds to an item and a column to an item property;
– a set of u users stored in a matrix X_u ∈ R^{n×u}, where a cell (i, j) indicates whether user j has shown interest in item i.

At test time, we are given a new item q with description q_s ∈ R^{1×m}, and our goal is to predict q_u ∈ R^{1×u}, i.e., to score how likely each user is to show interest in the new item. The basic LCE optimization problem is

\min J = \frac{1}{2}\left[\alpha \|X_s - W H_s\|^2 + (1-\alpha) \|X_u - W H_u\|^2 + \lambda(\|W\|^2 + \|H_s\|^2 + \|H_u\|^2)\right]
\text{s.t. } W \ge 0,\; H_s \ge 0,\; H_u \ge 0.

To regularize this matrix factorization model, we use the manifold assumption: if two data points x_i and x_j are close, in any view, in the intrinsic geometry of the distribution, then their representations in the low-dimensional space should also be close to each other. We construct a graph with n nodes, where each node represents a data point; for each point we find its p nearest neighbors and connect the corresponding nodes in the graph. This results in a matrix A which can be used to measure the local closeness of two points x_i and x_j. Using the weight matrix A defined above, we measure the smoothness of the low-dimensional representation as

S = \frac{1}{2}\sum_{i,j=1}^{n} \|w_i - w_j\|^2 A_{ij} = \sum_{i=1}^{n} (w_i^T w_i) D_{ii} - \sum_{i,j=1}^{n} (w_i^T w_j) A_{ij} = \mathrm{Tr}(W^T D W) - \mathrm{Tr}(W^T A W) = \mathrm{Tr}(W^T L W),

where D is a diagonal matrix whose entries are the row sums of A (or column sums, as A is symmetric), i.e., D_{ii} = \sum_j A_{ij}; L = D - A is the Laplacian matrix of the graph, and Tr(·) is the trace operator. This leads to the following optimization problem:

\min J = \frac{1}{2}\left[\alpha \|X_s - W H_s\|^2 + (1-\alpha) \|X_u - W H_u\|^2 + \beta\, \mathrm{Tr}(W^T L W) + \lambda(\|W\|^2 + \|H_s\|^2 + \|H_u\|^2)\right]
\text{s.t. } W \ge 0,\; H_s \ge 0,\; H_u \ge 0,

where β is a hyper-parameter which controls the extent to which locality is enforced. The learning algorithm has the following Expectation-Maximization (EM) [1] multiplicative update rules:

W \leftarrow W \odot \frac{\alpha X_s H_s^T + (1-\alpha) X_u H_u^T + \beta A W}{\alpha W H_s H_s^T + (1-\alpha) W H_u H_u^T + \beta D W + \lambda W},

H_s \leftarrow H_s \odot \frac{\alpha W^T X_s}{\alpha W^T W H_s + \lambda H_s},

H_u \leftarrow H_u \odot \frac{(1-\alpha) W^T X_u}{(1-\alpha) W^T W H_u + \lambda H_u},

where ⊙ and the fraction bar denote element-wise matrix multiplication and division.
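For concreteness, here is a minimal NumPy sketch of these multiplicative updates. The hyper-parameter defaults are placeholders rather than the values we used, and the least-squares fold-in for a new item is our own assumption, since the step of mapping q_s to a factor vector is not spelled out above.

```python
import numpy as np

def lce_fit(Xs, Xu, A, k=100, alpha=0.5, beta=0.05, lam=0.5, n_iter=200, eps=1e-9):
    """Multiplicative updates for W, Hs, Hu as given above.
    Xs: items x properties, Xu: items x users, A: item kNN adjacency matrix.
    Hyper-parameter defaults are placeholders for illustration."""
    n = Xs.shape[0]
    D = np.diag(A.sum(axis=1))                      # degree matrix of the kNN graph
    rng = np.random.default_rng(0)
    W = rng.random((n, k))
    Hs = rng.random((k, Xs.shape[1]))
    Hu = rng.random((k, Xu.shape[1]))
    for _ in range(n_iter):
        W *= (alpha * Xs @ Hs.T + (1 - alpha) * Xu @ Hu.T + beta * A @ W) / \
             (alpha * W @ (Hs @ Hs.T) + (1 - alpha) * W @ (Hu @ Hu.T)
              + beta * D @ W + lam * W + eps)
        Hs *= (alpha * W.T @ Xs) / (alpha * (W.T @ W) @ Hs + lam * Hs + eps)
        Hu *= ((1 - alpha) * W.T @ Xu) / ((1 - alpha) * (W.T @ W) @ Hu + lam * Hu + eps)
    return W, Hs, Hu

def lce_cold_start(qs, Hs, Hu):
    """Score all users for a new item description qs (shape (m,)).
    We project qs onto the item-property factors by least squares and multiply
    by Hu; this unconstrained projection is a simplifying assumption."""
    wq, *_ = np.linalg.lstsq(Hs.T, qs, rcond=None)  # wq ≈ argmin ||qs - wq Hs||
    return wq @ Hu                                   # predicted interest per user
```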
Figures 1-3 show the results of optimizing the two main model parameters: the number of EM iterations and the number of factors k. Figure 1 shows that the objective function converges after 25 iterations. From Figures 2 and 3 we can see that increasing the number of factors continues to improve the local score even after the objective function has converged.

Fig. 1. EM convergence (objective value vs. number of iterations).
Fig. 2. Optimizing k (objective function).
Fig. 3. Optimizing k (local score).

After training the LCE model, we train linear regression models on the user features, with the corresponding user factor as the target; as a result, we have as many linear regression models as there are factors. Final recommendations are built from a weighted sum of the factors obtained by LCE and the factors predicted from the user features. This technique improved the local score by 10%.

4.3 Blending

After parameter tuning, the local score was 649 for Content-Based User Profiling and 405 for LCE. We generated 500 recommended items for each user from both models and computed the final score for a user-item pair as a weighted combination of the scores of the two models:

score_{final} = 0.8 \cdot score_{LCE} + score_{content\text{-}based}^{0.15}.

The coefficient choice for blending was based on the distribution of the scores: the content-based algorithm produced scores from 0 to approximately 20K, while the LCE scores were in the 0-1 range, so the content-based score was raised to a power p < 1. The exact values of the parameters were then selected on the local validation set. This fine-tuning of two rankers via a combination of their scores is similar to the scheme from [2]. The local score of the blended model was 702, which means that the LCE model increased the local score by 8.1%.

5 Conclusions

In our approach, we used two different models: Content-Based User Profiling and Local Collective Embeddings. In this Challenge, the content-based approach showed better results on local validation than Local Collective Embeddings. A possible reason why the more sophisticated LCE algorithm lost to the simpler content-based method is that LCE was not customized enough for the task of the competition: in particular, it was trained on binary data and did not take into account the different types of features (numeric, categorical, binary). However, the combination of the two algorithms achieved a better score.

The lessons behind these results are as follows. Within the content-based approach, it is very important to understand how the score for each prediction is formed. In the Matrix Factorization approach [5], it is very important to choose an appropriate generative model which can include user and item side features, for example an extension of LCE. Further improvement of the LCE algorithm may incorporate the user features into the model (as an additional term similar to the item features) as well as different probability distributions for binary, categorical, and numeric features.

The second phase of the competition can be considered an A/B test. It is obviously hard for the organizers to carry out (the users must be split into groups correctly, solutions collected from the participants, etc.), but it may stimulate participants, since they can compete in real time and observe how their algorithm performs compared to the others. Daily feedback in the online phase was too sparse to use for automatic parameter tuning or a multi-armed bandit approach. In the offline stage, it was also important not to use features that would not be available in the online stage.

The main scientific contribution of this paper is an answer to the question of how to use Local Collective Embeddings with user side features; originally, LCE was not designed to include user features.

References

1. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2001. The Elements of Statistical Learning. Springer New York Inc., New York, NY, USA, Chapter 8.5 The EM Algorithm, 236-243.
2. Dmitry I. Ignatov, Sergey I. Nikolenko, Taimuraz Abaev, and Jonas Poelmans. 2016. Online recommender system for radio station hosting based on information fusion and adaptive tag-aware profiling. Expert Systems with Applications 55 (2016), 546-558.
3. Vasily Leksin and Andrey Ostapets. 2016. Job Recommendation Based on Factorization Machine and Topic Modelling. In Proceedings of the Recommender Systems Challenge (RecSys Challenge '16). ACM, New York, NY, USA, Article 6, 4 pages. https://doi.org/10.1145/2987538.2987542
4. Francesco Ricci, Lior Rokach, and Bracha Shapira (Eds.). 2015. Recommender Systems Handbook. Springer, Chapter 4 Semantics-Aware Content-Based Recommender Systems, 119-159. https://doi.org/10.1007/978-1-4899-7637-6
5. Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John T. Riedl. 2000. Application of Dimensionality Reduction in Recommender System: A Case Study. In ACM WebKDD Workshop.
6. Martin Saveski and Amin Mantrach. 2014. Item Cold-start Recommendations: Learning Local Collective Embeddings. In Proceedings of the 8th ACM Conference on Recommender Systems (RecSys '14). ACM, New York, NY, USA, 89-96. https://doi.org/10.1145/2645710.2645751