Learning-to-Rank in Research Paper CBF Recommendation: Leveraging Irrelevant Papers

Anas Alzoghbi, Victor A. Arrascue Ayala, Peter M. Fischer, Georg Lausen
Department of Computer Science, University of Freiburg, Germany
{alzoghba, arrascue, peter.fischer, lausen}@informatik.uni-freiburg.de

ABSTRACT
Suggesting relevant literature to researchers has become an active area of study, typically relying on content-based filtering (CBF) over the rich textual features available. Given the high dimensionality and the sparsity of the training samples inherent to this domain, the focus has so far been on heuristic-based methods. In this paper, we argue for the model-based approach and propose a learning-to-rank method that leverages publicly available publication meta-data to produce an effective prediction model. The proposed method is systematically evaluated on a scholarly paper recommendation dataset and compared against state-of-the-art model-based approaches as well as current, domain-specific heuristic methods. The results show that our approach clearly outperforms state-of-the-art research paper recommendations while utilizing only publicly available meta-data.

CCS Concepts
• Information systems → Learning to rank; Recommender systems

Keywords
Research paper recommendation; Learning-to-Rank; Content-based recommendation; Model-based user profile

1. INTRODUCTION
Scholars and researchers are confronted with an overwhelming number of newly published research papers in their domain of expertise. Although advantageous for restricting the domain, the keyword-based search tools typically available in digital libraries offer only limited help to researchers in locating relevant content. As a result, researchers need to manually search within unspecific search results to identify papers of interest. This is the situation where recommender systems have great potential, and indeed plenty of works have adopted different techniques to tackle this problem. A recent extensive survey of this domain [3] identified content-based filtering (CBF) as the predominant approach for research paper recommendation because of the rich textual features available. For learning user profiles, the focus has been almost exclusively on relevance feedback approaches, building on the assumption that the papers appearing in a user's preference list contribute an equal (or a presumed) share to the underlying user taste. Thus, user profiles are constructed as an aggregation of the relevant papers' keywords. Based on the classification suggested by Adomavicius et al. in [1], these approaches are referred to as heuristic-based. In contrast, model-based approaches depend on a learning method to fit the underlying user model (profile). This enables a better modeling of the researcher-keyword relation in user profiles, but it requires a large body of training data, which is not readily available in this domain. As a result, little work on applying model-based approaches exists for this problem.

In this paper, we employ pairwise learning-to-rank [4] as a model-based technique for learning the user profile. We incorporate both relevant papers and irrelevant "peer" papers (papers published in the relevant papers' conferences) to formulate pairwise preferences and enrich the training set. Our main contributions include:

• We investigate and customize learning-to-rank for CBF research paper recommendation.
• We incorporate only a small set of data, restricted to publicly available metadata of papers. This makes our approach suitable for a much larger domain than previous approaches, which require the papers' full text.
• We perform an initial, yet systematic study on a real-world dataset in which we show that our approach clearly outperforms existing heuristic- and model-based algorithms.

The rest of this paper is organized as follows: Section 2 provides an overview of existing related work. In Section 3 we present our approach, and in Section 4 we describe the experimental setup and results. Finally, we conclude in Section 5 by summarizing our findings and situating this work within our future plans.
2. RELATED WORK
A rich body of related work has tackled the problem of research paper recommendation. Collaborative filtering (CF) approaches [8, 13, 14] showed a successful application of model-based methods, incorporating knowledge from other "similar" users. However, we restrict our scope to content-based scenarios that consider only information from the active user. In this domain, the main focus in learning user profiles has been on heuristic-based approaches with a wide adoption of relevance feedback and cosine similarity [3]: papers are recommended which are most similar to one or more previously published or liked papers. In [10], De Nart et al. used terms (keyphrases) extracted from a user's liked papers to construct the user profile. The profile has a graph representation, and the focus was on the keyphrase extraction method and the graph structure. Lee et al. [6] proposed a memory-based CBF in which a user's papers are clustered based on their similarity, and candidate papers are ranked based on the distance from the user's clusters. Sugiyama et al. [11, 12] applied a relevance feedback approach utilizing all terms from the full text of the researcher's publications, in addition to terms from the citing and the referenced papers, in order to build profiles. All of these works are heuristic-based, where the weights in the user profile are set by aggregating individual keyword scores of relevant papers.

On the contrary, model-based approaches depend on machine learning techniques to learn the user's affinity towards keywords, promising a more representative user profile. In a previous work [2], we showed the superiority of a model-based method over relevance feedback methods for CBF research paper recommendation. We applied multivariate linear regression to learn researchers' profiles from their previous publications. Yet, that work was tailored to researchers with previous publications and did not consider irrelevant papers. In [9], Minkov et al. presented a collaborative ranking approach for event recommendation. They compared it with a content-based baseline that applies pairwise learning-to-rank on pairs of relevant and irrelevant events. In our work, we follow a similar approach in applying learning-to-rank to pairs of relevant and irrelevant papers. However, we push it further and investigate the quality of these pairs and their effect on the model performance.

3. PROPOSED APPROACH
This work targets users who have previously interacted with scientific papers and identified some as papers of interest (relevant papers). Having a set of relevant papers for a user, the recommendation process can start and a machine learning method is applied to fit a user profile (model). The learned model is used to rank a set of candidate papers and recommend the top-ranked papers to the user. Our approach is to employ the pairwise learning-to-rank technique to build the user profile. We chose this method because of its desirable properties: it was proven to be successful in solving ranking tasks in similar problem domains like online advertising [7], and it also shows good performance on problems with sparse data. The main idea of pairwise learning-to-rank is to build pairs of preferences out of the training set. Each pair consists of a positive and a negative instance. Afterwards, the pairs are fed as training instances to a learning algorithm, which in turn learns the desired model.

In the underlying problem, papers marked as interesting by users are the positive instances. However, the negative instances, or irrelevant papers, are usually not explicitly provided by the users. This makes pairwise learning-to-rank not directly applicable to this setup. In our contribution, we seek implicit information about the irrelevant papers. For this, we start from the following hypothesis: when users identify relevant papers, they, to some extent, implicitly rate other papers published at the same conference (we call them peer papers) as irrelevant (later, we introduce a validation process that checks the correctness of this hypothesis for each pair). Based on this hypothesis, we utilize peer papers as irrelevant papers as follows: for each user, we build pairs of preferences out of relevant and peer papers. Such pairs are called pairwise preferences or, for simplicity, pairs; we use these terms interchangeably throughout the paper. Afterwards, we feed these pairs as training examples to a learning algorithm in order to fit the user's model. This model is later used to rank candidate papers and recommend the top-ranked ones to the user.

Before delving deeper into the method details, we first introduce some notation. The function peer(.) is defined over the interest set P^r_int of a user r. It delivers, for a paper p ∈ P^r_int, the set of p's peer papers. In practice, this can be retrieved via digital libraries like the DBLP registry (http://dblp.uni-trier.de). For the paper modeling, we adopt a vector space model representation. Having the domain-related keywords extracted from a paper's title, abstract and keyword list as features, each paper p is a vector p = ⟨s_{p,v_1}, ..., s_{p,v_|V|}⟩, where v_i ∈ V is a domain-related vocabulary term and s_{p,v_i} is a score reflecting the importance of v_i in p. We adopt the TF-IDF score as the weighting scheme. Based on this representation, the similarity between two papers is calculated as the cosine similarity between the papers' vectors.
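As an illustration of this representation, the following is a minimal sketch (not the authors' code) of mapping a paper's extracted keywords to a TF-IDF vector over a fixed domain vocabulary V and computing the cosine similarity between two such vectors; the function names and the exact TF-IDF variant are assumptions.

```python
import math
from collections import Counter

def tfidf_vector(tokens, vocabulary, doc_freq, n_docs):
    """TF-IDF vector of a paper over a fixed domain vocabulary.

    tokens: keywords extracted from the paper's title, abstract and keyword list.
    doc_freq: document frequency of each vocabulary term in the corpus.
    """
    counts = Counter(tokens)
    vec = []
    for term in vocabulary:
        tf = counts.get(term, 0)
        # Smoothed IDF; the paper does not specify its exact TF-IDF variant.
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0
        vec.append(tf * idf)
    return vec

def cosine_similarity(a, b):
    """Cosine similarity between two paper vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```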
3.1 Method Steps
An overview of the proposed approach is depicted in Figure 1. For the experimental setup only, we split the user's interest set P^r_int into training and test sets P^r_train and P^r_test, respectively. This step is dropped in the non-experimental recommendation scenario, where the first step receives the complete interest set P^r_int.

[Figure 1 omitted: pipeline from the user's interest list (80% training / 20% test) through (1) peer papers augmenting, (2) forming pairwise preferences, (3) preferences validation, (4) model learning, and (5) ranking & evaluation against the user's model (profile).]
Figure 1: Overview of the proposed approach steps

1. Peer papers augmenting: in this step, the peer papers are retrieved for all relevant papers. The retrieved peer papers serve as potential negative instances and are important for enabling the learning algorithm to construct a better understanding of the user's taste.

2. Forming pairwise preferences: here we apply the concept of pairwise learning from learning-to-rank. The training set in this step is reformulated as a set of pairs P, where each pair consists of two components: a relevant paper and an irrelevant paper. That is, each relevant paper p ∈ P^r_train is paired with all papers from peer(p) (a sketch of this pairing follows the list below):

   P = {(p, p') | ∀p ∈ P^r_train, ∀p' ∈ peer(p)}

   A pair (p, p') ∈ P depicts a preference in the user's taste and implies that p has a higher relevance to user r than p'.

3. Preferences validation: in the first step, we introduced the peer papers as negative instances based on the hypothesis stated earlier in this section. Yet, this hypothesis cannot be adopted as ground truth because (a) users have not explicitly affirmed that they are not interested in the peer papers, and (b) some peer papers might be of interest to the user but might have been overlooked. With this in mind, not all pairwise preferences formulated in the previous step have the same level of correctness. Therefore, this step examines the pairwise preferences and makes sure to pass only valid ones to model learning. We propose two different mechanisms to accomplish this validation: pruning based validation and weighting based validation. We explain these techniques in the next section.

4. Model learning: in this step, we apply a pairwise learning-to-rank method (Ranking SVM [5]) to train a user model ŵ_r. Using the validated pairwise preferences from the previous step, we seek the ŵ_r that minimizes the objective function

   ŵ_r = arg min_{w_r} (1/2) ||w_r||^2 + C · L(w_r)

   where C ≥ 0 is a penalty parameter and L(w_r) is the pairwise hinge loss function:

   L(w_r) = Σ_{(p,p') ∈ P} max(0, 1 − w_r^T (p − p'))^2        (*)

5. Ranking & Evaluation: given the user's model resulting from the previous step, we apply the prediction to candidate papers. For the experimental setup, these are the papers in the test set, which is constructed out of the relevant papers P^r_test (the positive instances) together with their peer papers as irrelevant papers (the negative instances).
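The following is a minimal sketch of steps 1 and 2 under stated assumptions: the peer(.) lookup is taken as given (in practice it would be backed by DBLP metadata), and `get_peers` is a hypothetical helper, not an API from the paper.

```python
def build_pairwise_preferences(train_papers, get_peers):
    """Form the pair set P = {(p, p') | p in P^r_train, p' in peer(p)}.

    train_papers: list of relevant paper ids in P^r_train.
    get_peers: callable returning, for a paper id, the ids of its peer papers
               (papers published at the same conference); assumed to exist.
    """
    pairs = []
    for p in train_papers:               # step 1: peer papers augmenting
        for p_prime in get_peers(p):     # step 2: one pair per (relevant, peer) combination
            pairs.append((p, p_prime))
    return pairs
```

Each returned pair would then pass through one of the validation mechanisms of Section 3.2 before model learning.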
3.2 Preferences Validation Methods
As pairwise learning-to-rank expects pairs that show a contrast between the negative and positive instances, pairs with "wrongly assigned" peers pose potential noise to the learning process. After all, the validity of a pairwise preference (p, p') depends on the correctness of considering its peer paper p' irrelevant. The pair's relevant paper p forms the ground truth and can hence be considered the reference point for deciding whether p' is irrelevant or not. For each pair (p, p') ∈ P we measure the similarity between p and p' and adopt two methods to validate the pair based on this similarity:

Weighting Based Validation (WBV). This strategy gives pairwise preferences different weights based on the dissimilarity between the pair's components. It boosts the importance of pairs with dissimilar components and ensures that the more similar the pair's components are, the less important the pair is for model learning. Therefore, we weight the importance of each pair according to the distance (1 − similarity) between the relevant paper and the peer paper, and redefine the loss function from (*) to consider the pairs' weights as follows:

   L(w_r) = Σ_{(p,p') ∈ P} max(0, 1 − w_r^T ((1 − similarity(p, p')) (p − p')))^2

Pruning Based Validation (PBV). Here we filter out invalid pairwise preferences. Validity is judged based on the dissimilarity between the pair's components: if they prove to be similar, then we do not consider p' an irrelevant paper and, consequently, the pair (p, p') is not eligible for model learning. A similarity threshold τ is applied, and a pair (p, p') is pruned if similarity(p, p') > τ. In our experiments, we empirically test a range of values for τ and discuss the corresponding effect on the model.
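To tie steps 3 and 4 together, here is a simplified sketch of validating pairs via WBV or PBV and fitting the user model ŵ_r. It is not the authors' implementation: the paper trains with Ranking SVM [5], whereas this stand-in minimizes the same regularized squared-hinge objective with plain gradient descent, and all names are illustrative.

```python
import numpy as np

def validate_pairs(pairs, similarity, tau=None):
    """Preferences validation (Section 3.2), returning (p, p', weight) triples.

    PBV: if a threshold tau is given, drop pairs whose components have
         similarity greater than tau (weight stays 1).
    WBV: otherwise keep every pair, weighted by the distance 1 - similarity.
    """
    validated = []
    for p, p_prime in pairs:
        sim = similarity(p, p_prime)
        if tau is not None:                      # pruning based validation
            if sim <= tau:
                validated.append((p, p_prime, 1.0))
        else:                                    # weighting based validation
            validated.append((p, p_prime, 1.0 - sim))
    return validated

def learn_user_model(validated_pairs, vectors, dim, C=1.0, lr=0.01, epochs=100):
    """Fit w_r by minimizing 1/2||w||^2 + C * sum max(0, 1 - w^T(weight * (p - p')))^2.

    A plain gradient-descent stand-in for Ranking SVM; vectors maps paper ids
    to their TF-IDF feature vectors (numpy arrays of length dim).
    """
    w = np.zeros(dim)
    for _ in range(epochs):
        grad = w.copy()                          # gradient of the 1/2||w||^2 regularizer
        for p, p_prime, weight in validated_pairs:
            d = weight * (vectors[p] - vectors[p_prime])
            slack = max(0.0, 1.0 - w.dot(d))
            grad -= C * 2.0 * slack * d          # gradient of the squared hinge term
        w -= lr * grad
    return w
```

Candidate papers can then be ranked by the score w_r^T p, as in step 5.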
4. EXPERIMENTS

4.1 Dataset & Setup
We evaluated the proposed approach on the scholarly publication recommendation dataset from [12], including the extensions applied in our previous work [2]: papers are identified and enriched with meta-data from the DBLP registry, namely titles, abstracts, keywords and the publishing conference. The dataset contains 69,762 candidate papers, as well as the lists of relevant papers for 48 researchers. The number of relevant papers per researcher ranges from 8 to 208 with an average of 71 papers. After augmenting peer papers, we obtained a skewed distribution, as the ratio of relevant papers to peer papers ranges from 0.45% to 3% with an average of 1.2%. We performed offline experiments with 5-fold cross validation following the steps outlined in Figure 1. For each researcher, we randomly split the interest list into training and test sets; then we learn the researcher's model as described in Section 3; finally, we evaluate the learned models on the test set. The test set consists of (a) positive instances, the test relevant papers (20% of the researcher's interest list), and (b) negative instances, the peer papers of the positive instances. This applies to all of our experiments, except for the experiments on the pruning based validation method (PBV). In PBV, we filter out of the training set those pairs whose components have a similarity higher than τ. Therefore, we apply the same rule to the test set and filter out peer papers based on their similarity to the corresponding relevant paper. For example, given a similarity threshold τ and a relevant paper p from the test set, a peer paper p' ∈ peer(p) is added as an irrelevant paper to the test set if and only if similarity(p, p') ≤ τ.

4.2 Metrics
We measured the following metrics to determine the performance for top-k ranking as well as overall classification, reporting the averages over all researchers for each metric:
Mean Reciprocal Rank (MRR): evaluates the position of the first relevant paper in the ranked result.
Normalized Discounted Cumulative Gain (nDCG): nDCG@k indicates how good the top k results of the ranked list are. We report nDCG for k ∈ {5, 10}.
AUC and Recall: used to study the behavior of the validation strategies PBV and WBV and of the baseline algorithms, Logistic Regression and SVM.
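For reference, the following is a minimal sketch of how MRR and nDCG@k could be computed per researcher (and then averaged across researchers) with binary relevance; the paper does not spell out its exact formulas, so the standard definitions below are an assumption.

```python
import math

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant paper in the ranked list (0 if none appears)."""
    for rank, paper_id in enumerate(ranked_ids, start=1):
        if paper_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """nDCG@k with binary relevance: DCG of the top-k list divided by the DCG
    of an ideal ranking that places all relevant papers first."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, paper_id in enumerate(ranked_ids[:k], start=1)
              if paper_id in relevant_ids)
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```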
4.3 Results & Discussion
In total, we performed three different experiments. The first experiment (with the results shown in Table 1) shows a superior performance of our weighting based validation method (WBV) over the state-of-the-art heuristic-based work (Sugiyama [12]) and the model-based approach (PubRec [2]). The experiments were performed using the same features and datasets as in these works and show a clear lead over all metrics.

                          MRR     nDCG@5   nDCG@10
  WBV                     0.728   0.471    0.391
  PubRec                  0.717   0.445    0.382
  Sugiyama [12] via [2]   0.577   0.345    0.285

Table 1: WBV compared to state-of-the-art model-based and heuristic-based approaches

The second experiment compares the performance of our approach with other baseline classification algorithms, SVM and logistic regression, to provide a more general understanding of its capabilities. As shown in Figure 2, logistic regression showed a weak performance on all metrics, particularly on Recall: it did not succeed in identifying relevant papers even when fed with a balanced training set. SVM showed a better ability to recognize the relevant papers with a better Recall value, but produced many false positives, which is clear from its lower MRR and nDCG values. In contrast, all variants of our method showed a superior performance on all metrics.

[Figure 2 omitted: MRR, AUC, nDCG@5, nDCG@10 and Recall plotted against the similarity threshold τ for PBV, WBV, logistic regression (LR) and SVM.]
Figure 2: WBV and PBV compared with Logistic regression and Support Vector Machine

Finally, we compare the suggested pair validation techniques WBV and PBV, including tuning the latter by varying the similarity threshold τ from 1 (where no pairs are filtered; this case represents the CBF approach of [9]) down to 4 · 10^-4 (where many "noisy" pairs are pruned from the training set). WBV showed in general a very good performance, beating PBV for higher values of τ on all metrics except Recall. There, PBV gives a slightly better Recall even without filtering any pairs (when τ = 1). This is due to the fact that weighting the pairs in WBV causes the model to miss some relevant papers, while PBV makes the models more capable of recognizing the relevant papers by eliminating the noisy pairs from the training set. When decreasing τ, PBV shows very good scores, but these results need additional investigation before leading to a clear conclusion. As mentioned earlier in this section, reducing τ also leads to a smaller number of irrelevant papers in the test set. This reduces the underlying bias in the test set, which has an (additional) positive impact on the metrics, even though a clear bias (the relevant/peer ratio is on average 11.2%) is still present at the lowest τ values.

5. CONCLUSION
In this paper, we investigated the application of learning-to-rank to research paper recommendation. We proposed a novel approach that leverages irrelevant papers to produce more accurate user models. Offline experiments showed that our method outperforms state-of-the-art CBF research paper recommendations while utilizing only publicly available meta-data. Our future steps will focus on further understanding the effect of the similarity threshold in pruning based validation (PBV) on the model quality, and on studying the suitability of pairwise learning-to-rank algorithms other than Ranking SVM for this problem.
6. REFERENCES
[1] G. Adomavicius, Z. Huang, and A. Tuzhilin. Personalization and Recommender Systems. 2014.
[2] A. Alzoghbi, V. A. A. Ayala, P. M. Fischer, and G. Lausen. PubRec: Recommending publications based on publicly available meta-data. In LWA, 2015.
[3] J. Beel, B. Gipp, S. Langer, and C. Breitinger. Research-paper recommender systems: a literature survey. IJDL, 2015.
[4] L. Hang. A short introduction to learning to rank. IEICE Transactions on Information and Systems, 2011.
[5] R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. 2000.
[6] J. Lee, K. Lee, J. G. Kim, and S. Kim. Personalized academic paper recommendation system. In SRS, 2015.
[7] C. Li, Y. Lu, Q. Mei, D. Wang, and S. Pandey. Click-through prediction for advertising in Twitter timeline. In KDD, 2015.
[8] S. M. McNee et al. On the recommending of citations for research papers. In CSCW, 2002.
[9] E. Minkov, B. Charrow, J. Ledlie, S. Teller, and T. Jaakkola. Collaborative future event recommendation. In CIKM, 2010.
[10] D. D. Nart and C. Tasso. A personalized concept-driven recommender system for scientific libraries. Procedia Computer Science, 2014.
[11] K. Sugiyama and M.-Y. Kan. Scholarly paper recommendation via user's recent research interests. In JCDL, 2010.
[12] K. Sugiyama and M.-Y. Kan. Exploiting potential citation papers in scholarly paper recommendation. In JCDL, 2013.
[13] A. Vellino. A comparison between usage-based and citation-based methods for recommending scholarly research articles. In ASIS&T, 2010.
[14] C. Wang and D. M. Blei. Collaborative topic modeling for recommending scientific articles. In KDD, 2011.