Conceptual Impact-Based Recommender System for
                           CiteSeerx

Kevin Labille, Susan Gauch, Ann Smittu Joseph
Department of Computer Science and Computer Engineering
University of Arkansas
Fayetteville, AR 72701, USA
kclabill@uark.edu, sgauch@uark.edu, ann@email.uark.edu

ABSTRACT
CiteSeerx is a digital library for scientific publications written by Computer Science researchers. Users are able to retrieve relevant documents from the database by searching by author name and/or keyword queries. Users may also receive recommendations of papers they might want to read, provided by an existing conceptual recommender system. This system recommends documents based on an automatically constructed user profile. Unlike traditional content-based recommender systems, the documents and the user profile are represented as concept vectors rather than keyword vectors, and papers are recommended based on conceptual matches rather than keyword matches between the profile and the documents. Although the current system provides recommendations that are on-topic, they are not necessarily high-quality papers. In this work, we introduce the Conceptual Impact-Based Recommender (CIBR), a hybrid recommender system that extends the existing conceptual recommender system in CiteSeerx by including an explicit quality factor as part of the recommendation criteria. To measure quality, our system considers the impact factor of each paper's authors as measured by the authors' h-index. Experiments to evaluate the effectiveness of our hybrid system show that the CIBR system recommends more relevant papers than the conceptual recommender system.

Categories and Subject Descriptors
Information Systems [Information retrieval]: Retrieval tasks and goals: Recommender systems

General Terms
Performance, Reliability, Design, Experimentation

Keywords
Recommender System, h-index, Content-based Recommender System, CiteSeerx, Information Retrieval

CBRecSys 2015, September 20, 2015, Vienna, Austria.
Copyright remains with the authors and/or original copyright holders

1.   INTRODUCTION
In recent years, recommender systems have become ubiquitous, recommending items such as movies, restaurants, and books. The recommendations ease information overload by proactively suggesting relevant items, moving the burden of discovery from the user to the system. The number and type of applications that use recommender systems keep growing [1]; one practical application that is of interest to researchers in any domain is the ability of recommender systems to suggest relevant scientific literature. These systems can expedite scientific innovation by helping researchers keep abreast of new publications in their fields and by helping new researchers learn about the most important literature in an area new to them. Digital libraries can employ recommender systems that suggest papers to their users based on each user's research interests. However, an effective recommender system should not only consider the subject of a paper; it should also take into account the paper's quality when making recommendations. To this end, we present a recommender system that recommends scientific papers based on user preferences as well as paper quality, as measured by the authors' impact factors, to provide recommendations of high-quality papers that are relevant to the user's research area.

To help CiteSeerx users locate scientific papers related to their work, a citation-based recommender system was developed by Chandrasekaran et al. in 2008 [4]. Although citations are effective at identifying papers that have relevant content and are also high quality, this approach is only effective in recommending papers with many citations. These unfortunately tend to be older papers that were published long enough ago to accumulate many citations. Especially in a fast-moving domain like computer science, researchers need to know about recent contributions to their field, yet recent papers have few citations. To solve this problem, a content-based recommender system for CiteSeerx was developed by Pudhiyaveetil et al. [8]. This conceptual recommender system automatically builds conceptual profiles for users based on their interactions with the system. It also builds conceptual profiles for each document and recommends papers based on conceptual matches between document and user profiles. Even though the recommendations were shown to be more relevant than those produced by a keyword-based recommender system, they are not always high-quality papers that the researcher wanted to read.

Our objective is to improve upon the conceptual recommender system by providing better quality recommendations to the users. To do so, we developed a recommender system that recommends papers based on the paper authors' impact factors. We combined the impact-factor-based recommendations with the concept-based recommendations in varying proportions to create a hybrid recommender system. We evaluated the effectiveness of the conceptual recommender system, the impact-factor recommender system, and the hybrid recommender system and found that the hybrid recommender system provides the most accurate recommendations. The rest of this paper is organized as follows: In section 2 we review related work. Section 3 describes the Conceptual Impact-Based Recommender (CIBR) system in detail. In section 4, we present our experimental evaluation to analyze the effectiveness of our recommender system. Finally, we present our conclusions and discuss future work in section 5.

2.   RELATED WORK
The design of a recommender system can vary based on the nature of user feedback or the availability of data. There are three main approaches: collaborative filtering, content-based recommender systems, and recommender systems that are a hybrid of the two [1]. The first approach generates recommendations based on similarities between the users' behavior and/or preferences. In contrast, content-based approaches recommend items to the users based on similarities between the attributes of the items themselves [10]. Collaborative approaches are typically used when semantic features cannot easily be extracted from the items, so indirect evidence based on users' likes or ratings must be compared. To be effective, collaborative filtering requires a large active user community to avoid the well-known "cold-start" problem in which there are many more items to be recommended than there are users with likes or ratings upon which recommendations can be based. On the other hand, pure content-based recommender systems do not consider external information that might be available from the users, e.g., popularity. For these reasons, many recommender systems employ a hybrid approach that combines both of the previously described approaches.

Content-based recommender systems match the users' preferences to each item's features to recommend new objects [10]. Many share the approach of building a user profile from a set of features extracted from previously liked items. This user profile is then compared to the features of all items in the collection and the most similar items are recommended to the user [12]. This type of recommender system can be used in any domain for which semantically relevant features can be extracted, and it is particularly well-suited for domains that include textual items such as scientific literature or domains with annotations such as movies or music [12]. Kompan et al. used this approach to recommend news articles on a web site [9]. In this domain, the volume of articles and the dynamic nature of news make collaborative filtering infeasible, so they implemented a content-based recommender system based on cosine similarity that suggested articles that best matched an implicitly constructed user model [9].

Our work is a hybrid approach that enhances a content-based recommender system with a quality measure to recommend scientific literature. According to Beel et al., recommender systems for research papers are flourishing, with more than 80 approaches existing today that have been discussed in over 170 articles and patents [2]. Such recommender systems help researchers stay up to date in their research area. Many content-based recommender systems represent the user interests and the documents as weighted keyword vectors. One example is [13], in which tf * idf weights are calculated for keywords and the cosine similarity measure is used to determine the relevancy of a paper to a user's profile. An approach similar to ours is used in [5]. In their work, each paper's features are represented as concepts created by automatically extracting keyphrases. User profiles are constructed from the concepts in previously viewed papers and the recommender system matches the user profile concepts to each paper's concepts to suggest new papers in a scientific library. In [8], a conceptual recommender system was presented that recommends research papers for CiteSeerx users. Unlike the previous work, the concepts for each paper are assigned by automatically classifying papers into a set of concepts defined by a pre-existing ontology. A conceptual user profile is implicitly built as users view papers in the collection and this user profile is used to recommend conceptually similar papers.

Content-based recommender systems can recommend literature that is similar in topic to the user's profile, but they do not necessarily recommend high-quality papers. Although there is no perfect way to measure the quality of articles, the Impact Factor (IF) introduced in 1955 is still considered the best way to evaluate a paper's scientific merit [6]. There are several types of IFs, including the widely used h-index that evaluates a researcher's impact [7]. It has recently been used in several fields such as health services research [3], business and management [11], and even academic psychiatry [14]. Although the work in [5], [8], and [13] is similar to ours, our recommender system expands upon that work by incorporating a quality factor as measured by the authors' h-indices.

3.   APPROACH

Figure 1: Architecture of the CIBR

The architecture of the Conceptual Impact-Based Recommender System (CIBR) is shown in Figure 1. The Profile Subsystem classifies all documents in the CiteSeerx database into the 369 predefined categories in the ACM Computing Classification System (CCS). Documents manually tagged with ACM categories by their authors are used as the training set for a k-nearest neighbor classifier. As users interact with the system, the documents that they examine are input to the Profile Subsystem. The categories associated with each examined document are combined to create a weighted
conceptual user profile. This user profile is used by both the Conceptual Recommender and the Impact-Based Recommender described in the following sections. The outputs of these two Recommenders are combined to produce the recommendations from the CIBR.

3.1   Concept-Based Recommender System

Figure 2: Conceptual Recommender System Architecture

As a user views documents in CiteSeerx, the Profile Subsystem builds a conceptual user profile for them by accumulating the concept weights associated with the documents that the user examines. The Conceptual Recommender System then recommends documents to the user based on the similarity between each document's conceptual profile and the user's conceptual profile [8]. The weight of the conceptual match between document i and user j is calculated using the cosine similarity function over all M = 369 concepts in the ACM taxonomy:

\[ \mathrm{ConceptualWeight}_{ij} = \sum_{k=1}^{M} (cwt_{ik} * cwt_{jk}) \]

Where
cwt_{ik} = weight of concept k in document profile i and
cwt_{jk} = weight of concept k in user profile j, as explained and detailed in [8].

3.2   Impact-Based Recommender System

Figure 3: Impact-based Recommender System Architecture

The Impact Factor Generator precalculates an impact factor for each document in the collection as measured by its authors' h-indices. As described by Hirsch, an author has an h-index of m based on his/her N published articles if m articles have at least m citations each, and the other N-m articles have no more than m citations each [7]. The impact factor for a document is calculated by finding the h-index value of each of the authors of the document and then selecting the highest h-index value. Thus, document i's impact weight is equal to the h-index of its most impactful author:

\[ \mathrm{ImpactWeight}_{i} = \max_{l \in A_{i}} (hindex_{il}) \qquad (1) \]

Where
A_{i} = list of the authors l of document i

Since the impact factor is independent of users, the Impact-Based recommendations would be the same for all users, i.e., the most impactful documents in the entire collection. We do, however, use the user profile to filter out documents from categories in which the user has shown no previous interest. Thus, the Impact-Based Recommender returns high-impact documents from categories of some interest to the user. We also tried other approaches to calculating the impact factor, including the sum of the authors' h-indices. That method is limited because the highest-weighted papers would usually be the ones with many authors.

3.3   Conceptual Impact-Based Recommender System
The Conceptual Impact-Based Recommender System (CIBR) combines the Conceptual Weights and the Impact Weights to produce its recommendations. The two sub-component weights are normalized to fall between 0 and 1 using linear scaling and then combined based on a tunable parameter, α. The weight of the conceptual impact match between document i and user j, γ_{ij}, is calculated using:

\[ \gamma_{ij} = \alpha * C'_{ij} + (1 - \alpha) * I'_{i} \qquad (2) \]

Where

\[ C'_{ij} = \text{normalized } \mathrm{ConceptualWeight}_{ij} = \frac{\mathrm{ConceptualWeight}_{ij} - \min_{j}(\mathrm{ConceptualWeight})}{\max_{j}(\mathrm{ConceptualWeight}) - \min_{j}(\mathrm{ConceptualWeight})} \]

\[ I'_{i} = \text{normalized } \mathrm{ImpactWeight}_{i} = \frac{\mathrm{ImpactWeight}_{i} - \min_{j}(\mathrm{ImpactWeight})}{\max_{j}(\mathrm{ImpactWeight}) - \min_{j}(\mathrm{ImpactWeight})} \]

α = controls the relative contributions of the two sub-weights

By varying α from 0 to 1, we can adjust the relative contributions of the two underlying recommender systems. When α = 0, the CIBR is a pure impact-based recommender system, whilst when α = 1, the CIBR is a purely conceptual recommender system.

4.   EXPERIMENTAL EVALUATION
4.1   Subjects and Dataset
We conducted several experiments to measure the effectiveness of our hybrid recommender system. Experiments were done with 30 subjects, undergraduate and graduate computer science and computer engineering students from the University of Arkansas. We use the 2,190,179 documents in our snapshot of CiteSeerx, a digital library and search engine for computer and information sciences literature. Because previous experiments have shown that profiles become stable after viewing 20 papers, users were asked to search for and view at least that many papers related to their own research area. Based on those documents, user profiles were automatically constructed for each user.

4.2   Evaluation Method
The goal of this experiment was first to determine what combination of the conceptual match and the paper quality is most effective in our hybrid recommender system. The relative contribution of each is given by Equation (2) in Section 3.3. By changing the value of α we are able to control the relative contributions of the two recommender systems, with α = 0.0 being a pure impact-based recommender system, α = 1.0 being a pure conceptual recommender system, and α = 0.5 using even contributions from both. We varied the value of α from 0.0 to 1.0 with an increment of 0.1 for each of the subjects in the experiment, and for each value of α we collected the top ten recommended documents. For each
user, we presented them with the set of all documents recommended by any of the versions of the system (removing duplicates) in random order. They provided explicit relevance feedback by rating the papers as very relevant (2), relevant (1), or irrelevant (0). We then used the Mean Average Weighted Precision (MAWP) of each user for each α as a metric. The MAWP is essentially the Mean Average Precision modified to handle weights from 0 to 2 rather than just Boolean relevance judgments. The mean of every MAWP for each α is calculated and summarized in Figure 4. As shown in Figure 4, an α of 0.9 gives the best result, 0.6355, meaning that a 90% contribution from the conceptual recommender system and a 10% contribution from the impact-based recommender performed the best.

Figure 4: Mean Average Weighted Precision for every α

For the second part of our analysis, we compared the effectiveness of the three recommender systems head-to-head. The hybrid recommender system with α = 0.9 outperformed the conceptual recommender system's MAWP of 0.6083 (α = 1.0) by 4.5% relative (or 2.72% absolute) and the impact-based recommender system's MAWP of 0.2867 (α = 0.0) by 121.67% relative (or 34.88% absolute). Both of these results are statistically significant (p < 0.05), based on the paired two-tailed Student's t-test.

5.   CONCLUSION AND FUTURE WORK
In this paper, a hybrid recommender system was introduced that recommends high-quality papers to CiteSeerx users. The new recommender combines a conceptual recommender system with an impact-factor-based recommender system. The former incorporates the user's preferences represented as a concept vector, whilst the latter incorporates paper quality using the authors' impact factors as measured by their h-indices. User experiments were conducted to compare the concept-based recommender system and the impact-based recommender system with our hybrid system. The results confirm that our hybrid recommender generates more relevant documents than either the conceptual or the impact-factor-based recommender. Future work could consider using social networks of co-authors or differential weighting of the papers. Another direction would be to investigate the effectiveness of our hybrid recommender system using the g-index, which gives a stronger weight to highly cited papers than the h-index does. Alternatively, we could use the e-index, which complements the h-index by distinguishing authors having the same h-index but different numbers of citations.

6.   ACKNOWLEDGMENTS
This research was supported in part by the National Science Foundation grant number 0958123: Collaborative Research: CI-ADDO-EN: Semantic CiteSeerx

7.   REFERENCES
 [1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Knowledge and Data Engineering, IEEE Transactions on, 17(6):734–749, 2005.
 [2] J. Beel, S. Langer, M. Genzmehr, B. Gipp, C. Breitinger, and A. Nürnberger. Research paper recommender system evaluation: A quantitative literature survey. In Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation, pages 15–22. ACM, 2013.
 [3] Y. Birks, C. Fairhurst, K. Bloor, M. Campbell, W. Baird, and D. Torgerson. Use of the h-index to measure the quality of the output of health services researchers. Journal of Health Services Research & Policy, 19(2):102–109, 2014.
 [4] K. Chandrasekaran, S. Gauch, P. Lakkaraju, and H. P. Luong. Concept-based document recommendations for CiteSeer authors. In Adaptive Hypermedia and Adaptive Web-Based Systems, pages 83–92. Springer, 2008.
 [5] D. De Nart and C. Tasso. A personalized concept-driven recommender system for scientific libraries. Procedia Computer Science, 38:84–91, 2014.
 [6] E. Garfield. Journal impact factor: a brief review. Canadian Medical Association Journal, 161(8):979–980, 1999.
 [7] J. E. Hirsch. An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46):16569–16572, 2005.
 [8] A. Kodakateri Pudhiyaveetil, S. Gauch, H. Luong, and J. Eno. Conceptual recommender system for CiteSeerx. In Proceedings of the third ACM conference on Recommender systems, pages 241–244. ACM, 2009.
 [9] M. Kompan and M. Bieliková. Content-based news recommendation. In E-commerce and web technologies, pages 61–72. Springer, 2010.
[10] P. Lops, M. De Gemmis, and G. Semeraro. Content-based recommender systems: State of the art and trends. In Recommender systems handbook, pages 73–105. Springer, 2011.
[11] J. Mingers, F. Macri, and D. Petrovici. Using the h-index to measure the quality of journals in the field of business and management. Information Processing & Management, 48(2):234–241, 2012.
[12] M. J. Pazzani and D. Billsus. Content-based recommendation systems. In The adaptive web, pages 325–341. Springer, 2007.
[13] S. Philip and A. O. John. Application of content-based approach in research paper recommendation system for a digital library. International Journal of Advanced Computer Science & Applications, 5(10), 2014.
[14] S. Selek and A. Saleh. Use of h index and g index for American academic psychiatry. Scientometrics, 99(2):541–548, 2014.
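
The scoring described in Section 3 (the conceptual match, Equation (1), and Equation (2)) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the CiteSeerx implementation. The function names (h_index, impact_weight, conceptual_weight, min_max, cibr_scores) and the toy inputs are hypothetical. The cosine function divides by the vector norms, whereas the formula printed in Section 3.1 shows only the sum of products; the two agree when the profiles are already unit length. Min-max scaling is assumed to run over the candidate documents for a single user, and a constant list is mapped to zeros to avoid division by zero.

```python
import math
from typing import List, Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest m such that m papers have at least m citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def impact_weight(author_h_indices: Sequence[int]) -> int:
    """Equation (1): the h-index of the document's most impactful author."""
    return max(author_h_indices)


def conceptual_weight(doc_profile: Sequence[float],
                      user_profile: Sequence[float]) -> float:
    """Cosine similarity between two concept-weight vectors of equal length.

    Assumption: norms are applied here; with unit-length profiles this
    reduces to the plain sum of products shown in Section 3.1.
    """
    dot = sum(d * u for d, u in zip(doc_profile, user_profile))
    norm = (math.sqrt(sum(d * d for d in doc_profile))
            * math.sqrt(sum(u * u for u in user_profile)))
    return dot / norm if norm else 0.0


def min_max(values: Sequence[float]) -> List[float]:
    """Linear scaling to [0, 1]; a constant list maps to zeros (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def cibr_scores(conceptual: Sequence[float],
                impact: Sequence[float],
                alpha: float) -> List[float]:
    """Equation (2): alpha * C' + (1 - alpha) * I' for each candidate document."""
    if len(conceptual) != len(impact):
        raise ValueError("one conceptual and one impact weight per document")
    c_norm = min_max(conceptual)
    i_norm = min_max(impact)
    return [alpha * c + (1 - alpha) * i for c, i in zip(c_norm, i_norm)]


# Toy example: three candidate documents for one user (hypothetical values).
conceptual = [0.2, 0.8, 0.5]   # conceptual weights
impact = [10, 2, 6]            # impact weights (highest author h-index)
scores = cibr_scores(conceptual, impact, alpha=0.9)
ranking = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
```

With alpha = 1.0 the ranking follows the conceptual weights alone, and with alpha = 0.0 it follows the impact weights alone, mirroring the two extremes described in Section 3.3.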