=Paper= {{Paper |id=Vol-1448/paper9 |storemode=property |title=Conceptual Impact-Based Recommender System for CiteSeerx |pdfUrl=https://ceur-ws.org/Vol-1448/paper9.pdf |volume=Vol-1448 |dblpUrl=https://dblp.org/rec/conf/recsys/LabilleGJ15 }} ==Conceptual Impact-Based Recommender System for CiteSeerx== https://ceur-ws.org/Vol-1448/paper9.pdf

Conceptual Impact-Based Recommender System for CiteSeerx

Kevin Labille, Susan Gauch, Ann Smittu Joseph
Department of Computer Science and Computer Engineering
University of Arkansas
Fayetteville, AR 72701, USA
kclabill@uark.edu, sgauch@uark.edu, ann@email.uark.edu

ABSTRACT
CiteSeerx is a digital library for scientific publications written by Computer Science researchers. Users are able to retrieve relevant documents from the database by searching by author name and/or keyword queries. Users may also receive recommendations of papers they might want to read provided by an existing conceptual recommender system. This system recommends documents based on an automatically-constructed user profile. Unlike traditional content-based recommender systems, the documents and the user profile are represented as concept vectors rather than keyword vectors and papers are recommended based on conceptual matches rather than keyword matches between the profile and the documents. Although the current system provides recommendations that are on-topic, they are not necessarily high-quality papers. In this work, we introduce the Conceptual Impact-Based Recommender (CIBR), a hybrid recommender system that extends the existing conceptual recommender system in CiteSeerx by including an explicit quality factor as part of the recommendation criteria. To measure quality, our system considers the impact factor of each paper's authors as measured by the authors' h-index. Experiments to evaluate the effectiveness of our hybrid system show that the CIBR system recommends more relevant papers as compared to the conceptual recommender system.

Categories and Subject Descriptors
Information Systems [Information retrieval]: Retrieval tasks and goals: Recommender systems

General Terms
Performance, Reliability, Design, Experimentation

Keywords
Recommender System, h-index, Content-based Recommender System, CiteSeerx, Information Retrieval

1. INTRODUCTION
In recent years, recommender systems have become ubiquitous, recommending movies, restaurants, books, etc. The recommendations ease information overload for users by proactively suggesting relevant items to the users, moving the burden of discovery from the user to the system. The number and type of applications that use recommender systems keep growing [1]; one practical application that is of interest to researchers in any domain is the ability of recommender systems to suggest relevant scientific literature. These systems can expedite scientific innovation by helping researchers keep abreast of new publications in their fields and also help new researchers learn about the most important literature in an area new to them. Digital libraries can employ recommender systems that suggest papers to their users based on each user's research interests. However, an effective recommender system should not only consider the subject of a paper, it should also take into account the paper's quality when making recommendations. To this end, we present a recommender system that recommends scientific papers based on user preferences as well as paper quality as measured by the authors' impact factors to provide recommendations of high-quality papers that are relevant to the user's research area.

To help CiteSeerx users locate scientific papers related to their work, a citation-based recommender system was developed by Chandrasekaran et al. in 2008 [4]. Although citations are effective at identifying papers that have relevant content and are also high quality, this approach is only effective in recommending papers with many citations. These unfortunately tend to be older papers that have been published long enough ago to generate many citations. Especially in a fast-moving domain like computer science, researchers need to know about recent contributions to their field, yet recent papers have few citations. To solve this problem, a content-based recommender system for CiteSeerx was developed by Pudhiyaveetil et al. [8]. This conceptual recommender system automatically builds conceptual profiles for users based on their interactions with the system. It also builds conceptual profiles for each document and recommends papers based on conceptual matches between document and user profiles. Even though the recommendations were shown to be more relevant than those produced by a keyword-based recommender system, they are not always high-quality papers that the researcher wanted to read. Our objective is to improve upon the conceptual
recommender system by providing better quality recommendations to the users. To do so, we developed a recommender system that recommends papers based on the paper authors' impact factors. We combined the impact-factor based recommendations with the concept-based recommendations in varying proportions to create a hybrid recommender system. We evaluated the effectiveness of the conceptual recommender system, the impact-factor recommender system, and the hybrid recommender system and found that the hybrid recommender system provides the most accurate recommendations. The rest of this paper is organized as follows: In section 2 we review related work. Section 3 describes the Conceptual Impact-Based Recommender (CIBR) system in detail. In section 4, we present our experimental evaluation to analyze the effectiveness of our recommender system. Finally, we present our conclusions and discuss future work in section 5.

CBRecSys 2015, September 20, 2015, Vienna, Austria. Copyright remains with the authors and/or original copyright holders.

2. RELATED WORK
The design of a recommender system can vary based on the nature of user feedback or the availability of data. There are three main approaches: collaborative filtering, content-based recommender systems, and recommender systems that are a hybrid of the two [1]. The first approach generates recommendations based on similarities between the users' behavior and/or preferences. In contrast, content-based approaches recommend items to the users based on similarities between the attributes of the items themselves [10]. Collaborative approaches are typically used when semantic features cannot easily be extracted from the items, so indirect evidence based on users' likes or ratings must be compared. To be effective, collaborative filtering requires a large active user community to avoid the well-known "cold-start" problem in which there are many more items to be recommended than there are users with likes or ratings upon which recommendations can be based. On the other hand, pure content-based recommender systems do not consider external information that might be available from the users, e.g., popularity. For these reasons, many recommender systems employ a hybrid approach that combines both of the previously described approaches.

Content-based recommender systems match the users' preferences to each item's features to recommend new objects [10]. Many share the approach of building a user profile from a set of features extracted from previously liked items. This user profile is then compared to the features of all items in the collection and the most similar items are recommended to the user [12]. This type of recommender system can be used in any domain for which semantically relevant features can be extracted and it is particularly well-suited for domains that include textual items such as scientific literature or domains with annotations such as movies or music [12]. Kompan et al. used this approach to recommend news articles on a web site [9]. In this domain, the volume of articles and the dynamic nature of news make collaborative filtering infeasible so they implemented a content-based recommender system based on cosine similarity that suggested articles that best matched an implicitly constructed user model [9].

Our work is a hybrid approach that enhances a content-based recommender system with a quality measure to recommend scientific literature. According to Beel et al., recommender systems for research papers are flourishing with more than 80 approaches existing today that have been discussed in over 170 articles and patents [2]. Such recommender systems are useful for researchers to be up to date in their research area. Many content-based recommender systems represent the user interests and the documents as weighted keyword vectors. One example is [13] in which tf*idf weights are calculated for keywords and the cosine similarity measure is used to determine the relevancy of a paper to a user's profile. An approach similar to ours is used in [5]. In their work, each paper's features are represented as concepts created by automatically extracting keyphrases. User profiles are constructed from the concepts in previously viewed papers and the recommender system matches the user profile concepts to each paper's concepts to suggest new papers in a scientific library. In [8], a conceptual recommender system was presented that recommends research papers for CiteSeerx users. Unlike the previous work, the concepts for each paper are assigned by automatically classifying papers into a set of concepts defined by a pre-existing ontology. A conceptual user profile is implicitly built as users view papers in the collection and this user profile is used to recommend conceptually similar papers.

Content-based recommender systems can recommend literature that is similar in topic to the user's profile, but they do not necessarily recommend high-quality papers. Although there is no perfect way to measure the quality of articles, the Impact Factor (IF) introduced in 1955 is still considered the best way to evaluate a paper's scientific merit [6]. There are several types of IFs, including the widely used h-index that evaluates a researcher's impact [7]. It has been recently used in several fields such as health services research [3], business and management [11] or even academic psychiatry [14]. Although the work in [5], [8], and [13] is similar to ours, our recommender system expands upon their work by incorporating a quality factor as measured by the authors' h-indexes.

3. APPROACH

Figure 1: Architecture of the CIBR

The architecture of the Conceptual Impact-Based Recommender System (CIBR) is shown in Figure 1. The Profile Subsystem classifies all documents in the CiteSeerx database into the 369 predefined categories in the ACM Computing Classification System (CCS). Documents manually tagged with ACM categories by their authors are used as the training set for a k-nearest neighbor classifier. As users interact with the system, the documents that they examine are input to the Profile Subsystem. The categories associated with each examined document are combined to create a weighted
conceptual user profile. This user profile is used by both the Conceptual Recommender and the Impact-Based Recommender described in the following sections. The outputs of these two Recommenders are combined to produce the recommendations from the CIBR.

3.1 Concept-Based Recommender System

Figure 2: Conceptual Recommender System Architecture

As a user views documents in CiteSeerx, the Profile Subsystem builds a conceptual user profile for them by accumulating the concept weights associated with the documents that the user examines. The Conceptual Recommender System then recommends documents to the user based on the similarity between each document's conceptual profile and the user's conceptual profile [8]. The weight of the conceptual match between document i and user j is calculated using the cosine similarity function over all M = 369 concepts in the ACM taxonomy:

$$\mathrm{ConceptualWeight}_{ij} = \sum_{k=1}^{M} (cwt_{ik} * cwt_{jk})$$

Where
$cwt_{ik}$ = weight of concept k in document profile i and
$cwt_{jk}$ = weight of concept k in user profile j, as explained and detailed in [8].

3.2 Impact-Based Recommender System

Figure 3: Impact-based Recommender System Architecture

The Impact Factor Generator precalculates an impact factor for each document in the collection as measured by its authors' h-indices. As described by Hirsch, an author has an h-index of m based on his/her N published articles if m articles have at least m citations each, and the other N-m articles have no more than m citations each [7]. The impact factor for a document is calculated by finding the h-index value of each of the authors of the document and then selecting the highest h-index value. Thus, document i's h-index is equal to that of its most impactful author:

$$\mathrm{ImpactWeight}_{i} = \max_{l \in A_{i}} (hindex_{il}) \tag{1}$$

Where
$A_{i}$ = list of the authors l of document i

Since the impact factor is independent of users, the Impact-Based recommendations would be the same for all users, i.e., the most impactful documents in the entire collection. We do, however, use the user profile to filter out documents from categories in which the user has shown no previous interest. Thus, the Impact-Based Recommender returns high-impact documents from categories of some interest to the user. We tried other approaches to calculate the impact factor, among which we considered the sum of the authors' h-indices. This particular method is limited since the highest weighted papers would usually be the ones with many authors.

3.3 Conceptual Impact-Based Recommender System
The Conceptual Impact-Based Recommender System (CIBR) combines the Conceptual Weights and the Impact Weights to produce its recommendations. The two sub-component weights are normalized to fall between 0 and 1 using linear scaling and then combined based on a tunable parameter, α. The weight of the conceptual impact match between document i and user j, $\gamma_{ij}$, is calculated using:

$$\gamma_{ij} = \alpha * C'_{ij} + (1 - \alpha) * I'_{i} \tag{2}$$

Where

$$C'_{ij} = \text{normalized } \mathrm{ConceptualWeight}_{ij} = \frac{\mathrm{ConceptualWeight}_{ij} - \min_j(\mathrm{ConceptualWeight})}{\max_j(\mathrm{ConceptualWeight}) - \min_j(\mathrm{ConceptualWeight})}$$

$$I'_{i} = \text{normalized } \mathrm{ImpactWeight}_{i} = \frac{\mathrm{ImpactWeight}_{i} - \min_j(\mathrm{ImpactWeight})}{\max_j(\mathrm{ImpactWeight}) - \min_j(\mathrm{ImpactWeight})}$$

α = parameter that controls the relative contributions of the two sub-weights

By varying α from 0 to 1, we can adjust the relative contributions of the two underlying recommender systems. When α = 0, the CIBR is a pure impact-based recommender system whilst when α = 1, the CIBR is a purely conceptual recommender system.

4. EXPERIMENTAL EVALUATION
4.1 Subjects and Dataset
We conducted several experiments to measure the effectiveness of our hybrid recommender system. Experiments were done with 30 subjects, undergraduate and graduate computer science and computer engineering students from the University of Arkansas. We use the 2,190,179 documents in our snapshot of CiteSeerx, a digital library and a search engine for computer and information sciences literature. Because previous experiments have shown that profiles become stable after viewing 20 papers, users were asked to search for and view at least that many papers related to their own research area. Based on those documents, user profiles were automatically constructed for each user.

4.2 Evaluation Method
The goal of this experiment was first to determine what combination of the conceptual match and the paper quality is most effective in our hybrid recommender system. The relative combination of the two is given by Equation (2) in Section 3.3. By changing the value of α we are able to control the relative contributions of the two recommender systems with α = 0.0 being a pure impact-based recommender system, α = 1.0 being a pure conceptual recommender system, and α = 0.5 using even contributions from both. We varied the value of α from 0.0 to 1.0 with an increment of 0.1 for each of the subjects in the experiment and for each value of α we collected the top ten recommended documents. For each
user, we presented them with the set of all documents recommended by any of the versions of the system (removing duplicates) in random order. They provided explicit relevance feedback by rating the papers as very relevant (2), relevant (1), or irrelevant (0). We then used the Mean Average Weighted Precision (MAWP) of each user for each α as a metric. The MAWP is essentially the Mean Average Precision modified to handle weights from 0 to 2 rather than just Boolean relevance judgments. The mean of every MAWP for each α is calculated and summarized in Figure 4. As shown in Figure 4, an α of 0.9 gives the best result, 0.6355, meaning that a 90% contribution from the conceptual recommender system and a 10% contribution from the impact-based recommender performed the best.

Figure 4: Mean Average Weighted Precision for every α

For the second part of our analysis, we compared the effectiveness of the three recommender systems head-to-head. The hybrid recommender system with α = 0.9 outperformed the conceptual recommender system's MAWP of 0.6083 (α = 1.0) by 4.5% relative (or 2.72% absolute) and the impact-based recommender system's MAWP of 0.2867 (α = 0.0) by 121.67% relative (or 34.88% absolute). Both of these results are statistically significant (p < 0.05), based on the paired two-tailed Student's t-test.

5. CONCLUSION AND FUTURE WORK
In this paper, a hybrid recommender system was introduced that recommends high-quality papers to CiteSeerx users. The new recommender combines a conceptual recommender system with an impact-factor-based recommender system. The former incorporates the user's preferences represented as a concept vector whilst the latter incorporates paper quality using the authors' impact factors as measured by their h-indexes. User experiments were conducted to compare the concept-based recommender system and the impact-based recommender system with our hybrid system. The results confirm that our hybrid recommender generates more relevant documents than either the conceptual or the impact-factor-based recommender. Future work could consider using social networks of co-authors or differential weighting of the papers. Another direction would be to investigate the effectiveness of our hybrid recommender system by considering the g-index that gives a stronger weight to highly-cited papers as compared to the h-index. Alternatively, we could use the e-index that complements the h-index by distinguishing authors having the same h-index but different numbers of citations.

6. ACKNOWLEDGMENTS
This research was supported in part by the National Science Foundation grant number 0958123: Collaborative Research: CI-ADDO-EN: Semantic CiteSeerx

7. REFERENCES
[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Knowledge and Data Engineering, IEEE Transactions on, 17(6):734–749, 2005.
[2] J. Beel, S. Langer, M. Genzmehr, B. Gipp, C. Breitinger, and A. Nürnberger. Research paper recommender system evaluation: A quantitative literature survey. In Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation, pages 15–22. ACM, 2013.
[3] Y. Birks, C. Fairhurst, K. Bloor, M. Campbell, W. Baird, and D. Torgerson. Use of the h-index to measure the quality of the output of health services researchers. Journal of health services research & policy, 19(2):102–109, 2014.
[4] K. Chandrasekaran, S. Gauch, P. Lakkaraju, and H. P. Luong. Concept-based document recommendations for citeseer authors. In Adaptive Hypermedia and Adaptive Web-Based Systems, pages 83–92. Springer, 2008.
[5] D. De Nart and C. Tasso. A personalized concept-driven recommender system for scientific libraries. Procedia Computer Science, 38:84–91, 2014.
[6] E. Garfield. Journal impact factor: a brief review. Canadian Medical Association Journal, 161(8):979–980, 1999.
[7] J. E. Hirsch. An index to quantify an individual's scientific research output. Proceedings of the National academy of Sciences of the United States of America, 102(46):16569–16572, 2005.
[8] A. Kodakateri Pudhiyaveetil, S. Gauch, H. Luong, and J. Eno. Conceptual recommender system for citeseerx. In Proceedings of the third ACM conference on Recommender systems, pages 241–244. ACM, 2009.
[9] M. Kompan and M. Bieliková. Content-based news recommendation. In E-commerce and web technologies, pages 61–72. Springer, 2010.
[10] P. Lops, M. De Gemmis, and G. Semeraro. Content-based recommender systems: State of the art and trends. In Recommender systems handbook, pages 73–105. Springer, 2011.
[11] J. Mingers, F. Macri, and D. Petrovici. Using the h-index to measure the quality of journals in the field of business and management. Information Processing & Management, 48(2):234–241, 2012.
[12] M. J. Pazzani and D. Billsus. Content-based recommendation systems. In The adaptive web, pages 325–341. Springer, 2007.
[13] S. Philip and A. O. John. Application of content-based approach in research paper recommendation system for a digital library. International Journal of Advanced Computer Science & Applications, 5(10), 2014.
[14] S. Selek and A. Saleh. Use of h index and g index for american academic psychiatry. Scientometrics, 99(2):541–548, 2014.
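
As an illustration of the scoring described in Sections 3.1 and 3.3 and of the α sweep in Section 4.2, the following minimal Python sketch computes the conceptual weight as the sum of products of concept weights, rescales both sub-weights linearly into [0, 1], and mixes them with α as in Equation (2). It is a sketch under stated assumptions, not the CiteSeerx implementation: the function names, the three toy documents, and the choice to map a constant score list to zeros are all hypothetical.

```python
from typing import Dict, List, Sequence, Set


def conceptual_weight(doc_profile: Sequence[float], user_profile: Sequence[float]) -> float:
    """Sum over concepts k of cwt_ik * cwt_jk, as written in Section 3.1."""
    if len(doc_profile) != len(user_profile):
        raise ValueError("profiles must cover the same concepts")
    return sum(d * u for d, u in zip(doc_profile, user_profile))


def min_max(values: Sequence[float]) -> List[float]:
    """Linear scaling into [0, 1]. Assumption: a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(conceptual: Sequence[float], impact: Sequence[float], alpha: float) -> List[float]:
    """gamma = alpha * C' + (1 - alpha) * I' for each candidate document."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    c_norm = min_max(conceptual)
    i_norm = min_max(impact)
    return [alpha * c + (1.0 - alpha) * i for c, i in zip(c_norm, i_norm)]


def top_n(doc_ids: Sequence[str], scores: Sequence[float], n: int = 10) -> List[str]:
    """Document ids ordered by descending score, truncated to n."""
    ranked = sorted(zip(doc_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:n]]


def sweep_alpha(doc_ids: Sequence[str], conceptual: Sequence[float],
                impact: Sequence[float], n: int = 10) -> Dict[float, List[str]]:
    """Top-n list for alpha = 0.0, 0.1, ..., 1.0."""
    results: Dict[float, List[str]] = {}
    for step in range(11):
        alpha = step / 10
        results[alpha] = top_n(doc_ids, hybrid_scores(conceptual, impact, alpha), n)
    return results


def pooled_documents(results: Dict[float, List[str]]) -> Set[str]:
    """Union of the recommendations over all alpha values (duplicates removed)."""
    pooled: Set[str] = set()
    for docs in results.values():
        pooled.update(docs)
    return pooled


if __name__ == "__main__":
    ids = ["A", "B", "C"]            # hypothetical documents
    concept = [0.2, 0.8, 0.5]        # hypothetical conceptual weights
    impact = [40, 10, 25]            # hypothetical impact weights (max author h-index)
    for alpha, docs in sweep_alpha(ids, concept, impact, n=2).items():
        print(alpha, docs)
```

With these toy values, α = 1.0 ranks purely by conceptual match (B first) and α = 0.0 purely by impact (A first), which mirrors the two extremes described in Section 3.3. Pooling the lists over all α values corresponds to the de-duplicated set that subjects were asked to judge.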
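
The impact weight of Section 3.2 can be sketched in a few lines of Python. The example below computes an h-index from a list of per-paper citation counts following Hirsch's definition [7], takes the maximum over a document's authors as in Equation (1), and contrasts it with the summed variant that the text reports as biased towards papers with many authors. The function names and the sample numbers are hypothetical and serve only as an illustration.

```python
from typing import Iterable, List


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def impact_weight(author_h_indices: List[int]) -> int:
    """Equation (1): the h-index of the document's most impactful author."""
    if not author_h_indices:
        raise ValueError("a document needs at least one author")
    return max(author_h_indices)


def summed_impact(author_h_indices: List[int]) -> int:
    """Alternative mentioned in Section 3.2: sum of the authors' h-indices."""
    return sum(author_h_indices)


if __name__ == "__main__":
    # Hypothetical author with five papers and these citation counts.
    print(h_index([10, 8, 5, 4, 3]))

    solo_paper = [20]        # one author with h-index 20
    team_paper = [5] * 6     # six authors with h-index 5 each
    # The maximum favours the solo paper, the sum favours the large team.
    print(impact_weight(solo_paper), impact_weight(team_paper))
    print(summed_impact(solo_paper), summed_impact(team_paper))
```

The last two lines show why the sum was set aside: six modest authors outweigh one highly cited author under summation, whereas taking the maximum is insensitive to the number of authors.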
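
The relative and absolute improvements quoted in Section 4.2 follow from the three reported MAWP values by simple arithmetic. The short Python sketch below recomputes them; the helper name is hypothetical, and because it starts from the rounded figures printed in the paper, the last digit of a relative improvement may differ slightly from the published one.

```python
from typing import Tuple


def improvement(new: float, baseline: float) -> Tuple[float, float]:
    """Return (absolute, relative) improvement of new over baseline."""
    absolute = new - baseline
    relative = absolute / baseline
    return absolute, relative


if __name__ == "__main__":
    hybrid = 0.6355       # MAWP at alpha = 0.9
    conceptual = 0.6083   # MAWP at alpha = 1.0
    impact = 0.2867       # MAWP at alpha = 0.0

    for name, baseline in (("conceptual", conceptual), ("impact-based", impact)):
        absolute, relative = improvement(hybrid, baseline)
        print(f"{name}: {absolute * 100:.2f}% absolute, {relative * 100:.1f}% relative")
```

Absolute improvement here is the difference in MAWP expressed in percentage points, while relative improvement divides that difference by the baseline MAWP.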