Knowledge graph-based weighting strategies for a scholarly
                  paper recommendation scenario
                               Rubén Manrique                                                                   Olga Marino
    Systems and Computing Engineering Department, School                                Systems and Computing Engineering Department, School
           of Engineering. Universidad de los Andes                                            of Engineering. Universidad de los Andes
                      Bogotá, Colombia                                                                    Bogotá, Colombia
                rf.manrique@uniandes.edu.co                                                           olmarino@uniandes.edu.co

ABSTRACT                                                                                 the knowledge of multiple domains, and in general, they specify a
In this paper, we study the effects of different node and edge weight-                   large number of interrelationships between concepts [12].
ing strategies of graph-based semantic representations on the accu-                          In previous works by the authors, a semantic representation that
racy of a scholarly paper recommendation scenario. Our semantic                          exploits the structured knowledge present in KGs for the task of rec-
representation relies on the use of Knowledge Graphs (KGs) for                           ommending scholarly papers was proposed [5–7]. In these works,
acquiring relevant additional information about concepts and their                       the representations of both user profiles and scholarly papers are
semantic relations, thus resulting in a knowledge-rich graph doc-                        based on the concepts mentions found in the paper’s content plus
ument model. Recent studies have used this representation as the                         additional processes that consider the semantic relation of the con-
basis of a scholarly paper recommendation system. Even when the                          cepts found in the KG. The resulting semantic representation is,
recommendation is made based on the comparison of graphs, little                         in essence, a directed weighted graph whose nodes represent con-
has been explored regarding the effects of the weights assigned to                       cepts and whose edges express the existence of a semantic linkage
the edges and nodes in the representation. In this paper, we present                     between two concepts in the KG. The similarity score between a
the initial results obtained from a comparative study of the effects                     user profile and a document is computed via a graph similarity al-
of different weighting strategies on the quality of the recommen-                        gorithm. Compared with standard content-based recommendation
dations. Three weighting strategies for edges (Number of Paths                           systems that look at documents and user profiles as bags of terms
(NP), Semantic Connectivity Score (SCS), and Hierarchical Similar-                       (i.e. keyword based representations), results show the superiority
ity (HS)) and two for nodes (Concept Frequency (CF) and PageRank                         of the semantic representation [5–7].
(PR)) are considered. Results show that the combination of the SCS                           In the proposed semantic representation weights are assigned to
and CF outperform the other weighting strategy combinations and                          both nodes and edges. The weights of the nodes consider the im-
the considered baselines.                                                                portance of the concept in the document, while the edges’ weights
                                                                                         capture the degree of associativity between concepts in the KG. Al-
KEYWORDS                                                                                 though existing graph similarity measures can exploit these weights,
                                                                                         the effect of different weighting strategies in the recommendation
Knowledge-based graph representations, scholarly papers recom-
                                                                                         has not been fully explored. In this paper, we present initial results
mendation, semantics-aware recommender system
                                                                                         of a comparative study on the recommendation performance using
ACM Reference Format:                                                                    different weighting strategies for a graph-based representation. The
Rubén Manrique and Olga Marino. 2018. Knowledge graph-based weighting                    evaluation is done using a scholarly paper recommendation dataset
strategies for a scholarly paper recommendation scenario. In Proceedings of              that contains the user profiles of eleven professors of computer
Knowledge-aware and Conversational Recommender Systems (KaRS) Work-
                                                                                         science [7].
shop 2018 (co-located with RecSys 2018) (KaRS’18). ACM, New York, NY, USA,
4 pages.
                                                                                             The paper is organized as follows: Section 2 reviews some related
                                                                                         work in semantics-aware recommender systems. Section 3 presents
                                                                                         the semantic representation process and its diverse modules. Sec-
1    INTRODUCTION
                                                                                         tions 4 and 5 present the considered weighting functions. Section
In recent years and with the advent of the Semantic Web and                              6 describes the evaluation framework and Section 7 the results.
Linked Open Data (LOD), new semantics-aware content representa-                          Finally, conclusions and directions for future work are discussed in
tions have been proposed for the next generation of recommender                          Section 8.
systems[4, 8]. LOD provides a variety of structured knowledge
bases in RDF format that are freely accessible on the Web and in-                        2   RELATED WORK
terconnected to each other. As a result, there is a large amount of
machine-readable data available that could be exploited to build                         Recent contributions to semantics-aware recommender systems
more intelligent systems [8]. From this huge amount of data, Knowl-                      have focused on exogenous semantic representations that intro-
edge Graphs (KGs) are particularly important since they concentrate                      duce the semantics by linking the recommendation item to a KG
                                                                                         [3, 4, 8, 9]. The ESWC 2014 Challenge [2], for example, worked on
KaRS’18, October 7, 2018, Vancouver, Canada.                                             books that were mapped to their corresponding DBpedia resource.
2018. ACM ISBN Copyright for the individual papers remains with the authors. Copying     In scenarios where there is not a broad enough open knowledge
permitted for private and academic purposes. This volume is published and copyrighted
by its editors..                                                                         base that describes items or where most of the important informa-
                                                                                         tion about the item is encoded via textual content, these approaches
KaRS’18, October 7, 2018, Vancouver, Canada.                                                                       Rubén Manrique and Olga Marino


cannot be easily used. In such cases, an alternative approach is to
build semantic representations by processing textual content and
mapping the concept mentions found to a KG via entity linking
tools in such a way that the item is represented by the multiple
concepts found. Piao et al. explore this approach for personalized
link recommendations on Twitter [13, 14] and more recently Man-
rique et al. for a scholarly paper recommendation task [6, 7]. In
this paper, we are also using this type of semantic representation,
however instead of using a vector representation of concepts we
use a graph-based representation. Therefore, we rely on a graph
similarity strategy to rank candidate items.

3    SEMANTIC REPRESENTATION
To build the semantic representation, we begin with the identifica-
tion of concepts mentions in the text (i.e. annotations). Different
tools for entity linking and word sense disambiguation can be used
for this task. As an example, consider the short input text in Figure 1.
After using an automatic annotation tool a set of URIs correspond-
ing to KG concepts is returned. It is important to mention that no
human verification is performed on the set of annotations retrieved
by the automatic tool. Then, expansion and filtering processes that
consider the semantic relationships found in the KG are applied.
    The expansion process is used to enrich the representation with
concepts that are not explicitly mentioned in the text or not iden-
tified by the annotation tool but are strongly related with anno-
tations. Our previous results show that expanded concepts can
be important to reinforce the main topic of the document even if
they do not occur in the text. The expansion process adds more
discriminative power to the representation. We expand the set of               Figure 1: Semantic representation process description
annotations to new related ones following two different approaches:
category-based and property-based. The category-based expansion
                                                                           the importance of each concept and the strength of the edges are
incorporates the hierarchical information of the concepts. For the
                                                                           evaluated via different weighting functions (see Sections 4 and 5).
“Artificial_Intelligence” concept, for example, the category “Cate-
                                                                           The resulting representation follows Definition 1.
gory:Computational_neuroscience” is retrieved and incorporated
                                                                              Definition 1. The semantic representation G i of a profile/document
into the representation. For property-based expansion, the KG on-
                                                                           pi is a directed weighted graph G i = (Ni , Ei , w(c), w(e)), where
tology is navigated and a set of related concepts is incorporated.
                                                                           both nodes and edges have an associated weight defined by the
For example, from the “Robotics” annotation the concept “Cooper-
                                                                           functions w(c) : N → + and w(e) : E → +. The set of nodes
ation” is retrieved by following the “wikiPageWikiLink” property
                                                                           Ni = {c 1 , c 2 , ..., c k } are entities/concepts belonging to the space of
in the ontology. Only outgoing links from a given annotation are
                                                                           a KG. The node weight w(c) denotes how relevant the node c is for
considered.
                                                                           the profile/document. An edge between two nodes (c a , cb ) repre-
    The filtering strategy seeks to eliminate irrelevant and possible
                                                                           sents the existence of at least one statement in the KG that links
noisy concepts in the representation. The noisy concepts can be the
                                                                           both concepts. The weight of the edge w(e) denotes how strong
result of incorrect annotations, off-topic concepts found in the text
                                                                           this linkage is between the considered concepts.
or added in the expansion step [6]. We found that noisy concepts
tend to be disconnected (i.e. a low number of connections with
                                                                           4    EDGE WEIGHTING FUNCTIONS
other concepts). Property paths1 between every pair of concepts                                                                        Íτ            <l >
                                                                                                                                         l =1 |paths c ,c |
are analyzed and constitute the base information for the graph edge        Number of paths (NP). NP is defined as N P(c i , c j ) =        MP
                                                                                                                                                i j
                                                                                                                                                    ,
conformation. Basically, an edge is created if there is a property                           <l >
                                                                           where |pathsc i ,c j | is the number of paths of length l between con-
path between the given pair of concepts. Then, the filtering strategy      cepts c i and c j in the KG, and τ is the maximum length of path
eliminates concepts with a node degree below or equal to α.                considered. MP is a normalization parameter that is equal to the
    Different property path lengths between concepts can be consid-        maximum number of paths found for a pair of concepts in the
ered in the filtering process, so different semantic representations       representation.
can be produced for the same input text. The result of these pro-             The Semantic Connectivity Score (SCS) [10]. SCS measures
cesses is a graph whose nodes are concepts and edges express the           latent connections between concept pairs and is computed as:
existence of a linkage between two concepts in the KG. Finally,            SCS(c i , c j ) = 1 −            1         . β is a damping factor that
                                                                                                                <l >
                                                                                                     l =1 β
                                                                                                    Íτ    l
                                                                                               1+(            |pathsc i ,c j |)
1 https://www.w3.org/TR/sparql11-property-paths/
                                                                           penalize longer paths (in our case β = 0.5). NP and SCS consider
Knowledge graph-based weighting strategies                                                                          KaRS’18, October 7, 2018, Vancouver, Canada.


the number of paths between the given concepts as indicative of a                        to their Graph Similarity (GS) to the user profile. For GS we employ
strong linkage.                                                                          the edit distance implemented in [5] and defined in Equation 2.
    Hierarchical Similarity (HS). We use the hierarchical informa-
tion of the concepts in the KG to calculate a measure of similarity                                                    GSnodes (G i , G j ) + GS edдes (G i , G j )
between the concepts. The higher the similarity, the stronger the                                   GS(G i , G j ) =                                                                        (2)
link between the concepts. If A is the set of categories for the con-                                                                            Í2
                                                                                                                           α nodes + c ∈Ni ∩N j |w i (c) − w j (c)|
cept c i and B is the set of categories for the concept c j , HS is defined                  GSnodes (G i , G j ) = 1 −
                                                                                                                         α nodes + c ∈Ni ∩N j max(w i (c), w j (c))
                                                                                                                                   Í
as:
                                                                                                                           α edдes + e ∈Ei ∩E j |w i (e) − w j (e)|
                                                                                                                                    Í
                                                                                             GS edдes (G i , G j ) = 1 −
                                                                                                                         α edдes + e ∈Ei ∩E j max(w i (e), w j (e))
                                                                                                                                   Í
                    HS(c i , c j ) =         max             taxsim(cati , cat j ) (1)
                                       (cat i ∈A,cat j ∈B)                                          where α nodes and α edдes are defined as:
                                  δ (root, catlca )                                                Õ           Õ              Õ               Õ
    taxsim =                                                                             α nodes =   w i (c) +     w j (c) −        w i (c) −                                            w j (c)
             δ (cati , catlca ) + δ (cat j , catlca ) + δ (root, catlca )
                                                                                                      c ∈N i               c ∈N j               c ∈N i ∩N j               c ∈N i ∩N j
   Given the hierarchy of categories T , the catlca of two categories
                                                                                                       Õ                   Õ                       Õ                         Õ
                                                                                         α edдes =             w i (e) +            w j (e) −                 w i (e) −                 w j (e)
cati and cat j is the vertex of greatest depth in T that is the common
                                                                                                      e ∈E i               e ∈E j               e ∈E i ∩E j               e ∈E i ∩E j
ancestor of both cati and cat j . δ (cata , catb ) is the number of edges
on the shortest path between cata and catb .                                                GS evaluates the similarity between two graphs in terms of the
                                                                                         weights differences of the common nodes/edges compared to the
5      NODE WEIGHTING FUNCTIONS                                                          total weight of the nodes/edges in the two graphs. Therefore, two
                                                                                         graphs are similar not only if their nodes/edges coincide but also if
Concept frequency (CF) + Discounts. Inspired by TF-IDF, CF                               their weights are close in magnitude.
analyzes the occurrences of the concept in the input content as
well as the frequency in the set of semantic representations. CF is                      6.1      Dataset
defined as: CF (c) = wc f (c) × loд mMc , where wc f (c) represents the
                                                                                         We employ the dataset proposed in [6] that contains the user profiles
number of times that c appears in the input content, M is the total
                                                                                         of 11 professors in the area of computer science. The user profiles
number of documents/profiles in the dataset and mc is the number
                                                                                         were built using the full text of the most recent publications found
of documents/profiles with the concept c in their representation.
                                                                                         on their Google Scholar web pages. At least a minimum of twelve
After the expansion process, the representation could be diverted
                                                                                         of each professor’s most recent publications were used as input for
towards frequent properties in the set of instances of the KG or
                                                                                         the semantic representation process. The candidate set is a subset of
general categories in the hierarchical structure of the KG. For the
                                                                                         Core and Arxiv open corpora that contains 5710 different academic
categories added through the expansion the following discount is
                                             1           1                               documents (i.e. papers, tech reports, thesis, etc.). The ground truth
applied: CFdiscount (cat) = CF (cat) × loд(S   P ) × loд(SC) where SP                    of papers is a subset of the candidate set in which users express
is the set of concepts belonging to the category and SC is the set                       an explicit interest via a web-based search system. In the data
of sub-categories in the category hierarchy. The idea behind this                        set, for each user there are at least 10 relevant documents. As for
discount strategy is that categories that are too broad and generic                      the user profile, the full text was used to construct the semantic
are penalizes [11]. Similarly, for property-based expanded concepts,                     representation of each document in the candidate set.
the following discount is applied: CFdiscount (c) = CF (c) × loд(P 1
                                                                      )                     For the construction of the semantic representation we use: (i)
where P is the number of occurrences of the property in the KG                           DBpedia as KG, (ii) DBpedia Spotlight as annotation service, (iii) a
from which the concept c ∈ C is obtained.                                                maximum path length of 2 for filtering and edge conformation as
    PageRank (PR): PageRank is a well-known node ranking algo-                           well as for the τ parameter, (iv) a minimum degree value α = 1, (v)
rithm. We use the PageRank version for directed weighted graphs                          categories extracted through dct:subject to calculate HS.
(i.e. it considers the edge weights). Therefore, the results obtained
with this centrality measure depends on the resulting link structure                     7     RESULTS
in the semantic graph and the associated edge weight.                                    The performance of the recommender system was evaluated by
                                                                                         typical metrics for the evaluation of Top-N recommender tasks:
6      EXPERIMENTAL SETUP                                                                MRR (Mean Reciprocal Rank), MAP@10 (Mean Average Precision),
Our main goal is to analyze the influence of the different weighting                     and NDCG@10 (Normalized Discounted Cumulative Gain). We
strategies in the context of scholarly paper recommendation. We                          select N=10 as the recommendation objective since it is a common
compare the quality achieved by the same recommendation algo-                            rank and it fits with the minimum number of relevant documents
rithm when inputting semantic representations for user profiles and                      per user in the dataset.
documents using the different weighting strategies. In this regard,                         Table 1 presents the results obtained by the different combina-
we embrace the content-based algorithm described in Definition 2.                        tions of the weighting strategies. We use a classical content-based
   Definition 2. Recommendation Algorithm: given a user profile u                        recommendation algorithm baseline that ranks candidate items
and a set of candidate scholarly papers SP = {p1 , ..., pn }, which are                  according to their cosine similarity with the user profile. In this
represented using the graph-based representation in Definition 1,                        case, profiles and documents use a standard bag-of-words Vector
the recommendation algorithm ranks the candidate items according                         Space Model (VSM) representation. According to Beel et al. VSM
KaRS’18, October 7, 2018, Vancouver, Canada.                                                                     Rubén Manrique and Olga Marino

Table 1: Performance of scholarly paper recommendation                   to test this assumption by favoring/disfavoring the contribution of
using different weighting strategies for nodes and edges. (*)            nodes and edges.
indicates the improvement over baselines is statistically sig-
nificant (p < 0.05).                                                     ACKNOWLEDGMENTS
                                                                         This work was partially supported by COLCIENCIAS PhD scholar-
    Edge Weight Node Weight           MRR        MAP      NDCG           ship (Call 647-2014).
        NP              CF            0.503     0.5*      0.661
        NP              PR            0.523*    0.401     0.482          REFERENCES
        SCS             CF            0.482     0.523*    0.716*          [1] Joeran Beel, Bela Gipp, Stefan Langer, and Corinna Breitinger. 2016. Research-
        SCS             PR            0.574*    0.502*    0.672               paper recommender systems: a literature survey. International Journal on Digital
                                                                              Libraries 17, 4 (2016), 305–338. https://doi.org/10.1007/s00799-015-0156-0
        HS              CF            0.421     0.395     0.495           [2] Tommaso Di Noia, Iván Cantador, and Vito Claudio Ostuni. 2014. Linked Open
        HS              PR            0.442     0.342     0.434               Data-Enabled Recommender Systems: ESWC 2014 Challenge on Book Recom-
                                                                              mendation. In Semantic Web Evaluation Challenge, Valentina Presutti, Milan
          Cosine Baseline             0.401     0.345     0.423               Stankovic, Erik Cambria, Iván Cantador, Angelo Di Iorio, Tommaso Di Noia,
           VEO Baseline               0.442     0.427     0.601               Christoph Lange, Diego Reforgiato Recupero, and Anna Tordai (Eds.). Springer
                                                                              International Publishing, Cham, 129–143.
                                                                          [3] Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, and
                                                                              Markus Zanker. 2012. Linked Open Data to Support Content-based Recommender
                                                                              Systems. In Proceedings of the 8th International Conference on Semantic Systems
with TF-IDF is the most frequent profiling and weighting scheme in            (I-SEMANTICS ’12). ACM, New York, NY, USA, 1–8. https://doi.org/10.1145/
research paper recommender systems [1]. As a baseline, we also use            2362499.2362501
VEO (Vertex Edge Overlap [5]). VEO uses an unweighted version             [4] Tommaso Di Noia and Vito Claudio Ostuni. 2015. Recommender Systems and
                                                                              Linked Open Data. Springer International Publishing, Cham, 88–113. https:
of the semantic representation (Definition 1). The candidate items,           //doi.org/10.1007/978-3-319-21768-0_4
in this case, are ranked based on the number of common nodes and          [5] Rubén Manrique, Felipe Cueto, and Olga Mariño. 2018. Comparing Graph Sim-
                                                                              ilarity Measures for Semantic Representations of Documents. In Advances in
edges they have with the user’s profile. VEO was chosen to evaluate           Computing. Springer International Publishing, Cham, 3–16.
whether the consideration of weights in the representation has a          [6] Rubén Manrique, Omar Herazo, and Olga Mariño. 2017. Exploring the Use of
positive impact on the recommendation.                                        Linked Open Data for User Research Interest Modeling. In Advances in Computing,
                                                                              Andrés Solano and Hugo Ordoñez (Eds.). Springer International Publishing, Cham,
   We can see that the combination of SCS-CF outperforms other                3–16.
weighting strategies and the baselines in terms of MAP and NDCG.          [7] Rubén Manrique and Olga Mariño. 2017. How Does the Size of a Document Affect
In terms of MRR, the best weighting strategy is SCS-PR. Although              Linked Open Data User Modeling Strategies?. In Proceedings of the International
                                                                              Conference on Web Intelligence (WI ’17). ACM, New York, NY, USA, 1246–1252.
SCS and NP consider the number of existing paths, the results show            https://doi.org/10.1145/3106426.3109440
a better performance for SCS. This can be attributed to the fact          [8] Cataldo Musto, Pierpaolo Basile, Pasquale Lops, Marco de Gemmis, and Giovanni
                                                                              Semeraro. 2017. Introducing linked open data in graph-based recommender
that in addition to the number of paths, SCS also penalizes the               systems. Information Processing & Management 53, 2 (2017), 405 – 435. https:
path length through a damping factor. Further experimentation is              //doi.org/10.1016/j.ipm.2016.12.003
required to evaluate the effect of the β damping factor. HS presents      [9] Cataldo Musto, Pasquale Lops, Pierpaolo Basile, Marco de Gemmis, and Giovanni
                                                                              Semeraro. 2016. Semantics-aware Graph-based Recommender Systems Exploit-
the worst results among the weighting strategies and only slightly            ing Linked Open Data. In Proceedings of the 2016 Conference on User Modeling
better than the cosine baseline.                                              Adaptation and Personalization (UMAP ’16). ACM, New York, NY, USA, 229–237.
   Regarding the node weighting strategies, results show that CF              https://doi.org/10.1145/2930238.2930249
                                                                         [10] Bernardo Pereira Nunes, Besnik Fetahu, Ricardo Kawase, Stefan Dietze, Marco An-
is more appropriate. CF is superior to PR in terms of MAP and                 tonio Casanova, and Diana Maynard. 2015. Interlinking Documents Based on
NDCG. On the other hand, according to the MRR, in average PR                  Semantic Graphs with an Application. Springer International Publishing, Cham,
                                                                              139–155.
ranks higher the first relevant result. The comparison with VEO          [11] Fabrizio Orlandi, John Breslin, and Alexandre Passant. 2012. Aggregated, Inter-
also shows that the consideration of weights for nodes and edges              operable and Multi-domain User Profiles for the Social Web. In Proceedings of the
improves the semantic representation proposed for the task of                 8th International Conference on Semantic Systems (I-SEMANTICS ’12). ACM, New
                                                                              York, NY, USA, 41–48. https://doi.org/10.1145/2362499.2362506
recommending scholarly papers.                                           [12] Heiko Paulheim. 2017. Knowledge graph refinement: A survey of approaches
                                                                              and evaluation methods. Semantic Web 8, 3 (2017), 489–508. https://doi.org/10.
                                                                              3233/SW-160218
8    CONCLUSIONS AND FUTURE WORK                                         [13] Guangyuan Piao and John G. Breslin. 2016. Analyzing Aggregated Semantics-
In this paper, we explored different node and edge weighting strate-          enabled User Modeling on Google+ and Twitter for Personalized Link Rec-
                                                                              ommendations. In Proceedings of the 2016 Conference on User Modeling Adap-
gies for a graph-based semantic model for representing user profiles          tation and Personalization (UMAP ’16). ACM, New York, NY, USA, 105–109.
and papers in the context of the scholarly paper recommendation               https://doi.org/10.1145/2930238.2930278
task. The combination of SCS and CF as weighting strategies for          [14] Guangyuan Piao and John G. Breslin. 2016. Exploring Dynamics and Semantics
                                                                              of User Interests for User Modeling on Twitter for Link Recommendations. In
edges and nodes respectively presented the best results. SCS con-             Proceedings of the 12th International Conference on Semantic Systems (SEMANTiCS
siders the number of existing paths in the KG between the concepts            2016). ACM, New York, NY, USA, 81–88. https://doi.org/10.1145/2993318.2993332
considered, while CF considers the frequency of the concept in
the document/profile. In the near future, we plan to explore other
measures of centrality for nodes weighting (e.g., betweenness, close-
ness). We also want to explore the effect of path lengths greater than
2 for the calculation of SCS/NP. Finally, according to our recommen-
dation algorithm and graph similarity measure (Equation 2), the
contribution of edges and nodes are considered equivalent. We want