Knowledge graph-based weighting strategies for a scholarly paper recommendation scenario Rubén Manrique Olga Marino Systems and Computing Engineering Department, School Systems and Computing Engineering Department, School of Engineering. Universidad de los Andes of Engineering. Universidad de los Andes Bogotá, Colombia Bogotá, Colombia rf.manrique@uniandes.edu.co olmarino@uniandes.edu.co ABSTRACT the knowledge of multiple domains, and in general, they specify a In this paper, we study the effects of different node and edge weight- large number of interrelationships between concepts [12]. ing strategies of graph-based semantic representations on the accu- In previous works by the authors, a semantic representation that racy of a scholarly paper recommendation scenario. Our semantic exploits the structured knowledge present in KGs for the task of rec- representation relies on the use of Knowledge Graphs (KGs) for ommending scholarly papers was proposed [5–7]. In these works, acquiring relevant additional information about concepts and their the representations of both user profiles and scholarly papers are semantic relations, thus resulting in a knowledge-rich graph doc- based on the concepts mentions found in the paper’s content plus ument model. Recent studies have used this representation as the additional processes that consider the semantic relation of the con- basis of a scholarly paper recommendation system. Even when the cepts found in the KG. The resulting semantic representation is, recommendation is made based on the comparison of graphs, little in essence, a directed weighted graph whose nodes represent con- has been explored regarding the effects of the weights assigned to cepts and whose edges express the existence of a semantic linkage the edges and nodes in the representation. In this paper, we present between two concepts in the KG. The similarity score between a the initial results obtained from a comparative study of the effects user profile and a document is computed via a graph similarity al- of different weighting strategies on the quality of the recommen- gorithm. Compared with standard content-based recommendation dations. Three weighting strategies for edges (Number of Paths systems that look at documents and user profiles as bags of terms (NP), Semantic Connectivity Score (SCS), and Hierarchical Similar- (i.e. keyword based representations), results show the superiority ity (HS)) and two for nodes (Concept Frequency (CF) and PageRank of the semantic representation [5–7]. (PR)) are considered. Results show that the combination of the SCS In the proposed semantic representation weights are assigned to and CF outperform the other weighting strategy combinations and both nodes and edges. The weights of the nodes consider the im- the considered baselines. portance of the concept in the document, while the edges’ weights capture the degree of associativity between concepts in the KG. Al- KEYWORDS though existing graph similarity measures can exploit these weights, the effect of different weighting strategies in the recommendation Knowledge-based graph representations, scholarly papers recom- has not been fully explored. In this paper, we present initial results mendation, semantics-aware recommender system of a comparative study on the recommendation performance using ACM Reference Format: different weighting strategies for a graph-based representation. The Rubén Manrique and Olga Marino. 2018. Knowledge graph-based weighting evaluation is done using a scholarly paper recommendation dataset strategies for a scholarly paper recommendation scenario. In Proceedings of that contains the user profiles of eleven professors of computer Knowledge-aware and Conversational Recommender Systems (KaRS) Work- science [7]. shop 2018 (co-located with RecSys 2018) (KaRS’18). ACM, New York, NY, USA, 4 pages. The paper is organized as follows: Section 2 reviews some related work in semantics-aware recommender systems. Section 3 presents the semantic representation process and its diverse modules. Sec- 1 INTRODUCTION tions 4 and 5 present the considered weighting functions. Section In recent years and with the advent of the Semantic Web and 6 describes the evaluation framework and Section 7 the results. Linked Open Data (LOD), new semantics-aware content representa- Finally, conclusions and directions for future work are discussed in tions have been proposed for the next generation of recommender Section 8. systems[4, 8]. LOD provides a variety of structured knowledge bases in RDF format that are freely accessible on the Web and in- 2 RELATED WORK terconnected to each other. As a result, there is a large amount of machine-readable data available that could be exploited to build Recent contributions to semantics-aware recommender systems more intelligent systems [8]. From this huge amount of data, Knowl- have focused on exogenous semantic representations that intro- edge Graphs (KGs) are particularly important since they concentrate duce the semantics by linking the recommendation item to a KG [3, 4, 8, 9]. The ESWC 2014 Challenge [2], for example, worked on KaRS’18, October 7, 2018, Vancouver, Canada. books that were mapped to their corresponding DBpedia resource. 2018. ACM ISBN Copyright for the individual papers remains with the authors. Copying In scenarios where there is not a broad enough open knowledge permitted for private and academic purposes. This volume is published and copyrighted by its editors.. base that describes items or where most of the important informa- tion about the item is encoded via textual content, these approaches KaRS’18, October 7, 2018, Vancouver, Canada. Rubén Manrique and Olga Marino cannot be easily used. In such cases, an alternative approach is to build semantic representations by processing textual content and mapping the concept mentions found to a KG via entity linking tools in such a way that the item is represented by the multiple concepts found. Piao et al. explore this approach for personalized link recommendations on Twitter [13, 14] and more recently Man- rique et al. for a scholarly paper recommendation task [6, 7]. In this paper, we are also using this type of semantic representation, however instead of using a vector representation of concepts we use a graph-based representation. Therefore, we rely on a graph similarity strategy to rank candidate items. 3 SEMANTIC REPRESENTATION To build the semantic representation, we begin with the identifica- tion of concepts mentions in the text (i.e. annotations). Different tools for entity linking and word sense disambiguation can be used for this task. As an example, consider the short input text in Figure 1. After using an automatic annotation tool a set of URIs correspond- ing to KG concepts is returned. It is important to mention that no human verification is performed on the set of annotations retrieved by the automatic tool. Then, expansion and filtering processes that consider the semantic relationships found in the KG are applied. The expansion process is used to enrich the representation with concepts that are not explicitly mentioned in the text or not iden- tified by the annotation tool but are strongly related with anno- tations. Our previous results show that expanded concepts can be important to reinforce the main topic of the document even if they do not occur in the text. The expansion process adds more discriminative power to the representation. We expand the set of Figure 1: Semantic representation process description annotations to new related ones following two different approaches: category-based and property-based. The category-based expansion the importance of each concept and the strength of the edges are incorporates the hierarchical information of the concepts. For the evaluated via different weighting functions (see Sections 4 and 5). “Artificial_Intelligence” concept, for example, the category “Cate- The resulting representation follows Definition 1. gory:Computational_neuroscience” is retrieved and incorporated Definition 1. The semantic representation G i of a profile/document into the representation. For property-based expansion, the KG on- pi is a directed weighted graph G i = (Ni , Ei , w(c), w(e)), where tology is navigated and a set of related concepts is incorporated. both nodes and edges have an associated weight defined by the For example, from the “Robotics” annotation the concept “Cooper- functions w(c) : N → ’+ and w(e) : E → ’+. The set of nodes ation” is retrieved by following the “wikiPageWikiLink” property Ni = {c 1 , c 2 , ..., c k } are entities/concepts belonging to the space of in the ontology. Only outgoing links from a given annotation are a KG. The node weight w(c) denotes how relevant the node c is for considered. the profile/document. An edge between two nodes (c a , cb ) repre- The filtering strategy seeks to eliminate irrelevant and possible sents the existence of at least one statement in the KG that links noisy concepts in the representation. The noisy concepts can be the both concepts. The weight of the edge w(e) denotes how strong result of incorrect annotations, off-topic concepts found in the text this linkage is between the considered concepts. or added in the expansion step [6]. We found that noisy concepts tend to be disconnected (i.e. a low number of connections with 4 EDGE WEIGHTING FUNCTIONS other concepts). Property paths1 between every pair of concepts Íτ l =1 |paths c ,c | are analyzed and constitute the base information for the graph edge Number of paths (NP). NP is defined as N P(c i , c j ) = MP i j , conformation. Basically, an edge is created if there is a property where |pathsc i ,c j | is the number of paths of length l between con- path between the given pair of concepts. Then, the filtering strategy cepts c i and c j in the KG, and τ is the maximum length of path eliminates concepts with a node degree below or equal to α. considered. MP is a normalization parameter that is equal to the Different property path lengths between concepts can be consid- maximum number of paths found for a pair of concepts in the ered in the filtering process, so different semantic representations representation. can be produced for the same input text. The result of these pro- The Semantic Connectivity Score (SCS) [10]. SCS measures cesses is a graph whose nodes are concepts and edges express the latent connections between concept pairs and is computed as: existence of a linkage between two concepts in the KG. Finally, SCS(c i , c j ) = 1 − 1 . β is a damping factor that l =1 β Íτ l 1+( |pathsc i ,c j |) 1 https://www.w3.org/TR/sparql11-property-paths/ penalize longer paths (in our case β = 0.5). NP and SCS consider Knowledge graph-based weighting strategies KaRS’18, October 7, 2018, Vancouver, Canada. the number of paths between the given concepts as indicative of a to their Graph Similarity (GS) to the user profile. For GS we employ strong linkage. the edit distance implemented in [5] and defined in Equation 2. Hierarchical Similarity (HS). We use the hierarchical informa- tion of the concepts in the KG to calculate a measure of similarity GSnodes (G i , G j ) + GS edдes (G i , G j ) between the concepts. The higher the similarity, the stronger the GS(G i , G j ) = (2) link between the concepts. If A is the set of categories for the con- Í2 α nodes + c ∈Ni ∩N j |w i (c) − w j (c)| cept c i and B is the set of categories for the concept c j , HS is defined GSnodes (G i , G j ) = 1 − α nodes + c ∈Ni ∩N j max(w i (c), w j (c)) Í as: α edдes + e ∈Ei ∩E j |w i (e) − w j (e)| Í GS edдes (G i , G j ) = 1 − α edдes + e ∈Ei ∩E j max(w i (e), w j (e)) Í HS(c i , c j ) = max taxsim(cati , cat j ) (1) (cat i ∈A,cat j ∈B) where α nodes and α edдes are defined as: δ (root, catlca ) Õ Õ Õ Õ taxsim = α nodes = w i (c) + w j (c) − w i (c) − w j (c) δ (cati , catlca ) + δ (cat j , catlca ) + δ (root, catlca ) c ∈N i c ∈N j c ∈N i ∩N j c ∈N i ∩N j Given the hierarchy of categories T , the catlca of two categories Õ Õ Õ Õ α edдes = w i (e) + w j (e) − w i (e) − w j (e) cati and cat j is the vertex of greatest depth in T that is the common e ∈E i e ∈E j e ∈E i ∩E j e ∈E i ∩E j ancestor of both cati and cat j . δ (cata , catb ) is the number of edges on the shortest path between cata and catb . GS evaluates the similarity between two graphs in terms of the weights differences of the common nodes/edges compared to the 5 NODE WEIGHTING FUNCTIONS total weight of the nodes/edges in the two graphs. Therefore, two graphs are similar not only if their nodes/edges coincide but also if Concept frequency (CF) + Discounts. Inspired by TF-IDF, CF their weights are close in magnitude. analyzes the occurrences of the concept in the input content as well as the frequency in the set of semantic representations. CF is 6.1 Dataset defined as: CF (c) = wc f (c) × loд mMc , where wc f (c) represents the We employ the dataset proposed in [6] that contains the user profiles number of times that c appears in the input content, M is the total of 11 professors in the area of computer science. The user profiles number of documents/profiles in the dataset and mc is the number were built using the full text of the most recent publications found of documents/profiles with the concept c in their representation. on their Google Scholar web pages. At least a minimum of twelve After the expansion process, the representation could be diverted of each professor’s most recent publications were used as input for towards frequent properties in the set of instances of the KG or the semantic representation process. The candidate set is a subset of general categories in the hierarchical structure of the KG. For the Core and Arxiv open corpora that contains 5710 different academic categories added through the expansion the following discount is 1 1 documents (i.e. papers, tech reports, thesis, etc.). The ground truth applied: CFdiscount (cat) = CF (cat) × loд(S P ) × loд(SC) where SP of papers is a subset of the candidate set in which users express is the set of concepts belonging to the category and SC is the set an explicit interest via a web-based search system. In the data of sub-categories in the category hierarchy. The idea behind this set, for each user there are at least 10 relevant documents. As for discount strategy is that categories that are too broad and generic the user profile, the full text was used to construct the semantic are penalizes [11]. Similarly, for property-based expanded concepts, representation of each document in the candidate set. the following discount is applied: CFdiscount (c) = CF (c) × loд(P 1 ) For the construction of the semantic representation we use: (i) where P is the number of occurrences of the property in the KG DBpedia as KG, (ii) DBpedia Spotlight as annotation service, (iii) a from which the concept c ∈ C is obtained. maximum path length of 2 for filtering and edge conformation as PageRank (PR): PageRank is a well-known node ranking algo- well as for the τ parameter, (iv) a minimum degree value α = 1, (v) rithm. We use the PageRank version for directed weighted graphs categories extracted through dct:subject to calculate HS. (i.e. it considers the edge weights). Therefore, the results obtained with this centrality measure depends on the resulting link structure 7 RESULTS in the semantic graph and the associated edge weight. The performance of the recommender system was evaluated by typical metrics for the evaluation of Top-N recommender tasks: 6 EXPERIMENTAL SETUP MRR (Mean Reciprocal Rank), MAP@10 (Mean Average Precision), Our main goal is to analyze the influence of the different weighting and NDCG@10 (Normalized Discounted Cumulative Gain). We strategies in the context of scholarly paper recommendation. We select N=10 as the recommendation objective since it is a common compare the quality achieved by the same recommendation algo- rank and it fits with the minimum number of relevant documents rithm when inputting semantic representations for user profiles and per user in the dataset. documents using the different weighting strategies. In this regard, Table 1 presents the results obtained by the different combina- we embrace the content-based algorithm described in Definition 2. tions of the weighting strategies. We use a classical content-based Definition 2. Recommendation Algorithm: given a user profile u recommendation algorithm baseline that ranks candidate items and a set of candidate scholarly papers SP = {p1 , ..., pn }, which are according to their cosine similarity with the user profile. In this represented using the graph-based representation in Definition 1, case, profiles and documents use a standard bag-of-words Vector the recommendation algorithm ranks the candidate items according Space Model (VSM) representation. According to Beel et al. VSM KaRS’18, October 7, 2018, Vancouver, Canada. Rubén Manrique and Olga Marino Table 1: Performance of scholarly paper recommendation to test this assumption by favoring/disfavoring the contribution of using different weighting strategies for nodes and edges. (*) nodes and edges. indicates the improvement over baselines is statistically sig- nificant (p < 0.05). ACKNOWLEDGMENTS This work was partially supported by COLCIENCIAS PhD scholar- Edge Weight Node Weight MRR MAP NDCG ship (Call 647-2014). NP CF 0.503 0.5* 0.661 NP PR 0.523* 0.401 0.482 REFERENCES SCS CF 0.482 0.523* 0.716* [1] Joeran Beel, Bela Gipp, Stefan Langer, and Corinna Breitinger. 2016. Research- SCS PR 0.574* 0.502* 0.672 paper recommender systems: a literature survey. International Journal on Digital Libraries 17, 4 (2016), 305–338. https://doi.org/10.1007/s00799-015-0156-0 HS CF 0.421 0.395 0.495 [2] Tommaso Di Noia, Iván Cantador, and Vito Claudio Ostuni. 2014. Linked Open HS PR 0.442 0.342 0.434 Data-Enabled Recommender Systems: ESWC 2014 Challenge on Book Recom- mendation. In Semantic Web Evaluation Challenge, Valentina Presutti, Milan Cosine Baseline 0.401 0.345 0.423 Stankovic, Erik Cambria, Iván Cantador, Angelo Di Iorio, Tommaso Di Noia, VEO Baseline 0.442 0.427 0.601 Christoph Lange, Diego Reforgiato Recupero, and Anna Tordai (Eds.). Springer International Publishing, Cham, 129–143. [3] Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, and Markus Zanker. 2012. Linked Open Data to Support Content-based Recommender Systems. In Proceedings of the 8th International Conference on Semantic Systems with TF-IDF is the most frequent profiling and weighting scheme in (I-SEMANTICS ’12). ACM, New York, NY, USA, 1–8. https://doi.org/10.1145/ research paper recommender systems [1]. As a baseline, we also use 2362499.2362501 VEO (Vertex Edge Overlap [5]). VEO uses an unweighted version [4] Tommaso Di Noia and Vito Claudio Ostuni. 2015. Recommender Systems and Linked Open Data. Springer International Publishing, Cham, 88–113. https: of the semantic representation (Definition 1). The candidate items, //doi.org/10.1007/978-3-319-21768-0_4 in this case, are ranked based on the number of common nodes and [5] Rubén Manrique, Felipe Cueto, and Olga Mariño. 2018. Comparing Graph Sim- ilarity Measures for Semantic Representations of Documents. In Advances in edges they have with the user’s profile. VEO was chosen to evaluate Computing. Springer International Publishing, Cham, 3–16. whether the consideration of weights in the representation has a [6] Rubén Manrique, Omar Herazo, and Olga Mariño. 2017. Exploring the Use of positive impact on the recommendation. Linked Open Data for User Research Interest Modeling. In Advances in Computing, Andrés Solano and Hugo Ordoñez (Eds.). Springer International Publishing, Cham, We can see that the combination of SCS-CF outperforms other 3–16. weighting strategies and the baselines in terms of MAP and NDCG. [7] Rubén Manrique and Olga Mariño. 2017. How Does the Size of a Document Affect In terms of MRR, the best weighting strategy is SCS-PR. Although Linked Open Data User Modeling Strategies?. In Proceedings of the International Conference on Web Intelligence (WI ’17). ACM, New York, NY, USA, 1246–1252. SCS and NP consider the number of existing paths, the results show https://doi.org/10.1145/3106426.3109440 a better performance for SCS. This can be attributed to the fact [8] Cataldo Musto, Pierpaolo Basile, Pasquale Lops, Marco de Gemmis, and Giovanni Semeraro. 2017. Introducing linked open data in graph-based recommender that in addition to the number of paths, SCS also penalizes the systems. Information Processing & Management 53, 2 (2017), 405 – 435. https: path length through a damping factor. Further experimentation is //doi.org/10.1016/j.ipm.2016.12.003 required to evaluate the effect of the β damping factor. HS presents [9] Cataldo Musto, Pasquale Lops, Pierpaolo Basile, Marco de Gemmis, and Giovanni Semeraro. 2016. Semantics-aware Graph-based Recommender Systems Exploit- the worst results among the weighting strategies and only slightly ing Linked Open Data. In Proceedings of the 2016 Conference on User Modeling better than the cosine baseline. Adaptation and Personalization (UMAP ’16). ACM, New York, NY, USA, 229–237. Regarding the node weighting strategies, results show that CF https://doi.org/10.1145/2930238.2930249 [10] Bernardo Pereira Nunes, Besnik Fetahu, Ricardo Kawase, Stefan Dietze, Marco An- is more appropriate. CF is superior to PR in terms of MAP and tonio Casanova, and Diana Maynard. 2015. Interlinking Documents Based on NDCG. On the other hand, according to the MRR, in average PR Semantic Graphs with an Application. Springer International Publishing, Cham, 139–155. ranks higher the first relevant result. The comparison with VEO [11] Fabrizio Orlandi, John Breslin, and Alexandre Passant. 2012. Aggregated, Inter- also shows that the consideration of weights for nodes and edges operable and Multi-domain User Profiles for the Social Web. In Proceedings of the improves the semantic representation proposed for the task of 8th International Conference on Semantic Systems (I-SEMANTICS ’12). ACM, New York, NY, USA, 41–48. https://doi.org/10.1145/2362499.2362506 recommending scholarly papers. [12] Heiko Paulheim. 2017. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8, 3 (2017), 489–508. https://doi.org/10. 3233/SW-160218 8 CONCLUSIONS AND FUTURE WORK [13] Guangyuan Piao and John G. Breslin. 2016. Analyzing Aggregated Semantics- In this paper, we explored different node and edge weighting strate- enabled User Modeling on Google+ and Twitter for Personalized Link Rec- ommendations. In Proceedings of the 2016 Conference on User Modeling Adap- gies for a graph-based semantic model for representing user profiles tation and Personalization (UMAP ’16). ACM, New York, NY, USA, 105–109. and papers in the context of the scholarly paper recommendation https://doi.org/10.1145/2930238.2930278 task. The combination of SCS and CF as weighting strategies for [14] Guangyuan Piao and John G. Breslin. 2016. Exploring Dynamics and Semantics of User Interests for User Modeling on Twitter for Link Recommendations. In edges and nodes respectively presented the best results. SCS con- Proceedings of the 12th International Conference on Semantic Systems (SEMANTiCS siders the number of existing paths in the KG between the concepts 2016). ACM, New York, NY, USA, 81–88. https://doi.org/10.1145/2993318.2993332 considered, while CF considers the frequency of the concept in the document/profile. In the near future, we plan to explore other measures of centrality for nodes weighting (e.g., betweenness, close- ness). We also want to explore the effect of path lengths greater than 2 for the calculation of SCS/NP. Finally, according to our recommen- dation algorithm and graph similarity measure (Equation 2), the contribution of edges and nodes are considered equivalent. We want