=Paper=
{{Paper
|id=Vol-1630/paper2
|storemode=property
|title=A Recommendation Engine Based On Social Metrics
|pdfUrl=https://ceur-ws.org/Vol-1630/paper2.pdf
|volume=Vol-1630
|authors=Ofelia Cervantes,Francisco Gutierrez,Ernesto Gutierrez,J. Alfredo Sanchez,Muhammad Rizwan,Wan Wanggen
|dblpUrl=https://dblp.org/rec/conf/semweb/CervantesGGSRW15
}}
==A Recommendation Engine Based On Social Metrics==
A Recommendation Engine based on Social Metrics

Ofelia Cervantes1, Francisco Gutiérrez1, Ernesto Gutiérrez1, J. Alfredo Sánchez1, Muhammad Rizwan2, and Wan Wanggen2

1 Universidad de las Américas Puebla, Puebla, México
2 Institute of Smart City, Shanghai University, Shanghai, China

ofelia.cervantes@udlap.mx, francisco.gutierrez@udlap.mx, ernesto.gutierrezca@udlap.mx, alfredo.sanchez@udlap.mx, rizwan@shu.edu.cn, wanwg@staff.shu.edu.cn

Abstract. Recommender systems have changed the way people find products, points of interest, services or even new friends. The technology behind recommender systems has evolved to incorporate user preferences and social influences. In this paper, we present a first approach to developing a recommendation engine based on social metrics applied to graphs that represent objects' characteristics, user profiles and influences obtained from social connections. The engine exploits graph centrality measures to generate personalized recommendations from the semantic knowledge represented in the graph. We present the graph model and the graph algorithms selected to calculate the centralities that are the core of the recommender system. Semantic concepts such as semantic predominance and similarity measures are adapted to the graph model. Implementation challenges faced to solve performance issues are also discussed.

Keywords: recommender systems, social metrics, graph models, massive graph processing

1 Introduction

The widespread availability of products and services through web-based and mobile applications makes it difficult for end users to select the right item according to their preferences. A recommender system must make use of different sources of information to provide users with suggestions of items that better correspond to their expectations. Those sources can include user preferences, item descriptions or social information.

Several approaches have been proposed to tackle the problem of automatically selecting the items that really contribute to satisfying the needs of end users. Approaches based on demographics or on modeling user profiles exploit user features and preferences to filter the available choices. The main challenge of these approaches is creating the user profile from scratch. Some systems invite users to select their preferences from a predefined list of categories or allow the system to extract their profiles from other applications. Other systems register every user action to dynamically build models based upon the user's behavior. Some research works focus on collecting ratings made by users that have evaluated the product or service, providing a first-hand perception of the quality of the evaluated item. Other approaches have focused on describing the main features of every item and trying to match them with user requirements.

Modeling social networks using graphs has opened opportunities for exploring new alternatives for implementing recommender systems. Social metrics, such as flow centralities calculated on graph-based models, provide interesting measures to represent the semantic predominance of concepts featuring users' preferences as well as item characteristics. A first proof of concept was accomplished and reported in [6]. A more detailed description of the model is presented here, as well as the implementation problems that were solved to provide a complete recommendation engine that can be integrated into recommendation applications.
The remainder of the paper is structured as follows: Section 2 presents related work. Then, Section 3 introduces the proposed graph-based recommendation model. Section 4 discusses selected graph algorithms used to calculate social metrics, particularly flow centralities. Next, Section 5 describes the framework developed in the prototype and the performance challenges we had to overcome. Finally, in Section 6 we report the results we have obtained thus far and discuss future work.

2 Related Work

Traditionally, recommender systems are classified into the following categories: demographic, content-based, collaborative filtering, social-based, context-aware and hybrid approaches. However, the ever increasing amount of information available in social media enables the development of knowledge-based recommender systems [4]. The rich knowledge that has been accumulated in social media can be exploited to improve recommendation outcomes, enhance user experience and develop new algorithms. In addition to this classification of recommender systems, Zanker [24] illustrates the fundamental building blocks of a recommender system, out of which we highlight three: the user model, the community (social network), and the recommendation algorithm. Accordingly, we present a brief overview of related work on knowledge-based recommender systems from a building-blocks perspective.

2.1 User Modeling

In order to personalize recommendations, it is necessary to know information about each user. User models are representations of users' needs, goals, preferences, interests, and behaviors, along with users' demographic characteristics [19]. Several user modeling approaches have been proposed, from typical weighted vectors to domain ontologies. In [1] the authors defined a user model based on fuzzy logic and proposed an approach to infer the degree of genre presence in a movie by exploiting the tags assigned by the users. In [24] the authors presented a simple attribute-value pair dictionary to model the user through the explicit elicitation of user requirements. A richer user model is presented in [10], where the authors used a machine learning process to capture the user profile and context into a domain ontology.

Our work tries to strike a balance between simple [24] and complex models [10], with the goal of having an efficient but still rich user model. Other works, like Cantador et al. and Moahedian et al. [5, 14], are similar to our proposed user model, since we use tags and keywords to build a lax ontology.

2.2 Recommendation Algorithms and Techniques

There is a wide range of recommendation algorithms and techniques. They vary according to data availability and recommender filtering type, as well as user and object representation. Several methods have demonstrated acceptable performance, such as association rule learning [24], Bayesian networks [8], nearest neighbors [2], genetic algorithms [13], neural networks [3], clustering [20], and latent semantic features, among others that are covered in more detail in [4].

In this work, we use graph metrics commonly used in social network analysis [21]. We propose a semantic social network analysis that integrates semantic methods of knowledge engineering and natural language processing with classic social network analysis. The advantages of semantic social network analysis are its knowledge foundation and its non-probabilistic nature. In contrast, one disadvantage is its computational cost.
Therefore, enhancement techniques are needed to process graph metrics more efficiently.

2.3 Information Extracted from Social Networks

With the success of Web 2.0 and the various social network websites, recommender systems are creating unique opportunities to assist people in finding relevant information and making meaningful choices when browsing the web. In [7], the author proposed a novel approach for recommendation systems that uses data collected from social networks.

Wang et al. [22] studied the problem of recommending new venues to users who participate in location-based social networks (LBSNs) and proposed algorithms that create recommendations based on past user behavior (visited places), the location of each venue, the social relationships among the users, and the similarity between users.

Ye et al. [23] exploited the social and geographical characteristics of users and locations/places to research issues in realizing location recommendation services for large-scale location-based social networks. They observed the strong social and geospatial ties among users and their favorite locations/places in the system via the analysis of datasets collected from Foursquare.

Similar to our work, Savage [18] investigated the design of a more complete, ubiquitous location-based recommendation algorithm that is based on a text classification problem. The system learns user preferences by mining a person's social network profile. The author also defined a decision-making model, which considers the learned preferences, physical constraints, and how the individual is currently feeling.

We can state that novel approaches rely mainly on the fusion of information inferred from a user's social network profile and other data sources (e.g. mobile phone sensors). In this sense, it is necessary to develop new strategies that produce recommendations from rich but still incomplete information.

3 Graph Model

Our recommendation engine is based on a graph representation of users and objects of interest linked through concepts (denoted as terms). Figure 1 shows the graph model, where every node falls in one of three categories: User, Term or Object of Interest, and every edge represents the semantic relation between nodes: Predominance, Similarity or Friendship.

Every Term node of the graph in Figure 1 acts as a semantic descriptor of both Users and Objects of Interest. In other words, every user and every object are correspondingly described by the terms linked to them. In general, users are described by their tastes, preferences, and interests (user model), whereas objects are described by tags and keywords (object model). In this manner, when a term is shared between a user and an object, it indicates the possibility that the user could be interested in that particular object, even though the object had never been seen or rated by the user.

A graph-based representation allows us to apply graph algorithms (e.g. social metrics) to discover topological features, key relationships, and important (prestigious) nodes. With these features we can then make relevant recommendations to users, such as suggesting friends or places. Therefore, the foundation of our recommender system relies on a knowledge base constructed from both a user model (see Section 3.3) and an object model (see Section 3.4).

In order to construct the user and object models, we applied a linguistic analysis over user and object text descriptions. Basically, we conducted pre-processing (removal of stopwords and selection of the most descriptive words) and statistical linguistic analysis (using the tf-idf and Okapi BM25 weighting schemes) to define a bond between text descriptions and the semantic relations represented in the graph (see Section 3.2). It is possible to obtain user and object descriptions from social networks (Facebook, Foursquare, Twitter), web pages (Wikipedia, web search results, etc.), human expert contributions or other textual resources.
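As a rough illustration of this kind of pre-processing and statistical weighting, the sketch below derives tf-idf weights for a toy objects text collection using scikit-learn; the sample descriptions and the choice of library are our own assumptions, not part of the original implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy objects text collection (OTC); the descriptions are invented for illustration.
otc = [
    "Cozy cafe serving espresso, cappuccino and homemade cakes",
    "Stadium hosting football matches and rock concerts on weekends",
    "Museum of modern art with guided tours and student workshops",
]

# Stopword removal plus tf-idf weighting in one step; the Okapi BM25 weighting of
# Equation 1 is applied analogously over the same term/document statistics.
vectorizer = TfidfVectorizer(stop_words="english")
weights = vectorizer.fit_transform(otc)

# Highest-weighted candidate descriptor terms for the first object.
terms = vectorizer.get_feature_names_out()
row = weights[0].toarray().ravel()
print(sorted(zip(terms, row), key=lambda t: -t[1])[:5])
```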
3.1 Weighted Graph Definition

Formally, we define a weighted graph G = (V, E, f_E), where V = {v_1, ..., v_n} is a set of vertices (nodes), E = {e_1, ..., e_n} ⊂ {{x, y} | x, y ∈ V} is a set of edges, and f_E : E → R is the function assigning a weight to every edge. In our recommender system, V = U ∪ T ∪ O, where U is the set of users, T is the set of terms, and O is the set of objects of interest. Likewise, E = P ∪ S ∪ F, where P is the set of predominance edges, S the set of similarity edges, and F the set of friendship edges. The function f_E is adapted according to each type of edge. For instance, we can obtain the sub-graph of users as G_U = (U, F, f_F) (see the User layer in Figure 1), the sub-graph of objects as G_O = (O, S, f_S) (see the Objects of Interest layer in Figure 1), and the sub-graph of user and object profiles as G_{U∪T∪O} = (U ∪ T ∪ O, P, f_P).

Fig. 1. Graph model. Node layers: User, Term, and Objects of Interest linked through Friendship, Predominance and Similarity edges.

3.2 Semantic Relations

In order to build the semantic relations of the graph, it is necessary to obtain text descriptions of users and objects. As a result, we have two collections: the users text collection (UTC) and the objects text collection (OTC), where each text description is considered a document D in a vector space model.

We define three types of semantic relations (edges of the graph): predominance, similarity, and friendship. Each semantic relation links different types of nodes and has a different weighting function. Predominance is the edge between a user or an object and a term, similarity is the edge between two objects, and friendship is the edge between two users.

Predominance is the semantic relation between a term and a user or an object. A term acts as a descriptor of users and objects. We define a weighting function over the predominance edges based on linguistic analysis, applying the Okapi BM25 ranking function [17] to each independent document collection (UTC and OTC), as shown in Equation 1. In Equation 1, pred is the predominance of the term T in document D, Idoc is the number of indexed documents (the size of the collection), Tdoc is the number of documents containing term T, TF is the term frequency relative to document D, DL is the document length, avgDL is the average document length over the entire collection, and K and B are free parameters (usually K = 1.2 and B = 0.75).

pred_{(D,T)} = \log_{10}\left(\frac{Idoc - Tdoc + 0.5}{Tdoc + 0.5}\right) \cdot \frac{TF \cdot (K + 1)}{TF + K \cdot \left((1 - B) + B \cdot \frac{DL}{avgDL}\right)}   (1)
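A minimal sketch of how the predominance weight of Equation 1 might be computed; the function and argument names are illustrative assumptions, not taken from the engine's code.

```python
import math

def predominance(tf, doc_len, idoc, tdoc, avg_dl, k=1.2, b=0.75):
    """Okapi BM25 weight of term T in document D (Equation 1).

    tf:      term frequency of T in D        (TF)
    doc_len: length of D                     (DL)
    idoc:    number of indexed documents     (Idoc)
    tdoc:    number of documents containing T (Tdoc)
    avg_dl:  average document length         (avgDL)
    """
    idf = math.log10((idoc - tdoc + 0.5) / (tdoc + 0.5))
    norm = tf + k * ((1 - b) + b * doc_len / avg_dl)
    return idf * (tf * (k + 1)) / norm

# A term appearing 3 times in a 120-word description, within a collection of
# 1000 descriptions (average length 100 words), 40 of which contain the term.
print(predominance(tf=3, doc_len=120, idoc=1000, tdoc=40, avg_dl=100))
```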
Similarity is the semantic relation between two objects. This measure indicates the degree of affinity between objects. We apply the cosine similarity measure (Equation 2) to obtain this value. The similarity is calculated after the predominance, since it relies on shared terms: every object is a vector of predominances, as shown below Equation 2.

Similarity_{(A,B)} = \cos\Theta = \frac{A \cdot B}{\|A\| \, \|B\|}   (2)

Object_A = (pred_{(A,T_1)}, pred_{(A,T_2)}, \ldots, pred_{(A,T_n)})
Object_B = (pred_{(B,T_1)}, pred_{(B,T_2)}, \ldots, pred_{(B,T_n)})

In Equation 2, the similarity between object A and object B is determined by the weights of the terms they have in common. In this manner, a high similarity value indicates a higher semantic correspondence between objects.

Friendship is the semantic relation between two users. This measure indicates the degree of affinity between two users. Our current model does not distinguish between close friends, friends or acquaintances. Therefore, the users' sub-graph is only a friend-of-a-friend (FOAF) node-link network.

3.3 User Model

As part of the graph-based representation, users are defined as sub-graphs. A user model is composed of two sub-graphs: the user profile G_u = (u, P, f_P) and the user FOAF network G_U = (U, F, f_F). Figure 2A shows the user profile network and Figure 2B the user FOAF network. In the user profile sub-graph, each user is linked to a set of terms that indicate tastes, preferences and interests. Tastes are general inclinations of the user towards some entities, and they are generally expressed with actions such as likes (e.g. Foursquare check-ins and Facebook likes: football, beer, steak, coffee, etc.). Preferences are user inclinations towards taste features. Preferences are more fine-grained than tastes and are usually expressed in users' reviews and ratings (e.g. starred reviews: "I like the double espresso", "I don't like diet soda"). Interests are defined as contextual user inclinations or intentions (e.g. "I want to try Chinese food", "I'm going to watch the Minions movie").

f_P = \begin{cases} pred_{(U,T)} & \text{Case A} \\ 1 & \text{Case B} \\ (\#stars - 2) \cdot \frac{1}{3} & \text{Case C} \end{cases}   (3)

Our scheme to weight edges within a user profile is indicated in Equation 3. Case A occurs when only text descriptions are used; this means that terms are weighted according to the Okapi measure (pred, as shown in Equation 1). Case B occurs when explicit likes are found in Foursquare or Facebook. Case C occurs when terms extracted from starred reviews are used to describe a user. In addition to Equation 3, we use a threshold value to limit the number of terms connected with a given user; in fact, we use the first quartile as the threshold value. An example of a user profile is shown in Figure 2A, where it is possible to notice that the user likes football, rock and coffee, and is likely to be a student.

Fig. 2. A) User profile and B) User FOAF network.

In user friendship networks, as mentioned earlier, there are no differences among friendship types. Therefore, in the FOAF network all weights are equal to 1 (f_F = 1). A user FOAF network is shown in Figure 2B.

3.4 Object Model

Objects of interest are sets of items that can be of potential interest to a user. Depending upon the application, objects of interest can be of different grain size. For instance, they can be coarse-grained, such as a point of interest (POI), or fine-grained, such as items inside places. An object is described by the category to which it belongs (e.g. a dog is an animal). It is also described by tags assigned by users or by keywords found in the object's description. The object model consists of two sub-graphs: the object profile G_o = (o, P, f_P) and the object similarity sub-graph G_O = (O, S, f_S). Object profiles were built with data gathered from Foursquare, Wikipedia and results from web searches. The weights of edges that link objects and terms were calculated using the predominance formula shown in Equation 1. This means that the weight function on these edges is f_P = pred_{(O,T)}.
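For illustration, the similarity edge of Equation 2 can be computed from two objects' predominance vectors; representing each profile as a {term: predominance} dictionary is our assumption, not the paper's data structure.

```python
import math

def object_similarity(pred_a, pred_b):
    """Cosine similarity between two objects (Equation 2), each given as a
    predominance vector {term: pred} built with Equation 1."""
    shared = set(pred_a) & set(pred_b)            # only shared terms contribute to A·B
    dot = sum(pred_a[t] * pred_b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in pred_a.values()))
    norm_b = math.sqrt(sum(w * w for w in pred_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Illustrative predominance vectors for two points of interest.
cafe_a = {"espresso": 2.1, "cakes": 1.4, "wifi": 0.8}
cafe_b = {"espresso": 1.8, "tea": 1.1, "wifi": 0.5}
print(object_similarity(cafe_a, cafe_b))   # high value -> candidate similarity edge
```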
3.5 User Global and Local Network

In order to apply social metrics (centrality measures) and relate them to pertinent recommendations, we defined two networks from the user perspective: a user global network (UGN_u) and a user local network (ULN_u). The user global network is the whole graph (all nodes: users U, terms T and objects O, and all edges: similarities S, predominances P and friendships F) centered on the current user. Therefore, UGN_u = (U ∪ T ∪ O, S ∪ P ∪ F) (see Figure 1). The user local network, in turn, is the sub-graph defined by the current user node u, the term nodes adjacent to the user T_u, and the object nodes adjacent to those terms O_T, linked through the predominance edges from the user P_u and from the objects P_O. Hence, ULN_u = (u ∪ T_u ∪ O_T, P_u ∪ P_O) (see Figure 3). It is important to highlight the difference between the user global and local networks, since it leads to different semantic interpretations when calculating centrality measures over them.

4 Centrality Measures and Recommender Engine

Centrality measures have been used extensively in the past to exploit networks and discover the most relevant nodes in a graph. In social network analysis (SNA), graph centralities are used to identify the most relevant persons and communities, and even to detect strange behaviors in the network. However, given the takeoff of social networks, people have increased their interaction not only to meet people and friends but also to search for things they like, give their impressions, and reach points of interest and objects of interest. These spatio-temporal interactions can also be represented in a graph; thus, an accurate user profiling representation can give us a great deal of insight into user behavior. Building on these approaches to exploiting graph measures, we have explored the use of centralities to exploit the topological structure of our user global network, and of flow centralities to get the most out of our weighted global and local networks in terms of a recommender engine.

4.1 Centrality Algorithms

Centrality in graphs is widely used to measure the importance of a node, especially in SNA [9]. Our recommender engine implements these centralities to measure the relevance of people in the social network. Some centrality measures, like closeness and betweenness, are based on the calculation of the shortest distance to reach all other nodes in the graph. Our algorithms to calculate centralities are applied to the network of persons, so we can infer the most popular nodes (degree), the capacity of a node to reach any other node in the network (closeness), and the leaders interconnected within a neighborhood of the graph (betweenness) [15]. Degree centrality is a measure that counts the direct relationships a node has, and thus, the nodes that are in direct contact. Closeness is defined as the inverse sum of the shortest paths to each other node, and betweenness is defined as the number of shortest paths from all vertices to all others that pass through that node.

Fig. 3. User Local Network.

Centrality measures are calculated over the network at a topological level, given a scale-free graph of persons. Thus, these measures do not exploit our weighted graph; they are applied only at the social-network level. Terms and objects of interest can be seen as sub-graphs of the global network that can be exploited by using flow-based centrality measures.
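These three topological centralities can be computed directly on the friendship sub-graph; a small sketch using NetworkX follows, with invented user names standing in for real nodes.

```python
import networkx as nx

# Toy FOAF sub-graph of users; every friendship edge has an implicit weight of 1 (Section 3.3).
g_users = nx.Graph()
g_users.add_edges_from([
    ("ana", "beto"), ("beto", "carla"), ("carla", "dani"),
    ("beto", "dani"), ("dani", "eli"),
])

degree = nx.degree_centrality(g_users)             # most popular users
closeness = nx.closeness_centrality(g_users)       # users that reach everyone quickly
betweenness = nx.betweenness_centrality(g_users)   # brokers between neighborhoods

print(sorted(betweenness.items(), key=lambda kv: -kv[1]))
```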
4.2 Flow-Centrality Algorithms

We use flow centralities [15] to measure the betweenness, closeness, and eccentricity among the objects of interest, terms, and people profiles. Flow centralities allow us to exploit the semantic relationships between the user and the profiles of the objects of interest. Flow centralities reveal the most relevant nodes in the graph given their weights; for instance, given a set of terms associated with a user profile, we can better understand the user's preferences and give a better recommendation.

Flow Betweenness. In SNA, betweenness is one of the most commonly referenced centralities. Flow betweenness (see Equation 4) is defined in [11] as the maximum flow that passes through node x_i, divided by the total flow between all pairs of points where x_i is neither a source nor a sink. A node with a high flow betweenness centrality has a large influence in the network because of the flow that passes through it. Due to the relevance of a node with high betweenness, in our recommender model a node with high betweenness should be recommended as one of the things the user cannot miss (see Figure 4).

C_{FB}(x_i) = \frac{\sum_{j<k} m_{jk}(x_i)}{\sum_{j<k} m_{jk}}   (4)

where m_{jk} is the maximum flow between nodes x_j and x_k, and m_{jk}(x_i) is the part of that flow that passes through x_i.
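As a sketch only, the normalized flow betweenness of Equation 4 could be computed with NetworkX's maximum-flow routine as shown below; NetworkX has no built-in max-flow betweenness, the "weight" edge attribute and the toy graph are assumptions, and the all-pairs computation is far too expensive for large graphs, which is precisely the kind of performance issue addressed later in the paper.

```python
import networkx as nx
from itertools import combinations

def flow_betweenness(graph, node, capacity="weight"):
    """Normalized max-flow betweenness of `node` (Equation 4): the flow between
    every other pair that is forced through `node`, over the total pairwise flow."""
    digraph = graph.to_directed()      # NetworkX max-flow routines expect a directed graph
    through, total = 0.0, 0.0
    for j, k in combinations(graph.nodes, 2):
        if node in (j, k):
            continue
        m_jk = nx.maximum_flow_value(digraph, j, k, capacity=capacity)
        # Flow forced through `node` = total flow minus the flow that still
        # gets through once `node` is removed from the network.
        reduced = digraph.copy()
        reduced.remove_node(node)
        m_jk_without = nx.maximum_flow_value(reduced, j, k, capacity=capacity)
        through += m_jk - m_jk_without
        total += m_jk
    return through / total if total else 0.0

# Toy user local network with predominance weights used as edge capacities.
uln = nx.Graph()
uln.add_weighted_edges_from([
    ("user", "coffee", 2.0), ("coffee", "cafe_x", 1.5),
    ("user", "rock", 1.0), ("rock", "venue_y", 0.7),
])
print(flow_betweenness(uln, "coffee"))
```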