              A Recommendation Engine based
                    on Social Metrics

    Ofelia Cervantes1, Francisco Gutiérrez1, Ernesto Gutiérrez1, J. Alfredo
            Sánchez1, Muhammad Rizwan2, and Wan Wanggen2
               1
                Universidad de las Américas Puebla, Puebla, México
          2
           Institute of Smart City, Shanghai University, Shanghai, China
         ofelia.cervantes@udlap.mx, francisco.gutierrez@udlap.mx,
         ernesto.gutierrezca@udlap.mx, alfredo.sanchez@udlap.mx,
                  rizwan@shu.edu.cn, wanwg@staff.shu.edu.cn



      Abstract. Recommender systems have changed the way people find
      products, points of interest, services or even new friends. The technology
      behind recommender systems has evolved to incorporate user preferences and
      social influences. In this paper, we present a first approach to developing a
      recommendation engine based on social metrics applied to graphs that
      represent object characteristics, user profiles and influences obtained
      from social connections. It exploits graph centrality measures to produce
      personalized recommendations from the semantic knowledge rep-
      resented in the graph. The graph model and selected graph algorithms
      for calculating graph centralities that are the core of the recommender
      system are presented. Semantic concepts such as semantic predominance
      and similarity measures are adapted to the graph model. Implementation
      challenges faced to solve performance issues are also discussed.

      Keywords: recommender systems, social metrics, graph models, mas-
      sive graph processing


1   Introduction

The widespread availability of products and services through web-based and
mobile applications makes it difficult for end users to select the right item, ac-
cording to their preferences. A recommender system must make use of different
sources of information for providing users with suggestions of items that better
correspond to their expectations. Those sources can include user preferences,
item descriptions or social information.
    Several approaches have been proposed to tackle the problem of automatically
selecting the list of items that truly satisfies the needs of end
users. Approaches based on demographics or modeling user profiles are oriented
to exploit user features and preferences for filtering available choices. The main
challenge of this approach is to create the user profile from scratch. Some systems
invite users to select their preferences from a predefined list of categories or
allow the system to extract their profiles from other applications. Other systems

register every user action to dynamically build models based upon the user’s
behavior. Some research works focus on collecting ratings made by users that
have evaluated the product or service providing a first-hand perception of the
quality of the evaluated item. Other approaches have focused on describing the
main features of every item and trying to match them with user requirements.
    Modeling social networks using graphs has opened opportunities for exploring
new alternatives for implementing recommender systems. Social metrics, such as
flow centralities calculated on graph-based models, provide interesting measures
to represent the semantic predominance of concepts featuring users’ preferences
as well as item characteristics. A first proof of concept was accomplished and
reported in [6]. A more detailed description of the model is presented here as
well as implementation problems that were solved to provide a complete recom-
mendation engine that can be integrated into recommendation applications.
    The remainder of the paper is structured as follows: Section 2 presents related
work. Then, Section 3 introduces the proposed graph-based recommendation
model. Section 4 discusses selected graph algorithms used to calculate social
metrics, particularly flow centralities. Next, Section 5 describes the framework
developed in the prototype and the performance challenges we had to overcome.
Finally, in Section 6 we report the results we have obtained thus far and discuss
future work.


2     Related Work

Traditionally, recommender systems are classified into the following categories:
demographic, content-based, collaborative filtering, social-based, context-aware
and hybrid approaches. However, the ever-increasing amount of information
available in social media enables the development of knowledge-based recom-
mender systems [4]. The rich knowledge that has been accumulated in social
media can be exploited to improve recommendation outcomes, enhance user
experience, and develop new algorithms. In addition to the classification of
recommender systems, Zanker [24] illustrates the fundamental building blocks
of a recommender system, out of which we highlight three: user model, commu-
nity (social network), and a recommendation algorithm. Accordingly, we present a
brief overview of related work on knowledge-based recommender systems
from the perspective of these building blocks.


2.1   User Modeling

In order to personalize recommendations, it is necessary to know information
about each user. User models are representations of users’ needs, goals, prefer-
ences, interests, and behaviors along with users’ demographic characteristics [19].
Several user modeling approaches have been proposed, from typical weighted
vectors to domain ontologies. In [1] the authors defined a user model based on
fuzzy logic and proposed an approach to infer the degree of genre presence in

a movie by exploiting the tags assigned by the users. In [24] the authors pre-
sented a simple attribute-value pair dictionary to model the user through the
explicit elicitation of user requirements. A richer user model is presented in [10],
where the authors used a machine learning process to capture the user profile
and context into a domain ontology.
    Our work tries to strike a balance between simple [24] and complex models [10], with
the goal of having an efficient but still rich user model. Other works, like Canta-
dor et al. and Moahedian et al. [5, 14], are similar to our proposed user model,
since we use tags and keywords to build a lax ontology.


2.2   Recommendation Algorithms and Techniques

There is a wide range of recommendation algorithms and techniques. They vary
according to data availability, recommender filtering type as well as user and
object representation. Several methods have been shown to achieve acceptable
performance, such as association rule learning [24], Bayesian networks [8], nearest
neighbors [2], genetic algorithms [13], neural networks [3], clustering [20], latent
semantic features, among others that can be found with more detail in [4].
    In this work, we use graph metrics commonly used in social network anal-
ysis [21]. We propose semantic social network analysis that integrates semantic
methods of knowledge engineering and natural language processing with classic
social network analysis. Advantages of semantic social network analysis are: its
knowledge foundation and its non-probabilistic nature. In contrast, one disad-
vantage is its computational cost. Therefore, enhancement techniques are needed
to process graph metrics more efficiently.


2.3   Information Extracted from Social Networks

With the success of Web 2.0 and various social networking websites, recommender
systems are creating unique opportunities to help people find relevant information
when browsing the web and make meaningful choices. In
[7], the author has proposed a novel approach for recommendation systems that
uses data collected from social networks.
     Wang et al. [22] studied the problem of recommending new venues to users
who participate in location-based social networks (LBSNs) and proposed algo-
rithms that create recommendations based on: past user behavior (visited places),
the location of each venue, the social relationships among the users, and the sim-
ilarity between users.
     Ye et al. [23] exploited the social and geographical characteristics of users and
locations/places to research issues in realizing location recommendation services
for large-scale location-based social networks. They observed the strong social
and geospatial ties among users and their favorite locations/places in the system
via the analysis of datasets collected from Foursquare.
     Similar to our work, Savage [18] investigated the design of a more complete,
ubiquitous location-based recommendation algorithm that is based on a text

classification problem. The system learns user preferences by mining a person’s
social network profile. The author also defined a decision-making model, which
considers the learned preferences, physical constraints, and how the individual
is currently feeling.
    We can state that novel approaches rely mainly on the fusion of information
inferred from a user’s social network profile and other data sources (e.g. mobile
phone sensors). In this sense, it is necessary to develop new strategies that
produce recommendations from rich but still incomplete information.


3     Graph Model

Our recommendation engine is based on a graph representation of users and
objects of interest linked through concepts (denoted as terms). Figure 1 shows
the graph model, where every node falls into one of three categories: User, Term
or Object of Interest, and every edge represents the semantic relation between
nodes: Predominance, Similarity or Friendship. Every Term node of the graph
in Figure 1 acts as a semantic descriptor of both Users and Objects of Interest.
In other words, every user and every object are correspondingly described by the
terms linked to them. In general, users are described by their tastes, preferences,
and interests (user model), whereas objects are described by tags and keywords
(object model). In this manner, when a term is shared between a user and an
object, it indicates that the user could be interested in that particular
object, even if the object has never been seen or rated by that user.
    A graph-based representation allows us to apply graph algorithms (e.g. so-
cial metrics) to discover topological features, key relationships, and important
(prestigious) nodes. Then, with these features we can make relevant recommen-
dations to users, such as suggesting friends or places. Therefore, the foundation
of our recommender system relies on a knowledge base constructed from both
a user model (see Section 3.3) and an object model (see Section 3.4).
    In order to construct the user and object models, we applied a linguistic
analysis over user and object text descriptions. Basically, we conducted pre-
processing (removal of stopwords and selection of most descriptive words) and
statistical linguistic analysis (using the tf-idf and Okapi BM25 weighting schemes)
to define a bond between text descriptions and semantic relations represented in
the graph (see section 3.2). It is possible to obtain user and object descriptions
from social networks (Facebook, Foursquare, Twitter), web pages (Wikipedia,
web search results, etc.), contributions from human experts, or other textual resources.


3.1   Weighted Graph Definition

Formally, we define a weighted graph G = (V, E, fE ) where V = {v1 , . . . , vn }
is a set of vertices (nodes), E = {e1 , . . . , em } ⊂ {{x, y} | x, y ∈ V } is a
set of edges, and fE : E → R is the function assigning a weight to every edge. In our
recommender system V = U ∪ T ∪ O where U is the set of users, T is the set
of terms, and O is the set of objects of interest. E = P ∪ S ∪ F where P is




Fig. 1. Graph model. Node layers: User, Term, and Objects of Interest linked through
Friendship, Predominance and Similarity edges.


the set of predominance edges, S the set of similarity edges, and F is the set
of friendship edges. Function fE is adapted according to each type of edge. For
instance, we can obtain the sub-graph of users as GU = (U, F, fF ) (see User
layer in Figure 1), the sub-graph of objects as GO = (O, S, fS ) (see Objects
of Interest layer in Figure 1), and the sub-graph of user and object profiles as
GU ∪T ∪O = (U ∪ T ∪ O, P, fP ).
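As an illustration of this definition, the following minimal sketch builds a small instance of G with the networkx Python library; the node identifiers and the kind/weight attribute names are assumptions made for the example, not part of the model.

import networkx as nx

# Minimal sketch (illustration only) of the weighted graph G = (V, E, f_E).
# Node kinds: "user" (U), "term" (T), "object" (O);
# edge kinds: "predominance" (P), "similarity" (S), "friendship" (F).
G = nx.Graph()

# V = U ∪ T ∪ O
G.add_node("alice", kind="user")
G.add_node("bob", kind="user")
G.add_node("coffee", kind="term")
G.add_node("cafe_aroma", kind="object")

# E = P ∪ S ∪ F, with f_E stored as the 'weight' attribute
G.add_edge("alice", "coffee", kind="predominance", weight=2.1)       # f_P = pred(U, T)
G.add_edge("cafe_aroma", "coffee", kind="predominance", weight=3.4)  # f_P = pred(O, T)
G.add_edge("alice", "bob", kind="friendship", weight=1.0)            # f_F = 1

# Sub-graph of users G_U = (U, F, f_F)
users = [n for n, data in G.nodes(data=True) if data["kind"] == "user"]
G_U = G.subgraph(users)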

3.2   Semantic Relations
In order to build the semantic relations of the graph, it is necessary to obtain
text descriptions of users and objects. As a result, we have two collections: the
users text collection (UTC) and the objects text collection (OTC), where each
text description is considered a document D in a vector space model.
    We define three types of semantic relations (edges of the graph): predomi-
nance, similarity, and friendship. Each semantic relation links different types of
nodes and has a different weighting function. Predominance is the edge between
a user or an object and a term, similarity is the edge between two objects and
friendship is the edge between two users.
    Predominance is the semantic relation between a term and a user or an object.
A term acts as a descriptor of users and objects. We define a weighting function
over the edge of predominance based on linguistic analysis. We apply the Okapi
BM25 ranking function [17] to each independent document collection (UTC and
OTC) using Equation 1.
    In Equation 1, pred is the predominance of the term T in document D, Idoc
is the number of indexed documents (size of collection), Tdoc is the number of
documents containing term T , T F is the term frequency relative to document D,
DL is the document length, avgDL is the average document length among the
entire collection, K and B are free parameters (usually K = 1.2 and B = 0.75).



\[
pred_{(D,T)} = \log_{10}\!\left(\frac{I_{doc} - T_{doc} + 0.5}{T_{doc} + 0.5}\right) \cdot \frac{TF \cdot (K + 1)}{TF + K \cdot \left((1 - B) + B \cdot \frac{DL}{avgDL}\right)} \tag{1}
\]
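A minimal sketch of this predominance computation is given below, assuming each document is available as a list of tokens; the function and variable names mirror Equation 1, but the implementation is only an illustration of the weighting scheme, not the engine's code.

import math

def predominance(term, doc, collection, k=1.2, b=0.75):
    """Okapi BM25 weight of `term` in `doc` (Equation 1).
    `doc` is a list of tokens; `collection` is the list of all such documents."""
    idoc = len(collection)                              # number of indexed documents
    tdoc = sum(1 for d in collection if term in d)      # documents containing the term
    tf = doc.count(term)                                # term frequency in this document
    dl = len(doc)                                       # document length
    avg_dl = sum(len(d) for d in collection) / idoc     # average document length
    idf = math.log10((idoc - tdoc + 0.5) / (tdoc + 0.5))
    return idf * (tf * (k + 1)) / (tf + k * ((1 - b) + b * dl / avg_dl))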

   Similarity is the semantic relation between two objects. This measure in-
dicates the degree of affinity between objects. We apply the cosine similarity
measure (Equation 2) to obtain this value. The similarity is calculated after the
predominance, since it relies on shared terms. Every object is thus a vector of
predominances, as shown below Equation 2.
\[
Similarity_{(A,B)} = \cos\Theta = \frac{A \cdot B}{\|A\|\,\|B\|} \tag{2}
\]

\[
Object_A = \left( pred_{(A,T_1)}, pred_{(A,T_2)}, \dots, pred_{(A,T_n)} \right)
\]
\[
Object_B = \left( pred_{(B,T_1)}, pred_{(B,T_2)}, \dots, pred_{(B,T_n)} \right)
\]

    In Equation 2, the similarity between object A and object B is determined by
the weights of the terms they have in common. In this manner, a high similarity
value indicates a higher semantic correspondence between objects.
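The following sketch illustrates Equation 2 for objects represented as term-to-predominance dictionaries; this representation is an assumption made for the example.

import math

def similarity(obj_a, obj_b):
    """Cosine similarity (Equation 2) between two objects, each represented
    as a {term: predominance} dictionary; only shared terms contribute."""
    shared = set(obj_a) & set(obj_b)
    dot = sum(obj_a[t] * obj_b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in obj_a.values()))
    norm_b = math.sqrt(sum(w * w for w in obj_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Example: two places sharing the term "coffee"
place_a = {"coffee": 3.4, "breakfast": 1.2}
place_b = {"coffee": 2.8, "books": 0.9}
print(similarity(place_a, place_b))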
    Friendship is the semantic relation between two users. This measure indicates
the degree of affinity between two users. Our current model does not distinguish
between close friends, friends or acquaintances. Therefore, the users’ sub-graph
is simply a friend-of-a-friend (FOAF) node-link network.

3.3    User Model
As part of the graph-based representation, users are defined as sub-graphs. A
user model is composed of two sub-graphs: user profile Gu = (u, P, fP ) and user
FOAF network GU = (U, F, fF ). Figure 2A shows the user profile network and
2B the user FOAF network. In the user profile sub-graph, each user is linked
to a set of terms that indicate tastes, preferences and interests. Tastes are general
inclinations of a user towards some entities, and they are generally expressed
through actions such as likes (e.g., Foursquare check-ins and Facebook likes: football,
beer, steak, coffee, etc.). Preferences are user inclinations towards taste features.
Preferences are more fine-grained than tastes and are usually expressed in users’
reviews and ratings (e.g., starred reviews: I like the double espresso, I don’t like
diet soda). Interests are defined as contextual user inclinations or intentions (e.g.,
I want to try Chinese food, I’m going to watch the Minions movie).
                               
\[
f_P = \begin{cases}
pred_{(U,T)} & \text{Case A} \\
1 & \text{Case B} \\
(\#stars - 2) \cdot \frac{1}{3} & \text{Case C}
\end{cases} \tag{3}
\]
                               

   Our scheme to weight edges within a user profile is indicated in Equation
3. Case A occurs when only text descriptions are used; this means that terms




Fig. 2. A) User profile and B) User FOAF network


are weighted according to the Okapi measure (pred, as shown in Equation 1).
Case B occurs when explicit likes are found in Foursquare or Facebook. Case C
occurs when terms extracted from starred reviews are used to describe a user.
In addition to Equation 3, we use a threshold value to limit the number of
terms connected with a given user; specifically, we use the first quartile as the threshold
value. An example of a user profile is shown in Figure 2A, where it is possible to
see that the user likes football, rock and coffee, and is likely to be a student. In
user friendship networks, as mentioned earlier, there are no differences among
friendship types, so in the FOAF network all weights are equal to 1 (fF = 1). A
user FOAF network is shown in Figure 2B.
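A possible implementation of this weighting scheme is sketched below; the star mapping for Case C and the pruning rule (keeping terms above the first quartile of their weights) follow our reading of Equation 3 and the text, and should be taken as assumptions rather than the engine's exact code.

import statistics

def user_term_weight(source, pred=None, stars=None):
    """Weight f_P of a user-term edge (Equation 3).
    source: 'text' (Case A), 'like' (Case B) or 'review' (Case C)."""
    if source == "text":      # Case A: Okapi predominance of the term
        return pred
    if source == "like":      # Case B: explicit Foursquare/Facebook like
        return 1.0
    if source == "review":    # Case C (our reading): (stars - 2) / 3, in [-1/3, 1]
        return (stars - 2) * (1.0 / 3.0)
    raise ValueError(f"unknown source: {source}")

def prune_profile(term_weights):
    """Keep only the terms whose weight exceeds the first quartile."""
    q1 = statistics.quantiles(term_weights.values(), n=4)[0]
    return {t: w for t, w in term_weights.items() if w > q1}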


3.4   Object Model

Objects of interest are sets of items that can be of potential interest to a user.
Depending upon the application, objects of interest can have different granularity.
For instance, they can be coarse-grained, such as a point of interest (POI), or
fine-grained, such as items inside places. An object is described by the category to
which it belongs (e.g., dog is an animal). It is also described by tags assigned
by users or keywords found in the object’s description. The object model consists
of two sub-graphs: the object profile Go = (o, P, fP ) and the object similarity sub-graph
GO = (O, S, fS ). Object profiles were built with data gathered from Foursquare,
Wikipedia and results from web searches. The weights of edges that link objects
and terms were calculated using the predominance formula shown in Equation
1. This means that the weight function on edges is fP = pred(O,T ) .


3.5   User Global and Local Network

In order to apply social metrics (centrality measures) and relate them to perti-
nent recommendations, we defined two networks from the user’s perspective: a user
global network (U GNu ) and a user local network (U LNu ). The user global network
is the whole graph (all nodes: users U , terms T and objects O; and all edges:
similarities S, predominances P and friendships F ) centered on the current user.

Therefore, U GNu = (U ∪ T ∪ O, S ∪ P ∪ F ) (see Figure 1). The user local
network, in contrast, is the sub-graph defined by the current user node u, the term
nodes Tu adjacent to the user, and the object nodes OT adjacent to those terms, linked
through the predominance edges from the user, Pu , and from the objects, PO . Hence,
U LNu = (u ∪ Tu ∪ OT , Pu ∪ PO ) (see Figure 3). It is important to highlight the
difference between user global and local networks, since it leads to different semantic
interpretations when calculating centrality measures over them.
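Assuming the graph representation sketched in Section 3.1, a user local network could be extracted as in the following sketch; the kind attribute is again an illustrative convention, not part of the model.

def user_local_network(G, user):
    """Sketch of ULN_u: the user node, its adjacent term nodes, and the object
    nodes reachable from those terms through predominance edges."""
    terms = {t for t in G.neighbors(user)
             if G.nodes[t]["kind"] == "term"}
    objects = {o for t in terms for o in G.neighbors(t)
               if G.nodes[o]["kind"] == "object"}
    return G.subgraph({user} | terms | objects)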


4     Centrality Measures and Recommender Engine
Centrality measures have been used extensively in the past to exploit networks
and discover the most relevant nodes in a graph. In social network analysis
(SNA), graph centralities are used to identify the most relevant persons, com-
munities, and even to detect anomalous behaviors in the network. However, with the
rise of social networks, people interact not only to meet people and make friends
but also to search for things they like, share their impressions, and reach points of
interest and objects of interest. These spatio-temporal
interactions can also be represented in a graph; thus, an accurate user profile
representation can give us a great deal of insight into user behavior.
Building on these approaches to exploiting graph measures, we have explored the
use of centralities to exploit the topological structure of our user global network,
and of flow centralities to get the most out of our weighted global and local networks
for the recommendation engine.

4.1   Centrality Algorithms
Centrality is widely used to measure the importance of a node in a
graph, especially in SNA [9]. Our recommender engine implements these central-




                            Fig. 3. User Local Network

ities to measure the relevance of people in the social network. Some centrality
measures like closeness and betweenness are based on the calculation of the
shortest distance to reach all other nodes in the graph. Our algorithms to calcu-
late centralities are applied to the network of persons so we can identify the most
popular nodes (degree), the capacity of a node to reach any other node in the network
(closeness), and the leaders interconnected within a neighborhood
in the graph (betweenness) [15]. Degree centrality is a measure that counts the
direct relationships a node has, and thus how many nodes are in direct contact with it.
Closeness is defined as the inverse sum of the shortest paths to every other node,
and betweenness is defined as the number of shortest paths from all vertices
to all others that pass through that node. Centrality measures are calculated
over the network at a topological level, given a scale-free graph of persons. Thus,
these measures do not exploit our weighted graph; they are applied only at the
social-network level. Terms and objects of interest can be seen as sub-graphs of
the global network that can be exploited by using flow-based centrality measures.
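These topological centralities can be computed on the users' friendship sub-graph with standard routines, as in the following networkx-based sketch (the kind attribute follows the illustrative convention used earlier):

import networkx as nx

def user_centralities(G):
    """Topological centralities over the users' friendship sub-graph G_U."""
    users = [n for n, data in G.nodes(data=True) if data["kind"] == "user"]
    G_U = G.subgraph(users)
    return {
        "degree": nx.degree_centrality(G_U),           # most popular users
        "closeness": nx.closeness_centrality(G_U),     # users who reach others quickly
        "betweenness": nx.betweenness_centrality(G_U), # brokers between neighborhoods
    }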

4.2   Flow-Centrality Algorithms
We are using flow centralities [15] to measure the betweenness, closeness, and
eccentricity among the objects of interest, terms, and user profiles. Flow
centralities allow us to exploit the semantic relationships between the user and
the profiles of the objects of interest. Flow centralities reveal the most relevant
nodes in the graph given their weights; for instance, given a set of terms asso-
ciated with a user profile we can better understand the user’s preferences and give a
better recommendation.

Flow Betweenness In SNA, betweenness is one of the most commonly referenced
centralities. Flow betweenness (see Equation 4) is defined in [11] as the maximum flow
that passes through node xi , divided by the total flow between all pairs of points
where xi is neither a source nor a sink. A node with a high flow betweenness
centrality has a large influence in the network because of the flow that passes
through it. Due to this relevance, in our rec-
ommender model, a node with high flow betweenness should be recommended among the
things the user cannot miss (see Figure 4).
\[
C_{FB}(x_i) = \frac{\sum_{j<k} m_{jk}(x_i)}{\sum_{j<k} m_{jk}} \tag{4}
\]

where $m_{jk}$ denotes the maximum flow between nodes $x_j$ and $x_k$, and $m_{jk}(x_i)$ the part of that flow that passes through $x_i$.
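A naive sketch of this measure is shown below, assuming that edge weights act as flow capacities; it computes, for every pair of endpoints, the maximum flow with and without the node of interest. This is an illustration of the definition, not the engine's optimized implementation, and its cost grows quadratically with the number of node pairs.

import networkx as nx
from itertools import combinations

def flow_betweenness(G, node, capacity="weight"):
    """Sketch of flow betweenness (Equation 4): the share of the maximum flow
    between every pair (j, k) that must pass through `node`.
    Costly: two max-flow computations per pair of endpoints."""
    D = G.to_directed()              # networkx max-flow routines expect arcs
    D_minus = D.copy()
    D_minus.remove_node(node)
    through = total = 0.0
    for j, k in combinations([n for n in G if n != node], 2):
        m_jk, _ = nx.maximum_flow(D, j, k, capacity=capacity)
        m_jk_without, _ = nx.maximum_flow(D_minus, j, k, capacity=capacity)
        through += m_jk - m_jk_without   # flow forced through `node`
        total += m_jk
    return through / total if total else 0.0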