<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Uncertain Graphs meet Collaborative Filtering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Claudio Taranto</string-name>
          <email>claudio.taranto@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Di Mauro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Floriana Esposito</string-name>
          <email>esposito@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Bari "Aldo Moro" via E. Orabona</institution>
          ,
          <addr-line>4 - 70125, Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Collaborative ltering (CF) aims at predicting the user interest for a given item. In CF systems a set of users ratings is used to predict the rating of a given user on a given item using the ratings of a set of users who have already rated the item and whose preferences are similar to those of the user. In this paper we propose to use a framework based on uncertain graphs in order to deal with collaborative ltering problems. In this framework relationships among users and items and their corresponding likelihood will be encoded in a uncertain graph that can then be used to infer the probability of existence of a link between an user and an item involved in the graph. In order to solve CF tasks the framework uses an approximate inference method adopting a constrained simple path query language. The aim of the paper is to verify whether uncertain graphs are a valuable tool for CF, by solving classical, complex and structured problems. The performance of the proposed approach is reported when applied to a real-world domain.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The inherent uncertainty and complexity present in some real world domains
has led to the emerging of many probabilistic frameworks, such as probabilistic
graphical models [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and statistical relational learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], able to deal with
uncertain and structured domains. Learning and reasoning on uncertain graphs 1
has become an increasingly important research topic [
        <xref ref-type="bibr" rid="ref11 ref19 ref29 ref9">19, 29, 9, 11</xref>
        ]. In this model,
each edge is associated with a probability representing the likelihood of its
existence in the graph, and the edges existence is assumed to be mutually
independent.
      </p>
      <p>Collaborative ltering (CF) aims at predicting the user interest for a given
item based on a collection of user proles. Collaborative ltering is an approach
adopted in recommender systems that attracted much of attention in recent
years. In CF systems a set of users ratings is used to predict the rating of a
given user u on a given item i using the ratings of a set of users who have
already rated i and whose preferences are similar to the ones of u.</p>
      <p>
        CF systems need to compare items against users and this task may be solved
with a memory based approach that may be divided into user-based or item-based
approaches. A typical example of memory based approaches are neighborhood
1 Uncertain graphs are also referred to probabilistic graphs as in [
        <xref ref-type="bibr" rid="ref29 ref9">29, 9</xref>
        ].
based CF methods centered on computing the relationships between items or
between users. Given an unknown rating to be estimated, memory-based CF
rstly computes similarities between the given user and other users ( user-based
approach), or between the given item and other items ( item-based approach).
Then, the unknown rating is predicted by averaging the known ratings by similar
users or by similar items [
        <xref ref-type="bibr" rid="ref15 ref4">4, 15</xref>
        ].
      </p>
      <p>In this paper we propose to use uncertain graphs to deal with collaborative
ltering problems. In particular, relationships among users and items and their
corresponding likelihood will be encoded in a uncertain graph that can then be
used to infer the probability of existence of a link between an user and an item
involved in the graph.</p>
      <p>The main questions that we want to answer in this paper are the following:
Q1 : are uncertain graphs a valuable tool for collaborative ltering?
Q2 : can uncertain graphs solve classical CF user-based and item-bases tasks?
Q3 : can uncertain graphs unify user-based and item-based CF approaches?
2</p>
    </sec>
    <sec id="sec-2">
      <title>Uncertain graphs</title>
      <p>Let G = (V; E), be a graph where V is a collection of nodes and E 2 V
the set of edges, or relationships, between the nodes.</p>
      <p>V is
Denition 1 (Uncertain graph). An uncertain graph is a system G = (V; E;
; lV ; lE ; P ), where (V; E) is an undirected graph, V is the set of nodes, E is the
set of edges, is a set of labels, lV : V ! is a function assigning labels to
nodes, lE : E ! is a function assigning labels to the edges, and P : E ! [0; 1]
is a function assigning existence probability values to the edges.
The existence probability P (e) of an edge e = (u; v) 2 E is the probability that
edge between u and v can exist in the graph. A particular case of uncertain
graph is the certain graph when the existence probability value on all edges is 1.
In this paper we use the possible world semantics. In particular, we can imagine
an uncertain graph G as a sampler of worlds, where each world is an instance of
G. A certain graph G0 is sampled from G according to P , denoted as G0 v G,
when each edge e 2 E is selected to be an edge of G0 with probability P (e).
Edges labeled with probabilities are treated as mutually independent random
variables indicating whether or not the corresponding edge belongs to a certain
graph. Assuming independence among edges, the probability distribution over
certain graphs G0 = (V; E0) v G = (V; E) is given by</p>
      <p>P (G0jG) = Y P (e)
e2E0</p>
      <p>Y (1
e2EnE0</p>
      <p>P (e)):
(1)
Denition 2 (Simple path). Given an uncertain graph G, a simple path of
a length k from u to v in G is a sequence of edges pu;v = he1; e2; : : : eki, such
that e1 = (u; v1), ek = (vk1 ; v), and ei = (vi 1; vi) for 1 &lt; i &lt; k, and all nodes
in the path are distinct.</p>
      <p>Given G an uncertain graph, and ps;t a path in G from node s to node t,
l(ps;t) = l(e1)l(e2) l(ek) denotes the concatenation of the labels of all edges in
ps;t. Given a context free grammar (CFG) C a string of terminals s is derivable
from C i s 2 L(C), where L(C) is the language generated from C.
Denition 3 (Language constrained simple path). Given an uncertain
graph G and a context free grammar C, a language constrained simple path is a
simple path p such that l(p) 2 L(C).</p>
      <p>Given an uncertain graph G a main task corresponds to compute the
probability that there exists a path between two nodes u and v, that is, querying
for the probability that a randomly sampled certain graph contains a path
between u and v. More formally, the existence probability Pe(qjG) of a path q in
an uncertain graph G corresponds to the marginal P (G0jG) with respect to q:
Pe(qjG) =</p>
      <p>X P (qjG0) P (G0jG)</p>
      <p>G0vG
where P (qjG0) = 1 if there exits the path q in G0, and P (qjG0) = 0 otherwise. In
other words, the existence probability of path q is the probability that the path
q exists in a randomly sampled certain graph.</p>
      <p>Denition 4 (Language constrained simple path probability). Given an
uncertain graph G and a context free grammar C, the language constrained
simple path probability of L(C) is</p>
      <p>P (L(C)jG) = X P (qjG0; L(C)) P (G0jG)</p>
      <p>G0vG
whereP (qjG0; L(C) = 1 if there exists a path q in G0 such that l(q) 2 L(C), and
P (qjG0; L(C)) = 0 otherwise.</p>
      <p>In particular, the previous denition give us the possibility to compute the
probability of a set of simple path queries fullling the structure imposed by a context
free grammar. In this way we are interested in certain graphs that contain at
least one path belonging to the language corresponding to the given grammar.
2.1</p>
      <p>
        Inference
Computing the existence probability directly using (2) or (3) is intensive and
intractable for large graphs since the number of certain graphs to be checked
is exponential in the number of probabilistic edges. It involves computing the
existence of the path in every certain graph and accumulating their probability. A
natural way to overcome the intractability of computing the existence probability
of a path is to approximate it using a Monte Carlo sampling approach [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]: 1)
we sample n possible certain graphs, G1; G2; : : : Gn from G by sampling edges
uniformly at random according to their edge probabilities; and 2) we check if the
(2)
(3)
path exists in each sampled graph Gi. This process provides the basic sampling
estimator
      </p>
      <p>Pn
Pce(qjG) Pe(qjG) = i=1 P (qjGi)
n</p>
      <p>Note that is not necessary to sample all edges to check whether the graph
contains the path. For instance, assuming to use an iterative depth rst search
procedure to check the path existence. When a node is just visited, we will sample
all its adjacent edges and pushing them into the stack used by the iterative
procedure. We will stop the procedure either when the target node is reached or
when the stack is empty (non existence).
(4)
3</p>
      <p>Uncertain graphs for collaborative ltering
The most common approach to CF is based on neighborhood models.
Useroriented methods estimate unknown ratings based on recorded ratings of similar
users, while in item-oriented approaches ratings are estimated using known
ratings made by the same user on similar items.</p>
      <p>Let U be a set of n users and I a set of m items. A rating rui indicates
the preference by user u of item i, where high values mean stronger preference.
Let Su be the set of items rated from user u. For user-based approaches, the
prediction of an unobserved rating rui is computed as follows</p>
      <p>c
rui = ru +
c</p>
      <p>Pv2Uji2Su suv (rvi</p>
      <p>P
v2Uji2Su jsuvj
rv)
where ru represents the mean rating of user u, and suv stands for the similarity
between users u and v, computed, for instance, using the Pearson correlation:
suv =
qP</p>
      <p>P
a2Su\Sv (rua
ru) (rva</p>
      <p>rv)
a2Su\Sv (rua
ru)2 Pa2Su\Sv (rva
rv)2
(5)
(6)
(7)</p>
      <p>On the other side, item-based approaches predict the rating of a given item
using the ratings of the user on the items considered as similar to the target
item. Given a similarity measure, such as the Pearson correlation, the rating rui
is estimated as: c
rui =
c</p>
      <p>Pj2Sujj6=i sij ruj</p>
      <p>P</p>
      <p>j2Sujj6=i jsij j</p>
      <p>
        These neighbourhood approaches see each user connected to other users or
consider each item related to other items as in a network structure. In
particular they rely on the direct connections among the entities involved in the
domain. However, as recently proved, techniques able to consider complex
relationships among the entities, leveraging the information already present in the
network, involves an improvement in the processes of querying and mining [
        <xref ref-type="bibr" rid="ref21 ref24">24,
21</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] the authors improved the accuracy of a similarity measures between
two annotated nodes in a graph by using link information. They showed that
the similarity between nodes annotations may be improved using also the
network context. Another approach [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] to enriched a graph representation is the
addition of semantic information improving link prediction results in network
datasets. In particular, a supervised learning method for building link
predictors from structural attributes of the underlying network using some semantic
attributes of the nodes has been adopted.
      </p>
      <p>The approach used in this paper is to represent a dataset consisting of user
ratings, K = f(u; i; rui)jrui is knowng, with an uncertain graph and then
performing inference on this graph to solve classical collaborative ltering tasks.
Hence the question to be solved is how to build the uncertain graph from the
at rating representation K. The formal characterization we have provided about
uncertain graphs gives us the possibility to represent heterogeneous objects and
connections.
3.1</p>
      <p>Uncertain graph construction
Given the set of ratings K = f(u; i; rui)jrui is knowng, we add a node with label
user for each user in K, and a node with label item for each item in K. The next
step is to add the edges among the nodes. Each edge is characterized by a label
and a probability value, which should indicate the degree of similarity between
the two nodes. Two kind of connections between nodes are added. For each user
u, we added an edge, labeled as simU, between u and the k most similar users to
u. The similarity between two users u and v is computed adopting a weighted
Pearson correlation between the items rated by both u and v.</p>
      <p>In particular, the probability of the edge simU connecting two users u and v
is computed as:</p>
      <p>P (simU(u; v)) = suv wu(u; v);
where suv is the Pearson correlation between the vectors of ratings corresponding
to the set of items rated by both user u and user v, and
wu(u; v) = jSu \ Svj ;
jSu [ Svj
where Su is the set of items rated from user u.</p>
      <p>For each item i, we added an edge, with label simI, between i and the most
k similar items to i. In particular, the probability of the edge simI connecting
the item i to the item j has been computed as:</p>
      <p>P (simI(i; j)) = sij wi(i; j);
where sij is the Pearson correlation between the vectors corresponding to the
histogram of the set of ratings for the item i and the item j, and
wi(i; j) = jSi \ Sjj ;
jSi [ Sjj
where Si is the set of users rating the item i.</p>
      <p>Edges with probability equal to 1, and with label rk between the user u and
the item i, denoting the user u has rated the item i with a score equal to k, are
added for each element rui belonging to K.</p>
      <p>After having dened the uncertain graph, now we can solve classical
collaborative ltering task by computing the probability of some language constrained
simple paths. Since the goal is to predict an unknown rating between an user u
and an item i, let us assume that the values of rui are discrete and belonging
to a set R. Given the uncertain graph G, the approach we used to predict the
rating rcui is to solve the following maximization problem:
rui = arg max P (rj(u; i)jG);
c j
(8)
where rj(u; i) is the unknown link with label rj between the user u and the item
i. In particular, the maximization problem corresponds to compute the link
prediction for each rating value and then choosing the rating with maximum
likelihood.</p>
      <p>The previous link prediction task is based on querying the probability of some
language constrained simple path. For instance, user-based CF may be simulated
by querying the probability of the paths, starting from a user node and ending to
an item node, belonging to the context free language Li = fsimU1ri1g. In
particular, predicting the probability of the rating j as P (rj(u; i) in (8) corresponds to
compute the probability P (qjG) for a query path in Li, i.e., computing P (LijG)
as in (3):
rui = arg max P (rj(u; i)jG)
c j
arg max P (LijG):
j
(9)</p>
      <p>In the same way, item-base CF could be simulated by computing the
probability of the paths belonging to the CFL Li = fri1simI1g.</p>
      <p>The power of the proposed framework gives us the possibility to construct
more complex queries such as that belonging to the CFL Li = frisimIn : 1
n 2g, that gives us the possibility to explore the graph by considering not only
direct connections. Finally, we can implement hybrid CF systems solving queries
belonging to the CFL Li = frisimIn : 1 n 2g [ fsimUmri1 : 1 m 2g.
4</p>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>In order to validate the proposed approach two versions of the MovieLens 2
dataset has been used. The MovieLens data sets were collected by the
GroupLens Research Project at the University of Minnesota. The rst version called
MovieLens 100K consists of 100,000 ratings (1-5) from 943 users on 1682 movies,
where each user has rated at least 20 movies and there are simple demographic
info for the users (age, gender, occupation, zip). The data was collected through
the MovieLens web site during the seven-month period from September 19th,
1997 through April 22nd, 1998. The second version called MovieLens 1M consists
2 http://www.grouplens.org/
of 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040
MovieLens users. In this paper we used the ratings only without considering the
demographic information.</p>
      <p>MovieLens 100K dataset is divided in 5 fold, where each fold present a
training data (80000 ratings) and a test data (20000 ratings), while MovieLens 1M is
divided in 10 fold. For each training/testing fold the validation procedure follows
the following steps:
1. creating the uncertain graph from the training ratings data set as reported</p>
      <p>Section 3;
2. dening a context free language corresponding to a specic CF task;
3. testing the ratings reported in the testing data set T by computing, for each
pair (u; i) 2 T the predicted rating as in (9) and comparing the result with
the true prediction reported in T .</p>
      <p>In this particular dataset we have a uncertain graph with nodes labeled as
user or as film. There are edges between two film nodes labeled as simF,
and there are edges with label simU between two user nodes. These edges are
added using the procedure presented in the previous section, where we set the
parameter n = 30, indicating that an user or a lm is connected, respectively, to
30 most similar users, resp. lms . Finally, for each rating (u; i; rui = k) belonging
to the training set there is an edge between the user u and the lm i whose label
is rk. The goal is to predict the correct rating for each instance belonging to
the testing set T . The predicted rating has been computed using a Monte Carlo
approach by sampling 100 certain graphs and adopting the function reported in
(9).</p>
      <p>The accuracy of the proposed framework has been evaluated according to
the mean absolute error (MAE) a most commonly applied evaluation metric for
CF rating predictions. Assuming N computed rating predictions:
M AE =
1 XN
N i=1
jrcui
ruij:
(10)
4.1</p>
      <p>Results
In order to evaluate the framework, we proposed to query the paths belonging
to the context free languages reported in Table 1. The rst language constrained
simple paths L1 reported in Table 1 corresponds to solve a user-based CF
problem, while the second language L2 gives us the possibility to simulate a
itembased CF approach. As we can see from Table 2 results improve when we go
from a user-based approach to a item-based one.</p>
      <p>Then we try to build a basic hybrid system by combining both the languages
L1 and L2 into the language L3. Now, as we can see in Table 2 results are better
than that obtained when we used a single language only. Then, we propose to
extend the basic languages L1 and L2 in order to consider a neighbourhood with
many nested levels. In particular, instead of considering the direct neighbours
only, we inspect the uncertain graph following a path with a maximum length
of two edges, labeled respectively as simU for the language L4 and simF for the
language L5. Their corresponding results are better than that obtained with the
basic language L1 and L2 thus proving the validity of the approach. Language
L6 extends language L5 in order to inspect the uncertain graph following a path
with a maximum length of three edges by obtaining better results than others
languages.</p>
      <p>Finally, the language L7 combines both the user-based and item-based
approach, and the large neighbourhood explored with paths whose length is greater
than one. As we can see, this language is the best among all the others in
providing a good MAE value.</p>
      <p>Fold</p>
      <p>
        MSD
SRC
PC
L2
L5
L6
L8
0:7602
0:7529
0:7518
0:7916
0:7381
0:7293
0:7198
Given a snapshot of a graph (network), the goal we are dealing with is to
accurately predict edges that could be added to the network in future, sometime
called link prediction problem [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. There are a lot of application where link
prediction can be used such as identifying the structure of a criminal network,
overcoming the data-sparsity problem in recommender systems using
collaborative ltering [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], analyzing users navigation history to generate users tools that
increase navigational eciency [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. A problem close to link prediction is link
completion [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The data, collected from the real life sources, is usually noisy
and might contain gaps, i.e. links may be incomplete, containing one or more
unknown members. The problem of link completion addresses the task of
determining the missing member given a partial link. This question is similar to
those found in the collaborative ltering domain [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The link prediction problem
is also related to the problem of inferring missing links from an observed
network: in a number of domains, one constructs a network of interactions based on
observable data and then tries to infer additional links that, while not directly
visible, are likely to exist [
        <xref ref-type="bibr" rid="ref18 ref22 ref7">7, 18, 22</xref>
        ].
      </p>
      <p>
        All these methods assign a connection weight score(x; y) or a similarity
s(x; y) to pairs of nodes x, y, based on the input graph, and then produce a
ranked list in decreasing order of s(x; y). This approach may be viewed as
computing a measure of proximity or a similarity between nodes. The most basic
approach to compute this ranked list could be that to rank pairs x, y by the
length of their shortest path in the network G . Such a measure follows the
notion that collaboration networks are small worlds, in which individuals are
related through short chains [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Shortest path between two nodes denes the
minimum number of edges connecting them. If there is no such connecting path
then, the value of this attribute is taken as innite.
      </p>
      <p>Other methods try to compute the similarity between two nodes by looking
their corresponding neighborhoods. Given a node x, let N (x) be the set of
neighbours of x in a graph G. Given two nodes x and y, there are several approaches
that follow the natural intuition that if the set of neighbours N (x) and N (y)
have a large overlapping then the node x and the node y should be very similar.</p>
      <p>
        Common neighbours measure the number of neighbors that node x and node
y have in common, in particular s(x; y) = jN (x) \ N (y)j. Newman in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] shows
a correlation between the number of common neighbours of x and y at the time
t, and the probability they will be similar in the future.
      </p>
      <p>
        Jaccard’s coecient, used in information retrieval, measures the
probability that both x and y have a feature f in common, for a randomly selected
feature f . Using neighbours we can compute this as follow s(x; y) = jN (x) \
N (y)j=jN (x) [ N (y)j. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] considers the similarity problem between two entities
as s(x; y) = Pz2N(x)\N(y) logjN1 (z)j where z is a set of features shared both by x
and y. Finally, preferential attachment is based on empirical evidence that the
probability of x and y being connected is correlated with the product of the
number of connections of x and y (N (x) and N (y)). The measure is computed
as s(x; y) = jN (x)j jN (y)j.
      </p>
      <p>
        Other methods are based on ensemble of paths. Katz [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] denes a similarity
measure that directly sums over a collection of paths, exponentially damped by
length in order to count short paths more heavily. This leads to the measure
s(x; y) = Pl1=1 l jpathshxl;iyj where pathshxl;iy is the set of all lengh-l paths from
x to y. There exists two variants of the Katz measure: unweighted, in witch
pathshx1;yi = 1 if x and y have collaborated and 0 otherwise, and weighted, in
witch pathshx1;yi is the number of times that x and y have collaborated.
      </p>
      <p>
        Another method uses random walks on the graph G [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], where starting
from a node x, the selection of next node to visit is done by choosing among
the neighbors of x at random. Using this approach it is possible to compute
the hitting time Hx;y as the expected number of steps required for a random
walk starting at x to reach y. SimRank [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] supposes that two nodes are similar
to the extent that they are joined to similar neighbors. In particular s(x; y) =
Pa2NjN(x)(xP)jbj2NN((yy))js(a;b) for some 2 [0; 1].
      </p>
      <p>
        All the methods described above consider the space of representation as a
graph with nodes of the network indicating the objects of the world and edges
with a numeric value that indicates their weight. Over the last few years
uncertain graphs have become an important research topic [
        <xref ref-type="bibr" rid="ref19 ref27 ref28">19, 27, 28</xref>
        ]. In these
graphs each edge is associated with an edge existence probability that quanties
the likelihood that the edge exists in the graphs. Using this representation it is
possible to adopt the possible world semantics to model it. One of main issue in
uncertain graphs is how to compute the connectivity of the network. The
network reliability problem [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a generalization of pairwise reachability , in which
the goal is to determine the probability that all pairs of nodes are reachable from
one another. Unlike a deterministic graph in which the reachability function is a
binary function indicating whether or not there is a path that connects the two
provided vertices, in the case of the reachability on uncertain graphs the
function assumes probabilistic values. In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], the authors provide a list of alternative
shortest-path distance measures for uncertain graphs in order to discover the k
closest vertices to a given vertex. Another work [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] try to deal with the concept
of x y distance-constraint reachability problem. In particular, given two
vertices x and y, they try to solve the problem of computing the probability that
the distance from x to y is less than or equal to a user-dened threshold. In order
to solve this problem, they proposed an exact algorithm and two reachability
estimators based on probability sampling.
6
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>In this paper a framework based on uncertain graphs able to deal with
collaborative ltering problems has been presented. The evaluation of the proposed
approach has been reported by applying it to a real world dataset and proving
its validity in solving simple and complex collaborative ltering tasks. As future
development we will conduct further experiments in order to accurately
validate the framework. We will study how the size of the neighbourhood of each
node, during the graph construction phase, could inuence the quality of the
predictions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Adamic</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adar</surname>
          </string-name>
          , E.:
          <article-title>Friends and neighbors on the web</article-title>
          .
          <source>Social Networks</source>
          <volume>25</volume>
          (
          <issue>3</issue>
          ),
          <volume>211230</volume>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Breese</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heckerman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kadie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Empirical analysis of predictive algorithms for collaborative ltering</article-title>
          .
          <source>In: Proceedings of the14th Annual Conference on Uncertainty in Articial Intelligence (UAI98)</source>
          . pp.
          <volume>4352</volume>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Colbourn</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <source>The Combinatorics of Network Reliability</source>
          . Oxford University Press (
          <year>1987</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Desrosiers</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karypis</surname>
          </string-name>
          , G.:
          <article-title>A comprehensive survey of neighborhood-based recommendation methods</article-title>
          . In: Ricci,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Rokach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Shapira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Kantor</surname>
          </string-name>
          , P.B. (eds.)
          <source>Recommender Systems Handbook</source>
          , pp.
          <fpage>107144</fpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Getoor</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diehl</surname>
            ,
            <given-names>C.P.</given-names>
          </string-name>
          :
          <article-title>Link mining: a survey</article-title>
          .
          <source>SIGKDD Explorations</source>
          <volume>7</volume>
          (
          <issue>2</issue>
          ),
          <volume>312</volume>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Getoor</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taskar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning)</article-title>
          . The MIT Press (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>F.P.</given-names>
          </string-name>
          :
          <article-title>Assessing experimentally derived interactions in a small world</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>100</volume>
          (
          <issue>8</issue>
          ),
          <volume>43724376</volume>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Goldenberg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kubica</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Komarek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A comparison of statistical and machine learning algorithms on the task of link completion</article-title>
          .
          <source>In: KDD Workshop on Link Analysis for Detecting Complex Behavior</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hintsanen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toivonen</surname>
          </string-name>
          , H.:
          <article-title>Finding reliable subgraphs from large probabilistic graphs</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>17</volume>
          (
          <issue>1</issue>
          ),
          <volume>323</volume>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Jeh</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Widom</surname>
          </string-name>
          , J.:
          <article-title>Simrank: a measure of structural-context similarity</article-title>
          .
          <source>In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <fpage>538543</fpage>
          . KDD '02,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Liu,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.C.</surname>
          </string-name>
          :
          <article-title>Discovering highly reliable subgraphs in uncertain graphs</article-title>
          .
          <source>In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <fpage>9921000</fpage>
          . KDD '11,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Liu,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <surname>H.</surname>
          </string-name>
          :
          <article-title>Distance-constraint reachability computation in uncertain graphs</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .
          <volume>4</volume>
          ,
          <issue>551562</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>L.:</given-names>
          </string-name>
          <article-title>A new status index derived from sociometric analysis</article-title>
          .
          <source>Psychometrika</source>
          <volume>18</volume>
          (
          <issue>1</issue>
          ),
          <volume>3943</volume>
          (
          <year>1953</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Koller</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
          </string-name>
          , N.:
          <article-title>Probabilistic Graphical Models: Principles and Techniques</article-title>
          . MIT Press (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Koren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bell</surname>
            ,
            <given-names>R.M.:</given-names>
          </string-name>
          <article-title>Advances in collaborative ltering</article-title>
          . In: Ricci,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Rokach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Shapira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Kantor</surname>
          </string-name>
          , P.B. (eds.)
          <source>Recommender Systems Handbook</source>
          , pp.
          <fpage>145186</fpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>M.E.J.</given-names>
          </string-name>
          :
          <article-title>Clustering and preferential attachment in growing networks</article-title>
          .
          <source>Phys. Rev. E</source>
          <volume>64</volume>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>M.E.J.:</given-names>
          </string-name>
          <article-title>The structure of scientic collaboration networks</article-title>
          .
          <source>Proceedings of the National Academy of Sciences of the United States of America</source>
          <volume>98</volume>
          (
          <issue>2</issue>
          ),
          <volume>404409</volume>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Popescul</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ungar</surname>
            ,
            <given-names>L.H.</given-names>
          </string-name>
          :
          <article-title>Statistical relational learning for link prediction</article-title>
          .
          <source>In: IJCAI03 Workshop on Learning Statistical Models from Relational Data</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Potamias</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonchi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gionis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kollios</surname>
          </string-name>
          , G.:
          <article-title>k-nearest neighbors in uncertain graphs</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .
          <volume>3</volume>
          ,
          <issue>9971008</issue>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Sachan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ichise</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Using semantic information to improve link prediction results in network datasets</article-title>
          .
          <source>IACSIT International Journal of Engineering and Technology</source>
          <volume>2</volume>
          (
          <issue>4</issue>
          ),
          <volume>7176</volume>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Taranto</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Di Mauro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Esposito</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Probabilistic inference over image networks</article-title>
          .
          <source>Italian Research Conference on Digital Libraries 2011 CCIS</source>
          <volume>249</volume>
          ,
          <issue>113</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Taskar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abbeel</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koller</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Link prediction in relational data</article-title>
          .
          <source>In: in Neural Information Processing Systems</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>von Luxburg</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radl</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Hitting and commute times in large graphs are often misleading</article-title>
          .
          <source>CORR</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Witsenburg</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blockeel</surname>
          </string-name>
          , H.:
          <article-title>Improving the accuracy of similarity measures by using link information</article-title>
          . In: Kryszkiewicz,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Rybinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Skowron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Ras</surname>
          </string-name>
          ,
          <string-name>
            <surname>Z.W</surname>
          </string-name>
          . (eds.)
          <source>ISMIS. Lecture Notes in Computer Science</source>
          , vol.
          <volume>6804</volume>
          , pp.
          <fpage>501512</fpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Zan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsinchun</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Link prediction approach to collaborative ltering</article-title>
          .
          <source>In: Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries</source>
          . pp.
          <fpage>141142</fpage>
          . ACM Press (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Zhu</surname>
          </string-name>
          , J.:
          <article-title>Mining web site link structures for adaptive web site navigation and search</article-title>
          .
          <source>In: PhD thesis</source>
          . University of Ulster (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics</article-title>
          .
          <source>In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <fpage>633642</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , Zhang, S.:
          <article-title>Finding top-k maximal cliques in an uncertain graph</article-title>
          . International Conference on Data Engineering pp.
          <volume>649652</volume>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , Zhang, S.:
          <article-title>Mining frequent subgraph patterns from uncertain graph data</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>22</volume>
          ,
          <issue>12031218</issue>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>