=Paper= {{Paper |id=None |storemode=property |title=Talash: Friend Finding In Federated Social Networks |pdfUrl=https://ceur-ws.org/Vol-813/ldow2011-paper08.pdf |volume=Vol-813 |dblpUrl=https://dblp.org/rec/conf/www/DhekaneV11 }} ==Talash: Friend Finding In Federated Social Networks== https://ceur-ws.org/Vol-813/ldow2011-paper08.pdf
    Talash : Friend Finding In Federated Social Networks ∗

                                               †                                                    ‡
                       Ruturaj Dhekane                                             Brion Vibber
              Indian Institute Of Technology, kanpur                                   StatusNet
                      ruturaj@cse.iitk.ac.in                                     brion@status.net



ABSTRACT                                                           Keywords
In large online social networks, Friend Recommendation has         Friend Recommendation, Federation, Social Networks
evolved into an interesting problem. We try to find known
acquaintances and new interesting friends on a Federated           1.    INTRODUCTION
Social Network (FSN) , using StatusNet as our platform.
                                                                      Online social networks have evolved over the past few
FSNs are decentralized networks on the internet which can
                                                                   years and have been interest of research for the commu-
interoperate using the OStatus Suite of protocols. Friend
                                                                   nity. Social graph analysis has opened new avenues for bet-
finding on these networks is hard because we do not know
                                                                   ter user experience and expansion of these networks. The
the existence of other social networks or Users. We show
                                                                   term social networking is now synonymous with the various
how Linked Data representation like FOAF can solve this
                                                                   activities such as commenting, replying, direct messaging,
problem.
                                                                   like/faving, updating status etc.. Typically an Online So-
   We devise a model for the Federated Network centered
                                                                   cial Network(OSN) for a particular User has three parts to
around a User and use it to define the problem of Friend
                                                                   it [4] . Every User owns a Public or semi public profile within
Finding. The solution uses two phases, first known as Quick
                                                                   a bounded system. Secondly, it displays a list of Users with
Connect which tries to find old acquaintances. The second
                                                                   whom they share a connection, either one way or two way.
phase, Delayed Connect uses the Social Graph of Users to
                                                                   Thirdly, the dynamism in the OSN is due to the actions
find prospective friends. We show how the FSN information
                                                                   of the User to view, traverse and interact with this list of
centered around a User can be extracted from FOAF entries
                                                                   connections.
and generate new recommendations. We shall illustrate the
                                                                      Popularity of OSN sites like Orkut, Facebook, Twitter has
working of Talash as a part of StatusNet. We experimented
                                                                   generated huge amount of data about the various interaction
on the existing FSN and collected feedback from its Users.
                                                                   of Users on the Internet. Social network analysis deals with
The results are encouraging and open new avenues for Friend
                                                                   study of these networks and their properties[13]. Social net-
Finding on the Internet.
                                                                   works fit a scale free model, have small world properties and
   To the best of our knowledge, this is the first study of Fed-
                                                                   show a community structure [14] [15].
erated Social Networks and the problem of Friend Finding
                                                                      Many forms of social networking have existed before OSN’s.
in them.
                                                                   Interaction on email within an organization has been an
                                                                   area of interest[3]. A drawback with all these OSN’s is the
                                                                   bounded environment in which people need to interact. A
Categories and Subject Descriptors                                 public profile on one OSN means that he can only inter-
E.1 [Data Structures]: Graphs and networks; H.3.5 [Online          act with other User’s on that OSN and cannot interact with
Information Services]: Web-based services                          Users from external OSN without explicitly registering a new
                                                                   profile on it. The Linked data initiative however envisage a
                                                                   more open and interconnected social networking experience.
General Terms
                                                                   1.1   Federated Social Networks
Algorithms, Design, Performance
                                                                      A Federated Social Network (FSN) tries to break the bound-
∗Part of this work was done as Google Summer Of Code               aries of the specified system in which the User interacts. A
                                                                   User owns an online profile page (website) on which he de-
2010 project for StatusNet Inc. Part of this was also done at
Indian Institue of Technology, Kanpur, as a part of authors        scribes himself and his connections. The connections, or
masters program.                                                   friends list is described in a machine readable format [25],
†Student of Computer Science And Engineering at Indian             hence any change in his connections can be made easily[28]
Institute Of Technology, Kanpur, India.                            [20][29]. The advantage of Federated Social Network is that
‡Senior Software Architect at StatusNet.                           each User controls his presence on the internet and can in-
                                                                   teract with other Users using a set of protocols such as the
                                                                   OStatus Suite [21][27]. It allows interoperability between
                                                                   different OSN’s and give a distributed control to their own-
                                                                   ers. This does not bind any User to fixed rules like those
Copyright is held by the author/owner(s).                          in Online Social Networking Services (OSNS). StatusNet[10]
LDOW2011, March 29, 2011, Hyderabad, India.
is an open source microblogging [18] software that provides         StatusNet is a microblogging platform that enables a User
such a facility to the Users. StatusNet was chosen for ex-        to set up his own public profile. The activities of this User
perimentation over other nascent projects like Thimbl, Ap-        are now governed by his own regulations. He is not under
pleseed, Diaspora, OneSocialWeb and Elgg.                         any central authority or social networking service. The Sta-
                                                                  tusNet software provides a set of protocols to interact with
1.2    Contributions                                              other independent profiles on the Internet. Since each of
   In such a federated scenario we study the problem of friend    these profiles is federated (individually controlled and not
recommendation. Since each individual exists on different         under a central authority) in its activities and regulated by
websites, it is very difficult to find a specific individual on   the User’s themselves, we call this system a Federated So-
the Internet that you already know, unless you know the           cial Network.
exact website on which he exists. Once we know the website          In this paper we study the task of Friend Finding on this
that profile can be accessed.                                     federated social network. In the following sections we give
   In this work, we model the Federated Social Network of         a brief overview of the OStatus Protocol suite and a cen-
a User in the form of a graph. We give the problems faced         tralized index of the Social Network on the internet - Social
by a system for Friend Finding in this setting. Initially we      Graph API.
only consider finding those friends that a User already knows
beforehand. Further we design an automated friend recom-          2.1.1    OStatus Protocol Suite
mendation system for the Users. We show how the scale                Previously known as the OpenMicroBlogging [17] Proto-
free nature of a social network affects this recommendation       col, OStatus is an open source specification for interoper-
system. We then exploit the properties of the social graph        ability between various sites. This allows various Users on
centered around a User to reduce the overheads in recom-          these sites to interact with each other. The interaction is in
mending new and finding interesting friends for a User.           the form of subscriber-subscription, updating status, repeat
   The remainder of paper is organized as follows. Section        updates of your connections and mark them as favorite. OS-
2 contains a review of technologies used, a quick overview        tatus is a suite of protocols which give the specification for
of the OStatus Suite of protocols and Social Graph API.           such interoperability. There are mainly four protocols in the
Section 3 builds a model of social graph and we formally          suite as described below.
define the problem of Friend Finding. Section 4 describes            WebFinger : an open protocol to identify Users with their
our approach to finding friends in federated social networks.     email addresses [32]. This allows OStatus Users to refer to
We give a simple method to find new acquaintances on the          other Users as someuser@example.com.
Internet, or those friends that, you did not know, already           PubSubHubBub: an open protocol based upon Publish
existed. We give the problems faced by this system and how        Subscribe architecture [23]. It allows a publisher to dis-
the nature of graph centered around a User in Federated So-       tribute content to its subscribers by using an intermediate
cial networks can be used to leverage the drawbacks. Finally      hub. When a hub server receive updates, it multicasts the
we provide the experimental results to show the success of        new or changed content to all registered subscribers.
our method.                                                          Salmon: [22] an open protocol to let information like
   This system named Talash was built for the StatusNet           comments on a post to flow from subscribers to the publish-
software and the results presented here are feedbacks from        ers. When the information reaches the source of the post
the users of Identi.ca [9]. To the best of our knowledge, this    it is re-published by the original content creator so that it
is the first work that focuses on Federated Social Networks       reaches back to all its subscribers.
and the problem of Friend Finding in them.                           Activity Streams: interactions of a User with his social
                                                                  network are published to his subscribers and appear as a
2.    PRELIMINARIES                                               stream of activities [30].
                                                                     Any OSNS can independently develop a website which
2.1    Federated Social Network                                   uses OStatus Protocols and can become a part of the FSN.
   To define a Federated Social Network we break the phrase       Any User who has OStatus enabled on his profile page can
into two parts and try to infer its collective meaning. The       subscribe to or be subscribed by other Users on the FSN
term Social Network here is synonymous to the interactions        and we call such profiles OStatus Subscribable.
among individuals with each other. Since we are studying
OSN’s we call the participants of this network as Users. Ev-      2.1.2    FOAF and Social Graph
ery User has an online presence in the form of a web-page,           FOAF [28] [20] stands for Friend Of A Friend is specifi-
also known as the Profile Page. That presence may be ei-          cation to describe a User and a list of his connections. The
ther public or access controlled. Many different systems have     FOAF is described using the Resource Description Frame-
been developed for securing privacy issues on OSN’s[34][5].       work [25] and stored in a machine readable format. The
If it is public, any one can access and view the information      machine readability feature powers the automation of friend
available on it.                                                  finding exercisee in this work. Users can define their own
   Typically, Users sign up on Online Social Networking Ser-      connections by describing their links with other Users and
vice (OSNS) e.g. Facebook, Orkut for networking with friends      define a relationship between them.
and family. All the mechanism of social network interactions         The social interaction of individuals can be modeled as
are governed by and limited to the regulations of that online     a graph, with individuals being nodes and the interactions
service. There are other Users signed up on the same OSNS         being edges if one User interacts with the other. For our
with whom hhe can interact. Usually Users on one social           problem we look at the social graph generated by the social
networking service cannot interact with a User on another         interactions of Users on the internet. There are millions of
social networking service.                                        Users on the internet that are part of various Online Social
Networks. We study the graph whose nodes are public pro-
files and those that declare their connections publicly. For-
mer work [26] has described how FOAF data can be used
to develop Social Graphs of OSN. FOAF can be used in a
decentralized manner to declare a Users social connections.
Publicly declared FOAF’s can be parsed to create a social
graph. Thus FOAF creates a base structure for the Feder-
ated Social Network.
   Google has indexed the FOAF data on the internet and
made it available as the Social Graph API [2]. We can query
a User by his public profile page (URL) to discover all his
public existences on the Internet as well as his publicly de-
clared connections. We directly use this API for our Friend
Finding algorithm.

2.2     Notations And Definitions
  We now define certain notations which we shall use often
in the paper.

     • Users: an individual who uses and interacts on the
       Federated Social Network.

     • Connections: the set of other Users with whom one
                                                                  Figure 1: A Social Network Graph Of A User. The
       User interacts. The interaction may be one way or two
                                                                  User number 65 is centered.
       way.Connection are neighbors of a User on the social
       graph.

     • Public Profile: an online webpage which describes the      URL’s are same, then they represent the same User, hence
       User and lists his publicly described connections.         we assume that all Users Ui are distinct and hence unique.
                                                                     For every User Uj , he keeps a list of connections in two
     • Status Update: an update posted by the User on his         separate lists. Sjin ⊆ U \ Uj is the set of all Users that
       Profile Page that can be seen by everyone else.            subscribe to User Uj . Sjout ⊆ U \Uj is the set of all User’s to
                                                                  whom Uj subscribes to and hence known as subscriptions of
     • Subscriptions: list of connections whose activities User   Uj . All the User’s Uk ∈ U \ (Sjin ∪ Sjout ∪ Uj ) are unknown
       follows and declares on his Profile.                       to the User Uj and is denoted by the set H.
                                                                     The interactions in this social networks are modeled as a
     • Subscribers: list of connections that follow a Users ac-
                                                                  graph. The G = (V, E) where V is set of vertices’s and E
       tivities.
                                                                  is a set of edges. The V here is a set of User’s in the social
     • Friends: a common word to describe the connection.         network, also equal to U . Each User (node) is uniquely de-
       It might be an acquaintance, known individual, a sub-      fined by is URL of its public profile. The edges are directed
       scriber or a subscription.                                 and Eij shows an edge from User Ui to User Uj .
                                                                     We define a directed edge Eji if Ui belongs to the sub-
                                                                  scriptions list of the User Uj , that is Uj ∈ Siin , and directed
3.    NETWORK MODEL                                               edge Eij if Uj belongs to the subscription list of User Ui ,
  There already exist many systems on the web that behave         that is Uj ∈ Siout .
in a decentralized mechanism.TCP/IP, Email, DNS operate              Since the model is federated, any User Ui can never view
in a distributed manner on the Internet [8]. Here we define       the complete social network. He can only see or operate
the Federated Social Network centered around a User. Each         upon the network with Ui as the center and intereact with
User (U1) has a profile page which is publicly displayed.         Siin ∪ Siout . This is the FSN centered around a User. If he
This is a web page on the Internet and can be visited by          needs to query the social graph of any User Uk , he sends
navigating to the URL of the webpage. He keeps a list of          a request to the profile of Uk and is returned with the so-
his subscriptions and subscribers that follow his activities      cial graph of User Uk as its center. Figure 1 shows a FSN
on the social network. Every subscriber or subscription is        User labeled 65, and all his connections. All the friends are
another User (U2) on the internet who is part of the fed-         connected to the User 65, and may have interconnections
erated social network. This User also own a profile page,         between themselves.
which is completely under his control. The User (U1) can             Problem Statement: For every User Ui we choose a set
visit his connection by using the URL of the profile page of      Ri ⊆ H such that for every Rij ∈ Ri
other User(U2).                                                   1. Ui knows Rij
                                                                  2. Ui will find Rij interesting, if they know each other.
3.1     Mathematical Model                                           The notion of knows is defined by the fact that, one User
  Consider the federated social network existing on the In-       appears in the others address book or has communicated
ternet. Let there be N Users, each denoted by Ui ∈ U where        with the other atleast once over email or on other public
U is the set of all Users and i ∈ 1...N . Every User Ui has       forums or online social networks. The relationship strength
a label which is a unique URL on the Internet. If any two         between two Users can be modeled by the interaction activ-
ity and the notion of knows can be made stronger [33].
   The definition of interestingness of one User to another
is relative and we say that one User will be interesting to
another if they share some common attribute. Tags for peo-
ple and status updates can also determine intrestingness of
a User [11].


4.    FRIEND FINDING
   Consider a new User on the Internet, a new addition to
the FSN. Initially he does not have any subscriptions and
subscribers and his profile URL is unknown to the rest of
the nodes in FSN. We have no social graph associated with
this User at the center that can be analyzed to find the first
set of subscriptions for our User. Its a Cold Start - where
we don’t have apriori knowledge about the User’s friends
and interests. Thus we divide the problem of finding friends     Figure 2: OAuth Dance and Social Graph API Re-
in two parts. The first deals with the cold start to provide     quest.
the User first set of subscriptions to interact with. We call
this Quick Connect. The second approach known as De-
                                                                 Users intervention. The User can alternatively revoke an
layed Connect deals with analysis of User’s social graph and
                                                                 Authorization to stop access.
finding prospective friends.
                                                                    When returned from delegating authorization, the User is
                                                                 shown a list of his contacts page by page. Here he is given an
4.1     Quick Connect                                            option to email the contact to invite him to create his own
   The idea of Quick Connect is to allow a User to gener-        presence on the Federated Social Network. Simultaneously,
ate his first set of connections as fast as possible. One of     the request is en-queued in the StatusNet software’s Queue
the largest data stores of information about the different       [24] to update the list of contacts in background. If the User
contacts of a User is the Address Book. An address book          navigates away from the list of contacts, the Queue Daemon
associated with online email services stores the frequently      ensures safe download of all email contacts.
contacted individuals. Email data can also be parsed to             The background process of download the address book
find the most contacted friends of a User and to mine their      additionally queries each email address on the Social Graph
Social Network [7]. Email contact lists can be optionally be     API. This returns a list of public profiles of the contact.
stored in a flat file as comma separated values. A vCard         If these are OStatus Subscribable then the User is notified
[31] format can also be used to store information about an       about it. This way User’s can directly subscribe to FSN
individual.                                                      User’s whom they already know. Figure 2 shows the OAuth
   Address books generally have a particular format which        Authorization followed by handing over of OAuth tokens to
can be parsed to get the information about the contacts.         the StatusNet Queue Handler. The address book is down-
They have various fields which can be of interest to our         loaded and Social Graph is queried to check whether it is
system. We extract the email address field from the User’s       OStatus Subscribable.
address book for all his contacts and use the Social Graph          Quick Connect is used partially in many social network-
API to search for the contacts OStatus Subscribable public       ing websites. This is the first instance in which we search
profiles.                                                        for User’s not on the same domain. In case a centralized
   Since most of the address books are easily available from     database of social graph as provided by the Social Graph
email service providers, its easy to access such contact lists   API does not exist, WebFinger protocol can be used to query
and generate a list of email addresses. OAuth [16] is used to    the contact email address. The Linked Data representation
authorize the StatusNet software to access the User’s address    of friends list in the form of FOAF can be parsed to obtain
books. We could have used other methods of authorization         this information.
to access this data but we chose OAuth so that the User’s           The method has a disadvantage of many HTTP GET re-
password is never stored in the software. Large number of        quests to download the address book and to query the pro-
email service providers provide OAuth end points for access-     file page of each contact through the Social Graph API. It
ing this data which makes our method more flexible to suit       is a long process for address books containing thousands of
to any service. These services also offer a wide variety of      contacts. The use of StatusNet background queues coupled
data format in which they provide the contact lists. When        with OAuth access token to a User’s data allow download of
ever available we chose the PoCo [6] address book format.        all the contacts without the User’s intervention.

4.1.1    Anatomy Of Quick Connect                                4.2   Delayed Connect
  The User is given a list of email service providers from          In an online social network, a User tries to expand his so-
which the address book information will be retrieved. The        cial boundaries by connecting to more friends. These maybe
User is redirected to the OAuth Authorization page on the        his long lost classmates, new friends based on similar inter-
service providers OAuth endpoint. After the User delegates       ests or connecting to new communities. The graph expands
authorization, the service provider returns an OAuth Ac-         slowly and the Quick Connect provides enough fuel to give
cess Token and and OAuth Secret Token. This pair can             him the initial set of connections to interact with. Once all
be used in future to access the address book data without        the contacts in the address book are analyzed for their pres-
                                                                 profiles will denote the same attribute.
                                                                    A Similarity function is defined which takes two profiles
                                                                 P1 and P2 and returns a similarity score between them. The
                                                                 function returns a score between the two attribute vectors
                                                                 provided by each profile. Various metrics such as common
                                                                 friends, graph distance, Adamic-Adar [1] can be used to find
                                                                 the similarity score[12]. The score is normalized to the range
                                                                 [0,1] with higher value signifying higher similarity between
                                                                 Users. We claim that a User finds those profiles interesting
                                                                 with whom it has higher similarity.
                                                                    Since each profile is not centrally controlled and Attributes
                                                                 can be newly added by installing plugins [19] on the Status-
                                                                 Net software, the problem of defining a Similarity function
                                                                 gets complicated. We solve it by defining an endpoint in
                                                                 the StatusNet software, where these plugins can register a
Figure 3: Plot showing number of Friends and num-
                                                                 handler function. This handler function is designed by the
ber of Friends of Friends on logarithmic scale, of
                                                                 creator of the plugin, whose responsibility is to find the sim-
Users of an OSNS. It can be noticed that Users with
                                                                 ilarity score between attribute values of two different profile.
higher ID, are newer, hence have lesser Friends, but
high value of Friends of Friends
                                                                 4.2.2    Searching Friends
                                                                    We now turn our attention to identifying new connections
ence on the Internet, the responsibility of further expansion    whom we can recommend our User to subscribe to. We
of the User’s social graph solely lies with him. This is the     return to the mathematical model described in Section 3.1
section where we illustrate the friend finding algorithm of      and remind that a User does not have holistic view of the
Talash.                                                          FSN. The User can only see and access his connections, all
   The existing social graph of the User is now analyzed and     his subscriptions and subscribers, and the publicly displayed
uses the social graph API to recommend prospective friends.      connections of his friends. This idea is captured in the FSN
We also define the problem of finding connections that might     centered around a User. User does not know anything be-
be interesting to the User. The algorithm will try to uncover    yond two hops from the him.
the prospective social network as seen by a User.                   The Social Graph API is used to identify new connections
   This method is named Delayed Connect because the exe-         for the User (say Ui ). For every Subscription Sij ∈ Siout
cution of this mechanism is not instantaneous and is heavy       for User Ui we query the Social Graph API for his publicly
on resources e.g. Bandwidth. We can run this mechanism           declared connections. We parse the response from the API
for any User whenever we have available resources and sus-       to generate a list of Users which are at two hop distance
pend the completion in its absence. Delayed Connect be-          from Ui , let this denote a set F oafi .
lieves in expanding the social graph of the User very slowly,       For every profile F oafik in F oafi we use two criterion on
so as to give him enough time to share his thoughts and          which we recommend the profile to the User.
interact with his existing network.                                 Friends You Already Know : For every profile F oafik in
                                                                 F oaf we re-query the Social Graph API to find all his public
4.2.1    Interesting Connections                                 profiles and the publicly declared connections on that profile.
   A User might be interested in interacting with friends from   If F oafik is connected to Ui on some other social networking
same community, geographical location or similar interests.      service, it is possible that they know each other beforehand
Interestingness of a profile is defined with respect to our      and its safe to recommend the profile to the User. Instances
Users profile information. Interestingness can also be de-       of such a recommendation is when two friends are connected
fined in terms of Similarity between two profiles. If the two    on sites like Twitter or Flickr! and do know about each
profiles display content on similar topics, if the biographies   others OStatus Subscribable accounts. The social graph of
of both the User’s declare a geographic location close by, or    prospective connection is queried and its checked whether
they are both connected to same set of friends, then we say      the User Ui is connected to him on other OSNS. If yes, we
that the profiles are similar and they might be connected for    can recommend the new connection’s OStatus account to Ui
interaction on the social network. Many different similarity     for subscription.
functions can be defined given the various attributes of each       Interesting Profiles: For every profile F oafik the Simi-
User [12].                                                       larity function is provided with two profiles, F oafik and
   We augment the model given in Section 3.1 to include          Ui . The function returns a similarity score based on the at-
Profile attributes. Profile attributes is a set of attributes    tributes of two profiles. A profile is called Interesting if the
declared by the profile owner as being his own. Formally         score is greater than 0. The score can be also used to rank
we augment every node of the FSN by a feature vector             each connection in F oafik in decreasing order of their simi-
F = a1 , a2 , ..., an where ai is a profile attribute denoted    larity scores. The connection with higher similarity score is
by a key value pair of {Attribute Name : Attribute Value}.       more interesting than the others.
Since each User on FSN controls his profile, the number of
attributes he declares is his wish. Hence different User’s may   4.2.3    Caching Techniques
have different set of attributes and different in number. We       The number of friends of friends found at two hop dis-
assume that Attribute Names are are unique and uniformly         tance from a User in FSN grow exponentially. For exam-
used. Hence an attribute of name ’Location’ on two different     ple, consider a User Ui with 400 subscriptions. Each of the
                                                                       Table 1: Summary of dataset FSN Graph.
                                                                                   Parameter            Value
                                                                                 Number of nodes       285,198
                                                                                 Number of edges       1663690
                                                                                 Mean out-degree       11.7045
                                                                         Strongly Connected Components   6724




Figure 4: Showing a User and the communities to
which he belongs. For each community we generate
the FOAF entries and consider them as prospective
recomendations.
                                                                 Figure 5: A plot showing number of people that ac-
                                                                 cepted at least 50% of the recommendations given
subscription has at least 400 connections. The number of
                                                                 the size of recommendation set. It can be seen that
HTTP requests generated to find interesting friends for Ui
                                                                 the number of positives lies close to 4-5 connections
is about 160000. Figure 3 shows how the number of Friends
                                                                 being accepted. The plot also shows the percent-
and number of Friends of Friends varies for every User. The
                                                                 age of Users that accepted 100% of the connections
X axis shows the Users of the FSN and places them in the
                                                                 recommended to them.
order they joined the FSN. The Users with higher User ID
have lesser number of Friends since they have joined recently.
However they have a very high value for Friends of Friends.      what plugins to run and control the features displayed on
   We implemented a Cache to store the Social Graph API.         their profile page such as their location, tags and biogra-
For each connection Sij for User Ui , we store its connections   phies. The results of this system can be only seen when the
in a database with a time-stamp. The tuple stored is of the      algorithm runs on the Users StatusNet instance and the net-
form {Sij , Sjk , TIMESTAMP}, where Sij ∈ Siout is set           work evolves with the help of these algorithms. User’s may
of subscriptions User Ui and Sjk is the set of subscriptions     independently add new friends and connect to newly joined
of Uj . The cache served the twin purpose of skipping the        friends and family members.
HTTP request to Social Graph API in immediate future                We experiment on a snapshot of the Federated Social Net-
(Till the entry was invalidated) and allowing interruptions      work using the network formed on Identi.ca [9] as our ba-
in the Delayed Connect due to network disruption.                sis. Identi.ca is a OSNS from StatusNet that gives each
                                                                 User the complete control on their activities and allow in-
4.3     Algorithm                                                teroperability to other OStatus Subscribable accounts such
  We now outline the algorithm used to identify new con-         as Blogger.com and Youtube.com. We use the Subscriber-
nections which can be recommended to the User.                   Subscription data from Identi.ca as our dataset for experi-
                                                                 mentation. The remarkable property of this dataset is that
     1. Consider the FSN centered at the User. Find the com-
                                                                 Identi.ca Users subscribe to or are subscribed by other Users
        munities to which he belongs by removing the User
                                                                 on the FSN, and these links are also included in the dataset.
        and finding connected components in this graph. 4.2.3
                                                                 Hence we get a partial but relevant chunk of the FSN for our
        shows 3 such communities formed.
                                                                 experimentation. The different parameters of this dataset
     2. For every friend in the community we can find his        are shown in Table 1. With more than 280,000 nodes we see
        FOAF to find prospective connections. Top log(n)         that there are 6724 strongly connected components. This
        friends (ranked by number of subscribers) are chosen     shows that the network is diverse and has various communi-
        and their FOAFs are requested.                           ties which do not overlap. We now give our experimentation
                                                                 strategies and the method to evaluate our results.
     3. For every FOAF, the similarity function evaluates the
        score of interestingness between the User and FOAF.      5.1    Manual Feedback
                                                                   A manual feedback is the best test of the system and can
     4. Top log(|F OAF |) connections of the FOAF are re-
                                                                 gauge the usefulness of our algorithm. The choice of friends
        turned as recommendations.
                                                                 a User would like to have is dependent on the User’s inter-
                                                                 ests. We used the Similarity function to find the similarity
5.     EXPERIMENTAL RESULTS                                      between the User and the prospective connections and hoped
  Experimenting on the system designed and discussed in          that the User will accept them. We were not able to test
this paper is a challenging task since the StatusNet software    Quick Connect for these Users because each one owned their
can be configured by any User of the FSN. User’s can select      instance and did not have our system installed.
 Table 2: ASPL for prospective recommendations
   Community Details      Initial ASPL Final ASPL
  Developers of Identi.ca     1.807       1.602
  Group of entrepreneurs      1.974       1.965
    FOSS contributors         2.453       2.702
         Family               1.333       1.000


   Each User was provided with a set of prospective connec-
tions called the recommendation set. The User was asked to
mark the connection as accepted on rejected. We call the
accepted connections as positives. Figure 5. shows the per-
centage of positives against the size of the recommendation      Figure 6: A log-log scale plot showing the different
set. It was seen that the User’s accepted not more than 4-5      sizes of communities and number of Users belonging
prospective connections. For a recommendation set of size        to communities of that size.
larger than 10, very few Users accepted all the connections.
A User is recommended a set of size larger than 10 when
the User is part of many communities (so that each com-
munity recommends at least one connection) or belongs to
one large community (size of log n). A trend was seen that
Users marked at least one positive from each community
and tended to discard a majority when most connections
came from a single community. This means that the algo-
rithm is able to find the best recommendable connection in
each community to which the User belongs with the quality
decreasing with increase in size of community.

5.2   How Good is a Recommendation?
   For every community of the User which recommends a            Figure 7: A log-linear scale plot showing the number
new connection we find out how much the cluster improved         of FOAF and number of actually HTTP requests
in its bonding. The average shortest path length of the clus-    made to find new friends.
ter are good indications of how the individual communities
to which the new User belongs has improved the bonding
in the system. The average shortest path length in a clus-       connection, the ASPL value increased.
ter quantifies this value with a value of one meaning tighter      The measure of ASPL can be used to rank the new con-
interaction where everyone in the community knows each           nections in the network. A User which tries to converge
other (Clique) and a higher value meaning that the com-          the ASPL of the community towards 1 will be given higher
munity is loosely interacting (everyone does not know each       ranking. The recommendations can be shown to the User in
other). A value lesser than 1 means that some Users are not      decreasing order of their rank.
reachable from some other User in the network.
   For a graph G = (V, N ), let d(u, v) be the shortest path     5.3   Clustering Advantage
length betweenP nodes u an v. The average shortest path is          The motivation for clustering a User’s FSN into communi-
                         d(u,v)
calculated as u,v∈N
                  |N ||N −1|
                                . For every community to which   ties was to find recommended seeds and to reduce the num-
the User belongs we formed a graph containing the members        ber of HTTP requests made to the Social Graph API or
of that community, the central User and the new recommen-        to WebFinger(if it existed). Figure 6. shows the size of
dation. We find the average shortest path length (ASPL) for      communities to which a particular User belongs. We see
this graph.                                                      that more than 50% of User’s belong to a single community.
   We noticed that for communities which were already cliques    Such User’s are either new and have very less subscriptions
(ASPL = 1), the new recommendation increased the value of        or tend to interact with a closed group of User’s. This re-
ASPL. For many clusters the value of ASPL before adding          enforces our idea of using recommender seeds from a com-
the recommended User was 0.66 and after adding the new           munity to find newer connections for our User. User’s from
connection increased to 1.0 converting the community into a      same community will interact closely and will not overlap
clique. This showed that a new connection was completing         significantly with other communities. For example, Users
the networks hidden links.                                       from a family may choose to interact only with their family
   Table 2. shows the initial and final ASPL for a User on the   members. A User from that community may be part of an-
FSN. The communities shown is a rough guess of the kind of       other community , say Presley Fan Club. The community
interaction the User has with them. For small communities        finding exercise can also give the User an indication of why
like Developers and entrepreneurs on the internet, the ASPL      the new connection is being recommended.
value decreased, possibly filling up gaps in the networks. For      Figure 7 shows the number of HTTP requests sent to find
a group such as FOSS contributors, the community seed            new friends for each User on the FSN. The number of such
recommended a User completely outside the network. Since         requests is reduced drastically. If all the recommendations
very few Users in the community were connected to the new        are accepted by a User, the size of each community of the
FSN centered around the User increases. Hence, the recom-        [8] A. Galloway. Protocol, or, how control exists after
mendations count will keep increasing and will expose the            decentralization. Rethinking Marxism: A Journal of
User to newer prospective friends in every iteration. The            Economics, Culture & Society, 13(3):81–88, 2001.
log n upper bound for choosing the size of recommendation        [9] Identi.ca. http://identi.ca.
set gives rise to slow growth of FSN of a User. The feedback    [10] S. Inc. http://www.status.net/.
collected from the User’s of FSN showed a trend of having       [11] H. Kwak, C. Lee, H. Park, and S. Moon. What is
lesser positives when the recommendation set was very large.         twitter, a social network or a news media? In WWW
                                                                     ’10: Proceedings of the 19th international conference
6.   CONCLUSIONS                                                     on World wide web, pages 591–600, New York, NY,
   The federated nature of the social network enpowers Friend        USA, 2010. ACM.
Finding algorithm to be executed on individual Nodes of the     [12] K. J. Liben-Nowell, D. The link-prediction problem
FSN. The network can evolve independently and expand at              for social networks. Journal of the American Society
each node as required by the User. We showed a simple                for Information Science and Technology,
mechanism to find new friends on this network and show               58(7):1019–1031, 2007.
how we can depend on the Friends Of a Friend information        [13] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel,
alone to extract this information. Clustering of a FSN cen-          and B. Bhattacharjee. Measurement and analysis of
tered around a User into communities allows expansion of a           online social networks. In In Proceedings of the 5th
Users FSN in all the domains he is interested. Community             ACM/USENIX Internet Measurement Conference
based categorization of recommendations is a valuable addi-          (IMCâĂŹ07), 2007.
tion to our algorithm and can be used in various applications   [14] M. Newman. Detecting community structure in
where the community feature can be exploited.                        networks. The European Physical Journal B -
   The algorithm makes many heuristic assumptions and ex-            Condensed Matter and Complex Systems, 38:321–330,
act evaluation of them can be only made by analyzing the             2004.
behavioral pattern of the User for whom the recommenda-         [15] M. E. J. Newman and M. Girvan. Finding and
tion is made. This is a first attempt to experiment a friend         evaluating community structure in networks. Phys.
finding algorithm on federated networks. The actual impact           Rev. E, 69(2):026113, Feb 2004.
of this algorithm can be felt only when the network evolves     [16] OAuth. http://oauth.net/.
considerably using this system.                                 [17] OMB.
                                                                     http://en.wikipedia.org/wiki/openmicroblogging.
7.   ACKNOWLEDGMENTS                                            [18] A. Passant, T. Hastrup, U. Bojars, and J. Breslin.
   We would like to thank Evan Promodrou and his team of             Microblogging: A semantic web and distributed
StatusNet developers for constant support and interaction            approach. In 4th Workshop on Scripting for the
during the Google Summer of Code 2010. We would also                 Semantic Web (SFSW2008).
like to thank Dr. Sanjeev Saxena for his guidance. A part       [19] S. Plugins.
of this work was a result of discussions with him during             http://status.net/open-source/add-ons/plugins.
the work carried out under his supervision for the Masters      [20] F. Project. http://www.foaf-project.org/.
Program at IIT Kanpur. The authors thank the anonymous          [21] O. Project. http://www.ostatus.org/.
referees for their valuable comments that helped improving      [22] S. Protocol. http://www.salmon-protocol.org/.
this paper.                                                     [23] PubSubHubBub. code.google.com/p/pubsubhubbub/.
                                                                [24] S. Queues. http://status.net/wiki/queues.
8.   REFERENCES                                                 [25] RDF. http://www.w3.org/rdf/.
 [1] L. A. Adamic and E. Adar. Friends and neighbors on         [26] M. Rowe. Interlinking distributed social graphs. In
     the web. SOCIAL NETWORKS, 25:211–230, 2001.                     Proceedings of Linked Data on the Web Workshop,
 [2] S. G. API. http://code.google.com/apis/socialgraph/.            World Wide Web Conference 2009, April, Spring 2009.
 [3] C. Bird, A. Gourley, P. Devanbu, M. Gertz, and             [27] O. Spec. http://www.ostatus.org/specification.
     A. Swaminathan. Mining email social networks. In           [28] F. Specification. http://xmlns.com/foaf/spec/.
     MSR ’06: Proceedings of the 2006 international             [29] X. Specification. http://gmpg.org/xfn/1.1.
     workshop on Mining software repositories, pages            [30] A. Streams. http://activitystrea.ms/.
     137–143, New York, NY, USA, 2006. ACM.                     [31] vCard. http://www.imc.org/pdi/vcard-21.doc.
 [4] D. Boyd and N. Ellison. Social network sites:              [32] WebFinger. http://code.google.com/p/webfinger/.
     definition, history, and scholarship. Engineering          [33] R. Xiang, J. Neville, and M. Rogati. Modeling
     Management Review, IEEE, 38(3):16 –31, 2010.                    relationship strength in online social networks. In
 [5] B. Carminati, E. Ferrari, and A. Perego. Enforcing              WWW ’10: Proceedings of the 19th international
     access control in web-based social networks. ACM                conference on World wide web, pages 981–990, New
     Trans. Inf. Syst. Secur., 13(1):1–38, 2009.                     York, NY, USA, 2010. ACM.
 [6] P. Contacts.                                               [34] B. Zhou and J. Pei. Preserving privacy in social
     http://portablecontacts.net/draft-spec.html.                    networks against neighborhood attacks. pages 506
 [7] A. Culotta, R. Bekkerman, and A. Mccallum.                      –515, apr. 2008.
     Extracting social networks and contact information
     from email and the web. In In Proceedings of CEAS-1,
     2004.