<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Followee Recommender System for Information Seeking Users in Twitter</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marcelo G. Armentano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniela Godoy</string-name>
          <email>dgodoy@exa.unicen.edu.ar</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Analía Amandi</string-name>
          <email>amandi@exa.unicen.edu.ar</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ISISTAN Research Institute</institution>
          ,
          <addr-line>Fac. Cs. Exactas, UNCPBA Campus Universitario, Paraje Arroyo Seco, Tandil, 7000, Argentina CONICET, Consejo Nacional de Investigaciones Científicas y Técnicas</addr-line>
          ,
          <country country="AR">Argentina</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Micro-blogging activity on sites such as Twitter is becoming increasingly important, both as a source of real-time information and as a medium for spreading news. Finding relevant information sources among the growing number of Twitter members is essential for users who need to keep up with real-time information. In this paper we study Twitter with the aim of recommending to a target user people who publish tweets that he/she might find interesting. We evaluate and compare two recommendation approaches: the first selects a set of candidate recommendations using only the network topology, and the second exploits the user-generated content available in tweets. We report the results of a set of controlled experiments with real users carried out to evaluate and compare the performance of both algorithms.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Micro-blogging Activity</kwd>
        <kwd>Online Social Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Twitter is a social networking site that many users have adopted as a means
of disseminating (and reading) news and information. There are three main reasons for
choosing Twitter for this goal. First, unlike many other online social networks such as
Facebook, Hi5, Orkut, LinkedIn or MySpace, connections in Twitter are unidirectional:
a user decides to “follow” other users without the relation needing to be accepted
or reciprocated. Second, the 140-character limit on the messages that users can post
in Twitter (called tweets) enables users to receive their followees’ updates on almost
any mobile device, or to quickly read many of them directly on the Web or within a
desktop application. Finally, any user can easily “retweet” another user’s post, so that
information spreads from the author’s followers to the followers of those users in turn. Kwak et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] found that 77.9%
of Twitter’s connections are unidirectional and only 22.1% of the relations are
reciprocal. Moreover, 67.6% of users are not followed by any of their followees, indicating
that these users probably use Twitter as a source of information rather than to keep in
touch with friends. Kwak et al. also found that retweets collectively determine the
importance of the original tweet, expressing a form of collective intelligence. All these
facts, together with the explosive growth in the number of registered Twitter users<sup>1</sup>,
lead us to believe that information-seeking users would benefit from a recommender
system able to suggest information sources that they might be interested in following.
      </p>
      <p>In this work we study Twitter from a user modeling perspective. Our goal is to
provide information seekers with recommendations of users who publish tweets that
might interest them. To be valuable, the recommended followees should
be in the category of information broadcasters, since these users will probably generate
content that the target user may be interested in reading.</p>
      <p>Unlike traditional recommender systems, we do not have any explicit
information about the user’s interests in the form of ratings on items he/she likes or
dislikes. To profile a Twitter user, the only information available is the structure of the
followers/followees network and the tweets published in this network. Both are
considered in this paper as a means to recommend people who either belong to the user’s
neighborhood or share content-related interests.</p>
      <p>
        In this article we present two recommendation algorithms using two different
techniques: a collaborative filtering technique [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and a content-based technique [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The
first algorithm is based only on the topology of the Twitter network. It first explores the
connections starting at the target user (the user to whom we wish to recommend new
followees) to select a set of candidate recommendations, and then ranks
those candidates according to a scoring function. The scoring function we designed
involves three factors that capture the most influential properties of the Twitter
network according to previous studies. The second algorithm first creates a vector of
terms describing the interests of the target user based on the tweets published by his/her
followees. This vector is then used to discover new users who might not belong to the
target user’s neighborhood despite being similar to him/her. Since these users are not
reached from the connections starting at the target user, in principle they would not be
discovered by the topology-based algorithm.
      </p>
      <p>
        Unlike other works that focus on ranking users according to their influence in the
entire network [
        <xref ref-type="bibr" rid="ref18 ref19">18,19</xref>
        ], the algorithm we propose explores the follower/followee
relationships of the user up to a certain depth, so that more personalized factors, such as
the number of mentions of the candidates, are considered in the selection of candidates
for recommendation. Furthermore, the proposed approach was evaluated in
a controlled experiment with real users. From the experiments performed we found that
although the average precision tends to be similar for both algorithms, the content-based
approach is better at placing relevant recommendations at the top of the ranking.
      </p>
      <p>The rest of this work is organized as follows. Section 2 describes some aspects
of Twitter and discusses how related work relates to our research. Next, in Section
3 we describe our approach to the problem of followee recommendation in Twitter. In
Section 4 we present the experiments we performed to validate our proposal. Finally,
in Section 5 we discuss the results obtained and present our conclusions and future
work.</p>
      <p><sup>1</sup> In 2010 Twitter grew by more than 100 million registered accounts. http://yearinreview.twitter.com/whosnew/. Accessed on April 2011.</p>
    </sec>
    <sec id="sec-2">
      <sec id="sec-2-1">
        <title>Background and Related Work</title>
        <p>Twitter is a social networking and micro-blogging service that enables users to send and
receive messages of at most 140 characters, called “tweets” or
status updates. Relationships in Twitter are unidirectional: a Twitter user U interested
in the tweets published by another user registers as a “follower” of that user. Although
U need not follow his/her followers back, he/she can obtain the list
of users following him/her.</p>
        <p>As stated above, the Twitter network is populated with tweets. Tweets can have any
(textual) content; however, some users only publish tweets about a particular
subject, such as sports, movies, music or a particular rock band. These users can
be considered information sources or broadcasters. In contrast, many people use
Twitter to get information on particular subjects, as a form of RSS reader, registering
themselves as followers of their favorite artists, celebrities, bloggers, or TV programs.
For this latter type of user, finding high-quality and reliable information sources in the
constantly growing Twitter community is a challenging issue.</p>
        <p>
          Several recent research efforts have been devoted to understanding micro-blogging
as a novel form of communication and news spreading medium. Java et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and
Krishnamurthy et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] presented a characterization of Twitter users that groups them into
three categories. The first corresponds to information sources or broadcasters, which
are users characterized by having many more followers than the number of users they
themselves follow. The second category groups information seekers, users who
rarely post tweets of their own but regularly follow other users.
Finally, users categorized as friends or acquaintances tend to use Twitter as
a typical online social network and are characterized by reciprocity in their
relationships.
        </p>
        <p>
          The influence of users in Twitter has also been the subject of several studies. In [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] it
was shown that ranking users by the number of followers and by PageRank gives
similar results. However, ranking users by the number of retweets reveals a gap
between the influence inferred from the number of followers and the influence inferred
from the popularity of users’ tweets. Consistently, a comparison of in-degree, retweets and mentions
as influence indicators [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] concluded that the first is more related to user popularity.
Analyzing the retweets and mentions that users spawn, it was found that the most influential users hold
significant influence over a variety of topics, but that this influence is gained only through
a concentrated effort (such as limiting tweets to a single topic). TwitterRank [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], an
extension of the PageRank algorithm, tries to find influential twitterers by taking into
account the topical similarity between users as well as the link structure. Garcia et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
propose a method that weights the popularity and activity of links for ranking users. User
recommendation, however, cannot be based exclusively on general influence rankings,
since people get connected for multiple reasons.
        </p>
        <p>
          While the studies mentioned above focus on analyzing micro-blogging usage, other works
try to capitalize on the massive amount of user-generated content as a novel source of
preference and profiling information for recommendation. Chen et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] proposed an
approach to recommend interesting URLs coming from information streams such as
tweets, based on two topic interest models of the target user and a social voting
mechanism that recommends the most popular URLs within the group. Buzzer [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]
indexes tweets and recent news appearing in user-specified feeds, which are considered
examples of user preferences, to be matched against tweets from the public timeline
or from the user’s Twitter friends for story ranking and recommendation. Esparza et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
address the problem of using real-time opinions of movie fans, expressed through
short Twitter-like textual reviews, for recommendation. This work assumes that tweets
carry preference-like information that can be used in content-based and collaborative
filtering recommendation.
        </p>
        <p>
          In contrast to the previous works, which address the problem of suggesting potentially
relevant content from micro-blogging services, we concentrate on recommending
interesting people to follow. In this direction, Sun et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] propose a diffusion-based
micro-blogging recommendation framework that identifies a small number of users
playing the role of news reporters and recommends them to information seekers during
emergency events. Closest to our work are the algorithms for recommending followees
in Twitter that are evaluated and compared on a subset of users in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Multiple profiling
strategies were considered according to how users are represented in a content-based
approach, a collaborative filtering approach and two hybrid approaches. User profiles
are indexed and recommendations are generated using a search engine, which returns a
ranked list of relevant Twitter users based on a target user profile or a specific set of query
terms. Our work differs from this approach in that we do not require indexing the
profiles of Twitter users; instead, our topology-based and content-based algorithms explore
the follower/followee network in order to find candidate users to recommend.
Furthermore, our evaluation takes into account the target users’ own assessment of
their interest in the recommendations provided.
        </p>
        <p>
          Finally, the problem of helping users find and connect with people online to
take advantage of their friend relationships has also been studied in the context of social
networks. For example, SONAR [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] recommends related people in the context of
enterprises by aggregating information about relationships as reflected in different sources
within an organization, such as organizational chart relationships and co-authorship of
papers, patents and projects, among others. The work in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] presented different methods for link prediction
based on node neighborhoods and on the ensemble of all paths. These methods were
evaluated using co-authorship networks, and the authors found that there is indeed useful
information contained in the network topology alone. Chen et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] compared
relationship-based and content-based algorithms for recommending people, finding that
the former are better at finding known contacts whereas the latter are stronger
at discovering new friends. Weighted minimum-message ratio (WMR) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] is a
graph-based algorithm that generates a personalized list of friends in a social network built
according to the observed interaction among members. Unlike these algorithms, which
gather social networks in closed domains from structured data (such as
interactions, co-authorship relations, etc.), we propose two algorithms that take advantage of
the massive, unstructured, dynamic and inherently noisy user-generated content from
Twitter.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Followees Recommendations on Twitter</title>
        <p>We have designed two different algorithms for followee recommendation on Twitter.
The first algorithm is based only on the topology of the followers/followees network
and suggests users that lie within some distance of the target user in that network. The second
algorithm is content-based and aims at suggesting users that may not be in the
neighborhood of the target user, but whose tweets may be interesting to him/her.</p>
        <sec id="sec-2-2-1">
          <title>Topology-based recommender</title>
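          <p>The general idea behind this algorithm is to suggest users that are in the
neighborhood of the target user and that can be potential followees. A user’s neighborhood is
determined from the follower/followee relations in the social network. We apply the
following heuristic to obtain the list of candidate users for recommendation:</p>
          <list list-type="order">
            <list-item>
              <p>Starting with the target user <inline-formula><tex-math>u_T</tex-math></inline-formula>, obtain the list of users he/she follows; let us call this list S, i.e. <inline-formula><tex-math>S(u_T) = \bigcup_{\forall f \in \mathit{followees}(u_T)} \{f\}</tex-math></inline-formula>.</p>
            </list-item>
            <list-item>
              <p>For each element in S, get its followers; let us call the union of all these lists L, i.e. <inline-formula><tex-math>L(u_T) = \bigcup_{\forall s \in S} \mathit{followers}(s)</tex-math></inline-formula>.</p>
            </list-item>
            <list-item>
              <p>For each element in L, obtain its followees; let us call the union of all these lists T, i.e. <inline-formula><tex-math>T(u_T) = \bigcup_{\forall l \in L} \mathit{followees}(l)</tex-math></inline-formula>.</p>
            </list-item>
            <list-item>
              <p>Exclude from T those users that the target user is already following. Let us call the resulting list of candidates R, i.e. <inline-formula><tex-math>R = T \setminus S</tex-math></inline-formula>.</p>
            </list-item>
          </list>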
          <p>Each element in R is a possible user to recommend to the target user. Notice that each
element can appear more than once in R, since the unions in steps 2 and 3 keep repeated elements:
the number of times a user appears in R depends on the number of times he/she appears in the followers
and followees lists obtained at steps 2 and 3 above.</p>
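          <p>The candidate-generation heuristic above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the network is assumed to be available as two hypothetical in-memory dictionaries, <monospace>followees</monospace> and <monospace>followers</monospace>, mapping each user id to a list of user ids (in practice these lists would be fetched from Twitter). Repeated users are kept, so R is returned as a <monospace>Counter</monospace> of occurrences.</p>
          <preformat preformat-type="code">
```python
from collections import Counter


def topology_candidates(target, followees, followers):
    """Return the candidate multiset R for `target` (steps 1-4).

    `followees` and `followers` are hypothetical dicts mapping a user id
    to the list of user ids that user follows / is followed by.
    """
    # Step 1: S = users the target already follows.
    s = list(followees.get(target, []))
    # Step 2: L = followers of every user in S (duplicates kept).
    l_users = [x for user in s for x in followers.get(user, [])]
    # Step 3: T = followees of every user in L (duplicates kept).
    t_users = [y for user in l_users for y in followees.get(user, [])]
    # Step 4: R = T minus S; the Counter stores how often each user occurs.
    already_followed = set(s)
    return Counter(u for u in t_users if u not in already_followed)


# Toy network: the target "me" follows "a" and "b".
example_followees = {"me": ["a", "b"], "x": ["a", "c", "d"], "y": ["b", "c"]}
example_followers = {"a": ["me", "x"], "b": ["me", "y"]}
print(topology_candidates("me", example_followees, example_followers))
```
          </preformat>
          <p>In this toy network, user c is reached through two different followers of the target’s followees and therefore occurs twice in R, while d occurs once.</p>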
          <p>The rationale behind this heuristic is that the target user is an information
seeker who has already identified some interesting users acting as information sources,
namely his/her followees. Other people who also follow some of the users in this
group (i.e. who are subscribed to some of the same information sources) have interests in
common with the target user and might have discovered other relevant information sources on
the same topics, which are in turn their followees.</p>
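          <p>Finally, we give each unique user <inline-formula><tex-math>u_c \in R</tex-math></inline-formula> the score given by Equation 1:</p>
          <disp-formula>
            <label>(1)</label>
            <tex-math>\mathit{score}(u_c) = \frac{\mathit{occurrences}(u_c, R)}{|R|} \cdot \frac{|\mathit{followers}(u_c)|}{|\mathit{followees}(u_c)|} \cdot \frac{|\mathit{mentions}(u_c)|}{M}</tex-math>
          </disp-formula>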
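          <p>The first factor is the number of occurrences of the user in the final list
of <inline-formula><tex-math>|R|</tex-math></inline-formula> candidates for recommendation, divided by <inline-formula><tex-math>|R|</tex-math></inline-formula>. The number of occurrences of a user <inline-formula><tex-math>u_c</tex-math></inline-formula> in this
final list indicates how many (indirect) neighbors also have <inline-formula><tex-math>u_c</tex-math></inline-formula> as a
(direct) connection.</p>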
          <p>
            The second factor is the ratio between the number of followers a user has and
the number of users that he/she follows. Since we seek information sources
to recommend, we assume that such users will have many followers and
follow few people. In [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] it has been shown that the rankings of users
obtained by number of followers and by PageRank are very similar. We opted to
use this factor as an estimator of the “importance” of a given user because the number
of followers is far easier to obtain than the PageRank score in a network
with almost 2 billion social relations. Cha et al. [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] also support the view
that the number of followers, retweets and mentions are all factors that
represent a user’s influence on Twitter. They found that while the number of followers
is an indicator of a user’s popularity, retweets and mentions capture other important
factors, such as engaging the audience with valuable content.
          </p>
          <p>
            For the reason expressed above, we finally add a factor that considers the number of
times that a user has been mentioned in the social network in recent posts. According
to Kwak et al. [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] ranking Twitter users by the number of retweets shows the rise of
micro-blogging as an alternative communication medium. In other words, retweets are
considered the feature that has made Twitter a new medium of information
dissemination. Hence, we consider mentions of a user rather than retweets alone, because mentions are a
broader concept that includes retweets. The most recent mentions of a user can be easily
obtained through Twitter’s Query API, up to a maximum of M mentions. Currently M
is set to 100.
          </p>
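          <p>A minimal sketch of the scoring step follows. It treats the three factors of Equation 1 as multiplied together, and it divides by at least 1 followee as a guard against users who follow nobody; that guard is our assumption, not part of the paper. Follower, followee and mention counts are passed in as plain numbers standing in for the Twitter API calls the paper relies on, and mentions are capped at M because at most M can be retrieved.</p>
          <preformat preformat-type="code">
```python
from collections import Counter

M = 100  # maximum number of recent mentions retrieved, as in the text


def score(candidate, r, n_followers, n_followees, n_mentions, m=M):
    """Score one candidate with the three factors of Equation 1 (multiplied)."""
    occurrence_factor = r[candidate] / sum(r.values())  # occurrences / |R|
    # Followers-to-followees ratio; max(..., 1) is an assumed guard against 0.
    ratio_factor = n_followers / max(n_followees, 1)
    # At most M mentions can be retrieved, so this factor is capped at 1.
    mention_factor = min(n_mentions, m) / m
    return occurrence_factor * ratio_factor * mention_factor


def rank(r, stats):
    """Rank unique candidates; `stats` maps user -> (followers, followees, mentions)."""
    return sorted(r, key=lambda u: score(u, r, *stats[u]), reverse=True)


# Hypothetical candidate multiset and account statistics.
r = Counter({"c": 2, "d": 1})
stats = {"c": (1000, 10, 50), "d": (50000, 100, 100)}
print(rank(r, stats))
```
          </preformat>
          <p>Here d is ranked first even though it occurs only once in R, because its follower ratio and mention count outweigh c’s higher occurrence factor; the multiplicative form lets any one weak factor pull a candidate down.</p>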
        </sec>
        <sec id="sec-2-2-2">
          <title>Content-based recommender</title>
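          <p>Information seekers are characterized by posting few tweets themselves while following
people who generate content more actively. We assume that users actively select their
followees expecting that their tweets will be interesting to them. Hence, in order to
develop a content-based followee recommendation algorithm, we assume that the
interests of the target user can be described by the content of the tweets published by the
users he/she follows. Let <inline-formula><tex-math>\mathit{tweets}(u_f) = \{t_1, t_2, \ldots, t_k\}</tex-math></inline-formula> be the set of tweets published by
user <inline-formula><tex-math>u_f</tex-math></inline-formula>, <inline-formula><tex-math>\mathit{profile}_{base}(u_f)</tex-math></inline-formula> the term vector built from <inline-formula><tex-math>\mathit{tweets}(u_f)</tex-math></inline-formula>, and <inline-formula><tex-math>\mathit{followees}(u_T) = \{f_1, f_2, \ldots, f_l\}</tex-math></inline-formula> the followees of user <inline-formula><tex-math>u_T</tex-math></inline-formula>. Then the profile of user <inline-formula><tex-math>u_T</tex-math></inline-formula> is defined as the
union of the term vectors of his/her followees:</p>
          <disp-formula>
            <tex-math>\mathit{profile}(u_T) = \bigcup_{\forall u_f \in \mathit{followees}(u_T)} \mathit{profile}_{base}(u_f)</tex-math>
          </disp-formula>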
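          <p>The profile definitions above, together with the maximum-cosine criterion the algorithm applies to candidates from the public timeline, can be sketched as follows. This is an illustration under stated assumptions rather than the authors’ code: the hypothetical <monospace>simple_tokenize</monospace> function only lowercases text and drops a tiny stop-word list (the paper additionally detects the language and applies stemming), term vectors use raw term frequencies, and the “union” of followee vectors is interpreted as summing their term frequencies.</p>
          <preformat preformat-type="code">
```python
import math
from collections import Counter

STOP_WORDS = {"a", "an", "and", "of", "the", "to"}  # illustrative list only


def simple_tokenize(text):
    """Hypothetical tokenizer: lowercase, split on whitespace, drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def profile_base(tweets, tokenize=simple_tokenize):
    """Term-frequency vector built from one user's tweets."""
    tf = Counter()
    for tweet in tweets:
        tf.update(tokenize(tweet))
    return tf


def profile(followee_vectors):
    """Target-user profile: followee vectors merged by summing frequencies."""
    return sum(followee_vectors, Counter())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b[t] for t, w in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sim(candidate_vector, followee_vectors):
    """Maximum cosine similarity between a candidate and any followee."""
    return max((cosine(f, candidate_vector) for f in followee_vectors), default=0.0)


def recommend(candidates, followee_vectors, gamma=0.1):
    """Keep candidates whose similarity exceeds gamma, most similar first."""
    scored = [(sim(vec, followee_vectors), user) for user, vec in candidates.items()]
    scored.sort(reverse=True)
    return [user for s, user in scored if s > gamma]


followee_vectors = [profile_base(["python data science", "data pipelines"])]
candidates = {
    "c1": profile_base(["data science news"]),
    "c2": profile_base(["football match tonight"]),
}
print(recommend(candidates, followee_vectors))
```
          </preformat>
          <p>Taking the maximum over individual followees, rather than comparing against the merged profile, lets a candidate qualify by matching just one of the target user’s interests, which suits users whose followees span several topics.</p>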
          <p>To search for candidate recommendations, this algorithm does not take
candidate users from the topology of the social network. Instead, it aims at discovering new
users who may not be connected to the target user by a short path in the graph but who appear
in an information stream provided by Twitter known as the public timeline. This
stream contains the most recently published tweets and is fed by all
accounts that are not configured as private. The public timeline can be considered
the current flow of information in Twitter, and is a good source of active users in
the social network.</p>
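          <p>The content-based algorithm we designed works as follows:</p>
          <list list-type="order">
            <list-item>
              <p>Obtain the authors of the most recent publications that appear in Twitter’s public timeline, <inline-formula><tex-math>U = \{u_1, u_2, \ldots, u_m\}</tex-math></inline-formula>.</p>
            </list-item>
            <list-item>
              <p>For each user <inline-formula><tex-math>u_C \in U</tex-math></inline-formula>, build <inline-formula><tex-math>\mathit{profile}_{base}(u_C)</tex-math></inline-formula>, that is, the term vector corresponding to <inline-formula><tex-math>u_C</tex-math></inline-formula>.</p>
            </list-item>
            <list-item>
              <p>For each user <inline-formula><tex-math>u_C \in U</tex-math></inline-formula>, compute <disp-formula><tex-math>\mathit{sim}(u_C, u_T) = \max_{\forall i:\, f_i \in \mathit{followees}(u_T)} \mathit{sim}_{cos}\left[\mathit{profile}_{base}(f_i), \mathit{profile}_{base}(u_C)\right]</tex-math></disp-formula> where <inline-formula><tex-math>\mathit{sim}_{cos}</tex-math></inline-formula> is the cosine similarity between the two vectors.</p>
            </list-item>
            <list-item>
              <p>If <inline-formula><tex-math>\mathit{sim}(u_C, u_T) &gt; \gamma</tex-math></inline-formula>, add <inline-formula><tex-math>u_C</tex-math></inline-formula> to the list of recommendations ordered by similarity.</p>
            </list-item>
            <list-item>
              <p>Repeat steps 1 to 4 until the desired number of recommendations is obtained.</p>
            </list-item>
          </list>
          <p>To build the term vectors associated with users, we first detect the language of the
tweets<sup>2</sup> and then apply the corresponding stop-word and stemming filters. We use a
term frequency weighting scheme in the term vectors.</p>
          <p>We use a similarity threshold of <inline-formula><tex-math>\gamma = 0.1</tex-math></inline-formula> to consider a user relevant for
recommendation. This threshold was set very low so that the desired number of recommendations
could be obtained in a reasonable time. However, it can be adjusted according to the
recommender application. For example, if recommendations can be calculated off-line,
the threshold can be set to a higher value, likely improving the precision of
recommendations at the expense of some additional calculation time.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">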
      <sec id="sec-3-1">
        <title>Experimental evaluation</title>
        <sec id="sec-3-1-1">
          <title>Experiment setup</title>
          <p>To evaluate the proposed algorithms, we carried out a preliminary
experiment with a group of 26 users. These users, 20 males and 6 females, were in the final
years of their studies and were taking a “Recommender Systems” elective course
taught at our university during 2010. The students selected for the
experiment were volunteers familiar with Twitter. We asked these users to create a
new Twitter account<sup>3</sup> and to follow at least 20 Twitter users who publish information or
news about a set of particular subjects of their interest. The general interests expressed
by users ranged over diverse subjects. Some users concentrated on only one
particular subject while others distributed their followees among several topics. We then
provided these users with a desktop tool that allowed them to log in to Twitter and ask
for recommendations using both methods (topology-based and content-based). The tool
offered the logged-in user 20 recommended users, and we asked users to explicitly
evaluate whether each recommendation was relevant or not according to the same topical
criteria they had used to select their followees as information sources.</p>
          <p>For each recommendation in the resulting ranking, the application showed the user
name, description, profile picture and a link to the home page of the corresponding
account. This link could be used to read the tweets published by the recommended
user in case the information provided by the application was not enough to
determine the student’s interest in the recommendation. The question students were asked
to consider when deciding whether a recommendation was relevant or not was
“Would you have followed this recommended user in the first place (when selecting
which users to follow in the first part of the experiment), if you had known this account?”
For example, if a given student was interested in technology and he/she had not discovered
the account @TechCrunch during his/her first selection of followees, that would be an
interesting recommendation because @TechCrunch tweets news about technology.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <p><sup>2</sup> We are currently working with English and Spanish.</p>
      <p><sup>3</sup> We asked students to create a new Twitter account so that they did not need to reveal their real
account and the people they follow.
      <sec id="sec-4-1">
        <title>Results</title>
        <p>The quality of the top-N followee recommendation lists generated by each algorithm
was first evaluated in terms of overall precision. Precision is defined as the
number of relevant recommendations over the number of recommendations presented to
the user, and it can also be computed at different positions in the ranking. For example,
P@5 (“precision at five”) is the percentage of relevant recommendations
among the first five, averaged over all runs. Figure 1 shows the precision obtained by
both algorithms at four different positions of the ranking: P@1, P@5, P@10 and P@20.</p>
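        <p>As an illustration of how P@k can be computed from the binary relevance judgments collected in the experiment, the following sketch assumes that each run is stored as a list of 0/1 values in ranking order. This representation is hypothetical; the paper does not describe its evaluation code.</p>
        <preformat preformat-type="code">
```python
def precision_at_k(relevance, k):
    """Fraction of relevant items among the first k of one ranked list."""
    return sum(relevance[:k]) / k


def mean_precision_at_k(runs, k):
    """P@k averaged over all runs (one run per user and algorithm)."""
    return sum(precision_at_k(run, k) for run in runs) / len(runs)


# Two hypothetical runs of five judged recommendations each.
runs = [[1, 0, 1, 0, 0], [0, 1, 1, 1, 0]]
print(mean_precision_at_k(runs, 1), mean_precision_at_k(runs, 5))
```
        </preformat>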
        <p>In general, both algorithms perform similarly at different positions of the ranking,
with the exception of P@1, for which the content-based approach clearly outperforms
the topology-based algorithm (61% of relevant recommendations for the content-based
algorithm against 35% for the topology-based algorithm).
For P@5 we obtained 48% of relevant recommendations for the content-based
algorithm and 46% for the topology-based algorithm. We note
that although we report precision for up to 20 recommendations,
recommender systems generally present shorter recommendation lists to users
to help them focus on the most relevant results. In these short lists the
content-based algorithm reached its best precision values (61% at P@1 and 48% at P@5).</p>
        <p>For recommendation lists longer than 5, performance decreases and we can observe
that the topology-based algorithm tends to give better results than the content-based
algorithm. However, the difference in performance between the two algorithms is always lower
than 5%. Given the small number of users who participated in the experiment, we
applied Student’s t-test to assess the statistical significance of the results. Student’s
t-test looks at the average difference between the performance scores of two algorithms,
divided by the standard error of that difference. According to this test,
only the difference in precision at the first position of the ranking (P@1) is statistically
significant.</p>
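        <p>The paper does not state which variant of the test was used. Because every participant evaluated both algorithms, the sketch below assumes a paired t-test over per-user scores and computes only the t statistic and its degrees of freedom; turning t into a p-value requires the t distribution, for example through <monospace>scipy.stats.ttest_rel</monospace>, which performs the same paired test. The scores used here are hypothetical.</p>
        <preformat preformat-type="code">
```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom


# Hypothetical per-user P@1 values (1 = first recommendation relevant).
content_based = [1, 1, 0, 1]
topology_based = [0, 1, 0, 0]
print(paired_t(content_based, topology_based))
```
        </preformat>
        <p>The function needs at least two users and raises an error if all differences are identical (zero standard deviation).</p>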
        <p>
          Although the precision measure gives a general idea of the overall performance of the
presented algorithms, it is also very important to consider the position of relevant
recommendations in the ranking presented to the user. Since users are known to focus
their attention on items at the top of a list of recommendations [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], if relevant
recommendations appear at the top of the ranking with one algorithm and at the bottom of
the ranking with the other, the first algorithm will be perceived by users as performing
better even though the overall precision of both might be similar.
        </p>
        <p>Discounted cumulative gain (DCG) is a measure of effectiveness used to evaluate
ranked lists of recommendations. DCG measures the usefulness, or gain, of a
document based on its position in the result list, using a graded relevance scale for the
documents in the list (in our experiment relevance is binary: a recommendation is either relevant or not). The gain is accumulated from the top of the result
list to the bottom, with the gain of each result discounted at lower ranks. The premise
of DCG is that highly relevant documents appearing lower in a list should be
penalized, by reducing their graded relevance value logarithmically with their
position in the list. The DCG accumulated at a particular rank position k is defined as
<disp-formula><tex-math>\mathit{DCG}@k = rel_1 + \sum_{i=2}^{k} \frac{rel_i}{\log_2 i}</tex-math></disp-formula>
where <inline-formula><tex-math>rel_i</tex-math></inline-formula> is the relevance of the result at position i. DCG is often normalized by an ideal DCG (IDCG), computed by sorting the documents of a result list by relevance, giving <inline-formula><tex-math>\mathit{nDCG}@k = \mathit{DCG}@k / \mathit{IDCG}@k</tex-math></inline-formula>. Figure 2 shows the normalized
DCG obtained for both algorithms at four different positions of the ranking: nDCG@1,
nDCG@5, nDCG@10 and nDCG@20.</p>
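        <p>A minimal sketch of DCG@k and nDCG@k as defined above, assuming binary relevance values (1 for a relevant recommendation, 0 otherwise) as collected in this experiment. The ideal ranking is obtained by sorting the whole result list by relevance before truncating it at k; the example ranking is hypothetical.</p>
        <preformat preformat-type="code">
```python
import math


def dcg_at_k(relevance, k):
    """DCG@k = rel_1 + sum over i = 2..k of rel_i / log2(i)."""
    rel = relevance[:k]
    if not rel:
        return 0.0
    return rel[0] + sum(r / math.log2(i) for i, r in enumerate(rel[1:], start=2))


def ndcg_at_k(relevance, k):
    """DCG@k normalized by the DCG@k of the ideally sorted list."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


ranking = [0, 1, 1, 0, 1]  # hypothetical judgments in ranking order
print(ndcg_at_k(ranking, 1), ndcg_at_k(ranking, 5))
```
        </preformat>
        <p>Runs with no relevant recommendation have an ideal DCG of zero; the sketch scores them as 0 rather than dividing by zero.</p>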
        <p>Success at rank k (S@k) is another metric commonly used to evaluate ranked lists
of recommendations. The success at rank k is defined as the probability of finding a
good recommendation among the top k recommended users. In other words, S@k is
the percentage of runs in which there was at least one relevant user among the first k
recommended users. Figure 3 shows the results we obtained for this metric for values
of k ranging from 1 to 6.</p>
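        <p>Under the same assumed representation of runs as lists of 0/1 judgments in ranking order, S@k reduces to counting the runs that contain at least one relevant user among their first k positions. The sketch also includes a hypothetical helper that reports the rank of the first relevant recommendation, the quantity discussed for individual users below.</p>
        <preformat preformat-type="code">
```python
def success_at_k(runs, k):
    """Fraction of runs with at least one relevant item among the top k."""
    hits = sum(1 for run in runs if any(run[:k]))
    return hits / len(runs)


def first_relevant_rank(run):
    """1-based rank of the first relevant item, or None if there is none."""
    for rank, relevant in enumerate(run, start=1):
        if relevant:
            return rank
    return None


runs = [[0, 0, 1], [1, 0, 0], [0, 0, 0]]  # hypothetical runs
print([success_at_k(runs, k) for k in (1, 2, 3)])
print([first_relevant_rank(run) for run in runs])
```
        </preformat>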
        <p>S@1 is equivalent to P@1 by definition. We can also see that the content-based
algorithm always positions relevant users earlier in the ranking than the topology-based
algorithm. Indeed, with the content-based algorithm all users in the experiment found a
relevant recommendation before position 4 in the ranking. With the topology-based
algorithm, most users found a relevant recommendation before position 5, except for
one user who found the first relevant recommendation at rank 6.
        <sec id="sec-4-1-1">
          <title>Discussion and Conclusions</title>
          <p>In this article we presented two simple but effective algorithms for recommending users
in the Twitter social network. The first algorithm models a given user from his/her
connections in the social graph, whereas the second models a user through the
content of the tweets published by his/her followees. We evaluated the proposed algorithms
with real users and found that they perform fairly similarly at finding users that the
target user might find interesting to follow.</p>
          <p>From these experiments we conclude that, although both algorithms achieve similar average
precision, the content-based approach ranks good recommendations higher once their position
in the ranking is taken into account. We believe that the results of the content-based algorithm could be
improved by setting a higher threshold on the similarity measure used to filter the
term vectors representing users. However, this would likely increase the response time of the
algorithm: because candidate users are taken at random from Twitter’s public timeline, a stricter threshold
rejects more of them, so more candidates must be retrieved and analyzed before enough recommendations are found.</p>
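          <p>The response-time trade-off can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's content-based algorithm: users are represented as term-weight dictionaries, cosine similarity stands in for the similarity measure, and random_timeline is a hypothetical generator standing in for Twitter’s public timeline. Because the same seed yields the same candidates in the same order, any candidate accepted at a higher threshold is also accepted at a lower one, so the number of candidates inspected before k are accepted can only grow as the threshold rises.</p>
          <preformat preformat-type="code">
```python
import math
import random


def cosine(u, v):
    # Cosine similarity of two sparse term-weight dictionaries (assumed measure).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def random_timeline(seed, n=5000):
    # Hypothetical stand-in for the public timeline: users with random two-term vectors.
    vocab = ["tech", "news", "music", "sports", "politics", "art"]
    rng = random.Random(seed)
    for i in range(n):
        yield f"user{i}", {t: rng.random() for t in rng.sample(vocab, 2)}


def draws_needed(target, timeline, threshold, k):
    # Count how many random candidates are inspected before k pass the threshold
    # (returns the timeline length if fewer than k ever pass).
    accepted, draws = 0, 0
    for _user, vector in timeline:
        draws += 1
        if cosine(target, vector) >= threshold:
            accepted += 1
            if accepted == k:
                break
    return draws


target = {"tech": 1.0, "news": 0.5}
for threshold in (0.3, 0.6, 0.85):
    # Same seed for every threshold, so each run sees the same candidates in the same order.
    print(threshold, draws_needed(target, random_timeline(seed=42), threshold, k=5))
```
          </preformat>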
          <p>The topology-based algorithm, on the other hand, has the advantage that recommendations
can be found quickly through a simple analysis of the network structure, without analyzing
the content of the tweets posted by candidate users.</p>
          <p>Although the reported results seem promising, we plan to repeat the
experiment this year with more users, in order to compare the two proposed algorithms
with greater statistical confidence.</p>
          <p>A natural extension we are currently working on is a hybrid algorithm
that combines the strengths of both algorithms presented in this paper: it
filters the candidate recommendations found by the topology-based method using a
content-based analysis of the tweets posted by the candidate users. We are optimistic
about the improvements that this hybrid approach could bring.</p>
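          <p>The filtering step of such a hybrid could be sketched as follows. This is a hypothetical illustration of the idea, not the algorithm under development: it assumes the topology-based method has already produced a ranked list of candidate ids, that each candidate’s tweets have been turned into a term-weight dictionary, and it uses cosine similarity with an arbitrary threshold as the content-based test.</p>
          <preformat preformat-type="code">
```python
import math


def cosine(u, v):
    # Cosine similarity of two sparse term-weight dictionaries (assumed measure).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_filter(target, ranked_candidates, tweet_vectors, threshold):
    """Keep topology-ranked candidates whose tweets are similar enough to the target.

    ranked_candidates: user ids in the order given by the topology-based step (assumed).
    tweet_vectors: user id -> term-weight dict built from that user's tweets (assumed).
    Returns (user, score) pairs, preserving the topology-based order.
    """
    kept = []
    for user in ranked_candidates:
        score = cosine(target, tweet_vectors.get(user, {}))  # no tweets -> score 0.0
        if score >= threshold:
            kept.append((user, score))
    return kept


# Hypothetical data for illustration only.
target = {"python": 2.0, "ml": 1.0}
ranked = ["alice", "bob", "carol"]
vectors = {
    "alice": {"football": 3.0},
    "bob": {"python": 1.0, "ml": 1.0},
    "carol": {"ml": 2.0},
}
print(hybrid_filter(target, ranked, vectors, threshold=0.5))
# keeps only bob (score about 0.949); carol scores about 0.447 and is dropped
```
          </preformat>
          <p>Keeping the topology-based order means the content analysis only prunes the candidate list; re-ranking the survivors by similarity score would be an equally plausible design choice.</p>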
          <p>
            One limitation of our approach is that we assumed the target user to be an information
seeker, according to the categorization proposed by [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. However, users may play different (and multiple) roles, such as information source,
information seeker, or friend, in different communities. Accounting for these multiple roles is a
challenging problem that we leave for future investigation.
          </p>
          <p>The experiments presented make us optimistic about the potential of a followee
recommender system for Twitter based on the methods described in this article, or on a
combination of them. This work is a first step towards exploring the potential of this
new platform for building recommender systems.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cha</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haddadi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benevenuto</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gummadi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Measuring user influence in Twitter: The million follower fallacy</article-title>
          .
          <source>In: Proc. of the 4th Int. Conf. on Weblogs and Social Media (ICWSM'10)</source>
          . Washington DC, USA (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geyer</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dugan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muller</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guy</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Make new friends, but keep the old: recommending people on social networking sites</article-title>
          .
          <source>In: Proc. of the 27th Int. Conf. on Human Factors in Computing Systems</source>
          . pp.
          <fpage>201</fpage>
          -
          <lpage>210</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nairn</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nelson</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Short and tweet: experiments on recommending content from information streams</article-title>
          .
          <source>In: Proc. of the 28th Int. Conf. on Human Factors in Computing Systems (CHI'10)</source>
          . pp.
          <fpage>1185</fpage>
          -
          <lpage>1194</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Esparza</surname>
            ,
            <given-names>S.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Mahony</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>On the real-time web as a source of recommendation knowledge</article-title>
          .
          <source>In: Proc. of the 4th ACM Conf. on Recommender Systems (RecSys'10)</source>
          . pp.
          <fpage>305</fpage>
          -
          <lpage>308</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amatriain</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Weighted content based methods for recommending connections in online social networks</article-title>
          .
          <source>In: Workshop on Recommender Systems and the Social Web</source>
          . pp.
          <fpage>68</fpage>
          -
          <lpage>71</lpage>
          . Barcelona, Spain
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Guy</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ronen</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilcox</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Do you know?: recommending people to invite into your social network</article-title>
          .
          <source>In: Proc. of the 13th Int. Conf. on Intelligent User Interfaces (IUI'09)</source>
          . pp.
          <fpage>77</fpage>
          -
          <lpage>86</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hannon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bennett</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Recommending Twitter users to follow using content and collaborative filtering approaches</article-title>
          .
          <source>In: Proc. of the 4th ACM Conf. on Recommender Systems (RecSys'10)</source>
          . pp.
          <fpage>199</fpage>
          -
          <lpage>206</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Java</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tseng</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Why we twitter: understanding microblogging usage and communities</article-title>
          .
          <source>In: Proc. of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis</source>
          . pp.
          <fpage>56</fpage>
          -
          <lpage>65</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Granka</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hembrooke</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gay</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Accurately interpreting clickthrough data as implicit feedback</article-title>
          .
          <source>In: Proc. of the 28th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR'05)</source>
          . pp.
          <fpage>154</fpage>
          -
          <lpage>161</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Krishnamurthy</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gill</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arlitt</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A few chirps about Twitter</article-title>
          .
          <source>In: Proc. of the 1st Workshop on Online Social Networks (WOSN'08)</source>
          . pp.
          <fpage>19</fpage>
          -
          <lpage>24</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kwak</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>What is Twitter, a social network or a news media?</article-title>
          <source>In: Proc. of the 19th Int. Conf. on World Wide Web (WWW'10)</source>
          . pp.
          <fpage>591</fpage>
          -
          <lpage>600</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Liben-Nowell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>The link prediction problem for social networks</article-title>
          .
          <source>In: Proc. of the 12th Int. Conf. on Information and Knowledge Management</source>
          . pp.
          <fpage>556</fpage>
          -
          <lpage>559</lpage>
          . CIKM '03, ACM, New York, NY, USA (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>WMR-A graph-based algorithm for friend recommendation</article-title>
          .
          <source>In: Proc. of the 2006 IEEE/WIC/ACM Int. Conf. on Web Intelligence (WI'06)</source>
          . pp.
          <fpage>121</fpage>
          -
          <lpage>128</lpage>
          . Washington, DC, USA (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pazzani</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Billsus</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Content-based recommendation systems</article-title>
          . In:
          <string-name>
            <surname>Brusilovsky</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobsa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nejdl</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (eds.)
          <source>The Adaptive Web</source>
          , pp.
          <fpage>325</fpage>
          -
          <lpage>341</lpage>
          . Springer-Verlag (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Phelan</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCarthy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Using Twitter to recommend real-time topical news</article-title>
          .
          <source>In: Proc. of the 3rd ACM Conf. on Recommender Systems (RecSys'09)</source>
          . pp.
          <fpage>385</fpage>
          -
          <lpage>388</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Schafer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frankowski</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herlocker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Collaborative Filtering Recommender Systems</article-title>
          . In:
          <string-name>
            <surname>Brusilovsky</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobsa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nejdl</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (eds.)
          <source>The Adaptive Web, LNCS</source>
          , vol.
          <volume>4321</volume>
          , chap. 9, pp.
          <fpage>291</fpage>
          -
          <lpage>324</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
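          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>D.D.</given-names>
          </string-name>
          :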
          <article-title>A novel recommendation framework for micro-blogging based on information diffusion</article-title>
          .
          <source>In: Proc. of the 19th Workshop on Information Technologies and Systems</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Weng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>E.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>TwitterRank: finding topic-sensitive influential twitterers</article-title>
          .
          <source>In: Proc. of the 3rd ACM Int. Conf. on Web Search and Data Mining (WSDM'10)</source>
          . pp.
          <fpage>261</fpage>
          -
          <lpage>270</lpage>
          . New York, NY, USA (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Yamaguchi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takahashi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amagasa</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kitagawa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>TURank: Twitter user ranking based on user-tweet graph analysis</article-title>
          .
          <source>In: Web Information Systems Engineering. LNCS</source>
          , vol.
          <volume>6488</volume>
          , pp.
          <fpage>240</fpage>
          -
          <lpage>253</lpage>
          . Hong Kong, China
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>