=Paper= {{Paper |id=None |storemode=property |title=Social-Textual Search and Ranking |pdfUrl=https://ceur-ws.org/Vol-842/crowdsearch-khodaei.pdf |volume=Vol-842 |dblpUrl=https://dblp.org/rec/conf/www/KhodaeiS12 }} ==Social-Textual Search and Ranking== https://ceur-ws.org/Vol-842/crowdsearch-khodaei.pdf
                             Social-Textual Search and Ranking∗

                                Ali Khodaei                                                Cyrus Shahabi
                   Department of Computer Science                                 Department of Computer Science
                   University of Southern California                              University of Southern California
                     Los Angeles,CA 90089,USA                                       Los Angeles,CA 90089,USA
                           khodaei@usc.edu                                               shahabi@usc.edu


ABSTRACT                                                                    ditional web where users are often in read-only mode, Web
Web search engines are traditionally focused on textual con-                2.0 have enabled users to be in read-write mode. In other
tent of data. Emergence of social networks and Web 2.0                      words, users have started to express themselves in the forms
applications makes it interesting to see how social data can                of generating and publishing content (e.g., writing a tweet),
be used in improving the conventional textual search on the                 re-sharing interesting content by others (e.g., re-tweeting)
web. In this paper, we focus on how to improve the ef-                      and rating/evaluating the existing content (e.g., choosing a
fectiveness of web search by utilizing social data available                favorite tweet). This emergence of social networks and Web
from users, users actions and their underlying social net-                  2.0 resulted in huge amount of data available that can be
work on the web. We define and formalize the problem of                     utilized in many domains.
social-textual (socio-textual ) search and show how social as-                 In this paper, we focus on taking advantage of this infor-
pect of the web can be effectively integrated into the textual              mation in the domain of (textual) web search. We argue that
search engines. We propose a new social relevance ranking                   by integrating information from users’ social networks and
based on several parameters including relationship between                  their activities on the web, we can improve the conventional
users, importance of each user and actions users perform on                 textual search and ranking. In today’s web, we can know the
web documents (objects). We show how the proposed so-                       existence and degree of relationships among people and also
cial ranking can be combined with the conventional textual                  at the same time have the knowledge of people’s interests de-
relevance ranking. We have conducted an extensive set of                    rived from their actions/activities on the web. It is both in-
experiments on the data from online radio website last.fm to                tuitive and proven [3] that people have very similar interests
evaluate the effectiveness of our proposed approaches. Our                  with their friends. Also, people tend to trust the opinions
experimental results are very promising and show a signifi-                 and judgements of their friends more than strangers. We
cant improvement for socio-textual ranking over textual only                show how to modify the existing (textual) relevance rankings
and social only approaches.                                                 to take into consideration user’s social network in generating
                                                                            ranked results to the search queries. Consider the following
                                                                            example. A user searches for ”funny video clip”. Using con-
1.    INTRODUCTION                                                          ventional textual search, user will receive a ranked results
   Social networks on the web have grown significantly over                 of some funny video clips. On the other hand, using user’s
the past few years. People have started to reconstruct their                social network, videos contain query keywords (i.e., funny
friendship networks in the virtual world and many of these                  video clips) that have more comments, likes or favorites by
virtual relationships are good representatives of their actual              user’s friends should be ranked higher. However, the new
(friendship) networks in the real world. At the same time                   ranked ranking is not trivial. Do we give more weights to
and with the emergence of Web 2.0, many web users have                      textual keywords or to the social network? With social as-
started to engage more with the web. In contrast to the tra-                pect of the ranking, do we need to assign different weights
                                                                            to different friends of the user? How about the popularity
∗This research is supported in part by the NSF grant                        of the users (friends) in general? Also, what are the actions
IS-1115153, the USC Integrated Media Systems Center                         that are important for objects and how we quantify those?
(IMSC), and also by unrestricted cash and equipment gifts                   In order to combine social data into textual relevance rank-
from Google, Microsoft and Qualcomm. The opinions, find-
ings, and conclusions or recommendations expressed in this                  ing, social relevance between users and objects (documents)
publication are those of the authors and do not necessarily                 has to be defined first. In order to model social relevance,
reflect the views of the National Science Foundation.                       existence and degree of relationships between users have to
                                                                            be taken into consideration. Also, actions permitted for each
                                                                            type of document (object) and their importance should be
                                                                            modeled. Finally, overall importance/impact of each user
                                                                            has to be considered as well.
                                                                               We first review the few existing studies regarding social
Copyright c 2012 for the individual papers by the papers’ authors. Copy-    search and utilization of social networks in the web search.
ing permitted for private and academic purposes. This volume is published   Then, we define and formalize our problem. Next, we present
and copyrighted by its editors.                                             new scoring methods to calculate social relevance between
CrowdSearch 2012 workshop at WWW 2012, Lyon, France                         users and documents (objects). We show how the impor-
tance of users, different relationships among users and ac-          or +1ed that result. Their algorithms are not public and it
tions they perform on objects can impact the final relevance         seems that they only show the likes and +1s and the actual
ranking. After proposing the new social relevance model, we          ranking is not affected.
present a novel socio-textual relevance ranking technique to            There exists a relevant but somehow different topic of folk-
combine textual and social relevance rankings. Finally, in           sonomies. Tags and other conceptual structures in social
our experimental section, we evaluate the effectiveness of           tagging networks are called folksonomies. A folksonomy is
our proposed models and show that our new relevance rank-            usually interpreted as a set of user-tag-resource triplets. Ex-
ing methods are effective and improve the accuracy of the            isting work for social search on folksonomies is mainly on
returned results.                                                    improving search process over social data (tags and users)
                                                                     gathered from social tagging sites [10][11][12]. In this con-
2. RELATED WORK                                                      text, relationships among users and tags and also among
                                                                     tags themselves are of significant importance.
   There are two groups of related work on the application
                                                                        Finally, there are few studies on the role of collaborative
of social networks in search. With the first group, people
                                                                     filtering in this new social context. Role of social networks
through their social networks are identified and contacted
                                                                     on collaborative filtering is studied in [3] and [13]. It is shown
directly to answer web queries. In other words, queries are
                                                                     that using social networks in collaborative filtering and rec-
directly sent to individuals and answer of the queries are
                                                                     ommendations makes the recommendations better in com-
coming from people themselves [1, 16, 17]. In this approach
                                                                     parison with the traditional collaborative approaches. In an-
called search services, people and their networks are indexed
                                                                     other direction, [14] studies the application of collaborative
and a search engine has to find the most relevant people to
                                                                     filtering on a movie search engine. Authors propose to calcu-
send the queries/questions to.
                                                                     late documents (movies) authorities based on users’ ratings
   The main focus of the second group is on the search pro-
                                                                     (using collaborative filtering) instead of pagerank and other
cess over social data (tags, users and objects) from sites/application
                                                                     link-based authority measures. Social networks of users are
with social aspect such as social tagging sites and (some)
                                                                     non-existent in this study.
Web 2.0 applications. In [2], authors investigate a personal-
                                                                        In contrast to the above, our notion of social search is
ized social search engine based on users’ relations. They
                                                                     to utilize exiting social networks to improve the accuracy
study the effectiveness of three types of social networks:
                                                                     and relevance of convention textual web search. For us,
familiarity-based, similarity-based and both. In [4], which
                                                                     search still has its core textual dimension, represented by
is a short paper, authors propose two search strategies for
                                                                     textual keywords/content in the query and the documents.
performing search on the web: textual relevance (TR)-based
                                                                     In parallel to the textual dimension, (querying) user’s social
search and social influence (SI)-based search. In the former,
                                                                     network is exploited to make the final search results more
the search is first performed according to the classical tf-idf
                                                                     relevant. Our focus is mostly on finding/modeling effective
approach and then for each retrieved document the social in-
                                                                     measures to calculate the social relevance/ranking and com-
fluence between its publisher and querying user is computed.
                                                                     bine it with the existing standard textual relevance rankings.
The final ranking is based on both scores. In the latter, first
                                                                     We also take into consideration the actions users perform on
the social influence of the users to the querying user is cal-
                                                                     documents (as described in Section 3).
culated and users with high scores are selected. Then, for
each document, the final ranking score is determined based
on both TR and SI. In both strategies, two separate costly           3. DEFINITIONS AND FORMALIZATIONS
steps are needed. Also, it is not clear how accurate are the            In this section, we formally define and formalize the prob-
ranking functions since there is no experimental evaluation          lem of socio-textual search.
for the effectiveness of the rankings.                                  Objects: We assume a collection O = {o1 ,o2 ,...on } of n
   In a set of similar papers [5, 6, 7], authors propose sev-        objects (documents). An object can be a traditional web
eral social network-based search ranking frameworks. The             document such as a news page or a business home page
proposed frameworks consider both document contents and              or a Web 2.0 object such as a YouTube video, a tweet, a
the similarity between a searcher and document owners in             Facebook status or any other similar entity. An object o is
a social network. They also propose a new user similar-              composed of a set of textual keywords Ko and a set of users
ity algorithm (MAS) to calculate user similarity in a social         Uo associated with it. Uo is a set of users with some type of
network. In this set of papers, the focus is more on user            actions on the object o (see actions below).
similarity functions and how to improve those algorithms.               Users and Social Network: There is a set U = {u1 ,u2 ,...um }
Most of their experiments are limited to a small number              of m users using the system. We also assume a social net-
of queries on YouTube videos with 3 users, 15 queries and            work modeled as a directed graph G = (V, E) whose nodes
small number of textual keywords. Relevant (interesting)             represent the users of the system and edges represent the
result is a result (video) whose category is simialr/equal to        ties (relationship) among the users. The most common type
the dominant category of videos that searcher has uploaded.          of relationship is the friendship relationship but other type
   In a relatively older paper [8], authors explore the possi-       of relationships can be also applied (e.g. follow relationship
bility of using online social networks to improve the search         in Twitter).
on the Internet. Although this paper is not very technical              Actions: There is a set A = {a1 ,a2 ,...al } of l actions that
, it provides some interesting intuitions on integration of          users can perform on the objects. These actions represent
social networks and web search. With regards to commer-              the relationship between users and objects. For instance, in
cial search engines, Bing and recently Google have started           Twitter, users can perform the following actions on objects
to integrate Facebook and Google+, respectively, to their            (tweets): publish a tweet, retweet a tweet or make a tweet
search process. For some search results, they show query             as their favorite tweet.
issuer’s friends (from his/her social network) that have liked          Socio-Textual Query: A socio-textual query is defined
as Q = hKq , Sq i, where Kq is the textual part of query speci-       In this section, we propose a new social relevance model
fied as a set of keywords in the query and Sq is the social part   to calculate the social relevance between users and objects.
of query specified as the user uq issuing the query and the        We also show how to combine the proposed social relevance
social network G. Since our social network is always G, it is      model with an existing textual relevance model and intro-
sufficient to define the socio-textual query as hKq , uq i. Note   duce our socio-textual relevance ranking.
that while the textual part of the query is always explicit in        We first propose a new scoring approach to calculate the
the query, the social part is often implicit. In other words,      social relevance between an object o and a query q (issued by
we can safely assume that the system (search engine) knows         user qu ). Our social relevance ranking creates a new scoring
the user issuing the query and also the underlying social net-     framework to retrieve and rank objects based on the so-
work, hence the social part of the query can be automatically      cial dimension of the query and objects. In order to have an
added to the textual part by the search engine.1                   accurate scoring function and retrieve the most socially rele-
   User relevance: User ui is relevant to user uj if the net-      vant results to the user, we consider three important factors:
work distance from the node corresponding to ui to the node        (1) relevance of each user to the query’s user, (2) importance
corresponding to uj is less than or equal to a system defined      of each user in general, and (3) relationship between users
threshold value δ. The less the distance between two nodes,        and actions they perform on each object. In the following
the more (user) relevant are those two nodes (users) 2 . Net-      we discuss each measure.
work distance can be any of the existing network distances            User Relatedness. We measure the relatedness of a
in the literature. Two users with the network distance more        user to the querying user (and hence to the query itself)
than δ are considered non-relevant to each other.                  by the user relatedness function urf (uq , ui ). There are sev-
   Social relevance: Social relevance between the object o         eral measures to calculate the relatedness/closeness of two
and the query q is defined based on the social relationship        nodes in a graph/social network. Some of the approaches
that exists between the querying user (uq ) and users asso-        consider the distance between nodes, some look at the be-
ciated with the object o (Uo ). Object o and query q are           haviors of users in a social network and some take into con-
socially relevant if at least one of the object’s users (Uo ) is   sideration number of mutual neighbors of two nodes. While
user relevant with the user issuing the query. The larger the      the required data is available, any of the above methods or
user relevance is, the more socially relevant o and q are. We      other exiting methods can be used for the user relatedness
denote social relevance of object o to query q by socRel(o, q).    function as long as the following three constraints are sat-
We define social relevance in more details in Section 4.           isfied: (1) urf (ui , ui ) = 1, (2) 0 ≤ urf (ui , uj ) ≤ 1 and
   Textual relevance: Object o is textually relevant to the        the more relevant the users, the higher the value, and (3)
query q if there exists at least one keyword belonging to both     urf (ui , uj ) = 0 when urf (ui , uj ) < δ. The first constraint
o and q, i.e., Kq ∩ Ko 6= ∅. We represent textual relevance        states that each user is the most related user to herself. The
of object o to query q by texRel(o, q). 3                          second constraint normalizes this measure and also ensures
   Socio-textual relevance: Object o is social-textual (socio-     that the more related users are assigned higher scores. Fi-
textual) relevant to the query q if it is both socially and tex-   nally, third constraint filters out all relationships that their
tually relevant to the query q. Socio-textual relevance can        significance is below a certain threshold (δ). As a simple
be defined by a monotonic scoring function F of textual and        example satisfying all the above constraints and also cap-
social relevances. For example, F can be the weighted sum          turing the relatedness among users, we can use an inverse of
of the social and textual relevances:                              distance between users (nodes) in the social network (graph)
                                                                   as follows:
                                                                      urf (ui , uj ) = dist(u1i ,uj )
       F (o, q) = α.socRel(o, q) + (1 − α).texRel(o, q)    (1)
                                                                      where dist(ui , uj ) is the number of edges in a shortest
. α is a parameter assigning relative weights to social and        path connecting ui and uj .
textual relevances. The output of function F (o, q) is the            User Weight. We quantify the overall (global) impor-
socio-textual relevance score of the object o for the query q,     tance of each user by the user weight function uwf (ui ). This
and is denoted by stRel(o, q). In Section 4 we show how to         measure quantifies the significance of a user in its social net-
calculate socio-textual relevance.                                 work. For instance, for Twitter, a user with many followers
  Socio-textual search: A socio-textual search identifies          should be assigned a higher weight than a user with only
and ranks all the objects that are socio-textual relevant to       few followers, or for Facebook, a user with more friends is
the query q. The result is the top-k objects sorted based on       more important to the social network than a user with fewer
objects’ socio-textual relevance scores. The parameter k is        friends. In the field of graph theory and social networks this
determined by the user.                                            value is called centrality and there exist several approaches
                                                                   to measure it. Four popular methods to compute centrality
4.   SOCIAL RELEVANCE RANKING                                      are: degree centrality, betweenness, closeness, and eigenvec-
                                                                   tor centrality. [9] is a good resource For a review of these
1
  Naturally, here and in other parts of this paper, we consider    methods and further reading. Similar to the user relatedness
only users who willingly make their social information public      function, the user weight function is also general enough and
to the system.                                                     most of the existing approaches can be applied to uwf . As
2
  For simplicity of presentation, from now on, we assume that      an example for this function we can use the degree central-
users’ social network is implemented as an undirected graph.       ity of nodes (users) as an indication of their importance as
Hence, user relevance and other relationships between users        follows:
will be symmetric.
3                                                                     uwf (ui ) = deg(u
                                                                                      m−1
                                                                                         i)
  In this paper, we do not focus on textual relevance mod-
els. We use popular tf-idf model when we need to calculate            where deg(ui ) is the number of edges incident upon ui and
textual relevance.                                                 m is number of nodes (users).
   User Action. The importance of each user for each ob-                   query keywords (tf), and 2) more important query keywords
ject is directly related to the action(s)4 user perform on                 (idf), in our social relevance model, more weight is given to
each object. Publishing/owning an object by a user shows                   the objects with 1) more important actions 2) performed by
a higher weight/relevance between the object and the user                  more important users 3) whom are more related (closer) to
than only commenting on the object. For instance, a user                   the querying user.
uploading (and thus owning) a YouTube video is more sig-
nificant to that video than a user who only comments on                    4.1   Socio-Textual Search
that video. The importance/relevance of each user to an                       In this section, we combine social relevance with an exist-
object is measured by the user action function uaf (ui , ok )              ing textual relevance model (tf-idf) to calculate the overall
and is dependant on the type of action user ui performs on                 socio-textual relevance of the object o with query q with re-
object ok . For each system, weight/significance of each ac-               gards to both social and textual dimensions. Socio-textual
tion should be determined based on specific characteristics                relevance ranking considers both the textual relevance of the
of that system. We normalize the value of uaf by assign-                   objects to the query and also the social relevance of the ob-
ing values between 0 and 1 (inclusive) to it. The higher                   jects to the query. We formulate the socio-textual relevance
the value is, the more important/relevant is the user to the               ranking as follows:
object. With some systems, there exist actions that can
be performed multiple times by a user on an object, and
the more the action is performed the higher is the relevance               stRel(o, q) =     α × socRel(o, q) + (1 − α) × texRel(o, q)
between the user and that object. For instance, in an on-
                                                                                               X
                                                                                      =    α×      urf (uq , vi ) × uaf (vi , o) × uwf (vi )
line radio website (e.g. last.fm5 ), action listening to a track                               vi ∈Uo
can be done multiple times by a user. The more the user                                                              X
chooses to listen to a track, the more relevant/significant is                       +                  (1 − α) ×            tf (o, tj ) × idf (tj )
that track to the user. Below, we show examples of differ-                                                          tj ∈Kq
ent actions and their corresponding weights for four popular                                                                                    (3)
web 2.0. objects. Note that the assigned weights are only
our suggestion, they can (and should) be easily changed for                   where stRel(o, q) is the socio-textual relevance of object
different applications and/or settings. Examples are as fol-               o to query q where user uq is the query issuer; socRel and
lows:                                                                      texRel are corresponding social and textual relevances for
                                                                           object o; urf , uaf and uwf are user-related functions as
   • YouTube videos: Actions = {own(publish) : 1, f avorite :              described above; Kq is set of query keywords (tags) and tj s
     0.9, like : 0.7, comment : 0.4}.                                      are individual query keywords (tags); tf (o, tj ) is term fre-
                                                                           quency function determining relevance of term tj to object
   • Twitter tweets: Actions = {tweet(publish) : 1, f avorite :            o; idf (tj ) is inverted document frequency function determin-
     0.9, retweet : 0.5}.                                                  ing the importance of keyword tj in the entire collection;
                                                                           and α is a parameter giving relative weights to social and
   • Facebook objects: Actions = {own(publish) : 1, like :                 textual importance. Not only the implementation of urf ,
     0.8, share : 0.6, comment : 0.4}.                                     uaf and uwf functions are flexible (see Section 4), also the
   • last.fm tracks (songs): Actions = {like : 0.8, tag :                  implementation of texRel is flexible. Although, we used
                                    playcount
     0.5, comment : 0.4, listen : max            }.                        the conventional tf-idf model for capturing the textual rele-
                                       playcount
                                                                           vancy, any other textual relevance (similarity) function can
Note that for last.fm, we see an example of an action (listen)             be also used. Equation 3 provides the general framework
that can be performed multiple times (keep in mind that                    for calculating socio-textual relevance and implementation
many other actions in our examples also can be performed                   and/or importance of each weight can be changed based on
multiple times). The above model can be applied to other                   the context and users/applications needs.
object types for different web, web 2.0 and non-web objects.
It is simple, flexible and easy to update/change based on                  5.    EXPERIMENTAL EVALUATION
different applications and purposes. It shows and quantifies                  In this section, we evaluate the effectiveness of our pro-
what users are important/relevant for each object and how                  posed approaches. First, we describe the dataset, the set-
much is this relevance/importance for each user/object.                    tings and the queries used for the experiments. Next, we
   Now, we propose the final scoring function to calculate                 show and discuss the results.
the social relevance between object o and query q as follows:                 Data. There are very few publicly available datasets for
                                                                           experimentation that include both friendships (social net-
                   X
 socRel(o, q) =            urf (uq , vi ) × uaf (vi , o) × uwf (vi ) (2)   work) and textual keywords (tags). One very good dataset
                  vi ∈Uo                                                   is a dataset generated by [3] from a Web 2.0 website last.fm.
                                                                           Since this dataset has both social and textual components
   In Equation 2, uq is the user issuing the query and Uo is               needed for our setting, we used this dataset. Last.fm is a
the set of users with some actions on the object o. While in               music social network that allows users to listen to different
classical textual relevance models such as tf-idf, more weight             music tracks, tag them with textual keywords and at the
is given to the objects (documents) with 1) more number of                 same time make friendships with other people on the net-
4
  for simplicity, we assume that each user can perform at                  work. While the users listen to a track they have the ability
most one action on each object. However, our model can                     to either move to the next track of the playlist or keep lis-
easily be generalized for multiple actions per user.                       tening to the same. These actions can be interpreted as
5
  http://www.last.fm/                                                      explicit negative and implicit positive feedback respectively
              0.8                                       0.8                                   0.8

              0.7                                       0.7                                   0.7

              0.6                                       0.6                                   0.6

              0.5    soc            text                0.5   soc            text             0.5   soc               text
                     sotext         socBinary                 sotext         socBinary              sotext            socBinary
                     sotextBinary                             sotextBinary                          sotextBinary
              0.4                                       0.4                                   0.4
                     1      2       5       10   20           1      2       5     10    20         1         2       5     10    20
                          (a) Setting 1                         (b) Setting 2                               (c) Setting 3
                                                      Figure 1: Impact of k on nDCG
              0.8    soc            text                0.8   soc            text             0.8   soc               text
                     sotext         socBinary                 sotext         socBinary              sotext            socBinary
              0.7    sotextBinary                       0.7   sotextBinary                    0.7   sotextBinary


              0.6                                       0.6                                   0.6

              0.5                                       0.5                                   0.5

              0.4                                       0.4                                   0.4
                      1         2       3        4             1         2       3       4              1         2        3      4
                          (a) Setting 1                          (b) Setting 2                          (c) Setting 3
                                                 Figure 2: Impact of δ on nDCG
[3]. The dataset used contains 3148 users, 30520 tracks,                   proach computes the results based on our socio-textual rel-
12565 tags and 5616 unique bonds of friendship among the                   evance model discussed in Section 4.1. Finally, socBinary
users collected, which was made freely available by [3]. In                and sotextBinary are simplified versions of soc and sotext
our context (search), each track is a document (object) con-               approaches in which action listening only has the binary
sisting of several textual keywords (tags). Users search for               value 0 or 1 (instead of the actual playcount value). In
desired documents (tracks) by specifying one or more tex-                  other words, the value of user action function is calculated
tual keywords (tags).                                                      as follows:
   Actions. The only information available from last.fm                       uaf (ui , ok ) = 1 if playcount(ui , ok ) > 0, uaf (ui , ok ) = 0
website is the number of times a user listens to a track. This             otherwise.
value is called playcount and is a very important indicator of                For each query, when using soc,sotext, socBinary and so-
relevance/importance between the user and the track. This                  textBinary approaches, we do not use the existing informa-
is an action that can be performed multiple times on a track               tion regarding the relationship/actions between the querying
(user listening to a track multiple times). We use this ac-                user and the objects (tracks). This is done to be fair and
tion to calculate the value of user action function as follows:            be able to evaluate the social component of the approaches
                  playcount(ui ,ok )
uaf (ui , ok ) = max−playcount(u       where playcount(u i , o k ) is      based only on user’s social network and friends.
                                    i)
the number of times user ui listens to track ok without skip-                 Settings. We evaluate the results for three different set-
ping the song and max − playcount(ui ) is the maximum                      tings. Setting1 is the default setting with all the details
playcount value among all tracks that user ui has listened                 described so far. Setting2 is a subset of setting1 in which
to.                                                                        queries that generate fewer than k results are pruned (and
   Queries. For each query, we randomly chose 1 to 3 tex-                  not evaluated). Finally, setting3 is a subset of setting2 in
tual keywords (tags) from the list of all the tags in our                  which queries with querying users with less than 8 immedi-
dataset, and one random user from the list of all users with               ate neighbors are pruned.
the minimum of 4 friends in the system. We filtered out                       Evaluation Metric. We evaluated the accuracy of the
queries that did not generate any results. Queries are per-                methods under comparison using the most common stan-
formed in rounds. Each round consists of 100 queries and is                dard metric: nDCG (normalized discounted cumulative gain)
conducted for each input setting.                                          [15]. When computing nDCG, we consider the playcount
   Ground Truth. To evaluate our results, we have to com-                  value as the relevance value. IDCG (ideal DCG) is the re-
pare them with a ranking that is the most relevant ranking                 sults generated by our ground truth.
to the user (ground truth). Since playcount indicates the real
interest of each user to each track, we leverage playcount to              5.1 Results
construct the most relevant list (ranking) for each query as
follows: for each query, we return list of all textually relevant           5.1.1 Varying k
tracks (tracks contain one or more of the query keywords),                    With the first set of experiment, we evaluate the effec-
order them based on the querying user’s playcount values                   tiveness of the proposed approaches by varying number of
and return the top k results.                                              requested results k. We report the average nDCG for each
   Approaches. We computed top-k query results for each                    round. Here, we fix the number of keywords at 1, alpha at
query using the following five approaches: soc, text, sotext,              0.5 and the threshold value δ at 2. The value of k varies
socBinary and sotextBinary. soc approach generates the re-                 from 1 to 20. The results for three settings are shown in
sults based on the social relevance model only (presented                  Figures 1(a), 1(b) and 1(c), respectively. The first observa-
in Section 4). text approach generates the results based                   tion is that sotext is the most effective approach among all
on the conventional tf-idf relevance model only. sotext ap-                the three settings. This is very promising since it shows that
             0.8                                       0.8                                    0.8

             0.7                                       0.7                                    0.7

             0.6                                       0.6                                    0.6

             0.5    soc            text                0.5   soc            text              0.5   soc            text
                    sotext         socBinary                 sotext         socBinary               sotext         socBinary
                    sotextBinary                             sotextBinary                           sotextBinary
             0.4                                       0.4                                    0.4
                    0   0.2 0.4 0.6 0.8        1             0   0.2 0.4 0.6 0.8         1          0   0.2 0.4 0.6 0.8        1
                        (a) Setting 1                          (b) Setting 2                            (c) Setting 3
                                                   Figure 3: Impact of alpha on nDCG
combining the textual relevance and social relevance using                    documents based on both their social and textual features.
our model generates more relevant/accurate results in com-                    We proposed a new scoring model to calculate social rele-
parison with using only textual relevance or social relevance.                vance between documents and users. The proposed social
The second observation is that sotext and soc are superior                    relevance ranking utilizes the querying user’s social network
to their corresponding binary approaches (sotextBinary and                    and actions her friends perform on web documents (objects)
socBinary). This implies that using a more detailed action                    to generate more accurate results for her (textual) searches.
model will improve the accuracy of the results. The third                     We also showed how to combine the new social relevance
observation is that the results for our approaches are get-                   with the textual relevance model. We performed a set of
ting better from setting1 to setting2 and from setting2 to                    experiments on the real dataset of last.fm and proved that
setting3. This shows that 1) social-related approaches are                    the new approach is superior to the existing approaches.
even better for more realistic settings, and 2) socially-related
approaches generate more accurate results when users have                     7.        REFERENCES
more neighbors (more socially connected).                                     [1] D. Horowitz et al., The anatomy of a large-scale social
                                                                                  search engine. WWW 2010.
5.1.2    Varying δ                                                            [2] D. Carmel et al., Personalized social search based on
  In the second set of our experiments, we evaluate the im-                       the users social network. CIKM 2009.
pact of changing the threshold value δ. For different rounds,                 [3] I. Konstas et al., On social networks and collaborative
we set the threshold value to 1,2,3 and 4. In this set of ex-                     recommendation. SIGIR 2009.
periments, we fix the number of query keywords at 1, k at                     [4] P. Yin et al., On top-k social web search. CIKM 2010.
5 and alpha at 0.5. The results for three settings are shown                  [5] L. Gou et al., SNDocRank : Document Ranking Based
in Figures 2(a), 2(b), and 2(c), respectively. Again, for all                     on Social Networks. WWW 2010.
cases sotext easily outperforms the other four approaches.
                                                                              [6] L. Gou et al., SNDocRank: A social network-based
As expected, the accuracy increases for socially-related ap-
                                                                                  video search ranking framework. MIR 2010.
proaches as the threshold value increases (and obviously no
change for text approach). Again, these figures confirm the                   [7] L. Gou et al., Social network document ranking. JCDL
observation that sotext and sotextBinary are superior than                        2010.
sotextBinary and socBinary approaches.                                        [8] A. Mislove et al., Exploiting social networks for
                                                                                  internet search. HotNets 2006.
5.1.3     Varying alpha                                                       [9] L.C. Freeman et al., Centrality in social networks:
   In the final set of our experiments, we evaluate the im-                       Conceptual clarification. Social Networks, 1(3),
pact of changing the value of alpha (relative weight of so-                       215-239, 1979.
cial and textual relevances) on the effectiveness of the pro-                 [10] M. Rawashdeh et al., Folksonomy-boosted social
posed approaches. We vary the value of alpha from 0 (so-                          media search and ranking. ICMR 2011.
cial only) to 1 (textual only). In this set of experiments,                   [11] R. Schenkel et al., Efficient top-k querying over
we fix the number of query keywords at 1, k at 5 and δ at                         social-tagging networks. SIGIR 2008.
2. The results for all three settings are shown in Figures                    [12] S.A. Yahia et al., Efficient network aware search in
3(a), 3(b), and 3(c), respectively. The obvious observation                       collaborative tagging sites. VLDB 2008.
is that the results do not change for textual or social only ap-              [13] F. Liu et al., Use of social network information to
proaches. The more interesting observation is the behavior                        enhance collaborative filtering performance. Expert
of the two socio-textual approaches. While both show their                        Syst. Appl. 37, 7, 2010.
poorest results on the boundaries (only social or only tex-                   [14] S. Park et al., Applying collaborative filtering
tual), they present their best accuracy in the middle of the                      techniques to movie search for better ranking and
range (when both textual and social relevance are consid-                         browsing. KDD 2007.
ered almost equally). We have to note that for most cases,                    [15] K. Jarvelin et al., Cumulated gain-based evaluation of
the best accuracy is achieved when the social relevance has                       IR techniques. ACM Transactions on Information
a little more weight. Again, this set of experiments confirm                      Systems 2002.
the above observations regarding the superiority of sotext                    [16] T. Yan et al., CrowdSearch: exploiting crowds for
and also the improved accuracy of setting2 and setiing3.                          accurate real-time image search on mobile phones.
                                                                                  MobiSys 2010.
6.   CONCLUSION                                                               [17] M.J. Franklin et al., CrowdDB: answering queries with
  In this paper, we introduced the problem of ranking web                         crowdsourcing. SIGMOD 2011.