=Paper= {{Paper |id=None |storemode=property |title=Social-Textual Search and Ranking |pdfUrl=https://ceur-ws.org/Vol-842/crowdsearch-khodaei.pdf |volume=Vol-842 |dblpUrl=https://dblp.org/rec/conf/www/KhodaeiS12 }} ==Social-Textual Search and Ranking== https://ceur-ws.org/Vol-842/crowdsearch-khodaei.pdf

Social-Textual Search and Ranking∗

Ali Khodaei Cyrus Shahabi
Department of Computer Science Department of Computer Science
University of Southern California University of Southern California
Los Angeles,CA 90089,USA Los Angeles,CA 90089,USA
khodaei@usc.edu shahabi@usc.edu

ABSTRACT ditional web where users are often in read-only mode, Web
Web search engines are traditionally focused on textual con- 2.0 have enabled users to be in read-write mode. In other
tent of data. Emergence of social networks and Web 2.0 words, users have started to express themselves in the forms
applications makes it interesting to see how social data can of generating and publishing content (e.g., writing a tweet),
be used in improving the conventional textual search on the re-sharing interesting content by others (e.g., re-tweeting)
web. In this paper, we focus on how to improve the ef- and rating/evaluating the existing content (e.g., choosing a
fectiveness of web search by utilizing social data available favorite tweet). This emergence of social networks and Web
from users, users actions and their underlying social net- 2.0 resulted in huge amount of data available that can be
work on the web. We define and formalize the problem of utilized in many domains.
social-textual (socio-textual ) search and show how social as- In this paper, we focus on taking advantage of this infor-
pect of the web can be effectively integrated into the textual mation in the domain of (textual) web search. We argue that
search engines. We propose a new social relevance ranking by integrating information from users’ social networks and
based on several parameters including relationship between their activities on the web, we can improve the conventional
users, importance of each user and actions users perform on textual search and ranking. In today’s web, we can know the
web documents (objects). We show how the proposed so- existence and degree of relationships among people and also
cial ranking can be combined with the conventional textual at the same time have the knowledge of people’s interests de-
relevance ranking. We have conducted an extensive set of rived from their actions/activities on the web. It is both in-
experiments on the data from online radio website last.fm to tuitive and proven [3] that people have very similar interests
evaluate the effectiveness of our proposed approaches. Our with their friends. Also, people tend to trust the opinions
experimental results are very promising and show a signifi- and judgements of their friends more than strangers. We
cant improvement for socio-textual ranking over textual only show how to modify the existing (textual) relevance rankings
and social only approaches. to take into consideration user’s social network in generating
ranked results to the search queries. Consider the following
example. A user searches for ”funny video clip”. Using con-
1. INTRODUCTION ventional textual search, user will receive a ranked results
Social networks on the web have grown significantly over of some funny video clips. On the other hand, using user’s
the past few years. People have started to reconstruct their social network, videos contain query keywords (i.e., funny
friendship networks in the virtual world and many of these video clips) that have more comments, likes or favorites by
virtual relationships are good representatives of their actual user’s friends should be ranked higher. However, the new
(friendship) networks in the real world. At the same time ranked ranking is not trivial. Do we give more weights to
and with the emergence of Web 2.0, many web users have textual keywords or to the social network? With social as-
started to engage more with the web. In contrast to the tra- pect of the ranking, do we need to assign different weights
to different friends of the user? How about the popularity
∗This research is supported in part by the NSF grant of the users (friends) in general? Also, what are the actions
IS-1115153, the USC Integrated Media Systems Center that are important for objects and how we quantify those?
(IMSC), and also by unrestricted cash and equipment gifts In order to combine social data into textual relevance rank-
from Google, Microsoft and Qualcomm. The opinions, find-
ings, and conclusions or recommendations expressed in this ing, social relevance between users and objects (documents)
publication are those of the authors and do not necessarily has to be defined first. In order to model social relevance,
reflect the views of the National Science Foundation. existence and degree of relationships between users have to
be taken into consideration. Also, actions permitted for each
type of document (object) and their importance should be
modeled. Finally, overall importance/impact of each user
has to be considered as well.
We first review the few existing studies regarding social
Copyright c 2012 for the individual papers by the papers’ authors. Copy- search and utilization of social networks in the web search.
ing permitted for private and academic purposes. This volume is published Then, we define and formalize our problem. Next, we present
and copyrighted by its editors. new scoring methods to calculate social relevance between
CrowdSearch 2012 workshop at WWW 2012, Lyon, France users and documents (objects). We show how the impor-
tance of users, different relationships among users and ac- or +1ed that result. Their algorithms are not public and it
tions they perform on objects can impact the final relevance seems that they only show the likes and +1s and the actual
ranking. After proposing the new social relevance model, we ranking is not affected.
present a novel socio-textual relevance ranking technique to There exists a relevant but somehow different topic of folk-
combine textual and social relevance rankings. Finally, in sonomies. Tags and other conceptual structures in social
our experimental section, we evaluate the effectiveness of tagging networks are called folksonomies. A folksonomy is
our proposed models and show that our new relevance rank- usually interpreted as a set of user-tag-resource triplets. Ex-
ing methods are effective and improve the accuracy of the isting work for social search on folksonomies is mainly on
returned results. improving search process over social data (tags and users)
gathered from social tagging sites [10][11][12]. In this con-
2. RELATED WORK text, relationships among users and tags and also among
tags themselves are of significant importance.
There are two groups of related work on the application
Finally, there are few studies on the role of collaborative
of social networks in search. With the first group, people
filtering in this new social context. Role of social networks
through their social networks are identified and contacted
on collaborative filtering is studied in [3] and [13]. It is shown
directly to answer web queries. In other words, queries are
that using social networks in collaborative filtering and rec-
directly sent to individuals and answer of the queries are
ommendations makes the recommendations better in com-
coming from people themselves [1, 16, 17]. In this approach
parison with the traditional collaborative approaches. In an-
called search services, people and their networks are indexed
other direction, [14] studies the application of collaborative
and a search engine has to find the most relevant people to
filtering on a movie search engine. Authors propose to calcu-
send the queries/questions to.
late documents (movies) authorities based on users’ ratings
The main focus of the second group is on the search pro-
(using collaborative filtering) instead of pagerank and other
cess over social data (tags, users and objects) from sites/application
link-based authority measures. Social networks of users are
with social aspect such as social tagging sites and (some)
non-existent in this study.
Web 2.0 applications. In [2], authors investigate a personal-
In contrast to the above, our notion of social search is
ized social search engine based on users’ relations. They
to utilize exiting social networks to improve the accuracy
study the effectiveness of three types of social networks:
and relevance of convention textual web search. For us,
familiarity-based, similarity-based and both. In [4], which
search still has its core textual dimension, represented by
is a short paper, authors propose two search strategies for
textual keywords/content in the query and the documents.
performing search on the web: textual relevance (TR)-based
In parallel to the textual dimension, (querying) user’s social
search and social influence (SI)-based search. In the former,
network is exploited to make the final search results more
the search is first performed according to the classical tf-idf
relevant. Our focus is mostly on finding/modeling effective
approach and then for each retrieved document the social in-
measures to calculate the social relevance/ranking and com-
fluence between its publisher and querying user is computed.
bine it with the existing standard textual relevance rankings.
The final ranking is based on both scores. In the latter, first
We also take into consideration the actions users perform on
the social influence of the users to the querying user is cal-
documents (as described in Section 3).
culated and users with high scores are selected. Then, for
each document, the final ranking score is determined based
on both TR and SI. In both strategies, two separate costly 3. DEFINITIONS AND FORMALIZATIONS
steps are needed. Also, it is not clear how accurate are the In this section, we formally define and formalize the prob-
ranking functions since there is no experimental evaluation lem of socio-textual search.
for the effectiveness of the rankings. Objects: We assume a collection O = {o1 ,o2 ,...on } of n
In a set of similar papers [5, 6, 7], authors propose sev- objects (documents). An object can be a traditional web
eral social network-based search ranking frameworks. The document such as a news page or a business home page
proposed frameworks consider both document contents and or a Web 2.0 object such as a YouTube video, a tweet, a
the similarity between a searcher and document owners in Facebook status or any other similar entity. An object o is
a social network. They also propose a new user similar- composed of a set of textual keywords Ko and a set of users
ity algorithm (MAS) to calculate user similarity in a social Uo associated with it. Uo is a set of users with some type of
network. In this set of papers, the focus is more on user actions on the object o (see actions below).
similarity functions and how to improve those algorithms. Users and Social Network: There is a set U = {u1 ,u2 ,...um }
Most of their experiments are limited to a small number of m users using the system. We also assume a social net-
of queries on YouTube videos with 3 users, 15 queries and work modeled as a directed graph G = (V, E) whose nodes
small number of textual keywords. Relevant (interesting) represent the users of the system and edges represent the
result is a result (video) whose category is simialr/equal to ties (relationship) among the users. The most common type
the dominant category of videos that searcher has uploaded. of relationship is the friendship relationship but other type
In a relatively older paper [8], authors explore the possi- of relationships can be also applied (e.g. follow relationship
bility of using online social networks to improve the search in Twitter).
on the Internet. Although this paper is not very technical Actions: There is a set A = {a1 ,a2 ,...al } of l actions that
, it provides some interesting intuitions on integration of users can perform on the objects. These actions represent
social networks and web search. With regards to commer- the relationship between users and objects. For instance, in
cial search engines, Bing and recently Google have started Twitter, users can perform the following actions on objects
to integrate Facebook and Google+, respectively, to their (tweets): publish a tweet, retweet a tweet or make a tweet
search process. For some search results, they show query as their favorite tweet.
issuer’s friends (from his/her social network) that have liked Socio-Textual Query: A socio-textual query is defined
as Q = hKq , Sq i, where Kq is the textual part of query speci- In this section, we propose a new social relevance model
fied as a set of keywords in the query and Sq is the social part to calculate the social relevance between users and objects.
of query specified as the user uq issuing the query and the We also show how to combine the proposed social relevance
social network G. Since our social network is always G, it is model with an existing textual relevance model and intro-
sufficient to define the socio-textual query as hKq , uq i. Note duce our socio-textual relevance ranking.
that while the textual part of the query is always explicit in We first propose a new scoring approach to calculate the
the query, the social part is often implicit. In other words, social relevance between an object o and a query q (issued by
we can safely assume that the system (search engine) knows user qu ). Our social relevance ranking creates a new scoring
the user issuing the query and also the underlying social net- framework to retrieve and rank objects based on the so-
work, hence the social part of the query can be automatically cial dimension of the query and objects. In order to have an
added to the textual part by the search engine.1 accurate scoring function and retrieve the most socially rele-
User relevance: User ui is relevant to user uj if the net- vant results to the user, we consider three important factors:
work distance from the node corresponding to ui to the node (1) relevance of each user to the query’s user, (2) importance
corresponding to uj is less than or equal to a system defined of each user in general, and (3) relationship between users
threshold value δ. The less the distance between two nodes, and actions they perform on each object. In the following
the more (user) relevant are those two nodes (users) 2 . Net- we discuss each measure.
work distance can be any of the existing network distances User Relatedness. We measure the relatedness of a
in the literature. Two users with the network distance more user to the querying user (and hence to the query itself)
than δ are considered non-relevant to each other. by the user relatedness function urf (uq , ui ). There are sev-
Social relevance: Social relevance between the object o eral measures to calculate the relatedness/closeness of two
and the query q is defined based on the social relationship nodes in a graph/social network. Some of the approaches
that exists between the querying user (uq ) and users asso- consider the distance between nodes, some look at the be-
ciated with the object o (Uo ). Object o and query q are haviors of users in a social network and some take into con-
socially relevant if at least one of the object’s users (Uo ) is sideration number of mutual neighbors of two nodes. While
user relevant with the user issuing the query. The larger the the required data is available, any of the above methods or
user relevance is, the more socially relevant o and q are. We other exiting methods can be used for the user relatedness
denote social relevance of object o to query q by socRel(o, q). function as long as the following three constraints are sat-
We define social relevance in more details in Section 4. isfied: (1) urf (ui , ui ) = 1, (2) 0 ≤ urf (ui , uj ) ≤ 1 and
Textual relevance: Object o is textually relevant to the the more relevant the users, the higher the value, and (3)
query q if there exists at least one keyword belonging to both urf (ui , uj ) = 0 when urf (ui , uj ) < δ. The first constraint
o and q, i.e., Kq ∩ Ko 6= ∅. We represent textual relevance states that each user is the most related user to herself. The
of object o to query q by texRel(o, q). 3 second constraint normalizes this measure and also ensures
Socio-textual relevance: Object o is social-textual (socio- that the more related users are assigned higher scores. Fi-
textual) relevant to the query q if it is both socially and tex- nally, third constraint filters out all relationships that their
tually relevant to the query q. Socio-textual relevance can significance is below a certain threshold (δ). As a simple
be defined by a monotonic scoring function F of textual and example satisfying all the above constraints and also cap-
social relevances. For example, F can be the weighted sum turing the relatedness among users, we can use an inverse of
of the social and textual relevances: distance between users (nodes) in the social network (graph)
as follows:
urf (ui , uj ) = dist(u1i ,uj )
F (o, q) = α.socRel(o, q) + (1 − α).texRel(o, q) (1)
where dist(ui , uj ) is the number of edges in a shortest
. α is a parameter assigning relative weights to social and path connecting ui and uj .
textual relevances. The output of function F (o, q) is the User Weight. We quantify the overall (global) impor-
socio-textual relevance score of the object o for the query q, tance of each user by the user weight function uwf (ui ). This
and is denoted by stRel(o, q). In Section 4 we show how to measure quantifies the significance of a user in its social net-
calculate socio-textual relevance. work. For instance, for Twitter, a user with many followers
Socio-textual search: A socio-textual search identifies should be assigned a higher weight than a user with only
and ranks all the objects that are socio-textual relevant to few followers, or for Facebook, a user with more friends is
the query q. The result is the top-k objects sorted based on more important to the social network than a user with fewer
objects’ socio-textual relevance scores. The parameter k is friends. In the field of graph theory and social networks this
determined by the user. value is called centrality and there exist several approaches
to measure it. Four popular methods to compute centrality
4. SOCIAL RELEVANCE RANKING are: degree centrality, betweenness, closeness, and eigenvec-
tor centrality. [9] is a good resource For a review of these
1
Naturally, here and in other parts of this paper, we consider methods and further reading. Similar to the user relatedness
only users who willingly make their social information public function, the user weight function is also general enough and
to the system. most of the existing approaches can be applied to uwf . As
2
For simplicity of presentation, from now on, we assume that an example for this function we can use the degree central-
users’ social network is implemented as an undirected graph. ity of nodes (users) as an indication of their importance as
Hence, user relevance and other relationships between users follows:
will be symmetric.
3 uwf (ui ) = deg(u
m−1
i)
In this paper, we do not focus on textual relevance mod-
els. We use popular tf-idf model when we need to calculate where deg(ui ) is the number of edges incident upon ui and
textual relevance. m is number of nodes (users).
User Action. The importance of each user for each ob- query keywords (tf), and 2) more important query keywords
ject is directly related to the action(s)4 user perform on (idf), in our social relevance model, more weight is given to
each object. Publishing/owning an object by a user shows the objects with 1) more important actions 2) performed by
a higher weight/relevance between the object and the user more important users 3) whom are more related (closer) to
than only commenting on the object. For instance, a user the querying user.
uploading (and thus owning) a YouTube video is more sig-
nificant to that video than a user who only comments on 4.1 Socio-Textual Search
that video. The importance/relevance of each user to an In this section, we combine social relevance with an exist-
object is measured by the user action function uaf (ui , ok ) ing textual relevance model (tf-idf) to calculate the overall
and is dependant on the type of action user ui performs on socio-textual relevance of the object o with query q with re-
object ok . For each system, weight/significance of each ac- gards to both social and textual dimensions. Socio-textual
tion should be determined based on specific characteristics relevance ranking considers both the textual relevance of the
of that system. We normalize the value of uaf by assign- objects to the query and also the social relevance of the ob-
ing values between 0 and 1 (inclusive) to it. The higher jects to the query. We formulate the socio-textual relevance
the value is, the more important/relevant is the user to the ranking as follows:
object. With some systems, there exist actions that can
be performed multiple times by a user on an object, and
the more the action is performed the higher is the relevance stRel(o, q) = α × socRel(o, q) + (1 − α) × texRel(o, q)
between the user and that object. For instance, in an on-
X
= α× urf (uq , vi ) × uaf (vi , o) × uwf (vi )
line radio website (e.g. last.fm5 ), action listening to a track vi ∈Uo
can be done multiple times by a user. The more the user X
chooses to listen to a track, the more relevant/significant is + (1 − α) × tf (o, tj ) × idf (tj )
that track to the user. Below, we show examples of differ- tj ∈Kq
ent actions and their corresponding weights for four popular (3)
web 2.0. objects. Note that the assigned weights are only
our suggestion, they can (and should) be easily changed for where stRel(o, q) is the socio-textual relevance of object
different applications and/or settings. Examples are as fol- o to query q where user uq is the query issuer; socRel and
lows: texRel are corresponding social and textual relevances for
object o; urf , uaf and uwf are user-related functions as
• YouTube videos: Actions = {own(publish) : 1, f avorite : described above; Kq is set of query keywords (tags) and tj s
0.9, like : 0.7, comment : 0.4}. are individual query keywords (tags); tf (o, tj ) is term fre-
quency function determining relevance of term tj to object
• Twitter tweets: Actions = {tweet(publish) : 1, f avorite : o; idf (tj ) is inverted document frequency function determin-
0.9, retweet : 0.5}. ing the importance of keyword tj in the entire collection;
and α is a parameter giving relative weights to social and
• Facebook objects: Actions = {own(publish) : 1, like : textual importance. Not only the implementation of urf ,
0.8, share : 0.6, comment : 0.4}. uaf and uwf functions are flexible (see Section 4), also the
• last.fm tracks (songs): Actions = {like : 0.8, tag : implementation of texRel is flexible. Although, we used
playcount
0.5, comment : 0.4, listen : max }. the conventional tf-idf model for capturing the textual rele-
playcount
vancy, any other textual relevance (similarity) function can
Note that for last.fm, we see an example of an action (listen) be also used. Equation 3 provides the general framework
that can be performed multiple times (keep in mind that for calculating socio-textual relevance and implementation
many other actions in our examples also can be performed and/or importance of each weight can be changed based on
multiple times). The above model can be applied to other the context and users/applications needs.
object types for different web, web 2.0 and non-web objects.
It is simple, flexible and easy to update/change based on 5. EXPERIMENTAL EVALUATION
different applications and purposes. It shows and quantifies In this section, we evaluate the effectiveness of our pro-
what users are important/relevant for each object and how posed approaches. First, we describe the dataset, the set-
much is this relevance/importance for each user/object. tings and the queries used for the experiments. Next, we
Now, we propose the final scoring function to calculate show and discuss the results.
the social relevance between object o and query q as follows: Data. There are very few publicly available datasets for
experimentation that include both friendships (social net-
X
socRel(o, q) = urf (uq , vi ) × uaf (vi , o) × uwf (vi ) (2) work) and textual keywords (tags). One very good dataset
vi ∈Uo is a dataset generated by [3] from a Web 2.0 website last.fm.
Since this dataset has both social and textual components
In Equation 2, uq is the user issuing the query and Uo is needed for our setting, we used this dataset. Last.fm is a
the set of users with some actions on the object o. While in music social network that allows users to listen to different
classical textual relevance models such as tf-idf, more weight music tracks, tag them with textual keywords and at the
is given to the objects (documents) with 1) more number of same time make friendships with other people on the net-
4
for simplicity, we assume that each user can perform at work. While the users listen to a track they have the ability
most one action on each object. However, our model can to either move to the next track of the playlist or keep lis-
easily be generalized for multiple actions per user. tening to the same. These actions can be interpreted as
5
http://www.last.fm/ explicit negative and implicit positive feedback respectively
0.8 0.8 0.8

0.7 0.7 0.7

0.6 0.6 0.6

0.5 soc text 0.5 soc text 0.5 soc text
sotext socBinary sotext socBinary sotext socBinary
sotextBinary sotextBinary sotextBinary
0.4 0.4 0.4
1 2 5 10 20 1 2 5 10 20 1 2 5 10 20
(a) Setting 1 (b) Setting 2 (c) Setting 3
Figure 1: Impact of k on nDCG
0.8 soc text 0.8 soc text 0.8 soc text
sotext socBinary sotext socBinary sotext socBinary
0.7 sotextBinary 0.7 sotextBinary 0.7 sotextBinary

0.6 0.6 0.6

0.5 0.5 0.5

0.4 0.4 0.4
1 2 3 4 1 2 3 4 1 2 3 4
(a) Setting 1 (b) Setting 2 (c) Setting 3
Figure 2: Impact of δ on nDCG
[3]. The dataset used contains 3148 users, 30520 tracks, proach computes the results based on our socio-textual rel-
12565 tags and 5616 unique bonds of friendship among the evance model discussed in Section 4.1. Finally, socBinary
users collected, which was made freely available by [3]. In and sotextBinary are simplified versions of soc and sotext
our context (search), each track is a document (object) con- approaches in which action listening only has the binary
sisting of several textual keywords (tags). Users search for value 0 or 1 (instead of the actual playcount value). In
desired documents (tracks) by specifying one or more tex- other words, the value of user action function is calculated
tual keywords (tags). as follows:
Actions. The only information available from last.fm uaf (ui , ok ) = 1 if playcount(ui , ok ) > 0, uaf (ui , ok ) = 0
website is the number of times a user listens to a track. This otherwise.
value is called playcount and is a very important indicator of For each query, when using soc,sotext, socBinary and so-
relevance/importance between the user and the track. This textBinary approaches, we do not use the existing informa-
is an action that can be performed multiple times on a track tion regarding the relationship/actions between the querying
(user listening to a track multiple times). We use this ac- user and the objects (tracks). This is done to be fair and
tion to calculate the value of user action function as follows: be able to evaluate the social component of the approaches
playcount(ui ,ok )
uaf (ui , ok ) = max−playcount(u where playcount(u i , o k ) is based only on user’s social network and friends.
i)
the number of times user ui listens to track ok without skip- Settings. We evaluate the results for three different set-
ping the song and max − playcount(ui ) is the maximum tings. Setting1 is the default setting with all the details
playcount value among all tracks that user ui has listened described so far. Setting2 is a subset of setting1 in which
to. queries that generate fewer than k results are pruned (and
Queries. For each query, we randomly chose 1 to 3 tex- not evaluated). Finally, setting3 is a subset of setting2 in
tual keywords (tags) from the list of all the tags in our which queries with querying users with less than 8 immedi-
dataset, and one random user from the list of all users with ate neighbors are pruned.
the minimum of 4 friends in the system. We filtered out Evaluation Metric. We evaluated the accuracy of the
queries that did not generate any results. Queries are per- methods under comparison using the most common stan-
formed in rounds. Each round consists of 100 queries and is dard metric: nDCG (normalized discounted cumulative gain)
conducted for each input setting. [15]. When computing nDCG, we consider the playcount
Ground Truth. To evaluate our results, we have to com- value as the relevance value. IDCG (ideal DCG) is the re-
pare them with a ranking that is the most relevant ranking sults generated by our ground truth.
to the user (ground truth). Since playcount indicates the real
interest of each user to each track, we leverage playcount to 5.1 Results
construct the most relevant list (ranking) for each query as
follows: for each query, we return list of all textually relevant 5.1.1 Varying k
tracks (tracks contain one or more of the query keywords), With the first set of experiment, we evaluate the effec-
order them based on the querying user’s playcount values tiveness of the proposed approaches by varying number of
and return the top k results. requested results k. We report the average nDCG for each
Approaches. We computed top-k query results for each round. Here, we fix the number of keywords at 1, alpha at
query using the following five approaches: soc, text, sotext, 0.5 and the threshold value δ at 2. The value of k varies
socBinary and sotextBinary. soc approach generates the re- from 1 to 20. The results for three settings are shown in
sults based on the social relevance model only (presented Figures 1(a), 1(b) and 1(c), respectively. The first observa-
in Section 4). text approach generates the results based tion is that sotext is the most effective approach among all
on the conventional tf-idf relevance model only. sotext ap- the three settings. This is very promising since it shows that
0.8 0.8 0.8

0.7 0.7 0.7

0.6 0.6 0.6

0.5 soc text 0.5 soc text 0.5 soc text
sotext socBinary sotext socBinary sotext socBinary
sotextBinary sotextBinary sotextBinary
0.4 0.4 0.4
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
(a) Setting 1 (b) Setting 2 (c) Setting 3
Figure 3: Impact of alpha on nDCG
combining the textual relevance and social relevance using documents based on both their social and textual features.
our model generates more relevant/accurate results in com- We proposed a new scoring model to calculate social rele-
parison with using only textual relevance or social relevance. vance between documents and users. The proposed social
The second observation is that sotext and soc are superior relevance ranking utilizes the querying user’s social network
to their corresponding binary approaches (sotextBinary and and actions her friends perform on web documents (objects)
socBinary). This implies that using a more detailed action to generate more accurate results for her (textual) searches.
model will improve the accuracy of the results. The third We also showed how to combine the new social relevance
observation is that the results for our approaches are get- with the textual relevance model. We performed a set of
ting better from setting1 to setting2 and from setting2 to experiments on the real dataset of last.fm and proved that
setting3. This shows that 1) social-related approaches are the new approach is superior to the existing approaches.
even better for more realistic settings, and 2) socially-related
approaches generate more accurate results when users have 7. REFERENCES
more neighbors (more socially connected). [1] D. Horowitz et al., The anatomy of a large-scale social
search engine. WWW 2010.
5.1.2 Varying δ [2] D. Carmel et al., Personalized social search based on
In the second set of our experiments, we evaluate the im- the users social network. CIKM 2009.
pact of changing the threshold value δ. For different rounds, [3] I. Konstas et al., On social networks and collaborative
we set the threshold value to 1,2,3 and 4. In this set of ex- recommendation. SIGIR 2009.
periments, we fix the number of query keywords at 1, k at [4] P. Yin et al., On top-k social web search. CIKM 2010.
5 and alpha at 0.5. The results for three settings are shown [5] L. Gou et al., SNDocRank : Document Ranking Based
in Figures 2(a), 2(b), and 2(c), respectively. Again, for all on Social Networks. WWW 2010.
cases sotext easily outperforms the other four approaches.
[6] L. Gou et al., SNDocRank: A social network-based
As expected, the accuracy increases for socially-related ap-
video search ranking framework. MIR 2010.
proaches as the threshold value increases (and obviously no
change for text approach). Again, these figures confirm the [7] L. Gou et al., Social network document ranking. JCDL
observation that sotext and sotextBinary are superior than 2010.
sotextBinary and socBinary approaches. [8] A. Mislove et al., Exploiting social networks for
internet search. HotNets 2006.
5.1.3 Varying alpha [9] L.C. Freeman et al., Centrality in social networks:
In the final set of our experiments, we evaluate the im- Conceptual clarification. Social Networks, 1(3),
pact of changing the value of alpha (relative weight of so- 215-239, 1979.
cial and textual relevances) on the effectiveness of the pro- [10] M. Rawashdeh et al., Folksonomy-boosted social
posed approaches. We vary the value of alpha from 0 (so- media search and ranking. ICMR 2011.
cial only) to 1 (textual only). In this set of experiments, [11] R. Schenkel et al., Efficient top-k querying over
we fix the number of query keywords at 1, k at 5 and δ at social-tagging networks. SIGIR 2008.
2. The results for all three settings are shown in Figures [12] S.A. Yahia et al., Efficient network aware search in
3(a), 3(b), and 3(c), respectively. The obvious observation collaborative tagging sites. VLDB 2008.
is that the results do not change for textual or social only ap- [13] F. Liu et al., Use of social network information to
proaches. The more interesting observation is the behavior enhance collaborative filtering performance. Expert
of the two socio-textual approaches. While both show their Syst. Appl. 37, 7, 2010.
poorest results on the boundaries (only social or only tex- [14] S. Park et al., Applying collaborative filtering
tual), they present their best accuracy in the middle of the techniques to movie search for better ranking and
range (when both textual and social relevance are consid- browsing. KDD 2007.
ered almost equally). We have to note that for most cases, [15] K. Jarvelin et al., Cumulated gain-based evaluation of
the best accuracy is achieved when the social relevance has IR techniques. ACM Transactions on Information
a little more weight. Again, this set of experiments confirm Systems 2002.
the above observations regarding the superiority of sotext [16] T. Yan et al., CrowdSearch: exploiting crowds for
and also the improved accuracy of setting2 and setiing3. accurate real-time image search on mobile phones.
MobiSys 2010.
6. CONCLUSION [17] M.J. Franklin et al., CrowdDB: answering queries with
In this paper, we introduced the problem of ranking web crowdsourcing. SIGMOD 2011.