=Paper=
{{Paper
|id=Vol-2145/p04
|storemode=property
|title=The Effect of Introducing Content Price in Distributed Social Networks
|pdfUrl=https://ceur-ws.org/Vol-2145/p04.pdf
|volume=Vol-2145
|authors=Christos Tryfonopoulos
}}
==The Effect of Introducing Content Price in Distributed Social Networks==
The Effect of Introducing Content Price in
Distributed Social Networks
Christos Tryfonopoulos
Dept. of Informatics and Telecommunications
University of the Peloponnese
GR22131, Tripoli, Greece
trifon@uop.gr
Abstract—Over the last few years a number of distributed content-based push technologies to cope with the information
social networks with content management capabilities have been explosion is also stressed by the deployment of tools such
introduced both by academia and industry. However, none of as Google Alert and CNN alerts. In an IF scenario, a user
these efforts has so far focused on supporting both information
retrieval and filtering functionality in a distributed social net- posts a subscription (or continuous query) to the system to
working environment. In this work we present a social network- receive notifications whenever certain events of interest take
ing architecture that offers both functionalities –in addition to place (e.g., when a blog post on Winter Olympics becomes
the usual social interaction tasks– in distributed social networks, available).
outline the associated distributed protocols, and introduce a However, the increasing usage of social networks gave rise
novel data source selection mechanism for identifying good data
sources. This novel data source selection mechanism is designed to skepticism about use of centralised services that are able
to take into account a combination of resource selection, predicted to withhold all uploaded content. Such centralised social net-
publication behaviour, and content cost to improve the selection working services are typically owned by private companies and
of information producers by users. To the best of our knowledge users need to upload their content, thus giving away ownership
our approach, coined AGORA, is the first work to model the rights to make it available to others. This allows companies
price of content and to study its effect on retrieval efficiency
and effectiveness in a distributed social network setting. Finally, to exploit user data and sell them to advertisers for profit.
our work goes beyond modelling by providing proof-of-concept To circumvent such practises, distributed social networking
experiments with real-world corpora and social networking data. services –building upon results from the P2P paradigm (e.g.,
[29], [31], [40])– have been proposed both by academia and
Index Terms—distributed social networks, content manage- industry in the form of distributed social platforms [34],
ment, information retrieval/filtering, content price, economic
modelling, experimental evaluation
[14], [14], [35] and distributed social content management
systems [23], [22], [17], [26], [33], [28]. Although all these
approaches offer different types of distributed social networks
I. I NTRODUCTION
that allow users to create communities, share content, and send
Much information of interest to humans is available today messages, none of them focuses on supporting information
on the Web, making it extremely difficult to stay informed filtering functionality in such a setup. Moreover, such dis-
without sifting through enormous amounts of information. In tributed environments without centralised control, prove an
addition, a vast amount of this information is published and interesting business model for pricing the content delivered
shared through social networking sites by users that participate by an information producer. In this setup, each information
in ‘social’ activities through the generation, commenting, producer (e.g., a news agency, a digital library, or a prolific
tagging, liking and sharing vast amounts of digital content. blogger) might have its own customer base of interested
As users engage increasingly more in the usage of social net- followers/subscribers, and may charge the delivered content
works, demand for content management capabilities has forced by subscription or per item.
social networks to go beyond the usual social interactions In this work, we present AGORA, a distributed social
(e.g., like, post, or poke) and offer basic content management networking architecture that allows users to share, search for
functionality. To this end, many social networks adopt the tra- (IR), and subscribe to (IF) content in a fully decentralised
ditional approach of content search (information retrieval - IR) way, while at the same time maintaining ownership of their
[23], [22], [17], [26], [33], [28]. All these approaches however content. Such a design is ideal for creating social content
ignore that the information filtering paradigm may also provide marketplaces1 , where users are able to price and distribute
a good alternative of keeping the users informed and at the their content while at the same time they maintain ownership
same time avoiding the information avalanche. In information rights. Content in our setup can be textual, such as status
filtering (IF) –also referred to as publish/subscribe, continuous updates/blog posts/tweets, or multimedia, such as photos or
querying, information push, or information dissemination– 1 AGORA is the Greek word for marketplace
users are able to subscribe to information sources and be
notified when content of interest is published. This need for Copyright held by the author(s)
19
while Section V experimentally demonstrates the effect of
cost in retrieval and filtering quality and compares the system
effectiveness with/without monetary flow. Finally, Section VI
discusses high-level conclusions, open issues, and possible
extensions.
II. R ELATED W ORK
In this section, we discuss related work in the context of
systems that are designed with content management in mind,
and platforms that provide distributed social networks.
A. Distributed Social Platforms
All available social networks (e.g., Facebook, LinkedIn,
Fig. 1. High-level view of the AGORA architecture. Elgg) are currently based on centralised solutions both for
storing and managing of content, which set scalability lim-
itations on the system and reduce fault-tolerance. Industry
videos, appropriately tagged. Our proposed solution offers has already detected these drawbacks and has lately turned
fundamental social interactions, while emphasising on content into solutions that diverge from the centralised model of the
management (IR and IF) and economic aspects such as the existing systems by developing platforms, such as Diaspora,
price/quality tradeoff of content. To the best of our knowledge, KrawlerX and OpenSocial, that provide APIs to support ap-
this is the first approach that aims at combining IR and IF plication hosting in remote application servers, owned and
in a social networking context, while taking into account managed by the application providers. In a similar spirit,
aspects of economic modelling. In the light of the above, the a strand of research work also moved towards hierarchical
contributions presented in this work are the following: organisations for supporting distributed social networking. The
• We propose a social networking architecture that offers distributed social systems SuperNova [34] and Scope [24] are
content management functionality in terms of IR and IF, based on a two-tier architecture, where nodes with higher
in addition to the usual social interaction tasks typically computing capability become super-nodes and form an overlay
supported in such scenarios. This is the first approach in to provide distributed data management of the P2P social
the literature to offer both functionalities. network. Client nodes connect to super-nodes and rely on
• We present the distributed protocols and services that reg- them for bootstrapping, sharing their content, and accessing
ulate node interactions, provide details on the distributed the shared information. Although all of the platforms and
IR and IF, and outline the different friendship facets system schemes propose the decentralisation of the social
introduced in the architecture. services, one of the main issues of the centralised architectures
persists: the existence of a single point where user information
• We devise a novel method to rank information producers
is collected and may be exploited.
according to (i) the content already published, (ii) the
To alleviate the above disadvantage, distributed platforms
expected future publishing behaviour, and (iii) the price
for social online networks based on the P2P paradigm were
announced by the information producer. This method
proposed [27], [1], [14]. LifeSocial.KOM [14] is a plugin-
allows us to achieve high recall with low cost (by searching
based extendible social platform that provides secure com-
at/subscribing to a small number of information produc-
munication and user-based data access control, and integrates
ers).
a monitoring component that allows users and operators to
• We study the effect of content price in our setup and observe the quality of the distributed system. Similar efforts
experimentally demonstrate that it is a key element on the aimed at spontaneous social networking; they include propos-
delivered content quality. Our modelling utilises concepts als for distributed social services in resource constrained de-
such as correlation between the quality/expertise of the vices (like tablets or smartphones) [35], [8], or in environments
information producer, demand-driven price rates, and cost with no infrastructure guarantees (e.g., high-attendance events)
of resources. [24]. All these approaches offer different types of distributed
Figure 1 shows a high-level view of the envisioned dis- social platforms that allow users to create communities, share
tributed architecture with different types of information pro- content, and send messages, but do not emphasise expressive
ducers and users with varying information needs (IR, IF or content search mechanisms.
both). Finally, other approaches in distributed social networking
The rest of the paper is organised as follows. Section II emphasise on delivering innovative and competitive services;
overviews related research, while Section III discusses the SCIMS [19] relies on an ontology-based model for managing
AGORA architecture and the associated distributed protocols. social relationships and status, the work in [37] aims at per-
Subsequently, Section IV highlights the economic and qualita- sonalising search results based on user context and friendship
tive aspects of AGORA and presents the price/quality tradeoff, relations, while Gemstone [36] targets data availability in the
20
absence of the data owner. To achieve this, a replica storage
scheme based on social relationships, online patterns of nodes,
and user experiences is utilised.
B. Distributed Social Data Management
Our work fits mainly into the area of content management
in distributed social networks and is inspired by previous
approaches on Semantic Overlay Networks (SONs) [29], [31],
[40] and based on works that emphasise on distributed content
location in social networks. Works like the eXO [23] and Fig. 2. The thematic and friend index of a node.
SoNet [22] systems are, similarly to AGORA, inspired by the
P2P paradigm to provide content location and management
services on large-scale decentralised social networks. To do structures, and effective object location mechanisms. Contrary,
so, the authors rely on a structured overlay and exploit the DHT-based architectures [23], [28] ignore node autonomy (by
accurate location mechanisms, but de-emphasise node auton- enforcing deterministic key/content placement) and emergent
omy. Contrary to these approaches, AGORA employs a loose structures (by enforcing network structure).
component architecture and introduces a new type of social Finally, a large number of research in the domain of dis-
relations between nodes: the semantic closeness of content. tributed social networks consists of studies on system security
In this way, nodes that are similar in terms of content, create [28], [7], user privacy [28], [7], [16], [15], [41], distributed
emergent groups likewise to the creation of social relations. access control [2], [4], and authentication mechanisms [2].
Our work shares ideas with the SocialCDN system [17], where Clearly, security issues are also relevant in our design and the
social caches (links among friends) are introduced as a way to AGORA system could benefit by adopting approaches like [2]
alleviate the network traffic and optimise data dissemination or [15] that enforce user privacy and access control. However,
(mainly by social updates). In [17], social cache selection the problem of security is orthogonal to our design and is not
is formulated as the neighbour-dominating set problem and further analysed as it is not the emphasis of this work.
a family of algorithms is proposed and evaluated. Contrary III. S YSTEM OVERVIEW
to AGORA, where the emphasis is on efficiently supporting
In this section, we provide an overview of the architectural
expressive content retrieval in the social paradigm, the em-
components of the proposed system and provide the distributed
phasis on SocialCDN is on the reduction on network traffic to
protocols executed locally by users and information producers.
facilitate fundamental social interactions.
The loose component architecture and the emphasis on node A. Architecture
autonomy of [26] resemble the architectural design of AGORA, We consider a distributed social network, where each user,
where an unstructured overlay network of nodes is utilised characterised by its interests, is connected to friends and
to support the distributed social infrastructure. However, the information producers with similar interests. The interests of a
focus of [26] is on the design of gossip protocols for efficiently user are identified automatically, i.e., by applying clustering on
disseminating profile updates to all interested users and does its local content repository. The network nodes use a heartbeat
not put any attention to the problem of content search and protocol that runs continuously and aims at identifying new
management. information producers based on the likelihood to have similar
Furthermore, the concept of creating and maintaining social interests with the node at hand. Each user maintains two
connections in distributed infrastructures is affined with the routing indices holding information for social friends and
problem of distributed data management in P2P networks. In thematic friends. Social friend links correspond to the social
SONs (e.g., [40], [6]), “social” connections between the peers relationship aspect of the network, while thematic friend links
(e.g., similarity of content, pattern, or distance in a physical are of two types: short-range links (i.e., links to information
level) are exploited to direct the search to nodes with relevant producers with similar interests) and long-range links (i.e.,
data (e.g., as in [32] that studies query routing strategies based links to information producers with dissimilar interests to
on “social” relationships). Other works on SONs (e.g., [31]) maintain connectivity between different clusters in the system).
focus more on the organisation of P2P networks as small- The reorganisation procedure is executed locally by each
world networks, where peers self-organise in groups of similar node and aims at bringing together users and information
interests to facilitate message-efficient query answering. Our producers with similar interests that are likely to share/search
work on AGORA borrows concepts and ideas from research on for/subscribe to content.
SONs and extends them for facilitating efficient and effective
data management in a social network setting. We suggest that B. Joining Agora
SONs offer the most promising architectural solution inspired When a user node connects to the network, its interests are
from the P2P paradigm; it is a perfect fit for a distributed automatically derived by its local content. For each interest,
social networking scenario providing high decentralisation, the node maintains a thematic index (T) containing the contact
high node autonomy, support for emergent semantic and social details and interest descriptions of information producers.
21
These links form the thematic neighbourhood of the node; the literature as the fireworks technique [29]. All the nodes
the links contained in T are refined accordingly by using receiving the message reduce TTL by one and apply the
the rewiring service described below. Furthermore, each node same forwarding technique; the message is not forwarded
maintains a friend index (F) containing the contact details further in the network when TTL reaches zero. Additionally
and interest descriptions of the social neighbourhood of the to forwarding, every node receiving a message compares the
node, comprised of explicitly declared friends in the network. subscription against its identified interests and, if similar,
Figure 2 shows the F and T for an arbitrary node. stores it in its local continuous query data structures to match
it against future publications. Content is kept locally at each
C. Locating information producers information producers in the spirit of [42]. This lack of publi-
This service is applied to locate new information producers cation dissemination is a design decision that offers increased
by establishing new connections and discarding old ones. scalability (by trading recall) and complies with the content
Each node may initiate this procedure by computing a scoring marketplace paradigm we aim for. Thus, only the nodes
function that combines the quality and price of all information indexing a subscription can notify the interested user, although
producers in its thematic index (T). If the computed score is other information producers may also publish relevant content.
greater than a threshold then the node does not need to take When an information producer wants to publish new content
any further action, since it is already aware of information to the network, it matches it against its local continuous query
producers that match its needs and budget. Otherwise, the database to decide which continuous queries match it and thus,
node initiates a process to identify new information producers which user node should be notified. Then, the information
by forwarding a message in the network –bounded by a producers delivers a notification for each continuous query by
time-to-live (TTL) mechanism– using the thematic and social sending to the query initiator a pointer to the matching content;
connections and collecting the interests of other information if the user is not online, the provider stores the message
producers. and delivers it upon user reconnection. Notice that since the
The issued message is forwarded with equal probability information producer maintains the content ownership, it is
to (i) a number of randomly chosen entries contained in a now able to charge the user at the announced content price.
node’s T, (ii) a number of randomly chosen entries contained
in a node’s F, or (iii) the most similar nodes to the message E. Searching for content at information producers
initiator, found in either T or F. The rationale of applying A user issuing content search forwards a message in the
either of the forwarding strategies is that the message initiator network following a mechanism similar to the one described in
should be able to reach information producers both directly the previous section. Additionally to query forwarding, every
(through other similar nodes), but also indirectly (through information producer receiving a query message compares
propagation of the message through non-similar nodes). Each it against the identified interests and, if similar, matches it
node that receives the message adds its interest in it, reduces against the locally stored content. Subsequently, pointers to the
TTL by one, and forwards it in the same manner. When the matching content are sent to the query initiator (along with the
TTL of the message reaches zero, the message containing announced price for the content). Subsequently answers from
the contact info and interests of all information producers all information producers are ranked by a combination of price
that received the message is sent back to its initiator. To and similarity to the query (discussed in the next section) and
speed up the process, every intermediate node receiving the the list is presented to the user.
message may utilise the information in it to refine its thematic
connections. IV. R ANKING OF INFORMATION PRODUCERS
To select which information producer will be monitored,
D. Subscribing for content at information producers the protocols described in the previous section use a scoring
Queries/subscriptions are issued as free text or keywords function to rank information producers. In this section, we
under the Vector Space Model and are formulated as term quantify these concepts and give the rationale between our
vectors. The user subscribing for specific content forwards choices.
a message in the network with a TTL using both its social
and thematic connections. The issued message is forwarded A. Quality vs Price
both to (i) friends that have interests similar to the query and The ranking strategy for the information producers is a
are contained in F and (ii) a small number of information critical component since it allows users to locate relevant infor-
producers contained in the T chosen as described below. mation and information producers to maximise their revenue.
Initially, the message initiator compares the user subscription Empirical studies have shown that price and quality are the
against its interests and, if similar, the message is forwarded to two key determinants of the consumer’s choice to buy or not
all of its short-range links, i.e., the message is broadcasted to a product [10]. Thus, to make an informed selection on the
the node’s neighbourhood (explosion). Otherwise, the message information producers, a user has to rank them based on a
is forwarded to a small fixed number of nodes that have the blend (denoted by tunable parameter β) of content quality and
highest similarity to the subscription (fixed forwarding). The content price both in a retrieval and a filtering scenario. This
combination of the two routing strategies is referred to in combination describes the benefit/cost ratio for the content and
22
allows the user to assign a score to every information producer. new mutually beneficial connections. This is in line with the
To this end, the score for each information producer is given notion of friendship connections in any social network.
by: The main goal of this modelling is to study the influence of
score(p, i) = β · price(p, i) − (1 − β) · quality(p, i) (1) the cost component on the quality of received content, study
the interactions between users and information producers, and
Here, p denotes an information producer, q denotes an gain insights about the overall behaviour of this prototype
information need (in the form of a query or a subscription), content market.
price(p, i) refers to the announced price an information pro-
ducer provides for the provided content, and quality(p, i) C. Content quality specifics
denotes how relevant p is to i. Information producers compute
the price based on the demand and according to popularity of The quality of the content is a difficult thing to model
themselves and that of their content. Price and quality have as it cannot be known prior to acquiring it. Hence the best
the same range of values to allow for their combination, while option is to assess the quality of the information producer.
price is typically recomputed whenever the popularity of the To do so, one has to take into account the dual capacity of
information producer changes. In Section V we study the price information producers in AGORA: to answer one-time content
in different scenarios, and show the effect on the quality of requests (IR) and satisfy long-term information needs (IF).
retrieved content (i.e., recall) when the price choice is (i) To this end, the quality of an information producer has to be
random, (ii) strongly correlated with quality, and (iii) partially based on (i) the quality of already published content (since
correlated with quality. this depicts its ability to satisfy IR tasks from users) and (ii) a
predicted quality of the content to be published in the future
B. Content price specifics (since this depicts its ability to satisfy IF tasks from users).
The necessity of both facets is better illustrated in the case of
In this section, we analyse the economic modelling of
an information producer that provides content in the form of
AGORA and review the basic assumptions and expectations
technical articles and news items; articles have a long shelf-
from such a modelling. Usefulness of the information goods
life and are good candidates for recurrent sales, while news
received by a subscriber is a qualitative criterion, that is diffi-
items have an extremely short shelf-life and after some time
cult to model. In AGORA, we model usefulness by matching
users loose interest in them.
interests, i.e., by assuming that all received content relevant
to the information need are useful to the subscriber, and do 1. Quality of published content. The quality of the already
not discuss issues such as novelty, coverage of the field, or published content is known as the resource selection problem
user effort. In our modelling, after a user acquires a history in the areas of information retrieval and databases. Hence,
of transactions with certain information producers, develops a number of standard resource selection algorithms such as
an affect for some of them. Affect can be modelled in various tf-idf based methods, CORI, or language models (see [30]
ways, depending on the task at hand, and can be either positive for a nice overview) have been proposed. We use a standard
or negative [11]. In AGORA, a user does not a priori know the tf-idf based component that combines accuracy of modelling
quality of the information goods, but uses the affect developed and ease of implementation. Thus, in our setup, quality of
from previous transactions to approximate it. Subsequently, he published content is given by:
compares the values of information quality to the expected X
max
[0.5 · log (dfp,t ) + 0.5 · log tfp,t ] (2)
values and update its affect [21]. t∈i
The costs in AGORA are results of actions [18], such as
transactions, network communication, and use of common Our approach uses the standard IR constructs of document
infrastructure. Information producers try to maximise their frequency (df ) and maximum term frequency (tf max ) [25],
revenue while minimising expenses that occur due to their while a balanced combination of these two metrics (0.5 in the
actions. Contrary users try to maximise the utility of the above equation) is used to equally emphasise importance of
received content, while minimising expenses that occur due to df and tf max according to [30], [43].
actions. In general, the content market in AGORA is not a pure 2. Predicted quality of content to be published. To pre-
competitive market in the sense of e.g., [39], since users do dict the quality of the content to be published, we model
not know in advance the exact content quality they are buying. IR statistics per information producer as time-series data
AGORA resembles the modelling of a team of sales people and use statistical analysis tools [5] to predict future values
[38], where stakeholders try to collaborate with others in order based on past observations. Such techniques take into account
to get their expertise for a (cross/up) sale. After deciding who assumptions about the internal structure of the time series
to collaborate with, it is possible to model the gap between like trends and seasonality and tend to emphasise recent
the initial expectations and the actual actions. In [12], it is observations. Since seasonality requires long-term statistics
shown that this gap is smaller in a competitive relationship that are infeasible to maintain (e.g., several years of data to
compared to that of a cooperative relationship. As in many observe seasonality in the publication of Christmas content),
cooperative environments each stakeholder usually retains its we resort to double exponential smoothing techniques [5] as
connections with the others, while also being free to explore our prediction mechanism, since it requires a small amount of
23
data, emphasises recent over older data observations, and al- producers that were chosen according to his own information
lows for trend identification (useful for news items or trending needs and ranked using the formulas discussed in Section IV.
topics). Thus, in our setup, predicted quality of content to be For deciding the per user budget, we relied on studies about
published is given by: budget distribution and spending for a variety of cases, ranging
X
∗ from family budgets to consumer budgets [20], [9]. The main
) + log δ(cs∗p ) + 1 + 1
log δ(dfp,t (3) conclusions drawn from these studies are that (i) budget
t∈i
distribution follows a power law, with a small percentage
∗
In the formula above, function δ(dfp,t ) stands for the differ- of families/consumers having a high (yearly) budget, and a
ence between the predicted and the last document frequency large percentage of the families being in the (long) tail of
(df ) observed, and δ(cs∗ ) is the difference in the collection the distribution, with a low budget, and (ii) the percentage
size of information producer p reflecting its overall expected of the income spend on content does not vary with the
future publishing activity. These values are calculated for all budget. According to [3] and the above remarks, we divided
terms t contained in the user subscription. In this way we the users into three classes: low, average and high budget
model two aspects of the information producer’s behaviour: users with a population of 600K (or 60% of total users),
(i) its potential to publish relevant documents in the future 300K (or 30% of total users) and 100K (or 10% of total
∗
(reflected by δ(dfp,t )) and (ii) its overall expected future users) respectively. Subsequently, we experimentally computed
publishing activity (reflected by δ(cs∗ )). Notice also that, a budget that would allow the users to subscribe to the top-100
in the above formula, the publication of relevant documents (i.e., 10%) information producers, and allowed the low budget
∗
(i.e., δ(dfp,t )) is more emphasised than the publication rate users to have 60%, the medium budget users to have 80% and
∗
(delta(cs )) due to the nesting of the log functions. The the high budget users to have 120% of this ideal budget.
addition of 1 in the log functions is used to yield positive To measure the effect of content cost in quality (with and
predictions and avoid log(0). without monetary flow) we utilise the following metrics:
3. Putting it all together. Given Formulas 2 and 3, the • Messages. We measure the message traffic per action in
overall quality of an information producer p with respect to AGORA.
an information need X i is given by: • Recall. We measure average recall over all rounds by
max
quality(p, i) = [(0.5 · log (dfp,t ) + 0.5 · log tfp,t )+ computing the ratio of the total number of retrieved
t∈i content to the total number of relevant content for all
∗
) + log δ(cs∗p ) + 1 + 1 ]
log δ(dfp,t users.
(4) • Ranking. We use an extension of Spearman’s footrule
distance to compare rankings of information producers
V. E XPERIMENTAL E VALUATION calculated by users. This metric allows us to compare two
In this section, we present our findings from the introduction different information producer rankings by calculating the
of content cost, and how it affects the effectiveness of the distance between the elements in two ranking lists. In our
system. We study the behaviour of AGORA using different implementation if an element from list A is not present
scenarios, while varying the correlation between price, quality in list B, it is considered as being in the last available
and customer demand. position in B.
Finally, to assist the reproducibility of our results we plan
A. Experimental Setup and Metrics
to publicly release our code in an appropriate repository after
We used a real-life document collection containing 2M doc- the publication of the work.
uments with a vocabulary of about 600K terms from a focused
Web crawl. The documents were categorised in ten different B. Varying the price-quality correlation
categories by content, with the smallest category having 67K In this experiment, we aimed at observing the behaviour of
documents and the largest one of 325K. In all experiments, AGORA when varying the correlation 0 ≤ κ ≤ 1 between the
the network consists of 1, 000 information producers and 1M price and the quality of an information producer; k = 1 means
users. Each information producer started with a database of that the price an information producer charges for content
300 documents initialised with 15% random category, 10% is fully correlated to the quality of that producer. Notice
non-categorised, and 75% single category documents, resulting that quality in AGORA is easily calculated/forecasted using
in 100 specialised information producers for each category. We Equation 4 and in this case users know that the content from
artificially constructed information needs for the users (in the this information producer will be expensive but relevant (i.e.,
form of queries and subscriptions with multiple terms) using useful to them). The other extreme is when k = 0, where
the document corpus and by selecting terms that are strong prices have no correlation to the quality of the information
representatives of a document category (i.e., a frequent term producer and are chosen randomly. For the rest of the cases,
in documents of one category and infrequent in documents of i.e., 0 < κ < 1, the correlation is modelled as the likelihood
the other categories). The simulation was done in rounds. that an information producer sells κ% of content underpriced
We introduced budget constraints on a per user basis; thus, (up to 20% of the initial value), and 1 − κ overpriced (up to
each user was able to follow only a few selected information 20% of the initial value).
24
40
κ = 0% κ = 0%
κ = 25% κ = 50%
0.5 κ = 50% κ = 75%
κ = 75% 35 κ = 100%
30
Extended Spearman’s footrule metric
0.4
25
0.3
recall
20
0.2 15
10
0.1
5
0 0
0 0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1.0
β β
Fig. 3. Recall against β. Fig. 5. Difference in information producer ranking against β.
Figure 3 presents the achieved recall for different values different values of κ is an effect of the modelling of AGORA as
of β and κ, where we notice that the introduction of price a closed system where monetary flow is limited by the budgets
for content reduces the observed recall (notice that recall has of the users and no new wealth is produced.
the highest values for β = 0, i.e., no pricing is involved in Figure 5 shows how differently users rank information
information producer ranking). This is an important result, producers for varying values of β and κ. The difference in
showing that when information producers charge for content, the ranking of information producers is measured using an
consumers trade quality for cheaper options. Notice also that extension of Spearman’s footrule metric. To produce a point
for every κ value tested recall is retained high, as long as it in the graph we compare the ranked lists of information
plays the most important role in the ranking (β < 50%). This producers for each pair of users for and each value of β in
was also expected, since when price is of importance, con- the x-axis and average the results. Notice that the case of
sumers will choose cheaper information producers, leading to β = 0 is omitted as the extended Spearman’s footrule metric
a reduction in the observed recall. Additionally, when the price is always 0 since the lists compared are identical (users take
is the only ranking criterion for users (β = 1), recall is close to into account only the quality to rank information producers).
that of a random choice of information producers (remember When β increases (i.e., price becomes more important in
that users monitor only 10% of information producers in the the ranking process) Spearman’s metric increases too, as
system). information producers with high quality get lower positions in
the ranking, while information producers with lower quality
0.6
β = 0.2
(but cheaper) are ranked high. Finally, when the price of
β = 0.5
β = 0.8 an information producer is not associated with its quality
0.5 (random price setup or κ = 100%), there are big differences in
the ranking of information producers (especially in the cases
0.4 where price matters more – the leftmost points in the graph)
due to the introduced randomness.
recall
0.3
20
κ = 0%
κ = 25%
0.2 κ = 50%
κ = 75%
0.1 15
# of messages/document
0
0.2 0.4 0.6 0.8 1
κ 10
Fig. 4. Recall against κ.
5
Figure 4 shows how recall is affected for three different val-
ues of β, when κ increases. When one of the two components
becomes dominant in the ranking function, it outweighs the 0
0.0 0.2 0.4 0.6 0.8 1.0
effect of the other. This is in line with our expectations, since β
price dominates the ranking function and quality is sacrificed
to reduce costs. Finally, notice that consistent recall between Fig. 6. Message traffic against β.
25
3.5 0.5
heartbeat messages Change in Publishing Behaviour
other messages Consistent Publishing Behaviour
0.45
3
0.4
2.5
# of messages (x10K)
0.35
2
recall
0.3
1.5
0.25
1
0.2
0.5
0.15
0 0.1
0 50 100 0 0.2 0.4 0.6 0.8 1
κ β
Fig. 7. Message traffic against κ. Fig. 8. Recall for different scenarios against β.
C. System Performance graph are (i) the drop in recall for both scenarios as β increases
In the first series of experiments we targeted the perfor- and (ii) the higher recall values for the case of topic speciali-
mance of AGORA in terms of message traffic. Figure 6 shows sation. The reason for the first observation is the shift of users
that the message traffic per user incurred in AGORA is reducing to cheaper information producers due to the importance of
as β increases. This happens because users utilise the price content price in the ranking function. Additionally, the reason
component to rank information producers, thus choosing those for the second observation is that the build up of expertise by
of lower price and poor quality. This reduces the overall the information producers reflects on the higher quality values,
message number as less content is exchanged in the system which in turn leads to ranking quality information producers
due to users subscribing to non-expert information producers; higher. Contrary, when information producers shift their topic,
our observations here are consistent with those in Section V-B users are not able to correlate price and quality and thus,
that correlate recall and β. select information producers of poor quality (hence the drop
Figure 7 demonstrates the total amount of traffic observed in in recall).
AGORA and how this traffic is split in the various message cat-
VI. C ONCLUSIONS AND F UTURE W ORK
egories, as the price-quality correlation is varied. As expected
the heart-beat protocol messages dominate the messaging load, In this work, we proposed AGORA, a social networking
as necessary messages with information producer statistics architecture and the related distributed protocols to facilitate
and prices are disseminated in the system. Notice also that distributed content management in the form of information
these messages are not affected by price-quality correlation, retrieval and information filtering. In addition, we introduced
since the information producers have to update their publica- a novel selection mechanism for information producers that
tion statistics and prices, regardless of their customer base. allows users to rank them according to (i) their expertise,
Finally, notice that the number of messages for querying (ii) their predicted future content publications, and (iii) the
for/subscribing to/receiving content are affected as κ increases price of content they deliver. The experimental evaluation of
since information producers that are of high quality widen their our proposal demonstrated the behaviour of such a content
customer base with more users. marketplace in terms of price/quality tradeoffs, recall and
message traffic. To the best of our knowledge, these are
D. Varying the behaviour of information producers the first results that connect recall and message traffic in
In this section we look into recall and how this is affected by a distributed social network with the content cost, and put
two very different information producer behaviours: topic spe- economic modelling at the heart of system design. The most
cialisation and topic shift. In topic specialisation, information important outcome of our study is that price should participate
producers disregard market conditions and maintain their topic in the ranking of information producers less than a quarter
specialisation even if this results in lower revenues. Contrary in of the total score to avoid the delivery of irrelevant content
topic shift, information producers may alter their specialisation to users. We also showed that the introduction of content
topic over time based on changes in user demand, revenue and price improves system scalability by reducing message traffic
market conditions. In the latter case, an information producer and imposing a reasonable use of resources to stakeholders.
initially publishes content from one category, and some rounds Overall, introducing a monetary value for the production and
later may decide to switch to a different category to simulate dissemination of content in a distributed social network proves
changes in portfolio or a different business strategy. an interesting business model that conserves resources and
Figure 8 shows the observed recall for both scenarios and improves scalability. However, this should be executed in a
different values of β. The most important observations in this careful fashion to avoid user dissatisfaction that may rise from
26
the content cost itself, and the reduced access to relevant [18] B. Johansson and H. Persson. Self-organised adjustments in a market
information. with price-setting firms. In Chaos, Solitons and Fractals, 2003.
[19] M. A. Kabir, J. Han, J. Yu, and A. Colman. SCIMS: A Social Context
Future directions of research include more extensive ex- Information Management System for Socially-Aware Applications. In
perimentation (using a grid service for vast-scale experiments CAiSE, 2012.
and analytics), more detailed economic modelling (e.g., model [20] B. B. A. Kennickell and K. Moore. Recent Changes in U.S. Family
Finances: Evidence from the 2001 and 2004 Survey of Consumer
AGORA as an open system, perform monetary flow monitor- Finances, 2006.
ing/analysis, incorporate agent-style BDI modelling [13]), and [21] C. Li, M. Singh, and K. Sycara. A Dynamic Pricing Mechanism for
implementation of a prototype system. P2P Referral Systems. In AAMAS, 2004.
[22] G. Liu, H. Shen, and L. Ward. An efficient and trustworthy P2P and
ACKNOWLEDGEMENTS social network integrated file sharing system. In P2P, 2012.
[23] A. Loupasakis, N. Ntarmos, and P. Triantafillou. eXO: Decentralized
The author would like to thank the anonymous reviewers Autonomous Scalable Social Networking. In CIDR, 2011.
for their comments and suggestions on improving the quality [24] M. Mani, A.-M. Nguyen, and N. Crespi. SCOPE: A prototype for
spontaneous P2P social networking. In PerCom Workshops, 2010.
and presentation of the work.
[25] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Informa-
tion Retrieval. Cambridge University Press, 2008.
R EFERENCES [26] G. Mega, A. Montresor, and G. Picco. Efficient dissemination in
[1] S. Abbas, J. Pouwelse, D. Epema, and H. Sips. A Gossip-Based decentralized social networks. In P2P, 2011.
Distributed Social Networking System. In WETICE, 2009. [27] J. Mitchell-Wong, S. Goh, M. Chhetri, R. Kowalczyk, and Q. Vo. A
[2] M. Backes, M. Maffei, and K. Pecina. A Security API for Distributed Framework for Open, Distributed and Self-Managed Social Platforms.
Social Networks. In NDSS, 2011. In VECN, 2008.
[3] L. P. Breker. A survey of network pricing schemes. In Theoretical [28] R. Narendula, T. Papaioannou, and K. Aberer. My3: A highly-available
Computer Science, 1996. P2P-based online social network. In P2P, 2011.
[4] S. Buchegger, D. Schioberg, L.-H. Vu, and A. Datta. PeerSoN: P2P [29] C. H. Ng, K. C. Sia, and C. H. Chang. Advanced Peer Clustering and
social networking: early experiences and insights. In SNS, 2009. Firework Query Model in the Peer-to-Peer Network. In WWW, 2002.
[5] C. Chatfield. The Analysis of Time Series - An Introduction. CRC Press [30] H. Nottelmann and N. Fuhr. Evaluating Different Methods of Estimating
2004. Retrieval Quality for Resource Selection. In SIGIR, 2003.
[6] A. Crespo and H. Garcia-Molina. Routing indices for peer-to-peer [31] P. Raftopoulou and E. Petrakis. iCluster: a Self-Organising Overlay
systems. In Proceedings of 22nd IEEE International ICDCS Conference, Network for P2P Information Retrieval. In ECIR, 2008.
pages 23–32, Vienna, Austria, July 2002. [32] P. Raftopoulou, E. Petrakis, and C. Tryfonopoulos. Rewiring strategies
[7] L. A. Cutillo, R. Molva, and T. Strufe. Safebook: A privacy-preserving for semantic overlay networks. In DPD, 2009.
online social network leveraging on real-life trust. IEEE Communica- [33] P. Raftopoulou, C. Tryfonopoulos, E. Petrakis, and N. Zevlis. DS4:
tions Magazine, 2009. Introducing Semantic Friendship in Distributed Social Networks. In
[8] A. de Spindler, M. Grossniklaus, and M. C. Norrie. Development CoopIS, 2013.
Framework for Mobile Social Applications. In CAiSE, 2009. [34] R. Sharma and A. Datta. SuperNova: Super-peers based architecture for
[9] J. B. DeLong. Six Families Budget Their Money. In Lecture notes for decentralized online social networks. In COMSNETS, 2012.
American Economic History, University of California at Berkeley, 2008. [35] P. Stuedi, I. Mohomed, M. Balakrishnan, Z. Mao, V. Ramasubramanian,
[10] A. Dumrogsiri, M. Fan, A. Jain, and K. Moinzadeh. A supply D. Terry, and T. Wobber. Contrail: Enabling Decentralized Social
chain model with direct and retail channels. In European Joumal of Networks on Smartphones. In Middleware, 2011.
Operational Research, 2008. [36] F. Tegeler, D. Koll, and X. Fu. Gemstone: Empowering Decentralized
[11] M. Faig and B. Jerez. Inflation, Prices, and Information and Competitive Social Networking with High Data Availability. In GLOBECOM, 2011.
Search. Journal of Macroeconomics, 2006.
[37] B. Upadhyaya and E. Choi. Social Overlay: P2P Infrastructure for Social
[12] L. Forker and P. Stannack. Cooperation versus Competition: do buyers
Networks. In NCM, 2009.
and suppliers really see eye-to-eye? In European Journal of Purchasing
[38] T. Üstüner and D. Godes. Better Sales Networks. In Harvard Business
and Supply Management, 2000.
Review, 2006.
[13] M. Georgeff, B. Pell, M. Pollack, M. Tambe, and M. Wooldridge.
The Belief-Desire-Intention Model of Agency. In Intelligent Agents [39] H. R. Varian. Pricing Information Goods. In Research Libraries Group
V: Agents Theories, Architectures, and Languages, 1999. Symposium, Harvard Law School, 1995.
[14] K. Graffi, C. Gross, P. Mukherjee, A. Kovacevic, and R. Steinmetz. [40] S. Voulgaris, M. van Steen, and K. Iwanicki. Proactive Gossip-based
LifeSocial.KOM: A P2P-Based Platform for Secure Online Social Management of Semantic Overlay Networks. CCPE, 19(17), 2007.
Networks. In P2P, 2010. [41] M. Xue, B. Carminati, and E. Ferrari. P3D - Privacy-Preserving Path
[15] B. Greschbach, G. Kreitz, and S. Buchegger. The devil is in the metadata Discovery in Decentralized Online Social Networks. In COMPSAC,
- New privacy challenges in Decentralised Online Social Networks. In 2011.
PerCom Workshops, 2012. [42] C. Zimmer, C. Tryfonopoulos, K. Berberich, M. Koubarakis, and
[16] F. Gunther, M. Manulis, and T. Strufe. Key management in distributed G. Weikum. Approximate Information Filtering in Peer-to-Peer Net-
online social networks. In WOWMOM, 2011. works. In WISE, 2008.
[17] L. Han, M. Punceva, B. Nath, S. Muthukrishnan, and L. Iftode. So- [43] C. Zimmer, C. Tryfonopoulos, K. Berberich, G. Weikum, and
cialCDN: Caching techniques for distributed social networks. In P2P, M. Koubarakis. Node Behavior Prediction for Large-Scale Approximate
2012. Information Filtering. In LSDS-IR @ SIGIR, 2007.
27