=Paper= {{Paper |id=Vol-2145/p04 |storemode=property |title=The Effect of Introducing Content Price in Distributed Social Networks |pdfUrl=https://ceur-ws.org/Vol-2145/p04.pdf |volume=Vol-2145 |authors=Christos Tryfonopoulos }} ==The Effect of Introducing Content Price in Distributed Social Networks== https://ceur-ws.org/Vol-2145/p04.pdf
            The Effect of Introducing Content Price in
                   Distributed Social Networks
                                                      Christos Tryfonopoulos
                                           Dept. of Informatics and Telecommunications
                                                   University of the Peloponnese
                                                     GR22131, Tripoli, Greece
                                                           trifon@uop.gr


   Abstract—Over the last few years a number of distributed                content-based push technologies to cope with the information
social networks with content management capabilities have been             explosion is also stressed by the deployment of tools such
introduced both by academia and industry. However, none of                 as Google Alert and CNN alerts. In an IF scenario, a user
these efforts has so far focused on supporting both information
retrieval and filtering functionality in a distributed social net-         posts a subscription (or continuous query) to the system to
working environment. In this work we present a social network-             receive notifications whenever certain events of interest take
ing architecture that offers both functionalities –in addition to          place (e.g., when a blog post on Winter Olympics becomes
the usual social interaction tasks– in distributed social networks,        available).
outline the associated distributed protocols, and introduce a                  However, the increasing usage of social networks gave rise
novel data source selection mechanism for identifying good data
sources. This novel data source selection mechanism is designed            to skepticism about use of centralised services that are able
to take into account a combination of resource selection, predicted        to withhold all uploaded content. Such centralised social net-
publication behaviour, and content cost to improve the selection           working services are typically owned by private companies and
of information producers by users. To the best of our knowledge            users need to upload their content, thus giving away ownership
our approach, coined AGORA, is the first work to model the                 rights to make it available to others. This allows companies
price of content and to study its effect on retrieval efficiency
and effectiveness in a distributed social network setting. Finally,        to exploit user data and sell them to advertisers for profit.
our work goes beyond modelling by providing proof-of-concept               To circumvent such practises, distributed social networking
experiments with real-world corpora and social networking data.            services –building upon results from the P2P paradigm (e.g.,
                                                                           [29], [31], [40])– have been proposed both by academia and
  Index Terms—distributed social networks, content manage-                 industry in the form of distributed social platforms [34],
ment, information retrieval/filtering, content price, economic
modelling, experimental evaluation
                                                                           [14], [14], [35] and distributed social content management
                                                                           systems [23], [22], [17], [26], [33], [28]. Although all these
                                                                           approaches offer different types of distributed social networks
                       I. I NTRODUCTION
                                                                           that allow users to create communities, share content, and send
    Much information of interest to humans is available today              messages, none of them focuses on supporting information
on the Web, making it extremely difficult to stay informed                 filtering functionality in such a setup. Moreover, such dis-
without sifting through enormous amounts of information. In                tributed environments without centralised control, prove an
addition, a vast amount of this information is published and               interesting business model for pricing the content delivered
shared through social networking sites by users that participate           by an information producer. In this setup, each information
in ‘social’ activities through the generation, commenting,                 producer (e.g., a news agency, a digital library, or a prolific
tagging, liking and sharing vast amounts of digital content.               blogger) might have its own customer base of interested
As users engage increasingly more in the usage of social net-              followers/subscribers, and may charge the delivered content
works, demand for content management capabilities has forced               by subscription or per item.
social networks to go beyond the usual social interactions                     In this work, we present AGORA, a distributed social
(e.g., like, post, or poke) and offer basic content management             networking architecture that allows users to share, search for
functionality. To this end, many social networks adopt the tra-            (IR), and subscribe to (IF) content in a fully decentralised
ditional approach of content search (information retrieval - IR)           way, while at the same time maintaining ownership of their
[23], [22], [17], [26], [33], [28]. All these approaches however           content. Such a design is ideal for creating social content
ignore that the information filtering paradigm may also provide            marketplaces1 , where users are able to price and distribute
a good alternative of keeping the users informed and at the                their content while at the same time they maintain ownership
same time avoiding the information avalanche. In information               rights. Content in our setup can be textual, such as status
filtering (IF) –also referred to as publish/subscribe, continuous          updates/blog posts/tweets, or multimedia, such as photos or
querying, information push, or information dissemination–                    1 AGORA is the Greek word for marketplace
users are able to subscribe to information sources and be
notified when content of interest is published. This need for              Copyright held by the author(s)




                                                                      19
                                                                        while Section V experimentally demonstrates the effect of
                                                                        cost in retrieval and filtering quality and compares the system
                                                                        effectiveness with/without monetary flow. Finally, Section VI
                                                                        discusses high-level conclusions, open issues, and possible
                                                                        extensions.

                                                                                             II. R ELATED W ORK
                                                                          In this section, we discuss related work in the context of
                                                                        systems that are designed with content management in mind,
                                                                        and platforms that provide distributed social networks.

                                                                        A. Distributed Social Platforms
                                                                           All available social networks (e.g., Facebook, LinkedIn,
          Fig. 1. High-level view of the AGORA architecture.            Elgg) are currently based on centralised solutions both for
                                                                        storing and managing of content, which set scalability lim-
                                                                        itations on the system and reduce fault-tolerance. Industry
videos, appropriately tagged. Our proposed solution offers              has already detected these drawbacks and has lately turned
fundamental social interactions, while emphasising on content           into solutions that diverge from the centralised model of the
management (IR and IF) and economic aspects such as the                 existing systems by developing platforms, such as Diaspora,
price/quality tradeoff of content. To the best of our knowledge,        KrawlerX and OpenSocial, that provide APIs to support ap-
this is the first approach that aims at combining IR and IF             plication hosting in remote application servers, owned and
in a social networking context, while taking into account               managed by the application providers. In a similar spirit,
aspects of economic modelling. In the light of the above, the           a strand of research work also moved towards hierarchical
contributions presented in this work are the following:                 organisations for supporting distributed social networking. The
  • We propose a social networking architecture that offers             distributed social systems SuperNova [34] and Scope [24] are
    content management functionality in terms of IR and IF,             based on a two-tier architecture, where nodes with higher
    in addition to the usual social interaction tasks typically         computing capability become super-nodes and form an overlay
    supported in such scenarios. This is the first approach in          to provide distributed data management of the P2P social
    the literature to offer both functionalities.                       network. Client nodes connect to super-nodes and rely on
 • We present the distributed protocols and services that reg-          them for bootstrapping, sharing their content, and accessing
   ulate node interactions, provide details on the distributed          the shared information. Although all of the platforms and
   IR and IF, and outline the different friendship facets               system schemes propose the decentralisation of the social
   introduced in the architecture.                                      services, one of the main issues of the centralised architectures
                                                                        persists: the existence of a single point where user information
 • We devise a novel method to rank information producers
                                                                        is collected and may be exploited.
   according to (i) the content already published, (ii) the
                                                                           To alleviate the above disadvantage, distributed platforms
   expected future publishing behaviour, and (iii) the price
                                                                        for social online networks based on the P2P paradigm were
   announced by the information producer. This method
                                                                        proposed [27], [1], [14]. LifeSocial.KOM [14] is a plugin-
   allows us to achieve high recall with low cost (by searching
                                                                        based extendible social platform that provides secure com-
   at/subscribing to a small number of information produc-
                                                                        munication and user-based data access control, and integrates
   ers).
                                                                        a monitoring component that allows users and operators to
  • We study the effect of content price in our setup and               observe the quality of the distributed system. Similar efforts
    experimentally demonstrate that it is a key element on the          aimed at spontaneous social networking; they include propos-
    delivered content quality. Our modelling utilises concepts          als for distributed social services in resource constrained de-
    such as correlation between the quality/expertise of the            vices (like tablets or smartphones) [35], [8], or in environments
    information producer, demand-driven price rates, and cost           with no infrastructure guarantees (e.g., high-attendance events)
    of resources.                                                       [24]. All these approaches offer different types of distributed
   Figure 1 shows a high-level view of the envisioned dis-              social platforms that allow users to create communities, share
tributed architecture with different types of information pro-          content, and send messages, but do not emphasise expressive
ducers and users with varying information needs (IR, IF or              content search mechanisms.
both).                                                                     Finally, other approaches in distributed social networking
   The rest of the paper is organised as follows. Section II            emphasise on delivering innovative and competitive services;
overviews related research, while Section III discusses the             SCIMS [19] relies on an ontology-based model for managing
AGORA architecture and the associated distributed protocols.            social relationships and status, the work in [37] aims at per-
Subsequently, Section IV highlights the economic and qualita-           sonalising search results based on user context and friendship
tive aspects of AGORA and presents the price/quality tradeoff,          relations, while Gemstone [36] targets data availability in the



                                                                   20
absence of the data owner. To achieve this, a replica storage
scheme based on social relationships, online patterns of nodes,
and user experiences is utilised.

B. Distributed Social Data Management
   Our work fits mainly into the area of content management
in distributed social networks and is inspired by previous
approaches on Semantic Overlay Networks (SONs) [29], [31],
[40] and based on works that emphasise on distributed content
location in social networks. Works like the eXO [23] and                             Fig. 2. The thematic and friend index of a node.
SoNet [22] systems are, similarly to AGORA, inspired by the
P2P paradigm to provide content location and management
services on large-scale decentralised social networks. To do              structures, and effective object location mechanisms. Contrary,
so, the authors rely on a structured overlay and exploit the              DHT-based architectures [23], [28] ignore node autonomy (by
accurate location mechanisms, but de-emphasise node auton-                enforcing deterministic key/content placement) and emergent
omy. Contrary to these approaches, AGORA employs a loose                  structures (by enforcing network structure).
component architecture and introduces a new type of social                   Finally, a large number of research in the domain of dis-
relations between nodes: the semantic closeness of content.               tributed social networks consists of studies on system security
In this way, nodes that are similar in terms of content, create           [28], [7], user privacy [28], [7], [16], [15], [41], distributed
emergent groups likewise to the creation of social relations.             access control [2], [4], and authentication mechanisms [2].
Our work shares ideas with the SocialCDN system [17], where               Clearly, security issues are also relevant in our design and the
social caches (links among friends) are introduced as a way to            AGORA system could benefit by adopting approaches like [2]
alleviate the network traffic and optimise data dissemination             or [15] that enforce user privacy and access control. However,
(mainly by social updates). In [17], social cache selection               the problem of security is orthogonal to our design and is not
is formulated as the neighbour-dominating set problem and                 further analysed as it is not the emphasis of this work.
a family of algorithms is proposed and evaluated. Contrary                                   III. S YSTEM OVERVIEW
to AGORA, where the emphasis is on efficiently supporting
                                                                            In this section, we provide an overview of the architectural
expressive content retrieval in the social paradigm, the em-
                                                                          components of the proposed system and provide the distributed
phasis on SocialCDN is on the reduction on network traffic to
                                                                          protocols executed locally by users and information producers.
facilitate fundamental social interactions.
   The loose component architecture and the emphasis on node              A. Architecture
autonomy of [26] resemble the architectural design of AGORA,                 We consider a distributed social network, where each user,
where an unstructured overlay network of nodes is utilised                characterised by its interests, is connected to friends and
to support the distributed social infrastructure. However, the            information producers with similar interests. The interests of a
focus of [26] is on the design of gossip protocols for efficiently        user are identified automatically, i.e., by applying clustering on
disseminating profile updates to all interested users and does            its local content repository. The network nodes use a heartbeat
not put any attention to the problem of content search and                protocol that runs continuously and aims at identifying new
management.                                                               information producers based on the likelihood to have similar
   Furthermore, the concept of creating and maintaining social            interests with the node at hand. Each user maintains two
connections in distributed infrastructures is affined with the            routing indices holding information for social friends and
problem of distributed data management in P2P networks. In                thematic friends. Social friend links correspond to the social
SONs (e.g., [40], [6]), “social” connections between the peers            relationship aspect of the network, while thematic friend links
(e.g., similarity of content, pattern, or distance in a physical          are of two types: short-range links (i.e., links to information
level) are exploited to direct the search to nodes with relevant          producers with similar interests) and long-range links (i.e.,
data (e.g., as in [32] that studies query routing strategies based        links to information producers with dissimilar interests to
on “social” relationships). Other works on SONs (e.g., [31])              maintain connectivity between different clusters in the system).
focus more on the organisation of P2P networks as small-                  The reorganisation procedure is executed locally by each
world networks, where peers self-organise in groups of similar            node and aims at bringing together users and information
interests to facilitate message-efficient query answering. Our            producers with similar interests that are likely to share/search
work on AGORA borrows concepts and ideas from research on                 for/subscribe to content.
SONs and extends them for facilitating efficient and effective
data management in a social network setting. We suggest that              B. Joining Agora
SONs offer the most promising architectural solution inspired               When a user node connects to the network, its interests are
from the P2P paradigm; it is a perfect fit for a distributed              automatically derived by its local content. For each interest,
social networking scenario providing high decentralisation,               the node maintains a thematic index (T) containing the contact
high node autonomy, support for emergent semantic and social              details and interest descriptions of information producers.



                                                                     21
These links form the thematic neighbourhood of the node;                  the literature as the fireworks technique [29]. All the nodes
the links contained in T are refined accordingly by using                 receiving the message reduce TTL by one and apply the
the rewiring service described below. Furthermore, each node              same forwarding technique; the message is not forwarded
maintains a friend index (F) containing the contact details               further in the network when TTL reaches zero. Additionally
and interest descriptions of the social neighbourhood of the              to forwarding, every node receiving a message compares the
node, comprised of explicitly declared friends in the network.            subscription against its identified interests and, if similar,
Figure 2 shows the F and T for an arbitrary node.                         stores it in its local continuous query data structures to match
                                                                          it against future publications. Content is kept locally at each
C. Locating information producers                                         information producers in the spirit of [42]. This lack of publi-
   This service is applied to locate new information producers            cation dissemination is a design decision that offers increased
by establishing new connections and discarding old ones.                  scalability (by trading recall) and complies with the content
Each node may initiate this procedure by computing a scoring              marketplace paradigm we aim for. Thus, only the nodes
function that combines the quality and price of all information           indexing a subscription can notify the interested user, although
producers in its thematic index (T). If the computed score is             other information producers may also publish relevant content.
greater than a threshold then the node does not need to take              When an information producer wants to publish new content
any further action, since it is already aware of information              to the network, it matches it against its local continuous query
producers that match its needs and budget. Otherwise, the                 database to decide which continuous queries match it and thus,
node initiates a process to identify new information producers            which user node should be notified. Then, the information
by forwarding a message in the network –bounded by a                      producers delivers a notification for each continuous query by
time-to-live (TTL) mechanism– using the thematic and social               sending to the query initiator a pointer to the matching content;
connections and collecting the interests of other information             if the user is not online, the provider stores the message
producers.                                                                and delivers it upon user reconnection. Notice that since the
   The issued message is forwarded with equal probability                 information producer maintains the content ownership, it is
to (i) a number of randomly chosen entries contained in a                 now able to charge the user at the announced content price.
node’s T, (ii) a number of randomly chosen entries contained
in a node’s F, or (iii) the most similar nodes to the message             E. Searching for content at information producers
initiator, found in either T or F. The rationale of applying                 A user issuing content search forwards a message in the
either of the forwarding strategies is that the message initiator         network following a mechanism similar to the one described in
should be able to reach information producers both directly               the previous section. Additionally to query forwarding, every
(through other similar nodes), but also indirectly (through               information producer receiving a query message compares
propagation of the message through non-similar nodes). Each               it against the identified interests and, if similar, matches it
node that receives the message adds its interest in it, reduces           against the locally stored content. Subsequently, pointers to the
TTL by one, and forwards it in the same manner. When the                  matching content are sent to the query initiator (along with the
TTL of the message reaches zero, the message containing                   announced price for the content). Subsequently answers from
the contact info and interests of all information producers               all information producers are ranked by a combination of price
that received the message is sent back to its initiator. To               and similarity to the query (discussed in the next section) and
speed up the process, every intermediate node receiving the               the list is presented to the user.
message may utilise the information in it to refine its thematic
connections.                                                                      IV. R ANKING OF INFORMATION PRODUCERS
                                                                            To select which information producer will be monitored,
D. Subscribing for content at information producers                       the protocols described in the previous section use a scoring
   Queries/subscriptions are issued as free text or keywords              function to rank information producers. In this section, we
under the Vector Space Model and are formulated as term                   quantify these concepts and give the rationale between our
vectors. The user subscribing for specific content forwards               choices.
a message in the network with a TTL using both its social
and thematic connections. The issued message is forwarded                 A. Quality vs Price
both to (i) friends that have interests similar to the query and             The ranking strategy for the information producers is a
are contained in F and (ii) a small number of information                 critical component since it allows users to locate relevant infor-
producers contained in the T chosen as described below.                   mation and information producers to maximise their revenue.
Initially, the message initiator compares the user subscription           Empirical studies have shown that price and quality are the
against its interests and, if similar, the message is forwarded to        two key determinants of the consumer’s choice to buy or not
all of its short-range links, i.e., the message is broadcasted to         a product [10]. Thus, to make an informed selection on the
the node’s neighbourhood (explosion). Otherwise, the message              information producers, a user has to rank them based on a
is forwarded to a small fixed number of nodes that have the               blend (denoted by tunable parameter β) of content quality and
highest similarity to the subscription (fixed forwarding). The            content price both in a retrieval and a filtering scenario. This
combination of the two routing strategies is referred to in               combination describes the benefit/cost ratio for the content and



                                                                     22
allows the user to assign a score to every information producer.           new mutually beneficial connections. This is in line with the
To this end, the score for each information producer is given              notion of friendship connections in any social network.
by:                                                                          The main goal of this modelling is to study the influence of
   score(p, i) = β · price(p, i) − (1 − β) · quality(p, i) (1)             the cost component on the quality of received content, study
                                                                           the interactions between users and information producers, and
   Here, p denotes an information producer, q denotes an                   gain insights about the overall behaviour of this prototype
information need (in the form of a query or a subscription),               content market.
price(p, i) refers to the announced price an information pro-
ducer provides for the provided content, and quality(p, i)                 C. Content quality specifics
denotes how relevant p is to i. Information producers compute
the price based on the demand and according to popularity of                  The quality of the content is a difficult thing to model
themselves and that of their content. Price and quality have               as it cannot be known prior to acquiring it. Hence the best
the same range of values to allow for their combination, while             option is to assess the quality of the information producer.
price is typically recomputed whenever the popularity of the               To do so, one has to take into account the dual capacity of
information producer changes. In Section V we study the price              information producers in AGORA: to answer one-time content
in different scenarios, and show the effect on the quality of              requests (IR) and satisfy long-term information needs (IF).
retrieved content (i.e., recall) when the price choice is (i)              To this end, the quality of an information producer has to be
random, (ii) strongly correlated with quality, and (iii) partially         based on (i) the quality of already published content (since
correlated with quality.                                                   this depicts its ability to satisfy IR tasks from users) and (ii) a
                                                                           predicted quality of the content to be published in the future
B. Content price specifics                                                 (since this depicts its ability to satisfy IF tasks from users).
                                                                           The necessity of both facets is better illustrated in the case of
   In this section, we analyse the economic modelling of
                                                                           an information producer that provides content in the form of
AGORA and review the basic assumptions and expectations
                                                                           technical articles and news items; articles have a long shelf-
from such a modelling. Usefulness of the information goods
                                                                           life and are good candidates for recurrent sales, while news
received by a subscriber is a qualitative criterion, that is diffi-
                                                                           items have an extremely short shelf-life and after some time
cult to model. In AGORA, we model usefulness by matching
                                                                           users loose interest in them.
interests, i.e., by assuming that all received content relevant
to the information need are useful to the subscriber, and do                  1. Quality of published content. The quality of the already
not discuss issues such as novelty, coverage of the field, or              published content is known as the resource selection problem
user effort. In our modelling, after a user acquires a history             in the areas of information retrieval and databases. Hence,
of transactions with certain information producers, develops               a number of standard resource selection algorithms such as
an affect for some of them. Affect can be modelled in various              tf-idf based methods, CORI, or language models (see [30]
ways, depending on the task at hand, and can be either positive            for a nice overview) have been proposed. We use a standard
or negative [11]. In AGORA, a user does not a priori know the              tf-idf based component that combines accuracy of modelling
quality of the information goods, but uses the affect developed            and ease of implementation. Thus, in our setup, quality of
from previous transactions to approximate it. Subsequently, he             published content is given by:
compares the values of information quality to the expected                            X
                                                                                                                              max
                                                                                                                                  
                                                                                            [0.5 · log (dfp,t ) + 0.5 · log tfp,t  ]       (2)
values and update its affect [21].                                                     t∈i
   The costs in AGORA are results of actions [18], such as
transactions, network communication, and use of common                        Our approach uses the standard IR constructs of document
infrastructure. Information producers try to maximise their                frequency (df ) and maximum term frequency (tf max ) [25],
revenue while minimising expenses that occur due to their                  while a balanced combination of these two metrics (0.5 in the
actions. Contrary users try to maximise the utility of the                 above equation) is used to equally emphasise importance of
received content, while minimising expenses that occur due to              df and tf max according to [30], [43].
actions. In general, the content market in AGORA is not a pure                2. Predicted quality of content to be published. To pre-
competitive market in the sense of e.g., [39], since users do              dict the quality of the content to be published, we model
not know in advance the exact content quality they are buying.             IR statistics per information producer as time-series data
AGORA resembles the modelling of a team of sales people                    and use statistical analysis tools [5] to predict future values
[38], where stakeholders try to collaborate with others in order           based on past observations. Such techniques take into account
to get their expertise for a (cross/up) sale. After deciding who           assumptions about the internal structure of the time series
to collaborate with, it is possible to model the gap between               like trends and seasonality and tend to emphasise recent
the initial expectations and the actual actions. In [12], it is            observations. Since seasonality requires long-term statistics
shown that this gap is smaller in a competitive relationship               that are infeasible to maintain (e.g., several years of data to
compared to that of a cooperative relationship. As in many                 observe seasonality in the publication of Christmas content),
cooperative environments each stakeholder usually retains its              we resort to double exponential smoothing techniques [5] as
connections with the others, while also being free to explore              our prediction mechanism, since it requires a small amount of



                                                                      23
data, emphasises recent over older data observations, and al-                  producers that were chosen according to his own information
lows for trend identification (useful for news items or trending               needs and ranked using the formulas discussed in Section IV.
topics). Thus, in our setup, predicted quality of content to be                For deciding the per user budget, we relied on studies about
published is given by:                                                         budget distribution and spending for a variety of cases, ranging
           X
                         ∗                                                     from family budgets to consumer budgets [20], [9]. The main
                            ) + log δ(cs∗p ) + 1 + 1
                                                    
               log δ(dfp,t                                   (3)               conclusions drawn from these studies are that (i) budget
            t∈i
                                                                               distribution follows a power law, with a small percentage
                                             ∗
   In the formula above, function δ(dfp,t      ) stands for the differ-        of families/consumers having a high (yearly) budget, and a
ence between the predicted and the last document frequency                     large percentage of the families being in the (long) tail of
(df ) observed, and δ(cs∗ ) is the difference in the collection                the distribution, with a low budget, and (ii) the percentage
size of information producer p reflecting its overall expected                 of the income spend on content does not vary with the
future publishing activity. These values are calculated for all                budget. According to [3] and the above remarks, we divided
terms t contained in the user subscription. In this way we                     the users into three classes: low, average and high budget
model two aspects of the information producer’s behaviour:                     users with a population of 600K (or 60% of total users),
(i) its potential to publish relevant documents in the future                  300K (or 30% of total users) and 100K (or 10% of total
                      ∗
(reflected by δ(dfp,t   )) and (ii) its overall expected future                users) respectively. Subsequently, we experimentally computed
publishing activity (reflected by δ(cs∗ )). Notice also that,                  a budget that would allow the users to subscribe to the top-100
in the above formula, the publication of relevant documents                    (i.e., 10%) information producers, and allowed the low budget
            ∗
(i.e., δ(dfp,t )) is more emphasised than the publication rate                 users to have 60%, the medium budget users to have 80% and
           ∗
(delta(cs )) due to the nesting of the log functions. The                      the high budget users to have 120% of this ideal budget.
addition of 1 in the log functions is used to yield positive                      To measure the effect of content cost in quality (with and
predictions and avoid log(0).                                                  without monetary flow) we utilise the following metrics:
   3. Putting it all together. Given Formulas 2 and 3, the                        • Messages. We measure the message traffic per action in
overall quality of an information producer p with respect to                          AGORA.
an information need  X i is given by:                                             • Recall. We measure average recall over all rounds by
                                                             max
                                                                 
  quality(p, i) =       [(0.5 · log (dfp,t ) + 0.5 · log tfp,t    )+                  computing the ratio of the total number of retrieved
                     t∈i                                                              content to the total number of relevant content for all
                                   ∗
                                       ) + log δ(cs∗p ) + 1 + 1 ]
                                                              
                           log δ(dfp,t                                                users.
                                                                 (4)              • Ranking. We use an extension of Spearman’s footrule
                                                                                      distance to compare rankings of information producers
                  V. E XPERIMENTAL E VALUATION                                        calculated by users. This metric allows us to compare two
  In this section, we present our findings from the introduction                      different information producer rankings by calculating the
of content cost, and how it affects the effectiveness of the                          distance between the elements in two ranking lists. In our
system. We study the behaviour of AGORA using different                               implementation if an element from list A is not present
scenarios, while varying the correlation between price, quality                       in list B, it is considered as being in the last available
and customer demand.                                                                  position in B.
                                                                                  Finally, to assist the reproducibility of our results we plan
A. Experimental Setup and Metrics
                                                                               to publicly release our code in an appropriate repository after
   We used a real-life document collection containing 2M doc-                  the publication of the work.
uments with a vocabulary of about 600K terms from a focused
Web crawl. The documents were categorised in ten different                     B. Varying the price-quality correlation
categories by content, with the smallest category having 67K                      In this experiment, we aimed at observing the behaviour of
documents and the largest one of 325K. In all experiments,                     AGORA when varying the correlation 0 ≤ κ ≤ 1 between the
the network consists of 1, 000 information producers and 1M                    price and the quality of an information producer; k = 1 means
users. Each information producer started with a database of                    that the price an information producer charges for content
300 documents initialised with 15% random category, 10%                        is fully correlated to the quality of that producer. Notice
non-categorised, and 75% single category documents, resulting                  that quality in AGORA is easily calculated/forecasted using
in 100 specialised information producers for each category. We                 Equation 4 and in this case users know that the content from
artificially constructed information needs for the users (in the               this information producer will be expensive but relevant (i.e.,
form of queries and subscriptions with multiple terms) using                   useful to them). The other extreme is when k = 0, where
the document corpus and by selecting terms that are strong                     prices have no correlation to the quality of the information
representatives of a document category (i.e., a frequent term                  producer and are chosen randomly. For the rest of the cases,
in documents of one category and infrequent in documents of                    i.e., 0 < κ < 1, the correlation is modelled as the likelihood
the other categories). The simulation was done in rounds.                      that an information producer sells κ% of content underpriced
   We introduced budget constraints on a per user basis; thus,                 (up to 20% of the initial value), and 1 − κ overpriced (up to
each user was able to follow only a few selected information                   20% of the initial value).



                                                                          24
                                                                                                                               40
                                                                       κ = 0%                                                                                                                   κ = 0%
                                                                      κ = 25%                                                                                                                  κ = 50%
            0.5                                                       κ = 50%                                                                                                                  κ = 75%
                                                                      κ = 75%                                                  35                                                             κ = 100%


                                                                                                                               30




                                                                                         Extended Spearman’s footrule metric
            0.4


                                                                                                                               25

            0.3
   recall




                                                                                                                               20



            0.2                                                                                                                15


                                                                                                                               10

            0.1
                                                                                                                                5


             0                                                                                                                  0
                  0         0.2         0.4         0.6         0.8        1                                                              0.2             0.4         0.6         0.8          1.0
                                              β                                                                                                                        β



                            Fig. 3. Recall against β.                                                                          Fig. 5. Difference in information producer ranking against β.



   Figure 3 presents the achieved recall for different values                         different values of κ is an effect of the modelling of AGORA as
of β and κ, where we notice that the introduction of price                            a closed system where monetary flow is limited by the budgets
for content reduces the observed recall (notice that recall has                       of the users and no new wealth is produced.
the highest values for β = 0, i.e., no pricing is involved in                            Figure 5 shows how differently users rank information
information producer ranking). This is an important result,                           producers for varying values of β and κ. The difference in
showing that when information producers charge for content,                           the ranking of information producers is measured using an
consumers trade quality for cheaper options. Notice also that                         extension of Spearman’s footrule metric. To produce a point
for every κ value tested recall is retained high, as long as it                       in the graph we compare the ranked lists of information
plays the most important role in the ranking (β < 50%). This                          producers for each pair of users for and each value of β in
was also expected, since when price is of importance, con-                            the x-axis and average the results. Notice that the case of
sumers will choose cheaper information producers, leading to                          β = 0 is omitted as the extended Spearman’s footrule metric
a reduction in the observed recall. Additionally, when the price                      is always 0 since the lists compared are identical (users take
is the only ranking criterion for users (β = 1), recall is close to                   into account only the quality to rank information producers).
that of a random choice of information producers (remember                            When β increases (i.e., price becomes more important in
that users monitor only 10% of information producers in the                           the ranking process) Spearman’s metric increases too, as
system).                                                                              information producers with high quality get lower positions in
                                                                                      the ranking, while information producers with lower quality
            0.6
                                                                       β = 0.2
                                                                                      (but cheaper) are ranked high. Finally, when the price of
                                                                       β = 0.5
                                                                       β = 0.8        an information producer is not associated with its quality
            0.5                                                                       (random price setup or κ = 100%), there are big differences in
                                                                                      the ranking of information producers (especially in the cases
            0.4                                                                       where price matters more – the leftmost points in the graph)
                                                                                      due to the introduced randomness.
   recall




            0.3

                                                                                                                               20
                                                                                                                                                                                                κ = 0%
                                                                                                                                                                                               κ = 25%
            0.2                                                                                                                                                                                κ = 50%
                                                                                                                                                                                               κ = 75%


            0.1                                                                                                                15
                                                                                          # of messages/document




             0
                      0.2         0.4         0.6         0.8          1
                                               κ                                                                               10



                            Fig. 4. Recall against κ.

                                                                                                                                5
   Figure 4 shows how recall is affected for three different val-
ues of β, when κ increases. When one of the two components
becomes dominant in the ranking function, it outweighs the                                                                      0
                                                                                                                                        0.0         0.2         0.4         0.6         0.8      1.0
effect of the other. This is in line with our expectations, since                                                                                                     β

price dominates the ranking function and quality is sacrificed
to reduce costs. Finally, notice that consistent recall between                                                                                 Fig. 6. Message traffic against β.




                                                                                 25
                          3.5                                                                        0.5
                                                              heartbeat messages                                                            Change in Publishing Behaviour
                                                                  other messages                                                            Consistent Publishing Behaviour
                                                                                                    0.45
                           3


                                                                                                     0.4
                          2.5
   # of messages (x10K)




                                                                                                    0.35
                           2




                                                                                           recall
                                                                                                     0.3

                          1.5
                                                                                                    0.25

                           1
                                                                                                     0.2


                          0.5
                                                                                                    0.15


                           0                                                                         0.1
                                   0              50             100                                             0       0.2     0.4       0.6          0.8           1
                                                  κ                                                                                    β



                                Fig. 7. Message traffic against κ.                                          Fig. 8. Recall for different scenarios against β.



C. System Performance                                                                   graph are (i) the drop in recall for both scenarios as β increases
   In the first series of experiments we targeted the perfor-                           and (ii) the higher recall values for the case of topic speciali-
mance of AGORA in terms of message traffic. Figure 6 shows                              sation. The reason for the first observation is the shift of users
that the message traffic per user incurred in AGORA is reducing                         to cheaper information producers due to the importance of
as β increases. This happens because users utilise the price                            content price in the ranking function. Additionally, the reason
component to rank information producers, thus choosing those                            for the second observation is that the build up of expertise by
of lower price and poor quality. This reduces the overall                               the information producers reflects on the higher quality values,
message number as less content is exchanged in the system                               which in turn leads to ranking quality information producers
due to users subscribing to non-expert information producers;                           higher. Contrary, when information producers shift their topic,
our observations here are consistent with those in Section V-B                          users are not able to correlate price and quality and thus,
that correlate recall and β.                                                            select information producers of poor quality (hence the drop
   Figure 7 demonstrates the total amount of traffic observed in                        in recall).
AGORA and how this traffic is split in the various message cat-
                                                                                                           VI. C ONCLUSIONS AND F UTURE W ORK
egories, as the price-quality correlation is varied. As expected
the heart-beat protocol messages dominate the messaging load,                              In this work, we proposed AGORA, a social networking
as necessary messages with information producer statistics                              architecture and the related distributed protocols to facilitate
and prices are disseminated in the system. Notice also that                             distributed content management in the form of information
these messages are not affected by price-quality correlation,                           retrieval and information filtering. In addition, we introduced
since the information producers have to update their publica-                           a novel selection mechanism for information producers that
tion statistics and prices, regardless of their customer base.                          allows users to rank them according to (i) their expertise,
Finally, notice that the number of messages for querying                                (ii) their predicted future content publications, and (iii) the
for/subscribing to/receiving content are affected as κ increases                        price of content they deliver. The experimental evaluation of
since information producers that are of high quality widen their                        our proposal demonstrated the behaviour of such a content
customer base with more users.                                                          marketplace in terms of price/quality tradeoffs, recall and
                                                                                        message traffic. To the best of our knowledge, these are
D. Varying the behaviour of information producers                                       the first results that connect recall and message traffic in
   In this section we look into recall and how this is affected by                      a distributed social network with the content cost, and put
two very different information producer behaviours: topic spe-                          economic modelling at the heart of system design. The most
cialisation and topic shift. In topic specialisation, information                       important outcome of our study is that price should participate
producers disregard market conditions and maintain their topic                          in the ranking of information producers less than a quarter
specialisation even if this results in lower revenues. Contrary in                      of the total score to avoid the delivery of irrelevant content
topic shift, information producers may alter their specialisation                       to users. We also showed that the introduction of content
topic over time based on changes in user demand, revenue and                            price improves system scalability by reducing message traffic
market conditions. In the latter case, an information producer                          and imposing a reasonable use of resources to stakeholders.
initially publishes content from one category, and some rounds                          Overall, introducing a monetary value for the production and
later may decide to switch to a different category to simulate                          dissemination of content in a distributed social network proves
changes in portfolio or a different business strategy.                                  an interesting business model that conserves resources and
   Figure 8 shows the observed recall for both scenarios and                            improves scalability. However, this should be executed in a
different values of β. The most important observations in this                          careful fashion to avoid user dissatisfaction that may rise from



                                                                                   26
the content cost itself, and the reduced access to relevant                          [18] B. Johansson and H. Persson. Self-organised adjustments in a market
information.                                                                              with price-setting firms. In Chaos, Solitons and Fractals, 2003.
                                                                                     [19] M. A. Kabir, J. Han, J. Yu, and A. Colman. SCIMS: A Social Context
   Future directions of research include more extensive ex-                               Information Management System for Socially-Aware Applications. In
perimentation (using a grid service for vast-scale experiments                            CAiSE, 2012.
and analytics), more detailed economic modelling (e.g., model                        [20] B. B. A. Kennickell and K. Moore. Recent Changes in U.S. Family
                                                                                          Finances: Evidence from the 2001 and 2004 Survey of Consumer
AGORA as an open system, perform monetary flow monitor-                                   Finances, 2006.
ing/analysis, incorporate agent-style BDI modelling [13]), and                       [21] C. Li, M. Singh, and K. Sycara. A Dynamic Pricing Mechanism for
implementation of a prototype system.                                                     P2P Referral Systems. In AAMAS, 2004.
                                                                                     [22] G. Liu, H. Shen, and L. Ward. An efficient and trustworthy P2P and
                       ACKNOWLEDGEMENTS                                                   social network integrated file sharing system. In P2P, 2012.
                                                                                     [23] A. Loupasakis, N. Ntarmos, and P. Triantafillou. eXO: Decentralized
   The author would like to thank the anonymous reviewers                                 Autonomous Scalable Social Networking. In CIDR, 2011.
for their comments and suggestions on improving the quality                          [24] M. Mani, A.-M. Nguyen, and N. Crespi. SCOPE: A prototype for
                                                                                          spontaneous P2P social networking. In PerCom Workshops, 2010.
and presentation of the work.
                                                                                     [25] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Informa-
                                                                                          tion Retrieval. Cambridge University Press, 2008.
                             R EFERENCES                                             [26] G. Mega, A. Montresor, and G. Picco. Efficient dissemination in
 [1] S. Abbas, J. Pouwelse, D. Epema, and H. Sips. A Gossip-Based                         decentralized social networks. In P2P, 2011.
     Distributed Social Networking System. In WETICE, 2009.                          [27] J. Mitchell-Wong, S. Goh, M. Chhetri, R. Kowalczyk, and Q. Vo. A
 [2] M. Backes, M. Maffei, and K. Pecina. A Security API for Distributed                  Framework for Open, Distributed and Self-Managed Social Platforms.
     Social Networks. In NDSS, 2011.                                                      In VECN, 2008.
 [3] L. P. Breker. A survey of network pricing schemes. In Theoretical               [28] R. Narendula, T. Papaioannou, and K. Aberer. My3: A highly-available
     Computer Science, 1996.                                                              P2P-based online social network. In P2P, 2011.
 [4] S. Buchegger, D. Schioberg, L.-H. Vu, and A. Datta. PeerSoN: P2P                [29] C. H. Ng, K. C. Sia, and C. H. Chang. Advanced Peer Clustering and
     social networking: early experiences and insights. In SNS, 2009.                     Firework Query Model in the Peer-to-Peer Network. In WWW, 2002.
 [5] C. Chatfield. The Analysis of Time Series - An Introduction. CRC Press          [30] H. Nottelmann and N. Fuhr. Evaluating Different Methods of Estimating
     2004.                                                                                Retrieval Quality for Resource Selection. In SIGIR, 2003.
 [6] A. Crespo and H. Garcia-Molina. Routing indices for peer-to-peer                [31] P. Raftopoulou and E. Petrakis. iCluster: a Self-Organising Overlay
     systems. In Proceedings of 22nd IEEE International ICDCS Conference,                 Network for P2P Information Retrieval. In ECIR, 2008.
     pages 23–32, Vienna, Austria, July 2002.                                        [32] P. Raftopoulou, E. Petrakis, and C. Tryfonopoulos. Rewiring strategies
 [7] L. A. Cutillo, R. Molva, and T. Strufe. Safebook: A privacy-preserving               for semantic overlay networks. In DPD, 2009.
     online social network leveraging on real-life trust. IEEE Communica-            [33] P. Raftopoulou, C. Tryfonopoulos, E. Petrakis, and N. Zevlis. DS4:
     tions Magazine, 2009.                                                                Introducing Semantic Friendship in Distributed Social Networks. In
 [8] A. de Spindler, M. Grossniklaus, and M. C. Norrie. Development                       CoopIS, 2013.
     Framework for Mobile Social Applications. In CAiSE, 2009.                       [34] R. Sharma and A. Datta. SuperNova: Super-peers based architecture for
 [9] J. B. DeLong. Six Families Budget Their Money. In Lecture notes for                  decentralized online social networks. In COMSNETS, 2012.
     American Economic History, University of California at Berkeley, 2008.          [35] P. Stuedi, I. Mohomed, M. Balakrishnan, Z. Mao, V. Ramasubramanian,
[10] A. Dumrogsiri, M. Fan, A. Jain, and K. Moinzadeh. A supply                           D. Terry, and T. Wobber. Contrail: Enabling Decentralized Social
     chain model with direct and retail channels. In European Joumal of                   Networks on Smartphones. In Middleware, 2011.
     Operational Research, 2008.                                                     [36] F. Tegeler, D. Koll, and X. Fu. Gemstone: Empowering Decentralized
[11] M. Faig and B. Jerez. Inflation, Prices, and Information and Competitive             Social Networking with High Data Availability. In GLOBECOM, 2011.
     Search. Journal of Macroeconomics, 2006.
                                                                                     [37] B. Upadhyaya and E. Choi. Social Overlay: P2P Infrastructure for Social
[12] L. Forker and P. Stannack. Cooperation versus Competition: do buyers
                                                                                          Networks. In NCM, 2009.
     and suppliers really see eye-to-eye? In European Journal of Purchasing
                                                                                     [38] T. Üstüner and D. Godes. Better Sales Networks. In Harvard Business
     and Supply Management, 2000.
                                                                                          Review, 2006.
[13] M. Georgeff, B. Pell, M. Pollack, M. Tambe, and M. Wooldridge.
     The Belief-Desire-Intention Model of Agency. In Intelligent Agents              [39] H. R. Varian. Pricing Information Goods. In Research Libraries Group
     V: Agents Theories, Architectures, and Languages, 1999.                              Symposium, Harvard Law School, 1995.
[14] K. Graffi, C. Gross, P. Mukherjee, A. Kovacevic, and R. Steinmetz.              [40] S. Voulgaris, M. van Steen, and K. Iwanicki. Proactive Gossip-based
     LifeSocial.KOM: A P2P-Based Platform for Secure Online Social                        Management of Semantic Overlay Networks. CCPE, 19(17), 2007.
     Networks. In P2P, 2010.                                                         [41] M. Xue, B. Carminati, and E. Ferrari. P3D - Privacy-Preserving Path
[15] B. Greschbach, G. Kreitz, and S. Buchegger. The devil is in the metadata             Discovery in Decentralized Online Social Networks. In COMPSAC,
     - New privacy challenges in Decentralised Online Social Networks. In                 2011.
     PerCom Workshops, 2012.                                                         [42] C. Zimmer, C. Tryfonopoulos, K. Berberich, M. Koubarakis, and
[16] F. Gunther, M. Manulis, and T. Strufe. Key management in distributed                 G. Weikum. Approximate Information Filtering in Peer-to-Peer Net-
     online social networks. In WOWMOM, 2011.                                             works. In WISE, 2008.
[17] L. Han, M. Punceva, B. Nath, S. Muthukrishnan, and L. Iftode. So-               [43] C. Zimmer, C. Tryfonopoulos, K. Berberich, G. Weikum, and
     cialCDN: Caching techniques for distributed social networks. In P2P,                 M. Koubarakis. Node Behavior Prediction for Large-Scale Approximate
     2012.                                                                           Information Filtering. In LSDS-IR @ SIGIR, 2007.




                                                                                27