The Effect of Introducing Content Price in Distributed Social Networks Christos Tryfonopoulos Dept. of Informatics and Telecommunications University of the Peloponnese GR22131, Tripoli, Greece trifon@uop.gr Abstract—Over the last few years a number of distributed content-based push technologies to cope with the information social networks with content management capabilities have been explosion is also stressed by the deployment of tools such introduced both by academia and industry. However, none of as Google Alert and CNN alerts. In an IF scenario, a user these efforts has so far focused on supporting both information retrieval and filtering functionality in a distributed social net- posts a subscription (or continuous query) to the system to working environment. In this work we present a social network- receive notifications whenever certain events of interest take ing architecture that offers both functionalities –in addition to place (e.g., when a blog post on Winter Olympics becomes the usual social interaction tasks– in distributed social networks, available). outline the associated distributed protocols, and introduce a However, the increasing usage of social networks gave rise novel data source selection mechanism for identifying good data sources. This novel data source selection mechanism is designed to skepticism about use of centralised services that are able to take into account a combination of resource selection, predicted to withhold all uploaded content. Such centralised social net- publication behaviour, and content cost to improve the selection working services are typically owned by private companies and of information producers by users. To the best of our knowledge users need to upload their content, thus giving away ownership our approach, coined AGORA, is the first work to model the rights to make it available to others. This allows companies price of content and to study its effect on retrieval efficiency and effectiveness in a distributed social network setting. Finally, to exploit user data and sell them to advertisers for profit. our work goes beyond modelling by providing proof-of-concept To circumvent such practises, distributed social networking experiments with real-world corpora and social networking data. services –building upon results from the P2P paradigm (e.g., [29], [31], [40])– have been proposed both by academia and Index Terms—distributed social networks, content manage- industry in the form of distributed social platforms [34], ment, information retrieval/filtering, content price, economic modelling, experimental evaluation [14], [14], [35] and distributed social content management systems [23], [22], [17], [26], [33], [28]. Although all these approaches offer different types of distributed social networks I. I NTRODUCTION that allow users to create communities, share content, and send Much information of interest to humans is available today messages, none of them focuses on supporting information on the Web, making it extremely difficult to stay informed filtering functionality in such a setup. Moreover, such dis- without sifting through enormous amounts of information. In tributed environments without centralised control, prove an addition, a vast amount of this information is published and interesting business model for pricing the content delivered shared through social networking sites by users that participate by an information producer. In this setup, each information in ‘social’ activities through the generation, commenting, producer (e.g., a news agency, a digital library, or a prolific tagging, liking and sharing vast amounts of digital content. blogger) might have its own customer base of interested As users engage increasingly more in the usage of social net- followers/subscribers, and may charge the delivered content works, demand for content management capabilities has forced by subscription or per item. social networks to go beyond the usual social interactions In this work, we present AGORA, a distributed social (e.g., like, post, or poke) and offer basic content management networking architecture that allows users to share, search for functionality. To this end, many social networks adopt the tra- (IR), and subscribe to (IF) content in a fully decentralised ditional approach of content search (information retrieval - IR) way, while at the same time maintaining ownership of their [23], [22], [17], [26], [33], [28]. All these approaches however content. Such a design is ideal for creating social content ignore that the information filtering paradigm may also provide marketplaces1 , where users are able to price and distribute a good alternative of keeping the users informed and at the their content while at the same time they maintain ownership same time avoiding the information avalanche. In information rights. Content in our setup can be textual, such as status filtering (IF) –also referred to as publish/subscribe, continuous updates/blog posts/tweets, or multimedia, such as photos or querying, information push, or information dissemination– 1 AGORA is the Greek word for marketplace users are able to subscribe to information sources and be notified when content of interest is published. This need for Copyright held by the author(s) 19 while Section V experimentally demonstrates the effect of cost in retrieval and filtering quality and compares the system effectiveness with/without monetary flow. Finally, Section VI discusses high-level conclusions, open issues, and possible extensions. II. R ELATED W ORK In this section, we discuss related work in the context of systems that are designed with content management in mind, and platforms that provide distributed social networks. A. Distributed Social Platforms All available social networks (e.g., Facebook, LinkedIn, Fig. 1. High-level view of the AGORA architecture. Elgg) are currently based on centralised solutions both for storing and managing of content, which set scalability lim- itations on the system and reduce fault-tolerance. Industry videos, appropriately tagged. Our proposed solution offers has already detected these drawbacks and has lately turned fundamental social interactions, while emphasising on content into solutions that diverge from the centralised model of the management (IR and IF) and economic aspects such as the existing systems by developing platforms, such as Diaspora, price/quality tradeoff of content. To the best of our knowledge, KrawlerX and OpenSocial, that provide APIs to support ap- this is the first approach that aims at combining IR and IF plication hosting in remote application servers, owned and in a social networking context, while taking into account managed by the application providers. In a similar spirit, aspects of economic modelling. In the light of the above, the a strand of research work also moved towards hierarchical contributions presented in this work are the following: organisations for supporting distributed social networking. The • We propose a social networking architecture that offers distributed social systems SuperNova [34] and Scope [24] are content management functionality in terms of IR and IF, based on a two-tier architecture, where nodes with higher in addition to the usual social interaction tasks typically computing capability become super-nodes and form an overlay supported in such scenarios. This is the first approach in to provide distributed data management of the P2P social the literature to offer both functionalities. network. Client nodes connect to super-nodes and rely on • We present the distributed protocols and services that reg- them for bootstrapping, sharing their content, and accessing ulate node interactions, provide details on the distributed the shared information. Although all of the platforms and IR and IF, and outline the different friendship facets system schemes propose the decentralisation of the social introduced in the architecture. services, one of the main issues of the centralised architectures persists: the existence of a single point where user information • We devise a novel method to rank information producers is collected and may be exploited. according to (i) the content already published, (ii) the To alleviate the above disadvantage, distributed platforms expected future publishing behaviour, and (iii) the price for social online networks based on the P2P paradigm were announced by the information producer. This method proposed [27], [1], [14]. LifeSocial.KOM [14] is a plugin- allows us to achieve high recall with low cost (by searching based extendible social platform that provides secure com- at/subscribing to a small number of information produc- munication and user-based data access control, and integrates ers). a monitoring component that allows users and operators to • We study the effect of content price in our setup and observe the quality of the distributed system. Similar efforts experimentally demonstrate that it is a key element on the aimed at spontaneous social networking; they include propos- delivered content quality. Our modelling utilises concepts als for distributed social services in resource constrained de- such as correlation between the quality/expertise of the vices (like tablets or smartphones) [35], [8], or in environments information producer, demand-driven price rates, and cost with no infrastructure guarantees (e.g., high-attendance events) of resources. [24]. All these approaches offer different types of distributed Figure 1 shows a high-level view of the envisioned dis- social platforms that allow users to create communities, share tributed architecture with different types of information pro- content, and send messages, but do not emphasise expressive ducers and users with varying information needs (IR, IF or content search mechanisms. both). Finally, other approaches in distributed social networking The rest of the paper is organised as follows. Section II emphasise on delivering innovative and competitive services; overviews related research, while Section III discusses the SCIMS [19] relies on an ontology-based model for managing AGORA architecture and the associated distributed protocols. social relationships and status, the work in [37] aims at per- Subsequently, Section IV highlights the economic and qualita- sonalising search results based on user context and friendship tive aspects of AGORA and presents the price/quality tradeoff, relations, while Gemstone [36] targets data availability in the 20 absence of the data owner. To achieve this, a replica storage scheme based on social relationships, online patterns of nodes, and user experiences is utilised. B. Distributed Social Data Management Our work fits mainly into the area of content management in distributed social networks and is inspired by previous approaches on Semantic Overlay Networks (SONs) [29], [31], [40] and based on works that emphasise on distributed content location in social networks. Works like the eXO [23] and Fig. 2. The thematic and friend index of a node. SoNet [22] systems are, similarly to AGORA, inspired by the P2P paradigm to provide content location and management services on large-scale decentralised social networks. To do structures, and effective object location mechanisms. Contrary, so, the authors rely on a structured overlay and exploit the DHT-based architectures [23], [28] ignore node autonomy (by accurate location mechanisms, but de-emphasise node auton- enforcing deterministic key/content placement) and emergent omy. Contrary to these approaches, AGORA employs a loose structures (by enforcing network structure). component architecture and introduces a new type of social Finally, a large number of research in the domain of dis- relations between nodes: the semantic closeness of content. tributed social networks consists of studies on system security In this way, nodes that are similar in terms of content, create [28], [7], user privacy [28], [7], [16], [15], [41], distributed emergent groups likewise to the creation of social relations. access control [2], [4], and authentication mechanisms [2]. Our work shares ideas with the SocialCDN system [17], where Clearly, security issues are also relevant in our design and the social caches (links among friends) are introduced as a way to AGORA system could benefit by adopting approaches like [2] alleviate the network traffic and optimise data dissemination or [15] that enforce user privacy and access control. However, (mainly by social updates). In [17], social cache selection the problem of security is orthogonal to our design and is not is formulated as the neighbour-dominating set problem and further analysed as it is not the emphasis of this work. a family of algorithms is proposed and evaluated. Contrary III. S YSTEM OVERVIEW to AGORA, where the emphasis is on efficiently supporting In this section, we provide an overview of the architectural expressive content retrieval in the social paradigm, the em- components of the proposed system and provide the distributed phasis on SocialCDN is on the reduction on network traffic to protocols executed locally by users and information producers. facilitate fundamental social interactions. The loose component architecture and the emphasis on node A. Architecture autonomy of [26] resemble the architectural design of AGORA, We consider a distributed social network, where each user, where an unstructured overlay network of nodes is utilised characterised by its interests, is connected to friends and to support the distributed social infrastructure. However, the information producers with similar interests. The interests of a focus of [26] is on the design of gossip protocols for efficiently user are identified automatically, i.e., by applying clustering on disseminating profile updates to all interested users and does its local content repository. The network nodes use a heartbeat not put any attention to the problem of content search and protocol that runs continuously and aims at identifying new management. information producers based on the likelihood to have similar Furthermore, the concept of creating and maintaining social interests with the node at hand. Each user maintains two connections in distributed infrastructures is affined with the routing indices holding information for social friends and problem of distributed data management in P2P networks. In thematic friends. Social friend links correspond to the social SONs (e.g., [40], [6]), “social” connections between the peers relationship aspect of the network, while thematic friend links (e.g., similarity of content, pattern, or distance in a physical are of two types: short-range links (i.e., links to information level) are exploited to direct the search to nodes with relevant producers with similar interests) and long-range links (i.e., data (e.g., as in [32] that studies query routing strategies based links to information producers with dissimilar interests to on “social” relationships). Other works on SONs (e.g., [31]) maintain connectivity between different clusters in the system). focus more on the organisation of P2P networks as small- The reorganisation procedure is executed locally by each world networks, where peers self-organise in groups of similar node and aims at bringing together users and information interests to facilitate message-efficient query answering. Our producers with similar interests that are likely to share/search work on AGORA borrows concepts and ideas from research on for/subscribe to content. SONs and extends them for facilitating efficient and effective data management in a social network setting. We suggest that B. Joining Agora SONs offer the most promising architectural solution inspired When a user node connects to the network, its interests are from the P2P paradigm; it is a perfect fit for a distributed automatically derived by its local content. For each interest, social networking scenario providing high decentralisation, the node maintains a thematic index (T) containing the contact high node autonomy, support for emergent semantic and social details and interest descriptions of information producers. 21 These links form the thematic neighbourhood of the node; the literature as the fireworks technique [29]. All the nodes the links contained in T are refined accordingly by using receiving the message reduce TTL by one and apply the the rewiring service described below. Furthermore, each node same forwarding technique; the message is not forwarded maintains a friend index (F) containing the contact details further in the network when TTL reaches zero. Additionally and interest descriptions of the social neighbourhood of the to forwarding, every node receiving a message compares the node, comprised of explicitly declared friends in the network. subscription against its identified interests and, if similar, Figure 2 shows the F and T for an arbitrary node. stores it in its local continuous query data structures to match it against future publications. Content is kept locally at each C. Locating information producers information producers in the spirit of [42]. This lack of publi- This service is applied to locate new information producers cation dissemination is a design decision that offers increased by establishing new connections and discarding old ones. scalability (by trading recall) and complies with the content Each node may initiate this procedure by computing a scoring marketplace paradigm we aim for. Thus, only the nodes function that combines the quality and price of all information indexing a subscription can notify the interested user, although producers in its thematic index (T). If the computed score is other information producers may also publish relevant content. greater than a threshold then the node does not need to take When an information producer wants to publish new content any further action, since it is already aware of information to the network, it matches it against its local continuous query producers that match its needs and budget. Otherwise, the database to decide which continuous queries match it and thus, node initiates a process to identify new information producers which user node should be notified. Then, the information by forwarding a message in the network –bounded by a producers delivers a notification for each continuous query by time-to-live (TTL) mechanism– using the thematic and social sending to the query initiator a pointer to the matching content; connections and collecting the interests of other information if the user is not online, the provider stores the message producers. and delivers it upon user reconnection. Notice that since the The issued message is forwarded with equal probability information producer maintains the content ownership, it is to (i) a number of randomly chosen entries contained in a now able to charge the user at the announced content price. node’s T, (ii) a number of randomly chosen entries contained in a node’s F, or (iii) the most similar nodes to the message E. Searching for content at information producers initiator, found in either T or F. The rationale of applying A user issuing content search forwards a message in the either of the forwarding strategies is that the message initiator network following a mechanism similar to the one described in should be able to reach information producers both directly the previous section. Additionally to query forwarding, every (through other similar nodes), but also indirectly (through information producer receiving a query message compares propagation of the message through non-similar nodes). Each it against the identified interests and, if similar, matches it node that receives the message adds its interest in it, reduces against the locally stored content. Subsequently, pointers to the TTL by one, and forwards it in the same manner. When the matching content are sent to the query initiator (along with the TTL of the message reaches zero, the message containing announced price for the content). Subsequently answers from the contact info and interests of all information producers all information producers are ranked by a combination of price that received the message is sent back to its initiator. To and similarity to the query (discussed in the next section) and speed up the process, every intermediate node receiving the the list is presented to the user. message may utilise the information in it to refine its thematic connections. IV. R ANKING OF INFORMATION PRODUCERS To select which information producer will be monitored, D. Subscribing for content at information producers the protocols described in the previous section use a scoring Queries/subscriptions are issued as free text or keywords function to rank information producers. In this section, we under the Vector Space Model and are formulated as term quantify these concepts and give the rationale between our vectors. The user subscribing for specific content forwards choices. a message in the network with a TTL using both its social and thematic connections. The issued message is forwarded A. Quality vs Price both to (i) friends that have interests similar to the query and The ranking strategy for the information producers is a are contained in F and (ii) a small number of information critical component since it allows users to locate relevant infor- producers contained in the T chosen as described below. mation and information producers to maximise their revenue. Initially, the message initiator compares the user subscription Empirical studies have shown that price and quality are the against its interests and, if similar, the message is forwarded to two key determinants of the consumer’s choice to buy or not all of its short-range links, i.e., the message is broadcasted to a product [10]. Thus, to make an informed selection on the the node’s neighbourhood (explosion). Otherwise, the message information producers, a user has to rank them based on a is forwarded to a small fixed number of nodes that have the blend (denoted by tunable parameter β) of content quality and highest similarity to the subscription (fixed forwarding). The content price both in a retrieval and a filtering scenario. This combination of the two routing strategies is referred to in combination describes the benefit/cost ratio for the content and 22 allows the user to assign a score to every information producer. new mutually beneficial connections. This is in line with the To this end, the score for each information producer is given notion of friendship connections in any social network. by: The main goal of this modelling is to study the influence of score(p, i) = β · price(p, i) − (1 − β) · quality(p, i) (1) the cost component on the quality of received content, study the interactions between users and information producers, and Here, p denotes an information producer, q denotes an gain insights about the overall behaviour of this prototype information need (in the form of a query or a subscription), content market. price(p, i) refers to the announced price an information pro- ducer provides for the provided content, and quality(p, i) C. Content quality specifics denotes how relevant p is to i. Information producers compute the price based on the demand and according to popularity of The quality of the content is a difficult thing to model themselves and that of their content. Price and quality have as it cannot be known prior to acquiring it. Hence the best the same range of values to allow for their combination, while option is to assess the quality of the information producer. price is typically recomputed whenever the popularity of the To do so, one has to take into account the dual capacity of information producer changes. In Section V we study the price information producers in AGORA: to answer one-time content in different scenarios, and show the effect on the quality of requests (IR) and satisfy long-term information needs (IF). retrieved content (i.e., recall) when the price choice is (i) To this end, the quality of an information producer has to be random, (ii) strongly correlated with quality, and (iii) partially based on (i) the quality of already published content (since correlated with quality. this depicts its ability to satisfy IR tasks from users) and (ii) a predicted quality of the content to be published in the future B. Content price specifics (since this depicts its ability to satisfy IF tasks from users). The necessity of both facets is better illustrated in the case of In this section, we analyse the economic modelling of an information producer that provides content in the form of AGORA and review the basic assumptions and expectations technical articles and news items; articles have a long shelf- from such a modelling. Usefulness of the information goods life and are good candidates for recurrent sales, while news received by a subscriber is a qualitative criterion, that is diffi- items have an extremely short shelf-life and after some time cult to model. In AGORA, we model usefulness by matching users loose interest in them. interests, i.e., by assuming that all received content relevant to the information need are useful to the subscriber, and do 1. Quality of published content. The quality of the already not discuss issues such as novelty, coverage of the field, or published content is known as the resource selection problem user effort. In our modelling, after a user acquires a history in the areas of information retrieval and databases. Hence, of transactions with certain information producers, develops a number of standard resource selection algorithms such as an affect for some of them. Affect can be modelled in various tf-idf based methods, CORI, or language models (see [30] ways, depending on the task at hand, and can be either positive for a nice overview) have been proposed. We use a standard or negative [11]. In AGORA, a user does not a priori know the tf-idf based component that combines accuracy of modelling quality of the information goods, but uses the affect developed and ease of implementation. Thus, in our setup, quality of from previous transactions to approximate it. Subsequently, he published content is given by: compares the values of information quality to the expected X max  [0.5 · log (dfp,t ) + 0.5 · log tfp,t ] (2) values and update its affect [21]. t∈i The costs in AGORA are results of actions [18], such as transactions, network communication, and use of common Our approach uses the standard IR constructs of document infrastructure. Information producers try to maximise their frequency (df ) and maximum term frequency (tf max ) [25], revenue while minimising expenses that occur due to their while a balanced combination of these two metrics (0.5 in the actions. Contrary users try to maximise the utility of the above equation) is used to equally emphasise importance of received content, while minimising expenses that occur due to df and tf max according to [30], [43]. actions. In general, the content market in AGORA is not a pure 2. Predicted quality of content to be published. To pre- competitive market in the sense of e.g., [39], since users do dict the quality of the content to be published, we model not know in advance the exact content quality they are buying. IR statistics per information producer as time-series data AGORA resembles the modelling of a team of sales people and use statistical analysis tools [5] to predict future values [38], where stakeholders try to collaborate with others in order based on past observations. Such techniques take into account to get their expertise for a (cross/up) sale. After deciding who assumptions about the internal structure of the time series to collaborate with, it is possible to model the gap between like trends and seasonality and tend to emphasise recent the initial expectations and the actual actions. In [12], it is observations. Since seasonality requires long-term statistics shown that this gap is smaller in a competitive relationship that are infeasible to maintain (e.g., several years of data to compared to that of a cooperative relationship. As in many observe seasonality in the publication of Christmas content), cooperative environments each stakeholder usually retains its we resort to double exponential smoothing techniques [5] as connections with the others, while also being free to explore our prediction mechanism, since it requires a small amount of 23 data, emphasises recent over older data observations, and al- producers that were chosen according to his own information lows for trend identification (useful for news items or trending needs and ranked using the formulas discussed in Section IV. topics). Thus, in our setup, predicted quality of content to be For deciding the per user budget, we relied on studies about published is given by: budget distribution and spending for a variety of cases, ranging X ∗ from family budgets to consumer budgets [20], [9]. The main ) + log δ(cs∗p ) + 1 + 1   log δ(dfp,t (3) conclusions drawn from these studies are that (i) budget t∈i distribution follows a power law, with a small percentage ∗ In the formula above, function δ(dfp,t ) stands for the differ- of families/consumers having a high (yearly) budget, and a ence between the predicted and the last document frequency large percentage of the families being in the (long) tail of (df ) observed, and δ(cs∗ ) is the difference in the collection the distribution, with a low budget, and (ii) the percentage size of information producer p reflecting its overall expected of the income spend on content does not vary with the future publishing activity. These values are calculated for all budget. According to [3] and the above remarks, we divided terms t contained in the user subscription. In this way we the users into three classes: low, average and high budget model two aspects of the information producer’s behaviour: users with a population of 600K (or 60% of total users), (i) its potential to publish relevant documents in the future 300K (or 30% of total users) and 100K (or 10% of total ∗ (reflected by δ(dfp,t )) and (ii) its overall expected future users) respectively. Subsequently, we experimentally computed publishing activity (reflected by δ(cs∗ )). Notice also that, a budget that would allow the users to subscribe to the top-100 in the above formula, the publication of relevant documents (i.e., 10%) information producers, and allowed the low budget ∗ (i.e., δ(dfp,t )) is more emphasised than the publication rate users to have 60%, the medium budget users to have 80% and ∗ (delta(cs )) due to the nesting of the log functions. The the high budget users to have 120% of this ideal budget. addition of 1 in the log functions is used to yield positive To measure the effect of content cost in quality (with and predictions and avoid log(0). without monetary flow) we utilise the following metrics: 3. Putting it all together. Given Formulas 2 and 3, the • Messages. We measure the message traffic per action in overall quality of an information producer p with respect to AGORA. an information need X i is given by: • Recall. We measure average recall over all rounds by max  quality(p, i) = [(0.5 · log (dfp,t ) + 0.5 · log tfp,t )+ computing the ratio of the total number of retrieved t∈i content to the total number of relevant content for all ∗ ) + log δ(cs∗p ) + 1 + 1 ]   log δ(dfp,t users. (4) • Ranking. We use an extension of Spearman’s footrule distance to compare rankings of information producers V. E XPERIMENTAL E VALUATION calculated by users. This metric allows us to compare two In this section, we present our findings from the introduction different information producer rankings by calculating the of content cost, and how it affects the effectiveness of the distance between the elements in two ranking lists. In our system. We study the behaviour of AGORA using different implementation if an element from list A is not present scenarios, while varying the correlation between price, quality in list B, it is considered as being in the last available and customer demand. position in B. Finally, to assist the reproducibility of our results we plan A. Experimental Setup and Metrics to publicly release our code in an appropriate repository after We used a real-life document collection containing 2M doc- the publication of the work. uments with a vocabulary of about 600K terms from a focused Web crawl. The documents were categorised in ten different B. Varying the price-quality correlation categories by content, with the smallest category having 67K In this experiment, we aimed at observing the behaviour of documents and the largest one of 325K. In all experiments, AGORA when varying the correlation 0 ≤ κ ≤ 1 between the the network consists of 1, 000 information producers and 1M price and the quality of an information producer; k = 1 means users. Each information producer started with a database of that the price an information producer charges for content 300 documents initialised with 15% random category, 10% is fully correlated to the quality of that producer. Notice non-categorised, and 75% single category documents, resulting that quality in AGORA is easily calculated/forecasted using in 100 specialised information producers for each category. We Equation 4 and in this case users know that the content from artificially constructed information needs for the users (in the this information producer will be expensive but relevant (i.e., form of queries and subscriptions with multiple terms) using useful to them). The other extreme is when k = 0, where the document corpus and by selecting terms that are strong prices have no correlation to the quality of the information representatives of a document category (i.e., a frequent term producer and are chosen randomly. For the rest of the cases, in documents of one category and infrequent in documents of i.e., 0 < κ < 1, the correlation is modelled as the likelihood the other categories). The simulation was done in rounds. that an information producer sells κ% of content underpriced We introduced budget constraints on a per user basis; thus, (up to 20% of the initial value), and 1 − κ overpriced (up to each user was able to follow only a few selected information 20% of the initial value). 24 40 κ = 0% κ = 0% κ = 25% κ = 50% 0.5 κ = 50% κ = 75% κ = 75% 35 κ = 100% 30 Extended Spearman’s footrule metric 0.4 25 0.3 recall 20 0.2 15 10 0.1 5 0 0 0 0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1.0 β β Fig. 3. Recall against β. Fig. 5. Difference in information producer ranking against β. Figure 3 presents the achieved recall for different values different values of κ is an effect of the modelling of AGORA as of β and κ, where we notice that the introduction of price a closed system where monetary flow is limited by the budgets for content reduces the observed recall (notice that recall has of the users and no new wealth is produced. the highest values for β = 0, i.e., no pricing is involved in Figure 5 shows how differently users rank information information producer ranking). This is an important result, producers for varying values of β and κ. The difference in showing that when information producers charge for content, the ranking of information producers is measured using an consumers trade quality for cheaper options. Notice also that extension of Spearman’s footrule metric. To produce a point for every κ value tested recall is retained high, as long as it in the graph we compare the ranked lists of information plays the most important role in the ranking (β < 50%). This producers for each pair of users for and each value of β in was also expected, since when price is of importance, con- the x-axis and average the results. Notice that the case of sumers will choose cheaper information producers, leading to β = 0 is omitted as the extended Spearman’s footrule metric a reduction in the observed recall. Additionally, when the price is always 0 since the lists compared are identical (users take is the only ranking criterion for users (β = 1), recall is close to into account only the quality to rank information producers). that of a random choice of information producers (remember When β increases (i.e., price becomes more important in that users monitor only 10% of information producers in the the ranking process) Spearman’s metric increases too, as system). information producers with high quality get lower positions in the ranking, while information producers with lower quality 0.6 β = 0.2 (but cheaper) are ranked high. Finally, when the price of β = 0.5 β = 0.8 an information producer is not associated with its quality 0.5 (random price setup or κ = 100%), there are big differences in the ranking of information producers (especially in the cases 0.4 where price matters more – the leftmost points in the graph) due to the introduced randomness. recall 0.3 20 κ = 0% κ = 25% 0.2 κ = 50% κ = 75% 0.1 15 # of messages/document 0 0.2 0.4 0.6 0.8 1 κ 10 Fig. 4. Recall against κ. 5 Figure 4 shows how recall is affected for three different val- ues of β, when κ increases. When one of the two components becomes dominant in the ranking function, it outweighs the 0 0.0 0.2 0.4 0.6 0.8 1.0 effect of the other. This is in line with our expectations, since β price dominates the ranking function and quality is sacrificed to reduce costs. Finally, notice that consistent recall between Fig. 6. Message traffic against β. 25 3.5 0.5 heartbeat messages Change in Publishing Behaviour other messages Consistent Publishing Behaviour 0.45 3 0.4 2.5 # of messages (x10K) 0.35 2 recall 0.3 1.5 0.25 1 0.2 0.5 0.15 0 0.1 0 50 100 0 0.2 0.4 0.6 0.8 1 κ β Fig. 7. Message traffic against κ. Fig. 8. Recall for different scenarios against β. C. System Performance graph are (i) the drop in recall for both scenarios as β increases In the first series of experiments we targeted the perfor- and (ii) the higher recall values for the case of topic speciali- mance of AGORA in terms of message traffic. Figure 6 shows sation. The reason for the first observation is the shift of users that the message traffic per user incurred in AGORA is reducing to cheaper information producers due to the importance of as β increases. This happens because users utilise the price content price in the ranking function. Additionally, the reason component to rank information producers, thus choosing those for the second observation is that the build up of expertise by of lower price and poor quality. This reduces the overall the information producers reflects on the higher quality values, message number as less content is exchanged in the system which in turn leads to ranking quality information producers due to users subscribing to non-expert information producers; higher. Contrary, when information producers shift their topic, our observations here are consistent with those in Section V-B users are not able to correlate price and quality and thus, that correlate recall and β. select information producers of poor quality (hence the drop Figure 7 demonstrates the total amount of traffic observed in in recall). AGORA and how this traffic is split in the various message cat- VI. C ONCLUSIONS AND F UTURE W ORK egories, as the price-quality correlation is varied. As expected the heart-beat protocol messages dominate the messaging load, In this work, we proposed AGORA, a social networking as necessary messages with information producer statistics architecture and the related distributed protocols to facilitate and prices are disseminated in the system. Notice also that distributed content management in the form of information these messages are not affected by price-quality correlation, retrieval and information filtering. In addition, we introduced since the information producers have to update their publica- a novel selection mechanism for information producers that tion statistics and prices, regardless of their customer base. allows users to rank them according to (i) their expertise, Finally, notice that the number of messages for querying (ii) their predicted future content publications, and (iii) the for/subscribing to/receiving content are affected as κ increases price of content they deliver. The experimental evaluation of since information producers that are of high quality widen their our proposal demonstrated the behaviour of such a content customer base with more users. marketplace in terms of price/quality tradeoffs, recall and message traffic. To the best of our knowledge, these are D. Varying the behaviour of information producers the first results that connect recall and message traffic in In this section we look into recall and how this is affected by a distributed social network with the content cost, and put two very different information producer behaviours: topic spe- economic modelling at the heart of system design. The most cialisation and topic shift. In topic specialisation, information important outcome of our study is that price should participate producers disregard market conditions and maintain their topic in the ranking of information producers less than a quarter specialisation even if this results in lower revenues. Contrary in of the total score to avoid the delivery of irrelevant content topic shift, information producers may alter their specialisation to users. We also showed that the introduction of content topic over time based on changes in user demand, revenue and price improves system scalability by reducing message traffic market conditions. In the latter case, an information producer and imposing a reasonable use of resources to stakeholders. initially publishes content from one category, and some rounds Overall, introducing a monetary value for the production and later may decide to switch to a different category to simulate dissemination of content in a distributed social network proves changes in portfolio or a different business strategy. an interesting business model that conserves resources and Figure 8 shows the observed recall for both scenarios and improves scalability. However, this should be executed in a different values of β. The most important observations in this careful fashion to avoid user dissatisfaction that may rise from 26 the content cost itself, and the reduced access to relevant [18] B. Johansson and H. Persson. Self-organised adjustments in a market information. with price-setting firms. In Chaos, Solitons and Fractals, 2003. [19] M. A. Kabir, J. Han, J. Yu, and A. Colman. SCIMS: A Social Context Future directions of research include more extensive ex- Information Management System for Socially-Aware Applications. In perimentation (using a grid service for vast-scale experiments CAiSE, 2012. and analytics), more detailed economic modelling (e.g., model [20] B. B. A. Kennickell and K. Moore. Recent Changes in U.S. Family Finances: Evidence from the 2001 and 2004 Survey of Consumer AGORA as an open system, perform monetary flow monitor- Finances, 2006. ing/analysis, incorporate agent-style BDI modelling [13]), and [21] C. Li, M. Singh, and K. Sycara. A Dynamic Pricing Mechanism for implementation of a prototype system. P2P Referral Systems. In AAMAS, 2004. [22] G. Liu, H. Shen, and L. Ward. An efficient and trustworthy P2P and ACKNOWLEDGEMENTS social network integrated file sharing system. In P2P, 2012. [23] A. Loupasakis, N. Ntarmos, and P. Triantafillou. eXO: Decentralized The author would like to thank the anonymous reviewers Autonomous Scalable Social Networking. In CIDR, 2011. for their comments and suggestions on improving the quality [24] M. Mani, A.-M. Nguyen, and N. Crespi. SCOPE: A prototype for spontaneous P2P social networking. In PerCom Workshops, 2010. and presentation of the work. [25] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Informa- tion Retrieval. Cambridge University Press, 2008. R EFERENCES [26] G. Mega, A. Montresor, and G. Picco. Efficient dissemination in [1] S. Abbas, J. Pouwelse, D. Epema, and H. Sips. A Gossip-Based decentralized social networks. In P2P, 2011. Distributed Social Networking System. In WETICE, 2009. [27] J. Mitchell-Wong, S. Goh, M. Chhetri, R. Kowalczyk, and Q. Vo. A [2] M. Backes, M. Maffei, and K. Pecina. A Security API for Distributed Framework for Open, Distributed and Self-Managed Social Platforms. Social Networks. In NDSS, 2011. In VECN, 2008. [3] L. P. Breker. A survey of network pricing schemes. In Theoretical [28] R. Narendula, T. Papaioannou, and K. Aberer. My3: A highly-available Computer Science, 1996. P2P-based online social network. In P2P, 2011. [4] S. Buchegger, D. Schioberg, L.-H. Vu, and A. Datta. PeerSoN: P2P [29] C. H. Ng, K. C. Sia, and C. H. Chang. Advanced Peer Clustering and social networking: early experiences and insights. In SNS, 2009. Firework Query Model in the Peer-to-Peer Network. In WWW, 2002. [5] C. Chatfield. The Analysis of Time Series - An Introduction. CRC Press [30] H. Nottelmann and N. Fuhr. Evaluating Different Methods of Estimating 2004. Retrieval Quality for Resource Selection. In SIGIR, 2003. [6] A. Crespo and H. Garcia-Molina. Routing indices for peer-to-peer [31] P. Raftopoulou and E. Petrakis. iCluster: a Self-Organising Overlay systems. In Proceedings of 22nd IEEE International ICDCS Conference, Network for P2P Information Retrieval. In ECIR, 2008. pages 23–32, Vienna, Austria, July 2002. [32] P. Raftopoulou, E. Petrakis, and C. Tryfonopoulos. Rewiring strategies [7] L. A. Cutillo, R. Molva, and T. Strufe. Safebook: A privacy-preserving for semantic overlay networks. In DPD, 2009. online social network leveraging on real-life trust. IEEE Communica- [33] P. Raftopoulou, C. Tryfonopoulos, E. Petrakis, and N. Zevlis. DS4: tions Magazine, 2009. Introducing Semantic Friendship in Distributed Social Networks. In [8] A. de Spindler, M. Grossniklaus, and M. C. Norrie. Development CoopIS, 2013. Framework for Mobile Social Applications. In CAiSE, 2009. [34] R. Sharma and A. Datta. SuperNova: Super-peers based architecture for [9] J. B. DeLong. Six Families Budget Their Money. In Lecture notes for decentralized online social networks. In COMSNETS, 2012. American Economic History, University of California at Berkeley, 2008. [35] P. Stuedi, I. Mohomed, M. Balakrishnan, Z. Mao, V. Ramasubramanian, [10] A. Dumrogsiri, M. Fan, A. Jain, and K. Moinzadeh. A supply D. Terry, and T. Wobber. Contrail: Enabling Decentralized Social chain model with direct and retail channels. In European Joumal of Networks on Smartphones. In Middleware, 2011. Operational Research, 2008. [36] F. Tegeler, D. Koll, and X. Fu. Gemstone: Empowering Decentralized [11] M. Faig and B. Jerez. Inflation, Prices, and Information and Competitive Social Networking with High Data Availability. In GLOBECOM, 2011. Search. Journal of Macroeconomics, 2006. [37] B. Upadhyaya and E. Choi. Social Overlay: P2P Infrastructure for Social [12] L. Forker and P. Stannack. Cooperation versus Competition: do buyers Networks. In NCM, 2009. and suppliers really see eye-to-eye? In European Journal of Purchasing [38] T. Üstüner and D. Godes. Better Sales Networks. In Harvard Business and Supply Management, 2000. Review, 2006. [13] M. Georgeff, B. Pell, M. Pollack, M. Tambe, and M. Wooldridge. The Belief-Desire-Intention Model of Agency. In Intelligent Agents [39] H. R. Varian. Pricing Information Goods. In Research Libraries Group V: Agents Theories, Architectures, and Languages, 1999. Symposium, Harvard Law School, 1995. [14] K. Graffi, C. Gross, P. Mukherjee, A. Kovacevic, and R. Steinmetz. [40] S. Voulgaris, M. van Steen, and K. Iwanicki. Proactive Gossip-based LifeSocial.KOM: A P2P-Based Platform for Secure Online Social Management of Semantic Overlay Networks. CCPE, 19(17), 2007. Networks. In P2P, 2010. [41] M. Xue, B. Carminati, and E. Ferrari. P3D - Privacy-Preserving Path [15] B. Greschbach, G. Kreitz, and S. Buchegger. The devil is in the metadata Discovery in Decentralized Online Social Networks. In COMPSAC, - New privacy challenges in Decentralised Online Social Networks. In 2011. PerCom Workshops, 2012. [42] C. Zimmer, C. Tryfonopoulos, K. Berberich, M. Koubarakis, and [16] F. Gunther, M. Manulis, and T. Strufe. Key management in distributed G. Weikum. Approximate Information Filtering in Peer-to-Peer Net- online social networks. In WOWMOM, 2011. works. In WISE, 2008. [17] L. Han, M. Punceva, B. Nath, S. Muthukrishnan, and L. Iftode. So- [43] C. Zimmer, C. Tryfonopoulos, K. Berberich, G. Weikum, and cialCDN: Caching techniques for distributed social networks. In P2P, M. Koubarakis. Node Behavior Prediction for Large-Scale Approximate 2012. Information Filtering. In LSDS-IR @ SIGIR, 2007. 27