=Paper=
{{Paper
|id=Vol-1939/paper3
|storemode=property
|title=The Irrefutable History of You: Distributed Ledgers and Semantics for Ubiquitous Personal Ratings
|pdfUrl=https://ceur-ws.org/Vol-1939/paper3.pdf
|volume=Vol-1939
|authors=Allan Third,John Domingue
|dblpUrl=https://dblp.org/rec/conf/semweb/ThirdD17a
}}
==The Irrefutable History of You: Distributed Ledgers and Semantics for Ubiquitous Personal Ratings==
Allan Third and John Domingue, Knowledge Media Institute, Open University, Milton Keynes, MK7 6AA, UK. {allan.third,john.domingue}@open.ac.uk

Abstract. A recurring theme in the science-fiction series Black Mirror is the consequence for society of an over-focus on social networking. The episode Nosedive imagines a future in which every public interaction a person has is rated by the other parties, and every aspect of one’s life depends on the overall rating computed from these. In this paper, we show how such a scenario is already technically possible using existing technologies such as distributed ledgers, and discuss means by which the negative possibilities may be ameliorated using semantic approaches.

===1 Introduction===
The television drama Black Mirror [4] focuses, in each standalone episode, on the potential personal and social consequences of the use of technology, basing its plots on forms of technology which can be imagined as at least partially plausible extensions of what is available or in use today. The societies depicted are usually in some sense dystopian, although not always (San Junipero [5] being a notable counterexample). A recurring theme across a number of episodes is the idea of an over-focus on social networking.

The episode Nosedive [8] depicts a world in which every public interaction with another person can be rated, between zero and five stars, by means of a smartphone: after any interaction, pointing the phone’s camera at the person one wishes to rate brings up a picture of that person and an interface to rate them. Ratings can also be given for social media posts. Ratings are aggregated for each person, so that anyone can view someone’s aggregate rating, again by holding up their phone or by looking at an online profile.
The ubiquity of ratings has developed huge social importance, with employment, personal and social consequences: a person might qualify, or not, to rent a home, or get a job, or enter a social venue based on their rating. The society shown is one of constant effort to maintain or improve one’s rating, and the plot of the episode follows how one woman’s attempts to significantly improve her own rating go wrong and, through a series of unfortunate events, backfire, leading to her rating nosediving from just over four to nearly zero over a short period of time.

We first show that the scenario of Nosedive can already be implemented using technologies available now, before discussing possible technical approaches which could minimise the negative social consequences of a hypothetical ubiquitous adoption of such a rating system.

===2 Distributed Ledgers===
An emerging area of research in the Web concerns the use of distributed ledgers, based on the blockchain data structure, to serve as immutable trustworthy record stores in environments where trust is lacking. Originally developed to underpin the Bitcoin cryptocurrency [12], blockchains are being experimented with for a number of use cases, including verifiable educational certification [9, 14, 17], data integrity and access [1, 18] and Internet-of-Things applications [6, 19]. The design of a blockchain is such that the integrity of its contents is guaranteed by a large network of financially and organisationally independent nodes, and the cost of editing a record is prohibitively high, meaning that once entered onto a blockchain, a record is effectively immutable and irrefutable.

A distributed ledger is based on a blockchain: a timestamped sequential series of records shared in toto across a network of nodes. There is no central or authoritative node, and every node has the potential to add new records to the ledger.
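As a rough illustration of why editing an old record is so costly, the following minimal sketch links each block to the hash of its predecessor. It is a toy model only: the function names and block fields are hypothetical, SHA-256 over JSON is an assumption made for the example, and networking, consensus and mining are all left out.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder predecessor hash for the first block


def block_hash(index, timestamp, records, prev_hash):
    """Hash a block's contents together with its predecessor's hash."""
    payload = json.dumps(
        {"index": index, "timestamp": timestamp,
         "records": records, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, timestamp, records):
    """Append a block that commits to the hash of the current chain tip."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    block = {
        "index": index,
        "timestamp": timestamp,
        "records": records,
        "prev_hash": prev_hash,
        "hash": block_hash(index, timestamp, records, prev_hash),
    }
    chain.append(block)
    return block


def first_invalid_block(chain):
    """Return the index of the first inconsistent block, or None if intact."""
    prev_hash = GENESIS_PREV
    for block in chain:
        expected = block_hash(block["index"], block["timestamp"],
                              block["records"], prev_hash)
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return block["index"]
        prev_hash = block["hash"]
    return None
```

In this sketch, changing the records of an early block makes that block fail validation, and recomputing its hash only moves the failure to the next block, so a tamperer would have to rebuild every later block as well.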
New records are added by consensus: anyone who wishes to add data to a node submits a cryptographically-signed request to do so. Each node competes for the right to add (“mine”) a new block of transactions to the blockchain, with that right awarded by consensus among all the nodes. The precise method of consensus varies between different types of blockchain. The important feature of any consensus mechanism is that there should be a cost involved in attempting to mine blocks, and a reward for the successful node, to encourage good behaviour among nodes. The winning node selects a set of pending record transactions and groups them into a block, which is added to the chain. Requiring a consensus between a large network of (financially and organisationally) unrelated nodes in order to add any data to the chain guarantees that spurious data cannot be inserted by a malicious agent, and, because modifying an older record requires the entire chain following the modified record to be rewritten as if from scratch, the records entered in a blockchain are effectively immutable.

Distributed ledger platforms such as Ethereum [20] add the ability to have executable code embedded in a blockchain, in so-called “smart contracts”. Because the code of a smart contract is stored on the blockchain like any other record, it is immutable and traceable to the account of the person who deployed it to the platform. This means that smart contracts can be relied on as not having been tampered with. If the source code of a contract is also made available, it is possible to verify that the deployed version is genuinely created from the given source. An open source smart contract, then, can be trusted as a genuine implementation of what its source code describes, which allows everyone who interacts with the contract to verify that it behaves in the way that is expected of it.
Smart contracts are therefore usable to mediate transactions between users on a distributed ledger platform in a trustworthy way.

Records on a distributed ledger are public, and, as stated earlier, all nodes on the network have a complete copy of the entire chain. This does not mean, however, that everything must be readable by everyone; records can contain encrypted data, and, with a suitable cryptographic key management infrastructure, the data owner can implement a “selective visibility” system, only permitting chosen others to view the full or partial contents of a record.

===3 Implementing the Nosedive rating system===
We see two main technical requirements to implement the Nosedive rating system. To simplify the discussion, let us say that person A wishes to give person B a rating of n stars, where n is a natural number between 0 and 5, in relation to a face-to-face interaction I between A and B. Person C, at a later date, wants to view this rating’s value.

1. A’s mobile device must be able to recognise B’s proximity and provide A with the ability to select B’s name and apply the rating.

2. Ratings must be unforgeable and incapable of being tampered with, and always attributable to both A and B in their respective roles in the rating transaction.

In the episode, A simply needs to hold her device up facing B, rather than making a selection on the screen, to achieve 1; potentially we can imagine advanced facial recognition, in addition to the other techniques described here, enabling this. Requirement 2 is never explicitly stated; however, we must assume that it applies in order for the system to carry the weight in people’s lives that it is shown to do.

To implement 1, it is possible to take advantage of the widespread availability of accurate GPS devices in modern smartphones.
If each phone registered its current location in a public database, it would be possible for A’s phone to use its own location to query for nearby phones, and then, from the returned list of devices, query each in turn, via some presumably standardised interface, for a user profile, containing, e.g., B’s name, picture, cryptographic public key and “rating account” details. To ensure as far as possible that ratings of B were tied to B’s actual behaviour (and not, for example, the behaviour of someone who has stolen B’s phone), it may be necessary to require some biometric identification of B. For non-proximate ratings, profiles can be shared via social media.

Requirement 2 can be met using a distributed ledger with smart contracts. If A and B both have accounts on a distributed ledger, publicised via their user profiles, then it would easily be possible for A to submit a numerical rating to B’s “rating” smart contract via a cryptographically-signed request. If C reads B’s profile, then he can query B’s rating contract to view the rating n. A suitable public key infrastructure would support trustworthy attribution of the rating to A and prevent forgery, and the properties of the distributed ledger would ensure that ratings could not be tampered with once made, collectively making the ratings effectively irrefutable. In fact, we are currently experimenting with the scenario of distributed-ledger-based reputation transactions [9], using the idea of “reputation tokens” to be used in education as a means of recognising and accrediting soft skills that are not adequately covered by traditional assessment and accreditation. Signed and immutably timestamped records on the ledger would therefore provide an effectively irrefutable record of all rated interactions in a person’s history with the system.
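The behaviour expected of B’s “rating” contract can be sketched in ordinary code. This is not Ethereum contract code and not the authors’ implementation: the `Rating` and `RatingContract` names are hypothetical, the aggregate is assumed to be a plain mean, and signature checking is delegated to a caller-supplied `verify` function standing in for whatever public key infrastructure the ledger provides.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Rating:
    rater: str        # A's account identifier
    ratee: str        # B's account identifier
    interaction: str  # identifier for the interaction I
    stars: int        # n, a whole number from 0 to 5
    signature: str    # A's signature over the other fields (opaque here)


class RatingContract:
    """Append-only store of the ratings received by one account."""

    def __init__(self, owner: str, verify: Callable[[Rating], bool]):
        self.owner = owner
        self._verify = verify          # assumed signature checker
        self._ratings: List[Rating] = []

    def submit(self, rating: Rating) -> None:
        """Accept a rating only if it is well-formed and verifiably signed."""
        if rating.ratee != self.owner:
            raise ValueError("rating is addressed to a different account")
        if not isinstance(rating.stars, int) or not 0 <= rating.stars <= 5:
            raise ValueError("stars must be a whole number from 0 to 5")
        if not self._verify(rating):
            raise ValueError("signature check failed")
        self._ratings.append(rating)

    def aggregate(self) -> Optional[float]:
        """Mean of all accepted ratings, or None if there are none yet."""
        if not self._ratings:
            return None
        return sum(r.stars for r in self._ratings) / len(self._ratings)
```

Person C would call `aggregate()` on B’s contract; the sketch offers no method for editing or deleting a stored rating, mirroring the append-only character of the ledger.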
While it would be possible, and perhaps simpler in some ways, to implement a system using a centralised database without a ledger, one could not then be certain that ratings could not be tampered with by those with access to the database. The technologies required are already developed and in use, and it would not be technically difficult to build the Nosedive system using them.

===4 Potential for abuse===
It should be noted at the outset that there are already situations in which people voluntarily take part in systems in which they are rated by other users, either explicitly or implicitly. Every user of the auction site eBay [7], for example, has a rating determined by other users reflecting their status as a “good citizen” of the site. Forum/discussion sites which implement a “Like” button on contributions can show the total number of “likes” a user has received on the site. (The tabletop games community site BoardGameGeek [3], for example, does this with “thumbs”, its “like” system.) In a somewhat more sophisticated way, online dating sites provide rankings of users, although these rankings are context-sensitive and depend on the user viewing them, in terms of how well users match the viewers. The algorithms computing the rankings are generally proprietary; it seems reasonable to assume that they are in some way learned from similar users.

The bulk of existing research into online rating systems has focused on online sales systems or discussion forums such as these. Desiderata for an online reputation system are suggested in [21], and include mechanisms to prevent or discourage users from changing their identities to “discard” a bad rating, and to limit the effect of “memory” in ratings, so as to prevent historical low ratings from dragging an average down even when a person’s current ratings are consistently high.
One of the desiderata is also that ratings from high-ranking users be weighted more than those of low-ranking ones, an idea which we discuss below in section 5.1.

It is clear, however, that the disastrous outcome for the protagonist in Nosedive is not far-fetched; a system for ubiquitous ratings of people, for any reason, and with profound social consequences, is ripe for abuse. We envisage this abuse in the form of malicious, trivial or thoughtless low ratings directly, as shown in the episode, as well as in over-interpretation of low ratings in inappropriate contexts. In existing ranking systems, [16] showed that negative rankings have a disproportionate effect on reputation compared to positive rankings. While it may well be appropriate to refuse to buy from an eBay seller with a low rating, it is clearly not appropriate to refuse to hire someone on the basis of, for example, their social life.

Even supposing that one accepted the concept of rating each public interaction according to perceived merit, there is no a priori reason to believe that “fair” ratings are achievable. People have been shown to assess others in a discriminatory manner based on existing prejudices in otherwise identical circumstances (see, e.g., [13]). We may therefore reasonably expect that groups disadvantaged on the basis of gender, ethnic origin, sexuality, and so on, would become disadvantaged in terms of ratings. Indeed, in Nosedive, those who are shown to experience the most significant consequences as a result of low ratings are members of existing disadvantaged groups, by gender or race. The protagonist’s brother, however (white, male and hinted to be heterosexual), is shown to be less invested in the need for a high rating, and it seems plausible that one’s investment in ratings in general would vary in inverse proportion to societal privilege.
===5 Minimising the effects of a Nosedive rating system===
Let us suppose that some social or economic pressure leads to the widespread adoption of such a technology. In itself this appears to be unlikely, for reasons including the scenario which is the central plot of Nosedive, but for the sake of argument, let us assume that it takes place. And let us assume that such a system could be enforced, perhaps by economic pressures, to be close to universal, so that everyone had access to the relevant hardware and software to take part, and the options for “opting out” were limited to non-existent. (In practice, we assume that none of these are achievable, particularly the latter two, and that such a system has so many manifest and serious ethical and social problems that widespread adoption would be resisted very strongly, sufficiently to undermine its implementation at all.) But if it did happen, what measures could be put in place to protect users from malicious or punitive ratings?

====5.1 Automatic or manual moderation====
The most obvious approach, and on the surface the one most likely to be effective, would be for B to have the power to accept or reject ratings at will. Of course, to do so would undermine the concept of a rating and would lead to the vast majority of, if not all, people having only the maximum rating. Even a slight modification, in which ratings are accepted or rejected but the recipient does not know the value of the rating at the time of acceptance, would still tend to lead to overall high average ratings, as people would generally choose to accept ratings for interactions which they felt had gone well.

Take into account the rating of the rater. Low-rated people’s ratings do not count as much as those of high-rated people, i.e., A’s ratings are weighted by her own rating. This approach might seem good on paper, but is likely to be harmful in practice.
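One way to make this weighting concrete, assuming (as an illustration only, not something the episode or the cited desiderata specify) that the aggregate is a weighted mean in which each rating n_i counts in proportion to its rater’s own current aggregate w_i:

```latex
\[
  R_B \;=\; \frac{\sum_{i} w_i \, n_i}{\sum_{i} w_i},
  \qquad w_i = \text{current aggregate rating of rater } i
\]
% Worked example with made-up numbers:
% rater 1 has w_1 = 4.5 and gives n_1 = 5; rater 2 has w_2 = 1.5 and gives n_2 = 1.
\[
  R_B = \frac{4.5 \cdot 5 + 1.5 \cdot 1}{4.5 + 1.5} = \frac{24}{6} = 4.0
  \quad \text{(unweighted mean: } \tfrac{5 + 1}{2} = 3.0\text{)}
\]
```

Under this assumed formula the verdict of the low-rated rater moves B’s aggregate far less than that of the high-rated one.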
Given the hypothesis that the distribution of high and low ratings would come to reflect existing power structures in society, there is the potential for this approach simply to end up entrenching disadvantage. An episode of the sitcom Community [2] with a similar theme to Nosedive envisaged this precise scenario, with the outcome being a very highly stratified society with little social movement.

Take into account the rating history of raters. The ratings of someone who frequently rates people low are weighted less than the ratings of others. This is also subject to being gamed. All a malicious person needs to do is to give high ratings to interactions which are not important to them in order to maintain rating influence on those which are. It also would only affect persistently low-rating people, and would have no effect on the situation where B receives many low ratings from a large number of otherwise typically-rating people. (This may well have been the case at a number of points in the episode, such as when the protagonist receives a lot of low ratings because she’s standing on the highway in traffic, or when she takes over the microphone at the wedding.)

This would also be relevant to other forms of gaming of the rating system. For example, family and friend groups might cooperate to increase their members’ ratings, while antagonistic social groups could do the opposite. Analysis of rating history could identify patterns such as this, and allow for normalisation of ratings to reflect them.

Require mutual ratings. If only A rates B in relation to I, nothing happens to B’s rating until and unless B also rates A in relation to I. This would provide B with some control and the ability to avoid ratings which are predicted to be bad. However, bearing in mind we are discussing a hypothetical situation where society as a whole has decided to adopt ubiquitous rating, it seems less than likely that this level of control would be accepted.
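The mutual-rating rule just described can be sketched as a small piece of bookkeeping. The `MutualRatingBook` class and its fields are hypothetical names for illustration; the sketch assumes one rating per party per interaction and ignores signatures and the ledger itself.

```python
from collections import defaultdict


class MutualRatingBook:
    """Holds a rating back until the other party has rated the same interaction."""

    def __init__(self):
        # (interaction, rater, ratee) -> stars, waiting for the reverse rating
        self._pending = {}
        # ratee -> list of stars that have taken effect
        self.effective = defaultdict(list)

    def rate(self, interaction, rater, ratee, stars):
        if not 0 <= stars <= 5:
            raise ValueError("stars must be between 0 and 5")
        reverse_key = (interaction, ratee, rater)
        if reverse_key in self._pending:
            # The other party has already rated: both ratings take effect now.
            self.effective[rater].append(self._pending.pop(reverse_key))
            self.effective[ratee].append(stars)
        else:
            # Nothing happens to the ratee's rating yet.
            self._pending[(interaction, rater, ratee)] = stars
```

A rating from A about interaction I stays pending, with no effect on B, until B rates A for the same interaction, which is exactly the control over unwelcome ratings that the text doubts society would accept.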
Mutual agreement would be one of the mechanisms by which erroneous ratings could be corrected (for example, in the case of misidentification of a person).

Meta-rating. Nearby people are randomly selected to “rate a rating”, with low-ranked ratings being withdrawn or reversed. This approach has the potential to make a difference, and has been implemented with positive effects in discussion systems, a notable example being the Slashdot technology news site [15], where every contribution in a discussion can be rated by other users, and whose “metamoderation” system has been shown to succeed in maintaining civility in discussions [10]. This method could also be used to address erroneous ratings. Potential difficulties with this method are the overhead of “meta-rating” (would people be willing to expend the extra effort to do it?) and the observation that meta-ratings are just as likely to be affected by conscious and unconscious bias as the ratings themselves.

Apply a cost to giving ratings. If there were a cost to rating someone, or a limit on the number of ratings which could be given (within a particular time period, for example), then it may serve to deter frivolous ratings. If the cost were financial, of course, this would again serve to entrench existing status. If it were simply a limit on the number of ratings, there is a chance that malicious ratings would become less likely.

====5.2 Semantic approaches====
Contextual ratings. Rather than a semantics-free number, ratings are instead applied to a particular category of interaction or behaviour, e.g., “this rating is for a customer in a retail purchasing interaction”. [11] argues that ratings should always be interpreted relative to their original context. This allows fine-grained information to be conveyed. Specifically, it allows for more subtle interpretation of ratings.
So in hiring someone, an employer will not (indeed, should not) be concerned with what that person is like as a wedding guest, and can therefore exclude all ratings related to irrelevant transactions. More specifically, an aggregate rating could be computed not across all interactions, but across all interactions of specified types, such as all financial interactions. With more sophisticated interfaces and possibly smarter systems, the context could be extended to support the notion of evidence for a rating to be recorded alongside the rating itself.

For contextual ratings to function, there would need to be an ontology of human interaction describing the categorical relationships between interaction types (such as that a retail transaction is a financial interaction, for example). Manual modelling, however, is only likely to get so far, given the range and complexity of human interactions which take place. Fine-grained categorisations could be crowdsourced, with, perhaps, some central editorial approval or moderation process to avoid discriminatory or offensive categories.

Reasoning could be carried out on semantic representations to derive new categories or specialised ratings; this has potential for positive or negative consequences. For example, it might then be possible for malicious users to simulate discriminatory categories indirectly by deriving them. Positive possibilities include, among others, the ability to identify patterns in ratings received, which might help to limit or avoid negative effects. (The same approach would apply if it were also possible to find out the ratings that someone had given too, which would improve transparency in the process regardless.)

Purely semantic ratings. The idea of contextual ratings could be taken one step further, and the idea of numerical ratings could be dropped entirely.
So instead of “B is rated 4”, or “B is rated 4 in relation to I, which is a retail interaction”, it would be “B is rated ‘helpful’ in relation to I, which is a retail interaction”. Models would need to be developed for relevant attributes, such as “helpful”, which could be associated with particular interactions or interaction types, and with sufficient ontological semantics, reasoning could be carried out as with numbers. Such contextual semantic ratings would be more fine-grained still, and offer more flexibility, while perhaps helping people to avoid the habit of over-interpreting numbers. Aggregation too need not be purely numerical; aggregate categories using vague qualifiers, such as “mostly harmless”, could be computed.

Selective visibility of ratings. It could even be possible, using smart contracts, to enforce restrictions on the ratings that are taken into account; if C has to specify up-front the reason for reading B’s rating, with B’s acceptance of the reason, there could be hardcoded, crowdsourced, or legally mandated sets of interaction types which may be considered in computing the rating. Potentially the user could have the ability to override the request and add further limits onto the considered interaction types. However, this latter ability seems likely to be subject to the same social pressures as requiring approval or mutual rating before a new rating has effect.

Two-way personalisation. Combining semantic representation of contextual rating data with selective visibility gives the potential for all parties, A, B and C, to provide semantic descriptions of preferences and inputs which can be used for fine-grained negotiation of requests and permissions when it comes to rating, being rated and viewing other ratings. From this, one would expect good practices to emerge among privacy-conscious users, which could then be adopted by those less skilled when it comes to technology and privacy protections.
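To make the contextual aggregation of section 5.2 more concrete, here is a minimal sketch in which a tiny hand-written hierarchy stands in for the ontology of interaction types. The type names, the `PARENT` table and both function names are invented for illustration; a real system would presumably use a Semantic Web vocabulary and a reasoner rather than a dictionary.

```python
# Hypothetical mini-ontology: each interaction type maps to its parent type.
PARENT = {
    "retail_transaction": "financial_interaction",
    "loan_repayment": "financial_interaction",
    "wedding_attendance": "social_interaction",
}


def is_a(kind, category):
    """True if `kind` equals `category` or is one of its descendants."""
    while kind is not None:
        if kind == category:
            return True
        kind = PARENT.get(kind)  # climb one level; None once past the root
    return False


def contextual_average(ratings, category):
    """Average only the ratings whose interaction type falls under `category`.

    `ratings` is an iterable of (interaction_type, stars) pairs.
    Returns None when no rating is relevant to the requested category.
    """
    relevant = [stars for kind, stars in ratings if is_a(kind, category)]
    return sum(relevant) / len(relevant) if relevant else None
```

An employer interested only in financial conduct would ask for `contextual_average(ratings, "financial_interaction")`, so a poor rating received as a wedding guest never enters the figure.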
===6 Conclusions===
We have described how a ubiquitous personal rating system, as shown in the Black Mirror episode Nosedive, could be implemented using technologies available today, backed up by smart-contract-enabled distributed ledger platforms which ensure the availability of effectively immutable histories of effectively irrefutable records of ratings. We have speculatively discussed potential moderation techniques and Semantic Web technologies which might aim to limit the negative consequences of such a system. None of the speculated measures do more than mitigate the possibilities of abuse, and it is hard to see how anything other than avoiding the system altogether could prevent it entirely. Currently-used personal rating systems are limited to very specific contexts, such as eBay, with no consequences beyond those contexts, but a ubiquitous system is inevitably subject to many forms of abuse.

While we believe it is highly unlikely that society as a whole would adopt a Nosedive system, there have of course been historical and recent instances of societies choosing options not in the best interest of the majority of their members. Expecting a technological solution to scenarios such as this is naïve; technology can be designed to support safe choices or minimise dangerous ones, but ultimately, it can rarely, if ever, enforce them. If there were a significant danger that Nosedive-style ratings were to become a ubiquitous phenomenon, the most effective approach would likely be to tackle the economic and social pressures leading to this danger.

===References===
1. Azaria, A., Ekblaw, A., Vieira, T., Lippman, A.: MedRec: Using blockchain for medical data access and permission management. In: Open and Big Data (OBD), International Conference on. pp. 25–30. IEEE (2016)
2. Blum, J., Deay, P., Saccardo, T., Kolb, C., Ridley, R., Padrick, M., Diego, D., Roller, M., Harmon, D.: App development and condiments. Community S5(E8) (2014)
3. BoardGameGeek: https://boardgamegeek.com (Jul 2017)
4. Brooker, C.: Black Mirror. Zeppotron (2011)
5. Brooker, C.: San Junipero. Black Mirror S3(E4) (2016)
6. Christidis, K., Devetsikiotis, M.: Blockchains and smart contracts for the internet of things. IEEE Access 4, 2292–2303 (2016)
7. eBay: https://ebay.com (Jul 2017)
8. Jones, R., Schur, M., Brooker, C.: Nosedive. Black Mirror S3(E1) (2016)
9. KMi: http://blockchain.open.ac.uk (Jan 2017)
10. Lampe, C., Zube, P., Lee, J., Park, C.H., Johnston, E.: Crowdsourcing civility: A natural experiment examining the effects of distributed moderation in online forums. Government Information Quarterly 31(2), 317–326 (2014), http://www.sciencedirect.com/science/article/pii/S0740624X14000021
11. Mui, L., Mohtashemi, M., Ang, C., Szolovits, P., Halberstadt, A.: Ratings in distributed systems: A Bayesian approach. In: Proceedings of the Workshop on Information Technologies and Systems (WITS). pp. 1–7 (2001)
12. Nakamoto, S.: Bitcoin: A peer-to-peer electronic cash system (2008)
13. Oreopoulos, P.: Why do skilled immigrants struggle in the labor market? A field experiment with thirteen thousand resumes. American Economic Journal: Economic Policy 3(4), 148 (2011)
14. Sharples, M., Domingue, J.: The blockchain and kudos: A distributed system for educational record, reputation and reward. In: European Conference on Technology Enhanced Learning. pp. 490–496. Springer (2016)
15. Slashdot: http://slashdot.org (Jul 2017)
16. Standifird, S.S.: Reputation and e-commerce: eBay auctions and the asymmetrical impact of positive and negative ratings. Journal of Management 27(3), 279–295 (2001)
17. Third, A., Domingue, J., Bachler, M., Quick, K.: Blockchains and the Web position paper. In: W3C Workshop on Distributed Ledgers on the Web (2016)
18. Third, A., Tiddi, I., Bastianelli, E., Valentine, C., Domingue, J.: Towards the temporal streaming of graph data on distributed ledgers. In: 2nd International Workshop on Linked Data and Distributed Ledgers, Supplementary Proceedings of the 14th Extended Semantic Web Conference (forthcoming 2017)
19. Valentine, C.: GreenDATA. http://projects.kmi.open.ac.uk/greendata (2016)
20. Wood, G.: Ethereum: A secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper (2014)
21. Zacharia, G., Maes, P.: Trust management through reputation mechanisms. Applied Artificial Intelligence 14(9), 881–907 (2000)