Decentralising Power: How We Are Trying to Keep CALLector Ethical

Cathy Chua¹, Hanieh Habibi², Manny Rayner², Nikos Tsourakis²
¹ Independent researcher
² Geneva University
cathyc@pioneerbooks.com.au
{Hanieh.Habibi,Emmanuel.Rayner,Nikolaos.Tsourakis}@unige.ch

EnetCollect WG3&WG5 Meeting, 24-25 October 2018, Leiden, Netherlands 49

Abstract
We present a brief overview of the CALLector project, and consider ethical questions arising from its overall goal of creating a social network to support the creation and use of online CALL resources. We argue that these questions are best addressed in a decentralised, pluralistic open source architecture.

Keywords: online communities, social networks, education

1. Introduction
This paper follows on from another paper presented at the same workshop (Chua and Rayner, 2019), where we present case studies from two large internet communities. Here, we consider the implications for our own project, CALLector. As outlined in the previous paper, the agents we are most worried about controlling are ourselves. If CALLector turns out to be successful, and the opportunity presents itself, experience shows that there is a strong temptation to act unethically.

It is easy to say that we don't need to be concerned about these issues. Few online communities turn into large success stories, and if ours does, we will be delighted. This is lazy and self-deceptive thinking. Basically, it amounts to saying that we would be happy to defraud our members, given the chance, and since it probably will not happen there is no need to worry. The argument does not pass the Kantian test: if everyone thinks this way, then we can be sure that all online communities will exploit and defraud their members. The conclusion seems simple: our ethical obligation is to develop the network in such a way that we are not in fact planning to exploit our members, even if we are fortunate enough to be given the opportunity.

It is worth mentioning explicitly that, at least for some of the authors, the ideas we present here are the fruit of direct and painful experience. Two of us (Chua, Rayner) have been active members of the Goodreads online reviewing community¹ since shortly after its inception. When Goodreads was launched, the founders made many extravagant promises designed to attract new members, in particular that the site would not employ any form of censorship. A few months after the site was sold to Amazon in March 2013, policy abruptly changed, with widespread and arbitrary censorship introduced at zero notice. We were heavily involved in the fightback by the user community, and in particular helped create the book which became a focal activity for the protesters (Reader, 2013).

In the rest of the paper, we explore the general considerations outlined above in the specific context of the CALLector project. We begin by giving a brief overview of the CALLector project's general goals and the state of play after the first year of development. We list what we see as our main ethical obligations, and outline three plausible ways in which the project could be continued. Finally, we present a brief sketch of the specific solutions we are aiming towards.

¹ https://www.goodreads.com/. As of April 2019, the site claims to have over 85 million members.

2. Overview of CALLector²
CALLector is a project funded by the Swiss National Science Foundation and based at Geneva University; it officially started on April 1 2018, and is scheduled to run until December 31 2021. Its overall goal is to create a social network designed to support users who wish to create, use and share online CALL content. The key word here is "create": we want it to be possible for users to create their own content, using suitable tools. The abstract question we wish to investigate is whether the well-documented rewards associated with working within a social network (basically, various kinds of positive social feedback from other members) can motivate people to create large amounts of useful content. This model has worked well for online communities centred around, for example, book reviews (Goodreads), cartographic data (OpenStreetMap³) and knitting patterns (Ravelry⁴), so it may also work for CALL.

The global architecture is shown in Figure 1, and consists of three layers. In the middle, we have the content. Underneath, we have the platforms that deploy the content. At the top, we have the social network, which indexes the content and allows users to perform functions like rating, commenting, recommending and so on. Figure 2 shows the current state of instantiation of this architecture. The social network level does not yet exist: work is just starting now, in April 2019. We have two deployment platforms, Regulus/Alexa and LARA, and some initial content.

Work to date on the Regulus/Alexa platform is described in our other companion paper in these proceedings (Tsourakis et al., 2019). Very briefly, the platform allows construction of interactive CALL games that can be deployed on Amazon Alexa devices like the Echo. Games are written in a spreadsheet form similar to Alexa's "blueprints"⁵, but oriented towards the requirements of CALL. We have so far constructed about twenty sample games, and are informally supporting a couple of external content creators.

LARA (Learning And Reading Assistant), which will be described at greater length elsewhere, is a platform we started developing in Q3 2018⁶. As the name suggests, the goal is to help students improve their competence in the L2 through reading. LARA processes text into a marked-up form which offers two basic kinds of support. First, each sentence is linked to a recorded audio file, so the student can always find out what the text they are reading sounds like. Second, and more unusually, there is a personalised hyperlinked concordance which shows the student where each word has previously occurred in their own reading experience.

Figure 3 illustrates LARA's core functionality using a short passage from The Tale of Peter Rabbit. The marked-up version of Peter Rabbit is shown at two points in the learner's reading progress: on the left, where the learner has read the whole text and nothing else, and on the right, where they have read both Peter Rabbit and the first three chapters of Alice in Wonderland⁷. Colours indicate how many times each lemma has occurred in the texts the learner has read so far. Words in black have occurred more than five times and words in red only once, while blue and green show intermediate values. As the figure shows, the colours effectively track the learner's increased exposure to vocabulary between the two snapshots. In the first snapshot, only function words and a few content words central to the story appear in black. In the second, many of the words marked in red have turned black or blue, indicating that they have been read several times during the intervening period. The text has been manually divided into segments and recorded in audio form. A loudspeaker icon marks the end of each segment, and the learner can listen to the segment in question by clicking on the icon. The learner can click on any word and get a personalised concordance page containing up to ten segments where the word appears (Figure 4).

LARA is designed on the assumption that content will in general be distributed over multiple servers on the web, with a master file that associates resource identifiers with URLs. When constructing the personalised concordance pages, the compiler downloads as little as possible, leaving the bulky multimedia files on the remote servers and inserting links to them where necessary. There is thus no need to have all the user data on one server either.

² https://www.unige.ch/callector/
³ https://www.openstreetmap.org
⁴ https://www.ravelry.com
⁵ https://blueprints.amazon.com
⁶ The state of play of the project as of late February 2019 is summarised in our informal position paper (Akhlaghi et al., 2019). Examples of LARA content are posted at https://www.unige.ch/callector/lara-content/.
⁷ The two versions can be found online at https://www.issco.unige.ch/en/research/projects/peter_rabbitvocabpages/_hyperlinked_text_.html and https://www.issco.unige.ch/en/research/projects/callector/reader1_englishvocabpages/_hyperlinked_text_.html.

3. Ethical issues
We now move on to ethical issues; we begin by listing the people and organisations to whom we have obligations. We consider these to be i) ourselves, ii) the project funder (the Swiss National Science Foundation), iii) our employer (Geneva University), iv) the network's content creators (under "content creators", we also include external people who contribute to the system-level architecture), and v) the network's content users. In our capacity as the people responsible for carrying out the project, we consider our main obligations to these agents to be roughly the following.

Ourselves: Keep our jobs. Publish. One of us needs to complete a PhD.

Funder: Carry out the project as described in the proposal. Publish.

Employer: Publish. Attempt to leverage the results of the project to bring in more funding.

Content creators: Provide stable, maintainable hosting for content. Provide stable, maintainable hosting for user data. Respect the content creators' IP rights. Respect the content creators' rights as members of the social network.

Content users: Provide stable, maintainable hosting for content. Provide stable, maintainable hosting for user data. Respect the content users' rights as members of the social network.

Based on the above, we consider three generic strategies for continuing the project, and to what extent each would fulfil our ethical obligations.

3.1. Default academic project
If we take no special steps, we expect the project to develop roughly as follows. We will keep on haphazardly extending the system in order to support short-term activities, primarily writing papers, getting a PhD, and perhaps doing data collection. The software base will consist of messy research code, each part of which is typically understood by only one person. There will be minimal documentation. A year or two after funding ends, the platform will stop working.

This outcome would minimally fulfil our obligations to ourselves and the funder. If the result was good enough that we were able to submit a proposal for some kind of follow-on project, it would minimally fulfil our obligations to our employer. It would, however, leave the content creators feeling betrayed and the content users disappointed.

3.2. Aim towards commercialisation
A second strategy would aim towards commercialising the project. This would require more careful extension of the system, done in a way that prioritised creation of a substantial and growing social network; the worth of the project would mostly depend on the size of the network. Critical activities would be to develop easily usable tools for content creation, gamify the site to make usage addictive, and ensure that hosting was scalable and reliable.
Code would need to be more carefully maintained and documented, and at some point would be moved into a private repository. The long-term goal would be to sell the project to whoever was willing to buy it, most likely a large multinational.

If this strategy succeeded, we would strongly fulfil our obligations to ourselves, the funder, and our employer. Content creators would, however, again feel betrayed; their unpaid work would have been exploited to make money for us and/or our employer. Whether content users considered that we had fulfilled our obligations to them would depend on the strategy followed by the project's new owners.

Figure 1: Planned CALLector architecture

Figure 2: Current CALLector architecture

Figure 3: Example of text shown by LARA: two paragraphs from Peter Rabbit marked up (left) after completing Peter Rabbit but nothing else, and (right) after reading both Peter Rabbit and the first three chapters of Alice in Wonderland.

Figure 4: Example of content produced by the current LARA prototype, showing personalised reading progress. The student has so far read Peter Rabbit followed by the first three chapters of Alice in Wonderland. The left-hand side shows the marked-up text, where the student has just clicked on the word "took". The right-hand side displays occurrences of different inflected forms of "take" in both source texts.

3.3. Aim towards viable open source project
A third strategy would aim towards creation of a viable open source project. Goals would overlap to some extent with those in the commercialisation scenario: in particular, central activities would again be to develop tools for content creation and to ensure stable hosting. Code would, however, be kept open source, and there would be a strong emphasis on long-term maintainability. This would require more
extensive documentation and code reviewing, in particular ensuring that multiple people were involved in maintaining each module.

A success here would strongly fulfil our obligations to ourselves, the funder, content creators and content users. It would at least weakly fulfil our obligations to our employer.

4. Options for creating an ethical project
The discussion in the preceding section and the case studies considered in the linked paper suggest that an open source strategy is strongly preferable on ethical grounds: it is the only one which respects the rights of the content creators and content users, who are essential to the success of the project. We now go on to consider how such an open source version of CALLector might be realised. We divide up the material under four main headings: decentralisation, ownership, third-party dependencies and long-term maintainability.

4.1. Decentralisation
In many online communities, centralised power is the core tool used to exploit the members. The community's founders maintain the servers on which the community resides. New members sign, usually without reading it, an EULA which gives the founders more or less absolute power over their activities. In particular, members can always be blocked or thrown out, and if this happens they have no recourse.

We propose the following remedy to this problem. Rather than requiring that all the activities of the social network be carried out on the founders' dedicated servers, we are instead developing tools which offer the option of being used on machines controlled by other members of the community. We consider specific issues concerned with developing, deploying and discussing content.

1. The development tools are used to author CALL courses and compile them into deployable content. These tools are open source, and configurable to run both on local machines and on a remote server. Users who do not wish to be bothered with the inconvenience of installing the tools will be able to run them on our servers; users who prioritise independence will be able to download them and run them locally.
2. Issues concerning deployment differ sharply between the two component platforms. LARA content, which takes the form of ordinary web pages, is unproblematic and can be straightforwardly uploaded to a web server. The nontrivial issues arise with regard to the interactive speech content produced by Regulus. We discuss these below in §4.3.
3. As far as discussing content goes (the social network part of the project), concrete development has not yet started. Our plan is to use a peer-to-peer architecture where information is not stored on a single server, but distributed across multiple servers (Yeung et al., 2009). Methods of this kind are explicitly designed to address the issues we discuss here.

4.2. Ownership
Many online communities make it a practice to require that the community founders gain full non-exclusive ownership of user-created content, in exchange for hosting said content. Since we are explicitly aiming not to place members under the obligation to rely on us to host their content, we conversely do not aim to acquire rights to this content; doing so would be likely to place us in an ethically dangerous position. We will, instead, take the default view that a content creator's content belongs to the creator and no one else. The model is that we are providing compilers and other development tools: a compiler supplier does not normally claim ownership of anything produced using their compiler.

We will, however, encourage content creators to release their content in open source form, under standard open source licenses like LGPL. The basic encouragement we offer is altruistic: if you make your content open source, other people can use and adapt it, and that makes you a more valued member of the community. (Pure altruism may well be reinforced by some kind of credit/kudos mechanism at the social network level.) Experience shows that many people find these kinds of reward motivating, and in practice we can expect many people to want to take part as open source developers.

4.3. Third-party dependencies
The content produced by LARA consists of ordinary web pages, and third-party dependencies are limited to those inevitable when using the internet in any form. For the speech-enabled content produced by Regulus, the issues are more complex, and different content creators will have different priorities. Since our whole enterprise is founded on the goodwill of the content creators, it seems to us that we should try to accommodate as broad a spectrum of content creator profiles as is feasible.

Based on preliminary discussions, we can see at least two types emerging. The first class is pragmatic in focus. Their main interest is in getting the content up and running reliably with as small an investment of effort as possible, and they also want to retain the option of monetising the content if demand reaches the point where this is feasible. For this kind of content creator, Amazon Alexa is an attractive deployment vehicle. Alexa offers high-quality hands-free far-field recognition on an automatically scalable platform. About a hundred million Alexa-enabled devices have already been sold, and Alexa support will soon be available on many new laptops, so an Alexa-deployed course will reach a wide audience. Alexa also makes it straightforward to monetise apps.

We also see a class of content creators who place a higher priority on ethical issues; some of these people have reservations about Amazon's ethical policies, and have expressed unhappiness about the idea of using Amazon products. Partly with this class of user in mind, and partly on the general principle that it is unwise to be reliant on a single third-party supplier, we are developing an alternative deployment platform for Regulus-derived content. This consists of a Django wrapper for the core runtime code which allows it to be run as a web service, together with a client which performs speech recognition using the Google Speech API. The problem, of course, is that the content creator is still reliant on a third-party supplier, which will now be Google rather than Amazon. Unfortunately, it is extremely challenging to create open source cloud-based recognition resources which can realistically compete with the ones offered by these large multinationals. But our architecture will at least make it easy to integrate other third-party recognisers as they emerge.

4.4. Long-term maintainability
It is obviously not possible for us at this stage to present a plan which guarantees the planned CALLector community's long-term maintainability. It is ethically reasonable, however, to demand that we provide evidence that the community has good prospects for being maintained if it grows to a size where there is a large enough user base.

Again, we see a clear difference between the two component platforms. LARA's codebase is small and simple enough that we expect many people could potentially maintain it. In addition, the distributed design means that it does not require a large server farm to operate, but can comfortably be spread over multiple independent servers, none of which would need to bear a heavy load. It seems to us quite reasonable that a usable free network could be run by a loose federation of server operators, perhaps mostly academic. Someone who wanted access to LARA would only need to find a convenient server near them that was prepared to host their data.

Some aspects of this proposed organisation carry over to the Regulus platform, but the same problem arises: until free open source speech recognition resources are available, the community requires access to a third-party commercial agent in order to function. The preconditions for long-term maintainability are thus less clear.

5. Other ethical issues
The focus of this paper has been on ethical issues having to do with our obligations to the various stakeholders in CALLector, but for completeness we briefly outline our position on two other topics related to the project: privacy and copyright.

5.1. Privacy
Privacy issues are important for the LARA platform, and enter in two ways. First, since the unique feature of LARA is precisely that it gives the user a text marked up on the basis of their own reading experience, it is a fortiori necessary for the platform to store that reading experience in some form. We are mindful of the fact that this is potentially sensitive information. At the moment, it is hard to believe that anyone will care if someone else discovers that they have read The Tale of Peter Rabbit or Alice in Wonderland. However, since the intention is to make it possible for content creators to upload a wide variety of texts, the situation may change in the future. For students living in countries which operate religious or politically motivated censorship policies, it is certainly conceivable that their online reading list may be information they need to keep private.

Our initial plan here is the obvious one: we will store information about reading histories on secure servers on the university cloud, we will not require users to give us any information which might be used to identify them, and we will provide access over HTTPS. Later in the project, we will make available the means to allow people to set up their own servers, which perform the relevant processing (constructing sets of concordance pages, etc.) on the local machine, keeping the reading history there as well. This will make it possible for parties who do not trust our security to run on machines under their physical control. The distributed nature of the LARA architecture (cf. remarks at the end of §2) means that personal servers can be made easy to install and run.

A second and less central set of privacy issues is related to logging of user activities in LARA. The current prototype uses static web pages, but we plan at a later stage to move to dynamic page generation. The primary motivation is efficiency, but this will also make it feasible to perform fine-grained logging of user activities at the level of accessing concordance pages, playing audio files and looking up translations. This kind of data could be of considerable scientific interest, but initial polling suggests to us that many users will be unhappy about such detailed recording of their actions. We will consequently only do this when users have clearly given informed consent.

We should add that we have been criticised on ethical/privacy grounds for encouraging children to use the Alexa CALL games we have developed. We do not think this criticism is well founded. First, millions of children all over the world are already using Alexa games which from the privacy point of view are similar to ours, and the number is rapidly increasing. Our decisions will have no influence on this trend. Second, and even more to the point, we are not aware of serious evidence that Alexa invades user privacy more than a multitude of other web and cellphone technologies. We have discussed the issue with Amazon engineers, who have convinced us that Amazon is telling the plain truth when they say that Alexa's listening capability is only switched on after the user speaks the "wake word". What makes their argument so plausible is that Alexa's performance would in fact be a great deal better if it listened all the time. At the moment, the problem is rather in the opposite direction: in order to reduce server bandwidth, Alexa drops out of the current app if the user fails to respond within a few seconds. This behaviour causes many problems at the user interaction level, and Amazon do not even provide a toggle that allows it to be switched off.

5.2. Copyright and IPR
Since LARA makes integral use of published texts, copyright issues naturally arise. Our position here is simple. As previously mentioned, we consider the LARA tools to have the status of open source compilers which we make available to the community. We allow anyone to use the tools as they wish, claim no intellectual property rights with regard to LARA documents which users may produce, and take no responsibility for any possible consequences. In particular, a user who processes text and posts it in LARA form takes responsibility for any relevant copyright issues. We will not take responsibility for checking the copyright status of LARA texts that may be posted on our own servers. We will instead use the common social network approach of making it easy to flag potentially offending texts. If texts are flagged as infringing copyright or being offensive (hate speech, pornography), we reserve the right to remove them. We will in normal cases warn the content creator in advance, giving them a grace period to correct or save the content.

6. Summary and conclusions
We have presented a brief overview of CALLector and outlined how we intend to progress it, focusing in particular on ethical aspects. There is a substantial difference between the two underlying platforms, LARA and Regulus. For LARA, we see a direct path towards an ethically attractive solution. The open source compiler is small and simple, and could feasibly be maintained by people other than ourselves. It is easy to use, and the content is trivial to deploy. The situation is far less clear with Regulus, where it currently seems difficult to deploy content without incurring a critical dependence on a large multinational, either Amazon or Google. Of course, speech-enabled content is becoming increasingly important, and speech-enabled CALL software is potentially very useful. We do not think it is right for us to make ethical decisions on behalf of our potential users, and intend to move forward on making both platforms available.

We note that so far we have seen much more interest in LARA. An interesting recent development, however, is that several users have asked us whether it would be feasible to merge the platforms, optionally providing speech recognition capabilities inside the LARA environment that could give students feedback when reading LARA text aloud. This opens up a whole new set of issues, which we look forward to exploring.

7. Acknowledgements
We would like to thank Branislav Bédi and Karën Fort for organising the enetCollect workshop at which this work was originally presented and for many useful and stimulating discussions. Johanna Gerlach has been extremely helpful in supporting the LiteDevTools platform, which is heavily used by both the Regulus/Alexa and LARA platforms. Many people have now been involved in developing content. We would particularly like to recognise the contributions made by Elham Akhlaghi (Farsi), Branislav Bédi (Icelandic), Matt Butterweck (German and Middle High German; also code development), Monica Depasquale (Latin), Pierre-Emmanuel Gallais (French), Junta Ikeda (Japanese), Annabel Keigwin (French and English) and Sabina Sestigiani (Italian).

8. Bibliographical References
Akhlaghi, E., Bedi, B., Chua, C., Habibi, H., and Rayner, M. (2019). LARA: A learning and reading assistant. Position paper. https://www.issco.unige.ch/en/research/projects/callector/LARA_position_paper.pdf.
Chua, C. and Rayner, M. (2019). What do the founders of online communities owe to their users? In Proceedings of the enetCollect WG3/WG5 workshop, Leiden, Holland.
Reader, G. (2013). Off-Topic: The Story of an Internet Revolt. https://www.goodreads.com/ebooks/download/18749172-off-topic.
Tsourakis, N., Rayner, M., Gallais, P.-E., Habibi, H., Chua, C., and Butterweck, M. (2019). Alexa as a CALL platform for children: where do we start? In Proceedings of the enetCollect WG3/WG5 workshop, Leiden, Holland.
Yeung, C.-m. A., Liccardi, I., Lu, K., Seneviratne, O., and Berners-Lee, T. (2009). Decentralization: The future of online social networking. In W3C Workshop on the Future of Social Networking Position Papers, volume 2, pages 2–7.