=Paper= {{Paper |id=Vol-2390/PaperC3 |storemode=property |title=Decentralising Power: How We are Trying to Keep CALLector Ethical |pdfUrl=https://ceur-ws.org/Vol-2390/PaperC3.pdf |volume=Vol-2390 |authors=Cathy Chua,Manny Rayner,Hanieh Habibi,Nikos Tsourakis }} ==Decentralising Power: How We are Trying to Keep CALLector Ethical== https://ceur-ws.org/Vol-2390/PaperC3.pdf
Decentralising Power: How We are Trying to Keep CALLector Ethical

Cathy Chua¹, Hanieh Habibi², Manny Rayner², Nikos Tsourakis²
¹ Independent researcher
² Geneva University
cathyc@pioneerbooks.com.au
{Hanieh.Habibi,Emmanuel.Rayner,Nikolaos.Tsourakis}@unige.ch

                                                              Abstract
We present a brief overview of the CALLector project, and consider ethical questions arising from its overall goal of creating a social network to support the creation and use of online CALL resources. We argue that these questions are best addressed in a decentralised, pluralistic open source architecture.

Keywords: online communities, social networks, education


1. Introduction

This paper follows on from another paper presented at the same workshop (Chua and Rayner, 2019), where we present case studies from two large internet communities. Here, we consider the implications for our own project, CALLector. As outlined in the previous paper, the agents we are most worried about controlling are ourselves. If CALLector turns out to be successful, and the opportunity presents itself, experience shows that there is a strong temptation to act unethically.

It is easy to say that we don’t need to be concerned about these issues. Few online communities turn into large success stories, and if ours does, we will be delighted. This is lazy and self-deceptive thinking. Basically, it amounts to saying that we would be happy to defraud our members, given the chance, and that, since it probably will not happen, there is no need to worry. The argument does not pass the Kantian test: if everyone thinks this way, then we can be sure that all online communities will exploit and defraud their members. It seems a simple conclusion that our ethical obligation is to develop the network in such a way that we are not in fact planning to exploit our members, even if we are fortunate enough to be given the opportunity. It is worth mentioning explicitly that, at least for some of the authors, the ideas we present here are the fruit of direct and painful experience. Two of us (Chua, Rayner) have been active members of the Goodreads online reviewing community¹ since shortly after its inception. When Goodreads was launched, the founders made many extravagant promises designed to attract new members, in particular that the site would not employ any form of censorship. A few months after the site was sold to Amazon in March 2013, policy abruptly changed, with widespread and arbitrary censorship introduced at zero notice. We were heavily involved in the fightback by the user community and in particular helped create the book which became a focal activity for the protesters (Reader, 2013).

In the rest of the paper, we explore the general considerations outlined above in the specific context of the CALLector project. We begin by giving a brief overview of the CALLector project’s general goals and the state of play after the first year of development. We list what we see as our main ethical obligations, and outline three plausible ways in which the project could be continued. Finally, we present a brief sketch of the specific solutions we are aiming towards.

2. Overview of CALLector²

CALLector is a project funded by the Swiss National Science Foundation and based at Geneva University; it officially started on April 1 2018, and is scheduled to run until December 31 2021. Its overall goal is to create a social network designed to support users who wish to create, use and share online CALL content. The key word here is “create”: we want it to be possible for users to create their own content, using suitable tools. The abstract question we wish to investigate is whether the well-documented rewards associated with working within a social network (basically, various kinds of positive social feedback from the other members) can motivate people to create large amounts of useful content. This model has worked well for online communities centred around, for example, book reviews (Goodreads), cartographic data (OpenStreetMap³) and knitting patterns (Ravelry⁴), so maybe it will also work for CALL.

¹ https://www.goodreads.com/. As of April 2019, the site claims to have over 85 million members.
² https://www.unige.ch/callector/
³ https://www.openstreetmap.org
⁴ https://www.ravelry.com

EnetCollect WG3&WG5 Meeting, 24-25 October 2018, Leiden, Netherlands    49

The global architecture is shown in Figure 1, and consists of three layers. In the middle, we have the content. Underneath, we have the platforms that deploy the content. At the top, we have the social network, which indexes the content and allows users to perform functions like rating, commenting, recommending and so on. Figure 2 shows the current state of instantiation of this architecture. The social network level does not yet exist: work is just starting now in April 2019. We have two deployment platforms, Regulus/Alexa and LARA, and some initial content. Work to date on the Regulus/Alexa platform is described in our other companion paper in these proceedings (Tsourakis et al., 2019). Very briefly, the platform allows construction of interactive CALL games that can be deployed on Amazon Alexa devices like the Echo. Games are written in a
spreadsheet form similar to Alexa’s “blueprints”⁵, but oriented towards the requirements of CALL. We have so far constructed about twenty sample games, and have a couple of external content-creators whom we are informally supporting.

LARA (Learning And Reading Assistant), which will be described at greater length elsewhere, is a platform we started developing in Q3 2018⁶. As the name suggests, the goal is to provide assistance to students who are improving their competence in the L2 through reading. LARA processes text into a marked-up form which offers two basic kinds of support. First, each sentence is linked to a recorded audio file, so the student can always find out what the text they are reading sounds like. Second, and more unusually, there is a personalised hyperlinked concordance which shows the student where each word has previously occurred in their own reading experience. Figure 3 illustrates LARA’s core functionality using a short passage from The Tale of Peter Rabbit. The marked-up version of Peter Rabbit is shown at two points in the learner’s reading progress: on the left, where the learner has read the whole text and nothing else, and on the right, where they have read both Peter Rabbit and the first three chapters of Alice in Wonderland⁷. Colours are used to indicate how many times each lemma has occurred in the texts the learner has read so far; words in black have occurred more than five times, words in red only once, while blue and green show intermediate values. As the picture shows, the colours effectively track the learner’s increased exposure to vocabulary between the two snapshots. In the first snapshot, only function words and a few content words central to the story appear in black; in the second, many of the words marked in red have turned black or blue, indicating that they have been read several times during the intervening period. The text has been manually divided into segments and recorded in audio form. A loudspeaker icon marks the end of each segment, and the learner can listen to the segment in question by clicking on the icon. The learner can click on any word and get a personalised concordance page containing up to ten segments where the word appears (Figure 4).

LARA is designed on the assumption that content will in general be distributed over multiple servers on the web, with a master file that associates resource identifiers with URLs. When constructing the personalised concordance pages, the compiler downloads as little as possible, leaving the bulky multimedia files on the remote servers and inserting links to them where necessary. There is thus no need to have all the user data on one server either.

⁵ https://blueprints.amazon.com
⁶ The state of play of the project as of late February 2019 is summarised in our informal position paper (Akhlaghi et al., 2019). Examples of LARA content are posted at https://www.unige.ch/callector/lara-content/.
⁷ The two versions can be found online at https://www.issco.unige.ch/en/research/projects/peter_rabbitvocabpages/_hyperlinked_text_.html and https://www.issco.unige.ch/en/research/projects/callector/reader1_englishvocabpages/_hyperlinked_text_.html.

3. Ethical issues

We now move on to ethical issues; we begin by listing the people and organisations to whom we have obligations. We consider these to be i) ourselves, ii) the project funder (the Swiss National Science Foundation), iii) our employer (Geneva University), iv) the network’s content creators (under “content creators”, we also include external people who contribute to the system-level architecture), and v) the network’s content users. In our capacity as the people responsible for carrying out the project, we consider our main obligations to these agents to be roughly the following.

Ourselves: Keep our jobs. Publish. One of us needs to complete a PhD.
Funder: Carry out the project as described in the proposal. Publish.
Employer: Publish. Attempt to leverage the results of the project to bring in more funding.
Content creators: Provide stable, maintainable hosting for content. Provide stable, maintainable hosting for user data. Respect the content creators’ IP rights. Respect the content creators’ rights as members of the social network.
Content users: Provide stable, maintainable hosting for content. Provide stable, maintainable hosting for user data. Respect the content users’ rights as members of the social network.

Based on the above, we consider three generic strategies for continuing the project, and assess to what extent each would fulfil our ethical obligations.

3.1. Default academic project

If we take no special steps, we expect the project to develop roughly as follows. We will keep on haphazardly extending the system in order to support short-term activities, primarily writing papers, getting a PhD, and perhaps doing data collection. The software base will consist of messy research code, each part of which is typically understood by only one person. There will be minimal documentation. A year or two after funding ends, the platform will stop working.

This outcome would minimally fulfil our obligations to ourselves and the funder. If the result were good enough that we were able to submit a proposal for some kind of follow-on project, it would minimally fulfil our obligations to our employer. It would, however, leave the content creators feeling betrayed and the content users disappointed.

3.2. Aim towards commercialisation

A second strategy would aim towards commercialising the project. This would require more careful extension of the system, done in a way that prioritised creation of a substantial and growing social network; the worth of the project would mostly depend on the size of the network. Critical activities would be to develop easily usable tools for content creation, gamify the site to make usage addictive, and ensure that hosting was scalable and reliable. Code would
need to be more carefully maintained and documented, and at some point would be moved into a private repository. The long-term goal would be to sell the project to whoever was willing to buy it, most likely a large multinational.

If this strategy succeeded, we would strongly fulfil our obligations to ourselves, the funder, and our employer. Content creators would, however, again feel betrayed; their unpaid work would have been exploited to make money for us and/or our employer. Whether content users considered that we had fulfilled our obligations to them would depend on the strategy followed by the project’s new owners.

Figure 1: Planned CALLector architecture

Figure 2: Current CALLector architecture

Figure 3: Example of text shown by LARA: two paragraphs from Peter Rabbit marked up (left) after completing Peter Rabbit but nothing else, and (right) after reading both Peter Rabbit and the first three chapters of Alice in Wonderland.

3.3. Aim towards viable open source project

A third strategy would aim towards creation of a viable open source project. Goals would overlap to some extent with those in the commercialisation scenario: in particular, central activities would again be to develop tools for content creation and to ensure stable hosting. Code would, however, be kept open source, and there would be a strong emphasis on long-term maintainability. This would require more
extensive documentation and code reviewing, in particular ensuring that multiple people were involved in maintaining each module.

A success here would strongly fulfil our obligations to ourselves, the funder, content creators and content users. It would at least weakly fulfil our obligations to our employer.

Figure 4: Example of content produced by the current LARA prototype, showing personalised reading progress. The student has so far read Peter Rabbit followed by the first three chapters of Alice in Wonderland. The left-hand side shows the marked-up text, where the student has just clicked on the word “took”. The right-hand side displays occurrences of different inflected forms of “take” in both source texts.

4. Options for creating an ethical project

The discussion in the preceding section and the case studies considered in the companion paper suggest that an open source strategy is strongly preferable on ethical grounds: it is the only one which respects the rights of the content creators and content users, who are essential to the success of the project. We now go on to consider how such an open source version of CALLector might be realised. We divide up the material under four main headings: decentralisation, ownership, third-party resources and long-term maintainability.

4.1. Decentralisation

In many online communities, centralised power is the core tool used to exploit the members. The community’s founders maintain the servers on which the community resides. New members sign, usually without reading it, an EULA which gives the founders more or less absolute power over their activities. In particular, members can always be blocked or thrown out, and if this happens they have no recourse.

We propose the following remedy to this problem. Rather than requiring that all the activities of the social network be carried out on the founders’ dedicated servers, we are instead developing tools which offer the option of being used on machines controlled by other members of the community. We consider specific issues concerned with developing, deploying and discussing content.

  1. The development tools are used to author CALL courses and compile them into deployable content. These tools are open source, and configurable to run both on local machines and on a remote server. Users who do not wish to be bothered with the inconvenience of installing the tools will be able to run them on our servers; users who prioritise independence will be able to download them and run them locally.
  2. Issues concerning deployment differ sharply between the two component platforms. LARA content, which takes the form of ordinary web pages, is unproblematic and can be straightforwardly uploaded to a webserver. The nontrivial issues arise with regard to the interactive speech content produced by Regulus. We discuss these below in §4.3.
  3. As far as discussing content goes (the social network part of the project), concrete development has not yet started. Our plan is to use a peer-to-peer architecture where information is not stored on a single server, but distributed across multiple servers (Yeung et al., 2009). Methods of this kind are explicitly designed to address the issues we discuss here.

4.2. Ownership

Many online communities make it a practice to require that the community founders gain full non-exclusive ownership of user-created content, in exchange for hosting said content. Since we are explicitly aiming not to place members under the obligation to rely on us to host their content, we conversely do not aim to acquire rights to this content; holding such rights would be likely to place us in an ethically dangerous position. We will, instead, take the default view that a content creator’s content belongs to the creator and no one else. The model is that we are providing compilers and other development tools: a compiler supplier does not normally claim ownership of anything produced using their compiler.

We will, however, encourage content creators to release their content in open source form, under standard open source
licenses like LGPL. The basic encouragement we offer is altruistic: if you make your content open source, other people can use and adapt it, and that makes you a more valued member of the community. (Pure altruism may well be reinforced by some kind of credit/kudos mechanism at the social network level.) Experience shows that many people find these kinds of reward motivating, and in practice we can expect many people to want to take part as open source developers.

4.3. Third-party dependencies

The content produced by LARA consists of ordinary web pages, and third-party dependencies are limited to those inevitable when using the internet in any form. For the speech-enabled content produced by Regulus, the issues are more complex, and different content creators will have different priorities. Since our whole enterprise is founded on the goodwill of the content creators, it seems to us that we should try to accommodate as broad a spectrum of content creator profiles as is feasible.

Based on preliminary discussions, we can see at least two types of content creator emerging. The first class is pragmatic in focus. Their main interest is in getting the content up and running reliably with as small an investment of effort as possible, and they also want to retain the option of monetising the content if demand reaches the point where this is feasible. For this kind of content creator, Amazon Alexa is an attractive deployment vehicle. Alexa offers high-quality hands-free far-field recognition on an automatically scalable platform. About a hundred million Alexa-enabled devices have already been sold, and Alexa support will soon be available on many new laptops, so an Alexa-deployed course can reach a wide audience. Alexa also makes it straightforward to monetise apps.

We also see a class of content creators who place a higher priority on ethical issues; some of these people have reservations about Amazon’s ethical policies, and have expressed unhappiness about the idea of using Amazon products. Partly with this class of user in mind, and partly on the general principle that it is unwise to be reliant on a single third-party supplier, we are developing an alternative deployment platform for Regulus-derived content. This consists of a Django wrapper for the core runtime code which allows it to be run as a web service, together with a client which performs speech recognition using the Google Speech API. The problem, of course, is that the content creator is still reliant on a third-party supplier, which will now be Google rather than Amazon; unfortunately, it is extremely challenging to create open source cloud-based recognition resources which can realistically compete with the ones offered by these large multinationals. But our architecture will at least make it easy to integrate other third-party recognisers as they emerge.

4.4. Long-term maintainability

It is obviously not possible for us at this stage to present a plan which guarantees the planned CALLector community’s long-term maintainability. It is ethically reasonable, however, to demand that we provide evidence that the community has good prospects for being maintained if it grows to a size where there is a large enough user base.

Again, we see a clear difference between the two component platforms. LARA’s codebase is small and simple enough that we expect many people could potentially maintain it. In addition, the distributed design means that it does not require a large server farm to operate, but can comfortably be spread over multiple independent servers, none of which would need to bear a heavy load. It seems to us quite reasonable that a usable free network could be run by a loose federation of server operators, perhaps mostly academic. Someone who wanted access to LARA would only need to find a convenient server near them that was prepared to host their data.

Some aspects of this proposed organisation carry over to the Regulus platform, but the problem discussed in §4.3 arises here too: until free open source speech recognition resources are available, the community requires access to a third-party commercial agent in order to function. The preconditions for long-term maintainability are thus less clear.

5. Other ethical issues

The focus of this paper has been on ethical issues having to do with our obligations to the various stakeholders in CALLector, but for completeness we briefly outline our position on two other topics related to the project: privacy and copyright.

5.1. Privacy

Privacy issues are important for the LARA platform, and enter in two ways. First, since the unique feature of LARA is precisely that it gives the user a text marked up on the basis of their own reading experience, it is a fortiori necessary for the platform to store that reading experience in some form. We are mindful of the fact that this is potentially sensitive information. At the moment, it is hard to believe that anyone will care if someone else discovers that they have read The Tale of Peter Rabbit or Alice in Wonderland. Since the intention is to make it possible for content creators to upload a wide variety of texts, the situation may, however, change in the future. For students living in countries which operate religious or politically motivated censorship policies, it is certainly conceivable that their online reading list may be information they need to keep private. Our initial plan here is the obvious one: we will store information about reading histories on secure servers on the university cloud, we will not require users to give us any information which might be used to identify them, and we will provide access over HTTPS. Later in the project, we will make available the means to allow people to set up their own servers, which perform the relevant processing (constructing sets of concordance pages, etc.) on the local machine, keeping the reading history there as well; this will make it possible for parties who do not trust our security to run on machines under their physical control. The distributed nature of the LARA architecture (cf. remarks at the end of §2) means that personal servers can be made easy to install and run.

A second and less central set of privacy issues is related to logging of user activities in LARA. The current prototype uses static web pages, but we plan at a later stage to
move to dynamic page generation. The primary motivation is efficiency, but this will also make it feasible to perform fine-grained logging of user activities at the level of accessing concordance pages, playing audio files and looking up translations. This kind of data could be of considerable scientific interest, but initial polling suggests to us that many users will be unhappy about such detailed recording of their actions. We will consequently only do this when users have clearly given informed consent.

We should add that we have been criticised on ethical/privacy grounds for encouraging children to use the Alexa CALL games we have developed. We do not think this criticism is well founded. First, millions of children all over the world are already using Alexa games which from the privacy point of view are similar to ours, and the number is rapidly increasing. Our decisions will have no influence on this trend. Second, and even more to the point, we are not aware of serious evidence that suggests Alexa is invading user privacy more than a multitude of other web and cellphone technologies. We have discussed the issue with Amazon engineers, who have convinced us that Amazon is telling the plain truth when they say that Alexa’s listening capability is only switched on after the user speaks the “wake word”. The thing that makes their argument so plausible is that Alexa’s performance would in fact be a great deal better if it listened all the time. At the moment, the problem is rather in the opposite direction: in order to reduce server bandwidth, Alexa drops out of the current app if the user fails to respond within a few seconds. This behaviour causes many problems at the user interaction level, and Amazon do not even provide a toggle that allows it to be switched off.

5.2. Copyright and IPR

Since LARA makes integral use of published texts, copyright issues naturally arise. Our position here is simple. As previously mentioned, we consider the LARA tools to have the status of open source compilers which we make available to the community. We allow anyone to use the tools as they wish, claim no intellectual property rights with regard to LARA documents which users may produce, and take no responsibility for any possible consequences. In particular, a user who processes text and posts it in LARA form takes responsibility for any relevant copyright issues. We will not take responsibility for checking the copyright status of LARA texts that may be posted on our own servers, but will rather use the common social network approach of making it easy to flag potentially offending texts. If texts are flagged as infringing copyright or being offensive (hate speech, pornography), we reserve the right to remove them. We will in normal cases warn the content creator in advance, giving them a grace period to correct or save the content.

6. Summary and conclusions

We have presented a brief overview of CALLector and outlined how we intend to progress it, focusing in particular on ethical aspects. There is a substantial difference between the two underlying platforms, LARA and Regulus. For LARA, we see a direct path towards an ethically attractive solution. The open source compiler is small and simple, and could feasibly be maintained by people other than ourselves. It is easy to use, and the content is trivial to deploy. The situation is far less clear with Regulus, where it currently seems difficult to deploy content without incurring a critical dependence on a large multinational, either Amazon or Google. Of course, speech-enabled content is becoming increasingly important and speech-enabled CALL software is potentially very useful. We do not think it is right for us to make ethical decisions on behalf of our potential users, and intend to move forward on making both platforms available.

We note that so far we have seen much more interest in LARA. An interesting recent development, however, is that several users have asked us whether it would be feasible to merge the platforms, optionally providing speech recognition capabilities inside the LARA environment that could give students feedback when reading LARA text aloud. This opens up a whole new set of issues, which we look forward to exploring.

7. Acknowledgements

We would like to thank Branislav Bédi and Karën Fort for organising the enetCollect workshop at which this work was originally presented and for many useful and stimulating discussions. Johanna Gerlach has been extremely helpful in supporting the LiteDevTools platform, which is heavily used by both the Regulus/Alexa and LARA platforms. Many people have now been involved in developing content. We would particularly like to recognise the contributions made by Elham Akhlaghi (Farsi), Branislav Bédi (Icelandic), Matt Butterweck (German and Middle High German; also code development), Monica Depasquale (Latin), Pierre-Emmanuel Gallais (French), Junta Ikeda (Japanese), Annabel Keigwin (French and English) and Sabina Sestigiani (Italian).

8. Bibliographical References

Akhlaghi, E., Bédi, B., Chua, C., Habibi, H., and Rayner, M. (2019). LARA: A learning and reading assistant. Position paper. https://www.issco.unige.ch/en/research/projects/callector/LARA_position_paper.pdf.
Chua, C. and Rayner, M. (2019). What do the founders of online communities owe to their users? In Proceedings of the enetCollect WG3/WG5 workshop, Leiden, Holland.
Reader, G. (2013). Off-Topic: The Story of an Internet Revolt. https://www.goodreads.com/ebooks/download/18749172-off-topic.
Tsourakis, N., Rayner, M., Gallais, P.-E., Habibi, H., Chua, C., and Butterweck, M. (2019). Alexa as a CALL platform for children: where do we start? In Proceedings of the enetCollect WG3/WG5 workshop, Leiden, Holland.
Yeung, C.-m. A., Liccardi, I., Lu, K., Seneviratne, O., and Berners-Lee, T. (2009). Decentralization: The future of online social networking. In W3C Workshop on the Future of Social Networking Position Papers, volume 2, pages 2–7.