<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Bayesian Framework for Reputation in Citizen Science</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Joan Garriga</string-name>
          <email>jgarriga@ceab.csic.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaume Piera</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frederic Bartumeus</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre d'Estudis Avancats de Blanes (CEAB-CSIC)</institution>
          ,
          <addr-line>Carrer Acces Cala Sant Francesc 14, 17300, Girona</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Centre de Recerca Ecologica i Aplicacions Forestals (CREAF)</institution>
          ,
          <addr-line>Carrer de les Cases Sert 54, 08193, Cerdanyola del Valles, Barcelona</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institucio Catalana de Recerca i Estudis Avancats (ICREA)</institution>
          ,
          <addr-line>Passeig de Lluís Companys 23, 08010, Barcelona</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Institut de Ciencies del Mar (ICM-CSIC)</institution>
          ,
          <addr-line>Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The viability of any Citizen Science (CS) research program is strongly conditioned on the engagement of the citizens. In a CS framework in which participants are expected to perform actions that can later be validated, the incorporation of a reputation system can be a successful strategy to increase the overall data quality and the likelihood of engagement, and also to evaluate how closely citizens fulfill the goals of the CS research program. Under the assumption that participant actions are validated using a simple discrete rating system, current reputation models, thoroughly applied in e-platform services, can easily be adapted for use in CS frameworks. However, current reputation models implicitly assume that rated items and scored agents are the same entity, and this does not necessarily hold in a CS framework, where one may want to rate actions but score the participants generating them. We present a simple approach based on a Bayesian network representing the flow described above (user, action, validation), where participants are aggregated in a discrete set of user classes and we use the global evidence in the database to estimate both the prior and the posterior distribution of the user classes. Afterwards, we evaluate the expertise of each participant by computing the user-class likelihood of the sequence of actions/validations observed for that user. As a proof of concept we implement our model in a real CS case, namely the Mosquito Alert project.</p>
      </abstract>
      <kwd-group>
        <kwd>citizen science</kwd>
        <kwd>reputation system</kwd>
        <kwd>Bayesian network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Since its origins, back in the mid-90's, citizen science (CS) has been questioned
by the scientific community as an adequate scientific methodology [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Pros and
cons aside, a basic principle to bring citizens and scientists into a productive
relationship is to match the public understanding of science with the science's
understanding of the public [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. To this end, modern citizen science is rethinking
methods for citizen engagement [
        <xref ref-type="bibr" rid="ref1 ref3">3, 1</xref>
        ]. Key concepts in participant engagement
are connection and reward. Connection refers to connecting the scientific goals
of the CS research program with the citizen's perception of a social worry or
interest (the basic motivation to start cooperating). Reward refers to providing
feedback that can be neatly perceived as a reward (the basic motivation to keep
cooperating). Nevertheless, it is well known by psychologists [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that the effect
of a reward resides in its expectation and vanishes as soon as it is achieved.
Thus, in order to increase the likelihood of participation in the long run, it
is necessary to generate continuous reward expectations. A successful strategy
to achieve sustained participation requires the implementation of a reputation
system as a core component of any CS research program. In addition,
well-grounded reputation systems provide back-end information about participants that
is valuable to augment data quality and to increase the fitness for use [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] of CS
research programs.
      </p>
      <p>
        Reputation is a broad concept, suitable not only for people but also for many
kinds of things or services [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Extending the notion given in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], reputation
is the perception that an agent (or item) creates through past actions about its
intentions, norms, knowledge, expertise or value. Reputation can be seen as an
asset, not only to promote oneself, but also to help us make sound judgments
in the absence of any better information. However, reputation is highly
contextual, and what works well in a specific context may fail in many others.
As a consequence, details about reputation systems are profusely treated in the
literature [
        <xref ref-type="bibr" rid="ref11 ref14 ref2 ref6 ref8">2, 8, 6, 11, 14</xref>
        ]. The simplest reputation systems scale down to a
ranking/voting system where information is aggregated into a single score used to
qualify and sort items (e.g. songs in iTunes, users in Stackoverflow). Systems for
collecting and representing reputation information are usually based on simple
rating mechanisms such as thumbs-up/down or a five-star rating. The difficulties
arise at the time of aggregating this information.
      </p>
      <p>Many rating aggregation systems recently proposed (e.g. Amazon, iTunes,
YouTube) are different forms of Bayesian Rating (BR), a pseudo-average
computed as a weighted combination of the average rating of a particular item and
the average rating for all items. Consider a k-way rating system (i.e. with k discrete
rating levels r ∈ {1, …, k}), with a total of m ratings and an overall average rating
r̄(all) = (1/m) Σ_{j=1}^{m} r_j. The BR of an item y with n ratings and an average
rating r̄(y) = (1/n) Σ_{j=1}^{n} r(y)_j is given by,</p>
      <p>BR(y) = [n r̄(y) + m r̄(all)] / (m + n) = w r̄(y) + (1 − w) r̄(all)   (1)</p>
      <p>with w = n / (n + m). A clear benefit of BR is that an item with only a few
ratings (i.e. w → 0) will approach the overall mean rating, and hence does not
receive the lowest (unfair and discouraging) rate but the average rate, while the
more the item is rated (i.e. n ≫ 0) the larger the weight of its own average
rating. In any case m ≫ n, hence the scoring is focused on the quality of the
ratings rather than on the quantity of ratings.</p>
      <p>
        The Beta reputation system [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] (binomial) and the Dirichlet reputation
system [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] (DR), the multinomial generalization of the former, are reputation
models based on a sound statistical machinery that grounds the Bayesian rating
concept and frames it in a truly Bayesian perspective. Consider a k-way rating
system and let the rating levels be indexed by i (i.e. 1 ≤ i ≤ k). Let n(y)_i be
the rating counts for item y (the observed evidence), and let a_i be a base rate
expressing the biases in our prior belief about each rating level. A Dirichlet (or
Beta for k = 2) rating yields a multinomial probability distribution S(y)_i over
the k rating levels, where the expectation value for each rating level is computed
as,
      </p>
      <p>S(y)_i = [n(y)_i + C a_i] / [C + n(y)]   (2)</p>
      <p>
        where n(y) = Σ_{i=1}^{k} n(y)_i and C is an a priori constant that can be set to
C = k if we consider a uniform prior (i.e. a_i = 1/k). In this case Equation 2
defaults to the classical Laplace smoothing [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The larger the value of C with
respect to k, the less the influence of the observed ratings and the more S(y)_i
will approach the base rate a_i. Assuming the k rating levels evenly distributed
in the range [0, 1], a point estimate reputation score is computed as,
      </p>
      <p>DR(y) = Σ_{i=1}^{k} (i/k) S(y)_i   (3)</p>
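As a minimal sketch (not the implementation of any of the cited platforms; all ratings and counts below are made up for illustration), Equations 1–3 can be written in Python as:

```python
def bayesian_rating(item_ratings, all_ratings):
    """Bayesian Rating, Eq. 1: weighted blend of the item's own average
    rating and the overall average rating over all items."""
    n, m = len(item_ratings), len(all_ratings)
    r_item = sum(item_ratings) / n
    r_all = sum(all_ratings) / m
    w = n / (n + m)
    return w * r_item + (1 - w) * r_all

def dirichlet_reputation(counts, base_rates, C=None):
    """Dirichlet point-estimate score, Eqs. 2-3, with the k rating levels
    evenly placed at i/k on [0, 1]. counts[i] are ratings at level i+1."""
    k = len(counts)
    if C is None:
        C = k                     # uniform prior a_i = 1/k (Laplace smoothing)
    n = sum(counts)
    S = [(counts[i] + C * base_rates[i]) / (C + n) for i in range(k)]
    return sum((i + 1) / k * S[i] for i in range(k))

# An item with only two (top) ratings stays close to the overall mean:
all_r = [3, 4, 5, 2, 4, 3, 5, 4, 1, 3]
print(bayesian_rating([5, 5], all_r))                  # pulled towards 3.4
print(dirichlet_reputation([0, 1, 2, 5, 4], [0.2] * 5))
```

Note how the few-ratings case lands near the overall mean rating, exactly the fairness property discussed above.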
      <p>
        Multinomial form aside, the similarity with BR (Eq. 1) is clear. But the
difference cannot be overlooked: while the weighting in Equation 1 emerges from
a purely frequentist perspective, in a DR the factor C a_i can convey specific a
priori information provided by domain experts or any other external source [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Agents (and in particular human agents) may change their behaviour over
time. This issue is usually approached either by incorporating a cutoff factor
that limits the series of ratings to the most recent ones, up to a given period or
a given number, or by introducing a longevity factor that assigns a time-relative
weight to ratings [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. An additional concern in reputation systems for e-service
platforms is their resistance against typical strategies for reputation cheating (e.g.
whitewashing, sybil attacks, fraudulent ranking) reviewed in e.g. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        CS research programs constitute a different scenario where the aim is not
to promote user interaction but to collect useful data for their scientific goals.
Hence, reputation issues do not arise from peer-to-peer interaction but from the
need to increase citizen engagement and data quality. However, a systematic
review of 234 CS research programs presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] reveals that despite this
general concern about data quality, very little has been done in terms of participants'
reputation. Data validation is usually performed by a core of domain experts or
project coordinators, eventually assisted by automated methods or with some
level of intra-community interaction (e.g. eBird, Galaxy Zoo, iSpot) or more
broadly via crowd-sourcing (e.g. www.crowdcrafting.org). In a few cases, local
coordinators take into account the participants' experience for validating data
(e.g. Common Bird Monitoring, Weather Observations Website), and just in a
handful of them it is the community of participants itself who directly validates
data (e.g. Galaxy Zoo, iSpot, oldWeather). Among the latter, Notes from Nature
and iSpot [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] are the ones going furthest in terms of community-based validation
and participants' reputation, implementing simple agreement-based algorithms
to rank participants and assign digital badges in recognition of specific
achievements. Community-based validation explicitly requires a core reputation system
integrated with the CS research program. However, there is no general
approach, and each research program implements reputation in a functional way to
fit its needs, neither framing its system in a general conceptual framework, nor
making it available to the scientific community.
      </p>
      <p>Notably, an implicit assumption in any of the reputation models above is that
the agent (or item) being rated is the one that is scored and, more explicitly,
that the rating system used to collect ratings for an agent is the rating system
used to score that agent (e.g. Equation 2). This apparently obvious and
irrelevant assumption might become more subtle in a CS framework. CS research
programs expect participants to perform a set of actions (basically, reporting
information in specific formats) and these actions are later on validated. In this
case, the rated items are the actions, but the scored agents are the participants.
Importantly, each kind of action might require its own discrete rating system
(not necessarily coincident in the number of levels). Yet the expertise of
participants might be expressed based on a specific set of user classes (with its own
number of levels), and scored based on the ratings of all their possibly different
actions. A straightforward way to overcome this problem is to compute separate
scores for each type of action and get an overall score using a weighted
combination of the former. Alternatively, we propose a novel model for user reputation
based on a Bayesian network describing the characteristic flow of CS research
programs (i.e. user, action, validation). The proposed method (i) decouples
action rating from participant scoring, (ii) provides a unified framework to process
validation information regardless of the rating levels used for each kind of action,
(iii) accounts for a good balance of both quality and quantity of evidence, and
(iv) is more responsive to participants' actions, which may augment engagement
dynamics.</p>
    </sec>
    <sec id="sec-2">
      <title>Mosquito Alert: CS for public health</title>
      <p>Mosquito Alert (MA) is a CS research program initially devised to monitor the
expansion in Spain of the Asian tiger mosquito (Aedes albopictus), a disease-carrying
mosquito. Since the expansion of the Zika virus threat in 2016, MA also
includes the monitoring of the yellow fever mosquito (Aedes aegypti). Both species
are distributed worldwide, live in urban environments, and are especially
invasive and aggressive vectors of tropical infectious diseases such as Dengue,
Chikungunya, Yellow Fever, and Zika.</p>
      <p>
        Aside from its scientific goals (e.g. unveiling the population expansion
mechanisms, forecasting vector and disease threats), a particular challenge for MA
arises from its impact on the public health domain. MA aims to provide
reliable early-warning information (in recently colonised areas) and real-time
management information (in areas where the mosquito resides) to public health
administrators. Public health administrations at different organizational levels in the
territory use MA to improve their surveillance and control programs with the
goal of decreasing mosquito populations, especially in urban areas. Because of all
this, MA is designed as a multi-platform system structured as follows:
1. The MA smartphone app (freely available for Android and iOS), by means
of which citizens can send reports of observations of mosquitoes (and their
breeding sites) potentially belonging to disease vector species (namely the
Asian tiger and the yellow fever mosquito).
2. The corresponding server-side functionality (Django, SQL) managing the
reception and storage of data, along with an ever-evolving set of tools for the
management and analysis of the data, including machine-learning algorithms
to help automate the validation of information [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
3. A private platform called Ento Lab. This is a restricted-access service through
which a set of experts can make a prior filtering of inappropriate reports
and classify the rest as either positive or negative ones. Only classified reports
are afterwards made visible to the rest of the services.
4. Another private platform named Managers Portal, which grants
on-demand access to stakeholders (e.g. public health administrations, mosquito control
services, private mosquito control companies), open GIS tools to visualize all
the available data in the portal (including their own imported management
data), and the possibility to directly communicate control actions through the
app.
5. A public platform, http://www.mosquitoalert.com, providing data and
visualization tools to the public via interactive maps, where participants can
find their individual contributions validated by the experts (Figure 1, right).
      </p>
      <p>The direct involvement of public health institutions makes citizens truly
conscious of the usefulness of their contributions (much beyond science). Triggering
mosquito control actions in the territory through MA participation constitutes
the necessary reward to keep citizens engaged in the research program.</p>
      <p>The work described in this paper is based on data corresponding to the last
two years (2015-2016) of MA, summarized in Table 1, with more than 30000 app
downloads, 2993 active users and 5349 reports submitted. Reports are of type
adult (4177), corresponding to observations of adult mosquitoes (either Asian
tiger or yellow fever), or bSite (1172), corresponding to potential mosquito
breeding sites. Reports of type adult and bSite are reviewed by experts who
manually label them. Reports of type adult are labelled as: −2, definitely not
corresponding to the species of interest; −1, probably not corresponding to the
species of interest; 0, cannot tell; +1, probably corresponding to the species of
interest; and +2, definitely corresponding to the species of interest. Reports of
type bSite are labelled as: −1, does not correspond to a breeding site; 0, cannot
tell; and +1, does correspond to a breeding site. NC stands for not-classified
reports, which either do not provide an image or have not yet been reviewed by
the experts. The hd stands for hidden reports, which correspond to reports with
improper images that are hidden by the experts (not shown in the map).</p>
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <p>Let's consider a Bayesian network describing the characteristic flow in a CS
research program, what we call the User-Action-Validation (UAV) network
(Figure 2, left, light-grey nodes). In our particular case, the nodes of the UAV network
represent the following:
– Users are participants of the CS research program aggregated in a variable
U = {1, …, k} specifying an ordinal set of user-classes with increasing levels
of expertise (e.g. beginner, advanced, expert). Beyond the intuition that U
should be a discrete ordinal variable, we have no prior idea of the optimal
number of user classes we should define nor about their prior distribution.
– Actions consist in submitting reports either of type adult or bSite. Thus, we
define A = {adult, bSite} such that P(A|U) specifies the probability that a
user of a given class submits a report of each type.
– Validations are the expert labels assigned to the reports, described by a
variable V such that P(V|A) specifies the probability of each validation label
given the type of action.
Applying Bayes' rule, the posterior over user classes given a single validated
action is P(U|A, V) = P(V|A) P(A|U) P(U) / P(A, V),
where P(A, V) is just a normalization constant. So, if we have some means to
estimate a prior distribution P(U), and the conditional distributions P(A|U)
and P(V|A), we can evaluate the posterior distribution of user-class expertise
for a user y given the observed sequence of actions/validations S(y) (Figure 2,
right).</p>
      <p>We start by guessing a number of user expertise levels. For each user we consider
three features regarding the sequence S(y): (i) the quantity of reports, (ii) the
quality of the reports, and (iii) a user's mobility index mI describing the average
area covered by the user, defined as the variance of the pairwise geolocation
distances between the reports,</p>
      <p>mI(y) = [1 / (2 |S(y)|²)] Σ_{(p,q) ∈ S(y)} [(p_x − q_x)² + (p_y − q_y)²]   (4)</p>
      <p>where (p_x, p_y), (q_x, q_y) are the geolocation coordinates.</p>
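The mobility index of Equation 4 can be sketched as follows; the coordinates below are hypothetical stand-ins for real report geolocations:

```python
def mobility_index(reports):
    """Mobility index, Eq. 4: sum of squared pairwise distances over all
    ordered pairs of report geolocations, scaled by 1 / (2 |S(y)|^2).
    `reports` is a list of (x, y) coordinate pairs (hypothetical data)."""
    s = len(reports)
    if s == 0:
        return 0.0
    total = sum((px - qx) ** 2 + (py - qy) ** 2
                for (px, py) in reports
                for (qx, qy) in reports)
    return total / (2 * s * s)

# Three reports at the corners of a unit right triangle:
print(mobility_index([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))
```

A user who always reports from the same spot gets mI = 0, while spread-out reports increase the index, matching its role as an "area covered" proxy.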
      <p>Based on these features we define the following proxy variables of the
user-class U (Figure 2, left, dark-grey nodes): (i) a quantity proxy aggregating
users sending fewer or more than a given number θ₁ of reports (N = {less, more});
(ii) a mobility proxy aggregating users with a mobility index lower/higher than
a given value θ₂ (M = {lower, higher}); (iii) a quality proxy aggregating
reports in four categories: hidden, low quality (those labeled as −2, −1), medium
quality (those labeled as 0), and high quality (those labeled as +1, +2), (Q =
{hidden, low, medium, high}; we do not count not-classified reports here). Note
that (i) and (ii) account for the attitude of the participants, while (iii) accounts
for their skills, and both aspects are deemed important. The joint combination
of the above three proxies results in a primary partition of the users' expertise
space into 16 categories summarized in Table 2. The threshold values were
selected by looking at the corresponding histograms and taking the values that
yield the most balanced distribution possible.</p>
      <p>By looking at this table, we should now infer a set of user classes with
increasing levels of expertise. We prioritize as follows: (i) the quality of the reports
before the quantity (low quality reports just result in a waste of experts' time);
(ii) the quantity of reports before the mobility index of the users (we give the
lowest priority to the mobility index because the meaning of this variable is
twofold: for surveillance purposes it is important that participants send
reports covering the broadest geographical area possible, but for control
purposes it is also important that participants keep sending reports within their
neighbourhoods). Also, we are not looking here for a fine-grained discretization of
the expertise space. Taking into account the imbalances present in Table 2 it
looks reasonable to impose an ordering of the 16 expertise categories into a set
of k = 6 user-classes, i.e. U = {1, …, 6}, as shown in Table 3. Tables 2 and 3
together express a joint distribution P(N, Q, M, U) from which the prior P(U)
follows straightforwardly by marginalization,</p>
      <p>P(U) = Σ_{N,Q,M} P(N, Q, M, U)   (5)</p>
      <p>and we get a tentative prior for the user-class variable (Table 4).</p>
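The marginalization of Equation 5 is a simple sum over the joint table; the joint values below are made up for illustration (the real ones come from Tables 2 and 3):

```python
from collections import defaultdict

# Hypothetical joint distribution P(N, Q, M, U): keys are (n, q, m, u) tuples,
# values are probabilities summing to 1. Illustrative numbers only.
joint = {
    ("less", "low",    "lower",  1): 0.10,
    ("less", "medium", "lower",  2): 0.15,
    ("more", "medium", "higher", 3): 0.20,
    ("more", "high",   "lower",  4): 0.25,
    ("more", "high",   "higher", 5): 0.30,
}

# Eq. 5: P(U) = sum over N, Q, M of P(N, Q, M, U)
prior = defaultdict(float)
for (n, q, m, u), p in joint.items():
    prior[u] += p

print(dict(prior))
```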
      <p>Having defined the user classes we know the user-class value of each report,
and we can make estimations (maximum a posteriori, MAP) for the action
conditional distribution P(A|U) (Table 5) and also for the validation conditional
distributions P(V|A = adult) and P(V|A = bSite) (Table 6).</p>
      <p>Computing the posterior P(U|A, V). Applying Bayes' rule we compute the posterior distribution,</p>
      <p>p(u_i|A, V) = P(V|A) P(A|u_i) P(u_i) / W   (6)</p>
      <p>where W = Σ_{i=1}^{k} P(V|A) P(A|u_i) P(u_i) normalizes over the user classes. Equation 6 evaluates the probability that an action
(report) of a given type, with a given validation (rating), belongs to a particular
user-class (Table 7).</p>
      <p>Note that, in our Bayesian approach, ratings become a fuzzy qualification
of the user-class. Also note that this is indeed a two-parameter model (θ₁, θ₂)
allowing a degree of control over the user-class prior and, ultimately, over the
user-class posterior distributions (i.e. we can push the classification of reports
to a lower/upper user-class by increasing/decreasing either one or both of the
parameters; this is shown later in Figure 4).</p>
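A sketch of the posterior computation of Equation 6. For the rating to inform the posterior, the validation term is taken here as class-conditional (the reports used to estimate Table 6 carry their user-class value); all tables below are illustrative stand-ins for Tables 4-6, not the real MA estimates:

```python
def posterior_user_class(prior_u, p_a_given_u, p_v_given_au, a, v):
    """Eq. 6: p(u_i | A=a, V=v) proportional to P(v|a, u_i) P(a|u_i) P(u_i),
    normalized by W. All input tables are hypothetical stand-ins."""
    k = len(prior_u)
    unnorm = [p_v_given_au[a][v][i] * p_a_given_u[a][i] * prior_u[i]
              for i in range(k)]
    W = sum(unnorm)                      # normalization over user classes
    return [x / W for x in unnorm]

prior_u = [0.2, 0.3, 0.5]                       # hypothetical P(U), k = 3
p_a_given_u = {"adult": [0.6, 0.7, 0.8]}        # hypothetical P(A|U)
p_v_given_au = {"adult": {+2: [0.1, 0.3, 0.6],  # hypothetical validation
                          0:  [0.4, 0.3, 0.2]}} # term, one entry per class
print(posterior_user_class(prior_u, p_a_given_u, p_v_given_au, "adult", +2))
```

With these made-up numbers, a +2 validation shifts the posterior mass towards the highest user class, which is the qualitative behaviour Table 7 is meant to capture.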
      <p>Note that, so far, our UAV-network (Figure 2, left) just yields the user-class
distribution of a single action, not the users' scoring that we aim at. To compute
the scores we consider the report sequences S(y) (Figure 2, right). To add some
dynamics to the model we consider a third parameter θ₃ (a cutoff factor) that
limits the sequences to the last θ₃ reports. Assuming an iid sequence of reports,
the corresponding user-class posterior distribution is given by,</p>
      <p>P(U|S(y)) = (P₀ / P_w) Π_{j=1}^{θ₃} P(U|A_j, V_j)   (7)</p>
      <p>where P₀ = (1/k, …, 1/k) sets a starting uniform user-class distribution and
P_w = Σ_{i=1}^{k} p(u_i|S) is a normalization factor. Afterwards, an expertise score
can be computed as the user's expected user-class,</p>
      <p>X(y) = (1/k) E[U]_{P(U|S)} = (1/k) Σ_{i=1}^{k} u_i p(u_i|S)   (8)</p>
      <p>Equation 8 yields a normalized score with a lower bound given by (1/k) E[U]_{P₀},
which avoids a discouraging zero-score for newcomers. Usually, the
computation of Equation 7 is subject to numerical precision problems and therefore we
implement a log computation as,</p>
      <p>log P(U|S(y)) = log P₀ + Σ_{j=1}^{θ₃} log P(U|A_j, V_j) − log P_w   (9)</p>
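Equations 7-9 combine the per-report posteriors into a score; a sketch in Python, with made-up posterior rows standing in for Table 7:

```python
import math

def user_score(posterior_rows, k, cutoff=None):
    """Eqs. 7-9: fold the per-report posteriors P(U|A_j, V_j) into P(U|S(y))
    in the log domain, then Eq. 8: score = (1/k) E[U]. `posterior_rows` is a
    list of length-k probability vectors, one per validated report."""
    if cutoff is not None:                    # theta_3: keep the last reports
        posterior_rows = posterior_rows[-cutoff:]
    log_p = [math.log(1.0 / k)] * k           # uniform starting P_0
    for row in posterior_rows:                # Eq. 9: sum of log posteriors
        log_p = [lp + math.log(r) for lp, r in zip(log_p, row)]
    m = max(log_p)                            # stable log-domain normalization
    p = [math.exp(lp - m) for lp in log_p]
    z = sum(p)
    p = [x / z for x in p]
    # Eq. 8: expected user class with u_i = i, scaled into (0, 1]
    return sum((i + 1) * p[i] for i in range(k)) / k

rows = [[0.1, 0.3, 0.6], [0.2, 0.3, 0.5]]     # two well-rated reports, k = 3
print(user_score(rows, k=3))                  # high score
print(user_score([], k=3))                    # newcomer: (1/k) E[U] under P_0
```

The log-domain fold mirrors Equation 9 and avoids the underflow that the direct product of Equation 7 would suffer for long report sequences.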
      <p>For gamification purposes, users are ranked based on their scores. Ties are
broken by mobility index. The rank position, not the score, is notified to the
users via the smartphone app, together with a quantile-based category label as
either gold, silver or bronze (Figure 1, left).</p>
      <p>In summary, we use the global evidence in the database to guess the joint
distribution P(U, Q, N, M) and estimate a prior P(U), from which we can
derive the posteriors P(U|A, V). Afterwards, we use the evidence observed for
each particular user, S(y), to evaluate the posterior distribution P(U|S(y)) and
compute a score for that user. Essentially, our scoring model is a naive Bayes
classifier where the number of features varies with the number of reports used
to qualify the user. The larger the number of reports, the better the profiling
of the user. Because we use the global evidence to estimate the user-class
distribution, the scores change dynamically as the contents of the database grow,
and all individual expertise scores are in part dependent on the overall average
performance.</p>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>Based on the user-class posterior distributions (Table 7) and applying
Equations 9 and 8 we get the results shown in Figure 3. In the x-axis, users are
ranked by score (blue line). Scores are plotted together with scaled versions of
the mobility index (yellow), the number of breedingSite reports (cyan) and the
number of adultMosquito reports (magenta). The scoring yields several plateaus
corresponding to typical numbers of submitted reports. We highlight (dark-grey
rows in Table 8) the position of users who only submitted one report, classified
as +1 (positions 1494:2053, 560 users) or classified as +2 (positions 2244:2684,
441 users), which correspond to the largest plateaus in Figure 3. As expected,
the larger the number of positive reports the higher the rank (Table 8, bottom
rows), and the larger the number of negative reports the lower the rank (Table 8,
top rows), with hidden reports being strongly punished.</p>
      <p>[Table 8. Rank positions and scores: 1, .16668463; 2, .16684409; 3, .16728058;
26:112, .19559913; 116:201, .21448389; 221:245, .33933867; 263:553, .36258186;
555:628, .38297512; 629:670, .38557248; 730:756, .52790362; 764:1064, .55214793;
1069:1278, .58333333; 1284:1321, .63213868; 1356:1390, .82074814; 1494:2053, .91446382;
2139:2205, .92599170; 2244:2684, .93735892; 2823:2882, .95749959; 2892:2899, .95809921;
2911:2926, .97219402; 2936:2937, .97351476; 2944:2952, .98252239; 2962:2965, .98930969;
2978, .99617983; 2984, .99888390; 2991, .99999356; 2992, .99999384; 2993, .99999632.
The report-count columns of the original table are not recoverable.]</p>
      <p>Fig. 4. Effect of the three threshold parameters: (left) effect of the number of reports, θ₁,
and the mobility index, θ₂; (right) effect of the cutoff factor, θ₃.</p>
      <p>We also analyze the effect of the threshold parameters (θ₁, θ₂, θ₃) (Figure 4).
In the x-axis, participants are ranked by score (blue line). The red line depicts the
score corresponding to a different value of the threshold parameters. In general,
the scores do not change much in terms of value, though sudden breaks in the
increasing trend of the red line reveal users whose position in the ranking has
been affected by the change of the parameter value. The plateaus remain almost
invariant and we only appreciate some changes of position at the borders of the
plateaus. Increasing (decreasing) θ₁ and θ₂ together (Figure 4, left) moves reports
to the left (right) columns of Table 2 and consequently forces a change in the prior
distribution. As a result, the plateaus are globally pushed lower (higher). This
change is propagated to the posteriors and also originates the rank changes that
can be observed at the borders of the plateaus (Figure 4, left). In the case of
θ₃ we observe that by not looking so far into the past (Figure 4, right), some
low-rank users are upgraded (users who clearly improved their performance over
time) while some high-rank users are downgraded (users who worsened their
performance over time).</p>
      <p>The dynamics of the scoring are also shown. In Figure 5 (top) we simulate
the evolution of the score (blue line) and rank (red line) for a particular user
in a static situation where nothing is changing: no new users are coming and
no reports are submitted by other users. Each submitted report is shown as a
coloured dot, where the colour indicates the validation value of the report. It is
apparent that good/bad reports push the score/rank up/down. Figure 5
(bottom) shows the evolution of the score in a realistic situation, to show the effect of
other users' actions or newcomers to MA. Note the double effect of the overall
dynamics, inducing soft fluctuations in the score but really significant changes
in the rank. The stronger dynamics of the rank make it much more effective for
gamification purposes.
Fig. 5. Score dynamics: (top) simulating a static situation in which the rest of
participants do not perform any action; (bottom) real situation with new reports submitted by
other participants and new participants joining the Mosquito Alert research program.</p>
      <p>We also show a comparison of our scores with Bayesian Rating and
Dirichlet reputation scores (Figure 6). The scores used in the comparison have been
computed taking into account only the reports of type adult. In this way we
avoid analyzing second-order effects due to weight-averaging of adult and bSite
reports, given that in BR or DR these must be independently computed and
combined later on. In the x-axis, participants are ranked by our score (blue
line). BR scores (Figure 6, left, magenta line) are clearly affected by the weight
of the overall average rating (note the scale of the right y-axis). However, BR
still yields the same plateaus and we only observe slight ranking changes at the
borders of the plateaus. These changes are due to the differences in the leverage
of the rate values (i.e. the values P(U|A, V) in Table 7 versus the rating levels
r = {1, …, k} used in Equation 1). DR scores (Figure 6, right) are computed
for different values of the C constant. It is clear that C is playing the role of the
overall average factor in BR, but DR gives us some control over it. The most
important plateaus are also found, but the differences at the borders of the plateaus
are more significant. Notably, there is a great difference in the sensitivity of our
model in comparison to BR and DR. In this context, sensitivity represents a
better responsiveness of the scoring in relation to participants' actions, which we
consider to be a good property to improve participants' engagement in CS
research programs.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>The model we propose is similar to a naive Bayes classifier where the number of
feature nodes varies with the number of actions performed (i.e. reports
submitted) by the user. Starting from a uniform user-class distribution, each validated
report contributes new evidence to refine the profiling of the user.</p>
      <p>The key issue of our approach is to estimate a user-class prior that suffices
for our scoring purpose. We suggest selecting a set of proxy variables of the
user-class, with clear semantics in terms of user expertise, to make a guess about
this prior. Nonetheless, any alternative way to compute the prior can be considered
and applied as well. In any case, it is crucial that the guess leads to a prior
that is as well balanced as possible and to posteriors that are as well behaved
as possible (i.e. good ratings favouring higher user classes and bad
ratings favouring lower user classes). If the probability mass distribution of the
posteriors does not clearly correlate with the user classes, the behaviour of the
algorithm can become non-monotonic with respect to increasing evidence about
a certain class. Thus, this step must be carefully considered.</p>
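      <p>In practice, this check can be automated: before accepting a candidate prior, verify that the resulting posteriors tilt in the right direction. A minimal sketch, with hypothetical posterior values:</p>
      <preformat>
```python
# Sanity check for a candidate prior: good ratings should yield posteriors
# with mass increasing towards high classes, bad ratings the opposite.
# The posterior values below are hypothetical placeholders.

def is_monotone(dist, increasing=True):
    return all((b >= a) if increasing else (b <= a)
               for a, b in zip(dist, dist[1:]))

good_posterior = [0.1, 0.3, 0.6]   # hypothetical P(U | good rating)
bad_posterior = [0.5, 0.3, 0.2]    # hypothetical P(U | bad rating)
assert is_monotone(good_posterior, increasing=True)
assert is_monotone(bad_posterior, increasing=False)
```
      </preformat>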
      <p>Just as it is not good to score new users excessively low, it is also
not good to score them excessively high. The reason to initialize the score with a
uniform distribution instead of the prior user-class distribution is that the latter will usually be
unbalanced; in our case, it is clearly unbalanced towards the high expertise classes
(Table 4), and consequently users with no validated activity would be ranked
either excessively high or excessively low. Using the prior to initialize the score,
inactive users get a score of 0.74011666 (i.e. P(U|S) = P(U) in Equation 8),
while using a uniform distribution their score is 0.58333333 (around 0.5) and
they are positioned in the middle-low part of the ranking (rank positions 1069-1278
in Table 8), which is fairly reasonable.</p>
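      <p>The general idea of collapsing a class distribution into a scalar can be sketched as follows; the per-class values here are hypothetical placeholders, and the actual mapping is the one defined by Equation 8:</p>
      <preformat>
```python
# Collapse a user-class distribution into a scalar score as a
# probability-weighted per-class value. The class values are hypothetical
# placeholders; Equation 8 defines the score actually used.

def score_from_distribution(dist, class_values):
    return sum(p * v for p, v in zip(dist, class_values))

class_values = [0.25, 0.5, 1.0]     # hypothetical value per class
uniform = [1 / 3, 1 / 3, 1 / 3]     # new user, no validated activity
skewed = [0.05, 0.15, 0.8]          # prior leaning to high expertise

# The uniform start lands mid-scale, while initializing with a skewed
# prior would park inactive users near the top of the ranking.
print(score_from_distribution(uniform, class_values))
print(score_from_distribution(skewed, class_values))
```
      </preformat>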
      <p>
        As scores are relative to the performance of the whole community, scores
are quite dynamic. As participants increase their expertise, all good scores are
globally pushed higher. Nevertheless, it is the rank that is ultimately notified
to the participants; thus, over a period of no activity, a participant might be
downgraded with time. Conversely, in periods of no activity the score can
indeed increase if better positioned participants suddenly start sending reports of
low quality. These unexpected dynamics could easily generate some confusion or
disappointment among the participants. We avoid this situation by giving the
basic hints of our scoring system on the project's web page 5 where, indeed, we
promote the gamification side of these features in order to use them in our favour.
Alternatively, unexpected dynamics as described above could be controlled by
implementing an age-weighted rating as proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In our case, this solution
should be implemented with special care because of the seasonality of the mosquito
population and, consequently, the minimal report activity during winter and spring.
These long periods of minimal activity would level all scores, and many
experienced participants might feel disappointed. At the moment, our decision is to
keep participants' scores from one season to the next.
      </p>
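      <p>The age-weighted alternative can be sketched as an exponential down-weighting of old evidence. The half-life parametrization here is our illustration; [9] expresses it as a longevity (forgetting) factor applied at each update:</p>
      <preformat>
```python
# Sketch of age-weighted evidence: each rating's contribution decays
# exponentially with age, so an entire inactive winter fades all counts
# towards zero together. Parametrization by half-life is our assumption.

def decayed_counts(events, now_days, halflife_days):
    counts = {}
    for timestamp_days, rating in events:
        weight = 0.5 ** ((now_days - timestamp_days) / halflife_days)
        counts[rating] = counts.get(rating, 0.0) + weight
    return counts

# A rating from 30 days ago with a 30-day half-life counts half as much
# as one from today.
events = [(0, "good"), (30, "good")]
print(decayed_counts(events, now_days=30, halflife_days=30))
```
      </preformat>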
      <p>With respect to BR and DR, while essentially capturing the same concept
of rating-based reputation, our model shows a much higher sensitivity to the
observed evidence, and a good balance of evidence of quality (the ratings
themselves) and quantity of evidence (the number of ratings). The reason lies in
the way that evidence is accumulated, i.e. by multiplication (Equation 8) instead of
by addition as in BR (Equation 1) and DR (Equation 2). A higher sensitivity
results in a stronger responsiveness to specific participant actions.
Augmenting engagement dynamics with more sensitive reputation systems may
bind participants better to the long-term goals of CS research programs.
Furthermore, our model decouples action validation from participant scoring by
means of an integrative and unified treatment of any action under consideration,
independently of the rating system used for each type of action.</p>
      <p>By summer 2017, MA is going to collect extra data from participants with
a recently added tool designed to reinforce citizen participation in the research
program while easing the experts' validation task. This new tool, natively
incorporated into the app, allows citizens to validate mosquito and breeding site
images sent by other users and challenge their expertise in identifying mosquito
species. This new action will provide valuable information in terms of
participants' expertise, based on a binary rating system, i.e. right/wrong. Given the
structure of our scoring model, this information can be readily translated into a
new user-class posterior distribution and incorporated into the scoring algorithm.
5 http://www.mosquitoalert.com/en/project/send-data/</p>
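      <p>A sketch of how such a binary outcome could be folded into the user-class posterior, reusing the same multiplicative update as for reports; the per-class probabilities of answering correctly are placeholders, not calibrated project values:</p>
      <preformat>
```python
# Fold a right/wrong outcome from the in-app validation game into the
# user-class posterior with the same multiplicative update used for
# reports. The per-class accuracies below are placeholder assumptions.

def update_with_validation(posterior, correct, p_right):
    likelihood = p_right if correct else [1 - p for p in p_right]
    post = [p * l for p, l in zip(posterior, likelihood)]
    z = sum(post)
    return [p / z for p in post]

p_right = [0.55, 0.7, 0.9]      # assumed P(correct | class): experts err less
posterior = [1 / 3, 1 / 3, 1 / 3]
posterior = update_with_validation(posterior, True, p_right)
# A correct answer shifts mass towards the higher expertise classes.
```
      </preformat>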
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We propose a novel reputation system based on a Bayesian network representing
the characteristic flow typically present in CS research programs, where
participants are expected to perform actions that are validated later on (i.e. user,
action, validation), which we call the UAV network. In this network, the users
node represents an aggregation of participants into expertise classes. The key
issue of our approach is to estimate a prior for the user-class distribution that
suffices for our scoring purpose. We suggest selecting a set of proxy variables of
the user-class, with clear semantics in terms of user expertise, to make a guess
about this prior. However, any other means to get a valid estimate of the prior
can readily be used. With respect to the Bayesian rating and Dirichlet
reputation models, our approach presents some advantages: (i) it is more responsive to
the observed evidence, and thus it bridges participants with their actions better;
(ii) it decouples action rating from user scoring, providing a unified processing
of any action under consideration, no matter the number of rating levels defined
for each; and (iii) it yields a better balance of evidence of quality (the
ratings themselves) and quantity of evidence (the number of ratings). As a proof
of concept, this model is implemented as part of the Mosquito Alert CS research
program.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We would like to thank the Mosquito Alert team for their continuous effort and
support, and the Mosquito Alert community for its invaluable cooperation. This
work is part of the Mosquito Alert CS research program funded by the Spanish
Ministry of Economy and Competitiveness (MINECO, Plan Estatal I+D+I
CGL2013-43139-R) and la Caixa Banking Foundation. Mosquito Alert is
currently promoted by la Caixa Banking Foundation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>N.</given-names>
            <surname>Eyal</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Hoover</surname>
          </string-name>
          .
          <article-title>Hooked: How to Build Habit-Forming Products</article-title>
          .
          <source>Portfolio Penguin</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Randy</given-names>
            <surname>Farmer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bryce</given-names>
            <surname>Glass</surname>
          </string-name>
          .
          <source>Building Web Reputation Systems. Yahoo! Press, USA, 1st edition</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>BJ</given-names>
            <surname>Fogg</surname>
          </string-name>
          .
          <article-title>A behavior model for persuasive design</article-title>
          .
          <source>In Proceedings of the 4th International Conference on Persuasive Technology, Persuasive '09</source>
          , pages
          <fpage>40:1</fpage>
          -
          <lpage>40:7</lpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Eric</given-names>
            <surname>Friedman</surname>
          </string-name>
          , Paul Resnick, and
          <string-name>
            <given-names>Rahul</given-names>
            <surname>Sami</surname>
          </string-name>
          .
          <article-title>Manipulation-resistant reputation systems</article-title>
          . In Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay Vazirani, editors,
          <source>Algorithmic Game Theory</source>
          , pages
          <fpage>677</fpage>
          -
          <lpage>698</lpage>
          . Cambridge University Press,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Roy</surname>
            <given-names>H.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pocock</surname>
            <given-names>M.J.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Preston</surname>
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roy</surname>
            <given-names>D.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savage</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tweddle</surname>
            <given-names>J.C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Robinson</surname>
            <given-names>L.D.</given-names>
          </string-name>
          .
          <article-title>Understanding Citizen Science &amp; Environmental Monitoring</article-title>
          .
          <source>Final Report on behalf of UK-EOF. NERC Centre for Ecology &amp; Hydrology and Natural History Museum</source>
          ,
          <year>November 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Ferry</given-names>
            <surname>Hendrikx</surname>
          </string-name>
          , Kris Bubendorfer, and
          <string-name>
            <given-names>Ryan</given-names>
            <surname>Chard</surname>
          </string-name>
          .
          <article-title>Reputation systems</article-title>
          .
          <source>J. Parallel Distrib. Comput.</source>
          , 75(C):
          <fpage>184</fpage>
          -
          <lpage>197</lpage>
          , January
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Irwin</surname>
          </string-name>
          .
          <article-title>Citizen Science: A Study of People, Expertise and Sustainable Development</article-title>
          . Routledge,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Audun</given-names>
            <surname>Josang</surname>
          </string-name>
          .
          <article-title>Trust and reputation systems</article-title>
          .
          <source>In Alessandro Aldini and Roberto Gorrieri</source>
          , editors,
          <source>Foundations of Security Analysis and Design IV: FOSAD</source>
          <year>2006</year>
          /2007 Tutorial Lectures, pages
          <fpage>209</fpage>
          -
          <lpage>245</lpage>
          , Berlin, Heidelberg,
          <year>2007</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Audun</given-names>
            <surname>Josang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jochen</given-names>
            <surname>Haller</surname>
          </string-name>
          .
          <article-title>Dirichlet reputation systems</article-title>
          .
          <source>In Availability, Reliability and Security</source>
          ,
          <year>2007</year>
          .
          <source>ARES</source>
          <year>2007</year>
          . The Second International Conference on, pages
          <fpage>112</fpage>
          -
          <lpage>119</lpage>
          . IEEE,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Audun</given-names>
            <surname>Josang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Roslan</given-names>
            <surname>Ismail</surname>
          </string-name>
          .
          <article-title>The beta reputation system</article-title>
          .
          <source>BLED 2002 Proceedings, page 41</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>L.</given-names>
            <surname>Mui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohtashemi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Halberstadt</surname>
          </string-name>
          .
          <article-title>A computational model of trust and reputation for e-businesses</article-title>
          .
          <source>In Proceedings of the 35th Annual Hawaii International Conference on System Sciences (HICSS'02)</source>
          , volume
          <volume>7</volume>
          <source>of HICSS '02</source>
          , pages
          <fpage>188</fpage>
          , Washington, DC, USA,
          <year>2002</year>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Colin</given-names>
            <surname>Robertson</surname>
          </string-name>
          .
          <source>Whitepaper on citizen science for environmental research</source>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Antonio Rodriguez, Frederic Bartumeus, and
          <string-name>
            <given-names>Ricard</given-names>
            <surname>Gavalda</surname>
          </string-name>
          .
          <article-title>Machine learning assists the classification of reports by citizens on disease-carrying mosquitoes</article-title>
          . In Ricard Gavalda, Indre Zliobaite, and João Gama, editors,
          <source>Proceedings of the First Workshop on Data Science for Social Good co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, SoGood@ECML-PKDD 2016, Riva del Garda, Italy, September</source>
          <volume>19</volume>
          ,
          <year>2016</year>
          , volume
          <volume>1831</volume>
          <source>of CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Jordi</given-names>
            <surname>Sabater</surname>
          </string-name>
          and
          <string-name>
            <given-names>Carles</given-names>
            <surname>Sierra</surname>
          </string-name>
          .
          <article-title>Review on computational trust and reputation models</article-title>
          .
          <source>Artif. Intell. Rev.</source>
          ,
          <volume>24</volume>
          (
          <issue>1</issue>
          ):
          <fpage>33</fpage>
          -
          <lpage>60</lpage>
          ,
          <year>September 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>J.</given-names>
            <surname>Silvertown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Harvey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Greenwood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dodd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rosewell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rebelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ansine</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>McConway</surname>
          </string-name>
          .
          <article-title>Crowdsourcing the identification of organisms: A case study of iSpot</article-title>
          .
          <source>ZooKeys</source>
          ,
          <volume>480</volume>
          :
          <fpage>125</fpage>
          -
          <lpage>146</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>