Intelligent Recommendations for Citizen Science

Na’ama Dayan (Ben-Gurion University, Israel), namadayan@gmail.com
Kobi Gal (Ben-Gurion University, Israel; University of Edinburgh, U.K.), kobig@bgu.ac.il
Avi Segal (Ben-Gurion University, Israel), avisegal@gmail.com
Guy Shani (Ben-Gurion University, Israel), shanigu@bgu.ac.il
Darlene Cavalier (Arizona State University, USA), darlene@scistarter.com

ABSTRACT

Citizen science refers to scientific research that is carried out by volunteers, often in collaboration with professional scientists. The spread of the internet has significantly increased the number of citizen science projects and allowed volunteers to contribute to these projects in dramatically new ways. For example, SciStarter, our partner in this project, is an online portal that offers more than 3,000 affiliate projects and recruits volunteers through media and other organizations, bringing citizen science to people. Given the sheer number of available projects, finding the right project, one that best suits the user's preferences and capabilities, has become a major challenge and is essential for keeping volunteers motivated and active contributors. This paper addresses this challenge by developing a system for personalizing project recommendations in the SciStarter ecosystem. We adapted several recommendation algorithms from the literature based on collaborative filtering and matrix factorization. The algorithms were trained on historical data of users' interactions in SciStarter as well as their contributions to different projects. The trained algorithms were deployed in SciStarter in a study involving hundreds of users who were provided with personalized recommendations for projects they had not contributed to before. Volunteers were randomly divided into different cohorts, which varied the recommendation algorithm that was used to generate suggested projects. The results show that using the new recommendation system led people to contribute to new projects that they had never tried before and led to increased participation in SciStarter projects when compared to cohort groups that were recommended the most popular projects or did not receive recommendations. In particular, the cohort of volunteers receiving recommendations created by an SVD algorithm (matrix factorization) exhibited the highest levels of contributions to new projects when compared to the other cohorts. A follow-up survey conducted with the SciStarter community confirms that users were satisfied with the recommendation tool and claimed that the recommendations matched their personal interests and goals. Based on the positive results, our recommendation system is now fully integrated with SciStarter. The research has transformed how SciStarter helps projects recruit and support participants and better respond to their needs.

Proceedings of the ImpactRS Workshop at ACM RecSys '20, September 25, 2020, Virtual Event, Brazil.
Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

INTRODUCTION

Citizen science engages people in scientific research by collecting, categorizing, transcribing, or analyzing scientific data [3, 4, 10]. Citizen science platforms offer thousands of different projects which advance scientific knowledge all around the world. Through citizen science, people share in and contribute to data monitoring and collection programs, usually as unpaid volunteers. Collaboration in citizen science involves scientists and researchers working with the public. Community-based groups may generate ideas and engage with scientists for advice, leadership, and program coordination. Interested volunteers, amateur scientists, students, and educators may network and promote new ideas to advance our understanding of the world. Scientists can create a citizen-science program to capture more, or more widely spread, data without spending additional funding. Citizen-science projects may include wildlife-monitoring programs, online databases, visualization and sharing technologies, or other community efforts.

For example, the citizen science portal SciStarter (scistarter.com), which also comprises our empirical methodology, includes over 3,000 projects and recruits volunteers through media and other organizations (Discover, the Girl Scouts, etc.). As of July 2020, there are 82,014 registered users in SciStarter. Examples of popular projects on SciStarter include iNaturalist¹, in which users map and share observations of biodiversity across the globe; CoCoRaHS², where volunteers share daily readings of precipitation; and Stall Catchers³, where volunteers identify vessels in the brain as flowing or stalled. Projects can be carried out either online or in a specific physical region. Users visit SciStarter in order to discover new projects to participate in and to keep up to date with community events. Figure 1 shows the user interface of SciStarter.

Figure 1: SciStarter User Interface

According to a report from the National Academies of Sciences, Engineering, and Medicine [19], citizen scientists' motivations are "strongly affected by personal interests," and participants who engage in citizen science over a long period of time "have successive opportunities to broaden and deepen their involvement." Thus, sustained engagement through the use of intelligent recommendations can improve data quality and scientific outcomes for the projects and the public.

Yet, finding the right project, one that matches the volunteer's interests and capabilities, is like searching for a needle in a haystack [5, 24].

¹ https://scistarter.org/seek-by-inaturalist
² https://scistarter.org/cocorahs-rain-hail-snow-network
³ https://scistarter.org/stall-catchers-by-eyesonalz
Ponciano et al. [22], who characterized volunteers' task execution patterns across projects, showed that volunteers tend to explore multiple projects in citizen science platforms, but that they perform tasks regularly in just a few of them. This result is also reflected in users' participation patterns in SciStarter. Figure 2 shows a histogram of the number of projects that users contributed to on the site between 2017 and 2019. As shown by the figure, the majority of active users in the SciStarter portal do not contribute to more than a single project.

Figure 2: Distribution of user participation in SciStarter projects

SciStarter employs a search engine (shown in Figure 3) that uses topics, activities, location, and demographics (quantifiable fields) to suggest project recommendations. However, recommending projects based on this tool has not been successful. To begin with, our analysis shows that about 80% of users do not use the search tool. Second, those who do use the search tool often receive poorly matched suggestions. For example, when querying outdoor projects, the search engine recommends the CoCoRaHS project and Globe at Night, in which volunteers measure and submit their night sky brightness observations. But the data shows that people who join CoCoRaHS are more likely to join Stall Catchers, an indoor, online project to accelerate Alzheimer's research.

Figure 3: Screenshot of existing search tool showing various criteria

We address this challenge by using recommendation algorithms to match individual volunteers with new projects based on the past history of their interactions on the site [2, 7]. Recommendation systems have been used in other domains, such as e-commerce, news, and social media [8, 13]. However, the nature of interaction in citizen science is fundamentally different from these domains in that volunteers are actively encouraged to contribute their time and effort to solve scientific problems. Compared to clicking on an advertisement or a product, as is the case for e-commerce and news sites, considerably more effort is required from a citizen science volunteer. Our hypothesis was that personalizing recommendations to users would increase their engagement in the SciStarter portal, as measured by the number of projects that they contribute to following the recommendations and by the extent of their contributions.

We attempted to enhance participant engagement with SciStarter projects by matching users with new projects based on the past history of their interactions on the site. We adapted four different recommendation algorithms to the citizen science domain. The input to the algorithms consists of data representing two types of user interactions: (1) interactions with projects, i.e., data generated as a result of users' activities with projects, such as joining a project, making a contribution to a project, or participating in a project; and (2) interactions on the SciStarter portal, such as searching for a project or filling in a form about a project. The output of each algorithm is a function from the user's profile and past history of interactions on SciStarter to a ranking of 10 projects in decreasing order of inferred relevance for the user.
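To make this input representation concrete, the following is a minimal sketch (not the deployed SciStarter code) of how the two interaction types could be combined into the binary users-by-projects matrix that the algorithms below consume; the event-log column names are illustrative assumptions.

```python
# A minimal sketch (not the deployed SciStarter code) of building the binary
# users x projects input matrix from the two interaction types described
# above. The event-log column names are illustrative assumptions.
import pandas as pd

def build_interaction_matrix(project_events: pd.DataFrame,
                             portal_events: pd.DataFrame) -> pd.DataFrame:
    """project_events: interactions with projects (joins, contributions);
    portal_events: interactions on the portal (searches, forms).
    Both are assumed to carry 'user_id' and 'project_id' columns."""
    events = pd.concat([project_events[["user_id", "project_id"]],
                        portal_events[["user_id", "project_id"]]])
    # Entry is 1 if the user interacted with the project at least once.
    return (events.assign(seen=1)
                  .pivot_table(index="user_id", columns="project_id",
                               values="seen", aggfunc="max", fill_value=0))
```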
We conducted a randomized controlled study in which hundreds of registered SciStarter users were assigned recommendations by algorithms using different approaches to recommend projects. The first approach personalized projects to participants by using collaborative filtering algorithms (item-based and user-based) and a matrix factorization (SVD) algorithm. These algorithms were compared to two non-personalized algorithms: the first recommended the most popular projects at that point in time, and the second recommended three projects that were manually determined by the SciStarter admins and were subject to change during the study. The results show that people receiving the personalized recommendations were more likely to contribute to new projects that they had never tried before and participated more often in these projects when compared to participants who received non-personalized recommendations or did not receive recommendations. In particular, the cohort of participants receiving recommendations created by the SVD algorithm (matrix factorization) exhibited the highest levels of contributions to new projects when compared to the other personalized groups. A follow-up survey conducted with the SciStarter community confirms that users were satisfied with the recommendations. Based on the positive results, our recommendation system is now fully integrated with SciStarter. This research develops a recommendation system for the citizen science domain. It is the first study using AI-based recommendation algorithms in large-scale citizen science platforms.

1 RELATED WORK

This research relates to past work on using AI to increase participants' motivation in citizen science research, as well as work on applying recommendation systems in real-world settings. We list relevant work in each of these two areas.

1.1 Citizen Science - Motivation and Level of Engagement

Online participation in citizen science projects has become very common [21]. Yet, most of the contributions rely on a very small proportion of participants [25]. In SciStarter, the group of participants who contribute to more than 10 projects is less than 10% of all users; in most citizen science projects, the majority of participants carry out only a few tasks. Many studies have explored the incentives and motivations of participants in order to increase participant engagement. Kragh et al. [15] showed that participants in citizen science projects are motivated by personal interest and the desire to learn something new, as well as by the desire to volunteer and contribute to science. Prior work by Raddick et al. [23] also showed that participant engagement mainly originates in pure interest in the project topic, such as astronomy and zoology. Yet, when we tested this finding against our collected data, we noticed that user interest is very diverse and does not center on only one major topic. Nov et al. [21] explored users' different motivations to contribute by separating the question into quantity of contribution and quality of contribution. They showed that quantity of contribution is mostly determined by the user's interest in the project and by social norms, while quality of contribution is determined by understanding the importance of the task and by the user's reputation. In our work we aimed to increase only the quantity of contributions, since data about the quality of contributions is not available to us.

Significant prior work has also been done on increasing participant engagement by taking user motivation into consideration. Segal et al. [29] developed an intelligent approach which combines model-based reinforcement learning with off-line policy evaluation in order to generate intervention policies which significantly increase users' contributions. Laut et al. [17] demonstrated how participants are affected by virtual peers and showed that participants' contributions can be enhanced through the presence of virtual peers.

Ponciano et al. [22] characterized volunteers' task execution patterns across projects and showed that volunteers tend to explore multiple projects in citizen science platforms, but that they perform tasks regularly in just a few of them. They also showed that volunteers recruited from other projects on the platform tend to get more engaged than those recruited outside the platform. This finding is a strong incentive to increase user engagement on SciStarter's platform rather than on the projects' sites directly, as we do in our research.

In this research, we attempted to enhance participant engagement with citizen science projects by recommending to each user the projects which best suit the user's preferences and capabilities.

1.2 Increasing User Engagement with Recommendations

Similar to our work, other researchers have tried to increase user engagement and participation through personalized recommendations. Labarthe et al. [16] built a recommender system for students in MOOCs that recommends relevant, rich-potential contacts with other students, based on user profile and activities. They showed that by recommending this list of contacts, students were much more likely to persist and engage in MOOCs. Subsequent work by Dwivedi et al. [7] developed a recommender system that recommends online courses to students based on their grades in other subjects.
This recommender was based on collaborative filtering techniques, particularly item-based recommendations. The paper showed that users who interacted with the recommendation system increased their chance of finishing the MOOC by 270%, compared to users who did not interact with the recommendation system.

Other studies concerning user engagement with recommendation systems showed how early intervention significantly increases user engagement. Freyne et al. [9] showed that users who received early recommendations in social networks are more likely to continue returning to the site. They showed a clear difference in retention rate between the control group, which lost 42% of its users, and a group that interacted with the recommendations, which lost only 24% of its users.

Wu et al. [32] showed how tracking users' clicks and return behaviour in news portals succeeded in increasing user engagement with their recommendation system. They formulated the optimization of long-term user engagement as a sequential decision-making problem, where a recommendation is based on both the estimated immediate user click and the expected clicks resulting from the user's future return.

Lin et al. [18] developed a recommendation system for crowdsourcing which incorporates negative implicit feedback into a predictive matrix factorization model. They showed that their models, which consider negative feedback, produce better recommendations than the original matrix factorization approach with implicit feedback. They evaluated their findings via an experiment with data from Microsoft's internal Universal Human Relevance System and showed that the quality of task recommendations improved with their models. In our work, we use only positive implicit feedback, due to the low user traffic, which makes significant evidence of negative feedback hard to find.

Recommendation algorithms are mostly evaluated by their accuracy. The underlying assumption is that accuracy will increase user satisfaction and ultimately lead to higher engagement and retention. However, past research has suggested that accuracy does not necessarily lead to satisfaction. Wu et al. [31] investigated popular approaches such as collaborative filtering and content-based recommendation to see if they have different effects on user satisfaction. The results of the study suggested that product awareness (the set of products that the user is initially aware of before using any recommender system) plays an important role in moderating the impact of recommenders. In particular, if a consumer had a relatively niche awareness set, chances are that content-based systems would garner more positive responses in terms of user satisfaction. On the other hand, they showed that users who are more aware of popular items should be targeted with collaborative filtering systems instead. Subsequent work by Nguyen et al. [20] showed that individual users' preferences for the level of diversity, popularity, and serendipity in recommendation lists cannot be inferred from their ratings alone. The paper suggested that user satisfaction can be improved by integrating users' personality traits, obtained through a user study, into the process of generating recommendations.

2 METHODOLOGY

Our goals for the research project were to (1) help users discover new projects in the SciStarter ecosystem by matching them with projects that are suitable to their preferences; (2) learn user behavior in SciStarter and develop a recommendation system which will help increase the number of projects users contribute to; and (3) measure users' satisfaction with the recommendation system.

We adopted several canonical algorithms from the recommendation systems literature: user-based CF [28], item-based CF [28], matrix factorization [27], and popularity [1]. These approaches were chosen as they are all based on analyzing the interactions between users and items and do not rely on domain knowledge, which is lacking (such as a project's location, needed materials, ideal age group, etc.). Each algorithm receives as input a target user and the number of recommendations to generate (N). The algorithm returns a ranking of the top N projects in decreasing order of relevance for the user. We provide additional details about each algorithm below.

2.0.1 User-based Collaborative Filtering. In this algorithm, the recommendation is based on user similarities [28]. The ranking of a project for a target user is computed by comparing users who interacted with similar projects. We use a KNN algorithm [6] to find similar users, where the similarity score for user vectors $U_1$ and $U_2$ from the input matrix is calculated with cosine similarity:

$$\mathit{Similarity}(U_1, U_2) = \frac{U_1 \cdot U_2}{\|U_1\| \, \|U_2\\|}$$

We chose the value of $K$ to be the minimal number such that the number of new projects in the neighborhood of users similar to the target user equaled the number of recommendations. In practice, $K$ was initially set to 100 and increased until this threshold was met. This was done so that there would always be a sufficient number of projects to recommend to each user.

2.0.2 Item-based Collaborative Filtering. In this algorithm, the recommendation is based on project similarity [28]. The algorithm generates recommendations based on the similarity between projects, calculated using people's interactions with these projects. The similarity score for project vectors $P_1$ and $P_2$ from the input matrix is calculated with cosine similarity:

$$\mathit{Similarity}(P_1, P_2) = \cos(\theta) = \frac{P_1 \cdot P_2}{\|P_1\| \, \|P_2\|}$$

The algorithm then recommends the top-N projects most similar to the set of projects the user has interacted with in the past.
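For illustration, here is a minimal sketch of the user-based variant described in 2.0.1, assuming a binary users-by-projects numpy array; the item-based variant of 2.0.2 is symmetric, operating on the transposed matrix. This is not the deployed implementation, and the neighborhood-growing rule simply follows the description above.

```python
# A minimal sketch of user-based CF with cosine similarity and an adaptively
# grown neighborhood, assuming a binary users x projects numpy array R.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def recommend_user_based(R: np.ndarray, target: int, n: int = 10) -> list:
    # Similarity of every user to the target user.
    sims = np.array([cosine_similarity(R[target], R[u]) for u in range(len(R))])
    sims[target] = -np.inf                 # never count the target as a neighbor
    order = np.argsort(-sims)              # neighbors, most similar first
    k = 100                                # initial K, grown until enough projects
    while True:
        neighbors = order[:min(k, len(R) - 1)]
        # Score each project by similarity-weighted votes of the neighborhood.
        scores = sims[neighbors] @ R[neighbors]
        scores[R[target] > 0] = -np.inf    # rank only projects new to the user
        if np.sum(scores > 0) >= n or k >= len(R) - 1:
            break
        k += 100                           # enlarge the neighborhood
    return list(np.argsort(-scores)[:n])
```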
2.0.3 Matrix Factorization - SVD. The matrix factorization algorithm (SVD) directly predicts the relevance of a new project to a target user by modeling the user-project relationship [14, 27]. This model-based algorithm (as opposed to the two memory-based algorithms presented earlier) was chosen since it is one of the leading recommendation system algorithms [11, 14, 26]. SVD uses a matrix where the users are rows, the projects are columns, and the entries are values that represent the relevance of the projects to the users. This users-projects matrix is often very sparse and has many missing values, since users engage with a very small portion of all the available items.

The algorithm estimates the relevance of a target project for a user by maintaining a user model and a project model that include hidden variables (latent factors) that can affect how users choose items. These variables have no semantics; they are simply numbers in a matrix. In reality, aspects like gender, culture, and age may affect the relevance, but we do not have access to them.

The singular value decomposition (SVD) of any matrix $R$ is a factorization of the form $R = U S V^T$. This factorization is used in recommendation systems to find the product of the three matrices $U$, $S$, $V^T$ that estimates the original matrix $R$ and hence predicts the missing values in the matrix. As mentioned above, the matrix $R$ includes missing values, as users did not participate in all projects. We estimate the missing values, which reflect how satisfied the user would be with an unseen project. In the recommendation system setting, $U$ is the left singular matrix, representing the relationship between users and latent factors; $S$ is a rectangular diagonal matrix with non-negative real numbers on the diagonal; and $V^T$ is the right singular matrix, indicating the similarity between items and latent factors. SVD decreases the dimension of the utility matrix $R$ by extracting its latent factors. It maps each user and item into a latent space with $r$ dimensions; with this, we can better understand the relationship between users and projects and compare their vector representations. Let $\hat{R}$ be the estimation of the original matrix $R$. Given $\hat{R}$, which includes predictions for all the missing values in $R$, we can rank each project for a user by its score in $\hat{R}$. The projects with the highest ranking are then recommended to the user. In our setting, as in the other algorithms described before, the input matrix $R$ is binary.

3 RESULTS

The first part of the study compares the performance of the different algorithms on historical SciStarter data. The second part of the study deploys the algorithms in the wild and actively assigns recommendations to users using the different algorithms.

Of the 3,000 existing projects SciStarter offers, 153 are affiliate projects. An affiliate project is one that uses a specific API to report back to SciStarter each time a logged-in SciStarter user has contributed or analyzed data on that project's website or app. As data on contributions and participation only existed for the affiliate projects, we only used these projects in the study.

3.1 Offline Study

The training set for all algorithms consisted of data collected between January 2012 and September 2019. It included 6,353 users who contributed to 127 different projects. For the collaborative filtering and SVD algorithms, we restricted the training set to users that performed at least two activities during that time frame, whether contributing to a project or interacting on the SciStarter portal. We chronologically split the data into train and test sets such that the latest 10% of interactions from each user were selected for the test set and the remaining 90% of the interactions were used for the train set. As a baseline, we also considered an algorithm that recommends projects in decreasing order of popularity, measured by the number of users who contribute to the project [1].

We evaluated the top-N recommendation results using precision and recall metrics with a varying number of recommendations. Figure 4 shows the precision and recall curves for the 4 examined algorithms. As can be seen from the figure, user-based collaborative filtering and SVD are the best algorithms, and their performance is higher than popularity and item-based collaborative filtering. The popularity recommendation algorithm produced the lowest performance.

Figure 4: Precision/Recall results on offline data
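The following sketch illustrates the SVD scoring and the offline precision/recall evaluation on the chronological 90/10 split described above; the rank r and the use of numpy's dense SVD on the binary matrix are illustrative assumptions, not the study's exact configuration.

```python
# A sketch of SVD scoring and offline precision/recall@N evaluation on a
# chronological train/test split of the binary interaction matrix.
import numpy as np

def svd_estimate(R_train: np.ndarray, r: int = 20) -> np.ndarray:
    """Low-rank estimate R_hat = U_r S_r V_r^T of the binary train matrix."""
    U, S, Vt = np.linalg.svd(R_train, full_matrices=False)
    return U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]

def precision_recall_at_n(R_hat, R_train, R_test, n=10):
    precisions, recalls = [], []
    for u in range(R_train.shape[0]):
        held_out = set(np.where(R_test[u] > 0)[0])   # the user's latest 10%
        if not held_out:
            continue
        scores = R_hat[u].copy()
        scores[R_train[u] > 0] = -np.inf             # rank only unseen projects
        top_n = set(np.argsort(-scores)[:n])
        hits = len(top_n & held_out)
        precisions.append(hits / n)
        recalls.append(hits / len(held_out))
    return float(np.mean(precisions)), float(np.mean(recalls))
```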
3.2 Online Study

The second part of the study was an online experiment. Users who logged on to SciStarter starting on December 2nd, 2019 were randomly assigned to one of 5 cohorts, each providing recommendations based on a different algorithm: (1) item-based collaborative filtering, (2) user-based collaborative filtering, (3) matrix factorization, (4) most popular projects, and (5) promoted projects. Promoted projects were manually determined by SciStarter and often aligned with social initiatives and current events. Among these projects are GLOBE Observer Clouds⁴, Stream Selfie⁵, and TreeSnap⁶. Another example is FluNearYou, in which individuals report flu symptoms online; it was one of the promoted projects during the COVID-19 outbreak. These projects were changed periodically by the SciStarter administrators.

⁴ https://scistarter.org/globe-observer-clouds
⁵ https://scistarter.org/stream-selfie
⁶ https://scistarter.org/treesnap

The recommendation tool was active on SciStarter for 3 months. Users who logged on during that time were randomly divided into cohorts, each receiving recommendations from a different algorithm. Each cohort had 42 or 43 users. The recommendations were embedded in the user's dashboard in decreasing order of relevance, in sets of three, from left to right. Users could scroll to reveal more projects in decreasing or increasing order of relevance. Figure 5 shows the top three recommended projects for a target user.

Figure 5: Screenshot of recommendation tool

All registered users in SciStarter received a notification via email about the study, stating that the "new SciStarter AI feature provides personalized project recommendations based on your activity and interests." A link was provided to a blog post containing more detailed explanations of the recommendation algorithms and their role in the study, emphasizing that "all data collected and analyzed during this experiment on SciStarter will be anonymized." Users were also allowed to opt out of receiving recommendations at any point by clicking on the link "opt out from these recommendations" in the recommendation panel. In practice, none of the participants selected the opt-out option at any point in time.

Figure 6 (top) shows the average click-through rate (defined as the ratio of recommended projects that users accessed) and Figure 6 (bottom) shows the average hit rate (defined as the percentage of instances in which users accessed at least one project that was recommended to them). As shown by the figure, both measures show a consistent trend, in which the user-based collaborative filtering algorithm achieved the best performance, while the baseline method achieved the worst performance. Despite the trend, the differences between conditions were not statistically significant at the p < 0.05 level. We attribute this to the fact that we measured clicks on recommended projects rather than actual contributions, which are the most important aspect for citizen science.

Figure 6: Click-through rate (top) and hit rate (bottom) measures for the online study

To address this gap, we defined two new measures that consider the contributions made by participants to projects and thus capture the system utility identified by Gunawardana and Shani [12]: the average number of activities that users carried out in recommended projects (RecE), and the average number of activities that users carried out in non-recommended projects (NoRecE). Figure 7 compares the different algorithms according to these two measures. The results show that users assigned to the intelligent recommendation conditions performed significantly more activities in recommended projects than those assigned to the Popularity and Baseline conditions. Also, users in the SVD condition performed significantly fewer activities in non-recommended projects than users in the Popularity and Baseline conditions. These results were statistically significant according to Mann-Whitney tests (see Appendix for details).

Figure 7: Average activities on recommended projects (RecE) and on non-recommended projects (NoRecE) for each condition

Lastly, we measured the average number of sessions for users in the different conditions, where a session is defined as a continuous length of time in which the user is active in a project. Figure 8 shows the average number of sessions for users in the different cohorts, including the number of sessions in the historical data used to train the algorithms, in which no recommendations were provided. The results show that users receiving recommendations from the personalized algorithms performed more sessions than in the historical data; these results are statistically significant. Although there is a clear trend that users in the SVD condition achieved the highest number of sessions, these results were not significant at the p < 0.05 level.

Figure 8: Average number of sessions for each condition

To explain SVD's good performance in the online study, we note first that SVD is considered a leading algorithm in the domain of recommendation systems [11, 26]. Second, in our setting SVD tended to generate recommendations that participants had not heard about before, which the survey reveals to be more interesting to them. One participant remarked: "I did not click on either project because I have looked at both projects (several times) previously"; another wrote: "I am more interested in projects I didn't know exists before".
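To make the online measures concrete, here is an illustrative computation; the event-log structures (sets of recommended and clicked projects, per-project activity counts) are assumed inputs, not the deployed code.

```python
# Illustrative computation of the online measures used above: CTR is the
# share of recommended projects a user accessed, hit rate the share of users
# who accessed at least one recommendation, and RecE/NoRecE the average
# activity counts in recommended/non-recommended projects.
def online_measures(recommended, clicked, activities):
    """recommended, clicked: dicts mapping user -> set of project ids;
    activities: dict mapping user -> dict of project id -> activity count."""
    users = list(recommended)
    ctr = sum(len(clicked[u] & recommended[u]) / len(recommended[u])
              for u in users) / len(users)
    hit_rate = sum(bool(clicked[u] & recommended[u]) for u in users) / len(users)
    rec_e = sum(c for u in users
                for p, c in activities[u].items() if p in recommended[u]) / len(users)
    norec_e = sum(c for u in users
                  for p, c in activities[u].items() if p not in recommended[u]) / len(users)
    return ctr, hit_rate, rec_e, norec_e
```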
                                                                           anxiety". (3) Users who feel that the recommendations are not suit-
                                                                           able for their interests: "No interest in stall catchers", "The photos
                                                                           and title didn’t perfectly match what I am looking for".
                                                                              The survey provided evidence for the positive impact of using the
                                                                           recommendation systems in SciStarter, which include the following
                                                                           comments. “I am very impressed by the new Artificial Intelligence
                                                                           feature from SciStarter! Your AI feature shows me example projects
                                                                           that I didn’t know before exist", and “I like how personalized rec-
                                                                           ommendations are made for citizen science users".

                                                                           4    CONCLUSION AND FUTURE WORK
  Figure 8: Average number of sessions for each condition                  This work reports on the use of recommendation algorithms to
                                                                           increase engagement of volunteers in citizen science, in which vol-
                                                                           unteers collaborate with researchers to perform scientific tasks.
                                                                           These recommendation algorithms were deployed in SciStarter, a
                                                                           portal with thousands of citizen science projects, and were eval-
almost 3000 projects that SciStarter offers, we restricted ourselves       uated in an online study involving hundreds of users who were
to about 120 projects are affiliate projects which actively provide        informed about participating in a study involving AI based recom-
data of users’ interactions. Another obstacle was that we were             mendation of new projects. We trained different recommendation
constrained to a subset of users who log on to SciStarter and use          algorithms using a combination of data including users’ behavior in
it as a portal to contributing to the project, rather than accessing       SciStarter as well as their contributions to the specific project. Our
the project directly. Out of the 65,000 registered users of SciStarter,    results show that using the new recommendation system led people
only a small percentage are logged in to both SciStarter and an            to contribute to new projects that they had never tried before and
affiliate project. As a result, we have relatively few users getting       led to increased participation in SciStarter projects when compared
recommendations. In addition, some of SciStarter’s projects are            to a baseline cohort group that did not receive recommendations.
location-specific and can only be done by users in the same physical       The outcome of this research project is the AI-powered Recommen-
location. (e.g collecting a water sample from a particular lake located    dation Widget which has been fully deployed in SciStarter. This
in a particular city). Therefore, we kept track of users’ location         project has transformed how SciStarter helps projects recruit and
and restricted our recommendation system to be a location-based            support participants and better respond to their needs. It was so
system, which recommends users with projects they are able to              successful in increasing engagement, that SciStarter has decided to
participate in.                                                            make the widget a permanent feature of their site. This will help
                                                                           support deeper, sustained engagement to increase the collective
                                                                           intelligence capacity of projects and generate improved scientific,
3.3    User Study                                                          learning, and other outcomes. The results of this research have
In order to learn what is the users’ opinion on the recommenda-            been featured on the DiscoverMagazine.com 7 . While we observed
tions, and their level of satisfaction, we conducted a survey with         significant engagement with the recommendation tool, one may
SciStarter’s users. Our survey was sent to all SciStarter commu-           consider adding explanations to the recommendations in order
nity users. 138 users have filled the survey, where each user was          to increase the system’s reliability and user’s satisfaction with it.
asked about the recommendations presented to him by the algo-              Moreover, we plan to extend the recommendation system to include
rithm he was assigned to. The survey included questions about              content based algorithms, and test its performance as compared
users’ overall satisfaction with the recommendation tool as well           to the existing algorithms. We believe that integrating content in
as questions about their pattern of behavior before and after the          Citizen Science domain can be very beneficial. Even though users
recommendations. The majority of users (75%) were very satisfied           tend to participate in a variety of different projects, we want to be
with the recommendation tool and claimed that the recommenda-              able to capture more intrinsic characteristic of the projects, such as
tions matched their personal interests and goals. The majority of          the type of the task a user has to perform or the required effort.
users (54%) reported they have clicked on the recommendations
and visited the project’s site, while only 8% of users did not click the   REFERENCES
recommendation or visited the project site. Interestingly, users who        [1] Hyung Jun Ahn. 2006. Utilizing popularity characteristics for product recom-
were not familiar with the recommended projects before, clicked                 mendation. International Journal of Electronic Commerce 11, 2 (2006), 59–80.
                                                                            [2] Xavier Amatriain. 2013. Big & personal: data and models behind netflix recom-
more on the recommendations, as well as users who previously                    mendations. In Proceedings of the 2nd international workshop on big data, streams
performed a contribution to a project.                                          and heterogeneous source Mining: Algorithms, systems, programming models and
   Users who did not click on the recommendations can be divided                applications. ACM, 1–6.
                                                                            [3] Rick Bonney, Caren B Cooper, Janis Dickinson, Steve Kelling, Tina Phillips,
into 3 main themes: (1) Users who don’t have the time right now or              Kenneth V Rosenberg, and Jennifer Shirk. 2009. Citizen science: a developing
will click the project in the future. (2) Users who feel that the recom-        tool for expanding science knowledge and scientific literacy. BioScience 59, 11
mendations are not suitable for their skills and materials: "Seemed             (2009), 977–984.

out of my league", "I didn’t have the materials to participate". This      7 https://www.discovermagazine.com/technology/ai-powered-smart-project-
behaviour was also discussed in [30], and was named "classification        recommendations-on-scistarter
[4] Dominique Brossard, Bruce Lewenstein, and Rick Bonney. 2005. Scientific knowledge and attitude change: The impact of a citizen science project. International Journal of Science Education 27, 9 (2005), 1099–1121.
[5] Hillary K Burgess, LB DeBey, HE Froehlich, Natalaie Schmidt, Elli J Theobald, Ailene K Ettinger, Janneke HilleRisLambers, Joshua Tewksbury, and Julia K Parrish. 2017. The science of citizen science: Exploring barriers to use as a primary research tool. Biological Conservation 208 (2017), 113–120.
[6] Sahibsingh A Dudani. 1976. The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics 4 (1976), 325–327.
[7] Surabhi Dwivedi and VS Kumari Roshni. 2017. Recommender system for big data in education. In 2017 5th National Conference on E-Learning & E-Learning Technologies (ELELTECH). IEEE, 1–4.
[8] Daniel M Fleder and Kartik Hosanagar. 2007. Recommender systems and their impact on sales diversity. In Proceedings of the 8th ACM Conference on Electronic Commerce. ACM, 192–199.
[9] Jill Freyne, Michal Jacovi, Ido Guy, and Werner Geyer. 2009. Increasing engagement through early recommender intervention. In Proceedings of the Third ACM Conference on Recommender Systems. ACM, 85–92.
[10] Cary Funk, Jeffrey Gottfried, and Amy Mitchell. 2017. Science news and information today. Pew Research Center (2017).
[11] Stephen Gower. 2014. Netflix prize and SVD.
[12] Asela Gunawardana and Guy Shani. 2009. A survey of accuracy evaluation metrics of recommendation tasks. Journal of Machine Learning Research 10, 12 (2009).
[13] J Itmazi and M Gea. 2006. The recommendation systems: Types, domains and the ability usage in learning management system. In Proceedings of the International Arab Conference on Information Technology (ACIT'2006), Yarmouk University, Jordan.
[14] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
[15] Gitte Kragh. 2016. The motivations of volunteers in citizen science. Environmental Scientist 25, 2 (2016), 32–35.
[16] Hugues Labarthe, François Bouchet, Rémi Bachelet, and Kalina Yacef. 2016. Does a Peer Recommender Foster Students' Engagement in MOOCs? International Educational Data Mining Society (2016).
[17] Jeffrey Laut, Francesco Cappa, Oded Nov, and Maurizio Porfiri. 2017. Increasing citizen science contribution using a virtual peer. Journal of the Association for Information Science and Technology 68, 3 (2017), 583–593.
[18] Christopher H Lin, Ece Kamar, and Eric Horvitz. 2014. Signals in the silence: Models of implicit feedback in a recommendation system for crowdsourcing. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
[19] National Academies of Sciences, Engineering, and Medicine. 2018. Learning through citizen science: enhancing opportunities by design. National Academies Press.
[20] Tien T Nguyen, F Maxwell Harper, Loren Terveen, and Joseph A Konstan. 2018. User personality and user satisfaction with recommender systems. Information Systems Frontiers 20, 6 (2018), 1173–1189.
[21] Oded Nov, Ofer Arazy, and David Anderson. 2014. Scientists@Home: what drives the quantity and quality of online citizen science participation? PloS ONE 9, 4 (2014), e90375.
[22] Lesandro Ponciano and Thiago Emmanuel Pereira. 2019. Characterising volunteers' task execution patterns across projects on multi-project citizen science platforms. In Proceedings of the 18th Brazilian Symposium on Human Factors in Computing Systems. ACM, 16.
[23] M Jordan Raddick, Georgia Bracey, Pamela L Gay, Chris J Lintott, Phil Murray, Kevin Schawinski, Alexander S Szalay, and Jan Vandenberg. 2009. Galaxy Zoo: Exploring the motivations of citizen science volunteers. arXiv preprint arXiv:0909.2925 (2009).
[24] Francesco Ricci, Lior Rokach, and Bracha Shapira. 2015. Recommender systems: introduction and challenges. In Recommender Systems Handbook. Springer, 1–34.
[25] Dana Rotman, Jenny Preece, Jen Hammock, Kezee Procita, Derek Hansen, Cynthia Parr, Darcy Lewis, and David Jacobs. 2012. Dynamic changes in motivation in collaborative citizen-science projects. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work. 217–226.
[26] Rowayda A Sadek. 2012. SVD based image processing applications: state of the art, contributions and research challenges. arXiv preprint arXiv:1211.7102 (2012).
[27] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2002. Incremental singular value decomposition algorithms for highly scalable recommender systems. In Fifth International Conference on Computer and Information Science, Vol. 1. Citeseer.
[28] J Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. 2007. Collaborative filtering recommender systems. In The Adaptive Web. Springer, 291–324.
[29] Avi Segal, Kobi Gal, Ece Kamar, Eric Horvitz, and Grant Miller. 2018. Optimizing Interventions via Offline Policy Evaluation: Studies in Citizen Science. In Thirty-Second AAAI Conference on Artificial Intelligence.
[30] Avi Segal, Ya'akov Gal, Robert J Simpson, Victoria Homsy, Mark Hartswood, Kevin R Page, and Marina Jirotka. 2015. Improving productivity in citizen science through controlled intervention. In Proceedings of the 24th International Conference on World Wide Web. 331–337.
[31] Ling-Ling Wu, Yuh-Jzer Joung, and Jonglin Lee. 2013. Recommendation systems and consumer satisfaction online: moderating effects of consumer product awareness. In 2013 46th Hawaii International Conference on System Sciences. IEEE, 2753–2762.
[32] Qingyun Wu, Hongning Wang, Liangjie Hong, and Yue Shi. 2017. Returning is believing: Optimizing long-term user engagement in recommender systems. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management. ACM, 1927–1936.

A APPENDIX

A.1 Significance tests - number of activities

A Mann-Whitney test was conducted to compare each pair of conditions in the online experiment. Table 1 presents the results of the pairwise tests for the measures RecE and NoRecE that are significant.

Table 1: Online metrics - Mann-Whitney significance tests with p < 0.05. DV = Dependent Variable

Condition1    Condition2    U        n1    n2    DV       Significant
CFUserUser    Popularity    473.5    43    43    RecE     Yes
CFUserUser    Baseline      406.0    43    43    RecE     Yes
CFItemItem    Popularity    458.5    43    43    RecE     Yes
CFItemItem    Baseline      396.0    43    43    RecE     Yes
SVD           Popularity    433.0    42    43    RecE     Yes
SVD           Baseline      371.5    42    43    RecE     Yes
SVD           CFItemItem    731.0    42    43    RecE     Yes
SVD           Baseline      729.0    42    43    NoRecE   Yes

A.2 Significance tests - number of sessions

A Mann-Whitney test was conducted to compare each condition in the online experiment against the historical data used to train the algorithms, called past-data. Table 2 presents the results of the pairwise tests that are significant.

Table 2: Number of sessions in SciStarter - Mann-Whitney significance tests with p < 0.05

Condition1    Condition2    U         n1    n2     Significant
CFUserUser    Past-Data     5898.0    43    557    Yes
CFItemItem    Past-Data     6502.0    43    557    Yes
SVD           Past-Data     7284.0    42    557    Yes
Popularity    Past-Data     6683.5    43    557    Yes
Baseline      Past-Data     6978.5    43    557    Yes
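For completeness, a sketch of the pairwise Mann-Whitney tests reported in Tables 1 and 2 is given below; the per-user measurement vectors (e.g., RecE per user in each cohort) are assumed inputs.

```python
# A sketch of the pairwise Mann-Whitney U tests behind Tables 1 and 2,
# using scipy. Inputs are per-user measurements for two cohorts.
from scipy.stats import mannwhitneyu

def compare_conditions(values_a, values_b, alpha=0.05):
    """Two-sided Mann-Whitney U test between two cohorts' measurements."""
    u_stat, p_value = mannwhitneyu(values_a, values_b, alternative="two-sided")
    return u_stat, p_value, p_value < alpha
```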