Intelligent Recommendations for Citizen Science

Na’ama Dayan (Ben-Gurion University, Israel), namadayan@gmail.com
Kobi Gal (Ben-Gurion University, Israel; University of Edinburgh, U.K.), kobig@bgu.ac.il
Avi Segal (Ben-Gurion University, Israel), avisegal@gmail.com
Guy Shani (Ben-Gurion University, Israel), shanigu@bgu.ac.il
Darlene Cavalier (Arizona State University, USA), darlene@scistarter.com

ABSTRACT

Citizen science refers to scientific research that is carried out by volunteers, often in collaboration with professional scientists. The spread of the internet has significantly increased the number of citizen science projects and allowed volunteers to contribute to these projects in dramatically new ways. For example, SciStarter, our partner in this project, is an online portal that offers more than 3,000 affiliate projects and recruits volunteers through media and other organizations, bringing citizen science to people. Given the sheer number of available projects, finding the right project, one that best suits the user's preferences and capabilities, has become a major challenge and is essential for keeping volunteers motivated and active contributors. This paper addresses this challenge by developing a system for personalizing project recommendations in the SciStarter ecosystem. We adapted several recommendation algorithms from the literature based on collaborative filtering and matrix factorization. The algorithms were trained on historical data of users' interactions in SciStarter as well as their contributions to different projects. The trained algorithms were deployed in SciStarter in a study involving hundreds of users who were provided with personalized recommendations for projects they had not contributed to before. Volunteers were randomly divided into different cohorts, which varied the recommendation algorithm that was used to generate suggested projects. The results show that using the new recommendation system led people to contribute to new projects that they had never tried before and led to increased participation in SciStarter projects when compared to cohort groups that were recommended the most popular projects or did not receive recommendations. In particular, the cohort of volunteers receiving recommendations created by an SVD algorithm (matrix factorization) exhibited the highest levels of contributions to new projects when compared to the other cohorts. A follow-up survey conducted with the SciStarter community confirms that users were satisfied with the recommendation tool and claimed that the recommendations matched their personal interests and goals. Based on the positive results, our recommendation system is now fully integrated with SciStarter. The research has transformed how SciStarter helps projects recruit and support participants and better respond to their needs.

Proceedings of the ImpactRS Workshop at ACM RecSys '20, September 25, 2020, Virtual Event, Brazil.
Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

INTRODUCTION

Citizen science engages people in scientific research by collecting, categorizing, transcribing, or analyzing scientific data [3, 4, 10]. Citizen science platforms offer thousands of different projects which advance scientific knowledge all around the world. Through citizen science, people share in and contribute to data monitoring and collection programs, usually as unpaid volunteers. Collaboration in citizen science involves scientists and researchers working with the public. Community-based groups may generate ideas and engage with scientists for advice, leadership, and program coordination. Interested volunteers, amateur scientists, students, and educators may network and promote new ideas to advance our understanding of the world. Scientists can create a citizen-science program to capture more, or more widely spread, data without spending additional funding. Citizen-science projects may include wildlife-monitoring programs, online databases, visualization and sharing technologies, or other community efforts.

For example, the citizen science portal SciStarter (scistarter.com), which also comprises our empirical methodology, includes over 3,000 projects and recruits volunteers through media and other organizations (Discover, the Girl Scouts, etc.). As of July 2020, there are 82,014 registered users in SciStarter. Examples of popular projects on SciStarter include iNaturalist¹, in which users map and share observations of biodiversity across the globe; CoCoRaHS², where volunteers share daily readings of precipitation; and Stall Catchers³, where volunteers identify vessels in the brain as flowing or stalled. Projects can be carried out either online or in a specific physical region. Users visit SciStarter in order to discover new projects to participate in and to keep up to date with community events. Figure 1 shows the user interface of SciStarter.

Figure 1: SciStarter User Interface

According to a report from the National Academies of Sciences, Engineering, and Medicine [19], citizen scientists' motivations are "strongly affected by personal interests," and participants who engage in citizen science over a long period of time "have successive opportunities to broaden and deepen their involvement." Thus, sustained engagement through the use of intelligent recommendations can improve data quality and scientific outcomes for the projects and the public.

Yet, finding the right project, one that matches the volunteer's interests and capabilities, is like searching for a needle in a haystack [5, 24].

¹ https://scistarter.org/seek-by-inaturalist
² https://scistarter.org/cocorahs-rain-hail-snow-network
³ https://scistarter.org/stall-catchers-by-eyesonalz
Ponciano et al. [22], who characterized volunteers' task execution patterns across projects, showed that volunteers tend to explore multiple projects in citizen science platforms, but that they perform tasks regularly in just a few of them. This result is also reflected in users' participation patterns in SciStarter. Figure 2 shows a histogram of the number of projects that users contributed to on the site between 2017 and 2019. As shown by the figure, the majority of active users in the SciStarter portal do not contribute to more than a single project.

Figure 2: Distribution of user participation in SciStarter projects

SciStarter employs a search engine (shown in Figure 3) that uses topics, activities, location, and demographics (quantifiable fields) to suggest project recommendations. However, recommending projects based on this tool has not been successful. To begin with, our analysis shows that about 80% of users do not use the search tool. Second, those who do use the search tool often receive poorly matched suggestions. For example, when querying outdoor projects, the search engine recommends the CoCoRaHS project and Globe at Night, in which volunteers measure and submit their night sky brightness observations. But the data shows that people who join CoCoRaHS are more likely to join Stall Catchers, an indoor, online project to accelerate Alzheimer's research.

Figure 3: Screenshot of existing search tool showing various criteria

We address this challenge by using recommendation algorithms to match individual volunteers with new projects based on the past history of their interactions on the site [2, 7]. Recommendation systems have been used in other domains, such as e-commerce, news, and social media [8, 13]. However, the nature of interaction in citizen science is fundamentally different from these domains in that volunteers are actively encouraged to contribute their time and effort to solve scientific problems. Compared to clicking on an advertisement or a product, as is the case for e-commerce and news sites, considerably more effort is required from a citizen science volunteer. Our hypothesis was that personalizing recommendations to users would increase their engagement in the SciStarter portal, as measured by the number of projects that they contribute to following the recommendations and by the extent of their contributions.

We attempted to enhance participant engagement with SciStarter projects by matching users with new projects based on the past history of their interactions on the site. We adapted four different recommendation algorithms to the citizen science domain. The input to the algorithms consists of data representing two types of user interactions: (1) interactions with projects, i.e., data generated as a result of users' activities with projects, such as joining a project, making a contribution to a project, or participating in a project; and (2) interactions on the SciStarter portal, such as searching for a project or filling in a form about a project. The output of each algorithm is a function from the user's profile and past history of interactions on SciStarter to a ranking of 10 projects in decreasing order of inferred relevance for the user.
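To make this input representation concrete, the following is a minimal sketch (not the deployed SciStarter code) of how the two interaction types could be combined into the binary users-by-projects matrix that the algorithms below consume; the event-log column names are illustrative assumptions.

```python
# A minimal sketch (not the deployed SciStarter code) of building the binary
# users x projects input matrix from the two interaction types described
# above. The event-log column names are illustrative assumptions.
import pandas as pd

def build_interaction_matrix(project_events: pd.DataFrame,
                             portal_events: pd.DataFrame) -> pd.DataFrame:
    """project_events: interactions with projects (joins, contributions);
    portal_events: interactions on the portal (searches, forms).
    Both are assumed to carry 'user_id' and 'project_id' columns."""
    events = pd.concat([project_events[["user_id", "project_id"]],
                        portal_events[["user_id", "project_id"]]])
    # Entry is 1 if the user interacted with the project at least once.
    return (events.assign(seen=1)
                  .pivot_table(index="user_id", columns="project_id",
                               values="seen", aggfunc="max", fill_value=0))
```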
We conducted a randomized controlled study in which hundreds of registered SciStarter users were assigned recommendations by algorithms using different approaches to recommend projects. The first approach personalized projects to participants by using collaborative filtering algorithms (item-based and user-based) and a matrix factorization (SVD) algorithm. These algorithms were compared to two non-personalized algorithms: the first recommended the most popular projects at that point in time, and the second recommended three projects that were manually determined by the SciStarter admins and were subject to change during the study. The results show that people receiving the personalized recommendations were more likely to contribute to new projects that they had never tried before and participated more often in these projects when compared to participants who received non-personalized recommendations or did not receive recommendations. In particular, the cohort of participants receiving recommendations created by the SVD algorithm (matrix factorization) exhibited the highest levels of contributions to new projects when compared to the other personalized groups. A follow-up survey conducted with the SciStarter community confirms that users were satisfied with the recommendations. Based on the positive results, our recommendation system is now fully integrated with SciStarter. This research develops a recommendation system for the citizen science domain. It is the first study using AI-based recommendation algorithms in large-scale citizen science platforms.

1 RELATED WORK

This research relates to past work on using AI to increase participants' motivation in citizen science research, as well as work on applying recommendation systems in real-world settings. We list relevant work in each of these two areas.

1.1 Citizen Science - Motivation and Level of Engagement

Online participation in citizen science projects has become very common [21]. Yet, most of the contributions rely on a very small proportion of participants [25]. In SciStarter, the group of participants who contribute to more than 10 projects is less than 10% of all users; in most citizen science projects, the majority of participants carry out only a few tasks. Many studies have explored the incentives and motivations of participants in order to increase participant engagement. Kragh et al. [15] showed that participants in citizen science projects are motivated by personal interest and the desire to learn something new, as well as by the desire to volunteer and contribute to science. Prior work by Raddick et al. [23] also showed that participant engagement mainly originates in pure interest in the project topic, such as astronomy and zoology. Yet, when we tested this finding against our collected data, we noticed that user interest is very diverse and does not center on only one major topic. Nov et al. [21] explored users' different motivations to contribute by separating the question into quantity of contribution and quality of contribution. They showed that quantity of contribution is mostly determined by the user's interest in the project and by social norms, while quality of contribution is determined by understanding the importance of the task and by the user's reputation. In our work we aimed to increase only the quantity of contributions, since data about the quality of contributions is not available to us.

Significant prior work has also been done on increasing participant engagement by taking user motivation into consideration. Segal et al. [29] developed an intelligent approach which combines model-based reinforcement learning with off-line policy evaluation in order to generate intervention policies which significantly increase users' contributions. Laut et al. [17] demonstrated how participants are affected by virtual peers and showed that participants' contributions can be enhanced through the presence of virtual peers.

Ponciano et al. [22] characterized volunteers' task execution patterns across projects and showed that volunteers tend to explore multiple projects in citizen science platforms, but that they perform tasks regularly in just a few of them. They also showed that volunteers recruited from other projects on the platform tend to get more engaged than those recruited outside the platform. This finding is a strong incentive to increase user engagement on SciStarter's platform rather than on the projects' sites directly, as we do in our research.

In this research, we attempted to enhance participant engagement with citizen science projects by recommending to each user the projects which best suit the user's preferences and capabilities.

1.2 Increasing User Engagement with Recommendations

Similar to our work, other researchers have tried to increase user engagement and participation through personalized recommendations. Labarthe et al. [16] built a recommender system for students in MOOCs that recommends relevant, rich-potential contacts with other students, based on user profile and activities. They showed that by recommending this list of contacts, students were much more likely to persist and engage in MOOCs. Subsequent work by Dwivedi et al. [7] developed a recommender system that recommends online courses to students based on their grades in other subjects.
This recommender was based on collaborative filtering techniques, particularly item-based recommendations. The paper showed that users who interacted with the recommendation system increased their chance of finishing the MOOC by 270%, compared to users who did not interact with the recommendation system.

Other studies concerning user engagement with recommendation systems showed how early intervention significantly increases user engagement. Freyne et al. [9] showed that users who received early recommendations in social networks are more likely to continue returning to the site. They showed a clear difference in retention rate between the control group, which lost 42% of its users, and a group that interacted with the recommendations, which lost only 24% of its users.

Wu et al. [32] showed how tracking users' clicks and return behaviour in news portals succeeded in increasing user engagement with their recommendation system. They formulated the optimization of long-term user engagement as a sequential decision-making problem, where a recommendation is based on both the estimated immediate user click and the expected clicks resulting from the user's future return.

Lin et al. [18] developed a recommendation system for crowdsourcing which incorporates negative implicit feedback into a predictive matrix factorization model. They showed that their models, which consider negative feedback, produce better recommendations than the original matrix factorization approach with implicit feedback. They evaluated their findings via an experiment with data from Microsoft's internal Universal Human Relevance System and showed that the quality of task recommendations improved with their models. In our work, we use only positive implicit feedback, due to the low user traffic, which makes significant evidence of negative feedback hard to find.

Recommendation algorithms are mostly evaluated by their accuracy. The underlying assumption is that accuracy will increase user satisfaction and ultimately lead to higher engagement and retention. However, past research has suggested that accuracy does not necessarily lead to satisfaction. Wu et al. [31] investigated popular approaches such as collaborative filtering and content-based recommendation to see if they have different effects on user satisfaction. The results of the study suggested that product awareness (the set of products that the user is initially aware of before using any recommender system) plays an important role in moderating the impact of recommenders. In particular, if a consumer had a relatively niche awareness set, chances are that content-based systems would garner more positive responses in terms of user satisfaction. On the other hand, they showed that users who are more aware of popular items should be targeted with collaborative filtering systems instead. Subsequent work by Nguyen et al. [20] showed that individual users' preferences for the level of diversity, popularity, and serendipity in recommendation lists cannot be inferred from their ratings alone. The paper suggested that user satisfaction can be improved by integrating users' personality traits, obtained through a user study, into the process of generating recommendations.

2 METHODOLOGY

Our goals for the research project were to (1) help users discover new projects in the SciStarter ecosystem by matching them with projects that are suitable to their preferences; (2) learn user behavior in SciStarter and develop a recommendation system which will help increase the number of projects users contribute to; and (3) measure users' satisfaction with the recommendation system.

We adopted several canonical algorithms from the recommendation systems literature: user-based CF [28], item-based CF [28], matrix factorization [27], and popularity [1]. These approaches were chosen as they are all based on analyzing the interactions between users and items and do not rely on domain knowledge, which is lacking (such as a project's location, needed materials, ideal age group, etc.). Each algorithm receives as input a target user and the number of recommendations to generate (N). The algorithm returns a ranking of the top N projects in decreasing order of relevance for the user. We provide additional details about each algorithm below.

2.0.1 User-based Collaborative Filtering. In this algorithm, the recommendation is based on user similarities [28]. The ranking of a project for a target user is computed by comparing users who interacted with similar projects. We use a KNN algorithm [6] to find similar users, where the similarity score for user vectors $U_1$ and $U_2$ from the input matrix is calculated with cosine similarity:

$$\mathit{Similarity}(U_1, U_2) = \frac{U_1 \cdot U_2}{\|U_1\| \, \|U_2\\|}$$

We chose the value of $K$ to be the minimal number such that the number of new projects in the neighborhood of users similar to the target user equaled the number of recommendations. In practice, $K$ was initially set to 100 and increased until this threshold was met. This was done so that there would always be a sufficient number of projects to recommend to each user.

2.0.2 Item-based Collaborative Filtering. In this algorithm, the recommendation is based on project similarity [28]. The algorithm generates recommendations based on the similarity between projects, calculated using people's interactions with these projects. The similarity score for project vectors $P_1$ and $P_2$ from the input matrix is calculated with cosine similarity:

$$\mathit{Similarity}(P_1, P_2) = \cos(\theta) = \frac{P_1 \cdot P_2}{\|P_1\| \, \|P_2\|}$$

The algorithm then recommends the top-N projects most similar to the set of projects the user has interacted with in the past.
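For illustration, here is a minimal sketch of the user-based variant described in 2.0.1, assuming a binary users-by-projects numpy array; the item-based variant of 2.0.2 is symmetric, operating on the transposed matrix. This is not the deployed implementation, and the neighborhood-growing rule simply follows the description above.

```python
# A minimal sketch of user-based CF with cosine similarity and an adaptively
# grown neighborhood, assuming a binary users x projects numpy array R.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def recommend_user_based(R: np.ndarray, target: int, n: int = 10) -> list:
    # Similarity of every user to the target user.
    sims = np.array([cosine_similarity(R[target], R[u]) for u in range(len(R))])
    sims[target] = -np.inf                 # never count the target as a neighbor
    order = np.argsort(-sims)              # neighbors, most similar first
    k = 100                                # initial K, grown until enough projects
    while True:
        neighbors = order[:min(k, len(R) - 1)]
        # Score each project by similarity-weighted votes of the neighborhood.
        scores = sims[neighbors] @ R[neighbors]
        scores[R[target] > 0] = -np.inf    # rank only projects new to the user
        if np.sum(scores > 0) >= n or k >= len(R) - 1:
            break
        k += 100                           # enlarge the neighborhood
    return list(np.argsort(-scores)[:n])
```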
2.0.3 Matrix Factorization - SVD. The matrix factorization algorithm (SVD) directly predicts the relevance of a new project to a target user by modeling the user-project relationship [14, 27]. This model-based algorithm (as opposed to the two memory-based algorithms presented earlier) was chosen since it is one of the leading recommendation system algorithms [11, 14, 26]. SVD uses a matrix where the users are rows, the projects are columns, and the entries are values that represent the relevance of the projects to the users. This users-projects matrix is often very sparse and has many missing values, since users engage with a very small portion of all the available items.

The algorithm estimates the relevance of a target project for a user by maintaining a user model and a project model that include hidden variables (latent factors) that can affect how users choose items. These variables have no semantics; they are simply numbers in a matrix. In reality, aspects like gender, culture, and age may affect the relevance, but we do not have access to them.

The singular value decomposition (SVD) of any matrix $R$ is a factorization of the form $R = U S V^T$. This factorization is used in recommendation systems to find the product of the three matrices $U$, $S$, $V^T$ that estimates the original matrix $R$ and hence predicts the missing values in the matrix. As mentioned above, the matrix $R$ includes missing values, as users did not participate in all projects. We estimate the missing values, which reflect how satisfied the user would be with an unseen project. In the recommendation system setting, $U$ is the left singular matrix, representing the relationship between users and latent factors; $S$ is a rectangular diagonal matrix with non-negative real numbers on the diagonal; and $V^T$ is the right singular matrix, indicating the similarity between items and latent factors. SVD decreases the dimension of the utility matrix $R$ by extracting its latent factors. It maps each user and item into a latent space with $r$ dimensions; with this, we can better understand the relationship between users and projects and compare their vector representations. Let $\hat{R}$ be the estimation of the original matrix $R$. Given $\hat{R}$, which includes predictions for all the missing values in $R$, we can rank each project for a user by its score in $\hat{R}$. The projects with the highest ranking are then recommended to the user. In our setting, as in the other algorithms described before, the input matrix $R$ is binary.

3 RESULTS

The first part of the study compares the performance of the different algorithms on historical SciStarter data. The second part of the study deploys the algorithms in the wild and actively assigns recommendations to users using the different algorithms.

Of the 3,000 existing projects SciStarter offers, 153 are affiliate projects. An affiliate project is one that uses a specific API to report back to SciStarter each time a logged-in SciStarter user has contributed or analyzed data on that project's website or app. As data on contributions and participation only existed for the affiliate projects, we only used these projects in the study.

3.1 Offline Study

The training set for all algorithms consisted of data collected between January 2012 and September 2019. It included 6,353 users who contributed to 127 different projects. For the collaborative filtering and SVD algorithms, we restricted the training set to users that performed at least two activities during that time frame, whether contributing to a project or interacting on the SciStarter portal. We chronologically split the data into train and test sets such that the latest 10% of interactions from each user were selected for the test set and the remaining 90% of the interactions were used for the train set. As a baseline, we also considered an algorithm that recommends projects in decreasing order of popularity, measured by the number of users who contribute to the project [1].

We evaluated the top-N recommendation results using precision and recall metrics with a varying number of recommendations. Figure 4 shows the precision and recall curves for the 4 examined algorithms. As can be seen from the figure, user-based collaborative filtering and SVD are the best algorithms, and their performance is higher than popularity and item-based collaborative filtering. The popularity recommendation algorithm produced the lowest performance.

Figure 4: Precision/Recall results on offline data
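The following sketch illustrates the SVD scoring and the offline precision/recall evaluation on the chronological 90/10 split described above; the rank r and the use of numpy's dense SVD on the binary matrix are illustrative assumptions, not the study's exact configuration.

```python
# A sketch of SVD scoring and offline precision/recall@N evaluation on a
# chronological train/test split of the binary interaction matrix.
import numpy as np

def svd_estimate(R_train: np.ndarray, r: int = 20) -> np.ndarray:
    """Low-rank estimate R_hat = U_r S_r V_r^T of the binary train matrix."""
    U, S, Vt = np.linalg.svd(R_train, full_matrices=False)
    return U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]

def precision_recall_at_n(R_hat, R_train, R_test, n=10):
    precisions, recalls = [], []
    for u in range(R_train.shape[0]):
        held_out = set(np.where(R_test[u] > 0)[0])   # the user's latest 10%
        if not held_out:
            continue
        scores = R_hat[u].copy()
        scores[R_train[u] > 0] = -np.inf             # rank only unseen projects
        top_n = set(np.argsort(-scores)[:n])
        hits = len(top_n & held_out)
        precisions.append(hits / n)
        recalls.append(hits / len(held_out))
    return float(np.mean(precisions)), float(np.mean(recalls))
```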
3.2 Online Study

The second part of the study was an online experiment. Users who logged on to SciStarter starting on December 2nd, 2019 were randomly assigned to one of 5 cohorts, each providing recommendations based on a different algorithm: (1) item-based collaborative filtering, (2) user-based collaborative filtering, (3) matrix factorization, (4) most popular projects, and (5) promoted projects. Promoted projects were manually determined by SciStarter and often aligned with social initiatives and current events. Among these projects are GLOBE Observer Clouds⁴, Stream Selfie⁵, and TreeSnap⁶. Another example is FluNearYou, in which individuals report flu symptoms online; it was one of the promoted projects during the COVID-19 outbreak. These projects were changed periodically by the SciStarter administrators.

⁴ https://scistarter.org/globe-observer-clouds
⁵ https://scistarter.org/stream-selfie
⁶ https://scistarter.org/treesnap

The recommendation tool was active on SciStarter for 3 months. Users who logged on during that time were randomly divided into cohorts, each receiving recommendations from a different algorithm. Each cohort had 42 or 43 users. The recommendations were embedded in the user's dashboard in decreasing order of relevance, in sets of three, from left to right. Users could scroll to reveal more projects in decreasing or increasing order of relevance. Figure 5 shows the top three recommended projects for a target user.

Figure 5: Screenshot of recommendation tool

All registered users in SciStarter received a notification via email about the study, stating that the "new SciStarter AI feature provides personalized project recommendations based on your activity and interests." A link was provided to a blog post containing more detailed explanations of the recommendation algorithms and their role in the study, emphasizing that "all data collected and analyzed during this experiment on SciStarter will be anonymized." Users were also allowed to opt out of receiving recommendations at any point by clicking on the link "opt out from these recommendations" in the recommendation panel. In practice, none of the participants selected the opt-out option at any point in time.

Figure 6 (top) shows the average click-through rate (defined as the ratio of recommended projects that users accessed) and Figure 6 (bottom) shows the average hit rate (defined as the percentage of instances in which users accessed at least one project that was recommended to them). As shown by the figure, both measures show a consistent trend, in which the user-based collaborative filtering algorithm achieved the best performance, while the baseline method achieved the worst performance. Despite the trend, the differences between conditions were not statistically significant at the p < 0.05 level. We attribute this to the fact that we measured clicks on recommended projects rather than actual contributions, which are the most important aspect for citizen science.

Figure 6: Click-through rate (top) and hit rate (bottom) measures for the online study

To address this gap, we defined two new measures that consider the contributions made by participants to projects and thus capture the system utility identified by Gunawardana and Shani [12]: the average number of activities that users carried out in recommended projects (RecE), and the average number of activities that users carried out in non-recommended projects (NoRecE). Figure 7 compares the different algorithms according to these two measures. The results show that users assigned to the intelligent recommendation conditions performed significantly more activities in recommended projects than those assigned to the Popularity and Baseline conditions. Also, users in the SVD condition performed significantly fewer activities in non-recommended projects than users in the Popularity and Baseline conditions. These results were statistically significant according to Mann-Whitney tests (see Appendix for details).

Figure 7: Average activities on recommended projects (RecE) and on non-recommended projects (NoRecE) for each condition

Lastly, we measured the average number of sessions for users in the different conditions, where a session is defined as a continuous length of time in which the user is active in a project. Figure 8 shows the average number of sessions for users in the different cohorts, including the number of sessions in the historical data used to train the algorithms, in which no recommendations were provided. The results show that users receiving recommendations from the personalized algorithms performed more sessions than in the historical data; these results are statistically significant. Although there is a clear trend that users in the SVD condition achieved the highest number of sessions, these results were not significant at the p < 0.05 level.

Figure 8: Average number of sessions for each condition

To explain SVD's good performance in the online study, we note first that SVD is considered a leading algorithm in the domain of recommendation systems [11, 26]. Second, in our setting SVD tended to generate recommendations that participants had not heard about before, which the survey reveals to be more interesting to them. One participant remarked: "I did not click on either project because I have looked at both projects (several times) previously"; another wrote: "I am more interested in projects I didn't know exists before".
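To make the online measures concrete, here is an illustrative computation; the event-log structures (sets of recommended and clicked projects, per-project activity counts) are assumed inputs, not the deployed code.

```python
# Illustrative computation of the online measures used above: CTR is the
# share of recommended projects a user accessed, hit rate the share of users
# who accessed at least one recommendation, and RecE/NoRecE the average
# activity counts in recommended/non-recommended projects.
def online_measures(recommended, clicked, activities):
    """recommended, clicked: dicts mapping user -> set of project ids;
    activities: dict mapping user -> dict of project id -> activity count."""
    users = list(recommended)
    ctr = sum(len(clicked[u] & recommended[u]) / len(recommended[u])
              for u in users) / len(users)
    hit_rate = sum(bool(clicked[u] & recommended[u]) for u in users) / len(users)
    rec_e = sum(c for u in users
                for p, c in activities[u].items() if p in recommended[u]) / len(users)
    norec_e = sum(c for u in users
                  for p, c in activities[u].items() if p not in recommended[u]) / len(users)
    return ctr, hit_rate, rec_e, norec_e
```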
                                                                           anxiety". (3) Users who feel that the recommendations are not suit-
                                                                           able for their interests: "No interest in stall catchers", "The photos
                                                                           and title didn’t perfectly match what I am looking for".
                                                                              The survey provided evidence for the positive impact of using the
                                                                           recommendation systems in SciStarter, which include the following
                                                                           comments. “I am very impressed by the new Artificial Intelligence
                                                                           feature from SciStarter! Your AI feature shows me example projects
                                                                           that I didn’t know before exist", and “I like how personalized rec-
                                                                           ommendations are made for citizen science users".

                                                                           4    CONCLUSION AND FUTURE WORK
  Figure 8: Average number of sessions for each condition                  This work reports on the use of recommendation algorithms to
                                                                           increase engagement of volunteers in citizen science, in which vol-
                                                                           unteers collaborate with researchers to perform scientific tasks.
                                                                           These recommendation algorithms were deployed in SciStarter, a
                                                                           portal with thousands of citizen science projects, and were eval-
almost 3000 projects that SciStarter offers, we restricted ourselves       uated in an online study involving hundreds of users who were
to about 120 projects are affiliate projects which actively provide        informed about participating in a study involving AI based recom-
data of users’ interactions. Another obstacle was that we were             mendation of new projects. We trained different recommendation
constrained to a subset of users who log on to SciStarter and use          algorithms using a combination of data including users’ behavior in
it as a portal to contributing to the project, rather than accessing       SciStarter as well as their contributions to the specific project. Our
the project directly. Out of the 65,000 registered users of SciStarter,    results show that using the new recommendation system led people
only a small percentage are logged in to both SciStarter and an            to contribute to new projects that they had never tried before and
affiliate project. As a result, we have relatively few users getting       led to increased participation in SciStarter projects when compared
recommendations. In addition, some of SciStarter’s projects are            to a baseline cohort group that did not receive recommendations.
location-specific and can only be done by users in the same physical       The outcome of this research project is the AI-powered Recommen-
location. (e.g collecting a water sample from a particular lake located    dation Widget which has been fully deployed in SciStarter. This
in a particular city). Therefore, we kept track of users’ location         project has transformed how SciStarter helps projects recruit and
and restricted our recommendation system to be a location-based            support participants and better respond to their needs. It was so
system, which recommends users with projects they are able to              successful in increasing engagement, that SciStarter has decided to
participate in.                                                            make the widget a permanent feature of their site. This will help
                                                                           support deeper, sustained engagement to increase the collective
                                                                           intelligence capacity of projects and generate improved scientific,
3.3    User Study                                                          learning, and other outcomes. The results of this research have
In order to learn what is the users’ opinion on the recommenda-            been featured on the DiscoverMagazine.com 7 . While we observed
tions, and their level of satisfaction, we conducted a survey with         significant engagement with the recommendation tool, one may
SciStarter’s users. Our survey was sent to all SciStarter commu-           consider adding explanations to the recommendations in order
nity users. 138 users have filled the survey, where each user was          to increase the system’s reliability and user’s satisfaction with it.
asked about the recommendations presented to him by the algo-              Moreover, we plan to extend the recommendation system to include
rithm he was assigned to. The survey included questions about              content based algorithms, and test its performance as compared
users’ overall satisfaction with the recommendation tool as well           to the existing algorithms. We believe that integrating content in
as questions about their pattern of behavior before and after the          Citizen Science domain can be very beneficial. Even though users
recommendations. The majority of users (75%) were very satisfied           tend to participate in a variety of different projects, we want to be
with the recommendation tool and claimed that the recommenda-              able to capture more intrinsic characteristic of the projects, such as
tions matched their personal interests and goals. The majority of          the type of the task a user has to perform or the required effort.
users (54%) reported they have clicked on the recommendations
and visited the project’s site, while only 8% of users did not click the   REFERENCES
recommendation or visited the project site. Interestingly, users who        [1] Hyung Jun Ahn. 2006. Utilizing popularity characteristics for product recom-
were not familiar with the recommended projects before, clicked                 mendation. International Journal of Electronic Commerce 11, 2 (2006), 59–80.
                                                                            [2] Xavier Amatriain. 2013. Big & personal: data and models behind netflix recom-
more on the recommendations, as well as users who previously                    mendations. In Proceedings of the 2nd international workshop on big data, streams
performed a contribution to a project.                                          and heterogeneous source Mining: Algorithms, systems, programming models and
   Users who did not click on the recommendations can be divided                applications. ACM, 1–6.
                                                                            [3] Rick Bonney, Caren B Cooper, Janis Dickinson, Steve Kelling, Tina Phillips,
into 3 main themes: (1) Users who don’t have the time right now or              Kenneth V Rosenberg, and Jennifer Shirk. 2009. Citizen science: a developing
will click the project in the future. (2) Users who feel that the recom-        tool for expanding science knowledge and scientific literacy. BioScience 59, 11
mendations are not suitable for their skills and materials: "Seemed             (2009), 977–984.

out of my league", "I didn’t have the materials to participate". This      7 https://www.discovermagazine.com/technology/ai-powered-smart-project-
behaviour was also discussed in [30], and was named "classification        recommendations-on-scistarter
[4] Dominique Brossard, Bruce Lewenstein, and Rick Bonney. 2005. Scientific knowledge and attitude change: The impact of a citizen science project. International Journal of Science Education 27, 9 (2005), 1099–1121.
[5] Hillary K Burgess, LB DeBey, HE Froehlich, Natalaie Schmidt, Elli J Theobald, Ailene K Ettinger, Janneke HilleRisLambers, Joshua Tewksbury, and Julia K Parrish. 2017. The science of citizen science: Exploring barriers to use as a primary research tool. Biological Conservation 208 (2017), 113–120.
[6] Sahibsingh A Dudani. 1976. The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics 4 (1976), 325–327.
[7] Surabhi Dwivedi and VS Kumari Roshni. 2017. Recommender system for big data in education. In 2017 5th National Conference on E-Learning & E-Learning Technologies (ELELTECH). IEEE, 1–4.
[8] Daniel M Fleder and Kartik Hosanagar. 2007. Recommender systems and their impact on sales diversity. In Proceedings of the 8th ACM Conference on Electronic Commerce. ACM, 192–199.
[9] Jill Freyne, Michal Jacovi, Ido Guy, and Werner Geyer. 2009. Increasing engagement through early recommender intervention. In Proceedings of the Third ACM Conference on Recommender Systems. ACM, 85–92.
[10] Cary Funk, Jeffrey Gottfried, and Amy Mitchell. 2017. Science news and information today. Pew Research Center (2017).
[11] Stephen Gower. 2014. Netflix prize and SVD.
[12] Asela Gunawardana and Guy Shani. 2009. A survey of accuracy evaluation metrics of recommendation tasks. Journal of Machine Learning Research 10, 12 (2009).
[13] J Itmazi and M Gea. 2006. The recommendation systems: Types, domains and the ability usage in learning management system. In Proceedings of the International Arab Conference on Information Technology (ACIT'2006), Yarmouk University, Jordan.
[14] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
[15] Gitte Kragh. 2016. The motivations of volunteers in citizen science. Environmental Scientist 25, 2 (2016), 32–35.
[16] Hugues Labarthe, François Bouchet, Rémi Bachelet, and Kalina Yacef. 2016. Does a Peer Recommender Foster Students' Engagement in MOOCs? International Educational Data Mining Society (2016).
[17] Jeffrey Laut, Francesco Cappa, Oded Nov, and Maurizio Porfiri. 2017. Increasing citizen science contribution using a virtual peer. Journal of the Association for Information Science and Technology 68, 3 (2017), 583–593.
[18] Christopher H Lin, Ece Kamar, and Eric Horvitz. 2014. Signals in the silence: Models of implicit feedback in a recommendation system for crowdsourcing. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
[19] National Academies of Sciences, Engineering, and Medicine. 2018. Learning through citizen science: enhancing opportunities by design. National Academies Press.
[20] Tien T Nguyen, F Maxwell Harper, Loren Terveen, and Joseph A Konstan. 2018. User personality and user satisfaction with recommender systems. Information Systems Frontiers 20, 6 (2018), 1173–1189.
[21] Oded Nov, Ofer Arazy, and David Anderson. 2014. Scientists@Home: what drives the quantity and quality of online citizen science participation? PloS ONE 9, 4 (2014), e90375.
[22] Lesandro Ponciano and Thiago Emmanuel Pereira. 2019. Characterising volunteers' task execution patterns across projects on multi-project citizen science platforms. In Proceedings of the 18th Brazilian Symposium on Human Factors in Computing Systems. ACM, 16.
[23] M Jordan Raddick, Georgia Bracey, Pamela L Gay, Chris J Lintott, Phil Murray, Kevin Schawinski, Alexander S Szalay, and Jan Vandenberg. 2009. Galaxy Zoo: Exploring the motivations of citizen science volunteers. arXiv preprint arXiv:0909.2925 (2009).
[24] Francesco Ricci, Lior Rokach, and Bracha Shapira. 2015. Recommender systems: introduction and challenges. In Recommender Systems Handbook. Springer, 1–34.
[25] Dana Rotman, Jenny Preece, Jen Hammock, Kezee Procita, Derek Hansen, Cynthia Parr, Darcy Lewis, and David Jacobs. 2012. Dynamic changes in motivation in collaborative citizen-science projects. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work. 217–226.
[26] Rowayda A Sadek. 2012. SVD based image processing applications: state of the art, contributions and research challenges. arXiv preprint arXiv:1211.7102 (2012).
[27] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2002. Incremental singular value decomposition algorithms for highly scalable recommender systems. In Fifth International Conference on Computer and Information Science, Vol. 1. Citeseer.
[28] J Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. 2007. Collaborative filtering recommender systems. In The Adaptive Web. Springer, 291–324.
[29] Avi Segal, Kobi Gal, Ece Kamar, Eric Horvitz, and Grant Miller. 2018. Optimizing Interventions via Offline Policy Evaluation: Studies in Citizen Science. In Thirty-Second AAAI Conference on Artificial Intelligence.
[30] Avi Segal, Ya'akov Gal, Robert J Simpson, Victoria Homsy, Mark Hartswood, Kevin R Page, and Marina Jirotka. 2015. Improving productivity in citizen science through controlled intervention. In Proceedings of the 24th International Conference on World Wide Web. 331–337.
[31] Ling-Ling Wu, Yuh-Jzer Joung, and Jonglin Lee. 2013. Recommendation systems and consumer satisfaction online: moderating effects of consumer product awareness. In 2013 46th Hawaii International Conference on System Sciences. IEEE, 2753–2762.
[32] Qingyun Wu, Hongning Wang, Liangjie Hong, and Yue Shi. 2017. Returning is believing: Optimizing long-term user engagement in recommender systems. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management. ACM, 1927–1936.

A APPENDIX

A.1 Significance tests - number of activities

A Mann-Whitney test was conducted to compare each pair of conditions in the online experiment. Table 1 presents the results of the pairwise tests for the measures RecE and NoRecE that are significant.

Table 1: Online metrics - Mann-Whitney significance tests with p < 0.05. DV = Dependent Variable

Condition1    Condition2    U        n1    n2    DV       Significant
CFUserUser    Popularity    473.5    43    43    RecE     Yes
CFUserUser    Baseline      406.0    43    43    RecE     Yes
CFItemItem    Popularity    458.5    43    43    RecE     Yes
CFItemItem    Baseline      396.0    43    43    RecE     Yes
SVD           Popularity    433.0    42    43    RecE     Yes
SVD           Baseline      371.5    42    43    RecE     Yes
SVD           CFItemItem    731.0    42    43    RecE     Yes
SVD           Baseline      729.0    42    43    NoRecE   Yes

A.2 Significance tests - number of sessions

A Mann-Whitney test was conducted to compare each condition in the online experiment against the historical data used to train the algorithms, called past-data. Table 2 presents the results of the pairwise tests that are significant.

Table 2: Number of sessions in SciStarter - Mann-Whitney significance tests with p < 0.05

Condition1    Condition2    U         n1    n2     Significant
CFUserUser    Past-Data     5898.0    43    557    Yes
CFItemItem    Past-Data     6502.0    43    557    Yes
SVD           Past-Data     7284.0    42    557    Yes
Popularity    Past-Data     6683.5    43    557    Yes
Baseline      Past-Data     6978.5    43    557    Yes
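For completeness, a sketch of the pairwise Mann-Whitney tests reported in Tables 1 and 2 is given below; the per-user measurement vectors (e.g., RecE per user in each cohort) are assumed inputs.

```python
# A sketch of the pairwise Mann-Whitney U tests behind Tables 1 and 2,
# using scipy. Inputs are per-user measurements for two cohorts.
from scipy.stats import mannwhitneyu

def compare_conditions(values_a, values_b, alpha=0.05):
    """Two-sided Mann-Whitney U test between two cohorts' measurements."""
    u_stat, p_value = mannwhitneyu(values_a, values_b, alternative="two-sided")
    return u_stat, p_value, p_value < alpha
```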