Intelligent Recommendations for Citizen Science Na’ama Dayan Kobi Gal Avi Segal Ben-Gurion University, Israel Ben-Gurion University, Israel Ben-Gurion University, Israel namadayan@gmail.com University of Edinburgh, U.K. avisegal@gmail.com kobig@bgu.ac.il Guy Shani Darlene Cavalier Ben-Gurion University, Israel Arizona State University, USA shanigu@bgu.ac.il darlene@scistarter.com ABSTRACT INTRODUCTION Citizen science refers to scientific research that is carried out by Citizen science engages people in scientific research by collecting, volunteers, often in collaboration with professional scientists. The categorizing, transcribing, or analyzing scientific data [3, 4, 10]. spread of the internet has significantly increased the number of These platforms offer thousands of different projects which ad- citizen science projects and allowed volunteers to contribute to vance scientific knowledge all around the world. Through citizen these projects in dramatically new ways. For example, SciStarter, science, people share and contribute to data monitoring and col- our partners in the project, is an online portal that offers more than lection programs. Usually this participation is done as an unpaid 3,000 affiliate projects and recruits volunteers through media and volunteer. Collaboration in citizen science involves scientists and re- other organizations, bringing citizen science to people. Given the searchers working with the public. Community-based groups may sheer size of available projects, finding the right project, which generate ideas and engage with scientists for advice, leadership, and best suits the user preferences and capabilities, has become a ma- program coordination. Interested volunteers, amateur scientists, jor challenge and is essential for keeping volunteers motivated students, and educators may network and promote new ideas to and active contributors. This paper addresses this challenge by advance our understanding of the world. Scientists can create a developing a system for personalizing project recommendations citizen-science program to capture more or more widely spread in the SciStarter ecosystem. We adapted several recommendation data without spending additional funding. Citizen-science projects algorithms from the literature based on collaborative filtering and may include wildlife-monitoring programs, online databases, visu- matrix factorization. The algorithms were trained on historical data alization and sharing technologies, or other community efforts. of users’ interactions in SciStarter as well as their contributions to For example, the citizen science portal SciStarter (scistarter.com), different projects. The trained algorithms were deployed in SciS- which also comprises our empirical methodology, includes over tarter in a study involving hundreds of users who were provided 3,000 projects, and recruits volunteers through media and other or- with personalized recommendations for projects they had not con- ganizations (Discover, the Girl Scouts, etc). As of July, 2020, there are tributed to before. Volunteers were randomly divided into different 82,014 registered users in SciStarter. Examples of popular projects cohorts, which varied the recommendation algorithm that was used on SciStarter include iNaturalist 1 in which users map and share to generate suggested projects. 
The results show that using the new observations of biodiversity across the globe; CoCoRaHS2 , where recommendation system led people to contribute to new projects volunteers share daily readings of precipitation; and StallCatchers 3 , that they had never tried before and led to increased participation where volunteers identify vessels in the brain as flowing or stalled. in SciStarter projects when compared to cohort groups that were Projects can be taken either online or at a specific physical region. recommended the most popular projects, or did not receive rec- Users visit SciStarter in order to discover new projects to participate ommendations, In particular, the cohort of volunteers receiving in and keep up to date with the community events. Figure 1 shows recommendations created by an SVD algorithm (matrix factoriza- the User Interface of SciStarter. tion) exhibited the highest levels of contributions to new projects, According to a report from the National Academies of Sciences, when compared to the other cohorts. A follow-up survey conducted Engineering, and Medicine [19], citizen scientists’ motivations are with the SciStarter community confirms that users were satisfied “strongly affected by personal interests,” and participants who en- with the recommendation tool and claimed that the recommen- gage in citizen science over a long period of time “have successive dations matched their personal interests and goals. Based on the opportunities to broaden and deepen their involvement.” Thus, sus- positive results, our recommendation system is now fully integrated tained engagement through the use of intelligent recommendations with SciStarter. The research has transformed how SciStarter helps can improve data quality and scientific outcomes for the projects projects recruit and support participants and better respond to their and the public. needs. Yet, finding the RIGHT project–one that matches interests and capabilities, is like searching for a needle in a haystack [5, 24]. Ponciano et al. [22] who characterized volunteers’ task execution Proceedings of the ImpactRS Workshop at ACM RecSys ’20, September 25, 2020, Virtual Event, Brazil. Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons 1 https://scistarter.org/seek-by-inaturalist License Attribution 4.0 International (CC BY 4.0). 2 https://scistarter.org/cocorahs-rain-hail-snow-network . 3 https://scistarter.org/stall-catchers-by-eyesonalz users’ interactions on the SciStarter portal, (e.g., searching for a project). The output of the algorithm is a function from user profile and past history of interactions on SciStarter to a ranking of 10 projects in order of inferred relevance for the user. We measured two types of user interactions, which were taken as the input to the algorithms: (1) Interactions with projects: data generated as a result of users’ activities with projects, e.g joining a project, making a contribution to a project or participating in a project. (2) Interactions on Scistarter portal, such as searching for a project, or filling a form about the project. The algorithm matches a user profile and his past history of interactions and outputs a ranking of 10 projects in decreasing order of relevance for each user. We conducted a randomized controlled study, in which hundreds Figure 1: SciStarter User Interface of registered SciStarter users were assigned recommendations by algorithms using different approaches to recommend projects. 
The first approach personalized projects to participants by using collab- patterns across projects and showed that volunteers tend to explore orative filtering algorithms (item-based and user-based), and matrix multiple projects in citizen science platforms, but they perform tasks factorization (SVD) algorithms. These algorithms were compared to regularly in just a few of them. This result is also reflected in users’ two non-personalized algorithms: the first algorithm recommended participation patterns in Scistarter. Figure 2 shows a histogram of the most popular projects at that point in time, and the second algo- the number of projects that users contribute to on the site between rithm recommended three projects that were manually determined 2017 and 2019. As shown by the Figure, the majority of active users by the SciStarter admins and custom to change during the study. in the SciStarter portal do not contribute to more than a single The results show that people receiving the personalized recommen- project. dations were more likely to contribute to new projects that they SciStarter employs a search engine (shown in Figure 3) that had never tried before and participated more often in these projects uses topics, activities, location and demographics (quantifiable when compared to participants who received non-personalized rec- fields) to suggest project recommendations. However, recommend- ommendations, or did not receive recommendations, In particular, ing projects based on this tool has not been successful. To begin the cohort of participants receiving recommendations created by with, our analysis shows that about 80% of users do not use the the SVD algorithm (matrix factorization) exhibited the highest lev- search tool. Second, those who use the search tool For example, els of contributions to new projects, when compared to the other when querying outdoor projects, the search engine recommends personalized groups. A follow-up survey conducted with the SciS- the CoCoRaHS project and Globe at Night, in which volunteers tarter community confirms that the Based on the positive results, measure and submit their night sky brightness observations. But our recommendation system is now fully integrated with SciStarter. data shows that people who join CoCoRaHS are more likely to join This research develops a recommendation system for citizen sci- Stall Catchers, an indoor, online project to accelerate Alzheimer’s ence domain. It is the first study using AI based recommendation research. algorithms in large scale citizen science platforms. We address this challenge by using recommendation algorithms to match individual volunteers with new projects based on the past history of their interactions on the site [2, 7]. Recommendation 1 RELATED WORK systems have been used in other domains, such as e-commerce, This research relates to past work in using AI to increase partic- news, and social media [8, 13]. However, the nature of interaction ipants’ motivation in citizen science research as well as work in in citizen Science is fundamentally different than these domains aplying recommendation systems in real world settings. We list in that volunteers are actively encouraged to contribute their time relevant work in each of these two areas. and effort to solve scientific problems. Compared to clicking on an advertisement or a product, as is the case for e-commerce and news sites, considerable more effort is required from a citizen science vol- 1.1 Citizen Science - Motivation and level of unteer. 
Our hypothesis was that personalizing recommendations to engagement users will increase their engagement in the SciStarter portal as mea- Online participation in citizen science projects has become very sured by the number of projects that they contribute to following common [21]. Yet, most of the contributions rely on a very small the recommendations, and the extent of their contributions. proportion of participants [25]. In SciStarter, the group of partici- We attempted to enhance participant engagement to SciStarter pants who contribute to more than 10 projects is less than 10% of projects by matching users with new projects based on past history all users. However, in most citizen science projects, the majority of their interactions on the site. We adopted 4 different recommen- of participants carry out only a few tasks. Many researches have dation algorithms to the citizen science domain. The input to the explored the incentives and motivations of participants in order to algorithms consists of data representing users’ interactions with increase participants engagement. Kragh et al. [15] showed that affiliated projects (e.g., joining or contributing to a project), and participants in citizen science projects are motivated by personal Figure 2: Distribution of user participation in SciStarter projects as well. Segal et al. [29] have developed an intelligent approach which combines model-based reinforcement learning with off-line policy evaluation in order to generate intervention policies which significantly increase users’ contributions. Laut et al. [17] have demonstrated how participants are affected by virtual peers and showed that participants’ contribution can be enhanced through the presence of virtual peers. Ponciano et al. [22] characterized volunteers’ task execution pat- terns across projects and showed that volunteers tend to explore multiple projects in citizen science platforms, but they perform tasks regularly in just a few of them. They have also showed that Figure 3: Screenshot of existing search tool showing various volunteers recruited from other projects on the platform tend to criteria get more engaged than those recruited outside the platform. This finding is a great incentive to increase user engagement in SciS- tarter’s platform instead of in the projects’ sites directly, like we do in our research. interest and desire to learn something new, as well as by the de- In this research, we attempted to enhance participant engage- sire to volunteer and contribute to science. A prior work of Raddic ment with citizen science projects by recommending the user projects et al. [23] also showed that participants engagement has mainly which best suit the user preferences and capabilities. originated in pure interest in the project topic, such as astronomy and zoology. Yet, as we tested this finding in our collected data, we noticed that user interest is very diverse and does not include only 1.2 Increasing user engagement with one major topic of interest. Nov et al. [21] explored the different recommendations motivations of users to contribute, by separating this question to Similar to our work, other researchers, also tried to increase user quantity of contribution and quality of contributions. They showed engagement and participation by personalized recommendations. that quantity of contribution is mostly determined by the user Labarthe et al. 
[16] built a recommender system for students in MOOCs that recommends relevant, high-potential contacts with other students, based on user profiles and activities. They showed that by recommending this list of contacts, students were much more likely to persist and engage in MOOCs. A subsequent work by Dwivedi et al. [7] developed a recommender system that recommends online courses to students based on their grades in other subjects. This recommender was based on collaborative filtering techniques, particularly item-based recommendations. The paper showed that users who interacted with the recommendation system increased their chance of finishing the MOOC by 270%, compared to users who did not interact with the recommendation system.
Other studies of user engagement with recommendation systems showed how early intervention significantly increases user engagement. Freyne et al. [9] showed that users who received early recommendations in social networks are more likely to continue returning to the site. They reported a clear difference in retention rate between the control group, which lost 42% of its users, and the group that interacted with the recommendations, which lost only 24% of its users.
Wu et al. [32] showed how tracking users' clicks and return behaviour in news portals can increase user engagement with a recommendation system. They formulated the optimization of long-term user engagement as a sequential decision-making problem, where a recommendation is based on both the estimated immediate user click and the expected clicks resulting from the user's future returns.
Lin et al. [18] developed a recommendation system for crowdsourcing which incorporates negative implicit feedback into a predictive matrix factorization model. They showed that their models, which consider negative feedback, produce better recommendations than the original MF approach with implicit feedback. They evaluated their findings in an experiment with data from Microsoft's internal Universal Human Relevance System and showed that the quality of task recommendations improved with their models. In our work, we use only positive implicit feedback, since user traffic is low and significant evidence of negative feedback is hard to find.
Recommendation algorithms are mostly evaluated by their accuracy. The underlying assumption is that accuracy will increase user satisfaction and ultimately lead to higher engagement and retention. However, past research has suggested that accuracy does not necessarily lead to satisfaction. Wu et al. [31] investigated whether popular approaches such as collaborative filtering and content-based filtering have different effects on user satisfaction. The results of that study suggested that product awareness (the set of products that the user is initially aware of before using any recommender system) plays an important role in moderating the impact of recommenders. In particular, if a consumer has a relatively niche awareness set, content-based systems are likely to garner more positive responses in terms of satisfaction; on the other hand, users who are more aware of popular items should be targeted with collaborative filtering systems instead. A subsequent work by Nguyen et al. [20] showed that individual users' preferences for the level of diversity, popularity, and serendipity in recommendation lists cannot be inferred from their ratings alone. The paper suggested that user satisfaction can be improved by integrating users' personality traits, obtained through a user study, into the process of generating recommendations.
Nov et al. [21] showed that the quantity of contribution is mostly determined by the user's interest in the project and by social norms, while the quality of contribution is determined by understanding the importance of the task and by the user's reputation. In our work, we aimed to increase only the quantity of contributions, since data about the quality of contributions is not available to us. Significant prior work has also been done to increase participant engagement while taking user motivation into consideration.
2 METHODOLOGY
Our goals for the research project were to (1) help users discover new projects in the SciStarter ecosystem by matching them with projects that suit their preferences; (2) learn user behavior in SciStarter and develop a recommendation system that helps increase the number of projects they contribute to; and (3) measure users' satisfaction with the recommendation system.
We adopted several canonical algorithms from the recommendation systems literature: user-based collaborative filtering [28], item-based collaborative filtering [28], matrix factorization [27], and popularity [1]. These approaches were chosen because they are all based on analyzing the interactions between users and items and do not rely on domain knowledge, which is lacking (such as a project's location, needed materials, or ideal age group). Each algorithm receives as input a target user and the number of recommendations to generate (N), and returns a ranking of the top N projects in decreasing order of relevance for the user. We provide additional details about each algorithm below.
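To make this shared contract concrete, here is a minimal sketch in Python of an interface a recommender could implement under the description above; the class and method names are illustrative assumptions and not the deployed SciStarter code.

```python
# Minimal sketch of the common recommender contract described above.
# Illustrative only (names and structure are assumptions, not the deployed SciStarter code).
from typing import Dict, List, Set


class Recommender:
    """Fit on a user -> interacted-projects table, then return top-N unseen projects."""

    def fit(self, interactions: Dict[str, Set[str]]) -> None:
        # interactions[user_id] = set of project ids the user joined or contributed to
        self.interactions = interactions
        self.all_projects = {p for projects in interactions.values() for p in projects}

    def score(self, user_id: str, project_id: str) -> float:
        raise NotImplementedError  # each algorithm defines its own relevance score

    def recommend(self, user_id: str, n: int = 10) -> List[str]:
        seen = self.interactions.get(user_id, set())
        candidates = self.all_projects - seen  # never recommend projects already tried
        ranked = sorted(candidates, key=lambda p: self.score(user_id, p), reverse=True)
        return ranked[:n]  # top-N in decreasing order of inferred relevance
```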
2.0.1 User-based Collaborative Filtering. In this algorithm, the recommendation is based on user similarity [28]. The ranking of a project for a target user is computed by comparing users who interacted with similar projects. We use a KNN algorithm [6] to find similar users, where the similarity score for user vectors $U_1$ and $U_2$ from the input matrix is calculated with cosine similarity:
$$\mathit{Similarity}(U_1, U_2) = \frac{U_1 \cdot U_2}{\|U_1\| \, \|U_2\|}$$
We chose the value of $K$ to be the minimal number such that the number of new projects in the neighborhood of users similar to the target user equaled the number of recommendations. In practice, $K$ was initially set to 100 and increased until this threshold was met, so that there would always be a sufficient number of projects to recommend to each user.
2.0.2 Item-based Collaborative Filtering. In this algorithm, the recommendation is based on project similarity [28]. The algorithm generates recommendations based on the similarity between projects, calculated from people's interactions with those projects. The similarity score for project vectors $P_1$ and $P_2$ from the input matrix is likewise calculated with cosine similarity:
$$\mathit{Similarity}(P_1, P_2) = \cos(\theta) = \frac{P_1 \cdot P_2}{\|P_1\| \, \|P_2\|}$$
The algorithm then recommends the top-N projects most similar to the set of projects the user has interacted with in the past.
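As an illustration of these two neighborhood-based methods, the following sketch scores projects over a binary user-by-project interaction matrix using the cosine-similarity definitions above. It is a reconstruction with assumed variable names and neighborhood handling, not the deployed SciStarter implementation.

```python
# Minimal sketch, assuming a binary user-by-project matrix R (users x projects).
# Illustrative only; not the deployed SciStarter code.
import numpy as np


def cosine_sim(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of A and the rows of B."""
    A_norm = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-12)
    B_norm = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-12)
    return A_norm @ B_norm.T


def user_based_scores(R: np.ndarray, user: int, k: int = 100) -> np.ndarray:
    """Score projects for `user` from the k most similar users (user-based CF)."""
    sims = cosine_sim(R[user:user + 1], R)[0]    # similarity of the target user to every user
    sims[user] = -1.0                            # exclude the target user from the neighborhood
    neighbors = np.argsort(sims)[::-1][:k]       # indices of the k nearest users
    scores = np.clip(sims[neighbors], 0, None) @ R[neighbors]  # similarity-weighted project counts
    scores[R[user] > 0] = -np.inf                # never re-recommend projects already tried
    return scores


def item_based_scores(R: np.ndarray, user: int) -> np.ndarray:
    """Score projects by their similarity to the projects the user already interacted with."""
    item_sims = cosine_sim(R.T, R.T)             # project-by-project cosine similarity
    scores = item_sims @ R[user]                 # total similarity to the user's past projects
    scores[R[user] > 0] = -np.inf
    return scores


# Example: top-10 user-based recommendations for user 0
# R = np.random.binomial(1, 0.05, size=(500, 153)).astype(float)
# top10 = np.argsort(user_based_scores(R, user=0))[::-1][:10]
```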
2.0.3 Matrix Factorization - SVD. The matrix factorization algorithm (SVD) directly predicts the relevance of a new project to a target user by modeling the user-project relationship [14, 27]. This model-based algorithm (as opposed to the two memory-based algorithms presented earlier) was chosen since it is one of the leading recommendation system algorithms [11, 14, 26]. SVD uses a matrix in which users are rows, projects are columns, and the entries are values that represent the relevance of the projects to the users. This user-project matrix is typically very sparse and has many missing values, since users engage with only a small portion of all available items.
The algorithm estimates the relevance of a target project for a user by maintaining a user model and a project model that include hidden variables (latent factors) that can affect how users choose items. These variables have no semantics; they are simply numbers in a matrix. In reality, aspects like gender, culture, and age may affect relevance, but we do not have access to them.
The singular value decomposition (SVD) of a matrix $R$ is a factorization of the form $R = U S V^T$. In recommendation systems, this factorization is used to estimate the original matrix $R$ as the product of the three matrices $U$, $S$, and $V^T$, and hence to predict the missing values in the matrix. As mentioned above, the matrix $R$ includes missing values because users did not participate in all projects; the estimated values reflect how satisfied a user would be with an unseen project. In this setting, $U$ is the left singular matrix, representing the relationship between users and latent factors; $S$ is a rectangular diagonal matrix with non-negative real numbers on the diagonal; and $V^T$ is the right singular matrix, indicating the relationship between items and latent factors. SVD reduces the dimension of the utility matrix $R$ by extracting its latent factors: it maps each user and item into a latent space with $r$ dimensions, which allows us to better understand the relationship between users and projects and to compare their vector representations. Let $\hat{R}$ be the estimate of the original matrix $R$. Given $\hat{R}$, which includes predictions for all the missing values in $R$, we can rank each project for a user by its score in $\hat{R}$; the projects with the highest scores are then recommended to the user. In our setting, as in the other algorithms described above, the input matrix $R$ is binary.
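For illustration, the sketch below computes a rank-r reconstruction $\hat{R}$ of a binary interaction matrix with a plain truncated SVD and ranks a user's unseen projects by their reconstructed scores. The library and the chosen rank are assumptions, since the text does not specify the deployed implementation.

```python
# Minimal sketch: rank projects for a user from a truncated-SVD reconstruction of a
# binary user-by-project matrix R. Illustrative only; rank and library are assumptions.
import numpy as np


def svd_scores(R: np.ndarray, user: int, rank: int = 20) -> np.ndarray:
    """Return the user's row of R-hat, the rank-`rank` approximation of R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)      # R = U @ diag(s) @ Vt
    U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank]     # keep the top latent factors
    R_hat = (U_r * s_r) @ Vt_r                            # low-rank estimate of R
    scores = R_hat[user].copy()
    scores[R[user] > 0] = -np.inf                         # rank only projects not yet tried
    return scores


# Example: top-10 SVD-based recommendations for user 0
# R = np.random.binomial(1, 0.05, size=(500, 153)).astype(float)
# top10 = np.argsort(svd_scores(R, user=0))[::-1][:10]
```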
3 RESULTS
The first part of the study compares the performance of the different algorithms on historical SciStarter data. The second part of the study implements the algorithms in the wild and actively assigns recommendations to users using the different algorithms.
Of the 3,000 existing projects SciStarter offers, 153 are affiliate projects. An affiliate project is one that uses a specific API to report back to SciStarter each time a logged-in SciStarter user has contributed or analyzed data on that project's website or app. As data on contributions and participation existed only for the affiliate projects, we used only these projects in the study.
3.1 Offline Study
The training set for all algorithms consisted of data collected between January 2012 and September 2019. It included 6,353 users who contributed to 127 different projects. For the collaborative filtering and SVD algorithms, we restricted the training set to users who made at least two activities during that time frame, whether contributing to a project or interacting on the SciStarter portal. We chronologically split the data into train and test sets such that the latest 10% of interactions from each user were selected for the test set and the remaining 90% of interactions were used for the train set. As a baseline, we also considered an algorithm that recommends projects in decreasing order of popularity, measured by the number of users who contribute to the project [1].
We evaluate the top-N recommendation results using precision and recall metrics with a varying number of recommendations. Figure 4 shows the precision and recall curves for the four examined algorithms. As can be seen from the figure, user-based collaborative filtering and SVD are the best algorithms, and their performance is higher than that of Popularity and item-based collaborative filtering. The Popularity recommendation algorithm exhibited the lowest performance.
Figure 4: Precision/Recall results on offline data
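The sketch below illustrates this offline protocol: a per-user chronological split that holds out the latest 10% of each user's interactions, and precision and recall at N for a ranked recommendation list. Function and variable names are illustrative, not the authors' evaluation code.

```python
# Minimal sketch of the offline protocol described above (illustrative, not the
# authors' evaluation code): hold out each user's latest 10% of interactions,
# then measure precision@N and recall@N of the recommended lists.
from typing import Dict, List, Tuple


def chronological_split(events: Dict[str, List[Tuple[int, str]]], test_frac: float = 0.1):
    """events[user] = [(timestamp, project_id), ...]; split each user's history by time."""
    train, test = {}, {}
    for user, history in events.items():
        history = sorted(history)                          # oldest interactions first
        cut = max(1, int(round(len(history) * (1 - test_frac))))
        train[user] = {p for _, p in history[:cut]}        # earliest 90% for training
        test[user] = {p for _, p in history[cut:]}         # latest 10% held out
    return train, test


def precision_recall_at_n(recommended: List[str], held_out: set, n: int) -> Tuple[float, float]:
    """Precision@N and recall@N for one user's ranked recommendation list."""
    top_n = recommended[:n]
    hits = len(set(top_n) & held_out)
    precision = hits / n if n else 0.0
    recall = hits / len(held_out) if held_out else 0.0
    return precision, recall
```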
3.2 Online Study
The second part of the study was an online experiment. Users who logged on to SciStarter starting on December 2nd, 2019 were randomly assigned to one of five cohorts, each receiving recommendations based on a different algorithm: (1) item-based collaborative filtering, (2) user-based collaborative filtering, (3) matrix factorization (SVD), (4) most popular projects, and (5) promoted projects. Promoted projects were manually determined by SciStarter and often aligned with social initiatives and current events. Among these projects are GLOBE Observer Clouds4, Stream Selfie5 and TreeSnap6. Another example is FluNearYou, in which individuals report flu symptoms online; it was one of the promoted projects during the COVID-19 outbreak. These projects are changed periodically by the SciStarter administrators.
4 https://scistarter.org/globe-observer-clouds
5 https://scistarter.org/stream-selfie
6 https://scistarter.org/treesnap
The recommendation tool was active on SciStarter for three months. Users who logged on during that time were randomly divided into cohorts, each receiving recommendations from a different algorithm; each cohort had 42 or 43 users. The recommendations were embedded in the user's dashboard in decreasing order of relevance, in sets of three, from left to right, and users could scroll to reveal more projects in decreasing or increasing order of relevance. Figure 5 shows the top three recommended projects for a target user.
Figure 5: Screenshot of recommendation tool
All registered SciStarter users received an email notification about the study, stating that the "new SciStarter AI feature provides personalized recommended projects based on your activity and interests." It linked to a blog post containing more detailed explanations of the recommendation algorithms and their role in the study, emphasizing that "all data collected and analyzed during this experiment on SciStarter will be anonymized." Users were also allowed to opt out of receiving recommendations at any point by clicking the link "opt out from these recommendations" in the recommendation panel. In practice, none of the participants selected the opt-out option at any point in time.
Figure 6: Click through rate (top) and Hit rate (bottom) measures for online study
Figure 6 (top) shows the average click-through rate (defined as the ratio of recommended projects that the users accessed) and Figure 6 (bottom) shows the average hit rate (defined as the percentage of instances in which users accessed at least one project that was recommended to them). As shown by the figure, both measures exhibit a consistent trend, in which the user-based collaborative filtering algorithm achieved the best performance while the baseline method achieved the worst. Despite the trend, the differences between conditions were not statistically significant at the p < 0.05 level. We attribute this to the fact that we measured clicks on recommended projects rather than actual contributions, which are the most important aspect for citizen science.
To address this gap, we defined two measures that consider the contributions made by participants to projects and thus capture system utility, as identified by Gunawardana and Shani [12]: the average number of activities that users carried out in recommended projects (RecE), and the average number of activities that users carried out in non-recommended projects (NoRecE).
Figure 7: Average activities on recommended projects (RecE), and on non-recommended projects (NoRecE) for each condition
Figure 7 compares the different algorithms according to these two measures. The results show that users assigned to the intelligent recommendation conditions performed significantly more activities in recommended projects than those assigned to the Popularity and Baseline conditions. Also, users in the SVD condition performed significantly fewer activities in non-recommended projects than those in the Popularity and Baseline conditions. These results were statistically significant according to Mann-Whitney tests (see the Appendix for details).
Lastly, we measured the average number of sessions for users in the different conditions, where a session is defined as a continuous length of time in which the user is active in a project. Figure 8 shows the average number of sessions for users in the different cohorts, including the number of sessions in the historical data used to train the algorithms, in which no recommendations were provided. The results show that users receiving recommendations from the personalized algorithms performed more sessions than users in the historical data, and these results are statistically significant. Although there is a clear trend that users in the SVD condition achieved the highest number of sessions, these results were not significant at the p < 0.05 level.
To explain SVD's good performance in the online study, we note first that SVD is considered a leading algorithm in the domain of recommendation systems [11, 26]. Second, in our setting SVD tended to generate recommendations that participants had not heard about before, which the survey reveals to be more interesting to them. Participants remarked, for example: "I did not click on either project because I have looked at both projects (several times) previously" and "I am more interested in projects I didn't know exists before".
(3) Users who feel that the recommendations are not suitable for their interests: "No interest in stall catchers", "The photos and title didn't perfectly match what I am looking for". The survey provided evidence for the positive impact of using the recommendation system in SciStarter, including the following comments: "I am very impressed by the new Artificial Intelligence feature from SciStarter! Your AI feature shows me example projects that I didn't know before exist", and "I like how personalized recommendations are made for citizen science users".
Lastly, we note the obstacles we encountered when carrying out the study. The first obstacle was the small number of relevant projects that could be recommended.
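For reference, the following sketch shows how the engagement measures used in this section (click-through rate, hit rate, and the RecE/NoRecE activity counts) could be computed from interaction logs; the data structures and field names are illustrative assumptions rather than the study's analysis code.

```python
# Minimal sketch (illustrative field names, not the study's analysis code) of the
# engagement measures used above: click-through rate, hit rate, and the average
# number of activities on recommended (RecE) vs. non-recommended (NoRecE) projects.
from typing import Dict, List, Set


def click_through_rate(recommended: Dict[str, Set[str]], clicked: Dict[str, Set[str]]) -> float:
    """Fraction of recommended projects that users actually accessed."""
    shown = sum(len(p) for p in recommended.values())
    hits = sum(len(recommended[u] & clicked.get(u, set())) for u in recommended)
    return hits / shown if shown else 0.0


def hit_rate(recommended: Dict[str, Set[str]], clicked: Dict[str, Set[str]]) -> float:
    """Fraction of users who accessed at least one recommended project."""
    users = list(recommended)
    hits = sum(1 for u in users if recommended[u] & clicked.get(u, set()))
    return hits / len(users) if users else 0.0


def rec_e_norec_e(recommended: Dict[str, Set[str]],
                  activities: Dict[str, List[str]]) -> Dict[str, float]:
    """Average per-user activity counts on recommended vs. non-recommended projects."""
    rec_counts, norec_counts = [], []
    for u, recs in recommended.items():
        acts = activities.get(u, [])
        rec_counts.append(sum(1 for p in acts if p in recs))
        norec_counts.append(sum(1 for p in acts if p not in recs))
    n = len(recommended) or 1
    return {"RecE": sum(rec_counts) / n, "NoRecE": sum(norec_counts) / n}
```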
4 CONCLUSION AND FUTURE WORK Figure 8: Average number of sessions for each condition This work reports on the use of recommendation algorithms to increase engagement of volunteers in citizen science, in which vol- unteers collaborate with researchers to perform scientific tasks. These recommendation algorithms were deployed in SciStarter, a portal with thousands of citizen science projects, and were eval- almost 3000 projects that SciStarter offers, we restricted ourselves uated in an online study involving hundreds of users who were to about 120 projects are affiliate projects which actively provide informed about participating in a study involving AI based recom- data of users’ interactions. Another obstacle was that we were mendation of new projects. We trained different recommendation constrained to a subset of users who log on to SciStarter and use algorithms using a combination of data including users’ behavior in it as a portal to contributing to the project, rather than accessing SciStarter as well as their contributions to the specific project. Our the project directly. Out of the 65,000 registered users of SciStarter, results show that using the new recommendation system led people only a small percentage are logged in to both SciStarter and an to contribute to new projects that they had never tried before and affiliate project. As a result, we have relatively few users getting led to increased participation in SciStarter projects when compared recommendations. In addition, some of SciStarter’s projects are to a baseline cohort group that did not receive recommendations. location-specific and can only be done by users in the same physical The outcome of this research project is the AI-powered Recommen- location. (e.g collecting a water sample from a particular lake located dation Widget which has been fully deployed in SciStarter. This in a particular city). Therefore, we kept track of users’ location project has transformed how SciStarter helps projects recruit and and restricted our recommendation system to be a location-based support participants and better respond to their needs. It was so system, which recommends users with projects they are able to successful in increasing engagement, that SciStarter has decided to participate in. make the widget a permanent feature of their site. This will help support deeper, sustained engagement to increase the collective intelligence capacity of projects and generate improved scientific, 3.3 User Study learning, and other outcomes. The results of this research have In order to learn what is the users’ opinion on the recommenda- been featured on the DiscoverMagazine.com 7 . While we observed tions, and their level of satisfaction, we conducted a survey with significant engagement with the recommendation tool, one may SciStarter’s users. Our survey was sent to all SciStarter commu- consider adding explanations to the recommendations in order nity users. 138 users have filled the survey, where each user was to increase the system’s reliability and user’s satisfaction with it. asked about the recommendations presented to him by the algo- Moreover, we plan to extend the recommendation system to include rithm he was assigned to. The survey included questions about content based algorithms, and test its performance as compared users’ overall satisfaction with the recommendation tool as well to the existing algorithms. 
We believe that integrating content in as questions about their pattern of behavior before and after the Citizen Science domain can be very beneficial. Even though users recommendations. The majority of users (75%) were very satisfied tend to participate in a variety of different projects, we want to be with the recommendation tool and claimed that the recommenda- able to capture more intrinsic characteristic of the projects, such as tions matched their personal interests and goals. The majority of the type of the task a user has to perform or the required effort. users (54%) reported they have clicked on the recommendations and visited the project’s site, while only 8% of users did not click the REFERENCES recommendation or visited the project site. Interestingly, users who [1] Hyung Jun Ahn. 2006. Utilizing popularity characteristics for product recom- were not familiar with the recommended projects before, clicked mendation. International Journal of Electronic Commerce 11, 2 (2006), 59–80. [2] Xavier Amatriain. 2013. Big & personal: data and models behind netflix recom- more on the recommendations, as well as users who previously mendations. In Proceedings of the 2nd international workshop on big data, streams performed a contribution to a project. and heterogeneous source Mining: Algorithms, systems, programming models and Users who did not click on the recommendations can be divided applications. ACM, 1–6. [3] Rick Bonney, Caren B Cooper, Janis Dickinson, Steve Kelling, Tina Phillips, into 3 main themes: (1) Users who don’t have the time right now or Kenneth V Rosenberg, and Jennifer Shirk. 2009. Citizen science: a developing will click the project in the future. (2) Users who feel that the recom- tool for expanding science knowledge and scientific literacy. BioScience 59, 11 mendations are not suitable for their skills and materials: "Seemed (2009), 977–984. out of my league", "I didn’t have the materials to participate". This 7 https://www.discovermagazine.com/technology/ai-powered-smart-project- behaviour was also discussed in [30], and was named "classification recommendations-on-scistarter [4] Dominique Brossard, Bruce Lewenstein, and Rick Bonney. 2005. Scientific knowl- International Conference on World Wide Web. 331–337. edge and attitude change: The impact of a citizen science project. International [31] Ling-Ling Wu, Yuh-Jzer Joung, and Jonglin Lee. 2013. Recommendation sys- Journal of Science Education 27, 9 (2005), 1099–1121. tems and consumer satisfaction online: moderating effects of consumer product [5] Hillary K Burgess, LB DeBey, HE Froehlich, Natalaie Schmidt, Elli J Theobald, awareness. In 2013 46th Hawaii International Conference on System Sciences. IEEE, Ailene K Ettinger, Janneke HilleRisLambers, Joshua Tewksbury, and Julia K 2753–2762. Parrish. 2017. The science of citizen science: Exploring barriers to use as a [32] Qingyun Wu, Hongning Wang, Liangjie Hong, and Yue Shi. 2017. Returning primary research tool. Biological Conservation 208 (2017), 113–120. is believing: Optimizing long-term user engagement in recommender systems. [6] Sahibsingh A Dudani. 1976. The distance-weighted k-nearest-neighbor rule. IEEE In Proceedings of the 2017 ACM on Conference on Information and Knowledge Transactions on Systems, Man, and Cybernetics 4 (1976), 325–327. Management. ACM, 1927–1936. [7] Surabhi Dwivedi and VS Kumari Roshni. 2017. Recommender system for big data in education. In 2017 5th National Conference on E-Learning & E-Learning Technologies (ELELTECH). IEEE, 1–4. 
[8] Daniel M Fleder and Kartik Hosanagar. 2007. Recommender systems and their impact on sales diversity. In Proceedings of the 8th ACM conference on Electronic commerce. ACM, 192–199.
[9] Jill Freyne, Michal Jacovi, Ido Guy, and Werner Geyer. 2009. Increasing engagement through early recommender intervention. In Proceedings of the third ACM conference on Recommender systems. ACM, 85–92.
[10] Cary Funk, Jeffrey Gottfried, and Amy Mitchell. 2017. Science news and information today. Pew Research Center (2017).
[11] Stephen Gower. 2014. Netflix prize and SVD.
[12] Asela Gunawardana and Guy Shani. 2009. A survey of accuracy evaluation metrics of recommendation tasks. Journal of Machine Learning Research 10, 12 (2009).
[13] J Itmazi and M Gea. 2006. The recommendation systems: Types, domains and the ability usage in learning management system. In Proceedings of the International Arab Conference on Information Technology (ACIT'2006), Yarmouk University, Jordan.
[14] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
[15] Gitte Kragh. 2016. The motivations of volunteers in citizen science. Environmental Scientist 25, 2 (2016), 32–35.
[16] Hugues Labarthe, François Bouchet, Rémi Bachelet, and Kalina Yacef. 2016. Does a Peer Recommender Foster Students' Engagement in MOOCs? International Educational Data Mining Society (2016).
[17] Jeffrey Laut, Francesco Cappa, Oded Nov, and Maurizio Porfiri. 2017. Increasing citizen science contribution using a virtual peer. Journal of the Association for Information Science and Technology 68, 3 (2017), 583–593.
[18] Christopher H Lin, Ece Kamar, and Eric Horvitz. 2014. Signals in the silence: Models of implicit feedback in a recommendation system for crowdsourcing. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
[19] National Academies of Sciences, Engineering, and Medicine. 2018. Learning through citizen science: enhancing opportunities by design. National Academies Press.
[20] Tien T Nguyen, F Maxwell Harper, Loren Terveen, and Joseph A Konstan. 2018. User personality and user satisfaction with recommender systems. Information Systems Frontiers 20, 6 (2018), 1173–1189.
[21] Oded Nov, Ofer Arazy, and David Anderson. 2014. Scientists@Home: what drives the quantity and quality of online citizen science participation? PloS one 9, 4 (2014), e90375.
[22] Lesandro Ponciano and Thiago Emmanuel Pereira. 2019. Characterising volunteers' task execution patterns across projects on multi-project citizen science platforms. In Proceedings of the 18th Brazilian Symposium on Human Factors in Computing Systems. ACM, 16.
[23] M Jordan Raddick, Georgia Bracey, Pamela L Gay, Chris J Lintott, Phil Murray, Kevin Schawinski, Alexander S Szalay, and Jan Vandenberg. 2009. Galaxy zoo: Exploring the motivations of citizen science volunteers. arXiv preprint arXiv:0909.2925 (2009).
[24] Francesco Ricci, Lior Rokach, and Bracha Shapira. 2015. Recommender systems: introduction and challenges. In Recommender systems handbook. Springer, 1–34.
[25] Dana Rotman, Jenny Preece, Jen Hammock, Kezee Procita, Derek Hansen, Cynthia Parr, Darcy Lewis, and David Jacobs. 2012. Dynamic changes in motivation in collaborative citizen-science projects. In Proceedings of the ACM 2012 conference on computer supported cooperative work. 217–226.
[26] Rowayda A Sadek. 2012. SVD based image processing applications: state of the art, contributions and research challenges. arXiv preprint arXiv:1211.7102 (2012).
[27] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2002. Incremental singular value decomposition algorithms for highly scalable recommender systems. In Fifth international conference on computer and information science, Vol. 1. Citeseer.
[28] J Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. 2007. Collaborative filtering recommender systems. In The adaptive web. Springer, 291–324.
[29] Avi Segal, Kobi Gal, Ece Kamar, Eric Horvitz, and Grant Miller. 2018. Optimizing Interventions via Offline Policy Evaluation: Studies in Citizen Science. In Thirty-Second AAAI Conference on Artificial Intelligence.
[30] Avi Segal, Ya'akov Gal, Robert J Simpson, Victoria Homsy, Mark Hartswood, Kevin R Page, and Marina Jirotka. 2015. Improving productivity in citizen science through controlled intervention. In Proceedings of the 24th International Conference on World Wide Web. 331–337.
A APPENDIX
A.1 Significance tests - number of activities
A Mann-Whitney test was conducted to compare each pair of conditions in the online experiment. Table 1 presents the results of the pairwise tests for the measures RecE and NoRecE that are significant.
Condition1    Condition2   U      n1  n2  DV      Significant
CFUserUser    Popularity   473.5  43  43  RecE    Yes
CFUserUser    Baseline     406.0  43  43  RecE    Yes
CFItemItem    Popularity   458.5  43  43  RecE    Yes
CFItemItem    Baseline     396.0  43  43  RecE    Yes
SVD           Popularity   433.0  42  43  RecE    Yes
SVD           Baseline     371.5  42  43  RecE    Yes
SVD           CFItemItem   731.0  42  43  RecE    Yes
SVD           Baseline     729.0  42  43  NoRecE  Yes
Table 1: Online metrics - Mann-Whitney significance tests with p < 0.05. DV = Dependent Variable
A.2 Significance tests - number of sessions
A Mann-Whitney test was conducted to compare each pair of conditions in the online experiment, including the historical data used to train the algorithms, called past-data. Table 2 presents the results of the pairwise tests that are significant.
Condition1    Condition2   U       n1  n2   Significant
CFUserUser    Past-Data    5898.0  43  557  Yes
CFItemItem    Past-Data    6502.0  43  557  Yes
SVD           Past-Data    7284.0  42  557  Yes
Popularity    Past-Data    6683.5  43  557  Yes
Baseline      Past-Data    6978.5  43  557  Yes
Table 2: Number of sessions in SciStarter - Mann-Whitney significance tests with p < 0.05
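The statistics in Tables 1 and 2 are pairwise Mann-Whitney U tests. As an illustration of how such tests could be reproduced, the sketch below runs pairwise comparisons with SciPy; the condition names, data, and the two-sided alternative are assumptions, not the authors' analysis script.

```python
# Minimal sketch (not the authors' analysis script) of pairwise Mann-Whitney tests
# like those reported in Tables 1 and 2; condition names and data are illustrative.
from itertools import combinations
from scipy.stats import mannwhitneyu


def pairwise_mannwhitney(samples: dict, alpha: float = 0.05):
    """samples maps a condition name to a list of per-user measurements (e.g., RecE)."""
    rows = []
    for cond1, cond2 in combinations(samples, 2):
        # two-sided alternative is an assumption; the paper does not state the direction
        u_stat, p_value = mannwhitneyu(samples[cond1], samples[cond2], alternative="two-sided")
        rows.append((cond1, cond2, u_stat, len(samples[cond1]), len(samples[cond2]), p_value < alpha))
    return rows  # (Condition1, Condition2, U, n1, n2, Significant)


# Example with made-up per-user activity counts:
# results = pairwise_mannwhitney({"SVD": [3, 5, 2], "Popularity": [1, 0, 2], "Baseline": [0, 1, 1]})
```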