=Paper=
{{Paper
|id=Vol-2747/paper16
|storemode=property
|title=Development, reuse, and repurposing of software artifacts in Digital Citizen Science. Are we reinventing the wheel?
|pdfUrl=https://ceur-ws.org/Vol-2747/paper16.pdf
|volume=Vol-2747
|authors=Alejandra Beatriz Lliteras,Diego Torres,Cesar Alberto Collazos, Alejandro Fernandez
}}
==Development, reuse, and repurposing of software artifacts in Digital Citizen Science. Are we reinventing the wheel?==
Alejandra B. Lliteras<sup>1,2</sup>, Diego Torres<sup>1,2,3</sup>, César A. Collazos<sup>4</sup>, Alejandro Fernandez<sup>1,2</sup>

<sup>1</sup> UNLP, Facultad de Informática, LIFIA. La Plata, Buenos Aires, Argentina.
<sup>2</sup> CICPBA, Buenos Aires, Argentina.
<sup>3</sup> UNQ, Dto. CyT.
<sup>4</sup> IDIS research group, Universidad del Cauca-Colombia.

{alejandra.lliteras, diego.torres, alejandro.fernandez}@lifia.info.unlp.edu.ar, ccollazo@unicauca.edu.co

Abstract. In the production of software artifacts, it is possible to start from scratch, reuse existing artifacts, or repurpose artifacts produced with another purpose in mind. As an application domain matures, development from scratch and repurposing often give way to reuse. Reuse not only reduces time and costs but also acts as a mechanism to encapsulate and disseminate the knowledge of domain experts. Since software is central to mediating the participation of volunteers in digital citizen science, one would expect to see many developments built on reusable artifacts. However, reuse is rare today. Through a systematic review, we study the software production strategies reported during the last decade in citizen science projects. We observe that a large share of development is still done from scratch, so we open the debate on the usefulness of designing reuse processes focused on reusers to promote this strategy.

Keywords: Software Engineering, Development, Reuse, Repurpose, Digital Citizen Science

===1 Introduction===
Citizen Science is a way of carrying out research projects in which "volunteers", "citizens" or "citizen scientists" play an important part [1]. The volunteers involved in Citizen Science projects form a community and carry out tasks that could not be carried out solely by experts or through computational methods [2].
In this way, human intelligence is intertwined with the resolving power of computers. Although the participation of volunteers in science is not a new practice, what makes it interesting today is the scale and the forms of participation that modern Citizen Science proposes with the support of ICTs. Volunteers may have different motivations, profiles, and goals, and come from diverse cultural contexts and languages. According to Skarlatidou et al. [3], it is very important that these volunteers feel confident and satisfied with the technology they use.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

It is possible to approach Citizen Science from different disciplines, such as Social Computing, Software Engineering and Human-Computer Interaction [4]. According to Celino et al. [5], it can also be approached from different perspectives, such as social, socioeconomic and technological. In particular, when Citizen Science adopts information and communication technology as a pillar, it is called Digital Citizen Science [6]. From now on we use the acronym DCC to denote Digital Citizen Science.

Software plays a central role in DCC. In a DCC project, volunteers perform actions such as collecting, classifying and analyzing samples, or solving new challenges, with the support of information technologies. For example, taking and submitting photos relevant to the domain of interest (as done in the iNaturalist<sup>1</sup> project) is one way of collecting. Assigning a sample to predefined classes (as is done in GalaxyZoo<sup>2</sup>) is one way to classify. Looking closely at a sample to produce annotations that record its characteristics (as is done in Worlds of Wonder<sup>3</sup>) is one way to analyze. Using a game to fold proteins (as is done in FoldIt<sup>4</sup>) or to map neurons (as is done in EyeWire<sup>5</sup>) are particular examples of solving new challenges.
From the perspective of Software Engineering, there are various mechanisms to reduce the effort of creating applications and improve their quality. From the perspective of Human-Computer Interaction, in turn, various strategies can be considered during development so that users feel confident using the proposed applications. In general, for Walton & Maiden [7], software reuse is a way of increasing productivity and reducing the time to obtain a technological solution. In this work, we ask whether, in DCC projects, more software artifacts are reused than specifically developed, as the current growth of software reuse in general, described in [8], would suggest. To answer this question, a survey and subsequent bibliographic analysis were carried out, which consisted of identifying, for each article, which of the strategies for obtaining software was applied.

The present work is organized as follows. Section 2 describes software as a relevant dimension in Citizen Science. Section 3 presents the framework for the survey of software artifacts that mediate the participation of volunteers in Citizen Science projects, in relation to the strategies used to obtain them and the actions carried out by volunteers. Section 4 shows the results of the survey, Section 5 proposes a discussion and, finally, Section 6 presents the conclusions.

<sup>1</sup> https://www.inaturalist.org/
<sup>2</sup> https://www.zooniverse.org/projects/zookeeper/galaxy-zoo/
<sup>3</sup> https://www.zooniverse.org/projects/lbeiermann/worlds-of-wonder
<sup>4</sup> https://fold.it/
<sup>5</sup> https://eyewire.org/explore

===2 Software as a dimension in Digital Citizen Science projects===
In Software Engineering there are strategies to reduce the effort of creating applications and improve their quality. The reuse of knowledge about the domain, in the form of design patterns or best practices, reduces design effort and improves the quality of solutions.
Code reuse, in the form of libraries, services, and frameworks, reduces development effort. In general, and according to Walton & Maiden [7], software reuse is a way to increase productivity and quality, as well as to reduce the time to obtain a technological solution. Configurable and adaptable software, while more complex to build, multiplies reusability while eliminating the need to involve expert programmers. This saves time and resources, so that effort can be invested in the specific aspects of the Citizen Science project rather than in the technology that supports it [9].

In DCC, software artifacts mediate the actions taken by volunteers. This work identifies three strategies for obtaining these artifacts: "Development", "Reuse" and "Repurpose". Each is described next.

The "Development" strategy involves developing a new software artifact with the primary (and possibly only) intention of supporting a specific Citizen Science project. With this strategy, a new artifact may be created from scratch or through what Taivalsaari et al. [8] call "ad hoc reuse", that is, development using libraries or components that are not specific to Citizen Science. This strategy involves extensive coding.

The "Reuse" strategy, in this work, covers the use (for example, as services) or the deployment of already available artifacts that were conceived for another Citizen Science project. Luna et al. [10] mention the reuse of existing applications that require little customization, which avoids creating a new application from scratch. According to Varnell-Sarjeant & Andrews [11], there are several software reuse strategies.

The "Repurpose" strategy, on the other hand, refers to the adaptation or application of general-purpose software to support some technological aspect of a Citizen Science project, that is, artifacts that were not specifically designed for DCC but are applied to a project.
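The three strategy definitions above can be read as a small decision rule. The following Python sketch is only an illustration of that reading: the `Artifact` fields (`preexisting`, `built_for_citizen_science`) and the sample artifacts are hypothetical names introduced here, not part of the survey protocol.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    DEVELOPMENT = "Development"
    REUSE = "Reuse"
    REPURPOSE = "Repurpose"


@dataclass(frozen=True)
class Artifact:
    """Hypothetical description of a software artifact used in a project."""
    name: str
    preexisting: bool                # available before the project started
    built_for_citizen_science: bool  # originally conceived for Citizen Science


def classify(artifact: Artifact) -> Strategy:
    """Map an artifact to one of the three strategies defined in the text."""
    if not artifact.preexisting:
        # A new artifact coded for this project, either from scratch or on
        # top of general-purpose libraries ("ad hoc reuse").
        return Strategy.DEVELOPMENT
    if artifact.built_for_citizen_science:
        # An existing artifact conceived for another Citizen Science project.
        return Strategy.REUSE
    # An existing general-purpose artifact applied to the project.
    return Strategy.REPURPOSE


# Illustrative inputs only; they are not taken from the surveyed articles.
examples = [
    Artifact("project-specific mobile app", False, True),
    Artifact("existing citizen science platform", True, True),
    Artifact("general-purpose form builder", True, False),
]
labels = [classify(a).value for a in examples]
```

A real classification would rest on reading each article; two boolean flags are a deliberate simplification, and a project may combine several artifacts obtained through different strategies.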
Bagnolini et al. [12] present "BiodiverCity", an application that allows volunteers to take and send photographs, together with the position where each sample was captured, in order to investigate biodiversity. The goal is to register animals and plants on campus. BiodiverCity was developed specifically for this project.

Hsu et al. [13] propose a DCC project to address the problem of air pollution in a community. For this, the volunteers propose different hardware and software artifacts. Google Forms is one of the software artifacts adopted in the project so that volunteers can upload smell reports. In this project, a general-purpose application (Google Forms) is repurposed to support a particular DCC project.

Simpson et al. [14], in turn, present a web platform called Zooniverse, where volunteers analyze existing audio, image or video samples. With this platform, volunteers can identify, mark, tag or classify submitted samples. Zooniverse acts as a portfolio of DCC projects and offers authoring tools to create classification and analysis projects that follow a common methodology. In addition to the web platform, the creators of Zooniverse provide access to the project's source code under an open source license. Zooniverse thus encourages reuse of both the authoring tool and its source code.

===3 Survey on the Development, Reuse and Repurposing of software artifacts that mediate interaction===
The objective of the survey in this work is to find computer science articles, written in English (at least their title, abstract and keywords), that were published until May 2019. Two main search terms were considered: "Citizen Science" and "Software Engineering". For "Software Engineering", the following first-level derived terms were considered: "Software process", "Software design" and "Software implementation". These derived terms are some of the activities of the software development process described in ISO 12207.
Finally, for each of the terms mentioned above, second-level derived terms were established. The search string was formed by joining the terms with the AND operator and joining the first- and second-level terms (synonyms) with the logical OR operator, as proposed in [15]. Parentheses were used to separate the logic of each level of terms in the search.

The data sources considered were Scopus and IEEE Xplore. The articles included in this study come from conferences, journals, workshops and book chapters. The articles returned by the search in these data sources were analyzed to determine their inclusion. The inclusion criterion covers articles that use a software production strategy for digital Citizen Science projects and, in particular, those in which the volunteer performs some action using the produced software artifact (the artifact mediates the action of the person). Both the search string and the articles analyzed can be consulted in [16].

===4 Results===
To obtain the articles analyzed in this work, the following steps were carried out: 1) search in bibliographic sources, 2) elimination of duplicates, 3) reading of the title, abstract and keywords, 4) complete reading of the articles. The number of articles obtained in each of these steps is shown in Fig. 1.

Fig. 1: Result set

This work considers articles from journals and conferences published until May 30, 2019. The number of publications in the area of Computer Science related to the development of software to be used by a volunteer has varied over the years. The first publication returned by the search appeared in 2010. Fig. 2 shows the variation in the number of articles from 2010 to 2019, broken down into conference and journal articles for each year of publication.

Fig. 2: Distribution of the articles over the surveyed years

Once the articles were quantified by type and by year of publication, the following question was addressed: how often does each identified strategy appear? To answer it, the number of articles was determined for each of the three strategies described above (Development, Repurpose and Reuse). Fig. 3 shows the numbers obtained for each of them, bearing in mind that, in some cases, the software artifacts combine strategies.

Fig. 3: Distribution of strategies for obtaining software artifacts

Fig. 3 shows that the most used strategy to obtain software artifacts is development. This result led us to analyze how the use of each strategy behaved over time, since an emerging hypothesis was that most development of these artifacts occurred in the first years and that, as the years passed and greater maturity was achieved, repurpose and reuse would increase. Fig. 4 shows the distribution of the articles, separated by strategy, over the years covered by the study.

Fig. 4: Evolution of strategies over time

Fig. 4 shows that the development strategy currently prevails over the other two. This contradicts the original hypothesis, under which development was expected to be concentrated in the first years. The reuse strategy did not increase over the years but remained stable. These values are surprising because they do not follow the trend in the reuse of software artifacts indicated by Taivalsaari et al. [8]. In particular, reuse appears in works [10], [17], [18] and [19]. Finally, regarding repurpose, a slight growth is detected in the last year.
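The counts behind Fig. 3 and Fig. 4 amount to tallying strategies overall and per publication year, with an article counted once for each strategy it combines. A minimal Python sketch of such a tally follows; the `articles` records are made-up placeholders rather than the surveyed data (which is available in [16]), and the `tally` helper is a hypothetical name.

```python
from collections import Counter, defaultdict

# Hypothetical records: (publication year, strategies reported in the article).
# These values are placeholders and do NOT reproduce the survey results.
articles = [
    (2010, {"Development"}),
    (2014, {"Development", "Repurpose"}),  # an article combining strategies
    (2018, {"Reuse"}),
    (2018, {"Development"}),
]


def tally(records):
    """Count strategy occurrences overall and per publication year."""
    totals = Counter()
    by_year = defaultdict(Counter)
    for year, strategies in records:
        for strategy in strategies:
            totals[strategy] += 1
            by_year[year][strategy] += 1
    return totals, by_year


totals, by_year = tally(articles)
```

Because combined strategies are counted once each, the strategy totals can add up to more than the number of articles, which matters when reading a chart such as Fig. 3.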
The repurposed artifacts are SMS [20], [21], Twitter [22], [23], Facebook [24] and Google documents [13].

Last but not least, the artifacts were analyzed according to the actions carried out by the volunteers. Fig. 5 shows the results.

Fig. 5: Actions carried out by volunteers and mediated by software artifacts

Fig. 5 shows the great prevalence of artifacts with which the volunteer collects data.

===5 Discussion===
Reusing software artifacts makes it possible to take advantage of the knowledge that they encapsulate. Additionally, developing artifacts for later reuse requires, on the one hand, a high level of knowledge of the domain for which they are conceived [25] and, on the other, the knowledge of technology experts (for example, software engineers) to propose a coherent software solution [26]. When developing a new software artifact, it is also necessary to consider how people will interact with the artifact, and its usability [27].

The development of software to support a DCC project can be viewed along two dimensions: support for the methodological aspects, and support for the community of people in the project. From a methodological perspective, a software artifact incorporates knowledge about, for example, how to collect samples for a specific domain and how to guide the user to make a valuable contribution to the project. From the community perspective, the software artifact incorporates knowledge of how to carry out recruitment, training and retention [22]. In this way, technological support cuts across both dimensions.

Under the premise that software components incorporate knowledge, the implications (advantages and disadvantages) of building them from scratch must be analyzed.
For example, one may wonder what it would take to develop a communication tool (for example, a social network) from scratch to support communication between members of a community. Many tools for this purpose already exist; they are widely used by the general public, thoroughly proven, and conceived by multidisciplinary teams of experts who contributed their knowledge. Adopting one of these general-purpose tools takes advantage of all that built-in knowledge, and it is also very likely that the people who join the project will already know how to use it. Conversely, not adopting a pre-existing tool has the advantage of the flexibility and customization that ad hoc development for the project's domain provides, but the disadvantage of losing the knowledge already incorporated in this type of tool.

When software is developed specifically for each project, the loss of interoperability must be analyzed, as well as the resulting difficulty of integrating with other projects to build a possible collaboration network in DCC. Another problem that can emerge with new software development is the technological and economic capacity needed to store the large volumes of data of each project, together with the retrieval techniques for these data, which end up tied to the particular design, making reuse in other projects impossible.

On the other hand, the analysis shows that the predominant type of action carried out by volunteers and mediated by software artifacts is data collection. It is therefore valid to ask some questions. What happens when the same volunteer or community of volunteers participates in more than one project at the same time? In this context, will volunteers have to learn a different interface for each project in which they participate?
How does the use of multiple applications and multiple styles of carrying out the same task affect their confidence? Another aspect of this massive participation of people in DCC projects concerns the real level of volunteer participation: why is it limited to collecting samples, when volunteers could participate more actively in the project by performing more actions?

Another question that emerges from the little reuse detected is whether Software Engineering and HCI should join forces to propose processes for building reusable software artifacts with a focus on "reusers" (volunteers and scientists who wish to propose their own projects, people who do not necessarily have knowledge of software development). Such processes should also consider that, at the scale of volunteer participation, there is multiculturalism, differing legislation on the data that is generated and analyzed, technological limitations, and varied individual and group motivations. How can this collective knowledge multiply and serve other emerging communities? Could it be that the wheel is being reinvented with each new software development?

We consider it very relevant to open the debate on the usefulness of proposing a reuse process focused on reusers that is simple and usable for people who are not experts in software development, and that takes into account the multidisciplinary nature of Digital Citizen Science projects.

===6 Conclusions===
This work presented a survey and analysis of articles that report software artifacts supporting Digital Citizen Science projects. The artifacts were classified according to their acquisition strategy: Development, Reuse, and Repurpose. Additionally, the distribution of the articles was analyzed according to the actions carried out by the volunteers (actions mediated by the artifacts). Currently, most software artifacts are obtained through the "Development" strategy and support "Collect" data actions.
In order to promote the reuse strategy, it is planned to work on proposing a reuse process focused on "reusers" (people who do not necessarily have knowledge of software development).

===References===
1. Cohn, J. P. (2008). Citizen science: Can volunteers do real research? BioScience, 58(3), 192-197.
2. Lintott, C. J., Schawinski, K., Slosar, A., Land, K., Bamford, S., Thomas, D., ... & Murray, P. (2008). Galaxy Zoo: morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey. Monthly Notices of the Royal Astronomical Society, 389(3), 1179-1189.
3. Skarlatidou, A., Hamilton, A., Vitos, M., & Haklay, M. (2019). What do volunteers want from citizen science technologies? A systematic literature review and best practice guidelines. JCOM: Journal of Science Communication, 18(1).
4. Preece, J. (2016). Citizen science: New research challenges for human–computer interaction. International Journal of Human-Computer Interaction, 32(8), 585-612.
5. Celino, I., Corcho, O., Hölker, F., & Simperl, E. (2018). Citizen science: design and engagement (Dagstuhl Seminar 17272). In Dagstuhl Reports (Vol. 7, No. 7). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
6. Nov, O., Arazy, O., & Anderson, D. (2011, February). Dusting for science: motivation and participation of digital citizen science volunteers. In Proceedings of the 2011 iConference (pp. 68-74). ACM.
7. Walton, P., & Maiden, N. (Eds.). (2019). Integrated software reuse: management and techniques. Routledge.
8. Taivalsaari, A., Mikkonen, T., & Mäkitalo, N. (2019, August). Programming the Tip of the Iceberg: Software Reuse in the 21st Century. In 2019 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA) (pp. 108-112). IEEE.
9. Tangmunarunkit, H., Hsieh, C. K., Longstaff, B., Nolen, S., Jenkins, J., Ketcham, C., ... & Khalapyan, Z. (2015). Ohmage: A general and extensible end-to-end participatory sensing platform. ACM Transactions on Intelligent Systems and Technology (TIST), 6(3), 38.
10. Luna, S., Gold, M., Albert, A., Ceccaroni, L., Claramunt, B., Danylo, O., ... & Radicchi, A. (2018). Developing mobile applications for environmental and biodiversity citizen science: considerations and recommendations. In Multimedia Tools and Applications for Environmental & Biodiversity Informatics (pp. 9-30). Springer, Cham.
11. Varnell-Sarjeant, J., & Andrews, A. A. (2015). Comparing reuse strategies in different development environments. In Advances in Computers (Vol. 97, pp. 1-47). Elsevier. https://doi.org/10.1016/bs.adcom.2014.10.002
12. Bagnolini, G., Da Costa, G., Gerino, M., Roth, M., & Trân, C. (2017). Multidisciplinarity for biodiversity management on campus through citizen sciences. In 2nd Workshop on Smart and Sustainable City (WSSC 2017), in conjunction with 2017 IEEE Smart World Conference, 4 August 2017, San Francisco, United States.
13. Hsu, Y. C., Dille, P., Cross, J., Dias, B., Sargent, R., & Nourbakhsh, I. (2017, May). Community-empowered air quality monitoring system. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 1607-1619).
14. Simpson, R., Page, K. R., & De Roure, D. (2014, April). Zooniverse: observing the world's largest citizen science platform. In Proceedings of the 23rd international conference on world wide web (pp. 1049-1054). ACM.
15. Brereton, P., Kitchenham, B. A., Budgen, D., Turner, M., & Khalil, M. (2007). Lessons from applying the systematic literature review process within the software engineering domain. Journal of systems and software, 80(4), 571-583.
16. Lliteras, A. B., Fernandez, A., & Torres, D. (2020). Result Set_Desarrollo, reuso, y resignificación de artefactos de software en Ciencia Ciudadana. ¿Reinventando la rueda?. https://doi.org/10.5281/zenodo.3968740
17. Sheppard, S. A., Wiggins, A., & Terveen, L. (2014, February). Capturing quality: retaining provenance for curated volunteer monitoring data. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing (pp. 1234-1245). ACM.
18. Brovelli, M. A., Minghini, M., & Zamboni, G. (2016). Public participation in GIS via mobile applications. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 306-315.
19. Chen, L. J., Ho, Y. H., Lee, H. C., Wu, H. C., Liu, H. M., Hsieh, H. H., ... & Lung, S. C. C. (2017). An open framework for participatory PM2.5 monitoring in smart cities. IEEE Access, 5, 14441-14454.
20. Martinelli, M., & Moroni, D. (2018). Volunteered geographic information for enhanced marine environment monitoring. Applied Sciences, 8(10), 1743.
21. Beza, E., Reidsma, P., Poortvliet, P. M., Belay, M. M., Bijen, B. S., & Kooistra, L. (2018). Exploring farmers’ intentions to adopt mobile Short Message Service (SMS) for citizen science in agriculture. Computers and Electronics in Agriculture, 151, 295-310.
22. Tapia, A. H., LaLone, N. J., MacDonald, E., Priedhorsky, R., & Hall, M. (2014, January). Crowdsourcing rare events: Using curiosity to draw participants into science and early warning systems. In ISCRAM.
23. II, R. T. B., Lundgren, L., Crippen, K. J., & MacFadden, B. J. (2018, June). Designing for Public Participation in Paleontology Through the Development of an App. In ECSM 2018 5th European Conference on Social Media (p. 462). Academic Conferences and publishing limited.
24. Jambeck, J. R., & Johnsen, K. (2015). Citizen-based litter and marine debris data collection and mapping. Computing in Science & Engineering, 17(4), 20-26.
25. Von Krogh, G., Spaeth, S., & Lakhani, K. R. (2003). Community, joining, and specialization in open source software innovation: a case study. Research policy, 32(7), 1217-1241.
26. Annaiahshetty, K., & Prasad, N. (2013, April). Expert System for Multiple Domain Experts Knowledge Acquisition in Software Design and Development. In 2013 UKSim 15th International Conference on Computer Modelling and Simulation (pp. 196-201). IEEE.
27. Calp, M. H., & Akcayol, M. A. (2019). The importance of human computer interaction in the development process of software projects. arXiv preprint arXiv:1902.02757.