A collaborative AI dataset creation for speech therapies

Vita Santa Barletta 1, Fabio Cassano 1, Alessandro Pagano 2 and Antonio Piccinno 1
1 Department of Computer Science, University of Bari, Bari, Italy
2 Department of Economics and Finance, University of Bari, Bari, Italy

Abstract
Artificial Intelligence (AI) and Human-Computer Interaction are getting closer and closer in modern systems, leading to a slowly but constantly increasing synergy between the two topics. Text prediction and voice recognition are among the best-known applications of AI techniques. Common examples are virtual keyboards that suggest the next word to be used in a text, systems that recognise people's sentences, voice commands in voice assistants such as Google Home and Alexa, and many others. However, things get more difficult in contexts where recognition outside the "usual" models is not the rule but the exception, as in the case of recognising right and wrong phonemes in speech therapies. These difficulties often lie in the poor generalisation ability of AI models trained on small datasets. In this position paper, we address this issue and discuss the role that the culture of participation might play in supporting dataset creation for speech therapy. Our aim is to investigate how the accuracy of AI models can improve by combining people's contributions in creating phoneme samples with the validation of those samples by speech therapists.

Keywords
Artificial Intelligence, Speech Therapy, Culture of Participation, End-User Development, Adaptive learning

Proceedings of CoPDA2022 - Sixth International Workshop on Cultures of Participation in the Digital Age: AI for Humans or Humans for AI? June 7, 2022, Frascati (RM), Italy
vita.barletta@uniba.it (V. S. Barletta); fabio.cassano1@uniba.it (F. Cassano); alessandro.pagano@uniba.it (A. Pagano); antonio.piccinno@uniba.it (A. Piccinno)
ORCID: 0000-0002-0163-6786 (V. S. Barletta); 0000-0001-8041-4403 (F. Cassano); 0000-0002-7465-9778 (A. Pagano); 0000-0003-1561-7073 (A. Piccinno)

1. Introduction

In recent years there has been such a spread of Artificial Intelligence (AI) techniques and methods across contexts and systems that today it is difficult to identify a clear boundary between "AI systems" and "non-AI systems". AI is today regarded as a universal solution to solve all problems and exploit all opportunities of digital life. Some researchers think that the main goal of artificial intelligence is to replace human choices and behaviour; others believe that such systems remain speculative and will not be able to reason as a human being [1]. Presently, AI remains an engineering discipline that addresses well-defined problems, and its most successful contributions are those achieving specific objectives; these provide the basis for the current hype surrounding artificial intelligence. In this kind of approach, however, human involvement is not a relevant design criterion [2].
Human-centred artificial intelligence (HCAI) focuses on the development of AI that enhances human performance in ways that make the system reliable, safe, and trustworthy [3]. The evolution of HCAI has allowed easy access to artificial intelligence systems even for end-users and has stimulated the culture of participation. The latter has emerged as a result of the shift from consumer cultures (in which people are confined to the role of passive recipients of artefacts and systems) to cultures in which users are actively involved in the development and evolution of solutions to their problems. Industry researchers have identified groups of users at opposite ends of a continuum: professional programmers are excited by computers because they can create computer programs, while domain experts use computers because they allow them to perform their jobs. The goal of supporting domain experts in developing and/or evolving systems does not mean transferring responsibility for a valid system design to the end-user. Rather, if a tool does not satisfy its intended users' requirements, end-user support should be used in conjunction with expert support to adapt and improve it. In cultures of participation, socio-technical environments should be designed to provide tools that make it easy to participate and to continue generating value [4]. To facilitate participation, designers need to be aware of how much effort a user is willing to put in, since this affects how much he or she will contribute [5]. The participant's effort can be minimised by providing the right kinds of meta-design tools, while the potential value one can create can be maximised by framing the problem and sharing results with others [6].

There is a design trade-off between AI and Human-Computer Interaction (HCI), as the two fields are (historically) on opposite sides of the spectrum of autonomy and control (AI provides for low user control, as opposed to HCI) [7]. As a matter of fact, one of the goals of ongoing research worldwide is to explore the possible synergies that can lead AI to support HCI. Indeed, in recent years several AI models and techniques have been successfully applied in different domains: for example, supporting users of virtual keyboards by suggesting the best possible "next word" to write in a message. This shows that the HCI world overlaps, at least in part, with AI and that there is potential for deeper interaction between the two areas in more than one context. However, researchers do not always agree on this idea. For example, Grudin argues that the two fields have a lot in common but have not yet found a common ground [8]; on the other hand, Harper discusses the current roles of HCI and AI, explaining how difficult it can be to design a human-centred AI [9]. Lisetti et al. suggest merging HCI and the social sciences through AI [10], while the analysis developed by Winograd proposes a way to let HCI and AI interact with each other [11]. Nowadays it is also interesting to see how AI can be applied to concepts such as End-User Development (EUD). Myers et al. propose an approach based on Natural Language Processing to improve the development of spreadsheets and support users in understanding why their program might fail [12]. Buono et al. suggest how formal languages can be used to dynamically manipulate data from heterogeneous sources such as Internet of Things devices [13].
However, AI is not limited to spreadsheets or electronics-related workspaces. For example, it is also applied in multiple scenarios governed by users' needs, such as smart home systems [14]. Proposals such as [15] describe how EUD can be applied to these kinds of systems, bringing smart home management closer to end-users' needs. Other approaches suggest how AI can be applied to improve the user experience with a mouse pointer, or to improve the user's emotional response [16, 17].

2. The speech therapy context

Speaking well is one of the crucial aspects of childhood development. Currently, there are few technological applications supporting speech therapists and patients (typically children below ten years old) during therapy: some of them rely on the patient listening to and repeating phonemes; others try to "understand" how the patient pronounces words and return an evaluation score to the speech therapist. However, this approach has several limitations:

• On the system side, it is not easy to automatically discriminate between close phonemes, leading to the misevaluation of the actual speech score;
• On the user side, engagement decreases as the therapy goes on, leading to an early drop-out from the therapy.

These critical issues can be addressed through the joint intervention of the Human-Computer Interaction and Artificial Intelligence disciplines. Currently, there is no structured and shared dataset for speech therapies capable of enabling a better assessment by the therapist and a better therapy for the patient. This paper proposes a collaborative approach for the creation and validation of a shared dataset to be used with AI models, with the aim of improving both children's speech evaluation and the overall speech therapy. We want to debate the support that AI can give to people in a specific scenario such as speech therapies. In particular, we will discuss how AI can be exploited in systems used in this domain, and how it can be supported by the cultures of participation of the involved actors through a collaborative approach. The final goal is to collaboratively develop a comprehensive dataset for the most accurate possible training of the AI models that are going to support speech therapies and therapists.

Usually, a dataset is a merge of heterogeneous data produced, for example, by sensors or people. AI models, whose accuracy depends on the quality of the data, often find it difficult to separate the "good" samples from the "bad" ones, leading to poor training and test performance. The rationale behind our idea is depicted in Figure 1. We can imagine the culture of participation on two different levels: the first one (called "Level 1") is made up of users all over the world who speak the same language and, given a list of phonemes, are able to record and label them. The process among them is collaborative, as each user independently records and sends a phoneme to a shared database. The recordings are then validated by speech therapists ("Level 2"), who evaluate the recorded phonemes, approving or rejecting them. The evaluation criterion might be individual (each speech therapist chooses and validates a record) or cooperative, where two or more therapists express their evaluation of the same record. To improve users' engagement in the process, each record is scored following a gamified approach (for example, the five-star rating widely used in gamification).
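To make the two-level workflow more concrete, the sketch below outlines one possible shape for the shared database and its two collaboration levels: Level 1 contributors record and label phonemes, Level 2 therapists approve or reject each record and assign the gamified five-star rating, and only validated records flow into the training set. It is a minimal illustrative sketch; all names (PhonemeRecord, submit_record, validate_record, training_set) are hypothetical and do not refer to any existing implementation.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class PhonemeRecord:
    """A single Level 1 contribution: an audio clip and the phoneme it claims to contain."""
    audio_path: str                                       # path to the recorded clip
    phoneme_label: str                                    # e.g. "/r/" (hypothetical labelling scheme)
    contributor_id: str                                   # anonymous Level 1 user identifier
    ratings: List[int] = field(default_factory=list)      # 1-5 star scores given by therapists
    approvals: List[bool] = field(default_factory=list)   # approve/reject votes from Level 2

    @property
    def validated(self) -> bool:
        """A record enters the dataset when the majority of therapist votes approve it."""
        return bool(self.approvals) and sum(self.approvals) > len(self.approvals) / 2

    @property
    def score(self) -> Optional[float]:
        """Gamified score shown back to the contributor (mean of the star ratings)."""
        return mean(self.ratings) if self.ratings else None


def submit_record(database: List[PhonemeRecord], record: PhonemeRecord) -> None:
    """Level 1: a contributor independently records, labels, and sends a phoneme."""
    database.append(record)


def validate_record(record: PhonemeRecord, approve: bool, stars: int) -> None:
    """Level 2: a speech therapist approves or rejects the record and rates it (1-5 stars)."""
    record.approvals.append(approve)
    record.ratings.append(stars)


def training_set(database: List[PhonemeRecord]) -> List[PhonemeRecord]:
    """Only therapist-validated records are passed on to train the phoneme-recognition model."""
    return [r for r in database if r.validated]
```

In the cooperative evaluation scenario, the majority vote over the therapists' approvals stands in for the shared validation criterion; a stricter policy (for example, unanimous approval) could be swapped in without changing the rest of the workflow.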
After these steps, the "cooperative" dataset for speech therapy can be used to train the AI model and deploy it in real-world scenarios. One of the most important aspects of data creation/validation for AI models is the human factor. As shown in our model, people are involved in two distinct steps: the creation of the entries and the validation of their labels. The failure of either one implies a badly built dataset and an overall failure of the entire system to support speech therapists. For this reason, we can imagine supporting both steps with gamification techniques, which have repeatedly been shown to be an effective stimulus for people to increase the creation or the validation of dataset entries [18].

Figure 1: The representation of the collaborative dataset creation: the "Level 1" group represents the collaboration between people to create a phoneme dataset and its labelling. The "Level 2" group represents the collaboration between the speech therapists who validate this dataset.

3. Conclusions

In recent years, HCI and AI have slowly but steadily proved to be two sides of the same coin. A closer bond between the two has allowed an increasing number of works to merge techniques from both HCI and AI. This paper discussed the possibility of using collaborative AI dataset creation to support speech therapies. In particular, we introduced how important it is, for modern AI models, to use well-curated datasets. We then proposed a model that could support this: by applying the culture of participation, people can contribute to the basic creation of phoneme records; using the same approach, speech therapists validate all the created entries. With this dataset, AI models can be trained to correctly recognise phonemes during speech therapies, supporting therapists and patients. The entire approach described here could be supported by gamification, ensuring a constant flow of new phoneme records and of their validation. As future work, we still have to validate the proposed approach and investigate any possible difficulty in involving people in the entire dataset creation and validation process. Lastly, tests with real users and speech therapists/patients must be performed to validate our work.

References

[1] R. Kurzweil, The singularity is near: when humans transcend biology, 2005.
[2] E. Brynjolfsson, A. McAfee, The second machine age: Work, progress, and prosperity in a time of brilliant technologies, W. W. Norton & Co, 2014.
[3] B. Shneiderman, Bridging the gap between ethics and practice: Guidelines for reliable, safe, and trustworthy human-centered AI systems, ACM Transactions on Interactive Intelligent Systems (TiiS) 10 (2020) 1–31.
[4] M. T. Baldassarre, V. S. Barletta, D. Caivano, A. Piccinno, A visual tool for supporting decision-making in privacy oriented software development, Association for Computing Machinery, New York, NY, USA, 2020.
[5] A. Marengo, A. Pagano, L. Ladisa, Mobile gaming experience and co-design for kids: Learn German with Mr. Hut, Proceedings of the European Conference on e-Learning, ECEL 2016 (2016) 467–475.
[6] G. Fischer, End-user development: Empowering stakeholders with artificial intelligence, meta-design, and cultures of participation, Lecture Notes in Computer Science 12724 (2021) 3–16.
[7] G. Fischer, Exploring design trade-offs for quality of life in human-centered design, Interactions 25 (2017) 26–33.
[8] J. Grudin, AI and HCI: Two fields divided by a common focus, AI Magazine 30 (2009) 48–48.
[9] R. H. Harper, The role of HCI in the age of AI, International Journal of Human–Computer Interaction 35 (2019) 1331–1344.
[10] C. L. Lisetti, D. J. Schiano, Automatic facial expression interpretation: Where human-computer interaction, artificial intelligence and cognitive science intersect, Pragmatics & Cognition 8 (2000) 185–235.
[11] T. Winograd, Shifting viewpoints: Artificial intelligence and human–computer interaction, Artificial Intelligence 170 (2006) 1256–1258.
[12] B. A. Myers, A. J. Ko, C. Scaffidi, S. Oney, Y. Yoon, K. Chang, M. B. Kery, T. J.-J. Li, Making end user development more natural, in: New Perspectives in End-User Development, Springer, 2017, pp. 1–22.
[13] P. Buono, F. Cassano, A. Legretto, A. Piccinno, EUDroid: a formal language specifying the behaviour of IoT devices, IET Software 12 (2018) 425–429.
[14] V. S. Barletta, P. Buono, D. Caivano, G. Dimauro, A. Pontrelli, Deriving smart city security from the analysis of their technological levels: a case study, in: 2021 IEEE International Conference on Omni-Layer Intelligent Systems (COINS), 2021, pp. 1–6.
[15] P. Buono, F. Balducci, F. Cassano, A. Piccinno, EnergyAware: a non-intrusive load monitoring system to improve the domestic energy consumption awareness, in: Proceedings of the 2nd ACM SIGSOFT International Workshop on Ensemble-Based Software Engineering for Modern Computing Platforms, 2019, pp. 1–8.
[16] K. E. Souza, M. C. Seruffo, H. D. De Mello, D. D. S. Souza, M. M. Vellasco, User experience evaluation using mouse tracking and artificial intelligence, IEEE Access 7 (2019) 96506–96515.
[17] C. C. Gomes, S. Preto, Artificial intelligence and interaction design for a positive emotional user experience, in: International Conference on Intelligent Human Systems Integration, Springer, 2018, pp. 62–68.
[18] A. Marengo, A. Pagano, L. Ladisa, Game-based learning in mobile technology, 17th International Conference on Intelligent Games on Simulation, GAME-ON 2016 (2016) 80–84.