Selecting Adequate Machine Learning Methods for Human-Computer Interaction Data Sets: Guidelines and a Conceptual Structure

Anna C. Weigand 1,2,*
1 University of Seville, Spain
2 University of Applied Sciences Emden/Leer, Germany

Abstract

Data sets in human-computer interaction (HCI) research vary widely in their characteristics. Due to the methods of data collection, such as usability testing and interviews, HCI data sets are usually smaller than big data sets, such as those of social media and image processing. However, HCI data sets can also be complex and multi-dimensional, which makes pattern recognition difficult. Therefore, we investigate the impact factors and parameters to consider when applying machine learning (ML) methods to HCI data sets. Through iterative structured and unstructured expert interviews and machine learning experiments, we identify impact factors and parameters that lead to a conceptual structure. Overall, our aim is to deduce and provide decision-making support for selecting an adequate ML method for HCI data sets.

Keywords

human-computer interaction, human-centered design, data set, machine learning, small data, conceptual structure, conceptual model, metamodel, design science research methodology

1. Motivation

In the research field of human-computer interaction (HCI) [1], a wide variety of HCI data sets is available. Methods of data collection include usability tests, interviews, focus groups, and surveys [2]. Data can also be extracted from large-scale user studies [3, 4], eye-tracking experiments [5], and web tracking [6]. Although all of these data sets are HCI data sets, they have varying characteristics [3, 5, 7, 8]. The most apparent one is data set volume, which tends to be small in HCI research (in contrast to big data sets from, e.g., social media) and typically includes only about 200 data points [9, 10]. In addition, data are collected in a variety of contexts with varying goals or purposes [11].
Data are also generated by different users, which can lead to biases [12] and variances [8], e.g., due to imbalanced data sets. Furthermore, since HCI data are primarily collected from and about users, they must be handled with care, both in terms of management and outcomes. Nevertheless, HCI data sets seem to be influenced by similar impact factors, such as the individuals who are the basis for data gathering (e.g., in terms of participation in interviews or usability tests) or regulations concerning data privacy. Some of these factors arise from certain standards, such as those provided by the human-centered design (HCD) process model as per ISO standard 9241-220 [13].

CAiSE 2024 Doctoral Consortium
* Supervised by Maria José Escalona Cuaresma and Maria Rauschenberger
anna.weigand@hs-emden-leer.de (A. C. Weigand), ORCID 0000-0003-2674-0640 (A. C. Weigand)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

As HCI data are not necessarily big data but can be complex and multi-dimensional regardless of size, patterns may not be detected by standard data analysis methods, as illustrated by the Kano categorization of about 400 participants' survey answers [3]. Therefore, machine learning (ML) can help researchers extract further insights from highly qualitative HCI data with less manual effort and perhaps better results. The role of ML in HCI research has already been discussed in the literature. However, a survey showed that the application of ML to HCI data sets is challenging and that appropriate process models are needed to proceed adequately [14]. Another study summarized some pitfalls to avoid when using ML for HCI research data, such as "classification accuracy is not hypothesis testing" and "causality versus correlation" [15].
Furthermore, according to the authors, referencing classifiers' baseline performance ensures the transparency of the results. However, to the best of our knowledge, applying ML to HCI data sets has not yet been systematically investigated. This raises the fundamental question: How can one select an adequate approach and ML methods for HCI data sets? To answer this research question, we build on the research approach described below.

2. Research Approach

As a basis for our research, we use the design science research methodology (DSRM) [16]. Figure 1 shows the particular steps of the DSRM as well as the related goals to be achieved.

Figure 1: The design science research methodology [16] applied to this work. The check marks show which goals have been achieved so far, and the iteration icon indicates which ones are still in development.

First, we identify the problem through a literature review [10]. We then set the objectives for solving the problem through our study concept. To design and develop a suitable solution to our problem, we conduct structured and unstructured interviews with HCI and ML experts to identify relevant aspects (i.e., those that affect the selection of ML methods for HCI data sets). In addition, we expect to gain a holistic view of these aspects through ML experiments on HCI data sets. To demonstrate the solution to our problem, we aim to develop a conceptual structure (CS) based on our insights from the interviews and ML experiments that depicts various impact factors and their parameters to consider when selecting an ML method for HCI data sets. The CS is meant to represent the activities of all parties involved, show the various interdependencies between all of the impact factors (similar to a metamodel or conceptual model [17, 18]), and facilitate decisions for specific ML methods. Furthermore, we intend to provide guidelines for HCI analysis with ML and the selection of ML methods for HCI data sets.
Subsequently, we conduct a retrospective expert panel to evaluate our results. Regular publication of our current status and results guarantees communication. In addition, we take advantage of an iterative and agile development process, as suggested by the well-established Agile Manifesto, to create the CS [19, 20]. This is an appropriate and adjustable approach that allows us to react quickly to new developments, which is critical because the ML field has been growing rapidly. For our studies, we focus on the exchanges with experts and their experiences in the fields of ML and HCI, as well as on our own project experiences. We aim to develop a realistic and applicable CS that is continually updated according to new findings (see Figure 2).

Figure 2: Our iterative approach for developing a valid conceptual structure.

The starting point is our first draft of the CS (see Figure 3), which is iteratively improved through structured expert interviews. We also plan to conduct ML experiments with HCI data sets to enrich our CS with further practical knowledge. Specifically, we apply ML methods to HCI data sets from different use cases (e.g., health-related data sets collected with an online experiment [4]) to deduce new findings for our CS. Furthermore, as established in the agile software development approach of the Scrum Guide [21], we plan a retrospective expert panel with stakeholders from industry and science to discuss the current CS, identify further impact factors and parameters, and gather additional feedback. To define the end of the iterations for the CS, we use the agile practice of a "definition of done".

3. Preliminary Results

In the first step, we conducted ML experiments with prediction methods [8, 22] and clustering methods [23] on health-related data sets (around 300 data points) collected with an online experiment to gain experience with the process.
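To illustrate what such an experiment involves, the following sketch combines a prediction method and a clustering method on a data set of roughly the size reported above. It is a minimal example, not the actual study pipeline: it assumes scikit-learn is available and uses a synthetic stand-in for the health-related data, with stratified cross-validation and class weighting as one common way to handle a small, imbalanced sample.

```python
# Minimal sketch (not the actual study pipeline): prediction and clustering
# on a small, imbalanced data set; a synthetic stand-in replaces the
# health-related HCI data.
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 300 samples with an 80/20 class imbalance.
X, y = make_classification(n_samples=300, n_features=10,
                           weights=[0.8, 0.2], random_state=42)

# Prediction: stratified folds keep the class ratio stable in each split,
# and class_weight="balanced" counteracts the imbalance during training.
clf = RandomForestClassifier(class_weight="balanced", random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X, y, scoring="f1", cv=cv)
print("F1 per fold:", scores.round(2))

# Clustering: standardize features first so no single dimension dominates
# the Euclidean distance used by k-means.
labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(
    StandardScaler().fit_transform(X))
print("Cluster sizes:", dict(Counter(labels)))
```

On data sets this small, reporting the per-fold variation rather than a single aggregate score makes the instability of the estimate visible, which relates to the pitfalls discussed in Section 1.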
Figure 3: First draft and simplified version of our conceptual structure, which is organized into four horizontal dimensions and includes impact factors with their related parameters (parameters are not shown in this simplified version).

We have also conducted a systematic literature review [23] to define the context of small HCI data sets [10] that are analyzed with ML techniques in the HCI community. Since the data sets in the remaining articles were often not sufficiently described in this context, we performed a first rough analysis of n = 29 articles, which showed that mainly data sets with fewer than 200 data points were used as input for ML.

Our first draft of a CS (see Figure 3) consolidates the findings from our previous research, three informal, unstructured interviews with industry experts, the author's opinion, and further feedback from five reviewers. Based on these insights, four main topics arose. Therefore, the CS is divided into the following four dimensions: problem space, human perspective, HCI data mining, and data governance.

The first dimension, problem space, is based on the ISO standard 9241-220 [13]. This standard already provides an overview of the problem space, as it describes an overall concept for implementing an HCD approach in organizations. The four categories (strategy, organizational infrastructure, project, and operation) of the HCD process model [13] were integrated into this first dimension of the CS, along with the relevant describing parameters. The interviews showed us that two additional describing parameters are relevant for the impact factor HCI project: objectives and project stage. Therefore, we reduced the complexity of the CS in this dimension by adding the new parameter project stage to the impact factor HCI project and eliminating the extra impact factor operation.

The second dimension, human perspective, consists of the impact factors individuals, developer, expert [24], and the recipient of the results.
All of these parties influence the data mining process in their own way and thereby advance the HCI project. For example, individuals are the basis for HCI data collection because their interaction with a system or its evaluation generates the HCI data.

The third dimension, HCI data mining, is based on the widely used cross-industry standard process for data mining (CRISP-DM) [25]. Adapted variants of this standard for specific use cases also exist (e.g., for manufacturing small and medium-sized enterprises [26]). In the context of this CS, however, the original CRISP-DM is used because we focus on HCI data sets rather than a specific field of application. To the best of our knowledge, there is not yet a specific standard process for this context. Therefore, we extract the following dimensions from CRISP-DM: data understanding, data preparation, modeling, evaluation, and deployment. In our CS, we renamed these steps for better understanding in the HCI context: data collection & understanding, data preparation, modeling, evaluation & interpretation, and use in practice. The first step of CRISP-DM, business understanding, is not named separately in our CS but is merged with the impact factor HCI project. Between the impact factor use in practice and the recipient, we also added an interface that allows the recipient to use the ML model or its results. This can take the form of an application, but it can also be a presentation or something similar.

The fourth dimension, data governance, results from our unstructured interviews. The interview participants provided additional impact factors and parameters to consider for ML and AI applications. We decided to group them under the data governance dimension in our CS. Decision rights and responsibilities, data policies, and standards and compliance aspects all fall under the category of data governance [27]. For now, we also include ethical impact factors in the data governance dimension [27, 28].
However, according to recent research, an ethics strategy should be defined first [29]. In our future work, this topic will be considered in more detail.

4. Research Plan and Next Steps

Based on our preliminary findings, we intend to further evaluate and verify our CS draft. We first plan to conduct additional structured interviews with HCI and ML experts to enhance the insights from previous interviews and acquire information that is still missing. We will then adapt the CS according to the new findings from these interviews (e.g., by adding new impact factors or parameters or by rearranging the existing structure). This will clarify the process and impact factors regarding ML application in the context of HCI research. In addition, during a retrospective panel with experts, we will evaluate our CS and deduce guidelines for ML usage on HCI data sets to validate our approach.

To execute this plan, we first need to prepare interview guidelines for our structured interviews with experts. Furthermore, we will investigate the results of applying various ML algorithms to a variety of HCI data sets. The experiences gained from these experiments will both enrich our CS and reveal which attributes, methods, and approaches we need to consider to avoid bias in our results. We also prepare relevant HCI data sets for our planned experiments, as well as a definition of done for our CS.

Acknowledgment

I want to give special thanks to my supervisors, Maria José Escalona Cuaresma and Maria Rauschenberger, for their great collaboration, patience, and invaluable feedback. I am also thankful for the support of Jörg Thomaschewski. This research was supported by the EQUAVEL project PID2022-137646OB-C31, funded by MICIU/AEI/10.13039/501100011033 and by "ERDF/EU".

References

[1] International Organization for Standardization, Ergonomics of human-system interaction – Part 210: Human-centred design for interactive systems, 2019.
[2] J. Lazar, J. H. Feng, H. Hochheiser, Research Methods in Human-Computer Interaction, Elsevier Inc., Cambridge, United States, 2017. doi:10.1016/b978-0-444-70536-5.50047-6.
[3] J. Deutschländer, A. C. Weigand, A. M. Klein, D. Winter, M. Rauschenberger, There are no major age effects for UX aspects of voice user interfaces using the Kano categorization, in: Proceedings of the 19th International Conference on Web Information Systems and Technologies, SCITEPRESS – Science and Technology Publications, 2023, pp. 330–339. doi:10.5220/0012187600003584.
[4] M. Rauschenberger, R. Baeza-Yates, L. Rello, A universal screening tool for dyslexia by a web-game and machine learning, Frontiers in Computer Science 3 (2022) 111. URL: https://www.frontiersin.org/article/10.3389/fcomp.2021.628634. doi:10.3389/fcomp.2021.628634.
[5] S. J. Garbin, O. Komogortsev, R. Cavin, G. Hughes, Y. Shen, I. Schuetz, S. S. Talathi, Dataset for eye tracking on a virtual reality platform, in: A. Bulling, A. Huckauf, E. Jain, R. Radach, D. Weiskopf (Eds.), ACM Symposium on Eye Tracking Research and Applications, ACM, New York, NY, USA, 2020, pp. 1–10. doi:10.1145/3379155.3391317.
[6] S. Dambra, I. Sanchez-Rola, L. Bilge, D. Balzarotti, When Sally met trackers: Web tracking from the users' perspective, in: 31st USENIX Security Symposium (USENIX Security 22), USENIX Association, Boston, MA, 2022, pp. 2189–2206. URL: https://www.usenix.org/conference/usenixsecurity22/presentation/dambra.
[7] O. R. Ogunseiju, N. Gonsalves, A. A. Akanmu, D. Bairaktarova, D. A. Bowman, F. Jazizadeh, Mixed reality environment for learning sensing technology applications in construction: A usability study, Advanced Engineering Informatics 53 (2022) 101637. doi:10.1016/j.aei.2022.101637.
[8] M. Rauschenberger, R. Baeza-Yates, How to handle health-related small imbalanced data in machine learning?, i-com 19 (2020) 215–226. doi:10.1515/icom-2020-0018.
[9] K. Caine, Local standards for sample size at CHI, in: J. Kaye, A. Druin, C. Lampe, D. Morris, J. P. Hourcade (Eds.), Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, 2016, pp. 981–992. doi:10.1145/2858036.2858498.
[10] A. C. Weigand, M. Rauschenberger, Exploring the definition of small data collected with HCI methods and used for ML, Mensch und Computer 2023 – Workshopband, 2023. doi:10.18420/muc2023-mci-ws16-399.
[11] R. Baeza-Yates, Big, small or right data: Which is the proper focus?, 2018. URL: https://www.kdnuggets.com/2018/10/big-small-right-data.html.
[12] A. S. Rohani, R. Baeza-Yates, Measuring bias, in: 2023 IEEE International Conference on Big Data (BigData), IEEE, 2023, pp. 1289–1298. doi:10.1109/BigData59044.2023.10386679.
[13] DIN Deutsches Institut für Normung, Ergonomics of human-system interaction – Part 220: Processes for enabling, executing and assessing human-centred design within manufacturer and operator organizations (ISO 9241-220:2019); German version EN ISO 9241-220:2019, July 2020.
[14] V. S. Moustakis, J. Herrmann, Where do machine learning and human-computer interaction meet?, Applied Artificial Intelligence 11 (1997) 595–609. doi:10.1080/088395197117948.
[15] V. Kostakos, M. Musolesi, Avoiding pitfalls when using machine learning in HCI studies, Interactions 24 (2017) 34–37. doi:10.1145/3085556.
[16] K. Peffers, T. Tuunanen, M. A. Rothenberger, S. Chatterjee, A design science research methodology for information systems research, Journal of Management Information Systems 24 (2007) 45–77. doi:10.2753/MIS0742-1222240302.
[17] M. J. Escalona, N. Koch, Metamodeling the requirements of web systems, in: J. Filipe, J. Cordeiro, V. Pedrosa (Eds.), Web Information Systems and Technologies, Springer eBook Collection Computer Science, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007, pp. 267–280. doi:10.1007/978-3-540-74063-6_21.
[18] E.-M. Schön, M. Neumann, C. Hofmann-Stölting, R. Baeza-Yates, M. Rauschenberger, How are AI assistants changing higher education?, Frontiers in Computer Science 5 (2023). doi:10.3389/fcomp.2023.1208550.
[19] K. Beck, M. Beedle, A. van Bennekum, A. Cockburn, W. Cunningham, M. Fowler, J. Grenning, J. Highsmith, A. Hunt, R. Jeffries, J. Kern, B. Marick, R. C. Martin, S. Mellor, K. Schwaber, J. Sutherland, D. Thomas, Manifesto for agile software development, 2001. URL: https://agilemanifesto.org/.
[20] E.-M. Schön, J. Thomaschewski, M. J. Escalona, Agile requirements engineering: A systematic literature review, Computer Standards & Interfaces 49 (2017) 79–91. doi:10.1016/j.csi.2016.08.011.
[21] K. Schwaber, J. Sutherland, The Scrum Guide, 2020. URL: https://scrumguides.org/docs/scrumguide/v2020/2020-Scrum-Guide-US.pdf.
[22] M. Rauschenberger, R. Baeza-Yates, Recommendations to handle health-related small imbalanced data in machine learning, Mensch und Computer 2020 – Workshopband, 2020. doi:10.18420/muc2020-ws111-333.
[23] A. C. Weigand, D. Lange, M. Rauschenberger, How can small data sets be clustered?, Mensch und Computer 2021 – Workshopband, 2021. doi:10.18420/muc2021-mci-ws02-284.
[24] X. Wu, L. Xiao, Y. Sun, J. Zhang, T. Ma, L. He, A survey of human-in-the-loop for machine learning, Future Generation Computer Systems 135 (2022) 364–381. doi:10.1016/j.future.2022.05.014.
[25] P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, R. Wirth, CRISP-DM 1.0: Step-by-step data mining guide, 2000. URL: https://www.kde.cs.uni-kassel.de/wp-content/uploads/lehre/ws2012-13/kdd/files/CRISPWP-0800.pdf.
[26] S. Rösl, T. Auer, C. Schieder, Addressing the data challenge in manufacturing SMEs: A comparative study of data analytics applications with a simplified reference model, in: M. Elstermann, A. Dittmar, M. Lederer (Eds.), Subject-Oriented Business Process Management. Models for Designing Digital Transformations, volume 1867 of Communications in Computer and Information Science, Springer Nature Switzerland, Cham, 2023, pp. 121–130. doi:10.1007/978-3-031-40213-5_9.
[27] R. Abraham, J. Schneider, J. vom Brocke, Data governance: A conceptual framework, structured review, and research agenda, International Journal of Information Management 49 (2019) 424–438. doi:10.1016/j.ijinfomgt.2019.07.008.
[28] M. Janssen, P. Brous, E. Estevez, L. S. Barbosa, T. Janowski, Data governance: Organizing data for trustworthy artificial intelligence, Government Information Quarterly 37 (2020) 1–8. doi:10.1016/j.giq.2020.101493.
[29] R. Baeza-Yates, U. M. Fayyad, Responsible AI: An urgent mandate, IEEE Intelligent Systems 39 (2024) 12–17. doi:10.1109/MIS.2023.3343488.