=Paper=
{{Paper
|id=Vol-3136/paper5
|storemode=property
|title=Big data for humans or humans for big data?: a human-data interaction perspective
|pdfUrl=https://ceur-ws.org/Vol-3136/paper-5.pdf
|volume=Vol-3136
|authors=Shin'ichi Konomi
|dblpUrl=https://dblp.org/rec/conf/avi/Konomi22
}}
==Big data for humans or humans for big data?: a human-data interaction perspective==
Shin’ichi Konomi

HDI Lab, Faculty of Arts and Science, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka 819-0395, Japan

===Abstract===
Designing "big data for humans" would require so-called human-data interaction. In this paper, we discuss key dimensions of human-data interaction in order to view the field from a broader perspective and facilitate the development of "big data for humans". Our discussion is based on the relevant research projects in our group at the intersections of human-data interaction and recommendation and search, pervasive computing, civic computing, and learning analytics.

Keywords: Human-data interaction, human-centered big data, calm technology, data science

===1. Introduction===
Bell and Gray (1997) predicted that all information about physical objects, humans, buildings, processes, and organizations will be online by 2047 [1]. Twenty-five years have passed since their prediction, and only 25 years are left before the possible dawn of the fully datafied world that they predicted. It is estimated that, by 2025, 463 exabytes of data will be generated each day globally [2]. The sheer volume, variety, and velocity of the ever-increasing data can easily create situations of information overload.

Quick fixes for the information overload problem often rely on straightforward automation, which may fail to fit human needs in different contexts. Going beyond such myopic approaches would require smartness at a different level: embedding the right opportunities for people to interact with and intervene in big-data systems at the right time and in the right way. This can be a key step towards the design of calm technology [3].

Having people involved in big-data environments requires human-data interaction.
Human-data interaction (HDI) is an emerging field of interdisciplinary inquiry that is concerned with understanding and developing technologies for supporting human interactions with digital data. Such interactions may occur in the contexts of data collection, data wrangling, algorithm design, analytics, visualization, recommendation, classification, prediction, interpretation, and so on.

Proceedings of CoPDA2022 - Sixth International Workshop on Cultures of Participation in the Digital Age: AI for Humans or Humans for AI? June 7, 2022, Frascati (RM), Italy. Email: konomi@artsci.kyushu-u.ac.jp (S. Konomi). URL: http://hdi.ait.kyushu-u.ac.jp/ (S. Konomi). ORCID: 0000-0001-5831-2152 (S. Konomi). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073. Shin’ichi Konomi, CEUR Workshop Proceedings 14–20.

Figure 1: A process for using big data inspired by CRISP-DM [19]. The figure shows the steps business understanding, data collection, data understanding, data preparation, analysis and modeling, evaluation, and deployment, arranged along an axis running from upstream to downstream.

Human-data interaction emphasizes a human-centered approach, and existing work in this field focuses on different facets of it [4]. Mortier, Haddadi, Henderson, McAuley and Crowcroft discuss human-data interaction with their proposal to place humans at the center of the flows of data and to provide mechanisms for citizens to interact with these systems and data explicitly [5]. They also propose and elaborate on three core themes relevant to human-data interaction, namely, legibility, agency, and negotiability. Crabtree and Mortier discuss human-data interaction from social and interactional perspectives, and look at the need to develop social models and mechanisms of data sharing that enable users to play an active role in the process [6].
Mashhadi, Kawsar and Acer draw our attention to the importance of human-data interaction in Internet of Things environments with ubiquitous devices [7]. Cabitza and Locoro discuss healthcare data through the lens of human-data interaction [8]. Other studies look into embodied interactions for exploring large data sets [9], and a media service that exploits personal data to provide content recommendations [10].

In this paper, we discuss key dimensions of human-data interaction in order to view the field from a broader perspective and facilitate the development of "big data for humans". Our discussion is based on the relevant research projects in our group at the intersections of human-data interaction and recommendation and search, pervasive computing, civic computing, and learning analytics.

===2. Three key dimensions of human-data interaction===
In this section, we introduce three dimensions for classifying human-data interaction environments. We identified these dimensions based on a survey of related works [4, 5, 6, 7, 8, 9, 10], our own experiences with relevant projects [11, 12, 13, 14, 15, 16, 17, 18], and an existing process model for data science [19]. Table 1 shows these three dimensions in a tabular format.

{| class="wikitable"
|+ Table 1: The three key dimensions of HDI.
! rowspan="2" |
! colspan="2" | Personal data
! colspan="2" | Public data
|-
! Upstream
! Downstream
! Upstream
! Downstream
|-
! Synchronous
| Real-time interaction with personal data at upstream steps (e.g., collecting personal health data interactively)
| Real-time interaction with personal data at downstream steps (e.g., interactive analysis of data in personal informatics)
| Real-time interaction with public data at upstream steps (e.g., collecting urban public data interactively)
| Real-time interaction with public data at downstream steps (e.g., interactive analysis of urban public data sets, possibly using an embodied interaction interface)
|-
! Asynchronous
| Long-term interaction with personal data at upstream steps (e.g., collecting personal health data automatically and using it at a later point in time; improving data collection to address privacy issues)
| Long-term interaction with personal data at downstream steps (e.g., personalized news recommendation based on an incrementally improved machine-learning model)
| Long-term interaction with public data at upstream steps (e.g., collecting urban public data automatically and using it at a later point in time; improving data collection to address ethical issues)
| Long-term interaction with public data at downstream steps (e.g., non-personalized recommendation of popular news based on an incrementally improved machine-learning model)
|}

The first dimension concerns the process for using big data (see Figure 1). The process starts with data collection, followed by data understanding, data preparation through data wrangling, analysis and modeling via visualization and/or machine learning algorithms, evaluation, and deployment of the resulting model or actions based on the gained insights. For example, human-data interaction can take place downstream in this process, during the analysis and modeling phase, when people use interactive visualization tools. In other cases, it can take place upstream, during the data collection phase, when people turn GPS tracking on and off on their smartphones. This upstream-downstream dimension captures the point in the process at which human-data interaction occurs, and allows us to consider the differences in human-data interaction accordingly.

The second is the personal-public dimension, which concerns the characteristics of the data with which people interact.
For example, embodied interaction with public data sets in a VR environment falls on the public side of this dimension, whereas personalized news recommendation systems may use personal data about people. This dimension allows us to consider different concerns around the interaction with public and personal data.

The third is the synchronous-asynchronous dimension, which concerns the time aspects of human-data interaction. This dimension distinguishes the different modes of human-data interaction in a similar way to the synchronous-asynchronous classification of computer-supported cooperative work environments. For example, interactive analysis of data sets using an embodied interaction interface falls into the synchronous category. When people improve the behavior of a recommendation system by changing some preference settings, or by replacing its algorithm with a more privacy-preserving and less biased one, such interactions can be considered asynchronous.

===3. Case studies to explore the dimensions===
We next look further into the proposed dimensions of human-data interaction based on several existing systems that have been developed by our group. The purposes of the systems include recommendation and search, pervasive computing, civic computing, and learning analytics. Their HDI features can be classified into different categories as shown in Table 2.

{| class="wikitable"
|+ Table 2: Classification of existing systems for recommendation and search, pervasive computing, civic computing, and learning analytics.
! rowspan="2" |
! colspan="2" | Personal data
! colspan="2" | Public data
|-
! Upstream
! Downstream
! Upstream
! Downstream
|-
! Synchronous
| Deai Explorer [12]
| Deai Explorer [12]
| Askus [13]; Vacant House [17]; Community Reminder [14]
| CourseQ [11]; Deai Explorer [12]
|-
! Asynchronous
| Learning Analytics for All [16]
| e-Book Reading Analytics [15]
| Vacant House [17]; Community Reminder [14]
| Co-location networks [18]
|}

====3.1. Recommendation and search====
CourseQ [11] is a course recommendation system for university students based on a syllabus data set and a topic modeling-based algorithm. Although many existing course recommendation systems focus on the accuracy of recommendation, they may fail to recommend courses that students feel are truly relevant. We introduced various interactive features in CourseQ so as to improve user-centric metrics such as user acceptance and the understandability of recommendation results. The interactive features of CourseQ include keyword-based search and filtering, interactive visualization of recommended courses, dynamic presentation of relevant auxiliary information and explanations with the recommended results, and a 'like' button. These features mainly support synchronous interactions with the publicly available data set; however, the interactivity does not allow users to change the data sets and other elements in the upstream process. CourseQ thus provides synchronous downstream human-data interaction based on public data.

====3.2. Pervasive computing====
DeaiExplorer [12] is a social-network display that responds to RFID badges carried by conference participants and displays social connections between colocated conference participants. The system exploits a public data set from a publication database as well as data collected from RFID readers with the participants’ agreement. The system visualizes social network structures based on these two types of data in order to facilitate social interactions among conference participants. The interactive feature of DeaiExplorer allows conference participants to access their social network visualizations just by showing their RFID badges to the RFID reader. This feature allows users to interact with public and private data in a synchronous manner.
As this interactivity allows users to control the capture of their RFID data by showing or not showing their badges to the RFID reader, the system allows synchronous human-data interaction in both upstream and downstream processes.

Askus [13] is a so-called participatory sensing system, which allows users to collect data manually by using mobile phones. It thus concerns the upstream process, and mainly supports synchronous human-data interaction with public data such as environmental data.

Co-location networks [18] analyze urban mobility data sets based on network analysis techniques. This analysis was performed multiple times based on iterative improvements of the network analysis techniques. It thus concerns asynchronous interaction with public data in the downstream process.

====3.3. Civic computing====
Our WiFi-based sensing tool to predict vacant houses [17] is a so-called opportunistic sensing system, which allows local community members to collect data automatically just by walking around their community with mobile phones in their backpacks. Users’ choices of walking routes control data collection in a synchronous manner. Data collection can also be controlled asynchronously by changing the settings of the WiFi sensing software. The tool thus concerns the upstream process, and mainly supports synchronous and asynchronous human-data interaction with WiFi signals in public spaces.

Community Reminder [14] is also a participatory sensing system, which allows local community members to collect information about safety in their communities using mobile phones. Local community members can also participate in the design of the data collection mechanisms of this system using an intuitive tangible user interface. This system concerns both synchronous and asynchronous aspects of public data collection.

====3.4. Learning analytics====
Our research projects in this area include analysis of e-book reading patterns [15] as well as an effort to provide learning analytics for all age groups and in developing communities without reliable internet access [16]. The former analysis can be performed in an iterative manner based on different research questions and techniques. It thus concerns asynchronous interaction with personal data in the downstream process. The interactive features of the latter include delayed transmission of learning log data using mobile phones [16]. It thus concerns asynchronous interaction with personal data in the upstream process.

===4. Discussion and conclusion===
We introduced three key dimensions for classifying human-data interaction environments, i.e., the upstream-downstream, personal-public, and synchronous-asynchronous dimensions. Existing research projects tend to focus on one or more areas with respect to these dimensions. The discussion of several existing systems for recommendation and search, pervasive computing, civic computing, and learning analytics provided an opportunity for a further look into the proposed HDI dimensions, and demonstrated how they can highlight commonalities and differences among various human-data interaction systems.

Interaction is everywhere in the space defined by the proposed dimensions. Thinking about big data systems from the perspectives enabled by these dimensions would help us analyze and/or design a broader range of big data and AI systems with humans at the center and their data interactions in mind. It reminds designers and users of such systems that they are embedded in larger contexts, and of the importance of providing the right opportunities for humans to play active roles in those contexts.
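
As an illustration only, the classification summarized in Table 2 can be encoded as a small data structure and queried along the three dimensions. The following Python sketch is not part of any of the systems described in this paper: the enum names, the `SYSTEMS` mapping, and the helper functions are hypothetical, and the cell assignments follow one reading of Table 2 and Section 3.

```python
from enum import Enum
from itertools import product


class Timing(Enum):
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"


class Scope(Enum):
    PERSONAL = "personal"
    PUBLIC = "public"


class Stage(Enum):
    UPSTREAM = "upstream"
    DOWNSTREAM = "downstream"


SYNC, ASYNC = Timing.SYNCHRONOUS, Timing.ASYNCHRONOUS
PERSONAL, PUBLIC = Scope.PERSONAL, Scope.PUBLIC
UP, DOWN = Stage.UPSTREAM, Stage.DOWNSTREAM

# Each system maps to the set of (timing, scope, stage) cells it occupies.
# Assumption: the assignments below are a reading of Table 2, not data
# published by the authors in machine-readable form.
SYSTEMS = {
    "CourseQ": {(SYNC, PUBLIC, DOWN)},
    "Deai Explorer": {(SYNC, PERSONAL, UP), (SYNC, PERSONAL, DOWN),
                      (SYNC, PUBLIC, DOWN)},
    "Askus": {(SYNC, PUBLIC, UP)},
    "Co-location networks": {(ASYNC, PUBLIC, DOWN)},
    "Vacant House": {(SYNC, PUBLIC, UP), (ASYNC, PUBLIC, UP)},
    "Community Reminder": {(SYNC, PUBLIC, UP), (ASYNC, PUBLIC, UP)},
    "e-Book Reading Analytics": {(ASYNC, PERSONAL, DOWN)},
    "Learning Analytics for All": {(ASYNC, PERSONAL, UP)},
}


def systems_in(timing, scope, stage):
    """Return the sorted names of systems that occupy the given cell."""
    cell = (timing, scope, stage)
    return sorted(name for name, cells in SYSTEMS.items() if cell in cells)


def shared_cells(first, second):
    """Return the cells that two systems have in common."""
    return SYSTEMS[first] & SYSTEMS[second]


def uncovered_cells():
    """Return the cells of the 2 x 2 x 2 space that no listed system occupies."""
    covered = set().union(*SYSTEMS.values())
    return [cell for cell in product(Timing, Scope, Stage) if cell not in covered]


if __name__ == "__main__":
    # Print every cell of the design space with the systems that fall into it.
    for timing, scope, stage in product(Timing, Scope, Stage):
        names = ", ".join(systems_in(timing, scope, stage))
        print(f"{timing.value:12} {scope.value:8} {stage.value:10} {names}")
```

A call such as `shared_cells("Vacant House", "Community Reminder")` makes a commonality between two systems explicit, while `uncovered_cells()` would point to regions of the design space that a given collection of systems leaves unexplored.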
One of the major advantages of emphasizing interactions in big data and AI systems can be, as our experience with CourseQ [11] suggests, people’s increased trust in data-centric smart mechanisms such as recommender systems, which could in turn lead to people’s improved satisfaction with such systems.

===Acknowledgments===
This work was supported by JSPS KAKENHI Grant Number JP20H00622.

===References===
[1] G. Bell, J. N. Gray, The revolution yet to happen, in: Beyond Calculation, Springer, 1997, pp. 5–32.

[2] How much data is generated each day? | World Economic Forum, 2019. URL: https://www.weforum.org/agenda/2019/04/how-much-data-is-generated-each-day-cf4bddf29f/.

[3] M. Weiser, J. S. Brown, The coming age of calm technology, Beyond Calculation (1997) 75–85. URL: https://link.springer.com/chapter/10.1007/978-1-4612-0685-9_6. doi:10.1007/978-1-4612-0685-9_6.

[4] E. Z. Victorelli, J. C. Dos Reis, H. Hornung, A. B. Prado, Understanding human-data interaction: Literature review and recommendations for design, International Journal of Human-Computer Studies 134 (2020) 13–32.

[5] R. Mortier, H. Haddadi, T. Henderson, D. McAuley, J. Crowcroft, Human-data interaction: The human face of the data-driven society, Available at SSRN 2508051 (2014).

[6] A. Crabtree, R. Mortier, Human data interaction: historical lessons from social studies and CSCW, in: ECSCW 2015: Proceedings of the 14th European Conference on Computer Supported Cooperative Work, 19-23 September 2015, Oslo, Norway, Springer, 2015, pp. 3–21.

[7] A. Mashhadi, F. Kawsar, U. G. Acer, Human data interaction in IoT: The ownership aspect, in: 2014 IEEE World Forum on Internet of Things (WF-IoT), IEEE, 2014, pp. 159–162.

[8] F. Cabitza, A. Locoro, Human-data interaction in healthcare, Smart Technology Applications in Business Environments (2017) 184–203. doi:10.4018/978-1-5225-2492-2.CH009.

[9] M. Trajkova, A. Alhakamy, F. Cafaro, R. Mallappa, S. R. Kankara, Move your body: Engaging museum visitors with human-data interaction, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–13.

[10] N. Sailaja, R. Jones, D. McAuley, Designing for human data interaction in data-driven media experiences, in: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–7.

[11] B. Ma, M. Lu, Y. Taniguchi, S. Konomi, CourseQ: the impact of visual and interactive course recommendation in university environments, Research and Practice in Technology Enhanced Learning 16 (2021) 1–24.

[12] S. Konomi, S. Inoue, T. Kobayashi, M. Tsuchida, M. Kitsuregawa, Supporting colocated interactions using RFID and social network displays, IEEE Pervasive Computing 5 (2006) 48–56.

[13] S. Konomi, N. Thepvilojanapong, R. Suzuki, S. Pirttikangas, K. Sezaki, Y. Tobe, Askus: Amplifying mobile actions, in: International Conference on Pervasive Computing, Springer, 2009, pp. 202–219.

[14] T. Sasao, S. Konomi, V. Kostakos, K. Kuribayashi, J. Goncalves, Community reminder: Participatory contextual reminder environments for local communities, International Journal of Human-Computer Studies 102 (2017) 41–53.

[15] B. Ma, M. Lu, Y. Taniguchi, S. Konomi, Exploring jump back behavior patterns and reasons in e-book system, Smart Learning Environments 9 (2022) 1–23.

[16] S. Konomi, L. Gao, D. Mushi, An intelligent platform for offline learners based on model-driven crowdsensing over intermittent networks, in: International Conference on Human-Computer Interaction, Springer, 2020, pp. 300–314.

[17] S. Konomi, T. Sasao, S. Hosio, K. Sezaki, Using ambient WiFi signals to find occupied and vacant houses in local communities, Journal of Ambient Intelligence and Humanized Computing 10 (2019) 779–789.

[18] S. Konomi, T. Sasao, The use of colocation and flow networks in mobile crowdsourcing, in: UbiComp and ISWC 2015 - Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing and the Proceedings of the 2015 ACM International Symposium on Wearable Computers, 2015, pp. 1343–1348. doi:10.1145/2800835.2800967.

[19] P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, R. Wirth, et al., CRISP-DM 1.0: Step-by-step data mining guide, SPSS inc 9 (2000) 13.