Sensemaking in Multi-artefact Information Tasks

Tianwa Chen1
1 School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia

Abstract
Confronted with information silos and a growing volume of data in an increasingly interconnected data-driven world, knowledge workers, including technical and business users, often have to navigate multiple information artefacts to complete their tasks. These artefacts, dispersed across various representational formats and information systems, can lead to overlapping, redundant or even conflicting information, and to inefficiency in information retrieval and in knowledge workers' understanding. Despite a growing market of tools, the current body of knowledge lacks an understanding of how knowledge workers make sense of multi-artefact information tasks and through what strategies. Motivated by the human-centric nature of the problem, this PhD project employs experiments, both in lab studies and on crowdsourcing platforms, and uses a number of behavioral and performance measures to unpack the cognitive demands on knowledge workers as they make sense of dual artefact tasks and multi-artefact tasks respectively. This project aims to propose an integrative model of sensemaking and cognitive processing in multi-artefact information tasks. The findings contribute to a better understanding of sensemaking processes in various settings, inform modeling practice, and inform the design of supporting tools.

Keywords
Sensemaking, Business process modeling, Data curation, Data quality

1. Introduction
With the widespread problem of information silos and an increase in data accessibility, knowledge workers, including technical and business users, often rely on multiple information artefacts across different systems to complete their tasks.
According to IDC [1], a typical knowledge worker spends 36% of their workday searching for and consolidating information from multiple artefacts, yet finds the required information only 56% of the time. 61% of knowledge workers regularly access four or more different artefacts to retrieve the information they need for their work, and 15% access 11 or more. These artefacts, dispersed across various representational formats and information systems, can lead to overlapping, redundant or even conflicting information, and to inefficiency in information retrieval and in knowledge workers' understanding.
In practice, given that the process can be more diverse and exploratory when knowledge workers navigate multi-artefact information tasks, there has been a strong response from the market with a plethora of tools to support the 'human-in-the-loop' [2]. However, despite an increasing research focus on the behaviour of knowledge workers in many contexts [3, 4], there has been little focus on the process by which knowledge workers make sense of these multi-artefact information tasks. The current body of knowledge does not adequately explain knowledge workers' sensemaking behaviours and strategies when interacting with these tasks. To explore this problem, we undertake exploratory studies to investigate knowledge workers' behaviour in two settings that offer dual artefact and multi-artefact tasks respectively.

Proceedings of the Doctoral Consortium Papers Presented at the 34th International Conference on Advanced Information Systems Engineering (CAiSE 2022), June 06–10, 2022, Leuven, Belgium
tianwa.chen@uq.edu.au (T. Chen) | ORCID 0000-0002-5135-0313 (T. Chen)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
For the setting of dual artefact tasks, in the context of business process management systems and business rule management systems, two commonly used artefacts are business process models and business rule repositories. When presented separately, these two artefacts are known to cause a lack of shared understanding, and conflicts and redundancies that can lead to inefficiencies and even compliance breaches [5]. Although a number of integrated modeling approaches for business processes and rules have been proposed, there is limited knowledge on how these approaches affect worker behavior and task performance.
As for the setting of multi-artefact tasks, there is increasing evidence that knowledge workers, including data scientists, engineers and analysts, can spend in excess of 80 percent of their time and effort on the data curation process in a typical data science project [6]. These cost-intensive processes comprise a number of artefact tasks and are considered a drain on analytic functions within organizations. Due to the inherent complexity of these tasks, the bulk of data curation tasks still cannot feasibly and efficiently be addressed by machine-based algorithms [7] without human intervention (e.g., manual inspection) [3]. Moreover, existing tools that support data curation tasks are often domain-focused and challenging to use in coordination with other program functionalities. Therefore, given the increasing demand for a more cost-efficient data curation process, researchers have started to look at how knowledge workers engage with data (e.g., [3]). However, there is a paucity of research on how knowledge workers interact with multi-artefact information tasks and what processes they follow while carrying out data curation activities.
Accordingly, motivated by the human-centric nature of the problem, this PhD project employs exploratory studies to investigate the behavior of knowledge workers engaging with information tasks in both a dual artefact setting and multi-artefact settings. This project aims to propose an integrative model of sensemaking and cognitive processing in multi-artefact information tasks. We approach the design of the research through a sensemaking lens and consider the foundational sensemaking constructs of information foraging and information processing.

2. Research Goal
The project is divided into three studies, including experiments in controlled lab studies and on crowdsourcing platforms, to understand the cognitive demands on knowledge workers as they make sense of multi-artefact information tasks. It uses a number of behavioral and performance measures through the use of eye-tracking and electroencephalography (EEG) devices in controlled lab experiments. The first study investigates knowledge workers' behavior in dual artefact tasks when the form of integrated representation of the artefacts (namely business process models and business rules) and task complexity change. The second study aims to understand how knowledge workers engage with multi-artefact tasks in the data curation process. We will first investigate the data curation process specifically in relation to data quality detection, and how to build repeatable and efficient data curation processes by harnessing the collective intelligence of a group of knowledge workers. The third study aims to propose an integrative model of sensemaking and cognitive processing in multi-artefact information tasks by consolidating the results of the lab studies with existing frameworks and theories in sensemaking and cognitive processing. In addition, we will test the research model by collecting empirical data from a crowdsourcing platform.

3.
Related Work

3.1. Sensemaking
The sensemaking methodology was introduced by Dervin in 1972 to design a human communications system; it was later developed into the sensemaking triangle model, which represents how a person makes sense of a situation through a space-time context [8]. Russell et al.'s "learning loop complex" [9] first proposed the cost structure of the sensemaking model, which describes the process people use to understand and encode data to answer task-specific questions. Since these seminal works, literature in various domains has contributed theories and models of sensemaking. More recently, there has been an increased focus on understanding how sensemaking operates in the era of increasingly complex information artefacts [10]. For instance, researchers have used a sensemaking perspective to understand how individuals make sense of fairness assessment systems in ML [11], reuse knowledge [12], apply debugging strategies [13], and support knowledge acceleration for programming [14].
The cognitive constructs of attention and memory have a natural and strong affinity to the two phases in sensemaking models. Cognitive load theory [15, 16] provides proven mechanisms through which these constructs can be operationalized. For example, attention and search behaviour have been measured through eye-tracking devices, which capture data on visual scanning (eye movement) and attention (eye fixations) [17]. This data, in turn, can be used for various behavioural measurements, such as cognitive load, visual association, visual cognition efficiency, and intensity [18]. While there is a long history of the use of eye-tracking technology in medical and psychology studies [19], its use in the context of data work with human-machine teaming is relatively recent. However, it holds great promise for a deeper understanding of user behaviour in complex tasks.
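To make such fixation-derived measures concrete, the following is a minimal, illustrative sketch (not the output format of any particular eye tracker): it computes two common proxies from a fixation log, where a longer mean fixation duration is typically read as deeper processing and a higher fixation rate as more intense visual search. The record layout and field names are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    start_ms: int  # fixation onset timestamp (milliseconds)
    end_ms: int    # fixation offset timestamp (milliseconds)

def fixation_measures(fixations):
    """Compute simple attention/load proxies from a fixation log:
    mean fixation duration and fixations per second over the
    recording span covered by the log."""
    total_ms = sum(f.end_ms - f.start_ms for f in fixations)
    span_ms = fixations[-1].end_ms - fixations[0].start_ms
    return {
        "mean_fixation_ms": total_ms / len(fixations),
        "fixations_per_s": 1000 * len(fixations) / span_ms,
    }

# Three fixations of 200, 300, and 400 ms over a 1-second span
log = [Fixation(0, 200), Fixation(250, 550), Fixation(600, 1000)]
m = fixation_measures(log)
```

In practice such measures would be computed per participant and per task from the exported gaze log, and compared across treatment groups.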
To the best of our knowledge, existing sensemaking studies have focused on qualitative or perception-based measures, with limited use of behavioural and performance measures. Hence, we consider the use of eye-tracking devices in a controlled experiment as a novel and objective means to capture and expose sensemaking behaviours and the interactive process through which knowledge workers explore multi-artefact tasks in different settings.

3.2. Dual Artefact Information Tasks in the Case of Business Process Models and Business Rules
In the dual artefact task setting, our study considers the specific context of business process and business rule modeling – two complementary approaches for modeling business activities, for which multiple integration methods [20] exist to improve their individual representational capacity. In summary, the integration methods can be categorized into three approaches with distinct format and construction, namely: text annotation, diagrammatic integration, and link integration [21]. Text annotation and link integration both use a textual expression to describe the business rules and connect them with the corresponding section of the process model. With link integration, visual links explicitly connect corresponding rules with the relevant process section. Diagrammatic integration relies on graphical process model constructs, such as sequence flows and gateways, to represent business rules in the process model. Each of these methods has strengths and weaknesses, as summarized in [5], and thus a potential impact on a knowledge worker's understanding of a process.

3.3. Multi-artefact Information Tasks in Data Curation
The importance and scope of data curation have increased multi-fold in the era of big data, due to the prevalence of external and repurposed data in data science projects. A primary driver of data curation is the large proportion of externally acquired datasets with different quality levels.
In fact, even internal data may have to be repurposed [22] to meet the specific needs of a certain data science project. In either case, the data curation process constitutes a number of multi-artefact tasks, which may include selection, classification, transformation, filtering, imputation, integration/fusion, or validation [23].
Currently, three main approaches are evident in the context of data curation, namely: ad-hoc/manual, automated, and crowd-sourced approaches. The manual approach is the most common [23, 24]. However, data quality issues constitute a major challenge for knowledge workers using a manual approach, as multiple data quality issues are likely to exist in large datasets, e.g. completeness, accuracy, and consistency [3, 25]. To study knowledge workers, recent research has outlined the work cycle of data scientists, ranging from discovery to design [3]. We note that utilizing a crowd-sourcing approach for building data curation processes from multiple crowd-sourced tasks is currently under-studied and is a key objective of this project.

4. Study 1 – Sensemaking in Dual Artefact Tasks – The Case of Business Process Models and Business Rules
In this study, we investigate knowledge workers' behavior in dual artefact tasks when the form of integrated representation of the artefacts (namely business process models and business rules) and task complexity change. Using a sensemaking lens, we can delineate the behavior between developing model understanding and task accomplishment.

4.1. Study Design
We use an experimental research design. In line with sensemaking foundations, we segment the experiment into two phases, namely a searching and encoding phase (which we term the understanding phase) and a task-specific information processing phase (termed the answering phase).
The understanding phase commences when the participant first fixates on the experiment screen, and the answering phase commences when the participant starts to type the answer in the question area for the first time (see Fig. 1). Due to space limitations, the complete experiment instruments are available for download1.

Figure 1: Visual experiment design [26]. The divided areas of interest (AOIs), named for analysis purposes, are not displayed to the participants.

The experiment data consists of a pre-experiment questionnaire, eye tracking log data, and task performance data. The eye tracking data was collected through a Tobii Pro TX300 eye tracker2, which captures data on fixations, gaze, saccades, etc., with timestamps. To capture sensemaking behavior, we used measurements related to fixation durations and frequencies, measurements related to AOI-specific fixations, and transitions between AOIs.
The experiment instruments included a tutorial, the treatments and a questionnaire. Each group of participants was first provided with a BPMN tutorial and was then offered a model using one of the three different rule integration approaches (one per treatment group). The scenario of the model and rules originated from a travel booking diagram included in OMG's BPMN 2.0 documentation3. We ensured, through multiple revisions, that we created informationally equivalent models for all three integration approaches, and kept all confounding factors constant, including the same eye-tracking lab equipment and tutorial content. We did not limit the experiment duration, nor did we impose a word count limit on participants' answers.
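The phase segmentation and AOI transition counting described above can be sketched as follows. This is a simplified illustration under assumed inputs (timestamped, AOI-labelled fixations and the timestamp of the first keystroke in the question area); it is not the actual analysis pipeline or the Tobii export format.

```python
from collections import Counter

def segment_and_transitions(fixations, first_keystroke_ms):
    """Split an AOI-labelled fixation sequence into the understanding
    phase (before the first keystroke in the answer area) and the
    answering phase, then count AOI-to-AOI transitions per phase.
    `fixations` is a list of (timestamp_ms, aoi_name) pairs in
    chronological order; same-AOI refixations are not transitions."""
    phases = {"understanding": [], "answering": []}
    for ts, aoi in fixations:
        key = "understanding" if ts < first_keystroke_ms else "answering"
        phases[key].append(aoi)
    transitions = {}
    for phase, seq in phases.items():
        transitions[phase] = Counter(
            (a, b) for a, b in zip(seq, seq[1:]) if a != b
        )
    return transitions

# Hypothetical log: the participant scans model and rules, then
# reads the question and looks back at the model while answering.
log = [(0, "model"), (300, "rules"), (700, "model"),
       (1200, "question"), (1500, "model")]
t = segment_and_transitions(log, first_keystroke_ms=1000)
```

Per-phase transition counts of this kind can then be compared across the three integration approaches and levels of task complexity.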
The model was adjusted to ensure consistency of format for each of the integration approaches, while providing some diversity in terms of constructs and coverage, to gain further insights into the relationship between integration approaches and task complexity.

1 The experiment materials can be downloaded from bit.ly/3N5Kr6O
2 For more specifications of the eye tracker, please visit https://www.tobiipro.com/product-listing/tobii-pro-tx300/
3 The model originated from OMG's BPMN 2.0 examples and can be viewed at http://www.omg.org/cgi-bin/doc?dtc/10-06-02

4.2. Current Progress
More details of the current results can be found in the publication [26]. Our results show that link integration yields better task performance in terms of accuracy and efficiency, especially as task complexity increases. Additionally, our results provide some evidence that diagrammatic integration yields better task performance on local questions in terms of accuracy, but also requires the most effort in the initial information foraging (understanding) phase.
The findings from this study also form the basis of our investigation in the next step. We will use complementary approaches such as cued retrospective 'thinking-out-loud' [27] and biosensors (e.g. electroencephalography, captured by Emotiv4) to provide further explanations of the sensemaking behavior and cognitive process. We also acknowledge the limitations of the current research: we only included the basic constructs of business process models, whereas advanced loop and nesting structures may introduce further complexities in sensemaking. Therefore, we will also analyze the change in knowledge workers' behavior over longer tasks with more variability in task complexity to further reveal insights into sensemaking; this may be especially valuable for training and work allocation purposes.

5.
Study 2 – Sensemaking in Multi-artefact Information Tasks – The Case of Data Curation
As the first step, we aim to understand how knowledge workers engage with multi-artefact tasks in data curation, specifically in relation to data quality detection.

5.1. Study Design
To capture knowledge workers' sensemaking behaviours while discovering data quality issues, we used an experimental design in a lab study with purpose-built experiment platforms that mimic typical data exploration tools. The lab setting [25] enabled us to use advanced tracking devices (e.g., eye-trackers and activity loggers) to capture interaction behaviors. Our interface design is typical of several existing data exploration platforms that provide UI areas with a similar arrangement, see e.g., Talend Cloud API, Jupyter (jupyter.org), RapidMiner (rapidminer.com), or PowerBI (powerbi.microsoft.com). The UI of our data curation platform has three main panels: the DataOps area with the internal functions on the left, the working console area in the middle, and the data view and toolkit area, used to view and record data quality annotations, on the right (see Fig. 2). We custom built two experiment platforms, one requiring manual coding to undertake data quality discovery and the other offering built-in functions. On both experiment platforms, we kept all other variables constant and provided equivalent information with the same interface design, including the same dataset and a set of pre-defined functions.

Figure 2: User interface of the experimental platforms. (a) Experiment platform with coding [28].

To provide internal data curation resources, we pre-defined 21 DataOps, ranging from importing essential libraries to complex Boolean operations involving regular expressions [28]. This set of DataOps is sufficient to complete all tasks in our experiment (i.e., participants do not necessarily need to refer to external materials).

4 For more information about Emotiv, please see https://www.emotiv.com/
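As an illustrative sketch of the kind of data quality check such DataOps support, the following flags missing values (a completeness issue) and rule-violating formats (a validity issue). The rows, column names, and validation rules below are invented for illustration; they are not the actual experiment dataset or the platform's DataOps.

```python
import re

# Toy rows mirroring the experiment dataset's four columns
# (ID, name, contact number, join date); values are invented.
rows = [
    {"id": "1", "name": "Alice", "contact": "0412345678", "join_date": "2021-03-01"},
    {"id": "2", "name": "",      "contact": "0498765432", "join_date": "2021-13-01"},
    {"id": "3", "name": "Bob",   "contact": "not-a-number", "join_date": "2021-05-20"},
]

PHONE = re.compile(r"^\d{10}$")                      # assumed: 10-digit numbers
DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-\d{2}$")  # assumed: YYYY-MM-DD

def annotate_quality_issues(rows):
    """Return (row id, column, issue) annotations for two common
    data quality issue types: missing values and invalid formats."""
    issues = []
    for row in rows:
        for col, val in row.items():
            if not val:
                issues.append((row["id"], col, "missing value"))
        if row["contact"] and not PHONE.match(row["contact"]):
            issues.append((row["id"], "contact", "malformed phone"))
        if row["join_date"] and not DATE.match(row["join_date"]):
            issues.append((row["id"], "join_date", "invalid date"))
    return issues

found = annotate_quality_issues(rows)
```

On the coding platform, participants composed checks of roughly this shape from the pre-defined DataOps; on the other platform, equivalent checks were offered as built-in functions.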
The dataset includes 13,000 records and four columns (ID, name, contact number and join date). We chose the five most recognised and common types of data quality issues [28] and injected them into the dataset with the help of the Parallel Data Generation Framework [29] to provide the ground truth. The size of the data and the number of injected errors removed the option of manual annotation. Participants were required to complete the task of identifying and annotating the data quality issues, and were only allowed to use the given browser throughout the experiment. The experiment commenced with a pre-experiment survey, followed by a tutorial outlining definitions and examples of data quality issues and a practice example; participants then started the formal experiment whenever they felt ready. At the end of the experiment, participants were asked to complete a post-experiment survey. The surveys, based on [30], captured participant perceptions of the experiment tasks and helped ensure internal validity.

5.2. Current Progress
More details of the current results can be found in the publications [28, 31, 32]. Our findings show that the approaches taken by the knowledge workers participating in our study were often diverse and complementary, in that they identified different data quality issues with different levels of effectiveness and robustness. This bears implications for automatically creating aggregated data curation processes through crowd intelligence. However, the current work is not without limitations, as it was based on a lab experiment and focused only on detecting data quality issues. Therefore, in the next step, we will conduct experiments with real crowd workers to fully understand the sensemaking process in the complex artefact tasks of data curation, and build effective, robust, and repeatable data curation processes by learning from a crowd of knowledge workers.

6.
Study 3 – Sensemaking and Cognitive Processing Model in Multi-Artefact Information Tasks
This work is still in its early stages. Based on existing frameworks and theories in sensemaking and cognitive processing, we plan to consolidate the findings from Studies 1 and 2 to propose an integrative sensemaking and cognitive processing model for multi-artefact information tasks. We will test the proposed research model and hypotheses using empirical data collected from a crowdsourcing platform.

7. Expected Contributions
This research project allows us to understand the cognitive demands on knowledge workers as they make sense of multi-artefact information tasks. Expected contributions include addressing the current gap in understanding the sensemaking process of knowledge workers in multi-artefact information tasks, contributing to sensemaking theory, informing modelling practice, providing guidelines on training knowledge workers, and informing the design of supporting tools and tasks.

Acknowledgments
This research is supported by a UQ RTP research scholarship. I would like to thank Prof. Shazia Sadiq, Prof. Marta Indulska, and A/Prof. Gianluca Demartini for supervising this PhD project.

References
[1] D. Schubmehl, D. Vesset, The knowledge quotient: Unlocking the hidden value of information using search and content analytics, White paper, IDC (2014).
[2] N. Sambasivan, S. Kapania, H. Highfill, D. Akrong, P. Paritosh, L. M. Aroyo, "Everyone wants to do the model work, not the data work": Data cascades in high-stakes AI, in: Proceedings of the 2021 CHI Conference, 2021, pp. 1–15.
[3] M. Muller, I. Lange, D. Wang, D. Piorkowski, J. Tsay, Q. V. Liao, C. Dugan, T. Erickson, How data science workers work with data: Discovery, capture, curation, design, creation, in: Proceedings of the 2019 CHI Conference, 2019.
[4] D. J. Piorkowski, S. D. Fleming, I. Kwan, M. M. Burnett, C. Scaffidi, R. K. Bellamy, J.
Jordahl, The whats and hows of programmers' foraging diets, in: Proceedings of the CHI Conference, 2013, pp. 3063–3072.
[5] W. Wang, M. Indulska, S. W. Sadiq, Cognitive efforts in using integrated models of business processes and rules, in: CAiSE Forum, 2016, pp. 33–40.
[6] D. Patil, Data Jujitsu, O'Reilly Media, Inc., 2012.
[7] S. Sadiq, T. Dasu, X. L. Dong, J. Freire, I. F. Ilyas, S. Link, M. J. Miller, F. Naumann, X. Zhou, D. Srivastava, Data quality: The role of empiricism, ACM SIGMOD Record 46 (2018) 35–43.
[8] B. Dervin, A theoretic perspective and research approach for generating research helpful to communication practice, in: Sense-Making Methodology Reader: Selected Writings of Brenda Dervin, 2003, pp. 251–268.
[9] D. M. Russell, M. J. Stefik, P. Pirolli, S. K. Card, The cost structure of sensemaking, in: Proceedings of INTERACT '93 and CHI '93, 1993, pp. 269–276.
[10] D. M. Russell, G. Convertino, A. Kittur, P. Pirolli, E. A. Watkins, Sensemaking in a senseless world: 2018 workshop abstract, in: Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 1–7.
[11] Z. Gu, J. N. Yan, J. M. Rzeszotarski, Understanding user sensemaking in machine learning fairness assessment systems, in: Proceedings of the Web Conference, 2021, pp. 658–668.
[12] M. X. Liu, A. Kittur, B. A. Myers, To reuse or not to reuse? A framework and system for evaluating summarized knowledge, Proceedings of the ACM on HCI 5 (2021) 1–35.
[13] V. Grigoreanu, M. Burnett, S. Wiedenbeck, J. Cao, K. Rector, I. Kwan, End-user debugging strategies: A sensemaking perspective, ACM Transactions on Computer-Human Interaction (TOCHI) 19 (2012) 1–28.
[14] M. X. Liu, S. Burley, E. Deng, A. Zhou, A. Kittur, B. A. Myers, Supporting knowledge acceleration for programming from a sensemaking perspective, in: Sensemaking Workshop at the CHI Conference on Human Factors in Computing Systems, 2018.
[15] F. Chen, J. Zhou, Y. Wang, K. Yu, S. Z. Arshad, A. Khawaji, D.
Conway, Robust Multimodal Cognitive Load Measurement, Springer, 2016.
[16] J. Sweller, P. Ayres, S. Kalyuga, Measuring cognitive load, in: Cognitive Load Theory, Springer, 2011, pp. 71–85.
[17] A. T. Duchowski, Gaze-based interaction: A 30 year retrospective, Computers & Graphics 73 (2018) 59–69.
[18] K. Rayner, Eye movements in reading and information processing: 20 years of research, Psychological Bulletin 124 (1998) 372.
[19] M. A. Just, P. A. Carpenter, Eye fixations and cognitive processes, Cognitive Psychology 8 (1976) 441–480.
[20] G. Knolmayer, R. Endl, M. Pfahrer, Modeling processes and workflows by business rules, in: Business Process Management, Springer, 2000, pp. 16–29.
[21] T. Chen, W. Wang, M. Indulska, S. Sadiq, Business process and rule integration approaches – an empirical analysis, in: International Conference on Business Process Management, Springer, 2018, pp. 37–52.
[22] R. Zhang, M. Indulska, S. Sadiq, Discovering data quality problems, Business & Information Systems Engineering 61 (2019) 575–593.
[23] T. Hey, A. Trefethen, The data deluge: An e-science perspective, in: Grid Computing: Making the Global Infrastructure a Reality, 2003, pp. 809–824.
[24] E. Rahm, H. H. Do, Data cleaning: Problems and current approaches, IEEE Data Eng. Bull. 23 (2000) 3–13.
[25] C. Sutton, T. Hobson, J. Geddes, R. Caruana, Data Diff: Interpretable, executable summaries of changes in distributions for data wrangling, in: Proceedings of the 24th ACM SIGKDD Conference, 2018, pp. 2279–2288.
[26] T. Chen, S. Sadiq, M. Indulska, Sensemaking in dual artefact tasks – the case of business process models and business rules, in: International Conference on Conceptual Modeling, Springer, 2020, pp. 105–118.
[27] T. Van Gog, F. Paas, J. J. Van Merriënboer, P. Witte, Uncovering the problem-solving process: Cued retrospective reporting versus concurrent and retrospective reporting, Journal of Experimental Psychology: Applied 11 (2005) 237.
[28] T. Chen, L. Han, G. Demartini, M.
Indulska, S. Sadiq, Building data curation processes with crowd intelligence, in: International Conference on Advanced Information Systems Engineering, Springer, 2020, pp. 29–42.
[29] Y. Tay, Data generation for application-specific benchmarking, Proceedings of the VLDB Endowment 4 (2011) 1470–1473.
[30] S. G. Hart, NASA-Task Load Index (NASA-TLX); 20 years later, in: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 50, 2006, pp. 904–908.
[31] L. Han, T. Chen, G. Demartini, M. Indulska, S. Sadiq, On understanding data worker interaction behaviors, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 269–278.
[32] S. Yu, T. Chen, L. Han, G. Demartini, S. Sadiq, DataOps-4G: On supporting generalists in data quality discovery, IEEE Transactions on Knowledge and Data Engineering (2022).