Detecting Changes in Mental Models during Interaction

Chuyang Wu1,2,∗,†, Shanshan Zhang1,† and Jussi P. P. Jokinen2

1 University of Helsinki, Pietari Kalmin katu 5, 00560 Helsinki, Finland
2 University of Jyväskylä, Seminaarinkatu 15, PL 35, 40014 Jyväskylä, Finland

Abstract
This paper introduces a novel computational cognitive model that maps latent mental models to observable behaviors, allowing the system to detect changes in users' mental models from their actions. We propose an inference framework to dynamically adjust to the user's evolving understanding and decision-making processes. An empirical experiment demonstrates the framework's ability to accurately detect shifts in users' mental models based on their interactions. The results indicate a consistent improvement in prediction accuracy and a decrease in variance over time, suggesting the model's potential for real-time application in designing adaptive interactive systems.

Keywords
Human-Computer Interaction, User Modeling, Collaborative Human-Computer Systems, Adaptive Systems

1. Introduction

An intelligent interactive system needs to adapt to the behaviors of its users. It should understand their intentions and anticipate what is coming next. A user's interactive behavior is shaped by their mental model, the user's knowledge and beliefs of the interactive system [1], which is not directly observable. We can parameterize the mental model to build a computational user model [2]. In such a model, latent (i.e., unobservable) factors are mapped to observed behavior, allowing us to formalize the mechanism of interactive behavior. We can then build adaptive systems that solve for the mental model from observations, and the interactive system can be designed to adapt accordingly.

However, a problem in inferring mental models is that they are not static during interaction. For example, as users become more experienced, their mental models change [1]. Failures of the interactive system to detect these changes would lead to wrong or obsolete inference of mental models and ineffective adaptation, to the detriment of the user.

In this paper, we propose a computational model of interaction that accounts for how changes in the mental model lead to changes in interactive behavior. We then define a framework to infer and quantify the mental model from observed behavior and demonstrate how to detect changes in parameter value from behavioral data with an empirical experiment. In summary, this paper contributes to the computational modeling of interactive behavior by proposing:

• a computational model of how interactive behavior emerges from quantified mental models;
• an inference framework to detect these changes from observed behavior.

Figure 1: Harry is a novice warehouse operator who previously only understood ultrasound readings. Now he starts to scan for radio frequencies. Is this a mistake, or has he learned how to read radio frequencies?

Consider a hypothetical scenario involving a multisensor smart scanner that can obtain ultrasound and radio frequency readings of boxes at a warehouse. Suppose that different contents produce different sensor readings. Harry, a novice operator yet to learn to read radio frequencies, relies solely on ultrasound to determine the content. Accordingly, the scanner should provide hints on how to interpret ultrasound readings. If Harry suddenly scans for radio frequency data, it will likely be a mistake, and the scanner should intervene to avert it.

Harry practices reading radio frequency data and associating the readings with the contents. At some point, his mental model – an internal representation of the dynamics and facts of the external task – evolves to have a closer correspondence with reality. If the AI of the scanner does not pick up on this evolution, it will continue to recognize Harry's actions as mistakes and offer ineffective or detrimental hints.
Therefore, intelligent interactive systems must accurately infer users' changing mental models to provide useful adaptation.

2. Background Review

In human-computer interaction, mental models represent how the interaction is internally interpreted and reconstructed by the users [3]. How closely a user's mental model matches the real interactive environment would determine the effectiveness and efficiency of the user's interactive strategy [4]. Particularly, suppose a user fails to understand the designs of an interactive system. In that case, it is more likely that the mental model would be poor, and the user would likely end up missing their goals and have a frustrating experience.

TKTP 2024: Annual Doctoral Symposium of Computer Science, 10.–11.6.2024, Vaasa, Finland
∗ Corresponding author.
† These authors contributed equally.
chuyang.wu@helsinki.fi (C. Wu); shanshan.zhang@helsinki.fi (S. Zhang); jussi.p.p.jokinen@jyu.fi (J. P. P. Jokinen)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Interactive systems are often designed to adapt to user needs and habits to create an intuitive user experience. The classic approach is to collect behavioral data, such as keystrokes, mouse movements, or system logs, and analyze it for patterns [5]. Interactive systems would update based on similarities between user behaviors and learned patterns. These approaches, however, do not explain the reasons behind the user's actions. When designing such a system, it is therefore desirable for the system to align with the users' mental models [6]. To do so would require a model of the user's mental model that accounts for user behavior and decision-making [7], allowing the interactive systems to adapt to the user's goals [8, 9].
Parameterized, computational models of interaction have been proposed to explain the user's decision-making process during an interaction [10, 11]. These models establish a causal link between observed user behavior and latent psychological factors and parameterize the latter to build a computational framework, thus paving a way to infer the values of latent factors from observed behavior [12, 13]. This approach can be extended to study the effect of mental models on user behavior, enabling the design of intelligent interactive systems that adapt to users' mental models.

However, these models have not addressed cases where the latent factors change. A user could gain knowledge and experience during an interaction to become more skillful, which would be reflected in the mental model. Failure to account for such changes would render any interactive system's adaptation ineffective or even detrimental. Consequently, our present work formalizes a computational framework for interaction that detects changes in mental models based on observed user behavior. This would be important for creating intelligent interactive systems and collaborative AI that are truly adaptive to the users.

3. Method

3.1. Interaction as a POMDP

We view the user of an interactive system as an agent trying to solve a Partially Observable Markov Decision Process (POMDP) [14]. A POMDP is defined as a tuple (𝑆, 𝐴, 𝑇, 𝑅, 𝑂, Ω, 𝛾) where:

• 𝑆 is a finite set of states of the environment.
• 𝐴 is a finite set of actions available to the agent.
• 𝑇 ∶ 𝑆 × 𝐴 × 𝑆 → [0, 1] is the (probabilistic) transition function, where 𝑇(𝑠, 𝑎, 𝑠′) = 𝑃(𝑠′|𝑠, 𝑎) represents the probability of transitioning to state 𝑠′ when action 𝑎 is taken in state 𝑠.
• 𝑅 ∶ 𝑆 × 𝐴 × 𝑆 → ℝ is the reward function for each transition from 𝑠 to 𝑠′ due to 𝑎.
• 𝑂 is a finite set of possible observations.
• Ω ∶ 𝑆 × 𝐴 × 𝑂 → [0, 1] is the (probabilistic) observation function, where Ω(𝑠′, 𝑎, 𝑜) = 𝑃(𝑜|𝑠′, 𝑎) represents the probability of observation 𝑜 after action 𝑎, in state 𝑠′.
• 𝛾 ∈ [0, 1] is the discount factor for the present value of future rewards.

Figure 2: Interaction as a POMDP. The states 𝑆 are not directly observable. The agent makes an observation 𝑂, from which a belief 𝑏 is formed. Based on the belief, the agent takes an action 𝑎 which leads to a reward 𝑟, as well as a transition to the next state. The reward depends on both the state and the action.

The interaction process between an agent and a POMDP environment can now be described in Figure 2. In a POMDP, the agent cannot know the environment state directly. Instead, it observes the state and forms an internal representation of the state as a belief 𝑏 ∈ 𝐵, with 𝐵 being the set of all possible beliefs. The agent aims to find an optimal policy 𝜋 ∶ 𝐵 → 𝐴 to guide its choice of action that maximizes the expected discounted rewards over time. Specifically, the interaction takes place as follows.

1. Initial Belief State: The interaction starts with the agent holding an initial belief state 𝑏0(𝑠), representing the agent's initial knowledge about the environment, 𝑏0 ∈ 𝐵, 𝑠 ∈ 𝑆.
2. Action Selection: At each time step 𝑡, the agent selects an action 𝑎𝑡 ∈ 𝐴 based on its current belief state 𝑏𝑡(𝑠) according to a policy 𝜋 to maximize the expected reward.
3. Environment Response: The environment transitions from 𝑠𝑡 to 𝑠𝑡+1 according to 𝑇(𝑠𝑡, 𝑎𝑡, 𝑠𝑡+1) = 𝑃(𝑠𝑡+1|𝑠𝑡, 𝑎𝑡). This is not directly observable by the agent.
4. Observation: The agent receives an observation 𝑜𝑡+1 ∈ 𝑂, generated according to the observation model: Ω(𝑠𝑡+1, 𝑎𝑡, 𝑜𝑡+1) = 𝑃(𝑜𝑡+1|𝑠𝑡+1, 𝑎𝑡).
5. Belief Update: The agent performs a Bayesian update of its belief to 𝑏𝑡+1(𝑠) with observation 𝑜𝑡+1, action 𝑎𝑡, and previous belief 𝑏𝑡(𝑠), and revises its knowledge about the environment.
6. Reward: The agent receives a reward 𝑅(𝑠𝑡, 𝑎𝑡, 𝑠𝑡+1) based on the state transition.
7. Repetition: Steps 2 through 6 are repeated, with the agent continually updating its belief state and selecting actions until a terminal condition is reached.

The agent can use reinforcement learning to find the strategy that maximizes the future-discounted cumulative reward: 𝑉(𝑠) = max𝑎 {𝑟(𝑠, 𝑎) + 𝛾 ∑𝑠′∈𝑆 𝑇(𝑠, 𝑎, 𝑠′)𝑉(𝑠′)}. It has been theorized and shown empirically that as long as the POMDP formalism correctly models the task environment and the relevant parts of human cognition, an optimal policy approximates that of human behavior. This is known as computational rationality [15].

3.2. Mental Models and Interactive Behavior

Given that the true state 𝑠𝑖 is not directly observable, the agent forms its belief 𝑏𝑖, a probability distribution over all possible states in the environment at 𝑖. We propose that the agent performs a Bayesian update to obtain 𝑏𝑖 using its mental model, 𝑡̂:

𝑏𝑖+1 ∝ 𝑡̂(𝑏𝑖, 𝑜𝑖),  (1)

In Equation 1, the mental model is a (probabilistic) function that updates the agent's belief given observation and previous belief. Thus the mental model 𝑡̂ can be viewed as the (imperfect) transition function 𝑇 of an individual agent. An ideal agent with perfect knowledge and expertise of the interactive environment would have the true mental model identical to 𝑇. In reality, even given the same observation, agents with different mental models 𝑡̂ would have different ways to update their beliefs.
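Equation 1 can be made concrete with a small sketch. The following Python snippet (our illustration, not code from the paper; all names and numbers are assumptions) performs one discrete Bayesian belief update in which a subjective transition matrix `T_hat` stands in for the mental model 𝑡̂. An agent whose `T_hat` differs from the true 𝑇 arrives at a different belief from the same observation, which is exactly how mental-model differences surface in behavior.

```python
import numpy as np

def belief_update(b, a, o, T_hat, Omega):
    """One Bayesian belief update (cf. Equation 1), using the agent's
    subjective transition model T_hat in place of the true T.

    b        : belief over states, shape (S,)
    T_hat[a] : subjective transition matrix for action a, shape (S, S)
    Omega[a] : observation matrix P(o | s', a), shape (S, O)
    """
    predicted = b @ T_hat[a]                   # propagate belief through the mental model
    unnormalized = predicted * Omega[a][:, o]  # weight by likelihood of the observation
    return unnormalized / unnormalized.sum()

# Toy 2-state, 1-action, 2-observation example (illustrative numbers only).
T_hat = {0: np.array([[0.9, 0.1],
                      [0.2, 0.8]])}
Omega = {0: np.array([[0.7, 0.3],
                      [0.1, 0.9]])}
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=1, T_hat=T_hat, Omega=Omega)
# b1 leans toward state 1, the state more likely to emit observation 1
```

Substituting a different `T_hat` for the same observation sequence yields a different belief trajectory, which is the mechanism the inference framework in Section 3.3 exploits.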
Table 1: Different Algorithmic Approaches to Infer Changes in an Agent's Mental Model

Category — Description
Measurement — Utilize statistical distance measures (e.g., KL divergence, Total Variation distance, Wasserstein distance) to quantify the difference between successive posterior distributions of the mental model (𝑃(𝑡̂ ∣ 𝐷𝑜)) to assess how one distribution diverges from another.
Threshold — Define a threshold for a significant change, based on domain knowledge, statistical criteria, or adaptive methods. Validate this threshold through simulations or historical data to ensure it effectively differentiates between routine updates and significant model changes.
Monitoring — Continuously or periodically calculate the distance measure between the current and previous posterior distributions, storing past distributions for comparison. If the distance exceeds the threshold, infer that a significant change in the mental model has occurred.

3.3. Inferring Mental Models from Observation

We can use the framework in Sections 3.1 and 3.2 to simulate agents with different mental models and use them to generate simulated behavior. When a human user interacts to generate real data, it can then be compared to the simulated data to determine the likely mental model of the human user.

Suppose that the mental model has the probability distribution 𝑃(𝑡̂). From Sections 3.1 and 3.2 we know how an agent with a mental model 𝑡̂ would behave. Consequently, we also know the conditional probability distribution 𝑃(𝐷𝑜 ∣ 𝑡̂), given observed behavior data 𝐷𝑜. Bayes' rule can then be used to invert the conditional probability and find:

𝑃(𝑡̂ ∣ 𝐷𝑜) ∝ 𝑃(𝐷𝑜 ∣ 𝑡̂) ⋅ 𝑃(𝑡̂),  (2)

Finding the likelihood 𝑃(𝐷𝑜 ∣ 𝑡̂) is difficult, both analytically and empirically. Instead, we use likelihood-free Approximate Bayesian Computation (ABC) [16, 17] to sample possible values of 𝑡̂, minimize the difference between simulated and observed data, estimated by a Gaussian process regression model [18], and find the posterior distribution.

3.4. Detecting Changes of Mental Models

Equation 2 gives us a probabilistic estimate of mental models, which alone is insufficient for detecting potential changes in mental models. To algorithmically determine whether, given observed data, the mental model has changed significantly, we need to quantify changes in the posterior distribution 𝑃(𝑡̂ ∣ 𝐷𝑜). Depending on the specificities of the interaction, we can choose from various methods, as summarized in Table 1.

3.4.1. Example: Mental Models with Categorical Values

Which quantification method to use depends on the characteristics of the mental models. Suppose we have a categorical mental model, in which case we could use the maximum a posteriori (MAP) estimate to determine the value of 𝑡̂ and detect any changes.

1. Calculate Posterior Distribution
For each category 𝑐 in the mental model categories 𝐶 (i.e., Equation 2):
Posterior[𝑐] = Likelihood[𝑐] × Prior[𝑐]
Normalize the posterior for each category 𝑐 by dividing by the sum of all posterior values:
Posterior[𝑐] ← Posterior[𝑐] / ∑𝑐′∈𝐶 Posterior[𝑐′]
2. Identify MAP Estimate
Determine the category 𝑐MAP with the highest posterior probability:
𝑐MAP = arg max𝑐∈𝐶 Posterior[𝑐]
3. Decide the Value of the Mental Model
Update the value of the mental model with the MAP estimate:
𝑡̂ = 𝑐MAP

Figure 3: Quantifying the value of the mental model with MAP. The estimated value is calculated from the posterior probability distribution and updated every term.

The pipeline for inferring and quantifying the value of mental models is shown in Figure 3. Here, the current distribution 𝑃𝑖(𝑐) of mental models is used as the prior and, together with new observations, produces the next period's distribution. From each distribution, the mental model's value is determined using MAP.

4. Evaluation

We use an experiment to demonstrate how the framework outlined in Section 3 quantifies and detects changes in the latent mental models of human participants interacting with an interactive system. We change the instructions given to the participants during the experiment to mimic changes in mental models and showcase how the model prediction successfully reflects these changes.
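The ingredients of Table 1 and the MAP steps above can be combined in a short sketch. The snippet below is our illustration, not the paper's implementation: in practice the likelihood values would come from the ABC step, and the function names and the threshold value here are assumptions to be validated by simulation.

```python
import numpy as np

def update_posterior(prior, likelihood):
    """Equation 2 for a categorical mental model:
    Posterior[c] ∝ Likelihood[c] * Prior[c], normalized over C."""
    post = likelihood * prior
    return post / post.sum()

def kl_divergence(p, q, eps=1e-12):
    """Statistical distance between successive posteriors (Table 1, Measurement)."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def changed(prev_post, post, threshold=0.5):
    """Flag a significant mental-model change (Table 1, Threshold/Monitoring).
    The threshold value is a placeholder, not a value from the paper."""
    return kl_divergence(post, prev_post) > threshold

# Toy run with three candidate models t0, t1, t2 and made-up likelihoods.
prior = np.full(3, 1 / 3)
likelihood = np.array([0.1, 0.7, 0.2])   # would come from ABC in practice
posterior = update_posterior(prior, likelihood)
c_map = int(np.argmax(posterior))        # MAP estimate of the mental model
```

Feeding each round's posterior back in as the next round's prior, while monitoring the divergence between successive posteriors, reproduces the Monitoring loop in miniature.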
4.1. Participants

We recruited 10 participants online¹, of whom 8 identified as female and 2 as male, coming from 5 different nations. They are between the ages of 20 and 48, averaging 29. The participants were paid compensation for taking part in the experiment.

¹ www.prolific.co

4.2. Materials

We conducted our experiment remotely using a webpage designed to simulate a hypothetical scenario where participants interact with the simulation environment and make decisions based on feedback and prior instructions. Participants interact by clicking buttons, which are logged as experiment data.

Scenario. Picture a warehouse of unmarked boxes containing electric and electronic waste, including used batteries, LED lights, and household appliances. To identify what each box contains, there is an advanced scanner equipped with ultrasound, X-ray tomography, magnetic resonance imaging (MRI), and radio frequency sensors. The warehouse manager can select a sensor to scan a box and get specific results. Each type of waste generates unique readings on the sensors. By scanning a box, the manager aims to determine its specific contents. Specifically, each waste has four features: ultrasound, x-ray, MRI, and radio frequency. Each feature value can be either high or low.

The scenario is represented on a webpage, and the participants play the role of warehouse manager. In each task, the participant is presented with a box of unknown contents, and given a goal of finding particular contents. The participant must scan the box for the four features and decide whether to open the box or abandon it, given their mental model of what contents produce what sorts of scanner readings, and what their goal is.

4.3. Experiment Procedure

The experiment is carried out as follows:

• Each participant performs 12 rounds of tasks.
• During each round, the webpage refreshes and randomly generates a box as described above.
• During each round, each participant is randomly assigned a type of waste to look for.
• The participant scans the box, and decides whether to accept or reject it.

Each participant is rewarded points for accepting the box containing the assigned waste or rejecting the box not containing it. If a participant wrongly accepts or rejects a box, a penalty is applied. Scanning a feature will also cost points. Therefore, participants are instructed to act economically to make the right decision with minimal costs.

The 10 participants are divided into 2 groups of 5. In round 1, we give each group a table containing the probability of finding each waste given a set of features.

• Group 1: the probabilities of finding each waste given all features except radio frequency;
• Group 2: the probabilities of finding each waste given all features except MRI.

After round 5, all participants are given a new table containing the probabilities of finding each waste given all features, with no features withheld. These tables represent the participants' mental models (𝑡̂ in our computational model). The mental models of the initial 5 rounds belong to those participants not having learned to associate certain features with the underlying probabilities. We assign 𝑡̂1 to the initial mental model of Group 1, and 𝑡̂2 to that of Group 2. The new mental model assigned after round 5 is 𝑡̂0.

Table 2: A snippet of the table shown to Group 1
ult. | x-ray | MRI | radio | batt. | lights | app
high | high | high | - | 0.7 | 0.3 | 0.6

A snippet of the table given to Group 1 is shown in Table 2. Using this knowledge, if a participant obtains the corresponding readings, they would know that the likelihood of finding a battery is 0.65. Taking into consideration the action costs, they can calculate the expected reward and decide whether they would accept the box.

The switch at round 5 is designed to model users acquiring a new mental model during an interaction after gaining knowledge and expertise about the environment and correctly associating all features with the probabilities.

Summary Statistics. The experiment data gathered are the sequences of actions performed by each participant, recorded as lists of button IDs. To eliminate unnecessary randomness, we transform the data using summary statistics: we ignore any repetitions of an action and its order. As a result, we are only concerned with whether each sensor has been used, and whether the participant decides to accept or reject the box.

Inference. 10 participants each performed 12 tasks to generate 12 results of button clicks. In total 60 sequences are collected and transformed by summary statistics into sets of boolean variables. Each result records the status of the 6 buttons, with 1 corresponding to the button being clicked, and 0 otherwise. For example, if a participant chooses to scan the X-ray and MRI, and rejects the box, the resultant data would be [1, 2, 3], and transformed into [0, 1, 1, 0, 1, 0].

As described in Section 3.4, the mental model 𝑐 can be quantified as a categorical variable. We divide the unit interval into thirds so that each third corresponds to one of three mental models 𝑡̂0, 𝑡̂1 and 𝑡̂2. We create simulated agents with the three mental models to produce simulated data. For each 𝑡̂, we use Proximal Policy Optimization with the default parameters [19] to train the simulated agents.

Using the mechanism in Section 3, our model samples possible values of 𝑐 and compares the simulated results with participant data to produce a probabilistic distribution of 𝑐 values. We use MAP estimates to determine their values, as outlined in Section 3.4. For each round of tasks each participant performs, we sample the corresponding simulated result 200 times.

4.4. Experiment Result

We can calculate the accuracy of our inference: the percentage of the 200 inferred 𝑐 that matches the correct mental model 𝑡̂𝑖, 𝑖 = 0, 1, 2.
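The summary-statistic transformation and the accuracy measure can be sketched as follows. This is our illustration only: the mapping from button IDs to vector positions is an assumption (1-based IDs here) and may differ from the experiment's actual encoding.

```python
def summarize(clicked_ids, n_buttons=6):
    """Collapse a sequence of button clicks into a boolean usage vector,
    discarding order and repetitions. Assumes 1-based button IDs
    (an assumption for illustration, not the paper's exact encoding)."""
    vector = [0] * n_buttons
    for b in set(clicked_ids):
        vector[b - 1] = 1
    return vector

def accuracy(inferred_models, true_model):
    """Fraction of the sampled inferences that match the true mental model."""
    return sum(1 for c in inferred_models if c == true_model) / len(inferred_models)

# Repeated clicks on button 2 count once; order is ignored.
example = summarize([2, 3, 2, 6])             # -> [0, 1, 1, 0, 0, 1]
score = accuracy([0, 1, 1, 1], true_model=1)  # -> 0.75
```

Applied to the 200 sampled inferences per round, `accuracy` yields the per-round values that Figure 4 averages over participants.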
Averaged over all participants, we thus obtained 12 average prediction accuracies throughout the iteration. The result is presented in Figure 4: the model's average prediction accuracy for each participant's mental model across the 12 rounds. The red, vertical dotted line marks the switching of 𝑡̂ as participants receive the new table after round 5.

Figure 4: Average accuracy of model prediction of all 10 participants across 12 rounds. The red dotted line indicates the switching of instructions.

Furthermore, we also calculate the standard deviation of the inferred values of mental models for each round, averaged over all participants. The result is shown in Figure 5. The switching of 𝑡̂ is also marked by a red, vertical dotted line.

Figure 5: Standard deviation of the mental models across 12 rounds. The red dotted line indicates the switching of instructions.

4.5. Discussion

We can discover several trends in the results shown in Figures 4 and 5. The accuracy of the model's prediction of the mental model 𝑡̂ increases per round (Figure 4). This is due to the Bayesian update of the model incorporating the results from previous rounds into the following rounds as prior information. Consequently, the inference improves in accuracy as confounds are gradually resolved. This is also shown in the decrease of standard deviations in Figure 5. In earlier rounds, there is relatively little information and more confounds, leading to greater uncertainty in inference results. As evidence accumulates and confounds are resolved, uncertainty also decreases.

Importantly, both figures show a drastic change between rounds 5 and 6, when the mental models 𝑡̂ are switched. The accuracy goes down and the standard deviation slightly increases. This means that at round 6, the priors from previous rounds still have a strong influence on the inference results, and the model clings to the prediction that the data were produced by agents with the old mental model (either 𝑡̂1 or 𝑡̂2). However, as can be seen in Figures 4 and 5, evidence accumulates due to our model's Bayesian setup, suggesting that a new mental model was likely behind the observed data. Towards the later rounds, accuracy has recovered and the model now firmly predicts the new mental model 𝑡̂0. Similar trends can also be observed in the average standard deviations, as the value goes up slightly after round 5 before continuing to descend.

5. Future Research

In this paper, we present a formal, computational model to infer a user's mental model during interaction. It can detect changes in the mental model and dynamically updates the inference once sufficient evidence is accumulated. The experiment demonstrates a consistent trend of improving accuracy and decreasing variance in the model predictions. The model can be a starting point for building an intelligent interactive system that truly understands its users.

Currently, the model needs to run ABC and sample at each round of inference, as outlined in Section 3. This makes the model too slow to be implemented in real applications. Consequently, a key improvement would be to make the model more lightweight and efficient so that inferences and adaptations can be implemented in real time. One idea worth exploring is amortizing the inference by pre-training the model using simulation [20].

The entire inference framework must also be tested with real HCI tasks, such as menu search and typing. To do so we need to define both the computational model of interaction and the mental model. This would also allow us to compare our proposed approach to existing methods and conduct statistical analysis with more participants. Doing so would likely require insights from psychology, behavioral science, and related fields, and is beyond the scope of this work.

Acknowledgments

This research has been supported by the Academy of Finland (grant 330347).

References
[1] S. J. Payne, Mental models in human-computer interaction, The Human-Computer Interaction Handbook (2007) 89–102.
[2] N. Banovic, T. Buzali, F. Chevalier, J. Mankoff, A. K. Dey, Modeling and understanding human routine behavior, in: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016, pp. 248–260.
[3] H. Rutjes, M. Willemsen, W. IJsselsteijn, Considerations on explainable AI and users' mental models, in: CHI 2019 Workshop: Where is the Human? Bridging the Gap Between AI and HCI, Association for Computing Machinery, Inc, 2019.
[4] M.-A. Storey, F. D. Fracchia, H. A. Müller, Cognitive design elements to support the construction of a mental model during software exploration, Journal of Systems and Software 44 (1999) 171–185.
[5] S. Dumais, R. Jeffries, D. M. Russell, D. Tang, J. Teevan, Understanding user behavior through log data and analysis, Ways of Knowing in HCI (2014) 349–372.
[6] P. Langley, User modeling in adaptive interfaces, in: UM99 User Modeling: Proceedings of the Seventh International Conference, Springer, 1999, pp. 357–370.
[7] P. Legrenzi, V. Girotto, Mental models in reasoning and decision-making processes, in: Mental Models in Cognitive Science, Psychology Press, 2013, pp. 95–118.
[8] A. Howes, J. P. Jokinen, A. Oulasvirta, Towards machines that understand people, AI Magazine 44 (2023) 312–327.
[9] Z. Duric, W. D. Gray, R. Heishman, F. Li, A. Rosenfeld, M. J. Schoelles, C. Schunn, H. Wechsler, Integrating perceptual and cognitive modeling for adaptive and intelligent human-computer interaction, Proceedings of the IEEE 90 (2002) 1272–1289.
[10] G. Bailly, A. Oulasvirta, D. P. Brumby, A. Howes, Model of visual search and selection time in linear menus, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2014, pp. 3865–3874.
[11] X. Chen, G. Bailly, D. P. Brumby, A. Oulasvirta, A. Howes, The emergence of interactive behavior: A model of rational menu search, in: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015, pp. 4217–4226.
[12] S. Sarcar, J. P. Jokinen, A. Oulasvirta, Z. Wang, C. Silpasuwanchai, X. Ren, Ability-based optimization of touchscreen interactions, IEEE Pervasive Computing 17 (2018) 15–26.
[13] A. Kangasrääsiö, J. P. Jokinen, A. Oulasvirta, A. Howes, S. Kaski, Parameter inference for computational cognitive models with approximate Bayesian computation, Cognitive Science 43 (2019) e12738.
[14] M. T. Spaan, Partially observable Markov decision processes, in: Reinforcement Learning: State-of-the-Art, Springer, 2012, pp. 387–414.
[15] A. Oulasvirta, J. P. Jokinen, A. Howes, Computational rationality as a theory of interaction, in: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022, pp. 1–14.
[16] M. A. Beaumont, W. Zhang, D. J. Balding, Approximate Bayesian computation in population genetics, Genetics 162 (2002) 2025–2035.
[17] K. Csilléry, M. G. Blum, O. E. Gaggiotti, O. François, Approximate Bayesian computation (ABC) in practice, Trends in Ecology & Evolution 25 (2010) 410–418.
[18] J. Lintusaari, H. Vuollekoski, A. Kangasrääsiö, K. Skytén, M. Järvenpää, P. Marttinen, M. U. Gutmann, A. Vehtari, J. Corander, S. Kaski, ELFI: Engine for likelihood-free inference, Journal of Machine Learning Research 19 (2018) 1–7.
[19] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347 (2017).
[20] H.-S. Moon, A. Oulasvirta, B. Lee, Amortized inference with user simulations, in: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–20.