Detecting Changes in Mental Models during Interaction

Chuyang Wu1,2,∗,†, Shanshan Zhang1,† and Jussi P. P. Jokinen2

1 University of Helsinki, Pietari Kalmin katu 5, 00560 Helsinki, Finland
2 University of Jyväskylä, Seminaarinkatu 15, PL 35, 40014 Jyväskylä, Finland

Abstract
This paper introduces a novel computational cognitive model that maps latent mental models to observable behaviors, allowing the system to detect changes in users' mental models from their actions. We propose an inference framework to dynamically adjust to the user's evolving understanding and decision-making processes. An empirical experiment demonstrates the framework's ability to accurately detect shifts in users' mental models based on their interactions. The results indicate a consistent improvement in prediction accuracy and a decrease in variance over time, suggesting the model's potential for real-time application in designing adaptive interactive systems.

Keywords
Human-Computer Interaction, User Modeling, Collaborative Human-Computer Systems, Adaptive Systems

1. Introduction

An intelligent interactive system needs to adapt to the behaviors of its users. It should understand their intentions and anticipate what is coming next. A user's interactive behavior is shaped by their mental model, the user's knowledge and beliefs of the interactive system [1], which is not directly observable. We can parameterize the mental model to build a computational user model [2]. In such a model, latent (i.e., unobservable) factors are mapped to observed behavior, allowing us to formalize the mechanism of interactive behavior. We can then build adaptive systems that solve for the mental model from observations, and the interactive system can be designed to adapt accordingly.

However, a problem in inferring mental models is that they are not static during interaction. For example, as users become more experienced, their mental models change [1]. Failures of the interactive system to detect these changes would lead to wrong or obsolete inference of mental models and ineffective adaptation, to the detriment of the user.

In this paper, we propose a computational model of interaction that accounts for how changes in the mental model lead to changes in interactive behavior. We then define a framework to infer and quantify the mental model from observed behavior and demonstrate how to detect changes in parameter value from behavioral data with an empirical experiment. In summary, this paper contributes to the computational modeling of interactive behavior by proposing:

• a computational model of how interactive behavior emerges from quantified mental models;
• an inference framework to detect these changes from observed behavior.

Figure 1: Harry is a novice warehouse operator who previously only understood ultrasound readings. Now he starts to scan for radio frequencies. Is this a mistake, or has he learned how to read radio frequencies?

Consider a hypothetical scenario involving a multisensor smart scanner that can obtain ultrasound and radio frequency readings of boxes at a warehouse. Suppose that different contents produce different sensor readings. Harry, a novice operator yet to learn to read radio frequencies, relies solely on ultrasound to determine the content. Accordingly, the scanner should provide hints on how to interpret ultrasound readings. If Harry suddenly scans for radio frequency data, it will likely be a mistake, and the scanner should intervene to avert it.

Harry practices reading radio frequency data and associating the readings with the contents. At some point, his mental model – an internal representation of the dynamics and facts of the external task – evolves to have a closer correspondence with reality. If the AI of the scanner does not pick up on this evolution, it will continue to recognize Harry's actions as mistakes and offer ineffective or detrimental hints.
Therefore, intelligent interactive systems must accurately infer users' changing mental models to provide useful adaptation.

2. Background Review

In human-computer interaction, mental models represent how the interaction is internally interpreted and reconstructed by the users [3]. How closely a user's mental model matches the real interactive environment would determine the effectiveness and efficiency of the user's interactive strategy [4]. Particularly, suppose a user fails to understand the designs of an interactive system. In that case, it is more likely that the mental model would be poor, and the user would likely end up missing their goals and have a frustrating experience.

TKTP 2024: Annual Doctoral Symposium of Computer Science, 10.–11.6.2024, Vaasa, Finland
∗ Corresponding author.
† These authors contributed equally.
chuyang.wu@helsinki.fi (C. Wu); shanshan.zhang@helsinki.fi (S. Zhang); jussi.p.p.jokinen@jyu.fi (J. P. P. Jokinen)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Interactive systems are often designed to adapt to user needs and habits to create an intuitive user experience. The classic approach is to collect behavioral data, such as keystrokes, mouse movements, or system logs, and analyze it for patterns [5]. Interactive systems would update based on similarities between user behaviors and learned patterns. These approaches, however, do not explain the reasons behind the user's actions. When designing such a system, it is therefore desirable for the system to align with the users' mental models [6]. To do so would require a model of the user's mental model that accounts for user behavior and decision-making [7], allowing the interactive systems to adapt to the user's goals [8, 9].
Parameterized, computational models of interaction have been proposed to explain the user's decision-making process during an interaction [10, 11]. These models establish a causal link between observed user behavior and latent psychological factors and parameterize the latter to build a computational framework, thus paving a way to infer the values of latent factors from observed behavior [12, 13]. This approach can be extended to study the effect of mental models on user behavior, enabling the design of intelligent interactive systems that adapt to users' mental models.

However, these models have not addressed cases where the latent factors change. A user could gain knowledge and experience during an interaction to become more skillful, which would be reflected in the mental model. Failure to account for such changes would render any interactive system's adaptation ineffective or even detrimental. Consequently, our present work formalizes a computational framework for interaction that detects changes in mental models based on observed user behavior. This would be important for creating intelligent interactive systems and collaborative AI that are truly adaptive to the users.

3. Method

3.1. Interaction as a POMDP

We view the user of an interactive system as an agent trying to solve a Partially Observable Markov Decision Process (POMDP) [14]. A POMDP is defined as a tuple (𝑆, 𝐴, 𝑇, 𝑅, 𝑂, Ω, 𝛾) where:

• 𝑆 is a finite set of states of the environment.
• 𝐴 is a finite set of actions available to the agent.
• 𝑇 ∶ 𝑆 × 𝐴 × 𝑆 → [0, 1] is the (probabilistic) transition function, where 𝑇(𝑠, 𝑎, 𝑠′) = 𝑃(𝑠′|𝑠, 𝑎) represents the probability of transitioning to state 𝑠′ when action 𝑎 is taken in state 𝑠.
• 𝑅 ∶ 𝑆 × 𝐴 × 𝑆 → ℝ is the reward function for each transition from 𝑠 to 𝑠′ due to 𝑎.
• 𝑂 is a finite set of possible observations.
• Ω ∶ 𝑆 × 𝐴 × 𝑂 → [0, 1] is the (probabilistic) observation function, where Ω(𝑠′, 𝑎, 𝑜) = 𝑃(𝑜|𝑠′, 𝑎) represents the probability of observation 𝑜 after action 𝑎, in state 𝑠′.
• 𝛾 ∈ [0, 1] is the discount factor for the present value of future rewards.

Figure 2: Interaction as a POMDP. The states 𝑆 are not directly observable. The agent makes an observation 𝑂, from which a belief 𝑏 is formed. Based on the belief, the agent takes an action 𝑎 which leads to a reward 𝑟, as well as a transition to the next state. The reward depends on both the state and the action.

The interaction process between an agent and a POMDP environment can now be described in Figure 2. In a POMDP, the agent cannot know the environment state directly. Instead, it observes the state and forms an internal representation of the state as a belief 𝑏 ∈ 𝐵, with 𝐵 being the set of all possible beliefs. The agent aims to find an optimal policy 𝜋 ∶ 𝐵 → 𝐴 to guide its choice of action that maximizes the expected discounted rewards over time. Specifically, the interaction takes place as follows.

1. Initial Belief State: The interaction starts with the agent holding an initial belief state 𝑏0(𝑠), representing the agent's initial knowledge about the environment, 𝑏0 ∈ 𝐵, 𝑠 ∈ 𝑆.
2. Action Selection: At each time step 𝑡, the agent selects an action 𝑎𝑡 ∈ 𝐴 based on its current belief state 𝑏𝑡(𝑠) according to a policy 𝜋 to maximize the expected reward.
3. Environment Response: The environment transitions from 𝑠𝑡 to 𝑠𝑡+1 according to 𝑇(𝑠𝑡, 𝑎𝑡, 𝑠𝑡+1) = 𝑃(𝑠𝑡+1|𝑠𝑡, 𝑎𝑡). This is not directly observable by the agent.
4. Observation: The agent receives an observation 𝑜𝑡+1 ∈ 𝑂, generated according to the observation model: Ω(𝑠𝑡+1, 𝑎𝑡, 𝑜𝑡+1) = 𝑃(𝑜𝑡+1|𝑠𝑡+1, 𝑎𝑡).
5. Belief Update: The agent performs a Bayesian update of its belief to 𝑏𝑡+1(𝑠) with observation 𝑜𝑡+1, action 𝑎𝑡, and previous belief 𝑏𝑡(𝑠), and revises its knowledge about the environment.
6. Reward: The agent receives a reward 𝑅(𝑠𝑡, 𝑎𝑡, 𝑠𝑡+1) based on the state transition.
7. Repetition: Steps 2 through 6 are repeated, with the agent continually updating its belief state and selecting actions until a terminal condition is reached.

The agent can use reinforcement learning to find the strategy that maximizes the future-discounted cumulative reward: 𝑉(𝑠) = max𝑎 {𝑟(𝑠, 𝑎) + 𝛾 ∑𝑠′∈𝑆 𝑇(𝑠, 𝑎, 𝑠′)𝑉(𝑠′)}. It has been theorized and shown empirically that as long as the POMDP formalism correctly models the task environment and the relevant parts of human cognition, an optimal policy approximates that of human behavior. This is known as computational rationality [15].

3.2. Mental Models and Interactive Behavior

Given that the true state 𝑠𝑖 is not directly observable, the agent forms its belief 𝑏𝑖, a probability distribution over all possible states in the environment at 𝑖. We propose that the agent performs a Bayesian update to obtain 𝑏𝑖 using its mental model, 𝑡̂:

𝑏𝑖+1 ∝ 𝑡̂(𝑏𝑖, 𝑜𝑖),  (1)

In Equation 1, the mental model is a (probabilistic) function that updates the agent's belief given observation and previous belief. Thus the mental model 𝑡̂ can be viewed as the (imperfect) transition function 𝑇 of an individual agent. An ideal agent with perfect knowledge and expertise of the interactive environment would have the true mental model identical to 𝑇. In reality, even given the same observation, agents with different mental models 𝑡̂ would have different ways to update their beliefs.
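Equation 1 can be made concrete with a small sketch. The following Python snippet (our illustration, not code from the paper; all names and numbers are assumptions) performs one discrete Bayesian belief update in which a subjective transition matrix `T_hat` stands in for the mental model 𝑡̂. An agent whose `T_hat` differs from the true 𝑇 arrives at a different belief from the same observation, which is exactly how mental-model differences surface in behavior.

```python
import numpy as np

def belief_update(b, a, o, T_hat, Omega):
    """One Bayesian belief update (cf. Equation 1), using the agent's
    subjective transition model T_hat in place of the true T.

    b        : belief over states, shape (S,)
    T_hat[a] : subjective transition matrix for action a, shape (S, S)
    Omega[a] : observation matrix P(o | s', a), shape (S, O)
    """
    predicted = b @ T_hat[a]                   # propagate belief through the mental model
    unnormalized = predicted * Omega[a][:, o]  # weight by likelihood of the observation
    return unnormalized / unnormalized.sum()

# Toy 2-state, 1-action, 2-observation example (illustrative numbers only).
T_hat = {0: np.array([[0.9, 0.1],
                      [0.2, 0.8]])}
Omega = {0: np.array([[0.7, 0.3],
                      [0.1, 0.9]])}
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=1, T_hat=T_hat, Omega=Omega)
# b1 leans toward state 1, the state more likely to emit observation 1
```

Substituting a different `T_hat` for the same observation sequence yields a different belief trajectory, which is the mechanism the inference framework in Section 3.3 exploits.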
Table 1: Different Algorithmic Approaches to Infer Changes in an Agent's Mental Model

Category — Description
Measurement — Utilize statistical distance measures (e.g., KL divergence, Total Variation distance, Wasserstein distance) to quantify the difference between successive posterior distributions of the mental model (𝑃(𝑡̂ ∣ 𝐷𝑜)) to assess how one distribution diverges from another.
Threshold — Define a threshold for a significant change, based on domain knowledge, statistical criteria, or adaptive methods. Validate this threshold through simulations or historical data to ensure it effectively differentiates between routine updates and significant model changes.
Monitoring — Continuously or periodically calculate the distance measure between the current and previous posterior distributions, storing past distributions for comparison. If the distance exceeds the threshold, infer that a significant change in the mental model has occurred.

3.3. Inferring Mental Models from Observation

We can use the framework in Sections 3.1 and 3.2 to simulate agents with different mental models and use them to generate simulated behavior. When a human user interacts to generate real data, it can then be compared to the simulated data to determine the likely mental model of the human user.

Suppose that the mental model has the probability distribution 𝑃(𝑡̂). From Sections 3.1 and 3.2 we know how an agent with a mental model 𝑡̂ would behave. Consequently, we also know the conditional probability distribution 𝑃(𝐷𝑜 ∣ 𝑡̂), given observed behavior data 𝐷𝑜. Bayes' rule can then be used to invert the conditional probability and find:

𝑃(𝑡̂ ∣ 𝐷𝑜) ∝ 𝑃(𝐷𝑜 ∣ 𝑡̂) ⋅ 𝑃(𝑡̂),  (2)

Finding the likelihood 𝑃(𝐷𝑜 ∣ 𝑡̂) is difficult, both analytically and empirically. Instead, we use likelihood-free Approximate Bayesian Computation (ABC) [16, 17] to sample possible values of 𝑡̂, minimize the difference between simulated and observed data, estimated by a Gaussian process regression model [18], and find the posterior distribution.

3.4. Detecting Changes of Mental Models

Equation 2 gives us a probabilistic estimate of mental models, which alone is insufficient for detecting potential changes in mental models. To algorithmically determine whether, given observed data, the mental model has changed significantly, we need to quantify changes in the posterior distribution 𝑃(𝑡̂ ∣ 𝐷𝑜). Depending on the specificities of the interaction, we can choose from various methods, as summarized in Table 1.

3.4.1. Example: Mental Models with Categorical Values

Which quantification method to use depends on the characteristics of the mental models. Suppose we have a categorical mental model, in which case we could use the maximum a posteriori (MAP) estimate to determine the value of 𝑡̂ and detect any changes.

1. Calculate Posterior Distribution
For each category 𝑐 in the mental model categories 𝐶 (i.e., Equation 2):
Posterior[𝑐] = Likelihood[𝑐] × Prior[𝑐]
Normalize the posterior for each category 𝑐 by dividing by the sum of all posterior values:
Posterior[𝑐] ← Posterior[𝑐] / ∑𝑐′∈𝐶 Posterior[𝑐′]
2. Identify MAP Estimate
Determine the category 𝑐MAP with the highest posterior probability:
𝑐MAP = arg max𝑐∈𝐶 Posterior[𝑐]
3. Decide the Value of the Mental Model
Update the value of the mental model with the MAP estimate:
𝑡̂ = 𝑐MAP

Figure 3: Quantifying the value of the mental model with MAP. The estimated value is calculated from the posterior probability distribution and updated every term.

The pipeline for inferring and quantifying the value of mental models is shown in Figure 3. Here, the current distribution 𝑃𝑖(𝑐) of mental models is used as the prior and, together with new observations, produces the next period's distribution. From each distribution, the mental model's value is determined using MAP.

4. Evaluation

We use an experiment to demonstrate how the framework outlined in Section 3 quantifies and detects changes in the latent mental models of human participants interacting with an interactive system. We change the instructions given to the participants during the experiment to mimic changes in mental models and showcase how the model prediction successfully reflects these changes.
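The ingredients of Table 1 and the MAP steps above can be combined in a short sketch. The snippet below is our illustration, not the paper's implementation: in practice the likelihood values would come from the ABC step, and the function names and the threshold value here are assumptions to be validated by simulation.

```python
import numpy as np

def update_posterior(prior, likelihood):
    """Equation 2 for a categorical mental model:
    Posterior[c] ∝ Likelihood[c] * Prior[c], normalized over C."""
    post = likelihood * prior
    return post / post.sum()

def kl_divergence(p, q, eps=1e-12):
    """Statistical distance between successive posteriors (Table 1, Measurement)."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def changed(prev_post, post, threshold=0.5):
    """Flag a significant mental-model change (Table 1, Threshold/Monitoring).
    The threshold value is a placeholder, not a value from the paper."""
    return kl_divergence(post, prev_post) > threshold

# Toy run with three candidate models t0, t1, t2 and made-up likelihoods.
prior = np.full(3, 1 / 3)
likelihood = np.array([0.1, 0.7, 0.2])   # would come from ABC in practice
posterior = update_posterior(prior, likelihood)
c_map = int(np.argmax(posterior))        # MAP estimate of the mental model
```

Feeding each round's posterior back in as the next round's prior, while monitoring the divergence between successive posteriors, reproduces the Monitoring loop in miniature.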
4.1. Participants

We recruited 10 participants online¹, of whom 8 identified as female and 2 as male, coming from 5 different nations. They are between the ages of 20 and 48, averaging 29. The participants were paid compensation for taking part in the experiment.

¹ www.prolific.co

4.2. Materials

We conducted our experiment remotely using a webpage designed to simulate a hypothetical scenario where participants interact with the simulation environment and make decisions based on feedback and prior instructions. Participants interact by clicking buttons, which are logged as experiment data.

Scenario. Picture a warehouse of unmarked boxes containing electric and electronic waste, including used batteries, LED lights, and household appliances. To identify what each box contains, there is an advanced scanner equipped with ultrasound, X-ray tomography, magnetic resonance imaging (MRI), and radio frequency sensors. The warehouse manager can select a sensor to scan a box and get specific results. Each type of waste generates unique readings on the sensors. By scanning a box, the manager aims to determine its specific contents. Specifically, each waste has four features: ultrasound, x-ray, MRI, and radio frequency. Each feature value can be either high or low.

The scenario is represented on a webpage, and the participants play the role of warehouse manager. In each task, the participant is presented with a box of unknown contents, and given a goal of finding particular contents. The participant must scan the box for the four features and decide whether to open the box or abandon it, given their mental model of what contents produce what sorts of scanner readings, and what their goal is.

4.3. Experiment Procedure

The experiment is carried out as follows:

• Each participant performs 12 rounds of tasks.
• During each round, the webpage refreshes and randomly generates a box as described above.
• During each round, each participant is randomly assigned a type of waste to look for.
• The participant scans the box, and decides whether to accept or reject it.

Each participant is rewarded points for accepting the box containing the assigned waste or rejecting the box not containing it. If a participant wrongly accepts or rejects a box, a penalty is applied. Scanning a feature will also cost points. Therefore, participants are instructed to act economically to make the right decision with minimal costs.

The 10 participants are divided into 2 groups of 5. In round 1, we give each group a table containing the probability of finding each waste given a set of features.

• Group 1: the probabilities of finding each waste given all features except radio frequency;
• Group 2: the probabilities of finding each waste given all features except MRI.

After round 5, all participants are given a new table containing the probabilities of finding each waste given all features, with no features withheld. These tables represent the participants' mental models (𝑡̂ in our computational model). The mental models of the initial 5 rounds belong to those participants not having learned to associate certain features with the underlying probabilities. We assign 𝑡̂1 to the initial mental model of Group 1, and 𝑡̂2 to that of Group 2. The new mental model assigned after round 5 is 𝑡̂0.

Table 2: A snippet of the table shown to Group 1
ult. | x-ray | MRI | radio | batt. | lights | app
high | high | high | - | 0.7 | 0.3 | 0.6

A snippet of the table given to Group 1 is shown in Table 2. Using this knowledge, if a participant obtains the corresponding readings, they would know that the likelihood of finding a battery is 0.65. Taking into consideration the action costs, they can calculate the expected reward and decide whether they would accept the box.

The switch at round 5 is designed to model users acquiring a new mental model during an interaction after gaining knowledge and expertise about the environment and correctly associating all features with the probabilities.

Summary Statistics. The experiment data gathered are the sequences of actions performed by each participant, recorded as lists of button IDs. To eliminate unnecessary randomness, we transform the data using summary statistics: we ignore any repetitions of an action and its order. As a result, we are only concerned with whether each sensor has been used, and whether the participant decides to accept or reject the box.

Inference. 10 participants each performed 12 tasks to generate 12 results of button clicks. In total 60 sequences are collected and transformed by summary statistics into sets of boolean variables. Each result records the status of the 6 buttons, with 1 corresponding to the button being clicked, and 0 otherwise. For example, if a participant chooses to scan the X-ray and MRI, and rejects the box, the resultant data would be [1, 2, 3], and transformed into [0, 1, 1, 0, 1, 0].

As described in Section 3.4, the mental model 𝑐 can be quantified as a categorical variable. We divide the unit interval into thirds so that each third corresponds to one of three mental models 𝑡̂0, 𝑡̂1 and 𝑡̂2. We create simulated agents with the three mental models to produce simulated data. For each 𝑡̂, we use Proximal Policy Optimization with the default parameters [19] to train the simulated agents.

Using the mechanism in Section 3, our model samples possible values of 𝑐 and compares the simulated results with participant data to produce a probabilistic distribution of 𝑐 values. We use MAP estimates to determine their values, as outlined in Section 3.4. For each round of tasks each participant performs, we sample the corresponding simulated result 200 times.

4.4. Experiment Result

We can calculate the accuracy of our inference: the percentage of the 200 inferred 𝑐 that matches the correct mental model 𝑡̂𝑖, 𝑖 = 0, 1, 2.
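The summary-statistic transformation and the accuracy measure can be sketched as follows. This is our illustration only: the mapping from button IDs to vector positions is an assumption (1-based IDs here) and may differ from the experiment's actual encoding.

```python
def summarize(clicked_ids, n_buttons=6):
    """Collapse a sequence of button clicks into a boolean usage vector,
    discarding order and repetitions. Assumes 1-based button IDs
    (an assumption for illustration, not the paper's exact encoding)."""
    vector = [0] * n_buttons
    for b in set(clicked_ids):
        vector[b - 1] = 1
    return vector

def accuracy(inferred_models, true_model):
    """Fraction of the sampled inferences that match the true mental model."""
    return sum(1 for c in inferred_models if c == true_model) / len(inferred_models)

# Repeated clicks on button 2 count once; order is ignored.
example = summarize([2, 3, 2, 6])             # -> [0, 1, 1, 0, 0, 1]
score = accuracy([0, 1, 1, 1], true_model=1)  # -> 0.75
```

Applied to the 200 sampled inferences per round, `accuracy` yields the per-round values that Figure 4 averages over participants.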
Averaged over all participants, we thus obtained 12 average prediction accuracies throughout the iteration. The result is presented in Figure 4: the model's average prediction accuracy for each participant's mental model across the 12 rounds. The red, vertical dotted line marks the switching of 𝑡̂ as participants receive the new table after round 5.

Figure 4: Average accuracy of model prediction of all 10 participants across 12 rounds. The red dotted line indicates the switching of instructions.

Furthermore, we also calculate the standard deviation of the inferred values of mental models for each round, averaged over all participants. The result is shown in Figure 5. The switching of 𝑡̂ is also marked by a red, vertical dotted line.

Figure 5: Standard deviation of the mental models across 12 rounds. The red dotted line indicates the switching of instructions.

4.5. Discussion

We can discover several trends in the results shown in Figures 4 and 5. The accuracy of the model's prediction of the mental model 𝑡̂ increases per round (Figure 4). This is due to the Bayesian update of the model incorporating the results from previous rounds into the following rounds as prior information. Consequently, the inference improves in accuracy as confounds are gradually resolved. This is also shown in the decrease of standard deviations in Figure 5. In earlier rounds, there is relatively little information and more confounds, leading to greater uncertainty in inference results. As evidence accumulates and confounds are resolved, uncertainty also decreases.

Importantly, both figures show a drastic change between rounds 5 and 6, when the mental models 𝑡̂ are switched. The accuracy goes down and the standard deviation slightly increases. This means that at round 6, the priors from previous rounds still have a strong influence on the inference results, and the model clings to the prediction that the data were produced by agents with the old mental model (either 𝑡̂1 or 𝑡̂2). However, as can be seen in Figures 4 and 5, evidence accumulates due to our model's Bayesian setup, suggesting that a new mental model was likely behind the observed data. Towards the later rounds, accuracy has recovered and the model now firmly predicts the new mental model 𝑡̂0. Similar trends can also be observed in the average standard deviations, as the value goes up slightly after round 5 before continuing to descend.

5. Future Research

In this paper, we present a formal, computational model to infer a user's mental model during interaction. It can detect changes in the mental model and dynamically updates the inference once sufficient evidence is accumulated. The experiment demonstrates a consistent trend of improving accuracy and decreasing variance in the model predictions. The model can be a starting point for building an intelligent interactive system that truly understands its users.

Currently, the model needs to run ABC and sample at each round of inference, as outlined in Section 3. This makes the model too slow to be implemented in real applications. Consequently, a key improvement would be to make the model more lightweight and efficient so that inferences and adaptations can be implemented in real time. One idea worth exploring is amortizing the inference by pre-training the model using simulation [20].

The entire inference framework must also be tested with real HCI tasks, such as menu search and typing. To do so we need to define both the computational model of interaction and the mental model. This would also allow us to compare our proposed approach to existing methods and conduct statistical analysis with more participants. Doing so would likely require insights from psychology, behavioral science, and related fields, and is beyond the scope of this work.

Acknowledgments

This research has been supported by the Academy of Finland (grant 330347).

References
[1] S. J. Payne, Mental models in human-computer interaction, The Human-Computer Interaction Handbook (2007) 89–102.
[2] N. Banovic, T. Buzali, F. Chevalier, J. Mankoff, A. K. Dey, Modeling and understanding human routine behavior, in: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016, pp. 248–260.
[3] H. Rutjes, M. Willemsen, W. IJsselsteijn, Considerations on explainable AI and users' mental models, in: CHI 2019 Workshop: Where is the Human? Bridging the Gap Between AI and HCI, Association for Computing Machinery, Inc, 2019.
[4] M.-A. Storey, F. D. Fracchia, H. A. Müller, Cognitive design elements to support the construction of a mental model during software exploration, Journal of Systems and Software 44 (1999) 171–185.
[5] S. Dumais, R. Jeffries, D. M. Russell, D. Tang, J. Teevan, Understanding user behavior through log data and analysis, Ways of Knowing in HCI (2014) 349–372.
[6] P. Langley, User modeling in adaptive interfaces, in: UM99 User Modeling: Proceedings of the Seventh International Conference, Springer, 1999, pp. 357–370.
[7] P. Legrenzi, V. Girotto, Mental models in reasoning and decision-making processes, in: Mental Models in Cognitive Science, Psychology Press, 2013, pp. 95–118.
[8] A. Howes, J. P. Jokinen, A. Oulasvirta, Towards machines that understand people, AI Magazine 44 (2023) 312–327.
[9] Z. Duric, W. D. Gray, R. Heishman, F. Li, A. Rosenfeld, M. J. Schoelles, C. Schunn, H. Wechsler, Integrating perceptual and cognitive modeling for adaptive and intelligent human-computer interaction, Proceedings of the IEEE 90 (2002) 1272–1289.
[10] G. Bailly, A. Oulasvirta, D. P. Brumby, A. Howes, Model of visual search and selection time in linear menus, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2014, pp. 3865–3874.
[11] X. Chen, G. Bailly, D. P. Brumby, A. Oulasvirta, A. Howes, The emergence of interactive behavior: A model of rational menu search, in: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015, pp. 4217–4226.
[12] S. Sarcar, J. P. Jokinen, A. Oulasvirta, Z. Wang, C. Silpasuwanchai, X. Ren, Ability-based optimization of touchscreen interactions, IEEE Pervasive Computing 17 (2018) 15–26.
[13] A. Kangasrääsiö, J. P. Jokinen, A. Oulasvirta, A. Howes, S. Kaski, Parameter inference for computational cognitive models with approximate Bayesian computation, Cognitive Science 43 (2019) e12738.
[14] M. T. Spaan, Partially observable Markov decision processes, in: Reinforcement Learning: State-of-the-Art, Springer, 2012, pp. 387–414.
[15] A. Oulasvirta, J. P. Jokinen, A. Howes, Computational rationality as a theory of interaction, in: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022, pp. 1–14.
[16] M. A. Beaumont, W. Zhang, D. J. Balding, Approximate Bayesian computation in population genetics, Genetics 162 (2002) 2025–2035.
[17] K. Csilléry, M. G. Blum, O. E. Gaggiotti, O. François, Approximate Bayesian computation (ABC) in practice, Trends in Ecology & Evolution 25 (2010) 410–418.
[18] J. Lintusaari, H. Vuollekoski, A. Kangasrääsiö, K. Skytén, M. Järvenpää, P. Marttinen, M. U. Gutmann, A. Vehtari, J. Corander, S. Kaski, ELFI: Engine for likelihood-free inference, Journal of Machine Learning Research 19 (2018) 1–7.
[19] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347 (2017).
[20] H.-S. Moon, A. Oulasvirta, B. Lee, Amortized inference with user simulations, in: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–20.