Quantifying Uncertainty in Machine Theory of Mind Across Time

Shanshan Zhang¹,*,†, Chuyang Wu¹,²,† and Jussi P. P. Jokinen²

¹ University of Helsinki, Pietari Kalmin katu 5, 00560 Helsinki, Finland
² University of Jyväskylä, Seminaarinkatu 15, PL 35, 40014 Jyväskylä, Finland

TKTP 2024: Annual Doctoral Symposium of Computer Science, 10.–11.6.2024, Vaasa, Finland
* Corresponding author.
† These authors contributed equally.
shanshan.zhang@helsinki.fi (S. Zhang); chuyang.wu@helsinki.fi (C. Wu); jussi.p.p.jokinen@jyu.fi (J. P. P. Jokinen)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Abstract
As intelligent interactive technologies advance, ensuring alignment with user preferences is critical. Machine theory of mind enables systems to infer latent mental states from observed behaviors, similarly to humans. Currently, there is no formal mechanism for integrating multiple observations over time and quantifying the uncertainty of inferences as a function of accumulated evidence in a provably human-like way. This paper addresses the issue through Bayesian inference, proposing a model that maintains a posterior belief about mental states as a probability distribution, updated with observational data. The advantage of Bayesian statistics lies in the possibility of evaluating the certainty of these inferences. We validate the model's human-like mental inference capabilities through an experiment.

Keywords
Human-Computer Interaction, Machine Theory of Mind, Mentalizing, Uncertainty Quantification

1. Introduction

Theory of mind, the innate human capacity to deduce others' latent mental states from observable behavior [1, 2], underpins social collaboration [3, 4]. As artificial intelligence (AI) advances, aligning intelligent machines with users' preferences becomes imperative [5]. Achieving alignment between human and machine objectives is facilitated when machines adopt reasoning processes that can be understood by humans [6], suggesting the importance of machines emulating human mental inference. A machine theory of mind seeks to provide machines with the ability to infer mental states in a human-like manner.

Mental inference facilitates collaboration by informing the agent and impacting its actions. The idea is that if an intelligent machine has knowledge of the user's goals, it can better make decisions to help the user. However, there is also an inherent risk in making decisions based on inferences: because all inferences contain uncertainty [7, 8], the intelligent agent should have a way of considering the amount of uncertainty when taking actions. There needs to be a way to quantify this uncertainty, so that the agent can robustly consider it when choosing what actions to take.
In this paper, we formalize a computational model that infers the preferences of observed agents. Observations from multiple time steps are integrated, and the uncertainty associated with the inferences is quantified in a posterior distribution.

The problem that our paper tackles is illustrated in Figure 1. The three panels depict an evolving inference, by an observer, of Janice's drink preference under varying conditions on three consecutive days. Initially, Janice selects tea, but the positioning of coffee on a high shelf introduces ambiguity regarding her preference: does she favor tea, or does she simply wish to avoid climbing the kitchen ladder? This uncertainty prevents a clear inference of her preference. In the second panel, Janice uses a stool to reach the now higher-placed tea jar, while the coffee remains even further out of reach, potentially accessible with taller kitchen stairs. The scenario hints at a preference for tea, yet the possibility that Janice may have an aversion to heights carries a degree of uncertainty, nudging the likelihood only slightly in favor of tea.

The final panel of Figure 1 offers a decisive moment: both the coffee and tea jars are easily accessible, and Janice opts for coffee. Given the equal effort required to reach both, her choice of coffee indicates a genuine preference for it, revealing that her earlier decisions were influenced by a reluctance to climb too high rather than by a preference for tea. Consequently, our inference shifts significantly towards coffee, with increased certainty. In this paper, we hypothesize that humans are able to carry out these sorts of inferences and to meta-cognitively assess how certain they are about the inferred preferences. Moreover, we formalize a computational model of this process.

Figure 1: Inferences of preferences based on observed behavior contain uncertainty, especially when there are confounding factors such as effort. As more evidence accumulates, certainty increases.

2. Background Review

Theory of mind, or mentalizing, enables humans to infer others' mental states [9, 10, 11]. It facilitates social interaction [3, 4] such as communication [12, 13] and collaboration [1, 2]. Likewise, a machine that is able to carry out mentalization can better account for user variability, improving the quality of interaction [14, 15, 16]. Experiments have demonstrated that machines capable of mentalization achieve superior performance in communication [17, 18] and team cooperation tasks [19].

Models of mentalizing target the inference of mental states such as preferences, costs [20], knowledge [21], and beliefs [9]. These models incorporate psychological hypotheses concerning observed actors as computational frameworks, enabling the simulation of predicted behavior. Parameters within the model reflect various mental states, including goals, guiding the behavior prediction for actors under specific objectives in a given context [22]. Assuming the psychological underpinnings are accurate, these models can predict an actor's behavior based on their goals. Inverse modeling techniques are then employed to deduce the parameters most likely to account for the observed behavior [23, 24].

How can we create a psychologically plausible model that can be parametrized with mental states and that then simulates behavior? One emerging popular approach is called computational rationality [25, 26]. It posits that intelligent agents, such as humans, choose actions that maximize expected utility. The agent must optimize its behavior with respect to the constraints of the environment. In addition, the approach is sensitive to the fact that intelligent agents have internal cognitive bounds as well, such as limited knowledge and information processing capacity. The approach is suitable for computational modeling of theory of mind, because it helps to prune the space of possible explanations by assuming that the observed behavior is produced by a computationally rational agent. When the bounds of the environment and the cognition are known and modeled correctly, the model can then be applied for reliable parameter inference [27].

Inferences, including those related to mentalizing, are often made under conditions of limited data and therefore inherently involve uncertainty [28, 7]. The similarity in actions among individuals with diverse preferences in specific contexts implies that observations alone may not suffice for conclusive inferences. The complexity of social settings further amplifies this uncertainty, highlighting the importance of incorporating it into models of social collaboration [29]. Thus, agents capable of mentalizing should not only emulate human-like inference of mental states but also assess the uncertainty of these inferences.

3. Method

Following the standard modeling pipeline in computational rationality [26], we formalize the task environment as a Markov Decision Process (MDP). It is represented as a tuple ⟨S, A, T, R⟩, consisting of a state space S, an action space A, transition probabilities T, and a reward function R. A state s ∈ S, encoding the current information of the environment, transfers to the next state s′ ∈ S when an action a ∈ A is performed, according to the transition probability T(s, a, s′) = P(s′|s, a), and the agent gains the reward r = R(s, a). Reinforcement learning (RL) solves the optimization problem of how to choose the action a through a policy π(a) = P(a|s) that maximizes the expected reward, by interacting with the environment and learning from experience. The learning process can be expressed as the function

\[ V_{\pi^*}(s) = \max_{a}\Big[ R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V_{\pi^*}(s') \Big], \]

where V_{π*}(s) is the value of a state s ∈ S under an optimal policy π*, discounting future rewards using γ ∈ [0, 1]. This optimality assumption ties in with computational rationality.
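To make the formalism concrete, the following sketch solves a small tabular MDP of this kind with value iteration and reads off a greedy (optimal) policy. It is a minimal illustration under our own assumptions (dense NumPy arrays for T and R, a generic value_iteration helper), not the agent implementation from the paper's repository; in the paper, cognitive bounds would additionally be imposed on the agent.

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    """Solve a tabular MDP <S, A, T, R>.

    T : array of shape (S, A, S), T[s, a, s2] = P(s2 | s, a)
    R : array of shape (S, A), R[s, a]
    Returns the optimal state values V and a greedy policy (one action per state).
    """
    n_states = T.shape[0]
    V = np.zeros(n_states)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_{s'} T(s, a, s') * V(s')
        Q = R + gamma * (T @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# A latent preference enters through R: for example, the rewards of reaching the
# blue and red charging stations can be set from a parameter vector theta, and
# re-solving the MDP for each candidate theta yields the behavior theta predicts.
```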
Importantly, it is possible to implement bounds in the MDP formalism, forcing bounded optimal behavior to emerge.

The bounded optimal agent described via an MDP can be parametrized. For instance, a parameter can govern its preferences, that is, the state rewards. This permits mentalizing: given observed data, what parameters best produce predicted data that fits the observations? To this end, we utilize Bayesian inference, described by Bayes' rule:

\[ P(\theta \mid x) = \frac{P(x \mid \theta)\, P(\theta)}{P(x)}, \]

where θ represents the latent factors to be inferred, and x represents the observed data. The inference uses a prior P(θ) and a likelihood P(x|θ) to calculate the posterior probability P(θ|x), normalized with the marginal likelihood P(x). However, the intractability of the likelihood P(x|θ) prevents us from deriving the posterior directly. This can be overcome with approximation and likelihood-free inference methods [30], such as Bayesian Optimization for Likelihood-Free Inference (BOLFI) [31].

Figure 2 illustrates the information flow in our model. Prior knowledge and observation data serve as inputs to an inference module, which parameterizes an RL agent. The agent then learns a bounded optimal policy within a simulator modeling the observed real-world task. Through multiple samplings, the plausibility of various parameter values is evaluated, forming a posterior distribution that serves as the prior for subsequent inference with new observation data. This framework facilitates the temporal integration of inferences and allows for uncertainty analysis within the posterior probability distribution. All model details are available in the model's code repository (https://version.helsinki.fi/shanz/quantifying-uncertainty-in-mtom.git).
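The repository relies on BOLFI [31] for the likelihood-free step. As a simplified, hypothetical stand-in, the sketch below makes the prior-to-posterior loop of Figure 2 concrete with a kernel-weighted, ABC-style update over a fixed set of candidate preference parameters; the names simulate_trajectory and update_posterior, and the exponential kernel with bandwidth epsilon, are illustrative choices of ours rather than the authors' implementation.

```python
import numpy as np

def jaccard_distance(traj_a, traj_b):
    """Discrepancy between two trajectories, treated as sets of visited grid cells."""
    a, b = set(traj_a), set(traj_b)
    return 1.0 - len(a & b) / len(a | b)

def update_posterior(thetas, prior_weights, observed_traj, simulate_trajectory,
                     epsilon=0.3):
    """Re-weight candidate parameters by how well they reproduce one observation.

    thetas              : candidate parameters, e.g. pairs (reward_blue, reward_red)
    prior_weights       : current belief over the candidates (sums to 1)
    simulate_trajectory : function theta -> trajectory of the (bounded) optimal agent
    """
    weights = np.array([
        w * np.exp(-jaccard_distance(simulate_trajectory(theta), observed_traj)
                   / epsilon)
        for theta, w in zip(thetas, prior_weights)
    ])
    return weights / weights.sum()  # posterior; used as the prior for the next stimulus
```

Each new stimulus sharpens or shifts the weights: the weighted mean of the candidates gives a point estimate of the preferences, and the spread of the weights quantifies the remaining uncertainty.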
Figure 2: The overall structure of the model. It consists of a simulation of the external world and an inference module, which can be repeated as new observations arrive.

4. Evaluation

4.1. Participants

We recruited N = 10 participants via the Prolific online platform. The number of participants was small, but because our experimental setup was well defined, we expected them to have high agreement with each other. This was the case, meaning that a larger number of participants would likely not have changed the results. Their mean age was 35.6, and the age range was 23–56. They were required to be fluent in English and to use a PC (no mobile devices were allowed).

4.2. Materials

The experiment consisted of eight distinct tasks, each including five stimulus images. One image shows a trajectory of a robot on a grid from a bird's-eye perspective. The robot moves from its starting position to either a blue or a red circle, representing charging stations. There may also be walls, and the robot must navigate around them. Each picture is different, and there were a total of 8 ⋅ 5 = 40 stimuli. An example task is shown in Figure 3.

Figure 3: The five stimuli shown sequentially to the participants, Task 1. Stimulus numbers are added here, and were not present in the experiment.

4.3. Experiment Procedure

Participants were tasked with discerning the preferred charging station of a specific task's robot, understanding that while the robot could charge at either, it had a latent preference for one. Instructed that the robot also aimed to conserve energy, possibly choosing a less favored station if it were closer, participants rated the likelihood of the robot's preference for each station on a scale from 1 (very unlikely) to 5 (very likely). After making their likelihood assessment for the stations, they were presented with the next stimulus, with instructions to refine their inferences based on all previously shown images of the present task. Only one image was shown at any single time. When the task changed after five stimuli, participants were reminded that a new robot with different preferences had been introduced.

For our model, we represented the tasks within a grid world that the RL agent needed to navigate. It incurred a minor negative penalty for movement and obtained positive rewards from both charging stations, determined by two specific parameters. The objective was to infer these parameters based on the observed data. We measured the discrepancy between observed and generated trajectories using Jaccard similarity. Essentially, our inference engine recreated the world as depicted in the stimulus, then ran the RL agent across varying parameters, comparing the generated trajectory against the observed one to form a posterior distribution over the two preferences. Preference likelihood ratings for the model were derived by computing the mean of the posterior distribution for the preferences associated with the blue and red charging stations.

4.4. Results

The preference ratings of each response were first standardized so that they sum to 1. Then, a mean rating for each stimulus in each task was computed. The model's ratings were likewise standardized to sum to 1, allowing comparison between human and model inferences. This comparison is shown in Figure 4. For calculating model fit, we selected only the inferences for one of the two stations, because their values are inversions of each other after standardization. The model achieves a good fit, R² = 0.78, RMSE = 0.1. The most salient discrepancy between the model and human inferences is that the model is more careful in its estimates. Importantly, these results were obtained without any parameter tuning, meaning the model was not fit to the human data but produced similar data due to strong psychological assumptions about theory of mind.

Figure 4: Comparison of model and human inferences across eight tasks. As more evidence accumulates, the inferences become more certain. Values close to 0.5 indicate high uncertainty, and values close to either 0 or 1 high certainty.
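The standardization and the fit statistics reported above can be computed roughly as follows. The paper does not state whether R² is the coefficient of determination or a squared correlation; this sketch assumes the former, and the function names are illustrative.

```python
import numpy as np

def standardize(blue, red):
    """Normalize a (blue, red) rating pair so the two values sum to 1."""
    total = blue + red
    return blue / total, red / total

def fit_statistics(human, model):
    """R^2 and RMSE between human and model ratings for one station.

    human, model : per-stimulus standardized ratings for, say, the blue station
    (the red station's values are simply 1 minus these).
    """
    human = np.asarray(human, dtype=float)
    model = np.asarray(model, dtype=float)
    residuals = human - model
    r2 = 1.0 - np.sum(residuals**2) / np.sum((human - human.mean())**2)
    rmse = np.sqrt(np.mean(residuals**2))
    return r2, rmse
```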
The results exhibit the expected patterns of inference. Initially, participants faced uncertainty due to the limited evidence available. As they were exposed to additional stimuli, their inferences regarding the robot's preferences became more definite: one station's likelihood ratings increased, while the other's decreased. Task 1 serves as an example of this (Figure 3): the participants' inference that the robot prefers the red station gets stronger with each stimulus image shown. However, in tasks 3, 4, 6, 7, and 8, early stimuli suggested a certain preference, but subsequent stimuli revealed a stronger preference for the alternate station. This is similar to our motivating example in Figure 1. In these instances, the inferred preference for the more favored station shifted as the task progressed. Task 6 is an example of this (Figure 5): the participants are shown that the robot selects the red station, but it is always closer than the blue one, so there is uncertainty. Finally, in stimulus 5, it is revealed that the robot in fact prefers the blue station.

Figure 5: In Task 6, the participants only learned the true preference in the final image.

4.5. Discussion

Human-AI alignment necessitates that both humans and intelligent machines accurately interpret each other's intentions and actions [5]. This paper introduces a human-like theory of mind model capable of temporal observation integration, while being sensitive to the uncertainty inherent in mentalizing. We validated the model's human-like inference capabilities through a grid world task focused on preference determination between two goals. The work carried out here is theoretical in nature, and future studies should focus on more complex scenarios. While computational rationality has effectively modeled complex behaviors, such as multitasking while driving [32] and touchscreen typing [33], the exploration of long-term parameter inference in such contexts remains to be done.

Exploring decision-making under uncertainty is a large research topic. In our experiments, both humans and the model engaged in inferences and explicitly evaluated uncertainty, but they were not required to act on these inferences. A scenario where the model assists the observed actor will introduce the question of how to integrate uncertainty into decision-making. Taking the example of Janice from Figure 1, if adjusting the positions of the coffee and tea jars could aid her, the decision to do so necessitates careful consideration of potential consequences, ensuring the action truly benefits rather than hinders her. The manner in which a decision-making algorithm accounts for uncertainty during collaborative efforts impacts the helpfulness of interventions and carries a risk of unintended obstruction.

All code, materials, and data are published online (https://version.helsinki.fi/shanz/quantifying-uncertainty-in-mtom.git) to facilitate open science.
Acknowledgments

This research has been supported by the Academy of Finland (grant 330347).

References

[1] E. Etel, V. Slaughter, Theory of mind and peer cooperation in two play contexts, Journal of Applied Developmental Psychology 60 (2019) 87–95.
[2] T. Paal, T. Bereczkei, Adult theory of mind, cooperation, machiavellianism: The effect of mindreading on social relations, Personality and Individual Differences 43 (2007) 541–551.
[3] M. I. Brown, A. Ratajska, S. L. Hughes, J. B. Fishman, E. Huerta, C. F. Chabris, The social shapes test: A new measure of social intelligence, mentalizing, and theory of mind, Personality and Individual Differences 143 (2019) 107–117.
[4] J. F. Kihlstrom, N. Cantor, Social intelligence (2000).
[5] S. Russell, Human compatible: Artificial intelligence and the problem of control, Penguin, 2019.
[6] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, S. J. Gershman, Building machines that learn and think like people, Behavioral and Brain Sciences 40 (2017).
[7] I. Cho, N. Kamkar, N. Hosseini-Kamkar, Reasoning about mental states under uncertainty, PLoS ONE 17 (2022) e0277356.
[8] O. FeldmanHall, A. Shenhav, Resolving uncertainty in a social world, Nature Human Behaviour 3 (2019) 426–435.
[9] C. L. Baker, J. Jara-Ettinger, R. Saxe, J. B. Tenenbaum, Rational quantitative attribution of beliefs, desires and percepts in human mentalizing, Nature Human Behaviour 1 (2017) 1–10.
[10] S. Liu, T. D. Ullman, J. B. Tenenbaum, E. S. Spelke, Ten-month-old infants infer the value of goals from the costs of actions, Science 358 (2017) 1038–1041.
[11] H. Richardson, G. Lisandrelli, A. Riobueno-Naylor, R. Saxe, Development of the social brain from age three to twelve years, Nature Communications 9 (2018) 1–12.
[12] I. Dziobek, S. Fleck, E. Kalbe, K. Rogers, J. Hassenstab, M. Brand, J. Kessler, J. K. Woike, O. T. Wolf, A. Convit, Introducing MASC: A movie for the assessment of social cognition, Journal of Autism and Developmental Disorders 36 (2006) 623–636.
[13] R. Markiewicz, F. Rahman, I. Apperly, A. Mazaheri, K. Segaert, It is not all about you: Communicative cooperation is determined by your partner's theory of mind abilities as well as your own, Journal of Experimental Psychology: Learning, Memory, and Cognition (2023).
[14] M. Harbers, K. Van Den Bosch, J.-J. Meyer, Modeling agents with a theory of mind, in: 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, volume 2, IEEE, 2009, pp. 217–224.
[15] S. Devin, R. Alami, An implemented theory of mind to improve human-robot shared plans execution, in: 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), IEEE, 2016, pp. 319–326.
[16] K.-J. Kim, H. Lipson, Towards a simple robotic theory of mind, in: Proceedings of the 9th Workshop on Performance Metrics for Intelligent Systems, 2009, pp. 131–138.
[17] S. Lin, B. Keysar, N. Epley, Reflexively mindblind: Using theory of mind to interpret behavior requires effortful attention, Journal of Experimental Social Psychology 46 (2010) 551–556.
[18] Q. Wang, K. Saha, E. Gregori, D. Joyner, A. Goel, Towards mutual theory of mind in human-AI interaction: How language reflects what students perceive about a virtual teaching assistant, in: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–14.
[19] L. M. Hiatt, A. M. Harrison, J. G. Trafton, Accommodating human variability in human-robot teams through theory of mind, in: Twenty-Second International Joint Conference on Artificial Intelligence, 2011.
[20] J. Jara-Ettinger, L. E. Schulz, J. B. Tenenbaum, The naive utility calculus as a unified, quantitative framework for action understanding, Cognitive Psychology 123 (2020) 101334.
[21] P. Shafto, N. D. Goodman, M. C. Frank, Learning from others: The consequences of psychological reasoning for human learning, Perspectives on Psychological Science 7 (2012) 341–351.
[22] Jokinen, Remes, Kujala, Corander, Bayesian parameter inference for cognitive simulators, in: J. Williamson, A. Oulasvirta, P. Kristensson, N. Banovic (Eds.), Bayesian Methods for Interaction Design, Cambridge University Press, 2022.
[23] C. L. Baker, R. Saxe, J. B. Tenenbaum, Action understanding as inverse planning, Cognition 113 (2009) 329–349.
[24] A. Kangasrääsiö, J. P. Jokinen, A. Oulasvirta, A. Howes, S. Kaski, Parameter inference for computational cognitive models with approximate Bayesian computation, Cognitive Science 43 (2019) e12738.
[25] R. L. Lewis, A. Howes, S. Singh, Computational rationality: Linking mechanism and behavior through bounded utility maximization, Topics in Cognitive Science 6 (2014) 279–311.
[26] A. Oulasvirta, J. P. Jokinen, A. Howes, Computational rationality as a theory of interaction, in: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022, pp. 1–14.
[27] A. Howes, J. P. Jokinen, A. Oulasvirta, Towards machines that understand people, AI Magazine 44 (2023) 312–327.
[28] J. X. O'Reilly, Making predictions in a changing world—inference, uncertainty, and learning, Frontiers in Neuroscience 7 (2013) 105.
[29] O. FeldmanHall, M. R. Nassar, The computational challenge of social learning, Trends in Cognitive Sciences 25 (2021) 1045–1057.
[30] M. U. Gutmann, J. Corander, et al., Bayesian optimization for likelihood-free inference of simulator-based statistical models, Journal of Machine Learning Research (2016).
[31] J. Lintusaari, H. Vuollekoski, A. Kangasrääsiö, K. Skytén, M. Järvenpää, P. Marttinen, M. U. Gutmann, A. Vehtari, J. Corander, S. Kaski, ELFI: Engine for likelihood-free inference, Journal of Machine Learning Research 19 (2018) 1–7.
[32] J. P. Jokinen, T. Kujala, A. Oulasvirta, Multitasking in driving as optimal adaptation under uncertainty, Human Factors 63 (2021) 1324–1341.
[33] J. Jokinen, A. Acharya, M. Uzair, X. Jiang, A. Oulasvirta, Touchscreen typing as optimal supervisory control, in: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–14.