Quantifying Uncertainty in Machine Theory of Mind Across Time

Shanshan Zhang¹,*,†, Chuyang Wu¹,²,† and Jussi P. P. Jokinen²

¹ University of Helsinki, Pietari Kalmin katu 5, 00560 Helsinki, Finland
² University of Jyväskylä, Seminaarinkatu 15, PL 35, 40014 Jyväskylä, Finland

TKTP 2024: Annual Doctoral Symposium of Computer Science, 10.–11.6.2024, Vaasa, Finland
* Corresponding author.
† These authors contributed equally.
shanshan.zhang@helsinki.fi (S. Zhang); chuyang.wu@helsinki.fi (C. Wu); jussi.p.p.jokinen@jyu.fi (J. P. P. Jokinen)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Abstract
As intelligent interactive technologies advance, ensuring alignment with user preferences is critical. Machine theory of mind enables systems to infer latent mental states from observed behaviors, similarly to humans. Currently, there is no formal mechanism for integrating multiple observations over time and quantifying the uncertainty of inferences as a function of accumulated evidence in a provably human-like way. This paper addresses the issue through Bayesian inference, proposing a model that maintains a posterior belief about mental states as a probability distribution, updated with observational data. The advantage of Bayesian statistics lies in the possibility of evaluating the certainty of these inferences. We validate the model's human-like mental inference capabilities through an experiment.

Keywords
Human-Computer Interaction, Machine Theory of Mind, Mentalizing, Uncertainty Quantification

1. Introduction

Theory of mind, the innate human capacity to deduce others' latent mental states from observable behavior [1, 2], underpins social collaboration [3, 4]. As artificial intelligence (AI) advances, aligning intelligent machines with users' preferences becomes imperative [5]. Achieving alignment between human and machine objectives is facilitated when machines adopt reasoning processes that can be understood by humans [6], suggesting the importance of machines emulating human mental inference. A machine theory of mind seeks to provide machines with the ability to infer mental states in a human-like manner.

Mental inference facilitates collaboration by informing the agent and impacting its actions. The idea is that if an intelligent machine has knowledge of the user's goals, it can better make decisions to help the user. However, there is also an inherent risk in making decisions based on inferences: because all inferences contain uncertainty [7, 8], the intelligent agent should have a way of considering the amount of uncertainty when taking actions. There needs to be a way to quantify this uncertainty, so that the agent can robustly consider it when choosing what actions to take.
In this paper, we formalize a computational model that infers the preferences of observed agents. Observations from multiple time steps are integrated, and the uncertainty associated with the inferences is quantified in a posterior distribution.

The problem that our paper tackles is illustrated in Figure 1. The three panels depict an evolving inference, by an observer, of Janice's drink preference under varying conditions on three consecutive days. Initially, Janice selects tea, but the positioning of coffee on a high shelf introduces ambiguity regarding her preference: does she favor tea, or does she simply wish to avoid climbing the kitchen ladder? This uncertainty prevents a clear inference of her preference. In the second panel, Janice uses a stool to reach the now higher-placed tea jar, while the coffee remains even further out of reach, potentially accessible with taller kitchen stairs. The scenario hints at a preference for tea, yet the possibility that Janice may have an aversion to heights carries a degree of uncertainty, nudging the likelihood only slightly in favor of tea.

The final panel of Figure 1 offers a decisive moment: both the coffee and tea jars are easily accessible, and Janice opts for coffee. Given the equal effort required to reach both, her choice of coffee indicates a genuine preference for it, revealing that her earlier decisions were influenced by a reluctance to climb too high rather than by a preference for tea. Consequently, our inference shifts significantly towards coffee, with increased certainty. In this paper, we hypothesize that humans are able to carry out these sorts of inferences and to meta-cognitively assess how certain they are about the inferred preferences. Moreover, we formalize a computational model of this process.

Figure 1: Inferences of preferences based on observed behavior contain uncertainty, especially when there are confounding factors such as effort. As more evidence accumulates, certainty increases.

2. Background Review

Theory of mind, or mentalizing, enables humans to infer others' mental states [9, 10, 11]. It facilitates social interaction [3, 4] such as communication [12, 13] and collaboration [1, 2]. Likewise, a machine that is able to carry out mentalization can better account for user variability, improving the quality of interaction [14, 15, 16]. Experiments have demonstrated that machines capable of mentalization achieve superior performance in communication [17, 18] and team cooperation tasks [19].

Models of mentalizing target the inference of mental states such as preferences, costs [20], knowledge [21], and beliefs [9]. These models incorporate psychological hypotheses concerning observed actors as computational frameworks, enabling the simulation of predicted behavior. Parameters within the model reflect various mental states, including goals, guiding the behavior prediction for actors under specific objectives in a given context [22]. Assuming the psychological underpinnings are accurate, these models can predict an actor's behavior based on their goals. Inverse modeling techniques are then employed to deduce the parameters most likely to account for the observed behavior [23, 24].

How can we create a psychologically plausible model that can be parametrized with mental states and that then simulates behavior? One emerging popular approach is called computational rationality [25, 26]. It posits that intelligent agents, such as humans, choose actions that maximize expected utility. The agent must optimize its behavior with respect to the constraints of the environment. In addition, the approach is sensitive to the fact that intelligent agents have internal cognitive bounds as well, such as limited knowledge and information processing capacity. The approach is suitable for computational modeling of theory of mind, because it helps to prune the space of possible explanations by assuming that the observed behavior is produced by a computationally rational agent. When the bounds of the environment and the cognition are known and modeled correctly, the model can then be applied for reliable parameter inference [27].

Inferences, including those related to mentalizing, are often made under conditions of limited data and therefore inherently involve uncertainty [28, 7]. The similarity in actions among individuals with diverse preferences in specific contexts implies that observations alone may not suffice for conclusive inferences. The complexity of social settings further amplifies this uncertainty, highlighting the importance of incorporating it into models of social collaboration [29]. Thus, agents capable of mentalizing should not only emulate human-like inference of mental states but also assess the uncertainty of these inferences.

3. Method

Following the standard modeling pipeline in computational rationality [26], we formalize the task environment as a Markov Decision Process (MDP). It is represented as a tuple ⟨S, A, T, R⟩, consisting of a state space S, an action space A, transition probabilities T, and a reward function R. A state s ∈ S, encoding the current information of the environment, transfers to the next state s′ ∈ S when an action a ∈ A is performed, according to the transition probability T(s, a, s′) = P(s′|s, a), and the agent gains the reward r = R(s, a). Reinforcement learning (RL) solves the optimization problem of how to choose the action a through a policy π(a) = P(a|s) that maximizes the expected reward, by interacting with the environment and learning from experience. The learning process can be expressed as the function

\[ V_{\pi^*}(s) = \max_{a}\Big[ R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V_{\pi^*}(s') \Big], \]

where V_{π*}(s) is the value of a state s ∈ S under an optimal policy π*, discounting future rewards using γ ∈ [0, 1]. This optimality assumption ties in with computational rationality.
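To make the formalism concrete, the following sketch solves a small tabular MDP of this kind with value iteration and reads off a greedy (optimal) policy. It is a minimal illustration under our own assumptions (dense NumPy arrays for T and R, a generic value_iteration helper), not the agent implementation from the paper's repository; in the paper, cognitive bounds would additionally be imposed on the agent.

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    """Solve a tabular MDP <S, A, T, R>.

    T : array of shape (S, A, S), T[s, a, s2] = P(s2 | s, a)
    R : array of shape (S, A), R[s, a]
    Returns the optimal state values V and a greedy policy (one action per state).
    """
    n_states = T.shape[0]
    V = np.zeros(n_states)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_{s'} T(s, a, s') * V(s')
        Q = R + gamma * (T @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# A latent preference enters through R: for example, the rewards of reaching the
# blue and red charging stations can be set from a parameter vector theta, and
# re-solving the MDP for each candidate theta yields the behavior theta predicts.
```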
Importantly, it is possible to implement bounds in the MDP formalism, forcing bounded optimal behavior to emerge.

The bounded optimal agent described via an MDP can be parametrized. For instance, a parameter can govern its preferences, that is, the state rewards. This permits mentalizing: given observed data, what parameters best produce predicted data that fits the observations? To this end, we utilize Bayesian inference, described by Bayes' rule:

\[ P(\theta \mid x) = \frac{P(x \mid \theta)\, P(\theta)}{P(x)}, \]

where θ represents the latent factors to be inferred, and x represents the observed data. The inference uses a prior P(θ) and a likelihood P(x|θ) to calculate the posterior probability P(θ|x), normalized with the marginal likelihood P(x). However, the intractability of the likelihood P(x|θ) prevents us from deriving the posterior directly. This can be overcome with approximation and likelihood-free inference methods [30], such as Bayesian Optimization for Likelihood-Free Inference (BOLFI) [31].

Figure 2 illustrates the information flow in our model. Prior knowledge and observation data serve as inputs to an inference module, which parameterizes an RL agent. The agent then learns a bounded optimal policy within a simulator modeling the observed real-world task. Through multiple samplings, the plausibility of various parameter values is evaluated, forming a posterior distribution that serves as the prior for subsequent inference with new observation data. This framework facilitates the temporal integration of inferences and allows for uncertainty analysis within the posterior probability distribution. All model details are available in the model's code repository (https://version.helsinki.fi/shanz/quantifying-uncertainty-in-mtom.git).
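The repository relies on BOLFI [31] for the likelihood-free step. As a simplified, hypothetical stand-in, the sketch below makes the prior-to-posterior loop of Figure 2 concrete with a kernel-weighted, ABC-style update over a fixed set of candidate preference parameters; the names simulate_trajectory and update_posterior, and the exponential kernel with bandwidth epsilon, are illustrative choices of ours rather than the authors' implementation.

```python
import numpy as np

def jaccard_distance(traj_a, traj_b):
    """Discrepancy between two trajectories, treated as sets of visited grid cells."""
    a, b = set(traj_a), set(traj_b)
    return 1.0 - len(a & b) / len(a | b)

def update_posterior(thetas, prior_weights, observed_traj, simulate_trajectory,
                     epsilon=0.3):
    """Re-weight candidate parameters by how well they reproduce one observation.

    thetas              : candidate parameters, e.g. pairs (reward_blue, reward_red)
    prior_weights       : current belief over the candidates (sums to 1)
    simulate_trajectory : function theta -> trajectory of the (bounded) optimal agent
    """
    weights = np.array([
        w * np.exp(-jaccard_distance(simulate_trajectory(theta), observed_traj)
                   / epsilon)
        for theta, w in zip(thetas, prior_weights)
    ])
    return weights / weights.sum()  # posterior; used as the prior for the next stimulus
```

Each new stimulus sharpens or shifts the weights: the weighted mean of the candidates gives a point estimate of the preferences, and the spread of the weights quantifies the remaining uncertainty.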
Figure 2: The overall structure of the model. It consists of a simulation of the external world and an inference module, which can be repeated as new observations arrive.

4. Evaluation

4.1. Participants

We recruited N = 10 participants via the Prolific online platform. The number of participants was small, but because our experimental setup was well defined, we expected them to have high agreement with each other. This was the case, meaning that a larger number of participants would likely not have changed the results. Their mean age was 35.6, and the age range was 23–56. They were required to be fluent in English and to use a PC (no mobile devices were allowed).

4.2. Materials

The experiment consisted of eight distinct tasks, each including five stimulus images. One image shows a trajectory of a robot on a grid from a bird's-eye perspective. The robot moves from its starting position to either a blue or a red circle, representing charging stations. There may also be walls, and the robot must navigate around them. Each picture is different, and there were a total of 8 ⋅ 5 = 40 stimuli. An example task is shown in Figure 3.

Figure 3: The five stimuli shown sequentially to the participants, Task 1. Stimulus numbers are added here, and were not present in the experiment.

4.3. Experiment Procedure

Participants were tasked with discerning the preferred charging station of a specific task's robot, understanding that while the robot could charge at either, it had a latent preference for one. Instructed that the robot also aimed to conserve energy, possibly choosing a less favored station if it were closer, participants rated the likelihood of the robot's preference for each station on a scale from 1 (very unlikely) to 5 (very likely). After making their likelihood assessment for the stations, they were presented with the next stimulus, with instructions to refine their inferences based on all previously shown images of the present task. Only one image was shown at any single time. When the task changed after five stimuli, participants were reminded that a new robot with different preferences had been introduced.

For our model, we represented the tasks within a grid world that the RL agent needed to navigate. It incurred a minor negative penalty for movement and obtained positive rewards from both charging stations, determined by two specific parameters. The objective was to infer these parameters based on the observed data. We measured the discrepancy between observed and generated trajectories using Jaccard similarity. Essentially, our inference engine recreated the world as depicted in the stimulus, then ran the RL agent across varying parameters, comparing the generated trajectory against the observed one to form a posterior distribution over the two preferences. Preference likelihood ratings for the model were derived by computing the mean of the posterior distribution for the preferences associated with the blue and red charging stations.

4.4. Results

The preference ratings of each response were first standardized so that they sum to 1. Then, a mean rating for each stimulus in each task was computed. The model's ratings were likewise standardized to sum to 1, allowing comparison between human and model inferences. This comparison is shown in Figure 4. For calculating model fit, we selected only the inferences for one of the two stations, because their values are inversions of each other after standardization. The model achieves a good fit, R² = 0.78, RMSE = 0.1. The most salient discrepancy between the model and human inferences is that the model is more careful in its estimates. Importantly, these results were obtained without any parameter tuning, meaning the model was not fit to the human data but produced similar data due to strong psychological assumptions about theory of mind.

Figure 4: Comparison of model and human inferences across eight tasks. As more evidence accumulates, the inferences become more certain. Values close to 0.5 indicate high uncertainty, and values close to either 0 or 1 high certainty.
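The standardization and the fit statistics reported above can be computed roughly as follows. The paper does not state whether R² is the coefficient of determination or a squared correlation; this sketch assumes the former, and the function names are illustrative.

```python
import numpy as np

def standardize(blue, red):
    """Normalize a (blue, red) rating pair so the two values sum to 1."""
    total = blue + red
    return blue / total, red / total

def fit_statistics(human, model):
    """R^2 and RMSE between human and model ratings for one station.

    human, model : per-stimulus standardized ratings for, say, the blue station
    (the red station's values are simply 1 minus these).
    """
    human = np.asarray(human, dtype=float)
    model = np.asarray(model, dtype=float)
    residuals = human - model
    r2 = 1.0 - np.sum(residuals**2) / np.sum((human - human.mean())**2)
    rmse = np.sqrt(np.mean(residuals**2))
    return r2, rmse
```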
The results exhibit the expected patterns of inference. Initially, participants faced uncertainty due to the limited evidence available. As they were exposed to additional stimuli, their inferences regarding the robot's preferences became more definite: one station's likelihood ratings increased, while the other's decreased. Task 1 serves as an example of this (Figure 3): the participants' inference that the robot prefers the red station gets stronger with each stimulus image shown. However, in tasks 3, 4, 6, 7, and 8, early stimuli suggested a certain preference, but subsequent stimuli revealed a stronger preference for the alternate station. This is similar to our motivating example in Figure 1. In these instances, the inferred preference for the more favored station shifted as the task progressed. Task 6 is an example of this (Figure 5): the participants are shown that the robot selects the red station, but it is always closer than the blue one, so there is uncertainty. Finally, in stimulus 5, it is revealed that the robot in fact prefers the blue station.

Figure 5: In Task 6, the participants only learned the true preference in the final image.

4.5. Discussion

Human-AI alignment necessitates that both humans and intelligent machines accurately interpret each other's intentions and actions [5]. This paper introduces a human-like theory of mind model capable of temporal observation integration, while being sensitive to the uncertainty inherent in mentalizing. We validated the model's human-like inference capabilities through a grid world task focused on preference determination between two goals. The work carried out here is theoretical in nature, and future studies should focus on more complex scenarios. While computational rationality has effectively modeled complex behaviors, such as multitasking while driving [32] and touchscreen typing [33], the exploration of long-term parameter inference in such contexts remains to be done.

Exploring decision-making under uncertainty is a large research topic. In our experiments, both humans and the model engaged in inferences and explicitly evaluated uncertainty, but they were not required to act on these inferences. A scenario where the model assists the observed actor will introduce the question of how to integrate uncertainty into decision-making. Taking the example of Janice from Figure 1, if adjusting the positions of the coffee and tea jars could aid her, the decision to do so necessitates careful consideration of potential consequences, ensuring the action truly benefits rather than hinders her. The manner in which a decision-making algorithm accounts for uncertainty during collaborative efforts impacts the helpfulness of interventions and carries a risk of unintended obstruction.

All code, materials, and data are published online (https://version.helsinki.fi/shanz/quantifying-uncertainty-in-mtom.git) to facilitate open science.
Acknowledgments

This research has been supported by the Academy of Finland (grant 330347).

References

[1] E. Etel, V. Slaughter, Theory of mind and peer cooperation in two play contexts, Journal of Applied Developmental Psychology 60 (2019) 87–95.
[2] T. Paal, T. Bereczkei, Adult theory of mind, cooperation, machiavellianism: The effect of mindreading on social relations, Personality and Individual Differences 43 (2007) 541–551.
[3] M. I. Brown, A. Ratajska, S. L. Hughes, J. B. Fishman, E. Huerta, C. F. Chabris, The social shapes test: A new measure of social intelligence, mentalizing, and theory of mind, Personality and Individual Differences 143 (2019) 107–117.
[4] J. F. Kihlstrom, N. Cantor, Social intelligence (2000).
[5] S. Russell, Human compatible: Artificial intelligence and the problem of control, Penguin, 2019.
[6] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, S. J. Gershman, Building machines that learn and think like people, Behavioral and Brain Sciences 40 (2017).
[7] I. Cho, N. Kamkar, N. Hosseini-Kamkar, Reasoning about mental states under uncertainty, PLoS ONE 17 (2022) e0277356.
[8] O. FeldmanHall, A. Shenhav, Resolving uncertainty in a social world, Nature Human Behaviour 3 (2019) 426–435.
[9] C. L. Baker, J. Jara-Ettinger, R. Saxe, J. B. Tenenbaum, Rational quantitative attribution of beliefs, desires and percepts in human mentalizing, Nature Human Behaviour 1 (2017) 1–10.
[10] S. Liu, T. D. Ullman, J. B. Tenenbaum, E. S. Spelke, Ten-month-old infants infer the value of goals from the costs of actions, Science 358 (2017) 1038–1041.
[11] H. Richardson, G. Lisandrelli, A. Riobueno-Naylor, R. Saxe, Development of the social brain from age three to twelve years, Nature Communications 9 (2018) 1–12.
[12] I. Dziobek, S. Fleck, E. Kalbe, K. Rogers, J. Hassenstab, M. Brand, J. Kessler, J. K. Woike, O. T. Wolf, A. Convit, Introducing MASC: A movie for the assessment of social cognition, Journal of Autism and Developmental Disorders 36 (2006) 623–636.
[13] R. Markiewicz, F. Rahman, I. Apperly, A. Mazaheri, K. Segaert, It is not all about you: Communicative cooperation is determined by your partner's theory of mind abilities as well as your own, Journal of Experimental Psychology: Learning, Memory, and Cognition (2023).
[14] M. Harbers, K. Van Den Bosch, J.-J. Meyer, Modeling agents with a theory of mind, in: 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, volume 2, IEEE, 2009, pp. 217–224.
[15] S. Devin, R. Alami, An implemented theory of mind to improve human-robot shared plans execution, in: 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), IEEE, 2016, pp. 319–326.
[16] K.-J. Kim, H. Lipson, Towards a simple robotic theory of mind, in: Proceedings of the 9th Workshop on Performance Metrics for Intelligent Systems, 2009, pp. 131–138.
[17] S. Lin, B. Keysar, N. Epley, Reflexively mindblind: Using theory of mind to interpret behavior requires effortful attention, Journal of Experimental Social Psychology 46 (2010) 551–556.
[18] Q. Wang, K. Saha, E. Gregori, D. Joyner, A. Goel, Towards mutual theory of mind in human-AI interaction: How language reflects what students perceive about a virtual teaching assistant, in: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–14.
[19] L. M. Hiatt, A. M. Harrison, J. G. Trafton, Accommodating human variability in human-robot teams through theory of mind, in: Twenty-Second International Joint Conference on Artificial Intelligence, 2011.
[20] J. Jara-Ettinger, L. E. Schulz, J. B. Tenenbaum, The naive utility calculus as a unified, quantitative framework for action understanding, Cognitive Psychology 123 (2020) 101334.
[21] P. Shafto, N. D. Goodman, M. C. Frank, Learning from others: The consequences of psychological reasoning for human learning, Perspectives on Psychological Science 7 (2012) 341–351.
[22] Jokinen, Remes, Kujala, Corander, Bayesian parameter inference for cognitive simulators, in: J. Williamson, A. Oulasvirta, P. Kristensson, N. Banovic (Eds.), Bayesian Methods for Interaction Design, Cambridge University Press, 2022.
[23] C. L. Baker, R. Saxe, J. B. Tenenbaum, Action understanding as inverse planning, Cognition 113 (2009) 329–349.
[24] A. Kangasrääsiö, J. P. Jokinen, A. Oulasvirta, A. Howes, S. Kaski, Parameter inference for computational cognitive models with approximate Bayesian computation, Cognitive Science 43 (2019) e12738.
[25] R. L. Lewis, A. Howes, S. Singh, Computational rationality: Linking mechanism and behavior through bounded utility maximization, Topics in Cognitive Science 6 (2014) 279–311.
[26] A. Oulasvirta, J. P. Jokinen, A. Howes, Computational rationality as a theory of interaction, in: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022, pp. 1–14.
[27] A. Howes, J. P. Jokinen, A. Oulasvirta, Towards machines that understand people, AI Magazine 44 (2023) 312–327.
[28] J. X. O'Reilly, Making predictions in a changing world—inference, uncertainty, and learning, Frontiers in Neuroscience 7 (2013) 105.
[29] O. FeldmanHall, M. R. Nassar, The computational challenge of social learning, Trends in Cognitive Sciences 25 (2021) 1045–1057.
[30] M. U. Gutmann, J. Corander, et al., Bayesian optimization for likelihood-free inference of simulator-based statistical models, Journal of Machine Learning Research (2016).
[31] J. Lintusaari, H. Vuollekoski, A. Kangasrääsiö, K. Skytén, M. Järvenpää, P. Marttinen, M. U. Gutmann, A. Vehtari, J. Corander, S. Kaski, ELFI: Engine for likelihood-free inference, Journal of Machine Learning Research 19 (2018) 1–7.
[32] J. P. Jokinen, T. Kujala, A. Oulasvirta, Multitasking in driving as optimal adaptation under uncertainty, Human Factors 63 (2021) 1324–1341.
[33] J. Jokinen, A. Acharya, M. Uzair, X. Jiang, A. Oulasvirta, Touchscreen typing as optimal supervisory control, in: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–14.