
Predicting actions using an adaptive probabilistic model of human decision behaviours

A.H. Cruickshank (A.H.Cruickshank@sms.ed.ac.uk), R. Shillcock, S. Ramamoorthy (S.Ramamoorthy@ed.ac.uk)

School of Informatics, University of Edinburgh, EH8 9AB, UK

Computer interfaces provide an environment that allows for multiple objectively optimal solutions, but individuals will, over time, use a smaller number of subjectively optimal solutions, developed as habits that have been formed and tuned by repetition. Thus an interface agent providing assistance in this environment requires not only knowledge of the objectively optimal solutions, but also the ability to adapt to an individual’s subjectively optimal solutions. Utilising findings in psychology and neuroscience, we propose a general model that adapts to individuals, using Bayesian probability to infer the type of decision-making behaviour that will be used.

We demonstrate the effectiveness of our approach using simple implementations of two decision systems, the deliberative and the habitual. The deliberative system uses an internal model of the environment for forward planning to reach a goal and selects actions based on the calculated plan. The habitual system learns the utility of actions in a situation based on previous experience and selects the one that has proven most useful in the past. The existence of both these systems has been recognised since early studies in psychology and neuroscience ([6], [5]), and they are known to coexist ([3]).

Our approach is related to others that are derived from human decision making, such as that used in [1], which proposes a Bayesian Theory of Mind, and [7], which extended the ACT-R cognitive architecture to create ACT-R/E. Other approaches to plan and action recognition are derived from applying automated planning and machine learning techniques. These generally fall into one of two categories: a planner approach ([4]) or a historic approach ([2]). Of note is that each of these approaches broadly replicates one of the two types of decision system used by people when selecting actions.
Planner predictors replicate the deliberative decision system, whereas historic predictors replicate the habitual decision system. Thus our approach can be viewed as integrating these two types of predictors using a Bayesian model combination.

2 Predictive Model

Action prediction is performed using a dynamic Bayesian network model. $B$ is a decision behaviour, which we define as any process that provides a distribution over actions given a state and previously observed actions. That is, a decision behaviour can be represented as a distribution $p(a' \mid B, s, a_{<n})$. To predict the action $a_n$ for a specific state $s$ (elided for simplicity), given previously observed actions $a_{<n}$, we calculate

$$p(a_n \mid a_{<n}) = \sum_{B_n} p(B_n \mid a_{<n})\, p(a_n \mid B_n, a_{<n}) \tag{1}$$

marginalising over the latent variable, $B$. A simple, recursive update mechanism can be used for $p(B_n \mid a_{<n})$, which significantly increases the efficiency of the calculation:

$$p(B_n \mid a_{<n}) \propto p(B_{n-1} \mid a_{<(n-1)})\, p(a_{n-1} \mid a_{<(n-1)}, B_{n-1}) \tag{2}$$

In our initial implementation we use relatively simple instantiations of the deliberative and habitual behaviours.
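The combination in Equations (1) and (2) can be illustrated with a minimal Python sketch. The function names, the dictionary representation of distributions and the fallback used when every behaviour assigns zero probability to the observed action are assumptions made for illustration; they are not taken from the implementation used in this work.

```python
from typing import Dict

Distribution = Dict[str, float]


def predict_action(behaviour_posterior: Distribution,
                   behaviour_predictions: Dict[str, Distribution]) -> Distribution:
    """Equation (1): mix each behaviour's action distribution, weighted by p(B_n | a_<n)."""
    actions = set()
    for dist in behaviour_predictions.values():
        actions.update(dist)
    return {
        a: sum(behaviour_posterior[b] * behaviour_predictions[b].get(a, 0.0)
               for b in behaviour_posterior)
        for a in actions
    }


def update_behaviour_posterior(behaviour_posterior: Distribution,
                               behaviour_predictions: Dict[str, Distribution],
                               observed_action: str) -> Distribution:
    """Equation (2): reweight each behaviour by the probability it gave the observed action."""
    unnormalised = {
        b: behaviour_posterior[b] * behaviour_predictions[b].get(observed_action, 0.0)
        for b in behaviour_posterior
    }
    total = sum(unnormalised.values())
    if total == 0.0:
        # Assumption: keep the previous belief if no behaviour explains the action.
        return dict(behaviour_posterior)
    return {b: w / total for b, w in unnormalised.items()}
```

Under these assumptions, each observed action costs one multiplication per behaviour plus a normalisation, so the belief over behaviours never has to be recomputed from the full action history.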

Deliberative, D: The deliberative behaviour selects actions that solve the experimental task (described in Section 3) using the smallest number of moves, which subjects find by planning over the available actions. We use an abstraction of the planning process, effectively precalculating the optimal actions that exist and modelling the deliberative behaviour using a Multinomial distribution for each task state. As the task allows for multiple optimal actions, the parameters of the distribution are set such that all optimal actions have equal, high probability, leaving a small probability that a non-optimal action is selected by mistake.
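One way to set the parameters of such a distribution for a single task state is sketched below. The function name and the `error_prob` value are hypothetical, and treating `error_prob` as the total mass shared equally among the non-optimal actions is an assumption; the text only states that this probability is small.

```python
from typing import Dict, Iterable, List


def deliberative_distribution(actions: List[str],
                              optimal: Iterable[str],
                              error_prob: float = 0.05) -> Dict[str, float]:
    """Give optimal actions equal, high probability and share error_prob among the rest."""
    optimal_set = set(optimal)
    n_optimal = sum(1 for a in actions if a in optimal_set)
    n_other = len(actions) - n_optimal
    if n_optimal == 0:
        raise ValueError("at least one available action must be optimal")
    # If every action is optimal there is nothing to assign the error mass to.
    mistake_mass = error_prob if n_other > 0 else 0.0
    return {
        a: (1.0 - mistake_mass) / n_optimal if a in optimal_set
        else mistake_mass / n_other
        for a in actions
    }
```

Because the optimal actions are precalculated, this distribution depends only on the task state and not on the action history.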

$$p(a_n \mid B_n = D, a_{<n}) = p(a_n \mid B_n = D) = \mathrm{Multinomial}(\psi_s) \tag{3}$$

Habitual, H: The habitual behaviour selects actions based on previous experience. We again use an abstraction, here of the habitual process, which assumes that the more often an action has been selected before, the more likely it is to be selected again. We model this using a Dirichlet prior over a Multinomial distribution for each state. The hyper-parameters of the Dirichlet are initialised to the same value, making each action equally likely, and each time an action is observed the associated parameter is incremented.

$$p(a_n \mid B_n = H, a_{<n}) \propto \mathrm{Dirichlet}(\alpha_s) \tag{4}$$
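The habitual behaviour can be sketched as a table of Dirichlet pseudo-counts per state, using the standard Dirichlet-Multinomial posterior predictive, in which the probability of an action is its pseudo-count divided by the sum of all pseudo-counts for that state. The class name, the initial value of 1.0 and the increment of 1 per observation are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable


class HabitualBehaviour:
    """Per-state Dirichlet pseudo-counts over a fixed set of actions."""

    def __init__(self, actions: Iterable[str], initial_alpha: float = 1.0):
        self.actions = list(actions)
        self.initial_alpha = initial_alpha  # assumed common starting value
        self.alpha: Dict[Hashable, Dict[str, float]] = {}

    def _counts(self, state: Hashable) -> Dict[str, float]:
        # Lazily create equal hyper-parameters the first time a state is seen.
        return self.alpha.setdefault(
            state, {a: self.initial_alpha for a in self.actions})

    def predict(self, state: Hashable) -> Dict[str, float]:
        """Posterior predictive: alpha_a / sum(alpha) for the given state."""
        counts = self._counts(state)
        total = sum(counts.values())
        return {a: c / total for a, c in counts.items()}

    def observe(self, state: Hashable, action: str) -> None:
        """Increment the parameter of the observed action (assumed step of 1)."""
        self._counts(state)[action] += 1.0
```

A state here could be, for example, a tuple of layout and connection point; observing an action in one state leaves the counts for every other state unchanged.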

3 Experimental Results

To illustrate the utility of our proposed approach we present results from a human subject experiment that required subjects to complete a novel task with multiple optimal solutions. The task used a “construction” paradigm in which subjects had to join coloured connectors using a limited selection of parts, shown in Figure 1. The five layouts used in the experiment were designed such that they could be completed using 4 parts to join each of the coloured connectors, giving the optimal number of actions as 19 (4 part selection actions for each colour plus 3 colour selection actions). Ten subjects took part in the experiment (5 male/5 female, a mix of staff and PhD students from the University of Edinburgh, School of Informatics).

A specific task state, used in the models described above, is defined by the layout being solved, {1, ..., 5}, and the (x, y) co-ordinate of the current connection point. The predictive models were compared for the individual components and for the combined model. A windowed online accuracy metric was used to assess the predictive power of the models; it allows for learning dynamics by considering only up to the last 100 predictions:

$$\mathrm{accuracy}_{\text{window online}}(t) = \frac{\sum_{i=\max(0,\,t-100)}^{t} \delta_{a_O(i)=a_P(i)}}{\min(t, 100)}$$

where $a_O(i)$ is the observed action and $a_P(i)$ is the predicted action at time $i$, and $\delta_{a_O(i)=a_P(i)}$ is 1 if these match, 0 otherwise.
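A windowed accuracy of this kind can be computed incrementally, as in the sketch below. The function name is hypothetical, and the indexing convention (averaging over exactly the most recent `window` predictions, or over all of them while fewer than `window` have been made) is an assumption about how the formula is applied in practice.

```python
from collections import deque
from typing import List, Sequence


def windowed_online_accuracy(observed: Sequence[str],
                             predicted: Sequence[str],
                             window: int = 100) -> List[float]:
    """Return the accuracy after each prediction, over the most recent `window` predictions."""
    recent = deque(maxlen=window)  # the oldest match indicator drops out automatically
    accuracies = []
    for obs, pred in zip(observed, predicted):
        recent.append(1 if obs == pred else 0)
        accuracies.append(sum(recent) / len(recent))
    return accuracies
```

Restricting the average to a recent window means that early mistakes, made before a model has adapted to a subject, stop affecting the score once they fall outside the window.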

Figure 2 shows the mean accuracy of the three predictive models across 10 runs for a subset of subjects. The highest accuracy was 92%, the lowest was 57%, and the mean was 71% with a standard deviation of 14%. Adaptation to the subjects can be seen, and is clearest for Subjects 6 and 8.

Acknowledgements

This work is supported by the University of Edinburgh Neuroinformatics Doctoral Training Centre, funded by EPSRC, BBSRC and MRC.

References

1. C.L. Baker, R.R. Saxe and J.B. Tenenbaum. Bayesian theory of mind: Modeling joint belief-desire attribution. In Proceedings of the Thirty-Second Annual Conference of the Cognitive Science Society, pages 2469-2474, 2011.

2. C.L. Baker, J.B. Tenenbaum and R.R. Saxe. Bayesian models of human action understanding. In Advances in Neural Information Processing Systems 18, pages 99-106. MIT Press, 2006.

3. Dickinson. Actions and habits: The development of behavioural autonomy. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 308(1135):67-78, 1985.

4. H.A. Kautz and J.F. Allen. Generalized plan recognition. In AAAI, volume 86, pages 32-37, 1986.

5. K.W. Spence. Behaviour theory and conditioning. 1956.

6. E.C. Tolman. Purposive behaviour in animals and men. Appleton-Century-Crofts, New York, pages 209-211, 1932.

7. Trafton, Hiatt, Harrison, Tamborello, Khemlani and Schultz. ACT-R/E: An embodied cognitive architecture for human-robot interaction. Journal of Human-Robot Interaction, 2(1):30-55, 2013.