    Mining Determinism in Human Strategic Behavior

                                      Rustam Tagiew

           Institute for Computer Science of TU Bergakademie Freiberg, Germany
                           tagiew@informatik.tu-freiberg.de



       Abstract. This work lies at the intersection of experimental economics and data
       mining. It continues the author's previous work on mining behavior rules of human
       subjects from experimental data, in settings where game-theoretic predictions partially
       fail. Game-theoretic predictions, aka equilibria, only tend to succeed with experienced
       subjects in specific games, which is rarely the case. Beyond game theory, contemporary
       experimental economics offers a number of alternative models. In the relevant literature,
       these models are always biased by psychological and near-psychological theories and are
       claimed to be confirmed by the data. This work introduces a data mining approach to
       the problem that does not rely on an extensive psychological background. Apart from
       determinism, no other biases are assumed. Two datasets from different human subject
       experiments are used for evaluation: the first is a repeated mixed strategy zero sum
       game and the second a repeated ultimatum game. As a result, a way of mining
       deterministic regularities in human strategic behavior is described and evaluated. As
       future work, the design of a new representation formalism is discussed.

       Key words: Game Theory, Psychology, Data Mining, Artificial Intelligence, Domain-
       Specific Languages


1   Introduction
Game theory is one of many scientific disciplines predicting the outcomes of social,
economic and competitive interactions among humans at the granularity level of individual
decisions [1, p.4]. People are assumed to be autonomous and intelligent, and to decide
according to their preferences. People can be regarded as rational if they always make
the decisions whose execution has, according to their subjective estimation, the most
preferred consequences [2,3]. The correctness of this subjective estimation depends on
the level of intelligence. Rationality can justify one's own decisions as well as predictions
of other people's decisions. If interacting people satisfy the concept of rationality and
apply it mutually and even recursively to each other, the interaction is called a strategic
interaction (SI). Further, a game is the formal structure of a concrete SI [4]. The
definition of a game consists of a number of players, their legal actions and the players'
preferences. The preferences can be replaced by a payoff function under the assumption
of payoff maximization. The payoff function defines each player's outcome depending on
his actions, the other players' actions and random events in the environment. The
game-theoretic solution of a game is a prediction about the behavior of the players, aka
an equilibrium. The assumption of rationality is the basis of an equilibrium. Deviating
from an equilibrium is beyond rationality, because it does not maximize the payoff. Not
every game has an
equilibrium. However, there is at least one mixed strategy equilibrium (MSE) in every
finite game [5].
     The notion of a game is commonly associated with pleasant pastimes like board
games, but it can be extended to all social, economic and competitive interactions among
humans. A board game can have the same game structure as a war. Some board games
were even developed to train people, like the Prussian army war game "Kriegspiel
Chess" [6] for officers. We like to train ourselves in order to perform better in games [7].
In most cases, common human behavior in games deviates from game-theoretic
predictions [8,9]. One can say without any doubt that if a human player is trained in a
concrete game, he will perform close to equilibrium. But a chess master does not also
play poker perfectly, and vice versa. On the other hand, a game theorist can find a way
to compute an equilibrium for a game, but this does not make a successful player out of
him. There are many games we can play; for most of them, we are not trained. That is
why it is more important to investigate our behavior when playing general games than
when playing a concrete game at expert level. Conducting experiments to gather data on
human game playing is called experimental economics.
     Although general human preferences are a subject of philosophical discussion [10],
game theory assumes that they can be captured as required for modeling rationality.
Regarding people as rational agents is disputed at least in psychology, where even
scientifically accessible argumentation exposes the existence of stable and consistent
human preferences as a myth [11]. The problems of human rationality cannot be
explained by bounded cognitive abilities alone. "British people argue that it is worth
spending billions of pounds to improve the safety of the rail system. However, the same
people habitually travel by car rather than by train, even though traveling by car is
approximately 30 times more dangerous than by train!" [12, p.527–530] Nevertheless, for
the last six decades the common scientific standard for econometric experiments has
been that subjects' preferences over outcomes can be induced by paying differing
amounts of money [13]. However, inducing preferences by money is also criticized under
the term "Homo Economicus".
     The ability to model other people's rationality and reasoning corresponds to the
psychological term "Theory of Mind" (ToM) [14], which is lacking almost exclusively in
cases of autism. In experimental economics, subjects as well as researchers, both of
whom are supposed to be non-autistic, may nevertheless fail at modeling others' minds.
In the Wason task at least, subjects' reasoning does not match the researchers'
reasoning [15]. Human rationality is not restricted to a capability for science-grade
logical reasoning – rational people may use no logic at all [16]. However, people also
make serious mistakes in the calculus of probabilities [17]. In mixed strategy games,
people cannot properly generate the required sequence of random decisions [18]. Due to
bounded cognitive abilities, every "random" decision depends on previous ones and is
predictable in this way. Ultimatum games [9, p. 43ff] reveal the economists'
misconception of human preferences – people's minds value fairness in addition to
personal enrichment. Our minds originate from a time when private property had not
been invented and social values like fairness were essential for survival.
     This work concentrates on human playing of general games and continues the
author's previous work [19]. It addresses the common human deviations from predicted equilibria
in games for which the players are not trained or experienced. The two examples
introduced in this work are a repeated mixed strategy zero sum game and a repeated
ultimatum game viewed from the responders' perspective. The only assumption is the
existence of deterministic rules in human behavior. Under this assumption, diverse data
mining algorithms are evaluated. Apart from mining deterministic regularities, modeling
human behavior in general games needs a representation formalism that is not specific
to a concrete game. Representing human behavior models in such a formalism would
make them more comparable. Therefore, this paper includes a general discussion of such
a formalism, drawing on the results of the evaluation.
     The next section summarizes related work on a formalism for human behavior in
games. The data mining approach on the datasets is presented afterwards. A summary
and discussion conclude this paper.



2    Related Work

A very comprehensive collection of works in experimental psychology and economics on
human behavior in general games can be found in [9]. This work inspired research in
artificial intelligence [20], which led to the creation of networks of influence diagrams
(NID) as a representation formalism. NID is a formalism similar to the possible worlds
semantics of Kripke models [21] and is a superset of Bayesian games. The main idea of
NID is to model human reasoning patterns in diverse SIs. Every node of a NID is a
multi-agent influence diagram (MAID) representing an agent's model of the SI. A MAID
is an influence diagram (ID) in which every decision node is associated with an agent.
An ID is a Bayesian network (BN) with ordinary nodes, decision nodes and utility
nodes. In summary, this approach assumes that human decision making can be modeled
using BNs – human reasoning is assumed to have a non-deterministic structure. This
formalism has already been applied to model reciprocity in a repeated ultimatum game
called "Colored Trails" (CT) [22]. The result of that work is that models of adaptation
to human behavior based on BNs perform better than standard game-theoretic
algorithms.
    Another, independent work is an application of a cognitive architecture from
psychology to games [23]. A cognitive architecture is a formalism intended to represent
general human reasoning [24] in order to compare different models. Today's most
popular cognitive architecture is ACT-R (Adaptive Control of Thought – Rational) [25].
Unlike NID, ACT-R has been used in a number of psychological studies. ACT-R
consists of two tiers – symbolic and sub-symbolic. On the symbolic tier, there are
chunks – facts and "If-Then" rules. On the sub-symbolic tier, there are exponential
functions, which determine activation levels of chunks, delays in reasoning and priorities
between rules. Based on ACT-R, an almost deterministic model for the mixed strategy
zero sum game "Rock Paper Scissors" (RPS) was designed. The only case in which the
designed model predicts random behavior is the beginning of a game sequence. The
model was successfully evaluated as the basis for an artificial player, which won against
human subjects.
    Whether deterministic or not, both works follow the same approach. First, they
construct a model based on theoretical considerations. Second, they fit the
parameters of this model to the experimental data. This makes the human behavior
explainable using the concepts from the model, but requires a priori knowledge to
construct the model.


3   Used Datasets

The first dataset chosen for our data mining approach has already been mentioned in
our previous work [19]. It is the game RPS played over a computer network. This game
is easy to explain, and most people do not train to play it at expert level; it is a
symmetric, zero sum, two-player game. The study was conducted in threads of 30
one-shot games. A player had 6 seconds to consider each shot. If he did not react, the
last or the default gesture was chosen. A thread thus lasted 30 * 6 sec = 3 min. This
game has one mixed strategy equilibrium (MSE), the equal probability distribution over
the three gestures. At the very least, one cannot lose in expectation by playing this MSE.
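     As an illustration of this property, the following minimal sketch (in Python, not
part of the original experiment) checks that the uniform mixed strategy earns expected
payoff zero against every pure counter-strategy, so a player following the MSE cannot
lose in expectation:

    import numpy as np

    # Row player's payoffs; rows/columns: 0 = Rock, 1 = Paper, 2 = Scissors.
    PAYOFF = np.array([[ 0, -1,  1],
                       [ 1,  0, -1],
                       [-1,  1,  0]])

    mse = np.array([1/3, 1/3, 1/3])   # equal probability over the three gestures
    print(mse @ PAYOFF)               # expected payoff vs. each pure gesture -> [0. 0. 0.]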
     Ten computer science undergraduates were recruited. They were on average 22.7
years old, and 7 of them were male. Each had to play the thread twice against another
test person. Between the two threads, they played other games. In this way, 300 one-shot
games, or 600 single human decisions, were gathered. Every person received €0.02 for a
won one-shot game and €0.01 for a draw. The persons who played against each other sat
in two separate rooms. One of the players used a cyber-glove and the other one a mouse
as input for gestures. The graphical user interface showed the following information: the
player's own last and current choice, the opponent's last choice, a timer and the money
gained so far. According to their own statements, the persons had no problems
understanding the game rules and choosing a gesture in time. All winners and 80% of
the losers stated that they had fun playing the game.
     The second dataset is the recorded responder behavior from the CT experiment [22].
It contains 371 single human decisions of 25 participating subjects. A positive decision
of the responder updates the monetary payoffs of both players, while a negative one does
not change anything. The payoff update for the responder varied between $1.45 and
$-1.35. In 160 cases, the responder's update was zero. The equilibrium strategy for the
responder is to accept only those proposals which increase his own payoff, regardless of
the proposer's payoff.
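     A minimal sketch of this equilibrium strategy (in Python, with hypothetical
variable names, not taken from the CT implementation) is:

    def equilibrium_response(responder_update: float) -> bool:
        """Accept a proposal iff it strictly increases the responder's own payoff."""
        return responder_update > 0

    # Examples over the range of payoff updates observed in the dataset:
    print(equilibrium_response(1.45))    # True  - accept
    print(equilibrium_response(0.00))    # False - reject, no own gain
    print(equilibrium_response(-1.35))   # False - reject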



4   Methods

Statistical analysis of the datasets from the previous section showed that the human
behavior observed in the experiments cannot be explained using game theory alone [1].
The shape of the equilibrium deviations confirms the one reported in the relevant
literature [9]. The goal is to find a model beyond game theory for predicting the average
deviations. In related work, the creation of a sophisticated model preceded the
evaluation on the data. In this work, the evaluation on the data precedes the creation of
a model. Of course, some people, such as trained or otherwise experienced individuals,
would not fit such a model. Prediction of specific individuals is not addressed in this
paper.
    Machine prediction without participation in the game playing of human subjects
should not be confused with the prediction algorithms of artificial players. On the
contrary, artificial players can manipulate the predictability of human subjects through
their own behavior. For instance, an artificial player which always throws "Paper" in
RPS would succeed at predicting a human opponent who always throws "Scissors" in
reaction. Conversely, if an artificial player maximizes its payoff based on opponent
modeling, it faces a change in human behavior and has to handle it. This case is more
complex than a spectator prediction model for a "humans-only" interaction. This paper
restricts itself to a prediction model without participation.
    Human behavior can be modeled as either deterministic or non-deterministic.
Although human subjects fail at generating the truly random sequences demanded by
an MSE, non-deterministic models are used especially in the case of artificial players in
order to handle uncertainties. "Specifically, people are poor at being random and poor at
learning optimal move probabilities because they are instead trying to detect and exploit
sequential dependencies. ... After all, even if people do not process game information in
the manner suggested by the game theory player model, it may still be the case that
across time and across individuals, human game playing can legitimately be viewed
as (pseudo) randomly emitting moves according to certain probabilities." [23] In the
addressed case of spectator prediction models, the non-deterministic view can be
regarded as too shallow, because deterministic models allow much more exact
predictions. Non-deterministic models are only useful in cases where a proper
clarification of the uncertainties is either impossible or too costly. As a reminder,
deterministic models need not take the shape of formal logic.
    The deterministic function HD(Game, History) → Decision denotes a human
decision, where History denotes the previous turns of the game. Game and History are
the input and Decision the output. Finding a hypothesis which matches the regularity
between input and output without a priori knowledge is a typical problem called
supervised learning [26]. There is already a large number of algorithms for supervised
learning. Each algorithm has its own hypothesis space (HS). For a Bayesian learner,
e.g., the hypothesis space is the set of all possible Bayesian networks. There are many
different types of hypothesis spaces – rules, decision trees, Bayesian models, functions
and so on. A concrete hypothesis HD^I is a relationship between input and output
described using the formal means of the corresponding hypothesis space.
    Which hypothesis space is the most appropriate one to contain valid hypotheses
about human behavior? This is a machine learning version of the question about a
formalism for human behavior. The most appropriate hypothesis space contains the
most correct hypothesis for every concrete example of human behavior. A correct
hypothesis not only performs well on the given data (training set), but also performs
well on new data (test set). Further, it can be assumed that the algorithms which choose
a hypothesis perform equally well for all hypothesis spaces. For instance, a decision tree
algorithm creates a tree and a neural network algorithm creates a neural network, and
the distance between the created tree and the best possible tree is the same as the
distance between the created neural network and the best possible neural network. This
assumption is a useful simplification of the problem for a preliminary demonstration.
Using it, one can consider the algorithm with the best performance on the given data as
the algorithm with
the most appropriate hypothesis space. The standard method for measuring the
performance of a machine learning algorithm, or classifier, is cross validation.
    As already mentioned, a machine learning algorithm has to find the hypothesis
HD^I which best matches the real human behavior function HD. Due to bounded
resources, human decision making depends mostly on a small part of the history. This
means that one needs a simplification function S(History) → Pattern. Using the
function S, the function HD(X, Y) is approximated by HD^II(X, S(Y)). The problem of
finding the most appropriate hypothesis space can be formulated as in Equation 1,
where the function match is implemented through a cross validation run.

                 argmax_HS ( max_{HD^II ∈ HS} match(HD(X, Y), HD^II(X, S(Y))) )        (1)
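    The following sketch shows one way Equation 1 can be operationalized: each
candidate classifier stands for a hypothesis space HS, and cross validation accuracy
plays the role of the match function. The paper itself uses WEKA classifiers;
scikit-learn is substituted here purely for illustration, and X, y are assumed to already
hold the encoded pattern tuples and decisions described in the next section.

    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    def best_hypothesis_space(X, y):
        """Return the name of the best-scoring classifier family and all scores."""
        candidates = {
            "svm": SVC(),                        # analogue of an SMO-trained SVM
            "tree": DecisionTreeClassifier(),
            "naive_bayes": GaussianNB(),
        }
        cv = KFold(n_splits=10, shuffle=False)   # 10 subsets, order of tuples preserved
        scores = {name: cross_val_score(clf, X, y, cv=cv).mean()
                  for name, clf in candidates.items()}
        return max(scores, key=scores.get), scores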




5   Empirical Results

The first dataset is transformed into a set of tuples, each consisting of the player's three
previous gestures, the opponent's three previous gestures and the player's next gesture.
Every tuple therefore has length 3 + 3 + 1 = 7. The simplification function is a window
over the three last turns. There are 3^7 = 2187 possible tuples for RPS. The decisions in
the first three turns of a thread are not considered; therefore, the set contains 540
tuples. The second dataset is likewise transformed into a set of 371 tuples, where every
tuple includes the proposer's payoff update, the responder's payoff update and the
responder's boolean reply.
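    A minimal sketch (in Python, with an assumed data layout) of this transformation
for the RPS dataset, i.e. the simplification function S as a window over the three last
turns of one thread:

    def rps_tuples(own, opp, window=3):
        """own, opp: aligned lists of gestures ('R', 'P', 'S') of one thread."""
        tuples = []
        for t in range(window, len(own)):
            pattern = own[t - window:t] + opp[t - window:t]   # S(History)
            tuples.append(tuple(pattern) + (own[t],))         # ... plus the next decision
        return tuples

    own = ['R', 'P', 'P', 'S', 'R']
    opp = ['S', 'R', 'P', 'R', 'P']
    print(rps_tuples(own, opp))
    # [('R', 'P', 'P', 'S', 'R', 'P', 'S'), ('P', 'P', 'S', 'R', 'P', 'R', 'R')]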
    Implementations of classifiers provided by WEKA [27] are used for the cross
validation on both sets of tuples. For the first dataset, there are currently 45 classifiers
available in the WEKA library which can handle multi-valued nominal classes. Gestures
in RPS are nominal, because there is no order between them. These classifiers belong to
different groups – rule-based, decision trees, function approximators, Bayesian learners,
instance-based and miscellaneous. A cross validation of all 45 classifiers is performed on
the RPS dataset. For the CT dataset, a cross validation of 35 appropriate classifiers is
performed. The number of subsets for the cross validation is 10. Both cross validation
runs are conducted while preserving the order of the tuples.
    Sequential minimal optimization (SMO) [28] showed 46.48% prediction correctness
on the RPS dataset, which is about 1% higher than the sophisticated non-deterministic
RPS model of Warglien [29]. Unfortunately, both decreasing and increasing the window
size in the function S for the RPS dataset diminishes the performance. Using the single
rule classifier (OneR), one can find out that 43.15% of the RPS dataset matches the
rule: "Choose paper after rock, scissors after paper and rock after scissors". A number of
classifiers including SMO achieve 95.42% correctness on the CT dataset in cross
validation. One of these algorithms is based on decision tables [30]. It finds that 95.15%
of the CT dataset conforms to the rule: "If an acceptance neither changes your payoff
nor improves the proposer's payoff, then refuse!" This result clearly outperforms the
72% reported for the non-deterministic approach of Pfeffer [22].
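    The two single rules quoted above can be checked directly against the tuple sets.
The sketch below (in Python, not the WEKA classifiers themselves) counts how often
each rule matches; the RPS rule is interpreted here as referring to the player's own last
gesture, and the CT rule is completed by the assumption that all other proposals are
accepted:

    NEXT_AFTER = {'R': 'P', 'P': 'S', 'S': 'R'}   # paper after rock, scissors after paper, ...

    def oner_accuracy(rps_tuples):
        """Fraction of 7-tuples whose next gesture (index 6) follows the own last gesture (index 2)."""
        return sum(t[6] == NEXT_AFTER[t[2]] for t in rps_tuples) / len(rps_tuples)

    def ct_rule_accuracy(ct_tuples):
        """Fraction of (proposer_update, responder_update, accepted) tuples matching the rule:
        refuse iff acceptance changes neither the own payoff nor improves the proposer's."""
        def predict(prop, resp):
            return not (resp == 0 and prop <= 0)
        return sum(predict(p, r) == a for p, r, a in ct_tuples) / len(ct_tuples)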
6    Conclusion
Strategic behavior consists of observable actions, whose origins we try to understand as
generally as possible. Summarizing the results of this work, it can be said that SMO
finds the most general deterministic hypotheses about regularities of human behavior in
the investigated scenarios. The correctness of such hypotheses exceeds the numbers
reported in related work. The hypothesis space of SMO is one of complex functions and
can be used for the design of a game behavior description formalism.


References
 1. Tagiew, R.: Strategische Interaktion realer Agenten: Ganzheitliche Konzeptualisierung
    und Softwarekomponenten einer interdisziplinären Forschungsinfrastruktur. PhD thesis, TU
    Bergakademie Freiberg (2011)
 2. Russell, S., Norvig, P.: Artificial Intelligence. Pearson Education (2003)
 3. Osborne, M.J., Rubinstein, A.: A course in game theory. MIT Press (1994)
 4. Morgenstern, O., von Neumann, J.: Theory of Games and Economic Behavior. Princeton
    University Press (1944)
 5. Nash, J.: Non-cooperative games. Annals of Mathematics 54 (1951) 286–295
 6. Li, D.H.: Kriegspiel: Chess Under Uncertainty. Premier (1994)
 7. Genesereth, M.R., Love, N., Pell, B.: General game playing: Overview of the AAAI compe-
    tition. AI Magazine 26(2) (2005) 62–72
 8. Pool, R.: Putting game theory to the test. Science 267 (1995) 1591–1593
 9. Camerer, C.F.: Behavioral Game Theory. Princeton University Press (2003)
10. Stevenson, L., Haberman, D.L.: Ten Theories of Human Nature. OUP USA (2004)
11. Bazerman, M.H., Malhotra, D.: Economics wins, psychology loses, and society pays. In
    De Cremer, D., Zeelenberg, M., Murnighan, J.K., eds.: Social Psychology and Economics.
    Lawrence Erlbaum Associates (2006) 263–280
12. Eysenck, M.W., Keane, M.T.: Cognitive Psychology: A Student’s Handbook. Psychology
    Press (2005)
13. Chamberlin, E.H.: An experimental imperfect market. Journal of Political Economy 56
    (1948) 95–108
14. Verbrugge, R., Mol, L.: Learning to apply theory of mind. Journal of Logic, Language and
    Information 17 (2008) 489–511
15. Wason, P.C.: Reasoning. In Foss, B.M., ed.: New horizons in psychology. Penguin Books
    (1966) 135–151
16. Oaksford, M., Chater, N.: The probabilistic approach to human reasoning. Trends in Cogni-
    tive Sciences 5 (2001) 349–357
17. Kahneman, D., Slovic, P., Tversky, A.: Judgment Under Uncertainty: Heuristics and Biases.
    Cambridge University Press (1982)
18. Kareev, Y.: Not that bad after all: Generation of random sequences. Journal of Experimental
    Psychology: Human Perception and Performance 18 (1992) 1189–1194
19. Tagiew, R.: Hypotheses about typical general human strategic behavior in a concrete case.
    In: AI*IA, Springer (2009) 476–485
20. Gal, Y., Pfeffer, A.: A language for modeling agents’ decision making processes in games.
    In: AAMAS, ACM Press (2003) 265–272
21. Kripke, S.: Semantical considerations on modal logic. Acta Philosophica Fennica 16 (1963)
    83–94
22. Gal, Y., Pfeffer, A.: Modeling reciprocal behavior in human bilateral negotiation. In: AAAI,
    AAAI Press (2007) 815–820
23. Rutledge-Taylor, M.F., West, R.L.: Cognitive modeling versus game theory: Why cognition
    matters. In: ICCM. (2004) 255–260
24. Gluck, K.A., Pew, R.W., Young, M.J.: Background, structure, and preview of the model
    comparison. In Gluck, K.A., Pew, R.W., eds.: Modeling Human Behavior with Integrated
    Cognitive Architectures. Lawrence Erlbaum Associates (2005) 3–12
25. Taatgen, N., Lebiere, C., Anderson, J.: Modeling paradigms in ACT-R. 29–52
26. Mitchell, T.M.: Machine Learning. McGraw-Hill Higher Education (1997)
27. Witten, I.H., Frank, E.: Data Mining. Morgan Kaufmann (2005)
28. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization.
    In: Advances in Kernel Methods - Support Vector Learning, MIT Press (1999) 185–208
29. Marchiori, D., Warglien, M.: Predicting human interactive learning by regret-driven neural
    networks. Science 319 (2008) 1111–1113
30. Kohavi, R.: The power of decision tables. In: ECML, Springer (1995) 174–189