Towards smart robots: rock-paper-scissors gaming versus human players

Gabriele Pozzato, Stefano Michieletto, and Emanuele Menegatti

Intelligent Autonomous Systems Lab (IAS-Lab)
Department of Information Engineering (DEI)
University of Padova
{pozzatog,stefano.michieletto,emg}@dei.unipd.it
http://robotics.dei.unipd.it

Abstract. In this project a human-robot interaction system was developed to let people naturally play rock-paper-scissors games against a smart robotic opponent. The robot does not make random choices: the system analyzes the previous rounds and tries to forecast the next move. A machine learning algorithm based on the Gaussian Mixture Model (GMM) allows us to increase the percentage of robot victories. This is an important aspect of natural human-robot interaction: people do not like playing against "stupid" machines, while they are stimulated by confronting a skilled opponent.

1 Introduction

This research aims to develop an application that lets users play a series of rock-paper-scissors games against a smart robot. Users can interact naturally with the robot without any remote controls or keyboards: the application is designed to be as user-friendly as possible. Humans communicate with the robot via an RGB-D sensor using a default set of positions and gestures. The robot used is an NXT of the LEGO Mindstorms series (Fig. 1).

The vision system provided by the sensor and the control of the NXT are coordinated using ROS (Robot Operating System). By analyzing the depth images given by the sensor, we are able to understand the positions and gestures of the human player. The application works in three steps. First, the system recognizes the user's pose, with which we associated the number of games to play; each game is composed of three rounds. Once a round is started, a hand gesture recognition algorithm developed by LTTM (Multimedia Technology and Telecommunications Lab) recognizes the move played by the human. At the same time, the system chooses the robot gesture using an Artificial Intelligence (AI) algorithm we developed. The algorithm is able to infer human moves starting from its previous game experience.

The remainder of the paper is organized as follows: in Section 2 the robotics interface is presented. In Section 3 the learning system used to infer human strategies is described. Section 4 presents the tests performed on the AI algorithm and the related results. Finally, Section 5 contains conclusions.

Fig. 1. Robot gestures: rock (a), paper (b), scissors (c)

2 The robotics interface

The robot used in this work is an NXT of the LEGO Mindstorms series. The NXT is a low-cost programmable robotics kit developed by LEGO. The core of the kit is an "intelligent" brick (Fig. 2a) housing the CPU, a flash memory, and several I/O ports to control motors and sensors. The aim of the kit is to provide a platform that can be easily used even by inexperienced users. In our research, the central unit of the LEGO platform (the NXT brick) controls two motors that reproduce the three human gestures. The three configurations are shown in Figure 1.

Fig. 2. The hardware used in this work: the NXT brick (a) and the Microsoft Kinect sensor (b)

To recognize the gestures played by human users, an RGB-D sensor is used. Specifically, the vision system is provided by the low-cost Microsoft Kinect sensor (Fig. 2b).
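To make the actuation concrete, the following is a minimal sketch of how the two motors could be mapped to the three gestures. The angle values and the send_motor_command helper are hypothetical, introduced only for illustration; the paper does not specify the actual motor targets or driver API.

```python
# Hypothetical sketch: mapping each gesture to target angles (degrees)
# for the two NXT motors. The angles and the driver helper below are
# assumptions for illustration, not the values used in the paper.

GESTURE_ANGLES = {
    "rock":     (0, 0),      # both "fingers" retracted
    "paper":    (90, 90),    # both extended
    "scissors": (90, 0),     # one extended, one retracted
}

def send_motor_command(port, angle_deg):
    """Placeholder for the actual motor driver call (e.g. a ROS node)."""
    print(f"motor {port} -> {angle_deg} deg")

def play_gesture(gesture):
    """Drive both motors to the configuration associated with a gesture."""
    a1, a2 = GESTURE_ANGLES[gesture]
    send_motor_command("A", a1)
    send_motor_command("B", a2)

if __name__ == "__main__":
    play_gesture("scissors")
```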
Communication and synchronization between the sensor and the robot are achieved using ROS (Robot Operating System). This open-source framework allows us to control the hardware via the C++ and Python programming languages.

3 The learning system

Since humans do not play random moves, it is fundamental that the robot also uses a non-random strategy in order to increase its probability of winning a match. The AI algorithm performing the robot choice is based on the Gaussian Mixture Model. GMM-based approaches have already been developed in our lab to solve complex problems [1][2]. In this case we tested the flexibility of the GMM on a problem that sounds simple if we look at the possible options, but is hard to tackle by identifying a clear strategy, even for human beings.

3.1 Gaussian Mixture Model

A GMM [3] θ is a parametric probability density function represented as a weighted sum of K Gaussian components:

\[ p(x \mid \theta) = \sum_{k=1}^{K} \omega_k \, g(x \mid \mu_k, \Sigma_k), \qquad (1) \]

where x = {x_1, ..., x_d} is a vector of size d and ω_k are the weights of the components g(x | µ_k, Σ_k), k = 1, ..., K. Each component is a d-variate Gaussian function:

\[ g(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} \, |\Sigma_k|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right) \qquad (2) \]

with mean vector µ_k and covariance matrix Σ_k. The weights ω_k satisfy the normalization property \( \sum_{k=1}^{K} \omega_k = 1 \). The GMM is parametrized by the mean vectors, covariance matrices, and mixture weights of all K components. These parameters are represented by the notation:

\[ \theta = \{\theta_k\}_{k=1}^{K} = \{\omega_k, \mu_k, \Sigma_k\}_{k=1}^{K} \qquad (3) \]

The training stage and the maximum likelihood (ML) estimation of the parameters are extremely important. Using the Expectation-Maximization algorithm we obtain the parameter values that maximize the likelihood. Furthermore, the optimal K is estimated using the Bayesian Information Criterion (BIC) [4]. BIC selects the best model in a class of models with different numbers of parameters. In this way we determine the model with the smallest number of parameters that still guarantees a good fit of the data. For further details see [5] and [6].

3.2 Rock-Paper-Scissors

A dataset X = {x_1, ..., x_N}, with N the number of instances, is used to train the GMM. The GMM features x_j, j = 1, ..., N, have dimension d = 8 and are defined as

\[ x_j = \{(r_{j-3}, h_{j-3}), (r_{j-2}, h_{j-2}), (r_{j-1}, h_{j-1}), (r_j, h_j)\} \qquad (4) \]

where (r_j, h_j) is the j-th round played, with r_j the robot move and h_j the human move, and r_j, h_j ∈ {rock, paper, scissors}. Each instance in the dataset therefore consists of four rounds: the j-th one and the three previously played. The idea is to use the history of the three previous rounds to determine the gesture r_j the robot should play in the current round to counter the gesture h_j played by the human. In the testing phase we complete the sequence of the three previous rounds in Fig. 3(a) with each of the three "favorable" pairs in Fig. 3(b), obtaining three prior probabilities related to these configurations. The pairs are called "favorable" because they correspond to a robot victory.

Fig. 3. (a) The three previous rounds used to determine r_4. (b) The three pairs used to complete the sequence of three previous rounds. From these sequences we obtain the probabilities of rock, paper and scissors.
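As an illustration of the training procedure, the following is a minimal sketch (not the authors' code) of how the dataset of Eq. (4) could be built and a GMM fitted with BIC-based model selection. The integer move encoding and the use of scikit-learn's GaussianMixture are assumptions made for the sketch.

```python
# Minimal sketch of dataset construction (Eq. 4) and GMM training with
# BIC model selection. Move encoding and library choice (scikit-learn)
# are assumptions; the paper does not specify its implementation.
import numpy as np
from sklearn.mixture import GaussianMixture

MOVES = {"rock": 0, "paper": 1, "scissors": 2}

def build_dataset(rounds):
    """rounds: list of (robot_move, human_move) name pairs.
    Returns an (N, 8) array of sliding windows of four rounds."""
    enc = [(MOVES[r], MOVES[h]) for r, h in rounds]
    X = [np.ravel(enc[j - 3:j + 1]) for j in range(3, len(enc))]
    return np.asarray(X, dtype=float)

def fit_gmm_bic(X, k_max=50):
    """Fit GMMs with 1..k_max components and keep the one with lowest BIC."""
    best, best_bic = None, np.inf
    for k in range(1, min(k_max, len(X)) + 1):
        gm = GaussianMixture(n_components=k, covariance_type="full",
                             random_state=0).fit(X)
        bic = gm.bic(X)
        if bic < best_bic:
            best, best_bic = gm, bic
    return best
```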
Thus, after obtaining the three completed sequences, called x_rock, x_paper and x_scissors according to the first symbol of the completing pair, we obtain from the GMM three priors for rock (p_rock), paper (p_paper) and scissors (p_scissors):

\[ p_i = P(x_i) = \sum_{m=1}^{K} P(\theta_m \mid x_i), \qquad (5) \]

where θ_m denotes the m-th weighted Gaussian component and i = rock, paper, scissors. Using these probabilities we choose the robot gesture by generating a random number n in the interval [0, 1], which is partitioned into three disjoint subintervals. The final gesture is:

– Rock, if n ∈ [0, p_rock),
– Paper, if n ∈ [p_rock, p_rock + p_paper),
– Scissors, if n ∈ [p_rock + p_paper, 1].

After the round is played and both the robot and human gestures are known, we can complete a new sequence x_j and use it to update the training set. The GMM is updated at every iteration t by reloading the entire training set, while BIC is executed every 2t iterations, limiting the value of K to 50.
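The selection step can be sketched as follows. This is an illustrative approximation (not the authors' code): instead of the responsibility sum of Eq. (5), it derives the three priors from normalized GMM likelihoods of the completed sequences, and then samples the gesture from the resulting subintervals of [0, 1].

```python
# Illustrative sketch of the move selection step. Deriving the priors
# from normalized likelihoods is a simplification of Eq. (5); the
# history encoding follows the sketch in Section 3.2.
import numpy as np

MOVES = {"rock": 0, "paper": 1, "scissors": 2}
# "Favorable" completing pairs (robot, human), i.e. robot victories:
# rock beats scissors, paper beats rock, scissors beat paper.
FAVORABLE = {"rock": (0, 2), "paper": (1, 0), "scissors": (2, 1)}

def choose_gesture(gmm, last_three_rounds, rng=None):
    """last_three_rounds: three (robot_move, human_move) name pairs."""
    if rng is None:
        rng = np.random.default_rng()
    history = [MOVES[m] for pair in last_three_rounds for m in pair]
    gestures = ["rock", "paper", "scissors"]
    # Complete the 6-value history with each favorable pair (8 features).
    candidates = np.array([history + list(FAVORABLE[g]) for g in gestures],
                          dtype=float)
    likelihoods = np.exp(gmm.score_samples(candidates))
    priors = likelihoods / likelihoods.sum()
    # Partition [0, 1] into [0, p_rock), [p_rock, p_rock + p_paper), ...
    n = rng.random()
    return gestures[int(np.searchsorted(np.cumsum(priors), n, side="right"))]
```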
Fig. 4. The histogram shows the percentages of robot victories for training sets of different sizes. The red plane shows the percentage of victories obtained with the equiprobability policy in the first test.

4 Experimental results

We tested our approach by comparing it with the standard random choice strategy. Tests were performed on 200 rounds and the results were averaged over 20 iterations.

Random strategy. Rock, paper and scissors are equiprobable and the robot gestures are chosen randomly. In this case we obtained a percentage of robot victories of 32.55% (red plane in Fig. 4).

Our GMM-based strategy. The strategy adopted is the one previously described in this paper. Probabilities were updated over time and the test was performed on training datasets of different sizes: six initial datasets were considered, from 150 to 650 elements (Fig. 4). The datasets of 150 and 250 elements refer to a single user, while for those of 350, 450, 550, and 650 elements we asked several users to try out our system. Every user played in more than one session, each consisting of ten to twenty consecutive games.

In the single-user cases it is easy to see a quick increase in the percentage of victories even with relatively small datasets: with a dataset of 250 instances, the robot wins with a probability of 35.4%. In the multi-user version, more data were needed for the GMM to settle. When the model starts generalizing and managing a higher number of users, the percentage grows to a maximum of 36.65% for the dataset of 650 instances.

Our choice of a GMM-based strategy to predict the next human move, rather than a random strategy, is also supported by the humans' percentages of use of the three gestures rock, paper and scissors (Fig. 5). We can notice that users do not play randomly, but adopt a precise strategy.

Fig. 5. The histogram shows the humans' percentages of use of the three gestures.

5 Conclusions

In conclusion, we have implemented an AI algorithm that allows human users to play naturally against an intelligent robot. Considering the maximum training dataset size (650 instances), the chosen approach led to a robot winning percentage of 36.65%, compared with the 32.55% obtained with a random choice strategy.

In this work we focused our attention on the victory percentages of the robot, without taking care of how much human users are interested and involved in the game. This aspect could be tuned to fit evidence and practice in game design, where the aim is to match the ability of the opponent in order to keep her/him involved in the game. In fact, we record both human and robot moves, so we can easily choose the policy by using the history of previously played matches.

In the future we want to test the algorithm on larger datasets for a better estimation of the winning percentage, and to implement an algorithm able to generate a specific model for every individual user. Finally, an interesting improvement could be a comparison between the GMM-trained strategy and a strategy that chooses the gesture to play according to the human percentages of use reported in Fig. 5.

Acknowledgment

We would like to thank Fabio Dominio and Dr. Pietro Zanuttigh of the University of Padova, members of the LTTM (Multimedia Technology and Telecommunications Lab) directed by Prof. G.M. Cortelazzo, for the hand gesture recognition library for the RGB-D sensor.

References

1. Michieletto, S., Chessa, N., Menegatti, E.: Learning how to approach industrial robot tasks from natural demonstrations. In: Proceedings of the IEEE Workshop on Advanced Robotics and its Social Impacts (ARSO 2013), IEEE (2013)
2. Michieletto, S., Rizzi, A., Menegatti, E.: Robot learning by observing human activities and modeling failures. In: IROS Workshops: Cognitive Robotics Systems (CRS 2013), IEEE (Nov 2013)
3. Reynolds, D.: Gaussian mixture models. Encyclopedia of Biometric Recognition (2008)
4. Konishi, S., Kitagawa, G.: Bayesian information criteria. In: Information Criteria and Statistical Modeling. Springer Series in Statistics. Springer New York (2008) 211–237
5. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA (2001)
6. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. 2nd edn. Pearson Education (2003)