A Game of Lines:
               Developing Game Mechanics for
                     Text Classification

          Giorgio Maria Di Nunzio¹, Maria Maistro¹, and Daniel Zilio²
         ¹ Dept. of Information Engineering – University of Padua, Italy
        giorgiomaria.dinunzio@unipd.it, maria.maistro@dei.unipd.it
           ² Dept. of Cultural Heritage – University of Padua, Italy
                          daniel.zilio@unipd.it


       Abstract. In this paper, we describe a set of experiments that turn
       the machine learning classification task into a game, through
       gamification techniques, and let non-expert users perform text
       classification without even being aware of the underlying problem. The
       application is implemented in R using the Shiny package for interactive
       graphics. We present the outcome of three different experiments: a pilot
       experiment with PhD and post-doc students, and two experiments carried
       out with primary and secondary school students. The results show that
       the human-aided classifier performs similarly to, and sometimes even
       better than, state-of-the-art classifiers.


1     Introduction
The creation of a labelled dataset for supervised learning is slow and expensive.
In recent years, mixed approaches that use crowd-sourcing and interactive
machine learning [1] have shown that it is possible to create annotated datasets
at affordable costs [12]. One major challenge in motivating people to participate
in these labelling tasks is to design a system that fosters positive motivation
towards the work and fits the type of activity. In this context, an approach
named ‘gamification’ has become popular. Gamification is defined as “the use of
game design elements in non-game contexts” [4], i.e. typical game elements, such
as rankings, leaderboards, points, and badges, are used for purposes different
from their normal expected employment.
    Nowadays, gamification has spread through a wide range of disciplines and
its applications are implemented in different areas. For instance, an
increasingly common feature of online communities and social media sites is a
mechanism for rewarding user achievements based on a system of badges and
points. Such mechanisms have been employed in many domains, including
educational sites like Khan Academy³ and tourist review sites like Tripadvisor⁴.
At the most basic level, these game elements serve as a summary of a user’s key
accomplishments; however, experience with these sites also shows that users will
put in non-trivial amounts of work to achieve particular badges, and as such,
badges can act as powerful incentives [2].

³ https://www.khanacademy.org/
⁴ https://www.tripadvisor.com/

    The use of gamification in academic research has been introduced only
recently, and its potential is still to be explored and validated. Information
Retrieval (IR) has recently dealt with gamification, as witnessed by the GamifIR
workshops held in 2014, 2015 and 2016⁵. In [10], the authors describe the
fundamental elements and mechanics of a game and provide an overview of possible
applications of gamification to the IR process. In [13], approaches to properly
gamify Web search are presented, i.e. to make searching for information and
scanning the results a more enjoyable activity. Other game-based approaches
applied to different aspects of IR have been proposed. For example, in [11] the
authors describe a game that turns document tagging into the activity of taking
care of a garden, with the aim of managing private archives.
    In this paper, we present recent studies on the gamification of text
classification and the development of a Web application written in R with the
Shiny package [3]. This application, initially designed to help understand
probabilistic models, has been redesigned as a game to train a text classifier
with the aid of non-experts, especially children from primary and secondary
schools, during the European Researchers’ Night in September 2016 at the
University of Padua⁶. We tested this application with a two-fold goal in mind:
i) to understand how the gamification of a classification problem can be used to
estimate the ‘price’ of labelling a small number of objects for building a
reasonably accurate classifier, and ii) to analyze the classification
performance given small sample sizes and little training.


2     The Classification Game

In this section, we present the refinements of an approach for visualizing
probabilistic text classifiers that was transformed into a game. The application
was implemented with the Shiny package in R, which allows building interactive
graphics [3]. Its two-dimensional representation allows non-experts to interact
visually with the algorithm and, at the same time, allows new training labels to
be gathered. We first describe the mathematical idea that supports the game, then
we describe the rules of the game and how players can interact with the
algorithm.


2.1    Math Background

⁵ http://gamifir.com
⁶ http://www.venetonight.it/

              Fig. 1. Layout of the web application designed for experts

The game is based on the two-dimensional representation of probabilities, also
known as Likelihood Spaces [14], which is a very intuitive way of presenting the
problem of classification on a two-dimensional space (full mathematical details
can be found in [5, 6, 7, 8]). Given two classes c1 and c2, an object o is
assigned to category c1 if the following inequality holds:

$$\underbrace{P(o|c_2)}_{y} < m \, \underbrace{P(o|c_1)}_{x} + q \tag{1}$$


where P(o|c1) and P(o|c2) are the likelihoods of the object o given the two
categories, while m and q are two parameters that can be set either
automatically, for example by optimizing a measure of classification accuracy,
or semi-automatically by asking a user to suggest the initial conditions based on
a visual inspection of the problem. In fact, if we interpret the two likelihoods
as the coordinates x and y of a two-dimensional space, the problem of
classification can be studied on a two-dimensional plot where: i) the decision of
the classifier is represented by the line y = mx + q, which splits the plane into
two parts, and ii) the points that fall ‘below’ this line belong to class c1.
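
The decision rule of Equation (1) can be illustrated with a minimal Python
sketch. This is not the code of the R/Shiny application: the function name
`classify`, the default values of m and q, and the sample likelihood pairs are
all hypothetical and chosen only for illustration.

```python
def classify(x, y, m=1.0, q=0.0):
    """Return "c1" when the point (x, y) lies strictly below the line y = m*x + q.

    x stands for P(o|c1) and y for P(o|c2), as in Equation (1).
    The defaults m=1, q=0 are an assumption made for this example.
    """
    return "c1" if y < m * x + q else "c2"


# Hypothetical likelihood pairs (P(o|c1), P(o|c2)) for three objects.
points = [(0.8, 0.1), (0.2, 0.6), (0.5, 0.5)]
labels = [classify(x, y) for x, y in points]
```

A point that lies exactly on the line is assigned to c2 here, because the
inequality in Equation (1) is strict; raising q moves the line up and lets more
points fall into c1.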


2.2    Game Mechanics

The initial version of the interface⁷, shown in Figure 1, was designed to be used
by experts to understand how to search for the optimal parameters.

⁷ Available at https://gmdn.shinyapps.io/shinyK/

    In the “gamified” version of this problem, players have to find the best
combination of m and q with a fixed amount of resources available to train and
validate the algorithm. The game is organized in N levels, each corresponding to
the binary classification task for one of the top N classes of the Reuters-21578
dataset⁸; levels are presented from the easiest to the most difficult. A level is
difficult when it is hard to linearly separate the positive class c1 and the
negative class c2. An object can be used during the game either as a training
example or as a validation sample, but not both. The goal of each level (and of
the game in general) is to find the best classifier, i.e. the line that best
separates the two categories c1 and c2 and therefore maximizes the F1 score, with
the least amount of resources.
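
To make the objective of a level concrete, the following Python sketch computes
the F1 score of the positive class c1 for a given line and then tries a small
grid of candidate lines. In the game the player adjusts the line by hand; the
exhaustive search below is only an illustrative stand-in for that interaction,
and the names `f1_score` and `best_line`, the toy points, and the candidate
grids are assumptions, not part of the original application.

```python
def f1_score(points, labels, m, q):
    """F1 of the positive class c1 for the line y = m*x + q.

    points: list of (x, y) pairs; labels: list of booleans, True for c1.
    """
    tp = fp = fn = 0
    for (x, y), is_c1 in zip(points, labels):
        predicted_c1 = y < m * x + q
        if predicted_c1 and is_c1:
            tp += 1
        elif predicted_c1 and not is_c1:
            fp += 1
        elif not predicted_c1 and is_c1:
            fn += 1
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_line(points, labels, slopes, intercepts):
    """Try every (m, q) pair and keep the one with the highest F1."""
    return max(
        ((m, q) for m in slopes for q in intercepts),
        key=lambda mq: f1_score(points, labels, mq[0], mq[1]),
    )


# Toy validation set: two objects of class c1 and two of class c2.
points = [(0.9, 0.1), (0.7, 0.3), (0.2, 0.8), (0.4, 0.6)]
labels = [True, True, False, False]
m, q = best_line(points, labels, slopes=[0.0, 1.0], intercepts=[0.0, 1.0])
```

On this toy set the line y = x separates the two classes perfectly, while a
line that puts every point below it still finds all positives but pays for the
false positives in precision.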
    Resources can be used to increase the number of objects in the training
and/or the validation set. At any point in the game, the player can spend some
resources to buy additional training or validation objects. Each purchase adds
an additional 5% of the collection to either the training set (making the
classifier more precise) or the validation set (showing more objects on the
screen). Once the player has found what he/she considers the best classifier,
he/she can proceed with the test: the classifier is evaluated on the test set and
the F1 score is computed. At this point, the level is completed and the player
must either move on to the next level or end the game.
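
The resource mechanic described above can be sketched in Python as follows.
This is a toy model under stated assumptions, not the original implementation:
the class name `Level`, the cost of one resource per purchase, and the random
order in which objects are handed out are all hypothetical. Only the 5% batch
size and the rule that an object is never both a training and a validation
example come from the text.

```python
import random


class Level:
    """Toy model of one level: a pool of objects and a budget of resources."""

    def __init__(self, n_objects, budget, step=0.05, seed=0):
        self.pool = list(range(n_objects))            # objects not yet bought
        random.Random(seed).shuffle(self.pool)        # assumed: random hand-out order
        self.batch = max(1, round(step * n_objects))  # 5% of the collection
        self.budget = budget                          # assumed: one purchase costs one resource
        self.training, self.validation = [], []

    def buy(self, target):
        """Spend one resource to move a 5% batch into 'training' or 'validation'."""
        if target not in ("training", "validation"):
            raise ValueError("target must be 'training' or 'validation'")
        if self.budget == 0 or not self.pool:
            return False
        batch, self.pool = self.pool[:self.batch], self.pool[self.batch:]
        getattr(self, target).extend(batch)
        self.budget -= 1
        return True


level = Level(n_objects=100, budget=3)
bought = [level.buy("training"), level.buy("validation"),
          level.buy("training"), level.buy("validation")]
```

Because every batch is removed from the shared pool before it is assigned, the
training and validation sets stay disjoint by construction, and the fourth
purchase fails once the budget is exhausted.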


3     Experiments

In the previous section, we presented how the players can interact with the
classification game by “investing” a limited amount of resources to buy training
and validation data and, consequently, to find a better combination of the two
parameters m and q.
    In this section, we present the results of three different experiments on
the gamification of text classification, which involved different users and
different interfaces.


3.1    Pilot Experiment: PhD and Post-doc Students

A second version of the interface was designed for PhD and post-doc students⁹,
and a pilot study was carried out to test this preliminary version of the game
and to collect opinions and suggestions regarding possible improvements. In this
first experiment, we were positively surprised by two results (a complete
description of the results can be found in [9]). First, on average, the players
could reach the ‘goal’ (i.e., the score that a state-of-the-art classification
algorithm would reach with the whole labelled dataset) more easily than
expected, by using only 25% of the available data. Second, a support vector
machine (SVM) trained on the same reduced dataset (around 25% of the annotated
dataset) performed as well as the same SVM trained on the whole dataset. These
results are very promising, since the gamification of text classification may
give a reliable indication of when to stop the labelling process and use the
annotated dataset to train a state-of-the-art algorithm with good classification
performance. This second aspect will require a deeper analysis and further
experiments to confirm the statistical significance of these results.

            Fig. 2. Layout of the web application designed for students

⁸ http://www.daviddlewis.com/resources/testcollections/reuters21578/
⁹ Available at https://gmdn.shinyapps.io/Classification/


3.2   Second Experiment: Primary and Secondary School Students

For the European Researchers’ Night at the University of Padua in September
2016, we designed a new interface to make the game easier for the primary and
secondary school children who played it. The interface, shown in Figure 2, lets
users play only three levels (each level corresponds to a different category)
and gives feedback about the current performance whenever the line is adjusted.
In this experiment, we also added some incentives: a public leaderboard that was
displayed and regularly updated, and chocolate candies for the top scorer. A
total of 28 players used the interface.
    Considering that these users did not know anything about machine learning
or text classification, the results in terms of classification performance were
even more surprising than in the first experiment. In Table 1, we compare the
average classification performance of the players (column Manual) with the
classification performance of a Naïve Bayes classifier (NB) and a Support Vector
Machine (SVM), as well as with the ‘goal’ performance. Note that the results
obtained by the participants are very close to those obtained with NB and, in
the case of the second class, the users achieve better performance than NB. On
average, the classifier with the human contribution performs better than NB and
worse than SVM.

Table 1. Manual vs NB and SVM classifiers. Classification performance during the
European Researchers’ Night. The averaged F1 measure of 28 participants is
reported for each class.

                          Classes   Goal    Manual  NB      SVM
                          1         0.950   0.931   0.943   0.940
                          2         0.850   0.784   0.768   0.840
                          3         0.750   0.715   0.715   0.730
                          average   0.850   0.810   0.809   0.837

Table 2. Manual vs NB and SVM classifiers. Classification performance during the
week at the Banca d’Italia. The averaged F1 measure of 27 participants is
reported for each class.

                          Classes   Goal    Manual  NB      SVM
                          1         0.950   0.940   0.942   0.939
                          2         0.850   0.807   0.786   0.841
                          3         0.750   0.714   0.710   0.723
                          average   0.850   0.830   0.813   0.834
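
The ‘average’ row of Table 1 is consistent with an unweighted mean of the three
per-class F1 scores. The short Python sketch below recomputes it from the
per-class values copied from Table 1; the variable names are hypothetical and
the rounding to three decimals is an assumption made to match the table.

```python
# Per-class F1 scores copied from Table 1 (classes 1, 2, 3).
table1 = {
    "Goal":   [0.950, 0.850, 0.750],
    "Manual": [0.931, 0.784, 0.715],
    "NB":     [0.943, 0.768, 0.715],
    "SVM":    [0.940, 0.840, 0.730],
}

# Unweighted mean over the three classes, rounded to three decimals.
averages = {name: round(sum(scores) / len(scores), 3)
            for name, scores in table1.items()}
```

The ordering Manual > NB and Manual < SVM stated in the text can be read
directly from these averages.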


3.3   Third Experiment: General Public

In the first week of April 2017, during an event held at one of the branches of
Banca d’Italia in Padua for the brand new 50 euro note, we presented a third
version of the game, which was available to the public for a whole week. For this
study, we decided to make the layout cleaner (see Figure 3) and to add keyboard
controls to change the decision line instead of using sliders. We kept the same
game incentives, chocolate candies and a leaderboard, and we added an
instructional presentation of the problem to help the players understand what
‘machine learning’ and ‘training set’ mean.
    A total of 27 participants played the game, and their results are reported
in Table 2. Even in this case, the human-aided classifier achieves good results:
the interaction of users with the algorithm through the gamified approach
reached a performance close to that of SVM and often better than that of NB. In
this case, the results are much closer to SVM than to NB, even though the amount
of resources used was comparable to that of the second experiment: players tend
to consider the performance of the classifier satisfactory when 30% of the
resources have been used.
    Finally, note that, since the algorithms were trained on different amounts
of data during the game, the scores in Table 1 and Table 2 are not directly
comparable. This explains the different results reported for NB and SVM in
Table 1 and Table 2.
         Fig. 3. Layout of the web application designed for the general public


4      Final Remarks and Future Work

In this paper, we presented ongoing work on gamification for text classification
that involves non-expert users in the task of labelling data and produces an
estimate of the monetary cost of creating the training dataset. Considering how
abstract the game is (a line and some dots), the first three preliminary studies
were successful in terms of participation and initial results. The goal of these
studies is to obtain feedback and collect enough data to study how to design the
game in order to make it open to the general public; in addition, we want to
understand whether a ‘serious’ game can be implemented in order to gather
labelled data for machine learning.
    Future work aims at extending the proposed game and transforming it into an
application for different mobile devices. Therefore, further effort is needed to
design the interface of the mobile application with integrated environments like
Unity¹⁰. Moreover, considering that the players are not experts in
classification, the rules of the game should be presented clearly, and some
concepts, such as the validation phase, need to be explained in a simpler way.
    Finally, we aim at investigating a different game mode with two players
collaborating to reach a common goal. For instance, the users could share the
controls, so that they need to cooperate to find the best solution;
alternatively, a different task could be assigned to each user, with one user
controlling the classification line while the other assesses documents to
provide more training examples.

¹⁰ https://unity3d.com

References
 1. Saleema Amershi, Maya Cakmak, W. Bradley Knox, and Todd Kulesza. Power to
    the People: The Role of Humans in Interactive Machine Learning. AI Magazine,
    35(4):105–120, 2014.
 2. Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec.
    Steering User Behavior with Badges. In Proceedings of the 22nd International
    Conference on World Wide Web, WWW ’13, pages 95–106, New York, NY, USA,
    2013. ACM.
 3. Winston Chang. Shiny: Web Application Framework for R, 2015. R package version
    0.11.
 4. Sebastian Deterding, Dan Dixon, Rilla Khaled, and Lennart Nacke. From Game
    Design Elements to Gamefulness: Defining “Gamification”. In Proc. of the 15th
    International Academic MindTrek Conference: Envisioning Future Media
    Environments, MindTrek ’11, pages 9–15, New York, NY, USA, 2011. ACM.
 5. Giorgio Maria Di Nunzio. Using Scatterplots to Understand and Improve
    Probabilistic Models for Text Categorization and Retrieval. Int. J. Approx.
    Reasoning, 50(7):945–956, 2009.
 6. Giorgio Maria Di Nunzio. A New Decision to Take for Cost-Sensitive Naïve Bayes
    Classifiers. Information Processing & Management, 50(5):653–674, 2014.
 7. Giorgio Maria Di Nunzio. Interactive Machine Learning with R. In Francesco
    Mola and Claudio Conversano, editors, CLADAG 2015 10th Scientific Meeting of
    the Classification and Data Analysis Group of the Italian Statistical Society.
    Book of Abstracts, pages 333–338. 2015.
 8. Giorgio Maria Di Nunzio. Interactive Text Categorisation: The Geometry of
    Likelihood Spaces, pages 13–34. Springer International Publishing, Cham, 2017.
 9. Giorgio Maria Di Nunzio, Maria Maistro, and Daniel Zilio. Gamification for
    Machine Learning: The Classification Game. In Proceedings of the Third
    International Workshop on Gamification for Information Retrieval co-located
    with 39th International ACM SIGIR Conference on Research and Development in
    Information Retrieval (SIGIR 2016), Pisa, Italy, July 21, 2016, pages 45–52,
    2016.
10. Luca Galli, Piero Fraternali, and Alessandro Bozzon. On the Application of Game
    Mechanics in Information Retrieval. In Proc. of the 1st Int. Workshop on
    Gamification for Information Retrieval, GamifIR ’14, pages 7–11, New York, NY,
    USA, 2014. ACM.
11. Carlos Maltzahn, Arnav Jhala, Michael Mateas, and Jim Whitehead. Gamification
    of Private Digital Data Archive Management. In Proceedings of the First
    International Workshop on Gamification for Information Retrieval, GamifIR ’14,
    pages 33–37, New York, NY, USA, 2014. ACM.
12. B. Morschheuser, J. Hamari, and J. Koivisto. Gamification in Crowdsourcing: A
    Review. In 2016 49th Hawaii International Conference on System Sciences
    (HICSS), pages 4375–4384, Jan 2016.
13. Mark Shovman. The Game of Search: What is the Fun in That? In Proc. of the
    1st Int. Workshop on Gamification for Information Retrieval, GamifIR ’14, pages
    46–48, New York, NY, USA, 2014. ACM.
14. Rita Singh and Bhiksha Raj. Classification in Likelihood Spaces. Technometrics,
    46(3):318–329, 2004.