<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IR Game: How well do you know information retrieval papers?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Rybak</string-name>
          <email>jan.rybak@idi.ntnu.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Krisztian Balog</string-name>
          <email>krisztian.balog@uis.no</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kjetil Nørva˚g</string-name>
          <email>kjetil.norvag@idi.ntnu.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>In: F. Hopfgartner, G. Kazai, U. Kruschwitz, and M. Meder (eds.):</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Norwegian University, of Science and Technology</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Proceedings of the GamifIR'15 Workshop</institution>
          ,
          <addr-line>Vienna, Austria, 29-March2015, published at http://ceur-ws.org</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Stavanger</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we demonstrate how a gamification approach increases the attractiveness of an assessment exercise in the context of expertise profiling. We present an online game, in two difficulty modes, where users have to guess the authors of publications. We analyze the collected data along different dimensions and identify four types of gaming personalities based on behavioral patterns. Further, we examine the relation between popularity and recognizability for both papers and authors. Finally, we provide insights into game mechanics that extend beyond our specific use case. Copyright c 2015 for the individual papers by the paper's authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Gamification is a method for keeping users involved in a
task for longer periods of time or to encourage them to
repeatedly undergo otherwise not so entertaining tasks.
Another reason for gamification is the desire to generate useful
data as a by-product of the playing activity [VAD08]. Our
work takes place in the context of (temporal) expertise
profiling, where we are concerned with identifying what topics
people are knowledgeable about [RBN14b]. Specifically,
we focus on the academic domain, where scientific
publications constitute the best available evidence from which to
draw conclusions regarding a person’s expertise. Of course,
what one holds as her most important publication(s) does
not necessarily correspond with what others (i.e., the
scientific community) consider as such. Arguably, the latter one
is more important. In our experience, getting the first
question answered (what a person considers his most important
publications) is not easy; we developed an assessment
interface for this purpose [RBN14a] and found that people
were not especially willing to spend time with it. (It has
to be mentioned, however, that selecting the most
important publications was only part of the task, the assessment
procedure was more involved than that.) The main
motivation for us, therefore, is to attract users that can represent
the relevant scientific community’s general opinion and can
generate data to be used in our efforts to evaluate temporal
expert profiling approaches [RBN14b].</p>
      <p>Gamification is not the only alternative we considered.
In many scenarios, crowdsourcing is a valid option for the
delivery of simple yet time-consuming or repetitive tasks
such as data annotation or evaluation. Eickhoff [Eic14]
examines the crowd-powered expert paradigm, where the
majority of the workload is preprocessed by crowd
workers and experts are only needed for specialized steps. None
of these approaches, however, are applicable in our case;
here, the entire task is strictly domain-specific and requires
the involvement of domain experts.</p>
      <p>We cast our assessment exercise as a simple
questionanswering quiz that tests the user’s knowledge of
information retrieval (IR) papers, i.e., the “IR game.” The player
is presented with the title of a publication, from selected
top conferences, and her task is to attribute the paper to
the corresponding author(s).1 The game comes in two
difficulty modes. In “beginner” mode, the user has to select
the right set of authors, from three options, while in
“advanced” mode authors have to be picked out individually.
The goal in each mode is to answer as many questions,
i.e., collect as many points, as possible. The game ends
after three wrong answers. A “leader board” is provided to
track the highest scoring players. The game is available at
http://bit.ly/ir-game.</p>
      <p>In the course of this work, we examine whether we are
able to attract more interest (and collect more data) by
presenting our assessment exercise indirectly, as a game, as
1This game should not be unfamiliar to academics; many of us perform
a similar “authorship attribution” exercise, albeit not deliberately, when
performing blind reviews. The main difference is that here we offer instant
feedback while providing limited context.
opposed to dealing with it explicitly using a purpose-built
interface. In addition, we address a number of more
specific questions:</p>
      <p>Which level of difficulty is preferred, the easy mode
or the advanced one?
Does a competitive element, such as a leader board,
increase the level of engagement?</p>
    </sec>
    <sec id="sec-2">
      <title>When do users stop playing?</title>
    </sec>
    <sec id="sec-3">
      <title>Do users return to play again? After how long?</title>
    </sec>
    <sec id="sec-4">
      <title>What types of players can we identify? Are more cited papers also more easily recognized? Are more popular authors also more easily recognized?</title>
    </sec>
    <sec id="sec-5">
      <title>Do people prefer to play anonymously?</title>
      <p>Our findings confirm the premise that interweaving game
mechanics into a non-game environment is beneficial in
terms of task attractiveness. We also demonstrate that the
leader board is a powerful motivator for many people.
“IR game” is a simple knowledge quiz that tests users’
knowledge of publications in the field of Information
Retrieval.
2.1</p>
      <sec id="sec-5-1">
        <title>Game rules</title>
        <p>Given a publication title, the player’s task is to select the
correct authors within a given time limit. The game can be
played in two modes, beginner and advanced.</p>
        <p>Beginner mode The user has to select the (entire) group of
authors from three options. Only one variant is correct
and all authors are listed in the same order as on the
paper. See Figure 1 (a).</p>
        <p>Advanced mode In the more difficult game mode,
individual author names are offered and the user has to
decide which names belong to the paper. The
number of authors does not necessarily correspond with
the actual number of the paper’s authors and the same
applies to the order of names. See Figure 1 (b). The
user is credited with the corresponding F1-score for
each answer (i.e., correct vs. selected set of authors);
answers below an F1-score of 0.5 count as wrong.
In both modes, the game ends after three wrong answers.
A separate leader board is available for each game mode
that lists the highest scoring players (with score, name, and
timestamp). Players that made it to the leader board were
offered the opportunity to “brag” about their achievement
on Twitter.
2.2</p>
      </sec>
      <sec id="sec-5-2">
        <title>Data</title>
        <p>The collection of publications used in this game comprises
of the 1111 top cited IR papers (according to the ACM
DL2) from the period 2004-2014 that were presented in one
of the following conference series: SIGIR, WWW, CIKM,
KDD, and WSDM. For each publication, besides its own
set of original authors, a set of “fictitious” authors is
randomly selected from other documents within the same data
set.</p>
        <p>Usage data is collected while the game is being played.
Specifically, for each user, we store the questions that have
been asked in the game, correct and wrong answers, scores,
date and time, time to answer, number of attempts to copy
text from the webpage, and rough location. In order to
recognize returning users, we use browser cookies with
a unique identifier. We also track the site’s traffic using
Google Analytics.3
2http://dl.acm.org.
3http://www.google.com/analytics/.
The game was promoted on Twitter4 aiming at people from
the IR community. In this paper, we analyze traffic from the
first five days of the game’s existence (i.e., from January 31
to February 4, 2015). During this period, 302 unique
visitors from 33 countries visited the site and more than one
third of them participated in the game. Figure 2 presents
the geographic distribution of visitors; this roughly
corresponds to the distribution of IR groups in the world (albeit
Norway is admittedly over-represented on this figure).
Figure 3 shows the time of the day when the game was played
(normalized according to users’ timezones).</p>
        <sec id="sec-5-2-1">
          <title>Analysis of results</title>
          <p>Next, we analyze the collected data in different ways: by
answers (x3.1), by players (x3.2), by papers (x3.3), and by
authors (x3.4).
3.1</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>Answers</title>
      </sec>
      <sec id="sec-5-4">
        <title>Time to answer</title>
        <p>In both game modes, users’ response time is limited to 15
seconds. In case the time limit is exceeded, the answer is
considered wrong. On average, it took about half of the
specified time limit to provide an answer, more precisely, it
was 7:58s. There is a notable difference, 1:8s, between
average response times for correct (6:74s) and wrong (8:53s)
answers. Looking at the distribution of answer times,
Figure 4, we find that the higher response time for wrong
answers is due to timeouts. It is also visible from this plot that
when the user knows the correct answer, he is less likely to
use up all the time available.
An interesting measure of success of a game is the
number of returning players. We examined all games that were
played more than once (56 games) searching for
interesting patterns. It is most likely that a user plays again right
after she finishes the game. In 42 cases, users played again
within the same hour. The plot on Figure 6 presents time
intervals between users’ returns.
(a) beginner mode
(b) advanced mode
From the analysis of the game series, we can derive 4 types
of players depending on when they decide to leave the
game.</p>
        <p>Jumpers are the type of visitors who come and play a
single game; they leave after that no matter what the
score is.</p>
        <p>Give-upers are players who return repetitively but leave
the game due to demotivation when in a series of
games their score drops.</p>
        <p>Fighters do the exact opposite. They leave the game at the
top of their form, when in a series of games they reach
their highest score.</p>
        <p>Achievers care about winning. They keep returning and
playing the game until they are back on the top of the
leader board.
Are popular papers, i.e., papers with more citations,
recognized more easily? In order to answer this question, we
first introduce the concept of a paper’s recognition ratio. It
is defined to be the number of times the publication was
successfully recognized by users (players) divided by total
(a) Jumper
(b) Give-uper
(c) Fighter
(d) Achiever
number of times it was shown to users. Next, we divide
all publications that appeared in the game into three groups
based on the number of citations they received (according
to ACM DL). We report the average recognition ratio for
each group in Figure 8, where the leftmost bar represents
papers with the highest number of citations. As expected,
we find that more cited papers are in general better
recognized (left and middle vs. right), albeit papers in the middle
of the citation range seem to perform best in this regard. We
note that these findings may not be conclusive due to data
sparsity.
Are popular authors, i.e., people with more publications,
recognized more easily? Similarly to papers, we define an
author’s recognition ratio to be the number of times the
author’s publications were successfully recognized by users
(players) divided by total number of times her publications
were shown to users. On Figure 9 we plot authors’
popularity, measured in the number of publications (in our
paper selection), against recognition ratio. We find that
there is a significant difference between authors with a
single publication and authors with multiple publications; not
surprisingly, having multiple publications benefits
recognition. On the other hand, it appears that having many more
publications does not improve recognition any further.</p>
        <sec id="sec-5-4-1">
          <title>Observations</title>
          <p>Based on the analysis of results as well as informal
feedback from the users, we make a number of observations
that may generalize beyond our specific use case.
Learning From user feedback we know that this game
was also used to learn about relevant, previously
unseen, publications. On the motto of Comenius’
saying: “Much can be learned in play that will afterwards
be of use when the circumstances demand it.”, we
believe that our game might also prove to be useful for
exploring and discovering more about scientific
literature.</p>
          <p>Unfair behavior We have a suspicion that at least in one
case a user acted dishonestly in order to get into the
lead. This user’s score was unreasonably high
compared to the second best. More importantly, her
average time to answer was 12:84s (compared to the
average of 7:85s), which seems just about enough time
to use a web search engine to look up the paper in
question. This behavior was reported by another
competitor (which supports the assumption that players do
care about their position in the leader board).</p>
          <p>Head-start At least in one case we observed that a user
was restarting the game until he was able to answer
the first question correctly. Beginning the game with
a set of easier questions, then gradually increasing
difficulty, might therefore be helpful in keeping users
engaged.</p>
          <p>Engaging users From the statistics (x3.2) we can see that
users are not very likely to return to the game days
after their first visit. However, chances that they
repeatedly participate in the game within the first hour
after their initial visit are much higher. It is therefore
of vital importance to keep the user stay in the game
as long as possible when she comes for the first time.
Identity Some people ( 10%) opted to use their full civil
name as opposed to a nickname. We hypothesize that
it was a choice made deliberately in case they make it
to the leader board.
5</p>
        </sec>
        <sec id="sec-5-4-2">
          <title>Conclusions and Future work</title>
          <p>This study presented in this paper has started with the
following main question in mind: Could we make an
assessment exercise, in the context of expertise profiling, more
appealing for users? We have answered this question
positively. Our experiment has shown that it is more
desirable for users to participate in a game-like assessment task
rather than having to evaluate results explicitly using a
purpose-built interface. We have analyzed the data
collected along different dimensions and have identified four
types of gaming personalities based on behavioral patterns.
On top of the analysis of the game mechanics, this
experiment has allowed us to gather valuable data about authors
and publications. This has let us to perform an initial
examination of the relation between popularity and
recognizability for both papers and authors.</p>
          <p>In future work we plan to enhance the game in several
ways. The main purpose of the game is to indirectly
measure how researchers recognize each others’ publications.
In this first version, fictitious publications were selected
randomly; however, interesting experiments could be
conducted if the selection of alternative authors was biased in
a controlled way. This would allow us to adjust the
difficulty of the questions as the game progresses. We also plan
to add new game modes (e.g., time trial), expand the data
set (i.e., add more publications), and possibly explore other
research fields/communities.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Eic14]
          <string-name>
            <given-names>Carsten</given-names>
            <surname>Eickhoff</surname>
          </string-name>
          .
          <article-title>Crowd-powered experts: Helping surgeons interpret breast cancer images</article-title>
          .
          <source>In Proceedings of the First International Workshop on Gamification for Information Retrieval</source>
          , pages
          <fpage>53</fpage>
          -
          <lpage>56</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [RBN14a]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Rybak</surname>
          </string-name>
          , Krisztian Balog, and Kjetil Nørva˚g. ExperTime:
          <article-title>Tracking expertise over time</article-title>
          .
          <source>In Proceedings of the 37th International ACM SIGIR Conference on Research &amp; Development in Information Retrieval, SIGIR '14</source>
          , pages
          <fpage>1273</fpage>
          -
          <lpage>1274</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [RBN14b]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Rybak</surname>
          </string-name>
          , Krisztian Balog, and Kjetil Nørva˚g.
          <article-title>Temporal expertise profiling</article-title>
          .
          <source>In Proceedings of the 36th European conference on Advances in Information Retrieval, ECIR '14</source>
          , pages
          <fpage>540</fpage>
          -
          <lpage>546</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [VAD08]
          <article-title>Luis Von Ahn</article-title>
          and
          <string-name>
            <given-names>Laura</given-names>
            <surname>Dabbish</surname>
          </string-name>
          .
          <article-title>Designing games with a purpose</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>51</volume>
          (
          <issue>8</issue>
          ):
          <fpage>58</fpage>
          -
          <lpage>67</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>