<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Behavior Mining in h-index Ranking Game</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rustam Tagiew</string-name>
          <email>rustam@tagiew.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry I. Ignatov</string-name>
          <email>dignatov@hse.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <addr-line>Moscow, Russia https://</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ONTONOVATION</institution>
          ,
          <addr-line>Dresden, Germany, Alumni of TU Freiberg and Uni Bielefeld</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>52</fpage>
      <lpage>64</lpage>
      <abstract>
        <p>Academic rewards and honors are proven to correlate with h-index, although it was not the decision criterion for them until recent years. Once h-index becomes the rule-setting scientometric ranking measure in the zero-sum game for academic positions and research resources, as suggested by its advocates, the rational behavior of competing academics is expected to converge towards its game-theoretic solution. This paper derives the game-theoretic solution, presents evidence for it in scientometric data and discusses its consequences for the development of science. The DBLP database of 07/2017 was used for mining. Additionally, openly available scientometric datasets are introduced as a good alternative to commercial datasets of comparable size for public research in behavioral sciences.</p>
      </abstract>
      <kwd-group>
        <kwd>h-index</kwd>
        <kwd>scientometrics</kwd>
        <kwd>behavior mining</kwd>
        <kwd>behavioral game theory</kwd>
        <kwd>experimental economics</kwd>
        <kwd>data science</kwd>
        <kwd>social networks</kwd>
        <kwd>research funding</kwd>
        <kwd>R&amp;D budget</kwd>
        <kwd>innovation management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <label>1</label>
      <title>Introduction</title>
      <p>
        One of the main pillars of modern-day politics is to reward innovations in order to ensure competitiveness. The global expenditures on research and development exceed 1.6 trillion (10<sup>12</sup>) dollars per year, of which the USA, China, Japan, and Europe account for 78% [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Depending on the country, the Research &amp; Development (R&amp;D) percentage of Gross Domestic Product (GDP) varies from below 0.9% in developing countries up to 4% for Israel and South Korea. The Chinese R&amp;D budget will overtake that of today’s leader, the USA, by the early 2020s. The number of scientists and engineers per inhabitant varies in strong correlation with the R&amp;D budget, up to 7‰ for Finland. Distributing percents of GDP over permilles of population for R&amp;D creates an above-average income in this branch, which is a strong incentive for competition.
      </p>
    </sec>
    <sec id="sec-4">
      <p>
        The small group of top scientists and thought leaders is easily spotted. The remaining average academic title holders are, in contrast, more challenging to rank for fair reward. The academic title ‘doctor’, once introduced by the Catholic Church in the Middle Ages, made its way through the centuries over witch-hunting theologians by supervisor-student links into more modern and secular disciplines and became the necessary precondition. The challenge of ranking academics is suggested to be solved by statistics of citations, the modern-day scientometrics. The scientometric measure h-index is proven to correlate with the chances of winning the Nobel Prize, holding a position at top universities and being accepted for research fellowships [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. More precisely, ranking based on h-index is a good estimate of the chance of being rewarded for scientific publishing, without necessarily being the obligatory criterion for research funding committees. h is the maximal number such that h of an author's publications have at least h citations each [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
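      <p>The definition of h can be made concrete with a minimal sketch. The function name and the citation counts below are illustrative assumptions, not taken from the paper or from any scientometric service.</p>
      <preformat>
```python
def h_index(citations):
    """Return the largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited publication first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` publications all have at least `rank` citations
        else:
            break
    return h


if __name__ == "__main__":
    # Hypothetical citation counts for one author's five publications.
    print(h_index([10, 8, 5, 4, 3]))  # 4
```
      </preformat>
      <p>Sorting first keeps the sketch short; for a single author's publication list the cost of sorting is negligible.</p>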
      <p>
        The actual usage of h-index as a decision criterion is harshly criticized by successful scientists from diverse disciplines [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ]. The main arguments are the alienation of scientific work from its purpose and the neglect of its practical component beyond composing scientific prose. Every scientist has his own representation of publications in his field and his own view on the ranking of the relevant research, which does not fit scientometric figures. Since a return to the less transparent, less exact and more time-consuming alternative of manual content comparison is not desirable, h-index is nevertheless advocated as a yardstick for resource allocation beyond being a correlated indicator:
      </p>
      <disp-quote>
        <p>“I think that considering the h-index should result in better decisions pertaining to hiring and promotion of scientists, granting of awards, election to membership in honorary societies and allocation of research resources by agencies that have to decide between different competing proposals. As long as this index is well used I think it should contribute positively to the progress of science and help reward those who contribute to such progress more fairly.”</p>
        <attrib>
          J. E. Hirsch [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], his<sup>2</sup> h = 56
        </attrib>
      </disp-quote>
      <p>
        Efforts to improve the research funding practice are put into the improvement of h-type measures [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7,8,9</xref>
        ] or into alternative scientometric measures, also known as ‘altmetrics’ [
        <xref ref-type="bibr" rid="ref10 ref11">10,11</xref>
        ], rather than into a return to the ancient methods. The introduction of h-index ranking as a decision criterion for budget allocation is a solution that creates new problems and requires further fixes (Verschlimmbesserung in German). All modifications of h-index such as h<sub>m</sub>, g, i10, e, ħ, w and others are kept out of the scope of this paper, since they are not well established yet.
      </p>
    </sec>
    <sec id="sec-5">
      <p>
        This paper reviews the current status quo of h-index from the perspective of human behavior research. Its game-theoretical analysis is provided in Section 2. Section 3 lists data-based evidence for the game-theoretical model from the literature and from our own experiments on the Digital Bibliography &amp; Library Project (DBLP) dataset from July 2017 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Some of this evidence is derived in this paper for the first time. Sections 4 and 5 provide a data-driven reconceptualization of the narrative for the scientific process. Section 6 discusses the usage of scientometric datasets for research in behavioral sciences. Section 7 concludes the paper.
      </p>
      <p>
        This paper is a piece of interdisciplinary research. Combining knowledge and methods from game theory, behavioral economics and data science in order to understand human behavior is a direction into which such market leaders as Facebook, Microsoft and Google have been pushing in recent years [13,14,15]. In academia, too, workshops and conferences are organized at the intersection of experimental economics and machine learning [
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19">16,17,18,19</xref>
        ]. For the analysis of human behavior from web data, the term ‘Behavior Mining’ is suggested, whereby the knowledge from behavioral sciences is incorporated into the process [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p><sup>1</sup> h-index is calculated by scopus.com and used here to value the context of different opinions.</p>
    </sec>
    <sec id="sec-10">
      <label>2</label>
      <title>Equilibria of h-index ranking game</title>
      <disp-quote>
        <p>“I suggest that this index may provide a useful yardstick with which to compare, in an unbiased way, different individuals competing for the same resource when an important evaluation criterion is scientific achievement.”</p>
        <attrib>
          J. E. Hirsch [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]
        </attrib>
      </disp-quote>
      <disp-quote>
        <p>“For a few years, there might be a great increase in scientific output; but, by going after the obvious, pretty soon science would dry out. Science would become something like a parlor game.”</p>
        <attrib>L. Szilard [21, p. 1498], inventor of nuclear chain reaction, his h = 10</attrib>
      </disp-quote>
      <p>If achieving the highest h-index rank is the basis for the payoff function of N rational competing individuals, we can derive the solution of this game in the game-theoretic sense. A solution of a game is a prediction about the behavior of the players given the assumption of their rationality. Rationality means that a player maximizes his payoff considering what he knows. A solution to a game is a set of possible equilibria. Every equilibrium is a combination of players’ behaviors in which no player can improve his payoff by deviating unilaterally. The h-index ranking game is zero-sum: every player rises in rank by exactly as much as others fall.</p>
    </sec>
    <sec id="sec-11">
      <p>A definition of a game is often a simplified formalization of a real-world strategic interaction. It consists of a number of participating players, their legal actions and a payoff function for every player. Let us assume for simplification that every player <inline-formula><tex-math>i \in N</tex-math></inline-formula> produces one innovative publication <inline-formula><tex-math>p_{r,i} \in P_i</tex-math></inline-formula> per round <inline-formula><tex-math>r \in \mathbb{N}^{+}</tex-math></inline-formula>, where <inline-formula><tex-math>P_i = N \times \mathbb{N}^{+} \times A_i \times C</tex-math></inline-formula>.</p>
    </sec>
    <sec id="sec-12">
      <p>
        All publications are assumed to be of the same quality. The effects of different production speeds and qualities will be discussed in later sections. No player has any publications before the game starts in round 1. <inline-formula><tex-math>A_i \subseteq \wp(N)</tex-math></inline-formula> is the set of possible coauthor sets for a publication, which are subsets of all players that include the player himself, i.e. <inline-formula><tex-math>\forall O \in A_i : i \in O</tex-math></inline-formula>. The set of cited publications of participating players is <inline-formula><tex-math>C \subseteq \wp(P)</tex-math></inline-formula>. C contains only cited publications from past rounds of participating players, which makes the definition of P recursive and non-circular. C includes neither publications written by researchers from outside, nor concurrent publications, nor future publications. Citations of a publication from P by outside researchers are considered negligible or randomly and equally distributed. Every player i is allowed to create publications with only himself in the (co)authors’ set and no citations of his competitors’ works. Since “there is no penalty to add authors to a paper” [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], every player is also allowed to add any other players as coauthors and to cite any publications from previous rounds. A publication is assumed to have one decisive contributor. The rank depends on h, and if two players have the same h, their relative rank is chosen randomly.
      </p>
      <p>Only in the second round can players obtain h &gt; 0. In the second round, it is not rational for any player to cite first-round publications of his competitors that do not include him as a coauthor. Of course, every player will cite outside researchers who do not compete with him for a certain resource. Every player will achieve an h-index of at least 1 in the second round, since he will cite his own publication from the first round. It is irrational not to cite one's own publications. If a player adds x randomly chosen players as coauthors to his publication in the first round, he will have one publication with x + 1 citations in the second round and still end up with h = 1. All x randomly chosen players will have two cited publications in the second round: the first with x + 1 citations and the second with only one citation. They end up with h = 1 as well. Adding random coauthors unilaterally neither improves nor worsens one's position in the h-index ranking.</p>
      <p>If a clique <inline-formula><tex-math>q \in \wp(N)</tex-math></inline-formula> of x + 1 players agrees to add all its members as coauthors to their publications in the first round, they will achieve h = x + 1 in the second round and will rank higher than the rest <inline-formula><tex-math>N \setminus q</tex-math></inline-formula> with h = 1. No member of the clique q will improve his rank by defecting from the agreement unilaterally, because this will only reduce the h-index of the whole clique by 1. Even if only one coauthor is excluded from one of the x + 1 publications, this will result in x publications with x + 1 citations and one publication with x citations in the second round, since the excluded player will not cite the publication he was excluded from. Therefore any formed clique q is an equilibrium, and the solution of the game is a set of multiple equilibria.</p>
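      <p>The two-round argument can be replayed in a minimal sketch. It assumes, as in the text, that in round 2 every player cites exactly those round-1 publications that list him as (co)author; the function names and the five-player example are hypothetical and only illustrate the reasoning.</p>
      <preformat>
```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def h_after_round_two(coauthor_sets):
    """h of every player after round 2.

    coauthor_sets[i] is the author set of the round-1 publication led by player i
    (it is expected to contain i). Assumption of this sketch: each player cites
    only round-1 publications that list him as (co)author. Round-2 publications
    are uncited so far and therefore cannot raise h.
    """
    players = range(len(coauthor_sets))
    # Every listed author publishes once in round 2 and cites the paper once.
    citations = [len(authors) for authors in coauthor_sets]
    return [
        h_index([citations[pub] for pub in players if player in coauthor_sets[pub]])
        for player in players
    ]


def clique_game(n_players, clique):
    """Round-1 author sets when clique members list each other and the rest publish alone."""
    members = set(clique)
    return [set(members) if i in members else {i} for i in range(n_players)]


if __name__ == "__main__":
    solo = [{0, 1, 2}, {1}, {2}, {3}, {4}]   # player 0 adds two coauthors on his own
    print(h_after_round_two(solo))           # [1, 1, 1, 1, 1]
    agreed = clique_game(5, {0, 1, 2})       # clique of x + 1 = 3 players
    print(h_after_round_two(agreed))         # [3, 3, 3, 1, 1]
    defect = [{0, 1}, {0, 1, 2}, {0, 1, 2}, {3}, {4}]  # player 0 drops player 2 once
    print(h_after_round_two(defect))         # [2, 2, 2, 1, 1]
```
      </preformat>
      <p>In this toy setting the unilateral defection lowers h for the whole clique, including the defector, which matches the equilibrium argument.</p>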
      <p>If every agreement for round 1 is an equilibrium, then players will prefer to belong to a slim majority clique <inline-formula><tex-math>smq_1</tex-math></inline-formula> with <inline-formula><tex-math>|smq_1| = \lfloor |N| / 2 \rfloor + 1</tex-math></inline-formula>. If a clique is smaller than a majority, the rest might form a single clique with a higher h. If a clique is much bigger than a slim majority, its members will be randomly ranked on a longer list of top places. The members of the slim majority clique <inline-formula><tex-math>smq_1</tex-math></inline-formula> from the first round will outperform the rest by at least <inline-formula><tex-math>2 - (|N| \bmod 2) \in \{1, 2\}</tex-math></inline-formula>. The mechanism of making agreements is considered to depend on features of social networks and to be too extensive to be modeled game-theoretically in this work. It will be referred to as collaborativeness.</p>
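      <p>The size of the slim majority and its minimal lead can be checked numerically. The sketch below assumes that a clique of s players reaches h = s after round 2 and that, in the worst case for the majority, the rest forms one single clique; the function names are hypothetical.</p>
      <preformat>
```python
def slim_majority_size(n_players):
    """Smallest clique size that is a strict majority of n_players."""
    return n_players // 2 + 1


def minimal_lead(n_players):
    """Lead in h of the slim majority over the rest, if the rest forms one clique.

    Assumption of this sketch: a clique of s players reaches h = s after round 2,
    so the lead equals the difference of the two clique sizes.
    """
    majority = slim_majority_size(n_players)
    rest = n_players - majority
    return majority - rest


if __name__ == "__main__":
    for n in (10, 11):
        print(n, slim_majority_size(n), minimal_lead(n))
    # 10 6 2
    # 11 6 1
```
      </preformat>
      <p>For an even number of players the lead is 2 and for an odd number it is 1, in agreement with the bound stated above.</p>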
    </sec>
    <sec id="sec-13">
      <p>If the game lasts more than two rounds, being a member of a slim majority clique <inline-formula><tex-math>smq_r</tex-math></inline-formula> in any round r adds at least 1 more to h than being a member of the rest. The sets <inline-formula><tex-math>smq_1</tex-math></inline-formula> and <inline-formula><tex-math>smq_2</tex-math></inline-formula> do not need to be the same. If a player of extraordinary collaborativeness manages to be the only player who was a member of all slim majority cliques in all rounds, he will be the indisputable winner of the h-index ranking.</p>
      <p>This game-theoretical analysis reveals the following major characteristics of rational behavior for a successful player, if the allocation of resources correlates with h-index ranking or is even based on it:</p>
      <list list-type="order">
        <list-item>
          <p>Cite publications (co)authored by you.</p>
        </list-item>
        <list-item>
          <p>Never cite those researchers that might be involved in the competition over the same resource with you.</p>
        </list-item>
        <list-item>
          <p>Make an agreement for a coauthoring clique. This clique should establish a slim majority involved in the competition for a certain resource.</p>
        </list-item>
        <list-item>
          <p>If possible, abandon worked-out coauthoring agreements if forming a new coauthoring agreement with new coauthors of lower h-index can establish a slim majority.</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-17">
      <sec id="sec-17-1">
        <label>3</label>
        <title>Evidence in data</title>
        <disp-quote>
          <p>“We see here that in the real real world – when the chips are down, the payoff is not five dollars but a successful career, and people have time to understand the situation – the predictions of game theory fare quite well.”</p>
          <attrib>
            R. J. Aumann [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ], Nobel Prize winner, his h = 22
          </attrib>
        </disp-quote>
        <p>The dataset of 3.8M publications from the DBLP computer science bibliography database shows that the distribution of the number of (co)authors per paper has the shape of a log-normal distribution (Fig. 1). No limit seems to be set: one publication in DBLP has 267 coauthors. The medians of six out of seven types of records approximate the means of the log-normal distributions. The observed proportions for the e<sup>µ+σ</sup> and e<sup>µ+2σ</sup> upper bounds of the number of (co)authors are close to the theoretically expected values of a log-normal distribution.</p>
      </sec>
    </sec>
    <sec id="sec-18">
      <p>Today, 2.5 coauthors is a typical value for informal publications, conference papers and journal papers in computer science. 3% of conference and journal papers and only 2.4% of informal publications have more than 6 authors. Informal publications target fast dissemination of ideas and have fewer extra-long coauthor lists than conference and journal papers.</p>
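      <p>The theoretically expected proportions for the log-normal upper bounds mentioned for Fig. 1 follow directly from the standard normal distribution, because the logarithm of a log-normal variable is normally distributed. A minimal sketch using only the Python standard library (the function name is an assumption for illustration):</p>
      <preformat>
```python
from statistics import NormalDist


def expected_proportion_below(k_sigmas):
    """Share of a log-normal population below exp(mu + k_sigmas * sigma).

    ln X is normal with mean mu and standard deviation sigma, so the share
    equals the standard normal CDF at k_sigmas, independent of mu and sigma.
    """
    return NormalDist().cdf(k_sigmas)


if __name__ == "__main__":
    print(f"{expected_proportion_below(1):.2%}")  # 84.13%
    print(f"{expected_proportion_below(2):.2%}")  # 97.72%
```
      </preformat>
      <p>Observed proportions of records below these bounds can be compared against the two values to judge how well a log-normal shape fits.</p>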
    </sec>
    <sec id="sec-19">
      <p>In the time before the introduction of h-index, the distribution of (co)authors per paper showed the same log-normal shape in other disciplines too, as found by a study on all papers indexed in the 1980-2000 annual volumes of the Science Citation Index (SCI) of the Institute for Scientific Information [23]. The median number of coauthors increased from 2 to 3 between 1980 and 1998. It converges towards the solution of the h-index ranking game: the size of cliques grows. The number of citations of a paper grows close to linearly with the number of its (co)authors, and the slope of this relationship became steeper from 1980 to 1998.</p>
    </sec>
    <sec id="sec-20">
      <p>Fig. 2 shows the development of cumulative contributions per author per year. Since the share of each coauthor in a paper is not recorded in the database, it is derived by simply dividing 1 (a paper) by the number of coauthors. If an author (co)authored more than one paper in a year, these shares are added. For instance, being a coauthor of two 2-coauthor papers results in a cumulative contribution of 1. 96% of DBLP records from 1980-2016 are used for this calculation. One can see in the graph that the approximately log-normal distribution of the recorded authors’ cumulative share drifts towards 0. The game-theoretically predicted lengthening of (co)authors’ lists enables the incorporation of a growing number of scientists with far lower output into the scientific process.</p>
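      <p>The cumulative contribution described above amounts to fractional counting. The following sketch illustrates it on a hypothetical list of records; the input format and names are assumptions and do not reflect the actual DBLP schema or the authors' processing code.</p>
      <preformat>
```python
from collections import defaultdict
from fractions import Fraction


def cumulative_shares(papers):
    """Map (author, year) to the sum of 1 / (number of coauthors) over that year's papers.

    papers: iterable of (year, list_of_author_names); a hypothetical input format.
    """
    shares = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for year, authors in papers:
        for author in authors:
            shares[(author, year)] += Fraction(1, len(authors))
    return dict(shares)


if __name__ == "__main__":
    records = [
        (2016, ["A", "B"]),  # A and B get 1/2 each
        (2016, ["A", "C"]),  # A and C get 1/2 each
        (2016, ["D"]),       # a single-author paper counts fully
    ]
    shares = cumulative_shares(records)
    print(shares[("A", 2016)])  # 1
    print(shares[("B", 2016)])  # 1/2
```
      </preformat>
      <p>Exact fractions avoid rounding drift when many small shares are added; floating-point numbers would serve equally well for plotting.</p>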
      <p>
        A policy close to the game-theoretic solution for h-index has been implemented by one large scientific institute, the Collider Detector at Fermilab (CDF)<sup>3</sup>, since 1998. It enforces the addition of all its scientists and engineers as coauthors to all of its publications. Employees are added to the CDF authors’ list after one year of full-time work and removed one year after the date they left. This list typically contains over 300 authors. A study conducted on a dataset of 189k publications [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] showed that the number of coauthors is strongly correlated with h, as suggested by the game-theoretic analysis. Every field has its typical average coauthor number. Mathematics has a large proportion of single-author papers; therefore mathematicians achieve lower h than others.
      </p>
      <p><sup>3</sup> www-cdf.fnal.gov</p>
      <p>Fig. 1. (Co)authors per paper in computer science (3766094 publications from dblp.uni-trier.de, July 2017; x-axis: number of (co)authors, logarithmic scale; y-axis: number of publications). The means of the log-normal distributions are denoted as ‘lmean’: Papers median=3, lmean=2.67; Articles median=2, lmean=2.35; Informals median=3, lmean=2.48; Books median=1, lmean=1.07; Chapters median=2, lmean=2.21; Ref. Works median=1, lmean=1.39; Editorships median=1, lmean=1.43. Key data for the +σ and +2σ upper bounds is organized in a table (columns: type of publication, #(co)authors and proportion at e<sup>µ+σ</sup>, #(co)authors and proportion at e<sup>µ+2σ</sup>; rows: Conference Papers, Journal Articles, Informal Publications, Books and Theses, Book Chapters, Editorships; expected proportion at e<sup>µ+σ</sup>: 84.13%). Corresponding types from graph and table are indicated in bold.</p>
    </sec>
    <sec id="sec-24">
      <p>Fig. 2. Drift of the average cumulative share of annually (co)authored papers during the last decades. In the top graph, normal distributions (dashed grey lines) are fitted to logarithmically binned histograms of chosen years (axis ticks: 1980, 1985, 1990, 1995; histogram labels: 57950 authors, 404905 authors). The bottom graph shows the development of the average activity of contributing (co)authors (curves: arithmetic mean, median, logarithmic mean).</p>
      <p>Plot: 18786 consistently contributing authors from DBLP, 7/2017 (curves: arithmetic mean, median, logarithmic mean).</p>
      <p>
        The correlations between scientometric measures and graph measures of the coauthors’ social network were calculated in a study on a dataset of 1809 authors from information management and systems schools of 5 US universities [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The highest correlation, 0.861, was observed between the number of publications and average tie strength, which is the number of joint publications. With a growing number of publications, the cliques of coauthors stabilize: frequent reestablishment of a new clique might not come for free. Alternatively, authors with a large number of publications might pursue goals other than dominating the h-index ranking game. Eigenvector centrality, which increases with the number of connected nodes and their connections, is not correlated with the number of publications at all. In this context of clique stabilization being strongly correlated with the number of papers, h-index is correlated more with average tie strength (0.660) than with eigenvector centrality (0.042). h-index moderately correlates with average tie strength, because authors with a higher h have more stable cliques.
      </p>
      <p>
        h-index does not suffer from the dilution of innovation into multiple papers, since the growth of citations is strongly correlated with the growth of papers per single research project. This was shown in a study on a dataset of 96 BIF grant applicants [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Quality of publications is thus not important for the h-ranking game, as assumed. Therefore, the speed is expected to rise. The number of scientific publications grows exponentially in many disciplines [
        <xref ref-type="bibr" rid="ref26 ref27">26,27</xref>
        ]. The share of publications available online grows as well.
      </p>
    </sec>
    <sec id="sec-25">
      <p>
        At the same time, the period after which a publication loses its chance of being cited shortens: the quality of literature review is diminishing. Also, the relationship between impact factor and citations has been weakening since the late 90s [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. This means that the quality of peer review becomes less important and a potentially best-cited paper can be published in a journal with a less rigorous peer-review process<sup>4</sup>.
      </p>
      <p>For the plot in Fig. 3, all authors from 1980-2016 are taken who (co)authored at least one paper in each of the 10 years following their first paper. The curve shows that the average productivity of a scientist roughly doubles within ten years. This can also be interpreted as growth of the dilution of innovations into multiple papers. The curve also shows a saturation pattern: there might be limits to productivity, or the competing individuals change their objectives. Fig. 4 shows k-means centroids of productivity trajectories, where k is set to 4. The productivity of the top 1-2% of authors shows linear growth of over 50% per year. Meanwhile, roughly half of the authors show no productivity growth at all. The publication shares of the non-growing half add up to one paper a year.</p>
      <sec id="sec-25-1">
        <label>4</label>
        <title>h-index rank measures collaborativeness</title>
        <p>
          The game-theoretic solution reveals that collaborative academics are favored by the h-index ranking, while academics writing single-author papers lose. If an academic is several times more productive, he will still achieve the same result as those who manage to be added as coauthors the same number of times. An academic with a few high-quality publications, like the Fields Medal nominee Grigori Perelman, would also lose in the h-index ranking game. Hirsch identified as the major “short-coming” of the original h-index definition “its inability to discriminate between authors that publish alone or in small cliques versus those authors whose papers have usually many coauthors” [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This paper assumes a neutral position, seeing it rather as a feature than a “short-coming”.
        </p>
        <p>
          If fast h-index growth naturally results as a by-product of a certain type of behavior, then this behavior is key to success in acquiring budget shares. According to the critics [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ], the rational behavior in the h-index ranking game deviates strongly from the natural behavior and should not be rewarded with budget shares. Could the natural behavior and the rational behavior in the h-index ranking game be the same, since the result is the same? Academics might follow the game-theoretic solution unconsciously: they only need to have a bias for self-citation, for not mentioning competitors and for participation in rapidly changing slim-majority coauthorships. Bonzi and Snyder’s survey [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], conducted in the early 90s, studied scientists’ perception of both self-citation and citation of others by surveying 51 self-citing authors in several natural science disciplines. They found that scientists did not report any substantial difference in motivation between self-citation and citation of others. Thus the main motivation was scientific, and one of their respondents argued: “If you are a major contributor, it’s difficult to avoid citing yourself.” The unconscious bias for self-citation improves h, which correlates with success. Mastering the skill of always being part of dynamic slim majorities intuitively seems to be correlated with success beyond the academic world as well. Conscious rational behavior in the h-index ranking game would require the same skill.
        </p>
      </sec>
    </sec>
    <sec id="sec-26">
      <p><sup>4</sup> For instance, the paper “The Conceptual Penis as a Social Construct”, arguing that penises cause climate change, could be published in a peer-reviewed journal.</p>
      <p>Figure caption: … since their first paper in the subsequent 5, 10, 15, and 20 years. The four graphs show the k-means clustering results on the maximum available number of consistently contributing authors (panel title: 71088 consistently contributing authors from DBLP, 7/2017; y-axis: cumulated share of publications per year).</p>
      <sec id="sec-26-1">
        <label>5</label>
        <title>Division of labor in science</title>
        <p>A scientific community is obviously a collaborative network, which needs socially active members to exist. In the case of a naturally grown h-index, it represents the degree of collaborativeness of an author inside a scientific community. The scientific publication network is a mirror of the real social network. h-index ranking rewards its socially active members the most. Socially active members are also the rule-setters in a community and will therefore advocate the status quo of h-index ranking, since they profit from it the most. Is the core of scientific achievement to create a community around a certain topic, which urges to create innovations?</p>
        <p>
          Even if academics do not develop their own ideas for their next publications, they will at least actively adopt and develop ideas from non-scientific sources. They will become an idea-hungry community, which is eager to publish, to coauthor and to cite everything as a consequence of rationality in the h-index ranking game. The ground truth is that the real originator of an innovation is not always among the authors of the publication presenting it, nor among those who will be rewarded for this academic achievement. As in patent affairs [30, e.g.], where most patents are owned by non-individuals, there is a division of labor between those who originate ideas and those who promote them into real life. Ideas might appear in different heads simultaneously; it is an honorable scientific achievement to effectively place those ideas into the scientific community using social skills. On the other hand, tools should be provided to better honor scientific work beyond composing publications. For instance, data citation [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] is a useful feature for this goal.
        </p>
      </sec>
      <sec id="sec-26-2">
        <label>6</label>
        <title>Scientometric datasets for behavior sciences</title>
        <p>
          The bottleneck of public research in behavior mining is the limited access to large datasets, which are mostly held by for-profit companies. Preparation and release of large, authentic and recent datasets tend to contradict commercial interests. Even if several studies on commercial datasets are put into the public domain, their datasets might not be available for reproducing the results. While the datasets from non-profit social networks comprise about 100k participants [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ], non-profit scientometric databases like CiteSeerX [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] and DBLP offer datasets with millions of participants. The game-theoretic solution of the h-index ranking game, which is presented in this paper, can be used as the basis for a tailored hypothesis space in data mining.
        </p>
      </sec>
    </sec>
    <sec id="sec-27">
      <label>7</label>
      <title>Conclusion</title>
      <p>This paper introduced a game-theoretic perspective into scientometrics. The leading measure, the h-index ranking, established a reward system which favors socially active academics and therefore furthers the division of labor in science. Evidence for the convergence towards the game-theoretic solution has been found in the data of the DBLP database and in the results of related work.</p>
    </sec>
    <sec id="sec-28">
      <title>Acknowledgments</title>
      <p>The research was supported by the Russian Science Foundation under grant 17-11-01294 and performed at National Research University Higher School of Economics, Russia. We thank the people behind DBLP for providing free full access to their database.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Battelle Memorial Institute:
          <article-title>2014 global R&amp;D funding forecast</article-title>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bornmann</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daniel</surname>
            ,
            <given-names>H.D.</given-names>
          </string-name>
          :
          <article-title>What do we know about the h index?</article-title>
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>58</volume>
          (
          <issue>9</issue>
          ) (
          <year>2007</year>
          )
          <fpage>1381</fpage>
          -
          <lpage>1385</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hirsch</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          :
          <article-title>An index to quantify an individual's scientific research output</article-title>
          .
          <source>Proc. Nat. Acad. Sci.</source>
          <volume>102</volume>
          (
          <issue>46</issue>
          )
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          :
          <article-title>The mismeasurement of science</article-title>
          .
          <source>Current Biology</source>
          <volume>17</volume>
          (
          <issue>15</issue>
          ) (
          <year>2007</year>
          )
          <fpage>583</fpage>
          -
          <lpage>585</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Şengör</surname>
            ,
            <given-names>A.M.C.</given-names>
          </string-name>
          :
          <article-title>How scientometry is killing science</article-title>
          .
          <source>GSA Today</source>
          (
          <year>2014</year>
          )
          <fpage>44</fpage>
          -
          <lpage>45</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hirsch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buela-Casal</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>The meaning of the h-index</article-title>
          .
          <source>International Journal of Clinical and Health Psychology</source>
          <volume>14</volume>
          (
          <issue>2</issue>
          ) (
          <year>2014</year>
          )
          <fpage>161</fpage>
          -
          <lpage>164</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Batista</surname>
            ,
            <given-names>P.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campiteli</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kinouchi</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Is it possible to compare researchers with different scientific interests?</article-title>
          <source>Scientometrics</source>
          <volume>68</volume>
          (
          <issue>1</issue>
          ) (
          <year>2006</year>
          )
          <fpage>179</fpage>
          -
          <lpage>189</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Egghe</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The Hirsch index and related impact measures</article-title>
          .
          <source>Annual Review of Information Science and Technology</source>
          <volume>44</volume>
          (
          <issue>1</issue>
          )
          (
          <year>2010</year>
          )
          <fpage>65</fpage>
          -
          <lpage>114</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hirsch</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          :
          <article-title>An index to quantify an individual's scientific research output that takes into account the effect of multiple coauthorship</article-title>
          .
          <source>Scientometrics</source>
          <volume>85</volume>
          (
          <issue>3</issue>
          ) (
          <year>2010</year>
          )
          <fpage>741</fpage>
          -
          <lpage>754</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alhalabi</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kao</surname>
            ,
            <given-names>H.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>W.H.</given-names>
          </string-name>
          :
          <article-title>ResearchGate: An effective altmetric indicator for active researchers?</article-title>
          <source>Computers in Human Behavior</source>
          <volume>55</volume>
          (
          <year>2016</year>
          )
          <fpage>1001</fpage>
          -
          <lpage>1006</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Tagiew</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Research project: Text engineering tool for ontological scientometry</article-title>
          .
          <source>CoRR</source>
          abs/1601.01887 (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ley</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>DBLP - some lessons learned</article-title>
          .
          <source>PVLDB</source>
          <volume>2</volume>
          (
          <issue>2</issue>
          ) (
          <year>2009</year>
          )
          <fpage>1493</fpage>
          -
          <lpage>1500</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Bailey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>What does an economist at Facebook do?</article-title>
          quora.com/What-does-an-economist-at-Facebook-do (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          Microsoft Inc.: Microsoft Research New York City. research.microsoft.com/en-us/labs/newyork/ (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Varian</surname>
            ,
            <given-names>H.R.</given-names>
          </string-name>
          :
          <article-title>Big data: New tricks for econometrics</article-title>
          . people.ischool.berkeley.edu/~hal/Papers/2013/ml.pdf (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Tagiew</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ignatov</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neznanov</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poelmans</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , eds.:
          <source>First International Workshop on Experimental Economics and Machine Learning</source>
          , KU-Leuven (
          <year>2012</year>
          ) ceur-ws.org/Vol-870/, at ICFCA.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Sunstein</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zittrain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Social media and behavioral economics</article-title>
          . today.law.harvard.edu/social-media-and-behavioral-economics-conference/ (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Tagiew</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ignatov</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amroush</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , eds.:
          <source>Second International Workshop on Experimental Economics and Machine Learning</source>
          , IEEE Computer Society (
          <year>2013</year>
          ) dx.doi.org/10.1109/ICDMW.2013.178, at ICDM.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Tagiew</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ignatov</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hilbert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delhibabu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , eds.:
          <source>Proceedings of the Third Workshop on Experimental Economics and Machine Learning co-located with the 13th International Conference on Concept Lattices and Their Applications (CLA 2016), Moscow, Russia, July 18, 2016</source>
          . Volume 1627 of CEUR Workshop Proceedings, CEUR-WS.org (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>From data mining to behavior mining</article-title>
          .
          <source>International Journal of Information Technology and Decision Making</source>
          <volume>5</volume>
          (
          <issue>4</issue>
          ) (
          <year>2006</year>
          )
          <fpage>703</fpage>
          -
          <lpage>712</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          United States. Congress. Senate:
          <source>Hearings</source>
          . Number Bd. 6. U.S. Government Printing Office (
          <year>1961</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sotomayor</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis</article-title>
          .
          <source>Econometric Society Monographs</source>
          . Cambridge University Press (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Persson</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glänzel</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Danell</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Inflationary bibliometric values: The role of scientific collaboration and the need for relative indicators in evaluative studies</article-title>
          .
          <source>Scientometrics</source>
          <volume>60</volume>
          (
          <issue>3</issue>
          ) (
          <year>2004</year>
          )
          <fpage>421</fpage>
          -
          <lpage>432</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Abbasi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Altmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hossain</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Identifying the effects of co-authorship networks on the performance of scholars: A correlation and regression analysis of performance measures and social network analysis measures</article-title>
          .
          <source>Journal of Informetrics</source>
          <volume>5</volume>
          (
          <issue>4</issue>
          ) (
          <year>2011</year>
          )
          <fpage>594</fpage>
          -
          <lpage>607</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Bornmann</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daniel</surname>
            ,
            <given-names>H.D.</given-names>
          </string-name>
          :
          <article-title>Multiple publication on a single research study: does it pay? the influence of number of research articles on total citation counts in biomedicine</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>58</volume>
          (
          <issue>8</issue>
          ) (
          <year>2007</year>
          )
          <fpage>1100</fpage>
          -
          <lpage>1107</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Parolo</surname>
            ,
            <given-names>P.D.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huberman</surname>
            ,
            <given-names>B.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Attention decay in science</article-title>
          . (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Electronic publication and the narrowing of science and scholarship</article-title>
          .
          <source>Science</source>
          (New York, N.Y.)
          <volume>321</volume>
          (
          <issue>5887</issue>
          ) (
          <year>2008</year>
          )
          <fpage>395</fpage>
          -
          <lpage>399</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Lozano</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Larivière</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gingras</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>The weakening relationship between the impact factor and papers' citations in the digital age</article-title>
          .
          <source>Journal of the Association for Information Science &amp; Technology</source>
          <volume>63</volume>
          (
          <issue>11</issue>
          ) (
          <year>2012</year>
          )
          <fpage>2140</fpage>
          -
          <lpage>2145</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Bonzi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snyder</surname>
            ,
            <given-names>H.W.</given-names>
          </string-name>
          :
          <article-title>Motivations for citation: A comparison of self citation and citation to others</article-title>
          .
          <source>Scientometrics</source>
          <volume>21</volume>
          (
          <issue>2</issue>
          ) (
          <year>Jun 1991</year>
          )
          <fpage>245</fpage>
          -
          <lpage>254</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Sterzi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Patent quality and ownership: An analysis of UK faculty patenting</article-title>
          .
          <source>Research Policy</source>
          <volume>42</volume>
          (
          <issue>2</issue>
          ) (
          <year>2013</year>
          )
          <fpage>564</fpage>
          -
          <lpage>576</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31. Data Citation Synthesis Group:
          <article-title>Joint Declaration of Data Citation Principles</article-title>
          . force11.org/datacitation (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Tagiew</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ignatov</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delhibabu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Economics of internet-based hospitality exchange</article-title>
          .
          <source>In: Proceedings of the 2015 IEEE / WIC / ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) - Volume 01</source>
          . WI-IAT '15, Washington, DC, USA, IEEE Computer Society (
          <year>2015</year>
          )
          <fpage>493</fpage>
          -
          <lpage>498</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khabsa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caragea</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ororbia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giles</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          :
          <article-title>CiteSeerX : AI in a Digital Library Search Engine</article-title>
          . (
          <year>2014</year>
          )
          <fpage>2930</fpage>
          -
          <lpage>2937</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>