Reputation in the Academic World

Nardine Osman and Carles Sierra
Artificial Intelligence Research Institute (IIIA-CSIC), Barcelona, Spain, email: {nardine, sierra}@iiia.csic.es


Abstract. With open access gaining momentum, open review is becoming a pressing issue. Institutional and multidisciplinary open access repositories play a crucial role in knowledge transfer by enabling immediate access to all kinds of research output. However, they still lack the quantitative assessment of the hosted research items that would facilitate selecting the most relevant and distinguished content. This paper addresses this issue by proposing a computational model based on peer reviews for assessing the reputation of researchers and their research work. The model is developed as an overlay service to existing institutional or other repositories. We argue that, by relying on peer opinions, we address some of the pitfalls of current approaches for calculating the reputation of authors and papers. We also introduce a much needed feature for review management: calculating the reputation of reviews and reviewers.

1 MOTIVATION

There has been a strong move towards open access repositories in the last decade or so. Many funding agencies — such as the UK Research Councils, Canadian and American funding agencies, and the European Commission — as well as many universities are promoting open access by requiring the results of their funded projects to be published in open access repositories. It is a way to ensure that the research they fund has the greatest possible research impact. Academics are also very much interested in open access repositories, as this helps them maximise their research impact. In fact, studies have confirmed that open access articles are more likely to be used and cited than those sitting behind subscription barriers [2]. As a result, a growing number of open access repositories are becoming extremely popular in different fields, such as PLoS ONE for Biology, arXiv for Physics, and so on.

With open access gaining momentum, open review is becoming a pressing issue. Institutional and multidisciplinary open access repositories play a crucial role in knowledge transfer by enabling immediate access to all kinds of research output. However, they still lack the quantitative assessment of the hosted research items that would facilitate selecting the most relevant and distinguished content. Currently available metrics, such as the number of visits and downloads, do not reflect the quality of a research product, which can only be assessed directly by peers offering their expert opinion together with quantitative ratings based on specific criteria. The articles published in the Frontiers book [5] highlight the need for open reviews.

To address this issue we develop an open peer review module, the Academic Reputation Model (ARM), as an overlay service to existing institutional or other repositories. Digital research works hosted in repositories using our module can be evaluated by an unlimited number of peers that offer not only a qualitative assessment in the form of text, but also quantitative measures to build the work's reputation. Crucially, our open peer review module also includes a reviewer reputation system based on the assessment of the reviews themselves, both by the community of users and by other peer reviewers. This allows for a sophisticated scaling of the importance of each review on the overall assessment of a research work, based on the reputation of the reviewer.

As a result of calculating the reputation of authors, reviewers, papers, and reviews by relying on peer opinions, we argue that the model addresses some of the pitfalls of current approaches for calculating the reputation of authors and papers. It also introduces a much needed feature for review management, namely calculating the reputation of reviews and reviewers. This is discussed further in the concluding remarks.

In what follows, we present the ARM reputation model and how it quantifies the reputation of papers, authors, reviewers, and reviews (Section 2), followed by an evaluation where we use simulations to assess the correctness of the proposed model (Section 3), before closing with some concluding remarks (Section 4).

2 ARM: ACADEMIC REPUTATION MODEL

2.1 Data and Notation

In order to compute reputation values for papers, authors, reviewers, and reviews we require a Reputation Data Set, which in practice should be extracted from existing paper repositories.

Definition 2.1 (Data). A Reputation Data Set is a tuple ⟨P, R, E, D, a, o, v⟩, where

• P = {p_i} is a set of papers (e.g. DOIs).
• R = {r_j} is a set of researcher names or identifiers (e.g. the ORCID identifier).
• E = {e_i} ∪ {⊥} is a totally ordered evaluation space, where e_i ∈ ℕ ∪ {0}, e_i < e_j iff i < j, and ⊥ stands for the absence of evaluation. We suggest the range [0, 100], although any other range may be used; the choice of range does not affect the performance of the model.
• D = {d_k} is a set of evaluation dimensions, such as originality, technical soundness, etc.
• a : P → 2^R is a function that gives the authors of a paper.
• o : R × P × D × Time → E, where o(r, p, d, t) ∈ E is a function that gives the opinion of a reviewer, as a value in E, on a dimension d of a paper p at a given instant of time t.
• v : R × R × P × Time → E, where v(r, r', p, t) = e is a function that gives the judgement of researcher r over the opinion of researcher r' on paper p, as a value e ∈ E. Therefore, a judgement is a reviewer's opinion about another reviewer's opinion. Note that while opinions about a paper are made with respect to a given dimension in D, judgements are not related to dimensions. We assume a judgement is only made with respect to one dimension, which describes how good the review is in general.
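To make the data set concrete, the following minimal Python sketch illustrates one possible in-memory representation of a Reputation Data Set. The identifiers (Dataset, opinions, judgements) are illustrative and not part of the model; ⊥ is encoded as None, and time is omitted, as in the equations below.

    from dataclasses import dataclass, field
    from typing import Dict, Optional, Set, Tuple

    Paper = str       # e.g. a DOI
    Researcher = str  # e.g. an ORCID identifier
    Dimension = str   # e.g. "originality", "technical soundness"

    @dataclass
    class Dataset:
        papers: Set[Paper] = field(default_factory=set)                      # P
        researchers: Set[Researcher] = field(default_factory=set)            # R
        authors: Dict[Paper, Set[Researcher]] = field(default_factory=dict)  # a : P -> 2^R
        # o : R x P x D -> E; a missing key encodes the absence of evaluation
        opinions: Dict[Tuple[Researcher, Paper, Dimension], int] = field(default_factory=dict)
        # v : R x R x P -> E; judgement of the first researcher over the second's opinion
        judgements: Dict[Tuple[Researcher, Researcher, Paper], int] = field(default_factory=dict)

        def opinion(self, r: Researcher, p: Paper, d: Dimension) -> Optional[int]:
            return self.opinions.get((r, p, d))  # None encodes the absence of evaluation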
We will not include the dimension (i.e. the criteria being evaluated, such as originality, soundness, etc.) in the equations, to simplify the notation. There are no interactions among dimensions, so the equations apply to each of the dimensions under evaluation.

We will also omit the reference to time in all the equations. Time is essential, as all measures are dynamic and thus evolve over time. We make the simplifying assumption that all opinions and judgements are maintained in time, that is, they are not modified. Including time would not change the essence of the equations; it would simply make the computation heavier.

Finally, if a data set allowed papers, reviews, and/or judgements to have different versions, then our model simply considers the latest version only.

2.2 Reputation of a Paper

We say the reputation of a paper is a weighted aggregation of its reviews, where the weight is the reputation of the reviewer (Section 2.4):

\[
R_P(p) =
\begin{cases}
\dfrac{\sum_{r \in rev(p)} R_R(r) \cdot o(r, p)}{\sum_{r \in rev(p)} R_R(r)} & \text{if } |rev(p)| \geq k \\
\bot & \text{otherwise}
\end{cases}
\tag{1}
\]

where rev(p) = {r ∈ R | o(r, p) ≠ ⊥} denotes the reviewers of a given paper.

Note that when a paper receives fewer than k reviews, its reputation is defined as unknown, or ⊥. We currently leave k as a parameter, though we suggest that k > 1, so that the reputation of a paper does not depend on a single review. We also recommend small values for k, such as 2 or 3, because we believe it is usually difficult to obtain reviews. As such, new papers can quickly start building a reputation.
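For illustration, Equation 1 translates directly into code. The following minimal Python sketch (function and parameter names are illustrative) computes R_P(p) from the opinions a paper received and the reputation of its reviewers:

    def paper_reputation(opinions, reviewer_rep, k=2):
        """opinions: reviewer -> o(reviewer, p) or None; returns R_P(p) per Equation (1)."""
        reviewers = [r for r, o in opinions.items() if o is not None]  # rev(p)
        if len(reviewers) < k:
            return None  # fewer than k reviews: reputation unknown
        weight = sum(reviewer_rep[r] for r in reviewers)
        if weight == 0:
            return None  # degenerate case: all reviewers have zero reputation
        return sum(reviewer_rep[r] * opinions[r] for r in reviewers) / weight

    # e.g. paper_reputation({"r1": 80, "r2": 60}, {"r1": 90.0, "r2": 30.0}) -> 75.0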
2.3 Reputation of an Author

We consider that a researcher's author reputation is an aggregation of the reputation of her papers. The aggregation is based on the concept that the impact of a paper's reputation on its authors' reputation is inversely proportional to the total number of its authors. In other words, if one researcher is the sole author of a paper, then this author is the only person responsible for this paper, and any (positive or negative) feedback about this paper is propagated as is to its sole author. At the same time, we assume that the expertise of more than one researcher is always better than the expertise of a single researcher. Nevertheless, the gain in a researcher's reputation decreases as the number of co-authors increases. Hence, our model might cause researchers to be more careful when selecting their collaborators, since they should aim at increasing the quality of the papers they produce in such a way that the gain for each author is still larger than the gain she could have obtained by working on the same research problem on her own. As such, adding authors who do not contribute to the quality of the paper will also be discouraged.

\[
R_A(r) =
\begin{cases}
\dfrac{\sum_{p \in pap(r)} \big( \lambda(p)^{\gamma} \cdot R_P(p) + (1 - \lambda(p)^{\gamma}) \cdot 50 \big)}{|pap(r)|} & \text{if } pap(r) \neq \emptyset \\
\bot & \text{otherwise}
\end{cases}
\tag{2}
\]

where pap(r) = {p ∈ P | r ∈ a(p) ∧ R_P(p) ≠ ⊥} denotes the papers authored by a given researcher r, ⊥ describes ignorance, λ(p) = 1/|a(p)| is the coefficient that takes into consideration the number of authors of a paper (recall that a(p) denotes the authors of a paper p), and γ is a tuning factor that controls the rate of decrease of the λ(p) coefficient. Also note the multiplication by 50, which describes ignorance, as 50 is the median of the chosen range [0, 100]. If another range were chosen, the median of that range would be used here. The choice of range and its median does not affect the performance of the model (i.e. the results of the simulations of Section 3 would remain the same).
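Equation 2 can be sketched analogously. The following illustrative Python sketch assumes the λ(p)^γ discount described above (names are illustrative):

    def author_reputation(papers_of_r, paper_rep, n_authors, gamma=1.0):
        """papers_of_r: papers authored by r; paper_rep: p -> R_P(p) or None;
        n_authors: p -> |a(p)|; returns R_A(r) per Equation (2)."""
        pap = [p for p in papers_of_r if paper_rep.get(p) is not None]  # pap(r)
        if not pap:
            return None  # no evaluated papers: ignorance
        total = 0.0
        for p in pap:
            lam = (1.0 / n_authors[p]) ** gamma  # lambda(p)^gamma: discount by co-authors
            total += lam * paper_rep[p] + (1.0 - lam) * 50.0  # 50 = median, i.e. ignorance
        return total / len(pap)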
2.4 Reputation of a Reviewer

Similar to the reputation of authors (Section 2.3), we consider that if a reviewer produces 'good' reviews, then the reviewer is considered a 'reputed' reviewer. Furthermore, we consider that the reputation of a reviewer is essentially an aggregation of the opinions over her reviews.

We assume that opinions on how good a review is can be obtained, in the first instance, from other reviewers that also reviewed the same paper. However, as this is a new feature to be introduced in open access repositories and in conference and journal paper management systems, we believe collecting such information might take some time. An alternative that we consider here is that, in the meantime, we can use the 'similarity' between reviews as a measure of the reviewers' opinions about each other's reviews. In other words, the heuristic could be phrased as 'if my review is similar to yours then I may assume your judgement of my review would be good.'

We write v*(r_i, r_j, p) ∈ E for the 'extended judgement' of r_i over r_j's opinion on paper p, and define it as an aggregation of opinions and similarities as follows:

\[
v^*(r_i, r_j, p) =
\begin{cases}
v(r_i, r_j, p) & \text{if } v(r_i, r_j, p) \neq \bot \\
sim(o(r_i, p), o(r_j, p)) & \text{if } v(r_i, r_j, p) = \bot,\ o(r_i, p) \neq \bot \text{ and } o(r_j, p) \neq \bot \\
\bot & \text{otherwise}
\end{cases}
\tag{3}
\]

where sim is a similarity measure over E; for the range [0, 100] we use the metric similarity sim(e, e') = 100 − |e − e'| (cf. Section 3).

The judgement of researcher r_i over the reviewing capability of r_j, R_R(r_i, r_j), is then the average of the available extended judgements, where V*(r_i, r_j) = {v*(r_i, r_j, p) ≠ ⊥ | p ∈ P}:

\[
R_R(r_i, r_j) =
\begin{cases}
\dfrac{\sum_{v \in V^*(r_i, r_j)} v}{|V^*(r_i, r_j)|} & \text{if } V^*(r_i, r_j) \neq \emptyset \\
\bot & \text{otherwise}
\end{cases}
\tag{4}
\]

Finally, the reputation of a reviewer r, R_R(r), is an aggregation of the judgements that her colleagues make about her capability to produce good reviews. We weight these judgements by the reputation of the colleagues as reviewers:

\[
R_R(r) =
\begin{cases}
\dfrac{\sum_{r_i \in R^*} R_R(r_i) \cdot R_R(r_i, r)}{\sum_{r_i \in R^*} R_R(r_i)} & \text{if } R^* \neq \emptyset \\
50 & \text{otherwise}
\end{cases}
\tag{5}
\]

where R* = {r_i ∈ R | V*(r_i, r) ≠ ∅}. When no judgements have been made over r, we take the value 50 to represent ignorance (as 50 is the median of the chosen range [0, 100]; again, the choice of range and its median does not affect the performance of the model, that is, the results of the simulations of Section 3 would remain the same).

Note that the reputation of a reviewer depends on the reputation of other reviewers. In other words, every time the reputation of one reviewer changes, it triggers a change in the reputation of other reviewers, which might lead to an infinite loop of modifications. We address this by using an algorithm similar to the EigenTrust algorithm, as illustrated in the Appendix. In fact, this algorithm may be considered a variation of the EigenTrust algorithm, and it will require some testing to confirm how fast it converges.
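The following minimal Python sketch illustrates one possible EigenTrust-style iteration for Equation 5 (all names are illustrative; pair_rep stands for the aggregated judgements R_R(r_i, r_j) of Equation 4). Every reviewer starts at 50 (ignorance) and the weighted averages are recomputed until the values stabilise:

    def reviewer_reputations(pair_rep, eps=1e-3, max_iters=1000):
        """pair_rep: dict r_i -> dict r_j -> R_R(r_i, r_j); returns r -> R_R(r)."""
        reviewers = set(pair_rep) | {rj for js in pair_rep.values() for rj in js}
        if not reviewers:
            return {}
        rep = {r: 50.0 for r in reviewers}  # 50 = median of [0, 100]: ignorance
        for _ in range(max_iters):
            new_rep = {}
            for r in reviewers:
                judges = [ri for ri in pair_rep if r in pair_rep[ri]]  # R* of Eq. (5)
                weight = sum(rep[ri] for ri in judges)
                if not judges or weight == 0:
                    new_rep[r] = 50.0  # no judgements over r: ignorance
                else:
                    new_rep[r] = sum(rep[ri] * pair_rep[ri][r] for ri in judges) / weight
            converged = max(abs(new_rep[r] - rep[r]) for r in reviewers) < eps
            rep = new_rep
            if converged:
                break
        return rep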
2.5 Reputation of a Review

The reputation of a review is computed similarly to that of papers, but using judgements instead of opinions. We say the reputation of a review is a weighted aggregation of its judgements, where the weight is the reputation of the judging reviewer (Section 2.4):

\[
R_O(r', p) =
\begin{cases}
\dfrac{\sum_{r \in jud(r', p)} R_R(r) \cdot v^*(r, r', p)}{\sum_{r \in jud(r', p)} R_R(r)} & \text{if } |jud(r', p)| \geq k \\
R_R(r') & \text{otherwise}
\end{cases}
\tag{6}
\]

where jud(r', p) = {r ∈ R | v*(r, r', p) ≠ ⊥} denotes the set of judges of a particular review written by r' on a given paper p.

Note that when a review receives fewer than k judgements, its reputation does not depend on the judgements; instead it inherits the reputation of the author of the review (her reputation as a reviewer).

We currently leave k as a parameter, though we suggest that k > 1, so that the reputation of a review does not depend on a single judge. Again, we recommend small values for k, such as 2 or 3, because we believe it will be difficult to obtain large numbers of judgements.
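Equation 6 follows the same pattern as Equation 1, with the review author's reviewer reputation as the fallback. An illustrative Python sketch (names are illustrative):

    def review_reputation(review_author, xjudgements, reviewer_rep, k=2):
        """xjudgements: judge -> v*(judge, review_author, p) or None; R_O per Equation (6)."""
        judges = [j for j, v in xjudgements.items() if v is not None]  # jud(r', p)
        if len(judges) < k:
            return reviewer_rep[review_author]  # inherit the author's reviewer reputation
        weight = sum(reviewer_rep[j] for j in judges)
        if weight == 0:
            return reviewer_rep[review_author]
        return sum(reviewer_rep[j] * xjudgements[j] for j in judges) / weight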
Figure 1 summarises the dependencies among the reputation measures introduced above. We discuss those in grey (grey rectangles represent reputation measures, whereas the grey oval represents the extended judgements); a sketch of the resulting recalculation cascade follows the list.

[Figure 1: Dependencies — a graph linking opinion, Paper Reputation, Author Reputation, Reviewer Reputation, x-judgement, Review Reputation, and judgement.]

• Author's Reputation. The reputation of an author depends on the reputation of her papers (Equation 2). As such, every time the reputation of one of her papers changes, or every time a new paper is created, the reputation of the author must be recalculated.
• Paper's Reputation. The reputation of a paper depends on the opinions it receives, and on the reputation of the reviewers giving those opinions (Equation 1). As such, every time a paper receives a new opinion, or every time the reputation of one of its reviewers changes, the reputation of the paper must be recalculated.
• Review's Reputation. The reputation of a review depends on the extended judgements it receives, and on the reputation of the reviewers giving those judgements (Equation 6). As such, every time a review receives a new extended judgement, or every time the reputation of one of the judging reviewers changes, the reputation of the review must be recalculated.
• Reviewer's Reputation. The reputation of a reviewer depends on the extended judgements of other reviewers and on their reputation (Equation 5). As such, the reputation of the reviewer should be modified every time there is a new extended judgement or the reputation of one of the reviewers changes. As the reputation of a reviewer depends on the reputation of other reviewers, we suggest calculating the reputation of all reviewers repeatedly (in a manner similar to EigenTrust) until convergence. If this is computationally expensive, it can be computed once a day, as opposed to being triggered by every new extended judgement or change in reviewers' reputation.
• x-judgement. The extended judgement is calculated either based on judgements (if available) or on the similarity between opinions (when judgements are not available) (Equation 3). As such, the extended judgement should be recalculated every time a new (direct) judgement is made, or every time a new opinion is added on a paper which already has opinions by other reviewers.
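As announced above, the following minimal Python sketch illustrates the recalculation cascade implied by this list when a new opinion arrives. The store object and all of its method names are hypothetical stand-ins for the functions of Sections 2.2–2.5:

    def on_new_opinion(store, reviewer, paper):
        """Propagate a new opinion o(reviewer, paper) through the dependency graph."""
        store.recompute_paper(paper)                 # Equation (1)
        for author in store.authors_of(paper):       # Equation (2): papers feed authors
            store.recompute_author(author)
        for other in store.reviewers_of(paper):      # Equation (3): new similarity-based
            if other != reviewer:                    # extended judgements appear
                store.recompute_extended_judgement(other, reviewer, paper)
                store.recompute_review(other, paper)  # Equation (6)
        store.recompute_all_reviewers()              # Equation (5), EigenTrust-style
                                                     # iteration (may be batched daily)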
3 Evaluation through Simulation

3.1 Simulation

To evaluate the effectiveness of the proposed model, we have simulated a community of researchers using NetLogo [8]. We clarify that the focus of this work is not implementing a simulation that models the real world, but a simulation that allows us to verify our model. As such, many of the assumptions we make for this simulation, which will appear shortly, might not be precisely (or always) true in the real world (such as having the true quality of a paper inherit the quality of its best author).

In our simulation, a breed in NetLogo (or a node in the research community's graph) represents either a researcher, a paper, a review, or a judgement. The relations between breeds are: (1) authors of, which specifies which researchers are authors of a given paper; (2) reviewers of, which specifies which researchers are reviewers of a given paper; (3) reviews of, which specifies which reviews give opinions on a given paper; (4) judgements of, which specifies which judgements give opinions on a given review; and (5) judges of, which specifies which researchers have judged which other researchers.

Each researcher has four parameters that describe: (1) her reputation as an author; (2) her reputation as a reviewer; (3) her true research quality; and (4) her true reviewing quality. The first two are calculated by our ARM model, and they evolve over time. The last two describe the researcher's true quality with respect to writing papers and to reviewing papers or other reviews, respectively. In other words, our simulation assumes true qualities exist and that they are constant. In real life, there are no such measures. Furthermore, how good one is at writing papers, writing reviews, or making judgements naturally evolves with time. Nevertheless, we chose to keep the simulation simple by sticking to constant true qualities, as the purpose of the simulation is simply to evaluate the correctness of our ARM model.

Similar to researchers, each paper has two parameters that describe it: (1) its reputation, which is calculated by our ARM model and evolves over time; and (2) its true quality. Again, we assume that a paper's true quality exists. How it is calculated is presented shortly.

Reviews also have two parameters: (1) the opinion provided by the review, which in real life is set by the researcher performing the review, while in our simulation it is calculated by the simulator, as illustrated shortly; and (2) the reputation of the review, which is calculated by our ARM model and evolves over time.

Judgements, on the other hand, have only one parameter: the opinion provided by the judgement, which in real life is set by the researcher judging a review, while in our simulation it is calculated by the simulator, as illustrated shortly.

The simulation starts at time zero with no researchers in the community, and hence no papers, no reviews, and no judgements. Then, with every tick of the simulation, a new paper is created, which may sometimes require the creation of new researchers (either as authors or reviewers). With the new paper, reviews and judgements are also created. How these elements are created is defined next by the simulator's parameters and methods, which drive and control this behaviour. We note that a tick of the simulation does not represent a fixed unit of calendar time, but the creation of one single paper.
The ultimate aim of the evaluation is to investigate how close the calculated reputation values are to the true values: the reputation of a researcher as an author, the reputation of a researcher as a reviewer, and the reputation of a paper.

The parameters and methods that drive and control the evolution of the community of researchers and the evolution of their research work are presented below; a sketch of how opinions and judgements are generated follows the list.

1. Number of authors. Every time a new paper is created, the simulator assigns authors to this paper. How many authors are assigned is defined by the number of authors parameter (#co-authors), which is defined as a Poisson distribution. For every new paper, a random number is generated from this Poisson distribution. Who to assign is chosen randomly from the set of researchers, although sometimes a new researcher is created and assigned to this paper (see the 'researchers birth rate' below). This ensures the number of researchers in the community grows with the number of papers.
2. Number of reviewers. Every time a new paper is created, the simulator also assigns reviewers to this paper. How many reviewers are assigned is defined by the number of reviewers parameter (#reviewers), which is defined as a Poisson distribution. For every new paper, a random number is generated from this Poisson distribution. As above, who to assign is chosen randomly from the set of researchers, although sometimes a new researcher is created and assigned to this paper.
3. Researchers birth rate. As illustrated above, every paper requires authors and reviewers to be assigned to it. When assigning authors and reviewers, the simulation decides whether to assign an already existing researcher (if any) or to create a new researcher. This decision is controlled by the researchers birth rate parameter (birth rate), which specifies the probability of creating a new researcher.
4. Researcher's true research quality. An author's true quality is sampled from a beta distribution specified by the parameters α_A and β_A. We choose the beta distribution because it is a very versatile distribution that can model several different shapes of probability distributions with only two parameters, α and β.
5. Researcher's true review quality. A reviewer's true quality is sampled from a beta distribution specified by the parameters α_R and β_R. Again, the beta distribution is very versatile and can model several different shapes of probability distributions with only two parameters, as illustrated shortly by our experiments.
6. Paper's true quality. We assume that a paper's true quality is the true quality of its best author, that is, the author with the highest true research quality. We believe this assumption has some grounding in real life. For instance, some behaviour (such as looking for future collaborators or selecting whom to fund) assumes researchers to be of a certain quality, and their research work to follow that quality respectively.
7. Opinion of a review. The opinion presented by a review is specified as the paper's true quality plus some noise, where the noise depends on the reviewer's true quality. This noise is chosen randomly from the range [−(100 − review_quality)/2, +(100 − review_quality)/2]. In other words, the maximum noise that can be added for the worst reviewer (whose review quality is 0) is ±50, and the least noise that can be added for the best reviewer (whose review quality is 100) is 0.
8. Opinion of a judgement. The value (or opinion) of a judgement on a review is calculated as the similarity between the review's value (opinion) and the judge's review value (opinion), where the similarity is defined by the metric distance as: 100 − |review − judge's review|. Note that, for simplification, direct judgements have not been simulated; we rely only on indirect judgements.
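The following minimal Python sketch illustrates how the simulator generates data for one tick. It is not the authors' NetLogo code; the names and the clamping of opinions to [0, 100] are illustrative assumptions:

    import random

    def sample_quality(alpha, beta):
        return 100 * random.betavariate(alpha, beta)  # true quality in [0, 100]

    def make_opinion(paper_true_quality, reviewer_true_quality):
        half = (100 - reviewer_true_quality) / 2  # item 7: worst reviewer ±50, best 0
        noisy = paper_true_quality + random.uniform(-half, half)
        return min(100.0, max(0.0, noisy))  # clamped to the evaluation space (assumption)

    def make_judgement(review_value, judges_review_value):
        return 100 - abs(review_value - judges_review_value)  # item 8: metric similarity

    # One illustrative tick: a paper by 2 authors, reviewed by 3 reviewers.
    author_qualities = [sample_quality(1, 1) for _ in range(2)]    # alpha_A = beta_A = 1
    reviewer_qualities = [sample_quality(5, 1) for _ in range(3)]  # e.g. alpha_R=5, beta_R=1
    paper_quality = max(author_qualities)                          # item 6: best author
    opinions = [make_opinion(paper_quality, q) for q in reviewer_qualities]
    judgements = [make_judgement(opinions[0], o) for o in opinions[1:]]  # judging review 0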
3.2 Results

3.2.1 Experiment 1: The impact of the community's quality of reviewers

Given the above, we ran the simulator for 100 ticks (generating 100 papers). We ran the experiment over 6 different cases. In each, we had the following parameters fixed:

• #co-authors = 2
• #reviewers = 3
• birth rate = 3
• α_A = β_A = 1
• k = 3 (of Equations 1 and 6)
• γ = 1 (of Equation 2)

The only parameters that changed were those defining the beta distribution of the reviewers' qualities. This experiment illustrates the impact of the community's quality of reviewers on the correctness of the ARM model.

The results of the simulation are presented in Figure 2. For each case, the distribution of the reviewers' true quality is illustrated to the right of the results. The results, in numbers, are also presented in Table 1.

                            Error in      Error in      Error in
                            Reviewers'    Papers'       Authors'
                            Reputation    Reputation    Reputation
    α_R = 5,   β_R = 1      ~11%          ~2%           ~22%
    α_R = 2,   β_R = 1      ~23%          ~5%           ~23%
    α_R = 1,   β_R = 1      ~30%          ~7%           ~23%
    α_R = 0.1, β_R = 0.1    ~34%          ~5%           ~22%
    α_R = 1,   β_R = 2      ~44%          ~8%           ~23%
    α_R = 1,   β_R = 5      ~60%          ~9%           ~20%

    Table 1: The results of experiment 1, in numbers

We notice that the smallest error occurs when the reviewers are all of relatively good quality, with the majority being great reviewers (Figure 2e). The errors start increasing as bad reviewers are added to the community (Figure 2c). They increase even further in both cases when the quality of reviewers follows a uniform distribution (Figure 2a), as well as when the reviewers are equiprobably good or bad, with no average reviewers (Figure 2b). As soon as the majority of reviewers are of poor quality (Figure 2d), the errors increase even further, with the worst case being when good reviewers are absent from the community (Figure 2f). These results are not surprising. A paper's true quality is not something that can be measured, or even agreed upon. As such, the trust model depends on the opinions of other researchers. As a result, the better the reviewing quality of researchers, the more accurate the trust model will be, and vice versa.

The numbers of Table 1 illustrate how the error in the papers' reputation increases with the error in the reviewers' reputation, though at a smaller rate. One curious thing about these results is the constant error in the reputation of authors. The next experiment investigates this issue.

Last, but not least, we note that the error is usually stable. This is because every time a paper is created, all the reviews it receives and the judgements those reviews receive are created at the same simulation time-step. In other words, it is not the case that papers accumulate more reviews and judgements over time, which could make the error decrease over time.
3.2.2 Experiment 2: The impact of co-authorship

In the second experiment, we investigate the impact of co-authorship on authors' reputation. We choose the two extreme cases from experiment 1: when there are only relatively good reviewers in the community (α_R = 5 and β_R = 1), and when there are only relatively bad reviewers in the community (α_R = 1 and β_R = 5). For each of these cases, we then change the number of co-authors, investigating three cases: #co-authors = {0, 1, 2}. All other parameters remain set to those presented in experiment 1 above.

The results of this experiment are presented in Figure 3, and the numbers in Table 2. The results show that the error in the reviewers' and papers' reputation hardly changes for different numbers of co-authors. However, the error in the reputation of authors does. When there are no co-authors (#co-authors = 0), the error in authors' reputation is almost equal to the error in papers' reputation (Figures 3a and 3b). As soon as 1 co-author is added (#co-authors = 1), the error in authors' reputation increases (Figures 3c and 3d). When 2 co-authors are added (#co-authors = 2), the error in authors' reputation reaches its maximum, around 20–22% (Figures 3e and 3f). In fact, unreported results show that the error in authors' reputation is almost the same in all cases for #co-authors ≥ 2.

                        Error in Reviewers'        Error in Papers'           Error in Authors'
                        Reputation                 Reputation                 Reputation
                        α_R=5,β_R=1  α_R=1,β_R=5   α_R=5,β_R=1  α_R=1,β_R=5   α_R=5,β_R=1  α_R=1,β_R=5
    #co-authors = 0     ~13%         ~54%          ~3%          ~9%           ~2%          ~7%
    #co-authors = 1     ~13%         ~57%          ~3%          ~9%           ~12%         ~15%
    #co-authors = 2     ~11%         ~60%          ~2%          ~9%           ~22%         ~20%

    Table 2: The results of experiment 2, in numbers
[Figure 2: The impact of reviewers' quality on reputation measures. Panels: (a) α_R = 1 and β_R = 1; (b) α_R = 0.1 and β_R = 0.1; (c) α_R = 2 and β_R = 1; (d) α_R = 1 and β_R = 2; (e) α_R = 5 and β_R = 1; (f) α_R = 1 and β_R = 5. For each set of results, the distribution of the reviewers' true quality is presented to the right of the results.]
[Figure 3: The impact of co-authorship on the reputation of authors. Panels: (a) α_R = 5, β_R = 1, and #co-authors = 0; (b) α_R = 1, β_R = 5, and #co-authors = 0; (c) α_R = 5, β_R = 1, and #co-authors = 1; (d) α_R = 1, β_R = 5, and #co-authors = 1; (e) α_R = 5, β_R = 1, and #co-authors = 2; (f) α_R = 1, β_R = 5, and #co-authors = 2. For each set of results, the distribution of the reviewers' true quality is presented to the right of the results.]
4 Conclusion

We have presented the ARM reputation model for the academic world. ARM helps calculate the reputation of researchers, both as authors and as reviewers, and of their research work. Additionally, ARM also calculates the reputation of reviews.

Concerning the reputation of authors, the most commonly used reputation measure is currently the h-index [4]. However, the h-index has its flaws. For instance, the h-index can be manipulated through self-citations [1, 3]. A study has also found that the h-index does not provide a significantly more accurate measure of impact than the total number of citations [9]. ARM, on the other hand, bases the reputation of authors on the opinions that their papers receive from other members of their academic community. We believe this should be a more accurate approach, though future work should aim at comparing both approaches.

Concerning the reputation of papers, the most common measure currently used is the total number of citations a paper receives. Again, this measure can easily be manipulated through self-citations. [7] presents an alternative approach based on the propagation of opinions in structural graphs. It allows papers to build reputation either from the direct reviews they receive, or by inheriting reputation from the place where the paper is published. In fact, a sophisticated propagation model is proposed to allow reputation to propagate upwards as well as downwards in structural graphs (e.g. from a section to a chapter to a book, and vice versa). Simulations presented in [6] illustrate the potential impact of this model. ARM does not have any notion of propagation. The model is strictly based on direct opinions (reviews and judgements), and when no opinions are present, ignorance is assumed (as in the default reputation of authors and papers).

Concerning the reputation of reviews and reviewers, to our knowledge, these reputation measures have not been addressed yet. Nevertheless, we believe they are important measures. Conference management systems are witnessing a massive increase in paper submissions, and in many disciplines finding good reviewers is becoming a challenging task. Deciding which papers to accept or reject is sometimes a challenge for conference and workshop organisers. ARM is a reputation model that addresses this issue by helping recognise the good reviews and reviewers from the bad.

The obvious next step for ARM is applying it to a real dataset. In fact, the model is currently being integrated with two Spanish repositories: DIGITAL.CSIC (https://digital.csic.es) and e-IEO (http://www.repositorio.ieo.es/e-ieo/). However, these repositories do not have any opinions or judgements yet, and as such, time is needed to start collecting this data. We are also working with the IJCAI 2017 conference (http://ijcai-17.org) in order to allow reviewers to review each other. We will collect the data of this conference, which will provide us with the reviews and judgements needed for evaluating our model. We will also continue to look through existing datasets.

Future work can investigate a number of additional issues. For instance, we plan to provide data on the convergence performance of the algorithm. One can also study the different types of attacks that could impact the proposed computational model. While the similarity of reviews is currently computed based on the similarity of the quantitative opinions, the similarity between qualitative opinions may also be used in future work by making use of natural language processing techniques. Also, while we argue that direct opinions can help the model avoid the pitfalls of the literature, it is also true that direct opinions are usually scarce. As such, if needed, other information sources for opinions may also be considered, such as citations. This information can be translated into opinions, and the equations of ARM should then change to give more weight to direct opinions than to other information sources.

ACKNOWLEDGEMENTS

This work has been supported by CollectiveMind (a project funded by the Spanish Ministry of Economy & Competitiveness (MINECO), grant number TEC2013-49430-EXP), and by the Open Peer Review Module for Repositories (a project funded by OpenAIRE, which is in turn an EU funded project).

REFERENCES

[1] Christoph Bartneck and Servaas Kokkelmans, 'Detecting h-index manipulation through self-citation analysis', Scientometrics, 87(1), 85–98, (2010).
[2] Gunther Eysenbach, 'Citation advantage of open access articles', PLoS Biology, 4(5), e157, (2006).
[3] Emilio Ferrara and Alfonso E. Romero, 'Scientific impact evaluation and the effect of self-citations: Mitigating the bias by discounting the h-index', Journal of the American Society for Information Science and Technology, 64(11), 2332–2339, (2013).
[4] J. E. Hirsch, 'An index to quantify an individual's scientific research output', Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–16572, (2005).
[5] Nikolaus Kriegeskorte and Diana Deca, eds., Beyond Open Access: Visions for Open Evaluation of Scientific Papers by Post-Publication Peer Review, Frontiers in Computational Neuroscience, Frontiers E-books, November 2012.
[6] Nardine Osman, Jordi Sabater-Mir, Carles Sierra, and Jordi Madrenas-Ciurana, 'Simulating research behaviour', in Proceedings of the 12th International Conference on Multi-Agent-Based Simulation, MABS'11, pp. 15–30, Berlin, Heidelberg, (2012). Springer-Verlag.
[7] Nardine Osman, Carles Sierra, and Jordi Sabater-Mir, 'Propagation of opinions in structural graphs', in Proceedings of the 19th European Conference on Artificial Intelligence (ECAI 2010), pp. 595–600, Amsterdam, The Netherlands, (2010). IOS Press.
[8] Seth Tisue and Uri Wilensky, 'NetLogo: Design and implementation of a multi-agent modeling environment', in Proceedings of the Agent Conference, pp. 161–184, (2004).
[9] Alexander Yong, 'Critique of Hirsch's citation index: A combinatorial Fermi problem', Notices of the American Mathematical Society, 61(11), 1040–1050, (2014).