A Crowd-Powered Model for Identifying Negative Citations
Sujoy Chatterjee
I3S Laboratory
University of Nice Sophia-Antipolis
06903 Sophia Antipolis, France
Email: sujoy.2611@gmail.com

Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
In academia, authors are usually ranked through quality metrics such as the h-index and the i10-index, which are based on the number of citations received. However, it is already established that not all citations received by a paper are equal. The main distinction lies in their sentiment, as a citation can either endorse or criticize the cited paper. Keyword-based NLP techniques have recently been proposed to track these sentiments, but certain cases still require human perception to recognize them. Therefore, the problem of identifying the sentiment of citations (positive, negative, or neutral) can be resolved effectively if it is outsourced to the crowd and their feedback is collected. In this paper, we introduce a crowdsourcing-based semi-supervised model that can be effective in finding negative citations, and we provide some insights into building an efficient research paper recommender system by utilizing the immense power of the crowd.

KEYWORDS
Crowdsourcing, Recommendation, Markov chain

1    INTRODUCTION
The count of citations in academia is considered to have a major impact in evaluating the credentials of published research. The number of citations can play a major role for academic institutions in securing a better ranking, obtaining research grants, etc. In most situations, the citations coveted and received by authors take the form of a compliment. A citing study may be consistent with the previous work, yet by pointing out its flaws, limitations, etc., it can seriously affect the future citations that the cited work receives. As a consequence, criticism obtained through citation may cause falsification of citation [2]. A paper needs to be observed for a few years after its publication. As the technology and science incorporated in it are potentially brand new, they should be tested during those years. Therefore, understanding the sentiment of its citations (i.e., positive or negative) is crucial during those first few years, and this sentiment needs to be tracked for the further growth of science and research.

Recent studies observe that the incidence of negative citations is low but cannot be neglected, as mentioned by Alexander Oettl, an economist at the Georgia Institute of Technology in Atlanta. This study examined 750,000 citations (mainly for 150,000 papers) from a particular journal, namely the “Journal of Immunology” [2]. In this experiment, the expertise of immunologists was taken into consideration to manually check the number of negative citations of these papers. A line of research has already addressed the nature of citations through the effective use of NLP and manual expert annotation [1, 7, 8]. However, it is not always possible to retrieve the exact sentiment of a citation by using an NLP tool. For example, a piece of research can have many limitations and yet be the pioneering work, so there should be a trade-off in judging the exact sentiment. On the other hand, expert manual annotation is very time-consuming and costly. Therefore, the task of recognizing negative aspects of papers can be outsourced to general people with little expertise. As crowdsourcing can play a major role in solving a large task independently in a distributed manner, leveraging the power of human resources to quantify citations can be very helpful for proper decision making in a time- and cost-effective way [4–6]. Basically, citations can be categorized as positive, negative, or neutral. Therefore, the feedback set contains these three options, and crowd opinions are collected from the workers based on their own perspectives. Finally, the decision can be aggregated from multiple crowd opinions. As malicious crowd workers may be involved, a 2-stage annotation process (in an independent and a dependent manner) can be a reliable way to identify the efficient crowd workers.

In this model, the research papers are segregated into various sections by a keyword-based NLP tool, and the different portions are outsourced to the crowd to obtain their feedback. At this stage, no one can observe others' opinions, so these are independent opinions (as shown in Fig. 1). After their opinions are collected, all the opinions are revealed to the workers, and their opinions are collected again (as demonstrated in Fig. 2). Thus, the opinions collected in the second phase are the dependent opinions of the crowd workers, in a similar way as discussed in [3]. The challenge therefore lies in obtaining the final sentiment of a citation from these independent and dependent sets of feedback. We propose a Markov chain based model that can be utilized for reaching a consensus sentiment from a set of independent and dependent opinions.

2    PROPOSED MODEL
Here we introduce a crowdsourcing model that outsources research papers to the crowd and collects the sentiments of the citations from them. However, due to the existence of malicious crowd workers, several measures need to be adopted in order to produce a noise-free
decision. In this model, the opinions are of two types, independent and dependent. Observing how the workers' opinions change from the independent to the dependent situation is crucial for quantifying the transition. The major challenge here is how to define different quality metrics to identify the expert workers. We can take into account the confidence gap (deviation from the independent score to the dependent score), reliability (closeness to the majority opinion), and accuracy (closeness to the mean opinion) of the crowd workers. In addition, we measure the deviation of each worker's opinion from the mean of all the posterior opinions, considering the question difficulty. Finally, these metrics are used to compute a weighted transition matrix of the Markov chain. We start with any initial distribution vector over the option set having options ‘Positive’, ‘Negative’, and ‘Neutral’. The distribution is updated by multiplying this vector by the transition matrix, and the multiplication is repeated. Ultimately, the distribution converges to the stationary distribution after a certain number of iterations. The option for which the stationary distribution is maximum is treated as the final option. Thus, this type of crowd-powered system can be very helpful for a preliminary understanding of the sentiment of the papers.

Figure 1: Snapshot of collecting independent opinions. (Interface: a "Comment:" box, the prompt "Sentiment of this citation can be Positive, Negative or Neutral. What is your opinion?", and the options Positive, Negative, Neutral.)

Figure 2: Snapshot of collecting dependent opinions. (Same interface, additionally showing "(40% say Positive, 55% say Negative, 5% say Neutral)" and asking "What is your revised opinion?")

Figure 3: Snapshot of the workflow to compute weighted transition matrix. Here, the options are ‘Positive’, ‘Negative’ and ‘Neutral’. (Workflow: each worker W1, ..., Wk gives an independent and a dependent opinion, e.g. W1: Positive to Negative, W2: Neutral to Positive, W3: Positive to Positive, Wk: Positive to Negative. The probability of each opinion is 1/k; out of k annotators, n change their scores, labelled "Accept ---> Reject" in the figure; a sum over i = 1, ..., n gives, for each pair of scores i and j, the value in the transition matrix M of order 3x3.)

Along with this, an efficient user interface needs to be designed to attract crowd workers and solicit their opinions. As the opinions are obtained in two phases, an effective mechanism should be designed to arouse the workers' curiosity so that they provide their best possible answers. Moreover, as the count of negative citations is very low, this class imbalance should also be taken into account. Again, the convergence property of the Markov chain implies that the oscillation of the crowd workers' opinions becomes stable after a certain number of iterations. Furthermore, this methodology can be easily integrated with a research paper recommender system.

Over the last decade, research paper recommender systems have emerged as a mainstream research area for finding relevant research papers in an efficient way. However, most of the works in this area deal with aspects such as citation analysis, rating, author collaboration, recency, etc. Only limited work concerning the negative citations of papers is available in the literature. Moreover, it is not feasible to continuously monitor the quality of citations as they are received. However, this can be easily done by voluntary crowdsourcing in a cost-efficient manner. Due to the presence of non-experts in the crowd, an effective mechanism should be designed with the aim of extracting better opinions from them. Thus, the proposed model has a major impact not only on developing an efficient research paper recommender system but also on opening various new avenues in this domain by incorporating these vast human resources.

REFERENCES
[1] X. Bai, I. Lee, Z. Ning, A. Tolba, and F. Xia. 2017. The Role of Positive and Negative Citations in Scientific Evaluation. IEEE Access 5 (2017), 17607–17616.
[2] C. Catalini, N. Lacetera, and A. Oettl. 2015. The incidence and role of negative citations in science. Proceedings of the National Academy of Sciences 112, 45 (2015), 13823–13826.
[3] S. Chatterjee, A. Mukhopadhyay, and M. Bhattacharyya. 2017. Dependent Judgment Analysis: A Markov Chain based Approach for Aggregating Crowdsourced Opinions. Information Sciences 386 (2017), 83–96.
[4] G. Demartini, D. E. Difallah, and P. Cudré-Mauroux. 2012. ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In Proceedings of the 21st International Conference on World Wide Web. Lyon, France, 469–478.
[5] D. Hovy, T. Berg-Kirkpatrick, A. Vaswani, and E. Hovy. 2013. Learning Whom to Trust with MACE. In Proceedings of the NAACL-HLT. Atlanta, Georgia, 1120–1130.
[6] V. C. Raykar and S. Yu. 2012. Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks. Journal of Machine Learning Research 13 (2012), 491–518.
[7] J. Tang, X. Hu, and H. Liu. 2014. Is distrust the negation of trust?: The value of distrust in social media. In HT 2014 - Proceedings of the 25th ACM Conference on Hypertext and Social Media. Association for Computing Machinery, 148–157.
[8] C. H. Vinkers, J. K. Tijdink, and W. M. Otte. 2015. Use of positive and negative words in scientific PubMed abstracts between 1974 and 2014: retrospective analysis. BMJ 351 (2015).
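
The aggregation step described in Section 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `transition_matrix` and `consensus`, the optional per-worker `weights` (standing in for the quality metrics such as confidence gap, reliability, and accuracy), the small `smoothing` constant, and the uniform starting vector are all assumptions made here for illustration. The sketch counts how workers move from their independent to their dependent opinion, row-normalizes the counts into a 3x3 transition matrix, and repeatedly multiplies a distribution by that matrix until it stops changing.

```python
OPTIONS = ("Positive", "Negative", "Neutral")


def transition_matrix(pairs, weights=None, smoothing=1e-3):
    """Build a row-stochastic 3x3 matrix from (independent, dependent) pairs.

    pairs     -- one (independent_label, dependent_label) tuple per worker
    weights   -- hypothetical per-worker quality weights (default: all 1.0)
    smoothing -- assumed small constant that keeps every entry positive,
                 so the chain has a unique stationary distribution
    """
    n = len(OPTIONS)
    index = {option: i for i, option in enumerate(OPTIONS)}
    if weights is None:
        weights = [1.0] * len(pairs)
    counts = [[smoothing] * n for _ in range(n)]
    for (independent, dependent), w in zip(pairs, weights):
        counts[index[independent]][index[dependent]] += w
    # Normalize each row so that it sums to 1.
    return [[c / sum(row) for c in row] for row in counts]


def consensus(matrix, tol=1e-10, max_iter=10_000):
    """Iterate dist <- dist * matrix from a uniform start until it stabilizes.

    Returns the option with the largest probability and the distribution.
    """
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        change = max(abs(a - b) for a, b in zip(new, dist))
        dist = new
        if change < tol:
            break
    best = max(range(n), key=dist.__getitem__)
    return OPTIONS[best], dist
```

A call such as `consensus(transition_matrix(pairs))`, where `pairs` lists each worker's independent and dependent label, returns the consensus label together with the converged distribution. Passing unequal `weights` is one possible way to let more reliable workers contribute more to the transition counts; how those weights should be derived from the quality metrics is left open here.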