A Crowd-Powered Model for Identifying Negative Citations

Sujoy Chatterjee
I3S Laboratory, University of Nice Sophia-Antipolis
06903 Sophia Antipolis, France
Email: sujoy.2611@gmail.com

ABSTRACT

In academia, authors are usually ranked through quality metrics such as the h-index and the i10-index, which are based on the number of citations received. However, it is already established that not all citations received by a paper are equal. The sentiment of a citation differs because a citation can be made from two perspectives, i.e., endorsement or criticism of the paper. Keyword-based NLP techniques have recently been proposed to track these sentiments, but certain cases still require human perception to recognize them. Therefore, the problem of identifying the sentiment of citations (positive, negative, or neutral) can be resolved effectively if it is outsourced to the crowd and their feedback is collected. In this paper, we introduce a crowdsourcing-based semi-supervised model that can be effective in finding negative citations, and we provide some insights for building an efficient research paper recommender system by utilizing the immense power of the crowd.

KEYWORDS

Crowdsourcing, Recommendation, Markov chain

Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION

The citation count is considered to have a major impact in evaluating the credentials of a piece of research. The number of citations can play a major role for an academic institution in securing a better ranking, obtaining research grants, etc. In most situations, the citations coveted and received by authors take the form of a compliment. A citing study may be consistent with the previous work, but if it points out the flaws and limitations of that work, the consequences can be serious and may become a critical issue for receiving future citations. As a consequence, criticism obtained through citation may cause falsification of citation [2]. A paper needs to be observed for the first few years after its publication. As the technology and science incorporated in it are potentially brand new, they should be tested during those years. Therefore, understanding the sentiment of citations (i.e., positive or negative) is crucial during those years, and the sentiment needs to be tracked for the further growth of science and research.

Recent studies observe that the incidence of negative citations is low, but that it cannot be neglected, as mentioned by Alexander Oettl, an economist at the Georgia Institute of Technology in Atlanta. That study examined about 750,000 citations (mainly for 150,000 papers) for a particular journal, namely the Journal of Immunology [2]. In that experiment, the expertise of immunologists was used to manually check the number of negative citations of these papers. A line of research has already addressed the nature of citations through the use of NLP and manual expert annotation [1, 7, 8]. However, it is not always possible to retrieve the exact sentiment of a citation with an NLP tool. For example, a piece of research can have many limitations and still be the pioneering work, so there should be a trade-off in judging the exact sentiment. On the other hand, expert manual annotation is very costly in both time and money. Therefore, the task of recognizing negative remarks about papers can be outsourced to general people with little expertise. As crowdsourcing can play a major role in solving a large task independently and in a distributed manner, leveraging human resources to quantify citations can be very helpful for proper decision making in a time- and cost-effective way [4–6]. Citations can be quantified as positive, negative, or neutral. Therefore, the feedback set contains these three options, and opinions are collected from crowd workers based on their own perspectives. Finally, the decision can be aggregated from the multiple crowd opinions. As malicious crowd workers may be involved, a 2-stage annotation process (independent and then dependent) can be a reliable way to identify the efficient crowd workers.

In this model, the research papers are segregated into various sections by a keyword-based NLP tool, and the different portions are outsourced to the crowd to obtain their feedback. At this stage, no worker can observe the others' opinions, so these are independent opinions (as shown in Fig. 1). After the opinions are collected, all of them are revealed to the workers, and their opinions are collected again (as demonstrated in Fig. 2). The opinions collected in this second phase are thus the dependent opinions of the crowd workers, in a similar way as discussed in [3]. The challenge lies in obtaining the final sentiment of a citation from these independent and dependent sets of feedback. We propose a Markov chain based model that can be utilized for reaching a consensus sentiment from a set of independent and dependent opinions.

[Figure 1 interface text: "Comment: Sentiment of this citation can be Positive, Negative or Neutral. What is your opinion?" Options: Positive, Negative, Neutral.]
Figure 1: Snapshot of collecting independent opinions.

[Figure 2 interface text: "Comment: Sentiment of this citation can be Positive, Negative or Neutral. (40% say Positive, 55% say Negative, 5% say Neutral) What is your revised opinion?" Options: Positive, Negative, Neutral.]
Figure 2: Snapshot of collecting dependent opinions.

2 PROPOSED MODEL

Here we introduce a crowdsourcing model that outsources research papers to the crowd and collects the sentiment of each citation from them. However, due to the existence of malicious crowd workers, several measures need to be adopted in order to produce a noise-free decision. In this model, the opinions are of two types, independent and dependent. Observing how the workers' opinions change from the independent to the dependent setting is crucial for quantifying the transition. The major challenge here is how to define quality metrics that identify the expert workers. We can take into account the confidence gap (deviation from the independent score to the dependent score), the reliability (closeness to the majority opinion), and the accuracy (closeness to the mean opinion) of the crowd workers. In addition, we measure the deviation of each worker's opinion from the mean of all the posterior opinions, considering the question difficulty. Finally, these metrics are used to compute a weighted transition matrix of the Markov chain.

[Figure 3 content: a table of independent and dependent opinions per worker, leading to a transition matrix M of order 3x3.

Worker   Independent opinion   Dependent opinion
W1       Positive              Negative
W2       Neutral               Positive
W3       Positive              Positive
Wk       Positive              Negative

Annotations in the figure: "Probability of each opinion = 1/k"; "Out of K annotators, n are changing their scores (Accept ---> Reject)"; "For each pair of score i and j, value in transition matrix M", shown with a summation from i = 1 to n.]
Figure 3: Snapshot of the workflow to compute the weighted transition matrix. Here, the options are ‘Positive’, ‘Negative’ and ‘Neutral’.

We start with an arbitrary initial distribution vector over the option set, which has the options ‘Positive’, ‘Negative’, and ‘Neutral’. The distribution over the option set is updated by multiplying this vector with the transition matrix. After a certain number of iterations, it converges to the stationary distribution. The option for which the stationary distribution is maximum is treated as the final option. This type of crowd-powered system can thus be very helpful for a preliminary understanding of the sentiment towards the papers.

Along with this, an efficient user interface needs to be designed to attract crowd workers and solicit their opinions. As the opinions are obtained in two phases, an effective mechanism should be designed to arouse the curiosity of the crowd workers so that they provide their best possible answer. Moreover, as the count of negative citations is very low, the class imbalance should also be taken into account. Again, the convergence property of the Markov chain indicates that the oscillation of the crowd workers' opinions becomes stable after a certain number of iterations. On the other hand, this methodology can be easily integrated with a research paper recommender system.

Over the last decade, research paper recommendation has emerged as a mainstream research area concerned with finding relevant research papers in an efficient way. However, most of the work in this area deals with aspects such as citation analysis, rating, author collaboration, and recency. Only limited work concerning the negative citation of papers is available in the literature. Again, it is not feasible to continuously monitor the quality of citations as they are received. However, this can be done easily through voluntary crowdsourcing in a cost-efficient manner. Due to the presence of non-experts in the crowd, an effective mechanism should be designed with the aim of extracting better opinions from them. The proposed model thus not only has a major impact on developing an efficient research paper recommender system, but also opens various new avenues in this domain by incorporating these vast human resources.

REFERENCES

[1] X. Bai, I. Lee, Z. Ning, A. Tolba, and F. Xia. 2017. The Role of Positive and Negative Citations in Scientific Evaluation. IEEE Access 5 (2017), 17607–17616.
[2] C. Catalini, N. Lacetera, and A. Oettl. 2015. The incidence and role of negative citations in science. Proceedings of the National Academy of Sciences 112, 45 (2015), 13823–13826.
[3] S. Chatterjee, A. Mukhopadhyay, and M. Bhattacharyya. 2017. Dependent Judgment Analysis: A Markov Chain based Approach for Aggregating Crowdsourced Opinions. Information Sciences 386 (2017), 83–96.
[4] G. Demartini, D. E. Difallah, and P. Cudré-Mauroux. 2012. ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In Proceedings of the 21st International Conference on World Wide Web. Lyon, France, 469–478.
[5] D. Hovy, T. Berg-Kirkpatrick, A. Vaswani, and E. Hovy. 2013. Learning Whom to Trust with MACE. In Proceedings of the NAACL-HLT. Atlanta, Georgia, 1120–1130.
[6] V. C. Raykar and S. Yu. 2012. Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks. Journal of Machine Learning Research 13 (2012), 491–518.
[7] J. Tang, X. Hu, and H. Liu. 2014. Is distrust the negation of trust?: The value of distrust in social media. In HT 2014 - Proceedings of the 25th ACM Conference on Hypertext and Social Media. Association for Computing Machinery, 148–157.
[8] C. H. Vinkers, J. K. Tijdink, and W. M. Otte. 2015. Use of positive and negative words in scientific PubMed abstracts between 1974 and 2014: retrospective analysis. BMJ 351 (2015).
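
As an illustration of the aggregation step described in Section 2, the following minimal Python sketch builds a weighted transition matrix from paired independent and dependent opinions and iterates a distribution over the three options until it stabilizes. It is only a sketch under stated assumptions, not the implementation used in this work: the function names, the single per-worker `weights` value (standing in for a combination of confidence gap, reliability, and accuracy, whose exact formula is not given here), the uniform starting vector, and the small `smoothing` constant (added so that every row is well defined and the chain converges) are all hypothetical choices.

```python
OPTIONS = ["Positive", "Negative", "Neutral"]


def transition_matrix(independent, dependent, weights, smoothing=1e-3):
    """Build a row-stochastic 3x3 matrix from paired opinions.

    Entry [i][j] accumulates the weight of every worker whose opinion
    moved from option i (independent phase) to option j (dependent phase).
    The per-worker weights and the smoothing term are assumptions.
    """
    n = len(OPTIONS)
    index = {name: k for k, name in enumerate(OPTIONS)}
    matrix = [[smoothing] * n for _ in range(n)]
    for first, second, weight in zip(independent, dependent, weights):
        matrix[index[first]][index[second]] += weight
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total for value in row])
    return normalized


def stationary_distribution(matrix, tol=1e-10, max_iter=10_000):
    """Repeatedly multiply a distribution by the matrix until it stabilizes."""
    n = len(matrix)
    dist = [1.0 / n] * n  # arbitrary starting vector (uniform here)
    for _ in range(max_iter):
        nxt = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist


def consensus(independent, dependent, weights):
    """Return the option with the largest stationary probability."""
    matrix = transition_matrix(independent, dependent, weights)
    dist = stationary_distribution(matrix)
    best = max(range(len(OPTIONS)), key=dist.__getitem__)
    return OPTIONS[best], dist


if __name__ == "__main__":
    # Five hypothetical workers with equal weights.
    first_round = ["Positive", "Positive", "Neutral", "Negative", "Positive"]
    second_round = ["Negative", "Negative", "Negative", "Negative", "Positive"]
    label, dist = consensus(first_round, second_round, [1.0] * 5)
    print(label, [round(p, 4) for p in dist])
```

In this sketch a worker judged unreliable would simply receive a smaller weight, so that the transitions reported by that worker contribute less to the matrix; how such a weight should be derived from the quality metrics is left open.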