=Paper= {{Paper |id=Vol-2791/2020200007 |storemode=property |title=The Reference Analysis as a Quality Characteristic of a Scientific Article |pdfUrl=https://ceur-ws.org/Vol-2791/2020200007.pdf |volume=Vol-2791 |authors=Yulia Hlavcheva,Olga Kanishcheva }} ==The Reference Analysis as a Quality Characteristic of a Scientific Article== https://ceur-ws.org/Vol-2791/2020200007.pdf
                     The Reference Analysis as a Quality Characteristic of a
                                      Scientific Article


                                          Yulia Hlavcheva1, and Olga Kanishcheva1
                           1 National Technical University “Kharkiv Polytechnic Institute”, Kharkiv, Ukraine

                                       (glavjul, kanichshevaolga)@gmail.com




                         Abstract. The qualitative characteristics of scientific documents are becoming
                         increasingly relevant because the large number of conferences, seminars, and
                         journals generates a huge number of scientific papers. A scientific paper
                         consists of such elements as the title, information about the authors, abstract,
                         keywords, body, and a list of references. References are one of the important
                         factors that affect paper quality. In this paper, the authors analyze the
                         qualitative characteristics of the cited sources and highlight the formal features
                         that characterize the quality of the references. Ukrainian scientific papers were
                         used as data for the experiments. The authors have developed an application that
                         allows the reviewer to analyze the list of paper references, and an approach to
                         the analysis of the bibliography that identifies sources that may not be relevant
                         to the research topic. This approach makes it possible to detect an artificial
                         increase in the number of irrelevant references.


                         Keywords. Article Quality, Citation Analysis, References, Academic Plagia-
                         rism, Academic Integrity.


                 1       Introduction

                 The scientific activity assessment is based on the analysis of information flows, which
                 are presented as documents. The quality of these documents affects the evaluation
                 quality, therefore the task of developing the indicator and methods for assessing the
                 quality of scientific documents is very timely and relevant.
                    This paper discusses how quality is interpreted and how it is measured. Research
                 quality is a multidimensional concept, where plausibility/soundness, originality, sci-
                 entific value, and societal value commonly are perceived as key characteristics [1].
                    According to the final report of the European project "European Educational Re-
                 search Quality Indicators", a separate project area is the development and testing of
                 internal and external quality indicators [2]. Internal quality indicators can be identi-
                 fied from the text and external quality indicators are metadata, bibliometric and/or




Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


webometric information. The indicators are rigour; originality; significance (for
other researchers, policy, and practice); integrity (considerations of authenticity,
honesty, and ethical requirements in the conduct of research); and style (including
clarity, communicability, eloquence, and elegance).
    In our opinion, the rigour, originality, and integrity indicators depend to a certain
extent on the quality of the scientific sources studied. Therefore, citations and the
reference list can also be considered indicators of document quality. The authors of [3]
confirm that citation influences not only other paper characteristics but also the
document as a whole.


2      Background

A scientific paper consists of such elements as title, information about the authors,
abstract, keywords, body and list of references. We focus on the study of the scientific
reference list.
   Citation is an essential component of any scientific work and one of the important
means of scientific communication. In scientific publications, citation can be consid-
ered in various aspects. Citation is used to solve problems in many directions. Some
of them are presented in Table 1.

                           Table 1. Research directions of citation.

Direction                                                              Publications
Assessment of scientific results for scientific groups, departments,   [4], [5], [6], [7]
and institutions (bibliometric methods)
Academic rankings (local and global) and distribution of research      [8], [9]
funding
Document quality                                                       [10], [11], [12]
Detection of manipulation of scientometric indicators                  [13], [14], [15]
Identification of intellectual plagiarism (citation-based plagiarism   [16], [17], [18], [19],
detection, definition of an idea)                                      [20], [21]

    All listed directions are important. Bibliometric methods are widely used nowadays.
Indicators determined on the basis of citations affect academic rankings and the
distribution of funding. This contributes to the emergence of academic fraud and
manipulation of indicators in the academic environment. Therefore, ensuring a
qualitative assessment of research is relevant.
    In practice, the responsibility for article quality lies with the scientific reviewer,
who is an expert in the researched field. The reviewer makes decisions based on an
analysis of all the substantive and formal properties of a publication, including the
reference list. The authors themselves determine whether using a citation is appropriate
and justified.
    The authors of this paper aim to investigate and describe the qualitative
characteristics of the reference list in already published articles and, based on these
characteristics, to propose software for the reviewer (expert). The proposed software is
advisory: it will help the reviewer quickly examine the paper and pay attention to
certain formal features of the reference list, and it may help to indicate the
unreasonable use of sources.


3      Research of Ukrainian Papers in Web of Science Core
       Collection and Scopus

The reviewer's task is to perform a comprehensive evaluation of scientific work. Peer
review can be divided into two stages: i) analysis and evaluation of formal features;
ii) scientific review of the publication content. Formal features include the following
characteristics: total number of citations; time ranges for quotes; number (percentage)
of unique source names; quality of sources (indicators and types); a percentage of
self-citation; inconsistency of certain citations with the subject of publication; non-
specific sections with excessive citations.
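
A few of the formal features listed above can be computed directly once the reference list is available in a structured form. The following is a minimal sketch, not the authors' application: the `Reference` record, the `formal_features` function, and the definition of the unique-source percentage (distinct source names divided by the total number of citations) are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Reference:
    """One entry of a reference list (hypothetical structure)."""
    authors: list[str]  # author surnames
    source: str         # journal, proceedings, or publisher name
    year: int           # publication year


def formal_features(refs: list[Reference], paper_authors: set[str]) -> dict:
    """Compute several formal features of a reference list."""
    if not refs:
        raise ValueError("the reference list is empty")
    total = len(refs)
    # Source names are normalized so that case differences do not count as new sources.
    unique_sources = {r.source.strip().lower() for r in refs}
    # A reference counts as a self-citation if it shares an author with the paper.
    self_cited = [r for r in refs if paper_authors & set(r.authors)]
    return {
        "total_citations": total,
        "year_range": (min(r.year for r in refs), max(r.year for r in refs)),
        "citations_per_year": dict(Counter(r.year for r in refs)),
        "unique_source_pct": round(100 * len(unique_sources) / total, 1),
        "self_citation_pct": round(100 * len(self_cited) / total, 1),
    }
```

The remaining features (quality of sources, topical inconsistency, sections with excessive citations) need external data or text analysis and are not covered by this sketch.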
    Using already published articles that were peer reviewed and included in the Web of
Science Core Collection (WoS CC) and Scopus as examples, we analyze and characterize
some of these formal features.
    For the analysis of formal properties, we used 2018 publications on the topic of our
own research (indexed in WoS CC as of 06.2019). These are 83 publications selected from
WoS CC (topic "computational linguistics", year 2018, categories "computer science",
"artificial intelligence", "language linguistics", "information science library science",
"computational linguistics").
    It should be noted that the average number of sources differs between publications in
different thematic areas. The average number of sources in the list for the indicated
topic is 39, excluding 6 review papers (more than 100 citations each).
    We determined the percentage of sources included in WoS CC, the year of publica-
tion, and the quantity and quality of unique sources for the 20 publications from the
list (83 publications).
    The bibliographic lists of the 20 publications contain 1087 citations; 506 of them
(47%) are included in WoS CC. The reliability of publication data included in
scientometric databases is not in doubt. The average percentage of links to external
sources in the reference list is 53%. Possible reasons are that certain scientific
sources are not included in scientometric databases or that scientists have limited
access to foreign publications.
    The relevance and currency of a study are evidenced by the use of a significant
number of recent publications. Review papers are exceptions, since a qualitative
examination requires a thorough study of the topic.
    The citation structure by year of publication is shown in Fig. 1. Authors most often
use publications of recent years (57% of citations are from 2010-2018), but older
publications (1950-1969) may be considered in review papers.




Fig. 1. Citations by publication year.

   To ensure the completeness of the study, it is necessary to analyze materials from
different sources. The more diverse the list is the better. Therefore, we investigate
links from our collection and select unique names of sources for each paper. The total
number of unique source names is 245. Fig. 2 shows the percentage of unique sources
for 20 publications. The average percentage of unique sources is 48%.




Fig. 2. Percentage of unique sources for 20 publications.

    The quality of the reference source is also important. The reviewer, who is an
expert on the topic, knows the leading journals in the field and can identify their
names on the list. Table 2 presents scientometric indicators of the journals that are
most often used in the reference lists (TOP-10). The most frequently cited journal is
“Computational Linguistics” (9 out of 20 documents). All journals are well-known and
respected.
    An important characteristic of the references is the percentage of self-citation. Au-
thoritative publishers recommend authors to limit the amount of self-citation to 30%
of the total number of sources in the citation list. It is believed that this is enough to
demonstrate the previous and related works of the authors. In practice, self-citation
can be different (from 0% to 100%).
    The paper [15] describes a self-citation analysis for a data set of 7 million
scientists worldwide. As a result, the median self-citation rate is 15.5%. Scientists
from the United Kingdom, the United States, and Turkey have self-citation rates below
the median; those from Japan and China are at the median level. Ukrainian scientists
belong to a group with self-citation rates of up to 40%.


    In our paper, we investigated the effect of self-citation and obtained a result for
Ukraine similar to [15]. We analyzed 100 author profiles of Ukrainian scientists
(Scopus, Computer Sciences area) and determined the share of self-citation in the total
number of citations and its influence on the author’s h-index: the average self-citation
is 35%; the maximum self-citation is 96%; the maximum growth of the h-index is 80%; the
h-index is unchanged in 12 profiles.
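
The influence of self-citation on the h-index can be illustrated by computing the index twice, with and without self-citations. The sketch below assumes that each paper in a profile is given as a pair (total citations, self-citations) and that the growth is measured relative to the h-index without self-citations; both the input format and this definition of growth are assumptions, since the text does not specify them.

```python
def h_index(citation_counts: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def self_citation_effect(papers: list[tuple[int, int]]) -> dict:
    """papers holds (total citations, self-citations) for each paper of one author."""
    total = sum(t for t, _ in papers)
    self_total = sum(s for _, s in papers)
    h_with = h_index([t for t, _ in papers])
    h_without = h_index([t - s for t, s in papers])
    # Growth is undefined when the h-index without self-citations is zero.
    growth = round(100 * (h_with - h_without) / h_without, 1) if h_without else None
    return {
        "self_citation_pct": round(100 * self_total / total, 1) if total else 0.0,
        "h_with_self": h_with,
        "h_without_self": h_without,
        "h_growth_pct": growth,
    }
```

A profile whose h-index is the same in both cases corresponds to the "h-index unchanged" group reported above.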

             Table 2. TOP 10 unique sources from citation lists and their indicators.

Name of journal                            Number of      Impact    Impact Factor     Quartile
                                           documents      Factor    (5 years)
Computational Linguistics                  9              1.319     2.202             Q3, Q3, Q2
Journal of the American Society for        7              2.452     2.762             Q1, Q1
Information Science and Technology
Science                                    6              41.058    40.627            Q1
Communications of the ACM                  5              3.063     5.29              Q1, Q1, Q1
Journal of Machine Learning Research       5              2.281     5.805             Q2, Q2
ACM Transactions on Information            4              1.767     2.203             Q2
Systems
Biometrics                                 4              1.524     1.962             Q3, Q3, Q2
Journal of the Association for Infor-      4              2.835     2.931             Q2, Q1
mation Science and Technology
Plos One                                   4              2.766     3.352             Q1
Artificial Intelligence                    3              3.034     4.156             Q1


   The distribution of Ukrainian scientists' author profiles by the degree to which
self-citation influences the h-index is shown in Fig. 3. The index increased by 1% to
20% for 56 profiles, by 21%-30% for 24 profiles, by 31%-40% for 6 profiles, and by
41%-50% for 6 profiles. The h-index increased by more than 51% for 8 profiles.




Fig. 3. Distribution of author profiles by the degree to which self-citation influences the h-index.


   We found that self-citation affects the scientometric indices of the authors. It is
very difficult to detect abuse by authors from the self-citation count alone. Authors
themselves decide whether a citation is expedient and justified, so the problem of
self-citation lies in the ethical plane. The self-citation percentage of the total
number of citations for 100 author profiles is presented in Fig. 4.




Fig. 4. The self-citation percentage of the total number of citations for 100 author profiles.

    The use of information tools to automate scientific activity accelerates the scien-
tific process. The reference list, formed in the required format, allows the use of soft-
ware for data analysis.
    We use VOSviewer for quick visualization and analysis of information about the
authors and subjects of the cited sources. It is a software tool for creating visualized
scientific landscapes based on textual data. For the experiments, we employed Web of
Science CC data. With VOSviewer we can quickly analyze such indicators as i) excessive
self-citation, using a network of cited authors; ii) relevance of the cited source's
topic to the paper topic, using keyword analysis of titles and abstracts. Fig. 5 shows
the author citation network built with VOSviewer from the reference list of paper1.




Fig. 5. VOSviewer: author analysis based on citations.


   The author of the publication is linked to all nodes and appears in all 9 citations.
In addition to self-citation, unscrupulous authors may cite papers “on order”. Formally,
such sources differ in topic and are not related to the specific study. VOSviewer
visualizes the semantic relationships between words (titles, abstracts, keywords).
Fig. 6 presents the publication titles and abstracts from the citations of paper2.




Fig. 6. VOSviewer: analysis of publication titles and abstracts from citations.

   All words are closely related and demonstrate a certain interdisciplinary interaction
(Fig. 6). Thus, we examined the following features and identified them for our data
sets. Our results are presented in Table 3.

                                  Table 3. Data sets features.

                                                     Min value      Max value      Average
                                                                                   value
1   Sources on the list (for a paper)                5              89             39
2   Sources on the list (for a review paper)         128            390            196
3   Year of publication:                             1950           2019           -
        • 2019 (2 citations)
        • 2018 (15 citations)
        • 2017 (25 citations)
        • 2016 (45 citations)
        • 2015 (39 citations)
4   Unique sources                                   13%            100%           72%
5   Self-citation in author profile, % of total      5%             96%            36%
    citations (except for profiles without
    self-citation)
6   Increase in the author’s h-index due to          5%             80%            25%
    self-citation (except for profiles without
    self-citation)



   The described characteristics are not clearly formalized, and therefore the
appropriateness of using sources is confirmed by reviewing the content of the
publication. Thus, our task is to develop special software that extracts and presents
the characteristics of the list for expert review.


4      Experiments and Application for Reviewer

The reviewer's task is to perform a comprehensive evaluation of scientific work. In
this section, we show 1) a developed application that allows the reviewer to analyze
the paper reference list; 2) an approach to the analysis of the bibliography that
identifies sources that may not be relevant to the research topic and, accordingly,
artificially inflate the indicators of other authors.

4.1    Software for Reviewer

   In developing the application for analyzing the reference list, we tried to take into
account not only our research but also the recommendations of conferences and journal
editorial boards. Its main functions are presented below.
   Analysis of the publication year. This function addresses how recent the cited
publications are at the time of writing. The application has a threshold field in which
the user enters a year, and the program calculates the number of publications before
and after this year. This function makes it possible to quickly see whether the author
has analyzed the latest research in the area (Fig. 7).




Fig. 7. The example analysis of the publication year.

   As an example, in the “Results” field in Fig. 7 a reviewer can see how many
publications were published in 2015 or later and their percentage of the total number.
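
The publication-year check can be sketched as follows. This is an illustrative reconstruction, not the code of the described application: the function names are hypothetical, the year is assumed to appear in parentheses as in the reference style of this paper, and the threshold year itself is counted as recent, in line with the "2015 or later" example.

```python
import re


def extract_year(reference: str):
    """Return the last parenthesized four-digit year in a reference, or None."""
    years = re.findall(r"\(((?:19|20)\d{2})\)", reference)
    return int(years[-1]) if years else None


def year_threshold_report(references: list[str], threshold: int) -> dict:
    """Count references published before the threshold year and from it onwards."""
    years = [y for y in map(extract_year, references) if y is not None]
    recent = sum(1 for y in years if y >= threshold)
    return {
        "with_year": len(years),          # references in which a year was found
        "before": len(years) - recent,    # published before the threshold year
        "from_threshold": recent,         # published in the threshold year or later
        "from_threshold_pct": round(100 * recent / len(years), 1) if years else 0.0,
    }
```

References without a recognizable year are skipped rather than guessed, so the percentage is taken over the references in which a year was found.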


  Analysis of self-citation. The user enters the authors of the paper in the “Authors”
field, and the program lists the papers by these authors from the reference list and
calculates their percentage of the total number (Fig. 8).




Fig. 8. The self-citation analysis example.

   As an example, in the “Results” field in Fig. 8 a reviewer can see that an author
with the surname “Mazov” has 2 papers and that the self-citation percentage is 33.3%.
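
A simple version of this check can be written as a surname search over the reference strings. The sketch below is an assumption about how such a function might work, not the application's actual code: `self_citation_report` is a hypothetical name, and the author part of a reference is taken to be the text before the first colon, as in the reference style of this paper.

```python
import re


def self_citation_report(references: list[str], surnames: list[str]) -> dict:
    """Count references whose author part contains one of the given surnames."""
    per_author = {}
    matched = set()  # indices of references that match at least one surname
    for surname in surnames:
        pattern = re.compile(rf"\b{re.escape(surname)}\b", re.IGNORECASE)
        hits = [i for i, ref in enumerate(references)
                if pattern.search(ref.split(":", 1)[0])]
        per_author[surname] = len(hits)
        matched.update(hits)
    pct = round(100 * len(matched) / len(references), 1) if references else 0.0
    return {
        "per_author": per_author,
        "self_cited": len(matched),
        "self_citation_pct": pct,
    }
```

Matching by surname alone cannot distinguish namesakes, so the result should be treated as a hint for the reviewer rather than a final count.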

4.2     Identification of Irrelevant Sources in Reference List

    Analyzing the difficulties that reviewers face, we identified the problem of an
artificial increase in the citations of a publication. This is achieved by citing a
source irrelevant to the main topic.
    In order to identify such publications, we propose an approach that uses methods of
computational linguistics to determine the proximity between the sources in the
reference list and can also take paper keywords into account if necessary. Consider the
following example. We take the reference list of this paper, which consists of 21
sources [1-21], and denote these papers as P1, …, P21. We artificially added to this
list a paper that is not relevant to the topic. This is the following source:
    Lefèvre T, Gouagna L-C, Dabiré KR, Elguero E, Fontenille D, Renaud F, et al.
(2010) Beer Consumption Increases Human Attractiveness to Malaria Mosquitoes.
PLoS ONE 5(3): e9546. https://doi.org/10.1371/journal.pone.0009546
    Denote this source as P22. In order to find the source least similar to the paper
topic, we compare the title of each paper with the title of our work, “The Reference
Analysis as a Quality Characteristic of a Scientific Article”. We implement the
comparison with the similarity measure from the spaCy library and word embedding
models. In the spaCy library, the embedding of a full sentence is calculated as the
average over all words in the sentence. Before processing, we removed stopwords from
each sentence. The results are shown in Fig. 9.
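
The comparison described above can be sketched in plain Python: remove stopwords, average the word vectors of each title, and compare the averages by cosine similarity. This is an illustration of the technique and not the authors' spaCy code. The three-dimensional vectors in `TOY_EMBEDDINGS`, the short stopword list, and the function names are invented for the example; a real run would use the vectors of a pretrained embedding model.

```python
import math
import re

# Hypothetical stopword list and toy word vectors, for illustration only.
STOPWORDS = {"a", "an", "as", "of", "the", "to", "in", "and", "for", "on"}
TOY_EMBEDDINGS = {
    "reference": [0.9, 0.1, 0.0], "analysis": [0.8, 0.2, 0.1],
    "quality": [0.7, 0.3, 0.0], "scientific": [0.8, 0.1, 0.1],
    "article": [0.9, 0.2, 0.0], "citation": [0.9, 0.1, 0.1],
    "research": [0.8, 0.2, 0.0], "evaluation": [0.7, 0.3, 0.1],
    "beer": [0.0, 0.1, 0.9], "malaria": [0.0, 0.2, 0.8],
    "mosquitoes": [0.1, 0.0, 0.9],
}


def sentence_vector(text, embeddings):
    """Average the vectors of the known non-stopword tokens; None if there are none."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(paper_title, ref_titles, embeddings):
    """Return (label, similarity) pairs sorted from least to most similar."""
    target = sentence_vector(paper_title, embeddings)
    scores = {}
    for label, title in ref_titles.items():
        vec = sentence_vector(title, embeddings)
        scores[label] = cosine(target, vec) if target is not None and vec is not None else 0.0
    return sorted(scores.items(), key=lambda item: item[1])
```

The entries at the start of the returned list are the candidates for irrelevant sources. Titles whose words are all missing from the vocabulary receive a score of 0.0, which mirrors how a transliterated title can score low without being off-topic.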




Fig. 9. Results of semantic comparison of the paper title with the article titles from the refer-
ence list.

   The minimum value, 0.13, was obtained for P9. The title of this publication is a
transliteration from Russian, which explains such a low proximity coefficient. The
publication P22 also received a low value, which confirms our hypothesis, because this
is our "artificial" publication. The other elements of our reference list have close
values, from 0.6 to 0.8. It should be noted that this approach does not provide a 100%
guarantee of identifying irrelevant sources, but it can help identify candidates for
such references.


5      Conclusions and Recommendations

   In this work, we have analyzed the qualitative characteristics of a scientific docu-
ment and focused attention on the paper reference list as an object for research. For
research, we selected articles from journals included in the Web of Science Core Col-
lection and data profiles of Ukrainian scientists from Scopus. For analysis, we used
the capabilities of Web of Science Core Collection, Scopus, VOSviewer, MS Excel.
   According to our research results, we recommend that the reviewer, first of all, pay
attention to the formalized properties of the citation list. The reference list is
studied in this publication, and the results demonstrate that the quality of the paper
content can be assessed through the analysis of the citation list. Thanks to the
identification and formalization of the citation list properties, it became possible to
create a special software tool for reviewers.
   We developed an application that allows the reviewer to analyze the reference list of
a paper and proposed an approach to the analysis of the bibliography that identifies
sources that may not be relevant to the research topic. Our experiments showed that the
proposed approach works well enough, and our next step will be to experiment on larger
data sets.


References
 1. Aksnes, Dag W., Liv Langfeldt, and Paul Wouters: Citations, citation indicators, and re-
    search quality: An overview of basic concepts and theories. Sage Open 9(1), 1-17 (2019).
 2. European Educational Research Quality Indicators. Project Final Report,
    http://eerqi.eu/sites/default/files/Final_Report.pdf, last accessed 2020/02/01.
 3. Tahamtan, Iman, Askar Safipour Afshar, and Khadijeh Ahamdzadeh: Factors affecting
    number of citations: a comprehensive review of the literature. Scientometrics 107(3),
    1195-1225 (2016).
 4. Biagioli M.: Quality to Impact, Text to Metadata: Publication and Evaluation in the Age of
    Metrics. KNOW: A Journal on the Formation of Knowledge 2(2), 249-275 (2018).
 5. Hyland K.: Self‐citation and self‐reference: Credibility and promotion in academic publi-
    cation. Journal of the American Society for Information Science and technology 54(3),
    251-259 (2003).
 6. Moed, H. F.: Citation analysis in research evaluation. Springer, Dordrecht,
    The Netherlands (2005).
 7. Cabezas-Clavijo, A., Robinson-Garcia, N., Escabias, M., & Jimenez-Contreras, E.: Re-
    viewers’ ratings and bibliometric indicators: Hand in hand when assessing over research
    proposals? PLoS ONE 8(6), (2013). – doi:10.1371/journal.pone.0068258.
 8. Piro, F. N., Sivertsen, G.: How can differences in international university rankings be ex-
    plained? Scientometrics 109, 2263-2278 (2016).
 9. Akoev, M., Markusova, V., Moskaleva, O., Pislyakov, V.: Rukovodstvo po Naukometrii:
    Indikatori Razvitiia Naukii Tehnologii [Handbook for Scientometrics: Indicators of sci-
    ence and technology development]. Ural Federal Univ, Ekaterinburg (2014).
10. Shibayama, S., Wang, J.: Measuring originality in science. Scientometrics 122, 409–427
    (2020). – doi:10.1007/s11192-019-03263-0.
11. Tahamtan I., Afshar A. S., Ahamdzadeh K.: Factors affecting number of citations: a com-
    prehensive review of the literature. Scientometrics 107 (3), 1195-1225 (2016).
12. Krapež K.: The (Un) Originality of Scientific Papers–An Analysis of Professional Quality
    Standards. In: Management, Knowledge, and Learning International Conference. Make
    Learn, Zadar (2013).
13. Baas J., Fennell C.: When peer reviewers go rogue-Estimated prevalence of citation ma-
    nipulation by reviewers based on the citation patterns of 69,000 reviewers. In: ISSI 2019.
    Rome, Italy (2019).
14. Ioannidis, John PA, Richard Klavans, and Kevin W. Boyack.: Thousands of scientists pub-
    lish a paper every five days. Nature 561(7722), pp. 167-170 (2018).
15. Van Noorden, Richard, and Dalmeet Singh Chawla: Hundreds of extreme self-citing scien-
    tists revealed in new database. Nature 572, pp. 578-579 (2019). – doi: 10.1038/d41586-
    019-02479-7.
16. Gañan D.: Plagiarism Detection. In: Baneres D., Rodríguez M., Guerrero-Roldán A.: En-
    gineering Data-Driven Adaptive Trust-based e-Assessment Systems. Lecture Notes on Da-
    ta Engineering and Communications Technologies, vol. 34. Springer, Cham (2020).
17. Foltýnek T., Meuschke N., Gipp B.: Academic plagiarism detection: a systematic literature
    review. ACM Computing Surveys (CSUR) 52(6), 1-42 (2019).
18. Gipp B: Citation-based plagiarism detection. In: Citation-based plagiarism detection,
    pp. 57–88. Springer Vieweg, Wiesbaden (2014).
19. Mazov N., Gureev V., Kosyakov D.: On the development of a plagiarism detection model
    based on citation analysis using a bibliographic database. Scientific and Technical Infor-
    mation Processing 43(4), 236-240 (2016).


20. Mazov N., Gureev V.: Study results for the detection of translated plagiarism using bibli-
    ometric databases. Nauchnye i tekhnicheskie biblioteki-scientific and technical libraries
    12, 87-96 (2017).
21. Soleman S., Fujii A.: A Method for Plagiarism Detection over Academic Citation Net-
    works. In: Proceedings of the 10th International Joint Conference on Knowledge Discov-
    ery, Knowledge Engineering and Knowledge Management, vol. 1, pp. 274-281 (2018). –
    doi: 10.5220/0006938402740281.